Terminal state concept in POMCP algorithm #75

GijsMargadant · 2024-09-09T10:42:31Z

I'm trying to implement a maintenance planning algorithm using POMCP. The decision maker is mainly interested in knowing when to perform a certain action, given current and historical sensor observations. This setting also has a concept of terminal states: once a terminal state is reached, for example when the component fails or maintenance is initiated, any further actions are irrelevant. Terminal states are also mentioned in the original POMCP paper by Silver, D., and Veness, J. (2010); in particular, their Simulate and Rollout functions take them into account.

Prompted by a previous issue I opened (#73), I took a closer look at the _rollout function. It seems the current stopping condition only takes the maximum tree depth into account. Is this observation correct, or am I missing something?
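For comparison, a rollout that stops either at the maximum depth or as soon as a terminal state is reached could look like the minimal sketch below. It assumes a hypothetical toy maintenance model whose generative step function returns a `terminal` flag alongside the next state, observation and reward; `step`, `rollout`, the states and the reward values are illustrative only and are not part of pomdp_py.

```python
import random

# Hypothetical toy maintenance model (illustration only, not pomdp_py API).
FAILED, MAINTAINED = "failed", "maintained"
MAX_LEVEL = 3  # degradation level at which the component fails


def step(state, action, rng, p_degrade=0.5):
    """Generative model G(s, a) -> (s', o, r, terminal)."""
    if action == "maintain":
        # Initiating maintenance ends the episode.
        return MAINTAINED, None, -10.0, True
    # "wait": the component degrades by one level with probability p_degrade.
    level = state + (1 if rng.random() < p_degrade else 0)
    obs = level  # noiseless sensor reading, to keep the sketch short
    if level >= MAX_LEVEL:
        return FAILED, obs, -100.0, True  # failure ends the episode
    return level, obs, 1.0, False


def rollout(state, step, policy, rng, depth=0, max_depth=20, discount=0.95):
    """Discounted return of a rollout policy.

    Stops at max_depth (like a depth-only rollout) OR as soon as the
    model reports a terminal state, since no reward follows it.
    """
    total, weight = 0.0, 1.0
    while depth < max_depth:
        action = policy(state, rng)
        state, _obs, reward, terminal = step(state, action, rng)
        total += weight * reward
        weight *= discount
        depth += 1
        if terminal:
            break
    return total
```

The extra `terminal` check only saves simulation steps; the return itself is unaffected, because nothing is accumulated after the episode ends.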


zkytony · 2024-09-14T18:14:14Z

The POMCP implemented here doesn't have a dedicated concept of a terminal state. However, you can achieve the same effect (in terms of value estimation, asymptotically) by defining the terminal state so that it transitions to itself and yields 0 reward. See this comment on a similar topic: #8 (comment)
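A minimal sketch of this workaround, assuming a hypothetical toy maintenance model (the names `step`, `depth_only_rollout`, `TERMINAL_STATES` and the reward values are illustrative, not pomdp_py API): the terminal states become absorbing, i.e. they transition to themselves with 0 reward, and a rollout that only checks the maximum depth then accumulates exactly the same discounted return as one that stops at the terminal state. In pomdp_py terms this logic would live in your own transition and reward models.

```python
import random

# Hypothetical toy maintenance model (illustration only, not pomdp_py API).
FAILED, MAINTAINED = "failed", "maintained"
TERMINAL_STATES = {FAILED, MAINTAINED}
MAX_LEVEL = 3  # degradation level at which the component fails


def step(state, action, rng, p_degrade=0.5):
    """Generative model G(s, a) -> (s', o, r) with no terminal flag."""
    if state in TERMINAL_STATES:
        # Absorbing state: transition to itself and collect 0 reward.
        return state, None, 0.0
    if action == "maintain":
        return MAINTAINED, None, -10.0
    # "wait": the component degrades by one level with probability p_degrade.
    level = state + (1 if rng.random() < p_degrade else 0)
    if level >= MAX_LEVEL:
        return FAILED, level, -100.0
    return level, level, 1.0


def depth_only_rollout(state, step, policy, rng, max_depth=20, discount=0.95):
    """Rollout that stops only at max_depth, as discussed in this issue."""
    total, weight = 0.0, 1.0
    for _ in range(max_depth):
        action = policy(state, rng)
        state, _obs, reward = step(state, action, rng)
        total += weight * reward  # adds 0.0 once an absorbing state is reached
        weight *= discount
    return total
```

The trade-off is that simulations keep stepping through the absorbing state until the maximum depth is reached, which costs computation but adds nothing to the estimated value.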
