Terminal state concept in POMCP algorithm #75

Open
GijsMargadant opened this issue Sep 9, 2024 · 1 comment

@GijsMargadant

I'm trying to implement a maintenance planning algorithm using POMCP. The decision maker mainly wants to know when to perform a certain action, given current and historical sensor observations. This setting also has terminal states: once one is reached (e.g., the component fails or maintenance is initiated), any further actions are irrelevant. Terminal states are also mentioned in the original POMCP paper by Silver, D., and Veness, J. (2010); in particular, their Simulate and Rollout functions take them into account.

Prompted by a previous issue I opened (#73), I took a closer look at the _rollout function. It seems its current stopping condition only takes the maximum tree depth into account. Is this observation correct, or am I missing something?
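
To make the two stopping conditions concrete, here is a minimal sketch (not the library's _rollout) of a random-policy rollout that stops either at a maximum depth or at a terminal state. The degradation model, the TERMINAL sentinel, the reward values, and the names step and rollout are all hypothetical, chosen only to mirror the maintenance setting described above.

```python
import random

TERMINAL = "terminal"  # hypothetical sentinel: component failed or maintenance initiated
ACTIONS = ("wait", "repair")

def step(state, action, rng):
    """Hypothetical generative model G(s, a) -> (s', r) for a degrading component.

    `state` is an integer degradation level; reaching level 3 means failure.
    """
    if action == "repair":
        return TERMINAL, -10.0              # maintenance initiated: episode ends
    next_state = state + 1 if rng.random() < 0.5 else state
    if next_state >= 3:
        return TERMINAL, -100.0             # component failed: episode ends
    return next_state, 1.0                  # component still in service

def rollout(state, max_depth, gamma, rng, policy=None):
    """Rollout that stops at the depth limit OR as soon as a terminal state is reached.

    `policy` maps a state to an action; by default actions are drawn uniformly.
    """
    total, discount = 0.0, 1.0
    for _ in range(max_depth):
        if state == TERMINAL:
            break                           # no further actions matter once terminal
        action = policy(state) if policy is not None else rng.choice(ACTIONS)
        state, reward = step(state, action, rng)
        total += discount * reward
        discount *= gamma
    return total
```

Dropping the `state == TERMINAL` check leaves only the depth limit, in which case the model itself has to guarantee that nothing is earned after termination.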

@zkytony
Collaborator

zkytony commented Sep 14, 2024

The POMCP implemented here doesn't have a designated concept of a terminal state. However, you can achieve the same effect (in terms of value estimation, asymptotically) by defining a terminal state that transitions to itself and yields 0 reward. See this comment on a similar topic: #8 (comment)
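
A minimal sketch of that workaround, assuming a generic generative model step(state, action, rng) -> (next_state, reward). Wrapping the model so a terminal state transitions to itself with zero reward means a rollout stopped only by depth accumulates nothing after termination. All names here (absorbing, depth_only_rollout, toy_step, TERMINAL) and the toy rewards are hypothetical, and this is plain Python rather than the library's own model classes.

```python
import random

TERMINAL = "terminal"  # hypothetical sentinel for "failed or maintained"

def toy_step(state, action, rng):
    """Hypothetical generative model with NO terminal handling of its own."""
    if action == "repair":
        return TERMINAL, -10.0   # maintenance cost
    if rng.random() < 0.3:
        return TERMINAL, -100.0  # failure cost
    return state, 1.0            # one more period of service

def absorbing(step, is_terminal):
    """Wrap `step` so terminal states transition to themselves with 0 reward."""
    def wrapped(state, action, rng):
        if is_terminal(state):
            return state, 0.0    # self-loop, zero reward
        return step(state, action, rng)
    return wrapped

def depth_only_rollout(state, step, actions, max_depth, gamma, rng):
    """Random-policy rollout whose only stopping condition is the depth limit."""
    total, discount = 0.0, 1.0
    for _ in range(max_depth):
        action = rng.choice(actions)
        state, reward = step(state, action, rng)
        total += discount * reward
        discount *= gamma
    return total

safe_step = absorbing(toy_step, lambda s: s == TERMINAL)
```

Without the wrapper, a rollout that starts in (or reaches) TERMINAL keeps collecting rewards; with it, every step after termination contributes 0, so the return matches what an explicit terminal check would give. The cost is that simulation steps are still spent on the self-loop until the depth limit is hit.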
