-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(controllor): make agent controller stops when encounter fatal observation #4573
Conversation
@@ -255,7 +261,8 @@ async def _handle_observation(self, observation: Observation): | |||
self.state.metrics.merge(self.state.local_metrics) | |||
elif isinstance(observation, FatalErrorObservation): | |||
await self.report_error( | |||
'There was a fatal error during agent execution: ' + str(observation) | |||
'There was a fatal error during agent execution: ' + str(observation), | |||
add_to_eventstream=False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does adding it to the event stream cause the behavior?
this seems like it'd be nice to have in the EventStream (e.g. if someone does share-openhands)--is there another fix we could make?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ohh - in the line below we actually add FatalErrorObservation directly to eventstream.
Without this fix, what happened was:
- we add an ErrorObservtaion to eventstream - which will possibly triggers more agent actions
- Then we add an FatalErrorObservation - which should stop the controller (but it didn't for some unknown reason)
In this PR, we simply don't do (1) - so FatalErrorObservation will properly stop the agent controller.
For ShareOpenHands -- i think as long as FatalErrorObservation is in the eventstream, we should be good, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, we also should reraise this Fatal, maybe here.
Although... ideally we wouldn't use send_error for it. It has too many purposes...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ahh the real problem here I think is that we're adding it to the event stream in a loop! This is an event stream handler, and it's adding the same event its handling back into the event stream
Very tricky. #4575 for a better failure here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's used in evals, it will be part of the final report.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@xingyaoww I see several other places where we first set agent state, then report_error
, along with a note that this is important.
Can you try making that change? i.e. just move line 267 up, and revert everything else?
I'm going to work on making report_error
better on the side
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I think while you're right in essence, Robert, I'm not sure we are indeed adding the same obs.
which should stop the controller (but it didn't for some unknown reason)
TBH I am confused by the error handling in the controller lately and I'm again not so sure about this one. Allow me to go back to basics. These are observations, not Exceptions. The way it would stop the controller is if we get here in the start loop with an exception, is that correct? Is there another way to stop than an uncaught exception? Can't we just raise on a FatalObservation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you try making that change? i.e. just move line 267 up, and revert everything else?
Just pushed a most recent commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm going to make this better--stay tuned!
…l error
End-user friendly description of the problem this fixes or functionality that this introduces
Give a summary of what the PR does, explaining any non-trivial design decisions
While running SWE-Bench eval, I found a few cases where FatalErrorObservation didn't really stop the agent controller (see below).
This PR:
cc @enyst - i think this is actually related to the
report_error
thing you mentioned earlier 😓Link of any specific issues this addresses