Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(controllor): make agent controller stops when encounter fatal observation #4573

Merged
merged 4 commits into from
Oct 26, 2024

Conversation

xingyaoww
Copy link
Contributor

…l error

End-user friendly description of the problem this fixes or functionality that this introduces

  • Include this change in the Release Notes. If checked, you must provide an end-user friendly description for your change below

Give a summary of what the PR does, explaining any non-trivial design decisions

While running SWE-Bench eval, I found a few cases where FatalErrorObservation didn't really stop the agent controller (see below).
This PR:

  • fixes this issue
  • add a unit test to make sure this doesn't happen again
2024-10-25 21:10:05,132 - INFO - **CmdRunAction (source=None)**
THOUGHT: Now that we've created the script, let's run it to see if we can reproduce the error:
COMMAND:
python /workspace/django__django__3.0/reproduce_error.py
2024-10-25 21:10:22,559 - ERROR - Error during action execution: 404 Client Error: Not Found for url: https://5b68ee14-4288-4189-bae0-048d0fbb31be-e1d222bb4dca1ef4.eval-runtime.all-hands.dev/execute_action
2024-10-25 21:10:22,560 - INFO - **FatalErrorObservation**
[Runtime (ID=5b68ee14-4288-4189-bae0-048d0fbb31be-e1d222bb4dca1ef4)] Action execution failed: 404 Client Error: Not Found for url: https://5b68ee14-4288-4189-bae0-048d0fbb31be-e1d222bb4dca1ef4.eval-runtime.all-hands.dev/execute_action
2024-10-25 21:10:22,855 - INFO - CodeActAgent LEVEL 0 LOCAL STEP 3 GLOBAL STEP 3
2024-10-25 21:10:26,597 - INFO - Cost: 0.01 USD | Accumulated Cost: 0.21 USD
Input tokens: 13599 | Output tokens: 119
Input tokens (cache hit): 13371
Input tokens (cache write): 223

2024-10-25 21:10:26,598 - INFO - **CmdRunAction (source=None)**
THOUGHT: I apologize for the error. It seems that the script couldn't run properly. Let's modify our approach and create a minimal Django project structure to reproduce the issue.

First, let's create a new directory for our test project:
COMMAND:
mkdir -p /workspace/django__django__3.0/test_project
2024-10-25 21:10:26,616 - ERROR - Error during action execution: 404 Client Error: Not Found for url: https://5b68ee14-4288-4189-bae0-048d0fbb31be-e1d222bb4dca1ef4.eval-runtime.all-hands.dev/execute_action
2024-10-25 21:10:26,617 - INFO - **FatalErrorObservation**
[Runtime (ID=5b68ee14-4288-4189-bae0-048d0fbb31be-e1d222bb4dca1ef4)] Action execution failed: 404 Client Error: Not Found for url: https://5b68ee14-4288-4189-bae0-048d0fbb31be-e1d222bb4dca1ef4.eval-runtime.all-hands.dev/execute_action
2024-10-25 21:10:26,700 - INFO - CodeActAgent LEVEL 0 LOCAL STEP 4 GLOBAL STEP 4
2024-10-25 21:10:34,737 - INFO - Cost: 0.01 USD | Accumulated Cost: 0.23 USD
Input tokens: 13849 | Output tokens: 304
Input tokens (cache hit): 13594
Input tokens (cache write): 250

2024-10-25 21:10:34,737 - INFO - **IPythonRunCellAction**
THOUGHT: I apologize for the continued errors. It seems that we're having issues with executing bash commands. Let's try to create the necessary files directly using the `str_replace_editor` function.

Let's create the main Django project file:
CODE:
print(file_editor(**{'command': 'create', 'path': '/workspace/django__django__3.0/test_project/manage.py', 'file_text': '#!/usr/bin/env python\nimport os\nimport sys\n\nif __name__ == "__main__":\n    os.environ.setdefault("DJANGO_SETTINGS_MODULE", "test_project.settings")\n    try:\n        from django.core.management import execute_from_command_line\n    except ImportError as exc:\n        raise ImportError(\n            "Couldn\'t import Django. Are you sure it\'s installed and "\n            "available on your PYTHONPATH environment variable? Did you "\n            "forget to activate a virtual environment?"\n        ) from exc\n    execute_from_command_line(sys.argv)'}))
2024-10-25 21:10:34,744 - ERROR - Error during action execution: 404 Client Error: Not Found for url: https://5b68ee14-4288-4189-bae0-048d0fbb31be-e1d222bb4dca1ef4.eval-runtime.all-hands.dev/execute_action
2024-10-25 21:10:34,745 - INFO - **FatalErrorObservation**
[Runtime (ID=5b68ee14-4288-4189-bae0-048d0fbb31be-e1d222bb4dca1ef4)] Action execution failed: 404 Client Error: Not Found for url: https://5b68ee14-4288-4189-bae0-048d0fbb31be-e1d222bb4dca1ef4.eval-runtime.all-hands.dev/execute_action
2024-10-25 21:10:34,841 - INFO - CodeActAgent LEVEL 0 LOCAL STEP 5 GLOBAL STEP 5
2024-10-25 21:10:40,048 - INFO - Cost: 0.01 USD | Accumulated Cost: 0.25 USD
Input tokens: 14282 | Output tokens: 182
Input tokens (cache hit): 13844
Input tokens (cache write): 433

2024-10-25 21:10:40,048 - INFO - **IPythonRunCellAction**
THOUGHT: I apologize for the persistent errors. It seems that we're encountering issues with the environment or permissions. Let's try a different approach. Instead of creating a new project, let's modify an existing file in the Django codebase to reproduce the issue.

Let's edit the `/workspace/django__django__3.0/django/urls/resolvers.py` file to add our custom converter and modify the error handling:
CODE:
print(file_editor(**{'command': 'view', 'path': '/workspace/django__django__3.0/django/urls/resolvers.py'}))
2024-10-25 21:10:40,054 - ERROR - Error during action execution: 404 Client Error: Not Found for url: https://5b68ee14-4288-4189-bae0-048d0fbb31be-e1d222bb4dca1ef4.eval-runtime.all-hands.dev/execute_action
2024-10-25 21:10:40,055 - INFO - **FatalErrorObservation**
[Runtime (ID=5b68ee14-4288-4189-bae0-048d0fbb31be-e1d222bb4dca1ef4)] Action execution failed: 404 Client Error: Not Found for url: https://5b68ee14-4288-4189-bae0-048d0fbb31be-e1d222bb4dca1ef4.eval-runtime.all-hands.dev/execute_action
2024-10-25 21:10:40,151 - INFO - CodeActAgent LEVEL 0 LOCAL STEP 6 GLOBAL STEP 6
2024-10-25 21:10:57,753 - INFO - Cost: 0.02 USD | Accumulated Cost: 0.29 USD
Input tokens: 14595 | Output tokens: 909
Input tokens (cache hit): 14277
Input tokens (cache write): 313

cc @enyst - i think this is actually related to the report_error thing you mentioned earlier 😓


Link of any specific issues this addresses

@xingyaoww xingyaoww changed the title fix(controllor): make sure agent controller stops when encounter fata… fix(controllor): make agent controller stops when encounter fatal observation Oct 26, 2024
@xingyaoww xingyaoww requested review from enyst and rbren October 26, 2024 01:52
@xingyaoww xingyaoww requested a review from tofarr October 26, 2024 02:11
@@ -255,7 +261,8 @@ async def _handle_observation(self, observation: Observation):
self.state.metrics.merge(self.state.local_metrics)
elif isinstance(observation, FatalErrorObservation):
await self.report_error(
'There was a fatal error during agent execution: ' + str(observation)
'There was a fatal error during agent execution: ' + str(observation),
add_to_eventstream=False,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does adding it to the event stream cause the behavior?

this seems like it'd be nice to have in the EventStream (e.g. if someone does share-openhands)--is there another fix we could make?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ohh - in the line below we actually add FatalErrorObservation directly to eventstream.

Without this fix, what happened was:

  1. we add an ErrorObservtaion to eventstream - which will possibly triggers more agent actions
  2. Then we add an FatalErrorObservation - which should stop the controller (but it didn't for some unknown reason)

In this PR, we simply don't do (1) - so FatalErrorObservation will properly stop the agent controller.

For ShareOpenHands -- i think as long as FatalErrorObservation is in the eventstream, we should be good, right?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rbren Maybe in part, because of this. We currently treat all error obs the same, and send them to the agent.

Alternative would be like this. On current main I think adding the Fatal obs here should do the same.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, we also should reraise this Fatal, maybe here.

Although... ideally we wouldn't use send_error for it. It has too many purposes...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh the real problem here I think is that we're adding it to the event stream in a loop! This is an event stream handler, and it's adding the same event its handling back into the event stream

Very tricky. #4575 for a better failure here

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's used in evals, it will be part of the final report.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xingyaoww I see several other places where we first set agent state, then report_error, along with a note that this is important.

Can you try making that change? i.e. just move line 267 up, and revert everything else?

I'm going to work on making report_error better on the side

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I think while you're right in essence, Robert, I'm not sure we are indeed adding the same obs.

which should stop the controller (but it didn't for some unknown reason)

TBH I am confused by the error handling in the controller lately and I'm again not so sure about this one. Allow me to go back to basics. These are observations, not Exceptions. The way it would stop the controller is if we get here in the start loop with an exception, is that correct? Is there another way to stop than an uncaught exception? Can't we just raise on a FatalObservation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rbren

Can you try making that change? i.e. just move line 267 up, and revert everything else?

Just pushed a most recent commit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and I think currently the controller doesn't stop hard even when the agent got stuck in a loop

image

Copy link
Collaborator

@rbren rbren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to make this better--stay tuned!

@xingyaoww xingyaoww merged commit be3cbb0 into main Oct 26, 2024
12 checks passed
@xingyaoww xingyaoww deleted the xw/fix-fatal branch October 26, 2024 18:28
Ethan0456 pushed a commit to openlocus/OpenHands that referenced this pull request Oct 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants