Understanding processing of Mind2Web dataset for Lumos grounding #5

Open
DanielRoeder1 opened this issue May 7, 2024 · 2 comments

@DanielRoeder1

Hello,

I am trying to map the Lumos WebAgent grounding dataset onto the original Mind2Web dataset. Unfortunately, the ids (annotation_id, action_uid) were removed in the Lumos version, but by extracting the queries and matching them I can map 1001 of the 1009 samples to their corresponding Mind2Web entries.
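For readers attempting the same query-based mapping, here is a minimal sketch of how it could look. It is not the script used for the numbers above: the normalization rules are an assumption, and it assumes each Mind2Web record is a dict exposing `confirmed_task` and `annotation_id`, and that the Lumos task queries have already been extracted into a list of strings.

```python
import re


def normalize(text):
    """Collapse whitespace, drop a trailing period, and lowercase the query."""
    return re.sub(r"\s+", " ", text).strip().strip(".").lower()


def match_by_query(lumos_queries, mind2web_tasks):
    """Map each Lumos sample index to a Mind2Web annotation_id via its query.

    lumos_queries: list of task strings extracted from Lumos (assumed input).
    mind2web_tasks: list of dicts with "confirmed_task" and "annotation_id"
    (assumed field names; adjust to the actual records).
    """
    index = {
        normalize(task["confirmed_task"]): task["annotation_id"]
        for task in mind2web_tasks
    }
    matched, unmatched = {}, []
    for i, query in enumerate(lumos_queries):
        key = normalize(query)
        if key in index:
            matched[i] = index[key]
        else:
            unmatched.append(i)  # inspect these by hand
    return matched, unmatched
```

Samples that end up in `unmatched` are the ones whose query text differs by more than case, whitespace, or a trailing period, so they need a manual look or a fuzzier comparison.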

The problem I am facing now is that Lumos must have done some processing on the actions themselves. Lumos sometimes has more and sometimes fewer actions (i.e. user msgs defining a grounding sentence) than Mind2Web. Why is this the case? Which processing was applied?

For my work I need a mapping of the Lumos grounding steps (that is the user msgs in the Lumos dataset) to the html_source code found in Mind2Web.

Happy to receive any guidance or advice, and thanks for the great open-source work!

@yuchenlin
Member

@WadeYin9712 could you please take a look at this issue?

@WadeYin9712
Contributor

Hi Daniel,

Sorry for the late reply! I was pretty busy working on another ongoing project.

The mismatch might be due to the annotation conversion process: the LLM sometimes produces output in an invalid format, and those outputs are simply discarded (you can take a look at prompt_convertion.py in the data folder). I wasn't aware of the issue with extra actions, though. It should be simple to filter these out by matching the actions against the original ones in Mind2Web: if an action doesn't appear in Mind2Web, something must have gone wrong, so feel free to remove it.
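The filtering suggested above could be sketched roughly as follows. This is only an illustration, not code from the Lumos repository: it assumes both the Lumos and the Mind2Web actions of one task have already been parsed into dicts with hypothetical `op`, `target`, and `value` keys, which the real data will need some preprocessing to reach.

```python
from collections import Counter


def action_key(action):
    """Reduce an action dict to a comparable (op, target, value) tuple.

    The keys "op", "target", and "value" are hypothetical; adapt them to
    however the actions are parsed from each dataset.
    """
    return (
        action["op"].upper(),
        (action.get("target") or "").strip().lower(),
        (action.get("value") or "").strip().lower(),
    )


def filter_extra_actions(lumos_actions, mind2web_actions):
    """Keep Lumos actions that also occur in Mind2Web; report the rest."""
    remaining = Counter(action_key(a) for a in mind2web_actions)
    kept, dropped = [], []
    for action in lumos_actions:
        key = action_key(action)
        if remaining[key] > 0:
            remaining[key] -= 1  # consume one matching Mind2Web action
            kept.append(action)
        else:
            dropped.append(action)  # extra action with no Mind2Web counterpart
    # Mind2Web actions never matched, e.g. discarded during conversion
    missing = list(remaining.elements())
    return kept, dropped, missing
```

The `dropped` list corresponds to the extra Lumos actions, while `missing` lists Mind2Web actions with no Lumos counterpart, which would be consistent with outputs discarded for invalid formatting.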

Let me know if you have further questions!
