Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#89 --- RFB gold processing #91

Merged
merged 7 commits into from
Aug 19, 2024
Merged

#89 --- RFB gold processing #91

merged 7 commits into from
Aug 19, 2024

Conversation

MrSqually
Copy link
Contributor

This PR is to add a process.py file for converting a directory of gold JSON annotations into a single CSV file. This satisfies the first two objectives of #89.

additions

  • role-filler-binding/process.py
  • gold files generated from process.py on 231117-aapb-annotations-44 data

discussion before merging

  • is this a suitable format for the data (newline-separated CSV)?
  • do we want to discard any of the non-RF data from this batch? do we actually want all these duplicates?

Copy link
Member

@keighrim keighrim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • gold file(s) isn't compliant to the repository cataloging guideline (https://github.com/clamsproject/aapb-annotations?tab=readme-ov-file#gold-dataset-files) must be in golds dir, separate by asset guid. Please read through the guidelines and associated documents (conventions and templates)
  • gold file is using CR LF, but we use linux format (LF only)
  • the sub-project readme also needs to be updated based on the proc.py and gold format.

@keighrim
Copy link
Member

keighrim commented Jul 8, 2024

For the second discussion point, I think we should keep the duplicate labels, since they are also human generated (valuable) data that have potential usages.

@bohJiang12
Copy link
Contributor

Implementations

  • process.py: Updated the script to
    • export formatted .csv files with additional column SKIPPED indicating whether a frame is skipped by the human annotator
    • store each GUID under the protocol required directory golds
    • set the .csv file as LF format (try run cat -A <csv_file> | less to see if only dollar sign $ at the end for each line)
  • README.md: Added description for process.py and golds directory, especially for how the ANNOTATIONS column relates to raw annotations stored in JSON format.
  • requirements.txt: created my dev environment as required by the document

@bohJiang12 bohJiang12 requested a review from keighrim July 22, 2024 19:28
Copy link
Member

@keighrim keighrim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay, output looks good. Just a few minor thing

  • old csv gold file now can be deleted
  • requirements.txt seems to be too verbose, while there's only one thrid-party library (clams-utils) used in the proc.py..?

@bohJiang12
Copy link
Contributor

  • requirements.txt seems to be too verbose, while there's only one thrid-party library (clams-utils) used in the proc.py..?

In fact, I just exported pip installed packages using pip freeze, this might show many packages but only clams_utils be used. So I'll update them later.

@bohJiang12 bohJiang12 requested a review from keighrim July 22, 2024 20:33
keighrim
keighrim previously approved these changes Jul 22, 2024
@keighrim keighrim requested a review from bohJiang12 July 29, 2024 15:06
@keighrim
Copy link
Member

@bohJiang12 I added a new column to mark the scene types on which the annotation is done. Not sure if and how adding this new column affects the evaluation counter-part (clamsproject/aapb-evaluations#66). Please take a look at this change.

@keighrim keighrim merged commit 4596ef2 into main Aug 19, 2024
@keighrim keighrim deleted the 89-rfb-gold branch August 19, 2024 20:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants