Updating sr readme, adding raws, & batch info #70

Merged: 14 commits, Nov 18, 2023
Changes from 13 commits
23 changes: 23 additions & 0 deletions batches/aapb-collaboration-27-a.txt
@@ -0,0 +1,23 @@
# Full metadata is available at https://docs.google.com/spreadsheets/d/1C1s7tJErZL3mEME78oTjaRxhWjD2Ke9pIIKFQcSyM8E/edit#gid=0. For information on selection, please visit: https://github.com/clamsproject/aapb-collaboration/issues/27
# This batch contains varied, handpicked videos for scene recognition (SR), with variability in the slates/chyrons/credits. This is the densely-labelled batch used in SR.
# This batch has 20 items.
cpb-aacip-129-88qc000k
cpb-aacip-f2c34dd1cd4
cpb-aacip-191-40ksn47s
cpb-aacip-507-028pc2tp2z
cpb-aacip-507-0k26970f2d
cpb-aacip-507-0z70v8b17g
cpb-aacip-512-542j67b12n
cpb-aacip-394-149p8fcw
cpb-aacip-08fb0e1f287
cpb-aacip-512-t43hx1753b
cpb-aacip-d0f2569e145
cpb-aacip-d8ebafee30e
cpb-aacip-c72fd5cbadc
cpb-aacip-b6a2a39b7eb
cpb-aacip-512-4b2x34nv4t
cpb-aacip-512-416sx65d21
cpb-aacip-512-3f4kk95f7h
cpb-aacip-512-348gf0nn4f
cpb-aacip-516-cc0tq5s94c
cpb-aacip-516-8c9r20sq57
24 changes: 24 additions & 0 deletions batches/aapb-collaboration-27-b.txt
@@ -0,0 +1,24 @@
# Full metadata is available at https://docs.google.com/spreadsheets/d/1C1s7tJErZL3mEME78oTjaRxhWjD2Ke9pIIKFQcSyM8E/edit#gid=0. For information on selection, please visit: https://github.com/clamsproject/aapb-collaboration/issues/27
# This batch contains varied, handpicked videos for scene recognition (SR), with variability in the slates/chyrons/credits. This is the sparsely-labelled batch used in SR.
# This batch has 21 items.
cpb-aacip-254-75r7szdz
cpb-aacip-259-4j09zf95
cpb-aacip-526-hd7np1xn78
cpb-aacip-75-72b8h82x
cpb-aacip-fe9efa663c6
cpb-aacip-f5847a01db5
cpb-aacip-f2a88c88d9d
cpb-aacip-ec590a6761d
cpb-aacip-c7c64922fcd
cpb-aacip-f3fa7215348
cpb-aacip-f13ae523e20
cpb-aacip-e7a25f07d35
cpb-aacip-ce6d5e4bd7f
cpb-aacip-690722078b2
cpb-aacip-e649135e6ec
cpb-aacip-15-93gxdjk6
cpb-aacip-512-4f1mg7h078
cpb-aacip-512-4m9183583s
cpb-aacip-512-4b2x34nt7g
cpb-aacip-512-3n20c4tr34
cpb-aacip-512-3f4kk9534t
3 changes: 3 additions & 0 deletions batches/aapb-collaboration-27-sr-practice.txt
@@ -0,0 +1,3 @@
# This is a batch used for practice on the scene-recognition project. Full metadata is available at https://docs.google.com/spreadsheets/d/1C1s7tJErZL3mEME78oTjaRxhWjD2Ke9pIIKFQcSyM8E/edit#gid=0. For information on selection, please visit: https://github.com/clamsproject/aapb-collaboration/issues/27
cpb-aacip-512-cc0tq5sk5w
cpb-aacip-507-028pc2tp5w
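
The batch files above share a simple layout: `#` comment lines followed by one GUID per line, and two of them state their size in a comment ("This batch has 20 items."). A minimal sketch of a reader for that layout follows. The function name `read_batch` and the idea of cross-checking the stated count are assumptions made for illustration, not part of this pull request.

```python
import re
from typing import List, Optional, Tuple

# Matches a size comment such as "# This batch has 20 items."
COUNT_RE = re.compile(r"#\s*this batch has (\d+) items?\.", re.IGNORECASE)


def read_batch(text: str) -> Tuple[List[str], Optional[int]]:
    """Return (GUIDs, declared item count or None) from batch file text.

    Hypothetical helper: lines starting with '#' are comments, blank lines
    are skipped, and every other line is taken to be one GUID.
    """
    guids: List[str] = []
    declared: Optional[int] = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("#"):
            match = COUNT_RE.match(line)
            if match:
                declared = int(match.group(1))
            continue
        guids.append(line)
    return guids, declared


# The two GUIDs of the practice batch, preceded by a shortened comment line.
practice = """\
# this is a batch used for practice on the scene-recognition project.
cpb-aacip-512-cc0tq5sk5w
cpb-aacip-507-028pc2tp5w
"""

guids, declared = read_batch(practice)
# The practice batch carries no size comment, so there is nothing to compare.
count_ok = declared is None or declared == len(guids)
```

Comparing `declared` with `len(guids)` gives a cheap consistency check whenever a batch does state its size.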
17 changes: 15 additions & 2 deletions repository_level_conventions.md
@@ -7,8 +7,9 @@
> Some estimates of imprecision are given by Margin of Error.
> Directionality definitions help frame the boundaries meant by annotated times.
> The fields in the gold datasets should be standardized.
> Naming conventions - batches: `repoName-issueNumber(-identifier).txt`

## Conventions
## Data Formatting and Precision Conventions
### Time Point Notation
> [!IMPORTANT]
> `hh:mm:ss.mmm` with a **DOT**
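
As an illustration of the notation above, here is a minimal sketch that validates a `hh:mm:ss.mmm` time point and converts it to milliseconds. The function name `time_point_to_ms` is hypothetical, and the sketch assumes two-digit hours and minute/second values below 60; the conventions file itself only specifies the pattern and the dot separator.

```python
import re

# hh:mm:ss.mmm with a dot before the milliseconds (a comma is rejected).
TIME_POINT_RE = re.compile(r"(\d{2}):([0-5]\d):([0-5]\d)\.(\d{3})")


def time_point_to_ms(value: str) -> int:
    """Convert a 'hh:mm:ss.mmm' time point to integer milliseconds."""
    match = TIME_POINT_RE.fullmatch(value)
    if match is None:
        raise ValueError(f"not in hh:mm:ss.mmm notation: {value!r}")
    hours, minutes, seconds, millis = (int(group) for group in match.groups())
    return ((hours * 60 + minutes) * 60 + seconds) * 1000 + millis


ms = time_point_to_ms("00:01:02.500")  # 1 min 2.5 s -> 62500 ms
```

A value written with a comma, such as `00:01:02,500`, raises `ValueError` in this sketch.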
@@ -78,7 +79,19 @@ Practically speaking, there is only a small percentage of cases where the variat
especially in cases of human perception.
The conventions for precision hold until the needs of the project change.

### Field Naming Conventions for Gold Datasets
## File Naming Conventions
Batch files should be named entirely in lowercase, in this format: `repoName-issueNumber(-identifier).txt` (parentheses mark optional parts).
The `repoName-issueNumber` part points to a GitHub issue (usually on [AAPB Collaborations Repo Issues](https://github.com/clamsproject/aapb-collaboration)) that contains the discussion/documentation of how this batch was chosen and created.
Any `identifier` parts come after this and can be used to distinguish different batches created from the same issue. This keeps a family of batches together in ordinary file-system listings.
Because batches can be reused for disparate projects, an identifier can indicate some property of the GUIDs in that batch,
but should not refer to the particular annotation project the batch was used for.
If no distinguishing quality is available as an identifier, use letters (`a`, `b`, `c`, `d`, ...) to number the batches.
Finally, the whole batch name should use only lowercase letters and `-` dashes.

E.g. `aapb-collaboration-27-a.txt` and `aapb-collaboration-27-b.txt`
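
The naming rule can be checked mechanically. The sketch below is one possible reading of it, not an official validator: `BatchName` and `parse_batch_name` are hypothetical names, and the regular expression assumes that each dash-separated part of the repository name starts with a letter, so that the first all-digit part can be taken as the issue number.

```python
import re
from typing import NamedTuple, Optional


class BatchName(NamedTuple):
    repo: str
    issue: int
    identifier: Optional[str]  # None when the optional part is absent


# repoName-issueNumber(-identifier).txt, lowercase letters, digits and dashes only.
BATCH_NAME_RE = re.compile(
    r"(?P<repo>[a-z][a-z0-9]*(?:-[a-z][a-z0-9]*)*)"   # e.g. aapb-collaboration
    r"-(?P<issue>\d+)"                                 # e.g. 27
    r"(?:-(?P<identifier>[a-z0-9]+(?:-[a-z0-9]+)*))?"  # e.g. a (optional)
    r"\.txt"
)


def parse_batch_name(filename: str) -> Optional[BatchName]:
    """Split a batch file name into its parts, or return None if it does not fit."""
    match = BATCH_NAME_RE.fullmatch(filename)
    if match is None:
        return None
    return BatchName(match["repo"], int(match["issue"]), match["identifier"])


parsed = parse_batch_name("aapb-collaboration-27-a.txt")
# parsed.repo == "aapb-collaboration", parsed.issue == 27, parsed.identifier == "a"
```

Names containing uppercase letters or underscores return `None` here, which matches the lowercase-and-dashes requirement.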


## Field Naming Conventions for Gold Datasets
* `GUID` (all caps) - the AAPB id for that video e.g. "cpb-aacip-81-881jx33t".
* `start`, `end` - For "anchor" columns that mark the span of a phenomenon (e.g., character offsets or time intervals), use `start` and `end` as the column names.
* `entry` - Also called the index: the sequential number of the annotation, e.g. the first piece of labelled data is "1".