Database migrations for contributor audio and curation #274

Merged on May 18, 2023 (5 commits)
`doc/database/media.md`: 11 changes (6 additions, 5 deletions)
@@ -5,11 +5,12 @@
A timed media resource like video or audio, from an external source.
The main use case is audio recordings of each document.

| column | type | description |
| ------------- | --------------- | ------------------------------------------------------------------------------------------- |
| `id` | `uuid` | Primary key |
| `url` | `text` | Full URL for this media resource |
| `recorded_at` | `date?` | Date and time this resource was created |
| `recorded_by` | `uuid? -> user` | Unique ID of the user that recorded this audio, if the audio was recorded by a contributor. |

- Deleting also deletes all `media_slice` rows that reference it
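
A minimal sketch of the `media` and `media_slice` relationship, using Python's built-in `sqlite3` as a stand-in for the real database. It assumes `uuid` columns can be stored as `TEXT`. The `media_slice.resource_id` column name is hypothetical, because this page does not list the columns of `media_slice`. The sketch shows the nullable `recorded_by` reference and the cascading delete described above.

```python
import sqlite3

# Hypothetical SQLite stand-in for the schema above: uuid columns become TEXT,
# and media_slice.resource_id is an assumed name for its reference to media.
SCHEMA = """
CREATE TABLE "user" (
    id TEXT PRIMARY KEY
);
CREATE TABLE media (
    id TEXT PRIMARY KEY,
    url TEXT NOT NULL,
    recorded_at TEXT,                        -- nullable (`date?`)
    recorded_by TEXT REFERENCES "user"(id)   -- nullable (`uuid? -> user`)
);
CREATE TABLE media_slice (
    id TEXT PRIMARY KEY,
    -- deleting a media row deletes every slice that references it
    resource_id TEXT NOT NULL REFERENCES media(id) ON DELETE CASCADE
);
"""

def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys (and cascades) with this pragma on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn

def slice_count(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT count(*) FROM media_slice").fetchone()[0]

conn = open_db()
conn.execute('INSERT INTO "user" (id) VALUES (?)', ("u1",))
# Contributor recording: recorded_by points at the contributing user.
conn.execute("INSERT INTO media (id, url, recorded_by) VALUES (?, ?, ?)",
             ("m1", "https://example.com/a.mp3", "u1"))
# Externally sourced recording: recorded_by stays NULL.
conn.execute("INSERT INTO media (id, url) VALUES (?, ?)",
             ("m2", "https://example.com/b.mp3"))
conn.executemany("INSERT INTO media_slice (id, resource_id) VALUES (?, ?)",
                 [("s1", "m1"), ("s2", "m1"), ("s3", "m2")])
conn.execute("DELETE FROM media WHERE id = ?", ("m1",))
print(slice_count(conn))  # 1 (only s3, which belongs to m2, remains)
```
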

`doc/database/readme.md`: 1 change (1 addition, 0 deletions)
@@ -33,3 +33,4 @@ Most of our columns are `not null`, which is long to write so we introduced shor
- [collections](./collections.md): Edited collections tables
- [words](./words.md): Words, word parts, and abbreviation systems
- [media](./media.md): Audio and image resources
- [user](./user.md): User account records
`doc/database/user.md`: 12 changes (12 additions, 0 deletions)
@@ -0,0 +1,12 @@
# User

## `user`

Metadata associated with a user. `user.id` on this table equals the `sub` claim issued by AWS Cognito.

| column | type | description |
| -------------- | ------- | -------------------------------------------------- |
| `id` | `uuid` | Primary key, AWS Cognito `sub` claim |
| `display_name` | `text` | How the user's name should be presented in the app |
| `created_at` | `date` | When the user record was created |
| `archived_at` | `date?` | When the user record was archived, if ever |
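
A sketch of resolving a signed-in user to their `user` row through the Cognito `sub` claim, again with `sqlite3` as a stand-in. It assumes the token has already been verified and decoded into a `claims` dict elsewhere. Treating archived users as "not found" is an assumption, not documented behavior. The helper `find_active_user` and the sample IDs are hypothetical. The table name is quoted because `USER` is a reserved word in PostgreSQL, and SQLite accepts the same quoting.

```python
import sqlite3
from typing import Optional

# Hypothetical SQLite stand-in for the `user` table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE "user" (
        id TEXT PRIMARY KEY,          -- the Cognito `sub` claim
        display_name TEXT NOT NULL,
        created_at TEXT NOT NULL,
        archived_at TEXT              -- NULL while the account is active
    )
""")

def find_active_user(conn: sqlite3.Connection, claims: dict) -> Optional[tuple]:
    """Return (id, display_name) for the token's subject, or None.

    Assumes `claims` is an already-verified token payload; archived users
    are treated as not found (an assumption, not a documented rule).
    """
    return conn.execute(
        'SELECT id, display_name FROM "user" '
        "WHERE id = ? AND archived_at IS NULL",
        (claims["sub"],),
    ).fetchone()

conn.executemany(
    'INSERT INTO "user" VALUES (?, ?, ?, ?)',
    [
        ("sub-123", "Ada", "2023-05-18", None),
        ("sub-456", "Grace", "2023-05-18", "2023-06-01"),
    ],
)
print(find_active_user(conn, {"sub": "sub-123"}))  # ('sub-123', 'Ada')
print(find_active_user(conn, {"sub": "sub-456"}))  # None (archived)
```
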
`doc/database/words.md`: 41 changes (26 additions, 15 deletions)
@@ -2,24 +2,35 @@

## `word`

| column | type | description |
| ------------------------ | ------------------------ | --------------------------------------------------------------------------------------------------- |
| `id` | `uuid` | Primary key |
| `source_text` | `text` | Unambiguous transcription of the whole word |
| `simple_phonetics` | `text?` | Romanized phonetic spelling |
| `phonemic` | `text?` | Underlying phonemic representation, with more pronunciation details |
| `english_gloss` | `text?` | English translation |
| `recorded_at` | `date?` | When this word was written, only specified if it differs from when the document overall was written |
| `commentary` | `text?` | Linguistic or historical commentary supplied by an annotator |
| `audio_slice_id` | `uuid? -> media_slice` | Audio recording of the word read aloud |
| `curated_audio_slice_id` | `uuid? -> media_slice` | A Contributor audio recording of the word read aloud, which has been selected by an Editor |
| `audio_curated_by` | `uuid? -> user` | The Editor who selected the Contributor audio recording to show, if one has been selected. |
| `document_id` | `uuid -> document` | Document the word is in |
| `page_number` | `text?` | Page number, only supplied for documents like dictionaries that may not have `document_page` rows |
| `index_in_document` | `bigint` | Position of the word in the whole document |
| `page_id` | `uuid? -> document_page` | Physical page containing this word |
| `character_range` | `int8range?` | Order of words in a paragraph is determined by character indices |

- One of `page_id` or `character_range` must be supplied
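
A sketch of how the "one of `page_id` or `character_range`" rule could be enforced as a `CHECK` constraint, using `sqlite3` as a stand-in. SQLite has no `int8range`, so `character_range` is stored as text here, and other columns and foreign keys are omitted. The second `CHECK`, requiring `curated_audio_slice_id` and `audio_curated_by` to be set together, is an assumption drawn from the column descriptions rather than a stated rule. The `curate_audio` helper is hypothetical.

```python
import sqlite3

# Hypothetical SQLite stand-in for part of the `word` table.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE word (
        id TEXT PRIMARY KEY,
        source_text TEXT NOT NULL,
        page_id TEXT,
        character_range TEXT,          -- int8range in Postgres, e.g. "[0,5)"
        curated_audio_slice_id TEXT,
        audio_curated_by TEXT,
        -- documented rule: one of page_id or character_range must be supplied
        CHECK (page_id IS NOT NULL OR character_range IS NOT NULL),
        -- assumed rule: a curated slice and its curating Editor are set together
        CHECK ((curated_audio_slice_id IS NULL) = (audio_curated_by IS NULL))
    )
""")

def curate_audio(conn: sqlite3.Connection, word_id: str,
                 media_slice_id: str, editor_id: str) -> None:
    """Hypothetical helper: record an Editor's choice of contributor audio."""
    # Both columns change in one statement, so the assumed CHECK keeps holding.
    conn.execute(
        "UPDATE word SET curated_audio_slice_id = ?, audio_curated_by = ? "
        "WHERE id = ?",
        (media_slice_id, editor_id, word_id),
    )

conn.execute("INSERT INTO word (id, source_text, page_id) VALUES (?, ?, ?)",
             ("w1", "example", "p1"))
try:
    # Neither page_id nor character_range is supplied, so this is rejected.
    conn.execute("INSERT INTO word (id, source_text) VALUES (?, ?)",
                 ("w2", "example"))
except sqlite3.IntegrityError:
    print("w2 rejected")
curate_audio(conn, "w1", "s1", "editor-1")
```
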

## `word_contributor_media`

A join table linking users' audio contributions to words in documents. Because this is a many-to-many relationship, the table should be indexed on both keys and carry a compound unique constraint on (`word_id`, `media_slice_id`), so the same audio cannot be linked to the same word more than once. Write additions as upserts.

| column           | type                  | description                          |
| ---------------- | --------------------- | ------------------------------------ |
| `word_id`        | `uuid -> word`        | Word associated with the media slice |
| `media_slice_id` | `uuid -> media_slice` | Media slice associated with the word |
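
A minimal sketch of the join table and its upsert, using `sqlite3` as a stand-in with foreign keys omitted for brevity. The `UNIQUE (word_id, media_slice_id)` constraint is backed by an index whose leading column also serves lookups by `word_id`, so one extra index on `media_slice_id` covers the other direction. `INSERT ... ON CONFLICT (...) DO NOTHING` is also valid PostgreSQL syntax. The helper `link_contribution` and the index name are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE word_contributor_media (
        word_id TEXT NOT NULL,
        media_slice_id TEXT NOT NULL,
        -- compound unique constraint: one link per (word, audio) pair;
        -- its index also serves lookups by word_id
        UNIQUE (word_id, media_slice_id)
    );
    -- second index for lookups starting from the audio side
    CREATE INDEX word_contributor_media_slice_idx
        ON word_contributor_media (media_slice_id);
""")

def link_contribution(conn: sqlite3.Connection, word_id: str,
                      media_slice_id: str) -> None:
    """Hypothetical helper: link audio to a word, ignoring duplicates."""
    conn.execute(
        "INSERT INTO word_contributor_media (word_id, media_slice_id) "
        "VALUES (?, ?) ON CONFLICT (word_id, media_slice_id) DO NOTHING",
        (word_id, media_slice_id),
    )

link_contribution(conn, "w1", "s1")
link_contribution(conn, "w1", "s1")  # duplicate pair, silently skipped
link_contribution(conn, "w1", "s2")
count = conn.execute("SELECT count(*) FROM word_contributor_media").fetchone()[0]
print(count)  # 2
```
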

## `word_segment`

A part of a word, also known as a morpheme within a morphemic segmentation.