From 15f9d02b488056254953a2e8a12a352e1ada58b6 Mon Sep 17 00:00:00 2001
From: Charlie McVicker
Date: Thu, 4 May 2023 11:28:31 -0400
Subject: [PATCH 1/4] doc/database changes

---
 doc/database/media.md  | 11 ++++++-----
 doc/database/readme.md |  1 +
 doc/database/user.md   | 12 ++++++++++++
 doc/database/words.md  | 41 ++++++++++++++++++++++++++---------------
 4 files changed, 45 insertions(+), 20 deletions(-)
 create mode 100644 doc/database/user.md

diff --git a/doc/database/media.md b/doc/database/media.md
index 903a36d0..2cbd9fcc 100644
--- a/doc/database/media.md
+++ b/doc/database/media.md
@@ -5,11 +5,12 @@
 A timed media resource like video or audio, from an external source.
 The main use case is audio recordings of each document.

-| column        | type    | description                             |
-| ------------- | ------- | --------------------------------------- |
-| `id`          | `uuid`  | Primary key                             |
-| `url`         | `text`  | Full URL for this media resource        |
-| `recorded_at` | `date?` | Date and time this resource was created |
+| column        | type            | description                                                                                 |
+| ------------- | --------------- | ------------------------------------------------------------------------------------------- |
+| `id`          | `uuid`          | Primary key                                                                                 |
+| `url`         | `text`          | Full URL for this media resource                                                            |
+| `recorded_at` | `date?`         | Date and time this resource was created                                                     |
+| `recorded_by` | `uuid? -> user` | Unique ID of the user that recorded this audio, if the audio was recorded by a contributor. |

 - Deleting also deletes all `media_slice` rows that reference it

diff --git a/doc/database/readme.md b/doc/database/readme.md
index d32edcd5..40c723b2 100644
--- a/doc/database/readme.md
+++ b/doc/database/readme.md
@@ -33,3 +33,4 @@ Most of our columns are `not null`, which is long to write so we introduced shor
 - [collections](./collections.md): Edited collections tables
 - [words](./words.md): Words, word parts, and abbreviation systems
 - [media](./media.md): Audio and image resources
+- [user](./user.md): User account records
diff --git a/doc/database/user.md b/doc/database/user.md
new file mode 100644
index 00000000..95a26bdc
--- /dev/null
+++ b/doc/database/user.md
@@ -0,0 +1,12 @@
+# User
+
+## `user`
+
+Metadata assocated with a user. `user.id` on this table is equal to `sub` in AWS.
+
+| column         | type    | description                                        |
+| -------------- | ------- | -------------------------------------------------- |
+| `id`           | `uuid`  | Primary key, AWS Cognito `sub` claim               |
+| `display_name` | `text`  | How the user's name should be presented in the app |
+| `created_at`   | `date`  | When the user record was created                   |
+| `archived_at`  | `date?` | When the user record was archived, if ever         |
diff --git a/doc/database/words.md b/doc/database/words.md
index eea447cc..4263cca9 100644
--- a/doc/database/words.md
+++ b/doc/database/words.md
@@ -2,24 +2,35 @@

 ## `word`

-| column              | type                     | description                                                                                         |
-| ------------------- | ------------------------ | --------------------------------------------------------------------------------------------------- |
-| `id`                | `uuid`                   | Primary key                                                                                         |
-| `source_text`       | `text`                   | Unambiguous transcription of the whole word                                                         |
-| `simple_phonetics`  | `text?`                  | Romanized phonetic spelling                                                                         |
-| `phonemic`          | `text?`                  | Underlying phonemic representation, with more pronunciation details                                 |
-| `english_gloss`     | `text?`                  | English translation                                                                                 |
-| `recorded_at`       | `date?`                  | When this word was written, only specified if it differs from when the document overall was written |
-| `commentary`        | `text?`                  | Linguistic or historical commentary supplied by an annotator                                        |
-| `audio_slice_id`    | `uuid? -> media_slice`   | Audio recording of the word read aloud                                                              |
-| `document_id`       | `uuid -> document`       | Document the word is in                                                                             |
-| `page_number`       | `text?`                  | Page number, only supplied for documents like dictionaries that may not have `document_page` rows   |
-| `index_in_document` | `bigint`                 | Position of the word in the whole document                                                          |
-| `page_id`           | `uuid? -> document_page` | Physical page containing this word                                                                  |
-| `character_range`   | `int8range?`             | Order of words in a paragraph is determined by character indices                                    |
+| column                   | type                     | description                                                                                         |
+| ------------------------ | ------------------------ | --------------------------------------------------------------------------------------------------- |
+| `id`                     | `uuid`                   | Primary key                                                                                         |
+| `source_text`            | `text`                   | Unambiguous transcription of the whole word                                                         |
+| `simple_phonetics`       | `text?`                  | Romanized phonetic spelling                                                                         |
+| `phonemic`               | `text?`                  | Underlying phonemic representation, with more pronunciation details                                 |
+| `english_gloss`          | `text?`                  | English translation                                                                                 |
+| `recorded_at`            | `date?`                  | When this word was written, only specified if it differs from when the document overall was written |
+| `commentary`             | `text?`                  | Linguistic or historical commentary supplied by an annotator                                        |
+| `audio_slice_id`         | `uuid? -> media_slice`   | Audio recording of the word read aloud                                                              |
+| `curated_audio_slice_id` | `uuid? -> media_slice`   | A Contributor audio recording of the word read aloud, which has been selected by an Editor          |
+| `audio_curated_by`       | `uuid? -> user`          | The Editor who selected the Contributor audio recording to show, if one has been selected.          |
+| `document_id`            | `uuid -> document`       | Document the word is in                                                                             |
+| `page_number`            | `text?`                  | Page number, only supplied for documents like dictionaries that may not have `document_page` rows   |
+| `index_in_document`      | `bigint`                 | Position of the word in the whole document                                                          |
+| `page_id`                | `uuid? -> document_page` | Physical page containing this word                                                                  |
+| `character_range`        | `int8range?`             | Order of words in a paragraph is determined by character indices                                    |

 - One of `page_id` or `character_range` must be supplied

+## `word_contributor_media`
+
+A join table linking user audio contributions to words in documents. This is a many-to-many relationship, so should be indexed on both keys, with a compound unique constraint. Ie. you cannot link the same audio to the same word multiple times. Additions should be written as upserts.
+
+| column           | type                  | description                              |
+| ---------------- | --------------------- | ---------------------------------------- |
+| `word_id`        | `uuid -> word`        | Word that is assocated with media slice. |
+| `media_slice_id` | `uuid -> media_slice` | Media slice that is assocated with word. |
+
 ## `word_segment`

 A part of a word, also known as a morpheme within a morphemic segmentation.

From c85e6d5c4324f1dd418d52bd59750b5df182864a Mon Sep 17 00:00:00 2001
From: Charlie McVicker
Date: Thu, 4 May 2023 14:12:58 -0400
Subject: [PATCH 2/4] clean up lanugage

---
 doc/database/media.md | 12 ++++++------
 doc/database/user.md  |  4 +++-
 doc/database/words.md |  6 +++---
 3 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/doc/database/media.md b/doc/database/media.md
index 2cbd9fcc..168eafe9 100644
--- a/doc/database/media.md
+++ b/doc/database/media.md
@@ -5,12 +5,12 @@
 A timed media resource like video or audio, from an external source.
 The main use case is audio recordings of each document.

-| column        | type            | description                                                                                 |
-| ------------- | --------------- | ------------------------------------------------------------------------------------------- |
-| `id`          | `uuid`          | Primary key                                                                                 |
-| `url`         | `text`          | Full URL for this media resource                                                            |
-| `recorded_at` | `date?`         | Date and time this resource was created                                                     |
-| `recorded_by` | `uuid? -> user` | Unique ID of the user that recorded this audio, if the audio was recorded by a contributor. |
+| column        | type            | description                                                                                  |
+| ------------- | --------------- | -------------------------------------------------------------------------------------------- |
+| `id`          | `uuid`          | Primary key                                                                                  |
+| `url`         | `text`          | Full URL for this media resource                                                             |
+| `recorded_at` | `date?`         | Date and time this resource was created                                                      |
+| `recorded_by` | `uuid? -> user` | The user that recorded this audio, if the audio was recorded by a Contributor on the website |

 - Deleting also deletes all `media_slice` rows that reference it

diff --git a/doc/database/user.md b/doc/database/user.md
index 95a26bdc..ba7ba591 100644
--- a/doc/database/user.md
+++ b/doc/database/user.md
@@ -2,7 +2,9 @@

 ## `user`

-Metadata assocated with a user. `user.id` on this table is equal to `sub` in AWS.
+Metadata assocated with a user. `user.id` on this table is equal to `sub` in
+AWS. Users are not to be confused with `contributor` entires, which are imported
+from Google Sheets.

 | column         | type    | description                                        |
 | -------------- | ------- | -------------------------------------------------- |
diff --git a/doc/database/words.md b/doc/database/words.md
index 4263cca9..721dfb32 100644
--- a/doc/database/words.md
+++ b/doc/database/words.md
@@ -11,9 +11,9 @@
 | `english_gloss`          | `text?`                  | English translation                                                                                 |
 | `recorded_at`            | `date?`                  | When this word was written, only specified if it differs from when the document overall was written |
 | `commentary`             | `text?`                  | Linguistic or historical commentary supplied by an annotator                                        |
-| `audio_slice_id`         | `uuid? -> media_slice`   | Audio recording of the word read aloud                                                              |
+| `audio_slice_id`         | `uuid? -> media_slice`   | Audio recording of the word read aloud, as ingested from Google Sheets.                             |
 | `curated_audio_slice_id` | `uuid? -> media_slice`   | A Contributor audio recording of the word read aloud, which has been selected by an Editor          |
-| `audio_curated_by`       | `uuid? -> user`          | The Editor who selected the Contributor audio recording to show, if one has been selected.          |
+| `audio_curated_by`       | `uuid? -> user`          | The Editor who selected the Contributor audio recording to show, if one has been selected           |
 | `document_id`            | `uuid -> document`       | Document the word is in                                                                             |
 | `page_number`            | `text?`                  | Page number, only supplied for documents like dictionaries that may not have `document_page` rows   |
 | `index_in_document`      | `bigint`                 | Position of the word in the whole document                                                          |
@@ -22,7 +22,7 @@

 - One of `page_id` or `character_range` must be supplied

-## `word_contributor_media`
+## `word_user_media`

 A join table linking user audio contributions to words in documents. This is a many-to-many relationship, so should be indexed on both keys, with a compound unique constraint. Ie. you cannot link the same audio to the same word multiple times. Additions should be written as upserts.

From 985ad9b72740531e02f5c506b69d87bcd8753d0c Mon Sep 17 00:00:00 2001
From: Charlie McVicker
Date: Thu, 4 May 2023 14:48:34 -0400
Subject: [PATCH 3/4] write the actual migration (yay)!

---
 doc/database/readme.md                        | 13 +++++++++
 doc/database/user.md                          | 15 +++++------
 .../20230504182127_add_user_audio.sql         | 27 +++++++++++++++++++
 3 files changed, 47 insertions(+), 8 deletions(-)
 create mode 100644 types/migrations/20230504182127_add_user_audio.sql

diff --git a/doc/database/readme.md b/doc/database/readme.md
index 40c723b2..e344aa84 100644
--- a/doc/database/readme.md
+++ b/doc/database/readme.md
@@ -4,6 +4,19 @@ The docs here describe every single one of our database tables and columns.
 If you write a database schema migration, you should change the corresponding docs in this folder to match the new shape of the database.
 If you work with database tables that aren't sufficiently documented here, please add!

+## How to write a new migration
+
+To create a migration file, use the follow command inside your `nix develop` shell.
+
+```zsh
+cd types
+sqlx migrate add
+```
+
+To test your migration without clearing your database, run `sqlx migrate run`.
+
+Other developers will get your migrations when they run `dev-migrate-schema`.
+
 ## Abbreviations in this Folder

 Most of our columns are `not null`, which is long to write so we introduced shorthand for describing database columns.
diff --git a/doc/database/user.md b/doc/database/user.md
index ba7ba591..d7c9c456 100644
--- a/doc/database/user.md
+++ b/doc/database/user.md
@@ -1,14 +1,13 @@
 # User

-## `user`
+## `dailp_user`

-Metadata assocated with a user. `user.id` on this table is equal to `sub` in
+Metadata assocated with a user. `dailp_user.id` on this table is equal to `sub` in
 AWS. Users are not to be confused with `contributor` entires, which are imported
 from Google Sheets.

-| column         | type    | description                                        |
-| -------------- | ------- | -------------------------------------------------- |
-| `id`           | `uuid`  | Primary key, AWS Cognito `sub` claim               |
-| `display_name` | `text`  | How the user's name should be presented in the app |
-| `created_at`   | `date`  | When the user record was created                   |
-| `archived_at`  | `date?` | When the user record was archived, if ever         |
+| column         | type   | description                                        |
+| -------------- | ------ | -------------------------------------------------- |
+| `id`           | `uuid` | Primary key, AWS Cognito `sub` claim               |
+| `display_name` | `text` | How the user's name should be presented in the app |
+| `created_at`   | `date` | When the user record was created                   |
diff --git a/types/migrations/20230504182127_add_user_audio.sql b/types/migrations/20230504182127_add_user_audio.sql
new file mode 100644
index 00000000..24a784e7
--- /dev/null
+++ b/types/migrations/20230504182127_add_user_audio.sql
@@ -0,0 +1,27 @@
+-- Add migration script here
+
+create table dailp_user (
+  id autouuid primary key,
+  display_name text not null,
+  created_at date not null
+);
+
+alter table media_resource
+  add column recorded_by uuid,
+  add constraint recorded_by_fkey foreign key (recorded_by) references dailp_user (id) on delete set null;
+
+
+alter table word
+  add column curated_audio_slice_id uuid,
+  add constraint curated_audio_slice_id_fkey
+    foreign key (curated_audio_slice_id) references media_slice (id) on delete set null,
+  add column audio_curated_by uuid,
+  add constraint audio_curated_by_fkey
+    foreign key (audio_curated_by) references dailp_user (id) on delete set null;
+
+
+create table word_user_media (
+  word_id uuid not null references word (id) on delete cascade,
+  media_slice_id uuid not null references media_slice (id) on delete cascade,
+  primary key (word_id, media_slice_id)
+);
\ No newline at end of file

From d3c8095de8bb3b530bf60e89a809e6ec790ea909 Mon Sep 17 00:00:00 2001
From: Charlie McVicker
Date: Thu, 18 May 2023 09:24:14 -0400
Subject: [PATCH 4/4] update docs to reflect new table name

---
 doc/database/media.md | 12 ++++++------
 doc/database/words.md |  2 +-
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/doc/database/media.md b/doc/database/media.md
index 168eafe9..e28c3b30 100644
--- a/doc/database/media.md
+++ b/doc/database/media.md
@@ -5,12 +5,12 @@
 A timed media resource like video or audio, from an external source.
 The main use case is audio recordings of each document.

-| column        | type            | description                                                                                  |
-| ------------- | --------------- | -------------------------------------------------------------------------------------------- |
-| `id`          | `uuid`          | Primary key                                                                                  |
-| `url`         | `text`          | Full URL for this media resource                                                             |
-| `recorded_at` | `date?`         | Date and time this resource was created                                                      |
-| `recorded_by` | `uuid? -> user` | The user that recorded this audio, if the audio was recorded by a Contributor on the website |
+| column        | type                  | description                                                                                  |
+| ------------- | --------------------- | -------------------------------------------------------------------------------------------- |
+| `id`          | `uuid`                | Primary key                                                                                  |
+| `url`         | `text`                | Full URL for this media resource                                                             |
+| `recorded_at` | `date?`               | Date and time this resource was created                                                      |
+| `recorded_by` | `uuid? -> dailp_user` | The user that recorded this audio, if the audio was recorded by a Contributor on the website |

 - Deleting also deletes all `media_slice` rows that reference it

diff --git a/doc/database/words.md b/doc/database/words.md
index 721dfb32..b7f2d7a3 100644
--- a/doc/database/words.md
+++ b/doc/database/words.md
@@ -13,7 +13,7 @@
 | `commentary`             | `text?`                  | Linguistic or historical commentary supplied by an annotator                                        |
 | `audio_slice_id`         | `uuid? -> media_slice`   | Audio recording of the word read aloud, as ingested from Google Sheets.                             |
 | `curated_audio_slice_id` | `uuid? -> media_slice`   | A Contributor audio recording of the word read aloud, which has been selected by an Editor          |
-| `audio_curated_by`       | `uuid? -> user`          | The Editor who selected the Contributor audio recording to show, if one has been selected           |
+| `audio_curated_by`       | `uuid? -> dailp_user`    | The Editor who selected the Contributor audio recording to show, if one has been selected           |
 | `document_id`            | `uuid -> document`       | Document the word is in                                                                             |
 | `page_number`            | `text?`                  | Page number, only supplied for documents like dictionaries that may not have `document_page` rows   |
 | `index_in_document`      | `bigint`                 | Position of the word in the whole document                                                          |
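
Reviewer note on `word_user_media`: the docs in this series say additions to the join table "should be written as upserts" and that it "should be indexed on both keys". The sketch below shows one way that could look in PostgreSQL. It is an illustration, not code from the repository: the `word` and `media_slice` tables are hypothetical stand-ins reduced to their primary keys, the UUID literals are placeholders, and the index name `word_user_media_media_slice_id_idx` is an assumption. It is meant for a scratch database, since it creates tables whose names collide with the real schema.

```sql
-- Hypothetical stand-ins for the real tables, reduced to their primary keys.
create table word (id uuid primary key);
create table media_slice (id uuid primary key);

-- Same shape as the join table in 20230504182127_add_user_audio.sql.
create table word_user_media (
  word_id uuid not null references word (id) on delete cascade,
  media_slice_id uuid not null references media_slice (id) on delete cascade,
  primary key (word_id, media_slice_id)
);

-- Assumption: the composite primary key leads with word_id, so a separate
-- index may help lookups that start from media_slice_id.
create index word_user_media_media_slice_id_idx
  on word_user_media (media_slice_id);

-- Placeholder rows to link.
insert into word (id) values ('00000000-0000-0000-0000-000000000001');
insert into media_slice (id) values ('00000000-0000-0000-0000-000000000002');

-- Upsert: the first statement creates the link.
insert into word_user_media (word_id, media_slice_id)
values ('00000000-0000-0000-0000-000000000001',
        '00000000-0000-0000-0000-000000000002')
on conflict (word_id, media_slice_id) do nothing;

-- Repeating the same link hits the primary key and is silently skipped.
insert into word_user_media (word_id, media_slice_id)
values ('00000000-0000-0000-0000-000000000001',
        '00000000-0000-0000-0000-000000000002')
on conflict (word_id, media_slice_id) do nothing;

-- The table still holds a single row for this pair.
select count(*) from word_user_media;
```

Using `on conflict ... do nothing` keeps the write idempotent without a prior existence check, which appears to be the intent of the "cannot link the same audio to the same word multiple times" wording. Whether the extra index is worth adding depends on how the application queries this table.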