-## # A tibble: 183 × 18
+## # A tibble: 198 × 18
## collection_id collection_version_id collection_url consortia contact_email
## <chr> <chr> <chr> <list> <chr>
-## 1 ceb895f4-ff9f-4… ee098b5a-4f33-473b-b… https://cellx… <list> panagiotis.r…
-## 2 af893e86-8e9f-4… 768170a6-c590-4900-a… https://cellx… <list> ruichen@bcm.…
-## 3 1d1c7275-476a-4… 609becde-c797-41bb-8… https://cellx… <list> wey334@g.har…
-## 4 1b014f39-f202-4… 1d88cb46-6e84-4b5b-b… https://cellx… <lgl [1]> kimberly.ald…
-## 5 48d354f5-a5ca-4… 2862daa3-c933-43c8-9… https://cellx… <list> Nathan.Salom…
-## 6 43d4bb39-21af-4… 78360f02-1acc-415c-a… https://cellx… <lgl [1]> raymond.cho@…
-## 7 f7cecffa-00b4-4… 43224f82-db2a-443c-9… https://cellx… <list> st9@sanger.a…
-## 8 f17b9205-f61f-4… 21ff4724-95e2-491b-8… https://cellx… <list> genevieve.ko…
-## 9 64b24fda-6591-4… e414854b-2666-4977-9… https://cellx… <lgl [1]> magness@med.…
-## 10 48259aa8-f168-4… 44601b80-bd11-49d8-a… https://cellx… <lgl [1]> wtk22@cam.ac…
-## # ℹ 173 more rows
+## 1 59c9ecfe-c47d-4… 8db59637-4b42-4195-b… https://cellx… <lgl [1]> twc@stanford…
+## 2 a18474f4-ff1e-4… e1078e5b-da88-49d8-b… https://cellx… <lgl [1]> kpn2114@colu…
+## 3 f5af7a2f-ab4c-4… cb80d978-a1a2-4696-8… https://cellx… <lgl [1]> Yanling_Liao…
+## 4 aee9c366-f2fb-4… 62fc92b9-92f9-4f6f-9… https://cellx… <lgl [1]> kjensen@tgen…
+## 5 c26ca66a-63ea-4… 9b5f598e-b5d6-4d83-a… https://cellx… <lgl [1]> semil.choksi…
+## 6 e5f58829-1a66-4… 519f5ac5-1f84-4b48-9… https://cellx… <chr [2]> angela.pisco…
+## 7 b52eb423-5d0d-4… e75822ca-72d5-4d7f-a… https://cellx… <chr [2]> st9@sanger.a…
+## 8 e3aa612b-0d7d-4… 8bfe4964-eab1-4ae8-b… https://cellx… <chr [1]> hongkuiz@all…
+## 9 1b014f39-f202-4… 8b7765f4-c81a-46fe-a… https://cellx… <lgl [1]> kimberly.ald…
+## 10 48d354f5-a5ca-4… be9483df-dbbd-421f-b… https://cellx… <chr [1]> Nathan.Salom…
+## # ℹ 188 more rows
## # ℹ 13 more variables: contact_name <chr>, curator_name <chr>,
## # description <chr>, doi <chr>, links <list>, name <chr>,
## # publisher_metadata <list>, revising_in <lgl>, revision_of <lgl>,
## # visibility <chr>, created_at <date>, published_at <date>, revised_at <date>
-## # A tibble: 1,179 × 31
+## # A tibble: 1,285 × 32
## dataset_id dataset_version_id collection_id donor_id assay batch_condition
## <chr> <chr> <chr> <list> <list> <list>
-## 1 53ce2631-36… 2f17c183-388a-4c0… ceb895f4-ff9… <list> <list> <list [2]>
-## 2 1d4128f6-c2… 94762ee1-9f9f-49e… ceb895f4-ff9… <list> <list> <list [2]>
-## 3 ed419b4e-db… 758b30a8-5fb0-46c… af893e86-8e9… <list> <list> <lgl [1]>
-## 4 aad97cb5-f3… d6966985-89f9-485… af893e86-8e9… <list> <list> <lgl [1]>
-## 5 8f10185b-e0… 63d7a3a3-9691-41d… af893e86-8e9… <list> <list> <lgl [1]>
-## 6 359f7af4-87… 0f461193-282f-443… af893e86-8e9… <list> <list> <lgl [1]>
-## 7 11ef37ee-21… 74253a67-927c-4cd… af893e86-8e9… <list> <list> <lgl [1]>
-## 8 0129dbd9-a7… a970179d-2e9e-4d2… af893e86-8e9… <list> <list> <lgl [1]>
-## 9 00e5dedd-b9… 94c0e74c-b269-4ce… af893e86-8e9… <list> <list> <lgl [1]>
-## 10 d319af7f-be… 3c80a5bb-8c89-433… 1d1c7275-476… <list> <list> <lgl [1]>
-## # ℹ 1,169 more rows
-## # ℹ 25 more variables: cell_count <int>, cell_type <list>, citation <chr>,
-## # development_stage <list>, disease <list>, embeddings <list>,
-## # explorer_url <chr>, feature_biotype <list>, feature_count <int>,
-## # feature_reference <list>, is_primary_data <list>,
+## 1 595c9010-99… b4645848-e3d8-492… 59c9ecfe-c47… <chr> <list> <lgl [1]>
+## 2 2f05ab20-a0… 3b715360-b0ae-4e5… 59c9ecfe-c47… <chr> <list> <lgl [1]>
+## 3 faed4f71-6b… 1cd4f84b-7fe4-463… a18474f4-ff1… <chr> <list> <lgl [1]>
+## 4 e5233a94-9e… 80d1f22a-8f51-49d… a18474f4-ff1… <chr> <list> <lgl [1]>
+## 5 9b188f26-c8… 1c158adf-90ea-47b… a18474f4-ff1… <chr> <list> <lgl [1]>
+## 6 94423ec1-21… 7c4220a8-ce41-49f… a18474f4-ff1… <chr> <list> <lgl [1]>
+## 7 7bb64315-9e… 9ec33c56-14a4-481… a18474f4-ff1… <chr> <list> <lgl [1]>
+## 8 773b9b2e-70… f186deb1-4926-404… a18474f4-ff1… <chr> <list> <lgl [1]>
+## 9 63bb6359-39… a5df7f5b-34d8-421… a18474f4-ff1… <chr> <list> <lgl [1]>
+## 10 03d5794d-cd… 8db806f1-a518-4a2… a18474f4-ff1… <chr> <list> <lgl [1]>
+## # ℹ 1,275 more rows
+## # ℹ 26 more variables: cell_count <int>, cell_type <list>, citation <chr>,
+## # default_embedding <chr>, development_stage <list>, disease <list>,
+## # embeddings <list>, explorer_url <chr>, feature_biotype <list>,
+## # feature_count <int>, feature_reference <list>, is_primary_data <list>,
## # mean_genes_per_cell <dbl>, organism <list>, primary_cell_count <int>,
## # raw_data_location <chr>, schema_version <chr>, …
-## # A tibble: 2,338 × 4
+## # A tibble: 2,538 × 4
## dataset_id filesize filetype url
## <chr> <dbl> <chr> <chr>
-## 1 53ce2631-3646-4172-bbd9-38b0a44d8214 406108808 H5AD https://datasets.ce…
-## 2 53ce2631-3646-4172-bbd9-38b0a44d8214 399752425 RDS https://datasets.ce…
-## 3 1d4128f6-c27b-40c4-af77-b1c7e2b694e7 906795740 H5AD https://datasets.ce…
-## 4 1d4128f6-c27b-40c4-af77-b1c7e2b694e7 1060800682 RDS https://datasets.ce…
-## 5 ed419b4e-db9b-40f1-8593-68fdf8dfb076 1071401902 H5AD https://datasets.ce…
-## 6 ed419b4e-db9b-40f1-8593-68fdf8dfb076 1419579253 RDS https://datasets.ce…
-## 7 aad97cb5-f375-45ef-ae9d-178e7f5d5180 785137201 H5AD https://datasets.ce…
-## 8 aad97cb5-f375-45ef-ae9d-178e7f5d5180 1025253758 RDS https://datasets.ce…
-## 9 8f10185b-e0b3-46a5-8706-7f1799225d79 3077438912 H5AD https://datasets.ce…
-## 10 8f10185b-e0b3-46a5-8706-7f1799225d79 4090930879 RDS https://datasets.ce…
-## # ℹ 2,328 more rows
+## 1 595c9010-99ec-462d-b6a1-2b2fe5407871 333391557 H5AD https://datasets.ce…
+## 2 595c9010-99ec-462d-b6a1-2b2fe5407871 348066008 RDS https://datasets.ce…
+## 3 2f05ab20-a092-4bab-9276-3e0eb24e3fee 630668977 H5AD https://datasets.ce…
+## 4 2f05ab20-a092-4bab-9276-3e0eb24e3fee 683935787 RDS https://datasets.ce…
+## 5 faed4f71-6b50-4fc7-bd1c-8f385dccfdce 31737910 H5AD https://datasets.ce…
+## 6 faed4f71-6b50-4fc7-bd1c-8f385dccfdce 26350706 RDS https://datasets.ce…
+## 7 e5233a94-9e43-418c-8209-6f1400c31530 1321449650 H5AD https://datasets.ce…
+## 8 e5233a94-9e43-418c-8209-6f1400c31530 1311136088 RDS https://datasets.ce…
+## 9 9b188f26-c8e1-4a78-af15-622a35a371fc 36324999 H5AD https://datasets.ce…
+## 10 9b188f26-c8e1-4a78-af15-622a35a371fc 30585807 RDS https://datasets.ce…
+## # ℹ 2,528 more rows
Each of these resources has a unique primary identifier (e.g., file_id)
as well as an identifier describing the relationship of the resource to
other components of the database (e.g.,
@@ -246,9 +246,9 @@ Using dplyr to navigate data
## Rows: 1
## Columns: 18
## $ collection_id <chr> "283d65eb-dd53-496d-adb7-7570c7caa443"
-## $ collection_version_id <chr> "4c16c611-00a9-42f9-a8c4-7b42daa226fe"
+## $ collection_version_id <chr> "c9c120ca-7605-43b0-9ef5-728392d708f5"
## $ collection_url <chr> "https://cellxgene.cziscience.com/collections/28…
-## $ consortia <list> ["BRAIN Initiative", "CZI Single-Cell Biology"]
+## $ consortia <list> <"BRAIN Initiative", "CZI Single-Cell Biology">
## $ contact_email <chr> "kimberly.siletti@ki.se"
## $ contact_name <chr> "Kimberly Siletti"
## $ curator_name <chr> "James Chaffer"
@@ -260,9 +260,9 @@ Using dplyr to navigate data
## $ revising_in <lgl> NA
## $ revision_of <lgl> NA
## $ visibility <chr> "PUBLIC"
-## $ created_at <date> 2023-12-12
+## $ created_at <date> 2024-03-19
## $ published_at <date> 2022-12-09
-## $ revised_at <date> 2023-12-13
+## $ revised_at <date> 2024-03-22
We can take a similar strategy to identify all datasets belonging to
this collection
@@ -271,24 +271,24 @@ Using dplyr to navigate data
datasets(db),
by = "collection_id"
)
-## # A tibble: 138 × 31
+## # A tibble: 138 × 32
## collection_id dataset_id dataset_version_id donor_id assay batch_condition
## <chr> <chr> <chr> <list> <list> <list>
-## 1 283d65eb-dd53-… ff7d15fa-… 51e05270-1f00-452… <list> <list> <list [1]>
-## 2 283d65eb-dd53-… fe1a73ab-… 4e124ecc-7885-465… <list> <list> <list [1]>
-## 3 283d65eb-dd53-… fbf173f9-… 5a52f557-aeaf-4fc… <list> <list> <list [1]>
-## 4 283d65eb-dd53-… fa554686-… 6606e9aa-e4c4-452… <list> <list> <list [1]>
-## 5 283d65eb-dd53-… f9034091-… 8f5b1977-8317-447… <list> <list> <list [1]>
-## 6 283d65eb-dd53-… f8dda921-… 1ad58833-956c-454… <list> <list> <list [1]>
-## 7 283d65eb-dd53-… f7d003d4-… 4d002ac1-4671-490… <list> <list> <list [1]>
-## 8 283d65eb-dd53-… f6d9f2ad-… 2102f4b8-c1fe-4ee… <list> <list> <list [1]>
-## 9 283d65eb-dd53-… f5a04dff-… b92375fd-dafe-44c… <list> <list> <list [1]>
-## 10 283d65eb-dd53-… f502c312-… b750310e-1abb-4c7… <list> <list> <list [1]>
+## 1 283d65eb-dd53-… ff7d15fa-… 1846f7ed-d11e-44f… <chr> <list> <chr [1]>
+## 2 283d65eb-dd53-… fe1a73ab-… 5e5db76a-4390-41a… <chr> <list> <chr [1]>
+## 3 283d65eb-dd53-… fbf173f9-… a7334adc-f8e7-443… <chr> <list> <chr [1]>
+## 4 283d65eb-dd53-… fa554686-… 9bc5af53-e980-484… <chr> <list> <chr [1]>
+## 5 283d65eb-dd53-… f9034091-… 71809c6b-c83a-497… <chr> <list> <chr [1]>
+## 6 283d65eb-dd53-… f8dda921-… 4fc4d6fb-f105-4f9… <chr> <list> <chr [1]>
+## 7 283d65eb-dd53-… f7d003d4-… c8b0535b-bc4f-478… <chr> <list> <chr [1]>
+## 8 283d65eb-dd53-… f6d9f2ad-… dc08c31c-6629-49d… <chr> <list> <chr [1]>
+## 9 283d65eb-dd53-… f5a04dff-… 2e56729d-ee10-419… <chr> <list> <chr [1]>
+## 10 283d65eb-dd53-… f502c312-… a3fa88de-cbbe-404… <chr> <list> <chr [1]>
## # ℹ 128 more rows
-## # ℹ 25 more variables: cell_count <int>, cell_type <list>, citation <chr>,
-## # development_stage <list>, disease <list>, embeddings <list>,
-## # explorer_url <chr>, feature_biotype <list>, feature_count <int>,
-## # feature_reference <list>, is_primary_data <list>,
+## # ℹ 26 more variables: cell_count <int>, cell_type <list>, citation <chr>,
+## # default_embedding <chr>, development_stage <list>, disease <list>,
+## # embeddings <list>, explorer_url <chr>, feature_biotype <list>,
+## # feature_count <int>, feature_reference <list>, is_primary_data <list>,
## # mean_genes_per_cell <dbl>, organism <list>, primary_cell_count <int>,
## # raw_data_location <chr>, schema_version <chr>, …
@@ -302,20 +302,20 @@ author_datasets provides a convenient point from which
to make basic queries, e.g., finding the authors contributing the most
datasets.
+#> # ℹ 4,182 more rows
Perhaps one is interested in the most prolific authors based on
‘collections’, rather than ‘datasets’. The five most prolific authors by
collection are
@@ -185,11 +185,11 @@ Challenge and solution
#> # A tibble: 5 × 3
#> family given n
#> <chr> <chr> <int>
-#> 1 Teichmann Sarah A. 23
+#> 1 Teichmann Sarah A. 24
#> 2 Regev Aviv 14
-#> 3 Haniffa Muzlifah 12
-#> 4 Meyer Kerstin B. 12
-#> 5 Polanski Krzysztof 12