Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dm_filter on remote PostgreSQL db errors #2228

Open
LDalby opened this issue Sep 6, 2024 · 7 comments
Open

dm_filter on remote PostgreSQL db errors #2228

LDalby opened this issue Sep 6, 2024 · 7 comments

Comments

@LDalby
Copy link

LDalby commented Sep 6, 2024

We have a PostgreSQL db and I would like to create local DuckDB copy of a subset of the rows on it.

My plan was to

  1. create the datamodel using dm_from_con
  2. filter the rows using dm_filter
  3. copy to local database using copy_dm_to

I currently get this error when attempting to filter:

library(dm)
#> 
#> Attaching package: 'dm'
#> The following object is masked from 'package:stats':
#> 
#>     filter

con <- camalienr::ca_connect() # internal pkg - only used here to connect to the db
remote_pg <- dm_from_con(con, schema = "data", learn_keys = TRUE)
dm_nrow(remote_pg)
#>                 chunk             detection      detectionsummary 
#>                 33781              83777960               9372360 
#>                 image             imagemeta                   job 
#>               5557936               2890422                   124 
#>               partner          plantnetcall plantnetcall_metadata 
#>                    11                159568                157834 
#> plantnetrequestparams               species     species_whitelist 
#>                     2                 49479                  2419

# In the detectionssummary table we have a column named speciesid
# integer ID on different species.
dm_filter(.dm = remote_pg,
          detectionsummary = (speciesid == 1356471))
#> Warning in igraph::graph_from_data_frame(., directed = directed, vertices =
#> def$table): In `d' `NA' elements were replaced with string "NA"
#> Error in `map()`:
#> ℹ In index: 1.
#> Caused by error in `igraph::graph_from_data_frame()`:
#> ! Some vertex names in edge list are not listed in vertex data frame

Created on 2024-09-06 with reprex v2.1.1

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.1 (2024-06-14)
#>  os       macOS Sonoma 14.6.1
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Copenhagen
#>  date     2024-09-06
#>  pandoc   3.1.11 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version     date (UTC) lib source
#>  backports     1.5.0       2024-05-23 [1] CRAN (R 4.4.0)
#>  bit           4.0.5       2022-11-15 [1] CRAN (R 4.4.0)
#>  bit64         4.0.5       2020-08-30 [1] CRAN (R 4.4.0)
#>  blob          1.2.4       2023-03-17 [1] CRAN (R 4.4.0)
#>  cachem        1.1.0       2024-05-16 [1] CRAN (R 4.4.0)
#>  camalienr     0.3.7       2024-07-03 [1] git (https://gitlab.au.dk/ECOS/biodiversa/camalien/camalienr.git@67fbee4)
#>  cli           3.6.3       2024-06-21 [1] CRAN (R 4.4.0)
#>  DBI           1.2.3       2024-06-02 [1] CRAN (R 4.4.0)
#>  dbplyr        2.5.0       2024-03-19 [1] CRAN (R 4.4.0)
#>  digest        0.6.37      2024-08-19 [1] CRAN (R 4.4.1)
#>  dm          * 1.0.10.9010 2024-08-30 [1] https://cynkra.r-universe.dev (R 4.4.1)
#>  dplyr         1.1.4       2023-11-17 [1] CRAN (R 4.4.0)
#>  evaluate      0.24.0      2024-06-10 [1] CRAN (R 4.4.0)
#>  fansi         1.0.6       2023-12-08 [1] CRAN (R 4.4.0)
#>  fastmap       1.2.0       2024-05-15 [1] CRAN (R 4.4.0)
#>  fs            1.6.4       2024-04-25 [1] CRAN (R 4.4.0)
#>  generics      0.1.3       2022-07-05 [1] CRAN (R 4.4.0)
#>  glue          1.7.0       2024-01-09 [1] CRAN (R 4.4.0)
#>  hms           1.1.3       2023-03-21 [1] CRAN (R 4.4.0)
#>  htmltools     0.5.8.1     2024-04-04 [1] CRAN (R 4.4.0)
#>  httpuv        1.6.15      2024-03-26 [1] CRAN (R 4.4.0)
#>  igraph        2.0.3       2024-03-13 [1] CRAN (R 4.4.0)
#>  knitr         1.48        2024-07-07 [1] CRAN (R 4.4.0)
#>  later         1.3.2       2023-12-06 [1] CRAN (R 4.4.0)
#>  lifecycle     1.0.4       2023-11-07 [1] CRAN (R 4.4.0)
#>  lubridate     1.9.3       2023-09-27 [1] CRAN (R 4.4.0)
#>  magrittr      2.0.3       2022-03-30 [1] CRAN (R 4.4.0)
#>  memoise       2.0.1       2021-11-26 [1] CRAN (R 4.4.0)
#>  mime          0.12        2021-09-28 [1] CRAN (R 4.4.0)
#>  pillar        1.9.0       2023-03-22 [1] CRAN (R 4.4.0)
#>  pkgconfig     2.0.3       2019-09-22 [1] CRAN (R 4.4.0)
#>  promises      1.3.0       2024-04-05 [1] CRAN (R 4.4.0)
#>  purrr         1.0.2       2023-08-10 [1] CRAN (R 4.4.0)
#>  R6            2.5.1       2021-08-19 [1] CRAN (R 4.4.0)
#>  Rcpp          1.0.13      2024-07-17 [1] CRAN (R 4.4.0)
#>  reprex        2.1.1       2024-07-06 [1] CRAN (R 4.4.0)
#>  rlang         1.1.4       2024-06-04 [1] CRAN (R 4.4.0)
#>  rmarkdown     2.28        2024-08-17 [1] CRAN (R 4.4.0)
#>  RPostgres     1.4.7       2024-05-27 [1] CRAN (R 4.4.0)
#>  rstudioapi    0.16.0      2024-03-24 [1] CRAN (R 4.4.0)
#>  sessioninfo   1.2.2       2021-12-06 [1] CRAN (R 4.4.0)
#>  shiny         1.9.1       2024-08-01 [1] CRAN (R 4.4.0)
#>  tibble        3.2.1       2023-03-20 [1] CRAN (R 4.4.0)
#>  tidyr         1.3.1       2024-01-24 [1] CRAN (R 4.4.0)
#>  tidyselect    1.2.1       2024-03-11 [1] CRAN (R 4.4.0)
#>  timechange    0.3.0       2024-01-18 [1] CRAN (R 4.4.0)
#>  utf8          1.2.4       2023-10-22 [1] CRAN (R 4.4.0)
#>  vctrs         0.6.5       2023-12-01 [1] CRAN (R 4.4.0)
#>  withr         3.0.1       2024-07-31 [1] CRAN (R 4.4.0)
#>  xfun          0.47        2024-08-17 [1] CRAN (R 4.4.0)
#>  xtable        1.8-4       2019-04-21 [1] CRAN (R 4.4.0)
#>  yaml          2.3.10      2024-07-26 [1] CRAN (R 4.4.0)
#> 
#>  [1] /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

I'm a bit confused about the error - why is {iGraph} called here?

@krlmlr
Copy link
Collaborator

krlmlr commented Nov 19, 2024

Thanks. Can you please share the output of constructive::construct(remote_pg) ?

@LDalby
Copy link
Author

LDalby commented Nov 20, 2024

Sure!

The output was too long to post as a comment, so I've attached it as txt.
dm-pg-debug.txt

@krlmlr
Copy link
Collaborator

krlmlr commented Nov 20, 2024

Thanks, this is helpful, but constructive's own logic intervened here. Can you please share constructive::construct(unclass(remote_pg)) -- just the output (which is executable R code already), not the whole reprex?

@moodymudskipper
Copy link
Collaborator

moodymudskipper commented Nov 20, 2024

On one hand it seems like constructive fails at constructing the correct fks because we have calls such as dm::dm_add_fk(NA, NA_character_, chunk, "id") that fail.
On the other hand the way connection objects are constructed now uses external pointers, this would cause the constructive message by itself (see also cynkra/constructive#150).

I think the best reprex here would be the output of :

con <- camalienr::ca_connect() # internal pkg - only used here to connect to the db
remote_pg <- dm_from_con(con, schema = "data", learn_keys = TRUE)
# construct_base will use only base functions so will bypass the shortcomings of `.cstr_construct.dm()`
# the data arg will show the connection object as `con` rather than trying to construct it
constructive::construct_base(remote_pg, data = list(con = con))

FYI Here's a simplification of the original code that fits here:

typnames1 <- data.frame(
  oid = c(
    16L, 17L, 18L, 19L, 20L, 21L, 22L, 23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L,
    71L, 75L, 81L, 83L, 114L, 142L, 194L, 3361L, 3402L, 5017L, 32L, 5069L, 600L,
    601L, 602L, 603L, 604L, 628L, 700L, 701L, 705L, 718L, 790L, 829L, 869L, 650L,
    774L, 1033L, 1042L, 1043L, 1082L, 1083L, 1114L, 1184L, 1186L, 1266L, 1560L,
    1562L, 1700L, 1790L, 2202L, 2203L, 2204L, 2205L, 4191L, 2206L, 4096L, 4089L,
    2950L, 3220L, 3614L, 3642L, 3615L, 3734L, 3769L, 3802L, 4072L, 2970L, 5038L,
    3904L, 3906L, 3908L, 3910L, 3912L, 3926L, 4451L, 4532L, 4533L, 4534L, 4535L,
    4536L, 2249L, 2287L, 2275L, 2276L, 2277L, 2278L, 2279L, 3838L, 2280L, 2281L,
    2283L, 2776L, 3500L, 3115L, 325L, 3310L, 269L, 3831L, 5077L, 5078L, 5079L,
    5080L, 4537L, 4538L, 4600L, 4601L, 1000L, 1001L, 1002L, 1003L, 1016L, 1005L,
    1006L, 1007L, 1008L, 1009L, 1028L, 1010L, 1011L, 1012L, 1013L, 210L, 270L,
    272L, 273L, 199L, 143L, 271L, 1017L, 1018L, 1019L, 1020L, 1027L, 629L, 1021L,
    1022L, 719L, 791L, 1040L, 1041L, 651L, 775L, 1034L, 1014L, 1015L, 1182L,
    1183L, 1115L, 1185L, 1187L, 1270L, 1561L, 1563L, 1231L, 2201L, 2207L, 2208L,
    2209L, 2210L, 4192L, 2211L, 4097L, 4090L, 2951L, 3221L, 3643L, 3644L, 3645L,
    3735L, 3770L, 3807L, 4073L, 2949L, 5039L, 3905L, 3907L, 3909L, 3911L, 3913L,
    3927L, 6150L, 6151L, 6152L, 6153L, 6155L, 6157L, 1263L, 12001L, 12000L,
    12003L, 12002L, 12005L, 12004L, 12007L, 12006L, 12009L, 12008L, 12011L,
    12010L, 12013L, 12012L, 12015L, 12014L, 12017L, 12016L, 12019L, 12018L,
    12021L, 12020L, 12023L, 12022L, 12025L, 12024L, 12027L, 12026L, 12029L,
    12028L, 12031L, 12030L, 12033L, 12032L, 12035L, 12034L, 12037L, 12036L,
    12039L, 12038L, 12041L, 12040L, 12043L, 12042L, 12045L, 12044L, 12047L,
    12046L, 12049L, 12048L, 12051L, 12050L, 1248L, 12052L, 12054L, 12053L, 12056L,
    12055L, 2842L, 12057L, 2843L, 12058L, 12060L, 12059L, 12062L, 12061L, 12064L,
    12063L, 12066L, 12065L, 12068L, 12067L, 12070L, 12069L, 12072L, 12071L,
    12074L, 12073L, 12076L, 12075L, 12078L, 12077L, 12080L, 12079L, 12082L,
    12081L, 12084L, 12083L, 12086L, 12085L, 12088L, 12087L, 12090L, 12089L,
    12092L, 12091L, 4066L, 12093L, 12095L, 12094L, 12097L, 12096L, 12099L, 12098L,
    12101L, 12100L, 12103L, 12102L, 12105L, 12104L, 12107L, 12106L, 6101L, 12108L,
    12110L, 12109L, 12219L, 12218L, 12224L, 12223L, 12229L, 12228L, 12233L,
    12232L, 12237L, 12236L, 12242L, 12241L, 12247L, 12246L, 12252L, 12251L,
    12257L, 12256L, 12262L, 12261L, 12267L, 12266L, 12272L, 12271L, 12277L,
    12276L, 12282L, 12281L, 12287L, 12286L, 12292L, 12291L, 12296L, 12295L,
    12300L, 12299L, 12304L, 12303L, 12309L, 12308L, 12314L, 12313L, 12318L,
    12317L, 12323L, 12322L, 12329L, 12328L, 12333L, 12332L, 12337L, 12336L,
    12341L, 12340L, 12345L, 12344L, 12349L, 12348L, 12353L, 12352L, 12357L,
    12356L, 12362L, 12361L, 12367L, 12366L, 12372L, 12371L, 12376L, 12375L,
    12381L, 12380L, 12385L, 12384L, 12390L, 12389L, 12394L, 12393L, 12398L,
    12397L, 12403L, 12402L, 12407L, 12406L, 12411L, 12410L, 12416L, 12415L,
    12420L, 12419L, 12424L, 12423L, 12429L, 12428L, 12433L, 12432L, 12437L,
    12436L, 12442L, 12441L, 12447L, 12446L, 12451L, 12450L, 12455L, 12454L,
    12460L, 12459L, 12464L, 12463L, 12468L, 12467L, 12473L, 12472L, 12477L,
    12476L, 12482L, 12481L, 12486L, 12485L, 12491L, 12490L, 12496L, 12495L,
    12500L, 12499L, 12504L, 12503L, 12508L, 12507L, 12513L, 12512L, 12518L,
    12517L, 12523L, 12522L, 12528L, 12527L, 12533L, 12532L, 12538L, 12537L,
    12543L, 12542L, 14365L, 14364L, 14368L, 14367L, 14370L, 14369L, 14373L,
    14372L, 14376L, 14375L, 14378L, 14377L, 14382L, 14381L, 14387L, 14386L,
    14391L, 14390L, 14396L, 14395L, 14401L, 14400L, 14406L, 14405L, 14411L,
    14410L, 14416L, 14415L, 14421L, 14420L, 14426L, 14425L, 14431L, 14430L,
    14436L, 14435L, 14441L, 14440L, 14446L, 14445L, 14451L, 14450L, 14456L,
    14455L, 14461L, 14460L, 14466L, 14465L, 14471L, 14470L, 14475L, 14474L,
    14480L, 14479L, 14485L, 14484L, 14490L, 14489L, 14494L, 14493L, 14499L,
    14498L, 14504L, 14503L, 14508L, 14507L, 14513L, 14512L, 14518L, 14517L,
    14523L, 14522L, 14528L, 14527L, 14532L, 14531L, 14537L, 14536L, 14542L,
    14541L, 14547L, 14546L, 14552L, 14551L, 14557L, 14556L, 14562L, 14561L,
    14567L, 14566L, 14571L, 14570L, 14576L, 14575L, 14581L, 14580L, 14586L,
    14585L, 14591L, 14590L, 14596L, 14595L, 14600L, 14599L, 14605L, 14604L,
    14609L, 14608L, 14614L, 14613L, 14619L, 14618L, 14624L, 14623L, 14629L,
    14628L, 14634L, 14633L, 14639L, 14638L, 14644L, 14643L, 14649L, 14648L,
    14653L, 14652L, 14657L, 14656L, 14661L, 14660L, 14665L, 14664L, 14670L,
    14669L, 14674L, 14673L, 14678L, 14677L, 14683L, 14682L, 14687L, 14686L,
    14691L, 14690L, 14696L, 14695L, 14701L, 14700L, 24592L, 24595L, 24596L,
    24604L, 24623L, 24626L, 24627L, 24630L, 24631L, 24634L, 24635L, 24638L,
    24893L, 24892L, 24899L, 24898L, 25006L, 25005L, 25259L, 25265L, 25283L,
    25282L, 25418L, 25417L, 25605L, 25608L, 25615L, 25614L, 25618L, 25617L,
    25645L, 25644L, 25658L, 25657L, 25682L, 25681L, 25747L, 25746L, 25999L,
    25998L, 26054L, 26053L, 26133L, 26132L, 26142L, 26141L, 774169L, 774168L,
    774181L, 774180L, 774194L, 774193L, 774208L, 774207L, 774215L, 774214L,
    774253L, 864631L, 864650L, 864649L, 865108L, 864601L, 774241L, 774240L,
    774276L, 774275L, 774261L, 774260L, 774254L, 864600L, 864632L, 864676L,
    864675L, 864696L, 864695L, 864663L, 864662L, 865109L, 869729L, 869728L,
    869736L, 869735L, 869748L, 869747L, 871659L, 871658L, 871664L, 871663L,
    1004015L, 1004014L, 1004039L, 1004038L, 1004358L, 1004357L, 1004364L,
    1004363L, 1004380L, 1004379L, 1004450L, 1004449L, 1004458L, 1004457L,
    1048798L, 1048797L, 1048873L, 1048872L, 1048884L, 1048883L, 1049004L,
    1049003L, 1049023L, 1049022L, 1049028L, 1049027L, 1049039L, 1049038L,
    1049057L, 1049056L, 1049063L, 1049062L, 1049087L, 1049086L, 1049133L,
    1049132L, 1049163L, 1049162L, 1049685L, 1049684L, 1049943L, 1049942L,
    1050665L, 1050664L
  ),
  typname = c(
    "bool", "bytea", "char", "name", "int8", "int2", "int2vector", "int4",
    "regproc", "text", "oid", "tid", "xid", "cid", "oidvector", "pg_type",
    "pg_attribute", "pg_proc", "pg_class", "json", "xml", "pg_node_tree",
    "pg_ndistinct", "pg_dependencies", "pg_mcv_list", "pg_ddl_command", "xid8",
    "point", "lseg", "path", "box", "polygon", "line", "float4", "float8",
    "unknown", "circle", "money", "macaddr", "inet", "cidr", "macaddr8",
    "aclitem", "bpchar", "varchar", "date", "time", "timestamp", "timestamptz",
    "interval", "timetz", "bit", "varbit", "numeric", "refcursor", "regprocedure",
    "regoper", "regoperator", "regclass", "regcollation", "regtype", "regrole",
    "regnamespace", "uuid", "pg_lsn", "tsvector", "gtsvector", "tsquery",
    "regconfig", "regdictionary", "jsonb", "jsonpath", "txid_snapshot",
    "pg_snapshot", "int4range", "numrange", "tsrange", "tstzrange", "daterange",
    "int8range", "int4multirange", "nummultirange", "tsmultirange",
    "tstzmultirange", "datemultirange", "int8multirange", "record", "_record",
    "cstring", "any", "anyarray", "void", "trigger", "event_trigger",
    "language_handler", "internal", "anyelement", "anynonarray", "anyenum",
    "fdw_handler", "index_am_handler", "tsm_handler", "table_am_handler",
    "anyrange", "anycompatible", "anycompatiblearray", "anycompatiblenonarray",
    "anycompatiblerange", "anymultirange", "anycompatiblemultirange",
    "pg_brin_bloom_summary", "pg_brin_minmax_multi_summary", "_bool", "_bytea",
    "_char", "_name", "_int8", "_int2", "_int2vector", "_int4", "_regproc",
    "_text", "_oid", "_tid", "_xid", "_cid", "_oidvector", "_pg_type",
    "_pg_attribute", "_pg_proc", "_pg_class", "_json", "_xml", "_xid8", "_point",
    "_lseg", "_path", "_box", "_polygon", "_line", "_float4", "_float8",
    "_circle", "_money", "_macaddr", "_inet", "_cidr", "_macaddr8", "_aclitem",
    "_bpchar", "_varchar", "_date", "_time", "_timestamp", "_timestamptz",
    "_interval", "_timetz", "_bit", "_varbit", "_numeric", "_refcursor",
    "_regprocedure", "_regoper", "_regoperator", "_regclass", "_regcollation",
    "_regtype", "_regrole", "_regnamespace", "_uuid", "_pg_lsn", "_tsvector",
    "_gtsvector", "_tsquery", "_regconfig", "_regdictionary", "_jsonb",
    "_jsonpath", "_txid_snapshot", "_pg_snapshot", "_int4range", "_numrange",
    "_tsrange", "_tstzrange", "_daterange", "_int8range", "_int4multirange",
    "_nummultirange", "_tsmultirange", "_tstzmultirange", "_datemultirange",
    "_int8multirange", "_cstring", "pg_attrdef", "_pg_attrdef", "pg_constraint",
    "_pg_constraint", "pg_inherits", "_pg_inherits", "pg_index", "_pg_index",
    "pg_operator", "_pg_operator", "pg_opfamily", "_pg_opfamily", "pg_opclass",
    "_pg_opclass", "pg_am", "_pg_am", "pg_amop", "_pg_amop", "pg_amproc",
    "_pg_amproc", "pg_language", "_pg_language", "pg_largeobject_metadata",
    "_pg_largeobject_metadata", "pg_largeobject", "_pg_largeobject",
    "pg_aggregate", "_pg_aggregate", "pg_statistic", "_pg_statistic",
    "pg_statistic_ext", "_pg_statistic_ext", "pg_statistic_ext_data",
    "_pg_statistic_ext_data", "pg_rewrite", "_pg_rewrite", "pg_trigger",
    "_pg_trigger", "pg_event_trigger", "_pg_event_trigger", "pg_description",
    "_pg_description", "pg_cast", "_pg_cast", "pg_enum", "_pg_enum",
    "pg_namespace", "_pg_namespace", "pg_conversion", "_pg_conversion",
    "pg_depend", "_pg_depend", "pg_database", "_pg_database",
    "pg_db_role_setting", "_pg_db_role_setting", "pg_tablespace",
    "_pg_tablespace", "pg_authid", "_pg_authid", "pg_auth_members",
    "_pg_auth_members", "pg_shdepend", "_pg_shdepend", "pg_shdescription",
    "_pg_shdescription", "pg_ts_config", "_pg_ts_config", "pg_ts_config_map",
    "_pg_ts_config_map", "pg_ts_dict", "_pg_ts_dict", "pg_ts_parser",
    "_pg_ts_parser", "pg_ts_template", "_pg_ts_template", "pg_extension",
    "_pg_extension", "pg_foreign_data_wrapper", "_pg_foreign_data_wrapper",
    "pg_foreign_server", "_pg_foreign_server", "pg_user_mapping",
    "_pg_user_mapping", "pg_foreign_table", "_pg_foreign_table", "pg_policy",
    "_pg_policy", "pg_replication_origin", "_pg_replication_origin",
    "pg_default_acl", "_pg_default_acl", "pg_init_privs", "_pg_init_privs",
    "pg_seclabel", "_pg_seclabel", "pg_shseclabel", "_pg_shseclabel",
    "pg_collation", "_pg_collation", "pg_partitioned_table",
    "_pg_partitioned_table", "pg_range", "_pg_range", "pg_transform",
    "_pg_transform", "pg_sequence", "_pg_sequence", "pg_publication",
    "_pg_publication", "pg_publication_rel", "_pg_publication_rel",
    "pg_subscription", "_pg_subscription", "pg_subscription_rel",
    "_pg_subscription_rel", "pg_roles", "_pg_roles", "pg_shadow", "_pg_shadow",
    "pg_group", "_pg_group", "pg_user", "_pg_user", "pg_policies", "_pg_policies",
    "pg_rules", "_pg_rules", "pg_views", "_pg_views", "pg_tables", "_pg_tables",
    "pg_matviews", "_pg_matviews", "pg_indexes", "_pg_indexes", "pg_sequences",
    "_pg_sequences", "pg_stats", "_pg_stats", "pg_stats_ext", "_pg_stats_ext",
    "pg_stats_ext_exprs", "_pg_stats_ext_exprs", "pg_publication_tables",
    "_pg_publication_tables", "pg_locks", "_pg_locks", "pg_cursors",
    "_pg_cursors", "pg_available_extensions", "_pg_available_extensions",
    "pg_available_extension_versions", "_pg_available_extension_versions",
    "pg_prepared_xacts", "_pg_prepared_xacts", "pg_prepared_statements",
    "_pg_prepared_statements", "pg_seclabels", "_pg_seclabels", "pg_settings",
    "_pg_settings", "pg_file_settings", "_pg_file_settings", "pg_hba_file_rules",
    "_pg_hba_file_rules", "pg_timezone_abbrevs", "_pg_timezone_abbrevs",
    "pg_timezone_names", "_pg_timezone_names", "pg_config", "_pg_config",
    "pg_shmem_allocations", "_pg_shmem_allocations", "pg_backend_memory_contexts",
    "_pg_backend_memory_contexts", "pg_stat_all_tables", "_pg_stat_all_tables",
    "pg_stat_xact_all_tables", "_pg_stat_xact_all_tables", "pg_stat_sys_tables",
    "_pg_stat_sys_tables", "pg_stat_xact_sys_tables", "_pg_stat_xact_sys_tables",
    "pg_stat_user_tables", "_pg_stat_user_tables", "pg_stat_xact_user_tables",
    "_pg_stat_xact_user_tables", "pg_statio_all_tables", "_pg_statio_all_tables",
    "pg_statio_sys_tables", "_pg_statio_sys_tables", "pg_statio_user_tables",
    "_pg_statio_user_tables", "pg_stat_all_indexes", "_pg_stat_all_indexes",
    "pg_stat_sys_indexes", "_pg_stat_sys_indexes", "pg_stat_user_indexes",
    "_pg_stat_user_indexes", "pg_statio_all_indexes", "_pg_statio_all_indexes",
    "pg_statio_sys_indexes", "_pg_statio_sys_indexes", "pg_statio_user_indexes",
    "_pg_statio_user_indexes", "pg_statio_all_sequences",
    "_pg_statio_all_sequences", "pg_statio_sys_sequences",
    "_pg_statio_sys_sequences", "pg_statio_user_sequences",
    "_pg_statio_user_sequences", "pg_stat_activity", "_pg_stat_activity",
    "pg_stat_replication", "_pg_stat_replication", "pg_stat_slru",
    "_pg_stat_slru", "pg_stat_wal_receiver", "_pg_stat_wal_receiver",
    "pg_stat_subscription", "_pg_stat_subscription", "pg_stat_ssl",
    "_pg_stat_ssl", "pg_stat_gssapi", "_pg_stat_gssapi", "pg_replication_slots",
    "_pg_replication_slots", "pg_stat_replication_slots",
    "_pg_stat_replication_slots", "pg_stat_database", "_pg_stat_database",
    "pg_stat_database_conflicts", "_pg_stat_database_conflicts",
    "pg_stat_user_functions", "_pg_stat_user_functions",
    "pg_stat_xact_user_functions", "_pg_stat_xact_user_functions",
    "pg_stat_archiver", "_pg_stat_archiver", "pg_stat_bgwriter",
    "_pg_stat_bgwriter", "pg_stat_wal", "_pg_stat_wal",
    "pg_stat_progress_analyze", "_pg_stat_progress_analyze",
    "pg_stat_progress_vacuum", "_pg_stat_progress_vacuum",
    "pg_stat_progress_cluster", "_pg_stat_progress_cluster",
    "pg_stat_progress_create_index", "_pg_stat_progress_create_index",
    "pg_stat_progress_basebackup", "_pg_stat_progress_basebackup",
    "pg_stat_progress_copy", "_pg_stat_progress_copy", "pg_user_mappings",
    "_pg_user_mappings", "pg_replication_origin_status",
    "_pg_replication_origin_status", "cardinal_number", "_cardinal_number",
    "character_data", "_character_data", "sql_identifier", "_sql_identifier",
    "information_schema_catalog_name", "_information_schema_catalog_name",
    "time_stamp", "_time_stamp", "yes_or_no", "_yes_or_no", "applicable_roles",
    "_applicable_roles", "administrable_role_authorizations",
    "_administrable_role_authorizations", "attributes", "_attributes",
    "character_sets", "_character_sets", "check_constraint_routine_usage",
    "_check_constraint_routine_usage", "check_constraints", "_check_constraints",
    "collations", "_collations", "collation_character_set_applicability",
    "_collation_character_set_applicability", "column_column_usage",
    "_column_column_usage", "column_domain_usage", "_column_domain_usage",
    "column_privileges", "_column_privileges", "column_udt_usage",
    "_column_udt_usage", "columns", "_columns", "constraint_column_usage",
    "_constraint_column_usage", "constraint_table_usage",
    "_constraint_table_usage", "domain_constraints", "_domain_constraints",
    "domain_udt_usage", "_domain_udt_usage", "domains", "_domains",
    "enabled_roles", "_enabled_roles", "key_column_usage", "_key_column_usage",
    "parameters", "_parameters", "referential_constraints",
    "_referential_constraints", "role_column_grants", "_role_column_grants",
    "routine_column_usage", "_routine_column_usage", "routine_privileges",
    "_routine_privileges", "role_routine_grants", "_role_routine_grants",
    "routine_routine_usage", "_routine_routine_usage", "routine_sequence_usage",
    "_routine_sequence_usage", "routine_table_usage", "_routine_table_usage",
    "routines", "_routines", "schemata", "_schemata", "sequences", "_sequences",
    "sql_features", "_sql_features", "sql_implementation_info",
    "_sql_implementation_info", "sql_parts", "_sql_parts", "sql_sizing",
    "_sql_sizing", "table_constraints", "_table_constraints", "table_privileges",
    "_table_privileges", "role_table_grants", "_role_table_grants", "tables",
    "_tables", "transforms", "_transforms", "triggered_update_columns",
    "_triggered_update_columns", "triggers", "_triggers", "udt_privileges",
    "_udt_privileges", "role_udt_grants", "_role_udt_grants", "usage_privileges",
    "_usage_privileges", "role_usage_grants", "_role_usage_grants",
    "user_defined_types", "_user_defined_types", "view_column_usage",
    "_view_column_usage", "view_routine_usage", "_view_routine_usage",
    "view_table_usage", "_view_table_usage", "views", "_views",
    "data_type_privileges", "_data_type_privileges", "element_types",
    "_element_types", "_pg_foreign_table_columns", "__pg_foreign_table_columns",
    "column_options", "_column_options", "_pg_foreign_data_wrappers",
    "__pg_foreign_data_wrappers", "foreign_data_wrapper_options",
    "_foreign_data_wrapper_options", "foreign_data_wrappers",
    "_foreign_data_wrappers", "_pg_foreign_servers", "__pg_foreign_servers",
    "foreign_server_options", "_foreign_server_options", "foreign_servers",
    "_foreign_servers", "_pg_foreign_tables", "__pg_foreign_tables",
    "foreign_table_options", "_foreign_table_options", "foreign_tables",
    "_foreign_tables", "_pg_user_mappings", "__pg_user_mappings",
    "user_mapping_options", "_user_mapping_options", "user_mappings",
    "_user_mappings", "spheroid", "_spheroid", "geometry", "_geometry", "box3d",
    "_box3d", "box2d", "_box2d", "box2df", "_box2df", "gidx", "_gidx",
    "geometry_dump", "_geometry_dump", "spatial_ref_sys", "_spatial_ref_sys",
    "valid_detail", "_valid_detail", "geography", "_geography",
    "geography_columns", "_geography_columns", "geometry_columns",
    "_geometry_columns", "raster", "_raster", "rastbandarg", "_rastbandarg",
    "geomval", "_geomval", "addbandarg", "_addbandarg", "summarystats",
    "_summarystats", "agg_count", "_agg_count", "reclassarg", "_reclassarg",
    "agg_samealignment", "_agg_samealignment", "unionarg", "_unionarg",
    "raster_columns", "_raster_columns", "raster_overviews", "_raster_overviews",
    "partner", "_partner", "job", "_job", "chunk", "_chunk", "imagemeta",
    "_imagemeta", "image", "_image", "_species", "_plantnetcall", "detection",
    "_detection", "_plantnetcall_metadata", "plantnetrequestparams",
    "plantnetcall", "_plantnetcall", "detection", "_detection", "plantnetspecies",
    "_plantnetspecies", "species", "_plantnetrequestparams", "plantnetcall",
    "species", "_species", "detectionsummary", "_detectionsummary",
    "plantnetcall_metadata", "_plantnetcall_metadata", "plantnetcall_metadata",
    "family", "_family", "genus", "_genus", "species_whitelist",
    "_species_whitelist", "image_jpg_20240301", "_image_jpg_20240301",
    "image_tiff_20240301", "_image_tiff_20240301",
    "detection_snapshot_20240809_israel", "_detection_snapshot_20240809_israel",
    "detection_snapshot_20240809", "_detection_snapshot_20240809",
    "annotation_enum", "_annotation_enum", "annotations", "_annotations",
    "experiment_meta", "_experiment_meta", "callmeta", "_callmeta", "results",
    "_results", "imagedata_status", "_imagedata_status", "session", "_session",
    "imagedata", "_imagedata", "detectionsummary", "_detectionsummary",
    "imagedata_status", "_imagedata_status", "session", "_session", "imagedata",
    "_imagedata", "additionalimagedata", "_additionalimagedata",
    "detectionsummary", "_detectionsummary", "imagefile", "_imagefile",
    "call_plantnet", "_call_plantnet", "generated_image", "_generated_image",
    "capturepoint", "_capturepoint", "capturepoint", "_capturepoint",
    "status_overview", "_status_overview"
  )
)

dm1 <- dm::dm(
  chunk = list(
    src = list(
      con = new(
        "PqConnection" |>
          structure(package = "RPostgres"),
        ptr = constructive::.xptr("0x12b7b9b20"),
        bigint = "integer64",
        timezone = "UTC",
        timezone_out = "UTC",
        typnames = typnames1
      ),
      disco = NULL
    ) |>
      structure(class = c("src_PqConnection", "src_dbi", "src_sql", "src")),
    lazy_query = list(
      x = '"camalien"."data"."chunk"' |>
        structure(class = c("dbplyr_table_path", "character")),
      vars = c("id", "name", "created", "jobid", "state"),
      group_vars = character(0),
      order_vars = NULL,
      frame = NULL
    ) |>
      structure(class = c("lazy_base_remote_query", "lazy_base_query", "lazy_query"))
  ) |>
    structure(class = c("tbl_PqConnection", "tbl_dbi", "tbl_sql", "tbl_lazy", "tbl")),
  detection = list(
    src = list(
      con = new(
        "PqConnection" |>
          structure(package = "RPostgres"),
        ptr = constructive::.xptr("0x12b7b9b20"),
        bigint = "integer64",
        timezone = "UTC",
        timezone_out = "UTC",
        typnames = typnames1
      ),
      disco = NULL
    ) |>
      structure(class = c("src_PqConnection", "src_dbi", "src_sql", "src")),
    lazy_query = list(
      x = '"camalien"."data"."detection"' |>
        structure(class = c("dbplyr_table_path", "character")),
      vars = c("id", "callid", "boxid", "score", "boxsize", "centerx", "centery", "speciesid"),
      group_vars = character(0),
      order_vars = NULL,
      frame = NULL
    ) |>
      structure(class = c("lazy_base_remote_query", "lazy_base_query", "lazy_query"))
  ) |>
    structure(class = c("tbl_PqConnection", "tbl_dbi", "tbl_sql", "tbl_lazy", "tbl")),
  detectionsummary = list(
    src = list(
      con = new(
        "PqConnection" |>
          structure(package = "RPostgres"),
        ptr = constructive::.xptr("0x12b7b9b20"),
        bigint = "integer64",
        timezone = "UTC",
        timezone_out = "UTC",
        typnames = typnames1
      ),
      disco = NULL
    ) |>
      structure(class = c("src_PqConnection", "src_dbi", "src_sql", "src")),
    lazy_query = list(
      x = '"camalien"."data"."detectionsummary"' |>
        structure(class = c("dbplyr_table_path", "character")),
      vars = c("id", "callid", "speciesid", "maxscore", "sumscore", "count", "coverage"),
      group_vars = character(0),
      order_vars = NULL,
      frame = NULL
    ) |>
      structure(class = c("lazy_base_remote_query", "lazy_base_query", "lazy_query"))
  ) |>
    structure(class = c("tbl_PqConnection", "tbl_dbi", "tbl_sql", "tbl_lazy", "tbl")),
  image = list(
    src = list(
      con = new(
        "PqConnection" |>
          structure(package = "RPostgres"),
        ptr = constructive::.xptr("0x12b7b9b20"),
        bigint = "integer64",
        timezone = "UTC",
        timezone_out = "UTC",
        typnames = typnames1
      ),
      disco = NULL
    ) |>
      structure(class = c("src_PqConnection", "src_dbi", "src_sql", "src")),
    lazy_query = list(
      x = '"camalien"."data"."image"' |>
        structure(class = c("dbplyr_table_path", "character")),
      vars = c(
        "id", "chunkid", "imagemetaid", "created", "filename", "path", "type",
        "md5expected", "md5actual", "state", "generated"
      ),
      group_vars = character(0),
      order_vars = NULL,
      frame = NULL
    ) |>
      structure(class = c("lazy_base_remote_query", "lazy_base_query", "lazy_query"))
  ) |>
    structure(class = c("tbl_PqConnection", "tbl_dbi", "tbl_sql", "tbl_lazy", "tbl")),
  imagemeta = list(
    src = list(
      con = new(
        "PqConnection" |>
          structure(package = "RPostgres"),
        ptr = constructive::.xptr("0x12b7b9b20"),
        bigint = "integer64",
        timezone = "UTC",
        timezone_out = "UTC",
        typnames = typnames1
      ),
      disco = NULL
    ) |>
      structure(class = c("src_PqConnection", "src_dbi", "src_sql", "src")),
    lazy_query = list(
      x = '"camalien"."data"."imagemeta"' |>
        structure(class = c("dbplyr_table_path", "character")),
      vars = c(
        "id", "tag", "position", "timestamp", "computertime", "exposuretime",
        "velocity", "gain", "altitude", "parsed"
      ),
      group_vars = character(0),
      order_vars = NULL,
      frame = NULL
    ) |>
      structure(class = c("lazy_base_remote_query", "lazy_base_query", "lazy_query"))
  ) |>
    structure(class = c("tbl_PqConnection", "tbl_dbi", "tbl_sql", "tbl_lazy", "tbl")),
  job = list(
    src = list(
      con = new(
        "PqConnection" |>
          structure(package = "RPostgres"),
        ptr = constructive::.xptr("0x12b7b9b20"),
        bigint = "integer64",
        timezone = "UTC",
        timezone_out = "UTC",
        typnames = typnames1
      ),
      disco = NULL
    ) |>
      structure(class = c("src_PqConnection", "src_dbi", "src_sql", "src")),
    lazy_query = list(
      x = '"camalien"."data"."job"' |>
        structure(class = c("dbplyr_table_path", "character")),
      vars = c("id", "created", "checksum", "partnerid", "name", "imagestorageservice", "size"),
      group_vars = character(0),
      order_vars = NULL,
      frame = NULL
    ) |>
      structure(class = c("lazy_base_remote_query", "lazy_base_query", "lazy_query"))
  ) |>
    structure(class = c("tbl_PqConnection", "tbl_dbi", "tbl_sql", "tbl_lazy", "tbl")),
  partner = list(
    src = list(
      con = new(
        "PqConnection" |>
          structure(package = "RPostgres"),
        ptr = constructive::.xptr("0x12b7b9b20"),
        bigint = "integer64",
        timezone = "UTC",
        timezone_out = "UTC",
        typnames = typnames1
      ),
      disco = NULL
    ) |>
      structure(class = c("src_PqConnection", "src_dbi", "src_sql", "src")),
    lazy_query = list(
      x = '"camalien"."data"."partner"' |>
        structure(class = c("dbplyr_table_path", "character")),
      vars = c("id", "name", "partnerkey", "storagepath"),
      group_vars = character(0),
      order_vars = NULL,
      frame = NULL
    ) |>
      structure(class = c("lazy_base_remote_query", "lazy_base_query", "lazy_query"))
  ) |>
    structure(class = c("tbl_PqConnection", "tbl_dbi", "tbl_sql", "tbl_lazy", "tbl")),
  plantnetcall = list(
    src = list(
      con = new(
        "PqConnection" |>
          structure(package = "RPostgres"),
        ptr = constructive::.xptr("0x12b7b9b20"),
        bigint = "integer64",
        timezone = "UTC",
        timezone_out = "UTC",
        typnames = typnames1
      ),
      disco = NULL
    ) |>
      structure(class = c("src_PqConnection", "src_dbi", "src_sql", "src")),
    lazy_query = list(
      x = '"camalien"."data"."plantnetcall"' |>
        structure(class = c("dbplyr_table_path", "character")),
      vars = c(
        "id", "state", "imageid", "imageurl", "responsestatus", "requestparamid",
        "timestamp", "responsetime", "dateparsed", "responsefile"
      ),
      group_vars = character(0),
      order_vars = NULL,
      frame = NULL
    ) |>
      structure(class = c("lazy_base_remote_query", "lazy_base_query", "lazy_query"))
  ) |>
    structure(class = c("tbl_PqConnection", "tbl_dbi", "tbl_sql", "tbl_lazy", "tbl")),
  plantnetcall_metadata = list(
    src = list(
      con = new(
        "PqConnection" |>
          structure(package = "RPostgres"),
        ptr = constructive::.xptr("0x12b7b9b20"),
        bigint = "integer64",
        timezone = "UTC",
        timezone_out = "UTC",
        typnames = typnames1
      ),
      disco = NULL
    ) |>
      structure(class = c("src_PqConnection", "src_dbi", "src_sql", "src")),
    lazy_query = list(
      x = '"camalien"."data"."plantnetcall_metadata"' |>
        structure(class = c("dbplyr_table_path", "character")),
      vars = c("id", "callid", "uid", "query_dbg", "duration", "versions"),
      group_vars = character(0),
      order_vars = NULL,
      frame = NULL
    ) |>
      structure(class = c("lazy_base_remote_query", "lazy_base_query", "lazy_query"))
  ) |>
    structure(class = c("tbl_PqConnection", "tbl_dbi", "tbl_sql", "tbl_lazy", "tbl")),
  plantnetrequestparams = list(
    src = list(
      con = new(
        "PqConnection" |>
          structure(package = "RPostgres"),
        ptr = constructive::.xptr("0x12b7b9b20"),
        bigint = "integer64",
        timezone = "UTC",
        timezone_out = "UTC",
        typnames = typnames1
      ),
      disco = NULL
    ) |>
      structure(class = c("src_PqConnection", "src_dbi", "src_sql", "src")),
    lazy_query = list(
      x = '"camalien"."data"."plantnetrequestparams"' |>
        structure(class = c("dbplyr_table_path", "character")),
      vars = c(
        "id", "slug", "q_stride", "q_stride_x", "q_stride_y", "q_min_size",
        "q_size_factor", "q_multi_scale", "q_min_score", "q_max_rank",
        "q_show_species", "q_show_genus", "q_show_family", "q_show_organ",
        "q_show_location"
      ),
      group_vars = character(0),
      order_vars = NULL,
      frame = NULL
    ) |>
      structure(class = c("lazy_base_remote_query", "lazy_base_query", "lazy_query"))
  ) |>
    structure(class = c("tbl_PqConnection", "tbl_dbi", "tbl_sql", "tbl_lazy", "tbl")),
  species = list(
    src = list(
      con = new(
        "PqConnection" |>
          structure(package = "RPostgres"),
        ptr = constructive::.xptr("0x12b7b9b20"),
        bigint = "integer64",
        timezone = "UTC",
        timezone_out = "UTC",
        typnames = typnames1
      ),
      disco = NULL
    ) |>
      structure(class = c("src_PqConnection", "src_dbi", "src_sql", "src")),
    lazy_query = list(
      x = '"camalien"."data"."species"' |>
        structure(class = c("dbplyr_table_path", "character")),
      vars = c("gbifid", "pnid", "scientificname", "gbifgenusid"),
      group_vars = character(0),
      order_vars = NULL,
      frame = NULL
    ) |>
      structure(class = c("lazy_base_remote_query", "lazy_base_query", "lazy_query"))
  ) |>
    structure(class = c("tbl_PqConnection", "tbl_dbi", "tbl_sql", "tbl_lazy", "tbl")),
  species_whitelist = list(
    src = list(
      con = new(
        "PqConnection" |>
          structure(package = "RPostgres"),
        ptr = constructive::.xptr("0x12b7b9b20"),
        bigint = "integer64",
        timezone = "UTC",
        timezone_out = "UTC",
        typnames = typnames1
      ),
      disco = NULL
    ) |>
      structure(class = c("src_PqConnection", "src_dbi", "src_sql", "src")),
    lazy_query = list(
      x = '"camalien"."data"."species_whitelist"' |>
        structure(class = c("dbplyr_table_path", "character")),
      vars = "pnid",
      group_vars = character(0),
      order_vars = NULL,
      frame = NULL
    ) |>
      structure(class = c("lazy_base_remote_query", "lazy_base_query", "lazy_query"))
  ) |>
    structure(class = c("tbl_PqConnection", "tbl_dbi", "tbl_sql", "tbl_lazy", "tbl")),
)  |>
  dm::dm_add_pk(chunk, "id") |>
  dm::dm_add_pk(detection, "id") |>
  dm::dm_add_pk(detectionsummary, "id") |>
  dm::dm_add_pk(image, "id") |>
  dm::dm_add_pk(imagemeta, "id") |>
  dm::dm_add_pk(job, "id") |>
  dm::dm_add_pk(partner, "id") |>
  dm::dm_add_pk(plantnetcall, "id") |>
  dm::dm_add_pk(plantnetcall_metadata, "id") |>
  dm::dm_add_pk(plantnetrequestparams, "id") |>
  dm::dm_add_pk(species, "pnid")

dm2 <- dm1 |>
  dm::dm_add_fk(image, "chunkid", chunk, "id") |>
  dm::dm_add_fk(NA, NA_character_, chunk, "id") |>
  dm::dm_add_fk(NA, NA_character_, image, "id") |>
  dm::dm_add_fk(chunk, "jobid", job, "id") |>
  dm::dm_add_fk(job, "partnerid", partner, "id") |>
  dm::dm_add_fk(NA, NA_character_, partner, "id") |>
  dm::dm_add_fk(NA, NA_character_, partner, "id") |>
  dm::dm_add_fk(NA, NA_character_, plantnetcall, "id") |>
  dm::dm_add_fk(detection, "callid", plantnetcall, "id") |>
  dm::dm_add_fk(detectionsummary, "callid", plantnetcall, "id") |>
  dm::dm_add_fk(plantnetcall_metadata, "callid", plantnetcall, "id") |>
  dm::dm_add_fk(NA, NA_character_, plantnetrequestparams, "id") |>
  dm::dm_add_fk(plantnetcall, "requestparamid", plantnetrequestparams, "id") |>
  dm::dm_add_fk(species_whitelist, "pnid", species, "pnid")

@LDalby
Copy link
Author

LDalby commented Nov 20, 2024

If I call constructive::construct(unclass(remote_pg)) I still get an output of approx. 3700 lines.
Were I supposed to extract something from the output or is that what @moodymudskipper already did above?

@moodymudskipper
Copy link
Collaborator

I don't think unclassing will help, but constructive::construct_base(remote_pg, data = list(con = con)) might hopefully provide a much shorter output without evaluation error message.

@krlmlr
Copy link
Collaborator

krlmlr commented Nov 20, 2024

The problems with constructive aside: What I'm interested in is why dm thinks that the foreign keys have no corresponding detail tables (the dm_add_fk(NA, ...) lines). For this, we could also try a snapshot of dm:::dm_meta(con, schema = ...) , using constructive or through an .rds or similar file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants