Core Library
After several weeks of experimenting we are releasing the dataset API. You can now read data in your destination through a neat, unified interface that works the same way for warehouses, relational databases, SQLAlchemy dialects, local and remote files, and Iceberg and Delta tables.
You can use simple dot notation to access tables, execute SQL, or use dataframe expressions (compiled to SQL with `ibis`). We materialize your data as pandas frames, Arrow tables, or DBAPI-compatible records (also in batches). Here's the main intro:
https://dlthub.com/docs/general-usage/dataset-access/dataset
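A minimal sketch of what this can look like (the pipeline, dataset name and `items` table are placeholders; the read methods shown follow the docs linked above and may differ slightly in your version):

```python
import dlt

# toy pipeline with an "items" table so the reads below have something to return
pipeline = dlt.pipeline(pipeline_name="dataset_demo", destination="duckdb", dataset_name="demo_data")
pipeline.run([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}], table_name="items")

dataset = pipeline.dataset()          # unified read interface over the destination

items_df = dataset.items.df()         # dot notation, materialized as a pandas frame
items_arrow = dataset.items.arrow()   # ... or as an Arrow table

for rows in dataset.items.iter_fetch(chunk_size=1000):   # DBAPI-style records, in batches
    ...

top = dataset("SELECT id, name FROM items ORDER BY id LIMIT 1").df()   # plain SQL

first = dataset.items.select("id", "name").limit(1).df()   # dataframe expressions compiled to SQL via ibis
```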
Together with this we release our backend-less, catalog-less (well, an ad hoc technical catalog is created) Iceberg implementation. You can use the `append` and `replace` write dispositions, create partitions, and write to a bucket. Be aware of the limitations, we are just getting started!
https://dlthub.com/docs/dlt-ecosystem/destinations/delta-iceberg
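A rough sketch of writing an Iceberg table to a bucket via the `filesystem` destination (the bucket URL, credentials setup and sample data are placeholders; see the docs above for partitioning and the current limitations):

```python
import dlt

# toy resource written as an Iceberg table; append and replace write dispositions are supported
@dlt.resource(table_format="iceberg", write_disposition="append")
def events():
    yield [{"id": 1, "ts": "2024-12-01"}, {"id": 2, "ts": "2024-12-02"}]

pipeline = dlt.pipeline(
    pipeline_name="iceberg_demo",
    destination=dlt.destinations.filesystem("s3://my-bucket/iceberg"),  # hypothetical bucket URL
    dataset_name="events_data",
)
pipeline.run(events())
```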
- bump semver to minimum version 3.0.0 by @sh-rp in #2132
- leverage ibis expressions for getting readable relations by @sh-rp in #2046
- `iceberg` table format support for `filesystem` destination by @jorritsandbrink in #2067
- fixes dlt init fails in Colab (userdata problem) by @rudolfix in #2117
- Add open/closed range arguments for incremental by @steinitzu in #1991
- Fix validation error for custom auth classes by @burnash in #2129
- add databricks oauth authentication by @donotpush in #2138
- make duckdb handle Iceberg table with nested types by @jorritsandbrink in #2141
- refresh standalone resources (old columns were recreated) by @rudolfix in #2140
- fix ibis az problems on linux by @sh-rp in #2135
- does not raise if data type was changed manually in schema by @rudolfix in #2150
- allow to `--eject` the source code of the core sources (i.e. `sql_database`) for hacking in customizations by @rudolfix in #2150
- convert `add_limit` to pipe-step-based limiting by @sh-rp in #2131
- Enable datetime format for negative timezone by @hairrrrr in #2155
ℹ️ Note on `add_limit`: you can now use it to chunk large resources and load them in pieces. We support chunks based on a maximum number of rows or a maximum elapsed time. Please read the docs: your resource should return ordered rows or be able to resume from a checkpoint. Also note that we now apply `add_limit` after all processing steps (i.e. incremental); previously we limited the generator directly. This was a necessary change to implement chunking and is backward compatible in terms of the produced data, but your resource may be queried many times to get a "new" item, i.e. one that is not filtered out by incremental.
https://dlthub.com/docs/examples/backfill_in_chunks
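A minimal sketch of a chunked backfill under the assumptions above (the `fetch_rows_after` helper is a placeholder for your real source, and the `max_items`/`max_time` argument names are taken from the description, so check the linked example for the exact signature):

```python
import dlt

def fetch_rows_after(last_id: int):
    # placeholder for the real data source: returns rows ordered by "id"
    # that come after the incremental checkpoint
    return ({"id": i, "value": f"row {i}"} for i in range(last_id + 1, last_id + 500_001))

@dlt.resource(primary_key="id")
def backfill(cursor=dlt.sources.incremental("id", initial_value=0)):
    # ordered rows plus an incremental checkpoint let repeated runs
    # continue where the previous chunk stopped
    yield from fetch_rows_after(cursor.last_value)

pipeline = dlt.pipeline(pipeline_name="chunked_backfill", destination="duckdb")

# load at most 100_000 rows (or stop after 15 minutes) per run; re-run the
# pipeline, e.g. on a schedule, to load the next chunk from the checkpoint
pipeline.run(backfill().add_limit(max_items=100_000, max_time=15 * 60))
```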
Docs
- prepare dataset release & docs updates by @sh-rp in #2126
- Add missing mention of the required `endpoint_url` config in GCS by @trymzet in #2120
- example how to use `add_limit` to do large backfills in steps by @sh-rp in #2131
- Update auth info in databricks docs by @VioletM in #2153
- improve how dlt works page by @sh-rp in #2152
- explicitly adding docs for destination item size control by @HulmaNaseer in #2118
- Docs: rest_api tutorial: update primary key in merge example by @burnash in #2147
Verified Sources
The code got updated to `dlt` 1.x.x and tests work again. We are accepting contributions again.
ℹ️ The 0.5 sources are on the `0.5` tag. If you are still on `dlt` 0.5.x, access this tag via `dlt init sql_database duckdb --branch 0.5`
New Contributors
- @HulmaNaseer made their first contribution in #2118
- @hairrrrr made their first contribution in #2155
Full Changelog: 1.4.1...1.5.0