Core Library
After several weeks of experimenting we are releasing the dataset API. You can now read data in your destination through a neat, unified interface that works the same way for warehouses, relational databases, SQLAlchemy dialects, local and remote files, and Iceberg and Delta tables.
You can use simple dot notation to access tables, execute SQL, or use dataframe expressions (compiled to SQL with `ibis`). We materialize your data as pandas frames, Arrow tables, or DBAPI-compatible records (also in batches). Here's the main intro:
https://dlthub.com/docs/general-usage/dataset-access/dataset
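A minimal sketch of what this can look like (the pipeline, dataset name and `items` table are placeholders; the read methods shown follow the docs linked above and may differ slightly in your version):

```python
import dlt

# toy pipeline with an "items" table so the reads below have something to return
pipeline = dlt.pipeline(pipeline_name="dataset_demo", destination="duckdb", dataset_name="demo_data")
pipeline.run([{"id": 1, "name": "a"}, {"id": 2, "name": "b"}], table_name="items")

dataset = pipeline.dataset()          # unified read interface over the destination

items_df = dataset.items.df()         # dot notation, materialized as a pandas frame
items_arrow = dataset.items.arrow()   # ... or as an Arrow table

for rows in dataset.items.iter_fetch(chunk_size=1000):   # DBAPI-style records, in batches
    ...

top = dataset("SELECT id, name FROM items ORDER BY id LIMIT 1").df()   # plain SQL

first = dataset.items.select("id", "name").limit(1).df()   # dataframe expressions compiled to SQL via ibis
```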
Together with this we release our backend-less, catalog-less (well, an ad hoc technical catalog is created) Iceberg implementation. You can use the `append` and `replace` write dispositions, create partitions, and write to a bucket. Be aware of the limitations, we are just getting started!
https://dlthub.com/docs/dlt-ecosystem/destinations/delta-iceberg
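A rough sketch of writing an Iceberg table to a bucket via the `filesystem` destination (the bucket URL, credentials setup and sample data are placeholders; see the docs above for partitioning and the current limitations):

```python
import dlt

# toy resource written as an Iceberg table; append and replace write dispositions are supported
@dlt.resource(table_format="iceberg", write_disposition="append")
def events():
    yield [{"id": 1, "ts": "2024-12-01"}, {"id": 2, "ts": "2024-12-02"}]

pipeline = dlt.pipeline(
    pipeline_name="iceberg_demo",
    destination=dlt.destinations.filesystem("s3://my-bucket/iceberg"),  # hypothetical bucket URL
    dataset_name="events_data",
)
pipeline.run(events())
```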
- bump semver to minimum version 3.0.0 by @sh-rp in #2132
- leverage ibis expressions for getting readable relations by @sh-rp in #2046
- `iceberg` table format support for `filesystem` destination by @jorritsandbrink in #2067
- fixes dlt init fails in Colab (userdata problem) by @rudolfix in #2117
- Add open/closed range arguments for incremental by @steinitzu in #1991
- Fix validation error for custom auth classes by @burnash in #2129
- add databricks oauth authentication by @donotpush in #2138
- make duckdb handle Iceberg table with nested types by @jorritsandbrink in #2141
- refresh standalone resources (old columns were recreated) by @rudolfix in #2140
- fix ibis az problems on linux by @sh-rp in #2135
- does not raise if data type was changed manually in schema by @rudolfix in #2150
- allow to `--eject` the source code of the core sources (i.e. `sql_database`) for hacking in customizations by @rudolfix in #2150
- convert `add_limit` to pipe-step-based limiting by @sh-rp in #2131
- Enable datetime format for negative timezone by @hairrrrr in #2155
ℹ️ Note on `add_limit`: you can now use it to chunk large resources and load them in pieces. We support chunks based on a maximum number of rows or a maximum elapsed time. Please read the docs: your resource should return ordered rows or be able to resume from a checkpoint. Also note that we now apply `add_limit` after all processing steps (i.e. incremental); previously we limited the generator directly. This was a necessary change to implement chunking and is backward compatible in terms of the produced data, but your resource may be queried many times to get a "new" item, i.e. one that is not filtered out by incremental.
https://dlthub.com/docs/examples/backfill_in_chunks
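A minimal sketch of a chunked backfill under the assumptions above (the `fetch_rows_after` helper is a placeholder for your real source, and the `max_items`/`max_time` argument names are taken from the description, so check the linked example for the exact signature):

```python
import dlt

def fetch_rows_after(last_id: int):
    # placeholder for the real data source: returns rows ordered by "id"
    # that come after the incremental checkpoint
    return ({"id": i, "value": f"row {i}"} for i in range(last_id + 1, last_id + 500_001))

@dlt.resource(primary_key="id")
def backfill(cursor=dlt.sources.incremental("id", initial_value=0)):
    # ordered rows plus an incremental checkpoint let repeated runs
    # continue where the previous chunk stopped
    yield from fetch_rows_after(cursor.last_value)

pipeline = dlt.pipeline(pipeline_name="chunked_backfill", destination="duckdb")

# load at most 100_000 rows (or stop after 15 minutes) per run; re-run the
# pipeline, e.g. on a schedule, to load the next chunk from the checkpoint
pipeline.run(backfill().add_limit(max_items=100_000, max_time=15 * 60))
```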
Docs
- prepare dataset release & docs updates by @sh-rp in #2126
- Add missing mention of the required `endpoint_url` config in GCS by @trymzet in #2120
- example how to use `add_limit` to do large backfills in steps by @sh-rp in #2131
- Update auth info in databricks docs by @VioletM in #2153
- improve how dlt works page by @sh-rp in #2152
- explicitly adding docs for destination item size control by @HulmaNaseer in #2118
- Docs: rest_api tutorial: update primary key in merge example by @burnash in #2147
Verified Sources
The code got updated to `dlt` 1.x.x and tests work again. We are accepting contributions again.
ℹ️ The 0.5 sources are on the `0.5` tag. If you are still on `dlt` 0.5.x, access this tag via `dlt init sql_database duckdb --branch 0.5`
New Contributors
- @HulmaNaseer made their first contribution in #2118
- @hairrrrr made their first contribution in #2155
Full Changelog: 1.4.1...1.5.0