
1.5.0

@rudolfix rudolfix released this 17 Dec 18:49
e8c5e9b

Core Library

After several weeks of experimenting we are releasing the dataset API. You can now read data in your destination with a neat, unified interface that works the same way for warehouses, relational databases, SQLAlchemy dialects, local and remote files, and Iceberg and Delta tables.
You can use simple dot notation to access tables, execute SQL, or use dataframe expressions (compiled to SQL with ibis). We materialize your data as pandas DataFrames, Arrow tables or DBAPI-compatible records (also in batches). Here's the main intro:
https://dlthub.com/docs/general-usage/dataset-access/dataset
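
A minimal sketch of what this can look like in practice. It assumes a pipeline (named my_pipeline below) that has already loaded a table called items into duckdb; the pipeline, dataset and table names are illustrative, only the dataset/relation calls follow the docs linked above.

```python
import dlt

# assumes a pipeline with this name has already loaded a table called "items";
# pipeline, dataset and table names here are illustrative
pipeline = dlt.pipeline(
    pipeline_name="my_pipeline", destination="duckdb", dataset_name="my_data"
)

# get the dataset attached to the pipeline's destination
dataset = pipeline.dataset()

# dot notation to access a table, materialized as pandas or Arrow
items_df = dataset.items.df()
items_arrow = dataset.items.arrow()

# plain SQL against the destination, fetched as DBAPI-style records
row_count = dataset("SELECT COUNT(*) FROM items").fetchall()

# dataframe-style expressions are compiled to SQL (via ibis) before execution
first_100 = dataset.items.limit(100).df()
```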

Together with this we are releasing our backend-less, catalog-less Iceberg implementation (well, an ad hoc technical catalog is created). You can use the append and replace write dispositions, create partitions, and write to a bucket. Be aware of the limitations, we are just starting!
https://dlthub.com/docs/dlt-ecosystem/destinations/delta-iceberg
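
A short sketch of writing Iceberg on the filesystem destination, assuming the bucket URL and credentials are configured elsewhere (e.g. in secrets.toml); the resource, pipeline and dataset names are illustrative.

```python
import dlt

# resource data and names are illustrative; table_format="iceberg" requests the
# Iceberg table format on the filesystem destination
@dlt.resource(table_format="iceberg", write_disposition="append")
def events():
    yield [
        {"id": 1, "ts": "2024-12-01T10:00:00Z"},
        {"id": 2, "ts": "2024-12-01T11:00:00Z"},
    ]

pipeline = dlt.pipeline(
    pipeline_name="iceberg_demo",
    destination="filesystem",  # bucket_url and credentials assumed to be configured
    dataset_name="lake",
)
pipeline.run(events())
```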

  • bump semver to minimum version 3.0.0 by @sh-rp in #2132
  • leverage ibis expressions for getting ReadableRelations by @sh-rp in #2046
  • iceberg table format support for filesystem destination by @jorritsandbrink in #2067
  • fix dlt init failing in Colab (userdata problem) by @rudolfix in #2117
  • Add open/closed range arguments for incremental by @steinitzu in #1991 (see the sketch after this list)
  • Fix validation error for custom auth classes by @burnash in #2129
  • add databricks oauth authentication by @donotpush in #2138
  • make duckdb handle Iceberg table with nested types by @jorritsandbrink in #2141
  • refresh standalone resources (old columns were recreated) by @rudolfix in #2140
  • fix ibis az problems on linux by @sh-rp in #2135
  • do not raise if a data type was changed manually in the schema by @rudolfix in #2150
  • allow --eject of the source code of core sources (i.e. sql_database) to enable hacking in customizations by @rudolfix in #2150
  • convert add_limit to pipe step based limiting by @sh-rp in #2131
  • Enable datetime format for negative timezones by @hairrrrr in #2155
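
The open/closed range arguments mentioned above (#1991) can be sketched as follows; the range_start parameter name follows that feature's docs, while the resource and cursor field are illustrative.

```python
import dlt

# illustrative resource; range_start (and range_end) control whether rows equal
# to the boundary cursor values are included ("closed") or excluded ("open")
@dlt.resource
def tickets(
    updated_at=dlt.sources.incremental(
        "updated_at",
        initial_value="2024-01-01T00:00:00Z",
        range_start="open",  # skip rows equal to the last seen cursor value
    )
):
    yield [
        {"id": 1, "updated_at": "2024-06-01T00:00:00Z"},
        {"id": 2, "updated_at": "2024-07-01T00:00:00Z"},
    ]

pipeline = dlt.pipeline(pipeline_name="range_demo", destination="duckdb")
pipeline.run(tickets())
```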

ℹ️ Note on add_limit: you can now use it to chunk large resources and load them in pieces. Chunks can be created based on a maximum number of rows or after a specified time has elapsed. Please read the docs: your resource should return ordered rows or be able to resume from a checkpoint. Also note that add_limit is now applied after all processing steps (i.e. incremental); previously we limited the generator directly. This change was necessary to implement chunking and is backward compatible with respect to the produced data, but your resource may be queried many times to get a "new" item, i.e. one that is not filtered out by incremental.
https://dlthub.com/docs/examples/backfill_in_chunks
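
A hedged sketch of a chunked backfill, assuming the max_items / max_time keyword names from the add_limit docs; the resource below is illustrative and yields ordered rows so every run can resume from the incremental checkpoint.

```python
import dlt

# illustrative resource: ordered ids so each chunk resumes from the checkpoint
@dlt.resource(primary_key="id")
def big_table(cursor=dlt.sources.incremental("id", initial_value=0)):
    for i in range(cursor.last_value + 1, 1_000_000):
        yield {"id": i}

pipeline = dlt.pipeline(pipeline_name="backfill_demo", destination="duckdb")

# each run loads at most 10_000 rows or stops after 60 seconds;
# rerun the pipeline until the resource is exhausted
pipeline.run(big_table().add_limit(max_items=10_000, max_time=60))
```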

Docs

  • prepare dataset release & docs updates by @sh-rp in #2126
  • Add missing mention of the required endpoint_url config in GCS by @trymzet in #2120
  • example of how to use add_limit to do large backfills in steps by @sh-rp in #2131
  • Update auth info in databricks docs by @VioletM in #2153
  • improve the "How dlt works" page by @sh-rp in #2152
  • explicitly add docs for destination item size control by @HulmaNaseer in #2118
  • Docs: rest_api tutorial: update primary key in merge example by @burnash in #2147

Verified Sources

The code has been updated to dlt 1.x.x and tests pass again. We are accepting contributions again.
ℹ️ The 0.5 sources live on the 0.5 tag. If you are still on dlt 0.5.x, access this tag via dlt init sql_database duckdb --branch 0.5

New Contributors

Full Changelog: 1.4.1...1.5.0