Releases: dask-contrib/dask-deltatable
Dask-deltatable v0.3.1
This release fixes a bug that occurred when reading datasets on a distributed cluster.
Dask-deltatable v0.3
New Features and Enhancements
- More efficient Dask Graph generation (#24)
- Transactional write support for append-only write operations with `to_deltalake` (#29)
- Reader now supports partition pruning to load only the files that match the provided filters (#30)
- DAT reader acceptance testing against Spark-generated data (#47)
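The partition-pruning feature above skips parquet files whose recorded partition values cannot satisfy the filters, so they are never opened. A minimal pure-Python sketch of the idea, assuming pyarrow-style `(column, op, value)` filter tuples; the `add_actions` structure and `prune` helper are illustrative, not dask-deltatable's actual API:

```python
# Illustrative sketch of partition pruning: keep only the files whose
# Delta-log partition values satisfy the filters, so non-matching
# parquet files are never read.

OPS = {
    "==": lambda a, b: a == b,
    "!=": lambda a, b: a != b,
    ">":  lambda a, b: a > b,
    "<":  lambda a, b: a < b,
}

def prune(add_actions, filters):
    """add_actions: [{'path': ..., 'partition_values': {...}}, ...]
    filters: list of (column, op, value) tuples (pyarrow-style)."""
    kept = []
    for action in add_actions:
        parts = action["partition_values"]
        # A file survives only if every filter on a partition column matches;
        # filters on non-partition columns cannot prune at this level.
        if all(OPS[op](parts[col], val)
               for col, op, val in filters if col in parts):
            kept.append(action["path"])
    return kept

log = [
    {"path": "year=2023/part-0.parquet", "partition_values": {"year": 2023}},
    {"path": "year=2024/part-0.parquet", "partition_values": {"year": 2024}},
]
print(prune(log, [("year", "==", 2024)]))  # only the 2024 file is read
```

With no filters, every file is kept; the real reader then builds Dask tasks only for the surviving files.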
Breaking changes
- Removed the `vacuum_table` (#16) and `history` (#17) commands. Instead, please use the native delta-rs functionality; see https://delta-io.github.io/delta-rs/python/usage.html#vacuuming-tables and https://delta-io.github.io/delta-rs/python/usage.html#history
- Minimum supported Python version is now 3.9
- Renamed `read_delta_table` to `read_deltatable`
Dask and delta-rs integration
This release builds a wrapper around the Rust-backed delta-rs package and uses Dask for parallel reading.
Features:
- Reads the parquet files listed in the delta logs in parallel using the Dask engine
- Supports filesystems such as s3, azurefs, and gcsfs
- Supports delta features such as:
  - Time travel
  - Schema evolution
  - Parquet filters
    - Row filter
    - Partition filter
- Query Delta commit info (history)
- Vacuum old/unused parquet files
- Load different versions of data using a datetime
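Time travel works by replaying the commits in the Delta log only up to the requested version, which yields the set of parquet files that made up the table at that point. A rough pure-Python sketch of the idea, assuming a simplified `{version: [(action, path)]}` log structure; this is illustrative, not the actual delta-rs internals:

```python
# Illustrative: reconstruct the set of live parquet files at a given
# table version by replaying add/remove actions from the Delta log.

def files_at_version(commits, version):
    """commits: {version: [(kind, path)]} with kind 'add' or 'remove'.
    Returns the parquet files that make up the table at `version`."""
    live = set()
    for v in sorted(commits):
        if v > version:          # time travel: stop at the requested version
            break
        for kind, path in commits[v]:
            if kind == "add":
                live.add(path)
            else:                # 'remove' tombstones a previously added file
                live.discard(path)
    return sorted(live)

commits = {
    0: [("add", "part-0.parquet")],
    1: [("add", "part-1.parquet")],
    2: [("remove", "part-0.parquet"), ("add", "part-2.parquet")],
}
print(files_at_version(commits, 1))  # ['part-0.parquet', 'part-1.parquet']
print(files_at_version(commits, 2))  # ['part-1.parquet', 'part-2.parquet']
```

Loading by datetime follows the same pattern: the commit timestamps are used to resolve a datetime to a version number first.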
DeltaTable reader using Dask
- Reads delta tables in parallel using Dask
- Ability to read from different filesystems such as S3, azurefs, and gcsfs
- Supports delta features such as:
  - Time travel
  - Schema evolution
  - Parquet filters, including row and partition filters