Releases: benbrandt/text-splitter

v0.2.2 - Add all features to docs.rs

08 May 16:39

Add all features to docs.rs

Full Changelog: v0.2.1...v0.2.2

v0.2.1

08 May 15:21

New Features

  • impl Default for TextSplitter using Characters. Character count is used for chunk length by default.
  • Specify the current MSRV (1.62.1)
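
As a rough illustration of what a character-counting default means, here is a minimal self-contained sketch. It does not use the crate itself: `Sizer`, `DemoSplitter`, and `size_of` are hypothetical names invented for this example, and only the name `Characters` comes from the release note.

```rust
/// Hypothetical trait for measuring chunk length (an assumption for this sketch).
trait Sizer {
    fn size(&self, chunk: &str) -> usize;
}

/// Measures length as a count of Unicode scalar values, not bytes.
struct Characters;

impl Sizer for Characters {
    fn size(&self, chunk: &str) -> usize {
        chunk.chars().count()
    }
}

/// Hypothetical stand-in for a splitter that is generic over its sizer.
struct DemoSplitter<S: Sizer> {
    sizer: S,
}

/// The default splitter measures chunk length in characters.
impl Default for DemoSplitter<Characters> {
    fn default() -> Self {
        Self { sizer: Characters }
    }
}

impl<S: Sizer> DemoSplitter<S> {
    /// Reports how large the configured sizer considers `chunk` to be.
    fn size_of(&self, chunk: &str) -> usize {
        self.sizer.size(chunk)
    }
}
```

With this sketch, a `DemoSplitter<Characters>` built through `Default::default()` measures "héllo" as 5 characters, even though the string occupies 6 bytes.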

Full Changelog: v0.2.0...v0.2.1

v0.2.0 - Simpler chunking interface

08 May 01:52

Breaking Changes

Simpler Chunking API

Simplified API for the main use case. TextSplitter now only exposes two chunking methods:

  • chunks
  • chunk_indices

The other methods are now private. They were likely to cause confusion because they return merged versions of the semantic units rather than the units themselves.

You now also specify the chunk size directly in these methods, which allows reusing a single TextSplitter for different chunk sizes.
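
The shape of this interface can be sketched without the crate. The example below is an assumption-laden stand-in, not the crate's implementation: `FixedWidthSplitter` is a hypothetical type that cuts every `chunk_size` characters, whereas the real splitter merges semantic units. It also returns a `Vec` for simplicity. Only the method names `chunks` and `chunk_indices`, and the idea of passing the chunk size per call, come from the notes above.

```rust
/// Hypothetical stand-in for a splitter (an assumption for this sketch).
/// It holds no chunk size, so one value can serve many sizes.
struct FixedWidthSplitter;

impl FixedWidthSplitter {
    /// Returns (byte offset, chunk) pairs, each chunk holding at most
    /// `chunk_size` characters.
    fn chunk_indices<'a>(&self, text: &'a str, chunk_size: usize) -> Vec<(usize, &'a str)> {
        assert!(chunk_size > 0, "chunk_size must be positive");
        let mut out = Vec::new();
        let mut start = 0; // byte offset where the current chunk begins
        let mut count = 0; // characters collected in the current chunk
        for (idx, _) in text.char_indices() {
            if count == chunk_size {
                // The current chunk is full: close it at this char boundary.
                out.push((start, &text[start..idx]));
                start = idx;
                count = 0;
            }
            count += 1;
        }
        // Emit whatever remains after the last full chunk.
        if start < text.len() {
            out.push((start, &text[start..]));
        }
        out
    }

    /// Same chunks as `chunk_indices`, without the offsets.
    fn chunks<'a>(&self, text: &'a str, chunk_size: usize) -> Vec<&'a str> {
        self.chunk_indices(text, chunk_size)
            .into_iter()
            .map(|(_, chunk)| chunk)
            .collect()
    }
}
```

Because the size is an argument rather than a field, the same `FixedWidthSplitter` value can be called with `chunks("abcdefg", 3)` to get "abc", "def", "g", and then with `chunk_indices("abcdefg", 4)` to get (0, "abcd") and (4, "efg").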

Allow passing in tokenizers directly

Rather than wrapping a tokenizer in another struct, you can instead just pass a tokenizer directly into TextSplitter::new.
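
One way such a design can work is to implement the sizing trait on the tokenizer type itself, so no wrapper struct is needed. The sketch below illustrates that pattern under stated assumptions: `Measure`, `WhitespaceTokenizer`, `SizedSplitter`, and `fits` are all hypothetical names invented here, and the crate's actual trait and supported tokenizers may differ.

```rust
/// Hypothetical trait: anything that can measure the size of a chunk.
trait Measure {
    fn size(&self, chunk: &str) -> usize;
}

/// Stand-in for a third-party tokenizer type (an assumption for this sketch).
struct WhitespaceTokenizer;

impl WhitespaceTokenizer {
    /// Splits text into whitespace-separated tokens.
    fn tokenize<'a>(&self, text: &'a str) -> Vec<&'a str> {
        text.split_whitespace().collect()
    }
}

/// Implementing the trait on the tokenizer itself removes the need
/// for a separate wrapper struct.
impl Measure for WhitespaceTokenizer {
    fn size(&self, chunk: &str) -> usize {
        self.tokenize(chunk).len()
    }
}

/// Hypothetical splitter that accepts any `Measure` implementation directly.
struct SizedSplitter<M: Measure> {
    measure: M,
}

impl<M: Measure> SizedSplitter<M> {
    fn new(measure: M) -> Self {
        Self { measure }
    }

    /// Reports whether `chunk` is within `chunk_size` units of the measure.
    fn fits(&self, chunk: &str, chunk_size: usize) -> bool {
        self.measure.size(chunk) <= chunk_size
    }
}
```

Here `SizedSplitter::new(WhitespaceTokenizer)` takes the tokenizer as-is, and `fits("one two three", 3)` is true because the chunk contains three tokens.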

Bug Fixes

Improved recursive paragraph chunking so that it handles text in which both double and single newline splits are used.

v0.1.0 - Initial Release

05 May 19:01

Initial release to crates.io