Skip to content

Latest commit

 

History

History
48 lines (32 loc) · 940 Bytes

README.md

File metadata and controls

48 lines (32 loc) · 940 Bytes

Blended Dataset Loop

This repository contains a simple loop to compute a balanced ordering of dataset indices to train on. The resulting ordering ensures the data distribution is similar among batches. The loop is implemented in Rust for performance reasons and can be consumed as part of a Python package.

The package uses cffi (rather than e.g. PyO3) in order to be compatible with different Python versions.

Requirements

  • Conda (for Python)
  • Cargo with nightly Rust

Setup

Create a conda environment as follows:

conda create -n blended_dataset_loop python=3.9 -y
conda activate blended_dataset_loop

Install Rust nightly

rustup override set nightly-2024-02-03

Install the Python dev-dependencies:

pip install 'maturin[patchelf]'
pip install '.[dev]'

Develop

After changing the Rust code, run:

maturin develop

or, for release mode

maturin develop --release