Scalable data preprocessing and curation toolkit for LLMs
python data data-processing data-preparation deduplication data-quality data-curation data-prep fine-tuning fast-data-processing data-processing-pipelines datacuration large-language-models llm llmapps large-scale-data-processing datarecipes semantic-deduplication llm-data-quality
Updated Dec 20, 2024 - Jupyter Notebook