A list of useful data augmentation resources, including less common techniques, libraries, links to GitHub repos, papers, and more.
Updated Aug 14, 2024
This repository collects papers for "A Survey on Knowledge Distillation of Large Language Models". We break down KD into Knowledge Elicitation and Distillation Algorithms, and explore the Skill & Vertical Distillation of LLMs.
[CVPR 2020--Oral] CycleISP: Real Image Restoration via Improved Data Synthesis
Computer vision utils for Blender (generate instance annotation, depth, and 6D pose with one line of code)
[CVPR 2023] Label-Free Liver Tumor Segmentation
Repository for the results of my master thesis, about the generation and evaluation of synthetic data using GANs
Official code for Customizable Embodied Multi-modal Perturbations for SLAM Robustness Benchmarking
Code & data for ICLR 2024 spotlight paper: 🍯MUSTARD: Mastering Uniform Synthesis of Theorem and Proof Data
Official repository for Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning
Code & Dataset for Paper: "Distill Visual Chart Reasoning Ability from LLMs to MLLMs"
A data framework for music information retrieval focusing on electronic music.
Official implementation of the EMNLP 2022 paper "Generate, Discriminate, and Contrast: A Semi-Supervised Sentence Representation Learning Framework"
Boosting Document Intelligence
Source code for LDPTrace: Locally Differentially Private Trajectory Synthesis. VLDB 2023.
Apache NiFi Data Synthesizer
A data synthesizer for creating datasets of feet from a first-person perspective.
[Preprint] Deformation-Recovery Diffusion Model (DRDM): Instance Deformation for Image Manipulation and Synthesis
Coursera - RNN Programming Assignment: In this project, we will construct a speech dataset and implement an algorithm for trigger word detection (sometimes also called keyword detection, or wake word detection).
Blender Python package for extracting internal data from Blender scenes for 3D-related data generation purposes.
The Coastal Carbon Network Data Library: An open-source database featuring carbon data from tidal wetlands around the world