This is a simple Kedro pipeline that shows how custom datasets can be leveraged to process and transcribe raw audio files using torchaudio.
The pipeline includes one custom dataset class, `AudioDataSet`,
and two nodes, as shown in the image below:
- Clone the repository
- Create a new conda environment:
  ```
  conda create -n audio_pipeline python=3.10
  conda activate audio_pipeline
  ```
- Install the dependencies:
  ```
  pip install -r requirements.txt
  ```
- Make sure that ffmpeg and ffmpeg-python are installed on your machine
- Create an `.env` file that contains the following value:
  ```
  OPENAI_API_KEY= # your API key
  ```
- Put the `.mp3` audio files that should be processed into the directory `data/01_raw`
- Modify the Whisper parameters in `conf/base/parameters.yml` according to your needs
- Run the pipeline:
  ```
  kedro run
  ```
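Projects typically read the key from `.env` with the python-dotenv package (`from dotenv import load_dotenv`). For illustration, here is a minimal stdlib-only version of that step; the function name mirrors python-dotenv's, and the loader is a sketch that skips quoting and inline-comment handling:

```python
import os
from pathlib import Path


def load_dotenv(path: str = ".env") -> None:
    """Minimal .env loader: export KEY=VALUE lines into os.environ."""
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments, and malformed lines
        key, _, value = line.partition("=")
        # Do not overwrite variables already set in the shell
        os.environ.setdefault(key.strip(), value.strip())
```

After calling `load_dotenv()`, the code can fetch the key with `os.environ.get("OPENAI_API_KEY")`.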
Kedro then runs the pipeline and executes the following steps:

- Load the audio files located in `01_raw`
- Reduce noise in the audio files by running the node `int_reduce_noise`
- Save the audio files with reduced noise to `05_model_input`
- Transcribe the audio files with OpenAI Whisper by running the node `pri_transcription`
- Save the transcripts to `07_model_output`