Prepare for karaoke video creation, by downloading audio from YouTube and lyrics from Genius for a given song.
This was created to make it easier for me to prepare the source audio, lyrics and instrumental audio so I can then do the actual karaoke creation in focus mode without internet.
- Audio Fetching: Automatically downloads audio in WAV format for the YouTube URL provided (or searches for the song/artist name and fetches the top result).
- Lyrics Fetching: Automatically fetches song lyrics from Genius.com using LyricsGenius.
- Audio Separation: Separates the downloaded audio into instrumental and vocal tracks, using audio-separator.
- Multiple Audio Models: Runs audio separation with 2 different models (by default,
UVR_MDXNET_KARA_2
andUVR-MDX-NET-Inst_HQ_3
) to give you options for the backing track. - Easy Configuration: Control the tool's behavior using command-line arguments.
- Organized Outputs: Creates structured output directories for easy access to generated tracks and lyrics.
- Internet First: Completes operations which require internet first, in case user is preparing last-minute before a period of being offline!
- Flexible Input: provide just a YouTube URL, just an artist and title, or both.
- Playlist Processing: Capable of processing an entire YouTube playlist, extracting audio and lyrics for each track.
- Easy Finalisation: After manually performing your sync (e.g. using MidiCo), run
karaoke-finalise
to remux and add your title screen!
You'll need Python version 3.9, 3.10 or 3.11 (newer versions aren't supported by PyTorch or ONNX Runtime yet).
You can install Karaoke Prep using pip:
pip install karaoke-prep
You'll then also need to install a version of audio-separator
for your hardware:
- Mac with Apple Silicon:
pip install audio-separator[silicon]
- Linux/Windows with CUDA GPU:
pip install audio-separator[gpu]
- Anything else:
pip install audio-separator[cpu]
If you're trying to use CUDA for GPU acceleration, you'll also need to ensure the version of PyTorch you have installed is compatible with the CUDA version you have installed.
While PyTorch should get installed as a dependency of karaoke-prep
anyway, you may need to install PyTorch for your CUDA version by running the installation command provided by the official install wizard: https://pytorch.org/get-started/locally/
If you encounter this error: UnicodeEncodeError: 'charmap' codec can't encode character
Try setting the Python I/O encoding in your environment: export PYTHONIOENCODING=utf-8
If you aren't sure what pip
is or how to use the command line, here are some YouTube tutorials which will probably help:
- Install Python on Windows: https://www.youtube.com/watch?v=YKSpANU8jPE
- Command Line basics Windows: https://www.youtube.com/watch?v=MBBWVgE0ewk
- Install Python on Mac: https://www.youtube.com/watch?v=3-sPfR4JEQ8
- Command Line basics Mac: https://www.youtube.com/watch?v=FfT8OfMpARM
- Get a Genius API Access Token (docs) and set it as an environment variable named
GENIUS_API_TOKEN
- Run
karaoke-prep <Artist> <Title>
to fetch and prep the files above - Wait for it to output at least the WAV file and lyrics (you can leave it running in the background while you sync)
- Open the WAV file in MidiCo:
(YouTube xxxxxxxxxxx).wav
- Copy/paste the lyrics into MidiCo:
(Lyrics Processed).txt
- Perform the lyrics sync (here's a video showing me doing it) in MidiCo
- Render the video to 4k using the MidiCo "Export Movie" feature, saving it as
Artist - Title (Karaoke).mov
in the same folder - Run
karaoke-finalise
to remux the instrumental audio and join the title clip to the start - Upload the resulting
Artist - Title (Final Karaoke).mp4
video to YouTube!
Here's my tutorial video with verbal explanation of the whole process, and here's a normal speed demo of me doing it (8 minutes total for a single track).
You can use Karaoke Prep via the command line:
usage: karaoke-prep [-h] [-v] [--log_level LOG_LEVEL] [--model_name MODEL_NAME] [--model_file_dir MODEL_FILE_DIR] [--output_dir OUTPUT_DIR] [--use_cuda] [--use_coreml]
[--denoise DENOISE] [--normalize NORMALIZE] [--create_track_subfolders]
[artist] [title] [url]
Fetch audio and lyrics for a specified song, to prepare karaoke video creation.
positional arguments:
args [YouTube video or playlist URL] [Artist] [Title] of song to prep. If URL is provided, Artist and Title are optional but increase chance of fetching the correct lyrics. If Artist and Title are provided with no URL, the top YouTube search result will be fetched.
options:
-h, --help show this help message and exit
-v, --version show program's version number and exit
--log_level LOG_LEVEL Optional: logging level, e.g. info, debug, warning (default: info). Example: --log_level=debug
--model_name MODEL_NAME Optional: model name to be used for separation (default: UVR_MDXNET_KARA_2). Example: --model_name=UVR-MDX-NET-Inst_HQ_3
--model_file_dir MODEL_FILE_DIR Optional: model files directory (default: /tmp/audio-separator-models/). Example: --model_file_dir=/app/models
--output_dir OUTPUT_DIR Optional: directory to write output files (default: <current dir>). Example: --output_dir=/app/karaoke
--use_cuda Optional: use Nvidia GPU with CUDA for separation (default: False). Example: --use_cuda=true
--use_coreml Optional: use Apple Silicon GPU with CoreML for separation (default: False). Example: --use_coreml=true
--denoise DENOISE Optional: enable or disable denoising during separation (default: True). Example: --denoise=False
--normalize NORMALIZE Optional: enable or disable normalization during separation (default: True). Example: --normalize=False
--no_track_subfolders Optional: do NOT create a named subfolder for each track. Example: --no_track_subfolders
--intro_background_color INTRO_BACKGROUND_COLOR Optional: Background color for intro video (default: black). Example: --intro_background_color=#123456
--intro_background_image INTRO_BACKGROUND_IMAGE Optional: Path to background image for intro video. Overrides background color if provided. Example: --intro_background_image=path/to/image.jpg
--intro_font INTRO_FONT Optional: Font file for intro video (default: Avenir-Next-Bold). Example: --intro_font=AvenirNext-Bold.ttf
--intro_artist_color INTRO_ARTIST_COLOR Optional: Font color for intro video artist text (default: #ff7acc). Example: --intro_artist_color=#123456
--intro_title_color INTRO_TITLE_COLOR Optional: Font color for intro video title text (default: #ffdf6b). Example: --intro_title_color=#123456
For the most consistent results, provide a specific YouTube URL along with the artist and title like so:
karaoke-prep "https://www.youtube.com/watch?v=YOUR_VIDEO_ID" "The Fray" "Never Say Never"
If you don't have a specific YouTube URL, just provide the artist and title. Karaoke Prep will search for and download the top YouTube result.
karaoke-prep "The Fray" "Never Say Never"
This will process the video at the given URL, guessing the artist and title from the YouTube title.
karaoke-prep "https://www.youtube.com/watch?v=YOUR_VIDEO_ID"
To process a playlist, just provide the playlist URL. The script will process every video in the playlist.
karaoke-prep "https://www.youtube.com/playlist?list=YOUR_PLAYLIST_ID"
After running karaoke-prep
you should have the following 11 files, grouped into folder(s) for each track:
Artist - Title (YouTube xxxxxxxxxxx).webm
- Original unmodified video, fetched from YouTube in the highest quality available (may not always be WebM)
- You probably don't need this unless you're creating a custom request
Artist - Title (YouTube xxxxxxxxxxx).png
- A still image taken from 30 seconds into the video, in case that's useful for a custom background
- You probably don't need this unless you're creating a custom request
Artist - Title (YouTube xxxxxxxxxxx).wav
- Unmodified audio from the YouTube video, converted to WAV format for compatibility
- You should open this in MidiCo to begin syncing lyrics to make your karaoke video.
Artist - Title (Lyrics).txt
- Unmodified lyrics fetched from genius.com based on the artist/title provided
- Depending on the song, may have lines which are too long for one karaoke screen
Artist - Title (Lyrics Processed).txt
- Lyrics from genius, split into lines no longer than 40 characters
- You should open this and copy/paste the text into the MidiCo lyrics editor
Artist - Title (Vocals UVR-MDX-NET-Inst_HQ_3).mp3
- Vocal track from the audio, separated using the UVR Inst_HQ_3 model.
- You probably don't need this unless you're trying to tweak the backing track
Artist - Title (Vocals UVR_MDXNET_KARA_2).mp3
- Vocal track from the audio, separated using the UVR KARA_2 model.
- You probably don't need this unless you're trying to tweak the backing track
Artist - Title (Instrumental UVR-MDX-NET-Inst_HQ_3).mp3
- Instrumental track from the audio, separated using the UVR Inst_HQ_3 model.
- This is typically a safe bet for the instrumental track but will not include any backing vocals.
Artist - Title (Instrumental UVR_MDXNET_KARA_2).mp3
- Instrumental track from the audio, separated using the UVR KARA_2 model.
- This is the default choice for the finalisation step below as it usually does a good job of keeping backing vocals. However for some songs it includes far too much or is kinda glitchy, so you should check it before finalising and make a judgement call about using this vs. the Inst_HQ_3 one.
Artist - Title (Title).png
- Title screen static image; if you specify your own background image, font, color etc. this should give you a convenient way to add a title screen to the start of your video.
Artist - Title (Title).mov
- 5 second video clip version of the title screen image, ready to be joined by the finalisation step.
After completing your manual sync process and rendering your Artist - Title (Karaoke).mov
file (still using the original audio!) into the same folder, you can now run karaoke-finalise
.
This will output some additional files:
Artist - Title (Karaoke).mov
- Karaoke video without title screen, remuxed to use the instrumental audio
Artist - Title (With Vocals).mov
- Karaoke video without title screen, using the original audio (useful for practicing!)
Artist - Title (Final Karaoke).mp4
- Final karaoke video with 5 second title screen intro, instrumental audio and converted to MP4 for compatibility and reduced file size. Upload this to YouTube!
Python >= 3.9
Libraries: onnx, onnxruntime, numpy, soundfile, librosa, torch, wget, six
This project uses Poetry for dependency management and packaging. Follow these steps to setup a local development environment:
- Make sure you have Python 3.9 or newer installed on your machine.
- Install Poetry by following the installation guide here.
Clone the repository to your local machine:
git clone https://github.com/YOUR_USERNAME/karaoke-prep.git
cd karaoke-prep
Replace YOUR_USERNAME with your GitHub username if you've forked the repository, or use the main repository URL if you have the permissions.
Run the following command to install the project dependencies:
poetry install
To activate the virtual environment, use the following command:
poetry shell
You can run the CLI command directly within the virtual environment. For example:
karaoke-prep 1
Once you are done with your development work, you can exit the virtual environment by simply typing:
exit
To build the package for distribution, use the following command:
poetry build
This will generate the distribution packages in the dist directory - but for now only @beveradb will be able to publish to PyPI.
Contributions are very much welcome! Please fork the repository and submit a pull request with your changes, and I'll try to review, merge and publish promptly!
- This project is 100% open-source and free for anyone to use and modify as they wish.
- If the maintenance workload for this repo somehow becomes too much for me I'll ask for volunteers to share maintainership of the repo, though I don't think that is very likely
This project is licensed under the MIT License.
- Please Note: If you choose to integrate this project into some other project using the default model or any other model trained as part of the UVR project, please honor the MIT license by providing credit to UVR and its developers!
- Anjok07 - Author of Ultimate Vocal Remover GUI, which was essential for the creation of audio-separator! Thank you!
For questions or feedback, please raise an issue or reach out to @beveradb (Andrew Beveridge) directly.