
UltraSinger

⚠️ This project is still under development!

UltraSinger is a tool that automatically creates UltraStar.txt, MIDI and sheet-music notes from music. It pitches UltraStar files automatically, adds text and tapping to them, and creates separate UltraStar karaoke files. It can also re-pitch existing UltraStar files and calculate the possible in-game score.

Multiple AI models are used to extract text from the voice and to determine the pitch.

Please mention UltraSinger in your UltraStar.txt file if you use it. It helps others find this tool, and it helps this tool get improved and maintained. You should only use it on Creative Commons licensed songs.

❤️ Support

There are many ways to support this project. Starring ⭐️ the repo is just one 🙏

You can also support this work on GitHub sponsors or Patreon or Buy Me a Coffee.

This will help me a lot to keep this project alive and improve it.


💻 How to use this source code

Installation

  • Install Python 3.10 (older and newer versions have some breaking changes). Download
  • Also install ffmpeg separately and add it to your PATH. Download
  • Go to the install folder and run the install script for your OS.
    • Choose GPU if you have an NVIDIA CUDA GPU.
    • Choose CPU if you don't have an NVIDIA CUDA GPU.
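
Before running the install script, it can help to confirm the two prerequisites named above. The following is a minimal sketch, not part of UltraSinger: the function name check_prerequisites is hypothetical, and it only checks the interpreter version and whether an ffmpeg executable is found on PATH.

```python
import shutil
import sys


def check_prerequisites(required=(3, 10)):
    """Report whether the documented prerequisites look satisfied.

    Hypothetical helper: UltraSinger's own install scripts may check
    different things.
    """
    # Compare only major.minor, since the README asks for Python 3.10.
    python_ok = sys.version_info[:2] == tuple(required)
    # shutil.which returns None when the executable is not on PATH.
    ffmpeg_path = shutil.which("ffmpeg")
    return {
        "python_version": f"{sys.version_info.major}.{sys.version_info.minor}",
        "python_ok": python_ok,
        "ffmpeg_path": ffmpeg_path,
        "ffmpeg_ok": ffmpeg_path is not None,
    }


if __name__ == "__main__":
    for key, value in check_prerequisites().items():
        print(f"{key}: {value}")
```

If python_ok or ffmpeg_ok comes back False, fix that first; the install script is likely to fail otherwise.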

Run

  • In the root folder, run run_on_windows.bat or run_on_linux.sh to start the app.
  • Now you can use the UltraSinger source code with py UltraSinger.py [opt] [mode] [transcription] [pitcher] [extra]. See How to use the App for more information.

📖 How to use the App

Not all options are working yet!

    UltraSinger.py [opt] [mode] [transcription] [pitcher] [extra]
    
    [opt]
    -h      This help text.
    -i      Ultrastar.txt
            audio like .mp3, .wav, youtube link
    -o      Output folder
    
    [mode]
    ## if INPUT is audio ##
    default  Creates all
    
    # Single file creation selection is in progress; you currently get all files!
    (-u      Create ultrastar txt file) # In Progress
    (-m      Create midi file) # In Progress
    (-s      Create sheet file) # In Progress
    
    ## if INPUT is ultrastar.txt ##
    default  Creates all

    [transcription]
    # Default is whisper
    --whisper               Multilingual model > tiny|base|small|medium|large-v1|large-v2  >> ((default) is large-v2)
                            English-only model > tiny.en|base.en|small.en|medium.en
    --whisper_align_model   Use other languages model for Whisper provided from huggingface.co
    --language              Override the language detected by whisper, does not affect transcription but steps after transcription
    --whisper_batch_size    Reduce if low on GPU mem >> ((default) is 16)
    --whisper_compute_type  Change to "int8" if low on GPU mem (may reduce accuracy) >> ((default) is "float16" for cuda devices, "int8" for cpu)
    
    [pitcher]
    # Default is crepe
    --crepe            tiny|full >> ((default) is full)
    --crepe_step_size  unit is milliseconds >> ((default) is 10)
    
    [extra]
    --hyphenation           Use automatic hyphenation > True|False >> ((default) is True)
    --disable_separation    Disable track separation > True|False >> ((default) is False)
    --disable_karaoke       True|False >> ((default) is False)
    --ignore_audio          True|False >> ((default) is False)
    --create_audio_chunks   True|False >> ((default) is False)
    --keep_cache            Keep cache folder after generation > True|False >> ((default) is False)
    --plot                  Create a pitch plot > True|False >> ((default) is False)
    --format_version        0.3.0|1.0.0|1.1.0|1.2.0 >> ((default) is 1.2.0)
    --musescore_path        path to MuseScore executable
    --keep_numbers          Transcribe numbers as digits and not words > True|False >> ((default) is False)
    
    [yt-dlp]
    --cookiefile            File name where cookies should be read from

    [device]
    --force_cpu             True|False >> ((default) is False)  All steps will be forced to cpu
    --force_whisper_cpu     True|False >> ((default) is False)  Only whisper will be forced to cpu
    --force_crepe_cpu       True|False >> ((default) is False)  Only crepe will be forced to cpu

For standard use, you only need to use [opt]. All other options are optional.
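
If you want to call UltraSinger from another script, the options above can be assembled into an argument list. This is only a sketch under assumptions: build_command is a hypothetical helper, it assumes UltraSinger.py is reachable from the current working directory, and it uses only flags that appear in the help text above.

```python
import sys


def build_command(input_path, output_dir, whisper=None, crepe=None, extra=None):
    """Assemble an argument list for UltraSinger.py (hypothetical helper)."""
    # -i and -o are the [opt] flags from the help text.
    cmd = [sys.executable, "UltraSinger.py", "-i", input_path, "-o", output_dir]
    if whisper is not None:
        cmd += ["--whisper", whisper]
    if crepe is not None:
        cmd += ["--crepe", crepe]
    # Extra options are passed as "--name value", matching the True|False style above.
    for flag, value in (extra or {}).items():
        cmd += [f"--{flag}", str(value)]
    return cmd


if __name__ == "__main__":
    cmd = build_command(
        "input/music.mp3", "output",
        whisper="tiny", crepe="tiny", extra={"keep_cache": True},
    )
    print(" ".join(cmd))
    # To actually run it (assumption: dependencies are installed in this interpreter):
    # import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.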

🎶 Input

Audio (full automatic)

Local file

-i "input/music.mp3"

YouTube

-i https://www.youtube.com/watch?v=BaW_jenozKc

If you run into a yt-dlp error such as "Sign in to confirm you’re not a bot. This helps protect our community" (yt-dlp issue), you can follow these steps:

  • Generate a cookies.txt file with yt-dlp: yt-dlp --cookies cookies.txt --cookies-from-browser firefox
  • Then pass the cookies.txt to UltraSinger: --cookiefile cookies.txt

UltraStar (re-pitch)

This re-pitches the audio and creates a new txt file.

-i "input/ultrastar.txt"
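
The three kinds of -i value described in this section (local audio, a link, an UltraStar txt) can be told apart with a few lines of code. The sketch below is purely illustrative: classify_input is a hypothetical function, and UltraSinger's real detection logic may differ.

```python
from pathlib import Path
from urllib.parse import urlparse


def classify_input(value):
    """Guess which documented input mode a -i value belongs to (illustrative only)."""
    # Links such as YouTube URLs start with an http(s) scheme.
    if urlparse(value).scheme in ("http", "https"):
        return "url"
    # An UltraStar file is a .txt file and triggers the re-pitch mode.
    if Path(value).suffix.lower() == ".txt":
        return "ultrastar"
    # Everything else is treated as a local audio file (.mp3, .wav, ...).
    return "audio"


if __name__ == "__main__":
    for example in (
        "input/music.mp3",
        "https://www.youtube.com/watch?v=BaW_jenozKc",
        "input/ultrastar.txt",
    ):
        print(example, "->", classify_input(example))
```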

🗣 Transcriber

Keep in mind that while a larger model is more accurate, it also takes longer to transcribe.

Whisper

For a first test run, use the tiny model; for accurate results, use the large-v2 model.

-i XYZ --whisper large-v2

Whisper languages

The default language models currently provided are en, fr, de, es, it, ja, zh, nl, uk, pt. If your language is not in this list, you need to find a phoneme-based ASR model on the 🤗 Hugging Face model hub. It will be downloaded automatically.

Example for Romanian:

-i XYZ --whisper_align_model "gigant/romanian-wav2vec2"

✍️ Hyphenation

Hyphenation is on by default. You can deactivate it if it does not produce anything useful. Note that the word is simply split, without checking whether each separated part really starts, or is heard, at that position.

-i XYZ --hyphenation True

👂 Pitcher

Pitching is done with the crepe model. As with transcription, a bigger model is more accurate but also takes longer to pitch. For testing, use tiny. If you want solid accuracy, use the full model.

-i XYZ --crepe full
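
The --crepe_step_size option also affects how much work the pitcher does. As a rough back-of-the-envelope sketch (the function name is hypothetical, and the exact frame count depends on how crepe pads the audio), the number of pitch estimates is about the audio length divided by the step size:

```python
import math


def approx_pitch_frames(duration_seconds, step_size_ms=10):
    """Approximate number of pitch estimates for a given step size.

    Illustrative arithmetic only; crepe's real output length may be off
    by a frame or so depending on padding.
    """
    if step_size_ms <= 0:
        raise ValueError("step_size_ms must be positive")
    return math.ceil(duration_seconds * 1000 / step_size_ms)


if __name__ == "__main__":
    # A three-minute song at the default 10 ms step versus a coarser 20 ms step.
    print(approx_pitch_frames(180))       # 18000
    print(approx_pitch_frames(180, 20))   # 9000
```

A larger step size means fewer estimates and faster pitching, at the cost of coarser time resolution.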

👄 Separation

The vocals are separated from the audio before they are passed to the models. If this causes problems, you can disable it, in which case the original audio file is used instead.

-i XYZ --disable_separation True

Sheet Music

For sheet music generation you need MuseScore installed on your system. Alternatively, provide the path to the MuseScore executable.

-i XYZ --musescore_path "C:/Program Files/MuseScore 4/bin/MuseScore4.exe"

Format Version

This defines the format version of the UltraStar.txt file. For more info see Official UltraStar format specification.

You can choose between several format versions. The default is 1.2.0, as listed under --format_version in the options above.

  • 0.3.0 is the old format version. Use this if you have problems with the new format.
  • 1.0.0 is the current format version.
  • 1.1.0 is the upcoming format version. It is not finished yet.

-i XYZ --format_version 1.0.0

🏆 Ultrastar Score Calculation

UltraSinger measures the score that the singer in the audio would receive. You get two scores, simple and accurate.

What is the difference? The simple score follows the game: UltraStar is not interested in pitch heights. As long as the sung note matches the note name of the target (in the range A to G, regardless of octave), you get the point. This makes sense for the game, because otherwise men would get no points for high female voices and women none for low male voices.

The accurate score uses the real tone specified in the txt. I have had txt files where the pitch was in a range not singable by humans, yet you could still reach the 10k points in the game. Accuracy is important here because the MIDI and the sheet are created from these pitches, and you want those files to be accurate as well.
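
The difference between the two scores can be sketched in a few lines. This is an illustration under assumptions, not UltraSinger's or UltraStar's actual scoring code: notes are given as MIDI numbers, score_notes is a hypothetical function, and the sketch ignores note durations and any note-type weighting the game may apply.

```python
def pitch_class(midi_note):
    """Note name without octave: 0 = C, 1 = C#, ..., 11 = B."""
    return midi_note % 12


def score_notes(expected, sung):
    """Return (simple, accurate) as fractions of matched notes.

    simple:   the note name matches, octave ignored (how the game judges).
    accurate: the exact note matches, octave included.
    """
    if len(expected) != len(sung):
        raise ValueError("expected and sung must have the same length")
    if not expected:
        return 0.0, 0.0
    pairs = list(zip(expected, sung))
    simple_hits = sum(pitch_class(e) == pitch_class(s) for e, s in pairs)
    accurate_hits = sum(e == s for e, s in pairs)
    return simple_hits / len(pairs), accurate_hits / len(pairs)


if __name__ == "__main__":
    expected = [60, 62, 64, 65]   # C4, D4, E4, F4
    sung = [72, 62, 63, 53]       # C5, D4, D#4, F3
    print(score_notes(expected, sung))  # (0.75, 0.25)
```

In the example, singing C5 and F3 instead of C4 and F4 still counts for the simple score but not for the accurate one.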

📟 Use GPU

With a GPU you can speed up the process. The quality of the transcription and pitching is also better.

You need a CUDA device for this to work. Sorry, there is no CUDA device for macOS.

It is optional (but recommended) to install the CUDA driver for your GPU: see driver. Install torch with CUDA separately in your venv: see torch+cuda. Also check your GPU's CUDA support: see cuda support.

Command for pip:

pip3 install torch==2.0.1+cu117 torchvision==0.15.2+cu117 torchaudio==2.0.2+cu117 --index-url https://download.pytorch.org/whl/cu117

If you want to use conda instead, you need a different installation command.
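
After installing torch, you can check whether it actually sees your GPU. The sketch below uses torch.cuda.is_available() and torch.cuda.get_device_name(); the wrapper function cuda_status is hypothetical and simply reports no GPU when torch is not installed.

```python
def cuda_status():
    """Report whether torch is installed and can see a CUDA device."""
    try:
        import torch
    except ImportError:
        # torch is missing from this environment, so no GPU can be used.
        return {"torch_installed": False, "cuda_available": False, "device_name": None}
    available = torch.cuda.is_available()
    # Only query the device name when a CUDA device is actually present.
    name = torch.cuda.get_device_name(0) if available else None
    return {"torch_installed": True, "cuda_available": available, "device_name": name}


if __name__ == "__main__":
    print(cuda_status())
```

If cuda_available is False even though you have an NVIDIA GPU, the installed torch build is probably a CPU-only one, or the driver is missing.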

Considerations for Windows users

The pitch tracker used by UltraSinger (crepe) uses TensorFlow as its backend. TensorFlow dropped GPU support for Windows for versions >2.10 as you can see in this release note and their installation instructions.

For now, UltraSinger runs the latest TensorFlow version that still supports GPUs on Windows.

To run later versions of TensorFlow on Windows while still taking advantage of GPU support, the suggested solution is to run UltraSinger in a container.

Crashes due to low VRAM

If something crashes because of low VRAM, use a smaller model. Whisper needs more than 8 GB of VRAM for the large model!

You can also force CPU usage with the extra option --force_cpu.
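
The advice above can be turned into a simple rule of thumb. The following sketch is an assumption-laden illustration: low_vram_flags is a hypothetical helper, the 8 GB threshold comes from the note above, and the fallback choices (medium model, batch size 8) are arbitrary examples rather than tested recommendations.

```python
def low_vram_flags(vram_gb):
    """Suggest UltraSinger flags for a given amount of GPU memory.

    vram_gb is the VRAM in gigabytes, or None when no CUDA GPU is available.
    The fallback values are illustrative assumptions.
    """
    if vram_gb is None:
        # No usable GPU: force every step onto the CPU.
        return ["--force_cpu", "True"]
    if vram_gb > 8:
        # Enough memory for the large model, per the note above.
        return ["--whisper", "large-v2"]
    # Low VRAM: smaller model, int8 compute type, smaller batch size.
    return [
        "--whisper", "medium",
        "--whisper_compute_type", "int8",
        "--whisper_batch_size", "8",
    ]


if __name__ == "__main__":
    for vram in (None, 6, 12):
        print(vram, "->", " ".join(low_vram_flags(vram)))
```

If the run still crashes, step down further, for example to the small or tiny model.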

Containerized (Docker or Podman)

See container/README.md