Extract notes from Synthesia piano videos on YouTube and export songs to MIDI files.

emilamaj/SynToMid

SynToMid

This tool helps you convert Synthesia-style piano videos from YouTube into a MIDI (.mid) file that most music tools can read. It relies heavily on OpenCV to process the video frames and extract the notes played.

Currently implemented:

  • YoutubeStitch.py: Stitches the frames of the YouTube video into a tall PNG image of the keys pressed.
  • WaterfallProcess.py: Converts the stitched image to a list of lists of rectangles representing the keys pressed. Very buggy, not yet usable for note extraction.

Todo:

  • Better processing of the notes, especially ones that are close together. (Retry pseudo-gradient descent, but with a cost function in absolute pixels, such as L = 10*black_pixels - 1*white_pixels, where the pixels counted are those enclosed by a proposed rectangle. Note that close rectangles will mask each other, so L also needs an overlap term.)
  • Implement ReadNotes.py, which converts the processed rectangles to a list of notes.
  • Fix YoutubeStitch.py so that the note timings are respected. (Find the average scroll rate and stitch blindly? The scroll rate might not be constant...)
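
The cost function proposed in the first Todo item could be sketched as follows. This is only an illustration of the idea, not code from the repository: the rectangle layout `(x, y, width, height)`, the binary `mask` representation (1 for a white pixel, 0 for a black one), the function names, and the `overlap_weight` value are all assumptions.

```python
from typing import List, Tuple

# Hypothetical rectangle layout: (x, y, width, height) in pixels.
Rect = Tuple[int, int, int, int]


def overlap_area(a: Rect, b: Rect) -> int:
    """Area in pixels shared by two axis-aligned rectangles."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    w = min(ax + aw, bx + bw) - max(ax, bx)
    h = min(ay + ah, by + bh) - max(ay, by)
    return max(w, 0) * max(h, 0)


def rect_cost(mask: List[List[int]], rect: Rect, others: List[Rect],
              overlap_weight: int = 10) -> int:
    """Cost of a candidate rectangle; lower is better.

    mask[y][x] is assumed to be 1 for a white pixel and 0 for a black one.
    """
    x, y, w, h = rect
    white = sum(mask[r][c] for r in range(y, y + h) for c in range(x, x + w))
    black = w * h - white
    # Overlap term: penalise pixels shared with other proposed rectangles.
    overlap = sum(overlap_area(rect, other) for other in others)
    return 10 * black - 1 * white + overlap_weight * overlap


# A 4-row, 6-column mask with a white block in columns 1-2.
mask = [[0, 1, 1, 0, 0, 0] for _ in range(4)]
tight = rect_cost(mask, (1, 0, 2, 4), [])
loose = rect_cost(mask, (0, 0, 3, 4), [])
crowded = rect_cost(mask, (1, 0, 2, 4), [(2, 0, 2, 4)])
```

A rectangle that fits the white block tightly scores lower than one that also encloses black pixels, and the overlap term raises the cost when a neighbouring rectangle covers the same pixels.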

YoutubeStitch.py

This tool stitches the frames of the YouTube video into a tall PNG image of the keys pressed.

Example output of YoutubeStitch.py

This is the result of the script for the first 30 seconds of Chopin - Ballade No. 1 played by Rousseau (https://www.youtube.com/watch?v=Zj_psrTUW_w).

This project will consist of 3 parts:

  • YoutubeStitch.py
  • WaterfallProcess.py
  • ReadNotes.py

YoutubeStitch.py

Here is the usage for YoutubeStitch.py:

python YoutubeStitch.py <url> <height> <interval> <start> <stop>
  • url (string): URL of the YouTube video
  • height (string): Percentage of the video height to process, measured from the top (to exclude the player's hands)
  • interval (float): Interval in seconds between sampled frames; a larger interval makes the script run faster
  • start (float): Start position of the video, in seconds (to ignore intros)
  • stop (float): Stop position of the video, in seconds
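
As a rough illustration of what these parameters mean, the sketch below turns them into a list of sampling timestamps and a crop height in pixel rows. The helper names are hypothetical, and details such as whether the stop time is inclusive or how the percentage is rounded are assumptions, not the script's actual behaviour.

```python
from typing import List, Union


def sample_times(start: float, stop: float, interval: float) -> List[float]:
    """Timestamps (in seconds) at which frames would be sampled.

    Assumes the stop time is exclusive.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    times = []
    i = 0
    while start + i * interval < stop:
        # Multiply instead of accumulating to avoid floating-point drift.
        times.append(start + i * interval)
        i += 1
    return times


def crop_rows(frame_height: int, height_percent: Union[str, float]) -> int:
    """Number of pixel rows kept, counted from the top of the frame."""
    percent = float(height_percent)  # the CLI passes this value as a string
    if not 0 < percent <= 100:
        raise ValueError("height must be in (0, 100]")
    return int(frame_height * percent / 100)


times = sample_times(10.0, 12.0, 0.5)
rows = crop_rows(720, "50")
```

With a 720-pixel-high video, a height of 50 keeps the top 360 rows, and sampling every 0.5 seconds between 10 and 12 seconds yields four frames.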

The process can run for several minutes. You can speed up the script by lowering the height processed, which also removes many of the visual artefacts that might otherwise remain at the end. However, reducing the height completely messes up the duration of long notes and long silences.

It produces a file called output.png, which is the stitched image of the video.
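
The Todo list suggests stitching "blindly" from an average scroll rate. A minimal sketch of that idea is shown below, assuming a constant scroll rate and a waterfall that scrolls downward, so that new content enters at the top of each frame. Frames are represented as plain lists of rows to keep the example self-contained; this is not how YoutubeStitch.py currently works, and `blind_stitch` and `rows_per_frame` are hypothetical names.

```python
from typing import List, Sequence


def blind_stitch(frames: Sequence[Sequence], rows_per_frame: int) -> List:
    """Stack frames into one tall image using a fixed scroll step.

    Each frame is a sequence of pixel rows, top row first. After the first
    frame, only the top `rows_per_frame` rows of each frame are new, so they
    are placed above what has been stitched so far.
    """
    if not frames:
        return []
    if rows_per_frame <= 0:
        raise ValueError("rows_per_frame must be positive")
    stitched = list(frames[0])
    for frame in frames[1:]:
        if rows_per_frame > len(frame):
            # The frame is shorter than the scroll step: rows would be lost.
            raise ValueError("frame height is smaller than the scroll step")
        stitched = list(frame[:rows_per_frame]) + stitched
    return stitched


# Rows are labelled with letters; the content scrolls down one row per frame.
frames = [["c", "d", "e"], ["b", "c", "d"], ["a", "b", "c"]]
tall = blind_stitch(frames, rows_per_frame=1)
```

Under these assumptions, `rows_per_frame` would be the scroll rate in pixels per second multiplied by the sampling interval, rounded to an integer. Because the step is fixed, long notes and silences keep their length even when consecutive frames look identical, but any variation in the real scroll rate accumulates as timing error.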

WaterfallProcess.py

This tool converts the stitched image to a list of lists of rectangles representing the keys pressed. Usage:

python WaterfallProcess.py <input_png> <output_mid>

Here is the current state of the image processing done with OpenCV:

Example output of WaterfallProcess.py
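
ReadNotes.py is not implemented yet. The sketch below shows one way rectangles might be turned into notes, under several loud assumptions: rectangles are `(x, y, width, height)` tuples in stitched-image pixels, the bottom of the stitched image is the start of the song, the scroll rate is constant, and the 88 keys are treated as equal-width columns across the image (a crude approximation, since black and white keys differ in width). MIDI note 21 is A0, the lowest key of an 88-key piano.

```python
from typing import List, Tuple

Rect = Tuple[int, int, int, int]   # hypothetical layout: (x, y, width, height)
Note = Tuple[int, float, float]    # (MIDI pitch, start in s, duration in s)


def rect_to_note(rect: Rect, image_width: int, image_height: int,
                 pixels_per_second: float, n_keys: int = 88,
                 lowest_midi: int = 21) -> Note:
    """Map one rectangle to a note, assuming equal-width key columns."""
    x, y, w, h = rect
    center = x + w / 2
    key = min(int(center * n_keys / image_width), n_keys - 1)
    # The bottom edge of the rectangle is where the note starts.
    start = (image_height - (y + h)) / pixels_per_second
    duration = h / pixels_per_second
    return (lowest_midi + key, start, duration)


def rects_to_notes(rects: List[Rect], image_width: int, image_height: int,
                   pixels_per_second: float) -> List[Note]:
    """Convert rectangles to notes sorted by start time, then pitch."""
    notes = [rect_to_note(r, image_width, image_height, pixels_per_second)
             for r in rects]
    return sorted(notes, key=lambda n: (n[1], n[0]))


notes = rects_to_notes([(395, 600, 10, 100), (395, 800, 10, 100)],
                       image_width=880, image_height=1000,
                       pixels_per_second=100)
```

The resulting tuples could then be handed to a MIDI library to write the .mid file; that step is left out here.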
