Suggestions for those interested in developing audio applications of machine learning

drscotthawley/ml-audio-start

Getting Started in 'ML-Audio'

Suggestions for students.

About

Audio and acoustics students sometimes ask "How do I get started learning machine learning?" Not everyone gets their start in a major research environment, so this page is intended to serve as a series of suggestions for those who may find themselves "on their own" in their interest in this area. It was started by @drscotthawley and Ryan Miller, but is intended to serve and evolve with the community.

  • This is a collaborative page. Please suggest additions, re-organizations, edits, updates, etc., either via Issues or Pull Requests. (In addition, @drscotthawley may gladly cede control of this content to whichever student or group wants to Wiki-fy it!)

Active Practitioners to Follow

Many of us learn about, and contribute to, news of new developments, papers, conferences, grants, and networking opportunities via Twitter.

Quick Quotes

  • Justin Salamon: "Anyone working in ML, anyone, should be obliged to curate a dataset before they're allowed to train a single model. The lessons learnt in the process are invaluable, and the dangers of skipping said lessons are manifold (see what I did there?)"

Best Practices

"Tips for Publishing Research Code" courtesy of Papers with Code

General Reference Information

Online Training (ML+audio Specific)

Online Training (More General, Courses)

Tutorials

Talks (at conferences)

Talks we found helpful/inspiring (and are hopefully still relevant). TODO: add more recent talks!

Key Papers / Codes

(Let's try to list "representative" or "landmark" papers, not just our latest tweak, unless it includes a really good intro/review section. ;-) )

Demos

(Not sure if this only means "deployed models you can play with in your browser," or if other things should count as demos)

Packages & Libraries

Tools / GUIs / Gists

Books

Computer-Related Topics

Python:

Signal Processing Topics

Statistics / Math Topics

Datasets (raw audio)

One finds that many supposed "audio datasets" are really only features or even just metadata! Here are some "raw audio" datasets:

DIY Audio Dataset-Making:

(Inspired by Nathan Sepulveda)

Searchable resources:

Scrapers

  • https://github.com/carlthome/audio-scraper: "Scrape audio from YouTube and SoundCloud with a simple command-line interface", e.g. audio-scraper "acoustic guitar". It's 5 years old, but it still works in 2021!

Other DIY Audio Dataset Tricks

  • Depending on your application, you might be able to get away with using samples produced by virtual instruments (i.e. MIDI).
  • If you don't have a lot of labels or targets, you can still pretrain your representations & weights using autoregressive predictions (even for different audio domains) -- this amounts to doing your own Transfer Learning even without a pretrained model. (This strategy was used by fast.ai's text language model system "ULMFiT".)
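As a rough illustration of the autoregressive idea above, here is a minimal sketch of how unlabeled audio can supply its own training targets: each window of past samples is paired with the sample that follows it. The function name `next_sample_pairs` and the tiny waveform are hypothetical, made up for this example; a real setup would feed such pairs (or spectrogram frames) to whatever model you are pretraining.

```python
def next_sample_pairs(samples, context_len):
    """Build (context, target) pairs from unlabeled audio.

    Each context is `context_len` consecutive samples and the target is
    the sample that comes right after it, so no labels are needed.
    """
    pairs = []
    for i in range(len(samples) - context_len):
        context = samples[i:i + context_len]
        target = samples[i + context_len]
        pairs.append((context, target))
    return pairs


# Hypothetical toy "waveform" (a real one would have thousands of samples)
waveform = [0.0, 0.1, 0.2, 0.3, 0.4]
pairs = next_sample_pairs(waveform, context_len=3)
for context, target in pairs:
    print(context, "->", target)
```

The weights learned on this prediction task can then serve as the starting point for your actual (labeled) task.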

Cleaning Audio Datasets?

With images, you can quickly look at many of them almost at once. With audio, you have to listen to each one. But take a cue from fast.ai's Jeremy Howard:

"It's easier to clean a dataset once you've trained a model."

So we can train the model, then look for samples with high loss or low confidence: those are the ones to check first.
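One way this could look in practice is sketched below, assuming you already have per-clip class probabilities from a trained classifier. The file names, labels, and probabilities are invented for illustration, and cross-entropy is just one reasonable choice of loss.

```python
import math


def per_sample_cross_entropy(probs, label):
    """Cross-entropy loss for one sample, given predicted class probabilities."""
    eps = 1e-12  # guard against log(0)
    return -math.log(max(probs[label], eps))


def rank_by_loss(predictions, labels, names):
    """Return (name, loss) pairs sorted from highest loss to lowest."""
    losses = [per_sample_cross_entropy(p, y) for p, y in zip(predictions, labels)]
    return sorted(zip(names, losses), key=lambda pair: pair[1], reverse=True)


# Hypothetical model outputs for three clips, two classes (0 = guitar, 1 = speech)
predictions = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
labels = [0, 0, 1]  # the labels currently stored in the dataset
names = ["clip_a.wav", "clip_b.wav", "clip_c.wav"]

ranked = rank_by_loss(predictions, labels, names)
for name, loss in ranked:
    print(f"{name}: loss={loss:.3f}")
```

Clips at the top of the list are where the model disagrees most with the stored label, so listening to those first is the cheapest way to find mislabeled or corrupted files.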

You could even start with someone else's pretrained model and look for anomalies when running inference on your data: similar inputs should yield similar outputs, so if they don't, those samples deserve a closer look.
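A minimal sketch of that anomaly check, under the assumption that you have run a pretrained model over a group of clips that are supposed to be similar and collected its output vectors: flag any clip whose output sits unusually far from the group's mean. The distance measure, the `factor` threshold, and the example vectors are all illustrative choices, not a recommendation from any particular library.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def euclidean(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def flag_outliers(outputs, factor=2.0):
    """Indices whose distance to the group mean exceeds `factor` times the
    median distance (upper median when the count is even)."""
    center = mean_vector(outputs)
    dists = [euclidean(v, center) for v in outputs]
    median = sorted(dists)[len(dists) // 2]
    return [i for i, d in enumerate(dists) if d > factor * median]


# Hypothetical outputs for four clips that are all supposed to be "guitar"
outputs = [[0.90, 0.10], [0.85, 0.15], [0.88, 0.12], [0.10, 0.90]]
suspects = flag_outliers(outputs)
print(suspects)
```

Here the fourth clip is the one to go listen to; whether it is mislabeled or simply unusual is something only your ears can settle.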

Length of audio?

You might be able to find short samples of exactly what you need, but it's also common to have the desired audio be just a part of a much longer clip. How to segment it and keep just what you want? You could use other people's models, e.g. for detecting speech or guitars:

  • Delete what you don't want: Audio you might get off YouTube needs to be segmented in order to make it useful -- the stuff you don't want needs to be cut out. If you're looking for musical audio, you could use a speech detector (there are lots of them available) and then delete or ignore all the speech.
  • What if all you want is the guitar solo, not the whole song? Someone else's pretrained model for detecting guitars could help you.
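To make the "delete what you don't want" step concrete, here is a small sketch that assumes some speech detector has already given you one speech probability per frame. It turns those probabilities into time ranges worth keeping. The detector itself, the hop size, and the thresholds are assumptions; swap in whatever model and settings you actually use.

```python
def non_speech_segments(speech_probs, hop_seconds, threshold=0.5, min_seconds=1.0):
    """Return (start, end) times in seconds for runs of frames judged non-speech.

    Runs shorter than `min_seconds` are dropped.
    """
    runs = []
    start = None
    for i, p in enumerate(speech_probs):
        is_speech = p >= threshold
        if not is_speech and start is None:
            start = i  # a non-speech run begins
        elif is_speech and start is not None:
            runs.append((start, i))  # the run ended at the previous frame
            start = None
    if start is not None:
        runs.append((start, len(speech_probs)))  # run lasts to the end of the clip
    return [(s * hop_seconds, e * hop_seconds)
            for s, e in runs
            if (e - s) * hop_seconds >= min_seconds]


# Hypothetical per-frame speech probabilities, one frame every 0.5 s
probs = [0.9, 0.8, 0.1, 0.2, 0.1, 0.7, 0.1, 0.1, 0.2, 0.1]
segments = non_speech_segments(probs, hop_seconds=0.5)
print(segments)
```

The same function works for the guitar-solo case if you flip the logic: feed it one minus the guitar detector's probabilities, and the returned ranges are where the guitar is present.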

Are we classifying or regressing?

Standards are a lot higher for regression systems. For example, phase errors or time-alignment issues probably won't matter to a classifier, but they might for a regression model, depending on the goal. What about clipping or distortion? That will also depend on what you're trying to do.
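If clipping does matter for your task, a quick screening pass can at least tell you which files to inspect. The sketch below assumes audio already loaded as floats with full scale at 1.0; the `ceiling` value and the synthetic test signals are arbitrary choices for illustration.

```python
import math


def clipping_fraction(samples, ceiling=0.999):
    """Fraction of samples whose magnitude is at or above `ceiling`
    (assuming floating-point audio with full scale at 1.0)."""
    if not samples:
        return 0.0
    clipped = sum(1 for x in samples if abs(x) >= ceiling)
    return clipped / len(samples)


# Synthetic example: a clean half-scale sine, and the same sine
# boosted by 4x and hard-limited to [-1, 1]
clean = [0.5 * math.sin(2 * math.pi * i / 32) for i in range(256)]
hot = [max(-1.0, min(1.0, 4.0 * x)) for x in clean]

print(f"clean: {clipping_fraction(clean):.2%} clipped")
print(f"hot:   {clipping_fraction(hot):.2%} clipped")
```

A file with more than a tiny fraction of samples pinned at full scale is worth a listen before it goes into a regression training set.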

"Major" ML-Audio Research/Development Groups

Universities:

(or, "Where should I apply for grad school?")

  • QMUL (London)
  • UPF (Barcelona)
  • CCRMA (Stanford, California)
  • IRCAM (Paris)
  • NYU (New York)

Industry:

("Where can I get an internship/job?")

Conferences

("Which conference(s) should I go to?" -- asked by a student on the day this doc began)

Audio-Specific

A long list of music-technology-specific conferences is at https://conferences.smcnetwork.org/, which is referenced from https://github.com/MTG/conferences

  • Audio Engineering Society (AES)
  • ASA
  • Digital Audio Effects (DAFx)
  • ICASSP
  • ISMIR
  • SANE
  • Web Audio Conference (WAC)
  • SMC
  • LVA/ICA
  • Audio Mostly
  • WIMP
  • DCASE
  • CSMC
  • MuMe
  • ICMC
  • CMMR
  • IBAC
  • MLSP
  • Interspeech
  • FMA

General ML

  • ICLR
  • ICML
  • NeurIPS
  • IJCNN

Journals

("Where can I get published?")

In addition, in machine learning specifically, the tendency is for conference papers to be peer-reviewed and to "count" as journal publications.

Competitions / Benchmarks

Some are yearly, some may be defunct but still interesting.

Contributors

Ryan Miller, RJ Skerry-Ryan, Dave Moffat, Jesse Engel, Iver Jordal

If you want your name listed here, you may. ;-)
