Slow mixing time due to use of native `audioop` module (now deprecated) #6

Bentroen · 2024-10-05T18:48:34Z

Issue

As of v0.4.0, exporting seems to take about:

10 seconds for a very simple test file with about 1,000 notes (test.py, included in repository);
467 seconds (almost 8 minutes!) for the Note Block Megacollab file with 250k+ notes.

See screenshots below for a snakeviz profiling graph for these two operations (the .prof files out of cProfile are also attached here: nbswave_profile.zip):

Test file (1k notes):

Megacollab (250k notes):

This can be made a heck of a lot better.

Through the above screenshots, you can see that, when there aren't many notes to place, most of the time is spent loading the sound files. And, when the bulk of the operation becomes placing notes, a lot of time is spent in the audio manipulation operations, particularly on panning and volume (which, as we'll see, are simply array multiplications). This indicates that there are potential optimizations to make both in loading sounds, as well as on the mixing steps themselves.
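For anyone who wants to reproduce profiles like these, here is a minimal sketch using only the standard library. The helper name `profile_call` and the default file name are made up for illustration; the function being profiled can be whatever callable performs the export.

```python
import cProfile
import pstats


def profile_call(func, *args, prof_path="export.prof", top=10, **kwargs):
    """Run func under cProfile, dump the stats to prof_path, print the hottest calls."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    # The dumped file can be opened afterwards with `snakeviz export.prof`.
    profiler.dump_stats(prof_path)
    stats = pstats.Stats(prof_path)
    stats.sort_stats("cumulative").print_stats(top)
    return result
```

Sorting by cumulative time makes it easy to see whether loading or mixing dominates for a given song.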

Reason

As discussed in jiaaro/pydub#725, many operations in pydub are implemented using the now-deprecated, to-be-removed audioop module. Although it requires no external dependencies, it's extremely inefficient -- and, no wonder, it takes up most of the export time.

nbswave already bypasses pydub in the mixing implementation -- we implement our own here using numpy operations, since that's a lot more efficient than the alternative implemented by pydub (see my 2021 issue about this: jiaaro/pydub#550).
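To illustrate what numpy-based mixing boils down to, here is a minimal sketch. It is not the code from nbswave; the function name `mix_into` and the `(frames, channels)` float layout are assumptions made for the example.

```python
import numpy as np


def mix_into(track: np.ndarray, sound: np.ndarray, offset: int) -> np.ndarray:
    """Add `sound` onto `track` starting at sample index `offset`.

    Both arrays are float arrays shaped (frames, channels). The track is
    extended with silence if the sound runs past its current end.
    """
    end = offset + len(sound)
    if end > len(track):
        padding = np.zeros((end - len(track), track.shape[1]), dtype=track.dtype)
        track = np.concatenate([track, padding])
    # Overlaying a sound is a single in-place slice addition.
    track[offset:end] += sound
    return track
```

Mixing a note is then one slice addition, regardless of how long the sound is.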

The audio engine implementation done for the future Python NBS rewrite has also shown that many operations nbswave relies on are really slow in pydub. As such, the library was entirely replaced in the audio module with other tools. In the next section, we'll discuss those implementations briefly and how they could be brought here to make the export performance much better. Most of them leverage numpy, which is already a dependency of this package. If we can rely on it enough to bypass pydub operations, it's possible to even remove it completely from the dependencies of nbswave.

Optimizations to make

Loading sounds

Current solution: pydub.AudioSegment.from_file
Proposed solution: soundfile package
Reason: The former launches an ffmpeg subprocess and takes seconds, while the latter calls libsndfile via CFFI, which is capable of loading all sounds in a fraction of a second. Implemented here.
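A rough sketch of what loading with soundfile could look like follows. The helper names, the `.ogg` extension, and the choice to convert everything to stereo float32 are assumptions for illustration, not the linked implementation. soundfile is imported inside the function only so that the pure-numpy helper stays usable without it.

```python
from pathlib import Path

import numpy as np


def to_stereo(data: np.ndarray) -> np.ndarray:
    """Duplicate a mono (frames, 1) array into (frames, 2); pass stereo through."""
    if data.shape[1] == 1:
        return np.repeat(data, 2, axis=1)
    return data


def load_sounds(folder: str) -> dict:
    """Load every .ogg file in `folder` as a (stereo float32 array, sample rate) pair."""
    import soundfile as sf  # third-party: pip install soundfile

    sounds = {}
    for path in sorted(Path(folder).glob("*.ogg")):
        # always_2d=True guarantees a (frames, channels) shape even for mono files.
        data, sample_rate = sf.read(str(path), dtype="float32", always_2d=True)
        sounds[path.name] = (to_stereo(data), sample_rate)
    return sounds
```

Since the data arrives as a numpy array already, the later volume, panning and mixing steps can work on it directly.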

Volume

Current solution: pydub.AudioSegment.apply_gain -> audioop.mul
Proposed solution: numpy
Reason: One array multiplication with numpy does the trick. Implemented here.
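As a minimal sketch of the idea (not the linked implementation), assuming samples are stored as a float array and the gain is given in decibels, as `apply_gain` in pydub expects:

```python
import numpy as np


def apply_gain(samples: np.ndarray, gain_db: float) -> np.ndarray:
    """Scale float samples by a gain given in decibels."""
    factor = 10 ** (gain_db / 20)  # dB to linear amplitude ratio
    return samples * factor
```

With integer samples, the result would additionally need clipping to the valid range before converting back.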

Panning

Current solution: pydub.AudioSegment.pan -> audioop.tostereo and audioop.mul
Proposed solution: numpy
Reason: Requires two array slice multiplications, one for each channel. It's really easy to calculate the gain boost and cut of each channel from the panning value; we've implemented it here.
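The sketch below shows the two slice multiplications with a plain linear pan law. That law is an assumption chosen for brevity; it is not necessarily the curve used by pydub or by the linked implementation. Samples are assumed to be a stereo float array shaped `(frames, 2)`.

```python
import numpy as np


def apply_pan(samples: np.ndarray, pan: float) -> np.ndarray:
    """Pan a stereo (frames, 2) float array; pan ranges from -1 (left) to 1 (right)."""
    pan = max(-1.0, min(1.0, pan))
    out = samples.copy()
    out[:, 0] *= 1.0 - pan  # left channel: cut when panning right, boost when panning left
    out[:, 1] *= 1.0 + pan  # right channel: the mirror image
    return out
```

A constant-power law would only change how the two factors are computed; the multiplications stay the same.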

Pitch

Current solution: pydub.AudioSegment._spawn -> audioop.ratecv
Proposed solution: libsamplerate
Reason: There are entire libraries dedicated to resampling audio while retaining quality, some with the goal of real-time processing (e.g. OpenAL); others not (e.g. librosa etc.). But audioop is miserable at this.

This article presents a comparison between a few of them. In my own research, I've concluded that resampy and samplerate excel at this. resampy uses scipy and numba to accelerate processing, while samplerate wraps the widely known "Secret Rabbit Code" (libsamplerate), a C library, and uses pybind11 to interface with it directly (meaning: it is FAST). There's also librosa with its resample function, though its overhead is much larger, and scipy.signal.resample, but I'd rather not include the entirety of scipy to use one function out of it :D

Here is an implementation using libsamplerate, which should be ported here. The implementation prior to this commit used the real-time API to process slices of each playing sound on-demand, but our implementation here doesn't need this -- it's literally one function call, no callbacks or any of that monstrosity.
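For illustration, the single function call could look roughly like the sketch below. It is not the linked implementation: the helper names are made up, key 45 as the unpitched base key is an assumption about the NBS format, and samplerate is imported inside the function only so the ratio helper can be used without it.

```python
import numpy as np


def key_to_pitch(key: int, base_key: int = 45) -> float:
    """Playback-speed factor for a key; one semitone is a factor of 2**(1/12)."""
    return 2 ** ((key - base_key) / 12)


def change_pitch(samples: np.ndarray, pitch: float) -> np.ndarray:
    """Resample so that playback at the original rate sounds `pitch` times higher."""
    import samplerate  # third-party: pip install samplerate

    # A higher pitch means fewer output frames, hence the inverted ratio.
    return samplerate.resample(samples, 1.0 / pitch, "sinc_best")
```

The converter type trades quality for speed; `"sinc_best"` is the slowest and cleanest of the converters libsamplerate offers.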

Order of operations

When this package was made, it was assumed that resampling (necessary to apply pitch) would be the most computationally-expensive operation, since it requires running costly signal interpolation filters.

That would most likely be true if the other operations (panning and volume) were optimized as much as they could be, since they consist entirely of basic array multiplications -- but in their current state, they aren't. To take advantage of this (non-)fact, the implementation applies pitch (resampling) first, and then caches the result to reuse it when applying panning and velocity. Since those are simple multiplication operations, they weren't expected to take long; alas, here we are.

Here's the bit of code that does this:

nbswave/nbswave/main.py

Lines 155 to 209 in 8b6f4a1

    
    last_ins = None
    last_key = None
    last_vol = None
    last_pan = None
    for note in sorted_notes:
        ins = note.instrument
        key = note.key
        vol = note.velocity
        pan = note.panning
        if ins != last_ins:
            last_key = None
            last_vol = None
            last_pan = None
            try:
                sound1 = self._instruments[note.instrument]
            except KeyError:  # Sound file missing
                if not ignore_missing_instruments:
                    custom_ins_id = ins - self._song.header.default_instruments
                    instrument_data = self._song.instruments[custom_ins_id]
                    ins_name = instrument_data.name
                    ins_file = instrument_data.file
                    raise MissingInstrumentException(
                        f"The sound file for instrument {ins_name} was not found: {ins_file}"
                    )
                else:
                    continue
            if sound1 is None:  # Sound file not assigned
                continue
            sound1 = audio.sync(sound1)
        if key != last_key:
            last_vol = None
            last_pan = None
            pitch = audio.key_to_pitch(key)
            sound2 = audio.change_speed(sound1, pitch)
        if vol != last_vol:
            last_pan = None
            gain = audio.vol_to_gain(vol)
            sound3 = sound2.apply_gain(gain)
        if pan != last_pan:
            sound4 = sound3.pan(pan)
            sound = sound4
        last_ins = ins
        last_key = key
        last_vol = vol
        last_pan = pan

So the slowness of the panning and gain functions is amplified by this design decision. After implementing the other optimizations, it's wise to check whether skipping the repeated operations is working as intended and really reducing the export time (as opposed to applying all operations to all notes). I believe this design's potential will really shine once resampling becomes the most costly operation, as originally expected.
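One possible shape for the alternative mentioned above (cache only the resampled sound, then apply gain and panning to every note) is sketched here. All names are hypothetical, `vol` is assumed to be a linear factor rather than decibels, and the linear pan law is a simplification.

```python
import numpy as np


class ResampleCache:
    """Cache resampled sounds per (instrument, key); hypothetical helper."""

    def __init__(self, sounds, resample):
        self._sounds = sounds      # instrument id -> (frames, 2) float array
        self._resample = resample  # callable(samples, key) -> resampled samples
        self._cache = {}
        self.misses = 0            # number of times the costly resampler actually ran

    def get(self, ins, key):
        if (ins, key) not in self._cache:
            self.misses += 1
            self._cache[(ins, key)] = self._resample(self._sounds[ins], key)
        return self._cache[(ins, key)]


def render_note(cache, ins, key, vol, pan):
    """Fetch the cached resampled sound, then apply per-note gain and pan."""
    sound = cache.get(ins, key)
    # Volume and panning fold into one per-channel factor, so a single
    # broadcast multiplication covers both operations.
    gains = np.array([vol * (1.0 - pan), vol * (1.0 + pan)])
    return sound * gains
```

Comparing the `misses` counter against the total note count gives a quick read on how much resampling work the cache actually saves for a given song.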

Summary

All of the operations to be replaced were already implemented in a past version of the NewNBS audio engine, before OpenAL was used. Their respective source code was presented here in each section, so it's only a matter of bringing the implementations here.

Finally, here's the entire history of the audio.py module -- it's so precious to see how many iterations we've gone through to just land on OpenAL at the end!! The good thing is, we can use everything we learned there to make audio processing more efficient here, so it's a win-win :)

With these implementations, I estimate nbswave can export up to 60–80% faster than it can now. :)

Tasks

Optimize pitch (resampling) via samplerate package
Optimize panning with numpy
Optimize volume (gain) with numpy
Optimize sound loading via soundfile package
Consider removing pydub as project dependency