Slow mixing time due to use of native audioop module (now deprecated) #6

Open
5 tasks
Bentroen opened this issue Oct 5, 2024 · 0 comments


Bentroen commented Oct 5, 2024

Issue

As of v0.4.0, exporting seems to take about:

  • 10 seconds for a very simple test file with about 1,000 notes (test.py, included in repository);
  • 467 seconds (almost 8 minutes!) for the Note Block Megacollab file with 250k+ notes.

See screenshots below for a snakeviz profiling graph for these two operations (the .prof files out of cProfile are also attached here: nbswave_profile.zip):

Test file (1k notes):
[screenshot: snakeviz profiling graph, test file]

Megacollab (250k notes):
[screenshot: snakeviz profiling graph, Megacollab]
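
For anyone who wants to reproduce profiles like these, here is a minimal sketch using only the standard library's cProfile and pstats. The `profile_call` helper and the `fake_export` function are hypothetical names made up for this illustration; `fake_export` stands in for whatever export call you actually want to measure.

```python
import cProfile
import pstats


def profile_call(func, *args, out_path="export.prof", top=10, **kwargs):
    """Run func under cProfile, dump the stats to out_path and print the top entries."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    profiler.dump_stats(out_path)  # this file can be opened with snakeviz
    stats = pstats.Stats(out_path)
    stats.sort_stats("cumulative").print_stats(top)
    return result


def fake_export(n_notes):
    # Hypothetical stand-in for the real export call.
    return sum(i * i for i in range(n_notes))
```

Calling `profile_call(fake_export, 1000)` writes `export.prof` in the current directory, which can then be inspected with `snakeviz export.prof`.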

This can be made a heck of a lot better.

Through the above screenshots, you can see that, when there aren't many notes to place, most of the time is spent loading the sound files. And, when the bulk of the operation becomes placing notes, a lot of time is spent in the audio manipulation operations, particularly on panning and volume (which, as we'll see, are simply array multiplications). This indicates that there are potential optimizations to make both in loading sounds, as well as on the mixing steps themselves.

Reason

Looking at jiaaro/pydub#725, many operations in pydub are implemented using the now deprecated, to-be-removed audioop module. Although it requires no external dependencies, it's extremely inefficient -- and, no wonder, takes up most of the export time.

nbswave already bypasses pydub in the mixing implementation -- we implement our own here using numpy operations, since that's a lot more efficient than the alternative implemented by pydub (see my 2021 issue about this: jiaaro/pydub#550).
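
To illustrate what numpy-based mixing amounts to, here is a minimal sketch, not nbswave's actual code. The `overlay` function name and the float32 `(frames, channels)` array layout are assumptions made for this example.

```python
import numpy as np


def overlay(track, sound, position):
    """Add `sound` into `track` starting at frame `position`.

    Both arrays are float32 with shape (frames, channels). The track is
    extended with silence when the sound runs past its end; otherwise it
    is modified in place. The (possibly new) track is returned.
    """
    end = position + len(sound)
    if end > len(track):
        padding = np.zeros((end - len(track), track.shape[1]), dtype=track.dtype)
        track = np.concatenate([track, padding])
    track[position:end] += sound  # mixing is just element-wise addition
    return track
```

Mixing in floating point also postpones any clipping decision until the final conversion to the output sample format.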

The audio engine implementation done for the future Python NBS rewrite has also shown that many operations nbswave relies on are really slow in pydub. As such, the library was entirely replaced in the audio module with other tools. In the next section, we'll discuss those implementations briefly and how they could be brought here to make the export performance much better. Most of them leverage numpy, which is already a dependency of this package. If we can rely on it enough to bypass pydub operations, it's possible to even remove it completely from the dependencies of nbswave.

Optimizations to make

Loading sounds

  • Current solution: pydub.AudioSegment.from_file
  • Proposed solution: soundfile package
  • Reason: The former launches an ffmpeg subprocess and takes seconds, while the latter calls libsndfile via CFFI, which is capable of loading all sounds in a fraction of a second. Implemented here.
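
As a rough idea of what loading through soundfile could look like (a sketch, not the linked implementation): `soundfile.read` returns the samples as a numpy array together with the sample rate. The `load_sound` and `to_stereo` helpers are hypothetical names, and converting everything to stereo float32 is an assumption made so that later panning can work per channel.

```python
import numpy as np


def to_stereo(data):
    """Return a (frames, 2) float32 array from mono or stereo input."""
    data = np.asarray(data, dtype=np.float32)
    if data.ndim == 1:
        data = data[:, np.newaxis]
    if data.shape[1] == 1:
        data = np.repeat(data, 2, axis=1)  # duplicate the mono channel
    return data


def load_sound(path):
    """Load one sound file as (stereo float32 samples, sample rate)."""
    # Third-party package; imported here so to_stereo works without it.
    import soundfile as sf

    data, sample_rate = sf.read(path, dtype="float32", always_2d=True)
    return to_stereo(data), sample_rate
```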

Volume

  • Current solution: pydub.AudioSegment.apply_gain -> audioop.mul
  • Proposed solution: numpy
  • Reason: One array multiplication with numpy does the trick. Implemented here.
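
A minimal sketch of that single multiplication, assuming float32 samples and a gain expressed in decibels (as `apply_gain` takes in pydub). The function names are hypothetical.

```python
import numpy as np


def db_to_factor(gain_db):
    """Convert a gain in decibels to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def apply_gain(samples, gain_db):
    """Return a copy of `samples` scaled by the linear factor for `gain_db`."""
    return samples * np.float32(db_to_factor(gain_db))
```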

Panning

  • Current solution: pydub.AudioSegment.pan -> audioop.tostereo and audioop.mul
  • Proposed solution: numpy
  • Reason: Requires two array slice multiplications, one for each channel. It's really easy to calculate the gain boost and cut of each channel from the panning value; we've implemented it here.
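
A sketch of the two slice multiplications. Note the gain curve here is an assumption: it is a simple linear balance law that only attenuates the channel opposite to the pan direction, whereas the item above also mentions a boost, so the linked implementation will likely differ. Function names are hypothetical.

```python
import numpy as np


def pan_gains(pan):
    """Return (left, right) linear gains for a pan value in [-1.0, 1.0].

    Assumed law: -1.0 is fully left, 0.0 is centered, 1.0 is fully right.
    """
    pan = max(-1.0, min(1.0, pan))
    left = min(1.0, 1.0 - pan)
    right = min(1.0, 1.0 + pan)
    return left, right


def apply_pan(samples, pan):
    """Return a panned copy of stereo samples with shape (frames, 2)."""
    left, right = pan_gains(pan)
    out = samples.copy()
    out[:, 0] *= left   # left channel slice
    out[:, 1] *= right  # right channel slice
    return out
```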

Pitch

  • Current solution: pydub.AudioSegment._spawn -> audioop.ratecv
  • Proposed solution: libsamplerate
  • Reason: There are entire libraries dedicated to resampling audio while retaining quality, some with the goal of real-time processing (e.g. OpenAL); others not (e.g. librosa etc.). But audioop is miserable at this.

This article presents a comparison between a few of them. In my own research, I've concluded that resampy and samplerate excel at this. resampy uses scipy and numba to accelerate processing, while samplerate wraps libsamplerate, the widely-known "Secret Rabbit Code" library written in C, using pybind11 to interface with it directly (meaning: it is FAST). There's also librosa with its resample function, though its overhead is much larger, and scipy.signal.resample, but I'd rather not include the entirety of scipy to use one function out of it :D

Here is an implementation using libsamplerate, which should be ported here. The implementation prior to this commit used the real-time API to process slices of each playing sound on-demand, but our implementation here doesn't need this -- it's literally one function call, no callbacks or any of that monstrosity.
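
To show what "one function call" means in practice, here is a hedged sketch around `samplerate.resample` from the python-samplerate package. The helper names are hypothetical, and the base key of 45 (the key at which a sound plays unmodified) is an assumption for illustration. Raising the pitch by a factor `p` means resampling to `1 / p` times as many frames and then playing the result at the original sample rate.

```python
def key_to_pitch(key, base_key=45):
    """Playback-speed factor for a key, 12 keys per octave (base_key is assumed)."""
    return 2.0 ** ((key - base_key) / 12.0)


def pitch_to_ratio(pitch):
    """Resampling ratio (output frames / input frames) for a speed factor."""
    return 1.0 / pitch


def change_pitch(samples, pitch, converter="sinc_best"):
    """Resample (frames, channels) samples so they play back `pitch` times faster."""
    # Third-party package; imported here so the helpers above work without it.
    import samplerate

    return samplerate.resample(samples, pitch_to_ratio(pitch), converter)
```

The converter type trades quality for speed; `"sinc_best"` is the slowest of the sinc converters, so a faster one such as `"sinc_fastest"` may be worth profiling.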

Order of operations

When this package was made, it was assumed that resampling (necessary to apply pitch) would be the most computationally-expensive operation, since it requires running costly signal interpolation filters.

That would most likely be true if the other operations (panning and volume) were optimized as much as they could be, since they consist entirely of basic array multiplications -- but in their current state, they aren't. To take advantage of this (non-)fact, the implementation applies pitch (resampling) first, and then caches the result to reuse it when applying panning and velocity. Since those are simple multiplication operations, they aren't expected to take long; alas, here we are.

Here's the bit of code that does this:

nbswave/nbswave/main.py

Lines 155 to 209 in 8b6f4a1

last_ins = None
last_key = None
last_vol = None
last_pan = None
for note in sorted_notes:
    ins = note.instrument
    key = note.key
    vol = note.velocity
    pan = note.panning
    if ins != last_ins:
        last_key = None
        last_vol = None
        last_pan = None
        try:
            sound1 = self._instruments[note.instrument]
        except KeyError:  # Sound file missing
            if not ignore_missing_instruments:
                custom_ins_id = ins - self._song.header.default_instruments
                instrument_data = self._song.instruments[custom_ins_id]
                ins_name = instrument_data.name
                ins_file = instrument_data.file
                raise MissingInstrumentException(
                    f"The sound file for instrument {ins_name} was not found: {ins_file}"
                )
            else:
                continue
        if sound1 is None:  # Sound file not assigned
            continue
        sound1 = audio.sync(sound1)
    if key != last_key:
        last_vol = None
        last_pan = None
        pitch = audio.key_to_pitch(key)
        sound2 = audio.change_speed(sound1, pitch)
    if vol != last_vol:
        last_pan = None
        gain = audio.vol_to_gain(vol)
        sound3 = sound2.apply_gain(gain)
    if pan != last_pan:
        sound4 = sound3.pan(pan)
        sound = sound4
    last_ins = ins
    last_key = key
    last_vol = vol
    last_pan = pan

So the slowness of the panning and gain functions is amplified by this design decision. After implementing the other optimizations, it's wise to check whether the avoidances are working as intended and really reducing the export time (as opposed to applying all operations to all notes). That said, I believe this design's potential will really shine when resampling becomes the most costly operation, as originally expected.
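
One way to check whether the avoidance pays off is to count how often resampling actually runs. The sketch below swaps the sort-and-compare approach from the excerpt for a plain dictionary cache keyed on (instrument, key); that is a different technique, shown only because it makes the miss count easy to observe. `PitchCache` and `render_note` are hypothetical names, and the resampler is passed in as a callable so any implementation can be plugged in.

```python
import numpy as np


class PitchCache:
    """Memoize resampled sounds per (instrument, key) and count cache misses."""

    def __init__(self, sounds, resample):
        self._sounds = sounds      # instrument id -> (frames, 2) float32 array
        self._resample = resample  # callable(samples, key) -> resampled array
        self._cache = {}
        self.misses = 0            # number of times resampling actually ran

    def get(self, ins, key):
        cache_key = (ins, key)
        if cache_key not in self._cache:
            self.misses += 1
            self._cache[cache_key] = self._resample(self._sounds[ins], key)
        return self._cache[cache_key]


def render_note(cache, ins, key, vol, pan_gains):
    """Fetch the resampled sound, then apply volume and panning in one multiply."""
    left, right = pan_gains
    gains = np.array([left, right], dtype=np.float32) * np.float32(vol)
    return cache.get(ins, key) * gains  # broadcasts over (frames, 2)
```

Comparing `cache.misses` against the total number of notes gives a quick indication of how much resampling work is being skipped.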

Summary

All of the operations to be replaced were already implemented in a past version of the NewNBS audio engine, before OpenAL was used. Their respective source code was presented here in each section, so it's only a matter of bringing the implementations here.

Finally, here's the entire history of the audio.py module -- it's so precious to see how many iterations we've gone through to just land on OpenAL at the end!! The good thing is, we can use everything we learned there to make audio processing more efficient here, so it's a win-win :)

With these implementations, I estimate nbswave can export up to 60–80% faster than it can now. :)

Tasks
