
Parallelisation for functors #3596

Draft
wants to merge 14 commits into base: develop
Conversation

brdvd
Contributor

@brdvd brdvd commented Feb 11, 2024

This PR adds the functionality to process functors in parallel and enables parallelisation for a few functors where it is straightforward. It should be seen as a starting point that demonstrates how parallelisation can be exploited; parallelising a significant part of the layout would require considerably more work.

A bit of profiling shows that we actually achieve mild runtime improvements by processing in parallel.
For a medium-sized piece (10 pages):

| Number of threads | ResetAligner runtime [ms] | LayoutVertically runtime [ms] |
| --- | --- | --- |
| 1 | 52 | 121 |
| 2 | 36 | 101 |
| 4 | 38 | 100 |
| 8 | 35 | 103 |

For a large piece (266 pages):

| Number of threads | ResetAligner runtime [ms] | LayoutVertically runtime [ms] |
| --- | --- | --- |
| 1 | 845 | 1560 |
| 2 | 635 | 1400 |
| 4 | 657 | 1230 |
| 8 | 651 | 1430 |

Further remarks:

  • Parallelisation is switched on by setting the new --max-threads option.
  • It is enabled on a functor by overriding the methods GetProcessingStrategy and CloneFunctor. The first one decides whether concurrent processing happens on measures (horizontal layout) or systems (vertical layout). The second one clones the functor before parallel processing and is quite similar to Object::Clone.
  • The method MergeFunctor can be overridden and is designed to merge the results of the calculation on several functors into one functor again (after parallel processing). One example where this makes sense is the m_crossStaffSlurs flag in AdjustSlurs.
  • It seems tempting to simply add parallelisation to all functors by adding the appropriate methods. However, several functors cannot be parallelised in a straightforward way. One obstacle arises if the functor keeps information that lives outside measures or systems (e.g., by overriding VisitScore or VisitPage) or transports information from one measure/system to the next (e.g., PrepareTimeSpanning or GenerateMIDI). Another arises if the functor changes global information (e.g., in the document); this might lead to undefined behaviour and would require mutex protection. One needs to check carefully which data is changed by a functor before adding parallelisation to it.
  • The entire processing code in Object is currently duplicated for const and non-const functors. Not sure if we want to simplify this with templates.

@craigsapp
Contributor

Can parallelization be utilized in the Emscripten compilation process for the JavaScript version of Verovio?

It is interesting that the LayoutVertically runtime goes up from 4 to 8 threads.

@brdvd
Contributor Author

brdvd commented Feb 11, 2024

> It is interesting that the LayoutVertically runtime goes up from 4 to 8 threads.

Yes, maybe this is just my machine. It also depends on which other software is busy at the moment (I had Xcode and the profiler open).

@ahankinson
Contributor

At a certain point thread overhead overtakes work done. That’s a pretty normal phenomenon.

Is this true parallelisation or is it concurrency? JS has concurrency now (async/await) even though it's still single-threaded.

@brdvd
Contributor Author

brdvd commented Feb 12, 2024

It is true parallelisation, i.e. multi-threaded. Async/await-like suspension corresponds to coroutines in C++, which is something different.

@brdvd
Contributor Author

brdvd commented Apr 9, 2024

@lpugin Do we want to proceed with this? The effect on total runtime is marginal at the moment. I guess we can currently achieve greater performance improvements by making the (single-threaded) implementation as smooth as possible (e.g., along the lines of #3587 or #3630). Maybe just keep this as an idea.

@lpugin
Contributor

lpugin commented Apr 19, 2024

@DavidBauer1984 sorry for the delay on the follow-up. I wanted to test the different bindings, and as far as I can see it works as expected. I still want to investigate the impact (e.g., size) for the JS one more precisely.

As you pointed out, the performance improvement remains small. One follow-up refactoring might be to turn the drawing process into a functor. That way we could enable parallelization for some bounding-box drawing steps, which might lead to additional gains. However, this would be a quite significant refactoring and might introduce memory access problems to deal with. What do you think?

@brdvd
Contributor Author

brdvd commented Apr 23, 2024

@lpugin I think the first step should be to move the beam and (initial) slur calculation from the drawing process into functors. This includes any other layout calculation that is done during drawing. Those calculations are currently done repeatedly and maybe it would be sufficient to do them just once or twice. It would also lead to a cleaner code structure.
Anyway, I can't find the time for another big refactoring project (like the one for functors) at the moment. But sure, take your time to investigate the PR. From my point of view, this is something like a proof of concept for parallelisation. It demonstrates that it works and has the potential to improve performance, but we still have to figure out how to utilise it in an optimal way.

@lpugin
Contributor

lpugin commented Apr 29, 2024

Yes, moving some layout calculation steps into functors would already enable parallelization for certain jobs and would be good. I think I would like to keep the PR as a draft so we can keep an eye on it and keep it "mergeable".

@lpugin lpugin marked this pull request as draft April 29, 2024 15:16