Improve scaling behavior of Mono<Void> and(Publisher<?> other)
#3920
Hey, thanks for your interest in this subject. Can you please provide a use case where this would become a bottleneck? I wonder how practical it is to combine 200K sources.
Hey, thanks for your response. I definitely understand your concern about memory footprint. However, I can certainly provide some real-world use cases to support my suggestion. I am the owner of keldysh - MessageFlow, a reactive messaging framework built on Project Reactor's core concepts to provide secure and scalable integration patterns, constructed as a native reactive pipeline. Its central abstraction is:

```java
public interface Message<PAYLOAD> {
    PAYLOAD payload();
    Mono<Void> acknowledge();
}
```

This choice allows us to provide an API around the central `acknowledge()` hook. However, any kind of aggregation combines many such acknowledgements into a single `Mono<Void>` via `and(...)`. Similarly, some messaging protocols enforce acknowledgements to be processed in the order messages were received. Since message processing is reactive, and hence does not provide any ordering guarantees whatsoever, in such cases we maintain acknowledgements in a store that reconstructs ordering on demand, again combining acknowledgements with `and(...)`. Both examples are rather typical scenarios and easily pile up millions of messages. From our perspective, the increased memory footprint of an `ArrayList` is a small price for linear scaling.
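The quadratic cost under discussion can be made concrete with a plain-JDK sketch (no Reactor involved; `ScalingDemo` and both method names are illustrative, and the growth policy only approximates `ArrayList`'s ~1.5x strategy). It counts element copies when appending one source at a time, contrasting a grow-by-one array with a geometrically growing buffer:

```java
public class ScalingDemo {
    // Counts element copies when growing a bare array by one slot per append,
    // the pattern repeated and(...) calls produce today (a sketch, not Reactor code).
    static long arrayAppendCopies(int n) {
        long copies = 0;
        Object[] sources = new Object[0];
        for (int i = 0; i < n; i++) {
            copies += sources.length;            // full copy of the old array
            Object[] next = new Object[sources.length + 1];
            System.arraycopy(sources, 0, next, 0, sources.length);
            next[sources.length] = "src" + i;
            sources = next;
        }
        return copies;
    }

    // Simulates a geometrically growing buffer (roughly ArrayList's ~1.5x policy):
    // total copies stay linear in n.
    static long arrayListAppendCopies(int n) {
        long copies = 0;
        int capacity = 0, size = 0;
        for (int i = 0; i < n; i++) {
            if (size == capacity) {
                copies += size;                   // grow: copy existing elements once
                capacity = Math.max(10, capacity + (capacity >> 1));
            }
            size++;
        }
        return copies;
    }

    public static void main(String[] args) {
        int n = 200_000;
        System.out.println(arrayAppendCopies(n));     // n*(n-1)/2 = 19999900000
        System.out.println(arrayListAppendCopies(n)); // O(n), well below 1,000,000 here
    }
}
```

At 200K sources the grow-by-one array performs roughly 20 billion element copies, while geometric growth stays within a small constant factor of n.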
Thanks for the additional context. I played a bit with your benchmark. Please note, that's not the right way to do benchmarks in Java. Consider using JMH for more reliable tests in the future. Nevertheless, just to carry on with the brief feedback, I altered your example to depict the expected, common case of a chain of around 3 calls:

```java
public static void main(String[] args) {
    measure(() -> test(new ArrayList<>(), 3), 1_000_000);
    measure(() -> test(new LinkedList<>(), 3), 1_000_000);
    measure(() -> testArray(3), 1_000_000);
}

static void measure(Runnable r, int iterations) {
    long start = System.currentTimeMillis();
    for (int i = 0; i < iterations; i++) {
        r.run();
    }
    System.out.println(System.currentTimeMillis() - start);
}

static void test(List<Integer> list, int numberOfIterations) {
    for (int i = 0; i != numberOfIterations; i++) {
        list.add(i);
    }
}

static void testArray(int count) {
    Integer[] sources = new Integer[]{};
    for (int i = 0; i != count; i++) {
        int oldLen = sources.length;
        Integer[] newSources = new Integer[oldLen + 1];
        System.arraycopy(sources, 0, newSources, 0, oldLen);
        newSources[oldLen] = i;
        sources = newSources;
    }
}
```

Output:
Not only does the copy look quicker, a bare array will also be much faster at runtime than going through a `List` indirection. When it comes to chaining operators at such great lengths, the stack depth grows enormously. Hence, the operator performs macro fusion and replaces the current object with an aggregated one with an added source. However, imagine what happens when your original example is altered:
The optimization won't even have the chance to be applied in such a case. Is it a possibility for your library to allow users to provide/work with an `Iterable` of sources instead?
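The macro fusion mentioned above can be sketched in isolation. `ToyWhen` below is a made-up stand-in for Reactor's internal aggregate operator, not its actual `MonoWhen`: each `and` extends a flat source array by one instead of nesting wrappers, so the operator chain stays one level deep.

```java
import java.util.Arrays;

// Toy sketch of macro fusion: and() on an already-aggregated operator does not
// wrap it in another layer; it returns a new aggregate whose source array is
// extended by one element. Names are illustrative, not Reactor's classes.
final class ToyWhen {
    final Object[] sources;

    ToyWhen(Object... sources) {
        this.sources = sources;
    }

    // Fused: yields one flat aggregate instead of ToyWhen(ToyWhen(...), other).
    ToyWhen and(Object other) {
        Object[] next = Arrays.copyOf(sources, sources.length + 1);
        next[sources.length] = other;
        return new ToyWhen(next);
    }
}

public class FusionDemo {
    public static void main(String[] args) {
        ToyWhen w = new ToyWhen("a");
        for (int i = 0; i < 4; i++) {
            w = w.and("s" + i);
        }
        // Still a single flat aggregate: depth 1, five sources.
        System.out.println(w.sources.length); // 5
    }
}
```

The flat array keeps subscription-time stack depth constant, but each `and` copies the whole array, which is exactly where the O(N^2) assembly cost comes from.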
I mean, I called the suggestion "Improve scaling behavior ...". Finally, it is a decision on your end whether you want to squeeze out every bit of performance for simple cases, at the cost of potentially stalled applications in real-world scenarios, or whether you are willing to compromise on a couple of milliseconds at 1 million times three operations, to guarantee a library that performs well under any circumstances. To answer the question: users do not work with the acknowledgements at all. That's actually one of the central points, to guarantee consistent processing.
Yes, we do. Especially since those simple cases are more common, e.g. in request-response scenarios where a bit of manipulation is performed here and there; if these add up, the latency increases, reducing the scalability of the system. What's especially important to monitor is not just the assembly time but the actual runtime - with simple arrays now, the access patterns are straightforward. Introducing an `ArrayList` can be a burden, as each access needs to go through a level of indirection. More on that later.
You are free to use
That's a broader subject to be honest. The topics of a hot path, Amdahl's law, GC, etc. It's not only about milliseconds, yet they still do count.
Having an established and mature library, minuscule changes optimizing for specific use cases can harm the majority of usages, so we need to be careful. I am not opposing the change per se, by no means. I actually wonder whether the JIT compiler would optimize away some of the concerns I seem to have. During the subscription phase, a bare array can still be obtained and used. To push the research forward we'd need:
If we see that the proposed change is not significantly impacting the current case (an order of magnitude change would be unacceptable, a few % here and there should be ok), we can consider it. Important things to note:
Having the above, there's some initial work to be done to begin considering the change. We can see if there's more community demand for improvements here, and based on that we can spend some time on it, or you can drive the discussion with the research. @PMacho Are you interested in pushing this forward?
The effect of the JIT can be anticipated simply with:

```java
static void measure(Runnable r, int iterations) {
    // warm-up pass so the JIT compiles the hot path before timing starts
    for (int i = 0; i != iterations; i++) {
        r.run();
    }
    long start = System.currentTimeMillis();
    for (int i = 0; i != iterations; i++) {
        r.run();
    }
    System.out.println(System.currentTimeMillis() - start);
}
```

Above all, this lets the JIT optimize the measured code. From what we saw already, the difference between the variants is tiny at such small sizes. However, yes, I'd like to push the issue, since I think a general-purpose library like Project Reactor should perform well at every scale, rather than being optimized for simple cases while entirely breaking at scale. Yet, no, at the moment I am not willing to implement benchmarks proving obvious mathematical facts. Don't get me wrong, I am a fan of optimizing, even nano-optimizations like using arrays over `ArrayList`s.
If you'd like to contribute and make it work, perhaps have a look at the collaboration and provided benchmark results in a PR that's currently being reviewed - #3901 - and take inspiration from that. If you claim, however:
Then let's take a pause, and perhaps someone willing can take this forward if there's community demand. I treat this as a low priority at the moment since a workaround using `Mono.when` with an `Iterable` of sources is available.
Motivation

At the moment, `Mono`'s `and(...)` method scales O(N^2), slowing down when applied multiple times. The reason is in the way `MonoWhen` copies its `sources` array on every call.
Desired solution

I suggest using an `ArrayList` for the `sources` field in `MonoWhen`, allowing to simply `add` a new source much more efficiently. Simple comparison:
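The comparison referenced above can be approximated with plain JDK code (`ComparisonSketch` and its timings are illustrative; absolute numbers depend on hardware, but the linear-vs-quadratic shape is the point):

```java
import java.util.ArrayList;
import java.util.List;

// Hedged sketch: ArrayList.add versus growing a bare array by one element
// per append, as repeated and(...) calls do today.
public class ComparisonSketch {
    static long timeArrayList(int n) {
        long t0 = System.nanoTime();
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return System.nanoTime() - t0;
    }

    static long timeArrayCopy(int n) {
        long t0 = System.nanoTime();
        Integer[] arr = new Integer[0];
        for (int i = 0; i < n; i++) {
            Integer[] next = new Integer[arr.length + 1];
            System.arraycopy(arr, 0, next, 0, arr.length);
            next[arr.length] = i;
            arr = next;
        }
        return System.nanoTime() - t0;
    }

    public static void main(String[] args) {
        int n = 50_000; // large enough for the O(n^2) copy cost to dominate
        System.out.println("ArrayList:  " + timeArrayList(n) / 1_000_000 + " ms");
        System.out.println("array copy: " + timeArrayCopy(n) / 1_000_000 + " ms");
    }
}
```

On any recent JVM the per-append copy variant is slower by several orders of magnitude at this size.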
Considered alternatives

At the moment, I use

```java
Mono.fromFuture(CompletableFuture.allOf(mono1.toFuture(), mono2.toFuture()));
```

which certainly is not the idea, I suppose.

Additional context
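The `CompletableFuture.allOf` workaround above can be exercised end-to-end with plain futures (no Reactor needed; `AllOfExample` and `awaitAll` are made-up names for illustration):

```java
import java.util.concurrent.CompletableFuture;

public class AllOfExample {
    // Completes when every input future completes, mirroring what
    // Mono.fromFuture(CompletableFuture.allOf(...)) ends up waiting for.
    static boolean awaitAll(CompletableFuture<?>... futures) {
        CompletableFuture<Void> all = CompletableFuture.allOf(futures);
        all.join();
        return all.isDone();
    }

    public static void main(String[] args) {
        // Two already-completed futures stand in for mono1.toFuture() etc.
        CompletableFuture<String> f1 = CompletableFuture.completedFuture("payload");
        CompletableFuture<Integer> f2 = CompletableFuture.completedFuture(42);
        System.out.println(awaitAll(f1, f2)); // prints: true
    }
}
```

Note that `allOf` discards the input values and yields `CompletableFuture<Void>`, which is why it lines up with the `Mono<Void>` shape, but it also forces every `Mono` through the future bridge, losing laziness and cancellation semantics.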