Remove wasteful safer Inet4 lookups #276
e0f3b3c to a2c7235
@radcortez can you check whether the ones happening in clinit are the ones you saw in the flamegraphs? Here I can see many, not just the three I've fixed.
Ah no, the flamegraph is per line of code, so the fix here should take care of it already.
a2c7235 to 3a6d6da
Here is the flamegraph with the current behaviour: [flamegraph] And with the change: [flamegraph]
Thanks @radcortez, so the overall saving in "allocation bandwidth" is ~217KB, due to removing the […]. If you want to give this a shot, you could profile with the […]. The overall heap allocation pressure reduction between the 2 versions is ~0.8%, which is a tiny amount but still measurable; with the native one you can get the global picture of the improvement, if you wish.
The diff as given doesn't seem to do anything to remove an indy. Also, as I said on the WF Common change, I think that it's better to use […]
Yes, I confirm an improvement in allocs RSS for native as well with this PR. It is somewhat significant, but I only noticed it because it was in my analysis path, and it wasn't showing up before on previous versions. We should move forward, but we need to clarify @dmlloyd's suggestion first.
How are you measuring, and are you comparing against […]? I'm not arguing against the change per se, but if we're justifying any additional complexity by a measurable performance improvement, then the measurements had better show a real-world improvement that is greater than the margin of error.
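As a rough illustration of the "greater than the margin of error" criterion above, the sketch below compares two sets of repeated measurements (for example RSS samples from several runs of each build). The class name `NoiseCheck` and the "difference larger than twice the combined standard error" rule are assumptions chosen for illustration, not something used in this thread, and the rule is only a crude heuristic rather than a proper statistical test.

```java
public final class NoiseCheck {
    private NoiseCheck() {}

    /** Arithmetic mean of the samples. */
    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) {
            sum += x;
        }
        return sum / xs.length;
    }

    /** Standard error of the mean: sample standard deviation divided by sqrt(n). */
    static double standardError(double[] xs) {
        if (xs.length < 2) {
            throw new IllegalArgumentException("need at least two samples");
        }
        double m = mean(xs);
        double sumSq = 0.0;
        for (double x : xs) {
            sumSq += (x - m) * (x - m);
        }
        double variance = sumSq / (xs.length - 1); // unbiased sample variance
        return Math.sqrt(variance / xs.length);
    }

    /**
     * Heuristic (an assumption of this sketch): treat the change as measurable
     * only if the difference of means exceeds twice the combined standard error.
     */
    public static boolean exceedsNoise(double[] baseline, double[] patched) {
        double diff = Math.abs(mean(baseline) - mean(patched));
        double seB = standardError(baseline);
        double seP = standardError(patched);
        double combined = Math.sqrt(seB * seB + seP * seP);
        return diff > 2.0 * combined;
    }
}
```

Feeding it the per-run numbers of the baseline and patched builds gives a quick yes/no answer on whether the observed difference stands out from run-to-run variation.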
Nope, sadly the JDK issue is correct; Claes probably just didn't report in the issue itself the full analysis that led me to prepare the reproducer for him, but I will later attach my comment to the PR, which explains a bit more about it. In the original benchmark reported in the issue, you can find https://gist.github.com/franz1981/bf67ed8f328ed112a524c1150833c718#file-asciicopy-java-L73 which indeed simplifies the life of the JIT during the generation of the array's copy stub, by having separate call sites which: […]
Which translates into the conclusion that having a properly sized array pays off in the already slower (if compared to […]
Related: using the […]
Although on the PR at openjdk/jdk#12453 (comment) I cannot see any mention of it in the main PR comment, my previous tests report that one as the worst method to use; I can re-verify it if needed. The changes at https://bugs.openjdk.org/browse/JDK-8301958 instead provide a "better" copy method, using the same trick I've suggested, but it will improve just the copy, and not the inherent checks in […]. That said, these are facts which should be weighed against the perceived increase in code complexity (it's personal, but compared with the original version it is clearly worse now!) versus the actual gain: I have no problem changing the PR, or closing it and letting anyone propose a different change which still improves the RSS and native usage as this one does; the runtime performance is sadly already reduced because of not using the indy, but it is probably a small drop in the ocean, if it is not the hottest path!
I think the loss of the indy is not a problem for the same reason that the non-deprecated ctor usage is not a problem: it is indeed a drop in the ocean. We do not create many of these address instances in this way, normally. Certainly not on any hot paths. The purpose of these methods is to construct addresses for which an RDNS lookup is never performed, and the code that exists does that pretty well. The primary question is, do we see a measurable improvement with the patch as it stands, in a real-world application (e.g. a Quarkus application), which is greater than the error%? If not, then I don't think the complexity is warranted (in addition to the message "we provide these methods to construct an address, but do not use them because they are too slow").
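For readers unfamiliar with the "no RDNS lookup" point, here is a minimal sketch using only the JDK. `InetAddress.getByAddress(String, byte[])` does not consult the name service, so an address built with an already-known host string returns that string from `getHostName()` without a reverse lookup. The class `NoLookupAddress` and its `of` helper are hypothetical names for this illustration and are not the project's actual methods.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public final class NoLookupAddress {
    private NoLookupAddress() {}

    /**
     * Hypothetical helper: builds an IPv4 address from four octets plus a host
     * string that is already known, so no name service is ever consulted.
     */
    public static InetAddress of(String knownHost, int a, int b, int c, int d) {
        byte[] bytes = {(byte) a, (byte) b, (byte) c, (byte) d};
        try {
            return InetAddress.getByAddress(knownHost, bytes);
        } catch (UnknownHostException e) {
            // Only thrown for an illegal array length, which cannot happen with 4 bytes.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The host string is a literal, so nothing is computed or looked up here.
        InetAddress loopback = of("127.0.0.1", 127, 0, 0, 1);
        System.out.println(loopback.getHostName());
        System.out.println(loopback.getHostAddress());
    }
}
```

Passing the host as a string literal also avoids building that string from the octets at class-initialization time.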
Addendum: In retrospect, the best approach would have been to create these as static methods which return a dynamic constant that lazily constructs each address. But the only way we could possibly move to that approach (leaving out the necessary bytecode processing, which is not trivial) would be to deprecate the fields, which would not fix the problem until they are actually removed. So, I'd say it's too late for that.
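To make the "static method that lazily constructs each address" idea concrete, here is a sketch that swaps in the plain initialization-on-demand holder idiom rather than a true dynamic constant (condy), since the latter needs the bytecode processing mentioned above. All names (`LazyAddresses`, `inet4Loopback`) are hypothetical and not part of the project's API.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public final class LazyAddresses {
    private LazyAddresses() {}

    // The JVM initializes this holder class only on the first call to
    // inet4Loopback(), so the address is not built when LazyAddresses loads.
    private static final class Inet4LoopbackHolder {
        static final InetAddress VALUE = create("127.0.0.1", 127, 0, 0, 1);
    }

    /** Accessor that replaces a public static final field (hypothetical API). */
    public static InetAddress inet4Loopback() {
        return Inet4LoopbackHolder.VALUE;
    }

    private static InetAddress create(String host, int a, int b, int c, int d) {
        try {
            // No name service lookup is performed by this JDK method.
            return InetAddress.getByAddress(host, new byte[] {(byte) a, (byte) b, (byte) c, (byte) d});
        } catch (UnknownHostException e) {
            // Unreachable: the array always has a legal length of 4.
            throw new IllegalStateException(e);
        }
    }
}
```

The trade-off is the one described in the comment: callers would have to switch from fields to method calls, so existing public fields could only be deprecated, not replaced outright.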
I can share what I've learnt at #276 (comment): the difference exists in quickstart use cases, which, although not representative of any "real" usage IMO, are still used by some users (e.g. some experts who share random RSS numbers on their social accounts), which can affect us, somehow.
Where do you get ~217KB from? Upstream already has the indy-removing change AFAICT... this patch only seems to reuse […]
Sorry, to clarify, the measurement is with Quarkus main, and with a replacement of […]
I didn't use any fancy measurement; a plain […]. For reference, this is what it looks like in […]: […] And this is what it looks like with Quarkus 3.5.0, before the WF Common (1.5.4) update: […] And with the PR change: […]
This, however, has no bearing on this PR. WF Common is clearly the main cost here. To maximize the measurement of this change, I'd recommend building […]
It was from the allocation flamegraphs which @radcortez shared in #276 (comment)
@dmlloyd let me apologize, my brain was still stuck at #275, which was already merged, and for some reason I was thinking this PR was that one instead 😂 I have probably misread my own comment at #276 (comment), in which I was providing the full analysis of this and the other PR together, compared to without them. This one just saves allocating a few bytes (and Strings), because we already know the addresses in string literal form; that said, I have no problem closing it, in case it looks weird or useless...
Also, use the new API to construct the global IP address constants, to avoid unnecessary `String` construction. Supersedes smallrye#276.
In that case WDYT of #279 as an API-friendly alternative?
Yeah @radcortez, in both I see that the string concat indy is very relevant.
Yeah we all agree about the string concat issue. Now we just need to all agree that string concat is not the subject of this PR nor the subject of this discussion. 😉
I see, @dmlloyd, that @radcortez has been stuck with me on the previous PR, hihi.
@franz1981 do we still want to pursue this further?
We can ask @radcortez if dmlloyd@2274755 (which I like) supersedes and fixes this same issue; if it does, we can safely close this one!
Correct.
No description provided.