-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add JFR event #19
base: master
Are you sure you want to change the base?
Add JFR event #19
Conversation
This PR adds a JFR event that is triggered whenever there is a "thrashing" event where the cached interface is changed. The event contains the concrete class, the interface class, and a stack trace. It is disabled by default so that there is no impact when JFR is not used. Additionally: - Add a check that a modified class' loader actually can access the TraceInstanceOf class. This is necessary because gradle test workers have a filtering classloader for internal code that does not include the agent, and any type check in that class loader would fail. - Enable the dependency reduced pom to allow for proper publishing of the shaded jar to a maven repo.
@franz1981 these are the code changes. LMK if you want to handle publishing to maven central on the redhat side or if we should fork and publish on the micronaut side. |
Add a test to the benchmarks module that runs certain JMH benchmarks with the redhat type pollution agent ( https://github.com/RedHatPerf/type-pollution-agent ) and triggers a test failure if there is excessive type pollution. This works by adding a new test task which includes the agent, and then using JFRUnit to access the JFR events emitted by the agents. If there are too many thrash events for a single concrete type, details are logged and the test fails. Additionally: - Add a JMH benchmark that mimics the TechEmpower benchmark - Fix various type pollution issues that were found using the FullHttpStackBenchmark This PR is still pending the upstream PR to introduce JFR events to the agent ( RedHatPerf/type-pollution-agent#19 ), and actual publishing of the type pollution agent. We will either wait for RH to publish the agent to maven central, or we will publish it ourselves if they don't want to do it (reply pending).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is good, but I would narrow it to a specific use case - maybe with an example/soak test to make sure it works for real and which help users to understand how to use it properly.
Additionally, can be useful to have a CLI command shipped with it which can process the jfr bin and produce a report too.
Regardless, many thanks for the PR @yawkat : is much appreciated!
@@ -101,6 +101,13 @@ public void onTypeCheckHit(Class interfaceClazz, String trace) { | |||
LAST_SEEN_INTERFACE_UPDATER.lazySet(this, interfaceClazz); | |||
if (lastSeen != null) { | |||
updateTraceCount(interfaceClazz, trace); | |||
|
|||
TypeCheckThrashEvent event = new TypeCheckThrashEvent(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
type-pollution-agent/agent/src/main/java/io/type/pollution/agent/TraceInstanceOf.java
Line 81 in 5324120
updateTraceCount(interfaceClazz, trace); |
IDK if you plan to have it running in a proper benchmark but this can be a VERY high frequency event, and I've designed it to run in the hot path, at the time e.g. traces are pre-computed on bytecode transformation
if you plan to do something similar, maybe you just want to emit the summary event instead - like the report - on demand or at intervals or using a similar trick than the new TLAB JFR events (by using a counter which "overflow" and emit events only if it overflows).
If is not meant to be used in a real benchmark (but sadly that's the easier way to make it happen too :( ) let's leave it as it is and in case, change it in something cheaper after, if it create problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes indeed, this code is very hot. I disabled the event by default for that reason. The performance overhead is substantial in my testing, but only when there is thrashing (for my tests, that's the failure case).
I am not using this inside a benchmark, though I am repurposing jmh benchmarks for this test. So it's not performance critical to me. I hope this logic is sufficient to make it no overhead for the case where the jfr event is not enabled (isEnabled should be constant folded).
The problem with emitting the report is that it is a complicated nested data structure. This makes emitting but also parsing more complicated. I didn't feel it was worth it for the jfrunit use case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not using this inside a benchmark
yeah I can see it why but remember that the "pollution" pattern depends by both the ordering of type checks - which can trash the cache using a single thread OR because of high level of concurrency - which means you need an option to do it in a proper real world benchmark too, although it will be used just to detect that there's some trashing going on and fail a release - for example. Something which cannot be done with simple unit tests, really.
What can be done here, to enable using this in high-throughput scenario is:
i.e.
- use a atomic reference field updater which contains the current
TypeCheckThrashEvent
stored within the counter instance: it has to contain an atomic long with the number of accumulated thrashed events - when the current one overflow - we can perform a compareAndSet(usedEvent, null) pro-bono: if it fails is not important
- any racy thread which overflow > 1 more past the threshold of the current event, can try create a new initialized event with compareAndSet(usedEvent, new Event(1)) - if it failes, can retry or give up, really - but with compareAndExchange, in theory can already check if there's a newer event ready to pick it
In a jfr unit test, the default threshold can be 1: we don't care about performance and should "just work": in a benchmark we can have 1000 or 10000 and the actual event count must be scaled using the limit, as per TLAB allocation events, in JFR.
Let me know if it seems appealing for you: if not we can go through this one, but when you'll need it for benchmark, we should get back to this, eventually.
@Description("An interface type check is invalidating a previously cached type check result. If this happens often, it can lead to thrashing and contention of the cache field.") | ||
@StackTrace | ||
@Enabled(false) | ||
public class TypeCheckThrashEvent extends Event { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The stack trace help - but it will print it till the tracing class, which is not very nice.
Instead, you can still reuse the existing trace
regardless, because is free and cheap (is precomputed) and is "human actionable" the same, if pasted in the Idea output (as explained in the README).
I think a CLI or similar make little sense for the JFRUnit use case. It would make sense if the goal is to replace the current reporting functionality (the FileChannel stuff etc) with JFR-based tooling. This would need more work, though. |
This PR adds a JFR event that is triggered whenever there is a "thrashing" event where the cached interface is changed. The event contains the concrete class, the interface class, and a stack trace. It is disabled by default so that there is no impact when JFR is not used.
Additionally: