Add JFR event #19

yawkat · 2024-06-18T11:03:31Z

This PR adds a JFR event that is triggered whenever there is a "thrashing" event where the cached interface is changed. The event contains the concrete class, the interface class, and a stack trace. It is disabled by default so that there is no impact when JFR is not used.

Additionally:

Add a check that a modified class' loader actually can access the TraceInstanceOf class. This is necessary because gradle test workers have a filtering classloader for internal code that does not include the agent, and any type check in that class loader would fail.
Enable the dependency reduced pom to allow for proper publishing of the shaded jar to a maven repo.

This PR adds a JFR event that is triggered whenever there is a "thrashing" event where the cached interface is changed. The event contains the concrete class, the interface class, and a stack trace. It is disabled by default so that there is no impact when JFR is not used. Additionally: - Add a check that a modified class' loader actually can access the TraceInstanceOf class. This is necessary because gradle test workers have a filtering classloader for internal code that does not include the agent, and any type check in that class loader would fail. - Enable the dependency reduced pom to allow for proper publishing of the shaded jar to a maven repo.

yawkat · 2024-06-18T11:04:19Z

@franz1981 these are the code changes. LMK if you want to handle publishing to maven central on the redhat side or if we should fork and publish on the micronaut side.

Add a test to the benchmarks module that runs certain JMH benchmarks with the redhat type pollution agent ( https://github.com/RedHatPerf/type-pollution-agent ) and triggers a test failure if there is excessive type pollution. This works by adding a new test task which includes the agent, and then using JFRUnit to access the JFR events emitted by the agents. If there are too many thrash events for a single concrete type, details are logged and the test fails. Additionally: - Add a JMH benchmark that mimics the TechEmpower benchmark - Fix various type pollution issues that were found using the FullHttpStackBenchmark This PR is still pending the upstream PR to introduce JFR events to the agent ( RedHatPerf/type-pollution-agent#19 ), and actual publishing of the type pollution agent. We will either wait for RH to publish the agent to maven central, or we will publish it ourselves if they don't want to do it (reply pending).

franz1981

The idea is good, but I would narrow it to a specific use case - maybe with an example/soak test to make sure it works for real and which help users to understand how to use it properly.

Additionally, can be useful to have a CLI command shipped with it which can process the jfr bin and produce a report too.

Regardless, many thanks for the PR @yawkat : is much appreciated!

franz1981 · 2024-06-18T12:14:14Z

agent/src/main/java/io/type/pollution/agent/TraceInstanceOf.java

@@ -101,6 +101,13 @@ public void onTypeCheckHit(Class interfaceClazz, String trace) {
            LAST_SEEN_INTERFACE_UPDATER.lazySet(this, interfaceClazz);
            if (lastSeen != null) {
                updateTraceCount(interfaceClazz, trace);
+
+                TypeCheckThrashEvent event = new TypeCheckThrashEvent();


type-pollution-agent/agent/src/main/java/io/type/pollution/agent/TraceInstanceOf.java

Line 81 in 5324120

updateTraceCount(interfaceClazz, trace);

can you add a Miss one too - or use some field to distinguish between them?

IDK if you plan to have it running in a proper benchmark but this can be a VERY high frequency event, and I've designed it to run in the hot path, at the time e.g. traces are pre-computed on bytecode transformation
if you plan to do something similar, maybe you just want to emit the summary event instead - like the report - on demand or at intervals or using a similar trick than the new TLAB JFR events (by using a counter which "overflow" and emit events only if it overflows).
If is not meant to be used in a real benchmark (but sadly that's the easier way to make it happen too :( ) let's leave it as it is and in case, change it in something cheaper after, if it create problems.

Yes indeed, this code is very hot. I disabled the event by default for that reason. The performance overhead is substantial in my testing, but only when there is thrashing (for my tests, that's the failure case).

I am not using this inside a benchmark, though I am repurposing jmh benchmarks for this test. So it's not performance critical to me. I hope this logic is sufficient to make it no overhead for the case where the jfr event is not enabled (isEnabled should be constant folded).

The problem with emitting the report is that it is a complicated nested data structure. This makes emitting but also parsing more complicated. I didn't feel it was worth it for the jfrunit use case.

I am not using this inside a benchmark

yeah I can see it why but remember that the "pollution" pattern depends by both the ordering of type checks - which can trash the cache using a single thread OR because of high level of concurrency - which means you need an option to do it in a proper real world benchmark too, although it will be used just to detect that there's some trashing going on and fail a release - for example. Something which cannot be done with simple unit tests, really.

What can be done here, to enable using this in high-throughput scenario is:
i.e.

use a atomic reference field updater which contains the current TypeCheckThrashEvent stored within the counter instance: it has to contain an atomic long with the number of accumulated thrashed events

when the current one overflow - we can perform a compareAndSet(usedEvent, null) pro-bono: if it fails is not important

any racy thread which overflow > 1 more past the threshold of the current event, can try create a new initialized event with compareAndSet(usedEvent, new Event(1)) - if it failes, can retry or give up, really - but with compareAndExchange, in theory can already check if there's a newer event ready to pick it

In a jfr unit test, the default threshold can be 1: we don't care about performance and should "just work": in a benchmark we can have 1000 or 10000 and the actual event count must be scaled using the limit, as per TLAB allocation events, in JFR.

Let me know if it seems appealing for you: if not we can go through this one, but when you'll need it for benchmark, we should get back to this, eventually.

franz1981 · 2024-06-18T12:19:20Z

agent/src/main/java/io/type/pollution/agent/TypeCheckThrashEvent.java

+@Description("An interface type check is invalidating a previously cached type check result. If this happens often, it can lead to thrashing and contention of the cache field.")
+@StackTrace
+@Enabled(false)
+public class TypeCheckThrashEvent extends Event {


The stack trace help - but it will print it till the tracing class, which is not very nice.
Instead, you can still reuse the existing trace regardless, because is free and cheap (is precomputed) and is "human actionable" the same, if pasted in the Idea output (as explained in the README).

yawkat · 2024-06-18T12:59:33Z

I think a CLI or similar make little sense for the JFRUnit use case. It would make sense if the goal is to replace the current reporting functionality (the FileChannel stuff etc) with JFR-based tooling. This would need more work, though.

yawkat mentioned this pull request Jun 18, 2024

Type pollution test micronaut-projects/micronaut-core#10918

Merged

franz1981 requested changes Jun 18, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add JFR event #19

Add JFR event #19

yawkat commented Jun 18, 2024

yawkat commented Jun 18, 2024

franz1981 left a comment •

edited

Loading

franz1981 Jun 18, 2024

yawkat Jun 18, 2024

franz1981 Jun 18, 2024

franz1981 Jun 18, 2024

yawkat commented Jun 18, 2024

Add JFR event #19

Are you sure you want to change the base?

Add JFR event #19

Conversation

yawkat commented Jun 18, 2024

yawkat commented Jun 18, 2024

franz1981 left a comment • edited Loading

Choose a reason for hiding this comment

franz1981 Jun 18, 2024

Choose a reason for hiding this comment

yawkat Jun 18, 2024

Choose a reason for hiding this comment

franz1981 Jun 18, 2024

Choose a reason for hiding this comment

franz1981 Jun 18, 2024

Choose a reason for hiding this comment

yawkat commented Jun 18, 2024

franz1981 left a comment •

edited

Loading