Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add JFR event #19

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Add JFR event #19

wants to merge 1 commit into from

Conversation

yawkat
Copy link

@yawkat yawkat commented Jun 18, 2024

This PR adds a JFR event that is triggered whenever there is a "thrashing" event where the cached interface is changed. The event contains the concrete class, the interface class, and a stack trace. It is disabled by default so that there is no impact when JFR is not used.

Additionally:

  • Add a check that a modified class' loader actually can access the TraceInstanceOf class. This is necessary because gradle test workers have a filtering classloader for internal code that does not include the agent, and any type check in that class loader would fail.
  • Enable the dependency reduced pom to allow for proper publishing of the shaded jar to a maven repo.

This PR adds a JFR event that is triggered whenever there is a "thrashing" event where the cached interface is changed. The event contains the concrete class, the interface class, and a stack trace. It is disabled by default so that there is no impact when JFR is not used.

Additionally:

- Add a check that a modified class' loader actually can access the TraceInstanceOf class. This is necessary because gradle test workers have a filtering classloader for internal code that does not include the agent, and any type check in that class loader would fail.
- Enable the dependency reduced pom to allow for proper publishing of the shaded jar to a maven repo.
@yawkat
Copy link
Author

yawkat commented Jun 18, 2024

@franz1981 these are the code changes. LMK if you want to handle publishing to maven central on the redhat side or if we should fork and publish on the micronaut side.

yawkat added a commit to micronaut-projects/micronaut-core that referenced this pull request Jun 18, 2024
Add a test to the benchmarks module that runs certain JMH benchmarks with the redhat type pollution agent ( https://github.com/RedHatPerf/type-pollution-agent ) and triggers a test failure if there is excessive type pollution.

This works by adding a new test task which includes the agent, and then using JFRUnit to access the JFR events emitted by the agents. If there are too many thrash events for a single concrete type, details are logged and the test fails.

Additionally:
- Add a JMH benchmark that mimics the TechEmpower benchmark
- Fix various type pollution issues that were found using the FullHttpStackBenchmark

This PR is still pending the upstream PR to introduce JFR events to the agent ( RedHatPerf/type-pollution-agent#19 ), and actual publishing of the type pollution agent. We will either wait for RH to publish the agent to maven central, or we will publish it ourselves if they don't want to do it (reply pending).
Copy link
Contributor

@franz1981 franz1981 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is good, but I would narrow it to a specific use case - maybe with an example/soak test to make sure it works for real and which help users to understand how to use it properly.

Additionally, can be useful to have a CLI command shipped with it which can process the jfr bin and produce a report too.

Regardless, many thanks for the PR @yawkat : is much appreciated!

@@ -101,6 +101,13 @@ public void onTypeCheckHit(Class interfaceClazz, String trace) {
LAST_SEEN_INTERFACE_UPDATER.lazySet(this, interfaceClazz);
if (lastSeen != null) {
updateTraceCount(interfaceClazz, trace);

TypeCheckThrashEvent event = new TypeCheckThrashEvent();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a Miss one too - or use some field to distinguish between them?

IDK if you plan to have it running in a proper benchmark but this can be a VERY high frequency event, and I've designed it to run in the hot path, at the time e.g. traces are pre-computed on bytecode transformation
if you plan to do something similar, maybe you just want to emit the summary event instead - like the report - on demand or at intervals or using a similar trick than the new TLAB JFR events (by using a counter which "overflow" and emit events only if it overflows).
If is not meant to be used in a real benchmark (but sadly that's the easier way to make it happen too :( ) let's leave it as it is and in case, change it in something cheaper after, if it create problems.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes indeed, this code is very hot. I disabled the event by default for that reason. The performance overhead is substantial in my testing, but only when there is thrashing (for my tests, that's the failure case).

I am not using this inside a benchmark, though I am repurposing jmh benchmarks for this test. So it's not performance critical to me. I hope this logic is sufficient to make it no overhead for the case where the jfr event is not enabled (isEnabled should be constant folded).

The problem with emitting the report is that it is a complicated nested data structure. This makes emitting but also parsing more complicated. I didn't feel it was worth it for the jfrunit use case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not using this inside a benchmark

yeah I can see it why but remember that the "pollution" pattern depends by both the ordering of type checks - which can trash the cache using a single thread OR because of high level of concurrency - which means you need an option to do it in a proper real world benchmark too, although it will be used just to detect that there's some trashing going on and fail a release - for example. Something which cannot be done with simple unit tests, really.

What can be done here, to enable using this in high-throughput scenario is:
i.e.

  1. use a atomic reference field updater which contains the current TypeCheckThrashEvent stored within the counter instance: it has to contain an atomic long with the number of accumulated thrashed events
  2. when the current one overflow - we can perform a compareAndSet(usedEvent, null) pro-bono: if it fails is not important
  3. any racy thread which overflow > 1 more past the threshold of the current event, can try create a new initialized event with compareAndSet(usedEvent, new Event(1)) - if it failes, can retry or give up, really - but with compareAndExchange, in theory can already check if there's a newer event ready to pick it

In a jfr unit test, the default threshold can be 1: we don't care about performance and should "just work": in a benchmark we can have 1000 or 10000 and the actual event count must be scaled using the limit, as per TLAB allocation events, in JFR.

Let me know if it seems appealing for you: if not we can go through this one, but when you'll need it for benchmark, we should get back to this, eventually.

@Description("An interface type check is invalidating a previously cached type check result. If this happens often, it can lead to thrashing and contention of the cache field.")
@StackTrace
@Enabled(false)
public class TypeCheckThrashEvent extends Event {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The stack trace help - but it will print it till the tracing class, which is not very nice.
Instead, you can still reuse the existing trace regardless, because is free and cheap (is precomputed) and is "human actionable" the same, if pasted in the Idea output (as explained in the README).

@yawkat
Copy link
Author

yawkat commented Jun 18, 2024

I think a CLI or similar make little sense for the JFRUnit use case. It would make sense if the goal is to replace the current reporting functionality (the FileChannel stuff etc) with JFR-based tooling. This would need more work, though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants