roachtest: add the option to enable continuous (cpu) profiling #135972

Open
srosenberg opened this issue Nov 22, 2024 · 3 comments
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-testeng TestEng Team

Comments

srosenberg commented Nov 22, 2024

Cluster-wide automatic CPU profiling can be enabled via cluster settings [1]. There are multiple use cases, including (manual) performance debugging and profile-guided optimization [2].

Ideally, this option could be enabled metamorphically for any (eligible) roachtest. However, at this time, we don't have a general mechanism for metamorphic cluster settings [3]. As a temporary workaround, we could (randomly) enable CPU profiling before a given test is executed (i.e., before TestSpec.Run). Additionally, we could consider adding a CLI option, e.g., roachtest run --enable-cpu-profiling ..., to force the corresponding cluster settings.

[1] https://www.cockroachlabs.com/docs/stable/automatic-cpu-profiler
[2] https://go.dev/doc/pgo
[3] #105807
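
A minimal sketch of the decision logic described above, assuming a forced flag and a fixed probability. The function name shouldEnableCPUProfiling, the 0.5 probability, and the flag wiring are all hypothetical; roachtest does not define them today.

```go
package main

import (
	"flag"
	"fmt"
	"math/rand"
)

// shouldEnableCPUProfiling (hypothetical helper) reports whether CPU
// profiling should be turned on for a test run. A forced value, e.g. from a
// CLI flag, always wins; otherwise profiling is enabled when the value
// drawn from randFloat (expected to be in [0, 1)) is below probability.
func shouldEnableCPUProfiling(randFloat func() float64, forced bool, probability float64) bool {
	if forced {
		return true
	}
	return randFloat() < probability
}

func main() {
	// Hypothetical flag mirroring the CLI option proposed in this issue.
	force := flag.Bool("enable-cpu-profiling", false,
		"force the CPU profiling cluster settings for every test")
	flag.Parse()

	// Assumption: a coin flip per test when the flag is not set.
	enabled := shouldEnableCPUProfiling(rand.Float64, *force, 0.5)
	fmt.Println("cpu profiling enabled:", enabled)
}
```

Passing the random source as a function keeps the decision deterministic under test; a real implementation would presumably draw from roachtest's own seeded source so that a run can be reproduced.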

Jira issue: CRDB-44802

@srosenberg srosenberg added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) T-testeng TestEng Team labels Nov 22, 2024

blathers-crl bot commented Nov 22, 2024

cc @cockroachdb/test-eng

blathers-crl bot commented Nov 22, 2024

This issue has multiple T-eam labels. Please make sure it only has one, or else issue synchronization will not work correctly.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

srosenberg commented Nov 22, 2024

Examples

Note: at this time, I find three existing roachtest source files that enable CPU profiles:

find pkg/cmd/roachtest/tests -name "*.go" |xargs grep -i "cpu_profile" |awk '{print $1}' |sort -u
pkg/cmd/roachtest/tests/kv.go:
pkg/cmd/roachtest/tests/rebalance_load.go:
pkg/cmd/roachtest/tests/restore.go:

E.g., in kv.go, we enable it unconditionally for the kv/gracefuldraining roachtest [1].

NOTE: .pprof files are available under the artifacts directory only upon test failure; e.g., [2].

NOTE: the default sampling interval is 10 seconds (see metricsSampleInterval [3]); thus, server.cpu_profile.interval should be >= 10s.
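
The interval constraint noted above could be checked before the setting is applied. This is a sketch only: the 10s constant is hard-coded from the note rather than read from base.DefaultMetricsSampleInterval, and validateProfileInterval is a hypothetical helper.

```go
package main

import (
	"fmt"
	"time"
)

// assumedSampleInterval mirrors the 10s default quoted in this issue.
// Assumption: it is hard-coded here instead of being imported from the
// CockroachDB source tree.
const assumedSampleInterval = 10 * time.Second

// validateProfileInterval (hypothetical helper) rejects values for the
// profile interval that are unparsable or shorter than the sampling interval.
func validateProfileInterval(s string) error {
	d, err := time.ParseDuration(s)
	if err != nil {
		return fmt.Errorf("invalid interval %q: %w", s, err)
	}
	if d < assumedSampleInterval {
		return fmt.Errorf("interval %s is below the sampling interval %s", d, assumedSampleInterval)
	}
	return nil
}

func main() {
	for _, s := range []string{"1s", "10s", "1m"} {
		fmt.Printf("%s: %v\n", s, validateProfileInterval(s))
	}
}
```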

Possible Workaround

The following pseudocode could be inserted right before a given roachtest is executed [4].

// Connect to node 1; cluster settings apply cluster-wide.
db := c.Conn(ctx, t.L(), 1)
defer db.Close()

// Apply the CPU profiler settings one at a time; stop at the first error.
for _, stmt := range []string{
  `SET CLUSTER SETTING server.cpu_profile.duration = '5s';`,
  `SET CLUSTER SETTING server.cpu_profile.interval = '1m';`,
  `SET CLUSTER SETTING server.cpu_profile.cpu_usage_combined_threshold = '20';`,
  `SET CLUSTER SETTING server.cpu_profile.total_dump_size_limit = '256 MiB';`,
} {
  if _, err := db.ExecContext(ctx, stmt); err != nil {
    return err
  }
}

NOTE: in a multi-tenant setup, we'd probably need to replace the first statement with:

db := c.Conn(ctx, t.L(), validationNode, option.VirtualClusterName("system"))
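
For illustration, the workaround could be factored into a helper that accepts anything with an ExecContext method, so that the same code works with a system-tenant connection and can be exercised without a cluster. This is a sketch under stated assumptions: execer, enableCPUProfiling, recorder, and dryRun are hypothetical names, not roachtest APIs; the setting names and values are the ones listed above.

```go
package main

import (
	"context"
	"database/sql"
	"fmt"
)

// execer is the subset of *sql.DB used below (hypothetical interface).
type execer interface {
	ExecContext(ctx context.Context, query string, args ...any) (sql.Result, error)
}

// cpuProfileSettings holds the settings from the workaround, in the order
// in which they are applied.
var cpuProfileSettings = []struct{ name, value string }{
	{"server.cpu_profile.duration", "5s"},
	{"server.cpu_profile.interval", "1m"},
	{"server.cpu_profile.cpu_usage_combined_threshold", "20"},
	{"server.cpu_profile.total_dump_size_limit", "256 MiB"},
}

// enableCPUProfiling (hypothetical helper) applies each setting in turn and
// stops at the first failure, naming the setting that could not be applied.
func enableCPUProfiling(ctx context.Context, db execer) error {
	for _, s := range cpuProfileSettings {
		stmt := fmt.Sprintf("SET CLUSTER SETTING %s = '%s'", s.name, s.value)
		if _, err := db.ExecContext(ctx, stmt); err != nil {
			return fmt.Errorf("setting %s: %w", s.name, err)
		}
	}
	return nil
}

// recorder is a fake execer that records statements instead of running them.
type recorder struct{ stmts []string }

func (r *recorder) ExecContext(_ context.Context, query string, _ ...any) (sql.Result, error) {
	r.stmts = append(r.stmts, query)
	return nil, nil
}

// dryRun returns the statements that enableCPUProfiling would issue.
func dryRun() ([]string, error) {
	r := &recorder{}
	err := enableCPUProfiling(context.Background(), r)
	return r.stmts, err
}

func main() {
	stmts, err := dryRun()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, s := range stmts {
		fmt.Println(s)
	}
}
```

In a real test, the *sql.DB returned by c.Conn would be passed in place of the recorder, since *sql.DB provides an ExecContext method with this signature.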

[1] settings.ClusterSettings["server.cpu_profile.interval"] = "1s"
[2] https://teamcity.cockroachdb.com/repository/download/Cockroach_Nightlies_RoachtestNightlyGceBazel/17734111:id/kv/gracefuldraining/cpu_arch%3Darm64/run_1/artifacts.zip!/logs/1.unredacted/pprof_dump/cpuprof.2024-11-13T12_01_54.544.23.pprof
[3] metricsSampleInterval := base.DefaultMetricsSampleInterval
[4] s.Run(runCtx, t, c)
