Persistent audit log around cluster operations #135999

benbardin · 2024-11-22T16:45:05Z

In reviewing a recent customer incident, we realized we depended on the customer's own logging of key operational events such as node startup, shutdown, and upgrade. Future investigations would be aided by a CRDB-owned persistent log of these operations across the cluster. Ideally, this log would store:

Node ID
Timestamp
Event type (startup/cluster join/drain/shutdown)
SHA/version

Ideally, this log would never wrap or, if it must, would wrap only on a very long timescale.
Also ideally, every node would keep this log for all nodes, so the record could include decommissioned nodes.
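
To make the requested record shape concrete, here is a minimal sketch of one log entry carrying the four fields listed above, serialized as one JSON line per event. This is only an illustration: the names `AuditEvent`, `EventType`, `to_json_line`, and `parse_json_line` are hypothetical, and nothing here reflects how CRDB actually stores events.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
from enum import Enum


class EventType(str, Enum):
    """Event types named in the request (hypothetical identifiers)."""
    STARTUP = "startup"
    CLUSTER_JOIN = "cluster_join"
    DRAIN = "drain"
    SHUTDOWN = "shutdown"


@dataclass(frozen=True)
class AuditEvent:
    """One audit record: the four fields listed in the request."""
    node_id: int
    timestamp: datetime
    event_type: EventType
    version: str  # build SHA or release version string

    def to_json_line(self) -> str:
        # Serialize to one JSON object so the log can be append-only,
        # with one event per line.
        record = asdict(self)
        record["timestamp"] = self.timestamp.isoformat()
        record["event_type"] = self.event_type.value
        return json.dumps(record, sort_keys=True)


def parse_json_line(line: str) -> AuditEvent:
    """Rebuild an AuditEvent from a line written by to_json_line."""
    raw = json.loads(line)
    return AuditEvent(
        node_id=raw["node_id"],
        timestamp=datetime.fromisoformat(raw["timestamp"]),
        event_type=EventType(raw["event_type"]),
        version=raw["version"],
    )
```

An append-only, one-record-per-line layout is assumed here because it makes the "never wrap" goal easy to reason about: retention becomes a question of when, if ever, old lines are dropped.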

(Full context here.) Thank you!

cc @arulajmani @ajstorm @nicktrav

Epic CRDB-42978

benbardin · 2024-11-22T17:36:37Z

Oh, I see much of this information is in system.eventlog.txt! That's great. So the request here would be to expand that log slightly, and verify its wrapping behavior.

benbardin · 2024-11-22T18:14:57Z

More details in https://cockroachlabs.slack.com/archives/C07VDN3CA3U/p1732294864390839

benbardin added the C-enhancement, O-postmortem, and T-observability labels · Nov 22, 2024
