-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add CFR and WTF diagnostics programs #88
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
cratedb-wtf
diagnostics programcratedb-wtf
diagnostics program
cratedb_toolkit/wtf/library.py
Outdated
class Logs: | ||
# TODO: Implement `tail` in one way or another. -- https://stackoverflow.com/q/4714975 | ||
# SELECT * FROM sys.jobs_log OFFSET -10; | ||
# SELECT * FROM sys.jobs_log OFFSET (SELECT count(*) FROM sys.jobs_log)-10; | ||
# https://cratedb.com/docs/crate/reference/en/latest/general/builtins/scalar-functions.html#to-char-expression-format-string | ||
# https://cratedb.com/docs/crate/reference/en/latest/general/builtins/scalar-functions.html#date-format-format-string-timezone-timestamp | ||
user_queries = """ | ||
SELECT | ||
DATE_FORMAT('%Y-%m-%dT%H:%i:%s.%f', started) AS started, | ||
DATE_FORMAT('%Y-%m-%dT%H:%i:%s.%f', ended) AS ended, | ||
classification, stmt, username, node | ||
FROM | ||
sys.jobs_log | ||
WHERE | ||
stmt NOT LIKE '%sys.%' AND | ||
stmt NOT LIKE '%information_schema.%' | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dear @hlcianfagna, @hammerhead, and @WalBeh,
I would like to implement tailing the sys.jobs_log
table in one way or another. Actually, tail --follow
. Did you manage to do that yet, using any kind of OFFSET/LIMIT magic?
SELECT * FROM sys.jobs_log OFFSET -10;
SELECT * FROM sys.jobs_log OFFSET (SELECT count(*) FROM sys.jobs_log)-10;
Those statements outlined above failed for me. Is there any way to get the number of total records out of the subselect into the OFFSET parameter? Most probably, I am only too silly to make it work, so I am humbly asking for your support.
Cheers,
Andreas.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SELECT CURRENT_TIMESTAMP AS last_timestamp,
(ended / 10000) * 10000 + 5000 AS ended_time,
COUNT(*) / 10.0 AS qps,
TRUNC(AVG(ended::bigint - started::bigint), 2) AS duration,
UPPER(regexp_matches(stmt, '^\s*(\w+).*') [1]) AS query_type
FROM sys.jobs_log
WHERE ended > now() - ('15 minutes')::interval
GROUP BY 1,
2,
5
ORDER BY ended_time ASC
This is the query used in the panel to show the latest 15 minutes QPS, perhaps it can serve as inspiration :P
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. So this is synthesizing the limiting by timestamp, compared to what is stored within the ended
column? Hmm. Naturally I'd favor a more generic solution, but in this case, it could make an acceptable workaround. Sweet.
What's stored in ended
when, well, the query has not finished yet? For emulating a tail -f
, I'd probably better rely on started
instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems empty when it doesn't have a value.
Something like
Query ran at 2023-12-02T11:02:47.288Z on CrateDB 5.5.0
SELECT
DATE_FORMAT(started)
FROM
"sys"."jobs_log"
ORDER BY
started DESC
LIMIT
10
QUERY OK, 10 record(s) returned in 0.0045s
date_format(started) |
---|
"2023-12-02T11:02:40.890000Z" |
"2023-12-02T11:02:40.825000Z" |
"2023-12-02T11:02:40.822000Z" |
"2023-12-02T11:02:40.819000Z" |
"2023-12-02T11:02:40.819000Z" |
"2023-12-02T11:02:40.819000Z" |
"2023-12-02T11:02:40.819000Z" |
"2023-12-02T11:02:35.892000Z" |
"2023-12-02T11:02:35.821000Z" |
"2023-12-02T11:02:35.821000Z" |
Seem to work just fine as tail -f
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To follow what is new I would keep track of the last started timestamp observed and query what is new
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In order to get this patch merged, we postponed this by adding it as an item to the backlog.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We created a dedicated issue, because it's a prominent feature yet missing.
0e3ca31
to
002a255
Compare
3f53a92
to
2b2e8ff
Compare
This comment was marked as off-topic.
This comment was marked as off-topic.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #88 +/- ##
==========================================
+ Coverage 79.01% 80.59% +1.57%
==========================================
Files 56 68 +12
Lines 1973 2705 +732
==========================================
+ Hits 1559 2180 +621
- Misses 414 525 +111
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
2f96848
to
f0902b2
Compare
cratedb-wtf
diagnostics program5f62d1a
to
c3222b3
Compare
4d3b461
to
0b3a47c
Compare
not_started = InfoElement( | ||
name="shard_not_started", | ||
label="Shards not started", | ||
sql=""" | ||
SELECT * | ||
FROM sys.allocations | ||
WHERE current_state != 'STARTED'; | ||
""", | ||
description="Information about shards which have not been started.", | ||
) | ||
not_started_count = InfoElement( | ||
name="shard_not_started_count", | ||
label="Number of shards not started", | ||
description="Total number of shards which have not been started.", | ||
sql=""" | ||
SELECT COUNT(*) AS not_started_count | ||
FROM sys.allocations | ||
WHERE current_state != 'STARTED'; | ||
""", | ||
transform=get_single_value("not_started_count"), | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hlcianfagna just shared this snippet gem which might be sensible in this context. Thanks.
Regarding checking which shards are being relocated and any allocation issues as well as reasons for it, this information can be found by querying
sys.shards
andsys.allocations
for all shards that are not started.SELECT * FROM sys.shards WHERE state <> 'STARTED'; SELECT * FROM sys.allocations WHERE current_state <> 'STARTED';
73b821a
to
c16ae9d
Compare
Is this already ready to reviewed or is this still in a draft mode? I'd be very fine this out of draft mode. |
Thank you for your excellent and thorough review. Many quirks you outlined have been originating from blatantly copying together valuable-looking queries from different sources. Trying to bring them into the shape outlined through I like your suggestions, and will improve each item through corresponding fixups, or defer relevant items to later iterations. |
Thanks to your diligent review, this process yielded a whopping number of five items to be improved. I am deliberately postponing them to a later iteration, in order to get the PR merged soon, so we can start shipping the program. I hope you agree with that decision.
|
Re-issuing this PR for another review now. If you agree with all other resolvements, fixes, deferrals, or not, please let me also know which improvement you suggest at #88 (comment). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice stuff, thanks!
Probably a few commits could be squashed...
Add basic implementation for `sys-export` and `sys-import` subcommands. It is about exporting system tables of CrateDB into SQL DDL and JSONL files, and re-importing them for later analysis.
- Forward CLI options `--listen` and `--reload` to implementation - Improve documentation about the `--listen` and `--reload` options - Assign random port when `--listen` is supplied without port number
> This looks unreasonable complicated to just translate the primary boolean > into a string. Co-authored-by: Sebastian Utz <su@rtme.net>
Thanks for the acknowledgement. While I agree, I don't know which commits exactly to squash best, and how to shape corresponding scopes and meaningful commit messages. So, I will be merging as-is, in order not to delay this patch any longer. Being still in its infancy, I think it is good enough to have more detailed commit granularity in CrateDB Toolkit, until it will reach 0.1.x, or such. |
About
WTF
There are many smart queries to leverage information from CrateDB's internal
sys.*
tables. This subsystem aims to bundle and collect them, in order to unlock easy access from CLI and HTTP interfaces, and to be able to use them as building blocks for other software components.CFR
Export and import CrateDB system tables conveniently.
Status
It is still a work in progress, and needs more attention and love. What can be done right now is outlined within the preview section. Any kind of help to expand this is much appreciated.
Documentation
Rendered in preview mode.
Both documents include directives to outline how operations work using Docker. However, the program can also be installed natively, also while not on PyPI yet. Please use one of those commands to acquire the software. When using Docker, make sure to use the
cratedb-toolkit:pr-88
image on GHCR for preview purposes.Help
Backlog
[ ] Handovers.