Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clickhouse Overcommit error #578

Open
tsearle opened this issue Sep 26, 2024 · 10 comments
Open

Clickhouse Overcommit error #578

tsearle opened this issue Sep 26, 2024 · 10 comments

Comments

@tsearle
Copy link
Contributor

tsearle commented Sep 26, 2024

Qryn threw the following error
Code: 241. DB::Exception: Memory limit (total) exceeded: would use 7.21 GiB (attempt to allocate chunk of 4476253 bytes), maximum: 7.20 GiB. OvercommitTracker decision: Query was selected to stop by OvercommitTracker.: (while reading column labels): (while reading from part data/3bebf7bf-6abc-4a92-ae16-834c90cbafbc/20240925_0_24268_13230/ in table qryn.time_series (3bebf7bf-6abc-4a92-ae16-834c90cbafbc) located on disk s3WithKeeperDisk of type s3, from mark 0 with max_rows_to_read = 8192): While executing MergeTreeSelect(pool: PrefetchedReadPool, algorithm: Thread). (MEMORY_LIMIT_EXCEEDED) (version 24.5.1.22957 (official build))

When digging on the clickhouse console, the query that failed appears to be...
WITH idx AS (select fingerprint from qryn.time_series_gin as time_series_gin where ((((key = 'name') and (val = 'kamailio_app_server_dead'))) and (date >= toDate(fromUnixTimestamp(1727294240))) and (date <= toDate(fromUnixTimestamp(1727294540))) and (type in (0,0))) group by fingerprint having (groupBitOr(bitShiftLeft(((key = 'name') and (val = 'kamailio_app_server_dead'))::UInt64, 0)) = 1)), raw AS (select argMaxMerge(last) as value,fingerprint,intDiv(timestamp_ns, 15000000000) * 15000 as timestamp_ms from metrics_15s as metrics_15s where ((fingerprint in (idx)) and (timestamp_ns >= 1727294240000000000) and (timestamp_ns <= 1727294540000000000) and (type in (0,0))) group by fingerprint,timestamp_ms order by fingerprint,timestamp_ms), timeSeries AS (select fingerprint,arraySort(JSONExtractKeysAndValues(labels, 'String')) as labels from qryn.time_series where ((fingerprint in (idx)) and (type in (0,0)))) select any(labels) as stream,arraySort(groupArray((raw.timestamp_ms, raw.value))) as values from raw as raw any left join timeSeries as time_series on time_series.fingerprint = raw.fingerprint group by raw.fingerprint order by raw.fingerprint FORMAT RowBinary

This caused the prometheus query (for alarm monitoring) to fail:
kamailio_app_server_dead{}

Am I doing something stupidly wrong?

@akvlad
Copy link
Collaborator

akvlad commented Sep 26, 2024

Hello @tsearle . Let's run a couple of helping requests in order to understand where the bottleneck is:

select count(), uniq(fingerprint) from (SELECT fingerprint
        FROM time_series_gin AS time_series_gin
        WHERE ((key = 'name') AND (val = 'kamailio_app_server_dead')) AND (date >= toDate(fromUnixTimestamp(1727294240))) AND (date <= toDate(fromUnixTimestamp(1727294540))) AND (type IN (0, 0))
        GROUP BY fingerprint
        HAVING groupBitOr(bitShiftLeft(CAST((key = 'name') AND (val = 'kamailio_app_server_dead'), 'UInt64'), 0)) = 1)
WITH
    idx AS
    (
        SELECT fingerprint
        FROM time_series_gin AS time_series_gin
        WHERE ((key = 'name') AND (val = 'kamailio_app_server_dead')) AND (date >= toDate(fromUnixTimestamp(1727294240))) AND (date <= toDate(fromUnixTimestamp(1727294540))) AND (type IN (0, 0))
        GROUP BY fingerprint
        HAVING groupBitOr(bitShiftLeft(CAST((key = 'name') AND (val = 'kamailio_app_server_dead'), 'UInt64'), 0)) = 1
    ),
    raw AS
    (
        SELECT
            argMaxMerge(last) AS value,
            fingerprint,
            intDiv(timestamp_ns, 15000000000) * 15000 AS timestamp_ms
        FROM metrics_15s AS metrics_15s
        WHERE (fingerprint IN (idx)) AND (timestamp_ns >= 1727294240000000000) AND (timestamp_ns <= 1727294540000000000) AND (type IN (0, 0))
        GROUP BY
            fingerprint,
            timestamp_ms
        ORDER BY
            fingerprint ASC,
            timestamp_ms ASC
    )
select count() from raw;

Please share the results of both of them. There should be a set of numbers about the overall cardinality you have.

@tsearle
Copy link
Contributor Author

tsearle commented Sep 27, 2024

first query
count() 0
uniq(fingerprint) 0

2nd query
count() 0

are the queries correct?

@akvlad
Copy link
Collaborator

akvlad commented Sep 27, 2024

@tsearle please try doing USE qryn; before queries.

@tsearle
Copy link
Contributor Author

tsearle commented Sep 27, 2024

to my knowledge this with use qryn (the qryn db is selected)

Screenshot 2024-09-27 at 14 22 02

explicitly adding "use qryn" doesn't change the result

Screenshot 2024-09-27 at 14 23 30

@akvlad
Copy link
Collaborator

akvlad commented Sep 27, 2024

@tsearle in this case the request you have posted should use 0MB of RAM and finish in almost no time. There's no way it is the cause.

Can you please recheck the full request in the issue description manually?

@akvlad
Copy link
Collaborator

akvlad commented Sep 30, 2024

@tsearle I have just understood what's wrong. It's Github markdown. Please try These requests instead.

select count(), uniq(fingerprint) from (SELECT fingerprint
        FROM time_series_gin AS time_series_gin
        WHERE ((key = '__name__') AND (val = 'kamailio_app_server_dead')) AND (date >= toDate(fromUnixTimestamp(1727294240))) AND (date <= toDate(fromUnixTimestamp(1727294540))) AND (type IN (0, 0))
        GROUP BY fingerprint
        HAVING groupBitOr(bitShiftLeft(CAST((key = '__name__') AND (val = 'kamailio_app_server_dead'), 'UInt64'), 0)) = 1)
WITH
    idx AS
    (
        SELECT fingerprint
        FROM time_series_gin AS time_series_gin
        WHERE ((key = '__name__') AND (val = 'kamailio_app_server_dead')) AND (date >= toDate(fromUnixTimestamp(1727294240))) AND (date <= toDate(fromUnixTimestamp(1727294540))) AND (type IN (0, 0))
        GROUP BY fingerprint
        HAVING groupBitOr(bitShiftLeft(CAST((key = '__name__') AND (val = 'kamailio_app_server_dead'), 'UInt64'), 0)) = 1
    ),
    raw AS
    (
        SELECT
            argMaxMerge(last) AS value,
            fingerprint,
            intDiv(timestamp_ns, 15000000000) * 15000 AS timestamp_ms
        FROM metrics_15s AS metrics_15s
        WHERE (fingerprint IN (idx)) AND (timestamp_ns >= 1727294240000000000) AND (timestamp_ns <= 1727294540000000000) AND (type IN (0, 0))
        GROUP BY
            fingerprint,
            timestamp_ms
        ORDER BY
            fingerprint ASC,
            timestamp_ms ASC
    )
select count() from raw;

Kind request for the future to use triple backtics when you paste the request so it's considered "code". Github processor eats parts of text like these double underscores.

Sorry for the inconvenience.

@gabrielgontariu
Copy link

gabrielgontariu commented Sep 30, 2024

@akvlad attached results
Uploading Screenshot 2024-09-30 at 11.07.25.png…

@lmangani
Copy link
Collaborator

@gabrielgontariu @tsearle the screenshot didn't work unfortunately

@tsearle
Copy link
Contributor Author

tsearle commented Sep 30, 2024

Screenshot_20240930-102326.png

Screenshot_20240930-102411.png

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants