feat: Support split align and caching for instant metric query results #11814

kavirajk · 2024-01-29T08:34:31Z

What this PR does / why we need it:
As a follow-up to metadata results caching, this PR adds results caching for instant metric queries. It also adds an option to align the split subqueries with the split interval, so that their cached results are more likely to be reused.

Config changes:

cache_instant_metric_results - Enable/disable (default: disabled) - Boolean
instant_metric_results_cache - CacheConfig to tweak (usually not needed; it has sane defaults)
instant_metric_query_split_align - Enable/disable (default: disabled) - Boolean

How it works (without split align)

Consider the following query:
Query: sum(rate({foo="bar"}[3h])) @ 12:34:00
SplitInterval: 1h

So we need results from 09:34:00 to 12:34:00 (3h total).

Currently, after the range mapper, it splits into

sum(rate({foo="bar"}[1h])) @ 12:34:00
sum(rate({foo="bar"}[1h] offset 1h)) @ 12:34:00
sum(rate({foo="bar"}[1h] offset 2h)) @ 12:34:00

Even if we remove the offset, it turns into

sum(rate({foo="bar"}[1h])) @ 12:34:00
sum(rate({foo="bar"}[1h])) @ 11:34:00
sum(rate({foo="bar"}[1h])) @ 10:34:00

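The problem is that the eval times are still not aligned: they follow the request timestamp (12:34:00), so a request made at any other time produces different subqueries, and these results are unlikely to be reused.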

How it works (with split align)

Now consider the exact same query:
Query: sum(rate({foo="bar"}[3h])) @ 12:34:00
SplitInterval: 1h

After the range mapper, it splits into

sum(rate({foo="bar"}[34m])) @ 12:34:00
sum(rate({foo="bar"}[1h] offset 34m)) @ 12:34:00
sum(rate({foo="bar"}[1h] offset 1h 34m)) @ 12:34:00
sum(rate({foo="bar"}[26m] offset 2h 34m)) @ 12:34:00

And after removing the offset, it turns into (properly eval-time aligned)

sum(rate({foo="bar"}[34m])) @ 12:34:00
sum(rate({foo="bar"}[1h])) @ 12:00:00
sum(rate({foo="bar"}[1h])) @ 11:00:00
sum(rate({foo="bar"}[26m])) @ 10:00:00

Now subqueries (2) and (3) are properly aligned to the split interval and are highly likely to be reused.

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:
There will be some follow-up PRs (I tried doing everything in a single PR, but it turned out to be a big change that was really hard to review) to:

Refactor configs to unify and simplify all the results cache configs
Refactor results cache metrics, to avoid lots of duplicates
Simplify some protobuf definitions (particularly stats.proto)

Checklist

Reviewed the CONTRIBUTING.md guide (required)
Documentation added
Tests updated
CHANGELOG.md updated
- If the change is worth mentioning in the release notes, add add-to-release-notes label
Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

pkg/querier/queryrange/downstreamer.go

pkg/logql/rangemapper.go

pkg/validation/limits.go

docs/sources/configure/_index.md

pkg/querier/queryrange/instant_metric_cache.go

dannykopping

Love the detailed description and lots of tests!
Added a few comments, mostly nits. Good job Kavi!

docs/sources/configure/_index.md

pkg/logql/rangemapper.go

pkg/logql/rangemapper_test.go

pkg/logqlmodel/stats/stats.proto

pkg/querier/queryrange/codec.go

pkg/querier/queryrange/instant_metric_cache.go

docs/sources/configure/_index.md

pkg/querier/queryrange/roundtrip.go

pkg/logql/rangemapper.go

ashwanthgoli · 2024-02-16T14:57:31Z

I need to do another pass to review the tests, rest lgtm. Nice one @kavirajk ❤️

dannykopping

LGTM!

kavirajk · 2024-02-20T07:34:33Z

Caching and split align in action

Do instant query for 3h range (first time)

-bash-5.2$ ./cmd/logcli/logcli instant-query 'sum(rate({job="varlogs"}[3h]))'
2024/02/20 08:18:37 http://localhost:3100/loki/api/v1/query?direction=BACKWARD&limit=30&query=sum%28rate%28%7Bjob%3D%22varlogs%22%7D%5B3h%5D%29%29&time=1708413517893995000
[
  {
    "metric": {},
    "value": [
      1708413517.893,
      "0.21814814814814815"
    ]
  }
]

How the query is split, with cache requests and cache hits (these log lines are from metrics.go and engine.go on the query-frontend and queriers)

Query-frontend (4 requests were made to the cache, and none of them was a hit)

latency=fast query="sum(rate({job=\"varlogs\"}[3h]))" query_hash=1737987035 cache_result_req=4 cache_result_hit=0

How the above query got split and run

msg="executing query" type=instant query="sum(count_over_time({job=\"varlogs\"}[1h]))"
msg="executing query" type=instant query="sum(count_over_time({job=\"varlogs\"}[1h]))"
msg="executing query" type=instant query="sum(count_over_time({job=\"varlogs\"}[41m22s106ms]))"
msg="executing query" type=instant query="sum(count_over_time({job=\"varlogs\"}[18m37s893ms]))"

Do the instant query for 3h range (second time)

-bash-5.2$ ./cmd/logcli/logcli instant-query 'sum(rate({job="varlogs"}[3h]))'
2024/02/20 08:19:11 http://localhost:3100/loki/api/v1/query?direction=BACKWARD&limit=30&query=sum%28rate%28%7Bjob%3D%22varlogs%22%7D%5B3h%5D%29%29&time=1708413551869792000
[
  {
    "metric": {},
    "value": [
      1708413551.869,
      "0.02935185185185185"
    ]
  }
]

How the query is split, with cache requests and cache hits (these log lines are from metrics.go and engine.go on the query-frontend and queriers)

Query-frontend (4 requests were made to the cache, and 2 of them were hits)

latency=fast query="sum(rate({job=\"varlogs\"}[3h]))" query_hash=1737987035 cache_result_req=4 cache_result_hit=2

How the above query got split and run (only the two subqueries that missed the cache were run this time)

msg="executing query" type=instant query="sum(count_over_time({job=\"varlogs\"}[19m11s869ms]))" 
msg="executing query" type=instant query="sum(count_over_time({job=\"varlogs\"}[40m48s130ms]))"

kavirajk added 4 commits January 3, 2024 14:30

feat(caching): Support caching on instant metric queries results

5ba4fa4

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

Merge branch 'main' into kavirajk/cache-instant-queries2

4d64df4

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

integrate the basic middleware

5ea7700

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

fixing overrides

c39b68d

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

pull-request-size bot added the size/L label Jan 29, 2024

kavirajk added 3 commits January 30, 2024 08:25

idk

e4fbe8f

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

Tweak sub queries without offset before caching

27dcfe6

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

test to assert offset removal

57d77a9

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

kavirajk force-pushed the kavirajk/cache-instant-queries2 branch from bb83a3a to 57d77a9 Compare January 30, 2024 10:55

kavirajk added 4 commits January 31, 2024 20:10

Fix timestamp adjustments

08c5b4b

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

missed error handling

bd558a1

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

fix failing TestMetricsTripperware_SplitShardStats test

566bc4f

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

tweak downstreamer test

e2e91b0

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

pull-request-size bot added size/XL and removed size/L labels Feb 2, 2024

kavirajk added 5 commits February 7, 2024 09:03

Fix split_by_range test cases for sub queries

d8ff56f

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

fix Downstream with offset removed test case

84bb4d4

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

update stats

634d7f8

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

support split and align of instant subquery for cache reuse

d538f67

Fix test cases that failed with this changes Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

Fix some bugs on split align and add more tests

62cd346

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

pull-request-size bot added size/XXL and removed size/XL labels Feb 13, 2024

kavirajk changed the title ~~feat(caching): Support caching for instant metric query results~~ feat(caching): Support split align and caching for instant metric query results Feb 13, 2024

kavirajk added 2 commits February 13, 2024 09:33

Merge branch 'main' into kavirajk/cache-instant-queries2

ae2e565

fix some build failures from merge with main

597d40f

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

kavirajk marked this pull request as ready for review February 13, 2024 08:48

kavirajk requested a review from a team as a code owner February 13, 2024 08:48

kavirajk changed the title ~~feat(caching): Support split align and caching for instant metric query results~~ feat: Support split align and caching for instant metric query results Feb 13, 2024

ashwanthgoli reviewed Feb 13, 2024

pkg/querier/queryrange/downstreamer.go (outdated)

pkg/logql/rangemapper.go (outdated)

PR remarks

fce06dc

1. Update both start and end when removing offset 2. Unify subqueries generation in splitalign method Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

kavirajk added 3 commits February 16, 2024 11:07

Merge branch 'main' into kavirajk/cache-instant-queries2

a6fe289

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

fix additional arguments to results cache related to extent

998051a

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

make doc

4db5398

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

github-actions bot added the type/docs label Feb 16, 2024

make format

c307592

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

kavirajk requested a review from ashwanthgoli February 16, 2024 10:42

ashwanthgoli reviewed Feb 16, 2024

kavirajk added 3 commits February 16, 2024 14:33

PR remarks

292f13a

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

add changelog entry

5cb63d4

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

remove unused ingester query options

e373341

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

dannykopping reviewed Feb 16, 2024

ashwanthgoli reviewed Feb 16, 2024

pkg/logql/rangemapper.go (outdated)

pkg/logql/rangemapper.go

kavirajk added 2 commits February 19, 2024 09:09

PR remarks

655844f

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

PR remarks and TODO to handle edge case

ef9afeb

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

kavirajk requested review from dannykopping and ashwanthgoli February 19, 2024 08:25

dannykopping approved these changes Feb 19, 2024

kavirajk added 2 commits February 19, 2024 10:33

PR remarks

a5ad611

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

Merge branch 'main' into kavirajk/cache-instant-queries2

bbe5605

kavirajk added 3 commits February 20, 2024 08:38

Add cache hit log lines for instant metric query

38e71d6

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

Merge branch 'main' into kavirajk/cache-instant-queries2

35f2c53

fix breaking test cases.

3ff5150

Use right split duration (new InstantSplitDuration) for instant queries Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>

kavirajk merged commit fac5997 into main Feb 20, 2024
9 checks passed

kavirajk deleted the kavirajk/cache-instant-queries2 branch February 20, 2024 10:09

kavirajk mentioned this pull request Mar 22, 2024

fix: (Bug) correct resultType when storing instant query results in cache #12312

Merged

8 tasks

loki-gh-app bot mentioned this pull request Mar 27, 2024

chore(add-major-release-workflow): release 3.0.0-rc.1 #12380

Closed

rhnasc pushed a commit to inloco/loki that referenced this pull request Apr 12, 2024

feat: Support split align and caching for instant metric query results (grafana#11814)

b2e4905

Signed-off-by: Kaviraj <kavirajkanagaraj@gmail.com>
