[WIP] PartitionEliminationFilterIndexRule implementation #390

apoorvedave1 · 2021-03-25T23:32:12Z

What is the context for this pull request?

In this PR we introduce a FilterIndexRule for PartitionEliminationNonCoveringIndex. Rule algorithm:

1. Identify whether an index contains indexed columns that can improve point lookups or range queries on the data source.
2. Query the index using Spark to identify the list of data files that could satisfy the query.
3. Redirect the original query to this subset of data files instead of the complete list of files.
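The three steps can be sketched with plain Scala collections. This is only an illustrative model, not the code in this PR: `IndexRow`, `PartitionEliminationSketch`, and all of its methods are hypothetical names, the index is assumed to hold (indexed value, data file) pairs, and Spark is replaced by in-memory sequences so the logic is easy to follow.

```scala
// Hypothetical model of one index row: an indexed value and the source data file it came from.
final case class IndexRow(key: Int, dataFile: String)

object PartitionEliminationSketch {
  // Step 1: the rule is only worth applying when the filter touches an indexed column.
  def isApplicable(indexedColumns: Set[String], filterColumns: Set[String]): Boolean =
    filterColumns.exists(indexedColumns.contains)

  // Step 2: query the index for the files that may contain matching rows.
  // `distinct` removes repeated file names when several index rows point at one file.
  def candidateFiles(index: Seq[IndexRow], predicate: Int => Boolean): Seq[String] =
    index.filter(row => predicate(row.key)).map(_.dataFile).distinct

  // Step 3: redirect the scan to the candidate subset instead of the full file list.
  def prune(allFiles: Seq[String], candidates: Seq[String]): Seq[String] = {
    val keep = candidates.toSet
    allFiles.filter(keep.contains)
  }
}
```

In the real rule, step 2 would be a Spark query over the index files and step 3 would rewrite the relation in the logical plan; the sketch only shows the shape of the data flow.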

Tracking Issue: If you expect any subjective discussions around this pull request, please consider opening a tracking issue and link to the PR. Write N/A, if this pull request is self-contained.
Parent Issue: Link to the issue that captures the overall plan. Write N/A, if this is a stand-alone pull request with a tracking issue OR self-contained pull request.
Dependencies: Links to issues you depend on for this pull request to work. Write N/A, if no dependencies.
- Issue 1
- Issue 2

What changes were proposed in this pull request?

Does this PR introduce any user-facing change?

No

How was this patch tested?


sezruby · 2021-03-26T21:27:23Z

src/main/scala/com/microsoft/hyperspace/index/rules/PEFilterIndexRule.scala

+
+    val filteredDf =
+      spark.read
+        .parquet(index.content.files.map(_.toString): _*)


I thought we would build an index of (value => file id list) instead of doing a full scan of the covering index.
Could you measure the performance of the optimize phase with the 1TB TPCH dataset?
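For illustration, a (value => file id list) structure could look like the sketch below. It is an assumption about what such an index might contain, not a description of Hyperspace's index format; `Posting`, `ValueToFiles`, `build`, and `lookup` are hypothetical names.

```scala
// Hypothetical input: one (indexed value, file id) pair per source row.
final case class Posting(value: String, fileId: Long)

object ValueToFiles {
  // Build the inverted mapping: each distinct value points to the sorted,
  // de-duplicated list of file ids that contain it.
  def build(postings: Seq[Posting]): Map[String, Seq[Long]] =
    postings
      .groupBy(_.value)
      .map { case (value, rows) => value -> rows.map(_.fileId).distinct.sorted }

  // A point lookup becomes a single map access instead of a scan over all index rows.
  def lookup(index: Map[String, Seq[Long]], value: String): Seq[Long] =
    index.getOrElse(value, Seq.empty)
}
```

With this layout a point lookup reads one entry, whereas the approach in the diff above reads the index files and filters them at optimization time.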

This would not work if we want to merge the covering and partition elimination indexes.

We cannot deliver this rule without proper performance validation, as it seems expensive. Could you run the TPCH 1TB 100k chunk dataset and share the results?

- w/o PEFilterIndexRule, for comparison: explain time & query execution time
- w/ PEFilterIndexRule: explain time & query execution time
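One way to collect the two numbers is a small wall-clock helper such as the sketch below. `Timing` and `timed` are hypothetical names, and the Spark calls mentioned in the comments are a suggestion for where to apply it, not something taken from this PR.

```scala
object Timing {
  // Evaluates `body` once and returns its result with the elapsed wall-clock time in milliseconds.
  def timed[A](body: => A): (A, Double) = {
    val start = System.nanoTime()
    val result = body
    (result, (System.nanoTime() - start) / 1e6)
  }
}

// Possible usage with a Spark DataFrame `df` (assumed to exist; not runnable without Spark):
//   val (_, optimizeMs) = Timing.timed(df.queryExecution.optimizedPlan) // analysis + optimization
//   val (_, executeMs)  = Timing.timed(df.collect())                    // query execution
// Build a fresh DataFrame for each run, once with the rule enabled and once without,
// because the optimized plan is cached on the DataFrame after its first evaluation.
```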


apoorvedave1 added 2 commits March 24, 2021 15:10

PEFilterIndexRule initial commit

87dce8c

add filtering of conditions to choose only compatible conditions with…

ecc455b

… the index

sezruby reviewed Mar 26, 2021


add distinct on file ids

2923e77

Co-authored-by: EJ Song <51077614+sezruby@users.noreply.github.com>

sezruby Mar 26, 2021

apoorvedave1 Mar 27, 2021

sezruby Apr 8, 2021 (edited)
