[FEATURE] Add exact key match operation to FilterField #417

Open
matt035343 opened this issue May 14, 2024 · 0 comments
Labels
code/new-feature New feature or request


@matt035343
Member

Now that we use complex column types such as dictionaries and lists more and more, we need features to support these use cases.
Let's say we have a table like this:

+----+----------------------------+
| id | dict_data                  |
+----+----------------------------+
| 1  | {"key1": "v", "key2": "b"} |
| 2  | {"key1": "v"}              |
| 3  | {"key2": "b"}              |
+----+----------------------------+

I want to make a filter that matches only dictionaries containing key1 and no other key (the row with id 2), and nothing else (i.e. not rows 1 and 3).
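The intended semantics can be pinned down as a plain-Python reference predicate, which may also be handy for testing backend implementations. The name `exact_key_match` is hypothetical. A row matches when its set of keys equals the requested set; a simple containment check would also match row 1, which is exactly what this request wants to avoid.

```python
from collections.abc import Iterable, Mapping


def exact_key_match(row: Mapping[str, object], keys: Iterable[str]) -> bool:
    """True when the row's keys are exactly `keys` (order and duplicates ignored)."""
    return set(row.keys()) == set(keys)


# The example table from above, as plain dicts keyed by id.
rows = {
    1: {"key1": "v", "key2": "b"},
    2: {"key1": "v"},
    3: {"key2": "b"},
}

matching_ids = [rid for rid, data in rows.items() if exact_key_match(data, ["key1"])]
contains_ids = [rid for rid, data in rows.items() if "key1" in data]
print(matching_ids)  # [2]
print(contains_ids)  # [1, 2]
```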

I have developed a piece of code that does this in pyarrow for IAR purposes, but it should probably be generalized to Astra etc.

import functools as f  # `f.reduce` below assumes functools is imported as `f`

import pyarrow as pa
from pyarrow.compute import field as pyarrow_field, scalar, Expression, map_lookup, list_value_length

...


f.reduce(
    lambda filter_expression, columns: filter_expression
    | (
        # Make sure that the forecast contains all the required columns in the granularity_columns
        f.reduce(
            lambda granularity_condition, col: granularity_condition
            & ~map_lookup(
                pyarrow_field(FORECAST.forecast_granularity), pa.scalar(col), "first"
            ).is_null(),
            columns,
            scalar(True),
        )
        # Make sure that the forecast contains only the required columns in the granularity_columns
        # The map type is cast to a list type to get the length of the list
        & (
            list_value_length(
                pyarrow_field(FORECAST.forecast_granularity).cast(
                    pa.list_(pa.struct([("key", pa.string()), ("value", pa.string())]))
                )
            )
            == scalar(len(columns))
        )
    ),
    granularity_columns,
    scalar(False),
)
matt035343 added the code/new-feature (New feature or request) label on May 14, 2024