[FEATURE] Add exact key match operation to FilterField #417

matt035343 · 2024-05-14T10:00:36Z

Now that we increasingly use complex column types such as dictionaries and lists, we need features to support these use cases.
Let's say we have a table like this:

+----+-----------------------------+
| id | dict_data                   |
+----+-----------------------------+
| 1  | { "key1": "v", "key2": "b"} |
| 2  | { "key1": "v"}              |
| 3  | { "key2": "b"}              |
+----+-----------------------------+

I want to make a filter that only targets dictionaries containing key1 and nothing else, i.e. the row with id 2 but not rows 1 and 3.
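
To pin down the intended semantics, here is a minimal pure-Python sketch of the exact key match applied to the table above. The helper name `has_exactly_keys` and the `rows` list are hypothetical illustrations, not part of FilterField or any existing API.

```python
# Hypothetical in-memory copy of the example table.
rows = [
    {"id": 1, "dict_data": {"key1": "v", "key2": "b"}},
    {"id": 2, "dict_data": {"key1": "v"}},
    {"id": 3, "dict_data": {"key2": "b"}},
]


def has_exactly_keys(data, required_keys):
    """Return True when the dictionary's key set equals required_keys."""
    return set(data) == set(required_keys)


# Keep only rows whose dictionary contains key1 and nothing else.
matching_ids = [
    row["id"] for row in rows if has_exactly_keys(row["dict_data"], ["key1"])
]
print(matching_ids)  # [2]
```

Row 1 is rejected because it carries an extra key, and row 3 because key1 is missing.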

I have developed a piece of code that does this in pyarrow for IAR purposes, but it should probably be generalized to Astra etc.

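import functools as f  # assumption: the snippet below calls f.reduce, which matches functools.reduce
from pyarrow.compute import field as pyarrow_field, scalar, Expression, map_lookup, list_value_length
import pyarrow as pa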

...


f.reduce(
    lambda filter_expression, columns: filter_expression
    | (
        # Make sure that the forecast contains all the required columns in the granularity_columns
        f.reduce(
            lambda granularity_condition, col: granularity_condition
            & ~map_lookup(
                pyarrow_field(FORECAST.forecast_granularity), pa.scalar(col), "first"
            ).is_null(),
            columns,
            scalar(True),
        )
        # Make sure that the forecast contains only the required columns in the granularity_columns
        # The map type is cast to a list type to get the length of the list
        & (
            list_value_length(
                pyarrow_field(FORECAST.forecast_granularity).cast(
                    pa.list_(pa.struct([("key", pa.string()), ("value", pa.string())]))
                )
            )
            == scalar(len(columns))
        )
    ),
    granularity_columns,
    scalar(False),
)
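
Read as plain Python, the expression above is an OR over the alternative key sets in granularity_columns, where each alternative is an AND of "every required key is present" and "the number of entries equals the number of required keys". The sketch below mirrors that fold structure on ordinary dictionaries. The name `matches_any_key_set` and the sample inputs are hypothetical, and it assumes that keys are unique within each dictionary and within each key set, so that presence plus equal length implies an exact match.

```python
from functools import reduce


def matches_any_key_set(data, granularity_columns):
    """Return True if data's keys exactly match one of the given key sets."""
    return reduce(
        # Outer fold: OR over alternative key sets, seeded with False.
        lambda matched, columns: matched
        or (
            # Inner fold: AND over required keys, seeded with True.
            reduce(lambda present, col: present and col in data, columns, True)
            # No extra keys: the sizes must agree.
            and len(data) == len(columns)
        ),
        granularity_columns,
        False,
    )


key_sets = [["key1"], ["key1", "key2"]]
print(matches_any_key_set({"key1": "v"}, key_sets))  # True
print(matches_any_key_set({"key2": "b"}, key_sets))  # False
```

The seeds matter: False is the identity for OR, so an empty list of key sets matches nothing, while True is the identity for AND, so each alternative starts out satisfied until a required key is found missing.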

matt035343 added the code/new-feature New feature or request label May 14, 2024

george-zubrienko assigned adelinag08 May 17, 2024
