You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
First of all, thank you for this great framework that I recently discovered and that I find very interesting.
I have a special need that I will try to explain here, hoping to find a potential solution.
Here is some basic data used to illustrate my need in a simplified way:
Let's say that I apply the following calculation pipe to a subset of my data:
subsetA_metrics=(
######### BASE DATA
ex_table.filter(_['subset']=='subsetA')
######### PIPE
.group_by(['id', 'cat'])
.aggregate(
sum_num=_['num'].sum()
)
.pivot_wider(
id_cols='id',
# each cat modality will be pushed to column
names_from='cat',
# measures to pivot
values_from='sum_num',
values_fill=0,
values_agg='max'
)
)
subsetA_metrics.execute()
The pipe includes, among other things, a pivot calculation, which has the particularity of generating the creation of columns directly dependent on the actual content of the data.
As I understand it, Ibis was building a SQL query in Lazy mode, step by step; this query was only really executed during .execute().
The constructed query can be consulted via .compile() :
The particularity with the pivot is that Ibis seems to be obliged to perform some inspections during the construction of the underlying SQL query (despite the Lazy mode): which is logical because Ibis is obliged to know the real content of the data to produce the table resulting from the pivot. This explains the presence of some hard-coded information in the query (which in my case suits me very well):
Now, let's say that I want to reuse the same resulting query, on another subset of data. If I use the same Ibis code, it would not suit me because Ibis would do new inspections on the new data sample, and would then provide an updated query with the new hard-coded modalities:
subsetB_metrics=(
######### BASE DATA
ex_table.filter(_['subset']=='subsetB') ##### <---- ONLY CHANGE
######### PIPE
.group_by(['id', 'cat'])
.aggregate(
sum_num=_['num'].sum()
)
.pivot_wider(
id_cols='id',
# each cat modality will be pushed to column
names_from='cat',
# measures to pivot
values_from='sum_num',
values_fill=0,
values_agg='max'
)
)
subsetB_metrics.execute()
I don't know how Ibis builds this underlying SQL query.. But is there a way to "navigate" in it (maybe via a node system) to retrieve only the portion I am interested in (just the pipe), and be able to re-execute it identically on another ibis dataset provided as input?
Something like this:
Schematically, the resulting query would have been constructed like this:
I hope my illustration is clear and that a solution is possible. The example given is simplified but I hope it describes sufficiently the nuances of my need. In this example, only the filtering changes, depending on the value 'subsetA' or 'subsetB', but I would like to push any dataset that has undergone any transformation upstream, as long as the final structure is identical.
I could probably achieve my goals by cleverly manipulating the SQL queries directly to substitute the string portion according to my uses, but it would be a shame to deviate from the Ibis syntax knowing that the framework probably already encapsulates all the logic of query/subqueries manipulation.
reacted with thumbs up emoji reacted with thumbs down emoji reacted with laugh emoji reacted with hooray emoji reacted with confused emoji reacted with heart emoji reacted with rocket emoji reacted with eyes emoji
-
Hello,
First of all, thank you for this great framework that I recently discovered and that I find very interesting.
I have a special need that I will try to explain here, hoping to find a potential solution.
Here is some basic data used to illustrate my need in a simplified way:
Let's say that I apply the following calculation pipe to a subset of my data:
The pipe includes, among other things, a pivot calculation, which has the particularity of generating the creation of columns directly dependent on the actual content of the data.
As I understand it, Ibis was building a SQL query in Lazy mode, step by step; this query was only really executed during .execute().
The constructed query can be consulted via .compile() :
The particularity with the pivot is that Ibis seems to be obliged to perform some inspections during the construction of the underlying SQL query (despite the Lazy mode): which is logical because Ibis is obliged to know the real content of the data to produce the table resulting from the pivot. This explains the presence of some hard-coded information in the query (which in my case suits me very well):
Now, let's say that I want to reuse the same resulting query, on another subset of data. If I use the same Ibis code, it would not suit me because Ibis would do new inspections on the new data sample, and would then provide an updated query with the new hard-coded modalities:
I don't know how Ibis builds this underlying SQL query.. But is there a way to "navigate" in it (maybe via a node system) to retrieve only the portion I am interested in (just the pipe), and be able to re-execute it identically on another ibis dataset provided as input?
Something like this:
Which would give as output a query like this:
Schematically, the resulting query would have been constructed like this:
I hope my illustration is clear and that a solution is possible. The example given is simplified but I hope it describes sufficiently the nuances of my need. In this example, only the filtering changes, depending on the value 'subsetA' or 'subsetB', but I would like to push any dataset that has undergone any transformation upstream, as long as the final structure is identical.
I could probably achieve my goals by cleverly manipulating the SQL queries directly to substitute the string portion according to my uses, but it would be a shame to deviate from the Ibis syntax knowing that the framework probably already encapsulates all the logic of query/subqueries manipulation.
Thanks in advance for your contribution!
Beta Was this translation helpful? Give feedback.
All reactions