Forward data source level lineage for a given datasource #1081
-
It is definitely possible to get a forward lineage (or impact) in the Spline data model; the feature has just not been exposed through the Consumer API yet. As you correctly noticed the edge Basically, what you need is to flip the logic of the existing lineage overview query and traverse the graph forward in time. The current implementation traverses the graph along the depends edge. Try switching the edge from depends to affects and, instead of finding the visible writes for a read, look for the reads for which a given write is visible (meaning all read events connected to the same source that happened after the given write, but before the subsequent overwrite).
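To make the "visible write" rule concrete, here is a minimal in-memory sketch in Python. It is not Spline code or AQL: the `Event` class, its fields, and the function name are hypothetical, and it assumes that only an overwrite (not an append) hides an earlier write.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Event:
    """Hypothetical stand-in for a read or write event on a data source."""
    plan: str                # name of the execution plan that produced the event
    source: str              # data source URI
    timestamp: int           # event time, any monotonic unit
    overwrite: bool = False  # meaningful for writes only; False means append


def reads_seeing_write(write: Event, writes: List[Event], reads: List[Event]) -> List[Event]:
    """Return the reads for which `write` is visible.

    A read qualifies if it touches the same source, happened after the given
    write, and happened before the next overwrite of that source (if any).
    """
    later_overwrites = [
        w.timestamp
        for w in writes
        if w.source == write.source and w.overwrite and w.timestamp > write.timestamp
    ]
    upper = min(later_overwrites, default=None)
    return [
        r
        for r in reads
        if r.source == write.source
        and r.timestamp > write.timestamp
        and (upper is None or r.timestamp < upper)
    ]


# Toy data: one source, overwritten at t=10 and again at t=20.
w1 = Event("writerA", "s3://bucket/ds", 10, overwrite=True)
w2 = Event("writerB", "s3://bucket/ds", 20, overwrite=True)
all_reads = [
    Event("readerX", "s3://bucket/ds", 5),     # before w1: not affected
    Event("readerY", "s3://bucket/ds", 15),    # between w1 and w2: sees w1
    Event("readerZ", "s3://bucket/ds", 25),    # after w2: w1 is no longer visible
    Event("readerQ", "s3://bucket/other", 15), # different source
]
visible = reads_seeing_write(w1, [w1, w2], all_reads)
print([r.plan for r in visible])
```

Repeating this step for the writes of each plan found this way would walk the graph forward in time, one hop per iteration.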
-
I created a new Foxx service that returns the forward lineage for a given write event, and I was able to visualise it on the lineage overview page (screenshot attached). In the screenshot, the selected write event is Emp_Sal_Dept_Multiplier, and Dept_Emp_Sal and Dept_Sal_2 are listed as its succeeding lineage. The query I used is as follows:
@wajda Is the above query correct? Please suggest any changes if required. If needed, we can connect over a Zoom/Teams call.
-
Hi @wajda, can I take this as a feature request? I can contribute the forward lineage feature. I have created a Foxx service that returns the forward lineage for a given write event; you can see it in the screenshot attached above. Let me know your thoughts. Happy to discuss. Thanks.
-
Background
Currently in Spline there is no way to explore the forward lineage of a given data source, that is, to ask which datasets are impacted by a given dataset. E.g. Dataset A impacts Datasets B and C, and Dataset B impacts Datasets D and E. So the forward lineage looks like:
A(Dataset)->someExecutionPlan->B,C(Dataset)->someExecutionPlan->D,E(Dataset)
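As a rough illustration of the traversal being asked for, the sketch below walks the example forward in Python over two toy edge lists. It assumes that both depends and affects edges go from an execution plan to a data source; the plan names and the function are hypothetical, and event timing (which writes a read can actually see) is deliberately ignored.

```python
from collections import deque
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (execution plan, data source)


def forward_lineage(start: str, depends: List[Edge], affects: List[Edge]) -> List[str]:
    """Breadth-first list of data sources downstream of `start`."""
    readers: Dict[str, List[str]] = {}  # data source -> plans that read it
    for plan, source in depends:
        readers.setdefault(source, []).append(plan)
    targets: Dict[str, List[str]] = {}  # plan -> data sources it writes
    for plan, source in affects:
        targets.setdefault(plan, []).append(source)

    seen = {start}
    order: List[str] = []
    queue = deque([start])
    while queue:
        source = queue.popleft()
        for plan in readers.get(source, []):
            for target in targets.get(plan, []):
                if target not in seen:  # also guards against cycles
                    seen.add(target)
                    order.append(target)
                    queue.append(target)
    return order


# The example above: A feeds B and C, and B feeds D and E.
depends_edges = [("plan1", "A"), ("plan2", "A"), ("plan3", "B"), ("plan4", "B")]
affects_edges = [("plan1", "B"), ("plan2", "C"), ("plan3", "D"), ("plan4", "E")]
print(forward_lineage("A", depends_edges, affects_edges))
```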
Question
@wajda
Is it possible to get this lineage using the current Spline data model entities?
The current model has the edge collections affects and depends, which capture this relationship, but both point to the dataSource collection in their _to field.
One possible solution
Is the above solution correct, or can the same be achieved using the existing data model?
@pratapmmmec