Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

get flow checkpoints endpoint v2: 233 #264

Merged
merged 52 commits into from
Oct 3, 2024

Conversation

TebaleloS
Copy link
Collaborator

@TebaleloS TebaleloS commented Sep 5, 2024

Release notes:

  • Implementing get flow checkpoint endpoint.

@TebaleloS TebaleloS self-assigned this Sep 5, 2024
@TebaleloS TebaleloS added DB Issues touching the Database part of the project Server Issues touching the server part of the project work in progress Work on this item is not yet finished (mainly intended for PRs) labels Sep 5, 2024
sealed trait PaginatedResult[R] {
def data: Seq[R]
}

object PaginatedResult {

case class ResultHasMore[R](data: Seq[R]) extends PaginatedResult[R]

object ResultHasMore {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove these objects. We do not need encoders/decoders for PaginatedResults, we need it only for PaginatedResponse.

@TebaleloS
Copy link
Collaborator Author

TebaleloS commented Sep 20, 2024

Release Notes

  • This PR implement endpoint to get flow using flow ID.

@@ -14,10 +14,12 @@
* limitations under the License.
*/

CREATE OR REPLACE FUNCTION flows.get_flow_checkpoints(
IN i_partitioning_of_flow JSONB,
-- Function: flows.get_flow_checkpoints_v2(4)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be just bellow. It's then part of the function code, can be read within the DB.

IN i_partitioning_of_flow JSONB,
-- Function: flows.get_flow_checkpoints_v2(4)
CREATE OR REPLACE FUNCTION flows.get_flow_checkpoints_v2(
IN i_flow_id BIGINT,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And even more importantly try to keep the order "natural" - the more important parameters first.

@@ -66,53 +70,60 @@ $$
-- measured_columns - measure columns associated with a given checkpoint
-- measurement_value - measurement details associated with a given checkpoint
-- checkpoint_time - time
-- has_more - flag indicating whether there are more checkpoints available
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
-- has_more - flag indicating whether there are more checkpoints available
-- has_more - flag indicating whether there are more checkpoints available, always `false` if `i_limit` is NULL

IN i_partitioning_of_flow JSONB,
-- Function: flows.get_flow_checkpoints_v2(4)
CREATE OR REPLACE FUNCTION flows.get_flow_checkpoints_v2(
IN i_flow_id BIGINT,
IN i_limit INT DEFAULT 5,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would rename to i_checkpoint_limit, it's more explanatory.

CREATE OR REPLACE FUNCTION flows.get_flow_checkpoints(
IN i_partitioning_of_flow JSONB,
-- Function: flows.get_flow_checkpoints_v2(4)
CREATE OR REPLACE FUNCTION flows.get_flow_checkpoints_v2(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to be marked with v2

Suggested change
CREATE OR REPLACE FUNCTION flows.get_flow_checkpoints_v2(
CREATE OR REPLACE FUNCTION flows.get_flow_checkpoints(

It has consequences bellow

BEGIN
_fk_partitioning = runs._get_id_partitioning(i_partitioning_of_flow);
-- Execute the query to retrieve checkpoints flow and their associated measurements
RETURN QUERY
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The query is wrong.
The limit should be applied to number of checkpoints not, the measurements eventually.
I don't think it's possible to solve without a WITH statement.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will fix it; I will use the agreed method. That I think will fix a lot of things that could go wrong in the procedure.

ON M.fk_measure_definition = MD.id_measure_definition
WHERE PF.fk_flow = i_flow_id
AND (i_checkpoint_name IS NULL OR CP.checkpoint_name = i_checkpoint_name)
ORDER BY CP.process_start_time,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ordering should be switched. See why here.

@TebaleloS TebaleloS removed the work in progress Work on this item is not yet finished (mainly intended for PRs) label Sep 30, 2024
# Conflicts:
#	server/src/main/scala/za/co/absa/atum/server/Main.scala
#	server/src/main/scala/za/co/absa/atum/server/api/controller/BaseController.scala
#	server/src/main/scala/za/co/absa/atum/server/api/http/Endpoints.scala
#	server/src/main/scala/za/co/absa/atum/server/api/http/Routes.scala
#	server/src/test/scala/za/co/absa/atum/server/api/TestData.scala
# Conflicts:
#	server/src/main/scala/za/co/absa/atum/server/api/http/Endpoints.scala
#	server/src/main/scala/za/co/absa/atum/server/api/http/Routes.scala
IF _fk_partitioning IS NULL THEN
status := 41;
status_text := 'Partitioning not found';
-- Check if the flow exists
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just documentation:
First instinct to check if flows exists, is to go to the flows table. But every flow has at least one partitioning, so at least one record here. And this table is actually used in further queries unlike the flows table, so caching advantage. Great solution. 👏
But as it's a less intuitive solution, please add to the comment why it is done and why it works.

@@ -124,7 +132,8 @@ trait Routes extends Endpoints with ServerOptions {
// getPartitioningEndpointV2,
// getPartitioningMeasuresEndpointV2,
// getFlowPartitioningsEndpointV2,
// getPartitioningMainFlowEndpointV2
// getPartitioningMainFlowEndpointV2,
getFlowCheckpointsEndpointV2,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to be also commented.

TebaleloS and others added 3 commits October 3, 2024 12:20
…ts.sql

Co-authored-by: David Benedeki <14905969+benedeki@users.noreply.github.com>
…ts.sql

Co-authored-by: David Benedeki <14905969+benedeki@users.noreply.github.com>
Copy link
Contributor

@benedeki benedeki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good work!

status_text := 'Partitioning not found';
-- Check if the flow exists by querying the partitioning_to_flow table.
-- Rationale:
-- This table is preferred over the flows table because:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love the comment 👍

@benedeki benedeki merged commit 59d8e4d into master Oct 3, 2024
9 of 10 checks passed
@benedeki benedeki deleted the feature/#233-getFlowCheckpointsEndpointV2 branch October 3, 2024 12:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
DB Issues touching the Database part of the project Server Issues touching the server part of the project
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants