Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#38 http client and config #73

Merged
merged 5 commits into from
Sep 22, 2023
Merged

#38 http client and config #73

merged 5 commits into from
Sep 22, 2023

Conversation

cerveada
Copy link
Collaborator

@cerveada cerveada commented Sep 15, 2023

fixes #38

  • Since no API is defined yet, just simple HTTP client implementation.
  • Using dispatcher as an abstraction for publishing data, currently to http nad console, later kafka and other destination can be added
  • added type safe config
  • dispatcher type can be set via config

other possible improvements

  • use async http
  • better error handling

We need to have some discussion about how we want to handle error in the context of Spark.

@github-actions
Copy link

github-actions bot commented Sep 15, 2023

JaCoCo agent module code coverage report - spark:2 - scala 2.12.12

File Coverage [43.87%]
AtumAgent.scala 72.81%
ConsoleDispatcher.scala 63.64%
HttpDispatcher.scala 0%
Total Project Coverage 55.54%

@github-actions
Copy link

JaCoCo server module code coverage report - scala 2.12.12

There is no coverage information present for the Files changed

Total Project Coverage 12.45%

Copy link
Collaborator

@TebaleloS TebaleloS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me.

- fixing typo in package name
- add template for local config
basicRequest
.body(s"$checkpointKey $measureResult")
.post(serverUri)
.send(backend)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would retries work? I never used sttp library but I quickly checked it and it looks awesome!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no direct retry support, but it could be implemented using: https://github.com/softwaremill/retry or https://github.com/hipjim/scala-retry

@cerveada
Copy link
Collaborator Author

Regarding the error handling, I think we have two options:

  1. try our best to deliver the measure result, but when all fails, just log it and let the Spark job continue.
  2. when sending measurement fails, throw an exception and let the Spark job fail. I am not sure if it will fail the whole Job, but since Atum runs as part of the job (not as listener llike Spline) it should, right?

To me it seems the number 1 makes more sense but at the end it depends on the requirements on Atum.

@TebaleloS
Copy link
Collaborator

Yah, in terms of error handling we need the requirements, but I think the first option would be great.

# Conflicts:
#	agent/src/main/scala/za/co/absa/atum/agent/AtumAgent.scala
Copy link
Contributor

@benedeki benedeki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like it.
I could imagine:

  • a dispatcher that would accumulate the results locally, for testing purposes
  • dispatcher not built directly in AtumAgent but in a factory
  • the HttpDispatcher doing a quick ping to the server on creation to fail fast (no endpoint for it now anyway)

And obviously the base trait's methods will change. PR is 😎
*

agent/src/main/resources/reference.conf Show resolved Hide resolved
@benedeki
Copy link
Contributor

Regarding the error handling, I think we have two options:

  1. try our best to deliver the measure result, but when all fails, just log it and let the Spark job continue.
  2. when sending measurement fails, throw an exception and let the Spark job fail. I am not sure if it will fail the whole Job, but since Atum runs as part of the job (not as listener llike Spline) it should, right?

To me it seems the number 1 makes more sense but at the end it depends on the requirements on Atum.

Eventually probably ErrorHandling from SparkCommons can be used. For now, let's keep it as it is, and throw and exception failing the whole job.
If any of us are bored can used the ErrorHandling right away with the default of failing with exception implementation.

.gitignore Show resolved Hide resolved
@@ -0,0 +1,6 @@

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This I would removed though.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

@benedeki benedeki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • code reviewed
  • pulled
  • built
  • run

@cerveada cerveada merged commit 51b2e80 into master Sep 22, 2023
4 of 5 checks passed
@cerveada cerveada deleted the feature/38-http-client branch September 22, 2023 14:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add the ability to call the Server REST APIs
4 participants