[INLONG-7072][Manager][Sort] Resource adaptive adjustment for Hudi #7077
/inlong-sort/connectors
/plugins
/inlong-sort/sort-dist.jar
/inlong-manager/plugins
please remove these lines
@@ -209,6 +209,10 @@
"meta.Sinks.Hudi.PartitionFieldListHelp": "If the field type is timestamp, you must set the format of the field value, support MICROSECONDS, MILLISECONDS, SECONDS, SQL, ISO_8601, and custom, such as: yyyy-MM-dd HH:mm:ss, etc.",
"meta.Sinks.Hudi.FieldFormat": "FieldFormat",
"meta.Sinks.Hudi.ExtListHelper": "The DDL attribute of the hudi table needs to be prefixed with 'ddl.'",
"meta.Sinks.Hudi.RecordPreDayUnit": "row",
"meta.Sinks.Hudi.RecordPreDay": "RecordPreDay",
RecordPerDay, not PreDay.
-->
## Apache InLong dev toolkit
Please do not commit code unrelated to this issue.
This PR is stale because it has been open for 60 days with no activity.
Prepare a Pull Request
[INLONG-7072][Manager][Sort] Resource adaptive adjustment for Hudi
Motivation
Hudi Flink jobs are often given poorly sized resources. Over-allocation wastes resources, while under-allocation leads to back pressure or OOM errors.
When allocating resources, first determine the source-side concurrency (parallelism) so that no backlog builds up upstream while reading. As a general reference point, a table partitioned by day with about 15 billion records per day needs a source concurrency of about 50. Other data volumes can be scaled accordingly.
After determining the source concurrency, set the write concurrency at a source-to-write ratio of 1:1.5 or 1:2.
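The sizing rule above can be sketched as a small calculation. This is only an illustration: the function names are hypothetical, and scaling the 15 billion / 50 reference point linearly is an assumption, since the description only says that other volumes "can be converted appropriately".

```python
import math

# Reference point from the description: ~15 billion records/day -> ~50 source tasks.
REFERENCE_RECORDS_PER_DAY = 15_000_000_000
REFERENCE_SOURCE_CONCURRENCY = 50


def estimate_source_concurrency(records_per_day: int) -> int:
    """Scale the reference point linearly (an assumption) and round up."""
    scaled = -(-(records_per_day * REFERENCE_SOURCE_CONCURRENCY) // REFERENCE_RECORDS_PER_DAY)
    return max(1, scaled)


def estimate_write_concurrency(source_concurrency: int, ratio: float = 1.5) -> int:
    """Apply the source:write ratio (1:1.5 by default, 1:2 as the upper choice)."""
    return max(1, math.ceil(source_concurrency * ratio))


if __name__ == "__main__":
    source = estimate_source_concurrency(15_000_000_000)
    print(source, estimate_write_concurrency(source), estimate_write_concurrency(source, 2.0))
```

The result is a starting point only; OOM or back pressure observed at runtime should still drive the adjustments described next.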
If the write operator runs out of memory (OOM) at runtime, increase the write concurrency and the TaskManager (TM) memory as appropriate.
If back pressure occurs, adjust the write concurrency according to the gap between the number of records consumed by the source and the number written. In this example the gap is about 500,000 records (50W), meaning that 500,000 records cannot be written in time. From the number of records written successfully and the elapsed running time, you can calculate the write throughput per task, and from that how much additional write concurrency is needed to absorb the 500,000-record gap.
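A minimal sketch of that back-pressure calculation follows. It assumes the gap built up over the same running time as the successful writes, and that write throughput grows linearly with the number of write tasks. The function names and every input other than the 500,000-record gap are hypothetical.

```python
def per_task_write_rate(written_records: int, elapsed_seconds: float,
                        write_concurrency: int) -> float:
    """Records written per second by one write task over the observed window."""
    if elapsed_seconds <= 0 or write_concurrency <= 0:
        raise ValueError("elapsed_seconds and write_concurrency must be positive")
    return written_records / elapsed_seconds / write_concurrency


def extra_write_tasks(gap_records: int, written_records: int,
                      write_concurrency: int) -> int:
    """Additional write tasks needed to absorb the source/write gap.

    Each task wrote written_records / write_concurrency records in the window,
    so the gap needs gap_records / (written_records / write_concurrency) more
    tasks. The window length cancels out; the result is rounded up.
    """
    if written_records <= 0 or write_concurrency <= 0:
        raise ValueError("written_records and write_concurrency must be positive")
    if gap_records <= 0:
        return 0
    return -(-(gap_records * write_concurrency) // written_records)


if __name__ == "__main__":
    # Hypothetical observation: 20 write tasks wrote 2,000,000 records in 600 s
    # while the source got 500,000 records ahead.
    print(per_task_write_rate(2_000_000, 600, 20))
    print(extra_write_tasks(500_000, 2_000_000, 20))
```

With these hypothetical inputs, the estimate suggests raising the write concurrency from 20 to 25.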
Modifications
Verifying this change
(Please pick either of the following options)
This change is a trivial rework/code cleanup without any test coverage.
This change is already covered by existing tests, such as:
(please describe tests)
This change added tests and can be verified as follows:
(example:)
Documentation