Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update s3a_wrapper.sh to support whitespaces #2211

Merged

Conversation

Zejnilovic
Copy link
Contributor

@Zejnilovic Zejnilovic commented Feb 8, 2024

Closes #2210

Copy link

sonarcloud bot commented Feb 8, 2024

Quality Gate Passed Quality Gate passed

Issues
0 New issues

Measures
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud

Copy link
Contributor

@dk1844 dk1844 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@dk1844
Copy link
Contributor

dk1844 commented Feb 9, 2024

I have tested the change locally, looks good.
Details:
I have commented out kinit, set -e, get_dataset_info, and handle_jceks_path which I don't have locally and ran the original and the changed script.

The original got messed up by spaced null value:

$ ./s3a_wrapper.sh ./run_standardization.sh --deploy-mode cluster --dataset-name Custom_datasetName --menas-auth-keytab user_authFile.krb --dataset-version 1 --report-date 2024-02-07 --report-version 1 --raw-format csv --charset cp1252 --header false --delimiter U+00A6 --null-value '        '
  _    _      _                    _____           _       _
 | |  | |    | |                  / ____|         (_)     | |
 | |__| | ___| |_ __   ___ _ __  | (___   ___ _ __ _ _ __ | |_ ___
 |  __  |/ _ \ | '_ \ / _ \ '__|  \___ \ / __| '__| | '_ \| __/ __|
 | |  | |  __/ | |_) |  __/ |     ____) | (__| |  | | |_) | |_\__ \
 |_|  |_|\___|_| .__/ \___|_|    |_____/ \___|_|  |_| .__/ \__|___/
               | |                                  | |
               |_|                                  |_|

 Enceladus's Helper Scripts version 1.0
 Currently running run_standardization.sh

ERROR: Found unrecognized options passed to the script. Parameters are:
    user_authFile.krb

while the new changed did not:

$ ./s3a_wrapper.sh ./run_standardization.sh --deploy-mode cluster --dataset-name Custom_datasetName --menas-auth-keytab user_authFile.krb --dataset-version 1 --report-date 2024-02-07 --report-version 1 --raw-format csv --charset cp1252 --header false --delimiter U+00A6 --null-value '        '
  _    _      _                    _____           _       _
 | |  | |    | |                  / ____|         (_)     | |
 | |__| | ___| |_ __   ___ _ __  | (___   ___ _ __ _ _ __ | |_ ___
 |  __  |/ _ \ | '_ \ / _ \ '__|  \___ \ / __| '__| | '_ \| __/ __|
 | |  | |  __/ | |_) |  __/ |     ____) | (__| |  | | |_) | |_\__ \
 |_|  |_|\___|_| .__/ \___|_|    |_____/ \___|_|  |_| .__/ \__|___/
               | |                                  | |
               |_|                                  |_|

 Enceladus's Helper Scripts version 1.0
 Currently running run_standardization.sh

Dynamic Resource Allocation enabled
Using Keytab from HDFS
Command line:
/opt/spark-2.4.4/bin/spark-submit --master yarn --deploy-mode cluster --files /absolute/path/application.conf#application.conf  --conf spark.logConf=true --conf spark.dynamicAllocation.enabled=true --conf spark.shuffle.service.enabled=true --conf spark.sql.adaptive.enabled=true --conf spark.dynamicAllocation.maxExecutors=4 --conf spark.dynamicAllocation.minExecutors=0 --conf spark.dynamicAllocation.executorAllocationRatio=0.5 --conf spark.sql.adaptive.shuffle.targetPostShuffleInputSize=134217728 --conf spark.yarn.submit.waitAppCompletion=false --conf "spark.driver.extraJavaOptions=-Dstandardized.hdfs.path=/bigdata/std/std-{0}-{1}-{2}-{3} -Dspline.mongodb.url=mongodb://localhost:27017 -Dspline.mongodb.name=spline -Dhdp.version=2.7.3    -Dconfig.file=application.conf    " --conf "spark.executor.extraJavaOptions=  " --class za.co.absa.enceladus.standardization.StandardizationJob enceladus-spark-jobs.jar --menas-auth-keytab user_authFile.krb --dataset-name Custom_datasetName --dataset-version 1 --report-date 2024-02-07 --report-version 1 --raw-format csv --charset cp1252 --delimiter \U+00A6 --header false --null-value '        '

@dk1844 dk1844 added the PR:tested Only for PR - PR was tested by a tester (person) label Feb 9, 2024
@dk1844 dk1844 merged commit a6d0ba6 into support/2.27 Feb 9, 2024
6 of 7 checks passed
@dk1844 dk1844 deleted the bugfix/2210-fix-s3a-wrapper-to-support-whitespaces branch February 9, 2024 09:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PR:tested Only for PR - PR was tested by a tester (person)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants