AUC abdullah-freres collection failing transform validation #386

Open
jacobthill opened this issue Mar 16, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@jacobthill
Contributor

jacobthill commented Mar 16, 2023

Exception: ERROR: failed to transform all harvested records: harvested record count (48) != transformed record count (51)

I confirmed that 48 records were harvested by running the harvest again locally. I don't know why the transformed record count is 51. This is strange behavior: I would only have expected the transformed record count to be lower, since bad data or errors in the config might prevent some records from being transformed. Instead, this is adding 3 records and I have no idea where they would come from. I suspect traject or another Airflow task might be adding blank lines?
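One way to test the blank-line theory is to classify every line of the transformed output. This is a minimal sketch, assuming the output is newline-delimited JSON with one record per line. `summarize_ndjson` is a hypothetical helper, not part of dlme-airflow or traject.

```python
import json


def summarize_ndjson(lines):
    """Count blank lines, complete JSON objects, and unparseable fragments."""
    summary = {"total": 0, "blank": 0, "valid": 0, "invalid": 0}
    for line in lines:
        summary["total"] += 1
        text = line.strip()
        if not text:
            summary["blank"] += 1
            continue
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError:
            # Not parseable on its own, e.g. a record cut off mid-string.
            summary["invalid"] += 1
            continue
        # Only a JSON object counts as a record; a bare string or number does not.
        if isinstance(parsed, dict):
            summary["valid"] += 1
        else:
            summary["invalid"] += 1
    return summary
```

Passing an open file handle for the transformed output (for example `summarize_ndjson(open(path, encoding="utf-8"))`) would show whether the extra lines are blank or are fragments of real records. If `valid` is below the harvested count and `invalid` is non-zero, the extra lines are more likely split records than blank padding.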

auc_ahmed_toughan, auc_coptic, auc_maps, etc. worked

@jacobthill jacobthill added the bug Something isn't working label Mar 16, 2023
@edsu
Contributor

edsu commented Mar 17, 2023

We noticed that one JSON object in the transform log was getting split over multiple lines:

{"cho_title":{"en":["Luxor Temple"]},"cho_creator":{"en":["Abdullah Frères"]},"cho_date":{"en":["1890-1899"]},"cho_date_range_hijri":[1307,1308,1309,1310,1311,1312,1313,1314,1315,1316,1317],"cho_date_range_norm":[1890,1891,1892,1893,1894,1895,1896,1897,1898,1899],"cho_dc_rights":{"en":["To inquire about permissions or reproductions, contact the Rare Books and Special Collections Library, The American University in Cairo at +20.2.2615.3676 or rbscl-ref@aucegypt.edu."]},"cho_description":{"en":["Temple de Luxor. 109A"]},"cho_edm_type":{"en":["Image"],"ar-Arab":["صورة"]},"cho_format":{"en":["image/jpg"]},"cho_has_type":{"en":["Other Images"],"ar-Arab":["صور أخرى"]},"cho_is_part_of":{"en":["19th Century photographs"]},"cho_medium":{"en":["photographic prints"]},"cho_type":{"en":["Still Image"]},"agg_data_provider":{"en":["American University in Cairo"],"ar-Arab":["الجامعة الأمريكية في القاهرة"]},"agg_provider":{"en":["American Uni
[2023-03-16, 17:22:21 UTC] {docker.py:373} INFO - versity in Cairo
[2023-03-16, 17:22:21 UTC] {docker.py:373} INFO - "],"ar-Arab":["الجامعة الأمريكية في القاهرة"]},"agg_provider_country":{"en":["Egypt"],"ar-Arab":["مصر"]},"agg_data_provider_country":{"en":["Egypt"],"ar-Arab":["مصر"]},"cho_type_facet":{"en":["Image","Image:Other Images"],"ar-Arab":["صورة","صورة:صور أخرى"]},"id":"p15795coll38:91","transform_version":"923910e","transform_timestamp":"2023-03-16 17:22:21 +0000","agg_data_provider_collection_id":"auc-abdullah-freres","dlme_source_file":"/auc/abdullah_freres/data.csv","agg_is_shown_at":{"wr_dc_rights":["To inquire about permissions or reproductions, contact the Rare Books and Special Collections Library, The American University in Cairo at +20.2.2615.3676 or rbscl-ref@aucegypt.edu."],"wr_format":["image/jpeg"],"wr_is_referenced_by":["https://cdm15795.contentdm.oclc.org/iiif/p15795coll38:91/manifest.json"],"wr_id":"https://cdm15795.contentdm.oclc.org/iiif/2/p15795coll38:91/full/full/0/default.jpg"},"agg_preview":{"wr_dc_rights":["To inquire about permissions or reproductions, contact the Rare Books and Special Collections Library, The American University in Cairo at +20.2.2615.3676 or rbscl-ref@aucegypt.edu."],"wr_format":["image/jpeg"],"wr_is_referenced_by":["https://cdm15795.contentdm.oclc.org/iiif/p15795coll38:91/manifest.json"],"wr_id":"https://cdm15795.contentdm.oclc.org/iiif/2/p15795coll38:91/full/400,400/0/default.jpg"}}

@edsu
Contributor

edsu commented Mar 17, 2023

If you look at line 13 in the transformed output at dlme-airflow-dev:/opt/app/dlme/datashare/output-auc-abdullah-freres.ndjson you can see a truncated line:

{"cho_titl

and another on line 25, and another on line 30.
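To locate truncated lines like these without reading the file by eye, something like the following could help. It is a rough sketch, assuming a split record can be restored by concatenating consecutive lines with the line break removed. `is_json_object` and `find_split_records` are hypothetical names, not existing dlme-transform code.

```python
import json


def is_json_object(text):
    """Return True if text parses as a single JSON object."""
    try:
        return isinstance(json.loads(text), dict)
    except json.JSONDecodeError:
        return False


def find_split_records(lines):
    """Return 1-based (first_line, last_line) spans whose pieces join into one JSON object."""
    spans = []
    start, buffer = None, ""
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if start is None:
            if not text or is_json_object(text):
                continue  # blank line or complete record, nothing to repair
            # First fragment of a possibly split record.
            start, buffer = number, text
            continue
        # Keep appending lines until the buffer parses as an object again.
        buffer += text
        if is_json_object(buffer):
            spans.append((start, number))
            start, buffer = None, ""
    return spans
```

Each span returned would correspond to one record spread over several physical lines. Three truncated lines would also be consistent with the three extra records in the count (51 vs. 48) if each of three records was split in two, though that is an inference and not something confirmed in this thread.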

The original CSV data for what should be on line 13 looks like a complete CSV line without line breaks:

http://iiif.io/api/presentation/2/context.json,https://cdm15795.contentdm.oclc.org/iiif/p15795coll38:64/manifest.json,image/jpeg,http://iiif.io/api/image/2/level1.json,https://cdm15795.contentdm.oclc.org/iiif/2/p15795coll38:64/full/full/0/default.jpg,['Albumin'],['Abdullah Frères'],['1890-1899'],['Le champs de bataille à Toski. 84A'],['image/jpg'],['19th Century photographs'],['26.7 x 20.7'],"['To inquire about permissions or reproductions, contact the Rare Books and Special Collections Library, The American University in Cairo at +20.2.2615.3676 or rbscl-ref@aucegypt.edu.']",['photographic prints'],['General views; landscape; Toshka'],['War field in Toski'],['Still Image'],['Rare Books and Special Collections Digital Library '],"['<span>From: <a href=""http://digitalcollections.aucegypt.edu/digital/collection/p15795coll38/id/64"">War field in Toski</a></span>']"
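A line can look unbroken in a terminal and still contain a character that some tools treat as a line boundary. The sketch below scans parsed CSV fields for such characters. It assumes the harvested CSV is UTF-8 and uses standard quoting; the character list is the set that Python's `str.splitlines` splits on, and `fields_with_line_breaks` is a hypothetical helper. Whether any such character is actually present in this data is not established here.

```python
import csv
import io

# Characters that Python's str.splitlines treats as line boundaries.
LINE_BREAKS = "\n\r\v\f\x1c\x1d\x1e\x85\u2028\u2029"


def fields_with_line_breaks(csv_text):
    """Return (record_number, column_index) pairs whose field contains a line-break character."""
    hits = []
    # newline="" keeps embedded line endings intact so the csv module can see them.
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    for record_number, row in enumerate(reader, start=1):
        for column_index, value in enumerate(row):
            if any(ch in LINE_BREAKS for ch in value):
                hits.append((record_number, column_index))
    return hits
```

Running this over the text of the harvested CSV and getting an empty list would support the reading that the CSV is clean and the split happens later, in traject / dlme-transform or in how the output is written.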

I think this points to some kind of problem in traject / dlme-transform? Maybe it's a problem that we're only noticing now that the transform validation is running again?

Projects
Status: Ready
Development

No branches or pull requests

2 participants