oai_xml driver not precise enough to handle mods data #285

Open
jacobthill opened this issue Oct 4, 2022 · 3 comments

Comments

@jacobthill
Contributor

The oai_xml driver can extract MODS fields, but it loses much of their context in the process. It builds column names from XML element names only, while MODS carries much of its meaning in attributes. Because attributes do not inform the column names, every URL gets the same column name and is appended to a single list in one column of the CSV output. We cannot map fields from that list because we would have to rely on the order of its elements, and records do not all have the same number of elements, which throws the order off.
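To make the problem concrete, here is a minimal sketch using only the Python standard library. It is not the oai_xml driver's actual code: the `flatten` helper, the sample record, and the attribute-based naming scheme are all assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

MODS_NS = "{http://www.loc.gov/mods/v3}"

# Hypothetical MODS fragment (example.org URLs are placeholders).
SAMPLE = """
<mods xmlns="http://www.loc.gov/mods/v3">
  <location>
    <url access="object in context">https://example.org/item/1</url>
    <url access="preview">https://example.org/item/1/thumb.jpg</url>
  </location>
</mods>
"""


def flatten(root, use_attributes=False):
    """Collect element text into columns (hypothetical helper, not the driver)."""
    columns = defaultdict(list)
    for el in root.iter():
        text = (el.text or "").strip()
        if not text:
            continue  # skip container elements that hold no text of their own
        name = el.tag.replace(MODS_NS, "")
        if use_attributes and el.attrib:
            # Assumed naming scheme: append attribute names and values.
            attrs = "_".join(f"{k}_{v}" for k, v in sorted(el.attrib.items()))
            name = f"{name}_{attrs}".replace(" ", "_")
        columns[name].append(text)
    return dict(columns)


root = ET.fromstring(SAMPLE)
element_only = flatten(root)                      # both URLs land in one "url" column
attribute_aware = flatten(root, use_attributes=True)  # one column per access value
```

With element names alone, both URLs share the `url` column and can only be told apart by position. Folding the `access` attribute into the name gives each URL its own column, so a missing URL in one record no longer shifts the others.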

For now, we may have to use the xml driver for MODS records, but we need to look into improving the oai_xml driver.

This was referenced Oct 4, 2022
@aaron-collier
Contributor

Isn't this just a matter of using the proper XPath query in the configuration to get the specific field with attributes? Let's discuss.
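As a rough illustration of what per-field XPath queries with attribute predicates could look like, here is a sketch using the standard library's `xml.etree.ElementTree`. The `FIELD_PATHS` mapping, the field names, and the sample record are hypothetical and do not reflect the project's real configuration format.

```python
import xml.etree.ElementTree as ET

NS = {"mods": "http://www.loc.gov/mods/v3"}

# Hypothetical MODS record (example.org URLs are placeholders).
RECORD = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Example title</title></titleInfo>
  <location>
    <url access="object in context">https://example.org/item/1</url>
    <url access="preview">https://example.org/item/1/thumb.jpg</url>
  </location>
</mods>"""

# Assumed mapping of output field name to XPath; the attribute predicate
# is what separates the two <url> elements.
FIELD_PATHS = {
    "title": ".//mods:titleInfo/mods:title",
    "item_url": ".//mods:location/mods:url[@access='object in context']",
    "thumbnail": ".//mods:location/mods:url[@access='preview']",
}


def extract(record_xml, field_paths):
    """Return one list of text values per configured field."""
    root = ET.fromstring(record_xml)
    return {
        field: [el.text for el in root.findall(path, NS)]
        for field, path in field_paths.items()
    }


row = extract(RECORD, FIELD_PATHS)
```

Because each field is selected by its own query, a record that lacks a thumbnail simply yields an empty list for `thumbnail` rather than shifting the remaining values.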

@jacobthill
Contributor Author

We can use an XPath query for each field with either the xml driver or the oai_xml driver. With the xml driver, we don't have the wait option, which we need to harvest qnl. With the oai_xml driver, we can get all the fields we need, but it also harvests many fields we don't need, since it harvests many things by default. This makes mapping a bit messy, because many harvested fields never get mapped in the metadata. I think if we are going to manually enter the XPath query for all of the fields, we really should be using the xml driver.

@edsu
Contributor

edsu commented Mar 17, 2023

Could we adjust the oai-pmh driver to serialize only the fields that are defined in the catalog? If we need to capture attributes for context, as we do in the MARCXML driver, that might be possible too.
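One way to picture the "only serialize catalog fields" idea is a filter applied to each harvested record before it is written out. This is a sketch under assumptions: `keep_catalog_fields`, the record shape, and the field names are hypothetical and say nothing about how the driver is actually structured.

```python
def keep_catalog_fields(record, catalog_fields):
    """Keep only the columns named in the catalog (hypothetical helper)."""
    return {name: record[name] for name in catalog_fields if name in record}


# Hypothetical harvested record with default fields nobody asked for.
harvested = {
    "title": ["Example title"],
    "thumbnail": ["https://example.org/item/1/thumb.jpg"],
    "recordInfo": ["unwanted default field"],
    "typeOfResource": ["still image"],
}

# Assumed list of field names declared in the catalog entry.
catalog_fields = ["title", "thumbnail", "item_url"]

row = keep_catalog_fields(harvested, catalog_fields)
```

Fields declared in the catalog but absent from a record (here `item_url`) are skipped rather than raising an error; whether they should instead appear as empty columns is a design choice left open.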
