Invalid RDF - datatype issue #110
Comments
@samiscoding thanks for reporting the issue. Validation is possible in CARML, but the switch to turn it on is currently not exposed via the CLI.
Hey Pano,
You can do this by building the mapper with the following builder option:

import org.eclipse.rdf4j.model.impl.ValidatingValueFactory;

RdfRmlMapper.builder()
    .valueFactorySupplier(ValidatingValueFactory::new)

I'm still contemplating whether it should be the default, but I will update this issue when I decide.
Hey @pmaria, Ideally there would be multiple options:
For any failed validation, the logs should indicate why and where it failed (the failed literal value, the expected datatype, and if possible the input file and a line number/position hint) so unclean data can be identified and fixed. What do you think?
Right now it would abort. I can see how that can be problematic.
Agreed with above.
For this one there is also the option of doing some value coercion based on the XSD datatype IRI, which I'm currently looking at, like in R2RML.
Agreed. I'll open some issues to track these features. Let me know if you have any comments.
Not sure what you mean by value coercion in this context?

To give an example of what I mean with converting invalid data to string literal: The last variant I sketched above would be to simply change the datatype on the fly to Same could be considered when expecting an integer value for e.g. an age property, but sometimes encountering a value of What would coercion do in these cases?

From an implementation point of view, all of this could be handled by a ValueFactory, so I can easily create something like this for my application, but this might still be interesting for other users of carml.

The only tricky part might be getting the location (file/stream, line number/position) for proper reporting, as this is not exposed to the ValueFactory. This might need to be handled outside.
I meant something like R2RML's canonical RDF lexical form.
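To make the idea of coercion to a canonical lexical form concrete, here is a minimal, dependency-free sketch. The class and method names (`CanonicalForm`, `canonicalInteger`, `canonicalBoolean`) are hypothetical and not part of CARML or RDF4J; the sketch only assumes the XSD rules that a canonical integer has no plus sign or leading zeros and a canonical boolean is `true` or `false`.

```java
import java.math.BigInteger;
import java.util.Optional;

final class CanonicalForm {

    /** Canonical form of an integer lexical value, or empty if it is not one. */
    static Optional<String> canonicalInteger(String lexical) {
        // Only an optional sign followed by ASCII digits is a valid integer lexical form.
        if (lexical == null || !lexical.matches("[+-]?[0-9]+")) {
            return Optional.empty();
        }
        // BigInteger drops a leading '+', leading zeros, and the sign of zero.
        return Optional.of(new BigInteger(lexical).toString());
    }

    /** Canonical form of a boolean lexical value, or empty if it is not one. */
    static Optional<String> canonicalBoolean(String lexical) {
        if ("true".equals(lexical) || "1".equals(lexical)) {
            return Optional.of("true");
        }
        if ("false".equals(lexical) || "0".equals(lexical)) {
            return Optional.of("false");
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(canonicalInteger("+074"));     // a valid but non-canonical value
        System.out.println(canonicalInteger("74 Years")); // not an integer at all
    }
}
```

Coercion of this kind only normalizes values that are already valid for the datatype. A value such as "74 Years" has no canonical integer form, so it comes back empty rather than being repaired.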
Ah right, coercion would not make sense here.
Same as above, this is indeed not something you could solve with coercion.
Right, so basically this would mean being able to define a strategy on how to handle values that are invalid for a certain datatype.
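One way such a strategy could look is sketched below. This is not CARML's API: `InvalidValueHandling`, `Strategy`, `TypedLiteral`, and `handle` are hypothetical names, and literals are modelled as plain strings to keep the example free of dependencies. In a real application the same decision could live inside a custom ValueFactory, as suggested earlier in the thread.

```java
import java.util.Optional;
import java.util.function.Predicate;

final class InvalidValueHandling {

    /** Hypothetical strategies for a value that is invalid for its datatype. */
    enum Strategy { ABORT, SKIP, FALLBACK_TO_STRING }

    /** Minimal literal holder; plain strings keep the sketch dependency-free. */
    static final class TypedLiteral {
        final String lexical;
        final String datatype;

        TypedLiteral(String lexical, String datatype) {
            this.lexical = lexical;
            this.datatype = datatype;
        }
    }

    static Optional<TypedLiteral> handle(
            String lexical, String datatype, Predicate<String> isValid, Strategy strategy) {
        if (isValid.test(lexical)) {
            return Optional.of(new TypedLiteral(lexical, datatype));
        }
        // Report what failed and why, so unclean data can be found and fixed.
        String message = String.format(
                "Invalid value \"%s\" for datatype %s", lexical, datatype);
        System.err.println(message);
        switch (strategy) {
            case ABORT:
                // Stop the whole mapping run.
                throw new IllegalArgumentException(message);
            case SKIP:
                // Produce no literal (and hence no triple) for this value.
                return Optional.empty();
            case FALLBACK_TO_STRING:
                // Keep the value, but typed as a plain string.
                return Optional.of(new TypedLiteral(lexical, "xsd:string"));
            default:
                throw new AssertionError("Unhandled strategy: " + strategy);
        }
    }
}
```

Passing the validity check in as a `Predicate` keeps the strategy independent of any particular datatype, so one handler can serve integers, dates, and so on.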
Hmm yeah, this is not something that can be done reliably. It would depend on the source and possibly the parser. It might be possible to incorporate some context on a best-effort basis, if a source or parser supports this.
Hi there,
We noticed that the engine does not check whether the datatype given in the mapping is valid for the data value, and so it generates invalid RDF triples.
For example, this triple is generated:
:xxx ds:studyMaximumAge "74 Years"^^xsd:int.
This seems to be a bug: the engine does not detect that "74 Years" is not a valid xsd:int, and it generates invalid RDF.
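To illustrate why this literal is invalid, here is a minimal sketch of a lexical check for xsd:int, using only the Java standard library. The names `XsdIntCheck` and `isValidXsdInt` are hypothetical and are not taken from CARML or RDF4J. The sketch assumes the strict lexical form (an optional sign followed by ASCII digits, with no surrounding whitespace) and the 32-bit value range.

```java
import java.math.BigInteger;
import java.util.regex.Pattern;

final class XsdIntCheck {

    // Optional sign followed by one or more ASCII digits.
    private static final Pattern INT_LEXICAL = Pattern.compile("[+-]?[0-9]+");
    private static final BigInteger MIN = BigInteger.valueOf(Integer.MIN_VALUE);
    private static final BigInteger MAX = BigInteger.valueOf(Integer.MAX_VALUE);

    /** Returns true if the string is a lexical form of xsd:int within the 32-bit range. */
    static boolean isValidXsdInt(String lexical) {
        if (lexical == null || !INT_LEXICAL.matcher(lexical).matches()) {
            return false;
        }
        // Parse with BigInteger so oversized values are rejected instead of overflowing.
        BigInteger value = new BigInteger(lexical);
        return value.compareTo(MIN) >= 0 && value.compareTo(MAX) <= 0;
    }

    public static void main(String[] args) {
        System.out.println(isValidXsdInt("74"));       // digits only
        System.out.println(isValidXsdInt("74 Years")); // digits followed by text
    }
}
```

A check like this accepts "74" and rejects "74 Years", which is the distinction the engine would need to make before attaching the xsd:int datatype.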