Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Addition of renv.lock schema #1877

Closed
jrdnbradford opened this issue Apr 16, 2024 · 4 comments
Closed

Addition of renv.lock schema #1877

jrdnbradford opened this issue Apr 16, 2024 · 4 comments

Comments

@jrdnbradford
Copy link
Contributor

jrdnbradford commented Apr 16, 2024

I see there's been some talk about having a renv.lock schema (#1107). I put together a minimal version of this schema and opened a PR over at lorenzwalthert/precommit#558 that would use pre-commit to validate renv.lock files against the schema.

If there's still interest, it may be a better idea to fill out the schema more and put in a PR here instead, just let me know where it should live. Then it could be maintained here, perhaps with a new renv::validate_lockfile() function, and we could add a hook at https://github.com/lorenzwalthert/precommit that utilizes the functionality native to {renv}

Or I could throw the schema over at https://github.com/SchemaStore/schemastore instead. In any case, I'm happy to do the work and put in a PR with the schema and anything else. Let me know.

@kevinushey
Copy link
Collaborator

Thanks! I think this makes sense. We'd probably want to include this in the renv package in inst/schema/<version>/schema.json, or something similar? And then later promote to a public location likehttps://github.com/SchemaStore/schemastore as you linked above?

My main questions would be around how we should handle future potential changes to the lockfile format. I imagine we'll have to do something like:

  1. Introduce a version component into renv lockfiles, so that different versions of the lockfile can be validated against separate schemas. (Is that normally done via a special comment in the file, or via a field in the document / lockfile itself?)

  2. Have some mechanism for tolerating "optional" fields, since I can imagine some users might want to add their own adornments to an renv lockfile, which I think should still be considered as valid as long as all "required" fields are present. That is, some users might want to use their own extensions to the default renv.lock schema, and I think we should be able to validate those as well.

Does that make sense?

@jrdnbradford
Copy link
Contributor Author

To my knowledge, generally in situations like this practitioners avoid creating a version of a schema for each new version of what they’re validating and instead just expand the schema. JSON schema has the flexibility to set fields such as deprecated to allow this sort of expansion as the lockfile format changes. There are also ways to conditionally validate, validate against multiple patterns, etc, so we can use multiple ways to validate changing fields if necessary. This isn't to say we should do this for renv, but given it's general stability it may be the best way to go.

JSON schema also has a required field to define what is absolutely necessary, and an additionalProperties field to define whether other fields not specified in the schema are allowed. Both are available at each block level. This would allow toleration of user adornments.

Here's my current thoughts: we define a schema that validates all the fields renv knows about. We allow additionalProperties and set the required fields to only the absolute minimum. This might not sound stringent enough, but you still get 99% of the validation with far fewer false negatives and much less maintenance.

For those who want more stringent validation, the function we write to validate can have a “bring your own schema” parameter, so to speak. Users can validate against renv’s schema, or else use it as a starting point for their own lockfile schema:

lockfile_validate <- function(
    lockfile = NULL, # Use default project lockfile if not provided
    lockfile_schema  = NULL, # Use default schema in inst if not provided
    # Other parameters
)

This would make sense for those who have very specific production requirements for their validation.

Let me know what you think. I'll put in a PR with what I'm thinking this week so it can be discussed and iterated over. It will include a schema, an exported validation function, and tests.

@kevinushey
Copy link
Collaborator

Thanks! This all sounds reasonable to me; I'd be happy to review a PR.

@jrdnbradford
Copy link
Contributor Author

Merged in #1889.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants