
Validation of Items on local construction #98

Open
LPardue opened this issue Jun 10, 2022 · 7 comments · May be fixed by #102

Comments

@LPardue
Contributor

LPardue commented Jun 10, 2022

sfv implements parsing validation per the rules in RFC 8941. For example, Tokens are checked for valid character usage in https://github.com/undef1nd/sfv/blob/master/src/parser.rs#L290

However, unless I've missed it, there doesn't appear to be equivalent validation in the library when constructing such Items. It seems like it would be helpful to support this without having to resort to construction, serialization, and reparsing.

For instance, the backing type of Token is String, and the docs suggest that I could create an Item of type Token by using

let token = BareItem::Token("12345".to_owned());

This would strictly be an invalid token according to https://www.rfc-editor.org/rfc/rfc8941.html#section-3.3.4-7: the first character is required to be either ALPHA or "*".

Of course, we could require that implementations sanitize their own values before passing them into sfv. But sfv already implements the checks; they just aren't exposed in an easy way. It would be great to expose them for reuse.

The use case here is applications that pass through values from other sources. If the receiving endpoint fails to parse a value, it will cause interop issues. It would be much better if the sender had an opportunity to validate things before sending, so that it might take mitigating action. There are probably still cases where a sender doesn't care, so perhaps checked and unchecked variants are required.
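The checked/unchecked split mentioned above could be sketched as a pair of constructors on a newtype. Names here (`Token`, `try_new`, `new_unchecked`) are hypothetical, not the crate's current API, and only the leading-character rule is shown as a simplified stand-in for the full RFC 8941 check:

```rust
// Hypothetical newtype with one checked and one unchecked constructor.
#[derive(Debug, PartialEq)]
pub struct Token(String);

impl Token {
    /// Checked: reject invalid values up front (only the RFC 8941
    /// leading-character rule is shown here, for brevity).
    pub fn try_new(s: &str) -> Result<Token, &'static str> {
        match s.chars().next() {
            Some(c) if c.is_ascii_alphabetic() || c == '*' => Ok(Token(s.to_owned())),
            Some(_) => Err("first character is not ALPHA or '*'"),
            None => Err("empty input string"),
        }
    }

    /// Unchecked: the caller vouches for validity; serialization may
    /// still fail later if the value is in fact invalid.
    pub fn new_unchecked(s: &str) -> Token {
        Token(s.to_owned())
    }
}
```

A sender that cares about interop would call `try_new` and handle the error before the value ever reaches serialization; a caller that doesn't care keeps the cheap path.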

@LPardue
Contributor Author

LPardue commented Jun 10, 2022

Oh, I just saw that the serialize functions would catch such an error. But that might be too late in the problem cycle to recover from. Imagine serializing a list of 100 tokens, where one of those tokens is invalid...

@undef1nd
Owner

undef1nd commented Jul 3, 2022

Hello @LPardue
Sorry for the long delay in replying. It's a good idea. I will definitely look into it some time soon.

@LPardue
Contributor Author

LPardue commented Jul 3, 2022

No problem! I discussed this with a few people; we thought one approach could be to avoid using primitive types directly and instead make it more like

struct Token(String)

which implements FromStr and Deref<Target = str>; the validation could then be done in those impls at construction time.
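A minimal sketch of that proposal, assuming the character rules from RFC 8941 Section 3.3.4 (the `Token`/`TokenError` names and the `is_token_char` helper are illustrative, not the crate's API):

```rust
use std::ops::Deref;
use std::str::FromStr;

// Newtype wrapper: validation happens once, at construction.
#[derive(Debug, Clone, PartialEq)]
pub struct Token(String);

#[derive(Debug, PartialEq)]
pub enum TokenError {
    Empty,
    InvalidLeadingCharacter,
    InvalidCharacter,
}

// tchar from RFC 7230, plus ":" and "/" as RFC 8941 allows in tokens.
fn is_token_char(c: char) -> bool {
    c.is_ascii_alphanumeric() || "!#$%&'*+-.^_`|~:/".contains(c)
}

impl FromStr for Token {
    type Err = TokenError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let mut chars = s.chars();
        match chars.next() {
            None => return Err(TokenError::Empty),
            // First character must be ALPHA or "*".
            Some(c) if c.is_ascii_alphabetic() || c == '*' => {}
            Some(_) => return Err(TokenError::InvalidLeadingCharacter),
        }
        if chars.all(is_token_char) {
            Ok(Token(s.to_owned()))
        } else {
            Err(TokenError::InvalidCharacter)
        }
    }
}

// Deref lets a validated Token be used wherever a &str is expected.
impl Deref for Token {
    type Target = str;
    fn deref(&self) -> &Self::Target {
        &self.0
    }
}
```

With this shape, `"12345".parse::<Token>()` fails immediately at construction instead of at serialization time.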

@undef1nd
Owner

undef1nd commented Jul 3, 2022

Thanks for the idea.
I'm more than happy to accept contributions 🙂 Otherwise will do my best to work on it myself once I get a bit more spare time.

@zcei

zcei commented Jan 3, 2023

Hej folks, I stumbled across this crate a while ago and want to use it to programmatically construct structured field values to be returned as a header from an HTTP API. I found the &str error type, and failure only at serialization time, to be potential problems.

So I tried giving this a stab, as I liked the idea and had never fiddled with the Deref trait before.

It turns out to be a bigger change than initially anticipated, so I want to share my approach before I sink too much time into a potentially unwanted change:

  1. The validation logic is currently living in two places, e.g. sticking with the Token example, we have the parser and the serializer which share a set of rules that need to be enforced.
    • Proposal: have a token validation function that acts on Peekable<Chars>. Parser already uses this, serializer can transform its value into it.
  2. Introduce a proper error type. Knowing what went wrong and maybe some context can be helpful when handling the error from constructing a value. Also the &'static str makes error construction hard (if not impossible) when using the shared validation functions proposed above.
    • Proposal: use something like thiserror and adjust the codebase to expect proper error types. Some compatibility can be kept by having a public error type that renders to the same error strings currently in use, though it would technically be a breaking change and should be treated accordingly.
  3. This one is a bit wonky and not decided yet; I would love some input. How much does the library want to shield users from non-spec-compliant use, and at which stage should it fail? I see two options here:
    1. We can have static methods on the primitives that allow early validation, like BareItem::try_token("foo") which returns a Result of either the token or the validation error. The actual token would be "unverified" in the sense that one could still create BareItem::Token("12345").
    2. Or we could go with the idea of struct Token(String) and have the TryFrom trait as its only means of instantiation. If one is able to Deref that into a str, the ideas may be able to be combined. That's a set of traits I haven't used much yet, so please correct me if I'm wrong.
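For point 1, the shared routine acting on Peekable<Chars> might look like the following. This is a rough sketch, not the crate's actual parser (the `consume_token` name and the string errors are placeholders); the parser would call it on its input stream directly, while the serializer could call it on `value.chars().peekable()` and then check that the whole string was consumed:

```rust
use std::iter::Peekable;
use std::str::Chars;

// One validation routine shared by parser and serializer (sketch).
// Consumes characters greedily until one that cannot appear in a token.
fn consume_token(input: &mut Peekable<Chars>) -> Result<String, &'static str> {
    match input.peek() {
        Some(&c) if c.is_ascii_alphabetic() || c == '*' => {}
        Some(_) => return Err("first character is not ALPHA or '*'"),
        None => return Err("empty input string"),
    }
    let mut token = String::new();
    while let Some(&c) = input.peek() {
        if c.is_ascii_alphanumeric() || "!#$%&'*+-.^_`|~:/".contains(c) {
            token.push(c);
            input.next();
        } else {
            // Parser stops here and hands off to the next grammar rule;
            // a serializer would treat leftover input as a validation error.
            break;
        }
    }
    Ok(token)
}
```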

Examples for point two (just rough drafts from playing around). I do understand that this is technically a different issue, so feel free to move this to its own issue if you see fit.

// Those are all known failures for validating a BareItem::Token
#[derive(Error, Debug, PartialEq)]
pub enum TokenError {
    #[error("first character is not ALPHA or '*'")]
    InvalidLeadingCharacter,
    #[error("disallowed character")]
    InvalidCharacter,
    #[error("empty input string")]
    EmptyValue,
    #[error("end of the string")]
    UnexpectedEOF,
}

// `ParserError` contains all the different things that can go wrong during parsing,
// plus an `Other` variant holding the current &'static str, to allow for an easier transition
#[derive(Error, Debug, PartialEq)]
pub enum ParserError {
    #[error("parse_token: {0}")]
    TokenError(#[from] TokenError),
    #[error("{0}")]
    Other(&'static str),
}

// Same, but for serializing; note how the prefix changes for the TokenError, while the actual error type stays the same
#[derive(Error, Debug, PartialEq)]
pub enum SerializerError {
    #[error("serialise_token: {0}")]
    TokenError(#[from] TokenError),
    #[error("{0}")]
    Other(&'static str),
}

// Needs better name, but is basically the internal representation and could change as needed
#[derive(Error, Debug)]
pub enum ErrorRepr {
    #[error(transparent)]
    ParserError(#[from] ParserError),
    #[error(transparent)]
    SerializerError(#[from] SerializerError),
}

// If we want to we can wrap all of that "transparently" into a public error.
// This would be the type the user works with and would currently only contain
// same string as the library already has. Could be enhanced afterwards.
#[derive(Error, Debug)]
#[error(transparent)]
pub struct PublicError(#[from] ErrorRepr);

This means that, within the actual codebase, all errors could first be moved to the wrapped Other variant:

- Err("...")
+ Err(ParserError::Other("..."))

which would also change the expected error types in the test suite for all non-migrated errors:

assert_eq!(
-    Err("serialize_integer: integer is out of range"),

+    Err(SerializerError::Other(
+        "serialize_integer: integer is out of range"
+    )),
    Serializer::serialize_integer(1_000_000_000_000_000, &mut buf)
);

From there the actual errors could be migrated.
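For reference, the `#[from]` and `#[error(...)]` attributes in the drafts above expand to ordinary `From` and `Display` impls. A minimal std-only equivalent of the `ParserError` layer (with a single `TokenError` variant kept for brevity) would be:

```rust
use std::fmt;

#[derive(Debug, PartialEq)]
pub enum TokenError {
    InvalidLeadingCharacter,
}

impl fmt::Display for TokenError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            TokenError::InvalidLeadingCharacter => {
                write!(f, "first character is not ALPHA or '*'")
            }
        }
    }
}

#[derive(Debug, PartialEq)]
pub enum ParserError {
    TokenError(TokenError),
    Other(&'static str),
}

// What `#[from]` derives: a From impl, which is what lets `?` convert
// a TokenError into a ParserError across layers.
impl From<TokenError> for ParserError {
    fn from(e: TokenError) -> Self {
        ParserError::TokenError(e)
    }
}

// What `#[error("parse_token: {0}")]` derives: Display with the prefix.
impl fmt::Display for ParserError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ParserError::TokenError(e) => write!(f, "parse_token: {}", e),
            ParserError::Other(s) => write!(f, "{}", s),
        }
    }
}
```

The rendered strings match the prefixed messages the library uses today, which is what makes the compatibility story workable.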

@valenting
Collaborator

@zcei I think that's the correct approach.
I'm not sure if we really need to change the error handling though - but we can.

Or we could go with the idea of struct Token(String) and have the TryFrom trait as its only means of instantiation. If one is able to Deref that into a str, the ideas may be able to be combined. That's a set of traits I haven't used much yet, so please correct me if I'm wrong.

I think this is the easiest way forward. Not sure if we really need to Deref the Token into a string, but it's a nice-to-have.

In any case, this would be an API breaking change.
I also plan to add support for the Date type in https://datatracker.ietf.org/doc/draft-ietf-httpbis-sfbis/ which would also be a breaking change, so this is as good a time as any 🙂

@zcei

zcei commented Jan 16, 2023

I'm not sure if we really need to change the error handling though - but we can.

What we could do is start with the parse & serialize logic still duplicated; that way the &'static str shouldn't cause too many problems. Then we can move this discussion to its own issue and decide whether it should be introduced in the planned breaking change or left for a later one.

I think this is the easiest way forward.

Cool, I'll give it a shot (most likely towards the weekend) and see where I land with this approach.

Not sure if we really need to Deref the Token into string, but it's a nice to have.

From how I understood it, Deref would mostly be useful for further use in code bases without explicit transforms, basically going from the correctly parsed token to its string representation. The Display trait should be sufficient for the moment (formatting into header values etc.), and then we can check whether the other is really needed.

@zcei zcei linked a pull request Jan 21, 2023 that will close this issue