Extend syntax to include something similar to regex character classes/sets #12

Open
jackdos opened this issue Nov 27, 2019 · 1 comment

jackdos commented Nov 27, 2019

Whilst trying to write some signatures for JSON-based formats, I realised that the byte sequences I needed to describe had to cope with optional layout characters. The following JSON blocks are all functionally equivalent (layout characters are shown in brackets), but they can't all be described with a single byte sequence because of the optional spaces, tabs and newlines.

{"key1":"value1","key2":"value2"}

{(\r\n)
    (\t)"key1" : "value1",(\r\n)
    (\t)"key2" : "value2"(\r\n)
}(\r\n)

{(\n)
    "key1" : "value1",(\n)
    "key2" : "value2"(\n)
}

Ideally, I would be able to specify a class of bytes and apply a repetition wildcard to it, so that the byte sequence could be expressed as:

BOFOffset 0: an opening {, followed by zero or more spaces, tabs, newlines or carriage returns, followed by "key1"...

i.e.:

7B[09 20 0D 0A]*226B65793122[09 20 0D 0A]*3A[09 20 0D 0A]*2276616C756531222C[09 20 0D 0A]*...
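To make the proposal concrete, here is a minimal Python sketch (not part of PRONOM or DROID) of how the suggested syntax could be interpreted: hex bytes, bracketed byte classes such as `[09 20 0D 0A]`, and a `*` meaning "zero or more of the preceding byte or class" are translated into a Python bytes regular expression. The function name `compile_signature` and the exact rules for `*` are assumptions for illustration only, and the trailing `...` of the pattern above is left out rather than guessed.

```python
import re

# Hypothetical token grammar for the proposed syntax: a hex byte ("7B"),
# a bracketed byte class ("[09 20 0D 0A]"), or "*" repeating the previous item.
TOKEN = re.compile(
    r"\s*(?:\[(?P<cls>[0-9A-Fa-f]{2}(?:\s+[0-9A-Fa-f]{2})*)\]"
    r"|(?P<byte>[0-9A-Fa-f]{2})"
    r"|(?P<star>\*))"
)


def _hex_escape(value: int) -> bytes:
    """Render one byte value as a regex escape such as b'\\x7b'."""
    return b"\\x%02x" % value


def compile_signature(pattern: str) -> "re.Pattern[bytes]":
    """Translate the proposed hex syntax into a bytes regex (illustrative only)."""
    pattern = pattern.strip()
    parts = []
    can_repeat = False  # True only right after a byte or a class
    pos = 0
    while pos < len(pattern):
        m = TOKEN.match(pattern, pos)
        if m is None:
            raise ValueError(f"cannot parse signature at offset {pos}: {pattern[pos:]!r}")
        if m.group("cls") is not None:
            members = [int(h, 16) for h in m.group("cls").split()]
            parts.append(b"[" + b"".join(_hex_escape(v) for v in members) + b"]")
            can_repeat = True
        elif m.group("byte") is not None:
            parts.append(_hex_escape(int(m.group("byte"), 16)))
            can_repeat = True
        else:
            if not can_repeat:
                raise ValueError(f"'*' at offset {m.start('star')} has nothing to repeat")
            parts.append(b"*")
            can_repeat = False
        pos = m.end()
    return re.compile(b"".join(parts))


# The pattern from this issue, up to (but not including) the elided "...".
SIG = ("7B[09 20 0D 0A]*226B65793122[09 20 0D 0A]*3A[09 20 0D 0A]*"
       "2276616C756531222C[09 20 0D 0A]*")
regex = compile_signature(SIG)

samples = [
    b'{"key1":"value1","key2":"value2"}',
    b'{\r\n\t"key1" : "value1",\r\n\t"key2" : "value2"\r\n}\r\n',
    b'{\n    "key1" : "value1",\n    "key2" : "value2"\n}',
]
for data in samples:
    # match() anchors at offset 0, mirroring a BOFOffset 0 check
    print(regex.match(data) is not None)
```

All three layouts from the examples above are accepted by the one pattern. Using `match()` rather than `search()` is the design choice that keeps the check anchored at the beginning of the file.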

marhop commented Jan 6, 2020

I'd really love to see better regex support in PRONOM signature patterns! Here is another use case.


3 participants