Extend syntax to include something similar to regex character classes/sets #12

jackdos · 2019-11-27T11:47:49Z

Whilst trying to write signatures for JSON-based formats, I realised that the bytesequence I need to describe has to cope with optional layout characters. The following JSON blocks are all functionally equivalent (layout characters are shown in brackets), but they can't all be described with a single bytesequence because of the optional spaces, tabs and new lines.

{"key1":"value1","key2":"value2"}

{(\r\n)
    (\t)"key1" : "value1",(\r\n)
    (\t)"key2" : "value2"(\r\n)
}(\r\n)

{(\n)
    "key1" : "value1",(\n)
    "key2" : "value2"(\n)
}

Ideally I would be able to specify a class of bytes, with a repetition wildcard applied to it, so that the bytesequence could be expressed as

BOFOffset 0: An opening {, followed by zero or more spaces, tabs, new-lines or carriage-returns, followed by "key1"....

i.e.:

7B[09 20 0D 0A]*226B65793122[09 20 0D 0A]*3A[09 20 0D 0A]*2276616C756531222C[09 20 0D 0A]*...
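To make the intended semantics concrete, here is a rough sketch in Python that translates the proposed notation into an ordinary bytes regex and tries it against the three JSON blocks above. This is not existing PRONOM or DROID behaviour: `compile_signature` is a hypothetical helper, and the grammar it accepts (hex byte pairs, a bracketed set of space-separated hex bytes, an optional trailing `*`) is only my reading of the proposal. The signature used is just the visible prefix of the example, up to the `2C` comma.

```python
import re

# One token of the *proposed* syntax (an assumption, not current PRONOM syntax):
# either a bracketed byte set with an optional '*', or a single hex byte pair.
TOKEN = re.compile(r"\[([0-9A-Fa-f ]+)\](\*?)|([0-9A-Fa-f]{2})")


def hex_escape(data):
    """Render every byte as \\xNN so it is safe inside or outside a set."""
    return b"".join(b"\\x%02x" % b for b in data)


def compile_signature(signature):
    """Hypothetical helper: turn the proposed notation into a bytes regex."""
    parts = []
    pos = 0
    while pos < len(signature):
        m = TOKEN.match(signature, pos)
        if m is None:
            raise ValueError("unexpected input at offset %d" % pos)
        if m.group(3):
            # Plain literal byte, e.g. 7B
            parts.append(hex_escape(bytes.fromhex(m.group(3))))
        else:
            # Byte set, e.g. [09 20 0D 0A], optionally repeated with '*'
            members = bytes.fromhex(m.group(1))
            parts.append(b"[" + hex_escape(members) + b"]" + m.group(2).encode("ascii"))
        pos = m.end()
    return re.compile(b"".join(parts))


# Visible prefix of the signature from the issue text.
SIGNATURE = "7B[09 20 0D 0A]*226B65793122[09 20 0D 0A]*3A[09 20 0D 0A]*2276616C756531222C"

SAMPLES = {
    "compact": b'{"key1":"value1","key2":"value2"}',
    "crlf_tabs": b'{\r\n\t"key1" : "value1",\r\n\t"key2" : "value2"\r\n}\r\n',
    "lf_spaces": b'{\n    "key1" : "value1",\n    "key2" : "value2"\n}',
}

pattern = compile_signature(SIGNATURE)
for name, data in SAMPLES.items():
    # match() anchors at the start of the data, like BOFOffset 0
    print(name, pattern.match(data) is not None)
```

Mapping onto a regex engine is only one way to show the idea; a real implementation would need to fit into however the signature matcher already handles wildcards and offsets.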


marhop · 2020-01-06T13:50:07Z

I'd really love to see better regex support in PRONOM signature patterns! Here is another use case.

jackdos mentioned this issue Nov 27, 2019

Improve identification of XML based formats #13

Open

Dclipsham added the enhancement label May 10, 2022
