Redo FSE Table Description section #3810

elasota · 2023-11-04T00:54:44Z

This rewrites the FSE Table Description section in a more descriptive and exhaustive style that I think will be clearer to implementers.

Main changes:

- Removed the examples; the section now specifies exactly how all intermediate values are calculated and used.
- Removed the part about the decoder knowing the total byte consumption. (The decoder doesn't need to track that information.)
- Bit reads are described as occurring only if the bits are needed. I think this makes it clearer that this step doesn't require any lookahead.
- Removed the failure condition for (1 << Accuracy_Log) being exceeded. I believe this condition is impossible because the largest possible decodable probability value is the value that causes the cumulative probability to equal (1 << Accuracy_Log), at which point the loop terminates.
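
The last two points can be checked with a small brute-force sketch. It is not the PR's text: it assumes the variable-width value scheme as I understand it from the format document (values range from 0 to remaining + 1, small values use one bit fewer, probability is the value minus 1, and a probability of -1 still consumes one slot), and it ignores the repeat flags that follow zero probabilities. `make_reader`, `decode_value`, and `never_exceeds_total` are hypothetical names used only for illustration.

```python
def log2sup(n: int) -> int:
    """Smallest T such that (1 << T) > n."""
    return n.bit_length()


def make_reader(pattern: int):
    """Hypothetical bit source: serves the bits of `pattern`, lowest bit first."""
    pos = 0

    def read_bits(count: int) -> int:
        nonlocal pos
        out = (pattern >> pos) & ((1 << count) - 1)
        pos += count
        return out

    return read_bits


def decode_value(read_bits, max_value: int):
    """Return (value, bits_used) for a value in 0..max_value inclusive."""
    bits = log2sup(max_value)
    short_codes = (1 << bits) - 1 - max_value  # values that fit in bits - 1
    low = read_bits(bits - 1)
    if low < short_codes:
        return low, bits - 1  # the extra bit is never read, so no lookahead
    raw = low | (read_bits(1) << (bits - 1))  # read the extra bit only now
    if raw >= (1 << (bits - 1)):
        raw -= short_codes
    return raw, bits


def never_exceeds_total(accuracy_log: int) -> bool:
    """Try every bit pattern in every state where the loop is still running."""
    total = 1 << accuracy_log
    for cumulative in range(total):
        max_value = total - cumulative + 1
        for pattern in range(1 << log2sup(max_value)):
            value, _ = decode_value(make_reader(pattern), max_value)
            probability = value - 1
            cost = 1 if probability == -1 else probability
            if cumulative + cost > total:
                return False
    return True
```

Under those assumptions, `never_exceeds_total(5)` returns `True`: the largest decodable value brings the cumulative probability to exactly (1 << Accuracy_Log), and no bit pattern goes past it.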

Cyan4973 · 2024-03-03T23:10:48Z

Looking into this now.

While adding a more detailed explanation of the decoding process is welcome and a great idea,
I'm not sure I understand how removing the examples helps.

Also: this PR is now in conflict with dev, likely as a consequence of accepting another spec change on the same or neighboring paragraph.

elasota · 2024-03-15T19:19:25Z

I will add the examples back and fix the merge conflict this weekend.

elasota · 2024-03-22T23:20:58Z

I've updated this to add back the examples, but rephrased them to use the nomenclature in the new description.

I've also updated the upper bound description for offset codes. Rejecting out-of-bounds values is well-defined for Huffman weights, literal lengths, and match lengths, since they have fixed upper bounds, but offset code upper bounds are decoder-defined. So I've added wording stating that decoders are allowed to reject any stream encoding a non-zero probability for a value larger than their limit, and indicating that encoders should not include non-zero probabilities for an offset code larger than the largest offset code present in the stream.
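
A minimal sketch of the decoder-side rule described above, assuming the decoded distribution is available as a list of probabilities indexed by offset code. `check_offset_code_limit` and the limit of 22 used in the usage note are hypothetical and chosen only for illustration; they are not taken from the PR.

```python
def check_offset_code_limit(probabilities, decoder_limit):
    """Reject a distribution that assigns a non-zero probability to any
    offset code above this decoder's own limit.

    `probabilities[code]` is the decoded probability for `code`;
    -1 (the "less than 1" marker) also counts as non-zero here.
    """
    for code, probability in enumerate(probabilities):
        if code > decoder_limit and probability != 0:
            raise ValueError(
                f"offset code {code} exceeds decoder limit {decoder_limit}"
            )
    return True


def largest_used_code(probabilities):
    """Encoder-side view: the largest code with a non-zero probability."""
    used = [code for code, p in enumerate(probabilities) if p != 0]
    return max(used) if used else None
```

For example, a decoder with a limit of 22 accepts a distribution whose entries above index 22 are all zero, and raises as soon as one of them is non-zero.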

Cyan4973 · 2024-04-01T21:09:01Z

I've read the newly proposed version, and while I have no error to report, I unfortunately also can't state that the new version is an obvious improvement (i.e. clearer) compared to the older one. They both look difficult to read to me.

It's unfortunate that the FSE table allocation process is so complex to describe.
It's certainly not an easy task, and there might be no great solution.

I understand that the new version also contains some clarifications which are mildly different from the original version. There might be some merit here, but I can't really separate this part from the rest in the PR.

If you believe some clarifications are worthwhile on their own, it might be easier to review them one by one.
You might be right for several of them, but it's also important to properly distinguish the format specification from the implementation details, as these details may change and must be allowed to change in the future. It's a subtle distinction, which deserves some dedicated attention.

elasota · 2024-04-01T23:06:37Z

> You might be right for several of them, but it's also important to properly distinguish the format specification from the implementation details, as these details may change and must be allowed to change in the future. It's a subtle distinction, which deserves some dedicated attention.

I'm not sure what this part is in reference to, but I have been trying to avoid bringing implementation details into this.

Normally the expectation is that if a spec describes an algorithm, an implementation is free to implement any equivalent algorithm. So tracking cumulative probability vs. remaining probability, for instance, isn't really important. A lot of the intermediates could be described in different ways, and it doesn't matter as long as they produce the same results.
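
As a small illustration of that equivalence, the sketch below computes the largest readable value at each step in both ways. It assumes, as I understand the format, that the largest readable value is the remaining probability plus 1 and that a probability of -1 consumes one slot; the function names are hypothetical.

```python
def max_values_cumulative(probabilities, accuracy_log):
    """Track how much probability has been distributed so far."""
    total = 1 << accuracy_log
    cumulative = 0
    out = []
    for p in probabilities:
        out.append(total - cumulative + 1)
        cumulative += 1 if p == -1 else p
    return out


def max_values_remaining(probabilities, accuracy_log):
    """Track how much probability is left to distribute."""
    remaining = 1 << accuracy_log
    out = []
    for p in probabilities:
        out.append(remaining + 1)
        remaining -= 1 if p == -1 else p
    return out
```

Both functions return the same list for any input, for example `[33, 23, 22, 22]` for probabilities `[10, -1, 0, 5]` with an accuracy log of 5, so either bookkeeping describes the same format.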

What I am mainly trying to improve is, for example, this type of description:

> 255-157 = 98 values are remaining in an 8-bits field.

That requires some deduction to tell that the 255 is supposed to be (1<<8)-1, and the 8 is supposed to come from the log2sup call above (even though 8 is also the Accuracy_Log in the example). I think it is better to describe exactly what each value is supposed to represent and how to compute it, and state all of the necessary calculations instead of implying them.
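
To make those implied calculations concrete, here is a sketch that names each intermediate in the quoted sentence. It assumes `log2sup(N)` is the smallest T such that (1 << T) > N; `field_layout` and its keys are hypothetical names for illustration only.

```python
def log2sup(n: int) -> int:
    """Smallest T such that (1 << T) > n."""
    return n.bit_length()


def field_layout(max_value: int) -> dict:
    """Name the intermediates behind "255-157 = 98"."""
    bits = log2sup(max_value)            # the "8" in "8-bits field"
    field_max = (1 << bits) - 1          # the "255"
    short_codes = field_max - max_value  # the "98" values needing one bit fewer
    return {"bits": bits, "field_max": field_max, "short_codes": short_codes}
```

With `max_value = 157`, this gives `bits = 8`, `field_max = 255`, and `short_codes = 98`, matching the numbers in the quoted sentence without relying on the coincidence that the Accuracy_Log in the example is also 8.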

Anyway, I will split this up and close out this PR and open some new ones.

Cyan4973 · 2024-10-01T16:37:17Z

Several specification updates have been integrated since this PR was first submitted.
I presume all relevant ideas have already been extracted and merged at this point.
As a consequence, I guess it's time to close this PR.

facebook-github-bot added the CLA Signed label Nov 4, 2023

Cyan4973 self-assigned this Mar 3, 2024

elasota and others added 2 commits March 22, 2024 18:56

Redo FSE Table Description section to replace examples with a description of the algorithm

512bc37

Clarify value-limiting behavior for offset codes (since they have no upper bound)

01c97ab

elasota force-pushed the fmt-clarifications branch from da4a607 to 01c97ab Compare March 22, 2024 23:18

Cyan4973 closed this Dec 9, 2024
