Implement error correction #205
Conversation
There are two parameterizations of the bech32 checksum (see the "roots" unit test in src/primitives/polynomial.rs for what they are). In rust-bitcoin#203 we mixed them up, using the generator from one but the exponents from the other. We made the same mistake with codex32 apparently. When we implement error correction this will cause failures. Fix it.
Adds a CHARACTERISTIC constant to the Field trait, so this is yet another breaking change (though in practice I don't think anybody is implementing Field on their own types).
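For readers unfamiliar with the trait, here is a minimal sketch of what such a constant looks like. This is not the crate's actual `Field` trait; the `ZERO`/`ONE` items and the GF(2) implementor are illustrative assumptions.

```rust
// Minimal sketch, NOT the crate's actual trait definition: just showing
// the shape of an associated CHARACTERISTIC constant on a field trait.
pub trait Field: Sized {
    /// The field's characteristic: the smallest n > 0 such that adding
    /// the multiplicative identity to itself n times gives zero. The
    /// fields used by bech32-style checksums are binary, so it is 2.
    const CHARACTERISTIC: usize;
    /// Additive identity.
    const ZERO: Self;
    /// Multiplicative identity.
    const ONE: Self;
}

/// GF(2), the simplest binary field, as an illustrative implementor.
#[derive(Copy, Clone, PartialEq, Eq)]
pub struct Gf2(pub bool);

impl Field for Gf2 {
    const CHARACTERISTIC: usize = 2;
    const ZERO: Gf2 = Gf2(false);
    const ONE: Gf2 = Gf2(true);
}
```

One place the characteristic matters is the formal derivative used later in this PR: the derivative coefficient i·cᵢ vanishes whenever the characteristic divides i.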
Force-pushed from c2d0ac8 to 4a10a86
…near shift registers

This provides a general-purpose implementation of the Berlekamp-Massey algorithm for finding a linear shift register that generates a given sequence prefix. If compiled without an allocator it will run less efficiently (and be limited to a maximum size), but it will work. Also introduces a fuzz test to check that it works properly and does not crash.
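To make the algorithm concrete, here is a minimal sketch of Berlekamp-Massey over GF(2). The crate's version is generic over fields and has a no-allocator mode, which this sketch does not attempt; the function name is mine.

```rust
/// Sketch of Berlekamp-Massey over GF(2): returns the connection
/// polynomial c (lowest degree first, c[0] == 1) of the shortest LFSR
/// satisfying s[i] = c[1]&s[i-1] ^ ... ^ c[L]&s[i-L] for all valid i.
fn berlekamp_massey_gf2(s: &[u8]) -> Vec<u8> {
    let n = s.len();
    let mut c = vec![0u8; n + 1]; // current connection polynomial C(x)
    let mut b = vec![0u8; n + 1]; // copy of C(x) from last length change
    c[0] = 1;
    b[0] = 1;
    let mut l = 0; // current LFSR length
    let mut m = 1; // steps since the last length change
    for i in 0..n {
        // Discrepancy: does the current LFSR predict s[i]?
        let mut d = s[i] & 1;
        for j in 1..=l {
            d ^= c[j] & s[i - j];
        }
        if d == 1 {
            // Cancel the discrepancy using the shifted previous polynomial.
            let prev = c.clone();
            for j in 0..=(n - m) {
                c[j + m] ^= b[j];
            }
            if 2 * l <= i {
                // The LFSR must grow; remember the old polynomial.
                l = i + 1 - l;
                b = prev;
                m = 0;
            }
        }
        m += 1;
    }
    c.truncate(l + 1);
    c
}
```

Over GF(2) the correction step is a plain XOR; over a larger field the shifted polynomial must additionally be scaled by the ratio of the current and previous discrepancies.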
This commit pulls everything together. The actual error correction code isn't too big: we interpret a residue as a polynomial, evaluate it at various powers of alpha to get a syndrome polynomial, call Berlekamp-Massey on this to get a "connection polynomial", then use Forney's algorithm to get the actual error values. Each step in the above is encapsulated separately -- the "big" stuff, in particular Berlekamp-Massey and obtaining the relevant constants from the checksum definition, was in previous commits. This PR does need to add some more functionality to Polynomial: specifically, the ability to evaluate polynomials, take their formal derivatives, and multiply them modulo x^d for a given d. These operations are the bulk of this PR; a sketch of them follows below. The next commit will introduce a fuzz test which hammers on the correction logic to ensure that it's not crashing.
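Here is a sketch of those three operations, written over a toy prime field GF(31) with lowest-degree-first coefficients rather than the crate's GF(32); all names are illustrative, not the crate's Polynomial API.

```rust
// Toy prime modulus for illustration; the crate actually works over
// binary fields, but the shape of the operations is the same.
const P: u64 = 31;

/// Evaluate the polynomial (lowest-degree coefficient first) at x,
/// using Horner's method.
fn eval(poly: &[u64], x: u64) -> u64 {
    poly.iter().rev().fold(0, |acc, &c| (acc * x + c) % P)
}

/// Formal derivative: d/dx of sum(c_i x^i) is sum(i * c_i * x^(i-1)).
/// Note the multiplication by i: in a field of characteristic 2 this
/// zeroes every even-degree term, which is why the Field trait needs
/// to know the characteristic.
fn formal_derivative(poly: &[u64]) -> Vec<u64> {
    poly.iter()
        .enumerate()
        .skip(1)
        .map(|(i, &c)| (i as u64 % P) * c % P)
        .collect()
}

/// Multiply two polynomials, discarding every term of degree >= d
/// (i.e. multiplication modulo x^d).
fn mul_mod_xd(a: &[u64], b: &[u64], d: usize) -> Vec<u64> {
    if a.is_empty() || b.is_empty() || d == 0 {
        return Vec::new();
    }
    let mut out = vec![0u64; d.min(a.len() + b.len() - 1)];
    for (i, &ai) in a.iter().enumerate() {
        for (j, &bj) in b.iter().enumerate() {
            if i + j < d {
                out[i + j] = (out[i + j] + ai * bj) % P;
            }
        }
    }
    out
}
```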
The codex32 test will more thoroughly exercise the algebra, since there we can correct up to 4 errors. The bech32 test, on the other hand, should work without an allocator (though to exercise this you need to manually edit fuzz/Cargo.toml to disable the alloc feature -- this is rust-lang/cargo#2980, which has been open for 10 years and counting…).
Force-pushed from 4a10a86 to 76d0dae
cc @BenWestgate in case you want to look at this API (this is the same as the branch I posted on your discussion topic, but it's cleaned up so CI passes).
What's the priority on this, bro, and what sort of review do you need to be comfortable merging? (I assume the next PR will add a bunch of unit tests that prove correctness of the algo here.)
The fuzz tests exhaustively prove correctness. I can extract some fuzz vectors into a unit test if you think there's value in that.
The snippet from the docs doesn't compile, because it references items that I can't find in the docs.

```rust
#![cfg(feature = "alloc")]
use bech32::Checksum;

/// The codex32 checksum algorithm, defined in BIP-93.
#[derive(Copy, Clone, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub enum Codex32 {}

impl Checksum for Codex32 {
    type MidstateRepr = u128;
    const CHECKSUM_LENGTH: usize = 13;
    const CODE_LENGTH: usize = 93;
    // Copied from BIP-93
    const GENERATOR_SH: [u128; 5] = [
        0x19dc500ce73fde210,
        0x1bfae00def77fe529,
        0x1fbd920fffe7bee52,
        0x1739640bdeee3fdad,
        0x07729a039cfc75f5a,
    ];
    const TARGET_RESIDUE: u128 = 0x10ce0795c2fd1e62a;
}
```
Don't return "corrections" that don't validate the checksum?

All non-bech32 characters after the HRP should be treated as erasures.

Filling erasures takes precedence over detecting or correcting errors. Document that as more erasures are marked, fewer errors can be corrected and detected, and that at the maximum number of corrected erasures there will be no error detection.

By far the easiest would be: I ask it to decode a bech32 string or a list of ints [0-31], with '?' or -1 marking erasures, and it returns me a tuple with a boolean of checksum validity, a correction if one exists, and a list of error locations; or (False, None, []) when no correction is possible. It should also throw an error explaining when too many erasures have been marked for the HD of the checksum at this code length.
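Rendered as a hypothetical Rust signature, that proposal might look roughly like this. None of these names exist in the crate; this is purely an illustration of the shape.

```rust
/// Hypothetical outcome type for the decode-and-correct call proposed
/// above -- the (validity, correction, locations) tuple as a struct.
pub struct CorrectionOutcome {
    /// Did the string validate as-is?
    pub valid: bool,
    /// The corrected symbol values, if a correction exists.
    pub correction: Option<Vec<u8>>,
    /// Positions at which errors were located.
    pub error_locations: Vec<usize>,
}

/// Returned when more erasures are marked than the checksum's distance
/// allows at this code length.
pub struct TooManyErasures;

/// Decode a string of symbol values in [0, 31], with `None` marking an
/// erasure (the '?' / -1 of the proposal). Sketch only; unimplemented.
pub fn decode_with_erasures(
    _symbols: &[Option<u8>],
) -> Result<CorrectionOutcome, TooManyErasures> {
    unimplemented!()
}
```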
@BenWestgate what commit are you using to get those docs? The missing fields are present in this PR (and I test that all doc comments compile).
I can do this, though there's a simpler check I can do instead. But this doesn't address the API question, which is that we don't know whether the set of corrections is "good" until after they're all yielded. So do we waste memory accumulating them all, waste time generating them twice, or tell the user that they might get an error even after receiving some corrections?
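One way to make that third option concrete (hypothetical types; this just illustrates "errors may arrive after some corrections have been yielded"):

```rust
/// Hypothetical error discovered only after corrections started flowing.
pub struct CorrectionError;

/// Hypothetical iterator over corrections.
pub struct Corrections {
    // decoder state would live here
}

impl Iterator for Corrections {
    // Each item is a (position, corrected value) pair -- or an error
    // that was only detected after earlier items were already yielded.
    type Item = Result<(usize, u8), CorrectionError>;

    fn next(&mut self) -> Option<Self::Item> {
        unimplemented!() // sketch only
    }
}
```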
Yeah, this seems reasonable. I'll add a parsing API that does this.
I think this guidance is just not applicable to Rust, for a few reasons:
Ah, that's my problem: I was reading the docs from the website, i.e. from master.
How much memory are we talking? And how common is this situation for randomly generated strings? I may have an opinion based on the brute force insert/delete-correcting desktop application. It turned out best not to keep candidates in RAM, or solving more than 2 inserts isn't possible; with a generator, 2 inserts plus a few deletes is possible, perhaps 3 now that no substitutions must be checked and inserts are erasures. So wasting less time would yield better corrections. The memory use is one thread per CPU checking checksums, or with this, checking the substitution error quantity and keeping track of the lowest-distance-score valid candidate found. If it returns corrections that don't validate, brute force must…
Some of the others could be treated as erasures as well: HRP, minor mixed case, wrong witver. If it's correctable, that may be useful to return. So perhaps errors have a "trait" boolean called "correctable", and then a method that actually attempts the ECC. Correctable will be false when there are too many erasures or the length is wrong.

```python
if ErrorName.correctable:
    candidate = ErrorName.suggest_correction()
```
Probably "one plus the maximum length of checksum we support without allocator, times 16, plus overhead". With compaction we should be able to reduce the size to 9, and by adding restrictions on the size of allowable strings we can probably reduce that to 4 or 5, but we'd still be using more memory than a correctly-parsed string would, which is not reasonable for an error type. (We'd have to silence lints to do it and we'd get user complaints that it was blowing up the size of all of their error types.) And as for "how common", it would take this much stack space on every single call to the library no matter whether there were errors or whether somebody was even using the maximum-length checksum or not.
Yes, that's essentially what's in this PR.
Any more ideas from @BenWestgate? Should I start reviewing this?
@clarkmoody yes please. I think that @BenWestgate's suggestions are not actionable in this PR -- though I will keep them in mind when I put together the next one.
Tests running locally on 76d0dae
ACK 76d0dae
This implements the core algorithms for error correction. In principle this exposes an API which is sufficient for somebody to implement error correction (of both substitutions and erasures). In practice the API is unlikely to be super usable because:

- it is unclear how failures should be reported (whether as a `Result` or what)
- error locations need to be mapped back to the user's `?`s or something

There is also some missing functionality:
The next PR will be an "error correction API" PR. I would like some guidance from users on what this API should look like.