Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: base64 decoding adds extra null bytes #40

Open
Tschrock opened this issue Oct 10, 2024 · 1 comment · May be fixed by #41
Open

Bug: base64 decoding adds extra null bytes #40

Tschrock opened this issue Oct 10, 2024 · 1 comment · May be fixed by #41
Labels
bug Something isn't working

Comments

@Tschrock
Copy link

Describe the current behavior

When decoding base64 text, there will sometimes be extra padding bytes added to the end of the decoded data.

Here's an example where I'm encoding and decoding the same value, and you can see it added extra null characters to my string:

image

Describe the expected behavior

base64 encoding should be symetric and not add extra bytes.

Steps to reproduce the bug

  1. Navigate to ciphereditor.com.
  2. Add two "Binary to text encoding" blocks
  3. Link the encoded data fields

Version, Environment, Platform

  • ciphereditor version: Whatever the current release on https://ciphereditor.com/ is (I can't find a version number listed anywhere)
  • Browser and browser version (if applicable):
    • Google Chrome Beta - 130.0.6723.31
    • Firefox Developer Edition - 131.0b5 and 132.0b5
  • Operating system:
    • Windows 10 Home - 22H2 - 19045.4894
@Tschrock Tschrock added the bug Something isn't working label Oct 10, 2024
@anderium
Copy link
Contributor

anderium commented Oct 14, 2024

I've dug around a bit, and found that it's because the transformUnits function incorrectly appends a null byte when used here:

// Transform unit size (decode)
const units = transformUnitSize(encodedUnits, outUnitSize, inUnitSize)
const buffer = Uint8Array.from(units).buffer

The parameters have values […, length=18], 6, 8. If this was encoding, the transformation would need 13.5 units of 8 bits to encode 18 units of 6 bits. Since this is decoding, the truth is that there is actually only 2 bits of information in the final unit, so we can losslessly store it in 13 bytes.

It isn't fixable for arbitrary transformations, because e.g. encoding one or two 3 bit units into 8 bits is indistinguishable if the second 3 bit is all zero. But if you know that ceil(x * 8 / 6) = 18 then you can conclude that 12.75 < x ≤ 13.5, i.e. x = 13. So the fix here would likely be to add a forward/backward switch into the function, which then calculates the maximum number of units that could have lead to that 18 with floor(18 * 6 / 8). Actually, it's impossible to avoid null units if and only if inUnitSize < outUnitSize, so if we've made it impossible to transform into larger units then we can fix it without adding a parameter.

I'll try to make a PR to fix this later.

anderium added a commit to anderium/ciphereditor that referenced this issue Oct 15, 2024
…ixes wierkstudio#40)

Base64 embeds bytes into 6 bit units. It takes two units to properly encode a single byte. If you then decode this, you can conclude that the 4 additional bits contain no information. (Unless the input violates our assumption that it is a base64 encoding.)

You can conclude that if $y = \lceil xs_{input} / s_{output}\rceil$ with x an integer, then $x <= \lfloor ys_{output} / s_{input}\rfloor$. This commit checks every added unit if it is the last, and doesn't add the null byte if it is. It might be more efficient to just use the larger array and slice off the null byte when it exists.

It can be impossible to determine a true zero byte from padding if `inputSize < outputSize`, but for binary-to-text that requires an alphabet of more than 256 characters.

Signed-off-by: anderium <33520919+anderium@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants