The core of low-degree testing in FRI is to show that a polynomial
Split-and-fold. Given a polynomial
Now, we want to combine the two polynomials,
The new polynomial has degree that is
How is FRI being used. Given a polynomial
is a low degree polynomial. In fact, its degree should be the degree of
There are several ways to do split-and-fold, and these methods have different costs when it comes to the number of witness stack it uses and the number of weight units it contributes.
-
Standard split-and-fold: The split-and-fold is exactly as the one shown above. It requires 2 hint elements for deriving
$\alpha$ , 1 hint element for the sibling (per query point), and log(n/2) hint elements (depending on the Merkle tree arity) for Merkle tree (per query point). At the same time, it needs to perform one qm31 multiplication (between$\alpha$ and $h(X)$) on the queried point.$$f(X) = g(X^2) + X \cdot h(X^2)$$ -
Double split-and-fold: It now performs two split-and-fold using the same
$\alpha$ . This is by splitting the polynomial$f(X)$ into four, say$g_1(X), g_2(X), g_3(X), g_4(X)$ :$$f(X) = g_1(X^4) + X\cdot g_2(X^4) + X^2 \cdot g_3(X^4) + X^3 \cdot g_4(X^4)$$ One can see that the four polynomials here,
$g_1(X), g_2(X), g_3(X), g_4(X)$ , are of degree one quarter. This requires 2 hint elements for deriving$\alpha$ , 3 hint elements for the siblings (per query point), and$log(n/4)$ hint elements for Merkle tree (per query point). At the same time, it needs to perform 3 qm31 multiplications per query point and one qm31 multiplication to compute$\alpha^2$ from$\alpha$ . Although such a two-time split-and-fold is much more expensive, one needs to know that the total number of split-and-fold is also lower, and sometimes this can end up being more beneficial. -
Triple split-and-fold: Similarly, one can perform three split-and-fold using the same
$\alpha$ . Here, the polynomial$f(X)$ is being split into 8 polynomials,$g_1(X), g_2(X), g_3(X), g_4(X), g_5(X), g_6(X), g_7(X), g_8(X)$ :$$f(X) = \sum_{i=1}^8 X^{i-1}\cdot g_{i}(X^8)$$ Here, it requires 2 hint elements for deriving
$\alpha$ , 7 hint elements for the siblings (per query point), and$log(n/8)$ hint elements for Merkle tree (per query point). At the same time, it needs to perform 7 qm31 multiplications per query point in addition to 2 qm31 multiplications to compute$\alpha^2$ and$\alpha^4$ from$\alpha$ . This can reduce the total number of split-and-fold further, but at the cost of more computation. -
Quadruple split-and-fold: There is some situation where the DP algorithm will suggest a more aggressive version, which splits the polynomial
$f(X)$ into 16 polynomials$g_1(X), g_2(X), ..., g_{16}(X)$ .$$f(X) = \sum_{i=1}^{16} X^{i-1}\cdot g_{i}(X^{16})$$ This still requires 2 hint elements for deriving
$\alpha$ , 15 hint elements for the siblings (per query point), and$log(n/16)$ hint elements for Merkle tree (per query point). Regarding the number of qm31 multiplications, there are 15 qm31 multiplications per query point in addition to 3 qm31 multiplications to compute$\alpha^2$ ,$\alpha^4$ , and$\alpha^8$ . This is a more extreme direction of the input stack size vs the qm31 multiplication tradeoff.
There is a lot of flexibility in Merkle tree. Here we present a few talking points.
Arity. If we assume the standard relay policy that limits the hint element to be no more than 80 bytes, then we can use a binary tree or a 3-ary tree. There is, however, no benefit in terms of the number of hint elements to use a 3-ary tree. The binary tree uses 1 hint element for sibling, and the 3-ary tree uses 2 hint elements for siblings. This is not better than doing the binary tree twice.
However, if we forgo the standard relay policy, the current upper bound for OP_CAT output is 520 bytes, which allows us to squeeze in 16 hashes. This allows us to do a 16-ary tree, and for each level, we use 2 hint elements for siblings. This can roughly reduce the number of hint elements used in Merkle tree by half. This is an example where the transaction being nonstandard (in respect to the relay policy) can bring actual benefits.
Tree-top optimization. Sometimes, it is useful to remove the top parts of the Merkle tree, so that the new root is no
longer a single element, but
The benefit seems to completely disappear when we are using nonstandard transaction, since with a 16-ary tree, any tree-top optimization seems not better than using just one more layer in the 16-ary tree. For the reasons discussed here, this DP algorithm does not consider tree-top optimization.
The code in the repository wants to do the following:
- Given a limit of the hint elements
$N_{h}$ and a limit of qm31 multiplications$N_{m}$ , - Given a linear goal functon, such as
$g(h, m) = h$ (if we solely want to optimize for fewer hint elements), or$g(h, m) = m$ (if we solely want to optimize for fewer qm31 multiplications), - Find the reduction strategy (which is a reduction path from
$n$ to$1$ , with the freedom of choosing one of the split-and-fold methods for each step of the reduction) that stays within the limits$N_{h}$ and$N_{m}$ and minimizes the goal function.
This is done as follows:
- Start with the main problem
$(n, N_{h}, N_{m})$ . - Try all the reduction strategies and reduce the current problem to a subproblem. For example, if applying a reduction
strategy that lowers
$t$ layers at the cost of$a$ hint elements and$b$ qm31 multiplications incurring a cost of$g(a, b)$ , we figure out the cost of the subproblem$(n - t, N_{h} - a, N_{m} - b)$ in a similar way, add this reduction's cost$g(a, b)$ , and pick the one with the lowest cost (i.e., minimizing the goal function).
This can be expressed as follows.
$g_1 = G(n - 1, N_{h} - 2 - (1 + log(n/2))\cdot q, N_{m} - q) + g(2 + (1 + log(n/2))\cdot q, q)$ $g_2 = G(n - 2, N_{h} - 2 - (3 + log(n/4))\cdot q, N_{m} - 3\cdot q - 1) + g(2 + (3 + log(n/4))\cdot q, 3\cdot q + 1)$ $g_3 = G(n - 3, N_{h} - 2 - (7 + log(n/8))\cdot q, N_{m} - 7\cdot q - 2) + g(2 + (7 + log(n/8))\cdot q, 7\cdot q + 2)$ $g_4 = G(n - 4, N_{h} - 2 - (15 + log(n/16))\cdot q, N_{m} - 15\cdot q - 3) + g(2 + (15 + log(n/16))\cdot q, 15\cdot q + 3)$
with the following to make sure that the limits are observed and the algorithm terminates, respectively.
-
$G(\cdot, x, y) = +\infty$ if$x < 0$ or$y < 0$ -
$G(0, x, y) = 0$ for$x \geq 0$ and$y \geq 0$
How to compute. We use the standard dynamic programming (DP) method to figure out the minimal cost and the corresponding
strategy. This is in essence a depth-first search (DFS) with memoization (aka the top-down approach). Since every step
it is guaranteed to at least go down one level, the depth is at most
Assuming nonstandard transactions, limiting the number of hint elements to 800, to split-and-fold a polynomial of size 2^26, it needs 166 qm31 multiplications. The route is (1x2, 2x4, 1, 2, 1x2, 2, 1x8):
Layer 26: cost = 166, num_hints = 800, num_qm31_mul = 166
Route: 26 => 25 => 24 => 22 => 20 => 18 => 16 => 15 => 13 => 12 => 11 => 9 => 8 => 7 => 6 => 5 => 4 => 3 => 2 => 1
If we restrict the number of hint elements further to 700, the algorithm can present a new route (2x3, 1, 2x5, 1x2, 2, 1x4), as follows:
Layer 26: cost = 184, num_hints = 699, num_qm31_mul = 184
Route: 26 => 24 => 22 => 20 => 19 => 17 => 15 => 13 => 11 => 9 => 8 => 7 => 5 => 4 => 3 => 2 => 1
This can even go to 600 with a route (3x2, 2x2, 3, 2x4, 1, 2):
Layer 26: cost = 244, num_hints = 599, num_qm31_mul = 244
Route: 26 => 23 => 20 => 18 => 16 => 13 => 11 => 9 => 7 => 5 => 4 => 2
But, of course, there is a lower bound. We find that there is no strategy is use fewer than 575 hint elements, even if it uses a lot of qm31 multiplications.
This repository is intended to be public good. It is under the MIT license.
The flexibility and usage of different split-and-fold strategies come from a discussion with Shahar Papini from Starkware.