feat(precompiles): optimize native implementation of the bn128 precompiles #5611

AllFi · 2023-12-11T08:32:33Z

What does this PR do?
This PR optimizes BN128Multiplication, and BN128Pairing precompiled contracts by implementing Montgomery form arithmetic. It is worth mentioning that this PR makes BN128Addition a little bit slower because of the overhead of the conversion to and from the Montgomery form.

Why are these changes required?
The current implementation of these precompiles is relatively slow. In general, this hinders on-chain zkSNARKs verification and, consequently, makes zk-based apps almost unfeasible (#4311). This PR is aiming to mitigate this issue.

This PR is far worse than #5507 in terms of optimization but it doesn't require JNI and doesn't introduce a lot of changes in the implementation.

This PR has been tested by:

Unit Tests
Manual Testing

Follow up

Extra details
I've performed some benchmarks before and after these modifications. The results are below:

The average time of operations before (Intel(R) Core(TM) i7-10750H CPU, 32 GB RAM):

BN128Addition cost: 41165 ns
BN128Multiplication cost: 3490156 ns
BN128Pairing cost (2 pairs): 87665281 ns

The average time of operations after:

BN128Addition cost: 48147 ns (+17%)
BN128Multiplication cost: 2393792 ns (-31%)
BN128Pairing cost (2 pairs): 57961905 ns (-33%)

1. implement montgomery arithmetic 2. add tests from go-ethereum

3for · 2023-12-19T02:55:51Z

Hi, @AllFi , I add a test case FpMulMontTest in https://github.com/3for/java-tron/tree/feature/bn128_montgomery_multiplication, and there's an uncatched exception "java.lang.NullPointerException". Maybe it would be better to add some code to handle the 'null Fp value' case.

There exists null Fp value case

AllFi · 2023-12-19T11:20:09Z

Hi, @3for, @Federico2014 ! Thank you for the review!

There's indeed an uncaught exception, but it is only possible because access modifiers in the Fp were modified.

The current implementation doesn't expose any constructors/methods to instantiate Fp (Fp2, etc.) from outside the package. The only way to instantiate Fp objects from outside the package is by creating G1/G2 points. The code in that section checks that the created Fp values are not equal to null.
https://github.com/zkBob/java-tron/blob/410cce47e9375b89811bdde427d0126b301bc4b9/crypto/src/main/java/org/tron/common/crypto/zksnark/BN128Fp.java#L50-L53
https://github.com/zkBob/java-tron/blob/410cce47e9375b89811bdde427d0126b301bc4b9/crypto/src/main/java/org/tron/common/crypto/zksnark/BN128Fp2.java#L55-L58
https://github.com/zkBob/java-tron/blob/410cce47e9375b89811bdde427d0126b301bc4b9/crypto/src/main/java/org/tron/common/crypto/zksnark/BN128G1.java#L44-L46
https://github.com/zkBob/java-tron/blob/410cce47e9375b89811bdde427d0126b301bc4b9/crypto/src/main/java/org/tron/common/crypto/zksnark/BN128G2.java#L61-L63

Precompiled contracts also check that the created points are not equal to null.
https://github.com/zkBob/java-tron/blob/410cce47e9375b89811bdde427d0126b301bc4b9/actuator/src/main/java/org/tron/core/vm/PrecompiledContracts.java#L745-L753
https://github.com/zkBob/java-tron/blob/410cce47e9375b89811bdde427d0126b301bc4b9/actuator/src/main/java/org/tron/core/vm/PrecompiledContracts.java#L799-L801
https://github.com/zkBob/java-tron/blob/410cce47e9375b89811bdde427d0126b301bc4b9/actuator/src/main/java/org/tron/core/vm/PrecompiledContracts.java#L865-L867

In summary, the current implementation assumes that it is the responsibility of the caller to check that the created Fp values are not equal to null. And the code in the BN128Fp.create, BN128Fp2.create, BN128G1.create, and BN128G2.create handles this case. If I'm not missing something, it seems impossible to cause a NullPointerException by calling precompiled contracts using any possible data.

When we perform operations within the package, we use constructors directly, so it also seems impossible to encounter a NullPointerException. We still use static methods for constants, but in this case, we can be certain that the values are correct.

Am I missing something?

3for · 2023-12-20T03:39:02Z

I think it's okay for the caller to handle the 'null Fp value' case. There're another two issues:

It's better to check that v is non-negative: 0<=v<P.

static Fp create(BigInteger v) {
    if (v.compareTo(P) >= 0 || v.signum() == -1) {
      // Only the values less than P, and not be negative are valid
      return null;
    }
    return new Fp(toMontgomery(v));
  }

It's better to check that 0<=s<order, for sP == (s+order)P, and there exists such case in 32 bytes.

@Override
    public Pair<Boolean, byte[]> execute(byte[] data) {

      if (data == null) {
        data = EMPTY_BYTE_ARRAY;
      }

      byte[] x = parseWord(data, 0);
      byte[] y = parseWord(data, 1);

      byte[] s = parseWord(data, 2);

      BN128<Fp> p = BN128Fp.create(x, y);
      if (p == null) {
        return Pair.of(false, EMPTY_BYTE_ARRAY);
      }

      BN128<Fp> res = p.mul(BIUtil.toBI(s)).toEthNotation();

      BigInteger value = new BigInteger(1, s);
      BigInteger order = new BigInteger(
              "21888242871839275222246405745257275088548364400416034343698204186575808495617");
      byte[] s_a = value.add(order).toByteArray();
      BN128<Fp> res_a = p.mul(BIUtil.toBI(s_a)).toEthNotation();
      boolean flag_x= Arrays.equals(res.x().bytes(), res_a.x().bytes()); // is true
      boolean flag_y= Arrays.equals(res.y().bytes(), res_a.y().bytes()); // is true

      return Pair.of(true, encodeRes(res.x().bytes(), res.y().bytes()));
    }
  }

AllFi · 2023-12-20T09:24:31Z

It's better to check that v is non-negative: 0<=v<P.

Yes, that does make sense. Thank you for pointing it out. Fixed in 04a00c2.

It's better to check that 0<=s<order, for sP == (s+order)P, and there exists such case in 32 bytes.

I am not sure about it. I have the following concerns:

This check compromises backward compatibility. All the modifications made in this PR thus far maintain backward compatibility.
EIP196 specifies that multiplication with a scalar greater than the order should be successful.

Multiply point with scalar that is larger than the field order (should succeed).
There are some tests in go-ethereum where a scalar greater than the order is treated as valid input. https://github.com/ethereum/go-ethereum/blob/7124057bad16694d2b1f15dfe68a6109961b34ab/core/vm/testdata/precompiles/bn256ScalarMul.json#L59

That's why I believe it's better to keep it as is, but the decision is up to you.

3for · 2023-12-20T10:50:03Z

It looks good to me now. Keeping s unchecked is okay.

r0wdy1 · 2023-12-29T08:11:33Z

@3for @Federico2014 @lxcmyf @tomatoishealthy is there any additional help or information that we can provide in regards of this PR?
Is there any plan on going forwards with it?

lxcmyf · 2024-02-22T11:11:40Z

@3for @Federico2014 @lxcmyf @tomatoishealthy is there any additional help or information that we can provide in regards of this PR? Is there any plan on going forwards with it?

@r0wdy1
Can this optimization meet your needs if the current configuration is set to a timeout of 80ms? Or can you optimize to complete the transaction within 80ms as much as possible?

r0wdy1 · 2024-02-22T12:25:02Z

@lxcmyf , happy to hear from you again
I wouldn't give any exact estimates until we do a thorough testing on Nile.
As is the case with all of the optimizations the final results also depend on node configuration
Despite that our first predictions were around 120 ms we have achieved a significant progress on our end and we've seen the pairing work for about 80 ms, but that should be proven on the testnet
As you can see all of the github actions are stuck in pending state so we're patiently waiting for this PR to drop on Nile for further testing and optimizations.

lxcmyf · 2024-02-22T12:44:31Z

@lxcmyf , happy to hear from you again I wouldn't give any exact estimates until we do a thorough testing on Nile. As is the case with all of the optimizations the final results also depend on node configuration Despite that our first predictions were around 120 ms we have achieved a significant progress on our end and we've seen the pairing work for about 80 ms, but that should be proven on the testnet As you can see all of the github actions are stuck in pending state so we're patiently waiting for this PR to drop on Nile for further testing and optimizations.

@r0wdy1
Okay, it's great to see your further breakthrough. We will confirm the specific merging branch and time, as well as some preparation work. If there is any progress, we will synchronize it as soon as possible.

r0wdy1 · 2024-02-27T13:41:51Z

@lxcmyf can or should we participate in the call too?

Murphytron · 2024-02-28T03:05:46Z

@lxcmyf can or should we participate in the call too?

Hi @r0wdy1, if you feel you would contribute to the meeting by your attendance, please leave a comment here about the topics to be discussed: tronprotocol/pm#75, and meanwhile email to Murphy, so we could invite you to the call, thank you.

r0wdy1 · 2024-02-28T08:52:44Z

@Murphytron frankly I'm not familiar with the decision process in Tron core team and whether the presence of this topic in the agenda implies debates or just internal discussion on further steps.
If it does imply debates then I would like to use that chance to further advocate why pairing based ZKP is useful for Tron ecosystem and defend our point of view or to answer any additional questions on implementation, otherwise there's obviously no need for the presence

Murphytron · 2024-02-28T14:08:00Z

@Murphytron frankly I'm not familiar with the decision process in Tron core team and whether the presence of this topic in the agenda implies debates or just internal discussion on further steps. If it does imply debates then I would like to use that chance to further advocate why pairing based ZKP is useful for Tron ecosystem and defend our point of view or to answer any additional questions on implementation, otherwise there's obviously no need for the presence

Thank you for the active and positive contribution. The core devs community call is where developers of TRON introduce and explain their issues or TIPs for peer review and discussion. If you have more to further explain the topic, you are welcome to attend the call, otherwise you could also give other developers some time to finish the review procedure, and attend the next core devs community call if necessary.

xiangjie256329 · 2024-03-02T03:50:12Z

预计什么时候合并呢,我们想再复测一下之前跑不了的合约,合并到nile后会把超时时间改成80ms吗,不然120ms不会报错,之前的问题不会复现,或者把这个pr先更新到shasta上也行,总得有个和主网完全相同的环境吧

tomatoishealthy · 2024-03-02T09:01:34Z

预计什么时候合并呢,我们想再复测一下之前跑不了的合约,合并到nile后会把超时时间改成80ms吗,不然120ms不会报错,之前的问题不会复现,或者把这个pr先更新到shasta上也行,总得有个和主网完全相同的环境吧

Good advice, also concerned about future planning @lxcmyf

lxcmyf · 2024-03-02T09:46:35Z

预计什么时候合并呢,我们想再复测一下之前跑不了的合约,合并到nile后会把超时时间改成80ms吗,不然120ms不会报错,之前的问题不会复现,或者把这个pr先更新到shasta上也行,总得有个和主网完全相同的环境吧

Good advice, also concerned about future planning @lxcmyf

@tomatoishealthy
At present, the PR is still undergoing further review and testing. If there are any latest developments, we will synchronize them here as soon as possible.

barbatos2011 · 2024-03-08T07:59:00Z

Hi @r0wdy1, there are two things we need to do before going to nile testnet.

code review [doing]
assess the impact of BN128Addition's duration increase [discussing]
Once these are finished, we can deploy the code to nile testnet and confirm the effect of this optimization.
Then one more thing before releasing the code, we need to revaluate the fee of these op codes to avoid some DOS attacks.
We will update the progress here later.

AllFi · 2024-03-11T08:35:22Z

Hello!

I would like to provide a bit more context about benchmarks after reading the discussion. I've used "average" test cases (not involving infinity and low scalars) from go-ethereum set to benchmark the code. You can find the exact test that I've used to perform benchmark here: zkBob@1555a37. The only difference is that I've performed more iterations than specified in this commit.

barbatos2011 · 2024-03-11T11:15:55Z

Hello!

I would like to provide a bit more context about benchmarks after reading the discussion. I've used "average" test cases (not involving infinity and low scalars) from go-ethereum set to benchmark the code. You can find the exact test that I've used to perform benchmark here: zkBob@1555a37. The only difference is that I've performed more iterations than specified in this commit.

Hi @AllFi , thanks for your information! Are the benchmarks' results mentioned above from this commit's test code, right?

AllFi · 2024-03-11T13:49:18Z

Hi @barbatos2011! Yes, they are. To be more precise, I used the bn128Bench test from the branch zkBob:feature/bn128_montgomery_multiplication from this PR to measure the durations of precompiles after the optimization and I used the same test from the branch mentioned earlier to measure durations of existing implementations of precompiles.

barbatos2011 · 2024-03-12T05:49:40Z

Hi @barbatos2011! Yes, they are. To be more precise, I used the bn128Bench test from the branch zkBob:feature/bn128_montgomery_multiplication from this PR to measure the durations of precompiles after the optimization and I used the same test from the branch mentioned earlier to measure durations of existing implementations of precompiles.

@AllFi OK, thanks again for this, I will share this information to other developers and will try it if necessary.

GuipaiQigong111 · 2024-03-13T13:02:19Z

Has the code within this pull request been implemented on other blockchains and undergone live deployment previously? Additionally, is there available documentation that provides an overview of the optimized computational logic?

AllFi · 2024-03-15T12:17:50Z

Hi, @GuipaiQigong111!

I am not sure what do you mean exactly. If you are asking if other blockchains use Montgomery form, then the answer is yes. All efficient implementations of these precompiles use Montgomery form as far as I know. Though they implement more complicated multiprecision integer arithmetics. If you are asking about the exact code from this PR, then the answer is no as far as I know. The original implementation of these precompiles was taken from https://github.com/ethereum/ethereumj and I am not aware of any project that uses it now apart from the Tron. I am also not aware of any more efficient implementations on Java. I tried implementing these precompiles using well-known arkworks library written on Rust but that PR was postponed. That's why the goal of this PR was to optimize existing implementation by introducing as few modifications as possible.

The technique used in this PR is called Montgomery modular multiplication and there are a lot of resources that you can check. The implementation in this PR is probably the simplest one (since it doesn't involve multiprecision integers arithmetics) so this Wikipedia page might be enough to understand the code.

GuipaiQigong111 · 2024-03-17T04:00:27Z

@AllFi
Okay, thank you for helping me clarify.

lxcmyf · 2024-03-26T07:54:59Z

@AllFi
Thank you very much for your recent optimization work on the pre compilation of bn_128. This improvement is of great significance for our project. In order to better understand and apply this optimization, we hope to receive your further help and support.

Firstly, considering the readability and maintainability of the code, we would like to obtain a code level documentation. If other chains (such as Ethereum clients) have the same implementation, it can be noted in the documentation and compared if necessary. This kind of document not only helps us to have a deeper understanding of the principles and logic behind optimization, but also provides great convenience for future maintenance and expansion work. We would greatly appreciate it if you could provide some assistance in this regard.

Secondly, in order to ensure that the optimized code remains functionally consistent with the original code, we would like to know how you evaluate data consistency, as well as the evaluation methods and data you use. This helps us to use optimized code with more confidence and reduce potential risks.

Finally, we are also very concerned about the performance comparison data between the optimized code and the Ethereum Java client under the same opcode. Such data can intuitively reflect the effectiveness of your optimization and help us better evaluate the value of this improvement in practical applications.

We understand that these requests may involve some additional work for you, but please believe that these tasks are crucial for our project. We would greatly appreciate it if you could provide this information at your convenience.

yanghang8612 · 2024-04-10T09:31:13Z

Based on our local performance tests, the results are largely consistent with yours.

Is there a better improvement option to keep the BN128Addition operation at least on par with the original performance?

AllFi · 2024-04-11T13:51:46Z

Hello, @lxcmyf @yanghang8612! Sorry for the late reply.

Documentation

As far as I know, most clients use the Montgomery form, but they implement multi-precision arithmetics, so the code in these implementations differs significantly from the current implementation. Even after optimization, the current implementation remains quite naive and unique in this regard. Therefore, I am not sure if comparing it with other implementations would be useful. Regarding the code-level documentation, could you please provide some examples for reference? It would also be helpful if you could point out which parts are unclear and require a better explanation.

Tests

I've added test cases from go-ethereum to evaluate the correctness of the implementation. Previously, there were no tests for these precompiles, or I might have missed them. You can find these test cases in bn256Add.json, bn256Pairing.json, and bn256ScalarMul.json. These test cases are executed in PrecompiledContractsTest.bn128AdditionTest, PrecompiledContractsTest.bn128MultiplicationTest, and PrecompiledContractsTest.bn128PairingTest, respectively.

Benchmarks

I've conducted a simple comparative benchmark between java-tron and Hyperledger Besu. You can reassess Besu's performance by running these commands on your local machine.

git clone --recursive https://github.com/hyperledger/besu
cd ./besu
./gradlew installDist
cd ./build/install/besu/bin
./evmtool benchmark
./evmtool benchmark --nonative

It's worth noting that Hyperledger Besu has two implementations of these precompiles: the Java version and the native version. I've used the input data from Besu benchmarks in the java-tron benchmarks.

Here are the results:

java-tron without any optimizations

BN128Addition cost 6.2 µs
BN128Multiplication cost 2,069.6 µs
BN128Pairing cost 29,369.5 µs

java-tron with optimizations from this PR

BN128Addition cost 10.2 µs
BN128Multiplication cost 1,478.7 µs
BN128Pairing cost 20,105.1 µs

besu with Java implementation:

AltBN128 Add    500 gas @    6.4 µs /    78.1 MGps
AltBN128 Mul 40,000 gas @7,273.9 µs /     5.5 MGps
AltBN128 2 pairing 180,000 gas @478,523.7 µs /     0.4 MGps

besu with native implementation:

AltBN128 Add    500 gas @    2.8 µs /   179.5 MGps
AltBN128 Mul 40,000 gas @  113.5 µs /   352.5 MGps
AltBN128 2 pairing 180,000 gas @1,717.5 µs /   104.8 MGps

It seems that the Java implementation of java-tron is generally better than the Java implementation of Besu, but it significantly lags behind Besu's native implementation. I believe Besu uses the native implementation by default, so in practice, the java-tron implementation (even optimized with this PR) will be significantly slower than Besu's implementation. I've also conducted a benchmark with the proposed native implementation:

java-tron with native implementation:

BN128Addition cost 1.9 µs
BN128Multiplication cost 109.3 µs
BN128Pairing cost 1,187.1 µs

As you can see, this implementation is comparable and even slightly faster than Besu's native implementation. I believe the only way to achieve comparable speed with other clients for these precompiles is by using a native implementation.

BN128Addition precompile

I've tried to figure out something that might help but didn't manage to mitigate the constant overhead related to conversion between forms. There's always the option to keep the old implementation and use it only for this precompile. It will be a bit cumbersome, but it might be done.

yanghang8612 · 2024-04-11T14:51:22Z

Hello, @lxcmyf @yanghang8612! Sorry for the late reply.

Documentation

As far as I know, most clients use the Montgomery form, but they implement multi-precision arithmetics, so the code in these implementations differs significantly from the current implementation. Even after optimization, the current implementation remains quite naive and unique in this regard. Therefore, I am not sure if comparing it with other implementations would be useful. Regarding the code-level documentation, could you please provide some examples for reference? It would also be helpful if you could point out which parts are unclear and require a better explanation.

Tests

I've added test cases from go-ethereum to evaluate the correctness of the implementation. Previously, there were no tests for these precompiles, or I might have missed them. You can find these test cases in bn256Add.json, bn256Pairing.json, and bn256ScalarMul.json. These test cases are executed in PrecompiledContractsTest.bn128AdditionTest, PrecompiledContractsTest.bn128MultiplicationTest, and PrecompiledContractsTest.bn128PairingTest, respectively.

Benchmarks

I've conducted a simple comparative benchmark between java-tron and Hyperledger Besu. You can reassess Besu's performance by running these commands on your local machine.
git clone --recursive https://github.com/hyperledger/besu
cd ./besu
./gradlew installDist
cd ./build/install/besu/bin
./evmtool benchmark
./evmtool benchmark --nonative
It's worth noting that Hyperledger Besu has two implementations of these precompiles: the Java version and the native version. I've used the input data from Besu benchmarks in the java-tron benchmarks.

Here are the results:

java-tron without any optimizations
BN128Addition cost 6.2 µs
BN128Multiplication cost 2,069.6 µs
BN128Pairing cost 29,369.5 µs
java-tron with optimizations from this PR
BN128Addition cost 10.2 µs
BN128Multiplication cost 1,478.7 µs
BN128Pairing cost 20,105.1 µs
besu with Java implementation:
AltBN128 Add    500 gas @    6.4 µs /    78.1 MGps
AltBN128 Mul 40,000 gas @7,273.9 µs /     5.5 MGps
AltBN128 2 pairing 180,000 gas @478,523.7 µs /     0.4 MGps
besu with native implementation:
AltBN128 Add    500 gas @    2.8 µs /   179.5 MGps
AltBN128 Mul 40,000 gas @  113.5 µs /   352.5 MGps
AltBN128 2 pairing 180,000 gas @1,717.5 µs /   104.8 MGps
It seems that the Java implementation of java-tron is generally better than the Java implementation of Besu, but it significantly lags behind Besu's native implementation. I believe Besu uses the native implementation by default, so in practice, the java-tron implementation (even optimized with this PR) will be significantly slower than Besu's implementation. I've also conducted a benchmark with the proposed native implementation:

java-tron with native implementation:
BN128Addition cost 1.9 µs
BN128Multiplication cost 109.3 µs
BN128Pairing cost 1,187.1 µs
As you can see, this implementation is comparable and even slightly faster than Besu's native implementation. I believe the only way to achieve comparable speed with other clients for these precompiles is by using a native implementation.

BN128Addition precompile

I've tried to figure out something that might help but didn't manage to mitigate the constant overhead related to conversion between forms. There's always the option to keep the old implementation and use it only for this precompile. It will be a bit cumbersome, but it might be done.

Great job. We'll look into it carefully.

r0wdy1 · 2024-04-11T16:03:15Z

@lxcmyf

Firstly, considering the readability and maintainability of the code, we would like to obtain a code level documentation. If other chains (such as Ethereum clients) have the same implementation

Actually the only Ethereum Java client Besu uses Rust implementation via JNI for that

As you can see there's a whole besu-native project dedicated to using optimized math implementations in Besu Java code

Getting back to Montgomery multiplication implementation, do you need a math breakdown or a separate implementation write up? The core of the implementation is located in a single file (https://github.com/tronprotocol/java-tron/pull/5611/files#diff-99ee7c44ca170212da69fa123475bf66c7768696b855620b3e75ed73ba8ad695) and almost every line of code has a corresponding comment.

@yanghang8612
Basically, the idea boils down to transforming all the integers into the Montgomery form before doing any modulo additions and multiplications and transforming them back when everything is done. In Montgomery form the standard mod replaced by a more effective right shift operation. You can think of it as a kind of "batch" processing for modular operations: when multiple consecutive operations are executed ( such as in mutiplying an EC point by a scalar ) the speed gain is greater than the overhead of creating a batch ( i.e. transforming back and forth ) itself.
With addition on the other hand the situation is the opposite - transformation overhead outweighs the gains.

lxcmyf · 2024-04-12T06:27:48Z

@AllFi @r0wdy1
Yes, I see that the relevant annotations are already complete enough. Thank you for your efforts in this regard. Can the comparison of data consistency before and after this optimization be reflected in unit testing? Finally, please forgive my intrusion. Is it possible for this optimization implementation to be submitted on Ethereum Besu?

yanghang8612 · 2024-04-15T00:57:52Z

@r0wdy1 Understood. Thanks for the explanation.

AllFi · 2024-04-15T16:01:55Z

Hello, @lxcmyf!

I tested the old implementation (in a separate branch) with test cases mentioned earlier, and there weren't any differences compared to the current implementation. It is possible, but I suppose it would require keeping the old implementation intact. So, there would be two almost identical org.tron.common.crypto.zksnark packages and also two implementations of precompiles. If you think that it is necessary, it might be done, but as I mentioned earlier, the code will probably be a bit cumbersome.

I don't think this implementation makes a lot of sense for Besu. They already have a native implementation that works orders of magnitude faster than the Java implementation. I don't see any reason for them to be interested in optimizing the Java implementation by 30% or so.

lxcmyf · 2024-04-16T13:52:42Z

Hello, @lxcmyf!

I tested the old implementation (in a separate branch) with test cases mentioned earlier, and there weren't any differences compared to the current implementation. It is possible, but I suppose it would require keeping the old implementation intact. So, there would be two almost identical org.tron.common.crypto.zksnark packages and also two implementations of precompiles. If you think that it is necessary, it might be done, but as I mentioned earlier, the code will probably be a bit cumbersome.

I don't think this implementation makes a lot of sense for Besu. They already have a native implementation that works orders of magnitude faster than the Java implementation. I don't see any reason for them to be interested in optimizing the Java implementation by 30% or so.

I see. Thank you for informing me.

barbatos2011 · 2024-05-07T09:29:20Z

Hello, @lxcmyf!

I tested the old implementation (in a separate branch) with test cases mentioned earlier, and there weren't any differences compared to the current implementation. It is possible, but I suppose it would require keeping the old implementation intact. So, there would be two almost identical org.tron.common.crypto.zksnark packages and also two implementations of precompiles. If you think that it is necessary, it might be done, but as I mentioned earlier, the code will probably be a bit cumbersome.

I don't think this implementation makes a lot of sense for Besu. They already have a native implementation that works orders of magnitude faster than the Java implementation. I don't see any reason for them to be interested in optimizing the Java implementation by 30% or so.

@AllFi This work is great! Maybe try to submit to Besu is a way to accelerate code merging. With more people to review and test, the result will be more convincing.

r0wdy1 · 2024-05-07T16:59:08Z

@barbatos2011
May be it would make sense to do it the opposite way and merge Besu implementation to Tron?
Creating a PR to Besu without an intent for it to be actually merged is very weird
It is unlikely that we would get any meaningful feedback let alone the fact that you're suggesting to basically waste their time on reviewing it
Just reiterating what's being already said: Besu uses native implementation very similar to what has been proposed by our team in the first place almost a year ago #5492

yuekun0707 · 2024-05-09T10:51:09Z

@barbatos2011 May be it would make sense to do it the opposite way and merge Besu implementation to Tron? Creating a PR to Besu without an intent for it to be actually merged is very weird It is unlikely that we would get any meaningful feedback let alone the fact that you're suggesting to basically waste their time on reviewing it Just reiterating what's being already said: Besu uses native implementation very similar to what has been proposed by our team in the first place almost a year ago #5492

Yes, JNI implementation has the best performance, but maybe for TRON adding JNI is a big thing.
For the java implementation, like you said, Besu possibly doesn't need this optimization now. My point is there are more zk transactions on Ethereum than TRON, if the code works good on Besu, it will help to convince the TRON team to merge the PR.

r0wdy1 · 2024-05-14T13:37:51Z

if the code works good on Besu, it will help to convince the TRON team to merge the PR.

This code won't ever work good on Besu because it won't be merged to Besu, because it would mean worse performance
There are no zk snark transaction on Tron , because pairings don't work
So basically no rollups, no privacy, no verifiable credentials, zkml and other fancy stuff
The only way is to outsource pairings check to an MPC committee which doesn't really scale or to use ZKP without pairings like zk starks which have their own downsides.

I feel that we're going in circles for several months so I assume that PR won't move forward unless the community would really push it due to a sudden change in scaling/privacy requirements

yuekun0707 · 2024-06-12T03:56:55Z

if the code works good on Besu, it will help to convince the TRON team to merge the PR.

This code won't ever work good on Besu because it won't be merged to Besu, because it would mean worse performance There are no zk snark transaction on Tron , because pairings don't work So basically no rollups, no privacy, no verifiable credentials, zkml and other fancy stuff The only way is to outsource pairings check to an MPC committee which doesn't really scale or to use ZKP without pairings like zk starks which have their own downsides.

I feel that we're going in circles for several months so I assume that PR won't move forward unless the community would really push it due to a sudden change in scaling/privacy requirements

Yes, there are few this kind transactions on TRON now. Zk has its advantages but also potential risks. So maybe more time is required to plan this. According to the Devs Community Call before, the TRON team is doing on some exploratory work on zk, looking forward to the progress.

feat(precompiles): implement montgomery arithmetic

70b9645

1. implement montgomery arithmetic 2. add tests from go-ethereum

r0wdy1 mentioned this pull request Dec 11, 2023

Slow precompiles for alt_bn128 curve #5492

Open

halibobo1205 added topic: Smart contract VM, smart contract type:feature Improvement labels Dec 13, 2023

feat(precompiles): fix code style issues

410cce4

Federico2014 previously approved these changes Dec 18, 2023

View reviewed changes

Federico2014 requested review from Federico2014 and removed request for Federico2014 December 18, 2023 16:21

feat(precompiles): check negative values

04a00c2

Federico2014 approved these changes Dec 20, 2023

View reviewed changes

lxcmyf mentioned this pull request Feb 27, 2024

Core Devs Community Call 12 tronprotocol/pm#75

Closed

feat(precompiles): optimize native implementation of the bn128 precompiles #5611

Are you sure you want to change the base?

feat(precompiles): optimize native implementation of the bn128 precompiles #5611

Conversation

AllFi commented Dec 11, 2023

3for commented Dec 19, 2023

AllFi commented Dec 19, 2023

3for commented Dec 20, 2023

AllFi commented Dec 20, 2023

3for commented Dec 20, 2023

r0wdy1 commented Dec 29, 2023

lxcmyf commented Feb 22, 2024

r0wdy1 commented Feb 22, 2024

lxcmyf commented Feb 22, 2024

r0wdy1 commented Feb 27, 2024

Murphytron commented Feb 28, 2024

r0wdy1 commented Feb 28, 2024

Murphytron commented Feb 28, 2024 • edited Loading

xiangjie256329 commented Mar 2, 2024

tomatoishealthy commented Mar 2, 2024

lxcmyf commented Mar 2, 2024

barbatos2011 commented Mar 8, 2024

AllFi commented Mar 11, 2024

barbatos2011 commented Mar 11, 2024

AllFi commented Mar 11, 2024

barbatos2011 commented Mar 12, 2024

GuipaiQigong111 commented Mar 13, 2024

AllFi commented Mar 15, 2024

GuipaiQigong111 commented Mar 17, 2024

lxcmyf commented Mar 26, 2024 • edited Loading

yanghang8612 commented Apr 10, 2024

AllFi commented Apr 11, 2024

Documentation

Tests

Benchmarks

BN128Addition precompile

yanghang8612 commented Apr 11, 2024

Documentation

Tests

Benchmarks

BN128Addition precompile

r0wdy1 commented Apr 11, 2024

lxcmyf commented Apr 12, 2024 • edited Loading

yanghang8612 commented Apr 15, 2024

AllFi commented Apr 15, 2024

lxcmyf commented Apr 16, 2024 • edited Loading

barbatos2011 commented May 7, 2024 • edited Loading

r0wdy1 commented May 7, 2024

yuekun0707 commented May 9, 2024

r0wdy1 commented May 14, 2024

yuekun0707 commented Jun 12, 2024

Murphytron commented Feb 28, 2024 •

edited

Loading

lxcmyf commented Mar 26, 2024 •

edited

Loading

lxcmyf commented Apr 12, 2024 •

edited

Loading

lxcmyf commented Apr 16, 2024 •

edited

Loading

barbatos2011 commented May 7, 2024 •

edited

Loading