-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(precompiles): optimize native implementation of the bn128 precompiles #5611
base: develop
Are you sure you want to change the base?
feat(precompiles): optimize native implementation of the bn128 precompiles #5611
Conversation
1. implement montgomery arithmetic 2. add tests from go-ethereum
Hi, @AllFi , I add a test case |
Hi, @3for, @Federico2014 ! Thank you for the review! There's indeed an uncaught exception, but it is only possible because access modifiers in the The current implementation doesn't expose any constructors/methods to instantiate Precompiled contracts also check that the created points are not equal to null. In summary, the current implementation assumes that it is the responsibility of the caller to check that the created When we perform operations within the package, we use constructors directly, so it also seems impossible to encounter a Am I missing something? |
I think it's okay for the caller to handle the 'null Fp value' case. There're another two issues:
|
Yes, that does make sense. Thank you for pointing it out. Fixed in 04a00c2.
I am not sure about it. I have the following concerns:
That's why I believe it's better to keep it as is, but the decision is up to you. |
It looks good to me now. Keeping |
@3for @Federico2014 @lxcmyf @tomatoishealthy is there any additional help or information that we can provide in regards of this PR? |
@r0wdy1 |
@lxcmyf , happy to hear from you again |
@r0wdy1 |
@lxcmyf can or should we participate in the call too? |
Hi @r0wdy1, if you feel you would contribute to the meeting by your attendance, please leave a comment here about the topics to be discussed: tronprotocol/pm#75, and meanwhile email to Murphy, so we could invite you to the call, thank you. |
@Murphytron frankly I'm not familiar with the decision process in Tron core team and whether the presence of this topic in the agenda implies debates or just internal discussion on further steps. |
Thank you for the active and positive contribution. The core devs community call is where developers of TRON introduce and explain their issues or TIPs for peer review and discussion. If you have more to further explain the topic, you are welcome to attend the call, otherwise you could also give other developers some time to finish the review procedure, and attend the next core devs community call if necessary. |
预计什么时候合并呢,我们想再复测一下之前跑不了的合约,合并到nile后会把超时时间改成80ms吗,不然120ms不会报错,之前的问题不会复现,或者把这个pr先更新到shasta上也行,总得有个和主网完全相同的环境吧 |
Good advice, also concerned about future planning @lxcmyf |
@tomatoishealthy |
Hi @r0wdy1, there are two things we need to do before going to nile testnet.
|
Hello! I would like to provide a bit more context about benchmarks after reading the discussion. I've used "average" test cases (not involving infinity and low scalars) from go-ethereum set to benchmark the code. You can find the exact test that I've used to perform benchmark here: zkBob@1555a37. The only difference is that I've performed more iterations than specified in this commit. |
Hi @AllFi , thanks for your information! Are the benchmarks' results mentioned above from this commit's test code, right? |
Hi @barbatos2011! Yes, they are. To be more precise, I used the |
@AllFi OK, thanks again for this, I will share this information to other developers and will try it if necessary. |
Has the code within this pull request been implemented on other blockchains and undergone live deployment previously? Additionally, is there available documentation that provides an overview of the optimized computational logic? |
Hi, @GuipaiQigong111! I am not sure what do you mean exactly. If you are asking if other blockchains use Montgomery form, then the answer is yes. All efficient implementations of these precompiles use Montgomery form as far as I know. Though they implement more complicated multiprecision integer arithmetics. If you are asking about the exact code from this PR, then the answer is no as far as I know. The original implementation of these precompiles was taken from https://github.com/ethereum/ethereumj and I am not aware of any project that uses it now apart from the Tron. I am also not aware of any more efficient implementations on Java. I tried implementing these precompiles using well-known arkworks library written on Rust but that PR was postponed. That's why the goal of this PR was to optimize existing implementation by introducing as few modifications as possible. The technique used in this PR is called Montgomery modular multiplication and there are a lot of resources that you can check. The implementation in this PR is probably the simplest one (since it doesn't involve multiprecision integers arithmetics) so this Wikipedia page might be enough to understand the code. |
@AllFi |
@AllFi Firstly, considering the readability and maintainability of the code, we would like to obtain a code level documentation. If other chains (such as Ethereum clients) have the same implementation, it can be noted in the documentation and compared if necessary. This kind of document not only helps us to have a deeper understanding of the principles and logic behind optimization, but also provides great convenience for future maintenance and expansion work. We would greatly appreciate it if you could provide some assistance in this regard. Secondly, in order to ensure that the optimized code remains functionally consistent with the original code, we would like to know how you evaluate data consistency, as well as the evaluation methods and data you use. This helps us to use optimized code with more confidence and reduce potential risks. Finally, we are also very concerned about the performance comparison data between the optimized code and the Ethereum Java client under the same opcode. Such data can intuitively reflect the effectiveness of your optimization and help us better evaluate the value of this improvement in practical applications. We understand that these requests may involve some additional work for you, but please believe that these tasks are crucial for our project. We would greatly appreciate it if you could provide this information at your convenience. |
Based on our local performance tests, the results are largely consistent with yours. Is there a better improvement option to keep the |
Hello, @lxcmyf @yanghang8612! Sorry for the late reply. DocumentationAs far as I know, most clients use the Montgomery form, but they implement multi-precision arithmetics, so the code in these implementations differs significantly from the current implementation. Even after optimization, the current implementation remains quite naive and unique in this regard. Therefore, I am not sure if comparing it with other implementations would be useful. Regarding the code-level documentation, could you please provide some examples for reference? It would also be helpful if you could point out which parts are unclear and require a better explanation. TestsI've added test cases from BenchmarksI've conducted a simple comparative benchmark between
It's worth noting that Hyperledger Besu has two implementations of these precompiles: the Java version and the native version. I've used the input data from Besu benchmarks in the Here are the results:
It seems that the Java implementation of
As you can see, this implementation is comparable and even slightly faster than Besu's native implementation. I believe the only way to achieve comparable speed with other clients for these precompiles is by using a native implementation. BN128Addition precompileI've tried to figure out something that might help but didn't manage to mitigate the constant overhead related to conversion between forms. There's always the option to keep the old implementation and use it only for this precompile. It will be a bit cumbersome, but it might be done. |
Great job. We'll look into it carefully. |
Actually the only Ethereum Java client Besu uses Rust implementation via JNI for that As you can see there's a whole besu-native project dedicated to using optimized math implementations in Besu Java code Getting back to Montgomery multiplication implementation, do you need a math breakdown or a separate implementation write up? The core of the implementation is located in a single file (https://github.com/tronprotocol/java-tron/pull/5611/files#diff-99ee7c44ca170212da69fa123475bf66c7768696b855620b3e75ed73ba8ad695) and almost every line of code has a corresponding comment. @yanghang8612 |
@AllFi @r0wdy1 |
@r0wdy1 Understood. Thanks for the explanation. |
Hello, @lxcmyf! I tested the old implementation (in a separate branch) with test cases mentioned earlier, and there weren't any differences compared to the current implementation. It is possible, but I suppose it would require keeping the old implementation intact. So, there would be two almost identical I don't think this implementation makes a lot of sense for Besu. They already have a native implementation that works orders of magnitude faster than the Java implementation. I don't see any reason for them to be interested in optimizing the Java implementation by 30% or so. |
I see. Thank you for informing me. |
@AllFi This work is great! Maybe try to submit to Besu is a way to accelerate code merging. With more people to review and test, the result will be more convincing. |
@barbatos2011 |
Yes, JNI implementation has the best performance, but maybe for TRON adding JNI is a big thing. |
This code won't ever work good on Besu because it won't be merged to Besu, because it would mean worse performance I feel that we're going in circles for several months so I assume that PR won't move forward unless the community would really push it due to a sudden change in scaling/privacy requirements |
Yes, there are few this kind transactions on TRON now. Zk has its advantages but also potential risks. So maybe more time is required to plan this. According to the Devs Community Call before, the TRON team is doing on some exploratory work on zk, looking forward to the progress. |
What does this PR do?
This PR optimizes
BN128Multiplication
, andBN128Pairing
precompiled contracts by implementing Montgomery form arithmetic. It is worth mentioning that this PR makesBN128Addition
a little bit slower because of the overhead of the conversion to and from the Montgomery form.Why are these changes required?
The current implementation of these precompiles is relatively slow. In general, this hinders on-chain zkSNARKs verification and, consequently, makes zk-based apps almost unfeasible (#4311). This PR is aiming to mitigate this issue.
This PR is far worse than #5507 in terms of optimization but it doesn't require JNI and doesn't introduce a lot of changes in the implementation.
This PR has been tested by:
Follow up
Extra details
I've performed some benchmarks before and after these modifications. The results are below:
The average time of operations before (Intel(R) Core(TM) i7-10750H CPU, 32 GB RAM):
The average time of operations after: