This is a discussion ported from the Discord #fabric-kubernetes channel's #LOAD BALANCING CHAINCODE discussion thread. The topic comes up frequently; e.g. in #3342, @denyeart and @kmilodenisglez ask about the feasibility of a load balancer to distribute workload from a peer across multiple chaincode service endpoints. Instead of hosting the discussion over on a Discord channel, let's use GitHub as a forum to aggregate it.

The TL;DR summary and recommendation at this time: NO. The best practice / current recommendation is to deploy chaincode endpoints 1:1 with the peer, leveraging the Gateway client to distribute transaction load across peers.

Replies: 11 comments
-
Tom — 03/04/2022 Let's say you have org foo and org bar, and you deployed chaincode X as a service in a k8s Deployment. Can that chaincode service be load balanced across multiple replicas?
-
joshk — 03/04/2022 Hi Tom. There have been some (very early) investigations into applying a service mesh within the network to inject a gRPC multiplexer and router between Fabric components (e.g. linkerd.io or Istio). This is "not ready" for a number of reasons, not least that the communication protocol between peer and chaincode may be stateful, or may require some extensions for the initial handshake and bridge between peer and chaincode. Also complicating matters for a gRPC message router is the opaque nature of establishing mTLS network links between nodes. Setting up multiple chaincode endpoints behind a gRPC router may (or could be made to) work, but this seems like an area where we may need to make some changes in Fabric to "get it right," rather than inject a point solution.

Another avenue to evaluate would be the load-balancing features provided by the new Gateway SDKs. In this case there is still a 1:1 relationship between peer and chaincode instance, but the load can be spread across transactions submitted on behalf of different client connections to the peer. There are some notes on using k8s Service routing features to implement this (sorry, still on a PR!) at #3065 and hyperledger/fabric-gateway#257.
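To make the Gateway approach concrete, here is a minimal Go sketch of a client connecting to a peer's Gateway service with the fabric-gateway SDK. The endpoint, MSP ID, channel, chaincode name and credential paths are all placeholders, not values from this thread:

```go
package main

import (
	"crypto/x509"
	"fmt"
	"os"

	"github.com/hyperledger/fabric-gateway/pkg/client"
	"github.com/hyperledger/fabric-gateway/pkg/identity"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials"
)

// connectGateway dials a single Gateway peer over TLS and wraps the
// connection in a Gateway client. All endpoints, MSP IDs and file paths
// are placeholders.
func connectGateway() (*client.Gateway, error) {
	caPEM, err := os.ReadFile("tls-ca-cert.pem")
	if err != nil {
		return nil, err
	}
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, fmt.Errorf("failed to parse TLS CA certificate")
	}
	conn, err := grpc.Dial("peer0-org1:7051",
		grpc.WithTransportCredentials(credentials.NewClientTLSFromCert(pool, "")))
	if err != nil {
		return nil, err
	}

	certPEM, err := os.ReadFile("user-cert.pem")
	if err != nil {
		return nil, err
	}
	cert, err := identity.CertificateFromPEM(certPEM)
	if err != nil {
		return nil, err
	}
	id, err := identity.NewX509Identity("Org1MSP", cert)
	if err != nil {
		return nil, err
	}
	keyPEM, err := os.ReadFile("user-key.pem")
	if err != nil {
		return nil, err
	}
	key, err := identity.PrivateKeyFromPEM(keyPEM)
	if err != nil {
		return nil, err
	}
	sign, err := identity.NewPrivateKeySign(key)
	if err != nil {
		return nil, err
	}

	// One Gateway connection per client; the peer fans endorsement work
	// out to its (single) chaincode service instance.
	return client.Connect(id, client.WithSign(sign), client.WithClientConnection(conn))
}

func main() {
	gw, err := connectGateway()
	if err != nil {
		panic(err)
	}
	defer gw.Close()

	contract := gw.GetNetwork("mychannel").GetContract("basic")
	result, err := contract.SubmitTransaction("CreateAsset", "asset1", "blue", "5")
	if err != nil {
		panic(err)
	}
	fmt.Printf("result: %s\n", result)
}
```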
-
Tom — 03/04/2022 I understand, and had almost reached this level of understanding on my own, but thank you very much for the concrete insight. Thanks also for the link to your open PR; it looks very interesting 😊
-
joshk — 03/04/2022 Also if I may - @bestbeforetoday provided a couple of salient points in a related discussion, which I will relay here:
There is no point in criticizing Fabric! It is what it is. Let's all figure out how to work together to make it work ... right! Please check in and let the channel know how your endeavors with multiplexed / load-balanced chaincode are panning out. Cheers and happy coding!
-
Daniel22 — 03/08/2022 What happens if one org deploys the chaincode as an external chaincode in a Kubernetes Deployment with two pods behind a Service? In theory the Service could do some round-robin load balancing. Could it work?
-
@bestbeforetoday — 03/09/2022 I think the issue is that (as I understand it, at least) the connection between peer and chaincode container is a bi-directional gRPC stream: a persistent connection that is established only at startup. Ideally, from a load-balancing point of view, the peer would send unary gRPC requests for simulation to a chaincode container, which would allow every request to be transparently balanced to a different chaincode container without any work required in the peer. I'm not sure whether this is practical for chaincode-as-a-service, and it certainly doesn't work that way today, so it would require significant rework.

A halfway house might be to allow multiple chaincode-as-a-service endpoints to be registered for a single chaincode, and have the peer round-robin requests over the set of streaming connections. I don't think this is currently possible, but I haven't looked closely at the code, so other people might have better information.

It would be really useful to understand whether the chaincode container (rather than the peer) really is often a bottleneck in transaction processing. If not, there is no value in rearchitecting the implementation. If you need to scale horizontally, you might be able to simply add more peers rather than looking for ways to add more chaincode containers for each peer.

Where I have seen people trying to push very high transaction throughput, it has typically been by using a very small number of peers (and chaincode containers) but applying a lot of compute resource to those nodes. Whether this is actually the best approach, I don't know. Testing and profiling would probably be the way to be sure.
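Purely to illustrate the "halfway house" idea (the peer does not support anything like this today), a toy Go sketch of round-robin selection over a fixed set of pre-established endpoints might look like the following, with hypothetical endpoint addresses:

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// roundRobin cycles over a fixed set of chaincode endpoints, standing in
// for a set of persistent streaming connections established at startup.
type roundRobin struct {
	endpoints []string
	next      uint64
}

// pick returns the next endpoint in rotation; safe for concurrent use.
func (r *roundRobin) pick() string {
	n := atomic.AddUint64(&r.next, 1)
	return r.endpoints[(n-1)%uint64(len(r.endpoints))]
}

func main() {
	rr := &roundRobin{endpoints: []string{"cc-0:9999", "cc-1:9999", "cc-2:9999"}}
	for i := 0; i < 6; i++ {
		fmt.Println(rr.pick()) // cc-0, cc-1, cc-2, cc-0, ...
	}
}
```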
-
Daniel22 — 03/09/2022 Thanks for the detailed answer. Just on the horizontal-scaling idea: could it work that, for one organization, I have say 3 peers, each with its own chaincode pod / container, and I somehow load balance in front of the peers?
-
@bestbeforetoday — 03/09/2022 If you're using one of the "legacy" client SDKs with discovery enabled, the behaviour will generally be that endorsement requests are randomly distributed across peers that can satisfy the endorsement requirements. Using the newer Fabric Gateway client API, the Gateway service will prefer peers with higher block height, and prefer the local peer where it has the highest block height; this may limit the load balancing across peers but gives you a better chance of avoiding an MVCC conflict and a failed transaction.

Since all v2.4+ peers can (and by default do) run the Gateway service, the client is free to load-balance requests across multiple Gateway services. A good way to do this might be to use an ingress controller or load balancer endpoint as the Gateway service endpoint for the client, and have the ingress / load balancer dispatch requests to any Gateway peers in the organisation.

There is some general information on gRPC load-balancing approaches here: https://grpc.io/blog/grpc-load-balancing/
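As one possible client-side variant of this, the Go sketch below asks gRPC itself to round-robin requests over whatever addresses a DNS name resolves to. It assumes something like a headless k8s Service fronting the org's Gateway peers (the hostname is a placeholder), and it uses insecure credentials purely for brevity; production code would supply TLS credentials:

```go
package main

import (
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)

// dialBalanced opens a single gRPC client connection that spreads requests
// round-robin across every address the DNS name resolves to. A headless
// k8s Service (which returns one A record per pod) would expose each
// Gateway peer to the gRPC resolver; a plain ClusterIP Service would not.
func dialBalanced() (*grpc.ClientConn, error) {
	return grpc.Dial(
		"dns:///gateway-peers.org1.example.com:7051",
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithDefaultServiceConfig(`{"loadBalancingConfig": [{"round_robin":{}}]}`),
	)
}

func main() {
	conn, err := dialBalanced()
	if err != nil {
		panic(err)
	}
	defer conn.Close()
}
```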
-
Daniel22 — 03/10/2022 Are there perhaps any workarounds or better ideas if the use case is not a constant load on Fabric, but one specific batch of data that must be uploaded only once, possibly within a limited timeframe?
-
@bestbeforetoday — 03/10/2022 There are a few general approaches that can significantly improve your transaction rate. The first is to submit transactions in parallel (not serially), so the orderer bundles more transactions into a single block and doesn't wait on a block-cutting timeout for more transactions to arrive. You do need to consider your data model here, as submitting transactions in parallel that access the same ledger keys will result in transaction failures due to MVCC read conflicts.

You can also experiment with bundling multiple updates into a single transaction to find the optimum transaction size for throughput on your system. For example, instead of a CreateAsset transaction taking a single asset, you might get better throughput with a CreateAssets transaction that takes multiple assets within a single transaction.

There are some configuration settings for block size and timeouts on the orderer that you can experiment with too.

There is no silver bullet or one-size-fits-all answer, though. The right settings, approach and Fabric deployment are likely to be system-dependent, and you will need to do some testing to find the right answer for you.
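A minimal Go sketch of the first two ideas combined, using the Gateway client: batches are submitted concurrently, and each transaction carries several assets. The CreateAssets function name and the Asset payload shape are hypothetical chaincode details, not anything defined in this thread:

```go
package txload

import (
	"encoding/json"
	"fmt"
	"sync"

	"github.com/hyperledger/fabric-gateway/pkg/client"
)

// Asset is a hypothetical payload type; the real shape is chaincode-specific.
type Asset struct {
	ID    string `json:"id"`
	Value string `json:"value"`
}

// submitBatches sends one transaction per batch, with all batches in
// flight concurrently. "CreateAssets" is a hypothetical chaincode function
// that accepts many assets in one transaction.
func submitBatches(contract *client.Contract, batches [][]Asset) error {
	var wg sync.WaitGroup
	errs := make(chan error, len(batches))
	for _, batch := range batches {
		wg.Add(1)
		go func(batch []Asset) {
			defer wg.Done()
			payload, err := json.Marshal(batch)
			if err != nil {
				errs <- err
				return
			}
			// Concurrent submission lets the orderer fill blocks rather
			// than waiting on the block-cutting timeout. Keys must not
			// overlap across in-flight transactions, or MVCC read
			// conflicts will invalidate some of them.
			if _, err := contract.SubmitTransaction("CreateAssets", string(payload)); err != nil {
				errs <- err
			}
		}(batch)
	}
	wg.Wait()
	close(errs)
	if err := <-errs; err != nil { // nil once drained, since the channel is closed
		return fmt.Errorf("at least one batch failed: %w", err)
	}
	return nil
}
```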
-
joshk — 03/16/2022 The "Free Peer From Chain[code]" Google doc has some interesting, relevant discussion related to load balancing of chaincode: https://docs.google.com/document/d/14l-0jjxw0SLrkpgEuXr0ZxAn0BtTJAQTupXvVkeIL2s/edit#heading=h.sfdv7g4jzt6r This discussion thread seems like an introduction to "phase 2" of the load-balancing approach(es) outlined in section 4.3 of the doc.