CPS-???? | Block Delay Centralisation #943

TerminadaDrep · 2024-12-03T08:52:58Z

Abstract

An underlying assumption in the design of Cardano's Ouroboros protocol is that the probability of a stake pool being permitted to update the ledger is proportional to the relative stake of that pool. However, the current implementation does not properly realise this design goal.

Minor network delays endured by some participants cause them to face a greater number of fork battles. The result is that more geographically decentralised participants do not obtain participation that is proportional to their relative stake. This is both a fairness and security issue.

(rendered latest version)

rphair

@TerminadaDrep thanks for getting this into the open after putting so much work into this issue & leading so much constructive discussion about it over the last couple of years. I'm marking this Triage to introduce it at the CIP meeting in a week's time (https://hackmd.io/@cip-editors/102) & you would be very welcome to attend if possible & field some initial questions.

CPS-XXXX/README.md

SmaugPool · 2024-12-03T13:39:41Z

CPS-XXXX/README.md

+
+This might seem like a minor problem, but the effect is significant.  If the majority of the network reside in USA - Europe with close connectivity and less than 1 second propagation delays, then those participant pools will see 5% of their blocks suffering "fork battles" which will only occur when another pool is awarded the exact same slot (ie: a "slot battle").  They will lose half of these battles on average causing 2.5% of their blocks to get dropped, or "orphaned".
+
+However, for a pool that happens to reside on the other side of the world where network delays might be just over 1 second, this pool will suffer "fork battles" not only with pools awarded the same slot, but also the slot before, and the slot after.  In other words, this geographically decentralised pool will suffer 3 times the number of slot battles, amounting to 15% of its blocks, and resulting in 7.5% of its blocks getting dropped.  The numbers are even worse for a pool suffering 2 second network delays because it will suffer 5 times the number of "fork battles" and see 12.5% of its blocks "orphaned".  This not only results in an unfair reduction in rewards, but also the same magnitude reduction in contribution to the ledger.
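
The arithmetic in the quoted paragraphs can be reproduced with a minimal sketch. It assumes, as the CPS text does, that a block meets a competitor in roughly 5% of cases per exposed slot, that each whole second of delay exposes the block to one more slot on either side, and that half of all battles are lost. The function and parameter names are illustrative only and are not part of any Cardano tooling; the linear sum is an approximation of the CPS's simple model, not of the real leader-election probabilities.

```python
def orphan_rate(delay_seconds: int,
                per_slot_battle_prob: float = 0.05,
                loss_prob: float = 0.5) -> float:
    """Expected fraction of a pool's blocks orphaned under the CPS's simple model."""
    # A delay of d whole seconds exposes the block to its own slot
    # plus d slots before and d slots after it.
    exposed_slots = 2 * delay_seconds + 1
    battle_rate = exposed_slots * per_slot_battle_prob
    return battle_rate * loss_prob


for delay in (0, 1, 2):
    print(f"{delay}s delay: {orphan_rate(delay):.1%} of blocks orphaned")
```

With the default assumptions this gives 2.5%, 7.5% and 12.5% for delays of 0, 1 and 2 seconds, matching the figures in the text.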


This CPS does not provide data showing that 1s is not enough for blocks to propagate anywhere in the world with required hardware, connection and configuration. Without such data, it's impossible to determine if it's needed or not, and what solution would work.

I often get 1000ms ping times just crossing India's border: though AWS Mumbai is probably exempt from such delays, which I imagine result from "great firewall" type packet inspection from the newly founded & somewhat ill-equipped surveillance state here. (p.s. this has become an additional reason why our pool is administered in India but all nodes are in Europe: which supports the author's premise fairly well.)

This block was produced by my BP earlier today. It was full at 86.57kB in size, containing 64 transactions, and 66.17kB of scripts: https://cexplorer.io/block/c740f9ce8b25410ddb938ff8c42e12738c18b7fd040ae5224c53fb45f04b3ba0

These are the delays (from beginning of the slot) before each of my own relays included this block in their chains:

Relay 1 (ARM on same LAN) → Delayed=0.263s

Relay 2 (AMD on adjacent LAN) → Delayed=0.243s

Relay 3 (ARM approx 5 km away) → Delayed=0.288s

Relay 4 (AMD Contabo vps in USA) → Delayed=2.682s

Relay 5 (ARM Netcup vps in USA) → Delayed=1.523s

The average propagation delay by nodes pinging the data to pooltool was 1.67 seconds: https://pooltool.io/realtime/11169975

@TerminadaDrep Could you add the above delay metrics to the CPS? I think having empirical data would help strengthen the case of this CPS. Also, could you indicate whether your BP is locally controlled or in a vps? I'm guessing it is locally controlled.

This doesn't say which country the BP is in.
Also I'm not sure we should target low spec VPS nodes, is that an aim of Cardano? Or even just VPSs?

Good nodes require good control of the hardware and software, which VPSs don't really offer. Some in this list are particularly known to provide bad performance, and virtualization adds overhead.

Moreover, configuration optimization can help with latency (tracing, mempool size, TCP slow start, congestion control, etc.), so more details are needed.

Overall I believe we cannot conclude that 1s is not enough with just this data point.

As a counter-example, here is a full 86.37 KB SMAUG block with 97 transactions that propagated in 0.46s on average:
https://pooltool.io/realtime/11147794

My furthest relay in Singapore received it in 550ms.

And most of my blocks propagate quicker than that.

I'm not generally saying intercontinental propagation is irrelevant here. It actually is measurable and bounded by the speed of light. So I also disagree with one of the points in the CPS, in expectations of future transmission and latency improvements.

It is interesting that the delay is clearly in the Australian part of the internet. Perhaps the Aussie national broadband network (NBN) was more congested than usual at this time.

Certainly this block had worse than usual propagation delays.

I added some more examples which include pooltool data for a couple of pools in Japan.

I also added a new "Arguments against" No: 5. to discuss the extra infrastructure cost requirements for a BP in Australia to try to reduce the disadvantage that is inherent to the current Ouroboros implementation.

So I also disagree with one of the points in the CPS, in expectations of future transmission and latency improvements.

@gufmar I'm not an expert on networks, but I really don't think we should be relying on improvements to network throughput here due to the rebound effect. Demand will very likely increase to use up any extra "slack" (eg, Leios, block size increases, and even non-blockchain demand). If network capacity doubles but so does demand, we can easily find ourselves here again.

Yes sorry if I was unclear. I wanted to say I disagree with the argument made at https://github.com/cardano-foundation/CIPs/blob/e7bf9b4c103f3841f2d8364e78905c1183ee9526/CPS-XXXX/README.md#arguments-against-correcting-this-unfairness

because I don't expect we will see significant improvements in network latency, which is the main limiting factor here given the back-and-forth TCP ACK packets, rather than throughput.
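
To illustrate why round-trip latency rather than bandwidth can dominate, here is a rough sketch of TCP slow start. It assumes a connection whose congestion window starts at 10 segments (the initial window from RFC 6928), a typical 1460-byte segment size, window doubling every round trip and no packet loss. Real node connections are long-lived and may hold a larger window, so this is closer to a worst case than to a model of cardano-node; the function name is hypothetical.

```python
def slow_start_round_trips(payload_bytes: int,
                           mss: int = 1460,
                           initial_window_segments: int = 10) -> int:
    """Round trips needed to deliver a payload while the window doubles each RTT."""
    window = mss * initial_window_segments  # bytes allowed in flight this round
    sent = 0
    round_trips = 0
    while sent < payload_bytes:
        sent += window
        window *= 2  # idealised slow start: window doubles per round trip
        round_trips += 1
    return round_trips


# Roughly the 86.57kB full block cited earlier, assuming 1 kB = 1024 bytes.
block_bytes = 88_648
for rtt_ms in (50, 150, 300):
    trips = slow_start_round_trips(block_bytes)
    print(f"RTT {rtt_ms}ms: {trips} round trips, about {trips * rtt_ms}ms")
```

Under these assumptions the transfer time scales with the round-trip time, no matter how much bandwidth the link has.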

Co-authored-by: Robert Phair <rphair@cosd.com>

SmaugPool · 2024-12-03T13:40:51Z

CPS-XXXX/README.md

+
+However, for a pool that happens to reside on the other side of the world where network delays might be just over 1 second, this pool will suffer "fork battles" not only with pools awarded the same slot, but also the slot before, and the slot after.  In other words, this geographically decentralised pool will suffer 3 times the number of slot battles, amounting to 15% of its blocks, and resulting in 7.5% of its blocks getting dropped.  The numbers are even worse for a pool suffering 2 second network delays because it will suffer 5 times the number of "fork battles" and see 12.5% of its blocks "orphaned".  This not only results in an unfair reduction in rewards, but also the same magnitude reduction in contribution to the ledger.
+
+Even the high quality infrastructure of a first world country like Australia is not enough to reliably overcome this problem due to its geographical location.  But is it reasonable to expect all block producers across the world to receive blocks in under one second whenever the internet becomes congested, or if block size is increased following parameter changes?  Unfortunately, the penalty for a block producer that cannot sustain this remarkable feat of less than 1 second block receipt and propagation, is 3 times as many "fork battles" resulting in 7.5% "orphaned" blocks rather than 2.5%.


Some data is needed to prove Australia's case, and to be able to reproduce it and evaluate working solutions.

For example this AWS datacenter to datacenter round trip latency map does not seem to be enough to prove the point:

I agree with data helping, but I want to point out that using AWS as the benchmark doesn't seem appropriate since the goal is to not have AWS control most of the block producers.

P.S. I'm not suggesting you mean to use AWS as the benchmark, I just felt like this point should be made explicit.

Also, we can't be sure these hop times aren't measured between AWS back-end networks, and therefore exclude the time spent for unaffiliated traffic to enter & exit backbone networks or cross the "last mile" of retail Internet services.

I agree with data helping, but I want to point out that using AWS as the benchmark doesn't seem appropriate since the goal is to not have AWS control most of the block producers.

P.S. I'm not suggesting you mean to use AWS as the benchmark, I just felt like this point should be made explicit.

The point was that if AWS datacenter-to-datacenter round-trip latency were already more than 1s between two points in the world, that would have been enough to prove the CPS's point, because it is close to the best case connectivity-wise (independently of the centralization issue). But that's not the case, so more data is needed. I didn't mean to say anything else; you are reading more into it. So I think it was appropriate as a way to show that more data is indeed needed. See my original quote:

For example this AWS datacenter to datacenter round trip latency map does not seem to be enough to prove the point:

Unfortunately not this week, because I'm involved in other things, but I can assure you we have plenty of latency and propagation data: not only general latency but actual mainnet block propagation times as transmitted and received via the Ouroboros mini-protocols.
And we have 2.5 years of history, covering a range of normal and extraordinary network situations, for small and large blocks, with anything from zero up to the maximum script execution units.
The Gantt chart previously shown here is just one visualization for one block. I'm happy to invite you to a workshop call where we go through some of these data points, computed averages for predefined edge cases, etc.

Co-authored-by: Robert Phair <rphair@cosd.com>

happystaking · 2024-12-03T16:10:07Z

I can only speak for my own pool, but when I compare a high spec relay (A) to a low spec relay (B) the average time from slot begin to adoption on relays A and B is 240ms and 560ms respectively.

Blocks are forged on node C which is on the same physical machine as relay A. Relay B is 5500km (~100ms) away, so that time should be subtracted from the 560ms to take network delay (same LAN, no hops) out of the equation.

This (perhaps unscientific) method leads me to believe that having the right hardware does more to improve propagation times than being close to all other nodes. Moreover pinging halfway around the world should take no more than 300ms in an optimal situation. That means you should easily have enough time to bypass any countries or areas with deep packet inspection (which cause high network delays) and still stay under 1 second.
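
As a sanity check on the 300ms figure, here is a back-of-the-envelope lower bound on ping time. It assumes light travels through optical fibre at roughly two thirds of its vacuum speed (about 200,000 km/s) and that the cable follows the shortest path. Real routes are longer and add switching and queuing delay, so measured values such as the ~100ms quoted for 5500km sit above this bound. The constant and function names are illustrative.

```python
# Assumption: light in optical fibre travels at roughly 2/3 of c.
FIBRE_SPEED_KM_PER_S = 200_000


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on ping time over a direct fibre path of the given length."""
    return 2 * distance_km / FIBRE_SPEED_KM_PER_S * 1000


for label, km in (("5500 km (relay B)", 5_500),
                  ("half of Earth's circumference", 20_000)):
    print(f"{label}: at least {min_round_trip_ms(km):.0f}ms round trip")
```

This gives about 55ms for 5500km and about 200ms for a path halfway around the world, which is consistent with 300ms being an optimistic but plausible real-world figure.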

TerminadaDrep · 2024-12-05T03:37:29Z

Coincidentally, the last two TERM blocks in a row happened to have a leader for the next slot.

• A full block, for which pooltool reported an average propagation time of 0.87s. Despite the average reported propagation being less than 1 second, the next producer, IOGP, did not receive it in time and created a fork. Unfortunately IOGP's block had the lower VRF, so TERM lost the "fork battle" and its block was orphaned.
• A small block, for which pooltool reported an average propagation time of 0.62s. Fortunately it was received by the next producer, TLK, in time to produce its block at the next slot. So there was no fork and TERM's block did contribute to the chain.

TerminadaDrep · 2024-12-05T04:59:00Z

Another important consideration is that it is possible to maliciously game these forks.

The block VRF only depends on the following inputs:

Epoch nonce
Slot number
Pool private key

Therefore the block VRF is known ahead of time.

A malicious operator can run a modified version of cardano-node that inspects the previous block's VRF, compares it to its own value, and deliberately causes a fork only when it knows it will win the "fork battle". This would allow a malicious group of pools to deliberately "orphan" the blocks of competitors in order to earn a higher percentage of the reward pot and gain more control over consensus.
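
The reason the outcome is predictable can be shown with a minimal sketch of the tie-break as described in this thread, where the block with the lower VRF value wins. VRF outputs are modelled here as plain integers and both function names are hypothetical; this is not cardano-node code and it ignores every other chain-selection rule.

```python
def fork_battle_winner(vrf_a: int, vrf_b: int) -> str:
    """Return which of two competing blocks wins: the one with the lower VRF value."""
    # Ties are resolved in favour of "b" purely to keep the sketch total.
    return "a" if vrf_a < vrf_b else "b"


def outcome_known_in_advance(own_vrf: int, previous_block_vrf: int) -> bool:
    """True when a producer can tell beforehand that its block would win a fork."""
    # Both values are available before the producer decides which block to build on.
    return fork_battle_winner(own_vrf, previous_block_vrf) == "a"


print(outcome_known_in_advance(own_vrf=17, previous_block_vrf=42))
print(outcome_known_in_advance(own_vrf=42, previous_block_vrf=17))
```

Because the comparison uses only values the producer already holds, nothing in this simplified model stops it from being evaluated before the fork is created.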

kiwipool · 2024-12-07T09:46:17Z

Hi

Here's some data from our onsite (our own site), NZ-based baremetal operation for comparison purposes. With the possible exception of the 46S stakepool based in Invercargill, we are likely to be the most remote stakepool in the ecosystem. We are never going to win any low-latency awards from our location.

We run high-specification enterprise-grade servers connected via gigabit fibre for our NZ relays and NZ primary BPs. In addition to NZ-based baremetal, we operate cloud relays spread around the world with reputable providers on decent hardware. Our 'Plan B' cloud-based failover system is also very high specification and produces significantly lower latency numbers than our NZ-based baremetal. We choose to run our primary system on baremetal in NZ for philosophical, rather than performance, reasons.

See the summarized pooltool data below for the last 50 epochs (E476-525) for KIWI:
ID: 60397646d7d1ad6fe2ddccfe7efc9cba61f6d3d94d29e8f41de73240

Slots: 1,699
Height Battles: 16
Height Battle Wins: 4
Height Battle Losses: 12
Height Battles as % of Slots: 0.94%
Height Battle Wins as % of Slots: 0.24%
Height Battle Losses as % of Slots: 0.71%
Height Battle Wins as % of Height Battles: 25.00%
Height Battle Losses as % of Height Battles: 75.00%
Average Height Battles per Epoch: 0.32
Average Height Battle Wins per Epoch: 0.08
Average Height Battle Losses per Epoch: 0.24

EDIT:
Combined Height + Slot Battle Loss as % of Slots: 3.06%
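
For anyone wanting to reproduce the derived height-battle figures from the raw counts above, a short sketch follows. The function name and dictionary keys are illustrative; the inputs are the slot, battle, win and loss counts quoted in this comment, over the 50 epochs E476-525.

```python
def battle_summary(slots: int, battles: int, wins: int, losses: int,
                   epochs: int) -> dict:
    """Derive percentage and per-epoch figures from raw height-battle counts."""
    return {
        "battles_pct_of_slots": 100 * battles / slots,
        "wins_pct_of_slots": 100 * wins / slots,
        "losses_pct_of_slots": 100 * losses / slots,
        "wins_pct_of_battles": 100 * wins / battles,
        "losses_pct_of_battles": 100 * losses / battles,
        "battles_per_epoch": battles / epochs,
        "wins_per_epoch": wins / epochs,
        "losses_per_epoch": losses / epochs,
    }


summary = battle_summary(slots=1_699, battles=16, wins=4, losses=12, epochs=50)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The combined height and slot battle figure cannot be recomputed this way because the slot battle count is not given.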

Anecdotally we appear to be experiencing more height battles in the dynamic-p2p era

Hopefully this is useful for comparison purposes.

Matticus
🥝Kiwipool Staking

TerminadaDrep · 2024-12-08T06:44:03Z

Is it reasonable to expect that geographically remote pools must use high QoS guaranteed priority fibre plans whereas the majority in USA/EU can use the "normal" internet? Or is Ouroboros expected to function fairly with everyone using the "normal" internet?

rphair

@TerminadaDrep we discussed this at today's CIP meeting on Discord. The consensus was that:

• the title & scope need to be more specifically defined as "fairness" relative to geographical limitations, according to the observations that you've pointed out;
• the CPS must be targeted towards specific goals to alleviate those discrepancies.

Here are the recommendations that came up:

• The current title Block Delay Centralisation is ambiguous and maybe not accurate (since the delays are at perimeter, not the centre, of the network topology). We should agree upon a CPS title that states "fairness" as something currently difficult to achieve for nodes on the perimeter of the Cardano network.

• Given that a huge component of node propagation delays derive from the speed of light and a currently unchangeable TCP/IP stack, we need the CPS to be written so CIPs that fix whatever is fixable in Cardano can be attached to your problem statement: otherwise it might as well be an issue in the node repository issue queue (see further below).

• Other than altering the consensus mechanism, the only optimisations Cardano can therefore make are to address network inefficiencies. @colll78 indicated that perhaps a 20-fold improvement in network efficiency is forthcoming, and I believe this CPS should make it possible to link any such improvements on the core roadmap to the symptoms of the problem you identify.

• Extending the slot duration to accommodate network propagation times was seen as a "brute force" solution that would have dramatic effects on the Consensus protocol, and therefore less promising than identifying points of research for improvements in network performance (hence also the category change highlighted below).

• Therefore the reviewers concluded the CPS should focus on what needs to be investigated, and then remediated, to improve fairness of Cardano PoS performance for nodes at the topological boundaries of the network.

My own question was (since I know you've been working steadily on getting visibility for these issues for a couple of years): do you have any Consensus or Node repository issues about these so far? If so, I think the issues + any responses would help to fill out the proposal along these lines... and should be linked in the CPS and/or the discussion here.


fallen-icarus · 2024-12-10T19:45:24Z

Extending the slot duration to accommodate network propagation times was seen as a "brute force" solution that would have dramatic effects on the Consensus protocol, and therefore less promising than identifying points of research for improvements in network performance (hence also the category change highlighted below).

Apologies for missing the meeting, but increasing the slot duration should still be seriously considered despite any required changes. According to the discussion in the consensus working group on discord, the designers did discuss 1 second vs 2 second slot lengths but they didn't really have any hard evidence to prefer 2 seconds. So they went with 1 second since it caused fewer slot battles.

Now that we have actual geographical data, there may be hard evidence to prefer 2 seconds. I'm not saying other avenues shouldn't be explored, but the above wording makes it seem like the slot duration option was downplayed too much in the meeting.

rphair · 2024-12-11T12:03:15Z

cross-referencing a vital post from @karknu on network options, limitations & potential workarounds here: https://forum.cardano.org/t/problem-with-increasing-blocksize-or-processing-requirements/140044/7

Terminada-CPS-block-delay-centralisation

fdb3b0d

rphair changed the title from ~~Terminada-CPS-block-delay-centralisation~~ to CPS-???? | Block Delay Centralisation on Dec 3, 2024

rphair reviewed Dec 3, 2024

rphair added the labels "State: Triage" (Applied to new PR after editor cleanup on GitHub, pending CIP meeting introduction.) and "Category: Consensus" (Proposals belonging to the `Consensus` category.) on Dec 3, 2024

rphair added 4 commits December 3, 2024 17:50

convert authors to mandated list format

64dec3e

put Discussions in list format, set current discussion

cb416da

remove superfluous H1 document title

e7520b9

fix indentation level of responses in FAQ section

f9cf028

rphair reviewed Dec 3, 2024

SmaugPool reviewed Dec 3, 2024

Update CPS-XXXX/README.md

8d96ae1

Co-authored-by: Robert Phair <rphair@cosd.com>

SmaugPool reviewed Dec 3, 2024

TerminadaDrep and others added 3 commits December 3, 2024 23:40

Update CPS-XXXX/README.md

2bfbc81

Co-authored-by: Robert Phair <rphair@cosd.com>

Update CPS-XXXX/README.md

8381ac1

Co-authored-by: Robert Phair <rphair@cosd.com>

Added example

c63ce0d

builder and others added 5 commits December 4, 2024 22:12

Extra examples and argument against

b131ea2

fix typo

d4a8dae

fix formatting

e7bf9b4

Example of fork battle

7eef46e

Fix email address

2eff5b1

rphair reviewed Dec 10, 2024

likely only possible changes will be in network behaviour (not consensus)

d9ec47a

rphair added the label "Category: Network" (Proposals belonging to the `Network` category.) and removed the label "Category: Consensus" (Proposals belonging to the `Consensus` category.) on Dec 10, 2024

rphair added the label "State: Unconfirmed" (Triaged at meeting but not confirmed (or assigned CIP number) yet.) and removed the label "State: Triage" (Applied to new PR after editor cleanup on GitHub, pending CIP meeting introduction.) on Dec 10, 2024
