
Add read balancer info #4

Open
JoshSalomon opened this issue Jun 22, 2024 · 5 comments

Comments

@JoshSalomon

Hi JJ - great page!
I believe it is worth adding information about the read balancer, especially since the Squid version will support OSDs of different sizes. Would you like to work with @ljflores and me on it?

@TheJJ
Owner

TheJJ commented Jun 22, 2024

sure, what do you have in mind?

@JoshSalomon
Author

Wondering where to start: Have you heard anything about the read balancer (available since Reef)?

@TheJJ
Owner

TheJJ commented Jun 25, 2024

Yes, I saw the initial presentation slides and wondered how it compares to my balancer, but I didn't prioritize just remapping primaries so far; I think this can be added too.
I haven't used it in a production cluster yet.

Thinking about balancers, it seems the whole CRUSH approach may not be ideal after all; an efficient pg->osd mapping lookup table is probably suitable for nearly all clusters. Then we wouldn't have to fight CRUSH with one hack after another to nudge it toward the desired mapping, instead of (re)mapping it directly.
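To make the lookup-table idea concrete, here is a minimal sketch (my own illustration, not Ceph code): instead of deriving placement from CRUSH and then patching it with upmap entries, the desired pg -> osd mapping is stored directly, and changing a primary is just reordering a list. The pool name "rbd" and the OSD ids are hypothetical.

```python
# Hypothetical direct pg -> acting-set table; first OSD in each list is the primary.
pg_map = {
    ("rbd", 0): [3, 7, 12],   # pg 0 of a made-up "rbd" pool
    ("rbd", 1): [7, 3, 9],
}

def acting_set(pool, pg):
    """Placement is a plain dictionary lookup, no CRUSH computation."""
    return pg_map[(pool, pg)]

def remap_primary(pool, pg, new_primary):
    """Metadata-only change: reorder the acting set so new_primary leads."""
    osds = pg_map[(pool, pg)]
    osds.remove(new_primary)
    pg_map[(pool, pg)] = [new_primary] + osds

remap_primary("rbd", 0, 7)   # osd 7 becomes primary for pg rbd.0
```

The trade-off, of course, is that the table has to be distributed and kept consistent, which is exactly what CRUSH's computed placement avoids.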

@JoshSalomon
Author

JoshSalomon commented Jun 26, 2024

Ceph improved read balancer.pdf
If I understand correctly, your balancer is a capacity balancer, not a read balancer - but I only heard your presentation a while ago and did not dive into the code.
The read balancer is a metadata-only operation: it does not move data, so it is a completely different approach and cheap enough to run continuously (more on this later).
The first version (in Reef) just makes sure each OSD gets its fair share of primaries (the read balancer works only on replicated pools, so in each OSD we try to have pg_num/replica_num primaries). Obviously we check this against CRUSH constraints.
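The fairness criterion above can be sketched as follows (my interpretation of the description, not the actual Ceph implementation): each OSD that stores replicas of N PGs should be primary for roughly N / replica_num of them, and the balancer's job is to shrink the deviations.

```python
from collections import Counter

def primary_imbalance(pg_to_osds):
    """pg_to_osds: list of acting sets; the first OSD in each set is the primary.
    Returns, per OSD, (actual primaries) - (fair share of primaries)."""
    held = Counter()       # how many PGs each OSD stores a replica of
    primaries = Counter()  # how many PGs each OSD is primary for
    for acting in pg_to_osds:
        primaries[acting[0]] += 1
        for osd in acting:
            held[osd] += 1
    replica_num = len(pg_to_osds[0])
    # positive -> OSD leads more PGs than its fair share, negative -> fewer
    return {osd: primaries[osd] - held[osd] / replica_num for osd in held}

# Toy map: 4 PGs, 3 OSDs, replica 3; osd 0 is primary for 3 of the 4 PGs.
imbalance = primary_imbalance([[0, 1, 2], [0, 2, 1], [1, 0, 2], [0, 1, 2]])
```

In this toy map, osd 0 comes out over its fair share and osd 2 under it, so a metadata-only primary swap between them would move the map toward balance.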
In Squid, we added functionality that improves cluster performance when the devices are not all the same size. A new pool parameter holds the read_ratio of the IOs to the pool (70 means that 70% of the IOs to the pool are reads and 30% are writes). With this information we can move more reads to the smaller devices and let the larger devices handle fewer reads, so that IOPS per OSD are balanced (assuming the devices have the same performance profile). In the future, we may calculate the read ratio automatically from metrics and make it adaptive (I am not sure this is needed, but it would be easy to implement).
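A toy model of the arithmetic behind this (my own sketch under simplifying assumptions, not the actual algorithm from the presentation): writes hit every replica and are spread in proportion to OSD capacity, while reads are served only by primaries, so the read share of each OSD is the free variable we can use to equalize total device IOPS.

```python
def read_targets(capacities, read_ratio, replica_num, total_ops=1000.0):
    """Return the read IOPS each OSD should serve so that every OSD ends up
    with the same total (read + write) device IOPS. Toy model only; a very
    large device can even come out with a negative target, meaning perfect
    balance is unreachable by shifting reads alone."""
    reads = total_ops * read_ratio            # e.g. read_ratio=0.7 -> 70% reads
    writes = total_ops * (1 - read_ratio)     # client write ops
    device_write_ops = writes * replica_num   # each write lands on every replica
    per_osd = (reads + device_write_ops) / len(capacities)
    total_cap = sum(capacities)
    # write load on an OSD is proportional to its capacity share;
    # reads fill the remaining budget up to the equal per-OSD total
    return [per_osd - device_write_ops * c / total_cap for c in capacities]

# Two small OSDs and one twice as large, 70% reads, 3x replication:
targets = read_targets([1.0, 1.0, 2.0], read_ratio=0.7, replica_num=3)
```

The small devices come out with a larger read target than the big one, matching the intuition above: the big device already absorbs more write IOPS, so the reads should go elsewhere.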
Attached is a presentation explaining the model behind this balancer, with some examples.
If you think this is worth mentioning, Laura and I can open a PR adding the explanation to this Ceph guide.

@TheJJ
Owner

TheJJ commented Jul 7, 2024

Yes, sure, you can add a short section about it :)
