How to uniformly distribute data in all shards? #7535
PG: 15

Hi all, we have a large table distributed on a date column (event_date) with a shard count of 128. We checked the shard sizes, and the data is not spread uniformly across the shards: some shards are fat, holding more than 100GB, while many hold very little data. The data volume for each day is roughly the same, so we expected all shards to end up with similar sizes rather than being this skewed. The skew hurts query latency whenever a query hits a fat shard. Can someone let us know how to redistribute the data uniformly among the shards?
Replies: 1 comment
Hi @tu5har, Citus hashes the distribution column value to decide which shard a row belongs to, so every row with the same event_date lands on the same shard. Because a date column has relatively few distinct values, it is expected that some shards receive more days, and therefore more data and query load, than others. If you would like data and query load to be spread evenly across your shards with a date-type distribution column, we suggest using the timeseries feature in Citus. You can find more details at https://docs.citusdata.com/en/stable/use_cases/timeseries.html#timeseries-data.
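To make the skew concrete, here is a minimal Python sketch of hashing one year of dates into 128 buckets. It is only an illustration under assumptions: `shard_for` and `days_per_shard` are hypothetical helpers, SHA-256 with a modulo stands in for Citus's own hash function and hash-range mapping (which are different), and the start date and 365-day window are made up.

```python
import hashlib
from collections import Counter
from datetime import date, timedelta

SHARD_COUNT = 128  # shard count from the question


def shard_for(value: date, shard_count: int = SHARD_COUNT) -> int:
    """Map a date to a shard index with a generic hash.

    Assumption: SHA-256 plus modulo is a stand-in, NOT Citus's real hash.
    """
    digest = hashlib.sha256(value.isoformat().encode()).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def days_per_shard(start: date, num_days: int,
                   shard_count: int = SHARD_COUNT) -> Counter:
    """Count how many distinct days land on each shard (zeros included)."""
    counts = Counter({shard: 0 for shard in range(shard_count)})
    for offset in range(num_days):
        day = start + timedelta(days=offset)
        counts[shard_for(day, shard_count)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical window: one year of daily data starting 2023-01-01.
    counts = days_per_shard(date(2023, 1, 1), 365)
    print("days on the busiest shard:", max(counts.values()))
    print("shards with no data:", sum(1 for c in counts.values() if c == 0))
```

If each day carries roughly the same volume, a shard's size is proportional to the number of days hashed to it. With only a few hundred distinct dates spread over 128 shards, those per-shard day counts cannot all be equal, which matches the fat and near-empty shards described in the question.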