Kafka's Storage‐Compute Separation Architecture: Offloading Storage to Ceph
Redesigning streaming systems around cloud object storage such as S3 has become an industry consensus. In recent years, numerous innovations based on object storage have emerged within the Apache Kafka ecosystem, including AutoMQ's shared storage architecture based on EBS and S3, Confluent's tiered storage, WarpStream's direct-write-to-S3 architecture, and Redpanda's shadow indexing. These storage architectures not only significantly reduce costs by migrating data to distributed object storage like S3 but also simplify the architecture of Kafka streaming systems and enhance their elasticity.
Our project, AutoMQ, uses a shared storage architecture based on S3 and EBS, which has proven to offer exceptional cost-effectiveness and elasticity in cloud environments. In private data centers, this storage architecture can likewise yield Kafka streaming systems with low latency, low cost, high throughput, and strong elasticity. Ceph can serve not only as low-latency block storage but also as a cost-effective object storage service. If you have deployed Ceph in your environment, this tutorial will guide you through building a Kafka streaming system in your private data center that offloads storage to Ceph, achieving an optimal balance of latency, cost, and resilience.
Tips: AutoMQ is a cloud-native fork of Kafka that reinvents Kafka's storage layer with a shared storage architecture. Therefore, you can regard AutoMQ simply as an enhanced Kafka streaming system.
AutoMQ uses EBS and S3 for storage, while Ceph supports both POSIX and S3 access protocols, making it an ideal storage backend for AutoMQ. Below is a guide to deploying AutoMQ on Ceph.
- For installing Ceph, refer to: https://docs.ceph.com/en/latest/install/
- For installing Ceph's S3-compatible component RGW, refer to: https://docs.ceph.com/en/latest/cephadm/services/rgw/
- For guidance on mounting raw devices on Linux hosts, refer to the Ceph RBD documentation: https://docs.ceph.com/en/latest/rbd/
- Configure the raw device path to /dev/vdb.
- AutoMQ stores WAL data on a raw device at a specified path. This can be configured with the startup parameter `--override s3.wal.path=/dev/vdb`.
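If /dev/vdb is not a locally attached disk, one way to provision the raw WAL device is to map a Ceph RBD image as a block device on the host. A minimal sketch, assuming the default `rbd` pool; the image name `automq-wal` and the 20 GiB size are illustrative, and the device name printed by `rbd map` (e.g. /dev/rbd0) is what you would then pass to `s3.wal.path`:

```shell
# Create a 20 GiB RBD image in the default pool (size is in MiB).
rbd create --size 20480 rbd/automq-wal

# Map the image on this host; prints the resulting block device, e.g. /dev/rbd0.
sudo rbd map rbd/automq-wal

# Confirm which images are mapped to which devices.
rbd showmapped
```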
Create an RGW user for AutoMQ:

```shell
radosgw-admin user create --uid="automq" --display-name="automq"
```

By default, users created this way have the full set of permissions AutoMQ requires. For reduced permissions, consult the Ceph official documentation for customized settings. Executing the command above produces output like the following:
```json
{
    "user_id": "automq",
    "display_name": "automq",
    "email": "",
    "suspended": 0,
    "max_buckets": 1000,
    "subusers": [],
    "keys": [
        {
            "user": "automq",
            "access_key": "X1J0E1EC3KZMQUZCVHED",
            "secret_key": "Hihmu8nIDN1F7wshByig0dwQ235a0WAeUvAEiWSD"
        }
    ],
    "swift_keys": [],
    "caps": [],
    "op_mask": "read, write, delete",
    "default_placement": "",
    "default_storage_class": "",
    "placement_tags": [],
    "bucket_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "user_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": -1,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "temp_url_keys": [],
    "type": "rgw",
    "mfa_ids": []
}
```
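If you script the setup, the credentials can be pulled straight out of the radosgw-admin output. A minimal sketch: `user.json` stands in for the JSON above (in practice you would capture it with `radosgw-admin user info --uid=automq`), and `sed` extracts the key pair:

```shell
# Stand-in for: radosgw-admin user info --uid=automq > user.json
# (abbreviated to the fields needed here)
cat > user.json <<'EOF'
{
    "user_id": "automq",
    "keys": [
        {
            "user": "automq",
            "access_key": "X1J0E1EC3KZMQUZCVHED",
            "secret_key": "Hihmu8nIDN1F7wshByig0dwQ235a0WAeUvAEiWSD"
        }
    ]
}
EOF

# Pull the first key pair out of the JSON.
ACCESS_KEY=$(sed -n 's/.*"access_key": *"\([^"]*\)".*/\1/p' user.json)
SECRET_KEY=$(sed -n 's/.*"secret_key": *"\([^"]*\)".*/\1/p' user.json)
echo "$ACCESS_KEY"
```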
- Set environment variables to configure the Access Key and Secret Key required by the AWS CLI.

```shell
export AWS_ACCESS_KEY_ID=X1J0E1EC3KZMQUZCVHED
export AWS_SECRET_ACCESS_KEY=Hihmu8nIDN1F7wshByig0dwQ235a0WAeUvAEiWSD
```
- Use the AWS CLI to create an S3 bucket.

```shell
aws s3api create-bucket --bucket automq-data --endpoint=http://127.0.0.1:80
```
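The S3 URL parameters below also reference an ops bucket (`automq-ops`); it can be created the same way, and listing the buckets confirms both are reachable through RGW. A sketch assuming the same endpoint and credentials as above:

```shell
# Create the ops bucket referenced by --s3-ops-bucket.
aws s3api create-bucket --bucket automq-ops --endpoint=http://127.0.0.1:80

# List all buckets visible through the RGW endpoint to verify both exist.
aws s3api list-buckets --endpoint=http://127.0.0.1:80
```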
Below are the essential parameters needed to generate an S3 URL:

| Parameter Name | Default Value in This Example | Description |
|---|---|---|
| --s3-access-key | X1J0E1EC3KZMQUZCVHED | Replace with the access key of the Ceph user you created |
| --s3-secret-key | Hihmu8nIDN1F7wshByig0dwQ235a0WAeUvAEiWSD | Replace with the secret key of the Ceph user you created |
| --s3-region | us-west-2 | This parameter has no effect with Ceph; it can be set to any value, such as us-west-2 |
| --s3-endpoint | http://127.0.0.1:80 | The address served by Ceph's S3-compatible component RGW. If there are multiple machines, it is recommended to use a load balancer (SLB) to consolidate them behind one IP address. |
| --s3-data-bucket | automq-data | - |
| --s3-ops-bucket | automq-ops | - |
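For convenience, the parameters in the table can be collected into a single flag string for the S3 URL generation step. A minimal sketch with the example values from this tutorial; replace the credentials with those of your own RGW user:

```shell
# Flags from the table above, ready to append to the S3 URL generation command.
S3_ARGS="--s3-access-key=X1J0E1EC3KZMQUZCVHED \
--s3-secret-key=Hihmu8nIDN1F7wshByig0dwQ235a0WAeUvAEiWSD \
--s3-region=us-west-2 \
--s3-endpoint=http://127.0.0.1:80 \
--s3-data-bucket=automq-data \
--s3-ops-bucket=automq-ops"
echo "$S3_ARGS"
```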
Having set up the WAL and S3 URL, you can now move forward with deploying AutoMQ. Please follow the detailed guidelines provided in Cluster Deployment on Linux.