This is the implementation repository of our SOSP'24 paper: Aceso: Achieving Efficient Fault Tolerance in Memory-Disaggregated Key-Value Stores.
This artifact provides the source code of Aceso and scripts to reproduce the experiment results.
This README is specifically for artifact evaluation (AE).
Aceso's current implementation requires at least 6 nodes, with 5 of them set up as memory nodes (MNs) and the remaining as compute nodes (CNs).
We recommend running Aceso on CloudLab with c6220 instances as MNs and r320 instances as CNs, the configuration on which the code has been thoroughly tested.
We have reserved 6 c6220 nodes and 14 r320 nodes on CloudLab from Aug 19th to Sep 9th for AE, with the first 5 c6220 nodes as MNs and the remaining as CNs.
Please coordinate among all AE reviewers to ensure that only one person uses the cluster at a time, or the scripts may fail to run.
- You can simply use the provided account to access our reserved c6220 and r320 nodes on CloudLab.
  - We have provided our account (username: hzs) on the artifact submission site. Contact us if you don't know the password.
  - Log into the provided account on CloudLab, then submit the SSH public key of your personal computer via hzs |--> Manage SSH keys.
  - After that, you will see an experiment named Aceso-AE under Experiments |--> My Experiments.
  - Reboot all nodes in the cluster to load your submitted public key via Aceso-AE |--> List View |--> Reboot Selected (this takes about 5 minutes).
  - Now you can log into all 20 nodes with the SSH command shown in List View. If some nodes have broken shells (which sometimes happens after rebooting in CloudLab), reboot them again via List View |--> Reboot Selected.
!!! Skip this step if you are using our created cluster with the provided account.
- Open CloudLab and log in.
- Click Experiments |--> Create Experiment Profile |--> Upload File. Upload ./setup/cloudlab.profile provided in this repo.
- (Optional) Click Edit Topology to customize the cluster structure (e.g., reduce the number of nodes). Click on the cluster's central node in the topology to customize the IP address of each node, ensuring they are all of the form 10.10.1.x. Click Accept to save the topology for this profile.
- Click Create to save this profile in your account.
- Click Instantiate to create a cluster using the profile.
- Try logging into each node and checking it, using the SSH commands provided in the List View.
!!! Skip this step if you are using our created cluster with the provided account.
Note that we should run the following steps on all nodes we have created.
- Log into a node.
- Copy the source code of ./setup/setup-sda4.sh from GitHub to a new file ~/setup-sda4.sh on the node.

      cd ~
      vim setup-sda4.sh
      <do the copy>
- Mount the disk sda4 to the ~/home directory. The ~ directory currently has very limited space, so we need to mount the unallocated disk sda4 to the ~/home directory.

      cd ~
      # Don't use `sudo`
      bash setup-sda4.sh
- Download Aceso's repo and name it aceso.

      cd ~/home
      git clone https://github.com/dmemsys/Aceso.git aceso

- Enter the Aceso directory. Install libraries and tools.

      cd ~/home/aceso/setup
      # This takes about 30 minutes
      sudo bash setup-env.sh
      # This takes about 6 minutes
      sudo reboot
      cd ~/home/aceso/setup
      sudo bash setup-pylibs.sh
      # allocate an RDMA IP `10.10.10.x` from the ethernet IP `10.10.1.x`.
      sudo bash setup-ipoib.sh
      sudo reboot
- Check if the RNIC is working, and discard all nodes whose RNIC is not working. Unfortunately, we sometimes find that 1 or 2 nodes have their RNIC disabled, probably due to physical issues.

      ibstat
      # if the RNIC is working, it will show:
      # Port 1:
      #   State: Active
      #   Physical state: LinkUp
      #   Rate: 56
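When many nodes have to be checked, the ibstat inspection above can be scripted. The following is a minimal Python sketch, assuming the output contains `State:` and `Physical state:` lines per port as shown above. The function names `rnic_is_up` and `check_local_rnic` are illustrative and not part of Aceso.

```python
import subprocess


def rnic_is_up(ibstat_output: str) -> bool:
    """Return True if any port reports State: Active and Physical state: LinkUp."""
    state = None
    for raw in ibstat_output.splitlines():
        line = raw.strip()
        if line.startswith("Port ") and line.endswith(":"):
            state = None  # a new port section begins
        elif line.startswith("State:"):
            state = line.split(":", 1)[1].strip()
        elif line.startswith("Physical state:"):
            physical = line.split(":", 1)[1].strip()
            if state == "Active" and physical == "LinkUp":
                return True
    return False


def check_local_rnic() -> bool:
    """Run ibstat on this node and evaluate its output."""
    result = subprocess.run(
        ["ibstat"], capture_output=True, text=True, check=True
    )
    return rnic_is_up(result.stdout)
```

Calling `check_local_rnic()` on a node returns `False` when the RNIC is disabled, in which case the node should be discarded as described above.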
- Modify Aceso's Common.h in ./src. Make sure the parameters memoryNodeNum, memoryIPs, and memcachedIP are correct. Example:

      constexpr uint32_t memoryNodeNum = 5;    // [CONFIG]
      constexpr char memoryIPs[16][16] = {     // [CONFIG]
        "10.10.10.1",
        "10.10.10.2",
        "10.10.10.3",
        "10.10.10.4",
        "10.10.10.5",
      };
      constexpr char memcachedIP[16] = "10.10.10.1";    // [CONFIG]
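If the cluster layout differs from the example, the three [CONFIG] values can be derived from the ethernet IPs of the memory nodes. The sketch below is only an illustration under two assumptions: that the RDMA IP keeps the last octet of the ethernet IP (10.10.1.x becomes 10.10.10.x, as the setup-ipoib.sh comment suggests), and that memcached runs on the first memory node, as in the example. The helpers `rdma_ip` and `render_memory_config` are hypothetical names, not part of the repo.

```python
import ipaddress
from typing import List


def rdma_ip(eth_ip: str) -> str:
    """Map an ethernet IP 10.10.1.x to the RDMA IP 10.10.10.x (assumed: same x)."""
    octets = str(ipaddress.IPv4Address(eth_ip)).split(".")
    if octets[:3] != ["10", "10", "1"]:
        raise ValueError(f"expected an address of the form 10.10.1.x, got {eth_ip}")
    return f"10.10.10.{octets[3]}"


def render_memory_config(mn_eth_ips: List[str]) -> str:
    """Render the [CONFIG] lines of Common.h for the given memory nodes."""
    if not 1 <= len(mn_eth_ips) <= 16:
        raise ValueError("memoryIPs holds between 1 and 16 entries")
    ips = [rdma_ip(ip) for ip in mn_eth_ips]
    entries = "\n".join(f'  "{ip}",' for ip in ips)
    return (
        f"constexpr uint32_t memoryNodeNum = {len(ips)};\n"
        f"constexpr char memoryIPs[16][16] = {{\n{entries}\n}};\n"
        # Assumption: memcached runs on the first memory node.
        f'constexpr char memcachedIP[16] = "{ips[0]}";\n'
    )
```

The returned text is meant to be pasted into Common.h by hand; the sketch does not edit the file itself.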
- Build Aceso.

      cd ~/home/aceso && mkdir build && cd build
      cmake .. && make -j

- Set Aceso's server_id for each node.

      cd ~/home/aceso/build/major-test
      vim config.json

  Modify config.json to the following.

      {
        "server_id": 0
      }

  Note that the server_id parameter of the i-th node should be i. The IDs must start from 0 and be contiguous.
- Set huge pages.

      # on each memory node (MN)
      echo 28000 | sudo tee /proc/sys/vm/nr_hugepages

- Start servers.

      # We use memcached for:
      #  1. RDMA initialization;
      #  2. synchronization of clients during experiments;
      # on each memory node (MN)
      cd ~/home/aceso/src && ./run_memcached.sh
      cd ~/home/aceso/build/major-test && ./server

- Set huge pages.

      # on each compute node (CN)
      echo 4000 | sudo tee /proc/sys/vm/nr_hugepages
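To see how much memory these huge page counts reserve, the arithmetic can be sketched as follows. The sketch assumes the default 2 MiB huge page size of x86-64 Linux; the actual size on a node is reported by the Hugepagesize line of /proc/meminfo. Both function names are illustrative.

```python
def hugepage_reservation_gib(nr_hugepages: int, hugepage_kib: int = 2048) -> float:
    """Memory reserved by nr_hugepages pages of hugepage_kib KiB each, in GiB."""
    return nr_hugepages * hugepage_kib / (1024 * 1024)


def read_hugepage_kib(meminfo_text: str) -> int:
    """Extract the Hugepagesize value (in kB) from the text of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("Hugepagesize:"):
            return int(line.split()[1])
    raise ValueError("Hugepagesize not found")


# Assumption: 2 MiB pages, so 28000 pages on an MN and 4000 pages on a CN.
print(f"MN: {hugepage_reservation_gib(28000):.1f} GiB")
print(f"CN: {hugepage_reservation_gib(4000):.1f} GiB")
```

Under that assumption, the MN setting reserves about 54.7 GiB and the CN setting about 7.8 GiB, so nodes with less free memory than this cannot satisfy the requests.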
- Start clients.

      # on each compute node (CN)
      cd ~/home/aceso/build/major-test && ./client_perf <workload_name> <CN_num> <client_num_per_CN> <coro_num_per_client>

  - workload_name: the name of the workload to test.
    - microbenchmarks: insert, update, search, delete.
    - macrobenchmarks: workload[a|b|c|d], workloadupd[0|10|...100], workloadtwi[s|c|t].
  - CN_num: the number of compute nodes (CNs).
  - client_num_per_CN: the number of clients in each CN.
  - coro_num_per_client: the number of coroutines in each client.

  Example: try the microbenchmark test (UPDATE)

      # on each compute node (CN)
      ./client_perf update 15 8 8

  Note that the microbenchmarks of Aceso can run without extra workload files, while the workloads of the macrobenchmarks (e.g., YCSB) need to be prepared first.
- Results.

  Throughputs (ops/s) will be displayed on each client terminal, for example:

      total tpt: 562314

  Latencies (µs) will be collected in files under the ./build/major-test/results directory, such as micro_update_lat.txt and micro_search_lat.txt, where each line i j indicates that a latency of i µs appeared j times. Note that the latency files are generated only when the parameter coro_num_per_client is set to 1.
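Because each latency file is a histogram rather than a list of samples, summary statistics need to be weighted by the counts. The following minimal Python sketch shows one way to do that, assuming whitespace-separated integer pairs `i j` per line as described above. The function names are illustrative and not part of Aceso's scripts.

```python
from typing import Dict, Iterable


def parse_latency_histogram(lines: Iterable[str]) -> Dict[int, int]:
    """Parse lines of the form 'i j' into {latency_us: count}."""
    hist: Dict[int, int] = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        latency_us, count = int(parts[0]), int(parts[1])
        hist[latency_us] = hist.get(latency_us, 0) + count
    return hist


def mean_latency(hist: Dict[int, int]) -> float:
    """Count-weighted mean latency in microseconds."""
    total = sum(hist.values())
    if total == 0:
        raise ValueError("histogram is empty")
    return sum(latency * count for latency, count in hist.items()) / total


def percentile(hist: Dict[int, int], p: float) -> int:
    """Smallest latency whose cumulative count reaches p percent of all samples."""
    total = sum(hist.values())
    if total == 0:
        raise ValueError("histogram is empty")
    threshold = p / 100 * total
    cumulative = 0
    for latency in sorted(hist):
        cumulative += hist[latency]
        if cumulative >= threshold:
            return latency
    return max(hist)
```

A typical use is `hist = parse_latency_histogram(open("micro_update_lat.txt"))` followed by `percentile(hist, 99)` to obtain the 99th-percentile latency in µs.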
!!! Skip this step if you are using our created cluster with the provided account.
- Download all the testing workloads using the download-<xxx>-workload.sh scripts in the ./setup directory and unpack the workloads.

      cd ~/home/aceso/setup
      sudo bash download-ycsb-workload.sh
      sudo bash download-twitter-workload.sh

  Note that the ycsb-full workloads contain 1 million keys, while the ycsb workloads contain 0.1 million. For artifact evaluation, we recommend using the ycsb workloads because they significantly reduce the initial loading time without affecting Aceso's performance.

- Execute the following commands in the ./setup directory to split the workloads into N parts (N is the total number of client threads):

      python3 split-ycsb-workload.py [N]
      python3 split-twitter-workload.py [N]

- Copy the split workloads to ./build/major-test/workloads.

      cd ~/home/aceso
      mkdir -p ./build/major-test/workloads/
      cp ./setup/workloads/* ./build/major-test/workloads/
      cp ./setup/twi-workloads/* ./build/major-test/workloads/

Then we can start testing Aceso using macrobenchmarks.
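The value of N used for splitting has to match the later client_perf invocation. As a hedged sketch, the snippet below assumes that the total number of client threads equals CN_num multiplied by client_num_per_CN and that coroutines do not count as separate threads; this is our reading of the parameter descriptions, not something the scripts confirm. The helper names are hypothetical.

```python
from typing import List


def total_client_threads(cn_num: int, client_num_per_cn: int) -> int:
    """Assumed N: one client thread per client on every compute node."""
    if cn_num < 1 or client_num_per_cn < 1:
        raise ValueError("both arguments must be positive")
    return cn_num * client_num_per_cn


def split_commands(cn_num: int, client_num_per_cn: int) -> List[str]:
    """Build the two split commands for the given cluster shape."""
    n = total_client_threads(cn_num, client_num_per_cn)
    return [
        f"python3 split-ycsb-workload.py {n}",
        f"python3 split-twitter-workload.py {n}",
    ]


# Example: 15 CNs with 8 clients each, as in the UPDATE microbenchmark run.
for command in split_commands(15, 8):
    print(command)
```

With 15 CNs and 8 clients per CN, this yields N = 120 under the stated assumption.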
We provide code and scripts in ./exp for reproducing our experiments. For more details, see ./exp/README.md.
Part of Aceso's codebase comes from FUSEE, SMART, and Sherman.
The automatic testing scripts in ./exp are based on the work done in SMART by Xuchuan. Special thanks for his contributions!