diff --git a/README.md b/README.md
index 001c3f5..5ae147f 100644
--- a/README.md
+++ b/README.md
@@ -19,6 +19,7 @@ Cleanba is CleanRL's implementation of DeepMind's Sebulba distributed training f
 **Scalable**: We can scale to N+ GPUs allowed by `jax.distributed` and memory (e.g., it can run with 16 GPUs). This makes cleanba suited for large-scale distributed training tasks such as RLHF.
 
+**Understandable**: We adopt the single-file implementation philosophy used in CleanRL, making our core codebase succinct and easy to understand. For example, our `cleanba/cleanba_ppo_envpool_impala_atari_wrapper.py` is ~800 lines of code.
 
@@ -139,4 +140,4 @@ To improve efficiency of Cleanba, we uses JAX and EnvPool, both of which are des
 [Espeholt et al., 2018](https://arxiv.org/abs/1802.01561) did not disclose the hardware usage and runtime for the Atari experiments. We did our best to recover its runtime by interpolating the results from the [R2D2 paper](https://openreview.net/pdf?id=r1lyTjAqYX) and found IMPALA (deep) takes ~2 hours.
 
-![](static/r2d2_impala.png)
\ No newline at end of file
+![](static/r2d2_impala.png)
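
For context on the `jax.distributed` scaling claim in the first hunk above, here is a minimal sketch of how a multi-process JAX cluster is initialized. `jax.distributed.initialize()`, `jax.device_count()`, and `jax.local_device_count()` are real JAX APIs, but the coordinator address, process count, and launch configuration shown are illustrative assumptions, not Cleanba's actual launcher:

```python
# Minimal sketch of multi-process JAX initialization (illustrative only;
# not Cleanba's actual launch code). Each process runs this script with
# its own process_id; JAX then exposes all GPUs across processes as one
# device pool.
import jax

jax.distributed.initialize(
    coordinator_address="10.0.0.1:1234",  # hypothetical coordinator host:port
    num_processes=2,                      # assumed 2-process example cluster
    process_id=0,                         # set per process (0 or 1 here)
)

print(jax.local_device_count())  # GPUs visible to this process
print(jax.device_count())        # total GPUs across all processes
```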