Releases: chainer/chainerrl
v0.8.0
Announcement
This release will probably be the final major update under the name of ChainerRL. The development team is planning to switch its backend from Chainer to PyTorch and continue its development as OSS.
Important enhancements
- Soft Actor-Critic (https://arxiv.org/abs/1812.05905) with benchmark results is added.
  - Agent class: chainerrl.agents.SoftActorCritic
  - Example and benchmark results (MuJoCo): https://github.com/chainer/chainerrl/tree/v0.8.0/examples/mujoco/reproduction/soft_actor_critic
  - Example (Roboschool Atlas): https://github.com/chainer/chainerrl/tree/v0.8.0/examples/atlas
- Trained models of benchmark results are now downloadable. See READMEs of examples.
  - For Atari envs: DQN, IQN, Rainbow, A3C
  - For MuJoCo envs: DDPG, PPO, TRPO, TD3, Soft Actor-Critic
- DQN-based agents now support recurrent models in a new, more efficient interface.
- TRPO now supports recurrent models and batch training.
- A variant of IQN with double Q-learning is added.
  - Agent class: chainerrl.agents.DoubleIQN
  - Example: https://github.com/chainer/chainerrl/tree/v0.8.0/examples/atari/train_double_iqn.py
- IQN now supports prioritized experience replay.
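The double Q-learning idea behind the new IQN variant can be sketched for plain scalar action values as follows. This is a simplified illustration and not ChainerRL's implementation: IQN operates on quantile values rather than one Q-value per action, and the function name and arguments below are hypothetical.

```python
def double_q_target(reward, done, gamma, next_q_online, next_q_target):
    """Return a double Q-learning target for a single transition.

    next_q_online and next_q_target are lists of action values for the
    next state, taken from the online and target networks respectively.
    """
    if done:
        # No bootstrapping from a terminal state.
        return reward
    # Select the greedy action with the online network...
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    # ...but evaluate that action with the target network.
    return reward + gamma * next_q_target[best_action]


target = double_q_target(
    reward=1.0,
    done=False,
    gamma=0.5,
    next_q_online=[0.1, 0.9, 0.3],
    next_q_target=[2.0, 4.0, 8.0],
)
print(target)
```

Here the online values pick action 1, so the target uses the target network's value for action 1 rather than its own maximum.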
Important bugfixes
- The bug that the update of CategoricalDoubleDQN is the same as that of CategoricalDQN is fixed.
- The bug that batch training with N-step or episodic replay buffers does not work is fixed.
- The bug that weight normalization in PrioritizedReplayBuffer with normalize_by_max == 'batch' is wrong is fixed.
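For readers unfamiliar with weight normalization in prioritized experience replay, the sketch below follows the formulation in the prioritized experience replay paper and scales the importance-sampling weights by the largest weight in the sampled batch. That normalize_by_max == 'batch' corresponds to exactly this scaling is an assumption, and the function and argument names are hypothetical.

```python
def importance_weights(probabilities, buffer_size, beta):
    """Importance-sampling weights for a sampled batch.

    probabilities: sampling probability of each transition in the batch.
    buffer_size: number of transitions currently stored.
    beta: exponent controlling how strongly the sampling bias is corrected.
    """
    raw = [(buffer_size * p) ** (-beta) for p in probabilities]
    # Normalize by the largest weight in this batch so weights are at most 1.
    max_weight = max(raw)
    return [w / max_weight for w in raw]


weights = importance_weights([0.5, 0.25, 0.125], buffer_size=8, beta=1.0)
print(weights)
```

Transitions that are sampled more often receive smaller weights, and the rarest transition in the batch receives weight 1.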
Important destructive changes
- Support of Python 2 is dropped. ChainerRL is now only tested with Python 3.5.1+.
- The interface of DQN-based agents to use recurrent models has changed. See the DRQN example: https://github.com/chainer/chainerrl/tree/v0.8.0/examples/atari/train_drqn_ale.py
All updates
Enhancements
- Recurrent DQN families with a new interface (#436)
- Recurrent and batched TRPO (#446)
- Add Soft Actor-Critic agent (#457)
- Code to collect demonstrations from an agent. (#468)
- Monitor with ContinuingTimeLimit support (#491)
- Fix B007: Loop control variable not used within the loop body (#502)
- Double IQN (#503)
- Fix B006: Do not use mutable data structures for argument defaults. (#504)
- Splits Replay Buffers into separate files in a replay_buffers module (#506)
- Use chainer.grad in ACER (#511)
- Prioritized Double IQN (#518)
- Add policy loss to TD3's logged statistics (#524)
- Adds checkpoint frequencies for serial and batch Agents. (#525)
- Add a deterministic mode to IQN for stable tests (#529)
- Use Link.cleargrads instead of Link.zerograds in REINFORCE (#536)
- Use cupyx.scatter_add instead of cupy.scatter_add (#537)
- Avoid cupy.zeros_like with numpy.ndarray (#538)
- Use get_device_from_id since get_device is deprecated (#539)
- Releases trained models for all reproduced agents (#565)
Documentation
- Typo fix in Replay Buffer Docs (#507)
- Fixes typo in docstring for AsyncEvaluator (#508)
- Improve the algorithm list on README (#509)
- Add Explorers to Documentation (#514)
- Fixes syntax errors in ReplayBuffer docs. (#515)
- Adds policies to the documentation (#516)
- Adds demonstration collection to experiments docs (#517)
- Adds List of Batch Agents to the README (#543)
- Add documentation for Q-functions and some missing details in docstrings (#556)
- Add comment on environment version difference (#582)
- Adds ChainerRL Bibtex to the README (#584)
- Minor Typo Fix (#585)
Examples
- Rename examples directories (#487)
- Adds training times for reproduced Mujoco results (#497)
- Adds additional information to Grasping Example README (#501)
- Fixes a comment in PPO example (#521)
- Rainbow Scores (#546)
- Update train_a3c.py (#547, thanks @xinyuewang1!)
- Update train_a3c.py (#548, thanks @xinyuewang1!)
- Improves formatting of IQN training times (#549)
- Corrects Scores in Examples (#552)
- Removes GPU option from README (#564)
- Releases trained models for all reproduced agents (#565)
- Add an example script for RoboschoolAtlasForwardWalk-v1 (#577)
- Corrects Rainbow Results (#580)
- Adds proper A3C scores (#581)
Testing
- Add CI configs (#478)
- Specify ubuntu 16.04 for Travis CI and modify a dependency accordingly (#520)
- Remove a trailing space of DoubleIQN (#526)
- Add a deterministic mode to IQN for stable tests (#529)
- Fix import error when chainer==7.0.0b3 (#531)
- Make test_monitor.py work on flexCI (#533)
- Improve parameter distributions used in TestGaussianDistribution (#540)
- Increase flexCI's time limit to 20min (#550)
- decrease amount of decimal digits required to 4 (#554)
- Use attrs<19.2.0 with pytest (#569)
- Run slow tests with flexCI (#575)
- Typo fix in CI comment. (#576)
- Adds time to DDPG Tests (#587)
- Fix CI errors due to pyglet, zipp, mock, and gym (#592)
Bugfixes
- Fix a bug in batch_recurrent_experiences regarding next_action (#528)
- Fix ValueError in SARSA with GPU (#534)
- fix function call (#541)
- Pass env_id to replay_buffer methods to fix batch training (#558)
- Fixes Categorical Double DQN Error. (#567)
- Fix weight normalization inside prioritized experience replay (#570)
v0.7.0
Important enhancements
- Rainbow (https://arxiv.org/abs/1710.02298) with benchmark results is added. (thanks @seann999!)
  - Agent class: chainerrl.agents.CategoricalDoubleDQN
  - Example and benchmark results: https://github.com/chainer/chainerrl/tree/v0.7.0/examples/atari/rainbow
- TD3 (https://arxiv.org/abs/1802.09477) with benchmark results is added.
  - Agent class: chainerrl.agents.TD3
  - Example and benchmark results: https://github.com/chainer/chainerrl/tree/v0.7.0/examples/mujoco/td3
- PPO now supports recurrent models.
  - Example: https://github.com/chainer/chainerrl/tree/v0.7.0/examples/ale/train_ppo_ale.py (with --recurrent option)
  - Results: #431
- DDPG now supports batch training
Important bugfixes
- The bug that some examples use the same random seed across envs for env.seed is fixed.
- The bug that batch training with n-step return and/or recurrent models is not successful is fixed.
- The bug that examples/ale/train_dqn_ale.py uses LinearDecayEpsilonGreedy even when NoisyNet is used is fixed.
- The bug that examples/ale/train_dqn_ale.py does not use the value specified by --noisy-net-sigma is fixed.
- The bug that chainerrl.links.to_factorized_noisy does not work correctly with chainerrl.links.Sequence is fixed.
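One way to avoid sharing a random seed across envs is to derive a distinct seed per env from a single base seed. The sketch below illustrates the idea; the helper name and the exact formula are assumptions, not necessarily what the fixed examples use.

```python
def make_env_seeds(base_seed, num_envs):
    """Derive one distinct seed per env from a single base seed.

    Multiplying the base seed by num_envs keeps the seed ranges of
    different base seeds from overlapping.
    """
    return [base_seed * num_envs + idx for idx in range(num_envs)]


seeds = make_env_seeds(base_seed=3, num_envs=4)
print(seeds)
```

Each env would then be seeded with its own entry, for example by passing seeds[idx] to env.seed for the env with index idx.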
Important destructive changes
- chainerrl.experiments.train_agent_async now requires eval_n_steps (number of timesteps for each evaluation phase) and eval_n_episodes (number of episodes for each evaluation phase) to be explicitly specified, with one of them being None.
- examples/ale/dqn_phi.py is removed.
- chainerrl.initializers.LeCunNormal is removed. Use chainer.initializers.LeCunNormal instead.
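The eval_n_steps / eval_n_episodes requirement amounts to choosing exactly one evaluation budget. A minimal sketch of such a check follows; the helper check_eval_args is hypothetical and is not part of ChainerRL, and treating "both None" as an error is an assumption.

```python
def check_eval_args(eval_n_steps, eval_n_episodes):
    """Require exactly one of the two evaluation budgets to be set.

    Returns which budget is active: "steps" or "episodes".
    """
    if (eval_n_steps is None) == (eval_n_episodes is None):
        raise ValueError(
            "Specify exactly one of eval_n_steps and eval_n_episodes; "
            "the other must be None."
        )
    return "steps" if eval_n_steps is not None else "episodes"


print(check_eval_args(eval_n_steps=125000, eval_n_episodes=None))
print(check_eval_args(eval_n_steps=None, eval_n_episodes=10))
```

Callers that previously relied on a default therefore have to pass both arguments, one of them as None.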
All updates
Enhancement
- Rainbow (#374)
- Make copy_param support scalar parameters (#410)
- Enables batch DDPG agents to be trained. (#416)
- Enables asynchronous time-based evaluations of agents. (#420)
- Removes obsolete dqn_phi file (#424)
- Add Branched and use it to simplify train_ppo_batch_gym.py (#427)
- Remove LeCunNormal since Chainer has it from v3 (#428)
- Precompute log probability in PPO (#430)
- Recurrent PPO with a stateless recurrent model interface (#431)
- Replace Variable.data with Variable.array (again) (#434)
- Make IQN work with tuple observations (#435)
- Add VectorStackFrame to reduce memory usage in train_dqn_batch_ale.py (#443)
- DDPG example that reproduces the TD3 paper (#452)
- TD3 agent (#453)
- update requirements.txt and setup.py for gym (#461)
- Support gym>=0.12.2 by stopping to use underscore methods in gym wrappers (#462)
- Add warning about numpy 1.16.0 (#476)
Documentation
- Link to abstract pages on ArXiv (#409)
- fixes typo (#412)
- Fixes file path in grasping example README (#422)
- Add links to references (#425)
- Fixes minor grammar mistake in A3C ALE example (#432)
- Add explanation of examples/atari (#437)
- Link to chainer/chainer, not pfnet/chainer (#439)
- Link to chainer/chainer(rl), not pfnet/chainer(rl) (#440)
- fix & add docstring for FCStateQFunctionWithDiscreteAction (#441)
- Fixes a typo in train_agent_batch Documentation. (#444)
- Adds Rainbow to main README (#447)
- Fixes Docstring in IQN (#451)
- Improves Rainbow README (#458)
- very small fix: add missing doc for eval_performance. (#459)
- Adds IQN Results to readme (#469)
- Adds IQN to the documentation. (#470)
- Adds reference to mujoco folder in the examples README (#474)
- Fixes incorrect comment. (#490)
Examples
- Rainbow (#374)
- Create an IQN example aimed at reproducing the original paper and its evaluation protocol. (#408)
- Benchmarks DQN example (#414)
- Enables batch DDPG agents to be trained. (#416)
- Fixes scores for Demon Attack (#418)
- Set observation_space of kuka env correctly (#421)
- Fixes error in setting explorer in DQN ALE example. (#423)
- Add Branched and use it to simplify train_ppo_batch_gym.py (#427)
- A3C Example for reproducing paper results. (#433)
- PPO example that reproduces the "Deep Reinforcement Learning that Matters" paper (#448)
- DDPG example that reproduces the TD3 paper (#452)
- TD3 agent (#453)
- Apply noisy_net_sigma parameter (#465)
Testing
- Use Python 3.6 in Travis CI (#411)
- Increase tolerance of TestGaussianDistribution.test_entropy since sometimes it failed (#438)
- make FrameStack follow original spaces (#445)
- Split test_examples.sh (#472)
- Fix Travis error (#492)
- Use Python 3.6 for ipynb (#493)
Bugfixes
- bugfix (#360, thanks @corochann!)
- Fixes error in setting explorer in DQN ALE example. (#423)
- Make sure the agent sees when episodes end (#429)
- Pass env_id to replay buffer methods to correctly support batch training (#442)
- Add VectorStackFrame to reduce memory usage in train_dqn_batch_ale.py (#443)
- Fix a bug of unintentionally using same process indices (#455)
- Make cv2 dependency optional (#456)
- fix ScaledFloatFrame.observation_space (#460)
- Apply noisy_net_sigma parameter (#465)
- Match EpisodicReplayBuffer.sample with ReplayBuffer.sample (#485)
- Make to_factorized_noisy work with sequential links (#489)
v0.6.0
Important enhancements
- Implicit Quantile Network (IQN) https://arxiv.org/abs/1806.06923 agent is added: chainerrl.agents.IQN.
- Training DQN and its variants with N-step returns is supported.
- Resetting env with done=False via info dict is supported. When env.step returns an info dict with info['needs_reset']=True, env is reset. This feature is useful for implementing a continuing env.
- Evaluation with a fixed number of timesteps is supported (except async training). This evaluation protocol is popular in Atari benchmarks.
  - examples/atari/dqn now implements the same evaluation protocol as the Nature DQN paper.
- An example script of training a DoubleDQN agent for a PyBullet-based robotic grasping env is added: examples/grasping.
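The needs_reset mechanism can be pictured with a toy loop. The sketch below does not use gym or ChainerRL; CountdownEnv and run are hypothetical names, and the loop only illustrates resetting on info['needs_reset'] while done stays False.

```python
class CountdownEnv:
    """Toy continuing env: never sets done, but asks for a reset every `limit` steps."""

    def __init__(self, limit):
        self.limit = limit
        self.t = 0

    def reset(self):
        self.t = 0
        return 0  # dummy observation

    def step(self, action):
        self.t += 1
        info = {"needs_reset": self.t >= self.limit}
        # Returns (observation, reward, done, info); done is always False.
        return self.t, 0.0, False, info


def run(env, total_steps):
    """Step the env, resetting on done or on info['needs_reset']."""
    resets = 0
    env.reset()
    for _ in range(total_steps):
        _, _, done, info = env.step(action=0)
        if done or info.get("needs_reset", False):
            env.reset()
            resets += 1
    return resets


n_resets = run(CountdownEnv(limit=5), total_steps=12)
print(n_resets)
```

Because done is never True, a learner can keep bootstrapping from the last state before the reset instead of treating it as terminal.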
Important bugfixes
- The bug that PPO's obs_normalizer was not saved is fixed.
- The bug that NonbiasWeightDecay didn't work with newer versions of Chainer is fixed.
- The bug that argv argument was ignored by chainerrl.experiments.prepare_output_dir is fixed.
Important destructive changes
- train_agent_with_evaluation and train_agent_batch_with_evaluation now require eval_n_steps (number of timesteps for each evaluation phase) and eval_n_episodes (number of episodes for each evaluation phase) to be explicitly specified, with one of them being None.
- train_agent_with_evaluation's max_episode_len argument is renamed to train_max_episode_len.
- ReplayBuffer.sample now returns a list of lists of N experiences to support N-step returns.
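To show how a list of lists of experiences relates to N-step returns, the sketch below computes a discounted return for each inner list. The dict layout with a "reward" key and the possibility of an inner list shorter than N are assumptions made for illustration, and n_step_return is a hypothetical helper.

```python
def n_step_return(transitions, gamma):
    """Discounted sum of rewards over consecutive transitions."""
    return sum(gamma ** i * tr["reward"] for i, tr in enumerate(transitions))


# A sampled batch in the new shape: one inner list per sample.
batch = [
    [{"reward": 1.0}, {"reward": 2.0}, {"reward": 4.0}],
    [{"reward": 3.0}],  # assumed shorter sample, e.g. near an episode end
]
returns = [n_step_return(steps, gamma=0.5) for steps in batch]
print(returns)
```

Code that indexed the sampled batch as a flat list of experiences needs one extra level of indexing after this change.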
All updates
Enhancement
- Implicit quantile networks (IQN) (#288)
- Adds N-step learning for DQN-based agents. (#317)
- Replaywarning (#321)
- Close envs in async training (#343)
- Allow envs to send a 'needs_reset' signal (#356)
- Changes variable names in train_agent_with_evaluation (#358)
- Use chainer.dataset.concat_examples in batch_states (#366)
- Implements Time-based evaluations (#367)
Documentation
- Add long description for pypi (#357, thanks @ljvmiranda921!)
- A small change to the installation documentation (#369)
- Adds a link to the ChainerRL visualizer from the main repository (#370)
- adds implicit quantile networks to readme (#393)
- Fix DQN.update's docstring (#394)
Examples
Testing
- Fix TestTrainAgentAsync (#363)
- Use AbnormalExitCodeWarning for nonzero exitcode warnings (#378)
- Avoid random test failures due to asynchronousness (#380)
- Drop hacking (#381)
- Avoid gym 0.11.0 in Travis (#396)
- Stabilize and speed up A3C tests (#401)
- Reduce ACER's test cases and maximum timesteps (#404)
- Add tests of IQN examples (#405)
Bugfixes
v0.5.0
Important enhancements
- Batch synchronized training using multiple environment instances and a single GPU is supported for some agents:
  - A2C (added as chainerrl.agents.A2C)
  - PPO
  - DQN and other agents that inherit DQN except SARSA
- examples/ale/train_dqn_ale.py now follows "Tuned DoubleDQN" setting by default, and supports prioritized experience replay as an option
- examples/atari/train_dqn.py is added as a basic example of applying DQN to Atari.
Important bugfixes
- A bug in chainerrl.agents.CategoricalDQN that deteriorates performance is fixed
- A bug in atari_wrappers.LazyFrame that unnecessarily increases memory usage is fixed
Important destructive changes
- chainerrl.replay_buffer.PrioritizedReplayBuffer and chainerrl.replay_buffer.PrioritizedEpisodicReplayBuffer are updated:
  - become FIFO (First In, First Out), reducing memory usage in Atari games
  - compute priorities more closely following the paper
- eval_explorer argument of chainerrl.experiments.train_agent_* is dropped (use chainerrl.wrappers.RandomizeAction for evaluation-time epsilon-greedy)
- Interface of chainerrl.agents.PPO has changed a lot
- Support of Chainer v2 is dropped
- Support of gym<0.9.7 is dropped
- Support of loading chainerrl<=0.2.0's replay buffer is dropped
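Moving evaluation-time epsilon-greedy from an explorer into an environment wrapper can be pictured with the sketch below. It is not the code of chainerrl.wrappers.RandomizeAction: the class RandomizeActionSketch, its arguments, and the toy EchoEnv are hypothetical and avoid any gym dependency.

```python
import random


class EchoEnv:
    """Toy env whose observation is simply the action it received."""

    def reset(self):
        return 0

    def step(self, action):
        return action, 0.0, False, {}  # obs, reward, done, info


class RandomizeActionSketch:
    """Replace the agent's action with a random one with a given probability."""

    def __init__(self, env, random_fraction, sample_action, seed=None):
        self.env = env
        self.random_fraction = random_fraction
        self.sample_action = sample_action  # callable taking a random.Random
        self.rng = random.Random(seed)

    def reset(self):
        return self.env.reset()

    def step(self, action):
        # rng.random() lies in [0, 1), so 0.0 never randomizes and 1.0 always does.
        if self.rng.random() < self.random_fraction:
            action = self.sample_action(self.rng)
        return self.env.step(action)


env = RandomizeActionSketch(
    EchoEnv(), random_fraction=0.05,
    sample_action=lambda rng: rng.randrange(4), seed=0,
)
env.reset()
obs, reward, done, info = env.step(2)
```

With this arrangement the agent always acts greedily during evaluation and the randomness lives entirely in the environment.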
All updates
Enhancement
- A2C (#149, thanks @iory!)
- Add wrappers to cast observations (#160)
- Fix on flake8 3.5.0 (#214)
- Use ()-shaped array for scalar loss (#219)
- FIFO prioritized replay buffer (#277)
- Update Policy class to inherit ABCMeta (#280, thanks @uidilr!)
- Batch PPO Implementation (#295, thanks @ljvmiranda921!)
- Mimic the details of prioritized experience replay (#301)
- Add ScaleReward wrapper (#304)
- Remove GaussianPolicy and obsolete policies (#305)
- Make random access queue sampling code cleaner (#309)
- Support gym==0.10.8 (#324)
- Batch A2C/PPO/DQN (#326)
- Use RandomizeAction wrapper instead of Explorer in evaluation (#328)
- remove duplicate lines (typo) (#329, thanks @monado3!)
- Merge consecutive with statements (#333)
- Use Variable.array instead of Variable.data (#336)
- Remove code for Chainer v2 (#337)
- Implement getitem for ActionValue (#339)
- Count updates of DQN (#341)
- Move Atari Wrappers (#349)
- Render wrapper (#350)
Documentation
- fixes minor typos (#306)
- fixes typo (#307)
- Typos (#308)
- fixes readme typo (#310)
- Adds partial list of paper implementations with links to the main README (#311)
- Adds another paper to list (#312)
- adds some instructions regarding testing for potential contributors (#315)
- Remove duplication of DQN in docs (#334)
- nit on grammar of a comment: (#354)
Examples
- Tuned DoubleDQN with prioritized experience replay (#302)
- adds some descriptions to parseargs arguments (#319)
- Make clip_eps positive (#340)
- updates env in ddpg example (#345)
- Examples (#348)
Testing
- Fix Travis CI errors (#318)
- Parse Chainer version with packaging.version (#322)
- removes tests for old replay buffer (#347)
Bugfixes
v0.4.0
Important enhancements
- TRPO (trust region policy optimization) is added: chainerrl.agents.TRPO.
- C51 (categorical DQN) is added: chainerrl.agents.CategoricalDQN.
- NoisyNet is added: chainerrl.links.FactorizedNoisyLinear and chainerrl.links.to_factorized_noisy.
- Python 3.7 is supported
- Examples were improved in terms of logging and random seed setting
Important destructive changes
- The async module is renamed async_ for Python 3.7 support.
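The rename is needed because async became a reserved keyword in Python 3.7, so it can no longer be used as a module name in an import statement. The sketch below checks this with the standard library only and assumes it is run on Python 3.7 or later; the source strings are compiled but never executed, so ChainerRL does not need to be installed.

```python
import keyword


def compiles(source):
    """Return True if the source parses, without executing it."""
    try:
        compile(source, "<sketch>", "exec")
        return True
    except SyntaxError:
        return False


old_ok = compiles("from chainerrl import async")
new_ok = compiles("from chainerrl import async_")
print(keyword.iskeyword("async"), old_ok, new_ok)
```

Code that imported the old module name has to switch to async_ when upgrading.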
All updates
Enhancements
- TRPO agent (#204)
- Use numpy random (#206)
- Add gpus argument for chainerrl.misc.set_random_seed (#207)
- More check on nesting AttributeSavingMixin (#208)
- show error message (#210, thanks @corochann!)
- Add an option to set whether the agent is saved every time the score is improved (#213)
- Make tests check exit status of subprocesses (#215)
- make ReplayBuffer.load() compatible with v0.2.0. (#216, thanks @mr4msm!)
- Add requirements-dev.txt (#222)
- Align act and act_and_train's signature to the Agent interface (#230, thanks @lyx-x!)
- Support dtype arg of spaces.Box (#231)
- Set outdir to results and add help strings (#248)
- Categorical DQN (C51) (#249)
- Remove DiscreteActionValue.sample_epsilon_greedy_actions (#259)
- Remove DQN.compute_q_values (#260)
- Enable to change batch_states in PPO (#261, thanks @kuni-kuni!)
- Remove unnecessary declaration and substitution of 'done' in the train_agent function (#271, thanks @uidilr!)
Documentation
- Update the contribution guide to use pytest (#220)
- Add docstring to ALE and fix seed range (#234)
- Fix docstrings of DDPG (#241)
- Update the algorithm section of README (#246)
- Add CategoricalDQN to README (#252)
- Remove unnecessary comments from examples/gym/train_categorical_dqn_gym.py (#255)
- Update README.md of examples/ale (#275)
Examples
- Fix OMP_NUM_THREADS setting (#235)
- Improve random seed setting in ALE examples (#239)
- Improve random seed setting for all examples (#243)
- Use gym and atari wrappers instead of chainerrl.envs.ale (#253)
- Remove unused args from examples/ale/train_categorical_dqn_ale.py and examples/ale/train_dqn_ale.py (#256)
- Remove unused --profile argument (#258)
- Hyperlink DOI against preferred resolver (#266, thanks @katrinleinweber!)
Testing
- Fix import chainer.testing.condition (#200)
- Use pytest (#209)
- Fix PCL tests (#211)
- Test loading v0.2.0 replay buffers (#217)
- Use assertRaises instead of expectedFailure (#218)
- Improve travis script (#242)
- Run autopep8 in travis ci (#247)
- Switch autopep8 and hacking (#257)
- Use hacking 1.0 (#262)
- Fix a too long line of PPO (#264)
- Update to hacking 1.1.0 (#274)
- Add tests of DQN's loss functions (#279)
Bugfixes
- gym 0.9.6 is not working with python2 (#226)
- Tiny fix: argument passing in SoftmaxDistribution (#228, thanks @lyx-x!)
- Add docstring to ALE and fix seed range (#234)
- except both Exception and KeyboardInterrupt (#250, thanks @uenoku!)
- Switch autopep8 and hacking (#257)
- Modify async to async_ to support Python 3.7 (#286, thanks @mmilk1231!)
- Noisy network fixes (#287, thanks @seann999!)
v0.3.0
Important enhancements
- Both Chainer v2 and v3 are now supported
- PPO (Proximal Policy Optimization) has been added: chainerrl.agents.PPO
- Replay buffers have been made faster
Important destructive changes
- Episodic replay buffers' __len__ now counts the number of transitions, not episodes
- ALE's grayscale conversion formula has been corrected
- FCGaussianPolicyWithFixedCovariance now has a nonlinearity before the last layer
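The difference between counting transitions and counting episodes can be seen in a small sketch. EpisodicBufferSketch is a hypothetical class, not ChainerRL's episodic replay buffer, and counting only completed episodes in __len__ is a simplification chosen for this example.

```python
class EpisodicBufferSketch:
    """Toy episodic buffer that stores transitions grouped by episode."""

    def __init__(self):
        self.episodes = []  # completed episodes, each a list of transitions
        self.current = []   # transitions of the episode in progress

    def append(self, transition, is_terminal=False):
        self.current.append(transition)
        if is_terminal:
            self.episodes.append(self.current)
            self.current = []

    def __len__(self):
        # Number of stored transitions, not number of episodes.
        return sum(len(episode) for episode in self.episodes)

    @property
    def n_episodes(self):
        return len(self.episodes)


buf = EpisodicBufferSketch()
for episode_length in (3, 2):
    for step in range(episode_length):
        buf.append({"t": step}, is_terminal=(step == episode_length - 1))
print(len(buf), buf.n_episodes)
```

Code that compared len(buffer) against a number of episodes, for example to decide when to start updates, needs to be revisited after this change.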
All updates
Enhancements
- Add RMSpropAsync and NonbiasWeightDecay to optimizers/__init__.py (#113)
- Use init_scope (#116)
- Remove ALE dependency (#121)
- Support environments without git command (#124)
- Add PPO agent (#126)
- add .gitignore (#127, thanks @knorth55!)
- Use faster queue for replay buffers (#131)
- Use F.matmul instead of F.batch_matmul (#141)
- Add a utility function to draw a computational graph (#166)
- Improve MLPBN (#171)
- Improve StateActionQFunctions (#172)
- Improve deterministic policies (#173)
- Fix InvertGradients (#185)
- Remove unused functions in DQN (#188)
- Warn about negative exit code of child processes (#194)
Documentation
- Add animation gifs (#107)
- Synchronize docs version with package version (#111)
- Add logo (#136)
- [policies/gaussian_policy] Improve docstring (#140, thanks @iory!)
- Improve docstrings (#142)
- Fix a typo (#146)
- Fix a broken link to travis ci (#153)
- Add PPO to README as an implemented algorithm (#168)
- Improve the docstring of AdditiveGaussian (#170)
- Add docstring on eval_max_episode_len (#177)
- Add docstring to DuelingDQN (#187)
- Suppress Sphinx' warning in the docstring of PCL (#198)
Example
- fix typo (#122)
- Use Chain.init_scope in the quick start (#148)
- Draw computational graphs in train_dqn_ale.py (#192)
- Draw computational graphs in train_dqn_gym.py (#195)
- Draw computational graphs in train_a3c_ale.py (#197)
Testing
- Add CHAINER_VERSION config to CI (#143)
- Specify --outdir on 2nd test (#154)
- Return dict for info of env.step (#162)
- Fix import error in tests (#180)
- Mark TestBiasCorrection as slow (#181)
- Add tests for SingleActionValue (#191)
Bugfixes
- Fix save/load in EpisodicReplayBuffer (#130)
- Fix REINFORCE's missing initialization of t (#133)
- Fix episodic buffer __len__ (#155)
- Remove duplicated import of explorers (#163)
- Fix missing nonlinearity before the last layer (#165)
- Use bytestrings to write git outputs (#178)
- Patches to envs.ALE (#182)
- Fix QuadraticActionValue and add tests (#190)