Fast Deep Q Learning

This project combines improvements in deep Q-learning for fast, stable training in a modular, configurable agent.
Pranjal Tandon's PyTorch Soft Actor-Critic is used as a baseline, with the following optional components added on top of it:

Features:

WIP:

  • A state-dependent exploration method based on Raffin & Stulp's gSDE, to make SAC more robust in environments that act like low-pass filters (see the sketch below)
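
For intuition only, here is a minimal sketch of state-dependent exploration in the spirit of gSDE. It is not the repository's implementation; the class, method, and parameter names (StateDependentNoise, resample_every, and so on) are all made up for this example. The core idea it illustrates: rather than injecting fresh white noise at every step, a noise weight matrix is sampled occasionally and held fixed, so the per-step perturbation is a smooth function of the state.

```python
import torch


class StateDependentNoise:
    """Illustrative sketch of gSDE-style exploration (names are assumptions).

    A noise weight matrix theta is drawn from N(0, sigma) and held fixed
    for `resample_every` steps. The per-step noise is `features @ theta`,
    so nearby states receive correlated, smoothly varying perturbations
    instead of independent white noise.
    """

    def __init__(self, feature_dim, action_dim, log_std=-0.5, resample_every=1000):
        self.sigma = torch.full((feature_dim, action_dim), log_std).exp()
        self.resample_every = resample_every
        self.steps = 0
        self._resample()

    def _resample(self):
        # One Gaussian draw, reused across many environment steps.
        self.theta = torch.randn_like(self.sigma) * self.sigma

    def noise(self, features):
        self.steps += 1
        if self.steps % self.resample_every == 0:
            self._resample()
        # Deterministic given theta: the same state always receives the
        # same perturbation until theta is resampled.
        return features @ self.theta
```

In use, the noise would be added to the policy mean, e.g. `action = torch.tanh(mean + sde.noise(last_hidden))`. Because the noise changes only as fast as the state features do, environments that low-pass-filter their inputs still see the exploration signal.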

Motivation:

The state of the art in deep RL has been driven largely by ramping up scale. But with enough effort, patience, and time spent optimizing pipelines, it is possible to reach roughly 80-90% of state-of-the-art results on commodity hardware.

I'm setting out to build such a pipeline from scratch, to learn the intricacies of writing fast reinforcement learning pipelines and to combine improvements from published work into general algorithmic speed-ups.

I will start with simple classic-control environments, then ramp up to standard benchmarks like RoboSchool, and on to pixel-based environments like Atari.
My goal is for a single algorithm to solve all of these out of the box with the same set of hyperparameters.

Usage

main.py configures the experiments. There is no argparse system or config-file loading yet (it's on the to-do list); for now, all configuration is done by editing the config instances in main.py and then running it.
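
For illustration only, the workflow looks roughly like the sketch below. ExperimentConfig and its fields are hypothetical names invented for this example and will not match the actual config classes in main.py.

```python
from dataclasses import dataclass


# Hypothetical stand-in for the config instances described above;
# the real classes and fields in main.py differ.
@dataclass
class ExperimentConfig:
    env_name: str = "Pendulum-v0"
    gamma: float = 0.99
    batch_size: int = 256


if __name__ == "__main__":
    # 1. Edit the config instance here...
    config = ExperimentConfig(env_name="LunarLanderContinuous-v2")
    # 2. ...then run the script: python main.py
    print(config)
```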

This was tested on Windows 10 with torch 1.3.0.