v3.1.0: L1 fitness norm, code and spec refactor, online eval

Docker image kengz/slm_lab:v3.1.0 released

L1 fitness norm (breaking change)

  • change the fitness vector norm from L2 to L1 for more intuitive, less extreme values (see the sketch below)
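
A minimal numpy sketch of the difference, using an illustrative fitness vector rather than SLM Lab's actual fitness computation:

```python
# Illustrative only: the fitness vector components below are made up for the example.
import numpy as np

fitness_vec = np.array([0.9, 0.2, 0.5, 0.8])

l2 = np.linalg.norm(fitness_vec, ord=2)  # ~1.32; dominated by the larger components
l1 = np.linalg.norm(fitness_vec, ord=1)  # 2.4; a plain sum of the components

# Under L1, dividing by the vector length gives a mean-like, easy-to-read score
# that does not blow up when one component is extreme.
print(l1 / len(fitness_vec))  # 0.6
```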

code and spec refactor

  • #254 PPO cleanup: remove hack and restore minimization scheme
  • #255 remove the use_gae and use_nstep params and infer the mode from lam and num_step_returns instead (see the sketch after this list)
  • #260 fix decay start_step offset, add unit tests for rate decay methods
  • #262 make epi start from 0 instead of 1 for code logic consistency
  • #264 switch max_total_t and max_epi to max_tick and max_tick_unit for directness; retire graph_x in favor of the unit above
  • #266 add Atari fitness std, fix CUDA coredump issue
  • #269 update gym, remove box2d hack
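
A minimal sketch of the intended inference from #255; the helper name and the fallback branch are assumptions for illustration, not SLM Lab's exact code:

```python
def infer_advantage_mode(algorithm_spec: dict) -> str:
    '''Pick the return/advantage estimator from the spec instead of explicit use_gae/use_nstep flags.'''
    # 'lam' and 'num_step_returns' are the spec keys named in the release note above.
    if algorithm_spec.get('lam') is not None:
        return 'gae'    # lam present: use Generalized Advantage Estimation
    if algorithm_spec.get('num_step_returns') is not None:
        return 'nstep'  # num_step_returns present: use n-step returns
    return 'default'    # neither present: fall back to the plain return estimator (assumed)

# infer_advantage_mode({'lam': 0.95}) -> 'gae'
# infer_advantage_mode({'num_step_returns': 5}) -> 'nstep'
```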

Online Eval mode

PRs #252, #257, #261, #267
Evaluation sessions now run during training in subprocesses. This does not interfere with the training process: independent evaluation subprocesses are spawned, each appending its results to an eval file, and at the end a final eval finishes, plots all the graphs, and saves all the eval data.

  • enabled by the meta spec key 'training_eval' (see the sketch after this list)
  • configure NUM_EVAL_EPI in analysis.py
  • update the enjoy and eval mode syntax; see the README
  • change the ckpt behavior to use tags such as ckpt-epi10-totalt1000
  • add a new eval mode to the lab which runs on a checkpoint file; see below
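
A minimal sketch of the train-time hookup; the spec key and the run_lab.py command come from these notes, while launch_eval_session and its arguments are hypothetical names for illustration:

```python
# Spec excerpt (key from the notes above; surrounding structure abbreviated):
#   "meta": {
#     "training_eval": true
#   }
import subprocess

def launch_eval_session(spec_file, spec_name, ckpt_session_prepath):
    '''Spawn an independent eval Session on a checkpoint so training is not interrupted.'''
    # e.g. ckpt_session_prepath = 'dqn_cartpole_t0_s2_ckpt-epi10-totalt1000'
    cmd = ['python', 'run_lab.py', spec_file, spec_name, f'eval@{ckpt_session_prepath}']
    return subprocess.Popen(cmd)  # non-blocking; results are appended to the eval file

# launch_eval_session(
#     'data/dqn_cartpole_2018_12_20_214412/dqn_cartpole_t0_spec.json',
#     'dqn_cartpole',
#     'dqn_cartpole_t0_s2_ckpt-epi10-totalt1000')
```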

Eval Session

  • add a proper eval Session which loads from a ckpt as above and does not interfere with existing files. It can be run from the terminal, and it is also used by the internal eval logic, e.g. with the command python run_lab.py data/dqn_cartpole_2018_12_20_214412/dqn_cartpole_t0_spec.json dqn_cartpole eval@dqn_cartpole_t0_s2_ckpt-epi10-totalt1000
  • when an eval session is done, it averages all of its episodes and appends a row to eval_session_df.csv (see the sketch after this list)
  • after that, it deletes the ckpt files it just used (to prevent large storage use)
  • then it runs a trial analysis to update eval_trial_graph.png and an accompanying trial_df computed as the average of all session_dfs
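
A minimal sketch of this bookkeeping; only the file names come from the notes above, while run_episode, the NUM_EVAL_EPI value, and the column names are illustrative assumptions:

```python
import glob
import os
import pandas as pd

NUM_EVAL_EPI = 4  # configured in analysis.py; the value here is a placeholder

def run_eval_session(ckpt_prepath, run_episode):
    '''Average the eval episodes, append a row to eval_session_df.csv, then delete the used ckpt files.'''
    returns = [run_episode(ckpt_prepath) for _ in range(NUM_EVAL_EPI)]
    row = pd.DataFrame([{'ckpt': ckpt_prepath, 'mean_return': sum(returns) / len(returns)}])
    csv_path = 'eval_session_df.csv'
    row.to_csv(csv_path, mode='a', header=not os.path.exists(csv_path), index=False)
    for path in glob.glob(f'{ckpt_prepath}*'):  # clean up the ckpt files just used
        os.remove(path)
    # a trial analysis then updates eval_trial_graph.png and trial_df from all session_dfs
```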

How eval mode works

  • checkpointing saves the models using a naming scheme that records the epi and total_t, which allows evaluating with the ckpt model (see the sketch after this list)
  • after creating the ckpt files, if spec.meta.training_eval is set in train mode, a subprocess launches using the ckpt prepath to run an eval Session, in the same way as above: python run_lab.py data/dqn_cartpole_2018_12_20_214412/dqn_cartpole_t0_spec.json dqn_cartpole eval@dqn_cartpole_t0_s2_ckpt-epi10-totalt1000
  • the eval session runs as above. ckpt now runs at the starting timestep, at the ckpt timesteps, and at the end
  • the main Session waits for the final eval session and its final eval trial to finish before closing, to ensure that other processes, like zipping, wait for them
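
A minimal sketch of the tag scheme; the tag and eval@ formats come from the examples above, while the helper names are hypothetical:

```python
def ckpt_tag(epi, total_t):
    '''Build the checkpoint tag that records epi and total_t, e.g. ckpt-epi10-totalt1000.'''
    return f'ckpt-epi{epi}-totalt{total_t}'

def eval_target(session_prepath, epi, total_t):
    '''Compose the eval@... argument for run_lab.py from a session prepath and a checkpoint.'''
    return f'eval@{session_prepath}_{ckpt_tag(epi, total_t)}'

# eval_target('dqn_cartpole_t0_s2', 10, 1000)
# -> 'eval@dqn_cartpole_t0_s2_ckpt-epi10-totalt1000'
```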

Example eval trial graph:

(image: dqn_cartpole_t0_ckpt-eval_trial_graph)