Implementation of Augmented-Random-Search for OpenAI Gym environments in Python. Performance is defined as the sample efficiency of the algorithm i.e how good is the average reward after using x episodes of interaction in the environment for traning. The paper can be found here: Simple random search provides a competitive approach to reinforcement learning
ARS is an Evolutionary Strategy where the policy is linear with weights wp
Given an observation st an action at is chosen by:
Continuous case:
at = wp * st
Discrete case:
at = softmax(wp * st)
The weights are mutated by adding i.i.d. normal distributed noise to every weight.
wnew = wp + α * N(0, 1)
Then the policy weights wp are updated in the direction of the best performing mutated weights.