Dependencies: Theano. Install with: `pip install theano`
For this experiment's purpose, you should only need to interface with `controller/control.py`. This controller takes in a path to a pkl file containing a `theano.compile.function_module.Function` object, which has the weights of the controller embedded in it. Provided with this class is `function.pkl`, a controller designed for an autonomous vehicle (AV) within the following scenario, which should be recreated in SmartCity as closely as possible (a minimal loading sketch follows the scenario description). A zoomed-in picture of the beginning of the scenario, `straight_scenario.png`, is attached:
- one straight road
- one AV behind an IDM vehicle (10 m apart, with noise)
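For reference, loading and querying such a pickled controller function directly might look like the minimal sketch below. This is an assumption about the pkl's contents based on the description above; the filename and observation values are illustrative, and `StraightController` wraps this for you.

```python
import pickle

# A minimal sketch, assuming the pkl holds a compiled Theano function that
# maps a normalized observation to an acceleration. Filename is illustrative.
with open("data/weights/function.pkl", "rb") as f:
    policy_fn = pickle.load(f)  # theano.compile.function_module.Function

# Illustrative normalized observation; see the forms listed per version below.
acceleration = policy_fn([0.00207403, 0., 0.])
```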
Observations for specific experiments should be provided to the `StraightController` function `get_action` in the normalized forms listed under each version below. Inputs in this observation array must be scaled according to the experiment parameters listed; otherwise the neural-net weights will not produce accurate accelerations!
IMPORTANT: Theano for Python 2.7 is behind on certain features. The following classes must be patched to be new-style classes:
/.../theano/gof/opt.py
- class _metadict
- class ChangeTracker
/.../theano/compile/function_module.py
- class Supervisor
To apply the necessary changes for Theano (Python 2.7), run `scripts/apply_patch.py`, which will change the above-listed classes to inherit from `object`. A sketch of the kind of rewrite this performs follows.
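A rough sketch of the rewrite, assuming a simple textual substitution (the actual `scripts/apply_patch.py` may work differently):

```python
import re

def make_new_style(path, class_names):
    """Rewrite 'class Foo:' as 'class Foo(object):' for the given classes."""
    with open(path) as f:
        source = f.read()
    for name in class_names:
        source = re.sub(r"class\s+%s\s*:" % name,
                        "class %s(object):" % name, source)
    with open(path, "w") as f:
        f.write(source)

# Paths elided as above; point these at your Theano installation.
make_new_style(".../theano/gof/opt.py", ["_metadict", "ChangeTracker"])
make_new_style(".../theano/compile/function_module.py", ["Supervisor"])
```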
Dependencies: Ray, Flow
Ray installation information: https://ray.readthedocs.io/en/latest/installation.html
Flow installation: https://flow.readthedocs.io/en/latest/flow_setup.html
IMPORTANT NOTE: Ray for Python 2.7 is incompatible with the pickle protocol. To fix this, the following change needs to be made in `ray/tune/trainable.py`: replace the `restore` function with:
```python
def restore(self, checkpoint_path):
    """Restores training state from a given model checkpoint.

    These checkpoints are returned from calls to save().
    Subclasses should override ``_restore()`` instead to restore state.
    This method restores additional metadata saved with the checkpoint.
    """
    # The metadata pickle cannot be read under Python 2.7, so skip it:
    # metadata = pickle.load(open(checkpoint_path + ".tune_metadata", "rb"))
    # self._experiment_id = metadata["experiment_id"]
    # self._iteration = metadata["iteration"]
    # self._timesteps_total = metadata["timesteps_total"]
    # self._time_total = metadata["time_total"]
    # self._episodes_total = metadata["episodes_total"]
    saved_as_dict = False
    if saved_as_dict:
        with open(checkpoint_path, "rb") as loaded_state:
            checkpoint_dict = pickle.load(loaded_state)
        self._restore(checkpoint_dict)
    else:
        self._restore(checkpoint_path)
    self._restored = True
```
For interfacing with an RLlib-trained policy, run something like `python controller/rllib_control.py data/rllib/ma_state_noise 150`. For more detail, view `controller/rllib_control.py`. This controller takes in a path to a directory containing checkpoints of an RLlib-trained policy; a rough sketch of the restore step is below.
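Under the hood, restoring and querying such a policy looks roughly like the following sketch, assuming the RLlib API of this Ray era (`PPOAgent`, `restore`, `compute_action`); the env name, config, checkpoint path, and observation are all illustrative, and `rllib_control.py` resolves the real values from the arguments you pass in.

```python
import ray
from ray.rllib.agents.ppo import PPOAgent

ray.init()

# Illustrative values; rllib_control.py resolves the actual env, config,
# and checkpoint from the directory and checkpoint number you pass in.
agent = PPOAgent(env="MultiAgentUDSSCMergeEnvReset-v0", config={"num_workers": 0})
agent.restore("data/rllib/ma_state_noise/checkpoint-150")

observation = [0.0] * 92  # illustrative; see the per-version forms below
action = agent.compute_action(observation)
```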
Example Usage:
```python
sc = StraightController("../data/weights/weight_3.pkl")
observation = [0.00207403, 0., 0.]
sc.get_action(observation)
```
Listed below are the experiment parameters used for each version of this experiment:
- target_velocity: 10 m/s
- speed_limit: 30 m/s
- max_acceleration: 3 m/s^2
- max_deceleration: 3 m/s^2
- road_length: 2000 m
- Observations provided as:
[IDM velocity / speed_limit, IDM absolute position / road_length]
- target_velocity: 10 m/s
- speed_limit: 15 m/s
- max_acceleration: 3 m/s^2
- max_deceleration: 3 m/s^2
- road_length: 1500 m
- Observations provided as (an example construction follows this list):
[RL headway / road_length, RL velocity / speed_limit, IDM velocity / speed_limit]
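For example, building the normalized observation for this version might look like the following sketch; the raw values are illustrative, and `sc` is the `StraightController` instance from the usage example above.

```python
# Parameters for this version (from the list above).
speed_limit = 15.0    # m/s
road_length = 1500.0  # m

# Illustrative raw measurements.
rl_headway = 9.5      # m
rl_velocity = 8.2     # m/s
idm_velocity = 8.0    # m/s

observation = [rl_headway / road_length,
               rl_velocity / speed_limit,
               idm_velocity / speed_limit]
sc.get_action(observation)
```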
- target_velocity: 10 m/s
- speed_limit: 15 m/s
- max_acceleration: 5 m/s^2
- max_deceleration: -5 m/s^2
- road_length: 1500 m

This version is trained on a different reward function with speed mode aggressive. Running 50 rollouts of this policy yields:
- Average RL action: -0.00 m/s^2
- Average RL headway: 9.88 m
- Average RL velocity: 8.76 m/s
- Observations provided as a 1D array in the following form. You'll want to reference the Flow Specification for specifics. The expected length is 92 (a sanity-check sketch follows).
```python
state = np.array(np.concatenate([rl_info, rl_info_2, merge_dists_0, merge_0_vel, merge_dists_1, merge_1_vel, queue_0, queue_1, roundabout_full]))
```
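A quick sanity check before querying the controller, as a sketch: the component arrays come from your SmartCity recreation of the scenario, and the sizes below are placeholders chosen only so the total is 92; consult the Flow Specification for the real per-component sizes.

```python
import numpy as np

# Placeholder component sizes (assumed), chosen only to sum to 92.
rl_info, rl_info_2 = np.zeros(6), np.zeros(6)
merge_dists_0, merge_0_vel = np.zeros(5), np.zeros(5)
merge_dists_1, merge_1_vel = np.zeros(5), np.zeros(5)
queue_0, queue_1 = np.zeros(1), np.zeros(1)
roundabout_full = np.zeros(58)

state = np.array(np.concatenate([rl_info, rl_info_2, merge_dists_0, merge_0_vel,
                                 merge_dists_1, merge_1_vel, queue_0, queue_1,
                                 roundabout_full]))
assert state.shape == (92,), "expected a length-92 observation, got %s" % (state.shape,)
```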
- CHANGES: Some normalizers are different. I increased the scenario length, so (a usage sketch follows this list):
  - merge_0_norm: 64.32
  - merge_1_norm: 76.55
  - queue_0_norm: 14
  - queue_1_norm: 17
  - scenario_length: 402.75
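A sketch of how these normalizers are presumably applied when constructing the state (the raw distances are illustrative):

```python
merge_0_norm = 64.32
merge_1_norm = 76.55

# Illustrative raw distances (m) of vehicles along each merge.
raw_merge_dists_0 = [12.0, 30.5]
raw_merge_dists_1 = [25.0, 60.0]

# Divide by the normalizer so each entry lands in roughly [0, 1].
merge_dists_0 = [d / merge_0_norm for d in raw_merge_dists_0]
merge_dists_1 = [d / merge_1_norm for d in raw_merge_dists_1]
```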
- Observations provided as a 1D array in the following form. You'll want to reference the Flow Specification for specifics. The expected length is 92.
```python
state = np.array(np.concatenate([rl_info, rl_info_2, merge_dists_0, merge_0_vel, merge_dists_1, merge_1_vel, queue_0, queue_1, roundabout_full]))
```
- CHANGES: Some normalizers are different. I increased the scenario length again, so:
  - merge_0_norm: 74.32
  - merge_1_norm: 86.57
  - queue_0_norm: 16
  - queue_1_norm: 19
  - scenario_length: 442.71
- Observations provided as a 1D array in the following form. You'll want to reference the Flow Specification for specifics. The expected length is 92.
```python
state = np.array(np.concatenate([rl_info, rl_info_2, merge_dists_0, merge_0_vel, merge_dists_1, merge_1_vel, queue_0, queue_1, roundabout_full]))
```
- CHANGES: Found an error with two wrong entries in the ALL_EDGES variable, which led to a slightly different scenario_length:
  - scenario_length: 443.25
- Observations provided as a 1D array in the following form. You'll want to reference the Flow Specification for specifics. The expected length is 92.
```python
state = np.array(np.concatenate([rl_info, rl_info_2, merge_dists_0, merge_0_vel, merge_dists_1, merge_1_vel, queue_0, queue_1, roundabout_full]))
```
- This has a lot of noise added to the state space
- Observations provided as a 1D array in the following form. You'll want to reference the Flow Specification for specifics. The expected length is 92.
```python
state = np.array(np.concatenate([rl_info, rl_info_2, merge_dists_0, merge_0_vel, merge_dists_1, merge_1_vel, queue_0, queue_1, roundabout_full]))
```
- This has no RL or state space noise, only IDM noise
- Observations provided as a 1D array in the following form. You'll want to reference the Flow Specification for specifics. The expected length is 92.
```python
state = np.array(np.concatenate([rl_info, rl_info_2, merge_dists_0, merge_0_vel, merge_dists_1, merge_1_vel, queue_0, queue_1, roundabout_full]))
```
- This is a half stochastic policy, with state space noise.
- IDM vehicles enter the Northern inflow with a probability of 50/3600 per timestep.
- IDM vehicles enter the Western inflow with a probability of 300/3600 per timestep.
- RL vehicles enter the Northern inflow at a rate of 50 vehicles per hour.
- RL vehicles enter the Western inflow at a rate of 50 vehicles per hour (see the rate-to-probability sketch after this list).
- Max speed is altered to 8 m/s
- Observations provided as a 1D array in the following form. You'll want to reference the Flow Specification for specifics. The expected length is 92.
```python
state = np.array(np.concatenate([rl_info, rl_info_2, merge_dists_0, merge_0_vel, merge_dists_1, merge_1_vel, queue_0, queue_1, roundabout_full]))
```
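The inflow rates and per-timestep probabilities above are related by the arithmetic below, assuming 1-second simulation steps (the step size is an assumption):

```python
sim_step = 1.0  # s per timestep (assumed)

idm_north_rate = 50.0   # vehicles/hour
idm_west_rate = 300.0   # vehicles/hour

# Per-timestep entry probabilities, matching 50/3600 and 300/3600 above.
p_north = idm_north_rate * sim_step / 3600.0  # ~0.0139
p_west = idm_west_rate * sim_step / 3600.0    # ~0.0833
```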
- originally ecc_32_2018_11_16_08_19_22_0004
- Deterministic policy: 1 RL followed by 2 human on Northern inflow; 1 RL followed by 3 human on Western inflow
- Changes to state space:
  - merge_0_norm: 89.32
  - merge_1_norm: 101.57
  - queue_0_norm: 19
  - queue_1_norm: 22
  - total_scenario_length: 503.22
- originally ecc_33_2018_11_16_08_20_43_0006
- Deterministic policy: 1 RL followed by 2 human on Northern inflow; 1 RL followed by 3 human on Western inflow
- originally ecc_34_2018_11_16_08_21_55_0006
- Deterministic policy: 1 RL followed by 2 human on Northern inflow; 1 RL followed by 3 human on Western inflow
- originally ecc_35_2018_11_16_08_22_42_0005
- Deterministic policy: 1 RL followed by 2 human on Northern inflow; 1 RL followed by 3 human on Western inflow
- originally ecc_49_2018_11_27_07_42_49_0003
- Deterministic/fixed inflows
- Settings from era 'ecc'
- State space change! Append len_inflow_0 and len_inflow_1 to the state space. This is documented in the Flow Specification
- Single agent, use 'controller/control.py'
- originally ecc_50_2018_11_29_03_44_58_0001
- Deterministic/fixed inflows
- Settings from era 'ecc'
- State space change! Append len_inflow_0 and len_inflow_1 to the state space. This is documented in the Flow Specification
- Single agent, use 'controller/control.py'
- originally ecc_51_2018_11_27_07_46_14_0003
- Deterministic/fixed inflows
- Settings from era 'ecc'
- State space change! Append len_inflow_0 and len_inflow_1 to the state space. This is documented in the Flow Specification
- Single agent, use 'controller/control.py'
- originally ecc_52_2018_11_28_23_21_52_0001
- Deterministic/fixed inflows
- Settings from era 'ecc'
- State space change! Append len_inflow_0 and len_inflow_1 to the state space. This is documented in the Flow Specification
- Single agent, use 'controller/control.py'
- originally ecc_53_2018_11_29_23_16_57_0001.pkl
- Settings from era 'ecc'
- Single agent, use 'controller/control.py'
- originally ecc_54_2018_11_29_23_17_54_0002.pkl
- Settings from era 'ecc'
- Single agent, use 'controller/control.py'
- originally ecc_55_2018_11_29_23_26_30_0003.pkl
- Settings from era 'ecc'
- Single agent, use 'controller/control.py'
- originally ecc_56_2018_11_29_23_27_38_0004.pkl
- Settings from era 'ecc'
- Single agent, use 'controller/control.py'
Saturate velocities at 8 m/s.
- originally ecc_83_2018_12_08_06_49_14_0001.pkl
- Settings from era 'ecc'
- Single agent, use 'controller/control.py'
- originally ecc_84_2018_12_08_06_49_15_0002.pkl
- Settings from era 'ecc'
- Single agent, use 'controller/control.py'
- originally ecc_85_2018_12_08_06_49_15_0002.pkl
- Settings from era 'ecc'
- Single agent, use 'controller/control.py'
- originally ecc_86_2018_12_08_06_49_16_0001.pkl
- Settings from era 'ecc'
- Single agent, use 'controller/control.py'
- /Users/kathyjang/research/ray_results/ma_23/PPO_MultiAgentUDSSCMergeEnvReset-v0_2_num_sgd_iter=10_2018-12-11_04-16-10h79nkb0n
- Settings from era 'ecc'
- /Users/kathyjang/research/ray_results/ma_24/PPO_MultiAgentUDSSCMergeEnvReset-v0_0_num_sgd_iter=10_2018-12-11_04-16-5895leqegp
- Settings from era 'ecc'
- Multi agent, use 'controller/rllib_control.py'
- /Users/kathyjang/research/ray_results/ma_25/PPO_MultiAgentUDSSCMergeEnvReset-v0_2_num_sgd_iter=10_2018-12-11_04-19-17jtgrdafk
- Settings from era 'ecc'
- Multi agent, use 'controller/rllib_control.py'
- /Users/kathyjang/research/ray_results/ma_32/PPO_MultiAgentUDSSCMergeEnvReset-v0_3_num_sgd_iter=30_2018-12-13_08-04-41cdn7btwn
- Settings from era 'ecc'
- This one looks to be a little sticky
- Multi agent, use 'controller/rllib_control.py'
- /Users/kathyjang/research/ray_results/ma_33/PPO_MultiAgentUDSSCMergeEnvReset-v0_4_num_sgd_iter=10_2018-12-13_08-06-33tgvjat_2
- Settings from era 'ecc'
- Safer version
- Multi agent, use 'controller/rllib_control.py'
- /Users/kathyjang/research/ray_results/ma_33/PPO_MultiAgentUDSSCMergeEnvReset-v0_0_num_sgd_iter=10_2018-12-13_08-05-45s0oep872
- Settings from era 'ecc'
- Riskier version
- Multi agent, use 'controller/rllib_control.py'
- /Users/kathyjang/research/ray_results/ma_34/PPO_MultiAgentUDSSCMergeEnvReset-v0_1_num_sgd_iter=30_2018-12-13_08-09-2419es9vhs
- Settings from era 'ecc'
- Multi agent, use 'controller/rllib_control.py'
## icra_round_0
- /Users/kathyjang/research/ray_results/icra_56/PPO_UDSSCMergeEnvReset-v0_0_2019-08-31_04-13-106oeb9ed1
- Single agent, use 'controller/sa_rllib_control.py'
- /Users/kathyjang/research/ray_results/icra_57/PPO_UDSSCMergeEnvReset-v0_0_2019-08-31_04-14-13aoil1nou
- Single agent, use 'controller/sa_rllib_control.py'
- /Users/kathyjang/research/ray_results/icra_58/PPO_UDSSCMergeEnvReset-v0_0_2019-08-31_04-15-17q897k9l8
- Single agent, use 'controller/sa_rllib_control.py'
- /Users/kathyjang/research/ray_results/icra_59/PPO_UDSSCMergeEnvReset-v0_1_2019-08-31_04-57-35h_9kt7j0 140
- Single agent, use 'controller/sa_rllib_control.py'
- /Users/kathyjang/research/ray_results/icra_64/PPO_UDSSCMergeEnvReset-v0_0_2019-09-03_20-42-382g88v_tj
- Single agent, use 'controller/sa_rllib_control.py'
- /Users/kathyjang/research/ray_results/icra_65/PPO_UDSSCMergeEnvReset-v0_0_2019-09-03_20-44-29ha0ry539
- Single agent, use 'controller/sa_rllib_control.py'
- /Users/kathyjang/research/ray_results/icra_66/PPO_UDSSCMergeEnvReset-v0_1_2019-09-03_21-24-57rjm_33c8
- Single agent, use 'controller/sa_rllib_control.py'
- /Users/kathyjang/research/ray_results/icra_67/PPO_UDSSCMergeEnvReset-v0_0_2019-09-03_20-47-11u9npzqlr
- Single agent, use 'controller/sa_rllib_control.py'
- /Users/kathyjang/research/ray_results/icra_ma_76/PPO_MultiAgentUDSSCMergeHumanAdversary-v0_0_2019-09-05_01-20-2561t__0vl
- Multi agent, use 'controller/rllib_control.py'
- /Users/kathyjang/research/ray_results/icra_ma_77/PPO_MultiAgentUDSSCMergeHumanAdversary-v0_0_2019-09-05_01-22-24lsb8740z
- Multi agent, use 'controller/rllib_control.py'
- /Users/kathyjang/research/ray_results/icra_ma_78/PPO_MultiAgentUDSSCMergeHumanAdversary-v0_0_2019-09-05_01-25-09rlmnhhxm
- Multi agent, use 'controller/rllib_control.py'
- Multi agent, use 'controller/rllib_control.py'

To extract the Theano weights from a policy pickle and downgrade the result for Python 2.7 (a sketch of the downgrade step follows):
```
python scripts/extract_theano.py data/policies/varied_inflows_both_noise_0.pkl data/weights/varied_inflows_both_noise_0.pkl
python scripts/downgrade_pkl.py data/weights/varied_inflows_both_noise_0.pkl data/weights/varied_inflows_both_noise_0.pkl
```
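If you need to reproduce the downgrade step by hand, the core of what `scripts/downgrade_pkl.py` presumably does is re-serialize with a Python 2.7-readable pickle protocol, as in this sketch (run under Python 3; argument handling simplified):

```python
import pickle
import sys

# Usage: python downgrade_sketch.py <src.pkl> <dst.pkl>
src, dst = sys.argv[1], sys.argv[2]

with open(src, "rb") as f:
    obj = pickle.load(f)
with open(dst, "wb") as f:
    # Protocol 2 is the highest pickle protocol Python 2.7 can read.
    pickle.dump(obj, f, protocol=2)
```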