ast_toolbox.envs.go_explore_ast_env module

Gym environment to turn general AST tasks into garage compatible problems with Go-Explore style resets.

class ast_toolbox.envs.go_explore_ast_env.Custom_GoExploreASTEnv(open_loop=True, blackbox_sim_state=True, fixed_init_state=False, s_0=None, simulator=None, reward_function=None, spaces=None)[source]

Bases: ast_toolbox.envs.go_explore_ast_env.GoExploreASTEnv

Custom class to change how downsampling works.

Example class of how to overload downsample to make the environment work for different environments.

downsample(obs, step=None)[source]

Create a downsampled approximation of the observed simulation state.

Parameters:
  • obs (array_like) – The observed simulation state.
  • step (int, optional) – The current iteration number
Returns:

array_like – The downsampled approximation of the observed simulation state.
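A typical override maps a continuous observation to a coarse, hashable cell key. The sketch below is a minimal standalone example of such a discretization (the `grain` parameter and the rounding scheme are hypothetical, not part of the toolbox); in practice this logic would live in an overridden `downsample` method of a `GoExploreASTEnv` subclass:

```python
def downsample(obs, step=None, grain=10.0):
    # Hypothetical discretization: scale each state variable and round,
    # so that nearby continuous states map to the same hashable cell key.
    cell = tuple(int(round(x * grain)) for x in obs)
    if step is not None:
        # Optionally prepend the step so cells are also distinguished by time.
        cell = (step,) + cell
    return cell

key = downsample([0.123, -4.56], step=3)  # a hashable tuple cell key
```

Because the result is a tuple, it can be used directly as a dictionary key in the cell pool; choosing `grain` trades off how aggressively distinct states are merged into one cell.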

class ast_toolbox.envs.go_explore_ast_env.GoExploreASTEnv(open_loop=True, blackbox_sim_state=True, fixed_init_state=False, s_0=None, simulator=None, reward_function=None, spaces=None)[source]

Bases: gym.core.Env, ast_toolbox.envs.go_explore_ast_env.Parameterized

Gym environment to turn general AST tasks into garage compatible problems with Go-Explore style resets.

Certain algorithms, such as Go-Explore and the Backwards Algorithm, require deterministic resets of the simulator. GoExploreASTEnv handles this by cloning simulator states and saving them in a cell structure. The cells are then stored in a hashed database.
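The cell structure can be pictured as a dictionary keyed by downsampled observations, each entry holding a cloned simulator state that can later be restored for a deterministic reset. The following is a minimal sketch of that idea (the `store_cell` helper and the keep-first policy are illustrative assumptions, not the toolbox's actual storage code):

```python
cell_pool = {}

def store_cell(key, cloned_state, fitness=1.0):
    # Keep the first cloned simulator state seen for each downsampled key;
    # later visits to the same cell do not overwrite it.
    if key not in cell_pool:
        cell_pool[key] = {"state": cloned_state, "fitness": fitness}

store_cell((0, 1), [0.0, 1.2])
store_cell((0, 1), [0.0, 1.3])  # same key: the existing cell is kept
```

Resetting to a cell then amounts to looking up its stored state and restoring the simulator to it.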

Parameters:
  • open_loop (bool) – True if the simulation is open-loop: AST must generate all actions ahead of time, rather than outputting an action in sync with the simulator and receiving an observation back before the next action is generated. False for interactive control, which requires that blackbox_sim_state also be False.
  • blackbox_sim_state (bool) – True if the true simulation state cannot be observed, in which case the actions and initial conditions are used as the observation. False if the simulation state can be observed, in which case it is used as the observation.
  • fixed_init_state (bool) – True if the initial state is fixed. False to sample the initial state for each rollout from the observation space.
  • s_0 (array_like) – The initial state for the simulation (ignored if fixed_init_state is False).
  • simulator (ast_toolbox.simulators.ASTSimulator) – The simulator wrapper, inheriting from ast_toolbox.simulators.ASTSimulator.
  • reward_function (ast_toolbox.rewards.ASTReward) – The reward function, inheriting from ast_toolbox.rewards.ASTReward.
  • spaces (ast_toolbox.spaces.ASTSpaces) – The observation and action space definitions, inheriting from ast_toolbox.spaces.ASTSpaces.
close()[source]

Calls the simulator’s close function, if it exists.

Returns:None or object – Returns the output of the simulator’s close function, or None if the simulator has no close function.
downsample(obs)[source]

Create a downsampled approximation of the observed simulation state.

Parameters:obs (array_like) – The observed simulation state.
Returns:array_like – The downsampled approximation of the observed simulation state.
env_reset()[source]

Resets the state of the environment, returning an initial observation.

Returns:observation (array_like) – The initial observation of the space. (Initial reward is assumed to be 0.)
get_cache_list()[source]

Returns the environment info cache.

Returns:dict – A dictionary containing diagnostic and logging information for the environment.
get_first_cell()[source]

Returns the observation and cloned simulator state for the initial state, to be used as the root cell.

Returns:
  • obs (array_like) – Agent’s observation of the current environment.
  • state (array_like) – The cloned simulation state at the current cell, used for resetting if chosen to start a rollout.
get_param_values(**tags)[source]

Return the values of internal parameters.

Parameters:tags (dict[bool]) – For each tag, a parameter is returned if the parameter name matches the tag’s key
Returns:list – A list of parameter values.
get_params_internal(**tags)[source]

Returns the parameters associated with the given tags.

Parameters:tags (dict[bool]) – For each tag, a parameter is returned if the parameter name matches the tag’s key
Returns:list – List of parameters
log()[source]

Calls the simulator’s log function.

render(**kwargs)[source]

Calls the simulator’s render function, if it exists.

Returns:None or object – Returns the output of the simulator’s render function, or None if the simulator has no render function.
reset(**kwargs)[source]

Resets the state of the environment, returning an initial observation.

The reset has 2 modes.

In the “robustify” mode (self.p_robustify_state.value is not None), the simulator resets the environment to p_robustify_state.value. It then returns the initial condition.

In the “Go-Explore” mode, the environment attempts to sample a cell from the cell pool. If successful, the simulator is reset to the cell’s state. On an error, the environment is reset to the initial state.

Returns:observation (array_like) – The initial observation of the space. (Initial reward is assumed to be 0.)
sample(population)[source]

Sample a cell from the cell pool with likelihood proportional to cell fitness.

The sampling is done using Stochastic Acceptance [1], with inspiration from John B. Nelson’s blog [2].

The sampler rejects cells until the acceptance criterion is met. If the maximum number of rejections is exceeded, the sampler falls back to sampling cells uniformly until it finds one with fitness > 0. If this second phase also exceeds the rejection limit, the function raises an exception.

Parameters:population (list) – A list containing the population of cells to sample from.
Returns:object – The sampled cell.
Raises:ValueError – If the maximum number of rejections is exceeded in both the proportional and the uniform sampling phases.

References

[1] Lipowski, Adam, and Dorota Lipowska. “Roulette-wheel selection via stochastic acceptance.” Physica A: Statistical Mechanics and its Applications 391.6 (2012): 2193–2196. https://arxiv.org/pdf/1109.3627.pdf

[2] https://jbn.github.io/fast_proportional_selection/
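The stochastic acceptance scheme described above can be sketched in a few lines (the cell representation as a dict with a "fitness" key, and the `max_rejections` default, are assumptions for illustration):

```python
import random

def sample_proportional(population, max_rejections=1000, rng=random):
    # Phase 1 -- stochastic acceptance: draw a uniform candidate and
    # accept it with probability fitness / max_fitness.
    f_max = max(cell["fitness"] for cell in population)
    if f_max > 0:
        for _ in range(max_rejections):
            cell = rng.choice(population)
            if rng.random() < cell["fitness"] / f_max:
                return cell
    # Phase 2 -- fall back to uniform sampling of any cell with fitness > 0.
    for _ in range(max_rejections):
        cell = rng.choice(population)
        if cell["fitness"] > 0:
            return cell
    raise ValueError("Maximum number of rejections exceeded")
```

Unlike classic roulette-wheel selection, this never computes the full fitness sum, so each draw is O(1) on average while still selecting cells with probability proportional to fitness.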

set_param_values(param_values, **tags)[source]

Set the values of internal parameters.

Parameters:
  • param_values (object) – Value to set the parameter to.
  • tags (dict[bool]) – For each tag, a parameter is returned if the parameter name matches the tag’s key
simulate(actions)[source]

Run a full simulation rollout.

Parameters:actions (list[array_like]) – A list of array_likes, where each member is the action taken at that step.
Returns:
  • int – The step of the trajectory where a collision was found, or -1 if a collision was not found.
  • dict – A dictionary of simulation information for logging and diagnostics.
step(action)[source]

Run one timestep of the environment’s dynamics. When end of episode is reached, reset() should be called to reset the environment’s internal state.

Parameters:action (array_like) – An action provided by the agent.
Returns:garage.envs.base.Step() – A step in the rollout. Contains the following information:
  • observation (array_like): Agent’s observation of the current environment.
  • reward (float): Amount of reward due to the previous action.
  • done (bool): Is the current step a terminal or goal state, ending the rollout.
  • cache (dict): A dictionary containing other diagnostic information from the current step.
  • actions (array_like): The action taken at the current step.
  • state (array_like): The cloned simulation state at the current cell, used for resetting if chosen to start a rollout.
  • is_terminal (bool): Whether or not the current cell is a terminal state.
  • is_goal (bool): Whether or not the current cell is a goal state.
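step() follows the usual Gym loop. The sketch below runs a rollout against a minimal stand-in environment (the `_ToyEnv` class is hypothetical, defined here only so the loop is self-contained; it mimics the (observation, reward, done, info) return shape listed above):

```python
class _ToyEnv:
    # Hypothetical stand-in mimicking the step()/reset() contract above.
    def reset(self):
        self._t = 0
        return [0.0]

    def step(self, action):
        self._t += 1
        obs = [float(self._t)]
        reward = -1.0                      # reward due to the previous action
        done = self._t >= 3                # terminal after three steps
        info = {"cache": {}, "state": obs,
                "is_terminal": done, "is_goal": False}
        return obs, reward, done, info

env = _ToyEnv()
obs, total_reward, done = env.reset(), 0.0, False
while not done:
    action = [0.0]                         # a policy would choose this
    obs, reward, done, info = env.step(action)
    total_reward += reward
```

With a real GoExploreASTEnv, the info dictionary's state field is what gets cloned into the cell pool for later deterministic resets.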
action_space

Convenient access to the environment’s action space.

Returns:gym.spaces.Space – The action space of the reinforcement learning problem.
observation_space

Convenient access to the environment’s observation space.

Returns:gym.spaces.Space – The observation space of the reinforcement learning problem.
spec

Returns a garage environment specification.

Returns:garage.envs.env_spec.EnvSpec – A garage environment specification.
class ast_toolbox.envs.go_explore_ast_env.GoExploreParameter(name, value)[source]

Bases: object

A wrapper for variables that will be set as parameters in the GoExploreASTEnv.

Parameters:
  • name (str) – Name of the parameter.
  • value (object) – Value of the parameter.
get_value(**kwargs)[source]

Return the value of the parameter.

Parameters:kwargs – Extra keyword arguments (Not currently used).
Returns:object – The value of the parameter.
set_value(value)[source]

Set the value of the parameter.

Parameters:value (object) – What to set the parameter’s value to.
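The wrapper's interface is small enough to restate in full. The sketch below is a minimal re-implementation for illustration (the `ToyParameter` name and the example values are hypothetical), showing the get/set round trip:

```python
class ToyParameter:
    # Minimal re-implementation of the GoExploreParameter interface,
    # for illustration only.
    def __init__(self, name, value):
        self.name = name
        self.value = value

    def get_value(self, **kwargs):
        # kwargs accepted but unused, matching the documented signature.
        return self.value

    def set_value(self, value):
        self.value = value

p = ToyParameter("db_filename", "cell_pool.dat")
p.set_value("cell_pool_v2.dat")
```

Because the wrapper holds a plain attribute, the value can be swapped after the environment is pickled and unpickled by garage's parallel workers.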
class ast_toolbox.envs.go_explore_ast_env.Parameterized[source]

Bases: object

A slimmed down version of the (deprecated) Parameterized class from garage for passing parameters to environments.

Garage uses pickle to handle parallelization, which limits the types of objects that can be used as class attributes within the environment. This class is a workaround so that the parallel environments can have access to things like a database.

get_params(**tags)[source]

Get the list of parameters, filtered by the provided tags. Common tags include ‘regularizable’ and ‘trainable’.

Parameters:tags (str) – Names of the parameters to return.
get_params_internal(**tags)[source]

Internal method, to be implemented by subclasses, which does not perform caching.

Parameters:tags (str) – Names of the parameters to return.