ast_toolbox.policies.go_explore_policy module

class ast_toolbox.policies.go_explore_policy.GoExplorePolicy(env_spec, name='GoExplorePolicy')[source]

Bases: garage.tf.policies.base.StochasticPolicy

A stochastic policy for Go-Explore that takes actions uniformly at random.

Parameters:
  • env_spec (garage.envs.EnvSpec) – Environment specification.
  • name (str) – Name for the tensors.
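Since the policy simply ignores the observation and samples uniformly from the action space, its core behavior can be sketched without garage. The class below is a hypothetical stand-in (not the actual `GoExplorePolicy` implementation), assuming a bounded Box-like action space given by `low`/`high`:

```python
import numpy as np

class UniformRandomPolicy:
    """Illustrative stand-in for GoExplorePolicy: samples actions
    uniformly at random from a bounded (Box-like) action space."""

    def __init__(self, low, high):
        # Per-dimension action bounds.
        self.low = np.asarray(low, dtype=np.float64)
        self.high = np.asarray(high, dtype=np.float64)

    def get_action(self, observation):
        # The observation is ignored; every action is equally likely.
        action = np.random.uniform(self.low, self.high)
        return action, dict()

    def get_actions(self, observations):
        # One independent uniform sample per observation in the batch.
        n = len(observations)
        actions = np.random.uniform(
            self.low, self.high, size=(n,) + self.low.shape)
        return actions, dict()
```

The `(action, info_dict)` return shape mirrors the garage policy API, with an empty info dict since a uniform policy carries no distribution state per step.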
dist_info(obs, state_infos)[source]

Distribution info.

Return the distribution information about the actions.

Parameters:
  • obs (array_like) – Observation values.
  • state_infos (dict) – A dictionary whose values should contain information about the state of the policy at the time it received the observation.
dist_info_sym(obs_var, state_info_vars, name='dist_info_sym')[source]

Symbolic graph of the distribution.

Return the symbolic distribution information about the actions.

Parameters:
  • obs_var (tf.Tensor) – Symbolic variable for observations.
  • state_info_vars (dict) – A dictionary whose values should contain information about the state of the policy at the time it received the observation.
  • name (str) – Name of the symbolic graph.
get_action(observation)[source]

Get action sampled from the policy.

Parameters: observation (array_like) – Observation from the environment.
Returns: array_like – Action sampled from the policy.
get_actions(observations)[source]

Get actions sampled from the policy.

Parameters: observations (list[array_like]) – Observations from the environment.
Returns: array_like – Actions sampled from the policy.
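For a discrete action space, the batched sampling that get_actions() describes reduces to one uniform draw per observation. A minimal illustrative helper (hypothetical, not part of the toolbox API):

```python
import numpy as np

def sample_uniform_discrete(n_actions, n_obs, seed=0):
    """Sample one uniformly random discrete action per observation,
    mirroring get_actions() for a Discrete(n_actions) action space."""
    rng = np.random.default_rng(seed)
    # Integers in [0, n_actions), one per observation in the batch.
    return rng.integers(0, n_actions, size=n_obs)
```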
log_diagnostics(paths)[source]

Log extra information per iteration based on the collected paths.

reset(dones=None)[source]

Reset the policy.

If dones is None, it defaults to np.array([True]), which implies the policy is not "vectorized", i.e. the number of parallel environments for training data sampling is 1.

Parameters: dones (array_like) – Bool(s) that indicate terminal state(s).
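The documented default for dones can be sketched as follows; the helper name is hypothetical, but the None-handling matches the behavior described above:

```python
import numpy as np

def resolve_dones(dones=None):
    """Apply the documented default: dones=None means a single,
    non-vectorized environment, i.e. np.array([True])."""
    if dones is None:
        dones = np.array([True])
    dones = np.asarray(dones)
    # Length of dones == number of parallel environments being reset.
    return dones, dones.shape[0]
```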
terminate()[source]

Clean up operation.

distribution

Distribution.

Returns:Distribution.
vectorized

Indicates whether the policy is vectorized. If True, it should implement get_actions() and support resetting with multiple simultaneous states.