ast_toolbox.policies.go_explore_policy module

class ast_toolbox.policies.go_explore_policy.GoExplorePolicy(env_spec, name='GoExplorePolicy')[source]

Bases: garage.tf.policies.base.StochasticPolicy

A stochastic policy for Go-Explore that takes actions uniformly at random.

Parameters:
  • env_spec (garage.envs.EnvSpec) – Environment specification.
  • name (str) – Name for the tensors.
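Since the policy simply ignores the observation and samples uniformly from the action space, its core behavior can be sketched without garage. The class below is a hypothetical stand-in (not the actual `GoExplorePolicy` implementation), assuming a bounded Box-like action space given by `low`/`high`:

```python
import numpy as np

class UniformRandomPolicy:
    """Illustrative stand-in for GoExplorePolicy: samples actions
    uniformly at random from a bounded (Box-like) action space."""

    def __init__(self, low, high):
        # Per-dimension action bounds.
        self.low = np.asarray(low, dtype=np.float64)
        self.high = np.asarray(high, dtype=np.float64)

    def get_action(self, observation):
        # The observation is ignored; every action is equally likely.
        action = np.random.uniform(self.low, self.high)
        return action, dict()

    def get_actions(self, observations):
        # One independent uniform sample per observation in the batch.
        n = len(observations)
        actions = np.random.uniform(
            self.low, self.high, size=(n,) + self.low.shape)
        return actions, dict()
```

The `(action, info_dict)` return shape mirrors the garage policy API, with an empty info dict since a uniform policy carries no distribution state per step.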
dist_info(obs, state_infos)[source]

Distribution info.

Return the distribution information about the actions.

Parameters:
  • obs (array_like) – Observation values.
  • state_infos (dict) – A dictionary whose values should contain information about the state of the policy at the time it received the observation.
dist_info_sym(obs_var, state_info_vars, name='dist_info_sym')[source]

Symbolic graph of the distribution.

Return the symbolic distribution information about the actions.

Parameters:
  • obs_var (tf.Tensor) – Symbolic variable for observations.
  • state_info_vars (dict) – A dictionary whose values should contain information about the state of the policy at the time it received the observation.
  • name (str) – Name of the symbolic graph.
get_action(observation)[source]

Get action sampled from the policy.

Parameters: observation (array_like) – Observation from the environment.
Returns: array_like – Action sampled from the policy.
get_actions(observations)[source]

Get actions sampled from the policy.

Parameters: observations (list[array_like]) – Observations from the environment.
Returns: array_like – Actions sampled from the policy.
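For a discrete action space, the batched sampling that get_actions() describes reduces to one uniform draw per observation. A minimal illustrative helper (hypothetical, not part of the toolbox API):

```python
import numpy as np

def sample_uniform_discrete(n_actions, n_obs, seed=0):
    """Sample one uniformly random discrete action per observation,
    mirroring get_actions() for a Discrete(n_actions) action space."""
    rng = np.random.default_rng(seed)
    # Integers in [0, n_actions), one per observation in the batch.
    return rng.integers(0, n_actions, size=n_obs)
```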
log_diagnostics(paths)[source]

Log extra information per iteration based on the collected paths.

reset(dones=None)[source]

Reset the policy.

If dones is None, it defaults to np.array([True]), which implies the policy is not "vectorized", i.e. the number of parallel environments for training data sampling is 1.

Parameters: dones (array_like) – Bool(s) that indicate terminal state(s).
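The documented default for dones can be sketched as follows; the helper name is hypothetical, but the None-handling matches the behavior described above:

```python
import numpy as np

def resolve_dones(dones=None):
    """Apply the documented default: dones=None means a single,
    non-vectorized environment, i.e. np.array([True])."""
    if dones is None:
        dones = np.array([True])
    dones = np.asarray(dones)
    # Length of dones == number of parallel environments being reset.
    return dones, dones.shape[0]
```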
terminate()[source]

Clean up operation.

distribution

Distribution.

Returns:Distribution.
vectorized

Indicates whether the policy is vectorized. If True, it should implement get_actions() and support resetting with multiple simultaneous states.