ast_toolbox.policies.go_explore_policy module
class ast_toolbox.policies.go_explore_policy.GoExplorePolicy(env_spec, name='GoExplorePolicy')
Bases: garage.tf.policies.base.StochasticPolicy
A stochastic policy for Go-Explore that takes actions uniformly at random.
Parameters:
- env_spec (garage.envs.EnvSpec) – Environment specification.
- name (str) – Name for the tensors.
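As a rough illustration of what "uniformly at random" means here, the sketch below draws an action from a box-shaped action space using NumPy only. It does not use garage or ast_toolbox and is not the GoExplorePolicy implementation; the class name `UniformRandomPolicySketch` and the `low`/`high` bounds are hypothetical stand-ins for whatever action space `env_spec` defines.

```python
import numpy as np


class UniformRandomPolicySketch:
    """Hypothetical stand-in; not the GoExplorePolicy implementation."""

    def __init__(self, low, high, seed=None):
        # Assumed box-shaped action space with per-dimension bounds.
        self.low = np.asarray(low, dtype=float)
        self.high = np.asarray(high, dtype=float)
        self.rng = np.random.default_rng(seed)

    def get_action(self, observation):
        # The observation is ignored: every action in the box is equally likely.
        return self.rng.uniform(self.low, self.high)


policy = UniformRandomPolicySketch(low=[-1.0, -1.0], high=[1.0, 1.0], seed=0)
action = policy.get_action(observation=np.zeros(4))
```

Because the observation plays no role in the draw, such a policy has nothing to learn; it only supplies exploratory actions.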
dist_info(obs, state_infos)
Distribution info.
Return the distribution information about the actions.
Parameters:
- obs (array_like) – Observation values.
- state_infos (dict) – A dictionary whose values should contain information about the state of the policy at the time it received the observation.
dist_info_sym(obs_var, state_info_vars, name='dist_info_sym')
Symbolic graph of the distribution.
Return the symbolic distribution information about the actions.
Parameters:
- obs_var (tf.Tensor) – Symbolic variable for observations.
- state_info_vars (dict) – A dictionary whose values should contain information about the state of the policy at the time it received the observation.
- name (str) – Name of the symbolic graph.
get_action(observation)
Get action sampled from the policy.
Parameters: observation (array_like) – Observation from the environment.
Returns: array_like – Action sampled from the policy.
get_actions(observations)
Get actions sampled from the policy.
Parameters: observations (list[array_like]) – Observations from the environment.
Returns: array_like – Actions sampled from the policy.
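The batched case can be pictured as one independent uniform draw per observation. The following is a minimal NumPy sketch under that assumption; `sample_uniform_actions` and the `low`/`high` bounds are hypothetical names for illustration and are not part of ast_toolbox or garage.

```python
import numpy as np


def sample_uniform_actions(observations, low, high, rng):
    """Hypothetical helper: one uniform action per observation."""
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    # Observation contents are ignored; only the batch size matters.
    return rng.uniform(low, high, size=(len(observations), low.shape[0]))


rng = np.random.default_rng(0)
observations = [np.zeros(4) for _ in range(3)]
actions = sample_uniform_actions(observations, [-1.0, -1.0], [1.0, 1.0], rng)
```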
reset(dones=None)
Reset the policy.
If dones is None, it defaults to np.array([True]), which implies the policy is not “vectorized”, i.e. the number of parallel environments used for training data sampling is 1.
Parameters: dones (array_like) – Bool that indicates terminal state(s).
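The default described above can be sketched as follows. `resolve_dones` is a hypothetical helper written for illustration, not a function from garage or ast_toolbox; it only shows how the length of `dones` corresponds to the number of parallel environments.

```python
import numpy as np


def resolve_dones(dones=None):
    """Hypothetical helper mirroring the documented default for `dones`."""
    if dones is None:
        # Documented default: a single environment whose state is reset.
        dones = np.array([True])
    return np.asarray(dones, dtype=bool)


single = resolve_dones()                    # non-vectorized case
batch = resolve_dones([True, False, True])  # three parallel environments
n_envs_single = len(single)
n_envs_batch = len(batch)
```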
distribution
Distribution.
Returns: Distribution.
vectorized
Indicates whether the policy is vectorized. If True, it should implement get_actions(), and support resetting with multiple simultaneous states.