ast_toolbox.algos.ga module
-
class ast_toolbox.algos.ga.GA(top_paths=None, n_itr=2, batch_size=500, step_size=0.01, step_size_anneal=1.0, pop_size=5, truncation_size=2, keep_best=1, f_F='mean', log_interval=4000, init_step=1.0, **kwargs)
Bases: garage.tf.algos.batch_polopt.BatchPolopt
Deep Genetic Algorithm from Such et al. [1].
Parameters: - top_paths (ast_toolbox.mcts.BoundedPriorityQueues, optional) – The bounded priority queue that stores the top-rewarded trajectories.
- step_size (float, optional) – Standard deviation of each mutation.
- step_size_anneal (float, optional) – The linear annealing rate of step_size after each iteration.
- pop_size (int, optional) – The population size.
- truncation_size (int, optional) – The number of top-performing individuals that are chosen as parents.
- keep_best (int, optional) – The number of top-performing individuals that are carried over unchanged to the next generation.
- f_F (string, optional) – The method used to calculate fitness: ‘mean’ for the average return, ‘max’ for the max return.
- log_interval (int, optional) – The log interval in terms of environment calls.
- kwargs – Keyword arguments passed to garage.tf.algos.BatchPolopt.
References
[1] Such, Felipe Petroski, et al. “Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning.” arXiv:1712.06567 (2017).
-
extra_recording(itr)
Record extra training statistics per iteration.
Parameters: itr (int) – The iteration number.
-
get_fitness(itr, all_paths)
Calculate the fitness of the collected paths.
Parameters: - itr (int) – The iteration number.
- all_paths (list[dict]) – The collected paths from the sampler.
Returns: fitness (list[float]) – The fitness of each individual.
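The two fitness modes named by f_F can be illustrated with a small standalone sketch. It is not the library's implementation: the helper name fitness_from_paths, the grouping of paths per individual, the "rewards" key, and the use of the undiscounted reward sum as the return are all assumptions made for illustration.

```python
import numpy as np


def fitness_from_paths(paths_per_individual, f_F="mean"):
    """Hypothetical helper: reduce each individual's returns to one fitness value.

    paths_per_individual: list with one entry per individual; each entry is a
    list of path dicts holding a per-step "rewards" sequence (assumed layout).
    """
    if f_F not in ("mean", "max"):
        raise ValueError("f_F must be 'mean' or 'max'")
    reduce_fn = np.mean if f_F == "mean" else np.max
    fitness = []
    for paths in paths_per_individual:
        # Assumption: the return of a path is the undiscounted sum of rewards.
        returns = [float(np.sum(path["rewards"])) for path in paths]
        fitness.append(float(reduce_fn(returns)))
    return fitness


example = [
    [{"rewards": [1.0, 2.0]}, {"rewards": [5.0]}],  # individual 0
    [{"rewards": [0.5]}],                           # individual 1
]
mean_fitness = fitness_from_paths(example, "mean")
max_fitness = fitness_from_paths(example, "max")
```

With 'mean', an individual is scored by its average rollout; with 'max', a single high-return rollout is enough to rank it highly.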
-
get_itr_snapshot(itr, samples_data)
Get the snapshot of the current population.
Parameters: - itr (int) – The iteration number.
- samples_data (dict) – The processed data samples.
Returns: snapshot (dict) – The training snapshot.
-
mutation(itr, new_seeds, new_magnitudes, all_paths)
Generate new random seeds and magnitudes for the next generation.
The first self.keep_best seeds are set to the no-mutation value (0).
Parameters: - itr (int) – The iteration number.
- new_seeds (numpy.ndarray) – The original seeds.
- new_magnitudes (numpy.ndarray) – The original magnitudes.
- all_paths (list[dict]) – The collected paths from the sampler.
Returns: - new_seeds (numpy.ndarray) – The new seeds.
- new_magnitudes (numpy.ndarray) – The new magnitudes.
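A minimal sketch of a mutation step of this shape is given here. Only the rule that the first keep_best seeds become 0 comes from the description above; the function name mutate_sketch, the seed range, and setting every magnitude to step_size are assumptions, and the actual method may differ.

```python
import numpy as np


def mutate_sketch(new_seeds, new_magnitudes, step_size, keep_best=1, rng=None):
    """Hypothetical helper: draw one fresh seed and magnitude per individual."""
    rng = np.random.default_rng() if rng is None else rng
    # np.array copies its input, so the caller's arrays are left untouched.
    seeds = np.array(new_seeds, dtype=np.int64)
    magnitudes = np.array(new_magnitudes, dtype=float)
    pop_size = seeds.shape[0]
    # Assumption: seeds are positive integers, so 0 can mean "no mutation".
    seeds[:] = rng.integers(1, 2**31 - 1, size=pop_size)
    # Assumption: every mutation uses the current step size as its magnitude.
    magnitudes[:] = step_size
    # The first keep_best individuals are kept unchanged (seed 0).
    seeds[:keep_best] = 0
    return seeds, magnitudes


old_seeds = np.zeros(5, dtype=np.int64)
old_magnitudes = np.zeros(5)
seeds, magnitudes = mutate_sketch(
    old_seeds, old_magnitudes, step_size=0.01, keep_best=1,
    rng=np.random.default_rng(0),
)
```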
-
obtain_samples(itr, runner)
Collect rollout samples using the current policy parameters.
Parameters: - itr (int) – The iteration number.
- runner (garage.experiment.LocalRunner) – LocalRunner is passed to give the algorithm access to runner.obtain_samples(), which collects rollout paths from the sampler.
Returns: paths (list[dict]) – The collected paths from the sampler.
-
optimize_policy(itr, all_paths)
Update the population represented by self.seeds and self.parents.
Parameters: - itr (int) – The iteration number.
- all_paths (list[dict]) – The collected paths from the sampler.
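In the compact encoding described by Such et al. [1], an individual is stored as the sequence of random seeds that produced it, and its parameter vector is rebuilt by replaying those mutations. The sketch below illustrates that idea only; the name reconstruct_params_sketch, the init_scale argument, and the use of NumPy generators are assumptions and do not describe how this class stores or applies self.seeds.

```python
import numpy as np


def reconstruct_params_sketch(init_seed, seeds, magnitudes, n_params, init_scale=1.0):
    """Hypothetical helper: rebuild a parameter vector from its mutation history."""
    # The initial parameters are a deterministic function of init_seed.
    theta = init_scale * np.random.default_rng(init_seed).standard_normal(n_params)
    for seed, sigma in zip(seeds, magnitudes):
        if seed == 0:
            # 0 is treated as the no-mutation value: leave theta unchanged.
            continue
        noise = np.random.default_rng(int(seed)).standard_normal(n_params)
        theta = theta + sigma * noise
    return theta


with_noop = reconstruct_params_sketch(7, [3, 0, 5], [0.1, 0.1, 0.1], n_params=4)
without_noop = reconstruct_params_sketch(7, [3, 5], [0.1, 0.1], n_params=4)
unmutated = reconstruct_params_sketch(7, [], [], n_params=4)
```

Because each seed fully determines its noise vector, two workers that share the seed history reconstruct identical parameters without exchanging the weights themselves.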
-
process_samples(itr, paths)
Return processed sample data based on the collected paths.
Parameters: - itr (int) – The iteration number.
- paths (list[dict]) – The collected paths from the sampler.
Returns: samples_data (dict) – Processed sample data in which all trajectories are zero-padded to the same length.
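Zero-padding trajectories to a common length can be sketched as follows. The helper pad_paths_sketch, the "rewards" key, and the accompanying validity mask are illustrative assumptions rather than the structure of the returned samples_data.

```python
import numpy as np


def pad_paths_sketch(paths, max_path_length, key="rewards"):
    """Hypothetical helper: stack one per-step scalar field into a padded array."""
    padded = np.zeros((len(paths), max_path_length), dtype=float)
    valids = np.zeros((len(paths), max_path_length), dtype=float)
    for i, path in enumerate(paths):
        n = min(len(path[key]), max_path_length)
        padded[i, :n] = path[key][:n]  # real steps
        valids[i, :n] = 1.0            # 1 marks a real step, 0 marks padding
    return padded, valids


padded, valids = pad_paths_sketch(
    [{"rewards": [1.0, 2.0, 3.0]}, {"rewards": [4.0]}], max_path_length=3
)
```

A mask of this kind lets later statistics ignore the padded entries instead of counting them as zero-reward steps.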
-
record_tabular(itr)
Record training performance per iteration.
Parameters: itr (int) – The iteration number.
-
select_parents(fitness)
Select the individuals to be the parents of the next generation.
Parameters: fitness (list[float]) – The fitness of each individual.
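Truncation selection, as suggested by the truncation_size and keep_best parameters, can be sketched like this. The function select_parents_sketch and its return format are assumptions: it keeps the best keep_best individuals in the first slots and samples the remaining parents uniformly from the top truncation_size, which may not match the method's exact behavior.

```python
import numpy as np


def select_parents_sketch(fitness, truncation_size, pop_size, keep_best=1, rng=None):
    """Hypothetical helper: return the parent index for each slot of the next generation."""
    if keep_best > truncation_size:
        raise ValueError("keep_best must not exceed truncation_size")
    rng = np.random.default_rng() if rng is None else rng
    order = np.argsort(fitness)[::-1]   # indices sorted from best to worst
    top = order[:truncation_size]       # the only eligible parents
    parents = np.empty(pop_size, dtype=int)
    parents[:keep_best] = top[:keep_best]  # elites occupy the first slots
    parents[keep_best:] = rng.choice(top, size=pop_size - keep_best)
    return parents


parents = select_parents_sketch(
    [0.1, 0.9, 0.4, 0.7, 0.2], truncation_size=2, pop_size=5, keep_best=1,
    rng=np.random.default_rng(0),
)
```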
-
set_params(itr, p)
Set the current policy parameters to those of the specified iteration and individual.
Parameters: - itr (int) – The iteration number.
- p (int) – The individual index.
-
train(runner)
Start training.
Parameters: runner (garage.experiment.LocalRunner) – LocalRunner is passed to give the algorithm access to runner.step_epochs(), which provides services such as snapshotting and sampler control.
- top_paths (