ast_toolbox.algos.ga module

class ast_toolbox.algos.ga.GA(top_paths=None, n_itr=2, batch_size=500, step_size=0.01, step_size_anneal=1.0, pop_size=5, truncation_size=2, keep_best=1, f_F='mean', log_interval=4000, init_step=1.0, **kwargs)

Bases: garage.tf.algos.batch_polopt.BatchPolopt

Deep Genetic Algorithm from Such et al. [1].

Parameters:

top_paths (ast_toolbox.mcts.BoundedPriorityQueues, optional) – The bounded priority queue to store top-rewarded trajectories.
step_size (float, optional) – Standard deviation for each mutation.
step_size_anneal (float, optional) – The linear annealing rate of step_size after each iteration.
pop_size (int, optional) – The population size.
truncation_size (int, optional) – The number of top-performing individuals that are chosen as parents.
keep_best (int, optional) – The number of top-performing individuals that are carried over unchanged to the next generation.
f_F (string, optional) – The method used to calculate fitness: ‘mean’ for the average return, ‘max’ for the max return.
log_interval (int, optional) – The log interval in terms of environment calls.
kwargs – Keyword arguments passed to garage.tf.algos.BatchPolopt.

References

[1]	Such, Felipe Petroski, et al. “Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning.” arXiv:1712.06567 (2017).
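The parameters above (pop_size, truncation_size, keep_best, step_size) map onto a truncation-selection loop with elitism and Gaussian mutation. The following is a minimal, self-contained sketch of that loop on plain parameter vectors, not the toolbox's implementation: `deep_ga_sketch`, `negative_squared_norm`, and the `history` return value are hypothetical names introduced for illustration, and the sketch does not use garage, TensorFlow, or the sampler.

```python
import random


def deep_ga_sketch(fitness_fn, dim, n_itr=20, pop_size=5, truncation_size=2,
                   keep_best=1, step_size=0.1, seed=0):
    """Toy truncation-selection GA over plain parameter vectors (illustrative only)."""
    rng = random.Random(seed)
    population = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(pop_size)]
    history = []  # best fitness observed in each generation
    for _ in range(n_itr):
        ranked = sorted(population, key=fitness_fn, reverse=True)
        history.append(fitness_fn(ranked[0]))
        parents = ranked[:truncation_size]
        # Elitism: the top keep_best individuals are copied unchanged.
        next_population = [list(p) for p in ranked[:keep_best]]
        # Fill the remaining slots with Gaussian-mutated copies of random parents.
        while len(next_population) < pop_size:
            parent = rng.choice(parents)
            next_population.append(
                [w + step_size * rng.gauss(0.0, 1.0) for w in parent])
        population = next_population
    best = max(population, key=fitness_fn)
    history.append(fitness_fn(best))
    return best, history


def negative_squared_norm(x):
    # Hypothetical fitness function: maximal (zero) at the origin.
    return -sum(w * w for w in x)


best, history = deep_ga_sketch(negative_squared_norm, dim=3)
```

Because at least one elite is carried over unchanged, the best fitness in `history` never decreases from one generation to the next.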

extra_recording(itr)

Record extra training statistics per-iteration.

Parameters:	itr (int) – The iteration number.

get_fitness(itr, all_paths)

Calculate the fitness of the collected paths.

Parameters:

itr (int) – The iteration number.
all_paths (list[dict]) – The collected paths from the sampler.

Returns:	fitness (list[float]) – The fitness of each individual.
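As an illustration of the two f_F options ('mean' and 'max'), the sketch below computes a per-individual fitness from trajectory returns. It rests on assumptions not stated in this page: that the paths are already grouped per individual, that each path dict carries a "rewards" sequence, and that a return is the undiscounted sum of rewards. `fitness_from_paths` is a hypothetical helper, not the method itself.

```python
from statistics import mean


def fitness_from_paths(paths_per_individual, f_F="mean"):
    """Return one fitness value per individual from its list of path dicts."""
    fitness = []
    for paths in paths_per_individual:
        # Assumption: a path's return is the undiscounted sum of its rewards.
        returns = [sum(path["rewards"]) for path in paths]
        if f_F == "mean":
            fitness.append(mean(returns))
        elif f_F == "max":
            fitness.append(max(returns))
        else:
            raise ValueError("f_F must be 'mean' or 'max'")
    return fitness


example_paths = [
    [{"rewards": [1.0, 2.0]}, {"rewards": [0.0, 1.0]}],   # returns 3.0 and 1.0
    [{"rewards": [5.0]}, {"rewards": [-1.0, 0.0]}],       # returns 5.0 and -1.0
]
mean_fitness = fitness_from_paths(example_paths, "mean")  # [2.0, 2.0]
max_fitness = fitness_from_paths(example_paths, "max")    # [3.0, 5.0]
```

In this example the two individuals tie under 'mean' but not under 'max', which shows how the choice of f_F can change which individuals become parents.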

get_itr_snapshot(itr, samples_data)

Get the snapshot of the current population.

Parameters:

itr (int) – The iteration number.
samples_data (dict) – The processed data samples.

Returns:	snapshot (dict) – The training snapshot.

init_opt()

Initialize the trainer's internal TensorFlow operations.

initial()

Initialize the trainer's internal parameters.

mutation(itr, new_seeds, new_magnitudes, all_paths)

Generate new random seeds and magnitudes for the next generation.

The first self.keep_best seeds are set to the no-mutation value (0).

Parameters:

itr (int) – The iteration number.
new_seeds (numpy.ndarray) – The original seeds.
new_magnitudes (numpy.ndarray) – The original magnitudes.
all_paths (list[dict]) – The collected paths from the sampler.

Returns:

new_seeds (numpy.ndarray) – The new seeds.
new_magnitudes (numpy.ndarray) – The new magnitudes.
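The mutation step described above can be sketched as follows, using plain Python lists in place of numpy arrays. This is an illustration under assumptions, not the method's code: `mutate_sketch` is a hypothetical name, drawing seeds from a range that excludes 0 is an assumption made so that 0 stays reserved as the no-mutation value, and zeroing the elite magnitudes is also an assumption (the description only states that the elite seeds are set to 0).

```python
import random


def mutate_sketch(pop_size, keep_best, step_size, rng):
    """Draw one mutation seed and magnitude per individual of the next generation."""
    # Assumption: nonzero seeds, so that 0 can mean "no mutation".
    new_seeds = [rng.randrange(1, 2**31) for _ in range(pop_size)]
    new_magnitudes = [step_size] * pop_size
    for i in range(keep_best):
        new_seeds[i] = 0         # elite individuals are not mutated
        new_magnitudes[i] = 0.0  # assumption: their magnitude is zeroed as well
    return new_seeds, new_magnitudes


seeds, magnitudes = mutate_sketch(pop_size=5, keep_best=1, step_size=0.01,
                                  rng=random.Random(0))
```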

obtain_samples(itr, runner)

Collect rollout samples using the current policy parameters.

Parameters:

itr (int) – The iteration number.
runner (`garage.experiment.LocalRunner`) – `LocalRunner` is passed to give the algorithm access to `runner.obtain_samples()`, which collects rollout paths from the sampler.

Returns:	paths (list[dict]) – The collected paths from the sampler.

optimize_policy(itr, all_paths)

Update the population represented by self.seeds and self.parents.

Parameters:

itr (int) – The iteration number.
all_paths (list[dict]) – The collected paths from the sampler.

process_samples(itr, paths)

Return processed sample data based on the collected paths.

Parameters:

itr (int) – The iteration number.
paths (list[dict]) – The collected paths from the sampler.

Returns:	samples_data (dict) – Processed sample data in which all trajectories have the same length (shorter ones are padded with 0).
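Padding trajectories of different lengths to a common length can be sketched as below. `pad_to_same_length` is a hypothetical helper, and the accompanying validity mask is an assumption added for illustration; this page does not say which keys samples_data contains.

```python
def pad_to_same_length(sequences, pad_value=0.0):
    """Right-pad each sequence to the longest length and build a validity mask."""
    max_len = max((len(s) for s in sequences), default=0)
    padded = [list(s) + [pad_value] * (max_len - len(s)) for s in sequences]
    # 1 marks a real time step, 0 marks padding.
    valids = [[1] * len(s) + [0] * (max_len - len(s)) for s in sequences]
    return padded, valids


padded_rewards, valids = pad_to_same_length([[1.0, 2.0, 3.0], [4.0]])
# padded_rewards == [[1.0, 2.0, 3.0], [4.0, 0.0, 0.0]]
# valids == [[1, 1, 1], [1, 0, 0]]
```

A mask of this kind lets later computations tell genuine zero rewards apart from padding.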

record_tabular(itr)

Record training performance per-iteration.

Parameters:	itr (int) – The iteration number.

select_parents(fitness)

Select the individuals to be the parents of the next generation.

Parameters:	fitness (list[float]) – The list of fitness of each individual.
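One way to select parents, consistent with the truncation_size and keep_best parameters of the class, is truncation selection: rank individuals by fitness, keep the elites in place, and draw the remaining parents uniformly from the top truncation_size. The sketch below is hypothetical (`select_parents_sketch` and its return format are assumptions, and it presumes keep_best <= truncation_size); the actual method may store its result differently.

```python
import random


def select_parents_sketch(fitness, truncation_size, keep_best, rng):
    """Return, for each slot of the next generation, the index of its parent."""
    # Indices of individuals sorted from highest to lowest fitness.
    ranked = sorted(range(len(fitness)), key=lambda i: fitness[i], reverse=True)
    top = ranked[:truncation_size]
    # The first keep_best slots are the elites themselves.
    parents = ranked[:keep_best]
    # Remaining slots: parents drawn uniformly from the truncated set.
    parents += [rng.choice(top) for _ in range(len(fitness) - keep_best)]
    return parents


parents = select_parents_sketch([0.1, 0.9, 0.5, 0.3, 0.7], truncation_size=2,
                                keep_best=1, rng=random.Random(0))
```

Here individuals 1 and 4 have the highest fitness, so every entry of `parents` is 1 or 4, and the first entry is the elite, 1.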

set_params(itr, p)

Set the current policy parameters to those of the specified iteration and individual.

Parameters:

itr (int) – The iteration number.
p (int) – The individual index.
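In Such et al. [1], an individual is encoded compactly as a chain of random seeds, and its parameters are rebuilt by replaying the mutations those seeds generate. The sketch below illustrates that idea only; `params_from_seed_chain` and its arguments are hypothetical, the treatment of seed 0 as "no mutation" follows the mutation description above, and the toolbox's actual storage layout for self.seeds and self.parents is not shown on this page.

```python
import random


def params_from_seed_chain(init_seed, seeds, magnitudes, dim, init_step=1.0):
    """Rebuild a parameter vector by replaying a chain of seeded mutations."""
    # The initial parameters are determined entirely by the initial seed.
    rng = random.Random(init_seed)
    params = [init_step * rng.gauss(0.0, 1.0) for _ in range(dim)]
    for seed, magnitude in zip(seeds, magnitudes):
        if seed == 0:
            continue  # seed 0 means this generation applied no mutation
        rng = random.Random(seed)
        params = [w + magnitude * rng.gauss(0.0, 1.0) for w in params]
    return params


child = params_from_seed_chain(11, [22, 33], [0.1, 0.1], dim=4)
same_child = params_from_seed_chain(11, [22, 33], [0.1, 0.1], dim=4)
elite_copy = params_from_seed_chain(11, [22, 33, 0], [0.1, 0.1, 0.0], dim=4)
```

Replaying the same chain always yields the same parameters, and appending a no-mutation step leaves them unchanged, which is what allows elites to be carried over exactly.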

train(runner)

Start training.

Parameters:	runner (`garage.experiment.LocalRunner`) – `LocalRunner` is passed to give the algorithm access to `runner.step_epochs()`, which provides services such as snapshotting and sampler control.