ast_toolbox.algos.ga module

class ast_toolbox.algos.ga.GA(top_paths=None, n_itr=2, batch_size=500, step_size=0.01, step_size_anneal=1.0, pop_size=5, truncation_size=2, keep_best=1, f_F='mean', log_interval=4000, init_step=1.0, **kwargs)

Bases: garage.tf.algos.batch_polopt.BatchPolopt

Deep Genetic Algorithm from Such et al. [1].

Parameters:
  • top_paths (ast_toolbox.mcts.BoundedPriorityQueues, optional) – The bounded priority queue to store top-rewarded trajectories.
  • step_size (float, optional) – Standard deviation for each mutation.
  • step_size_anneal (float, optional) – The linear annealing rate of step_size after each iteration.
  • pop_size (int, optional) – The population size.
  • truncation_size (int, optional) – The number of top-performing individuals that are chosen as parents.
  • keep_best (int, optional) – The number of top-performing individuals that are carried over unchanged to the next generation.
  • f_F (string, optional) – The method used to calculate fitness: ‘mean’ for the average return, ‘max’ for the max return.
  • log_interval (int, optional) – The log interval in terms of environment calls.
  • kwargs – Keyword arguments passed to garage.tf.algos.BatchPolopt.
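
As a rough illustration, here is how these parameters interact in the Deep GA of Such et al. [1]. Each generation is ranked by fitness. The top truncation_size individuals become parents, and the top keep_best individuals are copied unchanged. Every remaining slot is filled by a Gaussian mutation (standard deviation step_size) of a randomly chosen parent. Following [1], each individual is stored compactly as the list of random seeds that generated it. The toy sketch below runs that loop on a quadratic objective instead of a policy. It is not the ast_toolbox implementation. The helpers decode and toy_deep_ga are hypothetical. init_step is assumed to scale the random initial parameters. step_size_anneal is assumed to multiply step_size once per iteration, so the default 1.0 means no annealing.

```python
import numpy as np

MAX_SEED = 2**31 - 1


def decode(lineage, dim):
    """Rebuild a parameter vector from its lineage of (seed, magnitude) pairs.

    theta = sum_i magnitude_i * N(0, I) drawn from seed_i. The first pair is
    the random initialization (magnitude = init_step, an assumption here).
    """
    theta = np.zeros(dim)
    for seed, magnitude in lineage:
        theta += magnitude * np.random.default_rng(seed).standard_normal(dim)
    return theta


def toy_deep_ga(fitness_fn, dim=3, n_itr=50, pop_size=5, truncation_size=2,
                keep_best=1, step_size=0.1, step_size_anneal=1.0,
                init_step=1.0, seed=0):
    """Hypothetical toy Deep GA: returns best parameters and best fitness per generation."""
    rng = np.random.default_rng(seed)
    # Generation 0: each individual is a single random initialization.
    population = [[(int(s), init_step)] for s in rng.integers(0, MAX_SEED, pop_size)]
    best_history = []
    for _ in range(n_itr):
        fitness = np.array([fitness_fn(decode(ind, dim)) for ind in population])
        order = np.argsort(fitness)[::-1]  # indices, best first
        best_history.append(float(fitness[order[0]]))
        # Truncation selection: only the top truncation_size individuals reproduce.
        parents = [population[i] for i in order[:truncation_size]]
        # Elitism: the top keep_best individuals are copied unchanged.
        next_pop = [population[i] for i in order[:keep_best]]
        while len(next_pop) < pop_size:
            parent = parents[rng.integers(len(parents))]
            child_seed = int(rng.integers(0, MAX_SEED))
            # A child is its parent's lineage plus one new Gaussian mutation.
            next_pop.append(parent + [(child_seed, step_size)])
        population = next_pop
        step_size *= step_size_anneal  # assumed multiplicative annealing
    # population[0] is the elite of the last evaluated generation (keep_best >= 1).
    return decode(population[0], dim), best_history


target = np.array([2.0, -1.0, 0.5])
theta, history = toy_deep_ga(lambda th: -float(np.sum((th - target) ** 2)))
```

At least one elite is always carried over unchanged, and its parameters are decoded deterministically from its seeds. As a result, the best fitness recorded in history never decreases from one generation to the next.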

References

[1] Such, Felipe Petroski, et al. “Deep neuroevolution: Genetic algorithms are a competitive alternative for training deep neural networks for reinforcement learning.” arXiv:1712.06567 (2017).

extra_recording(itr)

Record extra training statistics per iteration.

Parameters: itr (int) – The iteration number.

get_fitness(itr, all_paths)

Calculate the fitness of the collected paths.

Parameters:
  • itr (int) – The iteration number.
  • all_paths (list[dict]) – The collected paths from the sampler.
Returns:

fitness (list[float]) – The fitness of each individual.
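
Here is a minimal sketch of the two f_F options. It rests on three assumptions: the collected paths have already been grouped by the individual that produced them, each path is a garage-style dict with a per-step rewards array, and a path's return is its undiscounted sum of rewards. The helpers path_return and fitness_from_paths are hypothetical and do not reproduce get_fitness itself.

```python
import numpy as np


def path_return(path):
    # Assumption: a garage-style path dict with a per-step "rewards" array;
    # the return is taken as the undiscounted sum of rewards.
    return float(np.sum(path["rewards"]))


def fitness_from_paths(paths_per_individual, f_F="mean"):
    """Hypothetical helper: one fitness value per individual from its paths."""
    if f_F == "mean":
        reduce = np.mean  # average return over the individual's paths
    elif f_F == "max":
        reduce = np.max   # best single return among the individual's paths
    else:
        raise ValueError(f"unknown f_F: {f_F!r}")
    return [float(reduce([path_return(p) for p in paths]))
            for paths in paths_per_individual]


# Two individuals: the first produced two paths, the second produced one.
paths_per_individual = [
    [{"rewards": np.array([1.0, 2.0])}, {"rewards": np.array([0.0, 4.0])}],
    [{"rewards": np.array([-1.0])}],
]
print(fitness_from_paths(paths_per_individual, "mean"))  # [3.5, -1.0]
print(fitness_from_paths(paths_per_individual, "max"))   # [4.0, -1.0]
```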

get_itr_snapshot(itr, samples_data)

Get the snapshot of the current population.

Parameters:
  • itr (int) – The iteration number.
  • samples_data (dict) – The processed data samples.
Returns:

snapshot (dict) – The training snapshot.

init_opt()

Initialize the trainer's internal TensorFlow operations.

initial()

Initialize the trainer's internal parameters.

mutation(itr, new_seeds, new_magnitudes, all_paths)

Generate new random seeds and magnitudes for the next generation.

The first self.keep_best seeds are set to the no-mutation value (0).

Parameters:
  • itr (int) – The iteration number.
  • new_seeds (numpy.ndarray) – The original seeds.
  • new_magnitudes (numpy.ndarray) – The original magnitudes.
  • all_paths (list[dict]) – The collected paths from the sampler.
Returns:

  • new_seeds (numpy.ndarray) – The new seeds.
  • new_magnitudes (numpy.ndarray) – The new magnitudes.
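
The following sketch shows the shape of one generation's mutation draws under these assumptions: one seed and one magnitude per individual, a fixed magnitude equal to the current step_size, and the elite slots at the front of the arrays. The function mutation_draws is hypothetical. Unlike mutation(), it ignores itr and all_paths.

```python
import numpy as np

MAX_SEED = 2**31 - 1


def mutation_draws(pop_size, keep_best, step_size, rng):
    """Hypothetical sketch: one new seed and magnitude per individual.

    Slots 0..keep_best-1 hold the elites and get the no-mutation value 0,
    so they are carried over unchanged. Every other slot gets a fresh random
    seed and the current step_size as its mutation magnitude.
    """
    # Draw from 1 upward so that seed 0 stays reserved for "no mutation".
    new_seeds = rng.integers(1, MAX_SEED, size=pop_size)
    new_magnitudes = np.full(pop_size, step_size, dtype=float)
    new_seeds[:keep_best] = 0
    new_magnitudes[:keep_best] = 0.0
    return new_seeds, new_magnitudes


seeds, magnitudes = mutation_draws(pop_size=5, keep_best=1, step_size=0.01,
                                   rng=np.random.default_rng(0))
```

Drawing the non-elite seeds from 1 upward keeps 0 reserved as the no-mutation value, so an elite slot can never be confused with a mutated child.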

obtain_samples(itr, runner)

Collect rollout samples using the current policy parameters.

Parameters:
  • itr (int) – The iteration number.
  • runner (garage.experiment.LocalRunner) – LocalRunner is passed to give the algorithm access to runner.obtain_samples(), which collects rollout paths from the sampler.
Returns:

paths (list[dict]) – The collected paths from the sampler.

optimize_policy(itr, all_paths)

Update the population represented by self.seeds and self.parents.

Parameters:
  • itr (int) – The iteration number.
  • all_paths (list[dict]) – The collected paths from the sampler.

process_samples(itr, paths)

Return processed sample data based on the collected paths.

Parameters:
  • itr (int) – The iteration number.
  • paths (list[dict]) – The collected paths from the sampler.
Returns:

samples_data (dict) – Processed sample data in which all trajectories have the same length (shorter trajectories are padded with 0).
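
Zero padding of this kind can be sketched as follows. pad_paths is a hypothetical helper. The 0/1 valids mask it returns is an assumption of this sketch: it records which entries came from real time steps, so that padded zeros can be excluded from later statistics.

```python
import numpy as np


def pad_paths(paths, key="rewards"):
    """Stack per-path arrays of different lengths into one zero-padded array.

    Returns (padded, valids): padded has shape (n_paths, max_len, ...) and
    valids is a 0/1 mask marking the entries that came from a real time step.
    """
    max_len = max(len(p[key]) for p in paths)
    trailing_shape = np.asarray(paths[0][key]).shape[1:]  # e.g. observation dims
    padded = np.zeros((len(paths), max_len) + trailing_shape)
    valids = np.zeros((len(paths), max_len))
    for i, path in enumerate(paths):
        values = np.asarray(path[key], dtype=float)
        padded[i, :len(values)] = values  # remaining entries stay 0
        valids[i, :len(values)] = 1.0
    return padded, valids


paths = [{"rewards": [1.0, 2.0, 3.0]}, {"rewards": [4.0]}]
padded, valids = pad_paths(paths)
# padded -> [[1, 2, 3], [4, 0, 0]]; valids -> [[1, 1, 1], [1, 0, 0]]
```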

record_tabular(itr)

Record training performance per iteration.

Parameters: itr (int) – The iteration number.

select_parents(fitness)

Select the individuals to be the parents of the next generation.

Parameters: fitness (list[float]) – The fitness of each individual.
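
Truncation selection, as described in Such et al. [1], keeps the truncation_size fittest individuals as parents. The sketch below assumes higher fitness is better. It also assigns a parent to every slot of the next generation, with the first keep_best slots reserved for the elites. Both functions are hypothetical. They take truncation_size and keep_best as explicit arguments rather than reading them from self, and they assume keep_best <= truncation_size.

```python
import numpy as np


def select_parents(fitness, truncation_size):
    """Truncation selection: indices of the truncation_size fittest individuals, best first."""
    return np.argsort(np.asarray(fitness))[::-1][:truncation_size]


def assign_parents(fitness, truncation_size, keep_best, rng):
    """Hypothetical: choose the parent index for every slot of the next generation.

    The first keep_best slots copy the best individuals (elitism). Every other
    slot draws a parent uniformly at random from the truncated set.
    """
    top = select_parents(fitness, truncation_size)
    slots = np.empty(len(fitness), dtype=int)
    slots[:keep_best] = top[:keep_best]
    slots[keep_best:] = rng.choice(top, size=len(fitness) - keep_best)
    return slots


fitness = [0.2, 1.5, -0.3, 0.9, 1.1]
print(select_parents(fitness, truncation_size=2).tolist())  # [1, 4]
slots = assign_parents(fitness, truncation_size=2, keep_best=1,
                       rng=np.random.default_rng(0))
```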

set_params(itr, p)

Set the current policy parameters to those of the specified iteration and individual.

Parameters:
  • itr (int) – The iteration number.
  • p (int) – The individual index.
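
Under the compact encoding of Such et al. [1], restoring one individual means replaying its seeded mutations. This sketch assumes a hypothetical layout: the population history is kept as (n_itr + 1, pop_size) tables of seeds, magnitudes, and parent indices. The actual contents of self.seeds and self.parents are not documented here. Under that layout, individual p at iteration itr can be rebuilt by walking its ancestry back to generation 0 and summing the seeded Gaussian perturbations. reconstruct_params is a hypothetical helper, not the body of set_params.

```python
import numpy as np


def reconstruct_params(itr, p, seeds, magnitudes, parents, dim):
    """Hypothetical: rebuild individual p of generation itr from seed tables.

    seeds, magnitudes, and parents have shape (n_generations, pop_size).
    parents[t, i] is the index in generation t - 1 of individual i's parent,
    and row 0 holds the initializations.
    """
    theta = np.zeros(dim)
    for t in range(itr, -1, -1):
        noise = np.random.default_rng(int(seeds[t, p])).standard_normal(dim)
        theta += magnitudes[t, p] * noise  # magnitude 0 means "no mutation"
        if t > 0:
            p = int(parents[t, p])  # step back to the parent in generation t - 1
    return theta


dim = 4
seeds = np.array([[11, 22], [0, 33]])
magnitudes = np.array([[1.0, 1.0], [0.0, 0.01]])
parents = np.array([[0, 0], [1, 1]])  # row 0 is unused
elite = reconstruct_params(1, 0, seeds, magnitudes, parents, dim)   # copy of (0, 1)
child = reconstruct_params(1, 1, seeds, magnitudes, parents, dim)   # mutated (0, 1)
parent = reconstruct_params(0, 1, seeds, magnitudes, parents, dim)
```

Storing only seeds and magnitudes keeps each generation's record to a few numbers per individual. The cost is regenerating one noise vector per ancestor whenever an individual's parameters are restored.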

train(runner)

Start training.

Parameters: runner (garage.experiment.LocalRunner) – LocalRunner is passed to give the algorithm access to runner.step_epochs(), which provides services such as snapshotting and sampler control.