ast_toolbox.mcts.MDP module

class ast_toolbox.mcts.MDP.TransitionModel(getInitialState, getNextState, isEndState, maxSteps, goToState)

Bases: object

The wrapper for the transition model used in the tree search.

Parameters:
  • getInitialState (function) – getInitialState() returns the initial AST state.
  • getNextState (function) – getNextState(s, a) returns the next state and the reward.
  • isEndState (function) – isEndState(s) returns whether s is a terminal state.
  • maxSteps (int) – The maximum path length.
  • goToState (function) – goToState(s) sets the simulator to the target state s.
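
The five arguments are plain callables plus an integer, so a toy problem is enough to illustrate the expected shapes. The sketch below is an assumption-laden illustration, not the library's code: `ToyTransitionModel` is a hypothetical stand-in that only stores the documented constructor arguments, and the counter environment (`sim`, `get_initial_state`, and the other helpers) is invented for the example.

```python
# Hypothetical toy problem: the state is an integer counter, an action adds
# to it, and the episode ends once the counter reaches 5.


class ToyTransitionModel:
    """Stand-in that stores the five documented constructor arguments.

    This is NOT ast_toolbox.mcts.MDP.TransitionModel; it only mirrors the
    argument names listed above so the callbacks can be exercised.
    """

    def __init__(self, getInitialState, getNextState, isEndState, maxSteps, goToState):
        self.getInitialState = getInitialState
        self.getNextState = getNextState
        self.isEndState = isEndState
        self.maxSteps = maxSteps
        self.goToState = goToState


sim = {"state": 0}  # hypothetical simulator holding one mutable state


def get_initial_state():
    """Reset the simulator and return the initial state."""
    sim["state"] = 0
    return sim["state"]


def get_next_state(s, a):
    """Return the next state and the reward, as documented for getNextState(s, a)."""
    sim["state"] = s + a
    reward = -1.0  # assumed constant step cost, chosen only for illustration
    return sim["state"], reward


def is_end_state(s):
    """Return whether s is a terminal state."""
    return s >= 5


def go_to_state(s):
    """Set the simulator to the target state s."""
    sim["state"] = s


model = ToyTransitionModel(get_initial_state, get_next_state, is_end_state, 10, go_to_state)
```

Under these assumptions, the real class would presumably be constructed with the same five arguments in the same order.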
ast_toolbox.mcts.MDP.simulate(model, p, policy, verbose=False, sleeptime=0.0)

Simulate the environment model using the policy and the parameter p.

Parameters:
  • model (ast_toolbox.mcts.MDP.TransitionModel) – The environment model.
  • p – The extra parameters needed by the policy.
  • policy (function) – policy(p, s) returns the next action.
  • verbose (bool, optional) – Whether to log simulation information.
  • sleeptime (float, optional) – The pause time between each step.
Returns:
  • cum_reward (float) – The cumulative reward.
  • actions (list) – The action sequence of the path.
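
A rollout consistent with the description above can be sketched as follows. This is a minimal illustration, not the library's implementation: `simulate_sketch`, `toy_model`, and `fixed_step_policy` are hypothetical names, and the loop assumes the rollout stops at a terminal state or after `maxSteps` steps, whichever comes first.

```python
import time
from types import SimpleNamespace


def simulate_sketch(model, p, policy, verbose=False, sleeptime=0.0):
    """Roll out policy on model; return (cum_reward, actions).

    Assumed behaviour, inferred from the documented parameters only.
    """
    cum_reward = 0.0
    actions = []
    s = model.getInitialState()
    for step in range(model.maxSteps):
        if model.isEndState(s):
            break
        a = policy(p, s)                 # policy(p, s) returns the next action
        actions.append(a)
        s, r = model.getNextState(s, a)  # next state and reward
        cum_reward += r
        if verbose:
            print("step", step, "action", a, "reward", r)
        if sleeptime > 0:
            time.sleep(sleeptime)        # pause between steps
    return cum_reward, actions


# Hypothetical toy model: integer counter state, terminal once it reaches 5.
toy_model = SimpleNamespace(
    getInitialState=lambda: 0,
    getNextState=lambda s, a: (s + a, -1.0),
    isEndState=lambda s: s >= 5,
    maxSteps=10,
    goToState=lambda s: None,
)


def fixed_step_policy(p, s):
    """Always take the step size stored in the extra parameters p."""
    return p["step"]


cum_reward, actions = simulate_sketch(toy_model, {"step": 2}, fixed_step_policy)
```

With a step size of 2 the counter passes 5 after three actions, so the sketch returns a cumulative reward of -3.0 and the action list [2, 2, 2].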