ast_toolbox.mcts.AdaptiveStressTestingBlindValue module
-
class ast_toolbox.mcts.AdaptiveStressTestingBlindValue.AdaptiveStressTestBV(**kwargs)
Bases: ast_toolbox.mcts.AdaptiveStressTesting.AdaptiveStressTest
The AST wrapper for MCTS using Blind Value exploration [1].
Parameters: kwargs – Keyword arguments passed to ast_toolbox.mcts.AdaptiveStressTesting.AdaptiveStressTest.
References
[1] Couetoux, Adrien, Hassen Doghmen, and Olivier Teytaud. “Improving the exploration in upper confidence trees.” International Conference on Learning and Intelligent Optimization. Springer, Berlin, Heidelberg, 2012.
-
explore_action(s, tree)
Sample an action for exploration using Blind Value.
Parameters:
- s (ast_toolbox.mcts.AdaptiveStressTesting.ASTState) – The current state.
- tree (dict) – The search tree.
Returns: action (ast_toolbox.mcts.AdaptiveStressTesting.ASTAction) – The sampled action.
-
getBV(y, rho, A, UCB)
Calculate the Blind Value for the candidate action y.
Parameters:
- y (numpy.ndarray) – The candidate action.
- rho (float) – The standard deviation ratio.
- A (list[ast_toolbox.mcts.AdaptiveStressTesting.ASTAction]) – The list of explored AST actions.
- UCB (dict) – The dictionary containing the upper confidence bound for each explored action in the state node.
Returns: BV (float) – The blind value.
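The Blind Value computation can be illustrated with a minimal sketch. It assumes the definition usually attributed to [1], where the blind value of a candidate y is the minimum over explored actions a of rho * d(a, y) + UCB(a). The function name blind_value is hypothetical, and actions are represented as plain tuples (not ASTAction objects) so they can serve as dictionary keys; the toolbox's own implementation may differ.

```python
import math


def blind_value(y, rho, explored, ucb):
    """Blind value of candidate y (sketch, not the toolbox's code).

    Assumed formula: min over explored actions a of
    rho * ||a - y||_2 + ucb[a].
    `explored` must contain at least one action, otherwise min() raises.
    """
    return min(rho * math.dist(a, y) + ucb[a] for a in explored)


# Hypothetical data: two explored 2-D actions with their UCB values.
explored = [(0.0, 0.0), (1.0, 0.0)]
ucb = {(0.0, 0.0): 1.0, (1.0, 0.0): 0.5}

bv = blind_value((0.0, 1.0), rho=2.0, explored=explored, ucb=ucb)
print(bv)
```

Under this formula, a candidate far from every explored action receives a high blind value, and a candidate close to an explored action has a blind value close to that action's upper confidence bound.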
-
getDistance(a, b)
Get the L2 distance between two actions.
Parameters:
- a (numpy.ndarray) – The first action.
- b (numpy.ndarray) – The second action.
Returns: distance (float) – The L2 distance between a and b.
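As a rough illustration of what an L2 distance between two action vectors looks like with NumPy, consider the following sketch. The function name l2_distance is hypothetical and stands in for the documented method; it only assumes that both actions are arrays of the same shape.

```python
import numpy as np


def l2_distance(a, b):
    """Euclidean (L2) distance between two equally shaped action vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.linalg.norm(a - b))


# A 3-4-5 right triangle makes the result easy to check by hand.
d = l2_distance(np.array([0.0, 0.0]), np.array([3.0, 4.0]))
print(d)
```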
-
getUCB(s)
Get the upper confidence bound on the expected return for every action that has been explored at the state.
Parameters: s (ast_toolbox.MCTSdpw.StateNode) – The state node in the search tree.
Returns: UCB (dict) – The dictionary containing the upper confidence bound for each explored action in the state node.
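The following sketch shows one common way to build such a dictionary, using the standard UCB1 bound q + c * sqrt(ln(N) / n). Whether the toolbox uses exactly this bound is an assumption. The function name ucb_per_action, the exploration constant c, and the plain-dict layout of the statistics are all hypothetical and do not reflect the fields of StateNode.

```python
import math


def ucb_per_action(stats, total_visits, c=1.0):
    """Map each explored action to a UCB1-style upper confidence bound.

    stats: dict mapping action -> (mean_return, visit_count), visit_count >= 1
    total_visits: number of visits to the state node (>= 1)
    c: exploration constant (hypothetical default)
    """
    return {
        action: q + c * math.sqrt(math.log(total_visits) / n)
        for action, (q, n) in stats.items()
    }


# Hypothetical statistics for two explored actions at one state node.
stats = {"a0": (1.0, 4), "a1": (0.5, 1)}
ucb = ucb_per_action(stats, total_visits=5, c=1.0)
print(ucb)
```

In this toy data the rarely visited action "a1" ends up with the larger bound despite its lower mean return, which is the exploration bonus at work.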