ast_toolbox.rewards package
Reward functions for AST formulated RL problems.
class ast_toolbox.rewards.ASTReward
Bases: object
Base class for reward functions that calculate the reward at each timestep when optimizing AST solver policies.
class ast_toolbox.rewards.ExampleAVReward(num_peds=1, cov_x=0.1, cov_y=0.01, cov_sensor_noise=0.1, use_heuristic=True)
Bases: ast_toolbox.rewards.ast_reward.ASTReward
An example implementation of an ASTReward for an AV validation scenario.
Parameters:
- num_peds (int) – The number of pedestrians in the scenario.
- cov_x (float) – Covariance of the x-acceleration.
- cov_y (float) – Covariance of the y-acceleration.
- cov_sensor_noise (float) – Covariance of the sensor noise.
- use_heuristic (bool) – Whether to include a heuristic in the reward based on how close the pedestrian is to the vehicle at the end of the trajectory.
give_reward(action, **kwargs)
Returns the reward for a given time step.
Parameters:
- action (array_like) – Action taken by the AST solver.
- kwargs – Accepts relevant info for computing the reward.
Returns: reward (float) – Reward based on the previous action.
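The calling convention above can be illustrated with a minimal, stand-alone sketch. It does not import ast_toolbox, and it is not the library's implementation: the class name ToyReward, the keyword names is_goal and is_terminal, the penalty value, and the squared-action cost are all hypothetical choices made only to show a give_reward(action, **kwargs) method that returns a float.

```python
class ToyReward:
    """Hypothetical stand-in for a reward class with a give_reward method."""

    def __init__(self, miss_penalty=-1000.0):
        # Penalty applied when an episode ends without reaching the goal
        # (an assumed value, chosen only for illustration).
        self.miss_penalty = miss_penalty

    def give_reward(self, action, **kwargs):
        # 'is_goal' and 'is_terminal' are assumed keyword names, not
        # confirmed parts of the ast_toolbox interface.
        is_goal = kwargs.get("is_goal", False)
        is_terminal = kwargs.get("is_terminal", False)
        if is_goal:
            # The event of interest was found: no penalty.
            return 0.0
        if is_terminal:
            # The episode ended without the event of interest.
            return self.miss_penalty
        # Otherwise penalize large actions; a placeholder cost that
        # stands in for a distance-based measure.
        return -sum(a * a for a in action)


reward_fn = ToyReward()
step_reward = reward_fn.give_reward([0.5, -0.5])
goal_reward = reward_fn.give_reward([0.5, -0.5], is_goal=True)
miss_reward = reward_fn.give_reward([0.5, -0.5], is_terminal=True)
```

Passing the extra information as keyword arguments keeps the method signature fixed while letting each scenario decide what the reward depends on.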
mahalanobis_d(action)
Calculate the Mahalanobis distance [1] between the action and the mean action.
Parameters: action (array_like) – Action taken by the AST solver.
Returns: float – The Mahalanobis distance between the action and the mean action.
References
[1] Mahalanobis, Prasanta Chandra. “On the generalized distance in statistics.” National Institute of Science of India, 1936. http://library.isical.ac.in:8080/jspui/bitstream/123456789/6765/1/Vol02_1936_1_Art05-pcm.pdf
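For reference, the Mahalanobis distance between an action a and a mean action m with covariance matrix S is the square root of (a - m)^T S^-1 (a - m). The sketch below computes it for the special case of a diagonal covariance, where the expression reduces to a sum of squared deviations divided by per-dimension variances. The function name, the zero mean, and the three-dimensional action layout are assumptions made for illustration; the documentation above does not state how ExampleAVReward builds its mean or covariance.

```python
import math


def mahalanobis_distance_diag(action, mean, variances):
    """Mahalanobis distance, assuming a diagonal covariance matrix."""
    if not (len(action) == len(mean) == len(variances)):
        raise ValueError("action, mean, and variances must have equal length")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")
    # With a diagonal covariance, the quadratic form is a weighted sum
    # of squared deviations from the mean.
    squared = sum((a - m) ** 2 / v for a, m, v in zip(action, mean, variances))
    return math.sqrt(squared)


# Hypothetical action layout: x-acceleration, y-acceleration, sensor noise.
# The variances reuse the default values of cov_x, cov_y, cov_sensor_noise.
variances = [0.1, 0.01, 0.1]
mean = [0.0, 0.0, 0.0]  # assumed zero mean action

d_mean = mahalanobis_distance_diag([0.0, 0.0, 0.0], mean, variances)
d_far = mahalanobis_distance_diag([0.1, 0.1, 0.0], mean, variances)
```

In this example the same deviation of 0.1 contributes ten times more to the squared distance in the y-dimension than in the x-dimension, because the y-variance is ten times smaller. A full covariance matrix would instead require a matrix inverse or a linear solve.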