ast_toolbox.rewards package

Reward functions for AST-formulated RL problems.

class ast_toolbox.rewards.ASTReward

Bases: object

Base class for reward functions that calculate the per-timestep rewards used when optimizing AST solver policies.

give_reward(action, **kwargs)

Returns the reward for a given time step.

Parameters:

action (array_like) – Action taken by the AST solver.
kwargs – Accepts relevant info for computing the reward.

Returns:	reward (float) – Reward based on the previous action.
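
The give_reward interface described above can be illustrated with a small subclass. The following is a minimal sketch, not the toolbox's own code: the ASTReward class here is a local stand-in that only mirrors the documented signature so the example runs without ast_toolbox installed, and the SparseFailureReward class, its constructor arguments, and the "info", "is_goal", and "is_terminal" keys are all hypothetical names chosen for illustration.

```python
class ASTReward:
    """Local stand-in mirroring the documented interface (assumption:
    the real base class may behave differently when not overridden)."""

    def give_reward(self, action, **kwargs):
        raise NotImplementedError


class SparseFailureReward(ASTReward):
    """Hypothetical reward: no penalty when a failure is found, a large
    penalty when the episode ends without one, a small cost otherwise."""

    def __init__(self, failure_reward=0.0, miss_penalty=-1000.0, step_cost=-1.0):
        self.failure_reward = failure_reward
        self.miss_penalty = miss_penalty
        self.step_cost = step_cost

    def give_reward(self, action, **kwargs):
        # "info" and its keys are illustrative; the documentation only says
        # that kwargs carries whatever is needed to compute the reward.
        info = kwargs.get("info", {})
        if info.get("is_goal", False):
            return float(self.failure_reward)
        if info.get("is_terminal", False):
            return float(self.miss_penalty)
        return float(self.step_cost)


reward_fn = SparseFailureReward()
step_reward = reward_fn.give_reward([0.0, 0.0], info={"is_goal": False})
```

With the real package, the subclass would presumably derive from ast_toolbox.rewards.ASTReward instead of the stand-in, and the keys passed through kwargs would be whatever the surrounding environment supplies.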

class ast_toolbox.rewards.ExampleAVReward(num_peds=1, cov_x=0.1, cov_y=0.01, cov_sensor_noise=0.1, use_heuristic=True)

An example implementation of an ASTReward for an autonomous vehicle (AV) validation scenario.

Parameters:

num_peds (int) – The number of pedestrians in the scenario.
cov_x (float) – Covariance of the x-acceleration.
cov_y (float) – Covariance of the y-acceleration.
cov_sensor_noise (float) – Covariance of the sensor noise.
use_heuristic (bool) – Whether to include a heuristic in the reward based on how close the pedestrian is to the vehicle at the end of the trajectory.

give_reward(action, **kwargs)

Returns the reward for a given time step.

Parameters:

action (array_like) – Action taken by the AST solver.
kwargs – Accepts relevant info for computing the reward.

Returns:	reward (float) – Reward based on the previous action.

mahalanobis_d(action)

Calculate the Mahalanobis distance [1] between the action and the mean action.

Parameters:	action (array_like) – Action taken by the AST solver.
Returns:	float – The Mahalanobis distance between the action and the mean action.

References

[1]	Mahalanobis, Prasanta Chandra. “On the generalized distance in statistics.” National Institute of Science of India, 1936. http://library.isical.ac.in:8080/jspui/bitstream/123456789/6765/1/Vol02_1936_1_Art05-pcm.pdf
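
For reference, the Mahalanobis distance of an action a from a mean μ under covariance Σ is sqrt((a − μ)ᵀ Σ⁻¹ (a − μ)). The sketch below computes it for the special case of a diagonal covariance, where the formula reduces to a variance-weighted Euclidean distance. It is an illustration only: the function name mahalanobis_diag, the zero mean, and the diagonal layout are assumptions and are not taken from the ExampleAVReward source.

```python
import math


def mahalanobis_diag(action, mean, variances):
    """Mahalanobis distance assuming a diagonal covariance matrix.

    Hypothetical helper for illustration; `variances` holds the diagonal
    entries of the covariance, one per action dimension.
    """
    if not (len(action) == len(mean) == len(variances)):
        raise ValueError("action, mean, and variances must have equal length")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be strictly positive")
    # With a diagonal covariance, the inverse is elementwise 1 / variance.
    squared = sum((a - m) ** 2 / v for a, m, v in zip(action, mean, variances))
    return math.sqrt(squared)


# Example: a 2-D action measured against a zero mean with unit variances,
# where the result coincides with the ordinary Euclidean norm.
distance = mahalanobis_diag([3.0, 4.0], [0.0, 0.0], [1.0, 1.0])
```

For a full (non-diagonal) covariance matrix, the same quantity would typically be computed with a linear solve against Σ rather than an elementwise division.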

Submodules