The Racetrack Problem

Off-Policy Monte Carlo Control with Importance Sampling.
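The subtitle names the core algorithm. Below is a minimal sketch of off-policy Monte Carlo control with weighted importance sampling, following the incremental form presented in Sutton and Barto's *Reinforcement Learning: An Introduction*. Several things here are assumptions made only for illustration. The environment is a hypothetical one-dimensional corridor standing in for the racetrack. The behavior policy is uniform random. The names `corridor_step` and `off_policy_mc_control` are hypothetical. Episodes are generated with the behavior policy b. The backward pass then updates Q toward the return, using the cumulative weights C, and stops as soon as the behavior action differs from the greedy target action. At that point π(a|s) = 0, so every earlier step would receive zero weight.

```python
import random


def corridor_step(state, action, goal):
    """Hypothetical toy environment (a stand-in for the racetrack):
    action 0 moves left (clamped at 0), action 1 moves right.
    Every step costs -1; reaching `goal` ends the episode."""
    next_state = max(state - 1, 0) if action == 0 else state + 1
    return next_state, -1.0, next_state == goal


def off_policy_mc_control(num_states=5, num_episodes=2000, gamma=1.0, seed=0):
    """Off-policy MC control with weighted importance sampling (sketch)."""
    rng = random.Random(seed)
    goal = num_states - 1
    n_actions = 2
    # Pessimistic initial values so untried actions never look greedy.
    Q = [[-1e6] * n_actions for _ in range(num_states)]
    C = [[0.0] * n_actions for _ in range(num_states)]  # cumulative IS weights
    pi = [0] * num_states  # deterministic greedy target policy
    b_prob = 1.0 / n_actions  # assumed behavior policy b: uniform random

    for _ in range(num_episodes):
        # Generate one episode with the behavior policy b.
        episode = []
        state, done = 0, False
        while not done:
            action = rng.randrange(n_actions)
            next_state, reward, done = corridor_step(state, action, goal)
            episode.append((state, action, reward))
            state = next_state

        # Backward pass: G is the return, W the importance-sampling ratio.
        G, W = 0.0, 1.0
        for state, action, reward in reversed(episode):
            G = gamma * G + reward
            C[state][action] += W
            Q[state][action] += (W / C[state][action]) * (G - Q[state][action])
            row = Q[state]
            pi[state] = max(range(n_actions), key=row.__getitem__)
            if action != pi[state]:
                break  # pi(a|s) = 0, so earlier steps get zero weight
            W *= 1.0 / b_prob  # pi(a|s) = 1 for the greedy action
    return Q, pi


Q, pi = off_policy_mc_control()
print(pi[:4])  # greedy action in each non-terminal state
```

In this toy corridor, the learned greedy policy should move right (action 1) in every non-terminal state. The values `Q[s][1]` should settle at `s - 4`, which is minus the number of steps remaining. The pessimistic initialization matters in this setup. All rewards are negative, so with zero-initialized Q an untried action would stay greedy forever. The backward pass would then break at the final step of every episode, and nothing earlier would ever be updated.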