Offline Monte Carlo Tree Search for Statistical Model Checking of Markov Decision Processes
To find the optimal policy for large Markov Decision Pro- cesses (MDPs), where state space explosion makes analytic methods infeasible, we turn to statistical methods. In this work, we apply Monte Carlo Tree Search to learning the opti- mal policy for a MDP with respect to a Probabilistic Bounded Linear Temporal Logic property. After we have the policy, we can proceed with statistical model checking or probability estimation for the property against the MDP. We have im- plemented this method as an extension to the PRISM proba- bilistic model checker, and discuss its results for several case studies.