Abstract: The explore-exploit dilemma in Markov Decision Processes (MDPs) is a fundamental challenge, especially in deterministic environments akin to real-world scenarios. Balancing exploration and ...