Autonomous head-to-head racing is a formidable challenge: vehicles must operate at the limits of friction and handling to minimize lap times while strategically overtaking opponents or defending the lead. In this study, we first extend a state-of-the-art head-to-head racing environment with realistic, high-fidelity vehicle dynamics and a nonlinear tire model. Some prior attempts to learn a policy directly in such a complex vehicle dynamics environment have failed to reach an optimal policy. Our approach instead centers on a curriculum learning-based framework that progressively transitions from a simpler vehicle model to the more complex, realistic environment to obtain an approximately optimal policy. Additionally, we propose a safe reinforcement learning algorithm grounded in control barrier functions that enforces the agent’s safety without compromising optimality. We evaluate our framework on both the proposed head-to-head racing environment and the Safety Gym benchmarks.