This document presents an approach for improving maintenance policies for multi-state systems. It first formalizes the transition process of a multi-state system using dynamic Bayesian networks. It then exhibits a cost function for preventive maintenance and an optimization method using reinforcement learning to identify the best combination of transition rates and preventive maintenance policy. The dynamic Bayesian network approach models the probability distributions of the system's state over time and allows for more compact representation compared to Markov chains. The reinforcement learning optimization seeks to minimize cost and maximize availability by learning the optimal preventive maintenance levels over the system's lifetime.