Artificial intelligence has become one of the most important research areas of recent years. Reinforcement learning (RL), which is heavily influenced by human nature and psychology, helps bridge the gap between technology and humans. It largely sidesteps the problem of data acquisition, because the agent generates its own experience and so has almost no need for a pre-collected dataset. In reinforcement learning, a model is trained to find an optimal solution to a problem by making decisions independently and interacting with its environment. Through rewards, it learns to judge which actions to take to achieve its goal.
Traffic congestion is increasing worldwide and needs to be addressed. In a dynamically changing and interconnected transport environment, current traffic regulation is not adaptive. An intelligent transport system is needed to improve the efficiency of the road networks of smart cities. This Diploma Thesis proposes a system that calculates the timing of traffic lights so as to minimize the waiting time of vehicles. The traffic light at each intersection is trained to change its phase according to traffic. The proposed road system has a flexible structure: it is extended by adding more intersections to the original single-intersection layout. Q-learning, an RL algorithm, is used to select the next optimal signal action in a given state. It works by iteratively improving the estimated rewards (Q-values) of the state-action pairs, which are stored in a Q-table that holds what each traffic light has learned. The SUMO (Simulation of Urban MObility) tool was used to simulate the road networks. The models were trained and studied on road networks with N intersections, where N = 1, 2, 4, 6, and the traffic lights of each intersection were trained to reduce congestion. The training results are compared with the responses of the current traffic management models. In addition, Q-tables learned on the simple structures (N = 1, 2) are applied to the more complex networks to assess how well experience gained on simple structures carries over.
According to the training results and the experiments, all models responded efficiently to a variety of traffic situations, although training time increases with network complexity. An optimal model requires more training time than a merely good one, so there is a trade-off between training time and quality of response that every researcher should consider.
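The Q-learning update described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the toy intersection dynamics stand in for SUMO, and the state encoding (two capped queue lengths plus the current phase), the two actions (keep or switch), the reward (negative total queue length), and all names and hyperparameters (`q_update`, `step`, `train`, `alpha`, `gamma`, `epsilon`, `arrival_prob`) are hypothetical choices made only for this example.

```python
import random
from collections import defaultdict

ACTIONS = (0, 1)  # assumed action set: 0 = keep current phase, 1 = switch phase
MAX_QUEUE = 5     # queues are capped so the toy state space stays small


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step: Q(s,a) += alpha * (target - Q(s,a))."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy selection over the two signal actions."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: q[(state, a)])


def step(state, action, rng, arrival_prob=0.3):
    """Toy single-intersection dynamics (an assumption, not SUMO)."""
    ns, ew, phase = state
    if action == 1:
        phase = 1 - phase
    # The approach that has green discharges one vehicle per step.
    if phase == 0:
        ns = max(ns - 1, 0)
    else:
        ew = max(ew - 1, 0)
    # At most one random arrival per approach per step.
    ns = min(ns + (rng.random() < arrival_prob), MAX_QUEUE)
    ew = min(ew + (rng.random() < arrival_prob), MAX_QUEUE)
    reward = -(ns + ew)  # fewer queued vehicles means a higher reward
    return (ns, ew, phase), reward


def train(episodes=200, steps=100, epsilon=0.1, seed=0):
    """Fill a Q-table by interacting with the toy environment."""
    rng = random.Random(seed)
    q = defaultdict(float)  # the Q-table: (state, action) -> estimated value
    for _ in range(episodes):
        state = (0, 0, 0)
        for _ in range(steps):
            action = choose_action(q, state, epsilon, rng)
            next_state, reward = step(state, action, rng)
            q_update(q, state, action, reward, next_state)
            state = next_state
    return q
```

Calling `train()` returns the learned Q-table; the greedy signal action for a state `s` is then `max(ACTIONS, key=lambda a: q[(s, a)])`. In a multi-intersection setting, one such table per traffic light would be one plausible arrangement, though the sketch does not model that.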