Reinforcement learning lets an agent learn behavior without direct supervision. Instead of being shown correct answers, the agent interacts with its environment by trial and error, observes which actions yield reward, and learns to maximize its cumulative long-term reward rather than only the immediate payoff. The key components are the agent, the environment, states, actions, and rewards. Q-learning is a model-free technique: it needs no model of the environment's dynamics, and instead learns a Q-function that estimates the expected cumulative reward of taking a given action in a given state. The agent explores different actions, updates its Q-values from the rewards it receives, and gradually moves toward an optimal action-selection policy. Reinforcement learning has applications in game playing, robot control, and other domains.
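
To make the Q-learning loop concrete, here is a minimal tabular sketch in Python. Everything specific in it is an assumption chosen for illustration and does not come from the text above: the toy five-state corridor environment, the function names (`step`, `choose_action`, `train`, `greedy_policy`), the epsilon-greedy exploration scheme, and the values of the learning rate, discount factor, and exploration probability. The update it applies is the standard Q-learning rule, which nudges Q(s, a) toward r + gamma * max over a' of Q(s', a').

```python
import random

# Hypothetical toy environment: a corridor of states 0..4, with state 4 as the goal.
N_STATES = 5
ACTIONS = (0, 1)      # 0 = move left, 1 = move right
ALPHA = 0.1           # learning rate (assumed value)
GAMMA = 0.9           # discount factor (assumed value)
EPSILON = 0.1         # exploration probability (assumed value)


def step(state, action):
    """Apply an action and return (next_state, reward, done)."""
    if action == 0:
        next_state = max(0, state - 1)
    else:
        next_state = min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def choose_action(q, state, rng):
    """Epsilon-greedy: explore with probability EPSILON, otherwise exploit."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS)
    best = max(q[state])
    # Break ties randomly so an untrained agent does not always pick action 0.
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=500, seed=0):
    """Run tabular Q-learning and return the learned Q-table."""
    rng = random.Random(seed)
    q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            action = choose_action(q, state, rng)
            next_state, reward, done = step(state, action)
            # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a').
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Return the highest-valued action for every non-terminal state."""
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(N_STATES - 1)]


if __name__ == "__main__":
    q_table = train()
    print(greedy_policy(q_table))
```

In this sketch the only reward sits at the right end of the corridor, so the learned greedy policy is to move right (action 1) in every state. The agent is never told this; it emerges from the rewards it receives while exploring, with Q-values near the goal rising first and then propagating back toward the start state through the discounted update.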