This document shows how to use R and RStudio to simulate reinforcement learning models. It first simulates a Rescorla-Wagner model, in which the action values Q_A and Q_B are updated trial by trial from the payoffs received for actions A and B. The model is then extended so that actions are chosen stochastically, with the choice probability given by a softmax function of the difference between Q_A and Q_B. Plots show how Q_A and Q_B evolve over trials for different values of the learning rate and the temperature parameter, and example R code implementing the full model is included.
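The model described above can be sketched in R as follows. This is a minimal illustration, not the document's own code, and it rests on several assumptions. Each action pays 1 with a fixed probability and 0 otherwise (the hypothetical `p_A = 0.7` and `p_B = 0.3`). Both values start at 0. The temperature τ divides the value difference inside the softmax, so the probability of choosing A is 1 / (1 + exp(-(Q_A - Q_B) / τ)). After each trial, only the chosen action's value is updated with the Rescorla-Wagner rule Q ← Q + α(r - Q). The function name `simulate_rw_softmax` and all default values are hypothetical.

```r
# Minimal sketch: Rescorla-Wagner learning with softmax action selection.
# Assumptions (hypothetical, not from the original document): Bernoulli payoffs
# with probabilities p_A and p_B, Q values initialised to 0, and a temperature
# that divides Q_A - Q_B inside the softmax (lower temperature = greedier).
simulate_rw_softmax <- function(n_trials = 200, alpha = 0.1, temperature = 0.2,
                                p_A = 0.7, p_B = 0.3, seed = 1) {
  set.seed(seed)
  # Row t holds the values before trial t; row n_trials + 1 holds the final values
  Q <- matrix(0, nrow = n_trials + 1, ncol = 2,
              dimnames = list(NULL, c("Q_A", "Q_B")))
  choice <- character(n_trials)
  reward <- numeric(n_trials)
  for (t in seq_len(n_trials)) {
    # With two options, the softmax reduces to a logistic function of Q_A - Q_B
    p_choose_A <- 1 / (1 + exp(-(Q[t, "Q_A"] - Q[t, "Q_B"]) / temperature))
    choice[t] <- if (runif(1) < p_choose_A) "A" else "B"
    # Draw a 0/1 payoff from the chosen action's reward probability
    p_payoff <- if (choice[t] == "A") p_A else p_B
    reward[t] <- rbinom(1, size = 1, prob = p_payoff)
    # Rescorla-Wagner update: the chosen value moves toward the payoff
    # by a fraction alpha of the prediction error; the other value is unchanged
    Q[t + 1, ] <- Q[t, ]
    col <- paste0("Q_", choice[t])
    Q[t + 1, col] <- Q[t, col] + alpha * (reward[t] - Q[t, col])
  }
  list(Q = Q, choice = choice, reward = reward)
}

# Simulate one run and plot Q_A and Q_B over trials
sim <- simulate_rw_softmax(n_trials = 200, alpha = 0.1, temperature = 0.2)
matplot(0:200, sim$Q, type = "l", lty = 1, col = c("blue", "red"),
        xlab = "Trial", ylab = "Action value")
legend("bottomright", legend = c("Q_A", "Q_B"), col = c("blue", "red"), lty = 1)
```

To compare settings as the document's plots do, call `simulate_rw_softmax` again with different `alpha` and `temperature` values and re-plot. A larger `alpha` makes the values track recent payoffs more closely. A larger `temperature` makes choices closer to random, so both values keep being sampled.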