Brian Bumpas-Final Draft

Profiting from the Lottery
Brian Bumpas
December 5, 2011
Contents
1 Introduction
2 Lotteries in General, Abrams and Garibaldi's Model, and Some Definitions
2.1 The Lottery
2.2 Definitions
2.3 The Model
3 Model Analysis
3.1 Propositions
3.2 eRoR Analysis
3.3 Graphical Analysis
4 Model Application
4.1 California's MEGA Millions Lottery
4.2 Maximum Likelihood Estimations for N
4.3 Obtaining the Other Necessary Data
5 Conclusion

1 Introduction
While the lottery is generally a bad bet and the chances of winning are extremely slim, there may be
circumstances in which it actually turns out to be statistically worthwhile. For instance, the largest jackpot
won in California was $315 million, a huge sum split between seven people in 2005 [2]. With such a huge
payoff, was it actually a smart idea to buy into the lottery? The enormous possible payout greatly overwhelms
the cost of the $1 ticket necessary to enter into the lottery, and this skews the expected rate of return. We
will find that there have indeed been some cases where the jackpot was enough to produce a positive rate of
return. This paper will use and explain a model developed by Aaron Abrams and Skip Garibaldi to analyze
just how big a jackpot must be for these circumstances to arise and for the lottery to be a good bet.
In order to accomplish this and to obtain a better grasp on the feasibility of a worthwhile wager, we
will apply Abrams and Garibaldi's model to California's version of the MEGA Millions lottery. Through
California State Lottery's published statistics on the number of winners per drawing and the probability of
winning each prize, we will estimate the total number of people in each drawing. Then with the statistics on
the number of people who win each prize, we can also estimate the expected rate of return for each prize.
Once these results are obtained, we will compare each lottery above to those in Abrams and Garibaldi's
article, Finding Good Bets in the Lottery, and Why You Shouldn't Take Them. From this analysis, we
produce a strategy to decide when the lottery is a good bet.
2 Lotteries in General, Abrams and Garibaldi's Model, and Some Definitions
Before we analyze the model, we will explain the general setup of a lottery, define a few terms used throughout
the paper, then introduce the model itself.
2.1 The Lottery
While there are many different types of lotteries, we will only deal with one in this paper. In particular, we
consider games with large jackpots. In these games, a player purchases a lottery ticket before a particular
drawing takes place. Drawings are often conducted weekly or twice weekly. For each drawing, the lottery
operator randomly selects numbers that match certain criteria. Basically, the player guesses which numbers
will be randomly chosen, in the hope that he or she picked correctly. Then if a player's numbers match
enough of those that were drawn by the lottery operator, the player wins a prize. Note that winners can
claim only one type of prize: he or she wins the best prize that is applicable to their number choices. There
can be many different prizes (usually there are between five and ten), but we will divide these into three
different over-arching categories.
We will refer to the first category as a fixed prize: if a player meets the criteria for winning this kind
of award, he or she receives a fixed amount of money. We say this is fixed because the amount awarded
does not vary per drawing, and it does not vary based on how many people play during a drawing.
We call the second category a pari-mutuel prize. Pari-mutuel literally means mutual bet. Applied
to the lottery, a pari-mutuel prize is one that is split between numerous winners. Furthermore, for any
particular drawing, the amount paid out from a pari-mutuel prize is determined by some proportion of
the total amount collected. From here on out, we will use an index i in order to indicate that there
are multiple pari-mutuel prizes: for a lottery with d pari-mutuel prizes, we have 1 ≤ i ≤ d. Then if the
lottery operators determine that the total pool of money dedicated to a certain pari-mutuel prize should be
determined by some proportion \(r_i\), then for a drawing that collects N dollars, that prize pool will be valued
at a total of \(r_i N\) dollars. Once this amount is determined, the pool is split equally between each winner
of that prize. Then if \(n_i\) players win that same prize, each of those players will receive \(r_i N / n_i\) dollars. It
is important to note that the pari-mutuel prize amount varies per drawing: if more people play the lottery
during one particular drawing, more money is collected by the lottery, so the pari-mutuel prize is worth
more. On the other hand, if more people win that prize during a particular drawing, it is split between more
people, so it is worth less to an individual.
Now, the final type of prize is the jackpot. For each drawing where nobody wins the jackpot, the
amount awarded for this prize will grow. Strictly speaking, this is most frequently (but not necessarily
always) a pari-mutuel prize because it is usually split between each winner. MEGA Millions and California
SuperLOTTO Plus both offer pari-mutuel jackpots: the amount by which the jackpot increases per drawing
is proportional to the number of people who play during that drawing, and if multiple people win the jackpot,
it is split between each winner. Despite the fact that the jackpot is also a pari-mutuel prize, we create a
different category because it is usually far more valuable than any other pari-mutuel prize. One can think
of the jackpot as the First Prize because everybody wants to win this amount.
This parlance can be rather muddled, so let us talk specifics. Lotto Texas is a game in which a player
chooses six numbers, where each number is between 1 and 54. We will henceforth refer to a player's choice of
numbers as a ticket. In the Lotto Texas game, each ticket costs $1. A new drawing is held every Wednesday
and Saturday evening. For the drawing itself, a machine randomly pulls out six balls. If the player's ticket
has three matching numbers, then he or she can claim $3. Because there are \(\binom{6}{3}\) ways to choose three of the
six winning numbers and \(\binom{48}{3}\) ways to choose three of the remaining numbers, the probability of winning
this fixed prize is
\[
\binom{6}{3}\binom{48}{3} \times \frac{1}{25{,}827{,}165} = \frac{345{,}920}{25{,}827{,}165} \approx 0.013394.
\]
Thus, the odds of winning this prize are approximately 1 in 75. Since the prize value does not change per drawing, it is a fixed prize. Considering
another winning scenario, if the player has four matching numbers, then he or she can claim a pari-mutuel
prize that depends on how many tickets are purchased in total. This prize category is usually valued between
$40 and $50, and the odds of winning are about 1 in 1,526. If the player has five matching numbers, then
he or she can claim a different pari-mutuel prize, usually valued between $1,000 and $4,000. The odds of
winning are about 1 in 89,678. Lastly, if the player has six matching numbers, he or she can claim the
jackpot. This prize starts at $5 million, and increases by $1 million for each drawing where nobody wins
the prize. The odds of winning are approximately 1 in 25,827,165. Since the jackpot is pari-mutuel, if two
people win the jackpot, it is split evenly between each person. These statistics can be seen in [3].
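The counts behind these odds can be reproduced with a few binomial coefficients. The following is a minimal Python sketch; the function name `match_odds` and its parameters are our own illustrative choices and do not come from Lotto Texas's published material.

```python
from math import comb


def match_odds(k, picks=6, pool=54):
    """Return (winning tickets, distinct tickets) for matching exactly k numbers."""
    # Choose k of the drawn numbers and picks - k of the undrawn numbers.
    ways = comb(picks, k) * comb(pool - picks, picks - k)
    return ways, comb(pool, picks)


for k in (3, 4, 5, 6):
    ways, total = match_odds(k)
    print(f"match {k}: {ways:,} of {total:,} tickets, about 1 in {total / ways:,.0f}")
```

Dividing the number of distinct tickets by the number of winning tickets gives the "1 in ..." odds quoted above for each category.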
For example, say that John buys a ticket on Sunday for $1. He had to pick six numbers, each between
1 and 54, without repetition. But John is predictable: his favorite numbers are 3, 8, 20, 37, 42, and 47,
so his ticket consists of these numbers. That following Wednesday, the Lotto Texas drawing is published.
It turns out that the winning ticket was 3, 8, 15, 20, 42, 47. Unfortunately, John did not win the jackpot
because his choice of 37 does not match the randomly selected 15. However, the remaining five
of his numbers match those on the winning ticket, so he wins the five matching numbers category. For
illustration, let us assume that during this drawing, the total collected by the lottery N was $2,000,000
(since each ticket is $1, this also means that 2,000,000 tickets were purchased), and for the five matching
numbers category, Lotto Texas pays out a total pool of 0.005N. Then for this particular drawing, John's prize
pool is worth 0.005 × $2,000,000 = $10,000. But let us say that four other players also won this prize, so
John doesn't win the whole pool; rather, it is split equally between each person. Then each winner receives
$10,000/5 = $2,000. So even though John did not win the jackpot, he is extremely happy because he came
out nearly $2,000 richer!¹
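The split in John's example is a single division, which the sketch below spells out. The helper `pari_mutuel_share` is a hypothetical name introduced only for illustration, and the inputs are the assumed values from the example above.

```python
def pari_mutuel_share(rate, sales, winners):
    """Each winner's share of a pari-mutuel pool worth rate * sales dollars."""
    if winners < 1:
        raise ValueError("a pool is only split when at least one ticket wins")
    return rate * sales / winners


# Values from John's hypothetical drawing: r = 0.005, N = $2,000,000, five winners.
share = pari_mutuel_share(0.005, 2_000_000, 5)
print(f"each winner receives about ${share:,.2f}")
```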
2.2 Definitions
Definition: Suppose a random variable X can take value \(X_1\) with probability \(P_1\), value \(X_2\) with probability
\(P_2\), and so on, up to value \(X_k\) with probability \(P_k\). The Expected Value of X is
\[
E(X) = \sum_{i=1}^{k} X_i P_i.
\]
¹ Note: For ease, we did not consider taxes in this hypothetical circumstance.

Intuitively, the expected value of X represents the value X will take on average. We will define the expected
rate of return analogously.
Definition: For any particular asset or investment, its Expected Rate of Return (eRoR) is defined
as
\[
(\mathrm{eRoR}) = \sum_{i=1}^{n} P_i R_i, \tag{2.1}
\]
where each \(P_i\) is the probability that the return \(R_i\) (in dollars) is attained for the asset, and n is the
total number of possibilities.
This is a formula that we will modify numerous times. It is the sum of the products of all possible returns
and their corresponding probabilities. Observe that \(\sum_{i=1}^{n} P_i = 1\). Also, the expected rate of return of an
asset can be either positive or negative. For our purposes, let a bad bet be an asset that has a negative
expected rate of return. Similarly, let a good bet be an asset with a positive expected rate of return.
As a quick example, say a financial analyst determines that purchasing a $1 stock in the company
Oilfields Inc. will provide a return \(R_1\) = $0.20 with probability \(P_1\) = 0.05. Note that the positive return
indicates that for every dollar invested in Oilfields Inc., the investor will get $1.20 back in return with
a probability of 0.05. Let us also say that other returns and their respective probabilities are given by:
\(R_2\) = $0.50, \(P_2\) = 0.50, \(R_3\) = −$0.35, \(P_3\) = 0.45. Here, \(R_3\) = −$0.35 indicates that a $1 investment will
only give back sixty-five cents with a probability of 0.45. If this is the outcome, the investor loses thirty-five
cents for every dollar he or she puts into the stock. To find the expected rate of return, we sum each of
these products to get:
(eRoR) = $0.20 × 0.05 + $0.50 × 0.50 − $0.35 × 0.45 ≈ $0.10.
So for this example, every dollar invested should give a net gain of ten cents. So putting money into this
stock should (on average) be a good bet in that the investor should not lose money from it.
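The same calculation can be written as a short Python sketch of Equation (2.1). The function name `expected_rate_of_return` and the check that the probabilities sum to 1 are our own additions for illustration.

```python
def expected_rate_of_return(outcomes):
    """Sum P_i * R_i over (probability, return) pairs, as in Equation (2.1)."""
    total_probability = sum(p for p, _ in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * r for p, r in outcomes)


# (probability, return in dollars) for the Oilfields Inc. example.
stock = [(0.05, 0.20), (0.50, 0.50), (0.45, -0.35)]
print(f"eRoR is about ${expected_rate_of_return(stock):.4f} per dollar invested")
```

The unrounded value is $0.1025, which the text rounds to ten cents.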
We can now observe that for the lottery,
\[
\begin{aligned}
(\mathrm{eRoR}) = {} & -(\text{cost of ticket}) + (\text{expected winnings from fixed prizes}) \\
& + (\text{expected winnings from pari-mutuel prizes}) + (\text{expected winnings from jackpot})
\end{aligned} \tag{2.2}
\]
2.3 The Model
Now we will reproduce Abrams and Garibaldi's model. To begin, we will define some terms for individual
drawings. So that we stay consistent with Section 2.1, let N be the total number of ticket sales for a
drawing, in dollars. Now any particular lottery will have many different possibilities for each ticket. Let t
be the total number of those possibilities, which can also be referred to as the number of distinct tickets.
For example, in the Lotto Texas example, since each player chooses six numbers between 1 and 54, there are
\(t = \binom{54}{6} = 25{,}827{,}165\) distinct tickets. Then if Alice goes and buys a ticket to this lottery, she can choose
from 25,827,165 distinct tickets. Now, we want to have some way to represent each possible prize. Then we
will say:
• There are \(t^{\text{fix}}_i\) distinct ways to win a fixed prize worth \(a_i\) dollars. In the Lotto Texas example, \(a_1 = \$3\),
and there is only one fixed prize. To win this prize, the player must have three matching numbers. As we
discussed in Section 2.1, there are \(t^{\text{fix}}_1 = \binom{6}{3}\binom{48}{3} = 345{,}920\) ways to win this prize because there are \(\binom{6}{3}\)
ways to choose three of the six winning numbers and \(\binom{48}{3}\) ways to choose three of the remaining numbers.
Thus, the probability of winning this fixed prize is 345,920/25,827,165 ≈ 0.013394. Generally, with
a total of c fixed prizes, we will say that a total of \(t^{\text{fix}}_1, t^{\text{fix}}_2, t^{\text{fix}}_3, \ldots, t^{\text{fix}}_c\) tickets win fixed prizes of
(positive) dollar amounts \(a_1, a_2, a_3, \ldots, a_c\), respectively. Then of the total possible tickets t, any player
that purchases one of the \(t^{\text{fix}}_i\) tickets will receive a prize with a value of \(a_i\) dollars. Note that multiple people
can buy the same ticket and win these prizes without changing the value of the prize.
• There are \(t^{\text{pari}}_i\) ways to split a pot of \(r_i N\) dollars. In the hypothetical Lotto Texas example, the category
of five matching numbers splits a pot of \(r_2 N = \$10{,}000\), where \(r_2 = 0.005\) and \(N = \$2{,}000{,}000\).
For the other pari-mutuel category, four matching numbers, there are \(\binom{6}{4}\) ways to choose four of the six
winning numbers and \(\binom{48}{2}\) ways of choosing the remaining losing numbers, so there are
\(t^{\text{pari}}_1 = \binom{6}{4}\binom{48}{2} = 16{,}920\) ways to win that prize. Thus, the
probability of winning it is 16,920/25,827,165 ≈ 0.000655. Generally speaking, there are
\(t^{\text{pari}}_1, t^{\text{pari}}_2, t^{\text{pari}}_3, \ldots, t^{\text{pari}}_d\) distinct tickets that can split pari-mutuel pots of (positive) dollar amounts
\(r_1 N, r_2 N, r_3 N, \ldots, r_d N\), respectively. Here, each \(r_i\) is the proportion of total sales N given to the prize
won by \(t^{\text{pari}}_i\) tickets. Note that the letter d represents the total number of pari-mutuel prizes. Then
any player that purchases one of these \(t^{\text{pari}}_i\) tickets will receive a prize that is inversely proportional to
the number of players who win that prize: each individual wins a prize of \(r_i N / n_i\) dollars, where \(n_i\) is the
number of people who win the i-th pari-mutuel prize.
• One of the possible distinct tickets wins a share of the (positive) jackpot J. If w copies of this one
ticket are sold, then each ticket holder receives J/w, in dollars.
In order to ensure accuracy of prediction in final estimates on rates of return, from now on we will assume
that the values above are after taxes. All values from here on are consequently adjusted for taxes.
For our model, we will also create some other variables to represent different statistics. These numbers
depend on the structure of each lottery, not on any particular drawing of that lottery. As a result, they do
not vary with the value of the jackpot for a drawing or the number of people who play during any particular
drawing. First, we are interested in the cost of a ticket minus the expected winnings from fixed prizes. To
get this statistic, we start with the cost of a ticket. Now, in order to find the average winnings that a player
will receive from fixed prizes, we add together the value of each prize (and we do this once for every winning
ticket), then we divide this by the total number of tickets. Finally, we subtract this quotient from the cost
of each ticket. We will denote this by f. Then with a ticket that costs $1,
\[
f := 1 - \frac{\sum_{i=1}^{c} a_i t^{\text{fix}}_i}{t}.
\]
Thus, f represents the loss that a player will incur per ticket on average, after taking fixed prizes into
account. This does not account for pari-mutuel prizes. Observe that the value \(\sum_{i=1}^{c} a_i t^{\text{fix}}_i / t\) gives the
proportion that a player can expect to receive from fixed prizes per ticket purchased. Then by subtracting
\(\sum_{i=1}^{c} a_i t^{\text{fix}}_i / t\) from the $1 cost of a ticket, we get f. We can also view f from the lottery operator's point of view: f represents the
proportion of each ticket that goes to the jackpot, the pari-mutuel prizes, and the amount that the operator
takes in as profit or to cover costs. Note that since each \(a_i\) is non-negative and since the lottery operator
must stay profitable, \(0 < f \le 1\).
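As an illustration, f can be computed for the Lotto Texas game of Section 2.1, whose only fixed prize is $3 for three matching numbers. This is a sketch under one stated assumption: it uses the pre-tax $3 prize, whereas the model assumes after-tax values. The function name `fixed_prize_loss` is hypothetical.

```python
from math import comb


def fixed_prize_loss(fixed_prizes, distinct_tickets, ticket_cost=1.0):
    """Return f: ticket cost minus expected fixed-prize winnings per ticket.

    fixed_prizes is a list of (prize value a_i, winning-ticket count t_i^fix).
    """
    expected_fixed = sum(a * ways for a, ways in fixed_prizes) / distinct_tickets
    return ticket_cost - expected_fixed


t = comb(54, 6)                          # 25,827,165 distinct Lotto Texas tickets
three_match = comb(6, 3) * comb(48, 3)   # 345,920 tickets win the $3 prize
f = fixed_prize_loss([(3.0, three_match)], t)
print(f"f is about {f:.4f}")
```

Under these assumptions a player loses roughly 96 cents per ticket on average before pari-mutuel prizes and the jackpot are taken into account.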
Furthermore, we will define F to be:
\[
F := 1 - \frac{\sum_{i=1}^{c} a_i t^{\text{fix}}_i}{t} - \sum_{i=1}^{d} r_i = f - \sum_{i=1}^{d} r_i. \tag{2.3}
\]
Unfortunately, there is no direct interpretation of F from the player's perspective. However, remember
that f can be used to represent the proportion of every ticket that goes to the jackpot, pari-mutuel prizes, and
lottery overhead. So subtracting each rate \(r_i\) from f will remove the pari-mutuel prizes from the equation.
Thus from the lottery operator's perspective, F represents the proportion of every ticket that goes to jackpot
and overhead. Furthermore, a lottery will put at least some money into both F and f, and the operator
must profit from the lottery, so \(0 < F < f \le 1\).
Also note that the probability of winning the jackpot with w − 1 other people is given by:
\[
\binom{N-1}{w-1} \left(\frac{1}{t}\right)^{w} \left(1 - \frac{1}{t}\right)^{N-w}. \tag{2.4}
\]
We justify this by choosing w − 1 winners from a total of N − 1 players (independent of your own ticket). This
gives the binomial coefficient. Each contestant has a chance of \(p = \frac{1}{t}\) to win, so \(\frac{1}{t}\) multiplies the binomial
coefficient w times. There is also a probability of \(1 - \frac{1}{t}\) to lose the jackpot, so we multiply by this N − w
times (since there are N − w losers).
Just for simplification and to decrease the space taken up by the equation for the (eRoR), let us define
the function s as:
\[
s(p, N) := \sum_{w \ge 1} \frac{1}{w} \binom{N-1}{w-1} p^{w} (1-p)^{N-w}.
\]
With this definition, the expected return from the jackpot is \(J s(1/t, N)\). We multiply by \(\frac{1}{w}\) because the
value of the player's share of the jackpot, after the split with w − 1 other people, is \(\frac{J}{w}\). Similarly, we can find
the amount that one can expect to win from pari-mutuel prizes: since the i-th pari-mutuel pool is worth \(r_i N\),
we multiply it by \(s(p_i, N)\), which weights each possible share of the pool by its probability, in order to get the (eRoR). So for the i-th pari-mutuel
prize, the expected rate of return for that prize alone is \(r_i N s(p_i, N)\). We do this for each pari-mutuel prize.
Then the expected winnings from pari-mutuel prizes is \(\sum_{i=1}^{d} r_i N s(p_i, N)\). Also remember that f is the
average loss a player will incur after accounting for fixed prizes. Then −f gives us the negative of the cost of
a ticket plus the expected winnings from fixed prizes. Now, combining these into Equation (2.2) transforms
it into:
it into:
\[
\begin{aligned}
(\mathrm{eRoR}) &= -(\text{cost of ticket}) + (\text{expected winnings from fixed prizes}) \\
&\quad + (\text{expected winnings from pari-mutuel prizes}) + (\text{expected winnings from jackpot}) \\
&= -f + \sum_{i=1}^{d} r_i N s(p_i, N) + J s\!\left(\frac{1}{t}, N\right),
\end{aligned} \tag{2.5}
\]
where \(p_i\) is the probability of winning the i-th pari-mutuel prize and d is the number of pari-mutuel prizes.
Note that it is neither certain that a person will win any prize in the lottery nor certain that they will lose,
so \(0 < p_i < 1\).
3 Model Analysis
Here we will introduce some mathematical insights and then proceed with analyzing the model itself.
3.1 Propositions
Let us now take some propositions into consideration. In the interest of space, we will only prove one of
them, but the rest are justified in Abrams and Garibaldi's paper.
Proposition 3.1: For 0 < p < 1 and N > 0, we have:
\[
s(p, N) = \frac{1 - (1-p)^{N}}{N}.
\]
Plugging this into Equation (2.5) results in:
\[
(\mathrm{eRoR}) = -f + \sum_{i=1}^{d} r_i \left(1 - (1-p_i)^{N}\right) + J s\!\left(\frac{1}{t}, N\right). \tag{3.1}
\]
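Although we do not prove Proposition 3.1 here, it is easy to check numerically for integer N. The sketch below compares the defining sum of s(p, N) with the closed form; the function names `s_sum` and `s_closed` and the test values of p and N are our own choices.

```python
from math import comb, isclose


def s_sum(p, n):
    """s(p, N) from its defining sum over the number of winners w."""
    return sum(
        comb(n - 1, w - 1) * p**w * (1 - p) ** (n - w) / w
        for w in range(1, n + 1)
    )


def s_closed(p, n):
    """Closed form of s(p, N) from Proposition 3.1."""
    return (1 - (1 - p) ** n) / n


for p, n in [(0.1, 20), (0.5, 7), (0.01, 100)]:
    assert isclose(s_sum(p, n), s_closed(p, n), rel_tol=1e-9)
    print(f"p = {p}, N = {n}: s = {s_closed(p, n):.6f}")
```

The agreement reflects the identity \(\frac{1}{w}\binom{N-1}{w-1} = \frac{1}{N}\binom{N}{w}\), which turns the sum into a binomial expansion with the w = 0 term removed.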
Lemma 3.2: If 0 < c < 1, then \(1 - \frac{1}{c} - \ln c < 0\).
Lemma 3.3: If b ∈ (0, 1), then:
1. The function
\[
h(x, y) := \frac{1 - b^{xy}}{x}
\]
satisfies \(h_x := \frac{\partial h}{\partial x} < 0\) and \(h_y := \frac{\partial h}{\partial y} > 0\) for x, y > 0.
2. For every z > 0, the level set \(\{(x, y) \mid h(x, y) = z\}\) intersects the first quadrant in the graph of a smooth,
positive, increasing, concave up function defined on the interval \((0, \frac{1}{z})\).
With these lemmas, we can prove the following corollary.
Corollary 3.4 (Abrams and Garibaldi in [1]): Suppose that 0 < p < 1. Then for x ≥ 0,
1. The function x ↦ s(p, x) decreases from −ln(1 − p) to 0.
2. The function x ↦ x s(p, x) increases from 0 to 1.
Proof:
1. We will first show that x ↦ s(p, x) is a decreasing function, then show that its limit as x → ∞ is 0,
and finally show that as x → 0⁺, s(p, x) → −ln(1 − p). Together, these facts show that x ↦ s(p, x)
decreases from −ln(1 − p) to 0.
(a) To show that x ↦ s(p, x) is decreasing, we set y = 1 and b = 1 − p. Then h(x, y) becomes
\(h(x, 1) = \frac{1 - (1-p)^{x}}{x} = s(p, x)\). We have 0 < 1 − p < 1, so by Lemma 3.3, s(p, x) is decreasing.
(b) Next we show that x ↦ s(p, x) tends to 0. We can do this by inspection: since 0 < p < 1, we
know 0 < 1 − p < 1, so \(\lim_{x \to \infty} (1-p)^{x} = 0\). Also, as x → ∞, we know that \(\frac{1}{x} \to 0\). It follows that
\(\lim_{x \to \infty} s(p, x) = 0\). Then x ↦ s(p, x) decreases to 0.
(c) For 0 < a < 1, ln(a) < 0. Then −ln(a) > 0 on the same interval. Because 0 < 1 − p < 1, it
follows that −ln(1 − p) > 0. Then we must find the limit of x ↦ s(p, x) as x → 0⁺. We will do
so using l'Hôpital's Rule. Differentiating the numerator and denominator of s(p, x) with respect to x,
we get:
\[
\lim_{x \to 0^+} s(p, x) = \lim_{x \to 0^+} \frac{\frac{d}{dx}\left[1 - (1-p)^{x}\right]}{\frac{d}{dx}\, x} = \lim_{x \to 0^+} \frac{-\ln(1-p)\,(1-p)^{x}}{1} = -\ln(1-p)\,(1-p)^{0} = -\ln(1-p).
\]
2. We have \(x s(p, x) = x \cdot \frac{1 - (1-p)^{x}}{x} = 1 - (1-p)^{x}\), which is increasing in x because 0 < 1 − p < 1. Then as x → ∞, x s(p, x) → 1. Similarly, as x →
0, x s(p, x) → 0. Hence, on x ≥ 0, x s(p, x) increases from 0 to 1.
3.2 eRoR Analysis
In this section, we will begin deeper analysis of the (eRoR). With the above results, we can find that there is
effectively a jackpot cutoff \(J_0\). With a jackpot value that is less than \(J_0\), that lottery drawing will certainly
be a bad bet. This essentially gives us an upper bound on the (eRoR) in Equation (3.1). We will find that
\(J_0\) is so high that a lottery will rarely exhibit positive returns.
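Note that since \(0 \le p_i \le 1\), each term \(1 - (1-p_i)^{N} \le 1\). Also, \(-f + \sum_{i=1}^{d} r_i = -F\). Then Equation
(3.1) becomes:
\[
\begin{aligned}
(\mathrm{eRoR}) &= -f + \sum_{i=1}^{d} r_i \left(1 - (1-p_i)^{N}\right) + J s\!\left(\frac{1}{t}, N\right) \\
&\le -f + \sum_{i=1}^{d} r_i + J s\!\left(\frac{1}{t}, N\right) \\
&= -F + J s\!\left(\frac{1}{t}, N\right).
\end{aligned}
\]
But by Corollary 3.4, since N > 1 and s(p, x) is decreasing in x, \(s\!\left(\frac{1}{t}, N\right) < s\!\left(\frac{1}{t}, 1\right) = \frac{1}{t}\). Then:
\[
(\mathrm{eRoR}) < -F + \frac{J}{t}. \tag{3.2}
\]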
With this insight, we will define the value \(J_0\) as the jackpot cutoff. We call this the jackpot cutoff
because for a drawing with a lottery jackpot J such that \(J < J_0\), (eRoR) will certainly be negative. We let:
\[
J_0 := Ft.
\]
Then in order for expected returns to be positive, we must have the lottery's jackpot \(J > Ft = J_0\). In
other words, the jackpot must be greater than the proportion of lottery income that goes into the jackpot
and overhead multiplied by the total number of distinct tickets. This is why we call \(J_0 = Ft\) the jackpot
cutoff. Any lottery with jackpot \(J < J_0\) will then be a bad bet. Equivalently, we need J large enough so
that \(\frac{J}{J_0} > 1\) for the lottery to be a good bet. This normalization of J helps provide useful criteria that will
assist us in graphing our model. Similarly, we can normalize N and deal with the ratio \(\frac{N}{J}\). This will also
help us in graphical analysis: we can think of the jackpot as large or small in comparison with the number
of people who participate in a particular drawing. With these ratios, we define two new variables:
\[
x := \frac{N}{J} \ \text{ so that } \ N = xJ; \qquad \text{and} \qquad y := \frac{J}{J_0} \ \text{ so that } \ J = yJ_0.
\]
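To make the normalization concrete, the sketch below computes the cutoff and the two ratios for a single drawing. The values F = 0.8 and J = $30,000,000 are assumed purely for illustration (they are not data for any real drawing), and the function name `normalize_drawing` is hypothetical.

```python
def normalize_drawing(sales, jackpot, big_f, distinct_tickets):
    """Return (x, y, J0) with J0 = F*t, x = N/J and y = J/J0."""
    if jackpot <= 0 or big_f <= 0:
        raise ValueError("the model assumes J > 0 and J0 > 0")
    j0 = big_f * distinct_tickets
    return sales / jackpot, jackpot / j0, j0


# Hypothetical drawing: F = 0.8 and a $30 million jackpot are assumed values;
# N and t are the Lotto Texas figures used earlier in the paper.
x, y, j0 = normalize_drawing(2_000_000, 30_000_000, 0.8, 25_827_165)
print(f"J0 = {j0:,.0f}, x = {x:.4f}, y = {y:.4f}")
```

A value of y above 1 only says that the jackpot clears the cutoff; whether the drawing is actually a good bet also depends on x.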
Here we assume that there is a jackpot cutoff \(J_0\) that is greater than 0. On a similar note, we also assume
that an individual drawing's jackpot J is also greater than 0. Lastly, since N and J both vary per drawing
(but \(J_0\) does not), x and y also vary per drawing. Then x and y will simplify our graphical analysis and give
us relatively easy criteria that we can use to determine when a lottery is a good bet. If we substitute our
new expressions of N and J as well as the closed form of s(p, N) from Proposition 3.1 into Equation (3.1), we obtain:
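\[
\begin{aligned}
(\mathrm{eRoR}) &= -f + \sum_{i=1}^{d} r_i \left(1 - (1-p_i)^{N}\right) + J s\!\left(\frac{1}{t}, N\right) \\
&= -f + \sum_{i=1}^{d} r_i \left(1 - (1-p_i)^{xJ}\right) + (yJ_0)\, s\!\left(\frac{1}{t}, xJ\right) \\
&= -f + \sum_{i=1}^{d} r_i \left(1 - (1-p_i)^{xyJ_0}\right) + (yJ_0)\, s\!\left(\frac{1}{t}, xyJ_0\right) \\
&= -f + \sum_{i=1}^{d} r_i \left(1 - (1-p_i)^{xyJ_0}\right) + (yJ_0) \cdot \frac{1 - \left(1 - \frac{1}{t}\right)^{xyJ_0}}{xyJ_0}.
\end{aligned}
\]
Then:
\[
(\mathrm{eRoR}) = -f + \sum_{i=1}^{d} r_i \left(1 - (1-p_i)^{xyJ_0}\right) + \frac{1 - \left(1 - \frac{1}{t}\right)^{xyJ_0}}{x}. \tag{3.3}
\]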
This equation gives us our most powerful graphical insight: when we set this equation equal to 0, we
have the break-even curve. In the interest of our graphical analysis, we do not simplify the exponents.
For any lottery, we can substitute the values of f, \(r_i\), \(p_i\), and \(\frac{1}{t}\), then graph the results as level curves in a
two-dimensional field. We will apply this equation and graph it in Section 4.3. Any particular ordered pair
\((x_0, y_0)\) that lies on this break-even curve will have an (eRoR) of 0, while any ordered pair that lies above
it will have positive returns.
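One way to trace the break-even curve numerically is to fix x and search for the y at which Equation (3.3) vanishes. The sketch below does this by bisection, which is our own choice of method rather than anything prescribed by Abrams and Garibaldi. The parameter values (f = 0.96 and rates of 0.02 and 0.005) are hypothetical, while t and the winning-ticket counts are the Lotto Texas figures from Section 2.

```python
import math


def eror(x, y, f, rates, probs, t, j0):
    """Expected rate of return from Equation (3.3) at the point (x, y)."""
    n = x * y * j0  # N = xJ = x * y * J0

    def miss(p):
        # (1 - p)**n, computed via log1p for accuracy when p is tiny.
        return math.exp(n * math.log1p(-p))

    pari = sum(r * (1 - miss(p)) for r, p in zip(rates, probs))
    jackpot = (1 - miss(1 / t)) / x
    return -f + pari + jackpot


def break_even_y(x, f, rates, probs, t, j0, tol=1e-10):
    """Bisect for the y with eRoR = 0 at fixed x, assuming 0 < x < 1/F."""
    big_f = f - sum(rates)
    if not 0 < x < 1 / big_f:
        raise ValueError("no break-even point exists outside 0 < x < 1/F")
    lo, hi = 0.0, 1.0
    while eror(x, hi, f, rates, probs, t, j0) <= 0:  # grow until eRoR > 0
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if eror(x, mid, f, rates, probs, t, j0) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


# Hypothetical lottery: f and the rates r_i are assumed values, not real data.
t = 25_827_165
params = dict(f=0.96, rates=[0.02, 0.005], probs=[16_920 / t, 288 / t],
              t=t, j0=(0.96 - 0.025) * t)
y_star = break_even_y(0.5, **params)
print(f"break-even y at x = 0.5 is about {y_star:.4f}")
```

Repeating the search over a grid of x values in the admissible interval yields points on the break-even curve that can then be plotted.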
Proposition 3.5 (Abrams and Garibaldi in [1]): For every lottery, the break-even curve is the graph
of a smooth, positive function g(x) with domain \((0, \frac{1}{F})\).
Proof:
With any fixed value of x, we can think of (eRoR) as a smooth function of one variable, namely y. This
variable can never be 0 mathematically because that would imply J = 0, which in turn would make \(x = \frac{N}{J}\)
undefined. Practically speaking, we will never have a jackpot J = 0 because each lottery that has a jackpot
will put some money towards it.
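Then as y → 0⁺, we have:
\[
\begin{aligned}
\lim_{y \to 0^+} (\mathrm{eRoR}) &= \lim_{y \to 0^+} \left[ -f + \sum_{i=1}^{d} r_i \left(1 - (1-p_i)^{xyJ_0}\right) + \frac{1 - \left(1 - \frac{1}{t}\right)^{xyJ_0}}{x} \right] \\
&= -f + \sum_{i=1}^{d} r_i \left(1 - (1-p_i)^{0}\right) + \frac{1 - \left(1 - \frac{1}{t}\right)^{0}}{x} \\
&= -f + \sum_{i=1}^{d} r_i (1 - 1) + \frac{1 - 1}{x} \\
&= -f.
\end{aligned}
\]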
Note that −f < 0, so there is a value of y close to 0 (call it \(y_0\)) such that at that point, our (eRoR) is
negative. Intuitively speaking, this makes sense. That is, we know that a lottery has to stay profitable, so
it cannot hand out more in fixed prizes than it takes in via ticket sales. Then the only way we can have
(eRoR) > 0 is if nobody has won the jackpot in such a long time that it has increased to a huge sum.
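Next, let us consider the partial derivative of our (eRoR) function with respect to y. Keep in mind that
f is not a function of J or \(J_0\), so \(\frac{\partial f}{\partial y} = 0\). This gives us:
\[
\begin{aligned}
\frac{\partial}{\partial y} (\mathrm{eRoR}) &= \frac{\partial}{\partial y} \left[ -f + \sum_{i=1}^{d} r_i \left(1 - (1-p_i)^{xyJ_0}\right) + \frac{1 - \left(1 - \frac{1}{t}\right)^{xyJ_0}}{x} \right] \\
&= -\sum_{i=1}^{d} r_i x J_0 \ln(1-p_i)\,(1-p_i)^{xyJ_0} - \frac{x J_0 \ln\!\left(1 - \frac{1}{t}\right) \left(1 - \frac{1}{t}\right)^{xyJ_0}}{x} \\
&= -\sum_{i=1}^{d} r_i x J_0 \ln(1-p_i)\,(1-p_i)^{xyJ_0} - J_0 \ln\!\left(1 - \frac{1}{t}\right) \left(1 - \frac{1}{t}\right)^{xyJ_0}.
\end{aligned}
\]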
We know that \(0 < 1 - p_i < 1\) for each \(p_i\), so \(\ln(1 - p_i)\) is defined and is negative. Similarly, \(\ln\!\left(1 - \frac{1}{t}\right) < 0\).
Then the partial derivative of the (eRoR) with respect to y is positive. Thus, (eRoR) is a continuous, strictly
increasing function of y on our domain. We also have:
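\[
\begin{aligned}
\lim_{y \to \infty} (\mathrm{eRoR}) &= \lim_{y \to \infty} \left[ -f + \sum_{i=1}^{d} r_i \left(1 - (1-p_i)^{xyJ_0}\right) + \frac{1 - \left(1 - \frac{1}{t}\right)^{xyJ_0}}{x} \right] \\
&= -f + \sum_{i=1}^{d} r_i + \frac{1}{x} \\
&= -F + \frac{1}{x}.
\end{aligned}
\]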
Thus, the limit can be positive only when \(x \in (0, \frac{1}{F})\). For any x in this interval, we will have a value of y
(call it \(y_k\)) such that (eRoR) > 0. Hence, since (eRoR) is negative at \(y_0\), positive at \(y_k\), and continuous in y, we can
invoke the Intermediate Value Theorem to say that there is a \(y \in (y_0, y_k)\) such that (eRoR) = 0. Because (eRoR) is strictly increasing in y, this y is unique.
Differentiating the (eRoR) with respect to x leads to a more complicated result, and it is unclear whether
(eRoR) increases or decreases in x. But we can use the Implicit Function Theorem to show that the graph
of the break-even curve is still smooth. In conclusion, since (eRoR) is a smooth function of two variables,
we can plot smooth level curves of (eRoR) as functions of x on the domain. And if we get the level curve
for (eRoR) = 0, this will give us the break-even curve.
3.3 Graphical Analysis
Now that we can graph the break-even curve and have found some of its general properties, we will establish
upper and lower bounds for it. This brings us to another of Abrams and Garibaldi's most crucial results,
which we will eventually prove. Let the curves U and L be such that:
\[
U := \left\{ (x, y) \;\middle|\; -1 + \frac{1 - 0.45^{xy}}{x} = 0 \right\}
\]
and
\[
L := \left\{ (x, y) \;\middle|\; -0.8 + \frac{1 - 0.36^{xy}}{x} = 0 \right\}.
\]
Definition: A major lottery is one with at least 500 distinct tickets.
Note: This is a fairly arbitrary definition, and almost all state- or federal-run lotteries will match this
condition. In particular, the MEGA Millions lottery has over 150 million distinct tickets.
Lemma 3.6: The function \(g(t) = \left(1 - \frac{1}{t}\right)^{t}\) is increasing for t > 1 and its limit as t → ∞ is \(\frac{1}{e}\).
Note: We will not prove this, but rest assured that it is justified in Abrams and Garibaldi's paper.
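While Lemma 3.6 is not proved here, a quick numerical check makes it plausible. The sketch below evaluates g at a few sample values of t chosen by us; it illustrates the claim and is not a proof.

```python
import math


def g(t):
    """g(t) = (1 - 1/t)**t from Lemma 3.6."""
    return (1 - 1 / t) ** t


samples = [2, 10, 500, 10_000, 1_000_000]
values = [g(t) for t in samples]
for t, value in zip(samples, values):
    print(f"g({t}) = {value:.6f}")
print(f"1/e = {1 / math.e:.6f}")
```

The sampled values rise toward 1/e without reaching it, and the value at t = 500 already exceeds 0.36, which is the constant used in the bounds below.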
Theorem 3.7 (Abrams and Garibaldi in [1]): For any major lottery with F ≥ 0.8 so that the lottery
pays out less than 20% of revenue in prizes other than the jackpot:
1. The break-even curve lies in the region between the curves U and L and to the right of the y-axis.
2. Any drawing with (x, y) above the break-even curve has a positive (eRoR).
3. Any drawing with (x, y) below the break-even curve has a negative (eRoR).
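Proof:
For Part (1), we will start by positing that for c > 0,
\[
1 - 0.45^{c} < cJ_0\, s\!\left(\frac{1}{t}, cJ_0\right) < 1 - 0.36^{c}. \tag{3.4}
\]
Now, we will show this is true. Since \(J_0 = Ft\) and \(s(p, N) = \frac{1 - (1-p)^{N}}{N}\),
\[
cJ_0\, s\!\left(\frac{1}{t}, cJ_0\right) = cJ_0 \times \frac{1 - \left(1 - \frac{1}{t}\right)^{cJ_0}}{cJ_0} = 1 - \left(1 - \frac{1}{t}\right)^{cJ_0}.
\]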
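By Lemma 3.6, \(\left(1 - \frac{1}{t}\right)^{t}\) is increasing. And since we assumed t ≥ 500 and \(\left(1 - \frac{1}{500}\right)^{500} = 0.998^{500}\), we have:
\[
\left(1 - \frac{1}{t}\right)^{t} \ge 0.998^{500} > 0.36.
\]
Now, for the lower bound in (3.5), Lemma 3.6 implies \(\left(1 - \frac{1}{t}\right)^{t}\) is at most \(\frac{1}{e}\). Since \(cJ_0 = cFt\), we can write \(\left(1 - \frac{1}{t}\right)^{cJ_0} = \left[\left(1 - \frac{1}{t}\right)^{t}\right]^{cF}\). Combining these, we get:
\[
1 - e^{-cF} < cJ_0\, s\!\left(\frac{1}{t}, cJ_0\right) < 1 - 0.36^{cF}. \tag{3.5}
\]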
Since our hypothesis guarantees F ≥ 0.8, we can plug this in to obtain:
\[
1 - e^{-cF} \ge 1 - e^{-c \times 0.8} \approx 1 - 0.4493^{c} > 1 - 0.45^{c}. \tag{3.6}
\]
Similarly, since F ≤ 1,
\[
1 - 0.36^{cF} \le 1 - 0.36^{c}. \tag{3.7}
\]
Combining Equation (3.6) and Equation (3.7) with Equation (3.5) conclusively gives us our proposition in
Equation (3.4).
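If we return to Equation (3.1), we can note that each \(r_i\) is non-negative. Then to obtain a lower
bound, we can take \(p_i = 0\) for all i, because this will cancel out each term in the sum, leaving us with
\(-f + J s(1/t, N) \le (\mathrm{eRoR})\). Then combining this with the upper bound \((\mathrm{eRoR}) \le -F + J s(1/t, N)\) derived in Section 3.2, we obtain:
\[
-f + J s\!\left(\frac{1}{t}, N\right) \le (\mathrm{eRoR}) \le -F + J s\!\left(\frac{1}{t}, N\right).
\]
Then we can replace N with \(xyJ_0\) and J with \(\frac{xyJ_0}{x} = yJ_0\) and apply Equation (3.4) with c = xy. Recall that
x and y > 0. Since f ≤ 1 and F ≥ 0.8, the inequality transforms into:
\[
\begin{aligned}
-f + J s\!\left(\frac{1}{t}, N\right) &\le (\mathrm{eRoR}) \le -F + J s\!\left(\frac{1}{t}, N\right) \\
-1 + yJ_0\, s\!\left(\frac{1}{t}, xyJ_0\right) &\le (\mathrm{eRoR}) \le -0.8 + yJ_0\, s\!\left(\frac{1}{t}, xyJ_0\right) \\
-1 + \frac{1 - 0.45^{xy}}{x} &< (\mathrm{eRoR}) < -0.8 + \frac{1 - 0.36^{xy}}{x}.
\end{aligned} \tag{3.8}
\]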
Analogously to Lemma 3.3, the partial derivatives of the upper and lower bounds in Equation (3.8) are negative
with respect to x and positive with respect to y. This implies part (1) of this theorem.
Parts (2) and (3) follow directly from the proof of Proposition 3.5: for any fixed value x, there is a \(y_0\)
that will give us a value of the (eRoR) below the break-even curve (this will be negative), and a \(y_k\) that will
give us a value of the (eRoR) above the break-even curve (this will be positive).
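The curves U and L can be evaluated in closed form by solving each defining equation for y, which gives y = ln(1 − x) / (x ln 0.45) on U and y = ln(1 − 0.8x) / (x ln 0.36) on L. This algebra and the function names below are our own, offered as a sketch for plotting the region that contains the break-even curve.

```python
import math


def upper_curve_y(x):
    """y on curve U, where -1 + (1 - 0.45**(x*y)) / x = 0; needs 0 < x < 1."""
    if not 0 < x < 1:
        raise ValueError("U is only defined for 0 < x < 1")
    return math.log(1 - x) / (x * math.log(0.45))


def lower_curve_y(x):
    """y on curve L, where -0.8 + (1 - 0.36**(x*y)) / x = 0; needs 0 < x < 1.25."""
    if not 0 < x < 1.25:
        raise ValueError("L is only defined for 0 < x < 1.25")
    return math.log(1 - 0.8 * x) / (x * math.log(0.36))


for x in (0.1, 0.5, 0.9):
    print(f"x = {x}: L gives y = {lower_curve_y(x):.3f}, U gives y = {upper_curve_y(x):.3f}")
```

For a major lottery with F ≥ 0.8, a drawing whose point (x, y) lies above U therefore has a positive (eRoR), and one whose point lies below L has a negative (eRoR).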
4 Model Application
Now we will begin our journey to acquire data so that we can graph the break-even curves of California's
lotteries. In order to do this, we will need to compile data to represent each lottery. Specifically, the
California Lottery provides statistics on the probability of winning each prize (we denote this probability
\(p_i\)), the value of each prize per drawing (denoted by \(v_i\)), and the number of people winning each prize per
drawing (denoted by \(n_i\)). We must use this data to find the value of the jackpot cutoff (denoted by \(J_0\)),
the rates at which the lottery compensates for pari-mutuel prizes (denoted by \(r_i\)), and the total number of
people playing in each drawing (denoted by N). However, in some situations, the best way to get this data
may not be inherently obvious.
4.1 California's MEGA Millions Lottery
Before we analyze the MEGA Millions lottery, let us first briefly discuss its setup. Each ticket costs $1, and a
player chooses five numbers between 1 and 56 as well as one number between 1 and 46 (this is the MEGA
number). Drawings are conducted twice weekly, on Tuesdays and Fridays. A player's winnings are based
on how many numbers he or she matches. The jackpot is given out to players who match all five base
numbers as well as the MEGA number, while the second highest prize is given to players who match only the
five base numbers. The third highest prize is given to players who match four base numbers as well as the
MEGA number, while the fourth highest prize is given to players who match only four base numbers, etc.
Players also win a prize for matching only the MEGA number. Each prize is pari-mutuel, so compensation
depends on how many people won that prize. There are no fixed prizes. The last piece of pertinent
information is that there are no California state taxes, while federal withholding taxes are levied at a rate of
30% on non-U.S. citizens on prizes valued at over $600. We will not assume that the players are American
citizens, because citizenship is not necessary to play the MEGA Millions lottery in California. The curious
reader can adjust the model for U.S. citizens alone by noting that the corresponding federal tax rate is 25%.
4.2 Maximum Likelihood Estimations for N
When dealing with situations similar to the lottery, there is a large amount of variance, which we can think
of as uncertainty in estimation. For example, consider the number of people per drawing and the equation
N p_i = n_i. For any total number of participants N, this equation gives an estimate of how many
people n_i win prize i based on the probability p_i of winning that prize. Then if we know n_i and p_i and
want to estimate the value of N, one way to do so is to divide the number of winners n_i by the
probability p_i of winning that prize, so that N ≈ n_i / p_i. But this estimate can vary widely: looking at Table 4.1
under the prize category for 5 matching numbers, we would estimate that N ≈ 3 · 3,904,701 = 11,714,103.
That would mean roughly one-third of California's population bought a ticket in the four days between
10/21/2011 (the date of the previous drawing) and 10/25/2011, assuming that each person buys only one
ticket. This does not seem very plausible for such a low jackpot. But if we choose to estimate the same N
using the prize category for 3 matching numbers, we get N ≈ 12,465 · 306 = 3,814,290. Since the jackpot
for this particular drawing was not very high, it seems far more likely that a relatively small number
of people participated in the drawing. As we have just seen, estimates that arise from simply dividing n_i
by p_i can produce a wide range of approximate values for N, and hence a wide range of error. We will therefore
expand on Abrams and Garibaldi's model by turning to maximum likelihood estimates.
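The spread of the naive estimates n_i / p_i can be seen by computing one for every prize category. The following is a minimal sketch using the Table 4.1 values for Draw #662; the names `TABLE_4_1` and `naive_estimates` are illustrative and not part of the original analysis.

```python
# Table 4.1 rows: (prize category, winners n_i, odds "1 in k_i", so p_i = 1 / k_i)
TABLE_4_1 = [
    ("5 + Mega", 0, 175_711_536),
    ("5", 3, 3_904_701),
    ("4 + Mega", 8, 689_065),
    ("4", 226, 15_313),
    ("3 + Mega", 284, 13_781),
    ("3", 12_465, 306),
    ("2 + Mega", 4_945, 844),
    ("1 + Mega", 29_321, 141),
    ("Mega", 58_662, 75),
]


def naive_estimates(rows):
    """Return {category: n_i / p_i}; since p_i = 1 / k_i this is n_i * k_i."""
    return {name: winners * odds for name, winners, odds in rows}


estimates = naive_estimates(TABLE_4_1)
for name, estimate in estimates.items():
    print(f"{name:>9}: N ~ {estimate:>12,}")
```

The per-category estimates range from 0 (no jackpot winner) to more than 11 million, which is the variability that motivates a single pooled estimate.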
Recall from probability theory: if we want the probability of events x_1, ..., x_n dependent on a model
parameter θ, we would have a statement like Pr(x_1, ..., x_n | θ). Here we take the model parameter θ as
a fixed value and the values x_j, where 0 < j ≤ n, as unknowns. But what if we didn't know the model
parameters, and instead just observed the data x_1, ..., x_n? In this case, we would want to find the value
of θ that maximizes the probability of the events x_1, ..., x_n actually occurring. That is, we want to
maximize the likelihood L(θ | x_1, ..., x_n). This is the concept of maximum likelihood estimation. Restated
differently, while we may not know the actual values of certain parameters, we can take the parameter values
that will most likely give us our observed data.
Definition: The maximum likelihood estimate of a parameter θ is given by:
\[
\hat{\theta}_{MLE} = \operatorname*{arg\,max}_{\theta} \, L(\theta \mid x_1, \dots, x_n)
\]
where arg max_θ gives the value of θ that maximizes the argument L(θ | x_1, ..., x_n).
Table 4.1
Prize category   n_i = # of winners in CA   p_i = probability to win (1 in:)   v_i = prize amount
5 + Mega                     0                    175,711,536                  $39,900,000.00
5                            3                      3,904,701                      $96,699.40
4 + Mega                     8                        689,065                       $5,000.10
4                          226                         15,313                         $168.00
3 + Mega                   284                         13,781                         $152.00
3                       12,465                            306                           $7.00
2 + Mega                 4,945                            844                           $9.00
1 + Mega                29,321                            141                           $3.00
Mega                    58,662                             75                           $1.00
Overall                105,914                          39.89                             n/a
Source: [4], MEGA Millions Draw #662, 10/25/2011. Prize values are after federal tax. Each v_i =
r_i N / n_i.
Example: Let us consider a coin that has been flipped 70 times. We know the outcome of each flip, but we
are unsure if the coin is fair. Given that the coin landed heads 32 times, we want to find the value of θ
that maximizes the probability that a coin tossed 70 times shows heads 32 times. For any particular flip,
heads shows with probability θ, and tails shows with probability 1 − θ. Then we want to
maximize
\[
L(\theta \mid H = 32, T = 38) = \Pr(H = 32, T = 38 \mid \theta) = \binom{70}{32} \theta^{32} (1 - \theta)^{38},
\]
where 0 ≤ θ ≤ 1. The binomial coefficient counts the ways to choose which 32 of the 70 flips land
heads. To maximize this, we will take the derivative with respect to θ and set
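this equal to zero:
\[
\begin{aligned}
\frac{d}{d\theta} L(\theta \mid H = 32, T = 38)
  &= \frac{d}{d\theta} \left[ \binom{70}{32} \theta^{32} (1 - \theta)^{38} \right] \\
  &= 32 \binom{70}{32} \theta^{31} (1 - \theta)^{38} - 38 \binom{70}{32} \theta^{32} (1 - \theta)^{37} \\
  &= \binom{70}{32} \theta^{31} (1 - \theta)^{37} \left[ 32 (1 - \theta) - 38 \theta \right] \\
0 &= \binom{70}{32} \theta^{31} (1 - \theta)^{37} \left[ 32 - 70 \theta \right].
\end{aligned}
\]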
Then L(θ) has critical points when θ = 0, θ = 1, and θ = 32/70. However, when θ = 0 or θ = 1, we have
L(θ) = 0; when θ = 32/70, we have L(θ) > 0. Then the likelihood of the outcome H = 32 is maximized
for θ = 32/70, so θ̂_MLE = 32/70. Of course, this makes sense: the likelihood that a coin will come up heads
32 out of 70 times is maximized when we have an unfair coin that lands heads with probability
θ = 32/70.
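The coin example can also be checked numerically. The sketch below evaluates the likelihood on a grid of θ values and picks the largest; the grid search and the names `coin_likelihood` and `theta_hat` are illustrative choices, not the method used in the text, which solves for the critical point analytically.

```python
from math import comb


def coin_likelihood(theta, heads=32, tails=38):
    """Binomial likelihood of observing `heads` heads in heads + tails flips."""
    flips = heads + tails
    return comb(flips, heads) * theta**heads * (1 - theta) ** tails


# Grid of 7001 points on [0, 1]; a step of 1/7000 puts 32/70 exactly on the grid.
grid = [k / 7000 for k in range(7001)]
theta_hat = max(grid, key=coin_likelihood)

print(f"grid maximizer: {theta_hat:.6f}, analytic value 32/70: {32 / 70:.6f}")
```

The grid maximizer agrees with the analytic answer, and the likelihood vanishes at both endpoints, matching the discussion of the critical points θ = 0 and θ = 1.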
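With this explanation of maximum likelihood estimation, we can now try to find N̂_MLE for our model of
the lottery. In order to do so, we must maximize the function L(N | n_1, ..., n_k), where k is the number
of prize categories, such that:
\[
L(N \mid n_1, \dots, n_k) = \Pr(n_1, \dots, n_k \mid N)
  = \binom{N}{n_1, \dots, n_k,\; N - \sum_{i=1}^{k} n_i}
    \, p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}
    \left( 1 - \sum_{i=1}^{k} p_i \right)^{N - \sum_{i=1}^{k} n_i}.
\]
Note that we have a multinomial coefficient because multiple people win the same prizes, and we must choose
who wins. Now, let us represent Pr(n_1, ..., n_k | N) as P_N. Then the function L(N | n_1, ..., n_k) is maximized
when we have found an N such that P_{N+1} / P_N ≈ 1. Then plugging this into Mathematica gives us:
\[
1 \approx \frac{P_{N+1}}{P_N}
  = \frac{\binom{N+1}{n_1, \dots, n_k,\; (N+1) - \sum_{i=1}^{k} n_i}
          \, p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}
          \left( 1 - \sum_{i=1}^{k} p_i \right)^{(N+1) - \sum_{i=1}^{k} n_i}}
         {\binom{N}{n_1, \dots, n_k,\; N - \sum_{i=1}^{k} n_i}
          \, p_1^{n_1} p_2^{n_2} \cdots p_k^{n_k}
          \left( 1 - \sum_{i=1}^{k} p_i \right)^{N - \sum_{i=1}^{k} n_i}}
\]
\[
\hat{N}_{MLE} \approx \frac{\sum_{i=1}^{k} n_i - \sum_{i=1}^{k} p_i}{\sum_{i=1}^{k} p_i}
  = \frac{\sum_{i=1}^{k} n_i}{\sum_{i=1}^{k} p_i} - 1.
\]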
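The simplification attributed to Mathematica can also be checked by hand. The sketch below writes S_n and S_p for the sums of the n_i and of the p_i; this shorthand is introduced here only for brevity and does not appear in the original derivation.

```latex
% Shorthand (assumption, for brevity): S_n = \sum_{i=1}^{k} n_i, \quad S_p = \sum_{i=1}^{k} p_i.
\begin{align*}
\frac{P_{N+1}}{P_N}
  &= \frac{(N+1)!}{(N+1-S_n)!} \cdot \frac{(N-S_n)!}{N!} \cdot (1 - S_p)
   && \text{(the factors } n_i! \text{ and } p_i^{n_i} \text{ cancel)} \\
  &= \frac{(N+1)(1 - S_p)}{N + 1 - S_n}. \\
\intertext{Setting this ratio equal to 1 and solving for $N$:}
(N+1)(1 - S_p) &= N + 1 - S_n \\
(N+1)\, S_p &= S_n \\
N &= \frac{S_n}{S_p} - 1.
\end{align*}
```

The ratio exceeds 1 for smaller N and falls below 1 for larger N, so the likelihood peaks where the ratio crosses 1.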
So for any given drawing, we can use the data produced by the California Lottery to plug n_i and p_i in
to obtain our value for N̂_MLE. Basically, rather than adding the fractions n_i / p_i term by
term, we simply add the numerators n_i, and divide that sum by the sum of the denominators p_i. For
Table 4.1, this is given by:
\[
\begin{aligned}
\hat{N}_{MLE}
  &\approx \frac{0 + 3 + 8 + 226 + 284 + 12{,}464 + 4{,}945 + 29{,}321 + 58{,}662}
                {\frac{1}{174{,}711{,}536} + \frac{1}{3{,}904{,}701} + \frac{1}{689{,}065} + \frac{1}{15{,}313}
                 + \frac{1}{13{,}781} + \frac{1}{306} + \frac{1}{844} + \frac{1}{141} + \frac{1}{75}} - 1 \\
  &\approx 4{,}233{,}484.3317 \\
  &\approx 4{,}233{,}484.
\end{aligned}
\]
Then for the MEGA Millions drawing on 10/25/2011, there were approximately 4,233,484 tickets purchased
in California. This is how we will compile data on population per drawing.
Note that the number obtained by maximum likelihood estimations is markedly lower than the average
of dividing the number of people who won each prize by the probability of winning this prize. If we were to
go with this method of averaging, we would have:
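\[
\begin{aligned}
N_{avg}
  &\approx \frac{1}{9} \left[ \frac{0}{\frac{1}{174{,}711{,}536}} + \frac{3}{\frac{1}{3{,}904{,}701}}
     + \frac{8}{\frac{1}{689{,}065}} + \frac{226}{\frac{1}{15{,}313}} + \frac{284}{\frac{1}{13{,}781}}
     + \frac{12{,}465}{\frac{1}{306}} + \frac{4{,}945}{\frac{1}{844}} + \frac{29{,}321}{\frac{1}{141}}
     + \frac{58{,}662}{\frac{1}{75}} \right] \\
  &\approx 4{,}569{,}216.2 \\
  &\approx 4{,}569{,}216.
\end{aligned}
\]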
But this difference would be even more pronounced if somebody had won the jackpot. To check
this statement, we keep all other numbers constant, but change n_1 from n_1 = 0 to n_1 = 1.
With this small change, we find that N_avg ≈ 23,981,609 tickets, whereas N̂_MLE ≈ 4,233,524
tickets. So for this small change in observed data, N_avg increases wildly, while N̂_MLE goes up by a
relatively minuscule amount. So while it is possible that 23,981,609 tickets were purchased and the average
is accurate, it is far more likely that 4,233,524 tickets were purchased. Since maximum likelihood estimators
typically have low variance, we rely on this method to obtain estimates for N.
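The comparison between the two estimators can be reproduced with a short script. This is a sketch under stated assumptions: it uses the Table 4.1 values as printed (12,465 three-match winners and jackpot odds of 1 in 175,711,536), and the function names are illustrative. The figures quoted in the text (4,233,484 and, for n_1 = 1, 23,981,609) correspond to entering 12,464 for the three-match count in the MLE and 174,711,536 for the jackpot odds, so the output here differs slightly: the MLE comes out near 4,233,524 and the n_1 = 1 average near 24.09 million. The qualitative comparison is the same.

```python
# (winners n_i, odds "1 in k_i" so p_i = 1 / k_i), transcribed from Table 4.1
ROWS = [
    (0, 175_711_536), (3, 3_904_701), (8, 689_065), (226, 15_313),
    (284, 13_781), (12_465, 306), (4_945, 844), (29_321, 141), (58_662, 75),
]


def n_mle(rows):
    """Pooled estimate: (sum of n_i) / (sum of p_i) - 1."""
    total_winners = sum(winners for winners, _ in rows)
    total_probability = sum(1 / odds for _, odds in rows)
    return total_winners / total_probability - 1


def n_avg(rows):
    """Average of the per-category estimates n_i / p_i."""
    return sum(winners * odds for winners, odds in rows) / len(rows)


# Same data, but with a single hypothetical jackpot winner (n_1 = 1).
with_jackpot_winner = [(1, ROWS[0][1])] + ROWS[1:]

for label, rows in (("n_1 = 0", ROWS), ("n_1 = 1", with_jackpot_winner)):
    print(f"{label}: N_MLE ~ {n_mle(rows):,.0f}   N_avg ~ {n_avg(rows):,.0f}")
```

One extra jackpot winner moves the pooled estimate by about 40 tickets, while the average of the per-category estimates jumps by a factor of more than five.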
4.3 Obtaining the Other Necessary Data
Now that we have estimates for N, we can use these to estimate the rates r_i that are paid out for each
pari-mutuel prize. Recall that all prizes in the MEGA Millions lottery are pari-mutuel: due to state law,
there are no fixed prizes within California. Furthermore, because California is the only such state, we should
only consider California's data on drawing participation N and rates of compensation r_i. We will ignore
national participation data and compensation rates. Because there are no fixed prizes (c = 0, so the sum
below is empty), we must also modify Abrams and Garibaldi's model by observing that:
\begin{align}
f &= 1 - \frac{\sum_{i=1}^{c} a_i t_i^{\text{fix}}}{t} \tag{4.1} \\
  &= 1. \tag{4.2}
\end{align}
Now, we should mention that since each ticket in the MEGA Millions lottery costs $1, the number of tickets
purchased N also represents the total amount of money spent by individuals per drawing. So we have
something of the form r_i N = n_i v_i, where v_i is the value of the i-th pari-mutuel prize, not including the
jackpot. Then we will approximate these rates by saying
\[
r_i \approx \frac{v_i n_i}{\hat{N}_{MLE}}.
\]
With this insight, for MEGA Millions Draw #662, we have the data in Table 4.2. Then for this drawing,
the lottery operator paid out a total of Σ_{i=1}^{8} r_i ≈ 0.16 (approximately 16%) of the income N to pari-mutuel
prizes excluding the jackpot.
Table 4.2:
Prize category   v_i n_i                r_i ≈
5                $96,699.40 × 3         0.0685246950
4 + Mega         $5,000.10 × 8          0.0094486716
4                $168 × 226             0.0089684997
3 + Mega         $152 × 284             0.0101968024
3                $7 × 12,465            0.0206106838
2 + Mega         $9 × 4,945             0.0105126180
1 + Mega         $3 × 29,321            0.0207779219
Mega             $1 × 58,662            0.0138566721
Source: [4], MEGA Millions Draw #662, 10/25/2011. Estimated with N̂_MLE ≈ $4,233,484. Prize
values are after federal taxes.
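The rates in Table 4.2 follow directly from r_i ≈ v_i n_i / N̂_MLE. A minimal sketch, taking the text's estimate of 4,233,484 tickets as given; the names `PRIZES`, `rates`, and `total_rate` are illustrative.

```python
N_MLE = 4_233_484  # ticket estimate quoted in the text for Draw #662

# (prize category, after-tax prize value v_i in dollars, winners n_i); jackpot excluded
PRIZES = [
    ("5", 96_699.40, 3),
    ("4 + Mega", 5_000.10, 8),
    ("4", 168.00, 226),
    ("3 + Mega", 152.00, 284),
    ("3", 7.00, 12_465),
    ("2 + Mega", 9.00, 4_945),
    ("1 + Mega", 3.00, 29_321),
    ("Mega", 1.00, 58_662),
]

# r_i ~ v_i * n_i / N_MLE: the share of ticket revenue paid out to prize i
rates = {name: value * winners / N_MLE for name, value, winners in PRIZES}
total_rate = sum(rates.values())

for name, rate in rates.items():
    print(f"{name:>9}: r_i ~ {rate:.10f}")
print(f"    total: {total_rate:.10f}")
```

The total of roughly 0.163 is the share of revenue returned through the non-jackpot prizes, the figure rounded to 16% in the text.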
Now we must find the jackpot cutoff. Recall that J_0 = Ft. Also remember that the MEGA Millions lottery
is set up so that a player chooses five numbers, each between 1 and 56, and one MEGA number, which
can be between 1 and 46. We can choose five numbers between 1 and 56 in C(56, 5) = 3,819,816 different ways.
Then for each of those 3.8 million choices, we choose one number between 1 and 46, for a total of C(46, 1) = 46
different ways. Now for the total number of distinct tickets, we multiply these together. Consequently, there
are 3,819,816 × 46 = 175,711,536 possible distinct tickets. We can substitute this and Equation 4.1 into
the formula for the jackpot cutoff to get:
\[
\begin{aligned}
J_0 &= Ft \\
    &= \left( f - \sum_{i=1}^{8} r_i \right) \times 175{,}711{,}536 \\
    &\approx (1 - 0.1628965645) \times 175{,}711{,}536 \\
    &\approx 147{,}088{,}730.
\end{aligned}
\]
Then for the MEGA Millions lottery, the jackpot cutoff is approximately $147,088,730. This assumes that
our estimates for the rates r_i are both correct and constant. However, we have attempted to minimize error
by obtaining our N estimate through maximum likelihood estimation. And as for the constancy of the rates,
the lottery should keep them the same because any alterations would be a fundamental restructuring of the
game.
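The ticket count and the jackpot cutoff can be recomputed in a few lines. This sketch takes f = 1 and the summed rate 0.1628965645 from the text as inputs; the variable names are illustrative.

```python
from math import comb

# Five base numbers from 1..56 and one MEGA number from 1..46
distinct_tickets = comb(56, 5) * comb(46, 1)

f = 1.0                      # no fixed prizes in California, so f = 1
total_rate = 0.1628965645    # sum of the eight pari-mutuel rates r_i (Table 4.2)

# J_0 = F * t with F = f - sum(r_i)
jackpot_cutoff = (f - total_rate) * distinct_tickets

print(f"distinct tickets t = {distinct_tickets:,}")
print(f"jackpot cutoff J_0 ~ ${jackpot_cutoff:,.0f}")
```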
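Now for Draw #662, the (eRoR) given by Equation 3.1 is equivalent to:
\[
\begin{aligned}
(\text{eRoR})
  &= -f + \sum_{i=1}^{d} r_i \left[ 1 - (1 - p_i)^{xyJ_0} \right]
     + \frac{1 - \left( 1 - \frac{1}{t} \right)^{N}}{x} \\
  &\approx -1 + \sum_{i=1}^{d} r_i \left[ 1 - (1 - p_i)^{N} \right]
     + \frac{J}{N} \left[ 1 - 0.999999994^{4{,}233{,}484} \right] \\
  &\approx -1 + (0.139703179) + \frac{39{,}900{,}000}{4{,}233{,}484} (0.0250810156) \\
  &\approx -1 + (0.139703179) + (0.236385096) \\
  &= -0.623911725 \\
  &\approx -0.62.
\end{aligned}
\]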
Then with 4,233,484 purchased tickets, we have an expected rate of return of about -62%. So for every dollar
a person spent on this drawing, they received an average return of 38 cents, and lost a net total of 62 cents.
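The calculation for Draw #662 can be repeated numerically. The sketch below is an illustration under stated assumptions, not the original computation: it takes N, J, t, f, and the Table 4.2 rates from the text, and evaluates each power in floating point with `log1p` and `expm1` rather than rounding 1 − 1/t to nine decimal places first. With that rounding avoided, the jackpot term comes out near 0.224 rather than 0.236, so the result is roughly −0.64 instead of −0.62; the conclusion that this drawing was a poor bet is unchanged.

```python
from math import expm1, log1p

N = 4_233_484      # estimated tickets sold in California (from the text)
J = 39_900_000     # after-tax jackpot for Draw #662
T = 175_711_536    # number of distinct tickets t
F_COST = 1.0       # f = 1 because there are no fixed prizes

# (rate r_i from Table 4.2, odds "1 in k_i" from Table 4.1); jackpot excluded
PARI_MUTUEL = [
    (0.0685246950, 3_904_701), (0.0094486716, 689_065),
    (0.0089684997, 15_313), (0.0101968024, 13_781),
    (0.0206106838, 306), (0.0105126180, 844),
    (0.0207779219, 141), (0.0138566721, 75),
]


def hit_probability(odds, tickets):
    """1 - (1 - 1/odds)**tickets, computed stably when 1/odds is tiny."""
    return -expm1(tickets * log1p(-1.0 / odds))


pari_term = sum(rate * hit_probability(odds, N) for rate, odds in PARI_MUTUEL)
jackpot_term = (J / N) * hit_probability(T, N)
eror = -F_COST + pari_term + jackpot_term

print(f"pari-mutuel term ~ {pari_term:.6f}")
print(f"jackpot term     ~ {jackpot_term:.6f}")
print(f"(eRoR)           ~ {eror:.4f}")
```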
Table 4.3 compiles every drawing since 2009 in which the jackpot value was above the jackpot cutoff of
$147,088,730 after 30% federal taxes. Take note that we only deal with the final jackpot values: since
the jackpot increases for every drawing in which nobody wins the jackpot, for each date below,
the jackpot was higher than J_0 for several drawings prior to the one listed. This shows there have been
numerous times where the (eRoR) has been positive.
Table 4.3:
Date         J                  N̂_MLE        (eRoR)   J/J_0   N/J
03/25/2011   $218,400,000       26,263,932   +32%     1.48    0.12
01/04/2011   $248,500,000*      35,706,523   +45%     1.69    0.14
05/04/2010   $186,200,000       24,072,463   +16%     1.27    0.13
08/28/2009   $235,200,000*      22,609,912   +42%     1.60    0.10
05/01/2009   $157,500,000**      9,790,701   +3%      1.07    0.06
03/03/2009   $148,400,000       11,742,182   -2%      1.01    0.08
Sources: [5, 4]. Jackpot values are after federal taxes. The three columns on the right are estimated
based on N̂_MLE.
* Split between two winners.
** Split between three winners.
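The two ratio columns of Table 4.3 are simple quotients, y = J/J_0 and x = N/J, and can be recomputed as a consistency check. A minimal sketch, assuming the cutoff J_0 ≈ $147,088,730 derived earlier; `DRAWINGS` and `ratios` are illustrative names.

```python
J0 = 147_088_730  # jackpot cutoff derived earlier in this section

# (date, after-tax jackpot J, estimated tickets N_MLE) from Table 4.3
DRAWINGS = [
    ("03/25/2011", 218_400_000, 26_263_932),
    ("01/04/2011", 248_500_000, 35_706_523),
    ("05/04/2010", 186_200_000, 24_072_463),
    ("08/28/2009", 235_200_000, 22_609_912),
    ("05/01/2009", 157_500_000, 9_790_701),
    ("03/03/2009", 148_400_000, 11_742_182),
]

# For each drawing: y = J / J_0 and x = N / J, rounded as in the table
ratios = {
    date: (round(jackpot / J0, 2), round(tickets / jackpot, 2))
    for date, jackpot, tickets in DRAWINGS
}

for date, (y, x) in ratios.items():
    print(f"{date}: J/J_0 = {y:.2f}, N/J = {x:.2f}")
```

These (x, y) pairs are the coordinates at which each drawing is plotted against the break-even curve.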
At this point, we can also produce our break-even curve. This is given by the graph in Figure 4.1. Note
that the yellow graph of the (eRoR) falls between our upper and lower bounds, which are blue and red
(respectively). Thus, our predictions are consistent with actual results. Furthermore, if we increased the
window, we would see that our functions never appear to touch. Since the yellow curve shows the break-even
curve (or equivalently, it shows where (eRoR) = 0 for the MEGA Millions lottery), by Theorem 3.7, any
point above the curve will give a positive (eRoR), and any point below or to the left will give a negative
(eRoR). The area between 0 ≤ x ≤ 0.3 where 1 ≤ y ≤ 1.1 is ambiguous, and must be handled on a case-by-
case basis. Consistent with our predictions, each point in Table 4.3 that lies above the yellow curve gives a
positive (eRoR), while the one point below, (0.08, 1.01), gives a negative (eRoR).
Figure 4.1: The graph of U is given by the blue curve, the graph of (eRoR) is given by the yellow curve, and the
graph of L is given by the red curve. Abrams and Garibaldi proposed a different graph for the (eRoR),
given by the green curve. The blue dots are the plotted points from Table 4.3. Note also that x = N/J
and y = J/J_0.
As it turns out, since the California lottery has its own setup for pari-mutuel prizes (and thus has its
own number of participants N), we have a slightly different curve than the one proposed in Abrams and
Garibaldi's paper. Our yellow break-even curve starts off above theirs, but then quickly converges to it. Because the
graph of our (eRoR) break-even curve is initially higher than Abrams and Garibaldi's, California should
meet the conditions for positive (eRoR) less frequently. However, California seems to exhibit an advantage
because the number of participants within the state is markedly lower than the total number of national
participants. It seems that California's lack of fixed prizes tends to be an advantage for the player in the
end. As noted in their paper, since the point (1, 2) is below our curve, for any drawing with N > J and
J < 2J_0, we have a negative (eRoR). Nationally speaking, N will very frequently exceed J. But locally
speaking, this will happen relatively rarely because California players are a small subset of the national set,
and MEGA Millions jackpots start at $12,000,000, so players in our state have an advantage.
5 Conclusion
While we have isolated numerous scenarios in which the lottery has a positive rate of return, even for
California, these tend to be few and far between. And when those circumstances do arise, there are
numerous other worthy investments that can allow for equal or similar returns on investment with much less
risk. Essentially, large jackpots skew the expected rate of return and can make an investment in the lottery
seem desirable even when, in all likelihood, any particular person will not win the lottery. Overall, less risky
assets constitute better investments. Abrams and Garibaldi consider this on a deeper level through
Portfolio Theory analysis, but suffice it to say that while the lottery can be a good bet, it is rarely a good
investment.
References
[1] Aaron Abrams and Skip Garibaldi. Finding good bets in the lottery, and why you shouldn't take them.
American Mathematical Monthly, 117:3–26, January 2010.
[2] Unknown. Amazing stats: The stats don't lie! Technical report, California Lottery, 2011.
http://www.calottery.com/WinnersGallery/AmazingStats/. Accessed on 10/10/2011.
[3] Unknown. How to play Lotto Texas. Technical report, Texas Lottery, 2011.
[4] Unknown. Past winning numbers. Technical report, California Lottery, 2011.
http://www.calottery.com/games/megamillions/winningnumbers/pastwinningnumbers.htm. Accessed
on 11/2/2011.
[5] Unknown. Previous results. Technical report, USAMEGA.com, 2011.
http://www.usamega.com/mega-millions-history.asp. Accessed on 11/3/2011.