In a new kind of prediction network, self-navigating Python, R and Julia algorithms conspire to produce electricity forecasts superior to the official ones, and then automatically review the model residuals. They also find their way to any published time series, thereby providing essentially free prediction to anyone who needs it. I will discuss the potential for collective real-time prediction and demonstrate a prototypical host at Microprediction.Org. Parts of contest theory, and a lottery paradox, are highly relevant to algorithms submitting distributional predictions.
1. The Lottery Paradox
A New Use
MIT Computer Science & Artificial Intelligence Lab
Dec 1, 2020
Peter Cotton
Chief Data Scientist
Intech Investments
2. Hello. I work for Intech — a leading equity quant manager
3. I am asymptotically the world’s most productive data scientist
(Slide annotations: returns measured water height … somewhere … from NOAA; creates a data stream)
… or so I tell my boss
At the conclusion of a “ten minute data science project”, a data stream is predicted
by dozens of competing time series algorithms, written by different authors using
different tools, in different languages, with access to different exogenous data.
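For concreteness, a rough sketch of the ten-minute project using the microprediction Python client. The write key, the stream name and the measure_water_level() helper are placeholders I have made up; only MicroWriter.set() is intended to mirror the real client, so check the documentation rather than trusting this verbatim.

from microprediction import MicroWriter

WRITE_KEY = "replace-with-your-write-key"          # obtained separately
mw = MicroWriter(write_key=WRITE_KEY)

def measure_water_level() -> float:
    """Hypothetical helper: poll a NOAA gauge and return the latest reading."""
    return 1.85                                    # placeholder value in metres

# Publishing a value creates (or updates) the stream, and crawling
# time series algorithms then compete to predict it.
mw.set(name="noaa_water_level.json", value=measure_water_level())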
4. Outline
1. On the lottery paradox:
a. Positive returns
b. Continuous lotteries
c. Indifference to the market distribution
d. Relationship between returns and distance
2. Putting it to work:
a. Real-time distributional prediction
b. Stacking lottery games
c. Implied quantiles and copulas
d. Categories of business applications
3. An existence “proof” for a prediction network (that doesn’t exist)
a. The demise of artisan “data science”
b. Why algorithms will manage the production of prediction
6. Lottery paradox #1
Assume a 10% rake. Each buyer chooses a number from 1 … 10,000. Most enter randomly.
Mary buys every possible ticket once.
Result: a 16% return!
7. Lottery paradox resolution - simpler example
Mary benefits from Alice and Bob stepping on each other's toes
Only two outcomes (heads and tails)
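A minimal Monte Carlo sketch of the mechanism (the crowd size, crowd bias and rake are illustrative assumptions, not numbers from the talk): Mary buys one heads and one tails ticket, the crowd's bias is varied, and a fair coin decides the winner.

import numpy as np

rng = np.random.default_rng(0)

def marys_return(n_crowd=1000, p_heads=0.5, rake=0.10, n_trials=100_000):
    """Mary's expected return when the crowd picks heads with probability p_heads."""
    crowd_heads = rng.binomial(n_crowd, p_heads, size=n_trials)    # crowd tickets on heads
    crowd_tails = n_crowd - crowd_heads                            # crowd tickets on tails
    pool = (n_crowd + 2) * (1 - rake)                              # prize pool after the rake
    heads_wins = rng.random(n_trials) < 0.5                        # fair coin decides the draw
    sharers = np.where(heads_wins, crowd_heads, crowd_tails) + 1   # Mary always holds one winning ticket
    return (pool / sharers).mean() / 2 - 1                         # return on her two-ticket stake

for p in (0.5, 0.7, 0.9):
    print(f"crowd bias {p:.1f}: Mary's return ≈ {marys_return(p_heads=p):+.1%}")

With an unbiased crowd Mary simply loses the rake; the more the crowd clusters, the higher her return.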
8. Lottery paradox #2
Let W denote the average number of people sharing the prize.
Alice is a random ticket buyer.
Alice shares with approximately W other people
9. Lottery paradox #2 - resolution
In the case of two tickets (heads and tails), Alice shares with W − ½ others
(We can count)
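Counting by simulation (the crowd size and the uniform random picks are illustrative assumptions; Alice is player 0):

import numpy as np

rng = np.random.default_rng(1)
n_players, n_draws = 100, 200_000

picks = rng.integers(0, 2, size=(n_draws, n_players))      # 0 = heads, 1 = tails
outcome = rng.integers(0, 2, size=n_draws)                  # fair coin decides the winner
winners = picks == outcome[:, None]                         # who shares the prize each draw

W = winners.sum(axis=1).mean()                              # draw-average number of sharers
alice_wins = winners[:, 0]                                  # draws in which Alice wins
co_sharers = winners[alice_wins, 1:].sum(axis=1).mean()     # her co-sharers, given she wins

print(f"W ≈ {W:.2f},  Alice shares with ≈ {co_sharers:.2f} others (i.e. W - 0.5)")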
10. Lottery paradox #2 - resolution
Alice's average is a population average, not an average over outcomes.
When Alice, Bob and Joe share the prize, that outcome counts three times.
This allows the population average to exceed the per-outcome average by almost 1.
11. Lottery paradox #2 - better resolution
Alice wins with ticket 137 → Mean # of people choosing 137 goes up by almost +1.
(“Approximate Bayes”)
(cf. Mary winning conveys no information at all, since she always wins.)
12. Lottery paradox #2 - even better resolution?
Consider Mary's last ticket … lucky by +1
But all her tickets are the same
13. Lottery paradox #3: Indifference
Suppose:
● No rake
● Mary’s investment is small
● Mary optimizes long run wealth
● Mary can see everyone else’s ticket choices
⇒ Mary still buys one of each ticket
⇒ Mary doesn’t care what anyone else does !
16. Racetrack paradox - resolution #1
Maximize Mary's expected log growth Σ_i p_i log(w_i / q_i) over her ticket proportions w_i
Constraint: Σ_i w_i = 1
First order Lagrange condition: p_i / w_i = λ
Thus p_i / w_i must not depend on the horse index i, so w_i = p_i whatever the crowd's q_i.
17. Racetrack paradox - elementary resolution
Transfer a tiny investment from the first horse to the second.
At the optimum this cannot improve the expected log growth, so it follows that p_1 / w_1 = p_2 / w_2; repeating for any pair of horses, all the ratios p_i / w_i must be equal.
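A quick numerical check of the indifference claim, with toy probabilities and scipy used only as a generic optimiser (nothing below is from the talk): whatever the crowd proportions q, the log-optimal ticket proportions w come out equal to p.

import numpy as np
from scipy.optimize import minimize

p = np.array([0.5, 0.3, 0.2])                      # true outcome probabilities

for q in (np.array([1/3, 1/3, 1/3]), np.array([0.7, 0.2, 0.1])):
    def neg_growth(x):
        w = np.exp(x) / np.exp(x).sum()            # softmax keeps w a probability vector
        return -(p * np.log(w / q)).sum()          # minus Mary's expected log growth
    x_opt = minimize(neg_growth, np.zeros(3)).x
    w_opt = np.exp(x_opt) / np.exp(x_opt).sum()
    print(np.round(w_opt, 3))                      # ≈ [0.5, 0.3, 0.2] for both crowds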
18. Remark on Entropy and KL-Divergence
Entropy H(P) = −Σ_i p_i log p_i … up to sign, the term not involving q in Mary's return
Kullback–Leibler divergence (relative entropy): D(P‖Q) = Σ_i p_i log(p_i / q_i)
We can interpret the distance of Q from the truth P in terms of Mary's return from exploiting it.
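Written out (the standard decomposition for proportional betting with no rake, consistent with the slides above though not copied from them):

\[
\text{Mary's growth rate}
  \;=\; \sum_i p_i \log \frac{p_i}{q_i}
  \;=\; D_{\mathrm{KL}}(P \,\|\, Q)
  \;=\; \underbrace{\sum_i p_i \log p_i}_{\text{term not involving } q \;(=\,-H(P))}
  \;+\; \underbrace{\sum_i p_i \log \frac{1}{q_i}}_{\text{cross-entropy } H(P,Q)} .
\]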
19. Now for something completely different?
The mayor draws from a "Normalish" distribution.
Participants each write down a real number; all those close to the draw share the prize.
22. Mary’s Reward for Accuracy - Normalish
Market error → Mary's return:
● 10% → 20 pts (0.2)
● 1% → 20 bps (0.002)
● Use the fourth root transform to relate exponential returns to market error, measured as a percentage of standard deviation
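A toy simulation of the reward for accuracy in the continuous lottery. It does not reproduce the figures in the table above; the window width, entry counts and the half-standard-deviation bias of the second player are all assumptions for illustration. Entries within a small window of the mayor's draw share the prize, and the better-centred participant wins a larger share of the prizes on average.

import numpy as np

rng = np.random.default_rng(2)
h, n_entries, n_draws = 0.05, 200, 20_000              # window half-width, entries per player, draws
draws = rng.standard_normal(n_draws)                   # the mayor's "Normalish" draws

def winning_tickets(center):
    """How many of this player's entries fall within h of each draw."""
    entries = rng.normal(center, 1.0, size=n_entries)
    return (np.abs(entries[None, :] - draws[:, None]) < h).sum(axis=1)

accurate, biased = winning_tickets(0.0), winning_tickets(0.5)
total = accurate + biased
split = total > 0                                      # draws on which someone wins
print("accurate player's prize share ≈", (accurate[split] / total[split]).mean())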
27. Implied Percentiles
Every incoming data point implies a new data point …
z = F(x)
where F is the “community” distribution function
(Figure: cumulative distribution for NY Electricity Production (Wind), 1 hr ahead)
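A generic sketch of the transformation z = F(x). The tabulated CDF points below are made up; the real host supplies F from the community's distributional predictions.

import numpy as np

# hypothetical community CDF, tabulated as (value, cumulative probability) pairs
values   = np.array([100.0, 150.0, 200.0, 250.0, 300.0])
cum_prob = np.array([0.05, 0.30, 0.60, 0.85, 0.99])

def implied_percentile(x: float) -> float:
    """Piecewise-linear estimate of z = F(x) from the tabulated community CDF."""
    return float(np.interp(x, values, cum_prob))

print(implied_percentile(180.0))    # 0.48 for these made-up numbers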
28. Example: Reactions to the presidential debate
See https://www.microprediction.com/blog/tears_of_joy_standardizing_streaming_data
29. Stacking Lotteries
Market implied percentiles are themselves the subject of lottery games (via
normal quantile function)
The transformed values are approximately N(0,1); algorithms then predict small deviations from the standard normal.
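Continuing the made-up numbers from the previous sketch, the stacking step is just the normal quantile function applied to the implied percentile:

from scipy.stats import norm

z = 0.48                          # implied percentile from the previous sketch
standardized = norm.ppf(z)        # ≈ -0.05; roughly N(0,1) if the community CDF is well calibrated
print(standardized)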
30. Combine Percentiles
Some seemingly univariate series of games are actually copulas
Pitch and Yaw implied copulas - from the MIT SciML helicopula challenge
31. Optics Analogy
Keep “lensing” until you get N(0,1)
Composition of monotone functions, each contributed by one or more algorithms
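A sketch of the composition, with two made-up monotone maps F1 and F2 standing in for the community-supplied stages:

from scipy.stats import norm

def F1(x):                         # hypothetical community CDF for the raw value
    return norm.cdf(x, loc=50.0, scale=10.0)

def F2(z):                         # hypothetical correction learned on the residual game
    return norm.cdf(1.1 * norm.ppf(z))

def lens(x):
    """Compose monotone maps; ideally the output is N(0,1) across the stream."""
    return norm.ppf(F2(F1(x)))

print(lens(60.0))                  # 1.1 for these made-up stages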
32. Pathways in the Collective Probability Brain
Scenarios “thrown” up to top level lottery
(Diagram: maps Q(·), R(·), S(·), T(·), U(·), V(·), W(·) composed into pathways, labelled "Collaboration" and "Competition")
33. Law of Iterated Expectations
Pathways grow and shrink based on the economics
Point estimates are a special case - shift
Exogenous data is a special case - shift arbitrarily (!)
(Diagram labels: E[Y|X]; E[E[Y|X]|Z]; Y; E[Y|S]; E[E[Y|S]|Z]; E[Y|S,Z,R] = E[E[E[Y|S]|Z]|R])
Scenarios thrown “up” into top level lottery
Management fees charged down from parent to child
36. Use category #1: Auxiliary market predictions
Markets predict the mean of a stock well
Everything else (pretty much) is poorly predicted, for lack of the discipline imposed by competition:
● Volatilities
● Correlations
● Bid-offer spreads
● Liquidity
● Trading costs
● Holding periods
● Client flow
● Response to inquiry
● Cover price
37. Use category #2: Prioritizing human work
e.g. reference data cleaning
Probability that a record is changed?
Which records will be changed?
38. Use category #3: Enhancing live data feeds
Tagging.
Converting sporadic live data to continuous.
Discovering existing relationships
Predicting delayed data and partially filled data
Discovering good embeddings
Finding new exogenous data
Discovering good proxies for truth
39. Use category #4: Live feature discovery
Chumming the water
Predicting quantities correlated with the quantity you truly care about
Determining which feature generation algorithms are suited to the task at hand
40. Use category #5: Enhancing business intelligence applications
Predicting numbers on dashboards
Highlighting unusual movements
Predicting human reaction to information, or the lack of it (false positives)
Enabling humans to track a larger amount of data in real time
41. Use category #6: Fairness and explanation
Discovering data that reveals hidden bias
Historical example: proxies for race, redlining
42. Usage category #7: Surrogate models
Competing and combining surrogate models for agent-based epidemic modeling
https://www.microprediction.org/stream_dashboard.html?stream=pandemic_infected
43. 3. An Existence Proof
(for an automated Machine Learning network replacing artisan data science, in large part)
44. 1 - Motherhood statement
Quantitative business optimization will be a survival requirement for companies
(Machine Learning is set to transform all industries)
45. 2 - Slightly more controversial...
Quantitative business optimization using ML/AI = frequently repeated prediction
Control theory ~ RL ~ microprediction of value functions
46. 3 - Obvious to MIT folks
Strangers can do your ML for you
47. 4 - Orthodox economics (local knowledge)
At approximately zero friction, markets >> central planning by humans
48. 5 - The rest is busywork ...
Humans will not play a blocking role in the production of prediction
Machine Learning will be orchestrated by hierarchies of real-time generalized contests
50. Thanks to Key Contributors. Join us!
• Wrote the front end
• Winning crawlers
• Clients in Java, Julia, Rust
• ZK-MUID proofs
• Monotonic NNs
Interested? Join us Fridays at noon for informal contributor chat
https://www.microprediction.com/contact-us