The Lottery Paradox
A New Use
MIT Computer Science & Artificial Intelligence Lab
Dec 1, 2020
Peter Cotton
Chief Data Scientist
Intech Investments
Hello. I work for Intech, a leading equity quant manager.
I am asymptotically the world’s most productive data scientist
… or so I tell my boss.
(A function returns a measured water height … somewhere … from NOAA and creates a data stream.)
At the conclusion of a “ten-minute data science project”, a data stream is predicted by dozens of competing time series algorithms, written by different authors using different tools, in different languages, with access to different exogenous data.
Outline
1. On the lottery paradox:
a. Positive returns
b. Continuous lotteries
c. Indifference to the market distribution
d. Relationship between returns and distance
2. Putting it to work:
a. Real-time distributional prediction
b. Stacking lottery games
c. Implied quantiles and copulas
d. Categories of business applications
3. An existence “proof” for a prediction network (that doesn’t exist)
a. The demise of artisan “data science”
b. Why algorithms will manage the production of prediction
1. Lottery Paradoxes
Lottery paradox #1
Assume a 10% rake. Each buyer chooses a number from 1 … 10,000, and most enter randomly.
Mary buys every possible ticket once.
16% return!
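The 16% figure depends on how many random entrants there are, which the slide does not say. A minimal sketch of the calculation, assuming each of `n_others` players stakes 1 on a number chosen uniformly at random, the pool is every stake less the rake, and the prize is split equally among holders of the drawn number (the function names are hypothetical):

```python
import random


def mary_expected_return(n_tickets: int, n_others: int, rake: float = 0.10) -> float:
    """Expected net return when Mary buys one of every ticket.

    Illustrative assumptions: each of n_others players picks one ticket uniformly
    at random, every ticket costs 1, the pool less the rake goes to holders of
    the drawn number, and the prize is split equally among them.
    """
    p = 1.0 / n_tickets
    # K ~ Binomial(n_others, p) other players hold Mary's winning number, and
    # E[1 / (1 + K)] = (1 - (1 - p)^(n + 1)) / ((n + 1) p).
    expected_share = (1.0 - (1.0 - p) ** (n_others + 1)) / ((n_others + 1) * p)
    pool = (1.0 - rake) * (n_tickets + n_others)
    return pool * expected_share / n_tickets - 1.0


def simulate_return(n_tickets: int, n_others: int, rake: float = 0.10,
                    trials: int = 5000, seed: int = 0) -> float:
    """Monte Carlo check of the same quantity."""
    rng = random.Random(seed)
    p = 1.0 / n_tickets
    pool = (1.0 - rake) * (n_tickets + n_others)
    total = 0.0
    for _ in range(trials):
        # By symmetry only the number of others holding the drawn ticket matters.
        k = sum(rng.random() < p for _ in range(n_others))
        total += pool / (1 + k)
    return total / trials / n_tickets - 1.0


print(round(mary_expected_return(10_000, 20_000), 3))  # prints 0.167
```

With an assumed 20,000 random entrants against 10,000 numbers the sketch gives roughly 16.7%. The edge comes from random entrants colliding on the same numbers, while Mary never collides with herself.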
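Lottery paradox resolution - simpler example
Only two outcomes.
Mary benefits from Alice and Bob stepping on each other’s toes: when they pick the same outcome they must split the prize if it comes up, and when the other outcome comes up Mary collects the whole prize.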
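In the two-outcome case the counting can be done exactly. A sketch, assuming Alice and Bob each stake 1 on heads or tails with equal probability, Mary stakes 1 on each side, the winning side is equally likely to be either, and winners split the pool equally:

```python
from fractions import Fraction
from itertools import product


def mary_share_two_outcomes() -> Fraction:
    """Mary's expected fraction of the prize pool when she backs both outcomes."""
    cases = list(product(("H", "T"), repeat=3))  # (Alice's pick, Bob's pick, winner)
    total = Fraction(0)
    for alice, bob, winner in cases:
        # Mary always holds the winner; count everyone else who does too.
        sharers = 1 + (alice == winner) + (bob == winner)
        total += Fraction(1, sharers)
    return total / len(cases)


share = mary_share_two_outcomes()
print(share)  # prints 7/12
```

Mary puts up half the money (2 of 4 units) but expects 7/12 of the pool, so before any rake her gross return is (7/12)/(1/2) = 7/6, about 16.7%, or 5% after a 10% rake.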
Lottery paradox #2
Let W denote the average number of people sharing the prize.
Alice is a random ticket buyer.
Alice shares with approximately W other people (not W - 1), so her group is about one person larger than the average.
Lottery paradox #2 - resolution
In the case of two tickets (heads and tails), Alice shares with W - ½ others.
(We can count.)
Lottery paradox #2 - resolution
Alice’s average is a population average, not an outcome average.
When Alice, Bob and Joe share the prize, it counts three times.
This allows the population average to exceed the average over tickets by almost 1.
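The gap between the two averages can be counted directly. A minimal sketch, assuming `n_players` each pick one of `n_tickets` numbers uniformly at random, so the number S of people holding the drawn ticket is Binomial: the outcome average is W = E[S], while the average experienced by a winner weights each group by its size, E[S²]/E[S]. The function name is hypothetical.

```python
def group_size_averages(n_players: int, n_tickets: int) -> tuple:
    """Return (W, population average) for the number of people holding the winning ticket.

    W = E[S] averages over outcomes; the population average E[S^2] / E[S]
    counts a group of size S once for each of its S members.
    Requires n_players >= 1 and n_tickets >= 2.
    """
    p = 1.0 / n_tickets
    pmf = (1.0 - p) ** n_players  # P(S = 0)
    e_s = e_s2 = 0.0
    for s in range(n_players + 1):
        e_s += s * pmf
        e_s2 += s * s * pmf
        # Binomial recurrence: P(S = s + 1) = P(S = s) * (n - s) / (s + 1) * p / (1 - p)
        pmf *= (n_players - s) / (s + 1) * p / (1.0 - p)
    return e_s, e_s2 / e_s


w, pop = group_size_averages(10, 2)
print(w, pop)  # Alice's group averages W + 1/2, so she shares with W - 1/2 others
```

For two tickets the gap is exactly ½; for 10,000 players and 10,000 tickets it is 1 - 1/10,000, the "almost 1" on the slide.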
Lottery paradox #2 - better resolution
Alice wins with ticket 137 → the mean number of people choosing 137 goes up by almost 1.
(“Approximate Bayes”)
Cf. Mary winning conveys no information at all.
Lottery paradox #2 - even better resolution?
Consider Mary’s last ticket … lucky by +1.
But all her tickets are the same.
Lottery paradox #3: Indifference
Suppose:
● No rake
● Mary’s investment is small
● Mary optimizes long-run wealth
● Mary can see everyone else’s ticket choices
⇒ Mary still buys one of each ticket
⇒ Mary doesn’t care what anyone else does!
Racetrack Paradox
Mary doesn’t look at the odds!
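Racetrack paradox - resolution #1
Maximize Mary’s expected log return $\sum_i p_i \log(w_i / q_i)$, where $p_i$ is the true probability that horse $i$ wins, $w_i$ is the fraction of Mary’s stake on horse $i$, and $q_i$ is the market’s share of the pool on horse $i$
Constraint: $\sum_i w_i = 1$
First-order Lagrange condition: $p_i / w_i = \lambda$
Thus $w_i / p_i$ must not depend on the horse index $i$, so $w_i = p_i$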
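Racetrack paradox - elementary resolution
Transfer a tiny investment $\varepsilon$ from the first horse to the second. At the optimum the expected log return cannot change to first order: $-\varepsilon p_1 / w_1 + \varepsilon p_2 / w_2 = 0$
It follows that $p_1 / w_1 = p_2 / w_2$, so the ratios must be equal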
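A quick numerical check of the racetrack argument, assuming no rake and a stake too small to move the market, so that a unit bet on horse i returns 1/q_i if it wins. The true probabilities and the two market distributions below are made-up illustrative numbers, and the helper names are hypothetical:

```python
import math
import random


def log_growth(w, p, q):
    """Expected log return sum_i p_i log(w_i / q_i) of allocation w against market q."""
    return sum(pi * math.log(wi / qi) for wi, pi, qi in zip(w, p, q))


def random_allocation(n, rng):
    """A random point on the simplex (positive weights summing to one)."""
    x = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(x)
    return [xi / total for xi in x]


p = [0.5, 0.3, 0.2]                            # illustrative true probabilities
markets = [[0.4, 0.4, 0.2], [0.2, 0.3, 0.5]]  # two illustrative market distributions
rng = random.Random(1)

for q in markets:
    best = log_growth(p, p, q)
    # No random allocation should beat w = p, whatever the market looks like.
    challenger = max(log_growth(random_allocation(3, rng), p, q) for _ in range(10_000))
    print(f"market={q}: w=p gives {best:.4f}, best random allocation {challenger:.4f}")
```

Changing q shifts every allocation's score by the same constant, $-\sum_i p_i \log q_i$, which is why the optimal allocation never needs to look at the odds.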
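Remark on Entropy and KL-Divergence
Entropy $H(P) = -\sum_i p_i \log p_i$ … (with a minus sign) also the term not involving $q$ in Mary’s return
Kullback and Leibler: $D_{KL}(P \| Q) = \sum_i p_i \log(p_i / q_i) = H(P, Q) - H(P)$, where the cross-entropy is $H(P, Q) = -\sum_i p_i \log q_i$
We can interpret the distance of Q from the truth P in terms of Mary’s return from exploiting it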
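When Mary bets the truth P against a market Q, her expected log return per race is the KL divergence of Q from P, which splits into cross-entropy minus entropy. A small sketch that checks the identity and then simulates Mary's average log return; the distributions are made-up, natural logs are assumed, and the helper names are hypothetical:

```python
import math
import random


def entropy(p):
    """H(P) = -sum p_i log p_i."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """H(P, Q) = -sum p_i log q_i."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q):
    """D_KL(P || Q) = sum p_i log(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mary_average_log_return(p, q, races=100_000, seed=0):
    """Simulate Mary betting w = p in races won according to p and priced by q."""
    rng = random.Random(seed)
    winners = rng.choices(range(len(p)), weights=p, k=races)
    # Betting w_i = p_i at payout 1/q_i multiplies wealth by p_i / q_i when horse i wins.
    return sum(math.log(p[i] / q[i]) for i in winners) / races


p = [0.5, 0.3, 0.2]  # illustrative truth P
q = [0.4, 0.4, 0.2]  # illustrative market Q
print(kl_divergence(p, q), cross_entropy(p, q) - entropy(p), mary_average_log_return(p, q))
```

The first two numbers agree exactly; the simulated return agrees up to sampling noise.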
Now for something completely different?
The mayor draws a number from a “Normalish” distribution.
Participants each write down a real number. All those close to the draw share the prize.
Same Game?
{1, 2, ..., 10000}
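One way to see why the continuous game behaves like the numbered lottery is to cut the real line into narrow bins and treat each bin as a ticket. The sketch below assumes the mayor's draw is exactly N(0, 1) and the market's numbers follow N(μ, σ²). It checks that the Kelly growth rate over the binned "tickets" (a discrete KL divergence) approaches the continuous KL divergence between the two normals as the bins shrink. The range ±6, the bin widths and the function names are arbitrary illustrative choices.

```python
import math


def normal_bin_mass(a, b, mu, sigma):
    """P(a < X <= b) for X ~ N(mu, sigma^2), using erfc in the tails to avoid cancellation."""
    za = (a - mu) / (sigma * math.sqrt(2))
    zb = (b - mu) / (sigma * math.sqrt(2))
    if za >= 0:
        return 0.5 * (math.erfc(za) - math.erfc(zb))
    if zb <= 0:
        return 0.5 * (math.erfc(-zb) - math.erfc(-za))
    return 0.5 * (math.erf(zb) - math.erf(za))


def binned_kl(mu, sigma, width=0.01, lo=-6.0, hi=6.0):
    """Discrete KL(P || Q) over bins, with truth P = N(0, 1) and market Q = N(mu, sigma^2)."""
    n_bins = round((hi - lo) / width)
    total = 0.0
    for i in range(n_bins):
        a, b = lo + i * width, lo + (i + 1) * width
        p = normal_bin_mass(a, b, 0.0, 1.0)
        q = normal_bin_mass(a, b, mu, sigma)
        if p > 0:
            total += p * math.log(p / q)
    return total


def normal_kl(mu, sigma):
    """Closed-form KL(N(0, 1) || N(mu, sigma^2))."""
    return math.log(sigma) + (1.0 + mu ** 2) / (2.0 * sigma ** 2) - 0.5


for width in (1.0, 0.1, 0.01):
    print(width, binned_kl(0.3, 1.2, width), normal_kl(0.3, 1.2))
```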
Mary’s Reward for Accuracy - Exponential
(Figure: the market’s distribution compared with Mary’s.)
Mary’s Reward for Accuracy - Normalish
Market error    Mary’s return
10%             20 pts (0.2)
1%              20 bps (0.002)
● Use the fourth root transform to relate exponential returns to market error … measured as a percentage of standard deviation
2. Real-time distributional prediction
Algorithms Play Continuous Lottery Games
All day, every day
Algorithms authored by anyone
Live data published by anyone
Algorithms submit 225 scenarios
Why not point estimates? Ask Roger Federer.
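Quarantine
(Figure: a wind speed series over time. Data arrives at 12:37:52; with a 70-second quarantine the cutoff is 12:36:42. Predictions submitted at 12:34:27 and 12:35:41, before the cutoff, qualify for rewards in the reward window; those submitted at 12:36:57, 12:37:27 and 12:37:46 fall inside the quarantine.)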
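The timing rule can be written down directly. A sketch, assuming a submission qualifies only if it was made strictly before the cutoff (arrival time minus the quarantine); the function name is hypothetical and the timestamps are the ones on the slide:

```python
from datetime import datetime, timedelta


def qualifies(submitted: str, arrived: str, quarantine_seconds: int = 70) -> bool:
    """True if a prediction submitted at `submitted` can be rewarded for data arriving at `arrived`."""
    fmt = "%H:%M:%S"
    cutoff = datetime.strptime(arrived, fmt) - timedelta(seconds=quarantine_seconds)
    return datetime.strptime(submitted, fmt) < cutoff


arrival = "12:37:52"
for t in ["12:34:27", "12:35:41", "12:36:57", "12:37:27", "12:37:46"]:
    print(t, qualifies(t, arrival))  # only the first two qualify
```

Whether a submission landing exactly on the cutoff counts is a design choice the slide does not settle; the strict inequality here is an assumption.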
Implied Percentiles
Every incoming data point x implies a new data point z = F(x), where F is the “community” distribution function.
(Figure: cumulative distribution for NY electricity production (wind), 1 hr ahead.)
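A sketch of the implied percentile z = F(x), assuming the "community" distribution is simply the empirical distribution of all submitted scenarios pooled together, with a mid-rank adjustment that keeps z strictly between 0 and 1. Both choices are assumptions, not necessarily how the platform builds F, and the scenario values are made up:

```python
from bisect import bisect_left, bisect_right


def implied_percentile(x: float, scenarios: list) -> float:
    """z = F(x) using a mid-rank empirical CDF of the pooled scenarios, kept inside (0, 1)."""
    s = sorted(scenarios)
    below = bisect_left(s, x)          # scenarios strictly less than x
    ties = bisect_right(s, x) - below  # scenarios exactly equal to x
    return (below + 0.5 * ties + 0.5) / (len(s) + 1)


community = [17.50, 17.53, 17.55, 17.55, 17.56, 17.58, 17.60]  # made-up wind speed scenarios
print(implied_percentile(17.56, community))  # prints 0.625
```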
Example: Reactions to the presidential debate
See https://www.microprediction.com/blog/tears_of_joy_standardizing_streaming_data
Stacking Lotteries
Market-implied percentiles, mapped through the normal quantile function, are themselves the subject of lottery games.
The results are approximately N(0,1), so algorithms are predicting small deviations from the standard normal.
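A small simulation of why the second-level game is approximately N(0,1): if the community distribution is well calibrated, implied percentiles are roughly uniform, and passing them through the normal quantile function (`statistics.NormalDist.inv_cdf` in Python's standard library) gives values close to standard normal. The calibrated community is an assumption made for illustration: 225 evenly spaced quantiles of the same distribution that generates the data.

```python
import random
import statistics
from bisect import bisect_left

std_normal = statistics.NormalDist()
# Assumed community: 225 evenly spaced quantiles of the true N(0, 1) distribution.
scenarios = [std_normal.inv_cdf((i + 0.5) / 225) for i in range(225)]


def implied_z(x: float) -> float:
    """Empirical community CDF at x, nudged inside (0, 1) so inv_cdf is defined."""
    return (bisect_left(scenarios, x) + 0.5) / (len(scenarios) + 1)


rng = random.Random(42)
observations = [rng.gauss(0.0, 1.0) for _ in range(2000)]
second_level = [std_normal.inv_cdf(implied_z(x)) for x in observations]
print(statistics.mean(second_level), statistics.stdev(second_level))
```

If the community were miscalibrated (say, too narrow), the second-level values would drift away from N(0,1), and that deviation is what the next layer of algorithms would be predicting.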
Combine Percentiles
Some seemingly univariate series of games are actually copulas.
Pitch and yaw implied copulas, from the MIT SciML helicopula challenge
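The slide does not say how two percentiles become one number. One standard way to pack two numbers in [0, 1) into a single number, shown here as an assumption rather than the platform's documented scheme, is to interleave their binary digits (a Z-order or Morton curve); de-interleaving recovers both up to the chosen precision. The function names are hypothetical:

```python
def interleave(u: float, v: float, bits: int = 16) -> float:
    """Map (u, v) in [0, 1)^2 to one number in [0, 1) by alternating binary digits."""
    iu, iv = int(u * (1 << bits)), int(v * (1 << bits))
    z = 0
    for k in range(bits - 1, -1, -1):
        # Append the next digit of u, then the next digit of v.
        z = (z << 2) | (((iu >> k) & 1) << 1) | ((iv >> k) & 1)
    return z / (1 << (2 * bits))


def deinterleave(z: float, bits: int = 16) -> tuple:
    """Inverse of interleave, accurate to 2**-bits in each coordinate."""
    iz = int(z * (1 << (2 * bits)))
    iu = iv = 0
    for k in range(bits - 1, -1, -1):
        pair = (iz >> (2 * k)) & 3
        iu = (iu << 1) | (pair >> 1)
        iv = (iv << 1) | (pair & 1)
    return iu / (1 << bits), iv / (1 << bits)


z = interleave(0.7, 0.2)
print(z, deinterleave(z))
```

The combined number could then be mapped through the normal quantile function and entered into an ordinary univariate lottery, which is one way a univariate-looking series can carry a copula.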
Optics Analogy
Keep “lensing” until you get N(0,1).
Composition of monotone functions, each contributed by one or more algorithms
Pathways in the Collective Probability Brain
Scenarios are “thrown” up to the top-level lottery.
(Figure: a network of games U( ), V( ), R( ), W( ), S( ), T( ) and Q( ), with links marked “Collaboration” and “Competition”.)
Law of Iterated Expectations
Pathways grow and shrink based on the economics.
Point estimates are a special case: shift.
Exogenous data is a special case: shift arbitrarily (!)
(Figure: chains of nested predictions with nodes Y, E[Y|X], E[E[Y|X]|Z], E[Y|S], E[E[Y|S]|Z] and E[Y|S,Z,R] = E[E[E[Y|S]|Z]|R].)
Scenarios are thrown “up” into the top-level lottery.
Management fees are charged down from parent to child.
Wanna Play?
Wanna Predict Something?
In any language (api.microprediction.org)
Use category #1: Auxiliary market predictions
Markets predict the mean of a stock well.
Everything else (pretty much) is poorly predicted, owing to a lack of the discipline imposed by competition:
● Volatilities
● Correlations
● Bid-offer spreads
● Liquidity
● Trading costs
● Holding periods
● Client flow
● Response to inquiry
● Cover price
Use category #2: Prioritizing human work
e.g., reference data cleaning
Probability that a record is changed?
Which records will be changed?
Use category #3: Enhancing live data feeds
Tagging
Converting sporadic live data to continuous data
Discovering existing relationships
Predicting delayed data and partially filled data
Discovering good embeddings
Finding new exogenous data
Discovering good proxies for truth
Use category #4: Live feature discovery
Chumming the water
Predicting quantities correlated with the quantity you truly care about
Determining which feature generation algorithms are suited to the task at hand
Use category #5: Enhancing business intelligence applications
Predicting numbers on dashboards
Highlighting unusual movements
Predicting human reaction to information, or not (false positives)
Enabling humans to track a larger amount of data in real time
Use category #6: Fairness and explanation
Discovering data that reveals hidden bias
Historical example: proxies for race, as in redlining
Use category #7: Surrogate models
Competing and combining surrogate models for agent-based epidemic modeling
https://www.microprediction.org/stream_dashboard.html?stream=pandemic_infected
3. An Existence Proof
(for an automated Machine Learning network replacing artisan data science in large part)
1 - Motherhood statement
Quantitative business optimization will be a survival requirement for companies
(Machine Learning is set to transform all industries)
2 - Slightly more controversial...
Quantitative business optimization using ML/AI = frequently repeated prediction
Control theory ~ RL ~ microprediction of value functions
3 - Obvious to MIT folks
Strangers can do your ML for you
4 - Orthodox economics (local knowledge)
At approximately zero friction, markets >> central planning by humans
5 - The rest is busywork ...
Humans will not play a blocking role in the production of prediction
Machine Learning will be orchestrated by hierarchies of real-time generalized contests
Thanks for listening!
Thanks to Key Contributors. Join us!
• Wrote the front end
• Winning crawlers
• Clients in Java, Julia, Rust
• ZK-MUID proofs
• Monotonic NNs
Interested? Join us on Fridays at noon for an informal contributor chat.
https://www.microprediction.com/contact-us
