3. Background
• Vice President at Sorin Capital Management @ Stamford, CT
• Education
• PhD in Statistics, UC Irvine
• BE in Automotive Engineering, Tsinghua University
• Advisors:
• Hal Stern, author of Bayesian Data Analysis
• Yaming Yu
• Previous experiences
• Brain Imaging Center, Amen Clinics, Map Alternative Asset Management, Centricity
• Blizzard Entertainment
• Yahoo Inc.
• Validus Holdings
• Industries: Medical, Gaming, Financial Services, IT, (Re)-insurance, Hedge Fund
4. What is this talk not about?
• Not a statistics 101/951 course about theories and applications
• Not an industrial high-tech seminar about cutting-edge big data and deep learning technologies
• Not a Kaggle data science competition winning solution discussion session
• Not a discussion on how AI defeated the best Go players, Dota 2 players, etc.
5. What is this talk about?
• Sharing: views and key concepts
• Clarifying: compare and contrast
• Opening: removing psychological barriers and opening doors
6. Hebrews 4:12
For the word of God is living
and operative and sharper
than any two-edged sword,
and piercing even to the
dividing of soul and spirit and
of joints and marrow, and able
to discern the thoughts and
intentions of the heart.
7. Data Scientist
• “Data Scientist: The Sexiest Job of the 21st Century” –HBR (2012)
• Best job in 2016 according to Glassdoor.
• “Know more computer science than a statistician, and know more
statistics than a computer scientist.”
• Requires a large set of knowledge.
• Programming is at the core.
• Online forums are the best teachers.
8. Why is data so important suddenly?
• “Data is giving rise to a new economy” –The Economist (2017)
• Data science vs oil refinery
• Pipelines
• Producing crucial feedstocks
• Driver of growth and change
• One-man shop
9. Philosophies
• Limited by our perception of the world
• Data is a way of describing the world
• Abstraction of data to achieve a better understanding of the world
• Take intelligent actions
• Physical law
• Statistical law
• Two routes
• Two languages
• Two schools of thought
• Small data versus big data
10. Unreasonable effectiveness of data
Volume
• Explosion of the amount of
data
• Changed the way of thinking
about how to use data
Variety
• Great varieties of data are
recorded
• More angles to look at things
Velocity
• Real-time data streams
• Online and offline
11. Statistical models vs mathematical models
• Famous paper: Statistical Modeling: The Two Cultures (Breiman 2001)
• The former try to explore the statistical nature of a data-generating process, hence the Data Modeling Culture.
• The latter try to create a black box with optimal performance, hence the Algorithmic Modeling Culture.
12. Causation vs association
• Causation, cause-and-effect:
• The Merovingian in The Matrix Reloaded (2003) is a big believer.
• Straightforward
• Easy to take actions
• Can be misleading
• Hard to find in real world
• Association:
• Closer to real-world phenomena
• Hard to control
• Allow variability
• Chinese philosophy?
13. Prediction vs Inference
• Prediction: predict what the responses are going to be to future input
variables.
• Inference: extract some information about how nature is associating the
response variables to the input variables.
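The two goals can be contrasted on a single fitted line. The sketch below is only an illustration under stated assumptions: the toy data, the helper name fit_line, and the rough plus-or-minus two standard error band are all hypothetical and not taken from the talk.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, se_b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    # Residual sum of squares and the standard error of the slope.
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    se_b = math.sqrt(rss / (n - 2) / sxx)
    return a, b, se_b

# Hypothetical toy data: roughly y = 1 + 2x with small disturbances.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 2.9, 5.2, 6.8, 9.1, 10.9]

a, b, se_b = fit_line(x, y)

# Inference: how is the response associated with the input?
# A rough band of two standard errors around the estimated slope.
slope_interval = (b - 2 * se_b, b + 2 * se_b)

# Prediction: what response do we expect for a future input?
x_new = 6.0
y_new = a + b * x_new

print("slope estimate:", round(b, 3), "rough interval:", slope_interval)
print("predicted response at x =", x_new, "is", round(y_new, 3))
```

The same fit serves both purposes: the slope and its uncertainty answer the inference question, while y_new answers the prediction question.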
14. Academic complication vs industrial simplification
• Enhance the maximum human potential
• Achieve maximum value creation efficiency, portability, maintainability,
maximum cost-effectiveness
15. Mathematical correctness vs practical correctness
• Principal component regression
• It’s recommended to scale the features (ESL).
• Not so simple in practice
• Total return indices have a different coupon for each series
• Mathematically correct way
• Practically correct way
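Why the scaling recommendation matters can be seen in a two-feature sketch of principal component regression. Everything here is an assumption for illustration: the data are made up, the helper names are hypothetical, and the closed-form 2x2 eigenvector is used only to keep the example free of dependencies. It does not reproduce the total return index case from the slide.

```python
import math

def center(col):
    """Subtract the column mean."""
    m = sum(col) / len(col)
    return [v - m for v in col]

def standardize(col):
    """Center a column and scale it to unit sample standard deviation."""
    c = center(col)
    sd = math.sqrt(sum(v * v for v in c) / (len(c) - 1))
    return [v / sd for v in c]

def first_pc(c1, c2):
    """Unit-length leading eigenvector of the 2x2 covariance of two centered columns."""
    n = len(c1)
    s11 = sum(a * a for a in c1) / (n - 1)
    s22 = sum(b * b for b in c2) / (n - 1)
    s12 = sum(a * b for a, b in zip(c1, c2)) / (n - 1)
    # Largest eigenvalue of [[s11, s12], [s12, s22]].
    lam = (s11 + s22) / 2 + math.sqrt(((s11 - s22) / 2) ** 2 + s12 ** 2)
    # Matching eigenvector; this form assumes s12 != 0.
    v1, v2 = s12, lam - s11
    norm = math.hypot(v1, v2)
    return v1 / norm, v2 / norm

def pcr_slope(c1, c2, y):
    """Slope from regressing y on the first-component scores."""
    w1, w2 = first_pc(c1, c2)
    scores = [w1 * a + w2 * b for a, b in zip(c1, c2)]
    yc = center(y)
    return sum(s * t for s, t in zip(scores, yc)) / sum(s * s for s in scores)

# Hypothetical features on very different scales, plus a response.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [110.0, 190.0, 320.0, 380.0, 500.0]
y = [2.0, 4.1, 6.2, 7.9, 10.1]

raw_weights = first_pc(center(x1), center(x2))
scaled_weights = first_pc(standardize(x1), standardize(x2))
slope_scaled = pcr_slope(standardize(x1), standardize(x2), y)

print("first component weights, raw features:   ", raw_weights)
print("first component weights, scaled features:", scaled_weights)
print("PCR slope on scaled features:", slope_scaled)
```

Without scaling, the first component is almost entirely the large-scale feature; after scaling, both features carry equal weight. Whether equal weight is the practically correct choice depends on what the units mean, which is the point the slide raises.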
16. Knowing a model vs knowing a model really well
• Different levels of modeling
• Being able to do the work
• Being able to solve a problem
• Being able to find insights
• Being able to achieve maximum potential
• Examples
• Flood model
• Movie box office revenues
17. Asking the right question vs finding the right solution
• The right questions: push towards the solution to a real problem.
• Once the right questions are asked and hence the right goals are set, the
proper solutions need to be found.
18. Information is chained up
• The output of one model is the input of another
• Layers of modeling
• Cargo model in actuarial science
• Vacancy rates in credit modeling in CMBS
• Expected Remaining Lifetime at Blizzard Entertainment in financial budgeting
• Bayesian hierarchical modeling is the right way to go
• Complicated
• Very specialized
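One reason chained models are delicate is that the upstream output is itself uncertain. The sketch below is not a Bayesian hierarchical model; it only illustrates, under made-up assumptions, how passing a point estimate downstream differs from passing the upstream uncertainty along. The rates, the exponential downstream model, and the function names stage_one and stage_two are all hypothetical.

```python
import math
import random
import statistics

def stage_one(observations):
    """Upstream model: estimate a rate and its standard error from data."""
    mean = statistics.fmean(observations)
    se = statistics.stdev(observations) / math.sqrt(len(observations))
    return mean, se

def stage_two(rate):
    """Downstream model (hypothetical): a convex response to the rate."""
    return math.exp(5.0 * rate)

# Made-up upstream observations, for illustration only.
observed_rates = [0.05, 0.12, 0.08, 0.20, 0.10, 0.15]
rate_hat, rate_se = stage_one(observed_rates)

# Plug-in chaining: pass only the point estimate downstream.
plug_in = stage_two(rate_hat)

# Propagated chaining: pass draws of the uncertain rate downstream
# and average the downstream outputs (normal approximation assumed).
rng = random.Random(0)
draws = [rng.gauss(rate_hat, rate_se) for _ in range(20000)]
propagated = statistics.fmean([stage_two(r) for r in draws])

print("plug-in output:   ", round(plug_in, 4))
print("propagated output:", round(propagated, 4))
```

Because the downstream model is nonlinear, the two answers differ, and the gap grows with every additional layer. A hierarchical model handles this jointly rather than layer by layer, at the cost of the complexity noted above.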
19. Data science in quantitative trading
• Trading is the best arena for quantitative techniques.
• Demo
20. Concluding remarks
• Data science is a big field.
• Knowledge, skills, talents and experience are all important.
• The outlook is positive.
• Worth the time, effort and pains.
• Don’t be a lone wolf, even though it can be a one-man job.
• Sharing and collaboration are keys.