
An Intro to Machine Learning For Quant Investment: "Eventually Probably Approximately Correct", Presented at Q-Group Fall 2017 Seminar


Presented at the Institute for Research in Quantitative Finance (Q-Group)
Fall 2017 Seminar, Vancouver BC, 10/16/2017
This is a brief Machine Learning overview for experienced Financial Quants who already know the math. The presentation is intended to fill in the gaps regarding what ML is and is not, and to differentiate what a quant needs to know from the cacophony of hype in the current environment. It is a tool to help the reader decide whether to invest the time necessary to master the basic methods of ML and use them in a way that tangibly benefits the investment process. If you just want to put Deep Learning on your resume, or toss Big Data into a blender and start backtesting, or use canned libraries without quantifiable confidence bounds and rational justification for your results, while avoiding troublesome mathematics and laborious thought, this is probably not a good resource. At the end of the presentation, there is a list of books to read which have exercises you must perform in order to do the actual learning. Expect to spend three to twelve months depending on how far you want to take it.


  1. 1. Q-Group Fall 2017 Monday Oct 16 Special Track on Machine Learning Presenter Walter Alden Tackett, CFA
  2. 2. Introduction: your time is valuable • You’re covered for prerequisites of Machine Learning You are a specialist in quantitative finance • Mastery requires time, effort, and opportunity cost Any tool for investment must first be mastered • There are mixed signals with extreme values Is Machine Learning worth your time? • The differences are part of a bigger picture. ML and Quant Finance have the same lineage.
  3. 3. #DEFINE: A Key to Acronyms and Abbreviations • ML: Machine Learning • CS: Computer Science • OR: Operations Research • EE: Electrical Engineering • AI: Artificial Intelligence • NN: Neural Networks • IT: Information Technology • BS: Big Data Science. Acronyms only: terms will be defined where needed.
  4. 4. About the term Machine Learning Priority: Used by Arthur Samuel in the title of the first journal article describing the first ML program. •Called Statistical Learning by … (spoiler alert) … statisticians. Related, not synonymous: Statistical Data Analysis (SDA, due to Tukey) and recently Data Science are more inclusive: •ML, Very-Large Databases, and Visualization Graphics all fall under SDA and Data Science. Problem: The word Learning is anthropomorphic … •… which contributes to the AI hype cycle. Adaptive Sampled-Data Statistical Computing would be most descriptive. •… but not many people would talk to you.
  5. 5. Machine Learning, Big Data, Data Science, and Artificial Intelligence: Hard questions ML successfully recommends Amazon & Netflix products – how is that relevant to price returns? Algorithms can learn if email is spam or not spam – does that translate to buy / sell / hold? Does ML always require large datasets, labor-intensive labeled samples, or categorical data? Where are the Quant Finance “Wins?”
  6. 6. Machine Learning, Big Data, Data Science, and Artificial Intelligence: Maybe not? (1) Isn’t there some kind of sketchy relationship between Machine Learning and: • Data Mining? • Back-testing? • Expert Systems? (2) Do market data’s well-known issues thwart “Big Data Science?” • What qualifies data as Big data? • Has all useful information already been strip-mined from financial data? (3) Isn’t this maybe the 3rd generation of AI-related investment hype? • Neural Networks… again? Only “deeper?” • A bigger box of logistic regression and fairy dust? Image: Phoenix Wright: Ace Attorney, Copyright © 2001-2017 Capcom Co., Ltd.
  7. 7. Machine Learning and Artificial Intelligence: Reviewing the Fear, Uncertainty, and Doubt Have you heard the news? • “Goldman Sachs replaced 100 traders with 40 computer engineers. Or 20, or something.” • Bonus points: “There will be no jobs for humans by 20xx.” AI is a threat to humanity according to… • Elon Musk • Stephen Hawking • Nick Bostrom • All books on the Amazon AI Threats to Humanity list* Should Quant Investors be concerned about… • Disruption similar to effect of HFT on trading? • Computers beating human champions at poker? • AI taking their jobs? • Human extinction? * List does not currently exist on Amazon, but consider that an AI program may be deciding what topics get their own list.
  8. 8. BS Bonanza: commodity Big Data Science is not a starting point for investment strategy Problem: Businesses have large stores of data to mine, much of it unstructured Innovation: Powerful, free, easy-to-use statistical packages are available - just add data Demand: Workers are needed who can run the packages on the data, ASAP Result: Big Data marketed, sold, bought, implemented on minimum time & cost basis Precedent: US Businesses converted CS and computing to commodity minimum-cost IT model in the 2000’s Effect: A new Big Data Science (BS) model compounding ignorance of CS, statistics, and bona fide data science Conclusion: Like the IT model, the BS model and associated hype aren’t predictive of investment success Prescription: Define Machine Learning from first principles, and relate it to Quant Finance practice
  9. 9. First Principles
  10. 10. Machine Learning systems optimize their response to input using input samples and automated adjustment. The Policy maps input to response based on its Parameters. The Loss Function measures map quality. Update of Policy Parameters minimizes Loss for expected out-of-sample data … “from the future.” Optimizing for out-of-sample expectation is called generalization. Generalization is the central problem of ML: • How and why will unobservable data differ from in-sample data? See: First page of the first article about the first ML program*. * Samuel, Arthur L. “Some Studies in Machine Learning Using the Game of Checkers.” IBM Journal of Research and Development 3, no. 3 (1959): 210–229.
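The policy / loss / update vocabulary on this slide can be sketched in a few lines. This is a minimal illustration only, not Samuel's program: the linear policy, the squared-error loss, the gradient-descent update, and the synthetic data source `make_samples` are all assumptions chosen for brevity.

```python
import random

def policy(params, x):
    """Policy: maps an input x to a response using its parameters."""
    slope, intercept = params
    return slope * x + intercept

def loss(params, samples):
    """Loss function: mean squared error of the policy over a sample set."""
    return sum((policy(params, x) - y) ** 2 for x, y in samples) / len(samples)

def update(params, samples, step=0.1):
    """One automated adjustment: a gradient-descent step on in-sample loss."""
    slope, intercept = params
    n = len(samples)
    grad_slope = sum(2 * (policy(params, x) - y) * x for x, y in samples) / n
    grad_intercept = sum(2 * (policy(params, x) - y) for x, y in samples) / n
    return (slope - step * grad_slope, intercept - step * grad_intercept)

def make_samples(rng, n):
    """Hypothetical data source: y = 2x + 1 plus Gaussian noise."""
    return [(x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.1))
            for x in (rng.uniform(-1.0, 1.0) for _ in range(n))]

rng = random.Random(0)
in_sample = make_samples(rng, 200)
out_of_sample = make_samples(rng, 200)   # never seen by update()

params = (0.0, 0.0)
initial_loss = loss(params, in_sample)
for _ in range(500):
    params = update(params, in_sample)

in_loss = loss(params, in_sample)
out_loss = loss(params, out_of_sample)   # the quantity generalization cares about
```

Here the out-of-sample draw comes from the same process as the in-sample draw, so the two losses land close together. The slide's question is what happens when that assumption fails.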
  11. 11. Generalization: policy updates that minimize expected out-of-sample loss. Out-of-sample data is unobservable on update … which must generalize from in-sample data.
  12. 12. Generalization: policy updates that minimize expected out-of-sample loss. Out-of-sample data is unobservable on update … which must generalize from in-sample data. Infer bird B∈{f, w, s} based on examples. Ask a Human: write a program to solve White Swan.
  13. 13. Generalization: policy updates that minimize expected out-of-sample loss. Out-of-sample data is unobservable on update … which must generalize from in-sample data. Infer bird B∈{f, w, s} based on examples. Ask a Human: write a program to solve White Swan. Multiple predictors: color & height … • Idea: Use “many” features. Are some better than others? Rank: not just on in-sample performance • Consider: shape properties invariant to rotation, scale, noise ...
  14. 14. Generalization: policy updates that minimize expected out-of-sample loss. Out-of-sample data is unobservable on update … which must generalize from in-sample data. Infer bird B∈{f, w, s} based on examples. Ask a Human: write a program to solve White Swan. Multiple predictors: color & height … • Idea: Use “many” features. Are some better than others? Rank: not just on in-sample performance • Consider: shape properties invariant to rotation, scale, noise ... Robust features: e2/e1, convex hull/perimeter. • We humans “solved it.” Computers generalizing in this way is the goal of ML.
  15. 15. Use ML method for White Swan, assuming good features and policies are unknown. Standardize in- and out-of-sample data: • Resize to 4096 colored pixels; Set each pixel value to 1 or 0.
  16. 16. Use ML method for White Swan, assuming good features and policies are unknown. Standardize in- and out-of-sample data: • Resize to 4096 colored pixels; Set each pixel value to 1 or 0. Add perturbations to in-sample data. • Adversary ML: makes data, loss func= -1*(loss of learner). Use-case for template-based expected-distortion sampling or adversarial data generation: Train a system to tag images by logo. • Logos above from template png files. • Synthesize data based on templates and source description. Images forthcoming: • From public sightings, with varied lighting and perspective. • Shot with mobile. • Logo colors may differ. Expect background (mostly) removed: • Remaining foreground pixels number between N/4 and 4N. • N based on foreground pixels in templates.
  17. 17. Use ML method for White Swan, assuming good features and policies are unknown. Standardize in- and out-of-sample data: • Resize to 4096 colored pixels; Set each pixel value to 1 or 0. Add perturbations to in-sample data. • Adversary ML: makes data, loss func= -1*(loss of learner). Data Analysis: note owl eyes, flamingo legs… • Choose wisely: convolutional NN were literally made for 2D pattern rec. Must separate (extended) data into at least three partitions: train, test, and validate. Always true in ML. Use-case for template-based expected-distortion sampling or adversarial data generation: Train a system to tag images by logo. • Logos above from template png files. • Synthesize data based on templates and source description. Images forthcoming: • From public sightings, with varied lighting and perspective. • Shot with mobile. • Logo colors may differ. Expect background (mostly) removed: • Remaining foreground pixels number between N/4 and 4N. • N based on foreground pixels in templates.
  18. 18. Train, Test, Validate
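The three-way partition called for on the preceding slide might be sketched as follows. The function name `three_way_split`, the 60/20/20 fractions, and the random shuffle are assumptions made for illustration; for time-ordered financial data a chronological cut is usually the safer choice, since shuffling can leak future information into the training partition.

```python
import random

def three_way_split(samples, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle once, then cut into disjoint train / test / validate partitions."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(fractions[0] * len(shuffled))
    n_test = round(fractions[1] * len(shuffled))
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validate = shuffled[n_train + n_test:]   # whatever remains
    return train, test, validate

# Hypothetical usage on 100 sample ids.
train, test, validate = three_way_split(range(100))
```

The point of the third partition is that it is consulted once, after every parameter and design choice has been fixed using the other two.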
  19. 19. Machine Learning and Regression: Using Lasso and LARS The White Swan example is a supervised classification problem. • Popular misconception: ML can only be used for discrete classification Machine Learning addresses every type of statistics used by quants Regression & factor selection are key ML topics – a good starting point … Regularized Regression: the Lasso and LARS Lasso: Combines OLS with constrained optimization, minimizing MSE subject to: sum(abs(Coefficients)) ≤ C LARS efficiently finds unique optimal “path” of coefficients as C ranges over [0…∞) This “budgeting” approach eliminates factors with low discriminant power The sum(abs(…)) L1-norm tends to drive coefficients of “weak” variables to zero.
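To see the L1 "budget" drive weak coefficients to exactly zero, here is a small sketch. It is not LARS: it uses cyclic coordinate descent on the penalized (Lagrangian) form of the Lasso, minimizing (1/2n)·Σ(y − Xb)² + λ·Σ|b|, which corresponds to the constrained form on the slide for a matching pair of λ and C. The synthetic data (two real factors, four irrelevant ones) and the value λ = 0.25 are assumptions.

```python
import random

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; exactly zero inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(X, y, lam, sweeps=100):
    """Minimize (1/2n)*sum((y - X b)^2) + lam*sum(|b|) by cyclic coordinate descent.

    X is a list of rows. Columns of X and y are assumed centered,
    so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)                      # equals y - X b while b is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(sweeps):
        for j in range(p):
            # Covariance of column j with the partial residual that excludes j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta

def center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]

# Hypothetical data: y depends on factors 0 and 1 only; factors 2..5 are noise.
rng = random.Random(1)
n, p = 200, 6
raw = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
y = center([1.5 * row[0] - 2.0 * row[1] + rng.gauss(0.0, 0.5) for row in raw])
columns = [center([row[j] for row in raw]) for j in range(p)]
X = [[columns[j][i] for j in range(p)] for i in range(n)]

beta = lasso_coordinate_descent(X, y, lam=0.25)
```

The two real coefficients survive, shrunk toward zero by roughly λ, while the irrelevant ones fall inside the soft-threshold band. Sweeping λ downward from a large value traces a coefficient path of the kind LARS computes directly.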
  20. 20. Lasso in Academic Finance and Practical use in Testing with Noise. Several recent papers apply Lasso to factor identification. Typically these propose Lasso “improvements” that favor the factor corresponding to some economic hypothesis X. Lasso and LARS are central to research in sparsity (more factors than data), with texts and R code available for free. This makes an accessible entry point for quants to explore hypothesis X hands-on. 2016 - 2017 saw interest in the Lasso among the factor testing community. Lasso-based significance of N factor time series explaining y: 1. Create $1/p = 100$ random time series similar to factor $f_n$*. Let bogus factor $b_n$ ≡ the most significant according to univariate regression on y. K≥1 iterations create a family of K bogus factors $b_{nk} \sim f_n$. Repeat for each $n \in N$. Will the real $f_n$ please stand out? Use Lasso, LARS, and Bayes’ rule on panels drawn from actual $f_n$ and bogus factors $b_{nk}$ to test factors’ significance. Large K supports typicality analysis of results given $p$. *Similar to factor n means random data filtered to have the same mean, variance, higher moments, serial correlation, or other statistics matching those of factor n.
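The bogus-factor construction above can be sketched as follows, under simplifying assumptions: "similar" is reduced to matching only the mean and standard deviation of the factor, significance is the absolute t-statistic of the slope in a univariate OLS regression, and the factor and target series are synthetic. The later Lasso, LARS, and Bayes' rule stage is not shown, and the helper names are hypothetical.

```python
import math
import random
import statistics

def correlation(a, b):
    """Sample Pearson correlation of two equal-length series."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)

def slope_t_stat(x, y):
    """t-statistic of the slope in a univariate OLS regression of y on x."""
    r = correlation(x, y)
    return r * math.sqrt((len(x) - 2) / (1.0 - r * r))

def bogus_factor(factor, y, rng, candidates=100):
    """Most significant of `candidates` random series matched to factor's moments."""
    mu, sigma = statistics.mean(factor), statistics.pstdev(factor)
    best, best_t = None, 0.0
    for _ in range(candidates):
        series = [rng.gauss(mu, sigma) for _ in factor]
        t = abs(slope_t_stat(series, y))
        if best is None or t > best_t:
            best, best_t = series, t
    return best, best_t

# Hypothetical data: one real factor that genuinely explains y.
rng = random.Random(2)
T = 120
factor = [rng.gauss(0.0, 0.02) for _ in range(T)]
y = [f + rng.gauss(0.0, 0.02) for f in factor]

real_t = abs(slope_t_stat(factor, y))
bogus, bogus_t = bogus_factor(factor, y, rng)
```

Because `bogus` is the best of 100 draws, its t-statistic is inflated by selection alone, which is the data-snooping effect the test is designed to expose. Repeating the call K times yields the family $b_{nk}$.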
  21. 21. Useful Things to Do: Finding Transient Factors with Clustering • Example of overlap between ML and Quant finance. • Try an equity universe e.g. RU1000, N≈1000: 1. Create residual returns for each stock by removing contributions of explanatory factors*: $\varepsilon_i(t+1) = r_i(t+1) - \sum_{k \in K} x_{ik}(t)\, f_k(t+1)$ 2. Calculate empirical correlation matrix on trailing 252TD (1-year daily) window – it’s OK that N>T 3. Perform PCA on the correlation matrix 4. Use D=2 eigenvectors eA, eB as 2D “coordinates”** • Note: eA and eB have one entry for each security • Each represents security contribution to residual correlation * Key: $t$ indexes trading days; $r_i(t+1)$ and $f_k(t+1)$ are security and factor returns respectively for interval $\tau \in (t, t+1]$, and $x_{ik}(t)$ is exposure of security $i$ to factor $k$ at point $t$ in time. Caution: Do not use residuals published in risk model flat files for this purpose: most vendors process them significantly. Also (as demonstrated) the results of varying the set K are informative. ** Clustering on D > 2 is limited in practice by significance of corresponding eigenvalues and, for K-means, the degradation of Euclidean distance measure as D grows (with typical onset in the range $D \in [7 \ldots 10]$)
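Steps 2 to 4 can be sketched without any numerical library, assuming residual returns are already in hand. The synthetic residuals (ten securities driven by two hidden group factors), the power-iteration eigen-solver, and the function names are all assumptions. A real universe of N≈1000 would call for an optimized linear-algebra routine.

```python
import math
import random

def correlation_matrix(series):
    """Empirical correlation matrix; series[i] is the residual history of security i."""
    def standardize(values):
        mean = sum(values) / len(values)
        centered = [v - mean for v in values]
        norm = math.sqrt(sum(c * c for c in centered))
        return [c / norm for c in centered]
    z = [standardize(s) for s in series]
    return [[sum(a * b for a, b in zip(zi, zj)) for zj in z] for zi in z]

def top_eigenvectors(matrix, count=2, iterations=500, seed=0):
    """Leading eigenpairs of a symmetric PSD matrix: power iteration with deflation."""
    rng = random.Random(seed)
    size = len(matrix)
    work = [row[:] for row in matrix]
    pairs = []
    for _ in range(count):
        vec = [rng.gauss(0.0, 1.0) for _ in range(size)]
        for _ in range(iterations):
            nxt = [sum(w * v for w, v in zip(row, vec)) for row in work]
            norm = math.sqrt(sum(c * c for c in nxt))
            vec = [c / norm for c in nxt]
        # Rayleigh quotient of the converged unit vector gives the eigenvalue.
        value = sum(vi * sum(w * v for w, v in zip(row, vec))
                    for vi, row in zip(vec, work))
        pairs.append((value, vec))
        # Deflate: remove the component just found before seeking the next one.
        work = [[work[i][j] - value * vec[i] * vec[j] for j in range(size)]
                for i in range(size)]
    return pairs

# Hypothetical residuals: securities 0-5 share hidden factor g1, 6-9 share g2.
rng = random.Random(3)
T = 120
g1 = [rng.gauss(0.0, 1.0) for _ in range(T)]
g2 = [rng.gauss(0.0, 1.0) for _ in range(T)]
residuals = [[(g1[t] if i < 6 else g2[t]) + rng.gauss(0.0, 1.0) for t in range(T)]
             for i in range(10)]

corr = correlation_matrix(residuals)
(lam_a, e_a), (lam_b, e_b) = top_eigenvectors(corr)
coords = list(zip(e_a, e_b))   # one 2D point per security
```

Each entry of `coords` is the pair (eA, eB) for one security, ready to be clustered.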
  22. 22. Eigenvalues: Residual of Return Correlation (return ex-intercept and industry returns). [Figure: Variance: Principal Components of Residual Correlation vs. Principal Components of Correlation for Noise. Series: PCA of Corr for Residual Return Data; PCA of Corr as above, except with Residual Return data replaced by samples drawn from standard normal distribution. Leading components labeled v1 ~ e1, v2 ~ e2, v3 ~ e3. Y-axis: Variance.]
  23. 23. At each point in time perform K-means clustering in the 2D space of e1, e2 • Plot results over time for different residual types … • Residual Return of security = • security return ex-contribution of universe “market intercept” (xs-mean) • security return ex-contributions of (both): • Market Intercept (xs-mean) of universe • Corresponding industry return (xs-mean by industry, ex-intercept) • random noise (e1 and e2 of corr matrix, 255 day window)
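A single time step of the clustering described above might look like this. It is a plain Lloyd's-algorithm sketch: the two synthetic blobs stand in for (e1, e2) coordinates, and the choice K=2, the iteration count, and the initialization from randomly sampled points are assumptions.

```python
import random

def squared_distance(p, q):
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeans(points, k, iterations=50, seed=0):
    """Lloyd's algorithm on 2-D points; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)        # initialize from the data itself
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                      # keep the old centroid if a cluster empties
                centroids[c] = (sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members))
    return centroids, labels

# Hypothetical (e1, e2) coordinates forming two well-separated groups.
rng = random.Random(4)
blob_a = [(rng.gauss(-0.10, 0.01), rng.gauss(0.00, 0.01)) for _ in range(30)]
blob_b = [(rng.gauss(0.10, 0.01), rng.gauss(0.05, 0.01)) for _ in range(30)]
centroids, labels = kmeans(blob_a + blob_b, k=2)
```

Running this on each date's coordinates and tracking which securities share a label is one way to produce the plots over time. Cluster label numbers are arbitrary from one run to the next, so any comparison across dates has to match clusters by membership rather than by label.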
  24. 24. Residual Return: K-Means clustering over time. Do-it-Yourself Project: Supplementing Factor Models. Form K-means clusters on residual returns. Tabulate characteristic labels by cluster: • Industry (and Sub-Industry – note dependence!) • Ranked (Hi/Med/Lo) fundamentals • Ranked (Hi/Med/Lo) technical factors. Bayesian Analysis: Are labels or their co-occurrence persistently significantly conditioned on clusters? [Figure panels: (a) random noise; (b) security return ex-contribution of universe “market intercept” (xs-mean); (c) security return ex-contributions of both: 1. Market Intercept (xs-mean) of universe, 2. Corresponding industry return.]
  25. 25. Where do we get the K in K-means? • In K-means, the number of clusters K must be chosen • K is a parameter: it can be overfit to in-sample data • A separate data partition must be used to test overfit • Results are only valid for a final partition, untouched by above process • Idea: let K be chosen automatically for each dataset • But choice algorithms have parameters ... it’s turtles all the way down • Sooner or later, parameters must be set from data • It is a perilous undertaking, and easy to exhaust available data • The 80’s were the golden age of ML data snooping • ML users faced problems long addressed by Statisticians
  26. 26. The Modern Age of 1991
  27. 27. Machine Learning has a long history: 1991 marks the relevant turning point. • 1956-1959: Arthur Samuel’s Checkers Program complete, plays live on TV in 1956. Publication of “Some Studies in Machine Learning…” delayed until 1959. • 1961: “Since current machines have (only 32 KB RAM), and (maybe 1 MB RAM ten years from now), we see that Dynamic Programming problems cannot be solved (directly) because of the memory requirements.” - R. Bellman, Adaptive Control Systems, 1961. • Early ’60s to Early ’80s: Symbolic AI dominates CS: rule-based inference; symbolic reasoning; learning: CART decision trees; main ML progress in fields of Bio, Psych, & EE Pattern Recognition. • 1980’s (1980-1986): Parallel Distributed Processing and the Neural Network Renaissance. Neural Network models developed by psychologists and biologists show promising results on hard AI problems by learning from data. NN, and hence ML, become focus of AI. • 1987-: Power without knowledge: ubiquitous PCs allow many NN users to learn about in-sample vs out-of-sample the hard way. • 1989-: Decline of the NN Cargo Cult / Opening the Vaults. Mathematical Innovation: optimization-based SVM and Bellman-based Reinforcement Learning. For problem-solvers, pragmatism replaces “brain-like” rationale. EE/OR of 40’s/50’s/60’s are mined for newly-practical ideas. Bio/Psych take interest. • 1991: “Statistical Data Analysis in the Computer Age” published in Science magazine. This turning point to the modern era provides rigorous statistical basis for ML algorithms and confidence testing. • 1998-: Machine Learning as Shadow Industry. PageRank statistical learning algorithm created by Page, Brin, Motwani, & Winograd ca. 1996. Google incorporates in 1998. Google dominates competitors by applying ML to key problems of Internet business. • 2000-2005 • 2016: “[Modern] neural networks used for machine learning ... are generally not designed to be realistic models of biological function. … [They are grounded in] linear algebra, probability, information theory, and numerical optimization.” - Goodfellow et al., Deep Learning, 2016
  28. 28. “Most of our familiar statistical methods, such as hypothesis testing, linear regression, analysis of variance, and maximum likelihood estimation, were designed to be implemented on mechanical calculators.” “Modern ~ computers facilitate ~ new statistical methods that require fewer distributional assumptions than their predecessors and can be applied to more complicated estimators.” “These methods allow the scientist to explore and describe data and draw valid statistical inferences without the usual concerns for mathematical tractability.” “This is possible because traditional methods of mathematical analysis are replaced by specially constructed computer algorithms.” “Mathematics has not disappeared from statistical theory. It is the main method for deciding which algorithms are correct and efficient tools for automating statistical inference.” - Efron & Tibshirani, “Statistical data analysis in the computer age,” Science, Jul 26, 1991
  29. 29. Statistical data analysis in the computer age … coincides with popularity of nonlinear systems Statisticians advocating computational methods that improve flexibility without sacrificing statistical power… Field: Statistical Testing and Validation Barrier: Constraints of Mathematical Statistics Insight: Harness the brute force of 80’s-era PCs Innovation: Statistical Data Analysis Algorithms* … as ML re-diversifies, NN paradigm shifts from cargo cult to one component of larger toolbox informed by CS/EE/OR. Field: Human-competitive prediction and decision Barrier: Limited knowledge of how brains work Insight: NN turns to previously-impractical EE/CS Innovation: Ideas infeasible in 40’s-60’s are doable *Statistical Data Analysis is the term coined by John Tukey, and used by Efron & Tibshirani for continuity; however Statistical Data Analysis Algorithms is more accurate.
  30. 30. Statistical data analysis in the computer age … details of the new emergent synthesis Field: Statistical Testing and Validation Barrier: Constraints of Mathematical Statistics Insight: Harness the brute force of 80’s-era PCs Innovation: Statistical Data Analysis Algorithms* Mathematical Statistics: Based on algebra & calculus. Confidence in results derives from proofs that require analytically tractable formulas and assumptions about data. Statistical Data Algorithms: Based on brute force re-sampling. Confidence derives from sampling theorems, algorithm properties, and the statistic(s) applied to each sample set. Field: Human-competitive prediction and decision Barrier: Limited knowledge of how brains work Insight: NN turns to previously-impractical EE/CS Innovation: Ideas infeasible in 40’s-60’s are doable “Biological” Neural Nets: Parallel Distributed Processing … gets a jumpstart from research neglected by 60’s/70’s Symbolic AI. Practical implementations usually based on CS, EE, OR methods. Golden Oldies: Algorithms of the 50’s + 90’s CPUs: Dynamic Programming, Monte Carlo Methods, Convex Optimization, SVM. Popularized by their use in NN, their variants are today’s core ML tools. *Statistical Data Analysis is the term coined by John Tukey, and used by Efron & Tibshirani for continuity; however Statistical Data Analysis Algorithms is more accurate.
  31. 31. Statistical data analysis in the computer age … specific milestone changes to the ML field Field: Statistical Testing and Validation Barrier: Constraints of Mathematical Statistics Insight: Harness the brute force of 80’s-era PCs Innovation: Statistical Data Analysis Algorithms* Mathematical Statistics: Based on algebra & calculus. Confidence in results derives from proofs that require analytically tractable formulas and assumptions about data. Analytic tractability: statistics & tests use equations in closed form, each tailored to the result (e.g. mean). Distributional Assumptions: (example) False rejection of valid data may occur for fat tail distributions when assumed normal. Statistical Data Algorithms: Based on brute force re-sampling. Confidence derives from sampling theorems, algorithm properties, and the statistic(s) applied to each sample set. (example) Bootstrap Algorithm: Sampling with replacement takes statistic on data subsets over many iterations. Assumptions re data (usually) not required. Sampling methods may provide distribution insights. Confidence depends on the statistic: still difficult, but flexible. Field: Human-competitive prediction and decision Barrier: Limited knowledge of how brains work Insight: NN turns to previously-impractical EE/CS Innovation: Ideas infeasible in 40’s-60’s are doable “Biological” Neural Nets: Parallel Distributed Processing … gets a jumpstart from research neglected by 60’s/70’s Symbolic AI. Practical implementations usually based on CS, EE, OR methods. Neural networks: Good results from early efforts drive AI hype cycle, inflated expectations. By 1991, backlog of NN ideas tapped out. Problem: We don’t know how brains work, so can’t make neural nets “more brain-like.” CS, EE, OR are best bets for new ideas. Golden Oldies Triumphant: Algorithms of the 50’s + CPUs of 90’s: Dynamic Programming, Monte Carlo Methods, Convex Optimization, SVM. Popularized by their use in NN, their variants are today’s core ML tools. Feedback example: Psychology inspired the Reinforcement Learning (RL) method, an ML extension of Dynamic Programming, now a frequent staple of EE & OR curricula. The Connection: Testing highly non-linear methods against problematic data is an ideal match for Statistical Data Analysis Algorithms. *Statistical Data Analysis is the term coined by John Tukey, and used by Efron & Tibshirani for continuity; however Statistical Data Analysis Algorithms is more accurate.
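The bootstrap example named on this slide can be made concrete with a short sketch. It uses the percentile method, which is only one of several bootstrap interval constructions. The fat-tailed synthetic returns, the median as the statistic, and the function name `bootstrap_interval` are assumptions for illustration.

```python
import random
import statistics

def bootstrap_interval(data, statistic, resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for an arbitrary statistic."""
    rng = random.Random(seed)
    n = len(data)
    # Each resample draws n observations with replacement, then applies the statistic.
    estimates = sorted(
        statistic([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(resamples)
    )
    lower = estimates[int((alpha / 2) * resamples)]
    upper = estimates[int((1 - alpha / 2) * resamples) - 1]
    return lower, upper

# Hypothetical fat-tailed daily returns: mostly quiet, occasionally violent.
rng = random.Random(5)
returns = [rng.gauss(0.0, 0.05) if rng.random() < 0.05 else rng.gauss(0.0, 0.01)
           for _ in range(250)]

sample_median = statistics.median(returns)
lower, upper = bootstrap_interval(returns, statistics.median)
```

No closed-form sampling distribution for the median of this mixture was needed. Swapping in another statistic means passing a different function, which is the flexibility the slide describes.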
  32. 32. The eve of disruption
  33. 33. What is this thing called Computer Science? Computer Science is the study of mechanistic problem solving, starting from three deeply intertwined aspects. Three Aspects: • PROBLEM: Formally states the input elements and the goal to achieve • ALGORITHM: The step sequence required to achieve the goal • MECHANISM: The system that physically executes the algorithm steps. Together, the three aspects determine hardness of a problem + solution pair in terms of • Time (number of steps) and • Space (memory) … required to achieve the goal. Many problems are intractable: they require so many steps to achieve a perfect solution that there isn’t enough time and space in the universe to find it. Examples are common: • Making the perfect chess move. • Deciding what to wear to the party next week. Intractable problems can still be usefully addressed using an algorithm that finds the best approximate solution for a given amount of time and space.
  34. 34. Machine Learning as Competitive Advantage • Google’s key competitive advantages in October 2005: • The patent on Google’s ML-based PageRank algorithm was a major barrier to entry for search engines thereafter • Gmail featured an ML algorithm that eliminated spam, improved user experience, and rapidly eroded competitors’ market share • Google’s AdWords ML algorithm brought high revenue by rapidly taking leadership of the online advertising market • Google’s competitors were focused on “acquiring content.” • The big technology push in corporate America? IT Outsourcing. • Behavioral biases 101: Computer Science had become framed by most industries as a cost center, not as a source of innovation.
  35. 35. Can the Google Disruption happen in finance? There is at least one clue … • The most popular desktop text on High Frequency Trading is Irene Aldridge’s book of the same name. • The finance material within is standard and thorough: • risk adjusted return, • alpha strategies / capacity, • performance measurement, • order books, • detailed treatment of t-cost specific to microstructure • What differentiates the book is its discussion of computer architecture, networking, algorithms, and test procedures • This is good solid CS 101 – no magical spells required • It is a no-nonsense cookbook for industry disruption. • ML in finance has no equivalent. Yet.
  36. 36. CONCLUSIONS
  37. 37. Anthropophoria: Heider-Simmel and AI • Anthropomorphism is the great mother of all behavioral biases • Heider-Simmel led to the study of fundamental attribution error (FAE), which motivated the work of Kahneman and Tversky • Artificial Intelligence seeks to make computers do human-like things: • Playing games • Driving cars • Recognizing cats • … eliciting excitement and fear. Heider-Simmel (1943): Subjects were shown a 3-minute film of moving shapes interacting: collisions, pursuit, and avoidance depicted highly intentional conflict among the trio. Whether asked pointed questions about the figures’ behavior “as persons” or simply “what happened in the movie,” subjects described the shapes as people with vivid feelings and motivations. When the film was shown in reverse from end to start, the interactions of the figures made no obvious sense, but separate groups of subjects again created human behavioral interpretations of movement, and described them as vividly as did the subjects who viewed the forward-running film.
  38. 38. The Fundamental Law of AI • Hype cannot diminish the significance of "deep" learning systems outperforming humans on complex tasks. • They are the newest in a long line of important AI developments over the past sixty years, most of which are now forgotten. • The forgotten AI Innovations did not disappear, or fail: 1. They became commonplace, … 2. the excitement wore off, and … 3. the world forgot that they were AI. Autopilot, anti-lock brakes, optical character recognition, relational databases, and chess-playing programs were once AI. The Fundamental Law of Artificial Intelligence: • AI is not viewed as AI after it becomes common, because anthropophoria wears off. • Corollary: The Fundamental Law of AI implies that AI failures are remembered as AI.
  39. 39. End See Appendices
  40. 40. Appendix 1: Endnotes The title of this talk is taken from Probably Approximately Correct, the recent book by Leslie Valiant, which greatly expands upon PAC Learning, a foundational theory of machine learning that first appeared in Valiant, Leslie G. 1984. “A Theory of the Learnable.” Communications of the ACM 27 (11): 1134–1142. This talk makes many mentions of poker-playing computer programs, particularly the HUNL championship program DeepStack. The Q-Group Fall Special Track on ML included dinner speaker Michael Bowling, who heads the DeepStack team. The https://deepstack.ai web site is accessible to both specialist and non-specialist viewers. It describes the DeepStack project headed by Michael Bowling of U. Alberta, including gameplay footage, video lecture, and pre-print of the cover article from the May 5, 2017 issue of Science. DeepStack is a program (one of two, the other being CMU’s Libratus) that has significantly outperformed human champions at Heads-Up No-Limit Poker, a game of incomplete information, a property which places it in a category of difficulty far beyond Chess or Go. See also: Moravčík, et al, and Michael Bowling. 2017. “DeepStack: Expert-Level Artificial Intelligence in Heads-up No-Limit Poker.” Science 356 (6337): 508. May 4, 2017.
  41. 41. Appendix 2: Next Steps For Learning Learning Machine Learning First Step: Computer Science Overview Christian and Griffiths, Algorithms to Live By. This is a popular science and general readership book from which even professionals in the field can gain new insights. It draws from life’s problems large and small: Where should I park? How long should I date before I marry? What’s the best way to sort my laundry? From these life examples Computer Science is revealed as the study of problems, their characteristics, and the algorithms that address them. The book illustrates hallmarks of what makes problems easy or hard, how solutions and algorithms can employ tradeoffs to gain different types of efficiency, and how these ideas inform Economics, Political Decisions, and Behavioral Psychology. Free of equations, suitable for audiobook, and recommended for all attendees. Machine Learning for Quants, Hands On: Starting Point James et al., An Introduction to Statistical Learning with Applications in R. For quants intending to investigate Machine Learning, this is the preferred starting point because it approaches the field from a statistical perspective that highlights similarities and differences with the foundations of Quant Finance practice. It is highly recommended to learn the material via the online Stanford class based on this book (it is a live course, but recent semesters are archived for on-demand use). Course and book are free, as are all software and data. NOTE: this class is taught by Tibshirani & Hastie through Stanford’s Lagunita program, and should not be confused with Andrew Ng’s Coursera Machine Learning class. Taking both isn’t a waste, but if forced to choose, Statistical Learning is more compatible with Quant Finance. (CONTINUED)
  42. 42. Appendix 2: Next Steps For Learning (cont) Learning Machine Learning The Full Reference: Statistical Learning Hastie, Tibshirani, and Friedman, The Elements of Statistical Learning, 2ed. The first edition of this book was the culmination of a ten-year transformation beginning in the early 1990’s that resulted in both a rigorous grounding of ML in statistical method as well as firm establishment of Statistical Data Analysis first proposed by John Tukey, where proofs about algorithms replace proofs about equations, reducing reliance on closed-form formulas and restrictive assumptions on data distributions. While the authors were key figures in that movement, they carefully cover the breadth of the statistical learning field and its contributors. Note: Intro to Statistical Learning is culled from this book. Supplement: Hastie, Tibshirani, & Wainwright, Statistical Learning with Sparsity. Addresses the special case (or, perhaps, the common case) of too many candidate explanatory factors and not enough data. Machine Learning for Quants, Hands On: 800 pages of Next Steps Goodfellow, Bengio, and Courville, Deep Learning. This book assumes minimum knowledge equivalent to understanding all concepts in Christian & Griffiths, Intro to Statistical Learning, and undergrad-level calculus. Requires Python (learnable on-the-fly). Includes 200-page intro to linear algebra and differential tensor calculus. Teaches math, statistical, and learning concepts, and requires large projects using TensorFlow.
  43. 43. Appendix 2B: Computer Age Reference Statistics in the Computer Age: The New Classical Statistics Efron, Brad and T. Hastie, Computer Age Statistical Inference. Twenty-five years after Efron & Tibshirani’s seminal article, this book offers a full treatment of classical and modern statistical inference. Rather than being focused primarily on Statistical Learning, it is intended to serve as a modern-age text on statistics in general for teaching the basic curriculum.
  44. 44. contact: walter.tackett@nE12.com
