  1. 1. Machine Learning and AI in Finance 2020 Copyright QuantUniversity LLC. Presented By: Sri Krishnamurthy, CFA, CAP sri@quantuniversity.com www.quantuniversity.com 10/22/2020 CFA Society Poland
  2. 2. 2 Speaker bio • Advisory and Consultancy for Financial Analytics • Prior Experience at MathWorks, Citigroup and Endeca and 25+ financial services and energy customers. • Columnist for the Wilmott Magazine • Author of forthcoming book “The Model-Driven Enterprise” • Teaches AI/ML and Fintech Related topics in the MS and MBA programs at Northeastern University, Boston • Reviewer: Journal of Asset Management Sri Krishnamurthy Founder and CEO QuantUniversity
  3. 3. 3 QuantUniversity • Boston-based Data Science, Quant Finance and Machine Learning training and consulting advisory • Trained more than 1000 students in Quantitative methods, Data Science and Big Data Technologies using MATLAB, Python and R • Building a platform for AI and Machine Learning Exploration and Experimentation
  4. 4. 1. Key trends in AI, Machine Learning & Fintech 2. An intuitive introduction to AI and ML 3. Case studies 4. Slides at: https://academy.qusandbox.com/#/market/5f91612b99aa4a24691da7ef 5. Use code CFAPoland as the registration code Agenda
  5. 5. AI and Machine Learning in Finance
  6. 6. 6 The 4th Industrial revolution is Here! Source: Christoph Roser at AllAboutLean.com As per Wikipedia*, “The 4th Industrial Revolution ….. marked by emerging technology breakthroughs in a number of fields, including robotics, artificial intelligence, nanotechnology, quantum computing, biotechnology, the Internet of Things, the Industrial Internet of Things (IIoT), decentralized consensus, fifth-generation wireless technologies (5G), additive manufacturing/3D printing and fully autonomous vehicles.” * https://en.wikipedia.org/wiki/Fourth_Industrial_Revolution
  7. 7. 7 Scientists are disrupting the way we live! Source: https://www.ladn.eu/tech-a-suivre/mobilite-2030-vehicules-volants-open-data/
  8. 8. 8 Interest in Machine learning continues to grow https://www.wipo.int/edocs/pubdocs/en/wipo_pub_1055.pdf
  9. 9. 9 MACHINE LEARNING AND AI IS REVOLUTIONIZING FINANCE
  10. 10. 10 Market impact at the speed of light! 10
  11. 11. • Machine learning is the scientific study of algorithms and statistical models that computer systems use to effectively perform a specific task without using explicit instructions, relying on patterns and inference instead [1] • Artificial intelligence is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans and animals [1] Defining Machine Learning and AI [1] https://en.wikipedia.org/wiki/Machine_learning [2] Figure source: http://www.fsb.org/wp-content/uploads/P011117.pdf
  12. 12. 12 Machine Learning & AI in finance: A paradigm shift 12 Stochastic Models Factor Models Optimization Risk Factors P/Q Quants Derivative pricing Trading Strategies Simulations Distribution fitting Quant Real-time analytics Predictive analytics Machine Learning RPA NLP Deep Learning Computer Vision Graph Analytics Chatbots Sentiment Analysis Alternative Data Data Scientist
  13. 13. 13 The Virtuous Circle of Machine Learning and AI 13 Smart Algorithms Hardware Data
  14. 14. 14 The rise of Big Data and Data Science 14 Image Source: http://www.ibmbigdatahub.com/sites/default/files/infographic_file/4-Vs-of-big-data.jpg
  15. 15. Smart Algorithms Distributed Computing Frameworks Deep Learning Frameworks 1. Our labeled datasets were thousands of times too small. 2. Our computers were millions of times too slow. 3. We initialized the weights in a stupid way. 4. We used the wrong type of non-linearity. - Geoff Hinton “Capital One was able to determine fraudulent credit card applications in 100 milliseconds”* * http://go.databricks.com/hubfs/pdfs/Databricks-for-FinTech-170306.pdf
  16. 16. Hardware Speed up calculations with 1000s of processors Scale computations with practically unlimited compute power
  17. 17. 17 “Financial Technologies or “Fintech” is used to describe a variety of innovative business models and emerging technologies that have the potential to transform the financial services industry ” Technology drives finance! https://www.iosco.org/library/pubdocs/pdf/IOSCOPD554.pdf
  18. 18. 18 http://www.analyticscertificate.com/fintech/
  19. 19. 19 http://www.analyticscertificate.com/fintech/
  20. 20. 20 http://www.analyticscertificate.com/fintech/
  21. 21. 21 http://www.analyticscertificate.com/fintech/
  23. 23. Risk Systems That Read® • Northfield uses machine learning based analysis of news text to describe how current conditions in financial markets are different than usual. • Typically, over 8000 articles per day containing more than 20,000 “topics” (companies, industries, countries) are processed. • The nature and magnitudes of these differences are used to revise expectations of financial market risks for all global equities and credit instruments on a daily basis.
  24. 24. 1. Leveraging large and diverse datasets for investment decision making at J.P. Morgan [1] 2. Improving quantitative investing at AQR [2] 3. Using sandboxes and labs to further innovation in fintech at Fidelity [3] 4. Use of AI and ML increasing in asset management from idea generation to execution - Wells Fargo [4] Additional Use cases [1] https://www.jpmorgan.com/global/cib/research/investment-decisions-using-machine-learning-ai [2] https://www.aqr.com/Learning-Center/Machine-Learning [3] https://www.fidelitylabs.com/ [4] https://www08.wellsfargomedia.com/assets/pdf/personal/investing/investment-institute/IG_Machines_Are_Coming_ADA.pdf
  25. 25. 25 Source: https://www.cbinsights.com/research/artificial-intelligence-top-startups/
  26. 26. • Automation to increase • Digital transformation and move to the cloud finally happening • Use of synthetic data to increase • Edge cases of AI put to the truth test! • Fintechs feeling the pressure to prove themselves! • Human-in-the-loop AI to regain focus! The changes have been drastic and sudden! What’s in store for the industry is yet to be seen! What does COVID-19 mean for the adoption of AI and ML in financial services?
  28. 28. 29 Let’s get under the hood 29 Source: https://www.pikrepo.com/fcsda/yellow-hot-rod-car-with-hood-open
  29. 29. Machine Learning Workflow Data Scraping/ Ingestion Data Exploration Data Cleansing and Processing Feature Engineering Model Evaluation & Tuning Model Selection Model Deployment/ Inference Supervised Unsupervised Modeling Data Engineer, Dev Ops Engineer Data Scientist/Quants Software/Web Engineer • AutoML • Model Validation • Interpretability Robotic Process Automation (RPA) (Microservices, Pipelines) • SW: Web/REST API • HW: GPU, Cloud • Monitoring • Regression • KNN • Decision Trees • Naive Bayes • Neural Networks • Ensembles • Clustering • PCA • Autoencoder • RMS • MAPE • MAE • Confusion Matrix • Precision/Recall • ROC • Hyper-parameter tuning • Parameter Grids Risk Management/ Compliance (All stages) Analysts & Decision Makers
  30. 30. 31 1. Data 2. Goals 3. Machine learning algorithms 4. Process 5. Performance evaluation Key steps involved
  31. 31. 33 Dataset, variable and Observations Dataset: A rectangular array with Rows as observations and columns as variables Variable: A characteristic of members of a population ( Age, State etc.) Observation: List of Variable values for a member of the population
  32. 32. Variables A variable could be: ▫ Categorical - Yes/No flags - AAA, BB ratings for bonds ▫ Numerical - 35 mpg - $170K salary
  33. 33. 35 Longitudinal ▫ Observations are dependent ▫ Temporal-continuity is required Cross-sectional ▫ Observations are independent Datasets
  34. 34. 36 Data Cross sectional Numerical Categorical Longitudinal Numerical Summary 36
  35. 35. 38 • Descriptive Statistics ▫ Goal is to describe the data at hand ▫ Backward-looking ▫ Statistical techniques employed here • Predictive Analytics ▫ Goal is to use historical data to build a model for prediction ▫ Forward-looking ▫ Machine learning & AI techniques employed here Goal 38
  36. 36. • Given a dataset, build a model that captures the similarities in different observations and assigns them to different buckets - Clustering • Given a set of variables, predict the value of another variable in a given data set - Prediction ▫ Predict salaries given work experience, education etc. ▫ Predict whether a loan would be approved given FICO score, current loans, employment status etc. Predictive Analytics: Cross-sectional datasets
  37. 37. 40 Goal Descriptive Statistics Cross sectional Numerical Categorical Numerical vs Categorical Categorical vs Categorical Numerical vs Numerical Time series Predictive Analytics Cross- sectional Segmentation Prediction Predict a number Predict a category Time-series Summary 40
  38. 38. 42 Machine Learning Unsupervised Supervised Reinforcement Semi-Supervised Machine Learning
  39. 39. 43 Goal Descriptive Statistics Cross sectional Numerical Categorical Numerical vs Categorical Categorical vs Categorical Numerical vs Numerical Time series Predictive Analytics Cross- sectional Segmentation Prediction Predict a number Predict a category Time-series Machine Learning Algorithms 43
  40. 40. Supervised Algorithms ▫ Given a set of variables $x_i$, predict the value of another variable $y$ in a given data set such that ▫ If y is numeric => Prediction ▫ If y is categorical => Classification ▫ Example: Given that a customer’s Debt-to-Income ratio increased 20%, what are the chances he/she would default in 3 months? Machine Learning x1, x2, x3… -> Model F(X) -> y
  41. 41. Unsupervised Algorithms ▫ Given a dataset with variables $x_i$, build a model that captures the similarities in different observations and assigns them to different buckets => Clustering ▫ Example: Given a list of emerging market stocks, can we segment them into three buckets? Machine Learning Obs1, Obs2, Obs3 etc. -> Model -> Obs1: Class 1, Obs2: Class 2, Obs3: Class 1
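The bucketing idea on the slide above can be sketched with a tiny one-dimensional k-means. This is only an illustration under stated assumptions: the volatility figures are made up, the function name `kmeans_1d` is hypothetical, and the deterministic starting centroids are a convenience choice (it needs k of at least 2), not something taken from the talk.

```python
def kmeans_1d(values, k=3, iterations=20):
    """Group numbers into k buckets by repeatedly assigning each value to
    its nearest centroid and then moving each centroid to its bucket mean."""
    ordered = sorted(values)
    # Deterministic start: spread the initial centroids across the sorted data.
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every value.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return labels, centroids


# Hypothetical annualised volatilities for nine emerging-market stocks.
volatility = [0.10, 0.11, 0.12, 0.30, 0.31, 0.29, 0.60, 0.62, 0.58]
labels, centroids = kmeans_1d(volatility, k=3)
print(labels)
```

No target variable is supplied anywhere: the three buckets come only from how similar the observations are to each other, which is what separates this from the supervised case.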
  42. 42. • Parametric models ▫ Assume some functional form ▫ Fit coefficients • Examples: Linear Regression, Neural Networks Supervised Learning models - Prediction $Y = \beta_0 + \beta_1 X_1$ Linear Regression Model Neural network Model
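As a minimal sketch of "assume a functional form, fit coefficients", the one-variable linear model can be fitted with the closed-form least-squares estimates. The data below are invented and exactly linear so the result is easy to check by hand; the function name `fit_simple_ols` is hypothetical.

```python
def fit_simple_ols(x, y):
    """Fit y = b0 + b1 * x by ordinary least squares (closed form)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = sum of cross-deviations / sum of squared deviations of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    # Intercept makes the fitted line pass through the point of means.
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Made-up data: salary (in $K) against years of experience.
years_experience = [1.0, 2.0, 3.0, 4.0, 5.0]
salary_k = [45.0, 50.0, 55.0, 60.0, 65.0]

b0, b1 = fit_simple_ols(years_experience, salary_k)
predicted = b0 + b1 * 6.0  # prediction for 6 years of experience
print(b0, b1, predicted)
```

The whole model is summarised by two numbers (intercept and slope), which is what makes it parametric.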
  43. 43. 47 • Non-Parametric models ▫ No functional form assumed • Examples : K-nearest neighbors, Decision Trees Supervised Learning models 47 K-nearest neighbor Model Decision tree Model
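For contrast with the parametric case, here is a rough sketch of a k-nearest-neighbor regressor on a single feature. No functional form is fitted: a prediction is just the average target of the closest training observations. The data and the function name `knn_predict` are illustrative assumptions.

```python
def knn_predict(train_x, train_y, query, k=3):
    """Predict by averaging the targets of the k training points
    whose feature value is closest to the query."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: abs(pair[0] - query))
    nearest = ranked[:k]
    return sum(target for _, target in nearest) / k


# Made-up training data with two well-separated groups.
train_x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
train_y = [10.0, 12.0, 14.0, 50.0, 52.0, 54.0]

low = knn_predict(train_x, train_y, query=2.2)
high = knn_predict(train_x, train_y, query=11.0)
print(low, high)
```

There are no coefficients to inspect here; the "model" is the training data itself plus the choice of k.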
  44. 44. Machine Learning Algorithms: Supervised -> Prediction (Parametric: Linear Regression, Neural Networks; Non-parametric: KNN, Decision Trees) and Classification (Parametric: Logistic Regression, Neural Networks; Non-parametric: Decision Trees, KNN); Unsupervised algorithms -> K-means, Association rule mining
  45. 45. Machine Learning movers and shakers Deep Learning: TensorFlow, PyTorch (DNN, CNN, LSTM, GAN) Automatic Machine Learning: DataRobot, H2O.ai, auto-sklearn, AutoKeras Ensemble Learning: Bagging, Boosting Natural Language Processing: NLTK, HuggingFace
  46. 46. 50 http://www.asimovinstitute.org/neural-network-zoo/
  47. 47. 52 The Process 52 Data ingestion Data cleansing Feature engineering Training and testing Model building Model selection
  48. 48. • What transformations do I need for the x and y variables? • Which are the best features to use? ▫ Dimension Reduction - PCA ▫ Best subset selection - Forward selection - Backward elimination - Stepwise regression Feature Engineering
  49. 49. 54 Data Training 80% Testing 20% Training the model 54
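The 80/20 split can be sketched in a few lines. This is a minimal illustration, assuming cross-sectional data where rows are independent; the function name `train_test_split` here is a hand-written helper, and the seed is an arbitrary choice made only for reproducibility.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and hold out a fraction for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    # First n_test shuffled rows form the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]


observations = list(range(100))  # stand-in for 100 observations
train, test = train_test_split(observations)
print(len(train), len(test))
```

For longitudinal data, where observations are dependent and temporal continuity matters, a random shuffle like this would leak future information into training; a chronological cut (earliest 80% for training, latest 20% for testing) is the usual alternative.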
  50. 50. 56 Evaluating Machine learning algorithms Supervised - Prediction R-square RMS MAE MAPE Supervised- Classification Confusion Matrix ROC Curves Evaluation framework 56
  51. 51. • Fit measures in classical regression modeling: • Adjusted $R^2$ has been adjusted for the number of predictors. It increases only when the improvement in the model is more than one would expect to see by chance ($p$ is the total number of explanatory variables, $n$ the number of observations, $\hat{y}_i$ the predicted value and $\bar{y}$ the mean of the observed values) $\text{Adjusted } R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 / (n - p - 1)}{\sum_{i=1}^{n} (y_i - \bar{y})^2 / (n - 1)}$ • MAE or MAD (mean absolute error/deviation) gives the magnitude of the average absolute error $\text{MAE} = \frac{\sum_{i=1}^{n} |e_i|}{n}$, where $e_i = y_i - \hat{y}_i$ Prediction Accuracy Measures
  52. 52. ▫ MAPE (mean absolute percentage error) gives a percentage score of how predictions deviate on average $\text{MAPE} = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{e_i}{y_i} \right| \times 100\%$ • RMSE (root-mean-squared error) is computed on the training and validation data $\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} e_i^2}$ Prediction Accuracy Measures
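The four prediction accuracy measures can be computed directly from their definitions. The sketch below uses invented actual and predicted values, takes the error as actual minus predicted, and assumes a model with a single explanatory variable (p = 1) for the adjusted R-squared.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root-mean-squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def adjusted_r2(actual, predicted, p):
    """Adjusted R-squared for a model with p explanatory variables."""
    n = len(actual)
    mean_actual = sum(actual) / n
    sse = sum((a - f) ** 2 for a, f in zip(actual, predicted))  # residual sum of squares
    sst = sum((a - mean_actual) ** 2 for a in actual)           # total sum of squares
    return 1.0 - (sse / (n - p - 1)) / (sst / (n - 1))


# Made-up values: every prediction is off by exactly 10.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

print(mae(actual, predicted), mape(actual, predicted))
print(rmse(actual, predicted), adjusted_r2(actual, predicted, p=1))
```

MAE and RMSE are in the units of the target, while MAPE is scale-free, so the same absolute miss of 10 counts for more on the small observations than on the large ones.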
  53. 53. 59 1. Data 2. Goals 3. Machine learning algorithms 4. Process 5. Performance Evaluation Recap
  54. 54. Machine Learning Workflow Data Scraping/ Ingestion Data Exploration Data Cleansing and Processing Feature Engineering Model Evaluation & Tuning Model Selection Model Deployment/ Inference Supervised Unsupervised Modeling Data Engineer, Dev Ops Engineer Data Scientist/Quants Software/Web Engineer • AutoML • Model Validation • Interpretability Robotic Process Automation (RPA) (Microservices, Pipelines) • SW: Web/REST API • HW: GPU, Cloud • Monitoring • Regression • KNN • Decision Trees • Naive Bayes • Neural Networks • Ensembles • Clustering • PCA • Autoencoder • RMS • MAPE • MAE • Confusion Matrix • Precision/Recall • ROC • Hyper-parameter tuning • Parameter Grids Risk Management/ Compliance (All stages) Analysts & Decision Makers
  57. 57. 63 Claim: • Machine learning is better for fraud detection, looking for arbitrage opportunities and trade execution Caution: • Beware of imbalanced class problems • A model that gives 99% accuracy may still not be good enough 1. Machine learning is not a generic solution to all problems
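The caution about 99% accuracy is easy to reproduce. In this sketch the labels are synthetic (10 fraud cases in 1,000 transactions, an assumed ratio chosen for illustration) and the "model" simply predicts that nothing is fraud.

```python
def confusion_counts(actual, predicted, positive=1):
    """Return (true positives, false positives, false negatives, true negatives)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    return tp, fp, fn, tn


# Synthetic, highly imbalanced labels: 1 = fraud, 0 = legitimate.
actual = [1] * 10 + [0] * 990
always_legitimate = [0] * 1000  # a "model" that never flags fraud

tp, fp, fn, tn = confusion_counts(actual, always_legitimate)
accuracy = (tp + tn) / len(actual)
recall = tp / (tp + fn) if (tp + fn) else 0.0
print(accuracy, recall)
```

The accuracy looks excellent while recall on the rare class is zero, which is why the confusion matrix and precision/recall are listed alongside accuracy in the evaluation framework.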
  58. 58. 64 Claim: • Our models work on datasets we have tested on Caution: • Do we have enough data? • How do we handle bias in datasets? • Beware of overfitting • Historical Analysis is not Prediction 2. A prototype model is not your production model
  59. 59. AI and Machine Learning in Production https://www.itnews.com.au/news/hsbc-societe-generale-run-into-ais-production-problems-477966 Kristy Roth from HSBC: “It’s been somewhat easy - in a funny way - to get going using sample data, [but] then you hit the real problems,” Roth said. “I think our early track record on PoCs or pilots hides a little bit the underlying issues.” Matt Davey from Societe Generale: “We’ve done quite a bit of work with RPA recently and I have to say we’ve been a bit disillusioned with that experience,” “the PoC is the easy bit: it’s how you get that into production and shift the balance”
  60. 60. 66 Claim: • It works. We don’t know how! Caution: • It’s still not a proven science • Interpretability or “auditability” of models is important • Transparency in codebase is paramount with the proliferation of opensource tools • Skilled data scientists who are knowledgeable about algorithms and their appropriate usage are key to successful adoption 3. We are just getting started!
  61. 61. Claim: • Machine Learning models are more accurate than traditional models Caution: • Is accuracy the right metric? • How do we evaluate the model? RMS or $R^2$? • How does the model behave in different regimes? 4. Choose the right metrics for evaluation
  62. 62. Claim: • Machine Learning and AI will replace humans in most applications Caution: • Beware of the hype! • Just because it worked sometimes doesn’t mean that the organization can be on autopilot • Will we have true AI or Augmented Intelligence? • Model risk and robust risk management is paramount to the success of the organization. • We are just getting started! 5. The Robots are coming! https://www.bloomberg.com/news/articles/2017-10-20/automation-starts-to-sweep-wall-street-with-tons-of-glitches
  Alternative investments: Interest rate prediction for Peer-to-Peer marketplaces using ML techniques
  65. 65. 71 1. Case Intro 2. Data Exploration of the Credit risk data set 3. Problem Definition and Machine learning 4. Performance Evaluation 5. Deployment Case study
  66. 66. 72 Credit decisions Credit-scoring models and techniques assess the risk in lending to customers. Typical decisions: • Grant credit/not to new applicants • Increasing/Decreasing spending limits • Increasing/Decreasing lending rates • What new products can be given to existing applicants ?
  67. 67. How does LendingClub work? https://www.lendingclub.com/public/how-peer-lending-works.action
  68. 68. 74 • How much should I expect as interest? • Is my borrower credit worthy? • How much interest would a similar borrower pay? • What is the repayment and default rate for a similar borrower? Investor’s big decisions
  69. 69. 75 The Data 75 https://www.kaggle.com/wendykan/lending-club-loan-data
  70. 70. Credit Risk pipeline Stage 1: Data Ingestion from Lending Club Stage 2: Pre-Processing Stage 3: Feature Engineering Stage 4: Model Development and Tuning Stage 5: Model Deployment
  73. 73. Not all scenarios have played out • Stress scenarios • What-if scenarios Challenges with real datasets Figure ref: http://www.actuaries.org/CTTEES_SOLV/Documents/StressTestingPaper.pdf
  74. 74. 80 Missing values • Missing at random • Missing sequences • Need data to fill frames Challenges with real datasets
  75. 75. 81 • Access ▫ Hard to find ▫ Rare class problems ▫ Privacy concerns making it difficult to share Challenges with real datasets
  76. 76. 82 Imbalanced • Need more samples of rare class • Need proxies for data points that were not observed or recorded Challenges with real datasets
  77. 77. 83 Labels • Human labeling is hard • Synthetic label generators Challenges with real datasets
  78. 78. 84 VAE https://arxiv.org/pdf/1808.06444.pdf
  79. 79. GAN https://developers.google.com/machine-learning/gan/gan_structure
  81. 81. 87 Demo: Synthetic data generation Extreme scenario generation
  Investments: Clustering of stocks
  84. 84. 90 1. Case Intro 2. Data Exploration of WIG20 stock data 3. Problem Definition and Machine learning 4. Deployment Case study
  85. 85. 91 Clustering stocks • Which stocks are like each other? • Are growth stocks behaving like growth stocks or value stocks? • Does the time series of prices & returns reveal which stocks are close to each other?
  86. 86. Clustering workflow Stage 1: Data Ingestion Stage 2: Pre-Processing Stage 3: Clustering Stage 4: Visualization & analysis Stage 5: Model Deployment
  87. 87. • Clustering ▫ How do we define distance between stocks? - Correlation - 1 - Correlation^2 (https://arxiv.org/abs/cond-mat/9802256) ▫ Hierarchical Clustering Methodology
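A rough sketch of the methodology on this slide: turn return correlations into a distance and merge the closest stocks bottom-up. The tickers and returns below are invented, the distance is the "1 - correlation squared" form as written on the slide, and single linkage is an assumed choice, since the slide does not say which linkage rule was used.

```python
import math


def pearson(a, b):
    """Pearson correlation between two equally long return series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def correlation_distance(a, b):
    """Distance that is 0 for perfectly (anti-)correlated series, 1 for uncorrelated."""
    return 1.0 - pearson(a, b) ** 2


def single_linkage(series, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = [[name] for name in series]

    def cluster_distance(c1, c2):
        # Single linkage: distance between the closest pair of members.
        return min(correlation_distance(series[a], series[b]) for a in c1 for b in c2)

    while len(clusters) > n_clusters:
        candidates = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(candidates, key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical daily returns for four made-up tickers.
returns = {
    "BANK_A": [0.010, -0.020, 0.015, 0.000, -0.010],
    "BANK_B": [0.012, -0.018, 0.013, 0.001, -0.011],
    "ENERGY_A": [-0.005, 0.020, 0.000, -0.015, 0.010],
    "ENERGY_B": [-0.004, 0.022, 0.001, -0.013, 0.009],
}

clusters = single_linkage(returns, n_clusters=2)
print(clusters)
```

One design consequence worth noting: because the correlation is squared, two stocks that move in opposite directions are treated as close, so this distance measures strength of co-movement rather than its sign.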
  88. 88. • Clustering ▫ How do we define distance between stocks? - Covariance Estimation - https://scikit-learn.org/stable/modules/covariance.html#sparse-inverse-covariance ▫ Clustering: Affinity propagation - https://scikit-learn.org/stable/modules/clustering.html#affinity-propagation ▫ Visualizing relationships with manifold learning - https://scikit-learn.org/stable/modules/manifold.html#manifold Methodology
  89. 89. 95 ▫ Visualizing relationships Methodology
  90. 90. 96 ▫ Visualizing relationships https://scikit-learn.org/stable/modules/manifold.html#manifold Methodology
  93. 93. Acknowledgements 1. http://awesome-streamlit.org/ 2. https://scikit-learn.org/stable/auto_examples/applications/plot_stock_market.html
  95. 95. 101 • Understanding sentiments in Earnings call transcripts Goal
  96. 96. 102 • Interpreting emotions • Labeling data Challenges
  97. 97. 103 What is NLP ? AI Linguistics Computer Science
  98. 98. 104 • Q/A • Dialog systems - Chatbots • Topic summarization • Sentiment analysis • Classification • Keyword extraction - Search • Information extraction – Prices, Dates, People etc. • Tone Analysis • Machine Translation • Document comparison – Similar/Dissimilar Sample applications
  99. 99. 105 NLP in Finance
  100. 100. • If computers can understand language, it opens huge possibilities ▫ Read and summarize ▫ Translate ▫ Describe what’s happening ▫ Understand commands ▫ Answer questions ▫ Respond in plain language Language allows understanding
  101. 101. 107 • Describe rules of grammar • Describe meanings of words and their relationships • …including all the special cases • ...and idioms • ...and special cases for the idioms • ... • ...understand language! Traditional language AI https://en.wikipedia.org/wiki/Formal_language
  102. 102. 108 What is NLP ? Jumping NLP Curves https://ieeexplore.ieee.org/document/6786458/
  103. 103. 109 Q: What’s hard about writing programs to understand text?
  104. 104. 110 • Ambiguity: ▫ “ground” ▫ “jaguar” ▫ “The car hit the pole while it was moving” ▫ “One morning I shot an elephant in my pajamas. How he got into my pajamas, I’ll never know.” ▫ “The tank is full of soldiers.” “The tank is full of nitrogen.” Language is hard to deal with
  106. 106. 112 • Many ways to say the same thing ▫ “the same thing can be said in many ways” ▫ “language is versatile” ▫ “The same words can be arranged in many different ways to express the same idea” ▫ … Language is hard to deal with
  107. 107. 113 • APIs • Human Insight • Expert Knowledge • Build your own Options?
  108. 108. 114 NLP pipeline Data Ingestion from Edgar Pre-Processing Invoking APIs to label data Compare APIs Build a new model for sentiment Analysis
  110. 110. Register at Qufallschool.splashthat.com 116
  111. 111. Thank you! Contact: Sri Krishnamurthy, CFA, CAP, Founder and CEO, QuantUniversity LLC. srikrishnamurthy www.QuantUniversity.com Information, data and drawings embodied in this presentation are strictly the property of QuantUniversity LLC. and shall not be distributed or used in any other publication without the prior written consent of QuantUniversity LLC.

