A Master Class in AI and Machine Learning
for Financial Professionals
2019 Copyright QuantUniversity LLC.
Presented By:
Sri Krishnamurthy, CFA, CAP
sri@quantuniversity.com
www.analyticscertificate.com
05/02/2019
ODSC-East
Boston MA
2
Speaker bio
• Advisory and Consultancy for Financial
Analytics
• Prior Experience at MathWorks, Citigroup
and Endeca and 25+ financial services and
energy customers.
• Columnist for the Wilmott Magazine
• Author of forthcoming book
“Financial Modeling: A case study approach”
published by Wiley
• Teaches Analytics in the Babson College MBA
program and at Northeastern University,
Boston
• Reviewer: Journal of Asset Management
Sri Krishnamurthy
Founder and CEO
QuantUniversity
3
About www.QuantUniversity.com
• Boston-based Data Science, Quant
Finance and Machine Learning
training and consulting advisory
• Trained more than 1000 students in
Quantitative methods, Data Science
and Big Data Technologies using
MATLAB, Python and R
• Building a platform for AI
and Machine Learning Enablement
in the Enterprise
• Key trends in AI and machine learning
• 5 things you need to know about machine learning
• Machine Learning in 30 minutes
• Building a ML application in 10 steps
• Case studies
Agenda
AI and Machine Learning in Finance
6
My journey into AI/ML in finance in 5 pictures
7
The 4th Industrial revolution is Here!
Source: Christoph Roser at AllAboutLean.com
As per Wikipedia*, “The 4th Industrial Revolution ….. marked by emerging technology breakthroughs in a
number of fields, including robotics, artificial intelligence, nanotechnology, quantum computing, biotechnology,
the Internet of Things, the Industrial Internet of Things (IIoT), decentralized consensus, fifth-generation wireless
technologies (5G), additive manufacturing/3D printing and fully autonomous vehicles.”
* https://en.wikipedia.org/wiki/Fourth_Industrial_Revolution
8
Your challenge is to design an artificial intelligence and machine learning (AI/ML)
framework capable of flying a drone through several professional drone racing
courses without human intervention or navigational pre-programming.
AI is no longer science fiction!
Source: https://www.lockheedmartin.com/en-us/news/events/ai-innovation-challenge.html
9
Scientists are disrupting the way we live!
Source: https://www.ladn.eu/tech-a-suivre/mobilite-2030-vehicules-volants-open-data/
10
Interest in Machine learning continues to grow
https://www.wipo.int/edocs/pubdocs/en/wipo_pub_1055.pdf
11
Source: https://www.bbc.com/news/technology-35785875
12
MACHINE LEARNING AND AI ARE REVOLUTIONIZING FINANCE
13
Market impact at the speed of light!
13
14
Machine Learning & AI in finance: A paradigm shift
14
Quant
• Stochastic Models
• Factor Models
• Optimization
• Risk Factors
• P/Q Quants
• Derivative pricing
• Trading Strategies
• Simulations
• Distribution fitting
Data Scientist
• Real-time analytics
• Predictive analytics
• Machine Learning
• RPA
• NLP
• Deep Learning
• Computer Vision
• Graph Analytics
• Chatbots
• Sentiment Analysis
• Alternative Data
15
CFA Institute has adopted Fintech and AI content in its curriculum
Ref: https://www.cfainstitute.org/-/media/documents/support/programs/cfa/cfa-program-level-iii-fintech-in-investment-management.ashx
16
The Virtuous Circle of
Machine Learning and AI
16
Smart
Algorithms
Hardware
Data
17
The rise of Big Data and Data Science
17
Image Source: http://www.ibmbigdatahub.com/sites/default/files/infographic_file/4-Vs-of-big-data.jpg
18
Smart Algorithms
18
Distributed Computing Frameworks and Deep Learning Frameworks
1. Our labeled datasets were thousands of times too
small.
2. Our computers were millions of times too slow.
3. We initialized the weights in a stupid way.
4. We used the wrong type of non-linearity.
- Geoff Hinton
“Capital One was able to determine fraudulent credit
card applications in 100 milliseconds”*
* http://go.databricks.com/hubfs/pdfs/Databricks-for-FinTech-170306.pdf
19
Hardware
Speed up calculations with
1000s of processors
Scale computations with
infinite compute power
• Bank of America
• Ravenpack
• Northfield
Examples of how AI and ML are used in Finance
#Disrupt19
21
Use Cases in NLP
Risk Management
Power risk models by
informing clients about
their portfolio exposures
to headline risk and
public disclosures.
Compliance
Reduce costs in trade
surveillance and
compliance by
reducing the number
of false-positives
chased by analysts
and officers.
Benchmarks
Create innovative
investable indexes
powered by AI and
Big Data.
Alpha Generation
Create trading signals
by ingesting event and
sentiment data; identify
securities that are likely
to suffer from short
squeezes or reversals.
Risk Systems That Read®
• Northfield uses machine learning based analysis of news text
to describe how current conditions in financial markets are
different than usual.
• Typically, over 8000 articles per day containing more than
20,000 “topics” (companies, industries, countries) are
processed.
• The nature and magnitudes of these differences are used to
revise expectations of financial market risks for all global
equities and credit instruments on a daily basis.
Building data science applications that use AI/ML in 10 steps
25
1. Articulate your business problem
Data science in 10 steps
26
2. The Data questions
1. Do you know what data you need?
2. Do you know if the data is available?
3. Do you have the data?
4. Do you have the right data?
5. Will you continue to have the data?
Data science in 10 steps
27
3. Develop a data acquisition and data prep strategy
1. Do you know how to get the data?
2. Who gets the data?
3. How do you process it?
4. How do you access it?
5. How do you version and govern the data?
Data science in 10 steps
28
4. Explore and evaluate your data and get it in the right format
Data science in 10 steps
29
5. Define your goal:
1. Summarization
2. Fact finding
3. Understanding relationships
4. Prediction
Data science in 10 steps
30
6. Shortlist (not “Choose” ) the
techniques/methodologies/algorithms
Data science in 10 steps
31
7. Evaluate/establish business constraints and narrow down your
choices of techniques/methodologies/algorithms
1. Cloud/Cost/Expertise/Cost-Value
2. Build/buy/access
Data science in 10 steps
Outcomes
Time
Quality
Cost
32
8. Establish criteria to know if the methodology/models/algorithms
work
1. Is the process replicable?
2. What performance metrics do we choose?
3. Can you evaluate the performance and validate if the models meet
the criteria?
4. Does it provide business value?
Data science in 10 steps
33
9. Fine tune your algorithms and algorithm selection
1. Hyper parameter tuning
2. Bias-variance tradeoff
3. Handling imbalanced class problems
4. Ensemble techniques
5. AutoML
Data science in 10 steps
https://support.sas.com/resources/papers/proceedings17/SAS0514-2017.pdf
34
10. How will this process reach decision makers
1. Deployment choices (On-prem/Cloud)
2. Frequency of data/model updates
3. Governance/Role/Responsibilities
4. Speed, Scale, Availability, Disaster recovery, Rollback, Pull-Plug
Data science in 10 steps
35
How do you monitor the efficacy of your solution?
1. Retuning
2. Monitoring
3. Model decay
4. Data augmentation
5. Newer innovations
Data science in 10 steps - Bonus
37
Claim:
• Machine learning is good for credit-card fraud detection
Caution:
• Beware of imbalanced class problems
• A model that gives 99% accuracy may still not be good enough
1. Machine learning is not a generic solution to all problems
37
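The caution about imbalanced classes can be illustrated with a minimal sketch. The data below is hypothetical (1,000 transactions, 10 of them fraudulent), and the "model" is a deliberately naive one that never flags fraud; neither comes from the presentation.

```python
# Hypothetical toy data: 1,000 card transactions, 10 of them fraudulent (label 1).
labels = [1] * 10 + [0] * 990

# A naive "model" that always predicts the majority class (no fraud).
predictions = [0] * len(labels)

# Accuracy: share of all cases predicted correctly.
correct = sum(1 for y, p in zip(labels, predictions) if y == p)
accuracy = correct / len(labels)

# Recall (sensitivity): share of actual fraud cases that the model catches.
true_positives = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
recall = true_positives / sum(labels)

print(f"accuracy = {accuracy:.2%}")
print(f"recall   = {recall:.2%}")
```

The naive model reaches 99% accuracy while catching none of the fraud cases, which is why accuracy alone can mislead when one class is rare.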
38
Claim:
• Our models work on all the datasets we have tested on
Caution:
• Do we have enough data?
• How do we handle bias in datasets?
• Beware of overfitting
• Historical Analysis is not Prediction
2. A prototype model is not a production model
38
39
Prototyping vs Production: The reality
https://www.itnews.com.au/news/hsbc-societe-generale-run-into-ais-production-problems-477966
Kristy Roth from HSBC:
“It’s been somewhat easy - in a funny way - to
get going using sample data, [but] then you hit
the real problems,” Roth said.
“I think our early track record on PoCs or pilots
hides a little bit the underlying issues.”
Matt Davey from Societe Generale:
“We’ve done quite a bit of work with RPA
recently and I have to say we’ve been a bit
disillusioned with that experience,”
“the PoC is the easy bit: it’s how you get that
into production and shift the balance”
40
Claim:
• It works. We don’t know how!
Caution:
• Lots of heuristics; still not a proven science
• Interpretability, Fairness, Auditability of models are important
• Beware of black boxes; Transparency in codebase is paramount
with the proliferation of opensource tools
• Skilled data scientists with knowledge of algorithms and their
appropriate usage are key to successful adoption
3. We are just getting started!
40
41
Claim:
• Machine Learning models are more
accurate than traditional models
Caution:
• Is accuracy the right metric?
• How do we evaluate the model? Accuracy
or F1-Score?
• How does the model behave in different
regimes?
4. Choose the right metrics for evaluation
41
Source:
https://en.wikipedia.org/wiki/Confusion_matrix
42
Claim:
• Machine Learning and AI will replace humans
in most applications
Caution:
• Just because it worked sometimes doesn’t
mean that the organization can be on
autopilot
• Will we have true AI or Augmented
Intelligence?
• Model risk and robust risk management are
paramount to the success of the
organization.
• We are just getting started!
5. Are we there yet?
42
https://www.bloomberg.com/news/articles/2017-10-20/automation-starts-to-sweep-wall-street-with-tons-of-glitches
43
Can Machine Learning algorithms be gamed?
https://www.youtube.com/watch?time_continue=36&v=MIbFvK2S9g8
https://arxiv.org/abs/1904.08653
45
Machine Learning
Unsupervised Supervised
Reinforcement Semi-Supervised
Machine Learning
46
Goal
• Descriptive Statistics
▫ Cross sectional: Numerical; Categorical; Numerical vs Categorical; Categorical vs Categorical; Numerical vs Numerical
▫ Time series
• Predictive Analytics
▫ Cross-sectional: Segmentation; Prediction (Predict a number, Predict a category)
▫ Time-series
Machine Learning Algorithms
46
47
Supervised Algorithms
▫ Given a set of variables $x_i$, predict the value of another variable $y$ in
a given data set such that
▫ If $y$ is numeric => Prediction
▫ If $y$ is categorical => Classification
▫ Example: Given that a customer’s Debt-to-Income ratio increased 20%, what are
the chances he/she would default in 3 months?
Machine Learning
47
x1,x2,x3… Model F(X) y
48
Unsupervised Algorithms
▫ Given a dataset with variables $x_i$, build a model that captures the
similarities in different observations and assigns them to different
buckets => Clustering
▫ Example: Given a list of emerging market stocks, can we segment them
into three buckets?
Machine Learning
48
Obs1,
Obs2,Obs3
etc.
Model
Obs1- Class 1
Obs2- Class 2
Obs3- Class 1
49
Supervised
Learning
algorithms
Parametric
models
Non-
Parametric
models
Supervised learning Algorithms - Prediction
49
50
• Parametric models
▫ Assume some functional form
▫ Fit coefficients
• Examples : Linear Regression, Neural Networks
Supervised Learning models - Prediction
50
$y = \beta_0 + \beta_1 x_1$
Linear Regression Model Neural network Model
51
• Non-Parametric models
▫ No functional form assumed
• Examples : K-nearest neighbors, Decision Trees
Supervised Learning models
51
K-nearest neighbor Model Decision tree Model
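As a rough illustration of a non-parametric model, here is a minimal K-nearest-neighbor sketch for predicting a number. No functional form is fitted: the prediction is simply the average target of the k closest training points. The feature names and values are hypothetical, and Euclidean distance is an assumed choice.

```python
import math

def knn_predict(train_x, train_y, query, k=3):
    """Predict a number as the mean target of the k nearest training points."""
    # Rank training indices by Euclidean distance to the query point.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / k

# Hypothetical toy data: (debt-to-income ratio, credit utilization) -> loss rate.
train_x = [(0.1, 0.2), (0.2, 0.1), (0.8, 0.9), (0.9, 0.8), (0.5, 0.5)]
train_y = [0.01, 0.02, 0.20, 0.25, 0.10]

prediction = knn_predict(train_x, train_y, query=(0.85, 0.85), k=2)
print(prediction)
```

The choice of k and of the distance function are the main design decisions; both would normally be tuned on validation data.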
52
• Given estimates $\hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_p$, we can make predictions using
the formula
$$\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \cdots + \hat{\beta}_p x_p$$
• The parameters are estimated using the least squares approach
to minimize the sum of squared errors
$$RSS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
Multiple linear regression
52
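The least squares idea can be sketched for the one-predictor special case, where the coefficients that minimize the sum of squared errors have a closed form. This is a simplified illustration with hypothetical data, not the multiple-regression solver a real project would use.

```python
def fit_simple_ols(x, y):
    """Return (b0, b1) minimizing RSS = sum((y_i - b0 - b1 * x_i)^2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Closed-form least squares solution for a single predictor.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def rss(x, y, b0, b1):
    """Sum of squared errors for the line y = b0 + b1 * x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data lying close to the line y = 2x.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_ols(x, y)
print(b0, b1, rss(x, y, b0, b1))
```

Any other choice of intercept and slope gives a larger sum of squared errors on the same data, which is what "least squares" means.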
53
• Parametric models
▫ Assume some functional form
▫ Fit coefficients
• Examples : Logistic Regression, Neural Networks
Supervised Learning models - Classification
53
Logistic Regression Model Neural network Model
54
• Non-Parametric models
▫ No functional form assumed
• Examples : K-nearest Neighbors, Decision Trees
Supervised Learning models
54
K-nearest neighbor Model Decision tree Model
55
Unsupervised Algorithms
▫ Given a dataset with variables $x_i$, build a model that captures the
similarities in different observations and assigns them to different
buckets => Clustering
Machine Learning
55
Obs1,
Obs2,Obs3
etc.
Model
Obs1- Class 1
Obs2- Class 2
Obs3- Class 1
56
• These methods partition the data into k clusters by assigning each data point to its
closest cluster centroid by minimizing the within-cluster sum of squares (WSS), which
is:
$$WSS = \sum_{k=1}^{K} \sum_{i \in S_k} \sum_{j=1}^{P} (x_{ij} - \mu_{kj})^2$$
where $S_k$ is the set of observations in the kth cluster and $\mu_{kj}$ is the mean of jth
variable of the cluster center of the kth cluster.
• Then, they select the top n points that are the farthest away from their nearest
cluster centers as outliers.
K-means clustering
56
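The clustering and outlier-selection steps described on the K-means slide can be sketched as follows. This is a minimal illustration under stated assumptions: the points and the fixed starting centroids are hypothetical, and a fixed number of iterations replaces a proper convergence check.

```python
import math

def assign(points, centroids):
    """Label each point with the index of its closest centroid."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]

def update(points, labels, centroids):
    """Move each centroid to the mean of the points assigned to it."""
    updated = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if members:
            updated.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
        else:
            updated.append(old)  # leave an empty cluster's centroid where it was
    return updated

def kmeans(points, centroids, iterations=10):
    """Alternate assignment and update steps for a fixed number of iterations."""
    for _ in range(iterations):
        labels = assign(points, centroids)
        centroids = update(points, labels, centroids)
    return assign(points, centroids), centroids

def wss(points, labels, centroids):
    """Within-cluster sum of squares: squared distance of each point to its centroid."""
    return sum(math.dist(p, centroids[label]) ** 2 for p, label in zip(points, labels))

def top_outliers(points, labels, centroids, n=1):
    """Return the n points farthest from their assigned cluster center."""
    ranked = sorted(
        zip(points, labels),
        key=lambda pair: math.dist(pair[0], centroids[pair[1]]),
        reverse=True,
    )
    return [p for p, _ in ranked[:n]]

# Hypothetical 2-D data: two tight groups plus one stray point.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.2),
          (5.0, 5.0), (5.2, 4.8), (4.8, 5.2), (5.0, 8.0)]
initial = [(0.0, 0.0), (6.0, 6.0)]  # fixed starting centroids keep the run deterministic

labels, centroids = kmeans(points, initial)
print(labels)
print(round(wss(points, labels, centroids), 3))
print(top_outliers(points, labels, centroids, n=1))
```

In this toy run the stray point (5.0, 8.0) is reported as the outlier because it lies farthest from its cluster center.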
57
Euclidean distance:
Distance functions
58
Correlation distance:
Distance functions
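The formulas for the two distance functions did not survive in this transcript. The sketch below uses the standard definitions as an assumption: Euclidean distance as the square root of summed squared differences, and correlation distance as one minus the Pearson correlation. The return series are hypothetical.

```python
import math

def euclidean_distance(a, b):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def correlation_distance(a, b):
    """One minus the Pearson correlation of the two series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    return 1.0 - cov / math.sqrt(var_a * var_b)

# Hypothetical daily returns: stock_b moves exactly like stock_a at twice the scale.
stock_a = [0.01, -0.02, 0.03, 0.00]
stock_b = [0.02, -0.04, 0.06, 0.00]

print(euclidean_distance(stock_a, stock_b))
print(correlation_distance(stock_a, stock_b))
```

The two series are far apart in Euclidean terms relative to their size, yet their correlation distance is essentially zero, so the choice of distance function changes which observations a clustering method treats as similar.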
59
Machine Learning
• Supervised
▫ Prediction, Parametric: Linear Regression, Neural Networks
▫ Prediction, Non-parametric: KNN, Decision Trees
▫ Classification, Parametric: Logistic Regression, Neural Networks
▫ Classification, Non-parametric: Decision Trees, KNN
• Unsupervised algorithms
▫ K-means
▫ Associative rule mining
Machine Learning Algorithms
59
60
Anomaly Detection vs Unsupervised Learning
60
61
Machine Learning movers and shakers
Deep
Learning
Automatic
Machine
Learning
Ensemble
Learning
Natural
Language
Processing
62
http://www.asimovinstitute.org/neural-network-zoo/
64
Evaluating Machine learning algorithms
• Supervised - Prediction: R-square, RMS, MAE, MAPE
• Supervised - Classification: Confusion Matrix, ROC Curves
Evaluation framework
64
65
• The prediction error for record i is defined as the difference
between its actual y value and its predicted y value
$$e_i = y_i - \hat{y}_i$$
• $R^2$ indicates how well data fits the statistical model
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
Prediction Accuracy Measures
66
• Fit measures in classical regression modeling:
• Adjusted $R^2$ has been adjusted for the number of predictors. It increases
only when the improvement in the model is more than one would expect to see by
chance (p is the total number of explanatory variables)
$$\text{Adjusted } R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 / (n - p - 1)}{\sum_{i=1}^{n} (y_i - \bar{y})^2 / (n - 1)}$$
• MAE or MAD (mean absolute error/deviation) gives the magnitude of the
average absolute error
$$MAE = \frac{\sum_{i=1}^{n} |e_i|}{n}$$
Prediction Accuracy Measures
67
▫ MAPE (mean absolute percentage error) gives a percentage score of
how predictions deviate on average
$$MAPE = \frac{\sum_{i=1}^{n} |e_i / y_i|}{n} \times 100\%$$
• RMSE (root-mean-squared error) is computed on the training and
validation data
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} e_i^2}$$
Prediction Accuracy Measures
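The prediction accuracy measures from these slides can be computed in a few lines. This is a minimal sketch with hypothetical actual and predicted values; the function name and the parameter `p` (number of predictors) are illustrative choices.

```python
import math

def regression_metrics(actual, predicted, p):
    """Return R^2, adjusted R^2, MAE, MAPE (%) and RMSE; p = number of predictors."""
    n = len(actual)
    errors = [y - y_hat for y, y_hat in zip(actual, predicted)]  # e_i = y_i - y_hat_i
    mean_y = sum(actual) / n
    ss_res = sum(e ** 2 for e in errors)             # sum of squared errors
    ss_tot = sum((y - mean_y) ** 2 for y in actual)  # total sum of squares
    return {
        "r2": 1 - ss_res / ss_tot,
        "adj_r2": 1 - (ss_res / (n - p - 1)) / (ss_tot / (n - 1)),
        "mae": sum(abs(e) for e in errors) / n,
        "mape": 100 * sum(abs(e / y) for e, y in zip(errors, actual)) / n,
        "rmse": math.sqrt(ss_res / n),
    }

# Hypothetical values: every prediction is off by 10 in one direction or the other.
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

metrics = regression_metrics(actual, predicted, p=1)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Note that MAPE is undefined when an actual value is zero, so it suits strictly positive targets such as prices or balances.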
68
• Consider a two-class case with classes $C_1$ and $C_2$
• Classification matrix:
Classification matrix
| Actual Class | Predicted Class $C_1$ | Predicted Class $C_2$ |
| --- | --- | --- |
| $C_1$ | $n_{1,1}$ = number of $C_1$ cases classified correctly | $n_{1,2}$ = number of $C_1$ cases classified incorrectly as $C_2$ |
| $C_2$ | $n_{2,1}$ = number of $C_2$ cases classified incorrectly as $C_1$ | $n_{2,2}$ = number of $C_2$ cases classified correctly |
69
• Estimated misclassification rate (overall error rate) is a main
accuracy measure
$$err = \frac{n_{1,2} + n_{2,1}}{n_{1,1} + n_{1,2} + n_{2,1} + n_{2,2}} = \frac{n_{1,2} + n_{2,1}}{n}$$
• Overall accuracy:
$$accuracy = 1 - err = \frac{n_{1,1} + n_{2,2}}{n}$$
Accuracy Measures
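The error rate and accuracy can be read directly off a two-class classification matrix. The sketch below builds the matrix from hypothetical labels; the class names "default" and "repaid" are illustrative and not from the slides.

```python
def classification_matrix(actual, predicted, classes):
    """Count n[a][p]: number of cases of actual class a predicted as class p."""
    n = {a: {p: 0 for p in classes} for a in classes}
    for a, p in zip(actual, predicted):
        n[a][p] += 1
    return n

classes = ("default", "repaid")  # hypothetical class names
actual = ["default", "default", "default",
          "repaid", "repaid", "repaid", "repaid", "repaid"]
predicted = ["default", "default", "repaid",
             "repaid", "repaid", "repaid", "default", "repaid"]

n = classification_matrix(actual, predicted, classes)
total = len(actual)

# Overall error rate: off-diagonal counts divided by the number of cases.
err = (n["default"]["repaid"] + n["repaid"]["default"]) / total
accuracy = 1 - err

print(n)
print(err, accuracy)
```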
70
• The ROC curve plots the pairs {sensitivity, 1-specificity}
as the cutoff value increases from 0 to 1
• Sensitivity (also called the true positive rate, or
recall in some fields) measures the proportion of
positives that are correctly identified (e.g., the
percentage of sick people who are correctly
identified as having the condition).
• Specificity (also called the true negative rate)
measures the proportion of negatives that are
correctly identified as such (e.g., the percentage of
healthy people who are correctly identified as not
having the condition).
• Better performance is reflected by curves that are
closer to the top left corner
ROC Curve
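The points of a ROC curve can be traced by sweeping the cutoff and recomputing sensitivity and 1-specificity each time. This is a minimal sketch with hypothetical labels and scores; treating a score at or above the cutoff as a positive prediction is an assumed convention.

```python
def roc_points(labels, scores, cutoffs):
    """Return (1 - specificity, sensitivity) for each cutoff.

    A case is predicted positive when its score is >= the cutoff.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for cutoff in cutoffs:
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutoff)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= cutoff)
        sensitivity = tp / positives          # true positive rate
        false_positive_rate = fp / negatives  # equals 1 - specificity
        points.append((false_positive_rate, sensitivity))
    return points

# Hypothetical model scores: higher means "more likely positive".
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.95, 0.85, 0.70, 0.60, 0.55, 0.40, 0.30, 0.10]
cutoffs = [0.0, 0.25, 0.5, 0.75, 1.0]

points = roc_points(labels, scores, cutoffs)
for cutoff, (fpr, tpr) in zip(cutoffs, points):
    print(f"cutoff={cutoff:.2f}  1-specificity={fpr:.2f}  sensitivity={tpr:.2f}")
```

As the cutoff rises from 0 to 1, the curve moves from the top right corner (everything flagged) to the bottom left corner (nothing flagged).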
72
The reproducibility challenge
73
What’s needed for reproducibility
Code Data
Environment Process
74
QuSandbox solution suite for ML/AI applications
Model
Analytics
Studio
QuSandbox
Research
hub
75
Prototype
Standardize
workflow
Productionize
and share
DEMO with QuSandbox
75
QuSandbox Model Analytics Studio ResearchHub
76
1. Data
2. Goals
3. Machine learning algorithms
4. Process
5. Performance Evaluation
6. Model Deployment
Recap
77
Data
Cross
sectional
Numerical Categorical
Longitudinal
Numerical
Handling Data
77
78
Goal
• Descriptive Statistics
▫ Cross sectional: Numerical; Categorical; Numerical vs Categorical; Categorical vs Categorical; Numerical vs Numerical
▫ Time series
• Predictive Analytics
▫ Cross-sectional: Segmentation; Prediction (Predict a number, Predict a category)
▫ Time-series
Goal
78
79
Machine Learning
• Supervised
▫ Prediction, Parametric: Linear Regression, Neural Networks
▫ Prediction, Non-parametric: KNN, Decision Trees
▫ Classification, Parametric: Logistic Regression, Neural Networks
▫ Classification, Non-parametric: Decision Trees, KNN
• Unsupervised algorithms
▫ K-means
▫ Associative rule mining
Machine Learning Algorithms
79
80
The Process
80
• Data ingestion
• Data cleansing
• Feature engineering
• Training and testing
• Model building
• Model selection
81
Evaluating Machine learning algorithms
• Supervised - Prediction: R-square, RMS, MAE, MAPE
• Supervised - Classification: Confusion Matrix, ROC Curves
Evaluation framework
81
Machine Learning Workflow
Data Scraping/
Ingestion
Data
Exploration
Data Cleansing
and Processing
Feature
Engineering
Model
Evaluation
& Tuning
Model
Selection
Model
Deployment/
Inference
Supervised
Unsupervised
Modeling
Data Engineer, Dev Ops Engineer
Data Scientist/Quants, Software/Web Engineer
• AutoML
• Model Validation
• Interpretability
Robotic Process Automation (RPA) (Microservices, Pipelines )
• SW: Web/ Rest API
• HW: GPU, Cloud
• Monitoring
• Regression
• KNN
• Decision Trees
• Naive Bayes
• Neural Networks
• Ensembles
• Clustering
• PCA
• Autoencoder
• RMS
• MAPE
• MAE
• Confusion Matrix
• Precision/Recall
• ROC
• Hyper-parameter
tuning
• Parameter Grids
Risk Management/ Compliance(All stages)
Analysts & Decision Makers
Sentiment Analysis Using Natural Language Processing in Finance
• What is Sentiment Analysis?
• The Case study Setup
• Design Choices
• The Pipeline
• Demo
Agenda
85
What is NLP ?
AI
Linguistics
Computer
Science
86
• Q/A
• Dialog systems - Chatbots
• Topic summarization
• Sentiment analysis
• Classification
• Keyword extraction - Search
• Information extraction – Prices, Dates, People etc.
• Tone Analysis
• Machine Translation
• Document comparison – Similar/Dissimilar
Sample applications
87
NLP in Finance
88
• The process of computationally identifying and categorizing
opinions expressed in a piece of text, especially in order to
determine whether the writer's attitude towards a particular
topic, product, etc. is positive, negative, or neutral.
Sentiment Analysis
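To make the definition concrete, here is a deliberately simple lexicon-based sketch that labels a sentence as positive, negative, or neutral. The word lists are hypothetical and tiny; the case study itself relies on commercial APIs and a custom model, which this sketch does not reproduce.

```python
import re

# Hypothetical mini-lexicon; a real application would use a curated finance
# lexicon or a trained model instead.
POSITIVE = {"growth", "strong", "record", "improved", "beat"}
NEGATIVE = {"decline", "weak", "loss", "impairment", "missed"}

def sentiment(text):
    """Label text positive, negative, or neutral from lexicon word counts."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("We delivered record revenue and strong growth this quarter."))
print(sentiment("Margins were weak and we booked an impairment loss."))
print(sentiment("The call will begin at 9 am."))
```

Word counting ignores negation and context ("not strong" still scores as positive), which is one reason the labeling and interpretation challenges listed in the case study matter.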
89
• Understanding sentiments in Earnings call transcripts
Goal
89
90
• Interpreting emotions
• Labeling data
Options
• APIs
• Human Insight
• Expert Knowledge
• Build your own
Challenges
91
NLP pipeline
Stage 1: Data Ingestion from Edgar
Stage 2: Pre-Processing
Stage 3: Invoking APIs to label data
Stage 4: Compare APIs
Stage 5: Build a new model for sentiment Analysis
• Amazon Comprehend API
• Google API
• Watson API
• Azure API
Credit Risk Decision Making Using Lending Club Data
93
1. Case Intro
2. Data Exploration of the Credit risk data set
3. Problem Definition and Machine learning
4. Performance Evaluation
5. Deployment
Case study 2
94
Credit risk in consumer credit
Credit-scoring models and techniques assess the risk in
lending to customers.
Typical decisions:
• Grant credit/not to new applicants
• Increasing/Decreasing spending limits
• Increasing/Decreasing lending rates
• What new products can be given to existing applicants?
95
Credit assessment in consumer credit
History:
• Gut feel
• Social network
• Communities and influence
Traditional:
• Scoring mechanisms through credit bureaus
• Bank assessments through business rules
Newer approaches:
• Peer-to-Peer lending
• Prosper Market place
96
The Data
96
https://www.kaggle.com/wendykan/lending-club-loan-data
97
Credit Risk pipeline
Stage 1: Data Ingestion from Lending Club
Stage 2: Pre-Processing
Stage 3: Feature Engineering
Stage 4: Model Development and Tuning
Stage 5: Model Deployment
98
98
99
1. Whitepapers at www.quantuniversity.com
2. https://blogs.cfainstitute.org/investor/tag/machine-learning/
3. https://techcrunch.com/
4. https://www.technologyreview.com/
5. https://www.bbc.com/timelines/zypd97h
6. https://www.bbc.com/timelines/zq376fr
Additional Reading
100
www.QuSandbox.com
Thank you!
Sri Krishnamurthy, CFA, CAP
Founder and CEO
QuantUniversity LLC.
srikrishnamurthy
www.QuantUniversity.com
Contact
Information, data and drawings embodied in this presentation are strictly a property of QuantUniversity LLC. and shall not be
distributed or used in any other publication without the prior written consent of QuantUniversity LLC.
101

More Related Content

What's hot

What's hot (20)

Introduction to Auto ML
Introduction to Auto MLIntroduction to Auto ML
Introduction to Auto ML
 
Explainable AI in Healthcare
Explainable AI in HealthcareExplainable AI in Healthcare
Explainable AI in Healthcare
 
Automatic Machine Learning, AutoML
Automatic Machine Learning, AutoMLAutomatic Machine Learning, AutoML
Automatic Machine Learning, AutoML
 
Explainable AI (XAI) - A Perspective
Explainable AI (XAI) - A Perspective Explainable AI (XAI) - A Perspective
Explainable AI (XAI) - A Perspective
 
Automated Machine Learning (Auto ML)
Automated Machine Learning (Auto ML)Automated Machine Learning (Auto ML)
Automated Machine Learning (Auto ML)
 
Automated Machine Learning
Automated Machine LearningAutomated Machine Learning
Automated Machine Learning
 
AI: Introduction to artificial intelligence
AI: Introduction to artificial intelligenceAI: Introduction to artificial intelligence
AI: Introduction to artificial intelligence
 
An Introduction to Generative AI - May 18, 2023
An Introduction  to Generative AI - May 18, 2023An Introduction  to Generative AI - May 18, 2023
An Introduction to Generative AI - May 18, 2023
 
AI and ML Series - Introduction to Generative AI and LLMs - Session 1
AI and ML Series - Introduction to Generative AI and LLMs - Session 1AI and ML Series - Introduction to Generative AI and LLMs - Session 1
AI and ML Series - Introduction to Generative AI and LLMs - Session 1
 
Scott Lundberg, Microsoft Research - Explainable Machine Learning with Shaple...
Scott Lundberg, Microsoft Research - Explainable Machine Learning with Shaple...Scott Lundberg, Microsoft Research - Explainable Machine Learning with Shaple...
Scott Lundberg, Microsoft Research - Explainable Machine Learning with Shaple...
 
Foundation of Generative AI: Study Materials Connecting the Dots by Delving i...
Foundation of Generative AI: Study Materials Connecting the Dots by Delving i...Foundation of Generative AI: Study Materials Connecting the Dots by Delving i...
Foundation of Generative AI: Study Materials Connecting the Dots by Delving i...
 
An Introduction to XAI! Towards Trusting Your ML Models!
An Introduction to XAI! Towards Trusting Your ML Models!An Introduction to XAI! Towards Trusting Your ML Models!
An Introduction to XAI! Towards Trusting Your ML Models!
 
Machine learning
Machine learningMachine learning
Machine learning
 
Artificial intelligence
Artificial intelligence Artificial intelligence
Artificial intelligence
 
How Does Generative AI Actually Work? (a quick semi-technical introduction to...
How Does Generative AI Actually Work? (a quick semi-technical introduction to...How Does Generative AI Actually Work? (a quick semi-technical introduction to...
How Does Generative AI Actually Work? (a quick semi-technical introduction to...
 
UTILITY OF AI
UTILITY OF AIUTILITY OF AI
UTILITY OF AI
 
Machine learning
Machine learningMachine learning
Machine learning
 
DC02. Interpretation of predictions
DC02. Interpretation of predictionsDC02. Interpretation of predictions
DC02. Interpretation of predictions
 
machine learning
machine learningmachine learning
machine learning
 
Machine Learning Interpretability
Machine Learning InterpretabilityMachine Learning Interpretability
Machine Learning Interpretability
 

Similar to Machine Learning for Finance Master Class

Similar to Machine Learning for Finance Master Class (20)

Model governance in the age of data science & AI
Model governance in the age of data science & AIModel governance in the age of data science & AI
Model governance in the age of data science & AI
 
Ml master class cfa poland
Ml master class   cfa polandMl master class   cfa poland
Ml master class cfa poland
 
Data science in 10 steps
Data science in 10 stepsData science in 10 steps
Data science in 10 steps
 
Qu for India - QuantUniversity FundRaiser
Qu for India  - QuantUniversity FundRaiserQu for India  - QuantUniversity FundRaiser
Qu for India - QuantUniversity FundRaiser
 
Machine Learning and AI in Risk Management
Machine Learning and AI in Risk ManagementMachine Learning and AI in Risk Management
Machine Learning and AI in Risk Management
 
ML and AI in Finance: Master Class
ML and AI in Finance: Master ClassML and AI in Finance: Master Class
ML and AI in Finance: Master Class
 
CFA-NY Workshop - Final slides
CFA-NY Workshop - Final slidesCFA-NY Workshop - Final slides
CFA-NY Workshop - Final slides
 
Ml master class
Ml master classMl master class
Ml master class
 
Ml master class northeastern university
Ml master class   northeastern universityMl master class   northeastern university
Ml master class northeastern university
 
Machine Learning and AI: An Intuitive Introduction - CFA Institute Masterclass
Machine Learning and AI: An Intuitive Introduction - CFA Institute MasterclassMachine Learning and AI: An Intuitive Introduction - CFA Institute Masterclass
Machine Learning and AI: An Intuitive Introduction - CFA Institute Masterclass
 
Practical model management in the age of Data science and ML
Practical model management in the age of Data science and MLPractical model management in the age of Data science and ML
Practical model management in the age of Data science and ML
 
Algorithmic auditing 1.0
Algorithmic auditing 1.0Algorithmic auditing 1.0
Algorithmic auditing 1.0
 
Machine Learning Adoption: Crossing the chasm for banking and insurance sector
Machine Learning Adoption: Crossing the chasm for banking and insurance sectorMachine Learning Adoption: Crossing the chasm for banking and insurance sector
Machine Learning Adoption: Crossing the chasm for banking and insurance sector
 
ETDP 2015 D1 SMAC & the Journey from Automation to Digital Factory - Snjeev K...
ETDP 2015 D1 SMAC & the Journey from Automation to Digital Factory - Snjeev K...ETDP 2015 D1 SMAC & the Journey from Automation to Digital Factory - Snjeev K...
ETDP 2015 D1 SMAC & the Journey from Automation to Digital Factory - Snjeev K...
 
Technovision
TechnovisionTechnovision
Technovision
 
Adopting Data Science and Machine Learning in the financial enterprise
Adopting Data Science and Machine Learning in the financial enterpriseAdopting Data Science and Machine Learning in the financial enterprise
Adopting Data Science and Machine Learning in the financial enterprise
 
PYTHON AND DATA SCIENCE FOR INVESTMENT PROFESSIONALS
PYTHON AND DATA SCIENCE FOR INVESTMENT PROFESSIONALSPYTHON AND DATA SCIENCE FOR INVESTMENT PROFESSIONALS
PYTHON AND DATA SCIENCE FOR INVESTMENT PROFESSIONALS
 
Data science in 10 steps
Data science in 10 stepsData science in 10 steps
Data science in 10 steps
 
Industrial revolution 4.0
Industrial revolution 4.0 Industrial revolution 4.0
Industrial revolution 4.0
 
Ml conference slides boston june 2019
Ml conference slides boston june 2019Ml conference slides boston june 2019
Ml conference slides boston june 2019
 

More from QuantUniversity

EU Artificial Intelligence Act 2024 passed !
EU Artificial Intelligence Act 2024 passed !EU Artificial Intelligence Act 2024 passed !
EU Artificial Intelligence Act 2024 passed !
QuantUniversity
 

More from QuantUniversity (20)

EU Artificial Intelligence Act 2024 passed !
EU Artificial Intelligence Act 2024 passed !EU Artificial Intelligence Act 2024 passed !
EU Artificial Intelligence Act 2024 passed !
 
Managing-the-Risks-of-LLMs-in-FS-Industry-Roundtable-TruEra-QuantU.pdf
Managing-the-Risks-of-LLMs-in-FS-Industry-Roundtable-TruEra-QuantU.pdfManaging-the-Risks-of-LLMs-in-FS-Industry-Roundtable-TruEra-QuantU.pdf
Managing-the-Risks-of-LLMs-in-FS-Industry-Roundtable-TruEra-QuantU.pdf
 
Ml master class for CFA Dallas
Ml master class for CFA DallasMl master class for CFA Dallas
Ml master class for CFA Dallas
 
Algorithmic auditing 1.0
Algorithmic auditing 1.0Algorithmic auditing 1.0
Algorithmic auditing 1.0
 
Towards Fairer Datasets: Filtering and Balancing the Distribution of the Peop...
Towards Fairer Datasets: Filtering and Balancing the Distribution of the Peop...Towards Fairer Datasets: Filtering and Balancing the Distribution of the Peop...
Towards Fairer Datasets: Filtering and Balancing the Distribution of the Peop...
 
Machine Learning: Considerations for Fairly and Transparently Expanding Acces...
Machine Learning: Considerations for Fairly and Transparently Expanding Acces...Machine Learning: Considerations for Fairly and Transparently Expanding Acces...
Machine Learning: Considerations for Fairly and Transparently Expanding Acces...
 
Seeing what a gan cannot generate: paper review
Seeing what a gan cannot generate: paper reviewSeeing what a gan cannot generate: paper review
Seeing what a gan cannot generate: paper review
 
AI Explainability and Model Risk Management
AI Explainability and Model Risk ManagementAI Explainability and Model Risk Management
AI Explainability and Model Risk Management
 
Machine Learning in Finance: 10 Things You Need to Know in 2021
Machine Learning in Finance: 10 Things You Need to Know in 2021Machine Learning in Finance: 10 Things You Need to Know in 2021
Machine Learning in Finance: 10 Things You Need to Know in 2021
 
Bayesian Portfolio Allocation
Bayesian Portfolio AllocationBayesian Portfolio Allocation
Bayesian Portfolio Allocation
 
The API Jungle
The API JungleThe API Jungle
The API Jungle
 
Explainable AI Workshop
Explainable AI WorkshopExplainable AI Workshop
Explainable AI Workshop
 
Constructing Private Asset Benchmarks
Constructing Private Asset BenchmarksConstructing Private Asset Benchmarks
Constructing Private Asset Benchmarks
 
Machine Learning Interpretability
Machine Learning InterpretabilityMachine Learning Interpretability
Machine Learning Interpretability
 
Responsible AI in Action
Responsible AI in ActionResponsible AI in Action
Responsible AI in Action
 
Qu speaker series 14: Synthetic Data Generation in Finance
Qu speaker series 14: Synthetic Data Generation in FinanceQu speaker series 14: Synthetic Data Generation in Finance
Qu speaker series 14: Synthetic Data Generation in Finance
 
Qwafafew meeting 5
Qwafafew meeting 5Qwafafew meeting 5
Qwafafew meeting 5
 
Qu speaker series:Ethical Use of AI in Financial Markets
Qu speaker series:Ethical Use of AI in Financial MarketsQu speaker series:Ethical Use of AI in Financial Markets
Qu speaker series:Ethical Use of AI in Financial Markets
 
Fintech in the Post-Covid Age
Fintech in the Post-Covid AgeFintech in the Post-Covid Age
Fintech in the Post-Covid Age
 
Master Class: GANS with Applications in Synthetic Data Generation
Master Class:   GANS with  Applications in  Synthetic Data GenerationMaster Class:   GANS with  Applications in  Synthetic Data Generation
Master Class: GANS with Applications in Synthetic Data Generation
 



Machine Learning for Finance Master Class

  • 1. A Master Class in AI and Machine Learning for Financial Professionals 2019 Copyright QuantUniversity LLC. Presented By: Sri Krishnamurthy, CFA, CAP sri@quantuniversity.com www.analyticscertificate.com 05/02/2019 ODSC-East Boston MA
  • 2. 2 Speaker bio • Advisory and Consultancy for Financial Analytics • Prior Experience at MathWorks, Citigroup and Endeca and 25+ financial services and energy customers. • Columnist for the Wilmott Magazine • Author of forthcoming book “Financial Modeling: A case study approach” published by Wiley • Teaches Analytics in the Babson College MBA program and at Northeastern University, Boston • Reviewer: Journal of Asset Management Sri Krishnamurthy Founder and CEO QuantUniversity
  • 3. 3 About www.QuantUniversity.com • Boston-based Data Science, Quant Finance and Machine Learning training and consulting advisory • Trained more than 1000 students in Quantitative methods, Data Science and Big Data Technologies using MATLAB, Python and R • Building a platform for AI and Machine Learning Enablement in the Enterprise
  • 4. • Key trends in AI and machine learning • 5 things you need to know about machine learning • Machine Learning in 30 minutes • Building a ML application in 10 steps • Case studies Agenda
  • 5. AI and Machine Learning in Finance
  • 6. 6 My journey into AI/ML in finance 5 pictures
  • 7. 7 The 4th Industrial revolution is Here! Source: Christoph Roser at AllAboutLean.com As per Wikipedia*, “The 4th Industrial Revolution ….. marked by emerging technology breakthroughs in a number of fields, including robotics, artificial intelligence, nanotechnology, quantum computing, biotechnology, the Internet of Things, the Industrial Internet of Things (IIoT), decentralized consensus, fifth-generation wireless technologies (5G), additive manufacturing/3D printing and fully autonomous vehicles.” * https://en.wikipedia.org/wiki/Fourth_Industrial_Revolution
  • 8. 8 Your challenge is to design an artificial intelligence and machine learning (AI/ML) framework capable of flying a drone through several professional drone racing courses without human intervention or navigational pre-programming. AI is no longer science fiction! Source: https://www.lockheedmartin.com/en-us/news/events/ai-innovation-challenge.html
  • 9. 9 Scientists are disrupting the way we live! Source: https://www.ladn.eu/tech-a-suivre/mobilite-2030-vehicules-volants-open-data/
  • 10. 10 Interest in Machine learning continues to grow https://www.wipo.int/edocs/pubdocs/en/wipo_pub_1055.pdf
  • 12. MACHINE LEARNING AND AI ARE REVOLUTIONIZING FINANCE
  • 13. 13 Market impact at the speed of light! 13
  • 14. Machine Learning & AI in finance: A paradigm shift. Quant: Stochastic Models, Factor Models, Optimization, Risk Factors, P/Q Quants, Derivative pricing, Trading Strategies, Simulations, Distribution fitting. Data Scientist: Real-time analytics, Predictive analytics, Machine Learning, RPA, NLP, Deep Learning, Computer Vision, Graph Analytics, Chatbots, Sentiment Analysis, Alternative Data.
  • 15. 15 CFA Institute has adopted Fintech and AI content in its curriculum Ref: https://www.cfainstitute.org/-/media/documents/support/programs/cfa/cfa-program-level-iii-fintech-in-investment-management.ashx
  • 16. 16 The Virtuous Circle of Machine Learning and AI 16 Smart Algorithms Hardware Data
  • 17. 17 The rise of Big Data and Data Science 17 Image Source: http://www.ibmbigdatahub.com/sites/default/files/infographic_file/4-Vs-of-big-data.jpg
  • 18. Smart Algorithms: Distributed Computing Frameworks; Deep Learning Frameworks. 1. Our labeled datasets were thousands of times too small. 2. Our computers were millions of times too slow. 3. We initialized the weights in a stupid way. 4. We used the wrong type of non-linearity. - Geoff Hinton. “Capital One was able to determine fraudulent credit card applications in 100 milliseconds”* * http://go.databricks.com/hubfs/pdfs/Databricks-for-FinTech-170306.pdf
  • 19. 19 Hardware Speed up calculations with 1000s of processors Scale computations with infinite compute power
  • 20. #Disrupt19 Examples of how AI and ML are used in Finance • Bank of America • Ravenpack • Northfield
  • 22. Use Cases in NLP Risk Management Power risk models by informing clients about their portfolio exposures to headline risk and public disclosures. Compliance Reduce costs in trade surveillance and compliance by reducing the number of false-positives chased by analysts and officers. Benchmarks Create innovative investable indexes powered by AI and Big Data. Alpha Generation Create trading signals by ingesting event and sentiment data; identify securities that are likely to suffer from short squeezes or reversals.
  • 23. Risk Systems That Read® • Northfield uses machine-learning-based analysis of news text to describe how current conditions in financial markets differ from usual. • Typically, over 8000 articles per day containing more than 20,000 “topics” (companies, industries, countries) are processed. • The nature and magnitudes of these differences are used to revise expectations of financial market risks for all global equities and credit instruments on a daily basis.
  • 24. #Disrupt19 Building data science applications that use AI/ML in 10 steps
  • 25. 25 1. Articulate your business problem Data science in 10 steps
  • 26. 26 2. The Data questions 1. Do you know what data you need ? 2. Do you know if the data is available? 3. Do you have the data ? 4. Do you have the right data? 5. Will you continue to have the data? Data science in 10 steps
  • 27. 27 3. Develop a data acquisition and data prep strategy 1. Do you know how to get the data ? 2. Who gets the data? 3. How do you process it? 4. How do you access it? 5. How do you version and govern the data? Data science in 10 steps
  • 28. 28 4. Explore and evaluate your data and get it in the right format Data science in 10 steps
  • 29. 29 5. Define your goal: 1. Summarization 2. Fact finding 3. Understanding relationships 4. Prediction Data science in 10 steps
  • 30. 30 6. Shortlist (not “Choose” ) the techniques/methodologies/algorithms Data science in 10 steps
  • 31. 31 7. Evaluate/establish business constraints and narrow down your choices of techniques/methodologies/algorithms 1. Cloud/Cost/Expertise/Cost-Value 2. Build/buy/access Data science in 10 steps Outcomes Time Quality Cost
  • 32. 32 8. Establish criteria to know if the methodology/models/algorithms work 1. Is the process replicable? 2. What performance metrics do we choose? 3. Can you evaluate the performance and validate if the models meet the criteria? 4. Does it provide business value? Data science in 10 steps
  • 33. 33 9. Fine tune your algorithms and algorithm selection 1. Hyper parameter tuning 2. Bias-variance tradeoff 3. Handling imbalanced class problems 4. Ensemble techniques 5. AutoML Data science in 10 steps https://support.sas.com/resources/papers/proceedings17/SAS0514-2017.pdf
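Hyperparameter tuning from step 9 can be sketched as a grid search over one setting, scored on held-out data. The example below is a minimal illustration only: the k-nearest-neighbors classifier, the one-feature data set, the labels "A" and "B", and the grid of k values are all assumptions made up for this sketch and do not come from the slides.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Majority label among the k training rows closest to the query."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def validation_accuracy(train, valid, k):
    """Share of validation rows that the k-NN rule labels correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in valid)
    return hits / len(valid)

def grid_search(train, valid, k_grid):
    """Score every k in the grid and return the first best one plus all scores."""
    scores = {k: validation_accuracy(train, valid, k) for k in k_grid}
    best_k = max(scores, key=scores.get)
    return best_k, scores

# Hypothetical training rows: (features, label). The "B" at 2.5 is label noise.
train = [((1.0,), "A"), ((1.5,), "A"), ((2.0,), "A"), ((2.5,), "B"), ((3.0,), "A"),
         ((6.0,), "B"), ((6.5,), "B"), ((7.0,), "B"), ((7.5,), "B"), ((8.0,), "B")]
# Hypothetical held-out rows used only for scoring.
valid = [((2.4,), "A"), ((2.7,), "A"), ((1.2,), "A"), ((6.8,), "B"), ((7.2,), "B")]

best_k, scores = grid_search(train, valid, k_grid=[1, 3, 5])
print(best_k, scores)
```

With k = 1 the noisy training point misleads two validation rows, while larger k averages it out. The same loop structure applies to any model and any parameter grid.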
  • 34. 34 10. How will this process reach decision makers 1. Deployment choices (On-prem/Cloud) 2. Frequency of data/model updates 3. Governance/Role/Responsibilities 4. Speed, Scale, Availability, Disaster recovery, Rollback, Pull-Plug Data science in 10 steps
  • 35. 35 How do you monitor the efficacy of your solution? 1. Retuning 2. Monitoring 3. Model decay 4. Data augmentation 5. Newer innovations Data science in 10 steps - Bonus
  • 37. 37 Claim: • Machine learning is good for credit-card fraud detection Caution: • Beware of imbalanced class problems • A model that gives 99% accuracy may still not be good enough 1.Machine learning is not a generic solution to all problems 37
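The caution about 99% accuracy can be made concrete with a small sketch. The data below is hypothetical (1,000 transactions, 10 of them fraudulent), and the "model" is deliberately trivial: it never flags fraud.

```python
# Hypothetical data: 1,000 card transactions, 10 of them fraudulent (label 1).
labels = [1] * 10 + [0] * 990

# A "model" that predicts the majority class (not fraud) for every row.
predictions = [0] * len(labels)

def accuracy(y_true, y_pred):
    """Fraction of rows where the prediction matches the label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model caught."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in preds_for_positives) / len(preds_for_positives)

print(f"accuracy     = {accuracy(labels, predictions):.2%}")
print(f"fraud recall = {recall(labels, predictions):.2%}")
```

The trivial model scores 99% accuracy while catching none of the fraud cases, which is why accuracy alone is a weak yardstick on imbalanced classes.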
  • 38. 38 Claim: • Our models work on all the datasets we have tested on Caution: • Do we have enough data? • How do we handle bias in datasets? • Beware of overfitting • Historical Analysis is not Prediction 2. A prototype model is not A production model 38
  • 39. Prototyping vs Production: The reality https://www.itnews.com.au/news/hsbc-societe-generale-run-into-ais-production-problems-477966 Kristy Roth from HSBC: “It’s been somewhat easy - in a funny way - to get going using sample data, [but] then you hit the real problems,” Roth said. “I think our early track record on PoCs or pilots hides a little bit the underlying issues.” Matt Davey from Societe Generale: “We’ve done quite a bit of work with RPA recently and I have to say we’ve been a bit disillusioned with that experience,” “the PoC is the easy bit: it’s how you get that into production and shift the balance”
  • 40. 40 Claim: • It works. We don’t know how! Caution: • Lots of heuristics; still not a proven science • Interpretability, Fairness, Auditability of models are important • Beware of black boxes; Transparency in codebase is paramount with the proliferation of opensource tools • Skilled data scientists with knowledge of algorithms and their appropriate usage are key to successful adoption 3. We are just getting started! 40
  • 41. 41 Claim: • Machine Learning models are more accurate than traditional models Caution: • Is accuracy the right metric? • How do we evaluate the model? Accuracy or F1-Score? • How does the model behave in different regimes? 4. Choose the right metrics for evaluation 41 Source: https://en.wikipedia.org/wiki/Confusion_matrix
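To compare accuracy with the F1-score mentioned above, the sketch below computes precision, recall, and F1 from a list of predictions. The labels and predictions are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical labels: 4 positives, 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Here accuracy is 0.70 while F1 is about 0.57, because half of the positive cases are missed. Which metric matters depends on the relative cost of false positives and false negatives.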
  • 42. Claim: • Machine Learning and AI will replace humans in most applications Caution: • Just because it worked sometimes doesn’t mean that the organization can be on autopilot • Will we have true AI or Augmented Intelligence? • Model risk and robust risk management is paramount to the success of the organization. • We are just getting started! 5. Are we there yet? https://www.bloomberg.com/news/articles/2017-10-20/automation-starts-to-sweep-wall-street-with-tons-of-glitches
  • 43. Can Machine Learning algorithms be gamed? https://www.youtube.com/watch?time_continue=36&v=MIbFvK2S9g8 https://arxiv.org/abs/1904.08653
  • 46. Machine Learning Algorithms. Goal: (1) Descriptive Statistics: Cross-sectional (Numerical; Categorical; Numerical vs Categorical; Categorical vs Categorical; Numerical vs Numerical) and Time series. (2) Predictive Analytics: Cross-sectional (Segmentation; Prediction: predict a number or predict a category) and Time-series.
  • 47. Machine Learning: Supervised Algorithms ▫ Given a set of variables $x_i$, predict the value of another variable $y$ in a given data set such that ▫ If y is numeric => Prediction ▫ If y is categorical => Classification ▫ Example: Given that a customer’s Debt-to-Income ratio increased 20%, what are the chances he/she would default in 3 months? Diagram: x1, x2, x3… → Model F(X) → y
  • 48. Machine Learning: Unsupervised Algorithms ▫ Given a dataset with variables $x_i$, build a model that captures the similarities in different observations and assigns them to different buckets => Clustering ▫ Example: Given a list of emerging market stocks, can we segment them into three buckets? Diagram: Obs1, Obs2, Obs3 etc. → Model → Obs1: Class 1, Obs2: Class 2, Obs3: Class 1
  • 50. Supervised Learning models - Prediction • Parametric models ▫ Assume some functional form ▫ Fit coefficients • Examples: Linear Regression, Neural Networks. Linear Regression Model: $y = \beta_0 + \beta_1 x_1$. Neural network Model.
  • 51. 51 • Non-Parametric models ▫ No functional form assumed • Examples : K-nearest neighbors, Decision Trees Supervised Learning models 51 K-nearest neighbor Model Decision tree Model
  • 52. Multiple linear regression • Given estimates $\hat{\beta}_0, \hat{\beta}_1, \ldots, \hat{\beta}_p$, we can make predictions using the formula $\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 + \cdots + \hat{\beta}_p x_p$ • The parameters are estimated using the least squares approach to minimize the sum of squared errors $RSS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
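The least squares fit described on this slide can be sketched in plain Python. This is one possible route, assumed here for illustration: it builds the normal equations (X'X) b = X'y and solves them by Gaussian elimination. The helper names and the tiny data set are hypothetical, and the data was generated without noise from y = 1 + 2·x1 + 3·x2 so the recovered coefficients are easy to check.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_least_squares(rows, y):
    """Return [b0, b1, ..., bp] minimizing the sum of squared errors."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 for the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

def predict(coeffs, row):
    """y_hat = b0 + b1*x1 + ... + bp*xp."""
    return coeffs[0] + sum(b * x for b, x in zip(coeffs[1:], row))

# Hypothetical noise-free data generated from y = 1 + 2*x1 + 3*x2.
rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1, 3, 4, 6, 8]

coeffs = fit_least_squares(rows, y)
print([round(c, 6) for c in coeffs], round(predict(coeffs, (3, 2)), 6))
```

In practice a numerical library would normally be used for the solve; the hand-written elimination only keeps the sketch dependency-free.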
  • 53. 53 • Parametric models ▫ Assume some functional form ▫ Fit coefficients • Examples : Logistic Regression, Neural Networks Supervised Learning models - Classification 53 Logistic Regression Model Neural network Model
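As a sketch of a parametric classifier, the code below fits a one-variable logistic regression, P(default) = sigmoid(b0 + b1·x), by batch gradient descent on the log-loss. The debt-to-income ratios, default flags, learning rate, and epoch count are all hypothetical choices for illustration, loosely echoing the Debt-to-Income example earlier in the deck.

```python
import math

def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.5, epochs=5000):
    """Fit b0, b1 by batch gradient descent on the average log-loss."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(b0 + b1 * x) - y  # gradient of log-loss w.r.t. the logit
            g0 += err
            g1 += err * x
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return b0, b1

def predict_proba(b0, b1, x):
    """Estimated probability that y = 1 for feature value x."""
    return sigmoid(b0 + b1 * x)

# Hypothetical data: debt-to-income ratio and whether the customer defaulted (1).
dti = [0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50, 0.55]
defaulted = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

b0, b1 = fit_logistic(dti, defaulted)
print(f"P(default | DTI=0.10) = {predict_proba(b0, b1, 0.10):.2f}")
print(f"P(default | DTI=0.55) = {predict_proba(b0, b1, 0.55):.2f}")
```

The functional form (a sigmoid of a linear score) is assumed up front and only the two coefficients are fitted, which is what makes the model parametric.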
  • 54. 54 • Non-Parametric models ▫ No functional form assumed • Examples : K-nearest Neighbors, Decision Trees Supervised Learning models 54 K-nearest neighbor Model Decision tree Model
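A decision tree of depth one (a "stump") is the smallest non-parametric example: it assumes no functional form and simply searches for the split that misclassifies the fewest training rows. The feature values and the labels "repaid" and "default" below are hypothetical.

```python
def majority(labels):
    """Most frequent label; ties broken alphabetically so results are repeatable."""
    return max(sorted(set(labels)), key=labels.count)

def fit_stump(xs, ys):
    """Return (threshold, left_label, right_label) with the fewest training errors."""
    best = None
    values = sorted(set(xs))
    # Candidate thresholds are midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= thr]
        right = [y for x, y in zip(xs, ys) if x > thr]
        left_label, right_label = majority(left), majority(right)
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, thr, left_label, right_label)
    return best[1:]

def stump_predict(stump, x):
    """Apply the single split learned by fit_stump."""
    thr, left_label, right_label = stump
    return left_label if x <= thr else right_label

# Hypothetical debt-to-income ratios and loan outcomes.
dti = [0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45]
outcome = ["repaid"] * 4 + ["default"] * 4

stump = fit_stump(dti, outcome)
print(stump, stump_predict(stump, 0.12), stump_predict(stump, 0.42))
```

A full decision tree repeats this split search recursively inside each branch, usually with an impurity measure such as Gini rather than the raw error count used here.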
  • 55. Machine Learning: Unsupervised Algorithms ▫ Given a dataset with variables $x_i$, build a model that captures the similarities in different observations and assigns them to different buckets => Clustering. Diagram: Obs1, Obs2, Obs3 etc. → Model → Obs1: Class 1, Obs2: Class 2, Obs3: Class 1
  • 56. K-means clustering • These methods partition the data into k clusters by assigning each data point to its closest cluster centroid, minimizing the within-cluster sum of squares (WSS), which is: $WSS = \sum_{k=1}^{K} \sum_{i \in S_k} \sum_{j=1}^{P} (x_{ij} - \mu_{kj})^2$, where $S_k$ is the set of observations in the kth cluster and $\mu_{kj}$ is the mean of the jth variable of the cluster center of the kth cluster. • Then, they select the top n points that are the farthest away from their nearest cluster centers as outliers.
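The two steps on this slide (cluster, then flag the points farthest from their nearest center) can be sketched as follows. The initialization from the first k points, the fixed iteration count, and the seven sample points are assumptions made so the example is deterministic; production implementations usually use randomized or k-means++ starts.

```python
import math

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm with a deterministic start from the first k points."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

def nearest_distance(p, centroids):
    """Distance from a point to its closest cluster center."""
    return min(math.dist(p, c) for c in centroids)

def wss(points, centroids):
    """Within-cluster sum of squares for the given centers."""
    return sum(nearest_distance(p, centroids) ** 2 for p in points)

def top_outliers(points, centroids, n):
    """The n points farthest from their nearest cluster center."""
    return sorted(points, key=lambda p: nearest_distance(p, centroids), reverse=True)[:n]

# Hypothetical two-feature observations: two tight groups plus one stray point.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (0.9, 1.1),
          (5.1, 4.9), (4.8, 5.2), (9.0, 1.0)]

centroids = kmeans(points, k=2)
print(round(wss(points, centroids), 3), top_outliers(points, centroids, n=1))
```

The stray point (9.0, 1.0) is assigned to the second cluster but sits far from its center, so it is the first to be flagged.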
  • 60. 60 Anomaly Detection vs Unsupervised Learning 60
  • 61. 61 Machine Learning movers and shakers Deep Learning Automatic Machine Learning Ensemble Learning Natural Language Processing
  • 64. 64 Evaluating Machine learning algorithms Supervised - Prediction R-square RMS MAE MAPE Supervised- Classification Confusion Matrix ROC Curves Evaluation framework 64
  • 65. Prediction Accuracy Measures • The prediction error for record i is defined as the difference between its actual y value and its predicted y value: $e_i = y_i - \hat{y}_i$ • $R^2$ indicates how well the data fit the statistical model: $R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$
  • 66. Prediction Accuracy Measures • Fit measures in classical regression modeling: • Adjusted $R^2$ has been adjusted for the number of predictors. It increases only when the improvement in the model is more than one would expect to see by chance (p is the total number of explanatory variables): $\text{Adjusted } R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 / (n - p - 1)}{\sum_{i=1}^{n} (y_i - \bar{y})^2 / (n - 1)}$ • MAE or MAD (mean absolute error/deviation) gives the magnitude of the average absolute error: $MAE = \frac{\sum_{i=1}^{n} |e_i|}{n}$
  • 67. Prediction Accuracy Measures • MAPE (mean absolute percentage error) gives a percentage score of how predictions deviate on average: $MAPE = \frac{\sum_{i=1}^{n} |e_i / y_i|}{n} \times 100\%$ • RMSE (root-mean-squared error) is computed on the training and validation data: $RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} e_i^2}$
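The prediction accuracy measures from the last three slides (R², adjusted R², MAE, MAPE, RMSE) can be computed together. The function name and the five actual/predicted pairs below are hypothetical; `p` is the number of explanatory variables, as on the adjusted R² slide.

```python
import math

def regression_metrics(y, y_hat, p):
    """R^2, adjusted R^2, MAE, MAPE (%) and RMSE for actual y and predicted y_hat."""
    n = len(y)
    errors = [a - b for a, b in zip(y, y_hat)]      # e_i = y_i - y_hat_i
    y_bar = sum(y) / n
    sse = sum(e * e for e in errors)                # sum of squared errors
    sst = sum((a - y_bar) ** 2 for a in y)          # total sum of squares
    return {
        "r2": 1 - sse / sst,
        "adj_r2": 1 - (sse / (n - p - 1)) / (sst / (n - 1)),
        "mae": sum(abs(e) for e in errors) / n,
        "mape": 100 * sum(abs(e / a) for e, a in zip(errors, y)) / n,
        "rmse": math.sqrt(sse / n),
    }

# Hypothetical actual and predicted values from a one-predictor model.
y = [10, 20, 30, 40, 50]
y_hat = [12, 18, 33, 37, 50]

metrics = regression_metrics(y, y_hat, p=1)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

MAPE divides by the actual values, so it is undefined when any actual value is zero; the sketch does not guard against that case.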
  • 68. Classification matrix • Consider a two-class case with classes $C_1$ and $C_2$ • Classification matrix:
      Actual class | Predicted $C_1$ | Predicted $C_2$
      $C_1$ | $n_{1,1}$ = number of $C_1$ cases classified correctly | $n_{1,2}$ = number of $C_1$ cases classified incorrectly as $C_2$
      $C_2$ | $n_{2,1}$ = number of $C_2$ cases classified incorrectly as $C_1$ | $n_{2,2}$ = number of $C_2$ cases classified correctly
  • 69. Accuracy Measures • Estimated misclassification rate (overall error rate) is a main accuracy measure: $err = \frac{n_{1,2} + n_{2,1}}{n_{1,1} + n_{1,2} + n_{2,1} + n_{2,2}} = \frac{n_{1,2} + n_{2,1}}{n}$ • Overall accuracy: $accuracy = 1 - err = \frac{n_{1,1} + n_{2,2}}{n}$
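A short sketch of how the classification matrix and the two rates are tallied. The class names "C1" and "C2" and the ten actual/predicted labels are hypothetical.

```python
def confusion_counts(actual, predicted, classes=("C1", "C2")):
    """Count cases for every (actual class, predicted class) pair."""
    counts = {(a, p): 0 for a in classes for p in classes}
    for a, p in zip(actual, predicted):
        counts[(a, p)] += 1
    return counts

# Hypothetical labels for ten cases.
actual = ["C1"] * 5 + ["C2"] * 5
predicted = ["C1", "C1", "C1", "C1", "C2", "C2", "C2", "C2", "C1", "C1"]

counts = confusion_counts(actual, predicted)
n = sum(counts.values())

# Off-diagonal cells are the misclassified cases.
err = (counts[("C1", "C2")] + counts[("C2", "C1")]) / n
acc = 1 - err

print(counts)
print(f"error rate = {err:.2f}, accuracy = {acc:.2f}")
```

Keeping the four cells separate, rather than only the overall rate, shows which kind of mistake dominates. That matters when the two error types carry different costs.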
  • 70. ROC Curve • The ROC curve plots the pairs {sensitivity, 1 - specificity} as the cutoff value increases from 0 to 1 • Sensitivity (also called the true positive rate, or recall in some fields) measures the proportion of positives that are correctly identified (e.g., the percentage of sick people who are correctly identified as having the condition). • Specificity (also called the true negative rate) measures the proportion of negatives that are correctly identified as such (e.g., the percentage of healthy people who are correctly identified as not having the condition). • Better performance is reflected by curves that are closer to the top left corner
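The ROC construction can be sketched by sweeping the cutoff over the observed scores and recording (1 - specificity, sensitivity) at each step. The six labels and scores are hypothetical. The sweep here runs from the highest cutoff down, which traces the same curve starting at the bottom-left corner, and the area under the curve is added as a common one-number summary.

```python
def roc_points(y_true, scores):
    """(false positive rate, true positive rate) pairs for each distinct cutoff."""
    pos = sum(y_true)                 # labels are assumed to be 0 or 1
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]             # cutoff above every score: nothing flagged
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= cutoff and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= cutoff and y == 0)
        points.append((fp / neg, tp / pos))   # (1 - specificity, sensitivity)
    return points

def auc(points):
    """Area under the curve by the trapezoid rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Hypothetical labels (1 = positive) and model scores.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(y_true, scores)
print(points)
print(f"AUC = {auc(points):.4f}")
```

A curve hugging the top left corner gives an area close to 1, while a model no better than chance sits near the diagonal with an area close to 0.5.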
  • 73. 73 What’s needed for reproducibility Code Data Environment Process
  • 74. 74 QuSandbox solution suite for ML/AI applications Model Analytics Studio QuSandbox Research hub
  • 75. 75 Prototype Standardize workflow Productionize and share DEMO with QuSandbox 75 QuSandbox Model Analytics Studio ResearchHub
  • 76. 76 1. Data 2. Goals 3. Machine learning algorithms 4. Process 5. Performance Evaluation 6. Model Deployment Recap
  • 78. Goal: (1) Descriptive Statistics: Cross-sectional (Numerical; Categorical; Numerical vs Categorical; Categorical vs Categorical; Numerical vs Numerical) and Time series. (2) Predictive Analytics: Cross-sectional (Segmentation; Prediction: predict a number or predict a category) and Time-series.
  • 81. 81 Evaluating Machine learning algorithms Supervised - Prediction R-square RMS MAE MAPE Supervised- Classification Confusion Matrix ROC Curves Evaluation framework 81
  • 82. Machine Learning Workflow Data Scraping/Ingestion Data Exploration Data Cleansing and Processing Feature Engineering Model Evaluation & Tuning Model Selection Model Deployment/Inference Supervised Unsupervised Modeling Data Engineer, Dev Ops Engineer Data Scientist/Quants Software/Web Engineer • AutoML • Model Validation • Interpretability Robotic Process Automation (RPA) (Microservices, Pipelines) • SW: Web/REST API • HW: GPU, Cloud • Monitoring • Regression • KNN • Decision Trees • Naive Bayes • Neural Networks • Ensembles • Clustering • PCA • Autoencoder • RMS • MAPE • MAE • Confusion Matrix • Precision/Recall • ROC • Hyper-parameter tuning • Parameter Grids Risk Management/Compliance (All stages) Analysts & Decision Makers
  • 83. #Disrupt19 Sentiment Analysis Using Natural Language Processing in Finance
  • 84. • What is Sentiment Analysis? • The Case study Setup • Design Choices • The Pipeline • Demo #Disrupt19 Agenda
  • 85. 85 What is NLP ? AI Linguistics Computer Science
  • 86. 86 • Q/A • Dialog systems - Chatbots • Topic summarization • Sentiment analysis • Classification • Keyword extraction - Search • Information extraction – Prices, Dates, People etc. • Tone Analysis • Machine Translation • Document comparison – Similar/Dissimilar Sample applications
  • 88. 88 • The process of computationally identifying and categorizing opinions expressed in a piece of text, especially in order to determine whether the writer's attitude towards a particular topic, product, etc. is positive, negative, or neutral. Sentiment Analysis #Disrupt19
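The definition above can be illustrated with the simplest possible approach, a word-list (lexicon) scorer. This is only a sketch: the two word lists are hypothetical and far too small for real use, and the approach ignores negation, context, and domain-specific meaning.

```python
import re

# Hypothetical mini-lexicons; real finance lexicons are much larger.
POSITIVE = {"growth", "strong", "beat", "improved", "record"}
NEGATIVE = {"decline", "weak", "miss", "loss", "headwinds"}

def sentiment(text):
    """Label text positive, negative or neutral by counting lexicon hits."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("Revenue growth was strong and margins improved."))
print(sentiment("We expect headwinds and a decline in volumes."))
print(sentiment("The call starts at nine."))
```

A scorer like this is mainly useful as a baseline against which API-based or trained models can be compared.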
  • 89. 89 • Understanding sentiments in Earnings call transcripts Goal 89
  • 90. 90 • Interpreting emotions • Labeling data Options • APIs • Human Insight • Expert Knowledge • Build your own Challenges
  • 91. 91 NLP pipeline Data Ingestion from Edgar Pre-Processing Invoking APIs to label data Compare APIs Build a new model for sentiment Analysis Stage 1 Stage 2 Stage 3 Stage 4 Stage 5 • Amazon Comprehend API • Google API • Watson API • Azure API
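The "Compare APIs" stage can be sketched without calling any service. The code below assumes the labels have already been collected and measures how often each pair of sources agrees, then derives a majority label per sentence. The source names `api_a`, `api_b`, `api_c` and all labels are hypothetical placeholders, not output from any of the listed APIs.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sentiment labels for the same five sentences from three sources.
labels_by_api = {
    "api_a": ["positive", "negative", "neutral", "positive", "negative"],
    "api_b": ["positive", "negative", "positive", "positive", "neutral"],
    "api_c": ["positive", "neutral", "neutral", "positive", "negative"],
}

def agreement(a, b):
    """Share of sentences on which two label lists agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def majority_label(votes):
    """Label chosen by more than half of the sources, else 'no consensus'."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else "no consensus"

pairwise = {
    (m, n): agreement(labels_by_api[m], labels_by_api[n])
    for m, n in combinations(labels_by_api, 2)
}
consensus = [majority_label(votes) for votes in zip(*labels_by_api.values())]

print(pairwise)
print(consensus)
```

Sentences where the sources disagree are natural candidates for human review before the consensus labels are used to train a new model.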
  • 92. #Disrupt19 Credit Risk Decision Making Using Lending Club Data
  • 93. 93 1. Case Intro 2. Data Exploration of the Credit risk data set 3. Problem Definition and Machine learning 4. Performance Evaluation 5. Deployment Case study 2
  • 94. 94 Credit risk in consumer credit Credit-scoring models and techniques assess the risk in lending to customers. Typical decisions: • Grant credit/not to new applicants • Increasing/Decreasing spending limits • Increasing/Decreasing lending rates • What new products can be given to existing applicants ?
  • 95. 95 Credit assessment in consumer credit History: • Gut feel • Social network • Communities and influence Traditional: • Scoring mechanisms through credit bureaus • Bank assessments through business rules Newer approaches: • Peer-to-Peer lending • Prosper Market place
  • 97. 97 Credit Risk pipeline Data Ingestion from Lending Club Pre-Processing Feature Engineering Model Development and Tuning Model Deployment Stage 1 Stage 2 Stage 3 Stage 4 Stage 5
  • 99. #Disrupt19 Additional Reading 1. Whitepapers at www.quantuniversity.com 2. https://blogs.cfainstitute.org/investor/tag/machine-learning/ 3. https://techcrunch.com/ 4. https://www.technologyreview.com/ 5. https://www.bbc.com/timelines/zypd97h 6. https://www.bbc.com/timelines/zq376fr
  • 101. Thank you! Sri Krishnamurthy, CFA, CAP, Founder and CEO, QuantUniversity LLC. Contact: srikrishnamurthy, www.QuantUniversity.com. Information, data and drawings embodied in this presentation are strictly a property of QuantUniversity LLC. and shall not be distributed or used in any other publication without the prior written consent of QuantUniversity LLC.