Sri Krishnamurthy presents on practical model risk management in the age of data science and machine learning. He discusses how machine learning and AI are driving paradigm shifts in finance. However, he cautions that claims about machine learning capabilities need to be balanced with realities about data and model quality. Key challenges include ensuring interpretability, transparency, and proper evaluation of models in production. He promotes his company's solutions for addressing these challenges through end-to-end workflow management and model governance tools.
1. Practical Model Risk Management
in the age of
Data Science and Machine Learning
2018 Copyright QuantUniversity LLC.
Presented By:
Sri Krishnamurthy, CFA, CAP
sri@quantuniversity.com
www.analyticscertificate.com
8/12/2018
ARPM MATLAB Conference
2. 2
About us:
• Data Science, Quant Finance and
Machine Learning Advisory
• Technologies using MATLAB, Python
and R
• Programs
▫ Analytics Certificate Program
▫ Fintech programs
• Platform
3. • Founder of QuantUniversity LLC. and
www.analyticscertificate.com
• Advisory and Consultancy for Financial Analytics
• Prior experience at MathWorks, Citigroup and
Endeca, and with 25+ financial services and energy
customers
• Regular Columnist for the Wilmott Magazine
• Author of the forthcoming book
“Financial Modeling: A case study approach”,
to be published by Wiley
• Chartered Financial Analyst and Certified Analytics
Professional
• Teaches Analytics in the Babson College MBA
program and at Northeastern University, Boston
Sri Krishnamurthy
Founder and CEO
3
6. 6
The Veracity of Information also affects markets
"The goal of the securities law is to provide the capital markets with accurate
information, and people's motivations are really beside the point,"
- Prof. Jill Fisch, University of Pennsylvania Law School
12. 12
The rise of Big Data and Data Science
Image Source: http://www.ibmbigdatahub.com/sites/default/files/infographic_file/4-Vs-of-big-data.jpg
13. 13
Smarter Algorithms
Parallel and Distributed Computing Frameworks
Deep Learning Frameworks
1. Our labeled datasets were thousands of times too
small.
2. Our computers were millions of times too slow.
3. We initialized the weights in a stupid way.
4. We used the wrong type of non-linearity.
- Geoff Hinton
“Capital One was able to determine fraudulent credit
card applications in 100 milliseconds”*
* http://go.databricks.com/hubfs/pdfs/Databricks-for-FinTech-170306.pdf
19. 19
Claim:
• Machine learning is better for fraud
detection, looking for arbitrage
opportunities and trade execution
Caution:
• Beware of imbalanced class problems
• A model that gives 99% accuracy may still
not be good enough
1. Does the model actually work for my problem?
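The imbalanced-class trap above can be made concrete with a few lines of plain Python (hypothetical fraud labels, no real data): a model that never flags fraud still reports 99% accuracy while catching nothing.

```python
# Hypothetical fraud-detection labels: 1,000 transactions, 1% fraudulent.
labels = [1] * 10 + [0] * 990          # 1 = fraud, 0 = legitimate
predictions = [0] * len(labels)        # a "model" that never flags fraud

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
caught = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 1)
recall = caught / sum(labels)          # share of actual fraud detected

print(f"accuracy = {accuracy:.2%}, recall = {recall:.2%}")
# accuracy = 99.00%, recall = 0.00%
```

This is why accuracy alone is the wrong yardstick for rare-event problems like fraud: recall (or precision, or cost-weighted metrics) exposes what accuracy hides.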
20. 20
Claim:
• Our models work on
datasets we have tested on
Caution:
• Do we have enough data?
• How do we handle bias in
datasets?
• Beware of overfitting
• Historical Analysis is not
Prediction
2. A prototype model is not your production model
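The overfitting caution can be illustrated with a small NumPy sketch (synthetic data, hypothetical model choices): an over-parameterized polynomial always fits the training sample at least as well as the correct model class, which is exactly why in-sample prototype results say little about production behavior.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 30)
y = x ** 2 + rng.normal(0, 0.1, 30)       # true signal: quadratic plus noise

x_train, y_train = x[:20], y[:20]         # data the prototype saw
x_test, y_test = x[20:], y[20:]           # data "production" will see

def holdout_mse(degree):
    coefs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)
    return train_mse, test_mse

train2, test2 = holdout_mse(2)            # right model class
train15, test15 = holdout_mse(15)         # over-parameterized "prototype"

# The degree-15 fit is always at least as good in-sample, but its
# out-of-sample error is typically far worse than the degree-2 fit.
print(f"deg 2:  train={train2:.4f}  test={test2:.4f}")
print(f"deg 15: train={train15:.4f} test={test15:.4f}")
```

The same holdout discipline, plus checks for dataset bias and sufficient sample size, is the minimum bar before a prototype is promoted.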
21. 21
AI and Machine Learning in Production
https://www.itnews.com.au/news/hsbc-societe-generale-run-into-ais-production-problems-477966
Kristy Roth from HSBC:
“It’s been somewhat easy - in a funny way - to
get going using sample data, [but] then you hit
the real problems,” Roth said.
“I think our early track record on PoCs or pilots
hides a little bit the underlying issues.”
Matt Davey from Societe Generale:
“We’ve done quite a bit of work with RPA
recently and I have to say we’ve been a bit
disillusioned with that experience,”
“the PoC is the easy bit: it’s how you get that
into production and shift the balance”
22. 22
Claim:
• The model just works. We don’t know
how!
Caution:
• It’s still not a proven science
• Interpretability or “auditability” of
models is important
• Transparency in the codebase is paramount
given the proliferation of open-source
tools
• Skilled data scientists who are
knowledgeable about algorithms and
their appropriate usage are key to
successful adoption
3. We are in uncharted territories
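One simple, model-agnostic way to make a black-box model more auditable is permutation importance. The sketch below is illustrative (synthetic data, ordinary least squares standing in for any fitted model, not any specific vendor tooling): it measures how much error grows when each feature's values are shuffled.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = 3.0 * X[:, 0] + rng.normal(0, 0.1, 200)    # only feature 0 carries signal

beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # stand-in for any fitted model

def model_mse(features):
    return np.mean((features @ beta - y) ** 2)

baseline = model_mse(X)
importance = []
for j in range(X.shape[1]):
    shuffled = X.copy()
    shuffled[:, j] = rng.permutation(shuffled[:, j])   # destroy feature j
    importance.append(model_mse(shuffled) - baseline)  # error increase

# importance[0] is large, importance[1] is near zero: the audit trail
# shows which inputs the model actually relies on.
```

The technique works for any predictor, which makes it a useful first interpretability check even when the model itself cannot be inspected.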
23. 23
Claim:
• Machine Learning models are
more accurate than
traditional models
Caution:
• Is accuracy the right metric?
• How do we evaluate the
model? RMSE or R²?
• How does the model behave
in different regimes?
• What is our Hyperparameter
tuning strategy?
4. Choose the right metrics for evaluation
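The two metrics named above can be computed directly. This sketch (made-up predictions) shows that RMSE reports error in the target's own units while R² only compares against a predict-the-mean baseline, so they can tell different stories.

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # made-up targets
y_pred = np.array([1.1, 1.9, 3.2, 3.8, 5.1])    # made-up model output

residuals = y_true - y_pred
rmse = np.sqrt(np.mean(residuals ** 2))          # error in target units
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot                       # vs. predict-the-mean baseline

print(f"RMSE = {rmse:.3f}, R2 = {r2:.3f}")       # RMSE ≈ 0.148, R2 ≈ 0.989
```

A near-1 R² can coexist with an RMSE that is economically unacceptable, and neither metric says anything about behavior across regimes; the evaluation metric should therefore be fixed before any hyperparameter search begins.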
24. 24
Claim:
• Machine Learning and AI will replace
humans in most applications
Caution:
• Beware of the hype!
• Just because it worked sometimes
doesn’t mean the organization can
run on autopilot
• Will we have true AI or Augmented
Intelligence?
• Robust model risk management is
paramount to the success of the
organization
• We are just getting started!
5. Are we there yet?
https://www.bloomberg.com/news/articles/2017-10-20/automation-starts-to-sweep-wall-street-with-tons-of-glitches
25. 25
• The regulatory sandbox allows businesses to test innovative
products, services, business models and delivery mechanisms in the
real market, with real consumers.
• The sandbox is a supervised space, open to both authorized and
unauthorized firms, that provides firms with:
▫ reduced time-to-market at potentially lower cost
▫ appropriate consumer protection safeguards built in to new products and
services
▫ better access to finance
• https://www.fca.org.uk/firms/regulatory-sandbox
Regulatory Sandboxes for testing and validating Fintech ideas
32. 32
• Facilitate use of different technologies for development and
deployment
• Ensure quant modeling and deployment environments are
replicable
• Facilitate sufficient decoupling so that Quants, Data Scientists,
Data Engineers, DevOps and IT can work on their own tasks in the
model development process
• Ensure model and data provenance along the entire workflow rather
than as an afterthought
• Enable orchestration of replicable pipelines to facilitate robust
deployment
Key aspects of our framework to enable model governance
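The provenance point can be sketched with a minimal, illustrative run record (this is a hash-based sketch, not QuSandbox's actual mechanism): content-hashing the data and code that define a model run makes any later drift immediately detectable.

```python
import hashlib

def provenance_record(params, data, code):
    """Illustrative provenance: content-hash everything that defines a run."""
    return {
        "params": dict(params),
        "data_sha256": hashlib.sha256(data).hexdigest(),
        "code_sha256": hashlib.sha256(code).hexdigest(),
    }

record = provenance_record(
    params={"model": "logistic_regression", "learning_rate": 0.01},
    data=b"date,price\n2018-01-02,100.5\n",   # bytes of the training data
    code=b"def train(features, labels): ...", # bytes of the model source
)
# Re-running with identical inputs reproduces identical hashes, so any
# change to data or code between environments is visible in the record.
```

Capturing such a record at every pipeline stage is what makes a quant workflow replicable rather than merely documented.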
33. QuSandbox- The platform for adopting Data
Science and AI in the Enterprise
34. 34
• QuSandbox is an end-to-end, workflow-based system for
creating and deploying data science workflows
within the enterprise.
• Our environment incorporates model and data provenance
throughout the life-cycle of model development.
• The solution can be hosted on-premises, to leverage custom
hardware and software integrations, or on a public (AWS,
GCP) or private cloud
Executive Summary
39. 39
Quant/Enterprise use cases
• Create an environment that can support multiple platforms and
programming languages
• Enable remote running of applications
• Try out a GitHub submission or someone else’s code
• Facilitate creation of Docker images to create replicable containers
• Create prototyping environments for Data Science/Quant teams
• Enable Data scientists/Quants to deploy their solutions
• Enable running multiple tasks and jobs
• Enable concurrent running of multiple experiments
• Integrate seamlessly with the cloud to scale up computations
Use cases
40. 40
Fintech use cases
• To demonstrate solutions to enterprises
• Create customized enterprise trials for companies that don’t permit
installation of vendor software prior to procurement
• To manage quick updates
• Enable effective integration and hosting of services (REST APIs)
• To deploy custom services on QuSandbox
Use cases
41. 41
Academic & Research use cases
• Enable creation of course material and exercises that could be
shared
• Enable students and workshop participants to focus on the data
science experiments rather than environment setting
Use cases
47. 47
NLP pipeline
Stage 1: Data ingestion from EDGAR
Stage 2: Pre-processing
Stage 3: Invoke APIs to label data
▫ Amazon Comprehend API
▫ Google API
▫ Watson API
▫ Azure API
Stage 4: Compare APIs
Stage 5: Build a new model for sentiment analysis
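A runnable skeleton of the five stages, with a hypothetical keyword labeler standing in for the Amazon Comprehend, Google, Watson and Azure calls (which all require credentials and network access):

```python
# The five pipeline stages as plain functions; a hypothetical keyword
# labeler stands in for the cloud sentiment APIs so the flow runs offline.

def ingest():                       # Stage 1: would pull filings from EDGAR
    return ["Revenue grew strongly.", "The outlook remains weak."]

def preprocess(docs):               # Stage 2: normalize the raw text
    return [d.lower().strip(".") for d in docs]

def label(docs):                    # Stage 3: stand-in for one cloud API
    return ["positive" if ("grew" in d or "strong" in d) else "negative"
            for d in docs]

def agree(labels_by_api):           # Stage 4: do the providers agree?
    return all(l == labels_by_api[0] for l in labels_by_api)

def pipeline():
    docs = preprocess(ingest())
    labels_a = label(docs)          # e.g. Comprehend
    labels_b = label(docs)          # e.g. Watson (same stub here)
    assert agree([labels_a, labels_b])
    return labels_a                 # Stage 5 would train a model on these

print(pipeline())                   # ['positive', 'negative']
```

Structuring each stage as a function with explicit inputs and outputs is what lets a workflow tool replicate, swap, or parallelize individual stages.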
50. 50
Creating replicable environments
Create replicable environments (code + software + data) through an easy point-and-click tool, and
publish them to Docker Hub or manage them internally
Share it with target users
55. 55
Step 4: Build a model with MATLAB using the MATLAB IDE on
QuSandbox
56. 56
Step 5: Set up your Quant Research Pipeline on the Model
Management Studio to enable replication and automation
57. 57
NLP pipeline
Stage 1: Data ingestion from EDGAR
Stage 2: Pre-processing
Stage 3: Invoke APIs to label data
▫ Amazon Comprehend API
▫ Google API
▫ Watson API
▫ Azure API
Stage 4: Compare APIs
Stage 5: Build a new model for sentiment analysis
Sri Krishnamurthy, CFA, CAP
Founder and Chief Data Scientist
sri@quantuniversity.com
srikrishnamurthy
www.QuantUniversity.com
www.analyticscertificate.com
www.qusandbox.com
Information, data and drawings embodied in this presentation are strictly a property of QuantUniversity LLC. and shall not be
distributed or used in any other publication without the prior written consent of QuantUniversity LLC.
67