SlideShare a Scribd company logo
1 of 57
Download to read offline
Location:
ARPM Open Source Conference
8/13/2017
Machine Learning applications in Credit Risk
2017 Copyright QuantUniversity LLC.
Presented By:
Sri Krishnamurthy, CFA, CAP
sri@quantuniversity.com
www.analyticscertificate.com
2
Slides will be available at:
www.analyticscertificatecom/MachineLearning
• Founder of QuantUniversity LLC. and
www.analyticscertificate.com
• Advisory and Consultancy for Financial Analytics
• Prior Experience at MathWorks, Citigroup and
Endeca and 25+ financial services and energy
customers.
• Regular Columnist for the Wilmott Magazine
• Author of forthcoming book
“Financial Modeling: A case study approach”
published by Wiley
• Charted Financial Analyst and Certified Analytics
Professional
• Teaches Analytics in the Babson College MBA
program and at Northeastern University, Boston
Sri Krishnamurthy
Founder and CEO
3
4
Quantitative Analytics and Big Data Analytics Onboarding
• Data Science, Quant Finance and
Machine Learning Advisory
• Trained more than 1000 students in
Quantitative methods, Data Science
and Big Data Technologies using
MATLAB, Python and R
• Launching
▫ Analytics Certificate Program
 Spring 2018
▫ Fintech Certification program
 Fall 2017
• Building
6
Credit risk in consumer credit
Credit-scoring models and techniques assess the risk in lending to
customers.
Typical decisions:
• Grant credit/not to new applicants
• Increasing/Decreasing spending limits
• Increasing/Decreasing lending rates
• What new products can be given to existing applicants ?
Credit assessment in consumer credit
History:
• Gut feel
• Social network
• Communities and influence
Traditional:
• Scoring mechanisms through credit bureaus
• Bank assessments through business rules
Newer approaches (FINTECH):
• Peer-to-Peer lending
• Lending club, Prosper Market place
9
10
Types of algorithms
Machine
learning
Supervised
Learning
Prediction
Classification
Unsupervised
Learning
Clustering
11
Used to derive a relationship between dependent and independent
variables
• Prediction
▫ Regression
▫ Decision Trees (CART)
▫ Neural Networks
• Classification
▫ Logistic Regression
▫ CART, Random Forest, SVM
▫ Neural Networks
Supervised Learning
12
Data pre-
processing
Split data into
Training and
Testing sets
Train the model
on Training data
Test the model
using Testing data
to evaluate model
performance
Methodology
13
• No distinction between independent variables and dependent
variables
• No result labels to determine “correct” results
• Goals:
▫ Data Reduction
▫ Clustering
Unsupervised Learning
14
• Partitioning Clustering
▫ Starts with K –number of clusters sought
▫ Observations randomly divided to form cohesive clusters
▫ Example : K-means
• Hierarchical Agglomerative Clustering
▫ Each observation is its own cluster
▫ Combine clusters two at a time to finally have one cluster
▫ Example: Hierarchical clustering using single linkage, Ward’s method
etc.
Types of Clustering
15
• Tries to separate samples into K groups with a goal of maximizing
between group variance and minimizing within group variance
• Requires K to be specified up front.
• Starts with K initial centroids and optimizes to minimize the criterion
or till the number of specified iterations are reached.
• Suited for larger datasets
K-means
16
• Goal is to derive a dendrogram starting from each record being its
own cluster
• Works well for smaller data sets
• Proximity is measured in multiple ways (more later)
Hierarchical clustering
17
How do you measure similarity between two entities ?
▫ Apples and Bananas
▫ Coke and Pepsi vs Orange juice
▫ Honda Civic vs Toyota Corolla
▫ New York and Boston
• The notion of distance
The notion of distance
18
• Euclidean distance
• Cosine distance
Distance measures
19
• Manhattan distance
(Taxi-cab distance)
• Jaccard distance
▫ Used to measure similarity or dissimilarity between binary and non-
binary variables
▫ http://people.revoledu.com/kardi/tutorial/Similarity/Jaccard.html
Other distance measures
20
• Gower distance is used for calculating distances when we have mixed types
of variables (continuous and categorical)
• Variables can be:
▫ Quantitative (such as rating scale)
▫ Binary (such as present/absent)
▫ Nominal (such as worker/teacher/clerk)
• The metrics used for each data type are described below:
▫ Quantitative: range-normalized Manhattan distance
▫ Ordinal: variable is first ranked, then Manhattan distance is used with a special
adjustment for ties
▫ Nominal: variables of k categories are first converted into k binary columns and
then the Dice coefficient is used
(https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient )
Working with mixed-data
21
• Daisy : Compute all the pairwise dissimilarities (distances) between
observations in the data set
• Pam: Partitioning (clustering) of the data into k clusters “around
medoids”, a more robust version of K-means.
• Agnes: Computes agglomerative nesting (hierarchical clustering) of
the dataset.
Support in R
22
23
Lending club
24
The Data
https://www.lendingclub.com/info/download-data.action
25
The Data
https://www.kaggle.com/wendykan/lending-club-loan-data
Variable description
• Calculate dissimilarity between observations.
• Select algorithm to group observations together
• Choose the best number of clusters
• Visualize clusters on reduced dimensions
Objective
• Partitioning around medoids (PAM) is used in this case.
• PAM is an iterative clustering procedure with the following steps:
▫ Step 1: Choose k random entities to become the medoids.
▫ Step 2: Assign every entity to its closest medoid (using the distance
matrix we have calculated).
▫ Step 3: For each cluster, identify the observation that would yield the
lowest average distance if it were to be re-assigned as the medoid. If
so, make this observation the new medoid.
▫ Step 4: If at least one medoid has changes, return to step 2.
Otherwise, end the algorithm.
Selecting number of clusters
• One way to visualize many variables in a lower dimensional space is
with t-distributed stochastic neighborhood embedding (t-SNE)
• This method is a dimension reduction technique that tries to
preserve local structure so as to make clusters visible in a 2D or 3D
visualization.
• https://en.wikipedia.org/wiki/T-
distributed_stochastic_neighbor_embedding
Visualization with reduced dimension
30
31
Alternative Credit scoring in the news
32
Fintech being noticed by Regulators
33
• The regulatory sandbox allows businesses to test innovative
products, services, business models and delivery mechanisms in the
real market, with real consumers.
• The sandbox is a supervised space, open to both authorized and
unauthorized firms, that provides firms with:
▫ reduced time-to-market at potentially lower cost
▫ appropriate consumer protection safeguards built in to new products and
services
▫ better access to finance
• https://www.fca.org.uk/firms/regulatory-sandbox
Regulatory Sandboxes
34
US Regulators catching up
Model Validation
• “Model risk is the potential for adverse consequences from
decisions based on incorrect or misused model outputs and
reports. “ [1]
• “Model validation is the set of processes and activities
intended to verify that models are performing as expected,
in line with their design objectives and business uses. ” [1]
• Ref:
• [1] . Supervisory Letter SR 11-7 on guidance on Model Risk
36
Popularity of Open-source software in the enterprise
increasing
37
• Financial Services customers like Capital One, FINRA, and Pacific Life
are moving critical workloads to AWS
Cloud maturing
38
• Versions and packages
Challenges in adopting Open-source software in the
enterprise
39
• Difficulty in replicating and reconciling differences in environments
Challenges in adopting Open-source software in the
enterprise
40
• Deploying models built by Data Scientists still a problem
Challenges in adopting Open-source software in the
enterprise
Data Scientists Enterprise IT
41
• The try-before-adopt model is difficult with unproven open-source
solutions
Challenges in adopting Open-source software in the
enterprise
42
www.QuSandbox.com
43
Quant/Enterprise use cases
• Create an environment that can support multiple platforms and
programming languages
• Enable remote running of applications
• Ability to try out a Github submission/ someone else’s code
• Facilitate creation of Docker images to create replicable containers
• Create prototyping environments for Data Science/Quant teams
• Enable Data scientists/Quants to deploy their solutions
• Enable running multiple experiments concurrently
• Integrate seamlessly with the cloud to scale up computations
Use cases
44
Fintech use cases
• To demonstrate solutions to enterprises
• Create customized enterprise trials for companies that don’t permit
installation of vendor software prior to procurement
• To manage quick updates
• Enable effective integration and hosting of services (REST APIs)
Use cases
45
Academic use cases
• Enable creation of course material and exercises that could be
shared
• Enable students and workshop participants to focus on the data
science experiments rather than environment setting
Use cases
46
Creating replicable environments
Creating and manage replicable environments (Code + software + data) in a single portal
47
Creating replicable environments
Create replicable environments (Code + software + data) through a easy point & click tool and
publish to Dockerhub or manage internally
Share it with target users
48
User portal
• Run multiple experiments in pre-created environments (Code + software + data)
• Deploy your own solutions
• Run any Docker image or Github submission on the cloud
49
Run Jupyter notebooks and prototype applications
50
Run Rstudio and Shiny applications
51
Run any Docker application
52
Manage tasks and errors
53
User portal
• Dockerize and deploy applications on AWS in just a few steps
54
Deploy applications with ease
55
QU’s open source project – Project Mozaic
56
www.QuSandbox.com
57
www.analyticscertificatecom/MachineLearning
Thank you ARPM and enjoy the boot camp!
Checkout our programs at:
www.analyticscertificate.com/fintech
www.qusandbox.com
Sri Krishnamurthy, CFA, CAP
Founder and CEO
QuantUniversity LLC.
srikrishnamurthy
www.QuantUniversity.com
Information, data and drawings embodied in this presentation are strictly a property of QuantUniversity LLC. and shall not be
distributed or used in any other publication without the prior written consent of QuantUniversity LLC.
58

More Related Content

What's hot

Trends and practical applications of AI/ML in Fin Tech industry - Milos Kosan...
Trends and practical applications of AI/ML in Fin Tech industry - Milos Kosan...Trends and practical applications of AI/ML in Fin Tech industry - Milos Kosan...
Trends and practical applications of AI/ML in Fin Tech industry - Milos Kosan...Institute of Contemporary Sciences
 
Machine Learning for Fraud Detection
Machine Learning for Fraud DetectionMachine Learning for Fraud Detection
Machine Learning for Fraud DetectionNitesh Kumar
 
Credit Card Fraud Detection Tutorial
Credit Card Fraud Detection TutorialCredit Card Fraud Detection Tutorial
Credit Card Fraud Detection TutorialKNIMESlides
 
Artificial Intelligence for Banking Fraud Prevention
Artificial Intelligence for Banking Fraud PreventionArtificial Intelligence for Banking Fraud Prevention
Artificial Intelligence for Banking Fraud PreventionJérôme Kehrli
 
Building a Data Analytics Portfolio
Building a Data Analytics PortfolioBuilding a Data Analytics Portfolio
Building a Data Analytics PortfolioJamie Renehan, FCCA
 
Automated anti money laundering using artificial intelligence and machine lea...
Automated anti money laundering using artificial intelligence and machine lea...Automated anti money laundering using artificial intelligence and machine lea...
Automated anti money laundering using artificial intelligence and machine lea...Santhosh L
 
Predictive Analytics - An Overview
Predictive Analytics - An OverviewPredictive Analytics - An Overview
Predictive Analytics - An OverviewMachinePulse
 
Credit card fraud detection using python machine learning
Credit card fraud detection using python machine learningCredit card fraud detection using python machine learning
Credit card fraud detection using python machine learningSandeep Garg
 
Fraud Detection with Cost-Sensitive Predictive Analytics
Fraud Detection with Cost-Sensitive Predictive AnalyticsFraud Detection with Cost-Sensitive Predictive Analytics
Fraud Detection with Cost-Sensitive Predictive AnalyticsAlejandro Correa Bahnsen, PhD
 
Synthetic data generation for machine learning
Synthetic data generation for machine learningSynthetic data generation for machine learning
Synthetic data generation for machine learningQuantUniversity
 
Disruption in Digital Banking
Disruption in Digital BankingDisruption in Digital Banking
Disruption in Digital BankingBackbase
 
Analytics with Descriptive, Predictive and Prescriptive Techniques
Analytics with Descriptive, Predictive and Prescriptive TechniquesAnalytics with Descriptive, Predictive and Prescriptive Techniques
Analytics with Descriptive, Predictive and Prescriptive Techniquesleadershipsoil
 
ACFE Presentation on Analytics for Fraud Detection and Mitigation
ACFE Presentation on Analytics for Fraud Detection and MitigationACFE Presentation on Analytics for Fraud Detection and Mitigation
ACFE Presentation on Analytics for Fraud Detection and MitigationScott Mongeau
 
Nasscom AI top 50 use cases
Nasscom AI top 50 use casesNasscom AI top 50 use cases
Nasscom AI top 50 use casesADDI AI 2050
 
Open banking [Evolution, Risks & Opportunities]
Open banking [Evolution, Risks & Opportunities]Open banking [Evolution, Risks & Opportunities]
Open banking [Evolution, Risks & Opportunities]Kannan Srinivasan
 
Credit Risk Evaluation Model
Credit Risk Evaluation ModelCredit Risk Evaluation Model
Credit Risk Evaluation ModelMihai Enescu
 
Fraud detection ML
Fraud detection MLFraud detection ML
Fraud detection MLMaatougSelim
 

What's hot (20)

Trends and practical applications of AI/ML in Fin Tech industry - Milos Kosan...
Trends and practical applications of AI/ML in Fin Tech industry - Milos Kosan...Trends and practical applications of AI/ML in Fin Tech industry - Milos Kosan...
Trends and practical applications of AI/ML in Fin Tech industry - Milos Kosan...
 
Machine Learning for Fraud Detection
Machine Learning for Fraud DetectionMachine Learning for Fraud Detection
Machine Learning for Fraud Detection
 
Credit Card Fraud Detection Tutorial
Credit Card Fraud Detection TutorialCredit Card Fraud Detection Tutorial
Credit Card Fraud Detection Tutorial
 
Computer Vision
Computer VisionComputer Vision
Computer Vision
 
Artificial Intelligence for Banking Fraud Prevention
Artificial Intelligence for Banking Fraud PreventionArtificial Intelligence for Banking Fraud Prevention
Artificial Intelligence for Banking Fraud Prevention
 
Business Intelligence
Business IntelligenceBusiness Intelligence
Business Intelligence
 
Building a Data Analytics Portfolio
Building a Data Analytics PortfolioBuilding a Data Analytics Portfolio
Building a Data Analytics Portfolio
 
Automated anti money laundering using artificial intelligence and machine lea...
Automated anti money laundering using artificial intelligence and machine lea...Automated anti money laundering using artificial intelligence and machine lea...
Automated anti money laundering using artificial intelligence and machine lea...
 
Predictive Analytics - An Overview
Predictive Analytics - An OverviewPredictive Analytics - An Overview
Predictive Analytics - An Overview
 
Credit card fraud detection using python machine learning
Credit card fraud detection using python machine learningCredit card fraud detection using python machine learning
Credit card fraud detection using python machine learning
 
Fraud Detection with Cost-Sensitive Predictive Analytics
Fraud Detection with Cost-Sensitive Predictive AnalyticsFraud Detection with Cost-Sensitive Predictive Analytics
Fraud Detection with Cost-Sensitive Predictive Analytics
 
Synthetic data generation for machine learning
Synthetic data generation for machine learningSynthetic data generation for machine learning
Synthetic data generation for machine learning
 
Disruption in Digital Banking
Disruption in Digital BankingDisruption in Digital Banking
Disruption in Digital Banking
 
Analytics with Descriptive, Predictive and Prescriptive Techniques
Analytics with Descriptive, Predictive and Prescriptive TechniquesAnalytics with Descriptive, Predictive and Prescriptive Techniques
Analytics with Descriptive, Predictive and Prescriptive Techniques
 
ACFE Presentation on Analytics for Fraud Detection and Mitigation
ACFE Presentation on Analytics for Fraud Detection and MitigationACFE Presentation on Analytics for Fraud Detection and Mitigation
ACFE Presentation on Analytics for Fraud Detection and Mitigation
 
Data analytics
Data analyticsData analytics
Data analytics
 
Nasscom AI top 50 use cases
Nasscom AI top 50 use casesNasscom AI top 50 use cases
Nasscom AI top 50 use cases
 
Open banking [Evolution, Risks & Opportunities]
Open banking [Evolution, Risks & Opportunities]Open banking [Evolution, Risks & Opportunities]
Open banking [Evolution, Risks & Opportunities]
 
Credit Risk Evaluation Model
Credit Risk Evaluation ModelCredit Risk Evaluation Model
Credit Risk Evaluation Model
 
Fraud detection ML
Fraud detection MLFraud detection ML
Fraud detection ML
 

Similar to Machine Learning in Credit Risk Scoring

Regtech in Fintech + QuSandbox Demo
Regtech in Fintech + QuSandbox DemoRegtech in Fintech + QuSandbox Demo
Regtech in Fintech + QuSandbox DemoQuantUniversity
 
A flexible recommenndation system for Cable TV
A flexible recommenndation system for Cable TVA flexible recommenndation system for Cable TV
A flexible recommenndation system for Cable TVIntoTheMinds
 
A Flexible Recommendation System for Cable TV
A Flexible Recommendation System for Cable TVA Flexible Recommendation System for Cable TV
A Flexible Recommendation System for Cable TVFrancisco Couto
 
Practical model management in the age of Data science and ML
Practical model management in the age of Data science and MLPractical model management in the age of Data science and ML
Practical model management in the age of Data science and MLQuantUniversity
 
Time series analysis : Refresher and Innovations
Time series analysis : Refresher and InnovationsTime series analysis : Refresher and Innovations
Time series analysis : Refresher and InnovationsQuantUniversity
 
Digicorp - Supply Chain Analytics Apps
Digicorp - Supply Chain Analytics AppsDigicorp - Supply Chain Analytics Apps
Digicorp - Supply Chain Analytics AppsDigicorp
 
Scaling Analytics with Apache Spark
Scaling Analytics with Apache SparkScaling Analytics with Apache Spark
Scaling Analytics with Apache SparkQuantUniversity
 
QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ...
QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ...QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ...
QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ...QuantUniversity
 
The Future of BriteCore - Product Development
The Future of BriteCore - Product DevelopmentThe Future of BriteCore - Product Development
The Future of BriteCore - Product DevelopmentPhil Reynolds
 
Digital Transformation at the University of Edinburgh
Digital Transformation at the University of EdinburghDigital Transformation at the University of Edinburgh
Digital Transformation at the University of EdinburghMark Ritchie
 
Machine Learning and AI: Core Methods and Applications
Machine Learning and AI: Core Methods and ApplicationsMachine Learning and AI: Core Methods and Applications
Machine Learning and AI: Core Methods and ApplicationsQuantUniversity
 
Multi-Agency Multi-Media Interoperable Communication, Enabled By Redis: Paul ...
Multi-Agency Multi-Media Interoperable Communication, Enabled By Redis: Paul ...Multi-Agency Multi-Media Interoperable Communication, Enabled By Redis: Paul ...
Multi-Agency Multi-Media Interoperable Communication, Enabled By Redis: Paul ...Redis Labs
 
Using PySpark to Scale Markov Decision Problems for Policy Exploration
Using PySpark to Scale Markov Decision Problems for Policy ExplorationUsing PySpark to Scale Markov Decision Problems for Policy Exploration
Using PySpark to Scale Markov Decision Problems for Policy ExplorationDatabricks
 
Pistoia Alliance USA Conference 2016
Pistoia Alliance USA Conference 2016Pistoia Alliance USA Conference 2016
Pistoia Alliance USA Conference 2016Pistoia Alliance
 

Similar to Machine Learning in Credit Risk Scoring (20)

Credit risk meetup
Credit risk meetupCredit risk meetup
Credit risk meetup
 
Regtech in Fintech + QuSandbox Demo
Regtech in Fintech + QuSandbox DemoRegtech in Fintech + QuSandbox Demo
Regtech in Fintech + QuSandbox Demo
 
A flexible recommenndation system for Cable TV
A flexible recommenndation system for Cable TVA flexible recommenndation system for Cable TV
A flexible recommenndation system for Cable TV
 
A Flexible Recommendation System for Cable TV
A Flexible Recommendation System for Cable TVA Flexible Recommendation System for Cable TV
A Flexible Recommendation System for Cable TV
 
Practical model management in the age of Data science and ML
Practical model management in the age of Data science and MLPractical model management in the age of Data science and ML
Practical model management in the age of Data science and ML
 
Time series analysis : Refresher and Innovations
Time series analysis : Refresher and InnovationsTime series analysis : Refresher and Innovations
Time series analysis : Refresher and Innovations
 
QuSandbox+NVIDIA Rapids
QuSandbox+NVIDIA RapidsQuSandbox+NVIDIA Rapids
QuSandbox+NVIDIA Rapids
 
Ds for finance day 4
Ds for finance day 4Ds for finance day 4
Ds for finance day 4
 
Digicorp - Supply Chain Analytics Apps
Digicorp - Supply Chain Analytics AppsDigicorp - Supply Chain Analytics Apps
Digicorp - Supply Chain Analytics Apps
 
Scaling Analytics with Apache Spark
Scaling Analytics with Apache SparkScaling Analytics with Apache Spark
Scaling Analytics with Apache Spark
 
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
 
QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ...
QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ...QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ...
QuTrack: Model Life Cycle Management for AI and ML models using a Blockchain ...
 
The Future of BriteCore - Product Development
The Future of BriteCore - Product DevelopmentThe Future of BriteCore - Product Development
The Future of BriteCore - Product Development
 
Digital Transformation at the University of Edinburgh
Digital Transformation at the University of EdinburghDigital Transformation at the University of Edinburgh
Digital Transformation at the University of Edinburgh
 
Machine Learning and AI: Core Methods and Applications
Machine Learning and AI: Core Methods and ApplicationsMachine Learning and AI: Core Methods and Applications
Machine Learning and AI: Core Methods and Applications
 
Shikha fdp 62_14july2017
Shikha fdp 62_14july2017Shikha fdp 62_14july2017
Shikha fdp 62_14july2017
 
Multi-Agency Multi-Media Interoperable Communication, Enabled By Redis: Paul ...
Multi-Agency Multi-Media Interoperable Communication, Enabled By Redis: Paul ...Multi-Agency Multi-Media Interoperable Communication, Enabled By Redis: Paul ...
Multi-Agency Multi-Media Interoperable Communication, Enabled By Redis: Paul ...
 
Using PySpark to Scale Markov Decision Problems for Policy Exploration
Using PySpark to Scale Markov Decision Problems for Policy ExplorationUsing PySpark to Scale Markov Decision Problems for Policy Exploration
Using PySpark to Scale Markov Decision Problems for Policy Exploration
 
Pistoia Alliance USA Conference 2016
Pistoia Alliance USA Conference 2016Pistoia Alliance USA Conference 2016
Pistoia Alliance USA Conference 2016
 
21st century quant
21st century quant21st century quant
21st century quant
 

More from QuantUniversity

EU Artificial Intelligence Act 2024 passed !
EU Artificial Intelligence Act 2024 passed !EU Artificial Intelligence Act 2024 passed !
EU Artificial Intelligence Act 2024 passed !QuantUniversity
 
Managing-the-Risks-of-LLMs-in-FS-Industry-Roundtable-TruEra-QuantU.pdf
Managing-the-Risks-of-LLMs-in-FS-Industry-Roundtable-TruEra-QuantU.pdfManaging-the-Risks-of-LLMs-in-FS-Industry-Roundtable-TruEra-QuantU.pdf
Managing-the-Risks-of-LLMs-in-FS-Industry-Roundtable-TruEra-QuantU.pdfQuantUniversity
 
PYTHON AND DATA SCIENCE FOR INVESTMENT PROFESSIONALS
PYTHON AND DATA SCIENCE FOR INVESTMENT PROFESSIONALSPYTHON AND DATA SCIENCE FOR INVESTMENT PROFESSIONALS
PYTHON AND DATA SCIENCE FOR INVESTMENT PROFESSIONALSQuantUniversity
 
Qu for India - QuantUniversity FundRaiser
Qu for India  - QuantUniversity FundRaiserQu for India  - QuantUniversity FundRaiser
Qu for India - QuantUniversity FundRaiserQuantUniversity
 
Ml master class for CFA Dallas
Ml master class for CFA DallasMl master class for CFA Dallas
Ml master class for CFA DallasQuantUniversity
 
Algorithmic auditing 1.0
Algorithmic auditing 1.0Algorithmic auditing 1.0
Algorithmic auditing 1.0QuantUniversity
 
Towards Fairer Datasets: Filtering and Balancing the Distribution of the Peop...
Towards Fairer Datasets: Filtering and Balancing the Distribution of the Peop...Towards Fairer Datasets: Filtering and Balancing the Distribution of the Peop...
Towards Fairer Datasets: Filtering and Balancing the Distribution of the Peop...QuantUniversity
 
Machine Learning: Considerations for Fairly and Transparently Expanding Acces...
Machine Learning: Considerations for Fairly and Transparently Expanding Acces...Machine Learning: Considerations for Fairly and Transparently Expanding Acces...
Machine Learning: Considerations for Fairly and Transparently Expanding Acces...QuantUniversity
 
Seeing what a gan cannot generate: paper review
Seeing what a gan cannot generate: paper reviewSeeing what a gan cannot generate: paper review
Seeing what a gan cannot generate: paper reviewQuantUniversity
 
AI Explainability and Model Risk Management
AI Explainability and Model Risk ManagementAI Explainability and Model Risk Management
AI Explainability and Model Risk ManagementQuantUniversity
 
Algorithmic auditing 1.0
Algorithmic auditing 1.0Algorithmic auditing 1.0
Algorithmic auditing 1.0QuantUniversity
 
Machine Learning in Finance: 10 Things You Need to Know in 2021
Machine Learning in Finance: 10 Things You Need to Know in 2021Machine Learning in Finance: 10 Things You Need to Know in 2021
Machine Learning in Finance: 10 Things You Need to Know in 2021QuantUniversity
 
Bayesian Portfolio Allocation
Bayesian Portfolio AllocationBayesian Portfolio Allocation
Bayesian Portfolio AllocationQuantUniversity
 
Constructing Private Asset Benchmarks
Constructing Private Asset BenchmarksConstructing Private Asset Benchmarks
Constructing Private Asset BenchmarksQuantUniversity
 
Machine Learning Interpretability
Machine Learning InterpretabilityMachine Learning Interpretability
Machine Learning InterpretabilityQuantUniversity
 
Responsible AI in Action
Responsible AI in ActionResponsible AI in Action
Responsible AI in ActionQuantUniversity
 
Qu speaker series 14: Synthetic Data Generation in Finance
Qu speaker series 14: Synthetic Data Generation in FinanceQu speaker series 14: Synthetic Data Generation in Finance
Qu speaker series 14: Synthetic Data Generation in FinanceQuantUniversity
 

More from QuantUniversity (20)

EU Artificial Intelligence Act 2024 passed !
EU Artificial Intelligence Act 2024 passed !EU Artificial Intelligence Act 2024 passed !
EU Artificial Intelligence Act 2024 passed !
 
Managing-the-Risks-of-LLMs-in-FS-Industry-Roundtable-TruEra-QuantU.pdf
Managing-the-Risks-of-LLMs-in-FS-Industry-Roundtable-TruEra-QuantU.pdfManaging-the-Risks-of-LLMs-in-FS-Industry-Roundtable-TruEra-QuantU.pdf
Managing-the-Risks-of-LLMs-in-FS-Industry-Roundtable-TruEra-QuantU.pdf
 
PYTHON AND DATA SCIENCE FOR INVESTMENT PROFESSIONALS
PYTHON AND DATA SCIENCE FOR INVESTMENT PROFESSIONALSPYTHON AND DATA SCIENCE FOR INVESTMENT PROFESSIONALS
PYTHON AND DATA SCIENCE FOR INVESTMENT PROFESSIONALS
 
Qu for India - QuantUniversity FundRaiser
Qu for India  - QuantUniversity FundRaiserQu for India  - QuantUniversity FundRaiser
Qu for India - QuantUniversity FundRaiser
 
Ml master class for CFA Dallas
Ml master class for CFA DallasMl master class for CFA Dallas
Ml master class for CFA Dallas
 
Algorithmic auditing 1.0
Algorithmic auditing 1.0Algorithmic auditing 1.0
Algorithmic auditing 1.0
 
Towards Fairer Datasets: Filtering and Balancing the Distribution of the Peop...
Towards Fairer Datasets: Filtering and Balancing the Distribution of the Peop...Towards Fairer Datasets: Filtering and Balancing the Distribution of the Peop...
Towards Fairer Datasets: Filtering and Balancing the Distribution of the Peop...
 
Machine Learning: Considerations for Fairly and Transparently Expanding Acces...
Machine Learning: Considerations for Fairly and Transparently Expanding Acces...Machine Learning: Considerations for Fairly and Transparently Expanding Acces...
Machine Learning: Considerations for Fairly and Transparently Expanding Acces...
 
Seeing what a gan cannot generate: paper review
Seeing what a gan cannot generate: paper reviewSeeing what a gan cannot generate: paper review
Seeing what a gan cannot generate: paper review
 
AI Explainability and Model Risk Management
AI Explainability and Model Risk ManagementAI Explainability and Model Risk Management
AI Explainability and Model Risk Management
 
Algorithmic auditing 1.0
Algorithmic auditing 1.0Algorithmic auditing 1.0
Algorithmic auditing 1.0
 
Machine Learning in Finance: 10 Things You Need to Know in 2021
Machine Learning in Finance: 10 Things You Need to Know in 2021Machine Learning in Finance: 10 Things You Need to Know in 2021
Machine Learning in Finance: 10 Things You Need to Know in 2021
 
Bayesian Portfolio Allocation
Bayesian Portfolio AllocationBayesian Portfolio Allocation
Bayesian Portfolio Allocation
 
The API Jungle
The API JungleThe API Jungle
The API Jungle
 
Explainable AI Workshop
Explainable AI WorkshopExplainable AI Workshop
Explainable AI Workshop
 
Constructing Private Asset Benchmarks
Constructing Private Asset BenchmarksConstructing Private Asset Benchmarks
Constructing Private Asset Benchmarks
 
Machine Learning Interpretability
Machine Learning InterpretabilityMachine Learning Interpretability
Machine Learning Interpretability
 
Responsible AI in Action
Responsible AI in ActionResponsible AI in Action
Responsible AI in Action
 
Qu speaker series 14: Synthetic Data Generation in Finance
Qu speaker series 14: Synthetic Data Generation in FinanceQu speaker series 14: Synthetic Data Generation in Finance
Qu speaker series 14: Synthetic Data Generation in Finance
 
Qwafafew meeting 5
Qwafafew meeting 5Qwafafew meeting 5
Qwafafew meeting 5
 

Recently uploaded

EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改atducpo
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998YohFuh
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...ThinkInnovation
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 

Recently uploaded (20)

EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998RA-11058_IRR-COMPRESS Do 198 series of 1998
RA-11058_IRR-COMPRESS Do 198 series of 1998
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
Predictive Analysis - Using Insight-informed Data to Determine Factors Drivin...
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 

Machine Learning in Credit Risk Scoring

  • 1. Location: ARPM Open Source Conference 8/13/2017 Machine Learning applications in Credit Risk 2017 Copyright QuantUniversity LLC. Presented By: Sri Krishnamurthy, CFA, CAP sri@quantuniversity.com www.analyticscertificate.com
  • 2. 2 Slides will be available at: www.analyticscertificatecom/MachineLearning
  • 3. • Founder of QuantUniversity LLC. and www.analyticscertificate.com • Advisory and Consultancy for Financial Analytics • Prior Experience at MathWorks, Citigroup and Endeca and 25+ financial services and energy customers. • Regular Columnist for the Wilmott Magazine • Author of forthcoming book “Financial Modeling: A case study approach” published by Wiley • Charted Financial Analyst and Certified Analytics Professional • Teaches Analytics in the Babson College MBA program and at Northeastern University, Boston Sri Krishnamurthy Founder and CEO 3
  • 4. 4 Quantitative Analytics and Big Data Analytics Onboarding • Data Science, Quant Finance and Machine Learning Advisory • Trained more than 1000 students in Quantitative methods, Data Science and Big Data Technologies using MATLAB, Python and R • Launching ▫ Analytics Certificate Program  Spring 2018 ▫ Fintech Certification program  Fall 2017 • Building
  • 5. 6
  • 6. Credit risk in consumer credit Credit-scoring models and techniques assess the risk in lending to customers. Typical decisions: • Grant credit/not to new applicants • Increasing/Decreasing spending limits • Increasing/Decreasing lending rates • What new products can be given to existing applicants ?
  • 7. Credit assessment in consumer credit History: • Gut feel • Social network • Communities and influence Traditional: • Scoring mechanisms through credit bureaus • Bank assessments through business rules Newer approaches (FINTECH): • Peer-to-Peer lending • Lending club, Prosper Market place
  • 8. 9
  • 10. 11 Used to derive a relationship between dependent and independent variables • Prediction ▫ Regression ▫ Decision Trees (CART) ▫ Neural Networks • Classification ▫ Logistic Regression ▫ CART, Random Forest, SVM ▫ Neural Networks Supervised Learning
  • 11. 12 Data pre- processing Split data into Training and Testing sets Train the model on Training data Test the model using Testing data to evaluate model performance Methodology
  • 12. 13 • No distinction between independent variables and dependent variables • No result labels to determine “correct” results • Goals: ▫ Data Reduction ▫ Clustering Unsupervised Learning
  • 13. 14 • Partitioning Clustering ▫ Starts with K –number of clusters sought ▫ Observations randomly divided to form cohesive clusters ▫ Example : K-means • Hierarchical Agglomerative Clustering ▫ Each observation is its own cluster ▫ Combine clusters two at a time to finally have one cluster ▫ Example: Hierarchical clustering using single linkage, Ward’s method etc. Types of Clustering
  • 14. 15 • Tries to separate samples into K groups with a goal of maximizing between group variance and minimizing within group variance • Requires K to be specified up front. • Starts with K initial centroids and optimizes to minimize the criterion or till the number of specified iterations are reached. • Suited for larger datasets K-means
  • 15. 16 • Goal is to derive a dendrogram starting from each record being its own cluster • Works well for smaller data sets • Proximity is measured in multiple ways (more later) Hierarchical clustering
  • 16. 17 How do you measure similarity between two entities ? ▫ Apples and Bananas ▫ Coke and Pepsi vs Orange juice ▫ Honda Civic vs Toyota Corolla ▫ New York and Boston • The notion of distance The notion of distance
  • 17. 18 • Euclidean distance • Cosine distance Distance measures
  • 18. 19 • Manhattan distance (Taxi-cab distance) • Jaccard distance ▫ Used to measure similarity or dissimilarity between binary and non- binary variables ▫ http://people.revoledu.com/kardi/tutorial/Similarity/Jaccard.html Other distance measures
  • 19. 20 • Gower distance is used for calculating distances when we have mixed types of variables (continuous and categorical) • Variables can be: ▫ Quantitative (such as rating scale) ▫ Binary (such as present/absent) ▫ Nominal (such as worker/teacher/clerk) • The metrics used for each data type are described below: ▫ Quantitative: range-normalized Manhattan distance ▫ Ordinal: variable is first ranked, then Manhattan distance is used with a special adjustment for ties ▫ Nominal: variables of k categories are first converted into k binary columns and then the Dice coefficient is used (https://en.wikipedia.org/wiki/S%C3%B8rensen%E2%80%93Dice_coefficient ) Working with mixed-data
  • 20. 21 • Daisy : Compute all the pairwise dissimilarities (distances) between observations in the data set • Pam: Partitioning (clustering) of the data into k clusters “around medoids”, a more robust version of K-means. • Agnes: Computes agglomerative nesting (hierarchical clustering) of the dataset. Support in R
  • 21. 22
  • 26. • Calculate dissimilarity between observations. • Select algorithm to group observations together • Choose the best number of clusters • Visualize clusters on reduced dimensions Objective
  • 27. • Partitioning around medoids (PAM) is used in this case. • PAM is an iterative clustering procedure with the following steps: ▫ Step 1: Choose k random entities to become the medoids. ▫ Step 2: Assign every entity to its closest medoid (using the distance matrix we have calculated). ▫ Step 3: For each cluster, identify the observation that would yield the lowest average distance if it were to be re-assigned as the medoid. If so, make this observation the new medoid. ▫ Step 4: If at least one medoid has changes, return to step 2. Otherwise, end the algorithm. Selecting number of clusters
  • 28. • One way to visualize many variables in a lower dimensional space is with t-distributed stochastic neighborhood embedding (t-SNE) • This method is a dimension reduction technique that tries to preserve local structure so as to make clusters visible in a 2D or 3D visualization. • https://en.wikipedia.org/wiki/T- distributed_stochastic_neighbor_embedding Visualization with reduced dimension
  • 29. 30
  • 31. 32 Fintech being noticed by Regulators
  • 32. 33 • The regulatory sandbox allows businesses to test innovative products, services, business models and delivery mechanisms in the real market, with real consumers. • The sandbox is a supervised space, open to both authorized and unauthorized firms, that provides firms with: ▫ reduced time-to-market at potentially lower cost ▫ appropriate consumer protection safeguards built in to new products and services ▫ better access to finance • https://www.fca.org.uk/firms/regulatory-sandbox Regulatory Sandboxes
  • 34. Model Validation • “Model risk is the potential for adverse consequences from decisions based on incorrect or misused model outputs and reports. “ [1] • “Model validation is the set of processes and activities intended to verify that models are performing as expected, in line with their design objectives and business uses. ” [1] • Ref: • [1] . Supervisory Letter SR 11-7 on guidance on Model Risk
  • 35. 36 Popularity of Open-source software in the enterprise increasing
  • 36. 37 • Financial Services customers like Capital One, FINRA, and Pacific Life are moving critical workloads to AWS Cloud maturing
  • 37. 38 • Versions and packages Challenges in adopting Open-source software in the enterprise
  • 38. 39 • Difficulty in replicating and reconciling differences in environments Challenges in adopting Open-source software in the enterprise
  • 39. 40 • Deploying models built by Data Scientists still a problem Challenges in adopting Open-source software in the enterprise Data Scientists Enterprise IT
  • 40. 41 • The try-before-adopt model is difficult with unproven open-source solutions Challenges in adopting Open-source software in the enterprise
  • 42. 43 Quant/Enterprise use cases • Create an environment that can support multiple platforms and programming languages • Enable remote running of applications • Ability to try out a Github submission/ someone else’s code • Facilitate creation of Docker images to create replicable containers • Create prototyping environments for Data Science/Quant teams • Enable Data scientists/Quants to deploy their solutions • Enable running multiple experiments concurrently • Integrate seamlessly with the cloud to scale up computations Use cases
  • 43. 44 Fintech use cases • To demonstrate solutions to enterprises • Create customized enterprise trials for companies that don’t permit installation of vendor software prior to procurement • To manage quick updates • Enable effective integration and hosting of services (REST APIs) Use cases
  • 44. 45 Academic use cases • Enable creation of course material and exercises that could be shared • Enable students and workshop participants to focus on the data science experiments rather than environment setting Use cases
  • 45. 46 Creating replicable environments Creating and manage replicable environments (Code + software + data) in a single portal
  • 46. 47 Creating replicable environments Create replicable environments (Code + software + data) through a easy point & click tool and publish to Dockerhub or manage internally Share it with target users
  • 47. 48 User portal • Run multiple experiments in pre-created environments (Code + software + data) • Deploy your own solutions • Run any Docker image or Github submission on the cloud
  • 48. 49 Run Jupyter notebooks and prototype applications
  • 49. 50 Run Rstudio and Shiny applications
  • 50. 51 Run any Docker application
  • 52. 53 User portal • Dockerize and deploy applications on AWS in just a few steps
  • 54. 55 QU’s open source project – Project Mozaic
  • 57. Thank you ARPM and enjoy the boot camp! Checkout our programs at: www.analyticscertificate.com/fintech www.qusandbox.com Sri Krishnamurthy, CFA, CAP Founder and CEO QuantUniversity LLC. srikrishnamurthy www.QuantUniversity.com Information, data and drawings embodied in this presentation are strictly a property of QuantUniversity LLC. and shall not be distributed or used in any other publication without the prior written consent of QuantUniversity LLC. 58