This document provides an agenda for a presentation on AI and machine learning in finance. The presentation will cover key trends in AI/ML, examples of applications in areas like lending and stock analysis, and a case study approach. It includes a biography of the speaker and details about their company which provides quantitative finance and machine learning training. The agenda outlines topics to be covered in the morning and afternoon sessions including machine learning algorithms and building an ML application.
A Master Class for Financial Professionals for AI and Machine Learning
featuring Sri Krishnamurthy, CFA, CAP, QuantUniversity
Summary
The use of data science and machine learning in the investment industry is increasing, and investment professionals, both fundamental and quantitative, are taking notice. Financial firms are taking AI and machine learning seriously as a way to augment traditional investment decision making. Alternative datasets, text analytics, cloud computing, and algorithmic trading are game changers for many firms, which are adopting technology at a rapid pace. As more of these technologies penetrate enterprises, financial professionals are enthusiastic about the coming revolution and are looking for direction and education on data science and machine learning topics.
In this workshop, we aim to bring clarity to how AI and machine learning are revolutionizing financial services. We will introduce key concepts and, through examples and case studies, illustrate the role of machine learning, data science techniques, and AI in the investment industry. At the end of this workshop, participants will have a concrete picture of how machine learning and AI techniques are fueling the Fintech wave!
Natural language processing (NLP) is an area of artificial intelligence that helps computers understand and interpret human language. Innovations in artificial intelligence, deep learning, and computational hardware are driving major strides in NLP research. While the applications are many, it is important to understand the kinds of problems NLP techniques can help solve.
In this master class, we will introduce ten key NLP techniques that are predominantly used in the industry.
- Question Answering
- Neural Machine Translation
- Topic Summarization
- Natural Language Inference
- Semantic Role Labeling
- Text Classification
- Sentiment Analysis
- Relation Extraction
- Goal-Oriented Dialogue
- Semantic Parsing
We will also illustrate a case study on NLP in Python using the QuSandbox.
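To give a flavor of what such a case study involves, here is a minimal sentiment-analysis sketch. This is a hypothetical lexicon-based scorer written purely for illustration; the actual QuSandbox case study uses richer techniques and is not reproduced here.

```python
# Minimal lexicon-based sentiment scorer (illustrative only; production
# NLP pipelines use trained models rather than hand-picked word lists).
POSITIVE = {"gain", "growth", "beat", "strong", "upgrade"}
NEGATIVE = {"loss", "decline", "miss", "weak", "downgrade"}

def sentiment_score(text: str) -> float:
    """Return a score in [-1, 1]: +1 if all matched words are positive,
    -1 if all are negative, 0 if no lexicon words are found."""
    tokens = [t.strip(".,;:!?").lower() for t in text.split()]
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

print(sentiment_score("Revenue growth was strong; analysts issued an upgrade."))
# → 1.0 (only positive lexicon words matched)
```

Real financial-NLP pipelines replace the word lists with trained classifiers, but the input/output shape of the problem is the same: raw text in, a numeric signal out.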
Machine Learning and AI: An Intuitive Introduction - CFA Institute Masterclass (QuantUniversity)
Learn how artificial intelligence (AI) and machine learning are revolutionizing financial services — this course will introduce key concepts and illustrate the role of machine learning, data science techniques, and AI through examples and case studies from the investment industry. The presentation uses simple mathematics and basic statistics to provide an intuitive understanding of machine learning, as used by financial firms, to augment traditional investment decision making.
This overview session offers a tour of machine learning and AI methods, examining case studies to understand the technology companies, data vendors, banks, and fintech startups that are the key players in trading and investment management. Practical examples and case studies will help participants understand key machine learning methodologies, choose an algorithm for a specific goal, and recognize when to use machine learning and AI techniques.
QuantUniversity Machine Learning in Finance Course
The use of data science and machine learning in the investment industry is increasing. Financial firms are using artificial intelligence (AI) and machine learning to augment traditional investment decision making. In this course, we aim to bring clarity to how AI and machine learning are revolutionizing financial services. We will introduce key concepts and, through examples and case studies, will illustrate the role of machine learning, data science techniques, and AI in the investment industry. Rather than just showing how to write code or run experiments in Python, we will provide an intuitive understanding of machine learning with just enough mathematics and basic statistics.
YOU WILL LEARN:
• Role of machine learning and AI in financial services
• When do we use machine learning and AI techniques?
• What are the key machine learning methodologies?
• How do you choose an algorithm for a specific goal?
• Practical case studies with fully functional code
Innovations in technology have revolutionized financial services to the extent that large financial institutions like Goldman Sachs claim to be technology companies! It is no secret that technological innovations like data science and AI are fundamentally changing how financial products are created, tested, and delivered. Yet while it is exciting to learn about the technologies themselves, there is very little guidance available on how companies and financial professionals should retool and gear themselves for the coming revolution.
In this master class, we will discuss key innovations in data science and AI and connect these novel fields to applications in forecasting and optimization. Through case studies and examples, we will demonstrate why now is the time to invest in learning about the topics that will reshape the financial services industry of the future!
AI in Finance
10 Key Considerations for AI/ML Model Governance (QuantUniversity)
As the financial industry continues to embrace AI and machine learning models, model risk management (MRM) departments are grappling with how to update model governance frameworks to adapt to the changing landscape of model management. While most MRM departments are structured, and their processes defined, to address traditional statistical and quant models, data-driven models like machine learning models require modifications in the way models are defined, tested, validated, and governed.
In this webinar, we will discuss ten key aspects to factor in when developing a model risk management framework that integrates machine learning models. We will cover the main drivers of model risk in today's environment, how the scope of model governance is changing, and the considerations that arise when a governance framework incorporates data science techniques and AI methodologies. Through this decalogue, we aim to bring clarity to some of the model governance challenges of adopting data science, AI, and machine learning methods in the enterprise.
Machine Learning and AI: Core Methods and Applications (QuantUniversity)
This session was presented at the CFA Institute on May 6th 2020
This deep-dive session discusses core methods and applications to provide an understanding of supervised and unsupervised machine learning. Participants will be introduced to advanced topics that include time series analysis, reinforcement learning, anomaly detection, and natural language processing. Case studies will also examine how to predict interest rates and credit risk with alternative data sets and how to analyze earnings calls from EDGAR using natural language processing techniques.
RAPIDS is a suite of open-source software libraries and APIs that lets you execute end-to-end data science and analytics pipelines entirely on GPUs. In this workshop, we will:
1. Introduce Rapids.ai & GPUs
2. Illustrate why GPUs are critical for machine learning and AI applications
3. Demonstrate common machine learning algorithms such as regression, KNN, and SGD using RAPIDS on the QuSandbox
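cuML, the machine learning library in RAPIDS, deliberately mirrors scikit-learn's estimator interface, so the fit/predict pattern covered in the workshop can be sketched on CPU and moved to GPU largely by changing an import. The sketch below uses scikit-learn and synthetic data; treating `cuml.neighbors.KNeighborsRegressor` as the drop-in GPU counterpart is an assumption based on that mirrored API, not a claim about the workshop's exact code.

```python
# KNN regression via the scikit-learn estimator API. RAPIDS cuML exposes
# a near-identical interface, so the same fit/score pattern runs on GPU
# with the import swapped (assuming a RAPIDS installation).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))                      # synthetic features
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=500)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = KNeighborsRegressor(n_neighbors=5).fit(X_train, y_train)
print(f"R^2 on held-out data: {model.score(X_test, y_test):.3f}")
```

The point of the GPU move is throughput, not API changes: the modeling code stays the same while the arrays and computation live on the device.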
QU Speaker Series - Session 3
https://qusummerschool.splashthat.com
A conversation with Quants, Thinkers and Innovators all challenged to innovate in turbulent times!
Join QuantUniversity for a complimentary summer speaker series where you will hear from Quants, innovators, startups and Fintech experts on various topics in Quant Investing, Machine Learning, Optimization, Fintech, AI etc.
Topic: Machine Learning and Model Risk (With a focus on Neural Network Models)
All models are wrong, and when they are wrong they create financial or non-financial risks. Understanding, testing, and managing model failures is the key focus of model risk management, particularly model validation.
For machine learning models, particular attention is paid to managing model fairness, explainability, robustness, and change control. In this presentation, I will focus the discussion on machine learning explainability and robustness. Explainability is critical for evaluating the conceptual soundness of models, particularly for applications in highly regulated institutions such as banks. There are many explainability tools available, and my focus in this talk is how to develop fundamentally interpretable models.
Neural networks (including deep learning), with the proper architectural choices, can be made highly interpretable. Since models in production are subjected to dynamically changing environments, testing for and choosing models that are robust to change is critical, an aspect that has been neglected in AutoML.
Synthetic VIX Data Generation Using ML Techniques (QuantUniversity)
Slides from PRMIA webinar: https://prmia.org/Shared_Content/Events/PRMIA_Event_Display.aspx?EventKey=8504&WebsiteKey=e0a57874-c04b-476a-827d-2bbc348e6b08
Part 1: We will discuss key trends in AI and machine learning in the financial services industry. We will discuss the key use cases, challenges, and best practices of using AI and ML techniques in financial services. We will also discuss key players and drivers for the AI and Machine learning revolution.
Part 2: We will illustrate a case study where AI and machine learning techniques are applied in financial services.
Case study: Synthetic VIX data generation using Machine learning techniques
Synthetic datasets and simulations are used to enrich and augment existing datasets, providing comprehensive samples when training machine learning models. In addition, synthetic data generators can be used for scenario generation, modeling future scenarios after being trained on real and synthetic data. The advent of novel techniques in machine learning has rekindled interest in using deep learning approaches like generative adversarial networks (GANs) and encoder-decoder architectures for financial synthetic data generation.
In this case study, we discuss a recent study of the efficacy of synthetic data generation when there are significant VIX changes in the market over short time horizons. We used QuSynthesize, a synthetic data generator for time-series datasets, together with historical VIX data and synthetic VIX scenarios to generate forward-looking scenarios.
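QuSynthesize itself is not open source, but the simplest family of time-series synthesizers, the block bootstrap, can be sketched in a few lines: it resamples contiguous blocks of history, preserving local autocorrelation that i.i.d. resampling would destroy. This is illustrative only, and far less expressive than the GAN and encoder-decoder approaches discussed above; the "VIX-like" history below is a toy series, not market data.

```python
# Block-bootstrap synthetic series: concatenate randomly chosen
# contiguous blocks of the historical series so that local
# autocorrelation structure is preserved in each block.
import numpy as np

def block_bootstrap(series, n_out, block_size, seed=0):
    """Generate a synthetic series of length n_out from random blocks."""
    rng = np.random.default_rng(seed)
    blocks = []
    while sum(len(b) for b in blocks) < n_out:
        start = rng.integers(0, len(series) - block_size + 1)
        blocks.append(series[start:start + block_size])
    return np.concatenate(blocks)[:n_out]

# Toy "VIX-like" history: a slowly drifting positive series.
rng = np.random.default_rng(1)
hist = 20 + 0.3 * np.cumsum(rng.normal(0, 0.5, 500))
synthetic = block_bootstrap(hist, n_out=250, block_size=20)
print(f"generated {len(synthetic)} synthetic observations")
```

Deep generative models go further by producing values never seen in the history, but block methods remain a useful baseline when evaluating whether a fancier generator actually adds fidelity.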
The goal of this course is to offer data science and fintech enthusiasts a hands-on, practical case study to understand the power of data science, ML, and AI in finance. We discuss two case studies, an NLP case study and a credit risk case study, to reinforce concepts.
Credit Risk Introduction and Pre-class preparation
Pre-class reading. We will be using the Lending Club dataset to build a credit risk model using machine learning techniques. This workshop was delivered in Boston and online by Sri Krishnamurthy.
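As a preview of the modeling pattern (on synthetic borrower features rather than the actual Lending Club columns, which are not reproduced here), a minimal credit-default classifier might look like:

```python
# Credit-default classifier sketch: logistic regression on synthetic
# borrower features. The data-generating process below is an assumption
# made for illustration, not an empirical claim.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 2000
income = rng.normal(60, 15, n)       # annual income, $k
dti = rng.uniform(0, 0.6, n)         # debt-to-income ratio
fico = rng.normal(680, 50, n)        # credit-score-like feature

# Assumed process: default risk rises with DTI, falls with income/score.
logit = 5.0 + 6.0 * dti - 0.02 * income - 0.01 * fico
default = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X = np.column_stack([income, dti, fico])
X_tr, X_te, y_tr, y_te = train_test_split(X, default, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"held-out AUC: {auc:.3f}")
```

The real workshop adds the steps this sketch skips: cleaning messy categorical fields, handling class imbalance, and validating the model beyond a single AUC number.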
Innovations in technology have revolutionized financial services to the extent that large financial institutions like Goldman Sachs claim to be technology companies! It is no secret that technological innovations like data science and AI are fundamentally changing how financial products are created, tested, and delivered. Yet while it is exciting to learn about the technologies themselves, there is very little guidance available on how companies and financial professionals should retool and gear themselves for the coming revolution.
In this master class, we will discuss key innovations in data science and AI and connect these novel fields to applications in forecasting and optimization. Through case studies and examples, we will demonstrate why now is the time to invest in learning about the topics that will reshape the financial services industry of the future!
Topics for the Masterclass
- Learning Data science in 10 steps
With innovations in hardware, algorithms, and large datasets, the use of data science and machine learning in finance is increasing. As more and more open-source technologies penetrate enterprises, quants and data scientists have a plethora of choices for building, testing, and scaling models. Alternative datasets, text analytics, cloud computing, and algorithmic trading are game-changers for many firms exploring novel modeling methods to augment their traditional investment and decision workflows. While there is significant enthusiasm, model risk professionals and risk managers are concerned about the onslaught of new technologies, programming languages, and datasets entering the enterprise. With very little guidance from regulators on how to govern the tools and processes, organizations are developing their own home-grown methods to address model risk management challenges.
In this webinar, we aim to bring clarity to some of the model risk management challenges of adopting data science, AI, and machine learning methods in the enterprise. We will discuss the main drivers of model risk in today's environment and how the scope of model governance is changing, and we will introduce the key concepts and considerations for developing a model risk management framework that incorporates data science techniques and AI methodologies.
Machine learning for factor investing - Tony Guida
https://quspeakerseries5.splashthat.com/
Topic: Machine Learning for Factor Investing: case study on "Trees"
In this presentation, Tony will first introduce the concept of supervised learning. He will then cover the practitioner's angle on constructing non-linear multi-factor signals from stock characteristics, and show the added value of ML-based signals over traditional stale linear factor blends in equities.
This master class is derived from Guillaume Coqueret and Tony Guida's latest book "Machine Learning for Factor Investing"
Modular Machine Learning for Model Validation (QuantUniversity)
Topic: Modular Machine Learning for Model Validation
Implementing model validation through a set of interdependent modules that utilize both traditional econometrics and data science techniques can produce robust assessments of the predictive effectiveness of investment signals in an economically intuitive manner.
The proposed methodology, modular machine learning, also answers a number of practical questions that arise when applying block time series cross-validation such as what number of folds to use and what block size to use between folds.
It is possible to re-interpret the Fundamental Law of Active Management into a model validation framework by expressing its fundamental concepts, information coefficient and breadth, using the formal language of data science.
In this talk, we introduce an approach to model validation which we call modular machine learning (MML) and use it to build a methodology that can be applied to the evaluation of investment signals within the conceptual scheme provided by the Fundamental Law. Our framework is modular in two respects: (1) it is comprised of independent computational components, each using the output of another as its input, and (2) it is characterized by the distinct roles played by traditional econometric and data science methodologies.
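The MML methodology itself is not reproduced here, but the block time-series cross-validation it builds on can be sketched with scikit-learn's `TimeSeriesSplit`; the fold count and the gap between blocks are exactly the practical choices the methodology formalizes.

```python
# Block time-series cross-validation: each fold trains on an initial
# block of history and tests on the block that follows, never the
# reverse, with a gap so the test block cannot leak into training.
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

n_obs = 120                       # e.g. ten years of monthly observations
X = np.arange(n_obs).reshape(-1, 1)

# n_splits (number of folds) and gap (buffer between train and test)
# are the block-CV knobs discussed in the talk.
tscv = TimeSeriesSplit(n_splits=4, gap=3)
for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
    print(f"fold {fold}: train [0..{train_idx[-1]}], "
          f"test [{test_idx[0]}..{test_idx[-1]}]")
```

Unlike shuffled k-fold, every test block lies strictly after its training block, which is what makes the resulting signal evaluation honest for time-ordered financial data.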
Machine Learning: Considerations for Fairly and Transparently Expanding Access to Credit (QuantUniversity)
With Raghu Kulkarni and Steve Dickerson
Recently, machine learning has been used extensively in credit decision making. As ML proliferates through the industry, considerations of fair and transparent access to credit are becoming important.
In this talk, Dr. Raghu Kulkarni and Dr. Steven Dickerson from Discover Financial Services will share their experiences at Discover. The talk will include:
- An overview of how ML models are used across the financial life cycle
- Practical problems practitioners run into, and why explainability and bias detection become important
References:
1- https://www.h2o.ai/resources/white-paper/machine-learning-considerations-for-fairly-and-transparently-expanding-access-to-credit/
2- https://arxiv.org/abs/2011.03156
Synthetic Data Generation for Machine Learning (QuantUniversity)
As machine learning becomes more pervasive in the industry, data scientists and quants are realizing the challenges and limitations of machine learning models. One of the primary reasons machine learning applications fail is the lack of rich, diverse, and clean datasets needed to build models. Datasets may have missing values, may not include enough samples for all use cases (for example, too few fraudulent transaction records to train a fraud model), and may not be easily sharable due to privacy concerns. While there are many data-cleansing techniques to fix data-related issues, and we can always try to obtain new and richer datasets, the cost is at times prohibitive and at times impractical, leading many institutions to abandon machine learning and return to rule-based methods.
Synthetic datasets and simulations are used to enrich and augment existing datasets, providing comprehensive samples when training machine learning models. In addition, synthetic datasets can be used for comprehensive scenario analysis, missing-value imputation, and privacy protection when building models. The advent of novel techniques like deep learning has rekindled interest in using approaches like GANs and encoder-decoder architectures for financial synthetic data generation.
In this workshop, we will discuss the state of the art in synthetic data generation and illustrate the various techniques and methods that can be used in practice. Through examples using QuSynthesize and the QuSandbox, we will demonstrate how these techniques can be realized.
QU Summer School 2020 Speaker Series - Session 7
A conversation with Quants, Thinkers and Innovators all challenged to innovate in turbulent times!
Join QuantUniversity for a complimentary summer speaker series where you will hear from Quants, innovators, startups and Fintech experts on various topics in Quant Investing, Machine Learning, Optimization, Fintech, AI etc.
Managing Machine Learning Models in the Financial Industry
Lecture 1: Model Risk Management for AI and Machine Learning
Artificial intelligence and machine learning are part of today's modeler's toolbox for building challenger models and innovative new models that address business needs. However, AI presents new and unique challenges for risk management, particularly for assessing, controlling, and managing the model risk of models with limited transparency. Another key consideration is the speed at which these models can be developed, validated, and deployed into production while still adhering to a robust model risk management program. This talk will highlight best practices for integrating AI into model risk practices and showcase examples across the model lifecycle.
A conversation with Quants, Thinkers and Innovators all challenged to innovate in turbulent times!
Join QuantUniversity for a complimentary summer speaker series where you will hear from Quants, innovators, startups and Fintech experts on various topics in Quant Investing, Machine Learning, Optimization, Fintech, AI etc.
Topic: Generating Synthetic Data with Generative Adversarial Networks: Opportunities and Challenges
Limited data access continues to be a barrier to data-driven product development. In this talk, we explore if and how generative adversarial networks (GANs) can be used to incentivize data sharing by enabling a generic framework for sharing synthetic datasets with minimal expert knowledge.
We identify key challenges of existing GAN approaches with respect to fidelity (e.g., capturing complex multidimensional correlations, mode collapse) and privacy (i.e., existing guarantees are poorly understood and can sacrifice fidelity).
To address fidelity challenges, we discuss our experiences designing a custom workflow called DoppelGANger and demonstrate that across diverse real-world datasets (e.g., bandwidth measurements, cluster requests, web sessions) and use cases (e.g., structural characterization, predictive modeling, algorithm comparison), DoppelGANger achieves up to 43% better fidelity than baseline models.
With respect to privacy, we identify fundamental challenges with both classical notions of privacy as well as recent advances to improve the privacy properties of GANs, and suggest a potential roadmap for addressing these challenges.
Credit scoring has been used to categorize customers based on various characteristics in order to evaluate their creditworthiness. Increasingly, machine learning techniques are being deployed for customer segmentation, classification, and scoring. In this talk, we will discuss various machine learning techniques that can be used for credit risk applications. Through a case study built in R, we will illustrate the nuances of working with practical datasets that include categorical and numerical data, different techniques for evaluating and exploring customer profiles, visualizing high-dimensional datasets, and machine learning techniques for customer segmentation.
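As a small illustration of the segmentation step (in Python on synthetic data; the talk's actual case study is built in R and is not reproduced here), k-means on standardized customer features might look like:

```python
# Customer segmentation sketch: standardize numeric features, then
# cluster with k-means. The two borrower groups below are synthetic
# stand-ins (income in $k, credit utilization), not real customer data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
group_a = np.column_stack([rng.normal(35, 5, 100),      # lower income,
                           rng.normal(0.8, 0.05, 100)]) # high utilization
group_b = np.column_stack([rng.normal(90, 5, 100),      # higher income,
                           rng.normal(0.2, 0.05, 100)]) # low utilization
X = np.vstack([group_a, group_b])

X_scaled = StandardScaler().fit_transform(X)  # put features on one scale
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled)
print(f"segment sizes: {np.bincount(labels)}")
```

Standardizing first matters: without it, the income column (tens of units) would dominate the utilization column (fractions of a unit) in the distance calculation.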
QU Speaker Series 14: Synthetic Data Generation in Finance (QuantUniversity)
In this master class, Stefan shows how to create synthetic time-series data using generative adversarial networks (GANs). GANs train a generator and a discriminator network in a competitive setting so that the generator learns to produce samples that the discriminator cannot distinguish from a given class of training data. The goal is to yield a generative model capable of producing synthetic samples representative of this class. While most popular with image data, GANs have also been used to generate synthetic time-series data in the medical domain. Subsequent experiments with financial data explored whether GANs can produce alternative price trajectories useful for ML training or strategy backtests.
Learn how artificial intelligence (AI) and machine learning are revolutionizing industries — this course will introduce key concepts and illustrate the role of machine learning, data science techniques, and AI through examples and case studies from the investment industry. The presentation uses simple mathematics and basic statistics to provide an intuitive understanding of machine learning, as used by firms, to augment traditional decision making.
https://quforindia.splashthat.com/
Machine Learning and AI: Core Methods and ApplicationsQuantUniversity
This session was presented at the CFA Institute on May 6th 2020
This deep-dive session discusses core methods and applications to provide an understanding of supervised and unsupervised machine learning. Participants will be introduced to advanced topics that include time series analysis, reinforcement learning, anomaly detection, and natural language processing. Case studies will also examine how to predict interest rates and credit risk with alternative data sets and how to analyze earning calls from EDGAR using Natural Language Processing Techniques.
RAPIDS is a suite of open source software libraries and APIs gives you the ability to execute end-to-end data science and analytics pipelines entirely on GPUs.In this workshop, we will:
1. Introduce Rapids.ai & GPUs
2. Illustrate why GPUs are critical for machine learning and AI applications
3. Demonstrate common machine learning algorithms such as Regression, KNN,SGD etc. using RAPIDS on the QuSandbox
QU Speaker Series - Session 3
https://qusummerschool.splashthat.com
A conversation with Quants, Thinkers and Innovators all challenged to innovate in turbulent times!
Join QuantUniversity for a complimentary summer speaker series where you will hear from Quants, innovators, startups and Fintech experts on various topics in Quant Investing, Machine Learning, Optimization, Fintech, AI etc.
Topic: Machine Learning and Model Risk (With a focus on Neural Network Models)
All models are wrong and when they are wrong they create financial or non-financial risks. Understanding, testing and managing model failures are the key focus of model risk management particularly model validation.
For machine learning models, particular attention is made on how to manage model fairness, explainability, robustness and change control. In this presentation, I will focus the discussion on machine learning explainability and robustness. Explainability is critical to evaluate conceptual soundness of models particularly for the applications in highly regulated institutions such as banks. There are many explainability tools available and my focus in this talk is how to develop fundamentally interpretable models.
Neural networks (including Deep Learning), with proper architectural choice, can be made to be highly interpretable models. Since models in production will be subjected to dynamically changing environments, testing and choosing robust models against changes are critical, an aspect that has been neglected in AutoML.
Synthetic VIX Data Generation Using ML TechniquesQuantUniversity
Slides from PRIMIA webinar: https://prmia.org/Shared_Content/Events/PRMIA_Event_Display.aspx?EventKey=8504&WebsiteKey=e0a57874-c04b-476a-827d-2bbc348e6b08
Part 1: We will discuss key trends in AI and machine learning in the financial services industry. We will discuss the key use cases, challenges, and best practices of using AI and ML techniques in financial services. We will also discuss key players and drivers for the AI and Machine learning revolution.
Part 2: We will illustrate a case study where AI and machine learning techniques are applied in financial services.
Case study: Synthetic VIX data generation using Machine learning techniques
Synthetic data sets and simulations are used to enrich and augment existing datasets, providing comprehensive samples for training machine learning models. In addition, synthetic data generators can be used for scenario generation, modeling future scenarios after being trained on real and synthetic data. The advent of novel techniques in machine learning has rekindled interest in using deep learning methods such as Generative Adversarial Networks (GANs) and encoder-decoder architectures for financial synthetic data generation.
In this case study, we discuss a recent study assessing the efficacy of synthetic data generation when there are significant VIX changes in the market over short time horizons. We used QuSynthesize, a synthetic data generator for time-series datasets, with historical VIX data and synthetic VIX scenarios to generate future scenarios.
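QuSynthesize itself is not public, so as a rough, hedged illustration of generating synthetic scenarios from a historical series, here is a simple block bootstrap in Python (the function, block size, and toy "VIX-like" series are assumptions for demonstration, not the QuSynthesize method):

```python
import numpy as np

def block_bootstrap(series, block_size=20, n_samples=1, rng=None):
    """Resample contiguous blocks of historical log-changes to build
    synthetic paths that preserve short-range dependence."""
    rng = np.random.default_rng() if rng is None else rng
    log_changes = np.diff(np.log(series))
    n = len(log_changes)
    paths = []
    for _ in range(n_samples):
        blocks, total = [], 0
        while total < n:
            start = rng.integers(0, n - block_size)
            blocks.append(log_changes[start:start + block_size])
            total += block_size
        changes = np.concatenate(blocks)[:n]
        # rebuild a level path starting from the last observed value
        paths.append(series[-1] * np.exp(np.cumsum(changes)))
    return np.array(paths)

# toy "VIX-like" history (simulated, clipped to stay positive)
rng = np.random.default_rng(42)
hist = np.clip(20 + np.cumsum(rng.normal(0, 0.5, 500)), 10, None)
synthetic = block_bootstrap(hist, block_size=20, n_samples=5, rng=rng)
print(synthetic.shape)
```

Each synthetic path has the same length as the historical change series and stays positive by construction; GAN- or encoder-decoder-based generators replace the block resampling with a learned model.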
The goal of this course is to offer data science and fintech enthusiasts a hands-on, practical case study to understand the power of Data Science, ML, and AI in Finance. We discuss two case studies, an NLP case study and a credit risk case study, to reinforce the concepts.
Credit Risk Introduction and Pre-class preparation
Pre-class reading. We will be using the Lending Club data set to build a credit risk model using machine learning techniques. This workshop was delivered in Boston and online by Sri Krishnamurthy.
Innovations in technology have revolutionized financial services to such an extent that large financial institutions like Goldman Sachs are claiming to be technology companies! It is no secret that technological innovations like data science and AI are fundamentally changing how financial products are created, tested, and delivered. While it is exciting to learn about the technologies themselves, there is very little guidance available on how companies and financial professionals should retool and gear themselves for the upcoming revolution.
In this master class, we will discuss key innovations in Data Science and AI and connect applications of these novel fields in forecasting and optimization. Through case studies and examples, we will demonstrate why now is the time you should invest to learn about the topics that will reshape the financial services industry of the future!
Topics for the Masterclass
- Learning Data science in 10 steps
With innovations in hardware, algorithms, and large datasets, the use of Data Science and Machine Learning in finance is increasing. As more and more open-source technologies penetrate enterprises, quants and data scientists have a plethora of choices for building, testing, and scaling models. Alternative datasets, along with text analytics, cloud computing, and algorithmic trading, are game-changers for many firms exploring novel modeling methods to augment their traditional investment and decision workflows. While there is significant enthusiasm, model risk professionals and risk managers are concerned about the onslaught of new technologies, programming languages, and data sets entering the enterprise. With very little guidance from regulators on how to govern the tools and processes, organizations are developing their own home-grown methods to address model risk management challenges.
In this webinar, we aim to bring clarity to some of the model risk management challenges when adopting data science, AI, and Machine Learning methods in the enterprise. We will discuss key drivers of model risk in today’s environment and how the scope of model governance is changing. We will introduce key concepts and discuss key aspects to be considered when developing a model risk management framework when incorporating data science techniques and AI methodologies.
Machine learning for factor investing - Tony Guida
https://quspeakerseries5.splashthat.com/
Topic: Machine Learning for Factor Investing: case study on "Trees"
In this presentation, Tony will first introduce the concept of supervised learning. He will then cover the practitioner's angle on constructing non-linear multi-factor signals from stock characteristics, and will show the added value of ML-based signals over traditional linear blends of stale factors in equities.
This master class is derived from Guillaume Coqueret and Tony Guida's latest book "Machine Learning for Factor Investing"
Modular Machine Learning for Model Validation - QuantUniversity
Topic: Modular Machine Learning for Model Validation
Implementing model validation through a set of interdependent modules that utilizes both traditional econometrics and data science techniques can produce robust assessments of the predictive effectiveness of investment signals in an economically intuitive manner.
The proposed methodology, modular machine learning, also answers a number of practical questions that arise when applying block time series cross-validation such as what number of folds to use and what block size to use between folds.
It is possible to re-interpret the Fundamental Law of Active Management into a model validation framework by expressing its fundamental concepts, information coefficient and breadth, using the formal language of data science.
In this talk, we introduce an approach to model validation which we call modular machine learning (MML) and use it to build a methodology that can be applied to the evaluation of investment signals within the conceptual scheme provided by the Fundamental Law. Our framework is modular in two respects: (1) it is comprised of independent computational components, each using the output of another as its input, and (2) it is characterized by the distinct roles played by traditional econometric and data science methodologies.
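The block time-series cross-validation referenced above can be sketched as follows (a minimal illustration; the fold layout and `gap` parameter are assumptions, not the MML specification):

```python
import numpy as np

def blocked_time_series_folds(n_obs, n_folds=5, gap=5):
    """Yield (train_idx, test_idx) pairs for blocked time-series CV:
    each fold tests on one contiguous block and trains only on earlier
    observations, separated by a gap to limit look-ahead leakage."""
    fold_size = n_obs // (n_folds + 1)
    for k in range(1, n_folds + 1):
        test_start = k * fold_size
        test_idx = np.arange(test_start, test_start + fold_size)
        train_idx = np.arange(0, max(test_start - gap, 0))
        yield train_idx, test_idx

for train_idx, test_idx in blocked_time_series_folds(120, n_folds=5, gap=5):
    print(len(train_idx), test_idx[0], test_idx[-1])
```

The number of folds trades off evaluation variance against training-set size, which is exactly the kind of practical question (how many folds, what block size) the modular approach is meant to answer.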
Machine Learning: Considerations for Fairly and Transparently Expanding Access to Credit
With Raghu Kulkarni and Steve Dickerson
Recently, machine learning has been used extensively in credit decision making. As ML proliferates across the industry, considerations of fair and transparent access to credit decisions are becoming increasingly important.
In this talk, Dr. Raghu Kulkarni and Dr. Steven Dickerson from Discover Financial Services will share their experiences at Discover. The talk will include:
- An overview of how ML models are used across the financial life cycle
- Practical problems practitioners run into, and why explainability and bias detection become important
References:
1- https://www.h2o.ai/resources/white-paper/machine-learning-considerations-for-fairly-and-transparently-expanding-access-to-credit/
2- https://arxiv.org/abs/2011.03156
Synthetic data generation for machine learning - QuantUniversity
As machine learning becomes more pervasive in the industry, data scientists and quants are realizing the challenges and limitations of machine learning models. One of the primary reasons machine learning applications fail is the lack of rich, diverse, and clean datasets needed to build models. Datasets may have missing values, may not include enough samples for all use cases (for example, the availability of fraudulent transaction records to train a model), and may not be easily sharable due to privacy concerns. While there are many data cleansing techniques to fix data-related issues, and we can always try to obtain new and rich datasets, the cost is at times prohibitive or impractical, leading many institutions to abandon machine learning and fall back on rule-based methods.
Synthetic data sets and simulations are used to enrich and augment existing datasets, providing comprehensive samples for training machine learning models. In addition, synthetic datasets can be used for comprehensive scenario analysis, missing-value imputation, and privacy protection when building models. The advent of novel techniques like deep learning has rekindled interest in using approaches such as GANs and encoder-decoder architectures for financial synthetic data generation.
In this workshop, we will discuss the state of the art in Synthetic data generation and will illustrate the various techniques and methods that can be used in practice. Through examples using QuSynthesize & QuSandbox, we will demonstrate how these techniques can be realized in practice.
QU Summer school 2020 speaker Series - Session 7
Managing Machine Learning Models in the Financial Industry
Lecture 1: Model Risk Management for AI and Machine Learning
Artificial intelligence and machine learning are part of today’s modeler’s toolbox for building challenger models and new innovative models that address business needs. However, AI presents new and unique challenges for risk management, particularly for assessing, controlling, and managing model risk for models of limited transparency. Another key consideration is the speed at which these models can be developed, validated, and deployed into production while remaining competitive and adhering to a robust model risk management program. This talk will highlight best practices for integrating AI into model risk practices and showcase examples across the model lifecycle.
Topic: Generating Synthetic Data with Generative Adversarial Networks: Opportunities and Challenges
Limited data access continues to be a barrier to data-driven product development. In this talk, we explore if and how generative adversarial networks (GANs) can be used to incentivize data sharing by enabling a generic framework for sharing synthetic datasets with minimal expert knowledge.
We identify key challenges of existing GAN approaches with respect to fidelity (e.g., capturing complex multidimensional correlations, mode collapse) and privacy (i.e., existing guarantees are poorly understood and can sacrifice fidelity).
To address fidelity challenges, we discuss our experiences designing a custom workflow called DoppelGANger and demonstrate that across diverse real-world datasets (e.g., bandwidth measurements, cluster requests, web sessions) and use cases (e.g., structural characterization, predictive modeling, algorithm comparison), DoppelGANger achieves up to 43% better fidelity than baseline models.
With respect to privacy, we identify fundamental challenges with both classical notions of privacy as well as recent advances to improve the privacy properties of GANs, and suggest a potential roadmap for addressing these challenges.
Credit scoring has been used to categorize customers based on various characteristics to evaluate their creditworthiness. Increasingly, machine learning techniques are being deployed for customer segmentation, classification, and scoring. In this talk, we will discuss various machine learning techniques that can be used for credit risk applications. Through a case study built in R, we will illustrate the nuances of working with practical data sets that include categorical and numerical data, techniques for evaluating and exploring customer profiles, visualization of high-dimensional data sets, and machine learning techniques for customer segmentation.
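The workshop's case study is built in R; as a language-neutral sketch of the segmentation step, here is a minimal k-means in Python/NumPy (the toy features and cluster count are assumptions, not the workshop data):

```python
import numpy as np

def kmeans(X, k=3, n_iter=50, seed=0):
    """Minimal k-means for customer segmentation. Numeric features
    should be standardized and categoricals one-hot encoded first."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # assign each customer to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # recompute centroids, keeping the old one if a cluster empties
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# toy customers: [standardized income, standardized credit utilization]
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(m, 0.3, (50, 2)) for m in (-2.0, 0.0, 2.0)])
labels, centers = kmeans(X, k=3)
print(centers.round(1))
```

In practice a library implementation (e.g., R's `kmeans` or scikit-learn's `KMeans`) with multiple restarts would be used; the sketch just shows the assign/update loop behind segmentation.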
Qu speaker series 14: Synthetic Data Generation in Finance - QuantUniversity
In this master class, Stefan shows how to create synthetic time-series data using generative adversarial networks (GAN). GANs train a generator and a discriminator network in a competitive setting so that the generator learns to produce samples that the discriminator cannot distinguish from a given class of training data. The goal is to yield a generative model capable of producing synthetic samples representative of this class. While most popular with image data, GANs have also been used to generate synthetic time-series data in the medical domain. Subsequent experiments with financial data explored whether GANs can produce alternative price trajectories useful for ML training or strategy backtests.
Learn how artificial intelligence (AI) and machine learning are revolutionizing industries — this course will introduce key concepts and illustrate the role of machine learning, data science techniques, and AI through examples and case studies from the investment industry. The presentation uses simple mathematics and basic statistics to provide an intuitive understanding of machine learning, as used by firms, to augment traditional decision making.
https://quforindia.splashthat.com/
Learn how Artificial Intelligence (“AI”) and Machine Learning (“ML”) are revolutionizing financial services
Introduction of key concepts and illustration of the role of ML, data science techniques, and AI through examples and case studies from the investment industry.
Uses simple math and basic statistics to provide an intuitive understanding of ML, as used by financial firms, to augment traditional investment decision making.
Careers in ML and AI and how professionals should prepare for careers in the 21st century, especially post Covid19.
The use of data science and machine learning in the investment industry is increasing. Financial firms are using artificial intelligence (AI) and machine learning to augment traditional investment decision making.
In this workshop, we aim to bring clarity on how AI and machine learning are revolutionizing financial services. We will introduce key concepts and, through examples and case studies, will illustrate the role of machine learning, data science techniques, and AI in the investment industry.
Agenda:
In Part 1, we will discuss key trends in AI and machine learning in the financial services industry, including the key use cases, challenges, and best practices.
In Part 2, we will illustrate two case studies where AI and machine learning techniques are applied in financial services.
Case studies:
Sentiment Analysis Using Natural Language Processing in Finance
In this case study, we will demonstrate the use of natural language processing techniques to analyze earnings call transcripts from EDGAR and generate sentiment scores using the Amazon Comprehend, IBM Watson, Google, and Azure APIs (application programming interfaces). We will illustrate how these scores can be used to augment traditional quantitative research and trading decisions.
Credit Risk Decision Making Using Lending Club Data
In this case study, we will use the Lending Club data set to build a credit risk model using machine learning techniques.
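As a hedged sketch of the kind of model this case study builds (the toy features and coefficients are invented; this is not the workshop's actual code), a default-probability model can be fit with plain NumPy:

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Fit logistic regression by gradient descent; X should include
    an intercept column, and features should be standardized."""
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ w))   # predicted default probability
        w -= lr * X.T @ (p - y) / len(y)   # average log-loss gradient
    return w

# simulated borrowers: higher DTI raises default risk, higher FICO lowers it
rng = np.random.default_rng(7)
n = 1000
dti = rng.normal(0, 1, n)    # standardized debt-to-income
fico = rng.normal(0, 1, n)   # standardized credit score
X = np.column_stack([np.ones(n), dti, fico])
true_logit = -1.0 + 0.5 * dti - 1.0 * fico
y = (rng.random(n) < 1 / (1 + np.exp(-true_logit))).astype(float)

w = fit_logistic(X, y)
print(w.round(2))
```

With enough iterations the fitted `w` should roughly recover the signs and magnitudes of the simulated coefficients; on the real Lending Club data the same shape of model would be fit on engineered loan and borrower features.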
Explainability for Natural Language Processing - Yunyao Li
NOTE: Please check out the final version here with small but important updates and links to downloadable version and recording: https://www.slideshare.net/YunyaoLi/explainability-for-natural-language-processing-249992241
Updated version of our popular tutorial "Explainability for Natural Language Processing", presented as a tutorial at KDD'2021.
Title: Explainability for Natural Language Processing
@article{kdd2021xaitutorial,
title={Explainability for Natural Language Processing},
author= {Danilevsky, Marina and Dhanorkar, Shipi and Li, Yunyao and Popa, Lucian and Qian, Kun and Xu, Anbang},
journal={KDD},
year={2021}
}
Presenters: Marina Danilevsky, Shipi Dhanorkar, Yunyao Li, Lucian Popa, Kun Qian, and Anbang Xu
Website: http://xainlp.github.io/
Abstract:
This lecture-style tutorial, which mixes in an interactive literature browsing component, is intended for the many researchers and practitioners working with text data and on applications of natural language processing (NLP) in data science and knowledge discovery. The focus of the tutorial is on the issues of transparency and interpretability as they relate to building models for text and their applications to knowledge discovery. As black-box models have gained popularity for a broad range of tasks in recent years, both the research and industry communities have begun developing new techniques to render them more transparent and interpretable. Reporting from an interdisciplinary team of social science, human-computer interaction (HCI), and NLP/knowledge management researchers, our tutorial has two components: an introduction to explainable AI (XAI) in the NLP domain and a review of the state-of-the-art research; and findings from a qualitative interview study of individuals working on real-world NLP projects, as applied to various knowledge extraction and discovery tasks, at a large, multinational technology and consulting corporation. The first component will introduce core concepts related to explainability in NLP. Then, we will discuss explainability for NLP tasks and report on a systematic literature review of the state-of-the-art literature in AI, NLP, and HCI conferences. The second component reports on our qualitative interview study, which identifies practical challenges and concerns that arise in real-world development projects that require the modeling and understanding of text data.
Explainability for Natural Language Processing - Yunyao Li
Tutorial at AACL'2020 (http://www.aacl2020.org/program/tutorials/#t4-explainability-for-natural-language-processing).
More recent version: https://www.slideshare.net/YunyaoLi/explainability-for-natural-language-processing-249912819
Title: Explainability for Natural Language Processing
@article{aacl2020xaitutorial,
title={Explainability for Natural Language Processing},
author= {Dhanorkar, Shipi and Li, Yunyao and Popa, Lucian and Qian, Kun and Wolf, Christine T and Xu, Anbang},
journal={AACL-IJCNLP 2020},
year={2020}
}
Presenters: Shipi Dhanorkar, Christine Wolf, Kun Qian, Anbang Xu, Lucian Popa, and Yunyao Li
Video: https://www.youtube.com/watch?v=3tnrGe_JA0s&feature=youtu.be
Abstract:
We propose a cutting-edge tutorial that investigates the issues of transparency and interpretability as they relate to NLP. Both the research community and industry have been developing new techniques to render black-box NLP models more transparent and interpretable. Reporting from an interdisciplinary team of social science, human-computer interaction (HCI), and NLP researchers, our tutorial has two components: an introduction to explainable AI (XAI) and a review of the state-of-the-art for explainability research in NLP; and findings from a qualitative interview study of individuals working on real-world NLP projects at a large, multinational technology and consulting corporation. The first component will introduce core concepts related to explainability in NLP. Then, we will discuss explainability for NLP tasks and report on a systematic literature review of the state-of-the-art literature in AI, NLP, and HCI conferences. The second component reports on our qualitative interview study which identifies practical challenges and concerns that arise in real-world development projects which include NLP.
The use of Data Science and Machine learning in the investment industry is increasing, and investment professionals, both fundamental and quantitative, are taking notice. Financial firms are taking AI and machine learning seriously to augment traditional investment decision making. Alternative data sets including text analytics, cloud computing, and algorithmic trading are game changers for many firms who are adopting technology at a rapid pace. As more and more technologies penetrate enterprises, financial professionals are enthusiastic about the upcoming revolution and are looking for direction and education on data science and machine learning topics.
In this webinar, we aim to bring clarity to how AI and machine learning is revolutionizing financial services. We will introduce key concepts and through examples and case studies, we will illustrate the role of machine learning, data science techniques, and AI in the investment industry. At the end of this webinar, participants will see a concrete picture of how machine learning and AI techniques are fueling the Fintech wave!
Explainability for Natural Language Processing - Yunyao Li
Final deck for our popular tutorial on "Explainability for Natural Language Processing" at KDD'2021. See links below for downloadable version (with higher resolution) and recording of the live tutorial.
Title: Explainability for Natural Language Processing
Presenters: Marina Danilevsky, Shipi Dhanorkar, Yunyao Li, Lucian Popa, Kun Qian, and Anbang Xu
Website: http://xainlp.github.io/
Recording: https://www.youtube.com/watch?v=PvKOSYGclPk&t=2s
Downloadable version with higher resolution: https://drive.google.com/file/d/1_gt_cS9nP9rcZOn4dcmxc2CErxrHW9CU/view?usp=sharing
@article{kdd2021xaitutorial,
title={Explainability for Natural Language Processing},
author= {Danilevsky, Marina and Dhanorkar, Shipi and Li, Yunyao and Popa, Lucian and Qian, Kun and Xu, Anbang},
journal={KDD},
year={2021}
}
Abstract:
This lecture-style tutorial, which mixes in an interactive literature browsing component, is intended for the many researchers and practitioners working with text data and on applications of natural language processing (NLP) in data science and knowledge discovery. The focus of the tutorial is on the issues of transparency and interpretability as they relate to building models for text and their applications to knowledge discovery. As black-box models have gained popularity for a broad range of tasks in recent years, both the research and industry communities have begun developing new techniques to render them more transparent and interpretable. Reporting from an interdisciplinary team of social science, human-computer interaction (HCI), and NLP/knowledge management researchers, our tutorial has two components: an introduction to explainable AI (XAI) in the NLP domain and a review of the state-of-the-art research; and findings from a qualitative interview study of individuals working on real-world NLP projects, as applied to various knowledge extraction and discovery tasks, at a large, multinational technology and consulting corporation. The first component will introduce core concepts related to explainability in NLP. Then, we will discuss explainability for NLP tasks and report on a systematic literature review of the state-of-the-art literature in AI, NLP, and HCI conferences. The second component reports on our qualitative interview study, which identifies practical challenges and concerns that arise in real-world development projects that require the modeling and understanding of text data.
Model governance in the age of data science & AI - QuantUniversity
As more and more open-source technologies penetrate enterprises, data scientists have a plethora of choices for building, testing, and scaling models. In addition, data scientists have been able to leverage the growing support for cloud-based infrastructure and open data sets to develop machine learning applications. Even though there are multiple solutions and platforms available for building machine learning solutions, challenges remain in adopting machine learning in the enterprise. Many of these challenges are associated with how the machine learning process can be formalized. As the field matures, a formal mechanism for a replicable, interpretable, auditable process covering the complete machine learning pipeline, from data ingestion to deployment, is warranted. Projects like Docker, BinderHub, and MLflow are efforts in this quest, and research and industry efforts on replicable machine learning processes are gaining steam. Heavily regulated industries like finance and healthcare are looking for best practices to enable their research teams to reproduce research and adopt best practices in model governance. In this talk, we will discuss the challenges and best practices of governing AI and ML models in the enterprise.
Mathematical Finance & Financial Data Science Seminar
AI and machine learning are entering every aspect of our life. Marketing, autonomous driving, personalization, computer vision, finance, wearables, travel are all benefiting from the advances in AI in the last decade. As more and more AI applications are being deployed in enterprises, concerns are growing about potential "AI accidents" and the misuse of AI. With increased complexity, some are questioning whether the models actually work! As the debate about fairness, bias, and privacy grow, there is increased attention to understanding how the models work and whether the models are thoroughly tested and designed to address potential issues.
The area "Responsible AI" is fast emerging and becoming an important aspect of the adoption of machine learning and AI products in the enterprise. Companies are now incorporating formal ethics reviews, model validation exercises, and independent algorithmic auditing to ensure that the adoption of AI is transparent and has gone through formal validation phases.
In this talk, Sri will introduce algorithmic auditing and discuss why it will become a formal process that industries using AI will need. Sri will also discuss the emerging risks in the adoption of AI and how QuSandbox, the platform his company is building, will address the emerging need for formal algorithmic auditing practices in enterprises.
How the World's Leading Independent Automotive Distributor is Reinventing Its... - NUS-ISS
In this captivating session, we'll unveil the profound impact of AI, poised to revolutionise the business landscape. Prepare to shift your perspective, as we transition from the lens of a data scientist to the visionary mindset of a product manager. We're about to demystify the captivating world of Generative AI, dispelling myths and illuminating its remarkable potential. We will also delve into the pioneering applications that Inchcape is leading, pushing the boundaries of what's achievable. Join us for an exhilarating journey into the future of AI, where professionalism meets unparalleled excitement, and innovation takes center stage!
Adopting Data Science and Machine Learning in the financial enterprise - QuantUniversity
Financial firms are taking AI and machine learning seriously to augment traditional investment decision making. Alternative datasets, along with text analytics, cloud computing, and algorithmic trading, are game changers for many firms who are adopting technology at a rapid pace. As more and more open-source technologies penetrate enterprises, quants and data scientists have a plethora of choices for building, testing, and scaling quantitative models. Even though there are multiple solutions and platforms available for building machine learning solutions, challenges remain in adopting machine learning in the enterprise. In this talk, we will illustrate a step-by-step process to enable replicable AI/ML research within the enterprise using QuSandbox.
Uniform Legal Framework for AI: The EU AI Act establishes a uniform legal framework for the development, marketing, and use of artificial intelligence systems within the EU, aimed at promoting trustworthy and human-centric AI while ensuring a high level of health, safety, and fundamental rights protection.
Risk-Based Approach: The regulation adopts a risk-based approach, classifying AI systems based on the level of risk they pose, from minimal to unacceptable risk, with stringent requirements for high-risk AI systems, particularly those impacting health, safety, and fundamental rights.
Prohibitions for Certain AI Practices: Unacceptable risk practices, such as manipulative social scoring and real-time biometric identification in public spaces without justification, are prohibited to protect individual rights and freedoms.
Mandatory Requirements for High-Risk AI Systems: High-risk AI systems must comply with mandatory requirements before they can be marketed, put into service, or used within the EU. These requirements include transparency, data governance, technical documentation, and human oversight to ensure safety and compliance with fundamental rights.
Conformity Assessment and Compliance: Providers of high-risk AI systems must undergo a conformity assessment procedure to demonstrate compliance with the mandatory requirements. This includes maintaining technical documentation and conducting risk management activities.
Transparency Obligations: AI systems must be transparent, providing users with information about the AI system's capabilities, limitations, and the purpose for which it is intended, ensuring informed use of AI technologies.
Market Surveillance: The EU AI Act establishes mechanisms for market surveillance to monitor and enforce compliance, with the European Artificial Intelligence Board (EAIB) playing a central role in coordinating activities across member states.
Protection of Fundamental Rights: The Act emphasizes the protection of fundamental rights, including privacy, non-discrimination, and consumer rights, with specific provisions to safeguard these rights in the context of AI use.
Innovation and SME Support: The regulation aims to foster innovation and support small and medium-sized enterprises (SMEs) through regulatory sandboxes and by reducing administrative burdens for low and minimal risk AI applications.
Global Impact and Alignment: While the EU AI Act directly applies to the EU market, its global impact is significant, influencing international standards and practices in AI development and use. Financial industry professionals worldwide should be aware of these regulations as they may affect global operations and international collaborations.
The financial industry is witnessing an emerging trend of Large Language Models (LLMs) applications to improve operational efficiency. This article, based on a round table discussion hosted by TruEra and QuantUniversity in New York in May 2023, explores the potential use cases of LLMs in financial institutions (FIs), the risks to consider, approaches to manage these risks, and the implications for people, skills, and ways of working. Frontline personnel from Data and Analytics/AI teams, Model Risk, Data Management, and other roles from fifteen financial institutions devoted over two hours to discussing the LLM opportunities within their industry, as well as strategies for mitigating associated risks.
The discussions revealed a preference for discriminative use cases over generative ones, with a focus on information retrieval and operational automation. The necessity for a human-in-the-loop was emphasized, along with a detailed discourse on risks and their mitigation.
PYTHON AND DATA SCIENCE FOR INVESTMENT PROFESSIONALS - QuantUniversity
Join CFA Institute and QuantUniversity for an information session about the upcoming CFA Institute Professional Learning course: Python and Data Science for Investment professionals.
Seeing What a GAN Cannot Generate: Paper Review - QuantUniversity
Seeing what a GAN cannot Generate Paper review: Bau, David et al. “Seeing What a GAN Cannot Generate.” 2019 IEEE/CVF International Conference on Computer Vision (ICCV) (2019): 4501-4510.
Machine Learning in Finance: 10 Things You Need to Know in 2021QuantUniversity
Machine Learning and AI has revolutionized Finance! In the last five years, innovations in computing, technology and business models have created multiple products and services in Fintech prompting organizations to prioritize their data and AI strategies. What will 2021 bring and how should you prepare for it? Join Sri Krishnamurthy,CFA as we kickoff the QuantUniversity’s Winter school 2021. We will introduce you to the upcoming programs and have a masterclass on 10 innovations in AI and ML you need to know in 2021!
Markowitz portfolio optimization is optimal in theory, however, when applied in practice it often fails catastrophically. Usually, this is addressed by various simplifications to increase robustness. In this talk I will make the case that the reason this theory fails in practice is because uncertainty in the parameter estimation is not taken into account. By using Bayesian statistics we can fix Markowitz and retain all its desirable properties while still having a robustness technique that can be easily extended. This talk is geared at intermediate and will give a general introduction to Bayesian modeling using PyMC3 and focus on application and code examples rather than theory.
With Alternative Data becoming more and more popular in the industry, quants are eager to adopt them into their investment processes. However, with a plethora of options, API standards, trying and evaluating datasets is a major hindrance to adoption of datasets.
Join Yaacov, Sri, James and Brad discuss the opportunities, pitfalls and challenges of Alternative Data and its adoption in finance
A Unified Framework for Model Explanation
Ian Covert, University of Washington
Explainable AI is becoming increasingly important, but the field is evolving rapidly and requires better organizing principles to remain manageable for researchers and practitioners. In this talk, Ian will discuss a new paper that unifies a large portion of the literature using a simple idea: simulating feature removal. The new class of "removal-based explanations" describes 20+ existing methods (e.g., LIME, SHAP) and reveals underlying links with psychology, game theory and information theory.
Practical examples will be presented and available on the Qu.Academy site
Reference:
Explaining by Removing: A Unified Framework for Model Explanation
Ian Covert, Scott Lundberg, Su-In Lee
https://arxiv.org/abs/2011.14878
Machine Learning Interpretability -
Self-Explanatory Models: Interpretability, Diagnostics and Simplification
With Agus Sudjianto, Wells Fargo
The deep neural networks (DNNs) have achieved great success in learning complex patterns with strong predictive power, but they are often thought of as "black box"models without a sufficient level of transparency and interpretability. It is important to demystify the DNNs with rigorous mathematics and practical tools, especially when they are used for mission-critical applications. This talk aims to unwrap the black box of deep ReLU networks through exact local linear representation, which utilizes the activation pattern and disentangles the complex network into an equivalent set of local linear models (LLMs). We develop a convenient LLM-based toolkit for interpretability, diagnostics, and simplification of a pre-trained deep ReLU network. We propose the local linear profile plot and other visualization methods for interpretation and diagnostics, and an effective merging strategy for network simplification. The proposed methods are demonstrated by simulation examples, benchmark datasets, and a real case study in credit risk assessment. The paper that will be presented in this talk can be found here.
In 2009 author and motivational speaker Simon Sinek delivered the now-classic TED talk “Start with why”. Viewed by over 28 million people, “Start with Why” is the third most popular TED video of all time and it teaches us that great leaders and companies inspire us to take action by focusing on the WHY over the “what” or the “how”. In this talk we’ll ask how applied data and computational scientists can use the power of WHY to frame problems, inspire others, and give them answers to business questions they might never think of asking.
Bio
Jessica Stauth is a Managing Director in Fidelity Labs, an internal startup incubator with a mission to create new fintech businesses that drive growth for the firm. Dr. Stauth previously held roles as Managing Director of Portfolio Management, Research, and Trading at Quantopian, a crowd-sourced systematic hedge fund based in Boston, Director of Quant Product Strategy for Thomson Reuters (now Refinitiv), and as a Senior Quant Researcher at the StarMine Corporation, where she built global stock selection models including the design and implementation of the StarMine Short Interest model. Dr. Stauth holds a PhD in Biophysics from UC Berkeley, where her research focused on computational neuroscience.
Qu speaker series:Ethical Use of AI in Financial MarketsQuantUniversity
As AI and ML penetrates the financial industry, there are growing concerns about ethical use of AI in Finance. In this talk, Dan will focus on how the AI can be operationalized to help industry professionals and executive teams alike think about opportunities, risks as well as required actions factoring in ethics in our data-driven world.
The world has changed in the last six months with COVID-19! There have been a shakeup in business models and funding. As companies and customers change their behaviors, we are seeing changes on how companies are addressing new challenges.
Join Fintech experts, D.Shahrawat and Sarah Biller for a not to be missed conversation on Fintech in the Post-Covid age
Master Class: GANS with Applications in Synthetic Data GenerationQuantUniversity
Join QuantUniversity for a complimentary fall speaker series where you will hear from Quants, innovators, startups and Fintech experts on various topics in Quant Investing, Machine Learning, Optimization, Fintech, AI etc.
Master Class: GANS with applications in Synthetic data generation
With various innovations in neural networks, GANs are becoming popular as a means of generating synthetic data.
In this master class, Gautier will discuss Generative Adversarial Networks (GANs) and discuss applications in synthetic data generation and other quantitative finance applications. He will also discuss his work on CORRGANS, Sampling Realistic Financial Correlation Matrices Using Generative Adversarial Networks.[1]
Reference:
1. https://arxiv.org/abs/1910.09504
This workshop will look into ways to create synthetic data from lending club loan record datasets alongside comparing characteristics and statistical properties of real and synthetic datasets. There will also be discussions into building machine learning models for predicting interest rates using real and synthetic datasets and evaluating the performance and discuss the advantages and disadvantages of using synthetic datasets as a proxy for real datasets
Frontiers in Alternative Data : Techniques and Use CasesQuantUniversity
QuantUniversity Summer School 2020 (https://qusummerschool.splashthat.com/)
https://quspeakerseries10.splashthat.com/
Lecture 1: Alexander Denev
In this talk, Alexander will introduce Alternative Data and discuss it's uses from his book, The Book of Alternative Data
- What is alternative data?
- Adoption of alternative data
- Information value chain
- Risks associated with alternative data
- Processes required to develop signals
- Valuation of alternative data
Lecture 2: Saeed Amen
In this talk, Saeed will discuss use cases in Alternative Data
-Deciphering Federal Reserve communications
- Using CLS flow data to trade FX
- Geospatial Insight satellite data to estimate retailers' EPS
- Saving "alpha" with transaction cost analysis
- Using Bloomberg News data to trade FX
Unleashing the Power of Data_ Choosing a Trusted Analytics Platform.pdfEnterprise Wired
In this guide, we'll explore the key considerations and features to look for when choosing a Trusted analytics platform that meets your organization's needs and delivers actionable intelligence you can trust.
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Round table discussion of vector databases, unstructured data, ai, big data, real-time, robots and Milvus.
A lively discussion with NJ Gen AI Meetup Lead, Prasad and Procure.FYI's Co-Found
Learn SQL from basic queries to Advance queriesmanishkhaire30
Dive into the world of data analysis with our comprehensive guide on mastering SQL! This presentation offers a practical approach to learning SQL, focusing on real-world applications and hands-on practice. Whether you're a beginner or looking to sharpen your skills, this guide provides the tools you need to extract, analyze, and interpret data effectively.
Key Highlights:
Foundations of SQL: Understand the basics of SQL, including data retrieval, filtering, and aggregation.
Advanced Queries: Learn to craft complex queries to uncover deep insights from your data.
Data Trends and Patterns: Discover how to identify and interpret trends and patterns in your datasets.
Practical Examples: Follow step-by-step examples to apply SQL techniques in real-world scenarios.
Actionable Insights: Gain the skills to derive actionable insights that drive informed decision-making.
Join us on this journey to enhance your data analysis capabilities and unlock the full potential of SQL. Perfect for data enthusiasts, analysts, and anyone eager to harness the power of data!
#DataAnalysis #SQL #LearningSQL #DataInsights #DataScience #Analytics
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ...Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
Adjusting primitives for graph : SHORT REPORT / NOTESSubhajit Sahu
Graph algorithms, like PageRank Compressed Sparse Row (CSR) is an adjacency-list based graph representation that is
Multiply with different modes (map)
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA based vector element sum (in-place).
06-04-2024 - NYC Tech Week - Discussion on Vector Databases, Unstructured Data and AI
Discussion on Vector Databases, Unstructured Data and AI
https://www.meetup.com/unstructured-data-meetup-new-york/
This meetup is for people working in unstructured data. Speakers will come present about related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes roles like machine learning engineers, data scientists, data engineers, software engineers, and PMs.This meetup was formerly Milvus Meetup, and is sponsored by Zilliz maintainers of Milvus.
The Building Blocks of QuestDB, a Time Series Databasejavier ramirez
Talk Delivered at Valencia Codes Meetup 2024-06.
Traditionally, databases have treated timestamps just as another data type. However, when performing real-time analytics, timestamps should be first class citizens and we need rich time semantics to get the most out of our data. We also need to deal with ever growing datasets while keeping performant, which is as fun as it sounds.
It is no wonder time-series databases are now more popular than ever before. Join me in this session to learn about the internal architecture and building blocks of QuestDB, an open source time-series database designed for speed. We will also review a history of some of the changes we have gone over the past two years to deal with late and unordered data, non-blocking writes, read-replicas, or faster batch ingestion.
The Building Blocks of QuestDB, a Time Series Database
QCon conference 2019
1. AI and Machine Learning
2019 Copyright QuantUniversity LLC.
Presented By:
Sri Krishnamurthy, CFA, CAP
sri@quantuniversity.com
www.analyticscertificate.com
06/28/2019
Qcon Conference
New York, NY
2. 2
Speaker bio
• Advisory and Consultancy for Financial
Analytics
• Prior experience at MathWorks, Citigroup, and Endeca; 25+ financial services and energy customers.
• Columnist for the Wilmott Magazine
• Author of forthcoming book
“Financial Modeling: A case study approach”
published by Wiley
• Teaches Analytics in the Babson College MBA
program and at Northeastern University,
Boston
• Reviewer: Journal of Asset Management
Sri Krishnamurthy
Founder and CEO
QuantUniversity
3. 3
About www.QuantUniversity.com
• Boston-based Data Science, Quant
Finance and Machine Learning
training and consulting advisory
• Trained more than 1000 students in
Quantitative methods, Data Science
and Big Data Technologies using
MATLAB, Python and R
• Building a platform for AI
and Machine Learning Enablement
in the Enterprise
4. AM
• Key trends in AI and machine learning
• 5 things you need to know about machine learning
• Machine Learning in 1 hour
• Lending Club - Prediction
PM
• Case studies
▫ Stock Data - Clustering
▫ Freddie Mac – Classification
▫ Recap: Building a ML application in 10 steps
Agenda
7. 7
The 4th Industrial revolution is Here!
Source: Christoph Roser at AllAboutLean.com
As per Wikipedia*, “The 4th Industrial Revolution ….. marked by emerging technology breakthroughs in a
number of fields, including robotics, artificial intelligence, nanotechnology, quantum computing, biotechnology,
the Internet of Things, the Industrial Internet of Things (IIoT), decentralized consensus, fifth-generation wireless
technologies (5G), additive manufacturing/3D printing and fully autonomous vehicles.”
* https://en.wikipedia.org/wiki/Fourth_Industrial_Revolution
8. 8
Your challenge is to design an artificial intelligence and machine learning (AI/ML)
framework capable of flying a drone through several professional drone racing
courses without human intervention or navigational pre-programming.
AI is no longer science fiction!
Source: https://www.lockheedmartin.com/en-us/news/events/ai-innovation-challenge.html
9. 9
Scientists are disrupting the way we live!
Source: https://www.ladn.eu/tech-a-suivre/mobilite-2030-vehicules-volants-open-data/
10. 10
Interest in Machine learning continues to grow
https://www.wipo.int/edocs/pubdocs/en/wipo_pub_1055.pdf
14. 14
Machine Learning & AI in finance: A paradigm shift
Quant (P/Q Quants): Stochastic Models, Factor Models, Optimization, Risk Factors, Derivative pricing, Trading Strategies, Simulations, Distribution fitting
Data Scientist: Real-time analytics, Predictive analytics, Machine Learning, RPA, NLP, Deep Learning, Computer Vision, Graph Analytics, Chatbots, Sentiment Analysis, Alternative Data
15. 15
CFA Institute has adopted Fintech and AI content in its curriculum
Ref: https://www.cfainstitute.org/-/media/documents/support/programs/cfa/cfa-program-level-iii-fintech-in-investment-management.ashx
17. 17
The rise of Big Data and Data Science
Image Source: http://www.ibmbigdatahub.com/sites/default/files/infographic_file/4-Vs-of-big-data.jpg
18. 18
Smart Algorithms
Distributed Computing Frameworks | Deep Learning Frameworks
1. Our labeled datasets were thousands of times too
small.
2. Our computers were millions of times too slow.
3. We initialized the weights in a stupid way.
4. We used the wrong type of non-linearity.
- Geoff Hinton
“Capital One was able to determine fraudulent credit
card applications in 100 milliseconds”*
* http://go.databricks.com/hubfs/pdfs/Databricks-for-FinTech-170306.pdf
22. Use Cases in NLP
• Risk Management: Power risk models by informing clients about their portfolio exposures to headline risk and public disclosures.
• Compliance: Reduce costs in trade surveillance and compliance by reducing the number of false positives chased by analysts and officers.
• Benchmarks: Create innovative investable indexes powered by AI and Big Data.
• Alpha Generation: Create trading signals by ingesting event and sentiment data; identify securities that are likely to suffer from short squeezes or reversals.
23. Risk Systems That Read®
• Northfield uses machine learning based analysis of news text
to describe how current conditions in financial markets are
different than usual.
• Typically, over 8000 articles per day containing more than
20,000 “topics” (companies, industries, countries) are
processed.
• The nature and magnitudes of these differences are used to
revise expectations of financial market risks for all global
equities and credit instruments on a daily basis.
24.
25. 25
• Machine learning is the scientific study of algorithms and statistical
models that computer systems use to effectively perform a specific task
without using explicit instructions, relying on patterns and inference
instead1
• Artificial intelligence is intelligence demonstrated by machines, in
contrast to the natural intelligence displayed by humans and animals1
Definitions: Machine Learning and AI
1. https://en.wikipedia.org/wiki/Machine_learning
2. Figure Source: http://www.fsb.org/wp-content/uploads/P011117.pdf
26. 26
1. Data
2. Goals
3. Machine learning algorithms
4. Process
5. Performance evaluation
Key steps involved
27.
28. 28
Datasets, Variables and Observations
Dataset: A rectangular array with rows as observations and columns as variables
Variable: A characteristic of members of a population (Age, State, etc.)
Observation: List of variable values for a member of the population
29. 29
Variables
A variable could be:
▫ Categorical
– Yes/No flags
– AAA, BB ratings for bonds
▫ Numerical
– 35 mpg
– $170K salary
33. 33
• Descriptive Statistics
▫ Goal is to describe the data at hand
▫ Backward-looking
▫ Statistical techniques employed here
• Predictive Analytics
▫ Goal is to use historical data to build a model for prediction
▫ Forward-looking
▫ Machine learning & AI techniques employed here
Goal
34. 34
• How do you summarize numerical variables?
• How do you summarize categorical variables?
• How do you describe variability in numerical variables?
• How do you summarize relationships between categorical and numerical variables?
• How do you summarize relationships between 2 numerical variables?
Descriptive Statistics – Cross sectional datasets
35. 35
Goal is to extract the various components
Longitudinal datasets
36. 36
• Given a dataset, build a model that captures the
similarities in different observations and assigns
them to different buckets.
• Given a set of variables, predict the value of
another variable in a given data set
▫ Predict salaries given work experience, education etc.
▫ Predict whether a loan would be approved given FICO score, current loans, employment status, etc.
Predictive Analytics : Cross sectional datasets
37. 37
• Given a time series dataset, build a model that can be used to
forecast values in the future
Predictive Analytics : Time series datasets
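The forecasting idea on this slide can be made concrete with a minimal sketch: fit a linear trend to a toy series by least squares and extrapolate it forward. The series and the trend-only model are illustrative assumptions, not material from the deck.

```python
def fit_trend(series):
    """Least-squares line y = a + b*t through (t, y), t = 0..n-1."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    b = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series)) / \
        sum((t - t_mean) ** 2 for t in range(n))
    a = y_mean - b * t_mean
    return a, b

def forecast(series, steps):
    """Extrapolate the fitted trend `steps` periods past the sample."""
    a, b = fit_trend(series)
    n = len(series)
    return [a + b * (n + h) for h in range(steps)]

sales = [10, 12, 13, 15, 16, 18]   # toy monthly series (made up)
next_two = forecast(sales, 2)      # trend-based forecast for the next two periods
```

Real time-series work would decompose trend, seasonality, and noise rather than fit a trend alone; this only illustrates the "build a model, then forecast forward" step.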
42. 42
Supervised Algorithms
▫ Given a set of variables x₁, x₂, …, xₚ, predict the value of another variable y in a given data set such that
▫ If y is numeric => Prediction
▫ If y is categorical => Classification
▫ Example: Given that a customer’s Debt-to-Income ratio increased 20%, what are the chances he/she would default in 3 months?
Machine Learning
x1, x2, x3, … → Model F(X) → y
43. 43
Unsupervised Algorithms
▫ Given a dataset with variables x₁, …, xₚ, build a model that captures the similarities in different observations and assigns them to different buckets => Clustering
▫ Example: Given a list of emerging market stocks, can we segment them into three buckets?
Machine Learning
Obs1, Obs2, Obs3, etc. → Model → Obs1: Class 1, Obs2: Class 2, Obs3: Class 1
45. 45
• Parametric models
▫ Assume some functional form
▫ Fit coefficients
• Examples : Linear Regression, Neural Networks
Supervised Learning models - Prediction
Y = β₀ + β₁X₁
Linear Regression Model Neural network Model
46. 46
• Non-Parametric models
▫ No functional form assumed
• Examples : K-nearest neighbors, Decision Trees
Supervised Learning models
K-nearest neighbor Model Decision tree Model
47. 47
• Given estimates β̂₀, β̂₁, …, β̂ₚ we can make predictions using the formula
ŷ = β̂₀ + β̂₁x₁ + β̂₂x₂ + ⋯ + β̂ₚxₚ
• The parameters are estimated using the least squares approach to minimize the sum of squared errors
RSS = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²
Multiple linear regression
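As a minimal sketch of the least-squares idea, here is the single-predictor case worked end to end in plain Python; the salary-vs-experience numbers are made up for illustration.

```python
def least_squares(x, y):
    """Estimate (b0, b1) minimizing RSS = sum((y_i - b0 - b1*x_i)^2)."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    b1 = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / \
         sum((xi - x_mean) ** 2 for xi in x)
    b0 = y_mean - b1 * x_mean
    return b0, b1

# Toy example: salary (in $K) vs years of experience (illustrative numbers)
years  = [1, 2, 3, 4, 5]
salary = [55, 62, 71, 78, 85]
b0, b1 = least_squares(years, salary)
pred = b0 + b1 * 6   # predicted salary for 6 years of experience
```

With p predictors the same RSS criterion is minimized over all coefficients at once, which is what a library routine (e.g., a standard linear-regression fit) does for you.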
48. 48
• Parametric models
▫ Assume some functional form
▫ Fit coefficients
• Examples : Logistic Regression, Neural Networks
Supervised Learning models - Classification
Logistic Regression Model Neural network Model
49. 49
• Non-Parametric models
▫ No functional form assumed
• Examples : K-nearest Neighbors, Decision Trees
Supervised Learning models
K-nearest neighbor Model Decision tree Model
50. 50
Unsupervised Algorithms
▫ Given a dataset with variables x₁, …, xₚ, build a model that captures the similarities in different observations and assigns them to different buckets => Clustering
Machine Learning
Obs1, Obs2, Obs3, etc. → Model → Obs1: Class 1, Obs2: Class 2, Obs3: Class 1
51. 51
• These methods partition the data into k clusters by assigning each data point to its closest cluster centroid, minimizing the within-cluster sum of squares (WSS), which is:
WSS = Σₖ₌₁ᴷ Σᵢ∈Sₖ Σⱼ₌₁ᵖ (xᵢⱼ − μₖⱼ)²
where Sₖ is the set of observations in the kth cluster and μₖⱼ is the mean of the jth variable of the cluster center of the kth cluster.
• Then, they select the top n points that are the farthest away from their nearest cluster centers as outliers.
K-means clustering
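The WSS objective and the farthest-point outlier rule above can be sketched in a few lines; this is a bare-bones k-means for illustration (fixed iteration count, toy 2-D points), not a production implementation.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20, seed=0):
    """Bare-bones k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)   # initialize at k random data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid
        labels = [min(range(k), key=lambda c: dist2(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return centroids, labels

def top_outliers(points, centroids, labels, n):
    """Top-n points farthest from their assigned cluster centers."""
    return sorted(points,
                  key=lambda p: -dist2(p, centroids[labels[points.index(p)]]))[:n]

pts = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (4.5, 4.5)]
cents, labs = kmeans(pts, 2)
```

A library routine such as scikit-learn's KMeans (with multiple restarts) would normally replace this loop; the sketch just makes the assign/update steps and the outlier rule visible.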
60. 60
• What transformations do I need for the x and y variables ?
• Which are the best features to use?
▫ Dimension Reduction – PCA
▫ Best subset selection
– Forward selection
– Backward elimination
– Stepwise regression
Feature Engineering
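To illustrate the subset-selection idea, here is a deliberately simplified first step of forward selection: score each candidate feature alone by the R² of a one-variable regression and keep the best. The loan-style feature names and numbers are invented for the example.

```python
def r2_single(x, y):
    """R^2 of a simple linear regression of y on one feature x
    (equals the squared correlation between x and y)."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    sxy = sum((a - xm) * (b - ym) for a, b in zip(x, y))
    sxx = sum((a - xm) ** 2 for a in x)
    syy = sum((b - ym) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)

features = {                        # toy candidate features (made up)
    "fico":       [700, 720, 650, 610, 680],
    "dti":        [0.2, 0.25, 0.4, 0.5, 0.3],
    "zip_digits": [1, 9, 4, 7, 2],  # deliberately uninformative noise feature
}
rate = [5.0, 4.8, 6.5, 7.1, 5.6]    # interest rate to predict (made up)

# First forward-selection step: keep the single best-scoring feature
best = max(features, key=lambda f: r2_single(features[f], rate))
```

Full forward selection would then re-fit with `best` plus each remaining candidate and repeat; backward elimination and stepwise regression run the same scoring loop in the other direction.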
64. 64
• The prediction error for record i is defined as the difference between its actual y value and its predicted y value:
eᵢ = yᵢ − ŷᵢ
• R² indicates how well the data fit the statistical model:
R² = 1 − Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)² / Σᵢ₌₁ⁿ (yᵢ − ȳ)²
Prediction Accuracy Measures
65. 65
• Fit measures in classical regression modeling:
• Adjusted R² has been adjusted for the number of predictors. It increases only when the improvement of the model is more than one would expect to see by chance (p is the total number of explanatory variables):
Adjusted R² = 1 − [Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)² / (n − p − 1)] / [Σᵢ₌₁ⁿ (yᵢ − ȳ)² / (n − 1)]
• MAE or MAD (mean absolute error/deviation) gives the magnitude of the average absolute error:
MAE = Σᵢ₌₁ⁿ |eᵢ| / n
Prediction Accuracy Measures
66. 66
▫ MAPE (mean absolute percentage error) gives a percentage score of how predictions deviate on average:
MAPE = (Σᵢ₌₁ⁿ |eᵢ / yᵢ| / n) × 100%
• RMSE (root-mean-squared error) is computed on the training and validation data:
RMSE = √((1/n) Σᵢ₌₁ⁿ eᵢ²)
Prediction Accuracy Measures
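The four measures above can be computed together in a short helper; the toy actuals and predictions are made up.

```python
import math

def accuracy_measures(y, y_hat):
    """MAE, MAPE (%), RMSE and R^2 for actuals y and predictions y_hat."""
    n = len(y)
    e = [a - p for a, p in zip(y, y_hat)]           # prediction errors e_i
    mae  = sum(abs(v) for v in e) / n
    mape = sum(abs(v / a) for v, a in zip(e, y)) / n * 100
    rmse = math.sqrt(sum(v * v for v in e) / n)
    y_bar = sum(y) / n
    r2 = 1 - sum(v * v for v in e) / sum((a - y_bar) ** 2 for a in y)
    return mae, mape, rmse, r2

y     = [10.0, 20.0, 30.0, 40.0]   # toy actuals
y_hat = [12.0, 18.0, 33.0, 39.0]   # toy predictions
mae, mape, rmse, r2 = accuracy_measures(y, y_hat)
```

Note MAPE is undefined when some yᵢ = 0, and RMSE penalizes large errors more heavily than MAE does, which is often why the two are reported together.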
67. 67
• Consider a two-class case with classes C₀ and C₁
• Classification matrix:
Classification matrix
                Predicted C₀                                      Predicted C₁
Actual C₀       n₀,₀ = number of C₀ cases classified correctly    n₀,₁ = number of C₀ cases classified incorrectly as C₁
Actual C₁       n₁,₀ = number of C₁ cases classified incorrectly as C₀    n₁,₁ = number of C₁ cases classified correctly
69. 69
• The ROC curve plots the pairs {sensitivity, 1 − specificity} as the cutoff value increases from 0 to 1
• Sensitivity (also called the true positive rate, or
recall in some fields) measures the proportion of
positives that are correctly identified (e.g., the
percentage of sick people who are correctly
identified as having the condition).
• Specificity (also called the true negative rate)
measures the proportion of negatives that are
correctly identified as such (e.g., the percentage of
healthy people who are correctly identified as not
having the condition).
• Better performance is reflected by curves that are
closer to the top left corner
ROC Curve
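A minimal sketch tying the classification matrix to the ROC curve: count the 2×2 cells at a given cutoff, then sweep the cutoff to trace (1 − specificity, sensitivity) pairs. The labels and scores are toy values.

```python
def confusion(actual, predicted, positive=1):
    """2x2 classification-matrix counts (tp, fp, tn, fn)."""
    tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    fp = sum(a != positive and p == positive for a, p in zip(actual, predicted))
    tn = sum(a != positive and p != positive for a, p in zip(actual, predicted))
    fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
    return tp, fp, tn, fn

def roc_points(actual, scores, cutoffs):
    """(1 - specificity, sensitivity) pairs as the cutoff varies."""
    pts = []
    for c in cutoffs:
        pred = [1 if s >= c else 0 for s in scores]
        tp, fp, tn, fn = confusion(actual, pred)
        sens = tp / (tp + fn)   # true positive rate (recall)
        spec = tn / (tn + fp)   # true negative rate
        pts.append((1 - spec, sens))
    return pts

actual = [1, 1, 1, 0, 0, 0]               # toy labels
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]   # toy model scores
points = roc_points(actual, scores, [0.0, 0.5, 1.0])
```

At cutoff 0 every case is flagged positive (top-right of the curve); at cutoff 1 none are (bottom-left); curves bowing toward the top-left corner indicate better rankings.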
70. 70
1. Data
2. Goals
3. Machine learning algorithms
4. Process
5. Performance Evaluation
Recap
76. Machine Learning Workflow
Pipeline: Data Scraping/Ingestion → Data Exploration → Data Cleansing and Processing → Feature Engineering → Modeling (Supervised/Unsupervised) → Model Selection → Model Evaluation & Tuning → Model Deployment/Inference
Modeling: Supervised (Regression, KNN, Decision Trees, Naive Bayes, Neural Networks, Ensembles); Unsupervised (Clustering, PCA, Autoencoder)
Model Selection: AutoML, Model Validation, Interpretability
Model Evaluation & Tuning: RMSE, MAPE, MAE, Confusion Matrix, Precision/Recall, ROC; Hyper-parameter tuning, Parameter Grids
Model Deployment/Inference: Robotic Process Automation (RPA) (Microservices, Pipelines); SW: Web/REST API; HW: GPU, Cloud; Monitoring
Roles: Data Engineer, DevOps Engineer; Software/Web Engineer; Data Scientists/Quants; Analysts & Decision Makers; Risk Management/Compliance (all stages)
77.
78. 78
Claim:
• Machine learning is good for credit-card fraud detection
Caution:
• Beware of imbalanced class problems
• A model that gives 99% accuracy may still not be good enough
1. Machine learning is not a generic solution to all problems
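The caution above is easy to demonstrate: with a 1% positive rate, a model that always predicts the majority class reaches 99% accuracy while catching nothing. The numbers are illustrative.

```python
# 1% fraud rate: a "model" that always predicts 'not fraud' scores 99%
# accuracy yet catches zero fraud. Counts are made up for illustration.
actual = [1] * 10 + [0] * 990     # 10 frauds in 1,000 transactions
predicted = [0] * 1000            # majority-class "model"

accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
recall = sum(a == 1 and p == 1 for a, p in zip(actual, predicted)) / 10

print(accuracy)   # 0.99 -- looks great
print(recall)     # 0.0  -- misses every single fraud
```

This is why recall, precision, or F1 on the minority class, rather than raw accuracy, drive evaluation on imbalanced problems.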
79. 79
Claim:
• Our models work on all the datasets we have tested on
Caution:
• Do we have enough data?
• How do we handle bias in datasets?
• Beware of overfitting
• Historical Analysis is not Prediction
2. A prototype model is not a production model
80. 80
Prototyping vs Production: The reality
https://www.itnews.com.au/news/hsbc-societe-generale-run-into-ais-production-problems-477966
Kristy Roth from HSBC:
“It’s been somewhat easy - in a funny way - to
get going using sample data, [but] then you hit
the real problems,” Roth said.
“I think our early track record on PoCs or pilots
hides a little bit the underlying issues.
Matt Davey from Societe Generale:
“We’ve done quite a bit of work with RPA
recently and I have to say we’ve been a bit
disillusioned with that experience,”
“the PoC is the easy bit: it’s how you get that
into production and shift the balance”
81. 81
Claim:
• It works. We don’t know how!
Caution:
• Lots of heuristics; still not a proven science
• Interpretability, Fairness, Auditability of models are important
• Beware of black boxes; Transparency in codebase is paramount
with the proliferation of opensource tools
• Skilled data scientists with knowledge of algorithms and their
appropriate usage are key to successful adoption
3. We are just getting started!
82. 82
Claim:
• Machine Learning models are more
accurate than traditional models
Caution:
• Is accuracy the right metric?
• How do we evaluate the model? Accuracy
or F1-Score?
• How does the model behave in different
regimes?
4. Choose the right metrics for evaluation
Source: https://en.wikipedia.org/wiki/Confusion_matrix
83. 83
Claim:
• Machine Learning and AI will replace humans
in most applications
Caution:
• Just because it worked some times doesn’t
mean that the organization can be on
autopilot
• Will we have true AI or Augmented
Intelligence?
• Model risk and robust risk management is
paramount to the success of the
organization.
• We are just getting started!
5. Are we there yet?
https://www.bloomberg.com/news/articles/2017-10-20/automation-starts-to-sweep-wall-street-with-tons-of-glitches
84. 84
Can Machine Learning algorithms be gamed?
https://www.youtube.com/watch?time_continue=36&v=MIbFvK2S9g8
https://arxiv.org/abs/1904.08653
86. 86
1. Case Intro
2. Data Exploration of the Credit risk data set
3. Problem Definition and Machine learning
4. Performance Evaluation
5. Deployment
Case study 1
87. 87
Credit risk in consumer credit
Credit-scoring models and techniques assess the risk in
lending to customers.
Typical decisions:
• Grant credit/not to new applicants
• Increasing/Decreasing spending limits
• Increasing/Decreasing lending rates
• What new products can be given to existing applicants?
88. 88
Credit assessment in consumer credit
History:
• Gut feel
• Social network
• Communities and influence
Traditional:
• Scoring mechanisms through credit bureaus
• Bank assessments through business rules
Newer approaches:
• Peer-to-Peer lending
• Prosper Marketplace
90. 90
Credit Risk pipeline
Data Ingestion
from Lending
Club
Pre-Processing
Feature
Engineering
Model
Development
and Tuning
Model
Deployment
Stage 1 Stage 2 Stage 3 Stage 4 Stage 5
94. • Freddie Mac: The Case Study Setup
• Design Choices
• The Pipeline
• Demo
#Disrupt19
Agenda
95. 95
• Freddie Mac was created in 1970 to expand the secondary
market for mortgages in the US. Freddie Mac buys mortgages
on the secondary market, pools them, and sells them as
a mortgage-backed security to investors on the open market.
Introduction
https://a16z.com/2018/05/19/mortgage-process-players-problems-opportunities/
96. 96
• Freddie Mac data
Goal
http://www.freddiemac.com/research/datasets/sf_loanlevel_dataset.page
104. 104
2. The Data questions
1. Do you know what data you need?
2. Do you know if the data is available?
3. Do you have the data?
4. Do you have the right data?
5. Will you continue to have the data?
Data science in 10 steps
105. 105
3. Develop a data acquisition and data prep strategy
1. Do you know how to get the data?
2. Who gets the data?
3. How do you process it?
4. How do you access it?
5. How do you version and govern the data?
Data science in 10 steps
106. 106
4. Explore and evaluate your data and get it in the right format
Data science in 10 steps
107. 107
5. Define your goal:
1. Summarization
2. Fact finding
3. Understanding relationships
4. Prediction
Data science in 10 steps
108. 108
6. Shortlist (not “choose”) the techniques/methodologies/algorithms
Data science in 10 steps
109. 109
7. Evaluate/establish business constraints and narrow down your
choices of techniques/methodologies/algorithms
1. Cloud/Cost/Expertise/Cost-Value
2. Build/buy/access
Data science in 10 steps
Outcomes
Time
Quality
Cost
110. 110
8. Establish criteria to know if the methodology/models/algorithms
work
1. Is the process replicable?
2. What performance metrics do we choose?
3. Can you evaluate the performance and validate if the models meet
the criteria?
4. Does it provide business value?
Data science in 10 steps
111. 111
9. Fine-tune your algorithms and algorithm selection
1. Hyperparameter tuning
2. Bias-variance tradeoff
3. Handling imbalanced class problems
4. Ensemble techniques
5. AutoML
Data science in 10 steps
https://support.sas.com/resources/papers/proceedings17/SAS0514-2017.pdf
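Hyperparameter tuning in step 9 can be sketched as a simple grid search over a held-out validation set; here the tuned parameter is k for a one-dimensional k-NN classifier, and both the data and the candidate grid are made up.

```python
def knn_predict(train, x, k):
    """Majority label among the k training points nearest to x (1-D)."""
    nearest = sorted(train, key=lambda t: abs(t[0] - x))[:k]
    labels = [label for _, label in nearest]
    return max(set(labels), key=labels.count)

# (feature, label) pairs -- toy, well-separated data
train = [(1.0, 0), (1.5, 0), (2.0, 0), (8.0, 1), (8.5, 1), (9.0, 1)]
valid = [(1.2, 0), (8.8, 1), (2.2, 0), (7.9, 1)]

def val_accuracy(k):
    """Fraction of held-out validation points classified correctly."""
    return sum(knn_predict(train, x, k) == y for x, y in valid) / len(valid)

grid = [1, 3, 5]                       # candidate hyperparameter values
best_k = max(grid, key=val_accuracy)   # pick k with best held-out accuracy
```

Real tuning would use cross-validation and a parameter grid (or AutoML) over many hyperparameters at once; the point is only that the hyperparameter is chosen on data the model was not fit to.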
112. 112
10. How will this process reach decision makers?
1. Deployment choices (On-prem/Cloud)
2. Frequency of data/model updates
3. Governance/Role/Responsibilities
4. Speed, Scale, Availability, Disaster recovery, Rollback, Pull-Plug
Data science in 10 steps
113. 113
How do you monitor the efficacy of your solution?
1. Retuning
2. Monitoring
3. Model decay
4. Data augmentation
5. Newer innovations
Data science in 10 steps - Bonus
119. 119
• The process of computationally identifying and categorizing
opinions expressed in a piece of text, especially in order to
determine whether the writer's attitude towards a particular
topic, product, etc. is positive, negative, or neutral.
Sentiment Analysis
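As a toy illustration of the definition above, a lexicon-based scorer counts positive and negative words; the word lists are tiny, invented stand-ins for a real sentiment lexicon or API.

```python
# Minimal lexicon-based sentiment sketch; these word lists are illustrative
# assumptions, not a production financial-sentiment lexicon.
POSITIVE = {"growth", "beat", "strong", "upgrade", "profit"}
NEGATIVE = {"loss", "miss", "weak", "downgrade", "lawsuit"}

def sentiment(text):
    """Classify text as positive/negative/neutral by word counts."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("Strong quarter with record profit"))   # positive
print(sentiment("Earnings miss triggers downgrade"))    # negative
```

Commercial APIs (such as the ones compared later in the pipeline) replace the word lists with trained models, but the input/output shape — text in, polarity label out — is the same.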
121. 121
Challenges
• Interpreting emotions
• Labeling data
Options
• APIs
• Human Insight
• Expert Knowledge
• Build your own
122. 122
NLP pipeline
Data Ingestion
from Edgar
Pre-Processing
Invoking APIs to
label data
Compare APIs
Build a new
model for
sentiment
Analysis
Stage 1 Stage 2 Stage 3 Stage 4 Stage 5
• Amazon Comprehend API
• Google API
• Watson API
• Azure API
128. Thank you!
Sri Krishnamurthy, CFA, CAP
Founder and CEO
QuantUniversity LLC.
srikrishnamurthy
www.QuantUniversity.com
Contact
Information, data and drawings embodied in this presentation are strictly a property of QuantUniversity LLC. and shall not be
distributed or used in any other publication without the prior written consent of QuantUniversity LLC.