SlideShare a Scribd company logo
1 of 29
Download to read offline
Synthetic Data Generation for Machine Learning
2020 Copyright QuantUniversity LLC.
Presented By:
Sri Krishnamurthy, CFA, CAP
Sri.Krishnamurthy@qusandbox.com
www.quantuniversity.com
03/05/2020
Boston, MA
2
Speaker bio
• Quant, Data Science & ML practitioner
• Prior Experience at MathWorks, Citigroup
and Endeca and 25+ financial services and
energy customers.
• Columnist for the Wilmott Magazine
• Author of forthcoming book
“Financial Modeling: A case study approach”
published by Wiley
• Teaches Data Science/AI at Northeastern
University, Boston
• Reviewer: Journal of Asset Management
Sri Krishnamurthy
Founder and CEO
QuantUniversity
3
About QuantUniversity
• Boston-based Data Science, Quant
Finance and Machine Learning
training and consulting advisory
• Trained more than 1000 students in
Quantitative methods, Data Science,
ML and Big Data Technologies
• Building a platform for
operationalizing AI and Machine
Learning in the Enterprise
4
1. Challenges with Real Datasets
2. Synthetic Dataset generation tools
▫ Proprietary
▫ Open Source
– Faker
– Data Synthesizer
– SDV
– Synthpop
– GANs
3. Demos
▫ Data Synthesizer
▫ Sales Data Generator
▫ VIX Data Generator
Agenda
Challenges with Real Datasets
6
7
• It may not be feasible to get samples for all
categories
• Lighting conditions
• Modifications (Glasses/No glasses,
Moustache/ No Moustache etc.)
• Positions
Coverage
Challenges with real datasets
8
All scenarios haven’t
played out
• Stress scenarios
• What-if scenarios
Challenges with real datasets
Figure ref: http://www.actuaries.org/CTTEES_SOLV/Documents/StressTestingPaper.pdf
9
Missing values
• Missing at random
• Missing sequences
• Need data to fill frames
Challenges with real datasets
10
• Access
▫ Hard to find
▫ Rare class problems
▫ Privacy concerns
making it difficult to
share
Challenges with real datasets
11
Imbalanced
• Need more samples of rare
class
• Need proxies for data points
that were not observed or
recorded
Challenges with real datasets
12
Labels
• Human labeling is hard
• Synthetic label generators
Challenges with real datasets
Tools for Synthetic Data Generation
14
Proprietary Tools
Company Core Technology
Tonic.ai
All-in-one platform for data anonymization, subsetting, and synthesis
integrated with databases (hadoop, oracle, mysql, MS sql server,
mongo db, amazon aurora/redshift, and google big query)
- Uses Condenser and Masquerade
Mostly.ai
Tablular data using generative deep neural networks (no image data)
CVEDIA
- Sensor modeling and algorithm training
- Handle image using SynCity as a custom pocket laboratory to
generate highly entropic scenes, conditions, and metadata. Enable
real-time Hardware-In-the-Loop (HWIL), Human-In-the-Loop (HITL) or
Software-In-the-Loop (SIL) simulations even with complex sensor
configurations
Deep vision data image creation
synthetic training data
Synthesis.ai The data generation platform for computer vision
15
Opensource tools
16
SDV
https://www.computer.org/csdl/proceedings-
article/dsaa/2016/07796926/12OmNwx3Q7S
17
Data Synthesizer
https://faculty.washington.edu/billhowe/publications/pdfs/pin
g17datasynthesizer.pdf
18
Synthpop
19
VAE
https://arxiv.org/pdf/1808.06444.pdf
20
GAN
https://developers.google.com/machine-
learning/gan/gan_structure
21
WGAN
1. Loan Data Synthesizer
2. Sales Data Generator
3. Vix Data Generator
23
24
Demo 1 – Loan Data Synthesizer
25
Demo 2: Synthetic Sales data generation
26
Demo 3 : Synthetic VIX generation
27
If you want to be a part of QuSandbox private Beta
Contact us:
info@qusandbox
28
1. Model Governance in the Age of Data Science and AI
▫ GFMI Course, March 9th, 10th, New York, NY
2. Synthetic VIX data generation using deep learning techniques
▫ QWAFAFEW meeting - March 17th, 2020, Boston MA
3. Using synthetic data for ML in Finance
▫ 2nd Annual Machine Learning in Quantitative Finance – April 1st, 2020, New York, NY
4. Tackling the biggest limitations of ML
▫ 2nd Annual Machine Learning in Quantitative Finance – April 1st, 2020, New York, NY
5. Foundations of Machine learning and AI for Financial Professionals
▫ 8-week Online course offered in partnership with PRMIA – May 12th – June 30th, 2020, Online
6. A Master Class on AI and Machine Learning for Financial Professionals
▫ Invited session at the 73rd CFA Annual Conference – May 17th, 2020, Atlanta, GA
Upcoming events by QuantUniversity
Sri Krishnamurthy, CFA, CAP
Founder and Chief Data Scientist
sri@quantuniversity.com
srikrishnamurthy
www.QuantUniversity.com
www.analyticscertificate.com
www.qusandbox.com
Information, data and drawings embodied in this presentation are strictly a property of QuantUniversity LLC. and shall not be
distributed or used in any other publication without the prior written consent of QuantUniversity LLC.
29

More Related Content

What's hot

Deep learning health care
Deep learning health care  Deep learning health care
Deep learning health care Meenakshi Sood
 
Synthetic Data Generation with DoppelGanger
Synthetic Data Generation with DoppelGangerSynthetic Data Generation with DoppelGanger
Synthetic Data Generation with DoppelGangerQuantUniversity
 
Big data by Mithlesh sadh
Big data by Mithlesh sadhBig data by Mithlesh sadh
Big data by Mithlesh sadhMithlesh Sadh
 
Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)Appsilon Data Science
 
Why you should care about synthetic data
Why you should care about synthetic dataWhy you should care about synthetic data
Why you should care about synthetic dataRiaktr
 
Large Language Models - Chat AI.pdf
Large Language Models - Chat AI.pdfLarge Language Models - Chat AI.pdf
Large Language Models - Chat AI.pdfDavid Rostcheck
 
Generative AI and Security (1).pptx.pdf
Generative AI and Security (1).pptx.pdfGenerative AI and Security (1).pptx.pdf
Generative AI and Security (1).pptx.pdfPriyanka Aash
 
Landscape of AI/ML in 2023
Landscape of AI/ML in 2023Landscape of AI/ML in 2023
Landscape of AI/ML in 2023HyunJoon Jung
 
Explainable AI in Industry (KDD 2019 Tutorial)
Explainable AI in Industry (KDD 2019 Tutorial)Explainable AI in Industry (KDD 2019 Tutorial)
Explainable AI in Industry (KDD 2019 Tutorial)Krishnaram Kenthapadi
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data scienceMahir Haque
 
Few shot learning/ one shot learning/ machine learning
Few shot learning/ one shot learning/ machine learningFew shot learning/ one shot learning/ machine learning
Few shot learning/ one shot learning/ machine learningﺁﺻﻒ ﻋﻠﯽ ﻣﯿﺮ
 
FAIRy stories: the FAIR Data principles in theory and in practice
FAIRy stories: the FAIR Data principles in theory and in practiceFAIRy stories: the FAIR Data principles in theory and in practice
FAIRy stories: the FAIR Data principles in theory and in practiceCarole Goble
 
GENERATIVE AI, THE FUTURE OF PRODUCTIVITY
GENERATIVE AI, THE FUTURE OF PRODUCTIVITYGENERATIVE AI, THE FUTURE OF PRODUCTIVITY
GENERATIVE AI, THE FUTURE OF PRODUCTIVITYAndre Muscat
 
stackconf 2022: Introduction to Vector Search with Weaviate
stackconf 2022: Introduction to Vector Search with Weaviatestackconf 2022: Introduction to Vector Search with Weaviate
stackconf 2022: Introduction to Vector Search with WeaviateNETWAYS
 
PPT4: Frameworks & Libraries of Machine Learning & Deep Learning
PPT4: Frameworks & Libraries of Machine Learning & Deep Learning PPT4: Frameworks & Libraries of Machine Learning & Deep Learning
PPT4: Frameworks & Libraries of Machine Learning & Deep Learning akira-ai
 
Image Restoration for 3D Computer Vision
Image Restoration for 3D Computer VisionImage Restoration for 3D Computer Vision
Image Restoration for 3D Computer VisionPetteriTeikariPhD
 
GAN - Theory and Applications
GAN - Theory and ApplicationsGAN - Theory and Applications
GAN - Theory and ApplicationsEmanuele Ghelfi
 

What's hot (20)

Deep learning health care
Deep learning health care  Deep learning health care
Deep learning health care
 
Synthetic Data Generation with DoppelGanger
Synthetic Data Generation with DoppelGangerSynthetic Data Generation with DoppelGanger
Synthetic Data Generation with DoppelGanger
 
Big data by Mithlesh sadh
Big data by Mithlesh sadhBig data by Mithlesh sadh
Big data by Mithlesh sadh
 
Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)
 
Why you should care about synthetic data
Why you should care about synthetic dataWhy you should care about synthetic data
Why you should care about synthetic data
 
Deep learning presentation
Deep learning presentationDeep learning presentation
Deep learning presentation
 
Large Language Models - Chat AI.pdf
Large Language Models - Chat AI.pdfLarge Language Models - Chat AI.pdf
Large Language Models - Chat AI.pdf
 
Big data
Big dataBig data
Big data
 
Generative AI and Security (1).pptx.pdf
Generative AI and Security (1).pptx.pdfGenerative AI and Security (1).pptx.pdf
Generative AI and Security (1).pptx.pdf
 
Landscape of AI/ML in 2023
Landscape of AI/ML in 2023Landscape of AI/ML in 2023
Landscape of AI/ML in 2023
 
Meta learning tutorial
Meta learning tutorialMeta learning tutorial
Meta learning tutorial
 
Explainable AI in Industry (KDD 2019 Tutorial)
Explainable AI in Industry (KDD 2019 Tutorial)Explainable AI in Industry (KDD 2019 Tutorial)
Explainable AI in Industry (KDD 2019 Tutorial)
 
Introduction to data science
Introduction to data scienceIntroduction to data science
Introduction to data science
 
Few shot learning/ one shot learning/ machine learning
Few shot learning/ one shot learning/ machine learningFew shot learning/ one shot learning/ machine learning
Few shot learning/ one shot learning/ machine learning
 
FAIRy stories: the FAIR Data principles in theory and in practice
FAIRy stories: the FAIR Data principles in theory and in practiceFAIRy stories: the FAIR Data principles in theory and in practice
FAIRy stories: the FAIR Data principles in theory and in practice
 
GENERATIVE AI, THE FUTURE OF PRODUCTIVITY
GENERATIVE AI, THE FUTURE OF PRODUCTIVITYGENERATIVE AI, THE FUTURE OF PRODUCTIVITY
GENERATIVE AI, THE FUTURE OF PRODUCTIVITY
 
stackconf 2022: Introduction to Vector Search with Weaviate
stackconf 2022: Introduction to Vector Search with Weaviatestackconf 2022: Introduction to Vector Search with Weaviate
stackconf 2022: Introduction to Vector Search with Weaviate
 
PPT4: Frameworks & Libraries of Machine Learning & Deep Learning
PPT4: Frameworks & Libraries of Machine Learning & Deep Learning PPT4: Frameworks & Libraries of Machine Learning & Deep Learning
PPT4: Frameworks & Libraries of Machine Learning & Deep Learning
 
Image Restoration for 3D Computer Vision
Image Restoration for 3D Computer VisionImage Restoration for 3D Computer Vision
Image Restoration for 3D Computer Vision
 
GAN - Theory and Applications
GAN - Theory and ApplicationsGAN - Theory and Applications
GAN - Theory and Applications
 

Similar to Synthetic data generation for machine learning

Synthetic data in finance
Synthetic data in financeSynthetic data in finance
Synthetic data in financeQuantUniversity
 
Synthetic data in finance
Synthetic data in financeSynthetic data in finance
Synthetic data in financeQuantUniversity
 
Ml master class cfa poland
Ml master class   cfa polandMl master class   cfa poland
Ml master class cfa polandQuantUniversity
 
Machine Learning and AI: An Intuitive Introduction - CFA Institute Masterclass
Machine Learning and AI: An Intuitive Introduction - CFA Institute MasterclassMachine Learning and AI: An Intuitive Introduction - CFA Institute Masterclass
Machine Learning and AI: An Intuitive Introduction - CFA Institute MasterclassQuantUniversity
 
CFA-NY Workshop - Final slides
CFA-NY Workshop - Final slidesCFA-NY Workshop - Final slides
CFA-NY Workshop - Final slidesQuantUniversity
 
ML and AI in Finance: Master Class
ML and AI in Finance: Master ClassML and AI in Finance: Master Class
ML and AI in Finance: Master ClassQuantUniversity
 
CWIN17 san francisco-ai implementation-pub
CWIN17 san francisco-ai implementation-pubCWIN17 san francisco-ai implementation-pub
CWIN17 san francisco-ai implementation-pubCapgemini
 
Intelligent Big Data analytics for the future.
Intelligent Big Data analytics for the future.Intelligent Big Data analytics for the future.
Intelligent Big Data analytics for the future.Shashank Garg
 
Qu for India - QuantUniversity FundRaiser
Qu for India  - QuantUniversity FundRaiserQu for India  - QuantUniversity FundRaiser
Qu for India - QuantUniversity FundRaiserQuantUniversity
 
Ml master class for CFA Dallas
Ml master class for CFA DallasMl master class for CFA Dallas
Ml master class for CFA DallasQuantUniversity
 
Flink Forward Berlin 2017: Bas Geerdink, Martijn Visser - Fast Data at ING - ...
Flink Forward Berlin 2017: Bas Geerdink, Martijn Visser - Fast Data at ING - ...Flink Forward Berlin 2017: Bas Geerdink, Martijn Visser - Fast Data at ING - ...
Flink Forward Berlin 2017: Bas Geerdink, Martijn Visser - Fast Data at ING - ...Flink Forward
 
Digital Twin and Smart Spaces
Digital Twin and Smart Spaces Digital Twin and Smart Spaces
Digital Twin and Smart Spaces SANGHEE SHIN
 
Finely Chair talk: Every company is an AI company - and why Universities sho...
Finely Chair talk: Every company is an AI company  - and why Universities sho...Finely Chair talk: Every company is an AI company  - and why Universities sho...
Finely Chair talk: Every company is an AI company - and why Universities sho...Amit Sheth
 
Machine Learning and Power AI Workshop v4
Machine Learning and Power AI Workshop v4Machine Learning and Power AI Workshop v4
Machine Learning and Power AI Workshop v4LennartF
 
Industry Disruptors: AI, Machine Learning and Drones.
Industry Disruptors: AI, Machine Learning and Drones. Industry Disruptors: AI, Machine Learning and Drones.
Industry Disruptors: AI, Machine Learning and Drones. AnandSRao1962
 
Machine Learning in Finance: 10 Things You Need to Know in 2021
Machine Learning in Finance: 10 Things You Need to Know in 2021Machine Learning in Finance: 10 Things You Need to Know in 2021
Machine Learning in Finance: 10 Things You Need to Know in 2021QuantUniversity
 

Similar to Synthetic data generation for machine learning (20)

Synthetic data in finance
Synthetic data in financeSynthetic data in finance
Synthetic data in finance
 
Synthetic data in finance
Synthetic data in financeSynthetic data in finance
Synthetic data in finance
 
Ml master class cfa poland
Ml master class   cfa polandMl master class   cfa poland
Ml master class cfa poland
 
Machine Learning and AI: An Intuitive Introduction - CFA Institute Masterclass
Machine Learning and AI: An Intuitive Introduction - CFA Institute MasterclassMachine Learning and AI: An Intuitive Introduction - CFA Institute Masterclass
Machine Learning and AI: An Intuitive Introduction - CFA Institute Masterclass
 
CFA-NY Workshop - Final slides
CFA-NY Workshop - Final slidesCFA-NY Workshop - Final slides
CFA-NY Workshop - Final slides
 
ML and AI in Finance: Master Class
ML and AI in Finance: Master ClassML and AI in Finance: Master Class
ML and AI in Finance: Master Class
 
CWIN17 san francisco-ai implementation-pub
CWIN17 san francisco-ai implementation-pubCWIN17 san francisco-ai implementation-pub
CWIN17 san francisco-ai implementation-pub
 
Ds for finance day1
Ds for finance day1Ds for finance day1
Ds for finance day1
 
Intelligent Big Data analytics for the future.
Intelligent Big Data analytics for the future.Intelligent Big Data analytics for the future.
Intelligent Big Data analytics for the future.
 
Qu for India - QuantUniversity FundRaiser
Qu for India  - QuantUniversity FundRaiserQu for India  - QuantUniversity FundRaiser
Qu for India - QuantUniversity FundRaiser
 
QuSandbox+NVIDIA Rapids
QuSandbox+NVIDIA RapidsQuSandbox+NVIDIA Rapids
QuSandbox+NVIDIA Rapids
 
Ml master class for CFA Dallas
Ml master class for CFA DallasMl master class for CFA Dallas
Ml master class for CFA Dallas
 
Flink Forward Berlin 2017: Bas Geerdink, Martijn Visser - Fast Data at ING - ...
Flink Forward Berlin 2017: Bas Geerdink, Martijn Visser - Fast Data at ING - ...Flink Forward Berlin 2017: Bas Geerdink, Martijn Visser - Fast Data at ING - ...
Flink Forward Berlin 2017: Bas Geerdink, Martijn Visser - Fast Data at ING - ...
 
Digital Twin and Smart Spaces
Digital Twin and Smart Spaces Digital Twin and Smart Spaces
Digital Twin and Smart Spaces
 
Finely Chair talk: Every company is an AI company - and why Universities sho...
Finely Chair talk: Every company is an AI company  - and why Universities sho...Finely Chair talk: Every company is an AI company  - and why Universities sho...
Finely Chair talk: Every company is an AI company - and why Universities sho...
 
Careers in analytics
Careers in analyticsCareers in analytics
Careers in analytics
 
Machine Learning and Power AI Workshop v4
Machine Learning and Power AI Workshop v4Machine Learning and Power AI Workshop v4
Machine Learning and Power AI Workshop v4
 
ML master class
ML master classML master class
ML master class
 
Industry Disruptors: AI, Machine Learning and Drones.
Industry Disruptors: AI, Machine Learning and Drones. Industry Disruptors: AI, Machine Learning and Drones.
Industry Disruptors: AI, Machine Learning and Drones.
 
Machine Learning in Finance: 10 Things You Need to Know in 2021
Machine Learning in Finance: 10 Things You Need to Know in 2021Machine Learning in Finance: 10 Things You Need to Know in 2021
Machine Learning in Finance: 10 Things You Need to Know in 2021
 

More from QuantUniversity

EU Artificial Intelligence Act 2024 passed !
EU Artificial Intelligence Act 2024 passed !EU Artificial Intelligence Act 2024 passed !
EU Artificial Intelligence Act 2024 passed !QuantUniversity
 
Managing-the-Risks-of-LLMs-in-FS-Industry-Roundtable-TruEra-QuantU.pdf
Managing-the-Risks-of-LLMs-in-FS-Industry-Roundtable-TruEra-QuantU.pdfManaging-the-Risks-of-LLMs-in-FS-Industry-Roundtable-TruEra-QuantU.pdf
Managing-the-Risks-of-LLMs-in-FS-Industry-Roundtable-TruEra-QuantU.pdfQuantUniversity
 
PYTHON AND DATA SCIENCE FOR INVESTMENT PROFESSIONALS
PYTHON AND DATA SCIENCE FOR INVESTMENT PROFESSIONALSPYTHON AND DATA SCIENCE FOR INVESTMENT PROFESSIONALS
PYTHON AND DATA SCIENCE FOR INVESTMENT PROFESSIONALSQuantUniversity
 
Algorithmic auditing 1.0
Algorithmic auditing 1.0Algorithmic auditing 1.0
Algorithmic auditing 1.0QuantUniversity
 
Towards Fairer Datasets: Filtering and Balancing the Distribution of the Peop...
Towards Fairer Datasets: Filtering and Balancing the Distribution of the Peop...Towards Fairer Datasets: Filtering and Balancing the Distribution of the Peop...
Towards Fairer Datasets: Filtering and Balancing the Distribution of the Peop...QuantUniversity
 
Machine Learning: Considerations for Fairly and Transparently Expanding Acces...
Machine Learning: Considerations for Fairly and Transparently Expanding Acces...Machine Learning: Considerations for Fairly and Transparently Expanding Acces...
Machine Learning: Considerations for Fairly and Transparently Expanding Acces...QuantUniversity
 
Seeing what a gan cannot generate: paper review
Seeing what a gan cannot generate: paper reviewSeeing what a gan cannot generate: paper review
Seeing what a gan cannot generate: paper reviewQuantUniversity
 
AI Explainability and Model Risk Management
AI Explainability and Model Risk ManagementAI Explainability and Model Risk Management
AI Explainability and Model Risk ManagementQuantUniversity
 
Algorithmic auditing 1.0
Algorithmic auditing 1.0Algorithmic auditing 1.0
Algorithmic auditing 1.0QuantUniversity
 
Bayesian Portfolio Allocation
Bayesian Portfolio AllocationBayesian Portfolio Allocation
Bayesian Portfolio AllocationQuantUniversity
 
Constructing Private Asset Benchmarks
Constructing Private Asset BenchmarksConstructing Private Asset Benchmarks
Constructing Private Asset BenchmarksQuantUniversity
 
Machine Learning Interpretability
Machine Learning InterpretabilityMachine Learning Interpretability
Machine Learning InterpretabilityQuantUniversity
 
Responsible AI in Action
Responsible AI in ActionResponsible AI in Action
Responsible AI in ActionQuantUniversity
 
Qu speaker series 14: Synthetic Data Generation in Finance
Qu speaker series 14: Synthetic Data Generation in FinanceQu speaker series 14: Synthetic Data Generation in Finance
Qu speaker series 14: Synthetic Data Generation in FinanceQuantUniversity
 
Qu speaker series:Ethical Use of AI in Financial Markets
Qu speaker series:Ethical Use of AI in Financial MarketsQu speaker series:Ethical Use of AI in Financial Markets
Qu speaker series:Ethical Use of AI in Financial MarketsQuantUniversity
 
Fintech in the Post-Covid Age
Fintech in the Post-Covid AgeFintech in the Post-Covid Age
Fintech in the Post-Covid AgeQuantUniversity
 
Ml master class northeastern university
Ml master class   northeastern universityMl master class   northeastern university
Ml master class northeastern universityQuantUniversity
 

More from QuantUniversity (20)

EU Artificial Intelligence Act 2024 passed !
EU Artificial Intelligence Act 2024 passed !EU Artificial Intelligence Act 2024 passed !
EU Artificial Intelligence Act 2024 passed !
 
Managing-the-Risks-of-LLMs-in-FS-Industry-Roundtable-TruEra-QuantU.pdf
Managing-the-Risks-of-LLMs-in-FS-Industry-Roundtable-TruEra-QuantU.pdfManaging-the-Risks-of-LLMs-in-FS-Industry-Roundtable-TruEra-QuantU.pdf
Managing-the-Risks-of-LLMs-in-FS-Industry-Roundtable-TruEra-QuantU.pdf
 
PYTHON AND DATA SCIENCE FOR INVESTMENT PROFESSIONALS
PYTHON AND DATA SCIENCE FOR INVESTMENT PROFESSIONALSPYTHON AND DATA SCIENCE FOR INVESTMENT PROFESSIONALS
PYTHON AND DATA SCIENCE FOR INVESTMENT PROFESSIONALS
 
Algorithmic auditing 1.0
Algorithmic auditing 1.0Algorithmic auditing 1.0
Algorithmic auditing 1.0
 
Towards Fairer Datasets: Filtering and Balancing the Distribution of the Peop...
Towards Fairer Datasets: Filtering and Balancing the Distribution of the Peop...Towards Fairer Datasets: Filtering and Balancing the Distribution of the Peop...
Towards Fairer Datasets: Filtering and Balancing the Distribution of the Peop...
 
Machine Learning: Considerations for Fairly and Transparently Expanding Acces...
Machine Learning: Considerations for Fairly and Transparently Expanding Acces...Machine Learning: Considerations for Fairly and Transparently Expanding Acces...
Machine Learning: Considerations for Fairly and Transparently Expanding Acces...
 
Seeing what a gan cannot generate: paper review
Seeing what a gan cannot generate: paper reviewSeeing what a gan cannot generate: paper review
Seeing what a gan cannot generate: paper review
 
AI Explainability and Model Risk Management
AI Explainability and Model Risk ManagementAI Explainability and Model Risk Management
AI Explainability and Model Risk Management
 
Algorithmic auditing 1.0
Algorithmic auditing 1.0Algorithmic auditing 1.0
Algorithmic auditing 1.0
 
Bayesian Portfolio Allocation
Bayesian Portfolio AllocationBayesian Portfolio Allocation
Bayesian Portfolio Allocation
 
The API Jungle
The API JungleThe API Jungle
The API Jungle
 
Explainable AI Workshop
Explainable AI WorkshopExplainable AI Workshop
Explainable AI Workshop
 
Constructing Private Asset Benchmarks
Constructing Private Asset BenchmarksConstructing Private Asset Benchmarks
Constructing Private Asset Benchmarks
 
Machine Learning Interpretability
Machine Learning InterpretabilityMachine Learning Interpretability
Machine Learning Interpretability
 
Responsible AI in Action
Responsible AI in ActionResponsible AI in Action
Responsible AI in Action
 
Qu speaker series 14: Synthetic Data Generation in Finance
Qu speaker series 14: Synthetic Data Generation in FinanceQu speaker series 14: Synthetic Data Generation in Finance
Qu speaker series 14: Synthetic Data Generation in Finance
 
Qwafafew meeting 5
Qwafafew meeting 5Qwafafew meeting 5
Qwafafew meeting 5
 
Qu speaker series:Ethical Use of AI in Financial Markets
Qu speaker series:Ethical Use of AI in Financial MarketsQu speaker series:Ethical Use of AI in Financial Markets
Qu speaker series:Ethical Use of AI in Financial Markets
 
Fintech in the Post-Covid Age
Fintech in the Post-Covid AgeFintech in the Post-Covid Age
Fintech in the Post-Covid Age
 
Ml master class northeastern university
Ml master class   northeastern universityMl master class   northeastern university
Ml master class northeastern university
 

Recently uploaded

Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改yuu sss
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...ssuserf63bd7
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...Amil Baba Dawood bangali
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.pptamreenkhanum0307
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 

Recently uploaded (20)

Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
专业一比一美国俄亥俄大学毕业证成绩单pdf电子版制作修改
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
NO1 Certified Black Magic Specialist Expert Amil baba in Lahore Islamabad Raw...
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Machine learning classification ppt.ppt
Machine learning classification  ppt.pptMachine learning classification  ppt.ppt
Machine learning classification ppt.ppt
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 

Synthetic data generation for machine learning

  • 1. Synthetic Data Generation for Machine Learning 2020 Copyright QuantUniversity LLC. Presented By: Sri Krishnamurthy, CFA, CAP Sri.Krishnamurthy@qusandbox.com www.quantuniversity.com 03/05/2020 Boston, MA
  • 2. 2 Speaker bio • Quant, Data Science & ML practitioner • Prior Experience at MathWorks, Citigroup and Endeca and 25+ financial services and energy customers. • Columnist for the Wilmott Magazine • Author of forthcoming book “Financial Modeling: A case study approach” published by Wiley • Teaches Data Science/AI at Northeastern University, Boston • Reviewer: Journal of Asset Management Sri Krishnamurthy Founder and CEO QuantUniversity
  • 3. 3 About QuantUniversity • Boston-based Data Science, Quant Finance and Machine Learning training and consulting advisory • Trained more than 1000 students in Quantitative methods, Data Science, ML and Big Data Technologies • Building a platform for operationalizing AI and Machine Learning in the Enterprise
  • 4. 4 1. Challenges with Real Datasets 2. Synthetic Dataset generation tools ▫ Proprietary ▫ Open Source – Faker – Data Synthesizer – SDV – Synthpop – GANs 3. Demos ▫ Data Synthesizer ▫ Sales Data Generator ▫ VIX Data Generator Agenda
  • 6. 6
  • 7. 7 • It may not be feasible to get samples for all categories • Lighting conditions • Modifications (Glasses/No glasses, Moustache/ No Moustache etc.) • Positions Coverage Challenges with real datasets
  • 8. 8 All scenarios haven’t played out • Stress scenarios • What-if scenarios Challenges with real datasets Figure ref: http://www.actuaries.org/CTTEES_SOLV/Documents/StressTestingPaper.pdf
  • 9. 9 Missing values • Missing at random • Missing sequences • Need data to fill frames Challenges with real datasets
  • 10. 10 • Access ▫ Hard to find ▫ Rare class problems ▫ Privacy concerns making it difficult to share Challenges with real datasets
  • 11. 11 Imbalanced • Need more samples of rare class • Need proxies for data points that were not observed or recorded Challenges with real datasets
  • 12. 12 Labels • Human labeling is hard • Synthetic label generators Challenges with real datasets
  • 13. Tools for Synthetic Data Generation
  • 14. 14 Proprietary Tools Company Core Technology Tonic.ai All-in-one platform for data anonymization, subsetting, and synthesis integrated with databases (hadoop, oracle, mysql, MS sql server, mongo db, amazon aurora/redshift, and google big query) - Uses Condenser and Masquerade Mostly.ai Tablular data using generative deep neural networks (no image data) CVEDIA - Sensor modeling and algorithm training - Handle image using SynCity as a custom pocket laboratory to generate highly entropic scenes, conditions, and metadata. Enable real-time Hardware-In-the-Loop (HWIL), Human-In-the-Loop (HITL) or Software-In-the-Loop (SIL) simulations even with complex sensor configurations Deep vision data image creation synthetic training data Synthesis.ai The data generation platform for computer vision
  • 22. 1. Loan Data Synthesizer 2. Sales Data Generator 3. Vix Data Generator
  • 23. 23
  • 24. 24 Demo 1 – Loan Data Synthesizer
  • 25. 25 Demo 2: Synthetic Sales data generation
  • 26. 26 Demo 3 : Synthetic VIX generation
  • 27. 27 If you want to be a part of QuSandbox private Beta Contact us: info@qusandbox
  • 28. 28 1. Model Governance in the Age of Data Science and AI ▫ GFMI Course, March 9th, 10th, New York, NY 2. Synthetic VIX data generation using deep learning techniques ▫ QWAFAFEW meeting - March 17th, 2020, Boston MA 3. Using synthetic data for ML in Finance ▫ 2nd Annual Machine Learning in Quantitative Finance – April 1st, 2020, New York, NY 4. Tackling the biggest limitations of ML ▫ 2nd Annual Machine Learning in Quantitative Finance – April 1st, 2020, New York, NY 5. Foundations of Machine learning and AI for Financial Professionals ▫ 8-week Online course offered in partnership with PRMIA – May 12th – June 30th, 2020, Online 6. A Master Class on AI and Machine Learning for Financial Professionals ▫ Invited session at the 73rd CFA Annual Conference – May 17th, 2020, Atlanta, GA Upcoming events by QuantUniversity
  • 29. Sri Krishnamurthy, CFA, CAP Founder and Chief Data Scientist sri@quantuniversity.com srikrishnamurthy www.QuantUniversity.com www.analyticscertificate.com www.qusandbox.com Information, data and drawings embodied in this presentation are strictly a property of QuantUniversity LLC. and shall not be distributed or used in any other publication without the prior written consent of QuantUniversity LLC. 29