ThinkFast: Scaling Machine Learning to Modern Demands
Hristo Paskov
The Genomic Data Deluge
• Precision Medicine Initiative: sequence 1,000,000 genomes
– $215 million in 2015
– Pilot study
– Outputs 10-50 GB/person
How do we analyze all of this data to drive progress?
Massive Data Sources
News
eCommerce
Bioinformatics
100K Genomes
Social Media
The Analysis Refinement Cycle
Data
Model: $\frac{1}{2}\|y - Xw\|_2^2 + \frac{\lambda}{2}\|w\|_2^2$
Solver: $x^{+} = x - \alpha M \nabla f(x)$
Model captures data nuance?
Solver exists, is fast enough?
Yes? Proceed
No? Quit
Increase time, money, experience, resources
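The model and solver on this slide can be tied together in a small sketch. It assumes the ridge objective shown as the model, takes the preconditioner M to be the identity (an assumption, the slide does not say what M is), and uses a hypothetical three-row dataset. Plain Python lists keep it dependency-free.

```python
def ridge_gradient(X, y, w, lam):
    """Gradient of 0.5*||y - Xw||^2 + 0.5*lam*||w||^2 with respect to w."""
    residual = [yi - sum(xij * wj for xij, wj in zip(row, w))
                for row, yi in zip(X, y)]
    grad = []
    for j in range(len(w)):
        data_term = -sum(row[j] * r for row, r in zip(X, residual))
        grad.append(data_term + lam * w[j])
    return grad


def solve(X, y, lam, alpha=0.1, steps=200):
    """Iterate w+ = w - alpha * M * grad f(w), with M assumed to be the identity."""
    w = [0.0] * len(X[0])
    for _ in range(steps):
        g = ridge_gradient(X, y, w, lam)
        w = [wj - alpha * gj for wj, gj in zip(w, g)]
    return w


# Hypothetical toy data: 3 samples, 2 features.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 3.0]
w = solve(X, y, lam=1.0)
print(w)
```

For this toy problem the iterates approach the closed-form ridge solution of (XᵀX + λI)w = Xᵀy, which is w = (7/8, 11/8). Changing the model means rewriting `ridge_gradient`, and possibly the solver, which is the refinement cost the slide points at.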
More Than Just Training Models
• Regularization paths
• Model risk assessment
• Interpretability
[Plot: model coefficient vs. regularization parameter]
Brief History of Statistical Learning
Interpretability & Statistical Guarantees
Scalability
Ease of Use
Simple Models
Kernel Methods
Trees & Ensembles
Structured Regularization
Structured Regularization
Losses
Regression
Classification
Ranking
Motif Finding
Matrix Factorization
Feature Embedding
Data Imputation
…
Regularizers
Sparsity
Spatial / Temporal / Manifold Structure
Group Structure
Hierarchical Structure
Structured & Unstructured
Multitask Learning
…
$\min_{\beta \in \mathbb{R}^d} L(X\beta) + \lambda R(\beta)$
The Lasso’s Combinatorial Side
$\min_{\beta \in \mathbb{R}^d} L(y - X\beta) + \lambda \|\beta\|_1$
[Plot: model coefficient vs. $\lambda$]
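The combinatorial side of the lasso shows up in how the set of nonzero coefficients changes with λ. The sketch below is a simplified illustration, not the general case on the slide: it assumes the squared loss L(u) = ½‖u‖² and a design with orthonormal columns, where each coefficient has the closed form of soft-thresholding zⱼ = (Xᵀy)ⱼ at level λ. The values in `z` are hypothetical.

```python
def soft_threshold(z, lam):
    """Closed-form lasso coefficient for one feature under an orthonormal design."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


# Hypothetical correlations z = X^T y between each feature and the labels.
z = [3.0, -1.5, 0.5]

support_sizes = []
for lam in [4.0, 2.0, 1.0, 0.25]:
    beta = [soft_threshold(zj, lam) for zj in z]
    # The support is the discrete object that changes along the path.
    support = [j for j, bj in enumerate(beta) if bj != 0.0]
    support_sizes.append(len(support))
    print(lam, beta, support)
```

As λ decreases, features enter the model one at a time, so the regularization path is piecewise linear in λ with a discrete set of breakpoints where the active set changes.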
The Database Perspective
$\min_{\beta \in \mathbb{R}^d} L(y - X\beta) + \lambda \|\beta\|_1$
$-X^T \partial_{y - X\beta} L(y - X\beta) + \lambda \partial_\beta \|\beta\|_1$
Feature & label storage
Data access operations
$u = y - X\beta$
$v = \partial_u L(u)$
$w = X^T v$
ML “Query Language”
$\min_{\beta \in \mathbb{R}^d} L(y - X\beta) + \lambda \|\beta\|_1$
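The three data access operations can be written as separate functions, which makes it visible that only two of them touch the stored features. This is a minimal sketch: the function names are hypothetical, and the loss is assumed to be L(u) = ½‖u‖² so that its derivative is simply u.

```python
def residual(X, y, beta):
    """u = y - X beta: one pass over the rows of the feature matrix."""
    return [yi - sum(xij * bj for xij, bj in zip(row, beta))
            for row, yi in zip(X, y)]


def squared_loss_derivative(u):
    """v = dL/du for the assumed loss L(u) = 0.5 * ||u||^2, so v = u."""
    return list(u)


def transpose_apply(X, v):
    """w = X^T v: one pass over the columns of the feature matrix."""
    return [sum(row[j] * vi for row, vi in zip(X, v))
            for j in range(len(X[0]))]


# Hypothetical toy data.
X = [[1.0, 2.0], [3.0, 4.0]]
y = [1.0, 1.0]
beta = [0.0, 0.0]

u = residual(X, y, beta)
v = squared_loss_derivative(u)
w = transpose_apply(X, v)

# Smooth part of the subgradient, -X^T dL(y - X beta); the l1 term is handled separately.
smooth_part = [-wj for wj in w]
print(u, v, w, smooth_part)
```

Swapping the loss only changes the middle function. The two operations that read X stay the same, which is what makes the storage layer reusable across models.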
The Database Perspective
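$\min_{\beta_1, \beta_2, \beta_3 \in \mathbb{R}^d} \sum_{t=1}^{3} \left[ L_t(y_t - X_t \beta_t) + \lambda_t R_t(\beta_t) \right] + \omega \left\| [\beta_1\ \beta_2\ \beta_3] \right\|_*$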
The Database Perspective
Feature, label and model storage
Data access operations
$u = y - X\beta$
$v = \partial_u L(u)$
$w = X^T v$
ML “Query Language”
$\min_{\beta \in \mathbb{R}^d} L(y - X\beta) + \lambda \|\beta\|_1$
[Diagram labels: $M_1$, $M_2$, $M_3$]
The Database Perspective
$u = y - X\beta$
$v = \partial_u L(u)$
$w = X^T v$
$\min_{\beta \in \mathbb{R}^d} L(y - X\beta) + \lambda \|\beta\|_1$
[Diagram labels: $M_1$, $M_2$, $M_3$]
Processing
Memory
Mathematical Structure
Efficient Feature Storage
“Query Language” Optimization
• Static analysis
$\|y - Xw\|_2^2 + \|w\|_2^2$
$\|y - Xw\|_2^2 + \|w\|_1$
?
$\|y - Xw\|_2^2 + \frac{1}{2}\|w\|_2^2 + \|w\|_1$
?
$\varepsilon(y - Xw) + \frac{1}{2}\|w\|_2^2 + \|w\|_1$
• Runtime analysis
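One way to read the static-analysis examples: given an objective the system has not seen before, check whether it can be rewritten as one it already solves. As an illustration (an assumption about what such a rewrite could look like, not something the slides state), the elastic-net objective ‖y − Xw‖² + ½‖w‖² + ‖w‖₁ equals a lasso objective on augmented data, obtained by appending √½·I below X and zeros below y. The helper names are hypothetical.

```python
import math


def matvec(X, w):
    return [sum(a * b for a, b in zip(row, w)) for row in X]


def sq_norm(v):
    return sum(x * x for x in v)


def l1_norm(v):
    return sum(abs(x) for x in v)


def elastic_net_objective(X, y, w):
    r = [yi - pi for yi, pi in zip(y, matvec(X, w))]
    return sq_norm(r) + 0.5 * sq_norm(w) + l1_norm(w)


def lasso_objective(X, y, w):
    r = [yi - pi for yi, pi in zip(y, matvec(X, w))]
    return sq_norm(r) + l1_norm(w)


def augment(X, y):
    """Fold the 0.5*||w||^2 term into the data: stack sqrt(0.5)*I under X, zeros under y."""
    d = len(X[0])
    s = math.sqrt(0.5)
    identity_rows = [[s if i == j else 0.0 for j in range(d)] for i in range(d)]
    return [list(row) for row in X] + identity_rows, list(y) + [0.0] * d


# Hypothetical toy data and an arbitrary coefficient vector.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
y = [1.0, 0.0, -1.0]
w = [0.3, -0.7]

X_aug, y_aug = augment(X, y)
gap = abs(elastic_net_objective(X, y, w) - lasso_objective(X_aug, y_aug, w))
print(gap)
```

Because the two objectives agree for every w, a lasso solver applied to the augmented data also solves the elastic-net problem. A rewrite like this can be found by inspecting the objective alone, before any data is touched.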
Some Bioinformatics Applications
• Personalized medicine, Memorial Sloan Kettering Cancer Center
– 35% accuracy improvement over state-of-the-art
• Metagenomic binning and DNA quality assessment, Stanford School of Medicine
– Previously unsolved problem
• Toxicogenomic analysis, Stanford University
– Improved on state-of-the-art results
Upcoming
• Massive-scale character-level sentiment and text analysis on Amazon data
– Billions of features, hours to solve a model
– Efficient multitask learning
• Characterize the global limitations of learning word structure
– Devise provably more efficient regularizers for uncovering structure


Editor's Notes

  1. [Tons of data, show graph?] [Models are not good] [How do we quickly iterate with different models?] [Memory $$$]