“Data! Data! Data!
I Can’t Make Bricks Without Clay!”*
Shai Fine
Principal Engineer, Advanced Analytics, Intel
(*) Sherlock Holmes, The Adventure of the Copper Beeches
Big Data, Only a Few Years Back …
Executives Believe in Advanced Analytics
Analytics to the Rescue
• “Without big data analytics, companies are blind and deaf, wandering
out onto the web like deer on a freeway”
• Geoffrey Moore, Author of Crossing the Chasm
• … and who will lead the way?!
Big Data's High-Priests of Algorithms
The Wall Street Journal, Aug. 2014
Adoption of Analytics Faces Hurdles
• Developing Analytics solutions
• Far from being an engineering process
• There is a chasm to cross between “traditional” BI and Advanced Analytics
• Consumability of Analytics
• Deploying Analytics solutions is difficult
• Reliability, “Self Maintenance”
• Analytics Workloads are Challenging
• Speed (latency, time-to-solution), Throughput, Scalability, …
The ML Building Blocks Concept
There is an “infinite” number of algorithms and datasets
But there is a finite set of Building Blocks
Building Blocks:
A finite set of elements that can be mapped into HW and SW primitives and patterns
Building Blocks
Usages: Tier-1, Cloud, HPC, Enterprise, Academia
High-level Libraries
Low-level Libraries
Hardware Platforms: Xeon, Xeon Phi, Xeon FPGA, Iris Pro Graphics, Xeon Accel., New ISA
Machine Learning Building Blocks
• ML basic building blocks
1. Linear Algebra
2. Measures
3. Special Functions
4. Mathematical Optimization
5. Data Characteristics
6. Data-dependent Compute
7. Memory Access
8. Very large models
9. Hybrid Methods
• ML Meta building blocks
1. Learning Protocols
2. Learning Phases
3. Algorithmic Flow and Structure
Groupings: Compute, Data, Compute-Data Interplay, Process
Towards a Comprehensive ML Workload Suite
• Workload design should cover elements of
• Compute
• Data Characteristics
• Data – Compute interplay
• Each workload includes
• Multiple data sets x Multiple algorithms
• Coverage of relevant data characteristics
• Coverage of compute patterns
The Building Block concept provides a means for designing the ML Workload Suite
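The "multiple data sets x multiple algorithms" design can be sketched as a cross product whose coverage is then checked against the building blocks. This is a minimal illustration, not the suite's actual implementation: the `Dataset` and `Algorithm` classes, the tag names, and the two algorithms (k-means, PageRank) are hypothetical choices made for the example.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Dataset:
    name: str
    characteristics: frozenset  # data characteristics, e.g. {"dense", "numeric"}


@dataclass(frozen=True)
class Algorithm:
    name: str
    building_blocks: frozenset  # compute building blocks the algorithm exercises


def workload_runs(datasets, algorithms):
    """Return every (dataset, algorithm) pair: the data sets x algorithms grid."""
    return list(product(datasets, algorithms))


def coverage(runs):
    """Collect the data characteristics and compute blocks touched by the runs."""
    data_cov, compute_cov = set(), set()
    for dataset, algorithm in runs:
        data_cov |= dataset.characteristics
        compute_cov |= algorithm.building_blocks
    return data_cov, compute_cov


# Hypothetical workload entries (names and tags are assumptions for illustration).
datasets = [
    Dataset("clustered", frozenset({"dense", "numeric"})),
    Dataset("graphs", frozenset({"sparse", "numeric"})),
]
algorithms = [
    Algorithm("k-means", frozenset({"linear_algebra", "measures"})),
    Algorithm("pagerank", frozenset({"linear_algebra", "memory_access"})),
]

runs = workload_runs(datasets, algorithms)
data_cov, compute_cov = coverage(runs)
print(len(runs), sorted(data_cov), sorted(compute_cov))
```

Comparing `data_cov` and `compute_cov` against the full building-block list shows which characteristics or compute patterns a workload still leaves uncovered.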
Machine Learning Workloads Suite
ALGORITHMS
Columns: Workload | Linear Algebra | Measure Calc. | Special Funcs | Math Optim. | Data Characteristics | Data-dep. Compute | Mem. Access | Large Model
Linear Algebra (Sparse, Dense) | X X X | Un/Supervised, Numeric
Data Dependency | X X X | Un/Supervised, Num/Cat | X X
Large Models | X X X | Un/Supervised, Numeric | X
DATASETS
Workload | Dataset Type | Characteristics
Linear Algebra | Clustered | Dense, Numeric
Linear Algebra | Graphs | Sparse, Numeric
Data Dependency | Bioinformatics | High Dep - Dense/Sparse
Data Dependency | Clustered | Dense
Data Dependency | Text | High Dep - Sparse
Data Dependency | Manufacturing | High Dep - Numeric, Dense
Large Models | Images | Dense, Numeric
ML Bench 1.0
• Algorithm X Data
• Reference Models
• Data Generator
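The deck does not say how the ML Bench data generator works. As a rough sketch of the idea, a generator can produce synthetic datasets with controlled characteristics (dense clustered versus sparse numeric). The function names, parameters, and distributions below are assumptions for illustration only.

```python
import random


def clustered_dense(n_rows, n_cols, n_clusters, spread=0.5, seed=0):
    """Dense numeric rows drawn from Gaussians around random cluster centres."""
    rng = random.Random(seed)
    centres = [[rng.uniform(-10, 10) for _ in range(n_cols)]
               for _ in range(n_clusters)]
    rows = []
    for _ in range(n_rows):
        centre = rng.choice(centres)
        rows.append([rng.gauss(c, spread) for c in centre])
    return rows


def sparse_numeric(n_rows, n_cols, density, seed=0):
    """Sparse numeric rows stored as {column_index: value} dicts.

    Each cell is kept with probability `density`, so the observed share of
    non-zero cells is only approximately equal to `density`.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        row = {j: rng.uniform(0.1, 1.0)
               for j in range(n_cols) if rng.random() < density}
        rows.append(row)
    return rows


dense = clustered_dense(n_rows=100, n_cols=4, n_clusters=3)
sparse = sparse_numeric(n_rows=100, n_cols=50, density=0.1)
observed_density = sum(len(row) for row in sparse) / (100 * 50)
print(len(dense), len(sparse), round(observed_density, 3))
```

Fixing the seed keeps generated datasets reproducible, which matters when the same data is used to compare algorithms or platforms.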
The “Dwarfs” Connection
• Phil Colella’s “Seven Dwarfs” (2004) –
• Patterns of computation and communication that are important for science and engineering
• Berkeley’s view (2006) –
• Extended to 13 Dwarfs after examining the original 7 Dwarfs outside the HPC scope
• US National Research Council’s Committee, “Frontiers in Massive Data Analysis” (2013) –
• Chapter 10: “The Seven Computational Giants of Massive Data Analysis”
• The ML Building Blocks provide a further extension and a different perspective
• Introducing data characteristics and the interplay with compute, communication, memory
“Data! Data! Data!
I Can’t Make Bricks Without Clay!”
Thank You

Northern Engraving | Nameplate Manufacturing Process - 2024
Northern Engraving
 
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin..."$10 thousand per minute of downtime: architecture, queues, streaming and fin...
"$10 thousand per minute of downtime: architecture, queues, streaming and fin...
Fwdays
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
saastr
 

Data! Data! Data! I Can't Make Bricks Without Clay!

  • 1. “Data! Data! Data! I Can’t Make Bricks Without Clay!”* Shai Fine, Principal Engineer, Advanced Analytics, Intel. (*) Sherlock Holmes, The Adventure of the Copper Beeches
  • 2. Big Data, Only a Few Years Back …
  • 3. Executives Believe in Advanced Analytics
  • 4. Analytics to the Rescue
      • “Without big data analytics, companies are blind and deaf, wandering out onto the web like deer on a freeway” (Geoffrey Moore, author of Crossing the Chasm)
      • … and who will lead the way?! “Big Data's High-Priests of Algorithms”, The Wall Street Journal, Aug. 2014
  • 5. Adoption of Analytics Faces Hurdles
      • Developing Analytics solutions
          • Far from being an engineering process
          • There is a chasm to cross between “traditional” BI and Advanced Analytics
      • Consumability of Analytics
          • Deploying Analytics solutions is difficult
          • Reliability, “Self Maintenance”
      • Analytics Workloads are Challenging
          • Speed (latency, time-to-solution), Throughput, Scalability, …
  • 6. The ML Building Blocks Concept
      • There is an “infinite” number of algorithms and datasets, but there is a finite set of Building Blocks
      • Building Blocks: a finite set of elements that can be mapped into HW and SW primitives and patterns
      • [Diagram: Building Blocks shown alongside Usages (Tier-1, Cloud, HPC, Enterprise, Academia), High-level Libraries, Low-level Libraries, and Hardware Platforms (Xeon, Xeon Phi, Xeon FPGA, Iris Pro Graphics, Xeon Accel., New ISA)]
  • 7. Machine Learning Building Blocks
      • ML basic building blocks
          1. Linear Algebra
          2. Measures
          3. Special Functions
          4. Mathematical Optimization
          5. Data Characteristics
          6. Data-dependent Compute
          7. Memory Access
          8. Very large models
          9. Hybrid Methods
      • ML Meta building blocks
          1. Learning Protocols
          2. Learning Phases
          3. Algorithmic Flow and Structure
      • Group labels on the slide: Compute; Data; Compute - Data Interplay; Process
  • 8. Towards a Comprehensive ML Workload Suite
      • Workload design should cover elements of
          • Compute
          • Data Characteristics
          • Data - Compute interplay
      • Each workload includes
          • Multiple data sets x Multiple algorithms
          • Coverage of relevant data characteristics
          • Coverage of compute patterns
      • The Building Block concept provides a means for designing the ML Workload Suite
  • 9. Machine Learning Workloads Suite
      • ALGORITHMS (columns: Workload, Linear Algebra, Measure Calc., Special Funcs, Math Optim., Data Characteristics, Data-dep. Compute, Mem. Access, Large model; cells listed in slide order)
          • Linear Algebra: Sparse, Dense; X X X; Un/Supervised, Numeric
          • Data Dependency: X X X; Un/Supervised, Num/Cat; X X
          • Large Models: X X X; Un/Supervised, Numeric; X
      • DATASETS (columns: Workload, Dataset Type, Characteristics)
          • Linear Algebra: Clustered (Dense, Numeric); Graphs (Sparse, Numeric)
          • Data Dependency: Bioinformatics (High Dep - Dense/Sparse); Clustered (Dense); Text (High Dep - Sparse); Manufacturing (High Dep - Numeric, Dense)
          • Large Models: Images (Dense, Numeric)
  • 10. Machine Learning Workloads Suite (same ALGORITHMS and DATASETS tables as slide 9), with ML Bench 1.0:
      • Algorithm x Data
      • Reference Models
      • Data Generator
  • 11. The “Dwarfs” Connection
      • Phil Colella’s “Seven Dwarfs” (2004)
          • Patterns of computation and communication that are important for science and engineering
      • Berkeley’s view (2006)
          • Extended to 13 Dwarfs after examining the original 7 Dwarfs outside the HPC scope
      • US National Research Council’s Committee, “Frontiers in Massive Data Analysis” (2013)
          • Chapter 10: “The Seven Computational Giants of Massive Data Analysis”
      • The ML Building Blocks provide a further extension and a different perspective
          • Introducing data characteristics and the interplay with compute, communication, memory
  • 12. “Data! Data! Data! I Can’t Make Bricks Without Clay!”
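
The workload design on slide 8 (multiple data sets x multiple algorithms, with coverage of data characteristics and compute patterns) can be illustrated with a small sketch. Everything below is an assumption made for illustration: the dataset and algorithm names, their tags, and the helper functions are hypothetical and are not taken from ML Bench 1.0 or from the slides' tables.

```python
from itertools import product

# Hypothetical catalogues. Names and tags are illustrative only and
# loosely echo the vocabulary of slides 7-9.
DATASETS = {
    "clustered": {"dense", "numeric"},
    "graphs": {"sparse", "numeric"},
    "text": {"sparse", "high-dependency"},
}
ALGORITHMS = {
    "kmeans": {"linear-algebra", "measures"},
    "decision-tree": {"data-dependent-compute", "memory-access"},
}


def build_workload(datasets, algorithms):
    """Return every (algorithm, dataset) pair: the 'data sets x algorithms' grid."""
    return list(product(sorted(algorithms), sorted(datasets)))


def coverage(catalogue):
    """Union of all tags in a catalogue, i.e. what the catalogue covers."""
    return set().union(*catalogue.values())


def missing(required, catalogue):
    """Tags that are required but that no entry of the catalogue provides."""
    return set(required) - coverage(catalogue)


if __name__ == "__main__":
    for algorithm, dataset in build_workload(DATASETS, ALGORITHMS):
        print(f"{algorithm} on {dataset}")
    # Check data-characteristic coverage against a hypothetical requirement list.
    print("uncovered data characteristics:",
          missing({"dense", "sparse", "categorical"}, DATASETS))
```

A design like this keeps the two coverage checks (data characteristics and compute patterns) separate from the grid itself, so a gap in either catalogue can be spotted before any benchmark run is configured.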