Seattle Spark Meetup
Machine Learning Streams with Spark 1.0
Drew Minkin
Principal Program Manager, Ubix Labs
AGENDA
•  Machine Learning and Business Analytics
•  Streams and Real Time Analytics
•  Deep Dive into MLlib
Machine Learning and Business Analytics
Machine Learning is Not A Spectator Sport
Machine Learning and Data Science
http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
The Analytics Spectrum
http://halobi.com/wp-content/uploads/Blog-1-1024x600.png
[Figure: the analytics spectrum, with axes Reactive / Proactive and Production / Research, spanning Descriptive, Predictive and Prescriptive analytics. Labeled areas: Graph, Data Management, Simulation, Process Improvement, Content Delivery, Knowledge Management, Data Modeling, Visualization, Data Quality, Monitoring, Analysis, Optimization, Algorithms, Trialing, Statistics, Domain Expertise, Integration, Big Data, Collaboration]
Five Families of Algorithms
http://en.wikipedia.org/wiki/Wu_Xing
Association
Classification
Estimation
Forecasting
Clustering
Classification
http://akorra.com/2012/06/06/top-10-creatures-that-influenced-martial-arts/
Target a Discrete Answer – Yes/No
§  Find All Columns Driving its Value
§  Use model to score new records
§  Many Different Measures of Accuracy
§  Quick and Improving Iterations
§  Most Actionable Types of Models
Examples
§  Hospital Readmission
§  Equipment Failure
§  Likelihood to purchase
Credit Scoring Banding
Association and Sequencing
http://38.media.tumblr.com/tumblr_m81wcfIO3V1qmzwx0o1_1280.jpg
Examples
§  Collaborative Filtering
§  Identify cross-sell
§  Identify sequential, next-sale
§  Make purchase recommendations
§  Complex event associations
§  Transactions and items in
§  Rules, Sequences and Itemsets out
Recommender Systems
Forecasting and Time Series
http://akorra.com/2012/06/06/top-10-creatures-that-influenced-martial-arts/
•  Input of measure over time and related series
•  Predictions generated for short term trends
•  Based on cycles and events
Examples
§  Workforce Optimization
§  Timing Purchasing Decisions
§  Optimizing Maintenance Windows
§  Material Cost Planning
§  Equipment Usage Planning
Demand Sensing
Estimation and Regression
http://akorra.com/2012/06/06/top-10-creatures-that-influenced-martial-arts/
Predicting a Continuous Distribution
§  Many Different Measures of Accuracy
§  Quick and Improving Iterations
§  Most Actionable Types of Models
Examples
§  Length Of Stay Estimation
§  Customer Lifetime Value
Pricing Optimization
Clustering
http://akorra.com/2012/06/06/top-10-creatures-that-influenced-martial-arts/
§  Hard and Soft Groupings
§  Profiles of Subgroups
§  Likenesses and Differences
Examples
•  Marketing Campaigns
•  Reward Programs
•  Equipment Utilization
•  Process Improvement Analysis
Market Segmentation
Combining Algorithms in Harmony
http://en.wikipedia.org/wiki/Wu_Xing
Streams and Real Time Analytics
AGENDA
Streams and Real Time Analytics
•  The Challenges of Scaling Analytics
•  Classes of Analytics Complexity
•  Spark vs. Storm, etc.
•  Stream Paradigms and Spark
Will Business Run out of Modeling Opportunities?
The Approaching Crisis for Machine Learning
Hype vs. Reality in Scaling Data Science
http://www.kdnuggets.com/2013/04/poll-results-largest-dataset-analyzed-data-mined.html
2009 vs. 2014 Scaling Data Science
http://www.kdnuggets.com
Spectrum of Stream Based Analytics
http://www.cs.ucr.edu/~mueen/ppt/StreamInsigh%205%20SLIDE%20DEMO.pptx
[Figure: chart with axes Latency (Months, Days, Hours, Minutes, Seconds, 100 ms, < 1 ms) and Events/Sec (0, 10, 10^2, 10^3, 10^4, 10^5, 10^6). Labeled regions: Big Data, NoSQL, RDBMS, Business Monitoring, Machine Monitoring, Real Time Monitoring, Web Analytics, EDW Analytics, Operational Analytics]
Challenges of Stream Based Applications
http://www.cs.ucr.edu/~mueen/ppt/StreamInsigh%205%20SLIDE%20DEMO.pptx
•  Devices
•  Sensors
•  Web servers
•  Feeds
Complex Analytics & Mining
Challenges of Stream Based Applications
http://www.cs.ucr.edu/~mueen/ppt/StreamInsigh%205%20SLIDE%20DEMO.pptx
Hopping Windows
Tumbling Windows
Event Synchronization
Latency
Time Window Management
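
The difference between the two window types named above can be sketched in a few lines of plain Python. This is an illustration only, not Spark Streaming code: the function names, the integer timestamps, and the sample events are all assumptions made for the example. A tumbling window assigns each event to exactly one fixed-size bucket, while a hopping window of length `size` starts every `hop` units, so an event can land in several overlapping windows.

```python
from collections import defaultdict

def tumbling_windows(events, size):
    """Group (timestamp, value) events into fixed, non-overlapping windows."""
    windows = defaultdict(list)
    for ts, value in events:
        start = (ts // size) * size  # the single window containing ts
        windows[start].append(value)
    return dict(sorted(windows.items()))

def hopping_windows(events, size, hop):
    """Group events into windows of length `size` that start every `hop` units.

    With hop < size the windows overlap, so one event may appear in several.
    Window starts are assumed to be non-negative multiples of `hop`.
    """
    windows = defaultdict(list)
    for ts, value in events:
        start = (ts // hop) * hop  # latest window start at or before ts
        while start > ts - size and start >= 0:
            windows[start].append(value)
            start -= hop
    return dict(sorted(windows.items()))

# Hypothetical events: (timestamp in seconds, payload)
events = [(0, "a"), (3, "b"), (5, "c"), (9, "d"), (12, "e")]
tumbling = tumbling_windows(events, size=5)
hopping = hopping_windows(events, size=10, hop=5)
print(tumbling)
print(hopping)
```

In the tumbling case every event is counted once; in the hopping case events "c" and "d" are shared by two windows, which is what makes event synchronization and window management harder at scale.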
Deep Dive into MLlib
AGENDA
Deep Dive into MLlib
•  Architecture
•  Descriptive Analytics
•  Predictive Analytics
•  Prescriptive Analytics
MLlib Descriptive Analytics
http://halobi.com/wp-content/uploads/Blog-1-1024x600.png
[Figure: the Analytics Spectrum diagram (axes Reactive / Proactive and Production / Research), repeated as a section divider]
MLlib Descriptive Analytics - Data Types
http://halobi.com/wp-content/uploads/Blog-1-1024x600.png
Vectors
•  Dense
•  Sparse
Linear Algebra
•  CoordinateMatrix
•  DistributedMatrix
•  IndexedRow
•  IndexedRowMatrix
•  MatrixEntry
•  RowMatrix
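
To make the dense versus sparse distinction concrete, here is a minimal plain-Python sketch. It does not use MLlib itself (whose factory methods are `Vectors.dense` and `Vectors.sparse`); the helper names below are hypothetical. A dense vector stores every entry, while a sparse vector stores only its size plus the indices and values of the nonzero entries.

```python
def to_dense(size, indices, values):
    """Expand a sparse (size, indices, values) triple into a full list."""
    dense = [0.0] * size
    for index, value in zip(indices, values):
        dense[index] = value
    return dense

def to_sparse(dense):
    """Keep only the nonzero entries of a dense list."""
    indices = [i for i, value in enumerate(dense) if value != 0.0]
    values = [dense[i] for i in indices]
    return len(dense), indices, values

def sparse_dot(indices, values, dense):
    """Dot product that touches only the stored (nonzero) entries."""
    return sum(value * dense[i] for i, value in zip(indices, values))

dense_vector = [1.0, 0.0, 3.0]
size, indices, values = to_sparse(dense_vector)
print(size, indices, values)
print(sparse_dot(indices, values, [2.0, 5.0, 4.0]))
```

The sparse form pays off when most entries are zero, since both storage and operations such as the dot product scale with the number of nonzeros rather than the vector length.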
MLlib Descriptive Analytics – Summary Statistics
http://halobi.com/wp-content/uploads/Blog-1-1024x600.png
Sample size
Maximum value of each column
Sample mean vector
Minimum value of each column
Number of nonzero elements
Sample variance vector
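
The six column statistics listed above are easy to state precisely in code. The following is a single-machine sketch in plain Python, not the distributed MLlib implementation; the function name, dictionary keys, and sample rows are assumptions for illustration. The variance uses the sample (n - 1) denominator, so at least two rows are required.

```python
def column_summary(rows):
    """Column-wise summary statistics for a list of equal-length rows."""
    n = len(rows)
    columns = list(zip(*rows))  # transpose rows into columns
    mean = [sum(col) / n for col in columns]
    variance = [
        sum((x - m) ** 2 for x in col) / (n - 1)  # sample variance
        for col, m in zip(columns, mean)
    ]
    return {
        "count": n,
        "max": [max(col) for col in columns],
        "min": [min(col) for col in columns],
        "mean": mean,
        "variance": variance,
        "num_nonzeros": [sum(1 for x in col if x != 0) for col in columns],
    }

# Hypothetical data: three observations of two features
rows = [[1.0, 0.0], [3.0, 4.0], [5.0, 8.0]]
summary = column_summary(rows)
print(summary)
```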
MLlib Descriptive Analytics - SVD
http://public.lanl.gov/mewall/kluwer2002.html
Singular Value Decomposition
Can Collapse Sparse Matrices to Denser Forms
MLlib Descriptive Analytics – PCA
http://halobi.com/wp-content/uploads/Blog-1-1024x600.png
Principal Component Analysis
Reduces Dimensionality with Feature Extraction
MLlib Predictive Analytics
http://halobi.com/wp-content/uploads/Blog-1-1024x600.png
[Figure: the Analytics Spectrum diagram (axes Reactive / Proactive and Production / Research), repeated as a section divider]
MLlib Predictive Analytics – Bayesian Classifier
http://xkcd.com/1132/
MLlib Predictive Analytics – Logistic Regression
http://halobi.com/wp-content/uploads/Blog-1-1024x600.png
Granddaddy of Algorithms
Coefficients from states or exact values
Small scores can make big changes
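
One way to read the last point is through the logistic (sigmoid) function that turns a linear score into a probability. The sketch below is plain Python with hypothetical weights and features, not MLlib's trained model. Near a score of zero the curve is steep, so a small change in the score moves the predicted probability noticeably.

```python
import math

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(weights, intercept, features):
    """Logistic regression scoring: sigmoid of the linear combination."""
    score = intercept + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)

# Hypothetical coefficients and one record to score
weights = [0.5, -0.25]
probability = predict_probability(weights, 0.0, [2.0, 4.0])
print(probability)

# Moving the score from -1 to +1 shifts the probability substantially
print(sigmoid(-1.0), sigmoid(1.0))
```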
MLlib Predictive Analytics - SVM
http://www.youtube.com/watch?v=3liCbRZPrZA http://www.projectrho.com/public_html/rocket/fasterlight.php
Linear Support Vector Machine for classifiers
Behold the “kernel trick”
MLlib Predictive Analytics – Regression
http://halobi.com/wp-content/uploads/Blog-1-1024x600.png
Linear
Ridge
Least Absolute Shrinkage & Selection Operator (LASSO)
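
For reference, the three variants listed above differ only in the penalty added to the squared-error loss. The formulation below is the standard textbook one; the symbols (weight vector w, regularization strength λ ≥ 0, and n training pairs (x_i, y_i)) are introduced here for illustration and do not appear on the slide.

```latex
\begin{aligned}
\text{Linear (least squares):}\quad
  & \min_{w}\ \frac{1}{2n}\sum_{i=1}^{n}\left(w^{\top}x_i - y_i\right)^2 \\
\text{Ridge:}\quad
  & \min_{w}\ \frac{1}{2n}\sum_{i=1}^{n}\left(w^{\top}x_i - y_i\right)^2
    + \frac{\lambda}{2}\,\lVert w\rVert_2^2 \\
\text{LASSO:}\quad
  & \min_{w}\ \frac{1}{2n}\sum_{i=1}^{n}\left(w^{\top}x_i - y_i\right)^2
    + \lambda\,\lVert w\rVert_1
\end{aligned}
```

The L2 penalty shrinks all weights smoothly, while the L1 penalty tends to drive some weights exactly to zero, which is where the "selection" in LASSO comes from.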
MLlib Predictive Analytics – K-means
http://halobi.com/wp-content/uploads/Blog-1-1024x600.png
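
The slide carries no text beyond its title, so as a reminder of what k-means does, here is a minimal single-machine sketch of the classic Lloyd iteration in plain Python. It is not MLlib's distributed implementation; the function names, the fixed starting centers, and the sample points are assumptions chosen to keep the example deterministic.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centers, iterations=10):
    """Alternate between assigning points and recomputing centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its members
        new_centers = []
        for old, members in zip(centers, clusters):
            if members:
                new_centers.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centers.append(old)  # keep a center that lost all points
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers

# Hypothetical data: two obvious groups
points = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 9.0)]
centers = kmeans(points, centers=[(0.0, 0.0), (10.0, 10.0)])
print(centers)
```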
MLlib Predictive Analytics – Matrix Factorization
http://halobi.com/wp-content/uploads/Blog-1-1024x600.png
Collaborative Filtering
Alternating Least Squares (ALS)
Prescriptive Analytics
http://halobi.com/wp-content/uploads/Blog-1-1024x600.png
[Figure: the Analytics Spectrum diagram (axes Reactive / Proactive and Production / Research), repeated as a section divider]
MLlib Prescriptive Analytics – Gradient Descent
http://bleedingedgemachine.blogspot.com/2012/12/gradient-descent.html http://kungfupanda.wikia.com/wiki/Monkey
Linear and Nonlinear Optimization
Minimizes smooth functions without constraints
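
As a small illustration of unconstrained minimization of a smooth function, the sketch below runs plain (batch) gradient descent in Python. The objective, step size, and iteration count are hypothetical choices for the example, and this is not MLlib's optimizer, which works on distributed data.

```python
def gradient_descent(gradient, start, step_size=0.1, iterations=100):
    """Repeatedly step against the gradient from a starting point."""
    x = list(start)
    for _ in range(iterations):
        g = gradient(x)
        x = [xi - step_size * gi for xi, gi in zip(x, g)]
    return x

def grad_f(p):
    """Gradient of f(x, y) = (x - 3)^2 + (y + 1)^2, minimized at (3, -1)."""
    return [2 * (p[0] - 3), 2 * (p[1] + 1)]

minimum = gradient_descent(grad_f, [0.0, 0.0])
print(minimum)
```

With this step size the distance to the minimum shrinks by a constant factor each iteration; a step size that is too large would make the iterates diverge instead.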
MLlib Prescriptive Analytics – L-BFGS
http://graphics.utdallas.edu/sites/default/files/gpucvt.png
Limited-Memory BFGS
Nonlinear
Minimizes Smooth Functions
Constraint is Memory
Notes from the MLlib Streams Field
MLlib Predictive Analytics – K Nearest Neighbor
http://www.youtube.com/watch?v=3liCbRZPrZA http://www.projectrho.com/public_html/rocket/fasterlight.php
Variation for classifiers
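
The slide does not show how the classifier variation was built, so the following is only a generic single-machine sketch of k-nearest-neighbor classification in plain Python. All names and the training data are hypothetical, and nothing here is an MLlib API.

```python
from collections import Counter

def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def knn_classify(training, query, k=3):
    """Label a query point by majority vote among its k nearest neighbors.

    `training` is a list of (point, label) pairs.
    """
    neighbours = sorted(
        training, key=lambda item: squared_distance(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Hypothetical labeled points
training = [
    ((1.0, 1.0), "low"), ((1.0, 2.0), "low"), ((2.0, 1.0), "low"),
    ((8.0, 8.0), "high"), ((9.0, 8.0), "high"),
]
print(knn_classify(training, (2.0, 2.0)))
print(knn_classify(training, (8.0, 9.0)))
```

Because every query is compared against the whole training set, a distributed version has to decide how to partition or approximate that search, which is one reason the approach needs adapting for streams.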
MLlib – A Call to Action
http://www.fanpop.com/clubs/voltron/images/2172709/title/original-fanart http://adventuretime.wikia.com/wiki/Princess_Monster_Wife
Coming Soon
•  Decision Trees
•  Model Performance Tools
It Takes A Village
•  Time Series
•  Ensemble MLI

