SlideShare a Scribd company logo
1 of 50
Download to read offline
1
© 2015 The MathWorks, Inc.
Big Data and Machine Learning
Using MATLAB
Seth DeLand & Amit Doshi
MathWorks
2
Data Analytics
Turn large volumes of complex data into actionable information
source: Gartner
3
Customer Example: Gas Natural Fenosa
Energy Production Optimization
Opportunity
• Allocate demand among power plants to minimize
generation costs
Analytics Use
• Data: Central database for historical power consumption
and price data, weather forecasts, and parameters for each
power plant
• Machine Learning: Develop price simulation scenarios
• Optimization: minimize production cost
Benefit
• Reduced generation costs
• White-box solution for optimizing power generation
User Story
4
Prescriptive Analytics
Predictive Analytics
Unit Commitment
Predictive and Prescriptive Analytics
Schedule
Generator
Parameters
Unit
Commitment
Load Forecast
Historical
Weather Data
Historical
Load Data
5
Big Data Analytics Workflow
Integrate Analytics with
Systems
Desktop Apps
Enterprise Scale
Systems
Embedded Devices
and Hardware
Files
Databases
Sensors
Access and Explore
Data
Develop Predictive
Models
Model Creation e.g.
Machine Learning
Model
Validation
Parameter
Optimization
Preprocess Data
Working with
Messy Data
Data Reduction/
Transformation
Feature
Extraction
6
Example: Working with Big Data in MATLAB
▪ Objective: Create a model to predict the cost of a taxi ride in New York City
▪ Inputs:
– Monthly taxi ride log files
– The local data set is small (~20 MB)
– The full data set is big (~21 GB)
▪ Approach:
– Access Data
– Preprocess and explore data
– Develop and validate predictive model (linear fit)
▪ Work with subset of data for prototyping and then run on spark enabled hadoop with full data
– Integrate analytics into a webapp
7
Example: Working with Big Data in MATLAB
8
Demo: Taxi Fare Predictor Web App
9
Big Data Analytics Workflow: Data Access and Pre-process
Integrate Analytics with
Systems
Desktop Apps
Enterprise Scale
Systems
Embedded Devices
and Hardware
Files
Databases
Sensors
Access and Explore
Data
Develop Predictive
Models
Model Creation e.g.
Machine Learning
Model
Validation
Parameter
Optimization
Preprocess Data
Working with
Messy Data
Data Reduction/
Transformation
Feature
Extraction
10
Data Access and Pre-processing – Challenges
▪ Data aggregation
– Different sources (files, web, etc.)
– Different types (images, text, audio, etc.)
▪ Data clean up
– Poorly formatted files
– Irregularly sampled data
– Redundant data, outliers, missing data etc.
▪ Data specific processing
– Signals: Smoothing, resampling, denoising,
Wavelet transforms, etc.
– Images: Image registration, morphological
filtering, deblurring, etc.
▪ Dealing with out of memory data (big data)
Challenges
Data preparation accounts for about 80% of the work of data
scientists - Forbes
11
Data Analytics Workflow: Big Data Access and Pre-processing
12
Next: Access Big Data from MATLAB
▪ datastore
– Tabular text files
– Images
– Excel spreadsheets
– (SQL) Databases
– HDFS (Hadoop)
– S3 - Amazon
13
Get data in MATLAB
14
What if the data is saved in HDFS?
15
Or Data is stored in a Database
16
Data Access: Summary
C Java Fortran Python
Hardware
Software
Servers and Databases
▪ Repositories – SQL, NoSQL, etc.
▪ File I/O – Text, Spreadsheet, etc.
▪ Web Sources – RESTful, JSON, etc.
Business and Transactional Data
Engineering, Scientific and Field
Data
▪ Real-Time Sources – Sensors,
GPS, etc.
▪ File I/O – Image, Audio, etc.
▪ Communication Protocols – OPC
(OLE for Process Control), CAN
(Controller Area Network), etc.
17
Process data which doesn't fit into memory
Integrate Analytics with
Systems
Desktop Apps
Enterprise Scale
Systems
Embedded Devices
and Hardware
Files
Databases
Sensors
Access and Explore
Data
Develop Predictive
Models
Model Creation e.g.
Machine Learning
Model
Validation
Parameter
Optimization
Preprocess Data
Working with
Messy Data
Data Reduction/
Transformation
Feature
Extraction
18
Pre-processing Big Data
▪ New data type designed for data that doesn’t fit into memory
▪ Lots of observations (hence “tall”)
▪ Looks like a normal MATLAB array
– Supports numeric types, tables, datetimes, strings, etc…
– Supports several hundred functions for basic math, stats, indexing, etc.
– Statistics and Machine Learning Toolbox support
(clustering, classification, etc.)
tall arrays in
19
tall array
Single
Machine
Memory
tall arrays
▪ Automatically breaks data up into
small “chunks” that fit in memory
▪ Tall arrays scan through the
dataset one “chunk” at a time
▪ Processing code for tall arrays is
the same as ordinary arrays
Single
Machine
Memory
Process
20
tall array
Cluster of
Machines
Memory
Single
Machine
Memory
tall arrays
▪ With Parallel Computing Toolbox,
process several “chunks” at once
▪ Can scale up to clusters with
MATLAB Distributed Computing
Server
Single
Machine
Memory
Process
Single
Machine
Memory
Process
Single
Machine
Memory
Process
Single
Machine
Memory
Process
Single
Machine
Memory
Process
Single
Machine
Memory
Process
21
Demo: Working with Tall Arrays
22
Data Access and pre-processing – challenges and solution
▪ Data aggregation
– Different sources (files, web, etc.)
– Different types (images, text, audio, etc.)
▪ Data clean up
– Poorly formatted files
– Irregularly sampled data
– Redundant data, outliers, missing data etc.
▪ Data specific processing
– Signals: Smoothing, resampling, denoising,
Wavelet transforms, etc.
– Images: Image registration, morphological
filtering, deblurring, etc.
▪ Dealing with out of memory data (big data)
Challenges
▪ Point and click tools to access
variety of data sources
▪ High-performance environment
for big data
Files
Signals
Databases
Images
▪ Built-in algorithms for data
preprocessing including sensor,
image, audio, video and other
real-time data
MATLAB makes it easy to
work with business and
engineering data
1
23
Data Analytics Workflow: Develop Predictive Models using Big Data
Integrate Analytics with
Systems
Desktop Apps
Enterprise Scale
Systems
Embedded Devices
and Hardware
Files
Databases
Sensors
Access and Explore
Data
Develop Predictive
Models
Model Creation e.g.
Machine Learning
Model
Validation
Parameter
Optimization
Preprocess Data
Working with
Messy Data
Data Reduction/
Transformation
Feature
Extraction
24
Machine Learning
Machine learning uses data and produces a program to perform a task
Standard Approach Machine Learning Approach
𝑚𝑜𝑑𝑒𝑙 = <
𝑴𝒂𝒄𝒉𝒊𝒏𝒆
𝑳𝒆𝒂𝒓𝒏𝒊𝒏𝒈
𝑨𝒍𝒈𝒐𝒓𝒊𝒕𝒉𝒎
>(𝑠𝑒𝑛𝑠𝑜𝑟_𝑑𝑎𝑡𝑎, 𝑎𝑐𝑡𝑖𝑣𝑖𝑡𝑦)
Computer
Program
Machine
Learning
𝑚𝑜𝑑𝑒𝑙: Inputs → Outputs
Hand Written Program Formula or Equation
If X_acc > 0.5
then “SITTING”
If Y_acc < 4 and Z_acc > 5
then “STANDING”
…
𝑌𝑎𝑐𝑡𝑖𝑣𝑖𝑡𝑦
= 𝛽1𝑋𝑎𝑐𝑐 + 𝛽2𝑌𝑎𝑐𝑐
+ 𝛽3𝑍𝑎𝑐𝑐 +
…
Task: Human Activity Detection
25
Consider Machine/Deep Learning When
update as more data
becomes available
learn complex non-
linear relationships
learn efficiently from
very large data sets
Problem is too complex for hand written rules or equations
Speech Recognition Object Recognition Engine Health Monitoring
Program needs to adapt with changing data
Weather Forecasting Energy Load Forecasting Stock Market Prediction
Program needs to scale
IoT Analytics Taxi Availability Airline Flight Delays
Because algorithms can
26
Different Types of Learning
Machine
Learning
Supervised
Learning
Classification
Regression
Unsupervised
Learning
Clustering
Discover an internal
representation from
input data only
Develop predictive
model based on both
input and output data
Type of Learning Categories of Algorithms
• No output - find natural groups and
patterns from input data only
• Output is a real number
(temperature, stock prices)
• Output is a choice between classes
(True, False) (Red, Blue, Green)
27
Different Types of Learning
Machine
Learning
Supervised
Learning
Classification
Regression
Unsupervised
Learning
Clustering
Discover an internal
representation from
input data only
Develop predictive
model based on both
input and output data
Type of Learning Categories of Algorithms
Linear
Regression
GLM
Decision
Trees
Ensemble
Methods
Neural
Networks
SVR,
GPR
Nearest
Neighbor
Discriminant
Analysis
Naive Bayes
Support
Vector
Machines
kMeans, kmedoids
Fuzzy C-Means
Hierarchical
Neural
Networks
Gaussian
Mixture
Hidden Markov
Model
28
Machine Learning with Big Data
• Descriptive statistics (skewness,
tabulate, crosstab, cov, grpstats, …)
• K-means clustering (kmeans)
• Visualization (ksdensity, binScatterPlot;
histogram, histogram2)
• Dimensionality reduction (pca, pcacov,
factoran)
• Linear and generalized linear regression
(fitlm, fitglm)
• Discriminant analysis (fitcdiscr)
• Linear classification methods for SVM
and logistic regression (fitclinear)
• Random forest ensembles of
classification trees (TreeBagger)
• Naïve Bayes classification (fitcnb)
• Regularized regression (lasso)
• Prediction applied to tall arrays
29
Demo: Training a Machine Learning Model
30
Demo: Training a Machine Learning Model
31
Regression Learner
32
Regression Learner
App to apply advanced regression methods to your data
▪ Added to Statistics and Machine Learning
Toolbox in R2017a
▪ Point and click interface – no coding
required
▪ Quickly evaluate, compare and select
regression models
▪ Export and share MATLAB code or
trained models
33
Classification Learner
App to apply advanced classification methods to your data
▪ Added to Statistics and Machine Learning
Toolbox in R2015a
▪ Point and click interface – no coding
required
▪ Quickly evaluate, compare and select
classification models
▪ Export and share MATLAB code or
trained models
34
and Many More MATLAB Apps for Data Analytics
Distribution Fitting
System Identification
Signal Analysis
Wavelet Design and Analysis
Neural Net Fitting
Neural Net Pattern Recognition
Training Image Labeler
and many more…
35
Tuning Machine Learning Models
Get more accurate models in less time
Automatically select best
machine leaning “features”
NCA: Neighborhood Component Analysis
Select best “features”
to keep in model from
over 400 candidates
Automatically fine-tune
machine learning parameters
Hyperparameter Tuning
36
Machine Learning Hyperparameters
Hyperparameters
Tune a typical set of
hyperparameters for this model
Tune all
hyperparameters for this model
37
Bayesian Optimization in Action
38
Challenges
▪ Lack of data science expertise
▪ Feature Extraction – How to transform
data to best represent the system?
– Requires subject matter expertise
– No right way of designing features
▪ Feature Selection – What attributes or
subset of data to use?
– Entails a lot of iteration – Trial and error
– Difficult to evaluate features
▪ Model Development
– Many different models
– Model Validation and Tuning
▪ Time required to conduct the analysis
MATLAB enables
domain experts to
do Data Science
2
Apps Language
▪ Easy to use apps
▪ Wide breadth of tools to facilitate
domain specific analysis
▪ Examples/videos to get started
▪ Automatic MATLAB code
generation
▪ High speed processing of large
data sets
Big Data Analytics Workflow: Developing
Predictive models
39
Back to our example: Working with Big Data in MATLAB
▪ Objective: Create a model to predict the cost of a taxi ride in New York City
▪ Inputs:
– Monthly taxi ride log files
– The local data set is small (~20 MB)
– The full data set is big (~25 GB)
▪ Approach:
– Acecss Data
– Preprocess and explore data
– Develop and validate predictive model (linear fit)
▪ Work with subset of data for prototyping
▪ Scale to full data set on a cluster
40
Data Analytics Workflow: Develop Predictive Models using Big Data
Integrate Analytics with
Systems
Desktop Apps
Enterprise Scale
Systems
Embedded Devices
and Hardware
Files
Databases
Sensors
Access and Explore
Data
Develop Predictive
Models
Model Creation e.g.
Machine Learning
Model
Validation
Parameter
Optimization
Preprocess Data
Working with
Messy Data
Data Reduction/
Transformation
Feature
Extraction
41
Demo: Taxi Fare Predictor Web App
42
MATLAB Production Server
▪ Server software
– Manages packaged MATLAB
programs and worker pool
▪ MATLAB Runtime libraries
– Single server can use runtimes
from different releases
▪ RESTful JSON interface
▪ Lightweight client libraries
– C/C++, .NET, Python, and Java
MATLAB Production Server
MATLAB
Runtime
Request Broker
&
Program
Manager
Applications/
Database
Servers RESTful
JSON
Enterprise
Application
MPS Client
Library
43
Integrate analytics with systems
MATLAB
Runtime
C, C++ HDL PLC
Embedded Hardware
C/C++ ++
Excel
Add-in Java
Hadoop/
Spark
.NET
MATLAB
Production
Server
Standalone
Application
Enterprise Systems
Python
MATLAB Analytics
run anywhere
3
44
YARN
Product Support for Spark
Web & Mobile
Applications
Enterprise
Applications DEVELOPMENT TOOLS
MATLAB
Compiler
MATLAB
Runtime
MATLAB Distributed
Computing Server
From MATLAB desktop:
• Access data from HDFS
• Run “tall” functions on
Spark/Hadoop using MDCS
Spark
Integrate with applications:
• Deploy MATLAB programs using “tall”
• Develop deployable applications for
Spark using MATLAB API for Spark
45
YARN : Data Operating System
Spark
Deployment Offerings
▪ Deploy “tall” programs
– Create Standalone Applications: MATLAB Compiler
▪ MATLAB API for Spark
– Create Standalone Applications: MATLAB Compiler
– Functionality beyond tall arrays
– For advanced programmers familiar with Spark
– Local install of Spark to run code in MATLAB
▪ Installed on same machine as MATLAB – single node, Linux
Standalone
Application
Edge Node
MATLAB
Runtime
MATLAB
Compiler
Program using tall
Program using
MATLAB API for Spark
Since the Standalone
must run on a Linux
Edge Node, you must
compile on Linux
46
Data Analytics Workflow
Integrate Analytics
with Systems
Desktop Apps
Enterprise Scale
Systems
Embedded Devices
and Hardware
Files
Databases
Sensors
Access and Explore
Data
Develop Predictive
Models
Model Creation e.g.
Machine Learning
Model
Validation
Parameter
Optimization
Preprocess Data
Working with
Messy Data
Data Reduction/
Transformation
Feature
Extraction
MATLAB Analytics work
with business and
engineering data
1 MATLAB enables
domain experts to do
Data Science
2 3
MATLAB Analytics
run anywhere
47
Resources to learn and get started mathworks.com/machine-learning
eBook
mathworks.com/big-data
48
MathWorks Services
▪ Consulting
– Integration
– Data analysis/visualization
– Unify workflows, models, data
▪ Training
– Classroom, online, on-site
– Data Processing, Visualization, Deployment, Parallel Computing
www.mathworks.com/services/consulting/
www.mathworks.com/services/training/
49
MathWorks Training Offerings
http://www.mathworks.com/services/training/
50
Speaker Details
Email:
seth.deland@mathworks.com
amit.doshi@mathworks.in
LinkedIn:
https://in.linkedin.com/in/amit-doshi
https://www.linkedin.com/in/seth-deland
Contact MathWorks India
Products/Training Enquiry Booth
Call: 080-6632-6000
Email: info@mathworks.in
Your feedback is valued.
Please complete the feedback form provided to you.

More Related Content

Similar to Big Data and Machine Learning Models

Dwdmunit1 a
Dwdmunit1 aDwdmunit1 a
Dwdmunit1 abhagathk
 
Data Science & Big Data - Theory.pdf
Data Science & Big Data - Theory.pdfData Science & Big Data - Theory.pdf
Data Science & Big Data - Theory.pdfRAKESHG79
 
Big data Intro - Presentation to OCHackerz Meetup Group
Big data Intro - Presentation to OCHackerz Meetup GroupBig data Intro - Presentation to OCHackerz Meetup Group
Big data Intro - Presentation to OCHackerz Meetup GroupSri Kanajan
 
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...Mihai Criveti
 
Understanding your Data - Data Analytics Lifecycle and Machine Learning
Understanding your Data - Data Analytics Lifecycle and Machine LearningUnderstanding your Data - Data Analytics Lifecycle and Machine Learning
Understanding your Data - Data Analytics Lifecycle and Machine LearningAbzetdin Adamov
 
Data Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps ApproachData Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps ApproachMihai Criveti
 
DATA preprocessing.pptx
DATA preprocessing.pptxDATA preprocessing.pptx
DATA preprocessing.pptxChandra Meena
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptxElsonPaul2
 
Master Program in Computer Science with specialization in Data Science
Master Program in Computer Science with specialization in Data ScienceMaster Program in Computer Science with specialization in Data Science
Master Program in Computer Science with specialization in Data ScienceOleksii Molchanovskyi
 
BDVe Webinar Series - Designing Big Data pipelines with Toreador (Ernesto Dam...
BDVe Webinar Series - Designing Big Data pipelines with Toreador (Ernesto Dam...BDVe Webinar Series - Designing Big Data pipelines with Toreador (Ernesto Dam...
BDVe Webinar Series - Designing Big Data pipelines with Toreador (Ernesto Dam...Big Data Value Association
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning Gopal Sakarkar
 

Similar to Big Data and Machine Learning Models (20)

unit 1 big data.pptx
unit 1 big data.pptxunit 1 big data.pptx
unit 1 big data.pptx
 
Big data.ppt
Big data.pptBig data.ppt
Big data.ppt
 
Lecture1
Lecture1Lecture1
Lecture1
 
DA_01_Intro.pptx
DA_01_Intro.pptxDA_01_Intro.pptx
DA_01_Intro.pptx
 
Dwdmunit1 a
Dwdmunit1 aDwdmunit1 a
Dwdmunit1 a
 
Data Science & Big Data - Theory.pdf
Data Science & Big Data - Theory.pdfData Science & Big Data - Theory.pdf
Data Science & Big Data - Theory.pdf
 
Big data Intro - Presentation to OCHackerz Meetup Group
Big data Intro - Presentation to OCHackerz Meetup GroupBig data Intro - Presentation to OCHackerz Meetup Group
Big data Intro - Presentation to OCHackerz Meetup Group
 
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
DevOps for Data Engineers - Automate Your Data Science Pipeline with Ansible,...
 
IT webinar 2016
IT webinar 2016IT webinar 2016
IT webinar 2016
 
Analytics&IoT
Analytics&IoTAnalytics&IoT
Analytics&IoT
 
Dw 07032018-dr pl pradhan
Dw 07032018-dr pl pradhanDw 07032018-dr pl pradhan
Dw 07032018-dr pl pradhan
 
Understanding your Data - Data Analytics Lifecycle and Machine Learning
Understanding your Data - Data Analytics Lifecycle and Machine LearningUnderstanding your Data - Data Analytics Lifecycle and Machine Learning
Understanding your Data - Data Analytics Lifecycle and Machine Learning
 
Data Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps ApproachData Science at Scale - The DevOps Approach
Data Science at Scale - The DevOps Approach
 
Introduction to data warehouse
Introduction to data warehouseIntroduction to data warehouse
Introduction to data warehouse
 
DATA preprocessing.pptx
DATA preprocessing.pptxDATA preprocessing.pptx
DATA preprocessing.pptx
 
Big Data Session 1.pptx
Big Data Session 1.pptxBig Data Session 1.pptx
Big Data Session 1.pptx
 
BigData
BigDataBigData
BigData
 
Master Program in Computer Science with specialization in Data Science
Master Program in Computer Science with specialization in Data ScienceMaster Program in Computer Science with specialization in Data Science
Master Program in Computer Science with specialization in Data Science
 
BDVe Webinar Series - Designing Big Data pipelines with Toreador (Ernesto Dam...
BDVe Webinar Series - Designing Big Data pipelines with Toreador (Ernesto Dam...BDVe Webinar Series - Designing Big Data pipelines with Toreador (Ernesto Dam...
BDVe Webinar Series - Designing Big Data pipelines with Toreador (Ernesto Dam...
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning
 

Recently uploaded

Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Service
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts ServiceCall Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Service
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Servicejennyeacort
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Data Warehouse , Data Cube Computation
Data Warehouse   , Data Cube ComputationData Warehouse   , Data Cube Computation
Data Warehouse , Data Cube Computationsit20ad004
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxFurkanTasci3
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Spark3's new memory model/management
Spark3's new memory model/managementSpark3's new memory model/management
Spark3's new memory model/managementakshesh doshi
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSAishani27
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Data Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health ClassificationData Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health ClassificationBoston Institute of Analytics
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...shivangimorya083
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 

Recently uploaded (20)

Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Service
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts ServiceCall Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Service
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Service
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Data Warehouse , Data Cube Computation
Data Warehouse   , Data Cube ComputationData Warehouse   , Data Cube Computation
Data Warehouse , Data Cube Computation
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptx
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Spark3's new memory model/management
Spark3's new memory model/managementSpark3's new memory model/management
Spark3's new memory model/management
 
Ukraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICSUkraine War presentation: KNOW THE BASICS
Ukraine War presentation: KNOW THE BASICS
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
 
Data Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health ClassificationData Science Project: Advancements in Fetal Health Classification
Data Science Project: Advancements in Fetal Health Classification
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 

Big Data and Machine Learning Models

  • 1. 1 © 2015 The MathWorks, Inc. Big Data and Machine Learning Using MATLAB Seth DeLand & Amit Doshi MathWorks
  • 2. 2 Data Analytics Turn large volumes of complex data into actionable information source: Gartner
  • 3. 3 Customer Example: Gas Natural Fenosa Energy Production Optimization Opportunity • Allocate demand among power plants to minimize generation costs Analytics Use • Data: Central database for historical power consumption and price data, weather forecasts, and parameters for each power plant • Machine Learning: Develop price simulation scenarios • Optimization: minimize production cost Benefit • Reduced generation costs • White-box solution for optimizing power generation User Story
  • 4. 4 Prescriptive Analytics Predictive Analytics Unit Commitment Predictive and Prescriptive Analytics Schedule Generator Parameters Unit Commitment Load Forecast Historical Weather Data Historical Load Data
  • 5. 5 Big Data Analytics Workflow Integrate Analytics with Systems Desktop Apps Enterprise Scale Systems Embedded Devices and Hardware Files Databases Sensors Access and Explore Data Develop Predictive Models Model Creation e.g. Machine Learning Model Validation Parameter Optimization Preprocess Data Working with Messy Data Data Reduction/ Transformation Feature Extraction
  • 6. 6 Example: Working with Big Data in MATLAB ▪ Objective: Create a model to predict the cost of a taxi ride in New York City ▪ Inputs: – Monthly taxi ride log files – The local data set is small (~20 MB) – The full data set is big (~21 GB) ▪ Approach: – Access Data – Preprocess and explore data – Develop and validate predictive model (linear fit) ▪ Work with subset of data for prototyping and then run on spark enabled hadoop with full data – Integrate analytics into a webapp
  • 7. 7 Example: Working with Big Data in MATLAB
  • 8. 8 Demo: Taxi Fare Predictor Web App
  • 9. 9 Big Data Analytics Workflow: Data Access and Pre-process Integrate Analytics with Systems Desktop Apps Enterprise Scale Systems Embedded Devices and Hardware Files Databases Sensors Access and Explore Data Develop Predictive Models Model Creation e.g. Machine Learning Model Validation Parameter Optimization Preprocess Data Working with Messy Data Data Reduction/ Transformation Feature Extraction
  • 10. 10 Data Access and Pre-processing – Challenges ▪ Data aggregation – Different sources (files, web, etc.) – Different types (images, text, audio, etc.) ▪ Data clean up – Poorly formatted files – Irregularly sampled data – Redundant data, outliers, missing data etc. ▪ Data specific processing – Signals: Smoothing, resampling, denoising, Wavelet transforms, etc. – Images: Image registration, morphological filtering, deblurring, etc. ▪ Dealing with out of memory data (big data) Challenges Data preparation accounts for about 80% of the work of data scientists - Forbes
  • 11. 11 Data Analytics Workflow: Big Data Access and Pre-processing
  • 12. 12 Next: Access Big Data from MATLAB ▪ datastore – Tabular text files – Images – Excel spreadsheets – (SQL) Databases – HDFS (Hadoop) – S3 - Amazon
  • 13. 13 Get data in MATLAB
  • 14. 14 What if the data is saved in HDFS?
  • 15. 15 Or Data is stored in a Database
  • 16. 16 Data Access: Summary C Java Fortran Python Hardware Software Servers and Databases ▪ Repositories – SQL, NoSQL, etc. ▪ File I/O – Text, Spreadsheet, etc. ▪ Web Sources – RESTful, JSON, etc. Business and Transactional Data Engineering, Scientific and Field Data ▪ Real-Time Sources – Sensors, GPS, etc. ▪ File I/O – Image, Audio, etc. ▪ Communication Protocols – OPC (OLE for Process Control), CAN (Controller Area Network), etc.
  • 17. 17 Process data which doesn't fit into memory Integrate Analytics with Systems Desktop Apps Enterprise Scale Systems Embedded Devices and Hardware Files Databases Sensors Access and Explore Data Develop Predictive Models Model Creation e.g. Machine Learning Model Validation Parameter Optimization Preprocess Data Working with Messy Data Data Reduction/ Transformation Feature Extraction
  • 18. 18 Pre-processing Big Data ▪ New data type designed for data that doesn’t fit into memory ▪ Lots of observations (hence “tall”) ▪ Looks like a normal MATLAB array – Supports numeric types, tables, datetimes, strings, etc… – Supports several hundred functions for basic math, stats, indexing, etc. – Statistics and Machine Learning Toolbox support (clustering, classification, etc.) tall arrays in
  • 19. 19 tall array Single Machine Memory tall arrays ▪ Automatically breaks data up into small “chunks” that fit in memory ▪ Tall arrays scan through the dataset one “chunk” at a time ▪ Processing code for tall arrays is the same as ordinary arrays Single Machine Memory Process
  • 20. 20 tall array Cluster of Machines Memory Single Machine Memory tall arrays ▪ With Parallel Computing Toolbox, process several “chunks” at once ▪ Can scale up to clusters with MATLAB Distributed Computing Server Single Machine Memory Process Single Machine Memory Process Single Machine Memory Process Single Machine Memory Process Single Machine Memory Process Single Machine Memory Process
  • 21. 21 Demo: Working with Tall Arrays
  • 22. 22 Data Access and pre-processing – challenges and solution ▪ Data aggregation – Different sources (files, web, etc.) – Different types (images, text, audio, etc.) ▪ Data clean up – Poorly formatted files – Irregularly sampled data – Redundant data, outliers, missing data etc. ▪ Data specific processing – Signals: Smoothing, resampling, denoising, Wavelet transforms, etc. – Images: Image registration, morphological filtering, deblurring, etc. ▪ Dealing with out of memory data (big data) Challenges ▪ Point and click tools to access variety of data sources ▪ High-performance environment for big data Files Signals Databases Images ▪ Built-in algorithms for data preprocessing including sensor, image, audio, video and other real-time data MATLAB makes it easy to work with business and engineering data 1
  • 23. 23 Data Analytics Workflow: Develop Predictive Models using Big Data Integrate Analytics with Systems Desktop Apps Enterprise Scale Systems Embedded Devices and Hardware Files Databases Sensors Access and Explore Data Develop Predictive Models Model Creation e.g. Machine Learning Model Validation Parameter Optimization Preprocess Data Working with Messy Data Data Reduction/ Transformation Feature Extraction
  • 24. 24 Machine Learning Machine learning uses data and produces a program to perform a task Standard Approach Machine Learning Approach 𝑚𝑜𝑑𝑒𝑙 = < 𝑴𝒂𝒄𝒉𝒊𝒏𝒆 𝑳𝒆𝒂𝒓𝒏𝒊𝒏𝒈 𝑨𝒍𝒈𝒐𝒓𝒊𝒕𝒉𝒎 >(𝑠𝑒𝑛𝑠𝑜𝑟_𝑑𝑎𝑡𝑎, 𝑎𝑐𝑡𝑖𝑣𝑖𝑡𝑦) Computer Program Machine Learning 𝑚𝑜𝑑𝑒𝑙: Inputs → Outputs Hand Written Program Formula or Equation If X_acc > 0.5 then “SITTING” If Y_acc < 4 and Z_acc > 5 then “STANDING” … 𝑌𝑎𝑐𝑡𝑖𝑣𝑖𝑡𝑦 = 𝛽1𝑋𝑎𝑐𝑐 + 𝛽2𝑌𝑎𝑐𝑐 + 𝛽3𝑍𝑎𝑐𝑐 + … Task: Human Activity Detection
  • 25. 25 Consider Machine/Deep Learning When update as more data becomes available learn complex non- linear relationships learn efficiently from very large data sets Problem is too complex for hand written rules or equations Speech Recognition Object Recognition Engine Health Monitoring Program needs to adapt with changing data Weather Forecasting Energy Load Forecasting Stock Market Prediction Program needs to scale IoT Analytics Taxi Availability Airline Flight Delays Because algorithms can
  • 26. 26 Different Types of Learning Machine Learning Supervised Learning Classification Regression Unsupervised Learning Clustering Discover an internal representation from input data only Develop predictive model based on both input and output data Type of Learning Categories of Algorithms • No output - find natural groups and patterns from input data only • Output is a real number (temperature, stock prices) • Output is a choice between classes (True, False) (Red, Blue, Green)
  • 27. 27 Different Types of Learning Machine Learning Supervised Learning Classification Regression Unsupervised Learning Clustering Discover an internal representation from input data only Develop predictive model based on both input and output data Type of Learning Categories of Algorithms Linear Regression GLM Decision Trees Ensemble Methods Neural Networks SVR, GPR Nearest Neighbor Discriminant Analysis Naive Bayes Support Vector Machines kMeans, kmedoids Fuzzy C-Means Hierarchical Neural Networks Gaussian Mixture Hidden Markov Model
  • 28. 28 Machine Learning with Big Data • Descriptive statistics (skewness, tabulate, crosstab, cov, grpstats, …) • K-means clustering (kmeans) • Visualization (ksdensity, binScatterPlot; histogram, histogram2) • Dimensionality reduction (pca, pcacov, factoran) • Linear and generalized linear regression (fitlm, fitglm) • Discriminant analysis (fitcdiscr) • Linear classification methods for SVM and logistic regression (fitclinear) • Random forest ensembles of classification trees (TreeBagger) • Naïve Bayes classification (fitcnb) • Regularized regression (lasso) • Prediction applied to tall arrays
  • 29. 29 Demo: Training a Machine Learning Model
  • 30. 30 Demo: Training a Machine Learning Model
  • 32. 32 Regression Learner App to apply advanced regression methods to your data ▪ Added to Statistics and Machine Learning Toolbox in R2017a ▪ Point and click interface – no coding required ▪ Quickly evaluate, compare and select regression models ▪ Export and share MATLAB code or trained models
  • 33. 33 Classification Learner App to apply advanced classification methods to your data ▪ Added to Statistics and Machine Learning Toolbox in R2015a ▪ Point and click interface – no coding required ▪ Quickly evaluate, compare and select classification models ▪ Export and share MATLAB code or trained models
  • 34. 34 and Many More MATLAB Apps for Data Analytics Distribution Fitting System Identification Signal Analysis Wavelet Design and Analysis Neural Net Fitting Neural Net Pattern Recognition Training Image Labeler and many more…
  • 35. 35 Tuning Machine Learning Models Get more accurate models in less time Automatically select best machine leaning “features” NCA: Neighborhood Component Analysis Select best “features” to keep in model from over 400 candidates Automatically fine-tune machine learning parameters Hyperparameter Tuning
  • 36. 36 Machine Learning Hyperparameters Hyperparameters Tune a typical set of hyperparameters for this model Tune all hyperparameters for this model
  • 38. 38 Challenges ▪ Lack of data science expertise ▪ Feature Extraction – How to transform data to best represent the system? – Requires subject matter expertise – No right way of designing features ▪ Feature Selection – What attributes or subset of data to use? – Entails a lot of iteration – Trial and error – Difficult to evaluate features ▪ Model Development – Many different models – Model Validation and Tuning ▪ Time required to conduct the analysis MATLAB enables domain experts to do Data Science 2 Apps Language ▪ Easy to use apps ▪ Wide breadth of tools to facilitate domain specific analysis ▪ Examples/videos to get started ▪ Automatic MATLAB code generation ▪ High speed processing of large data sets Big Data Analytics Workflow: Developing Predictive models
  • 39. 39 Back to our example: Working with Big Data in MATLAB ▪ Objective: Create a model to predict the cost of a taxi ride in New York City ▪ Inputs: – Monthly taxi ride log files – The local data set is small (~20 MB) – The full data set is big (~25 GB) ▪ Approach: – Acecss Data – Preprocess and explore data – Develop and validate predictive model (linear fit) ▪ Work with subset of data for prototyping ▪ Scale to full data set on a cluster
  • 40. 40 Data Analytics Workflow: Develop Predictive Models using Big Data Integrate Analytics with Systems Desktop Apps Enterprise Scale Systems Embedded Devices and Hardware Files Databases Sensors Access and Explore Data Develop Predictive Models Model Creation e.g. Machine Learning Model Validation Parameter Optimization Preprocess Data Working with Messy Data Data Reduction/ Transformation Feature Extraction
  • 41. 41 Demo: Taxi Fare Predictor Web App
  • 42. 42 MATLAB Production Server ▪ Server software – Manages packaged MATLAB programs and worker pool ▪ MATLAB Runtime libraries – Single server can use runtimes from different releases ▪ RESTful JSON interface ▪ Lightweight client libraries – C/C++, .NET, Python, and Java MATLAB Production Server MATLAB Runtime Request Broker & Program Manager Applications/ Database Servers RESTful JSON Enterprise Application MPS Client Library
  • 43. 43 Integrate analytics with systems MATLAB Runtime C, C++ HDL PLC Embedded Hardware C/C++ ++ Excel Add-in Java Hadoop/ Spark .NET MATLAB Production Server Standalone Application Enterprise Systems Python MATLAB Analytics run anywhere 3
  • 44. 44 YARN Product Support for Spark Web & Mobile Applications Enterprise Applications DEVELOPMENT TOOLS MATLAB Compiler MATLAB Runtime MATLAB Distributed Computing Server From MATLAB desktop: • Access data from HDFS • Run “tall” functions on Spark/Hadoop using MDCS Spark Integrate with applications: • Deploy MATLAB programs using “tall” • Develop deployable applications for Spark using MATLAB API for Spark
  • 45. 45 YARN : Data Operating System Spark Deployment Offerings ▪ Deploy “tall” programs – Create Standalone Applications: MATLAB Compiler ▪ MATLAB API for Spark – Create Standalone Applications: MATLAB Compiler – Functionality beyond tall arrays – For advanced programmers familiar with Spark – Local install of Spark to run code in MATLAB ▪ Installed on same machine as MATLAB – single node, Linux Standalone Application Edge Node MATLAB Runtime MATLAB Compiler Program using tall Program using MATLAB API for Spark Since the Standalone must run on a Linux Edge Node, you must compile on Linux
  • 46. 46 Data Analytics Workflow Integrate Analytics with Systems Desktop Apps Enterprise Scale Systems Embedded Devices and Hardware Files Databases Sensors Access and Explore Data Develop Predictive Models Model Creation e.g. Machine Learning Model Validation Parameter Optimization Preprocess Data Working with Messy Data Data Reduction/ Transformation Feature Extraction MATLAB Analytics work with business and engineering data 1 MATLAB enables domain experts to do Data Science 2 3 MATLAB Analytics run anywhere
  • 47. 47 Resources to learn and get started mathworks.com/machine-learning eBook mathworks.com/big-data
  • 48. 48 MathWorks Services ▪ Consulting – Integration – Data analysis/visualization – Unify workflows, models, data ▪ Training – Classroom, online, on-site – Data Processing, Visualization, Deployment, Parallel Computing www.mathworks.com/services/consulting/ www.mathworks.com/services/training/
  • 50. 50 Speaker Details Email: seth.deland@mathworks.com amit.doshi@mathworks.in LinkedIn: https://in.linkedin.com/in/amit-doshi https://www.linkedin.com/in/seth-deland Contact MathWorks India Products/Training Enquiry Booth Call: 080-6632-6000 Email: info@mathworks.in Your feedback is valued. Please complete the feedback form provided to you.