Big Data & Machine Learning
12/24/2017
Agenda
• Why Machine Learning?
• What is Machine Learning?
• Some ML Applications
• Data Science Pipeline
• Data -> Big Data
• Big Data -> Feature Selection
• Machine Learning Modelling
• Model Evaluation
• Inference/Analytics
• Summary
Why Machine Learning?
What is Machine Learning?
Some ML Applications
Data Science Pipeline
Data -> Features -> Model -> Inference -> Decision
(Evaluation appears as a separate step in the diagram)
Data => Big Data
• Structured Data
• Unstructured Data
• Big Data
- Text Data
- Time Series Data
- Spatial/Location-based Data
- Image/Video/Audio Data
Big Data
Big Data => Feature Selection
• Simpler models that are easier to interpret
• Shorter training times
• Avoidance of the curse of dimensionality
• Better generalization by reducing overfitting (formally, a reduction of variance)
Big Data => Feature Selection Techniques
1. Subset selection
• Exhaustive
• Best first
• Simulated annealing
• Genetic algorithm
• Greedy forward selection
• Greedy backward elimination
• Particle swarm optimization
• Targeted projection pursuit
• Scatter search
• Variable neighborhood search
2. Optimality criteria
3. Structure learning
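Greedy forward selection, one of the subset selection techniques listed above, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a production method: the scoring function (residual sum of squares of a least-squares fit on the training data), the function names, and the synthetic data are all hypothetical choices made for the example. In practice a validation score is usually preferred over a training score.

```python
import numpy as np

def residual_sum_of_squares(X, y, columns):
    """Least-squares RSS of y on an intercept plus the chosen columns."""
    design = np.column_stack([np.ones(len(y))] + [X[:, c] for c in columns])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    return float(residuals @ residuals)

def greedy_forward_selection(X, y, n_features):
    """Add, one at a time, the feature that lowers the RSS the most."""
    selected = []
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < n_features:
        best = min(
            remaining,
            key=lambda c: residual_sum_of_squares(X, y, selected + [c]),
        )
        selected.append(best)
        remaining.remove(best)
    return selected

# Synthetic data (hypothetical): only columns 0 and 3 influence y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=0.1, size=200)

print(greedy_forward_selection(X, y, n_features=2))
```

Greedy backward elimination follows the same pattern in reverse: start from all features and repeatedly drop the one whose removal hurts the score least.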
Machine Learning Modelling
Linear vs Non-linear Models
Linear Modelling
• Response = constant + parameter * predictor + ... + parameter * predictor
• $Y = b_0 + b_1 X_1 + b_2 X_2 + \dots + b_k X_k$
• $Y = b_0 + b_1 X_1 + b_2 X_1^2$ (still a linear model, because it is linear in the parameters $b_0, b_1, b_2$)
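A model with a squared predictor can still be fitted with ordinary least squares, because the unknowns are the coefficients rather than the predictors. The sketch below assumes hypothetical noise-free data generated from Y = 1 + 2*X1 + 0.5*X1^2 and recovers the three coefficients; the variable names are illustrative only.

```python
import numpy as np

# Hypothetical noise-free data generated from Y = 1 + 2*X1 + 0.5*X1^2.
x1 = np.linspace(-3.0, 3.0, 25)
y = 1.0 + 2.0 * x1 + 0.5 * x1 ** 2

# Design matrix with columns [1, X1, X1^2]; the model is linear in b0, b1, b2.
design = np.column_stack([np.ones_like(x1), x1, x1 ** 2])
b, *_ = np.linalg.lstsq(design, y, rcond=None)

print(np.round(b, 6))
```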
Non-Linear Modelling
• Models that are not linear in their parameters ;)
Machine Learning Modelling
Deterministic vs Stochastic Models
Deterministic Modelling
Modelling uses only deterministic variables; uncertainty is not captured.
Stochastic Modelling
Models real-world uncertainty using random variables.
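The contrast can be made concrete with a toy growth model. This is a sketch under assumptions: the compound-growth setting, the function names, and all numbers are hypothetical. The deterministic version returns the same trajectory every time; the stochastic version treats the per-step rate as a normally distributed random variable, so repeated runs produce a distribution of outcomes.

```python
import numpy as np

def deterministic_growth(x0, rate, steps):
    """Same inputs always give the same trajectory."""
    return x0 * (1.0 + rate) ** np.arange(steps + 1)

def stochastic_growth(x0, rate, rate_sd, steps, rng):
    """The per-step rate is a random variable, so each run differs."""
    rates = rng.normal(rate, rate_sd, size=steps)
    return x0 * np.concatenate([[1.0], np.cumprod(1.0 + rates)])

rng = np.random.default_rng(42)
det = deterministic_growth(100.0, 0.05, 10)
runs = np.array(
    [stochastic_growth(100.0, 0.05, 0.02, 10, rng) for _ in range(1000)]
)

print("deterministic final value:", round(float(det[-1]), 2))
print("stochastic final value: mean", round(float(runs[:, -1].mean()), 2),
      "sd", round(float(runs[:, -1].std()), 2))
```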
Machine Learning Modelling
Parametric vs Non-parametric Models
Parametric Modelling
• Data is assumed to follow a particular probability distribution
• The number of parameters is fixed
• Focused on group means
Non-Parametric Modelling
• Does not assume a particular probability distribution
• The number of parameters grows with the number of training samples
• Focused on group medians
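The difference in parameter count can be illustrated with two small regressors. This is a sketch, assuming a straight-line fit as the parametric example and k-nearest-neighbours as the non-parametric one; both classes and the `n_stored_values` helper are hypothetical names introduced for illustration. The line always keeps two numbers, while the k-NN model keeps every training pair.

```python
import numpy as np

class LinearModel:
    """Parametric: always two parameters (slope, intercept)."""

    def fit(self, x, y):
        self.slope, self.intercept = np.polyfit(x, y, 1)
        return self

    def predict(self, x):
        return self.slope * np.asarray(x, float) + self.intercept

    def n_stored_values(self):
        return 2

class KNNRegressor:
    """Non-parametric: keeps every training pair."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, x, y):
        self.x, self.y = np.asarray(x, float), np.asarray(y, float)
        return self

    def predict(self, x):
        x = np.atleast_1d(np.asarray(x, float))
        dist = np.abs(x[:, None] - self.x[None, :])
        nearest = np.argsort(dist, axis=1)[:, :self.k]
        return self.y[nearest].mean(axis=1)

    def n_stored_values(self):
        return 2 * self.x.size

rng = np.random.default_rng(1)
for n in (10, 100, 1000):
    x = rng.uniform(0, 10, n)
    y = 2 * x + 1 + rng.normal(0, 0.5, n)
    lin, knn = LinearModel().fit(x, y), KNNRegressor().fit(x, y)
    print(n, lin.n_stored_values(), knn.n_stored_values())
```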
Stochastic ML Modelling
Frequentist vs Bayesian Models
Frequentist ML Modelling
Maximum Likelihood Estimation (MLE)
You need to model your random variables realistically:
- Discrete r.v., e.g. Bernoulli/Binomial/Geometric/Poisson
- Continuous r.v., e.g. Uniform/Exponential/Gamma/Normal (Gaussian)
Explained under Regression – Probabilistic Interpretation
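As a small worked case of MLE, consider a Bernoulli random variable. The sketch below uses hypothetical coin-flip data and compares the closed-form estimate (the sample mean) with a brute-force search over candidate values of p; the function name and the grid are assumptions made for the example.

```python
import numpy as np

def bernoulli_log_likelihood(p, data):
    """Log-likelihood of i.i.d. Bernoulli(p) observations (0s and 1s)."""
    data = np.asarray(data)
    return float(np.sum(data * np.log(p) + (1 - data) * np.log(1 - p)))

# Hypothetical coin-flip data: 7 successes in 10 trials.
flips = np.array([1, 0, 1, 1, 0, 1, 1, 1, 0, 1])

# Closed-form MLE for a Bernoulli parameter is the sample mean.
p_closed_form = flips.mean()

# Numerical check: maximize the log-likelihood over a grid of candidate p.
grid = np.linspace(0.01, 0.99, 99)
p_grid = grid[np.argmax([bernoulli_log_likelihood(p, flips) for p in grid])]

print(p_closed_form, round(float(p_grid), 2))
```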
Bayesian ML Modelling
A very powerful modelling approach
Prior knowledge is incorporated
Maximum a Posteriori (MAP) estimation
Classification of ML Techniques
Supervised Learning
• Given a training set of N example input–output pairs $(x_1, y_1), (x_2, y_2), \dots, (x_N, y_N)$, where each $y_j$ was generated by an unknown function $y = f(x)$,
• discover a function $h$ that approximates the true function $f$.
• Regression
When the output y is a number,
e.g. tomorrow’s temperature
• Classification
When the output y is one of a finite set of values,
e.g. sunny, cloudy or rainy
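The two kinds of output can be contrasted with a pair of tiny learned functions h. Everything in this sketch is hypothetical: the data, the single features used, and the choice of a least-squares line for regression and a nearest-centroid rule for classification. It is meant only to show that one h returns a number and the other returns a label from a finite set.

```python
import numpy as np

# Regression: the output is a number.
# Hypothetical data: today's temperature -> tomorrow's temperature.
today = np.array([10.0, 12.0, 15.0, 18.0, 20.0])
tomorrow = np.array([11.0, 12.5, 15.5, 17.0, 21.0])
slope, intercept = np.polyfit(today, tomorrow, 1)

def h_regression(x):
    """Learned approximation of f for the regression task."""
    return slope * x + intercept

# Classification: the output is one of a finite set of labels.
# Hypothetical feature: cloud cover as a fraction in [0, 1].
cover = np.array([0.05, 0.10, 0.50, 0.60, 0.90, 0.95])
labels = np.array(["sunny", "sunny", "cloudy", "cloudy", "rainy", "rainy"])
centroids = {str(c): cover[labels == c].mean() for c in np.unique(labels)}

def h_classification(x):
    """Nearest-centroid rule: pick the label whose mean feature is closest."""
    return min(centroids, key=lambda c: abs(centroids[c] - x))

print(float(h_regression(16.0)), h_classification(0.55))
```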
Regression
Regression – Optimization Approach
Regression – Probabilistic Interpretation
Maximum Likelihood function
Instead of maximizing L(θ), we can maximize any strictly increasing function of L(θ), such as log L(θ); the maximizing θ is the same.
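The usual choice of increasing function is the logarithm. The derivation below is a sketch of the standard argument, assuming a linear model with independent Gaussian noise of variance $\sigma^2$; the symbols $x_i$, $y_i$, $\theta$ and $\sigma$ are assumptions, since the formulas from the original slides did not survive extraction. It shows that maximizing the likelihood is then equivalent to minimizing the sum of squared errors.

```latex
\begin{align*}
L(\theta) &= \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma}
  \exp\!\left(-\frac{\bigl(y_i - \theta^{\top} x_i\bigr)^2}{2\sigma^2}\right) \\
\ell(\theta) = \log L(\theta) &= N \log\frac{1}{\sqrt{2\pi}\,\sigma}
  - \frac{1}{2\sigma^2} \sum_{i=1}^{N} \bigl(y_i - \theta^{\top} x_i\bigr)^2 \\
\arg\max_{\theta} L(\theta) &= \arg\max_{\theta} \ell(\theta)
  = \arg\min_{\theta} \sum_{i=1}^{N} \bigl(y_i - \theta^{\top} x_i\bigr)^2
\end{align*}
```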
Classification
• Logistic Regression
e.g. binary classification, y ∈ {0, 1}
Hypothesis representation
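The hypothesis formula itself did not survive extraction. The sketch below assumes the standard logistic regression form, in which the hypothesis is the sigmoid of a linear combination of the features and is read as the estimated probability that y = 1. The parameter values and function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, mapping any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def hypothesis(theta, x):
    """Estimated probability that y = 1 given features x."""
    return sigmoid(np.dot(theta, x))

def predict(theta, x, threshold=0.5):
    """Binary decision: 1 if the estimated probability reaches the threshold."""
    return int(hypothesis(theta, x) >= threshold)

# Hypothetical parameters; x[0] = 1 is the intercept term.
theta = np.array([-4.0, 1.0])
print(hypothesis(theta, np.array([1.0, 4.0])))  # z = 0, so probability 0.5
print(predict(theta, np.array([1.0, 6.0])))     # z = 2 > 0, so class 1
print(predict(theta, np.array([1.0, 1.0])))     # z = -3 < 0, so class 0
```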
Unsupervised Learning
Model Evaluation – Bias-Variance Tradeoff
Forecast Error = In-Sample Error + Model Instability + Random Error
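One common reading of this decomposition maps in-sample error to squared bias, model instability to variance, and random error to irreducible noise. That mapping is an interpretation, not something stated on the slide. Under that assumption, the sketch below estimates the terms by simulation: it fits polynomials of a given degree to many hypothetical noisy datasets drawn from a known function and measures how the predictions behave across datasets. The true function, noise level, and degrees are all illustrative choices.

```python
import numpy as np

def true_f(x):
    """Hypothetical ground-truth function generating the data."""
    return np.sin(2 * np.pi * x)

def bias_variance(degree, n_datasets=200, n_points=20, noise_sd=0.3, seed=0):
    """Estimate squared bias and variance of polynomial fits by simulation."""
    rng = np.random.default_rng(seed)
    x_test = np.linspace(0.05, 0.95, 50)
    preds = np.empty((n_datasets, x_test.size))
    for d in range(n_datasets):
        x = rng.uniform(0.0, 1.0, n_points)
        y = true_f(x) + rng.normal(0.0, noise_sd, n_points)
        preds[d] = np.polyval(np.polyfit(x, y, degree), x_test)
    mean_pred = preds.mean(axis=0)
    bias_sq = float(np.mean((mean_pred - true_f(x_test)) ** 2))
    variance = float(np.mean(preds.var(axis=0)))
    # Average squared distance between predictions and the noise-free truth.
    mse_vs_truth = float(np.mean((preds - true_f(x_test)) ** 2))
    return bias_sq, variance, mse_vs_truth, noise_sd ** 2

for degree in (1, 5):
    bias_sq, variance, mse_vs_truth, noise = bias_variance(degree)
    print(f"degree={degree} bias^2={bias_sq:.3f} variance={variance:.3f} "
          f"bias^2+variance={bias_sq + variance:.3f} noise={noise:.3f}")
```

The error against the noise-free truth equals squared bias plus variance; the expected error on new noisy observations adds the noise variance on top.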
Inference/Analytics
Summary
THANK YOU!

Machine Learning in the Data Science Context