( Big ) Data Management
Data Mining & Machine Learning
Global Concepts in 10 slides
2016
Nicolas SARRAMAGNA
https://fr.linkedin.com/pub/nicolas-sarramagna/19/941/587
CONTENTS
 Introduction
 What / Why
 How
 References
COMPAGNIE PLASTIC OMNIUM
CONFIDENTIAL
Data Mining / Machine Learning in Data Management
Modules shown in the diagram : Collect, Storage, Data Mining / Machine Learning, Data Viz, Governance, Security, Master Data, Data Quality
 DATA MANAGEMENT
 Multiple modules
 BIG DATA
 Velocity, Volume, Variety, Veracity, Value
Data Mining / Machine Learning – What / Why
 DATA MINING - VALUE
 Explore and understand the data to find relations, new properties, and inductions from them
 Descriptive approach
 MACHINE LEARNING - VALUE
 Build a predictive model to answer a question
 Predictive approach
 20 TO 30 YEARS OLD, BUT A NEW CONTEXT
 CPU, database, and RAM capacities
 more data and features
 Internet
 Big data
Overview - Data Mining
SEPTEMBER 2015
 EXPLORE DATA
 usage of statistics
 data visualization is needed for interpretation and insights
 CLUSTERING, ASSOCIATION
 usage of machine learning
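The "explore data" step can be sketched with nothing but the Python standard library. This is a minimal illustration, not the author's tooling: the `describe` and `histogram` helpers and the sample weights are hypothetical, made up for this example.

```python
import statistics
from collections import Counter

def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }

def histogram(values, bin_width):
    """Count values per bin; each key is the lower bound of its bin."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical sample: part weights in grams (made-up values).
weights = [12, 15, 11, 14, 13, 15, 40, 12, 14, 13]

summary = describe(weights)
bins = histogram(weights, bin_width=10)

# A crude text chart: one '#' per value that falls in the bin.
for lower, count in bins.items():
    print(f"{lower:>3}-{lower + 9:<3} {'#' * count}")
```

Even this crude chart makes the single value at 40 stand out from the rest, which is the kind of insight the exploration step is meant to surface before any modeling.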
Overview - Machine Learning
 PREDICTION
 predict a category : classification
 predict a number : regression
 clustering, association
 usage of data mining
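The difference between classification and regression can be illustrated with one small sketch. It assumes a k-nearest-neighbours approach, chosen here only because it is short; the slides do not name an algorithm, and the data and function names below are hypothetical.

```python
from collections import Counter

def nearest(train, x, k):
    """Return the targets of the k training rows closest to x (one feature)."""
    ranked = sorted(train, key=lambda row: abs(row[0] - x))
    return [target for _, target in ranked[:k]]

def classify(train, x, k=3):
    """Classification: predict a category by majority vote of the neighbours."""
    return Counter(nearest(train, x, k)).most_common(1)[0][0]

def regress(train, x, k=3):
    """Regression: predict a number as the mean of the neighbours' targets."""
    targets = nearest(train, x, k)
    return sum(targets) / len(targets)

# Hypothetical data: (temperature, label) and (temperature, measured value).
labelled = [(10, "ok"), (12, "ok"), (14, "ok"),
            (30, "defect"), (32, "defect"), (35, "defect")]
measured = [(10, 1.0), (12, 1.2), (14, 1.4),
            (30, 3.0), (32, 3.2), (35, 3.5)]

label = classify(labelled, 31)   # a category
value = regress(measured, 31)    # a number
print(label, round(value, 2))
```

The neighbour search is shared; only the last step differs, a vote for a category versus an average for a number.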
Data Mining / Machine Learning - How
 PROCESS
 Define the objective, the question to answer, and the success criteria -> ML Canvas
 Data understanding : collect data (one or more data sources), explore (min, max, histogram, charts)
 Data preparation : data quality (outliers, missing values), normalization, dimensionality reduction, noise removal, new features, labeled data, text and date handling, shuffling
 Data modeling : baseline (random, mean), split data into train and test sets, select, combine, and apply algorithms
 Data evaluation : interpretation, evaluation (confusion matrix : recall, precision, formula), validation
 Data deployment : deploy and monitor the model (integration, performance : latency, throughput), A/B testing, scalability, sustainability
 WARNING
 Business input is needed : domain knowledge
 Data and features are needed : at least 10 examples per feature, 100 is better; features must be relevant
 Data preparation is crucial : garbage in -> garbage out
 Stay rigorous in the modeling and evaluation phases : watch for overfitting (train, test, cross-validation); models can fail
 Use web development best practices : continuous integration, deployment, evaluation, monitoring, packaging
 IN PRACTICE, DIFFERENT LEVELS OF ABSTRACTION
 Dev/lib (R, Python scikit-learn, Spark) < generic (MLaaS : BigML, AWS) < problem-specific and/or dedicated software
 Prefer a data-driven approach to a model-driven one : better ROI comes from new features, more input data, and trying different models as-is with automated parameter combinations than from creating and hand-tuning models without automated parameter search
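The modeling and evaluation steps of the process (shuffle, split into train and test, compare against a baseline) can be sketched in plain Python. This is a minimal illustration under stated assumptions: the data is synthetic, the model is a one-feature least-squares line picked for brevity, and all function names are hypothetical rather than taken from any library mentioned on the slide.

```python
import random

def train_test_split(rows, test_ratio=0.25, seed=0):
    """Shuffle a copy of the rows, then split it into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_ratio))
    return shuffled[:cut], shuffled[cut:]

def fit_mean_baseline(train):
    """Baseline: always predict the mean target seen in training."""
    mean_y = sum(y for _, y in train) / len(train)
    return lambda x: mean_y

def fit_line(train):
    """Ordinary least squares for one feature: y = a * x + b."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return lambda x: a * x + b

def mean_absolute_error(model, rows):
    """Average absolute gap between predictions and true targets."""
    return sum(abs(model(x) - y) for x, y in rows) / len(rows)

# Synthetic data (an assumption): y is roughly 2x + 1 with a small wobble.
data = [(x, 2 * x + 1 + (0.5 if x % 2 else -0.5)) for x in range(20)]

train, test = train_test_split(data)
baseline_error = mean_absolute_error(fit_mean_baseline(train), test)
model_error = mean_absolute_error(fit_line(train), test)
print(f"baseline MAE={baseline_error:.2f}  model MAE={model_error:.2f}")
```

Both errors are measured on the held-out test rows only. A model that cannot beat the mean baseline on unseen data is not worth deploying, which is why the baseline comes first in the modeling step.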
Data Mining / Machine Learning - How
MARCH 2015
 EXAMPLE OF MACHINE LEARNING CANVAS ~ BUSINESS MODEL CANVAS
 https://github.com/louisdorard/machinelearningcanvas
Data Mining / Machine Learning – How
 DATA MODELING DEV/LIB LEVEL MODE (SEE LINKS IN LAST SLIDE)
 DATA MODELING GENERIC LEVEL MODE : 1-CLICK (AND SOME OPTIONS)
Data Mining / Machine Learning – How
 SOFTWARE
Data Mining / Machine Learning - How
 EVALUATION (TRAIN, TEST) WITH CONFUSION MATRIX :
 Recall -> quantity of results found : TP / (TP + FN); False Negative = 0 -> recall 100%
 Precision -> quality of results returned : TP / (TP + FP); False Positive = 0 -> precision 100%
 Other metric : TP x costTP + TN x costTN + FP x costFP + FN x costFN = value of the model
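The confusion-matrix metrics above can be worked through with a short sketch. The labels, predictions, and per-outcome costs below are invented for illustration; only the recall, precision, and cost-weighted formulas come from the slide.

```python
def confusion_counts(actual, predicted, positive=True):
    """Count TP, TN, FP, FN for a binary classifier."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return counts

def recall(c):
    """Share of actual positives that were found: TP / (TP + FN)."""
    return c["TP"] / (c["TP"] + c["FN"])

def precision(c):
    """Share of predicted positives that were right: TP / (TP + FP)."""
    return c["TP"] / (c["TP"] + c["FP"])

def model_value(c, costs):
    """TP x costTP + TN x costTN + FP x costFP + FN x costFN."""
    return sum(c[k] * costs[k] for k in c)

# Hypothetical test-set labels and predictions.
actual    = [True, True, True, True, False, False, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False, False, False]

counts = confusion_counts(actual, predicted)

# Hypothetical business values per outcome (gains positive, losses negative).
costs = {"TP": 10.0, "TN": 0.0, "FP": -2.0, "FN": -20.0}
value = model_value(counts, costs)

print(counts, recall(counts), precision(counts), value)
```

With these made-up costs a missed positive is ten times as expensive as a false alarm, so two models with the same precision and recall could still differ in value. The sketch does not guard against a zero denominator, which occurs when there are no actual or no predicted positives.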
Data Mining / Machine Learning - How
 ACTORS ON THE MARKET : LIBS, GENERIC, PROBLEM SPECIFIC
 REFERENCES
 http://www.saedsayad.com
 http://www.louisdorard.com/courses/
 https://bigml.com/
 http://scikit-learn.org/stable/tutorial/machine_learning_map/
 http://oliviaklose.com/machine-learning-11-algorithms-explained/
 http://www.kdnuggets.com/2016/02/gartner-2016-mq-analytics-platforms-gainers-losers.html
 http://www.kdnuggets.com/2015/04/forrester-wave-big-data-predictive-analytics-gainers-losers.html
 http://www.shivonzilis.com/
 http://www.datasciencecentral.com/profiles/blogs/20-data-science-r-python-excel-and-machine-learning-cheat-sheets
