Big Data,
Big Opportunities
Chouaieb Nemri - Data Engineer @ Devoteam
Data Science & Machine Learning instructor for OpenClassrooms
Outline
1. Introduction
2. Big Data Timeline
3. Machine Learning Fundamentals
4. Machine Learning Use Case Examples
5. Machine Learning System types
6. Data Science Project Workflow
7. Book Recommendations
About Me
Data Engineer @ Devoteam
Machine Learning Instructor @ OpenClassrooms
I write about Data Engineering & Machine Learning on Medium: c-nemri.medium.com
3x AWS Certified
Soon: GCP and Kubernetes certified
Husband and father of 2 boys
Disability rights advocate
MS ECE from Georgia Tech (Fall 2015)
ENSEEIHT - Electrical Engineering (2016)
Switched career in 2019
In the past I worked at:
● General Electric (2 years)
● Groupe RENAULT (1 year)
Chouaieb Nemri
Data Engineer with a strong interest in AI, Cloud Computing and applied mathematics. I am passionate about ingesting, processing, exploring and modeling data to make it tell compelling stories.
1. Introduction
Each year, the USA has:
● 1.6 M new cases of cancer
● 600k deaths from cancer
Breast Cancer
Most common cancer.
Each year, the USA has:
● 250k new cases of breast cancer
● 40k deaths from breast cancer
Cancer drug: tamoxifen citrate
More than 60 drugs have been approved by the FDA.
Before the Big Data era
Tamoxifen was initially thought to have a success rate of 80%.
After the Big Data era
Tamoxifen is 100% effective on 80% of people and 0% effective on the other 20%.
Life-changing finding
Machine learning and data mining techniques are used to identify genetic markers that tell us which treatment will be best for which patient.
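A small sketch of why the aggregate figure can hide this finding. The cohort below is synthetic, and the `has_marker` flag is a hypothetical stand-in for a genetic marker; only the 80% / 20% split comes from the slide.

```python
# Synthetic cohort: 100 patients, 80 of them carry a hypothetical marker.
# Assumption (taken from the slide): the drug works for every carrier
# and for nobody else.
patients = [{"has_marker": i < 80} for i in range(100)]
for p in patients:
    p["responded"] = p["has_marker"]

def response_rate(group):
    """Fraction of patients in `group` who responded to the drug."""
    return sum(p["responded"] for p in group) / len(group)

carriers = [p for p in patients if p["has_marker"]]
non_carriers = [p for p in patients if not p["has_marker"]]

overall = response_rate(patients)             # what an aggregate study sees
with_marker = response_rate(carriers)         # rate inside the carrier subgroup
without_marker = response_rate(non_carriers)  # rate for everyone else

print(f"overall={overall:.0%} marker={with_marker:.0%} no_marker={without_marker:.0%}")
```

The overall rate alone cannot tell "works 80% of the time for everyone" apart from "always works for 80% of patients"; splitting by the marker can.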
How it works?
Thanks to:
● More and more data
● Computing resources
● Sophisticated algorithms
An opportunity
Medicine, along with many other sectors, has been reshaped, and new concepts have appeared (e.g. personalized medicine).
Knowing which 5% of patients a drug will work for is life-saving.
2. Big Data Timeline
1991, World Wide Web is born
1995,
- Sun releases Java Platform
- GPS fully operational
1998,
- Google is founded
- NoSQL
1999, term IoT coined @ MIT
2001
2002, Bluetooth v1.1 spec is released
2003
2004
2005
2006
2008, The number of devices connected to the Internet exceeds the world’s population
2012, Data Scientist: sexiest job of the 21st century
3. Traditional vs Machine Learning approach
Definition
ML is a field of study that gives computers the ability to learn without being explicitly programmed. (Arthur Samuel, 1959)
More engineering oriented:
A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E. (Tom Mitchell, 1997)
Spam filter example
Rule-based approach:
1. Manually detect patterns in the email’s name, body or domain name.
2. Write hardcoded rules for every pattern detected.
3. Repeat 1 and 2 until you’re good to go.
Result:
Complex, non-exhaustive code that is hard to maintain.
Machine learning approach:
1. Collect spam data
2. Train an ML model that automatically learns which patterns of words are unusually frequent in the spam examples compared to the ham examples
3. Update the training data and retrain periodically
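The machine learning approach can be sketched in a few lines. This is a minimal illustration, not the model from the talk: the messages are made up, and the scoring is a naive-Bayes-style word log-odds with equal class priors, chosen here only because it fits in plain Python.

```python
import math
from collections import Counter

# Tiny hand-made dataset (hypothetical examples, not real mail).
spam = ["win free money now", "free prize claim now", "cheap money free offer"]
ham = ["meeting moved to monday", "please review the report", "lunch on monday"]

def word_counts(messages):
    """Count how often each word appears across a list of messages."""
    return Counter(word for msg in messages for word in msg.split())

def train(spam_msgs, ham_msgs):
    """Learn a per-word score: positive means 'relatively more frequent in spam'."""
    spam_c, ham_c = word_counts(spam_msgs), word_counts(ham_msgs)
    vocab = set(spam_c) | set(ham_c)
    spam_total = sum(spam_c.values()) + len(vocab)
    ham_total = sum(ham_c.values()) + len(vocab)
    # Add-one smoothing so a word seen in only one class does not give log(0).
    return {
        w: math.log((spam_c[w] + 1) / spam_total)
        - math.log((ham_c[w] + 1) / ham_total)
        for w in vocab
    }

def is_spam(message, scores):
    """Sum the learned scores; words never seen in training contribute 0."""
    return sum(scores.get(w, 0.0) for w in message.split()) > 0

scores = train(spam, ham)
print(is_spam("claim your free money", scores))
print(is_spam("report for monday meeting", scores))
```

No rule mentions "free" or "money" explicitly; the scores come from the examples. Step 3 (retraining) amounts to calling `train` again on the updated lists.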
4. Machine Learning Algorithms and Use Cases
CNN (Convolutional Neural Nets)
- Analyzing images of products on a production line to automatically classify them
- Detecting tumors in brain scans
RNN (Recurrent Neural Nets)
- Automatically classifying news articles
- Automatically flagging offensive comments on discussion forums
- Chatbots
- Text translation
- Time series forecasting
Reinforcement Learning
- Robotics
- Intelligent bots
Clustering
- Client segmentation
Recommender Systems
- Netflix
- Amazon
- Dating apps ...
Anomaly Detection
- Credit card fraud detection
- Cybersecurity
- Log message anomaly detection
5. Machine Learning System types
Supervised - Unsupervised - Semi-supervised - Reinforcement Learning
Supervised - Classification
Supervised - Regression
Unsupervised - Clustering
Unsupervised - Dimensionality Reduction
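To make the categories above concrete, here is a rough sketch of one toy algorithm per type: 1-nearest-neighbour for classification, a least-squares line for regression, and two-cluster k-means for clustering. The data and function names are invented for illustration, and dimensionality reduction is left out to keep the example short.

```python
# Supervised classification: labels are given, the model predicts a category.
labeled = [((1.0, 1.2), "small"), ((0.8, 1.0), "small"),
           ((4.0, 4.2), "large"), ((4.5, 3.9), "large")]

def classify_1nn(point, examples):
    """Return the label of the closest training example (1-nearest neighbour)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(examples, key=lambda ex: sq_dist(point, ex[0]))[1]

# Supervised regression: targets are numbers, the model predicts a number.
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

# Unsupervised clustering: no labels, similar values are grouped together.
def kmeans_1d(values, iterations=10):
    """Two-cluster k-means on plain numbers, starting from the min and max."""
    centers = [min(values), max(values)]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Keep the old center if a group ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers

print(classify_1nn((4.1, 4.0), labeled))
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)
centers = kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 10.0])
print(centers)
```

The first two functions need the answers (labels or target values) at training time; the third only sees the raw values.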
Batch vs Online learning
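Batch learning trains on the full dataset at once, while online learning updates the model as each observation arrives. A minimal sketch, assuming the "model" is just a mean estimate; `batch_fit` and `OnlineMean` are hypothetical names used only here.

```python
def batch_fit(values):
    """Batch learning: needs the whole dataset and recomputes from scratch."""
    return sum(values) / len(values)

class OnlineMean:
    """Online learning: updates the estimate one observation at a time."""

    def __init__(self):
        self.count = 0
        self.estimate = 0.0

    def learn_one(self, value):
        self.count += 1
        # Incremental mean update; past observations are not stored.
        self.estimate += (value - self.estimate) / self.count
        return self.estimate

stream = [10.0, 12.0, 11.0, 13.0]
model = OnlineMean()
for value in stream:
    model.learn_one(value)

print(batch_fit(stream), model.estimate)
```

Both reach the same estimate on this data, but the online version never holds more than one observation in memory and can keep learning from a live stream.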
6. Data Science Project Workflow
1. Data collection
2. Exploratory Data Analysis
- Univariate analysis
- Multivariate analysis
3. Data Cleaning
- Handling missing data
- Handling outliers
- Handling categorical fields
4. Feature Engineering
5. Train/test split
6. Model Selection
7. Hyperparameter Tuning
8. Model Deployment
9. MLOps
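Steps 3 and 5 of the workflow can be sketched with the standard library alone. The records, field names and clipping bounds below are hypothetical, and the helper functions are written for this example; in practice a dataframe or ML library would usually handle these steps.

```python
import random
import statistics

# Hypothetical raw records; None marks a missing value.
rows = [
    {"age": 34, "city": "Paris", "income": 42_000},
    {"age": None, "city": "Lyon", "income": 39_000},
    {"age": 29, "city": "Paris", "income": 1_000_000},  # likely outlier
    {"age": 51, "city": "Nice", "income": 45_000},
    {"age": 45, "city": "Lyon", "income": 41_000},
]

def impute_median(rows, field):
    """Missing data: replace None with the median of the observed values."""
    median = statistics.median(r[field] for r in rows if r[field] is not None)
    return [{**r, field: median if r[field] is None else r[field]} for r in rows]

def clip_outliers(rows, field, low, high):
    """Outliers: clamp values to [low, high]; the bounds are a modelling choice."""
    return [{**r, field: min(max(r[field], low), high)} for r in rows]

def one_hot(rows, field):
    """Categorical fields: replace the field with 0/1 indicator columns."""
    categories = sorted({r[field] for r in rows})
    encoded = []
    for r in rows:
        new_row = {k: v for k, v in r.items() if k != field}
        for c in categories:
            new_row[f"{field}_{c}"] = int(r[field] == c)
        encoded.append(new_row)
    return encoded

def train_test_split(rows, test_ratio=0.2, seed=0):
    """Train/test split: shuffle with a fixed seed, then hold out a fraction."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]

clean = one_hot(clip_outliers(impute_median(rows, "age"), "income", 0, 100_000), "city")
train, test = train_test_split(clean)
print(len(train), len(test))
```

Each helper returns new rows instead of modifying its input, so the raw data stays available for the exploratory analysis in step 2.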
7. Recommended readings


Editor's Notes

  1. 1991
     ■ The World Wide Web as we know it is born. The Hypertext Transfer Protocol (HTTP) becomes the standard means for sharing information in this new medium.
     1995
     ■ Sun releases the Java platform. Java, invented in 1991, has become the second most popular language behind C. It dominates the Web applications space and is the de facto standard for middle-tier applications. These applications are the source for recording and storing web traffic.
     ■ Global Positioning System (GPS) becomes fully operational. GPS was originally developed by the U.S. Department of Defense for military applications in the early 1970s.
     1998
     ■ Carlo Strozzi develops an open-source relational database and calls it NoSQL. Ten years later, a movement to develop NoSQL databases to work with large, unstructured data sets gains momentum.
     ■ Google is founded by Larry Page and Sergey Brin, who worked for about a year on a Stanford search engine project called BackRub.
     1999
     ■ Kevin Ashton, cofounder of the Auto-ID Center at the Massachusetts Institute of Technology (MIT), invents the term “the Internet of Things.”
     2001
     ■ Wikipedia is launched. The crowd-sourced encyclopedia revolutionized the way people reference information.
     2002
     ■ Version 1.1 of the Bluetooth specification is ratified by the Institute of Electrical and Electronics Engineers (IEEE) as IEEE 802.15.1. Bluetooth is a wireless technology standard for the transfer of data over short distances. The advancement of this specification and its adoption lead to a whole host of wearable devices that communicate between the device and another computer. Today nearly every portable device has a Bluetooth receiver.
     2003
     ■ According to studies by IDC and EMC, the amount of data created in 2003 surpasses the amount of data created in all of human history before then. It is estimated that 1.8 zettabytes (ZB) was created in 2011 alone (1.8 ZB is the equivalent of 200 billion high-definition movies, each two hours long, or 47 million years of footage with no bathroom breaks).
     ■ LinkedIn, the popular social networking website for professionals, launches. In 2013, the site had about 260 million users.
     2004
     ■ Wikipedia reaches 500,000 articles in February; seven months later it tops 1 million articles.
     ■ Facebook, the social networking service, is founded by Mark Zuckerberg and others in Cambridge, Massachusetts. In 2013, the site had more than 1.15 billion users.
     2005
     ■ The Apache Hadoop project is created by Doug Cutting and Mike Cafarella. The name for the project came from the toy elephant of Cutting’s young son. The now-famous yellow elephant becomes a household word just a few years later and a foundational part of almost all big data strategies.
     ■ The National Science Board recommends that the National Science Foundation (NSF) create a career path for “a sufficient number of high-quality data scientists” to manage the growing collection of digital information.
     2007
     ■ Apple releases the iPhone and creates a strong consumer market for smartphones.
     2008
     ■ The number of devices connected to the Internet exceeds the world’s population.
     2012
     ■ The Obama administration announces the Big Data Research and Development Initiative, consisting of 84 programs in six departments. The NSF publishes “Core Techniques and Technologies for Advancing Big Data Science & Engineering.”
     ■ IDC and EMC estimate that 2.8 ZB of data will be created in 2012 but that only 3% of what could be usable for big data is tagged and less is analyzed. The report predicts that the digital world will by 2020 hold 40 ZB, 57 times the number of grains of sand on all the beaches in the world.
     ■ The Harvard Business Review calls the job of data scientist “the sexiest job of the 21st century.”
     2013
     ■ The democratization of data begins. With smartphones, tablets, and Wi-Fi, everyone generates data at prodigious rates. More individuals access large volumes of public data and put data to creative use.