Big Data Analytics: Understanding for Research Activity, by Andry Alamsyah
Big Data Analytics Presentation at International Workshop Colloquium Exploring Research Opportunity. School of Business and Management (SBM) - ITB. Bandung, 8 August 2019.
This video includes:
Purpose of Data Science, Role of Data Scientist, Skills required for Data Scientist, Job roles for Data Scientist, Applications of Data Science, Career in Data Science.
Computer science is faced with many challenges as the digital universe expands. From mobile and cloud computing to data security, addressing these issues can require large, structural changes, but an examination of these problems can lead to organizational solutions and improvements in the world.
A Review of Big Data for Social Policy Decision Making, by Ridi Fe
This presentation was given at ICSC 2018 at the University Club, Universitas Gadjah Mada. It discusses how big data and cloud computing enhance social policy decision making. This research is supported by the Microsoft Innovation Center UGM.
Data Science and Visualization Lab presentation, by iHub Research
The Data Science and Visualization Lab! This product is based on a component of research that delves into and innovates on the processes of data science: collection, storage/management, analysis, and visualization. You have probably come across one of our amazing infographics. What else can you do with data?
Open Data Analytical Model for Human Development Index to Support Government ..., by Andry Alamsyah
The transparency of Open Data helps citizens evaluate government performance. In Indonesia, each government body or ministry has its own standard operating procedure for data treatment, resulting in incoherent information between agencies and a likelihood of missing valuable insight. Our motivation is therefore to show the advantage of the Open Data movement in supporting unified government decision making. We use datasets from data.go.id, which publishes official data from each government body. The idea is that, using these official but limited data, we can find important patterns. The case study concerns Human Development Index value prediction and its clustered nature. We explore the data patterns using two important data analytics methods: classification and clustering. Data analytics is the collection of activities that reveal unknown data patterns. Specifically, we use Artificial Neural Network classification and K-means clustering. The classification objective is to categorize different levels of the Human Development Index of cities or regions in Indonesia based on Gross Domestic Product, Number of Population in Poverty, Number of Internet Users, Number of Laborers, and Number of Population indicator data. We determine which city belongs to each of the four categories of Human Development stated by the UNDP standard. The clustering objective is to find group characteristics relating the Human Development Index and Gross Domestic Product.
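As a rough illustration of the clustering step described above, here is a minimal K-means sketch. The data points are hypothetical (GDP, HDI)-style pairs invented for the example, not the data.go.id dataset:

```python
import random

def kmeans(points, k, iters=50, seed=0):
    """Minimal K-means: assign each point to its nearest centroid,
    then recompute centroids, repeating for a fixed number of rounds."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # index of the nearest centroid by squared Euclidean distance
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[i].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, clusters

# Hypothetical (GDP, HDI) pairs for illustration only
points = [(1.0, 0.6), (1.2, 0.62), (5.0, 0.8), (5.3, 0.82)]
centroids, clusters = kmeans(points, k=2)
```

A production analysis would normalize the indicators first and choose k by inspecting cluster quality, but the assign-then-recompute loop is the whole of the method.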
In this presentation, I talk briefly about Big Data and its importance. I cover the very basics of Data Science and its present-day importance through a case study. You can also get an idea of who a data scientist is and what tasks they perform. A few applications of data science are illustrated at the end.
Slide 2: Etymology: The etymology of the term ‘Big Data’ can be traced back to the mid-1990s, when it was first used by John Mashey to refer to handling and analysis of massive datasets. However, by 2013, ‘Big Data’ was already being declared obsolescent as a meaningful term by some, as it was too wide ranging and vague in definition (e.g. de Goes, 2013).
Slide 6: Vagaries: Kitchin argues that it is velocity and these additional key characteristics that set Big Data apart and make them a “disruptive innovation – one that radically changes the nature of data and what can be done with them” (Kitchin, 2014). However, there is no one characteristic profile that all Big Data fit and they can take multiple forms.
Slide 8: Ethics: Several ethical questions have been raised about the scope of data being generated and retained; such as those concerning privacy, informed consent, and protection from harm.
These questions raise wider issues about what kinds of data should be combined and analysed, and the purposes to which the resulting information should be put.
Slide 9: Inequalities: Challenges of inequality have also been posed:
Whose data traces will be analysed? It is likely that only those who are better off will be represented (as they are more likely to use social media, etc.)
Access and use of open data is unlikely to be equally available to everyone due to existing structural inequalities (Eynon, 2013)
Slide 11: What do Big Data actually tell us? Eynon (2013) argues that Big Data is concerned with capturing and examining patterns, and tells us more about what people actually do than about what they say they do. However, this is not sufficient for all kinds of social science research. We need to understand the meanings of behaviours which cannot be inferred simply from tracking specific patterns.
In order that Big Data are used appropriately, we need to ensure understanding of what kinds of research can or cannot be carried out using them. Big Data should not be seen as [a] “technical fix” for research, but should be used to empower, support and facilitate practice and critical research.
Keynote talk by David Dietrich, EMC Education Services at ICCBDA 2013 : International Conference on Cloud and Big Data Analytics
http://twitter.com/imdaviddietrich
http://infocus.emc.com/author/david_dietrich/
Presentation slides used during the meetup on Artificial Intelligence and Its Ecosystem organized by Developer Session. In the presentation, I highlighted why open data is one of the key parts of the AI ecosystem, and the situation of Open Data in Nepal.
Abstract:
Big Data concern large-volume, complex, growing data sets with multiple, autonomous sources. With the fast development of networking, data storage, and the data collection capacity, Big Data are now rapidly expanding in all science and engineering domains, including physical, biological and biomedical sciences. This paper presents a HACE theorem that characterizes the features of the Big Data revolution, and proposes a Big Data processing model, from the data mining perspective. This data-driven model involves demand-driven aggregation of information sources, mining and analysis, user interest modeling, and security and privacy considerations. We analyze the challenging issues in the data-driven model and also in the Big Data revolution.
This presentation is an introduction to the importance of Data Analytics in Product Management. During this talk, Etugo Nwokah, former Chief Product Officer for WellMatch, covered how to define Data Analytics and why it should be a first-class citizen in any software organization.
Here's how big data and the Internet of Things work together: a vast network of sensors (IoT) collects enormous amounts of information (big data) that is then used to improve services and products in various industries, which in turn generates revenue.
Lecture 1: Introduction to Machine Learning, by UmmeSalmaM1
Machine Learning is a field of computer science which deals with the study of computer algorithms that improve automatically through experience. In this PPT we discuss the following concepts - Prerequisite, Definition, Introduction to Machine Learning (ML), Fields associated with ML, Need for ML, Difference between Artificial Intelligence, Machine Learning, Deep Learning, Types of learning in ML, Applications of ML, Limitations of Machine Learning.
Machine Learning: Need of Machine Learning, Its Challenges and Its Applications, by Arpana Awasthi
The BCA Department of JIMS Vasant Kunj-II is one of the best BCA colleges in Delhi NCR. The curriculum is well updated and the subjects include all the latest technologies which are in demand.
JIMS BCA course teaches Python to II semester students and Artificial Intelligence Using Python to Sixth Semester students.
Here is a small article on the Future of Machine Learning, hope you will find it useful.
Machine Learning is a field of computer science in which computer systems are able to learn from past experiences, examples, and environments. With the help of various machine learning algorithms, computers are given the ability to sense data and produce relevant results.
Machine learning algorithms provide techniques for predicting future outcomes or classifying information from the given input so that appropriate decisions can be taken.
A brief introduction to Data Science, explaining the concepts, algorithms, machine learning, supervised and unsupervised learning, clustering, statistics, data preprocessing, real-world applications, etc.
It is part of a Data Science Corner campaign where I will be discussing the fundamentals of Data Science, AI/ML, Statistics, etc.
Machine learning (ML) is the scientific study of algorithms and statistical models that computer systems use to progressively improve their performance on a specific task. Machine learning algorithms build a mathematical model of sample data, known as “training data”, in order to make predictions or decisions without being explicitly programmed to perform the task. Machine learning algorithms are used in applications such as email filtering, detection of network intruders, and computer vision, where it is infeasible to develop an algorithm of specific instructions for performing the task. Machine learning is closely related to computational statistics, which focuses on making predictions using computers. The study of mathematical optimization delivers methods, theory, and application domains to the field of machine learning. Data mining is a field of study within machine learning that focuses on exploratory data analysis through unsupervised learning. In its application across business problems, machine learning is the study of computer systems that learn from data and experience. It is applied in an incredibly wide variety of application areas, from medicine to advertising, from military to pedestrian. Any area in which you need to make sense of data is a potential customer of machine learning.
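The phrase "build a mathematical model of sample data" can be made concrete with the smallest possible example: fitting a line to a handful of hypothetical training pairs and using it to predict. The numbers here are invented for illustration:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b: the simplest
    'model built from training data' in the sense described above."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return a, b

# Hypothetical training data: y is roughly 2x + 1 with small noise
xs, ys = [0, 1, 2, 3], [1.0, 3.1, 4.9, 7.0]
a, b = fit_line(xs, ys)

def predict(x):
    return a * x + b
```

No rule about the relationship between x and y was hardcoded; the slope and intercept are recovered from the data, which is the essence of the "without being explicitly programmed" claim.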
Gary Hope - Machine Learning: It's Not as Hard as You Think, by Saratoga
Gary Hope is currently the Data Platform Technical Specialist within Microsoft South Africa having previously worked for several large organisations including American Express and Siemens Business Solutions.
Slides from talks presented at Mammoth BI in Cape Town on 17 November 2014.
Visit www.mammothbi.co.za for details on the event. Follow @MammothBI on twitter.
what-is-machine-learning-and-its-importance-in-todays-world.pdf, by Temok IT Services
Machine Learning is an AI method for teaching computers to learn from experience. By employing computational methods, machine learning algorithms can “learn” directly from data without using a predetermined equation as a model.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
Adjusting primitives for graph : SHORT REPORT / NOTES, by Subhajit Sahu
These notes concern graph algorithms such as PageRank, and Compressed Sparse Row (CSR), an adjacency-list based graph representation.
Multiply with different modes (map):
1. Performance of sequential execution based vs OpenMP based vector multiply.
2. Comparing various launch configs for CUDA based vector multiply.
Sum with different storage types (reduce):
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce):
1. Performance of sequential execution based vs OpenMP based vector element sum.
2. Performance of memcpy vs in-place based CUDA based vector element sum.
3. Comparing various launch configs for CUDA based vector element sum (memcpy).
4. Comparing various launch configs for CUDA based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce):
1. Comparing various launch configs for CUDA based vector element sum (in-place).
Chatty Kathy - UNC Bootcamp Final Project Presentation - Final Version - 5.23..., by John Andrews
Title: Chatty Kathy: Enhancing Physical Activity Among Older Adults
Description:
Discover how Chatty Kathy, an innovative project developed at the UNC Bootcamp, aims to tackle the challenge of low physical activity among older adults. Our AI-driven solution uses peer interaction to boost and sustain exercise levels, significantly improving health outcomes. This presentation covers our problem statement, the rationale behind Chatty Kathy, synthetic data and persona creation, model performance metrics, a visual demonstration of the project, and potential future developments. Join us for an insightful Q&A session to explore the potential of this groundbreaking project.
Project Team: Jay Requarth, Jana Avery, John Andrews, Dr. Dick Davis II, Nee Buntoum, Nam Yeongjin & Mat Nicholas
Levelwise PageRank with Loop-Based Dead End Handling Strategy : SHORT REPORT ..., by Subhajit Sahu
Abstract — Levelwise PageRank is an alternative method of PageRank computation which decomposes the input graph into a directed acyclic block-graph of strongly connected components, and processes them in topological order, one level at a time. This enables calculation for ranks in a distributed fashion without per-iteration communication, unlike the standard method where all vertices are processed in each iteration. It however comes with a precondition of the absence of dead ends in the input graph. Here, the native non-distributed performance of Levelwise PageRank was compared against Monolithic PageRank on a CPU as well as a GPU. To ensure a fair comparison, Monolithic PageRank was also performed on a graph where vertices were split by components. Results indicate that Levelwise PageRank is about as fast as Monolithic PageRank on the CPU, but quite a bit slower on the GPU. Slowdown on the GPU is likely caused by a large submission of small workloads, and expected to be non-issue when the computation is performed on massive graphs.
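For reference, the "Monolithic PageRank" baseline the abstract compares against can be sketched as a standard power iteration. This is not the levelwise method itself, and the uniform dead-end handling below is just one common convention:

```python
def pagerank(graph, damping=0.85, iters=50):
    """Standard (monolithic) PageRank by power iteration.
    `graph` maps each vertex to its list of out-neighbours."""
    verts = list(graph)
    n = len(verts)
    rank = {v: 1.0 / n for v in verts}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in verts}
        for v, outs in graph.items():
            if outs:
                share = damping * rank[v] / len(outs)
                for u in outs:
                    new[u] += share
            else:
                # dead end: redistribute its rank uniformly
                for u in verts:
                    new[u] += damping * rank[v] / n
        rank = new
    return rank

# Tiny 3-vertex cycle: symmetry forces equal ranks of 1/3 each
g = {"a": ["b"], "b": ["c"], "c": ["a"]}
r = pagerank(g)
```

The levelwise variant avoids processing every vertex in every iteration by ranking one strongly connected component level at a time, which is why dead ends must be handled up front.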
Slide 2: Outline
1. Introduction
2. Big Data Timeline
3. Machine Learning Fundamentals
4. Machine Learning Use Case Examples
5. Machine Learning System Types
6. Data Science Project Workflow
7. Book Recommendations
Slide 3: About Me
Chouaieb Nemri: Data Engineer at Devoteam and Machine Learning Instructor at OpenClassrooms. I write about Data Engineering and Machine Learning on Medium: c-nemri.medium.com. 3x AWS certified, soon GCP and Kubernetes certified. Husband and father of 2 boys; disability rights advocate. MS ECE from Georgia Tech (Fall 2015); ENSEEIHT, Electrical Engineering (2016). Switched careers in 2019; in the past I worked at General Electric (2 years) and Groupe RENAULT (1 year). I am a Data Engineer with a strong interest in AI, Cloud Computing, and applied mathematics, passionate about ingesting, processing, exploring, and modeling data to make it tell compelling stories.
Slide 4: Introduction
Each year, the USA has about 1.6M new cases of cancer and 600k deaths from cancer. Breast cancer is the most common cancer: each year the USA has about 250k new cases and 40k deaths. More than 60 cancer drugs, among them tamoxifen citrate, have been approved by the FDA.
Before the Big Data era, tamoxifen was initially thought to have a success rate of 80%. After the Big Data era came a life-changing finding: tamoxifen is 100% effective on 80% of people and 0% effective on the other 20%.
How does it work? Machine learning and data mining techniques are used to identify genetic markers that tell us which treatment will be best for which patient.
An opportunity: thanks to more and more data, computing resources, and sophisticated algorithms, medicine, along with many other sectors, has been reshaped, and new concepts have appeared (e.g. personalized medicine). Knowing who are the 5% for which a drug will be effective is life-saving.
Slide 5: Big Data Timeline
1991: the World Wide Web is born.
1995: Sun releases the Java platform; GPS becomes fully operational.
1998: Google is founded; NoSQL.
1999: the term IoT is invented at MIT.
2002: the Bluetooth v1.1 spec is released.
2008: the number of devices connected to the Internet exceeds the world's population.
2012: "Data Scientist" is called the sexiest job of the 21st century.
(The slide also marks 2001, 2003, 2004, 2005, and 2006 as milestone years; the detailed timeline at the end of this document expands on each.)
Slide 6: Traditional vs Machine Learning approach
Definition: "ML is a field of study that gives computers the ability to learn without being explicitly programmed." (Arthur Samuel, 1959). A more engineering-oriented definition: "A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E." (Tom Mitchell, 1997)
Spam filter example.
Rule-based approach:
1. Manually detect patterns in an email's name, body, or domain name.
2. Write hardcoded rules for every pattern detected.
3. Repeat 1 and 2 until you're good to go.
Result: complex, non-exhaustive code that is hard to maintain.
Machine learning approach:
1. Collect spam data.
2. Train an ML model that automatically learns unusually frequent patterns of words in the spam examples compared to the ham examples.
3. Update the training data and retrain periodically.
Slide 7: Traditional vs Machine Learning approach (continued): recap of the Samuel and Mitchell definitions and of the rule-based vs machine learning approaches from Slide 6.
Slide 8: Machine Learning Algorithms and Use Cases
CNN (Convolutional Neural Nets): analyzing images of products on a production line to classify them automatically; detecting tumors in brain scans.
RNN (Recurrent Neural Nets): automatically classifying news articles; automatically flagging offensive comments on discussion forums; chatbots; text translation; time-series forecasting.
Reinforcement Learning: robotics; intelligent bots.
Clustering: client segmentation.
Recommender Systems: Netflix; Amazon; dating apps, ...
Anomaly Detection: credit card fraud detection; cybersecurity; log message anomaly detection.
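As one concrete instance of the anomaly-detection use case listed above, here is a minimal z-score detector. It is a deliberately simple stand-in for what a fraud system would actually deploy, and the transaction amounts are hypothetical:

```python
def zscore_anomalies(values, threshold=2.0):
    """Flag values more than `threshold` standard deviations
    from the mean of the sample."""
    n = len(values)
    mean = sum(values) / n
    std = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    if std == 0:
        return []
    return [v for v in values if abs(v - mean) / std > threshold]

# Hypothetical transaction amounts with one obvious outlier
amounts = [12.0, 9.5, 11.2, 10.8, 10.1, 9.9, 500.0]
```

A single large outlier inflates the standard deviation, which is why the threshold here is modest; production systems typically use robust statistics or learned models instead of the raw z-score.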
1991 ■ The Internet, or World Wide Web as we know it, is born. The protocol Hypertext Transfer Protocol (HTTP) becomes the standard means for sharing information in this new medium.
1995 ■ Sun releases the Java platform. Java, invented in 1991, has become the second most popular language behind C. It dominates the Web applications space and is the de facto standard for middle-tier applications. These applications are the source for recording and storing web traffic. ■ Global Positioning System (GPS) becomes fully operational. GPS was originally developed by DARPA (Defense Advanced Research Projects Agency) for military applications in the early 1970s.
1998 ■ Carlo Strozzi develops an open‐source relational database and calls it NoSQL. Ten years later, a movement to develop NoSQL databases to work with large, unstructured data sets gains momentum. ■ Google is founded by Larry Page and Sergey Brin, who worked for about a year on a Stanford search engine project called BackRub.
1999 ■ Kevin Ashton, cofounder of the Auto‐ID Center at the Massachusetts Institute of Technology (MIT), invents the term “the Internet of Things.”
2001 ■ Wikipedia is launched. The crowd-sourced encyclopedia revolutionized the way people reference information.
2002 ■ Version 1.1 of the Bluetooth specification is released by the Institute of Electrical and Electronics Engineers (IEEE). Bluetooth is a wireless technology standard for the transfer of data over short distances. The advancement of this specification and its adoption lead to a whole host of wearable devices that communicate between the device and another computer. Today nearly every portable device has a Bluetooth receiver.
2003 ■ According to studies by IDC and EMC, the amount of data created in 2003 surpasses the amount of data created in all of human history before then. It is estimated that 1.8 zettabytes (ZB) was created in 2011 alone (1.8 ZB is the equivalent of 200 billion high-definition movies, each two hours long, or 47 million years of footage with no bathroom breaks). ■ LinkedIn, the popular social networking website for professionals, launches. In 2013, the site had about 260 million users.
2004 ■ Wikipedia reaches 500,000 articles in February; seven months later it tops 1 million articles. ■ Facebook, the social networking service, is founded by Mark Zuckerberg and others in Cambridge, Massachusetts. In 2013, the site had more than 1.15 billion users.
2005 ■ The Apache Hadoop project is created by Doug Cutting and Mike Cafarella. The name for the project came from the toy elephant of Cutting's young son. The now-famous yellow elephant becomes a household word just a few years later and a foundational part of almost all big data strategies. ■ The National Science Board recommends that the National Science Foundation (NSF) create a career path for "a sufficient number of high-quality data scientists" to manage the growing collection of digital information.
2007 ■ Apple releases the iPhone and creates a strong consumer market for smartphones
2008 ■ The number of devices connected to the Internet exceeds the world’s population.
2012 ■ The Obama administration announces the Big Data Research and Development Initiative, consisting of 84 programs in six departments. The NSF publishes “Core Techniques and Technologies for Advancing Big Data Science & Engineering.” ■ IDC and EMC estimate that 2.8 ZB of data will be created in 2012 but that only 3% of what could be usable for big data is tagged and less is analyzed. The report predicts that the digital world will by 2020 hold 40 ZB, 57 times the number of grains of sand on all the beaches in the world. ■ The Harvard Business Review calls the job of data scientist “the sexiest job of the 21st century.”
2013 ■ The democratization of data begins. With smartphones, tablets, and Wi‐Fi, everyone generates data at prodigious rates. More individuals access large volumes of public data and put data to creative use.