ahmed.rebai@esprit.tn, Lotfi.ncib@esprit.tn
Introduction
to Data Science
Dr Ahmed Rebai, PhD in nuclear physics
Dr Lotfi Ncib, PhD in applied maths
Plan
▪ The Data Explosion
▪ The Data History
▪ Why Data Science?
▪ What is Data Science?
▪ Steps in The Data Science Process
▪ Career in Data Science
▪ Data Science Tools
The Data Explosion
How Much Data Is Collected Every Minute of the Day in 2019?
The Data History
From the dawn of time up until 2005, humans had created 130 exabytes of data.
▪ 2005: 130 exabytes
▪ 2010: 1,200 exabytes
▪ 2015: 7,900 exabytes
▪ 2020: 40,900 exabytes
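To put these figures in perspective, the growth between consecutive data points can be computed directly. This is a minimal sketch that uses only the four values listed above; the helper name `annual_growth_rate` is illustrative, and the smooth compound-growth model is an assumption, not something the slides state.

```python
# Data volume in exabytes, as listed on the slide.
data_volume_eb = {2005: 130, 2010: 1200, 2015: 7900, 2020: 40900}


def annual_growth_rate(start_value, end_value, years):
    """Compound annual growth rate, assuming smooth exponential growth."""
    return (end_value / start_value) ** (1 / years) - 1


years = sorted(data_volume_eb)
for start, end in zip(years, years[1:]):
    factor = data_volume_eb[end] / data_volume_eb[start]
    rate = annual_growth_rate(data_volume_eb[start], data_volume_eb[end], end - start)
    print(f"{start}-{end}: x{factor:.1f} overall, about {rate:.0%} per year")
```

Each five-year step multiplies the total several times over, which is what the word "explosion" refers to.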
Unit            Bytes
Byte (B)        1
Kilobyte (KB)   1,000 = 10^3
Megabyte (MB)   1,000,000 = 10^6
Gigabyte (GB)   1,000,000,000 = 10^9
Terabyte (TB)   1,000,000,000,000 = 10^12
Petabyte (PB)   1,000,000,000,000,000 = 10^15
Exabyte (EB)    1,000,000,000,000,000,000 = 10^18
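The unit ladder above can be sketched as a small conversion helper. This is an illustrative example only: the function name `human_readable` and the sample values are assumptions, and it uses the decimal (factor of 1000) units from the table.

```python
# Decimal units from the table above: each step is a factor of 1000.
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB"]


def human_readable(num_bytes):
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value drops below 1000,
        # or at the largest unit we know about.
        if value < 1000 or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= 1000


print(human_readable(1500))
print(human_readable(130 * 10**18))
```

Dividing by 1000 at each step mirrors the table: moving one row down adds three to the exponent.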
The Data History
Start with 1 byte of space.
Zoom out 1000 times and we get a page of text (1 KB), about 500 characters.
Zoom out another 1000 times and we get a book of about 500 pages (1 MB).
Zoom out another 1000 times and we get 1 GB, enough to hold an entire human genome once encoded (it usually takes about 725 MB).
Zoom out another 1000 times and we reach 1 TB, enough to hold a recording of someone's life for 8 years (everything they do, every minute or second).
Zoom out another 1000 times and we reach 1 PB. The Amazon rainforest covers 1.4 billion acres; at about 500 trees per acre, that is roughly 700 billion trees. If you chopped all these trees down, turned them into paper, and filled the paper with letters on both sides, the result would be close to 1 PB.
Zoom out another 1000 times and we reach 1 EB (1000 PB).
The Data History
1 zettabyte = 1000 exabytes
Why Data Science?
Salary trends have followed the impact of data science. With a national
average salary of $118,000 (which increases to $126,000 in Silicon Valley), data
science has become a lucrative career path where you can solve hard
problems and drive social impact.
Data scientist is the sexiest career of the 21st century.
Statistical analysis and data mining were the hottest skills that caught
recruiters' attention in 2014, 2015, 2016, 2017, and 2018.
The US alone faces a shortage of more than 150,000 data analysts and an
additional 1.5 million data-savvy managers.
What is Data Science?
“The ability to take data — to be able to understand it, to process it, to extract value
from it, to visualize it, to communicate it — that’s going to be a hugely important
skill in the next decades.”
- Hal Varian, chief economist at Google and UC Berkeley professor of information
sciences, business, and economics
Data science is the area of study that involves extracting insights from vast
amounts of data using various scientific methods, algorithms, and processes.
It draws on computer science, statistics, machine learning, visualization, and
human-computer interaction to collect, clean, integrate, analyze, visualize, and
interact with data in order to create data products.
Steps in The Data Science Process
ACQUIRE → PREPARE → ANALYZE → REPORT → ACT
Data Engineering · Computational Data Science
Steps in The Data Science Process
ACQUIRE
Step 1: Acquire Data
▪ Identify data sets
▪ Retrieve data
▪ Query data
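The three activities of the acquire step can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration under stated assumptions: the in-memory database, the `sales` table, and its rows are made up for the example and stand in for whatever real source a project uses.

```python
import sqlite3

# Identify: a hypothetical "sales" table stands in for a real data set.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE sales (region TEXT, amount REAL)")
connection.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("south", 80.5), ("north", 45.0), ("east", 60.0)],
)

# Retrieve: pull every row into memory.
all_rows = connection.execute("SELECT region, amount FROM sales").fetchall()

# Query: ask the source for only what the analysis needs.
totals_by_region = connection.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()

print(len(all_rows), "rows retrieved")
print(totals_by_region)
connection.close()
```

Pushing the aggregation into the query, rather than retrieving everything first, is often the cheaper choice when the source is large.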
Steps in The Data Science Process
PREPARE
Step 2: Prepare Data
▪ Explore Data
➢ Understand nature of data
➢ Preliminary analysis
▪ Pre-process Data
➢ Clean
➢ Transform
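The prepare step (explore, then clean and transform) can be illustrated with the standard library alone. This is a sketch, not a prescribed workflow: the CSV content, column names, and the choice to drop rows with a missing age are all assumptions made for the example.

```python
import csv
import io

# Hypothetical raw export with typical problems: stray spaces,
# inconsistent casing, and a missing value.
raw_csv = """name,age,city
 Alice ,34,tunis
Bob,,Sfax
carol,29, SOUSSE
"""

rows = list(csv.DictReader(io.StringIO(raw_csv)))

# Explore: how many records are there, and how many ages are missing?
missing_ages = sum(1 for row in rows if not row["age"].strip())
print(f"{len(rows)} rows, {missing_ages} with a missing age")

# Clean: trim whitespace and drop records without an age.
# Transform: normalise casing and convert age to an integer.
prepared = [
    {
        "name": row["name"].strip().title(),
        "age": int(row["age"]),
        "city": row["city"].strip().title(),
    }
    for row in rows
    if row["age"].strip()
]
print(prepared)
```

Dropping incomplete rows is only one option; filling in a default or an estimate is another, and the right choice depends on the analysis that follows.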
Steps in The Data Science Process
ANALYZE
Step 3: Analyze Data
▪ Select analytical techniques
▪ Build models
[Figure panels: Classification (spam example), Regression, Clustering, Dimensionality Reduction]
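As one concrete instance of "build models", regression can be sketched as a straight-line fit by ordinary least squares. The data below is made up for illustration, and `fit_line` is a hypothetical helper written for this example rather than a library function.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope is the co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: hours studied vs. exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

slope, intercept = fit_line(hours, scores)
predicted = slope * 6 + intercept
print(f"score = {slope:.2f} * hours + {intercept:.2f}")
print(f"predicted score after 6 hours: {predicted:.1f}")
```

The other techniques named on the slide (classification, clustering, dimensionality reduction) follow the same pattern: choose a technique, fit it to prepared data, then use the fitted model.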
Steps in The Data Science Process
REPORT
Step 4: Communicate Results
▪ What to present
▪ How to present
Steps in The Data Science Process
ACT
Step 5: Turning Insights into Action
▪ Results
▪ Purpose
Career in Data Science
Domain Expertise
❖ Health care
❖ Retail
❖ Finance
❖ Education
❖ …
Programming languages
❖ Python
❖ R
❖ C++
❖ Java
❖ Julia
❖ Scala
❖ …
Projects
❖ Sentiment analysis
❖ Card fraud detection
❖ Customer segmentation
❖ Image classification
❖ Loan default detection
❖ …
Lingo/Foundations
❖ Machine Learning
❖ Deep Learning
❖ Classification
❖ Regression
❖ Clustering
❖ Decision trees
❖ KNN
❖ SVM
❖ K-means
❖ PAC
❖ …
Math/Statistics/Probability
❖ Linear algebra
❖ Bayes' theorem
❖ Mean, median, and mode
❖ Covariance and correlation
❖ Central Limit Theorem
❖ Normal distribution
❖ …
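Several of the statistics topics listed above (mean, median, mode, covariance, correlation) can be tried out in a few lines. This is a small sketch on made-up numbers; `sample_covariance` and `pearson_correlation` are helpers written for this example, using the sample (n - 1) convention.

```python
import statistics

# Made-up paired observations.
x = [2, 4, 4, 5, 7, 8]
y = [1, 3, 4, 4, 6, 9]

print("mean:", statistics.mean(x))
print("median:", statistics.median(x))
print("mode:", statistics.mode(x))


def sample_covariance(a, b):
    """Sample covariance, dividing by n - 1."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (len(a) - 1)


def pearson_correlation(a, b):
    """Covariance rescaled by both sample standard deviations."""
    return sample_covariance(a, b) / (statistics.stdev(a) * statistics.stdev(b))


cov_xy = sample_covariance(x, y)
corr_xy = pearson_correlation(x, y)
print(f"covariance: {cov_xy:.3f}, correlation: {corr_xy:.3f}")
```

Covariance depends on the units of both variables, while correlation is rescaled to lie between -1 and 1, which makes it easier to compare across data sets.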
Data Science Tools
Visualization Tools
Modeling Tools