1. INTRODUCTION TO DATA SCIENCE
CHAPTER 1
“Introduction to Data Science: Practical Approach with R and Python”
B. Uma Maheswari and R. Sujatha
Copyright © 2021 Wiley India Pvt. Ltd. All rights reserved.
2. LEARNING OBJECTIVES
•Understand the concept of data science
•Briefly learn the history of data science
•Learn about the fundamental fields related to data science
•Understand the different terminologies related to data science, such as big data, business intelligence, data mining, artificial intelligence, machine learning and deep learning
•Learn about the different types of analytics: descriptive, diagnostic, predictive and prescriptive
•Learn briefly about the applications of data science.
•Comprehend the data science process model
3. DATA SCIENCE
Data Science is the science of understanding data using processes, tools and techniques which aid in decision making. It involves techniques for identifying, collecting and exploring the data using colorful plots and graphs.
4. HISTORY OF DATA SCIENCE
John W. Tukey, a mathematician, wrote about the future of the field in his 1962 article “The Future of Data Analysis”.
John Chambers, Consulting Professor at Stanford University, developed the S system. The S system is the basis for later statistical programming languages, including the R language, which is discussed in this book.
Jeff Wu, Coca-Cola Chair in Engineering Statistics and Professor at Georgia Tech, coined the term “Data Science” in 1997.
William Cleveland, Distinguished Professor of Statistics and Professor of Computer Science at Purdue University, authored many books on data visualization.
Leo Breiman, distinguished statistician at the University of California, Berkeley, was one of the pioneers in machine learning.
5. WHY IS DATA SCIENCE RECEIVING SO MUCH ATTENTION?
•Increasing usage of the internet, which has generated more data.
•Growing usage of smartphones, tablets and other digital devices.
•Increasing usage of social media.
•Increasing computational capability, with both hardware and software becoming more powerful by the day.
•Programming languages to work with such data are freely available through open source platforms.
•Programmers across the world are creating complex algorithms and contributing to the open source developers’ community.
•Easy and speedy access to such data for every individual or organization, irrespective of its size.
•Storage of data is becoming cheaper.
6. DATA, DATA AND MORE DATA
Every minute on the internet:
•Zoom hosts 2,08,333 participants in meetings
•Netflix users stream 4,00,444 hours of video
•Instagram users post 3,47,222 stories
•YouTube users upload 500 hours of video
•Twitter gains 319 new users
•Facebook users share 1,50,000 messages
•LinkedIn users apply for 69,444 jobs
•Amazon ships 6,659 packages
•WhatsApp users share 4,16,66,667 messages
According to data captured by the cloud software company Domo, as of April 2020 the internet had reached 59% of the world population.
7. FUNDAMENTAL FIELDS OF STUDY RELATING TO DATA SCIENCE
Data Science draws on:
•Computer Science
•Mathematics
•Statistics
•Domain Knowledge
9. BUSINESS INTELLIGENCE
Business Intelligence (BI) involves gathering, pre-processing and, most importantly, presenting data using data-visualization tools and techniques such as charts, plots, tables and dashboards.
10. DATA MINING
Data mining is the technology used for processing large volumes of data to generate inferences, such as:
•Identifying trends in stock prices
•Categorizing customers on the basis of their preferences
•Ascertaining the purchasing patterns of customers
•Predicting student performance in an educational institution
•Detecting lies when dealing with criminals
Applications of data mining can be seen in fields such as agriculture, education, industrial engineering, marketing and healthcare.
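One of the inferences listed above, ascertaining the purchasing patterns of customers, can be illustrated with a minimal sketch. The baskets and item names are made up for illustration, and counting item pairs that are bought together is only one simple technique, not necessarily the approach taken in the book.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets (made-up data for illustration only).
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "tea"},
    {"bread", "butter", "tea"},
    {"milk", "bread"},
]

def frequent_pairs(baskets, min_count=2):
    """Return item pairs bought together in at least `min_count` baskets."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives every pair one canonical order, e.g. ("bread", "butter")
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}

patterns = frequent_pairs(baskets)

# Print the most frequent pairs first.
for pair, n in sorted(patterns.items(), key=lambda item: -item[1]):
    print(pair, n)
```

With real transaction data, the same counting idea is usually the first step before stronger association measures are applied.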
11. ARTIFICIAL INTELLIGENCE - MACHINE LEARNING - DEEP LEARNING
Artificial Intelligence: AI is the design of smart machines or algorithms which can perform functions or tasks that generally require human intelligence.
Machine Learning: Machine Learning (ML) is a subset of artificial intelligence which refers to modelling techniques where the model learns from data on its own, without being explicitly programmed.
Deep Learning: Deep learning is a part of machine learning which works more effectively on larger datasets and aims at pattern recognition using neural networks loosely modelled on the human brain.
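To make the idea of a model that learns from data concrete, here is a minimal sketch of a nearest-neighbour classifier. The training examples (hours studied, attendance percentage, result) are made up, and nearest neighbour is chosen only because it is short; it is one of many machine learning techniques.

```python
# Hypothetical labelled examples: (hours studied, attendance %) -> result.
training_data = [
    ((2, 60), "fail"),
    ((3, 55), "fail"),
    ((8, 90), "pass"),
    ((7, 85), "pass"),
]

def squared_distance(p, q):
    """Squared Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def predict(point, examples=training_data):
    """Return the label of the training example closest to `point`."""
    features, label = min(examples, key=lambda ex: squared_distance(ex[0], point))
    return label

print(predict((6, 80)))
print(predict((1, 50)))
```

No pass or fail rule is written by hand here: the predictions depend entirely on the labelled examples, so changing the training data changes the behaviour.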
12. TYPES OF ANALYTICS
• Descriptive Analytics: What has happened?
• Diagnostic Analytics: Why did it happen?
• Predictive Analytics: What will happen?
• Prescriptive Analytics: What should we do?
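The four questions can be traced on a toy dataset. The sketch below assumes made-up monthly advertising spend and sales figures, and uses a mean, a Pearson correlation and a least-squares line as stand-ins for each type of analytics; real projects would choose techniques to suit the data.

```python
from math import sqrt

# Hypothetical monthly figures (made up): ad spend and sales.
ad_spend = [10, 20, 30, 40, 50]
sales = [120, 150, 155, 185, 200]

def mean(values):
    return sum(values) / len(values)

def deviations(values):
    """Differences between each value and the mean."""
    m = mean(values)
    return [v - m for v in values]

dx, dy = deviations(ad_spend), deviations(sales)
sxx = sum(a * a for a in dx)
syy = sum(b * b for b in dy)
sxy = sum(a * b for a, b in zip(dx, dy))

# Descriptive: what has happened? Average monthly sales.
average_sales = mean(sales)

# Diagnostic: why did it happen? Correlation between spend and sales
# (a hint at a driver, not proof of causation).
correlation = sxy / sqrt(sxx * syy)

# Predictive: what will happen? Least-squares line, evaluated at a spend of 60.
slope = sxy / sxx
intercept = mean(sales) - slope * mean(ad_spend)
forecast = intercept + slope * 60

# Prescriptive: what should we do? Spend needed to reach a sales target,
# assuming the fitted line keeps holding outside the observed range.
target = 250
recommended_spend = (target - intercept) / slope

print(average_sales, round(correlation, 3), forecast, round(recommended_spend, 1))
```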
13. DATA SCIENCE PROCESS MODEL
•Objective: The project objective needs to be identified
•Data collection: Collate the data from the different sources
•Exploratory data analysis (Chapter 3)
•Data visualization (Chapter 4)
•Dimensionality reduction (Chapter 5)
•Model building (Chapters 7-14)
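The stages of the process model can be walked through end to end in a few lines. This is a rough sketch under stated assumptions: the objective, the two data sources and all values are made up, the chart is a text stand-in for a real plot, and dropping constant columns is only a very simple substitute for the dimensionality reduction techniques covered later.

```python
from statistics import mean, pvariance

# 1. Objective (hypothetical): estimate a student's exam score.

# 2. Data collection: collate records from two made-up sources.
source_a = [{"hours": 2, "campus": 1, "score": 54},
            {"hours": 4, "campus": 1, "score": 58}]
source_b = [{"hours": 6, "campus": 1, "score": 62},
            {"hours": 8, "campus": 1, "score": 66}]
records = source_a + source_b

# 3. Exploratory data analysis: minimum, mean and maximum of each column.
columns = {name: [r[name] for r in records] for name in records[0]}
summary = {name: (min(v), mean(v), max(v)) for name, v in columns.items()}

# 4. Data visualization: a crude text bar chart of the scores.
for r in records:
    print(f"{r['hours']:>2} h | " + "#" * (r["score"] // 2))

# 5. Dimensionality reduction: drop features that never vary.
features = [name for name in columns
            if name != "score" and pvariance(columns[name]) > 0]

# 6. Model building: least-squares line on the one remaining feature.
x, y = columns[features[0]], columns["score"]
mx, my = mean(x), mean(y)
slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
intercept = my - slope * mx

print(features, slope, intercept)
```

Each numbered comment corresponds to one stage of the process model, so individual stages can be swapped for the richer methods in the chapters referenced above.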