An Introduction to Data Science
FYBSC
SYLLABUS
Introduction to Data Science
Types of data
Evolution of Data Science
Data Science Roles
Stages in a Data Science Project
Applications of Data Science in various fields
Data Security Issues
Data Collection Strategies
Data Preprocessing Overview
Data v/s Information
● Data -> always in raw form; stored as 0's and 1's.
● Information -> processed form of data.
Raw data:
A 66 66 99
B 55 88 98
C 66 87 89
-> Process ->
Information:
Roll no   mks1   mks2   mks3
A         66     66     99
B         55     88     98
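The marks example above can be sketched in Python to show what "processing" might mean in practice. This is a minimal illustration, not the method used in the slides: the function name `process`, the field names, and the choice of total and average as the derived values are all assumptions made for this sketch.

```python
# Raw data: unlabelled rows, as they might arrive from a file.
raw_rows = ["A 66 66 99", "B 55 88 98", "C 66 87 89"]


def process(rows):
    """Turn raw rows into labelled records with a total and an average.

    The field names and derived values are assumptions for illustration.
    """
    records = []
    for row in rows:
        roll_no, *marks = row.split()
        marks = [int(m) for m in marks]
        records.append({
            "roll_no": roll_no,
            "marks": marks,
            "total": sum(marks),
            "average": round(sum(marks) / len(marks), 2),
        })
    return records


information = process(raw_rows)

# With labelled, summarised records a decision becomes easy,
# for example finding the student with the highest total.
topper = max(information, key=lambda record: record["total"])

for record in information:
    print(record["roll_no"], record["total"], record["average"])
print("Highest total:", topper["roll_no"])
```

The raw rows alone do not say who performed best; the processed records answer that question directly.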
Data v/s Information
                          Data                                  Information
Meaning                   Raw facts                             Processed facts
Method of collection      Random collection                     Specific collection
Format of collection      Unorganized form of collection        Systematic form of processed data
Consists of               Text and numbers                      Refined form of data
Can we take a decision?   Decision making is difficult          Easy to take a decision
Dependency                Data does not depend on information   Information depends on data
Based on                  Records and observations              Analysis
Examples…
Data Shape -> how data is represented in business and storage form
Types of data
Further classification of data
● Demographic data (this customer is a woman, 35 years old, has two children, etc.).
● Transactional data (the products she buys each time, the time of purchases, etc.).
● Web behaviour data (the products she puts into her basket when she shops online).
● Data from customer-created texts (comments about the retailer that this woman leaves on the internet).
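One way to picture these four categories is as separate fields of a single customer record. The sketch below is only an illustration: the class name `CustomerProfile`, its field names, and all sample values (products, timestamp, comment text) are hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class CustomerProfile:
    """Hypothetical container holding the four kinds of customer data."""
    demographic: dict
    transactions: list = field(default_factory=list)
    web_behaviour: list = field(default_factory=list)
    customer_texts: list = field(default_factory=list)


# All values below are made up for illustration.
customer = CustomerProfile(
    demographic={"gender": "woman", "age": 35, "children": 2},
)
customer.transactions.append({"product": "notebook", "time": "2024-01-05 10:30"})
customer.web_behaviour.append({"action": "add_to_basket", "product": "pen"})
customer.customer_texts.append("Delivery was quick.")


def data_kinds_present(profile):
    """Return the names of the data categories that hold at least one entry."""
    kinds = {
        "demographic": profile.demographic,
        "transactional": profile.transactions,
        "web behaviour": profile.web_behaviour,
        "customer-created text": profile.customer_texts,
    }
    return [name for name, values in kinds.items() if values]


print(data_kinds_present(customer))
```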
DBMS… way of data extraction
Problems faced by current DBMS
● Large quantities of data are generated and processed.
● The volume of data may double every 3 months or so.
● Extracting knowledge from this massive data is the main requirement.
● Fast development in computer science and engineering techniques generates new demands.
● To fulfil those demands, we need to analyze the data.
● Data rich, information poor: raw data by itself does not provide much information.
● Today we need only the significant data, from which we can judge customers' likings and strategies.
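To get a feel for the doubling claim in the list above, the growth can be worked out with a few lines of arithmetic. This is a rough sketch that takes the "doubles every 3 months" figure at face value as an assumption; the function name `growth_factor` is made up for this example.

```python
def growth_factor(months, doubling_period_months=3):
    """How many times larger the data becomes after `months`,
    assuming it doubles once every `doubling_period_months`."""
    return 2 ** (months / doubling_period_months)


# Four doublings fit into one year, so the data grows 2 * 2 * 2 * 2 times.
print(growth_factor(12))
# Eight doublings fit into two years.
print(growth_factor(24))
```

Under this assumption the volume is 16 times larger after one year and 256 times larger after two, which is why manual inspection stops being practical.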
What is data mining?
Data Mining is…
● Data mining is a powerful tool with great potential.
● It focuses on the most important information in the data.
● It gives detailed information about potential customers and their behavior.
● Extraction of useful information.
● Finding useful, valid, and understandable patterns in data.
● It is also defined as finding hidden information in a database.
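As a small illustration of "finding patterns in data", the sketch below counts which pairs of products are bought together most often. This is just one simple pattern-finding technique chosen for the example, not a definition of data mining; the basket contents, the function name `frequent_pairs`, and the `min_count` threshold are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of products per purchase.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "tea"},
    {"bread", "butter", "tea"},
]


def frequent_pairs(baskets, min_count):
    """Count how often each pair of products appears in the same basket
    and keep only the pairs seen at least `min_count` times."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives each pair a single, consistent ordering.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


patterns = frequent_pairs(transactions, min_count=2)
print(patterns)
```

Here the only pair that survives the threshold is bread and butter, a pattern that is not obvious from any single basket.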
Why Big Data?
● Old model: a few companies generate data, and all others consume it.
● New model: all of us generate data, and all of us consume it.
Examples of big data and ML
● Customer analytics: demographic data, transactional data, web behaviour data, data from customer-created texts
● Industrial analytics: sensor data, machine breakdown
● Business process analytics (Ola): performance of employees, patterns, fraud detection, real-time location mapping, behaviour patterns
Data sources..
What is Data Science?
● Uses various tools, algorithms, and machine learning principles
● Involves obtaining meaningful information
● Involves elements like mathematics, statistics, and computer science
How Data Science Works?
● Problem Statement
● Data Collection
● Data Cleaning
● Data Analysis and Exploration
● Data Modelling
● Optimization and Deployment
The Data Science Lifecycle
● Capture: Data Acquisition, Data Entry, Signal Reception, Data Extraction. This stage involves gathering raw structured and unstructured data.
● Maintain: Data Warehousing, Data Cleansing, Data Staging, Data Processing, Data Architecture. This stage covers taking the raw data and putting it in a form that can be used.
● Process: Data Mining, Clustering/Classification, Data Modeling, Data Summarization. Data scientists take the prepared data and examine its patterns, ranges, and biases to determine how useful it will be in predictive analysis.
● Analyze: Exploratory/Confirmatory, Predictive Analysis, Regression, Text Mining, Qualitative Analysis. Here is the real meat of the lifecycle. This stage involves performing the various analyses on the data.
● Communicate: Data Reporting, Data Visualization, Business Intelligence, Decision Making. In this final step, analysts prepare the analyses in easily readable forms such as charts, graphs, and reports.
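The five stages above can be pictured as a chain of small functions, each feeding the next. The sketch below is a deliberately tiny illustration under stated assumptions: the sample readings are made up, and each function stands in for a whole stage that would be far larger in a real project.

```python
from statistics import mean


def capture():
    """Capture: gather raw data (hypothetical readings, some unusable)."""
    return ["21.5", "22.0", None, "n/a", "23.5"]


def maintain(raw):
    """Maintain: cleanse the raw data into a usable numeric form."""
    cleaned = []
    for value in raw:
        try:
            cleaned.append(float(value))
        except (TypeError, ValueError):
            # Skip entries that cannot be read as numbers.
            continue
    return cleaned


def process(values):
    """Process: summarise the prepared data (count and range)."""
    return {"count": len(values), "min": min(values), "max": max(values)}


def analyze(values):
    """Analyze: run an analysis, here just the mean."""
    return {"mean": round(mean(values), 2)}


def communicate(summary, analysis):
    """Communicate: present the result in an easily readable form."""
    return (f"{summary['count']} readings between {summary['min']} "
            f"and {summary['max']}, mean {analysis['mean']}")


raw = capture()
clean = maintain(raw)
report = communicate(process(clean), analyze(clean))
print(report)
```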
What is Data Science?
Data science is an interdisciplinary field that uses algorithms, procedures, and processes to examine large amounts of data in order to uncover hidden patterns, generate insights, and direct decision making.
01 Importance of Data Science
Career Opportunities
"The rise of Data Science needs will create roughly 11.5 million job openings by 2026." (US Bureau of Labor Statistics)
"By 2026, Data Scientists and Analysts will become the number one emerging role in the world." (World Economic Forum)
Data Science and Artificial Intelligence are among the hottest fields of the 21st century and will impact all segments of daily life by 2025, from transport and logistics to healthcare and customer service.
Examples:
Oil giant Shell, for instance, used data science to anticipate machine failure at facilities across the world.
Agricultural company Cargill developed a mobile data-tracking app that helps shrimp farmers reduce mortality rates.
Dr. Pepper Snapple Group analyzed data with machine learning to glean more details about beverage sales and vendors.
And freight company Pitt Ohio used historical data and predictive analytics to estimate delivery time with 99 percent accuracy.
Facts on Data Generation
• Statistics show that more than 500 terabytes of new data are entered into the databases of the social networking site Facebook every day.
• A single jet engine can generate over 10 terabytes of data in 30 minutes of flight time. With several thousand flights per day, data generation reaches several petabytes.
• A stock exchange is also an example of big data, generating about a terabyte of new trade data per day.
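The jet engine figure can be checked with simple arithmetic. The sketch below assumes the quoted rate of 10 terabytes per 30 minutes, counts one engine per flight, and uses 1 petabyte = 1,000 terabytes; the number of flights and the flight length are hypothetical inputs, not figures from the slides.

```python
TB_PER_HALF_HOUR = 10  # rate quoted above, per engine


def daily_petabytes(flights_per_day, flight_minutes=30, engines=1):
    """Estimate data generated per day, in petabytes (1 PB = 1,000 TB).

    `flights_per_day`, `flight_minutes`, and `engines` are assumptions
    supplied by the caller.
    """
    terabytes = (flights_per_day * engines
                 * TB_PER_HALF_HOUR * (flight_minutes / 30))
    return terabytes / 1000


# 1,000 half-hour flights with one engine each.
print(daily_petabytes(1000))
```

Even with these conservative inputs the total is 10 petabytes per day, so the petabyte scale is reached quickly.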
02 How does Data Science Work?
How does Data Science Work?
● Collect Data: Raw data is gathered from various sources that explain the business problem.
● Analyze Data: Using various statistical analysis and machine learning approaches, data modeling is performed to get the optimum solutions that best explain the business problem.
● Insights: Actionable insights, gathered through data science, serve as a solution to the business problem.
Consider an Example!
Suppose there is an organization that is working towards finding potential leads for its sales team. It can take the following approach to get an optimal solution using Data Science:
● Collect Data: Gather the previous data on the sales that were closed.
● Analyze Data: Use statistical analysis to find the patterns followed by the leads that were closed.
● Insights: Use machine learning to get actionable insights for finding potential leads.
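The sales-lead example can be sketched end to end in a few lines. Note the simplification: instead of a machine learning model, this sketch uses the historical close rate per lead source as a stand-in score. The lead records, the `source` field, and the function names are all hypothetical.

```python
from collections import defaultdict

# Collect: hypothetical history of past leads and whether each sale closed.
past_leads = [
    {"source": "referral", "closed": True},
    {"source": "referral", "closed": True},
    {"source": "referral", "closed": False},
    {"source": "cold_call", "closed": False},
    {"source": "cold_call", "closed": False},
    {"source": "cold_call", "closed": True},
    {"source": "webinar", "closed": True},
    {"source": "webinar", "closed": False},
]


def close_rates(leads):
    """Analyze: fraction of leads from each source that ended in a sale."""
    totals = defaultdict(int)
    wins = defaultdict(int)
    for lead in leads:
        totals[lead["source"]] += 1
        wins[lead["source"]] += int(lead["closed"])
    return {source: wins[source] / totals[source] for source in totals}


def rank_leads(new_leads, rates):
    """Insights: order new leads by the historical close rate of their source.

    Sources never seen before get a score of 0.0 (an assumption).
    """
    return sorted(new_leads,
                  key=lambda lead: rates.get(lead["source"], 0.0),
                  reverse=True)


rates = close_rates(past_leads)
new_leads = [
    {"name": "Lead 1", "source": "cold_call"},
    {"name": "Lead 2", "source": "referral"},
    {"name": "Lead 3", "source": "webinar"},
]
ranked = rank_leads(new_leads, rates)
print([lead["name"] for lead in ranked])
```

A real project would likely replace the per-source rate with a trained model using many more features, but the collect, analyze, insights flow stays the same.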
Let's check the relationship between AI and Data Science
“In the above example, we saw that machine learning is required for insights.”
03 AI and Data Science
Data science and artificial intelligence are not the same.
“Data science and artificial intelligence are two technologies that are transforming the world. While artificial intelligence powers data science operations, data science is not completely dependent on AI. Data Science is leading the fourth industrial revolution.”
Data science also requires machine learning algorithms, which results in a dependency on AI.
Comparison Between AI and Data Science
• Data science jobs require knowledge of languages commonly used for ML, like R and Python, to perform various data operations, along with computer science expertise.
• Data science uses more tools apart from AI. This is because data science involves multiple steps to analyze data and generate insights.
• Data science models are built for statistical insights, whereas AI is used to build models that mimic cognition and human understanding.
• Today's industries require both data science and artificial intelligence. Data science helps them make necessary data-driven decisions and assess their performance in the market, while artificial intelligence helps industries work with smarter devices and software that minimize workload and optimize processes for improved innovation.
Class Activity 1
● Justify the role of a data scientist.
● What are the prerequisites for Data Science?
● How can one observe different types of data in "Identifying a particular type of disease"?
● What are the responsibilities of a Data Scientist, a Data Analyst, and a Data Engineer?
