SlideShare a Scribd company logo
CS3352 -Foundations of Data
Science
Introduction to Data Science
Dr.V.Anusuya
Associate Professor/IT
Ramco Institute of Technology
Rajapalayam
Introduction
• Data Science is the area of study which involves extracting
insights from vast amounts of data using various scientific
methods, algorithms and processes.
• Data Science is useful to discover hidden patterns from the
voluminous raw data.
• Data Science is because of the evolution of mathematical
statistics, data analysis, and big data.
Contd.,
• Ingredients = Data
• When a chef is starting out with a new dish.
• After the chef has determined what type of dish he would like to
make, he goes to the fridge to gather ingredients. If he doesn’t
have the necessary ingredients, he takes a trip to the store to
collect them.
• As data scientists, our data points are our ingredients. We may
have some on hand, but we still might have to collect more
through web scraping, SQL queries, etc.
Big Data
• Data Science focuses on processing the huge
volume of heterogeneous data known as big
data.
• Big data refers to the data sets that are large
and complex in nature and difficult to process
using traditional data processing application
software.
Contd.,
• The characteristics of big data are often referred
to as the three Vs:
• Volume—How much data is there?
• Variety—How diverse are different types of
data?
• Velocity—At what speed is new data generated?
• Veracity-How accurate is the data?.
• These four properties make big data different
from the data found in traditional data
management tools.
Contd.,
• The four V’s of data make the data
capturing,cleaning,preprocessing,storing,search
ing,sharing,transferring and visualization
processes a complex task.
• Specialization techniques are required to extract
the insights from this huge volume of data.
• The main things that set a data scientist apart
from a statistician are the ability to work with
big data and experience in machine learning,
computing, and algorithm building.
Big data
• Tools- Hadoop, Pig, Spark, R, Python, and Java.
• Python is a great language for data science
because it has many data science libraries
available, and it’s widely supported by
specialized software.
Contd.,
• Data Science is an interdisciplinary field that
allows to extract knowledge from structured
or unstructured data.
• Translate business problem and Research
project into practical solution.
Benefits and uses of data science
• Data science and big data are used almost everywhere
in both commercial and noncommercial Settings.
• Commercial companies in almost every industry use
data science and big data to gain insights into their
customers, processes, staff, completion, and products.
• Many companies use data science to offer customers a
better user experience, as well as to cross-sell, up-sell,
and personalize their offerings.
Benefits and uses of data science
• Governmental organizations are also aware of data’s value. Many
governmental organizations not only rely on internal data scientists to
discover valuable information, but also share their data with the
public.
• Nongovernmental organizations (NGOs) use it to raise money and
defend their causes.
• Universities use data science in their research but also to enhance the
study experience of their students. The rise of massive open online
courses (MOOC) produces a lot of data, which allows universities to
study how this type of learning can complement traditional classes.
Facets of data
• In data science and big data we’ll come across
many different types of data, and each of them
tends to require different tools and techniques. The
main categories of data are these:
• Structured
• Unstructured
• Natural language
• Machine-generated
• Graph-based
• Audio, video, and images
• Streaming
Structured Data
•Structured data is data that depends on a
data model and resides in a fixed field within
a record.
•To store structured data in tables within
databases or Excel files.
•SQL, or Structured Query Language, is the
preferred way to manage and query data
that resides in databases.
•Hierarchical data such as a family tree.
Example
Unstructured Data
• Unstructured data is data that isn’t easy to fit
into a data model because the content is
context-specific or varying.
• Example of unstructured data is your regular
email.
• Although email contains structured elements
such as the sender, title, and body text.
• The thousands of different languages and
dialects out there further complicate this.
Example
Natural language
• Natural language is a special type of unstructured data; it’s challenging to
process because it requires knowledge of specific data science techniques
and linguistics.
• The natural language processing community has had success in entity
recognition, Speech recognition, summarization, text completion, and
sentiment analysis, but models trained in one domain don’t generalize well to
other domains.
• Even state-of-the-art techniques aren’t able to decipher the meaning of every
piece of text.
• Example: personal voice Assistant, siri, Alexa recognize patterns in speech,
provide a useful response.
• Example: Email Filters- Primary, Social ,Promotions
Machine-generated data
• Machine-generated data is information that’s
automatically created by a computer, process,
application, or other machine without human
intervention.
• Machine-generated data is becoming a major
data resource and will continue to do so.
• The analysis of machine data relies on highly
scalable tools, due to its high volume and
speed. Examples of machine data are web
server logs, call detail records, network event
logs, and telemetry.
Graph-based or Networked Data
• A graph is a mathematical structure to model
pair-wise relationships between objects.
• Graph or network data is, in short, data that
focuses on the relationship or adjacency of
objects.
• The graph structures use nodes, edges, and
properties to represent and store graphical data.
Contd.,
• Graph-based data is a natural way to represent
social networks, and its structure allows you to
calculate specific metrics such as the influence
of a person and the shortest path between two
people.
• Example
• LinkedIn-business colleagues, Twitter-
follower list.
• Facebook-connecting edges here to show
friends.
Graph based structure
Specialized query languages such as SPARQL
Audio, image, and video
• Audio, image, and video are data types that
pose specific challenges to a data scientist.
• Tasks that are trivial for humans, such as
recognizing objects in pictures, to be challenging
for computers.
• MLBAM (Major League Baseball Advanced
Media) announced in 2014 that they’ll increase
video capture to approximately 7 TB per game
for the purpose of live, in-game analytics.
Contd.,
• Recently a company called DeepMind
succeeded at creating an algorithm that’s
capable of learning how to play video games.
• This algorithm takes the video screen as input
and learns to interpret everything via a
complex of deep learning.
Streaming Data
• The data flows into the system when an event
happens instead of being loaded into a data
store in a batch.
• Examples are the “What’s trending” on
Twitter, live sporting or music events, and the
stock market.

More Related Content

What's hot

Data Science Process.pptx
Data Science Process.pptxData Science Process.pptx
Data Science Process.pptx
WidsoulDevil
 
Types of Database Models
Types of Database ModelsTypes of Database Models
Types of Database Models
Murassa Gillani
 
Data mining and data warehouse lab manual updated
Data mining and data warehouse lab manual updatedData mining and data warehouse lab manual updated
Data mining and data warehouse lab manual updated
Yugal Kumar
 
14. Query Optimization in DBMS
14. Query Optimization in DBMS14. Query Optimization in DBMS
14. Query Optimization in DBMS
koolkampus
 
PROCEDURAL AND DECLARATIVE KNOWLEDGE IN AI & ML (1).pptx
PROCEDURAL AND DECLARATIVE KNOWLEDGE IN AI & ML (1).pptxPROCEDURAL AND DECLARATIVE KNOWLEDGE IN AI & ML (1).pptx
PROCEDURAL AND DECLARATIVE KNOWLEDGE IN AI & ML (1).pptx
ShantanuDharekar
 
Introduction to data structures (ss)
Introduction to data structures (ss)Introduction to data structures (ss)
Introduction to data structures (ss)
Madishetty Prathibha
 
R Programming: Introduction to Matrices
R Programming: Introduction to MatricesR Programming: Introduction to Matrices
R Programming: Introduction to Matrices
Rsquared Academy
 
OLAP operations
OLAP operationsOLAP operations
OLAP operations
kunj desai
 
Unit 1-Data Science Process Overview.pptx
Unit 1-Data Science Process Overview.pptxUnit 1-Data Science Process Overview.pptx
Unit 1-Data Science Process Overview.pptx
Anusuya123
 
Odbms concepts
Odbms conceptsOdbms concepts
Odbms concepts
Dabbal Singh Mahara
 
Data Preprocessing
Data PreprocessingData Preprocessing
Decision tree induction \ Decision Tree Algorithm with Example| Data science
Decision tree induction \ Decision Tree Algorithm with Example| Data scienceDecision tree induction \ Decision Tree Algorithm with Example| Data science
Decision tree induction \ Decision Tree Algorithm with Example| Data science
MaryamRehman6
 
Data warehousing - Dr. Radhika Kotecha
Data warehousing - Dr. Radhika KotechaData warehousing - Dr. Radhika Kotecha
Data warehousing - Dr. Radhika Kotecha
Radhika Kotecha
 
Social Impacts & Trends of Data Mining
Social Impacts & Trends of Data MiningSocial Impacts & Trends of Data Mining
Social Impacts & Trends of Data Mining
SushilDhakal4
 
Chapter 5 Database Transaction Management
Chapter 5 Database Transaction ManagementChapter 5 Database Transaction Management
Chapter 5 Database Transaction Management
Eddyzulham Mahluzydde
 
Data structures
Data structuresData structures
Data structures
Naresh Babu Merugu
 
DBMS
DBMSDBMS
Complexity of Algorithm
Complexity of AlgorithmComplexity of Algorithm
Complexity of Algorithm
Muhammad Muzammal
 
Data Structure - Elementary Data Organization
Data Structure - Elementary  Data Organization Data Structure - Elementary  Data Organization
Data Structure - Elementary Data Organization
Uma mohan
 
DATA WRANGLING presentation.pptx
DATA WRANGLING presentation.pptxDATA WRANGLING presentation.pptx
DATA WRANGLING presentation.pptx
AbdullahAbbasi55
 

What's hot (20)

Data Science Process.pptx
Data Science Process.pptxData Science Process.pptx
Data Science Process.pptx
 
Types of Database Models
Types of Database ModelsTypes of Database Models
Types of Database Models
 
Data mining and data warehouse lab manual updated
Data mining and data warehouse lab manual updatedData mining and data warehouse lab manual updated
Data mining and data warehouse lab manual updated
 
14. Query Optimization in DBMS
14. Query Optimization in DBMS14. Query Optimization in DBMS
14. Query Optimization in DBMS
 
PROCEDURAL AND DECLARATIVE KNOWLEDGE IN AI & ML (1).pptx
PROCEDURAL AND DECLARATIVE KNOWLEDGE IN AI & ML (1).pptxPROCEDURAL AND DECLARATIVE KNOWLEDGE IN AI & ML (1).pptx
PROCEDURAL AND DECLARATIVE KNOWLEDGE IN AI & ML (1).pptx
 
Introduction to data structures (ss)
Introduction to data structures (ss)Introduction to data structures (ss)
Introduction to data structures (ss)
 
R Programming: Introduction to Matrices
R Programming: Introduction to MatricesR Programming: Introduction to Matrices
R Programming: Introduction to Matrices
 
OLAP operations
OLAP operationsOLAP operations
OLAP operations
 
Unit 1-Data Science Process Overview.pptx
Unit 1-Data Science Process Overview.pptxUnit 1-Data Science Process Overview.pptx
Unit 1-Data Science Process Overview.pptx
 
Odbms concepts
Odbms conceptsOdbms concepts
Odbms concepts
 
Data Preprocessing
Data PreprocessingData Preprocessing
Data Preprocessing
 
Decision tree induction \ Decision Tree Algorithm with Example| Data science
Decision tree induction \ Decision Tree Algorithm with Example| Data scienceDecision tree induction \ Decision Tree Algorithm with Example| Data science
Decision tree induction \ Decision Tree Algorithm with Example| Data science
 
Data warehousing - Dr. Radhika Kotecha
Data warehousing - Dr. Radhika KotechaData warehousing - Dr. Radhika Kotecha
Data warehousing - Dr. Radhika Kotecha
 
Social Impacts & Trends of Data Mining
Social Impacts & Trends of Data MiningSocial Impacts & Trends of Data Mining
Social Impacts & Trends of Data Mining
 
Chapter 5 Database Transaction Management
Chapter 5 Database Transaction ManagementChapter 5 Database Transaction Management
Chapter 5 Database Transaction Management
 
Data structures
Data structuresData structures
Data structures
 
DBMS
DBMSDBMS
DBMS
 
Complexity of Algorithm
Complexity of AlgorithmComplexity of Algorithm
Complexity of Algorithm
 
Data Structure - Elementary Data Organization
Data Structure - Elementary  Data Organization Data Structure - Elementary  Data Organization
Data Structure - Elementary Data Organization
 
DATA WRANGLING presentation.pptx
DATA WRANGLING presentation.pptxDATA WRANGLING presentation.pptx
DATA WRANGLING presentation.pptx
 

Similar to Introduction to Data Science.pptx

Data science.chapter-1,2,3
Data science.chapter-1,2,3Data science.chapter-1,2,3
Data science.chapter-1,2,3
varshakumar21
 
Data science unit1
Data science unit1Data science unit1
Data science unit1
varshakumar21
 
ch2 DS.pptx
ch2 DS.pptxch2 DS.pptx
ch2 DS.pptx
derbew2112
 
Big data ppt
Big data pptBig data ppt
Big data ppt
Deepika ParthaSarathy
 
Digital data
Digital dataDigital data
Digital data
ShivanandaVSeeri
 
Digital Types
Digital TypesDigital Types
Digital Types
ShivanandaVSeeri
 
Big data
Big dataBig data
big data processing.pptx
big data processing.pptxbig data processing.pptx
big data processing.pptx
ssuser96aab9
 
Behind the scenes of data science
Behind the scenes of data scienceBehind the scenes of data science
Behind the scenes of data science
Loïc Lejoly
 
Data Science ppt for the asjdbhsadbmsnc.pptx
Data Science ppt for the asjdbhsadbmsnc.pptxData Science ppt for the asjdbhsadbmsnc.pptx
Data Science ppt for the asjdbhsadbmsnc.pptx
sa3302
 
Data Scientist By: Professor Lili Saghafi
Data Scientist By: Professor Lili SaghafiData Scientist By: Professor Lili Saghafi
Data Scientist By: Professor Lili Saghafi
Professor Lili Saghafi
 
Dw 07032018-dr pl pradhan
Dw 07032018-dr pl pradhanDw 07032018-dr pl pradhan
Dw 07032018-dr pl pradhan
Dr Pradhan PL Pradhan
 
BIg Data Overview
BIg Data OverviewBIg Data Overview
BIg Data Overview
dimantoku
 
semana1.pptx
semana1.pptxsemana1.pptx
semana1.pptx
AidaVivancoLuna1
 
DATA SCIENCE IS CATALYZING BUSINESS AND INNOVATION
DATA SCIENCE IS CATALYZING BUSINESS AND INNOVATION DATA SCIENCE IS CATALYZING BUSINESS AND INNOVATION
DATA SCIENCE IS CATALYZING BUSINESS AND INNOVATION
Elvis Muyanja
 
01-Introduction.pdf
01-Introduction.pdf01-Introduction.pdf
01-Introduction.pdf
ngVnThng12
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
Umair Shafique
 
Unit-I- Introduction- Traits of Big Data-Final.pptx
Unit-I- Introduction- Traits of Big Data-Final.pptxUnit-I- Introduction- Traits of Big Data-Final.pptx
Unit-I- Introduction- Traits of Big Data-Final.pptx
subhashchandra197
 
Ch~2.pdf
Ch~2.pdfCh~2.pdf
Introduction to Big Data Analytics
Introduction to Big Data AnalyticsIntroduction to Big Data Analytics
Introduction to Big Data Analytics
Utkarsh Sharma
 

Similar to Introduction to Data Science.pptx (20)

Data science.chapter-1,2,3
Data science.chapter-1,2,3Data science.chapter-1,2,3
Data science.chapter-1,2,3
 
Data science unit1
Data science unit1Data science unit1
Data science unit1
 
ch2 DS.pptx
ch2 DS.pptxch2 DS.pptx
ch2 DS.pptx
 
Big data ppt
Big data pptBig data ppt
Big data ppt
 
Digital data
Digital dataDigital data
Digital data
 
Digital Types
Digital TypesDigital Types
Digital Types
 
Big data
Big dataBig data
Big data
 
big data processing.pptx
big data processing.pptxbig data processing.pptx
big data processing.pptx
 
Behind the scenes of data science
Behind the scenes of data scienceBehind the scenes of data science
Behind the scenes of data science
 
Data Science ppt for the asjdbhsadbmsnc.pptx
Data Science ppt for the asjdbhsadbmsnc.pptxData Science ppt for the asjdbhsadbmsnc.pptx
Data Science ppt for the asjdbhsadbmsnc.pptx
 
Data Scientist By: Professor Lili Saghafi
Data Scientist By: Professor Lili SaghafiData Scientist By: Professor Lili Saghafi
Data Scientist By: Professor Lili Saghafi
 
Dw 07032018-dr pl pradhan
Dw 07032018-dr pl pradhanDw 07032018-dr pl pradhan
Dw 07032018-dr pl pradhan
 
BIg Data Overview
BIg Data OverviewBIg Data Overview
BIg Data Overview
 
semana1.pptx
semana1.pptxsemana1.pptx
semana1.pptx
 
DATA SCIENCE IS CATALYZING BUSINESS AND INNOVATION
DATA SCIENCE IS CATALYZING BUSINESS AND INNOVATION DATA SCIENCE IS CATALYZING BUSINESS AND INNOVATION
DATA SCIENCE IS CATALYZING BUSINESS AND INNOVATION
 
01-Introduction.pdf
01-Introduction.pdf01-Introduction.pdf
01-Introduction.pdf
 
Introduction to Big Data
Introduction to Big DataIntroduction to Big Data
Introduction to Big Data
 
Unit-I- Introduction- Traits of Big Data-Final.pptx
Unit-I- Introduction- Traits of Big Data-Final.pptxUnit-I- Introduction- Traits of Big Data-Final.pptx
Unit-I- Introduction- Traits of Big Data-Final.pptx
 
Ch~2.pdf
Ch~2.pdfCh~2.pdf
Ch~2.pdf
 
Introduction to Big Data Analytics
Introduction to Big Data AnalyticsIntroduction to Big Data Analytics
Introduction to Big Data Analytics
 

More from Anusuya123

Unit-III Correlation and Regression.pptx
Unit-III Correlation and Regression.pptxUnit-III Correlation and Regression.pptx
Unit-III Correlation and Regression.pptx
Anusuya123
 
Types of Data-Introduction.pptx
Types of Data-Introduction.pptxTypes of Data-Introduction.pptx
Types of Data-Introduction.pptx
Anusuya123
 
Basic Statistical Descriptions of Data.pptx
Basic Statistical Descriptions of Data.pptxBasic Statistical Descriptions of Data.pptx
Basic Statistical Descriptions of Data.pptx
Anusuya123
 
Data warehousing.pptx
Data warehousing.pptxData warehousing.pptx
Data warehousing.pptx
Anusuya123
 
5.2.2. Memory Consistency Models.pptx
5.2.2. Memory Consistency Models.pptx5.2.2. Memory Consistency Models.pptx
5.2.2. Memory Consistency Models.pptx
Anusuya123
 
5.1.3. Chord.pptx
5.1.3. Chord.pptx5.1.3. Chord.pptx
5.1.3. Chord.pptx
Anusuya123
 
3. Descriptive statistics.ppt
3. Descriptive statistics.ppt3. Descriptive statistics.ppt
3. Descriptive statistics.ppt
Anusuya123
 
1. Intro DS.pptx
1. Intro DS.pptx1. Intro DS.pptx
1. Intro DS.pptx
Anusuya123
 
5.Collective bargaining.pptx
5.Collective bargaining.pptx5.Collective bargaining.pptx
5.Collective bargaining.pptx
Anusuya123
 
Runtimeenvironment
RuntimeenvironmentRuntimeenvironment
Runtimeenvironment
Anusuya123
 
Think pair share
Think pair shareThink pair share
Think pair share
Anusuya123
 
Lexical analyzer generator lex
Lexical analyzer generator lexLexical analyzer generator lex
Lexical analyzer generator lex
Anusuya123
 
Operators in Python
Operators in PythonOperators in Python
Operators in Python
Anusuya123
 

More from Anusuya123 (13)

Unit-III Correlation and Regression.pptx
Unit-III Correlation and Regression.pptxUnit-III Correlation and Regression.pptx
Unit-III Correlation and Regression.pptx
 
Types of Data-Introduction.pptx
Types of Data-Introduction.pptxTypes of Data-Introduction.pptx
Types of Data-Introduction.pptx
 
Basic Statistical Descriptions of Data.pptx
Basic Statistical Descriptions of Data.pptxBasic Statistical Descriptions of Data.pptx
Basic Statistical Descriptions of Data.pptx
 
Data warehousing.pptx
Data warehousing.pptxData warehousing.pptx
Data warehousing.pptx
 
5.2.2. Memory Consistency Models.pptx
5.2.2. Memory Consistency Models.pptx5.2.2. Memory Consistency Models.pptx
5.2.2. Memory Consistency Models.pptx
 
5.1.3. Chord.pptx
5.1.3. Chord.pptx5.1.3. Chord.pptx
5.1.3. Chord.pptx
 
3. Descriptive statistics.ppt
3. Descriptive statistics.ppt3. Descriptive statistics.ppt
3. Descriptive statistics.ppt
 
1. Intro DS.pptx
1. Intro DS.pptx1. Intro DS.pptx
1. Intro DS.pptx
 
5.Collective bargaining.pptx
5.Collective bargaining.pptx5.Collective bargaining.pptx
5.Collective bargaining.pptx
 
Runtimeenvironment
RuntimeenvironmentRuntimeenvironment
Runtimeenvironment
 
Think pair share
Think pair shareThink pair share
Think pair share
 
Lexical analyzer generator lex
Lexical analyzer generator lexLexical analyzer generator lex
Lexical analyzer generator lex
 
Operators in Python
Operators in PythonOperators in Python
Operators in Python
 

Recently uploaded

PPT on GRP pipes manufacturing and testing
PPT on GRP pipes manufacturing and testingPPT on GRP pipes manufacturing and testing
PPT on GRP pipes manufacturing and testing
anoopmanoharan2
 
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
Mukeshwaran Balu
 
Series of visio cisco devices Cisco_Icons.ppt
Series of visio cisco devices Cisco_Icons.pptSeries of visio cisco devices Cisco_Icons.ppt
Series of visio cisco devices Cisco_Icons.ppt
PauloRodrigues104553
 
5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...
5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...
5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...
ihlasbinance2003
 
A review on techniques and modelling methodologies used for checking electrom...
A review on techniques and modelling methodologies used for checking electrom...A review on techniques and modelling methodologies used for checking electrom...
A review on techniques and modelling methodologies used for checking electrom...
nooriasukmaningtyas
 
2. Operations Strategy in a Global Environment.ppt
2. Operations Strategy in a Global Environment.ppt2. Operations Strategy in a Global Environment.ppt
2. Operations Strategy in a Global Environment.ppt
PuktoonEngr
 
Low power architecture of logic gates using adiabatic techniques
Low power architecture of logic gates using adiabatic techniquesLow power architecture of logic gates using adiabatic techniques
Low power architecture of logic gates using adiabatic techniques
nooriasukmaningtyas
 
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsKuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
Victor Morales
 
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMSA SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
IJNSA Journal
 
Properties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptxProperties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptx
MDSABBIROJJAMANPAYEL
 
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
insn4465
 
Embedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoringEmbedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoring
IJECEIAES
 
[JPP-1] - (JEE 3.0) - Kinematics 1D - 14th May..pdf
[JPP-1] - (JEE 3.0) - Kinematics 1D - 14th May..pdf[JPP-1] - (JEE 3.0) - Kinematics 1D - 14th May..pdf
[JPP-1] - (JEE 3.0) - Kinematics 1D - 14th May..pdf
awadeshbabu
 
Technical Drawings introduction to drawing of prisms
Technical Drawings introduction to drawing of prismsTechnical Drawings introduction to drawing of prisms
Technical Drawings introduction to drawing of prisms
heavyhaig
 
sieving analysis and results interpretation
sieving analysis and results interpretationsieving analysis and results interpretation
sieving analysis and results interpretation
ssuser36d3051
 
New techniques for characterising damage in rock slopes.pdf
New techniques for characterising damage in rock slopes.pdfNew techniques for characterising damage in rock slopes.pdf
New techniques for characterising damage in rock slopes.pdf
wisnuprabawa3
 
Understanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine LearningUnderstanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine Learning
SUTEJAS
 
22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt
KrishnaveniKrishnara1
 
International Conference on NLP, Artificial Intelligence, Machine Learning an...
International Conference on NLP, Artificial Intelligence, Machine Learning an...International Conference on NLP, Artificial Intelligence, Machine Learning an...
International Conference on NLP, Artificial Intelligence, Machine Learning an...
gerogepatton
 
Heat Resistant Concrete Presentation ppt
Heat Resistant Concrete Presentation pptHeat Resistant Concrete Presentation ppt
Heat Resistant Concrete Presentation ppt
mamunhossenbd75
 

Recently uploaded (20)

PPT on GRP pipes manufacturing and testing
PPT on GRP pipes manufacturing and testingPPT on GRP pipes manufacturing and testing
PPT on GRP pipes manufacturing and testing
 
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
ACRP 4-09 Risk Assessment Method to Support Modification of Airfield Separat...
 
Series of visio cisco devices Cisco_Icons.ppt
Series of visio cisco devices Cisco_Icons.pptSeries of visio cisco devices Cisco_Icons.ppt
Series of visio cisco devices Cisco_Icons.ppt
 
5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...
5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...
5214-1693458878915-Unit 6 2023 to 2024 academic year assignment (AutoRecovere...
 
A review on techniques and modelling methodologies used for checking electrom...
A review on techniques and modelling methodologies used for checking electrom...A review on techniques and modelling methodologies used for checking electrom...
A review on techniques and modelling methodologies used for checking electrom...
 
2. Operations Strategy in a Global Environment.ppt
2. Operations Strategy in a Global Environment.ppt2. Operations Strategy in a Global Environment.ppt
2. Operations Strategy in a Global Environment.ppt
 
Low power architecture of logic gates using adiabatic techniques
Low power architecture of logic gates using adiabatic techniquesLow power architecture of logic gates using adiabatic techniques
Low power architecture of logic gates using adiabatic techniques
 
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressionsKuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
KuberTENes Birthday Bash Guadalajara - K8sGPT first impressions
 
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMSA SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
A SYSTEMATIC RISK ASSESSMENT APPROACH FOR SECURING THE SMART IRRIGATION SYSTEMS
 
Properties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptxProperties Railway Sleepers and Test.pptx
Properties Railway Sleepers and Test.pptx
 
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
哪里办理(csu毕业证书)查尔斯特大学毕业证硕士学历原版一模一样
 
Embedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoringEmbedded machine learning-based road conditions and driving behavior monitoring
Embedded machine learning-based road conditions and driving behavior monitoring
 
[JPP-1] - (JEE 3.0) - Kinematics 1D - 14th May..pdf
[JPP-1] - (JEE 3.0) - Kinematics 1D - 14th May..pdf[JPP-1] - (JEE 3.0) - Kinematics 1D - 14th May..pdf
[JPP-1] - (JEE 3.0) - Kinematics 1D - 14th May..pdf
 
Technical Drawings introduction to drawing of prisms
Technical Drawings introduction to drawing of prismsTechnical Drawings introduction to drawing of prisms
Technical Drawings introduction to drawing of prisms
 
sieving analysis and results interpretation
sieving analysis and results interpretationsieving analysis and results interpretation
sieving analysis and results interpretation
 
New techniques for characterising damage in rock slopes.pdf
New techniques for characterising damage in rock slopes.pdfNew techniques for characterising damage in rock slopes.pdf
New techniques for characterising damage in rock slopes.pdf
 
Understanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine LearningUnderstanding Inductive Bias in Machine Learning
Understanding Inductive Bias in Machine Learning
 
22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt22CYT12-Unit-V-E Waste and its Management.ppt
22CYT12-Unit-V-E Waste and its Management.ppt
 
International Conference on NLP, Artificial Intelligence, Machine Learning an...
International Conference on NLP, Artificial Intelligence, Machine Learning an...International Conference on NLP, Artificial Intelligence, Machine Learning an...
International Conference on NLP, Artificial Intelligence, Machine Learning an...
 
Heat Resistant Concrete Presentation ppt
Heat Resistant Concrete Presentation pptHeat Resistant Concrete Presentation ppt
Heat Resistant Concrete Presentation ppt
 

Introduction to Data Science.pptx

  • 1. CS3352 -Foundations of Data Science Introduction to Data Science Dr.V.Anusuya Associate Professor/IT Ramco Institute of Technology Rajapalayam
  • 2. Introduction • Data Science is the area of study which involves extracting insights from vast amounts of data using various scientific methods, algorithms and processes. • Data Science is useful to discover hidden patterns from the voluminous raw data. • Data Science is because of the evolution of mathematical statistics, data analysis, and big data.
  • 3. Contd., • Ingredients = Data • When a chef is starting out with a new dish. • After the chef has determined what type of dish he would like to make, he goes to the fridge to gather ingredients. If he doesn’t have the necessary ingredients, he takes a trip to the store to collect them. • As data scientists, our data points are our ingredients. We may have some on hand, but we still might have to collect more through web scraping, SQL queries, etc.
  • 4. Big Data • Data Science focuses on processing the huge volume of heterogeneous data known as big data. • Big data refers to the data sets that are large and complex in nature and difficult to process using traditional data processing application software.
  • 5. Contd., • The characteristics of big data are often referred to as the three Vs: • Volume—How much data is there? • Variety—How diverse are different types of data? • Velocity—At what speed is new data generated? • Veracity-How accurate is the data?. • These four properties make big data different from the data found in traditional data management tools.
  • 6. Contd., • The four V’s of data make the data capturing,cleaning,preprocessing,storing,search ing,sharing,transferring and visualization processes a complex task. • Specialization techniques are required to extract the insights from this huge volume of data. • The main things that set a data scientist apart from a statistician are the ability to work with big data and experience in machine learning, computing, and algorithm building.
  • 7. Big data • Tools- Hadoop, Pig, Spark, R, Python, and Java. • Python is a great language for data science because it has many data science libraries available, and it’s widely supported by specialized software.
  • 8. Contd., • Data Science is an interdisciplinary field that allows to extract knowledge from structured or unstructured data. • Translate business problem and Research project into practical solution.
  • 9. Benefits and uses of data science • Data science and big data are used almost everywhere in both commercial and noncommercial Settings. • Commercial companies in almost every industry use data science and big data to gain insights into their customers, processes, staff, completion, and products. • Many companies use data science to offer customers a better user experience, as well as to cross-sell, up-sell, and personalize their offerings.
  • 10. Benefits and uses of data science • Governmental organizations are also aware of data’s value. Many governmental organizations not only rely on internal data scientists to discover valuable information, but also share their data with the public. • Nongovernmental organizations (NGOs) use it to raise money and defend their causes. • Universities use data science in their research but also to enhance the study experience of their students. The rise of massive open online courses (MOOC) produces a lot of data, which allows universities to study how this type of learning can complement traditional classes.
  • 11. Facets of data • In data science and big data we’ll come across many different types of data, and each of them tends to require different tools and techniques. The main categories of data are these: • Structured • Unstructured • Natural language • Machine-generated • Graph-based • Audio, video, and images • Streaming
  • 12. Structured Data •Structured data is data that depends on a data model and resides in a fixed field within a record. •To store structured data in tables within databases or Excel files. •SQL, or Structured Query Language, is the preferred way to manage and query data that resides in databases. •Hierarchical data such as a family tree.
  • 14. Unstructured Data • Unstructured data is data that isn’t easy to fit into a data model because the content is context-specific or varying. • Example of unstructured data is your regular email. • Although email contains structured elements such as the sender, title, and body text. • The thousands of different languages and dialects out there further complicate this.
  • 16. Natural language • Natural language is a special type of unstructured data; it’s challenging to process because it requires knowledge of specific data science techniques and linguistics. • The natural language processing community has had success in entity recognition, Speech recognition, summarization, text completion, and sentiment analysis, but models trained in one domain don’t generalize well to other domains. • Even state-of-the-art techniques aren’t able to decipher the meaning of every piece of text. • Example: personal voice Assistant, siri, Alexa recognize patterns in speech, provide a useful response. • Example: Email Filters- Primary, Social ,Promotions
  • 17. Machine-generated data • Machine-generated data is information that’s automatically created by a computer, process, application, or other machine without human intervention. • Machine-generated data is becoming a major data resource and will continue to do so. • The analysis of machine data relies on highly scalable tools, due to its high volume and speed. Examples of machine data are web server logs, call detail records, network event logs, and telemetry.
  • 18.
  • 19. Graph-based or Networked Data • A graph is a mathematical structure to model pair-wise relationships between objects. • Graph or network data is, in short, data that focuses on the relationship or adjacency of objects. • The graph structures use nodes, edges, and properties to represent and store graphical data.
  • 20. Contd., • Graph-based data is a natural way to represent social networks, and its structure allows you to calculate specific metrics such as the influence of a person and the shortest path between two people. • Example • LinkedIn-business colleagues, Twitter- follower list. • Facebook-connecting edges here to show friends.
  • 21. Graph based structure Specialized query languages such as SPARQL
  • 22. Audio, image, and video • Audio, image, and video are data types that pose specific challenges to a data scientist. • Tasks that are trivial for humans, such as recognizing objects in pictures, to be challenging for computers. • MLBAM (Major League Baseball Advanced Media) announced in 2014 that they’ll increase video capture to approximately 7 TB per game for the purpose of live, in-game analytics.
  • 23. Contd., • Recently a company called DeepMind succeeded at creating an algorithm that’s capable of learning how to play video games. • This algorithm takes the video screen as input and learns to interpret everything via a complex of deep learning.
  • 24. Streaming Data • The data flows into the system when an event happens instead of being loaded into a data store in a batch. • Examples are the “What’s trending” on Twitter, live sporting or music events, and the stock market.