WHAT IS DATA SCIENCE ?
BY
SHILPA KRISHNA
RESEARCH SCHOLAR
[Figure: Data Science Process. Stages: Discovery, Data Preparation, Model Planning, Model Building, Operationalize, Communicate Results]
DISCOVERY
• Acquire data from all identified internal and external sources that help you answer the business question.
• The data can be:
1. Logs from web servers
2. Data gathered from social media
3. Census datasets
4. Data streamed from online sources using APIs
DATA PREPARATION
• Data can have many inconsistencies, such as missing values, blank columns, and incorrect data formats, which need to be cleaned.
• You need to process, explore, and condition the data before modeling.
• The cleaner your data, the better your predictions.
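The cleaning steps named above can be sketched in plain Python. This is a minimal illustration only: the records, the column names, the two date formats, and the choice of mean imputation are all assumptions made for the example, not part of the original slides.

```python
from datetime import datetime

# Hypothetical raw records showing the problems named above:
# a missing value, a blank column, and inconsistent date formats.
raw_rows = [
    {"id": "1", "age": "34", "signup": "2021-03-05", "notes": ""},
    {"id": "2", "age": "",   "signup": "05/04/2021", "notes": ""},
    {"id": "3", "age": "30", "signup": "2021-06-17", "notes": ""},
]

def drop_blank_columns(rows):
    """Remove columns whose value is empty in every row."""
    keep = [k for k in rows[0] if any(r[k].strip() for r in rows)]
    return [{k: r[k] for k in keep} for r in rows]

def parse_date(text):
    """Try each assumed format in turn and return an ISO date string."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")

def clean(rows):
    """Drop blank columns, fill missing ages, and normalize dates."""
    rows = drop_blank_columns(rows)
    known_ages = [int(r["age"]) for r in rows if r["age"].strip()]
    mean_age = round(sum(known_ages) / len(known_ages))  # simple mean imputation
    cleaned = []
    for r in rows:
        row = dict(r)
        row["id"] = int(r["id"])
        row["age"] = int(r["age"]) if r["age"].strip() else mean_age
        row["signup"] = parse_date(r["signup"])
        cleaned.append(row)
    return cleaned

cleaned_rows = clean(raw_rows)
```

Mean imputation is only one of several reasonable ways to handle a missing value; dropping the row or using the median may suit other datasets better.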
MODEL PLANNING
• In this stage, you determine the methods and techniques for describing the relationships between input variables.
• Model planning is carried out with statistical formulas and visualization tools such as SQL Analysis Services, R, and SAS/ACCESS.
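One common way to examine the relationship between two input variables is the Pearson correlation coefficient. The sketch below computes it by hand; the variable names and values are invented for illustration, and the slides do not prescribe this particular statistic.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical input variables, for illustration only.
ad_spend = [10, 20, 30, 40, 50]
site_visits = [12, 24, 33, 46, 55]
discount = [5, 3, 4, 1, 2]

r_visits = pearson(ad_spend, site_visits)   # close to +1: strong positive relation
r_discount = pearson(ad_spend, discount)    # negative: the variables move in opposite directions
```

A coefficient near +1 or -1 suggests a strong linear relationship, while a value near 0 suggests little linear relationship; it says nothing about causation.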
MODEL BUILDING
• The data scientist splits the dataset into training and testing sets.
• Techniques such as association, classification, and clustering are applied to the training dataset.
• Once prepared, the model is tested against the testing dataset.
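The split, train, and test sequence can be sketched as follows. The data, the 30% test fraction, and the nearest-centroid classifier are assumptions chosen to keep the example short; the slides do not name a specific algorithm.

```python
import random

# Hypothetical labelled points: (feature_1, feature_2, label).
# The two classes are deliberately well separated.
data = [(float(x), x + 1.0, "low") for x in range(0, 10)] + \
       [(float(x), x + 1.0, "high") for x in range(20, 30)]

def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of the rows and split it into training and testing sets."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]

def fit_centroids(train):
    """Compute the mean feature vector (centroid) of each class."""
    sums = {}
    for f1, f2, label in train:
        s = sums.setdefault(label, [0.0, 0.0, 0])
        s[0] += f1
        s[1] += f2
        s[2] += 1
    return {label: (s[0] / s[2], s[1] / s[2]) for label, s in sums.items()}

def predict(centroids, f1, f2):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    return min(
        centroids,
        key=lambda c: (centroids[c][0] - f1) ** 2 + (centroids[c][1] - f2) ** 2,
    )

train, test = train_test_split(data)
model = fit_centroids(train)          # fit on the training set only
correct = sum(predict(model, f1, f2) == label for f1, f2, label in test)
accuracy = correct / len(test)        # evaluate on the held-out testing set
```

Because the toy classes do not overlap, the held-out accuracy here is perfect; real data rarely behaves this way, which is exactly why the testing set is kept separate.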
OPERATIONALIZE
• You deliver the final baselined model with reports, code, and technical documents.
• The model is deployed into a real-time production environment after thorough testing.
COMMUNICATE RESULTS
• The key findings are communicated to all stakeholders.
• This helps you decide whether the project is a success or a failure, based on the model's results.
THE MOST PROMINENT DATA SCIENCE JOB TITLES ARE:
1) Data scientist
2) Data engineer
3) Data analyst
4) Statistician
5) Data administrator
6) Business analyst
Data Scientist
ROLE
• A professional who manages enormous amounts of data and derives compelling business insights from it using various tools, techniques, methodologies, and algorithms.
LANGUAGES
• R
• SAS
• Python
• SQL
• Hive
• MATLAB
• Pig
• Spark
Data Engineer
ROLE
• Works with large amounts of data; develops, constructs, tests, and maintains architectures such as large-scale processing systems and databases.
LANGUAGES
• SQL
• Hive
• R
• SAS
• MATLAB
• Python
• Java
• Ruby
• C++
• Perl
Data Analyst
ROLE
• Mines vast amounts of data, looking for relationships, patterns, and trends.
• Then delivers compelling reports and visualizations of the analysis so that the most viable business decisions can be made.
LANGUAGES
• R
• Python
• HTML
• JS
• C
• C++
• SQL
Statistician
ROLE
• Collects, analyzes, and interprets qualitative and quantitative data using statistical theories and methods.
LANGUAGES
• SQL
• R
• MATLAB
• Tableau
• Python
• Perl
• Spark
• Hive
Data Administrator
ROLE
• Ensures that the database is accessible to all relevant users, performs correctly, and is kept safe from hacking.
LANGUAGES
• Ruby on Rails
• SQL
• Java
• C#
• Python
Business Analyst
ROLE
• Improves business processes and acts as an intermediary between the business executive team and the IT department.
LANGUAGES
• SQL
• Tableau
• Power BI
• Python
DEFINE THE GOAL
• Define a measurable and quantifiable goal.
• The goal should be specific and precise.
• The aim is to come up with candidate hypotheses. These hypotheses can then be turned into concrete questions or goals for a full-scale modeling project.
COLLECT AND MANAGE DATA
• This is a time-consuming step.
• Conduct initial exploration and visualization of the data.
• Clean the data: repair data errors and transform variables as needed.
BUILD THE MODEL
The most common data science modeling tasks are:
• Classification
• Scoring
• Ranking
• Clustering
• Finding relations
• Characterization
EVALUATE AND CRITIQUE MODEL
Once you have a model, you need to determine whether it meets your goals:
• Is it accurate enough for your needs?
• Does it perform better than the obvious guess?
• Do the results of the model make sense in the context of the problem domain?
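The "obvious guess" check can be made concrete by comparing the model against a baseline that always predicts the most common label. The labels and predictions below are made up for illustration; majority-class prediction is one plausible reading of "obvious guess", not the only one.

```python
from collections import Counter

def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def majority_baseline(train_labels, n):
    """The 'obvious guess': always predict the most common training label."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return [most_common] * n

# Hypothetical labels and model predictions, for illustration only.
train_labels = ["no"] * 80 + ["yes"] * 20
test_labels = ["no", "no", "no", "yes", "no", "yes", "no", "no", "yes", "no"]
model_preds = ["no", "no", "no", "yes", "no", "no", "no", "no", "yes", "no"]

baseline_acc = accuracy(majority_baseline(train_labels, len(test_labels)), test_labels)
model_acc = accuracy(model_preds, test_labels)
beats_baseline = model_acc > baseline_acc
```

On imbalanced data a model can look accurate while doing no better than this baseline, so the comparison is worth running before presenting any accuracy figure.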
PRESENT RESULTS AND DOCUMENT
• Present the results to your project sponsor and other stakeholders.
• Document the model for those in the organization who are responsible for using, running, and maintaining it once it has been deployed.
DEPLOY MODEL
• Make sure that the model can be updated as its environment changes.
• The model may initially be deployed in a small pilot program.
Several ways of gathering data for analysis are:
• CSV file
• Flat file (tab, space, or any other separator)
• Text file (a single file, read all at once or line by line)
• ZIP file
• APIs (JSON)
• Multiple text files (data is split over several text files)
• File downloaded from the internet (hosted on a server)
• Web page (scraping)
• RDBMS (SQL tables)
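Several of the file-based sources listed above can be read with Python's standard library alone. The sketch below uses in-memory stand-ins so it runs without any files or network access; the sample data, the member name scores.csv, and the "results" key in the JSON body are all hypothetical.

```python
import csv
import io
import json
import zipfile

# In-memory stand-ins for real files and API responses (hypothetical data).
csv_text = "name,score\nasha,91\nravi,78\n"
tsv_text = "name\tscore\nasha\t91\nravi\t78\n"
api_body = '{"results": [{"name": "asha", "score": 91}]}'

# CSV file: comma-separated values with a header row.
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))

# Flat file: the same reader with a different separator.
tsv_rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))

# Text file: read everything at once, or iterate line by line.
whole_text = io.StringIO(csv_text).read()
lines = [line.rstrip("\n") for line in io.StringIO(csv_text)]

# API response: JSON decoded into Python dicts and lists.
api_rows = json.loads(api_body)["results"]

# ZIP file: build an archive in memory, then read a member back out of it.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("scores.csv", csv_text)
with zipfile.ZipFile(buffer) as archive:
    with archive.open("scores.csv") as member:
        text_stream = io.TextIOWrapper(member, encoding="utf-8", newline="")
        zip_rows = list(csv.DictReader(text_stream))
```

With real data, `io.StringIO(...)` would be replaced by `open(path, newline="")` and the JSON body would come from an HTTP client; note that the CSV readers return every field as a string, so numeric columns still need converting.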
• A relational database stores data in tables; each row of a table is called a record.
• Records are connected to one another through primary keys and foreign keys.
• This allows users to establish defined relationships between tables.
• In an RDBMS, we use SQL statements to retrieve and analyze the data.
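Primary keys, foreign keys, and SQL queries can be tried out with Python's built-in sqlite3 module. The customers and orders tables below are a hypothetical schema invented for this sketch, and SQLite stands in for whatever RDBMS is actually in use.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical schema: each order row references a customer row.
conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        amount      REAL NOT NULL
    )""")

conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0)],
)

# Join the two tables through the key relationship and aggregate per customer.
totals = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id
    ORDER BY c.name
""").fetchall()

# An order pointing at a non-existent customer violates the foreign key.
try:
    conn.execute("INSERT INTO orders VALUES (13, 99, 5.0)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
```

The primary key identifies each record uniquely, the foreign key ties an order to its customer, and the JOIN follows that relationship to combine the two tables in one query.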
SOME COMMONLY USED PLOTS FOR EDA ARE:
• Histograms
• Scatter plots
• Maps
• Feature correlation plots (heatmaps)
• Time series plots
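As a small illustration of the first plot type, the sketch below bins values and prints a text histogram using only the standard library. The ages and the bin width of 10 are invented for the example; in practice a plotting library would normally draw the chart.

```python
from collections import Counter

def histogram(values, bin_width):
    """Count how many values fall into each bin of the given width."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical ages, for illustration only.
ages = [21, 23, 25, 31, 34, 35, 36, 38, 42, 47, 58]
width = 10
age_hist = histogram(ages, width)

# Print one row per bin, with one '#' per observation.
for start, count in age_hist.items():
    print(f"{start}-{start + width - 1}: {'#' * count}")
```

Even this rough view shows where the values cluster and whether the distribution is skewed, which is the main purpose of a histogram in exploratory data analysis.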
Data management platforms enable organizations and enterprises to use data analytics in beneficial ways, such as:
• Personalizing the customer experience
• Adding value to customer interactions
• Improving customer engagement
• Increasing customer loyalty
• Reaping the revenues associated with data-driven marketing
• Identifying the root causes of marketing failures and business issues in real time

Data science | What is Data science

  • 1. WHAT IS DATA SCIENCE? BY SHILPA KRISHNA, RESEARCH SCHOLAR
  • 3. DISCOVERY: Acquire data from all identified internal and external sources that can help answer the business question. The data can be: (1) logs from web servers, (2) data gathered from social media, (3) census datasets, (4) data streamed from online sources using APIs.
  • 4. DATA PREPARATION: Data can contain many inconsistencies, such as missing values, blank columns, and incorrect data formats, which need to be cleaned. You need to process, explore, and condition the data before modeling. The cleaner your data, the better your predictions.
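The cleaning step described in the data preparation slide can be sketched in plain Python. This is only an illustration under stated assumptions: the records, the column names (`customer`, `signup`, `spend`, `notes`), the two date formats, and the choice of mean imputation are all hypothetical and not taken from the slides.

```python
from datetime import datetime

# Hypothetical raw records showing the three problems named in the slide:
# a missing value (spend), a blank column (notes), and mixed date formats.
raw_rows = [
    {"customer": "A", "signup": "2021-03-01", "spend": "120.5", "notes": ""},
    {"customer": "B", "signup": "01/04/2021", "spend": "", "notes": ""},
    {"customer": "C", "signup": "2021-05-17", "spend": "80", "notes": ""},
]


def parse_date(text):
    """Try each assumed date format and return an ISO date string, or None."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean(rows):
    # Keep only columns that have a value in at least one row.
    keep = [key for key in rows[0] if any(r[key].strip() for r in rows)]
    # Mean of the known spend values, used to fill in missing ones
    # (mean imputation is one assumed strategy among several).
    known = [float(r["spend"]) for r in rows if r["spend"].strip()]
    mean_spend = sum(known) / len(known)

    cleaned = []
    for r in rows:
        row = {key: r[key] for key in keep}
        row["signup"] = parse_date(r["signup"])
        row["spend"] = float(r["spend"]) if r["spend"].strip() else mean_spend
        cleaned.append(row)
    return cleaned


cleaned_rows = clean(raw_rows)
for row in cleaned_rows:
    print(row)
```

In practice a library such as pandas would usually handle these steps; the standard library is used here only to keep the sketch self-contained.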
  • 5. MODEL PLANNING: In this stage, you determine the methods and techniques for drawing out the relationships between input variables. Model planning is carried out using statistical formulas and visualization tools such as SQL Analysis Services, R, and SAS/ACCESS.
  • 6. MODEL BUILDING: The data scientist splits the data into training and testing datasets. Techniques such as association, classification, and clustering are applied to the training dataset. Once prepared, the model is tested against the testing dataset.
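As a rough sketch of the train/test workflow in the model building slide, the example below splits a tiny dataset, fits a classifier on the training part, and scores it on the testing part. The data points, the 25% test fraction, and the nearest-centroid classifier are assumptions chosen for brevity; the slides do not name a specific algorithm.

```python
import random

# Hypothetical labelled points: (feature_1, feature_2, label).
data = [
    (1.0, 1.2, "low"), (0.8, 1.0, "low"), (1.1, 0.9, "low"), (0.9, 1.1, "low"),
    (3.0, 3.2, "high"), (3.1, 2.9, "high"), (2.8, 3.0, "high"), (3.2, 3.1, "high"),
]


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and split it into (train, test)."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


def fit_centroids(train):
    """Compute the mean point of each class from the training rows only."""
    sums = {}
    for x, y, label in train:
        sx, sy, n = sums.get(label, (0.0, 0.0, 0))
        sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}


def predict(centroids, x, y):
    """Assign the label whose centroid is closest to (x, y)."""
    return min(
        centroids,
        key=lambda lab: (centroids[lab][0] - x) ** 2 + (centroids[lab][1] - y) ** 2,
    )


train, test = train_test_split(data)
centroids = fit_centroids(train)
# The model is evaluated only on rows it never saw during fitting.
accuracy = sum(predict(centroids, x, y) == label for x, y, label in test) / len(test)
print(f"train={len(train)} test={len(test)} accuracy={accuracy:.2f}")
```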
  • 7. OPERATIONALIZE: You deliver the final baselined model along with reports, code, and technical documents. The model is deployed into a real-time production environment after thorough testing.
  • 8. COMMUNICATE RESULTS: The key findings are communicated to all stakeholders. This helps you decide whether the project is a success or a failure, based on the findings from the model.
  • 10. THE MOST PROMINENT DATA SCIENCE JOB TITLES ARE: (1) Data scientist, (2) Data engineer, (3) Data analyst, (4) Statistician, (5) Data administrator, (6) Business analyst.
  • 11. Data Scientist. ROLE: A professional who manages enormous amounts of data to come up with compelling business visions by using various tools, techniques, methodologies, and algorithms. LANGUAGES: R, SAS, Python, SQL, Hive, MATLAB, Pig, Spark.
  • 12. Data Engineer. ROLE: Works with large amounts of data; develops, constructs, tests, and maintains architectures such as large-scale processing systems and databases. LANGUAGES: SQL, Hive, R, SAS, MATLAB, Python, Java, Ruby, C++, Perl.
  • 13. Data Analyst. ROLE: Responsible for mining vast amounts of data and looking for relationships, patterns, and trends in it; then delivers compelling reports and visualizations of the analysis to support the most viable business decisions. LANGUAGES: R, Python, HTML, JS, C, C++, SQL.
  • 14. Statistician. ROLE: Collects, analyzes, and interprets qualitative and quantitative data using statistical theories and methods. LANGUAGES: SQL, R, MATLAB, Tableau, Python, Perl, Spark, Hive.
  • 15. Data Administrator. ROLE: Ensures that the database is accessible to all relevant users, that it is performing correctly, and that it is kept safe from hacking. LANGUAGES: Ruby on Rails, SQL, Java, C#, Python.
  • 16. Business Analyst. ROLE: Improves business processes and acts as an intermediary between the business executive team and the IT department. LANGUAGES: SQL, Tableau, Power BI, Python.
  • 19. DEFINE THE GOAL: Define a measurable and quantifiable goal. The goal should be specific and precise. The aim is to come up with candidate hypotheses, which can then be turned into concrete questions or goals for a full-scale modeling project.
  • 20. COLLECT AND MANAGE DATA: This is a time-consuming step. Conduct initial exploration and visualization of the data. Clean the data: repair data errors and transform variables as needed.
  • 21. BUILD THE MODEL: The most common data science modeling tasks are classification, scoring, ranking, clustering, finding relations, and characterization.
  • 22. EVALUATE AND CRITIQUE MODEL: Once you have a model, you need to determine whether it meets your goals. Is it accurate enough for your needs? Does it perform better than the obvious guess? Do the results of the model make sense in the context of the problem domain?
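One way to check whether a classifier "performs better than the obvious guess" is to compare its accuracy with a baseline that always predicts the most common label. The sketch below assumes a classification task; the labels and predictions are made-up illustration data, not results from the slides.

```python
from collections import Counter

# Hypothetical true labels and model predictions for a held-out test set.
y_true = ["no", "no", "yes", "no", "yes", "no", "no", "yes", "no", "no"]
y_pred = ["no", "no", "yes", "no", "no", "no", "yes", "yes", "no", "no"]


def accuracy(truth, predicted):
    """Fraction of positions where the prediction matches the truth."""
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)


# The "obvious guess": always predict the most common label.
# For simplicity it is taken from y_true here; in a real project it
# would normally be taken from the training data instead.
majority_label = Counter(y_true).most_common(1)[0][0]
baseline_pred = [majority_label] * len(y_true)

model_acc = accuracy(y_true, y_pred)
baseline_acc = accuracy(y_true, baseline_pred)
beats_baseline = model_acc > baseline_acc

print(f"model={model_acc:.2f} baseline={baseline_acc:.2f} better={beats_baseline}")
```

A model that barely beats this baseline may still be "accurate" on paper, which is why the slide asks both questions separately.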
  • 23. PRESENT RESULTS AND DOCUMENT: Present results to your project sponsor and other stakeholders. Document the model for those in the organization who are responsible for using, running, and maintaining the model once it has been deployed.
  • 24. DEPLOY MODEL: Make sure that the model can be updated as its environment changes. The model may initially be deployed in a small pilot program.
  • 26. Several ways of gathering data for analysis are: CSV file; flat file (tab, space, or any other separator); text file (a single file, read all at once or line by line); ZIP file; APIs (JSON); multiple text files (data split over several files); file downloaded from the internet (hosted on a server); web page (scraping); RDBMS (SQL tables).
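A few of the sources listed above can be read with Python's standard library alone. The sketch below uses small in-memory strings in place of real files and API responses, so every value and name in it (`csv_text`, `data.csv`, the JSON payload) is a hypothetical stand-in.

```python
import csv
import io
import json
import zipfile

# Hypothetical file contents held in memory so the example needs no real files.
csv_text = "id,name\n1,Asha\n2,Ravi\n"
tsv_text = "id\tcity\n1\tKochi\n2\tPune\n"
json_text = '{"results": [{"id": 1, "score": 0.9}]}'

# CSV file: one dict per row, keyed by the header line.
csv_rows = list(csv.DictReader(io.StringIO(csv_text)))

# Flat file with another separator: same reader, different delimiter.
tsv_rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))

# API response in JSON: parse the text into Python dicts and lists.
api_payload = json.loads(json_text)

# Text file read line by line instead of all at once.
line_count = sum(1 for _ in io.StringIO(csv_text))

# ZIP file: write an archive into a buffer, then read a member back out.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("data.csv", csv_text)
buffer.seek(0)
with zipfile.ZipFile(buffer) as archive:
    with archive.open("data.csv") as member:
        text_stream = io.TextIOWrapper(member, encoding="utf-8", newline="")
        zipped_rows = list(csv.DictReader(text_stream))

print(csv_rows, tsv_rows, api_payload, line_count, zipped_rows, sep="\n")
```

With real data, `io.StringIO(...)` would be replaced by `open(path, newline="")` and the JSON text by the body of an HTTP response.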
  • 28. A relational database stores data in tables, whose rows are called records. Connections among records are established using primary keys and foreign keys. This allows users to define relationships between tables. In an RDBMS, SQL statements are used to retrieve and analyze the data.
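The primary key and foreign key relationship can be illustrated with Python's built-in `sqlite3` module and an in-memory database. The `customers` and `orders` tables and their contents are hypothetical examples, not part of the original slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,   -- primary key: unique per record
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL
                REFERENCES customers(customer_id),  -- foreign key
    amount      REAL NOT NULL
);
INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0);
""")

# The key relationship lets SQL join the two tables and aggregate per customer.
totals = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id
    ORDER BY c.customer_id
""").fetchall()
print(totals)

# An order pointing at a customer that does not exist violates the foreign key.
try:
    conn.execute("INSERT INTO orders VALUES (13, 99, 10.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print("orphan order rejected:", orphan_rejected)
```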
  • 30. SOME COMMONLY USED PLOTS FOR EDA ARE: histograms, scatter plots, maps, feature correlation plots (heatmaps), and time series plots.
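The numbers behind two of these plots, the histogram and the correlation heatmap, can be computed without any plotting library. The sketch below bins a variable and computes a Pearson correlation coefficient; the `ages` and `income` values are invented for illustration, and a real analysis would normally hand the results to a plotting tool.

```python
import math
from collections import Counter

# Hypothetical sample: ages in years and incomes in thousands.
ages = [22, 25, 27, 31, 34, 35, 38, 41, 45, 52]
income = [20, 24, 27, 30, 36, 35, 40, 44, 47, 55]


def histogram(values, bin_width):
    """Count how many values fall into each bin, keyed by the bin's lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


def pearson(xs, ys):
    """Pearson correlation: the value shown in one cell of a correlation heatmap."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


age_bins = histogram(ages, 10)
# A crude text histogram: one '#' per observation in each 10-year bin.
for start, count in age_bins.items():
    print(f"{start}-{start + 9}: {'#' * count}")

r = pearson(ages, income)
print(f"correlation(age, income) = {r:.3f}")
```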
  • 32. Data management platforms enable organizations and enterprises to use data analytics in beneficial ways, such as: personalizing the customer experience; adding value to customer interactions; improving customer engagement; increasing customer loyalty; reaping the revenues associated with data-driven marketing; identifying the root causes of marketing failures and business issues in real time.