Department of Computer Science and Engineering
Session 2023-24(Odd)
Subject: 5CDS-04: Data Visualization: R Programming /
Power BI
Lecture-1
Topic: Introduction to the Subject,
Data Science Introduction: Concepts, lifecycle, applications
Faculty : Sumit Mathur
Assistant Professor
Swami Keshvanand Institute of Technology,
Management & Gramothan, Jaipur
Introduction to the Subject
• Introduction to Data Science and Data
Visualization: Data Science, Data Visualization
and R Programming
• Data Preprocessing and EDA with R: Data
Collection, Data Cleaning, EDA, ggplot2
• Advanced Data Analysis and Visualization
with R: Statistical Analysis, Machine Learning,
R Shiny
• Power BI for Data Visualization and
Dashboard Creation: Power BI, Storytelling
• Advanced Data Visualization and Integration:
Integrating R with Power BI, Capstone Project
What is Data Science?
• Data Science is about data gathering, analysis
and decision-making.
• Data Science is about finding patterns in data
through analysis and making future predictions.
• By using Data Science, companies are able to
make:
o Better decisions (should we choose A or B?)
o Predictive analyses (what will happen next?)
o Pattern discoveries (finding patterns, or hidden
information, in the data)
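The "should we choose A or B?" question is often settled with an A/B test. A minimal Python sketch, assuming hypothetical conversion counts for two variants and a two-proportion z-test (one common choice of test, not the only one):

```python
import math

def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Return the z statistic for the difference in conversion rates (B minus A)."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally well
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

def p_value_two_sided(z: float) -> float:
    # Two-sided tail probability of the standard normal, via the complementary error function
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical data: 120 of 2000 visitors converted on A, 160 of 2000 on B
z = two_proportion_z(120, 2000, 160, 2000)
p = p_value_two_sided(z)
decision = "choose B" if p < 0.05 and z > 0 else "no clear winner"
print(f"z = {z:.2f}, p = {p:.3f}: {decision}")
```

The 0.05 threshold is a conventional, assumed cut-off; the right threshold depends on the business cost of a wrong decision.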
How Does a Data Scientist Work?
A Data Scientist requires expertise in several
areas:
• Machine Learning
• Statistics
• Programming (Python or R)
• Mathematics
• Databases
How a Data Scientist works:
• Ask the right questions - To understand the business
problem.
• Explore and collect data - From databases, web logs,
customer feedback, etc.
• Extract the data - Transform the data into a standardized
format.
• Clean the data - Remove erroneous values from the data.
• Find and replace missing values - Check for missing values
and replace them with a suitable value (e.g. an average
value).
• Normalize data - Bring the values onto a common unit and
scale (e.g. 140 cm is shorter than 1.8 m, yet the number 140
is larger than 1.8, so values must be made consistent before
they are compared).
• Analyze data, find patterns and make future predictions.
• Present the results - Communicate useful insights
in a way the business can understand.
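The cleaning, missing-value, and normalization steps above can be sketched in plain Python. The records, the cm/m unit column, the rule that a non-positive height is erroneous, and the choice of min-max scaling are all illustrative assumptions:

```python
from statistics import mean

# Hypothetical raw heights: mixed units, a missing value, and an impossible reading
raw = [
    {"id": 1, "height": 140, "unit": "cm"},
    {"id": 2, "height": 1.8, "unit": "m"},
    {"id": 3, "height": None, "unit": "m"},
    {"id": 4, "height": -5, "unit": "cm"},  # erroneous: a height cannot be negative
    {"id": 5, "height": 165, "unit": "cm"},
]

def to_metres(value, unit):
    # Standardize every reading to metres so the numbers are comparable
    return value / 100 if unit == "cm" else value

# 1. Clean: drop impossible (non-positive) heights, keep missing ones for now
cleaned = [r for r in raw if r["height"] is None or r["height"] > 0]

# 2. Standardize units (missing values stay None)
heights = [None if r["height"] is None else to_metres(r["height"], r["unit"])
           for r in cleaned]

# 3. Replace missing values with the average of the known values
avg = mean(h for h in heights if h is not None)
filled = [avg if h is None else h for h in heights]

# 4. Min-max normalize into [0, 1]; guard against all values being equal
lo, hi = min(filled), max(filled)
span = (hi - lo) or 1.0
normalized = [(h - lo) / span for h in filled]
print(normalized)
```

Replacing missing values with the mean is simple but can hide real variation; the median or a model-based estimate are common alternatives.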
Data Science Components:
• Statistics: Statistics is one of the most important
components of data science. It provides methods to
collect and analyze large amounts of numerical data
and to extract meaningful insights from it.
• Domain Expertise: Domain expertise binds data science
together. It means specialized knowledge or skills in
a particular area; many areas where data science is
applied need domain experts.
• Data engineering: Data engineering is the part of
data science that involves acquiring, storing,
retrieving, and transforming data. It also includes
adding metadata (data about data) to the data.
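As a small illustration of statistics turning raw numbers into insight, here is a sketch using Python's standard `statistics` module on hypothetical daily order counts. Comparing the mean with the median exposes the single unusually large value:

```python
import statistics

# Hypothetical daily order counts for one week
orders = [42, 38, 51, 47, 39, 120, 44]

summary = {
    "mean": statistics.mean(orders),
    "median": statistics.median(orders),
    "stdev": statistics.stdev(orders),  # sample standard deviation
    "min": min(orders),
    "max": max(orders),
}
for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")

# A mean well above the median suggests an outlier pulling the average up
if summary["mean"] > summary["median"]:
    print("mean exceeds median: check for unusually high days")
```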
• Visualization: Data visualization means representing
data in a visual context so that people can easily
understand its significance. Visualization makes large
amounts of data easier to take in at a glance.
• Advanced computing: Advanced computing does the heavy
lifting of data science. It involves designing, writing,
debugging, and maintaining the source code of computer
programs.
• Mathematics: Mathematics is a critical part of data
science. It involves the study of quantity, structure,
space, and change. For a data scientist, a good
knowledge of mathematics is essential.
• Machine learning: Machine learning is the backbone of data
science. It is about training a machine on example data
so that it can learn patterns and make predictions or
decisions, somewhat as a person learns from experience.
In data science, various machine learning algorithms are
used to solve problems.
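To make "training a machine" concrete: training usually means adjusting a model's parameters to reduce its error on example data. A minimal sketch, assuming a one-variable linear model, hypothetical data, and an arbitrarily chosen learning rate and epoch count, fits the parameters with plain gradient descent on mean squared error:

```python
# Hypothetical training data: advertising spend (x) vs. sales (y), roughly y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.1]

def train_linear(xs, ys, lr=0.02, epochs=5000):
    """Fit y ≈ w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of MSE = (1/n) * sum((w*x + b - y)^2) with respect to w and b
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # Step against the gradient to reduce the error
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

w, b = train_linear(xs, ys)
print(f"learned w = {w:.2f}, b = {b:.2f}; prediction for x = 6: {w * 6 + b:.2f}")
```

For a linear model the best parameters could also be computed in closed form; gradient descent is shown because the same "adjust parameters to reduce error" loop underlies many larger models.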
Tools for Data Science
• Some commonly used tools for data science are:
• Data Analysis tools: R, Python, Statistics, SAS, Jupyter,
RStudio, MATLAB, Excel, RapidMiner.
• Data Warehousing: ETL, SQL, Hadoop,
Informatica/Talend, AWS Redshift
• Data Visualization tools: R, Jupyter, Tableau, Cognos.
• Machine learning tools: Spark, Mahout, Azure ML
Studio.
Data Science Lifecycle
1. Discovery: The first phase is discovery, which involves asking the
right questions. When you start any data science project, you need
to determine the basic requirements, priorities, and
project budget. In this phase, we determine all the
requirements of the project, such as the number of people,
technology, time, data, and the end goal, and then frame the
business problem as initial hypotheses.
2. Data preparation: Data preparation is also known as data munging.
In this phase, we perform the following tasks:
data cleaning, data reduction, data integration, and data
transformation.
After these tasks, the data is ready for use in the
later phases.
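The four preparation tasks can be illustrated on two small hypothetical tables. This plain-Python sketch assumes dictionaries as rows, a "high" spending threshold of 200, and "Unknown" as the fill value for a missing city; in practice a library such as pandas (or dplyr in R) would usually do this work:

```python
# Hypothetical records from two sources that must be combined
customers = [
    {"cust_id": 1, "name": " Asha ", "city": "Jaipur", "signup_year": 2021},
    {"cust_id": 2, "name": "Ravi", "city": None, "signup_year": 2022},
    {"cust_id": 2, "name": "Ravi", "city": None, "signup_year": 2022},  # duplicate row
    {"cust_id": 3, "name": "Meena", "city": "Delhi", "signup_year": 2020},
]
orders = [
    {"cust_id": 1, "amount": 250.0},
    {"cust_id": 1, "amount": 100.0},
    {"cust_id": 3, "amount": 75.0},
]

# Data cleaning: trim whitespace, fill the missing city, drop exact duplicate rows
seen, clean = set(), []
for row in customers:
    fixed = {**row, "name": row["name"].strip(), "city": row["city"] or "Unknown"}
    key = tuple(sorted(fixed.items()))
    if key not in seen:
        seen.add(key)
        clean.append(fixed)

# Data reduction: keep only the columns needed for the analysis
reduced = [{k: r[k] for k in ("cust_id", "name", "city")} for r in clean]

# Data integration: join each customer's total order value from the second source
totals = {}
for o in orders:
    totals[o["cust_id"]] = totals.get(o["cust_id"], 0.0) + o["amount"]
integrated = [{**r, "total_spent": totals.get(r["cust_id"], 0.0)} for r in reduced]

# Data transformation: derive a categorical spending band from the numeric total
for r in integrated:
    r["band"] = "high" if r["total_spent"] >= 200 else "low"

for r in integrated:
    print(r)
```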
3. Model planning: In this phase, we determine the various
methods and techniques to establish the relationships between input
variables. We apply exploratory data analysis (EDA), using
various statistical formulas and visualization tools, to understand the
relationships between variables and to see what the data can tell us.
Common tools used for model planning are:
SQL Analysis Services, R, Python
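During EDA, a quick numeric check of how two variables move together is the Pearson correlation coefficient. A sketch in Python with hypothetical data: a value near +1 or -1 suggests a strong linear relationship, while a value near 0 suggests little linear relationship (though a non-linear one may still exist):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical variables: hours studied, exam score, and shoe size
hours = [2, 4, 6, 8, 10]
score = [55, 62, 70, 78, 88]
shoe = [8, 7, 9, 6, 8]

print("hours vs score:", round(pearson(hours, score), 3))
print("hours vs shoe size:", round(pearson(hours, shoe), 3))
```

Correlation only flags candidate relationships for modelling; it does not establish cause and effect.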
4. Model building: In this phase, the process of model
building starts. We create datasets for training and
testing purposes. We apply different techniques, such as
association, classification, and clustering, to build the
model.
Some common model-building tools are:
SAS Enterprise Miner, WEKA, SPSS Modeler, MATLAB
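The training/testing split and one of the named techniques, classification, can be sketched together. This assumes hypothetical two-feature data, a 30% test fraction, and a fixed random seed, and uses a nearest-centroid classifier chosen only for brevity; it is not one of the tools listed above:

```python
import random

# Hypothetical labelled data: (feature1, feature2) -> class label
data = [((1.0, 1.2), "A"), ((0.8, 1.0), "A"), ((1.2, 0.9), "A"), ((1.1, 1.1), "A"),
        ((0.9, 0.8), "A"), ((4.0, 4.2), "B"), ((4.1, 3.9), "B"), ((3.8, 4.0), "B"),
        ((4.2, 4.1), "B"), ((3.9, 3.8), "B")]

def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of the rows and hold out a fraction of them for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def fit_centroids(train):
    """'Training' here means computing the mean feature vector of each class."""
    sums, counts = {}, {}
    for (x1, x2), label in train:
        s = sums.setdefault(label, [0.0, 0.0])
        s[0] += x1
        s[1] += x2
        counts[label] = counts.get(label, 0) + 1
    return {lbl: (s[0] / counts[lbl], s[1] / counts[lbl]) for lbl, s in sums.items()}

def predict(centroids, point):
    # Assign the class whose centroid is closest (squared Euclidean distance)
    def sq_dist(label):
        cx, cy = centroids[label]
        return (point[0] - cx) ** 2 + (point[1] - cy) ** 2
    return min(centroids, key=sq_dist)

train, test = train_test_split(data)
centroids = fit_centroids(train)
# Evaluate only on held-out rows the model never saw during training
accuracy = sum(predict(centroids, x) == y for x, y in test) / len(test)
print(f"train size = {len(train)}, test size = {len(test)}, accuracy = {accuracy:.2f}")
```

Measuring accuracy on the held-out test set, rather than on the training rows, gives a fairer estimate of how the model will behave on new data.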
5. Operationalize: In this phase, we deliver the final
reports of the project, along with briefings, code, and
technical documents. This phase gives a clear
overview of the project's performance and other
components on a small scale before full deployment.
6. Communicate results: In this phase, we check whether we
reached the goal set in the initial phase, and we
communicate the findings and final results to the
business team.
Applications of Data Science
• Image recognition and speech recognition
• Gaming world
• Internet search
• Transport
• Healthcare
• Recommendation systems
• Risk detection
Advantages of data science
• Improved decision-making: Data science can help
organizations make better decisions by providing insights
and predictions based on data analysis.
• Cost-effective: With the right tools and techniques, data
science can help organizations reduce costs by identifying
areas of inefficiency and optimizing processes.
• Innovation: Data science can be used to identify new
opportunities for innovation and to develop new products
and services.
• Competitive advantage: Organizations that use data
science effectively can gain a competitive advantage by
making better decisions, improving efficiency, and
identifying new opportunities.
• Personalization: Data science can help organizations
personalize their products or services to better meet the
needs of individual customers.
Disadvantages of data science:
• Data quality: The accuracy and quality of the data used in
data science can have a significant impact on the results
obtained.
• Privacy concerns: The collection and use of data can raise
privacy concerns, particularly if the data is personal or
sensitive.
• Complexity: Data science can be a complex and technical
field that requires specialized skills and expertise.
• Bias: Data science algorithms can be biased if the data
used to train them is biased, which can lead to inaccurate
results.
• Interpretation: Interpreting data science results can be
challenging, particularly for non-technical stakeholders
who may not understand the underlying assumptions and
methods used.
