This document provides an introduction to the subject of data visualization using R programming and Power BI. It discusses key concepts in data science including the data science lifecycle, components of data science like statistics and machine learning, and applications of data science such as image recognition. The document also outlines some advantages and disadvantages of using data science.
Data Science Introduction: Concepts, lifecycle, applications.pptx
1. Department of Computer Science and Engineering
Session 2023-24 (Odd)
Subject: 5CDS-04: Data Visualization - R Programming/Power BI
Lecture-1
Topic: Introduction of Subject,
Data Science Introduction: Concepts, lifecycle, applications
Faculty: Sumit Mathur
Assistant Professor
Swami Keshvanand Institute of Technology,
Management & Gramothan, Jaipur
2. Introduction of Subject
• Introduction to Data Science and Data
Visualization: Data Science, Data Visualization
and R Programming
• Data Preprocessing and EDA with R: Data
Collection, Data Cleaning, EDA, ggplot2
• Advanced Data Analysis and Visualization
with R: Statistical Analysis, Machine Learning,
R Shiny
• Power BI for Data Visualization and
Dashboard Creation: Power BI, Storytelling
• Advanced Data Visualization and Integration:
Integrating R with Power BI, Capstone Project
3. What is Data Science?
• Data Science is about data gathering, analysis
and decision-making.
• Data Science is about finding patterns in data
through analysis and making future predictions.
• By using Data Science, companies are able to
make:
o Better decisions (should we choose A or B?)
o Predictive analysis (what will happen next?)
o Pattern discoveries (finding patterns or
hidden information in the data)
5. How Does a Data Scientist Work?
A Data Scientist requires expertise in several
areas:
• Machine Learning
• Statistics
• Programming (Python or R)
• Mathematics
• Databases
6. How a Data Scientist works:
• Ask the right questions - To understand the business
problem.
• Explore and collect data - From databases, web logs,
customer feedback, etc.
• Extract the data - Transform the data to a standardized
format.
• Clean the data - Remove erroneous values from the data.
• Find and replace missing values - Check for missing values
and replace them with a suitable value (e.g. an average
value).
• Normalize data - Scale the values to a common range or unit
(e.g. 140 cm is shorter than 1.8 m, yet the number 140 is
larger than 1.8, so scaling is important).
• Analyze data, find patterns and make future predictions.
• Represent the result - Present the result with useful insights
in a way the "company" can understand.
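The missing-value and normalization steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the height readings are made up, and the helper names (`to_cm`, `impute_mean`, `min_max_scale`) are hypothetical. Min-max scaling is one common choice of normalization, not the only one.

```python
from statistics import mean

def to_cm(value, unit):
    """Convert a length to centimetres so all values share one unit."""
    factors = {"cm": 1.0, "m": 100.0}
    return value * factors[unit]

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def min_max_scale(values):
    """Rescale linearly so the smallest value maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: nothing to spread out.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical height readings in cm; None marks a missing value.
heights_cm = [140.0, 165.0, None, 180.0, 155.0]

filled = impute_mean(heights_cm)   # the gap is filled with the average, 160.0
scaled = min_max_scale(filled)     # every value now lies between 0.0 and 1.0

# The slide's example: once both are in cm, 140 cm is shorter than 1.8 m.
same_unit_ok = to_cm(140, "cm") < to_cm(1.8, "m")
```

Replacing a missing value with the average keeps the overall mean unchanged, but it also reduces the spread of the data, so it is best treated as a simple default.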
8. Data Science Components
• Statistics: Statistics is one of the most important
components of data science. Statistics is a way to
collect and analyze large amounts of numerical data
and to find meaningful insights in it.
• Domain Expertise: In data science, domain
expertise binds data science together. Domain
expertise means specialized knowledge or skills in
a particular area. In data science, there are various
areas for which we need domain experts.
• Data engineering: Data engineering is a part of
data science, which involves acquiring, storing,
retrieving, and transforming the data. Data
engineering also includes adding metadata (data
about data) to the data.
9. • Visualization: Data visualization means representing
data in a visual context so that people can easily
understand the significance of the data. Visualization
makes large amounts of data easy to take in at a glance.
• Advanced computing: Advanced computing does the heavy
lifting of data science. It involves
designing, writing, debugging, and maintaining the source
code of computer programs.
• Mathematics: Mathematics is a critical part of data
science. Mathematics involves the study of quantity,
structure, space, and change. For a data scientist,
a good knowledge of mathematics is essential.
• Machine learning: Machine learning is the backbone of data
science. Machine learning is about training
a machine so that it can act like a human brain. In data
science, we use various machine learning algorithms to
solve problems.
10. Tools for Data Science
The following are some tools used for data science:
• Data Analysis tools: R, Python, Statistics, SAS, Jupyter,
RStudio, MATLAB, Excel, RapidMiner.
• Data Warehousing: ETL, SQL, Hadoop,
Informatica/Talend, AWS Redshift
• Data Visualization tools: R, Jupyter, Tableau, Cognos.
• Machine learning tools: Spark, Mahout, Azure ML
Studio.
12. 1. Discovery: The first phase is discovery, which involves asking the
right questions. When you start any data science project, you need
to determine the basic requirements, priorities, and
project budget. In this phase, we determine all the
requirements of the project, such as the number of people,
technology, time, data, and the end goal, and then we can frame the
business problem at a first hypothesis level.
2. Data preparation: Data preparation is also known as data munging.
In this phase, we need to perform the following tasks:
data cleaning, data reduction, data integration, and data
transformation.
After performing all of the above tasks, we can use the data in
the later phases.
3. Model Planning: In this phase, we need to determine the various
methods and techniques to establish the relations between input
variables. We apply exploratory data analysis (EDA), using
various statistical formulas and visualization tools, to understand the
relations between variables and to see what the data can tell us.
Common tools used for model planning are:
SQL Analysis Services, R, Python
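One statistical formula often used during EDA to examine the relation between two variables is the Pearson correlation coefficient. The sketch below is a plain-Python illustration, not a prescribed method from these notes. The function name `pearson` and the sample data are assumptions made for the example.

```python
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mx, my = mean(xs), mean(ys)
    # Sums of products of deviations from the means.
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)

# Hypothetical data: hours studied and exam score for five students.
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 55, 61, 70, 72]

r = pearson(hours_studied, exam_score)  # close to +1: strong positive relation
```

A value near +1 or -1 indicates a strong linear relation, and a value near 0 indicates little linear relation. It does not say anything about cause and effect.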
13. 4. Model-building: In this phase, the process of model
building starts. We create datasets for training and
testing purposes. We apply different techniques, such as
association, classification, and clustering, to build the
model.
The following are some common model-building tools:
SAS Enterprise Miner, WEKA, SPSS Modeler, MATLAB
5. Operationalize: In this phase, we deliver the final
reports of the project, along with briefings, code, and
technical documents. This phase gives you a clear
overview of the complete project's performance and other
components on a small scale before full deployment.
6. Communicate results: In this phase, we check whether we
have reached the goal that we set in the initial phase. We
communicate the findings and the final result to the
business team.
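The idea of creating separate training and testing datasets in the model-building phase can be illustrated with a small Python sketch. Everything here is an assumption made for the example: the toy data, the helper names, and the choice of a 1-nearest-neighbour classifier as a stand-in for the classification techniques mentioned above.

```python
import random
from math import dist

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows reproducibly and hold out a fraction for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

def predict_1nn(train, features):
    """Return the label of the training row closest to the given features."""
    nearest = min(train, key=lambda row: dist(row[0], features))
    return nearest[1]

def accuracy(train, test):
    """Fraction of held-out rows whose label is predicted correctly."""
    correct = sum(predict_1nn(train, x) == y for x, y in test)
    return correct / len(test)

# Hypothetical toy data: (features, label) pairs in two well-separated groups.
data = [
    ((1.0, 1.1), "A"), ((0.9, 1.0), "A"), ((1.2, 0.8), "A"), ((1.1, 1.2), "A"),
    ((5.0, 5.1), "B"), ((4.8, 5.2), "B"), ((5.2, 4.9), "B"), ((5.1, 5.0), "B"),
]

train, test = train_test_split(data)
score = accuracy(train, test)  # measured only on rows the model never saw
```

Measuring accuracy on the held-out rows, rather than on the training rows, gives a fairer estimate of how the model would perform on new data.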
14. Applications of Data Science
• Image recognition and speech recognition
• Gaming world
• Internet search
• Transport
• Healthcare
• Recommendation systems
• Risk detection
15. Advantages of data science
• Improved decision-making: Data science can help
organizations make better decisions by providing insights
and predictions based on data analysis.
• Cost-effective: With the right tools and techniques, data
science can help organizations reduce costs by identifying
areas of inefficiency and optimizing processes.
• Innovation: Data science can be used to identify new
opportunities for innovation and to develop new products
and services.
• Competitive advantage: Organizations that use data
science effectively can gain a competitive advantage by
making better decisions, improving efficiency, and
identifying new opportunities.
• Personalization: Data science can help organizations
personalize their products or services to better meet the
needs of individual customers.
16. Disadvantages of data science:
• Data quality: The accuracy and quality of the data used in
data science can have a significant impact on the results
obtained.
• Privacy concerns: The collection and use of data can raise
privacy concerns, particularly if the data is personal or
sensitive.
• Complexity: Data science can be a complex and technical
field that requires specialized skills and expertise.
• Bias: Data science algorithms can be biased if the data
used to train them is biased, which can lead to inaccurate
results.
• Interpretation: Interpreting data science results can be
challenging, particularly for non-technical stakeholders
who may not understand the underlying assumptions and
methods used.