SlideShare a Scribd company logo
1 of 31
Data Science
Grade IX
Version 1.0
Chapter 1: Introduction to data
At the end of this chapter, students will understand
the usefulness of studying data. They will know:
• What is data and information?
• What is DIKW Model?
• How data influences our lives?
• What are data footprints?
• Data loss and recovery
What is Data?
• Whatever we can read, write, speak,
and observe is data. Every day we
share so much data when we read,
write, speak, and watch. It can be
numbers, alphabets, symbols, or a
combination of all of these.
• Data can be defined as facts or
information which when stored, can
be used as a basis for decision
making, calculation, or discussion.
Data vs. Information
Data can be a number,
symbol, or text, which may
not mean anything to
individuals on its own.
However, when data is
processed and put in context,
they bear a meaning. This
means data can be used for
decision-making,
calculations, or discussions.
The data then becomes
information.
DIKW Model
Data after transformation to
information can also be
converted to knowledge and
wisdom.
This is called the DIKW model,
which explains how we move
from Data to Information to
Knowledge to Wisdom.
How data influences our lives?
Data heavily influence our daily
lives, Starting from online shopping
to watching our favorite shows on
television to ordering food from the
best restaurants. Data and data
analysis have a significant impact
on our lives.
Few aspects of our lives that are
impacted by data are:
• Healthcare
• Online shopping
• Education
• Travel
• Online shows
What are data footprints?
• In our daily life, the internet has
become an essential part. With all the
activities we do on the internet, we
create trails of data. These trails of data
are called data footprints.
• We create digital footprints that can be
used to identify us as individuals.
• This image shows how we connect to
the rest of the world and the people
around us.
Data loss and
recovery
• Data can be lost, corrupted,
damaged, or deleted due to
multiple reasons like transaction
failure, system crash, or disk
failure. The process of restoring
inaccessible, lost, corrupted,
damaged, or deleted data is called
data recovery.
• To prevent data loss, we should
frequently back up our data.
Large enterprise systems
generally use backup data
storage from where they recover
the data in case of any loss.
Chapter 2: Arranging and Collecting data
At the end of this chapter, Students will understand
how to arrange and collect data. They will know:
• What is data collection?
• What are variables?
• What are data sources?
• What is big data?
• Asking questions on data
• Univariate vs. Multivariate data
What is data collection?
The method of gathering data for
calculating and analyzing
reliable insights is known as
data collection, which is done
using standard validated
techniques. A researcher or
scientist works based on the
collected data. Data collection is
a primary and essential step in
most cases.
What are variables?
A variable is an attribute of an object of study that may vary for different
cases. Thus, a variable varies for different case studies in research.
Now variables can be of two types.
• Numerical variable
They represent values that have numbers. For Example, age, weight, height.
• Categorical variable
These variables represent values that have words, for example, name,
nationality, sport, etc.
Types of data
Data can be divided
into two categories,
quantitative and
qualitative.
Types of data sources
Data sources can be
classified into primary and
secondary sources.
What is big data?
• When the data volumes exceed the
processing capacities of traditional
databases, they are called Big Data.
• Millions of users are using the social
media platforms and creating an
enormous amount of content every
minute.
• Systems, which can extract statistical
insights from a huge amount of data, are
called Big Data systems.
Questioning your
data
• Data is typically stored as
numbers (numeric) or
labels (categories). Based
on the type of data, we
need to ask five simple
questions from the data.
• These are the 5 types of
algorithms we will learn in
this section
Univariate vs Multivariate data
• Univariate data is a type of data that has only one
variable or which does not involve multiple
parameters or relationships.
For Example, the height of students is univariate
data.
• Multivariate data is a type of data which involves
a relationship between multiple variables
For Example, sales of umbrellas increase during the
rainy season.
We see umbrella sales are dependent on rainfall.
So, there are two variables – "rainfall" and "sales."
These types of data are more complex than
univariate as they involve comparisons and
relations with multiple parameters.
Chapter 3: Data Visualizations
At the end of this chapter, students will understand
how to visualize data and make it more
comprehendible. They will know:
• The importance of visualization
• Plotting data
• Histograms
• Use of shapes
• Use of single and multivariable plots
Importance of data
visualization
• With so much information
around us, it is challenging
to view the data and derive
insights from it.
• Representing data through
visualizations like graphs,
charts, maps, etc., gives us a
visual context of the data.
• It also makes complex data
simple and enables the
human mind to understand
its significance.
Plotting data
There are different ways to
visualize data depending on the
data being modeled and its
purpose. Different kinds of
graphs and tables can be used to
visualize the data.
Students will learn about the
following in this section:
• Dot plots
• Bar graphs
• Maximum and Minimum
• Frequency
Dot Plot
• A dot plot is a graphical display of data
using dots.
• Dots are used in dot plots to illustrate
the quantitative values associated with
the categorical values.
For Example, Minutes to reach school.
This data shows how long does it take
five people to reach the school.
Minutes: 6, 2, 4, 8, 5
Person: A, B, C, D, E
Bar Graphs
• A bar graph is a graphical
display of data using bars
of different heights. It is
possible to plots the bars
vertically or horizontally.
• In a bar graph, the bars
are presented to show
elements so that they do
not touch each other.
Column Chart
A bar graph is called a
column chart or graph.
Maximum and Minimum
The minimum of the data is
less than or equal to any
other values in our data set.
If we had to order all our
data in ascending order, so
the first number in our list
would be the minimum.
The maximum of the data is
greater than or equal to all
other values. If we had to
order all our data in
ascending order, so the last
number listed would be the
maximum.
Frequency
The frequency of a data value is the number of times the data value
occurs/repeats.
For Example, if five students have a score of 85 in English, the score of
85 is said to have a frequency of 5. The frequency of a data value is
often represented by f.
Histograms and Use of shapes
• A histogram is a graphical display
of data using bars of different
heights. It is used to summarize
discrete or continuous data.
• In other words, a histogram
displays the number of data points
that fall within a given set of values
(called 'bins') to provide a visual
representation of numeric data.
Unlike a vertical bar graph, a
histogram shows no gaps between
the bars.
Use of single and
multivariable plots
Finally, the students will
learn about the use of single
and multivariable plots.
They will also learn to
visualize relationships
between more than one
variable
0
5
10
15
20
25
30
35
40
45
50
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
Example of a multi variable plot
Company A Company B Company C
Chapter 4: Ethics in data science
At the end of this chapter, students will understand
the following:
• Ethical guidelines around data analysis
• Need for ethical guidelines in data analysis
• Goals of ethical guidelines in data analysis
• Data governance framework
• Why do we need to govern data?
• Goals of data governance
Why do we need ethical guidelines
There are many reasons why adhering to ethical guidelines in data analysis
is essential.
• Guidelines encourage facts, knowledge, and error avoidance.
For Example, prohibitions against falsifying, fabricating, or misrepresenting
data promote the truth and minimize error.
• Ethical guidelines in data analysis also help to build public support for
the analysis. People are more likely to confide in the analysis if they can
trust data quality and integrity.
• Before conducting any new analysis, we ask ourselves whether it will
benefit the people. If it doesn't, we will not do it.
• Minimizing data usage, we should use the least amount of data necessary
to meet the desired objective, understanding that reducing data usage
encourages more sustainable and less risky analysis
Why do we
need to govern
data?
To Improve data quality through efforts to
identify and fix errors in data sets.
To increase analytics accuracy and give
decision-makers reliable information.
To ensure compliance with data privacy laws
and other regulations
To implement and enforce policies that help
prevent data errors and misuse
To avoid inconsistent data in different
departments and business units
To come to an agreement on standard data
definitions for a shared understanding of data
Final Project
Finally, we expect the students to finish two final
projects with the teachers' help to understand the
concepts learned in the previous chapters.
Thank You

More Related Content

Similar to classIX_DS_Teacher_Presentation.pptx

Data Science & AI Road Map by Python & Computer science tutor in Malaysia
Data Science  & AI Road Map by Python & Computer science tutor in MalaysiaData Science  & AI Road Map by Python & Computer science tutor in Malaysia
Data Science & AI Road Map by Python & Computer science tutor in MalaysiaAhmed Elmalla
 
Data science.chapter-1,2,3
Data science.chapter-1,2,3Data science.chapter-1,2,3
Data science.chapter-1,2,3varshakumar21
 
Big Data and Data Mining - Lecture 3 in Introduction to Computational Social ...
Big Data and Data Mining - Lecture 3 in Introduction to Computational Social ...Big Data and Data Mining - Lecture 3 in Introduction to Computational Social ...
Big Data and Data Mining - Lecture 3 in Introduction to Computational Social ...Lauri Eloranta
 
Introduction to Data science in syllabus of machine intelligence in data science
Introduction to Data science in syllabus of machine intelligence in data scienceIntroduction to Data science in syllabus of machine intelligence in data science
Introduction to Data science in syllabus of machine intelligence in data scienceApurvaLaddha
 
Chapter 2 - Introduction to Data Science.pptx
Chapter 2 - Introduction to Data Science.pptxChapter 2 - Introduction to Data Science.pptx
Chapter 2 - Introduction to Data Science.pptxWollo UNiversity
 
Introduction to Data Analytics.pptx
Introduction to Data Analytics.pptxIntroduction to Data Analytics.pptx
Introduction to Data Analytics.pptxDikshantSharma63
 
Module 4_Data Management.pptx
Module 4_Data Management.pptxModule 4_Data Management.pptx
Module 4_Data Management.pptxClaudineBaladjay1
 
6134DBMS Introduction 5 sem.pptx
6134DBMS Introduction 5 sem.pptx6134DBMS Introduction 5 sem.pptx
6134DBMS Introduction 5 sem.pptxDanishompi
 
DBMS Introduction 5 .pptx
DBMS Introduction 5 .pptxDBMS Introduction 5 .pptx
DBMS Introduction 5 .pptxNitinKashyap89
 
Fundamentals of Data science Introduction Unit 1
Fundamentals of Data science Introduction Unit 1Fundamentals of Data science Introduction Unit 1
Fundamentals of Data science Introduction Unit 1sasi
 

Similar to classIX_DS_Teacher_Presentation.pptx (20)

ml-03x01.pdf
ml-03x01.pdfml-03x01.pdf
ml-03x01.pdf
 
Data Science & AI Road Map by Python & Computer science tutor in Malaysia
Data Science  & AI Road Map by Python & Computer science tutor in MalaysiaData Science  & AI Road Map by Python & Computer science tutor in Malaysia
Data Science & AI Road Map by Python & Computer science tutor in Malaysia
 
EDA-Unit 1.pdf
EDA-Unit 1.pdfEDA-Unit 1.pdf
EDA-Unit 1.pdf
 
Data science.chapter-1,2,3
Data science.chapter-1,2,3Data science.chapter-1,2,3
Data science.chapter-1,2,3
 
Big Data and Data Mining - Lecture 3 in Introduction to Computational Social ...
Big Data and Data Mining - Lecture 3 in Introduction to Computational Social ...Big Data and Data Mining - Lecture 3 in Introduction to Computational Social ...
Big Data and Data Mining - Lecture 3 in Introduction to Computational Social ...
 
Introduction to Data science in syllabus of machine intelligence in data science
Introduction to Data science in syllabus of machine intelligence in data scienceIntroduction to Data science in syllabus of machine intelligence in data science
Introduction to Data science in syllabus of machine intelligence in data science
 
Data presentation .pptx
Data presentation .pptxData presentation .pptx
Data presentation .pptx
 
Lec 3.pptx
Lec 3.pptxLec 3.pptx
Lec 3.pptx
 
Data Analysis
Data AnalysisData Analysis
Data Analysis
 
Chapter 2 - Introduction to Data Science.pptx
Chapter 2 - Introduction to Data Science.pptxChapter 2 - Introduction to Data Science.pptx
Chapter 2 - Introduction to Data Science.pptx
 
Data-Management.pptx
Data-Management.pptxData-Management.pptx
Data-Management.pptx
 
Introduction to Data Analytics.pptx
Introduction to Data Analytics.pptxIntroduction to Data Analytics.pptx
Introduction to Data Analytics.pptx
 
Module 4_Data Management.pptx
Module 4_Data Management.pptxModule 4_Data Management.pptx
Module 4_Data Management.pptx
 
6134DBMS Introduction 5 sem.pptx
6134DBMS Introduction 5 sem.pptx6134DBMS Introduction 5 sem.pptx
6134DBMS Introduction 5 sem.pptx
 
DBMS Introduction 5 .pptx
DBMS Introduction 5 .pptxDBMS Introduction 5 .pptx
DBMS Introduction 5 .pptx
 
EDA
EDAEDA
EDA
 
Data Science in Python.pptx
Data Science in Python.pptxData Science in Python.pptx
Data Science in Python.pptx
 
Ch~2.pdf
Ch~2.pdfCh~2.pdf
Ch~2.pdf
 
ch2 DS.pptx
ch2 DS.pptxch2 DS.pptx
ch2 DS.pptx
 
Fundamentals of Data science Introduction Unit 1
Fundamentals of Data science Introduction Unit 1Fundamentals of Data science Introduction Unit 1
Fundamentals of Data science Introduction Unit 1
 

Recently uploaded

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
Modernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaModernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaWSO2
 
Choreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software EngineeringChoreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software EngineeringWSO2
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityWSO2
 
API Governance and Monetization - The evolution of API governance
API Governance and Monetization -  The evolution of API governanceAPI Governance and Monetization -  The evolution of API governance
API Governance and Monetization - The evolution of API governanceWSO2
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodJuan lago vázquez
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightSafe Software
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Victor Rentea
 
JavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuideJavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuidePixlogix Infotech
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontologyjohnbeverley2021
 
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...caitlingebhard1
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 

Recently uploaded (20)

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Modernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using BallerinaModernizing Legacy Systems Using Ballerina
Modernizing Legacy Systems Using Ballerina
 
Choreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software EngineeringChoreo: Empowering the Future of Enterprise Software Engineering
Choreo: Empowering the Future of Enterprise Software Engineering
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Platformless Horizons for Digital Adaptability
Platformless Horizons for Digital AdaptabilityPlatformless Horizons for Digital Adaptability
Platformless Horizons for Digital Adaptability
 
API Governance and Monetization - The evolution of API governance
API Governance and Monetization -  The evolution of API governanceAPI Governance and Monetization -  The evolution of API governance
API Governance and Monetization - The evolution of API governance
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..Understanding the FAA Part 107 License ..
Understanding the FAA Part 107 License ..
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
The Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and InsightThe Zero-ETL Approach: Enhancing Data Agility and Insight
The Zero-ETL Approach: Enhancing Data Agility and Insight
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
JavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate GuideJavaScript Usage Statistics 2024 - The Ultimate Guide
JavaScript Usage Statistics 2024 - The Ultimate Guide
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...Stronger Together: Developing an Organizational Strategy for Accessible Desig...
Stronger Together: Developing an Organizational Strategy for Accessible Desig...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 

classIX_DS_Teacher_Presentation.pptx

  • 2. Chapter 1: Introduction to data At the end of this chapter, students will understand the usefulness of studying data. They will know: • What is data and information? • What is DIKW Model? • How data influences our lives? • What are data footprints? • Data loss and recovery
  • 3. What is Data? • Whatever we can read, write, speak, and observe is data. Every day we share so much data when we read, write, speak, and watch. It can be numbers, alphabets, symbols, or a combination of all of these. • Data can be defined as facts or information which when stored, can be used as a basis for decision making, calculation, or discussion.
  • 4. Data vs. Information Data can be a number, symbol, or text, which may not mean anything to individuals on its own. However, when data is processed and put in context, they bear a meaning. This means data can be used for decision-making, calculations, or discussions. The data then becomes information.
  • 5. DIKW Model Data after transformation to information can also be converted to knowledge and wisdom. This is called the DIKW model, which explains how we move from Data to Information to Knowledge to Wisdom.
  • 6. How data influences our lives? Data heavily influence our daily lives, Starting from online shopping to watching our favorite shows on television to ordering food from the best restaurants. Data and data analysis have a significant impact on our lives. Few aspects of our lives that are impacted by data are: • Healthcare • Online shopping • Education • Travel • Online shows
  • 7. What are data footprints? • In our daily life, the internet has become an essential part. With all the activities we do on the internet, we create trails of data. These trails of data are called data footprints. • We create digital footprints that can be used to identify us as individuals. • This image shows how we connect to the rest of the world and the people around us.
  • 8. Data loss and recovery • Data can be lost, corrupted, damaged, or deleted due to multiple reasons like transaction failure, system crash, or disk failure. The process of restoring inaccessible, lost, corrupted, damaged, or deleted data is called data recovery. • To prevent data loss, we should frequently back up our data. Large enterprise systems generally use backup data storage from where they recover the data in case of any loss.
  • 9. Chapter 2: Arranging and Collecting data At the end of this chapter, Students will understand how to arrange and collect data. They will know: • What is data collection? • What are variables? • What are data sources? • What is big data? • Asking questions on data • Univariate vs. Multivariate data
  • 10. What is data collection? The method of gathering data for calculating and analyzing reliable insights is known as data collection, which is done using standard validated techniques. A researcher or scientist works based on the collected data. Data collection is a primary and essential step in most cases.
  • 11. What are variables? A variable is an attribute of an object of study that may vary for different cases. Thus, a variable varies for different case studies in research. Now variables can be of two types. • Numerical variable They represent values that have numbers. For Example, age, weight, height. • Categorical variable These variables represent values that have words, for example, name, nationality, sport, etc.
  • 12. Types of data Data can be divided into two categories, quantitative and qualitative.
  • 13. Types of data sources Data sources can be classified into primary and secondary sources.
  • 14. What is big data? • When the data volumes exceed the processing capacities of traditional databases, they are called Big Data. • Millions of users are using the social media platforms and creating an enormous amount of content every minute. • Systems, which can extract statistical insights from a huge amount of data, are called Big Data systems.
  • 15. Questioning your data • Data is typically stored as numbers (numeric) or labels (categories). Based on the type of data, we need to ask five simple questions from the data. • These are the 5 types of algorithms we will learn in this section
  • 16. Univariate vs Multivariate data • Univariate data is a type of data that has only one variable or which does not involve multiple parameters or relationships. For Example, the height of students is univariate data. • Multivariate data is a type of data which involves a relationship between multiple variables For Example, sales of umbrellas increase during the rainy season. We see umbrella sales are dependent on rainfall. So, there are two variables – "rainfall" and "sales." These types of data are more complex than univariate as they involve comparisons and relations with multiple parameters.
  • 17. Chapter 3: Data Visualizations At the end of this chapter, students will understand how to visualize data and make it more comprehendible. They will know: • The importance of visualization • Plotting data • Histograms • Use of shapes • Use of single and multivariable plots
  • 18. Importance of data visualization • With so much information around us, it is challenging to view the data and derive insights from it. • Representing data through visualizations like graphs, charts, maps, etc., gives us a visual context of the data. • It also makes complex data simple and enables the human mind to understand its significance.
  • 19. Plotting data There are different ways to visualize data depending on the data being modeled and its purpose. Different kinds of graphs and tables can be used to visualize the data. Students will learn about the following in this section: • Dot plots • Bar graphs • Maximum and Minimum • Frequency
  • 20. Dot Plot • A dot plot is a graphical display of data using dots. • Dots are used in dot plots to illustrate the quantitative values associated with the categorical values. For Example, Minutes to reach school. This data shows how long does it take five people to reach the school. Minutes: 6, 2, 4, 8, 5 Person: A, B, C, D, E
  • 21. Bar Graphs • A bar graph is a graphical display of data using bars of different heights. It is possible to plots the bars vertically or horizontally. • In a bar graph, the bars are presented to show elements so that they do not touch each other.
  • 22. Column Chart A bar graph is called a column chart or graph.
  • 23. Maximum and Minimum The minimum of the data is less than or equal to any other values in our data set. If we had to order all our data in ascending order, so the first number in our list would be the minimum. The maximum of the data is greater than or equal to all other values. If we had to order all our data in ascending order, so the last number listed would be the maximum.
  • 24. Frequency The frequency of a data value is the number of times the data value occurs/repeats. For Example, if five students have a score of 85 in English, the score of 85 is said to have a frequency of 5. The frequency of a data value is often represented by f.
  • 25. Histograms and Use of shapes • A histogram is a graphical display of data using bars of different heights. It is used to summarize discrete or continuous data. • In other words, a histogram displays the number of data points that fall within a given set of values (called 'bins') to provide a visual representation of numeric data. Unlike a vertical bar graph, a histogram shows no gaps between the bars.
  • 26. Use of single and multivariable plots Finally, the students will learn about the use of single and multivariable plots. They will also learn to visualize relationships between more than one variable 0 5 10 15 20 25 30 35 40 45 50 Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec Example of a multi variable plot Company A Company B Company C
  • 27. Chapter 4: Ethics in data science At the end of this chapter, students will understand the following: • Ethical guidelines around data analysis • Need for ethical guidelines in data analysis • Goals of ethical guidelines in data analysis • Data governance framework • Why do we need to govern data? • Goals of data governance
  • 28. Why do we need ethical guidelines There are many reasons why adhering to ethical guidelines in data analysis is essential. • Guidelines encourage facts, knowledge, and error avoidance. For Example, prohibitions against falsifying, fabricating, or misrepresenting data promote the truth and minimize error. • Ethical guidelines in data analysis also help to build public support for the analysis. People are more likely to confide in the analysis if they can trust data quality and integrity. • Before conducting any new analysis, we ask ourselves whether it will benefit the people. If it doesn't, we will not do it. • Minimizing data usage, we should use the least amount of data necessary to meet the desired objective, understanding that reducing data usage encourages more sustainable and less risky analysis
  • 29. Why do we need to govern data? To Improve data quality through efforts to identify and fix errors in data sets. To increase analytics accuracy and give decision-makers reliable information. To ensure compliance with data privacy laws and other regulations To implement and enforce policies that help prevent data errors and misuse To avoid inconsistent data in different departments and business units To come to an agreement on standard data definitions for a shared understanding of data
  • 30. Final Project Finally, we expect the students to finish two final projects with the teachers' help to understand the concepts learned in the previous chapters.

Editor's Notes

  1. At the end of this chapter, we have a fun activity for the students.
  2. Examples of Data: 100 (number – integer) 11.2 (number – decimal) A (alphabet) Apple (word – a set of alphabets) 18/12/2020 (date – a combination of numbers and symbols).
  3. If you are given a list of temperature readings, it would not make any sense. However, when the list is organized and analyzed, it shows that the global temperature is rising. This list now becomes information from data.
  4. 100°C is a piece of information, but it is also the water's boiling temperature. This then becomes knowledge. Also, if you touch boiling water, you may get burnt. This now becomes wisdom. Thus, we see data itself does not have much importance. It is the analysis of the data that has actual value.
  5. The teacher should ask the students to give a few examples of how data influences their lives.
  6. The teacher should ask the students to think of the positive and negative impacts of data footprints.
  7. The teacher should ask the students to read about some computer virus and ransomware that has recently created problems for organizations.
  8. The teacher should ask the students about various data collection done by social media in everyday life.
  9. The teacher should ask the students to give examples of univariate and multivariate data
  10. At the end of this chapter, we have some activities for the students.
  11. The teacher should ask the students to give more examples of data visualizations and where they can be used.
  12. The teacher should encourage the students to use various graphs and charts to plot a data set.
  13. The teacher should ask the students to create a histogram from a data set and to find out the shape of the histogram.
  14. The teacher should briefly explain why it is vital to exhibit ethical behavior while dealing with data. The teacher should give few simple examples to explain the concept in a better way.
  15. The teacher should help the students formulate the statistical investigative questions and encourage the students to do the analysis on their own.