What is Data?
Data
A critical mass of data is now being generated, thanks to the internet,
improved computing technology and the development of data analytics.
The world's most valuable resource is no longer oil, but data.
Data has no meaning until we study it, draw inferences from it or extract
insights from it. Think of data analytics as the process of extracting
usable fuel from crude oil: just as crude oil is first extracted from the
ground or the seabed and then cleaned and processed into good-quality
fuel, data is first mined from relevant sources and then cleaned and
analysed for meaningful insights.
“Data is a collection of facts, such as numbers, words, measurements, observations or even just descriptions
of things. Data may be in the form of text documents, images, audio clips, software programs, or other types
of data. If data is not put into context, it doesn't do anything to a human or computer.”
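Common data measurements
Unit          Value
Bit           1 bit
Byte          8 bits
Kilobyte      1024 bytes
Megabyte      1024 kilobytes
Gigabyte      1024 megabytes
Terabyte      1024 gigabytes
Petabyte      1024 terabytes
Exabyte       1024 petabytes
Zettabyte     1024 exabytes
Yottabyte     1024 zettabytes
Brontobyte    1024 yottabytes
Note: these values follow the binary (powers-of-1024) convention. Under the IEC standard, 1024 bytes is strictly a kibibyte (KiB), while the SI kilobyte is 1000 bytes. "Brontobyte" is an informal name, not a standard unit.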
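The arithmetic behind the table can be sketched in Python. This is a minimal illustration, assuming the binary 1024 convention the table uses; `UNITS`, `human_readable` and `to_bytes` are hypothetical helpers, not part of any library.

```python
# Unit names from the table, each 1024 times the previous one
# (binary convention; "brontobyte" is an informal name).
UNITS = ["bytes", "kilobytes", "megabytes", "gigabytes", "terabytes",
         "petabytes", "exabytes", "zettabytes", "yottabytes", "brontobytes"]


def human_readable(num_bytes: int) -> str:
    """Express a byte count in the largest table unit that keeps the value >= 1."""
    if num_bytes < 0:
        raise ValueError("byte count must be non-negative")
    value = float(num_bytes)
    for unit in UNITS[:-1]:
        if value < 1024:
            return f"{value:.2f} {unit}"
        value /= 1024  # step up to the next unit
    return f"{value:.2f} {UNITS[-1]}"


def to_bytes(amount: float, unit: str) -> int:
    """Convert an amount in a named unit back to bytes (1 unit = 1024**index bytes)."""
    return int(amount * 1024 ** UNITS.index(unit))
```

For example, `human_readable(1536)` gives `1.50 kilobytes` and `to_bytes(1, "gigabytes")` gives 1073741824 (1024³) bytes. Under the SI decimal convention the divisor would be 1000 instead of 1024.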
Classification of data
• Data can be classified as primary data or secondary data.
Primary Data:
Primary data are the facts and figures collected directly by an investigator
or group of investigators for the problem at hand. Primary data can be
further divided into two types: observational data and questionnaire data.
• Observational data are collected by observing people, activities or
processes, for example by recording readings from mechanical or electronic
(IoT) devices.
• Questionnaire data are collected through in-depth interviews or through
questionnaire forms administered in person, by telephone or over the
internet.
Secondary Data:
Secondary data are data collated from a source in which the information was
already stored. These facts and figures might have been recorded for an
earlier project by an individual, an agency or a government. Secondary data
can be divided into two types:
• Internal data: data generated inside the organization and captured through
ERP systems, CRM systems or other transactional systems. Financial
statements, vendor/customer lists and other reports of interest are
produced from these data.
• External data: data that originate outside the organization, such as
survey reports from external agencies, articles in periodicals and
magazines, and reports published by governments.
Classification of data
Data
  • Primary Data
      • Observational Data
      • Questionnaire Data
  • Secondary Data
      • Internal Data
      • External Data
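The classification rules above can be written as a small decision function. This is a hypothetical sketch: the `DataSource` fields and the `classify` function are illustrative names, and real datasets often mix sources in ways a two-level tree cannot capture.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    # Was the data gathered directly by the investigator for the problem at hand?
    collected_for_this_problem: bool
    # For primary data only: "observation" or "questionnaire".
    method: str = ""
    # For secondary data only: was it generated inside the organization?
    inside_organization: bool = False


def classify(source: DataSource) -> tuple[str, str]:
    """Return (top-level class, sub-class) following the classification tree."""
    if source.collected_for_this_problem:
        if source.method == "observation":
            return ("Primary", "Observational")
        if source.method == "questionnaire":
            return ("Primary", "Questionnaire")
        raise ValueError("primary data must be observational or questionnaire data")
    # Already-stored data is secondary; its origin decides internal vs external.
    return ("Secondary", "Internal" if source.inside_organization else "External")
```

For instance, sensor readings logged by the researcher classify as primary observational data, ERP financial statements as secondary internal data, and a government survey report as secondary external data.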
Pros and Cons of Primary and Secondary data
• Advantages of primary data:
  • The researcher can decide which variables to collect, and where, when
  and why to collect them.
  • The researcher can decide how much data the problem at hand requires.
  • The researcher can collect the data personally or hire an agency.
  • Because the data are collected under the researcher's supervision, they
  may need little or no cleaning.
• Disadvantages of primary data:
  • Collection is expensive in both money and time.
  • The researcher must continuously monitor the quality of the data.
• Advantages of secondary data:
  • Because the data already exist, they can be obtained without spending
  time collecting them.
  • Acquiring the data is relatively inexpensive.
  • The researcher is not personally responsible for the quality of the data.
• Disadvantages of secondary data:
  • Data quality cannot be guaranteed; the data may contain fake or blank values.
  • The data may be insufficient or inaccurate.
  • Extensive data cleaning may be required.
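The cleaning that secondary data often needs can be sketched with Python's standard `csv` module. This is a minimal illustration on a hypothetical extract: the column names, sample rows and `clean_rows` helper are assumptions, and the sketch only catches blank, non-numeric and duplicate rows. Spotting fake but well-formed values needs domain checks that no generic filter provides.

```python
import csv
import io

# Hypothetical secondary-data extract: a blank name, a blank amount,
# a non-numeric amount and an exact duplicate row.
RAW = """customer,amount
Asha,120.50
,75.00
Ravi,
Ravi,abc
Asha,120.50
Meena,300
"""


def clean_rows(text: str) -> tuple[list[dict], dict]:
    """Drop blank, malformed and duplicate rows; return kept rows and reject counts."""
    kept, seen = [], set()
    rejects = {"blank": 0, "malformed": 0, "duplicate": 0}
    for row in csv.DictReader(io.StringIO(text)):
        name = (row["customer"] or "").strip()
        amount_text = (row["amount"] or "").strip()
        if not name or not amount_text:      # blank values
            rejects["blank"] += 1
            continue
        try:
            amount = float(amount_text)      # non-numeric amounts
        except ValueError:
            rejects["malformed"] += 1
            continue
        key = (name, amount)
        if key in seen:                      # exact duplicates
            rejects["duplicate"] += 1
            continue
        seen.add(key)
        kept.append({"customer": name, "amount": amount})
    return kept, rejects


rows, rejects = clean_rows(RAW)
```

On this sample only the rows for Asha and Meena survive, with two blank, one malformed and one duplicate row rejected. Counting rejects, rather than silently dropping them, gives a rough measure of how trustworthy the source is.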
How is data stored?
Data can be stored in files, as in mainframe systems that use ISAM (Indexed Sequential Access Method) and VSAM (Virtual Storage Access Method). Another common file format for data
storage is comma-separated values (CSV). These formats continue to be used across a variety of machine
types.
In corporate computing and enterprise software, data is stored in databases managed by a database management
system (DBMS) or a relational database management system (RDBMS), whereas data from IoT devices, social media and similar sources is
often stored in data lakes.
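Writing and reading a CSV file can be sketched with Python's standard `csv` module. The records and column names are hypothetical, and an in-memory `io.StringIO` buffer stands in for a file on disk.

```python
import csv
import io

# Hypothetical records; io.StringIO stands in for a file on disk.
records = [
    {"order_id": "1001", "customer": "Asha", "amount": "120.50"},
    {"order_id": "1002", "customer": "Ravi", "amount": "75.00"},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["order_id", "customer", "amount"])
writer.writeheader()        # first line holds the column names
writer.writerows(records)   # one comma-separated line per record

csv_text = buffer.getvalue()

# Read it back: every value comes back as a string, because CSV has no types.
buffer.seek(0)
loaded = list(csv.DictReader(buffer))
```

Every value read back is a string: CSV stores no data types, so a program has to convert fields such as `amount` itself. When writing to a real file, the `csv` documentation recommends opening it with `newline=""`.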
Types of Data
• Data can be divided into two types: structured data and unstructured
data.
• In structured data, the data are organized in rows and columns in a
formatted structure, typically a database table, so that they can be
processed and analysed more effectively. Because each field in a
structured format is discrete, its information can be retrieved on its
own or combined with data from other fields in a variety of ways.
Examples include numbers, words, measurements, observations or short
descriptions of things. Transactional data in financial (ERP) systems
and other business applications are typical uses of structured data.
Structured data are stored in databases managed by a DBMS or RDBMS, or
in data warehouses.
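The idea that each field of structured data can be retrieved separately or in combination can be sketched with Python's built-in `sqlite3` module, which stands in here for any RDBMS. The `transactions` table, its columns and the sample rows are hypothetical.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions ("
    " txn_id INTEGER PRIMARY KEY,"
    " txn_date TEXT NOT NULL,"   # ISO dates sort correctly as text
    " customer TEXT NOT NULL,"
    " amount REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO transactions (txn_id, txn_date, customer, amount) VALUES (?, ?, ?, ?)",
    [
        (1, "2024-01-05", "Asha", 120.50),
        (2, "2024-01-06", "Ravi", 75.00),
        (3, "2024-02-01", "Asha", 300.00),
    ],
)

# One field on its own ...
customers = [row[0] for row in conn.execute(
    "SELECT DISTINCT customer FROM transactions ORDER BY customer")]

# ... or several fields combined: total amount per customer.
totals = dict(conn.execute(
    "SELECT customer, SUM(amount) FROM transactions GROUP BY customer ORDER BY customer"))
conn.close()
```

With this data, `customers` holds `['Asha', 'Ravi']` and `totals` holds `{'Asha': 420.5, 'Ravi': 75.0}`. The `NOT NULL` constraints are part of the schema, which is what makes structured data rigid: a row missing a customer name would be rejected at insert time.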
• Unstructured data:
Unstructured data is information that does not follow the forms used in
conventional data models and is not a good fit for a mainstream
relational database. The growth of the internet has led to a data
explosion, with terabytes of data generated every second in non-uniform
formats such as videos, audio, photos, e-mails, Word documents and
PowerPoint presentations, and the list goes on. With the advent of the
IoT (Internet of Things), large amounts of data are also generated by
sensors attached to machines and automobiles, alongside server log
files and social media feeds. Unstructured data is generally stored in
data lakes.
Difference between Structured data and Unstructured data
• Of the two types, unstructured data is the least formatted and
structured data is the most formatted. The table below summarizes the
differences between them.
Characteristics
• Structured data: usually numbers and text; easy to search; pre-defined structure such as tables; highly organised; easy to analyse
• Unstructured data: text, videos, audio, images or other formats; no pre-defined structure; difficult to search
Stored in
• Structured data: relational databases, data warehouses
• Unstructured data: data warehouses, data lakes, NoSQL databases
Applications
• Structured data: ERP systems such as SAP and Oracle; CRM systems such as Siebel CRM; railway/airline reservation systems; spreadsheets
• Unstructured data: Word and PowerPoint files; email clients; social media sites; e-commerce sites
Flexibility
• Structured data: schema-dependent, very rigid
• Unstructured data: very flexible, no fixed schema
Examples
• Structured data: ERP transactions, dates, phone numbers, amounts, names
• Unstructured data: free text, email messages, social media posts, audio files, IoT sensor data
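The contrast between "easy to search" and "difficult to search" can be sketched in Python. This is a hypothetical illustration: the contact table, the e-mail text and the phone-number pattern are all assumptions, and real unstructured text would need far more robust extraction.

```python
import re
import sqlite3

# Structured: the phone number lives in its own column, so one query finds it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contacts (name TEXT, phone TEXT)")
conn.execute("INSERT INTO contacts VALUES (?, ?)", ("Asha", "555-0100"))
structured_hit = conn.execute(
    "SELECT phone FROM contacts WHERE name = ?", ("Asha",)).fetchone()[0]
conn.close()

# Unstructured: the same fact is buried in free text (a hypothetical e-mail body),
# so we must guess a pattern and scan for it.
email_body = (
    "Hi team, Asha asked us to call her back on 555-0100 before Friday. "
    "Ravi's old number 555-0199 is no longer in use."
)
PHONE_PATTERN = re.compile(r"\b\d{3}-\d{4}\b")  # assumed local format only
unstructured_hits = PHONE_PATTERN.findall(email_body)
```

The query returns exactly Asha's number, while the pattern finds both numbers in the text but cannot tell, by itself, which one belongs to Asha or that one is out of date.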

  • 12. Difference between Structured data and Unstructured data • Out of the above two unstructured data is the least formatted and structured data is the most formatted. Below picture depicts Difference between Structured and Unstructured data Structured Data Unstructured Data Characteristics • Usually numbers, texts • Easy to search • Pre-defined structure like tables • Highly organised • Easy to analyse • Text, videos, audios, images or other formats • No pre-defined structure • Difficult to search Stored in Relational database, Data warehouses Dataware houses, Datalakes, NoSQL databases Applications • ERP systems like SAP, Oracle etc. • CRM systems like SiebelCRM • Railway/Airlines reservation system • Spread sheet • Word, Powerpoint files • Email client • Social Media sites • E-commerce sites Flexibility • Schema dependent, Very rigid • Very flexible, Absence of schema Example • Transactions in ERP • Date • Phone numbers • Amount • Names • Text • Email messages • Social media posts • Audio files • IoT sensor data