Types of Digital Data
Definition
Sources of Digital Data
Types of Digital Data
• Structured Data
• Semi-structured Data
• Unstructured Data
Definition and Meaning of Digital Data
Definition of Digital Data
Digital describes electronic technology that generates, stores, and processes data in terms of two states: positive and non-positive. Positive is represented by the number 1 and non-positive by the number 0.
Digital data is information stored on a computer system as a series of 0s and 1s, that is, in binary. All data in a computer is in digital form.
Digital data represents other forms of data, such as text, images, and sound, using machine-readable encoding systems that various technologies can interpret.
Meaning of Digital Data
• Digital means recording or storing information as a series of 1s and 0s.
• The most fundamental of these systems is the binary system, which stores complex audio, video, or text information as a series of binary digits, traditionally ones and zeros.
• Digital data is therefore expressed in a binary language.
• When you press a key on the keyboard, an electrical circuit is closed.
• The circuit acts like a switch and has only two possible states: open or closed.
• If you know Morse code, the idea is similar.
• In Morse code, a string of dots and dashes represents one letter or number; binary likewise builds every value from just two symbols.
• There is no halfway or in-between state.
• The computer interprets the status of each switch, open or closed, as a 0 or a 1.
• Each binary digit is known as a bit.
• Computer disks and drives store this information as sequences of 0s and 1s.
• A byte is composed of eight bits.
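To see bits and bytes in practice, the short Python sketch below encodes text as bytes and shows each byte as eight 0s and 1s, then reverses the process. The choice of UTF-8 as the character encoding is an assumption for illustration; other encodings map characters to different bit patterns.

```python
def to_bits(text: str) -> list[str]:
    """Encode text as UTF-8 and return each byte as an 8-character bit string."""
    data = text.encode("utf-8")              # bytes: a sequence of integers 0..255
    return [format(b, "08b") for b in data]  # zero-padded to exactly 8 binary digits


def from_bits(bits: list[str]) -> str:
    """Reverse of to_bits: parse each 8-bit string back to a byte, then decode."""
    return bytes(int(b, 2) for b in bits).decode("utf-8")


if __name__ == "__main__":
    bits = to_bits("Hi")
    print(bits)             # ['01001000', '01101001']
    print(from_bits(bits))  # Hi
```

Each character of "Hi" fits in one byte here, but UTF-8 uses more than one byte for many characters (for example, "é" takes two).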
Sources of Digital Data
• A data source, in the context of computer science and computer applications, is the location that the data being used comes from.
• The data source can be a database, a dataset, or a spreadsheet.
• Data, as we know, is massive and exists in various forms.
• If it is not classified or sourced well, it can end up wasting precious time and resources.
• Companies need to know how to distinguish between the various data sources available and classify each one by its usability and relevance.
• The following are important sources of digital data for most industries:
1. Internal Transactional Data
2. IoT as a Big Data Source
3. Syndicated Data
4. Trading Partners Data
5. Open Data
6. Media as a Big Data Source
1. Internal Transactional Data
• Transactional data is data relating to day-to-day transactions.
• Data from transactions that happen internally (inside the organizational boundary) is called internal transactional data.
• Internal data is the easiest source to obtain, because you don't have to negotiate a formal contract with a third party.
• It is also important to secure internal data, not only from the outside world but also inside the organization, by granting access rights only to those who need them.
• This data often plays a vital role in decision making.
• Examples of internal transactional data include purchases, returns, sales, inventory, orders, invoices, and payments.
• Once analyzed, it supports most routine business decisions.
• Make sure the proper processes are in place, and follow organizational and staff changes closely so that new stakeholders understand why access to the data must remain secure.
• Although it is the easiest source, it is valuable to keep it confidential and to maintain it so that it remains useful in the future.
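As a small illustration of analyzing internal transactional data to support a routine decision, the Python sketch below nets sales against returns for each product. The `Transaction` record, its fields, and the sample values are hypothetical; in practice such a summary would usually be computed from the organization's own transaction database.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Transaction:
    # Hypothetical record layout; real transaction records carry many more fields.
    product: str
    kind: str       # "sale" or "return"
    amount: float


def net_sales_by_product(transactions):
    """Sum sales minus returns for each product."""
    totals = defaultdict(float)
    for t in transactions:
        if t.kind == "sale":
            totals[t.product] += t.amount
        elif t.kind == "return":
            totals[t.product] -= t.amount
        else:
            raise ValueError(f"unknown transaction kind: {t.kind!r}")
    return dict(totals)


if __name__ == "__main__":
    log = [
        Transaction("widget", "sale", 120.0),
        Transaction("widget", "return", 20.0),
        Transaction("gadget", "sale", 75.0),
    ]
    print(net_sales_by_product(log))  # {'widget': 100.0, 'gadget': 75.0}
```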
2. IoT as a Big Data Source
• Machine-generated content, or data created by IoT devices, constitutes a valuable source of big data.
• This data is usually generated by sensors connected to electronic devices.
• The sourcing capacity depends on the ability of the sensors to provide accurate, real-time information.
• IoT is gaining momentum and produces big data not only from computers and smartphones but potentially from every device that can emit data.
• With IoT, data can now be sourced from medical devices, vehicular processes, video games, meters, cameras, household appliances, and the like.
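Because the value of IoT data depends on sensors delivering accurate, real-time readings, consumers of such feeds commonly discard stale or implausible values. The Python sketch below shows one simple filter. The `Reading` record, the 60-second freshness window, and the -40 to 125 degree range are illustrative assumptions, not settings from any particular IoT platform.

```python
import time
from dataclasses import dataclass


@dataclass
class Reading:
    sensor_id: str
    value: float      # assumed to be a temperature in degrees Celsius
    timestamp: float  # seconds since the Unix epoch


def usable_readings(readings, now, max_age_s=60.0, valid_range=(-40.0, 125.0)):
    """Keep readings that are recent enough and within an assumed plausible range."""
    lo, hi = valid_range
    return [
        r for r in readings
        if now - r.timestamp <= max_age_s and lo <= r.value <= hi
    ]


if __name__ == "__main__":
    now = time.time()
    readings = [
        Reading("thermo-1", 21.5, now - 5),    # fresh and in range: kept
        Reading("thermo-2", 999.0, now - 5),   # implausible value: dropped
        Reading("thermo-3", 22.0, now - 600),  # ten minutes old: dropped
    ]
    print([r.sensor_id for r in usable_readings(readings, now)])  # ['thermo-1']
```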
3. Syndicated Data
• A syndicated service, or syndicated research, is a research study conducted and funded by a market research firm rather than for any specific client.
• The results of such research are often provided in the form of reports, presentations, or raw data.
• Syndicated data is usually the easiest to control.
• Because you are paying a service provider to deliver data to you, you have a contract with this provider.
• However, you still need to consider what will happen if the service provider goes out of business or changes its business model.
• Because this data is not free, companies have to make good use of it.
4. Trading Partners Data
• Trading partners data is very similar to syndicated data, except that it is usually provided not as a standalone service but as part of a broader relationship, for example between a retailer and a manufacturer.
• Companies have to develop processes to collect data from their channel partners.
• Trade partners can make a valuable contribution by supplying data that offers insight into consumers.
• Trade partners are often not motivated to do so, so companies need to develop approaches that encourage them to supply this data consistently.
5. Open Data
• The good news about open data is that it is free, but that is also the bad news.
• If you study the terms of use and licensing agreement for the data carefully, you should be safe legally.
• But there is no guarantee that the service will be provided in the long run, or that it will be provided consistently.
• The risk associated with the access methods provided is very high.
• And if the service stops responding, you have no recourse.
• Find multiple sources, and do not build your business on the assumption that open data feeds will remain available in the long run.
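The advice to find multiple sources can be built into the code that consumes open data. The Python sketch below tries a list of sources in order and falls back to the next one when a source fails. The source names and the two stub fetch functions are hypothetical stand-ins for real requests to open data portals.

```python
from typing import Callable, Sequence


class AllSourcesFailed(Exception):
    """Raised when every configured source fails."""


def fetch_with_fallback(sources: Sequence[tuple[str, Callable[[], str]]]) -> tuple[str, str]:
    """Try each (name, fetch) pair in order and return the first successful result.

    Errors from individual sources are collected rather than raised immediately,
    so one feed going offline does not stop the whole pipeline.
    """
    errors = []
    for name, fetch in sources:
        try:
            return name, fetch()
        except Exception as exc:  # a real client would catch narrower network errors
            errors.append(f"{name}: {exc}")
    raise AllSourcesFailed("; ".join(errors))


if __name__ == "__main__":
    def primary_feed():
        raise ConnectionError("service not responding")

    def mirror_feed():
        return "year,population\n2020,1000\n"

    name, data = fetch_with_fallback([("primary", primary_feed), ("mirror", mirror_feed)])
    print(name)  # mirror
```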
6. Media as a Big Data Source
• Media is one of the most popular sources of big data, as it provides valuable insights into consumer preferences and changing trends.
• It is a fast way for businesses to get an in-depth overview of their target audience, draw patterns and conclusions, and enhance their decision-making.
• Media includes social media and interactive platforms, such as Google, Facebook, Twitter, YouTube, and Instagram, as well as generic media such as images, videos, audio, and podcasts that provide quantitative and qualitative insights into every aspect of user interaction.
Types of Digital Data
- Structured Data
- Unstructured Data
- Semi-structured Data
Structured Data
1. Definition
• Data that is in an organized form (e.g., in rows and columns) and can be used by a computer program is called “Structured Data”.
• Structured data exists in a format created to be captured, stored, organized, and analyzed.
• Structured data is organized in a format that a database or other technology can easily use.
• The term structured data generally refers to data that has a defined length and format.
• Structured data has been organized into a formatted repository (a central location in which data is stored and managed), typically a database, so that its elements are addressable for more effective processing and analysis.
2. Sources of Structured Data
• Databases (e.g., Microsoft Access)
• Spreadsheets
• SQL
• OLTP (online transaction processing) systems
SQL stands for Structured Query Language. SQL lets you access and manipulate databases.
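A short illustration of structured data and SQL working together, using Python's built-in `sqlite3` module. SQLite stands in here for any relational database, and the `employees` table and its rows are hypothetical. The table definition fixes the columns in advance, which is what lets the query filter and aggregate by column directly.

```python
import sqlite3

# In-memory database so the example leaves no files behind.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# The table fixes the structure up front: every row has the same named columns.
cur.execute("""
    CREATE TABLE employees (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        dept    TEXT NOT NULL,
        salary  REAL NOT NULL
    )
""")
cur.executemany(
    "INSERT INTO employees (emp_id, name, dept, salary) VALUES (?, ?, ?, ?)",
    [(1, "Asha", "Sales", 52000.0),
     (2, "Ravi", "Sales", 48000.0),
     (3, "Meera", "IT", 61000.0)],
)
conn.commit()

# Because every value sits in a known column, SQL can group and aggregate directly.
rows = cur.execute(
    "SELECT dept, COUNT(*), AVG(salary) FROM employees GROUP BY dept ORDER BY dept"
).fetchall()
print(rows)  # [('IT', 1, 61000.0), ('Sales', 2, 50000.0)]
conn.close()
```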
3. Storage of Structured Data
• Relational databases
• Data warehouses
• Spreadsheets
Example of Structured Data
4. Characteristics of Structured Data
Structured data:
• Conforms to a data model
• Is stored in the form of rows and columns (e.g., a relational database)
• Resides in fixed fields within a record or file
• Has an explicitly known definition, format, and meaning
• Has the same attributes for every entity in a group
• Groups similar entities together
Summary of Structured Data
Unstructured Data
1. Definition
• Unstructured data is information that either does not have a predefined data model or is not organized in a predefined manner.
• Unstructured data represents any data that does not have a recognizable structure.
• Unlike structured data, it does not fit neatly into the traditional row-and-column structure of relational databases.
• It is data that does not conform to a data model or is not in a form that a computer program can easily use.
• Examples include memos, chat room messages, PowerPoint presentations, images, audio, videos, letters, research papers, white papers, and the body of an e-mail.
Formats of Digital Data
2. Sources of Unstructured Data
• Web pages
• Memos
• Videos (MPEG, etc.)
• Images (JPEG, GIF, etc.)
• Body of an e-mail
• Word documents
• PowerPoint presentations
• Chats
• Reports
• White papers
• Surveys
3. Challenges in Storage of Unstructured Data
Challenges faced:
• Storage space: the sheer volume of unstructured data and its unprecedented growth make it difficult to store. Audio, video, images, etc. occupy a huge amount of storage space.
• Scalability: scalability becomes an issue as the volume of unstructured data grows.
• Retrieving information: retrieving and recovering unstructured data is cumbersome.
• Security: ensuring security is difficult because the data comes from varied sources (e.g., e-mail, web pages).
• Update and delete: updating, deleting, etc. are not easy because of the unstructured form.
• Indexing and searching: indexing becomes difficult as data grows, and searching is difficult for non-text data.
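The indexing and searching challenge can be made concrete with an inverted index, a common technique for searching text (it is not named on the slide above). The Python sketch below maps each word to the documents that contain it; the document IDs and contents are made up. The image entry contributes nothing to the index, which illustrates why non-text data is hard to search unless metadata is added or it is converted to text.

```python
import re
from collections import defaultdict


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the IDs of documents containing every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for w in words[1:]:
        result &= index.get(w, set())
    return result


if __name__ == "__main__":
    docs = {
        "memo-1": "Quarterly sales meeting moved to Friday.",
        "email-7": "Please send the sales report before Friday.",
        "photo-3": "",  # stand-in for an image: no text to index without metadata
    }
    idx = build_index(docs)
    print(sorted(search(idx, "sales Friday")))  # ['email-7', 'memo-1']
```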
3. Solution for Storage of Unstructured Data
Possible solutions:
• Change formats: unstructured data may be converted to formats that are easier to manage, store, and search. For example, IBM is working on providing a solution that converts audio, video, etc. to text.
• New hardware: create hardware that supports unstructured data, either complementing the existing storage devices or standing alone for unstructured data.
• RDBMS/BLOBs: store the data in relational databases that support BLOBs (Binary Large Objects).
• XML: store the data in XML, which tries to give some structure to unstructured data by using tags and elements.
• CAS: organize files based on their metadata.
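The RDBMS/BLOB option can be sketched with Python's built-in `sqlite3` module, where a `bytes` value is stored in a BLOB column and comes back unchanged. The `media` table and the helper function names are hypothetical, and SQLite is only a convenient stand-in for a server database.

```python
import sqlite3
from typing import Optional


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")  # in-memory database for the demo
    conn.execute("CREATE TABLE media (name TEXT PRIMARY KEY, content BLOB NOT NULL)")
    return conn


def save_blob(conn: sqlite3.Connection, name: str, data: bytes) -> None:
    # A Python bytes value is bound as an SQLite BLOB parameter.
    conn.execute("INSERT OR REPLACE INTO media (name, content) VALUES (?, ?)", (name, data))
    conn.commit()


def load_blob(conn: sqlite3.Connection, name: str) -> Optional[bytes]:
    row = conn.execute("SELECT content FROM media WHERE name = ?", (name,)).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = open_store()
    # Placeholder for real file bytes (a real program might use open(path, "rb").read()):
    # the 8-byte PNG signature followed by filler.
    fake_png = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A]) + b"...pixel data..."
    save_blob(conn, "logo.png", fake_png)
    print(load_blob(conn, "logo.png") == fake_png)  # True
    conn.close()
```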
• A Binary Large Object (BLOB) is a collection of binary data stored as a single entity in a database management system. BLOBs are typically images, audio, or other multimedia objects, though binary executable code is sometimes stored as a BLOB.
• Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable.
• Content-addressable storage (CAS) is a way of storing information so that it can be retrieved based on its content instead of its storage location. It is used extensively to store e-mails.
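The core idea of content-addressable storage, retrieving data by what it contains rather than where it is stored, can be shown by using a cryptographic hash of the content as its address. The Python sketch below is a toy, in-memory illustration; SHA-256 is one common choice of hash and is an assumption here, and real CAS products typically add persistence, replication, and retention policies.

```python
import hashlib


class ContentAddressableStore:
    """Toy in-memory CAS: the address of each object is the SHA-256 hash of its bytes."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        # Identical content always hashes to the same address, so duplicates are stored once.
        self._objects.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        data = self._objects[address]  # raises KeyError if the address is unknown
        # Re-hashing on read detects corruption: content must still match its address.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("stored content does not match its address")
        return data

    def __len__(self) -> int:
        return len(self._objects)


if __name__ == "__main__":
    store = ContentAddressableStore()
    a1 = store.put(b"Subject: Q3 report\n\nSee attached.")
    a2 = store.put(b"Subject: Q3 report\n\nSee attached.")  # same e-mail archived twice
    print(a1 == a2, len(store))  # True 1
```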
4. Characteristics of Unstructured Data
Unstructured data:
• Does not conform to any data model
• Cannot be stored in the form of rows and columns as in a database
• Is not in any particular format or sequence
• Is not easily usable by a program
• Does not follow any rules
• Has no easily identifiable structure
Semi-structured Data
1. Definition
• Data that does not conform to a data model but has some structure. However, it is not in a form that a computer program can easily use.
• It has some structure, but it is not organized in a relational model such as a table.
• Semi-structured data is information that does not reside in a relational database but has some organizational properties that make it easier to analyze.
• With some processing, it can be stored in a relational database.
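One common way to process semi-structured data so it fits a relational database is to flatten it: use the union of all attributes as columns and leave missing attributes empty (NULL). The Python sketch below does this for a few JSON records with the standard `json` and `sqlite3` modules. The records and the `people` table are hypothetical, and real pipelines often also have to handle nested values.

```python
import json
import sqlite3

# Hypothetical JSON records: the same kind of entity, but the attributes differ.
raw = """[
  {"id": 1, "name": "Asha", "email": "asha@example.com"},
  {"id": 2, "name": "Ravi", "phone": "555-0101"},
  {"id": 3, "name": "Meera", "email": "meera@example.com", "phone": "555-0199"}
]"""
records = json.loads(raw)

# Step 1: derive a fixed set of columns from the union of all keys.
columns = sorted({key for rec in records for key in rec})

# Step 2: create a table with those columns; missing attributes become NULL.
# Column names come from trusted sample data here; keys from untrusted input
# would need validating before being placed in SQL text.
conn = sqlite3.connect(":memory:")
conn.execute(f"CREATE TABLE people ({', '.join(columns)})")
placeholders = ", ".join("?" for _ in columns)
conn.executemany(
    f"INSERT INTO people ({', '.join(columns)}) VALUES ({placeholders})",
    [tuple(rec.get(col) for col in columns) for rec in records],
)
rows = conn.execute("SELECT * FROM people ORDER BY id").fetchall()
print(columns)  # ['email', 'id', 'name', 'phone']
print(rows[1])  # (None, 2, 'Ravi', '555-0101')
conn.close()
```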
2. Sources of Semi-structured Data
• E-mail
• XML
• TCP/IP packets
• Zipped files
• Binary executables
• Markup languages
• Integration of data from heterogeneous sources
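E-mail shows why such sources count as semi-structured: the headers are labeled fields, while the body is free text. The Python sketch below uses the standard library's `email` module to parse a made-up message.

```python
from email import message_from_string

raw_email = """\
From: asha@example.com
To: ravi@example.com
Subject: Sales figures
Date: Mon, 06 Mar 2023 10:15:00 +0000

Hi Ravi,
The Q1 numbers look better than expected. Let's discuss on Friday.
"""

msg = message_from_string(raw_email)

# The headers are tagged name/value pairs, so a program can read them directly...
print(msg["Subject"])  # Sales figures
print(msg["From"])     # asha@example.com

# ...but the body is free-form text with no fields a program can rely on.
body = msg.get_payload()
print(body.splitlines()[0])  # Hi Ravi,
```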
3. Storage of Semi-structured Data
Schemas
• Describe the structure and content of data to some extent
• Assign meaning to data, allowing automatic search and indexing
Graph-based data models
• In computing, a graph database (GDB) is a database that uses graph structures for representing and storing data.
• Used for data exchange among heterogeneous sources
XML
• XML (Extensible Markup Language) is a semi-structured document language.
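XML's tags and elements are what give semi-structured data its partial structure: a program can find values by tag name even when records do not share the same fields. The sketch below uses Python's standard `xml.etree.ElementTree` module; the `catalog`/`book` document and the `summarize` helper are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog: the tags label each value, but records need not have
# identical fields (the second book has no <year> element).
doc = """
<catalog>
  <book id="b1">
    <title>Data Basics</title>
    <author>A. Writer</author>
    <year>2019</year>
  </book>
  <book id="b2">
    <title>Working with XML</title>
    <author>B. Author</author>
  </book>
</catalog>
"""


def summarize(xml_text: str):
    root = ET.fromstring(xml_text.strip())
    # findtext returns the default when a tag is missing instead of raising an error.
    return [
        (book.get("id"), book.findtext("title"), book.findtext("year", default="unknown"))
        for book in root.findall("book")
    ]


if __name__ == "__main__":
    for row in summarize(doc):
        print(*row)
    # b1 Data Basics 2019
    # b2 Working with XML unknown
```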
4. Characteristics of Semi-structured Data
Semi-structured data:
• Does not conform to a data model but contains tags and elements (metadata)
• Cannot be stored in the form of rows and columns as in a database
• The tags and elements describe how the data is stored
• Has insufficient metadata
• Attributes in a group may not be the same
• Similar entities are grouped
