Information Processing
UNIT-1
Data, Information, Knowledge
By
Kishor Sakariya
VBT’s Institute of Library and Information Science,
C. U. Shah University
Wadhwan city
Gujarat INDIA
What is Data?
• Data is all around us. But what exactly is it?
Data is a value assigned to a thing. Take for
example the balls in the picture below.
What is Data? (continued)
• What can we say about these? They are golf balls, right? So
one of the first data points we have is that they are used for
golf. Golf is a category of sport, so this helps us to put the
ball in a taxonomy. But there is more to them. We have the
colour: “white”, the condition “used”. They all have a size,
there is a certain number of them and they probably have
some monetary value, and so on.
• Even unremarkable objects have a lot of data attached to
them. You too: you have a name (most people have given
and family names), a date of birth, weight, height,
nationality, etc. All these things are data.
What is Data? (continued)
• In the example above, we can already see that there are different types of data. The two
major categories are qualitative and quantitative data.
• Qualitative data is everything that refers to the quality of something: a description of the
colours, texture and feel of an object, a description of experiences, and an interview are all
qualitative data.
• Quantitative data is data that refers to a number. E.g. the number of golf balls, the size, the
price, a score on a test etc.
• However, there are also other categories that you will most likely encounter:
• Categorical data puts the item you are describing into a category: in our example the
condition “used” would be categorical (with categories such as “new”, “used”, “broken”, etc.).
• Discrete data is numerical data that has gaps in it: e.g. the count of golf balls. There can only
be whole numbers of golf balls (there is no such thing as 0.3 golf balls). Other examples are
scores in tests (where you receive e.g. 7/10) or shoe sizes.
• Continuous data is numerical data with a continuous range: e.g. the size of the golf balls can be
any value (e.g. 10.53mm or 10.54mm but also 10.536mm), or the size of your foot (as
opposed to your shoe size, which is discrete). In continuous data, all values are possible with
no gaps in between.
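The categories above can be sketched in code. This is only an illustration: the names `Condition`, `GolfBallLot` and `kind_of` are made up for this example, and mapping Python types onto data categories is a simplifying assumption rather than a general rule.

```python
from dataclasses import dataclass
from enum import Enum


class Condition(Enum):
    """Categorical data: a fixed set of possible categories."""
    NEW = "new"
    USED = "used"
    BROKEN = "broken"


@dataclass
class GolfBallLot:
    colour: str           # qualitative: a free description
    condition: Condition  # categorical: one of a fixed set
    count: int            # discrete: whole numbers only
    diameter_mm: float    # continuous: any value in a range


def kind_of(value):
    """Label a value with the data category it illustrates (hypothetical helper)."""
    if isinstance(value, Enum):
        return "categorical"
    if isinstance(value, int):
        return "discrete"
    if isinstance(value, float):
        return "continuous"
    return "qualitative"


lot = GolfBallLot(colour="white", condition=Condition.USED, count=5, diameter_mm=43.0)
for name in ("colour", "condition", "count", "diameter_mm"):
    print(name, "->", kind_of(getattr(lot, name)))
```

Note that the count is stored as an integer and the diameter as a floating-point number, which mirrors the discrete versus continuous distinction.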
From Data to Information to
Knowledge.
• Data, when collected and structured, suddenly becomes a lot more useful.
Let’s do this in the table below.
Colour             White
Category           Sport – Golf
Condition          Used
Diameter           43mm
Price (per ball)   $0.50 (AUD)
• But each of the data values is still rather meaningless by itself. To create
information out of data, we need to interpret that data.
• Let’s take the size: A diameter of 43mm doesn’t tell us much. It is only
meaningful when we compare it to other things. In sports there are often
size regulations for equipment. The minimum size for a competition golf
ball is 42.67mm. Good, we can use that golf ball in a competition. This is
information. But it still is not knowledge. Knowledge is created when the
information is learned, applied and understood.
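The step from data to information can be sketched as a comparison. The 42.67mm minimum comes from the text above; the function name `interpret_diameter` and the wording of its results are assumptions made for this example.

```python
# Minimum diameter for a competition golf ball, as stated in the text.
MIN_COMPETITION_DIAMETER_MM = 42.67


def interpret_diameter(diameter_mm, minimum_mm=MIN_COMPETITION_DIAMETER_MM):
    """Turn a bare measurement (data) into a statement we can act on (information)."""
    if diameter_mm >= minimum_mm:
        return "meets the minimum size for competition"
    return "too small for competition"


print(interpret_diameter(43))    # the ball from the table
print(interpret_diameter(42.0))  # a hypothetical smaller ball
```

The number 43 alone is data; the result of the comparison is information. Knowing when and why to run such a check is closer to knowledge.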
Unstructured vs. Structured data
• Data for Humans
• A plain sentence such as “we have 5 white used golf balls with a
diameter of 43mm at 50 cents each” might be easy for a human to
understand, but it is hard for a computer to
interpret. The above sentence is what we call
unstructured data. Unstructured data has no fixed underlying
structure: the sentence could easily be reworded, and it is
not clear which word refers to what exactly. Likewise, PDFs
and scanned images may contain information which is
pleasing to the human eye because it is laid out nicely, but they
are often not machine-readable.
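To see why such a sentence is hard for a computer, consider a minimal sketch that pulls the values out with a regular expression. The pattern and the field names are assumptions written for this one sentence; the point is that they stop working as soon as the wording changes.

```python
import re

SENTENCE = "we have 5 white used golf balls with a diameter of 43mm at 50 cents each"

# A pattern tailored to the exact wording above (illustrative only).
PATTERN = re.compile(
    r"we have (?P<quantity>\d+) (?P<colour>\w+) (?P<condition>\w+) golf balls "
    r"with a diameter of (?P<diameter_mm>\d+)mm at (?P<price_cents>\d+) cents each"
)


def extract(sentence):
    """Return the fields as a dict, or None if the sentence does not fit the pattern."""
    match = PATTERN.fullmatch(sentence)
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("quantity", "diameter_mm", "price_cents"):
        fields[key] = int(fields[key])
    return fields


record = extract(SENTENCE)
reworded = extract("there are five used white golf balls, 43mm across, 50c apiece")
print(record)
print(reworded)  # None: same meaning for a human, but the pattern no longer fits
```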
Unstructured vs. Structured data
• Data for Computers
• Computers are inherently different from humans. It can
be exceptionally hard to make computers extract
information from certain sources. Some tasks that
humans find easy are still difficult to automate with
computers. For example, interpreting text that is
presented as an image is still a challenge for a
computer. If you want your computer to process and
analyse your data, it has to be able to read and process
the data. This means it needs to be structured and in
a machine-readable form.
Unstructured vs. Structured data
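• One of the most commonly used formats for exchanging
data is CSV. CSV stands for comma-separated values. The
same thing expressed as CSV can look something like:
"quantity","color","condition","item","category","diameter (mm)","price per unit (AUD)"
5,"white","used","ball","golf",43,0.5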
• This is much simpler for your computer to understand and
can be read directly by spreadsheet software. Note that
words have quotes around them: this distinguishes them as
text (string values in computer speak), whereas numbers
do not have quotes. It is worth mentioning that there are
many more formats out there that are structured and
machine-readable.
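As a rough sketch of how a program reads such data, the example below parses a CSV version of the golf ball sentence with Python's standard `csv` module. The column names and the `parse_csv` helper are assumptions chosen for this illustration. The `QUOTE_NONNUMERIC` option treats quoted fields as text and converts unquoted fields to numbers, which matches the quoting convention described above.

```python
import csv
import io

# Illustrative CSV text; the column names are assumptions for this example.
CSV_TEXT = '''"quantity","colour","condition","item","category","diameter_mm","price_aud"
5,"white","used","ball","golf",43,0.5
'''


def parse_csv(text):
    """Parse CSV text into a list of dicts, one per data row."""
    # QUOTE_NONNUMERIC: quoted fields stay strings, unquoted fields become floats.
    reader = csv.reader(io.StringIO(text), quoting=csv.QUOTE_NONNUMERIC)
    header = next(reader)
    return [dict(zip(header, row)) for row in reader]


rows = parse_csv(CSV_TEXT)
for key, value in rows[0].items():
    print(key, "->", repr(value))
```

Every value now has a named column, so the program no longer has to guess which word refers to what.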
Summary
• In this tutorial we explored some of the
essential concepts that crop up again and
again in discussions of data. We discussed
what data is, and how it is structured.
What is Information?
• One of the most common ways to define
information is to describe it as one or more
statements or facts that are received by a human
and that have some form of worth to the
recipient. For example, the Sesame Street
character “Cookie Monster” describes
information as “news or facts about something,”
or, as the first definition in the Random House
College Dictionary suggests for information,
“knowledge communicated or received
concerning a particular fact or circumstance;
news.”
What is Information?
• Cookie Monster's definition is consistent with the
common notions that information must:
• 1. be something, although the exact nature
(substance, energy, or abstract concept) isn't
clear;
• 2. provide “new” information: a repetition of
previously received messages isn't informative;
• 3. be “true”: a lie or false or counterfactual
information is misinformation, not information
itself;
• 4. be “about” something.
What is Information?
• We suggest here a general definition of
information: information is produced by all
processes and it is the values of characteristics
in the processes' output that are
information. This captures most concepts of
information in individual disciplines. The
number of possible values in the output and
their relative frequencies of occurrence may
be used in measuring the amount of
information present.
Source: https://ils.unc.edu/~losee/b5/node2.html
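One widely used way to measure the amount of information from the possible values and their relative frequencies is Shannon entropy, measured in bits. Whether this is exactly the measure the source has in mind is an assumption; the sketch below, with the made-up function name `shannon_entropy`, simply shows how such a calculation can work.

```python
from collections import Counter
from math import log2


def shannon_entropy(values):
    """Entropy in bits of the observed values, based on their relative frequencies."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * log2(c / total) for c in counts.values())


# Hypothetical observations of the "condition" attribute of four golf balls.
conditions = ["used", "used", "new", "broken"]
print(shannon_entropy(conditions))

# If every ball has the same condition, observing it tells us nothing new.
print(shannon_entropy(["used", "used", "used", "used"]))
```

The more possible values there are and the more evenly they occur, the higher the result; an output that never varies carries no information in this sense.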
What is Knowledge?
• Knowledge is a familiarity, awareness or understanding of someone
or something, such as facts, information, descriptions, or skills,
which is acquired through experience or education by perceiving,
discovering, or learning.
• Knowledge is closely linked to doing and implies know-how and
understanding. The knowledge possessed by each individual is a
product of his experience, and encompasses the norms by which he
evaluates new inputs from his surroundings (Davenport & Prusak
2000). I will use the definition presented by Gamble and Blackwell
(2001), based closely on a previous definition by Davenport &
Prusak:
What is Knowledge?
• "Knowledge is a fluid mix of framed experience, values, contextual
information, expert insight, and grounded intuition that provides an
environment and framework for evaluating and incorporating new
experiences and information. It originates and is applied in the mind of
the knowers. In organizations it often becomes embedded not only in
documents or repositories, but also in organizational routines, practices
and norms."
• In order for knowledge management (KM) to succeed, one needs a deep understanding of what
constitutes knowledge. Now that we have set clear boundaries between
knowledge, information, and data, it is possible to go one step further and
look at the forms in which knowledge exists and the different ways that it
can be accessed, shared, and combined. I will examine this in the section
titled "The Different Kinds of Knowledge".
Source: http://www.knowledge-management-tools.net/knowledge-information-data.html
Data.. Information.. Knowledge
Summary
• Data: Facts and figures which relay something specific, but which are not organized in any way and
which provide no further information regarding patterns, context, etc. I will use the definition for
data presented by Thierauf (1999): "unstructured facts and figures that have the least impact on
the typical manager."
• Information: For data to become information, it must be contextualized, categorized, calculated
and condensed (Davenport & Prusak 2000). Information thus paints a bigger picture; it is data with
relevance and purpose (Bali et al. 2009). It may convey a trend in the environment, or perhaps
indicate a pattern of sales for a given period of time. Essentially, information is found "in answers to
questions that begin with such words as who, what, where, when, and how many" (Ackoff 1999).
• IT is usually invaluable in the capacity of turning data into information, particularly in larger firms
that generate large amounts of data across multiple departments and functions. The human brain is
mainly needed to assist in contextualization.
• Knowledge: Knowledge is closely linked to doing and implies know-how and understanding. The
knowledge possessed by each individual is a product of his experience, and encompasses the
norms by which he evaluates new inputs from his surroundings (Davenport & Prusak 2000).
Source: http://www.knowledge-management-tools.net/knowledge-information-data.html
References
• http://schoolofdata.org/handbook/courses/what-is-data/
• https://ils.unc.edu/~losee/b5/node2.html
• http://www.knowledge-management-tools.net/knowledge-information-data.html
