Information Systems




Gathering of Information and Data
Introduction

• Previous presentation
  covered what data is

• In this presentation we
  cover where data comes
  from and factors we need
  to take into account when
  gathering data for
  processing
Data Sources

Data can be collected either:

• DIRECTLY
   – Gathered from an original source
or
• INDIRECTLY
   – Gathered from another source or as a by-product of
     another operation

• In the world of business these would be described as
  primary and secondary sources of data
Sources of Information
• Primary data is ...

    data that you (or your organisation) gather and interpret yourself


• Secondary data is ...

    data that was gathered by one organisation and is then used and
    interpreted by another organisation for other purposes
Direct (Original) Data Sources

• Sale of an item in a
  supermarket recorded at an
  EFTPOS terminal

• Data from sensors (e.g. a
  weather station)

• Data collected in a survey (e.g.
  a questionnaire or an
  interview)
Indirect Data Sources

• Data collected for one purpose and used for another

   – A credit card company collects data about your spending in
     order to bill you each month. However, a secondary use of
     this data is to build up a “profile” of your spending habits.
     This data can then be used to send you direct marketing
     about goods and services that may appeal to you.

     Credit Card Transaction
        → Direct use of data: Customer Billing
        → Indirect use of data: Direct Marketing
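
The credit card example can be sketched in a few lines of Python. This is only an illustration: the transaction records, category names and both function names are hypothetical and do not come from any real billing system. It shows the same collected data feeding a direct use (the monthly bill) and an indirect use (a spending profile).

```python
from collections import defaultdict

# Hypothetical transactions: (spending category, amount)
transactions = [
    ("groceries", 42.50),
    ("fuel", 30.00),
    ("groceries", 18.25),
    ("books", 12.00),
]

def monthly_bill(transactions):
    """Direct use: total the amounts so the customer can be billed."""
    return sum(amount for _, amount in transactions)

def spending_profile(transactions):
    """Indirect use: total spend per category, e.g. to target marketing."""
    profile = defaultdict(float)
    for category, amount in transactions:
        profile[category] += amount
    return dict(profile)

print(monthly_bill(transactions))      # 102.75
print(spending_profile(transactions))
```

Nothing new is collected for the profile. The indirect use is a by-product of data gathered for billing.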
Indirect Data Sources

• Purchased data/data passed on

   – There are a number of ways data can
     be acquired from 3rd parties and then
     used for a different purpose

   – A good example is the electoral roll.
     Its main purpose is to record who is
     eligible to vote.
     However, marketing companies make
     extensive use of the roll to target
     customers.
Coding Data

• Before being stored in a
  computer, information can be
  coded as data e.g.
   – M or F
   – Mo, Tu, We, Th, Fr, Sa, Su
   – I, II, IIIM, IIIN, IV, V
   – S, M, L, XL, XXL

• In the picture shown we can
  see the date code for the tyre
   – [Picture: tyre date code] This represents the
     eighth week of 2006
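
A coded date like the one on the tyre can be decoded mechanically. The sketch below assumes the common four-digit layout of two digits for the week followed by two digits for the year, and assumes the year falls in the 2000s. The slide does not show the digits themselves, so the sample code "0806" and the function name are assumptions made for illustration.

```python
def decode_tyre_date_code(code):
    """Decode an assumed four-digit WWYY code into (week, year)."""
    if len(code) != 4 or not code.isdigit():
        raise ValueError("expected four digits, e.g. '0806'")
    week = int(code[:2])
    year = 2000 + int(code[2:])  # assumption: 21st-century tyres only
    if not 1 <= week <= 53:
        raise ValueError("week must be between 1 and 53")
    return week, year

print(decode_tyre_date_code("0806"))  # (8, 2006)
```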
Benefits of Coding
• Less storage space is required
   – M and F require less storage space than male and female

• Faster data input
   – M or F takes fewer keystrokes to enter than male or female

• Validation is easier
   – With a limited number of codes it is easier to match them
     against rules to check they are entered correctly
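
The storage and validation benefits can be illustrated with the M/F example. This is a minimal sketch: the dictionary and function names are hypothetical, and counting characters is only a rough stand-in for real storage size.

```python
# The full set of allowed codes and what each one stands for
GENDER_CODES = {"M": "male", "F": "female"}

def is_valid_code(code):
    """Validation: an entry is accepted only if it is a known code."""
    return code in GENDER_CODES

def characters_saved(code):
    """Storage: characters saved by storing the code, not the full word."""
    return len(GENDER_CODES[code]) - len(code)

print(is_valid_code("M"))     # True
print(is_valid_code("X"))     # False
print(characters_saved("F"))  # 5
```

Because the list of valid codes is short and fixed, the validation rule is a simple membership check.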
Drawbacks of Coding

• Precision of data can be lost
  (coarsened)
   – In the example all shades of
     blue are coded as “blue”
   – [Figure: data in (various shades) becomes
     stored data: Pink, Blue, Black, Blue]

• The user needs to know the
  codes used
   – How many of these top level
     domains do you know?
   – au, ch, de, ie, pk, fr, il, lk, es
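
Coarsening can be shown as a many-to-one mapping. In this sketch the shade names ("navy", "sky blue" and so on) are invented for illustration; only the stored values Pink, Blue, Black, Blue come from the slide.

```python
# Hypothetical mapping from the observed shade to the stored colour code
SHADE_TO_CODE = {
    "navy": "blue",
    "sky blue": "blue",
    "royal blue": "blue",
    "pink": "pink",
    "black": "black",
}

def encode(shade):
    """Replace a detailed shade with its coarser stored code."""
    return SHADE_TO_CODE[shade]

observed = ["pink", "navy", "black", "sky blue"]
stored = [encode(shade) for shade in observed]

print(stored)                                # ['pink', 'blue', 'black', 'blue']
print(len(set(observed)), len(set(stored)))  # 4 3
```

Four distinct observations become three distinct stored values. Once stored, "navy" and "sky blue" cannot be told apart, so the lost detail cannot be recovered from the coded data.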
Coding Value Judgements

• Coding value judgements can be a particular problem as
  they are subject to personal opinion

• What do you think of this presentation?
  – Good? Average? Poor?
  – One person’s good may be another person’s poor!!!

• Value judgements are very difficult to encode without
  some coarsening (loss of detail)

• How would you improve the analysis? What are the
  time/cost implications?
Quality of the Data Source

• GIGO (Garbage In, Garbage
  Out)

• If data input is poor the
  resulting information
  output will be poor i.e.
  corrupt, inaccurate etc.

• Can you think of any “real
  life” examples?

  [Figure: Garbage In → Garbage Out]
Quality of the Data Source

Examples of GIGO can include:
• Unreliable questionnaires/surveys
   – e.g. inappropriate samples, badly
     worded questions etc.
• Incorrectly calibrated instruments
   – e.g. an incorrectly calibrated balance
     will give incorrect measures of mass
• Human error
   – e.g. transcription errors when entering
     data
• Incomplete data sets
   – e.g. failing to account for “shrinkage”
     when measuring supermarket stock
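
The calibration example can be simulated to show GIGO in action. This is a sketch under stated assumptions: the true masses and the 5 g offset are made-up values, and a real balance may drift in more complicated ways than a constant offset.

```python
from statistics import mean

true_masses_g = [100.0, 200.0, 300.0]  # hypothetical samples
OFFSET_G = 5.0                         # hypothetical calibration error

def read_balance(true_mass_g, offset_g=OFFSET_G):
    """A miscalibrated balance reports the true mass plus a fixed offset."""
    return true_mass_g + offset_g

readings_g = [read_balance(mass) for mass in true_masses_g]

print(mean(true_masses_g))  # 200.0
print(mean(readings_g))     # 205.0
```

The processing (taking the mean) is correct in both cases. The second result is wrong only because the input data was wrong.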
Summary

• Data can arise from direct and indirect sources

• Information can be coded as data

• This has a number of benefits but can lead to
  coarsening

• The source/accuracy of data has a major impact
  on the quality of information produced i.e. GIGO

Unit 3 gathering information and data
