Data Collection Methods
Pros and Cons of Primary and Secondary Data
Where do data come from?
• We’ve seen our data for this lab, all nice and collated in a database – from:
– Insurance companies (claims, medications, procedures, diagnoses, etc.)
– Firms (demographic data, productivity data, etc.)
Where do data come from?
• Take a step back – if we’re starting from scratch, how do we collect / find data?
– Secondary data
– Primary data
Secondary Data
• Secondary data – data someone else has collected
– This is what you were looking for in your assignment.
Secondary Data – Examples of Sources
• County health departments
• Vital Statistics – birth, death certificates
• Hospital, clinic, school nurse records
• Private and foundation databases
• City and county governments
• Surveillance data from state government programs
• Federal agency statistics – Census, NIH, etc.
Secondary Data – Limitations
• What did you find on the frustrating side as you looked for data on the state’s websites?
Secondary Data – Limitations
• When was it collected? For how long?
– May be out of date for what you want to analyze.
– May not have been collected long enough for detecting trends.
– E.g. Have new anticorruption laws impacted Russia’s government accountability ratings?
Secondary Data – Limitations
• Is the data set complete?
– There may be missing information on some observations.
– Unless such missing information is caught and corrected for, the analysis may be biased.
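
A quick completeness check can be sketched as follows. This is only an illustration: the records, the variable names (age, income, diagnosis) and the helper `missing_share` are hypothetical, and `None` is assumed to mark a missing value.

```python
# Hypothetical records from a secondary source; None marks a missing value.
records = [
    {"age": 34, "income": 52000, "diagnosis": "A"},
    {"age": None, "income": 61000, "diagnosis": "B"},
    {"age": 45, "income": None, "diagnosis": None},
    {"age": 29, "income": 48000, "diagnosis": None},
]


def missing_share(rows):
    """Return, for each variable, the share of rows where it is missing."""
    fields = sorted({key for row in rows for key in row})
    return {
        field: sum(row.get(field) is None for row in rows) / len(rows)
        for field in fields
    }


shares = missing_share(records)
for field, share in shares.items():
    print(f"{field}: {share:.0%} missing")
```

A high share for one variable is a prompt to find out why it is missing before analyzing it, not a fix in itself.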
Secondary Data – Limitations
• Are there confounding problems?
– Sample selection bias?
– Source choice bias?
– In time series, did some observations drop out over time?
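
One way to see whether observations drop out over time is to compare which units appear in each period. The sketch below assumes a hypothetical panel of firms observed by year; the names `observed` and `dropouts_by_period` are made up for illustration.

```python
# Hypothetical panel: the set of units observed in each year.
observed = {
    2018: {"firm_a", "firm_b", "firm_c", "firm_d"},
    2019: {"firm_a", "firm_b", "firm_c"},
    2020: {"firm_a", "firm_c"},
}


def dropouts_by_period(panel):
    """Map each period after the first to the units present in the
    previous period but absent from this one."""
    periods = sorted(panel)
    return {
        later: sorted(panel[earlier] - panel[later])
        for earlier, later in zip(periods, periods[1:])
    }


lost = dropouts_by_period(observed)
for year, units in lost.items():
    print(year, "dropped:", units)
```

If the units that leave differ systematically from those that stay, trends in the remaining data can reflect who dropped out rather than real change.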
Secondary Data – Limitations
• Are the data consistent/reliable?
– Did variables drop out over time?
– Did variables change in definition over time?
• E.g. number of years of education versus highest degree obtained.
Secondary Data – Limitations
• Is the information exactly what you need?
– In some cases, you may have to use “proxy variables” – variables that approximate something you really wanted to measure. Are they reliable? Do they correlate with what you actually want to measure?
– E.g. gauging student interest in U.W. by their ranking on FAFSA – subject to gamesmanship.
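
Where a small subsample has both the proxy and the thing you actually want to measure, the correlation between them can be checked directly. The following is a minimal sketch with made-up numbers; `pearson`, `proxy` and `target` are illustrative names, and Pearson's correlation is only one possible measure of how well a proxy tracks its target.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical validation subsample where both were measured.
proxy = [1, 2, 3, 4, 5]
target = [2.1, 3.9, 6.2, 8.0, 9.8]

r = pearson(proxy, target)
print(f"correlation between proxy and target: {r:.3f}")
```

A strong correlation in the subsample supports using the proxy, but it does not rule out gamesmanship or other distortions in the wider data.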
Secondary Data – Advantages
• No need to reinvent the wheel.
– If someone has already found the data, take advantage of it.
Secondary Data – Advantages
• It will save you money.
– Even if you have to pay for access, it is often cheaper than collecting your own data. (More on this later.)
Secondary Data – Advantages
• It will save you time.
– Primary data collection is very time-consuming. (More on this later, too!)
Secondary Data – Advantages
• It may be very accurate.
– Especially when a government agency has collected the data, incredible amounts of time and money went into it. It’s probably highly accurate.
Secondary Data – Advantages
• It has great exploratory value.
– Exploring research questions and formulating hypotheses to test.
Primary Data
• Primary data – data you collect
Primary Data – Examples
• Surveys
• Focus groups
• Questionnaires
• Personal interviews
• Experiments and observational studies
Primary Data – Limitations
• Do you have the time and money for:
– Designing your collection instrument?
– Selecting your population or sample?
– Pretesting/piloting the instrument to work out sources of bias?
– Administration of the instrument?
– Entry/collation of data?
Primary Data – Limitations
• Uniqueness
– May not be able to compare to other populations
Primary Data – Limitations
• Researcher error
– Sample bias
– Other confounding factors
Data collection choice
• What you must ask yourself:
– Will the data answer my research question?
Data collection choice
• To answer that:
– You must first decide what your research question is.
– Then you need to decide what data/variables are needed to scientifically answer the question.
Data collection choice
• If those data exist in secondary form, then use them to the extent you can, keeping in mind limitations.
• But if they do not, and you are able to fund primary collection, then primary collection is the method of choice.
