GEOG 420 / POL 420 / SOC 420
Spring 2018
• Examples: Documents, reports, manuscripts, memoirs, speeches, etc.
• Issue of Time (History) and Space (Location)
• Fewer ethical issues than observation, interviewing, etc.
ADVANTAGES
• Access to difficult subjects
• Raw data are nonreactive
• Analysis over time
• Increased sample size
• Lower cost
DISADVANTAGES
• Selective Survival
• Incomplete Nature
• Biased Content
• Unavailability of Records
• Lack of Standard Format
• Records NOT part of an organized or systematic record-keeping program
• Produced and preserved in a casual, personal, and sometimes accidental manner
• Examples: Diaries, memoirs, manuscripts, correspondence, autobiographies, “media of temporary existence” (e.g., brochures, posters, etc.)
• Produced by organizations
• Carefully stored, easily accessed, and available for long periods of time
• Examples: OpenSecrets, THOMAS, Almanac of American Politics, University Crime Log
ADVANTAGES
• Cost: Time and Money
• Accessibility
• More Extensive
DISADVANTAGES
• Records are under the control of the record-keeping organization
• Unwillingness of organization to share data
• Difficulty of finding out the record-keeping organization’s practices
• How do we actually analyze the data that we collect?
• Step 1: Materials to Include in Analysis
  • Selection guided by theory and existing research
  • Examples: Speeches, newspapers, blogs, magazines, etc.
  • Materials collectively make up the sampling frame
• Step 2: Define Recording or Coding Units
  • Units that are distinguished for description, recording, and coding purposes in analysis
  • Examples:
    • Word or sentence fragment
    • Sentence
    • Paragraph
    • Story
    • Item as Whole
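The coding units listed above can be sketched in a few lines of Python. This is only an illustration: `split_units` is a hypothetical helper, and its splitting rules (a blank line separates paragraphs; a sentence ends at `.`, `!`, or `?` followed by whitespace) are simplifying assumptions that will misfire on abbreviations and similar cases.

```python
import re


def split_units(text, unit="sentence"):
    """Split one document into coding units (hypothetical helper)."""
    text = text.strip()
    if unit == "item":
        # The item as a whole is a single coding unit.
        return [text]
    if unit == "paragraph":
        # Assumption: paragraphs are separated by a blank line.
        parts = re.split(r"\n\s*\n", text)
    elif unit == "sentence":
        # Naive rule: break after ., !, or ? when whitespace follows.
        parts = re.split(r"(?<=[.!?])\s+", text)
    elif unit == "word":
        parts = re.findall(r"[A-Za-z']+", text)
    else:
        raise ValueError(f"unknown unit: {unit}")
    return [p.strip() for p in parts if p.strip()]


# Made-up example document with two paragraphs.
doc = "Taxes rose again. Crime fell.\n\nSchools reopened today."
sentences = split_units(doc, "sentence")
paragraphs = split_units(doc, "paragraph")
words = split_units(doc, "word")
```

The same document yields a different number of units depending on the choice, which is why the unit has to be fixed before coding begins.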
• Step 3: Categories of Content to Measure
  • Variables that you want to focus on in the study
  • Can be the most important part of content analysis
  • Example: Viewing nightly news programs and coding stories based on specific issues (economy, health care, crime, education, etc.)
• Step 4: Devise System to Measure Content
  • Presence or absence of a given content category
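A presence/absence measurement system can be pictured as a coding sheet with one row per story and one 0/1 column per category. The sketch below uses the four issue categories from the nightly news example; the story IDs and the coder's judgments are invented for illustration, and `presence_row` is a hypothetical helper.

```python
# Content categories taken from the nightly news example.
CATEGORIES = ["economy", "health care", "crime", "education"]


def presence_row(present, categories=CATEGORIES):
    """Turn the set of categories a coder marked into a 0/1 row."""
    unknown = set(present) - set(categories)
    if unknown:
        raise ValueError(f"unknown categories: {sorted(unknown)}")
    return [1 if c in present else 0 for c in categories]


# Hypothetical coder judgments: which categories appear in each story.
coded_stories = {
    "story_1": {"economy"},
    "story_2": {"crime", "education"},
    "story_3": set(),
}

# One row per story, one column per category.
matrix = {sid: presence_row(p) for sid, p in coded_stories.items()}

# Column sums: the number of stories in which each category is present.
totals = [sum(col) for col in zip(*matrix.values())]
```

Other measurement systems (for example, frequency or amount of space given to a category) would replace the 0/1 entries with counts, but the row-per-unit layout stays the same.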
VALIDITY
• Precise explanations of procedures and categories
RELIABILITY
• Demonstrated through intercoder reliability
• Two or more analysts, using the same procedures and definitions, agree on the content categories applied to the material
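Agreement between two coders can be quantified in more than one way. The slides do not name a statistic, so the sketch below shows two common choices as an assumption: simple percent agreement and Cohen's kappa, which corrects for the agreement expected by chance. The coder labels are made up.

```python
from collections import Counter


def percent_agreement(a, b):
    """Share of units on which two coders assigned the same category."""
    if len(a) != len(b) or not a:
        raise ValueError("coders must label the same non-empty set of units")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohens_kappa(a, b):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected)."""
    n = len(a)
    p_observed = percent_agreement(a, b)
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement from each coder's marginal category proportions.
    p_expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if p_expected == 1:
        return 1.0  # both coders used one identical category throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical codes assigned by two coders to the same six stories.
coder_a = ["economy", "crime", "crime", "education", "economy", "crime"]
coder_b = ["economy", "crime", "education", "education", "economy", "economy"]

agreement = percent_agreement(coder_a, coder_b)  # 4 of 6 units match
kappa = cohens_kappa(coder_a, coder_b)           # 13/25 = 0.52
```

Kappa is lower than raw agreement here because some of the matches would be expected even if both coders assigned categories at random with the same marginal frequencies.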
• Reading
• Human Coding / Manual Content Analysis
• Dictionary-Based Approaches
• Supervised Learning
READING
• Analysis process involves reading a text
• Advantages:
  • Fundamental for inferring meaning from text
  • Fewer assumptions than other methods
• Disadvantage: Cost
HUMAN CODING / MANUAL CONTENT ANALYSIS
• Standard methodology for content analysis and text coding within social science research
• Coders read text and attempt to assign one of a set of categories to each unit
ADVANTAGES
• Requires less substantive knowledge than deep reading of the texts
• Less costly than reading
DISADVANTAGES
• Higher initial costs
• Arriving at a categorization scheme requires knowledge of the subject matter and substantial time
DICTIONARY-BASED APPROACHES
• Analyst develops a list of words and phrases likely to be in a particular category
• Examples: LIWC, DICTION, JFREQ
• Computer tallies words in given categories
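The tallying step can be sketched as follows. The dictionary here is a tiny invented example, not the word list or file format of LIWC, DICTION, or JFREQ, and `tally` is a hypothetical function; it handles single words only, not phrases.

```python
import re
from collections import Counter

# Hypothetical analyst-built dictionary: category -> words signalling it.
DICTIONARY = {
    "economy": {"economy", "jobs", "taxes", "inflation"},
    "crime": {"crime", "police", "arrest"},
}


def tally(text, dictionary=DICTIONARY):
    """Count how many tokens in the text fall into each category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for category, words in dictionary.items():
            if token in words:
                counts[category] += 1
    # Report every category, including those with zero matches.
    return {category: counts[category] for category in dictionary}


text = "Police say crime is down, but taxes and inflation worry voters. Jobs matter."
result = tally(text)
```

Because matching is exact, words such as "arrested" or "taxed" are missed unless the analyst adds them, which is part of the trial and error involved in building a dictionary.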
ADVANTAGES
• Low analysis costs
  • Large number of texts can be processed quickly
  • Descriptive numerical summaries are easily generated
DISADVANTAGES
• High startup costs
  • Building an appropriate dictionary requires a good deal of knowledge
  • Trial and error
SUPERVISED LEARNING
• Hand coding done on a subset of texts
• Hand-coded texts are split into a “training” set and a “test” set; the test set is the evaluation tool
• Algorithms attempt to infer the mapping from text features to hand-coded categories in the training set
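A minimal sketch of this workflow is shown below. The slides do not name an algorithm, so a multinomial Naive Bayes classifier with add-one smoothing is used here purely as an assumption; the hand-coded texts, labels, and function names are all invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def train(labeled_texts):
    """Learn word counts per category from the hand-coded training set."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled_texts:
        class_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocab.add(token)
    return class_counts, word_counts, vocab


def predict(model, text):
    """Return the category with the highest log-probability for the text."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical hand-coded texts, split into training and test sets.
training_set = [
    ("taxes and jobs dominate the budget debate", "economy"),
    ("inflation hurts jobs and wages", "economy"),
    ("police report a robbery downtown", "crime"),
    ("the robbery suspect was arrested by police", "crime"),
]
test_set = [
    ("wages and taxes rise", "economy"),
    ("police arrested a suspect", "crime"),
]

model = train(training_set)
# Evaluate on held-out hand-coded texts the model never saw in training.
accuracy = sum(predict(model, t) == y for t, y in test_set) / len(test_set)
```

Once accuracy on the test set is judged acceptable, the same `predict` call can be applied to the remaining texts that were never hand coded.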
ADVANTAGES
• Large number of texts can be coded because the process is automated
DISADVANTAGES
• High startup costs due to human construction of “training” and “test” sets of documents

Content Analysis
