May 28, 2009
Why should we care?
• How do you find resources and information?
• How do you think our data users are finding
– Is the strategy to find them and bring them
– Is the strategy to enable them to find us?
• 80% of Internet user sessions begin at search
engines (Source: Internetstats.com)
ICPSR Search Traffic
• So far in 2009:
– 41% of our traffic coming through search
– 37% coming through Google
• Stats from Google Analytics (January 1 – May
Some Definitions & the Goal
• Search Engine Marketing (SEM) – techniques to
develop a web page or site that search engine
spiders can easily interpret; includes SEO and
paid search techniques
• Search Engine Optimization (SEO) – a part of
SEM; techniques to “influence” ranking in
“natural/organic” search results - Google
• Google’s POV: manipulating your site such
that their spiders can correctly interpret your
content and bring the right users to you is
Search Engine Share & Ranking
• Google – 60%
• Yahoo! – 23%
• Live Search (formerly MSN Search) – 12%
• Ask – 4%
(Source: Hitwise 2007 – oldie stat but a goodie!)
So Google has a ranking algorithm – rumor has
it, with 200+ elements . . .
• Organic Search – The Content Factor
• Organic Search – The Links Factor
• Google Analytics & Other Sources – Playing a
role in your content
• Feeding Spiders
• Tips & Free Evaluation Tools
How do Search Engines Work?
• Web pages containing text are “spidered” by
• Search engines process spidered pages to
decide importance of sites/pages and to create
• Users submit a keyword(s) search to the
engine which then returns relevant pages
associated with that keyword(s)
• Users click the links of interest from the
results (note – this also feeds into relevance
since the search engine “learns”)
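To make the spider/index/query steps above concrete, here is a minimal sketch of an inverted index in Python. It is a toy illustration only, not how Google or any real engine is implemented: the page URLs and text are made-up sample data, the function names are hypothetical, and no ranking or click-based "learning" is modeled.

```python
from collections import defaultdict
import re

# Hypothetical mini "web": URL -> page text (made-up sample data).
PAGES = {
    "example.org/aging": "research data on aging and health",
    "example.org/teaching": "social science teaching resources",
    "example.org/health": "mental health research data",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word in the query, sorted."""
    words = tokenize(query)
    if not words:
        return []
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return sorted(results)

index = build_index(PAGES)
print(search(index, "health data"))
```

A page with no text (all images or Flash, say) would contribute nothing to this index, so no keyword search could ever return it.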
Content – Text versus Not Text
• What is Text?
– Webpages with HTML text
– Common files: PDF, Word, Excel
– META tags
• What is Not text?
Not all Content is Equal!
• Title & Tagline
• Section Heading
• Navigation tags
• Text in paragraphs
• Image tags
• Meta tags
• URLs also Crawled!
– Note: Spiders interpret hyphens (-) as a
space but underscores (_) as a character –
the latter thus lowers the match!
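Following the hyphen-versus-underscore note above, one way to keep URLs spider-friendly is to generate them from page titles. The sketch below is an assumption about how you might do that; the function name slugify and the sample titles are illustrative, not part of any ICPSR system.

```python
import re

def slugify(title):
    """Turn a page title into a lowercase, hyphen-separated URL segment."""
    # Replace every run of non-alphanumeric characters (spaces, underscores,
    # punctuation) with a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower())
    # Drop any hyphen left at the start or end.
    return slug.strip("-")

print(slugify("Summer Program in Quantitative Methods"))
print(slugify("online_learning_center"))
```

Each word in the resulting URL segment is separated by a hyphen, so it can match as its own search term.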
Not So Good Example
• Scarecrow Winery
– All Flash and no Content!
• Spiders can’t read or interpret, so can’t rank or
return to users
• Winery is successful – but it’s relying on other
content & viral methods (and good wine!) to get
people there (more on that during social media
training next month!)
• If possible, incorporate key search terms in
titles, section names, text, & metafiles (and
also in your print media/promotions)
• Don’t use the brand name in these unless you
are really well known – okay for Coca Cola and
• Lots of “personalizing” the website going on
(sessionizing) – be careful, however: if
you make the landing page unique to each
user, spiders don’t consolidate across users
(spiders can crawl, but they can’t add!)
Analyze your Preferred Site for Content
• Do titles and subtitles on your page
explain the content?
– “The OLC supports quantitative
literacy” vs. “What We Do”
• Does paragraph text have terms users
• Are images tagged?
• Are navigation tags aiding the spiders?
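Part of the checklist above can be run automatically. The sketch below uses Python's standard html.parser module to pull out a page's title, its headings, and any images that lack alt text. The class name ContentAudit and the sample HTML are hypothetical; a real audit would fetch your own pages and likely check more elements.

```python
from html.parser import HTMLParser

class ContentAudit(HTMLParser):
    """Collect the title, headings, and images lacking alt text from a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.headings = []
        self.images_missing_alt = []
        self._current = None  # tag whose text is currently being captured

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("title", "h1", "h2", "h3"):
            self._current = tag
            if tag != "title":
                self.headings.append("")
        elif tag == "img" and not (attrs.get("alt") or "").strip():
            # Image has no alt attribute, or an empty one.
            self.images_missing_alt.append(attrs.get("src", ""))

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "title":
            self.title += data
        elif self._current is not None:
            self.headings[-1] += data

# Made-up sample page for illustration.
SAMPLE = """
<html><head><title>What We Do</title></head>
<body>
<h1>The OLC supports quantitative literacy</h1>
<img src="logo.png">
<img src="chart.png" alt="Chart of search traffic by source">
</body></html>
"""

audit = ContentAudit()
audit.feed(SAMPLE)
print(audit.title)
print(audit.headings)
print(audit.images_missing_alt)
```

The script only extracts the text; deciding whether a title such as "What We Do" actually explains the content is still a human judgment.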
The Link Factor!
• Emphasizing the “Web” in Website!
• Three kinds of links:
– Internal links to other pages within the
site: Never, Never “CLICK HERE”!!
• Subpages should link to homepage –
increases homepage importance ranking
– Outbound Links to other sites: Never,
Never “CLICK HERE”!!
– Inbound Links from other sites: Do I
need to say it again?
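The "never CLICK HERE" rule is easy to check mechanically. The sketch below scans a page's links and flags any whose visible text is a vague phrase. The phrase list, the class and function names, and the sample HTML are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Assumed list of non-descriptive phrases; extend to suit your own site.
VAGUE_PHRASES = {"click here", "here", "read more", "more", "link"}

class LinkTextAudit(HTMLParser):
    """Record (href, link text) pairs for every <a> element on a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None  # href of the link being read, or None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse runs of whitespace inside the link text.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None

def vague_links(html):
    """Return the (href, text) pairs whose link text is a vague phrase."""
    parser = LinkTextAudit()
    parser.feed(html)
    return [(href, text) for href, text in parser.links
            if text.lower() in VAGUE_PHRASES]

# Made-up sample markup for illustration.
SAMPLE = """
<p><a href="/summer">Click here</a> for info on the ICPSR Summer Program.</p>
<p>For info, see the <a href="/summer">ICPSR Summer Program</a>.</p>
"""
print(vague_links(SAMPLE))
```

Only the first sample link is flagged; the second one describes its target in the link text itself, which is what the spiders read.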
The Link Factor!
• Inbound Links Rule!
– Feed Google’s fascination with citations
• But it’s not just quantity, it’s quality:
– Links to site/page from .gov, .mil, .edu, and
.org are more highly weighted
– Links from blogs, social media sites, and
.com also important (think popularity
• Links should describe your content
– “Click here for info on the ICPSR Summer Program”
-versus- “For info on the ICPSR Summer Program,
Feeding the Spider a Good Link Diet
• Referring to ICPSR:
– ICPSR is the world’s largest archive of digital social science data
• Referring to an ICPSR resource:
– ICPSR’s Online Learning Center is a collection of
social science teaching resources
• Referring to an archive/project:
– The National Archive of Computerized Data on Aging (NACDA) is a
collection of research data on the aging
• Referring to an ICPSR dataset:
– Those interested in research regarding
racial disparities in mental health should consult the
Detroit Area Studies Series
Detroit Area Study, 1995: Social Influence on Health: Stress, Racism, and
. This survey explored the
ways in which social influences, such as stress and racism, affected health,
and the impact these influences had on the respondents' outlook
Feeding the Spider - Social Media
Discussion Social Networks
Analyze your Preferred Site for Links
• What do you know about your internal links?
• Take a look at your outbound links –
are they properly described?
• What do you know about inbound links
– Who are your best referrals?
– Have you looked at how other sites
are describing your site?
– Have you asked appropriate sites to
link to you?
• Design your site with users AND spiders in mind
• Don’t become labeled as a spammer!
– No hidden text/key words on pages
– No hidden links on pages
– Stay away from link factories
• Design strategically – use some of the free
tools out there to assist. There are many!
Free Evaluation Tools
• Really Cool Tool: Seochat.com -
• Google Trends - shows how often a particular search
term is entered relative to the total search volume across various
regions of the world - http://www.google.com/trends
• Google Webmaster Tools – Inbound link & other
information about what the spider is seeing -
• Yahoo Site Explorer:
• Google Adwords:
• Delicious - http://delicious.com/
• Some to investigate: