INFO 498: Content Strategy (week #7)



From Blobs to Structured Data
SEO in the Age of Entities




              Jonathon Colman, @jcolman
              In-House SEO for REI
              www.REI.com
What is content?
• If you boil away all the formatting, what’s left?
• Just text?
• If so, then why isn’t full-text search good enough to find what you’re looking for?
• What could work better than that?
• And what can we do to content to support its findability?
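To make the full-text question concrete, here is a toy inverted index in Python. It is a minimal sketch, not how any real search engine works. The two documents, the tokenizer, and the Boolean AND matching are all assumptions made for illustration. A query for the phrase "the bus that couldn't slow down" finds nothing. A looser query finds the wrong page, because the page about the film never uses the searcher's words.

```python
import re
from collections import defaultdict

def tokenize(text: str) -> list[str]:
    # Lowercase, then keep runs of letters, digits, and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())

def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    # Inverted index: token -> IDs of the documents that contain it.
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search(index: dict[str, set[str]], query: str) -> set[str]:
    # Boolean AND: a document matches only if it contains every query token.
    tokens = tokenize(query)
    if not tokens:
        return set()
    hits = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        hits &= index.get(token, set())
    return hits

# Hypothetical two-document collection.
docs = {
    "speed-1994": "Speed is a 1994 action film starring Keanu Reeves and Sandra Bullock.",
    "bus-routes": "City bus routes that slow down during rush hour.",
}
index = build_index(docs)

print(search(index, "bus that couldn't slow down"))  # set(): no page uses that phrase
print(search(index, "bus slow down"))                # {'bus-routes'}: the wrong page
```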
http://www.youtube.com/watch?v=dsA4FnwrR7E
Huh? Wikipedia is a source?
https://www.facebook.com/pages/The-Bus-That-Couldnt-Slow-Down/114241625259749
Oh, it’s via a synonym redirect to…
http://en.wikipedia.org/w/index.php?title=The_Bus_That_Couldn%27t_Slow_Down&redirect=no
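A redirect like this one is essentially a lookup table from alternate names to a canonical title. The sketch below models that idea with a hypothetical `REDIRECTS` dictionary. A `resolve()` helper follows redirects and guards against loops. It is an analogy for the behaviour on the slide, not Wikipedia's actual implementation.

```python
# Hypothetical redirect table: normalized alternate title -> target title.
REDIRECTS = {
    "the bus that couldn't slow down": "Speed (1994 film)",
    "speed (film)": "Speed (1994 film)",
}

def normalize(title: str) -> str:
    # Straighten curly apostrophes, case-fold, and collapse runs of whitespace.
    return " ".join(title.replace("\u2019", "'").casefold().split())

def resolve(title: str, redirects: dict[str, str] = REDIRECTS) -> str:
    # Follow redirects until reaching a title that is not itself a redirect.
    # The `seen` set stops the loop if two redirects point at each other.
    current, key, seen = title, normalize(title), set()
    while key in redirects and key not in seen:
        seen.add(key)
        current = redirects[key]
        key = normalize(current)
    return current

print(resolve("The Bus That Couldn\u2019t Slow Down"))  # Speed (1994 film)
print(resolve("Jaws"))                                  # Jaws (no redirect)
```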
Joss Whedon was a co-writer? WTF?!
http://en.wikipedia.org/wiki/Speed_(1994_film)
What is a document?
• How can you tell what a document is about?
• How can you tell one document from another?
• What sort of signals do documents give us that help us derive their meaning?
• Do you know them when you see them?
veniam, quis nostrud exerci tation ullamcorper suscipit l
ommodo consequat. Duis autem vel eum iriure dolor in h
ate velit esse molestie consequat, vel illum dolore eu feu
 os et accumsan et iusto odio dignissim qui blandit praes
  augue duis dolore te feugait nulla facilisi. Nam liber tem
d option congue nihil imperdiet doming id quod mazim p
  Typi non habent claritatem insitam; est usus legentis in
 em. Investigationes demonstraverunt lectores legere me
s. Claritas est etiam processus dynamicus, qui sequitur m
 tudium lectorum. Mirum est notare quam littera gothica
us parum claram, anteposuerit litterarum formas human
 decima et quinta decima. Eodem modo typi, qui nunc no
ant sollemnes in futurum. Lorem ipsum dolor sit amet, c
ing elit, sed diam nonummy nibh euismod tincidunt ut la
m erat volutpat. Ut wisi enim ad minim veniam, quis nost
orper suscipit lobortis nisl ut aliquip ex ea commodo con
m iriure dolor in hendrerit in vulputate velit esse molestie
 eu feugiat nulla facilisis at vero eros et accumsan et iusto
  praesent luptatum zzril delenit augue duis dolore te feu
ber tempor cum soluta nobis eleifend option congue nihi
d mazim placerat facer possim assum. Typi non habent cl
gentis in iis qui facit eorum claritatem. Investigationes de
 s legere me lius quod ii legunt saepius. Claritas est etiam
icus, qui sequitur mutationem consuetudium lectorum. M
ittera gothica, quam nunc putamus parum claram, antep
  humanitatis per seacula quarta decima et quinta decima
                                         This is a Blob.
Lorem ipsum: A Study in Dolor Sit Amet
Author: Melissa Weaver
Date: February 18, 2012
Language: Latin, English
Publisher: UW Husky Press
Keywords: consectetuer, adipiscing, elit, sed, diam
Abstract: Nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat
volutpat. Ut wisi enim ad minim veniam, quis nostrud exerci tation ullamcorper suscipit
lobortis nisl ut aliquip ex ea commodo consequat.

Chapter 1: Hendrerit in Vulputate
Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse
molestie consequat, vel illum dolore eu feugiat nulla facilisis at
vero eros et accumsan et iusto odio dignissim qui blandit praesent
luptatum zzril delenit augue duis dolore te feugait nulla facilisi.
Nam liber tempor cum soluta nobis eleifend option congue nihil
imperdiet doming id quod mazim placerat facer possim assum...
                                          This uses Entities.
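Once the fields are labelled, a program no longer has to guess what each string means. Here is a rough sketch. It assumes the plain "Label: value" layout shown above and a hand-picked set of label names (`KNOWN_LABELS` is hypothetical). Under those assumptions, the header can be read into a dictionary. Lines that do not start with a known label are treated as wrapped continuations, such as the second line of the Abstract.

```python
# Hypothetical set of labels the header block may use.
KNOWN_LABELS = {"Author", "Date", "Language", "Publisher", "Keywords", "Abstract"}
LIST_LABELS = {"Language", "Keywords"}  # comma-separated values

def parse_header(text: str) -> dict[str, object]:
    lines = text.strip().splitlines()
    record: dict[str, object] = {"Title": lines[0].strip()}
    last = None
    for line in lines[1:]:
        if not line.strip():
            break  # a blank line ends the header; the body follows
        label, sep, value = line.partition(":")
        if sep and label.strip() in KNOWN_LABELS:
            last = label.strip()
            record[last] = value.strip()
        elif last is not None:
            # Wrapped continuation of the previous field, e.g. a long Abstract.
            record[last] = f"{record[last]} {line.strip()}"
    for label in LIST_LABELS:
        if label in record:
            record[label] = [v.strip() for v in str(record[label]).split(",")]
    return record

SAMPLE = """Lorem ipsum: A Study in Dolor Sit Amet
Author: Melissa Weaver
Date: February 18, 2012
Language: Latin, English
Publisher: UW Husky Press
Keywords: consectetuer, adipiscing, elit, sed, diam
Abstract: Nonummy nibh euismod tincidunt ut laoreet dolore magna aliquam erat
volutpat.

Chapter 1: Hendrerit in Vulputate
Duis autem vel eum iriure dolor in hendrerit in vulputate velit esse"""

record = parse_header(SAMPLE)
print(record["Author"])    # Melissa Weaver
print(record["Keywords"])  # ['consectetuer', 'adipiscing', 'elit', 'sed', 'diam']
```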
The Problem with Blobs
• Unstructured content is useful, but only to a point
• It’s hard to scan, skim, and make sense of, for humans and robots alike
• It’s hard to search against, particularly in a crowded collection with lots of competing content containing similar information
• What should a search engine pay attention to in order to help the user?
HTML metadata
• Metadata is “data about data”, right?
• In HTML, we can express metadata like:
    <title>The Problem With Blobs</title>
    <meta name="description" content="An overview of why blobs are tricky things to deal with." />
    <meta name="keywords" content="blob, entity, seo, content strategy, info498" />

• Unfortunately, that’s not going to be good enough. But why not? Let’s see…
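Here is roughly what a crawler can pull out of those tags, using Python's standard `html.parser` module. The `MetaExtractor` class is a hypothetical helper, and the attribute quotes are plain ASCII so they parse. Everything it recovers is a page-level string: nothing says which word is a film title, an actor, or a year. Google has also said that it does not use the keywords tag for web ranking.

```python
from html.parser import HTMLParser

class MetaExtractor(HTMLParser):
    """Collect the <title> text and <meta name="..." content="..."> pairs."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.meta: dict[str, str] = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data

PAGE = """<html><head>
<title>The Problem With Blobs</title>
<meta name="description" content="An overview of why blobs are tricky things to deal with." />
<meta name="keywords" content="blob, entity, seo, content strategy, info498" />
</head><body>...</body></html>"""

parser = MetaExtractor()
parser.feed(PAGE)
parser.close()
keywords = [k.strip() for k in parser.meta["keywords"].split(",")]
print(parser.title)  # The Problem With Blobs
print(keywords)      # ['blob', 'entity', 'seo', 'content strategy', 'info498']
```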
2.2M results! Where are the movies?
How can we do better?
Real metadata – in this case, “microdata”.
What is Schema.org?
• A shared vocabulary for microdata markup, agreed upon by Google, Bing, and Yahoo
• Uses relatively simple on-page code to turn blobs of content into structured data
• Once structured, this content becomes interoperable with other systems: you can display that data wherever the standard is accepted
• Here’s an example…
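To see what "structured" buys you, the sketch below reads microdata attributes with Python's `html.parser`. In microdata, `itemscope` starts an item, `itemtype` names its Schema.org type, and each `itemprop` labels a value. The sketch assumes one item, no nesting, and text-only content in each `itemprop` element. The `Movie` markup is an illustrative example, not taken from the slide.

```python
from html.parser import HTMLParser

class FlatMicrodata(HTMLParser):
    """Read the itemtype and itemprop values of one non-nested microdata item.

    Assumes every itemprop element contains plain text only (no child tags).
    """

    def __init__(self) -> None:
        super().__init__()
        self.itemtype = None                 # set from the first itemscope seen
        self.props: dict[str, list[str]] = {}
        self._current = None                 # itemprop whose text is being read

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs and self.itemtype is None:
            self.itemtype = attrs.get("itemtype")
        if attrs.get("itemprop"):
            self._current = attrs["itemprop"]
            self.props.setdefault(self._current, []).append("")

    def handle_data(self, data):
        if self._current is not None:
            self.props[self._current][-1] += data

    def handle_endtag(self, tag):
        if self._current is not None:
            values = self.props[self._current]
            values[-1] = " ".join(values[-1].split())  # tidy whitespace
            self._current = None

# Illustrative markup. Schema.org's expected type for director and actor is
# Person; plain text keeps this example flat.
MOVIE = """
<div itemscope itemtype="http://schema.org/Movie">
  <h1 itemprop="name">Speed</h1>
  Directed by <span itemprop="director">Jan de Bont</span>,
  starring <span itemprop="actor">Keanu Reeves</span> and
  <span itemprop="actor">Sandra Bullock</span>.
</div>
"""

reader = FlatMicrodata()
reader.feed(MOVIE)
reader.close()
print(reader.itemtype)  # http://schema.org/Movie
print(reader.props)
```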
This can increase clicks by 30%.
Controlled entities help searchers
• Documents can be documents, authors can be authors, products can be products, and prices can be prices.
  ◦ Each of these entities has a Schema.org definition, plus markup you can use to declare that part of a blob is actual data.
• So if Homer doesn’t know the name of the movie “Speed”, he can still find it by searching for its subject, its actors, the year it came out, its director, etc.
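Once the fields exist, "find the movie" becomes a lookup over typed attributes instead of a string match. The catalog and the `find()` helper below are hypothetical, a small in-memory stand-in for a search index of structured records. List-valued fields such as actors match by membership; scalar fields match by equality.

```python
from dataclasses import dataclass, field

@dataclass
class Movie:
    name: str
    year: int
    director: str
    actors: list[str] = field(default_factory=list)
    subjects: list[str] = field(default_factory=list)

# Hypothetical catalog standing in for a search index of structured records.
CATALOG = [
    Movie("Speed", 1994, "Jan de Bont",
          ["Keanu Reeves", "Sandra Bullock", "Dennis Hopper"], ["bus", "bomb"]),
    Movie("Jaws", 1975, "Steven Spielberg",
          ["Roy Scheider", "Robert Shaw", "Richard Dreyfuss"], ["shark", "beach"]),
]

def matches(movie: Movie, key: str, wanted: object) -> bool:
    # List fields (actors, subjects) match by membership; scalars by equality.
    value = getattr(movie, key)
    return wanted in value if isinstance(value, list) else value == wanted

def find(catalog: list[Movie], **criteria: object) -> list[str]:
    # Return the names of movies that satisfy every criterion given.
    return [m.name for m in catalog
            if all(matches(m, k, v) for k, v in criteria.items())]

print(find(CATALOG, subjects="bus"))                    # ['Speed']
print(find(CATALOG, actors="Keanu Reeves", year=1994))  # ['Speed']
print(find(CATALOG, director="Steven Spielberg"))       # ['Jaws']
```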
Exercise: Use the “Article” schema
• Go to http://schema.org/Article
• Look at the entities and the code sample at the bottom
• Pick appropriate content from the IAI Library, such as http://iainstitute.org/en/learn/research/a_simplified_model_for_facet_analysis.php
• “View Source” and try marking it up with Schema.org microdata
Partial potential results
<div itemscope itemtype="http://schema.org/Article">
   <h1 itemprop="name">A Simplified Model for Facet Analysis</h1>
   <!-- "author" is a property; its value is a Person item -->
   <div itemprop="author" itemscope itemtype="http://schema.org/Person">
     <h2 itemprop="name">Dr. Louise Spiteri</h2>
     <a itemprop="url" href="http://dal.academia.edu/LouiseSpiteri">http://dal.academia.edu/LouiseSpiteri</a><br />
     <!-- "affiliation" is also a property; its value is an Organization item -->
     <div itemprop="affiliation" itemscope itemtype="http://schema.org/Organization">
        Faculty of Management<br />
        School of Library and Information Studies<br />
        <span itemprop="name">Dalhousie University</span><br />
        <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
           <span itemprop="addressLocality">Halifax</span><br />
           <span itemprop="addressRegion">Nova Scotia</span> NS
           <span itemprop="postalCode">B3H 3J5</span><br />
           <span itemprop="addressCountry">Canada</span>
        </div>
        Voice: <span itemprop="telephone">(902) 494-2473</span><br />
        Fax: <span itemprop="faxNumber">(902) 494-2451</span>
     </div>
   </div>
</div>
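Nesting is where microdata gets fiddly. When an element carries both `itemprop` and `itemscope`, the new item becomes the value of that property on its parent. A reader that handles this cascade might look like the sketch below, run against a trimmed, corrected version of the exercise markup. It assumes text-only values and ignores `href`, `content`, `itemref`, and `itemid`. Treat it as an illustration of the resulting data shape, not a spec-complete parser.

```python
from html.parser import HTMLParser

# Void elements never get an end tag, so they must not change the depth count.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}

class MicrodataTree(HTMLParser):
    """Build nested dicts from itemscope / itemtype / itemprop attributes.

    Simplified: text values only; href, content, itemref and itemid are ignored.
    """

    def __init__(self) -> None:
        super().__init__()
        self.items = []    # top-level items
        self._stack = []   # (item, depth of the element that opened it)
        self._text = None  # (item, prop, depth) while reading a text value
        self._depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        self._depth += 1
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        parent = self._stack[-1][0] if self._stack else None
        if "itemscope" in attrs:
            item = {"type": attrs.get("itemtype"), "properties": {}}
            if prop and parent is not None:
                # Nested item: it becomes the value of the parent's property.
                parent["properties"].setdefault(prop, []).append(item)
            else:
                self.items.append(item)
            self._stack.append((item, self._depth))
        elif prop and parent is not None and self._text is None:
            parent["properties"].setdefault(prop, []).append("")
            self._text = (parent, prop, self._depth)

    def handle_data(self, data):
        if self._text is not None:
            item, prop, _ = self._text
            item["properties"][prop][-1] += data

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self._text is not None and self._text[2] == self._depth:
            item, prop, _ = self._text
            values = item["properties"][prop]
            values[-1] = " ".join(values[-1].split())
            self._text = None
        if self._stack and self._stack[-1][1] == self._depth:
            self._stack.pop()
        self._depth -= 1

MARKUP = """
<div itemscope itemtype="http://schema.org/Article">
  <h1 itemprop="name">A Simplified Model for Facet Analysis</h1>
  <div itemprop="author" itemscope itemtype="http://schema.org/Person">
    <h2 itemprop="name">Dr. Louise Spiteri</h2>
    <div itemprop="affiliation" itemscope itemtype="http://schema.org/Organization">
      Faculty of Management<br />
      <span itemprop="name">Dalhousie University</span><br>
      <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
        <span itemprop="addressLocality">Halifax</span>
      </div>
    </div>
  </div>
</div>
"""

tree = MicrodataTree()
tree.feed(MARKUP)
tree.close()
article = tree.items[0]
author = article["properties"]["author"][0]
print(author["properties"]["affiliation"][0]["properties"]["name"])  # ['Dalhousie University']
```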
How to test
Use Google’s Rich Snippets Testing Tool:
http://www.google.com/webmasters/tools/richsnippets
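Alongside an online tool, a quick local check can catch the most common slips. Examples include inventing a type (there is no `Author` type; `author` is a property), changing a property's case (`URL` instead of `url`), or dropping the `http://` from an `itemtype`. The `SchemaLint` class below is a hypothetical helper. Its `KNOWN_TYPES` and `KNOWN_PROPS` sets are a small hand-copied excerpt of the vocabulary, so anything outside them is flagged for review rather than proven wrong.

```python
from html.parser import HTMLParser

# Hypothetical, hand-copied excerpt of the vocabulary (the full lists are far longer).
KNOWN_TYPES = {"Article", "Person", "Organization", "PostalAddress"}
KNOWN_PROPS = {
    "name", "author", "url", "affiliation", "address", "telephone", "faxNumber",
    "addressLocality", "addressRegion", "postalCode", "addressCountry",
}
PREFIXES = ("http://schema.org/", "https://schema.org/")

class SchemaLint(HTMLParser):
    """Flag itemtype / itemprop values that fall outside the excerpt above."""

    def __init__(self) -> None:
        super().__init__()
        self.warnings: list[str] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        itemtype = attrs.get("itemtype")
        if itemtype is not None:
            if not itemtype.startswith(PREFIXES):
                self.warnings.append(f"itemtype {itemtype!r} is not an absolute schema.org URL")
            elif itemtype.rsplit("/", 1)[-1] not in KNOWN_TYPES:
                self.warnings.append(f"unknown type {itemtype!r}")
        # itemprop may hold several space-separated names; names are case-sensitive.
        for prop in (attrs.get("itemprop") or "").split():
            if prop not in KNOWN_PROPS:
                self.warnings.append(f"unknown property {prop!r}")

def lint(html: str) -> list[str]:
    checker = SchemaLint()
    checker.feed(html)
    checker.close()
    return checker.warnings

BROKEN = ('<div itemscope itemtype="http://schema.org/Author">'
          '<span itemprop="URL">http://dal.academia.edu/LouiseSpiteri</span>'
          '<div itemscope itemtype="schema.org/PostalAddress"></div></div>')
FIXED = ('<div itemprop="author" itemscope itemtype="http://schema.org/Person">'
         '<a itemprop="url" href="http://dal.academia.edu/LouiseSpiteri">profile</a></div>')

for warning in lint(BROKEN):
    print(warning)
print(lint(FIXED))  # []
```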
Sample test output
• For this example blog post: http://homebiss.blogspot.com/2011/11/markup-blogger-schemaorg-examples.html

• The Google Rich Snippets Testing Tool shows this output, which includes some use of Schema.org: http://www.google.com/webmasters/tools/richsnippets?url=http%3A%2F%2Fhomebiss.blogspot.com%2F2011%2F11%2Fmarkup-blogger-schemaorg-examples.html&view=
What did we just learn?
• Schema.org is frakkin’ verbose.
• Entities can cascade poly-hierarchically
• There are many “right” approaches
• Not all entities need to be expressed
• Not all entities provide value
• Still, it’s hard to know when to stop
  ◦ In your case, you’re done when the quarter’s over.
Common Schema.org entities
• Thing > Person
• Thing > Organization
• Thing > CreativeWork > Article
  ◦ See also: Blog, BlogPosting, NewsArticle, ScholarlyArticle
• Thing > CreativeWork > MediaObject
  ◦ See also: AudioObject, ImageObject, VideoObject
• Thing > Place
• See full list at http://schema.org/docs/full.html
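The `>` chains above describe inheritance. An Article is a CreativeWork, which is a Thing, so a search for creative works should include articles. The sketch below encodes a hypothetical excerpt of that tree as a child-to-parent map. This is a simplification: the real vocabulary lets some types have more than one parent. LocalBusiness, for instance, is both an Organization and a Place, which is the poly-hierarchy mentioned earlier.

```python
# Hypothetical excerpt of the Schema.org type tree, stored as child -> parent.
PARENT = {
    "Person": "Thing",
    "Organization": "Thing",
    "Place": "Thing",
    "CreativeWork": "Thing",
    "Article": "CreativeWork",
    "NewsArticle": "Article",
    "ScholarlyArticle": "Article",
    "MediaObject": "CreativeWork",
    "AudioObject": "MediaObject",
    "ImageObject": "MediaObject",
    "VideoObject": "MediaObject",
}

def ancestors(type_name: str) -> list[str]:
    # Walk up the tree from a type to the root, Thing.
    chain = [type_name]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def is_a(type_name: str, candidate: str) -> bool:
    # True if type_name is candidate or inherits from it.
    return candidate in ancestors(type_name)

print(" > ".join(reversed(ancestors("VideoObject"))))
# Thing > CreativeWork > MediaObject > VideoObject
print([t for t in PARENT if is_a(t, "Article")])
# ['Article', 'NewsArticle', 'ScholarlyArticle']
```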
Constraints to consider
• Helping more people find more things is great, right?
• But in the Real World™:
  ◦ Assume that there’s a cost to do this
  ◦ Assume that there’s a cost for maintenance
  ◦ Assume that the standards will change
  ◦ Assume that there are other priorities
  ◦ Assume that conflicts and dependencies exist
Takeaways
• Jon likes horror movies and The Simpsons
• Blobs aren’t evil, just misunderstood!
• Structured data entities help define blobs
  ◦ Structured data entities make blobs easier to understand, learn from, index, and find
  ◦ Metadata, microdata, and other methods can be used to create these entities
• SEO standards (such as Schema.org) are emerging to support entities in popular search engines.
Many thanks!
               Jonathon Colman
               In-House SEO for REI
               Home:   about.me/jcolman
               Twitter: @jcolman




               Pssssst! So you wanna learn
               more about SEO? See
               http://www.seomoz.org/beginners-guide-to-seo

SEO in the Age of Entities: Using Schema.org for Findability
