Semantic Web SEO: Using Linked Data and schema.org to improve Library Reach and Digital Repository Access

Semantic Web SEO draws on several concepts that help increase library reach by making digital collections more accessible and visible. Intelligent search engines seek out and use well-structured linked data because it makes processing more efficient and lets them return more accurate results and a richer search experience. Semantic search places less weight on the exact wording of a query and instead uses probabilistic algorithms to infer the user's intent. In this workshop we demonstrate how linked data concepts and Schema.org can be incorporated into digital libraries to improve search engines' contextual understanding of collections and deliver a better experience to users.
SEO requires tools to measure the effect of your efforts and the value they produce. We will provide a framework and a Google Analytics Scorecard that digital repository collection managers, libraries, and their funders can use as a baseline for making informed decisions and tracking progress toward increasing the access and visibility of digital libraries.
Attendees of this workshop will gain:
1. A basic understanding of Semantic Web SEO and its two most important concepts for digital repositories.
2. A baseline Google Analytics dashboard to support pre/post funding decisions, and the knowledge to get started.
3. A simpler way to set up and administer Google Analytics and Google Webmaster Tools for their entire organization and its stakeholders.
4. A basic understanding of how to incorporate Schema.org and linked data into a digital repository.

Session Leaders:
Kenning Arlitsch, Montana State University
Patrick OBrien, Montana State University

Transcript

  • 1. Semantic Web SEO: Using Linked Data and schema.org to Improve Library Reach and Digital Repository Access
      Kenning Arlitsch & Patrick OBrien
      DLF Fall, Denver, Colorado, November 5, 2012
  • 2. Today’s Objectives
      - Basic understanding of
          - Semantic Web SEO for digital repositories
          - How to get started incorporating Schema.org and linked data into a digital repository
      - Implement baseline metrics to support pre/post funding decisions for digital repositories
          - Simplify setup and administration of Google Analytics and Google Webmaster Tools for an organization and its stakeholders
          - Implement a Digital Repository SEO Google Analytics dashboard
  • 3. Agenda
      - Why SEO & the Semantic Web Matter
          - Performance & Accountability
          - The semantics of what really matters today
      - How to Get Started
          - SEO Administration at an Institutional Scale
          - Enhance Your Data
          - Clean Up Your Data
  • 4. You cannot evaluate what you do not measure
      "We cannot call a digital-library or electronic-publishing system a success if we cannot measure and interpret its use."
      Ann Peterson Bishop, "Logins and Bailouts: Measuring Access, Use, and Success in Digital Libraries," The Journal of Electronic Publishing, Volume 4, Issue 2, December 1998
  • 5. Funding providers want more accountability and demonstrated value*
      - "IMLS is focusing on areas where it can best effect change and measure its results."**
      - The IMLS assessment model will "identify effective museum and library services through performance monitoring" among other things.**
      * ACRL Research Planning and Review Committee, "2010 top ten trends in academic libraries," June 2010.
      ** Institute of Museum and Library Services. 2011. "Creating a Nation of Learners: IMLS Five-Year Strategic Plan 2012–2016."
  • 6. Accountability extends beyond granting agencies
      - State legislatures
          - Local taxpayers
      - University administration
      - Library administration
      - Donors
      - Association of Research Libraries statistics
  • 7. Accountability at the Institutional level
      - Enable all your stakeholders
          - Collection managers
          - IT personnel
          - Administrators
      - Avoid the free-for-all of silos
      - Establish an institutional master account
          - Administer rights
          - Everyone uses the same baseline metrics and tools
  • 8. 2010: Began looking at proxy metrics for digital collection public accessibility and use
      - 12+ billion: number of search queries submitted to Google each month by Americans*
      - 12%: percentage of our digital collection content in the Google index
      - 0.5%: percentage of our USpace IR scholarly papers accessible to researchers using Google Scholar
      * http://www.comscore.com/Press_Events/Press_Releases/2012/1/comScore_Releases_December_2011_U.S._Search_Engine_Rankings
  • 9. Basic SEO has improved collection accessibility in Google across the board…
      [Chart: Google Index Ratio - All Collections*, measured 07/05/10, 04/04/11, and 11/30/11. Average: 12%, 51%, 79%. High**: 37%, 87%, 100%.]
      * Google Index Ratio = URLs indexed by Google / URLs submitted to Google, for about 150 collections containing ~170,000 URLs.
      ** Highest index ratio achieved for collections with over 500 URLs submitted to Google.
  • 10. …almost 100% of USpace IR content is accessible to patrons using Google.
      [Chart: Google Index Ratio by USpace IR collection, measured 07/05/10, 11/19/10, and 10/16/11*]
          - ETD 1: 12%, 69%, 97%
          - ETD 2: 0%, 68%, 98%
          - UScholar Works: 23%, 51%, 98%
          - Board of Regents: 4%, 47%, 97%
      * October 16, 2011 weighted average Google Index Ratio = 97.82% (10,306/10,536).
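The weighted average on slide 10 is total URLs indexed divided by total URLs submitted, so a large collection moves the figure more than a small one. A minimal Python sketch of that arithmetic follows; the `Collection` type and the two demo collections are hypothetical illustrations, and only the 10,306/10,536 totals come from the slide.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Collection:
    name: str
    urls_submitted: int  # URLs submitted to Google (e.g., via a sitemap)
    urls_indexed: int    # URLs Google reports as indexed


def index_ratio(indexed: int, submitted: int) -> float:
    """Google Index Ratio as a percentage: URLs indexed / URLs submitted."""
    return 100.0 * indexed / submitted if submitted else 0.0


def weighted_index_ratio(collections: Sequence[Collection]) -> float:
    """Size-weighted ratio: total indexed / total submitted across collections."""
    indexed = sum(c.urls_indexed for c in collections)
    submitted = sum(c.urls_submitted for c in collections)
    return index_ratio(indexed, submitted)


def simple_mean_ratio(collections: Sequence[Collection]) -> float:
    """Unweighted mean of per-collection ratios, for comparison."""
    ratios = [index_ratio(c.urls_indexed, c.urls_submitted) for c in collections]
    return sum(ratios) / len(ratios)


# Hypothetical collections (illustrative numbers, not from the slides).
DEMO = [Collection("large", 10_000, 9_900), Collection("small", 100, 50)]

if __name__ == "__main__":
    # Totals reported on slide 10 for October 16, 2011.
    print(f"{index_ratio(10_306, 10_536):.2f}%")  # 97.82%
    # weighted 98.51%, unweighted 74.50%
    print(f"weighted {weighted_index_ratio(DEMO):.2f}%, unweighted {simple_mean_ratio(DEMO):.2f}%")
```

Reporting the weighted figure next to the unweighted mean makes it obvious when one large, well-indexed collection is masking small collections that Google has barely indexed.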
  • 11. …resulting in more referrals and visitors
      [Chart: 12-week comparison, 2010 vs. 2012]
  • 13. Today’s Key Premise, Concepts & Focus
      - SEO goals are to increase access, visibility, and use by patrons who value our content.
      - The Semantic Web is a framework of standards and technologies to share, integrate, and represent data as concepts across different content, information, and system boundaries.
      - Semantic search incorporates the Semantic Web to understand the context and intent of users seeking information and the concepts contained within a document.
  • 14. Why semantic search is useful
      - Perfect application for research & discovery of concepts
          - Apple Siri
          - IBM Watson
          - Google Knowledge Graph
      - Making content search-engine readable & semantically understandable can increase
          - click-through rates (CTR) by 15%*
          - organic traffic by 30%*
      * http://searchengineland.com/how-to-get-a-30-increase-in-ctr-with-structured-markup-105830
  • 15. Semantic implies "meaning" or "understanding"
  • 16. Semantic implies "meaning" or "understanding"
      - Why would I search for "historic landmarks in Denver"?
      - Can the search engine anticipate what information I want?
  • 19. Four major search engines (SEs) committed to Schema.org as their semantic model
  • 20. The 4 major SEs have committed to Schema.org as their semantic model
      - SE understandable
          - Schema.org is a mechanism (i.e., an ontology) to communicate the meaning of your data.
      - SE readable
          - Microdata and RDFa are the preferred ways SEs read your data.
      - The US submits 19 billion queries per month to 3 of these SEs*
      - We have not found any easily implementable tools within reach of typical library budgets or skill sets.
      * http://www.comscore.com/Press_Events/Press_Releases/2012/1/comScore_Releases_December_2011_U.S._Search_Engine_Rankings
  • 22. Created an SEO Scorecard designed to support pre/post funding decisions
      - Assembled a team of
          - Collection managers
          - Business school group project
          - 2nd-year MBA team
      - Focused on the 10 Google Analytics features that support
          - IMLS & NEH strategic plans
          - SEO collection manager goals
  • 23. Created an SEO Scorecard designed to support pre/post funding decisions
  • 24. Workshop Process
      - Diagrams and process of what we did at Utah
      - Live demo using Montana State University (MSU)
      - Information that would be helpful today
          - Access to your organization's admin accounts (i.e., user ID & password)
              - Google Analytics
              - Google Webmaster Tools
          - An internal list server for your organization's managers responsible for making pre/post funding digital repository decisions
  • 25. Diagram of problem domain
  • 26. Steps for setting up Measurement & Evaluation for your Institution and Staff
      1. Associate a Google Account with your institution
      2. Staff create their own Google Accounts using their institution email addresses
      3. Activate Google services using your institution Google Account
          - Google Analytics
          - Google Webmaster Tools
      4. Add staff to Google services using their institution email addresses
  • 27. Diagram of what it all looks like (steps 1 and 2)
  • 28. Diagram of what it all looks like (steps 3 and 4)
  • 29. Step 1: Associate a Google Account* (Master) with your Institution
      - Use an internal list server, e.g., seo@utah.edu
      - Include managers who are responsible for administration of
          - Google Analytics
          - Google Webmaster Tools
      * https://accounts.google.com/NewAccount
  • 30. Step 1: Associate a Google Account (Master) with your Institution
  • 31. Step 2: Staff create their own Google Account* (Master) using Institution email
      * https://accounts.google.com/NewAccount
  • 32. Step 3: Activate Google Services using your Institution Google Account (Master)
  • 33. Step 4: Add Staff to Google Services using their Institution email addresses
  • 34. Steps 3 & 4: Successful Google Analytics
  • 35. Steps 3 & 4: Successful Google Webmaster Tools
  • 36. Next steps are to test scalable tools and a repeatable process
      - Found issues with most analytics configurations
      - We need study participants to evaluate and test the accuracy of additional analytics tools being developed under an IMLS grant program
  • 37. What type of web analytics software does your IR use?
      A. Analytics service
      B. Log files
      C. Don't know
      D. None
      [Diagram: the IR is measured either by (A) an analytics service through HTML page tagging {JavaScript} or by (B) log files]
  • 38. Both types have potential accuracy issues for IRs
      A. Analytics services
          - Undercount non-HTML (e.g., PDF) file downloads
      B. Log files
          - Overcount visits & downloads due to spiders, etc.
          - Undercount page views due to web caching (up to 30%)
      [Diagram: the IR is measured either by (A) an analytics service through HTML page tagging {JavaScript} or by (B) log files]
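The log-file problem on slide 38 (spiders inflating visits and downloads) is usually tackled by dropping requests whose user agent identifies a crawler. A minimal Python sketch, assuming the Apache/NGINX "combined" log format; the `BOT_MARKERS` list and the sample lines are hypothetical, and a real filter would need a far more complete, regularly updated list of crawler signatures. It does nothing about the caching undercount, which a server log cannot see.

```python
import re

# Assumed Apache/NGINX "combined" log format:
# host ident user [time] "METHOD path PROTOCOL" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical, deliberately short list of crawler markers.
BOT_MARKERS = ("bot", "spider", "crawler", "slurp")


def parse_line(line):
    """Return the log fields as a dict, or None if the line does not match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


def is_bot(user_agent):
    """Case-insensitive substring test against BOT_MARKERS."""
    agent = user_agent.lower()
    return any(marker in agent for marker in BOT_MARKERS)


def count_requests(lines):
    """Count successful (2xx) requests as a (human, bot) tuple."""
    human = bot = 0
    for line in lines:
        record = parse_line(line)
        if record is None or not record["status"].startswith("2"):
            continue  # skip malformed lines, errors, and redirects
        if is_bot(record["agent"]):
            bot += 1
        else:
            human += 1
    return human, bot


# Made-up sample lines using documentation IP addresses.
SAMPLE = [
    '192.0.2.1 - - [05/Nov/2012:10:00:00 -0700] "GET /cdm/ref/collection/uspace/id/1976 HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.10 - - [05/Nov/2012:10:01:00 -0700] "GET /utils/getfile/collection/uspace/id/1976/filename/713.pdf HTTP/1.1" '
    '200 88000 "-" "Mozilla/5.0 (Windows NT 6.1; rv:16.0) Gecko/20100101 Firefox/16.0"',
    '192.0.2.11 - - [05/Nov/2012:10:02:00 -0700] "GET /missing HTTP/1.1" 404 0 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    print(count_requests(SAMPLE))  # (1, 1)
```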
  • 39. Analytics services do not track non-HTML downloads out of the box
      [Diagram: (A) the analytics service sees HTML pages through page tagging {JavaScript}; non-HTML downloads need special configuration]
  • 40. Analytics services do not track non-HTML file downloads via direct external links
      [Diagram: non-HTML files reached by direct links bypass (A) the analytics service's HTML page tagging {JavaScript}]
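Slides 39 and 40 describe a gap that the server log can fill: a PDF opened straight from a search result never loads an HTML page, so the JavaScript tag never fires, but the web server still records the request. A minimal Python sketch, again assuming the "combined" log format, that counts PDF downloads by referring host; `downloads_by_referrer`, `DOWNLOAD_SUFFIXES`, and the sample lines are hypothetical, and in practice crawler requests should be filtered out first.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Assumed "combined" log format; only path, status, and referer are captured.
LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?:GET|HEAD) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "[^"]*"'
)

# File types treated as downloads (hypothetical choice; extend as needed).
DOWNLOAD_SUFFIXES = (".pdf",)


def downloads_by_referrer(lines, site_host):
    """Count successful non-HTML downloads, grouped by referring host.

    A "-" referer is reported as "(direct)"; referers on site_host are
    reported as "(internal)".
    """
    counts = Counter()
    for line in lines:
        match = LINE.match(line)
        # Only full 200 responses; 206 partial (range) requests from PDF
        # viewers are ignored so one download is not counted many times.
        if not match or match["status"] != "200":
            continue
        path = urlparse(match["path"]).path.lower()
        if not path.endswith(DOWNLOAD_SUFFIXES):
            continue
        referer = match["referer"]
        host = "(direct)" if referer == "-" else urlparse(referer).netloc
        counts["(internal)" if host == site_host else host] += 1
    return counts


# Made-up sample lines using documentation IP addresses.
PDF = "/utils/getfile/collection/uspace/id/1976/filename/713.pdf"
SAMPLE = [
    f'192.0.2.20 - - [05/Nov/2012:11:00:00 -0700] "GET {PDF} HTTP/1.1" 200 88000 '
    '"http://scholar.google.com/scholar?q=repository+seo" "Mozilla/5.0"',
    f'192.0.2.21 - - [05/Nov/2012:11:05:00 -0700] "GET {PDF} HTTP/1.1" 200 88000 '
    '"http://content.lib.utah.edu/cdm/ref/collection/uspace/id/1976" "Mozilla/5.0"',
    f'192.0.2.22 - - [05/Nov/2012:11:10:00 -0700] "GET {PDF} HTTP/1.1" 200 88000 "-" "Mozilla/5.0"',
    '192.0.2.23 - - [05/Nov/2012:11:15:00 -0700] "GET /cdm/ref/collection/uspace/id/1976 HTTP/1.1" '
    '200 5120 "-" "Mozilla/5.0"',
]

if __name__ == "__main__":
    print(downloads_by_referrer(SAMPLE, "content.lib.utah.edu"))
```

Splitting "(direct)" and external referrers from "(internal)" clicks separates the downloads a page tag could never see from those it could be configured to track.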
  • 42. Traditional SEO is still very important, but not today's focus
      - Descriptive page titles, anchor text, descriptions, etc.
      - Easy & intuitive site navigation
      - Submit sitemaps/configure robots.txt file
      - Monitor/address errors
      - Inform staff & assign ownership
      - Clean metadata
      - Upgrade repository software
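Of the traditional SEO tasks on slide 42, submitting sitemaps and configuring robots.txt can be checked against each other mechanically. A minimal Python sketch using only the standard library: the hypothetical `build_sitemap` writes a file in the sitemaps.org format, and the hypothetical `blocked_by_robots` uses `urllib.robotparser` to flag sitemap URLs that robots.txt would keep Googlebot from crawling. The robots.txt content is a made-up example.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Return a sitemaps.org XML document for (loc, lastmod) pairs.

    lastmod may be None; otherwise use a W3C date such as "2011-04-05".
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # ElementTree escapes "&" as "&amp;"
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + ET.tostring(urlset, encoding="unicode")


def blocked_by_robots(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that robots_txt tells user_agent not to crawl."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if not parser.can_fetch(user_agent, u)]


ENTRIES = [
    ("http://content.lib.utah.edu/cdm/ref/collection/uspace/id/1976", "2011-04-05"),
    ("http://content.lib.utah.edu/utils/getfile/collection/uspace/id/1976/filename/713.pdf", None),
]
# Hypothetical robots.txt that accidentally hides the PDF download path.
ROBOTS = "User-agent: *\nDisallow: /utils/\n"

if __name__ == "__main__":
    print(build_sitemap(ENTRIES))
    print(blocked_by_robots(ROBOTS, [loc for loc, _ in ENTRIES]))
```

The sitemaps.org protocol limits a single sitemap file to 50,000 URLs, so a large repository would split its URLs across several files listed in a sitemap index.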
  • 43. Recommended background information
      - Ronallo, Jason. "HTML5 Microdata and Schema.org." Code4Lib Journal (2012). http://journal.code4lib.org/articles/6400
      - Arlitsch, Kenning, and Patrick OBrien. "Invisible Institutional Repositories: Addressing the Low Indexing Ratios of IRs in Google Scholar." Library Hi Tech 30, no. 1 (2012): 60-81. http://www.emeraldinsight.com/journals.htm?articleid=17020806
      - Arlitsch, Kenning, and Patrick OBrien. "Search Engine Optimization (SEO) for Institutional Repositories." In Technical Advances for Innovation in Cultural Heritage Institutions (TAI CHI) Webinar Series; 2012 Mar 16; pp. 1-48. OCLC Research, Online Computer Library Center, Inc. (OCLC), 2012. http://www.oclc.org/resources/research/events/20120316seo.pdf
      - Arlitsch, Kenning, and Patrick OBrien. "Search engine optimization (SEO) for digital repositories." In Coalition for Networked Information (CNI) Spring 2011 Membership Meeting; 2011 Apr 4-5; San Diego, California, USA; pp. 1-25. J. Willard Marriott Library, University Libraries, University of Utah, 2011. http://content.lib.utah.edu/utils/getfile/collection/uspace/id/1976/filename/713.pdf
  • 44. The challenge is presenting structured data that SEs can identify, parse, and digest
      - Human readable: Wolfinger, N. H., & McKeever, M. (2006, July). Thanks for nothing: changes in income and labor force participation for never-married mothers since 1982. In 101st American Sociological Association (ASA) Annual Meeting; 2006 Aug 11-14; Montreal, Canada (No. 2006-07-04, pp. 1-42). Institute of Public & International Affairs (IPIA), University of Utah.
      - Machine understandable
  • 45. Google Scholar can read and understand!
      [Screenshot: Google Scholar]
  • 46. However, Google cannot understand or read any of our "structured data"
      [Image annotations: No Schema.org = Not Understandable; No Microdata or RDFa = Not Readable]
  • 47. Workshop Exercise
      Meta tag: Working paper value
      1 - citation_author: Arlitsch, Kenning; OBrien, Patrick
      2 - citation_date: 2011-04-05
      3 - citation_title: Search engine optimization (SEO) for digital repositories
      6 - citation_volume:
      7 - citation_issue:
      8 - citation_firstpage: 1
      9 - citation_lastpage: 25
      10 - citation_doi:
      13 - citation_keywords: SEO Tips, Special Collections, Digital Collection, Institutional Repository, Digital Repository
      16 - citation_technical_report_institution: University of Utah
      17 - citation_technical_report_number:
      18 - citation_language: en
      19 - citation_conference_title: Coalition for Networked Information (CNI) Spring 2011 Membership Meeting; 2011 Apr 4-5; San Diego, California, USA
      21 - citation_pdf_url: http://content.lib.utah.edu/utils/getfile/collection/uspace/id/1976/filename/713.pdf
      22 - citation_abstract_html_url: http://content.lib.utah.edu/cdm/ref/collection/uspace/id/1976
      23 - University: University of Utah
      24 - College: University Libraries
      25 - Department: J. Willard Marriott Library
      26 - subject.LCSH: Web search engines; Web sites--Registration with search engines; Digital libraries--Collection development
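The exercise on slide 47 amounts to emitting Highwire Press-style `citation_*` meta tags in the page `<head>`, which Google Scholar reads. A minimal Python sketch that renders the exercise values as tags; `RECORD` and `citation_meta_tags` are hypothetical names, and the sketch follows the convention, as we understand Google Scholar's inclusion guidelines, of one `citation_author` tag per author. The non-`citation_` rows (University, College, Department, subject.LCSH) are left out.

```python
from html import escape

# Values from the slide 47 exercise; empty fields are kept so gaps stay visible.
RECORD = [
    ("citation_author", "Arlitsch, Kenning"),
    ("citation_author", "OBrien, Patrick"),
    ("citation_date", "2011-04-05"),
    ("citation_title", "Search engine optimization (SEO) for digital repositories"),
    ("citation_volume", ""),
    ("citation_firstpage", "1"),
    ("citation_lastpage", "25"),
    ("citation_doi", ""),
    ("citation_language", "en"),
    ("citation_conference_title",
     "Coalition for Networked Information (CNI) Spring 2011 Membership Meeting; "
     "2011 Apr 4-5; San Diego, California, USA"),
    ("citation_pdf_url",
     "http://content.lib.utah.edu/utils/getfile/collection/uspace/id/1976/filename/713.pdf"),
]


def citation_meta_tags(record):
    """Render (name, value) pairs as <meta> tags, skipping empty values."""
    return "\n".join(
        f'<meta name="{escape(name)}" content="{escape(value)}">'
        for name, value in record
        if value  # an empty content="" tag gives the crawler nothing
    )


if __name__ == "__main__":
    print(citation_meta_tags(RECORD))
```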
  • 48. Describe concepts using Schema.org to help SEs understand your repository
      - Answer questions
          - What type of WebPage?
          - What content/data does the page contain?
          - Who was involved?
              - Organizations?
              - People?
          - Where is it?
      - Look at the properties to see if the concept applies
  • 49. WebPage concepts relevant to digital repositories
      - CreativeWork > WebPage*
      - WebPage classes
          - SearchResultsPage
          - CollectionPage
              - ImageGallery
              - VideoGallery
          - ItemPage
      - Important properties
          - description
          - breadcrumb
          - isPartOf
          - significantLink
          - significantLinks
      * http://schema.org/WebPage
  • 50. Typical Digital Repository Content
      - CreativeWork classes
          - Article > ScholarlyArticle
          - Book
          - Map
          - Painting
          - Photograph
          - MediaObject
              - AudioObject
              - ImageObject
              - MusicVideoObject
              - VideoObject
      - Important properties
          - publisher
          - sourceOrganization
          - contentLocation
          - copyrightHolder
          - author
      * http://schema.org/ScholarlyArticle
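To connect the classes and properties on slides 48 through 50, here is a minimal Python sketch that renders a nested schema.org item as microdata, with the authors and publisher as nested `Person` and `Organization` items. The `render_item` helper is hypothetical, and the publisher value is an illustrative choice taken from the CNI citation on slide 43, not a recommendation from the authors.

```python
from html import escape

SCHEMA = "http://schema.org/"


def render_item(item_type, props, itemprop=None, indent=0):
    """Render a schema.org item as nested microdata HTML.

    Each value in props is a string, a nested (type, props) tuple, or a
    list of those; nested items become properties of their parent item.
    """
    pad = "  " * indent
    prop_attr = f' itemprop="{escape(itemprop)}"' if itemprop else ""
    lines = [f'{pad}<div{prop_attr} itemscope itemtype="{SCHEMA}{escape(item_type)}">']
    for name, value in props.items():
        for v in (value if isinstance(value, list) else [value]):
            if isinstance(v, tuple):
                child_type, child_props = v
                lines.append(render_item(child_type, child_props, name, indent + 1))
            else:
                lines.append(f'{pad}  <span itemprop="{escape(name)}">{escape(v)}</span>')
    lines.append(f"{pad}</div>")
    return "\n".join(lines)


ARTICLE = render_item("ScholarlyArticle", {
    "name": "Search engine optimization (SEO) for digital repositories",
    "author": [("Person", {"name": "Kenning Arlitsch"}), ("Person", {"name": "Patrick OBrien"})],
    # Illustrative publisher value, taken from the CNI citation on slide 43.
    "publisher": ("Organization", {"name": "J. Willard Marriott Library"}),
})

if __name__ == "__main__":
    print(ARTICLE)
```

Passing the property name down as `itemprop` on each nested `<div>` is what ties a `Person` to the article as its `author`; a nested `itemscope` without `itemprop` would be read as a separate item.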
  • 51. Organizations might be relevant
      - Organization*
          - EducationalOrganization
              - CollegeOrUniversity
          - LocalBusiness
              - Library**
      - Important properties
          - member
          - employee
          - contactPoint
      * http://schema.org/Organization
      ** http://schema.org/Library
  • 52. What people might be relevant?
      - Person*
      - Important properties
          - memberOf
          - worksFor
          - jobTitle
          - email
          - affiliation
          - alumniOf
      * http://schema.org/Person
  • 53. What locations might be relevant?
      - Place*
          - LandmarksOrHistoricalBuildings
      - Intangible > StructuredValue
          - GeoCoordinates
      - Important properties
          - geo
          - photo
          - address
          - containedIn
      * http://schema.org/Place
  • 54. Check your work using the Google Rich Snippets Tool

      <title>Search engine optimization (SEO) for digital repositories</title>
      <body itemscope itemtype="http://schema.org/WebPage">
        <div itemprop="breadcrumb">
          <a href="category/ir.html">USpace Institutional Repository</a> >
          <a href="category/CollegeofSocialBehavioralScience.html">University Libraries</a> >
          <a href="category/books-literature.html">J. Willard Marriott Library</a> >
        </div>
        <div itemscope itemtype="http://schema.org/ScholarlyArticle">
          <span itemprop="name">Search engine optimization (SEO) for digital repositories</span>
          <div itemprop="author" itemscope itemtype="http://schema.org/Person">
            <span itemprop="name">Patrick OBrien</span>
            <a href="http://www.linkedin.com/in/obrienpatricks" itemprop="url">Patrick OBrien Resume</a>
            <span itemprop="jobTitle">Semantic Web Research Director</span>
            <div itemprop="affiliation" itemscope itemtype="http://schema.org/CollegeOrUniversity">
              <span itemprop="name">Montana State University Library</span>
            </div>
            <div itemprop="affiliation" itemscope itemtype="http://schema.org/Organization">
              <a href="http://www.RevXcorp.com" itemprop="url"><span itemprop="name">RevX Corporation</span></a>
            </div>
          </div>
        </div>
      </body>
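As a local complement to Google's tool, a minimal Python sketch using only `html.parser` can show how microdata nesting is read: items without `itemprop` become top-level items, nested items become property values, and an `itemprop` on an `<a>` takes its value from `href`, not from the link text. `MicrodataExtractor` and `extract_items` are hypothetical names for a simplified reader (no `itemref`, `itemid`, or relative-URL resolution); it is not a validator.

```python
import json
from html.parser import HTMLParser

VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input", "link", "meta", "source", "wbr"}
URL_ATTR = {"a": "href", "area": "href", "link": "href", "img": "src", "audio": "src", "video": "src"}


class MicrodataExtractor(HTMLParser):
    """Simplified microdata reader: no itemref, itemid, or URL resolution."""

    def __init__(self):
        super().__init__()
        self.items = []   # top-level items
        self.scopes = []  # open items, innermost last
        self.stack = []   # open elements: (tag, opened_item, text_prop, text_parts)

    def _add(self, prop, value):
        if self.scopes:  # properties outside any item are ignored
            self.scopes[-1]["properties"].setdefault(prop, []).append(value)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        prop = attrs.get("itemprop")
        item = text = None
        if "itemscope" in attrs:
            item = {"type": attrs.get("itemtype", ""), "properties": {}}
            if prop and self.scopes:
                self._add(prop, item)    # nested item is the parent's property value
            else:
                self.items.append(item)  # no itemprop: a new top-level item
            self.scopes.append(item)
        elif prop:
            if tag == "meta":
                self._add(prop, attrs.get("content", ""))
            elif tag in URL_ATTR:
                self._add(prop, attrs.get(URL_ATTR[tag], ""))  # value is the URL
            else:
                text = []  # value is the element's text, collected until it closes
        if tag not in VOID:
            self.stack.append((tag, item, prop if text is not None else None, text))

    def handle_data(self, data):
        for _, _, _, text in self.stack:
            if text is not None:
                text.append(data)

    def handle_endtag(self, tag):
        if not any(open_tag == tag for open_tag, *_ in self.stack):
            return  # stray end tag
        while self.stack:
            open_tag, item, prop, text = self.stack.pop()
            if text is not None:
                self._add(prop, " ".join("".join(text).split()))
            if item is not None:
                self.scopes.pop()
            if open_tag == tag:
                break


def extract_items(html):
    parser = MicrodataExtractor()
    parser.feed(html)
    parser.close()
    return parser.items


SAMPLE = """
<div itemscope itemtype="http://schema.org/ScholarlyArticle">
  <span itemprop="name">Search engine optimization (SEO) for digital repositories</span>
  <div itemprop="author" itemscope itemtype="http://schema.org/Person">
    <span itemprop="name">Patrick OBrien</span>
    <a href="http://www.linkedin.com/in/obrienpatricks" itemprop="url">Patrick OBrien Resume</a>
    <div itemprop="affiliation" itemscope itemtype="http://schema.org/Organization">
      <a href="http://www.RevXcorp.com" itemprop="url"><span itemprop="name">RevX Corporation</span></a>
    </div>
  </div>
</div>
"""

if __name__ == "__main__":
    print(json.dumps(extract_items(SAMPLE), indent=2))
```

Without `itemprop="author"` on the `Person` block, the same reader reports the person as a second top-level item rather than as the article's author, which is the kind of nesting mistake worth catching before checking a page with Google's tool.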
  • 55. Questions & Study Participation?
      Kenning Arlitsch, Dean of the Library at Montana State University, kenning.arlitsch@montana.edu
      Patrick OBrien, Semantic Web Research Director, patrick.obrien4@montana.edu