The Semantic Web for Librarians and Publishers, by Michael Keller, Stanford University


Transcript of "The Semantic Web for Librarians and Publishers, by Michael Keller, Stanford University"

  1. Semantic Web for Libraries & Publishers, Charleston Conference 111103 (Monday, November 21, 11). So, what's the problem?
  2. The Problem Set
  3. Silos
  4. More silos
  5. Lots of different silos
  6. Blue silos
  7. Old silos. We in the library and publishing trades force readers, some of whom are authors as well, to search iteratively for information they want or need or think might exist, in many different silos, using many different search engines, forms, and vocabularies. We do not make it easy for them to discover what is locally available, what is more or less easy to get, or everything that might be available. No wonder the young and foolish depend upon and believe in Google's searches. Google is quick...and in terms of relevance, very, very dirty.
  8. We give them better interfaces to our holdings at the title level, ones that permit refinement of results, BUT...
  9. Simultaneously, we show them many other tools, each excellent in some ways, to continue their exploration of the literature. No single tool is comprehensive. We do not refer our clients to the Web, at least not on our own web sites! Our OPACs refer to our holdings, while indices and abstracts refer our readers to articles in journals that we may have licensed. SFX and similar link resolvers take readers from the titles those tools reveal to the ones we subscribe to. Neither our OPACs nor the secondary databases point directly to more than a tiny percentage of the vast collection of pages that is the World Wide Web. The Web, of course, refers in fragmentary fashion to information resources we might, I emphasize, MIGHT have on hand for our readers.
  10. And the results of using other, often very good, discovery tools differ in relevance ranking, format, and options from the ones we provide in our OPACs, thus adding confusion.
  11. Some of us provide our readers with lots of databases to search. Too many, really, for all but a few forensic-level scholars.
  12. Selecting a licensed database is an art in itself! Once again, notice that we rarely offer a web search engine as an option, and for good reasons. Nevertheless, the discoverable, relevant information resources on the web apparently are not part of our repertory.
  13. !!! We have not conspired to make the search for relevant information objects difficult. We just have not yet had the tools, the methods, the vision, and yes, the gumption to try something new.
  14. ATLAS at LHC (150 × 10^6 sensors); National Center for Biotechnology Information; NSF CyberInfrastructure; quake engineering simulation. Here's a teensy slice of the information and communication environment in which our faculty and students find themselves. And it gets more complex every day. Alas, the larger the number of websites indexed by Bing or Google or whatever search engine du jour, the more likely it is that the returns will be less pointed and less precisely matched to what the searcher hoped to find.
  15. Too many silos. Here's the biggest of the lot...
  16.
  17. One size fits all??? Does one size fit all?
  18. Not quite. Even Google has silos and, as do others, uses clever interfaces to hide the fact of the silos.
  19. Given all these silos and search engines, our users (our authors and readers, teachers and students, people on the street, our nations) need us to find a better way. Facts about the information objects we have acquired or leased, and facts about the books, articles, films, and so forth that we have published, need to be found in the wild, on the web. Ideally, we librarians and publishers will make the facts about what we have and what we are making public, for fun or profit, discoverable on the Web.
  20. Discovery & Access ... the problems. Let's dwell on the problems briefly...
  21. 1. Too many stovepipe systems; 2. Too little precision with inadequate recall; 3. Too far removed from the World Wide Web
  22–26. 1. Too many stovepipe systems. The landscape of discovery & access services is a shambles. It can't be mapped in any logical way • not by us (the supposed information pros) • not by the faculty & students who must navigate the chaos. This state of affairs shouldn't be a surprise.
  27–30. 2. Too little precision with inadequate recall. Some of the problem ... too many stovepipe systems • dumbing-down effects of federation often hinder explicit searches • each interface has its own search-refinement tricks • numerous, overlapping discovery paths hamper full recall. Most of the problem ... limitations in the design & execution of the infrastructure that supports discovery & access.
  31–36. The 1st limiting factor ... ambiguity. Most of our metadata uses a string of bytes to label a semantic entity [person, place, thing, event, ...] • discovery is based on matching text labels • not on the gist of semantic entities. For libraries, the fix is authorities • authoritative forms of strings (names, organizations, titles, places, events, topics, etc.) work to improve precision and recall. Hold on ... what about cases where no one-to-one relationship exists between a string-of-text label & the underlying semantic entity? Take for example the text string: jaguar; byte string: 6a 61 67 75 61 72.
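A minimal Python sketch (3.8+ for `bytes.hex` with a separator) makes the byte-string point concrete. The helper names `label_bytes` and `labels_match` are illustrative and do not come from any library. Lowercase "jaguar" encodes to 6a 61 67 75 61 72, and the capitalized form differs only in its first byte (4a). Neither sequence says which jaguar is meant.

```python
def label_bytes(label: str) -> str:
    """Return a label's UTF-8 bytes as space-separated hex pairs."""
    return label.encode("utf-8").hex(" ")


def labels_match(a: str, b: str) -> bool:
    """String-based discovery: records 'match' only when labels are byte-identical."""
    return a.encode("utf-8") == b.encode("utf-8")


if __name__ == "__main__":
    print(label_bytes("jaguar"))  # 6a 61 67 75 61 72
    print(label_bytes("Jaguar"))  # 4a 61 67 75 61 72
    # A record about the car and a record about the cat carry the same label,
    # so byte matching treats them as the same thing...
    print(labels_match("jaguar", "jaguar"))  # True
    # ...while a mere change of case makes the "same" label fail to match.
    print(labels_match("jaguar", "Jaguar"))  # False
```

Matching on labels fails in both directions. It merges different things that share a string, and it splits the same thing written two ways.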
  37–40. Prrrrr ... a rose is a rose is a rose. company: Jaguar Ltd. • cars: XK series, in production since 1996; E-Type (UK) or XK-E (US), manufactured 1961 to 1974; etc. • hardware & software: Atari video game console; Macintosh OS X 10.2 • music: heavy metal band formed in Bristol, England, Dec 1979; Fender electric guitar, introduced in 1962; Philadelphia-based singer/songwriter Jaguar Wright • military: type 140 Jaguar class fast attack craft [torpedo], Germany WWII; Anglo-French ground attack aircraft; XF10F prototype swing-wing fighter, early 1950s, Grumman • heroes: The Jaguar, a superhero published by Archie Comics; DC Comics Impact series, ... loosely based on the Archie Comics character • pro football: Jacksonville. Imagine this keyword search and realize the ambiguity of the term "jaguar". Inspired by John Giannandrea, CTO, Metaweb ... from his presentation at PARC in April 2008.
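This hypothetical Python sketch shows how identifiers, rather than labels, separate these senses. Every `example.org` URI is a placeholder for a real authority identifier. The `Entity` class and the two lookup functions are illustrative only. A label search returns every thing called "jaguar". A URI names exactly one of them.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entity:
    uri: str    # identifier for the thing itself (placeholder URIs below)
    label: str  # human-readable string, shared by many different things
    kind: str   # rough category, as grouped on the slide


# Hypothetical example.org URIs standing in for real authority identifiers.
ENTITIES = [
    Entity("http://example.org/id/jaguar-cars", "jaguar", "cars"),
    Entity("http://example.org/id/jaguar-band", "jaguar", "music"),
    Entity("http://example.org/id/atari-jaguar", "jaguar", "hardware"),
    Entity("http://example.org/id/mac-os-x-10-2", "jaguar", "software"),
]


def search_by_label(label: str) -> List[Entity]:
    """Label matching: returns every entity that happens to share the string."""
    wanted = label.strip().lower()
    return [e for e in ENTITIES if e.label == wanted]


def lookup(uri: str) -> Optional[Entity]:
    """Identifier matching: a URI names at most one entity."""
    for e in ENTITIES:
        if e.uri == uri:
            return e
    return None


if __name__ == "__main__":
    print(len(search_by_label("Jaguar")))  # 4
    print(lookup("http://example.org/id/atari-jaguar").kind)  # hardware
```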
  41–44. The 2nd limiting factor ... instance-based metadata. Most of our metadata focuses on publication artifacts • identifying responsibility for their creation • listing topical headings. For simple cases ... few worries • as with ambiguity, one-to-one relationships pose few problems • things work for authors with a few books in several editions. But as complexity increases, precision & recall suffer.
  45–47. Prolific authors ... search: Shakespeare's Hamlet (811 entries). Wading through the search results for authors like Shakespeare shows clearly the effects that instance-based metadata has on precision & recall. Flipping back & forth between hundreds of brief and full records to sort through the varied instances of a single entity takes unflagging patience. Those instances include • critical editions based on primary sources • 18th & 19th century collections of the plays • social, historical and literary essays • histories & critiques of such writings • video and audio recordings of performances • reviews and indices of the same • treatments of stagecraft, costumes, music • life & works of notables associated with the plays (e.g., performers, directors) • other art forms inspired by the plays. A Socrates (Stanford Libraries OPAC) keyword search for the terms shakespeare and hamlet.
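The sketch below contrasts a flat, instance-based result list with the same records collapsed under the work they relate to. It is a hypothetical Python example. The titles, the relation names (`edition_of`, `contains`, `performance_of`, `about`) and the `work:` identifiers are all assumptions for illustration. Flat instance records usually lack exactly these links, which is the problem described above.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical instance records: (title of the item, relation, work identifier).
# The relations and "work:" identifiers are illustrative assumptions.
RECORDS: List[Tuple[str, str, str]] = [
    ("Hamlet, critical edition", "edition_of", "work:hamlet"),
    ("Collected plays, 19th-century edition", "contains", "work:hamlet"),
    ("Hamlet, video recording of a performance", "performance_of", "work:hamlet"),
    ("Essays on Hamlet", "about", "work:hamlet"),
    ("Macbeth, critical edition", "edition_of", "work:macbeth"),
]


def group_by_work(records) -> Dict[str, Dict[str, List[str]]]:
    """Collapse a flat result list into work -> relation -> titles."""
    grouped: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for title, relation, work in records:
        grouped[work][relation].append(title)
    return grouped


if __name__ == "__main__":
    # One entry per kind of relationship, instead of one line per record.
    for relation, titles in group_by_work(RECORDS)["work:hamlet"].items():
        print(relation, titles)
```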
  48–53. 3. Too far removed from the World Wide Web. Together, our metadata & collections make up a big chunk of the "dark web" [info resources that search-engine spiders can't see]. It's clear that visibility on the web promotes dramatic increases in discovery and access • Library of Congress & Smithsonian images (Flickr) • SULAIR's HighWire Press (> 2x increase via Google). The state of affairs is well known ...
  54. Our Working Environment
  55. [Diagram: academy, publisher, library, and scholars & students, linked by "produce" and "provide".] Here is a schematic to suggest how our ecosystem works. It is more complex, of course, but the basics are embodied here.
  56. Once upon a time…the Internet. And here is the way the e-discovery and e-communication environment is developing. First there was the Internet. Prophets such as Vannevar Bush, Ted Nelson, and Doug Engelbart showed us the way.
  57. Then…the World Wide Web. Thanks to another prophet, Tim Berners-Lee, the Internet, a network of communicating computers, became a web of pages of information. Scholarly journal publishers and some librarians realized early on that the web of pages offered functional advantages to scholarship and to publishing. Yahoo, Google, and others realized that mining the web of pages by the words on those pages could make the rapidly growing web reveal more, through indexing and cataloging. Indexing won out over cataloging, as we now know. The next thing is the subject of this talk: the web of data. It is the web of relationships constructed and expressed so that both computers and humans can identify and understand relationships in that web. The web of data lives alongside the web of pages and is carried on the Internet, the global carrier.
  58. The web of data, under construction. This web of data is the next big thing in discovering relevant information objects and the next big thing in empowering individuals, communities, and industries to make better use of information that they or others create. What distinguishes this web of data, this linked data environment, is the principle of identifying entities, virtual & real, by statements of relations and descriptions in machine-readable form. More about this as we go along.
  59. The web of data (aka Linked Data), under construction. We are calling this next phase the Linked Data phase because it is entirely dependent upon statements of relationships and descriptions in machine-readable form. However, this phase may be only a precursor to another, more complex and more difficult web world to engineer. The next phase is the Semantic Web, which in theory allows the machine-readable relationships and descriptions to interoperate to satisfy a person's requirements, albeit without constant interaction. In short, in the Semantic Web, the machines will understand meaning and presumably act on it. Scary, eh?
  60. Construction Tools. How do we work to alleviate our problems as information professionals, librarians and publishers?
61. Recipe for creating the web of data • identify people, places, things, events, and other entities embedded in the knowledge resources that a research university consumes and produces
62. Recipe for creating the web of data • identify people, places, things, events, and other entities embedded in the knowledge resources that a research university consumes and produces • tie those facts together with named connections
63. Recipe for creating the web of data • identify people, places, things, events, and other entities embedded in the knowledge resources that a research university consumes and produces • tie those facts together with named connections • publish the relationships as crawl-able links on the web
64. Recipe for creating the web of data • identify people, places, things, events, and other entities embedded in the knowledge resources that a research university consumes and produces • tie those facts together with named connections • publish the relationships as crawl-able links on the web • build/use apps supporting discovery via the web of data
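The recipe on this slide can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production pipeline:

- the `http://example.org/id/` namespace is hypothetical;
- the slug-style URI minting is hypothetical;
- the helper names `mint_uri`, `connect`, and `publish` are hypothetical;
- only URI-valued objects are handled.

The output uses the W3C N-Triples line format, one statement per line.

```python
from urllib.parse import quote

# Hypothetical namespace for minted URIs; a real project would use its own domain.
BASE = "http://example.org/id/"


def mint_uri(kind: str, label: str) -> str:
    """Step 1: give an identified entity (person, place, thing...) an HTTP URI."""
    slug = quote(label.strip().lower().replace(" ", "-"))
    return f"{BASE}{kind}/{slug}"


def connect(subject: str, relation: str, obj: str) -> tuple:
    """Step 2: tie two entities together with a named connection (one triple)."""
    return (subject, f"{BASE}rel/{relation}", obj)


def publish(triples) -> str:
    """Step 3: serialize as N-Triples, one linkable statement per line (URI objects only)."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in sorted(triples))


cellist = mint_uri("person", "Yo Yo Ma")
cello = mint_uri("thing", "Cello")
print(publish({connect(cellist, "plays", cello)}))
```

Putting that text at a dereferenceable HTTP address is what would make the links crawl-able. The sketch stops at producing the text.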
65. Here is a pile of words representing all the words on the web that most search engines index constantly. Good search engines today can do a lot with this pile. BUT the search engines create the perception of relationships not based on meaning, but on other factors, such as the number of links to a site containing the words of interest OR the traffic to a site.
66. From this pile of words, structure! The Linked Data approach attempts to structure the pile in anticipation of the need for discovery. That structure is based on meaning, on relationships. I will make this clearer in the next slides.
67. Here's a graph of a very few relationships to Yo-Yo Ma, the great ’cellist.
68. Linked Data Web. Here's a graph of relationships to Haggis, just a fun one I could not resist throwing in. Meaning is provided by understanding relationships.
69. RDF triples & URIs • RDF triples = subject – predicate – object – A way to describe objects or even ideas on the web – An object or idea might have many RDF triples describing it – Objects or ideas need not exist on the web! • URIs = Uniform Resource Identifiers – Allow machine interaction among Web objects – Various syntactical schemes & protocols used to construct URIs – Up to 3 per RDF triple (subject – predicate – object); the object may instead be a literal value. Geek ingredients in the construction of the Linked Data Web. RDF means Resource Description Framework; each RDF statement is expressed as a simple sentence, though multiple such statements might attach to a single entity. In fact, we need multiple RDF statements in this scheme.
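To make the triple structure concrete, here is a small Python sketch. The standard order is subject, predicate, object. The subject and predicate are URIs, while the object may be a URI or a literal value such as a name, so a single triple uses two or three URIs.

The class names are hypothetical, blank nodes are ignored, and the `http://example.org/` URIs are placeholders. Only the FOAF `name` property URI is a real vocabulary term.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class Literal:
    value: str
    lang: str = ""  # optional language tag such as "en"


@dataclass(frozen=True)
class Triple:
    subject: URI
    predicate: URI
    obj: Union[URI, Literal]

    def __post_init__(self):
        # Subject and predicate name resources; the object may be a resource or a literal value.
        # (Blank-node subjects, which RDF also allows, are left out of this sketch.)
        if not isinstance(self.subject, URI) or not isinstance(self.predicate, URI):
            raise TypeError("subject and predicate must be URIs")
        if not isinstance(self.obj, (URI, Literal)):
            raise TypeError("object must be a URI or a literal")

    def uri_count(self) -> int:
        """How many URIs this statement uses: 3 if the object is a URI, otherwise 2."""
        return 2 + int(isinstance(self.obj, URI))


ma = URI("http://example.org/id/person/yo-yo-ma")
name = Triple(ma, URI("http://xmlns.com/foaf/0.1/name"), Literal("Yo-Yo Ma", "en"))
plays = Triple(ma, URI("http://example.org/rel/plays"), URI("http://example.org/id/thing/cello"))
print(name.uri_count(), plays.uri_count())  # 2 3
```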
70. A graph of RDF statements and URIs.
71. The Linked Data Principles 1. Use URIs as names for things (people, places, times, objects, ideas... anything really) 2. Use HTTP URIs so that people can look up those names 3. When someone looks up a URI, provide useful RDF information 4. Include RDF statements that link to other URIs so that they can discover related things. The really great aspect of RDF is that its statements can refer to ideas, not just to physical or virtual entities. Any kind of idea could be treated.
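Principles 2 to 4 describe a "follow your nose" pattern:

1. Look up a URI.
2. Read the statements that come back.
3. Follow any URIs they mention.

The sketch below imitates this with a hypothetical in-memory dictionary standing in for HTTP lookups, so no network access or real data is involved. `WEB`, `dereference`, and `discover` are illustrative names. The breadth-first walk with a hop limit is just one reasonable way to bound the crawl.

```python
from collections import deque

EX = "http://example.org/"  # hypothetical namespace

# Hypothetical stand-in for the web: each URI "dereferences" to the triples published about it.
WEB = {
    EX + "person/yo-yo-ma": [
        (EX + "person/yo-yo-ma", EX + "rel/plays", EX + "thing/cello"),
        (EX + "person/yo-yo-ma", EX + "rel/bornIn", EX + "place/paris"),
    ],
    EX + "thing/cello": [
        (EX + "thing/cello", EX + "rel/memberOf", EX + "thing/string-family"),
    ],
}


def dereference(uri):
    """Principles 2-3: look up a URI and get statements back (none for unknown URIs)."""
    return WEB.get(uri, [])


def discover(start, max_hops=2):
    """Principle 4: follow links found in returned statements, breadth-first, up to max_hops."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        uri, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for _s, _p, obj in dereference(uri):
            if obj.startswith("http") and obj not in seen:
                seen.add(obj)
                queue.append((obj, hops + 1))
    return seen


print(sorted(discover(EX + "person/yo-yo-ma")))
```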
72. Library Metadata • Library metadata standards closed • “Passive” metadata, searchable, but… • In silos • Readable, but not actionable • Search results refinable, but final. These are some of the edges of the problem of library metadata.
73. Library Metadata vs. Semantic Web Metadata • Library metadata standards closed → Open • “Passive” metadata, searchable, but… → Dynamic, contextualized • In silos → In the wild • Readable, but not actionable → Interactive, responsive • Search results refinable, but final → Leading to other queries & views. And here is the comparison between the library metadata scene now and the one we advocate for the Linked Data/Semantic Web. Library metadata in the Linked Data Web should be freely available, constantly updated, and often reconciled with RDF triple statements from non-library sources. Library Linked Data should be entirely open on the web.
74. Make library bibliographic facts into RDFs & URIs; release them into the wild. Make Library Linked Data OPEN. I should add that accounting for physical objects in our collections, locating them, making our collections auditable, and managing our collections seems to be possible using Linked Data too, at least in principle.
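Turning a catalog record into statements that can be released into the wild might look roughly like this. The sketch rests on these assumptions:

- the record is a simple dictionary, not MARC;
- the `http://example.org/bib/` namespace is hypothetical;
- the field mapping onto DCMI Metadata Terms is hypothetical. `title`, `creator`, `issued`, and `subject` are real DCMI terms; the choice of which local field maps to which is illustrative.

Values that already look like HTTP URIs become links rather than strings.

```python
DCT = "http://purl.org/dc/terms/"   # DCMI Metadata Terms namespace
BASE = "http://example.org/bib/"    # hypothetical namespace for a library's records

# Hypothetical mapping from local catalog field names to DCMI terms.
FIELD_MAP = {"title": "title", "author": "creator", "year": "issued", "subject": "subject"}


def literal(value) -> str:
    """Quote a value as an N-Triples string literal, escaping backslashes, quotes, newlines."""
    text = str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{text}"'


def record_to_ntriples(record_id: str, record: dict) -> list:
    subject = f"<{BASE}{record_id}>"
    lines = []
    for field, value in record.items():
        term = FIELD_MAP.get(field)
        if term is None:
            continue  # unmapped local fields are simply skipped in this sketch
        for v in value if isinstance(value, list) else [value]:
            # Values that already are HTTP URIs (e.g. an authority link) become links, not strings.
            obj = f"<{v}>" if isinstance(v, str) and v.startswith("http") else literal(v)
            lines.append(f"{subject} <{DCT}{term}> {obj} .")
    return lines


record = {"title": "An Example Title", "author": "Doe, Jane", "year": 2011, "shelfmark": "Z666"}
print("\n".join(record_to_ntriples("rec1", record)))
```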
75. What about Publishers?
76. Publishers & Societies making use of Linked Data • Aggregate content in their own realms & beyond • Aggregate information about – Conferences – Career building & employment opportunities – Communities in collaboration – Commercial & other services supporting research with specimens, source material, processing, trials – Productive relationships with others • Provide actionable, constantly updated links in support of scholars, teachers, and learners • Provide compelling services tying users to them. Libraries too can use Linked Data to reveal and advertise compelling services offered to their clients.
77. Semantic Web adopters. Here are some of the big players in the Linked Data / Semantic Web world. The British Library has released RDFs/URIs for the entire British National Bibliography. The Library of Congress has released the same for LCSH & the Name Authority File. LCSH includes links to AGROVOC, RAMEAU, DNB, the GLIN Subject Thesaurus, and the National Agricultural Library's Subject Index. Every personal and corporate entry in LC/NAF links to VIAF, the Virtual International Authority File based at OCLC. The New York Times, 18 months ago, made all 500,000 (and growing) of its index terms available in the wild as RDFs and URIs.
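Links such as those from LC/NAF entries to VIAF let a machine gather every identifier for the same person or topic. A minimal union-find sketch follows. It assumes each link is a trusted statement of identity, in the spirit of `owl:sameAs`. Weaker links, such as SKOS `closeMatch`, are not transitive and should not be clustered this way. The identifiers are hypothetical placeholders, not real authority records.

```python
def cluster_equivalents(links):
    """Group identifiers joined by (assumed) identity links, using union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for x in list(parent):
        clusters.setdefault(find(x), set()).add(x)
    # Sort for a stable, readable result.
    return sorted(clusters.values(), key=sorted)


# Hypothetical identifiers standing in for LC/NAF, VIAF, DNB, LCSH and RAMEAU entries.
links = [
    ("LCNAF:name-1", "VIAF:1"),
    ("DNB:name-9", "VIAF:1"),
    ("LCSH:topic-7", "RAMEAU:topic-3"),
]
for cluster in cluster_equivalents(links):
    print(sorted(cluster))
```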
78. For publishers and libraries... though we should not neglect services.
79. ...if users can find it in their own context
80. Context, Users, Content. Users = readers, authors, teachers, students
81. Context, Users, Content. Publishers must make content VISIBLE. I am using the imperative here, because invisible published content means invisible benefit to the author and/or the publisher.
82. Here is a recent PLoS article from PLoS Neglected Tropical Diseases.
83. And here is the semantically enhanced version of this article, with enhancements provided by David Shotton et al. in the form of links to further information, interactive figures, a re-orderable reference list, citations in context, and tag trees. These enhancements took 10 man-weeks in 2009! However, with the growing ecology of linked data, much of this could be accomplished by auto-tagging and algorithmic construction of the basic RDFs & URIs for the unique article. Microdata that some publishers and their supporting services embed using the schema.org vocabulary point toward these exciting possibilities.
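Auto-tagging of the kind described here can start as simply as matching known entity names in the text and emitting a triple for each match. The sketch assumes a tiny hypothetical gazetteer and a hypothetical `mentions` predicate. Real systems would also need disambiguation, stemming, and far larger vocabularies drawn from linked data sources.

```python
import re

# Hypothetical gazetteer mapping surface forms to entity URIs; a real pipeline
# would draw these from authority files or other linked data sources.
GAZETTEER = {
    "dengue": "http://example.org/id/disease/dengue",
    "aedes aegypti": "http://example.org/id/species/aedes-aegypti",
    "brazil": "http://example.org/id/place/brazil",
}
MENTIONS = "http://example.org/rel/mentions"  # hypothetical predicate


def auto_tag(article_uri: str, text: str) -> set:
    """Emit (article, mentions, entity) triples for each gazetteer term found as a whole word."""
    lowered = text.lower()
    triples = set()
    for term, entity_uri in GAZETTEER.items():
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            triples.add((article_uri, MENTIONS, entity_uri))
    return triples


article = "http://example.org/article/42"
for triple in sorted(auto_tag(article, "Dengue vectors such as Aedes aegypti thrive in Brazil.")):
    print(triple)
```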
84. Aggregation. Aggregation counts, but think how much more we would get if we could aggregate from libraries, publishers, and the wild and weird variety of sources on the web.
85.
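Part of what makes aggregation easy in the Linked Data approach is that an RDF graph is a set of triples. Statements about the same URI from different sources can therefore simply be pooled. The sketch assumes three hypothetical sources that already agree on one identifier for a work. Reconciling identifiers first is a separate step, and objects are kept as plain strings for brevity.

```python
from collections import defaultdict

DCT = "http://purl.org/dc/terms/"
BOOK = "http://example.org/bib/rec1"  # hypothetical shared identifier for one work


def merge_sources(*graphs):
    """RDF graphs are sets of triples, so merging (without blank nodes) is a set union."""
    merged = set()
    for graph in graphs:
        merged |= set(graph)
    return merged


def describe(graph, subject):
    """Gather what the merged graph says about one subject, grouped by predicate."""
    props = defaultdict(set)
    for s, p, o in graph:
        if s == subject:
            props[p].add(o)
    return dict(props)


# Hypothetical statements from three kinds of sources about the same work.
library = {(BOOK, DCT + "title", "An Example Title"), (BOOK, DCT + "subject", "http://example.org/auth/topic-7")}
publisher = {(BOOK, DCT + "title", "An Example Title"), (BOOK, DCT + "publisher", "http://example.org/org/press")}
web = {(BOOK, "http://example.org/rel/reviewedIn", "http://example.org/blog/post-1")}

merged = merge_sources(library, publisher, web)
print(len(merged))  # 4: the duplicate title statement collapses
```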
86. Disambiguation. RDFs and URIs can operate in many languages, and relationships can be expressed across languages, a potentially big benefit to research and collaboration in research.
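Disambiguation across languages works because the concept is identified by its URI, while its names are just language-tagged labels attached to it. The sketch uses the real SKOS `prefLabel` property URI but a hypothetical concept URI and hand-written labels. Looking up any of the labels leads back to the same concept.

```python
SKOS_PREF = "http://www.w3.org/2004/02/skos/core#prefLabel"
CELLO = "http://example.org/id/concept/cello"  # hypothetical concept URI

# One concept, several language-tagged labels: (subject, predicate, (text, language)).
LABELS = [
    (CELLO, SKOS_PREF, ("cello", "en")),
    (CELLO, SKOS_PREF, ("violoncelle", "fr")),
    (CELLO, SKOS_PREF, ("Violoncello", "de")),
]


def resolve(label, triples):
    """Return the concept URIs carrying this label, in whatever language it is written."""
    wanted = label.casefold()
    return {s for s, _p, (text, _lang) in triples if text.casefold() == wanted}


def label_in(uri, lang, triples):
    """Return the label for a concept in the requested language, or None if there is none."""
    for s, _p, (text, tag) in triples:
        if s == uri and tag == lang:
            return text
    return None


uri = resolve("Violoncelle", LABELS).pop()
print(uri, label_in(uri, "en", LABELS))
```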
87. Web of Data Progress
88. 2007. FOAF = Friend of a Friend. Hundreds of millions of RDFs/URIs. Fortunately they do not take much space in memory!
89. This is the 2011 graph of entities supplying RDFs and URIs. Now the population is in the hundreds of billions, heading to trillions.
90. 2011: http://inkdroid.org/lod-graph/
91. Encouragement Examples
92. Linked Open Data Value Proposition • Linked open data (LOD) puts information where people are looking for it: on the Web; • LOD can expand discoverability of our content; • LOD opens opportunities for creative innovation in digital scholarship and participation; • LOD allows for open, continuous improvement of data; • LOD creates a store of machine-actionable data on which improved services can be built; • Library linked open data might facilitate the breakdown of the tyranny of domain silos; • LOD can provide direct access to data in ways that are not currently possible; • LOD provides unanticipated benefits that will emerge later as the stores of LOD expand exponentially. A "product" of the Stanford/CLIR Linked Data Workshop, June 2011. 25 participants from the British Library, the Bibliothèque nationale de France, the Deutsche Nationalbibliothek, the Royal Library of Denmark, Aalto University in Finland, the Library of Congress, the Bibliotheca Alexandrina, the National Institute of Informatics of Japan, Google, Seme4, Emory, the University of Virginia, the University of Michigan, the California Digital Library, Knowledge Motifs, CLIR, and Stanford.
93. Google using Stanford bib facts + web resources. This is a movie of a live interaction with Freebase using bibliographic facts from Stanford and linked information resources from the web. It shows, in a limited way, the potential for discovery and retrieval in the Linked Data Web.
94. BnF using data only from its catalogs & Gallica. This is another movie, of the Linked Data prototype based entirely on bibliographic facts from the BnF catalogs and digital texts in Gallica. There are no other web resources drawn into this prototype... yet.
95.
96. A Bibliographic Framework for the Digital Age (October 31, 2011) • “The new bibliographic framework project will be focused on the Web environment, Linked Data principles and mechanisms, and the Resource Description Framework (RDF) as a basic data model. The protocols and ideas behind Linked Data are natural exchange mechanisms for the Web that have found substantial resonance even beyond the cultural heritage sector. Likewise, it is expected that the use of RDF and other W3C (World Wide Web Consortium) developments will enable the integration of library data and other cultural heritage data on the Web for more expansive user access to information.” Deanna Marcum, Associate Librarian of Congress, introducing a transition from MARC.
97. Value Proposition for LAMs. We in the cultural heritage and knowledge management institutions are discovering better ways of publishing, sharing, and using information by linking data and helping others do the same. Through this work, we have come to value and to promote the following practices: 1. Publishing data on the web for discovery and use, rather than preserving it in dark, more or less unreachable archives that are often proprietary and profit driven; 2. Continuously improving data and Linked Data, rather than waiting to publish “perfect” data; 3. Structuring data semantically, rather than preparing flat, unstructured data; 4. Collaborating, rather than working alone; 5. Adopting Web standards, rather than domain-specific ones; 6. Using open, commonly understood licenses, rather than closed and/or local licenses. From the Stanford/CLIR Workshop on Linked Data, June 2011. In each couplet, we emphasize the first half, before “rather than”, admitting that sometimes the second half of the couplet has to be operative.
98. DARPA Internet. This is where we started 2.5 decades ago.
99. World Wide Web. Thanks to Tim Berners-Lee and many others, we advanced in this environment from the early 1990s until today.
100. SOCIAL WEB. We cannot ignore the social web that exists in the current WWW, but think how much more, some of it scary, could be done in the Linked Data Web with the behaviors of the Social Web.
101. Linked Data Web. Just that funny reminder of the fundamental nature of the Linked Data Web: expressing machine-actionable relationships.
102. Semantic Web. And in the next web, the Semantic Web, who knows what may be possible.
103. Ubiquitous computing. To the progression of network types, we need to add a couple of enormously important environmental factors. Ubiquitous computing is a very important one. Having lots of computers on the net makes the possibility of an open global linked data web very strong.
104. Mobility. And our ability to communicate by voice (how about that Siri?) and by bits/bytes from everywhere is, perhaps, just another aspect of ubiquitous computing.
105. Ubiquitous Computing, Linked Web, Mobile Web, Social Web, Internet. The black box in the upper right corner is the Semantic Web, a level of sophistication yet to be achieved. The linked data web is at hand, though. Will librarians and publishers join the development of the Linked Open Data web? I certainly think we should.
106. NO MORE SILOS ARE NEEDED or wanted.
107. W3C Library Linked Data Incubator Group: http://www.w3.org/2005/Incubator/lld/ • A Bibliographic Framework Initiative General Plan for the Digital Age (October 31, 2011): http://www.loc.gov/marc/transition/news/framework-103111.html • Linked Data Survey & Workshop, June 2011: http://www.clir.org/pubs/archives/linked-data-survey/