NISO Webinar: Return on Investment (ROI) in Linking the Semantic Web


The much-heralded Semantic Web is enabled by an ability for machines to process webpages and certain data intelligently and perform better tasks on behalf of end users. Material is linked together through machine-readable statements of relationships among ideas, people, events, and places. Linked data examples are beginning to abound in the scholarly information environment, appearing from both publishers and libraries. This webinar will showcase several such examples. Presenters will describe their motivations for investment in such projects and discuss interfaces and other early outcomes.

Published in: Education

NISO Webinar: Return on Investment (ROI) in Linking the Semantic Web

  1. Linked Data for Smart Content. Ellen Hays, Elsevier Labs. Presented at: NISO Webinar on Semantic Web Linking, 28 September 2011
  2. Why Smart Content? Elsevier's readers want more than text and images, that is, more than simply an online rendition of what we print. They want:
     • Semantically enhanced content, such as mashups that combine information from diverse sources and in diverse media
     • The ability to do semantically motivated search
     • Source data, and the tools to mine it effectively for more information
     • I.e., information, presented in ways that make it straightforward to use and understand
  3. The challenge: How to do semantic enhancement at scale for STM publishing?
     • In harmony with our culture and legacy
     • Across the breadth of our content
     • Within an ecosystem of authors, institutions, publishers, content suppliers, and funding agencies
  4. Smarter Content and Applied Smart Content (diagram)
     Smarter Content:
     • Elsevier content: text, tables, images
     • Related concepts: metadata, entities, relationships
     • Elsevier content and data
     • Linked data from partners and the Web
     Applied Smart Content:
     • Better discovery: faceted search & browse, ontology-driven navigation, task-specific results, personalized/localized results, question answering
     • Better understanding: tag clouds, heatmaps, streamgraphs, scatterplots, time series, animations
     • Actionable, persuasive knowledge: topic pages, social network maps, geolocation maps, data mashups, text mining reports
  5. Content enrichment
     • Concepts and relations between concepts are identified in text, compared to a controlled vocabulary or semantic model, and the resulting information is stored as RDF in annotation files
     • The storage mechanism for this information is the Elsevier Linked Data Repository (LDR)
     Example text (annotation labels shown on the slide: Title, Disease, Clinical finding, Drugs, Source): Evaluation and management of delirium in hospitalized older patients. Delirium is common in hospitalized older patients and may be a symptom of a medical emergency, such as hypoxia or hypoglycemia. It is characterized by an acute change in cognition and attention, although the symptoms may be subtle and usually fluctuate throughout the day. This heterogeneous syndrome requires prompt recognition and evaluation, because the underlying medical condition may be life threatening. Risk factors for delirium include visual impairment, previous cognitive impairment, severe illness, and an elevated blood urea nitrogen/serum creatinine ratio. Interventions that have been shown to reduce the incidence of delirium in at-risk hospitalized patients include repeated reorientation of the patient to person and place, promotion of good sleep hygiene, early mobilization, correction of dehydration, and the minimization of unnecessary noise and stimuli. The treatment of delirium centers on the identification and management of the medical condition that triggered the delirious state. Nonpharmacologic interventions may be beneficial, but antipsychotic agents may be needed when the cause is nonspecific and other interventions do not sufficiently control symptoms such as severe agitation or psychosis. Although delirium is a temporary condition, it may persist for several months in the most vulnerable patients. Patient outcomes at one year include a higher mortality rate and a lower level of functioning compared with age-matched control patients. Copyright © 2008 American Academy of Family Physicians.
  6. Guiding principles
     • Leverage our existing content production workflow and infrastructure
     • Acknowledge a deep dependence on subject matter expertise, third parties and the Web for content enhancement and knowledge organization systems
     • Deliver benefits across the complementary use cases of researcher and practitioner
  7. Current approach
     • Embrace linked data principles
       • Reuse Web-standard vocabularies, taxonomies, ontologies and entity resources where possible
     • Start with a focus on standards and infrastructure
     • Leverage partners and acquisitions for content enhancement algorithms/capabilities
     • Build out linked data design patterns for application development
     • Explore new product opportunities around linked data
  8. Linked data principles
     1. Use URIs to name things
     2. Use HTTP URIs so they can be looked up
     3. Return useful data when things are looked up
     4. Include links to other things in the returned data
     "Linked data is just a term for how to publish data on the web while working with the web. And the web is the best architecture we know for publishing information in a hugely diverse and distributed environment, in a gradual and sustainable way."
     Tennison J, 2010. Why Linked Data for http://
     Shotton D, Portwin K, Klyne G, Miles A, 2009. Adventures in Semantic Publishing: Exemplar Semantic Enhancements of a Research Article. PLoS Comput Biol 5(4): e1000361. doi:10.1371/journal.pcbi.1000361
  9. Standards: Content satellites
     Content satellites are XML documents containing RDF statements; for example:
     • Tags from a taxonomy for a given document
     • Document sections relevant to a given concept
     • Document sections providing answers to a given question
     • Learning objects compliant with a given state educational standard
     • Genes mentioned in a given document
     • Documents supporting or disputing conclusions of a given document
     • Concepts that are in the areas of expertise for a given author
     Goal is to balance expressivity and manageability for semantic enhancement
     • Constrain the RDF serialization to allow existing XML-centric staff, tools, and workflows to accommodate RDF modeling for specific application use cases
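To make the idea of a content satellite concrete, here is a minimal sketch of one as a small RDF/XML document that tags a document with taxonomy concepts. The example.org URIs, the function names, and the choice of dcterms:subject as the tagging property are assumptions for illustration; the slides do not specify Elsevier's actual satellite schema.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCT_NS = "http://purl.org/dc/terms/"


def build_satellite(document_uri, concept_uris):
    """Serialize 'document has subject concept' statements as RDF/XML.

    Hypothetical helper: the real satellite format is not described in the slides.
    """
    ET.register_namespace("rdf", RDF_NS)
    ET.register_namespace("dcterms", DCT_NS)
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    description = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": document_uri}
    )
    for concept_uri in concept_uris:
        # One child element per statement: <dcterms:subject rdf:resource="..."/>
        ET.SubElement(
            description, f"{{{DCT_NS}}}subject", {f"{{{RDF_NS}}}resource": concept_uri}
        )
    return ET.tostring(root, encoding="unicode")


def read_statements(satellite_xml):
    """Read the constrained RDF/XML back into (subject, property, object) triples."""
    root = ET.fromstring(satellite_xml)
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree tags look like "{namespace}local"; join them into a URI.
            prop_uri = prop.tag[1:].replace("}", "", 1)
            triples.append((subject, prop_uri, prop.get(f"{{{RDF_NS}}}resource")))
    return triples
```

Because the serialization is constrained to one flat pattern, ordinary XML tooling can read and write it without a full RDF parser, which is the trade-off between expressivity and manageability that the slide describes.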
  10. Infrastructure: Linked Data Repository
     • Allows Elsevier platforms and applications to retrieve and store content enhancements
       • About Elsevier content
       • About third party content
     • Allows third parties to store content enhancements
       • About primary and secondary content
     • Provides a REST API for
       • CRUD operations on satellites as RDF named graphs
       • Simple, low-expressivity queries across stored named graphs
         • For <subject>, give me all objects for <property>
         • Give me all subjects that have <object> for <property>
         • These can be for sets of subjects and objects
     • Supports content negotiation
     • Optimized for high-volume read-write of RDF named graphs
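The two query shapes on slide 10 can be illustrated with a small in-memory stand-in for a named-graph store. This is only a sketch: the class and method names are hypothetical, and nothing here reflects the actual LDR REST API, its URLs, or its storage engine.

```python
class NamedGraphStore:
    """Toy in-memory store of RDF named graphs (a hypothetical stand-in for the LDR)."""

    def __init__(self):
        self._graphs = {}  # graph URI -> set of (subject, property, object) triples

    # CRUD on whole graphs, mirroring "CRUD operations on satellites as RDF named graphs"
    def put(self, graph_uri, triples):
        self._graphs[graph_uri] = set(triples)

    def get(self, graph_uri):
        return set(self._graphs.get(graph_uri, set()))

    def delete(self, graph_uri):
        self._graphs.pop(graph_uri, None)

    def _all_triples(self):
        for triples in self._graphs.values():
            yield from triples

    def objects_for(self, subjects, prop):
        """For a set of subjects, return all objects of the given property."""
        wanted = set(subjects)
        return {o for s, p, o in self._all_triples() if s in wanted and p == prop}

    def subjects_with(self, objects, prop):
        """Return all subjects that have one of the given objects for the property."""
        wanted = set(objects)
        return {s for s, p, o in self._all_triples() if o in wanted and p == prop}
```

Keeping queries to these two lookups, rather than a full query language, means both can be answered from simple indexes, which fits the slide's emphasis on high-volume reads and writes.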
  11. Benefits of the LDR
     • Unprecedented access to Elsevier content
     • Key enabler for providing advanced semantic search across products
     • Provides links to other data sources to provide further contextual enrichment
       • Allow others to discover and integrate with Elsevier content
       • Link content across domains
     • Data can be pulled out of large amounts of text and organized for review and action
       • Information mining for compliance and research
       • Create mashups from multiple data sources
       • Present information with enhanced visualization
  12. Mining text for semantic data
     Building the databases that support content enrichment includes extracting from unstructured text:
     • mentions of concepts
     • mentions of relations between concepts
     • other semantic information, such as document metadata and context indicators
  13. Mining text for semantic data
     • We're exploring a range of tools and techniques to do text mining, including:
       • Rule-based information extraction
       • Statistical information extraction
       • Mapping terms in text to thesauri (Ei Thesaurus, EMTREE) or other sources of lexical/semantic information
     • Working with GATE and UIMA components to design and implement language processing pipelines, and with a number of text mining vendors
     • Because Elsevier publishes in a broad range of subject areas, content types, and languages, no one approach is appropriate for all uses
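As one illustration of mapping terms in text to a thesaurus, the sketch below does a greedy longest-match lookup of token sequences against a dictionary. The function name, the tiny thesaurus, and the "ex:" concept identifiers are invented for the example; they are not real EMTREE or Ei Thesaurus entries, and the slides do not say which matching strategy Elsevier uses.

```python
import re


def map_terms(text, thesaurus, max_len=4):
    """Greedy longest-match mapping of token sequences to thesaurus concept IDs.

    `thesaurus` maps lower-cased terms to concept identifiers (hypothetical here).
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    matches = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first so multi-word terms win over their parts.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in thesaurus:
                matches.append((candidate, thesaurus[candidate]))
                i += n
                break
        else:
            i += 1  # no term starts at this token
    return matches


# Usage with a made-up thesaurus:
example_thesaurus = {
    "delirium": "ex:C1",
    "hypoxia": "ex:C2",
    "blood urea nitrogen": "ex:C3",
    "urea": "ex:C4",
}
found = map_terms(
    "Delirium may be a symptom of hypoxia or an elevated blood urea nitrogen ratio.",
    example_thesaurus,
)
```

A lookup like this handles only exact surface forms; rule-based or statistical extraction, as listed on the slide, would be needed for variants, abbreviations, and relations between concepts.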
  14. Semantic and lexical models
     Supporting our text mining efforts is an increased focus on acquiring, building, and maintaining vocabularies and semantic models, including:
     • Dictionaries/thesauri
     • Taxonomies
     • Ontologies
     We reuse Web-standard semantic and lexical resources wherever possible, but also create application-specific domain models, sometimes by hand, for narrow domains
     These semantic resources are also stored in the LDR, which links semantic data to documents, to non-text content, and to other resources, to create a web of meaningful and re-usable information
  15. Smart Content design patterns
     • Linked data browser: link-following navigation over linked graph of RDF resources
     • Mashup: integrated presentation of content and data across multiple sources
     • Semantic search: free text/faceted search over document/data sets
     • Semantic query: relational query over aggregated/federated sets of RDF statements
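The linked data browser pattern, link-following navigation over a graph of RDF resources, can be sketched as a breadth-first walk. Here `lookup` is a hypothetical callable standing in for dereferencing a URI over HTTP, and the resource names are made up; a real browser would fetch and parse RDF at each step.

```python
from collections import deque


def follow_links(start_uri, lookup, max_depth=2):
    """Breadth-first link following from start_uri, up to max_depth hops.

    `lookup(uri)` is assumed to return (property, target_uri) pairs for a resource.
    Returns the visited URIs in the order they were discovered.
    """
    seen = {start_uri}
    order = [start_uri]
    queue = deque([(start_uri, 0)])
    while queue:
        uri, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand resources at the depth limit
        for _prop, target in lookup(uri):
            if target not in seen:
                seen.add(target)
                order.append(target)
                queue.append((target, depth + 1))
    return order


# Usage with an in-memory graph in place of HTTP lookups:
toy_graph = {
    "article": [("mentions", "delirium"), ("author", "author1")],
    "delirium": [("broader", "cognitive-disorder")],
    "cognitive-disorder": [("broader", "disorder")],
}
visited = follow_links("article", lambda uri: toy_graph.get(uri, []), max_depth=2)
```

The depth limit and the `seen` set keep the walk bounded even when the linked graph contains cycles.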
  16. Example: Marine Geology
  17. Example: Marine Geology
  18. Linking data to support enriched content is an essential part of the future of STM publishing. Ellen Hays, Elsevier Labs