Semantic Search Using RDF Metadata Semantic Technology Conference 2005  8 March 2005 Bradley P. Allen Siderean Software, I...
 
Semantic Search using RDF Metadata (SemTech 2005)

  1. 1. Semantic Search Using RDF Metadata Semantic Technology Conference 2005 8 March 2005 Bradley P. Allen Siderean Software, Inc.
  2. 2. Overview <ul><li>Semantic search </li></ul><ul><ul><li>Motivation </li></ul></ul><ul><ul><li>Enterprise adoption </li></ul></ul><ul><li>Semantic search using RDF </li></ul><ul><ul><li>Examples </li></ul></ul><ul><li>Lessons </li></ul><ul><li>Directions </li></ul>
  3. 3. Problem <ul><li>“ We have to understand what information we have and organize it,’ says [Santa Clara Co. CIO] Ajmani, who estimates that saving each employee an hour a month spent looking for information would save millions of dollars.” [Information Week, 1/19/04] </li></ul><ul><li>“… typical enterprise floundering in a sea of information … too many repositories, each with its own set of applications.” [IDC, 2004] </li></ul><ul><li>“ The search capabilities on most company and content-oriented Web sites are as bad now as they were several years ago.” [eWeek, 1/26/04] </li></ul>
  4. 4. Portal-driven demand for a better solution <ul><li>“ A portal provides an integrated information source for our internal process users or external customers” </li></ul><ul><li>“ Now we have to architect the information related to business processes differently to search across multiple repositories” </li></ul><ul><li>But they lack tools and applications that support this </li></ul>
  5. 5. Current solutions <ul><li>Enterprise search, portals, knowledge management and content management systems lashed up in ad hoc architectures </li></ul><ul><ul><li>Doesn’t unify data and content </li></ul></ul><ul><ul><li>Doesn’t provide context or scope </li></ul></ul><ul><ul><li>Too many results (requires searching the answer to the original search) </li></ul></ul>
  6. 6. Why semantic search? <ul><li>Explicitly represented knowledge can </li></ul><ul><ul><li>Unify access to both content and data </li></ul></ul><ul><ul><li>Create context and frames of reference </li></ul></ul><ul><li>Intellectual contributions that inform the search process must be captured </li></ul><ul><ul><li>The answer should include the question </li></ul></ul>
  7. 7. Semantic search – some definitions <ul><li>Search: the process of retrieving objects matching a given query </li></ul><ul><li>Semantic search: </li></ul><ul><ul><li>Search that uses an explicit representation of knowledge to retrieve, organize or display objects matching a query </li></ul></ul><ul><ul><li>Search that transparently renders human insight into the nature of matches </li></ul></ul>
  8. 8. Benefits in the enterprise <ul><li>Addresses pervasive frustration with enterprise search </li></ul><ul><li>Let users </li></ul><ul><ul><li>Find high-value information quickly </li></ul></ul><ul><ul><li>Add more value to it, and </li></ul></ul><ul><ul><li>Share it with others </li></ul></ul><ul><li>Aligns information to business needs </li></ul>
  9. 9. Roots <ul><li>Parametric search </li></ul><ul><li>Query by example </li></ul><ul><li>Retrieval by reformulation </li></ul><ul><ul><li>Rabbit, Argon </li></ul></ul><ul><li>Work in existing enterprise search and knowledge management </li></ul><ul><ul><li>Autonomy, Semio </li></ul></ul>
  10. 10. Semantic search requires metadata <ul><li>Ontologies </li></ul><ul><ul><li>Specifications of how to represent classes, instances and their properties </li></ul></ul><ul><ul><li>Sometimes called “vocabularies” </li></ul></ul><ul><li>Controlled vocabularies </li></ul><ul><ul><li>Terms for saying what something is about </li></ul></ul><ul><ul><li>Also called “taxonomies” and “thesauri” </li></ul></ul><ul><li>Instances </li></ul><ul><ul><li>Descriptions of resources </li></ul></ul><ul><li>Application profiles </li></ul><ul><ul><li>Specifications of which classes and properties are useful and how they are to be used in an application </li></ul></ul>
  11. 11. Current metadata solutions are costly <ul><li>Much custom development done </li></ul><ul><ul><li>Not easy to tag or incorporate content into the desired structures </li></ul></ul><ul><ul><li>No easy way for groups creating the vocabularies to deliver them to production environments </li></ul></ul><ul><li>Perceived lack of tools </li></ul><ul><ul><li>Point solutions not well integrated </li></ul></ul><ul><ul><li>Existing platform solutions closed </li></ul></ul>
  12. 12. Metadata in today’s enterprises <ul><li>From thirty interviews conducted with Fortune 1000 organizations during Fall 2004 </li></ul><ul><ul><li>Use of metadata not yet widespread but emerging </li></ul></ul><ul><ul><li>Understanding varies widely across enterprises </li></ul></ul><ul><ul><li>Three basic approaches </li></ul></ul><ul><ul><ul><li>Top down, bottom up, and give up </li></ul></ul></ul>
  13. 13. Approach: top down <ul><li>CEO says “We must be an information-driven company” </li></ul><ul><li>“ Corporate controlled vocabulary that all divisions will use” </li></ul><ul><ul><li>Typically based on Dublin Core </li></ul></ul><ul><ul><li>Used for subject tagging </li></ul></ul><ul><li>The effort is multi-year, ROI hard to track, and may not be implemented or adopted widely </li></ul>
  14. 14. Approach: bottom up <ul><li>Groups determine their vocabulary while describing their process </li></ul><ul><ul><li>Often in a collaboration environment </li></ul></ul><ul><li>Light tagging of content when it is created or when the content is published to a portal </li></ul><ul><ul><li>Again, based on Dublin Core and their own controlled vocabularies </li></ul></ul>
  15. 15. Approach: give up <ul><li>Assumption: too difficult to create metadata from existing content </li></ul><ul><ul><li>“ We can’t ever hope to organize this morass of content, so let’s put in a search appliance like Google” </li></ul></ul><ul><ul><li>“ Our internal needs are like the public internet and users are familiar with Google searches” </li></ul></ul><ul><li>But still feel that metadata would improve matters, particularly within business units </li></ul>
  16. 16. Don’t give up! <ul><li>RDF can make metadata use easier and less costly </li></ul><ul><ul><li>An open standard for metadata reduces cost and avoids technology and vendor lock-in </li></ul></ul><ul><ul><li>A “universal solvent” for data and content </li></ul></ul><ul><ul><li>A platform for reuse and sharing </li></ul></ul>
  17. 17. Building semantic search systems with RDF <ul><li>Define/reuse ontologies expressed in RDF(S) </li></ul><ul><ul><li>Classes for defining instances and controlled vocabularies </li></ul></ul><ul><ul><li>Properties for facets and additional attributes </li></ul></ul><ul><li>Import/transform instances into an RDF representation </li></ul><ul><ul><li>Resources referred to via URIs </li></ul></ul><ul><ul><li>Content and controlled vocabularies </li></ul></ul><ul><li>Write application profiles in terms of RDF </li></ul>
  18. 18. Types of semantic search in RDF <ul><li>Searching for RDF </li></ul><ul><ul><li>Swoogle </li></ul></ul><ul><li>Adding value to search using RDF </li></ul><ul><ul><li>TAP, FOAFNaut </li></ul></ul><ul><li>Searching resources using RDF </li></ul><ul><ul><li>Edutella, Seamark </li></ul></ul>
  19. 19. Swoogle: Searching for RDF <ul><li>Crawling for SW documents </li></ul><ul><ul><li>Leverages Google indexing </li></ul></ul><ul><ul><li>And structure of key document types </li></ul></ul><ul><li>Searching for ontologies and instance data </li></ul><ul><li>Mostly relevant to people building semantic applications rather than general users </li></ul>
  20. 20. TAP: Adding value to search using RDF <ul><li>Layering “related items” on top of traditional Web search </li></ul><ul><li>Arm’s length integration and value-add for traditional Web search </li></ul>
  21. 21. FOAFNaut: Adding value to search using RDF <ul><li>Specialized search and visualization over FOAF networks </li></ul><ul><li>Introducing the notion of social aspects of finding information </li></ul>
  22. 22. Edutella: Searching resources using RDF <ul><li>P2P architecture federating collections of learning objects </li></ul><ul><li>Work on distributing RDF queries using schema information </li></ul><ul><li>RDF as a more natural representation for learning objects than IEEE LOM </li></ul>
  23. 23. Seamark: Searching resources using RDF <ul><li>Using ontologies and taxonomies to define navigation over specific collections </li></ul><ul><li>First implementation of faceted navigation using RDF </li></ul>
  24. 24. Faceted navigation as a type of semantic search <ul><li>Metadata may be faceted, i.e., includes properties whose ranges form a near-orthogonal set of controlled vocabularies </li></ul><ul><ul><li>Creator: Dickens, Charles </li></ul></ul><ul><ul><li>Subject: Arsenic, Antimony </li></ul></ul><ul><ul><li>Location: World > U.S. > California > Venice </li></ul></ul><ul><li>Facets form a frame of reference for information overview, access and discovery </li></ul><ul><ul><li>Other properties serve as landmarks and cues </li></ul></ul>
  25. 25. Case study: DC 2003 Online Proceedings <ul><li>Further the goals of the Dublin Core Metadata Initiative (DCMI) by providing DC-centric faceted navigation of online proceedings </li></ul>
  26. 26. Project timeline <ul><li>July 2003 </li></ul><ul><ul><li>Initial experiment using DC 2002 site </li></ul></ul><ul><li>August 2003 </li></ul><ul><ul><li>Initial proposal to DCMI </li></ul></ul><ul><ul><li>Iterative prototyping involving </li></ul></ul><ul><ul><ul><li>Selection and development of ontologies </li></ul></ul></ul><ul><ul><ul><li>Generation of instance metadata </li></ul></ul></ul><ul><ul><ul><li>Specification of application profile </li></ul></ul></ul><ul><ul><li>Conversion of DC2003 dataset into navigable RDF </li></ul></ul><ul><ul><ul><li>Elapsed time to implement: 1 day </li></ul></ul></ul><ul><li>September 2003 </li></ul><ul><ul><li>Design and editing of controlled vocabulary </li></ul></ul><ul><ul><li>Final iterations on site pages </li></ul></ul><ul><ul><li>Launch at conference </li></ul></ul>
  27. 27. Ontology <ul><li>Reused ontologies and metadata vocabularies </li></ul><ul><ul><li>Papers and posters: Dublin Core </li></ul></ul><ul><ul><li>Creators: Friend Of A Friend (FOAF) </li></ul></ul><ul><ul><li>Subjects: Thesaurus Interchange Format (TIF) </li></ul></ul><ul><li>Added relatively few properties and classes in a conference ontology </li></ul><ul><ul><li>Events </li></ul></ul><ul><ul><li>Tracks </li></ul></ul>
  28. 28. Ontology for conferences <ul><li><s:Class rdf:about=&quot;&dcconf;Event&quot;> </li></ul><ul><li><s:label>Presentation</s:label> </li></ul><ul><li></s:Class> </li></ul><ul><li><s:Class rdf:about=&quot;&dcconf;Paper&quot;> </li></ul><ul><li><s:label>Paper</s:label> </li></ul><ul><li><s:subClassOf rdf:resource=&quot;&dcconf;Event&quot;/> </li></ul><ul><li></s:Class> </li></ul><ul><li><s:Class rdf:about=&quot;&dcconf;Track&quot;> </li></ul><ul><li><s:label>Conference Track</s:label> </li></ul><ul><li></s:Class> </li></ul><ul><li><rdf:Property rdf:about=&quot;&dcconf;track&quot;> </li></ul><ul><li><s:label>Track</s:label> </li></ul><ul><li><s:comment>The track that the given paper is in.</s:comment> </li></ul><ul><li><s:domain rdf:resource=&quot;&dcconf;Event&quot; /> </li></ul><ul><li><s:range rdf:resource=&quot;&dcconf;Track&quot; /> </li></ul><ul><li></rdf:Property> </li></ul>
  29. 29. Controlled vocabulary <ul><li>Author-assigned keywords used as source materials </li></ul><ul><li>Combined author-assigned with editorial judgment about the CV terms and structure </li></ul>
  30. 30. Seed thesaurus
  31. 31. Wrapping author-assigned keywords <ul><li><tif:Term rdf:about=&quot;&dcconf2003;Relational_Database&quot;> </li></ul><ul><li><tif:value>Relational Database</tif:value> </li></ul><ul><li><tifs:USE rdf:resource=&quot;&dcconf2003;Relational_Databases&quot; /> </li></ul><ul><li></tif:Term> </li></ul><ul><li><tif:Term rdf:about=&quot;&dcconf2003;relationship_metadata&quot;> </li></ul><ul><li><tif:value>Relationship metadata</tif:value> </li></ul><ul><li><tifs:BT rdf:resource=&quot;&dcconf2003;Domain_Metadata&quot; /> </li></ul><ul><li></tif:Term> </li></ul><ul><li><tif:Term rdf:about=&quot;&dcconf2003;requirements&quot;> </li></ul><ul><li><tif:value>Requirements</tif:value> </li></ul><ul><li></tif:Term> </li></ul><ul><li><tif:Term rdf:about=&quot;&dcconf2003;resource_discovery&quot;> </li></ul><ul><li><tif:value>Resource discovery</tif:value> </li></ul><ul><li><tifs:BT rdf:resource=&quot;&dcconf2003;Discovery&quot; /> </li></ul><ul><li></tif:Term> </li></ul><ul><li><tif:Term rdf:about=&quot;&dcconf2003;resource-level_metadata&quot;> </li></ul><ul><li><tif:value>Resource-level metadata</tif:value> </li></ul><ul><li><tifs:BT rdf:resource=&quot;&dcconf2003;Domain_Metadata&quot; /> </li></ul><ul><li></tif:Term> </li></ul><ul><li><tif:Term rdf:about=&quot;&dcconf2003;SCORM&quot;> </li></ul><ul><li><tif:value>SCORM</tif:value> </li></ul><ul><li><tifs:USE rdf:resource=&quot;&dcconf2003;Sharable_Content_Object_Reference_Model_SCORM&quot; /> </li></ul><ul><li></tif:Term> </li></ul>
  32. 32. Adding editorial control <ul><li><tif:Term rdf:about=&quot;&dcconf2003;Domain_Metadata&quot;> </li></ul><ul><li><tif:value>Domain Metadata</tif:value> </li></ul><ul><li><tifs:BT rdf:resource=&quot;&dcconf2003;Applications&quot; /> </li></ul><ul><li></tif:Term> </li></ul><ul><li><tif:Term rdf:about=&quot;&dcconf2003;Governments&quot;> </li></ul><ul><li><tif:value>Governments</tif:value> </li></ul><ul><li><tifs:BT rdf:resource=&quot;&dcconf2003;Organizations_and_Domains&quot; /> </li></ul><ul><li></tif:Term> </li></ul><ul><li><tif:Term rdf:about=&quot;&dcconf2003;Federal_Geographic_Data_Committee_Metadata&quot;> </li></ul><ul><li><tif:value>Federal Geographic Data Committee Metadata</tif:value> </li></ul><ul><li><tifs:BT rdf:resource=&quot;&dcconf2003;Domain_Metadata&quot; /> </li></ul><ul><li><tifs:RT rdf:resource=&quot;&dcconf2003;Governments&quot; /> </li></ul><ul><li></tif:Term> </li></ul><ul><li><tif:Term rdf:about=&quot;&dcconf2003;Geospatial_Metadata&quot;> </li></ul><ul><li><tif:value>Geospatial Metadata</tif:value> </li></ul><ul><li><tifs:BT rdf:resource=&quot;&dcconf2003;Domain_Metadata&quot; /> </li></ul><ul><li><tifs:RT rdf:resource=&quot;&dcconf2003;Organizations_and_Domains&quot; /> </li></ul><ul><li></tif:Term> </li></ul><ul><li><tif:Term rdf:about=&quot;&dcconf2003;Government_Agency_Metadata&quot;> </li></ul><ul><li><tif:value>Government Agency Metadata</tif:value> </li></ul><ul><li><tifs:BT rdf:resource=&quot;&dcconf2003;Domain_Metadata&quot; /> </li></ul><ul><li><tifs:RT rdf:resource=&quot;&dcconf2003;Governments&quot; /> </li></ul><ul><li></tif:Term> </li></ul>
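A minimal sketch of how the USE and BT links on the two thesaurus slides above might be consumed at query time. It assumes the terms have already been loaded into plain Python dictionaries; the names `USE`, `BT`, `preferred` and `ancestors` are illustrative and are not part of TIF or Seamark.

```python
# Hypothetical in-memory copy of a few terms from the thesaurus slides.
# USE maps a non-preferred term to its preferred form; BT maps a term
# to its broader term.
USE = {
    "Relational_Database": "Relational_Databases",
    "SCORM": "Sharable_Content_Object_Reference_Model_SCORM",
}
BT = {
    "relationship_metadata": "Domain_Metadata",
    "Geospatial_Metadata": "Domain_Metadata",
    "Domain_Metadata": "Applications",
}


def preferred(term):
    """Follow USE links until a preferred term is reached."""
    seen = set()
    while term in USE and term not in seen:
        seen.add(term)
        term = USE[term]
    return term


def ancestors(term):
    """List broader terms from the nearest to the most general."""
    chain = []
    term = preferred(term)
    while term in BT and BT[term] not in chain:
        term = BT[term]
        chain.append(term)
    return chain


print(preferred("SCORM"))
print(ancestors("Geospatial_Metadata"))
```

Keeping author keywords as terms and redirecting them with USE, rather than rewriting them, means the original author-assigned tagging stays intact while the editorial hierarchy evolves.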
  33. 33. Instance metadata <ul><li>Paper and poster metadata automatically extracted from author submissions </li></ul><ul><ul><li>Ad hoc Perl script </li></ul></ul><ul><ul><li>Manual review and cleanup of generated RDF </li></ul></ul><ul><ul><li>Mostly Dublin Core with some application-specific properties </li></ul></ul><ul><li>Creator and organization metadata manually collated from paper and poster metadata </li></ul><ul><ul><li>Represented in FOAF (but not in the manner in which FOAF is typically used) </li></ul></ul>
  34. 34. Papers and posters <ul><li><dcconf:Paper rdf:about=&quot;http://www.siderean.com/dc2003/103_paper-22.pdf&quot;> <seamark:texturl>http://www.siderean.com/dc2003/103_paper-22.pdf</seamark:texturl> </li></ul><ul><li><rdf:type rdf:resource=&quot;&dcconf;Event&quot;/> </li></ul><ul><li><dcconf:track rdf:resource=&quot;&dcconf;Interoperability&quot; /> </li></ul><ul><li><dc:title>Two Paths to Interoperable Metadata</dc:title> </li></ul><ul><li><dc:creator rdf:resource=&quot;&dcconf;Godby_Carol&quot; /> </li></ul><ul><li><dc:creator rdf:resource=&quot;&dcconf;Smith_Devon&quot; /> </li></ul><ul><li><dc:creator rdf:resource=&quot;&dcconf;Childress_Eric&quot; /> </li></ul><ul><li><dc:description> This paper describes a prototype for a Web service that translates between pairs of metadata schemas. Despite a current trend toward encoding in XML and XSLT, we present arguments for a design that features a more distinct separation of syntax from semantics. The result is a system that automates routine processes, has a well-defined place for human input, and achieves a clean separation of the document data model, the document translations, and the machinery of the application. </dc:description> </li></ul><ul><li><dc:subject rdf:resource=&quot;&dcconf2003;metadata_schema_translation&quot; /> </li></ul><ul><li><dcconf:authorKeyword rdf:resource=&quot;&dcconf2003;metadata_schema_translation&quot; /> </li></ul><ul><li><dc:subject rdf:resource=&quot;&dcconf2003;Web_services&quot; /> </li></ul><ul><li><dcconf:authorKeyword rdf:resource=&quot;&dcconf2003;Web_services&quot; /> </li></ul><ul><li><dc:subject rdf:resource=&quot;&dcconf2003;communities_of_practice&quot; /> </li></ul><ul><li><dcconf:authorKeyword rdf:resource=&quot;&dcconf2003;communities_of_practice&quot; /> </li></ul><ul><li></dcconf:Paper> </li></ul>
  35. 35. Creators and organizations <ul><li><foaf:Person rdf:about=&quot;&dcconf;Greenberg_Jane&quot;> </li></ul><ul><li><foaf:name>Greenberg, Jane</foaf:name> </li></ul><ul><li><foaf:mbox rdf:resource=&quot;mailto:janeg@ils.unc.edu&quot; /> </li></ul><ul><li><foaf:memberOf rdf:resource=&quot;&dcconf;University_of_North_Carolina_at_Chapel_Hill&quot; /> </li></ul><ul><li><foaf:publication rdf:resource=&quot;http://www.siderean.com/dc2003/202_Paper82-color-NEW.pdf&quot; /> </li></ul><ul><li></foaf:Person> </li></ul><ul><li><foaf:Organization rdf:about=&quot;&dcconf;University_of_North_Carolina_at_Chapel_Hill&quot;> </li></ul><ul><li><foaf:name>University of North Carolina at Chapel Hill, USA</foaf:name> </li></ul><ul><li><foaf:member rdf:resource=&quot;&dcconf;Greenberg_Jane&quot; /> <foaf:member rdf:resource=&quot;&dcconf;Crystal_Abe&quot; /> </li></ul><ul><li></foaf:Organization> </li></ul>
  36. 36. Application profile <ul><li>Expressed in XRBR (XML For Retrieval By Reformulation) </li></ul><ul><ul><li>Specifies a view over (possibly heterogeneous) RDF schemas with hints as to its interpretation and use for faceted navigation </li></ul></ul><ul><ul><li>Provides a language for query reformulation and refinement in the context of navigation </li></ul></ul><ul><ul><ul><li>Query: “give me all resources where…” + advice </li></ul></ul></ul><ul><ul><ul><li>Response: result set + suggested query refinements + original query </li></ul></ul></ul>
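The query/response cycle described on the slide above (result set + suggested refinements + original query) can be sketched roughly as follows. This is not XRBR or the Seamark API: `ITEMS`, `respond` and `max_suggestions` are hypothetical names, the second and third items are invented for illustration, and the default of 7 simply mirrors the `count` value used in the profile slides.

```python
from collections import Counter

# Hypothetical items. The first echoes the paper slide; the others are invented.
ITEMS = {
    "paper-22": {"track": ["Interoperability"],
                 "subject": ["Web_services", "communities_of_practice"]},
    "paper-x": {"track": ["Interoperability"], "subject": ["Web_services"]},
    "poster-y": {"track": ["Posters"], "subject": ["resource_discovery"]},
}


def respond(query, max_suggestions=7):
    """Answer a query with results, suggested refinements and the query itself."""
    results = sorted(
        item for item, props in ITEMS.items()
        if all(value in props.get(dim, []) for dim, value in query.items())
    )
    suggestions = {}
    for dim in ("track", "subject"):
        # Count values seen in the results, skipping the value already selected.
        counts = Counter(
            value
            for item in results
            for value in ITEMS[item][dim]
            if query.get(dim) != value
        )
        suggestions[dim] = counts.most_common(max_suggestions)
    return {"query": query, "results": results, "suggestions": suggestions}


response = respond({"track": "Interoperability"})
print(response["results"])
print(response["suggestions"]["subject"])
```

Returning the original query alongside the results is what lets a client reformulate: each suggestion can be added to the echoed query to produce the next, narrower request.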
  37. 37. Application profile: specifying facets <ul><li><xrbr:query xmlns:xrbr=&quot;http://www.siderean.com/2001/10/xrbr/&quot; item-type=&quot;http://www.dcmi.org/dcconf/objects#Event&quot; sort-dimension=&quot;title&quot; > </li></ul><ul><li><xrbr:hint flattenresults=&quot;yes&quot; startpagecolumns=&quot;4&quot;/> </li></ul><ul><li><xrbr:dimensions> </li></ul><ul><li><xrbr:dimension name=&quot;title&quot; </li></ul><ul><li>predicate=&quot;http://purl.org/dc/elements/1.1/title&quot;> </li></ul><ul><li><xrbr:hint textsearch=&quot;yes&quot; label=&quot;Title&quot; function=&quot;itemlabel&quot;/> </li></ul><ul><li><xrbr:return /> </li></ul><ul><li></xrbr:dimension> </li></ul><ul><li><xrbr:dimension name=&quot;description&quot; </li></ul><ul><li>predicate=&quot;http://purl.org/dc/elements/1.1/description&quot;> </li></ul><ul><li><xrbr:hint textsearch=&quot;yes&quot; label=&quot;Description&quot; </li></ul><ul><li>function=&quot;itemdescription&quot;/> </li></ul><ul><li><xrbr:return /> </li></ul><ul><li></xrbr:dimension> </li></ul><ul><li>… </li></ul><ul><li></xrbr:dimensions> </li></ul><ul><li></xrbr:query> </li></ul>
  38. 38. Application profile: specifying hierarchical facets <ul><li>… </li></ul><ul><li><xrbr:dimension name=&quot;BT1&quot; predicate=&quot;http://purl.org/dc/elements/1.1/subject&quot; </li></ul><ul><li>display-predicate=&quot;http://www.w3c.rl.ac.uk/2003/07/31-tif#value&quot; </li></ul><ul><li>root-resource=&quot;http://www.dcmi.org/dcconf/2003#Organizations_and_Domains&quot; </li></ul><ul><li>ancestor-predicate=&quot;http://www.w3c.rl.ac.uk/2003/07/31-tif-simple#BT&quot; > </li></ul><ul><li><xrbr:hint label=&quot;Organizations and Domains&quot; </li></ul><ul><li>facet=&quot;yes&quot; </li></ul><ul><li>scopenote=&quot;Sectors, languages, special literatures or communities that use metadata&quot; /> </li></ul><ul><li><xrbr:suggestions count=&quot;7&quot; /> </li></ul><ul><li></xrbr:dimension> </li></ul><ul><li>… </li></ul>
  39. 39. Application profile: flattening graphs <ul><li>… </li></ul><ul><li><xrbr:structure name=&quot;creator&quot; predicate=&quot;http://purl.org/dc/elements/1.1/creator&quot;> </li></ul><ul><li><xrbr:dimension name=&quot;creatorname&quot; </li></ul><ul><li>predicate=&quot;http://xmlns.com/foaf/0.1/#name&quot;> </li></ul><ul><li><xrbr:hint label=&quot;Author&quot; textsearch=&quot;yes&quot;/> </li></ul><ul><li><xrbr:suggestions count=&quot;7&quot; /> </li></ul><ul><li><xrbr:return /> </li></ul><ul><li></xrbr:dimension> </li></ul><ul><li><xrbr:dimension name=&quot;creatororg&quot; </li></ul><ul><li>predicate=&quot;http://xmlns.com/foaf/0.1/#memberOf&quot; </li></ul><ul><li>display-predicate=&quot;http://xmlns.com/foaf/0.1/#name&quot;> </li></ul><ul><li><xrbr:hint label=&quot;Author Affiliation&quot; /> </li></ul><ul><li><xrbr:suggestions count=&quot;7&quot; /> </li></ul><ul><li><xrbr:return /> </li></ul><ul><li></xrbr:dimension> </li></ul><ul><li></xrbr:structure> </li></ul><ul><li>… </li></ul>
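One way to read the `creator` structure above is as a two-hop walk that pulls the author's name and affiliation up onto the paper so they can act as ordinary facets. The sketch below assumes that reading; the triples, the short identifiers `paper82` and `UNC`, and the helper names are hypothetical stand-ins modelled on the creator and organization slide.

```python
# Hypothetical triples modelled on the paper, person and organization slides.
TRIPLES = [
    ("paper82", "dc:creator", "Greenberg_Jane"),
    ("Greenberg_Jane", "foaf:name", "Greenberg, Jane"),
    ("Greenberg_Jane", "foaf:memberOf", "UNC"),
    ("UNC", "foaf:name", "University of North Carolina at Chapel Hill, USA"),
]


def objects(subject, predicate):
    """Return every object of the given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def flatten(paper):
    """Collapse creator -> name and creator -> memberOf -> name onto the paper."""
    row = {"creatorname": [], "creatororg": []}
    for person in objects(paper, "dc:creator"):
        row["creatorname"] += objects(person, "foaf:name")
        for org in objects(person, "foaf:memberOf"):
            row["creatororg"] += objects(org, "foaf:name")
    return row


print(flatten("paper82"))
```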
  40. 40. Automatically generated interface
  41. 41. Alternate view: creators
  42. 42. Alternate view: subjects
  43. 43. Site start page
  44. 44. Site drilldown
  45. 45. Case study: Environmental Health News <ul><li>Aggregating news stories from the Web </li></ul><ul><li>Semi-automated metadata creation by a team of subject matter experts and editors </li></ul><ul><li>Semantic search to design custom feeds </li></ul>
  46. 46. Case study: Gateway to Educational Materials <ul><li>Aggregating learning objects from members of the GEM Consortium </li></ul><ul><li>Embedding semantic search into a portal </li></ul>
  47. 47. Case study: NASA JPL <ul><li>Project information aggregated from content and data repositories </li></ul><ul><li>Using and extending taxonomies </li></ul><ul><li>Exploiting document type/genre </li></ul>
  48. 48. Related work in RDF <ul><li>OCLC </li></ul><ul><ul><li>Metadata Switch </li></ul></ul><ul><li>MIT </li></ul><ul><ul><li>Simile </li></ul></ul><ul><ul><ul><li>Longwell </li></ul></ul></ul><ul><ul><li>Haystack </li></ul></ul><ul><li>Aduna </li></ul><ul><ul><li>Sesame </li></ul></ul><ul><li>Ontoprise </li></ul><ul><ul><li>OntoSeek </li></ul></ul><ul><li>Nature Publishing Group </li></ul><ul><ul><li>Urchin </li></ul></ul>
  49. 49. Issues <ul><li>Scale: must be commensurate with expectations and requirements from traditional web and enterprise search </li></ul><ul><ul><li>Number of objects, feeds: 10<sup>6</sup> to 10<sup>9</sup> </li></ul></ul><ul><ul><li>Ingest rates: ~ 10<sup>3</sup> – 10<sup>4</sup> triples/sec, how many per resource? </li></ul></ul><ul><ul><li>Tagging: where and when? </li></ul></ul><ul><ul><li>Latency: < 0.5 sec user time regardless of application </li></ul></ul><ul><li>Retrieval algorithms: many alternatives still being explored </li></ul><ul><ul><li>Federated services vs. centralized servers </li></ul></ul><ul><ul><li>Relationship to relevance ranking </li></ul></ul><ul><ul><li>Support for aggregate and text search operators in RDF query </li></ul></ul><ul><li>Usability: lots of work to be done to validate benefits </li></ul><ul><ul><li>Navigation </li></ul></ul><ul><ul><li>Precision and recall </li></ul></ul><ul><ul><li>Visualization </li></ul></ul><ul><li>Security, trust and provenance: just beginning to understand </li></ul>
  50. 50. Lessons <ul><li>Balanced incremental approach </li></ul><ul><li>Leverage metadata and indices at hand </li></ul><ul><li>Exploit statistics where desirable </li></ul><ul><ul><li>But layer a framework on top to structure the statistics </li></ul></ul><ul><li>Significant mileage from very simple frameworks </li></ul>
  51. 51. Lessons: ontologies <ul><li>Don’t do: assume you have to build elaborate OWL ontologies </li></ul><ul><ul><li>Don’t have to boil the ocean to get the benefits </li></ul></ul><ul><ul><li>OWL DL and OWL Full are overkill for this class of application </li></ul></ul><ul><li>Do: Tiny Ontologies Stitched Together </li></ul><ul><ul><li>RDF Schema with a smattering of RDF/OWL properties (e.g., owl:inverseOf) </li></ul></ul><ul><ul><li>Start with DC + SKOS + FOAF </li></ul></ul>
  52. 52. Lessons: controlled vocabularies <ul><li>Don’t do: huge monolithic taxonomies </li></ul><ul><ul><li>Unless they are ready at hand and can be reused largely without modification </li></ul></ul><ul><li>Do: bite-sized controlled vocabularies that exploit faceted approaches </li></ul><ul><ul><li>4 facets x 10 terms per facet versus 10<sup>4</sup> terms in a single taxonomy </li></ul></ul><ul><ul><li>Start with flat term lists </li></ul></ul><ul><ul><li>Add BT/NT/RT relationships over time </li></ul></ul>
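The arithmetic behind the facet comparison above can be checked in a few lines. The sketch assumes each resource takes exactly one term from each facet, which is a simplification made here, not something stated in the slides; the variable names are illustrative.

```python
# Assumption: a resource is described by one term from each facet.
facets = 4
terms_per_facet = 10

# Terms an editor has to create and maintain across all facets.
terms_to_maintain = facets * terms_per_facet

# Distinct combinations those terms can express, which a single
# taxonomy would need one leaf term apiece to cover.
combinations = terms_per_facet ** facets

print(terms_to_maintain, combinations)
```

Under that assumption, 40 maintained terms cover the same 10,000 distinctions that a monolithic taxonomy would have to enumerate one by one.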
  53. 53. Lessons: instances <ul><li>Manual creation </li></ul><ul><ul><li>Don’t do: exhaustive author creation of metadata </li></ul></ul><ul><ul><li>Do: community annotation and tagging </li></ul></ul><ul><li>(Semi-)automated creation </li></ul><ul><ul><li>Don’t do: assume elaborate information extraction based on NLP, subject tagging and categorization </li></ul></ul><ul><ul><li>Do: quick and dirty NEE or better yet, stick to readily available asset and relational metadata (date, creator, document type/genre) </li></ul></ul><ul><ul><ul><li>Much of the benefit at a fraction of the effort </li></ul></ul></ul>
  54. 54. Application profiles <ul><li>Metadata is increasingly pervasive </li></ul><ul><ul><li>The way to leverage existing information infrastructure </li></ul></ul><ul><li>Exploit “on-demand” information integration feature of RDF </li></ul><ul><li>DB + XML -> XSLT -> RDF(S) </li></ul>
  55. 55. The big question: statistics vs. knowledge <ul><li>Statistics can’t deliver everything </li></ul><ul><ul><li>Alan Kay’s puppy analogy </li></ul></ul><ul><ul><li>Vitanyi work on “Google learning” </li></ul></ul><ul><li>On the other hand, knowledge is dearly won </li></ul><ul><ul><li>CYC </li></ul></ul><ul><li>Need a balance that enables adoption without losing the benefits </li></ul><ul><li>Lessons from </li></ul><ul><ul><li>Statistics vs. knowledge in NLP </li></ul></ul><ul><ul><li>Expert systems </li></ul></ul>
  56. 56. Future directions <ul><li>User tagging + RDF: the killer SW application? </li></ul><ul><ul><li>The rehabilitation of metadata in the social software community </li></ul></ul><ul><ul><li>The re-emergence of RSS/RDF </li></ul></ul><ul><ul><li>“ Folksonomy”-driven collaborative search </li></ul></ul><ul><ul><ul><li>Del.icio.us, Flickr, CiteULike </li></ul></ul></ul><ul><li>Growth of the SW compared to historical growth of the Web: it’s 1994 all over again </li></ul>
  57. 57. Summary <ul><li>Semantic search has a role in today’s enterprises </li></ul><ul><li>RDF provides a framework that can ease adoption and encourage innovation in semantic search </li></ul><ul><li>The future for enterprise and consumer use looks bright </li></ul>