Finding knowledge, data and answers on the Semantic Web

Web search engines like Google have made us all smarter by providing ready access to the world's knowledge whenever we need to look up a fact, learn about a topic or evaluate opinions. The W3C's Semantic Web effort aims to make such knowledge more accessible to computer programs by publishing it in machine understandable form.
As the volume of Semantic Web data grows, software agents will need their own search engines to help them find the relevant and trustworthy knowledge they need to perform their tasks. We will discuss the general issues underlying the indexing and retrieval of RDF-based information and describe Swoogle, a crawler-based search engine whose index contains information on over a million RDF documents.
We will illustrate its use in several Semantic Web-related research projects at UMBC, including a distributed platform for constructing end-to-end use cases that demonstrate the semantic web’s utility for integrating scientific data. We describe ELVIS (the Ecosystem Location Visualization and Information System), a suite of tools for constructing food webs for a given location, and Triple Shop, a SPARQL query interface which searches the Semantic Web for data relevant to a given query. ELVIS functionality is exposed as a collection of web services, and all input and output data is expressed in OWL, thereby enabling its integration with Triple Shop and other semantic web resources.

Published in: Technology, Education
  • Transcript of "Finding knowledge, data and answers on the Semantic Web"

    1. 1. Finding knowledge, data and answers on the Semantic Web
       Tim Finin, University of Maryland, Baltimore County
       http://ebiquity.umbc.edu/resource/html/id/202/
       Joint work with Li Ding, Anupam Joshi, Yun Peng, Cynthia Parr, Pranam Kolari, Pavan Reddivari, Sandor Dornbush, Rong Pan, Akshay Java, Joel Sachs, Scott Cost and Vishal Doshi
       http://creativecommons.org/licenses/by-nc-sa/2.0/
       This work was partially supported by DARPA contract F30602-97-1-0215, NSF grants CCR007080 and IIS9875433 and grants from IBM, Fujitsu and HP.
    2. 2. This talk
       • Motivation
       • Swoogle Semantic Web search engine
       • Use cases and applications
       • Observations
       • Conclusions
    3. 3. Google has made us smarter
    4. 4. But what about our agents?
       • Agents still have a very minimal understanding of text and images.
       (Diagram labels: tell, register)
    5. 5. But what about our agents?
       • A Google for knowledge on the Semantic Web is needed by software agents and programs
       (Diagram labels: Swoogle (repeated), tell, register)
    6. 6. This talk
       • Motivation
       • Swoogle Semantic Web search engine
       • Use cases and applications
       • Observations
       • Conclusions
    7. 7. Swoogle
       • http://swoogle.umbc.edu/
       • Running since summer 2004
       • 1.8M RDF docs, 320M triples, 10K ontologies, 15K namespaces, 1.3M classes, 175K properties, 43M instances, 600 registered users
    8. 8. Swoogle Architecture
       (Architecture diagram. Labels: Analysis, Index, Discovery, IR Indexer, Search Services, Semantic Web metadata, Web Service, Web Server, Candidate URLs, Bounded Web Crawler, Google Crawler, SwoogleBot, SWD Indexer, Ranking, document cache, SWD classifier, human, machine, html, rdf/xml, the Web, Semantic Web. Legend: Information flow, Swoogle’s web interface.)
    9. 9. A Hybrid Harvesting Framework
       (Diagram. Labels: Manual submission, RDF crawling, Bounded HTML crawling, Meta crawling, Seeds M, Seeds H, Seeds R, Swoogle Sample Dataset, Inductive learner, the Web, Google API call, crawl, true, would, google.)
    10. 10. Performance – Site Coverage
       • SW06MAR - Basic statistics (Mar 31, 2006)
          • 1.3M SWDs from 157K websites
          • 268M triples
          • 61K SWOs, including >10K of high quality
          • 1.4M SWTs using 12K namespaces
       • Significance
          • Compare with existing work (DAML crawler, scutter)
          • Compare SW06MAR with Google’s estimated SWDs
       (Chart: SWDs per website, by website)
    11. 11. Performance – crawlers’ contribution
       • High SWD ratio: 42% of URLs are confirmed as SWDs
       • Consistent growth rate: 3000 SWDs per day
       • RDF crawler: best harvesting method
       • HTML crawler: best accuracy
       • Meta crawler: best at detecting websites
       (Chart axis: # of documents)
    12. 12. This talk
       • Motivation
       • Swoogle Semantic Web search engine
       • Use cases and applications
       • Observations
       • Conclusions
    13. 13. Applications and use cases
       • Supporting Semantic Web developers
          • Ontology designers, vocabulary discovery, who’s using my ontologies or data?, use analysis, errors, statistics, etc.
       • Searching specialized collections
          • Spire: aggregating observations and data from biologists
          • InferenceWeb: searching over and enhancing proofs
          • SemNews: Text Meaning of news stories
       • Supporting SW tools
          • Triple shop: finding data for SPARQL queries
       (Callout numbers on slide: 1, 2, 3)
    14. 14. 1
    15. 15. By default, ontologies are ordered by their ‘popularity’, but they can also be ordered by recency or size. 80 ontologies were found that had these three terms. Let’s look at this one.
    16. 16. Basic Metadata
       hasDateDiscovered: 2005-01-17
       hasDatePing: 2006-03-21
       hasPingState: PingModified
       type: SemanticWebDocument
       isEmbedded: false
       hasGrammar: RDFXML
       hasParseState: ParseSuccess
       hasDateLastmodified: 2005-04-29
       hasDateCache: 2006-03-21
       hasEncoding: ISO-8859-1
       hasLength: 18K
       hasCntTriple: 311.00
       hasOntoRatio: 0.98
       hasCntSwt: 94.00
       hasCntSwtDef: 72.00
       hasCntInstance: 8.00
    17. 18. rdfs:range was used 41 times to assert a value. owl:ObjectProperty was instantiated 28 times. time:Cal… was defined once and used 24 times (e.g., as range).
    18. 19. These are the namespaces this ontology uses. Clicking on one shows all of the documents using the namespace. All of this is available in RDF form for the agents among us.
    19. 20. Here’s what the agent sees. Note the swoogle and wob (web of belief) ontologies.
    20. 21. We can also search for terms (classes, properties) like terms for “person”.
    21. 22. 10K terms associated with “person”! Ordered by use. Let’s look at foaf:Person’s metadata.
    22. 26. 87K documents used foaf:gender with a foaf:Person instance as the subject
    23. 27. 3K documents used dc:creator with a foaf:Person instance as the object
    24. 28. Swoogle’s archive saves every version of a SWD it’s seen.
    25. 30. 2
       An NSF ITR collaborative project with
       • University of Maryland, Baltimore County
       • University of Maryland, College Park
       • U. of California, Davis
       • Rocky Mountain Biological Laboratory
    26. 31. An invasive species scenario
       • Nile Tilapia fish have been found in a California lake.
       • Can this invasive species thrive in this environment?
       • If so, what will be the likely consequences for the ecology?
       • So…we need to understand the effects of introducing this fish into the food web of a typical California lake
    27. 32. Food Webs
       • A food web models the trophic (feeding) relationships between organisms in an ecology
          • Food web simulators are used to explore the consequences of changes in the ecology, such as the introduction or removal of a species
          • A location’s food web is usually constructed from studies of the frequencies of the species found there and the known trophic relations among them.
       • Goal: automatically construct a food web for a new location using existing data and knowledge
       • ELVIS: Ecosystem Location Visualization and Information System
    28. 33. East River Valley Trophic Web http://www.foodwebs.org/
    29. 34. Species List Constructor
       • Click a county, get a species list
    30. 35. The problem
       • We have data on what species are known to be in the location and can further restrict and fill in with other ecological models
       • But we don’t know which of these the Nile Tilapia eats or who might eat it.
       • We can reason from taxonomic data (similar species) and known natural history data (size, mass, habitat, etc.) to fill in the gaps.
    31. 37. Food Web Constructor
       • Predict food web links using database and taxonomic reasoning.
       In a new estuary, Nile Tilapia could compete with ostracods (green) to eat algae. Predators (red) and prey (blue) of ostracods may be affected.
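The taxonomic reasoning described on the two slides above can be illustrated with a toy example. This is only a sketch of the general idea (borrow known feeding links from the closest relatives), not the ELVIS Food Web Constructor itself. The lineages are real genus and family names, but the feeding links and the helper names `predict_prey` and `predict_predators` are made up for illustration.

```python
# Species -> lineage, listed from the most specific rank to the most general.
LINEAGE = {
    "Nile tilapia": ["Oreochromis", "Cichlidae"],
    "Blue tilapia": ["Oreochromis", "Cichlidae"],
    "Convict cichlid": ["Amatitlania", "Cichlidae"],
    "Largemouth bass": ["Micropterus", "Centrarchidae"],
}

# Hypothetical known trophic links: predator -> set of prey.
EATS = {
    "Blue tilapia": {"algae", "detritus"},
    "Convict cichlid": {"insects"},
    "Largemouth bass": {"Blue tilapia", "insects"},
}


def predict_prey(species):
    """Borrow the prey of the closest relatives that have recorded links."""
    for rank in LINEAGE[species]:
        relatives = [
            other for other, lineage in LINEAGE.items()
            if other != species and rank in lineage and other in EATS
        ]
        if relatives:
            return set().union(*(EATS[r] for r in relatives))
    return set()


def predict_predators(species):
    """Predict predators as anything known to eat the closest relatives."""
    for rank in LINEAGE[species]:
        relatives = {
            other for other, lineage in LINEAGE.items()
            if other != species and rank in lineage
        }
        predators = {p for p, prey in EATS.items() if prey & relatives}
        if predators:
            return predators
    return set()


prey = predict_prey("Nile tilapia")
predators = predict_predators("Nile tilapia")
print(sorted(prey), sorted(predators))
```

The search starts at the genus and only widens to the family when no relative at the narrower rank has recorded links, so closer relatives take priority over distant ones.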
    32. 38. Evidence Provider
       • Examine evidence for predicted links.
    33. 39. Status
       • Goal is ELVIS (Ecosystem Location Visualization and Information System) as an integrated set of web services for constructing food webs for a given location.
       • Background ontologies
          • SpireEcoConcepts: concepts and properties to represent food webs, and ELVIS-related tasks, inputs and outputs
          • ETHAN (Evolutionary Trees and Natural History): concepts and properties for ‘natural history’ information on species derived from data in the Animal Diversity Web and other taxonomic sources
       • Under development
          • Connect to visualization software
          • Connect to Triple Shop to discover more data
    34. 40. UMBC Triple Shop (3)
       • http://sparql.cs.umbc.edu/
       • Online SPARQL RDF query processing with several interesting features
       • Automatically finds SWDs for given queries using the Swoogle backend database
       • Datasets, queries and results can be saved, tagged, annotated, shared, searched for, etc.
       • RDF datasets as first-class objects
          • Can be stored on our server or downloaded
          • Can be materialized in a database or (soon) as a Jena model
    35. 41. Web-scale semantic web data access
       (Diagram. Actors: agent, data access service, the Web. Messages: ask (“person”), inform (“foaf:Person”), ask (“?x rdf:type foaf:Person”), inform (doc URLs). Agent steps: Search vocabulary, Compose query, Fetch docs, Populate RDF database, Query local RDF database. Service steps: Search URIrefs in SW vocabulary, Search URLs in SWD index, Index RDF data.)
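The exchange in the diagram above can be sketched as a small pipeline: look up vocabulary terms for a keyword, find documents that use those terms, fetch them into a local store, and query that store. This is a minimal illustration only. The in-memory indexes, the example.org URLs and all function names are assumptions, and none of it reflects Swoogle's actual service API.

```python
# Hypothetical stand-ins for the search service's two indexes.
VOCAB_INDEX = {"person": ["foaf:Person"]}  # keyword -> term URIrefs
SWD_INDEX = {  # term -> URLs of documents that use it
    "foaf:Person": ["http://example.org/a.rdf", "http://example.org/b.rdf"],
}

# Stand-in for the Web: URL -> triples contained in that document.
WEB = {
    "http://example.org/a.rdf": [("ex:alice", "rdf:type", "foaf:Person")],
    "http://example.org/b.rdf": [
        ("ex:bob", "rdf:type", "foaf:Person"),
        ("ex:rex", "rdf:type", "ex:Dog"),
    ],
}


def search_vocabulary(keyword):
    """ask("person") -> inform("foaf:Person")"""
    return VOCAB_INDEX.get(keyword, [])


def search_documents(term):
    """ask("?x rdf:type foaf:Person") -> inform(doc URLs)"""
    return SWD_INDEX.get(term, [])


def fetch(url):
    """Fetch one document and return its triples."""
    return WEB.get(url, [])


def find_instances(keyword):
    """Run the agent side of the exchange and query the local store."""
    terms = set(search_vocabulary(keyword))
    local_db = []
    for term in terms:
        for url in search_documents(term):
            local_db.extend(fetch(url))  # populate the local RDF database
    return sorted(s for s, p, o in local_db if p == "rdf:type" and o in terms)


people = find_instances("person")
print(people)
```

The point of the split is that the service only returns term names and document URLs; the agent does its own fetching and querying.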
    36. 42. Who knows Anupam Joshi? Show me their names, email addresses and pictures
    37. 43. The UMBC ebiquity site publishes lots of RDF data, including FOAF profiles
    38. 44. SPARQL query
       PREFIX foaf: <http://xmlns.com/foaf/0.1/>
       SELECT DISTINCT ?p2name ?p2mbox ?p2pix
       FROM ???
       WHERE {
         ?p1 foaf:surname "Joshi" .
         ?p1 foaf:firstName "Anupam" .
         ?p1 foaf:mbox ?p1mbox .
         ?p2 foaf:knows ?p3 .
         ?p3 foaf:mbox ?p1mbox .
         ?p2 foaf:name ?p2name .
         ?p2 foaf:mbox ?p2mbox .
         OPTIONAL { ?p2 foaf:depiction ?p2pix } .
       }
       ORDER BY ?p2name
       (Slide callout: No FROM clause!)
    39. 45. Enter query w/o FROM clause! (Screenshot callouts: log in, specify dataset)
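The query above is a join over triple patterns: the person named Anupam Joshi is identified by mailbox, and anyone who knows a node with that mailbox is returned. The following sketch shows how such patterns match, using plain Python over a tiny in-memory triple list. The sample people and mailboxes and the `match` helper are invented for illustration, the OPTIONAL depiction clause is omitted for brevity, and this is not how Triple Shop evaluates queries.

```python
# Hypothetical sample data: (subject, predicate, object) triples.
TRIPLES = [
    ("_:a", "foaf:firstName", "Anupam"),
    ("_:a", "foaf:surname", "Joshi"),
    ("_:a", "foaf:mbox", "mailto:joshi@example.org"),
    ("_:b", "foaf:name", "Alice Example"),
    ("_:b", "foaf:mbox", "mailto:alice@example.org"),
    ("_:b", "foaf:knows", "_:c"),
    ("_:c", "foaf:mbox", "mailto:joshi@example.org"),
    ("_:d", "foaf:name", "Bob Example"),
    ("_:d", "foaf:mbox", "mailto:bob@example.org"),
]


def match(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all patterns.

    Terms that start with '?' are variables; anything else must match exactly.
    """
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against its earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, candidate)


QUERY = [
    ("?p1", "foaf:surname", "Joshi"),
    ("?p1", "foaf:firstName", "Anupam"),
    ("?p1", "foaf:mbox", "?p1mbox"),
    ("?p2", "foaf:knows", "?p3"),
    ("?p3", "foaf:mbox", "?p1mbox"),
    ("?p2", "foaf:name", "?p2name"),
    ("?p2", "foaf:mbox", "?p2mbox"),
]

# DISTINCT and ORDER BY ?p2name, expressed with a set and a sort.
results = sorted({(b["?p2name"], b["?p2mbox"]) for b in match(QUERY, TRIPLES)})
print(results)
```

Because `?p1` and `?p3` are joined on the mailbox rather than on node identity, the match works even when the two documents describe the same person with different blank nodes.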
    40. 48. 302 RDF documents were found that might have useful data.
    41. 49. We’ll select them all and add them to the current dataset.
    42. 50. We’ll run the query against this dataset to see if the results are as expected.
    43. 51. The results can be produced in any of several formats
    44. 53. Looks like a useful dataset. Let’s save it and also materialize it in the TS triple store.
    45. 55. We can also annotate, save and share queries.
    46. 56. Work in Progress
       • There are a host of performance issues
       • We plan on supporting some special datasets, e.g.,
          • FOAF data collected from Swoogle
          • Definitions of RDF and OWL classes and properties from all ontologies that Swoogle has discovered
       • Expanding constraints to select candidate SWDs to include arbitrary metadata and embedded queries
          • FROM “documents trusted by a member of the SPIRE project”
       • We will explore two models for making this useful
          • As a downloadable application for client machines
          • As an (open source?) downloadable service for servers supporting a community of users.
    47. 57. This talk
       • Motivation
       • Swoogle Semantic Web search engine
       • Use cases and applications
       • Observations
       • Conclusions
    48. 58. Will Swoogle Scale? How?
       • Here’s a rough estimate of the data in RDF documents on the semantic web based on Swoogle’s crawling
         System/date   Terms      Documents   Individuals   Triples    Bytes
         Swoogle2      1.5x10^5   3.5x10^5    7x10^6        5x10^7     7x10^9
         Swoogle3      2x10^5     7x10^5      1.5x10^7      7.5x10^7   1x10^10
         2006          1x10^6     5x10^7      5x10^7        5x10^9     5x10^11
         2008          5x10^6     5x10^9      5x10^9        5x10^11    5x10^13
       We think Swoogle’s centralized approach can be made to work for the next few years if not longer.
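To make the scale of the estimate above concrete, the short calculation below takes the 2006 and 2008 figures from the slide and computes the implied growth factor per column, plus the implied average size of a triple. Only the input numbers come from the slide; the variable names and the derived ratios are illustrative arithmetic.

```python
# Figures for the 2006 and 2008 estimates, as given on the slide.
ESTIMATES = {
    "2006": {
        "terms": 1 * 10**6,
        "documents": 5 * 10**7,
        "individuals": 5 * 10**7,
        "triples": 5 * 10**9,
        "bytes": 5 * 10**11,
    },
    "2008": {
        "terms": 5 * 10**6,
        "documents": 5 * 10**9,
        "individuals": 5 * 10**9,
        "triples": 5 * 10**11,
        "bytes": 5 * 10**13,
    },
}

# Growth factor from the 2006 estimate to the 2008 estimate, per column.
growth = {
    column: ESTIMATES["2008"][column] // ESTIMATES["2006"][column]
    for column in ESTIMATES["2006"]
}

# Implied average storage per triple in the 2006 estimate.
bytes_per_triple = ESTIMATES["2006"]["bytes"] // ESTIMATES["2006"]["triples"]

print(growth)
print(bytes_per_triple)
```

In these figures the vocabulary grows by a factor of 5 while documents, individuals, triples and bytes each grow by a factor of 100, so the estimate assumes that data volume grows much faster than the number of terms.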
    49. 59. How much reasoning should Swoogle do?
       • SwoogleN (N<=3) does limited reasoning
          • It’s expensive
          • It’s not clear how much should be done
       • More reasoning would benefit many use cases
          • e.g., type hierarchy
       • Recognizing specialized metadata
          • e.g., that ontology A maps some terms from B to C
    50. 60. An RDF Dictionary
       • We hope to develop an RDF dictionary.
       • Given an RDF term, returns a graph of its definition
          • Term → definition from “official” ontology
          • Term+URL → definition from SWD at URL
          • Term+* → union definition
          • Optional argument recursively adds definitions of terms in definition, excluding RDFS and OWL terms
          • Optional argument identifies more namespaces to exclude
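The lookup modes listed above could take the following shape. The slide describes a planned feature, so this is a guess at an interface rather than anything Swoogle provides: the `define` function, its parameters, the sample corpus and the example.org URLs are all assumptions.

```python
RDFS_OWL_PREFIXES = ("rdf:", "rdfs:", "owl:")

# Hypothetical corpus: document URL -> triples.
DOCS = {
    "http://example.org/foaf.rdf": [
        ("foaf:Person", "rdfs:subClassOf", "foaf:Agent"),
        ("foaf:Agent", "rdf:type", "owl:Class"),
    ],
    "http://example.org/other.rdf": [
        ("foaf:Person", "rdfs:label", "Person"),
    ],
}

# Hypothetical registry: namespace prefix -> URL of its "official" ontology.
OFFICIAL = {"foaf:": "http://example.org/foaf.rdf"}


def define(term, source=None, recursive=False, exclude=RDFS_OWL_PREFIXES):
    """Return the set of triples that define `term`.

    source=None  -> use the "official" ontology for the term's namespace
    source=URL   -> use only the document at that URL
    source="*"   -> use the union of all known documents
    recursive    -> also define terms mentioned in the definition,
                    skipping any term whose prefix is in `exclude`
    """
    if source == "*":
        urls = list(DOCS)
    elif source is None:
        prefix = term.split(":", 1)[0] + ":"
        urls = [OFFICIAL[prefix]] if prefix in OFFICIAL else []
    else:
        urls = [source]

    graph, pending, seen = set(), [term], set()
    while pending:
        current = pending.pop()
        if current in seen:
            continue
        seen.add(current)
        for url in urls:
            for s, p, o in DOCS.get(url, []):
                if s != current:
                    continue
                graph.add((s, p, o))
                if recursive:
                    for mentioned in (p, o):
                        # Plain literals such as "Person" have no prefix.
                        if ":" in mentioned and not mentioned.startswith(tuple(exclude)):
                            pending.append(mentioned)
    return graph


official = define("foaf:Person")
union = define("foaf:Person", source="*")
closure = define("foaf:Person", recursive=True)
print(len(official), len(union), len(closure))
```

Passing extra prefixes through `exclude` corresponds to the last bullet, the option that names more namespaces to leave out of the recursive expansion.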
    51. 61. This talk
       • Motivation
       • Swoogle Semantic Web search engine
       • Use cases and applications
       • Observations
       • Conclusions
    52. 62. Conclusion
       • The web will contain the world’s knowledge in forms accessible to people and computers
          • We need better ways to discover, index, search and reason over SW knowledge
       • SW search engines address different tasks than HTML search engines
          • So they require different techniques and APIs
       • Swoogle-like systems can help create consensus ontologies and foster best practices
          • Swoogle is for Semantic Web 1.0
          • Semantic Web 2.0 will make different demands
    53. 63. For more information: http://ebiquity.umbc.edu/ (annotated in OWL)
