  • Redo as formatting went wrong:

    Nice presentation!

    A few small points to fix, if possible.

    35/71

    -- use generic HTTP URIs so that a single URI serves both for referring to Things and for looking up (accessing) their Structured Representation

    -- use RDF to produce Structured Representations of these Things.

    14/71

    -- Virtuoso Sponger (a bubble in the LOD Cloud) enables generation of an RDF-based description for any HTTP-accessible resource. The URIBurner instance at http://uriburner.com and the LOD Cloud Cache at http://lod.openlinksw.com are live examples of progressively populated Linked Data Spaces.

    -- PingTheSemanticWeb (PTSW) is another progressively populated Linked Data Space

    -- PTSW + Virtuoso Sponger add a dynamic dimension to the LOD triple count game, which basically takes LOD (deep Web) way beyond 15 billion triples. In short, we've stopped counting :-)

    57/71

    Note that all of the following showcase Entity Ranking (Data Ranking) based on link coefficients. You can even order SPARQL queries by Entity Rank.

Transcript

  • 1. Linked Open Data Universe. Harald Sack, Internet Technologies and Systems (ITS), Future Internet Technologies, Hasso-Plattner-Institute for IT Systems Engineering. 5th Annual Symposium on Future Trends in Service-Oriented Computing, June 16th, 2010, Hasso-Plattner-Institute for IT Systems Engineering, Potsdam
  • 2. The Web is huge.... To be more precise, the WWW is rather huge... • more than 25 × 10^9 documents in search engine indexes (TNL Blog: Google has 24 billion items index, considers MSN search nearest competitor, September 2005) • Google Web Crawler found more than 10^12 documents (The Official Google Blog: We knew the Web was Big....., July 25, 2008) • New Google Search Index Caffeine comprises 100 million gigabytes of data, i.e. 10^17 bytes (SMX Video: Google’s Matt Cutts On Caffeine Launch, June 9, 2010, http://searchengineland.com/smx-video-googles-matt-cutts-on-caffeine-launch-43933) • And then there is also the Deep Web (Darkweb) ... and it is supposed to be up to 500 times larger than the Surface Web (Bergman, 2001). Harald Sack, 5th Annual Symposium on Future Trends in Service-Oriented Computing, June 16th, 2010, HPI, Potsdam
  • 3. The Web is growing... Multimedia, Real-Time Data, Sensor Data, .... in 06/2010: 7 TB/day; in 05/2010: • 24 h of video upload / minute • 2 billion streamed videos per day
  • 5. How to find something on the Web?
  • 6. The ‘Web of Data’: Semantic Web Technologies • Interoperable and machine-understandable data semantics • Based on formal knowledge representations • Creating a ‘Web of Data’
  • 7. Linked Open Data Universe • Topic: Semantic Web and Linked Data • Problems and Experiments • Application: Exploratory Multimedia Search
  • 8. Semantic Web and Linked Data: From World Wide Web to Web of Data. "The Web was designed as an information space, with the goal that it should be useful not only for human-human communication, but also that machines would be able to participate and help…" (Tim Berners-Lee, Semantic Web Roadmap, Sept 1998). Prerequisites: • Content can be read and interpreted correctly (= understood) by machines. Semantic Web: • (natural language) web content is explicitly annotated with semantic metadata • semantic metadata encode the meaning (semantics) of web content and can be read and interpreted correctly by machines. Natural Language Processing: • Technology from traditional Information Retrieval (WWW Search Engines)
  • 9. Semantic Web and Linked Data: Understanding Web Content - I. Natural Language Processing • Technology from traditional Information Retrieval (WWW Search Engines)
  • 11. Semantic Web and Linked Data: Understanding Web Content - I. Natural Language Processing • Technology from traditional Information Retrieval (WWW Search Engines). Example text: "FAB" → Entity Mapping / Disambiguation: fabulous? ...?
  • 12. Semantic Web and Linked Data: Understanding Web Content - I. Natural Language Processing • Technology from traditional Information Retrieval (WWW Search Engines). Example text: "FAB" → Entity Mapping / Disambiguation: fabulous? Fabio Capello (Manager of UK National Football Team)? ...?
  • 13. Semantic Web and Linked Data: Understanding Web Content - I. Natural Language Processing • Technology from traditional Information Retrieval (WWW Search Engines). Example text: "FAB" → Entity Mapping / Disambiguation: fabulous? Fabio Capello (Manager of UK National Football Team)? David James (Goal Keeper of UK National Football Team)? ...?
  • 14. Semantic Web and Linked Data: Understanding Web Content - II. text: "FAB" → Entity Mapping → Fabio Capello
  • 15. Semantic Web and Linked Data: Understanding Web Content - II. text: "FAB" → Entity Mapping → Fabio Capello (is a) Soccer Manager
  • 16. Semantic Web and Linked Data: Understanding Web Content - II. text: "FAB" → Entity Mapping → Fabio Capello (is a) Soccer Manager (is a) Person
  • 17. Semantic Web and Linked Data: Understanding Web Content - III. Fabio Capello (entity)
  • 18. Semantic Web and Linked Data: Understanding Web Content - III. Class membership: Fabio Capello (entity) is a / has type Soccer Manager (class)
  • 19. Semantic Web and Linked Data: Understanding Web Content - III. Class membership: Fabio Capello (entity) is a / has type Soccer Manager (class). Subclass / superclass: Soccer Manager (class) is a / is subclass of Person (class)
  • 20. Semantic Web and Linked Data: Understanding Web Content - IV. Entities: Fabio Capello is a Soccer Manager. Classes: Soccer Manager is a Person
  • 21. Semantic Web and Linked Data: Understanding Web Content - IV. Entities: Fabio Capello is a Soccer Manager. Classes: Soccer Manager is a Person; Person hasBirthPlace Place
  • 22. Semantic Web and Linked Data: Understanding Web Content - IV. Entities: Fabio Capello is a Soccer Manager. Classes: Soccer Manager is a Person; Person hasBirthDate Date; Person hasBirthPlace Place
  • 23. Semantic Web and Linked Data: Understanding Web Content - IV. Entities: Fabio Capello is a Soccer Manager; Fabio Capello hasBirthDate 1946-06-18 (is a Date). Classes: Soccer Manager is a Person; Person hasBirthDate Date; Person hasBirthPlace Place
  • 24. Semantic Web and Linked Data: Understanding Web Content - IV. Entities: Fabio Capello is a Soccer Manager; Fabio Capello hasBirthDate 1946-06-18 (is a Date); Fabio Capello hasBirthPlace San Canzian d'Isonzo (is a Place). Classes: Soccer Manager is a Person; Person hasBirthDate Date; Person hasBirthPlace Place
  • 25. Semantic Web and Linked Data
  • 26. Semantic Web and Linked Data: URI - Uniform Resource Identifier. Fabio Capello: http://dbpedia.org/resource/Fabio_Capello
  • 27. Semantic Web and Linked Data: http://en.wikipediapedia.org/resource/Fabio_Capello http://dbpedia.org/resource/Fabio_Capello
  • 28. Semantic Web and Linked Data: http://dbpedia.org/resource/Fabio_Capello
  • 29. Semantic Web and Linked Data: RDF - Resource Description Framework. http://dbpedia.org/resource/Fabio_Capello
      :Fabio_Capello dbpp:birthPlace :San_Canzian_d%27Isonzo .
      :Fabio_Capello dbpp:birthDate "1946-06-18" .
      :Fabio_Capello rdf:type dbpo:SoccerManager .
      :Fabio_Capello rdf:type dbpo:Person .
      ...
    RDF Triple: :Fabio_Capello (RDF Subject) rdf:type (RDF Property) dbpo:SoccerManager (RDF Object) .
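The subject / property / object structure of the triples on this slide can be sketched in a few lines of Python. This is only an illustration: `parse_triples` is a hypothetical helper that handles the simplified whitespace-separated notation shown on the slide, not a real Turtle parser.

```python
# Toy parser for the simplified "subject predicate object ." lines on the slide.
# Assumption: one triple per line, no prefixes declared, no multi-word literals.

def parse_triples(text):
    """Return a list of (subject, predicate, object) tuples."""
    triples = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line:
            continue
        # Drop the terminating "." and split into exactly three parts.
        body = line.rstrip(".").strip()
        subject, predicate, obj = body.split(None, 2)
        triples.append((subject, predicate, obj.strip('"')))
    return triples


SLIDE_DATA = """
:Fabio_Capello dbpp:birthPlace :San_Canzian_d%27Isonzo .
:Fabio_Capello dbpp:birthDate "1946-06-18" .
:Fabio_Capello rdf:type dbpo:SoccerManager .
:Fabio_Capello rdf:type dbpo:Person .
"""

triples = parse_triples(SLIDE_DATA)
# Collect every class the subject is asserted to belong to.
types = [o for s, p, o in triples if p == "rdf:type"]
print(types)
```

Representing each fact as a plain 3-tuple keeps the example close to the slide's "RDF Subject, RDF Property, RDF Object" labelling.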
  • 30. Semantic Web and Linked Data: RDF Schema. http://dbpedia.org/ontology/soccer_manager
      dbpo:SoccerManager rdf:type owl:Class .
      dbpo:SoccerManager rdfs:subClassOf dbpo:Person .
      dbpo:SoccerManager rdfs:label "Soccer Manager" .
      dbpp:birthPlace rdf:type rdf:Property .
      dbpp:birthPlace rdfs:domain dbpo:Person .
      dbpp:birthPlace rdfs:range dbpo:Place .
      dbpp:birthDate rdf:type rdf:Property .
      dbpp:birthDate rdfs:domain dbpo:Person .
      dbpp:birthDate rdfs:range xsd:date .
      ...
    Diagram: Soccer Manager is a Person; Person hasBirthDate Date; Person hasBirthPlace Place
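What the rdfs:subClassOf statement buys can be sketched as follows: an entity typed as a soccer manager can also be treated as a person. The function names and the dictionary layout are assumptions made for this illustration, not part of any RDF library.

```python
def superclasses(cls, subclass_of):
    """All classes reachable from cls via rdfs:subClassOf (transitively)."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_types(entity, asserted_types, subclass_of):
    """Asserted types of an entity plus every superclass of those types."""
    result = set(asserted_types.get(entity, ()))
    for cls in list(result):
        result |= superclasses(cls, subclass_of)
    return result


# Schema and data taken from the slides; the dict encoding is hypothetical.
SUBCLASS_OF = {"dbpo:SoccerManager": {"dbpo:Person"}}
ASSERTED = {":Fabio_Capello": {"dbpo:SoccerManager"}}

print(inferred_types(":Fabio_Capello", ASSERTED, SUBCLASS_OF))
```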
  • 31. Semantic Web and Linked Data: Understanding Web Content - V. [Diagram: Fabio Capello hasBirthDate 1946-06-18, with "is a" edges from the entities to their classes; Person hasBirthDate Date] + Rules (Description Logics). Logical constraint: LivingPeople ∩ DeadPeople = ∅. Rule: ∀x.∃y. hasDeathDate(x,y) ∧ Person(x) ∧ Date(y) → DeadPeople(x)
  • 32. Semantic Web and Linked Data: Select all players of a soccer national team that have scored more than 10 goals while in the team
      SELECT DISTINCT ?l ?l2 ?g
      FROM <http://dbpedia.org>
      WHERE {
        ?s dbpp:nationalteam ?o .
        ?s rdfs:label ?l FILTER langMatches( lang(?l), "EN" ) .
        ?s dbpp:nationalgoals ?g FILTER(?g>10) .
        ?s dbpp:nationalteam ?nat .
        ?nat rdfs:label ?l2 FILTER langMatches( lang(?l2), "EN" ) .
      }
      ORDER BY DESC(?g)
  • 33. Semantic Web and Linked Data: Select all players of a soccer national team that have scored more than 10 goals while in the team
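To make the logic of this query concrete, here is a minimal in-memory sketch of the same pattern matching in Python. Everything in `TRIPLES` is made-up placeholder data (no real players or statistics), the helper names are hypothetical, and the language filter on labels is omitted for brevity.

```python
# Placeholder triples; names and goal counts are invented for illustration.
TRIPLES = [
    (":Player_A", "dbpp:nationalteam", ":Team_X"),
    (":Player_A", "dbpp:nationalgoals", 12),
    (":Player_A", "rdfs:label", "Player A"),
    (":Player_B", "dbpp:nationalteam", ":Team_X"),
    (":Player_B", "dbpp:nationalgoals", 3),
    (":Player_B", "rdfs:label", "Player B"),
    (":Player_C", "dbpp:nationalteam", ":Team_Y"),
    (":Player_C", "dbpp:nationalgoals", 27),
    (":Player_C", "rdfs:label", "Player C"),
    (":Team_X", "rdfs:label", "Team X"),
    (":Team_Y", "rdfs:label", "Team Y"),
]


def objects(triples, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the data."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def top_scorers(triples, min_goals=10):
    """Rows (player label, team label, goals), mimicking FILTER and ORDER BY DESC."""
    rows = set()  # a set plays the role of DISTINCT
    for player, predicate, team in triples:
        if predicate != "dbpp:nationalteam":
            continue
        for goals in objects(triples, player, "dbpp:nationalgoals"):
            if goals > min_goals:
                for label in objects(triples, player, "rdfs:label"):
                    for team_label in objects(triples, team, "rdfs:label"):
                        rows.add((label, team_label, goals))
    return sorted(rows, key=lambda row: row[2], reverse=True)


result = top_scorers(TRIPLES)
print(result)
```

Each nested loop corresponds to one triple pattern in the WHERE clause; a real SPARQL engine would evaluate these joins with indexes rather than repeated scans.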
  • 34. Semantic Web and Linked Data: Linked Data ■ Term was originally coined by Tim Berners-Lee (Tim Berners-Lee, Linked Data, 2006, http://www.w3.org/DesignIssues/LinkedData.html) ■ "The Web of data is about a data (RDF) and naming (URI) model on the Web" (M. Hausenblas, Quick Linked Data Introduction, http://www.slideshare.net/mediasemanticweb/quick-linked-data-introduction)
  • 35. Semantic Web and Linked Data: Linked Data ■ Technical Principles □ use URIs to identify things uniquely (not only documents...) □ use HTTP URIs (URLs) so that these things can be referred to and looked up ("dereferenced") by people and user agents □ use RDF as a universal data model to provide useful information about these things □ include links to other, related URIs in the exposed data to improve discovery of other related information on the Web.
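The "looked up (dereferenced)" principle can be sketched with Python's standard library: a user agent asks for an RDF representation of a thing's HTTP URI via the Accept header. The helper name `build_rdf_request` is hypothetical, and it is an assumption here that the server answers a `text/turtle` request with RDF; the sketch only builds the request and does not contact any server.

```python
import urllib.request


def build_rdf_request(uri, media_type="text/turtle"):
    """Prepare a GET request that asks for an RDF serialization of `uri`."""
    return urllib.request.Request(uri, headers={"Accept": media_type})


request = build_rdf_request("http://dbpedia.org/resource/Fabio_Capello")
print(request.get_method(), request.full_url, request.get_header("Accept"))

# To actually dereference the URI (requires network access):
#   with urllib.request.urlopen(request) as response:
#       data = response.read()
```

The same URI thus serves both as a name for the thing and as an address for retrieving a structured description of it.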
  • 36. Semantic Web and Linked Data: Linked Data □ The application of the Linked Data principles leads to the creation of a ‘Web of Data’
  • 37. Semantic Web and Linked Data: Linking Open Data ■ Publicly available structured data should be published as Linked Data ■ Various data sources should be interlinked. LOD-WikiPage: http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData/
  • 38. Semantic Web and Linked Data
  • 39. Semantic Web and Linked Data: Linked Data Achievements ■ Extension of the Web with a data commons (14b RDF triples = facts) ■ Vibrant global RTD community ■ Industrial uptake starting (BBC, Thomson Reuters, etc.) ■ Emerging governmental adoption in sight ■ Establishing Linked Data as a deployment path for the Semantic Web
  • 40. Semantic Web and Linked Data: Linked Data Challenges ■ Coherence: relatively few, expensively maintained links ■ Quality: partly low-quality data and inconsistencies ■ Performance: still substantial penalties compared to relational database technologies ■ Data consumption: large-scale processing, schema mapping and data fusion still in their infancy ■ Usability: missing direct end-user tools and network effect. (Sören Auer: "Linked Data: Now what?", ESWC2010 Panel Discussion)
  • 41. Linked Open Data Universe • Topic: Semantic Web and Linked Data • Problems and Experiments • Application: Exploratory Multimedia Search
  • 42. Problems and Experiments
  • 43. Problems and Experiments
  • 44. Problems and Experiments: A. Hogan et al.: Weaving the Pedantic Web, LDOW 2010
  • 45. Problems and Experiments: Experiment Summary (1) Crawling the Semantic Web (2) Structural Analysis (3) Content-based Analysis (4) Data Cleansing (5) Heuristics for Ranking Semantic Web Data (6) Augmenting Semantic Web Infrastructure
  • 46. Problems and Experiments: So what? ■ Interesting facts to find out about Semantic Web & Linked Data ■ How big is the Semantic Universe? ■ # triples ■ # documents ■ # interlinking ■ Linking Open Data covers only the vocabularies/data registered in the LOD wiki → 14b RDF triples ■ What else is out there ... and how much of it? ■ ...and how do we get it?
  • 47. Problems and Experiments: (1) Crawling the Semantic Web ■ Of course we are not the first to be out there... ■ Swoogle (Li Ding et al.: Finding and Ranking Knowledge on the Semantic Web, ISWC 2005) ■ Scutter/Slug (Leigh Dodds: Slug: A Semantic Web Crawler, 2006) ■ Sindice (Giovanni Tummarello et al.: Sindice.com - weaving the open linked data, ISWC 2007) → 2.1b RDF triples ■ SWSE (Andreas Harth et al.: SWSE: Objects before Documents, Semantic Web Challenge 2008, ISWC 2008) → 1.1b RDF triples ■ Falcons (G. Cheng et al.: Falcons: Searching and Browsing Entities on the Semantic Web, WWW17 2008) → 2.9b RDF triples
  • 48. Problems and Experiments: (1) Crawling the Semantic Web ■ First experiments: ■ Adapting & improving the Slug crawler ■ for parallelization (48 cores) and ■ lots of RAM (256 GB - 2 TB) ■ first test run: >1 GB RDF data/1h ■ What's new: ■ crawl not only RDF/RDFS and OWL resources ■ include (X)HTML with RDFa extensions and ■ dynamic documents with (semantic) sitemaps ■ What's next...?
  • 49. Problems and Experiments: (2) Analyzing the Semantic Web I - Structural Analysis ■ Again we are not the first to be out there... ■ Structural Analysis of the ‘early’ WWW [Figure: bow-tie structure of the Web graph: IN (44m nodes), SCC (56m nodes), OUT (44m nodes), plus appendices, tunnels and unconnected components] A. Broder et al.: Graph structure in the Web. In Comput. Netw. 33, 1-6 (Jun. 2000), 309-320.
  • 50. Problems and Experiments: (2) Analyzing the Semantic Web I - Structural Analysis ■ Again we are not the first to be there... ■ Structural Analysis of the ‘early’ Semantic Web (Weiyi Ge et al.: Object Link Structure in the Semantic Web, ESWC 2010) ■ Experimental Setup ■ 18m RDF documents (Falcons crawl 2009) ■ 110m nodes with 190m edges ■ Analysis of RDF link graph ■ average node degree: ≈3.4 ■ effective diameter: ≈11.5 ■ Largest connected component: ≈88% of all nodes
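Two of the graph statistics quoted on this slide can be sketched on a toy graph. Treating the link graph as undirected, the average degree is 2 × edges / nodes; with the slide's figures, 2 × 190m / 110m ≈ 3.45, which is consistent with the quoted ≈3.4. The function names and the toy edge list below are assumptions for illustration and not the setup used by Ge et al.

```python
from collections import defaultdict


def average_degree(edge_count, node_count):
    """Average degree of an undirected graph: every edge touches two nodes."""
    return 2 * edge_count / node_count


def largest_component_fraction(edges):
    """Share of nodes that lie in the largest connected component."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen = set()
    best = 0
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first traversal of one component, counting its nodes.
        stack = [start]
        seen.add(start)
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        best = max(best, size)
    return best / len(adjacency)


# Toy graph: a triangle (a, b, c) plus a separate pair (d, e).
EDGES = [("a", "b"), ("b", "c"), ("c", "a"), ("d", "e")]

print(average_degree(len(EDGES), 5))
print(largest_component_fraction(EDGES))
print(round(average_degree(190e6, 110e6), 2))
```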
  • 51. Problems and Experiments: (3) Analyzing the Semantic Web II - Content-Based Analysis ■ Again we are not the first to be there... (A. Hogan et al.: Weaving the Pedantic Web, LDOW 2010) ■ 150k documents with more than 12m RDF triples ■ Discovered categories of symptoms: ■ incomplete → dead links ■ incoherent → no correct interpretation (local) ■ hijack → no correct interpretation (remote) ■ inconsistent → contradictions. http://pedantic-web.org/
  • 52. Problems and Experiments: (3) Analyzing the Semantic Web II - Content-Based Analysis ■ Again we are not the first to be there... (Urbani et al.: OWL Reasoning with WebPIE: Calculating the Closure of 100 Billion Triples, ESWC 2010) ■ Artificial benchmark dataset used: Lehigh University Benchmark (LUBM) with 100b RDF triples ■ Computing the transitive closure (= reasoning) ■ Making implicit knowledge explicit. Example: Fabio Capello hasBirthPlace San Canzian d'Isonzo
  • 53. Problems and Experiments: (3) Analyzing the Semantic Web II - Content-Based Analysis ■ Again we are not the first to be there... (Urbani et al.: OWL Reasoning with WebPIE: Calculating the Closure of 100 Billion Triples, ESWC 2010) ■ Artificial benchmark dataset used: Lehigh University Benchmark (LUBM) with 100b RDF triples ■ Computing the transitive closure (= reasoning) ■ Making implicit knowledge explicit. Example: Fabio Capello hasBirthPlace San Canzian d'Isonzo; Fabio Capello is a Person
  • 54. Problems and Experiments: (3) Analyzing the Semantic Web II - Content-Based Analysis ■ Again we are not the first to be there... (Urbani et al.: OWL Reasoning with WebPIE: Calculating the Closure of 100 Billion Triples, ESWC 2010) ■ Artificial benchmark dataset used: Lehigh University Benchmark (LUBM) with 100b RDF triples ■ Computing the transitive closure (= reasoning) ■ Making implicit knowledge explicit. Example: Fabio Capello hasBirthPlace San Canzian d'Isonzo; Fabio Capello is a Person; schema: Person hasBirthPlace Place
  • 55. Problems and Experiments: (3) Analyzing the Semantic Web II - Content-Based Analysis ■ Again we are not the first to be there... (Urbani et al.: OWL Reasoning with WebPIE: Calculating the Closure of 100 Billion Triples, ESWC 2010) ■ Artificial benchmark dataset used: Lehigh University Benchmark (LUBM) with 100b RDF triples ■ Computing the transitive closure (= reasoning) ■ Making implicit knowledge explicit. Example: Fabio Capello hasBirthPlace San Canzian d'Isonzo; schema: Person hasBirthPlace Place; class membership ("is a") can be deduced
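The deduction hinted at in the example (class membership follows from the domain and range of hasBirthPlace) can be sketched as a single rule application in Python. This is a toy illustration with hypothetical names; it is not WebPIE and says nothing about how a closure over billions of triples is computed at scale.

```python
def deduce_types(triples, domain, range_):
    """Apply rdfs:domain / rdfs:range once and return the implied rdf:type triples."""
    inferred = set()
    for subject, predicate, obj in triples:
        if predicate in domain:
            # The subject of the property must belong to the property's domain class.
            inferred.add((subject, "rdf:type", domain[predicate]))
        if predicate in range_:
            # The object of the property must belong to the property's range class.
            inferred.add((obj, "rdf:type", range_[predicate]))
    return inferred


# Fact and schema from the slide; the dict encoding is an assumption.
FACTS = [(":Fabio_Capello", "hasBirthPlace", ":San_Canzian_d'Isonzo")]
DOMAIN = {"hasBirthPlace": "Person"}
RANGE = {"hasBirthPlace": "Place"}

implied = deduce_types(FACTS, DOMAIN, RANGE)
print(sorted(implied))
```

Neither type statement was asserted explicitly; both follow from the schema, which is what "making implicit knowledge explicit" refers to.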
  • 56. Problems and Experiments: (4) Analyzing the Semantic Web III - Data Cleansing ■ trying to clean out Linked Open Data and possibly also (partially) the Semantic Web... (1) Identify inconsistencies and ambiguities by (automated) content-based analysis (2) Solve inconsistencies & ambiguities ■ if possible by reasoning ■ else by crowdsourcing (game-based evaluation, etc.). Cleaning out the Augean stables... AUGEAN-STABLES: Extremely nasty and smelly warehouses of filth, straw and manure
  • 57. Problems and Experiments: (5) Analyzing the Semantic Web IV - Data Ranking ■ Linked Data provides (unbiased) knowledge ■ unbiased = no distinction of what is important, what is not important ■ e.g., Albert Einstein ■ > 600 facts (triples) ■ > 80 properties ■ no ranking ■ no relevance. http://dbpedia.org/page/Albert_Einstein
  • 58. Problems and Experiments: (5) Analyzing the Semantic Web IV - Data Ranking ■ We have developed heuristics for ranking objects and properties, e.g. [Diagram: :Albert_Einstein with rdf:type edges to :AmericanVegetarian and :Scientist]
  • 59. Problems and Experiments: (5) Analyzing the Semantic Web IV - Data Ranking ■ We have developed heuristics for ranking objects and properties, e.g. [Diagram: :Albert_Einstein with rdf:type edges to :AmericanVegetarian and :Scientist; adds :Alfred_Kleiner with an rdf:type edge]
  • 60. Problems and Experiments: (5) Analyzing the Semantic Web IV - Data Ranking ■ We have developed heuristics for ranking objects and properties, e.g. [Diagram: :Albert_Einstein with rdf:type edges to :AmericanVegetarian and :Scientist; :Alfred_Kleiner and :Bill_Cosby each with an rdf:type edge]
  • 61. Problems and Experiments: (5) Analyzing the Semantic Web IV - Data Ranking ■ We have developed heuristics for ranking objects and properties, e.g. [Diagram: :Albert_Einstein with rdf:type edges to :AmericanVegetarian and :Scientist; :Alfred_Kleiner and :Bill_Cosby each with an rdf:type edge; a :doctoralAdviser edge involving :Alfred_Kleiner, marked "considered to be relevant"]
  • 62. Problems and Experiments: (6) Semantic Web Infrastructure - Triple Stores ■ RDF(S) data is stored in triple stores ■ Basic idea: ■ Use 1 table with 3 columns (s,p,o) ■ For every column / column combination create index structures for fast access (spo, sop, pos, pso, ops, osp) ■ Drawback: many self-joins needed (memory consumption)
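The "one table, six index orderings" idea can be sketched with nested dictionaries in Python. `TinyTripleStore` is a hypothetical toy class written for this illustration; production triple stores use far more compact on-disk index structures.

```python
from collections import defaultdict
from itertools import permutations


class TinyTripleStore:
    """In-memory (s, p, o) store with one nested index per column ordering."""

    def __init__(self):
        # e.g. the "pos" index maps predicate -> object -> set of subjects.
        self.indexes = {
            "".join(order): defaultdict(lambda: defaultdict(set))
            for order in permutations("spo")
        }

    def add(self, s, p, o):
        values = {"s": s, "p": p, "o": o}
        # Every triple is written to all six indexes (this is the memory cost).
        for order, index in self.indexes.items():
            first, second, third = (values[key] for key in order)
            index[first][second].add(third)

    def lookup(self, order, first, second):
        """Values of the third column given the first two columns of `order`."""
        index = self.indexes[order]
        if first not in index:
            return set()
        return set(index[first].get(second, ()))


store = TinyTripleStore()
store.add(":Fabio_Capello", "rdf:type", "dbpo:Person")
store.add(":Fabio_Capello", "rdf:type", "dbpo:SoccerManager")

# "Who is a dbpo:Person?" is answered directly by the pos index.
print(store.lookup("pos", "rdf:type", "dbpo:Person"))
```

Choosing the index whose leading columns match the bound parts of a triple pattern avoids scanning the whole table, at the price of storing each triple six times.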
  • 63. Problems and Experiments: Experiment Summary (1) Crawling the Semantic Web (2) Structural Analysis (3) Content-based Analysis (4) Data Cleansing (5) Heuristics for Ranking Semantic Web Data (6) Augmenting Semantic Web Infrastructure
  • 64. Linked Open Data Universe • Topic: Semantic Web and Linked Data • Problem Definition and Experiments • Application: Exploratory Multimedia Search
  • 65. Application: Exploratory Multimedia Search. Yovisto semantic video search engine (http://www.yovisto.com) ■ specialized in academic video content, e.g., lecture recordings ■ enables search within the content of videos ■ automated video analysis: video scene cut detection, intelligent character recognition, complemented by collaborative user annotation ■ more than 8,000 h of video. Semantic Metadata: ■ Ontology: http://www.yovisto.com/ontology/0.9/ ■ DBpedia, FOAF, DublinCore, MPEG-7, Tagging ■ RDFa annotation ■ public SPARQL Endpoint: http://sparql.yovisto.com/ (J. Waitelonis, H. Sack: Augmenting Video Search with Linked Open Data, in Proc. of International Conference on Semantic Systems 2009 (i-semantics 2009), September 2-4, 2009, Graz, Journal of Universal Computer Science)
  • 66. Application: Exploratory Multimedia Search ■ Semantic Annotation: Metadata Extraction over time
  • 67. Application: Exploratory Multimedia Search ■ Semantic Annotation: Metadata Extraction over time; Entity Recognition / Mapping, e.g., person xy, location yz, event abc
  • 68. Application: Exploratory Multimedia Search ■ Semantic Annotation: Metadata Extraction over time; Entity Recognition / Mapping, e.g., person xy, location yz, event abc; e.g., bibliographical data, geographical data, encyclopedic data, ..
  • 69. Application: Exploratory Multimedia Search. Exploratory Search • Is a kind of investigation task, where the user is (a) not familiar with the domain of the search result, i.e. before entering appropriate keywords, she needs to learn about the domain (b) not sure how to reach the search destination (concerning search process and search technology) (c) not really sure what she's looking for, i.e. "Can you please find something out about ... ?". Example: "Which modern philosophers build on the theories of the Greek philosopher Plato?" (White, R.W., Kules, B., Drucker, S.M., and schraefel, M.C.: Supporting Exploratory Search, Introduction to Special Section of Communications of the ACM, Vol. 49, Issue 4, (2006), pp. 36-39.)
  • 70. [Screenshot labels: history, search term, related resources with properties] Waitelonis, Sack: Augmenting Video Search with Linked Open Data, in Proc. I-Semantics, Graz 2009.
  • 71. Linked Open Data Universe • Topic: Semantic Web and Linked Data • Problem Definition and Experiments • Application: Exploratory Multimedia Search. Thank you for your attention!