Linked Open Data Universe
Harald Sack
Internet Technologies and Systems (ITS)
Future Internet Technologies
Hasso-Plattner-...
The Web is huge....
2
    To be more precise, the WWW is rather huge...
    •more than 25 x 109 documents in
      Search ...
The Web is growing...
3
    Multimedia, Real-Time Data, Sensor Data, ....



         in 06/2010: 7 TB/day




     in 05/...
How to find something on the Web?
4




    Harald Sack, 5th Annual Symposium on Future Trends in Service-Oriented Computi...
The ‘Web of Data‘
5
    Semantic Web Technologies
       •   Interoperable and machine understandable
           data sema...
Linked Open Data Universe
6
    • Topic: Semantic Web and Linked Data
    • Problems and Experiments
    • Application: Ex...
Semantic Web and Linked Data
7   From World Wide Web to Web of Data
                            „The Web was designed as a...
Semantic Web and Linked Data
8   Understanding Web Content - I
    Natural Language Processing
      •   Technology from t...
Semantic Web and Linked Data
9   Understanding Web Content - II

                                                         ...
Semantic Web and Linked Data
10   Understanding Web Content - III


                          Fabio Capello               ...
Semantic Web and Linked Data
11   Understanding Web Content - IV


                                                     Fa...
Semantic Web and Linked Data
11   Understanding Web Content - IV


      1946-06-18                                     Fa...
Semantic Web and Linked Data
12




Semantic Web and Linked Data
13




                                                      URI - Uniform Resource Identifie...
Semantic Web and Linked Data
14    http://en.wikipediapedia.org/resource/Fabio_Capello




            http://dbpedia.org/...
Semantic Web and Linked Data
15     http://dbpedia.org/resource/Fabio_Capello




Semantic Web and Linked Data
16




                                                                     http://dbpedia.or...
Semantic Web and Linked Data
17




                                                                     http://dbpedia.or...
Semantic Web and Linked Data
18   Understanding Web Content - V


      1946-06-18                                     Fab...
Semantic Web and Linked Data
19                                                                         Select all players...
Semantic Web and Linked Data
20   Select all players of a soccer nationalteam that have
     scored more than 10 goals whi...
Semantic Web and Linked Data
21
     Linked Data
     ■ Term was originally coined by Tim Berners-Lee
        (Tim Berners...
Semantic Web and Linked Data
22
     Linked Data
     ■ Technical Principles
         □ use URIs to identify things unique...
Semantic Web and Linked Data
23
     Linked Data
     □ The application of the Linked Data principles leads to the creatio...
Semantic Web and Linked Data
24
      Linking Open Data
      ■ Publicly available structured data should be published as Li...
Semantic Web and Linked Data
25




Semantic Web and Linked Data
26
     Linked Data Achievements
       ■ Extension of the Web with a
         data commons
  ...
Semantic Web and Linked Data
27
     Linked Data Challenges
       ■ Coherence
         relatively few, expensively mainta...
Linked Open Data Universe
28
     • Topic: Semantic Web and Linked Data
     • Problems and Experiments
     • Application...
Problems and Experiments
29




Problems and Experiments
30




Problems and Experiments
31




                                                                                          ...
Problems and Experiments
32
     Experiment Summary
            (1) Crawling the Semantic Web
            (2) Structural A...
Problems and Experiments
33
     So what?
     ■ Interesting Facts to find out about
       Semantic Web & Linked Data


  ...
Problems and Experiments
34
     (1) Crawling the Semantic Web
     ■ Of course we are not the first to be out there...
   ...
Problems and Experiments
35
     (1) Crawling the Semantic Web
       ■ First experiments:
            ■ Adapting & Improv...
Problems and Experiments
36
     (2) Analyzing the Semantic Web I - Structural Analysis
     ■ Again we are not the first t...
Problems and Experiments
37
     (2) Analyzing the Semantic Web I - Structural Analysis
     ■ Again we are not the first t...
Problems and Experiments
38
     (3) Analyzing the Semantic Web II - Content-Based Analysis
     ■ Again we are not the fir...
Problems and Experiments
39
     (3) Analyzing the Semantic Web II - Content-Based Analysis
     ■ Again we are not the fir...
Problems and Experiments
40
     (4) Analyzing the Semantic Web III - Data Cleansing
     ■ trying to clean out Linked Ope...
Problems and Experiments
41
     (5) Analyzing the Semantic Web IV - Data Ranking
            ■ Linked Data provides (unbi...
Problems and Experiments
42
     (5) Analyzing the Semantic Web IV - Data Ranking
            ■ We have developed heuristi...
Problems and Experiments
43
     (6) Semantic Web Infrastructure - Triple Stores
       ■ RDF(S) Data is stored in Triple ...
Problems and Experiments
44
     Experiment Summary
            (1) Crawling the Semantic Web
            (2) Structural A...
Linked Open Data Universe
45
     • Topic: Semantic Web and Linked Data
     • Problem Definition and Experiments
     • App...
Application: Exploratory Multimedia Search
46    Yovisto semantic video search engine
      ■ specialized on academic vide...
Application: Exploratory Multimedia Search
47
       ■ Semantic              Annotation

 Metadata Extraction             ...
Application: Exploratory Multimedia Search
48
      Exploratory Search
        • Is a kind of investigation task, where th...
history



                                   search
                                   term




                         ...
Linked Open Data Universe
50
     • Topic: Semantic Web and Linked Data
     • Problem Definition and Experiments
     • App...
Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab

  • Redo as formatting went wrong:

    Nice presentation!

    Little points to fix, if possible.

    35/71

    -- use generic HTTP URIs so that you can use a single URI for Referring-to Things and Looking up (Accessing) their Structured Representation

    -- use RDF to produce Structured Representations of these Things.

    14/71

    -- Virtuoso Sponger (a bubble in the LOD Cloud) enables generation of an RDF based description for any HTTP accessible resource. The URIBurner instance at http://uriburner.com and the LOD Cloud Cache at http://lod.openlinksw.com are live examples of progressively populated Linked Data Spaces.

    -- PingTheSemanticWeb (PTSW) is another progressively populated Linked Data Space

    -- PTSW + Virtuoso Sponger add a dynamic dimension to the LOD triple count game which basically takes LOD (deep Web) way beyond 15 Billion Triples. In short, we've stopped counting :-)

    57/71

    Note all of the following showcase Entity Ranking (Data Ranking) based on Link coefficients. You can even order SPARQL queries by Entity Rank.

Linked Data Universe - Large Scale Computing Tasks for the HPI FutureSOC-Lab

  1. Linked Open Data Universe
     Harald Sack, Internet Technologies and Systems (ITS), Future Internet Technologies, Hasso-Plattner-Institute for IT Systems Engineering
     5th Annual Symposium on Future Trends in Service-Oriented Computing, June 16th, 2010, Hasso-Plattner-Institute for IT Systems Engineering, Potsdam
  2. The Web is huge....
     To be more precise, the WWW is rather huge...
     • more than 25 × 10^9 documents in search engine indexes (TNL Blog: Google has 24 billion items index, considers MSN search nearest competitor, September 2005)
     • Google Web Crawler found more than 10^12 documents (The Official Google Blog: We knew the Web was Big....., July 25, 2008)
     • New Google Search Index Caffeine comprises 100 million gigabytes of data, i.e. 10^17 bytes (SMX Video: Google's Matt Cutts On Caffeine Launch, June 9, 2010, http://searchengineland.com/smx-video-googles-matt-cutts-on-caffeine-launch-43933)
     • And then, there is also the Deep Web (Darkweb) ...and it is supposed to be up to 500 times larger than the Surface Web (Bergman, 2001)
     Harald Sack, 5th Annual Symposium on Future Trends in Service-Oriented Computing, June 16th, 2010, HPI, Potsdam
  3. The Web is growing...
     Multimedia, Real-Time Data, Sensor Data, ....
     in 06/2010: 7 TB/day
     in 05/2010:
     • 24 h of video upload / minute
     • 2 billion streamed videos per day
  5. How to find something on the Web?
  6. The 'Web of Data'
     Semantic Web Technologies
     • Interoperable and machine understandable data semantics
     • Based on formal knowledge representations
     • Creating a 'Web of Data'
  7. Linked Open Data Universe
     • Topic: Semantic Web and Linked Data
     • Problems and Experiments
     • Application: Exploratory Multimedia Search
  8. Semantic Web and Linked Data: From World Wide Web to Web of Data
     "The Web was designed as an information space, with the goal that it should be useful not only for human-human communication, but also that machines would be able to participate and help…" (Tim Berners-Lee, Semantic Web Roadmap, Sept 1998)
     Prerequisites:
     • Content can be read and interpreted correctly (= understood) by machines
     Natural Language Processing
     • Technology from traditional Information Retrieval (WWW Search Engines)
     Semantic Web
     • (natural language) web content is explicitly annotated with semantic metadata
     • semantic metadata encode the meaning (semantics) of web content and can be read and interpreted correctly by machines
  13. Semantic Web and Linked Data: Understanding Web Content - I
      Natural Language Processing
      • Technology from traditional Information Retrieval (WWW Search Engines)
      text: "FAB", Entity Mapping? Disambiguation
      • fabulous? ? ...
      • Fabio Capello: Manager of UK National Football Team
      • David James: Goal Keeper of UK National Football Team
  16. Semantic Web and Linked Data: Understanding Web Content - II
      text: "FAB", Entity Mapping: Fabio Capello
      • Fabio Capello is a Soccer Manager
      • Soccer Manager is a Person
  19. Semantic Web and Linked Data: Understanding Web Content - III
      • Fabio Capello (entity) is a / has type Soccer Manager (class): class membership
      • Soccer Manager (class) is a / is subclass of Person (class): subclass and superclass
  24. Semantic Web and Linked Data: Understanding Web Content - IV
      Entities:
      • Fabio Capello hasBirthDate 1946-06-18
      • Fabio Capello hasBirthPlace San Canzian d'Isonzo
      • Fabio Capello is a Soccer Manager; 1946-06-18 is a Date; San Canzian d'Isonzo is a Place
      Classes:
      • Soccer Manager is a Person
      • Person hasBirthDate Date
      • Person hasBirthPlace Place
  25. Semantic Web and Linked Data
  26. Semantic Web and Linked Data: URI - Uniform Resource Identifier
      Fabio Capello: http://dbpedia.org/resource/Fabio_Capello
  27. Semantic Web and Linked Data
      http://en.wikipediapedia.org/resource/Fabio_Capello
      http://dbpedia.org/resource/Fabio_Capello
  28. Semantic Web and Linked Data
      http://dbpedia.org/resource/Fabio_Capello
  29. Semantic Web and Linked Data
      http://dbpedia.org/resource/Fabio_Capello
      RDF (Resource Description Framework):
          :Fabio_Capello dbpp:birthPlace :San_Canzian_d%27Isonzo .
          :Fabio_Capello dbpp:birthDate "1946-06-18" .
          :Fabio_Capello rdf:type dbpo:SoccerManager .
          :Fabio_Capello rdf:type dbpo:Person .
          ...
      RDF Triple: :Fabio_Capello (RDF Subject) rdf:type (RDF Property) dbpo:SoccerManager (RDF Object) .
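As a rough illustration of the subject/property/object structure on this slide, the triples can be held as plain tuples and queried by pattern. This is only a sketch: `match` is a hypothetical helper, not a function of any RDF library, and `None` is assumed to act as a wildcard.

```python
# Minimal sketch: RDF triples as (subject, property, object) tuples.
# `match` is a hypothetical helper, not an API of any RDF library.
TRIPLES = [
    (":Fabio_Capello", "dbpp:birthPlace", ":San_Canzian_d%27Isonzo"),
    (":Fabio_Capello", "dbpp:birthDate", "1946-06-18"),
    (":Fabio_Capello", "rdf:type", "dbpo:SoccerManager"),
    (":Fabio_Capello", "rdf:type", "dbpo:Person"),
]


def match(triples, s=None, p=None, o=None):
    """Return all triples matching the pattern; None matches anything."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# All classes the entity is declared to belong to.
types = [o for _, _, o in match(TRIPLES, s=":Fabio_Capello", p="rdf:type")]
print(types)
```

A real triple store would index the three positions rather than scan a list, but the query-by-pattern idea is the same.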
  30. Semantic Web and Linked Data
      http://dbpedia.org/ontology/soccer_manager
      RDF Schema:
          dbpo:SoccerManager rdf:type owl:Class .
          dbpo:SoccerManager rdfs:subClassOf dbpo:Person .
          dbpo:SoccerManager rdfs:label "Soccer Manager" .
          dbpp:birthPlace rdf:type rdf:Property .
          dbpp:birthPlace rdfs:domain dbpo:Person .
          dbpp:birthPlace rdfs:range dbpo:Place .
          dbpp:birthDate rdf:type rdf:Property .
          dbpp:birthDate rdfs:domain dbpo:Person .
          dbpp:birthDate rdfs:range xsd:date .
          ...
      Diagram: Soccer Manager is a Person; Person hasBirthDate Date; Person hasBirthPlace Place
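The `rdfs:subClassOf` statement in this schema is what lets a machine conclude that a soccer manager is also a person. A minimal sketch of that propagation follows; `all_types` and the dictionary layout are assumptions made for illustration, not part of any RDFS toolkit.

```python
# Sketch of type propagation along rdfs:subClassOf.
# The mapping mirrors the single subclass statement in the schema slide.
SUBCLASS_OF = {"dbpo:SoccerManager": "dbpo:Person"}


def all_types(direct_type, subclass_of):
    """Return the direct type plus every superclass reachable from it."""
    types = [direct_type]
    while types[-1] in subclass_of:
        parent = subclass_of[types[-1]]
        if parent in types:  # guard against cyclic schemas
            break
        types.append(parent)
    return types


inferred = all_types("dbpo:SoccerManager", SUBCLASS_OF)
print(inferred)
```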
  31. Semantic Web and Linked Data: Understanding Web Content - V
      • Fabio Capello hasBirthDate 1946-06-18 (entities); Person hasBirthDate Date (classes)
      • logical constraint: LivingPeople ∩ DeadPeople = ∅
      • + Rules (Description Logics): ∀x.∃y.hasDeathDate(x,y) ∧ Person(x) ∧ Date(y) → DeadPeople(x)
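The rule and the disjointness constraint on this slide can be tried out on a toy data set. The individuals and the death date below are invented purely for illustration, and `Date(y)` is assumed to hold for every stored value.

```python
# Sketch: apply the slide's rule and check the disjointness constraint.
# The individuals and the death date are made-up illustration data.
persons = {"alice", "bob"}
has_death_date = {"bob": "1990-01-01"}  # hypothetical fact
living_people = {"alice"}

# Rule: hasDeathDate(x, y) AND Person(x) AND Date(y) -> DeadPeople(x)
# Date(y) is assumed true for every value stored in has_death_date.
dead_people = {x for x in persons if x in has_death_date}


def is_consistent(living, dead):
    """The constraint LivingPeople ∩ DeadPeople = ∅ must hold."""
    return not (living & dead)


print(dead_people, is_consistent(living_people, dead_people))
```

Declaring `bob` as living as well would make `is_consistent` return `False`, which is the kind of contradiction a reasoner is expected to flag.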
  32. Semantic Web and Linked Data
      Select all players of a soccer national team that have scored more than 10 goals while in the team
          PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
          PREFIX dbpp: <http://dbpedia.org/property/>
          SELECT DISTINCT ?l ?l2 ?g
          FROM <http://dbpedia.org>
          WHERE {
            ?s dbpp:nationalteam ?o .
            ?s rdfs:label ?l FILTER langMatches( lang(?l), "EN" ) .
            ?s dbpp:nationalgoals ?g FILTER(?g > 10) .
            ?s dbpp:nationalteam ?nat .
            ?nat rdfs:label ?l2 FILTER langMatches( lang(?l2), "EN" ) .
          }
          ORDER BY DESC(?g)
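To make the shape of the query's result concrete, here is a sketch of the same selection (filter on goals, order descending) over a few in-memory records. The player names, teams and goal counts are placeholders, not DBpedia data, and `top_scorers` is a hypothetical helper.

```python
# Sketch of what the query computes, over made-up in-memory records.
# Names and goal counts are illustrative, not DBpedia data.
players = [
    {"label": "Player A", "nationalteam": "Team X", "nationalgoals": 23},
    {"label": "Player B", "nationalteam": "Team Y", "nationalgoals": 4},
    {"label": "Player C", "nationalteam": "Team X", "nationalgoals": 11},
]


def top_scorers(rows, min_goals=10):
    """Keep rows with more than min_goals goals, ordered by goals descending."""
    hits = [r for r in rows if r["nationalgoals"] > min_goals]
    hits.sort(key=lambda r: r["nationalgoals"], reverse=True)
    return [(r["label"], r["nationalteam"], r["nationalgoals"]) for r in hits]


result = top_scorers(players)
print(result)
```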
  33. Semantic Web and Linked Data
      Select all players of a soccer national team that have scored more than 10 goals while in the team
  34. Semantic Web and Linked Data: Linked Data
      ■ Term was originally coined by Tim Berners-Lee (Tim Berners-Lee, Linked Data, 2006, http://www.w3.org/DesignIssues/LinkedData.html)
      The Web of data is about a data (RDF) and naming (URI) model on the Web (M. Hausenblas, Quick Linked Data Introduction, http://www.slideshare.net/mediasemanticweb/quick-linked-data-introduction)
  35. Semantic Web and Linked Data: Linked Data
      ■ Technical Principles
        □ use URIs to identify things uniquely (not only documents...)
        □ use HTTP URIs (URLs) so that these things can be referred to and looked up ("dereferenced") by people and user agents
        □ use RDF as a universal data model to provide useful information about these things
        □ include links to other, related URIs in the exposed data to improve discovery of other related information on the Web.
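The second and third principles (look up an HTTP URI, get RDF back) usually rely on HTTP content negotiation. The sketch below only builds the request and does not send it; `rdf_request` is a hypothetical helper, and the `text/turtle` media type is an assumption about what a given server can return.

```python
# Sketch: build (but do not send) an HTTP request that asks for an RDF
# representation of a thing via content negotiation.
# The Accept value is an assumption about what the server can serve.
from urllib.request import Request


def rdf_request(uri, media_type="text/turtle"):
    """Prepare a GET request for the URI, preferring an RDF serialization."""
    return Request(uri, headers={"Accept": media_type})


req = rdf_request("http://dbpedia.org/resource/Fabio_Capello")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

Passing `req` to `urllib.request.urlopen` would perform the actual lookup; that step is left out here so the example runs without network access.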
  36. Semantic Web and Linked Data: Linked Data
      □ The application of the Linked Data principles leads to the creation of a 'Web of Data'
  37. Semantic Web and Linked Data: Linking Open Data
      ■ Publicly available structured data should be published as Linked Data
      ■ Various data sources should be interlinked
      LOD-WikiPage: http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData/
  38. Semantic Web and Linked Data
  39. Semantic Web and Linked Data: Linked Data Achievements
      ■ Extension of the Web with a data commons (14b RDF triples = facts)
      ■ Vibrant global RTD community
      ■ Industrial uptake starting (BBC, Thomson, Reuters, etc.)
      ■ Emerging governmental adoption in sight
      ■ Establishing Linked Data as a deployment path for the Semantic Web
  40. Semantic Web and Linked Data: Linked Data Challenges
      ■ Coherence: relatively few, expensively maintained links
      ■ Quality: partly low quality data and inconsistencies
      ■ Performance: still substantial penalties compared to relational database technologies
      ■ Data consumption: large scale processing, schema mapping and data fusion still in its infancy
      ■ Usability: missing direct end user tools and network effect
      (Sören Auer: "Linked Data: Now what?", ESWC2010 Panel Discussion)
  41. Linked Open Data Universe
      • Topic: Semantic Web and Linked Data
      • Problems and Experiments
      • Application: Exploratory Multimedia Search
  42. Problems and Experiments
  43. Problems and Experiments
  44. Problems and Experiments
      A. Hogan et al: Weaving the Pedantic Web, LDOW 2010
  45. Problems and Experiments: Experiment Summary
      (1) Crawling the Semantic Web
      (2) Structural Analysis
      (3) Content-based Analysis
      (4) Data Cleansing
      (5) Heuristics for Ranking Semantic Web Data
      (6) Augmenting Semantic Web Infrastructure
  46. Problems and Experiments: So what?
      ■ Interesting Facts to find out about Semantic Web & Linked Data
      ■ How big is the Semantic Universe?
        ■ # triples
        ■ # documents
        ■ # interlinking
      ■ Linking Open Data is only registered vocabulary/data in the LOD-Wiki → 14b RDF triples
      ■ What else is out there ... and how much of it?
      ■ ...and how do we get it?
  47. Problems and Experiments: (1) Crawling the Semantic Web
      ■ Of course we are not the first to be out there...
        ■ Swoogle: Li Ding et al: Finding and Ranking Knowledge on the Semantic Web, ISWC 2005.
        ■ Scutter/Slug: Leigh Dodds: Slug: A Semantic Web Crawler, 2006
        ■ Sindice: Giovanni Tummarello et al: Sindice.com - weaving the open linked data, ISWC 2007 → 2.1b RDF triples
        ■ SWSE: Andreas Harth et al: SWSE: Objects before Documents, Semantic Web Challenge 2008, ISWC 2008 → 1.1b RDF triples
        ■ Falcons: G. Cheng et al.: Falcons: Searching and Browsing Entities on the Semantic Web, WWW17 2008. → 2.9b RDF triples
  48. Problems and Experiments: (1) Crawling the Semantic Web
      ■ First experiments:
        ■ Adapting & Improving Slug Crawler
          ■ for parallelization (48 Cores) and
          ■ lots of RAM (256GB - 2TB)
        ■ first test run: >1GB RDF data/1h
      ■ What's new:
        ■ crawl not only RDF/RDFS and OWL resources
        ■ include (X)HTML with RDFa extensions and
        ■ dynamic documents with (semantic) sitemaps
      ■ What's next...?
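To give an idea of what parallelizing a crawler involves, here is a small sketch of a breadth-first crawl that fetches each frontier in parallel. It is not the Slug implementation: `LINKS` and `fetch_links` are hypothetical stand-ins for real HTTP fetching and RDF link extraction, and the worker count is arbitrary.

```python
# Sketch of a parallel breadth-first crawl over a toy in-memory "web".
# LINKS and fetch_links are hypothetical stand-ins for real HTTP fetching
# and RDF link extraction; this is not the Slug implementation.
from concurrent.futures import ThreadPoolExecutor

LINKS = {
    "doc:a": ["doc:b", "doc:c"],
    "doc:b": ["doc:c", "doc:d"],
    "doc:c": [],
    "doc:d": ["doc:a"],
}


def fetch_links(uri):
    """Pretend to download a document and return the URIs it links to."""
    return LINKS.get(uri, [])


def crawl(seed, workers=4):
    """Visit everything reachable from seed, one parallel frontier at a time."""
    seen, frontier = {seed}, [seed]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while frontier:
            next_frontier = []
            # Fetch all documents of the current frontier concurrently.
            for links in pool.map(fetch_links, frontier):
                for uri in links:
                    if uri not in seen:
                        seen.add(uri)
                        next_frontier.append(uri)
            frontier = next_frontier
    return seen


visited = crawl("doc:a")
print(sorted(visited))
```

The `seen` set is only touched by the coordinating thread, so no locking is needed in this sketch; a crawler spread over many cores or machines would need a shared, synchronized frontier instead.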
  49. Problems and Experiments: (2) Analyzing the Semantic Web I - Structural Analysis
      ■ Again we are not the first to be out there...
      ■ Structural Analysis of the 'early' WWW
        (diagram: IN 44m nodes, SCC 56m nodes, OUT 44m nodes, plus tunnels, appendices and unconnected components)
      A. Broder et al.: Graph structure in the Web. In Comput. Netw. 33, 1-6 (Jun. 2000), 309-320.
  50. Problems and Experiments: (2) Analyzing the Semantic Web I - Structural Analysis
      ■ Again we are not the first to be there...
      ■ Structural Analysis of the 'early' Semantic Web (Weiyi Ge et al.: Object Link Structure in the Semantic Web, ESWC 2010)
      ■ Experimental Setup
        ■ 18m RDF documents (Falcons crawl 2009)
        ■ 110m nodes with 190m edges
      ■ Analysis of RDF link graph
        ■ average node degree: ≈3.4
        ■ effective diameter: ≈11.5
        ■ Largest connected component: ≈88% of all nodes
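Two of these measures are easy to reproduce on small data. The sketch below assumes the graph is treated as undirected, so the average degree is 2 × edges / nodes, which gives roughly the slide's ≈3.4 for 110m nodes and 190m edges. The toy graph used for the component share is made up.

```python
# Sketch: average node degree and largest connected component on a toy graph.
from collections import defaultdict


def average_degree(num_nodes, num_edges):
    """Each undirected edge contributes to the degree of two nodes."""
    return 2 * num_edges / num_nodes


def largest_component_share(edges, nodes):
    """Fraction of nodes inside the largest (weakly) connected component."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, best = set(), 0
    for start in nodes:
        if start in seen:
            continue
        # Depth-first traversal of the component containing `start`.
        stack, size = [start], 0
        seen.add(start)
        while stack:
            node = stack.pop()
            size += 1
            for nxt in adj[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        best = max(best, size)
    return best / len(nodes)


# Figures from the slide: 110m nodes, 190m edges.
print(round(average_degree(110e6, 190e6), 2))
# Toy graph (made up): one component of 4 nodes plus an isolated node.
toy_nodes = ["a", "b", "c", "d", "e"]
toy_edges = [("a", "b"), ("b", "c"), ("c", "d")]
print(largest_component_share(toy_edges, toy_nodes))
```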
  51. Problems and Experiments: (3) Analyzing the Semantic Web II - Content-Based Analysis
      ■ Again we are not the first to be there... (A. Hogan et al: Weaving the Pedantic Web, LDOW 2010)
      ■ 150k documents with more than 12m RDF triples
      ■ Discovered categories of symptoms:
        ■ incomplete → dead links
        ■ incoherent → no correct interpretation (local)
        ■ hijack → no correct interpretation (remote)
        ■ inconsistent → contradictions
      http://pedantic-web.org/
  55. 55. Problems and Experiments 39 (3) Analyzing the Semantic Web II - Content-Based Analysis ■ Again we are not the first to be there... Urbani et al.: OWL Reasoning with WebPIE: Calculating the Closure of 100 Billion Triples, ESWC 2010 ■ Artificial benchmark dataset used: Lehigh University Benchmark (LUBM) with 100b RDF triples ■ Computing the transitive closure (= reasoning) ■ Making implicit knowledge explicit ■ Example (diagram): Fabio Capello hasBirthPlace San Canzian d'Isonzo ■ hasBirthPlace relates a Person to a Place ■ class membership can be deduced: Fabio Capello is a Person
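The deduction on this slide can be illustrated with a small forward-chaining sketch. It assumes that the diagram's "Person hasBirthPlace Place" edge is meant as rdfs:domain and rdfs:range declarations, which the slide does not state explicitly. The identifiers (`:Fabio_Capello`, `:hasBirthPlace`, and so on) are illustrative names, and this toy loop is not WebPIE, which distributes rule application over a cluster.

```python
# Toy forward-chaining sketch of RDFS domain/range inference.
# Assumption: the slide's Person/Place edge stands for rdfs:domain / rdfs:range.
RDF_TYPE = "rdf:type"
RDFS_DOMAIN = "rdfs:domain"
RDFS_RANGE = "rdfs:range"


def rdfs_closure(triples):
    """Return the input triples plus every rdf:type triple that follows
    from rdfs:domain and rdfs:range declarations."""
    closure = set(triples)
    changed = True
    while changed:  # repeat until no new triple is derived (fixpoint)
        changed = False
        domains = {(s, o) for s, p, o in closure if p == RDFS_DOMAIN}
        ranges = {(s, o) for s, p, o in closure if p == RDFS_RANGE}
        derived = set()
        for s, p, o in closure:
            for prop, cls in domains:
                if p == prop:
                    derived.add((s, RDF_TYPE, cls))  # subject is in the domain class
            for prop, cls in ranges:
                if p == prop:
                    derived.add((o, RDF_TYPE, cls))  # object is in the range class
        if not derived <= closure:
            closure |= derived
            changed = True
    return closure


facts = {
    (":Fabio_Capello", ":hasBirthPlace", ":San_Canzian_d_Isonzo"),
    (":hasBirthPlace", RDFS_DOMAIN, ":Person"),
    (":hasBirthPlace", RDFS_RANGE, ":Place"),
}
# Implicit knowledge made explicit: the triples that were not stated.
inferred = rdfs_closure(facts) - facts
```

Here `inferred` holds the two class memberships that were never asserted: Fabio Capello as a `:Person` and San Canzian d'Isonzo as a `:Place`.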
  56. 56. Problems and Experiments 40 (4) Analyzing the Semantic Web III - Data Cleansing ■ Trying to clean up Linked Open Data and possibly also (partially) the Semantic Web... (1) Identify inconsistencies and ambiguities by (automated) content-based analysis (2) Resolve inconsistencies and ambiguities ■ if possible by reasoning ■ else by crowdsourcing (game-based evaluation, etc.) Cleaning out the Augean stables... AUGEAN STABLES: extremely nasty and smelly warehouses of filth, straw and manure
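The two-step cleansing idea (detect, then resolve by reasoning or hand over to people) could look roughly like the sketch below. Everything concrete in it is an assumption made for illustration: the choice of `:birthPlace` as a property that allows only one value, the `same_as` mapping standing in for owl:sameAs links, and the example resources. The slides do not describe the actual detection or resolution rules.

```python
from collections import defaultdict

# Assumption: properties that may hold at most one value per subject.
FUNCTIONAL = {":birthPlace"}


def cleanse(triples, same_as):
    """Find subjects with several values for a functional property.
    Conflicts whose values collapse to one resource via sameAs links are
    resolved automatically; the rest are returned for human review."""
    values = defaultdict(set)
    for s, p, o in triples:
        if p in FUNCTIONAL:
            values[(s, p)].add(o)

    def canonical(x):
        # Follow sameAs links to a representative; `seen` guards against cycles.
        seen = set()
        while x in same_as and x not in seen:
            seen.add(x)
            x = same_as[x]
        return x

    resolved, needs_review = {}, {}
    for key, objs in values.items():
        if len(objs) < 2:
            continue  # no conflict for this subject/property pair
        reps = {canonical(o) for o in objs}
        if len(reps) == 1:
            resolved[key] = reps.pop()        # step (2a): resolved by reasoning
        else:
            needs_review[key] = sorted(objs)  # step (2b): e.g. crowdsourcing
    return resolved, needs_review


# Hypothetical example data.
data = [
    (":Albert_Einstein", ":birthPlace", ":Ulm"),
    (":Albert_Einstein", ":birthPlace", ":Ulm_Germany"),
    (":Person_X", ":birthPlace", ":Paris"),
    (":Person_X", ":birthPlace", ":Rome"),
]
links = {":Ulm_Germany": ":Ulm"}  # stands in for owl:sameAs statements
resolved, needs_review = cleanse(data, links)
```

In this toy run the Einstein conflict disappears because both values denote the same resource, while the `:Person_X` conflict stays open and would go to a review queue.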
  57. 57. Problems and Experiments 41 (5) Analyzing the Semantic Web IV - Data Ranking ■ Linked Data provides (unbiased) knowledge ■ unbiased = no distinction between what is important and what is not ■ e.g., Albert Einstein ■ > 600 facts (triples) ■ > 80 properties ■ no ranking ■ no relevance http://dbpedia.org/page/Albert_Einstein
  61. 61. Problems and Experiments 42 (5) Analyzing the Semantic Web IV - Data Ranking ■ We have developed heuristics for ranking objects and properties, e.g. (diagram): ■ :Albert_Einstein rdf:type :AmericanVegetarian ■ :Albert_Einstein rdf:type :Scientist ■ :Bill_Cosby rdf:type :AmericanVegetarian ■ :Alfred_Kleiner rdf:type :Scientist ■ :Albert_Einstein :doctoralAdviser :Alfred_Kleiner → considered to be relevant
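One plausible reading of this diagram is that a property link counts as relevant when subject and object share a class (Einstein and Kleiner are both scientists). The sketch below scores links that way. This is an assumed interpretation for illustration, not necessarily the heuristic the authors developed, and the arrangement of the triples as well as the extra `:birthPlace` triple are assumptions added to give a contrast case.

```python
from collections import defaultdict


def rank_links(triples, subject):
    """Score each outgoing non-type link of `subject` by the number of
    rdf:type classes that subject and object have in common."""
    types = defaultdict(set)
    for s, p, o in triples:
        if p == "rdf:type":
            types[s].add(o)

    scored = []
    for s, p, o in triples:
        if s == subject and p != "rdf:type":
            shared = types[subject] & types[o]
            scored.append((len(shared), p, o))
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scored, key=lambda t: (-t[0], t[1], t[2]))


# Triples as read from the slide diagram, plus one assumed contrast triple.
graph = [
    (":Albert_Einstein", "rdf:type", ":AmericanVegetarian"),
    (":Albert_Einstein", "rdf:type", ":Scientist"),
    (":Bill_Cosby", "rdf:type", ":AmericanVegetarian"),
    (":Alfred_Kleiner", "rdf:type", ":Scientist"),
    (":Albert_Einstein", ":doctoralAdviser", ":Alfred_Kleiner"),
    (":Albert_Einstein", ":birthPlace", ":Ulm"),  # assumed, not on the slide
]
ranking = rank_links(graph, ":Albert_Einstein")
```

With this data the `:doctoralAdviser` link ranks above the `:birthPlace` link, because only the former connects two resources of the same class.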
  62. 62. Problems and Experiments 43 (6) Semantic Web Infrastructure - Triple Stores ■ RDF(S) data is stored in triple stores ■ Basic idea: ■ Use 1 table with 3 columns (s,p,o) ■ For every column / column combination create index structures for fast access (spo, sop, pos, pso, ops, osp) ■ Drawback: many self-joins needed (memory consumption)
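The basic idea can be sketched as a small in-memory store: one set of (s, p, o) rows plus one nested index per column ordering. The class name `TinyTripleStore`, its methods, and the example triples are hypothetical and only serve as illustration; real triple stores use persistent and far more compact index structures.

```python
from collections import defaultdict
from itertools import permutations


class TinyTripleStore:
    """In-memory stand-in for a single (s, p, o) table with six indexes."""

    # All column orderings: spo, sop, pso, pos, osp, ops
    ORDERS = ["".join(p) for p in permutations("spo")]

    def __init__(self):
        self.triples = set()
        # One nested dict per ordering: index[order][k1][k2] -> set of k3
        self.index = {
            order: defaultdict(lambda: defaultdict(set)) for order in self.ORDERS
        }

    def add(self, s, p, o):
        if (s, p, o) in self.triples:
            return
        self.triples.add((s, p, o))
        value = {"s": s, "p": p, "o": o}
        for order in self.ORDERS:
            k1, k2, k3 = (value[c] for c in order)
            self.index[order][k1][k2].add(k3)

    def match(self, s=None, p=None, o=None):
        """Yield triples matching the pattern; None acts as a wildcard."""
        pattern = {"s": s, "p": p, "o": o}
        # Pick the ordering whose leading columns are the bound ones.
        order = "".join(sorted("spo", key=lambda c: pattern[c] is None))
        idx = self.index[order]
        c1, c2, c3 = order
        level1 = [pattern[c1]] if pattern[c1] is not None else list(idx)
        for k1 in level1:
            sub = idx.get(k1, {})
            level2 = [pattern[c2]] if pattern[c2] is not None else list(sub)
            for k2 in level2:
                for k3 in sub.get(k2, set()):
                    if pattern[c3] is None or pattern[c3] == k3:
                        found = {c1: k1, c2: k2, c3: k3}
                        yield (found["s"], found["p"], found["o"])


store = TinyTripleStore()
for triple in [
    (":Albert_Einstein", "rdf:type", ":Scientist"),
    (":Alfred_Kleiner", "rdf:type", ":Scientist"),
    (":Albert_Einstein", ":doctoralAdviser", ":Alfred_Kleiner"),
]:
    store.add(*triple)

scientists = {s for s, _, _ in store.match(p="rdf:type", o=":Scientist")}
# A query with two patterns joins the table with itself (a self-join).
advisers = {
    o
    for _, _, o in store.match(s=":Albert_Einstein", p=":doctoralAdviser")
    if o in scientists
}
```

Every triple is stored once per ordering, which shows the memory cost of the approach, and even this two-pattern query already needs one join of the table with itself.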
  63. 63. Problems and Experiments 44 Experiment Summary (1) Crawling the Semantic Web (2) Structural Analysis (3) Content-based Analysis (4) Data Cleansing (5) Heuristics for Ranking Semantic Web Data (6) Augmenting Semantic Web Infrastructure
  64. 64. Linked Open Data Universe 45 • Topic: Semantic Web and Linked Data • Problem Definition and Experiments • Application: Exploratory Multimedia Search
  65. 65. Application: Exploratory Multimedia Search 46 Yovisto semantic video search engine: http://www.yovisto.com ■ specialized in academic video content, e.g., lecture recordings ■ enables search within the content of videos ■ automated video analysis: video scene cut detection, intelligent character recognition, complemented by collaborative user annotation ■ more than 8,000 h of video Semantic Metadata: ■ Ontology: http://www.yovisto.com/ontology/0.9/ ■ DBpedia, FOAF, DublinCore, MPEG-7, Tagging ■ RDFa annotation ■ public SPARQL Endpoint: http://sparql.yovisto.com/ J. Waitelonis, H. Sack: Augmenting Video Search with Linked Open Data, in Proc. of International Conference on Semantic Systems 2009 (i-semantics 2009), September 2-4, 2009, Graz, Journal of Universal Computer Science
  68. 68. Application: Exploratory Multimedia Search 47 ■ Semantic Annotation ■ Metadata Extraction (along the time axis) ■ Entity Recognition / Mapping, e.g., person xy, location yz, event abc ■ e.g., bibliographical data, geographical data, encyclopedic data, ...
  69. 69. Application: Exploratory Multimedia Search 48 Exploratory Search • Is a kind of investigation task, where the user is (a) not familiar with the domain of the search result, i.e., before entering appropriate keywords, she needs to learn about the domain (b) not sure how to reach the search goal (concerning search process and search technology) (c) not really sure what she is looking for, i.e., "Can you please find something out about ... ?". Example: "Which modern philosophers build on the theories of the Greek philosopher Plato?" White, R.W., Kules, B., Drucker, S.M., and schraefel, M.C.: Supporting Exploratory Search, Introduction to Special Section of Communications of the ACM, Vol. 49, Issue 4 (2006), pp. 36-39.
  70. 70. [Screenshot of the exploratory search interface, labeled: history ■ search term ■ related resources with properties] Waitelonis, Sack: Augmenting Video Search with Linked Open Data, in Proc. I-Semantics, Graz 2009.
  71. 71. Linked Open Data Universe 50 • Topic: Semantic Web and Linked Data • Problem Definition and Experiments • Application: Exploratory Multimedia Search Thank you for your attention!