How to Juggle with more
than a Billion Triples?

Ansgar Scherp
Research Group on Data and
Web Science

Universität Mannheim
October 2012
Image source: http://www.flickr.com/photos/pedromourapinheiro/2122754745/
Ansgar Scherp – ansgar@informatik.uni-mannheim.de   Slide 1
My thanks go to …
•    Marianna                                       •   Daniel Eißing
•    Simon Schenk                                   •   Mathias Konrath
•    Carsten Saathoff                               •   Daniel Schmeiß
•    Thomas Franz                                   •   Anton Baumesberger
•    Thomas Gottron                                 •   Frederik Jochum
•    Steffen Staab                                  •   Alexander Kleinen
•    Arne Peters
•    Bastian Krayer                                      And many more …


Scenario

• Tim plans to travel
  – from London
  – to a customer in Cologne




Website of the German Railway




It works, why bother…?
Let's Try Different Queries

• Bottlenecks in public transportation?
• Compare the connections with flights?
• Visualize on a map?
• …

→ None of these queries can be answered,
  because the data …


… locked in Silos!


 – High integration effort
 – Lack of data reuse
Image: B. Jagendorf, http://www.flickr.com/photos/bobjagendorf/, CC-BY
Linked Data
• Publishing and interlinking of data
• Different quality and purpose
• From different sources on the Web

          World Wide Web                                Linked Data
        Documents                                   Data
        Hyperlinks                                  Typed Links
        HTML                                        RDF
        Addresses (URIs)                            Addresses (URIs)

Example: http://www.uni-mannheim.de/
Relevance of Linked Data?




Linked Data: May '07 vs. Sept. '11
[Figure: LOD cloud diagrams with the domains Media, Publications, Web 2.0, eGovernment, Cross-Domain, Geographic, and Life Sciences; < 31 billion triples. Source: http://lod-cloud.net]
Linked Data Principles


1.        Identification
2.        Interlinkage
3.        Dereferencing
4.        Description




Example: Big Lynx
[Figure: Matt Briggs, Scott Miller, and the Big Lynx Company, with a question mark]
1. Use URIs for Identification

Matt Briggs:   http://biglynx.co.uk/people/matt-briggs
Scott Miller:  http://biglynx.co.uk/people/scott-miller

Image: B. Gazen, http://www.flickr.com/photos/bayat/, CC-BY
Example: Big Lynx
[Figure: Matt Briggs, Scott Miller, and the Big Lynx Company]

→ How to model relationships like knows?
Resource Description Framework (RDF)
• Description of resources with RDF triples
            Matt Briggs      is a         Person
            Subject          Predicate    Object

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<http://biglynx.co.uk/people/matt-briggs>
    rdf:type foaf:Person .
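The triple above can also be held in code. This is a minimal sketch, assuming a triple is a plain Python tuple of three URI strings; the helper name `to_ntriples` is hypothetical and handles URI terms only, which is enough for this example.

```python
# An RDF triple as a (subject, predicate, object) tuple of URI strings.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF_PERSON = "http://xmlns.com/foaf/0.1/Person"

matt_is_a_person = (
    "http://biglynx.co.uk/people/matt-briggs",  # subject
    RDF_TYPE,                                   # predicate
    FOAF_PERSON,                                # object
)


def to_ntriples(triple):
    """Serialize a URI-only triple as one N-Triples line (hypothetical helper)."""
    subject, predicate, obj = triple
    return f"<{subject}> <{predicate}> <{obj}> ."


print(to_ntriples(matt_is_a_person))
```

Literals and blank nodes would need their own syntax and are left out of this sketch.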
1. Use URIs also for Relations

http://biglynx.co.uk/people/matt-briggs
http://biglynx.co.uk/people/scott-miller

Image: B. Gazen, http://www.flickr.com/photos/bayat/, CC-BY
Example: Big Lynx
[Figure: Dave Smith "lives here" in London (DBpedia); Matt Briggs at the Big Lynx Company is the "same person" as Matt Briggs on Matt's private website; Scott Miller also shown]
2. Establishing Interlinkage
• Relation links between resources
       <http://biglynx.co.uk/people/dave-smith>
           foaf:based_near
           <http://dbpedia.org/resource/London> .

• Identity links between resources
    <http://biglynx.co.uk/people/matt-briggs>
        owl:sameAs
        <http://www.matt-briggs.eg.uk#me> .
Example: Big Lynx
[Figure: the same example with typed links: Dave Smith foaf:based_near London (DBpedia); Matt Briggs at the Big Lynx Company owl:sameAs Matt Briggs on Matt's private website]
3. Dereferencing of URIs

• Looking up web documents

• How can we "look up" things of the real world?

        http://biglynx.co.uk/people/matt-briggs
Two Approaches
1. Hash URIs
   – URI contains a fragment separated by #, e.g.,
     http://biglynx.co.uk/vocab/sme#Team

2. Negotiation via a "303 See Other" redirect
      Request:  http://biglynx.co.uk/people/matt-briggs
      Response: "Look here:"
                http://biglynx.co.uk/people/matt-briggs.rdf
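Both approaches can be sketched in a few lines. This is an illustration only: the `FAKE_SERVER` table stands in for a real web server so that no network access is needed, and both function names are hypothetical.

```python
from urllib.parse import urldefrag


def document_for_hash_uri(uri):
    """Hash URIs: the document to fetch is the URI without its fragment."""
    document, _fragment = urldefrag(uri)
    return document


# Stand-in for a web server: maps a thing's URI to (status, Location header).
# The table is made up for this sketch.
FAKE_SERVER = {
    "http://biglynx.co.uk/people/matt-briggs":
        (303, "http://biglynx.co.uk/people/matt-briggs.rdf"),
}


def document_for_303_uri(uri, server=FAKE_SERVER):
    """303 URIs: follow the 'See Other' redirect to the describing document."""
    status, location = server[uri]
    if status != 303:
        raise ValueError(f"expected 303 See Other, got {status}")
    return location


print(document_for_hash_uri("http://biglynx.co.uk/vocab/sme#Team"))
print(document_for_303_uri("http://biglynx.co.uk/people/matt-briggs"))
```

With a hash URI the client strips the fragment itself before the request; with a 303 URI the server decides where the description lives.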
Example: Big Lynx
[Figure: the same example (Dave Smith foaf:based_near London; Matt Briggs owl:sameAs Matt Briggs on Matt's private website), now asking: "Description of Matt?"]
4. Description of URIs
[Figure: RDF graph. Nodes: biglynx:matt-briggs, biglynx:dave-smith, foaf:Person, dp:Birmingham, dp:London, and a blank node _:point with the literals "51.509" and "-0.118". Edge labels: rdf:type, foaf:knows, foaf:based_near, ex:loc, wgs84:lat, wgs84:long.]
Formalization of Description
• Given an RDF graph $G = (V, P, E)$ with
  $V = R \cup B \cup L$ and $E \subseteq (R \cup B) \times P \times V$

• $\mathrm{SimpleCBD}(n) = \bigcup_{j=0}^{\infty} I_j$ with

  $I_0 = \{ (s, p, o) \mid (s, p, o) \in E \land s = n \}$

  $I_{j+1} = \{ (o, p', o') \in E \mid \exists (s, p, o) \in I_j : o \in B \land (o, p', o') \notin \bigcup_{k=0}^{j} I_k \}$
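The definition can be read as a fixpoint loop. This is a minimal sketch, assuming triples are tuples of strings and that blank nodes are exactly the terms starting with `_:`. The sample graph is one assumed reading of the figure on the previous slide, not data taken from the talk.

```python
def simple_cbd(edges, n, is_blank=lambda term: term.startswith("_:")):
    """Collect all triples about n, following blank-node objects transitively."""
    edges = set(edges)
    described = {t for t in edges if t[0] == n}  # I_0: triples with subject n
    layer = described
    while layer:
        # Blank nodes that occur as objects in the current layer I_j.
        blank_objects = {o for (_s, _p, o) in layer if is_blank(o)}
        # I_{j+1}: triples about those blank nodes that are not collected yet.
        layer = {t for t in edges if t[0] in blank_objects} - described
        described |= layer
    return described


# Assumed sample graph (prefixed names abbreviated as plain strings).
GRAPH = [
    ("biglynx:matt-briggs", "rdf:type", "foaf:Person"),
    ("biglynx:matt-briggs", "foaf:knows", "biglynx:dave-smith"),
    ("biglynx:matt-briggs", "ex:loc", "_:point"),
    ("_:point", "wgs84:lat", "51.509"),
    ("_:point", "wgs84:long", "-0.118"),
    ("biglynx:dave-smith", "foaf:based_near", "dp:London"),
]

cbd = simple_cbd(GRAPH, "biglynx:matt-briggs")
for triple in sorted(cbd):
    print(triple)
```

The description stops at named resources such as biglynx:dave-smith and only expands blank nodes, which is why the triple about dp:London is not part of the result.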
W3C RDF / RDF Schema Vocabulary
•    Set of URIs defined in rdf:/rdfs: namespace
•    rdf:type               • rdfs:domain
•    rdf:Property           • rdfs:range
•    rdf:XMLLiteral         • rdfs:Resource
•    rdf:List               • rdfs:Literal
•    rdf:first              • rdfs:Datatype
•    rdf:rest               • rdfs:Class
•    rdf:Seq                • rdfs:subClassOf
•    rdf:Bag                • rdfs:subPropertyOf
•    rdf:Alt                • rdfs:comment
•    rdf:value              • rdfs:label
•    …                      • …
Semantic Web Layer Cake (Simplified)




Exploration of Linked Data
[Figure: LOD cloud (< 31 billion triples) highlighting WordNet, Swoogle, and GeoNames. Source: http://lod-cloud.net]
Naive Approach
• Download all data
• Store in really big database
• Programming of queries
• Design of user interface

[Figure: WordNet, Swoogle, and GeoNames combined with RDFS, Rules, and Geo in a single system]

→ Inflexible, monolithic, not scalable
SemaPlorer Approach
• Flexible
• Extensible
• Scalable

[Figure: query terms birthplace / placeOfBirth; the components RDFS, Rules, Fulltext, and Geo Queries are combined (+) over WordNet, Swoogle, and GeoNames]

> 1 billion triples
12 months in 2005/06, 700 million triples
SemaPlorer – Semantic Social Media




Watch video online: http://vimeo.com/2057249
Billion Triple Challenge 2008




                                                    [JWS 2009]
Searching for Linked Data Sources

       Persons that are
       - Politicians and
       - Actors
       ?

[Figure: LOD cloud, < 31 billion triples. Source: http://lod-cloud.net]
Idea: Index of Data Sources
SELECT ?x
FROM …
WHERE {
 ?x rdf:type ex:Actor .
 ?x rdf:type ex:Politician .
}

[Figure: the query "Politician and Actor" is sent to an index of data sources]
The Naive Approach
1.     Download the entire LOD cloud
2.     Put it into a (really) large triple store
3.     Process the data and extract schema
4.     Provide lookup

- Big machinery
- Late in processing the data
- High effort to scale with LOD cloud



Idea
• Schema-level index
  – Define families of graph patterns
  – Assign instances to graph patterns
  – Map graph patterns to context (source URI)
• Construction
  – Stream-based for scalability
  – Little loss of accuracy
• Note
  – Index defined over instances
  – But stores the context
Input Data
• n-Quads
         <subject> <predicate> <object> <context>
• Example:
            <http://www.w3.org/People/Connolly/#me>
            <http://www.w3.org/1999/02/22-rdf-syntax-ns#…
            <http://xmlns.com/foaf/0.1/Person>
            <http://dig.csail.mit.edu/2008/webdav/timbl/…

[Figure: w3p:#me typed as foaf:Person, in the context http://dig.csail.mit.edu/2008/webdav/timbl/foaf.rdf]
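A quad of this shape can be split with a small parser. This is a rough sketch that only accepts lines made of four URI terms; literals, blank nodes, and escapes are not handled. The sample line is an assumed completion of the truncated example (predicate rdf:type, context foaf.rdf as shown in the figure), and `Quad` and `parse_quad` are hypothetical names.

```python
import re
from typing import NamedTuple


class Quad(NamedTuple):
    subject: str
    predicate: str
    object: str
    context: str


_IRI = r"<([^<>\s]*)>"
# Four IRIs separated by whitespace, terminated by a dot.
_QUAD_RE = re.compile(rf"^\s*{_IRI}\s+{_IRI}\s+{_IRI}\s+{_IRI}\s*\.\s*$")


def parse_quad(line):
    """Parse one URI-only n-quad line into a Quad, or raise ValueError."""
    match = _QUAD_RE.match(line)
    if match is None:
        raise ValueError(f"not a URI-only n-quad: {line!r}")
    return Quad(*match.groups())


# Assumed full form of the example on the slide.
SAMPLE = (
    "<http://www.w3.org/People/Connolly/#me> "
    "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
    "<http://xmlns.com/foaf/0.1/Person> "
    "<http://dig.csail.mit.edu/2008/webdav/timbl/foaf.rdf> ."
)

quad = parse_quad(SAMPLE)
print(quad.subject, "is described in", quad.context)
```

The fourth term is what the index later stores: it names the data source in which the statement was found.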
SchemEX Approach
• Stream-based schema extraction
• While crawling the data

[Figure: pipeline. Input from LOD-Crawler, RDF-Dump, or Triple Store → Nquad-Stream → Parser (NxParser) → Schema-Extractor with FIFO Instance-Cache → Schema-Level Index (RDF, RDBMS)]
Building the Index from a Stream
• Stream of n-quads (coming from an LD crawler)
      … Q16, Q15, Q14, Q13, Q12, Q11, Q10, Q9, Q8, Q7, Q6, Q5, Q4, Q3, Q2, Q1

[Figure: FIFO cache of instances 1 to 4, assigned to classes C1 to C3, with links to further instances]

• Linear runtime complexity w.r.t. the number of input triples
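The windowing idea can be sketched with an ordered dictionary. This is an assumed simplification, not the SchemEX implementation: the cache holds at most `cache_size` instances, and when a new instance arrives at a full cache, the oldest one is evicted and handed on with whatever statements were seen for it so far. The name `stream_instances` is hypothetical.

```python
from collections import OrderedDict


def stream_instances(quads, cache_size):
    """Yield (subject, statements) pairs in FIFO order from a quad stream."""
    cache = OrderedDict()  # subject -> list of (predicate, object, context)
    for s, p, o, c in quads:
        if s not in cache:
            if len(cache) >= cache_size:
                # Cache full: evict the oldest instance and pass it on.
                yield cache.popitem(last=False)
            cache[s] = []
        cache[s].append((p, o, c))
    # End of stream: flush the remaining instances, oldest first.
    while cache:
        yield cache.popitem(last=False)


QUADS = [
    ("ex:a", "rdf:type", "foaf:Person", "ex:ds1"),
    ("ex:b", "rdf:type", "foaf:Person", "ex:ds1"),
    ("ex:a", "rdf:type", "pim:Male", "ex:ds2"),
]

small = list(stream_instances(QUADS, cache_size=1))
large = list(stream_instances(QUADS, cache_size=2))
print([s for s, _ in small])  # ex:a is seen twice, each time incomplete
print([s for s, _ in large])  # ex:a is seen once, with both statements
```

Each quad is touched a constant number of times, which matches the linear runtime stated above. The small-cache run also shows where accuracy is lost: statements arriving after an instance was evicted are no longer combined with the earlier ones.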
Building the Schema and Index
[Figure: four-layer index structure]
  RDF classes:          C1, C2, C3, …, Ck
        (consistsOf)
  Type clusters:        TC1, TC2, …, TCm
        (hasEQClass; type clusters linked via properties p1, p2)
  Equivalence classes:  EQC1, EQC2, …, EQCn
        (hasDataSource)
  Data sources:         DS1, DS2, DS3, DS4, DS5, …, DSx
Layer 1: RDF Classes
• All instances of a particular type

 SELECT ?x
 FROM …
 WHERE {
    ?x rdf:type foaf:Person .
 }

[Figure: class C1 (foaf:Person) mapped to data sources DS 1, DS 2, DS 3; example instance timbl:card#i of type foaf:Person, found in http://dig.csail.mit.edu/2008/... and http://www.w3.org/People/Berners-Lee/card]
Layer 2: Type Clusters
• All instances belonging to exactly the same set of types

 SELECT ?x
 FROM …
 WHERE {
    ?x rdf:type foaf:Person .
    ?x rdf:type pim:Male .
 }

[Figure: type cluster TC1 (tc4711) over the classes C1 and C2 (foaf:Person, pim:Male), mapped to data sources DS 1, DS 2, DS 3; example instance timbl:card#i with both types, found in http://www.w3.org/People/Berners-Lee/card]
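A type-cluster index can be sketched as a dictionary from a set of types to the data sources in which instances of that cluster were seen. This is a minimal illustration under stated assumptions: prefixed names are plain strings, the sample data and the `http://example.org/...` sources are invented, and the function names are hypothetical.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


def build_type_cluster_index(quads):
    """Map each type cluster (frozenset of types) to its set of source URIs."""
    types = defaultdict(set)     # instance -> its types
    contexts = defaultdict(set)  # instance -> sources of its type statements
    for s, p, o, c in quads:
        if p == RDF_TYPE:
            types[s].add(o)
            contexts[s].add(c)
    index = defaultdict(set)
    for instance, type_set in types.items():
        index[frozenset(type_set)] |= contexts[instance]
    return dict(index)


def sources_for(index, wanted_types):
    """Sources holding instances that have at least all wanted types."""
    wanted = frozenset(wanted_types)
    result = set()
    for type_cluster, sources in index.items():
        if wanted <= type_cluster:
            result |= sources
    return result


# Invented sample data for the "Politician and Actor" query.
QUADS = [
    ("ex:p1", RDF_TYPE, "ex:Actor", "http://example.org/ds1"),
    ("ex:p1", RDF_TYPE, "ex:Politician", "http://example.org/ds1"),
    ("ex:p2", RDF_TYPE, "ex:Actor", "http://example.org/ds2"),
    ("ex:p3", RDF_TYPE, "ex:Politician", "http://example.org/ds3"),
    ("ex:p3", RDF_TYPE, "ex:Actor", "http://example.org/ds3"),
]

index = build_type_cluster_index(QUADS)
print(sorted(sources_for(index, {"ex:Actor", "ex:Politician"})))
```

The index keeps only type sets and source URIs, not the instances themselves, which reflects the note that the index is defined over instances but stores the context.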
Layer 3: Equivalence Classes
• Two instances are equivalent iff:
  – They are in the same TC
  – They have the same properties
  – The property targets are in the same TC

[Figure: equivalence class EQC1 over the type clusters TC1 and TC2 (classes C1, C2, C3), connected by property p; mapped to data sources DS 1, DS 2, DS 3]

→ Similar to 1-bisimulation
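The three conditions translate into a grouping key per instance. This is a sketch under assumptions, not the SchemEX code: the key is the pair (own type cluster, set of (property, type cluster of the target)), triples are tuples of strings, and the sample data is invented but follows the direction of foaf:maker used in the query on the next slide.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


def equivalence_classes(triples):
    """Group instances by (own type cluster, {(property, target type cluster)})."""
    types = defaultdict(set)  # instance -> its types
    links = defaultdict(set)  # instance -> its (property, target) pairs
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)
        else:
            links[s].add((p, o))

    def type_cluster(node):
        # Untyped nodes fall into the empty type cluster.
        return frozenset(types.get(node, ()))

    groups = defaultdict(set)
    for instance in set(types) | set(links):
        key = (
            type_cluster(instance),
            frozenset((p, type_cluster(o)) for p, o in links.get(instance, ())),
        )
        groups[key].add(instance)
    return dict(groups)


# Invented sample data.
TRIPLES = [
    ("ex:a", RDF_TYPE, "foaf:Person"), ("ex:a", RDF_TYPE, "pim:Male"),
    ("ex:a", "foaf:maker", "ex:docA"),
    ("ex:b", RDF_TYPE, "foaf:Person"), ("ex:b", RDF_TYPE, "pim:Male"),
    ("ex:b", "foaf:maker", "ex:docB"),
    ("ex:c", RDF_TYPE, "foaf:Person"),
    ("ex:docA", RDF_TYPE, "foaf:PersonalProfileDocument"),
    ("ex:docB", RDF_TYPE, "foaf:PersonalProfileDocument"),
]

groups = equivalence_classes(TRIPLES)
for members in groups.values():
    print(sorted(members))
```

ex:a and ex:b end up in one class because their targets share a type cluster even though the targets differ; ex:c has other types and no link, so it forms its own class.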
Layer 3: Equivalence Classes
SELECT ?x
WHERE {
   ?x rdf:type foaf:Person .
   ?x rdf:type pim:Male .
   ?x foaf:maker ?y .
   ?y rdf:type
      foaf:PersonalProfileDocument .
}

[Figure: type cluster tc4711 (foaf:Person, pim:Male) is linked via foaf:maker to type cluster tc1234 (foaf:PPD), forming equivalence class eqc0815; example instances timbl:card#i and timbl:card, data source http://www.w3.org/People/Berners-Lee/card]
Computing SchemEX: TimBL Data Set
• Analysis of a smaller data set
• 11 M triples, TimBL's FOAF profile
• LDspider with ~ 2k triples / sec


•   Different cache sizes: 100, 1k, 10k, 50k, 100k
•   Compared SchemEX with reference schema
•   Index queries on all Types, TCs, EQCs
•   Good precision/recall ratio at 50k+
• Commodity hardware (4GB RAM, single CPU)
Quality of Stream-based Index
Construction




+ Runtime hardly increases with window size
+ Memory consumption scales with window size
Computing SchemEX: Full BTC 2011 Data




Cache size: 50 k
Billion Triple Challenge 2011




  …




                                                    [JWS 2012]
And 2012? Get the Google Feeling!




Semantic Data Management Chain
• Research topics in a greater context

       SchemEX*                                OntoMDE       SemaPlorer*

      Publish                  Collect              Aggregate     Use

      Kreuzverweis.com                              Core Ontologies

                                                            Mobile Facets
* Winners of the Billion Triple Challenge: SchemEX (2011), SemaPlorer (2008)
    See: dws.informatik.uni-mannheim.de
Recommended Readings
• Maciej Janik, Ansgar Scherp, Steffen Staab: The Semantic Web:
  Collective Intelligence on the Web. Informatik Spektrum 34(5): 469-483
  (2011) URL: http://dx.doi.org/10.1007/s00287-011-0535-x
• Simon Schenk, Carsten Saathoff, Steffen Staab, Ansgar Scherp:
  SemaPlorer - Interactive semantic exploration of data and media based on
  a federated cloud infrastructure. J. Web Sem. 7(4): 298-304 (2009)
  URL: http://dx.doi.org/10.1016/j.websem.2009.09.006
• Mathias Konrath, Thomas Gottron, Steffen Staab, Ansgar Scherp:
  SchemEX — Efficient construction of a data catalogue by stream-based
  indexing of linked data, J. of Web Semantics: Science, Services and
  Agents on the World Wide Web, Available online 23 June 2012
  URL: http://www.sciencedirect.com/science/article/pii/S1570826812000716
• Tom Heath, Christian Bizer: Linked Data: Evolving the Web into a Global
  Data Space, Morgan & Claypool Publishers, 2011
  URL: http://dx.doi.org/10.2200/S00334ED1V01Y201102WBE001




More Related Content

Viewers also liked

Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open...
Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open...Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open...
Leveraging the Web of Data: Managing, Analysing and Making Use of Linked Open...Thomas Gottron
 
A Model of Events for Integrating Event-based Information in Complex Socio-te...
A Model of Events for Integrating Event-based Information in Complex Socio-te...A Model of Events for Integrating Event-based Information in Complex Socio-te...
A Model of Events for Integrating Event-based Information in Complex Socio-te...Ansgar Scherp
 
A Comparison of Different Strategies for Automated Semantic Document Annotation
A Comparison of Different Strategies for Automated Semantic Document AnnotationA Comparison of Different Strategies for Automated Semantic Document Annotation
A Comparison of Different Strategies for Automated Semantic Document AnnotationAnsgar Scherp
 
Of Sampling and Smoothing: Approximating Distributions over Linked Open Data
Of Sampling and Smoothing: Approximating Distributions over Linked Open DataOf Sampling and Smoothing: Approximating Distributions over Linked Open Data
Of Sampling and Smoothing: Approximating Distributions over Linked Open DataThomas Gottron
 
ESWC 2013: A Systematic Investigation of Explicit and Implicit Schema Informa...
ESWC 2013: A Systematic Investigation of Explicit and Implicit Schema Informa...ESWC 2013: A Systematic Investigation of Explicit and Implicit Schema Informa...
ESWC 2013: A Systematic Investigation of Explicit and Implicit Schema Informa...Thomas Gottron
 
Focused Exploration of Geospatial Context on Linked Open Data
Focused Exploration of Geospatial Context on Linked Open DataFocused Exploration of Geospatial Context on Linked Open Data
Focused Exploration of Geospatial Context on Linked Open DataThomas Gottron
 
Finding Good URLs: Aligning Entities in Knowledge Bases with Public Web Docum...
Finding Good URLs: Aligning Entities in Knowledge Bases with Public Web Docum...Finding Good URLs: Aligning Entities in Knowledge Bases with Public Web Docum...
Finding Good URLs: Aligning Entities in Knowledge Bases with Public Web Docum...Thomas Gottron
 
From Changes to Dynamics: Dynamics Analysis of Linked Open Data Sources
From Changes to Dynamics: Dynamics Analysis of Linked Open Data Sources From Changes to Dynamics: Dynamics Analysis of Linked Open Data Sources
From Changes to Dynamics: Dynamics Analysis of Linked Open Data Sources Thomas Gottron
 
Making Use of the Linked Data Cloud: The Role of Index Structures
Making Use of the Linked Data Cloud: The Role of Index StructuresMaking Use of the Linked Data Cloud: The Role of Index Structures
Making Use of the Linked Data Cloud: The Role of Index StructuresThomas Gottron
 
Smart photo selection: interpret gaze as personal interest
Smart photo selection: interpret gaze as personal interestSmart photo selection: interpret gaze as personal interest
Smart photo selection: interpret gaze as personal interestAnsgar Scherp
 
Perplexity of Index Models over Evolving Linked Data
Perplexity of Index Models over Evolving Linked Data Perplexity of Index Models over Evolving Linked Data
Perplexity of Index Models over Evolving Linked Data Thomas Gottron
 
Linked Open Data (Entwurfsprinzipien und Muster für vernetzte Daten)
Linked Open Data (Entwurfsprinzipien und Muster für vernetzte Daten)Linked Open Data (Entwurfsprinzipien und Muster für vernetzte Daten)
Linked Open Data (Entwurfsprinzipien und Muster für vernetzte Daten)Ansgar Scherp
 
 Challenges in Managing Online Business Communities
 Challenges in Managing Online Business Communities Challenges in Managing Online Business Communities
 Challenges in Managing Online Business CommunitiesThomas Gottron
 
Can you see it? Annotating Image Regions based on Users' Gaze Information
Can you see it? Annotating Image Regions based on Users' Gaze InformationCan you see it? Annotating Image Regions based on Users' Gaze Information
Can you see it? Annotating Image Regions based on Users' Gaze InformationAnsgar Scherp
 
Challenging Retrieval Scenarios: Social Media and Linked Open Data
Challenging Retrieval Scenarios: Social Media and Linked Open DataChallenging Retrieval Scenarios: Social Media and Linked Open Data
Challenging Retrieval Scenarios: Social Media and Linked Open DataThomas Gottron
 
SchemEX - Creating the Yellow Pages for the Linked Open Data Cloud
SchemEX - Creating the Yellow Pages for the Linked Open Data CloudSchemEX - Creating the Yellow Pages for the Linked Open Data Cloud
SchemEX - Creating the Yellow Pages for the Linked Open Data CloudAnsgar Scherp
 
Events in Multimedia - Theory, Model, Application
Events in Multimedia - Theory, Model, ApplicationEvents in Multimedia - Theory, Model, Application
Events in Multimedia - Theory, Model, ApplicationAnsgar Scherp
 
Mining and Managing Large-scale Linked Open Data
Mining and Managing Large-scale Linked Open DataMining and Managing Large-scale Linked Open Data
Mining and Managing Large-scale Linked Open DataAnsgar Scherp
 
Identifying Objects in Images from Analyzing the User‘s Gaze Movements for Pr...
Identifying Objects in Images from Analyzing the User‘s Gaze Movements for Pr...Identifying Objects in Images from Analyzing the User‘s Gaze Movements for Pr...
Identifying Objects in Images from Analyzing the User‘s Gaze Movements for Pr...Ansgar Scherp
 
Establishing a Strategy for Data Quality
Establishing a Strategy for Data QualityEstablishing a Strategy for Data Quality
Establishing a Strategy for Data QualityDatabase Answers Ltd.
 

Linked open data - how to juggle with more than a billion triples

  • 1. How to Juggle with more than a Billion Triples? Ansgar Scherp, Research Group on Data and Web Science, Universität Mannheim, October 2012. Image source: http://www.flickr.com/photos/pedromourapinheiro/2122754745/. Ansgar Scherp – ansgar@informatik.uni-mannheim.de
  • 2. My thanks go to: Marianna, Simon Schenk, Carsten Saathoff, Thomas Franz, Thomas Gottron, Steffen Staab, Arne Peters, Bastian Krayer, Daniel Eißing, Mathias Konrath, Daniel Schmeiß, Anton Baumesberger, Frederik Jochum, Alexander Kleinen, and many more.
  • 3. Scenario: Tim plans to travel from London to a customer in Cologne.
  • 4. Website of the German Railway. It works, so why bother?
  • 5. Let's Try Different Queries: Bottlenecks in public transportation? Compare the connections with flights? Visualize on a map? … None of these queries can be answered, because the data …
  • 6. … is locked in silos! – High integration effort – Lack of reuse of data. Image: B. Jagendorf, http://www.flickr.com/photos/bobjagendorf/, CC-BY
  • 7. Linked Data • Publishing and interlinking of data • Different quality and purpose • From different sources in the Web. Comparison (World Wide Web vs. Linked Data): documents vs. data; hyperlinks vs. typed links; HTML vs. RDF; addresses (URIs) in both. Example: http://www.uni-mannheim.de/
  • 8. Relevance of Linked Data?
  • 9. Linked Data: May '07 to Sept. '11. Domains: Web 2.0, Media, Publications, eGovernment, Cross-Domain, Life Sciences, Geographic. < 31 billion triples. Source: http://lod-cloud.net
  • 10. Linked Data Principles: 1. Identification 2. Interlinkage 3. Dereferencing 4. Description
  • 11. Example: Big Lynx. Matt Briggs, Scott Miller, ?, Big Lynx Company. < 31 billion triples. Source: http://lod-cloud.net
  • 12. 1. Use URIs for Identification. Matt Briggs: http://biglynx.co.uk/people/matt-briggs. Scott Miller: http://biglynx.co.uk/people/scott-miller. Image: B. Gazen, http://www.flickr.com/photos/bayat/, CC-BY
  • 13. Example: Big Lynx. Matt Briggs, Scott Miller, Big Lynx Company. How to model relationships like knows?
  • 14. Resource Description Framework (RDF) • Description of resources with RDF triples: Matt Briggs (subject) is a (predicate) Person (object). @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . @prefix foaf: <http://xmlns.com/foaf/0.1/> . <http://biglynx.co.uk/people/matt-briggs> rdf:type foaf:Person .
  • 15. 1. Use URIs also for Relations. http://biglynx.co.uk/people/matt-briggs, http://biglynx.co.uk/people/scott-miller. Image: B. Gazen, http://www.flickr.com/photos/bayat/, CC-BY
  • 16. Example: Big Lynx. Dave Smith "lives here": London (DBpedia). Matt Briggs is the "same person" as Matt Briggs on Matt's private website. Scott Miller, Big Lynx Company, …
  • 17. 2. Establishing Interlinkage • Relation links between resources: <http://biglynx.co.uk/people/dave-smith> foaf:based_near <http://dbpedia.org/resource/London> . • Identity links between resources: <http://biglynx.co.uk/people/matt-briggs> owl:sameAs <http://www.matt-briggs.eg.uk#me> .
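
The two kinds of links on this slide can be sketched in a few lines of Python. This is only an illustration, not part of the talk: the `Triple` type and the helper functions `identity_links` and `relation_links` are hypothetical names, and the triples are held as plain tuples rather than in an RDF library.

```python
from typing import List, NamedTuple

FOAF = "http://xmlns.com/foaf/0.1/"
OWL = "http://www.w3.org/2002/07/owl#"


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object (all given as URI strings)."""
    subject: str
    predicate: str
    obj: str


# The two example links from the slide.
triples = [
    Triple("http://biglynx.co.uk/people/dave-smith",
           FOAF + "based_near",
           "http://dbpedia.org/resource/London"),
    Triple("http://biglynx.co.uk/people/matt-briggs",
           OWL + "sameAs",
           "http://www.matt-briggs.eg.uk#me"),
]


def identity_links(ts: List[Triple]) -> List[Triple]:
    """Links stating that two URIs denote the same thing (owl:sameAs)."""
    return [t for t in ts if t.predicate == OWL + "sameAs"]


def relation_links(ts: List[Triple]) -> List[Triple]:
    """All other links, which relate two different resources."""
    return [t for t in ts if t.predicate != OWL + "sameAs"]
```

Separating the two kinds matters in practice because identity links let a consumer merge descriptions from different sources, while relation links only add edges between distinct resources.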
  • 18. Example: Big Lynx. Dave Smith foaf:based_near ("lives here") London (DBpedia). Matt Briggs owl:sameAs ("same person") Matt Briggs on Matt's private website. Big Lynx Company.
  • 19. 3. Dereferencing of URIs • Looking up web documents • How can we "look up" things of the real world? http://biglynx.co.uk/people/matt-briggs
  • 20. Two Approaches: 1. Hash URIs: the URI contains a part separated by #, e.g., http://biglynx.co.uk/vocab/sme#Team 2. Negotiation via "303 See Other": a request for http://biglynx.co.uk/people/matt-briggs receives the response "Look here:" http://biglynx.co.uk/people/matt-briggs.rdf
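
A minimal sketch of both approaches, assuming no network access: the hash-URI case is handled by stripping the fragment with `urllib.parse.urldefrag`, and the 303 case is simulated with a plain dictionary instead of real HTTP responses. The function names `document_uri` and `resolve` and the `redirects` table are hypothetical.

```python
from typing import Dict
from urllib.parse import urldefrag


def document_uri(uri: str) -> str:
    """Hash URI approach: the document to fetch is the URI without its #fragment."""
    return urldefrag(uri).url


def resolve(uri: str, redirects: Dict[str, str], max_hops: int = 5) -> str:
    """303 approach, simulated: follow 'See Other' targets until a document is reached.

    `redirects` stands in for a server that answers a request for the key
    with a 303 response pointing to the value.
    """
    current = document_uri(uri)
    for _ in range(max_hops):
        target = redirects.get(current)
        if target is None:
            return current  # no further redirect: this is the describing document
        current = target
    raise RuntimeError("too many redirects")


# Simulated server behaviour for the example on the slide.
redirects = {
    "http://biglynx.co.uk/people/matt-briggs":
        "http://biglynx.co.uk/people/matt-briggs.rdf",
}
```

With a hash URI the client never sends the fragment to the server, so no redirect is needed; with the 303 approach the server has to be configured to answer for every real-world URI.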
  • 21. Example: Big Lynx. Dave Smith foaf:based_near London (DBpedia). Matt Briggs owl:sameAs Matt Briggs on Matt's private website. Big Lynx Company. Description of Matt?
  • 22. 4. Description of URIs. Graph figure with the nodes foaf:Person, dp:Birmingham, biglynx:matt-briggs, _:point, biglynx:dave-smith, and dp:London, and the edges rdf:type, foaf:based_near, ex:loc, foaf:knows, wgs84:long ("-0.118"), and wgs84:lat ("51.509").
  • 23. Formalization of Description. Given an RDF graph $G = (V, P, E)$ with $V = R \cup B \cup L$ and $E \subseteq (R \cup B) \times P \times V$: $\mathrm{SimpleCBD}(n) = \bigcup_{j=0}^{\infty} I_j$ with $I_0 = \{(s, p, o) \mid (s, p, o) \in E \wedge s = n\}$ and $I_{j+1} = \{(o, p', o') \in E \mid \exists (s, p, o) \in I_j : o \in B \wedge (o, p', o') \notin \bigcup_{k=0}^{j} I_k\}$
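
The definition can be read as a level-by-level expansion: start with all triples whose subject is n, then keep adding the triples of any blank node reached as an object. The sketch below follows that reading, assuming that B denotes the set of blank nodes. The function name `simple_cbd` and the sample graph are illustrative; the graph is only loosely modelled on the Big Lynx example and is not a reconstruction of the slide's figure.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def simple_cbd(edges: Set[Triple], blank_nodes: Set[str], n: str) -> Set[Triple]:
    """Union of the levels I_0, I_1, ... of the SimpleCBD definition for node n."""
    result = {t for t in edges if t[0] == n}  # I_0: triples with n as subject
    frontier = set(result)
    while frontier:
        # Blank nodes that occur as objects in the current level I_j.
        blank_objects = {o for (_, _, o) in frontier if o in blank_nodes}
        # I_{j+1}: triples of those blank nodes that were not collected before.
        next_level = {t for t in edges
                      if t[0] in blank_objects and t not in result}
        result |= next_level
        frontier = next_level
    return result


# Hypothetical sample graph (prefixed names used as plain strings).
edges = {
    ("biglynx:matt-briggs", "rdf:type", "foaf:Person"),
    ("biglynx:matt-briggs", "ex:loc", "_:point"),
    ("_:point", "wgs84:lat", "51.509"),
    ("_:point", "wgs84:long", "-0.118"),
    ("biglynx:matt-briggs", "foaf:knows", "biglynx:dave-smith"),
    ("biglynx:dave-smith", "foaf:based_near", "dp:London"),
}
blank_nodes = {"_:point"}

description = simple_cbd(edges, blank_nodes, "biglynx:matt-briggs")
```

The expansion stops at URI nodes such as biglynx:dave-smith, since their descriptions can be obtained by dereferencing, whereas a blank node has no address and must be included in the description itself.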
  • 24. W3C RDF / RDF Schema Vocabulary • Set of URIs defined in the rdf: and rdfs: namespaces. rdf: rdf:type, rdf:Property, rdf:XMLLiteral, rdf:List, rdf:first, rdf:rest, rdf:Seq, rdf:Bag, rdf:Alt, …, rdf:value. rdfs: rdfs:domain, rdfs:range, rdfs:Resource, rdfs:Literal, rdfs:Datatype, rdfs:Class, rdfs:subClassOf, rdfs:subPropertyOf, rdfs:comment, …, rdfs:label.
  • 25. Semantic Web Layer Cake (Simplified)
  • 26. Exploration of Linked Data: WordNet, Swoogle, GeoNames. < 31 billion triples. Source: http://lod-cloud.net
  • 27. Naive Approach • Download all data • Store in really big database RDFS • Programming of WordNet Rules queries Swoogle Geo • Design of user interface GeoNames Inflexible Monolithic Not Ansgar Scherp – ansgar@informatik.uni-mannheim.de scaleable Slide 27
  • 28. SemaPlorer Approach • Flexible, extensible, scalable • Diagram labels: birthplace, placeOfBirth, birthplace, Geo, RDFS Rules, Fulltext Queries • Sources combined (+): WordNet, Swoogle, GeoNames • > 1 billion triples • 12 months in 2005/06, 700 million triples
  • 29. SemaPlorer – Semantic Social Media • Watch video online: http://vimeo.com/2057249
  • 30. Billion Triple Challenge 2008 [JWS 2009]
  • 31. Searching for Linked Data Sources • ? Persons that are politicians and actors ? • < 31 billion triples • Source: http://lod-cloud.net
  • 32. Idea: Index of Data Sources • Query "Politician and Actor": SELECT ?x FROM … WHERE { ?x rdf:type ex:Actor . ?x rdf:type ex:Politician . } • Diagram labels: Index, ?
  • 33. The Naive Approach 1. Download the entire LOD cloud 2. Put it into a (really) large triple store 3. Process the data and extract the schema 4. Provide lookup • Drawbacks: big machinery; the data is processed late; high effort to scale with the LOD cloud
  • 34. Idea • Schema-level index: define families of graph patterns; assign instances to graph patterns; map graph patterns to context (source URI) • Construction: stream-based for scalability; little loss of accuracy • Note: the index is defined over instances, but stores the context
  • 35. Input Data • n-Quads: <subject> <predicate> <object> <context> • Example: <http://www.w3.org/People/Connolly/#me> <http://www.w3.org/1999/02/22-rdf-syntax-ns# <http://xmlns.com/foaf/0.1/Person> <http://dig.csail.mit.edu/2008/webdav/timbl/ • Diagram labels: context http://dig.csail.mit.edu/2008/webdav/timbl/foaf.rdf, w3p:#me, foaf:Person
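To illustrate the `<subject> <predicate> <object> <context>` layout, here is a simplified parser sketch. It is not the NxParser used in the talk: it only accepts lines whose four terms are all IRIs in angle brackets (no literals, no blank nodes), and the `example.org` IRIs, the `Quad` type and `parse_quad` are hypothetical names chosen for this example.

```python
import re
from typing import NamedTuple


class Quad(NamedTuple):
    subject: str
    predicate: str
    object: str
    context: str


# Simplification: four IRI terms in angle brackets, terminated by a dot.
QUAD_RE = re.compile(
    r"^\s*<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.\s*$"
)


def parse_quad(line: str) -> Quad:
    match = QUAD_RE.match(line)
    if match is None:
        raise ValueError(f"not an IRI-only n-quad: {line!r}")
    return Quad(*match.groups())


# Hypothetical input line (IRIs invented for illustration).
line = (
    "<http://example.org/alice#me> "
    "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> "
    "<http://xmlns.com/foaf/0.1/Person> "
    "<http://example.org/alice/foaf.rdf> ."
)
quad = parse_quad(line)
print(quad.context)
```

The fourth term is what the index later stores: it names the data source in which the triple was found.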
  • 36. SchemEX Approach • Stream-based schema extraction • While crawling the data • Diagram labels: LOD-Crawler, RDF-Dump, Parser (NxParser), Nquad-Stream, Instance-Cache (FIFO), Schema-Extractor, Schema-Level Index, RDF Triple Store, RDBMS
  • 37. Building the Index from a Stream • Stream of n-quads (coming from an LD crawler): … Q16, Q15, Q14, Q13, Q12, Q11, Q10, Q9, Q8, Q7, Q6, Q5, Q4, Q3, Q2, Q1 • FIFO cache (diagram of cached instances and their classes C1, C2, C3) • Linear runtime complexity w.r.t. the number of input triples
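A simplified sketch of stream-based index construction with a bounded FIFO cache. It is an illustration under assumptions rather than the SchemEX implementation: it only tracks `rdf:type` statements, uses prefixed names as plain strings, and the function `build_index` and its parameter `cache_size` are hypothetical. Each quad is handled once, so the runtime grows linearly with the input.

```python
from collections import OrderedDict, defaultdict

RDF_TYPE = "rdf:type"


def build_index(quads, cache_size):
    """Map each set of types to the contexts in which it was observed."""
    if cache_size < 1:
        raise ValueError("cache_size must be at least 1")
    cache = OrderedDict()      # subject -> (set of types, set of contexts)
    index = defaultdict(set)   # frozenset of types -> set of contexts

    def flush(entry):
        types, contexts = entry
        if types:
            index[frozenset(types)] |= contexts

    for s, p, o, c in quads:
        if s not in cache:
            if len(cache) >= cache_size:
                # FIFO: evict the instance that entered the cache first.
                _, oldest = cache.popitem(last=False)
                flush(oldest)
            cache[s] = (set(), set())
        types, contexts = cache[s]
        contexts.add(c)
        if p == RDF_TYPE:
            types.add(o)

    for entry in cache.values():
        flush(entry)
    return dict(index)


quads = [
    ("ex:a", RDF_TYPE, "ex:Actor", "ds1"),
    ("ex:b", RDF_TYPE, "ex:Actor", "ds2"),
    ("ex:a", RDF_TYPE, "ex:Politician", "ds1"),
]
full = build_index(quads, cache_size=2)
small = build_index(quads, cache_size=1)
```

With `cache_size=1` the instance `ex:a` is evicted before its second type arrives, so its two types end up in separate index entries. This is the kind of accuracy loss a small window can cause.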
  • 38. Building the Schema and Index • RDF classes: C1, C2, C3, …, Ck • consistsOf • Type clusters: TC1, TC2, …, TCm • hasEQClass (p1, p2) • Equivalence classes: EQC1, EQC2, …, EQCn • hasDataSource • Data sources: DS1, DS2, DS3, DS4, DS5, …, DSx
  • 39. Layer 1: RDF Classes • All instances of a particular type • Query: SELECT ?x FROM … WHERE { ?x rdf:type foaf:Person . } • Diagram labels: class C1 (foaf:Person), data sources DS 1, DS 2, DS 3, instance timbl:card#i, http://dig.csail.mit.edu/2008/..., http://www.w3.org/People/Berners-Lee/card
  • 40. Layer 2: Type Clusters • All instances belonging to exactly the same set of types • Query: SELECT ?x FROM … WHERE { ?x rdf:type foaf:Person . ?x rdf:type pim:Male . } • Diagram labels: classes C1 (foaf:Person) and C2 (pim:Male), type cluster TC1 (tc4711), data sources DS 1, DS 2, DS 3, instance timbl:card#i, http://www.w3.org/People/Berners-Lee/card
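A sketch of how a type-cluster index could answer such a query with data sources instead of instances. The index contents, the `example.org` source and the function `sources_for` are assumptions for illustration. Because a type cluster holds the exact type set of its instances, every cluster that contains all queried types is relevant.

```python
from typing import Dict, FrozenSet, Iterable, Set


def sources_for(index: Dict[FrozenSet[str], Set[str]],
                query_types: Iterable[str]) -> Set[str]:
    """Return the data sources that hold instances with all queried types."""
    wanted = frozenset(query_types)
    sources: Set[str] = set()
    for cluster, contexts in index.items():
        if wanted <= cluster:  # cluster contains every queried type
            sources |= contexts
    return sources


# Hypothetical index: type cluster -> data sources.
index = {
    frozenset({"foaf:Person", "pim:Male"}):
        {"http://www.w3.org/People/Berners-Lee/card"},
    frozenset({"foaf:Person"}):
        {"http://example.org/ds2"},
}
hits = sources_for(index, ["foaf:Person", "pim:Male"])
print(hits)
```

Only the sources returned here would then need to be queried for the actual instances.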
  • 41. Layer 3: Equivalence Classes • Two instances are equivalent iff: they are in the same TC; they have the same properties; the property targets are in the same TC • Similar to 1-bisimulation • Diagram labels: classes C1, C2, C3, type clusters TC1, TC2, property p, equivalence class EQC1, data sources DS 1, DS 2, DS 3
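The three conditions can be sketched as a grouping key: an instance's own type cluster plus the set of (property, target type cluster) pairs. The function `equivalence_classes`, the dictionary layout and the `ex:` instances are assumptions for illustration, and only instances with at least one known type are grouped.

```python
from collections import defaultdict


def equivalence_classes(types_of, edges):
    """Group instances by own type cluster and (property, target cluster)."""
    def tc(node):
        return frozenset(types_of.get(node, ()))

    outgoing = defaultdict(set)
    for s, p, o in edges:
        outgoing[s].add((p, tc(o)))

    classes = defaultdict(set)
    for instance in types_of:
        key = (tc(instance), frozenset(outgoing.get(instance, ())))
        classes[key].add(instance)
    return dict(classes)


# Hypothetical data; the ex: instances are invented for illustration.
types_of = {
    "timbl:card#i": {"foaf:Person", "pim:Male"},
    "timbl:card": {"foaf:PersonalProfileDocument"},
    "ex:bob#me": {"foaf:Person", "pim:Male"},
    "ex:bob": {"foaf:PersonalProfileDocument"},
    "ex:carl#me": {"foaf:Person", "pim:Male"},
}
edges = [
    ("timbl:card#i", "foaf:maker", "timbl:card"),
    ("ex:bob#me", "foaf:maker", "ex:bob"),
]
classes = equivalence_classes(types_of, edges)
```

Here `timbl:card#i` and `ex:bob#me` fall into one class, while `ex:carl#me` shares their type cluster but has no `foaf:maker` link and therefore ends up in a class of its own.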
  • 42. Layer 3: Equivalence Classes • Query: SELECT ?x WHERE { ?x rdf:type foaf:Person . ?x rdf:type pim:Male . ?x foaf:maker ?y . ?y rdf:type foaf:PersonalProfileDocument . } • Diagram labels: foaf:Person, pim:Male, foaf:PPD, type clusters tc4711 and tc1234, equivalence class eqc0815 with a maker link to tc1234, instances timbl:card#i and timbl:card connected by foaf:maker, http://www.w3.org/People/Berners-Lee/card
  • 43. Computing SchemEX: TimBL Data Set • Analysis of a smaller data set • 11 M triples, TimBL's FOAF profile • LDspider with ~2k triples/sec • Different cache sizes: 100, 1k, 10k, 50k, 100k • Compared SchemEX with reference schema • Index queries on all types, TCs, EQCs • Good precision/recall ratio at 50k+ • Commodity hardware (4 GB RAM, single CPU)
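For readers unfamiliar with the two measures, a short sketch of precision and recall over sets of query results. How the original evaluation matched index entries against the reference schema is not stated on the slide, so the comparison of data-source sets below, the function name and the sample values are assumptions.

```python
from typing import Set, Tuple


def precision_recall(extracted: Set[str],
                     reference: Set[str]) -> Tuple[float, float]:
    """Precision and recall of `extracted` with respect to `reference`."""
    true_positives = len(extracted & reference)
    precision = true_positives / len(extracted) if extracted else 1.0
    recall = true_positives / len(reference) if reference else 1.0
    return precision, recall


# Hypothetical result sets for one index query.
reference = {"ds1", "ds2", "ds3", "ds4"}
extracted = {"ds1", "ds2", "ds5"}
p, r = precision_recall(extracted, reference)
print(f"precision={p:.2f} recall={r:.2f}")
```

A stream-based index built with a small cache can miss sources (lower recall) or report clusters that the full data would not contain (lower precision).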
  • 44. Quality of Stream-based Index Construction + Runtime hardly increases with window size + Memory consumption scales with window size
  • 45. Computing SchemEX: Full BTC 2011 Data • Cache size: 50k
  • 46. Billion Triple Challenge 2011 … [JWS 2012]
  • 47. And 2012? Get the Google Feeling!
  • 48. Semantic Data Management Chain • Research topics in a greater context • Chain: Publish, Collect, Aggregate, Use • Topics: SchemEX*, OntoMDE, SemaPlorer*, Kreuzverweis.com, Core Ontologies, Mobile Facets • * Winner of the Billion Triple Challenge 2011/2008 • See: dws.informatik.uni-mannheim.de
  • 50. Recommended Readings • Maciej Janik, Ansgar Scherp, Steffen Staab: The Semantic Web: Collective Intelligence on the Web. Informatik Spektrum 34(5): 469-483 (2011). URL: http://dx.doi.org/10.1007/s00287-011-0535-x • Simon Schenk, Carsten Saathoff, Steffen Staab, Ansgar Scherp: SemaPlorer - Interactive semantic exploration of data and media based on a federated cloud infrastructure. J. Web Sem. 7(4): 298-304 (2009). URL: http://dx.doi.org/10.1016/j.websem.2009.09.006 • Mathias Konrath, Thomas Gottron, Steffen Staab, Ansgar Scherp: SchemEX - Efficient construction of a data catalogue by stream-based indexing of linked data. J. of Web Semantics: Science, Services and Agents on the World Wide Web, available online 23 June 2012. URL: http://www.sciencedirect.com/science/article/pii/S1570826812000716 • Tom Heath, Christian Bizer: Linked Data: Evolving the Web into a Global Data Space. Morgan & Claypool Publishers, 2011. URL: http://dx.doi.org/10.2200/S00334ED1V01Y201102WBE001