Insiders
                                                            January
                                                               2010


                  Using the Web of Data
                            for
                  Information Extraction


[Tag cloud: scoobie, sparql, rdfa, D2R Server, rdf, squin, epiphany, Linked Data, OBIE]




                        Benjamin Adrian
                        http://www.dfki.uni-kl.de/~adrian
Are you still surfing ...

… or overloaded?

A simple question ...

What are the cities of the universities in Rhineland Palatinate and
what is the unemployment rate of these cities?

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX eurostat: <http://www4.wiwiss.fu-berlin.de/eurostat/resource/eurostat/>
PREFIX dbpedia: <http://dbpedia.org/ontology/>
PREFIX dbpedia_cat: <http://dbpedia.org/resource/Category:>

SELECT ?dbpcity ?cityName ?ur WHERE {
?uni      skos:subject dbpedia_cat:Universities_and_colleges_in_Rhineland-Palatinate;
          dbpedia:city                       ?dbpcity .
?dbpcity  owl:sameAs                         ?statcity.
?statcity rdfs:label                         ?cityName ;
          eurostat:unemployment_rate_total ?ur
}
                 http://www.w3.org/TR/rdf-sparql-query/

… and its answer.



         dbpcity                                      cityName          ur

         http://dbpedia.org/resource/Koblenz          Koblenz           8.8
         http://dbpedia.org/resource/Trier            Trier             7.3




Data Sources:

 http://epp.eurostat.ec.europa.eu                       http://wiki.dbpedia.org
 http://www4.wiwiss.fu-berlin.de/eurostat/


Query Engine:    SQUIN - Query the Web of Linked Data
                 http://squin.sourceforge.net/




So much data out there, too much?

What data do you have?

Are you still surfing ...

Agenda


In order to use the Web of Data for information
extraction, you have to understand its basics.
●   RDF on one slide
●   Publish data in RDF with D2R Server
●   Publish RDF as Linked Data
●   Query Linked Data with SPARQL and Squin
●   Use RDF for information extraction
●   Bring Linked Data to text via RDFa


Wouldn't this be nice.

[Figure, built up over four slides: Data, Text, Extraction Pipeline, User-defined Filter, Extraction Results, annotated text; arrows labeled "populate", "enrich", and "annotate".]

RDF on one slide

@prefix dblp_author: <http://dblp.l3s.de/d2r/page/authors/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dc: <http://purl.org/dc/terms/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix acm: <http://acm.rkbexplorer.com/description/> .

dblp_author:Michael_Gillmann
    foaf:name      "Michael Gillmann" ;
    rdfs:seeAlso   <http://www.bibsonomy.org/uri/author/Michael+Gillmann> ;
    rdf:type       foaf:Agent ;
    owl:sameAs     acm:person-197117-81d3fccbfd0249fc33c0d00f03a30af4 ;
    foaf:isMakerOf <http://dblp.l3s.de/d2r/resource/publications/dblp_pub:conf/icdar/SchulzEGAAD09> .

<http://dblp.l3s.de/d2r/resource/publications/dblp_pub:conf/icdar/SchulzEGAAD09>
    dc:creator dblp_author:Michael_Gillmann ;
    dc:creator dblp_author:Markus_Ebbecke ;
    dc:title   "Seizing the Treasure: Transferring Knowledge in Invoice Analysis" .

* From: http://sig.ma/entity/ddcb76b935e91940e5508a460619a2ac.rdf

Highlighted in turn in this example: Vocabularies, URLs / URIs, Subjects, Predicates, Objects.

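
As a rough illustration of the parts named above, the example can be written down as plain (subject, predicate, object) tuples. This is a minimal Python sketch, not an RDF library: the `expand` helper, the `Literal` wrapper, and the shortened prefix table are assumptions made for the illustration, and only some of the statements from the slide are included.

```python
from collections import namedtuple

# Wrapper that distinguishes literal values from URIs (an assumption of this sketch).
Literal = namedtuple("Literal", "value")

# Vocabularies: prefix table copied from the slide (only the prefixes used below).
PREFIXES = {
    "dblp_author": "http://dblp.l3s.de/d2r/page/authors/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/terms/",
}

def expand(curie):
    """Hypothetical helper: turn 'prefix:local' into a full URI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local

# Publication URI as written on the slide.
PUB = "http://dblp.l3s.de/d2r/resource/publications/dblp_pub:conf/icdar/SchulzEGAAD09"
AUTHOR = expand("dblp_author:Michael_Gillmann")

# Each statement is one (subject, predicate, object) triple.
TRIPLES = [
    (AUTHOR, expand("foaf:name"), Literal("Michael Gillmann")),
    (AUTHOR, expand("rdf:type"), expand("foaf:Agent")),
    (PUB, expand("dc:creator"), AUTHOR),
    (PUB, expand("dc:creator"), expand("dblp_author:Markus_Ebbecke")),
    (PUB, expand("dc:title"),
     Literal("Seizing the Treasure: Transferring Knowledge in Invoice Analysis")),
]

subjects = {s for s, _, _ in TRIPLES}
predicates = {p for _, p, _ in TRIPLES}
objects = {o for _, _, o in TRIPLES}
```

Note that the author URI appears both as a subject and as an object, which is what links the two descriptions together.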
RDF data is graph data.
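
One way to read that statement: every triple is a labeled edge from its subject node to its object node. A minimal Python sketch, using shortened names from the earlier example (the `pub:` prefix is made up here for brevity):

```python
from collections import defaultdict

# Triples written with prefixed names for brevity (illustrative data).
triples = [
    ("dblp_author:Michael_Gillmann", "foaf:isMakerOf", "pub:SchulzEGAAD09"),
    ("pub:SchulzEGAAD09", "dc:creator", "dblp_author:Michael_Gillmann"),
    ("pub:SchulzEGAAD09", "dc:creator", "dblp_author:Markus_Ebbecke"),
]

def to_graph(triples):
    """Build an adjacency map: node -> list of (edge label, target node)."""
    graph = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))
    return graph

def reachable(graph, start):
    """All nodes reachable from `start` by following edges (depth-first)."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(target for _, target in graph.get(node, []))
    return seen

graph = to_graph(triples)
nodes = reachable(graph, "dblp_author:Michael_Gillmann")
```

Starting from one author, graph traversal reaches the publication and, through it, the co-author.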

Publishing relational data in RDF

D2R Server - Publishing Relational Databases on the Semantic Web

   http://www4.wiwiss.fu-berlin.de/bizer/d2r-server/

Two small command line calls. First generate the mapping file, then start the server with it:

   ./generate-mapping \
        -o mydatabase.n3 \
        -b http://projects.dfki.uni-kl.de/mydatabase/ \
        jdbc:mysql://localhost:3306/mydatabase

   ./d2r-server \
        -p 80 \
        -b http://projects.dfki.uni-kl.de/mydatabase/ \
        mydatabase.n3
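
Conceptually, such a mapping turns every table row into a resource and every column value into a triple. The following Python sketch shows only that idea; it is not D2R Server's mapping language or implementation, and the table name, column names, and the `resource/` and `vocab/` path segments are assumptions made for the example.

```python
BASE = "http://projects.dfki.uni-kl.de/mydatabase/"  # base URI from the -b option above

def row_to_triples(table, primary_key, row):
    """Map one relational row (a dict) to (subject, predicate, object) triples."""
    subject = f"{BASE}resource/{table}/{row[primary_key]}"
    triples = []
    for column, value in row.items():
        if value is None:          # NULL columns produce no triple
            continue
        predicate = f"{BASE}vocab/{table}_{column}"
        triples.append((subject, predicate, value))
    return triples

# Hypothetical row of a hypothetical "employees" table.
row = {"id": 7, "name": "Alice", "phone": None}
triples = row_to_triples("employees", "id", row)
```

The primary key ends up in the subject URI, so the same row always gets the same name on the Web.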


Linked Data: Linking RDF data from different sources

[Figure: four datasets: Customer DB, Employees DB, Project DB, DBpedia.]

How to interlink these datasets?


Linked Data Principles (Tim Berners-Lee, 2006)

1. Use URIs as names for things (e.g., http://dbpedia.org/resource/Berlin)
2. Use HTTP URIs so that people can look up those names
3. Provide useful information in RDF when someone looks up a URI
4. Include links to other URIs to enable discovery of more information

Example:

<http://dbpedia.org/resource/Berlin>
    owl:sameAs opencyc:en/CityOfBerlinGermany ;
    owl:sameAs opencyc:en/Berlin_StateGermany ;
    owl:sameAs <http://sws.geonames.org/2950159/> ;
    owl:sameAs <http://www4.wiwiss.fu-berlin.de/eurostat/resource/regions/Berlin> ;
    owl:sameAs freebase:http://dbpedia.org/resource/Berlin .
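
owl:sameAs links like these let a consumer treat several URIs as names for one thing. A minimal Python sketch of that grouping, using a small union-find; the function name is illustrative, and only the links that are spelled out as full URIs on the slide are used:

```python
def same_as_groups(links):
    """Group URIs connected by owl:sameAs links (treated as symmetric and transitive)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())

BERLIN = "http://dbpedia.org/resource/Berlin"
links = [
    (BERLIN, "http://sws.geonames.org/2950159/"),
    (BERLIN, "http://www4.wiwiss.fu-berlin.de/eurostat/resource/regions/Berlin"),
]
groups = same_as_groups(links)
```

All three URIs end up in one group, so facts published under any of them can be combined.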


SPARQL: Querying RDF data



SPARQL - the RDF query language.
In contrast to SQL, its data model is not set-oriented but graph-oriented.

Some examples:

     Resulting in tuples:
     PREFIX foaf: <http://xmlns.com/foaf/0.1/>
     SELECT ?interest ?friend WHERE {
         <http://www.w3.org/People/Berners-Lee/card#i> foaf:knows ?friend .
         ?friend foaf:interest ?interest .
     }

     Resulting in a graph:
     PREFIX foaf: <http://xmlns.com/foaf/0.1/>
     CONSTRUCT { ?friend foaf:interest ?interest } WHERE {
         <http://www.w3.org/People/Berners-Lee/card#i> foaf:knows ?friend .
         ?friend foaf:interest ?interest .
     }
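
To make the difference between the two result forms concrete, here is a toy matcher for basic graph patterns in Python. It is a sketch of the idea only, not a SPARQL engine, and the example graph with its `ex:` names is invented.

```python
def match(pattern, triple, binding):
    """Try to extend `binding` so that `pattern` equals `triple`; return None on mismatch."""
    binding = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):                 # variable: bind it or check the binding
            if binding.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:                     # constant: must be identical
            return None
    return binding

def solve(patterns, graph):
    """Return all variable bindings that satisfy every triple pattern."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for t in graph
                    if (b2 := match(pattern, t, b)) is not None]
    return bindings

graph = [
    ("ex:tim", "foaf:knows", "ex:alice"),
    ("ex:tim", "foaf:knows", "ex:bob"),
    ("ex:alice", "foaf:interest", "ex:rdf"),
]
where = [("ex:tim", "foaf:knows", "?friend"),
         ("?friend", "foaf:interest", "?interest")]

solutions = solve(where, graph)

# SELECT ?interest ?friend: a table of tuples
select_result = [(b["?interest"], b["?friend"]) for b in solutions]

# CONSTRUCT { ?friend foaf:interest ?interest }: a new graph (set of triples)
construct_result = {(b["?friend"], "foaf:interest", b["?interest"]) for b in solutions}
```

Both forms share the same WHERE matching; they differ only in how the resulting bindings are projected.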




SPARQL: Query Linked Data from different sources

How to access these datasets with a single SPARQL query?

[Figure: Customer DB, Employees DB, Project DB, DBpedia; four D2R Server instances; SQUIN in the center.]

Squin: Query the Web of Linked Data
http://squin.sourceforge.net/

Squin follows a Link Traversal approach over HTTP URIs.

Remember:

SELECT DISTINCT ?c ?cityName ?ur
WHERE {
  ?u skos:subject dbpedia_cat:Universities_and_colleges_in_Rhineland-Palatinate ;
     dbpedia:city ?c .
  ?c owl:sameAs [ rdfs:label ?cityName ;
                  eurostat:unemployment_rate_total ?ur ]
}
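
The link traversal idea can be sketched as follows: start from the URIs mentioned in the query, look each one up, add the returned triples to a local graph, and keep following newly discovered URIs. The Python below is a simplified illustration and not SQUIN's implementation; it leaves out query evaluation entirely, and the `WEB` dictionary with its `ex:` URIs stands in for real HTTP lookups.

```python
from collections import deque

# Stand-in for the Web: URI -> triples returned when that URI is looked up.
WEB = {
    "ex:uni": [("ex:uni", "dbpedia:city", "ex:city")],
    "ex:city": [("ex:city", "owl:sameAs", "ex:statcity")],
    "ex:statcity": [("ex:statcity", "eurostat:unemployment_rate_total", "8.8")],
}

def lookup(uri):
    """Hypothetical dereferencing step; a real client would issue an HTTP GET."""
    return WEB.get(uri, [])

def traverse(seed_uris, max_lookups=100):
    """Breadth-first link traversal: collect triples reachable from the seeds."""
    graph, seen, queue = set(), set(), deque(seed_uris)
    while queue and len(seen) < max_lookups:
        uri = queue.popleft()
        if uri in seen:
            continue
        seen.add(uri)
        for s, p, o in lookup(uri):
            graph.add((s, p, o))
            # Toy check: only terms containing a colon are treated as URIs to follow.
            queue.extend(t for t in (s, o) if ":" in t and t not in seen)
    return graph

graph = traverse(["ex:uni"])
```

The owl:sameAs link is what carries the traversal from one dataset into the other, which is why interlinking matters for this style of querying.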

Using RDF and Linked Data for Information Extraction

[Figure: User, Linked Data, Query, Text, Extraction Pipeline, Result Graph; arrows labeled "asks", "question", "about", "to", "answers".]


What data do we have?
Example RDF data
<http://dblp.l3s.de/d2r/resource/publications/dblp_pub:conf/icdar/SchulzEGAAD09>
    rdf:type     foaf:Document ;
    dc:creator   dblp_author:Markus_Ebbecke ;
    dc:title     "Seizing the Treasure: Transferring Knowledge in Invoice Analysis" .

Classes        Instances            Datatype Properties   Object Properties   Literals
foaf:Document  .../SchulzEGAAD09    dc:title              dc:creator          "Markus"
foaf:Person    .../Markus_Ebbecke   foaf:name             foaf:knows          "Ebbecke"
                                    foaf:firstName                            "Seizing the Treasure: Transferring
                                    foaf:surName                               Knowledge in Invoice Analysis"
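
A rough sketch of how such a table could be derived from RDF data: walk over the triples and sort each term into one of the five columns. This is an illustrative heuristic in Python, not SCOOBIE's actual procedure; literals are modeled with a small `Literal` wrapper, which is an assumption of the sketch.

```python
from collections import namedtuple

Literal = namedtuple("Literal", "value")
RDF_TYPE = "rdf:type"

def summarize(triples):
    """Sort the terms of an RDF graph into the five columns of the table above."""
    cols = {name: set() for name in
            ("classes", "instances", "datatype_properties",
             "object_properties", "literals")}
    for s, p, o in triples:
        cols["instances"].add(s)
        if p == RDF_TYPE:                      # object of rdf:type is a class
            cols["classes"].add(o)
        elif isinstance(o, Literal):           # property pointing at a literal
            cols["datatype_properties"].add(p)
            cols["literals"].add(o.value)
        else:                                  # property pointing at another resource
            cols["object_properties"].add(p)
            cols["instances"].add(o)
    return cols

PUB = ".../SchulzEGAAD09"          # abbreviated as on the slide
AUTHOR = ".../Markus_Ebbecke"
triples = [
    (PUB, RDF_TYPE, "foaf:Document"),
    (PUB, "dc:creator", AUTHOR),
    (PUB, "dc:title",
     Literal("Seizing the Treasure: Transferring Knowledge in Invoice Analysis")),
]
table = summarize(triples)
```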

SCOOBIE: Domain Adaptation

[Figure: Structured Data (Vocabulary Data, Instance Data), Text Corpus Data, Patterns and Gazetteers Data; two phases: Data Preprocessing & Learning (offline) and Information Extraction (online).]

SCOOBIE: Eco System

[Figure: layers Session Data, Tasks, API; elements Index, Domain Knowledge, Models, Text Corpus, Training Corpus, Instances, Ontology, Patterns + Gazetteers; tasks Preprocess, Train, Extract.]

SCOOBIE: OBIE Pipeline

Normalization                        Text Extraction
                                     Language Detection
Segmentation                         Tokenization
                                     Sentence Extraction
                                     POS-Tagging
Symbolization                        Named Entity Recognition
                                     Structured Entity Recognition
                                     Noun Phrase Chunking
                                     Symbol Recognition
Instantiation                        Instance Recognition
                                     Instance Disambiguation
                                     Chunk Classification
Contextualization                    Fact Extraction
                                     Fact Selection
Population                           Query Answering
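
The staged structure can be pictured as a chain of functions that each enrich a shared document record. The Python sketch below implements only toy versions of two of the steps (tokenization and a gazetteer-based instance recognition); the stage internals, the `doc` dictionary layout, and the gazetteer entry are assumptions, not SCOOBIE's API.

```python
import re

# Hypothetical gazetteer: lowercase label -> instance URI (e.g. built from RDF labels).
GAZETTEER = {"dfki": "dbpedia:DFKI"}

def segmentation(doc):
    """Tokenization: split the text into word tokens with character offsets."""
    doc["tokens"] = [(m.group(), m.start(), m.end())
                     for m in re.finditer(r"\w+", doc["text"])]
    return doc

def instantiation(doc):
    """Instance recognition: look up each token in the gazetteer."""
    doc["instances"] = [(GAZETTEER[tok.lower()], start, end)
                        for tok, start, end in doc["tokens"]
                        if tok.lower() in GAZETTEER]
    return doc

def run_pipeline(text, stages):
    """Pass one document record through the stages in order."""
    doc = {"text": text}
    for stage in stages:
        doc = stage(doc)
    return doc

result = run_pipeline("I'm working at DFKI.", [segmentation, instantiation])
```

Keeping character offsets with every recognized instance is what later makes it possible to write annotations back into the text.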

Used Machine Learning Models

Semi-Supervised Learning
    CRF-based Noun Phrase Chunker

Supervised Learning
    Gazetteer matching statistics (Named Entity Recognition)
    Regex matching statistics (Structured Entity Recognition)

Unsupervised or Instance-based Learning
    TF/IDF-based instance re-ranking (Instance Disambiguation)
    K-Nearest-Neighbor chunk classifier (Chunk Classification)
    Spreading Activation-based fact ranking (Fact Selection)
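
As one illustration of the instance-based models, here is a small TF/IDF re-ranking sketch in Python: candidate instances for an ambiguous label are ranked by how well their descriptions overlap with the words around the mention. The scoring details and the candidate descriptions (the `ex:` names) are assumptions for the example, not SCOOBIE's actual model.

```python
import math
from collections import Counter

def tf_idf_rank(context_words, candidates):
    """Rank candidates (name -> description words) by TF/IDF overlap with the context."""
    n = len(candidates)
    # Document frequency: in how many candidate descriptions does a word occur?
    df = Counter(w for words in candidates.values() for w in set(words))
    scores = {}
    for name, words in candidates.items():
        tf = Counter(words)
        # Words shared by many candidates get a low weight, rare ones a high weight.
        scores[name] = sum(tf[w] * (math.log(n / df[w]) + 1.0)
                           for w in set(context_words) if w in tf)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical candidates for the ambiguous label "Paris".
candidates = {
    "ex:Paris_France": ["city", "capital", "france", "seine"],
    "ex:Paris_Texas": ["city", "texas", "usa"],
}
ranking = tf_idf_rank(["the", "capital", "of", "france"], candidates)
```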

Used Machine Learning: Conditional Random Field

CRFs are sequence taggers:

Train it with:   Bill      CAPITALIZED   noun
                 slept     LOWERCASE     non-noun
                 here      LOWERCASE     non-noun

Test it with:    He        CAPITALIZED
                 visited   LOWERCASE
                 London    CAPITALIZED

CRF results:     noun
                 non-noun
                 non-noun

MALLET - MAchine Learning for LanguagE Toolkit
http://mallet.cs.umass.edu/
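
Input of this shape can be produced by a small feature extractor. The Python sketch below writes one token per line as `token FEATURE label`, mirroring the columns on the slide. As far as I know this is also the layout that MALLET's SimpleTagger reads (features first, label last), but treat the exact file format as an assumption and check the MALLET documentation. The helper names are illustrative.

```python
def shape_feature(token):
    """Orthographic feature used on the slide: CAPITALIZED or LOWERCASE."""
    return "CAPITALIZED" if token[:1].isupper() else "LOWERCASE"

def to_tagger_lines(tokens, labels=None):
    """One line per token: 'token FEATURE [label]'. Labels are omitted for unlabeled data."""
    lines = []
    for i, token in enumerate(tokens):
        parts = [token, shape_feature(token)]
        if labels is not None:
            parts.append(labels[i])
        lines.append(" ".join(parts))
    return lines

train_lines = to_tagger_lines(["Bill", "slept", "here"],
                              ["noun", "non-noun", "non-noun"])
test_lines = to_tagger_lines(["He", "visited", "London"])
```

The tagger itself is not reimplemented here; the sketch only covers the step of turning tokens into the feature lines a sequence tagger consumes.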

Bringing Linked Data to Text

Annotate plain text or HTML with RDF data.
   I'm working at DFKI.

RDFa offers an HTML extension:

   I'm working at
   <span about="dbpedia:DFKI" property="rdfs:label">
   DFKI</span>

Now let's generate RDFa automatically ...
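
A minimal sketch of what automatic RDFa generation could look like, in Python: every occurrence of a known label is wrapped in a span like the one above. The label table and the function name are assumptions for illustration, and plain string matching is used here in place of a real extraction pipeline.

```python
import html
import re

# Hypothetical label -> URI table; in practice it would come from an RDF data set.
LABELS = {"DFKI": "dbpedia:DFKI"}

def annotate(text, labels):
    """Wrap each known label in an RDFa span; escape all other text as HTML."""
    if not labels:
        return html.escape(text, quote=False)
    # Longest labels first, so that a longer label wins over its own prefix.
    alternatives = sorted(labels, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(label) for label in alternatives))
    out, pos = [], 0
    for m in pattern.finditer(text):
        out.append(html.escape(text[pos:m.start()], quote=False))
        out.append('<span about="{}" property="rdfs:label">{}</span>'.format(
            html.escape(labels[m.group()], quote=True),
            html.escape(m.group(), quote=False)))
        pos = m.end()
    out.append(html.escape(text[pos:], quote=False))
    return "".join(out)

annotated = annotate("I'm working at DFKI.", LABELS)
```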




Do you remember?

[Figure, repeated from "Wouldn't this be nice.": Data, Text, Extraction Pipeline, User-defined Filter, Extraction Results, annotated text; arrows labeled "populate", "enrich", and "annotate".]

Insiders
RDF Epiphany                                               January
                                                              2010



                                      Epiphany takes the
                                      original webpage
                                       …




  Benjamin Adrian
  http://www.dfki.uni-kl.de/~adrian                                   38
Insiders
RDF Epiphany                                               January
                                                              2010



                                      Epiphany takes the
                                      original webpage
                                       …
                                      and SCOOBIE initialized
                                      with an RDF data set
                                      …




  Benjamin Adrian
  http://www.dfki.uni-kl.de/~adrian                                   39
Insiders
RDF Epiphany                                                 January
                                                                2010



                                      Epiphany takes the
                                      original webpage
                                       …
                                      and SCOOBIE initialized
                                      with an RDF data set
                                      …
                                      It extracts RDF information
                                      from text and annotates it as
                                      RDFa
                                      …




  Benjamin Adrian
  http://www.dfki.uni-kl.de/~adrian                                     40
RDF Epiphany

Epiphany takes the original webpage and SCOOBIE, initialized with an RDF Linked Data set.
It extracts RDF information from text and annotates it as RDFa.
Clicking on RDFa annotations opens further information from the Linked Data set.

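The annotation step described above can be illustrated with a small sketch. This is not Epiphany's or SCOOBIE's implementation; it only assumes that a label-to-URI lookup has already been derived from an RDF data set. The name `annotate_rdfa` and the `LABELS` table are hypothetical, and the `rdfs` prefix is assumed to be declared in the surrounding page.

```python
import html
import re

# Hypothetical label-to-URI lookup; a real system would derive it from an RDF data set.
LABELS = {
    "Koblenz": "http://dbpedia.org/resource/Koblenz",
    "Trier": "http://dbpedia.org/resource/Trier",
}


def annotate_rdfa(text, labels):
    """Wrap every known label found in plain text with an RDFa span naming its URI."""
    if not labels:
        return html.escape(text)
    # Try longer labels first so that a long name wins over a shorter prefix of it.
    ordered = sorted(labels, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(label) for label in ordered) + r")\b")
    parts = []
    last = 0
    for match in pattern.finditer(text):
        # Escape the untouched text between two matches.
        parts.append(html.escape(text[last:match.start()]))
        uri = html.escape(labels[match.group(1)], quote=True)
        label = html.escape(match.group(1))
        # The rdfs prefix is assumed to be declared by the enclosing document.
        parts.append('<span about="%s" property="rdfs:label">%s</span>' % (uri, label))
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)


if __name__ == "__main__":
    print(annotate_rdfa("Universities in Trier and Koblenz", LABELS))
```

A plain dictionary lookup keeps the sketch short; it does no disambiguation, which a real extraction pipeline would need when one label maps to several URIs.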
RDF Epiphany

At a glance
●   Epiphany is a free web service.
●   Epiphany uses SCOOBIE.
●   Epiphany can be initialized with any RDF Linked Data set.
●   Epiphany generates an RDF document about a web page.
●   Epiphany annotates RDF as RDFa in the web page.

http://projects.dfki.uni-kl.de/epiphany/

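To make "an RDF document about a web page" concrete, here is a minimal sketch that serializes the link between a page and the entities found in it as N-Triples. The predicate `http://purl.org/dc/terms/references` and the function name `page_triples` are assumptions chosen for illustration; the slides do not say which vocabulary Epiphany actually uses.

```python
# Assumed predicate for "this page mentions that resource"; not taken from the slides.
DCTERMS_REFERENCES = "http://purl.org/dc/terms/references"


def page_triples(page_uri, entity_uris):
    """Return N-Triples lines that link a web page to the entity URIs found in it."""
    lines = []
    # Deduplicate and sort so the output is stable across runs.
    for uri in sorted(set(entity_uris)):
        lines.append("<%s> <%s> <%s> ." % (page_uri, DCTERMS_REFERENCES, uri))
    return "\n".join(lines)


if __name__ == "__main__":
    print(page_triples(
        "http://example.org/page",  # hypothetical page URL
        ["http://dbpedia.org/resource/Trier", "http://dbpedia.org/resource/Koblenz"],
    ))
```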
Summary

[Diagram labels: Customer DB, Employees DB, Project DB, DBpedia; four D2R Server boxes with SQUIN between them; Text, annotated text, User-defined Filter, Extraction Pipeline, Extraction Results. Arrow labels: annotate, populate, enrich.]

Outlook

[Diagram labels: Customer DB, Employees DB, Project DB, DBpedia; four D2R Server boxes with SQUIN between them; E-Mail, annotated E-Mail, User-defined Filter, Extraction Pipeline, Extraction Results. Arrow labels: annotate, populate, enrich.]

Thank you!

scoobie, sparql, rdfa, D2R server, rdf, squin, epiphany, Linked Data, OBIE

Using the Web of Data for Information Extraction

  • 1. Insiders January 2010 Using the Web of Data for Information Extraction scoobie sparql rdfa D2R server rdf squin epiphany Linked Data OBIE Benjamin Adrian http://www.dfki.uni-kl.de/~adrian
  • 2. Insiders Are you still surfing ... January 2010 Benjamin Adrian http://www.dfki.uni-kl.de/~adrian
  • 3. Insiders … or overloaded? January 2010 Benjamin Adrian http://www.dfki.uni-kl.de/~adrian
  • 4. Insiders A simple question ... January 2010 What are the cities of the universities in Rhineland Palatinate and what is the unemployment rate of these cities? Benjamin Adrian http://www.dfki.uni-kl.de/~adrian
  • 5. Insiders A simple question ... January 2010 What are the cities of the universities in Rhineland Palatinate and what is the unemployment rate of these cities? PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX owl: <http://www.w3.org/2002/07/owl#> PREFIX skos: <http://www.w3.org/2004/02/skos/core#> PREFIX eurostat: <http://www4.wiwiss.fu-berlin.de/eurostat/resource/eurostat/> PREFIX dbpedia: <http://dbpedia.org/ontology/> PREFIX dbpedia_cat: <http://dbpedia.org/resource/Category> SELECT ?dbpcity ?cityName ?ur WHERE { ?uni skos:subject dbpedia_cat:Universities_and_colleges_in_Rhineland-Palatinate; dbpedia:city ?dbpcity . ?dbpcity owl:sameAs ?statcity. ?statcity rdfs:label ?cityName ; eurostat:unemployment_rate_total ?ur } http://www.w3.org/TR/rdf-sparql-query/ Benjamin Adrian http://www.dfki.uni-kl.de/~adrian
  • 6. Insiders … and its answer. January 2010 dbpcity cityName ur http://dbpedia.org/resource/Koblenz Koblenz 8.8 http://dbpedia.org/resource/Trier Trier 7.3 Data Sources: http://epp.eurostat.ec.europa.eu http://wiki.dbpedia.org http://www4.wiwiss.fu-berlin.de/eurostat/ Query Engine: SQUIN - Query the Web of Linked Data http://squin.sourceforge.net/ Benjamin Adrian http://www.dfki.uni-kl.de/~adrian
  • 7. So much data out there, Insiders January too much? 2010 Benjamin Adrian http://www.dfki.uni-kl.de/~adrian
  • 8. Insiders What data do you have? January 2010 Benjamin Adrian http://www.dfki.uni-kl.de/~adrian
  • 9. Insiders Are you still surfing ... January 2010 Benjamin Adrian http://www.dfki.uni-kl.de/~adrian
  • 10. Insiders Agenda January 2010 In order to use Web of Data for information extraction, you have to understand its basics. ● RDF on one slide ● Publish data in RDF with D2R Server ● Publish RDF as Linked Data ● Query Linked Data with SPARQL and Squin ● Use RDF for information extraction ● Bring Linked Data to text via RDFa Benjamin Adrian http://www.dfki.uni-kl.de/~adrian
  • 11. Insiders Wouldn't this be nice. January 2010 Data Benjamin Adrian http://www.dfki.uni-kl.de/~adrian 11
  • 12. Insiders Wouldn't this be nice. January 2010 Data Text User-defined Filter Ex tra ct io n Pi pe l in e Extraction Results enrich Benjamin Adrian http://www.dfki.uni-kl.de/~adrian 12
  • 13. Insiders Wouldn't this be nice. January 2010 annotated Data Text text User-defined Filter Ex annotate tra ct io n Pi pe l in e Extraction Results enrich Benjamin Adrian http://www.dfki.uni-kl.de/~adrian 13
  • 14. Insiders Wouldn't this be nice. January 2010 annotated Data Text text User-defined Filter Ex annotate tra ct io n Pi pe populate l in e Extraction Results enrich Benjamin Adrian http://www.dfki.uni-kl.de/~adrian 14
  • 15. Insiders RDF on one slide January 2010 @prefix dblp_author: <http://dblp.l3s.de/d2r/page/authors/> . @prefix foaf: <http://xmlns.com/foaf/0.1/> . @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> . @prefix dc: <http://purl.org/dc/terms/> . @prefix owl: <http://www.w3.org/2002/07/owl#> . @prefix acm: <http://acm.rkbexplorer.com/description/> . dblp_author:Michael_Gillmann foaf:name „Michael Gillmann“ ; rdfs:seeAlso <http://www.bibsonomy.org/uri/author/Michael+Gillmann> ; rdf:type foaf:Agent ; owl:sameAs acm:person-197117-81d3fccbfd0249fc33c0d00f03a30af4 ; foaf:isMakerOf <http://dblp.l3s.de/d2r/resource/publications//icdar/SchulzEGAAD09> . <http://dblp.l3s.de/d2r/resource/publications/conf/icdar/SchulzEGAAD09> dc:creator dblp_author:Michael_Gillmann ; dc:creator dblp_author:Markus_Ebbecke ; dc:title „Seizing the Treasure: Transferring Knowledge in Invoice Analysis“ . * From: http://sig.ma/entity/ddcb76b935e91940e5508a460619a2ac.rdf Benjamin AdrianFound at: http://www.dfki.uni-kl.de/~adrian
  • 16. Insiders RDF on one slide January 2010 @prefix dblp_author: <http://dblp.l3s.de/d2r/page/authors/> . @prefix foaf: <http://xmlns.com/foaf/0.1/> . @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . Vocabularies @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> . @prefix dc: <http://purl.org/dc/terms/> . @prefix owl: <http://www.w3.org/2002/07/owl#> . @prefix acm: <http://acm.rkbexplorer.com/description/> . dblp_author:Michael_Gillmann foaf:name „Michael Gillmann“ ; rdfs:seeAlso <http://www.bibsonomy.org/uri/author/Michael+Gillmann> ; rdf:type foaf:Agent ; owl:sameAs acm:person-197117-81d3fccbfd0249fc33c0d00f03a30af4 ; foaf:isMakerOf <http://dblp.l3s.de/d2r/resource/publications/dblp_pub:conf/icdar/SchulzEGAAD09> . <http://dblp.l3s.de/d2r/resource/publications/dblp_pub:conf/icdar/SchulzEGAAD09> dc:creator dblp_author:Michael_Gillmann ; dc:creator dblp_author:Markus_Ebbecke ; dc:title „Seizing the Treasure: Transferring Knowledge in Invoice Analysis“ . * From: http://sig.ma/entity/ddcb76b935e91940e5508a460619a2ac.rdf Benjamin AdrianFound at: http://www.dfki.uni-kl.de/~adrian
  • 17. Insiders RDF on one slide January 2010 @prefix dblp_author: <http://dblp.l3s.de/d2r/page/authors/> . @prefix foaf: <http://xmlns.com/foaf/0.1/> . @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . URLs / URIs @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> . @prefix dc: <http://purl.org/dc/terms/> . @prefix owl: <http://www.w3.org/2002/07/owl#> . @prefix acm: <http://acm.rkbexplorer.com/description/> . dblp_author:Michael_Gillmann foaf:name „Michael Gillmann“ ; rdfs:seeAlso <http://www.bibsonomy.org/uri/author/Michael+Gillmann> ; rdf:type foaf:Agent ; owl:sameAs acm:person-197117-81d3fccbfd0249fc33c0d00f03a30af4 ; foaf:isMakerOf <http://dblp.l3s.de/d2r/resource/publications/dblp_pub:conf/icdar/SchulzEGAAD09> . <http://dblp.l3s.de/d2r/resource/publications/dblp_pub:conf/icdar/SchulzEGAAD09> dc:creator dblp_author:Michael_Gillmann ; dc:creator dblp_author:Markus_Ebbecke ; dc:title „Seizing the Treasure: Transferring Knowledge in Invoice Analysis“ . * From: http://sig.ma/entity/ddcb76b935e91940e5508a460619a2ac.rdf Benjamin AdrianFound at: http://www.dfki.uni-kl.de/~adrian
  • 18. Insiders RDF on one slide January 2010 @prefix dblp_author: <http://dblp.l3s.de/d2r/page/authors/> . @prefix foaf: <http://xmlns.com/foaf/0.1/> . @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . Subjects @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> . @prefix dc: <http://purl.org/dc/terms/> . @prefix owl: <http://www.w3.org/2002/07/owl#> . @prefix acm: <http://acm.rkbexplorer.com/description/> . dblp_author:Michael_Gillmann foaf:name „Michael Gillmann“ ; rdfs:seeAlso <http://www.bibsonomy.org/uri/author/Michael+Gillmann> ; rdf:type foaf:Agent ; owl:sameAs acm:person-197117-81d3fccbfd0249fc33c0d00f03a30af4 ; foaf:isMakerOf <http://dblp.l3s.de/d2r/resource/publications/dblp_pub:conf/icdar/SchulzEGAAD09> . <http://dblp.l3s.de/d2r/resource/publications/dblp_pub:conf/icdar/SchulzEGAAD09> dc:creator dblp_author:Michael_Gillmann ; dc:creator dblp_author:Markus_Ebbecke ; dc:title „Seizing the Treasure: Transferring Knowledge in Invoice Analysis“ . * From: http://sig.ma/entity/ddcb76b935e91940e5508a460619a2ac.rdf Benjamin AdrianFound at: http://www.dfki.uni-kl.de/~adrian
  • 19. Insiders RDF on one slide January 2010 @prefix dblp_author: <http://dblp.l3s.de/d2r/page/authors/> . @prefix foaf: <http://xmlns.com/foaf/0.1/> . @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . Predicates @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> . @prefix dc: <http://purl.org/dc/terms/> . @prefix owl: <http://www.w3.org/2002/07/owl#> . @prefix acm: <http://acm.rkbexplorer.com/description/> . dblp_author:Michael_Gillmann foaf:name „Michael Gillmann“ ; rdfs:seeAlso <http://www.bibsonomy.org/uri/author/Michael+Gillmann> ; rdf:type foaf:Agent ; owl:sameAs acm:person-197117-81d3fccbfd0249fc33c0d00f03a30af4 ; foaf:isMakerOf <http://dblp.l3s.de/d2r/resource/publications/dblp_pub:conf/icdar/SchulzEGAAD09> . <http://dblp.l3s.de/d2r/resource/publications/dblp_pub:conf/icdar/SchulzEGAAD09> dc:creator dblp_author:Michael_Gillmann ; dc:creator dblp_author:Markus_Ebbecke ; dc:title „Seizing the Treasure: Transferring Knowledge in Invoice Analysis“ . * From: http://sig.ma/entity/ddcb76b935e91940e5508a460619a2ac.rdf Benjamin AdrianFound at: http://www.dfki.uni-kl.de/~adrian
  • 20. Insiders RDF on one slide January 2010 @prefix dblp_author: <http://dblp.l3s.de/d2r/page/authors/> . @prefix foaf: <http://xmlns.com/foaf/0.1/> . @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> . Objects @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> . @prefix dc: <http://purl.org/dc/terms/> . @prefix owl: <http://www.w3.org/2002/07/owl#> . @prefix acm: <http://acm.rkbexplorer.com/description/> . dblp_author:Michael_Gillmann foaf:name „Michael Gillmann“ ; rdfs:seeAlso <http://www.bibsonomy.org/uri/author/Michael+Gillmann> ; rdf:type foaf:Agent ; owl:sameAs acm:person-197117-81d3fccbfd0249fc33c0d00f03a30af4 ; foaf:isMakerOf <http://dblp.l3s.de/d2r/resource/publications/dblp_pub:conf/icdar/SchulzEGAAD09> . <http://dblp.l3s.de/d2r/resource/publications/dblp_pub:conf/icdar/SchulzEGAAD09> dc:creator dblp_author:Michael_Gillmann ; dc:creator dblp_author:Markus_Ebbecke ; dc:title „Seizing the Treasure: Transferring Knowledge in Invoice Analysis“ . * From: http://sig.ma/entity/ddcb76b935e91940e5508a460619a2ac.rdf Benjamin AdrianFound at: http://www.dfki.uni-kl.de/~adrian
  • 21. RDF data is graph data.
  • 22. Publishing relational data in RDF
  • 23. Publishing relational data in RDF. D2R Server - Publishing Relational Databases on the Semantic Web: http://www4.wiwiss.fu-berlin.de/bizer/d2r-server/ Two small command line calls, first to generate the mapping file and then to start the server with it:
      ./generate-mapping -o mydatabase.n3 -b http://projects.dfki.uni-kl.de/mydatabase/ jdbc:mysql://localhost:3306/mydatabase
      ./d2r-server -p 80 -b http://projects.dfki.uni-kl.de/mydatabase/ mydatabase.n3
  • 24. Linked Data: Linking RDF data from different sources. Diagram labels: Customer DB, Employees DB, Project DB, DBpedia. How to interlink these datasets?
  • 25. Linked Data: Linking RDF data from different sources. Linked Data Principles (TimBL, 2006):
      1. Use URIs as names for things (e.g., http://dbpedia.org/resource/Berlin)
      2. Use HTTP URIs so that people can look up those names
      3. Provide useful information in RDF when someone looks up a URI
      4. Include links to other URIs to enable discovery of more information
    Example:
      <http://dbpedia.org/resource/Berlin>
          owl:sameAs opencyc:en/CityOfBerlinGermany ;
          owl:sameAs opencyc:en/Berlin_StateGermany ;
          owl:sameAs <http://sws.geonames.org/2950159/> ;
          owl:sameAs <http://www4.wiwiss.fu-berlin.de/eurostat/resource/regions/Berlin> ;
          owl:sameAs freebase:http://dbpedia.org/resource/Berlin .
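Principles 2 and 3 amount to an ordinary HTTP lookup in which the client asks for an RDF representation. The sketch below uses only Python's standard library and builds such a request without sending it. The function name and the choice of `application/rdf+xml` as the requested media type are assumptions for illustration; which formats a given server actually offers is not stated on the slide.

```python
import urllib.request


def rdf_request(uri, accept="application/rdf+xml"):
    """Build (but do not send) an HTTP GET request that asks for RDF.

    The Accept header is how a client signals that it wants data
    about the thing named by the URI rather than an HTML page.
    """
    return urllib.request.Request(uri, headers={"Accept": accept})


req = rdf_request("http://dbpedia.org/resource/Berlin")

# To actually dereference the URI (needs network access):
#   with urllib.request.urlopen(req) as response:
#       rdf_document = response.read()
```

Principle 4 then says that the returned document should contain further URIs, such as the `owl:sameAs` targets in the example, which can be looked up the same way.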
  • 26. SPARQL: Querying RDF data. SPARQL is the RDF query language. In contrast to SQL, its data model is not set oriented but graph oriented. Some examples:
    Resulting in tuples:
      PREFIX foaf: <http://xmlns.com/foaf/0.1/>
      SELECT ?interest ?friend WHERE {
        <http://www.w3.org/People/Berners-Lee/card#i> foaf:knows ?friend .
        ?friend foaf:interest ?interest .
      }
    Resulting in a graph:
      PREFIX foaf: <http://xmlns.com/foaf/0.1/>
      CONSTRUCT { ?friend foaf:interest ?interest } WHERE {
        <http://www.w3.org/People/Berners-Lee/card#i> foaf:knows ?friend .
        ?friend foaf:interest ?interest .
      }
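The difference between the two query forms can be sketched in Python over an in-memory list of triples. This is not a SPARQL engine: it hard-codes the two triple patterns from the slide and joins them on `?friend`. The `ex:` resources are hypothetical sample data, not taken from any real FOAF profile.

```python
KNOWS = "foaf:knows"
INTEREST = "foaf:interest"
ME = "<http://www.w3.org/People/Berners-Lee/card#i>"

# Hypothetical sample data for illustration only.
TRIPLES = [
    (ME, KNOWS, "ex:alice"),
    (ME, KNOWS, "ex:bob"),
    ("ex:alice", INTEREST, "ex:semantic_web"),
    ("ex:bob", INTEREST, "ex:chess"),
    ("ex:carol", INTEREST, "ex:music"),  # not known by ME, so never matched
]


def match(triples, start):
    """Join the two triple patterns on ?friend and yield variable bindings."""
    for s1, p1, friend in triples:
        if s1 == start and p1 == KNOWS:
            for s2, p2, interest in triples:
                if s2 == friend and p2 == INTEREST:
                    yield {"friend": friend, "interest": interest}


def select(triples, start):
    """SELECT ?interest ?friend: project the bindings into tuples."""
    return [(b["interest"], b["friend"]) for b in match(triples, start)]


def construct(triples, start):
    """CONSTRUCT: instantiate the template triple once per binding."""
    return [(b["friend"], INTEREST, b["interest"]) for b in match(triples, start)]
```

Both forms share the same pattern matching step; only the shape of the result differs, a table of bindings for SELECT and a new set of triples for CONSTRUCT.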
  • 27. SPARQL: Query Linked Data from different sources. Diagram labels: Customer DB, Employees DB, Project DB, DBpedia. How to access these datasets with a single SPARQL query?
  • 28. SPARQL: Query Linked Data from different sources. Squin: Query the Web of Linked Data http://squin.sourceforge.net/ Squin follows a Link Traversal approach over HTTP URIs. Diagram labels: Customer DB, Employees DB, Project DB, DBpedia, four D2R Server boxes, SQUIN. Remember:
      SELECT DISTINCT ?c ?cityName ?ur WHERE {
        ?u skos:subject dbpedia_cat:Universities_and_colleges_in_Rhineland-Palatinate ;
           dbpedia:city ?c .
        ?c owl:sameAs [ rdfs:label ?cityName ;
                        eurostat:unemployment_rate_total ?ur ]
      }
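The idea of link traversal can be illustrated with a much simplified Python sketch. It is not SQUIN's implementation: it only shows how looking up URIs and following the URIs found in the returned triples grows a local dataset that a query could then be evaluated against. The `WEB` dictionary and all `ex:` and `eurostat:` names in it are hypothetical and stand in for real HTTP lookups.

```python
from collections import deque

# Hypothetical stand-in for the Web: URI -> triples served when it is looked up.
WEB = {
    "ex:uni_kl": [("ex:uni_kl", "dbpedia:city", "ex:kaiserslautern")],
    "ex:kaiserslautern": [("ex:kaiserslautern", "owl:sameAs", "eurostat:kl")],
    "eurostat:kl": [("eurostat:kl", "rdfs:label", '"Kaiserslautern"')],
    "ex:unrelated": [("ex:unrelated", "rdfs:label", '"never reached"')],
}


def traverse(seeds, lookup, max_lookups=100):
    """Collect triples by looking up URIs and following the links they reveal."""
    seen = set()
    queue = deque(seeds)
    dataset = []
    while queue and len(seen) < max_lookups:
        uri = queue.popleft()
        if uri in seen:
            continue
        seen.add(uri)
        for triple in lookup(uri):
            dataset.append(triple)
            # Follow subject and object URIs; quoted literals are skipped.
            for term in (triple[0], triple[2]):
                if not term.startswith('"') and term not in seen:
                    queue.append(term)
    return dataset


data = traverse(["ex:uni_kl"], lambda uri: WEB.get(uri, []))
```

Only documents reachable from the seed URI end up in `data`, which is why the `ex:unrelated` entry is never fetched. The `max_lookups` bound is an assumption added to keep the traversal finite.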
  • 29. Using RDF and Linked Data for Information Extraction. Diagram labels: User, asks question about, Text, Query, to, Linked Data, answers, Extraction Pipeline, Result Graph.
  • 30. Using RDF and Linked Data for Information Extraction. What data do we have? Example RDF data:
      <http://dblp.l3s.de/d2r/resource/publications/dblp_pub:conf/icdar/SchulzEGAAD09>
          rdf:type foaf:Document ;
          dc:creator dblp_author:Markus_Ebbecke ;
          dc:title "Seizing the Treasure: Transferring Knowledge in Invoice Analysis" .
    Classes: foaf:Document, foaf:Person
    Instances: .../SchulzEGAAD09, .../Markus_Ebbecke
    Datatype Properties: dc:title, foaf:name, foaf:firstName, foaf:surName
    Object Properties: dc:creator, foaf:knows
    Literals: "Markus", "Ebbecke", "Seizing the Treasure: Transferring Knowledge in Invoice Analysis"
  • 31. SCOOBIE Domain Adaptation. Diagram labels: Structured Data, Text Corpus Data, Patterns and Gazetteers, Vocabulary Data, Instance Data, Data Preprocessing & Learning (offline), Information Extraction (online).
  • 32. SCOOBIE Eco System. Diagram labels: Index, Domain Knowledge, Models, Text Corpus, Training Corpus, Session Data, Instances, Ontology, Models, Patterns + Gazetteers, Preprocess, Train, Extract, Tasks, API, I/O.
  • 33. SCOOBIE OBIE Pipeline
      Normalization: Text Extraction, Language Detection
      Segmentation: Tokenization, Sentence Extraction, POS-Tagging
      Symbolization: Named Entity Recognition, Structured Entity Recognition, Noun Phrase Chunking, Symbol Recognition
      Instantiation: Instance Recognition, Instance Disambiguation, Chunk Classification
      Contextualization: Fact Extraction, Fact Selection
      Population: Query Answering
  • 34. Used Machine Learning Models
      Semi-Supervised Learning: CRF-based Noun Phrase Chunker
      Supervised Learning: Gazetteer matching statistics (Named Entity Recognition); Regex matching statistics (Structured Entity Recognition)
      Unsupervised or Instance-based Learning: TF/IDF-based instance re-ranking (Instance Disambiguation); K-Nearest-Neighbor chunk classifier (Chunk Classification); Spreading Activation-based fact ranking (Fact Selection)
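The slide names TF/IDF-based re-ranking for instance disambiguation but does not give a formula. The sketch below shows one common smoothed TF-IDF weighting as an illustration, not as SCOOBIE's actual scoring. The candidate instances, their descriptions, and the function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace; a deliberately naive tokenizer."""
    return text.lower().split()


def tfidf_rank(context, candidates):
    """Rank candidate instances by the TF-IDF weight of words shared with the context."""
    docs = {name: Counter(tokenize(desc)) for name, desc in candidates.items()}
    n = len(docs)

    def idf(word):
        # Smoothed inverse document frequency over the candidate descriptions.
        df = sum(1 for tf in docs.values() if word in tf)
        return math.log((1 + n) / (1 + df)) + 1.0

    context_words = set(tokenize(context))
    scores = {
        name: sum(tf[w] * idf(w) for w in context_words if w in tf)
        for name, tf in docs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical candidates for the ambiguous mention "Paris".
CANDIDATES = {
    "ex:Paris_France": "capital city of france on the seine",
    "ex:Paris_Texas": "city in texas united states",
}

ranking = tfidf_rank("we flew to paris the capital of france", CANDIDATES)
```

Words that appear in only one candidate's description receive a higher weight, so the candidate whose description overlaps most distinctively with the surrounding text is ranked first.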
  • 35. Used Machine Learning: Conditional Random Field. CRFs are sequence taggers.
    Train it with:
      Bill   CAPITALIZED  noun
      slept  LOWERCASE    non-noun
      here   LOWERCASE    non-noun
    Test it with:
      He       CAPITALIZED
      visited  LOWERCASE
      London   CAPITALIZED
    CRF results: noun, non-noun, non-noun
    MALLET - MAchine Learning for LanguagE Toolkit: http://mallet.cs.umass.edu/
  • 36. Bringing Linked Data to Text. Annotate plain text or HTML with RDF data.
      I'm working at DFKI.
    RDFa offers an HTML extension:
      I'm working at <span about="dbpedia:DFKI" property="rdfs:label">DFKI</span>
    Now let's generate RDFa automatically ...
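Generating such markup automatically can be sketched as a gazetteer lookup followed by string wrapping. This is only an illustration of the annotation step, not Epiphany's or SCOOBIE's code. The `annotate` function and the one-entry gazetteer are assumptions, and matching is done on exact labels with word boundaries.

```python
import html
import re


def annotate(text, gazetteer):
    """Wrap every gazetteer label found in the text in an RDFa span.

    gazetteer maps a surface label to the resource it denotes.
    """
    if not gazetteer:
        return html.escape(text, quote=False)
    # Try longer labels first so that they win over their own substrings.
    labels = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile("|".join(r"\b%s\b" % re.escape(label) for label in labels))
    parts = []
    position = 0
    for found in pattern.finditer(text):
        label = found.group(0)
        parts.append(html.escape(text[position:found.start()], quote=False))
        parts.append(
            '<span about="%s" property="rdfs:label">%s</span>'
            % (html.escape(gazetteer[label], quote=True),
               html.escape(label, quote=False))
        )
        position = found.end()
    parts.append(html.escape(text[position:], quote=False))
    return "".join(parts)


annotated = annotate("I'm working at DFKI.", {"DFKI": "dbpedia:DFKI"})
```

A real extraction pipeline would decide which instance a label refers to before annotating; here the gazetteer is assumed to be unambiguous.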
  • 37. Do you remember? Diagram labels: Data, Text, annotated text, User-defined Filter, Extraction Pipeline, annotate, populate, enrich, Extraction Results.
  • 38. RDF Epiphany. Epiphany takes the original webpage …
  • 39. RDF Epiphany. Epiphany takes the original webpage … and SCOOBIE initialized with an RDF data set …
  • 40. RDF Epiphany. Epiphany takes the original webpage … and SCOOBIE initialized with an RDF data set … It extracts RDF information from text and annotates it as RDFa …
  • 41. RDF Epiphany. Epiphany takes the original webpage … and SCOOBIE initialized with an RDF Linked Data set … It extracts RDF information from text and annotates it as RDFa … Clicking on RDFa annotations opens further information from the Linked Data set.
  • 42. RDF Epiphany: At a glance
      ● Epiphany is a free web service.
      ● Epiphany uses SCOOBIE.
      ● Epiphany can be initialized with any RDF Linked Data set.
      ● Epiphany generates an RDF document about a web page.
      ● Epiphany annotates RDF as RDFa in the web page.
      http://projects.dfki.uni-kl.de/epiphany/
  • 43. Summary. Diagram labels: Customer DB, Employees DB, Project DB, DBpedia, four D2R Server boxes, SQUIN, Text, annotated text, User-defined Filter, Extraction Pipeline, annotate, populate, enrich, Extraction Results.
  • 44. Outlook. Diagram labels: Customer DB, Employees DB, Project DB, DBpedia, four D2R Server boxes, SQUIN, E-Mail, annotated E-Mail, User-defined Filter, Extraction Pipeline, annotate, populate, enrich, Extraction Results.
  • 45. Thank you! Keywords: scoobie, sparql, rdfa, D2R server, rdf, squin, epiphany, Linked Data, OBIE