INDEXING AND SEARCHING RDF DATASETS
Improving Performance of Semantic Web Applications with Lucene, SIREn and RDF


                      Mike Hugo
                     Entagen, LLC
slides and sample code can be found at
https://github.com/mjhugo/rdf-lucene-siren-presentation
ENTAGEN
ACCELERATING INSIGHT
AGENDA
SPARQL
LUCENE
SIREN
TripleMap.com
LINKING OPEN DATA
LIFE SCIENCE LINKED DATA
WHAT’S A TRIPLE?

Subject    Predicate        Object

<Mike>     <name>           "Mike Hugo"
<Mike>     <lives_in_city>  "Minneapolis"
<Mike>     <daughter>       <Lydia>
<Lydia>    <name>           "Lydia Hugo"
Ready, GO!
select id, label
from targets
where label = '${queryValue}'

select id, label
from targets
where label ilike '%${queryValue}%'
SELECT ?uri ?type ?label WHERE {
  ?uri rdfs:label ?label .
  ?uri rdf:type ?type .
  FILTER (?label = '${params.query}')
} LIMIT 10

SELECT ?uri ?type ?label WHERE {
  ?uri rdfs:label ?label .
  ?uri rdf:type ?type .
  FILTER regex(?label,
    '\\Q${params.query}\\E', 'i')
} LIMIT 10

\Q...\E: query as literal value
'i':     case insensitive
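Why is the regex version slow? Conceptually the FILTER has to test every `rdfs:label` value, so the cost grows with the number of labels; whether a particular triplestore can do better is implementation-specific. The Java sketch below mimics that full scan over a hypothetical in-memory map of URI to label. `Pattern.quote` produces the same `\Q...\E` quoting used in the query, and `Pattern.CASE_INSENSITIVE` plays the role of the `'i'` flag. `RegexLabelScan` is a made-up name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

class RegexLabelScan {
    // Mimics FILTER regex(?label, quoted input, 'i') ... LIMIT n over a map of
    // URI -> label. Every label is tested, so the work grows with the number of labels.
    static List<String> search(Map<String, String> labelsByUri, String userInput, int limit) {
        // Pattern.quote wraps the input in \Q...\E, so characters such as '(' or '*'
        // are matched literally instead of being treated as regex syntax.
        Pattern pattern = Pattern.compile(Pattern.quote(userInput), Pattern.CASE_INSENSITIVE);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> entry : labelsByUri.entrySet()) {
            if (hits.size() == limit) {
                break; // plays the role of LIMIT
            }
            if (pattern.matcher(entry.getValue()).find()) {
                hits.add(entry.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Sample data: DB00619/"Imatinib" comes from the slides; the second entry is made up.
        Map<String, String> labels = Map.of(
                "DB00619", "Imatinib",
                "example:other", "Aspirin");
        System.out.println(search(labels, "IMATINIB", 10)); // [DB00619]
    }
}
```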
DEMO
Baseline SPARQL Query Performance
FASTER!
Java API
Indexing and Searching Text
http://wiki.apache.org/lucene-java/PoweredBy
Indexing and storage

Document

field      value
ID         2
name       "Mike Hugo"
company    "Entagen"
bio        "lorem ipsum dolor sum etc..."

Index

[Figure: a stack of such documents in the index; annotations mark fields as Indexed, Stored, or not Stored]

Query:  name: mike

Matching documents:

field      value
id         2

field      value
ID         2
name       "Mike Hugo"
company    "Entagen"
bio        "lorem ipsum dolor sum etc..."
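The picture above boils down to two structures: an inverted index from analysed terms to document ids (used for searching), and a separate store of field values (used to display a hit). The Java sketch below is a toy model of that split, not Lucene's implementation; the `ToyIndex` class is hypothetical and its analyser is assumed to be a simple lowercase-and-split on non-alphanumeric characters.

```java
import java.util.*;

class ToyIndex {
    // "Indexed": field -> term -> ids of the documents containing that term.
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();
    // "Stored": id -> the field values kept so they can be returned with a hit.
    private final Map<Integer, Map<String, String>> stored = new HashMap<>();

    // Tiny stand-in for an analyzer: lowercase, then split on non-alphanumerics.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    void add(int id, Map<String, String> indexedFields, Map<String, String> storedFields) {
        indexedFields.forEach((field, value) -> {
            for (String term : analyze(value)) {
                postings.computeIfAbsent(field, f -> new HashMap<>())
                        .computeIfAbsent(term, t -> new TreeSet<>())
                        .add(id);
            }
        });
        stored.put(id, storedFields);
    }

    // Equivalent in spirit to the query "name: mike".
    Set<Integer> search(String field, String term) {
        return postings.getOrDefault(field, Map.of())
                .getOrDefault(term.toLowerCase(Locale.ROOT), Set.of());
    }

    Map<String, String> doc(int id) {
        return stored.get(id);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add(2,
                Map.of("name", "Mike Hugo", "company", "Entagen", "bio", "lorem ipsum dolor sum"),
                Map.of("id", "2", "name", "Mike Hugo")); // bio is indexed but not stored
        for (int id : index.search("name", "mike")) {
            System.out.println(index.doc(id)); // the stored fields of document 2
        }
    }
}
```

Here `bio` is searchable but is not returned with the hit, which mirrors indexing a field without storing it.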
Simplest Solution
Lucene index of rdfs:label
Build the Index

// Build a SPARQL query to find all the rdfs:label properties
String queryLabels = """
    SELECT ?uri ?label
    WHERE {
        ?uri rdfs:label ?label .
    }
"""

try {
    // Execute the SPARQL query
    sparqlQueryService.executeForEach(repository, queryLabels) {
        String uri = it.uri.stringValue()
        String label = it.label.stringValue()

        // Instantiate a new Lucene Document
        Document doc = new Document()

        // Add the Subject URI to the Document (key: SUBJECT_URI_FIELD, value: uri)
        doc.add(new Field(SUBJECT_URI_FIELD,
                uri,
                Field.Store.YES,
                Field.Index.ANALYZED))

        // Add the Label field to the document (but don't store it)
        doc.add(new Field(LABEL_FIELD,
                label,
                Field.Store.NO,
                Field.Index.ANALYZED))

        // Add the document to the Index
        writer.addDocument(doc)
    }
} finally {
    writer.close()   // Close the index
}
Query the Index

def query = {
    // Create a Lucene Query from user input:
    // query this field (LABEL_FIELD) for this value (params.query)
    Query query = new QueryParser(
        Version.LUCENE_CURRENT,
        LABEL_FIELD,
        new StandardAnalyzer())
            .parse(params.query)

    def s = new Date().time
    List results = executeQuery(query)
    def e = new Date().time

    render(view: 'index', model: [results: ...
}

IndexSearcher searcher = luceneSearche...
// Search the index (limit 10) for matching documents
ScoreDoc[] scoreDocs =
    searcher.search(query, 10).scoreDocs

List results = []
def connection = repository.connection
scoreDocs.each {
    // For each matching document, get the doc and extract the Subject URI
    Document doc = searcher.doc(it.doc)
    String uri = doc[SUBJECT_URI_FIELD]

    // Using the Subject URI, load properties from the triplestore
    Map labelAndType =
        sparqlQueryService.getLabelAndType(uri, connection)
    results.add([
            uri: uri,
            type: labelAndType.type,
            label: labelAndType.label])
}
connection.close()
return results   // results containing Subject URI, Type, and Label
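The query path is therefore a two-step lookup: the index answers only "which subject URIs match?", and the label and type shown to the user are then loaded from the triplestore by URI. The Java sketch below models that pattern with in-memory maps standing in for the Lucene index and for `sparqlQueryService.getLabelAndType`; `TwoStepSearch`, `LabelAndType` and `Result` are hypothetical names, and the "index" is assumed to be keyed by lowercased label terms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class TwoStepSearch {
    record LabelAndType(String label, String type) {}
    record Result(String uri, String type, String label) {}

    static List<Result> executeQuery(String userInput,
                                     Map<String, List<String>> labelIndex,  // term -> subject URIs
                                     Map<String, LabelAndType> triplestore, // URI -> properties
                                     int limit) {
        // Step 1: the index only tells us which subject URIs match.
        List<String> uris = labelIndex.getOrDefault(userInput.toLowerCase(Locale.ROOT), List.of());
        List<Result> results = new ArrayList<>();
        for (String uri : uris.subList(0, Math.min(limit, uris.size()))) {
            // Step 2: one lookup per hit loads the display properties by URI.
            LabelAndType lt = triplestore.get(uri);
            if (lt == null) {
                continue; // indexed URI no longer present in the store
            }
            results.add(new Result(uri, lt.type(), lt.label()));
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, List<String>> labelIndex = Map.of("imatinib", List.of("DB00619"));
        Map<String, LabelAndType> triplestore =
                Map.of("DB00619", new LabelAndType("Imatinib", "drugbank:drugs"));
        System.out.println(executeQuery("Imatinib", labelIndex, triplestore, 10));
    }
}
```

Because the second step costs one triplestore lookup per hit, the limit (10 in the Groovy code) also bounds the SPARQL work done per search.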
DEMO
Lucene Index of Searchable Labels
WHAT ABOUT ENTITY RELATIONSHIPS?
WHAT ABOUT OTHER PROPERTIES?
Lucene Extension

Indexing and Searching Semi-Structured Data
Document

field      value
URI        <DB00619>
triples    <DB00619> rdfs:label "Imatinib" .
           <DB00619> rdf:type <drugbank:drugs> .
           <DB00619> drugbank:brandName "Gleevec" .
           <DB00619> drugbank:target <targets/1588> .
Build the Index

def connection = repository.connection
try {
    // Select all Subject URIs from the triplestore
    String subjectUris = """
        SELECT distinct ?uri
        WHERE {
            ?uri ?p ?o .
        }
    """

    // Execute the SPARQL query; for each URI, create a new Document
    sparqlQueryService.executeForEach(repository, subjectUris) {
        def doc = new Document()

        // Add the Subject URI to the Document
        String subjectUri = it.uri.stringValue()
        doc.add(new Field(SUBJECT_URI_FIELD,
                subjectUri,
                Field.Store.YES,
                Field.Index.ANALYZED))

        // Get an N-Triples string from the triplestore
        StringWriter triplesStringWriter = new StringWriter()
        NTriplesWriter nTriplesWriter =
            new NTriplesWriter(triplesStringWriter)
        connection.exportStatements(
                new URIImpl(subjectUri),
                null, null, false,
                nTriplesWriter)

        // Add the N-Triples string to the document
        doc.add(new Field(TRIPLES_FIELD,
                triplesStringWriter.toString(),
                Field.Store.NO,
                Field.Index.ANALYZED))

        // Add the document to the index
        writer.addDocument(doc)
    }
} finally {
    connection.close()
}
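The essential transformation in this indexing step is "group every statement by subject and serialise each group as N-Triples". The Java sketch below reproduces just that step without Sesame, under simplifying assumptions: the `Triple` record and `NTriplesPerSubject` class are hypothetical, terms are full URIs unless flagged as literals, and only backslashes and double quotes are escaped (a real N-Triples writer handles more cases). The `example.org` URIs are made up.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class NTriplesPerSubject {
    // Simplified triple: terms are full URIs, except an object flagged as a literal.
    record Triple(String s, String p, String o, boolean objectIsLiteral) {}

    static String uri(String u) {
        return "<" + u + ">";
    }

    // Minimal literal escaping (backslash and double quote only); a real
    // N-Triples writer also escapes newlines and other control characters.
    static String literal(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Groups statements by subject: subject URI -> N-Triples text for that subject,
    // i.e. the value that would go into the per-entity "triples" field.
    static Map<String, String> documents(List<Triple> triples) {
        Map<String, StringBuilder> bySubject = new LinkedHashMap<>();
        for (Triple t : triples) {
            String object = t.objectIsLiteral() ? literal(t.o()) : uri(t.o());
            bySubject.computeIfAbsent(t.s(), k -> new StringBuilder())
                    .append(uri(t.s())).append(' ')
                    .append(uri(t.p())).append(' ')
                    .append(object).append(" .\n");
        }
        Map<String, String> docs = new LinkedHashMap<>();
        bySubject.forEach((subject, text) -> docs.put(subject, text.toString()));
        return docs;
    }

    public static void main(String[] args) {
        String s = "http://example.org/DB00619"; // hypothetical URI
        List<Triple> data = List.of(
                new Triple(s, "http://www.w3.org/2000/01/rdf-schema#label", "Imatinib", true),
                new Triple(s, "http://example.org/brandName", "Gleevec", true),
                new Triple(s, "http://example.org/target", "http://example.org/targets/1588", false));
        documents(data).forEach((subject, text) -> System.out.print(text));
    }
}
```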
Query the Index

// Query the Triples field for a predicate of rdfs:label *
SirenCellQuery predicate =
    new SirenCellQuery(
        new SirenTermQuery(
            new Term(TRIPLES_FIELD,
                RDFS.LABEL.stringValue())))
predicate.constraint = PREDICATE_CELL

// ...and for an object matching the user input
// (term queries are not analysed, so the input is lowercased here)
SirenCellQuery object =
    new SirenCellQuery(
        new SirenTermQuery(
            new Term(TRIPLES_FIELD,
                params.query.toLowerCase())))
object.constraint = OBJECT_CELL

// Both cell queries must match
Query query = new SirenTupleQuery()
query.add(predicate,
        SirenTupleClause.Occur.MUST)
query.add(object,
        SirenTupleClause.Occur.MUST)

* note: could be any predicate!
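One way to picture the query above: each N-Triples line in the `triples` field is a tuple of cells (subject, predicate, object), a cell query only matches terms in its constrained cell, and two `MUST` clauses in a tuple query must be satisfied by the same tuple. The Java sketch below is a toy re-implementation of that matching rule over plain strings, not SIREn's API; the class name, the cell numbering (predicate = 1, object = 2) and the tokenisation are assumptions.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

class ToyTupleQuery {
    // Assumed cell positions within an N-Triples tuple.
    static final int PREDICATE_CELL = 1;
    static final int OBJECT_CELL = 2;

    // Splits one N-Triples line into three cells, each reduced to a set of terms.
    // URIs stay whole (without < >); literals are lowercased and split into words.
    // Datatypes, language tags and escaped quotes are ignored in this toy.
    static List<Set<String>> cells(String line) {
        String body = line.trim();
        if (body.endsWith(".")) {
            body = body.substring(0, body.length() - 1).trim();
        }
        List<Set<String>> cells = new ArrayList<>();
        for (String part : body.split(" ", 3)) { // the object may itself contain spaces
            Set<String> terms = new HashSet<>();
            if (part.startsWith("<") && part.endsWith(">")) {
                terms.add(part.substring(1, part.length() - 1));
            } else {
                for (String word : part.replace("\"", "").toLowerCase(Locale.ROOT).split("\\s+")) {
                    if (!word.isEmpty()) {
                        terms.add(word);
                    }
                }
            }
            cells.add(terms);
        }
        return cells;
    }

    // Two MUST clauses: true only if ONE tuple has the predicate in its predicate
    // cell AND the term in its object cell.
    static boolean matches(String triplesField, String predicateUri, String objectTerm) {
        String term = objectTerm.toLowerCase(Locale.ROOT);
        for (String line : triplesField.split("\n")) {
            if (line.isBlank()) {
                continue;
            }
            List<Set<String>> c = cells(line);
            if (c.size() == 3
                    && c.get(PREDICATE_CELL).contains(predicateUri)
                    && c.get(OBJECT_CELL).contains(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String label = "http://www.w3.org/2000/01/rdf-schema#label";
        String doc = "<http://example.org/DB00619> <" + label + "> \"Imatinib\" .\n"
                + "<http://example.org/DB00619> <http://example.org/brandName> \"Gleevec\" .\n";
        System.out.println(matches(doc, label, "imatinib")); // true
        System.out.println(matches(doc, label, "gleevec"));  // false
    }
}
```

The second check shows why the cell constraints matter: "Gleevec" occurs in the document, but not in the object cell of an `rdfs:label` tuple.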
field      value
URI        <DB00619>
triples    <DB00619> rdfs:label "Imatinib" .
           <DB00619> rdf:type <drugbank:drugs> .
           <DB00619> drugbank:brandName "Gleevec" .
           <DB00619> drugbank:target <targets/1588> .

Query: "imatinib"
    triples field
    predicate = rdfs:label
    object = "imatinib"
List executeQuery(Query query) {
    IndexSearcher searcher = sirenSearcherM...
    // Search the index (limit 10) for matching documents
    ScoreDoc[] scoreDocs =
        searcher.search(query, 10).scoreDocs

    List results = []
    def connection = repository.connection
    scoreDocs.each {
        // For each matching document, get the doc and extract the Subject URI
        Document doc = searcher.doc(it.doc)
        String uri = doc[SUBJECT_URI_FIELD]

        // Using the Subject URI, load properties from the triplestore
        Map labelAndType = sparqlQueryService.
            getLabelAndType(uri, connection)
        results.add([
                uri: uri,
                type: labelAndType.type,
                label: labelAndType.label])
    }
    connection.close()
    return results   // results containing Subject URI, Type, and Label
}
DEMO
SIREn Index of RDF Entities
FLEXIBILITY

field      value
URI        <DB00619>
triples    <DB00619> rdfs:label "Imatinib" .
           <DB00619> rdf:type <drugbank:drugs> .
           <DB00619> drugbank:brandName "Gleevec" .
           <DB00619> drugbank:target <targets/1588> .

Query: predicate = rdfs:label AND object = "imatinib"
Query: object = "imatinib"
Query: object = "imatinib" OR object = "gleevec"

MORE THAN LITERALS

Query: predicate = brandName
Query: predicate = target

RELATIONSHIPS

Query: object = <targets/1588>
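All of these variations are the same cell-matching idea with different constraints left open. The Java sketch below, a toy model rather than SIREn's API, expresses them with one hypothetical `search` method over pre-parsed (predicate, object) tuples: a `null` predicate or object set means "any", and a set of several object values gives the OR query. For brevity it compares whole values ignoring case, whereas SIREn matches individual terms; the prefixed names are the shorthand used on the slides.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ObjectCellSearch {
    // One (predicate, object) tuple of an entity document; the subject is the map key.
    record Tuple(String predicate, String object) {}

    // Returns the subjects with at least one tuple satisfying both constraints:
    //   predicate == null     -> any predicate
    //   anyOfObjects == null  -> any object
    //   otherwise the object must equal (ignoring case) one of the values: an OR query.
    static List<String> search(Map<String, List<Tuple>> docs, String predicate, Set<String> anyOfObjects) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, List<Tuple>> entry : docs.entrySet()) {
            for (Tuple t : entry.getValue()) {
                boolean predicateOk = predicate == null || predicate.equals(t.predicate());
                boolean objectOk = anyOfObjects == null
                        || anyOfObjects.stream().anyMatch(v -> v.equalsIgnoreCase(t.object()));
                if (predicateOk && objectOk) {
                    hits.add(entry.getKey());
                    break; // one matching tuple is enough for this entity
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, List<Tuple>> docs = Map.of("DB00619", List.of(
                new Tuple("rdfs:label", "Imatinib"),
                new Tuple("rdf:type", "drugbank:drugs"),
                new Tuple("drugbank:brandName", "Gleevec"),
                new Tuple("drugbank:target", "targets/1588")));
        System.out.println(search(docs, null, Set.of("imatinib", "gleevec"))); // OR on the object
        System.out.println(search(docs, "drugbank:brandName", null));         // predicate only
        System.out.println(search(docs, null, Set.of("targets/1588")));       // relationship
    }
}
```

With a URI as the object value, the last call answers a relationship question (which entities point at `<targets/1588>`?) from the same index used for keyword search.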
DEMO
Searching SIREn Index for Relationships
Distributed
Indexing and Searching Semi-Structured Data
Replication
400 Million Documents
> 12 Billion Triples
Query Parser

subject   predicate   object
DEMO
SIREn in action on TripleMap.com
SPARQL

    LUCENE

          SIREN
    TripleMap.com
QUESTIONS?

    mike@entagen.com / twitter: @piragua



                                TripleMap

http://www.entagen.com   http://www.triplemap.com

Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 

Recently uploaded (20)

From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 

Improving RDF Search Performance with Lucene and SIREN

  • 1. INDEXING AND SEARCHING RDF DATASETS Improving Performance of Semantic Web Applications with Lucene, SIREn and RDF Mike Hugo Entagen, LLC
  • 2. slides and sample code can be found at https://github.com/mjhugo/rdf-lucene-siren-presentation
  • 5. 17
  • 11. SPARQL LUCENE
  • 12. SPARQL LUCENE SIREN
  • 13. SPARQL LUCENE SIREN TripleMap.com
  • 17. WHAT’S A TRIPLE? Subject Predicate Object
  • 18. WHAT’S A TRIPLE? <Mike> <name> “Mike Hugo”
  • 19. WHAT’S A TRIPLE? <Mike> <name> “Mike Hugo”; <Mike> <lives_in_city> “Minneapolis”
  • 20. WHAT’S A TRIPLE? <Mike> <name> “Mike Hugo”; <Mike> <lives_in_city> “Minneapolis”; <Mike> <daughter> <Lydia>
  • 21. WHAT’S A TRIPLE? <Mike> <name> “Mike Hugo”; <Mike> <lives_in_city> “Minneapolis”; <Mike> <daughter> <Lydia>; <Lydia> <name> “Lydia Hugo”
  • 27. select id, label from targets where label = '${queryValue}'
  • 28. select id, label from targets where label ilike '%${queryValue}%'
  • 29. SELECT ?uri ?type ?label WHERE { ?uri rdfs:label ?label . ?uri rdf:type ?type . FILTER (?label = '${params.query}') } LIMIT 10
  • 30. SELECT ?uri ?type ?label WHERE { ?uri rdfs:label ?label . ?uri rdf:type ?type . FILTER regex(?label, '\\Q${params.query}\\E', 'i') } LIMIT 10
  • 31. SELECT ?uri ?type ?label WHERE { ?uri rdfs:label ?label . ?uri rdf:type ?type . FILTER regex(?label, '\\Q${params.query}\\E', 'i') } LIMIT 10 (callouts: \Q…\E treats the query as a literal value; 'i' makes the match case insensitive)
  • 35. Java API Indexing and Searching Text
  • 37. indexing storage
  • 39. Document (field: value): ID: 2; name: “Mike Hugo”; company: “Entagen”; bio: “lorem ipsum dolor sum etc...”
  • 40. Index: the field values of document 2 (id 2; name “mike hugo”; company “Entagen”; bio “lorem ipsum dolor sum etc...”) are written to the index, and each field is marked as Indexed and/or Stored.
  • 41. Query: name: mike
  • 42. Query: name: mike. Matching Documents: id 2
  • 45. Matching document id 2, retrieved as Document (field: value): ID: 2; name: “Mike Hugo”; company: “Entagen”; bio: “lorem ipsum dolor sum etc...”
  • 47. Lucene index of rdfs:label
  • 49. Build a SPARQL query to find all the rdfs:label properties:
      try {
          String queryLabels = """
              SELECT ?uri ?label WHERE {
                  ?uri rdfs:label ?label .
              }
          """
          sparqlQueryService.executeForEach(repository, queryLabels) {
              String uri = it.uri.stringValue()
              String label = it.label.stringValue()
              Document doc = new Document()
              doc.add(new Field(SUBJECT_URI_FIELD, uri,
                      Field.Store.YES, Field.Index.ANALYZED))
              doc.add(new Field(LABEL_FIELD, label,
                      Field.Store.NO, Field.Index.ANALYZED))
              writer.addDocument(doc)
          }
      } finally {
          writer.close() // Close index
      }
  • 50. Execute the SPARQL query: sparqlQueryService.executeForEach(repository, queryLabels) { ... }
  • 51. Instantiate a new Lucene Document: Document doc = new Document()
  • 52. Add the Subject URI to the Document (key SUBJECT_URI_FIELD, value uri, stored): doc.add(new Field(SUBJECT_URI_FIELD, uri, Field.Store.YES, Field.Index.ANALYZED))
  • 53. Add the Label field to the document (but don’t store it): doc.add(new Field(LABEL_FIELD, label, Field.Store.NO, Field.Index.ANALYZED))
  • 54. Add the document to the index: writer.addDocument(doc); the finally block closes the index writer.
  • 56. Create a Lucene Query from user input (query the LABEL_FIELD field for the value in params.query):
      def query = {
          Query query = new QueryParser(
                  Version.LUCENE_CURRENT, LABEL_FIELD,
                  new StandardAnalyzer())
              .parse(params.query)
          def s = new Date().time
          List results = executeQuery(query)
          def e = new Date().time
          render(view: 'index', model: [results: …
  • 57. Search the index (limit 10) for matching documents:
      List executeQuery(Query query) {
          IndexSearcher searcher = luceneSearche…
          ScoreDoc[] scoreDocs = searcher.search(query, 10).scoreDocs
          List results = []
          def connection = repository.connection
          scoreDocs.each {
              Document doc = searcher.doc(it.doc)
              String uri = doc[SUBJECT_URI_FIELD]
              Map labelAndType = sparqlQueryService.getLabelAndType(uri, connection)
              results.add([
                      uri  : uri,
                      type : labelAndType.type,
                      label: labelAndType.label])
          }
          connection.close()
          return results
      }
  • 58. For each matching document, get the doc and extract the Subject URI: Document doc = searcher.doc(it.doc); String uri = doc[SUBJECT_URI_FIELD]
  • 59. Using the Subject URI, load properties from the triplestore: sparqlQueryService.getLabelAndType(uri, connection)
  • 60. Return results containing Subject URI, Type, and Label.
  • 61. DEMO Lucene Index of Searchable Labels
  • 62. WHAT ABOUT ENTITY RELATIONSHIPS?
  • 63. WHAT ABOUT OTHER PROPERTIES?
  • 64. Lucene Extension Indexing and Searching Semi-Structured Data
  • 66. Document (one per subject URI): field URI = <DB00619>; field triples =
      <DB00619> rdfs:label "Imatinib" .
      <DB00619> rdf:type <drugbank:drugs> .
      <DB00619> drugbank:brandName "Gleevec" .
      <DB00619> drugbank:target <targets/1588> .
  • 68. Select all Subject URIs from the triplestore:
      Connection connection = repository.connection
      try {
          String subjectUris = """
              SELECT distinct ?uri WHERE { ?uri ?p ?o . }
          """
          sparqlQueryService.executeForEach(repository, subjectUris) {
              def doc = new Document()
              String subjectUri = it.uri.stringValue()
              doc.add(new Field(SUBJECT_URI_FIELD, subjectUri,
                      Field.Store.YES, Field.Index.ANALYZED))
              StringWriter triplesStringWriter = new StringWriter()
              NTriplesWriter nTriplesWriter = new NTriplesWriter(triplesStringWriter)
              connection.exportStatements(
                      new URIImpl(subjectUri), null, null, false, nTriplesWriter)
              doc.add(new Field(TRIPLES_FIELD, triplesStringWriter.toString(),
                      Field.Store.NO, Field.Index.ANALYZED))
              writer.addDocument(doc)
          }
      } finally {
          // cleanup not shown on the slides
      }
  • 69. Execute the SPARQL query; for each URI, create a new Document.
  • 70. Add the Subject URI to the Document.
  • 71. Get an NTriples string from the triplestore: connection.exportStatements(new URIImpl(subjectUri), null, null, false, nTriplesWriter)
  • 72. Add the NTriples string to the document (indexed, not stored).
  • 73. Add the document to the index: writer.addDocument(doc)
  • 75. Query the Triples field:
      SirenCellQuery predicate = new SirenCellQuery(
              new SirenTermQuery(
                      new Term(TRIPLES_FIELD, RDFS.LABEL.stringValue())))
      predicate.constraint = PREDICATE_CELL
      SirenCellQuery object = new SirenCellQuery(
              new SirenTermQuery(
                      new Term(TRIPLES_FIELD, params.query.toLowerCase())))
      object.constraint = OBJECT_CELL
      Query query = new SirenTupleQuery()
      query.add(predicate, SirenTupleClause.Occur.MUST)
      query.add(object, SirenTupleClause.Occur.MUST)
  • 76. ... for a predicate: predicate.constraint = PREDICATE_CELL
  • 77. ... of rdfs:label (note: could be any predicate!): new Term(TRIPLES_FIELD, RDFS.LABEL.stringValue())
  • 78. Query the Triples field with a SirenTupleQuery that requires both cell queries to match: query.add(predicate, SirenTupleClause.Occur.MUST); query.add(object, SirenTupleClause.Occur.MUST)
  • 79. ... for an object: object.constraint = OBJECT_CELL
  • 80. ... matching the user input: new Term(TRIPLES_FIELD, params.query.toLowerCase())
  • 81. Document <DB00619> (URI and triples fields as on slide 66). Query: “imatinib”
  • 82. Query: the triples field
  • 83. Query: predicate = rdfs:label
  • 84. Query: predicate = rdfs:label, object = “imatinib”
  • 85. Search the index (limit 10) for matching documents:
      List executeQuery(Query query) {
          IndexSearcher searcher = sirenSearcherM…
          ScoreDoc[] scoreDocs = searcher.search(query, 10).scoreDocs
          List results = []
          def connection = repository.connection
          scoreDocs.each {
              Document doc = searcher.doc(it.doc)
              String uri = doc[SUBJECT_URI_FIELD]
              Map labelAndType = sparqlQueryService.getLabelAndType(uri, connection)
              results.add([
                      uri  : uri,
                      type : labelAndType.type,
                      label: labelAndType.label])
          }
          connection.close()
          return results
      }
  • 86. For each matching document, get the doc and extract the Subject URI: Document doc = searcher.doc(it.doc); String uri = doc[SUBJECT_URI_FIELD]
  • 87. Using the Subject URI, load properties from the triplestore: sparqlQueryService.getLabelAndType(uri, connection)
  • 88. Return results containing Subject URI, Type, and Label.
  • 89. DEMO SIREn Index of RDF Entities
  • 91. Document <DB00619> (URI and triples fields as on slide 66). Query: predicate = rdfs:label, object = “imatinib”
  • 92. Query: object = “imatinib” (any predicate)
  • 93. Query: object = “imatinib” OR object = “gleevec”
  • 95. Query: predicate = brandName
  • 96. Query: predicate = target
  • 102. Document <DB00619> (URI and triples fields as on slide 66). Query: object = <targets/1588>
  • 103. DEMO Searching SIREn Index for Relationships
  • 104. Distributed Indexing and Searching Semi-Structured Data
  • 111. 400 Million Documents > 12 Billion Triples
  • 113. Query Parser subject predicate object
  • 114. DEMO SIREn in action on TripleMap.com
  • 115. DEMO SIREn in action on TripleMap.com
  • 116. SPARQL LUCENE SIREN TripleMap.com
  • 117. QUESTIONS? mike@entagen.com / twitter: @piragua TripleMap http://www.entagen.com http://www.triplemap.com
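The triple slides (17 to 21) describe every fact as a subject, a predicate and an object. A minimal Java sketch of that model follows; plain strings stand in for URIs and literals (an assumption for brevity, since RDF libraries such as Sesame use dedicated types), and `TripleGraph` and `objects` are hypothetical names. It follows the <daughter> link from <Mike> and then reads Lydia's name.

```java
import java.util.List;
import java.util.stream.Collectors;

// Minimal model of the "What's a triple?" slides. Plain strings stand in for
// URIs (<...>) and literals ("..."); real RDF libraries use distinct types.
class TripleGraph {
    record Triple(String subject, String predicate, String object) {}

    // The example graph from slide 21
    static final List<Triple> TRIPLES = List.of(
            new Triple("<Mike>", "<name>", "\"Mike Hugo\""),
            new Triple("<Mike>", "<lives_in_city>", "\"Minneapolis\""),
            new Triple("<Mike>", "<daughter>", "<Lydia>"),
            new Triple("<Lydia>", "<name>", "\"Lydia Hugo\""));

    // Every object linked to `subject` by `predicate`
    static List<String> objects(String subject, String predicate) {
        return TRIPLES.stream()
                .filter(t -> t.subject().equals(subject) && t.predicate().equals(predicate))
                .map(Triple::object)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Follow the <daughter> link, then read the linked entity's <name>
        String daughter = objects("<Mike>", "<daughter>").get(0);
        System.out.println(objects(daughter, "<name>"));
    }
}
```

Running `main` prints `["Lydia Hugo"]`. Note that the object of one triple (<Lydia>) is the subject of another, which is what makes the data a graph.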
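Slides 30 and 31 wrap the user's input in \Q…\E and pass the 'i' flag. The sketch below assumes the triplestore evaluates regex() with java.util.regex semantics. Plain XPath regular expressions, on which the SPARQL specification builds, do not define \Q…\E. Under that assumption the filter behaves roughly like this; `Pattern.quote` produces the same \Q…\E quoting. `LiteralLabelMatch` is a hypothetical name.

```java
import java.util.regex.Pattern;

// Approximates FILTER regex(?label, "\\Q<input>\\E", "i"), assuming Java regex semantics.
class LiteralLabelMatch {
    static boolean matches(String label, String userQuery) {
        // Pattern.quote wraps the input in \Q...\E (and also copes with an embedded \E),
        // so characters such as '+', '(' or '.' are matched literally, not as regex syntax
        Pattern p = Pattern.compile(Pattern.quote(userQuery), Pattern.CASE_INSENSITIVE);
        // find() succeeds if the text occurs anywhere in the label, like SPARQL regex()
        return p.matcher(label).find();
    }

    public static void main(String[] args) {
        System.out.println(matches("Imatinib Mesylate", "imatinib")); // true
        System.out.println(matches("C++ Programming", "c++"));        // true
        System.out.println(matches("Vitamin C", "c++"));              // false
    }
}
```

Without the quoting, the input `c++` would be read as a possessive quantifier on `c` and would match the "C" in "Vitamin C". Because the match is a scan, none of these filters can use an index, which is why the talk moves on to Lucene.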
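Slides 37 to 45 contrast what Lucene indexes (searchable terms) with what it stores (values returned with a hit). The toy inverted index below is a stdlib-only illustration of that idea, not Lucene's data structures or API. The lower-case-and-split analyzer, the `ToyIndex` class and the choice to leave the bio unstored are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class ToyIndex {
    // field -> term -> ids of the documents containing that term ("Indexed")
    private final Map<String, Map<String, Set<Integer>>> postings = new HashMap<>();
    // doc id -> original field values that were kept ("Stored")
    private final Map<Integer, Map<String, String>> stored = new HashMap<>();

    // Hypothetical analyzer: lower-case, then split on anything that is not a letter or digit
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+")) {
            if (!t.isEmpty()) terms.add(t);
        }
        return terms;
    }

    void add(int id, String field, String value, boolean indexed, boolean store) {
        if (indexed) {
            for (String term : analyze(value)) {
                postings.computeIfAbsent(field, f -> new HashMap<>())
                        .computeIfAbsent(term, t -> new TreeSet<>())
                        .add(id);
            }
        }
        if (store) {
            stored.computeIfAbsent(id, i -> new HashMap<>()).put(field, value);
        }
    }

    // A single-term query such as name:mike; the query text goes through the same analyzer
    Set<Integer> search(String field, String text) {
        List<String> terms = analyze(text);
        if (terms.isEmpty()) return Set.of();
        return postings.getOrDefault(field, Map.of()).getOrDefault(terms.get(0), Set.of());
    }

    Map<String, String> storedFields(int id) {
        return stored.getOrDefault(id, Map.of());
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add(2, "id", "2", true, true);
        index.add(2, "name", "Mike Hugo", true, true);
        index.add(2, "company", "Entagen", true, true);
        // Illustrative assumption: the bio is searchable but its text is not kept
        index.add(2, "bio", "lorem ipsum dolor sum etc...", true, false);
        System.out.println(index.search("name", "mike"));     // [2]
        System.out.println(index.storedFields(2).get("bio")); // null
    }
}
```

Because the query text passes through the same analyzer as the indexed text, `name: Mike` and `name: mike` find the same document.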
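The label search on slides 47 to 60 is a two-step lookup. The Lucene index returns only subject URIs, and each URI is then used to load type and label from the triplestore. The following is a hypothetical in-memory sketch in which maps stand in for both Lucene and the triplestore. `Row`, `Result`, `indexLabels` and `search` are illustrative names, not the talk's code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class LabelSearch {
    record Row(String uri, String label) {}                 // one SPARQL result row (?uri, ?label)
    record Result(String uri, String type, String label) {} // what the search page renders

    // term -> subject URIs; only the URI is kept, the label text itself is not stored
    private final Map<String, Set<String>> labelIndex = new HashMap<>();

    void indexLabels(List<Row> rows) {
        for (Row row : rows) {
            for (String term : row.label().toLowerCase(Locale.ROOT).split("\\W+")) {
                if (!term.isEmpty()) {
                    labelIndex.computeIfAbsent(term, t -> new LinkedHashSet<>()).add(row.uri());
                }
            }
        }
    }

    // Step 1: the index yields at most `limit` subject URIs.
    // Step 2: each URI is used to load type and label from the "triplestore" map.
    List<Result> search(String term, int limit, Map<String, Map<String, String>> store) {
        List<Result> results = new ArrayList<>();
        for (String uri : labelIndex.getOrDefault(term.toLowerCase(Locale.ROOT), Set.of())) {
            if (results.size() >= limit) break;
            Map<String, String> props = store.getOrDefault(uri, Map.of());
            results.add(new Result(uri, props.get("rdf:type"), props.get("rdfs:label")));
        }
        return results;
    }

    public static void main(String[] args) {
        LabelSearch search = new LabelSearch();
        search.indexLabels(List.of(new Row("<DB00619>", "Imatinib")));
        Map<String, Map<String, String>> store = Map.of(
                "<DB00619>", Map.of("rdf:type", "<drugbank:drugs>", "rdfs:label", "Imatinib"));
        System.out.println(search.search("IMATINIB", 10, store));
    }
}
```

Keeping only the URI in the index keeps it small. The trade-off is one triplestore lookup per hit, which the limit of 10 bounds.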
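Slides 66 to 73 build one document per subject, whose triples field holds all of that subject's statements as N-Triples. The sketch below groups in-memory triples by subject to produce the same kind of text. It stands in for the SELECT DISTINCT ?uri query plus the exportStatements loop, and keeps the slide's prefixed names even though real N-Triples requires full IRIs. `EntityDocuments` and `triplesFieldBySubject` are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class EntityDocuments {
    record Triple(String s, String p, String o) {}

    // subject -> the text that would go into that subject's "triples" field
    static Map<String, String> triplesFieldBySubject(List<Triple> triples) {
        Map<String, StringBuilder> bySubject = new LinkedHashMap<>();
        for (Triple t : triples) {
            // One N-Triples-style line per statement: subject predicate object .
            bySubject.computeIfAbsent(t.s(), k -> new StringBuilder())
                    .append(t.s()).append(' ').append(t.p()).append(' ')
                    .append(t.o()).append(" .\n");
        }
        Map<String, String> docs = new LinkedHashMap<>();
        bySubject.forEach((subject, text) -> docs.put(subject, text.toString()));
        return docs;
    }

    public static void main(String[] args) {
        List<Triple> data = List.of(
                new Triple("<DB00619>", "rdfs:label", "\"Imatinib\""),
                new Triple("<DB00619>", "rdf:type", "<drugbank:drugs>"),
                new Triple("<DB00619>", "drugbank:brandName", "\"Gleevec\""),
                new Triple("<DB00619>", "drugbank:target", "<targets/1588>"));
        // Prints the four statements shown on slide 66, all in the <DB00619> document
        System.out.print(triplesFieldBySubject(data).get("<DB00619>"));
    }
}
```

Grouping by subject means a single hit returns the whole entity, including its links to other entities such as <targets/1588>.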
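Slides 75 to 84 combine a predicate cell query and an object cell query with MUST, so a document matches only when a single tuple satisfies both. The following is a plain-Java sketch of that matching rule, not SIREn's implementation. The cell positions, the tokenization and the `TupleMatch` name are assumptions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class TupleMatch {
    // Hypothetical cell positions within a tuple: 0 = subject, 1 = predicate, 2 = object
    static final int PREDICATE_CELL = 1;
    static final int OBJECT_CELL = 2;

    // Both constraints are MUST clauses: they have to hold in the same tuple
    static boolean matches(List<String[]> tuples, String predicate, String objectTerm) {
        String term = objectTerm.toLowerCase(Locale.ROOT); // mirrors params.query.toLowerCase()
        for (String[] tuple : tuples) {
            boolean predicateOk = tuple[PREDICATE_CELL].equals(predicate);
            // Crude tokenization of the object cell: lower-case, split on non-alphanumerics
            boolean objectOk = Arrays.asList(
                    tuple[OBJECT_CELL].toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+"))
                    .contains(term);
            if (predicateOk && objectOk) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String[]> drug = List.of(
                new String[] {"<DB00619>", "rdfs:label", "\"Imatinib\""},
                new String[] {"<DB00619>", "drugbank:brandName", "\"Gleevec\""});
        System.out.println(matches(drug, "rdfs:label", "Imatinib")); // true
        System.out.println(matches(drug, "rdfs:label", "Gleevec"));  // false
    }
}
```

The second call returns false even though the document mentions Gleevec. No single tuple pairs rdfs:label with that value, and this per-tuple scoping is what a plain full-text field cannot express.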
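Slides 91 to 102 vary which cells are constrained. The variations are object only, an OR of objects, predicate only, and an object that is itself a URI, which turns the full-text index into a relationship lookup. Below is a hypothetical sketch of those variations over one entity's triples. It uses exact value comparison as a simplification of SIREn's term matching, and `CellQueries` is an illustrative name.

```java
import java.util.List;
import java.util.Locale;

class CellQueries {
    record Triple(String s, String p, String o) {}

    // Simplification: compare object values case-insensitively with literal quotes removed
    static String norm(String value) {
        return value.replace("\"", "").toLowerCase(Locale.ROOT);
    }

    // predicate == null means "any predicate"; no alternatives means "any object".
    // Several alternatives are OR-ed together, as on slide 93.
    static boolean matches(List<Triple> entity, String predicate, String... objectAlternatives) {
        for (Triple t : entity) {
            if (predicate != null && !t.p().equals(predicate)) continue;
            if (objectAlternatives.length == 0) return true;
            for (String alt : objectAlternatives) {
                if (norm(t.o()).equals(norm(alt))) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<Triple> imatinib = List.of(
                new Triple("<DB00619>", "rdfs:label", "\"Imatinib\""),
                new Triple("<DB00619>", "drugbank:brandName", "\"Gleevec\""),
                new Triple("<DB00619>", "drugbank:target", "<targets/1588>"));
        System.out.println(matches(imatinib, null, "imatinib"));            // object only (slide 92)
        System.out.println(matches(imatinib, null, "imatinib", "gleevec")); // OR (slide 93)
        System.out.println(matches(imatinib, "drugbank:brandName"));        // predicate only (slide 95)
        System.out.println(matches(imatinib, null, "<targets/1588>"));      // relationship (slide 102)
    }
}
```

All four calls in `main` print `true` for the Imatinib entity. The last one shows how asking for an object URI finds every entity that points at a given target.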