Data Segmenting in Anzo


Contact:
Lee Feigenbaum
lee@cambridgesemantics.com


                                      ©2011 Cambridge Semantics Inc. All rights reserved.
Simple Introduction to Cambridge Semantics & Anzo

     • Cambridge Semantics is a software startup founded in
       2007 by a team of engineers from IBM’s Advanced Internet
       Technology group
     • We sell the Anzo platform and tools to (mainly)
       Fortune 500 companies
     • Anzo is Semantic Web middleware that often stores
       large amounts of data for diverse uses




We Use Named Graphs

    • Primary tool for segmenting data in Anzo
    • Smallest unit of granularity for:
       –   Versioning & provenance
       –   Access control
       –   Notifications
       –   Replication
    • (Concretely: we use TriG extensively)
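As a rough illustration of treating the named graph as the unit of granularity, the sketch below keeps a version counter and a reader list per graph rather than per triple. It is a toy in-memory model, not Anzo's API; the names `NamedGraph`, `QuadStore`, `version` and `readers` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


@dataclass
class NamedGraph:
    """Hypothetical per-graph record: triples plus graph-level metadata."""
    name: str
    triples: Set[Triple] = field(default_factory=set)
    version: int = 0                                # bumped when the graph's content changes
    readers: Set[str] = field(default_factory=set)  # principals allowed to read the whole graph


class QuadStore:
    """Toy store in which every operation is scoped to a named graph."""

    def __init__(self) -> None:
        self.graphs: Dict[str, NamedGraph] = {}

    def add(self, graph: str, triple: Triple) -> None:
        g = self.graphs.setdefault(graph, NamedGraph(graph))
        if triple not in g.triples:
            g.triples.add(triple)
            g.version += 1  # versioning is tracked per graph, not per triple

    def read(self, graph: str, user: str) -> Set[Triple]:
        g = self.graphs[graph]
        if user not in g.readers:  # access control is checked per graph
            raise PermissionError(f"{user} may not read {graph}")
        return set(g.triples)
```

Notifications and replication could hang off the same per-graph record in the same way; they are left out to keep the sketch short.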




Which Triples Go Into a Named Graph?

    • Everything
       – Effectively a triple store
    • Single triple
       – Gives per statement access control, etc.
    • Whatever was in the source document
       – OK in some cases, but documents are often an artificial
         construct
       – What happens when doing a bulk load of hundreds of
         millions of triples?
    • All triples that share a subject
       – Decent compromise / default state in our experience
    • Closure of triples from a given subject following
      predicates annotated as “internal”
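The last two options above can be sketched in a few lines of Python. This is only one reading of the slide: `segment_by_subject` names each graph after its subject, and `segment_by_closure` assumes that "closure" means starting at a root subject and pulling in the triples of any object reached through a predicate in the `internal` set. Both function names and the `internal` parameter are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def segment_by_subject(triples: Iterable[Triple]) -> Dict[str, Set[Triple]]:
    """One graph per subject; each graph is keyed by its subject."""
    graphs: Dict[str, Set[Triple]] = defaultdict(set)
    for s, p, o in triples:
        graphs[s].add((s, p, o))
    return dict(graphs)


def segment_by_closure(
    triples: Iterable[Triple], root: str, internal: Set[str]
) -> Set[Triple]:
    """Triples of `root`, plus triples of nodes reached via `internal` predicates."""
    by_subject = segment_by_subject(triples)
    graph: Set[Triple] = set()
    seen: Set[str] = set()
    todo: List[str] = [root]
    while todo:
        node = todo.pop()
        if node in seen:  # guard against cycles
            continue
        seen.add(node)
        for s, p, o in by_subject.get(node, ()):
            graph.add((s, p, o))
            if p in internal:  # only "internal" links extend the graph
                todo.append(o)
    return graph
```

With an empty `internal` set the closure degenerates to the share-a-subject case, which matches the slide's description of that option as the default.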

Typical Anzo Data Segmenting

     [Diagram: example resources and their properties]
     • Pulp Fiction
        – debut showing: 10/14/1994
        – budget: $8,500,000
        – director: Tarantino
     • Tarantino
        – directed: Reservoir Dogs
        – birth date: 3/27/1963
        – full name: Quentin Jerome Tarantino




Impact of Typical Anzo Data Segmenting

    • Many, many (millions of) small graphs
    • Often corresponds with the natural granularity at
      which you want to do things like
      permissions, versioning, alerting, etc.
    • Significant overhead for per-graph metadata
       – Sometimes encourages other partitioning schemes




Finding the Graph for a Particular Resource

    • Default case: graph name is the same as the resource
      name
       – Not kosher, but works well
    • Fallback case: system-wide SPARQL query
    • General case: graph resolution framework that can
      identify appropriate graph(s) via:
       – SPARQL DESCRIBE query (just kicks the can down the road
         a bit)
       – Lookup (registry)
       – Pattern matching (similar to POWDER)
    • (Graphs do not have to be local; sometimes
      resolution ends up retrieving them via HTTP or from
      an RDB)
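One way to picture the resolution steps above is as a chain of resolvers tried in order. The Python sketch below is an assumption about how such a chain could be wired, not a description of Anzo's framework: the resolver names, the ordering, and the example identifiers are all hypothetical, and the SPARQL-based steps (DESCRIBE, system-wide query) are represented only by the empty result that a caller would treat as "fall back to a query".

```python
import re
from typing import Callable, Dict, List, Optional, Set, Tuple

# A resolver maps a resource name to graph names, or None if it cannot help.
Resolver = Callable[[str], Optional[List[str]]]


def same_name_resolver(known_graphs: Set[str]) -> Resolver:
    """Default case: the graph is named after the resource."""
    def resolve(resource: str) -> Optional[List[str]]:
        return [resource] if resource in known_graphs else None
    return resolve


def registry_resolver(registry: Dict[str, List[str]]) -> Resolver:
    """Lookup case: an explicit resource-to-graphs registry."""
    def resolve(resource: str) -> Optional[List[str]]:
        return registry.get(resource)
    return resolve


def pattern_resolver(rules: List[Tuple[re.Pattern, str]]) -> Resolver:
    """Pattern case: the first rule whose regex matches the resource wins."""
    def resolve(resource: str) -> Optional[List[str]]:
        for pattern, graph in rules:
            if pattern.match(resource):
                return [graph]
        return None
    return resolve


def resolve_graphs(resource: str, resolvers: List[Resolver]) -> List[str]:
    """Try each resolver in turn; an empty list means 'fall back to a query'."""
    for resolver in resolvers:
        graphs = resolver(resource)
        if graphs:
            return graphs
    return []
```

Keeping each strategy behind the same callable signature means a resolver that fetches graphs over HTTP or from a relational database could be slotted into the chain without changing the callers.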

Accessing Graphs

    • Replication service
       – Chunked to handle large graphs gracefully
       – Client replicas kept up to date via JMS-driven notification
         service
       – Replicas are cached aggressively – encourages smaller
         graphs to limit client memory footprint (e.g. in a Web
         browser)
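The chunking idea can be sketched as follows. This is a minimal Python illustration that assumes a graph is an iterable of triples and that a chunk is simply a fixed-size batch; the function names and the `size` parameter are hypothetical, and the transport and the JMS-driven update notifications are not modelled.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def chunked(triples: Iterable[Triple], size: int) -> Iterator[List[Triple]]:
    """Yield the graph in batches of at most `size` triples."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(triples)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def replicate(triples: Iterable[Triple], size: int) -> Set[Triple]:
    """Build a client-side replica one chunk at a time."""
    replica: Set[Triple] = set()
    for chunk in chunked(triples, size):
        replica.update(chunk)  # only one chunk is in flight at any moment
    return replica
```

The replica itself still holds the whole graph, which is why smaller graphs keep the client footprint down even when transfer is chunked.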




Linked Data in Anzo

    • Data in Anzo can be exposed as linked data
    • Anzo will dereference external URIs to get at
      data, but that’s of limited utility
       – Allows single-instance views, but not faceted browsing
    • Anzo does not use linked data internally for data
      access
    • Linked Data consumption/publication is a
      feature, not a core part of Anzo’s architecture




Accessing Graphs

     • SPARQL queries
       – Clients (e.g. Anzo on the Web faceted browser) target
         subsets of the server data with SPARQL queries
       – Impractical to enumerate millions of graphs in FROM or FROM
         NAMED clauses
       – Extend SPARQL with named datasets
          • Server-based lists of graphs that comprise an RDF dataset (default
            graph and named graphs)
          • Add FROM DATASET clause to reference named datasets from a
            query
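A named dataset can be thought of as a server-side record listing default graphs and named graphs, looked up by name when a query arrives. The Python sketch below assumes a clause of the form `FROM DATASET <iri>` (the exact syntax is not given in the slides) and assumes that several such clauses merge, as multiple FROM clauses do in standard SPARQL. `NamedDataset` and `resolve_query_dataset` are hypothetical names.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class NamedDataset:
    """Server-side list of the graphs that make up one RDF dataset."""
    default_graphs: Set[str] = field(default_factory=set)
    named_graphs: Set[str] = field(default_factory=set)


# Assumed clause shape: FROM DATASET <iri>
FROM_DATASET = re.compile(r"FROM\s+DATASET\s+<([^>]+)>", re.IGNORECASE)


def resolve_query_dataset(
    query: str, datasets: Dict[str, NamedDataset]
) -> NamedDataset:
    """Merge every named dataset referenced by the query into one dataset."""
    merged = NamedDataset()
    for iri in FROM_DATASET.findall(query):
        dataset = datasets[iri]  # raises KeyError for an unregistered dataset
        merged.default_graphs |= dataset.default_graphs
        merged.named_graphs |= dataset.named_graphs
    return merged
```

The query text stays short because it carries only the dataset name; the potentially very long graph list never leaves the server.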




Anzo and other Sem Web Technologies

     • Everything described in RDFS and OWL (used as a rich
       data modeling language mostly)
     • We publish RDFa
     • We use JSON serializations of SPARQL results and RDF
     • We implement SPARQL Update but don’t use it from our
       tools
     • SPARQL-based rules (used to be CONSTRUCT, now INSERT)
     • We use SPARQL ASK queries for transaction preconditions
       and validation
     • We have our own long-in-the-tooth implementation of
       the D2RQ mapping language that we don’t use often

This is the full architecture that drives the Anzo Server and applications.

These parts are driven primarily by SemWeb technologies.

These parts are driven primarily by quality software engineering.

We can’t & shouldn’t standardize everything.

     • Need to leave room for competitive differentiation
       that goes beyond simply who has the “best”
       implementation of a standard
     • For standardization work, take a disciplined approach
       to identifying what problems are both:
        – Costly (a.k.a. valuable to solve)
        – Impacting interoperability




What we could use

     • We often get asked “can we use your tools against
       <insert arbitrary SPARQL endpoint or linked data
       source here>?”
        – “No.”
     • We need standards for & adoption of:
        – Richly advertising contents of linked data sources
           • cf. VoID
        – Richly advertising capabilities of SPARQL endpoints
           • cf. SPARQL 1.1 Service Description and Basic Federated Query
        – Named datasets
        – Various other SPARQL extensions (though we can work
          around many of these)
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 

Data Segmenting in Anzo

  • 1. Data Segmenting in Anzo
      Contact: Lee Feigenbaum, lee@cambridgesemantics.com
      ©2011 Cambridge Semantics Inc. All rights reserved.
  • 2. Simple Introduction to Cambridge Semantics & Anzo
      • Cambridge Semantics is a software startup founded in 2007 by a team of engineers from IBM’s Advanced Internet Technology group
      • We sell the Anzo platform and tools to (mainly) Fortune 500 companies
      • Anzo is Semantic Web middleware that often stores large amounts of data for diverse uses
  • 3. We Use Named Graphs
      • Primary tool for segmenting data in Anzo
      • Smallest unit of granularity for:
          – Versioning & provenance
          – Access control
          – Notifications
          – Replication
      • (Concretely: we use TriG extensively)
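To make the TriG remark concrete, here is a minimal sketch of writing a set of named graphs out as TriG text. The `to_trig` helper, the `GRAPHS` sample data, and the `http://example.org/` IRIs are illustrative assumptions, not Anzo's API; literals are assumed to arrive already quoted and escaped.

```python
# Hypothetical sample: one named graph per resource, using made-up example.org IRIs.
GRAPHS = {
    "http://example.org/PulpFiction": [
        ("http://example.org/PulpFiction", "http://example.org/budget", '"8500000"'),
        ("http://example.org/PulpFiction", "http://example.org/director",
         "http://example.org/Tarantino"),
    ],
    "http://example.org/Tarantino": [
        ("http://example.org/Tarantino", "http://example.org/fullName",
         '"Quentin Jerome Tarantino"'),
    ],
}


def to_trig(graphs):
    """Serialize {graph IRI: [(s, p, o), ...]} as TriG text.

    Subjects and predicates are plain IRI strings. An object starting with a
    double quote is treated as an already-escaped literal (an assumption of
    this sketch); anything else is treated as an IRI.
    """
    def term(value):
        return value if value.startswith('"') else f"<{value}>"

    blocks = []
    for name in sorted(graphs):
        lines = [f"<{name}> {{"]
        for s, p, o in graphs[name]:
            lines.append(f"    <{s}> <{p}> {term(o)} .")
        lines.append("}")
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


if __name__ == "__main__":
    print(to_trig(GRAPHS))
```

Each `<graph> { ... }` block is one named graph, which is the unit the slide lists for versioning, access control, notifications, and replication.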
  • 4. Which Triples Go Into a Named Graph?
      • Everything
          – Effectively a triple store
      • Single triple
          – Gives per-statement access control, etc.
      • Whatever was in the source document
          – OK in some cases, but documents are often an artificial construct
          – What happens when doing a bulk load of hundreds of millions of triples?
      • All triples that share a subject
          – Decent compromise / default state in our experience
      • Closure of triples from a given subject following predicates annotated as “internal”
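The last two options on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the `ex:` names, the intermediate `ex:debut1` node, and the choice of `ex:debut` as the "internal" predicate are all hypothetical, and the functions are not Anzo's implementation.

```python
from collections import defaultdict

# Hypothetical data loosely modelled on the movie example; every name is illustrative.
TRIPLES = [
    ("ex:PulpFiction", "ex:budget", '"8500000"'),
    ("ex:PulpFiction", "ex:director", "ex:Tarantino"),
    ("ex:PulpFiction", "ex:debut", "ex:debut1"),
    ("ex:debut1", "ex:date", '"1994-10-14"'),
    ("ex:Tarantino", "ex:fullName", '"Quentin Jerome Tarantino"'),
    ("ex:Tarantino", "ex:directed", "ex:ReservoirDogs"),
]


def segment_by_subject(triples):
    """One graph per subject: every triple sharing a subject lands in the same graph."""
    graphs = defaultdict(list)
    for s, p, o in triples:
        graphs[s].append((s, p, o))
    return dict(graphs)


def segment_by_closure(triples, root, internal_predicates):
    """Graph for `root`: its own triples plus, transitively, the triples of any
    object reached through a predicate marked as internal."""
    by_subject = segment_by_subject(triples)
    graph, seen, stack = [], set(), [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # guard against cycles
        seen.add(node)
        for s, p, o in by_subject.get(node, []):
            graph.append((s, p, o))
            if p in internal_predicates:
                stack.append(o)  # follow internal links only
    return graph


if __name__ == "__main__":
    print(sorted(segment_by_subject(TRIPLES)))
    print(segment_by_closure(TRIPLES, "ex:PulpFiction", {"ex:debut"}))
```

With `ex:debut` marked internal, the debut node's triples travel with the film's graph, while the director stays in a graph of his own because `ex:director` is not followed.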
  • 5. Typical Anzo Data Segmenting
      [Diagram: Pulp Fiction (debut showing: 10/14/1994; budget: $8,500,000) has director Tarantino; Tarantino (birth date: 3/27/1963; full name: Quentin Jerome Tarantino) directed Reservoir Dogs.]
  • 6. Impact of Typical Anzo Data Segmenting
      • Many, many (millions of) small graphs
      • Often corresponds with the natural granularity at which you want to do things like permissions, versioning, alerting, etc.
      • Significant overhead for per-graph metadata
          – Sometimes encourages other partitioning schemes
  • 7. Finding the Graph for a Particular Resource
      • Default case: graph name is the same as the resource name
          – Not kosher, but works well
      • Fallback case: system-wide SPARQL query
      • General case: graph resolution framework that can identify appropriate graph(s) via:
          – SPARQL DESCRIBE query (just kicks the can down the road a bit)
          – Lookup (registry)
          – Pattern matching (similar to POWDER)
      • (Graphs do not have to be local; sometimes resolution ends up retrieving them via HTTP or from an RDB)
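One way to picture the resolution cases on this slide is as a chain of strategies tried in turn. The sketch below is an assumption about how such a chain might be ordered, not a description of Anzo's framework: `resolve_graphs`, the in-memory `STORE`, and all IRIs are hypothetical, and the "system-wide query" is simulated by scanning every graph.

```python
import re

# Hypothetical in-memory store: graph name -> set of (s, p, o) triples.
STORE = {
    "http://example.org/film/1": {("http://example.org/film/1", "ex:p", "ex:o")},
    "http://example.org/graphs/people": {("http://example.org/person/7", "ex:p", "ex:o")},
    "http://example.org/graphs/misc": {("http://example.org/other/9", "ex:p", "ex:o")},
}
# Hypothetical registry (explicit lookup) and pattern rules (regex -> graph name).
REGISTRY = {"http://example.org/person/7": ["http://example.org/graphs/people"]}
PATTERNS = [(r"^http://example\.org/person/", "http://example.org/graphs/people")]


def resolve_graphs(resource, store, registry=None, patterns=None):
    """Return the names of the graph(s) expected to hold data about `resource`."""
    # Default case: the graph is named after the resource itself.
    if resource in store:
        return [resource]
    # Lookup: an explicit registry entry for this resource.
    if registry and resource in registry:
        return list(registry[resource])
    # Pattern matching: the first rule whose regex matches the resource IRI.
    for regex, graph_name in patterns or []:
        if re.match(regex, resource):
            return [graph_name]
    # Fallback: stand-in for a system-wide query, scanning every graph.
    return sorted(
        name
        for name, triples in store.items()
        if any(s == resource for s, _, _ in triples)
    )


if __name__ == "__main__":
    print(resolve_graphs("http://example.org/person/8", STORE, REGISTRY, PATTERNS))
```

The cheap checks run first and the full scan runs last, mirroring the slide's framing of the system-wide query as a fallback.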
  • 8. Accessing Graphs
      • Replication service
          – Chunked to handle large graphs gracefully
          – Client replicas kept up to date via JMS-driven notification service
          – Replicas are cached aggressively, which encourages smaller graphs to limit client memory footprint (e.g. in a Web browser)
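As a rough illustration of what "chunked" replication could look like, the sketch below splits a graph's statements into fixed-size batches that a client replica applies one at a time. The `chunked` helper and the chunk size are assumptions for illustration; the slides do not describe Anzo's actual wire protocol.

```python
from itertools import islice


def chunked(statements, chunk_size):
    """Yield successive lists of at most `chunk_size` statements."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(statements)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


if __name__ == "__main__":
    # Hypothetical graph with five statements, replicated two at a time.
    graph = [("ex:s", "ex:p", f'"{i}"') for i in range(5)]
    replica = []
    for batch in chunked(graph, 2):
        replica.extend(batch)  # the client applies each batch as it arrives
    print(len(replica))
```

Because each batch is bounded, neither side has to hold a very large graph in a single message.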
  • 9. Linked Data in Anzo
      • Data in Anzo can be exposed as linked data
      • Anzo will dereference external URIs to get at data, but that’s of limited utility
          – Allows single-instance views, but not faceted browsing
      • Anzo does not use linked data internally for data access
      • Linked Data consumption/publication is a feature, not a core part of Anzo’s architecture
  • 10. Accessing Graphs
      • SPARQL queries
          – Clients (e.g. the Anzo on the Web faceted browser) target subsets of the server data with SPARQL queries
          – Impractical to enumerate millions of graphs in FROM or FROM NAMED clauses
          – Extend SPARQL with named datasets
              • Server-based lists of graphs that comprise an RDF dataset (default graph and named graphs)
              • Add FROM DATASET clause to reference named datasets from a query
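The meaning of a named dataset can be shown by expanding a `FROM DATASET` reference into the standard clauses it stands for. This is a sketch under assumptions: the exact `FROM DATASET <iri>` syntax, the `NAMED_DATASETS` registry, and the IRIs are hypothetical, and a real server would presumably resolve the dataset internally rather than rewrite the query text (which is the enumeration the slide calls impractical at scale).

```python
import re

# Hypothetical server-side registry: dataset IRI -> default and named graph lists.
NAMED_DATASETS = {
    "http://example.org/datasets/films": {
        "default": ["http://example.org/graphs/film-index"],
        "named": ["http://example.org/film/1", "http://example.org/film/2"],
    }
}

# Assumed clause shape: FROM DATASET <iri>
_FROM_DATASET = re.compile(r"FROM\s+DATASET\s+<([^>]+)>", re.IGNORECASE)


def expand_from_dataset(query, datasets):
    """Rewrite each FROM DATASET clause as the equivalent FROM / FROM NAMED clauses.

    Raises KeyError if the query references a dataset that is not registered.
    """
    def replace(match):
        dataset = datasets[match.group(1)]
        clauses = [f"FROM <{g}>" for g in dataset["default"]]
        clauses += [f"FROM NAMED <{g}>" for g in dataset["named"]]
        return "\n".join(clauses)

    return _FROM_DATASET.sub(replace, query)


if __name__ == "__main__":
    query = (
        "SELECT ?s\n"
        "FROM DATASET <http://example.org/datasets/films>\n"
        "WHERE { GRAPH ?g { ?s ?p ?o } }"
    )
    print(expand_from_dataset(query, NAMED_DATASETS))
```

The client only ever sends the one-line dataset reference; the list of graphs lives on the server and can grow to millions of entries without changing the query.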
  • 11. Anzo and other Sem Web Technologies
      • Everything described in RDFS and OWL (used mostly as a rich data modeling language)
      • We publish RDFa
      • We use JSON serializations of SPARQL results and RDF
      • We implement SPARQL Update but don’t use it from our tools
      • SPARQL-based rules (used to be CONSTRUCT, now INSERT)
      • We use SPARQL ASK queries for transaction preconditions and validation
      • We have our own long-in-the-tooth implementation of the D2RQ mapping language that we don’t use often
  • 12. This is the full architecture that drives the Anzo Server and applications.
  • 13. These parts are driven primarily by SemWeb technologies.
  • 14. These parts are driven primarily by quality software engineering.
  • 15. We can’t & shouldn’t standardize everything.
      • Need to leave room for competitive differentiation that goes beyond simply who has the “best” implementation of a standard
      • For standardization work, take a disciplined approach to identifying what problems are both:
          – Costly (a.k.a. valuable to solve)
          – Impacting interoperability
  • 16. What we could use
      • We often get asked “can we use your tools against <insert arbitrary SPARQL endpoint or linked data source here>?”
          – “No.”
      • We need standards for & adoption of:
          – Richly advertising contents of linked data sources
              • cf. VoID
          – Richly advertising capabilities of SPARQL endpoints
              • cf. SPARQL 1.1 Service Description and Basic Federated Query
          – Named datasets
          – Various other SPARQL extensions (though we can work around many of these)