Evolution of the Graph Schema
Data Day Seattle 2017
Joshua Shinavier, PhD
20.10.2017
1. Knowledge and graphs
2. Semantic Web to Property Graphs
3. Re-emergence of the graph schema
4. Elements of a schema language
5. Graph and schema management
6. Graph generation
Outline
Knowledge and graphs
• Performance is a factor, but
  • Many storage back-ends can be adapted to graphs
    • E.g. relational DBs, column stores, key-value stores
• Better reasons:
  • The domain model is graph-like
  • We can take inspiration from the way we naturally understand the world
Why graph databases?
Early data modelers
κατηγορία! (Greek: “category”)
• How do we relate data with concepts in order to make
inferences or take action?
• We use schemata — rules that constrain data to a
language of “categories” (concepts)
• Some fundamental categories are built in
• E.g. plurality, necessity, limitation, negation,
reciprocity, etc.
• Others are built upon the foundation
Kant’s “schemata”
This schematism […] is an art, hidden in the depths of the human
soul, whose true modes of action we shall only with difficulty
discover and unveil. (Kant, 1781)
• Psychologists saw Kant’s “schemata” in the organization
of human memory (Head, 1920), but
• Memory is more than storage and recall
• We react to data by combining a schema with an
attitude (Bartlett, 1933)
Schemas in psychology
• Scripts, plans, goals (Schank & Abelson, 1970s)
• Frames (Minsky, 1974)
• Early KR languages
• Upper-level ontologies and commonsense
knowledge bases
Enter databases and AI
Semantic Web to Property Graphs
• A vocabulary for vocabulary sharing
• Includes a handful of basic terms
• Classes, properties, inheritance
• Meets the needs of most Web schemas
RDF Schema (RDFS)
• A much more expressive language for ontology development
• Supports:
  • Classes and properties with inheritance
  • Equality (sameAs, differentFrom, equivalentTo)
  • Property domain/range restrictions, cardinality restrictions
  • Inverse, transitive, and symmetric properties
  • Ontology metadata (imports, versioning)
• Sublanguages: OWL Full, OWL DL (description logic), OWL Lite
• OWL 2 profiles: EL (polynomial-time checking), QL (memory-efficient query answering), DL (completeness and decidability)
• Is this slide too dense? OWL is huge.
OWL
• What commercial applications ended up using
• Supported by AllegroGraph, TopBraid, etc.
• All of RDFS
• Classes, properties, inheritance
• A few terms stolen from OWL
• e.g. sameAs, inverseOf, TransitiveProperty
“RDFS+”
The Web of Data…
• Property Graph data model takes a minimalist
approach
• Typically no inference or rules support
• Graph DBs, schema.org are a response to real-world
demands
…simplified
[Diagram: a property graph with vertices 1, 2, and 3 and edges labeled foo, foo, and bar]
Re-emergence of the graph schema
• There is power in simplicity
• NoSQL databases are said to have no predefined schema
• In practice, every graph DB has a schema
  • A set of constraints or assumptions about correct structure
  • Useful for validation and optimization
• There is no graph schema standard
NoSQL ⇏ no schema
• Property Graph data model is a basic schema
  • Edge labels (required)
  • Vertex labels (optional)
  • Property keys (required)
  • Property data types (optional, with optional constraints)
  • Vertex meta-properties (optional)
Schemas in TinkerPop
• Labels
  • Simple types on nodes and/or relationships
• Indexes
  • Single-property: equality, existence, containment, ranges
  • Composite (multiple properties): equality only
• Constraints
  • Node property uniqueness
  • Node/relationship property existence
  • Node key (set of properties unique for the node)
Schemas in Neo4j
• Vertex and edge labels
• Property keys
  • Property cardinality (SINGLE, LIST, SET)
• Indexes
  • Graph-centric
    • Individual properties, composite
  • Vertex-centric (index on incoming/outgoing edges)
    • Sorting key, sort order
• Automatic/implicit schema creation
Schemas in JanusGraph
• Object databases ≠ graph databases, but similar
• Built-in, object-oriented schemas
  • Classes, extension, relationships, recursivity, etc.
  • Used for encapsulation, composition, inheritance, delegation, etc.
• OOP frameworks for graph DBs
  • Frames, Ferma, etc.
Schemas in object databases
• Hypernode
  • Objects, relations, and functions
• GROOVY
  • Multi-level OOP schemas
• Hypergraph DB
  • Types and relationships
• Grakn.AI
  • Entities, relations, roles, and resources (data type, uniqueness, regex)
  • Single inheritance
Schemas in hypergraph databases
Elements of a schema language
• Support for a basic schema vocabulary
  • Entity and relationship types, constraints
• Good coverage of existing schema frameworks
• Extensibility of schemas and types
• Mappings to RDF, schema.org, and storage frameworks
• Reference APIs for
  • Schema validation
  • Graph schema initialization and migration
  • Statistical models, graph generation
Design goals
• Things about which we can make assertions
• “Classes” in RDF, “types” in schema.org, “vertex labels”
in TinkerPop, etc.
• Extend other entity types
Entity types
entities:
  - label: Trip
    sameAs: http://schema.org/TravelAction
    description: A trip taken by a driver or requested by a rider
• Assertions about things
• “Properties” in RDF and schema.org
• “Edges” vs. “properties” in graph databases
• Hyperedges, meta-properties are also “relations”
Relationship types
relations:
  - label: requested
    description: Relates a rider to a trip he or she has requested
    extends:
      - core.relatedTo
    cardinality: OneToMany
    from: users.User
    to: Trip
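The from/to fields of a relation type lend themselves to a simple validation check. The following is a minimal sketch, assuming a hypothetical in-memory model (RelationType and SchemaValidator are illustrative names, not the speaker's API). It ignores the extends and cardinality fields and compares type names exactly, so inherited entity types would need extra handling.

```java
import java.util.*;

// Hypothetical in-memory model of a relation type; not the speaker's API.
final class RelationType {
    final String label;
    final String from; // entity type allowed as the source
    final String to;   // entity type allowed as the target

    RelationType(String label, String from, String to) {
        this.label = label;
        this.from = from;
        this.to = to;
    }
}

final class SchemaValidator {
    private final Map<String, RelationType> relations = new HashMap<>();

    void register(RelationType relation) {
        relations.put(relation.label, relation);
    }

    // Returns human-readable violations; an empty list means the edge conforms.
    List<String> validateEdge(String label, String sourceType, String targetType) {
        List<String> errors = new ArrayList<>();
        RelationType relation = relations.get(label);
        if (relation == null) {
            errors.add("unknown relation: " + label);
            return errors;
        }
        if (!relation.from.equals(sourceType)) {
            errors.add("source must be " + relation.from + " but was " + sourceType);
        }
        if (!relation.to.equals(targetType)) {
            errors.add("target must be " + relation.to + " but was " + targetType);
        }
        return errors;
    }
}
```

Registering new RelationType("requested", "users.User", "Trip") and then validating an edge from a Trip to a Trip would report one violation, for the source type.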
• Graph-centric
• Single-relation, composite
• Entity-centric
• Ordering on a secondary key
Index hints
indexes:
  - key: core.uuid
  - key: trips.requested
    direction: Out
    orderBy: core.createdAt
    order: Decreasing
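To illustrate what an entity-centric index with a secondary sort key buys, here is a small in-memory sketch. It assumes a hypothetical OutEdgeIndex class that keeps each vertex's outgoing edges ordered by a numeric createdAt value, largest first, so the most recent edges can be read without sorting. It is an illustration of the idea only, not how any particular graph store implements it, and it simplifies by keeping one edge per timestamp.

```java
import java.util.*;

// Hypothetical in-memory illustration of an entity-centric index; not a storage API.
final class OutEdgeIndex {
    // Per source vertex: targets keyed by the sort key, in decreasing key order.
    private final Map<String, NavigableMap<Long, String>> bySource = new HashMap<>();

    // Simplification: a second edge with the same createdAt replaces the first.
    void add(String source, long createdAt, String target) {
        NavigableMap<Long, String> edges = bySource.get(source);
        if (edges == null) {
            edges = new TreeMap<>(Comparator.<Long>reverseOrder());
            bySource.put(source, edges);
        }
        edges.put(createdAt, target);
    }

    // Returns up to `limit` targets with the largest sort keys, newest first.
    List<String> latest(String source, int limit) {
        List<String> result = new ArrayList<>();
        NavigableMap<Long, String> edges = bySource.get(source);
        if (edges == null) {
            return result;
        }
        for (String target : edges.values()) {
            if (result.size() == limit) {
                break;
            }
            result.add(target);
        }
        return result;
    }
}
```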
• Schemas import other schemas, like software modules
• Give developers/teams autonomy, but
• Coordinate schema integration top-down
Schema imports
name: production
version: 1.2
includes:
  - name: trips
    version: 1.2
  - name: referrals
    version: 1.2
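If schemas include each other like software modules, a loader has to resolve includes transitively and reject cycles. A minimal sketch of that resolution follows, assuming a hypothetical IncludeResolver registry keyed by schema name. Versions are ignored here, and any include relationships beyond those listed in the YAML are made up for illustration.

```java
import java.util.*;

// Hypothetical sketch of include resolution; the registry and its names are illustrative.
final class IncludeResolver {
    // Maps a schema name to the names of the schemas it includes.
    private final Map<String, List<String>> includes = new HashMap<>();

    void declare(String name, String... included) {
        includes.put(name, Arrays.asList(included));
    }

    // Returns the schemas to load, dependencies first, ending with `root`.
    List<String> loadOrder(String root) {
        List<String> order = new ArrayList<>();
        visit(root, new LinkedHashSet<String>(), new HashSet<String>(), order);
        return order;
    }

    private void visit(String name, Set<String> path, Set<String> done, List<String> order) {
        if (done.contains(name)) {
            return; // already scheduled through another include
        }
        if (!includes.containsKey(name)) {
            throw new IllegalArgumentException("unknown schema: " + name);
        }
        if (!path.add(name)) {
            throw new IllegalStateException("cyclic include: " + path + " -> " + name);
        }
        for (String dependency : includes.get(name)) {
            visit(dependency, path, done, order);
        }
        path.remove(name);
        done.add(name);
        order.add(name);
    }
}
```

Loading dependencies first means each team's schema can be validated against the schemas it extends before the top-level schema is assembled.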
Graph and schema management
• Study the source data
• Extend and validate the shared schema
• Generate artificial graph data
• Study system performance, iterate on the model
• Develop ingestion mappings for real data
• Review and check in schema changes
• Apply the schema to a live database
• Ingest data into the live database
Graph onboarding workflow
Revision control for schemas
• The schema is constantly changing
  • Is this database compatible with this schema?
  • How to update the database w.r.t. the schema?
• Use revision control to find diffs
  • Ordered lists of basic changes
• Translate diffs to storage-specific workflows
  • Ordered lists of idempotent operations
• Apply diff workflows to the database
Schema initialization, migration
public enum SchemaChange {
    AbstractAttributeChanged,
    CardinalityChanged,
    DomainChanged,
    EntityAdded,
    EntityRemoved,
    ExtensionAdded,
    ExtensionRemoved,
    IncludeAdded,
    IncludeRemoved,
    IndexAdded,
    IndexRemoved,
    RangeChanged,
    RelationAdded,
    RelationRemoved,
    RequiredAttributeChanged,
    RequiredOfAttributeChanged,
    SchemaAdded,
    SchemaRemoved,
    SchemaNameChanged,
    SchemaVersionChanged,
}
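A diff expressed as an ordered list of basic changes might be computed along these lines. This is a minimal sketch under strong assumptions: a schema is reduced to two sets of labels, only four of the change kinds from the enum are covered, and the Schema and SchemaDiff classes are hypothetical. The ordering rule (additions before removals, entities before the relations that refer to them) is one plausible choice, not something the slides specify.

```java
import java.util.*;

// Hypothetical, reduced model: a schema is just two sets of labels.
final class Schema {
    final Set<String> entities;
    final Set<String> relations;

    Schema(Collection<String> entities, Collection<String> relations) {
        // TreeSet keeps labels sorted so the diff order is deterministic.
        this.entities = new TreeSet<>(entities);
        this.relations = new TreeSet<>(relations);
    }
}

final class SchemaDiff {
    // Subset of the change kinds named in the SchemaChange enum.
    enum Kind { EntityAdded, EntityRemoved, RelationAdded, RelationRemoved }

    static final class Change {
        final Kind kind;
        final String label;

        Change(Kind kind, String label) {
            this.kind = kind;
            this.label = label;
        }

        @Override
        public String toString() {
            return kind + "(" + label + ")";
        }
    }

    // Additions first (entities, then relations), then removals in the opposite order.
    static List<Change> diff(Schema before, Schema after) {
        List<Change> changes = new ArrayList<>();
        collect(changes, Kind.EntityAdded, after.entities, before.entities);
        collect(changes, Kind.RelationAdded, after.relations, before.relations);
        collect(changes, Kind.RelationRemoved, before.relations, after.relations);
        collect(changes, Kind.EntityRemoved, before.entities, after.entities);
        return changes;
    }

    // Emits one change for every label in `present` that is missing from `absent`.
    private static void collect(List<Change> out, Kind kind, Set<String> present, Set<String> absent) {
        for (String label : present) {
            if (!absent.contains(label)) {
                out.add(new Change(kind, label));
            }
        }
    }
}
```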
Schema diff and patch
[Diagram: a new database is initialized at Schema x.1. The diff of Schema x.1 and Schema x.2 is found, then applied to the database at Schema x.1 to yield a database at Schema x.2.]
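The "apply diff" step relies on each storage-level operation being idempotent, so that a workflow interrupted partway through can simply be run again. The sketch below shows that property on a toy back-end. FakeDatabase, Operation, CreateLabel, DropLabel, and Migration are hypothetical names; a real workflow would issue storage-specific schema commands instead of editing a set.

```java
import java.util.*;

// Hypothetical stand-in for a storage back-end: it only tracks which labels exist.
final class FakeDatabase {
    final Set<String> labels = new TreeSet<>();
}

// One storage-level step. Every implementation must be safe to run more than once.
interface Operation {
    void apply(FakeDatabase db);
}

final class CreateLabel implements Operation {
    private final String label;

    CreateLabel(String label) {
        this.label = label;
    }

    // Set.add does nothing when the label already exists, so re-running is harmless.
    @Override
    public void apply(FakeDatabase db) {
        db.labels.add(label);
    }
}

final class DropLabel implements Operation {
    private final String label;

    DropLabel(String label) {
        this.label = label;
    }

    // Set.remove does nothing when the label is already gone.
    @Override
    public void apply(FakeDatabase db) {
        db.labels.remove(label);
    }
}

final class Migration {
    // Applies the operations in list order; the order is part of the workflow.
    static void run(FakeDatabase db, List<Operation> workflow) {
        for (Operation operation : workflow) {
            operation.apply(db);
        }
    }
}
```

Running the same workflow twice against the same FakeDatabase leaves it in the same state as running it once.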
Migration is not always possible
Don’t feel bad!
Basic schemas can’t be changed!
• E.g.
• Removal or abstraction of types already in use
• Changes unsupported at the storage level
Graph generation
• Problem:
  • Need to predict write throughput, read latency given 10x more data
  • Analytical solutions are difficult
• Solution?
  • Generate graphs of different sizes
  • Study the trends
• Problem:
  • Where do we get the data?
  • Shrinking or growing real data is difficult
Capacity planning
• Existing graph benchmarks
  • Lancichinetti-Fortunato-Radicchi (LFR) benchmark
  • graphdb-benchmarks
  • Linked Data Benchmark Council (LDBC)
  • SPARQL benchmarks for triple stores
• None of these are very much like our data
  • Not a social network; no power law distributions
  • Vastly different topology
• Idea: use the schema to generate statistically representative data
Benchmarking options
• Gather some statistics
  • Entity and relationship type distributions
  • Per-relationship in- and out-degree distributions
• Add these to the schema
• Give the Graphgen utility a dataset size, random seed
  • Graphgen attempts to create a graph in accordance with the model
• Gather statistics from the generated graph
  • Compare and contrast
• Same dataset can be generated in different environments
Graph generation workflow
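The combination of a degree distribution, a dataset size, and a random seed can be sketched as follows. This is not the Graphgen utility; TinyGraphGen is a hypothetical, heavily simplified stand-in that samples only an out-degree per vertex and picks targets uniformly, so it permits self-loops and duplicate edges. Its purpose is to show why a fixed seed makes the same dataset reproducible in different environments.

```java
import java.util.*;

// Hypothetical sketch; this is not the Graphgen utility mentioned in the slides.
final class TinyGraphGen {
    // Draws an out-degree from a discrete distribution given as degree -> probability.
    static int sampleDegree(SortedMap<Integer, Double> distribution, Random random) {
        double roll = random.nextDouble();
        double cumulative = 0.0;
        int last = 0;
        for (Map.Entry<Integer, Double> entry : distribution.entrySet()) {
            cumulative += entry.getValue();
            last = entry.getKey();
            if (roll < cumulative) {
                return last;
            }
        }
        return last; // fallback when the probabilities sum to slightly less than 1
    }

    // Generates edges as {source, target} pairs between vertices 0..vertexCount-1.
    static List<int[]> generate(int vertexCount, SortedMap<Integer, Double> outDegrees, long seed) {
        Random random = new Random(seed); // same seed, same sequence, same graph
        List<int[]> edges = new ArrayList<>();
        for (int source = 0; source < vertexCount; source++) {
            int degree = sampleDegree(outDegrees, random);
            for (int i = 0; i < degree; i++) {
                edges.add(new int[] { source, random.nextInt(vertexCount) });
            }
        }
        return edges;
    }
}
```

Gathering the out-degree histogram of the generated edges and comparing it with the input distribution corresponds to the "compare and contrast" step.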
Q&A
Joshua Shinavier
joshsh@uber.com
Kyler Liu
kylerliu@uber.com
Vignesh Ganapathy
vigneshg@uber.com
chapter--4-software-project-planning.ppt
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
The Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdfThe Evolution of Karaoke From Analog to App.pdf
The Evolution of Karaoke From Analog to App.pdf
 
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Naraina Delhi 💯Call Us 🔝8264348440🔝
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 

Evolution of the Graph Schema Data Day Seattle 2017

  • 1. Evolution of the Graph Schema Data Day Seattle 2017 Joshua Shinavier, PhD 20.10.2017
  • 2. 1. Knowledge and graphs 2. Semantic Web to Property Graphs 3. Re-emergence of the graph schema 4. Elements of a schema language 5. Graph and schema management 6. Graph generation Outline
  • 4. • Performance is a factor, but • Many storage back-ends can be adapted to graphs • E.g. relational DBs, column stores, key-value stores • Better reasons: • The domain model is graph-like • We can take inspiration from the way we naturally understand the world Why graph databases?
  • 6. • How do we relate data with concepts in order to make inferences or take action? • We use schemata — rules that constrain data to a language of “categories” (concepts) • Some fundamental categories are built in • E.g. plurality, necessity, limitation, negation, reciprocity, etc. • Others are built upon the foundation Kant’s “schemata” This schematism […] is an art, hidden in the depths of the human soul, whose true modes of action we shall only with difficulty discover and unveil. (Kant, 1781)
  • 7. • Psychologists saw Kant’s “schemata” in the organization of human memory (Head, 1920), but • Memory is more than storage and recall • We react to data by combining a schema with an attitude (Bartlett, 1933) Schemas in psychology
  • 8. • Scripts, plans, goals (Schank & Abelson, 1970s) • Frames (Minsky, 1974) • Early KR languages • Upper-level ontologies and commonsense knowledge bases Enter databases and AI
  • 9. Semantic Web to Property Graphs
  • 10. • A vocabulary for vocabulary sharing • Includes a handful of basic terms • Classes, properties, inheritance • Meets the needs of most Web schemas RDF Schema (RDFS)
  • 11. • A much more expressive language for ontology development • Supports: • Classes and properties with inheritance • Equality (sameAs, differentFrom, equivalentTo) • Property domain/range restrictions, cardinality restrictions • Inverse, transitive, and symmetric properties • Ontology metadata (imports, versioning) • Sublanguages OWL Full, OWL DL (description logic), OWL Lite • OWL 2 profiles EL (polynomial-time checking), QL (memory-efficient query answering), DL (completeness and decidability) • Is this slide too dense? OWL is huge. OWL
  • 12. • What commercial applications ended up using • Supported by AllegroGraph, TopBraid, etc. • All of RDFS • Classes, properties, inheritance • A few terms stolen from OWL • e.g. sameAs, inverseOf, TransitiveProperty “RDFS+”
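As a rough illustration of what the few OWL terms in "RDFS+" buy an application, the sketch below applies inverseOf and TransitiveProperty rules to a small set of triples. The triples and property names (ex:partOf, ex:hasPart) are invented for the example, and the naive fixpoint loop is only one assumed way to evaluate such rules, not a description of how AllegroGraph or TopBraid do it.

```python
def infer(triples, inverse_of=None, transitive=()):
    """Naive forward chaining for two RDFS+-style rules.

    triples    -- iterable of (subject, predicate, object) tuples
    inverse_of -- dict mapping a property to its inverse (hypothetical names)
    transitive -- collection of properties treated as transitive
    """
    inverse_of = dict(inverse_of or {})
    # inverseOf is symmetric: if p is the inverse of q, q is the inverse of p.
    inverse_of.update({q: p for p, q in list(inverse_of.items())})

    facts = set(triples)
    while True:
        new = set()
        for s, p, o in facts:
            if p in inverse_of:
                # (s p o) and p inverseOf q  =>  (o q s)
                new.add((o, inverse_of[p], s))
            if p in transitive:
                # (s p o) and (o p o2)  =>  (s p o2)
                for s2, p2, o2 in facts:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= facts:
            return facts  # fixpoint reached: nothing new was derived
        facts |= new


data = [("ex:wheel", "ex:partOf", "ex:car"), ("ex:car", "ex:partOf", "ex:fleet")]
closed = infer(data, inverse_of={"ex:partOf": "ex:hasPart"}, transitive={"ex:partOf"})
```

Here `closed` contains the two asserted triples, the transitive consequence that the wheel is part of the fleet, and the three corresponding ex:hasPart triples.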
  • 13. The Web of Data…
  • 14. • Property Graph data model takes a minimalist approach • Typically no inference or rules support • Graph DBs, schema.org are a response to real-world demands …simplified [Figure: example property graph with vertices 1, 2, 3 and edges labeled foo, foo, bar]
  • 15. Re-emergence of the graph schema
  • 16. • There is power in simplicity • NoSQL databases are said to have no predefined schema • In practice, every graph DB has a schema • A set of constraints or assumptions about correct structure • Useful for validation and optimization • There is no graph schema standard NoSQL ⇏ no schema
  • 17. • Property Graph data model is a basic schema • Edge labels (required) • Vertex labels (optional) • Property keys (required) • Property data types (optional, with optional constraints) • Vertex meta-properties (optional) Schemas in TinkerPop
  • 18. • Labels • Simple types on nodes and/or relationships • Indexes • Single-property — equality, existence, containment, ranges • Composite (multiple properties) — equality only • Constraints • Node property uniqueness • Node/relationship property existence • Node key (set of properties unique for the node) Schemas in Neo4j
  • 19. • Vertex and edge labels • Property keys • Property cardinality (SINGLE, LIST, SET) • Indexes • Graph-centric • Individual properties, composite • Vertex-centric (index on incoming/outgoing edges) • Sorting key, sort order • Automatic/implicit schema creation Schemas in JanusGraph
  • 20. • Object databases ≠ graph databases, but similar • Built-in, object-oriented schemas • Classes, extension, relationships, recursivity, etc. • Used for encapsulation, composition, inheritance, delegation, etc. • OOP frameworks for graph DBs • Frames, Ferma, etc. Schemas in object databases
  • 21. • Hypernode • Objects, relations, and functions • GROOVY • Multi-level OOP schemas • Hypergraph DB • Types and relationships • Grakn.AI • Entities, relations, roles, and resources (data type, uniqueness, regex) • Single inheritance Schemas in hypergraph databases
  • 22. Elements of a schema language
  • 23. • Support for a basic schema vocabulary • Entity and relationship types, constraints • Good coverage of existing schema frameworks • Extensibility of schemas and types • Mappings to RDF, schema.org, and storage frameworks • Reference APIs for • Schema validation • Graph schema initialization and migration • Statistical models, graph generation Design goals
  • 24. • Things about which we can make assertions • “Classes” in RDF, “types” in schema.org, “vertex labels” in TinkerPop, etc. • Extend other entity types Entity types
        entities:
          - label: Trip
            sameAs: http://schema.org/TravelAction
            description: A trip taken by a driver or requested by a rider
  • 25. • Assertions about things • “Properties” in RDF and schema.org • “Edges” vs. “properties” in graph databases • Hyperedges, meta-properties are also “relations” Relationship types
        relations:
          - label: requested
            description: Relates a rider to a trip he or she has requested
            extends:
              - core.relatedTo
            cardinality: OneToMany
            from: users.User
            to: Trip
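To make the entity and relationship definitions above concrete, here is a minimal validation sketch. It mirrors the slide YAML as plain Python dicts, adds users.User as a local entity so the example is self-contained, and ignores `extends` entirely. The reading of OneToMany as "each target has at most one source" is an assumption, since the slides do not define it.

```python
# Plain-dict mirror of the YAML on the slides; users.User is written out
# here as an entity so the example is self-contained (an assumption).
SCHEMA = {
    "entities": [{"label": "Trip"}, {"label": "users.User"}],
    "relations": [{
        "label": "requested",
        "cardinality": "OneToMany",
        "from": "users.User",
        "to": "Trip",
    }],
}


def validate_edges(schema, vertex_types, edges):
    """Return a list of human-readable violations (empty if the edges conform).

    vertex_types -- dict of vertex id -> entity label
    edges        -- list of (out_vertex, relation_label, in_vertex)
    """
    entities = {e["label"] for e in schema["entities"]}
    relations = {r["label"]: r for r in schema["relations"]}
    errors = []
    seen_targets = {}  # (label, in_vertex) -> out_vertex, for OneToMany checks

    for out_v, label, in_v in edges:
        rel = relations.get(label)
        if rel is None:
            errors.append(f"unknown relation {label!r}")
            continue
        # Check the type of each endpoint against the relation's from/to.
        for v, end in ((out_v, "from"), (in_v, "to")):
            actual = vertex_types.get(v)
            if actual not in entities or actual != rel[end]:
                errors.append(f"{label}: {v} is {actual!r}, expected {rel[end]!r}")
        # Assumed reading of OneToMany: one source may have many targets,
        # but each target has at most one source.
        if rel["cardinality"] == "OneToMany":
            prior = seen_targets.setdefault((label, in_v), out_v)
            if prior != out_v:
                errors.append(f"{label}: {in_v} already has source {prior}")
    return errors
```

With two users u1, u2 and two trips t1, t2, the edges (u1, requested, t1) and (u1, requested, t2) pass, while adding (u2, requested, t1) is reported as a cardinality violation.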
  • 26. • Graph-centric • Single-relation, composite • Entity-centric • Ordering on a secondary key Index hints
        indexes:
          - key: core.uuid
          - key: trips.requested
            direction: Out
            orderBy: core.createdAt
            order: Decreasing
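The second index hint above (an entity-centric index on outgoing trips.requested edges, ordered by core.createdAt, decreasing) might behave roughly like the in-memory structure below. The class and method names are hypothetical, the sort key is assumed to be numeric, and a real storage back-end would of course implement this very differently.

```python
from bisect import insort
from collections import defaultdict


class VertexCentricIndex:
    """Per-vertex list of outgoing edges, kept sorted by a secondary key."""

    def __init__(self, label, order_by, decreasing=True):
        self.label = label
        self.order_by = order_by
        # Negating a numeric key turns ascending order into descending order.
        self.sign = -1 if decreasing else 1
        self._edges = defaultdict(list)  # out-vertex -> sorted [(sort_key, in-vertex)]

    def add(self, out_v, label, in_v, props):
        if label != self.label:
            return  # this index only covers one relation
        insort(self._edges[out_v], (self.sign * props[self.order_by], in_v))

    def first(self, out_v, n):
        """The first n targets in index order, without sorting at query time."""
        return [in_v for _, in_v in self._edges[out_v][:n]]


idx = VertexCentricIndex("trips.requested", "core.createdAt", decreasing=True)
idx.add("u1", "trips.requested", "t1", {"core.createdAt": 100})
idx.add("u1", "trips.requested", "t2", {"core.createdAt": 300})
idx.add("u1", "trips.requested", "t3", {"core.createdAt": 200})
latest_two = idx.first("u1", 2)
```

The point of the ordering hint is that a query such as "the rider's most recent trips" reads a prefix of an already sorted list instead of scanning every edge on the vertex.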
  • 27. • Schemas import other schemas, like software modules • Give developers/teams autonomy, but • Coordinate schema integration top-down Schema imports
        name: production
        version: 1.2
        includes:
          - name: trips
            version: 1.2
          - name: referrals
            version: 1.2
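If schemas import one another like software modules, something has to resolve the includes into a load order. The sketch below does this with a depth-first walk. The registry dict, the core schema, and the rule that two versions of one schema may not appear in the same closure are all assumptions made for illustration; versions are kept as strings to avoid a YAML loader reading 1.2 as a float.

```python
def resolve_includes(name, version, registry):
    """Return (name, version) pairs in dependency order, the root last.

    registry -- dict mapping (name, version) to a schema dict; it stands in
                for wherever schema files actually live (an assumption).
    """
    ordered, visiting = [], set()

    def visit(n, v):
        key = (n, v)
        if key in ordered:
            return  # already resolved via another include path
        if key in visiting:
            raise ValueError(f"circular include at {n} {v}")
        if key not in registry:
            raise KeyError(f"missing schema {n} {v}")
        # Assumed policy: reject two versions of one schema in the same closure.
        if any(on == n and ov != v for on, ov in list(ordered) + list(visiting)):
            raise ValueError(f"conflicting versions of {n}")
        visiting.add(key)
        for inc in registry[key].get("includes", []):
            visit(inc["name"], inc["version"])
        visiting.discard(key)
        ordered.append(key)

    visit(name, version)
    return ordered


# Hypothetical registry; only the "production" entry comes from the slide.
REGISTRY = {
    ("production", "1.2"): {"includes": [{"name": "trips", "version": "1.2"},
                                         {"name": "referrals", "version": "1.2"}]},
    ("trips", "1.2"): {"includes": [{"name": "core", "version": "1.2"}]},
    ("referrals", "1.2"): {"includes": [{"name": "core", "version": "1.2"}]},
    ("core", "1.2"): {},
}
load_order = resolve_includes("production", "1.2", REGISTRY)
```

Each schema appears once in `load_order`, after everything it includes, which is the order in which a tool could safely load or validate them.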
  • 28. Graph and schema management
  • 29. • Study the source data • Extend and validate the shared schema • Generate artificial graph data • Study system performance, iterate on the model • Develop ingestion mappings for real data • Review and check in schema changes • Apply the schema to a live database • Ingest data into the live database Graph onboarding workflow
  • 31. • The schema is constantly changing • Is this database compatible with this schema? • How to update the database w.r.t. the schema? • Use revision control to find diffs • Ordered lists of basic changes • Translate diffs to storage-specific workflows • Ordered lists of idempotent operations • Apply diff workflows to the database Schema initialization, migration
        public enum SchemaChange {
            AbstractAttributeChanged,
            CardinalityChanged,
            DomainChanged,
            EntityAdded,
            EntityRemoved,
            ExtensionAdded,
            ExtensionRemoved,
            IncludeAdded,
            IncludeRemoved,
            IndexAdded,
            IndexRemoved,
            RangeChanged,
            RelationAdded,
            RelationRemoved,
            RequiredAttributeChanged,
            RequiredOfAttributeChanged,
            SchemaAdded,
            SchemaRemoved,
            SchemaNameChanged,
            SchemaVersionChanged,
        }
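The diff-then-apply idea can be sketched in a few lines. The example below uses a small subset of the change types named in the enum above, computes an ordered list of changes between two schema versions, and applies them idempotently to a toy "database" that is just a pair of label sets. The schema dicts, the ordering policy, and the entity names Promo and Referral are assumptions for illustration, not the original tooling.

```python
from enum import Enum


class SchemaChange(Enum):
    # A subset of the change types named on the slide.
    EntityAdded = "EntityAdded"
    EntityRemoved = "EntityRemoved"
    RelationAdded = "RelationAdded"
    RelationRemoved = "RelationRemoved"
    CardinalityChanged = "CardinalityChanged"


def diff_schemas(old, new):
    """Ordered list of (SchemaChange, label) turning `old` into `new`."""
    old_e = {e["label"] for e in old.get("entities", [])}
    new_e = {e["label"] for e in new.get("entities", [])}
    old_r = {r["label"]: r for r in old.get("relations", [])}
    new_r = {r["label"]: r for r in new.get("relations", [])}

    # Assumed ordering: add entities before the relations that may refer to
    # them, and remove relations before the entities they touch.
    changes = [(SchemaChange.EntityAdded, lb) for lb in sorted(new_e - old_e)]
    changes += [(SchemaChange.RelationAdded, lb) for lb in sorted(new_r.keys() - old_r.keys())]
    changes += [
        (SchemaChange.CardinalityChanged, lb)
        for lb in sorted(new_r.keys() & old_r.keys())
        if new_r[lb].get("cardinality") != old_r[lb].get("cardinality")
    ]
    changes += [(SchemaChange.RelationRemoved, lb) for lb in sorted(old_r.keys() - new_r.keys())]
    changes += [(SchemaChange.EntityRemoved, lb) for lb in sorted(old_e - new_e)]
    return changes


def apply_changes(db_state, changes):
    """Apply changes idempotently to a toy database: {"entities": set, "relations": set}."""
    for change, label in changes:
        if change is SchemaChange.EntityAdded:
            db_state["entities"].add(label)      # set.add is a no-op if present
        elif change is SchemaChange.EntityRemoved:
            db_state["entities"].discard(label)  # discard never raises
        elif change is SchemaChange.RelationAdded:
            db_state["relations"].add(label)
        elif change is SchemaChange.RelationRemoved:
            db_state["relations"].discard(label)
        # CardinalityChanged is ignored: the toy state does not track cardinality.
    return db_state
```

Because every operation is idempotent, re-running a partially applied workflow leaves the database in the same final state, which is what makes the "ordered lists of idempotent operations" on the slide safe to retry.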
  • 32. Schema diff and patch [Diagram: a new database is initialized with Schema x.1, giving a database at Schema x.1; a diff is found between Schema x.1 and Schema x.2; applying that diff gives a database at Schema x.2]
  • 33. Migration is not always possible Don’t feel bad! Basic schemas can’t be changed! • E.g. • Removal or abstraction of types already in use • Changes unsupported at the storage level
  • 35. • Problem: • Need to predict write throughput, read latency given 10x more data • Analytical solutions are difficult • Solution? • Generate graphs of different sizes • Study the trends • Problem: • Where do we get the data? • Shrinking or growing real data is difficult Capacity planning
  • 36. • Existing graph benchmarks • Lancichinetti-Fortunato-Radicchi (LFR) benchmark • graphdb-benchmarks • Linked Data Benchmark Council (LDBC) • SPARQL benchmarks for triple stores • None of these are very much like our data • Not a social network; no power law distributions • Vastly different topology • Idea: use the schema to generate statistically representative data Benchmarking options
  • 37. • Gather some statistics • Entity and relationship type distributions • Per-relationship in- and out-degree distributions • Add these to the schema • Give the Graphgen utility a dataset size, random seed • Graphgen attempts to create a graph in accordance with the model • Gather statistics from the generated graph • Compare and contrast • Same dataset can be generated in different environments Graph generation workflow
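The workflow above can be sketched for a single relation. The function below draws each source vertex's out-degree from a distribution that would, in this scheme, be stored alongside the schema, and a second function measures the distribution of the generated graph for comparison. This is not the Graphgen utility itself; the model shape, vertex naming, and the choice to give every edge a fresh target are assumptions.

```python
import random
from collections import Counter


def generate_edges(model, num_sources, seed):
    """Generate edges for one relation from an out-degree distribution.

    model -- {"label": ..., "out_degree": {degree: probability}}; this shape
             is hypothetical, standing in for statistics stored in a schema.
    """
    rng = random.Random(seed)  # private RNG, so the same seed gives the same graph
    degrees, weights = zip(*sorted(model["out_degree"].items()))
    edges = []
    next_target = 0
    for source in range(num_sources):
        degree = rng.choices(degrees, weights=weights)[0]
        for _ in range(degree):
            edges.append((f"s{source}", model["label"], f"t{next_target}"))
            next_target += 1  # a fresh target each time keeps in-degree at 1
    return edges


def out_degree_distribution(edges, num_sources):
    """Empirical out-degree distribution, for comparison with the model."""
    per_source = Counter(src for src, _, _ in edges)
    counts = Counter(per_source.get(f"s{i}", 0) for i in range(num_sources))
    return {d: c / num_sources for d, c in sorted(counts.items())}


model = {"label": "requested", "out_degree": {0: 0.2, 1: 0.5, 2: 0.3}}
graph_a = generate_edges(model, 1000, seed=42)
graph_b = generate_edges(model, 1000, seed=42)
measured = out_degree_distribution(graph_a, 1000)
```

Given the same model, size, and seed (and the same Python version), `graph_a` and `graph_b` are identical, which is the property that lets the same dataset be regenerated in different environments; `measured` should be close to the model's distribution for a large enough graph.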