Today:
Marc C. Hadfield, Founder

Vital AI

http://vital.ai
marc@vital.ai
917.463.4776
MetaQL:
Queries Across NoSQL,
SQL, Sparql, and Spark
<intro>

Quick Overview
agenda
MetaQL Intro
Motivation
Domain Models (Schema)
MetaQL DSL
MetaQL Implementations
Examples
MetaQL
Leverage Domain Model (Schema)
Compose Queries in Code: Typed
Execute Queries on Databases,
Interchangeably
Minimize TCO:

Separation of Concerns

Developer Efficiency
Query Framework
Executable JVM Code! (Groovy Closure)
MetaQL Origin
Across many data-driven application
implementations, a desire for:

Reusable Processes, Tools:

Stop re-inventing the wheel.
Tools to manage “schema” across an
application & organization.
Tools to combine Semantic Web,
NOSQL, and Hadoop/Spark.
Team Collaboration:

Human labor is usually the limiting factor.
sample
[Figure: sample graph. An EMail node (type:Email) connects to a Sender node (type:Person, Address:john@example.org) by a hasSender edge (type:hasSender) and to a Recipient node (type:Person) by a hasRecipient edge (type:hasRecipient). Each connection is an ARC. Sender notEqual Recipient.]
sample MetaQL graph query
GRAPH {
    value segments: ["mydata"]
    ARC {
        node_constraint { Email.class }
        constraint { "?person1 != ?person2" }
        ARC_AND {
            ARC {
                edge_constraint { Edge_hasSender.class }
                node_constraint { Person.props().emailAddress.equalTo("john@example.org") }
                node_constraint { Person.class }
                node_provides { "person1 = URI" }
            }
            ARC {
                edge_constraint { Edge_hasRecipient.class }
                node_constraint { Person.class }
                node_provides { "person2 = URI" }
            }
        }
    }
}
Internet of Things
Amazon Echo
Internet of Things
Coffee
Internet of Things:
Batch and
Stream
Processing
Amazon Echo
Amazon Echo Service
haley-app webservice
Vert.X
Vital Prime
Database
DataScript
Hadoop - HDFS
Apache Spark
Streaming, MLLIB, NLP, GraphX
Aspen Datawarehouse
Analytics Layer
Serving Layer
Haley Device
Raspberry Pi
Voice to Text API
Cognitive Application
NLP and Inference to
process User request.
Query Knowledge in DB
Streaming Prediction Models:
“Should I really have
more Coffee?”
External APIs…
Demo Examples
Vital Prime
Database
Vert.X
Vital-Vertx
JavaScript WebApp
VitalService-JS
Prediction
Models
DataScript
https://github.com/vital-ai/vital-examples
Demo Example
https://demos.vital.ai/enron-js-app/index.html
https://github.com/vital-ai/vital-examples/tree/master/enron-js-app
Demo Example: Recipient, EMail, hasRecipient
Cytoscape Plugin
https://github.com/vital-ai/vital-cytoscape
http://cytoscape.org/
Cytoscape Plugin: Wordnet Data, “wine, vino”
where are we using MetaQL?
Financial Services
Healthcare
Internet-of-Things
Start-Ups, Recommendation Apps
motivation for MetaQL
application architecture
Batch and
Stream
Processing
Web / Mobile
Application
Application Server
Transactional
Database
Hadoop - HDFS
Apache Spark
Streaming, MLLIB, GraphX
Analytics Layer
Serving Layer
Key/Value
Cache
External APIs
External API Services
Multiple Databases +
Analytics +
External APIs
enterprise application architecture
Dashboard
Application Server
Enterprise Datawarehouse
Data Silo Data Silo Data Silo Data Silo Data Silo
∞
Many Many Many Data Models…
volume, velocity, variety
polyglot persistence = multiple database technologies
…but we also have very many data models.
many databases, many data models, changing rapidly.
too many moving parts for a developer to reasonably manage!
need fewer APIs to learn!
what happens when changes occur?
Task
Infrastructure
DevOps
Data Scientists
Business +
Domain Experts
Developers
Roles
what changes?
Data Model Changes
New Data Sources
Infrastructure Change
Switch Databases
New Prediction Models / Features
New Service APIs…
Many Interdependencies…
Example: Change in the taxonomy of a categorization
service breaks all the logic tied to the old categories.
total cost of ownership
How much code changes when we modify our data
model to include new sources?
How to minimize by decoupling dependencies?
When we switch database technologies?
Domain Model as “Contract”
Infrastructure
DevOps
Data Scientists
Business +
Domain Experts
Developers
Domain
Model
Everyone agrees on (or is at least aware of)
the definition of Domain Concepts.
Use semantics to map “views”.
MetaQL Abstraction
Infrastructure
DevOps
Data Scientists
Business +
Domain Experts
Developers
Domain
Model
MetaQL
Abstraction to give breathing room to Infrastructure.
Infrastructure / DevOps
Database Types:
• Key/Value
• Document
• RDF Graph
• NOSQL
• Relational
• Timeseries
ACID vs. BASE
Optimizing Query Generation
Tuning Secondary Indices
Update MetaQL DSL for new DB features
CAP Theorem
Domain Model (Schema)
Domain Model Implementation
Combine:
SQL-style Schema
with
Hadoop Data Serialization Schema
(Avro, Thrift, Protocol Buffers, Kryo, Parquet)
add
Semantics: the “Meaning” of objects
Not a table “person”, but define the concept of
Person to be used throughout an application.
The implementation decides how to store “Person”
data in its database.
Domain Model Implementation
Domain Model definition resolves:
RDF vs Property Graph model
Object Relational Impedance Mismatch
Use OWL to capture Domain Model:
SubClasses
SubProperties
Multiple Inheritance
Marginal technology performance gains are hugely outweighed
by human productivity gains and a wider choice of tools.
Compromise
across modeling
paradigms.
Domain Model Implementation
Example: Healthcare Application:

URI<Person123> IS_A:
• Patient
• BillableAccount
• InsuredEntity
Same URI across three domain concepts:

Diagnostics Records, Billing System, Insurance System.
Implementation Note:
We generate code for the JVM using “traits” as a way to implement
multiple inheritance (Groovy, Scala, Java8).
The trait is used as a semantic marker to link to the Domain Model.
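A minimal sketch of the trait idea, assuming plain Groovy traits and hypothetical class and property names (this is not the code VitalSigns generates). It shows one object carrying three domain concepts at once:

```groovy
// Hypothetical domain concepts mirroring the healthcare example.
trait Patient { String diagnosis }
trait BillableAccount { BigDecimal balance = 0.0 }
trait InsuredEntity { String policyNumber }

// One object, one URI, three domain concepts.
class Person123 implements Patient, BillableAccount, InsuredEntity {
    String uri = 'urn:example:Person123'
}

def p = new Person123(diagnosis: 'flu', policyNumber: 'P-42')
p.balance += 100.0

// Each trait works as a marker that code can test against.
println "${p.uri} isPatient=${p instanceof Patient} balance=${p.balance}"
```

Diagnostics, billing, and insurance code can each depend only on the trait they care about while sharing the same object.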
Domain Model - Core Classes
Node, Edge
HyperNode, HyperEdge
Properties:
• URI
• Primary Type
• Types

Edges/HyperEdges:
• Source URI
• Destination URI
Edges:
• Peer
• Taxonomy
Class Instances
contain Properties.
Protege OWL Editor
VitalSigns: Domain Model Dev Kit
$ vitalsigns generate -o ./domain-ontology/enron-dataset-1.0.0.owl
$ ls domain-groovy-jar
enron-dataset-groovy-1.0.0.jar
$ ls domain-json-schema
enron-dataset-1.0.0.js
OWL can be compiled into JVM code
statically (create an artifact for maven), or
done dynamically at runtime.
Development with the Domain Model
Code Completion from
Domain Model
Development with the Domain Model
VitalSigns vs = VitalSigns.get()
Musician john = new Musician().generateURI("john")
john.name = "John Lennon"
john.birthday = "October 9, 1940"^xsd.xdatetime("MMMM d, yyyy")
MusicGroup thebeatles = new MusicGroup().generateURI("thebeatles")
thebeatles.name = "The Beatles"
// try to assign the wrong property, throws an exception
try {
    thebeatles.birthday = "January 1, 1970"^xsd.xdatetime("MMMM d, yyyy")
} catch (Exception ex) { println ex } // no such property exception
vs.addToCache( thebeatles.addEdge_hasMember(john) )
// use cache to resolve queries
thebeatles.getMembers().each { println it.name }
// use database to resolve queries
thebeatles.getMembers(ServiceWide).each { println it.name }
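A rough sketch of the two behaviours shown above, using hypothetical stand-in classes rather than the VitalSigns API: a typed object rejects a property its class does not declare, and an in-memory cache resolves members by following edges.

```groovy
class Musician { String uri; String name }
class MusicGroup { String uri; String name }
class Edge { String sourceURI; String destinationURI }

// Hypothetical in-memory cache: objects by URI plus a list of edges.
class Cache {
    Map objects = [:]
    List<Edge> edges = []
    void add(obj) { objects[obj.uri] = obj }
    void link(src, dst) { edges << new Edge(sourceURI: src.uri, destinationURI: dst.uri) }
    List members(group) {
        edges.findAll { it.sourceURI == group.uri }.collect { objects[it.destinationURI] }
    }
}

def cache = new Cache()
def john = new Musician(uri: 'john', name: 'John Lennon')
def beatles = new MusicGroup(uri: 'thebeatles', name: 'The Beatles')
cache.add(john)
cache.add(beatles)
cache.link(beatles, john)

// Assigning a property the class does not declare fails at runtime.
def rejected = false
try {
    beatles.birthday = 'January 1, 1970'
} catch (MissingPropertyException ex) {
    rejected = true
}

println cache.members(beatles)*.name
```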
Implicit MetaQL Queries
VitalService API
• Open/Close Endpoint
• Create/Remove Segment
• Create/Read/Update/Delete Object
• Queries (MetaQL as input closure)
• Service Operations (MetaQL as input closure)
• callFunction (DataScript)
• init Transaction/Commit/Rollback
A “Segment” is a Database (container of objects)
MetaQL
VitalSigns: Domain Model Manager
• MetaQL DSL
• Prediction Model DSL
• Pipeline Transformation DSL (ETL)
(in development)
A tricky bit is finding the best way to express
the DSL within the allowed grammar of the
host language (Groovy).

It’s an ongoing effort.
Query Types
AGGREGATION
PATH
GRAPH
SELECT
Query Elements
• constraints: node_constraint, edge_constraint, …
• comparators (equalTo, greaterThan, …)
• provides, ?reference
• AND, OR
• OPTIONAL
• Sort Criteria
SELECT query
SELECT {
value limit: 100
value offset: 0
value segments: ["mydata"]
constraint { Person.class }
constraint { Person.props().name.equalTo("John" ) }
}
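How a block like this can be plain executable Groovy: a sketch, assuming a hypothetical SelectSpec class and SELECT method (not the MetaQL implementation), in which the closure runs against a delegate that records each declaration.

```groovy
// Hypothetical builder: collects what a SELECT closure declares.
class SelectSpec {
    Map settings = [:]
    List constraints = []
    // "value limit: 100" is the method call value([limit: 100])
    void value(Map m) { settings.putAll(m) }
    // each constraint closure is evaluated to a plain value
    void constraint(Closure c) { constraints << c.call() }
}

SelectSpec SELECT(Closure body) {
    def spec = new SelectSpec()
    body.delegate = spec
    body.resolveStrategy = Closure.DELEGATE_FIRST
    body.call()
    spec
}

def q = SELECT {
    value limit: 100
    value offset: 0
    value segments: ['mydata']
    constraint { [property: 'name', op: 'equalTo', value: 'John'] }
}

println q.settings
println q.constraints
```

The resulting object is ordinary data, which is what lets a separate generator translate it for a particular database.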
GRAPH query
GRAPH {
    value segments: ["mydata"]
    ARC {
        node_constraint { Email.class }
        constraint { "?person1 != ?person2" }
        ARC_AND {
            ARC {
                edge_constraint { Edge_hasSender.class }
                node_constraint { Person.props().emailAddress.equalTo("john@example.org") }
                node_constraint { Person.class }
                node_provides { "person1 = URI" }
            }
            ARC {
                edge_constraint { Edge_hasRecipient.class }
                node_constraint { Person.class }
                node_provides { "person2 = URI" }
            }
        }
    }
}
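What the query asks for can be sketched over in-memory data. The maps below are hypothetical sample data, and the loop is only an illustration of the pattern's meaning (emails sent by john@example.org to someone other than the sender), not how MetaQL evaluates it.

```groovy
// Hypothetical in-memory data standing in for the "mydata" segment.
def people = [
    p1: [emailAddress: 'john@example.org'],
    p2: [emailAddress: 'mary@example.org'],
]
def emails = [
    [uri: 'e1', sender: 'p1', recipients: ['p2']],
    [uri: 'e2', sender: 'p1', recipients: ['p1']],   // self-addressed, filtered out
    [uri: 'e3', sender: 'p2', recipients: ['p1']],   // wrong sender
]

// Each solution binds person1 (sender) and person2 (recipient).
def solutions = []
emails.each { email ->
    def person1 = email.sender
    if (people[person1].emailAddress != 'john@example.org') return
    email.recipients.each { person2 ->
        if (person1 != person2) {
            solutions << [email: email.uri, person1: person1, person2: person2]
        }
    }
}

println solutions
```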
GRAPH query (2)
GRAPH {
value segments: [VitalSegment.withId('wordnet')]
value inlineObjects: true
ARC {
node_bind { "node1" }
node_constraint { SynsetNode.expandSubclasses(true) }
node_constraint { SynsetNode.props().name.contains_i("happy") }
ARC {
edge_bind { "edge" }
node_bind { "node2" }
}
}
}
Code iterating over results can use bind names to
reference objects in each solution: node1, edge, node2.
Inline objects are requested by "value inlineObjects: true" in the query.
PATH query
def forward = true
def reverse = false
PATH {
value segments: segments
value maxdepth: 5

value rootURIs: [URIProperty.withString(inputURI)]
if( forward ) {
ARC {
value direction: 'forward'
// accept any edge: edge_constraint { }
// accept any node: node_constraint { }
}
}
if( reverse ) {
ARC {
value direction: 'reverse'
// accept any edge: edge_constraint { }
// accept any node: node_constraint { }
}
}
}
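The idea behind a depth-bounded PATH expansion can be sketched as a breadth-first walk. The edge list and the reachable helper are hypothetical and only illustrate maxdepth and direction; they are not the MetaQL implementation.

```groovy
// Hypothetical edge list: [source, destination]
def edges = [['a', 'b'], ['b', 'c'], ['c', 'd'], ['x', 'a']]

// Breadth-first expansion from a root, bounded by maxdepth.
def reachable(List edgeList, String root, int maxdepth, boolean forward, boolean reverse) {
    def visited = [root] as Set
    def frontier = [root]
    for (int depth = 0; depth < maxdepth && frontier; depth++) {
        def next = []
        frontier.each { node ->
            if (forward) next.addAll(edgeList.findAll { it[0] == node }.collect { it[1] })
            if (reverse) next.addAll(edgeList.findAll { it[1] == node }.collect { it[0] })
        }
        // keep only nodes not seen before; add() returns false for repeats
        frontier = next.findAll { visited.add(it) }
    }
    visited - root
}

println reachable(edges, 'a', 2, true, false)
println reachable(edges, 'a', 5, false, true)
```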
AGGREGATION query
SUM Product.props().cost
AVERAGE Person.props().birthday
COUNT_DISTINCT Document.props().active
FIRST { DISTINCT Document.props().title, expandProperty : false, order: Order.ASC }
Part of a SELECT query
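For intuition, the same aggregates computed over a small in-memory list with standard Groovy collection methods. The product records are hypothetical sample data.

```groovy
def products = [
    [name: 'mug',   cost: 4.50,  active: true],
    [name: 'beans', cost: 12.00, active: true],
    [name: 'press', cost: 30.00, active: false],
]

def sum = products*.cost.sum()                          // SUM
def average = sum / products.size()                     // AVERAGE
def countDistinct = products*.active.unique().size()    // COUNT_DISTINCT
def first = products*.name.sort()[0]                    // FIRST, ascending order

println "sum=${sum} average=${average} distinct=${countDistinct} first=${first}"
```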
Service Operations DSL
Insert
Update
Delete
Service Operations
INSERT {
value segment: 'testing'
insert(MusicGroup.class, provides: "thebeatles") {
MusicGroup thebeatles ->
thebeatles.name = "The Beatles"
thebeatles.URI = "thebeatles"
}
insert(Musician.class, provides: "john") {
Musician john ->
john.name = "John"
john.URI = "john"
}
insert(Edge_hasMember) { Edge_hasMember member ->
member.sourceURI = ref("thebeatles").toString()
member.destinationURI = ref("john").toString()
member.URI = "edge1"
}
}
<— Using “provides” values
Transactions
Implemented at the service level:
def xid = service.startTransaction()
service.save(xid, person123)
service.commitTransaction(xid)
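A sketch of what service-level transactions could look like, assuming a hypothetical InMemoryService class. The method names follow the calls above, plus an assumed rollbackTransaction; the buffering strategy is an illustration only.

```groovy
// Hypothetical service: writes are buffered per transaction id until commit.
class InMemoryService {
    Map committed = [:]
    Map pending = [:]
    int counter = 0

    String startTransaction() {
        String xid = "tx" + (++counter)
        pending[xid] = [:]
        xid
    }
    void save(String xid, Map obj) { pending[xid][obj.uri] = obj }
    void commitTransaction(String xid) { committed.putAll(pending.remove(xid)) }
    void rollbackTransaction(String xid) { pending.remove(xid) }
}

def service = new InMemoryService()

def xid = service.startTransaction()
service.save(xid, [uri: 'person123', name: 'John'])
def visibleBeforeCommit = service.committed.containsKey('person123')
service.commitTransaction(xid)

def xid2 = service.startTransaction()
service.save(xid2, [uri: 'person456', name: 'Mary'])
service.rollbackTransaction(xid2)   // discarded, never becomes visible

println service.committed.keySet()
```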
MetaQL Implementations
MetaQL
Executable
Query
Query Generator
Sparql/RDF Implementation
G S P O
Quad Store
Franz AllegroGraph
Sparql/RDF Implementation
VitalGraphQuery q = builder.query {
    GRAPH {
        value segments: ["documents"]
        ARC {
            node_constraint { Person.class }
            node_constraint { Person.props().emailID.equalTo("k.lay@enron.com") }
            ARC {
                node_constraint { EMailMessage.class }
                edge_constraint { Edge_hasEMailMessage.class }
            }
        }
    }
}.toQuery()
println "Query: " + q.toSparql()
Sparql/RDF Implementation
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX vital-core: <http://vital.ai/ontology/vital-core#>
PREFIX p0: <http://vital.ai/ontology/enron-emails#>
SELECT DISTINCT ?s1 ?d2 ?e2
FROM <segment:customer__app__documents>
WHERE {
  {
    ?s1 p0:hasEmailID ?value1 .
    ?s1 rdf:type ?value2 .
    FILTER (
      ?value2 = p0:Person && ?value1 = "k.lay@enron.com"^^xsd:string
    )
    {
      ?d2 rdf:type ?value3 .
      ?e2 rdf:type ?value4 .
      FILTER (
        ?value3 = p0:EMailMessage && ?value4 = p0:Edge_hasEMailMessage
      )
      ?e2 vital-core:hasEdgeSource ?s1 .
      ?e2 vital-core:hasEdgeDestination ?d2 .
    }
  }
}
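A much-reduced sketch of query generation, assuming a hypothetical constraint list and toSparql helper. It reproduces only the shape of the first pattern-plus-FILTER group above and is not the MetaQL generator.

```groovy
// Hypothetical constraints; prefixes and property names come from the query above.
def constraints = [
    [var: '?s1', property: 'p0:hasEmailID', value: '"k.lay@enron.com"^^xsd:string'],
    [var: '?s1', property: 'rdf:type',      value: 'p0:Person'],
]

String toSparql(List constraintList, String graph) {
    def patterns = []
    def filters = []
    constraintList.eachWithIndex { c, i ->
        def v = "?value${i + 1}"
        patterns << "  ${c.var} ${c.property} ${v} ."
        filters << "${v} = ${c.value}"
    }
    def lines = ["SELECT DISTINCT ?s1", "FROM <${graph}>", "WHERE {"]
    lines.addAll(patterns)
    lines << "  FILTER ( ${filters.join(' && ')} )"
    lines << "}"
    lines.join('\n')
}

def sparql = toSparql(constraints, 'segment:customer__app__documents')
println sparql
```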
Spark-SQL / Dataframe
URI P V
Segment RDD Property RDD
K V
Experimenting with: new Dataframe Optimizer: Catalyst, new Dataframe
DSL for query generation, and using GraphX for isolated Graph Query cases
Generate “Bad” queries, with optimizer fixing them and Spark
partitioning RDDs, as long as Spark is aware of Schema.
Key/Value Implementation
K V
URI —> Serialized Object
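A sketch of the URI-to-serialized-object mapping, assuming JSON as the serialization format and a plain map standing in for the key/value store. Both choices are illustrative assumptions, and the put and get closures are hypothetical.

```groovy
import groovy.json.JsonOutput
import groovy.json.JsonSlurper

// Hypothetical store: a plain map standing in for a key/value database.
def store = [:]

// Key is the object's URI, value is the serialized object.
def put = { Map obj -> store[obj['URI']] = JsonOutput.toJson(obj) }
def get = { String uri -> store[uri] ? new JsonSlurper().parseText(store[uri]) : null }

put([URI: 'urn:person123', type: 'Person', name: 'John'])
def loaded = get('urn:person123')

println store['urn:person123']
println loaded.name
```

Lookups by URI are direct; any query on a property other than the key would need a scan or a separate index.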
Lucene/SOLR Implementation
DocID: 1, 2, 3
P1: V1, V1
P2: V2, V2
P3: V3, V3
P4: V4, V4
Inverted Index of Property Values…
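The inverted-index idea in miniature: map each property=value pair to the set of document IDs that contain it, then answer a conjunction by intersecting sets. The documents are hypothetical, and this is a plain-Groovy illustration rather than Lucene or SOLR code.

```groovy
def docs = [
    1: [type: 'Person', name: 'John'],
    2: [type: 'Person', name: 'Mary'],
    3: [type: 'Email',  name: 'John'],
]

// Build "property=value" -> sorted set of DocIDs.
def index = [:].withDefault { new TreeSet() }
docs.each { id, props ->
    props.each { p, v -> index["${p}=${v}".toString()] << id }
}

// AND of two constraints is an intersection of the two ID sets.
def hits = index['type=Person'].intersect(index['name=John'])

println index
println hits
```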
NoSQL BigTable Implementation
DynamoDB (HBase, Cassandra, Accumulo, …)
ROWID  C1     C2     C3     C4
1      K1=V1  K1=V1  K1=V1  K1=V1, K1=V1
2      K1=V1  K1=V1  K1=V1  K1=V1, K1=V1
3      K1=V1  K1=V1  K1=V1  K1=V1, K1=V1
URI P V
Per Segment object table
Per Segment property table
+ Secondary Indices
+ Secondary Indices
SQL Implementation
SQL, Hive-SQL, Redshift, …
G S P O
Per Segment Table
with Partitioning (Hive)
implementation
DSL Documentation to be posted:
http://www.metaql.org/
VitalSigns, VitalService, MetaQL
https://dashboard.vital.ai/
Vital AI github: https://github.com/vital-ai/
Sample Code

Spark Code: Aspen, Aspen-Datawarehouse
Documentation Coming!
closing thoughts
Separation of Concerns yields
the Agility needed to keep up
with rapidly evolving Data.
“Domain Model as Contract” provides a framework for
consistent interpretation of Data across an application.
MetaQL provides a framework for the consistent
access and query of Data across an application.
Context: Data-Driven Application / Cognitive Applications:
Thank You!
Marc C. Hadfield, Founder

Vital AI

http://vital.ai
marc@vital.ai
917.463.4776
Pipeline DSL (ETL)
PIPELINE { // Workflow
PIPE { // a Workflow Component with dependencies
TRANSFORM { // Joins across Datasets

IF (RULE { } ) // Boolean, Query, Construct, …

THEN { RULE { } }

ELSE { RULE { } }

}
PIPE { … } // dependent PIPE
} // Output Dataset
PIPE { …

}
}
Influenced by Spark Pipeline and
Google Dataflow Pipeline
Schema Upgrade/Downgrade
UPGRADE {
upgrade(
oldClass: OLD_Person.class,
newClass: NEW_Person.class ) {
person_old, person_new ->
person_new.newName = person_old.oldName
}
}
DOWNGRADE {
downgrade(
newClass: NEW_Person.class,
oldClass: OLD_Person.class ) {
person_new, person_old ->
person_old.oldName = person_new.newName
}
}
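A sketch of the upgrade/downgrade pairing, assuming a hypothetical registry keyed by (from, to) class pairs. The register and migrate closures are illustrative names, not part of the DSL shown above.

```groovy
class OLD_Person { String oldName }
class NEW_Person { String newName }

// Hypothetical registry: maps a [from, to] class pair to a copy closure.
def migrations = [:]
def register = { Class from, Class to, Closure copy -> migrations[[from, to]] = copy }
def migrate = { obj, Class to ->
    def target = to.newInstance()
    migrations[[obj.getClass(), to]].call(obj, target)
    target
}

register(OLD_Person, NEW_Person, { o, n -> n.newName = o.oldName })   // upgrade
register(NEW_Person, OLD_Person, { n, o -> o.oldName = n.newName })   // downgrade

def upgraded = migrate(new OLD_Person(oldName: 'John'), NEW_Person)
def roundTrip = migrate(upgraded, OLD_Person)

println "${upgraded.newName} -> ${roundTrip.oldName}"
```

Defining both directions lets data written under either schema version be read by code built against the other.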
Multiple Endpoints
def service1 = VitalService.getService(profile: "kv-users")
def service2 = VitalService.getService(profile: "posts-db")
def service3 = VitalService.getService(profile: "friendgraph-db")
// given user URI:user123@email.org
// get user object from service1
// find friends of user in friendgraph via service3
// find posts of friends in posts-db
// update service1 with cache of user-to-friends-postings
// send postings of friends to user in UI
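The commented steps can be walked through with in-memory stand-ins. The three maps below are hypothetical substitutes for the kv-users, posts-db, and friendgraph-db endpoints, and no VitalService calls are made.

```groovy
// Hypothetical in-memory stand-ins for the three service profiles.
def users = ['user123@email.org': [name: 'User 123']]              // kv-users
def posts = [[author: 'amy', text: 'hello'],
             [author: 'bob', text: 'hi'],
             [author: 'zed', text: 'unrelated']]                    // posts-db
def friendGraph = ['user123@email.org': ['amy', 'bob']]            // friendgraph-db

def userURI = 'user123@email.org'
def user = users[userURI]                                  // get user object
def friends = friendGraph[userURI] ?: []                   // find friends of user
def friendPosts = posts.findAll { it.author in friends }   // find posts of friends
user.friendPostings = friendPosts                          // cache on the user record
println friendPosts*.text                                  // send postings to the UI
```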

More Related Content

What's hot

Graph Analysis over JSON, Larus
Graph Analysis over JSON, LarusGraph Analysis over JSON, Larus
Graph Analysis over JSON, Larus
Neo4j
 
How Graph Databases efficiently store, manage and query connected data at s...
How Graph Databases efficiently  store, manage and query  connected data at s...How Graph Databases efficiently  store, manage and query  connected data at s...
How Graph Databases efficiently store, manage and query connected data at s...
jexp
 
Connected datalondon metadata-driven apps
Connected datalondon metadata-driven appsConnected datalondon metadata-driven apps
Connected datalondon metadata-driven apps
Connected Data World
 
Family tree of data – provenance and neo4j
Family tree of data – provenance and neo4jFamily tree of data – provenance and neo4j
Family tree of data – provenance and neo4j
M. David Allen
 
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...South London Geek Nights
 
Applied Machine learning using H2O, python and R Workshop
Applied Machine learning using H2O, python and R WorkshopApplied Machine learning using H2O, python and R Workshop
Applied Machine learning using H2O, python and R Workshop
Avkash Chauhan
 
An Introduction to Graph: Database, Analytics, and Cloud Services
An Introduction to Graph:  Database, Analytics, and Cloud ServicesAn Introduction to Graph:  Database, Analytics, and Cloud Services
An Introduction to Graph: Database, Analytics, and Cloud Services
Jean Ihm
 
The Semantic Knowledge Graph
The Semantic Knowledge GraphThe Semantic Knowledge Graph
The Semantic Knowledge Graph
Trey Grainger
 
Agile data science: Distributed, Interactive, Integrated, Semantic, Micro Ser...
Agile data science: Distributed, Interactive, Integrated, Semantic, Micro Ser...Agile data science: Distributed, Interactive, Integrated, Semantic, Micro Ser...
Agile data science: Distributed, Interactive, Integrated, Semantic, Micro Ser...
Andy Petrella
 
Large Scale Graph Analytics with RDF and LPG Parallel Processing
Large Scale Graph Analytics with RDF and LPG Parallel ProcessingLarge Scale Graph Analytics with RDF and LPG Parallel Processing
Large Scale Graph Analytics with RDF and LPG Parallel Processing
Cambridge Semantics
 
Strata sf - Amundsen presentation
Strata sf - Amundsen presentationStrata sf - Amundsen presentation
Strata sf - Amundsen presentation
Tao Feng
 
How Lyft Drives Data Discovery
How Lyft Drives Data DiscoveryHow Lyft Drives Data Discovery
How Lyft Drives Data Discovery
Neo4j
 
Forces and Threats in a Data Warehouse (and why metadata and architecture is ...
Forces and Threats in a Data Warehouse (and why metadata and architecture is ...Forces and Threats in a Data Warehouse (and why metadata and architecture is ...
Forces and Threats in a Data Warehouse (and why metadata and architecture is ...
Stefan Urbanek
 
Leveraging mesos as the ultimate distributed data science platform
Leveraging mesos as the ultimate distributed data science platformLeveraging mesos as the ultimate distributed data science platform
Leveraging mesos as the ultimate distributed data science platform
Andy Petrella
 
How Lyft Drives Data Discovery
How Lyft Drives Data DiscoveryHow Lyft Drives Data Discovery
How Lyft Drives Data Discovery
Neo4j
 
Building Enterprise-Ready Knowledge Graph Applications in the Cloud
Building Enterprise-Ready Knowledge Graph Applications in the CloudBuilding Enterprise-Ready Knowledge Graph Applications in the Cloud
Building Enterprise-Ready Knowledge Graph Applications in the Cloud
Peter Haase
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
MongoDB
 
Disrupting Data Discovery
Disrupting Data DiscoveryDisrupting Data Discovery
Disrupting Data Discovery
markgrover
 
SHACL-based data life cycle management
SHACL-based data life cycle managementSHACL-based data life cycle management
SHACL-based data life cycle management
Connected Data World
 
Graph based data models
Graph based data modelsGraph based data models
Graph based data models
Moumie Soulemane
 

What's hot (20)

Graph Analysis over JSON, Larus
Graph Analysis over JSON, LarusGraph Analysis over JSON, Larus
Graph Analysis over JSON, Larus
 
How Graph Databases efficiently store, manage and query connected data at s...
How Graph Databases efficiently  store, manage and query  connected data at s...How Graph Databases efficiently  store, manage and query  connected data at s...
How Graph Databases efficiently store, manage and query connected data at s...
 
Connected datalondon metadata-driven apps
Connected datalondon metadata-driven appsConnected datalondon metadata-driven apps
Connected datalondon metadata-driven apps
 
Family tree of data – provenance and neo4j
Family tree of data – provenance and neo4jFamily tree of data – provenance and neo4j
Family tree of data – provenance and neo4j
 
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...
NoSQL: what does it mean, how did we get here, and why should I care? - Hugo ...
 
Applied Machine learning using H2O, python and R Workshop
Applied Machine learning using H2O, python and R WorkshopApplied Machine learning using H2O, python and R Workshop
Applied Machine learning using H2O, python and R Workshop
 
An Introduction to Graph: Database, Analytics, and Cloud Services
An Introduction to Graph:  Database, Analytics, and Cloud ServicesAn Introduction to Graph:  Database, Analytics, and Cloud Services
An Introduction to Graph: Database, Analytics, and Cloud Services
 
The Semantic Knowledge Graph
The Semantic Knowledge GraphThe Semantic Knowledge Graph
The Semantic Knowledge Graph
 
Agile data science: Distributed, Interactive, Integrated, Semantic, Micro Ser...
Agile data science: Distributed, Interactive, Integrated, Semantic, Micro Ser...Agile data science: Distributed, Interactive, Integrated, Semantic, Micro Ser...
Agile data science: Distributed, Interactive, Integrated, Semantic, Micro Ser...
 
Large Scale Graph Analytics with RDF and LPG Parallel Processing
Large Scale Graph Analytics with RDF and LPG Parallel ProcessingLarge Scale Graph Analytics with RDF and LPG Parallel Processing
Large Scale Graph Analytics with RDF and LPG Parallel Processing
 
Strata sf - Amundsen presentation
Strata sf - Amundsen presentationStrata sf - Amundsen presentation
Strata sf - Amundsen presentation
 
How Lyft Drives Data Discovery
How Lyft Drives Data DiscoveryHow Lyft Drives Data Discovery
How Lyft Drives Data Discovery
 
Forces and Threats in a Data Warehouse (and why metadata and architecture is ...
Forces and Threats in a Data Warehouse (and why metadata and architecture is ...Forces and Threats in a Data Warehouse (and why metadata and architecture is ...
Forces and Threats in a Data Warehouse (and why metadata and architecture is ...
 
Leveraging mesos as the ultimate distributed data science platform
Leveraging mesos as the ultimate distributed data science platformLeveraging mesos as the ultimate distributed data science platform
Leveraging mesos as the ultimate distributed data science platform
 
How Lyft Drives Data Discovery
How Lyft Drives Data DiscoveryHow Lyft Drives Data Discovery
How Lyft Drives Data Discovery
 
Building Enterprise-Ready Knowledge Graph Applications in the Cloud
Building Enterprise-Ready Knowledge Graph Applications in the CloudBuilding Enterprise-Ready Knowledge Graph Applications in the Cloud
Building Enterprise-Ready Knowledge Graph Applications in the Cloud
 
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data LakesWebinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes
 
Disrupting Data Discovery
Disrupting Data DiscoveryDisrupting Data Discovery
Disrupting Data Discovery
 
SHACL-based data life cycle management
SHACL-based data life cycle managementSHACL-based data life cycle management
SHACL-based data life cycle management
 
Graph based data models
Graph based data modelsGraph based data models
Graph based data models
 

Similar to Vital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and Spark

Introduction to Designing and Building Big Data Applications
Introduction to Designing and Building Big Data ApplicationsIntroduction to Designing and Building Big Data Applications
Introduction to Designing and Building Big Data Applications
Cloudera, Inc.
 
QuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing WebinarQuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing Webinar
RTTS
 
Ops Jumpstart: MongoDB Administration 101
Ops Jumpstart: MongoDB Administration 101Ops Jumpstart: MongoDB Administration 101
Ops Jumpstart: MongoDB Administration 101
MongoDB
 
Building and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache AirflowBuilding and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache Airflow
Kaxil Naik
 
Graph Databases in the Microsoft Ecosystem
Graph Databases in the Microsoft EcosystemGraph Databases in the Microsoft Ecosystem
Graph Databases in the Microsoft Ecosystem
Marco Parenzan
 
Confluent & MongoDB APAC Lunch & Learn
Confluent & MongoDB APAC Lunch & LearnConfluent & MongoDB APAC Lunch & Learn
Confluent & MongoDB APAC Lunch & Learn
confluent
 
Introducing Oslo
Introducing OsloIntroducing Oslo
Introducing Oslo
Suresh Veeragoni
 
MongoDB Evenings Toronto - Monolithic to Microservices with MongoDB
MongoDB Evenings Toronto - Monolithic to Microservices with MongoDBMongoDB Evenings Toronto - Monolithic to Microservices with MongoDB
MongoDB Evenings Toronto - Monolithic to Microservices with MongoDB
MongoDB
 
Tutorial Workgroup - Model versioning and collaboration
Tutorial Workgroup - Model versioning and collaborationTutorial Workgroup - Model versioning and collaboration
Tutorial Workgroup - Model versioning and collaboration
PascalDesmarets1
 
Document Databases & RavenDB
Document Databases & RavenDBDocument Databases & RavenDB
Document Databases & RavenDB
Brian Ritchie
 
Boulder/Denver BigData: Cluster Computing with Apache Mesos and Cascading
Boulder/Denver BigData: Cluster Computing with Apache Mesos and CascadingBoulder/Denver BigData: Cluster Computing with Apache Mesos and Cascading
Boulder/Denver BigData: Cluster Computing with Apache Mesos and Cascading
Paco Nathan
 
Strudel: Framework for Transaction Performance Analyses on SQL/NoSQL Systems
Strudel: Framework for Transaction Performance Analyses on SQL/NoSQL SystemsStrudel: Framework for Transaction Performance Analyses on SQL/NoSQL Systems
Strudel: Framework for Transaction Performance Analyses on SQL/NoSQL Systems
tatemura
 
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's DataFrom Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
Databricks
 
USQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventUSQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake Event
Trivadis
 
Big Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft AzureBig Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft Azure
Mark Tabladillo
 
Object- Relational Persistence in Smalltalk
Object- Relational Persistence in SmalltalkObject- Relational Persistence in Smalltalk
Object- Relational Persistence in Smalltalk
ESUG
 
Michael stack -the state of apache h base
Michael stack -the state of apache h baseMichael stack -the state of apache h base
Michael stack -the state of apache h base
hdhappy001
 
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA
 
Building social and RESTful frameworks
Building social and RESTful frameworksBuilding social and RESTful frameworks
Building social and RESTful frameworksbrendonschwartz
 

Similar to Vital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and Spark (20)

Introduction to Designing and Building Big Data Applications
Introduction to Designing and Building Big Data ApplicationsIntroduction to Designing and Building Big Data Applications
Introduction to Designing and Building Big Data Applications
 
QuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing WebinarQuerySurge Slide Deck for Big Data Testing Webinar
QuerySurge Slide Deck for Big Data Testing Webinar
 
Ops Jumpstart: MongoDB Administration 101
Ops Jumpstart: MongoDB Administration 101Ops Jumpstart: MongoDB Administration 101
Ops Jumpstart: MongoDB Administration 101
 
Building and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache AirflowBuilding and deploying LLM applications with Apache Airflow
Building and deploying LLM applications with Apache Airflow
 
Graph Databases in the Microsoft Ecosystem
Graph Databases in the Microsoft EcosystemGraph Databases in the Microsoft Ecosystem
Graph Databases in the Microsoft Ecosystem
 
Confluent & MongoDB APAC Lunch & Learn
Confluent & MongoDB APAC Lunch & LearnConfluent & MongoDB APAC Lunch & Learn
Confluent & MongoDB APAC Lunch & Learn
 
Introducing Oslo
Introducing OsloIntroducing Oslo
Introducing Oslo
 
MongoDB Evenings Toronto - Monolithic to Microservices with MongoDB
MongoDB Evenings Toronto - Monolithic to Microservices with MongoDBMongoDB Evenings Toronto - Monolithic to Microservices with MongoDB
MongoDB Evenings Toronto - Monolithic to Microservices with MongoDB
 
Tutorial Workgroup - Model versioning and collaboration
Tutorial Workgroup - Model versioning and collaborationTutorial Workgroup - Model versioning and collaboration
Tutorial Workgroup - Model versioning and collaboration
 
Document Databases & RavenDB
Document Databases & RavenDBDocument Databases & RavenDB
Document Databases & RavenDB
 
RavenDB overview
RavenDB overviewRavenDB overview
RavenDB overview
 
Boulder/Denver BigData: Cluster Computing with Apache Mesos and Cascading
Boulder/Denver BigData: Cluster Computing with Apache Mesos and CascadingBoulder/Denver BigData: Cluster Computing with Apache Mesos and Cascading
Boulder/Denver BigData: Cluster Computing with Apache Mesos and Cascading
 
Strudel: Framework for Transaction Performance Analyses on SQL/NoSQL Systems
Strudel: Framework for Transaction Performance Analyses on SQL/NoSQL SystemsStrudel: Framework for Transaction Performance Analyses on SQL/NoSQL Systems
Strudel: Framework for Transaction Performance Analyses on SQL/NoSQL Systems
 
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's DataFrom Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
From Pandas to Koalas: Reducing Time-To-Insight for Virgin Hyperloop's Data
 
USQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake EventUSQL Trivadis Azure Data Lake Event
USQL Trivadis Azure Data Lake Event
 
Big Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft AzureBig Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft Azure
 
Object- Relational Persistence in Smalltalk
Object- Relational Persistence in SmalltalkObject- Relational Persistence in Smalltalk
Object- Relational Persistence in Smalltalk
 
Michael stack -the state of apache h base
Michael stack -the state of apache h baseMichael stack -the state of apache h base
Michael stack -the state of apache h base
 
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and AtlasData Con LA 2022 - What's new with MongoDB 6.0 and Atlas
Data Con LA 2022 - What's new with MongoDB 6.0 and Atlas
 
Building social and RESTful frameworks
Building social and RESTful frameworksBuilding social and RESTful frameworks
Building social and RESTful frameworks
 



Vital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and Spark

  • 1. Today: Marc C. Hadfield, Founder
 Vital AI
 http://vital.ai marc@vital.ai 917.463.4776 MetaQL: Queries Across NoSQL, SQL, Sparql, and Spark
  • 2. <intro> Marc C. Hadfield, Founder Vital AI
 http://vital.ai
 marc@vital.ai MetaQL: Queries Across NoSQL, SQL, Sparql, and Spark
 Quick Overview
  • 3. agenda: MetaQL Intro; Motivation; Domain Models (Schema); MetaQL DSL; MetaQL Implementations; Examples
  • 4. MetaQL
     • Leverage Domain Model (Schema)
     • Compose Queries in Code: Typed
     • Execute Queries on Databases, Interchangeably
     • Minimize TCO: Separation of Concerns, Developer Efficiency
     • Query Framework
     • Executable JVM Code! (Groovy Closure)
  • 5. MetaQL Origin
 Across many data-driven application implementations, a desire for:
     • Reusable Processes, Tools: Stop re-inventing the wheel.
     • Tools to manage “schema” across an application & organization.
     • Tools to combine Semantic Web, NoSQL, and Hadoop/Spark.
     • Team Collaboration: Human Labor is usually the limiting factor.
  • 9. sample MetaQL graph query
    GRAPH {
        value segments: ["mydata"]
        ARC {
            node_constraint { Email.class }
            constraint { "?person1 != ?person2" }
            ARC_AND {
                ARC {
                    edge_constraint { Edge_hasSender.class }
                    node_constraint {
                        Person.props().emailAddress.equalTo("john@example.org")
                    }
                    node_constraint { Person.class }
                    node_provides { "person1 = URI" }
                }
                ARC {
                    edge_constraint { Edge_hasRecipient.class }
                    node_constraint { Person.class }
                    node_provides { "person2 = URI" }
                }
            }
        }
    }
  • 12. Internet of Things: Batch and Stream Processing
 Architecture diagram components: Amazon Echo; Amazon Echo Service; haley-app webservice; Vert.X; Vital Prime; Database; DataScript; Hadoop - HDFS; Apache Spark (Streaming, MLlib, NLP, GraphX); Aspen Datawarehouse; Analytics Layer; Serving Layer; Haley Device; Raspberry Pi; Voice to Text API.
 Cognitive Application: NLP and Inference to process User request. Query Knowledge in DB. Streaming Prediction Models: “Should I really have more Coffee?” External APIs…
  • 13. Demo Examples: Vital Prime; Database; Vert.X; Vital-Vertx; JavaScript WebApp; VitalService-JS; Prediction Models; DataScript. https://github.com/vital-ai/vital-examples
  • 23. Cytoscape Plugin: Wordnet Data, “wine, vino”
  • 24. where are we using MetaQL? Financial Services; Healthcare; Internet-of-Things; Start-Ups, Recommendation Apps
  • 26. application architecture: Batch and Stream Processing. Web / Mobile Application; Application Server; Transactional Database; Hadoop - HDFS; Apache Spark (Streaming, MLlib, GraphX); Analytics Layer; Serving Layer; Key/Value Cache; External APIs; External API Services. Multiple Databases + Analytics + External APIs
  • 27. enterprise application architecture Dashboard Application Server Enterprise Datawarehouse Data Silo Data Silo Data Silo Data Silo Data Silo ∞ Many Many Many Data Models…
  • 28. volume, velocity, variety. polyglot persistence = multiple database technologies …but we also have very many data models. many databases, many data models, changing rapidly. too many moving parts for a developer to reasonably manage! need fewer APIs to learn!
  • 29. what happens when changes occur? Task Infrastructure DevOps Data Scientists Business + Domain Experts Developers Roles
  • 30. what changes? Data Model Changes; New Data Sources; Infrastructure Change; Switch Databases; New Prediction Models / Features; New Service APIs… Many Interdependencies… Example: Change in the taxonomy of a categorization service breaks all the logic tied to the old categories.
  • 31. total cost of ownership How much code changes when we modify our data model to include new sources? How to minimize by decoupling dependencies? When we switch database technologies?
  • 32. Domain Model as “Contract” Infrastructure DevOps Data Scientists Business + Domain Experts Developers Domain Model Everyone to agree (or at least be aware) of the definition of Domain Concepts. Use semantics to map “views”.
  • 33. MetaQL Abstraction Infrastructure DevOps Data Scientists Business + Domain Experts Developers Domain Model MetaQL Abstraction to give breathing room to Infrastructure.
  • 34. Infrastructure / DevOps Database Types: • Key/Value • Document • RDF Graph • NOSQL • Relational • Timeseries ACID vs. BASE Optimizing Query Generation Tuning Secondary Indices Update MetaQL DSL for new DB features CAP Theorem
  • 36. Domain Model Implementation. Combine: SQL-style Schema with Hadoop Data Serialization Schema (Avro, Thrift, Protocol Buffers, Kryo, Parquet); add Semantics: the “Meaning” of objects. Not a table “person”, but define the concept of Person to be used throughout an application. The implementation decides how to store “Person” data in its database.
  • 37. Domain Model Implementation. Domain Model definition resolves: RDF vs Property Graph model; Object Relational Impedance Mismatch. Use OWL to capture Domain Model: SubClasses, SubProperties, Multiple Inheritance. Marginal technology performance gains are hugely outweighed by human productivity gains and a wider choice of tools. Compromise across modeling paradigms.
  • 38. Domain Model Implementation Example: Healthcare Application:
 URI<Person123> IS_A: • Patient • BillableAccount • InsuredEntity Same URI across three domain concepts:
 Diagnostics Records, Billing System, Insurance System. Implementation Note: We generate code for the JVM using “traits” as a way to implement multiple inheritance (Groovy, Scala, Java8). The trait is used as a semantic marker to link to the Domain Model.
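 The slides implement this with JVM traits; the idea of one URI carrying several domain concepts can be sketched in Python with marker classes. This is an illustration only: `DomainTrait`, `PersonRecord`, `concepts_of`, and the example URI are hypothetical names, not part of VitalSigns.

```python
class DomainTrait:
    """Base for marker classes that tie an object to a Domain Model concept."""


class Patient(DomainTrait):
    """Concept used by the Diagnostics Records view."""


class BillableAccount(DomainTrait):
    """Concept used by the Billing System view."""


class InsuredEntity(DomainTrait):
    """Concept used by the Insurance System view."""


class PersonRecord(Patient, BillableAccount, InsuredEntity):
    """One object, one URI, three domain concepts (multiple inheritance)."""

    def __init__(self, uri):
        self.uri = uri


def concepts_of(obj):
    """List the domain concepts an object carries, in declaration order."""
    return [
        cls.__name__
        for cls in type(obj).__mro__
        if issubclass(cls, DomainTrait) and cls not in (DomainTrait, type(obj))
    ]


person = PersonRecord("urn:example:Person123")  # hypothetical URI
print(concepts_of(person))
print(isinstance(person, InsuredEntity))
```

 Each subsystem can test for its own marker with `isinstance` while all three share the same URI.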
  • 39. Domain Model - Core Classes: Node, Edge, HyperNode, HyperEdge
 Properties: • URI • Primary Type • Types
 Edges/HyperEdges: • Source URI • Destination URI
 Edges: • Peer • Taxonomy
 Class Instances contain Properties.
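 A minimal Python sketch of these core classes, assuming only what the slide lists (URI, primary type, types, and source/destination URIs on edges). The field names and the `GraphObject` base class are hypothetical and do not reflect the actual Vital AI class definitions.

```python
from dataclasses import dataclass, field


@dataclass
class GraphObject:
    # Properties shared by every class instance, per the slide.
    uri: str
    primary_type: str
    types: list = field(default_factory=list)
    properties: dict = field(default_factory=dict)


@dataclass
class Node(GraphObject):
    pass


@dataclass
class Edge(GraphObject):
    # Edges additionally link a source object to a destination object.
    source_uri: str = ""
    destination_uri: str = ""


john = Node(uri="urn:john", primary_type="Musician")
beatles = Node(uri="urn:thebeatles", primary_type="MusicGroup")
member = Edge(
    uri="urn:edge1",
    primary_type="Edge_hasMember",
    source_uri=beatles.uri,
    destination_uri=john.uri,
)
print(member)
```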
  • 41. VitalSigns: Domain Model Dev Kit
    $ vitalsigns generate -o ./domain-ontology/enron-dataset-1.0.0.owl
    $ ls domain-groovy-jar
    enron-dataset-groovy-1.0.0.jar
    $ ls domain-json-schema
    enron-dataset-1.0.0.js
 OWL can be compiled into JVM code statically (create an artifact for maven), or done dynamically at runtime.
  • 42. Development with the Domain Model Code Completion from Domain Model
  • 43. Development with the Domain Model (Implicit MetaQL Queries)
    VitalSigns vs = VitalSigns.get()
    Musician john = new Musician().generateURI("john")
    john.name = "John Lennon"
    john.birthday = "October 9, 1940"^xsd.xdatetime("MMMM d, yyyy")
    MusicGroup thebeatles = new MusicGroup().generateURI("thebeatles")
    thebeatles.name = "The Beatles"
    // try to assign the wrong property, throws an exception
    try {
        thebeatles.birthday = "January 1, 1970"^xsd.xdatetime("MMMM d, yyyy")
    } catch(Exception ex) { println ex } // no such property exception
    vs.addToCache( thebeatles.addEdge_hasMember(john) )
    // use cache to resolve queries
    thebeatles.getMembers().each{ println it.name }
    // use database to resolve queries
    thebeatles.getMembers(ServiceWide).each{ println it.name }
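 The "no such property" behaviour can be imitated in plain Python with `__slots__`, which restricts each class to its declared properties. This is only an analogy for the schema check the slide describes; `DomainObject` and `try_set` are hypothetical helpers, and VitalSigns enforces its rules from the OWL domain model rather than from slots.

```python
class DomainObject:
    __slots__ = ("uri",)

    def __init__(self, uri):
        self.uri = uri


class Musician(DomainObject):
    # Properties the (assumed) domain model allows on a Musician.
    __slots__ = ("name", "birthday")


class MusicGroup(DomainObject):
    # A MusicGroup has a name but no birthday.
    __slots__ = ("name",)


def try_set(obj, prop, value):
    """Assign a property; report False if the class does not define it."""
    try:
        setattr(obj, prop, value)
        return True
    except AttributeError:
        return False


john = Musician("john")
thebeatles = MusicGroup("thebeatles")
print(try_set(john, "birthday", "1940-10-09"))        # allowed by the class
print(try_set(thebeatles, "birthday", "1970-01-01"))  # rejected by the class
```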
  • 44. VitalService API • Open/Close Endpoint • Create/Remove Segment • Create/Read/Update/Delete Object • Queries (MetaQL as input closure) • Service Operations (MetaQL as input closure) • callFunction (DataScript) • init Transaction/Commit/Rollback A “Segment” is a Database (container of objects)
  • 45. MetaQL VitalSigns: Domain Model Manager • MetaQL DSL • Prediction Model DSL • Pipeline Transformation DSL (ETL) (in development) A tricky bit is finding the best way to express the DSL within the allowed grammar of the host language (Groovy).
 It’s an ongoing effort.
  • 47. Query Elements • constraints: node_constraint, edge_constraint, … • comparators (equalTo, greaterThan, …) • provides, ?reference • AND, OR • OPTIONAL • Sort Criteria
  • 48. SELECT query
    SELECT {
        value limit: 100
        value offset: 0
        value segments: ["mydata"]
        constraint { Person.class }
        constraint { Person.props().name.equalTo("John") }
    }
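 To make the semantics of the SELECT above concrete, here is a small in-memory sketch in Python: every constraint must hold, then offset and limit are applied. The helper names (`select`, `is_type`, `prop_equal_to`) and the sample classes are hypothetical; a real MetaQL implementation translates the query for the target database instead of filtering in memory.

```python
from dataclasses import dataclass


@dataclass
class Person:
    uri: str
    name: str


@dataclass
class Document:
    uri: str
    name: str


def is_type(cls):
    """Constraint: the object is an instance of the given class."""
    return lambda obj: isinstance(obj, cls)


def prop_equal_to(prop, value):
    """Constraint: the named property equals the given value."""
    return lambda obj: getattr(obj, prop, None) == value


def select(objects, constraints, limit=100, offset=0):
    """Keep objects satisfying all constraints, then page the result."""
    matches = [o for o in objects if all(c(o) for c in constraints)]
    return matches[offset:offset + limit]


segment = [
    Person("urn:p1", "John"),
    Person("urn:p2", "Mary"),
    Document("urn:d1", "John"),
    Person("urn:p3", "John"),
]
hits = select(segment, [is_type(Person), prop_equal_to("name", "John")])
print([p.uri for p in hits])
```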
  • 49. GRAPH query
    GRAPH {
        value segments: ["mydata"]
        ARC {
            node_constraint { Email.class }
            constraint { "?person1 != ?person2" }
            ARC_AND {
                ARC {
                    edge_constraint { Edge_hasSender.class }
                    node_constraint { Person.props().emailAddress.equalTo("john@example.org") }
                    node_constraint { Person.class }
                    node_provides { "person1 = URI" }
                }
                ARC {
                    edge_constraint { Edge_hasRecipient.class }
                    node_constraint { Person.class }
                    node_provides { "person2 = URI" }
                }
            }
        }
    }
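 One way to read the GRAPH query above: find every Email that has a sender with the given address and a recipient, both Persons, where the two are different people. The Python sketch below evaluates that pattern over a toy in-memory graph; the data layout, URIs, and `match_emails` function are assumptions made for illustration and say nothing about how a MetaQL backend executes the query.

```python
# Toy graph: node URI -> properties, plus (edge type, source, destination) triples.
NODES = {
    "p:john": {"type": "Person", "emailAddress": "john@example.org"},
    "p:mary": {"type": "Person", "emailAddress": "mary@example.org"},
    "m:1": {"type": "Email"},
    "m:2": {"type": "Email"},
}
EDGES = [
    ("hasSender", "m:1", "p:john"),
    ("hasRecipient", "m:1", "p:mary"),
    ("hasSender", "m:2", "p:john"),
    ("hasRecipient", "m:2", "p:john"),  # sent to self: excluded by person1 != person2
]


def match_emails(nodes, edges, sender_address):
    """Return one solution per (email, person1, person2) binding."""
    solutions = []
    for uri, node in nodes.items():
        if node["type"] != "Email":
            continue
        # First ARC: sender edge to a Person with the requested address.
        senders = [
            dst for kind, src, dst in edges
            if kind == "hasSender" and src == uri
            and nodes[dst]["type"] == "Person"
            and nodes[dst].get("emailAddress") == sender_address
        ]
        # Second ARC: recipient edge to any Person.
        recipients = [
            dst for kind, src, dst in edges
            if kind == "hasRecipient" and src == uri
            and nodes[dst]["type"] == "Person"
        ]
        # ARC_AND plus the top-level constraint ?person1 != ?person2.
        for person1 in senders:
            for person2 in recipients:
                if person1 != person2:
                    solutions.append(
                        {"email": uri, "person1": person1, "person2": person2}
                    )
    return solutions


print(match_emails(NODES, EDGES, "john@example.org"))
```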
  • 50. GRAPH query (2)
    GRAPH {
        value segments: [VitalSegment.withId('wordnet')]
        value inlineObjects: true // inline objects
        ARC {
            node_bind { "node1" }
            node_constraint { SynsetNode.expandSubclasses(true) }
            node_constraint { SynsetNode.props().name.contains_i("happy") }
            ARC {
                edge_bind { "edge" }
                node_bind { "node2" }
            }
        }
    }
 Code iterating over Results can use bind names to reference objects in each solution: node1, edge, node2.
  • 51. PATH query
    def forward = true
    def reverse = false
    PATH {
        value segments: segments
        value maxdepth: 5
        value rootURIs: [URIProperty.withString(inputURI)]
        if( forward ) {
            ARC {
                value direction: 'forward'
                // accept any edge:
                edge_constraint { }
                // accept any node:
                node_constraint { }
            }
        }
        if( reverse ) {
            ARC {
                value direction: 'reverse'
                // accept any edge:
                edge_constraint { }
                // accept any node:
                node_constraint { }
            }
        }
    }
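 A PATH query with root URIs, a maximum depth, and forward/reverse arcs behaves much like a bounded breadth-first traversal. The Python sketch below shows that reading; the `path_query` function and the edge-pair representation are hypothetical, and the real implementation may expand paths differently.

```python
from collections import deque


def path_query(edges, root_uris, max_depth=5, forward=True, reverse=False):
    """Breadth-first expansion from the roots, accepting any edge and any node.

    edges is a list of (source_uri, destination_uri) pairs.
    Returns {reached_uri: depth}, excluding the roots themselves.
    """
    reached = {}
    seen = set(root_uris)
    queue = deque((uri, 0) for uri in root_uris)
    while queue:
        uri, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond maxdepth
        neighbours = []
        if forward:
            neighbours += [dst for src, dst in edges if src == uri]
        if reverse:
            neighbours += [src for src, dst in edges if dst == uri]
        for nxt in neighbours:
            if nxt not in seen:
                seen.add(nxt)
                reached[nxt] = depth + 1
                queue.append((nxt, depth + 1))
    return reached


EDGES = [("a", "b"), ("b", "c"), ("c", "d"), ("x", "a")]
print(path_query(EDGES, ["a"], max_depth=2))
print(path_query(EDGES, ["a"], forward=False, reverse=True))
```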
  • 52. AGGREGATION query (part of a SELECT query)
    SUM Product.props().cost
    AVERAGE Person.props().birthday
    COUNT_DISTINCT Document.props().active
    FIRST { DISTINCT Document.props().title, expandProperty : false, order: Order.ASC }
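 As a rough guide to what these aggregation functions compute, the sketch below applies SUM, AVERAGE, and COUNT_DISTINCT to one property over a list of plain dictionaries. The `aggregate` helper and sample data are hypothetical; in MetaQL the aggregation would normally be pushed down to the database.

```python
def aggregate(kind, objects, prop):
    """Aggregate one property over the objects that define it."""
    values = [obj[prop] for obj in objects if prop in obj]
    if kind == "SUM":
        return sum(values)
    if kind == "AVERAGE":
        return sum(values) / len(values) if values else None
    if kind == "COUNT_DISTINCT":
        return len(set(values))
    raise ValueError("unsupported aggregation: " + kind)


products = [{"cost": 10.0}, {"cost": 2.5}, {"name": "no cost set"}]
documents = [{"active": True}, {"active": True}, {"active": False}]

print(aggregate("SUM", products, "cost"))
print(aggregate("AVERAGE", products, "cost"))
print(aggregate("COUNT_DISTINCT", documents, "active"))
```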
  • 54. Service Operations (using “provides” values)
    INSERT {
        value segment: 'testing'
        insert(MusicGroup.class, provides: "thebeatles") { MusicGroup thebeatles ->
            thebeatles.name = "The Beatles"
            thebeatles.URI = "thebeatles"
        }
        insert(Musician.class, provides: "john") { Musician john ->
            john.name = "John"
            john.URI = "john"
        }
        insert(Edge_hasMember) { Edge_hasMember member ->
            member.sourceURI = ref("thebeatles").toString()
            member.destinationURI = ref("john").toString()
            member.URI = "edge1"
        }
    }
  • 55. Transactions. Implemented at the service level:
    def xid = service.startTransaction()
    service.save(xid, person123)
    service.commitTransaction(xid)
  • 57. Sparql/RDF Implementation: Quad Store (G S P O), Franz AllegroGraph
  • 58. Sparql/RDF Implementation
    VitalGraphQuery q = builder.query {
        GRAPH {
            value segments: ["documents"]
            ARC {
                node_constraint { Person.class }
                node_constraint { Person.props().emailID.equalTo("k.lay@enron.com") }
                ARC {
                    node_constraint { EMailMessage.class }
                    edge_constraint { Edge_hasEMailMessage.class }
                }
            }
        }
    }.toQuery()
    println "Query: " + q.toSparql()
  • 59. Sparql/RDF Implementation
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    PREFIX vital-core: <http://vital.ai/ontology/vital-core#>
    PREFIX p0: <http://vital.ai/ontology/enron-emails#>
    SELECT DISTINCT ?s1 ?d2 ?e2
    FROM <segment:customer__app__documents>
    WHERE {
      {
        ?s1 p0:hasEmailID ?value1 .
        ?s1 rdf:type ?value2 .
        FILTER ( ?value2 = p0:Person && ?value1 = "k.lay@enron.com"^^xsd:string )
        {
          ?d2 rdf:type ?value3 .
          ?e2 rdf:type ?value4 .
          FILTER ( ?value3 = p0:EMailMessage && ?value4 = p0:Edge_hasEMailMessage )
          ?e2 vital-core:hasEdgeSource ?s1 .
          ?e2 vital-core:hasEdgeDestination ?d2 .
        }
      }
    }
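 The translation step can be pictured as turning a type constraint and a property constraint into triple patterns. The Python sketch below builds a much simpler SPARQL string than the generated query above (direct triple patterns, no FILTER, no edge traversal); `to_sparql` and `escape_literal` are hypothetical helpers, not the MetaQL generator.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def escape_literal(value):
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def to_sparql(segment_graph, type_iri, property_iri, value):
    """Build a SELECT for subjects of one type with one property value."""
    literal = escape_literal(value)
    return "\n".join([
        "SELECT DISTINCT ?s1",
        f"FROM <{segment_graph}>",
        "WHERE {",
        f"  ?s1 <{RDF_TYPE}> <{type_iri}> .",
        f'  ?s1 <{property_iri}> "{literal}" .',
        "}",
    ])


print(to_sparql(
    "segment:customer__app__documents",
    "http://vital.ai/ontology/enron-emails#Person",
    "http://vital.ai/ontology/enron-emails#hasEmailID",
    "k.lay@enron.com",
))
```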
  • 60. Spark-SQL / Dataframe URI P V Segment RDD Property RDD K V Experimenting with: new Dataframe Optimizer: Catalyst, new Dataframe DSL for query generation, and using GraphX for isolated Graph Query cases Generate “Bad” queries, with optimizer fixing them and Spark partitioning RDDs, as long as Spark is aware of Schema.
  • 61. Key/Value Implementation K V URI —> Serialized Object
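 The key/value mapping is the simplest backend: the object's URI is the key and the serialized object is the value. A minimal Python sketch, assuming JSON as the serialization format and a plain dictionary as the store (both are assumptions; the slides do not say which serializer or store is used):

```python
import json


class KeyValueSegment:
    """A segment backed by a key/value store: URI -> serialized object."""

    def __init__(self):
        self._store = {}

    def save(self, obj):
        # The URI is the key; the whole object is stored as one JSON string.
        self._store[obj["URI"]] = json.dumps(obj, sort_keys=True)

    def get(self, uri):
        raw = self._store.get(uri)
        return json.loads(raw) if raw is not None else None

    def delete(self, uri):
        return self._store.pop(uri, None) is not None


segment = KeyValueSegment()
segment.save({"URI": "urn:john", "type": "Musician", "name": "John"})
print(segment.get("urn:john"))
print(segment.get("urn:missing"))
```

 Lookups by URI are direct, while queries on other properties would need a scan or a separate index, which is where the BigTable and SQL layouts differ.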
  • 63. NoSQL BigTable Implementation: DynamoDB (HBase, Cassandra, Accumulo, …)
 Per Segment object table (ROWID; columns C1–C4, each holding K=V pairs) + Secondary Indices
 Per Segment property table (URI, P, V) + Secondary Indices
  • 64. SQL Implementation SQL, Hive-SQL, Redshift, … G S P O Per Segment Table with Partitioning (Hive)
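 With a per-segment G/S/P/O table, a type constraint plus a property constraint becomes a self-join on the subject column. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the table name, column names, predicate strings, and the `find_subjects` helper are assumptions for illustration, not the schema MetaQL generates.

```python
import sqlite3


def make_segment(rows):
    """Create an in-memory segment table of (g, s, p, o) quads."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mydata (g TEXT, s TEXT, p TEXT, o TEXT)")
    conn.executemany("INSERT INTO mydata VALUES (?, ?, ?, ?)", rows)
    return conn


def find_subjects(conn, type_name, name, limit=100, offset=0):
    """Subjects with the given rdf:type and name, paged like a SELECT query."""
    sql = """
        SELECT t.s
        FROM mydata AS t
        JOIN mydata AS n ON n.s = t.s
        WHERE t.p = 'rdf:type' AND t.o = ?
          AND n.p = 'name' AND n.o = ?
        ORDER BY t.s
        LIMIT ? OFFSET ?
    """
    return [row[0] for row in conn.execute(sql, (type_name, name, limit, offset))]


ROWS = [
    ("mydata", "urn:person1", "rdf:type", "Person"),
    ("mydata", "urn:person1", "name", "John"),
    ("mydata", "urn:person2", "rdf:type", "Person"),
    ("mydata", "urn:person2", "name", "Mary"),
    ("mydata", "urn:doc1", "rdf:type", "Document"),
    ("mydata", "urn:doc1", "name", "John"),
]
conn = make_segment(ROWS)
print(find_subjects(conn, "Person", "John"))
```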
  • 65. implementation
 DSL Documentation to be posted: http://www.metaql.org/
 VitalSigns, VitalService, MetaQL: https://dashboard.vital.ai/
 Vital AI github: https://github.com/vital-ai/ (Sample Code; Spark Code: Aspen, Aspen-Datawarehouse)
 Documentation Coming!
  • 66. closing thoughts. Context: Data-Driven Application / Cognitive Applications:
     • Separation of Concerns yields the Agility needed to keep up with rapidly evolving Data.
     • “Domain Model as Contract” provides a framework for consistent interpretation of Data across an application.
     • MetaQL provides a framework for the consistent access and query of Data across an application.
  • 67. Thank You! Marc C. Hadfield, Founder
 Vital AI
 http://vital.ai marc@vital.ai 917.463.4776
  • 68. Pipeline DSL (ETL)
    PIPELINE { // Workflow
        PIPE { // a Workflow Component with dependencies
            TRANSFORM { // Joins across Datasets
                IF (RULE { } ) // Boolean, Query, Construct, …
                THEN { RULE { } }
                ELSE { RULE { } }
            }
            PIPE { … } // dependent PIPE
        } // Output Dataset
        PIPE { …
        }
    }
 Influenced by Spark Pipeline and Google Dataflow Pipeline
  • 69. Schema Upgrade/Downgrade
    UPGRADE {
        upgrade( oldClass: OLD_Person.class, newClass: NEW_Person.class ) { person_old, person_new ->
            person_new.newName = person_old.oldName
        }
    }
    DOWNGRADE {
        downgrade( newClass: NEW_Person.class, oldClass: OLD_Person.class ) { person_new, person_old ->
            person_old.oldName = person_new.newName
        }
    }
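 The upgrade and downgrade rules are a pair of inverse mappings between two versions of a class. A minimal Python sketch of that idea, using dictionaries in place of the generated OLD_Person/NEW_Person classes (the function names and dictionary layout are hypothetical):

```python
def upgrade(person_old):
    """Map an old-schema Person to the new schema: oldName becomes newName."""
    return {"URI": person_old["URI"], "newName": person_old["oldName"]}


def downgrade(person_new):
    """Map a new-schema Person back to the old schema: newName becomes oldName."""
    return {"URI": person_new["URI"], "oldName": person_new["newName"]}


old = {"URI": "urn:person123", "oldName": "John"}
new = upgrade(old)
print(new)
print(downgrade(new) == old)  # the round trip preserves the original object
```

 Keeping both directions lets data written under either schema version be read by code compiled against the other.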
  • 70. Multiple Endpoints
    def service1 = VitalService.getService(profile: "kv-users")
    def service2 = VitalService.getService(profile: "posts-db")
    def service3 = VitalService.getService(profile: "friendgraph-db")
    // given user URI:user123@email.org
    // get user object from service1
    // find friends of user in friendgraph via service3
    // find posts of friends in posts-db
    // update service1 with cache of user-to-friends-postings
    // send postings of friends to user in UI