Today:
Marc C. Hadfield, Founder

Vital AI

http://vital.ai
marc@vital.ai
917.463.4776
MetaQL:
Queries Across NoSQL,
SQL, Sparql, and Spark
<intro>
Quick Overview
agenda
MetaQL Intro
Motivation
Domain Models (Schema)
MetaQL DSL
MetaQL Implementations
Examples
MetaQL
Leverage Domain Model (Schema)
Compose Queries in Code: Typed
Execute Queries on Databases,
Interchangeably
Minimize TCO:

Separation of Concerns

Developer Efficiency
Query Framework
Executable JVM Code! (Groovy Closure)
MetaQL Origin
Across many data-driven application
implementations, a desire for:

Reusable Processes, Tools:

Stop re-inventing the wheel.
Tools to manage “schema” across an
application & organization.
Tools to combine Semantic Web,
NOSQL, and Hadoop/Spark.
Team Collaboration:

Human labor is usually the limiting factor.
sample
[Diagram, built up over three slides: an EMail node (type:Email) is linked to a Sender (type:Person, Address:john@example.org) by a hasSender edge (type:hasSender) and to a Recipient (type:Person) by a hasRecipient edge (type:hasRecipient). Each link is an ARC, and Sender and Recipient are constrained to be notEqual.]
sample MetaQL graph query
GRAPH {
    value segments: ["mydata"]
    ARC {
        node_constraint { Email.class }
        constraint { "?person1 != ?person2" }
        ARC_AND {
            ARC {
                edge_constraint { Edge_hasSender.class }
                node_constraint { Person.props().emailAddress.equalTo("john@example.org") }
                node_constraint { Person.class }
                node_provides { "person1 = URI" }
            }
            ARC {
                edge_constraint { Edge_hasRecipient.class }
                node_constraint { Person.class }
                node_provides { "person2 = URI" }
            }
        }
    }
}
Internet of Things
[Images: Amazon Echo; Coffee]
Internet of Things: Batch and Stream Processing
[Architecture diagram. Labels: Amazon Echo, Amazon Echo Service, Haley Device, Raspberry Pi, Voice to Text API, haley-app webservice, Vert.X, Vital Prime, Database, DataScript, Serving Layer, Analytics Layer, Hadoop - HDFS, Apache Spark (Streaming, MLLIB, NLP, GraphX), Aspen Datawarehouse, External APIs…]
Cognitive Application:
NLP and Inference to process User request.
Query Knowledge in DB.
Streaming Prediction Models: “Should I really have more Coffee?”
Demo Examples
[Architecture diagram. Labels: Vital Prime, Database, Vert.X, Vital-Vertx, JavaScript WebApp, VitalService-JS, Prediction Models, DataScript]
https://github.com/vital-ai/vital-examples
Demo Example
https://demos.vital.ai/enron-js-app/index.html
https://github.com/vital-ai/vital-examples/tree/master/enron-js-app
Demo Example
[Diagram labels: Recipient, EMail, hasRecipient]
Cytoscape Plugin
https://github.com/vital-ai/vital-cytoscape
http://cytoscape.org/
Cytoscape Plugin: Wordnet Data, “wine, vino”
where are we using MetaQL?
Financial Services
Healthcare
Internet-of-Things
Start-Ups, Recommendation Apps
motivation for MetaQL
application architecture
[Architecture diagram. Labels: Web / Mobile Application, Application Server, Transactional Database, Key/Value Cache, Serving Layer, Analytics Layer, Batch and Stream Processing, Hadoop - HDFS, Apache Spark (Streaming, MLLIB, GraphX), External APIs, External API Services]
Multiple Databases + Analytics + External APIs
enterprise application architecture
Dashboard
Application Server
Enterprise Datawarehouse
Data Silo Data Silo Data Silo Data Silo Data Silo
∞
Many Many Many Data Models…
volume, velocity, variety
polyglot persistence = multiple database technologies
…but we also have very many data models.
many databases, many data models, changing rapidly.
too many moving parts for a developer to reasonably manage!
need fewer APIs to learn!
what happens when changes occur?
Task
Infrastructure
DevOps
Data Scientists
Business +
Domain Experts
Developers
Roles
what changes?
Data Model Changes
New Data Sources
Infrastructure Change
Switch Databases
New Prediction Models / Features
New Service APIs…
Many Interdependencies…
Example: Change in the taxonomy of a categorization
service breaks all the logic tied to the old categories.
total cost of ownership
How much code changes when we modify our data
model to include new sources?
How much changes when we switch database technologies?
How can we minimize change by decoupling dependencies?
Domain Model as “Contract”
Infrastructure
DevOps
Data Scientists
Business +
Domain Experts
Developers
Domain
Model
Everyone agrees on (or is at least aware of)
the definition of Domain Concepts.
Use semantics to map “views”.
MetaQL Abstraction
Infrastructure
DevOps
Data Scientists
Business +
Domain Experts
Developers
Domain
Model
MetaQL
Abstraction to give breathing room to Infrastructure.
Infrastructure / DevOps
Database Types:
• Key/Value
• Document
• RDF Graph
• NOSQL
• Relational
• Timeseries
ACID vs. BASE
Optimizing Query Generation
Tuning Secondary Indices
Update MetaQL DSL for new DB features
CAP Theorem
Domain Model (Schema)
Domain Model Implementation
Combine:
SQL-style Schema
with
Hadoop Data Serialization Schema
(Avro, Thrift, Protocol Buffers, Kryo, Parquet)
add
Semantics: the “Meaning” of objects
Not a table “person”, but define the concept of
Person to be used throughout an application.
The implementation decides how to store “Person”
data in its database.
Domain Model Implementation
Domain Model definition resolves:
RDF vs Property Graph model
Object Relational Impedance Mismatch
Use OWL to capture Domain Model:
SubClasses
SubProperties
Multiple Inheritance
Marginal technology performance gains are hugely outweighed
by human productivity gains and a wider choice of tools.
Compromise across modeling paradigms.
Domain Model Implementation
Example: Healthcare Application:

URI<Person123> IS_A:
• Patient
• BillableAccount
• InsuredEntity
Same URI across three domain concepts:

Diagnostics Records, Billing System, Insurance System.
Implementation Note:
We generate code for the JVM using “traits” as a way to implement
multiple inheritance (Groovy, Scala, Java8).
The trait is used as a semantic marker to link to the Domain Model.
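As a rough illustration of the healthcare example, the sketch below uses Python mixin classes as stand-ins for the generated JVM traits. The class names mirror the slide; the `types_of` helper and the `urn:Person123` URI are assumptions made for illustration and are not part of VitalSigns.

```python
class Patient:
    """Semantic marker for the Diagnostics Records view."""


class BillableAccount:
    """Semantic marker for the Billing System view."""


class InsuredEntity:
    """Semantic marker for the Insurance System view."""


class Person(Patient, BillableAccount, InsuredEntity):
    """One object with one URI that belongs to three domain concepts."""

    def __init__(self, uri):
        self.uri = uri


def types_of(obj):
    """Return the names of the marker classes an object carries (hypothetical helper)."""
    markers = (Patient, BillableAccount, InsuredEntity)
    return [m.__name__ for m in markers if isinstance(obj, m)]


person = Person("urn:Person123")
print(person.uri, types_of(person))
```

Each system can test for the marker it cares about (`isinstance(person, Patient)`) without knowing about the other two.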
Domain Model - Core Classes
Node, Edge, HyperNode, HyperEdge
Properties:
• URI
• Primary Type
• Types

Edges/HyperEdges:
• Source URI
• Destination URI
Edges:
• Peer
• Taxonomy
Class Instances
contain Properties.
Protege OWL Editor
VitalSigns: Domain Model Dev Kit
$ vitalsigns generate -o ./domain-ontology/enron-dataset-1.0.0.owl
$ ls domain-groovy-jar
enron-dataset-groovy-1.0.0.jar
$ ls domain-json-schema
enron-dataset-1.0.0.js
OWL can be compiled into JVM code
statically (create an artifact for maven), or
done dynamically at runtime.
Development with the Domain Model
Code Completion from
Domain Model
Development with the Domain Model
VitalSigns vs = VitalSigns.get()
Musician john = new Musician().generateURI("john")
john.name = "John Lennon"
john.birthday = "October 9, 1940"^xsd.xdatetime("MMMM d, yyyy")
MusicGroup thebeatles = new MusicGroup().generateURI("thebeatles")
thebeatles.name = "The Beatles"
// try to assign the wrong property, throws an exception
try {
    thebeatles.birthday = "January 1, 1970"^xsd.xdatetime("MMMM d, yyyy")
} catch(Exception ex) {
    println ex // no such property exception
}
vs.addToCache( thebeatles.addEdge_hasMember(john) )
// use cache to resolve queries
thebeatles.getMembers().each{ println it.name }
// use database to resolve queries
thebeatles.getMembers(ServiceWide).each{ println it.name }
Implicit MetaQL Queries
VitalService API
• Open/Close Endpoint
• Create/Remove Segment
• Create/Read/Update/Delete Object
• Queries (MetaQL as input closure)
• Service Operations (MetaQL as input closure)
• callFunction (DataScript)
• init Transaction/Commit/Rollback
A “Segment” is a Database (container of objects)
MetaQL
VitalSigns: Domain Model Manager
• MetaQL DSL
• Prediction Model DSL
• Pipeline Transformation DSL (ETL)
(in development)
A tricky bit is finding the best way to express
the DSL within the allowed grammar of the
host language (Groovy).

It’s an ongoing effort.
Query Types
AGGREGATION
PATH
GRAPH
SELECT
Query Elements
• constraints: node_constraint, edge_constraint, …
• comparators (equalTo, greaterThan, …)
• provides, ?reference
• AND, OR
• OPTIONAL
• Sort Criteria
SELECT query
SELECT {
    value limit: 100
    value offset: 0
    value segments: ["mydata"]
    constraint { Person.class }
    constraint { Person.props().name.equalTo("John") }
}
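To make the semantics of the SELECT elements concrete, here is a minimal in-memory sketch in Python. It assumes a segment is just a list of objects and that constraints are a type name plus property equalities; `Obj` and `select` are hypothetical names for illustration, not the MetaQL API.

```python
from dataclasses import dataclass, field


@dataclass
class Obj:
    uri: str
    type: str
    props: dict = field(default_factory=dict)


def select(segment, type_name, equal_to, limit=100, offset=0):
    """Return objects of the given type whose properties equal the given values."""
    matches = [
        o for o in segment
        if o.type == type_name
        and all(o.props.get(k) == v for k, v in equal_to.items())
    ]
    # offset and limit are applied after all constraints have been evaluated
    return matches[offset:offset + limit]


mydata = [
    Obj("urn:p1", "Person", {"name": "John"}),
    Obj("urn:p2", "Person", {"name": "Paul"}),
    Obj("urn:d1", "Document", {"name": "John"}),
]
result = select(mydata, "Person", {"name": "John"})
print([o.uri for o in result])
```

The type constraint and the property constraint are combined with AND, which is why the Document named "John" is not returned.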
GRAPH query
GRAPH {
    value segments: ["mydata"]
    ARC {
        node_constraint { Email.class }
        constraint { "?person1 != ?person2" }
        ARC_AND {
            ARC {
                edge_constraint { Edge_hasSender.class }
                node_constraint { Person.props().emailAddress.equalTo("john@example.org") }
                node_constraint { Person.class }
                node_provides { "person1 = URI" }
            }
            ARC {
                edge_constraint { Edge_hasRecipient.class }
                node_constraint { Person.class }
                node_provides { "person2 = URI" }
            }
        }
    }
}
GRAPH query (2)
GRAPH {
    value segments: [VitalSegment.withId('wordnet')]
    value inlineObjects: true   // <-- inline objects
    ARC {
        node_bind { "node1" }
        node_constraint { SynsetNode.expandSubclasses(true) }
        node_constraint { SynsetNode.props().name.contains_i("happy") }
        ARC {
            edge_bind { "edge" }
            node_bind { "node2" }
        }
    }
}
Code iterating over Results can use bind names to
reference objects in each solution: node1, edge, node2.
PATH query
def forward = true
def reverse = false
PATH {
    value segments: segments
    value maxdepth: 5

    value rootURIs: [URIProperty.withString(inputURI)]
    if( forward ) {
        ARC {
            value direction: 'forward'
            // accept any edge: edge_constraint { }
            // accept any node: node_constraint { }
        }
    }
    if( reverse ) {
        ARC {
            value direction: 'reverse'
            // accept any edge: edge_constraint { }
            // accept any node: node_constraint { }
        }
    }
}
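One plausible reading of a PATH query is a bounded breadth-first expansion from the root URIs. The Python sketch below illustrates that reading under stated assumptions: edges are plain `(source, destination)` pairs, every edge and node is accepted, and `path_query` is a hypothetical function, not how any MetaQL backend is known to execute it.

```python
from collections import deque


def path_query(edges, root_uris, max_depth=5, forward=True, reverse=False):
    """Breadth-first expansion from the roots; returns {uri: depth} for reached nodes."""
    depth = {uri: 0 for uri in root_uris}
    queue = deque(root_uris)
    while queue:
        current = queue.popleft()
        if depth[current] >= max_depth:
            continue  # do not expand beyond maxdepth
        neighbours = []
        for source, destination in edges:
            if forward and source == current:
                neighbours.append(destination)
            if reverse and destination == current:
                neighbours.append(source)
        for uri in neighbours:
            if uri not in depth:
                depth[uri] = depth[current] + 1
                queue.append(uri)
    return depth


edges = [("a", "b"), ("b", "c"), ("d", "a")]
print(path_query(edges, ["a"], max_depth=5))
```

Setting `forward=False, reverse=True` follows edges against their direction, which mirrors the two optional ARC blocks in the slide.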
AGGREGATION query
SUM Product.props().cost
AVERAGE Person.props().birthday
COUNT_DISTINCT Document.props().active
FIRST { DISTINCT Document.props().title, expandProperty : false, order: Order.ASC }
Part of a SELECT query
Service Operations DSL
Insert
Update
Delete
Service Operations
INSERT {
    value segment: 'testing'
    insert(MusicGroup.class, provides: "thebeatles") { MusicGroup thebeatles ->
        thebeatles.name = "The Beatles"
        thebeatles.URI = "thebeatles"
    }
    insert(Musician.class, provides: "john") { Musician john ->
        john.name = "John"
        john.URI = "john"
    }
    insert(Edge_hasMember) { Edge_hasMember member ->
        // <-- using "provides" values
        member.sourceURI = ref("thebeatles").toString()
        member.destinationURI = ref("john").toString()
        member.URI = "edge1"
    }
}
Transactions
Implemented at the service level:
def xid = service.startTransaction()
service.save(xid, person123)
service.commitTransaction(xid)
MetaQL Implementations
[Diagram: MetaQL → Query Generator → Executable Query]
Sparql/RDF Implementation
G S P O
Quad Store
Franz AllegroGraph
Sparql/RDF Implementation
VitalGraphQuery q = builder.query {
    GRAPH {
        value segments: ["documents"]
        ARC {
            node_constraint { Person.class }
            node_constraint { Person.props().emailID.equalTo("k.lay@enron.com") }
            ARC {
                node_constraint { EMailMessage.class }
                edge_constraint { Edge_hasEMailMessage.class }
            }
        }
    }
}.toQuery()
println "Query: " + q.toSparql()
Sparql/RDF Implementation
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX vital-core: <http://vital.ai/ontology/vital-core#>
PREFIX p0: <http://vital.ai/ontology/enron-emails#>
SELECT DISTINCT ?s1 ?d2 ?e2
FROM <segment:customer__app__documents>
WHERE {
  {
    ?s1 p0:hasEmailID ?value1 .
    ?s1 rdf:type ?value2 .
    FILTER (
      ?value2 = p0:Person && ?value1 = "k.lay@enron.com"^^xsd:string
    )
    {
      ?d2 rdf:type ?value3 .
      ?e2 rdf:type ?value4 .
      FILTER (
        ?value3 = p0:EMailMessage && ?value4 = p0:Edge_hasEMailMessage
      )
      ?e2 vital-core:hasEdgeSource ?s1 .
      ?e2 vital-core:hasEdgeDestination ?d2 .
    }
  }
}
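The step from a constraint structure to query text can be sketched in a few lines of Python. This is an illustrative string builder only, assuming a single node type plus string-valued equality constraints; `to_sparql` and its parameters are hypothetical and do not reproduce the actual MetaQL query generator. The prefixes and the segment graph name are taken from the slide above.

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "p0": "http://vital.ai/ontology/enron-emails#",
}


def to_sparql(type_name, equal_to, graph=None):
    """Build a SELECT query for subjects of one type with string equality filters."""
    lines = [f"PREFIX {prefix}: <{iri}>" for prefix, iri in PREFIXES.items()]
    lines.append("SELECT DISTINCT ?s")
    if graph:
        lines.append(f"FROM <{graph}>")
    lines.append("WHERE {")
    lines.append(f"  ?s rdf:type {type_name} .")
    for i, (prop, value) in enumerate(equal_to.items(), start=1):
        # escape backslashes and double quotes so the literal stays well-formed
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f"  ?s {prop} ?value{i} .")
        lines.append(f'  FILTER (?value{i} = "{escaped}"^^xsd:string)')
    lines.append("}")
    return "\n".join(lines)


query = to_sparql(
    "p0:Person",
    {"p0:hasEmailID": "k.lay@enron.com"},
    graph="segment:customer__app__documents",
)
print(query)
```

A generator for another backend would walk the same constraint structure and emit SQL or a key lookup instead, which is the point of keeping the query definition separate from the target database.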
Spark-SQL / Dataframe
URI P V
Segment RDD Property RDD
K V
Experimenting with: the new Dataframe optimizer (Catalyst), the new Dataframe
DSL for query generation, and GraphX for isolated Graph Query cases.
Generate naive (“bad”) queries and rely on the optimizer to fix them and on Spark
to partition RDDs, as long as Spark is aware of the Schema.
Key/Value Implementation
K V
URI —> Serialized Object
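The URI-to-serialized-object layout can be sketched with a dictionary and JSON. This assumes JSON as the serialization format and a dict as the store; `KeyValueSegment` is a hypothetical class for illustration, not the Vital implementation.

```python
import json


class KeyValueSegment:
    """A dict-backed stand-in for a key/value database segment."""

    def __init__(self):
        self._store = {}

    def save(self, obj):
        # the key is the object's URI; the value is the serialized object
        self._store[obj["URI"]] = json.dumps(obj, sort_keys=True)

    def get(self, uri):
        raw = self._store.get(uri)
        return json.loads(raw) if raw is not None else None

    def scan(self, predicate):
        """In this sketch a property query has to deserialize every stored value."""
        return [o for o in map(json.loads, self._store.values()) if predicate(o)]


segment = KeyValueSegment()
segment.save({"URI": "urn:john", "type": "Musician", "name": "John Lennon"})
segment.save({"URI": "urn:thebeatles", "type": "MusicGroup", "name": "The Beatles"})
print(segment.get("urn:john")["name"])
```

Lookups by URI are direct, while anything else falls back to `scan`, which suggests why a plain key/value layout suits the get-by-URI case best.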
Lucene/SOLR Implementation
DocID   P1   P2   P3   P4
1       V1   V2   V3   V4
2       V1   V2   V3   V4
3
Inverted Index of Property Values…
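An inverted index over property values can be sketched as a map from `(property, value)` pairs to document IDs. The Python below is a generic illustration of the data structure, with made-up sample documents; it does not use or model the Lucene/SOLR API.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each (property, value) pair to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, props in docs.items():
        for prop, value in props.items():
            index[(prop, value)].add(doc_id)
    return index


def lookup(index, **equal_to):
    """Intersect the posting sets for every property=value constraint."""
    postings = [index.get((prop, value), set()) for prop, value in equal_to.items()]
    return set.intersection(*postings) if postings else set()


docs = {
    1: {"P1": "V1", "P2": "V2"},
    2: {"P1": "V1", "P2": "other"},
    3: {"P1": "x", "P2": "V2"},
}
index = build_inverted_index(docs)
print(sorted(lookup(index, P1="V1", P2="V2")))
```

An AND of equality constraints becomes a set intersection, so no document needs to be read until the matching IDs are known.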
NoSQL BigTable Implementation
DynamoDB (HBase, Cassandra, Accumulo, …)
ROWID   C1      C2      C3      C4
1       K1=V1   K1=V1   K1=V1   K1=V1, K1=V1
2       K1=V1   K1=V1   K1=V1   K1=V1, K1=V1
3       K1=V1   K1=V1   K1=V1   K1=V1, K1=V1
Per Segment object table + Secondary Indices
Per Segment property table (URI, P, V) + Secondary Indices
SQL Implementation
SQL, Hive-SQL, Redshift, …
G S P O
Per Segment Table
with Partitioning (Hive)
implementation
DSL Documentation to be posted:
http://www.metaql.org/
VitalSigns, VitalService, MetaQL
https://dashboard.vital.ai/
Vital AI github: https://github.com/vital-ai/
Sample Code

Spark Code: Aspen, Aspen-Datawarehouse
Documentation Coming!
closing thoughts
Separation of Concerns yields
the Agility needed to keep up
with rapidly evolving Data.
“Domain Model as Contract” provides a framework for
consistent interpretation of Data across an application.
MetaQL provides a framework for the consistent
access and query of Data across an application.
Context: Data-Driven Application / Cognitive Applications:
Thank You!
Pipeline DSL (ETL)
PIPELINE { // Workflow
    PIPE { // a Workflow Component with dependencies
        TRANSFORM { // Joins across Datasets
            IF (RULE { } ) // Boolean, Query, Construct, …
            THEN { RULE { } }
            ELSE { RULE { } }
        }
        PIPE { … } // dependent PIPE
    } // Output Dataset
    PIPE { …
    }
}
Influenced by Spark Pipeline and
Google Dataflow Pipeline
Schema Upgrade/Downgrade
UPGRADE {
    upgrade(
        oldClass: OLD_Person.class,
        newClass: NEW_Person.class ) { person_old, person_new ->
        person_new.newName = person_old.oldName
    }
}
DOWNGRADE {
    downgrade(
        newClass: NEW_Person.class,
        oldClass: OLD_Person.class ) { person_new, person_old ->
        person_old.oldName = person_new.newName
    }
}
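The upgrade and downgrade rules above are a pair of inverse mappings between two versions of a class. A minimal Python sketch of that idea, assuming records are plain dictionaries and the only change is the rename from `oldName` to `newName` (the `upgrade` and `downgrade` functions here are illustrative, not the DSL's runtime):

```python
def upgrade(person_old):
    """Map an old-schema record to the new schema (oldName becomes newName)."""
    return {"URI": person_old["URI"], "newName": person_old["oldName"]}


def downgrade(person_new):
    """Map a new-schema record back to the old schema (newName becomes oldName)."""
    return {"URI": person_new["URI"], "oldName": person_new["newName"]}


old_record = {"URI": "urn:person123", "oldName": "John"}
new_record = upgrade(old_record)
print(new_record)
print(downgrade(new_record) == old_record)
```

When the two rules are exact inverses, as in this rename, a record survives an upgrade followed by a downgrade unchanged.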
Multiple Endpoints
def service1 = VitalService.getService(profile:"kv-users")
def service2 = VitalService.getService(profile:"posts-db")
def service3 = VitalService.getService(profile:"friendgraph-db")
// given user URI:user123@email.org
// get user object from service1
// find friends of user in friendgraph via service3
// find posts of friends in posts-db via service2
// update service1 with cache of user-to-friends-postings
// send postings of friends to user in UI
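The commented steps can be walked through with three in-memory stand-ins for the endpoints. Everything in this Python sketch is hypothetical: the dictionaries replace the real services, and the sample users and posts are made up for illustration.

```python
# In-memory stand-ins for the kv-users, friendgraph-db, and posts-db endpoints.
kv_users = {"user123@email.org": {"name": "User 123", "friend_posts": []}}
friend_graph = {"user123@email.org": ["friend1@email.org", "friend2@email.org"]}
posts_db = [
    {"author": "friend1@email.org", "text": "hello"},
    {"author": "friend2@email.org", "text": "world"},
    {"author": "stranger@email.org", "text": "unrelated"},
]


def friend_postings(user_uri):
    """Follow the steps from the slide across the three stores."""
    user = kv_users[user_uri]                                  # get user object (service1)
    friends = friend_graph.get(user_uri, [])                   # find friends (service3)
    posts = [p for p in posts_db if p["author"] in friends]    # posts of friends (service2)
    user["friend_posts"] = posts                               # cache result on the user (service1)
    return posts                                               # hand the postings to the UI


print([p["text"] for p in friend_postings("user123@email.org")])
```

Each step touches exactly one store, so any of the three could be swapped for a different database type without changing the others.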

Vital AI MetaQL: Queries Across NoSQL, SQL, Sparql, and Spark

  • 1. Today: Marc C. Hadfield, Founder
 Vital AI
 http://vital.ai marc@vital.ai 917.463.4776 MetaQL: Queries Across NoSQL, SQL, Sparql, and Spark
  • 2. <intro> Marc C. Hadfield, Founder Vital AI
 http://vital.ai
 marc@vital.ai MetaQL: Queries Across NoSQL, SQL, Sparql, and Spark
 Quick Overview
  • 3. agenda MetaQL Intro Motivation Domain Models (Schema) MetaQL DSL MetaQL Implementations Examples
  • 4. MetaQL Leverage Domain Model (Schema) Compose Queries in Code: Typed Execute Queries on Databases, Interchangeably Minimize TCO:
 Separation of Concerns
 Developer Efficiency Query Framework Executable JVM Code! (Groovy Closure)
  • 5. MetaQL Origin Across many data-driven application implementations, a desire for:
 Reusable Processes, Tools:
 Stop re-inventing the wheel. Tools to manage “schema” across an application & organization. Tools to combine Semantic Web, NOSQL, and Hadoop/Spark. Team Collaboration:
 Human Labor is usually limiting factor.
  • 9. sample MetaQL graph query
     GRAPH {
       value segments: ["mydata"]
       ARC {
         node_constraint { Email.class }
         constraint { "?person1 != ?person2" }
         ARC_AND {
           ARC {
             edge_constraint { Edge_hasSender.class }
             node_constraint { Person.props().emailAddress.equalTo("john@example.org") }
             node_constraint { Person.class }
             node_provides { "person1 = URI" }
           }
           ARC {
             edge_constraint { Edge_hasRecipient.class }
             node_constraint { Person.class }
             node_provides { "person2 = URI" }
           }
         }
       }
     }
  • 12. Internet of Things: Batch and Stream Processing Amazon Echo Amazon Echo Service haley-app webservice Vert.X Vital Prime Database DataScript Hadoop - HDFS Apache Spark Streaming, MLLIB, NLP, GraphX Aspen Datawarehouse Analytics Layer Serving Layer Haley Device Raspberry Pi Voice to Text API Cognitive Application NLP and Inference to process User request. Query Knowledge in DB Streaming Prediction Models: “Should I really have more Coffee?” External APIs…
  • 13. Demo Examples Vital Prime Database Vert.X Vital-Vertx JavaScript WebApp VitalService-JS Prediction Models DataScript https://github.com/vital-ai/vital-examples
  • 23. Cytoscape Plugin: Wordnet Data, “wine, vino”
  • 24. where are we using MetaQL? Financial Services Healthcare Internet-of-Things Start-Ups, Recommendation Apps
  • 26. application architecture Batch and Stream Processing Web / Mobile Application Application Server Transactional Database Hadoop - HDFS Apache Spark Streaming, MLLIB, GraphX Analytics Layer Serving Layer Key/Value Cache External APIs External API Services Multiple Databases + Analytics + External APIs
  • 27. enterprise application architecture Dashboard Application Server Enterprise Datawarehouse Data Silo Data Silo Data Silo Data Silo Data Silo ∞ Many Many Many Data Models…
  • 28. volume, velocity, variety polyglot persistence = multiple database technologies …but we also have very many data models. many databases, many data models, changing rapidly. too many moving parts for a developer to reasonably manage! need fewer APIs to learn!
  • 29. what happens when changes occur? Task Infrastructure DevOps Data Scientists Business + Domain Experts Developers Roles
  • 30. what changes? Data Model Changes New Data Sources Infrastructure Change Switch Databases New Prediction Models / Features New Service APIs… Many Interdependencies… Example: Change in the taxonomy of a categorization service breaks all the logic tied to the old categories.
  • 31. total cost of ownership How much code changes when we modify our data model to include new sources? How to minimize by decoupling dependencies? When we switch database technologies?
  • 32. Domain Model as “Contract” Infrastructure DevOps Data Scientists Business + Domain Experts Developers Domain Model Everyone agrees on (or is at least aware of) the definition of Domain Concepts. Use semantics to map “views”.
  • 33. MetaQL Abstraction Infrastructure DevOps Data Scientists Business + Domain Experts Developers Domain Model MetaQL Abstraction to give breathing room to Infrastructure.
  • 34. Infrastructure / DevOps Database Types: • Key/Value • Document • RDF Graph • NOSQL • Relational • Timeseries ACID vs. BASE Optimizing Query Generation Tuning Secondary Indices Update MetaQL DSL for new DB features CAP Theorem
  • 36. Domain Model Implementation Combine: SQL-style Schema with Hadoop Data Serialization Schema (Avro, Thrift, Protocol Buffers, Kryo, Parquet) add Semantics: the “Meaning” of objects Not a table “person”, but define the concept of Person to be used throughout an application. The implementation decides how to store “Person” data in its database.
  • 37. Domain Model Implementation Domain Model definition resolves: RDF vs Property Graph model Object Relational Impedance Mismatch Use OWL to capture Domain Model: SubClasses SubProperties Multiple Inheritance Marginal technology performance gains are hugely outweighed by Human productivity gains, and a wider choice of tools. Compromise across modeling paradigms.
  • 38. Domain Model Implementation Example: Healthcare Application:
 URI<Person123> IS_A: • Patient • BillableAccount • InsuredEntity Same URI across three domain concepts:
 Diagnostics Records, Billing System, Insurance System. Implementation Note: We generate code for the JVM using “traits” as a way to implement multiple inheritance (Groovy, Scala, Java8). The trait is used as a semantic marker to link to the Domain Model.
  • 39. Domain Model - Core Classes
     Core classes: Node, Edge, HyperNode, HyperEdge
     Properties: • URI • Primary Type • Types
     Edges/HyperEdges: • Source URI • Destination URI
     Edges: • Peer • Taxonomy
     Class Instances contain Properties.
  • 41. VitalSigns: Domain Model Dev Kit
     $ vitalsigns generate -o ./domain-ontology/enron-dataset-1.0.0.owl
     $ ls domain-groovy-jar
     enron-dataset-groovy-1.0.0.jar
     $ ls domain-json-schema
     enron-dataset-1.0.0.js
     OWL can be compiled into JVM code statically (creating an artifact for Maven) or dynamically at runtime.
  • 42. Development with the Domain Model Code Completion from Domain Model
  • 43. Development with the Domain Model
     VitalSigns vs = VitalSigns.get()
     Musician john = new Musician().generateURI("john")
     john.name = "John Lennon"
     john.birthday = "October 9, 1940"^xsd.xdatetime("MMMM d, yyyy")
     MusicGroup thebeatles = new MusicGroup().generateURI("thebeatles")
     thebeatles.name = "The Beatles"
     // try to assign the wrong property, throws an exception
     try {
       thebeatles.birthday = "January 1, 1970"^xsd.xdatetime("MMMM d, yyyy")
     } catch(Exception ex) { println ex } // no such property exception
     vs.addToCache( thebeatles.addEdge_hasMember(john) )
     // use cache to resolve queries
     thebeatles.getMembers().each{ println it.name }
     // use database to resolve queries
     thebeatles.getMembers(ServiceWide).each{ println it.name }
     Implicit MetaQL Queries
  • 44. VitalService API • Open/Close Endpoint • Create/Remove Segment • Create/Read/Update/Delete Object • Queries (MetaQL as input closure) • Service Operations (MetaQL as input closure) • callFunction (DataScript) • init Transaction/Commit/Rollback A “Segment” is a Database (container of objects)
  • 45. MetaQL VitalSigns: Domain Model Manager • MetaQL DSL • Prediction Model DSL • Pipeline Transformation DSL (ETL) (in development) A tricky bit is finding the best way to express the DSL within the allowed grammar of the host language (Groovy).
 It’s an ongoing effort.
  • 47. Query Elements • constraints: node_constraint, edge_constraint, … • comparators (equalTo, greaterThan, …) • provides, ?reference • AND, OR • OPTIONAL • Sort Criteria
  • 48. SELECT query
     SELECT {
       value limit: 100
       value offset: 0
       value segments: ["mydata"]
       constraint { Person.class }
       constraint { Person.props().name.equalTo("John") }
     }
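To make the semantics of the SELECT slide concrete, here is a minimal Python sketch of how a type constraint, a property comparator, and the limit/offset values might be evaluated against an in-memory segment. It is not the MetaQL implementation: the `Person` dataclass, `equal_to`, and `select` are hypothetical names, and treating multiple constraints as an implicit AND is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Person:
    # Hypothetical stand-in for a generated domain class.
    uri: str
    name: str


def equal_to(prop: str, value) -> Callable[[object], bool]:
    """Build a property comparator, loosely mirroring props().name.equalTo(...)."""
    return lambda obj: getattr(obj, prop, None) == value


def select(objects: Iterable[object], type_constraint: type,
           constraints: List[Callable[[object], bool]],
           limit: int = 100, offset: int = 0) -> List[object]:
    """Keep objects of the given type that satisfy every constraint (assumed AND)."""
    matches = [o for o in objects
               if isinstance(o, type_constraint) and all(c(o) for c in constraints)]
    # Apply paging after filtering, as limit/offset usually behave.
    return matches[offset:offset + limit]


# A toy "segment" holding three objects.
segment = [Person("urn:p1", "John"), Person("urn:p2", "Paul"), Person("urn:p3", "John")]
results = select(segment, Person, [equal_to("name", "John")], limit=100, offset=0)
print([p.uri for p in results])
```

A real backend would push these constraints down into the database query rather than filter in memory; the sketch only shows what the query asks for.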
  • 49. GRAPH query
     GRAPH {
       value segments: ["mydata"]
       ARC {
         node_constraint { Email.class }
         constraint { "?person1 != ?person2" }
         ARC_AND {
           ARC {
             edge_constraint { Edge_hasSender.class }
             node_constraint { Person.props().emailAddress.equalTo("john@example.org") }
             node_constraint { Person.class }
             node_provides { "person1 = URI" }
           }
           ARC {
             edge_constraint { Edge_hasRecipient.class }
             node_constraint { Person.class }
             node_provides { "person2 = URI" }
           }
         }
       }
     }
  • 50. GRAPH query (2)
     GRAPH {
       value segments: [VitalSegment.withId('wordnet')]
       value inlineObjects: true   // <-- inline objects
       ARC {
         node_bind { "node1" }
         node_constraint { SynsetNode.expandSubclasses(true) }
         node_constraint { SynsetNode.props().name.contains_i("happy") }
         ARC {
           edge_bind { "edge" }
           node_bind { "node2" }
         }
       }
     }
     Code iterating over Results can use bind names to reference objects in each solution: node1, edge, node2.
  • 51. PATH query
     def forward = true
     def reverse = false
     PATH {
       value segments: segments
       value maxdepth: 5
       value rootURIs: [URIProperty.withString(inputURI)]
       if( forward ) {
         ARC {
           value direction: 'forward'
           // accept any edge:
           edge_constraint { }
           // accept any node:
           node_constraint { }
         }
       }
       if( reverse ) {
         ARC {
           value direction: 'reverse'
           // accept any edge:
           edge_constraint { }
           // accept any node:
           node_constraint { }
         }
       }
     }
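The PATH query above amounts to a bounded graph expansion from a set of root URIs. The following Python sketch shows one plausible reading of it, a breadth-first traversal that honours a maximum depth and the forward/reverse direction flags. The edge list and the `expand_paths` function are hypothetical and only illustrate the idea; how a MetaQL backend actually resolves PATH queries is not shown in the slides.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

# Hypothetical edge list: (source URI, destination URI).
EDGES: List[Tuple[str, str]] = [
    ("urn:a", "urn:b"),
    ("urn:b", "urn:c"),
    ("urn:c", "urn:d"),
    ("urn:x", "urn:a"),
]


def expand_paths(root_uris: List[str], max_depth: int,
                 forward: bool = True, reverse: bool = False) -> Dict[str, int]:
    """Breadth-first expansion from the roots; returns URI -> depth first reached."""
    depth_of: Dict[str, int] = {uri: 0 for uri in root_uris}
    queue = deque(root_uris)
    while queue:
        current = queue.popleft()
        depth = depth_of[current]
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        neighbours: Set[str] = set()
        for source, destination in EDGES:
            if forward and source == current:
                neighbours.add(destination)  # follow the edge source -> destination
            if reverse and destination == current:
                neighbours.add(source)  # follow the edge destination -> source
        for uri in sorted(neighbours):
            if uri not in depth_of:  # visit each node once
                depth_of[uri] = depth + 1
                queue.append(uri)
    return depth_of


print(expand_paths(["urn:a"], max_depth=2, forward=True, reverse=False))
```

The empty `edge_constraint { }` and `node_constraint { }` blocks in the slide correspond to accepting every edge and node here; a constrained version would filter `neighbours` before enqueueing.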
  • 52. AGGREGATION query
     SUM Product.props().cost
     AVERAGE Person.props().birthday
     COUNT_DISTINCT Document.props().active
     FIRST { DISTINCT Document.props().title, expandProperty : false, order: Order.ASC }
     Part of a SELECT query
  • 54. Service Operations
     INSERT {
       value segment: 'testing'
       insert(MusicGroup.class, provides: "thebeatles") { MusicGroup thebeatles ->
         thebeatles.name = "The Beatles"
         thebeatles.URI = "thebeatles"
       }
       insert(Musician.class, provides: "john") { Musician john ->
         john.name = "John"
         john.URI = "john"
       }
       insert(Edge_hasMember) { Edge_hasMember member ->
         member.sourceURI = ref("thebeatles").toString()   // <-- using "provides" values
         member.destinationURI = ref("john").toString()
         member.URI = "edge1"
       }
     }
  • 55. Transactions
     Implemented at the service level:
     def xid = service.startTransaction()
     service.save(xid, person123)
     service.commitTransaction(xid)
  • 57. Sparql/RDF Implementation G S P O Quad Store Franz AllegroGraph
  • 58. Sparql/RDF Implementation
     VitalGraphQuery q = builder.query {
       GRAPH {
         value segments: ["documents"]
         ARC {
           node_constraint { Person.class }
           node_constraint { Person.props().emailID.equalTo("k.lay@enron.com") }
           ARC {
             node_constraint { EMailMessage.class }
             edge_constraint { Edge_hasEMailMessage.class }
           }
         }
       }
     }.toQuery()
     println "Query: " + q.toSparql()
  • 59. Sparql/RDF Implementation
     PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
     PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
     PREFIX vital-core: <http://vital.ai/ontology/vital-core#>
     PREFIX p0: <http://vital.ai/ontology/enron-emails#>
     SELECT DISTINCT ?s1 ?d2 ?e2
     FROM <segment:customer__app__documents>
     WHERE {
       {
         ?s1 p0:hasEmailID ?value1 .
         ?s1 rdf:type ?value2 .
         FILTER ( ?value2 = p0:Person && ?value1 = "k.lay@enron.com"^^xsd:string )
         {
           ?d2 rdf:type ?value3 .
           ?e2 rdf:type ?value4 .
           FILTER ( ?value3 = p0:EMailMessage && ?value4 = p0:Edge_hasEMailMessage )
           ?e2 vital-core:hasEdgeSource ?s1 .
           ?e2 vital-core:hasEdgeDestination ?d2 .
         }
       }
     }
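The generated SPARQL above follows a regular pattern: each node constraint becomes a triple pattern plus a FILTER term. As a rough illustration of that kind of translation (not the actual MetaQL generator, whose internals are not shown in the slides), the Python sketch below renders a single node's type and string-equality constraints as a SPARQL SELECT. The function `to_sparql` and the variable naming scheme are assumptions; the prefixes and graph name are copied from the slide.

```python
from typing import List, Tuple

# Prefix IRIs as they appear in the generated query on the slide.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "p0": "http://vital.ai/ontology/enron-emails#",
}


def to_sparql(type_name: str, equals: List[Tuple[str, str]], graph: str) -> str:
    """Render one node's type and string-equality constraints as a SPARQL SELECT."""
    lines = [f"PREFIX {name}: <{iri}>" for name, iri in PREFIXES.items()]
    lines.append("SELECT DISTINCT ?s1")
    lines.append(f"FROM <{graph}>")
    lines.append("WHERE {")
    lines.append("  ?s1 rdf:type ?type1 .")
    filters = [f"?type1 = p0:{type_name}"]
    for index, (prop, value) in enumerate(equals, start=1):
        lines.append(f"  ?s1 p0:{prop} ?value{index} .")
        # Escape backslashes and double quotes so the literal stays well-formed.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        filters.append(f'?value{index} = "{escaped}"^^xsd:string')
    lines.append("  FILTER ( " + " && ".join(filters) + " )")
    lines.append("}")
    return "\n".join(lines)


query = to_sparql("Person", [("hasEmailID", "k.lay@enron.com")],
                  graph="segment:customer__app__documents")
print(query)
```

Nested ARCs would add further variables and the `vital-core:hasEdgeSource` / `vital-core:hasEdgeDestination` patterns seen on the slide; the sketch stops at a single node to stay short.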
  • 60. Spark-SQL / Dataframe URI P V Segment RDD Property RDD K V Experimenting with: new Dataframe Optimizer: Catalyst, new Dataframe DSL for query generation, and using GraphX for isolated Graph Query cases Generate “Bad” queries, with optimizer fixing them and Spark partitioning RDDs, as long as Spark is aware of Schema.
  • 61. Key/Value Implementation K V URI —> Serialized Object
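The key/value mapping on this slide (URI to serialized object) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `KeyValueSegment` class is hypothetical, a dict stands in for the key/value store, and JSON is used as the serialization format purely for convenience, since the slides do not say which format the implementation uses.

```python
import json
from typing import Dict, Optional


class KeyValueSegment:
    """A toy segment: maps an object's URI to its serialized form."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def save(self, obj: dict) -> None:
        # The URI is the key; the whole object is the serialized value.
        self._store[obj["URI"]] = json.dumps(obj, sort_keys=True).encode("utf-8")

    def get(self, uri: str) -> Optional[dict]:
        raw = self._store.get(uri)
        return None if raw is None else json.loads(raw.decode("utf-8"))

    def delete(self, uri: str) -> bool:
        # Returns True only if the URI was present.
        return self._store.pop(uri, None) is not None


segment = KeyValueSegment()
segment.save({"URI": "urn:person123", "type": "Person", "name": "John"})
print(segment.get("urn:person123"))
```

With only a URI key, lookups by URI are cheap while property queries need a scan or a separate index, which is presumably why the other backends in the deck add property tables and secondary indices.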
  • 63. NoSQL BigTable Implementation
     DynamoDB (HBase, Cassandra, Accumulo, …)
     ROWID | C1    | C2    | C3    | C4
     1     | K1=V1 | K1=V1 | K1=V1 | K1=V1, K1=V1
     2     | K1=V1 | K1=V1 | K1=V1 | K1=V1, K1=V1
     3     | K1=V1 | K1=V1 | K1=V1 | K1=V1, K1=V1
     URI | P | V
     Per Segment object table + Secondary Indices
     Per Segment property table + Secondary Indices
  • 64. SQL Implementation SQL, Hive-SQL, Redshift, … G S P O Per Segment Table with Partitioning (Hive)
  • 65. implementation DSL Documentation to be posted: http://www.metaql.org/ VitalSigns, VitalService, MetaQL https://dashboard.vital.ai/ Vital AI github: https://github.com/vital-ai/ Sample Code
 Spark Code: Aspen, Aspen-Datawarehouse Documentation Coming!
  • 66. closing thoughts Separation of Concerns yields the Agility needed to keep up with rapidly evolving Data. “Domain Model as Contract” provides a framework for consistent interpretation of Data across an application. MetaQL provides a framework for the consistent access and query of Data across an application. Context: Data-Driven Application / Cognitive Applications:
  • 67. Thank You! Marc C. Hadfield, Founder
 Vital AI
 http://vital.ai marc@vital.ai 917.463.4776
  • 68. Pipeline DSL (ETL)
     PIPELINE { // Workflow
       PIPE { // a Workflow Component with dependencies
         TRANSFORM { // Joins across Datasets
           IF (RULE { } ) // Boolean, Query, Construct, …
           THEN { RULE { } }
           ELSE { RULE { } }
         }
         PIPE { … } // dependent PIPE
       } // Output Dataset
       PIPE { …
       }
     }
     Influenced by Spark Pipeline and Google Dataflow Pipeline
  • 69. Schema Upgrade/Downgrade
     UPGRADE {
       upgrade( oldClass: OLD_Person.class, newClass: NEW_Person.class ) { person_old, person_new ->
         person_new.newName = person_old.oldName
       }
     }
     DOWNGRADE {
       downgrade( newClass: NEW_Person.class, oldClass: OLD_Person.class ) { person_new, person_old ->
         person_old.oldName = person_new.newName
       }
     }
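The upgrade and downgrade rules above are a pair of inverse mappings between two versions of a class. A minimal Python sketch of that idea follows, assuming a simple property rename; `OldPerson`, `NewPerson`, `upgrade`, and `downgrade` are hypothetical names that mirror the slide rather than any real API.

```python
from dataclasses import dataclass


@dataclass
class OldPerson:
    # Schema version before the rename.
    uri: str
    old_name: str


@dataclass
class NewPerson:
    # Schema version after the rename.
    uri: str
    new_name: str


def upgrade(person_old: OldPerson) -> NewPerson:
    """Map an old-schema object to the new schema (property rename)."""
    return NewPerson(uri=person_old.uri, new_name=person_old.old_name)


def downgrade(person_new: NewPerson) -> OldPerson:
    """Inverse mapping, so objects can be moved back to the old schema."""
    return OldPerson(uri=person_new.uri, old_name=person_new.new_name)


original = OldPerson(uri="urn:person123", old_name="John")
round_trip = downgrade(upgrade(original))
print(round_trip == original)
```

Checking that a downgrade after an upgrade returns the original object is a cheap test that the two rules really are inverses; lossy changes (for example, dropping a property) would fail it.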
  • 70. Multiple Endpoints
     def service1 = VitalService.getService(profile: "kv-users")
     def service2 = VitalService.getService(profile: "posts-db")
     def service3 = VitalService.getService(profile: "friendgraph-db")
     // given user URI:user123@email.org
     // get user object from service1
     // find friends of user in friendgraph via service3
     // find posts of friends in posts-db
     // update service1 with cache of user-to-friends-postings
     // send postings of friends to user in UI