4. MetaQL
Leverage Domain Model (Schema)
Compose Queries in Code: Typed
Execute Queries on Databases, Interchangeably
Minimize TCO:
Separation of Concerns
Developer Efficiency
Query Framework
Executable JVM Code! (Groovy Closure)
5. MetaQL Origin
Across many data-driven application implementations, a desire for:
Reusable Processes, Tools:
Stop re-inventing the wheel.
Tools to manage “schema” across an application & organization.
Tools to combine Semantic Web, NOSQL, and Hadoop/Spark.
Team Collaboration:
Human labor is usually the limiting factor.
12. Internet of Things: Batch and Stream Processing
Amazon Echo
Amazon Echo Service
haley-app webservice
Vert.X
Vital Prime
Database
DataScript
Hadoop - HDFS
Apache Spark
Streaming, MLLIB, NLP, GraphX
Aspen Datawarehouse
Analytics Layer
Serving Layer
Haley Device
Raspberry Pi
Voice to Text API
Cognitive Application
NLP and Inference to process User request.
Query Knowledge in DB
Streaming Prediction Models: “Should I really have more Coffee?”
External APIs…
28. volume, velocity, variety
polyglot persistence = multiple database technologies
…but we also have very many data models.
many databases, many data models, changing rapidly.
too many moving parts for a developer to reasonably manage!
need fewer APIs to learn!
29. what happens when changes occur?
Task
Infrastructure
DevOps
Data Scientists
Business + Domain Experts
Developers
Roles
30. what changes?
Data Model Changes
New Data Sources
Infrastructure Change
Switch Databases
New Prediction Models / Features
New Service APIs…
Many Interdependencies…
Example: Change in the taxonomy of a categorization
service breaks all the logic tied to the old categories.
31. total cost of ownership
How much code changes when we modify our data model to include new sources?
How much changes when we switch database technologies?
How can we minimize the cost by decoupling dependencies?
32. Domain Model as “Contract”
Infrastructure
DevOps
Data Scientists
Business + Domain Experts
Developers
Domain Model
Everyone agrees on (or is at least aware of) the definition of Domain Concepts.
Use semantics to map “views”.
34. Infrastructure / DevOps
Database Types:
• Key/Value
• Document
• RDF Graph
• NOSQL
• Relational
• Timeseries
ACID vs. BASE
Optimizing Query Generation
Tuning Secondary Indices
Update MetaQL DSL for new DB features
CAP Theorem
36. Domain Model Implementation
Combine:
SQL-style Schema
with
Hadoop Data Serialization Schema
(Avro, Thrift, Protocol Buffers, Kryo, Parquet)
add
Semantics: the “Meaning” of objects
Not a table “person”; instead, define the concept of Person to be used throughout an application.
The implementation decides how to store “Person” data in its database.
37. Domain Model Implementation
Domain Model definition resolves:
RDF vs Property Graph model
Object Relational Impedance Mismatch
Use OWL to capture Domain Model:
SubClasses
SubProperties
Multiple Inheritance
Marginal technology performance gains are hugely outweighed
by human productivity gains and a wider choice of tools.
Compromise across modeling paradigms.
38. Domain Model Implementation
Example: Healthcare Application:
URI<Person123> IS_A:
• Patient
• BillableAccount
• InsuredEntity
Same URI across three domain concepts:
Diagnostics Records, Billing System, Insurance System.
Implementation Note:
We generate code for the JVM using “traits” as a way to implement
multiple inheritance (Groovy, Scala, Java8).
The trait is used as a semantic marker to link to the Domain Model.
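The healthcare example can be sketched in plain Java 8, where interfaces with default methods play the role of traits. This is only an illustration of the idea: the interface names mirror the slide, but the method names, the `Person123` class, and the example URI are assumptions, not output of the actual code generator.

```java
// Root marker for anything identified by a URI (hypothetical name).
interface DomainObject {
    String getURI();
}

// Each interface is a "trait": a semantic marker tied to one domain concept.
interface Patient extends DomainObject {
    default String describePatient() { return "Patient " + getURI(); }
}

interface BillableAccount extends DomainObject {
    default String describeAccount() { return "BillableAccount " + getURI(); }
}

interface InsuredEntity extends DomainObject {
    default String describeInsured() { return "InsuredEntity " + getURI(); }
}

// One object, one URI, three domain concepts (multiple inheritance of behavior).
final class Person123 implements Patient, BillableAccount, InsuredEntity {
    private final String uri;

    Person123(String uri) { this.uri = uri; }

    @Override
    public String getURI() { return uri; }
}

class TraitDemo {
    public static void main(String[] args) {
        Object person = new Person123("urn:example:Person123");
        // Each subsystem checks only for the concept it cares about.
        if (person instanceof Patient) {
            System.out.println(((Patient) person).describePatient());
        }
        if (person instanceof BillableAccount) {
            System.out.println(((BillableAccount) person).describeAccount());
        }
        if (person instanceof InsuredEntity) {
            System.out.println(((InsuredEntity) person).describeInsured());
        }
    }
}
```

The diagnostics, billing, and insurance code paths can each accept the same object while depending on a single interface, which is the decoupling the slide describes.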
39. Domain Model - Core Classes
Node, Edge, HyperNode, HyperEdge
Properties:
• URI
• Primary Type
• Types
Edges/HyperEdges:
• Source URI
• Destination URI
Edges:
• Peer
• Taxonomy
Class Instances
contain Properties.
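As a rough illustration of the core classes listed above, the following Java sketch models the shared properties (URI, primary type, types) and the source and destination URIs carried by edges and hyperedges. All class and field names here are hypothetical; the real VitalSigns classes are generated from the OWL model and may look quite different.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Common base: every instance has a URI, a primary type, a set of types,
// and a bag of properties.
abstract class GraphObject {
    final String uri;
    final String primaryType;
    final Set<String> types = new LinkedHashSet<>();
    final Map<String, Object> properties = new LinkedHashMap<>();

    GraphObject(String uri, String primaryType) {
        this.uri = uri;
        this.primaryType = primaryType;
        this.types.add(primaryType);
    }
}

class SketchNode extends GraphObject {
    SketchNode(String uri, String primaryType) { super(uri, primaryType); }
}

class SketchHyperNode extends GraphObject {
    SketchHyperNode(String uri, String primaryType) { super(uri, primaryType); }
}

// The two edge kinds named on the slide.
enum EdgeKind { PEER, TAXONOMY }

class SketchEdge extends GraphObject {
    final String sourceURI;
    final String destinationURI;
    final EdgeKind kind;

    SketchEdge(String uri, String primaryType, String sourceURI,
               String destinationURI, EdgeKind kind) {
        super(uri, primaryType);
        this.sourceURI = sourceURI;
        this.destinationURI = destinationURI;
        this.kind = kind;
    }
}

class SketchHyperEdge extends GraphObject {
    final String sourceURI;
    final String destinationURI;

    SketchHyperEdge(String uri, String primaryType, String sourceURI, String destinationURI) {
        super(uri, primaryType);
        this.sourceURI = sourceURI;
        this.destinationURI = destinationURI;
    }
}

class CoreClassesDemo {
    public static void main(String[] args) {
        SketchNode band = new SketchNode("urn:example:thebeatles", "MusicGroup");
        band.properties.put("name", "The Beatles");
        SketchNode john = new SketchNode("urn:example:john", "Musician");
        SketchEdge member = new SketchEdge("urn:example:edge1", "hasMember",
                band.uri, john.uri, EdgeKind.PEER);
        System.out.println(member.sourceURI + " -> " + member.destinationURI);
    }
}
```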
41. VitalSigns: Domain Model Dev Kit
$ vitalsigns generate -o ./domain-ontology/enron-dataset-1.0.0.owl
$ ls domain-groovy-jar
enron-dataset-groovy-1.0.0.jar
$ ls domain-json-schema
enron-dataset-1.0.0.js
OWL can be compiled into JVM code statically (creating an artifact for Maven)
or dynamically at runtime.
43. Development with the Domain Model
VitalSigns vs = VitalSigns.get()
Musician john = new Musician().generateURI("john")
john.name = "John Lennon"
john.birthday = "October 9, 1940"^xsd.xdatetime("MMMM d, yyyy")
MusicGroup thebeatles = new MusicGroup().generateURI("thebeatles")
thebeatles.name = "The Beatles"
// try to assign the wrong property, throws an exception
try {
    thebeatles.birthday = "January 1, 1970"^xsd.xdatetime("MMMM d, yyyy")
} catch(Exception ex) { println ex } // no such property exception
vs.addToCache( thebeatles.addEdge_hasMember(john) )
// use cache to resolve queries
thebeatles.getMembers().each{ println it.name }
// use database to resolve queries
thebeatles.getMembers(ServiceWide).each{ println it.name }
Implicit MetaQL Queries
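The "wrong property throws an exception" behaviour can be approximated in plain Java by checking each assignment against the set of properties the domain model allows for a type. This is a simplified sketch under that assumption; `SchemaCheckedObject` and its methods are hypothetical and are not part of VitalSigns.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// An object whose property assignments are validated against its type's schema.
class SchemaCheckedObject {
    private final String type;
    private final Set<String> allowedProperties;
    private final Map<String, Object> values = new HashMap<>();

    SchemaCheckedObject(String type, String... allowedProperties) {
        this.type = type;
        this.allowedProperties = new HashSet<>(Arrays.asList(allowedProperties));
    }

    // Rejects any property the domain model does not define for this type.
    void set(String property, Object value) {
        if (!allowedProperties.contains(property)) {
            throw new IllegalArgumentException(
                    "No such property '" + property + "' on type " + type);
        }
        values.put(property, value);
    }

    Object get(String property) {
        return values.get(property);
    }
}

class SchemaCheckDemo {
    public static void main(String[] args) {
        SchemaCheckedObject john = new SchemaCheckedObject("Musician", "name", "birthday");
        john.set("name", "John Lennon");

        SchemaCheckedObject band = new SchemaCheckedObject("MusicGroup", "name");
        band.set("name", "The Beatles");
        try {
            band.set("birthday", "January 1, 1970"); // not defined for MusicGroup
        } catch (IllegalArgumentException ex) {
            System.out.println(ex.getMessage());
        }
    }
}
```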
44. VitalService API
• Open/Close Endpoint
• Create/Remove Segment
• Create/Read/Update/Delete Object
• Queries (MetaQL as input closure)
• Service Operations (MetaQL as input closure)
• callFunction (DataScript)
• init Transaction/Commit/Rollback
A “Segment” is a Database (container of objects)
45. MetaQL
VitalSigns: Domain Model Manager
• MetaQL DSL
• Prediction Model DSL
• Pipeline Transformation DSL (ETL)
(in development)
A tricky bit is finding the best way to express
the DSL within the allowed grammar of the
host language (Groovy).
It’s an ongoing effort.
50. GRAPH query (2)
GRAPH {
    value segments: [VitalSegment.withId('wordnet')]
    value inlineObjects: true // inline objects
    ARC {
        node_bind { "node1" }
        node_constraint { SynsetNode.expandSubclasses(true) }
        node_constraint { SynsetNode.props().name.contains_i("happy") }
        ARC {
            edge_bind { "edge" }
            node_bind { "node2" }
        }
    }
}
Code iterating over Results can use bind names to
reference objects in each solution: node1, edge, node2.
51. PATH query
def forward = true
def reverse = false
PATH {
    value segments: segments
    value maxdepth: 5
    value rootURIs: [URIProperty.withString(inputURI)]
    if( forward ) {
        ARC {
            value direction: 'forward'
            // accept any edge: edge_constraint { }
            // accept any node: node_constraint { }
        }
    }
    if( reverse ) {
        ARC {
            value direction: 'reverse'
            // accept any edge: edge_constraint { }
            // accept any node: node_constraint { }
        }
    }
}
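To make the intent of a PATH query concrete, here is a small Java sketch of a depth-limited expansion from a root URI that follows edges forward, in reverse, or both. It assumes PATH behaves like a breadth-first traversal bounded by `maxdepth`; that reading, the `expand` method, and the in-memory edge list are illustrative assumptions rather than the MetaQL implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class PathSketch {

    // Each edge is a pair: {sourceURI, destinationURI}.
    // Returns every URI reachable from rootURI within maxDepth hops,
    // in discovery order, including the root itself.
    static Set<String> expand(List<String[]> edges, String rootURI, int maxDepth,
                              boolean forward, boolean reverse) {
        Set<String> visited = new LinkedHashSet<>();
        visited.add(rootURI);
        List<String> frontier = new ArrayList<>();
        frontier.add(rootURI);

        for (int depth = 0; depth < maxDepth && !frontier.isEmpty(); depth++) {
            List<String> next = new ArrayList<>();
            for (String uri : frontier) {
                for (String[] edge : edges) {
                    // forward: follow source -> destination
                    if (forward && edge[0].equals(uri) && visited.add(edge[1])) {
                        next.add(edge[1]);
                    }
                    // reverse: follow destination -> source
                    if (reverse && edge[1].equals(uri) && visited.add(edge[0])) {
                        next.add(edge[0]);
                    }
                }
            }
            frontier = next;
        }
        return visited;
    }

    public static void main(String[] args) {
        List<String[]> edges = Arrays.asList(
                new String[] {"a", "b"},
                new String[] {"b", "c"},
                new String[] {"c", "d"},
                new String[] {"x", "a"});
        System.out.println(expand(edges, "a", 2, true, false));
        System.out.println(expand(edges, "a", 5, false, true));
    }
}
```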
52. AGGREGATION query
SUM Product.props().cost
AVERAGE Person.props().birthday
COUNT_DISTINCT Document.props().active
FIRST { DISTINCT Document.props().title, expandProperty : false, order: Order.ASC }
Part of a SELECT query
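For readers unfamiliar with these aggregation functions, the Java sketch below shows what each one computes over an in-memory list of property values. The helper names are hypothetical, and the reading of `FIRST { DISTINCT ..., order: Order.ASC }` as "smallest distinct value" is an assumption about the DSL's semantics.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class AggregationSketch {

    // SUM over a numeric property
    static double sum(List<Double> values) {
        return values.stream().mapToDouble(Double::doubleValue).sum();
    }

    // AVERAGE over a numeric property; NaN when there are no values
    static double average(List<Double> values) {
        return values.stream().mapToDouble(Double::doubleValue).average().orElse(Double.NaN);
    }

    // COUNT_DISTINCT: number of different values seen
    static <T> long countDistinct(List<T> values) {
        return values.stream().distinct().count();
    }

    // Assumed reading of FIRST over DISTINCT values in ascending order
    static Optional<String> firstDistinctAscending(List<String> values) {
        return values.stream().distinct().sorted().findFirst();
    }

    public static void main(String[] args) {
        List<Double> costs = Arrays.asList(2.0, 4.0, 6.0);
        List<Boolean> active = Arrays.asList(true, false, true);
        List<String> titles = Arrays.asList("b", "a", "b");

        System.out.println(sum(costs));
        System.out.println(average(costs));
        System.out.println(countDistinct(active));
        System.out.println(firstDistinctAscending(titles).orElse("(none)"));
    }
}
```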
60. Spark-SQL / Dataframe
[Diagram: Segment RDD and Property RDD, with column labels K, V and URI, P, V]
Experimenting with: the new DataFrame optimizer (Catalyst), the new DataFrame
DSL for query generation, and GraphX for isolated graph-query cases.
Generate “bad” (unoptimized) queries and let the optimizer fix them and Spark
partition the RDDs; this works as long as Spark is aware of the schema.
65. implementation
DSL Documentation to be posted:
http://www.metaql.org/
VitalSigns, VitalService, MetaQL
https://dashboard.vital.ai/
Vital AI github: https://github.com/vital-ai/
Sample Code
Spark Code: Aspen, Aspen-Datawarehouse
Documentation Coming!
66. closing thoughts
Context: Data-Driven Applications / Cognitive Applications
Separation of Concerns yields
the Agility needed to keep up
with rapidly evolving Data.
“Domain Model as Contract” provides a framework for
consistent interpretation of Data across an application.
MetaQL provides a framework for the consistent
access and query of Data across an application.
67. Thank You!
Marc C. Hadfield, Founder
Vital AI
http://vital.ai
marc@vital.ai
917.463.4776
68. Pipeline DSL (ETL)
PIPELINE { // Workflow
    PIPE { // a Workflow Component with dependencies
        TRANSFORM { // Joins across Datasets
            IF (RULE { } ) // Boolean, Query, Construct, …
            THEN { RULE { } }
            ELSE { RULE { } }
        }
        PIPE { … } // dependent PIPE
    } // Output Dataset
    PIPE { …
    }
}
Influenced by Spark Pipeline and
Google Dataflow Pipeline
70. Multiple Endpoints
def service1 = VitalService.getService(profile:"kv-users")
def service2 = VitalService.getService(profile:"posts-db")
def service3 = VitalService.getService(profile:"friendgraph-db")
// given user URI:user123@email.org
// get user object from service1
// find friends of user in friendgraph via service3
// find posts of friends in posts-db
// update service1 with cache of user-to-friends-postings
// send postings of friends to user in UI
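The commented steps above can be walked through with a self-contained Java sketch in which three in-memory maps stand in for the three service endpoints. Nothing here calls the real VitalService API; the class, field, and method names are hypothetical and only show how the steps compose.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MultiEndpointSketch {
    // Stand-in for kv-users: user URI -> display name
    final Map<String, String> users = new HashMap<>();
    // Stand-in for friendgraph-db: user URI -> friend URIs
    final Map<String, List<String>> friendGraph = new HashMap<>();
    // Stand-in for posts-db: user URI -> postings
    final Map<String, List<String>> posts = new HashMap<>();
    // Cache of user-to-friends-postings, written back alongside the user store
    final Map<String, List<String>> friendPostCache = new HashMap<>();

    List<String> postsOfFriends(String userURI) {
        // 1. get the user object from the key/value store
        if (!users.containsKey(userURI)) {
            throw new IllegalArgumentException("unknown user: " + userURI);
        }
        List<String> result = new ArrayList<>();
        // 2. find friends of the user in the friend graph
        for (String friend : friendGraph.getOrDefault(userURI, Collections.<String>emptyList())) {
            // 3. find posts of each friend in the posts store
            result.addAll(posts.getOrDefault(friend, Collections.<String>emptyList()));
        }
        // 4. update the cache of user-to-friends-postings
        friendPostCache.put(userURI, result);
        // 5. the caller sends these postings to the user in the UI
        return result;
    }

    public static void main(String[] args) {
        MultiEndpointSketch app = new MultiEndpointSketch();
        app.users.put("user123", "Example User");
        app.friendGraph.put("user123", Collections.singletonList("user456"));
        app.posts.put("user456", Collections.singletonList("hello from user456"));
        System.out.println(app.postsOfFriends("user123"));
    }
}
```

Because each store sits behind its own lookup, swapping one backing database for another would not change `postsOfFriends`, which is the point of addressing all three endpoints through one API.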