Ground: A Data Context Service
Vikram Sreekanti, Rolando Garcia, Joe Hellerstein
RISE Camp Fall 2017, 09/08/2017
REFLECTION: RISE CAMP
• Ray: Distributed Python and RL on CPUs & GPUs
• Clipper: Prediction Serving
• PyWren: Serverless Analytics in the Cloud
THEMES AND IMPLICATIONS
• Improved Programmability: more programs, more developers, more depth.
• New Modes of Operationalization: more deployments, more complexity.
• Data-Centric & Data Rich: more scale, more dependencies.
WHAT’S MISSING?
• Holistically capturing context around “the more”:
  • Data, code, training, testing, pipelines, usage, revisions…
• The previous open-source state of play: the Hive Metastore.
• Managing & exploiting the “digital exhaust” of this activity.
• What and how do we learn from capturing usage information?
THE ABCS OF DATA CONTEXT
[A]pplication Context: Describes how raw bits are interpreted for use.
[B]ehavioral Context: Information about how data was created and used by real people or systems.
[C]hange Over Time: The version history of the other two forms of data context.
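To make the three forms concrete, here is a minimal Python sketch that assumes a simple in-memory design. Each version of a piece of data carries its application context (a schema mapping field names to types) and its behavioral context (who used it and how), and each version links to its parents to record change over time. The names `Version`, `VersionHistory`, `commit`, `record_use`, `ancestors`, and `training_job_42` are hypothetical and are not Ground's API.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical classes illustrating the ABCs of data context.
# This is NOT Ground's actual API, only a sketch of the three kinds of context.


@dataclass
class Version:
    """[C]hange over time: an immutable version that links back to its parents."""
    version_id: int
    parents: List[int]
    # [A]pplication context: how the raw bits are interpreted (here, a schema).
    application: Dict[str, str]
    # [B]ehavioral context: (actor, action) pairs describing who used this version.
    behavior: List[Tuple[str, str]] = field(default_factory=list)


class VersionHistory:
    """A small version DAG: each new version points at the version(s) it came from."""

    def __init__(self) -> None:
        self.versions: Dict[int, Version] = {}
        self._next_id = 0

    def commit(self, application: Dict[str, str],
               parents: Optional[List[int]] = None) -> Version:
        parent_ids = list(parents or [])
        for p in parent_ids:
            if p not in self.versions:
                raise KeyError(f"unknown parent version {p}")
        v = Version(self._next_id, parent_ids, dict(application))
        self.versions[v.version_id] = v
        self._next_id += 1
        return v

    def record_use(self, version_id: int, actor: str, action: str) -> None:
        self.versions[version_id].behavior.append((actor, action))

    def ancestors(self, version_id: int) -> List[int]:
        """All versions reachable through parent links, i.e. the history of version_id."""
        seen: List[int] = []
        stack = list(self.versions[version_id].parents)
        while stack:
            vid = stack.pop()
            if vid not in seen:
                seen.append(vid)
                stack.extend(self.versions[vid].parents)
        return sorted(seen)


# Usage: a schema gains a column, and a (hypothetical) training job reads the new version.
h = VersionHistory()
v0 = h.commit({"user_id": "int", "name": "string"})
v1 = h.commit({"user_id": "int", "name": "string", "email": "string"},
              parents=[v0.version_id])
h.record_use(v1.version_id, "training_job_42", "read")
print(h.ancestors(v1.version_id))  # [0]
```

Keeping behavioral records attached to a specific version, rather than to the data in general, means a question like "which jobs read the schema before `email` was added?" can be answered from the history alone.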
MODEL GRAPHS
[Figure: example model graphs. A relational schema (Schema 1 containing Table 1 through Table t, each containing columns, with a foreign key between tables) and a JSON document tree (Root containing Object 1 and Object 2, whose members carry types such as k1: string and k2: number and contain nested members and array elements).]
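A model graph can be sketched as labeled nodes joined by containment edges, with other relationships, such as a foreign key, stored as differently labeled edges. The Python below is a hypothetical illustration, not Ground's data model: `ModelGraph`, `relational_example`, and `from_json` are invented names, and the table and column names are placeholders for the figure's Table 1, Table t, and Column 1 to Column c.

```python
import json
from typing import Dict, List, Tuple


class ModelGraph:
    """Hypothetical model graph: labeled nodes plus (source, target, kind) edges."""

    def __init__(self) -> None:
        self.nodes: Dict[str, str] = {}              # node id -> label
        self.edges: List[Tuple[str, str, str]] = []  # (source id, target id, kind)

    def add_node(self, node_id: str, label: str) -> str:
        self.nodes[node_id] = label
        return node_id

    def add_edge(self, src: str, dst: str, kind: str = "contains") -> None:
        self.edges.append((src, dst, kind))

    def children(self, node_id: str) -> List[str]:
        """Targets of containment edges leaving node_id, in insertion order."""
        return [dst for src, dst, kind in self.edges
                if src == node_id and kind == "contains"]


def relational_example() -> ModelGraph:
    """Schema -> tables -> columns, plus one foreign-key edge between tables."""
    g = ModelGraph()
    schema = g.add_node("schema1", "Schema 1")
    # Hypothetical tables and columns standing in for the figure's placeholders.
    tables = {"table1": ["id", "name"], "tableT": ["id", "table1_id"]}
    for table_id, columns in tables.items():
        g.add_edge(schema, g.add_node(table_id, table_id))
        for col in columns:
            g.add_edge(table_id, g.add_node(f"{table_id}.{col}", col))
    # A foreign key is a non-containment edge between two column nodes.
    g.add_edge("tableT.table1_id", "table1.id", kind="foreign key")
    return g


_TYPE_NAMES = {str: "string", bool: "boolean", int: "number", float: "number"}


def from_json(doc: str) -> ModelGraph:
    """Containment tree for a JSON document: members, array elements, leaf types."""
    g = ModelGraph()
    root = g.add_node("root", "Root")
    counter = 0

    def fresh(label: str) -> str:
        nonlocal counter
        counter += 1
        return g.add_node(f"n{counter}", label)

    def walk(value: object, parent: str) -> None:
        if isinstance(value, dict):
            for key, child in value.items():
                member = fresh(f"member {key}")
                g.add_edge(parent, member)
                walk(child, member)
        elif isinstance(value, list):
            for i, child in enumerate(value, start=1):
                element = fresh(f"element {i}")
                g.add_edge(parent, element)
                walk(child, element)
        else:
            # Leaf value: annotate its parent with the primitive type, e.g. "member k1: string".
            g.nodes[parent] += ": " + _TYPE_NAMES.get(type(value), "null")

    walk(json.loads(doc), root)
    return g


schema_graph = relational_example()
doc_graph = from_json('{"k1": "a", "k2": [1, 2, 3]}')
print([doc_graph.nodes[n] for n in doc_graph.children("n2")])
# ['element 1: number', 'element 2: number', 'element 3: number']
```

Holding relational and semi-structured structure in one graph shape lets the same traversal code (here, `children`) walk both kinds of data.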
GROUND ARCHITECTURE
[Figure: layered architecture. The Common Ground metamodel sits between the Aboveground API to applications and the Underground API to services. Components shown: AI Pipeline Management, Time-Travel, Parsing & Featurization, Reference Data, Wrangling, Analytics & Vis, Data Quality & Change, Catalog & Discovery, Crawling and Ingestion, Search & Query, Versioned Storage, ID & Auth.]
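The layering can be sketched as two interfaces: an underground interface that a storage service implements, and an aboveground interface that applications call. The Python below is a hypothetical sketch that assumes versions are identified by integer timestamps. `VersionedStorage`, `InMemoryStorage`, `CommonGround`, `record`, and `as_of` are illustrative names, not Ground's API; `as_of` stands in for a Time-Travel application built on Versioned Storage.

```python
from typing import Dict, List, Optional, Protocol, Tuple

# Hypothetical layering sketch; none of these names are Ground's real API.


class VersionedStorage(Protocol):
    """Underground API: what a pluggable storage service must provide."""

    def put(self, key: str, timestamp: int, value: str) -> None: ...

    def history(self, key: str) -> List[Tuple[int, str]]: ...


class InMemoryStorage:
    """A toy underground service: an append-only list of versions per key."""

    def __init__(self) -> None:
        self._data: Dict[str, List[Tuple[int, str]]] = {}

    def put(self, key: str, timestamp: int, value: str) -> None:
        self._data.setdefault(key, []).append((timestamp, value))

    def history(self, key: str) -> List[Tuple[int, str]]:
        # Return versions ordered by timestamp.
        return sorted(self._data.get(key, []))


class CommonGround:
    """Aboveground API: applications call this layer, never storage directly."""

    def __init__(self, storage: VersionedStorage) -> None:
        self._storage = storage

    def record(self, key: str, timestamp: int, value: str) -> None:
        self._storage.put(key, timestamp, value)

    def as_of(self, key: str, timestamp: int) -> Optional[str]:
        """Time-travel query: the latest value recorded at or before `timestamp`."""
        result = None
        for ts, value in self._storage.history(key):
            if ts > timestamp:
                break
            result = value
        return result


ground = CommonGround(InMemoryStorage())
ground.record("dataset/users/schema", 1, "id:int,name:string")
ground.record("dataset/users/schema", 5, "id:int,name:string,email:string")
print(ground.as_of("dataset/users/schema", 3))  # id:int,name:string
```

Because `CommonGround` depends only on the `VersionedStorage` protocol, a different storage service could be swapped in without changing application code.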
INITIAL FOCUSES
[Figure: the Ground architecture diagram, shown again to indicate the initial focus areas.]
CURRENT STATUS
• Current version: v0.1.2.
• Initial community field testing has begun (Capital One, hotels.com).
• Input, feedback, and contributions welcome!
GOALS FOR TODAY
• Familiarize you with the Common Ground API and model:
• Work through a simple AI pipeline scenario
• Explore how to build applications that generate & exploit data context