Ground: A Data Context Service

  1. Ground: A Data Context Service. Vikram Sreekanti, Rolando Garcia, Joe Hellerstein. RISE Camp Fall 2017, 09/08/2017
  2. REFLECTION: RISE CAMP • Ray: Distributed Python and RL on CPUs & GPUs • Clipper: Prediction Serving • PyWren: Serverless Analytics in the Cloud
  3. THEMES AND IMPLICATIONS • Improved Programmability: more programs, more developers, more depth. • New Modes of Operationalization: more deployments, more complexity. • Data-Centric & Data Rich: more scale, more dependencies.
  4. WHAT’S MISSING? • Holistically capturing context around “the more”: • Data, code, training, testing, pipelines, usage, revisions… • Previous open source state of play: Hive Metastore. • Managing & exploiting the “digital exhaust” of this activity. • What and how do we learn from capturing usage information?
  5. This is data context.
  6. DATA CONTEXT The information surrounding the use of data in an organization.
  7. THE ABCS OF DATA CONTEXT [A]pplication Context: Describes how raw bits are interpreted for use. [B]ehavioral Context: Information about how data was created and used by real people or systems. [C]hange Over Time: The version history of the other two forms of data context.
  8. A broader context for big data.
  9. DESIGN PRINCIPLES • Model-Agnostic • Immutable • Scalable • Open community
  10. COMMON GROUND • Aboveground API to applications • Metamodel • Underground API to services
  11. METAMODEL A: Model Graphs C: Version Graphs B: Usage Graphs
  12. METAMODEL A: Model Graphs
  13. MODEL GRAPHS [Figure: two example model graphs. A relational schema: Schema 1 with Table 1 (Column 1 to Column c) and Table t (Column 1 to Column d), linked by a foreign key. A nested document: Root with Object 1 and Object 2, string and number members (k1, k2, k11, k12), and elements 1 to 3.]
  14. METAMODEL A: Model Graphs C: Version Graphs
  15. VERSION GRAPHS [Figure: version nodes labeled with hash identifiers, annotated "In this order" and "In no particular order".]
  16. METAMODEL A: Model Graphs C: Version Graphs B: Usage Graphs
  17. METAMODEL A: Model Graphs C: Version Graphs B: Usage Graphs
  18. GROUND ARCHITECTURE • Aboveground API to applications • Common Ground metamodel • Underground API to services
  19. GROUND ARCHITECTURE [Figure: the Common Ground metamodel with its aboveground API (to applications) and underground API (to services), surrounded by these components: AI Pipeline Management, Time-Travel, Parsing & Featurization, Reference Data, Wrangling, Analytics & Vis, Data Quality & Change, Catalog & Discovery, Crawling and Ingestion, Search & Query, Versioned Storage, ID & Auth.]
  20. INITIAL FOCUSES [Figure: the Ground architecture diagram of slide 19, shown again with the same components.]
  21. INITIAL FOCUSES [Figure: the Ground architecture diagram of slide 19, shown again with the same components.]
  22. CURRENT STATUS • Current version: v0.1.2. • Initial community field testing has begun (Capital One, hotels.com). • Input, feedback, and contributions welcome!
  23. GOALS FOR TODAY • Familiarize you with Common Ground API and model: • Work through a simple AI pipeline scenario • Explore how to build applications that generate & exploit data context
  24. THANKS! https://github.com/ground-context/ground http://www.ground-context.org http://bit.ly/ground-risecamp2017 vikrams@cs.berkeley.edu @vsreekanti
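
The metamodel on slides 11 to 17 combines model graphs (A), version graphs (C), and usage graphs (B). The slides give no code, so what follows is only a minimal Python sketch of how the three could fit together, and it is not Ground's actual API. Every name in it (Node, Edge, VersionGraph, LineageEdge, commit, ancestors, add_column_job) is hypothetical. It also assumes that version ids are content hashes and that recorded versions are never modified, which is one reading of the "Immutable" design principle on slide 9.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    """A: an entity in a model graph, e.g. a schema, table, or column (hypothetical)."""
    name: str


@dataclass(frozen=True)
class Edge:
    """A: a directed, labeled link between two model-graph nodes (hypothetical)."""
    source: str
    target: str
    label: str


@dataclass(frozen=True)
class LineageEdge:
    """B: records that one version was derived from another by some process."""
    from_version: str
    to_version: str
    process: str


@dataclass
class VersionGraph:
    """C: an append-only DAG mapping each version id to its parent ids."""
    parents: dict = field(default_factory=dict)

    def commit(self, payload: str, parent_ids: tuple = ()) -> str:
        # Parents must already exist, so the graph can only grow forward.
        for parent in parent_ids:
            if parent not in self.parents:
                raise KeyError(f"unknown parent version: {parent}")
        # Assumption: the id is a hash of the content plus its parents.
        text = "|".join((payload, *parent_ids))
        version_id = hashlib.sha256(text.encode("utf-8")).hexdigest()
        # Existing entries are never overwritten.
        self.parents.setdefault(version_id, tuple(parent_ids))
        return version_id

    def ancestors(self, version_id: str) -> list:
        # Walk parent links back to the roots, visiting each version once.
        seen = []
        stack = list(self.parents[version_id])
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.append(current)
                stack.extend(self.parents[current])
        return seen


# A: a tiny relational model graph, loosely following slide 13.
nodes = [Node("schema_1"), Node("table_1"), Node("column_1")]
edges = [
    Edge("schema_1", "table_1", "contains"),
    Edge("table_1", "column_1", "contains"),
]

# C: two successive versions of the description of table_1.
history = VersionGraph()
v1 = history.commit("table_1: column_1")
v2 = history.commit("table_1: column_1, column_2", (v1,))

# B: usage information saying v2 was produced from v1 by a hypothetical job.
lineage = [LineageEdge(v1, v2, "add_column_job")]
```

In this sketch, history.ancestors(v2) walks back to v1. Committing the same payload with the same parents returns the same id, so repeating a commit leaves the recorded history unchanged.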