3. Knowledge @Uber
● Uber is an ideal proving ground for an enterprise knowledge graph (EKG)
● 200k managed data sets
● Billions and billions of trips served
○ Low thousands of new entities per second
○ Totally doable!
● Even more sensor data
○ Use cases for graph stream processing
● Genuine need for knowledge and real-time inference
11. ● Real data is messy
● We are not all ontologists
● Good enough does not scale
● Beware of the hype cycle
● RDF is a hard sell
● Property Graphs are not enough
17. ● Use and promote standards
● Invest in shared vocabulary
● Fit the tooling to the infrastructure
● Fit the data model to the data
● Budget for “other stuff”
● Collaborate early and often
18. Risk & Safety Knowledge Graph
This slide intentionally left blank to save entropy.
UBER KNOWLEDGE GRAPH
19. Data Standardization
● Controlled vocabularies for all of Uber
○ Basic type aliases
○ Structured types for geospatial data, sensor data, money, etc.
○ Entities and relationships (User, Vehicle, Trip, etc.)
○ Metadata vocabularies
● Elevates domain-specific RPC and storage schemas to ontologies
● Tooling carries schemas between data representation languages
○ Protobuf, Thrift, Avro, RDF, PG, etc.
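As a rough illustration of tooling that carries one schema between representation languages, here is a minimal sketch. It is not Uber's tooling: the `Field` and `Record` classes, the type-name mappings, and the `Trip` fields are all hypothetical, and only two targets (Protobuf message text and an Avro record schema) are shown.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Field:
    name: str
    type: str  # a neutral primitive name: "string", "long", or "double"

@dataclass(frozen=True)
class Record:
    name: str
    fields: tuple  # tuple of Field

# Hypothetical mappings from neutral primitive names to each target's type names.
PROTO_TYPES = {"string": "string", "long": "int64", "double": "double"}
AVRO_TYPES = {"string": "string", "long": "long", "double": "double"}

def to_proto(record):
    """Render the record as Protobuf message text, numbering fields from 1."""
    lines = [f"message {record.name} {{"]
    for number, field in enumerate(record.fields, start=1):
        lines.append(f"  {PROTO_TYPES[field.type]} {field.name} = {number};")
    lines.append("}")
    return "\n".join(lines)

def to_avro(record):
    """Render the record as an Avro record schema (a JSON-compatible dict)."""
    return {
        "type": "record",
        "name": record.name,
        "fields": [{"name": f.name, "type": AVRO_TYPES[f.type]} for f in record.fields],
    }

# Illustrative entity only; the field names are assumptions, not Uber's schema.
TRIP = Record("Trip", (
    Field("id", "string"),
    Field("fare_amount", "double"),
    Field("start_time_ms", "long"),
))

if __name__ == "__main__":
    print(to_proto(TRIP))
    print(to_avro(TRIP))
```

The point of the sketch is that the vocabulary is defined once and each target format is derived from it, rather than maintained by hand.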
20. Metadata graph
● Hundreds of thousands of structured datasets at Uber
● Data protections and user trust
○ GDPR and other regulations, Uber’s own data policies
○ What kind of user data? Where is it?
○ Heroic numbers of manual annotations
■ Limited expressivity, limited guarantees
■ Inference is required
● Two birds: in annotating datasets, standardize and compose schemas
○ Now we have a true global knowledge graph
○ Investigating efficient reasoning and “No ETL” solutions
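One way to picture why inference is required: if a few shared types are annotated once, the question "which datasets hold user data?" can be answered by following schema composition instead of annotating every dataset by hand. The sketch below assumes a toy schema graph; every type name, dataset name, and annotation in it is hypothetical.

```python
# Hypothetical schema graph: each composite type lists the types of its fields.
FIELD_TYPES = {
    "Trip": ["User", "Vehicle", "GeoPoint"],
    "User": ["EmailAddress", "UserId"],
    "Vehicle": ["LicensePlate"],
    "Surge": ["GeoPoint", "Multiplier"],
}

# Manual annotations live on a handful of shared types, not on every dataset.
ANNOTATED_PERSONAL = {"EmailAddress", "LicensePlate"}

# Hypothetical datasets, each described by its top-level schema type.
DATASETS = {"trips_table": "Trip", "surge_table": "Surge"}

def contains_personal(type_name, seen=None):
    """Return True if the type, or any type reachable through its fields, is annotated."""
    seen = set() if seen is None else seen
    if type_name in ANNOTATED_PERSONAL:
        return True
    if type_name in seen:  # already explored; also guards against cyclic schemas
        return False
    seen.add(type_name)
    return any(contains_personal(child, seen) for child in FIELD_TYPES.get(type_name, []))

def personal_datasets():
    """List datasets whose schema transitively includes an annotated type."""
    return sorted(name for name, root in DATASETS.items() if contains_personal(root))

if __name__ == "__main__":
    print(personal_datasets())
```

Here `trips_table` is flagged because `Trip` reaches `EmailAddress` through `User`, while `surge_table` is not, even though neither dataset was annotated directly.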
21. Algebraic Property Graphs
● Common data model for RPC, storage, and knowledge representation (KR) at Uber
● In progress: alignment with the Property Graph Schema Working Group
● In progress: “Universal structure” of TinkerPop4
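A loose sketch of what a common data model could look like, assuming "algebraic" refers to algebraic data types, i.e. types built from primitives, products (records), and sums (tagged alternatives). The classes and the `Money` and `Payment` types below are hypothetical and are not the formal definition of Algebraic Property Graphs.

```python
from dataclasses import dataclass

@dataclass
class Prim:
    py_type: type  # primitive type, checked with isinstance

@dataclass
class Product:
    fields: dict  # field name -> type; a value must supply every field

@dataclass
class Sum:
    cases: dict  # tag -> type; a value is a (tag, payload) pair for exactly one case

def conforms(value, typ):
    """Check a plain Python value against a Prim, Product, or Sum type."""
    if isinstance(typ, Prim):
        return isinstance(value, typ.py_type)
    if isinstance(typ, Product):
        return (
            isinstance(value, dict)
            and value.keys() == typ.fields.keys()
            and all(conforms(value[name], t) for name, t in typ.fields.items())
        )
    if isinstance(typ, Sum):
        if not (isinstance(value, tuple) and len(value) == 2):
            return False
        tag, payload = value
        return tag in typ.cases and conforms(payload, typ.cases[tag])
    raise TypeError(f"unknown type description: {typ!r}")

# Hypothetical structured types in the spirit of the "money" example above.
Money = Product({"amount": Prim(float), "currency": Prim(str)})
Payment = Sum({
    "cash": Money,
    "card": Product({"money": Money, "last4": Prim(str)}),
})

if __name__ == "__main__":
    print(conforms(("cash", {"amount": 12.5, "currency": "USD"}), Payment))
    print(conforms(("cash", {"amount": "12.5", "currency": "USD"}), Payment))
```

Products and sums both have natural counterparts in formats such as Protobuf, Thrift, and Avro (records and unions), which is one reason a model built on them can sit between RPC, storage, and graph representations.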
22. ● Real data is messy
● We are not all ontologists
● Good enough does not scale
● Beware of the hype cycle
● RDF is a hard sell
● The Property Graph is not enough
● Use and promote standards
● Invest in shared vocabulary
● Fit the tooling to the infrastructure
● Fit the data model to the data
● Budget for “other stuff”
● Collaborate early and often