
Data Provision API with BigQuery - Google Cloud Summit Jakarta 18

This talk presents how Traveloka uses Google Cloud BigQuery to build a Data Provisioning API, which enables Traveloka's microservices to consume data from our BigQuery.

Published in: Engineering


  1. CREATE INTELLIGENCE FROM DATA
  2. Session 4, 14:40 - 15:15: Data Lake API with BigQuery. Presenters: Imre Nagi (Software Engineer, Traveloka) and Rendy Bambang Jr. (Data System Architect, Traveloka)
  3. Data Provisioning API with BigQuery. Imre Nagi (Software Engineer, Traveloka) and Rendy Bambang Jr. (Data System Architect, Traveloka)
  4. Traveloka: 7 offices (Jakarta, Singapore, Bangalore, etc.) ● 2,000+ global employees ● 400+ engineers
  5. Metrics ● ~4 TiB of data per day goes into Pub/Sub ● ~400 TB (~500 billion rows) of data in BigQuery ● ~250 TB of data in GCS ● >2 PiB of BigQuery data scanned per month (excluding ETLs) ● >60k batch jobs executed per day ● >2,500 Dataflow jobs per day ● >1,500 charts using BigQuery generated via BI tools
  6. AGENDA ● How we use Data ● Problem Statement ● Data Lake API ● Future work
  7. Data drives product and enables business use cases. Each mission team has unique use cases for data. The data team at Traveloka needs to fulfill these needs in order to maximize Traveloka's growth and revenue.
  8. How we use data ● Personalization ● Fraud Detection ● Improving User Experience ● A/B Test ● Giving recommendations ● Review Moderation ● Photo classification ● and many other use cases
  9. Section 1: Understanding The Problem
  10. Data Provisioning in Traveloka: Machine To Machine | Machine To Human | Frequent, small requests | Huge data, high latency
  11. Data Provisioning in Traveloka: Machine To Machine | Machine To Human | Frequent, small requests | Huge data, high latency | Huge data, high latency
  12. How we previously delivered big data to product teams: (1) The product team requests data for a specific use case. (2) The data team provides raw or pre-processed data in blob storage. (3) The data team grants the team's microservices access to the bucket or tables. (4) The product team pulls the data and does its job.
  13. Initial Data Delivery System (diagram; labels: Ingestion System, Data Processing, Kafka, Batch Ingest, S3 Data Lake, PostgreSQL, ETL, Traveloka Services)
  14. This becomes problematic ● Systems are tightly coupled ● No column-level access control ● Hard to audit data usage
  15. What do we need? A standardised way of accessing data ● Clear contract between client and server ● Client is not tightly coupled to the internal implementation ● Better access control
  16. Section 2: Async Data Provisioning API
  17. Data Provisioning API Requirements: (1) Clear contract with JSON queries. (2) Asynchronously return large amounts of data. (3) Well-defined access control down to column level.
  18. Technology We Use ● BigQuery ● Cloud Composer ● GKE ● Stackdriver Logging ● Cloud Storage ● Cloud SQL ● RxJava ● Dropwizard
  19. Data Provisioning API underlying architecture (diagram; labels: Storage, Cloud Storage for storing results, Cloud SQL, BigQuery, Interface, Data Provision API, Consumers, Traveloka Backend Services, Data Source, BigQuery, Tracking, Data Processing Pipeline, BigQuery SQL, Orchestration, Monitoring, Logging). Caption: Data Provision API overall architecture
  20. (Same architecture diagram as slide 19, shown without the title and caption.)
  21. How Data Provisioning API works (diagram; components: Service Client, Interface, Query Interpreter, Kubernetes Engine, BigQuery, Cloud SQL, Cloud Storage, Monitoring, Logging; steps shown: Query Validation, Job Creation, Query Execution, Write to Permanent Table, Export Permanent Table to GCS, Generate Signed URL, Store the Signed URL, ACL Checks)
  22. Section 3: Future Improvement
  23. Future Improvement ● Use a queue to manage the jobs ● Add more capabilities to the query features (complex aggregation, etc.) ● Separate the ACL service to enable service reuse
  24. THANK YOU
  25. Jakarta, Indonesia, October 4th, 2018
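
The requirements slide asks for a clear contract with JSON queries and access control down to column level. A minimal sketch of how such a contract could be validated and rendered as SQL is given here in Python for brevity. The talk does not show the real query format, so the JSON fields (`table`, `columns`, `limit`), the `ACL` mapping, the client name `fraud-service`, and the function `build_sql` are all hypothetical.

```python
import json

# Hypothetical ACL: the columns each client service may read, per table.
ACL = {
    "fraud-service": {"bookings": {"booking_id", "amount", "created_at"}},
}


def build_sql(client_id: str, query_json: str) -> str:
    """Check a JSON query against the column-level ACL and render it as SQL."""
    query = json.loads(query_json)
    table = query["table"]
    columns = query["columns"]
    if not columns:
        raise ValueError("query must select at least one column")

    # Unknown clients or tables get an empty allow-list, so every column is denied.
    allowed = ACL.get(client_id, {}).get(table, set())
    denied = [c for c in columns if c not in allowed]
    if denied:
        raise PermissionError(f"{client_id} may not read {denied} from {table}")

    # Only allow-listed identifiers reach this point, so they are safe to interpolate.
    sql = f"SELECT {', '.join(columns)} FROM `{table}`"
    if "limit" in query:
        sql += f" LIMIT {int(query['limit'])}"
    return sql
```

Because the client never writes SQL itself, the server can change the underlying tables without breaking callers, and every requested column passes through a single check that can also be logged for auditing.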

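
The "How Data Provisioning API works" slide lists the stages of the asynchronous flow: job creation, query execution, writing to a permanent table, exporting that table to GCS, and generating and storing a signed URL. The sketch below, in Python, illustrates that job lifecycle with in-memory stand-ins only; it makes no Google Cloud calls. The function names, the `JOBS` store (standing in for whatever the service persists, possibly in Cloud SQL), the status values, and the placeholder URLs are assumptions, as is the exact order of the stages.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor

JOBS = {}  # hypothetical in-memory job store
EXECUTOR = ThreadPoolExecutor(max_workers=2)


def run_query_to_table(sql: str, job_id: str) -> str:
    # Stand-in for "Query Execution" and "Write to Permanent Table".
    return f"results.job_{job_id}"


def export_table_to_gcs(table: str) -> str:
    # Stand-in for "Export Permanent Table to GCS".
    return f"gs://example-results/{table}.csv"


def generate_signed_url(gcs_uri: str) -> str:
    # Stand-in for "Generate Signed URL"; a real URL would carry a real signature.
    return gcs_uri.replace("gs://", "https://storage.example/") + "?signature=FAKE"


def _run_job(job_id: str, sql: str) -> None:
    try:
        JOBS[job_id]["status"] = "RUNNING"
        table = run_query_to_table(sql, job_id)
        gcs_uri = export_table_to_gcs(table)
        # "Store the Signed URL": record it so the client can fetch it later.
        JOBS[job_id].update(status="DONE", result_url=generate_signed_url(gcs_uri))
    except Exception as exc:
        JOBS[job_id].update(status="FAILED", error=str(exc))


def submit(sql: str) -> str:
    """Create a job and return its id immediately; the work runs in the background."""
    job_id = uuid.uuid4().hex
    JOBS[job_id] = {"status": "PENDING"}
    EXECUTOR.submit(_run_job, job_id, sql)
    return job_id


def get_status(job_id: str) -> dict:
    """Return a copy of the job record for polling clients."""
    return dict(JOBS[job_id])
```

A client would call `submit(...)`, keep the returned job id, and poll `get_status(job_id)` until the status is `DONE`, then download the result from `result_url`. Returning a URL rather than the rows themselves keeps large result sets out of the API response.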