
The Journey of Moving from AWS ELK to GCP Data Pipeline

This is a real case study from VMFive on moving our ELK architecture off AWS. The GCP data pipeline now gives us a more efficient and stable environment for running our service.


  1. Build a DMP (data management platform) on top of GCP. VMFive, Randy Huang
  2. Agenda • Migrating the pipeline to GCP • Cost comparison • Business use cases • Fluentd demo
  3. ELK + AWS: EMR, Kinesis, Lambda
  4. Pros & Cons of AWS • Pros: well supported, well documented, easy to find references. • Cons: high cost, not open source, scale has to be set up front.
  5. Pipeline on GCP: Dataflow, BigQuery, Machine Learning, Data Visualization, Compute Engine, Global Load Balancing
  6. Data Studio
  7. [Diagram: batch and streaming pipeline for BI analysis. Batch storage: Cloud Storage. Time-series streaming: Cloud Pub/Sub. Processing: Cloud Dataflow. Storage: BigQuery.]
  8. [Diagram: targeting engine. Sections: Data Sources, Machine Learning, Applications. Data sources: device-related, behavior-related, and third-party data, each via Cloud Pub/Sub. Other components: API backend (Compute Engine), Spark MLlib (Cloud Dataproc), App Engine, data transformation, hosted models (Cloud Machine Learning), real-time Prediction API, Redis (Compute Engine).]
  9. Pros & Cons of GCP • Pros: cost-effective, operationally efficient, Google has your back. • Cons: APIs/SDKs change every day, some services are still in beta, documentation is scattered.
  10. Workflow Monitoring • Digdag (comparable to Airflow/Oozie/Luigi) • Native Python & Ruby support • Multi-cloud • Modular • Workflow as code • Docker support • Alerting to Slack
  11. Digdag Sample
  12. Digdag
  13. Cost Comparison • AWS: $2,000 per month • GCP: about $200 per month for production, plus about another $200 for development • Volume: 50M events per month
  14. Business Use Cases • Digital ad targeting • User behavior tagging • BI • Geo reporting • KPI reporting • User demographics
  15. Some Tips • BigQuery incident (https://status.cloud.google.com/incident/bigquery/18022): handled by Fluentd's retry and HA • Dataflow's SDK and docs are out of sync • Dataflow side inputs have a bug in streaming mode • Compute Engine SLB: TCP/UDP setup needed for forwarding
  16. Fluentd Update • Release notes for v0.14 • Sub-second event flush • New plugin APIs support placeholders in configuration values (e.g., path /my/dest/${tag}/mydata.%Y-%m-%d.log) • Secure Forward
  17. Demo • Nginx -> Fluentd -> BigQuery -> Data Studio • MySQL -> Fluentd -> BigQuery
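Slide 7 pairs a batch path (Cloud Storage) with a time-series streaming path (Cloud Pub/Sub), with Cloud Dataflow for processing and BigQuery for storage. A common core operation in such a streaming job is grouping events into fixed time windows before writing aggregates. The plain-Python sketch below illustrates only that windowing idea; it does not use the Dataflow or Apache Beam API, and the `(epoch_seconds, key)` event shape and the 60-second window size are assumptions.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Assumed event shape for illustration: (epoch seconds, event key such as an ad ID).
Event = Tuple[int, str]


def window_start(ts: int, size_s: int) -> int:
    """Align a timestamp to the start of its fixed (tumbling) window."""
    return ts - (ts % size_s)


def count_per_window(events: Iterable[Event], size_s: int = 60) -> Dict[int, Counter]:
    """Count events per key inside each fixed, non-overlapping window.

    Late data and watermarks are ignored here; a real streaming job
    would have to handle both.
    """
    windows: Dict[int, Counter] = defaultdict(Counter)
    for ts, key in events:
        windows[window_start(ts, size_s)][key] += 1
    return dict(windows)


if __name__ == "__main__":
    sample = [(0, "ad-1"), (30, "ad-1"), (59, "ad-2"), (60, "ad-1"), (125, "ad-2")]
    for start, counts in sorted(count_per_window(sample).items()):
        print(start, dict(counts))
```

A real streaming pipeline also has to decide what to do with events that arrive after their window has closed; Dataflow handles this with watermarks and triggers, which this sketch leaves out.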
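Slides 10 to 12 introduce Digdag (workflow as code, native Python support, alerting to Slack), but the sample shown on slide 11 is not preserved in this transcript. As a hypothetical stand-in, the sketch below is a Python module that Digdag's `py>` operator could call: one task builds a daily query for a date-sharded BigQuery table, and an `_error` task posts to a Slack incoming webhook. The workflow file in the comment, the class and table names, the `SLACK_WEBHOOK_URL` environment variable, and the assumption that Digdag passes `session_date` as a keyword argument are all illustrative; check the `py>` operator documentation for your Digdag version.

```python
import json
import os
import urllib.request
from datetime import date, timedelta

# Hypothetical workflow file (daily_report.dig) that could drive this module:
#
#   timezone: UTC
#   schedule:
#     daily>: 02:00:00
#   +build_query:
#     py>: tasks.DailyReport.build_query
#   _error:
#     py>: tasks.DailyReport.alert
#
# Assumption: Digdag's py> operator passes workflow params such as
# session_date as keyword arguments when the method declares them.


def slack_payload(text: str) -> bytes:
    """Body for a Slack incoming webhook: a JSON object with a "text" field."""
    return json.dumps({"text": text}).encode("utf-8")


class DailyReport:
    # Hypothetical date-sharded table name; replace with your own project/dataset.
    TABLE = "my-project.logs.events_{suffix}"

    def build_query(self, session_date: str = "") -> str:
        # Fall back to yesterday when called outside Digdag (e.g. in a unit test).
        day = date.fromisoformat(session_date) if session_date else date.today() - timedelta(days=1)
        table = self.TABLE.format(suffix=day.strftime("%Y%m%d"))
        query = f"SELECT COUNT(*) AS events FROM `{table}`"
        print(query)  # only builds the SQL; submitting it to BigQuery is left out
        return query

    def alert(self, session_date: str = "") -> None:
        url = os.environ.get("SLACK_WEBHOOK_URL")  # hypothetical env var name
        if not url:
            return  # no webhook configured, so nothing is sent
        req = urllib.request.Request(
            url,
            data=slack_payload(f"daily_report failed for {session_date or 'unknown date'}"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req, timeout=10):
            pass
```

Keeping the SQL construction in a plain method makes it testable without Digdag: `DailyReport().build_query("2017-03-01")` returns the query for the `events_20170301` shard.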
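Slide 13's figures can also be expressed as a cost per event. The slide does not say whether the $2,000 AWS bill included a development environment, so the sketch below compares AWS against GCP production alone and against production plus development; the constant and function names are illustrative.

```python
# Monthly figures from slide 13 (USD); "about" values are taken at face value.
AWS_MONTHLY = 2000
GCP_PROD_MONTHLY = 200
GCP_DEV_MONTHLY = 200
EVENTS_PER_MONTH = 50_000_000


def reduction(old: float, new: float) -> float:
    """Fractional cost reduction when moving from `old` to `new`."""
    return (old - new) / old


def cost_per_million(monthly: float, events: int = EVENTS_PER_MONTH) -> float:
    """Monthly cost divided by monthly volume, in USD per million events."""
    return monthly / (events / 1_000_000)


if __name__ == "__main__":
    gcp_total = GCP_PROD_MONTHLY + GCP_DEV_MONTHLY
    print(f"prod only:  {reduction(AWS_MONTHLY, GCP_PROD_MONTHLY):.0%} cheaper")
    print(f"prod + dev: {reduction(AWS_MONTHLY, gcp_total):.0%} cheaper")
    for name, monthly in [("AWS", AWS_MONTHLY), ("GCP prod", GCP_PROD_MONTHLY), ("GCP prod+dev", gcp_total)]:
        print(f"{name}: ${cost_per_million(monthly):.2f} per million events")
```

With these numbers the reduction is 90% (production only) or 80% (production plus development), and the cost per million events drops from $40 to $4 or $8.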
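Slide 14 lists user behavior tagging among the use cases without describing how tags are derived. One simple approach is rule-based: give a user an interest tag once they have produced enough matching events. The sketch below is only an illustration of that approach, not VMFive's tagging logic; the event categories, tag names, and the default threshold of 3 events are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical mapping from event category to interest tag.
TAG_RULES: Dict[str, str] = {
    "game_trailer_view": "gamer",
    "car_ad_click": "auto_intender",
    "travel_page_view": "traveler",
}


def tag_users(events: Iterable[Tuple[str, str]], min_events: int = 3) -> Dict[str, Set[str]]:
    """Return user_id -> set of tags, given (user_id, category) events.

    A user earns a tag once they have at least `min_events` events whose
    category maps to that tag. Unknown categories are ignored, and users
    who earn no tag are left out of the result.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for user_id, category in events:
        tag = TAG_RULES.get(category)
        if tag is not None:
            counts[user_id][tag] += 1

    tags: Dict[str, Set[str]] = {}
    for user, tag_counts in counts.items():
        earned = {tag for tag, n in tag_counts.items() if n >= min_events}
        if earned:
            tags[user] = earned
    return tags
```

In a pipeline like the one on these slides, the same rule would more naturally run as a BigQuery `GROUP BY` over the event table; the Python version just makes the rule explicit.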
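Slide 15 credits Fluentd's retry and HA features for getting through the linked BigQuery incident: retrying keeps buffered events and re-sends them later instead of dropping them. The sketch below shows the exponential-backoff retry idea in plain Python. It is not Fluentd's implementation; Fluentd exposes similar behavior through buffer parameters such as `retry_type exponential_backoff`, `retry_wait`, `retry_exponential_backoff_base`, and `retry_max_interval`, whose exact semantics and defaults should be checked in the Fluentd buffer documentation. The function names and the choice to retry only on `OSError` (which covers `ConnectionError` and `TimeoutError`) are assumptions.

```python
import random
import time
from typing import Callable, List, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


def backoff_schedule(retries: int, wait: float = 1.0, base: float = 2.0,
                     max_interval: Optional[float] = None) -> List[float]:
    """Delay before each retry: wait, wait*base, wait*base**2, ... (optionally capped)."""
    delays = []
    for n in range(retries):
        delay = wait * base ** n
        if max_interval is not None:
            delay = min(delay, max_interval)
        delays.append(delay)
    return delays


def with_retries(send: Callable[[], T], retries: int = 5, wait: float = 1.0,
                 base: float = 2.0, max_interval: Optional[float] = 60.0,
                 jitter: bool = True,
                 retry_on: Tuple[Type[BaseException], ...] = (OSError,),
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `send` until it succeeds, sleeping with exponential backoff between tries.

    `send` is attempted at most retries + 1 times; the last failure propagates.
    """
    for delay in backoff_schedule(retries, wait, base, max_interval):
        try:
            return send()
        except retry_on:
            # Optional jitter (a random factor in [0.5, 1.0]) spreads out
            # retries from many senders hitting the same recovering service.
            sleep(delay * random.uniform(0.5, 1.0) if jitter else delay)
    return send()  # final attempt; its exception reaches the caller
```

Injecting `sleep` keeps the function testable without real waiting, and capping the interval stops the delay from growing without bound during a long outage.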
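Slide 16 mentions that the v0.14 plugin API lets output configuration contain placeholders, as in `path /my/dest/${tag}/mydata.%Y-%m-%d.log`. Fluentd fills `${tag}` from the buffer chunk's tag and the `%Y-%m-%d` fields from the chunk's time key, so the buffer section should list `tag` and `time` as chunk keys (for example `<buffer tag,time>` together with a `timekey`). The Python sketch below only imitates that substitution to show which paths result; it is not Fluentd code, and the tag `nginx.access` in the usage note is a made-up example.

```python
from datetime import datetime


def expand_placeholders(template: str, tag: str, chunk_time: datetime) -> str:
    """Imitate Fluentd-style path placeholders: strftime fields plus ${tag}.

    strftime runs first so that a "%" inside a tag can never be mistaken
    for a time directive; "${tag}" itself contains no "%" and passes through.
    """
    with_time = chunk_time.strftime(template)
    return with_time.replace("${tag}", tag)
```

For example, `expand_placeholders("/my/dest/${tag}/mydata.%Y-%m-%d.log", "nginx.access", datetime(2017, 3, 1))` gives `/my/dest/nginx.access/mydata.2017-03-01.log`. With a one-day `timekey`, each tag gets one file per day.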
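The first demo on slide 17 ships Nginx access logs through Fluentd into BigQuery. Inside Fluentd, a parser turns each log line into a record; the sketch below does the equivalent in plain Python for Nginx's default `combined` log format and serializes the rows as newline-delimited JSON, a format BigQuery load jobs accept. The field names, the conversion of `time_local` to an ISO-8601 UTC string, and mapping `-` to null are assumptions, not the demo's actual schema.

```python
import json
import re
from datetime import datetime, timezone
from typing import Any, Dict, Iterable, Optional

# Nginx "combined" format:
# $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent
# "$http_referer" "$http_user_agent"
LINE_RE = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)


def parse_combined(line: str) -> Optional[Dict[str, Any]]:
    """Parse one access-log line into a flat row; return None if it doesn't match."""
    m = LINE_RE.match(line)
    if m is None:
        return None  # malformed line; a real pipeline might route it elsewhere
    row: Dict[str, Any] = m.groupdict()
    ts = datetime.strptime(row.pop("time_local"), "%d/%b/%Y:%H:%M:%S %z")
    row["time"] = ts.astimezone(timezone.utc).isoformat()
    row["status"] = int(row["status"])
    row["body_bytes_sent"] = int(row["body_bytes_sent"])
    # Nginx writes "-" for missing values; store them as null instead.
    for key in ("remote_user", "http_referer"):
        if row[key] == "-":
            row[key] = None
    return row


def to_ndjson(rows: Iterable[Dict[str, Any]]) -> str:
    """Newline-delimited JSON: one object per line."""
    return "".join(json.dumps(row) + "\n" for row in rows)
```

Doing the typing (integers, UTC timestamps, nulls) before loading keeps the BigQuery schema simple, which matters when Data Studio reads the table directly.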
