Serverless Big Data Architecture on Google Cloud Platform at Credit OK

This is a talk given at Barcamp Bangkhen 2018,
presented by Kriangkrai Chaonithi.
I shared my experience at Credit OK building a data pipeline that ingests a huge amount of customer data into our big data analytics warehouse using serverless services on Google Cloud Platform.
As a result, we can handle our data without setting up any servers, at a very low cost.

Published in: Technology

  1. 1. Serverless Big Data Architecture on Google Cloud Platform at Credit OK. Presented by Kriangkrai Chaonithi (@spicydog) on 25/11/2018 at Barcamp Bangkhen 9
  2. 2. Hello! My name is Gap. Education: ● BS Applied Computer Science (KMUTT) ● MS Applied Computer Engineering (KMUTT) Work Experience: ● Former Android, iOS & PHP Developer at Longdo.COM ● Former R&D Manager at Insightera ● CTO & co-founder at Credit OK Fields of Interest: ● Software Engineering ● Computer Security ● Servers & Cloud & Distributed Computing ● Machine Learning & NLP https://spicydog.me
  3. 3. Agenda ● Server & application deployment history ● Introduction to Google Cloud Platform products ○ Computing ○ Storage & databases ○ Data analytics ● Big data architecture at Credit OK ○ About Credit OK ○ Why we use serverless ○ Our requirements ○ Our solutions ○ The summary
  4. 4. Server & Application Deployment History
  5. 5. Bare Metal Server ● Pre-cloud era (probably..) ● Install OS and dependencies on a machine ● One machine - one server ● Expose the network to the internet ● Colocation/on-premise ● SSH/FTP/Git to the server
  6. 6. Virtualization ● One machine - many servers ● One machine multiple customers ● VPS / Cloud ● SSH/FTP/Git to the server IaaS
  7. 7. Containers & Microservices ● Docker / Kubernetes ● Auto deployment ● Auto scale (automatically spawns new nodes) ● Pay based on the number of nodes ● Infrastructure as code! (new concept!) PaaS
  8. 8. Why Containers?
  9. 9. Why Container Orchestration? https://blog.risingstack.com/what-is-kubernetes-how-to-get-started/
  10. 10. Serverless ● Write code and deploy! ● Auto deploy ● Auto scale ● Pay per request ● No infrastructure!! SaaS
  11. 11. It’s time to talk about..
  12. 12. Some Famous Features on GCP
  13. 13. GCP Computing Virtual Machine Containers Serverless
  14. 14. Let’s Review Types of Databases SQL NoSQL
  15. 15. CAP Theorem
  16. 16. GCP Storage & Databases Non-serverless Serverless
  17. 17. GCP Data Analytics Pipeline Analytics Visualization
  18. 18. Credit Scoring Platform on Big Data Analytics creditok.co
  19. 19. Why use serverless on big data? ● Scalable & super high performance ● No more server maintenance :) ● Easier to optimize ● Only pay per use
  20. 20. Requirements ● Have a HUGE data warehouse for batch processing ● Our customers have on-premise data at >400 sites ● A data ingestor app must be installed at every site ● Data ingestor app must be able to run on ● Data ingestor app must be super robust and easy to install ● Must run automatically every day via a task scheduler
  21. 21. When >400 sites upload large files to your server at the same time... this is an unintentional DDoS!
  22. 22. So we mainly use Cloud Functions ● Auto scale ● But they only accept request bodies <10 MB, so we also use Compute/App Engine for >10 MB files
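The size-based split described on this slide can be sketched as a small routing function. This is only an illustration, not Credit OK's actual ingestor code: the target labels are hypothetical, the limit is read as 10 MiB, and a file of exactly 10 MB is sent down the large-file path as a conservative assumption, since the slide only covers "<10 MB" and ">10 MB".

```python
# Assumption: the "<10 MB" body limit from the slide, interpreted as 10 MiB.
CLOUD_FUNCTION_LIMIT_BYTES = 10 * 1024 * 1024


def choose_upload_target(size_bytes: int) -> str:
    """Pick an upload path for a file of the given size.

    The returned labels are hypothetical names for illustration only.
    """
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes < CLOUD_FUNCTION_LIMIT_BYTES:
        # Small payloads fit in a Cloud Function request body.
        return "cloud_function"
    # Anything at or above the limit goes to the Compute/App Engine path.
    return "compute_or_app_engine"
```

An ingestor could call `choose_upload_target(os.path.getsize(path))` before each upload and pick the endpoint accordingly.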
  23. 23. Data Flow Architecture (diagram; only the label "Raw Data Source" survives, appearing twice)
  24. 24. Serverless Big Data Architecture In Summary ● Focus on design & coding ● A few people can accomplish a huge task ● No cost for idle servers, pay as you go (GCS storage ~$0.02 per GB) ● Processing cost is surprisingly low when optimized (Beware of BigQuery cost!)
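To put the storage figure in perspective, here is a rough back-of-the-envelope estimate. The slide quotes ~$0.02 per GB without a billing period, so this sketch assumes the price is per GB per month; the function name is made up for illustration, and actual pricing depends on region and storage class.

```python
def monthly_storage_cost_usd(stored_gb: float, price_per_gb: float = 0.02) -> float:
    """Estimate monthly storage cost.

    Assumption: the ~$0.02 figure from the slide is per GB per month.
    """
    if stored_gb < 0:
        raise ValueError("stored_gb must be non-negative")
    return stored_gb * price_per_gb
```

Under that assumption, 1,000 GB (about 1 TB) of stored data comes to roughly $20 per month, with nothing charged for idle compute.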
  25. 25. Beware of ZONE_RESOURCE_POOL_EXHAUSTED ● Serverless doesn’t mean no servers; you just don’t need to spawn servers/workers yourself ● Worker pools have limits, so do not run your app at peak time (but when is that?!) ● Hopefully Google will solve the problem soon :)
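Since peak time is hard to predict, one common workaround (not something the slides say Credit OK used) is to retry the job later with exponential backoff. The sketch below assumes, purely for illustration, that the failure surfaces as a `RuntimeError` whose message contains `ZONE_RESOURCE_POOL_EXHAUSTED`; how the error actually appears depends on the service and client library. `run_with_backoff` is a hypothetical helper.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_backoff(
    task: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run task, retrying with doubling delays while the pool is exhausted."""
    for attempt in range(max_attempts):
        try:
            return task()
        except RuntimeError as err:
            # Assumption: the error message carries the GCP error name.
            exhausted = "ZONE_RESOURCE_POOL_EXHAUSTED" in str(err)
            last_attempt = attempt == max_attempts - 1
            if not exhausted or last_attempt:
                raise
            # Wait 60 s, 120 s, 240 s, ... before trying again.
            sleep(base_delay * (2 ** attempt))
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter is injectable so the retry schedule can be checked in tests without actually waiting.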
  26. 26. We Are Hiring! ● PHP Laravel/Lumen Developer ● Data Engineer ● Credit Risk Analyst hr@creditok.co https://jobs.blognone.com/company/creditok
  27. 27. Questions & Answers
  28. 28. Time is short, let’s utilize the networks. Feel free to connect with me via spicydog.me
