Serverless Big Data Architecture on Google Cloud Platform at Credit OK

Kriangkrai Chaonithi, CTO & Co-founder at spicydog.org
Presented by Kriangkrai Chaonithi @spicydog
On 25/11/2018, At Barcamp Bangkhen 9
Hello! My name is Gap
Education
● BS Applied Computer Science (KMUTT)
● MS Applied Computer Engineering (KMUTT)
Work Experience
● Former Android, iOS & PHP Developer at Longdo.COM
● Former R&D Manager at Insightera
● CTO & co-founder at Credit OK
Fields of Interest
● Software Engineering
● Computer Security
● Servers & Cloud & Distributed Computing
● Machine Learning & NLP
https://spicydog.me
Agenda
● Server & application deployment history
● Introduction to Google Cloud Platform products
○ Computing
○ Storage & databases
○ Data analytics
● Big data architecture at Credit OK
○ About Credit OK
○ Why we use serverless
○ Our requirements
○ Our solutions
○ The summary
Server & Application
Deployment History
Bare Metal Server
● Pre-cloud era (probably..)
● Install OS and dependencies on a machine
● One machine - one server
● Expose the network to the internet
● Colocation/on-premise
● SSH/FTP/Git to the server
Virtualization
● One machine - many servers
● One machine - multiple customers
● VPS / Cloud
● SSH/FTP/Git to the server
IaaS
Containers & Micro Services
● Docker / Kubernetes
● Auto deployment
● Auto scale (automatically spawn new nodes)
● Pay based on the number of nodes
● Infrastructure as code! (new concept!)
PaaS
Why Containers?
Why Container Orchestration?
https://blog.risingstack.com/what-is-kubernetes-how-to-get-started/
Serverless
● Write code and deploy!
● Auto deploy
● Auto scale
● Pay per request
● No infrastructure!!
SaaS
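As an illustration of "write code and deploy", here is a minimal sketch of an HTTP-triggered function in Python. It assumes the platform passes a Flask-style request object (as the Cloud Functions Python runtime does); the function name `receive_upload` and the `site_id` field are hypothetical and not taken from the talk.

```python
import json


def receive_upload(request):
    """HTTP entry point. `request` is assumed to behave like a Flask request."""
    # get_json(silent=True) returns None instead of raising on a bad body.
    payload = request.get_json(silent=True) or {}
    site_id = payload.get("site_id", "unknown")  # hypothetical field name
    body = json.dumps({"status": "ok", "site_id": site_id})
    # Flask-style return value: (body, status code, headers).
    return body, 200, {"Content-Type": "application/json"}


class FakeRequest:
    """Local stand-in so the handler can be tried without deploying."""

    def __init__(self, payload):
        self._payload = payload

    def get_json(self, silent=False):
        return self._payload


if __name__ == "__main__":
    print(receive_upload(FakeRequest({"site_id": "site-001"})))
```

There is no server setup in the code itself: scaling and routing are left to the platform, which is the point of the slide.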
It’s time to talk about..
Some Famous Features on GCP
GCP Computing
Virtual Machine
Containers
Serverless
Let’s Review Types of Databases
SQL vs. NoSQL
CAP Theorem
GCP Storage & Databases
Non-serverless
Serverless
GCP Data Analytics
Pipeline, Analytics, Visualization
Credit Scoring Platform on Big Data Analytics
creditok.co
Why use serverless on big data?
● Scalable & super high performance
● No more server maintenance :)
● Easier to optimize
● Only pay per use
Requirements
● Have a HUGE data warehouse for batch processing
● Our customers have on-premise data at >400 sites
● A data ingestor app needs to be installed at every site
● Data ingestor app must be able to run on
● Data ingestor app must be super robust and easy to install
● Must run automatically every day (task scheduler)
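One common way to make an unattended uploader more robust is to retry failed uploads with exponential backoff and random jitter. The sketch below is an assumption about how such an ingestor could behave, not Credit OK's actual implementation; `send` is a hypothetical callable that performs one upload attempt and raises `OSError` (or a subclass such as `ConnectionError`) on network failure.

```python
import random
import time


def upload_with_retry(send, payload, max_attempts=5, base_delay=1.0,
                      sleep=time.sleep):
    """Call send(payload), retrying on network errors with backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return send(payload)
        except OSError:
            if attempt == max_attempts:
                raise  # give up and let the scheduler log the failure
            # Double the wait on each attempt: 1s, 2s, 4s, ...
            delay = base_delay * 2 ** (attempt - 1)
            # Add random jitter so many sites do not retry in lockstep.
            sleep(delay + random.uniform(0, delay))
```

The `sleep` parameter is injectable only so the function can be exercised without real waiting.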
When >400 sites upload large files
to your server at the same time..
This is unintentional DDoS!
So we mainly use Cloud Functions
● Auto scale
● But only accepts request bodies <10 MB
and also use Compute Engine/App Engine for >10 MB files
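The size split above can be sketched as a small routing decision on the ingestor side. This is only an illustration: both URLs are hypothetical placeholders, and treating 10 MB as 10 * 1024 * 1024 bytes is an assumption (request encoding overhead may call for a lower threshold in practice).

```python
import os

# Slide: Cloud Functions only accept bodies under 10 MB.
CLOUD_FUNCTION_LIMIT_BYTES = 10 * 1024 * 1024  # assumption: MB read as MiB

# Hypothetical endpoints, not real services.
FUNCTION_URL = "https://example.invalid/upload-small"
LARGE_UPLOAD_URL = "https://example.invalid/upload-large"


def choose_endpoint(size_bytes: int) -> str:
    """Send small files to the function, larger ones to Compute/App Engine."""
    if size_bytes < CLOUD_FUNCTION_LIMIT_BYTES:
        return FUNCTION_URL
    return LARGE_UPLOAD_URL


def endpoint_for_file(path: str) -> str:
    """Pick the endpoint for a file on disk based on its size."""
    return choose_endpoint(os.path.getsize(path))
```

Keeping the decision in one function means the threshold can be tuned without touching the upload code.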
Data Flow Architecture
(Diagram not reproduced; surviving label: Raw Data Source)
Serverless Big Data Architecture In Summary
● Focus on design & coding
● A few people can achieve a huge task
● No cost for idle servers, pay as you go
(GCS storage ~$0.02 per GB per month)
● Processing cost is surprisingly low when optimized
(Beware of BigQuery cost!)
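A rough back-of-the-envelope estimate shows why the BigQuery warning matters. The storage rate is the ~$0.02 per GB figure from the summary, read here as a monthly rate; the query price of $5 per TB scanned is an assumption for illustration only, so check current pricing before relying on it.

```python
GCS_USD_PER_GB_MONTH = 0.02   # slide figure, read as a monthly rate (assumption)
BQ_USD_PER_TB_SCANNED = 5.0   # assumed on-demand query price, for illustration


def storage_cost_per_month(stored_gb: float) -> float:
    """Monthly cost of keeping `stored_gb` gigabytes in storage."""
    return stored_gb * GCS_USD_PER_GB_MONTH


def query_cost_per_month(tb_scanned_per_run: float, runs_per_month: int) -> float:
    """Monthly cost of a query that scans the same amount on every run."""
    return tb_scanned_per_run * runs_per_month * BQ_USD_PER_TB_SCANNED


if __name__ == "__main__":
    print(storage_cost_per_month(5000))    # 5,000 GB stored
    print(query_cost_per_month(2.0, 30))   # 2 TB scanned once a day
```

Under these assumed rates, storing 5,000 GB costs about $100 a month, while one query that scans 2 TB every day costs about $300 a month, so reducing scanned bytes tends to matter more than reducing stored bytes.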
Beware of ZONE_RESOURCE_POOL_EXHAUSTED
● Serverless doesn’t mean no servers; you just do not need to spawn the servers/workers yourself
● Worker pools have limits, so do not run your app at peak time (but when is that?!)
● Hopefully Google will solve the problem soon :)
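One possible mitigation, offered only as a sketch and not as what the talk describes, is to try the job in several zones in turn. Here `launch` is a hypothetical callable that starts the job in a given zone and raises the illustrative `ZoneExhausted` exception when the platform reports ZONE_RESOURCE_POOL_EXHAUSTED; neither name comes from a real client library.

```python
class ZoneExhausted(Exception):
    """Illustrative error for a zone that has no capacity left."""


def launch_with_zone_fallback(launch, zones):
    """Try each zone in order and return the first successful result."""
    last_error = None
    for zone in zones:
        try:
            return launch(zone)
        except ZoneExhausted as err:
            last_error = err  # remember why this zone failed, try the next
    raise RuntimeError("all candidate zones are exhausted") from last_error
```

For example, `launch_with_zone_fallback(start_job, ["asia-southeast1-a", "asia-southeast1-b"])` would fall back to the second zone only if the first one is out of capacity, where `start_job` is whatever function actually submits the work.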
We Are Hiring!
● PHP Laravel/Lumen Developer
● Data Engineer
● Credit Risk Analyst
hr@creditok.co
https://jobs.blognone.com/company/creditok
Questions & Answers
Time is short, let’s utilize the networks.
Feel free to connect with me via spicydog.me