2. Agenda
● Use Case and Architecture
● The rough road we traveled to get better
○ Enhancements we tried, and the errors we hit
■ Bulk insert/update
■ Kinesis stream tuning
■ Lambda tuning
■ PostgreSQL partitioning
7. AWS Kinesis Shard
● Streams are made of shards
● Each shard ingests up to 1 MB/sec or 1,000 records/sec
● Each shard emits up to 2 MB/sec
● All data is stored for 24 hours by default (can be extended to 7 days)
https://www.slideshare.net/AmazonWebServices/deep-dive-and-best-practices-for-realtime-streaming-applications
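A quick way to sanity-check a stream's shard count against the per-shard limits above (a sketch; the workload numbers below are made up for illustration):

```python
import math

# Per-shard Kinesis limits from the slide above.
INGRESS_MB_PER_SEC = 1.0        # write throughput per shard
INGRESS_RECORDS_PER_SEC = 1000  # write records per shard
EGRESS_MB_PER_SEC = 2.0         # read throughput per shard

def shards_needed(write_mb_s, write_records_s, read_mb_s):
    """Smallest shard count that satisfies all three per-shard limits."""
    return max(
        math.ceil(write_mb_s / INGRESS_MB_PER_SEC),
        math.ceil(write_records_s / INGRESS_RECORDS_PER_SEC),
        math.ceil(read_mb_s / EGRESS_MB_PER_SEC),
    )

# Hypothetical workload: 3.5 MB/s in, 2,500 records/s, 6 MB/s total reads.
print(shards_needed(3.5, 2500, 6.0))  # -> 4 (ingress bandwidth dominates)
```

Whichever of the three limits dominates decides the shard count, which is why the slides stress evaluating both ingress and egress.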
12. Batch Upsert
● At first we ran an upsert for every single event
● We combined several methods to achieve better performance and reduce the number of queries
○ Batch-insert into a PostgreSQL temp table (in-memory, per session) to stage all the data
■ Almost the same structure as the destination table, minus the unique constraint
○ Use a CTE to bulk-update, then delete the updated rows from the temp table
■ Use a WITH clause combined with a RETURNING clause
○ Insert the remaining data into the destination table from the temp table
■ We hit conflict errors – a misunderstanding of Lambda concurrency
■ Catch the duplicate-key error code and retry as an UPDATE
● Use SAVEPOINT to roll back to a specific point (like a snapshot)
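The staging-then-merge flow plus the SAVEPOINT retry can be sketched end to end. This is a simplified, self-contained demo using SQLite; the slides' actual system is PostgreSQL and uses a single CTE (`WITH ... RETURNING`) where this sketch runs a separate UPDATE and DELETE, and all table names here are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dest (id INTEGER PRIMARY KEY, val TEXT)")
conn.execute("INSERT INTO dest VALUES (1, 'old')")

# 1) Stage the whole batch in a temp table (no unique constraint).
conn.execute("CREATE TEMP TABLE staging (id INTEGER, val TEXT)")
conn.executemany("INSERT INTO staging VALUES (?, ?)",
                 [(1, 'new'), (2, 'fresh')])

# 2) Bulk-update the rows that already exist in the destination.
conn.execute("""UPDATE dest
                SET val = (SELECT val FROM staging WHERE staging.id = dest.id)
                WHERE id IN (SELECT id FROM staging)""")

# 3) Drop the updated rows from the staging table
#    (PostgreSQL does 2+3 in one CTE with RETURNING).
conn.execute("DELETE FROM staging WHERE id IN (SELECT id FROM dest)")

# 4) Insert the remaining, genuinely new rows.
conn.execute("INSERT INTO dest SELECT id, val FROM staging")

# 5) SAVEPOINT retry: a concurrent writer may still cause a duplicate-key
#    error; roll back to the savepoint and retry the row as an UPDATE.
conn.execute("SAVEPOINT batch")
try:
    conn.execute("INSERT INTO dest VALUES (2, 'retry')")  # conflicts on id 2
except sqlite3.IntegrityError:
    conn.execute("ROLLBACK TO SAVEPOINT batch")
    conn.execute("UPDATE dest SET val = 'retry' WHERE id = 2")
conn.execute("RELEASE SAVEPOINT batch")

print(dict(conn.execute("SELECT id, val FROM dest")))  # -> {1: 'new', 2: 'retry'}
```

The SAVEPOINT matters because rolling back only to the savepoint preserves the earlier bulk work in the same transaction instead of discarding the whole batch.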
13. Partition Table
● Over 1M rows in 6 months
● Use a trigger function to
○ Control partition table management
■ Partition by date
○ Redirect inserts
■ Redirect each insert to the partition table instead of the master table
● The INSERT inside the trigger function badly breaks the original behavior
○ The original INSERT no longer behaves the same
○ Function return value
■ Return NULL -> breaks the RETURNING clause behavior
■ Return NEW -> results in duplicate records in both the master and partition tables
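The redirect-or-duplicate trade-off above can be demonstrated in miniature. The slides describe PostgreSQL trigger functions, where the trigger's return value decides the fate of the original INSERT; in this self-contained SQLite sketch (table names hypothetical), `RAISE(IGNORE)` plays the role of `RETURN NULL` by abandoning the original statement while keeping the trigger's own insert:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (id INTEGER, day TEXT);          -- master table
CREATE TABLE events_2017_06 (id INTEGER, day TEXT);  -- date partition

CREATE TRIGGER redirect BEFORE INSERT ON events
BEGIN
    -- Redirect the row into the date partition...
    INSERT INTO events_2017_06 VALUES (NEW.id, NEW.day);
    -- ...then suppress the original insert into the master table.
    -- Without this line the row would land in BOTH tables -- the
    -- "Return NEW" pitfall from the slide.
    SELECT RAISE(IGNORE);
END;
""")

conn.execute("INSERT INTO events VALUES (1, '2017-06-01')")
print(conn.execute("SELECT COUNT(*) FROM events").fetchone()[0])          # master: 0
print(conn.execute("SELECT COUNT(*) FROM events_2017_06").fetchone()[0])  # partition: 1
```

Note the same caveat as `RETURN NULL` in PostgreSQL: suppressing the original insert also breaks RETURNING/rowcount semantics for callers, which is exactly the breakage the slide warns about.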
15. AWS Kinesis Lesson Learned
● Evaluate your shard needs
○ Ingestion style: FIFO, FILO, etc.
○ For ingress: how much data will come into the stream every day?
○ For egress: how fast is your consumer? How many consumers will consume the same stream simultaneously?
● Pre-batch your events before puts
○ In producer
○ In consumer
○ Fluentd as collector https://github.com/awslabs/aws-fluent-plugin-kinesis
● Make sure your backend will not be overwhelmed by the high throughput
○ e.g. persist your aggregated events in a database
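The "pre-batch your events before puts" advice above can be sketched as a simple grouping step. This is a hedged sketch: it keeps each group within the Kinesis `PutRecords` request limits (at most 500 records and 5 MB per call), and it counts only payload bytes, whereas real code must also account for partition keys:

```python
MAX_RECORDS = 500               # PutRecords limit: records per request
MAX_BYTES = 5 * 1024 * 1024     # PutRecords limit: bytes per request

def batch_events(events):
    """Yield lists of byte payloads, each list within the PutRecords limits."""
    batch, size = [], 0
    for event in events:
        n = len(event)
        # Flush the current batch before it would exceed either limit.
        if batch and (len(batch) >= MAX_RECORDS or size + n > MAX_BYTES):
            yield batch
            batch, size = [], 0
        batch.append(event)
        size += n
    if batch:
        yield batch

# 1,200 small events collapse into 3 API calls instead of 1,200.
events = [b"x" * 100 for _ in range(1200)]
batches = list(batch_events(events))
print([len(b) for b in batches])  # -> [500, 500, 200]
```

Batching like this can be done in the producer, in the consumer, or delegated to a collector such as Fluentd, as the slide lists.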
17. AWS Lambda Lesson Learned
● Evaluate Lambda resources
○ The memory setting determines the overall performance level, including CPU allocation
● Scaling behavior - concurrency
○ Mistakes & misunderstandings
■ context.callbackWaitsForEmptyEventLoop = false
■ Different granularity between the data being inserted and the data already in the database
■ Use SAVEPOINT to retry the batch update
○ Event-based/stream-based event sources (doc - Scaling Behavior)
■ Stream-based: Kinesis/DynamoDB
● Concurrency equals the number of shards
■ Non-stream-based: S3 events, API Gateway
● The number of events (or requests) these event sources publish determines the concurrency