Ingesting Data Streams With
a Serverless Infrastructure
Gil Colunga
gil.colunga@nordstrom.com
https://www.linkedin.com/in/gilcolunga
1 of 356
Solution Overview
Nordstrom.com micro-services
Ingestion Pipeline
Web UI
Analytics
Messages
Simple messages
• Exceptions
Compound messages
• Start / Stop
• Send / Receive
ABC Standard Schema
• Timestamp
• Message type
• Application ID
• Trace context ID
• Details
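As an illustration of the schema fields listed above, here is a minimal Python sketch of one message. The class name `AbcMessage`, the field spellings, and the literal message-type labels are assumptions; the slides name the fields and the message categories but not their concrete representation.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict

# Hypothetical type labels. The slides list the categories (exceptions,
# start/stop, send/receive) but not the literal values used on the wire.
SIMPLE_TYPES = {"exception"}
COMPOUND_TYPES = {"start", "stop", "send", "receive"}


@dataclass
class AbcMessage:
    """One record in the ABC Standard Schema (field names are assumed)."""
    timestamp: datetime
    message_type: str
    application_id: str
    trace_context_id: str
    details: Dict[str, Any] = field(default_factory=dict)

    @property
    def is_compound(self) -> bool:
        # Compound messages come in pairs that must be matched downstream.
        return self.message_type in COMPOUND_TYPES


# Illustrative values only.
start = AbcMessage(
    timestamp=datetime(2016, 1, 1, tzinfo=timezone.utc),
    message_type="start",
    application_id="checkout-service",
    trace_context_id="trace-123",
)
error = AbcMessage(
    timestamp=datetime(2016, 1, 1, tzinfo=timezone.utc),
    message_type="exception",
    application_id="checkout-service",
    trace_context_id="trace-456",
    details={"reason": "timeout"},
)
```

The trace context ID is the field that lets the two halves of a compound message be matched later in the pipeline.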
Windows Service + Redshift + S3
swing and a miss: strike one
[Diagram: Nordstrom.com micro-services → CloudWatch → incoming stream → Windows Service → Redshift. Records are drawn as {log message} and, while carrying the CloudWatch wrapper, as {content:{log message}}.]
Our problem in a nutshell…
Fast & Slow
Stream Processor
Stream Processor
Data Modeling / Transformation
Short term storage
Long term storage
Data Stream
“real time” needs (alerting, monitoring, etc.)
Analytics, reporting, etc.
Windows Service + Elastic Search
swing…foul ball: strike two
[Diagram: Nordstrom.com micro-services → CloudWatch → incoming stream → Windows Service. Records are drawn as {log message}, as {content:{log message}} while carrying the CloudWatch wrapper, and as {paired message} once start / stop messages are matched.]
Kinesis + Lambda + Elastic Search + Win. Svc + S3
swing…and the crowd goes wild!!
Nordstrom.com micro-services
CloudWatch
CW Stream
Unzip & Discard CW
Forward to Firehose
ABC Schema Stream
Segregate & Pair
Unpaired messages
Processed Stream
ES Loader Service
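The "Unzip & Discard CW" step can be sketched as a Lambda handler in Python. This is an illustration, not the presenter's code. It relies on two documented behaviors: Lambda delivers Kinesis record data base64-encoded, and a CloudWatch Logs subscription writes gzip-compressed JSON with a `logEvents` list. The function names are hypothetical, and forwarding to the next stream is left out.

```python
import base64
import gzip
import json
from typing import List


def unwrap_cloudwatch(kinesis_data_b64: str) -> List[str]:
    """Return the raw log messages carried by one Kinesis record."""
    compressed = base64.b64decode(kinesis_data_b64)
    envelope = json.loads(gzip.decompress(compressed))
    # CONTROL_MESSAGE records are CloudWatch health checks with no app logs.
    if envelope.get("messageType") != "DATA_MESSAGE":
        return []
    # Keep only the inner message; the CloudWatch wrapper is discarded.
    return [event["message"] for event in envelope.get("logEvents", [])]


def handler(event, context):
    """Lambda entry point: unwrap every record in the incoming batch."""
    messages = []
    for record in event["Records"]:
        messages.extend(unwrap_cloudwatch(record["kinesis"]["data"]))
    # A real pipeline would forward these onward here (omitted in this sketch).
    return messages
```

Keeping the handler to this single task matches the "one task per Lambda" approach described in the speaker notes.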
Segregate & Pair in Detail
ABC Schema Stream
shard 3
S&P (×8)
Processed Stream
batch > 500 (×2)
Max # messages: 500
Set max batch size: 900
Unpaired messages
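A minimal Python sketch of the Segregate & Pair idea follows. It is written under assumptions: messages are dicts with `message_type` and `trace_context_id` keys, the type labels are hypothetical, and an opening message arrives before its closing message within a batch. The `chunked` helper assumes the 500-message cap on the slide corresponds to the Kinesis `PutRecords` limit of 500 records per request, which the slides do not state.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical labels: each opening type maps to the type that closes it.
CLOSER_FOR = {"start": "stop", "send": "receive"}
CLOSERS = set(CLOSER_FOR.values())


def segregate_and_pair(messages: List[dict]) -> Tuple[List[dict], List[dict], List[dict]]:
    """Split a batch into simple, paired, and unpaired messages."""
    simple: List[dict] = []
    paired: List[dict] = []
    unpaired: List[dict] = []
    pending: Dict[Tuple[str, str], dict] = {}
    for msg in messages:
        kind = msg["message_type"]
        trace = msg["trace_context_id"]
        if kind in CLOSER_FOR:
            # Remember the opener until its closer shows up.
            pending[(trace, CLOSER_FOR[kind])] = msg
        elif kind in CLOSERS:
            opener = pending.pop((trace, kind), None)
            if opener is None:
                unpaired.append(msg)  # closer with no opener in this batch
            else:
                paired.append({"open": opener, "close": msg})
        else:
            simple.append(msg)  # e.g. exceptions pass straight through
    unpaired.extend(pending.values())  # openers that never closed
    return simple, paired, unpaired


def chunked(items: List[dict], size: int = 500) -> Iterator[List[dict]]:
    """Yield slices of at most `size` items for downstream writes."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Anything left unpaired at the end of a batch would go to the "Unpaired messages" output shown on the slide rather than being dropped.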
Measure Everything…
and when you’re done, go back and measure what you forgot to measure the first time.
Lessons Learned & Misc Facts
• On an average day: 120,000 messages per minute
• Cost of Lambdas: ~$5.00/day
• Cost of Kinesis (140 shards): ~$50.00/day
• There is a soft limit on the number of Lambdas that can execute concurrently. Be mindful of their execution time.
• Kinesis is very opaque
• We append shard information to the messages on the fly to get a better understanding of how the streams are working.
• Instrument/measure everything!
• So many moving parts can make it harder to troubleshoot.
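One way to append shard information on the fly, sketched in Python. In a Lambda Kinesis event, each record's `eventID` has the form `shardId-…:sequenceNumber`, so the shard can be read from it. The function name, the added field names `_shard_id` and `_sequence_number`, and the assumption that each payload is a JSON object are illustrative and not taken from the talk.

```python
import base64
import json


def annotate_with_shard(record: dict) -> dict:
    """Decode one Kinesis record and tag it with its shard of origin."""
    # eventID looks like "shardId-000000000003:<sequence number>".
    shard_id, _, sequence_number = record["eventID"].partition(":")
    message = json.loads(base64.b64decode(record["kinesis"]["data"]))
    # Hypothetical field names, prefixed to avoid clashing with schema fields.
    message["_shard_id"] = shard_id
    message["_sequence_number"] = sequence_number
    return message
```

With the shard ID stored on every message, per-shard volume and lag can be charted downstream, which helps with the opacity noted above.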

Editor's Notes

  1. Redshift as the sole repository for “real time” and Analytics.
     Windows service (C#) running on EC2:
     • Unzip CloudWatch messages
     • Discard CloudWatch Wrapper
     • Match start / stop messages
     • Upload to S3
     • Load from S3 to Redshift
     Problems:
     • Trying to join compound messages on the fly by the Web UI was too slow
     • Could not process and load data fast enough; database kept falling behind
     • Did not have backfill capabilities
  2. Our problem was that we could not complete all the processing without falling behind. The system would stay up to date for a few hours, then begin lagging and never recover.
  3. Problem: we were trying to use one repository for all purposes. We decided to implement a common approach for ingesting: Fast and Slow (EXPLAIN). We’re just going to focus on and drill into the blue boxes for time’s sake.
  4. Windows services (C#) running on EC2.
     Fast service:
     • Unzip CloudWatch messages
     • Discard CloudWatch Wrapper
     • Match start / stop messages. THIS WAS DONE IN MEMORY
     • Send messages to Elastic Search
     Elastic Search as the repository for the fast path (web UI). Elastic Search query response time was fast enough for web UI. Yay!
     Problems:
     • We were having incidents too often
     • Fast service ran very hot (high memory and CPU utilization)
     • Fast service was too sensitive to sudden increases in data flow (didn’t auto-scale)
     • Adding auto-scaling capabilities would have kept adding to code complexity, maintenance, etc.
     • Still did not have backfill capabilities
  5. EXPLAIN EACH STEP AT A TIME
     Use Lambdas to perform only one task at a time:
     • Unzip and discard CloudWatch wrapper
     • Segregate and Pair
     • Send to S3
     Use Kinesis Streams as the holding place between processing steps.
     Use S3 as the data lake for the entire pipeline.
     Windows service as the Elastic Search Loader.
     Elastic Search as the repository for the fast path (web UI).
     Redshift as the repository for the slow path (data analytics/reporting).
     Elastic Search query response time was fast enough for web UI. Yay!
     Using S3 as the data lake provided us with the means to backfill data if needed.
     Problems:
     • Lots of moving parts, harder to see exactly what’s going on.
  6. Explain limitation of one lambda per shard and how a big batch would cause it to fall behind
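
The falling-behind effect in this note can be illustrated with a back-of-the-envelope Python sketch. It assumes a single consumer per shard, so any shortfall between arrival rate and processing rate accumulates as backlog. The function name and the processing rate used below are hypothetical; only the 120,000 messages per minute and 140 shards come from the slides.

```python
def backlog_after(seconds: float, arrival_per_sec: float, processed_per_sec: float) -> float:
    """Messages queued on one shard after `seconds` at steady rates."""
    # With one consumer per shard, nothing else can absorb the shortfall.
    return max(0.0, (arrival_per_sec - processed_per_sec) * seconds)


# From the slides: 120,000 messages per minute spread over 140 shards.
per_shard_arrival = 120_000 / 60 / 140  # about 14.3 messages per second

# Hypothetical: a slow batch drops one shard's consumer to 10 messages/second.
one_hour_backlog = backlog_after(3600, per_shard_arrival, 10.0)
```

As long as processing stays below the arrival rate the backlog only grows, which is why a consistently oversized batch on one shard can leave it lagging without recovering.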