Create a serverless architecture for data collection with Python and AWS
9 Apr 2017
David Santucci
david.santucci@cloudacademy.com
About me
David Santucci
Data scientist @ CloudAcademy.com
@davidsantucci
linkedin.com/in/davidsantucci/
Agenda
• Introduction
• Architecture
• Amazon Kinesis Stream
• Amazon Lambda
• Dead Letter Queue (DLQ)
• Conclusions
• Q&A
Introduction
Challenges:
• Collect events from different sources
  • Backend applications
  • Frontend applications
  • Mobile apps
• Store events to different destinations
  • Data Warehouse
  • Third-party services
    • e.g., HubSpot, Mixpanel, GTM, …
• Avoid data loss
A serverless architecture
AWS services:
• Kinesis Stream
• Lambda Functions
• SQS
• S3
• Amazon API Gateway
Manage events from multiple sources
Amazon Kinesis Stream
What is Amazon Kinesis Stream?
• Collect and process large streams of data records in real time.
• Typical scenarios for using Streams:
  • Manage multiple producers that push their data feed directly into a stream;
  • Collect real-time analytics and metrics;
  • Process application logs;
  • Create a pipeline with other AWS services (the consumers).
from time import gmtime, strftime

import boto3

# Kinesis client for the region that hosts the stream
client = boto3.client(
    service_name="kinesis",
    region_name="us-east-1",
)

# Send 300 test events, one put_record call per event
for i in range(300):
    print("sending event {}".format(i + 1))
    response = client.put_record(
        StreamName="data-collection-stream",
        Data='{"name":"event-%d","data":{"payload":%d}}' % (i, i),
        # Timestamp-based key: events sent within the same second share a key
        PartitionKey=strftime("PK-%Y%m%d-%H%M%S", gmtime()),
    )
    print("response for event {}: {}".format(i + 1, response))

Amazon Kinesis Stream - Tips
• Use API Gateway as entry point for front-end and mobile.
• Start with a single shard and increase only when needed.
• Output events one by one to avoid data loss.
• Generate the PartitionKey using uuid (e.g., for testing purposes).
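
The last tip can be sketched as follows. This is a minimal example, not code from the slides: `build_record` is a hypothetical helper that prepares the keyword arguments for `put_record`. A random UUID per record gives every record a distinct partition key, which is convenient for tests but gives up any per-key ordering.

```python
import json
import uuid


def build_record(event, stream_name="data-collection-stream"):
    """Build the keyword arguments for a Kinesis put_record call.

    build_record is a hypothetical helper, not part of the original slides.
    """
    return {
        "StreamName": stream_name,
        "Data": json.dumps(event),
        # A random UUID per record, so keys differ even within the same second
        "PartitionKey": str(uuid.uuid4()),
    }


record = build_record({"name": "event-1", "data": {"payload": 1}})
print(record["PartitionKey"])
```

The result can be passed to a boto3 Kinesis client as `client.put_record(**record)`.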
Amazon Lambda
What is AWS Lambda?
• It processes single events in real time without managing servers.
• Highly scalable.
• Fallback strategy in case of errors.
Amazon Lambda - Events routing
It works as a router and is directly triggered by Kinesis Streams.
[
    {
        "destination_name": "mixpanel",
        "destination_arn": "arn:aws:lambda:region:account-id:function:function-name:prod",
        "enabled_events": [
            "page_view",
            "search",
            "button_click",
            "page_scroll"
        ]
    },
    {
        "destination_name": "hubspotcrm",
        "destination_arn": "arn:aws:lambda:region:account-id:function:function-name:prod",
        "enabled_events": [
            "login",
            "logout",
            "registration",
            "page_view",
            "search",
            "email_sent",
            "email_open"
        ]
    },
    {
        "destination_name": "datawarehouse",
        "destination_arn": "arn:aws:lambda:region:account-id:function:function-name:prod",
        "enabled_events": [
            "login",
            "logout",
            "registration",
            "page_view",
            "search",
            "button_click",
            "page_scroll",
            "email_sent",
            "email_open"
        ]
    }
]
…
{
    "destination_name": "datawarehouse",
    "destination_arn": "arn:aws:lambda:region:id:function:name:prod",
    "enabled_events": [
        "login",
        "logout",
        "registration",
        "page_view",
        "search",
        "button_click",
        "page_scroll",
        "email_sent",
        "email_open"
    ]
}
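
The routing step can be sketched as follows. This is a minimal sketch under stated assumptions, not the author's code: it assumes each Kinesis record carries a JSON event whose `name` matches an entry of `enabled_events` exactly, that the rules follow the format shown above, and that connectors are invoked asynchronously through boto3's `invoke` with `InvocationType="Event"`. The names `decode_record`, `select_destinations`, and `route` are hypothetical.

```python
import base64
import json


def decode_record(record):
    """Decode one record of a Kinesis-triggered Lambda event (base64 JSON)."""
    return json.loads(base64.b64decode(record["kinesis"]["data"]))


def select_destinations(rules, event_name):
    """Return the ARNs of the destinations that enable this event name."""
    return [
        rule["destination_arn"]
        for rule in rules
        if event_name in rule["enabled_events"]
    ]


def route(kinesis_event, rules, lambda_client):
    """Forward every event to each matching connector function."""
    invoked = 0
    for record in kinesis_event["Records"]:
        event = decode_record(record)
        for arn in select_destinations(rules, event["name"]):
            # InvocationType="Event" makes the call asynchronous
            lambda_client.invoke(
                FunctionName=arn,
                InvocationType="Event",
                Payload=json.dumps(event).encode("utf-8"),
            )
            invoked += 1
    return invoked
```

Inside a real handler, `lambda_client` would typically be `boto3.client("lambda")`; passing it in as an argument keeps the routing logic testable without AWS access.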
Amazon Lambda - Events routing
Each connector function provides the logic to connect to its destination service (e.g., HubSpot, Mixpanel, etc.).
Custom retry strategy (with exponential delay).
Amazon Lambda - Retry strategy
import os
from urllib.parse import urlencode
from urllib.request import urlopen

# retry and DoNotRetryException are defined on the next slide


def lambda_handler(event, context=None):
    try:
        hub_id = os.environ['HUBSPOT_HUB_ID']
    except KeyError:
        # A missing configuration value cannot be fixed by retrying
        raise DoNotRetryException('HUBSPOT_HUB_ID')
    event = format_event_data(event, hub_id)
    process_event(event['data'])
    return "ok"


def format_event_data(event, hub_id):
    # Last dotted segment of the name, e.g. "page_view" becomes "Page View"
    event_id = event["name"].split(".")[-1].replace("_", " ").title()
    event['data'].update({
        '_a': hub_id,
        '_n': event_id,
        'email': event['data']['_email'],
    })
    return event


@retry
def process_event(params):
    url = 'http://track.hubspot.com/v1/event?{}'.format(urlencode(params))
    urlopen(url)
Amazon Lambda - Retry strategy
import functools
import time


class DoNotRetryException(Exception):
    """Raised for errors that retrying cannot fix."""


def retry(func, max_retries=3, backoff_rate=2, scale_factor=.1):
    @functools.wraps(func)
    def func_wrapper(*args, **kwargs):
        attempts = 0
        while True:
            attempts += 1
            try:
                return func(*args, **kwargs)
            except DoNotRetryException:
                raise
            except Exception:
                # Re-raise the last error once the attempts are used up
                if attempts >= max_retries:
                    raise
                # Exponential delay between attempts
                time.sleep(backoff_rate ** attempts * scale_factor)
    return func_wrapper
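
To make the exponential delay concrete, the waits between attempts can be computed directly. This is a small sketch, assuming the formula `backoff_rate ** attempts * scale_factor` from the decorator above and that no wait follows the final attempt; `backoff_delays` is a hypothetical helper, not part of the original slides.

```python
def backoff_delays(max_retries=3, backoff_rate=2, scale_factor=0.1):
    """Seconds to sleep after each failed attempt except the last one."""
    return [
        backoff_rate ** attempt * scale_factor
        for attempt in range(1, max_retries)
    ]


for attempt, delay in enumerate(backoff_delays(), start=1):
    print("after attempt {}: wait {:.1f}s".format(attempt, delay))
```

With the default parameters a failing call waits roughly 0.2 and then 0.4 seconds before giving up, so a larger `scale_factor` may be worth considering for destinations that need more time to recover.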
Amazon Lambda - Our tips
• Enable Kinesis Stream as a trigger for other AWS services.
• To preserve the order of events, configure the trigger with Batch size: 1 and Starting position: Trim Horizon.
• An S3 file can be used to define the routing rules.
• Invoke the Lambda Functions that work as connectors asynchronously.
• Always create aliases and versions for each Function.
• Use environment variables for configuration.
• Create a custom IAM role for each Function.
• Detect delays in stream processing by monitoring the IteratorAge metric in the Lambda console's monitoring tab.
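
Two of these tips, routing rules stored in S3 and configuration through environment variables, can be combined as in the sketch below. It is an illustration under assumptions: the variable names `ROUTING_RULES_BUCKET` and `ROUTING_RULES_KEY` and the helper `load_routing_rules` are hypothetical, and the rules file is assumed to be a JSON document.

```python
import json
import os


def load_routing_rules(s3_client, bucket=None, key=None):
    """Read the JSON routing rules from an S3 object.

    Bucket and key fall back to environment variables whose names are
    hypothetical and chosen only for this example.
    """
    bucket = bucket or os.environ["ROUTING_RULES_BUCKET"]
    key = key or os.environ["ROUTING_RULES_KEY"]
    response = s3_client.get_object(Bucket=bucket, Key=key)
    return json.loads(response["Body"].read())
```

In a Lambda function, `s3_client` would typically be `boto3.client("s3")`, created once outside the handler so that warm invocations can reuse it.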
Dead Letter Queues (DLQ) - Avoid event loss
DLQ - Simple Queue Service (SQS)
What is AWS SQS?
• Lambda automatically retries failed executions for asynchronous invocations.
• Configure Lambda (advanced settings) to forward payloads that were not processed to a dead-letter queue (an SQS queue or an SNS topic).
• We used an SQS queue.
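
The same setting can also be applied from code instead of the console. The sketch below uses boto3's `update_function_configuration` with its `DeadLetterConfig` parameter; the helper name `set_dead_letter_queue` and the function and queue names in the usage note are hypothetical.

```python
def set_dead_letter_queue(lambda_client, function_name, target_arn):
    """Point a function's dead-letter queue at an SQS queue or SNS topic ARN."""
    return lambda_client.update_function_configuration(
        FunctionName=function_name,
        DeadLetterConfig={"TargetArn": target_arn},
    )
```

A call might look like `set_dead_letter_queue(boto3.client("lambda"), "hubspot-connector", queue_arn)`, where both arguments are placeholders. The function's IAM role also needs permission to send messages to the target queue.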
import json

import boto3


def get_events_from_sqs(
        sqs_queue_name,
        region_name='us-west-2',
        purge_messages=False,
        backup_filename='backup.jsonl',
        visibility_timeout=60):
    """
    Create a JSON Lines backup file of all events in the SQS queue with the
    given 'sqs_queue_name'.

    :sqs_queue_name: the name of the AWS SQS queue to be read via boto3
    :region_name: the region name of the AWS SQS queue to be read via boto3
    :purge_messages: True if messages must be deleted after reading, False otherwise
    :backup_filename: the name of the file where to store all SQS messages
    :visibility_timeout: period of time in seconds (unique consumer window)
    :return: the number of processed batches of events
    """
    forwarded = 0
    counter = 0
    sqs = boto3.resource('sqs', region_name=region_name)
    dlq = sqs.get_queue_by_name(QueueName=sqs_queue_name)
    with open(backup_filename, 'a') as filep:
        while True:
            batch_messages = dlq.receive_messages(
                MessageAttributeNames=['All'],
                MaxNumberOfMessages=10,
                WaitTimeSeconds=20,
                VisibilityTimeout=visibility_timeout,
            )
            if not batch_messages:
                # Stop when a long poll returns no messages
                break
            for msg in batch_messages:
                try:
                    line = "{}\n".format(json.dumps({
                        'attributes': msg.message_attributes,
                        'body': msg.body,
                    }))
                    print("Line: ", line)
                    filep.write(line)
                    if purge_messages:
                        print('Deleting message from the queue.')
                        msg.delete()
                    forwarded += 1
                except Exception as ex:
                    print("Error in processing message %s: %r" % (msg, ex))
            counter += 1
            print('Batch %d processed' % counter)
    return counter
DLQ - Our tips
• Set a DLQ on each Lambda Function that can fail.
• Re-process events sent to the DLQ with a custom script.
• Tune the DLQ config directly from the Lambda Function panel.
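
A re-processing script could look like the sketch below. It is an assumption-laden illustration rather than the author's script: it assumes a JSON Lines backup in which each line has a `body` field holding the original event payload, and it re-sends each payload asynchronously to a Lambda function. The helper name `replay_backup` is hypothetical.

```python
import json


def replay_backup(backup_filename, lambda_client, function_name):
    """Re-send every event stored in a JSON Lines backup to a Lambda function."""
    replayed = 0
    with open(backup_filename) as filep:
        for line in filep:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            message = json.loads(line)
            lambda_client.invoke(
                FunctionName=function_name,
                InvocationType="Event",
                # Assumes the message body holds the original payload
                Payload=message["body"].encode("utf-8"),
            )
            replayed += 1
    return replayed
```

Because the invocation is asynchronous, an event that fails again goes back to the function's DLQ, so it is worth fixing the root cause before replaying.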
Conclusions
Why a serverless architecture?
• Scalability, prevention of data loss, full control over each step, costs.
Open points:
• Integrate a custom CloudWatch dashboard.
• Configure Firehose for a backup.
• Write a script that manages events sent to DLQs.
• Create a listener for anomaly detection with Kinesis Analytics.
• Amazon Step Functions.
WE’RE HIRING!
clad.co/fullstack-dev
Useful links
These slides:
Create a serverless architecture for data collection with Python and AWS
—> http://clda.co/pycon8-serverless-data-collection
Blog post with code snippets:
Building a serverless architecture for data collection with AWS Lambda
—> http://clda.co/pycon8-data-collection-blogpost
Serverless Learning Path:
Getting Started with Serverless Computing
—> http://clda.co/pycon8-serverless-LP
Thank you :)
cloudacademy.com

Create a serverless architecture for data collection with Python and AWS

  • 1. Create a serverless architecture for data collection with Python and AWS 9 Apr 2017 David Santucci
  • 2. david.santucci@cloudacademy.com About me David Santucci Data scientist @ CloudAcademy.com @davidsantucci linkedin.com/in/davidsantucci/
  • 3. Agenda • Introduction • Architecture • Amazon Kinesis Stream • Amazon Lambda • Dead Letter Queue (DLQ) • Conclusions • Q&A
  • 4. Introduction Challenges: • Collect events from different sources • Backend applications • Frontend applications • Mobile apps • Store events to different destinations • Data Warehouse • Third-party services • e.g., Hubspot, Mixpanel, GTM, … • Avoid data loss
  • 5. A serverless architecture AWS services: • Kinesis Stream • Lambda Functions • SQS • S3 • Amazon API Gateway
  • 6.
  • 7. Manage events from multiple sources
  • 8. Amazon Kinesis Stream What is Amazon Kinesis Stream? • Collect and process large streams of data records in real time. • Typical scenarios for using Streams: • Manage multiple producers that push their data feed directly into a stream; • Collect real-time analytics and metrics; • Process application logs; • Create pipelines with other AWS services (the consumers).
  • 9. Amazon Kinesis Stream
 from time import gmtime, strftime

 import boto3

 # Kinesis client for the region that hosts the stream
 client = boto3.client(
     service_name="kinesis",
     region_name="us-east-1",
 )

 # Push 300 sample events, one record per call
 for i in range(300):
     print("sending event {}".format(i + 1))
     response = client.put_record(
         StreamName="data-collection-stream",
         Data='{"name":"event-%d","data":{"payload":%d}}' % (i, i),
         # Timestamp-based key: records sent within the same second share a key
         PartitionKey=strftime("PK-%Y%m%d-%H%M%S", gmtime()),
     )
     print("response for event {}: {}".format(i + 1, response))
  • 10. Amazon Kinesis Stream - Tips • Use API Gateway as entry point for front-end and mobile. • Start with a single shard and increase only when needed. • Output events one by one to avoid data loss. • Generate PartitionKey using uuid (e.g., for test purpose).
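The uuid tip can be sketched as follows. This is a minimal illustration, not the speaker's code: `build_record` and `send_event` are hypothetical helper names, and `client` is assumed to be a `boto3.client("kinesis")` object created as on the previous slide. The record is built separately from the network call so the sketch runs without AWS credentials.

```python
import json
import uuid


def build_record(stream_name, event):
    """Build the keyword arguments for a Kinesis put_record call."""
    return {
        "StreamName": stream_name,
        "Data": json.dumps(event).encode("utf-8"),
        # A random UUID per record spreads records across shards
        "PartitionKey": str(uuid.uuid4()),
    }


def send_event(client, stream_name, event):
    """Send one event; `client` is assumed to be boto3.client("kinesis")."""
    return client.put_record(**build_record(stream_name, event))


record = build_record(
    "data-collection-stream",
    {"name": "event-1", "data": {"payload": 1}},
)
print(record["PartitionKey"])
```

A random key gives up per-key ordering, which is why the slide suggests it mainly for test purposes.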
  • 11. Amazon Lambda What is AWS Lambda? • It processes a single event in real time without managing servers. • Highly scalable. • Fallback strategy in case of errors.
  • 12.
  • 13. Amazon Lambda - Events routing
  • 14. Amazon Lambda - Events routing It works as a router and is directly triggered by Kinesis Streams.
  • 15. Amazon Lambda - Events routing
 [
     {
         "destination_name": "mixpanel",
         "destination_arn": "arn:aws:lambda:region:account-id:function:function-name:prod",
         "enabled_events": [
             "page_view",
             "search",
             "button_click",
             "page_scroll"
         ]
     },
     {
         "destination_name": "hubspotcrm",
         "destination_arn": "arn:aws:lambda:region:account-id:function:function-name:prod",
         "enabled_events": [
             "login",
             "logout",
             "registration",
             "page_view",
             "search",
             "email_sent",
             "email_open"
         ]
     },
     {
         "destination_name": "datawarehouse",
         "destination_arn": "arn:aws:lambda:region:account-id:function:function-name:prod",
         "enabled_events": [
             "login",
             "logout",
             "registration",
             "page_view",
             "search",
             "button_click",
             "page_scroll",
             "email_sent",
             "email_open"
         ]
     }
 ]
  • 16. Amazon Lambda - Events routing
 …
 {
     "destination_name": "datawarehouse",
     "destination_arn": "arn:aws:lambda:region:id:function:name:prod",
     "enabled_events": [
         "login",
         "logout",
         "registration",
         "page_view",
         "search",
         "button_click",
         "page_scroll",
         "email_sent",
         "email_open"
     ]
 }
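A router driven by rules of this shape might look like the sketch below. It is an illustration under stated assumptions rather than the code from the talk: the ARNs are placeholders, `destinations_for`, `decode_kinesis_record` and `route` are hypothetical names, event names are assumed to match the `enabled_events` entries exactly, and `RecordingClient` is a stand-in for `boto3.client("lambda")` so the example runs offline. It relies on two documented behaviours: Lambda receives Kinesis payloads base64-encoded under `Records[].kinesis.data`, and `invoke` with `InvocationType="Event"` is asynchronous.

```python
import base64
import json

# Hypothetical routing rules in the same shape as the configuration above;
# the ARNs are placeholders, not real functions.
ROUTING_RULES = [
    {
        "destination_name": "mixpanel",
        "destination_arn": "arn:aws:lambda:us-east-1:123456789012:function:mixpanel:prod",
        "enabled_events": ["page_view", "search"],
    },
    {
        "destination_name": "datawarehouse",
        "destination_arn": "arn:aws:lambda:us-east-1:123456789012:function:dwh:prod",
        "enabled_events": ["login", "page_view", "search"],
    },
]


def destinations_for(event_name, rules):
    """Return the ARNs of every destination that accepts this event name."""
    return [rule["destination_arn"] for rule in rules
            if event_name in rule["enabled_events"]]


def decode_kinesis_record(record):
    """Decode the base64 payload that Kinesis delivers under kinesis.data."""
    return json.loads(base64.b64decode(record["kinesis"]["data"]))


def route(kinesis_event, rules, lambda_client):
    """Forward each event to its destinations; return the ARNs invoked."""
    invoked = []
    for record in kinesis_event["Records"]:
        event = decode_kinesis_record(record)
        for arn in destinations_for(event["name"], rules):
            # InvocationType="Event" makes the call asynchronous
            lambda_client.invoke(
                FunctionName=arn,
                InvocationType="Event",
                Payload=json.dumps(event).encode("utf-8"),
            )
            invoked.append(arn)
    return invoked


class RecordingClient:
    """Stand-in for boto3.client("lambda") so the sketch runs offline."""

    def __init__(self):
        self.calls = []

    def invoke(self, **kwargs):
        self.calls.append(kwargs)


payload = base64.b64encode(json.dumps({"name": "login", "data": {}}).encode("utf-8"))
sample = {"Records": [{"kinesis": {"data": payload.decode("ascii")}}]}
client = RecordingClient()
print(route(sample, ROUTING_RULES, client))
```

Keeping the rule lookup in a pure function makes the routing table easy to test without touching AWS.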
  • 17.
  • 18. Amazon Lambda - Events routing
  • 19. Amazon Lambda - Events routing It provides the logic to connect to the destination services (e.g., HubSpot, Mixpanel, etc.). Custom retry strategy (with exponential delay).
  • 20. Amazon Lambda - Retry strategy
 import os
 import urllib.parse
 import urllib.request

 # retry and DoNotRetryException are defined on the next slide


 def lambda_handler(event, context=None):
     try:
         hub_id = os.environ['HUBSPOT_HUB_ID']
     except KeyError:
         # A missing configuration value cannot be fixed by retrying
         raise DoNotRetryException('HUBSPOT_HUB_ID')
     event = format_event_data(event, hub_id)
     process_event(event['data'])
     return "ok"


 def format_event_data(event, hub_id):
     # Last dotted segment of the name, title-cased: "page_view" -> "Page View"
     event_id = event["name"].split(".")[-1].replace("_", " ").title()
     event['data'].update({
         '_a': hub_id,
         '_n': event_id,
         'email': event['data']['_email'],
     })
     return event


 @retry
 def process_event(params):
     url = 'http://track.hubspot.com/v1/event?{}'.format(urllib.parse.urlencode(params))
     urllib.request.urlopen(url)
  • 21. Amazon Lambda - Retry strategy
 import functools
 import time


 class DoNotRetryException(Exception):
     """Raised for errors that a retry cannot fix."""


 def retry(func, max_retries=3, backoff_rate=2, scale_factor=.1):
     @functools.wraps(func)
     def func_wrapper(*args, **kwargs):
         attempts = 0
         while True:
             attempts += 1
             try:
                 return func(*args, **kwargs)
             except DoNotRetryException:
                 raise
             except Exception:
                 # Give up after max_retries attempts, re-raising the last error
                 if attempts >= max_retries:
                     raise
                 # Exponential delay between attempts
                 time.sleep(backoff_rate ** attempts * scale_factor)
     return func_wrapper
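To make the exponential delay concrete, the sketch below lists the waits implied by the decorator's parameters. `backoff_delays` is a hypothetical helper, not part of the talk; it assumes the wait after failed attempt n is `backoff_rate ** n * scale_factor`, as in the `time.sleep` call above, and that nothing is slept after the final attempt.

```python
def backoff_delays(max_retries=3, backoff_rate=2, scale_factor=0.1):
    """Delays in seconds slept between attempts (one fewer than max_retries)."""
    return [backoff_rate ** attempt * scale_factor
            for attempt in range(1, max_retries)]


# Waits between attempts with the decorator's default parameters
print(backoff_delays())
# Total time spent waiting if six attempts are allowed
print(sum(backoff_delays(max_retries=6)))
```

With the defaults the total wait stays well under a second, which matters because the whole retry loop has to fit inside the Lambda Function's timeout.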
  • 22. Amazon Lambda - Our tips • Enable Kinesis Stream as a trigger for other AWS services. • To preserve the priority, configure the trigger with Batch size: 1 and Starting position: Trim Horizon. • An S3 file can be used to define the routing rules. • Invoke the Lambda Functions that work as connectors asynchronously. • Always create aliases and versions for each Function. • Use environment variables for configurations. • Create a custom IAM role for each Function. • Detect delays in stream processing by monitoring the IteratorAge metric in the Lambda console's monitoring tab.
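The trigger settings from the tips above (Batch size: 1, Starting position: Trim Horizon) can also be applied from Python instead of the console. The following is a sketch under assumptions: `kinesis_trigger_config` and `create_trigger` are hypothetical helper names, the ARN and function name are placeholders, and `lambda_client` is assumed to be `boto3.client("lambda")`, whose `create_event_source_mapping` call accepts these parameters.

```python
def kinesis_trigger_config(stream_arn, function_name):
    """Keyword arguments for create_event_source_mapping on a Kinesis stream."""
    return {
        "EventSourceArn": stream_arn,
        # An alias can be targeted with the "name:alias" form
        "FunctionName": function_name,
        # One record per invocation
        "BatchSize": 1,
        # Start from the oldest record still retained in the stream
        "StartingPosition": "TRIM_HORIZON",
        "Enabled": True,
    }


def create_trigger(lambda_client, stream_arn, function_name):
    """Create the trigger; `lambda_client` is assumed to be boto3.client("lambda")."""
    return lambda_client.create_event_source_mapping(
        **kinesis_trigger_config(stream_arn, function_name)
    )


config = kinesis_trigger_config(
    "arn:aws:kinesis:us-east-1:123456789012:stream/data-collection-stream",  # placeholder
    "events-router:prod",  # placeholder function name and alias
)
print(config)
```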
  • 23.
  • 24. Dead Letter Queues (DLQ) - Avoid event loss
  • 25. DLQ - Simple Queue Service (SQS) What is AWS SQS? • Lambda automatically retries failed executions for asynchronous invocations. • Configure Lambda (advanced settings) to forward payloads that were not processed to a dead-letter queue (an SQS queue or an SNS topic). • We used an SQS queue.
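Besides the console's advanced settings, the dead-letter queue can be attached programmatically. This is a minimal sketch, assuming `lambda_client` is `boto3.client("lambda")` and using its `update_function_configuration` call with a `DeadLetterConfig`; `dlq_config` and `attach_dlq` are hypothetical helper names and the ARN is a placeholder.

```python
def dlq_config(function_name, queue_arn):
    """Keyword arguments that point a Lambda Function at a dead-letter queue."""
    return {
        "FunctionName": function_name,
        # TargetArn may be an SQS queue or an SNS topic
        "DeadLetterConfig": {"TargetArn": queue_arn},
    }


def attach_dlq(lambda_client, function_name, queue_arn):
    """Apply the setting; `lambda_client` is assumed to be boto3.client("lambda")."""
    return lambda_client.update_function_configuration(
        **dlq_config(function_name, queue_arn)
    )


settings = dlq_config(
    "hubspot-connector",  # placeholder function name
    "arn:aws:sqs:us-west-2:123456789012:hubspot-connector-dlq",  # placeholder
)
print(settings)
```

The Function's IAM role also needs permission to send messages to the queue, which ties in with the tip about a custom role per Function.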
  • 26. Amazon Lambda - Events routing
 import json

 import boto3


 def get_events_from_sqs(
         sqs_queue_name,
         region_name='us-west-2',
         purge_messages=False,
         backup_filename='backup.jsonl',
         visibility_timeout=60):
     """
     Create a json backup file of all events in the SQS queue with the given 'sqs_queue_name'.
     :sqs_queue_name: the name of the AWS SQS queue to be read via boto3
     :region_name: the region name of the AWS SQS queue to be read via boto3
     :purge_messages: True if messages must be deleted after reading, False otherwise
     :backup_filename: the name of the file where to store all SQS messages
     :visibility_timeout: period of time in seconds (unique consumer window)
     :return: the number of processed batches of events
     """
     forwarded = 0
     counter = 0
     sqs = boto3.resource('sqs', region_name=region_name)
     dlq = sqs.get_queue_by_name(QueueName=sqs_queue_name)
     # continues on next slide ...
  • 27. Amazon Lambda - Events routing
     # continues from previous slide ...
     with open(backup_filename, 'a') as filep:
         while True:
             batch_messages = dlq.receive_messages(
                 MessageAttributeNames=['All'],
                 MaxNumberOfMessages=10,
                 WaitTimeSeconds=20,
                 VisibilityTimeout=visibility_timeout,
             )
             if not batch_messages:
                 # No messages returned within the long-poll window: stop reading
                 break
             for msg in batch_messages:
                 try:
                     line = "{}\n".format(json.dumps({
                         'attributes': msg.message_attributes,
                         'body': msg.body,
                     }))
                     print("Line: ", line)
                     filep.write(line)
                     if purge_messages:
                         print('Deleting message from the queue.')
                         msg.delete()
                     forwarded += 1
                 except Exception as ex:
                     print("Error in processing message %s: %r" % (msg, ex))
             counter += 1
             print('Batch %d processed' % counter)
     return counter
  • 28. DLQ - Our tips • Set a DLQ on each Lambda Function that can fail. • Re-process events sent to DLQ with a custom script. • Tune DLQ config directly from Lambda Function panel.
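A custom re-processing script could start from the JSON Lines backup written by the SQS reader above. The sketch below is an assumption-laden illustration: `read_backup` and `reprocess` are hypothetical names, each line is assumed to hold a `body` field containing the original event as a JSON string, and `handler` stands for whatever re-sends the event (for example a connector's `lambda_handler`). A temporary file is used so the example runs without AWS.

```python
import json
import os
import tempfile


def read_backup(path):
    """Yield one parsed entry per non-empty line of a JSON Lines backup file."""
    with open(path) as filep:
        for line in filep:
            line = line.strip()
            if line:
                yield json.loads(line)


def reprocess(path, handler):
    """Feed each backed-up event to `handler`; return (succeeded, failed) counts."""
    succeeded = failed = 0
    for entry in read_backup(path):
        try:
            # Assumption: the message body is the original event as JSON text
            handler(json.loads(entry["body"]))
            succeeded += 1
        except Exception as ex:
            print("Failed to reprocess %r: %r" % (entry["body"], ex))
            failed += 1
    return succeeded, failed


# Demo with a throwaway backup file: one valid event and one broken body
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "backup.jsonl")
    with open(path, "w") as filep:
        filep.write(json.dumps({"attributes": None, "body": json.dumps({"name": "login"})}) + "\n")
        filep.write(json.dumps({"attributes": None, "body": "not json"}) + "\n")
    seen = []
    result = reprocess(path, seen.append)
print(result)
```

Counting failures instead of stopping at the first one lets a single malformed message be inspected later without blocking the rest of the backlog.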
  • 29. Conclusions Why a serverless architecture? • Scalability • Prevent data loss • Full control on each step • Costs Open points: • Integrate a custom CloudWatch dashboard. • Configure Firehose for a backup. • Write a script that manages events sent to DLQs. • Create a listener for anomaly detection with Kinesis Analytics. • Amazon StepFunctions.
  • 31. Useful links These slides: Create a serverless architecture for data collection with Python and AWS -> http://clda.co/pycon8-serverless-data-collection Blog post with code snippets: Building a serverless architecture for data collection with AWS Lambda -> http://clda.co/pycon8-data-collection-blogpost Serverless Learning Path: Getting Started with Serverless Computing -> http://clda.co/pycon8-serverless-LP