beSharp: A Serverless Approach to Big Data on AWS

AWS Summit EMEA
Claudio Pontili
AWS Senior Cloud Solution Architect
Claudio.pontili@besharp.it
A Serverless approach to Big Data on AWS

Claudio Pontili
10+ years of experience on AWS
Senior Cloud Solution Architect
AWS Authorized Instructor Champion
Claudio.pontili@besharp.it
https://www.linkedin.com/in/claudiopontili/

Agenda
• Using Lambda for ETLs
• Glue ETLs
• CI/CD to deploy code inside Lambdas and Glue Jobs
• Data warehousing on Aurora Serverless v1
• A full serverless Big Data Architecture
• What we’ve learned

Using Lambda for ETLs 1/2
• You can use Python + the pandas library
• A Lambda function can have up to 10 GB of memory
(CPU power scales with the memory you allocate)
• A Lambda function can run for up to 15 minutes
• Max deployment package 50 MB (zipped)
• Container image code package size 10 GB
• /tmp directory storage 512 MB
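
As a rough illustration of the points above, here is a minimal sketch of an ETL step shaped like a Lambda handler. It is not taken from the talk: the `csv` event key, the column names and the aggregation are hypothetical, and it uses only the Python standard library so it runs anywhere. In a real function the input would more likely be read from S3 and the transformation written with pandas.

```python
import csv
import io
import json


def transform(csv_text):
    """Parse CSV text, skip rows with a missing amount, and total amounts per customer."""
    totals = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        amount = (row.get("amount") or "").strip()
        if not amount:
            continue  # incomplete row: skip it instead of failing the whole run
        customer = row["customer"]
        totals[customer] = totals.get(customer, 0.0) + float(amount)
    return totals


def handler(event, context):
    """Lambda-style entry point. The 'csv' event key is a hypothetical input shape."""
    totals = transform(event["csv"])
    return {"statusCode": 200, "body": json.dumps(totals, sort_keys=True)}


if __name__ == "__main__":
    sample = "customer,amount\nacme,10.5\nacme,4.5\nglobex,\nglobex,3\n"
    print(handler({"csv": sample}, None))
```

Keeping the transformation in a plain function, separate from the handler, makes it easy to test locally and to move to a Glue job later if the data outgrows the memory or 15-minute limits.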

Glue Jobs, Data Catalog and Crawler 1/2
• Fully managed Data Catalog and Extract-Transform-Load (ETL) service
• Automates data discovery, conversion, mapping and job scheduling
• Glue runs your ETL jobs in a serverless Apache Spark environment
• Lets you scale your ETL jobs
• You can easily schedule a crawler to create a catalog of the files stored on
S3
• Too much code? Try Glue DataBrew
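
To make the scheduled crawler more concrete, here is a minimal sketch that builds the request for the boto3 Glue `create_crawler` call. The crawler name, role ARN, database name, bucket path and schedule are all hypothetical placeholders. The function only builds the dictionary, so it runs without AWS credentials.

```python
def build_crawler_request(name, role_arn, database, s3_path, cron):
    """Build keyword arguments for the boto3 Glue create_crawler call.

    `cron` is a six-field AWS cron expression without the cron(...) wrapper.
    """
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        "Schedule": f"cron({cron})",
    }


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    request = build_crawler_request(
        name="raw-data-crawler",
        role_arn="arn:aws:iam::123456789012:role/ExampleGlueCrawlerRole",
        database="raw_data",
        s3_path="s3://example-bucket/raw/",
        cron="0 2 * * ? *",  # every day at 02:00 UTC
    )
    print(request)
    # With credentials configured, the crawler could then be created with:
    #   boto3.client("glue").create_crawler(**request)
```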

Glue Jobs, Data Catalog and Crawler 2/2

RDS Aurora Serverless
• MySQL and PostgreSQL supported (reuse
the experience of your team)
• Pay per ACU-hour (1 ACU has about 2 GB of memory)
• Scales from 1 to 256 ACU
• You can pause the cluster during the night
• Aurora Serverless v2 (in preview) adds
Multi-AZ, read replicas and faster scaling
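
The effect of pausing the cluster at night can be estimated with simple arithmetic. The sketch below is a back-of-the-envelope model, not an official pricing formula: it assumes a constant ACU level while the cluster is active, ignores storage (which is billed separately), and the price per ACU-hour is a made-up placeholder to replace with the current price for your region.

```python
def monthly_acu_hours(acu, active_hours_per_day, days=30):
    """ACU-hours consumed in a month at a constant ACU level."""
    return acu * active_hours_per_day * days


def monthly_compute_cost(acu, active_hours_per_day, price_per_acu_hour, days=30):
    """Compute-only cost estimate; storage is not included."""
    return monthly_acu_hours(acu, active_hours_per_day, days) * price_per_acu_hour


if __name__ == "__main__":
    PRICE = 0.06  # hypothetical price per ACU-hour, NOT a real quote
    always_on = monthly_compute_cost(4, 24, PRICE)
    paused_at_night = monthly_compute_cost(4, 14, PRICE)  # paused 10 hours per day
    print(f"always on:       {always_on:.2f}")
    print(f"paused at night: {paused_at_night:.2f}")
    print(f"saving:          {1 - paused_at_night / always_on:.0%}")
```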

What we’ve learned
• Serverless gives you High Availability and
great scalability with no effort
• Pause Aurora Serverless v1 when idle (it takes
about 30 seconds to resume)
• Use IaC (CloudFormation, Terraform, CDK,
etc.) to deploy your infrastructure
• Tune your Lambda memory using
https://github.com/alexcasalboni/aws-lambda-power-tuning
• S3 is cheap, but try not to write tiny
(< 128 KB) files
• Serverless can be pretty cheap if it’s used
in the right way
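
One way to follow the advice about tiny S3 files is to buffer small records and write them out only once a batch reaches the size threshold. The sketch below shows the batching logic alone, under the assumption that records are already available as bytes. The function name is hypothetical, and the actual upload (for example with boto3 `put_object`) is left out.

```python
MIN_OBJECT_BYTES = 128 * 1024  # threshold taken from the bullet above


def batch_records(records, min_bytes=MIN_OBJECT_BYTES):
    """Group byte records into batches of at least min_bytes each.

    The final batch may be smaller, since it holds whatever is left over.
    """
    batch, size = [], 0
    for record in records:
        batch.append(record)
        size += len(record)
        if size >= min_bytes:
            yield b"".join(batch)
            batch, size = [], 0
    if batch:
        yield b"".join(batch)


if __name__ == "__main__":
    # 300 records of 1 KB each would otherwise become 300 tiny objects.
    records = [b"x" * 1024 for _ in range(300)]
    for i, payload in enumerate(batch_records(records)):
        print(f"object {i}: {len(payload)} bytes")
```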

www.besharp.it
info@besharp.it
+39 0382 1692920
beSharp srl - viale Ludovico il Moro 27 - 27100 Pavia (ITALY)
VAT ID IT02415160189