3. Claudio Pontili
10+ years of experience on AWS
Senior Cloud Solution Architect
AWS Authorized Instructor Champion
Claudio.pontili@besharp.it
https://www.linkedin.com/in/claudiopontili/
4. Agenda
• Using Lambda for ETLs
• Glue ETLs
• CI/CD to deploy code inside Lambdas and Glue Jobs
• Data warehousing on Aurora Serverless v1
• A full serverless Big Data Architecture
• What we’ve learned
5. Using Lambda for ETLs 1/2
• You can use Python with the Pandas library
• A Lambda function can have up to 10 GB of memory (CPU power scales with memory)
• A Lambda function can run for up to 15 minutes
• Max deployment package size: 50 MB (zipped)
• Container image code package size: up to 10 GB
• /tmp directory storage: 512 MB
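Within the limits above, a small-to-medium ETL fits directly in a handler. A minimal sketch with Pandas (bucket, key, and column names are illustrative; boto3 is preinstalled in the Lambda Python runtime):

```python
import io

import pandas as pd


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative transformation: drop incomplete rows, derive a column."""
    df = df.dropna()
    df["total"] = df["quantity"] * df["unit_price"]
    return df


def handler(event, context):
    import boto3  # bundled with the AWS Lambda Python runtime
    # Hypothetical bucket/key passed by the triggering event
    bucket, key = event["bucket"], event["key"]
    s3 = boto3.client("s3")
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    result = transform(pd.read_csv(io.BytesIO(body)))
    # Stage the output in /tmp (512 MB available) before uploading it back
    result.to_csv("/tmp/out.csv", index=False)
    s3.upload_file("/tmp/out.csv", bucket, f"processed/{key}")
    return {"rows": len(result)}
```

If the input outgrows the 15-minute limit or /tmp, that is the signal to move the job to Glue (next slides).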
7. Glue Jobs, Data Catalog and Crawler 1/2
• Fully managed Data Catalog and Extract-Transform-Load (ETL) service
• Automates data discovery, conversion, mapping, and job scheduling
• Glue runs your ETL jobs in a serverless Apache Spark environment
• Allows you to scale your ETL jobs
• Can easily schedule a crawler to create a catalog of files stored on S3
• Too much code? Try Glue DataBrew
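A typical Glue job reads a table the crawler registered in the Data Catalog, remaps fields, and writes the result back to S3. A skeleton of such a script (it runs only inside the Glue Spark environment; the database, table, and bucket names are illustrative):

```python
# Minimal Glue ETL script skeleton -- runs inside Glue, not locally
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.transforms import ApplyMapping
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read a table the crawler created in the Data Catalog
source = glue_context.create_dynamic_frame.from_catalog(
    database="sales_db", table_name="raw_orders")

# Rename/cast fields, then write the result back to S3 as Parquet
mapped = ApplyMapping.apply(
    frame=source,
    mappings=[("order_id", "string", "order_id", "string"),
              ("amount", "string", "amount", "double")])
glue_context.write_dynamic_frame.from_options(
    frame=mapped,
    connection_type="s3",
    connection_options={"path": "s3://my-bucket/processed/"},
    format="parquet")
job.commit()
```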
11. RDS Aurora Serverless
• MySQL and PostgreSQL supported (reuse your team's experience)
• Pay per ACU-hour (1 ACU = 2 GB of memory)
• Scales from 1 to 256 ACUs
• You can pause the cluster during the night
• Aurora Serverless v2 in preview, with Multi-AZ, read replicas, and faster scaling
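The pause-at-night behavior is a built-in auto-pause setting on the cluster's scaling configuration. A CloudFormation fragment sketching it (engine version, capacity bounds, and the omitted credentials properties are illustrative):

```yaml
AuroraCluster:
  Type: AWS::RDS::DBCluster
  Properties:
    Engine: aurora-postgresql
    EngineMode: serverless        # Aurora Serverless v1
    ScalingConfiguration:
      MinCapacity: 2
      MaxCapacity: 16
      AutoPause: true
      SecondsUntilAutoPause: 1800 # pause after 30 minutes of inactivity
```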
13. What we’ve learned
• Serverless gives you high availability and great scalability with no effort
• Pause Aurora Serverless v1 when idle (it takes about 30 seconds to restart)
• Use IaC (CloudFormation, Terraform, CDK, etc.) to deploy your infrastructure
• Tune your Lambda memory using https://github.com/alexcasalboni/aws-lambda-power-tuning
• S3 is cheap, but try not to write tiny (<128 KB) files
• Serverless can be pretty cheap if it's used in the right way
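One way to follow the tiny-files advice is to buffer small record batches and upload a single larger object instead of many small ones. A minimal sketch with Pandas (the function name and batch shapes are illustrative):

```python
import io

import pandas as pd


def coalesce_to_csv_bytes(parts: list) -> bytes:
    """Merge many small record batches into one CSV payload so a single
    larger object is uploaded to S3 instead of many tiny (<128 KB) ones."""
    combined = pd.concat(parts, ignore_index=True)
    buf = io.StringIO()
    combined.to_csv(buf, index=False)
    return buf.getvalue().encode("utf-8")
```

The resulting bytes can be uploaded with a single `put_object` call, which also saves on S3 PUT request costs.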