SlideShare a Scribd company logo
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Building Advanced Workflows with AWS Glue
Santosh Chandrachood
SDM
AWS Glue
A N T 3 7 2
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Agenda
Overview
Building Blocks
Building a usecase
Event driven workflows
Workflow considerations
Monitoring and Tuning
Bring your own workflow engine
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Breakout repeats
Monday, Nov 26
ANT 372 – [CT] Building Advanced Workflows with AWS Glue
10:00 – 11:00 | Aria East, Plaza Level, Orovada 3
Tuesday, Nov 27
ANT 333 – [BS] Building Advanced Workflows with AWS Glue
2:30 – 3:30 | Mirage, Grand Ballroom D, Table 4
Wednesday, Nov 28
ANT 381 – [BS] Building Advanced Workflows with AWS Glue
5:30 – 6:30 | Aria West, Level 3, Starvine 10, Table 5
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Fully-managed, serverless extract-transform-load (ETL) service
for developers, built by developers
1000s of customers and jobs
AWS Glue
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
OrchestrationData Catalog ETL Engine
Automatic crawling
Apache Hive Metastore compatible
Integrated with AWS analytic services
Discover
Flexible scheduling
Monitoring and alerting
External integrations
Deploy
Apache Spark core
Python and Scala
Auto-generates ETL code
Develop
AWS Glue Components
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Apache Spark is a distributed data processing engine for complex analytics.
AWS Glue builds on the Apache Spark to offer ETL specific functionality.
Spark Core: RDDs
Spark DataFrames Glue DynamicFrames
SparkSQL AWS Glue ETL
Review: Apache Spark and AWS Glue ETL
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Building Blocks
Crawlers Jobs TriggersEntities
Schedule ExternalEventsDependencies
Conditions TimeoutRetriesControl
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Building a usecase
raw
optimize optimized
SLA
reporting
New
variable mins variable mins
Goal: compose jobs in DAG through dependencies
In-practice: time-based workflows
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
New features
External
Conditions
Control
Publish crawler and job notifications into CloudWatch events
CloudWatch events to control downstream workflows
‘ANY’ and ‘AND’ operators in Trigger conditions
Additional job states ’failed’, ‘stopped’, or ‘timeout’
Configure job timeout
Job delay notifications
On-demand cancel
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Example event-driven workflow
raw
optimize optimized
SLA
reporting
New
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Workflow considerations
• Incremental data processing
• Job bookmarks to keep state
• Job parameters to select new datasets
• Job size
• Unique versus One job per logical units of work
• Multiple small jobs or one big job
• Job parameters
• Initial, Global, In-between jobs
• Use Amazon S3 to pass parameters
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Workflow considerations
• Data processing unit
• Number of DPUs
• Adjusting DPUs
• SLA
• Job delays notifications
• Timeouts
• Error handling
• Retry logic
• Integration with 3rd party
• Job re-run
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Workflow monitoring—Performance
• How is your dataset
partitioned?
• How is your application
divided into jobs and
stages?
• Data is divided into
partitions that are
processed concurrently
Driver
Executors
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Workflow monitoring—Metrics
• Job metrics
• CPU
• Memory
• Network
• Executors, stages
• Data movement
• Use data points to
adjust job parameters
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Bring your own workflow engine
Triggers
Top-level AWS
Glue job
CloudWatch
Events
Jobs
Lambda
States
SNS
Config
STEP
Thank you!
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.
Santosh Chandrachood
glue-feedback@amazon.com
© 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.

More Related Content

What's hot

AWS Summit Seoul 2023 | 비즈니스 경쟁에서 승리하기 위한 AWS AI/ML 서비스
AWS Summit Seoul 2023 | 비즈니스 경쟁에서 승리하기 위한 AWS AI/ML 서비스AWS Summit Seoul 2023 | 비즈니스 경쟁에서 승리하기 위한 AWS AI/ML 서비스
AWS Summit Seoul 2023 | 비즈니스 경쟁에서 승리하기 위한 AWS AI/ML 서비스
Amazon Web Services Korea
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
Amazon Web Services
 
AWS Data Analytics on AWS
AWS Data Analytics on AWSAWS Data Analytics on AWS
AWS Data Analytics on AWS
sampath439572
 
Aurora Serverless: Scalable, Cost-Effective Application Deployment (DAT336) -...
Aurora Serverless: Scalable, Cost-Effective Application Deployment (DAT336) -...Aurora Serverless: Scalable, Cost-Effective Application Deployment (DAT336) -...
Aurora Serverless: Scalable, Cost-Effective Application Deployment (DAT336) -...
Amazon Web Services
 
Build Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best PracticesBuild Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best Practices
Amazon Web Services
 
Cost optimization - Don't overspend on AWS
Cost optimization - Don't overspend on AWSCost optimization - Don't overspend on AWS
Cost optimization - Don't overspend on AWS
Sandeep Cashyap
 
ABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS GlueABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS Glue
Amazon Web Services
 
Cost efficiencies and security best practices with Amazon S3 storage - STG301...
Cost efficiencies and security best practices with Amazon S3 storage - STG301...Cost efficiencies and security best practices with Amazon S3 storage - STG301...
Cost efficiencies and security best practices with Amazon S3 storage - STG301...
Amazon Web Services
 
Module 2 - Datalake
Module 2 - DatalakeModule 2 - Datalake
Module 2 - Datalake
Lam Le
 
Best Practices for Building Your Data Lake on AWS
Best Practices for Building Your Data Lake on AWSBest Practices for Building Your Data Lake on AWS
Best Practices for Building Your Data Lake on AWS
Amazon Web Services
 
AWS에서 빅데이터 프로젝트 시작하기 - 이종화 솔루션즈 아키텍트, AWS
AWS에서 빅데이터 프로젝트 시작하기 - 이종화 솔루션즈 아키텍트, AWSAWS에서 빅데이터 프로젝트 시작하기 - 이종화 솔루션즈 아키텍트, AWS
AWS에서 빅데이터 프로젝트 시작하기 - 이종화 솔루션즈 아키텍트, AWS
Amazon Web Services Korea
 
AWS Summit Seoul 2023 | 삼성전자/쿠팡의 대규모 트래픽 처리를 위한 클라우드 네이티브 데이터베이스 활용
AWS Summit Seoul 2023 | 삼성전자/쿠팡의 대규모 트래픽 처리를 위한 클라우드 네이티브 데이터베이스 활용AWS Summit Seoul 2023 | 삼성전자/쿠팡의 대규모 트래픽 처리를 위한 클라우드 네이티브 데이터베이스 활용
AWS Summit Seoul 2023 | 삼성전자/쿠팡의 대규모 트래픽 처리를 위한 클라우드 네이티브 데이터베이스 활용
Amazon Web Services Korea
 
Amazon Aurora: Under the Hood
Amazon Aurora: Under the HoodAmazon Aurora: Under the Hood
Amazon Aurora: Under the Hood
Amazon Web Services
 
30분만에 만드는 AWS 기반 빅데이터 분석 애플리케이션::안효빈::AWS Summit Seoul 2018
30분만에 만드는 AWS 기반 빅데이터 분석 애플리케이션::안효빈::AWS Summit Seoul 201830분만에 만드는 AWS 기반 빅데이터 분석 애플리케이션::안효빈::AWS Summit Seoul 2018
30분만에 만드는 AWS 기반 빅데이터 분석 애플리케이션::안효빈::AWS Summit Seoul 2018Amazon Web Services Korea
 
AWS Webinar Series - Cost Optimisation Levers, Tools, and Strategies
AWS Webinar Series - Cost Optimisation Levers, Tools, and StrategiesAWS Webinar Series - Cost Optimisation Levers, Tools, and Strategies
AWS Webinar Series - Cost Optimisation Levers, Tools, and Strategies
Amazon Web Services
 
AWS Summit Seoul 2023 | 성공적인 AWS RDS 마이그레이션을 위한 여정과 필수 고려사항
AWS Summit Seoul 2023 | 성공적인 AWS RDS 마이그레이션을 위한 여정과 필수 고려사항AWS Summit Seoul 2023 | 성공적인 AWS RDS 마이그레이션을 위한 여정과 필수 고려사항
AWS Summit Seoul 2023 | 성공적인 AWS RDS 마이그레이션을 위한 여정과 필수 고려사항
Amazon Web Services Korea
 
Heterogenous Migration with DMS & SCT
Heterogenous Migration with DMS & SCTHeterogenous Migration with DMS & SCT
Heterogenous Migration with DMS & SCT
Amazon Web Services
 
Cost Optimisation on AWS
Cost Optimisation on AWSCost Optimisation on AWS
Cost Optimisation on AWS
Amazon Web Services
 
Building Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS GlueBuilding Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS Glue
Amazon Web Services
 
AWS를 활용한 Digital Manufacturing 실현 방법 및 사례 소개 - Douglas Bellin, 월드와이드 제조 솔루션 담...
AWS를 활용한 Digital Manufacturing 실현 방법 및 사례 소개 - Douglas Bellin, 월드와이드 제조 솔루션 담...AWS를 활용한 Digital Manufacturing 실현 방법 및 사례 소개 - Douglas Bellin, 월드와이드 제조 솔루션 담...
AWS를 활용한 Digital Manufacturing 실현 방법 및 사례 소개 - Douglas Bellin, 월드와이드 제조 솔루션 담...
Amazon Web Services Korea
 

What's hot (20)

AWS Summit Seoul 2023 | 비즈니스 경쟁에서 승리하기 위한 AWS AI/ML 서비스
AWS Summit Seoul 2023 | 비즈니스 경쟁에서 승리하기 위한 AWS AI/ML 서비스AWS Summit Seoul 2023 | 비즈니스 경쟁에서 승리하기 위한 AWS AI/ML 서비스
AWS Summit Seoul 2023 | 비즈니스 경쟁에서 승리하기 위한 AWS AI/ML 서비스
 
Building Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWSBuilding Data Lakes and Analytics on AWS
Building Data Lakes and Analytics on AWS
 
AWS Data Analytics on AWS
AWS Data Analytics on AWSAWS Data Analytics on AWS
AWS Data Analytics on AWS
 
Aurora Serverless: Scalable, Cost-Effective Application Deployment (DAT336) -...
Aurora Serverless: Scalable, Cost-Effective Application Deployment (DAT336) -...Aurora Serverless: Scalable, Cost-Effective Application Deployment (DAT336) -...
Aurora Serverless: Scalable, Cost-Effective Application Deployment (DAT336) -...
 
Build Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best PracticesBuild Data Lakes & Analytics on AWS: Patterns & Best Practices
Build Data Lakes & Analytics on AWS: Patterns & Best Practices
 
Cost optimization - Don't overspend on AWS
Cost optimization - Don't overspend on AWSCost optimization - Don't overspend on AWS
Cost optimization - Don't overspend on AWS
 
ABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS GlueABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS Glue
 
Cost efficiencies and security best practices with Amazon S3 storage - STG301...
Cost efficiencies and security best practices with Amazon S3 storage - STG301...Cost efficiencies and security best practices with Amazon S3 storage - STG301...
Cost efficiencies and security best practices with Amazon S3 storage - STG301...
 
Module 2 - Datalake
Module 2 - DatalakeModule 2 - Datalake
Module 2 - Datalake
 
Best Practices for Building Your Data Lake on AWS
Best Practices for Building Your Data Lake on AWSBest Practices for Building Your Data Lake on AWS
Best Practices for Building Your Data Lake on AWS
 
AWS에서 빅데이터 프로젝트 시작하기 - 이종화 솔루션즈 아키텍트, AWS
AWS에서 빅데이터 프로젝트 시작하기 - 이종화 솔루션즈 아키텍트, AWSAWS에서 빅데이터 프로젝트 시작하기 - 이종화 솔루션즈 아키텍트, AWS
AWS에서 빅데이터 프로젝트 시작하기 - 이종화 솔루션즈 아키텍트, AWS
 
AWS Summit Seoul 2023 | 삼성전자/쿠팡의 대규모 트래픽 처리를 위한 클라우드 네이티브 데이터베이스 활용
AWS Summit Seoul 2023 | 삼성전자/쿠팡의 대규모 트래픽 처리를 위한 클라우드 네이티브 데이터베이스 활용AWS Summit Seoul 2023 | 삼성전자/쿠팡의 대규모 트래픽 처리를 위한 클라우드 네이티브 데이터베이스 활용
AWS Summit Seoul 2023 | 삼성전자/쿠팡의 대규모 트래픽 처리를 위한 클라우드 네이티브 데이터베이스 활용
 
Amazon Aurora: Under the Hood
Amazon Aurora: Under the HoodAmazon Aurora: Under the Hood
Amazon Aurora: Under the Hood
 
30분만에 만드는 AWS 기반 빅데이터 분석 애플리케이션::안효빈::AWS Summit Seoul 2018
30분만에 만드는 AWS 기반 빅데이터 분석 애플리케이션::안효빈::AWS Summit Seoul 201830분만에 만드는 AWS 기반 빅데이터 분석 애플리케이션::안효빈::AWS Summit Seoul 2018
30분만에 만드는 AWS 기반 빅데이터 분석 애플리케이션::안효빈::AWS Summit Seoul 2018
 
AWS Webinar Series - Cost Optimisation Levers, Tools, and Strategies
AWS Webinar Series - Cost Optimisation Levers, Tools, and StrategiesAWS Webinar Series - Cost Optimisation Levers, Tools, and Strategies
AWS Webinar Series - Cost Optimisation Levers, Tools, and Strategies
 
AWS Summit Seoul 2023 | 성공적인 AWS RDS 마이그레이션을 위한 여정과 필수 고려사항
AWS Summit Seoul 2023 | 성공적인 AWS RDS 마이그레이션을 위한 여정과 필수 고려사항AWS Summit Seoul 2023 | 성공적인 AWS RDS 마이그레이션을 위한 여정과 필수 고려사항
AWS Summit Seoul 2023 | 성공적인 AWS RDS 마이그레이션을 위한 여정과 필수 고려사항
 
Heterogenous Migration with DMS & SCT
Heterogenous Migration with DMS & SCTHeterogenous Migration with DMS & SCT
Heterogenous Migration with DMS & SCT
 
Cost Optimisation on AWS
Cost Optimisation on AWSCost Optimisation on AWS
Cost Optimisation on AWS
 
Building Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS GlueBuilding Serverless ETL Pipelines with AWS Glue
Building Serverless ETL Pipelines with AWS Glue
 
AWS를 활용한 Digital Manufacturing 실현 방법 및 사례 소개 - Douglas Bellin, 월드와이드 제조 솔루션 담...
AWS를 활용한 Digital Manufacturing 실현 방법 및 사례 소개 - Douglas Bellin, 월드와이드 제조 솔루션 담...AWS를 활용한 Digital Manufacturing 실현 방법 및 사례 소개 - Douglas Bellin, 월드와이드 제조 솔루션 담...
AWS를 활용한 Digital Manufacturing 실현 방법 및 사례 소개 - Douglas Bellin, 월드와이드 제조 솔루션 담...
 

Similar to Building Advanced Workflows with AWS Glue (ANT372) - AWS re:Invent 2018

Build End-to-End IT Lifecycle Management on AWS with ServiceNow (ENT330) - AW...
Build End-to-End IT Lifecycle Management on AWS with ServiceNow (ENT330) - AW...Build End-to-End IT Lifecycle Management on AWS with ServiceNow (ENT330) - AW...
Build End-to-End IT Lifecycle Management on AWS with ServiceNow (ENT330) - AW...
Amazon Web Services
 
Analyze Amazon CloudFront and Lambda@Edge Logs to Improve Customer Experience...
Analyze Amazon CloudFront and Lambda@Edge Logs to Improve Customer Experience...Analyze Amazon CloudFront and Lambda@Edge Logs to Improve Customer Experience...
Analyze Amazon CloudFront and Lambda@Edge Logs to Improve Customer Experience...
Amazon Web Services
 
Distributed Solar Systems at EDF Renewables and AWS IoT: A Natural Fit (PUT30...
Distributed Solar Systems at EDF Renewables and AWS IoT: A Natural Fit (PUT30...Distributed Solar Systems at EDF Renewables and AWS IoT: A Natural Fit (PUT30...
Distributed Solar Systems at EDF Renewables and AWS IoT: A Natural Fit (PUT30...
Amazon Web Services
 
Migrating database to cloud
Migrating database to cloudMigrating database to cloud
Migrating database to cloud
Amazon Web Services
 
Serverless State Management & Orchestration for Modern Apps (API302) - AWS re...
Serverless State Management & Orchestration for Modern Apps (API302) - AWS re...Serverless State Management & Orchestration for Modern Apps (API302) - AWS re...
Serverless State Management & Orchestration for Modern Apps (API302) - AWS re...
Amazon Web Services
 
Drive Self-Service & Standardization in the First 100 Days of Your Cloud Migr...
Drive Self-Service & Standardization in the First 100 Days of Your Cloud Migr...Drive Self-Service & Standardization in the First 100 Days of Your Cloud Migr...
Drive Self-Service & Standardization in the First 100 Days of Your Cloud Migr...
Amazon Web Services
 
How Dow Jones Uses AWS to Enable Innovation and New Engineering Work (CTD316)...
How Dow Jones Uses AWS to Enable Innovation and New Engineering Work (CTD316)...How Dow Jones Uses AWS to Enable Innovation and New Engineering Work (CTD316)...
How Dow Jones Uses AWS to Enable Innovation and New Engineering Work (CTD316)...
Amazon Web Services
 
Securing Machine Learning Deployments for the Enterprise (SEC369-R1) - AWS re...
Securing Machine Learning Deployments for the Enterprise (SEC369-R1) - AWS re...Securing Machine Learning Deployments for the Enterprise (SEC369-R1) - AWS re...
Securing Machine Learning Deployments for the Enterprise (SEC369-R1) - AWS re...
Amazon Web Services
 
Data Transformation Patterns in AWS - AWS Online Tech Talks
Data Transformation Patterns in AWS - AWS Online Tech TalksData Transformation Patterns in AWS - AWS Online Tech Talks
Data Transformation Patterns in AWS - AWS Online Tech Talks
Amazon Web Services
 
Coordinating Microservices with AWS Step Functions.pdf
Coordinating Microservices with AWS Step Functions.pdfCoordinating Microservices with AWS Step Functions.pdf
Coordinating Microservices with AWS Step Functions.pdf
Amazon Web Services
 
Data Design and Modeling for Microservices I AWS Dev Day 2018
Data Design and Modeling for Microservices I AWS Dev Day 2018Data Design and Modeling for Microservices I AWS Dev Day 2018
Data Design and Modeling for Microservices I AWS Dev Day 2018
AWS Germany
 
Accelerate and Modernise Microsoft Workload Migrations on AWS
Accelerate and Modernise Microsoft Workload Migrations on AWSAccelerate and Modernise Microsoft Workload Migrations on AWS
Accelerate and Modernise Microsoft Workload Migrations on AWS
Amazon Web Services
 
Amazon Athena: What's New and How SendGrid Innovates (ANT324) - AWS re:Invent...
Amazon Athena: What's New and How SendGrid Innovates (ANT324) - AWS re:Invent...Amazon Athena: What's New and How SendGrid Innovates (ANT324) - AWS re:Invent...
Amazon Athena: What's New and How SendGrid Innovates (ANT324) - AWS re:Invent...
Amazon Web Services
 
SaaS Reference Architectures: Review of Real-World Patterns & Strategies (GPS...
SaaS Reference Architectures: Review of Real-World Patterns & Strategies (GPS...SaaS Reference Architectures: Review of Real-World Patterns & Strategies (GPS...
SaaS Reference Architectures: Review of Real-World Patterns & Strategies (GPS...
Amazon Web Services
 
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...
Amazon Web Services
 
AWS Cloud Experience CA: AWS y su marco de valor en la nube
AWS Cloud Experience CA: AWS y su marco de valor en la nubeAWS Cloud Experience CA: AWS y su marco de valor en la nube
AWS Cloud Experience CA: AWS y su marco de valor en la nube
Amazon Web Services LATAM
 
Accelerate Oracle to Aurora PostgreSQL Migration (GPSTEC313) - AWS re:Invent ...
Accelerate Oracle to Aurora PostgreSQL Migration (GPSTEC313) - AWS re:Invent ...Accelerate Oracle to Aurora PostgreSQL Migration (GPSTEC313) - AWS re:Invent ...
Accelerate Oracle to Aurora PostgreSQL Migration (GPSTEC313) - AWS re:Invent ...
Amazon Web Services
 
How GumGum Migrated from Cassandra to Amazon DynamoDB (DAT345) - AWS re:Inven...
How GumGum Migrated from Cassandra to Amazon DynamoDB (DAT345) - AWS re:Inven...How GumGum Migrated from Cassandra to Amazon DynamoDB (DAT345) - AWS re:Inven...
How GumGum Migrated from Cassandra to Amazon DynamoDB (DAT345) - AWS re:Inven...
Amazon Web Services
 
AWSomeBuilder3-v12-clean.pdf
AWSomeBuilder3-v12-clean.pdfAWSomeBuilder3-v12-clean.pdf
AWSomeBuilder3-v12-clean.pdf
Sal Marcus
 
Gaining Better Observability of Your VMs with Amazon CloudWatch - AWS Online ...
Gaining Better Observability of Your VMs with Amazon CloudWatch - AWS Online ...Gaining Better Observability of Your VMs with Amazon CloudWatch - AWS Online ...
Gaining Better Observability of Your VMs with Amazon CloudWatch - AWS Online ...
Amazon Web Services
 

Similar to Building Advanced Workflows with AWS Glue (ANT372) - AWS re:Invent 2018 (20)

Build End-to-End IT Lifecycle Management on AWS with ServiceNow (ENT330) - AW...
Build End-to-End IT Lifecycle Management on AWS with ServiceNow (ENT330) - AW...Build End-to-End IT Lifecycle Management on AWS with ServiceNow (ENT330) - AW...
Build End-to-End IT Lifecycle Management on AWS with ServiceNow (ENT330) - AW...
 
Analyze Amazon CloudFront and Lambda@Edge Logs to Improve Customer Experience...
Analyze Amazon CloudFront and Lambda@Edge Logs to Improve Customer Experience...Analyze Amazon CloudFront and Lambda@Edge Logs to Improve Customer Experience...
Analyze Amazon CloudFront and Lambda@Edge Logs to Improve Customer Experience...
 
Distributed Solar Systems at EDF Renewables and AWS IoT: A Natural Fit (PUT30...
Distributed Solar Systems at EDF Renewables and AWS IoT: A Natural Fit (PUT30...Distributed Solar Systems at EDF Renewables and AWS IoT: A Natural Fit (PUT30...
Distributed Solar Systems at EDF Renewables and AWS IoT: A Natural Fit (PUT30...
 
Migrating database to cloud
Migrating database to cloudMigrating database to cloud
Migrating database to cloud
 
Serverless State Management & Orchestration for Modern Apps (API302) - AWS re...
Serverless State Management & Orchestration for Modern Apps (API302) - AWS re...Serverless State Management & Orchestration for Modern Apps (API302) - AWS re...
Serverless State Management & Orchestration for Modern Apps (API302) - AWS re...
 
Drive Self-Service & Standardization in the First 100 Days of Your Cloud Migr...
Drive Self-Service & Standardization in the First 100 Days of Your Cloud Migr...Drive Self-Service & Standardization in the First 100 Days of Your Cloud Migr...
Drive Self-Service & Standardization in the First 100 Days of Your Cloud Migr...
 
How Dow Jones Uses AWS to Enable Innovation and New Engineering Work (CTD316)...
How Dow Jones Uses AWS to Enable Innovation and New Engineering Work (CTD316)...How Dow Jones Uses AWS to Enable Innovation and New Engineering Work (CTD316)...
How Dow Jones Uses AWS to Enable Innovation and New Engineering Work (CTD316)...
 
Securing Machine Learning Deployments for the Enterprise (SEC369-R1) - AWS re...
Securing Machine Learning Deployments for the Enterprise (SEC369-R1) - AWS re...Securing Machine Learning Deployments for the Enterprise (SEC369-R1) - AWS re...
Securing Machine Learning Deployments for the Enterprise (SEC369-R1) - AWS re...
 
Data Transformation Patterns in AWS - AWS Online Tech Talks
Data Transformation Patterns in AWS - AWS Online Tech TalksData Transformation Patterns in AWS - AWS Online Tech Talks
Data Transformation Patterns in AWS - AWS Online Tech Talks
 
Coordinating Microservices with AWS Step Functions.pdf
Coordinating Microservices with AWS Step Functions.pdfCoordinating Microservices with AWS Step Functions.pdf
Coordinating Microservices with AWS Step Functions.pdf
 
Data Design and Modeling for Microservices I AWS Dev Day 2018
Data Design and Modeling for Microservices I AWS Dev Day 2018Data Design and Modeling for Microservices I AWS Dev Day 2018
Data Design and Modeling for Microservices I AWS Dev Day 2018
 
Accelerate and Modernise Microsoft Workload Migrations on AWS
Accelerate and Modernise Microsoft Workload Migrations on AWSAccelerate and Modernise Microsoft Workload Migrations on AWS
Accelerate and Modernise Microsoft Workload Migrations on AWS
 
Amazon Athena: What's New and How SendGrid Innovates (ANT324) - AWS re:Invent...
Amazon Athena: What's New and How SendGrid Innovates (ANT324) - AWS re:Invent...Amazon Athena: What's New and How SendGrid Innovates (ANT324) - AWS re:Invent...
Amazon Athena: What's New and How SendGrid Innovates (ANT324) - AWS re:Invent...
 
SaaS Reference Architectures: Review of Real-World Patterns & Strategies (GPS...
SaaS Reference Architectures: Review of Real-World Patterns & Strategies (GPS...SaaS Reference Architectures: Review of Real-World Patterns & Strategies (GPS...
SaaS Reference Architectures: Review of Real-World Patterns & Strategies (GPS...
 
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...
Architecting for Real-Time Insights with Amazon Kinesis (ANT310) - AWS re:Inv...
 
AWS Cloud Experience CA: AWS y su marco de valor en la nube
AWS Cloud Experience CA: AWS y su marco de valor en la nubeAWS Cloud Experience CA: AWS y su marco de valor en la nube
AWS Cloud Experience CA: AWS y su marco de valor en la nube
 
Accelerate Oracle to Aurora PostgreSQL Migration (GPSTEC313) - AWS re:Invent ...
Accelerate Oracle to Aurora PostgreSQL Migration (GPSTEC313) - AWS re:Invent ...Accelerate Oracle to Aurora PostgreSQL Migration (GPSTEC313) - AWS re:Invent ...
Accelerate Oracle to Aurora PostgreSQL Migration (GPSTEC313) - AWS re:Invent ...
 
How GumGum Migrated from Cassandra to Amazon DynamoDB (DAT345) - AWS re:Inven...
How GumGum Migrated from Cassandra to Amazon DynamoDB (DAT345) - AWS re:Inven...How GumGum Migrated from Cassandra to Amazon DynamoDB (DAT345) - AWS re:Inven...
How GumGum Migrated from Cassandra to Amazon DynamoDB (DAT345) - AWS re:Inven...
 
AWSomeBuilder3-v12-clean.pdf
AWSomeBuilder3-v12-clean.pdfAWSomeBuilder3-v12-clean.pdf
AWSomeBuilder3-v12-clean.pdf
 
Gaining Better Observability of Your VMs with Amazon CloudWatch - AWS Online ...
Gaining Better Observability of Your VMs with Amazon CloudWatch - AWS Online ...Gaining Better Observability of Your VMs with Amazon CloudWatch - AWS Online ...
Gaining Better Observability of Your VMs with Amazon CloudWatch - AWS Online ...
 

More from Amazon Web Services

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Amazon Web Services
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Amazon Web Services
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
Amazon Web Services
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
Amazon Web Services
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
Amazon Web Services
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
Amazon Web Services
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Amazon Web Services
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
Amazon Web Services
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Amazon Web Services
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
Amazon Web Services
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
Amazon Web Services
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Amazon Web Services
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
Amazon Web Services
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Amazon Web Services
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
Amazon Web Services
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
Amazon Web Services
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
Amazon Web Services
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
Amazon Web Services
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
Amazon Web Services
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
Amazon Web Services
 

More from Amazon Web Services (20)

Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
Come costruire servizi di Forecasting sfruttando algoritmi di ML e deep learn...
 
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
Big Data per le Startup: come creare applicazioni Big Data in modalità Server...
 
Esegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS FargateEsegui pod serverless con Amazon EKS e AWS Fargate
Esegui pod serverless con Amazon EKS e AWS Fargate
 
Costruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWSCostruire Applicazioni Moderne con AWS
Costruire Applicazioni Moderne con AWS
 
Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot Come spendere fino al 90% in meno con i container e le istanze spot
Come spendere fino al 90% in meno con i container e le istanze spot
 
Open banking as a service
Open banking as a serviceOpen banking as a service
Open banking as a service
 
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
Rendi unica l’offerta della tua startup sul mercato con i servizi Machine Lea...
 
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...OpsWorks Configuration Management: automatizza la gestione e i deployment del...
OpsWorks Configuration Management: automatizza la gestione e i deployment del...
 
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows WorkloadsMicrosoft Active Directory su AWS per supportare i tuoi Windows Workloads
Microsoft Active Directory su AWS per supportare i tuoi Windows Workloads
 
Computer Vision con AWS
Computer Vision con AWSComputer Vision con AWS
Computer Vision con AWS
 
Database Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatareDatabase Oracle e VMware Cloud on AWS i miti da sfatare
Database Oracle e VMware Cloud on AWS i miti da sfatare
 
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJSCrea la tua prima serverless ledger-based app con QLDB e NodeJS
Crea la tua prima serverless ledger-based app con QLDB e NodeJS
 
API moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e webAPI moderne real-time per applicazioni mobili e web
API moderne real-time per applicazioni mobili e web
 
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatareDatabase Oracle e VMware Cloud™ on AWS: i miti da sfatare
Database Oracle e VMware Cloud™ on AWS: i miti da sfatare
 
Tools for building your MVP on AWS
Tools for building your MVP on AWSTools for building your MVP on AWS
Tools for building your MVP on AWS
 
How to Build a Winning Pitch Deck
How to Build a Winning Pitch DeckHow to Build a Winning Pitch Deck
How to Build a Winning Pitch Deck
 
Building a web application without servers
Building a web application without serversBuilding a web application without servers
Building a web application without servers
 
Fundraising Essentials
Fundraising EssentialsFundraising Essentials
Fundraising Essentials
 
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
AWS_HK_StartupDay_Building Interactive websites while automating for efficien...
 
Introduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container ServiceIntroduzione a Amazon Elastic Container Service
Introduzione a Amazon Elastic Container Service
 

Building Advanced Workflows with AWS Glue (ANT372) - AWS re:Invent 2018

  • 1.
  • 2. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Building Advanced Workflows with AWS Glue Santosh Chandrachood SDM AWS Glue A N T 3 7 2
  • 3. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Agenda Overview Building Blocks Building a usecase Event driven workflows Workflow considerations Monitoring and Tuning Bring your own workflow engine
  • 4. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Breakout repeats Monday, Nov 26 ANT 372 – [CT] Building Advanced Workflows with AWS Glue 10:00 – 11:00 | Aria East, Plaza Level, Orovada 3 Tuesday, Nov 27 ANT 333 – [BS] Building Advanced Workflows with AWS Glue 2:30 – 3:30 | Mirage, Grand Ballroom D, Table 4 Wednesday, Nov 28 ANT 381 – [BS] Building Advanced Workflows with AWS Glue 5:30 – 6:30 | Aria West, Level 3, Starvine 10, Table 5
  • 5. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Fully-managed, serverless extract-transform-load (ETL) service for developers, built by developers 1000s of customers and jobs AWS Glue
  • 6. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. OrchestrationData Catalog ETL Engine Automatic crawling Apache Hive Metastore compatible Integrated with AWS analytic services Discover Flexible scheduling Monitoring and alerting External integrations Deploy Apache Spark core Python and Scala Auto-generates ETL code Develop AWS Glue Components
  • 7. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Apache Spark is a distributed data processing engine for complex analytics. AWS Glue builds on the Apache Spark to offer ETL specific functionality. Spark Core: RDDs Spark DataFrames Glue DynamicFrames SparkSQL AWS Glue ETL Review: Apache Spark and AWS Glue ETL
  • 8. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Building Blocks Crawlers Jobs TriggersEntities Schedule ExternalEventsDependencies Conditions TimeoutRetriesControl
  • 9. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Building a usecase raw optimize optimized SLA reporting New variable mins variable mins Goal: compose jobs in DAG through dependencies In-practice: time-based workflows
  • 10. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. New features External Conditions Control Publish crawler and job notifications into CloudWatch events CloudWatch events to control downstream workflows ‘ANY’ and ‘AND’ operators in Trigger conditions Additional job states ’failed’, ‘stopped’, or ‘timeout’ Configure job timeout Job delay notifications On-demand cancel
  • 11. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Example event-driven workflow raw optimize optimized SLA reporting New
  • 12. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Workflow considerations • Incremental data processing • Job bookmarks to keep state • Job parameters to select new datasets • Job size • Unique versus One job per logical units of work • Multiple small jobs or one big job • Job parameters • Initial, Global, In-between jobs • Use Amazon S3 to pass parameters
  • 13. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Workflow considerations • Data processing unit • Number of DPUs • Adjusting DPUs • SLA • Job delays notifications • Timeouts • Error handling • Retry logic • Integration with 3rd party • Job re-run
  • 14. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Workflow monitoring—Performance • How is your dataset partitioned? • How is your application divided into jobs and stages? • Data is divided into partitions that are processed concurrently Driver Executors
  • 15. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Workflow monitoring—Metrics • Job metrics • CPU • Memory • Network • Executors, stages • Data movement • Use data points to adjust job parameters
  • 16. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Bring your own workflow engine Triggers Top-level AWS Glue job CloudWatch Events Jobs Lambda States SNS Config STEP
  • 17. Thank you! © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved. Santosh Chandrachood glue-feedback@amazon.com
  • 18. © 2018, Amazon Web Services, Inc. or its affiliates. All rights reserved.