SlideShare a Scribd company logo
1 of 41
Automate all your EMR related activities
Eitan Sela - System Architect
eitan.sela@weissbeerger.com
$ whoami
• "Hands-On" system Architect with more than 17 years of
experience with billing, banking, information security (DLP) and
Cloud IoT/Big Data applications.
• Big Data specialist – Hadoop, Spark, Hive and EMR on AWS.
• Work with vast AWS services, and with serverless projects
especially.
• Java development, scalability performance and stabilization
expert.
• Alexa skills developer.
• Love to share my experience in lectures and meetups.
What to expect from this session
• WeissBeerger use case – Aggregating raw orders and IoT data.
• Amazon EMR basics.
• Implementing ETLs with Spark.
• Submitting work to a Cluster.
• Provisioning scheduled transient EMR Clusters for ETLs jobs.
• Our new Slack Chabot for EMR, using Amazon Lex!
WeissBeerger use case – Aggregating raw orders and IoT data
Solution
• WeissBeerger bridges the gap between breweries, bars and
customers.
Benefits for the brewery
• Consumption Analytics.
• Dynamic Promotions.
• Beer Quality.
• Value creation.
• Beer penetration.
Benefits for the bar
• Real Time Consumption Tracking.
• Waste Reduction.
• Sales Growth.
How does it work?
• IoT (Pouring) – Beverage Analytics Hub.
• Point of sales – POS Vendors, via REST API, S3, DB, etc.
Aggregating raw point of sales orders and IoT data
Amazon EMR basics
Amazon EMR - Easily Run and Scale Apache Hadoop, Spark, HBase, Presto,
Hive, and other Big Data Frameworks
Amazon EMR – Create Cluster – Software and Steps
Amazon EMR – Create Cluster – Hardware
Amazon EMR – General Cluster Settings
Amazon EMR – Security
Implementing ETLs with Spark
Problem – Huge queries from MySQL to aggregative tables in Redshift
Solution - Implementing all ETL’s with PySpark
New data pipeline using EMR PySpark jobs – ELT rather than ETL
Submitting work to a Cluster
Launching Applications with spark-submit
./bin/spark-submit 
--jars jar1.jar,jar2.jar 
--py-files path/to/my/pymodule1.py, path/to/my/pymodule2.py
my_program.py arg1 arg2
• The spark-submit script in Spark’s bin directory is used to launch
applications on a cluster.
• It can use all of Spark’s supported cluster managers through a
uniform interface so you don’t have to configure your application
especially for each one.
EMR Steps - Submit Work to a Cluster
• You can submit work to a cluster by adding steps or by interactively
submitting Hadoop jobs to the master node.
• You can add steps to a cluster using the AWS Management Console,
the AWS CLI, or the Amazon EMR API.
• You can add step during cluster creation or to a running cluster.
EMR Steps – Job Types
• Custom Jar.
• Streaming Program.
• Spark App.
• Hive Program.
EMR Steps - Lifecycle
• Pending
• Cancelled (by user or API request)
• Running
• Completed / Failed.
EMR Steps – Add jobs to a running cluster
EMR Steps – View logs
WeissBeerger’s Spark ETL jobs submitted to EMR Cluster
Provisioning scheduled transient EMR Clusters for ETLs jobs
Requirements
• Run ETL using Spark on EMR cluster every 1 hour for one month
back.
• Input: MySQL or Hive (stg).
• Output: Hive (stg) or Redshift.
• Storage should be separated from the compute, so EMR clusters
should be transient.
• Multiple clusters should be able to run together.
• Fully automated and monitored.
Passing Spark Job steps parameters to Lambda input
• We created a simple json with all parameters required to add step to EMR cluster.
Monitoring EMR Steps with Lambda and Datadog
• We created a Lambda to sample all running EMR clusters for failed steps.
As more developers are developing PySpark Jobs…
Our new Slack Chabot for EMR, using Amazon Lex
Amazon Lex
• Conversational interfaces for your applications.
• Powered by the same deep learning technologies as Alexa.
• Amazon Lex provides the advanced deep learning functionalities of
automatic speech recognition (ASR) for converting speech to text,
and natural language understanding (NLU).
Amazon Lex - Use cases - Call Center Bots
awsbot - the chatbot that help you manage AWS resources
awsbot - Demo
awsbot - Demo - EMR Cluster is ready
Q & A
We Are Hiring!
Senior Data Scientist
Senior Designer (UI/UX)
Senior Full Stack Developer
Java Developer
Senior Manual QA
Director of Ops
BI Analyst
Data Management Analyst
Customer Success Manager
Senior BI Analyst

More Related Content

What's hot

開発者におくるサーバーレスモニタリング
開発者におくるサーバーレスモニタリング開発者におくるサーバーレスモニタリング
開発者におくるサーバーレスモニタリングAmazon Web Services Japan
 
Auto scaling using Amazon Web Services ( AWS )
Auto scaling using Amazon Web Services ( AWS )Auto scaling using Amazon Web Services ( AWS )
Auto scaling using Amazon Web Services ( AWS )Harish Ganesan
 
High Performance Computing on AWS
High Performance Computing on AWSHigh Performance Computing on AWS
High Performance Computing on AWSAmazon Web Services
 
Amazon Elastic Compute Cloud (EC2) - Module 2 Part 1 - AWSome Day 2017
Amazon Elastic Compute Cloud (EC2) - Module 2 Part 1 - AWSome Day 2017Amazon Elastic Compute Cloud (EC2) - Module 2 Part 1 - AWSome Day 2017
Amazon Elastic Compute Cloud (EC2) - Module 2 Part 1 - AWSome Day 2017Amazon Web Services
 
CircleCI vs. CodePipeline
CircleCI vs. CodePipelineCircleCI vs. CodePipeline
CircleCI vs. CodePipelineHonMarkHunt
 
クラウドのためのアーキテクチャ設計 - ベストプラクティス -
クラウドのためのアーキテクチャ設計 - ベストプラクティス - クラウドのためのアーキテクチャ設計 - ベストプラクティス -
クラウドのためのアーキテクチャ設計 - ベストプラクティス - SORACOM, INC
 
AWS Batch Fargate対応は何をもたらすか
AWS Batch Fargate対応は何をもたらすかAWS Batch Fargate対応は何をもたらすか
AWS Batch Fargate対応は何をもたらすかShun Fukazawa
 
AWS Black Belt Techシリーズ Amazon Kinesis
AWS Black Belt Techシリーズ  Amazon KinesisAWS Black Belt Techシリーズ  Amazon Kinesis
AWS Black Belt Techシリーズ Amazon KinesisAmazon Web Services Japan
 
AWS Black Belt Online Seminar Amazon Redshift
AWS Black Belt Online Seminar Amazon RedshiftAWS Black Belt Online Seminar Amazon Redshift
AWS Black Belt Online Seminar Amazon RedshiftAmazon Web Services Japan
 
202201 AWS Black Belt Online Seminar Apache Spark Performnace Tuning for AWS ...
202201 AWS Black Belt Online Seminar Apache Spark Performnace Tuning for AWS ...202201 AWS Black Belt Online Seminar Apache Spark Performnace Tuning for AWS ...
202201 AWS Black Belt Online Seminar Apache Spark Performnace Tuning for AWS ...Amazon Web Services Japan
 
サーバーレスでアンケートフォームを作ってみた
サーバーレスでアンケートフォームを作ってみたサーバーレスでアンケートフォームを作ってみた
サーバーレスでアンケートフォームを作ってみたryutakatori
 
Using AWS Batch and AWS Step Functions to Design and Run High-Throughput Work...
Using AWS Batch and AWS Step Functions to Design and Run High-Throughput Work...Using AWS Batch and AWS Step Functions to Design and Run High-Throughput Work...
Using AWS Batch and AWS Step Functions to Design and Run High-Throughput Work...Amazon Web Services
 
[AWSマイスターシリーズ] AWS OpsWorks
[AWSマイスターシリーズ] AWS OpsWorks[AWSマイスターシリーズ] AWS OpsWorks
[AWSマイスターシリーズ] AWS OpsWorksAmazon Web Services Japan
 
RDB脳でCassandra / MSAを始めた僕達が、分散Drivenなトランザクション管理にたどり着くまで / A journey to a...
RDB脳でCassandra / MSAを始めた僕達が、分散Drivenなトランザクション管理にたどり着くまで / A journey to a...RDB脳でCassandra / MSAを始めた僕達が、分散Drivenなトランザクション管理にたどり着くまで / A journey to a...
RDB脳でCassandra / MSAを始めた僕達が、分散Drivenなトランザクション管理にたどり着くまで / A journey to a...Works Applications
 
AWS 비용 효율화를 고려한 Reserved Instance + Savings Plan 옵션 - 박윤 어카운트 매니저 :: AWS Game...
AWS 비용 효율화를 고려한 Reserved Instance + Savings Plan 옵션 - 박윤 어카운트 매니저 :: AWS Game...AWS 비용 효율화를 고려한 Reserved Instance + Savings Plan 옵션 - 박윤 어카운트 매니저 :: AWS Game...
AWS 비용 효율화를 고려한 Reserved Instance + Savings Plan 옵션 - 박윤 어카운트 매니저 :: AWS Game...Amazon Web Services Korea
 
[最新バージョンの情報がDescription欄にございます]AWS Black Belt Online Seminar 2018 Amazon Connect
[最新バージョンの情報がDescription欄にございます]AWS Black Belt Online Seminar 2018 Amazon Connect[最新バージョンの情報がDescription欄にございます]AWS Black Belt Online Seminar 2018 Amazon Connect
[最新バージョンの情報がDescription欄にございます]AWS Black Belt Online Seminar 2018 Amazon ConnectAmazon Web Services Japan
 

What's hot (20)

開発者におくるサーバーレスモニタリング
開発者におくるサーバーレスモニタリング開発者におくるサーバーレスモニタリング
開発者におくるサーバーレスモニタリング
 
Auto scaling using Amazon Web Services ( AWS )
Auto scaling using Amazon Web Services ( AWS )Auto scaling using Amazon Web Services ( AWS )
Auto scaling using Amazon Web Services ( AWS )
 
High Performance Computing on AWS
High Performance Computing on AWSHigh Performance Computing on AWS
High Performance Computing on AWS
 
Amazon Elastic Compute Cloud (EC2) - Module 2 Part 1 - AWSome Day 2017
Amazon Elastic Compute Cloud (EC2) - Module 2 Part 1 - AWSome Day 2017Amazon Elastic Compute Cloud (EC2) - Module 2 Part 1 - AWSome Day 2017
Amazon Elastic Compute Cloud (EC2) - Module 2 Part 1 - AWSome Day 2017
 
CircleCI vs. CodePipeline
CircleCI vs. CodePipelineCircleCI vs. CodePipeline
CircleCI vs. CodePipeline
 
クラウドのためのアーキテクチャ設計 - ベストプラクティス -
クラウドのためのアーキテクチャ設計 - ベストプラクティス - クラウドのためのアーキテクチャ設計 - ベストプラクティス -
クラウドのためのアーキテクチャ設計 - ベストプラクティス -
 
AWS Batch Fargate対応は何をもたらすか
AWS Batch Fargate対応は何をもたらすかAWS Batch Fargate対応は何をもたらすか
AWS Batch Fargate対応は何をもたらすか
 
Microsoft licensing on AWS
Microsoft licensing on AWSMicrosoft licensing on AWS
Microsoft licensing on AWS
 
AWS Black Belt Techシリーズ Amazon Kinesis
AWS Black Belt Techシリーズ  Amazon KinesisAWS Black Belt Techシリーズ  Amazon Kinesis
AWS Black Belt Techシリーズ Amazon Kinesis
 
AWS ELB
AWS ELBAWS ELB
AWS ELB
 
AWS Black Belt Online Seminar Amazon Redshift
AWS Black Belt Online Seminar Amazon RedshiftAWS Black Belt Online Seminar Amazon Redshift
AWS Black Belt Online Seminar Amazon Redshift
 
202201 AWS Black Belt Online Seminar Apache Spark Performnace Tuning for AWS ...
202201 AWS Black Belt Online Seminar Apache Spark Performnace Tuning for AWS ...202201 AWS Black Belt Online Seminar Apache Spark Performnace Tuning for AWS ...
202201 AWS Black Belt Online Seminar Apache Spark Performnace Tuning for AWS ...
 
サーバーレスでアンケートフォームを作ってみた
サーバーレスでアンケートフォームを作ってみたサーバーレスでアンケートフォームを作ってみた
サーバーレスでアンケートフォームを作ってみた
 
Using AWS Batch and AWS Step Functions to Design and Run High-Throughput Work...
Using AWS Batch and AWS Step Functions to Design and Run High-Throughput Work...Using AWS Batch and AWS Step Functions to Design and Run High-Throughput Work...
Using AWS Batch and AWS Step Functions to Design and Run High-Throughput Work...
 
[AWSマイスターシリーズ] AWS OpsWorks
[AWSマイスターシリーズ] AWS OpsWorks[AWSマイスターシリーズ] AWS OpsWorks
[AWSマイスターシリーズ] AWS OpsWorks
 
RDB脳でCassandra / MSAを始めた僕達が、分散Drivenなトランザクション管理にたどり着くまで / A journey to a...
RDB脳でCassandra / MSAを始めた僕達が、分散Drivenなトランザクション管理にたどり着くまで / A journey to a...RDB脳でCassandra / MSAを始めた僕達が、分散Drivenなトランザクション管理にたどり着くまで / A journey to a...
RDB脳でCassandra / MSAを始めた僕達が、分散Drivenなトランザクション管理にたどり着くまで / A journey to a...
 
AWSではじめるDNSSEC
AWSではじめるDNSSECAWSではじめるDNSSEC
AWSではじめるDNSSEC
 
AWS 비용 효율화를 고려한 Reserved Instance + Savings Plan 옵션 - 박윤 어카운트 매니저 :: AWS Game...
AWS 비용 효율화를 고려한 Reserved Instance + Savings Plan 옵션 - 박윤 어카운트 매니저 :: AWS Game...AWS 비용 효율화를 고려한 Reserved Instance + Savings Plan 옵션 - 박윤 어카운트 매니저 :: AWS Game...
AWS 비용 효율화를 고려한 Reserved Instance + Savings Plan 옵션 - 박윤 어카운트 매니저 :: AWS Game...
 
AWS 101
AWS 101AWS 101
AWS 101
 
[最新バージョンの情報がDescription欄にございます]AWS Black Belt Online Seminar 2018 Amazon Connect
[最新バージョンの情報がDescription欄にございます]AWS Black Belt Online Seminar 2018 Amazon Connect[最新バージョンの情報がDescription欄にございます]AWS Black Belt Online Seminar 2018 Amazon Connect
[最新バージョンの情報がDescription欄にございます]AWS Black Belt Online Seminar 2018 Amazon Connect
 

Similar to Automate all your EMR related activities

Apache Spark and the Hadoop Ecosystem on AWS
Apache Spark and the Hadoop Ecosystem on AWSApache Spark and the Hadoop Ecosystem on AWS
Apache Spark and the Hadoop Ecosystem on AWSAmazon Web Services
 
Apache Hadoop and Spark on AWS: Getting started with Amazon EMR - Pop-up Loft...
Apache Hadoop and Spark on AWS: Getting started with Amazon EMR - Pop-up Loft...Apache Hadoop and Spark on AWS: Getting started with Amazon EMR - Pop-up Loft...
Apache Hadoop and Spark on AWS: Getting started with Amazon EMR - Pop-up Loft...Amazon Web Services
 
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMR
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMRSpark and the Hadoop Ecosystem: Best Practices for Amazon EMR
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMRAmazon Web Services
 
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMR
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMRSpark and the Hadoop Ecosystem: Best Practices for Amazon EMR
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMRAmazon Web Services
 
Building Machine Learning models with Apache Spark and Amazon SageMaker | AWS...
Building Machine Learning models with Apache Spark and Amazon SageMaker | AWS...Building Machine Learning models with Apache Spark and Amazon SageMaker | AWS...
Building Machine Learning models with Apache Spark and Amazon SageMaker | AWS...Amazon Web Services
 
Lighting your Big Data Fire with Apache Spark
Lighting your Big Data Fire with Apache SparkLighting your Big Data Fire with Apache Spark
Lighting your Big Data Fire with Apache SparkAmazon Web Services
 
Cloud Native Data Pipelines (DataEngConf SF 2017)
Cloud Native Data Pipelines (DataEngConf SF 2017)Cloud Native Data Pipelines (DataEngConf SF 2017)
Cloud Native Data Pipelines (DataEngConf SF 2017)Sid Anand
 
Aws-What You Need to Know_Simon Elisha
Aws-What You Need to Know_Simon ElishaAws-What You Need to Know_Simon Elisha
Aws-What You Need to Know_Simon ElishaHelen Rogers
 
Design patterns and best practices for data analytics with amazon emr (ABD305)
Design patterns and best practices for data analytics with amazon emr (ABD305)Design patterns and best practices for data analytics with amazon emr (ABD305)
Design patterns and best practices for data analytics with amazon emr (ABD305)Amazon Web Services
 
AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09Chris Purrington
 
Aws re invent 2018 recap
Aws re invent 2018 recapAws re invent 2018 recap
Aws re invent 2018 recapCloudHesive
 
Best Practices for Using Apache Spark on AWS
Best Practices for Using Apache Spark on AWSBest Practices for Using Apache Spark on AWS
Best Practices for Using Apache Spark on AWSAmazon Web Services
 
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)Spark Summit
 
Introducing Amazon EMR Release 5.0 - August 2016 Monthly Webinar Series
Introducing Amazon EMR Release 5.0 - August 2016 Monthly Webinar SeriesIntroducing Amazon EMR Release 5.0 - August 2016 Monthly Webinar Series
Introducing Amazon EMR Release 5.0 - August 2016 Monthly Webinar SeriesAmazon Web Services
 
Cloud & Native Cloud for Managers
Cloud & Native Cloud for ManagersCloud & Native Cloud for Managers
Cloud & Native Cloud for ManagersEitan Sela
 
Amazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon Web Services
 
Data Science & Best Practices for Apache Spark on Amazon EMR
Data Science & Best Practices for Apache Spark on Amazon EMRData Science & Best Practices for Apache Spark on Amazon EMR
Data Science & Best Practices for Apache Spark on Amazon EMRAmazon Web Services
 
AWS April 2016 Webinar Series - Best Practices for Apache Spark on AWS
AWS April 2016 Webinar Series - Best Practices for Apache Spark on AWSAWS April 2016 Webinar Series - Best Practices for Apache Spark on AWS
AWS April 2016 Webinar Series - Best Practices for Apache Spark on AWSAmazon Web Services
 
Best Practices for Managing Hadoop Framework Based Workloads (on Amazon EMR) ...
Best Practices for Managing Hadoop Framework Based Workloads (on Amazon EMR) ...Best Practices for Managing Hadoop Framework Based Workloads (on Amazon EMR) ...
Best Practices for Managing Hadoop Framework Based Workloads (on Amazon EMR) ...Amazon Web Services
 

Similar to Automate all your EMR related activities (20)

Apache Spark and the Hadoop Ecosystem on AWS
Apache Spark and the Hadoop Ecosystem on AWSApache Spark and the Hadoop Ecosystem on AWS
Apache Spark and the Hadoop Ecosystem on AWS
 
Apache Hadoop and Spark on AWS: Getting started with Amazon EMR - Pop-up Loft...
Apache Hadoop and Spark on AWS: Getting started with Amazon EMR - Pop-up Loft...Apache Hadoop and Spark on AWS: Getting started with Amazon EMR - Pop-up Loft...
Apache Hadoop and Spark on AWS: Getting started with Amazon EMR - Pop-up Loft...
 
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMR
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMRSpark and the Hadoop Ecosystem: Best Practices for Amazon EMR
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMR
 
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMR
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMRSpark and the Hadoop Ecosystem: Best Practices for Amazon EMR
Spark and the Hadoop Ecosystem: Best Practices for Amazon EMR
 
Building Machine Learning models with Apache Spark and Amazon SageMaker | AWS...
Building Machine Learning models with Apache Spark and Amazon SageMaker | AWS...Building Machine Learning models with Apache Spark and Amazon SageMaker | AWS...
Building Machine Learning models with Apache Spark and Amazon SageMaker | AWS...
 
Lighting your Big Data Fire with Apache Spark
Lighting your Big Data Fire with Apache SparkLighting your Big Data Fire with Apache Spark
Lighting your Big Data Fire with Apache Spark
 
Cloud Native Data Pipelines (DataEngConf SF 2017)
Cloud Native Data Pipelines (DataEngConf SF 2017)Cloud Native Data Pipelines (DataEngConf SF 2017)
Cloud Native Data Pipelines (DataEngConf SF 2017)
 
Aws-What You Need to Know_Simon Elisha
Aws-What You Need to Know_Simon ElishaAws-What You Need to Know_Simon Elisha
Aws-What You Need to Know_Simon Elisha
 
Design patterns and best practices for data analytics with amazon emr (ABD305)
Design patterns and best practices for data analytics with amazon emr (ABD305)Design patterns and best practices for data analytics with amazon emr (ABD305)
Design patterns and best practices for data analytics with amazon emr (ABD305)
 
AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09
 
Aws re invent 2018 recap
Aws re invent 2018 recapAws re invent 2018 recap
Aws re invent 2018 recap
 
Best Practices for Using Apache Spark on AWS
Best Practices for Using Apache Spark on AWSBest Practices for Using Apache Spark on AWS
Best Practices for Using Apache Spark on AWS
 
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
A Big Data Lake Based on Spark for BBVA Bank-(Oscar Mendez, STRATIO)
 
Introducing Amazon EMR Release 5.0 - August 2016 Monthly Webinar Series
Introducing Amazon EMR Release 5.0 - August 2016 Monthly Webinar SeriesIntroducing Amazon EMR Release 5.0 - August 2016 Monthly Webinar Series
Introducing Amazon EMR Release 5.0 - August 2016 Monthly Webinar Series
 
Cloud & Native Cloud for Managers
Cloud & Native Cloud for ManagersCloud & Native Cloud for Managers
Cloud & Native Cloud for Managers
 
Amazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best PracticesAmazon EMR Deep Dive & Best Practices
Amazon EMR Deep Dive & Best Practices
 
Data Science & Best Practices for Apache Spark on Amazon EMR
Data Science & Best Practices for Apache Spark on Amazon EMRData Science & Best Practices for Apache Spark on Amazon EMR
Data Science & Best Practices for Apache Spark on Amazon EMR
 
Managing Your Cloud Assets
Managing Your Cloud AssetsManaging Your Cloud Assets
Managing Your Cloud Assets
 
AWS April 2016 Webinar Series - Best Practices for Apache Spark on AWS
AWS April 2016 Webinar Series - Best Practices for Apache Spark on AWSAWS April 2016 Webinar Series - Best Practices for Apache Spark on AWS
AWS April 2016 Webinar Series - Best Practices for Apache Spark on AWS
 
Best Practices for Managing Hadoop Framework Based Workloads (on Amazon EMR) ...
Best Practices for Managing Hadoop Framework Based Workloads (on Amazon EMR) ...Best Practices for Managing Hadoop Framework Based Workloads (on Amazon EMR) ...
Best Practices for Managing Hadoop Framework Based Workloads (on Amazon EMR) ...
 

Recently uploaded

EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEEVICTOR MAESTRE RAMIREZ
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024StefanoLambiase
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfAlina Yurenko
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Andreas Granig
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...gurkirankumar98700
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxTier1 app
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureDinusha Kumarasiri
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based projectAnoyGreter
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran
 

Recently uploaded (20)

Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
Cloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEECloud Data Center Network Construction - IEEE
Cloud Data Center Network Construction - IEEE
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
Dealing with Cultural Dispersion — Stefano Lambiase — ICSE-SEIS 2024
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
 
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdfGOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
GOING AOT WITH GRAALVM – DEVOXX GREECE.pdf
 
Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024Automate your Kamailio Test Calls - Kamailio World 2024
Automate your Kamailio Test Calls - Kamailio World 2024
 
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
(Genuine) Escort Service Lucknow | Starting ₹,5K To @25k with A/C 🧑🏽‍❤️‍🧑🏻 89...
 
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptxKnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
KnowAPIs-UnknownPerf-jaxMainz-2024 (1).pptx
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort ServiceHot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
Hot Sexy call girls in Patel Nagar🔝 9953056974 🔝 escort Service
 
Implementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with AzureImplementing Zero Trust strategy with Azure
Implementing Zero Trust strategy with Azure
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
MYjobs Presentation Django-based project
MYjobs Presentation Django-based projectMYjobs Presentation Django-based project
MYjobs Presentation Django-based project
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalm
 

Automate all your EMR related activities

  • 1. Automate all your EMR related activities Eitan Sela - System Architect eitan.sela@weissbeerger.com
  • 2. $ whoami • "Hands-On" system Architect with more than 17 years of experience with billing, banking, information security (DLP) and Cloud IoT/Big Data applications. • Big Data specialist – Hadoop, Spark, Hive and EMR on AWS. • Work with vast AWS services, and with serverless projects especially. • Java development, scalability performance and stabilization expert. • Alexa skills developer. • Love to share my experience in lectures and meetups.
  • 3. What to expect from this session • WeissBeerger use case – Aggregating raw orders and IoT data. • Amazon EMR basics. • Implementing ETLs with Spark. • Submitting work to a Cluster. • Provisioning scheduled transient EMR Clusters for ETLs jobs. • Our new Slack Chabot for EMR, using Amazon Lex!
  • 4. WeissBeerger use case – Aggregating raw orders and IoT data
  • 5. Solution • WeissBeerger bridges the gap between breweries, bars and customers.
  • 6. Benefits for the brewery • Consumption Analytics. • Dynamic Promotions. • Beer Quality. • Value creation. • Beer penetration.
  • 7. Benefits for the bar • Real Time Consumption Tracking. • Waste Reduction. • Sales Growth.
  • 8. How does it work? • IoT (Pouring) – Beverage Analytics Hub. • Point of sales – POS Vendors, via REST API, S3, DB, etc.
  • 9. Aggregating raw point of sales orders and IoT data
  • 11. Amazon EMR - Easily Run and Scale Apache Hadoop, Spark, HBase, Presto, Hive, and other Big Data Frameworks
  • 12. Amazon EMR – Create Cluster – Software and Steps
  • 13. Amazon EMR – Create Cluster – Hardware
  • 14. Amazon EMR – General Cluster Settings
  • 15. Amazon EMR – Security
  • 17. Problem – Huge queries from MySQL to aggregative tables in Redshift
  • 18. Solution - Implementing all ETL’s with PySpark
  • 19. New data pipeline using EMR PySpark jobs – ELT rather than ETL
  • 20. Submitting work to a Cluster
  • 21. Launching Applications with spark-submit ./bin/spark-submit --jars jar1.jar,jar2.jar --py-files path/to/my/pymodule1.py, path/to/my/pymodule2.py my_program.py arg1 arg2 • The spark-submit script in Spark’s bin directory is used to launch applications on a cluster. • It can use all of Spark’s supported cluster managers through a uniform interface so you don’t have to configure your application especially for each one.
  • 22. EMR Steps - Submit Work to a Cluster • You can submit work to a cluster by adding steps or by interactively submitting Hadoop jobs to the master node. • You can add steps to a cluster using the AWS Management Console, the AWS CLI, or the Amazon EMR API. • You can add step during cluster creation or to a running cluster.
  • 23. EMR Steps – Job Types • Custom Jar. • Streaming Program. • Spark App. • Hive Program.
  • 24. EMR Steps - Lifecycle • Pending • Cancelled (by user or API request) • Running • Completed / Failed.
  • 25. EMR Steps – Add jobs to a running cluster
  • 26. EMR Steps – View logs
  • 27. WeissBeerger’s Spark ETL jobs submitted to EMR Cluster
  • 28. Provisioning scheduled transient EMR Clusters for ETLs jobs
  • 29. Requirements • Run ETL using Spark on EMR cluster every 1 hour for one month back. • Input: MySQL or Hive (stg). • Output: Hive (stg) or Redshift. • Storage should be separated from the compute, so EMR clusters should be transient. • Multiple clusters should be able to run together. • Fully automated and monitored.
  • 30.
  • 31. Passing Spark Job steps parameters to Lambda input • We created a simple json with all parameters required to add step to EMR cluster.
  • 32. Monitoring EMR Steps with Lambda and Datadog • We created a Lambda to sample all running EMR clusters for failed steps.
  • 33. As more developers are developing PySpark Jobs…
  • 34. Our new Slack Chabot for EMR, using Amazon Lex
  • 35. Amazon Lex • Conversational interfaces for your applications. • Powered by the same deep learning technologies as Alexa. • Amazon Lex provides the advanced deep learning functionalities of automatic speech recognition (ASR) for converting speech to text, and natural language understanding (NLU).
  • 36. Amazon Lex - Use cases - Call Center Bots
  • 37. awsbot - the chatbot that help you manage AWS resources
  • 39. awsbot - Demo - EMR Cluster is ready
  • 40. Q & A
  • 41. We Are Hiring! Senior Data Scientist Senior Designer (UI/UX) Senior Full Stack Developer Java Developer Senior Manual QA Director of Ops BI Analyst Data Management Analyst Customer Success Manager Senior BI Analyst