Hadoop in the cloud with AWS' EMR

Quick intro to and walkthrough of the AWS Elastic Map Reduce (EMR) service. Part of a larger course at http://bit.ly/get-hadoop

Hadoop in the Cloud: AWS Elastic Map Reduce
• What is EMR?
• How does EMR compare to Hadoop?
• Use cases

EMR is an AWS Service
• Reviewing AWS basics first makes EMR easier to understand
• Infiniteskills offers a course!
– http://bit.ly/learn-aws
• AWS is constantly changing and evolving, so check the current documentation
– http://aws.amazon.com/documentation/elasticmapreduce/

EMR Overview
• Abstracts out cluster setup & management
– Integrated provisioning, tooling, debugging, monitoring
– AWS constantly tuning and optimizing
– Failed nodes automatically re-provisioned by AWS
• Reduced costs
– Clusters shut down automatically by default
– Excellent for sporadic MapReduce needs
• Integration with AWS
– Leverage cost-effective EC2 instances for processing, S3 for storage
– Monitoring done via CloudWatch
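
The cost argument for sporadic workloads can be sketched with simple arithmetic. This is only an illustration: the hourly rate, node count, and run schedule below are hypothetical placeholders, not AWS prices, and the function names are made up for this example.

```python
# Rough cost comparison: an always-on cluster vs. a transient cluster that
# shuts down after each job. All numbers passed in are hypothetical.

HOURS_PER_MONTH = 730  # approximate average number of hours in a month


def always_on_cost(nodes: int, hourly_rate: float) -> float:
    """Monthly cost of a cluster that runs around the clock."""
    return nodes * hourly_rate * HOURS_PER_MONTH


def transient_cost(nodes: int, hourly_rate: float,
                   runs_per_month: int, hours_per_run: float) -> float:
    """Monthly cost of a cluster that exists only while jobs are running."""
    return nodes * hourly_rate * runs_per_month * hours_per_run


if __name__ == "__main__":
    # Hypothetical workload: 5 nodes at 0.25 per node-hour, 10 two-hour jobs a month.
    print("always on:", always_on_cost(5, 0.25))
    print("transient:", transient_cost(5, 0.25, 10, 2))
```

The fewer hours per month the jobs actually run, the larger the gap between the two figures, which is the case the slide calls "sporadic MapReduce needs".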

EMR Architecture
[Diagram: master instance group (EC2), core instance group (EC2 nodes with HDFS), task instance group (EC2 nodes), and S3]
• Master group controls cluster
• Core group runs DataNode & TaskTracker daemons
• Task group runs tasks
• Can be added & removed
• S3 can be used for data input / output
• Master group coordinates core + task activities and manages cluster state
• Core + task instances read from / write to S3
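
The three instance groups above map onto the request that a client sends when launching a cluster. The sketch below builds such a request as a plain dictionary in the shape accepted by boto3's `run_job_flow`; it is an assumption-laden example, not the deck's own code. The instance type, release label, and helper names are placeholders chosen for illustration, and the role names are the EMR defaults, which may differ in a given account.

```python
from typing import Any, Dict, List


def build_instance_groups(core_count: int, task_count: int,
                          instance_type: str = "m5.xlarge") -> List[Dict[str, Any]]:
    """Describe the master, core and (optional) task instance groups."""
    groups = [
        # One master node controls the cluster.
        {"Name": "master", "InstanceRole": "MASTER",
         "InstanceType": instance_type, "InstanceCount": 1},
        # Core nodes hold HDFS data and run tasks.
        {"Name": "core", "InstanceRole": "CORE",
         "InstanceType": instance_type, "InstanceCount": core_count},
    ]
    if task_count > 0:
        # Task nodes only run tasks; they hold no HDFS data.
        groups.append({"Name": "task", "InstanceRole": "TASK",
                       "InstanceType": instance_type, "InstanceCount": task_count})
    return groups


def build_cluster_request(name: str, log_uri: str,
                          core_count: int, task_count: int) -> Dict[str, Any]:
    """Assemble keyword arguments for an EMR run_job_flow call."""
    return {
        "Name": name,
        "LogUri": log_uri,                      # S3 location for cluster logs
        "ReleaseLabel": "emr-6.15.0",           # assumption: any current release label
        "Instances": {
            "InstanceGroups": build_instance_groups(core_count, task_count),
            # False makes the cluster terminate once its steps have finished.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "JobFlowRole": "EMR_EC2_DefaultRole",   # assumption: default role names
        "ServiceRole": "EMR_DefaultRole",
    }


if __name__ == "__main__":
    request = build_cluster_request("demo", "s3://example-bucket/logs/", 2, 4)
    for group in request["Instances"]["InstanceGroups"]:
        print(group["InstanceRole"], group["InstanceCount"])
    # With boto3 installed and credentials configured, the request could be sent as:
    #   import boto3
    #   boto3.client("emr").run_job_flow(**request)
```

Keeping the request as data makes it easy to inspect before anything is launched or billed.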

EMR AWS Integration
• Datastore pull / push to
– RDS
– DynamoDB
– S3
• Derived data can be stored in Redshift
– Via AWS Data Pipeline
– Further post-processing
• Data can be pre-processed with Kinesis
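
Using S3 for job input and output might look like the following step definition. It is a minimal sketch under stated assumptions: it assumes a release where Hadoop streaming is launched through `command-runner.jar`, and the bucket paths, script names, and the `build_streaming_step` helper are hypothetical.

```python
from typing import Any, Dict


def build_streaming_step(name: str, mapper_uri: str, reducer_uri: str,
                         input_uri: str, output_uri: str) -> Dict[str, Any]:
    """Build a Hadoop streaming step that reads from and writes to S3."""
    for uri in (mapper_uri, reducer_uri, input_uri, output_uri):
        if not uri.startswith("s3://"):
            raise ValueError(f"expected an s3:// URI, got {uri!r}")

    # The scripts are shipped to the nodes via -files and then referenced
    # by their bare file names.
    mapper_file = mapper_uri.rsplit("/", 1)[-1]
    reducer_file = reducer_uri.rsplit("/", 1)[-1]

    return {
        "Name": name,
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "hadoop-streaming",
                "-files", f"{mapper_uri},{reducer_uri}",
                "-mapper", mapper_file,
                "-reducer", reducer_file,
                "-input", input_uri,     # data is read directly from S3
                "-output", output_uri,   # results are written back to S3
            ],
        },
    }


if __name__ == "__main__":
    step = build_streaming_step(
        "wordcount",
        "s3://example-bucket/code/mapper.py",    # hypothetical paths
        "s3://example-bucket/code/reducer.py",
        "s3://example-bucket/input/",
        "s3://example-bucket/output/run-1/",
    )
    print(step["HadoopJarStep"]["Args"])
```

A step like this would be passed in the `Steps` list when the cluster is created, so a transient cluster can start, process the S3 data, and shut down without anything being left in HDFS.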

What you give up with EMR
• Control
– Always 2-3 months behind Hadoop releases
– Cannot use CDH or HDP releases (although MapR is supported)
• Speed (if you’re not an AWS customer)
• Vendor lock-in

EMR Use Cases
• Already AWS customer
– Lots of data in S3 / DynamoDB / RDS
• Sporadic MapReduce needs
• Hadoop proofs of concept
• Ease of use
– Seamless, near-infinite scale
– Simple administration

Hadoop in the Cloud: AWS Elastic Map Reduce
• What is EMR?
• How does EMR compare to Hadoop?
• Benefits & downsides
• Use cases





Hadoop in the cloud with AWS' EMR

1. Hadoop in the Cloud: AWS Elastic Map Reduce
• What is EMR?
• How does EMR compare to Hadoop?
• Use cases

2. EMR is an AWS Service
• A review of AWS helps in understanding EMR
• InfiniteSkills offers a course!
– http://bit.ly/learn-aws
• AWS is constantly changing and evolving
http://aws.amazon.com/documentation/elasticmapreduce/

3. EMR Overview
• Abstracts out cluster setup & management
– Integrated provisioning, tooling, debugging, monitoring
– AWS constantly tuning and optimizing
– Failed nodes automatically re-provisioned by AWS
• Reduced costs
– Clusters shut down automatically by default
– Excellent for sporadic MapReduce needs
• Integration with AWS
– Leverage cost-effective EC2 instances for processing, S3 for storage
– Monitoring done via CloudWatch

4. EMR Architecture
[Diagram: Master Instance Group (EC2); Core Instance Group (EC2 instances with HDFS); Task Instance Group (EC2 instances); S3]
• Master group controls cluster
• Core group runs DataNode & TaskTracker daemons
• Task group runs tasks
• Can be added & removed
• S3 can be used for data input / output
• Master group coordinates core + task activities and manages cluster state
• Core + task instances read from / write to S3
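The three instance groups on this slide can be sketched as a cluster request. This is a minimal illustration, assuming the request shape used by boto3's EMR `run_job_flow` call (`InstanceGroups`, `InstanceRole`, `KeepJobFlowAliveWhenNoSteps`). The helper name `build_cluster_request`, the instance type, and the bucket name are hypothetical and not from the deck. The code only builds the request dictionary and never contacts AWS; a real launch would need further fields (for example a release label and IAM roles) that the deck does not cover.

```python
def build_cluster_request(name, core_count, task_count, log_bucket):
    """Build keyword arguments shaped like an EMR run_job_flow request.

    Hypothetical helper for illustration; nothing is sent to AWS.
    """
    def group(role, count, instance_type="m5.xlarge"):
        # instance_type is an assumed placeholder value
        return {
            "Name": f"{role.lower()}-group",
            "InstanceRole": role,  # MASTER, CORE or TASK
            "InstanceType": instance_type,
            "InstanceCount": count,
            "Market": "ON_DEMAND",
        }

    # One master controls the cluster; core nodes also hold HDFS data.
    groups = [group("MASTER", 1), group("CORE", core_count)]
    # Task nodes only run tasks, so the group is optional.
    if task_count > 0:
        groups.append(group("TASK", task_count))

    return {
        "Name": name,
        "LogUri": f"s3://{log_bucket}/emr-logs/",  # assumed bucket layout
        "Instances": {
            "InstanceGroups": groups,
            # False means the cluster shuts down once its steps finish.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
    }


request = build_cluster_request("demo-cluster", 2, 4, "example-bucket")
roles = [g["InstanceRole"] for g in request["Instances"]["InstanceGroups"]]
print(roles)
```

Making the task group optional mirrors the slide's point that task capacity can be added and removed, while the master and core groups are always present.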

5. EMR AWS Integration
• Datastore pull / push to
– RDS
– DynamoDB
– S3
• Derived data can be stored in Redshift
– Via AWS Data Pipeline
– Further post-processing
• Data can be pre-processed with Kinesis

6. What you give up with EMR
• Control
– Always 2-3 months behind Hadoop releases
– Cannot use CDH or HDP releases (although MapR is supported)
• Speed (if you’re not an AWS customer)
• Vendor lock-in

7. EMR Use Cases
• Already an AWS customer
– Lots of data in S3 / DynamoDB / RDS
• Sporadic MapReduce needs
• Hadoop proofs of concept
• Ease of use
– Seamless, near-infinite scale
– Simple administration

8. Hadoop in the Cloud: AWS Elastic Map Reduce
• What is EMR?
• How does EMR compare to Hadoop?
• Benefits & downsides
• Use cases
