The document provides an overview of Amazon Elastic MapReduce (EMR), a web service that allows users to easily and cost-effectively process large amounts of data using a Hadoop cluster. It discusses how EMR handles provisioning Hadoop clusters on EC2 instances, installing and configuring Hadoop, monitoring jobs, and providing debugging tools. It also covers how to get started with EMR, including signing up for accounts, installing the command line client, and filling out credentials. Tips are provided on customizing clusters using bootstrap actions and developing MapReduce applications on EMR.
The Future is Now: Leveraging the Cloud with Ruby - Robert Dempsey
My presentation from the Ruby Hoedown on cloud computing and how Ruby developers can take advantage of cloud services to build scalable web applications.
Castles in the Cloud: Developing with Google App Engine - catherinewall
App Engine offers developers the opportunity to deploy systems on Google's robust and scalable server farms. App Engine provides a higher-level platform than Amazon Web Services, with automated scaling and true pay-per-use billing.
The poster child of App Engine, "BuddyPoke", has gained over thirty million users.
With App Engine, Google has released the first public API to BigTable, its planetary datastore, which performs successfully at petabyte scale across diverse applications from search to finance to Google Earth.
This presentation will cover App Engine's features and limitations, and how to exploit this new and evolving platform.
Amazon Elastic Compute Cloud (Amazon EC2) provides resizable compute capacity in the cloud and is often the starting point for your first week using AWS. This session will introduce these concepts, along with the fundamentals of EC2, by employing an agile approach that is made possible by the cloud. Attendees will experience the reality of what a first week on EC2 looks like from the perspective of someone deploying an actual application on EC2. You will follow them as they progress from deploying their entire application from an EC2 AMI on day 1 to more advanced features and patterns available in EC2 by day 5. Throughout the process we will identify cloud best practices that can be applied to your first week on EC2 and beyond.
How to Upgrade Your Database Plan on Heroku and Rails Setup? - Katy Slemon
Heroku is a one-stop solution for upgrading your database plan and deploying it using RoR. Let's walk through the steps to upgrade your database plan on Heroku with a Rails setup.
Slides for an introductory workshop on cloud computing for a web app developer audience at FOWA Miami 09 (http://events.carsonified.com/fowa/2009/miami/workshops#workshop_36)
Master Chef class: learn how to quickly cook delightful CQ/AEM infrastructures - François Le Droff
ConnectCon 2014 presentation
François and Nicolas share their latest experiment coding AEM 6 infrastructure with Chef. Learn how to start from bare servers (virtual, physical, or cloud) and turn them, in a matter of minutes, into a production-ready AEM 6 infrastructure: think author and publish farms, optional SSL, dispatcher, and clustering with MongoDB. Along the way you'll be given a comprehensive overview of Chef resources and techniques enabling you to accelerate, scale, simplify, and secure your development and release workflow.
Optimize Site Deployments with Drush (DrupalCamp WNY 2011) - Jon Peck
When a site goes live, are you crossing your fingers or are you confident that everything is configured? Are you looking to manage and optimize site deployments like any other operational process? Do you find it impossible to create development, test and production environments that act the same every time? Do you have a custom set of modules or configurations that you rely on for all your sites?
This session will teach you how to optimize your site deployments with open tools such as drush, drush make, and features, while leveraging version control systems such as Subversion and Git. Beyond these projects, the session will show you how to develop your own custom modules for consistent and precise deployments, including variables, users, content types, nodes, imagecache presets, menus, blocks, theme configuration, and more.
Using these techniques you can automate and optimize your deployment procedures, giving you technical flexibility and saving valuable time.
AWS Cloud Design Patterns (CDPs) are general, repeatable solutions to commonly occurring problems in cloud architecture. In this session, we introduce CDPs and explain how you can apply them in practical scenarios such as photo sharing, e-commerce, and web site campaigns.
Deep Learning for Developers (December 2017) - Julien SIMON
Talk @ Code Europe, Poland, December 5th, 2017
- An introduction to Deep Learning
- An introduction to Apache MXNet
- Demos using Jupyter notebooks on Amazon SageMaker
- Resources
Deep Dive on Amazon Elastic Container Service (ECS) | AWS Summit Tel Aviv 2019 - AWS Summits
This talk will dive deep into Amazon ECS. We will take a look at recently added ECS features, like target tracking autoscaling, service discovery, daemon scheduling, task networking, and GPU pinning, including live demos!
A case analysis exploring eBay's strategic options. Comparisons are made against Amazon.com's 1500%+ growth over the past decade versus eBay's 50%+ growth, covering revenues, margins, ownership of key assets, supply chain, and more.
Managed services such as AWS Lambda and API Gateway allow developers to focus on value-adding development instead of IT heavy lifting. This workshop introduces how to build a simple REST blog backend using AWS technologies and the Serverless Framework.
Taking a look at different cloud providers and how easy it is to deploy a basic Grails application to them. Created for the http://sfgrails.com meetup Feb 2011.
Scaling Drupal horizontally and in the cloud - Vladimir Ilic
Vancouver Drupal group presentation for April 25, 2013.
How to deploy Drupal on:
- multiple web servers,
- multiple web and database servers,
and how to join all that together and deploy the site on the Amazon cloud (Virtual Private Cloud) in:
- one availability zone, or
- multiple availability zones.
The session covers what you need in order to get Drupal deployed on separate servers, the issues and concerns involved, and how to solve them.
(SDD420) Amazon WorkSpaces: Advanced Topics and Deep Dive | AWS re:Invent 2014 - Amazon Web Services
Amazon WorkSpaces is an enterprise desktop computing service in the cloud. In this session, we dive deep into configuration, administration, and advanced networking topics for WorkSpaces. We also discuss integrating WorkSpaces with your corporate Active Directory and best practices for enabling your WorkSpaces to access resources on your corporate intranet.
(BDT208) A Technical Introduction to Amazon Elastic MapReduce - Amazon Web Services
"Amazon EMR provides a managed framework which makes it easy, cost effective, and secure to run data processing frameworks such as Apache Hadoop, Apache Spark, and Presto on AWS. In this session, you learn the key design principles behind running these frameworks on the cloud and the feature set that Amazon EMR offers. We discuss the benefits of decoupling compute and storage and strategies to take advantage of the scale and the parallelism that the cloud offers, while lowering costs. Additionally, you hear from AOL’s Senior Software Engineer on how they used these strategies to migrate their Hadoop workloads to the AWS cloud and lessons learned along the way.
In this session, you learn the benefits of decoupling storage and compute and allowing them to scale independently; how to run Hadoop, Spark, Presto, and other supported Hadoop applications on Amazon EMR; how to use Amazon S3 as a persistent data store and process data directly from Amazon S3; deployment strategies and how to avoid common mistakes when deploying at scale; and how to use Spot Instances to scale your transient infrastructure effectively.
Business intelligence is often described as a set of methodologies and technologies that transform raw data into meaningful and useful information for business purposes. But this simple description hides many technical challenges IT teams struggle with. This session will show how to build business intelligence applications leveraging AWS, from raw data import, consumption, and storage through to information production. We will also cover best practices for services such as Amazon Redshift and Amazon RDS, and how to use applications such as SAP HANA, Jaspersoft, and others.
Continuous Integration and Deployment Best Practices on AWS (ARC307) | AWS re... - Amazon Web Services
With AWS, companies now have the ability to develop and run their applications with speed and flexibility like never before. Working with an infrastructure that can be 100 percent API driven enables businesses to use lean methodologies and realize these benefits. This in turn leads to greater success for those who make use of these practices. In this session, we talk about some key concepts and design patterns for continuous deployment and continuous integration, two elements of lean development of applications and infrastructures.
The purpose of this paper is to demonstrate that it is possible to have an Odoo deployment that costs less than $100/month for 50 concurrent users. Moreover, the system will be always available, fault-tolerant, and highly scalable, all thanks to its cloud architecture.
Serverless in production, an experience report (linuxing in london) - Yan Cui
AWS Lambda has changed the way we deploy and run software, but this new serverless paradigm has brought new challenges to old problems: how do you test a cloud-hosted function locally? How do you monitor them? What about logging and config management? And how do we start migrating from existing architectures?
In this talk Yan and Scott will discuss solutions to these challenges by drawing from real-world experience running Lambda in production and migrating from an existing monolithic architecture.
Serverless in production, an experience report (Going Serverless) - Yan Cui
In this talk Yan Cui shares his experience of migrating an existing monolithic architecture for a social network to AWS Lambda, how it empowered a small team to deliver features quickly, and how they addressed operational concerns such as CI/CD, logging, monitoring, and config management.
Analytics on AWS with Amazon Redshift, Amazon QuickSight, and Amazon Machine ... - Amazon Web Services
AWS has a large and growing portfolio of big data management and analytics services, designed to be integrated into solution architectures that meet the needs of your business. In this session, we look at analytics through the eyes of a business intelligence analyst, a data scientist, and an application developer, and we explore how to quickly leverage Amazon Redshift, Amazon QuickSight, RStudio, and Amazon Machine Learning to create powerful, yet straightforward, business solutions.
Speaker:
Paul Armstrong, Solutions Architect, Amazon Web Services
Serverless in production, an experience report (JeffConf) - Yan Cui
In this talk Yan Cui shares his experience of migrating an existing monolithic architecture for a social network to AWS Lambda, how it empowered a small team to deliver features quickly, and how they addressed operational concerns such as CI/CD, logging, monitoring, and config management.
Ingest, Transform & Visualize with Amazon Web Services - BigDataCamp
Startups and enterprises derive actionable insights from a myriad of data sources that are growing rapidly and must be processed quickly. While most organizations understand the pivotal role that data can play for their business, the process of transforming data and analytics from a concept to an actual business driver is often less clear. Arun will talk about how organizations can determine what they need from their IT infrastructure to turn raw data into valuable answers and insights.
Climate Impact of Software Testing at Nordic Testing Days - Kari Kakkonen
My slides from Nordic Testing Days, 6.6.2024.
The talk discusses the climate impact and sustainability of software testing. ICT and testing must carry their part of the global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Sustainability can be added to the quality characteristics and then measured continuously. Test environments can be used less, at a smaller scale, and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
Welcome to the first live UiPath Community Day Dubai! Join us for this unique occasion to meet our local and global UiPath Community and leaders. You will get a full view of the MEA region's automation landscape and the AI-powered automation technology capabilities of UiPath. Also, hosted by our local partner Marc Ellis, you will enjoy a half-day packed with industry insights and networking with automation peers.
📕 Curious about our agenda? Wait no more!
10:00 Welcome note - UiPath Community in Dubai
Lovely Sinha, UiPath Community Chapter Leader, UiPath MVPx3, Hyper-automation Consultant, First Abu Dhabi Bank
10:20 A UiPath cross-region MEA overview
Ashraf El Zarka, VP and Managing Director MEA, UiPath
10:35 Customer Success Journey
Deepthi Deepak, Head of Intelligent Automation CoE, First Abu Dhabi Bank
11:15 The UiPath approach to GenAI with our three principles: improve accuracy, supercharge productivity, and automate more
Boris Krumrey, Global VP, Automation Innovation, UiPath
12:15 Discover how Marc Ellis leverages tech-driven solutions in recruitment and managed services.
Brendan Lingam, Director of Sales and Business Development, Marc Ellis
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -... - DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses.
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Essentials of Automations: The Art of Triggers and Actions in FME - Safe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Transcript: Selling digital books in 2024: Insights from industry leaders - T... - BookNet Canada
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
PHP Frameworks: I want to break free (IPC Berlin 2024) - Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards more flexible and future-proof PHP development.
The Art of the Pitch: WordPress Relationships and Sales - Laura Byrne
Clients don't know what they don't know. What web solutions are right for them? How does WordPress come into the picture? How do you make sure you understand scope and timeline? What do you do if something changes?
All these questions and more will be explored as we talk about matching clients' needs with what your agency offers without pulling teeth or pulling your hair out. Practical tips and strategies for successful relationship building that leads to closing the deal.
Removing Uninteresting Bytes in Software Fuzzing - Aftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speed up fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- libxml's xmllint, a tool for parsing xml documents, and Binutils' readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format) files. Our preliminary results show that AFL+DIAR not only discovers new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are the slides of the talk given at the IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW 2022).
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf - 91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
UiPath Test Automation using UiPath Test Suite series, part 4 - DianaGray10
Welcome to part 4 of the UiPath Test Automation using UiPath Test Suite series. In this session, we will cover a Test Manager overview along with the SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques.
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... - James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerabilities and security breaches. This needs to be achieved with existing toolchains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a PASSION for making things work, along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 - Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
34. Example Continued

select impressions.adId as adId,
       count(distinct clickId) / count(1) as clickthrough
from impressions
left outer join clicks on impressions.impressionId = clicks.impressionId
group by impressions.adId;

[Slide illustration: sample rows from the impressions table (impression_id, user_id, ad_id, ...) and the clicks table (impression_id, click_id, ...).]
38. Declare the Impressions Table

ADD JAR ${SAMPLE}/libs/jsonserde.jar;

CREATE EXTERNAL TABLE impressions (
  requestBeginTime string,
  adId string,
  impressionId string,
  referrer string,
  userAgent string,
  userCookie string,
  ip string
)
PARTITIONED BY (dt string)
ROW FORMAT SERDE 'com.amazon.elasticmapreduce.JsonSerde'
WITH SERDEPROPERTIES (
  'paths'='requestBeginTime, adId, impressionId, referrer, userAgent, userCookie, ip'
)
LOCATION '${SAMPLE}/tables/impressions';

ALTER TABLE impressions ADD PARTITION (dt='2009-04-13-08-05');
39. Declare Clicks Table

CREATE EXTERNAL TABLE clicks (
  impressionId string,
  clickId string
)
PARTITIONED BY (dt string)
ROW FORMAT SERDE 'com.amazon.elasticmapreduce.JsonSerde'
WITH SERDEPROPERTIES (
  'paths'='impressionId, number'
)
LOCATION '${SAMPLE}/tables/clicks';

ALTER TABLE clicks ADD PARTITION (dt='2009-04-13-08-05');
40. Execute Hive Query

INSERT OVERWRITE DIRECTORY "s3://emr-demo/output/clickthough"
SELECT impressions.adId as adId,
       count(distinct clickId) / count(1) as clickthrough
FROM impressions
LEFT OUTER JOIN clicks ON impressions.impressionId = clicks.impressionId
GROUP BY impressions.adId
ORDER BY clickthrough desc;

Ended Job = job_201006270056_0011
2868 Rows loaded to s3://emr-demo/output/clickthough
46. Accessing the Hadoop UI

ssh -i c:/Users/richcole/emr-demo.pem -ND 8157 [email_address]

Install FoxyProxy: https://addons.mozilla.org/en-US/firefox/addon/2464/
Leave the default proxy setting as is, then add a new proxy:
- select SOCKS Proxy, and SOCKS 5
- select localhost and port 8157
- add a whitelist rule for http://*ec2*.amazonaws.com*
- add a whitelist rule for http://*ec2.internal*
Hi, I'm Richard Cole, a software engineer on the Amazon Elastic MapReduce team. I'm going to run through some of the features of Elastic MapReduce. At the end of the talk I'll give you the URL to these slides so you can download them. That way you don't need to note down URLs.
Here's an overview. First I'll talk a little about what Amazon Elastic MapReduce is. Then I'll explain how to get set up to use EMR. Next I'll run through an example of developing a bootstrap action. I'll then go through a quick example using Hive. My intention here is to take you through many of the useful features of our service.
We also support Hadoop 0.18.
Now I want to show you briefly how to get started with Elastic MapReduce. I'm going to show you how to sign up for EMR and SimpleDB. You should be able to use your existing AWS account.
Go to aws.amazon.com. This is the main page for Amazon Web Services. Click the orange sign up button on the right.
This is the page for Amazon Elastic MapReduce. Click the orange sign up button on the right.
This is the main page for Amazon SimpleDB. Click the sign up button on the right. SimpleDB is required for Hadoop debugging.
Next download the Elastic MapReduce command line client. Click the download button.
To install the command line client you need to have Ruby installed. You basically unzip the client into a directory and create a credentials file either there or in your home directory. The credentials file needs to be filled in with some details that we'll fetch in the next few slides: you need your AWS credentials, you need an EC2 keypair, and you also need to specify a log URI, which is where log files from your job flow will be uploaded.
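As a rough sketch, a filled-in credentials.json might look like the following. The key names here follow the old Ruby command line client as best I recall, so treat them as assumptions and check the client's README; the keypair name, key file path, and bucket are placeholders tied to the examples in this deck.

{
  "access_id": "<your AWS access key ID>",
  "private_key": "<your AWS secret access key>",
  "keypair": "emr-demo",
  "key-pair-file": "c:/Users/richcole/emr-demo.pem",
  "log_uri": "s3n://emr-demo/logs/",
  "region": "us-east-1"
}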
Next we need a copy of the access credentials. Copy your access key ID and private key into the credentials file.
To create an EC2 keypair we’re going to the AWS Management Console. Click the orange button on the right.
Click on the EC2 tab. The EC2 key pair is required to SSH to the cluster. Click create a new Key Pair. Save the secret key somewhere safe. Copy the name of the key pair and the location of the key pair file into the credentials.json file.
You don’t need to use the command line client. You can also call the web service from Java. Here’s the AWS SDK for Java. To download it you click the yellow button on the right.
Here’s a recap of what we just did.
A job flow is what we call a Hadoop cluster that is running or ran at some point in time. Log files from the cluster are stored in S3 so that they're accessible later, after the job flow has shut down. Typically a job flow runs in batch mode: it executes a series of MapReduce jobs and then terminates. The batch job might, for example, analyse log files over some period of time and produce data in a structured format that is stored in S3. You might also run a job flow in interactive mode. The typical use case for an interactive-mode job flow is when you're developing a batch process. Here you might start with a smaller job flow and a small portion of your data, run the Hadoop jobs that are under development, and check the results that you get. Another reason to run an interactive job flow is ad-hoc analysis: you might be investigating some aspect of your data, and each query that you run suggests the next query to be run. You could also choose to run a job flow as an always-on, long-running job flow. In this case you persist data to Amazon S3 so that you can recover in the event of a master failure, but in the normal case you pull data continuously into your data warehouse and run a variety of batch-mode and ad-hoc processing on the job flow.
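As a rough sketch of the two modes with the command line client (the flags below match the Ruby client of that era as best I recall, and the bucket and jar names are hypothetical):

# Batch mode: run the steps, then terminate when they finish
elastic-mapreduce --create --name "Nightly log analysis" \
  --num-instances 4 --instance-type m1.small \
  --jar s3://emr-demo/jobs/log-analysis.jar \
  --arg s3://emr-demo/input --arg s3://emr-demo/output

# Interactive mode: --alive keeps the job flow running after all steps finish
elastic-mapreduce --create --alive --name "Dev cluster" \
  --num-instances 4 --instance-type m1.small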
Job flows have steps. A step specifies a jar located in Amazon S3 to be run on the master node. The jar is like a Hadoop job jar: it has a main function that is either specified in the manifest of the jar or on the command line, and it can contain lib jars in the same way that a Hadoop job jar does. Typically a step will use the Hadoop APIs to create one or more Hadoop jobs and wait for them to terminate. Steps are executed sequentially. A step jar indicates failure by returning a non-zero value. There is a step property called ActionOnFailure that says what to do after a step fails. The options are: CONTINUE, which just continues on to the next step, effectively ignoring the error; CANCEL_AND_WAIT, which cancels all following steps; and TERMINATE_JOBFLOW, which terminates the job flow regardless of the setting KeepJobFlowAliveWhenNoSteps. This last property is a property of a job flow; it is used to decide what to do once all the steps have been executed or cancelled. If you want an interactive or long-lived cluster then you need to set this property to true.
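For example, adding a step to a running job flow from the command line might look like this (a sketch; the job flow ID, jar location, and main class are hypothetical placeholders):

# Add a step to an existing job flow; the step jar runs on the master node
elastic-mapreduce --jobflow j-XXXXXXXX \
  --jar s3://emr-demo/jobs/log-analysis.jar \
  --main-class com.example.LogAnalysis \
  --arg s3://emr-demo/input --arg s3://emr-demo/output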
Steps only run on the master node. Bootstrap actions run on all nodes. They are run after Hadoop is configured but before Hadoop is started, so you can use them to modify the site config to set settings that are not settable on a per-job basis. You can also use bootstrap actions to install additional software on the nodes or to modify the machine configuration; for example, you might want to add more swap space to the nodes. Bootstrap actions run as the hadoop user; however, the hadoop user can escalate to root without a password using sudo. So really, within bootstrap actions you have complete control over the nodes.
Bootstrap actions are typically scripts located in Amazon S3. They can use Hadoop to download additional software to execute from S3. They indicate failure by returning a non-zero value. If a bootstrap action fails then the node will be discarded. Be careful though: if more than 10% of your nodes fail their bootstrap action then the job flow will fail.
Next I want to show you an example of developing a bootstrap action. Let's say that your application requires the mysql client library for Ruby: you have a streaming job and it needs to fetch some parameters from an Amazon RDS instance that is running. So you want to make a bootstrap action that will install the mysql client library. First you create an install script; we're going to use bash, but you could use ruby, or python, or perl, or whatever is your favorite. This script first does set -e -x to turn on tracing and to make the script fail with a non-zero value if any command in the script fails. Next it escalates to root using sudo and then installs the library using apt-get. The nodes run Debian/stable, and the tool for installing software under Debian is called apt-get. We'll put this script in a file and upload it to S3.
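Put together, the install script would look roughly like this (a sketch; the Debian package name for the Ruby mysql client is an assumption):

#!/bin/bash
# Trace each command (-x) and exit with a non-zero value if any command fails (-e)
set -e -x
# Escalate to root and install the Ruby mysql client library; package name assumed
sudo apt-get update
sudo apt-get install -y libmysql-ruby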
So next let's run an interactive job flow using the command line client. The --alive option makes the job flow keep running even when all steps are finished; it is important for an interactive job flow. Next we ssh to the master node and copy our script from Amazon S3, where we uploaded it. Then we make the script executable and execute it.
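That sequence might look something like this (the job flow ID, bucket, and script name are placeholders; --ssh as a convenience flag of the Ruby client is an assumption):

# Start an interactive job flow that stays alive after its steps finish
elastic-mapreduce --create --alive --name "Bootstrap action dev"

# SSH to the master node, then fetch the script from S3 and test it there
elastic-mapreduce --jobflow j-XXXXXXXX --ssh
hadoop fs -copyToLocal s3://emr-demo/scripts/install-mysql-ruby.sh .
chmod +x install-mysql-ruby.sh
./install-mysql-ruby.sh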
Next we'll run a job flow specifying the bootstrap action script on the command line. The script will then be run on all nodes in the job flow and install the Ruby mysql client for us.
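Something like the following (again a sketch, with a placeholder bucket and script name):

elastic-mapreduce --create --alive --name "With mysql client" \
  --bootstrap-action s3://emr-demo/scripts/install-mysql-ruby.sh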
Test on a small subset of your data so you don't waste lots of money.