The document discusses Amazon EMR and Hadoop. It provides an overview of collecting, storing, organizing, analyzing and sharing big data using Hadoop frameworks like Hive and Pig. It also describes how Amazon EMR allows users to easily launch and terminate Hadoop clusters in the AWS cloud to process large amounts of data stored in S3.
The Hadoop ecosystem is blossoming. In this session, learn how to take advantage of tools such as Mesos, Spark, Shark, and Mahout on Amazon Elastic MapReduce. Senior Product Manager Jon Einkauf discusses the optimizations that make Hadoop sing on EMR, and describes how to use different Hadoop distributions and tools such as HBase and HParser in your big data analytics pipelines.
This talk covers how Amazon CloudSearch and Amazon DynamoDB can be used together to provide an ideal combination of throughput, durability, and rich, powerful search.
How do you build an architecture that is designed from the beginning to withstand failure? This session covers techniques for developing an architecture capable of withstanding disaster and failure. Take advantage of AWS Availability Zones to spread your application or workload across multiple physical locations and isolate yourself from physical and geographical disruptions. Replicate your database and state information to increase availability.
Attend this complimentary webinar to learn these techniques and many more. What you will learn:
• How to design for failure
• How to distribute your application across multiple Availability Zones (physically separate data centers)
• How to scale your application as traffic grows and/or shrinks
• How to make your application self-healing
• How to add loose coupling into your application to make it more survivable
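Loose coupling is usually achieved by putting a message queue between components, so producers and consumers can fail, restart, and scale independently. A minimal sketch using Python's standard library as a stand-in for a managed queue such as Amazon SQS (the task shape here is a hypothetical example):

```python
import queue
import threading

# In-process queue standing in for a managed message queue such as
# Amazon SQS (hypothetical sketch; a real system would use an SQS client).
work_queue = queue.Queue()

def producer(n_tasks):
    # The web tier only enqueues work; it never calls a worker directly.
    for i in range(n_tasks):
        work_queue.put({"task_id": i, "payload": f"order-{i}"})

def consumer(results):
    # Workers pull at their own pace; if one dies, messages stay queued.
    while True:
        task = work_queue.get()
        if task is None:  # sentinel to stop the worker
            break
        results.append(task["task_id"])
        work_queue.task_done()

results = []
worker = threading.Thread(target=consumer, args=(results,))
worker.start()
producer(5)
work_queue.put(None)  # signal shutdown
worker.join()
print(results)  # → [0, 1, 2, 3, 4]
```

Because neither side holds a reference to the other, either tier can be replaced or scaled out without touching the other, which is what makes the application more survivable.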
Amazon EC2 provides you several pricing options that can help you significantly reduce your overall AWS bill, including On-Demand Instances, Spot Instances, Reserved Instances, and the Reserved Instance Marketplace. This session covers high-level architectures and when to use and not to use each of the pricing models for components of those architectures. We walk through several customer examples to illustrate when to use each pricing option. Additionally, we walk through tools that may be useful to determine when to use each pricing model. This session is aimed at technically savvy managers and engineers who need to reduce their cloud spending.
AWS Webcast - Amazon CloudFront Zone Apex Support & Custom SSL Domain Names Amazon Web Services
In this webinar, we will demonstrate two new features that make it even easier for you to deliver content with Amazon CloudFront.
First, we’ll demonstrate how you can use Amazon Route 53, AWS’s authoritative DNS service, to configure an ‘Alias’ record that lets you use CloudFront to deliver your website at the root domain, or "zone apex." This feature enables you to map the apex or root (e.g. “example.com”) of your domain name to your CloudFront distribution. Visitors to your website can then easily and reliably access your site from their browser without specifying “www” in the web address.
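An Alias record at the zone apex is expressed as an ordinary Route 53 change batch. A sketch of the JSON payload, with a hypothetical apex domain and distribution domain name (CloudFront's alias hosted zone ID is a fixed, documented constant):

```python
import json

# Hypothetical values: your hosted zone's apex and your distribution's domain.
APEX_DOMAIN = "example.com."
CLOUDFRONT_DOMAIN = "d111111abcdef8.cloudfront.net."
# Fixed hosted zone ID used for all CloudFront alias targets.
CLOUDFRONT_HOSTED_ZONE_ID = "Z2FDTNDATAQYW2"

change_batch = {
    "Comment": "Point the zone apex at a CloudFront distribution",
    "Changes": [{
        "Action": "UPSERT",
        "ResourceRecordSet": {
            "Name": APEX_DOMAIN,
            "Type": "A",
            # An Alias target instead of literal records: Route 53 resolves
            # the apex to CloudFront edge addresses at query time.
            "AliasTarget": {
                "HostedZoneId": CLOUDFRONT_HOSTED_ZONE_ID,
                "DNSName": CLOUDFRONT_DOMAIN,
                "EvaluateTargetHealth": False,
            },
        },
    }],
}

print(json.dumps(change_batch, indent=2))
```

This payload would then be submitted with Route 53's `ChangeResourceRecordSets` call via the AWS CLI or an SDK.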
Second, we’ll demonstrate how you can use a custom SSL certificate with CloudFront to deliver content over HTTPS using your own domain name. With custom SSL domain names, your customers now get the low latency, reliability, and scalability benefits of CloudFront’s entire global edge location network when downloading your content over an SSL connection using your own domain name.
Adobe Summit EMEA 2012 : 16706 Optimise Mobile Experience - Ben Seymour
Smartphones and tablets come in all shapes and sizes, with screen sizes from 4 to 10 inches and varying support for rich media formats. How are you approaching this challenge to support highly engaging user experiences across the expanding range of mobile devices? Find out how Adobe Scene7 can help you optimise content for multiple screens to ensure high engagement and conversion.
Learn about:
- Best practices for optimising mobile experiences
- Emerging trends for immersive mobile experiences, including video and interactive catalogues
- Examples from clients who have optimised their rich media for tablets and smartphones
Moses Tool Set is a set of tools that simplifies the use of Moses. With this tool set, the Moses training process can be carried out in an easier and more intuitive way. It consists of four features: Corpus Clean Tool, Corpus Splitting Tool, Moses Training Harness, and Moses Scoring Harness. Each feature can not only work independently but also be combined into a job, which enables users to complete the whole training process in one click.
This presentation is a part of the MosesCore project that encourages the development and usage of open source machine translation tools, notably the Moses statistical MT toolkit.
MosesCore is supported by the European Commission Grant Number 288487 under the 7th Framework Programme.
Latest news on Twitter - #MosesCore
Develop multi-screen applications with Flex - Codemotion
Presentation given by Michael Chaize for Adobe at Codemotion on 5 March 2011 in Rome - http://www.codemotion.it/
With the rise of a wide range of Internet connected devices, a new class of application is emerging to work across multiple kinds of devices. Developers are now faced with new challenges to provide the most engaging user experiences on any screen. New device input methods like touch and gestures require developers to rethink interaction models. Screen size constraints also require developers to optimize real estate usage. With so many different mediums for delivering rich Internet applications
In this presentation, the Amazon CloudFront product team discusses the basic features of the service and introduces newer features such as dynamic content support and enhanced live streaming support. This presentation gives Partners the background needed to feel comfortable discussing the newest enhancements to Amazon CloudFront with customers.
AWS Webcast - Accelerating Application Performance Using In-Memory Caching in... - Amazon Web Services
This webinar covers both introductory and advanced topics related to ElastiCache and is intended for current memcached users as well as those already using ElastiCache. During this session we go over various scenarios and use cases that can benefit from caching, discuss the features provided by ElastiCache, and review best practices, design patterns, and anti-patterns related to ElastiCache. The webinar also includes a demo in which we enable ElastiCache for a web application and show the resulting performance improvements.
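The most common pattern a service like ElastiCache enables is cache-aside: check the cache first, and only fall back to the database on a miss. A minimal sketch with an in-memory dict standing in for a memcached/Redis client (the query and TTL are hypothetical illustrations):

```python
import time

# In-memory dict standing in for a memcached/ElastiCache client
# (hypothetical sketch; a real app would use a memcached or Redis client).
cache = {}
TTL_SECONDS = 300

def slow_database_query(user_id):
    # Placeholder for an expensive query against the primary database.
    return {"user_id": user_id, "name": f"user-{user_id}"}

def get_user(user_id):
    """Cache-aside read: serve from cache if fresh, else query and populate."""
    entry = cache.get(user_id)
    if entry is not None and entry["expires"] > time.time():
        return entry["value"]  # cache hit: no database round trip
    value = slow_database_query(user_id)  # cache miss
    cache[user_id] = {"value": value, "expires": time.time() + TTL_SECONDS}
    return value

first = get_user(42)   # miss: hits the database, populates the cache
second = get_user(42)  # hit: served from the cache
print(first == second)  # → True
```

The TTL bounds staleness; choosing it is the usual trade-off between hit rate and how out-of-date a cached value is allowed to be.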
Development Platform as a Service - experiences after one year of use - ... - IBM Sverige
Presentation from IBM Smarter Business 2011. Track: Developing products and services cost-effectively.
Learn from Tieto's experience implementing agile development and Application Lifecycle Management with IBM Rational's solutions. The presentation shows a number of different example implementations, and a representative of a Swedish customer talks about their experience from one year of using IBM and Tieto's cloud-based development platform, DpaaS.
Speaker: Per Engman, Business Development, Tieto.
More information at www.smarterbusiness.se
Big Data Analysis Patterns - TriHUG 6/27/2013 - boorad
Big Data Analysis Patterns: Tying real world use cases to strategies for analysis using big data technologies and tools.
Big data is ushering in a new era for analytics with large scale data and relatively simple algorithms driving results rather than relying on complex models that use sample data. When you are ready to extract benefits from your data, how do you decide what approach, what algorithm, what tool to use? The answer is simpler than you think.
This session tackles big data analysis with a practical description of strategies for several classes of application types, identified concretely with use cases. Topics include new approaches to search and recommendation using scalable technologies such as Hadoop, Mahout, Storm, Solr, & Titan.
A brief history of Instagram's adoption of the open source distributed database Apache Cassandra, along with details about its use case and implementation. This was presented at the San Francisco Cassandra Meetup at the Disqus HQ in August 2013.
A user's perspective on SaltStack and other configuration management tools - SaltStack
Aurelien Geron uses SaltStack to manage a few VMs running Django web apps backed by a sharded MongoDB cluster. He had struggled with another configuration management tool for months, but then read about SaltStack and decided to try it out. For Aurelien, SaltStack just works: it's plain and simple, powerful, configurable, and ultra-fast. This is his presentation.
In this talk, we’ll discuss the benefits of the document-based data model that MongoDB offers by walking through how one can build a simple app. We'll show you how to design a full-blown RSS aggregation service to replace the loss the world suffered when Google Reader was shut down.
We'll dive deeper into topics, such as how to model your data and create your REST API using MongoDB, Express.js and Node.js (core components of the MEAN stack). This session will jumpstart your development knowledge of MongoDB.
MongoDB Days UK: Using MongoDB and Python for Data Analysis Pipelines - MongoDB
Presented by Eoin Brazil, Proactive Technical Services Engineer, MongoDB
Experience level: Advanced
MongoDB offers a flexible, scalable, and easy way to store your large data set. Python provides many useful data science tools (e.g. NumPy, SciPy, scikit-learn, etc.). This talk will discuss the concerns involved in creating operational data analytics pipelines, introduce Monary as an alternative for loading data into NumPy, give examples of accessing data with Monary, and show how to build scalable data analysis pipelines using these open source tools.
Apache Airflow (incubating) NL HUG Meetup 2016-07-19 - Bolke de Bruin
Introduction to Apache Airflow (Incubating), best practices and roadmap. Airflow is a platform to programmatically author, schedule and monitor workflows.
Data Lake for the Cloud: Extending your Hadoop Implementation - Hortonworks
As more applications are created using Apache Hadoop that derive value from the new types of data from sensors/machines, server logs, click-streams, and other sources, the enterprise "Data Lake" forms with Hadoop acting as a shared service. While these Data Lakes are important, a broader life-cycle needs to be considered that spans development, test, production, and archival and that is deployed across a hybrid cloud architecture.
If you have already deployed Hadoop on-premises, this session will also provide an overview of the key scenarios and benefits of joining your on-premises Hadoop implementation with the cloud through backup/archive, dev/test, or bursting. Learn how you can get the benefits of an on-premises Hadoop deployment that can seamlessly scale with the power of the cloud.
AWS Summit Auckland 2014 | Moving to the Cloud. What does it Mean to your Bus... - Amazon Web Services
AWS launched in 2006, and since then we have released more than 530 services, features, and major announcements. Every year, we outpace the previous year in launches and are continuously accelerating the pace of innovation across the organization. Ever wonder how we formulate customer-centric ideas, turn them into features and services, and get them to market quickly? This session dives deep into how an idea becomes a service at AWS and how we continue to evolve the service after release through innovation at every level. We even spill the beans on how we manage operational excellence across our services to ensure the highest possible availability. Come learn about the rapid pace of innovation at AWS, and the culture that formulates magic behind the scenes.
Enterprises are increasingly looking for new ways to simplify and optimize their current development, orchestration, automation, and deployment pipelines through the use of hybrid IT and the public cloud. In this session we will explore architecture patterns and integration approaches in the context of both new and existing AWS DevOps-focused services, with the goal of helping enterprises better iterate and reduce cost through the entire software development lifecycle.
Datapipe’s Director of Compliant Solutions, Mark Fuqua, will lead a conversation with Shane Shelton, McGraw Hill Education’s Senior Director of Application Performance and Development Operations, and Datapipe Solution Architects about the steps taken to deliver an end-to-end hybrid infrastructure. This session will cover the issues faced, problems solved, services implemented, and approach used in deploying a holistic hybrid IT solution for McGraw Hill Education.
The discussion will center around achieving visibility into both AWS and traditional infrastructure through an integrated managed solution encompassing thousands of instances, Direct Connect, highly available Oracle database components, and governance controls around the entire environment.
At this scale, simple changes can have a significant impact on the overall environment, with drastic effects on the integrity, security, and cost of the solution. We will cover the change control, intrusion detection services, log management, and analytics platform used to secure and manage this environment.
CIS13: AWS Identity and Access Management - CloudIDSummit
Jim Scharf, Director, AWS Identity and Access Management, Amazon
Amazon Web Services customers include students, startups, mobile developers, enterprises and government agencies. Learn how AWS Identity and Access Management provides access control for trillions of cloud resources.
Building Better Search For Wikipedia: How We Did It Using Amazon CloudSearch ... - Amazon Web Services
In this webinar Paul Nelson, CTO and search guru at Search Technologies, covers how he implemented improved search capabilities for Wikipedia using Amazon CloudSearch, a fully-managed search service in the AWS cloud. See how Wikipedia search can now deliver a richer experience that includes faceted navigation, better and more relevant results, and an improved user interface. Topics include data acquisition and clean-up, indexing, handling queries, relevance ranking, and building the search user interface. For more information please see: http://aws.amazon.com/cloudsearch/
You will learn how to create file archives, upload them to Amazon S3, and manage permissions and lifetimes, giving you the ability to back up any amount of data and to retain it for as long as you'd like. A number of open source and commercial backup and archiving tools will be demonstrated, as time permits.
You will also learn how to use built-in AWS facilities to quickly and easily create and restore snapshots of entire disk volumes.
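The archive-then-upload flow described above can be sketched with the standard library; the data directory and bucket name below are hypothetical, and the S3 upload itself is shown only as a comment since it needs real credentials:

```python
import os
import tarfile
import tempfile

# Create some files to archive (hypothetical data directory).
workdir = tempfile.mkdtemp()
data_dir = os.path.join(workdir, "data")
os.makedirs(data_dir)
for name in ("a.log", "b.log"):
    with open(os.path.join(data_dir, name), "w") as f:
        f.write("sample contents\n")

# Build a compressed archive suitable for upload to Amazon S3.
archive_path = os.path.join(workdir, "backup.tar.gz")
with tarfile.open(archive_path, "w:gz") as tar:
    tar.add(data_dir, arcname="data")

# Uploading is then a single SDK call, e.g. with boto3 (not run here;
# bucket and key are hypothetical):
#   boto3.client("s3").upload_file(archive_path, "my-backup-bucket",
#                                  "backups/backup.tar.gz")

# Verify the archive round-trips.
with tarfile.open(archive_path, "r:gz") as tar:
    members = sorted(m.name for m in tar.getmembers())
print(members)  # → ['data', 'data/a.log', 'data/b.log']
```

Retention for "as long as you'd like" would then be handled with S3 lifecycle rules on the bucket rather than in this script.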
Design for failure and nothing fails. How do you build a system that is designed from the beginning to withstand failure? This session will cover many techniques for developing a system that can remain available during times of disaster and failure. Take advantage of AWS Availability Zones to spread your system across multiple physical locations and isolate yourself from physical and geographical disruptions. Replicate your database and state information to increase availability. Presenter: Brett Hollman, Solutions Architect, Amazon Web Services
The general purpose computing and storage environment of Amazon Web Services integrates perfectly into your existing ecosystem. Join customers who have taken advantage of this environment in parallel to their on-premise infrastructure to hear tales, tips, and tricks of best practices of integrating AWS with existing resources securely using services such as Amazon Virtual Private Cloud, AWS Direct Connect, and AWS Storage Gateway.
AWS Webcast - High Availability with Route 53 DNS Failover - Amazon Web Services
This webinar discusses how to apply DNS Failover to a range of high-availability architectures, from a simple backup website to advanced multi-region architectures.
In the event of a disaster, you can quickly restore data locally or launch resources in Amazon Web Services (AWS) to help ensure business continuity. In this presentation, you will learn about the AWS services that you can leverage for your disaster recovery (DR) solution, four common DR architectures that leverage the AWS Cloud, and how to get started.
Amazon RDS makes it easy to set up, operate, and scale relational databases in the cloud. We are introducing PostgreSQL to the family of supported database engines. Now, you can deploy scalable PostgreSQL deployments in minutes, freeing you up to focus on application development instead of time-consuming database administration tasks, including backups, software patching, monitoring, scaling, and replication. In this webinar, we will provide an overview of Amazon RDS for PostgreSQL, discuss popular use cases, and share best practices that will help you fully leverage PostgreSQL in the cloud.
AWS Webcast - Using Amazon CloudFront - Accelerate Your Static, Dynamic, Intera... - Amazon Web Services
Amazon CloudFront, AWS’s easy-to-use and cost-effective content delivery service, recently added support for five additional HTTP methods: POST, PUT, DELETE, OPTIONS, and PATCH. This means you can now use CloudFront to accelerate data uploaded from end users, improving the performance of dynamic and personalized websites that have web forms, comment and login boxes, “add to cart” buttons, or other features. In this webinar, we will explain how CloudFront can accelerate your entire website running on Amazon S3, Amazon EC2, an Elastic Load Balancer, or your own origin server using routes optimized via persistent connections, TCP/IP, and other network path optimizations. We will also demo recent CloudFront features such as zone apex support, custom error pages, and content upload (via these additional HTTP methods).
Redshift is a petabyte-scale data warehouse that is a lot faster, a lot less expensive, and a whole lot simpler to use. How can you get your data into Amazon Redshift? In this webinar, hear from representatives of Attunity (an Amazon Redshift Partner) and AWS as they present many of the options available for data integration. Whether your data is on an on-premises platform or in a cloud-based database like DynamoDB, we will show you how you can easily load your data into Redshift.
Reasons to attend:
- Learn about best practices to efficiently integrate data into Redshift.
- Attend a Q&A session with Redshift experts.
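The standard way to bulk-load data from S3 into Redshift is the COPY command. A sketch that builds such a statement (the table, bucket path, and IAM role ARN are all hypothetical placeholders):

```python
# Build a Redshift COPY statement for loading delimited data from Amazon S3.
# All identifiers below (table, bucket path, IAM role) are hypothetical.
def build_copy_statement(table, s3_path, iam_role, delimiter=","):
    return (
        f"COPY {table} "
        f"FROM '{s3_path}' "
        f"IAM_ROLE '{iam_role}' "
        f"DELIMITER '{delimiter}' "
        "GZIP REGION 'us-east-1';"
    )

sql = build_copy_statement(
    table="events",
    s3_path="s3://my-bucket/events/2013/",
    iam_role="arn:aws:iam::123456789012:role/RedshiftCopyRole",
)
print(sql)
```

The resulting statement is executed over an ordinary PostgreSQL connection to the cluster; because COPY reads many S3 objects in parallel across the cluster's slices, it is far faster than row-by-row INSERTs.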
SEC101 A Guided Tour of AWS Identity and Access Management - AWS re:Invent… - Amazon Web Services
Learn what AWS Identity and Access Management (IAM) technologies are available for you to manage users and their access to your AWS environment. We present a high-level discussion of the benefits and functionality IAM provides to control secure access to your AWS environment. We discuss how you can manage users and their permissions with IAM, how roles make it simpler for you to delegate access, and how to use Multi-Factor Authentication (MFA) to require additional proof of identity.
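The permissions IAM grants are expressed as JSON policy documents attached to users, groups, or roles. A minimal read-only policy for a single S3 bucket (the bucket name is a hypothetical example):

```python
import json

# A minimal IAM policy document granting read-only access to one S3 bucket
# (bucket name hypothetical). Policies are plain JSON attached to users,
# groups, or roles.
policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "ReadOnlyReports",
        "Effect": "Allow",
        "Action": ["s3:GetObject", "s3:ListBucket"],
        "Resource": [
            "arn:aws:s3:::example-reports",    # for ListBucket
            "arn:aws:s3:::example-reports/*",  # for GetObject
        ],
    }],
}

document = json.dumps(policy, indent=2)
print(document)
```

Note that `ListBucket` applies to the bucket ARN while `GetObject` applies to the objects inside it, which is why both resource ARNs are needed; the document would then be attached via the IAM console, CLI, or SDK.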
iQ FutureNow: Ensuring the success of your mobile strategy - iQcontent
Xavier Agnetti from Adobe, the leader in digital marketing technology, tells us how to analyse and measure the effectiveness of your mobile strategy. First presented at iQ FutureNow, Manchester, 4 July 2012.
Delivering Better Search For WordPress - AWS Webcast - Michael Bohlig
Want to offer your users more accurate search results for your WordPress websites and content? We will show you how to install and set up the Lift WordPress plugin for Amazon CloudSearch to improve your default WordPress search functionality. You will learn how to get better search relevancy, with faceting and search filters for post types and date ranges. Lift integrates with your existing WordPress theme, with no need for customization, and runs on top of your WordPress installation with no additional servers, services, or hosting configuration required.
Presenters:
Chris Scott, Voce Communications
Jon Handler, Solution Architect, Amazon CloudSearch
Architecting Security & Governance across Your AWS Landing Zone - SEC301 - An... - Amazon Web Services
Whether it is per business unit or per application, many AWS customers use multiple accounts to meet their infrastructure isolation, separation of duties, and billing requirements to establish their AWS Landing Zone. In this session, we cover considerations, limitations, and security patterns when building a multi-account strategy. We explore topics such as thought pattern, identity federation, cross-account roles, consolidated logging, and account governance. We conclude by presenting an enterprise-ready landing zone framework and providing the background needed to implement an AWS Landing Zone.
Amazon Simple Workflow Service (Amazon SWF) is a workflow service for building scalable, resilient applications. Whether automating business processes for finance or insurance applications, building sophisticated data analytics applications, or managing cloud infrastructure services, Amazon SWF reliably coordinates all of the processing steps within an application.
How to build Forecasting services using ML and deep learn... algorithms - Amazon Web Services
Forecasting is an important process for a great many companies and is used in various fields to try to accurately predict the growth and distribution of a product, the use of the resources needed on production lines, financial reporting, and much more. Amazon uses advanced forecasting techniques, and some of these services have been made available to all AWS customers.
In this session we will show how to pre-process data that contains a time component and then use an algorithm that, starting from the type of data analyzed, produces an accurate forecast.
Big Data for Startups: how to create Big Data applications in Server... mode - Amazon Web Services
The variety and quantity of data created every day is growing ever faster and represents a unique opportunity to innovate and create new startups.
However, managing large amounts of data can seem complex: building large-scale Big Data clusters appears to be an investment accessible only to established companies. But the elasticity of the Cloud and, in particular, Serverless services allow us to break through these limits.
Let's see how it is possible to develop Big Data applications quickly, without worrying about the infrastructure, dedicating all our resources to developing our ideas and creating innovative products.
You can now use Amazon Elastic Kubernetes Service (EKS) to run Kubernetes pods on AWS Fargate, the serverless compute engine built for containers on AWS. This makes it easier than ever to build and run your Kubernetes applications in the AWS cloud. In this session we will present the main features of the service and show how to deploy your application in a few steps.
Twenty years ago, Amazon went through a radical transformation aimed at increasing the pace of innovation. Over that period we learned how changing our approach to application development greatly increased our agility and release velocity and, ultimately, allowed us to build more reliable and scalable applications. In this session we will explain how we define modern applications and how building modern apps affects not only application architecture, but also organizational structure, development release pipelines, and even the operating model. We will also describe common approaches to modernization, including the approach used by Amazon.com itself.
How to spend up to 90% less with containers and Spot Instances (Amazon Web Services)
Container usage keeps growing.
When properly designed, container-based applications are very often stateless and flexible.
AWS ECS, EKS, and Kubernetes on EC2 can all take advantage of Spot Instances, yielding average savings of 70% compared to On-Demand Instances. In this session we will explore the characteristics of Spot Instances and how easily they can be used on AWS. We will also learn how Spreaker uses Spot Instances to run applications of various kinds, in production, at a fraction of the on-demand cost!
In recent months, many customers have been asking us how to monetise Open APIs, simplify fintech integrations, and accelerate the adoption of various Open Banking business models. AWS and FinConecta would therefore like to invite you to the Open Finance marketplace presentation on October 20th.
Event Agenda:
Open banking so far (short recap)
• PSD2, OB UK, OB Australia, OB LATAM, OB Israel
Intro to Open Finance marketplace
• Scope
• Features
• Tech overview and Demo
The role of the Cloud
The Future of APIs
• Complying with regulation
• Monetizing data / APIs
• Business models
• Time to market
One platform for all: a Strategic approach
Q&A
Make your startup's market offering unique with Machine Lea... services (Amazon Web Services)
To create value and build their own differentiated, recognizable offering, successful startups know how to combine established technologies with innovative, purpose-built components.
AWS provides ready-to-use services and, at the same time, lets you customize and create the differentiating elements of your offering.
Focusing on Machine Learning technologies, we will see how to select the artificial intelligence services offered by AWS and, with the help of a demo, how to build custom Machine Learning models using SageMaker Studio.
OpsWorks Configuration Management: automate the management and deployment of... (Amazon Web Services)
With the traditional approach to IT, implementing DevOps techniques was difficult for many years: they often involved manual activities that from time to time caused application downtime and interrupted users' work. With the advent of the cloud, DevOps techniques are now within everyone's reach, at low cost, for any kind of workload, guaranteeing greater system reliability and delivering significant improvements in business continuity.
AWS offers AWS OpsWorks as a Configuration Management tool that aims to automate and simplify the management and deployment of EC2 instances using Chef and Puppet.
Learn how to use AWS OpsWorks to guarantee the reliability of your application running on EC2 instances.
Microsoft Active Directory on AWS to support your Windows Workloads (Amazon Web Services)
Do you want to know the options for running Microsoft Active Directory on AWS? When moving Microsoft workloads to AWS, it is important to consider how to deploy Microsoft Active Directory to support group policy management, authentication, and authorization. In this session, we discuss the options for deploying Microsoft Active Directory on AWS, including AWS Directory Service for Microsoft Active Directory and running Active Directory on Windows on Amazon Elastic Compute Cloud (Amazon EC2). We cover topics such as integrating your on-premises Microsoft Active Directory environment with the cloud and using SaaS applications, such as Office 365, with AWS Single Sign-On.
From facial recognition to detecting fraud or manufacturing defects, image and video analysis based on artificial intelligence techniques is evolving and being refined at a rapid pace. In this webinar we will explore what AWS services make possible, applying state-of-the-art computer vision techniques to real-world scenarios.
Amazon Web Services and VMware are hosting a free virtual event next Wednesday, October 14th, from 12:00 to 13:00, dedicated to VMware Cloud™ on AWS, the on-demand service that lets you run applications in cloud environments based on VMware vSphere® and access a wide range of AWS services, fully exploiting the potential of the AWS cloud while protecting your existing VMware investments.
Build your first serverless ledger-based app with QLDB and NodeJS (Amazon Web Services)
Many companies today build applications with ledger-style functionality, for example to verify the history of credits and debits in banking transactions, or to track the supply-chain flow of their products.
At the heart of these solutions are ledger databases, which provide a transparent, immutable, and cryptographically verifiable transaction log; however, these are complex and costly tools to manage.
Amazon QLDB removes the need to build complex custom systems by providing a fully managed, serverless ledger database.
In this session we will discover how to build a complete serverless application that uses QLDB's capabilities.
With the rise of microservice architectures and rich mobile and web applications, APIs are more important than ever for offering end users an exceptional user experience. In this session we will learn how to tackle modern API design challenges with GraphQL, an open-source API query language used by Facebook, Amazon, and others, and how to use AWS AppSync, a managed serverless GraphQL service on AWS. We will dive into several scenarios, understanding how AppSync can help solve these use cases by building modern APIs with real-time and offline data-update capabilities.
We will also learn how Sky Italia uses AWS AppSync to deliver real-time sports updates to users of its web portal.
Oracle Database and VMware Cloud™ on AWS: myths to dispel (Amazon Web Services)
Many organizations reap the benefits of the cloud by migrating their Oracle workloads, securing significant gains in agility and cost efficiency.
Migrating these workloads can create complexity during application modernization and refactoring, and performance risks may be introduced when moving applications out of on-premises data centers.
In these slides, AWS and VMware experts present simple, practical tips to ease and streamline the migration of Oracle workloads and accelerate the transformation to the cloud; they dive into the architecture and show how to fully exploit the potential of VMware Cloud™ on AWS.
Amazon Elastic Container Service (Amazon ECS) is a highly scalable container management service that simplifies managing Docker containers through an orchestration layer controlling deployment and lifecycle. In this session we will present the main features of the service, reference architectures for different workloads, and the simple steps needed to quickly migrate one or more of your containers.
UiPath Test Automation using UiPath Test Suite series, part 3 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
• UI automation introduction
• UI automation sample
• Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
DevOps and Testing slides at DASA Connect (Kari Kakkonen)
Slides by me and Rik Marselis from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps looks like. We ended with a lovely workshop in which the participants explored different ways to think about quality and testing in different parts of the DevOps infinity loop.
Welcome to ViralQR, your best QR code generator (ViralQR)
Welcome to ViralQR, your best QR code generator available on the market!
At ViralQR, we design static and dynamic QR codes. Our mission is to make business operations easier and customer engagement more powerful through the use of QR technology. Be it a small-scale business or a huge enterprise, our easy-to-use platform provides multiple choices that can be tailored according to your company's branding and marketing strategies.
Our Vision
We are here to make creating QR codes easy and smooth, enhancing customer interaction and making business run more fluidly. We strongly believe in the power of QR codes to transform how businesses interact with their customers, and we are set on making that technology accessible and usable far and wide.
Our Achievements
Since our inception, we have served many clients, providing QR codes for marketing, service delivery, and feedback collection across various industries. Our platform has been recognized for its ease of use and strong feature set, which help businesses create QR codes easily.
Our Services
ViralQR offers a comprehensive suite of services that caters to your needs:
Static QR Codes: Create free static QR codes. These QR codes are able to store significant information such as URLs, vCards, plain text, emails and SMS, Wi-Fi credentials, and Bitcoin addresses.
Dynamic QR codes: These also have all the advanced features but are subscription-based. They can directly link to PDF files, images, micro-landing pages, social accounts, review forms, business pages, and applications. In addition, they can be branded with CTAs, frames, patterns, colors, and logos to enhance your branding.
Pricing and Packages
Additionally, ViralQR offers a 14-day free trial, an excellent opportunity for new users to get a feel for the platform. From there, you can easily subscribe and experience the full power of dynamic QR codes. The subscription plans are priced flexibly so that virtually every business can afford to benefit from our service.
Why choose us?
ViralQR provides services for marketing, advertising, catering, retail, and more. QR codes can be placed on flyers, packaging, merchandise, and banners, and can even substitute for cash and cards in a restaurant or coffee shop. By integrating QR codes into your business, you can improve customer engagement and streamline operations.
Comprehensive Analytics
ViralQR subscribers receive detailed analytics and tracking tools that give a clear view of QR code performance. Our analytics dashboard shows aggregate and unique views, as well as detailed information about each impression, including time, device, browser, and estimated location by city and country.
Thank you for choosing ViralQR; we offer nothing but the best in QR code services to meet the needs of diverse businesses!
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with PASSION for technology and making things work along with a knack for helping others understand how things work. He comes with around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations in CI/CD and application security integrated in software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024 (Albert Hoitingh)
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview, including the concepts of Customer Key and Double Key Encryption.
Securing your Kubernetes cluster: a step-by-step guide to success! (KatiaHIMEUR1)
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do... (UiPathCommunity)
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova... (Ramesh Iyer)
In today's fast-changing business world, companies that fail to adapt and embrace new ideas often struggle to keep up with the competition. Fostering a culture of innovation, however, takes real work: it requires vision, leadership, and a willingness to take risks in the right proportion. Sachin Dev Duggal, co-founder of Builder.ai, has perfected the art of this balance, creating a company culture where creativity and growth are nurtured at each stage.
Transcript: Selling digital books in 2024: Insights from industry leaders - T... (BookNet Canada)
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
GraphRAG is All You Need? LLM & Knowledge Graph (Guy Korland)
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Epistemic Interaction - tuning interfaces to provide information for AI support (Alan Dix)
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
A tale of scale & speed: How the US Navy is enabling software delivery from l... (sonjaschweigert1)
Rapid and secure feature delivery is a goal across every application team and every branch of the DoD. The Navy’s DevSecOps platform, Party Barge, has achieved:
- Reduction in onboarding time from 5 weeks to 1 day
- Improved developer experience and productivity through actionable findings and reduction of false positives
- Maintenance of superior security standards and inherent policy enforcement with Authorization to Operate (ATO)
Development teams can ship efficiently and ensure applications are cyber ready for Navy Authorizing Officials (AOs). In this webinar, Sigma Defense and Anchore will give attendees a look behind the scenes and demo secure pipeline automation and security artifacts that speed up application ATO and time to production.
We will cover:
- How to remove silos in DevSecOps
- How to build efficient development pipeline roles and component templates
- How to deliver security artifacts that matter for ATO’s (SBOMs, vulnerability reports, and policy evidence)
- How to streamline operations with automated policy checks on container images
Essentials of Automations: Optimizing FME Workflows with Parameters (Safe Software)
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
39. What are Spot Instances?
[Slide diagram: a Region containing two Availability Zones; unused EC2 capacity in each zone is sold at discounts of 50%, 54%, 56%, 59%, 63%, and 66%.]
40. What is the tradeoff?
[Slide diagram: the same Region and Availability Zones; unused capacity can be reclaimed by EC2 when it is needed again.]
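To make the economics of the two slides above concrete, here is a small sketch of the savings calculation. The discount figures are the sample values from the slide; the on-demand hourly rate is a made-up number for illustration, not a real EC2 price.

```python
# Illustrative only: the discount rates come from the slide; the on-demand
# price is a hypothetical figure, not a real EC2 rate.
ON_DEMAND_HOURLY = 0.50  # hypothetical on-demand price per instance-hour (USD)

def spot_cost(instance_hours, discount):
    """Cost of running on Spot at a given discount off the on-demand rate."""
    return instance_hours * ON_DEMAND_HOURLY * (1 - discount)

def savings(instance_hours, discount):
    """Absolute savings versus paying the on-demand rate."""
    return instance_hours * ON_DEMAND_HOURLY - spot_cost(instance_hours, discount)

# A 100-instance-hour job at the slide's 66% discount:
print(round(spot_cost(100, 0.66), 2))  # 17.0
print(round(savings(100, 0.66), 2))    # 33.0
```

The tradeoff shown on slide 40 is not modeled here: a reclaimed instance forfeits its remaining work, which is why Spot suits fault-tolerant workloads like Hadoop task nodes.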
The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing.
Hadoop is complex to set up and hard to operate, and it is capex intensive. If your workload needs to scale, Hadoop is hard to scale on physical infrastructure. Though Hadoop is a fault-tolerant system, it is difficult to replace failed components such as disk drives or nodes: you still need time to procure replacement hardware.
The key messages we want to deliver with this slide are:
1. Elastic MapReduce is a hosted Hadoop service. We take the most stable version of Apache Hadoop, provide it as a hosted service, and build integration points with other services in the AWS ecosystem such as S3, CloudWatch, and DynamoDB. We make other improvements to Hadoop so that it becomes easier to scale and manage on AWS.
2. We keep iterating on the different versions of Hadoop as they become stable. From the console you launch the latest version of Hadoop, but you can also choose to launch an older version via the CLI or the SDK.
3. So what can you do with EMR? You can build applications on Amazon EMR just as you would with Hadoop. To develop custom Hadoop applications, you used to need access to a lot of hardware to test your Hadoop programs. Amazon EMR makes it easy to spin up a set of Amazon EC2 instances as virtual servers to run your Hadoop cluster, and to test various server configurations without having to purchase or reconfigure hardware. When you're done developing and testing your application, you can terminate your cluster, paying only for the computational time you used. Amazon EMR provides three types of clusters (also called job flows) that you can launch to run custom map-reduce applications, depending on the type of program you're developing and which libraries you intend to use.
Supported Hadoop versions are 1.0.3, 0.20.205, 0.20, and 0.18.
Custom JAR: Run your custom map-reduce program written in Java. This cluster provides low-level access to the MapReduce API. You have the most flexibility programming for this type of cluster, but also the responsibility of defining and implementing the map and reduce tasks in your Java application.
Cascading: Cascading is an open-source Java library that provides a query API, a query planner, and a job scheduler for creating and running Hadoop MapReduce applications. Applications developed with Cascading are compiled and packaged into standard Hadoop-compatible JAR files, similar to other native Hadoop applications. Multitool is a Cascading application that provides a simple command-line interface for managing large datasets. For example, you can filter records matching a Java regular expression from data stored in Amazon S3 and copy the results to the Hadoop file system. You can run the Cascading Multitool application on Amazon Elastic MapReduce (Amazon EMR) using either the Amazon EMR command line interface or the Amazon EMR console. Amazon EMR supports all Multitool arguments.
Streaming: Run a single Hadoop job based on map and reduce functions you upload to Amazon S3. The functions can be implemented in any of the following supported languages: Ruby, Perl, Python, PHP, R, Bash, C++.
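Since Streaming accepts map and reduce functions written in languages such as Python, here is a minimal word-count pair of the kind Streaming runs. Under Hadoop these scripts would read stdin and write stdout, with the shuffle phase sorting map output by key between them; this sketch reproduces that contract locally with plain iterables and a sort, so it runs without a cluster.

```python
import itertools

def mapper(lines):
    """Streaming mapper: emit one tab-separated (word, 1) pair per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(pairs):
    """Streaming reducer: input arrives sorted by key; sum counts per word."""
    split = (p.split("\t") for p in pairs)
    for word, group in itertools.groupby(split, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(kv[1]) for kv in group)}"

# Locally, the Hadoop shuffle is just a sort between the two stages:
mapped = sorted(mapper(["the quick fox", "the lazy dog"]))
print(list(reducer(mapped)))  # ['dog\t1', 'fox\t1', 'lazy\t1', 'quick\t1', 'the\t2']
```

On EMR you would upload scripts like these to S3 and point the Streaming job flow at them; the tab-separated stdin/stdout convention is what makes any of the listed languages interchangeable.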
Hive and Pig: You can use Amazon EMR to analyze data without writing a line of code. Several open-source applications run on top of Hadoop and make it possible to run map-reduce jobs and query data using either a SQL-like syntax or a specialized query language called Pig Latin. Amazon EMR is integrated with Apache Hive and Apache Pig. With Amazon's version of Hive, you can run queries against data in NoSQL data stores like DynamoDB and HBase, along with data in S3 and HDFS, all in a single query. This is an Amazon-specific option. You can also use EMR to move large volumes of data in and out of databases and data stores; by distributing the work, the data can be moved quickly. Amazon EMR provides custom libraries to move data in and out of Amazon Simple Storage Service (Amazon S3), Amazon DynamoDB, and Apache HBase.
RazorFish
EMR supports multiple instance types, including the latest HS1 instances. EMR now supports High Storage Instances (hs1.8xlarge) in US East. These instances offer 48 TB of storage across 24 hard disk drives, 35 EC2 Compute Units (ECUs) of compute capacity, 117 GB of RAM, 10 Gbps networking, and 2.4+ GB per second of sequential I/O performance. High Storage Instances are ideally suited for Hadoop and significantly reduce the cost of processing very large data sets on EMR. We look forward to adding support for High Storage Instances in additional regions early next year.
And the concept of adding nodes works especially well with Hadoop on the cloud, since 10 nodes running for 10 hours costs the same as 100 nodes running for 1 hour.
10 nodes x 10 hours = 100 nodes x 1 hour
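The cost equivalence above is just instance-hour arithmetic; a short sketch makes it explicit (the hourly rate is a made-up figure for illustration):

```python
def cluster_cost(nodes, hours, hourly_rate):
    """Total cost scales with instance-hours: nodes x hours x rate."""
    return nodes * hours * hourly_rate

rate = 0.10  # hypothetical per-node hourly rate (USD)

# Same bill either way, but the wide cluster finishes ten times sooner:
assert cluster_cost(10, 10, rate) == cluster_cost(100, 1, rate)
```

This is why elastic capacity changes the calculus: on fixed hardware you cannot trade cluster width for wall-clock time at constant cost.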
1.3 trillion objects, 835k+ peak transactions per second
You can run Hadoop clusters in automated mode, where your code is pulled out of S3 automatically by the cluster, or you can run an interactive cluster, where once the cluster boots you can SSH into the master node and fire off a job manually.
Now you can create a job flow. It's important to understand the concept of a job flow: a job flow is the series of instructions Amazon Elastic MapReduce (Amazon EMR) uses to process data. A job flow contains any number of user-defined steps. A step is any instruction that manipulates the data. Steps are executed in the order in which they are defined in the job flow.
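The idea of steps executed in order can be sketched as a tiny simulation; the two steps below are hypothetical, purely to illustrate the "each step manipulates the data" model, and are not EMR API calls:

```python
def run_job_flow(steps, data):
    """Toy model of a job flow: each step transforms the data, in the
    order the steps were defined."""
    for step in steps:
        data = step(data)
    return data

# Hypothetical two-step flow: tokenize the input, then count the words.
steps = [
    lambda text: text.lower().split(),
    lambda words: {w: words.count(w) for w in set(words)},
]
print(run_job_flow(steps, "Log log data"))
```

In real EMR, each step would typically be a Hadoop job (a JAR, a Hive script, and so on) reading the previous step's output, but the ordering guarantee is the same.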
This screen gives you the chance to select a different version of Hadoop.
Now you can select the type of job flow you want to run
Different options are available for different types of program. For example, the Java-based JAR will ask you for the location of your input data, your output data, and your mapper and reducer scripts. Extra arguments are anything extra that your programs might need; in this case, I have chosen to include some specific Hive libraries that my Hive script refers to.
Amazon EMR refers to managed Hadoop clusters as job flows, and defines the concept of instance groups, which are collections of Amazon EC2 instances that perform roles analogous to the master and slave nodes of Hadoop. There are three types of instance groups: master, core, and task. Each Amazon EMR job flow includes one master instance group that contains one master node, a core instance group containing one or more core nodes, and an optional task instance group, which can contain any number of task nodes. If the job flow runs on a single node, that instance is simultaneously a master and a core node. For job flows running on more than one node, one instance is the master node and the remaining instances are core or task nodes. You can choose different instance types for each group. Let's look at each of these instance group types.
Master instance group: The master instance group manages the job flow, coordinating the distribution of the MapReduce executable and subsets of the raw data to the core and task instance groups. It also tracks the status of each task performed and monitors the health of the instance groups. To monitor the progress of the job flow, you can SSH into the master node as the Hadoop user and either look at the Hadoop log files directly or access the user interface that Hadoop publishes to the web server running on the master node. As the job flow progresses, each core and task node processes its data, transfers the data back to Amazon S3, and provides status metadata to the master node.
Core instance group: The core instance group contains all of the core nodes of a job flow. A core node is an EC2 instance that runs Hadoop map and reduce tasks and stores data using the Hadoop Distributed File System (HDFS). Core nodes are managed by the master node. The EC2 instances you assign as core nodes are capacity that must be allotted for the entire job flow run. Because core nodes store data, you can't remove them from a job flow.
However, you can add more core nodes to a running job flow. Core nodes run both the DataNode and TaskTracker Hadoop daemons. Caution: removing HDFS from a running node runs the risk of losing data.
Task instance group: The task instance group contains all of the task nodes in a job flow. The task instance group is optional; you can add it when you start the job flow, or add a task instance group to a job flow in progress. Task nodes are managed by the master node. While a job flow is running you can increase and decrease the number of task nodes. Because they don't store data and can be added to and removed from a job flow, you can use task nodes to manage the EC2 instance capacity your job flow uses, increasing capacity to handle peak loads and decreasing it later. Task nodes run only a TaskTracker Hadoop daemon.
Three other aspects of instance groups are important, and we will address them later in this presentation: Spot Instances, dealing with failure, and resizing job flows.
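The resize rules described above (core capacity can only grow, because core nodes hold HDFS data; task capacity can grow or shrink freely) can be captured in a toy model. `JobFlow` and its methods are illustrative names invented here, not part of any EMR API:

```python
class JobFlow:
    """Toy model of EMR resize rules: core nodes store HDFS data and can
    only grow; task nodes hold no data, so they can grow and shrink."""

    def __init__(self, core_nodes, task_nodes=0):
        self.core = core_nodes
        self.task = task_nodes

    def resize_core(self, n):
        if n < self.core:
            raise ValueError("cannot remove core nodes: HDFS data would be lost")
        self.core = n

    def resize_task(self, n):
        self.task = n  # safe in either direction: task nodes store no HDFS data

flow = JobFlow(core_nodes=4)
flow.resize_task(20)   # scale out for a peak load
flow.resize_task(0)    # and back down again
flow.resize_core(8)    # core capacity can grow...
# flow.resize_core(2)  # ...but shrinking it would raise ValueError
```

This asymmetry is also why task nodes are the natural place to use Spot Instances: losing one costs only in-flight tasks, never stored data.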
Amazon EC2 Key Pair: Optionally, specify a key pair that you created previously. If you do not enter a value in this field, you cannot use SSH to connect to the master node.

Amazon VPC Subnet Id: Optionally, specify a VPC subnet identifier to launch the job flow in an Amazon VPC.

Amazon S3 Log Path: Optionally, specify a path in Amazon S3 to receive a copy of the log files generated by the job flow. When this value is set, Amazon EMR copies the log files from the EC2 instances in the job flow to Amazon S3. This prevents the log files from being lost when the job flow ends and the EC2 instances hosting the job flow are terminated.

Enable Debugging: Optionally, select Yes to create an index of your log files in Amazon SimpleDB. This index must exist in order to use the debugging tool in the Amazon EMR console. Whether or not to create this index can only be set when the job flow is created. If you set this to Yes, you must also specify a value for Amazon S3 Log Path.

Keep Alive: Optionally, select Yes to cause the job flow to continue running when all processing is completed. This is how you would run a persistent cluster: once you keep the cluster alive, you can continue to submit jobs to it, and when a job finishes you will see the cluster in the WAITING state we discussed earlier. If you select No, the job flow is non-interactive and terminates automatically when it is done, so you do not continue to accrue charges on an idle job flow.

Termination Protection: Optionally, select Yes to ensure the job flow is not shut down due to accident or error.

Visible To All IAM Users: Select Yes to make the job flow visible and accessible to all IAM users on the AWS account. For more information, see Configure User Permissions with IAM.
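The one hard dependency among these options (enabling debugging requires an S3 log path, because the SimpleDB index is built from the logs copied to S3) can be sketched as a quick pre-flight check. The helper and its field names are hypothetical, not an EMR API call:

```python
def check_jobflow_options(log_path=None, enable_debugging=False,
                          keep_alive=False, termination_protection=False):
    """Validate a job flow's launch options before submitting it.

    Raises ValueError on the dependency described above: the debugging
    index in Amazon SimpleDB requires log files copied to Amazon S3.
    """
    if enable_debugging and not log_path:
        raise ValueError(
            "Enable Debugging requires an Amazon S3 Log Path: the debugging "
            "index is built from the log files copied to S3.")
    return {
        "log_path": log_path,
        "enable_debugging": enable_debugging,
        "keep_alive": keep_alive,          # persistent cluster if True
        "termination_protection": termination_protection,
    }

# A persistent, debuggable cluster (bucket name is a placeholder):
opts = check_jobflow_options(log_path="s3://my-bucket/logs/",
                             enable_debugging=True, keep_alive=True)
```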
Bootstrap actions allow you to pass a reference to a script stored in Amazon S3. This script can contain configuration settings and arguments related to Hadoop or Elastic MapReduce. Bootstrap actions are run before Hadoop starts and before the node begins processing data. Unlike other managed services, EMR gives you complete control: with a bootstrap action you can make any customization to the Hadoop cluster, or install other open source projects such as Mahout on it.

Note: if a bootstrap action returns a nonzero error code, Amazon Elastic MapReduce (Amazon EMR) treats it as a failure and terminates the instance. If too many instances fail their bootstrap actions, Amazon EMR terminates the job flow. If just a few instances fail, an attempt is made to reallocate the failed instances and continue. This is another advantage of the managed service.

Amazon provides a number of predefined bootstrap action scripts that you can use to customize Hadoop settings. References to predefined bootstrap action scripts are passed to Elastic MapReduce by using the bootstrap-action parameter. I am going to talk about the predefined bootstrap actions on the next slide.
All of these predefined bootstrap action scripts are available in S3, and you can download and change them. You can also use your own scripts: one example could be a script that pulls data from a relational data store incrementally; another could be a script that installs Mahout and configures the environment for it. Let's look at the existing predefined bootstrap actions.

Configure Daemons: This predefined bootstrap action lets you specify the heap size or other Java Virtual Machine (JVM) options for the Hadoop daemons. You can use it to configure Hadoop for large jobs that require more memory than Hadoop allocates by default. You can also use it to modify advanced JVM options, such as garbage collection behavior.

Configure Hadoop: This bootstrap action allows you to set cluster-wide Hadoop settings. The script provides two types of command line options. Option 1 enables you to upload an XML file containing configuration settings to Amazon S3; the bootstrap action merges the new configuration settings with the existing Hadoop configuration. Option 2 allows you to specify a Hadoop key/value pair on the command line that overrides the existing Hadoop configuration.

Configure Memory-Intensive Workloads: This bootstrap action allows you to set cluster-wide Hadoop settings to values appropriate for job flows with memory-intensive workloads. Note: the default configurations for cc1.4xlarge, cc2.8xlarge, hs1.8xlarge, and cg1.4xlarge instances are already sufficient for memory-intensive workloads, and this bootstrap action does not modify the settings for these instance types.

Shutdown Actions: A bootstrap action script can create one or more shutdown actions by writing scripts to the /mnt/var/lib/instance-controller/public/shutdown-actions/ directory. When a job flow is terminated, all the scripts in this directory are executed in parallel.
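The two Configure Hadoop options boil down to merge-then-override semantics, which can be illustrated with plain dictionaries. The property values below are examples, not recommendations:

```python
# Existing cluster-wide Hadoop configuration (illustrative values).
defaults = {
    "mapred.tasktracker.map.tasks.maximum": "2",
    "io.sort.mb": "100",
}

# Option 1: settings from an uploaded XML file are merged
# with the existing Hadoop configuration.
from_xml_file = {"io.sort.mb": "200", "mapred.compress.map.output": "true"}

# Option 2: a key/value pair given on the command line
# overrides whatever is already configured.
from_command_line = {"mapred.tasktracker.map.tasks.maximum": "4"}

# Later sources win over earlier ones.
effective = {**defaults, **from_xml_file, **from_command_line}
print(effective["io.sort.mb"])                            # "200" (XML merge)
print(effective["mapred.tasktracker.map.tasks.maximum"])  # "4" (CLI override)
```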
Each script must run and complete within 60 seconds. Note: shutdown action scripts are not guaranteed to run if the node terminates with an error.

Run If: You can use this predefined bootstrap action to conditionally run a command when an instance-specific value is found in the instance.json or job-flow.json files. The command can refer to a file in Amazon S3 that Elastic MapReduce can download and execute.

Lastly, the one that we think gets used quite frequently is Ganglia. The Ganglia open source project is a scalable, distributed system designed to monitor clusters and grids while minimizing the impact on their performance. When you enable Ganglia on your job flow, you can generate reports and view the performance of the cluster as a whole, as well as inspect the performance of individual node instances. To set up Ganglia monitoring on a job flow, you must specify the Ganglia bootstrap action when you create the job flow; you cannot add Ganglia monitoring to a job flow that is already running. Amazon Elastic MapReduce (Amazon EMR) then installs the monitoring agents and the aggregator that Ganglia uses to report data. Once you have Ganglia set up, you can look at detailed Ganglia metrics like those on the next slide.
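A bootstrap action registers a shutdown action simply by dropping an executable script into the shutdown-actions directory. The sketch below writes to a local directory for illustration; on an actual EMR node the path would be /mnt/var/lib/instance-controller/public/shutdown-actions/, and the S3 destination inside the script is a hypothetical example:

```python
import os
import stat

# On an EMR node this would be:
#   /mnt/var/lib/instance-controller/public/shutdown-actions/
# Using a local directory here so the sketch runs anywhere.
SHUTDOWN_DIR = os.environ.get("SHUTDOWN_DIR", "./shutdown-actions")

# Every script in this directory runs (in parallel with the others) when
# the job flow terminates, and must complete within 60 seconds.
script = """#!/bin/bash
# Hypothetical example: push local Hadoop logs to S3 before the node dies.
hadoop fs -put /mnt/var/log/hadoop s3://my-bucket/final-logs/
"""

os.makedirs(SHUTDOWN_DIR, exist_ok=True)
path = os.path.join(SHUTDOWN_DIR, "save-logs.sh")
with open(path, "w") as f:
    f.write(script)
os.chmod(path, os.stat(path).st_mode | stat.S_IXUSR)  # must be executable
```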
When you open the Ganglia web reports in a browser, you see an overview of the cluster's performance, with graphs detailing the load, memory usage, CPU utilization, and network traffic of the cluster. Below the cluster statistics are graphs for each individual server in the cluster. For example, in this job we launched three instances, so in the following reports there are three instance charts showing the cluster data.
When you don't keep the job flow alive, the cluster shuts down once the work is done and you stop paying for it.
You can increase or decrease the number of nodes in a running job flow. A job flow contains a single master node. The master node controls any slave nodes that are present. There are two types of slave nodes: core nodes, which hold data to process in the Hadoop Distributed File System (HDFS), and task nodes, which do not contain HDFS. After a job flow is running, you can increase, but not decrease, the number of core nodes. Task nodes also run your Hadoop jobs; after a job flow is running, you can both increase and decrease the number of task nodes. You can modify the size of a running job flow using either the API or the CLI. The AWS Management Console allows you to monitor job flows that you resized, but it does not provide the option to resize job flows. You may include a predefined step in your workflow that automatically resizes a job flow between steps that are known to have different capacity needs. Because all steps are guaranteed to run sequentially, this allows you to set the number of slave nodes that will execute a given job flow step.
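The resize rules above — the master is fixed, core nodes can only grow, task nodes can grow or shrink — fit in a few lines. This is a sketch of the rules, not an EMR API:

```python
def can_resize(group, current, requested):
    """Return True if a running job flow's instance group can be resized
    from `current` nodes to `requested` nodes."""
    if group == "master":
        return False                 # always exactly one master node
    if group == "core":
        return requested >= current  # core nodes hold HDFS data: grow only
    if group == "task":
        return requested >= 0        # task nodes hold no data: grow or shrink
    raise ValueError("unknown instance group: %r" % group)

print(can_resize("core", 4, 2))  # False: shrinking core would lose HDFS blocks
print(can_resize("task", 4, 0))  # True: task nodes can be removed freely
```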
Enter Spot Instances.
What is the trade-off? In the case of Hadoop, if your task nodes are on Spot and they get taken away, your job won't stop and you will be able to continue.
Suppose you have a job that needs 4 nodes and runs for 14 hours. So 4 nodes running for 14 hours at $0.45 per hour (on-demand) will cost you $25.20. Now assume we add 5 more nodes, BUT we add them on Spot. Since the number of nodes has roughly doubled, the time taken is halved, given Hadoop's scalability. So in the second case, I pay for 4 on-demand instances x 7 hours x $0.45 = $12.60, and assuming Spot is at 50% of on-demand pricing, 5 Spot instances x 7 hours x $0.225 = $7.88, totaling $20.48. So you save 50% of the time with a 19% cost savings. If your Spot capacity gets taken away, you are back to scenario one, which is what you intended to run in the first place. So everything in scenario 2 (the bottom one) is a bonus!
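The arithmetic in this scenario can be checked directly. The prices and the perfect-scaling assumption are the hypothetical ones from the example:

```python
ON_DEMAND = 0.45         # $/hour per node (example price)
SPOT = ON_DEMAND * 0.50  # assume Spot at 50% of the on-demand price

# Scenario 1: 4 on-demand nodes for 14 hours.
cost1 = 4 * 14 * ON_DEMAND               # $25.20

# Scenario 2: add 5 Spot nodes; with roughly double the nodes,
# assume the job finishes in half the time (7 hours).
cost2 = 4 * 7 * ON_DEMAND + 5 * 7 * SPOT  # $12.60 + $7.88 = $20.48

savings = (cost1 - cost2) / cost1         # about 19% cheaper, in half the time
print(f"scenario 1: ${cost1:.2f}")
print(f"scenario 2: ${cost2:.2f}")
```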
This is a great time to talk about what happens in case of a failure. If the master node goes down, your job flow will be terminated and you'll have to rerun your job. Amazon Elastic MapReduce currently does not support automatic failover of the master node or master node state recovery. In case of master node failure, the AWS Management Console displays a "The master node was terminated" message, which is an indicator for you to start a new job flow. Customers can implement checkpointing in their job flows to save intermediate data (data created in the middle of a job flow that has not yet been reduced) to Amazon S3. This allows resuming the job flow from the last checkpoint in case of failure. Amazon Elastic MapReduce is fault tolerant for slave failures and continues job execution if a slave node goes down. The service also monitors your job flow execution: retrying failed tasks, shutting down problematic instances, and provisioning new nodes to replace those that fail. Amazon EMR supports NameNode redundancy using MapR, so if you want to try MapR, please go ahead.
There are two types of logs that store information about your job flow: step-level logs generated by Amazon Elastic MapReduce (Amazon EMR) and Hadoop job logs generated by Apache Hadoop. You need to examine both log types to have complete information about your job flow. Amazon EMR step-level logs contain information about the job flow and the results of each step. These logs are useful when you are debugging problems that you encounter initializing and running the job flow. For example, a step-level log contains status information such as "Streaming Command Failed!". Hadoop logs contain information about Hadoop jobs, tasks, and task attempts; they are the standard log files generated by Apache Hadoop. The following image shows the relationship between Amazon EMR job flow steps and Hadoop jobs, tasks, and task attempts. Both step-level logs and Hadoop logs are generated by default and stored on the master node of the job flow. You can access them while the job flow is running by using SSH to connect to the master node as the Hadoop user. When the job flow ends, the master node is terminated and you will no longer be able to access those logs using SSH. To be able to access the log files of a terminated job flow, you can direct Amazon EMR to copy the step-level and Hadoop log files to an Amazon S3 bucket. If you specify that the log files are to be copied to an Amazon S3 bucket, you have the option to have Amazon EMR create an index over those log files to generate debugging information and reports. This index is stored in Amazon SimpleDB and can be accessed by clicking the Debug button in the Amazon EMR console.
Summarize this slide
Quickly show this slide, take the names, and move on to more examples as listed in slides 49 to 53.
There is also support for enterprise products such as Informatica, which you have probably heard about. Informatica is the leader in the enterprise data integration space, and their product HParser allows you to use the cloud to do ETL operations on large data sets. Informatica's HParser is a tool you can use to extract data stored in heterogeneous formats and convert it into a form that is easy to process and analyze. For example, if your company has legacy stock trading information stored in custom-formatted text files, you could use HParser to read the text files and extract the relevant data as XML. In addition to text and XML, HParser can extract and convert data stored in proprietary formats such as PDF and Word files. HParser is designed to run on top of the Hadoop architecture, which means you can distribute operations across many computers in a cluster to efficiently parse vast amounts of data. Amazon Elastic MapReduce (Amazon EMR) makes it easy to run Hadoop in the Amazon Web Services (AWS) cloud: with Amazon EMR you can set up a Hadoop cluster in minutes and automatically terminate the resources when the processing is complete.
The MapR Hadoop distribution adds dependability and ease of use to the strength and flexibility of Hadoop. The Amazon Elastic MapReduce (EMR) service enables you to easily set up, operate, and scale MapR deployments in the cloud, as well as integrate with other AWS services.
NFS: The MapR distribution for Hadoop provides an NFS interface that you can use to mount the cluster. The NFS interface enables you to use standard Linux tools and applications with your cluster directly. You can get data into and out of the cluster with scp, and analyze data with commands like grep, sed, awk, or your own applications or scripts. Amazon EMR with MapR clusters have NFS preconfigured. The cluster is mounted at the /mapr directory on the master node; cluster data and files reside in the directory /mapr/clustername (for example, /mapr/my.cluster.com). To use NFS on your Amazon EMR with MapR cluster, log in to the master node via SSH. After logging in to the cluster, you can use standard file-based applications, including Linux utilities, file browsers, and other applications. The MapR distribution for Hadoop also provides a Hive ODBC driver that conforms to the standard ODBC 3.52 specification.
With the M5 version of the MapR software you get enterprise features like disaster recovery across Availability Zones, where you can mirror specific data between clusters. You can also extend an on-premises MapR cluster to the cloud. Last but by no means least, you can take periodic on-demand snapshots to S3.
So let's look at some of the common design decisions developers have to make before deploying a cluster. The first one: should I use S3, or should I run HDFS? Actually, you can use both; the choice is yours. Remember that with EMR, HDFS data is lost as soon as you shut down the cluster, since HDFS sits on the local ephemeral drives and dies when the cluster is shut down.
Take for example the Netflix Hadoop platform-as-a-service architecture. Netflix collects a huge amount of data, and what you see in the diagram is their Hadoop-as-a-service platform built on AWS, offering a big data processing engine to different stakeholders within the business. At the base of the service is S3, where everything that is worth storing is stored, and hence it is the "single version of truth". With its scale, cost, global reach, and durability, S3 is the perfect place for them to store data. From S3 they run multiple EMR clusters. They like to use EMR instead of building their own cluster on EC2 because EMR takes away the undifferentiated heavy lifting. Various tools are used to explore the data, like Hive, Pig, Java programs, and Python code. On top of it they have a job execution and resource management platform called Genie. Genie is connected to enterprise schedulers and other visualization and web tools for data analysis.
These are the reasons why customers choose S3.

Eleven 9s of reliability and durability.

Version control against failure: with S3 you can enable versioning, which protects the data from logical corruption. Let's say on your physical cluster a developer overwrote something and logically corrupted the data. In spite of 3x replication of data in HDFS, you cannot recover it; with S3, just roll back.

Elastic and practically unlimited size: you can run multiple clusters in parallel — one production cluster, one SLA-driven high-performance cluster, many ad-hoc clusters, many dev clusters. Running different types of workflows in parallel guarantees isolation between jobs. Remember that five 10-node clusters cost you the same as one 50-node cluster but provide better isolation. If your data were in HDFS, you would need to replicate all the data between each cluster; with S3 there is one single version of truth, and you can run as many clusters as you want.

Continuously resizing clusters on the run can be difficult if all your data is in HDFS (data redistribution can happen); with S3 there is just a single version of truth.

On failure or spiky load, spin up a new cluster and start the job flow; there is no need to mirror data across HDFS.

http://techblog.netflix.com/2013/01/hadoop-platform-as-service-in-cloud.html
However, if you do want to use HDFS, you can. Remember that if the cluster is shut down, the data is lost, so make sure termination protection is on. All data processing then happens locally and not from S3. Alternatively, consider snapshotting to S3 periodically. Use S3DistCp to pull large volumes of data from S3 or push large volumes of data to S3. S3DistCp is a tool available on EMR that can be used to move large amounts of data; it runs on multiple nodes so that each node pulls data in parallel.
You can definitely use HDFS on EMR. You need to have