Building cross-region and cross-cloud high availability into your app: a real-life use case - Nati Shalom, Founder & CTO, Gigaspaces
Achieving high levels of availability and disaster recovery in a cloud environment requires the implementation of patterns and practices that introduce redundancy through multi-zone, multi-region, and multi-cloud deployments. As we move towards implementing higher availability, we cannot escape the direct increase in the accidental complexity of the deployment architecture resulting from lack of cloud portability and deployment lifecycle automation. We present how high availability and disaster recovery were achieved in reality by using the Cloudify open source framework on top of AWS. This approach applies to not just AWS but also other public clouds and private cloud environments such as Eucalyptus. The resulting reference architecture provides portable PostgreSQL replication and disaster recovery as well as application tier scalability across zones, regions, and public/private clouds through a unified deployment workflow.
A cloud management platform (CMP) is fast becoming a de facto requirement for enterprises pursuing a multi-cloud or hybrid cloud strategy. But what should you be looking for in a CMP? Many companies make the mistake of taking a “boil the ocean” approach to a CMP evaluation. We’ll share best practices and discuss whether you need an RFP.
Migrate your Existing Express Apps to AWS Lambda and Amazon API Gateway - Amazon Web Services
This webinar teaches you how to use Amazon API Gateway and AWS Lambda to run your existing Express.js applications with just a few lines of code. We will introduce three new features in API Gateway: proxy integrations, greedy paths, and the ANY HTTP method. Combining these features, you can configure API Gateway in a few simple clicks via the management console and express all of your logic and API definition in code.
Learning Objectives:
1. Easier migration to API Gateway and Lambda
2. New API Gateway Catch-all methods
Who Should Attend: Developers
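The proxy-integration model the webinar describes can be sketched as a single function that receives every request. This is a minimal Python sketch (shown instead of Express.js for brevity): the event fields (`httpMethod`, `path`) and the `statusCode`/`headers`/`body` response shape follow the Lambda proxy integration contract, while the routes themselves are hypothetical.

```python
import json

def handler(event, context):
    """Minimal Lambda proxy-integration handler.

    With a greedy path ({proxy+}) and the ANY method, API Gateway
    forwards every request to this one function; the event carries
    the original method and path, and the response must include
    statusCode, headers, and a string body.
    """
    method = event.get("httpMethod", "GET")
    path = event.get("path", "/")

    # In-code routing replaces per-resource API Gateway configuration.
    if method == "GET" and path == "/hello":
        payload, status = {"message": "hello"}, 200
    else:
        payload, status = {"error": f"no route for {method} {path}"}, 404

    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload),
    }
```

This is what lets an entire API live in code rather than in per-path gateway configuration.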
Terraform in production - experiences, best practices and deep dive - Piotr Ki... - PROIDEA
In my presentation I would like to share my experiences of working with Terraform in various infrastructure projects (ECS/Kops/core-infra types). I will share what is considered common sense in deploying projects with Terraform, along with several different approaches: Should I use a module? Should I write my own? How should I structure a repo with code? Terraform in Terraform (the kops example)?
Regardless of whether you do nothing, build your own kit, or buy from AWS or another CSP, someone from finance will come back to you and ask what happened to their money. In this session we will cover Cloud ROI: the key economic drivers for moving to the cloud and the tips and tricks for cost optimization on AWS.
AWS provides a platform that is ideally suited for building highly available systems, enabling you to build reliable, affordable, fault-tolerant systems that operate with a minimal amount of human interaction. This session covers many of the high-availability and fault-tolerance concepts and features of the various services that you can use to build highly reliable and highly available applications in the AWS Cloud: architectures involving multiple Availability Zones, including EC2 best practices and RDS Multi-AZ deployments; loosely coupled and self-healing systems involving SQS and Auto Scaling; networking best practices for high availability, including Elastic IP addresses, load balancing, and DNS; leveraging services that inherently are built with high-availability and fault tolerance in mind, including S3, Elastic Beanstalk and more.
This session is for anyone interested in understanding the financial costs associated with migrating workloads to AWS. By presenting real cases from AWS Professional Services and directly from a customer, we explore how to measure value, improve the economics of a migration project, and manage migration costs and expectations through large-scale IT transformations. We’ll also look at automation tooling that can further assist and accelerate the migration process.
Google Cloud Connect @ Korea
- Google Cloud Vision
- G Suite Product Roadmap
- Google Cloud Security
- Google Cloud Machine Learning
- G Suite Customer Stories
Cloud promises a simple pay-as-you-go approach to technology, with cost savings at the top of the list. As more enterprises adopt the cloud, cost continues to be a major issue, with new pricing models, services, and features that introduce waste and complexity into the decision-making process. In this webinar, you’ll learn expert strategies that will amplify your cloud performance and maximize your ROI at a level of intricacy that can’t be achieved through manual processes – tools and expertise are needed.
Docker containers have become a key component of modern application design. Increasingly, developers are breaking their applications apart into smaller components and distributing them across a pool of compute resources.
Software release cycles are now measured in days instead of months. Cutting-edge companies are continuously delivering high-quality software at a fast pace. In this session, we will cover how you can begin your DevOps journey by sharing best practices and tools used by the engineering teams at Amazon. We will showcase how you can accelerate developer productivity by implementing continuous integration and delivery workflows. We will also give an introduction to AWS CodeStar, AWS CodeCommit, AWS CodeBuild, AWS CodePipeline, AWS CodeDeploy, AWS Cloud9, and AWS X-Ray, the services inspired by Amazon's internal developer tools and DevOps practice.
Level: 200
Speaker: Nick Brandaleone - Solutions Architect, AWS
[NEW LAUNCH!] AWS Transit Gateway and Transit VPCs - Reference Architectures ... - Amazon Web Services
In this session, we will review the new AWS Transit Gateway and new networking features. We compare AWS Transit Gateway and Transit VPCs and discuss how to architect your accounts and VPCs. This session will be helpful if the developers have been let loose and you are planning lots of VPCs or accounts. How should you connect them, what limits do you need to be aware of, and how does routing work with many VPCs? We dive into the details of recent launches and how to work with concepts like Transit VPCs, account strategies, scaling services, using firewalls, and Direct Connect gateways to solve the problems of many VPCs.
Organisations are rapidly adopting hybrid cloud strategies to take advantage of both on-premises and cloud services. However, moving applications to the cloud can be difficult and time-consuming, often taking months. VMware offers solutions that customers are using to migrate hundreds of applications to the cloud in a few days. Additionally, VMware solutions simplify day 2 operations by providing consistent infrastructure and operations across on-premises and public cloud services. Come to this session to hear how VMware is helping organisations migrate applications to the cloud, extend their data centers to the cloud, deploy cloud-based disaster recovery solutions, and modernize their applications with the power of VMware and AWS cloud services.
Scaling Up to Your First 10 Million Users (ARC205-R1) - AWS re:Invent 2018 - Amazon Web Services
Cloud computing provides a number of advantages, such as the ability to scale your web application or website on demand. If you have a new web application and want to use cloud computing, you might be asking yourself, "Where do I start?" Join us in this session for best practices on scaling your resources from one to millions of users. We show you how to best combine different AWS services, how to make smarter decisions for architecting your application, and how to scale your infrastructure in the cloud.
AWS CloudFormation macros: Coding best practices - MAD201 - New York AWS Summit - Amazon Web Services
With AWS CloudFormation macros, infrastructure-as-code developers can use AWS Lambda functions to empower template authors with utilities to improve their productivity. In this session, we review example use cases to teach you best practices when writing macros. You also learn deployment strategies so your teams can make the most of this functionality.
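As a rough illustration of the mechanics described above, here is a minimal Python Lambda handler for a hypothetical "Count" macro that duplicates a resource N times. The `requestId`/`status`/`fragment` response shape is the macro contract; the Count convention and the resource names are illustrative, not part of CloudFormation itself.

```python
import copy

def handler(event, context):
    """CloudFormation macro sketch: any resource declaring a
    (hypothetical) top-level 'Count: N' key is expanded into N
    copies with numbered logical IDs. The macro receives the
    template as event['fragment'] and must return a transformed
    fragment along with the original requestId and a status.
    """
    fragment = event["fragment"]
    resources = fragment.get("Resources", {})
    expanded = {}
    for logical_id, resource in resources.items():
        count = resource.pop("Count", None)
        if count is None:
            expanded[logical_id] = resource
        else:
            # Duplicate the resource, suffixing the logical ID.
            for i in range(1, int(count) + 1):
                expanded[f"{logical_id}{i}"] = copy.deepcopy(resource)
    fragment["Resources"] = expanded
    return {
        "requestId": event["requestId"],
        "status": "success",
        "fragment": fragment,
    }
```

Template authors would then write `Count: 2` on a resource and let the macro do the repetitive expansion at deploy time.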
The Azure platform comprises more than 200 cloud products and services designed to help you bring new solutions to life, solve today's challenges, and create the future. Build, run, and manage applications across multiple clouds, on-premises, and at the edge, with the tools and frameworks of your choice.
Public clouds are going to crash. It's inevitable. The best thing you can do is be prepared with a highly available architecture to ensure you're not affected by the outage. Join a live webinar with Gigaspaces founder and CTO Nati Shalom to discuss best practices in high availability to safeguard your cloud from the inevitable outage.
http://www.newvem.com/cloud-webinar-safe-guard-your-application-from-outages/
LinuxFest NW 2013: Hitchhiker's Guide to Open Source Cloud Computing - Mark Hinkle
Presented on April 27th, 2013 at LinuxFest NW
Imagine it’s eight o’clock on a Thursday morning and you awake to see a bulldozer out your window, ready to plow over your data center. Normally you might wish to consult the Encyclopedia Galáctica to discern the best course of action, but your copy is likely out of date. And while the Hitchhiker’s Guide to the Galaxy (HHGTTG) is a wholly remarkable book, it doesn’t cover the nuances of cloud computing. That’s why you need the Hitchhiker’s Guide to Cloud Computing (HHGTCC), or at least to attend this talk to understand the state of open source cloud computing. Specifically, this talk will cover infrastructure-as-a-service, platform-as-a-service, and developments in big data, and how to more effectively take advantage of these technologies using open source software. Technologies that will be covered in this talk include Apache CloudStack, Chef, CloudFoundry, NoSQL, OpenStack, Puppet and many more.
Specific topics for discussion will include:
Infrastructure-as-a-Service - The Systems Cloud - Get a comparison of the open source cloud platforms, including OpenStack, Apache CloudStack, Eucalyptus, and OpenNebula.
Platform-as-a-Service - The Developers Cloud - Find out what tools are available to build portable, auto-scaling applications, including CloudFoundry, OpenShift, Stackato and more.
Data-as-a-Service - The Analytics Cloud - Want to figure out the who, what, where, when and why of big data? You get an overview of open source NoSQL databases and technologies like MapReduce to help crunch massive data sets in the cloud.
Finally, you'll get an overview of the tools that can help you really take advantage of the cloud. Want to auto-scale virtual machines to serve millions of web pages, or automate the configuration of cloud computing environments? You'll learn how to combine these tools to provide continuous deployment systems that will help you earn DevOps cred in any data center.
[Finally, for those of you that are Douglas Adams fans please accept the deepest apologies for bad analogies to the HHGTTG.]
The Total Cost of Ownership (TCO) of Web Applications in the AWS Cloud - Jine... - Amazon Web Services
Weighing the financial considerations of owning and operating a data center facility versus employing a cloud infrastructure requires detailed and careful analysis. In practice, it is not as simple as just measuring potential hardware expense alongside utility pricing for compute and storage resources. The Total Cost of Ownership (TCO) is often the financial metric used to estimate and compare direct and indirect costs of a product or a service. Given the large differences between the two models, it is challenging to perform accurate apples-to-apples cost comparisons between on-premises data centers and cloud infrastructure that is offered as a service. In this presentation, we explain the economic benefits of deploying a web application in the Amazon Web Services (AWS) cloud over deploying an equivalent web application hosted in an on-premises data center and highlight the 5 things to not forget while calculating TCO.
Whitepaper: http://bit.ly/aws-tco-webapps
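The apples-to-apples difficulty the abstract describes ultimately comes down to arithmetic like the following. This is a toy sketch with entirely hypothetical figures (not AWS pricing), ignoring staffing, utilization, power, and hardware refresh cycles that a real TCO analysis must include.

```python
def three_year_tco(upfront, yearly_opex, years=3):
    """Simplest possible TCO: one-time capital cost plus recurring
    operating cost over the comparison window. Real analyses add
    many more direct and indirect cost lines on both sides.
    """
    return upfront + yearly_opex * years

# Hypothetical on-premises: $120k of hardware plus $30k/yr to run it.
on_prem = three_year_tco(upfront=120_000, yearly_opex=30_000)
# Hypothetical cloud: no upfront hardware, $55k/yr in service charges.
cloud = three_year_tco(upfront=0, yearly_opex=55_000)
# on_prem = 210_000 vs. cloud = 165_000 in this toy comparison.
```

The point of the model is not the numbers but the structure: the comparison only becomes fair once both sides enumerate the same cost categories.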
Intro to cloud computing — MegaCOMM 2013, JerusalemReuven Lerner
What is cloud computing? This is an introduction that I gave at MegaCOMM 2013, a conference for technical writers in Jerusalem. The talk describes how the combination of Internet access, virtualization, and open source have made computing a utility that we can turn on and off at will -- similar in some ways to electricity, water, and other utilities with which we're familiar.
Skycon 2012 - Public, private, and hybrid; software, platform, and infrastructure. This talk will discuss the current state of the Platform-as-a-Service space, and why the keys to success lie in enabling developer productivity, and providing openness and choice.
Thanks to Tony Whitmore for the audio and to Patrick Chanezon for some pieces of the content.
Can we hack open source #cloud platforms to help reduce emissions?Tom Raftery
Cloud computing is changing our lives but this change comes with a cost - pollution.
Can we hack open source cloud platforms to make them report their energy and (more importantly) their emissions, so we can choose the cleanest cloud?
Video of this talk is now online at http://redmonk.com/tv/2012/10/24/can-we-hack-open-source-cloud-platforms-to-help-reduce-emissions/
Symantec’s Avoiding the Hidden Costs of Cloud 2013 Survey found more than 90 percent of all organizations are at least discussing cloud, up from 75 percent a year ago. Other key survey findings showed enterprises and SMBs are experiencing escalating costs tied to rogue cloud use, complex backup and recovery, and inefficient cloud storage.
The 2013 Future of Cloud Computing 3rd Annual Survey was conducted in partnership with GigaOM Research and 57 industry collaborators. It focuses on Cloud adoption, growth, investment, and key trends emanating from the 2011 and 2012 surveys. For additional information and to get involved follow us @futureofcloud #futurecloud and visit http://www.mjskok.com/resource/2013-future-cloud-computing-3rd-annual-survey-results.
AWS Canberra WWPS Summit 2013 - Cloud Computing with AWS: Introduction to AWS - Amazon Web Services
Amazon Elastic Compute Cloud (Amazon EC2) provides resizable compute capacity in the cloud and is often the starting point for your first week using AWS. This session will introduce these concepts, along with the fundamentals of EC2, by employing an agile approach that is made possible by the cloud. Attendees will experience the reality of what a first week on EC2 looks like from the perspective of someone deploying an actual application on EC2. You will follow them as they progress from deploying their entire application from an EC2 AMI on day 1 to more advanced features and patterns available in EC2 by day 5. Throughout the process we will identify cloud best practices that can be applied to your first week on EC2 and beyond.
Curious about the cloud? We've got answers. Join HOSTING for an overview of cloud hosting and computing basics. From the history of the cloud to the projected future, we'll investigate the foundation of this $2.1 billion industry.
Disaster Recovery on Demand on the Cloud - Nati Shalom
How to avoid cloud outages and leverage cloud economics to keep costs down through automation of disaster recovery processes and on-demand deployment of the backup nodes.
Improving Availability & Lowering Costs with Auto Scaling & Amazon EC2 (CPN20... - Amazon Web Services
Running your Amazon EC2 instances in Auto Scaling groups allows you to improve your application's availability right out of the box. Auto Scaling replaces impaired or unhealthy instances automatically to maintain your desired number of instances (even if that number is one). You can also use Auto Scaling to automate the provisioning of new instances and software configurations, as well as to track usage and costs by app, project, or cost center. Of course, you can also use Auto Scaling to adjust capacity as needed - on demand, on a schedule, or dynamically based on demand. In this session, we show you a few of the tools you can use to enable Auto Scaling for the applications you run on Amazon EC2. We also share tips and tricks we've picked up from customers such as Netflix, Adobe, Nokia, and Amazon.com about managing capacity, balancing performance against cost, and optimizing availability.
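The self-healing behavior described above - replacing impaired instances until the group is back at its desired capacity - can be modeled as a toy reconciliation loop. The instance records and IDs here are illustrative; this is not the EC2 or Auto Scaling API.

```python
def reconcile(instances, desired):
    """Toy model of an Auto Scaling group's reconciliation step:
    drop unhealthy instances and launch replacements until the
    group is back at its desired capacity (even if that is one).
    """
    healthy = [i for i in instances if i["healthy"]]
    launched = []
    while len(healthy) + len(launched) < desired:
        # Stand in for launching a fresh instance from the group's config.
        launched.append({"id": f"new-{len(launched) + 1}", "healthy": True})
    return healthy + launched
```

Running this after every health check is, conceptually, what keeps the group at its target size without human intervention.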
Over 60 CIOs and tech leaders attended the #GoCloudWebinar on “AGILE INFRASTRUCTURE WITH WINDOWS AZURE” hosted by Aditi Technologies and Microsoft. Our CTO, Wade Wegner, and Microsoft Azure solution specialist Dina Frandsen discussed how Windows Azure Infrastructure Services (WAIS) can help organizations stay agile, what the Windows Azure technology environment looks like, and what it means to your organization.
We Explored
1. How IT teams can execute fast and stay lean with WAIS – A case study
2. Which enterprise workloads are best suited for WAIS migration
3. What are the best practices for planning, executing, and deploying WAIS
Download this slide deck and sign up at the link below to view the webinar - http://www.aditi.com/webevent/Agile_Infrastructure_with_WAIS/
Azure and Nutanix: your journey to the hybrid cloud - ICT-Partners
Looking for solutions for a flexible, scalable, cost-efficient, and future-proof datacenter? Then discover the power of Microsoft Azure & Nutanix: two modern platforms that let you combine the advantages of your on-premises infrastructure with the advantages of the public cloud.
Presentation from 30 April 2015
These days, EVERY workload is considered critical by someone in the organization. As a result, SLAs are shrinking. IT is challenged to meet these SLAs, but there isn’t enough budget to provide services like disaster recovery (DR) using traditional methods and infrastructure. The good news is that public cloud platforms, like AWS, are becoming the de facto infrastructure choice for DR. However, workload portability solutions that simplify cross-platform or cloud recovery are required to meet most RTO & RPO SLAs in the cloud. AWS provides the infrastructure we need to bring DR to tier 2 and tier 3 workloads that have never been able to afford it before. Now, we need orchestration and automation to make it scalable and reliable.
In this session you will learn key considerations and practical steps for getting to the AWS cloud and how you can leverage Amazon S3 storage for cost-effective disaster recovery. Dow Jones will also share details on their migration to AWS Cloud, the benefits realized there, and what the future looks like. Session sponsored by Commvault.
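The RTO/RPO framing above can be reduced to a toy feasibility check: worst-case data loss with periodic backups is the gap between backups (RPO), and recovery time is the time to restore and restart on the standby platform (RTO). All thresholds and inputs here are hypothetical, and a real analysis must also include transfer and restore-validation time.

```python
def dr_sla_check(backup_interval_min, restore_time_min, rpo_min, rto_min):
    """Toy DR SLA check. With backups every backup_interval_min
    minutes, the worst-case data loss equals that interval, so the
    RPO is met only if backups run at least that often; the RTO is
    met only if a full restore fits inside the allowed window.
    """
    return {
        "rpo_ok": backup_interval_min <= rpo_min,
        "rto_ok": restore_time_min <= rto_min,
    }
```

Checks like this make it concrete why tier 2 and tier 3 workloads, with looser RPO/RTO targets, become affordable on cloud infrastructure first.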
AWS 201 - A Walk through the AWS Cloud: App Hosting on AWS - Games, Apps and ... - Amazon Web Services
Playfish, Gumi, and Halfbrick are three of many gaming companies on AWS. Pinterest, Netflix and Flipboard host web and mobile applications using the AWS Cloud. What are the best practices to build an application to take advantage of the benefits of AWS? Learn about these approaches and how customers have built highly scalable, durable and reliable infrastructures to host their internet-facing businesses on AWS. Attend this complimentary webinar to learn more.
19th February 2013, AWS User Group UK, Meetup #3, Managing your apps on AWS: ... - AWS User Group UK
Agenda entry: Managing your apps on AWS: Real life lessons with GigaSpaces, Ron Zanver. We’ve all learned Murphy’s inevitable law the hard way – if it can go wrong, it often will! But that doesn’t mean we can’t be ready for such scenarios in the cloud. In this talk, GigaSpaces will focus on the AWS environment, which is dynamic and volatile by nature, and how to maximise your utilisation and minimise downtime. This session will show you how you can architect your cloud-hosted systems to sustain such outages, delving into how to choose the right PaaS for the job, addressing data centre failures, how to avoid single points of failure, and more.
Organiser's commentary: Ron Zanver from GigaSpaces came to talk about the inherent instability of life in the cloud, and what you can do to protect yourself - it's all about good design and architecture. He also introduced us to GigaSpaces' new Cloudify product, for abstracting estate management across multiple clouds and cloud vendors.
How to protect your application from outages and failures of cloud infrastructures: planning a disaster recovery architecture and using Cloudify for cloud abstraction and monitoring.
Flink Forward SF 2017: James Malone - Make The Cloud Work For You - Flink Forward
You should spend your time using the powerful Apache Flink ecosystem to get value from your data, not on your data processing infrastructure. Cloud environments can help you with this problem by providing managed services and infrastructure. Since Google Cloud Dataproc, Google's managed service to power the Apache big data ecosystem, runs Flink, you can easily combine the benefits of cloud with your Flink data pipelines. With new support for Flink and long-running streaming jobs, we will show you how you can set up a cluster and a streaming job in less than three minutes.
Scaling Databricks to Run Data and ML Workloads on Millions of VMs - Matei Zaharia
Keynote at Scale By The Bay 2020.
Cloud service developers need to handle massive scale workloads from thousands of customers with no downtime or regressions. In this talk, I’ll present our experience building a very large-scale cloud service at Databricks, which provides a data and ML platform service used by many of the largest enterprises in the world. Databricks manages millions of cloud VMs that process exabytes of data per day for interactive, streaming and batch production applications. This means that our control plane has to handle a wide range of workload patterns and cloud issues such as outages. We will describe how we built our control plane for Databricks using Scala services and open source infrastructure such as Kubernetes, Envoy, and Prometheus, and various design patterns and engineering processes that we learned along the way. In addition, I’ll describe how we have adapted data analytics systems themselves to improve reliability and manageability in the cloud, such as creating an ACID storage system that is as reliable as the underlying cloud object store (Delta Lake) and adding autoscaling and auto-shutdown features for Apache Spark.
This presentation provides an introduction to the Cloudify integration plugin with Terraform.
This integration allows Terraform users to use Cloudify to manage the configuration and workflows of applications on top of infrastructure that was created by Terraform.
What A No Compromises Hybrid Cloud Looks Like - Nati Shalom
Expectation vs. reality of a typical enterprise cloud journey
Lessons learned on how to set a cloud-native strategy without compromising on the least common denominator, nor going through a complete rewrite
It has long been debated whether OpenStack is production ready. In this session you will learn how a major bank has gone to production with more than 5000 VMs, delivering a 40% decrease in cost, deployment time reduced from weeks to hours, 56 new technologies introduced, and 7 new platforms launched - all in under a year. Learn how their platform built on Rackspace and RHEL, coupled with best-of-breed open source tooling - SaltStack, Jenkins, Cloudify, and Nexus - are the enablers for production-grade OpenStack.
http://sched.co/7fH1
Orchestration tool roundup: Kubernetes vs. Docker vs. Heat vs. Terraform vs... - Nati Shalom
Video recording: https://www.youtube.com/watch?v=tGlIgUeoGz8
It’s no news that containers represent a portable unit of deployment, and OpenStack has proven an ideal environment for running container workloads. However, where it usually becomes more complex is that many times an application is often built out of multiple containers. What’s more, setting up a cluster of container images can be fairly cumbersome because you need to make one container aware of another and expose intimate details that are required for them to communicate which is not trivial especially if they’re not on the same host.
These scenarios have instigated the demand for some kind of orchestrator, and the list of container orchestrators is growing fairly fast. This session will compare the different orchestration projects out there - from Heat to Kubernetes to TOSCA - and help you choose the right tool for the job.
Session link from the summit: https://openstacksummitmay2015vancouver.sched.org/event/abd484e0dedcb9774edda1548ad47518#.VV5eh5NViko
Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ...Nati Shalom
Looking for application orchestration in a hybrid or multi-cloud environment? You’ve got to hear about TOSCA orchestration. TOSCA (Topology and Orchestration Specification for Cloud Applications), brought to you by the same people who brought us XML, enables you to seamlessly migrate your workloads across environments or build a hybrid deployment that runs simultaneously across the VMware cloud offering.
Join our Cloud Online Meetup to learn how Cloudify’s TOSCA-compliant orchestration can be your common management interface across the VMware cloud offering, OpenStack and heterogeneous cloud environments.
Speakers:
Nati Shalom, Founder and CTO at GigaSpaces, is a thought leader in Cloud Computing and Big Data technologies. Shalom was recently recognized as a Top Cloud Computing Blogger for CIOs by The CIO Magazine, and his blog is listed as an excellent blog by YCombinator. Shalom is the founder and one of the leaders of the OpenStack Israel group, and is a frequent presenter at industry conferences.
Paco Gomez, Senior Solution Architect at VMware vCloud Air. Paco evaluates and integrates strategic solutions that help vCloud Air clients benefit from VMware's hybrid cloud and application services. Paco is a seasoned technologist with extensive experience in diverse fields including mainframes, distributed systems, enterprise development, cloud computing, mobile, assistive technology, electrical engineering, and embedded systems. Across his career, Paco has held positions in consulting and sales engineering.
OpenStack Juno The Complete Lowdown and Tales from the SummitNati Shalom
This presentation covers the main points from the summit and the OpenStack Juno release. It also covers how users are using OpenStack, based on the recent user survey.
Application and Network Orchestration using Heat & ToscaNati Shalom
The buzzwords Neutron, Heat, and TOSCA come up quite often when it comes to OpenStack - and many of us are still trying to make sense of the terminology and its place in the OpenStack world.
OpenStack Neutron provides APIs for creating network elements, OpenStack Heat provides an orchestration engine for automating the setup and configuration of OpenStack infrastructure, and TOSCA is a standard for templating and defining application topology and policies (which forms the basis for Heat). In this context, it really makes sense to put these together to achieve application and network automation for OpenStack on steroids.
In this session we will learn how we can use the robust combination of Heat and TOSCA to configure and control resources on Nova and Neutron in order to automate the network configuration as part of the application deployment.
The session will include a demo and code examples that show how you can configure virtual networks, attach public IPs, set up security groups, set up load balancing and automatically scale up/down and more. You will leave this session understanding where Neutron meets Heat and TOSCA.
This talk was delivered as part of OpenStack Paris summit - 2014 - http://openstacksummitnovember2014paris.sched.org/event/2b85b682ccaf3a5961e463b61e2403f8#.VFeuG_TF8mc
During the past few years we’ve seen our entire data center become software defined. This includes the compute, storage, network, and also configuration. This new data center is the cloud.
The missing piece in the puzzle:
While this is pretty much old news, there is one big piece missing from this puzzle: the operator itself.
The operator is responsible for running processes such as:
* Installation of new apps
* Upgrades and updates of new features or patches
* Performance tuning
* Handling failure
* Managing the capacity to meet the scaling demand.
Most of those tasks today involve lots of human intervention. Users who recognize that gap try to mitigate it with their own custom automation - usually in the form of scripts on top of configuration management. Those custom scripts tend to grow fairly quickly, to the point where they become unmanageable.
This presentation will introduce how we can use an orchestrator to automate those tasks and thereby create a software-defined operator.
Complex Analytics with NoSQL Data Store in Real TimeNati Shalom
NoSQL engines are often limited in the types of queries they can support due to the distributed nature of the data. In this session we will learn patterns for overcoming this limitation and combining multiple query semantics in NoSQL-based engines.
We will demonstrate a combination of key/value, SQL-like, document-model, and graph-based queries, as well as more advanced topics such as handling partial updates and querying through projections. We will also demonstrate how to create a mashup of those APIs, i.e., write fast through the key/value API and execute complex queries on the same data through SQL.
- See more at: http://nosql2014.dataversity.net/sessionPop.cfm?confid=81&proposalid=6335#sthash.PNSZi5TJ.dpuf
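To make the "mashup" idea concrete, here is a minimal, hypothetical sketch of the pattern - writing through a key/value-style API and querying the same data through SQL. SQLite is used purely as a stand-in data store; the engine discussed in the talk (GigaSpaces XAP) has a different, distributed API.

```python
import json
import sqlite3

# In-memory store standing in for a distributed data grid (illustrative only).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (key TEXT PRIMARY KEY, name TEXT, age INTEGER, doc TEXT)")

def put(key, value):
    """Key/value-style write path: caller thinks in keys and objects, not SQL."""
    db.execute("INSERT OR REPLACE INTO users VALUES (?, ?, ?, ?)",
               (key, value["name"], value["age"], json.dumps(value)))

put("u1", {"name": "Ada", "age": 36})
put("u2", {"name": "Grace", "age": 45})

# SQL-style query over the very same data written through the key/value path.
rows = db.execute("SELECT name FROM users WHERE age > 40").fetchall()
print(rows)  # [('Grace',)]
```

The design point is that both APIs address one copy of the data, so a fast write path and a rich query path need not be separate stores.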
Is Orchestration the Next Big Thing in DevOpsNati Shalom
DevOps processes (such as continuous deployment and delivery) often involve writing many custom scripts that are triggered by the build system. With that approach, it is relatively hard to trace the deployment process and troubleshoot when something goes wrong. Additionally, custom scripts are often not written in an easily understood manner. In this session we will walk through specific DevOps workflows (such as install, update, etc.) using Riemann as the framework in question and see the steps required to automate those processes. We will also discuss how Cloudify uses Riemann to provide simple execution and monitoring of those workflow processes. We will share how one customer, PaddyPower, was able to leverage Cloudify to transition their traditional IT into a DevOps environment, bridging the gap between Dev and Ops.
When networks meets apps (open stack atlanta)Nati Shalom
Recent advancements in OpenStack capabilities have made the cloud better tuned to enterprise needs by introducing much more flexible network designs and networking services, with the tradeoff of making the cloud more complex.
In this session we will describe how we can leverage the power of these new networking advancements without exposing the complexity to the end user. We will present alternative approaches and their tradeoffs for automating the deployment of a typical n-tier enterprise application, which includes a multi-tenant environment, separate networks for admin and applications, cross-region networking, attaching a floating IP, setting up security groups, etc., all through a combination of Heat, TOSCA, Chef, Puppet, and more.
The experience of automating continuous delivery processes with Chef and Cloudify through an application-centric approach to DevOps, and how this model transformed PaddyPower's traditional IT into DevOps, keeping their Devs and their Ops happy.
References:
---------------
- Cloudify & Chef : http://www.cloudifysource.org/guide/2.7/integrations/chef_documentation
- Blog Post: http://www.cloudifysource.org/2013/10/27/application_centric_approach_to_devops.html
- Earlier Video Presentation : http://www.youtube.com/watch?v=YhDNKyP_s7U
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
I have heard many times that architecture is not important for the front-end. I have also often seen developers implement front-end features by simply following a framework's standard rules, thinking that this is enough to launch the project successfully - and then the project fails. How can you prevent this, and which approach should you choose? I have launched dozens of complex projects, and during the talk we will analyze which approaches have worked for me and which have not.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation introduction
UI automation sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...James Anderson
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. A constant focus on speed to release software to market, along with traditionally slow and manual security checks, has caused gaps in continuous security, an important piece of the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their application supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for making things work, along with a knack for helping others understand how things work. He has around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
Accelerate your Kubernetes clusters with Varnish CachingThijs Feryn
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
PHP Frameworks: I want to break free (IPC Berlin 2024)Ralf Eggert
In this presentation, we examine the challenges and limitations of relying too heavily on PHP frameworks in web development. We discuss the history of PHP and its frameworks to understand how this dependence has evolved. The focus will be on providing concrete tips and strategies to reduce reliance on these frameworks, based on real-world examples and practical considerations. The goal is to equip developers with the skills and knowledge to create more flexible and future-proof web applications. We'll explore the importance of maintaining autonomy in a rapidly changing tech landscape and how to make informed decisions in PHP development.
This talk is aimed at encouraging a more independent approach to using PHP frameworks, moving towards a more flexible and future-proof approach to PHP development.
Search and Society: Reimagining Information Access for Radical FuturesBhaskar Mitra
The field of information retrieval (IR) is currently undergoing a transformative shift, at least partly due to the emerging applications of generative AI to information access. In this talk, we will deliberate on the sociotechnical implications of generative AI for information access. We will argue that there is both a critical necessity and an exciting opportunity for the IR community to re-center our research agendas on societal needs while dismantling the artificial separation between the work on fairness, accountability, transparency, and ethics in IR and the rest of IR research. Instead of adopting a reactionary strategy of trying to mitigate potential social harms from emerging technologies, the community should aim to proactively set the research agenda for the kinds of systems we should build, inspired by diverse, explicitly stated sociotechnical imaginaries. The sociotechnical imaginaries that underpin the design and development of information access technologies need to be explicitly articulated, and we need to develop theories of change in the context of these diverse perspectives. Our guiding future imaginaries must be informed by other academic fields, such as democratic theory and critical theory, and should be co-developed with social science scholars, legal scholars, civil rights and social justice activists, and artists, among others.
DevOps and Testing slides at DASA ConnectKari Kakkonen
Slides by me and Rik Marselis from the DASA Connect conference on 30.5.2024. We discuss what testing is, then what agile testing is, and finally what testing in DevOps looks like. We also held a lovely workshop with the participants, exploring different ways to think about quality and testing in different parts of the DevOps infinity loop.
JMeter webinar - integration with InfluxDB and GrafanaRTTS
Watch this recorded webinar about real-time monitoring of application performance. See how to integrate Apache JMeter, the open-source leader in performance testing, with InfluxDB, the open-source time-series database, and Grafana, the open-source analytics and visualization application.
In this webinar, we will review the benefits of leveraging InfluxDB and Grafana when executing load tests and demonstrate how these tools are used to visualize performance metrics.
Length: 30 minutes
Session Overview
-------------------------------------------
During this webinar, we will cover the following topics while demonstrating the integrations of JMeter, InfluxDB and Grafana:
- What out-of-the-box solutions are available for real-time monitoring JMeter tests?
- What are the benefits of integrating InfluxDB and Grafana into the load testing stack?
- Which features are provided by Grafana?
- Demonstration of InfluxDB and Grafana using a practice web application
To view the webinar recording, go to:
https://www.rttsweb.com/jmeter-integration-webinar
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Jeffrey Haguewood
Sidekick Solutions uses Bonterra Impact Management (fka Social Solutions Apricot) and automation solutions to integrate data for business workflows.
We believe integration and automation are essential to user experience and the promise of efficient work through technology. Automation is the critical ingredient to realizing that full vision. We develop integration products and services for Bonterra Case Management software to support the deployment of automations for a variety of use cases.
This video focuses on the notifications, alerts, and approval requests using Slack for Bonterra Impact Management. The solutions covered in this webinar can also be deployed for Microsoft Teams.
Interested in deploying notification automations for Bonterra Impact Management? Contact us at sales@sidekicksolutionsllc.com to discuss next steps.
Avoiding Cloud Outage
1. Protect your app from Outages
Nati Shalom CTO GigaSpaces
@natishalom
May 2013
2. AWS and outages
Outage impact
Disaster Recovery – it’s all about redundancy!
Cloudify as a solution for redundancy
Demo with Cloudify on EC2
® Copyright 2013 GigaSpaces Ltd. All Rights Reserved
AGENDA
3. AWS USAGE
• AWS – around 0.5M servers
• Facebook – less than 0.1M servers
• Google – around 1M servers
5. OUTAGE – APRIL 21, 2011
6. OUTAGE - JUNE 29, 2012
7. OUTAGE - OCTOBER 22, 2012
8. OUTAGE - CHRISTMAS EVE 2012
9. NOT ONLY AMAZON
28 December 2012 - some owners of Microsoft's Xbox 360 gaming console were unable to access some of their cloud-based storage files.
26 July 2012 - service for Microsoft’s Windows Azure Europe region went down for more than two hours.
29 February 2012 - the ultimate result was service impacts of 8-10 hours for users of Azure data centers in Dublin, Ireland, Chicago, and San Antonio.
10. THAT’S WHAT YOU EXPECT?
99% - 3.65 days downtime
99.9% - 8.76 hours downtime
99.99% - 53 minutes downtime
99.999% - 5.26 minutes downtime
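The downtime figures above follow directly from the availability percentage over a 365-day year (the slide rounds 52.56 minutes up to 53). A quick sketch to reproduce them:

```python
def downtime_per_year(availability_pct):
    """Yearly downtime, in seconds, implied by an availability percentage."""
    seconds_per_year = 365 * 24 * 3600
    return seconds_per_year * (1 - availability_pct / 100)

# 99% availability allows 3.65 days of downtime per year;
# 99.999% allows only about 5.26 minutes.
for pct in (99.0, 99.9, 99.99, 99.999):
    s = downtime_per_year(pct)
    print(f"{pct}% -> {s / 86400:.2f} days, {s / 3600:.2f} hours, {s / 60:.2f} minutes")
```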
11. OUTAGE IMPACT – DESIGN FOR FAILURES
Outage could cost…
$89K per hour for Amadeus
$225K per hour for PayPal!
14. PREPARE FOR DISASTER RECOVERY
•Dedicated expert for DR architecture
•Define target recovery time & point
•Assume every tier can fail
•Use monitoring and alerts
•Document your operational processes
22. BUILT-IN SUPPORT FOR MANAGING DATA IN THE CLOUD
Real Time: Storm, Elastic Caching (XAP)
Relational DB Clusters: MySQL, PostgreSQL
NoSQL Clusters: MongoDB, Cassandra, Couchbase, ElasticSearch
Hadoop: Hadoop (Hive, Pig, ...), ZooKeeper
24. VERIFI (CURRENT) DEPLOYMENT ARCHITECTURE
[Architecture diagram] A single availability region (US-West: Oregon): an Internet-facing EC2 instance running mod_cluster, an EC2 instance running JBoss, and EC2 instances running PostgreSQL and Cassandra, each with an attached data volume. Deployed via 4 recipes.
25. TARGET ARCHITECTURE
[Architecture diagram] Two availability regions. US-West (Oregon): an Internet-facing EC2 instance running mod_cluster, an EC2 instance running JBoss, and EC2 instances running the Postgres master and Cassandra, each with an attached data volume. US-East (Virginia): the same topology, with a Postgres slave replicating from the master.
Bootstrap two EC2 clouds in different regions and install the “verifi” application on each. The second cloud will have a slightly modified (extended) Postgres recipe for acting as a slave, plus no running app servers. Upon the primary zone's failure, the second cloud will spin up instances of the app servers and turn its data instance into the master, then bootstrap another “slave” cloud in another zone.
26. FAILOVER SCENARIO
[Failover diagram] Cloud #1 (region US-West Oregon: app servers + PostgreSQL) is liveness-polled by Cloud #2 (region US-East Virginia: PostgreSQL slave). When a region failure occurs, Cloud #2 turns its Postgres slave into the master and starts app server instances, then bootstraps another cloud, Cloud #3 (region US-West California: PostgreSQL), in a different region using the same application recipe used to bootstrap Cloud #2.
Upon initial deployment, the primary deployment of the application is bootstrapped onto Cloud #1; another, slightly modified application recipe is bootstrapped as Cloud #2, polling Cloud #1 for failure and acting as a PostgreSQL slave.
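The liveness-poll-then-promote workflow in this scenario can be sketched as a simple control loop. This is an illustrative outline only - the callbacks (promote_slave, start_app_servers, bootstrap_new_slave) are hypothetical stand-ins for what the deployment recipes would actually execute:

```python
import time

def primary_alive(endpoint, probe):
    """Liveness poll: True if the primary region answers the probe."""
    try:
        return probe(endpoint)
    except Exception:
        return False

def failover_loop(primary, probe, promote_slave, start_app_servers,
                  bootstrap_new_slave, interval=30, max_misses=3):
    """Poll the primary; after several consecutive misses, fail over."""
    misses = 0
    while True:
        if primary_alive(primary, probe):
            misses = 0
        else:
            misses += 1
            if misses >= max_misses:
                promote_slave()        # turn the Postgres slave into the master
                start_app_servers()    # spin up app-server instances
                bootstrap_new_slave()  # bootstrap a fresh slave in another region
                return
        time.sleep(interval)
```

Requiring several consecutive misses before promoting guards against a single dropped probe triggering an unnecessary (and expensive) failover.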
27. NEXT STEPS
[Diagram] Availability increases along three steps:
• Across AWS zones: 1 application + overrides, 1 cloud driver (supported by Verifi phase #1)
• Across AWS regions: 1 application + overrides, 1 cloud driver
• Across clouds (AWS, Rackspace, Azure, etc.): 1 application + overrides, several cloud drivers
28. EVOLUTION PATH
[Diagram] Both availability and complexity grow along the evolution path: multi instance → multi zone → multi region → multi cloud/provider.
29. SUMMARY
AWS and outages
Outage impact
Disaster Recovery – it’s all about redundancy!
• Cloning your environment – app stack
• Cloning your DB – replication
Cloudify as a solution for redundancy
• Use recipes to work on any cloud
• Fast and customized data replication
Demo with Cloudify on EC2
A high-ranking Amazon executive said there are 60,000 different customers across the various Amazon Web Services, and most of them are not the startups that are normally associated with on-demand computing. Rather, the biggest customers in both number and amount of computing resources consumed are divisions of banks, pharmaceutical companies, and other large corporations who try AWS once for a temporary project and then get hooked. According to Statspotting.com in March 2012, a researcher estimates that Amazon Web Services is using at least 454,400 servers in seven data center hubs around the globe. Let us try this: Google is powered by a million servers, maybe a little more than that, and Amazon has half a million servers. Now things fall into place. Facebook, the service that takes up one fourth of all our time online, is powered by fewer than 100,000 servers. The biggest customers include Pinterest, Instagram, Netflix, Heroku, Quora, Foursquare, etc. Amazon Web Services runs more than 835,000 requests per second for hundreds of thousands of customers in 190 countries, including 300 government agencies and 1,500 educational institutions.
The Amazon cloud proved itself in that sufficient resources were available worldwide, such that many well-prepared users could continue operating with relatively little downtime. But because Amazon’s reliability had been incredible, many users were not well prepared, leading to widespread outages. The Amazon EC2 outage of April 2011 was the worst in cloud computing’s history at the time. It made the front page of many news sites, including the New York Times, probably because many people were shocked by how many web sites and services rely on EC2. Microsoft Azure outages: 28 December 2012 - some owners of Microsoft's Xbox 360 game console were unable to access some of their cloud-based save storage files. 26 July 2012 - service for Microsoft’s Windows Azure Europe region went down for more than two hours. 29 February 2012 - the ultimate result was service impacts of 8-10 hours for users of Azure data centers in Dublin, Ireland, Chicago, and San Antonio.
Some parts of Amazon Web Services suffered a major outage. A portion of volumes utilizing the Elastic Block Store (EBS) service became "stuck" and were unable to fulfill read/write requests. It took at least two days for service to be fully restored. Reddit, one of the better-known sites to go down due to the error, said it has 700 EBS volumes with Amazon. Sites like Quora and Reddit were able to come back online in "read-only" mode, but users couldn't post new content for many hours.
For the second time in less than a month, Amazon’s Northern Virginia data center suffered an outage, impacting many popular services such as Instagram, Pinterest, and Netflix. Several websites that rely on Amazon Web Services were taken offline due to a severe storm of historic proportions in the Northern Virginia area, where Amazon's largest datacenter is located. Amazon previously suffered an outage in its Northern Virginia facilities on June 14, 2012. A line of severe storms packing winds of up to 80 mph caused extensive damage and power outages in Virginia. Dominion Virginia Power crews are assessing damage and will be restoring power where it is safe to do so.
A major outage occurred, affecting many sites such as Reddit, Foursquare, Pinterest, and others. The cause was a latent bug in an operational data collection agent: a memory leak and a failed monitoring system caused the Amazon Web Services outage on Monday that took out Reddit and other major services. In a post Friday night, AWS explained that the problem arose after a simple replacement of a data collection server. After installation, the server did not propagate its DNS address correctly, and so a fraction of servers did not get the message. Those servers kept trying to reach the server, which led to a memory leak that then went out of control due to the failure of an internal monitoring alarm. Eventually the system ground to a virtual stop, and millions of customers felt the pain.
Amazon AWS again suffered an outage, causing websites such as Netflix instant video to be unavailable for some customers, particularly in the northeastern US. Amazon later issued a statement detailing the issues with the Elastic Load Balancing service that led up to the outage. The disruption began shortly after noon Pacific time on December 24, when data was accidentally deleted by a developer during maintenance on the East Coast Elastic Load Balancing system, which is designed to distribute traffic volume among servers. "Netflix is designed to handle failure of all or part of a single availability zone in a region, as we run across three zones and operate with no loss of functionality on two," the company said in a blog post this afternoon. "We are working on ways of extending our resiliency to handle partial or complete regional outages."
Fault-tolerant systems are measured by their uptime/downtime for end users. Amazon says it is "committed" to 99.95 percent uptime.
Although AWS went offline for only a few hours, the downtime did have an impact on customers’ businesses. There is no known data for the number of people affected by a cloud computing service outage. It is estimated that the travel service provider Amadeus loses $89,000 per hour during any cloud computing outage, while PayPal loses around $225,000 per hour.
DR – the process and procedures you take to restore your system after a catastrophic event. Cloud infrastructure has made DR much easier and more affordable compared to previous options. The cloud can also suffer from large-scale failures because of network, power, or other IT failures. Application owners need to be responsible for HA and DR – they can use multiple servers, AZs, regions, and even clouds. Zones within a region share a LAN, so they have high bandwidth, low latency, and private IP access. Zones utilize separate power resources. Regions are "islands" – they share no resources.
Each cloud is unique in many aspects, offering different APIs and functionality to manage resources: a different set of available resources; different formats, encodings, and versions; different security groups, machine images, snapshots, etc.
Make sure to have a dedicated expert to manage your DR architecture, processes, and testing. Define what your target recovery time and recovery point are. Be pessimistic and design for failure (assume everything will fail and design a solution that is capable of handling it). Avoid single points of failure – all parts of your app should be highly available (in different AZs/regions/clouds): load balancers, app servers, web servers, message bus, database. Use monitoring and alerts for failover processes and for every change in state. Document your DR operational processes and automations. Try to "break" different parts of your application. Try different ways to break it – unplug the network, turn a machine off, etc. Then try it again.
Netflix has open sourced "Chaos Monkey," its tool designed to purposely cause failure in order to increase the resiliency of an application in Amazon Web Services (AWS). It’s a timely move, as AWS has had its fair share of outages. With tools like Chaos Monkey, companies can be better prepared when a cloud infrastructure has a failure. In a blog post, Netflix says that this is the first of several tools that it will open source to help companies better manage the services they run in cloud infrastructures. Next up is likely to be Janitor Monkey, which helps keep an environment tidy and costs down. Chaos Monkey has achieved its own fame for its innovative approach. According to Netflix, the tool "randomly disables production instances to make sure it can survive common types of failure without any customer impact. The name comes from the idea of unleashing a wild monkey with a weapon in your data center (or cloud region) to randomly shoot down instances and chew through cables — all the while we continue serving our customers without interruption."
Netflix provides an excellent toolset for surviving outages at the operations level. In this part I want to zoom in more on the design implications for our application.
The core principle for surviving failure is actually fairly simple, and in fact applies to any system, not just the cloud, whether it happens to be an airplane, a missile, a car, etc. In the end, it's all about redundancy. The degree of tolerance is often determined by how many alternate systems (or parts of a system) we have in our design and how much they are separated from one another. It is also determined by how fast we can detect the broken part in our system and make the switch.
In software terms, the common parts that comprise our system fall into two main groups: the business logic and the data. Making a redundant software application that can survive failure is often a matter of setting up clones of both of those parts.
We need abstraction: we don't want to be locked in. We want tools that offer this abstraction layer both for daily management and for DR. Such a tool should translate our architectural concepts into cloud-specific properties (using recipes).
To clone our application's business logic, we need to ensure that all parts of our system run the exact same version of every software component. That includes not just the binaries but also the configuration and the scripts that run the application; more importantly, all post-deployment procedures such as failover, scaling, and monitoring must be kept consistent. What often makes cloning the business logic complex is that the information on how to run the application is scattered across many sources, such as scripts, as well as the minds of the people who run those apps. To make cloning much simpler, and thus more consistent, we need to capture all of that information in one place. Configuration management tools such as Chef and Puppet, and in the case of Amazon, CloudFormation, can help in this regard.
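The "capture everything in one place" idea can be sketched as a single deployment descriptor, in the spirit of a Chef/Puppet/CloudFormation recipe. All names, versions, and scripts below are hypothetical illustrations, not a real recipe format.

```python
# One descriptor holds the binaries, configuration, and post-deployment
# procedures, so every clone of the tier is provisioned identically.

APP_RECIPE = {
    "binaries": {"tomcat": "7.0.16", "myapp.war": "1.4.2"},
    "config": {"db_url": "jdbc:mysql://db-primary/app", "pool_size": 20},
    "lifecycle": {                      # post-deployment procedures
        "start": "catalina.sh run",
        "failover": "promote_standby.sh",
        "scale_out_when": "avg_requests_per_sec > 500",
    },
}

def provision(recipe):
    """Return the ordered steps a clone must execute. Every clone gets
    the same steps because they all come from the same recipe."""
    steps = [f"install {name}=={ver}" for name, ver in recipe["binaries"].items()]
    steps += [f"set {key}={val}" for key, val in recipe["config"].items()]
    steps.append(f"run {recipe['lifecycle']['start']}")
    return steps
```

Because failover and scaling rules live in the same descriptor as the binaries and config, cloning the tier to another zone, region, or cloud cannot silently drop them.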
RDS Read Replica: Amazon RDS uses MySQL's built-in replication functionality to create a special type of DB Instance, called a Read Replica, that allows you to elastically scale out beyond the capacity constraints of a single DB Instance for read-heavy database workloads. Once you create a Read Replica, database updates on the source DB Instance are replicated to the Read Replica using MySQL's native, asynchronous replication. Because Read Replicas leverage standard MySQL replication, they may fall behind their sources, and they are therefore not intended to be used for enhancing fault tolerance in the event of a source DB Instance failure or Availability Zone failure.
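The consequence of asynchronous replication can be made concrete with a small read/write routing sketch. The endpoint names and lag threshold below are illustrative assumptions, not an AWS API: writes always go to the source, and reads fall back to it when the replica may be stale.

```python
# Hedged sketch of read/write splitting against an asynchronous replica.
MAX_ACCEPTABLE_LAG_SECONDS = 5   # illustrative threshold

def route(query_is_write, needs_fresh_data, replica_lag_seconds):
    """Return which endpoint should serve the query."""
    if query_is_write:
        return "primary"      # all writes go to the source DB instance
    if needs_fresh_data or replica_lag_seconds > MAX_ACCEPTABLE_LAG_SECONDS:
        return "primary"      # the replica may not have this data yet
    return "replica"          # scale out the read-heavy workload
```

This is also why a Read Replica is a scaling tool rather than a failover tool: at the moment of a source failure, the replica may be missing the most recent committed writes.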
There are lots of patterns for avoiding failure. It took Netflix a great deal of development work to build a framework that handles them well. Most users and startups don't have the luxury of implementing them themselves. You need a tool that will enable you to automate those patterns in a consistent way. Enter Cloudify.
Any App, Any Stack: move your application to the cloud without making any code changes, regardless of the application stack (Java/Spring, Java EE, Ruby on Rails, …), database store (relational such as MySQL, or non-relational such as Apache Cassandra), or any other middleware components it uses. This enables you to achieve your objective of no code changes.
To make setting all of this up simpler, we tried to bake those patterns into ready-made tools, scripted as out-of-the-box recipes. The Cloudify recipes include:
- Database cluster recipes with support for MySQL, MongoDB, Cassandra, PostgreSQL, etc.
- Integration with Chef and Puppet
- Automation of failover, scaling, and continuous maintenance of your application
- Application recipes that let you capture every aspect of running your application, including post-deployment aspects such as failover, scaling, and monitoring
The cloud brings a lot of promise for making our business more agile. The cloud has also become a huge shared infrastructure, in which every failure has a much more significant impact on our business worldwide. The experience of the past year has taught us that even a robust cloud infrastructure such as Amazon's can fail. Through this experience we've learned that, rather than relying on the infrastructure to prevent failure, we need to design our systems to cope with failure and get used to failure as a way of life. Having said that, the investment required to build a robust application can be fairly large, and not something everyone can afford. Using tools like Cloudify, Chef, Puppet, and, if you're a pure Amazon shop, Netflix <framework> can greatly reduce this effort by making a lot of those patterns pre-baked into recipes.