The document discusses outages on cloud platforms like AWS and strategies for disaster recovery. It notes that AWS experienced several outages in 2011, 2012, and 2013, costing users millions per hour. To prepare for outages and enable fast disaster recovery, the document recommends building redundancy across multiple cloud regions and availability zones using automation and data replication. It presents a case study of a company that used the Cloudify platform to clone its application environment and database across different AWS regions, enabling failover between regions in seconds in the event of an outage. This approach significantly reduced recovery time objectives while keeping costs lower than maintaining a dedicated hot disaster recovery site.
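The failover decision described above can be sketched in a few lines. This is only an illustrative sketch, not the Cloudify mechanism from the case study: the `Region` type, the `pick_failover_target` function, the health flag, and the 30-second replication-lag threshold are all hypothetical names and values introduced here.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Region:
    name: str             # region identifier, e.g. "us-east-1"
    healthy: bool         # result of the latest health check (assumed input)
    replica_lag_s: float  # seconds the database replica trails the primary


def pick_failover_target(regions: List[Region], active: str,
                         max_lag_s: float = 30.0) -> Optional[str]:
    """Return the region that should serve traffic, or None if none qualifies."""
    by_name = {r.name: r for r in regions}
    # Stay on the active region while it is healthy.
    if by_name[active].healthy:
        return active
    # Otherwise consider healthy standby regions whose replica is fresh enough.
    candidates = [r for r in regions
                  if r.name != active and r.healthy and r.replica_lag_s <= max_lag_s]
    if not candidates:
        return None
    # Prefer the standby with the least replication lag (smallest data loss).
    return min(candidates, key=lambda r: r.replica_lag_s).name
```

Choosing the standby with the smallest replication lag is one possible policy; it favors the recovery point objective over, say, proximity to users.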
The experience of automating continuous delivery processes with Chef and Cloudify through an application-centric approach to DevOps, and how this model transformed PaddyPower's traditional IT into DevOps, keeping their Devs and their Ops happy.
References:
---------------
- Cloudify & Chef : http://www.cloudifysource.org/guide/2.7/integrations/chef_documentation
- Blog Post: http://www.cloudifysource.org/2013/10/27/application_centric_approach_to_devops.html
- Earlier Video Presentation : http://www.youtube.com/watch?v=YhDNKyP_s7U
Kirin User Story: Migrating Mission Critical Applications to OpenStack Privat... Motoki Kakinuma
NTT Data is an IT service company.
Kirin is one of the largest beverages companies in Japan.
In this presentation, we will present the user story of migrating all applications from aging infrastructure to an OpenStack private cloud, including actual challenges, know-how, and future prospects.
The key concepts of this project are:
* Mission Critical: Migrate all Kirin enterprise applications to OpenStack private cloud.
* Think Big, Start Small: Start with a small number of apps, and expand rapidly.
* Agility and elasticity: Adopt a PaaS-like automation approach, targeting 50% less development cost and 40% less operational cost.
In order to achieve all items above, we have decided to use OpenStack IaaS, ICO, which is an automation product by IBM, serverspec for testing, and Hinemos for monitoring management.
Starting from Aug 2014, the project expects 100 VM / 100 TB storage as the first-stage migration by end of 2015. We're planning to migrate 500 VM / 300 TB by end of 2016 and 2000 VM / 1 PB finally.
The massive computing and storage resources that are needed to support big data applications make cloud environments an ideal fit. In this session, you'll learn how to build your big data "database on-demand" using MongoDB, Cassandra, Solr, MySQL, or any other big data solution, as well as manage your big data application using a new open source framework called “Cloudify.” All this, on top of the OpenStack cloud.
Automating your OpenStack environment with Chef, Puppet and Cloudify Nati Shalom
This session teaches you how to use configuration and DevOps tools like Chef and Puppet to set up your OpenStack environment, using Cloudify to automate the deployment and orchestration of applications and services in that environment.
When networks meets apps (open stack atlanta) Nati Shalom
Recent advancements in OpenStack capabilities have made the cloud better tuned to enterprise needs by introducing much more flexible network designs and networking services, with the tradeoff of making the cloud more complex.
In this session we will describe how to leverage the new networking advancements without exposing their complexity to the end user. We will present alternative approaches, and their tradeoffs, for automating the deployment of a typical n-tier enterprise application that includes a multi-tenant environment, separate networks for admin and applications, cross-region networking, floating IP attachment, security group setup, and so on, all through a combination of Heat, TOSCA, Chef, Puppet, and more.
Operations Playbook: Monitoring and Automation - RightScale Compute 2013 RightScale
Speaker: Chris Deutsch - Systems Administrator, RightScale
As a systems administrator, what is the best way to ensure that you don’t get paged in your sleep or on your days off? The RightScale operations team manages hundreds of cloud servers, as well as a host of other cloud services, to deliver always-on production applications. The RightScale Ops Team will share tips as power users of RightScale, including running batch updates, automating scaling, adding custom monitoring graphs, and troubleshooting configuration and performance issues.
Simplifying Hadoop with RecordService, A Secure and Unified Data Access Path ... Cloudera, Inc.
SFHUG presentation from February 2, 2016. One of the key values of the Hadoop ecosystem is its flexibility. There is a myriad of components that make up this ecosystem, allowing Hadoop to tackle otherwise intractable problems. However, having so many components provides a significant integration, implementation, and usability burden. Features that ought to work in all the components often require sizable per-component effort to ensure correctness across the stack.
Lenni Kuff explores RecordService, a new solution to this problem that provides an API to read data from Hadoop storage managers and return them as canonical records. This eliminates the need for components to support individual file formats, handle security, perform auditing, and implement sophisticated IO scheduling and other common processing that is at the bottom of any computation.
Lenni discusses the architecture of the service and the integration work done for MapReduce and Spark. Many existing applications on those frameworks can take advantage of the service with little to no modification. Lenni demonstrates how this provides fine grain (column level and row level) security, through Sentry integration, and improves performance for existing MapReduce and Spark applications by up to 5×. Lenni concludes by discussing how this architecture can enable significant future improvements to the Hadoop ecosystem.
About the speaker: Lenni Kuff is an engineering manager at Cloudera. Before joining Cloudera, he worked at Microsoft on a number of projects including SQL Server storage engine, SQL Azure, and Hadoop on Azure. Lenni graduated from the University of Wisconsin-Madison with degrees in computer science and computer engineering.
How to migrate workloads to the google cloud platform actualtechmedia
IT organizations of all sizes are moving their workloads to the public cloud in order to gain business agility and unlimited workload scalability, and to free their time to work on the projects that matter. One of the leaders in public cloud is the Google Cloud Platform (GCP).
Considerations for Your Next Cloud Project – CloudForms & OpenStack Do’s and Don’ts
In this Session we will discuss Organizational and Operational Considerations on how to move into Infrastructure as a Service Environments and showcase how Enterprises today address different aspects of Cloud Management.
Focus of this session is on Design and Operational Aspects of running an Open Hybrid Cloud. The session will also touch on Process and Organizational Aspects.
Responding to Digital Transformation With RDS Database Technology Alibaba Cloud
See Webinar Recording at https://resource.alibabacloud.com/webinar/detail.htm?webinarId=30
Learn how your business can gain an advantage and utilize ApsaraDB for RDS to enhance data flexibility, security, performance, and stability amid the backdrop of digital transformation.
Chen Zhaoshang of the Alibaba Cloud Database Product team will share how traditional industries can meet database challenges and leverage distributed architecture transformation using MySQL and other open technologies to reduce the overall cost of ownership and realize cross-data-center disaster tolerance deployment while ensuring data consistency. He will also share how to ensure database security and alleviate concerns about data migration to the cloud.
ApsaraDB for RDS: www.alibabacloud.com/product/apsaradb-for-rds
More Webinars: https://resource.alibabacloud.com/webinar/index.htm
OpenStack Architected Like AWS (and GCP) Randy Bias
A description of how we built Open Cloud System (OCS), an OpenStack-powered complete cloud operating system. With a focus on AWS and GCE interoperability, we describe why hybrid cloud interoperability matters and how we got there. Anyone can do it and we think you should too.
IT-Centric Disaster Recovery & Business Continuity Steve Susina
This presentation was delivered to the Business Resumption Planners Association of Chicago meeting on 3/11/2010.
IT leaders who assume responsibility for their firm's DR/BC efforts need to understand how to build a cross-organization strategy that transcends IT organizational boundaries. In the presentation, we discuss the need for IT leaders to reach across the aisles to work with Line-of-Business leaders, and present a six-step framework on how to accomplish a cross-business IT-centric strategy.
AWS Webcast - Discover Disaster Recovery Solutions in the Cloud Amazon Web Services
Join Amazon Web Services for a webinar on how others are using the AWS Cloud to enable faster disaster recovery of their IT systems without incurring infrastructure expenses. Join us for an informative webinar on how AWS Cloud supports many popular disaster recovery (DR) architectures from “pilot light” environments that are ready to scale up at a moment’s notice to “hot standby” environments that enable rapid fail-over. With infrastructure centers in 10 regions around the world, AWS provides a set of cloud-based DR services that enable rapid recovery of your IT infrastructure and data.
The emergency assessment must be done carefully and immediately. The emergency nurse makes a quick review and delivers quality health care in all fields, including medical, surgical, paediatric, and obstetric care.
State, Local and Education customers are using the AWS cloud to enable faster disaster recovery of their mission critical IT systems without incurring the infrastructure expense of a second physical site. Join us for an informative webinar on how AWS cloud supports many popular disaster recovery (DR) architectures from “pilot light” environments that are ready to scale up at a moment’s notice to “hot standby” environments that enable rapid failover. With infrastructure centers in 10 regions around the world, AWS provides a set of cloud-based DR services that enable rapid recovery of your IT infrastructure and data.
This presentation on triage and transport covers how we should deal with patients attending the emergency department and how to provide the best treatment to the patients who need it at the appropriate time.
I hope this will be helpful to nurses, paramedics, graduate and under graduate students and emergency doctors and team.
Building cross-region and cross-cloud high availability into your app, a real-life use case by Gigaspaces, Nati Shalom, Founder & CTO, Gigaspaces
Achieving high levels of availability and disaster recovery in a cloud environment requires the implementation of patterns and practices that introduce redundancy through multi-zone, multi-region, and multi-cloud deployments. As we move towards implementing higher availability, we cannot escape the direct increase in the accidental complexity of the deployment architecture resulting from lack of cloud portability and deployment lifecycle automation. We present how high availability and disaster recovery were achieved in reality by using the Cloudify open source framework on top of AWS. This approach applies to not just AWS but also other public clouds and private cloud environments such as Eucalyptus. The resulting reference architecture provides portable PostgreSQL replication and disaster recovery as well as application tier scalability across zones, regions, and public/private clouds through a unified deployment workflow.
Cloudifying High Availability: The Case for Elastic Disaster Recovery Ali Hodroj
Elastic DR: a solution architecture that aims to optimally balance cost and recovery time via three core principles that are germane to the cloud world:
On-Demand: The disaster recovery cloud can be provisioned on any availability zone, region, or public/private cloud through Cloudify's cloud-agnostic bootstrapping mechanism.
Elastic: The ability to automatically provision resources in the recovery cloud in case of disaster while eliminating the need for idle resources in normal scenarios, thereby fully profiting from the pay-per-use pricing model of clouds.
Flexible RTO/RPO: The architecture can be easily extended from a warm DR to a hot DR pattern by enabling or disabling application recipes. This allows us to exploit the economies of scale that the cloud provides by matching the number of recipes/tiers to provision (in the recovery cloud) against the recovery time/point objective of our disaster recovery strategy.
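The trade-off between idle cost and recovery time in the three principles above can be illustrated with a small planning sketch. It is a hypothetical model, not Cloudify's recipe format: the `TierRecipe` type, the provisioning times, and the hourly costs are invented for illustration, and it assumes that on-demand tiers are provisioned in parallel at failover time.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TierRecipe:
    name: str
    provision_minutes: float  # assumed time to bring the tier up on demand
    hourly_cost: float        # assumed cost of keeping the tier running


def plan_recovery_cloud(tiers: List[TierRecipe],
                        rto_minutes: float) -> Tuple[List[str], List[str], float]:
    """Split tiers into always-on and on-demand sets for a target RTO.

    A tier that cannot be provisioned within the RTO must already be running
    in the recovery cloud; every other tier is started only after a disaster.
    Returns (always_on names, on_demand names, idle cost per hour).
    """
    always_on = [t for t in tiers if t.provision_minutes > rto_minutes]
    on_demand = [t for t in tiers if t.provision_minutes <= rto_minutes]
    idle_cost = sum(t.hourly_cost for t in always_on)
    return ([t.name for t in always_on],
            [t.name for t in on_demand],
            idle_cost)
```

With a loose RTO every tier is on demand and the idle cost is zero (a purely elastic, warm pattern); tightening the RTO moves tiers into the always-on set until the plan becomes a hot DR site.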
Strategies for Seamless Backup and Disaster Recovery with AWS Amazon Web Services
by Isaiah Weiner, Sr. Manager of Solutions Architecture, AWS
Businesses of all sizes and industries are using the AWS cloud to enable faster disaster recovery of their critical IT systems without incurring the infrastructure expense of a second physical site. With our global scale, AWS provides a set of cloud-based backup and disaster recovery services that enable rapid recovery of your IT infrastructure and data. Learn best practices from AWS experts. During the second half of this session, an APN B&R provider will demo enterprise backup solutions that are helping customers recover data and systems quickly in the event of a disaster.
These days, EVERY workload is considered critical by someone in the organization. As a result, SLAs are shrinking. IT is challenged to meet these SLAs, but there isn’t enough budget to provide services like disaster recovery (DR) using traditional methods and infrastructure. The good news is that public cloud platforms, like AWS, are becoming the de facto infrastructure choice for DR. However, workload portability solutions that simplify cross-platform or cloud recovery are required to meet most RTO & RPO SLAs in the cloud. AWS provides the infrastructure we need to bring DR to tier 2 and tier 3 workloads that have never been able to afford it before. Now, we need orchestration and automation to make it scalable and reliable.
In this session you will learn key considerations and practical steps for getting to the AWS cloud and how you can leverage Amazon S3 storage for cost-effective disaster recovery. Dow Jones will also share details on their migration to AWS Cloud, the benefits realized there, and what the future looks like. Session sponsored by Commvault.
Over 60 CIOs and Tech Leaders attended the #GoCloudWebinar on “AGILE INFRASTRUCTURE WITH WINDOWS AZURE” hosted by Aditi Technologies and Microsoft. Our CTO, Wade Wegner and Microsoft Azure solution specialist, Dina Frandsen discussed how Windows Azure Infrastructure Services (WAIS) can help organizations stay agile and what Windows Azure technology environment looks like and what it means to your organization.
We Explored
1. How IT teams can execute fast and stay lean with WAIS – A case study
2. Which enterprise workloads are best suited for WAIS migration
3. What are the best practices on how to plan, execute, deploy WAIS
Download this slidedeck and Sign up with the below link for viewing the Webinar - http://www.aditi.com/webevent/Agile_Infrastructure_with_WAIS/
4 C’s for Using Cloud to Support Scientific Research Avere Systems
While cost is a primary "c" driving the adoption of object-based cloud solutions in the life sciences, compute, capacity, and collaboration may all be bigger incentives. In this webinar, we'll examine how to use an Avere Hybrid Cloud NAS infrastructure to gain big benefits in areas like genomics research, personalized medicine, drug discovery, imaging, and other data analysis applications.
• Compute - Building production environments in the compute cloud without rewriting existing applications
• Capacity - Modernizing storage archives and disaster recovery by adding object storage for durability while leveraging existing on-premises NAS
• Collaboration - Using the cloud to safely and securely share data globally
• Cost - Using cloud to lower overall costs to keep pace with fast-growing demands of research initiatives
Improving Availability & Lowering Costs with Auto Scaling & Amazon EC2 (CPN20... Amazon Web Services
Running your Amazon EC2 instances in Auto Scaling groups allows you to improve your application's availability right out of the box. Auto Scaling replaces impaired or unhealthy instances automatically to maintain your desired number of instances (even if that number is one). You can also use Auto Scaling to automate the provisioning of new instances and software configurations as well as to keep track of usage and costs by app, project, or cost center. Of course, you can also use Auto Scaling to adjust capacity as needed - on demand, on a schedule, or dynamically based on demand. In this session, we show you a few of the tools you can use to enable Auto Scaling for the applications you run on Amazon EC2. We also share tips and tricks we've picked up from customers such as Netflix, Adobe, Nokia, and Amazon.com about managing capacity, balancing performance against cost, and optimizing availability.
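The "maintain your desired number of instances" behavior can be pictured as a reconciliation step. The sketch below is a plain-Python simulation of that idea, not the AWS Auto Scaling API: the `reconcile` function, the health flags, and the `replacement-N` identifiers are hypothetical.

```python
from typing import Dict


def reconcile(instances: Dict[str, bool], desired: int) -> Dict[str, bool]:
    """Return a fleet of exactly `desired` healthy instances.

    `instances` maps an instance id to its health flag. Unhealthy instances
    are dropped, replacements are launched to fill the gap, and surplus
    instances are terminated when the fleet is larger than desired.
    """
    # Keep only the instances that passed their health check.
    fleet = {iid: True for iid, healthy in instances.items() if healthy}
    # Launch replacements until the desired capacity is reached.
    # Assumes existing ids never start with "replacement-".
    counter = 1
    while len(fleet) < desired:
        fleet[f"replacement-{counter}"] = True
        counter += 1
    # Scale in by terminating the most recently added instances.
    while len(fleet) > desired:
        fleet.popitem()
    return fleet
```

Running this step on a schedule, or whenever a health check changes, gives the self-healing effect described above even for a group whose desired size is one.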
Disaster recovery sites on AWS: minimal costs maximum efficiency Amazon Web Services
Implementation of a disaster recovery (DR) site is crucial for the business continuity of any enterprise. Due to the fundamental nature of features like elasticity, scalability, and geographic distribution, DR implementation on AWS can be done at 10-50% of the conventional cost. In this session, we do a deep dive into proven DR architectures on AWS and the best practices, tools and techniques to get the most out of them.
AWS 201 - A Walk through the AWS Cloud: App Hosting on AWS - Games, Apps and ... Amazon Web Services
Playfish, Gumi, and Halfbrick are three of many gaming companies on AWS. Pinterest, Netflix and Flipboard host web and mobile applications using the AWS Cloud. What are the best practices to build an application to take advantage of the benefits of AWS? Learn about these approaches and how customers have built highly scalable, durable and reliable infrastructures to host their internet-facing businesses on AWS. Attend this complimentary webinar to learn more.
Supporting Hadoop in containers takes much more than the very primitive support Docker provides using the Storage Plugin. A production-scale Hadoop deployment inside containers needs to honor anti/affinity, fault-domain and data-locality policies. Kubernetes alone, with primitives such as StatefulSets and PersistentVolumeClaims, is not sufficient to support a complex data-heavy application such as Hadoop. One needs to think about this problem more holistically across the container, networking and storage stacks. Also, constructs around deployment, scaling, upgrade etc. in traditional orchestration platforms are designed for applications that have adopted a microservices philosophy, which doesn't fit most Big Data applications across the ingest, store, process, serve and visualization stages of the pipeline. Come to this technical session to learn how to run and manage the lifecycle of containerized Hadoop and other applications in the data analytics pipeline efficiently and effectively, far beyond simple container orchestration. #BigData, #NoSQL, #Hortonworks, #Cloudera, #Kafka, #Tensorflow, #Cassandra, #MongoDB, #Kudu, #Hive, #HBase, PARTHA SEETALA, CTO, Robin Systems.
Running your own infrastructure *can* be as little as half the cost of running on AWS once you are at scale. OpenStack-based cloud systems can provide the same or similar economies of scale if you leverage the lessons of AWS and GCE when building your cloud. This talk discusses the economic factors in designing a cost-efficient AWS + OpenStack hybrid cloud. We look at the issues involved in repatriating existing applications, and show a couple of real-world demonstrations of tools that can assist in the repatriation process. Repatriation isn't quite as simple as hitting the Easy button, but if you plan your deployment correctly, you can make it work, both technically and economically.
This presentation provides an introduction to the Cloudify integration plugin with Terraform.
This integration allows Terraform users to use Cloudify to manage the configuration and workflow of applications on top of an infrastructure that was created by Terraform.
What A No Compromises Hybrid Cloud Looks Like Nati Shalom
Expectation vs. reality of a typical enterprise cloud journey
Lessons learned on how to set a cloud-native strategy without settling for the least common denominator or going through a complete rewrite
It has long been debated whether OpenStack is production ready. In this session you will learn how a major bank has gone to production with more than 5000 VMs that delivered the results of a 40% decrease in cost, reduced deployment time to hours not weeks, 56 new technologies introduced, 7 new platforms launched - all in under a year. Learn how their platform built on Rackspace and RHEL, coupled with best of breed open source tooling - SaltStack, Jenkins, Cloudify, and Nexus are the enablers for production-grade OpenStack.
http://sched.co/7fH1
Orchestration tool roundup kubernetes vs. docker vs. heat vs. terra form vs... Nati Shalom
Video recording: https://www.youtube.com/watch?v=tGlIgUeoGz8
It’s no news that containers represent a portable unit of deployment, and OpenStack has proven an ideal environment for running container workloads. However, where it usually becomes more complex is that many times an application is often built out of multiple containers. What’s more, setting up a cluster of container images can be fairly cumbersome because you need to make one container aware of another and expose intimate details that are required for them to communicate which is not trivial especially if they’re not on the same host.
These scenarios have created demand for some kind of orchestrator. The list of container orchestrators is growing fairly fast. This session will compare the different orchestration projects out there - from Heat to Kubernetes to TOSCA - and help you choose the right tool for the job.
Session link from the summit: https://openstacksummitmay2015vancouver.sched.org/event/abd484e0dedcb9774edda1548ad47518#.VV5eh5NViko
Real World Application Orchestration Made Easy on VMware vCloud Air, vSphere ... Nati Shalom
Looking for application orchestration in a hybrid or multi-cloud environment? You’ve got to hear about TOSCA orchestration. TOSCA (Topology and Orchestration Specification for Cloud Applications), brought to you by the same people who brought us XML, enables you to seamlessly migrate your workloads across environments or build a hybrid deployment that runs simultaneously across the VMware cloud offering.
Join our Cloud Online Meetup to learn how Cloudify’s TOSCA-compliant orchestration can be your common management interface across the VMware cloud offering, OpenStack and heterogeneous cloud environments.
Speakers:
Nati Shalom, Founder and CTO at GigaSpaces, is a thought leader in Cloud Computing and Big Data Technologies. Shalom was recently recognized as a Top Cloud Computing Blogger for CIOs by The CIO Magazine and his blog is listed as an excellent blog by YCombinator. Shalom is the founder and also one of leaders of OpenStack Israel group, and is a frequent presenter at industry conferences.
Paco Gomez, Senior Solution Architect at VMware vCloud Air. Paco evaluates and integrates strategic solutions that help vCloud Air clients benefit from VMware's hybrid cloud and application services. Paco is a seasoned technologist, having extensive experience in diverse fields including mainframes, distributed systems, enterprise development, cloud computing, mobile, assistive technology, electrical engineering and embedded systems. Across his career, Paco has held positions in consulting, sales engineering
OpenStack Juno The Complete Lowdown and Tales from the Summit Nati Shalom
This presentation covers the main points from the summit and the OpenStack Juno release
It also covers how users use OpenStack based on the recent survey
Application and Network Orchestration using Heat & ToscaNati Shalom
The buzzwords Neutron, Heat, and TOSCA are spoken about quite often when it comes to the OpenStack - and many of us are still trying to make sense of the terminology and its place in the OpenStack world.
Where OpenStack Neutron provides APIs for creating network elements, OpenStack Heat provides an orchestration engine for automating the setup and configuration of OpenStack infrastructure, while TOSCA is a standard for templating and defining application topology and policies (that form the basis for Heat). In this context, it really makes sense to put these all together to achieve application and network automation for OpenStack on steroids.
In this session we will learn how we can use the robust combination of Heat and TOSCA to configure and control resources on Nova and Neutron in order to automate the network configuration as part of the application deployment.
The session will include a demo and code examples that show how you can configure virtual networks, attach public IPs, set up security groups, set up load balancing and automatically scale up/down and more. You will leave this session understanding where Neutron meets Heat and TOSCA.
This talk was delivered as part of OpenStack Paris summit - 2014 - http://openstacksummitnovember2014paris.sched.org/event/2b85b682ccaf3a5961e463b61e2403f8#.VFeuG_TF8mc
During the past few years we’ve seen how our entire data center becomes software defined. This includes the Compute, Storage, Network and also Configuration. This new data center is the cloud.
The missing piece in the puzzle:
While this is pretty much old news, there is one big thing that is missing in this puzzle, and that is the operator itself.
The operator is responsible for running processes such as:
* Installation of new apps
* Upgrades and update of new features or patches
* Performance tuning
* Handling failure
* Managing the capacity to meet the scaling demand.
Most of those tasks today involve lots of human intervention. Users who realised that gap try to mitigate it by adding their own custom automation, usually in the form of scripts on top of the configuration management. Those custom scripts tend to grow fairly quickly to the point where they become unmanageable.
This presentation will introduce how we can use an orchestrator to automate those tasks and by that create a software defined Operator.
Complex Analytics with NoSQL Data Store in Real TimeNati Shalom
NoSQL databases are often limited in the types of queries they can support due to the distributed nature of the data. In this session we will learn patterns for overcoming this limitation and combining multiple query semantics with NoSQL-based engines.
We will demonstrate specifically a combination of key/value, SQL-like, document model and graph-based queries, as well as more advanced topics such as handling partial updates and querying through projection. We will also demonstrate how we can create a mashup between those APIs, i.e. write fast through the key/value API and execute complex queries on that same data through SQL queries.
- See more at: http://nosql2014.dataversity.net/sessionPop.cfm?confid=81&proposalid=6335#sthash.PNSZi5TJ.dpuf
Is Orchestration the Next Big Thing in DevOpsNati Shalom
DevOps processes (such as continuous deployment and delivery) often involve writing many custom scripts that are triggered by the build system. With that approach, it is relatively hard to trace the deployment process and troubleshoot when something goes wrong. Additionally, custom scripts are often not written in an easily understood manner. In this session we will walk through specific DevOps workflows (such as install, update, etc) using Riemann as the framework in subject and see the steps required to automate those processes. We will also discuss how Cloudify uses Riemann to provide simple execution and monitoring of those workflow processes. We will share how one customer, PaddyPower, was able to leverage Cloudify to transition their traditional IT into a DevOps environment, bridging the gap betwe
Real-Time Big Data at In-Memory Speed, Using StormNati Shalom
Storm, a popular framework from Twitter, is used for real-time event processing. The challenge presented is how to manage the state of your real-time data processing at all times. In addition, you need Storm to integrate with your batch processing system (such as Hadoop) in a consistent manner.
This session will demonstrate how to integrate Storm with an in-memory database/grid, and explore various strategies for integrating the data grid with Hadoop and Cassandra, seamlessly. By achieving smooth integration with consistent management, you will be able to easily manage all the tiers of your Big Data stack in a consistent and effective way.
- See more at: http://nosql2013.dataversity.net/sessionPop.cfm?confid=74&proposalid=5526#sthash.FWIdqRHh.dpuf
Disaster Recovery on Demand on the CloudNati Shalom
How to avoid Cloud Outages and leverage cloud economics to keep the cost down through automation of disaster recovery processes and on-demand deployment of the backup nodes.
Disaster recovery on demand on the cloud
1. Protect your app from Outages
Nati Shalom CTO GigaSpaces
@natishalom
May 2013
2. AGENDA
AWS and outages
Outage impact
Disaster Recovery – it’s all about redundancy!
Cloudify as a solution for redundancy
Demo with Cloudify on EC2
® Copyright 2013 GigaSpaces Ltd. All Rights Reserved
3. AWS USAGE
• AWS – around 0.5M servers
• Facebook – less than 0.1M servers
• Google – around 1M servers
5. OUTAGE – APRIL 21, 2011
6. OUTAGE - JUNE 29, 2012
7. OUTAGE - OCTOBER 22, 2012
8. OUTAGE - CHRISTMAS EVE 2012
9. NOT ONLY AMAZON
28 December 2012 - some owners of Microsoft's XBox 360 gaming console were unable to access some of their cloud-based storage files.
26 July 2012 - Service for Microsoft’s Windows Azure Europe region went down for more than two hours.
29 February 2012 - The ultimate result was service impacts of 8-10 hours for users of Azure data centers in Dublin, Ireland, Chicago, and San Antonio.
10. THAT’S WHAT YOU EXPECT?
99% - 3.65 days downtime per year
99.9% - 8.76 hours downtime per year
99.99% - 53 minutes downtime per year
99.999% - 5.26 minutes downtime per year
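The downtime figures above follow directly from the availability percentage. A minimal sketch of the arithmetic, assuming a 365-day year (the function name `downtime_per_year` is illustrative and not taken from the deck):

```python
from datetime import timedelta

HOURS_PER_YEAR = 365 * 24  # assumption: a non-leap year


def downtime_per_year(availability_percent: float) -> timedelta:
    """Return the yearly downtime allowed by a given availability level."""
    unavailable_fraction = 1 - availability_percent / 100
    return timedelta(hours=HOURS_PER_YEAR * unavailable_fraction)


for level in (99.0, 99.9, 99.99, 99.999):
    minutes = downtime_per_year(level).total_seconds() / 60
    print(f"{level}% -> {minutes:,.2f} minutes of downtime per year")
```

For 99.99% this works out to roughly 52.6 minutes, which the slide rounds up to 53.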
11. OUTAGE IMPACT – DESIGN FOR FAILURES
Outage could cost…
$89K per hour for Amadeus
$225K per hour for PayPal!
14. PREPARE FOR DISASTER RECOVERY
• Dedicated expert for DR architecture
• Define target recovery time & point
• Assume every tier can fail
• Use monitoring and alerts
• Document your operational processes
22. BUILT IN SUPPORT FOR MANAGING DATA IN THE CLOUD
Real Time: Storm, Elastic Caching (XAP)
Relational DB Clusters: MySQL, Postgres
NoSQL Clusters: MongoDB, Cassandra, Couchbase, ElasticSearch
Hadoop: Hadoop (Hive, Pig, ..), ZooKeeper
24. CASE STUDY: VERIFI
Technology-based concrete process control and information service
Deployments across North America, Latin America, Asia, and Europe for nearly a decade
Part of W.R. Grace & Co, a $6.3 B company.
The problem: On-Demand HA/DR over multiple Cloud regions.
High Availability, Data Replication, Disaster Recovery
25. ELASTIC ON-DEMAND DISASTER RECOVERY
Problem: Can we eliminate the RTO vs. Cost trade-off in the cloud?
Solution (Elastic DR): A hybrid between Hot and Warm DR. Switch to Active site in a matter of seconds through cloud-agnostic lifecycle automation recipes.
26. VERIFI (INITIAL) ARCHITECTURE
[Diagram labels: Availability region (US-West: Oregon); Internet; EC2 Instance (mod_cluster); EC2 Instance (JBoss); EC2 Instances (PostgreSQL, Cassandra); Data Volume (x2); 4 recipes]
27. ELASTIC DR ON-DEMAND: FAILOVER SCENARIO
Upon initial deployment, the primary deployment of the application “verifi” will be bootstrapped onto cloud #1 (Region US-West Oregon: App Servers, PostgreSQL). Another, slightly modified application recipe, “verifi_dr”, will be bootstrapped as cloud #2 (Region US-East Virginia: PostgreSQL), polling cloud #1 for failure (liveness poll) and acting as a PostgreSQL db slave.
Region failure occurs.
Turn Postgres slave into master, start app server instances* (cloud #2, Region US-East Virginia: App Servers, PostgreSQL).
Bootstrap another cloud in a different region (cloud #3, Region US-West California: PostgreSQL) using the same application recipe used to bootstrap cloud #2 above*.
* Initially, all those actions may be done manually by Verifi’s Ops team (e.g., via recipe commands in the CLI).
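The failover sequence on this slide (poll the primary, detect a region failure, promote the database, start the app servers, bootstrap a new DR site) can be sketched as a small control loop. This is a hedged illustration only: it is not Cloudify's recipe API, and every function name and parameter below is hypothetical. The actual actions are passed in as callables.

```python
import time
from typing import Callable


def wait_for_failure(
    is_primary_alive: Callable[[], bool],
    max_missed_polls: int = 3,
    poll_interval_seconds: float = 5.0,
) -> None:
    """Block until the primary misses `max_missed_polls` consecutive liveness polls."""
    missed = 0
    while missed < max_missed_polls:
        # A successful poll resets the counter; a failed one increments it.
        missed = 0 if is_primary_alive() else missed + 1
        if missed < max_missed_polls:
            time.sleep(poll_interval_seconds)


def run_dr_site(
    is_primary_alive: Callable[[], bool],
    promote_replica: Callable[[], None],
    start_app_servers: Callable[[], None],
    bootstrap_next_dr_site: Callable[[], None],
    **poll_options,
) -> None:
    """Hypothetical control loop for the DR site (cloud #2 on the slide)."""
    wait_for_failure(is_primary_alive, **poll_options)
    promote_replica()         # turn the PostgreSQL slave into the master
    start_app_servers()       # app servers are started only on demand
    bootstrap_next_dr_site()  # new DR site (cloud #3) in a different region
```

Requiring several consecutive missed polls is a design choice made for this sketch: it avoids failing over on a single dropped request, at the price of a slightly longer detection time.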
28. FAILOVER SCENARIO
29. NEXT STEPS
Across clouds (AWS, Rackspace, Azure, etc.): 1 application + overrides, several cloud drivers
Across AWS regions: 1 application + overrides, 1 cloud driver
Across AWS zones: 1 application + overrides, 1 cloud driver
Availability
Supported by Verifi phase #1
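The "1 application + overrides" idea can be illustrated as one base description plus a small set of per-site overrides merged on top of it. The sketch below is an assumption-laden illustration in plain Python dictionaries; it does not reproduce Cloudify's recipe format, and all keys and values are hypothetical apart from the application names "verifi" and "verifi_dr" used in the deck.

```python
from copy import deepcopy

# Hypothetical base application description; keys and values are illustrative only.
BASE_APPLICATION = {
    "name": "verifi",
    "services": {
        "app_server": {"instances": 2},
        "postgresql": {"role": "master"},
    },
    "cloud": {"driver": "ec2", "region": "us-west-2"},
}


def apply_overrides(base: dict, overrides: dict) -> dict:
    """Return a copy of `base` with `overrides` merged in recursively."""
    merged = deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_overrides(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# DR variant: same application, different region, database starts as a slave,
# and no app servers run until failover.
DR_OVERRIDES = {
    "name": "verifi_dr",
    "services": {"app_server": {"instances": 0}, "postgresql": {"role": "slave"}},
    "cloud": {"region": "us-east-1"},
}

dr_application = apply_overrides(BASE_APPLICATION, DR_OVERRIDES)
```

Because only the overrides differ, the DR site stays structurally identical to the main site, which is the consistency property the deck attributes to reusing the same recipe.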
30. ELASTIC ON-DEMAND DR: COSTS
Main Site (US-West): $82,068
Warm DR Site (US-East): $12,625
Hot DR Site: $82,068
Main Site: 1 load balancer, 2 JBoss instances, 1 PostgreSQL master, 3 Cassandra
DR Site: 1 PostgreSQL slave – all other instances start on demand upon failover
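A short worked comparison of the figures on this slide. The three cost values are taken from the slide; the variable names are illustrative, and the slide does not state the billing period, so none is assumed here.

```python
MAIN_SITE_COST = 82_068     # Main Site (US-West), from the slide
WARM_DR_SITE_COST = 12_625  # Warm DR Site (US-East), from the slide
HOT_DR_SITE_COST = 82_068   # Hot DR Site, from the slide

savings = HOT_DR_SITE_COST - WARM_DR_SITE_COST
warm_share_of_hot = WARM_DR_SITE_COST / HOT_DR_SITE_COST

print(f"Main + hot DR:  ${MAIN_SITE_COST + HOT_DR_SITE_COST:,}")
print(f"Main + warm DR: ${MAIN_SITE_COST + WARM_DR_SITE_COST:,}")
print(f"Saving from warm DR: ${savings:,} ({1 - warm_share_of_hot:.1%} less than hot DR)")
```

On these numbers the warm DR site costs about 15% of a hot one, a saving of $69,443.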
31. ELASTIC DR: WARM DR COST, CLOUD PORTABILITY
[Chart: 4 recipes, same recipe; DR Site $12k; other values shown: $14k, $6k, $5k, $9k]
32. ELASTIC DR: HOT DR COST
[Chart: 4 recipes, same recipe; DR Site $82k; other values shown: $79k, $115k, $68k, $91k]
33. SUMMARY
Disaster Recovery – it’s all about redundancy!
Cloning your environment – app stack
Cloning your Data – DB Replication
Automation makes DR processes simple
Use recipes to clone your app stack consistently
Use replication to clone your data
Leverage cloud economics to reduce the cost
DR on Demand
Multi Cloud
A high-ranking Amazon executive said there are 60,000 different customers across the various Amazon Web Services, and most of them are not the startups that are normally associated with on-demand computing. Rather, the biggest customers in both number and amount of computing resources consumed are divisions of banks, pharmaceutical companies and other large corporations who try AWS once for a temporary project and then get hooked. According to Statspotting.com in March 2012, a researcher estimates that Amazon Web Services is using at least 454,400 servers in seven data center hubs around the globe. Let us try this: Google is powered by a million servers. Maybe a little more than that. And Amazon has half a million servers. Now, things fall in place. Facebook, the service that takes up one fourth of all our time online, is powered by less than 100,000 servers. Biggest customers – Pinterest, Instagram, Netflix, Heroku, Quora, Foursquare, etc. Amazon Web Services runs more than 835,000 requests per second for hundreds of thousands of customers in 190 countries, including 300 government agencies and 1,500 educational institutions.
The Amazon cloud proved itself in that sufficient resources were available world-wide such that many well-prepared users could continue operating with relatively little downtime. But because Amazon’s reliability has been incredible, many users were not well-prepared, leading to widespread outages. The Amazon EC2 outage of April 2011 was the worst in cloud computing’s history at the time. It made the front page of many news outlets, including the New York Times, probably because many people were shocked by how many web sites and services rely on EC2. Microsoft Azure outages: Dec 28, 2012 - some owners of Microsoft's Xbox 360 game console were unable to access some of their cloud-based save storage files. July 26, 2012 - Service for Microsoft’s Windows Azure Europe region went down for more than two hours. Feb 29, 2012 - The ultimate result was service impacts of 8-10 hours for users of Azure data centers in Dublin, Ireland, Chicago, and San Antonio.
Some parts of Amazon Web Services suffered a major outage. A portion of volumes utilizing the Elastic Block Store (EBS) service became "stuck" and were unable to fulfill read/write requests. It took at least two days for service to be fully restored. Reddit, one of the better-known sites to go down due to the error, said it has 700 EBS volumes with Amazon. Sites like Quora and Reddit were able to come back online in "read-only" mode, but users couldn't post new content for many hours.
For the second time in less than a month, Amazon’s Northern Virginia data center has suffered an outage, impacting many popular services such as Instagram, Pinterest and Netflix. Several websites that rely on Amazon Web Services were taken offline due to a severe storm of historic proportions in the Northern Virginia area where Amazon's largest datacenter is located. Amazon previously suffered an outage in its Northern Virginia facilities on June 14, 2012. A line of severe storms packing winds of up to 80 mph has caused extensive damage and power outages in Virginia. Dominion Virginia Power crews are assessing damages and will be restoring power where safe to do so.
A major outage occurred, affecting many sites such as Reddit, Foursquare, Pinterest, and others. The cause was a latent bug in an operational data collection agent. A memory leak and a failed monitoring system caused the Amazon Web Services outage on Monday that took out Reddit and other major services. According to a post Friday night, AWS explained that the problem arose after a simple replacement of a data collection server. After installation, the server did not propagate its DNS address correctly and so a fraction of servers did not get the message. Those servers kept trying to reach the server, which led to a memory leak that then went out of control due to the failure of an internal monitoring alarm. Eventually the system ground to a virtual stop and millions of customers felt the pain.
Amazon AWS again suffered an outage, causing websites such as Netflix instant video to be unavailable for some customers, particularly in the North-eastern US. Amazon later issued a statement detailing the issues with the Elastic Load Balancing service that led up to the outage. The disruption began shortly after noon Pacific time on December 24 when data was accidentally deleted by a developer during maintenance on the East Coast Elastic Load Balancing system, which is designed to distribute traffic volume among servers. "Netflix is designed to handle failure of all or part of a single availability zone in a region as we run across three zones and operate with no loss of functionality on two," the company said in a blog post this afternoon. "We are working on ways of extending our resiliency to handle partial or complete regional outages."
Fault tolerant systems are measured by their uptime / downtime for end users. Amazon says it is "committed" to a 99.95 percent uptime.
Although AWS went offline for only a few hours, the downtime did have an impact on customers’ businesses. There is no known data for the number of people affected by a cloud computing service outage. It is estimated that the travel service provider Amadeus loses $89,000 per hour during any cloud computing outage, while PayPal loses around $225,000 per hour.
DR – The process and procedures you take to restore your system after a catastrophic event. Cloud infrastructure has made DR much easier and more affordable compared to previous options. The cloud can also suffer from large-scale failures because of network, power or other IT failures. Application owners need to be responsible for HA and DR – they can use multiple servers, AZs, regions and even clouds. Zones within a region share a LAN so they have high bandwidth, low latency and private IP access. Zones utilize separate power resources. Regions are “islands” – they share no resources.
Each cloud is unique in many aspects, offering a different API and functionality to manage the resources: a different set of available resources; different formats, encodings and versions; different security groups, machine images, snapshots, etc.
Make sure to have a dedicated expert to manage your DR architecture, processes and testing. Define what your target recovery time and recovery point are. Be pessimistic and design for failures (assume everything will fail and design a solution that is capable of handling it). Avoid single points of failure – all parts of your app should be highly available (different AZs / regions / clouds): load balancers, app servers, web servers, message bus, database. Use monitoring and alerts for failover processes and for every change in state. Document your DR operational processes and automations. Try to “break” different parts of your application. Try different ways to break it – unplug the network, turn a machine off, etc. Try it again.
Netflix has open sourced “Chaos Monkey,” its tool designed to purposely cause failure in order to increase the resiliency of an application in Amazon Web Services (AWS). It’s a timely move as AWS has had its fair share of outages. With tools like Chaos Monkey, companies can be better prepared when a cloud infrastructure has a failure. In a blog post, Netflix says that this is the first of several tools that it will open source to help companies better manage the services they run in cloud infrastructures. Next up is likely to be Janitor Monkey, which helps keep an environment tidy and costs down. Chaos Monkey has achieved its own fame for its innovative approach. According to Netflix, the tool “randomly disables production instances to make sure it can survive common types of failure without any customer impact. The name comes from the idea of unleashing a wild monkey with a weapon in your data center (or cloud region) to randomly shoot down instances and chew through cables - all the while we continue serving our customers without interruption.”
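The idea of randomly disabling an instance and then checking that the service still works can be sketched in a few lines. This is not Chaos Monkey's code or API: the function names are hypothetical, and the termination and health-check actions are injected as callables so the sketch can be tried against a simulated fleet.

```python
import random
from typing import Callable, Optional, Sequence


def pick_victim(instances: Sequence[str], rng: Optional[random.Random] = None) -> str:
    """Pick one running instance at random, as a chaos experiment would."""
    if not instances:
        raise ValueError("no instances to choose from")
    return (rng or random).choice(instances)


def run_chaos_round(
    instances: Sequence[str],
    terminate: Callable[[str], None],
    is_service_healthy: Callable[[], bool],
    rng: Optional[random.Random] = None,
) -> bool:
    """Terminate one random instance and report whether the service survived."""
    victim = pick_victim(list(instances), rng)
    terminate(victim)
    return is_service_healthy()
```

Passing a seeded `random.Random` makes a round reproducible, which helps when a failed round needs to be replayed.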
Netflix provides an excellent toolset for surviving outages at the operational level. In this part I wanted to zoom in on the design implications for our application. The core principle for surviving failure is actually fairly simple and in fact applies to any system, not just cloud, whether it happens to be an airplane, a missile, a car, etc. In the end it's all about redundancy. The degree of tolerance is often determined by how many alternate systems or parts of the system we have in our design and how much they are separated from one another. The degree of tolerance is also determined by how fast we can detect the broken part in our system and make the switch. In software terms, the common parts that comprise our system fall into two main groups: the business logic and the data. Making a redundant software application that can survive failure is often based on setting up clones of those two parts of our system.
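The link between the number of redundant copies and the degree of tolerance can be made concrete with a standard back-of-the-envelope formula. The sketch assumes that sites fail independently and that failover is instantaneous; neither assumption comes from the deck, and real systems do worse when failures are correlated or detection is slow.

```python
def combined_availability(single_site_availability: float, sites: int) -> float:
    """Availability of `sites` redundant copies, assuming independent failures."""
    if sites < 1:
        raise ValueError("need at least one site")
    # The system is down only when every site is down at the same time.
    return 1 - (1 - single_site_availability) ** sites


for sites in (1, 2, 3):
    print(sites, f"{combined_availability(0.99, sites):.6f}")
```

Under these assumptions, two independent sites at 99% each give 99.99% combined, which is why separation between the copies matters as much as their number.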
We need abstraction – we don’t want to be locked in. We want to use tools that offer this abstraction layer both for daily management and for DR. Such a tool should translate our architecture concepts to the cloud-specific properties (using recipes). To clone our application business logic we need to ensure that all parts of our system run the exact same version of all our software components. That includes not just the binaries but also the configuration and the scripts that run our application and, more importantly, that all our post-deployment procedures such as fail-over, scaling and monitoring are also kept consistent. Quite often what makes cloning our business logic complex is that the information on how to run our application is scattered across many different sources, such as scripts, as well as the minds of the people that run those apps. To make the job of cloning our application much simpler and thus more consistent, we need to be able to capture all the information for running our apps in the same place. Configuration management tools such as Chef, Puppet and, in the case of Amazon, CloudFormation can help in this regard.
RDS read replica - Amazon RDS uses MySQL’s built-in replication functionality to create a special type of DB Instance called a Read Replica that allows you to elastically scale out beyond the capacity constraints of a single DB Instance for read-heavy database workloads. Once you create a Read Replica, database updates on the source DB Instance are replicated to the Read Replica using MySQL’s native, asynchronous replication. Since Read Replicas leverage standard MySQL replication, they may fall behind their sources, and they are therefore not intended to be used for enhancing fault tolerance in the event of source DB Instance failure or Availability Zone failure.
There are lots of patterns on how to avoid failure. It took Netflix lots of development work to build a framework that can handle them well. Most users and startups don't have the luxury of implementing them themselves. You need a tool that will enable you to automate those patterns in a consistent way. - Enter Cloudify
Any App, Any Stack: Move your application to the cloud without making any code changes, regardless of the application stack (Java/Spring, Java EE, Ruby on Rails, …), database store (relational such as MySQL or non-relational such as Apache Cassandra), or any other middleware components it uses. This enables you to achieve your objective of no code changes. To make setting all this up simpler, we tried to bake all those patterns into ready-made tools, scripted as out-of-the-box recipes. The Cloudify recipes include: database cluster recipes with support for MySQL, MongoDB, Cassandra, Postgres, etc.; integration with Chef and Puppet; automation of fail-over, scaling and continuous maintenance of your application; application recipes that allow you to capture all the aspects of running your application, including post-deployment aspects such as fail-over, scaling and monitoring.
Cloud brings lots of promise for making our business more agile. Cloud has also become a huge shared infrastructure in which every failure has a much more significant impact on our business worldwide. The experience of the past year has taught us that even a robust cloud infrastructure such as Amazon can fail. Through this experience we've learned that rather than relying on the infrastructure to prevent failure, we need to design our system to cope with failure and get used to failure as a way of life. Having said that, the investment required to build a robust application can be fairly large and not something that everyone can afford. Using tools like Cloudify, Chef, Puppet and, if you're a pure Amazon shop, Netflix <framework> could greatly reduce this effort by making a lot of those patterns pre-baked into recipes.