This document discusses availability concepts for IT infrastructure architecture. Some key points include:
- Availability is calculated based on uptime percentage over time (usually annually or monthly), with 99.9% being a common service level agreement target.
- Downtime is influenced by factors like mean time between failures (MTBF) and mean time to repair (MTTR). Redundancy and failover help improve availability.
- Sources of downtime include human errors, software bugs, planned maintenance, hardware failures, environmental issues, and infrastructure complexity. Redundancy, failover, and fallback solutions can help address some causes of downtime.
A distributed system is a collection of computational and storage devices connected through a communications network. In this type of system, data, software, and users are distributed.
PowerPoint Presentation on Distributed Operating Systems,reasons for opting for distributed systems over centralized systems,types of Distributed Systems,Process Migration and its advantages.
A distributed system is a collection of computational and storage devices connected through a communications network. In this type of system, data, software, and users are distributed.
PowerPoint Presentation on Distributed Operating Systems,reasons for opting for distributed systems over centralized systems,types of Distributed Systems,Process Migration and its advantages.
It security for libraries part 3 - disaster recovery Brian Pichman
A very important topic in today's data age is Disaster Recovery. With the need for high up time in our environments, your environment must be prepared for the worse. From basic internet outages to full system failure, how you plan will determine how quickly you can recover. See more details below. Topics/Agenda: * Learn the key infrastructure components in mitigating risks as it relates to data loss or system failure * Identify the main points to include within a disaster plan
• We sleeping well. And our mobile ringing and ringing. Message: DISASTER! In this session (on slides) we are NOT talk about potential disaster (such BCM); we talk about: And what NOW? New version old my old well-known session updated for whole changes which happened in DBA World in last two-three years.
• So, from the ground to the Sky and further - everything for surviving disaster. Which tasks should have been finished BEFORE. Is virtual or physical SQL matter? We talk about systems, databases, peoples, encryption, passwords, certificates and users.
• In this session (on few demos) I'll show which part of our SQL Server Environment are critical and how to be prepared to disaster. In some documents I'll show You how to be BEST prepared.
When planning for Disaster Recovery it is essential to have a clearly defined set of objectives that are based on your businesses needs .InTechnology's Product Director for Data & Cloud Services, Stefan Haase, provides tips for any business to consider when putting together their disaster recovery plan. http://www.intechnology.co.uk/resource-centre/webcast-disaster-recovery-planning.aspx
Real-Time Operating Systems Real-Time Operating Systems RTOS .pptlematadese670
Real-Time OpReal-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operating Systems Real-Time Operatingv
Session form series of conferences during Data Relay (formerly SQL Relay) 2018 in Newcastle, Leeds, Birmingham, Reading, Bristol. The session contains only slides form the talk (no videos included).
Mtc learnings from isv & enterprise interactionGovind Kanshi
This is one of the dated presentation for which I keep getting requests for, please do reach out to me for status on various things as Azure keeps fixing/innovating whole of things every day.
There are bunch of other things I can help you on to ensure you can take advantage of Azure platform for oss, .net frameworks and databases.
Mtc learnings from isv & enterprise (dated - Dec -2014)Govind Kanshi
This is little dated deck for our learnings - I keep getting multiple requests for it. I have removed one slide for access permissions (RBAC -which are now available).
A Backup Today Saves Tomorrow is a presentation from Percona Live 2013 that provides insight into planning and the tools used today to capture MySQL backups.
Let's dive deeper into the world of ODC! Ricardo Alves (OutSystems) will join us to tell all about the new Data Fabric. After that, Sezen de Bruijn (OutSystems) will get into the details on how to best design a sturdy architecture within ODC.
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
Keynote at DIGIT West Expo, Glasgow on 29 May 2024.
Cheryl Hung, ochery.com
Sr Director, Infrastructure Ecosystem, Arm.
The key trends across hardware, cloud and open-source; exploring how these areas are likely to mature and develop over the short and long-term, and then considering how organisations can position themselves to adapt and thrive.
UiPath Test Automation using UiPath Test Suite series, part 3DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 3. In this session, we will cover desktop automation along with UI automation.
Topics covered:
UI automation Introduction,
UI automation Sample
Desktop automation flow
Pradeep Chinnala, Senior Consultant Automation Developer @WonderBotz and UiPath MVP
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
DevOps and Testing slides at DASA ConnectKari Kakkonen
My and Rik Marselis slides at 30.5.2024 DASA Connect conference. We discuss about what is testing, then what is agile testing and finally what is Testing in DevOps. Finally we had lovely workshop with the participants trying to find out different ways to think about quality and testing in different parts of the DevOps infinity loop.
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
Are you looking to streamline your workflows and boost your projects’ efficiency? Do you find yourself searching for ways to add flexibility and control over your FME workflows? If so, you’re in the right place.
Join us for an insightful dive into the world of FME parameters, a critical element in optimizing workflow efficiency. This webinar marks the beginning of our three-part “Essentials of Automation” series. This first webinar is designed to equip you with the knowledge and skills to utilize parameters effectively: enhancing the flexibility, maintainability, and user control of your FME projects.
Here’s what you’ll gain:
- Essentials of FME Parameters: Understand the pivotal role of parameters, including Reader/Writer, Transformer, User, and FME Flow categories. Discover how they are the key to unlocking automation and optimization within your workflows.
- Practical Applications in FME Form: Delve into key user parameter types including choice, connections, and file URLs. Allow users to control how a workflow runs, making your workflows more reusable. Learn to import values and deliver the best user experience for your workflows while enhancing accuracy.
- Optimization Strategies in FME Flow: Explore the creation and strategic deployment of parameters in FME Flow, including the use of deployment and geometry parameters, to maximize workflow efficiency.
- Pro Tips for Success: Gain insights on parameterizing connections and leveraging new features like Conditional Visibility for clarity and simplicity.
We’ll wrap up with a glimpse into future webinars, followed by a Q&A session to address your specific questions surrounding this topic.
Don’t miss this opportunity to elevate your FME expertise and drive your projects to new heights of efficiency.
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
I have heard many times that architecture is not important for the front-end. Also, many times I have seen how developers implement features on the front-end just following the standard rules for a framework and think that this is enough to successfully launch the project, and then the project fails. How to prevent this and what approach to choose? I have launched dozens of complex projects and during the talk we will analyze which approaches have worked for me and which have not.
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
As AI technology is pushing into IT I was wondering myself, as an “infrastructure container kubernetes guy”, how get this fancy AI technology get managed from an infrastructure operational view? Is it possible to apply our lovely cloud native principals as well? What benefit’s both technologies could bring to each other?
Let me take this questions and provide you a short journey through existing deployment models and use cases for AI software. On practical examples, we discuss what cloud/on-premise strategy we may need for applying it to our own infrastructure to get it to work from an enterprise perspective. I want to give an overview about infrastructure requirements and technologies, what could be beneficial or limiting your AI use cases in an enterprise environment. An interactive Demo will give you some insides, what approaches I got already working for real.
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
💥 Speed, accuracy, and scaling – discover the superpowers of GenAI in action with UiPath Document Understanding and Communications Mining™:
See how to accelerate model training and optimize model performance with active learning
Learn about the latest enhancements to out-of-the-box document processing – with little to no training required
Get an exclusive demo of the new family of UiPath LLMs – GenAI models specialized for processing different types of documents and messages
This is a hands-on session specifically designed for automation developers and AI enthusiasts seeking to enhance their knowledge in leveraging the latest intelligent document processing capabilities offered by UiPath.
Speakers:
👨🏫 Andras Palfi, Senior Product Manager, UiPath
👩🏫 Lenka Dulovicova, Product Program Manager, UiPath
UiPath Test Automation using UiPath Test Suite series, part 4DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 4. In this session, we will cover Test Manager overview along with SAP heatmap.
The UiPath Test Manager overview with SAP heatmap webinar offers a concise yet comprehensive exploration of the role of a Test Manager within SAP environments, coupled with the utilization of heatmaps for effective testing strategies.
Participants will gain insights into the responsibilities, challenges, and best practices associated with test management in SAP projects. Additionally, the webinar delves into the significance of heatmaps as a visual aid for identifying testing priorities, areas of risk, and resource allocation within SAP landscapes. Through this session, attendees can expect to enhance their understanding of test management principles while learning practical approaches to optimize testing processes in SAP environments using heatmap visualization techniques
What will you get from this session?
1. Insights into SAP testing best practices
2. Heatmap utilization for testing
3. Optimization of testing processes
4. Demo
Topics covered:
Execution from the test manager
Orchestrator execution result
Defect reporting
SAP heatmap example with demo
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
2. Introduction
• Everyone expects
their infrastructure
to be available all
the time
• A 100%
guaranteed
availability of an
infrastructure is
impossible
3. Calculating availability
• Availability can neither be calculated, nor
guaranteed upfront
– It can only be reported on afterwards, when a system
has run for some years
• Over the years, much knowledge and experience
is gained on how to design high available systems
– Failover
– Redundancy
– Structured programming
– Avoiding Single Points of Failures (SPOFs)
– Implementing systems management
4. Calculating availability
• The availability of a system is usually expressed as
a percentage of uptime in a given time period
– Usually one year or one month
• Example for downtime expressed as a percentage
per year:
Availability %
Downtime
per year
Downtime
per month
Downtime
per week
99.8% 17.5 hours 86.2 minutes 20.2 minutes
99.9% ("three nines") 8.8 hours 43.2 minutes 10.1 minutes
99.99% ("four nines") 52.6 minutes 4.3 minutes 1.0 minutes
99.999% ("five nines") 5.3 minutes 25.9 seconds 6.1 seconds
5. Calculating availability
• Typical requirements used in service level
agreements today are 99.8% or 99.9% availability
per month for a full IT system
• The availability of the infrastructure must be
much higher
– Typically in the range of 99.99% or higher
• 99.999% uptime is also known as carrier grade
availability
– For one component
– Higher availability levels for a complete system are
very uncommon, as they are almost impossible to
reach
6. Calculating availability
• It is good practice to agree on the maximum
frequency of unavailability
Unavailability
(minutes)
Number of events
(per year)
0 – 5 <= 35
5 – 10 <= 10
10 – 20 <= 5
20 – 30 <=2
> 30 <= 1
7. MTBF and MTTR
• Mean Time Between Failures (MTBF)
–The average time that passes between
failures
• Mean Time To Repair (MTTR)
–The time it takes to recover from a failure
8. MTBF and MTTR
• Some components have higher MTBF than others
• Some typical MTB’s:
Component MTBF (hours)
Hard disk 750,000
Power supply 100,000
Fan 100,000
Ethernet Network Switch 350,000
RAM 1,000,000
9. MTTR
• MTTR can be kept low by:
– Having a service contract with the supplier
– Having spare parts on-site
– Automated redundancy and failover
10. MTTR
• Steps to complete repairs:
– Notification of the fault (time before seeing an alarm
message)
– Processing the alarm
– Finding the root cause of the error
– Looking up repair information
– Getting spare components from storage
– Having technician come to the datacenter with the
spare component
– Physically repairing the fault
– Restarting and testing the component
12. Calculation examples
• Serial components: One defect leads to
downtime
• Example: the above system’s availability is:
0.9999200 × 0.9999200 × 0.9999733
× 0.9999920 × 0.9999840 × 0.9999680
= 0.99977 = 𝟗𝟗. 𝟗𝟕𝟕%
(each components’ availability is at least 99.99%)
13. Calculation examples
• Parallel components: One defect: no downtime!
• But beware of SPOFs!
• Calculate availability:
𝐴 = 1 − (1 − 𝐴1) 𝑛
• Total availability = 1 − (1 − 0.99)2
= 99.99%
14. Sources of unavailability - human
errors
• 80% of outages impacting mission-critical
services is caused by people and process issues
• Examples:
• Performing a test in the production environment
• Switching off the wrong component for repair
• Swapping a good working disk in a RAID set instead of the
defective one
• Restoring the wrong backup tape to production
• Accidentally removing files
– Mail folders, configuration files
• Accidentally removing database entries
– Drop table x instead of drop table y
15. Sources of unavailability - software
bugs
• Because of the complexity of most software it
is nearly impossible (and very costly) to create
bug-free software
• Application software bugs can stop an entire
system
• Operating systems are software too
– Operating systems containing bugs can lead to
corrupted file systems, network failures, or other
sources of unavailability
16. Sources of unavailability - planned
maintenance
• Sometimes needed to perform systems management
tasks:
– Upgrading hardware or software
– Implementing software changes
– Migrating data
– Creation of backups
• Should only be performed on parts of the
infrastructure where other parts keep serving clients
• During planned maintenance the system is more
vulnerable to downtime than under normal
circumstances
– A temporary SPOF could be introduced
– Systems managers could make mistakes
17. Sources of unavailability - physical
defects
• Everything breaks down eventually
• Mechanical parts are most likely to break first
• Examples:
– Fans for cooling equipment usually break because
of dust in the bearings
– Disk drives contain moving parts
– Tapes are very vulnerable to defects as the tape is
spun on and off the reels all the time
– Tape drives contain very sensitive pieces of
mechanics that can break easily
18. Sources of unavailability - bathtub
curve
• A component failure is most likely when the
component is new
• When a component still works after the first month, it
is likely that it will continue working without failure
until the end of its life
19. Sources of unavailability -
environmental issues
• Environmental issues can cause downtime:
– Failing facilities
• Power
• Cooling
– Disasters
• Fire
• Earthquakes
• Flooding
20. Sources of unavailability - complexity
of the infrastructure
• Adding more components to an overall system
design can undermine high availability
– Even if the extra components are implemented to
achieve high availability
• Complex systems
– Have more potential points of failure
– Are more difficult to implement correctly
– Are harder to manage
• Sometimes it is better to just have an extra spare
system in the closet than to use complex
redundant systems
21. Redundancy
• Redundancy is the duplication of critical
components in a single system, to avoid a
single point of failure (SPOF)
• Examples:
– A single component having two power supplies; if
one fails, the other takes over
– Dual networking interfaces
– Redundant cabling
22. Failover
• Failover is the (semi)automatic switch-over to
a standby system or component
• Examples:
– Windows Server failover clustering
– VMware High Availability
– Oracle Real Application Cluster (RAC) database
23. Fallback
• Fallback is the manual switchover to an
identical standby computer system in a
different location
• Typically used for disaster recovery
• Three basic forms of fallback solutions:
– Hot site
– Cold site
– Warm site
24. Fallback – hot site
• A hot site is
– A fully configured fallback datacentre
– Fully equipped with power and cooling
– Applications are installed on the servers
– Data is kept up-to-date to fully mirror the production
system
• Requires constant maintenance of the hardware,
software, data, and applications to be sure the
site accurately mirrors the state of the production
site at all times
25. Fallback - cold site
• Is ready for equipment to be brought in during
an emergency, but no computer hardware is
available at the site
• Applications will need to be installed and
current data fully restored from backups
• If an organization has very little budget for a
fallback site, a cold site may be better than
nothing
26. Fallback - warm site
• A computer facility readily available with
power, cooling, and computers, but the
applications may not be installed or
configured
• A mix between a hot site and cold site
• Applications and data must be restored from
backup media and tested
– This typically takes a day
27. Business Continuity
• An IT disaster is defined as an irreparable
problem in a datacenter, making the datacenter
unusable
• Natural disasters:
– Floods
– Hurricanes
– Tornadoes
– Earthquakes
• Manmade disasters:
– Hazardous material spills
– Infrastructure failure
– Bio-terrorism
28. Business Continuity
• In case of a disaster, the infrastructure could
become unavailable, in some cases for a longer
period of time
• Business Continuity Management includes:
– IT
– Managing business processes
– Availability of people and work places in disaster
situations
• Disaster recovery planning (DRP) contains a set of
measures to take in case of a disaster, when
(parts of) the IT infrastructure must be
accommodated in an alternative location
29. RTO and RPO
• RTO and RPO are objectives in case of a disaster
• Recovery Time Objective (RTO)
– The maximum duration of time within which a
business process must be restored after a disaster, in
order to avoid unacceptable consequences (like
bankruptcy)
30. RTO and RPO
• Recovery Point Objective (RPO)
– The point in time to which data must be recovered
considering some "acceptable loss" in a disaster
situation
• RTO and RPO are individual objectives
– They are not related