WHITE PAPER
Redundant Fiber-Based Systems




 A Thinklogical White Paper
 By Larry Wachter
 Senior Product Manager - Routing and Extension Solutions - Thinklogical


 This white paper illustrates the concept of redundant and resilient
 systems and how fiber-based extension and routing solutions can
 maintain operability in the event of a failure.




                                                                                                   




                                                                           www.thinklogical.com
White Paper - Redundant Fiber-Based Systems




Introduction

At the most basic level, availability can be defined as the probability that a system is operating
successfully when needed. The term high availability has been used to encompass all things related to
productivity, including reliability and maintainability. The adoption of high availability has led to
redundant and resilient systems, spurring a ripple effect that ends with fiber
infrastructures requiring products and solutions that provide various levels of fault tolerance. In
particular, this is true of fiber-based routing and extension solutions, which not only provide mechanisms
that aid in modular redundant system architecture, but also provide high bandwidth, cost-effectiveness,
and support for complex topologies. Consequently, Thinklogical has designed a redundant fiber-based
routing and extension solution that meets the requirements for reliable signal transmission in modular
redundant system deployments.
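
As a rough illustration of the definition above, steady-state availability is commonly estimated
as MTBF / (MTBF + MTTR), where MTBF is the mean time between failures and MTTR is the mean time
to repair. This formula is a general reliability-engineering convention, not something stated in
this paper, and the figures used below are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365  # non-leap year, used only for the downtime estimate


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the system is operable."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_hours(avail: float) -> float:
    """Expected hours of downtime per year for a given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR


# Hypothetical unit: fails on average every 9,990 hours, takes 10 hours to repair.
a = availability(mtbf_hours=9_990.0, mttr_hours=10.0)
print(f"availability    = {a:.4f}")
print(f"downtime per yr = {annual_downtime_hours(a):.2f} h")
```

Shortening the repair time (for example through hot-swappable components) raises availability
just as lengthening the time between failures does.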

High Availability Achieved Through Redundant and Resilient Systems

Redundancy can involve a variety of technologies, all of which pertain to physical backups, whereas
resiliency deals primarily with communication protocols. A redundant device may activate as a result of a
failure, but without built-in resiliency as well, there could be data loss or, worse, an inability to establish
the redundant connection. A resilient system will return to an operable state after encountering trouble.
Therefore, if a risk event knocks a system offline, a highly resilient system will resume its intended work
and function with minimal downtime.

Building a redundant and resilient system requires a holistic mentality. One must prioritize every
foreseeable risk and then determine not only how to reduce the risk in the first place, but also how
to minimize its impact on the system. The need or requirement for redundancy can be based on a set of
system criteria questions:

    Does the system need to run around the clock and is downtime unacceptable?

    If a system fault occurs, should the primary system switch over to the secondary system seamlessly?

    What is the degree to which the data shared between sources and destinations must remain constant
    and reliable?

    How can single points of failure within the system be minimized and how can one ensure that
    components within the infrastructure will not stop the overall operation of the system?








  High availability, achieved through redundancy and fault tolerance, is a critical component of many
  routing and extension installations, especially in secure visual computing environments. While the loss of
  an enterprise system for a few minutes is inconvenient, losing a secure visual computing system can have
  disastrous consequences. Some form of redundancy and fault tolerance is generally used if a control
  system shutdown or loss of visibility causes a major loss of revenue, loss of equipment, disruption to
  public services and/or safety. Redundancy in these situations means the duplication, or even triplication,
  of equipment that is needed to operate without disruption, if and when the primary equipment fails
  during the mission. In these types of environments the cost of failure is so high that a redundant system
  approach is crucial.

  By using a fiber-based solution that supports redundant system design, users enjoy highly reliable data
  transmission, reduced costs of deployment and a guaranteed upgrade strategy as requirements evolve.
  This white paper will touch upon several redundant and fault-tolerant features and architectures
  for fiber-based infrastructures, but will focus primarily on Dual Modular Redundancy, otherwise known as
  Parallel Redundancy, which is the approach taken by Thinklogical systems. This paper will also highlight
  features within the Thinklogical product lines that can help achieve higher availability.

  Redundancy on a Component Level

  The most important place to start to guarantee reliable operation is to provide redundant,
  hot-swappable components. It is also critical that modules or components should be capable of being
  removed, replaced or added to the system without interruption. Replacements should not need rewiring
  or reprogramming. In addition, many innovations have been created, such as state-based control and
  self-learning diagnostic routines, which have raised the ability of the controller to detect, annunciate and
  describe problems within the components. For many users, the ability to maintain and revise the system
  without shutting down offers an acceptable level of availability, especially if the change or repair can be
  completed in minutes.








Critical system components:

       Uninterruptible power supply (UPS)
       Redundant power supplies
       Redundant components
         - Chassis
         - Processors
         - I/O modules
         - Sensors and actuators
         - PCs/HMI
         - Networks
         - Media
         - Servers
         - Databases




Thinklogical’s System Contingencies

Power supply redundancy is a very popular means to increase system reliability. A single power supply
failure could have a catastrophic effect that equates to a tremendous amount of lost revenue. This need
for system integrity and guaranteed performance in these demanding conditions necessitates power
redundancy. Therefore, all of Thinklogical’s routing and modular extension products are equipped with
redundant, hot-swappable power supplies.

Thinklogical’s VX and HDX lines of routers are designed with hot-swappable critical system components,
such as cooling fans and pluggable optics (SFP+), thus minimizing business impact in the unlikely event
a component should fail. The hot-swappable I/O boards also provide excellent in-service expansion
capabilities, allowing the router to be reconfigured without powering it down and interrupting signal
processing. In addition, the HDX Router line is equipped with dual controller cards with the ability
to switch between cards in the event of a failure.








Models of Redundancy

There are a number of common redundancy models used in the industry, such as Standby Redundancy
and Dual Modular Redundancy, or Parallel Redundancy.

Standby Redundancy

Standby Redundancy refers to a configuration in which an identical secondary unit backs up the
primary unit. Under standby redundancy the two units do not share any of the load, and the secondary
starts operating only when the active components fail. In addition, a third party may be needed to
monitor the system and give the command when a switchover condition is met.

In standby redundancy, the components can be set to one of three states: Cold, Warm and Hot Standby.
Typically in Cold Standby the secondary unit is powered off in order to preserve the life of the unit. The
disadvantage of this model is that there is a significant time delay in getting the replacement system up
and running. While the hardware and software are available, the unit needs to be powered up before it
can be brought online into a known state.

Warm Standby has a faster response time because the backup (redundant) system is always running and
regularly synchronized with the Device Under Control (DUC). When a failure occurs on the primary
system, connections can be dropped from the failed system and re-established with the backup system.
This allows the system to recover fairly quickly (usually within seconds) and continue to work. Although
some data will be lost during this disconnect/reconnect cycle, warm standby can be an acceptable
solution where some data loss can be tolerated.

In these types of redundant models the switching is not seamless and adds to the probability of failure
within a given system. To offset this increased probability, additional hardware (a third-party voter) can
be added to the redundancy configuration to assist in the switching from the primary to the secondary
source. While these system components add to the reliability, they are normally connected in series,
which creates a hybrid parallel-series connection and introduces another point of failure for the system.
In addition, the system cost typically doubles with the additional hardware.
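
The effect of a series-connected voter can be sketched with the textbook formulas for parallel
and series availability. This is an illustrative calculation only: it assumes independent
failures and ideal switching, and the availability figures are hypothetical rather than
Thinklogical measurements.

```python
def parallel(*avails: float) -> float:
    """Availability of redundant units when any single working unit is sufficient."""
    all_down = 1.0
    for a in avails:
        all_down *= (1.0 - a)  # probability that every unit is down at once
    return 1.0 - all_down


def series(*avails: float) -> float:
    """Availability of a chain in which every element must be working."""
    total = 1.0
    for a in avails:
        total *= a
    return total


unit = 0.99    # hypothetical availability of one primary or secondary unit
voter = 0.999  # hypothetical availability of the third-party voter

pair = parallel(unit, unit)        # redundant pair on its own
with_voter = series(voter, pair)   # voter placed in series with the pair

print(f"redundant pair:  {pair:.6f}")
print(f"pair plus voter: {with_voter:.6f}")
```

Because the voter sits in series, the combined figure can never exceed the voter's own
availability, which is the extra point of failure described above.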








Hot standby means that both the primary and secondary data systems run simultaneously and both are
providing identical data streams to the downstream client. If the primary system fails, the switchover to
the secondary system is intended to be completely seamless, or “bumpless,” with no data loss. Hot
Standby is the best choice for systems that cannot tolerate the data loss of a Cold or Warm Standby
system. There are some variations of the Hot Standby model, such as Dual Modular Redundancy or
Parallel Redundancy. The differentiating factor between these models is how tightly the primary and
secondary units are synchronized.
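
The three standby states described above can be summarized in a small lookup table. The sketch
below is only a restatement of the qualitative descriptions in this section; the class and
function names are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Standby(Enum):
    COLD = "cold"
    WARM = "warm"
    HOT = "hot"


@dataclass(frozen=True)
class Traits:
    secondary_running: bool        # is the backup powered and running before a failure?
    data_loss_on_switchover: bool  # is data expected to be lost while switching?
    recovery: str                  # qualitative recovery time, as described in the text


TRAITS = {
    Standby.COLD: Traits(False, True, "significant delay (unit must be powered up)"),
    Standby.WARM: Traits(True, True, "fairly quick (usually within seconds)"),
    Standby.HOT: Traits(True, False, "intended to be seamless"),
}


def candidate_modes(data_loss_tolerated: bool) -> List[Standby]:
    """Return the standby modes compatible with the stated data-loss tolerance."""
    return [
        mode
        for mode, traits in TRAITS.items()
        if data_loss_tolerated or not traits.data_loss_on_switchover
    ]


for mode in candidate_modes(data_loss_tolerated=False):
    print(mode.value, "->", TRAITS[mode].recovery)
```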

Dual Modular Redundancy (DMR) or Parallel Redundancy

The approach of having multiple units running completely synchronized and in parallel is known as
DMR, or Parallel Redundancy. This model typically has rapid switchover time.

There are three basic tenets of dual system redundancy:
1. Physical separation of signal paths
2. Dual-chassis redundant signal controllers
3. Synchronization of status information

A DMR routing and extension system is configured with two tightly synchronized primary and
secondary routers running in parallel. These routers mirror one another, with identical signals being
sent through both of them at the same time. These signals are sent to their destination at a receiver
component. Deciding which unit is correct can be challenging if you have more than one router.
Having to choose which unit you are going to “trust the most” defeats the purpose (by arbitrarily giving
one router priority without dynamic review of operating parameters). Monitoring the system and
determining when to switch to the secondary unit can also be complicated.








The Thinklogical Advantage

Thinklogical has designed a cost-effective, resilient solution to take the complexity out of the DMR approach.
The feature is designed into the SDI Xtreme 3G+ Receivers, and is known as a “switchover capability.” This allows
the component to receive identical streams on both input fibers. By default it will attempt to synchronize to the
‘primary’ fiber by searching for the synchronization characters in the received stream. Simultaneously, it will also
check the ‘secondary’ fiber and attempt to synchronize to its stream. After a pre-determined amount of time,
whichever stream the receiver locks on to will be selected and the SDI data will then be decoded from that
stream. In the event that the selected stream loses synchronization, the receiver will automatically switch to the
other stream. There will be minimal loss of SDI video during this switchover. In order to prevent switching back
and forth when a signal is intermittent, the receiver will continue to use the ‘switched-over’ stream regardless
of whether or not it re-acquires lock to the original stream. If an event occurs such that the switched-over
stream loses lock, then the receiver will attempt to switch back over to the original stream.
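
The switchover behavior described above can be modeled as a small state machine. The following is
a minimal sketch for illustration and is not Thinklogical firmware: the class and method names are
hypothetical, each call to update() stands for one evaluation after the pre-determined acquisition
time, and two details are assumptions (the primary is preferred when both streams lock at start-up,
and the receiver stays on its current stream if the other stream is not locked either).

```python
from dataclasses import dataclass
from typing import Optional

PRIMARY, SECONDARY = "primary", "secondary"


def other(stream: str) -> str:
    """Return the name of the opposite input fiber."""
    return SECONDARY if stream == PRIMARY else PRIMARY


@dataclass
class SwitchoverReceiver:
    selected: Optional[str] = None  # no stream is chosen until one of them locks

    def update(self, primary_locked: bool, secondary_locked: bool) -> Optional[str]:
        locked = {PRIMARY: primary_locked, SECONDARY: secondary_locked}
        if self.selected is None:
            # Initial selection (assumption: primary wins if both streams are locked).
            if locked[PRIMARY]:
                self.selected = PRIMARY
            elif locked[SECONDARY]:
                self.selected = SECONDARY
        elif not locked[self.selected] and locked[other(self.selected)]:
            # The selected stream lost synchronization: switch to the other stream.
            self.selected = other(self.selected)
        # Otherwise stay on the current stream, even if the other one re-acquired lock.
        return self.selected


rx = SwitchoverReceiver()
print(rx.update(True, True))    # both streams lock at start-up
print(rx.update(False, True))   # primary loses synchronization
print(rx.update(True, True))    # primary returns; the receiver does not switch back
print(rx.update(True, False))   # switched-over stream loses lock
```

Staying on the switched-over stream until it fails is what prevents the receiver from toggling
when one fiber carries an intermittent signal.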








This synchronization scheme ensures the maximum uptime in the event of a failure at any point in the
system. Interestingly, this approach mirrors the classic design common among disaster recovery
implementations. In fact, most highly available systems stick to this simple design pattern: a single,
high quality, multi-purpose physical system with comprehensive internal resiliency running
interdependent functions paired with a second, physically separated, duplicate system. The overriding
purpose of this design is the prevention of, or rapid recovery from, a failure, which allows a system to
continue to operate despite a partial or complete failure of any significant component.

Summary

The idea of redundancy is not difficult to grasp, but implementing it takes some thought. An initial
decision on Cold, Warm or Hot Standby will impact all aspects of the implementation. The choice of
proper hardware and robust system architecture is critical for a well-functioning system.

It is clear that organizations cannot fully leverage the benefits of redundancy models without a
comprehensive routing and extension solution. Thinklogical’s system solutions offer innovative
organizations the ability to create high density, scalable and redundant system architectures that
deliver broad functionality and provide high ROI. It is very important to keep in mind that lower
system cost doesn’t always equal lower total cost of ownership. More importantly, the cost of one
unplanned shutdown far outweighs the costs of redundancy. If data connectivity is crucial to the
success of the company or organization, it would be wise to consider the possibility of installing a
redundant system and to weigh the options carefully when choosing the key components.

About Thinklogical
Thinklogical is the leading manufacturer and provider of fiber optic KVM/video extension solutions,
and fiber matrix routers and switches. Organizations worldwide rely on Thinklogical's products and
solutions for optimal performance in secure visual computing environments. Through pioneering next
generation fiber optic extension, switching, and server management technologies, Thinklogical helps
customers reduce cost and simplify the management of complex computing infrastructures.




© 2011 Thinklogical. All rights reserved.
Thinklogical claims or other product information contained in this document are subject to change
without notice. This document may not be reproduced, in whole or in part, without the express written
consent of Thinklogical.

Extend   Distribute   Innovate

September 2011
