This white paper illustrates the concept of redundant and resilient systems and how fiber-based extension and routing solutions can maintain operability in the event of a failure.
Thinklogical White Paper: Redundant Fiber-Based Systems
By Larry Wachter
Senior Product Manager - Routing and Extension Solutions - Thinklogical
www.thinklogical.com
Introduction
At the most basic level, availability can be defined as the probability that a system is operating
successfully when needed. The term high availability has been used to encompass all things related to
productivity, including reliability and maintainability. The adoption of high availability has led to
redundant and resilient systems, spurring a ripple effect and ending with the creation of fiber
infrastructures which require products and solutions that provide various levels of fault-tolerance. In
particular, this is true of fiber-based routing and extension solutions, which not only provide mechanisms
that aid in modular redundant system architecture, but also provide high bandwidth, cost-effectiveness,
and support for complex topologies. Consequently, Thinklogical has designed a redundant fiber-based
routing and extension solution that meets the requirements for reliable signal transmission in modular
redundant system deployments.
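The definition of availability above is commonly quantified with the standard steady-state formula, availability = MTBF / (MTBF + MTTR), where MTBF is the mean time between failures and MTTR is the mean time to repair. The short sketch below illustrates that arithmetic. The function names and the example figures are assumptions chosen for illustration, not Thinklogical specifications.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time the system is operable."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


def annual_downtime_hours(avail: float) -> float:
    """Expected hours of downtime per year at a given availability."""
    return (1.0 - avail) * HOURS_PER_YEAR


# Hypothetical unit: fails on average every 9,990 hours and takes 10 hours to repair.
a = availability(9_990, 10)
print(f"availability = {a:.4f}")                              # availability = 0.9990
print(f"downtime     = {annual_downtime_hours(a):.2f} h/year")  # downtime     = 8.76 h/year
```

Even a unit that is available 99.9% of the time is out of service for nearly nine hours a year, which is why the sections that follow turn to redundancy.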
High Availability Achieved Through Redundant and Resilient Systems
Redundancy can involve a variety of technologies, all of which pertain to physical backups, whereas
resiliency deals primarily with communication protocols. A redundant device may activate as a result of a
failure, but without built-in resiliency as well, there could be data loss, or worse, the inability to establish
the redundant connection. A resilient system will return to an operable state after encountering trouble.
Therefore, if a risk event knocks a system offline, a highly resilient system will resume its intended work
and function with minimal downtime.
Building a redundant and resilient system requires a holistic mentality. One must prioritize every
foreseeable risk and then determine not only how to reduce the risk in the first place, but also how
to minimize its impact on the system. The need or requirement for redundancy can be based on a set of
system criteria questions:
- Does the system need to run around the clock and is downtime unacceptable?
- If a system fault occurs, should the primary system switch over to the secondary system seamlessly?
- What is the degree to which the data shared between sources and destinations must remain constant
  and reliable?
- How can single points of failure within the system be minimized and how can one ensure that
  components within the infrastructure will not stop the overall operation of the system?
High availability, achieved through redundancy and fault tolerance, is a critical component of many
routing and extension installations, especially in secure visual computing environments. While the loss of
an enterprise system for a few minutes is inconvenient, losing a secure visual computing system can have
disastrous consequences. Some form of redundancy and fault tolerance is generally used if a control
system shutdown or loss of visibility causes a major loss of revenue, loss of equipment, disruption to
public services and/or safety. Redundancy in these situations means the duplication, or even triplication,
of equipment that is needed to operate without disruption, if and when the primary equipment fails
during the mission. In these types of environments the cost of failure is so high that a redundant system
approach is crucial.
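The benefit of duplication or triplication can be estimated with a standard reliability formula: if each of n units is independently available with probability A, and the system keeps working as long as at least one unit works, the combined availability is 1 - (1 - A)^n. The sketch below assumes independent failures and a perfect switchover, which real deployments only approximate. The 99% figure is hypothetical.

```python
def parallel_availability(unit_availability: float, units: int) -> float:
    """Availability of `units` independent units where any single one suffices."""
    if not 0.0 <= unit_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if units < 1:
        raise ValueError("need at least one unit")
    # The system is down only when every unit is down at the same time.
    return 1.0 - (1.0 - unit_availability) ** units


# Hypothetical unit availability of 99%, with one, two and three units.
for n in (1, 2, 3):
    print(n, f"{parallel_availability(0.99, n):.6f}")
```

Under these assumptions a single 99% unit yields 99% availability, a duplicated pair 99.99%, and a triplicated set 99.9999%.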
By using a fiber-based solution that supports redundant system design, users enjoy highly reliable data
transmission, reduced costs of deployment and a guaranteed upgrade strategy as requirements evolve.
This white paper will touch upon several redundant and fault-tolerant features and architectures
for fiber-based infrastructures, but will focus primarily on Dual Modular Redundancy, otherwise known as
Parallel Redundancy, which is the approach taken by Thinklogical systems. This paper will also highlight
features within the Thinklogical product lines that can help achieve higher availability.
Redundancy on a Component Level
The most important place to start to guarantee reliable operation is to provide redundant,
hot-swappable components. It is also critical that modules or components should be capable of being
removed, replaced or added to the system without interruption. Replacements should not need rewiring
or reprogramming. In addition, many innovations have been created, such as state-based control and
self-learning diagnostic routines, which have raised the ability of the controller to detect, annunciate and
describe problems within the components. For many users, the ability to maintain and revise the system
without shutting down offers an acceptable level of availability, especially if the change or repair can be
completed in minutes.
Critical system components:
- Uninterruptible power supply (UPS)
- Redundant power supplies
- Redundant components
  - Chassis
  - Processors
  - I/O modules
  - Sensors and actuators
  - PCs/HMI
  - Networks
  - Media
  - Servers
  - Databases
Thinklogical’s System Contingencies
Power supply redundancy is a very popular means to increase system reliability. A single power supply
failure could have a catastrophic effect that equates to a tremendous amount of lost revenue. This need
for system integrity and guaranteed performance in these demanding conditions necessitates power
redundancy. Therefore, all of Thinklogical’s routing and modular extension products are equipped with
redundant, hot-swappable power supplies.
Thinklogical’s VX and HDX lines of routers are designed with hot-swappable critical system components,
such as cooling fans and pluggable optics (SFP+), thus minimizing business impact in the unlikely event
that a component should fail. The hot-swappable I/O boards also provide excellent in-service expansion
capabilities, allowing the router to be reconfigured without powering it down and interrupting signal
processing. In addition, the HDX Router line is equipped with dual controller cards with the ability
to switch between cards in the event of a failure.
Models of Redundancy
There are a number of common redundancy models used in the industry, such as Standby Redundancy
and Dual Modular Redundancy, or Parallel Redundancy.
Standby Redundancy
Standby Redundancy refers to a configuration where there is an identical secondary unit to back up the
primary unit. Under standby redundancy, the secondary units do not share any of the load and start
operating only when active components fail. In addition, a third party may be needed to monitor the
system and give the command when a switchover condition is met.
In standby redundancy, the secondary unit is kept in one of three states: Cold, Warm or Hot Standby.
Typically in Cold Standby the secondary unit is powered off in order to preserve the life of the unit. The
disadvantage of this model is that there is a significant time delay in getting the replacement system up
and running. While the hardware and software are available, the unit needs to be powered up before it
can be brought online into a known state.
Warm Standby has a faster response time because the backup (redundant) system is always running and
regularly synchronized with the Device Under Control (DUC). When a failure occurs on the primary
system, the downstream client can disconnect from the failed system and connect to the backup system.
This allows the system to recover fairly quickly (usually within seconds) and continue to work. Although
some data will be lost during this disconnect/reconnect cycle, warm standby can be an acceptable
solution where some data loss can be tolerated.
In these types of redundant models the switching is not seamless and adds to the probability of failure
within a given system. To offset this increased probability, additional hardware (a third-party voter) can
be added to the redundancy configuration to assist in switching from the primary to the secondary
source. While these system components add to the reliability, they are normally connected in series,
which creates a hybrid parallel-series connection and introduces another point of failure for the system.
In addition, the system cost typically doubles with the additional hardware.
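The cost of a series-connected voter can be estimated with standard series/parallel availability arithmetic: components in series multiply their availabilities, so the voter caps the availability of everything behind it. The sketch below assumes independent failures, and every figure in it is hypothetical.

```python
def parallel(*availabilities: float) -> float:
    """Any one component suffices: 1 minus the product of the unavailabilities."""
    all_down = 1.0
    for a in availabilities:
        all_down *= (1.0 - a)
    return 1.0 - all_down


def series(*availabilities: float) -> float:
    """Every component is required: the product of the availabilities."""
    all_up = 1.0
    for a in availabilities:
        all_up *= a
    return all_up


unit = 0.99    # hypothetical availability of each redundant unit
voter = 0.999  # hypothetical availability of the third-party voter

pair = parallel(unit, unit)        # primary and secondary in parallel
with_voter = series(voter, pair)   # voter placed in series with the pair

print(f"redundant pair alone:      {pair:.6f}")
print(f"pair with voter in series: {with_voter:.6f}")
```

With these numbers the pair alone reaches 0.9999, but the combined system drops to roughly 0.9989, slightly below the voter's own availability. However reliable the redundant pair is, the hybrid parallel-series system can never be more available than the voter itself.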
Hot standby means that both the primary and secondary data systems run simultaneously and both are
providing identical data streams to the downstream client. If the primary system fails, the switchover to
the secondary system is intended to be completely seamless, or “bumpless,” with no data loss. Hot
Standby is the best choice for systems that cannot tolerate the data loss of a Cold or Warm Standby
system. There are some variations of the Hot Standby model, such as Dual Modular Redundancy or
Parallel Redundancy. The differentiating factor between these models is how tightly the primary and
secondary units are synchronized.
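The three standby states described above can be summarized as a small lookup table. The sketch below encodes only the traits stated in this paper (whether the secondary is running, whether it delivers the same data stream, and whether data loss is expected on switchover). The class, field and function names are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class StandbyMode(Enum):
    COLD = "cold"
    WARM = "warm"
    HOT = "hot"


@dataclass(frozen=True)
class StandbyTraits:
    secondary_running: bool   # is the secondary powered and running before a failure?
    carries_live_data: bool   # does the secondary deliver the same stream as the primary?
    data_loss_expected: bool  # is data loss expected during switchover?


# Traits as described in the text; field names are illustrative assumptions.
TRAITS = {
    StandbyMode.COLD: StandbyTraits(False, False, True),
    StandbyMode.WARM: StandbyTraits(True, False, True),
    StandbyMode.HOT: StandbyTraits(True, True, False),
}


def modes_for(data_loss_tolerated: bool) -> List[StandbyMode]:
    """Return the standby modes compatible with the data-loss requirement."""
    return [
        mode
        for mode, traits in TRAITS.items()
        if data_loss_tolerated or not traits.data_loss_expected
    ]


print([m.value for m in modes_for(data_loss_tolerated=False)])  # ['hot']
print([m.value for m in modes_for(data_loss_tolerated=True)])   # ['cold', 'warm', 'hot']
```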
Dual Modular Redundancy (DMR) or Parallel Redundancy
The approach of having multiple units running completely synchronized and in parallel is known as
DMR, or Parallel Redundancy. This model typically has rapid switchover time.
There are three basic tenets of dual system redundancy:
1. Physical separation of signal paths
2. Dual-chassis redundant signal controllers
3. Synchronization of status information
A DMR routing and extension system is configured with two tightly synchronized routers, a primary and a
secondary, running in parallel. These routers mirror one another, with identical signals being
sent through both of them at the same time. These signals are sent to their destination at a receiver
component. Deciding which unit is correct can be challenging when there is more than one router.
Having to choose which unit to “trust the most” defeats the purpose, because it arbitrarily gives
one router priority without dynamic review of operating parameters. Monitoring the system and
determining when to switch to the secondary unit can also be complicated.
The Thinklogical Advantage
Thinklogical has designed a cost-effective, resilient solution to take the complexity out of the DMR approach.
The feature is designed into the SDI Xtreme 3G+ Receivers, and is known as a “switchover capability.” This allows
the component to receive identical streams on both input fibers. By default it will attempt to synchronize to the
‘primary’ fiber by searching for the synchronization characters in the received stream. Simultaneously, it will also
check the ‘secondary’ fiber and attempt to synchronize to its stream. After a pre-determined amount of time,
whichever stream the receiver locks on to will be selected and the SDI data will then be decoded from that
stream. In the event that the selected stream loses synchronization, the receiver will automatically switch to the
other stream. There will be minimal loss of SDI video during this switchover. In order to prevent switching back
and forth when a signal is intermittent, the receiver will continue to use the ‘switched-over’ stream regardless
of whether or not it re-acquires lock to the original stream. If an event occurs such that the switched-over
stream loses lock, then the receiver will attempt to switch back over to the original stream.
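The selection behavior described above can be sketched as a small state machine. This is an illustrative model only, not Thinklogical firmware: the class and method names are hypothetical, the pre-determined settling time is omitted, and two details are assumptions not stated in the text (the primary wins when both streams are locked at initial selection, and the receiver stays on its current stream when neither stream is locked).

```python
from enum import Enum
from typing import Optional


class Fiber(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"

    def other(self) -> "Fiber":
        return Fiber.SECONDARY if self is Fiber.PRIMARY else Fiber.PRIMARY


class SwitchoverReceiver:
    """Models only the stream-selection rules, not SDI decoding (hypothetical)."""

    def __init__(self) -> None:
        self.selected: Optional[Fiber] = None  # nothing selected until a stream locks

    def update(self, primary_locked: bool, secondary_locked: bool) -> Optional[Fiber]:
        """Feed the current lock status of both fibers; return the selected fiber."""
        locked = {Fiber.PRIMARY: primary_locked, Fiber.SECONDARY: secondary_locked}
        if self.selected is None:
            # Initial selection. Assumption: primary is preferred if both are locked.
            for fiber in (Fiber.PRIMARY, Fiber.SECONDARY):
                if locked[fiber]:
                    self.selected = fiber
                    break
        elif not locked[self.selected] and locked[self.selected.other()]:
            # The selected stream lost synchronization: switch to the other stream.
            self.selected = self.selected.other()
        # Otherwise stay put, even if the previously failed stream has re-locked.
        return self.selected


rx = SwitchoverReceiver()
events = [
    (True, True),    # both locked: primary is selected
    (False, True),   # primary loses lock: switch to secondary
    (True, True),    # primary re-locks: stay on secondary (no switching back)
    (True, False),   # secondary loses lock: switch back to primary
]
for primary_locked, secondary_locked in events:
    selected = rx.update(primary_locked, secondary_locked)
    print(selected.value if selected else "none")
```

The third event is the important one: because the receiver only moves when its current stream loses lock, an intermittent original stream cannot cause repeated switching.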