1. Why are we focusing on zero down time?
Operation sites, automatic control, information systems, operation integration, and platforms
Identifying the conditions for reliability and exploring the possibilities of present and future systems
September 8, 2011
Sapporo Sparkle k.k.
Satoe Kuwahara
Twitter @SatoeKuwahara
Marathon Technologies
Seminar “considering zero down time”
2. Overcoming the trade-off between “reliability and cost reduction”
The conventional image of the trade-off: a “defensive approach”
- Down time causes business loss
- Solving down time requires some cost
The trade-off has now changed: an “aggressive approach”
- Enhancing competitiveness by eliminating down time
- Reducing cost by eliminating down time
Things only zero down time can manage; things zero down time can do
Simple system; operation cost; having impact
Marathon Technologies “Considering zero down time” 1 ©2011 SapporoSparkle kk
3. System features determine which HA can be realized. But…
The negative side of “High Availability” → determining which HA can be achieved
Two perspectives: (1) system features, (2) the balance between cost and performance
[Diagram labels: Fail-over; Expensive non-stop machine; → dealing with possibility of system stop; Solving problem at middleware level; Developing from scratch; System should not be stopped; Commodity/Windows base system; System area that goes over the boundary; Disadvantage; Compatibility with]
But… “systems that go over the boundary of features” and “integrated systems”
4. Why do we need to achieve zero down time?
Which is better, “High Availability” or “Zero Down Time”?
Why do we need to achieve zero down time?
For which system?
1. System failure will cause fatal damage to the system
2. Data loss will cause fatal damage to the system
3. Repair is difficult or takes a long time
4. Data recovery is difficult or takes a long time
5. System failure will have a major impact on other systems, because the systems are closely related to each other and may be combined.
In the past: a problem of processing per system unit and its data
Now: a problem of related devices, machines and data
In the future: a problem of related systems and the overall system
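Point 5 above (a failure in one system affecting closely related systems) can be illustrated with a minimal sketch. The dependency map and the system names below are hypothetical and are not taken from the seminar; the sketch only assumes that "system B depends on system A" can be recorded as a directed link and that a failure spreads along those links.

```python
from collections import deque

# Hypothetical dependency map (an assumption for illustration only):
# each key lists the systems that directly depend on it.
DEPENDENTS = {
    "real_time_database": ["operation_server", "monitoring_system"],
    "operation_server": ["application"],
    "communication_server": ["application"],
    "monitoring_system": [],
    "application": [],
}

def affected_by(failed, dependents):
    """Return every system reachable from the failed one, excluding itself."""
    seen = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for nxt in dependents.get(current, []):
            if nxt != failed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# A failure in one shared component reaches several other systems.
print(sorted(affected_by("real_time_database", DEPENDENTS)))
```

The more closely systems are combined, the larger this reachable set becomes, which is one way to read the shift from "system unit" problems to "overall system" problems.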
5. Necessary for a limited range of systems, but many of them
Systems that require zero down time
A limited target: systems operating in multiple areas.
These are now expanding through “combination (unification and integration)”
- The most advanced system, automatic control
- Related to observation and platform
- Related to information/control integration, BtoB and Real-time-Process
- Connected Device, Cloud/web2.0, Social, Smart-Model
Another problem we are facing now
Accumulated HA… in the middleware and hardware layers
- HA mechanisms with a specific focus interfere with each other, and some processes are unnecessary.
- The overall system becomes complicated → disadvantages, failures, reduced performance, etc.
- Incompatibility between versions. Differences in the period of use.
The possibility of the system and the TCO perspective: “Simplicity”
→ HA should not limit the complexity or the ingenuity of design and usage.
6. Closely associated systems… and real-time processes
A combination of autonomous systems creates one process
Common understanding: each autonomous system is reliable
Common understanding: the provider ensures reliability
Common understanding: good overall balance creates reliability. “Return to start” when anything happens.
★A concept that simplifies the summation, based on multiple systems operating together.
[Diagram labels: Operation system; External data/application; Base/operation server; Communication server; Real-time database; Application; RDWH; Monitoring system; Unit box; Operation server]
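The idea that overall balance determines reliability when several systems form one process can be made concrete with the standard series-availability calculation. This is a minimal sketch under two stated assumptions that do not come from the seminar: the process works only when every component works, and component failures are independent. The availability figures are hypothetical.

```python
import math

HOURS_PER_YEAR = 24 * 365

def series_availability(availabilities):
    """Availability of a chain in which every component must be up
    (assumes independent failures)."""
    return math.prod(availabilities)

def annual_downtime_hours(availability):
    """Expected hours of down time per year for a given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR

# Hypothetical example: five components, each available 99.9% of the time.
parts = [0.999] * 5
combined = series_availability(parts)
print(f"combined availability: {combined:.4f}")
print(f"expected down time: {annual_downtime_hours(combined):.1f} hours/year")
```

Under these assumptions the combined availability is always lower than that of the weakest single component, so each system added to the process raises the expected down time of the whole.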
7. Summary – looking at zero down time from a design perspective
Zero down time system requires:
1. Competitiveness through wise use of zero down time
2. Larger and more advanced systems
3. A common platform, related to control/information integration
4. Simplicity, which strengthens the system
5. Cooperation with cloud and social infrastructure
・It is important to keep “committed Integrity” from both the business side and the system side
・Balance of overall system determines availability
・New “Availability Balance” by Technology
●Let’s think about “down time” from a design perspective
- Grand design requires availability
- Ensuring availability creates possibility
- Simplicity creates value