Class 2

  1. CES 592: Telecommunications System Product Verification
     Sonoma State University, Fall 2004
     Class Lecture 2: High-Availability Architectures, Testing Constraints, Network Failures, and Test Strategies
  2. Telecom System Architecture
     • History: some telecom network failures
     • High availability defined
     • HW / SW architectures
     • Unique problems, constraints, and goals
         ◦ ISO 9001 & TL 9000
         ◦ Other constraints
     • Testing for high availability
  3. Communications Network Failures
     • October 27, 1980: ARPAnet collapse. A self-propagating error required all systems to be restarted; 4-hour outage.
     • January 15, 1990: AT&T nationwide outage. An error in C code caused a self-propagating 9-hour outage; 5 million blocked calls.
     • June 27, 1991: outage affecting 8 million lines, caused by a self-propagating error in an untested code patch.
  4. Communications Network Failures
     • Famous fiber cuts ("backhoe fade")
         ◦ 11/19/1990: 150,000 phone lines; the outage lasted several hours
         ◦ 12/4/1991: 100,000 phone lines for several hours; interrupted FAA flight control and the NY Mercantile Exchange
     • From Computer-Related Risks, by Peter G. Neumann
  6. High Availability Defined
     • 99% uptime = about 3.65 days downtime / year
     • 99.9% uptime = about 8.76 hours downtime / year
     • 99.95% = about 4 hours, 23 minutes / year
     • 99.99% = about 53 minutes / year
     • 99.999% = about 5.26 minutes / year ("five nines")
     • 99.9999% = about 31.5 seconds / year
     • Reference: Telcordia GR-1110, TR-332
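The downtime figures follow from multiplying the unavailability (1 minus the availability) by the length of a year. A minimal Python sketch, assuming a 365-day year; the function name is made up for illustration:

```python
# Yearly downtime implied by an availability percentage.
# Assumption: a 365-day year (8,760 hours); leap years are ignored.
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the allowed downtime per year, in minutes."""
    unavailability = 1.0 - availability_percent / 100.0
    return unavailability * MINUTES_PER_YEAR


for a in (99.0, 99.9, 99.95, 99.99, 99.999, 99.9999):
    minutes = downtime_minutes_per_year(a)
    print(f"{a:>8}%: {minutes:10.2f} min/year = {minutes / 60:7.2f} h/year")
```

With this convention 99.999% comes to about 5.26 minutes a year and 99.9999% to about 31.5 seconds.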
  7. High Availability Defined
     • Expectation of reliable communications even during major disasters, when communications are needed most
     • Service Level Agreements (SLAs)
     • Automatic and instantaneous recovery from internal and external faults
     • Very high Mean Time Between Failures (MTBF): 100,000+ hours
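An MTBF figure becomes an availability figure once a repair time is assumed, via the standard steady-state relation A = MTBF / (MTBF + MTTR). The sketch below also estimates a 1+1 redundant pair under the optimistic assumptions of independent failures and perfect, instantaneous switchover. The 4-hour MTTR is a hypothetical value, not taken from the slides:

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of one repairable unit: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def redundant_pair_availability(unit_availability: float) -> float:
    """1+1 pair: service is lost only while both units are down.

    Assumes independent failures and perfect, instantaneous switchover, so it
    tends to overstate real-world availability; common-mode faults (e.g. the
    same bad software load on both cards) are not modeled.
    """
    return 1.0 - (1.0 - unit_availability) ** 2


card = availability(mtbf_hours=100_000, mttr_hours=4)  # MTTR: hypothetical 4 h
pair = redundant_pair_availability(card)
print(f"single card:    {card:.6%}")
print(f"redundant pair: {pair:.10%}")
```

The self-propagating software failures listed earlier are exactly the kind of common-mode failure this simple model ignores.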
  9. High Availability Telecom System HW & SW Architectures
     • Designed for high availability
         ◦ Single-fault tolerant
         ◦ Low probability of double fault
     • Passive backplane (higher reliability: MTBF 1,000,000+ hours)
     • Modular design: a fault in one card won't impact other cards
     • Separation of control plane and data plane
     • Card redundancy: "hot redundancy" (standby card kept in sync)
     • Online HW replacement: "hot-swappable"
     • In-service, errorless SW, FW, and FPGA upgrades/downgrades
     • Alarm logs, audit logs, and provisioned settings preserved
     • Microprocessor watchdog timers / heartbeat
     • User errors minimized (a major source of outages)
     • Graceful shutdown / restoration
     • "Hardened" hardware for operation at extreme power, temperature, humidity, corrosion, ESD, and vibration levels
     • Network designed for link and node protection
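The heartbeat and card-redundancy bullets work together: the standby side watches for a periodic heartbeat from the active card and takes over when it stops. A minimal Python sketch of only that detection-and-switchover decision; the class name, card labels, and timeout are hypothetical, time is passed in explicitly so the logic is testable, and state synchronization between the cards is not modeled:

```python
class HeartbeatMonitor:
    """Illustrative heartbeat supervision for a hypothetical 1+1 card pair.

    The active card must call heartbeat() at least once every `timeout_s`
    seconds. If check() finds the deadline missed, the standby card is
    promoted to active and the timer restarts.
    """

    def __init__(self, timeout_s: float, now: float = 0.0):
        self.timeout_s = timeout_s
        self.last_beat = now
        self.active, self.standby = "card A", "card B"
        self.switchovers = 0

    def heartbeat(self, now: float) -> None:
        """Record a heartbeat from the active card."""
        self.last_beat = now

    def check(self, now: float) -> str:
        """Switch over if the heartbeat is overdue; return the active card."""
        if now - self.last_beat > self.timeout_s:
            self.active, self.standby = self.standby, self.active
            self.switchovers += 1
            self.last_beat = now  # give the newly active card a full timeout
        return self.active


mon = HeartbeatMonitor(timeout_s=0.5)
mon.heartbeat(now=0.3)
print(mon.check(now=0.7))  # card A: 0.4 s since the last heartbeat
print(mon.check(now=0.9))  # card B: 0.6 s since the last heartbeat, switchover
```

A hardware watchdog timer works on the same principle but typically resets the CPU instead of switching cards.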
  11. Unique Problems, Constraints, Goals: ISO 9001 & TL 9000
      • ISO 9001 covers quality assurance in design/development, production, installation, and servicing
      • Certification process:
          ◦ Document the quality processes of your organization
          ◦ Audit by a registrar
          ◦ Certification and follow-up inspections
      • "With ISO 9000 you can still have terrible processes and products. You can certify a manufacturer that makes life jackets from concrete, as long as those jackets are made according to the documented procedures." (Richard Buetow, Director of Corporate Quality, Motorola)
  12. Unique Problems, Constraints, Goals: ISO 9001 & TL 9000
  13. Unique Problems, Constraints, Goals: ISO 9001 & TL 9000
      • TL 9000 is a quality management system for the design, development, manufacturing, delivery, installation, and maintenance of telecommunications hardware and software.
          ◦ "The organization shall establish and maintain a method to trace documented requirements through design and test."
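The traceability requirement can be checked mechanically once each test case records which requirements it verifies. A minimal Python sketch of such a check; the requirement and test-case IDs are hypothetical:

```python
from typing import Dict, List, Set


def untraced_requirements(requirements: List[str],
                          test_map: Dict[str, Set[str]]) -> List[str]:
    """Requirement IDs that no test case claims to verify.

    `test_map` maps each test case ID to the set of requirement IDs it covers.
    """
    covered = set().union(*test_map.values())
    return [req for req in requirements if req not in covered]


def orphan_tests(requirements: List[str],
                 test_map: Dict[str, Set[str]]) -> List[str]:
    """Test case IDs that trace to no known requirement."""
    known = set(requirements)
    return sorted(tc for tc, reqs in test_map.items() if not reqs & known)


reqs = ["R-101", "R-102", "R-103"]
tests = {"TC-1": {"R-101"}, "TC-2": {"R-101", "R-103"}, "TC-3": {"R-999"}}
print(untraced_requirements(reqs, tests))  # ['R-102']
print(orphan_tests(reqs, tests))           # ['TC-3']
```

Both directions matter: an untraced requirement is a coverage gap, and an orphan test usually points to a stale or misnumbered requirement.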
  14. Unique Problems, Constraints, Goals: Other Constraints
      • Cannot test on a live operational network, so the test configuration must be representative of it
      • Must comply with numerous national and international standards from multiple standards bodies
  16. The Product Development Cycle
      [Diagram: New Product Idea → System Spec → Software Spec → Software Development → SW Unit Test, and in parallel Hardware Spec → Hardware Development → HW Unit Test; both paths meet at HW-SW Integration → Product Verification → Release to manufacture. Inputs: customer- and market-driven inputs; product line management & engineering inputs. Legend: Engineering Development functions vs. Product Verification functions.]
  17. Product Verification Phase
      [Diagram: HW-SW Integration Test, followed by the Formal Product Verification Phase (Software Verification, HW Stress Testing, HW Standards/Reqts testing, HW Compliance & Agency approvals), then Release to Production and Volume Production. Software Verification appears twice in the diagram.]
  18. Testing for High Availability
      • "Networks are very complex systems and the only way to test them is to partition them into manageable layers and functions. Doing this is truly an art." (Robert Buchanan, Jr.)
  19. Testing for High Availability
      • Systematic, structured testing
      • ANSI / Telcordia / ITU-T standards-based testing
      • Environmental testing for hardware
      • Stress / load testing for software
      • HW & SW fault insertion testing
      • Interoperability testing
      • Soak testing: continuous operation
      • Statistical sampling for manufacturing
  20. Testing for High Availability
      • Systematic, structured testing
          ◦ Planned test strategy
          ◦ Thorough, well-thought-out test plan:
              ▪ Test cases traceable back to specifications
              ▪ Trade-off decisions made for permutations not performed
              ▪ Test plan includes positive and negative test cases
              ▪ Each test case pre-defines unambiguous pass/fail criteria
              ▪ Test environment is described in detail
              ▪ Risks are anticipated and managed with contingency plans
          ◦ And yet, controlled randomness in the test cases
          ◦ Prioritization: run the important tests early
          ◦ Focus on areas of greatest risk: system state transitions
          ◦ Learn from the bugs that you find
          ◦ Use automation to increase coverage and reduce schedule
          ◦ Continuously refine and improve the test plan and test cases
      (Portions from High Quality Software Engineering, by Ross Collard)
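The tension between "trade-off decisions for permutations not performed" and "controlled randomness" can be handled by sampling the permutation space with a seeded random generator: the chosen subset varies with the seed, but any failing run can be replayed exactly from its seed. A minimal Python sketch; the test dimensions and their values are hypothetical:

```python
import itertools
import random


def sample_configurations(dimensions, k, seed):
    """Pick k distinct configurations from the full permutation space.

    Seeding a private random.Random makes the sample reproducible: the same
    seed always yields the same configurations, so a failure can be rerun.
    """
    all_configs = list(itertools.product(*dimensions.values()))
    rng = random.Random(seed)
    picks = rng.sample(all_configs, k=min(k, len(all_configs)))
    return [dict(zip(dimensions.keys(), combo)) for combo in picks]


# Hypothetical test dimensions: 3 x 3 x 2 = 18 possible permutations.
dims = {
    "interface": ["OC-3", "OC-12", "OC-48"],
    "protection": ["none", "1+1", "UPSR"],
    "traffic": ["clean", "bit errors"],
}
for cfg in sample_configurations(dims, k=4, seed=2004):
    print(cfg)
```

Logging the seed with each test run is what turns "random" into "controlled randomness".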
  21. Testing for High Availability
      • Major causes of defects NOT being found (percent of all defects that were not found):
          ◦ 36% Scope: the scenario was beyond the test strategy
          ◦ 21% Permutations: an untested combination failed
          ◦ 9% Stochastic: a random failure that did not occur during testing
          ◦ 6% Process: non-compliance in the way the test was performed
          ◦ 5% Oversight: the problem was missed by the tester
          ◦ 3% Coverage: the test scenario was not included in the test strategy
          ◦ 3% Incomplete test: the scenario was in the test plan but not included in the test cases
      • Study by Tellabs, 1998
  23. Testing for High Availability
      • ANSI / Telcordia / ITU-T standards-based testing
          ◦ Verify that interfaces meet the requirements of the standards (GR-253…)
          ◦ Verify that fail-over performance meets the standards (60 ms)
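A common way to measure fail-over time is to send a constant-rate test stream through the protected path, force the failure, and convert the number of lost frames into time. A minimal Python sketch; the frame counts and rate are hypothetical, and the conversion assumes no frames were lost for any reason other than the switchover:

```python
LIMIT_MS = 60.0  # fail-over budget quoted on the slide


def disruption_time_ms(frames_sent: int, frames_received: int,
                       frame_rate_fps: float) -> float:
    """Traffic interruption estimated from frame loss on a constant-rate stream.

    Assumes every lost frame was lost during the protection switch
    (no background errors), so lost_frames / rate = outage duration.
    """
    lost = frames_sent - frames_received
    return 1000.0 * lost / frame_rate_fps


# Hypothetical measurement: 10,000 frames/s test stream, one forced fail-over.
t = disruption_time_ms(frames_sent=1_000_000, frames_received=999_520,
                       frame_rate_fps=10_000)
print(f"{t:.1f} ms -> {'PASS' if t <= LIMIT_MS else 'FAIL'}")  # 48.0 ms -> PASS
```

Repeating the measurement many times, in both switch directions and under load, matters more than any single result.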
  25. Elements of Hardware Verification
      • Hardware Verification
          ◦ Compliance & Agency Approvals: EMC, Safety, NEBS, Telecom
          ◦ Standards-Based Testing: Physical Layer, Logical Layer
          ◦ Stress Testing (HALT/HASS)
              ▪ Design Stress Testing
              ▪ Accelerated Life-Cycle Testing (beyond normal operating limits): where does it break?
  26. Testing for High Availability
      • Environmental testing for hardware
          ◦ Operation over temperature, supply voltage, and vibration
          ◦ Monitor software performance (traffic, alarms) during environmental testing
  28. Testing for High Availability
      • Stress / load testing for software
          ◦ Multiple, simultaneous traffic types
          ◦ Errors on input interfaces (example: bit errors)
          ◦ Maximum user activity: database backup, multiple session launches, multiple data requests
          ◦ Bottleneck / over-subscription of data traffic
          ◦ Fail-over testing: single failure
          ◦ Fail-over testing: double failure
          ◦ Startup under stress, load, and errors
          ◦ Alarm hysteresis, holdoff, and alarm storms
          ◦ Goal: more stress/load than the SW will ever see operationally (find the breaking point)
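Alarm holdoff and hysteresis are what the alarm-storm tests try to break: a defect must persist for a set time before an alarm is raised, and must stay absent for a longer time before the alarm clears, so a flapping input cannot flood the operator. A minimal Python sketch; the class name and the 2.5 s / 10 s timers are illustrative assumptions, so check the applicable standard for real values:

```python
class AlarmFilter:
    """Raise an alarm only after a defect persists `raise_s` seconds; clear it
    only after the defect has been absent `clear_s` seconds (hysteresis)."""

    def __init__(self, raise_s: float = 2.5, clear_s: float = 10.0):
        self.raise_s, self.clear_s = raise_s, clear_s
        self.alarm = False
        self.defect = False
        self.since = None    # time at which the current raw defect state began
        self.events = []     # log of (time, "RAISE" | "CLEAR")

    def update(self, defect: bool, now: float) -> bool:
        """Feed one raw defect sample; return the filtered alarm state."""
        if self.since is None or defect != self.defect:
            self.defect, self.since = defect, now  # state changed: restart timer
        held = now - self.since
        if defect and not self.alarm and held >= self.raise_s:
            self.alarm = True
            self.events.append((now, "RAISE"))
        elif not defect and self.alarm and held >= self.clear_s:
            self.alarm = False
            self.events.append((now, "CLEAR"))
        return self.alarm


# Flapping defect (1 s on, 1 s off) never persists long enough to alarm.
flappy = AlarmFilter()
for t in range(20):
    flappy.update(defect=(t % 2 == 0), now=float(t))
print(flappy.events)  # []

# Steady defect from t=0 to 5 s, then clean: one RAISE, one CLEAR.
steady = AlarmFilter()
for t in range(20):
    steady.update(defect=(t < 6), now=float(t))
print(steady.events)  # [(3.0, 'RAISE'), (16.0, 'CLEAR')]
```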
  30. Testing for High Availability
      • HW & SW fault insertion testing
          ◦ Any hardware subsystem / module failure
              ▪ CPU reset
              ▪ Power supply failure
              ▪ Oscillator failure
              ▪ Data bus / address bus line stuck high/low
              ▪ Memory corruption
          ◦ File corruption
          ◦ Resource exhaustion (memory, file handles, sockets, semaphores…)
          ◦ User error: software should protect against it
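Resource exhaustion is often easiest to inject by giving the code under test an artificially small resource pool and driving it past the limit. A minimal Python sketch; `SessionPool` and `handle_login` are hypothetical stand-ins for the real resource and the code under test:

```python
class SessionPool:
    """Hypothetical bounded resource (sessions, sockets, file handles...)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.in_use = 0

    def acquire(self) -> None:
        if self.in_use >= self.capacity:
            raise RuntimeError("resource exhausted")
        self.in_use += 1

    def release(self) -> None:
        self.in_use -= 1


def handle_login(pool: SessionPool) -> str:
    """Code under test: must refuse cleanly, not crash, when the pool is empty."""
    try:
        pool.acquire()
    except RuntimeError:
        return "BUSY"  # graceful degradation; existing sessions are unaffected
    return "OK"


# Fault insertion: a capacity of 3 forces exhaustion after three logins.
pool = SessionPool(capacity=3)
results = [handle_login(pool) for _ in range(5)]
print(results)  # ['OK', 'OK', 'OK', 'BUSY', 'BUSY']
```

The follow-up check is recovery: once a session is released, the next request should succeed again.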
  32. Testing for High Availability
      • Interoperability testing
          ◦ Testing interfaces that pass data back and forth
          ◦ Compatibility testing with other equipment
          ◦ Compatibility testing with other vendors' equipment
          ◦ Consider both hardware and software versions
          ◦ Compatibility with the current, prior, and next versions of:
              ▪ Operating system (Unix, Solaris, Windows)
              ▪ Java / JRE
          ◦ Configurations to be tested must be prioritized by their importance and risk
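One simple way to rank the interoperability matrix is to score each configuration on importance and risk and test the highest products first. A minimal Python sketch; the 1 to 5 scales, the multiplicative score, and the example configurations are all assumptions for illustration:

```python
from dataclasses import dataclass


@dataclass
class Config:
    name: str
    importance: int  # 1 (rarely deployed) .. 5 (most customers run it)
    risk: int        # 1 (unchanged, well tested) .. 5 (new or changed code path)


def prioritize(configs, budget):
    """Order configurations by importance x risk; keep what the budget allows."""
    ranked = sorted(configs, key=lambda c: c.importance * c.risk, reverse=True)
    return ranked[:budget]


matrix = [
    Config("Current SW + current OS + current JRE", importance=5, risk=3),
    Config("Current SW + prior OS + current JRE", importance=4, risk=2),
    Config("Current SW + next OS + next JRE", importance=2, risk=5),
    Config("Current SW + other vendor's equipment", importance=3, risk=4),
]
for c in prioritize(matrix, budget=3):
    print(c.importance * c.risk, c.name)
```

Whatever falls below the budget line is exactly the "trade-off decision for permutations not performed" that the test plan should record.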
  34. Testing for High Availability
      • Soak testing / longevity testing: continuous operation
          ◦ Telecom equipment is designed for continuous operation for months, years, even decades
          ◦ Run in the lab for 14 hours? 7 days? 3 weeks?
          ◦ Is 1 system for 1 year equivalent to 8 systems for 45 days?
          ◦ Mixture of clean and errored traffic
          ◦ Monitor for traffic interruptions and alarms
          ◦ Monitor performance counters:
              ▪ Error-free seconds counter
              ▪ Bad packet counter
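The "8 systems for 45 days" question is unit-time arithmetic: 8 × 45.6 days is about one system-year. A minimal Python sketch of that arithmetic, plus an error-free-seconds ratio computed from hypothetical per-second bad-packet counts. The equivalence only holds under a constant-failure-rate assumption; defects that need long uptime to appear (slow memory leaks, counter rollover) will not show up on short runs, no matter how many systems are used:

```python
def equivalent_days(systems: int, target_system_days: float) -> float:
    """Calendar days for `systems` units to accumulate the target system-days.

    Assumes a constant failure rate: time-dependent defects such as slow
    leaks or counter rollover break this equivalence.
    """
    return target_system_days / systems


def error_free_seconds_ratio(bad_packets_per_second) -> float:
    """Fraction of monitored seconds with zero bad packets (EFS / total)."""
    total = len(bad_packets_per_second)
    efs = sum(1 for n in bad_packets_per_second if n == 0)
    return efs / total if total else 1.0


print(equivalent_days(8, 365))                 # 45.625
print(error_free_seconds_ratio([0, 0, 3, 0]))  # 0.75
```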
  36. Testing for High Availability
      • Statistical sampling for manufacturing
          ◦ System testing is done with a small number of prototype hardware cards
          ◦ Reliability testing must be done on a large sample of production hardware cards
          ◦ It must run long enough to be statistically significant: thousands of hours of total run time
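"Thousands of hours" can be made concrete with the standard zero-failure demonstration test. Assuming exponentially distributed failures (constant failure rate), demonstrating MTBF ≥ θ at confidence C with no failures requires total unit-hours T = θ · ln(1 / (1 − C)). A minimal Python sketch; the 100,000-hour target echoes the MTBF goal mentioned earlier, and the 50-card sample size is hypothetical:

```python
import math


def zero_failure_test_hours(mtbf_target_h: float, confidence: float) -> float:
    """Total unit-hours with zero failures needed to demonstrate
    MTBF >= mtbf_target_h at the given confidence.

    Assumes exponentially distributed failures (constant failure rate),
    for which T = MTBF * ln(1 / (1 - confidence)).
    """
    return mtbf_target_h * math.log(1.0 / (1.0 - confidence))


CARDS = 50  # hypothetical sample of production cards
for c in (0.60, 0.90):
    hours = zero_failure_test_hours(100_000, c)
    print(f"{c:.0%} confidence: {hours:,.0f} unit-hours "
          f"= {hours / CARDS:,.0f} h on each of {CARDS} cards")
```

At 90% confidence that is roughly 230,000 unit-hours, or about 4,600 hours (over six months) per card on 50 cards. If failures do occur during the test, the required time grows further (the chi-square form of the bound uses 2r + 2 degrees of freedom for r failures).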
  37. References
      • Computer-Related Risks, Peter G. Neumann
      • Code Complete, Steve McConnell
      • Software Testing and Quality Assurance, Ross Collard
      • True Random Numbers: http://www.random.org/nform.html
      • Testing Computer Software, C. Kaner, J. Falk, and H. Nguyen
      • IEEE Standard for Software Test Documentation, IEEE Std 829-1998
      • Black-Box Testing: Techniques for Functional Testing of Software and Systems, Boris Beizer, Wiley, 1995
      • Managing the Testing Process, Rex Black
      • Classic Testing Mistakes, Brian Marick: http://www.testing.com/writings/classic/mistakes.pdf
      • Software QA / Test Resource Center: http://www.softwareqatest.com/index.html
