Systems Design Experiences
                   or
 Just Some War Stories…

           Abhay Ghaisas
     Product Architect, BMC Software




                                       1
Iridium – Background
   Satellite telephony project
     • 66 polar LEO satellites in six orbital planes
     • Provides telephony and messaging
     • Satellites have straight and cross links for
       communication
     • System control segment on ground with
       multiple gateways
     • Software development began in 1995



                                                       2
System Challenges
 Huge software system
 Interacting components
 Frameworks in nascent stage: CORBA
 Immature development processes: no UML
   • Experiment with different OOAD methods
   • Invent some of your own
 New language: C++


                                              3
Messaging Subsystem
   Function
     • Get messages (pages) with subscriber id
     • Locate the subscriber from HLR / VLR
     • Choose satellites and schedule of delivery
        • Multiple deliveries from different angles
     • Create transaction record
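
The steps above can be read as a small pipeline. The sketch below is illustrative only and is written in Python for brevity rather than the project's C++. All names (`locate`, `schedule_page`, the `coverage` table) are hypothetical, the VLR-then-HLR lookup order is a simplifying assumption, and "different angles" is approximated by picking different satellites at spaced-out times.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Delivery:
    satellite: str   # hypothetical satellite identifier
    send_at: float   # seconds from now


@dataclass
class Transaction:
    txn_id: int
    subscriber_id: str
    deliveries: list


_txn_ids = count(1)  # simple in-memory transaction id source


def locate(subscriber_id, vlr, hlr):
    """Return the subscriber's region, trying the VLR first, then the HLR.

    The lookup order is an assumption made for this sketch.
    """
    region = vlr.get(subscriber_id)
    if region is None:
        region = hlr.get(subscriber_id)
    return region


def schedule_page(subscriber_id, vlr, hlr, coverage, attempts=2, spacing=30.0):
    """Locate the subscriber, pick satellites, and build a transaction record."""
    region = locate(subscriber_id, vlr, hlr)
    if region is None:
        raise LookupError(f"unknown subscriber: {subscriber_id}")
    # One delivery per satellite, each from a different satellite (angle).
    satellites = coverage.get(region, [])[:attempts]
    deliveries = [Delivery(sat, i * spacing) for i, sat in enumerate(satellites)]
    return Transaction(next(_txn_ids), subscriber_id, deliveries)
```

Calling `schedule_page("s1", vlr, hlr, coverage)` returns a `Transaction` whose `deliveries` list holds up to `attempts` entries, one per chosen satellite.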




                                                      4
Messaging Subsystem
   Expectations
     • High throughput
        • Live system to support all messaging
     • High availability
        • Downtime has direct business impact
     • Fault tolerance
        • No single point of failure



                                                 5
Challenges
 No standard way of horizontal scaling
 No off-the-shelf distributed architecture
  • No application servers
 No distributed or clustered databases

 No off-the-shelf components really…




                                              6
Solution
   Build it yourself!




                               7
HA and FT Architecture
 Live and hot stand-by systems
   • Identical H/W and S/W
 Connected over two LANs
   • To avoid single point of failure




                                        8
HA and FT Architecture
   Hand-written daemons to maintain health
    • Exchange heartbeats on both LANs
    • Watch that all processes are alive
    • Declare switch-over in case of failure and
      initiate power recycle
    • Take over from other system in case of a
      switch-over
    • Handshake on start-up to elect active and
      stand-by
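
A minimal sketch of the heartbeat idea, in Python for brevity. It assumes that the peer is declared failed only when heartbeats stop on both LANs, and that the start-up handshake is settled by comparing node ids. Neither rule is stated in the slides, and `PeerMonitor` and `elect_active` are hypothetical names. Process watching and power recycling are not modeled.

```python
class PeerMonitor:
    """Tracks when the peer's heartbeat was last seen on each LAN."""

    def __init__(self, timeout, lans=("lan_a", "lan_b"), now=0.0):
        self.timeout = timeout
        self.last_seen = {lan: now for lan in lans}

    def heartbeat(self, lan, now):
        """Record a heartbeat received on the given LAN."""
        self.last_seen[lan] = now

    def silent_lans(self, now):
        """LANs on which no heartbeat arrived within the timeout."""
        return [lan for lan, seen in self.last_seen.items()
                if now - seen > self.timeout]

    def peer_failed(self, now):
        # One silent LAN points to a network fault, not a dead peer,
        # so a switch-over is declared only when every LAN is silent.
        return len(self.silent_lans(now)) == len(self.last_seen)


def elect_active(my_id, peer_id, peer_reachable):
    """Start-up handshake (assumed rule): the lower id becomes active.

    A node that cannot reach its peer at all goes active on its own.
    """
    if not peer_reachable:
        return "active"
    return "active" if my_id < peer_id else "standby"
```

Using two LANs this way means a single cable or switch failure shows up as one silent LAN and does not trigger a switch-over.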

                                                 9
HA and FT Architecture
    Cannot lose in-flight data
     • Relay each incoming message to stand-by
       system [1]
     • Stand-by to hold on to the data till active
       finishes transaction
     • Allows for quick take over by stand-by system in
       case of failure
    Ensure DB replication
     • No feature in DB itself
     • Active to relay DB changes to stand-by
     • Replay DB changes on stand-by through code

[1] Log updates
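
Both ideas on this slide, holding relayed messages and replaying DB changes, can be sketched from the stand-by's point of view. This is an illustration in Python, not the original design: the `Standby` class, its method names, and the use of sequence numbers to make replay safe against duplicates are all assumptions.

```python
class Standby:
    """Stand-by side state: held in-flight messages plus a replayed DB copy."""

    def __init__(self):
        self.in_flight = {}    # msg_id -> message, held until active is done
        self.db = {}           # local copy rebuilt from relayed changes
        self.applied_seq = 0   # highest change sequence number applied

    def on_relayed_message(self, msg_id, message):
        """Active relayed an incoming message: hold on to it."""
        self.in_flight[msg_id] = message

    def on_transaction_done(self, msg_id):
        """Active finished the transaction: the held copy can be dropped."""
        self.in_flight.pop(msg_id, None)

    def on_db_change(self, seq, key, value):
        """Replay one relayed DB change; duplicates are ignored."""
        if seq <= self.applied_seq:
            return False
        self.db[key] = value
        self.applied_seq = seq
        return True

    def take_over(self):
        """On switch-over, return the messages the active never finished."""
        return list(self.in_flight.items())
```

On take-over the stand-by already has every unfinished message and an up-to-date copy of the data, which is what makes the switch quick.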

                                                     10
Other Fun Challenges
 Regularly test the limits of the C++ compiler
   • C++ far off from standardization
 Test the limits of the source code control system




                                          11
Mobile Browser
 For early mobile phones – c. 2000
 No standard operating system
   • No standard memory management
   • No processes / scheduler
   • Memory mapped I/O
   • No file system
 Limited resources
   • Low memory
   • Poor horsepower
   • Limited real estate

                                      12
Re-invent
   How to parse HTML?
     • Cannot use standard parsers – none
       available
     • Cannot write one with Lex and Yacc – too
       heavy
     • Hand-write the parser – first principles
   What about data structures?
     • Hand-write all the data structures
     • Use statically allocated memory – manage
       it yourself
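
One common way to manage statically allocated memory yourself is a fixed pool of slots with a free list. The sketch below shows that technique in Python for illustration (the real code would have been C or C++ over a static array). `StaticPool` and its interface are hypothetical and not taken from the slides.

```python
class StaticPool:
    """A fixed number of slots allocated once; a free list hands them out."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        # next_free[i] is the index of the next free slot after i, or -1.
        self.next_free = [i + 1 for i in range(capacity)]
        if capacity:
            self.next_free[-1] = -1
        self.free_head = 0 if capacity else -1

    def alloc(self, value):
        """Store value in a free slot and return its index, or -1 if full."""
        if self.free_head == -1:
            return -1  # pool exhausted: there is no heap to fall back on
        idx = self.free_head
        self.free_head = self.next_free[idx]
        self.slots[idx] = value
        return idx

    def free(self, idx):
        """Return a slot to the front of the free list."""
        self.slots[idx] = None
        self.next_free[idx] = self.free_head
        self.free_head = idx
```

Allocation and release are both constant time, and running out of slots is an ordinary return value that the caller has to handle.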

                                                   13
Re-invent
    Custom-made display framework
     • Memory-mapped display
     • Interfaces that let you draw to the glass
     • Hand-written layered XML display
       framework [1]
     • Messaging to handle dynamic parts of the
       display
         • Animation, blink, and marquee!

[1] Use brute force
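
A rough sketch of what "brute force" could mean for a layered display: on every change, clear the whole buffer and repaint every visible layer in order. This is an interpretation, not the original framework. The character-cell display, the `Layer` class, and the message format are hypothetical, and the XML layer descriptions are not modeled.

```python
WIDTH, HEIGHT = 8, 2  # tiny hypothetical character display


class Layer:
    """One line of text placed at a column and row."""

    def __init__(self, x, y, text, visible=True):
        self.x, self.y, self.text, self.visible = x, y, text, visible


def redraw(framebuffer, layers):
    """Brute force: clear everything, then paint each visible layer in order."""
    for i in range(len(framebuffer)):
        framebuffer[i] = ord(" ")
    for layer in layers:
        if not layer.visible:
            continue
        for offset, ch in enumerate(layer.text):
            col = layer.x + offset
            if 0 <= col < WIDTH and 0 <= layer.y < HEIGHT:
                framebuffer[layer.y * WIDTH + col] = ord(ch)


def handle_message(layers, message):
    """Apply a (kind, layer_index) message to the dynamic part of the display."""
    kind, index = message
    layer = layers[index]
    if kind == "blink":
        layer.visible = not layer.visible  # toggle on each tick
    elif kind == "marquee":
        layer.x -= 1                       # scroll one column to the left


def rows(framebuffer):
    """Decode the buffer into one string per display row."""
    return [framebuffer[r * WIDTH:(r + 1) * WIDTH].decode("ascii")
            for r in range(HEIGHT)]
```

Redrawing everything wastes cycles but needs no dirty-region bookkeeping, which keeps the code small.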

                                                   14
Small World
    Cannot assume a lot of resources
     • Use static limits [1]
     • Parse only what you can [2]
     • Display only what gets parsed

[1] Split resources
[2] Shed load
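
A minimal sketch of a hand-written scanner that respects static limits and sheds load once they are reached. It is written in Python for illustration. The limits `MAX_TOKENS` and `MAX_TEXT` are arbitrary, the function names are hypothetical, and the scanner ignores attributes, comments, and entities.

```python
MAX_TOKENS = 4   # static limit on tokens kept (arbitrary for this sketch)
MAX_TEXT = 16    # static limit on characters kept per text run


def tokenize(html, max_tokens=MAX_TOKENS):
    """Scan tags and text by hand; return (tokens, truncated)."""
    tokens = []
    i, n = 0, len(html)
    while i < n:
        if len(tokens) == max_tokens:
            return tokens, True   # limit reached with input left: shed the rest
        if html[i] == "<":
            end = html.find(">", i)
            if end == -1:
                break             # unterminated tag: stop parsing here
            tokens.append(("tag", html[i + 1:end].strip().lower()))
            i = end + 1
        else:
            end = html.find("<", i)
            if end == -1:
                end = n
            text = html[i:end].strip()
            if text:
                tokens.append(("text", text[:MAX_TEXT]))
            i = end
    return tokens, False


def visible_text(tokens):
    """Display only what got parsed: join the text tokens."""
    return " ".join(value for kind, value in tokens if kind == "text")
```

With the default limit, `tokenize("<p>Hi <b>there</b></p>")` keeps the first four tokens and reports truncation, and `visible_text` shows only what was kept.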

                                        15
Verification
 No devices available for early verification
 Components still to be manufactured!
 Some kits, some emulation
 Tap the display memory for automation
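
Tapping the display memory suggests tests that compare what is on the "glass" against a known-good frame. The sketch below shows one way that could look, in Python for illustration. The helper names and the use of a CRC as a quick equality check are assumptions, not details from the slides.

```python
import zlib


def snapshot(display_memory):
    """Copy the current contents of the (emulated) display memory."""
    return bytes(display_memory)


def checksum(frame):
    """Cheap fingerprint of a frame, handy for storing golden values."""
    return zlib.crc32(frame)


def first_difference(expected, actual):
    """Index of the first differing byte, or -1 if the frames match."""
    for i, (e, a) in enumerate(zip(expected, actual)):
        if e != a:
            return i
    if len(expected) != len(actual):
        return min(len(expected), len(actual))
    return -1
```

A test run would take a snapshot after each scripted input and compare it with the golden frame, using `first_difference` to report where the output diverged.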




                                      16

Editor's Notes

  • #10 Make actions restartable
  • #11 Log updates
  • #15 Use brute force
  • #16 Use static limits: Split resources. Parse only what you can: Shed load