Lahav Savir - Massively Scaleable Mobile Gateways

519 views
443 views

Published on

0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
519
On SlideShare
0
From Embeds
0
Number of Embeds
52
Actions
Shares
0
Downloads
3
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Lahav Savir - Massively Scaleable Mobile Gateways

  1. 1. RUNCOM MOBILE’S END-TO-END MESSAGING SOLUTION (MMGS) Lahav Savir Back-end Systems Architect Alon Klein VP R&D 26 June 2009 1
  2. 2. What is MMGS? • Instant Messaging (IM) gateway, connecting to public and private communities • Email gateway, providing mobile email service 2
  3. 3. Introduction • IXI Mobile founded in 2000 also doing business under the name of Runcom Mobile • IXI Mobile develops the Ogo family of mobile devices optimized for messaging applications • IXI provides end to end messaging solutions for smart phones 3
  4. 4. A Bit Of History • Passed through several generations of devices • Deployed worldwide with several communities • Used to work with 3rd party messaging gateway providers 4
  5. 5. We want our own solution ! • Control service quality • Control roadmap features • Maximum visibility to the system status • Ability to add unique optimizations for mobile networks • Self hosting • Own the IP • Time to market • Learn from past experience… • Royalty free 5
  6. 6. High Level Requirements • Keep it simple! • (Almost) Zero downtime • Huge capacity and • No single point of failure throughput (100K’s users) • Error recovery • Scalable, Extendable • Transparent failovers • Low spec machines • Distributed architecture (<3K$) • Security • Minimum data replication • Log file writing (no DB) among clusters for best performance • No Relational Database 6
  7. 7. Technology and market survey • Reviewed several vendor offerings • Reviewed several technologies – ‘Standard’ C/C++ – Java based – Erlang 7
  8. 8. And the winner is… Erlang • Strengths of Erlang that helped with this decision – Native concurrency support – Native distributed architecture for scalability and redundancy – Reference products – Quick POC – Short time to market 8
  9. 9. Project steps and progress • Proof of concept – Functional service after 2 months (!) • System design, based on POC learnt knowledge • 7 phases of major deliveries – Build Test Deploy – Each phase ended with a functional tested service • Production within 10 months from kick off – Worlds first MSP3.0 certified GW • Load testing, Load testing, Load testing … – And still, what you don’t test will fail on production ☺ 9
  10. 10. Architecture Approach – POC Conclusions • Focus on GW features • Clustering • 3 Tiers – No single point of failure – Front-end, Router, – Redundant components Transport – In-memory replicated DB – High-Speed RPC System (minimal shared data) (eTunnel) – Internal health system • Security – Advanced failover – No access to core • Load Balancing switching & session data – Hash based distribution – Limited access to – Consistent paths transport nodes – Least connections – One way firewalls – URL load balancing
  11. 11. Outlined topology Front-ends Routers Transports Load Load Firewalls Firewalls Balancers Balancers eTunnel eTunnel 11
  12. 12. System Architecture
  13. 13. Performance Benchmark Measures Setup • Basic cluster built of 11 servers – Dual core, 2Gb RAM machines Instant Messaging performance • 150,000 Connected users • 12,600 Messages per second • 45,000,000 Messages an hour System resources • 50%-60% system utilization
  14. 14. What Made the Difference? • High speed socket based RPC – eTunnel • Resource pools (eBalancer) • Minimal data sharing • Hash based distribution between layers makes consistent path through layers • Integration with 3rd party load balancer (URL SLB, health) • Reduce destructive DB operations, especially on replicated tables • Forced GC • Network interface pools • No front end firewalls • Internal health system integrated with topology manager • Extensive SNMP MIBs • Throttling – Control concurrency
  15. 15. Satellite Systems • Authentication – Authentication, Authorization & Control over the service delivered • Reporting – Nearly real-time usage reports aggregated to an hour cycle • Operation and Maintanance – Stop / Start, Watch real-time alarms, configuration – Traceability and troubleshooting – Software update - hot patch distribution and loading – Central log console – Central graphing system • Real Time Monitoring – Infrastructure, OS and App
  16. 16. Authentication UI 16
  17. 17. Reporting system UI 17
  18. 18. Operation & Management UI 18
  19. 19. Graph system UI 19
  20. 20. Monitor view 20
  21. 21. Thank you

×