Lahav Savir - Massively Scaleable Mobile Gateways
Upcoming SlideShare
Loading in...5
×
 

Lahav Savir - Massively Scaleable Mobile Gateways

on

  • 410 views

 

Statistics

Views

Total Views
410
Views on SlideShare
385
Embed Views
25

Actions

Likes
0
Downloads
0
Comments
0

4 Embeds 25

http://www.emind.co 18
https://www.linkedin.com 4
http://www.linkedin.com 2
http://ec2-54-216-87-37.eu-west-1.compute.amazonaws.com 1

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

Lahav Savir - Massively Scaleable Mobile Gateways Lahav Savir - Massively Scaleable Mobile Gateways Presentation Transcript

  • RUNCOM MOBILE’S END-TO-END MESSAGING SOLUTION (MMGS) Lahav Savir Back-end Systems Architect Alon Klein VP R&D 26 June 2009 1
  • What is MMGS? • Instant Messaging (IM) gateway, connecting to public and private communities • Email gateway, providing mobile email service 2
  • Introduction • IXI Mobile founded in 2000 also doing business under the name of Runcom Mobile • IXI Mobile develops the Ogo family of mobile devices optimized for messaging applications • IXI provides end to end messaging solutions for smart phones 3
  • A Bit Of History • Passed through several generations of devices • Deployed worldwide with several communities • Used to work with 3rd party messaging gateway providers 4
  • We want our own solution ! • Control service quality • Control roadmap features • Maximum visibility to the system status • Ability to add unique optimizations for mobile networks • Self hosting • Own the IP • Time to market • Learn from past experience… • Royalty free 5
  • High Level Requirements • Keep it simple! • (Almost) Zero downtime • Huge capacity and • No single point of failure throughput (100K’s users) • Error recovery • Scalable, Extendable • Transparent failovers • Low spec machines • Distributed architecture (<3K$) • Security • Minimum data replication • Log file writing (no DB) among clusters for best performance • No Relational Database 6
  • Technology and market survey • Reviewed several vendor offerings • Reviewed several technologies – ‘Standard’ C/C++ – Java based – Erlang 7
  • And the winner is… Erlang • Strengths of Erlang that helped with this decision – Native concurrency support – Native distributed architecture for scalability and redundancy – Reference products – Quick POC – Short time to market 8
  • Project steps and progress • Proof of concept – Functional service after 2 months (!) • System design, based on POC learnt knowledge • 7 phases of major deliveries – Build Test Deploy – Each phase ended with a functional tested service • Production within 10 months from kick off – Worlds first MSP3.0 certified GW • Load testing, Load testing, Load testing … – And still, what you don’t test will fail on production ☺ 9
  • Architecture Approach – POC Conclusions • Focus on GW features • Clustering • 3 Tiers – No single point of failure – Front-end, Router, – Redundant components Transport – In-memory replicated DB – High-Speed RPC System (minimal shared data) (eTunnel) – Internal health system • Security – Advanced failover – No access to core • Load Balancing switching & session data – Hash based distribution – Limited access to – Consistent paths transport nodes – Least connections – One way firewalls – URL load balancing
  • Outlined topology Front-ends Routers Transports Load Load Firewalls Firewalls Balancers Balancers eTunnel eTunnel 11
  • System Architecture
  • Performance Benchmark Measures Setup • Basic cluster built of 11 servers – Dual core, 2Gb RAM machines Instant Messaging performance • 150,000 Connected users • 12,600 Messages per second • 45,000,000 Messages an hour System resources • 50%-60% system utilization
  • What Made the Difference? • High speed socket based RPC – eTunnel • Resource pools (eBalancer) • Minimal data sharing • Hash based distribution between layers makes consistent path through layers • Integration with 3rd party load balancer (URL SLB, health) • Reduce destructive DB operations, especially on replicated tables • Forced GC • Network interface pools • No front end firewalls • Internal health system integrated with topology manager • Extensive SNMP MIBs • Throttling – Control concurrency
  • Satellite Systems • Authentication – Authentication, Authorization & Control over the service delivered • Reporting – Nearly real-time usage reports aggregated to an hour cycle • Operation and Maintanance – Stop / Start, Watch real-time alarms, configuration – Traceability and troubleshooting – Software update - hot patch distribution and loading – Central log console – Central graphing system • Real Time Monitoring – Infrastructure, OS and App
  • Authentication UI 16
  • Reporting system UI 17
  • Operation & Management UI 18
  • Graph system UI 19
  • Monitor view 20
  • Thank you