Message Queues at salesforce.com
Vijay Devadhar
Developer
Salesforce.com
Safe Harbor
 Safe harbor statement under the Private Securities Litigation Reform Act of 1995:

 This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties materialize or if
 any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results expressed or implied by the forward-
 looking statements we make. All statements other than statements of historical fact could be deemed forward-looking, including any projections of
 product or service availability, subscriber growth, earnings, revenues, or other financial items and any statements regarding strategies or plans of
 management for future operations, statements of belief, any statements concerning new, planned, or upgraded services or technology developments
 and customer contracts or use of our services.

 The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new functionality for our
 service, new products and services, our new business model, our past operating losses, possible fluctuations in our operating results and rate of growth,
 interruptions or delays in our Web hosting, breach of our security measures, the outcome of intellectual property and other litigation, risks associated
 with possible mergers and acquisitions, the immature market in which we operate, our relatively limited operating history, our ability to expand, retain,
 and motivate our employees and manage our growth, new releases of our service and successful customer deployment, our limited history reselling
 non-salesforce.com products, and utilization and selling to larger enterprise customers. Further information on potential factors that could affect the
 financial results of salesforce.com, inc. is included in our quarterly report on Form 10-Q for the most recent fiscal quarter ended July 31, 2012. These
 documents and others containing important disclosures are available on the SEC Filings section of the Investor Information section of our Web site.

 Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently available and may
 not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that are currently
 available. Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements.
Message Queues – what are they?
   Asynchronous job queue infrastructure in salesforce.com server



   Engine behind
     • Dashboards
     • Reports
     • Batch/Async Apex
     • Bulk API
     • and many more…
Message Queues – what are our volumes?
   Averages about 60 million messages a day



   Biggest instances account for 10 million messages a day



   95th-percentile dequeue latency is 10 minutes
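A percentile figure like the one above can be derived from dequeue-latency samples in more than one way. The sketch below uses the nearest-rank definition. The sample values and the `percentile` helper are invented for illustration, and this is not necessarily how salesforce.com computes its numbers.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample that has at least
    pct percent of all samples at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    # Rank is 1-based; clamp to 1 so that pct=0 returns the minimum.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

# Hypothetical dequeue latencies in seconds (not real measurements).
latencies = [2, 5, 1, 30, 4, 3, 600, 8, 6, 7]
p50 = percentile(latencies, 50)
p95 = percentile(latencies, 95)
```

With these made-up samples the median is small while the 95th percentile is dominated by a single slow dequeue, which is why a tail percentile is a more telling number than the average.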
Message Queues – session description
   Discuss scaling techniques used to

     • Manage capacity


     • Allocate resources



   and lessons learned
Visit with Princess Aurora
   Average wait time – 90 mins



   Average time with princess – 120 seconds



   What do you remember most a year later? The long… wait.
Visit with Goofy

   Average wait time – none




   Average time with Goofy – 120 seconds or until you get bored
Meanwhile…
Characters are not Created Equal
If Only We Hired More Princess Auroras
What can we do better?
   Fire Goofy(s)



   Hire more Princesses



   Convert Goofy to Princess and vice-versa when needed
Swap Goofy for Princess Aurora
Message Queues Amusement Park
   300+ rides and characters



   Traffic that ebbs and flows with time of day, day of the week, etc.



   A plentiful but finite set of resources
Goal
   Reduce wait times



   Fairly allocate resources



   Adapt to varying traffic patterns
Solution
   A large shared thread pool – no ride-specific silos



   Pick work in round-robin order



   If the world wants Dashboards, do Dashboards
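The round-robin idea above can be sketched as a picker that shared workers call to get their next message. This is a minimal single-threaded illustration. The class name, the message-type names, and the one-queue-per-type layout are assumptions, not details from the talk.

```python
from collections import deque

class RoundRobinPicker:
    """Cycle over per-type queues so no message type monopolizes the shared workers."""

    def __init__(self, message_types):
        self.queues = {t: deque() for t in message_types}
        self.order = deque(message_types)  # rotation order of message types

    def enqueue(self, message_type, payload):
        self.queues[message_type].append(payload)

    def pick(self):
        """Return (type, payload) from the next non-empty queue, or None if all are empty."""
        for _ in range(len(self.order)):
            message_type = self.order[0]
            self.order.rotate(-1)  # move this type to the back of the rotation
            queue = self.queues[message_type]
            if queue:
                return message_type, queue.popleft()
        return None

# Hypothetical message types; only dashboards and reports have work queued.
picker = RoundRobinPicker(["dashboard", "report", "bulk_api"])
for name in ("d1", "d2", "d3"):
    picker.enqueue("dashboard", name)
picker.enqueue("report", "r1")
picked = [picker.pick() for _ in range(5)]
```

Empty queues are skipped rather than given an idle turn, so when only dashboards are waiting, every pick goes to dashboards.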
Message Queue Real Time Latencies
   Unlike Disneyland, each job takes a variable amount of time



   Wait time prediction is not accurate at the tail of the queue



   We report latencies in real time and act on them if needed
Elastic Thread Pools
   Pools can grow beyond their initial size



   This allows growth as traffic demands



   Wait times feed back into thread pool grow and shrink decisions
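One way to picture the feedback loop above is a sizing function that is re-evaluated periodically with the latest observed wait time. This is only a sketch. The function name, thresholds, step size, and pool bounds are all hypothetical values chosen for illustration.

```python
def next_pool_size(current, wait_seconds, *, initial=10, maximum=50,
                   grow_above=60.0, shrink_below=5.0, step=5):
    """Return the new thread-pool size given the latest observed wait time.

    All numeric defaults are illustrative assumptions, not production settings.
    """
    if wait_seconds > grow_above:
        # Messages are waiting too long: add threads, up to the cap.
        return min(maximum, current + step)
    if wait_seconds < shrink_below:
        # Little or no waiting: release threads, but never drop below the initial size.
        return max(initial, current - step)
    return current  # wait time is in the acceptable band; leave the pool alone

grown = next_pool_size(10, wait_seconds=120)
capped = next_pool_size(50, wait_seconds=120)
shrunk = next_pool_size(15, wait_seconds=1)
```

Keeping a band between the grow and shrink thresholds avoids resizing the pool on every small fluctuation in wait time.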
Let’s Do a Puzzle
   A man has
     • One Sheep
     • One Tiger
     • One bundle of Grass
     • One small boat



   and a big River to cross…
Similar Puzzle with Messages
   Forecasting has
      • Several sales reps whose forecasts need updating
      • A forecast update for a sales rep must also update the forecast of the VP of sales
      • Multiple sales reps report to the same VP



   and a big sales projection to put out…
Solution
   Browse and Cache



   Pick up work for which you can obtain the mutex lock



   Jump ahead if needed
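The "pick up work you can lock, jump ahead if needed" idea can be sketched as follows. This is a single-threaded simulation in which a set of held keys stands in for real mutexes. The function name, the `(lock_key, payload)` message shape, and the `browse_limit` parameter are assumptions made for the example.

```python
from collections import deque

def pick_lockable(queue, held_locks, browse_limit=5):
    """Browse up to browse_limit messages from the head of the queue and
    take the first one whose lock key is free.

    queue: deque of (lock_key, payload) tuples.
    held_locks: set of lock keys currently being processed.
    Returns the chosen message and records its lock, or None if nothing is runnable.
    """
    for index, (lock_key, payload) in enumerate(queue):
        if index >= browse_limit:
            break
        if lock_key not in held_locks:
            held_locks.add(lock_key)
            del queue[index]  # jump ahead of the blocked messages before it
            return lock_key, payload
    return None

# Hypothetical forecast updates: two reps roll up to vp_east, one to vp_west.
work = deque([("vp_east", "rep1"), ("vp_east", "rep2"), ("vp_west", "rep3")])
locks = set()
first = pick_lockable(work, locks)    # locks vp_east
second = pick_lockable(work, locks)   # rep2 is blocked, so rep3 jumps ahead
third = pick_lockable(work, locks)    # only rep2 remains and vp_east is still held
locks.discard("vp_east")              # the first worker finishes
fourth = pick_lockable(work, locks)   # rep2 can now run
```

Updates that share a VP are serialized by the lock, while work for other VPs keeps flowing instead of waiting behind the blocked message.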
And in the Real World…
   Cache capacity is tuned to the typical traffic pattern



   At times cache can fill up



   Messages may be escorted to the back of the queue
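The slides do not say exactly what the cache holds, so the following is one possible reading rather than a description of the actual system: a browsing worker parks blocked messages in a small bounded cache, and once the cache is full, further blocked messages are sent to the back of the queue. The function name, `is_locked` callback, and `cache_capacity` value are hypothetical.

```python
from collections import deque

def browse(queue, is_locked, cache_capacity=2):
    """Scan the queue once from the head, looking for a runnable message.

    Blocked messages are parked in a bounded cache; when the cache is full,
    blocked messages are escorted to the back of the queue instead.
    Returns (runnable_message_or_None, cache).
    """
    cache = []
    for _ in range(len(queue)):  # examine each original message at most once
        message = queue.popleft()
        if not is_locked(message):
            return message, cache
        if len(cache) < cache_capacity:
            cache.append(message)
        else:
            queue.append(message)  # cache is full: back of the queue
    return None, cache

# Hypothetical messages: everything starting with "a" is currently locked.
pending = deque(["a1", "a2", "a3", "b1"])
runnable, parked = browse(pending, lambda m: m.startswith("a"))
```

In this reading, the price of a full cache is lost ordering: the escorted message now sits behind newer arrivals.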
Bread Lines Vs. Turkey Lines
      Same set of ovens baking both


      Bread is the basic need; turkey bakes when ovens are free


      If bread lines build up, stop cooking turkey


      If no one wants bread, give all ovens to turkey
User-facing vs Background jobs
      Same set of servers for both


      Users need fast response; Background can wait


      If user requests pile up, stop processing background jobs


      If there are no user requests, just process background jobs
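The rules above amount to strict priority between two classes of work on the same servers. A minimal sketch, assuming one queue per class (the function and queue names are invented for the example):

```python
from collections import deque

def next_job(user_queue, background_queue):
    """Pick the next job for a free worker.

    User-facing work always goes first; background work runs only when
    no user request is waiting. Returns None if both queues are empty.
    """
    if user_queue:
        return user_queue.popleft()
    if background_queue:
        return background_queue.popleft()
    return None

# Hypothetical jobs.
user_jobs = deque(["dashboard_refresh", "report_run"])
background_jobs = deque(["nightly_recalc"])
served = [next_job(user_jobs, background_jobs) for _ in range(4)]
```

Because the choice is made each time a worker frees up, background jobs get all the capacity when user traffic is absent and none of it while user requests are queued.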
But how do you make Turkeys stop?
   Traffic lights


   Measure key resources


   When resource usage crosses a threshold, slow down background work


   Sensitive to CPU, memory, I/O, and connection usage
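A traffic light of this kind can be sketched as a function from measured resource usage to a color, with the color deciding how much background work is admitted. The resource names follow the slide. The threshold values, the three-color scheme, and the `BACKGROUND_SHARE` mapping are assumptions for illustration.

```python
def traffic_light(usage, thresholds):
    """Map resource usage to "green", "yellow", or "red".

    usage: resource name -> utilization as a fraction in [0, 1].
    thresholds: resource name -> (yellow_at, red_at).
    The worst resource decides the color; unmeasured resources count as idle.
    """
    light = "green"
    for resource, (yellow_at, red_at) in thresholds.items():
        value = usage.get(resource, 0.0)
        if value >= red_at:
            return "red"
        if value >= yellow_at:
            light = "yellow"
    return light

# Hypothetical thresholds and throttling policy.
THRESHOLDS = {
    "cpu": (0.70, 0.90),
    "memory": (0.75, 0.90),
    "io": (0.60, 0.85),
    "connections": (0.70, 0.90),
}
BACKGROUND_SHARE = {"green": 1.0, "yellow": 0.5, "red": 0.0}

calm = traffic_light({"cpu": 0.30, "memory": 0.40}, THRESHOLDS)
busy = traffic_light({"cpu": 0.75, "memory": 0.40}, THRESHOLDS)
overloaded = traffic_light({"cpu": 0.75, "connections": 0.95}, THRESHOLDS)
```

A worker would consult the light before dequeuing background work, for example by taking a background job only `BACKGROUND_SHARE[light]` of the time.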
Lessons from the Real World
   Traffic types vary, traffic volumes vary


   Handlers misbehave, components have bugs


   Distributed systems scale very well, but not if you need a mutex


   Real-time alerting, trending, traffic isolation, and troubleshooting are necessary