How Salesforce built a Scalable, World-Class, Performance Engineering Team
September 18th, 2012
Kasey Lee, Salesforce, VP Performance Engineering
in/leekasey
Safe Harbor
Safe harbor statement under the Private Securities Litigation Reform Act of 1995: This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties materialize or if any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results expressed or implied by the forward-looking statements we make. All statements other than statements of historical fact could be deemed forward-looking, including any projections of product or service availability, subscriber growth, earnings, revenues, or other financial items and any statements regarding strategies or plans of management for future operations, statements of belief, any statements concerning new, planned, or upgraded services or technology developments and customer contracts or use of our services. The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new functionality for our service, new products and services, our new business model, our past operating losses, possible fluctuations in our operating results and rate of growth, interruptions or delays in our Web hosting, breach of our security measures, the outcome of intellectual property and other litigation, risks associated with possible mergers and acquisitions, the immature market in which we operate, our relatively limited operating history, our ability to expand, retain, and motivate our employees and manage our growth, new releases of our service and successful customer deployment, our limited history reselling non-salesforce.com products, and utilization and selling to larger enterprise customers. Further information on potential factors that could affect the financial results of salesforce.com, inc. is included in our annual report on Form 10-Q for the most recent fiscal quarter ended July 31, 2012.
These documents and others containing important disclosures are available on the SEC Filings section of the Investor Information section of our Web site. Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently available and may not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that are currently available. Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements.
A. I’m curious how PerfEng can excel in an Agile Environment
B. I’m curious how to utilize a Performance Engineer’s time
C. I’d like to understand how to better articulate the value of Performance Engineering
D. I thought this was a great place to take a break and check my social feeds before dinner
E. A, B, or C
F. All of the above
What do typical Performance Teams start as?
“Performance Engineering is run as a Shared Services model so your charter is the entire organization with maximum visibility. Everything flows through PerfEng because it’s so critical. Dev, QE, Technical Operations, Level II and III Support, and Professional Services want the most out of your engineers by leveraging your talent across projects to scale mission critical applications”
Top Ten Signs Your Team Needs Help
1. You laugh when asked to sign off at Feature Freeze (and Release Freeze)
2. Your engineers work on 6-12 parallel projects (others work on 1-2 projects serially)
3. If you attended each of your assigned scrum teams’ daily 15 min standup you’d never sit down (the entire week)
4. When you can’t sign off on a feature, everyone wants to raise the goals instead of fixing the performance problem
5. Every day you answer “How did you decide to prioritize my feature? How can I escalate this?” (even after you had agreement)
6. You’re told to commit to a plan for the next release while your team is busiest in the current release and has no time to plan
7. Your team wants to influence the product or hardware architecture but can’t find the time to even write up their analysis
8. Developers discount poor results due to variance without looking at the data (even though the results of the latest release are always worse)
9. IT always asks “Why do you need isolated labs? Dev and QA don’t need them”
10. Devs ask your engineers to do manual tasks at all hours
That sounds like my situation… How did Salesforce approach this?
What’s in store?
• Introduction
• The Unique Challenge at Salesforce
• How the Team Scales
• Workloads
• Automation, Tools, Environments
• Closing Thoughts and Tips
Brief Background
• VP @ Salesforce: Performance Engineering
• Sr. Director / Tech Lead @ Wily Technology: Performance Engineering, Software Tools, QA, R&D Lab
• Architect @ Event Zero: Developer, consultant for startups
• Developer @ Ziff-Davis Benchmark Operation: Industry Standard Software Benchmarks (iBench, WebBench, ServerBench, NetBench)
What drew me to Salesforce?
• Performance and Scalability is one of the top three core values of the company
• One of the most complex Enterprise scalability challenges anywhere
• As of today one of the best funded teams in the industry and growing as quickly as we can find the best people
What are some key challenges at Salesforce?
1. Mission Critical Enterprise Apps Customers Pay For
• No perf testing in production on unwary customers
• No tolerance for downtime or slow response times which immediately impact customers’ bottom line
2. Security is Paramount
• Extremely difficult to access production systems / data
• Can’t easily examine load and data shapes in detail
3. True Multi-Tenant Architecture
• Every customer can create completely different load / data characteristics at a moment’s notice
Noteworthy Milestones
Mid 2006 – “System Test” Team created from HA crisis
April 2008 – Kasey Lee joins a struggling team of 7
Sept 2008 – Automation & Tools Team Created
Sept 2009 – Team averts 162 R1 Load Balancer Disaster
Jan 2010 – Leads solution to Capacity Planning crisis
Sept 2010 – Team predicts GC Heap 168 R1 Regression
Sept 2010 – Team leads solutions to NA6 Perf
Nov 2011 – Team helps reduce production CPU >60%
May 2012 – Team Triages 178 R1 Bytecode Regression
June 2012 – Team size rises to 60
Jan 2013 – Target size: 80+
Traffic & complexity continue to increase to ~60B / Quarter, but response times have decreased!
Major Accomplishments ex. – “CPU 15”
• SWAT team optimization / tuning efforts saved the company ~$150 million
• Optimizations include potential to change the JVM spec directly to benefit everyone
• Great example of ROI: not only in dollars, but it helps build the credibility that you can leverage to do even more
Performance Daily Dashboard – 11/15/2011 – Look at all that GREEN!
How do we accomplish this?
• Baseline Functionality & Benchmarking
• New Feature Benchmarking
• Patches / Production Support
• Hardware / Infrastructure Analysis
• Special Studies / Research / POC
• Production Visualizations
• Capacity / Sizing Guides
• Architecture Expertise
• Profiling Concepts and Training
• Automation Frameworks
• Self Service Frameworks
• Data Analysis, Creation, Visualization Tools
• Load Generation Tools
• Environment Design
• Optimization
What We Continually Focus On
• Blazing fast performance delivered by Cloud teams and PerfEng through collaboration, innovation and transparency
• Empowered and engaged PerfEng inspired by the real world impact of their work and widely recognized as industry thought leaders
• Quick and accurate test results, effective testing, seamless scheduling and flexible environments
• Frequent assessment, optimizations, and deep visibility into feature performance during development and in production
• Fully integrating PerfEng into product development as beneficial and essential members of Cloud teams
• Performance built in by Cloud teams and able to catch obvious performance issues themselves
What really makes us so effective?
1. Our Perf/Dev ratios have been adopted (after numerous “discussions”)
2. We have a Software Development Team
3. We have a Product Owner (Prod. Mgr) for our Labs
4. We have a dedicated TechOps team “PerfInfra” for Labs
5. We have a substantial lab for testing
6. We have a Program Manager focused on cross functional project strategy, visibility, and communications
Performance Engineering Team Structure
Performance
• Sales/Service/Data: Features, Workloads
• Chatter: Features, Workloads
• Platform/Mobile/UI: Features, Workloads
• Core/Search/Analytics: Features, Workloads
Automation & Tools & Env
• Software Tools Developers
• Environments
• Product Owner
• Special Projects Lead
Architect
Program Manager
PerfEng Historical Lag per Release
[Timeline figure: Release R1 with milestones Release Planning, Final Plans Due, Feature Freeze, Sandbox Freeze, Release; Product Development Sprints followed by the Release Sprint; Jan through Jun. Curves: accumulated performance bug debt; cost of finding & fixing bugs. Marker: PerfEng Begins Testing]
• Late starts with minimal workloads
• Increased workloads and decreased time to bring online
• No longer need to track
PerfEng Starts vs. Release Timeline
[Timeline figure: Release R1 with milestones Release Planning, Final Plans Due, Feature Freeze, Sandbox Freeze, Release; Product Development Sprints followed by the Release Sprint; Jan through Jun]
Q: How do we scale PerfEng to meet the demands of a larger organization?
Ratios are Key to Establish and Socialize
PerfEng established a 1:8 ratio of Perf/Dev IC
• No more than two scrum teams or three projects / release
• Does not include workload engineers (min of two per cloud)
• Does not include Managers or Software Tools Engineers
Perf Managers/IC ratio may need to be higher than 1:8
• Managers may require 1:3 or 1:5 due to the additional teams managers interact with cross functionally
Find a ratio that enables PerfEng
• Factor early participation, deep dives, optimization work to provide meaningful contributions
• Support discussion with velocity points, automation and efficiency examples, ROI Examples
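The ratios above translate into simple headcount arithmetic. The following is a minimal sketch only: the function name, the round-up rule, sizing managers against feature plus workload engineers, and the example inputs (200 developers, 4 clouds, a 1:5 manager ratio) are illustrative assumptions, not figures from the talk.

```python
import math

def perfeng_headcount(dev_ics, clouds, devs_per_perf=8,
                      workload_engineers_per_cloud=2, ics_per_manager=5):
    """Estimate PerfEng staffing from the ratios on the slide.

    Assumptions (not from the talk): fractional heads round up, and managers
    are sized against feature plus workload engineers combined.
    Software Tools Engineers are excluded, as the slide excludes them.
    """
    feature_engineers = math.ceil(dev_ics / devs_per_perf)      # 1:8 Perf/Dev IC
    workload_engineers = clouds * workload_engineers_per_cloud  # min of two per cloud
    ics = feature_engineers + workload_engineers
    managers = math.ceil(ics / ics_per_manager)                 # 1:3 to 1:5 range
    return {
        "feature_engineers": feature_engineers,
        "workload_engineers": workload_engineers,
        "managers": managers,
        "total": ics + managers,
    }

# Hypothetical organization: 200 developer ICs spread over 4 clouds.
plan = perfeng_headcount(dev_ics=200, clouds=4)
print(plan)
```

Changing `ics_per_manager` to 3 shows how quickly the management layer grows at the tighter end of the suggested range.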
Gap is closing today, but still haven’t reached the target
[Chart labels: >1.2x, >2x, >2x]
Embed Performance Mindset into Every Team
“Closely partner with scrum teams to provide early, fast, continuous architecture engagement / results / analysis for complex scenarios and enable scrum teams to catch obvious performance issues with self service tools, automation, and processes before they reach Performance Engineering”
How We Interact with Scrum Teams
• Each scrum team appoints one Dev and QE engineer who are mapped to a single PerfEng
• Teams must co-develop their release plans and sign off criteria up front
• Teams are accountable for their features (complete ownership coming back to PerfEng as team scales up)
• Teams must characterize obvious performance criteria themselves every sprint (Cadence, PTest)
• Teams must deliver their features on time or accept testing into the release sprint or beyond
Embedding Performance – A Tiered Approach
Increasing Test Complexity and Feature Risk
• 80% Scrum Team + 0% PerfEng: Single user transactions on Desktops/Local Builds; Single user transactions in PTests
• 15% Scrum Team + 40% PerfEng: Single user transactions on Corsa; High Load, High Concurrency on Corsa
• 5% Scrum Team + 60% PerfEng: Single user transactions on IST; High Load, High Concurrency on IST
Scrum Teams focus on catching obvious low-hanging fruit; PerfEng focuses on difficult to construct, high load/concurrency scenarios requiring highly specialized knowledge to detect and analyze
86 GB of metadata primarily from PerfEng workload tests!
1.4 TB of metadata from tests created by Devs and outside teams!
Q: What are the key Agile Release Milestones and activities for PerfEng?
Release Timeline and PerfEng Activities
[Timeline figure: Release R1 with milestones Release Planning, Final Plans Due, Feature Freeze, Sandbox Freeze, Release; Product Development Sprints followed by the Release Sprint; Jan through Jun]
Milestone activities:
• Appoint Liaisons
• Complete Release Plans
• Double Check Exit Criteria
• Initial visibility into all features
• Signoff on all Features and Workloads
• Monitor Sandbox
• Final Optimizations
• Signoff on ¾ of features
• Continue Workload Optimizations
• Get workloads green
How Do We Allocate Engineers’ Time?
70% Velocity Points Open
• Feature or Workloads work for a specific cloud
30% Velocity Points Reserved
• PTOn (9 days/year to work on whatever they want)
• External Training Classes (e.g. SQL Tuning)
• Other Cloud’s projects they are interested in
• Conferences (e.g. HBase, Hadoop) and Foundation events (1:1:1)
We Leverage Agile ADM to enable People’s Changing Interests
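As a small worked example of the 70/30 split, the sketch below divides one engineer's velocity points into open and reserved buckets. The function name, the rounding rule, and the 40-point budget are hypothetical and chosen only for illustration.

```python
def split_velocity(points_per_release, open_share=0.70):
    """Split an engineer's velocity points into open and reserved buckets.

    Assumption: open points are rounded to the nearest whole point and the
    remainder is reserved, so the two buckets always sum to the input.
    """
    if not 0.0 <= open_share <= 1.0:
        raise ValueError("open_share must be between 0 and 1")
    open_points = round(points_per_release * open_share)
    return {"open": open_points, "reserved": points_per_release - open_points}

# Hypothetical budget of 40 points for one engineer in one release.
budget = split_velocity(40)
print(budget)
```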
Templates Cover Most Important Phases of a Project
• Requirements/Arch
• Strategy/Test Plan
• Analysis/Results
Release Signoff Criteria and Team Dynamics
• PerfEng will only sign off on features we worked on directly (or have thoroughly reviewed the plans and results)
• Scrum Teams may sign off on features by themselves at their own risk for any feature with Medium or Less Risk (if PerfEng is short of resources)
Quick Tip – Negotiating Release Criteria
• Bring in teams from operations and support
• Quote examples of consequences of releasing without adequate throttles and caps in place
• Cite examples from your company or other leading companies of the cost of reduced customer credibility
What is a “Workload”?
• A repeatable test simulation or benchmark that provides a meaningful result by utilizing specific inputs into the system under test while recording numerical metric data, which is subsequently analyzed and weighted to perform a qualitative assessment
• Changing a variable in the workload and re-running provides a meaningful comparison
• Baseline Workloads are automated and enhanced release over release wherever possible
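The definition above (record numerical metrics, weight them, arrive at a qualitative assessment, then change one variable and compare) can be sketched as follows. This is an illustration under stated assumptions, not the team's tooling: the metric names, the weights, the 5% tolerance, and the sample values are all hypothetical.

```python
from statistics import mean

def weighted_change(baseline, candidate, weights):
    """Weighted relative change across metrics; positive means the candidate is worse.

    Each metric is compared against its own baseline mean, so metrics with
    different units (ms, percent) can be combined.
    """
    total = 0.0
    for name, weight in weights.items():
        base = mean(baseline[name])
        cand = mean(candidate[name])
        total += weight * (cand - base) / base
    return total

def assess(change, tolerance=0.05):
    """Turn the numeric result into a qualitative verdict (tolerance is an assumption)."""
    return "regression" if change > tolerance else "ok"

# Hypothetical repeated runs of the same workload before and after one change.
weights = {"response_ms": 0.7, "cpu_pct": 0.3}
baseline = {"response_ms": [100, 102, 98], "cpu_pct": [40, 41, 39]}
candidate = {"response_ms": [120, 118, 122], "cpu_pct": [44, 45, 43]}

change = weighted_change(baseline, candidate, weights)
verdict = assess(change)
print(f"{change:+.1%} {verdict}")
```

Because only one variable differs between the two runs, the weighted change can be attributed to that variable, which is the point of keeping the workload repeatable.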
“Shape” Terminology
• Load Shape – The distribution, rate, and type of requests injected into the system under test (SUT)
• Data Shape – The size, skew, and type of data, files, etc. accessed during the test
Categories
Playback tests take production traffic logs and replay traffic against the cut of data from that time period
• This enables Salesforce.com to properly capture data skews, volumes, and transactions that customers have run at a particular time and cover features that are heavily customizable
Synthetic tests involve utilizing custom tools to profile production load and data shapes and then use custom tools to create workloads that mimic the desired characteristics
• Synthetic tests enable the team to create data and load shapes that may be far greater or more accentuated than in production, in a deterministic and precise fashion that enables granular studies of linearity, bottlenecks, and resource utilization
• In most situations different versions of Salesforce are compared against one another, although absolute performance metrics are used for new features or situations where it is too difficult to make meaningful comparisons
Workload Highlights
DB Workloads
• Summary: A workload that replays real production requests against production customer data in a precise fashion to meticulously identify proper DB stats, tuning for reports
• Load Shape: 100,000 complex reports and filters
• Data Shape: Sanitized copy of real world production Data with emphasis on massive data sets
Grinder
• Summary: A large scale, high load, high concurrency test that simulates an hour of peak production traffic by replaying transactions
• Load Shape: 400 RPS, target production steady state utilization of 35%, peaks of 80%
• Data Shape: Sanitized copy of real world production Data
Force.com
• Summary: Simulates traffic against a standard Ideas sites / base them force.com application.
• Load Shape: Requests are generated across 40 different URLs / operations
• Data Shape: Synthetic data based on real world app “Ideas”
Visual Force
• Summary: A read-only targeted test isolating specific components of VF at high request rates. Apex components are designed to be constant across all requests so regressions are attributable to pure VF
• Load Shape: 32 concurrent requests across 10 orgs
• Data Shape: Small Synthetic VF classes. Viewstates, Wrapperless / Wrapped nested data presentation and Namespaces
Apex
• Summary: A targeted test that exercises the components of Apex Cache, CPU consumption, Memory. Use of Apex L1, Maximal number of lines of apex, creation of temporary objects
• Load Shape: 64 threads across 16 organizations
• Data Shape: Synthetic set of classes that exercise Apex Cache, CPU
Sharing
• Summary: A workload that performs DML Operations on Sharing Enabled Orgs, Performs Sharing Rule Maintenance Operations on Various Entities, Territory Management Operations and Accounts/Opportunity
• Load Shape: 2 app servers, 10 concurrent Users, 7 Thread groups
• Data Shape: Synthetic Orgs (One Territory Managed, One Regular)
Workload Highlights (continued)
Search
• Summary: High load, high concurrency test that simulates peak production traffic by replaying searches and concurrently simulating incremental indexing. Monitors and reports metrics on entire stack [Indexers, DB, Query Servers, App Servers, Memcached]
• Load Shape: Replay production searches and performs incremental indexing at peak load. Issues searches at 55 RPS
• Data Shape: Sanitized copy of real world production Data
MQ Workload: QPID (transport in isolation)
• Summary: A workload which enqueues messages into QPID on an IST using multiple IST app servers. Tests QPID (the MQ transport service) in isolation. Suitable for acceptance testing an upgrade.
• Load Shape: 20 app servers x 20 threads enqueue messages of varying sizes for 10min-6hr
• Data Shape: synthetic; configurable message size
MQ Workload: Hydra (integrated)
• Summary: A workload which creates load on the integrated SFDC MQ framework using the SFDC MQ API library. Uses synthetic asynchronous handlers running on the app servers to simulate message and resource consumption. Suitable for running with every release, and for simulating the impact of a new asynchronous handler.
• Load Shape: 20 app servers x 20 threads enqueue messages of varying sizes for 10min-6hr
• Data Shape: synthetic; configurable message size
Mobile
• Summary: Workloads simulate user actions over a real 3G network. Captures metrics to measure end-user perceived response times on slow networks and real devices
• Load Shape: Real Device & Emulator. On Real 3G networks
• Data Shape: Sanitized copy of real world production Data
UI
• Summary: Workloads simulate user actions in a real browser. Captures metrics to measure end-user perceived response times.
• Load Shape: 6 Browsers – Nightly tests across all standard pages. 3 Different orgs to test across different skins/chatter.
• Data Shape: Synthetic user data – Org with Chatter data is very large.
Workload End to End Coverage* (At a Glance)
Tiers: UI, Network, App, Search, Indexer, FFX, Batch, DB, SAN
Scores per workload, left to right across the tiers it covers:
DB Wkld: 8, 1
Grinder: 1, 8, 3, 3, 3, 3, 7, 3
Force.com: 1, 8, 1
VF: 7
Apex: 7
Sharing: 7, 2
Search: 6, 6, 6, 4, 3
MQ: 6, 2
Mobile: 5, 4, 1
UI: 8, 5, 4, 1
Batch: 6
*Higher numbers indicate better coverage in a given tier
Daily DB, Appserver, and UI Performance Tests!
[Chart panels: Database Workloads; Appserver Workloads; UI - End User Response Time Workloads]
168 – Performance Bugs ROI
Note that >50% of P0 bugs were found by baseline workloads!
290 Total = 78 Workloads (27%), 211 Feature Testing (73%)
Michelangelo – Results Viewer
• Provides single point of entry into all automated tests
• Dynamic Test vs. Test views
• Automatic Averaging of test runs and filtering of outliers
• Compare baseline to results trends
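The talk does not say how Michelangelo averages runs or filters outliers. As one possible approach, the sketch below drops samples that sit more than a few median absolute deviations (MAD) from the median and averages the rest. The MAD rule, the cutoff `k=3.0`, and the sample run times are assumptions made for illustration.

```python
from statistics import mean, median

def filtered_average(samples, k=3.0):
    """Average samples after dropping those more than k MADs from the median.

    Returns (average, dropped). The MAD rule is an assumed technique, not
    necessarily what the real results viewer does.
    """
    med = median(samples)
    mad = median(abs(x - med) for x in samples)
    if mad == 0:
        # No spread estimate available; keep every sample rather than guess.
        return mean(samples), []
    kept = [x for x in samples if abs(x - med) <= k * mad]
    dropped = [x for x in samples if abs(x - med) > k * mad]
    return mean(kept), dropped

# Hypothetical response times (ms) from six runs of the same test.
runs = [101, 99, 100, 102, 98, 250]
avg, dropped = filtered_average(runs)
print(avg, dropped)
```

A median-based rule is used here because a single extreme run shifts the mean and standard deviation enough to hide itself, while the median barely moves.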
Michelangelo Changelist Trend Example
• Dramatically shows changes in performance to the changelist
• Row and Column highlighting
• Color Coding
• Annotations
• Compare baseline to results trends
• Absolute and Relative difference comparisons
[Chart callout: Specific Changelist fix results in 33% more GC activity]
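Absolute and relative difference comparisons per changelist can be computed along these lines. This is a sketch with hypothetical changelist numbers, GC counts, and a 10% threshold; it is not the viewer's actual logic.

```python
def diff_against_baseline(baseline, results):
    """Return rows of (changelist, value, absolute_diff, relative_diff)."""
    rows = []
    for changelist, value in results:
        absolute = value - baseline
        rows.append((changelist, value, absolute, absolute / baseline))
    return rows

def first_regression(rows, threshold=0.10):
    """Return the first changelist whose relative increase exceeds the threshold."""
    for changelist, _value, _absolute, relative in rows:
        if relative > threshold:
            return changelist
    return None

# Hypothetical GC event counts per changelist against a baseline of 300.
results = [(1001, 303), (1002, 297), (1003, 399), (1004, 402)]
rows = diff_against_baseline(300, results)
for changelist, value, absolute, relative in rows:
    print(f"{changelist}: {value} ({absolute:+d}, {relative:+.0%})")
print("first regression:", first_regression(rows))
```

Showing both columns matters: the absolute difference conveys the size of the cost, while the relative difference makes results comparable across metrics with very different magnitudes.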
StatsForce – High Resolution Time Correlated Visualizations
• OS Statistics
• Application Statistics
• JVM Statistics
• Errors
• Mix and match representations and chart types on demand
[Chart callouts: Notice the benefits of time correlation! Notice how Full GCs affect Response Times! Notice different chart types (scatter) on same timeline!]
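The benefit of time correlation can be shown with a small sketch: put response-time samples and Full GC events on the same timeline and compare the samples that fall just after a GC with the rest. The data, the 2-second window, and the function name are hypothetical; StatsForce's internals are not described in the talk.

```python
from statistics import mean

def split_by_gc(response_samples, gc_times, window=2.0):
    """Partition (timestamp, response_ms) samples by proximity to a Full GC.

    A sample is 'near' when it falls within `window` seconds after any GC start.
    """
    near, far = [], []
    for ts, rt in response_samples:
        in_window = any(gc <= ts < gc + window for gc in gc_times)
        (near if in_window else far).append(rt)
    return near, far

# Hypothetical samples: (seconds since test start, response time in ms).
samples = [(0.0, 100), (1.0, 105), (2.0, 95), (3.0, 900),
           (4.0, 850), (5.0, 110), (6.0, 100)]
full_gcs = [3.0]  # hypothetical Full GC start time in seconds

near, far = split_by_gc(samples, full_gcs)
print("near GC:", mean(near), "elsewhere:", mean(far))
```

Without the shared timeline, the two slow samples would look like unexplained noise; aligned with the GC log they have an obvious cause.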
StatsForce Example – Force.com Workload Load Balancer Regression
[Charts compared: 164 and 166. Callout: This looks odd!]
StatsForce Example – Errors Per Second
[Charts compared: 164 and 166. Callout: This looks odd!]
Environment Types
IST (Integration System Testing)
• Large scale pod. Closest to production in both software and hardware configuration (load balancers, 8 node RAC database, etc.)
• Primarily uses production data
CST (Comparison System Testing)
• Small environments focused on Database workloads
• Primarily uses production data
DB Load (Prod, Synthetic)
• Small environment with large sized DBs (4TB – 20TB)
• “Prod” uses production data, “Synthetic” uses synthetic data
Corsa (“Race”)
• Small environments with hardware vertically identical to production
• Fewer horizontal nodes, focused on a particular SUT (Search, DB)
• Does not utilize production data
VMs / Autobuilds
• Dedicated environment for each engineer for development purposes
Desktops / Adhocs
• Dev local machines or Adhocs for PerfEng – dedicated for each engineer for local tests or development
Continuous Data Refresh System
• Enables teams to access latest production / synthetic data with minimal downtime
• Performance tests can modify / delete TB of data and roll back in minutes
Details
• Production snapshots and Corsa images are taken periodically and stored on SAN
• A “jukebox” server prepares snaps into “green” database images
• The jukebox applies schema updates and keeps them “green”
• The “green” images are always ready to use and rsynced directly to the environments
Where is Salesforce.com PerfEng Today? 30,000 ft. view
• Team has evolved from seven “Systest” engineers who struggled to produce meaningful analysis, to a world class Performance Engineering organization of >60 engineers with no significant production issues the day after release for almost three years
• The team actively participates in features, provides visibility and risk assessment at critical milestones, averts major degradations, helps triage and mitigate production issues, and delivers optimizations across the stack; its skills and headcount are now lobbied for by Development teams
• Automation has increased from two workloads which ran a handful of times late in the release, to over 15 sophisticated workload suites that run every day and are critical to signoff
Top Ten Tips for Scaling Your Team
1. Socialize your ratios for PerfEng to Developers to eventually embed into teams
2. Propose a dedicated model over a shared service model
3. In a pinch, provide teams the velocity points they have funded, and ask them to prioritize
4. Build out your management team at every opportunity
5. Develop meaningful automated workloads with low variance and show the ROI regularly
6. Create a tools team that spends >=75% of their time developing automation and tools
7. Make your Labs and Test Frameworks self service
8. Develop production monitoring tools to collect relevant data for workloads and exit criteria
9. Create frameworks to enable staged work from Dev desktop to large scale Perf environments
10. Develop training classes for PerfEng, new hires, Dev/QE liaisons – smaller population first
What else could be responsible for this dramatic optimization?
Could Increasing PerfEng continue this trend…?
Bonus Tips for a Happy Team
• Contribute to a positive atmosphere that promotes Autonomy, Mastery, and Purpose with interesting projects to tackle in depth
• Focus on your strengths and strive to improve at every opportunity
• Set a bold vision with achievable milestones, and celebrate progress
What will you take from today?
What will you change starting next week?
“Is anything truly impossible? Perhaps it is temporarily impractical or unlikely” – Kasey Lee
Ex. Human Exoskeletons (2:05)
Turn your PerfEng team from this… Manual Black Box Testers
Into this… Architecture / Analysis / Simulation / Optimization / Visualization / Automation / Monitoring Experts
Lines of Defense
1. Single user requests in PTest on VMs
2. Single user requests / high load on Corsa
3. Concurrent / high load on Corsa
4. Single user requests on DB Load
5. Concurrent / high load on DB Load
6. Single user requests on IST
7. Concurrent / high load on IST
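One way to read the numbered list is as an ordered gate: start on the smallest environment and only move to the next line once the current one passes. The sketch below encodes that reading. Stopping at the first failure and the `run_check` callable are assumptions for illustration; the slide only gives the order.

```python
LINES_OF_DEFENSE = [
    "Single user requests in PTest on VMs",
    "Single user requests / high load on Corsa",
    "Concurrent / high load on Corsa",
    "Single user requests on DB Load",
    "Concurrent / high load on DB Load",
    "Single user requests on IST",
    "Concurrent / high load on IST",
]

def first_failing_line(run_check):
    """Walk the lines in order; return (number, name) of the first failure, else None.

    run_check is a caller-supplied callable (hypothetical) that takes a line
    name and returns True when the feature passes at that line.
    """
    for number, name in enumerate(LINES_OF_DEFENSE, start=1):
        if not run_check(name):
            return number, name
    return None

# Hypothetical outcome: the feature only breaks under concurrency on Corsa.
failing = {"Concurrent / high load on Corsa"}
result = first_failing_line(lambda name: name not in failing)
print(result)
```

Under this reading, a problem caught at line 3 never consumes time on the DB Load or IST environments, which are the larger and scarcer ones.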