Talk given by Jag Jayaprakash, Sr. Software Engineer, and Ganesh Tiwari, Software Engineering LMTS at Salesforce, at Spark Summit in June 2016
Salesforce customers have written hundreds of millions (and growing) number of automated tests for their custom applications. As part of our pre-release regression testing, we run these tests twice – once with the old platform code and once with the new platform code. These test runs should have identical behavior if the new platform code has to pass the regression testing and be ready for production. Our application stack is setup to run these hundreds of millions of tests, store their results, analyze, find and fix bugs – all in a few weeks. We leverage Spark jobs to do filtering and perform various transformations on the results. Once cleaned up, the results then go through a unsupervised learning pipeline written in Spark MLlib to cluster results to identify real issues vs non-issues and known issues. Key takeaways: Spark MLlib, Spark at scale, business use case, improve operational efficiency.
2. Background: Salesforce App Cloud
FORCE
Model-driven
development
platform
HEROKU
Polyglot platform for
elastic scale
APP EXCHANGE
Enterprise App
Marketplace
LIGHTENING
Visual Development
Platform
THUNDER
Stream & event
based primitives
APP EXCHANGE
Enterprise App
Marketplace
3. Background: Salesforce App Cloud
STRONGLY TYPED
Direct references to
schema objects
LOOKS LIKE JAVA
Acts like database
stored procedures
EASY TO TEST
Built-in support for creation &
execution of unit tests
OBJECT
ORIENTED
Visual Development
Platform
CLOUD HOSTED COMPILER
Interpreted & executed on on
multitenant environment
4. Background: Apex Hammer
BASELINE Execution
Results
tests
tests
tests
HAMMER
Cluster
3
Cluster
2
Cluster
1
Cluster
5
UPGRADED Execution
Results
tests
tests
tests
CUSTOMER TESTS ARE EXECUTED TWICE
in Salesforce secured environment in data centers
1st EXECUTION CALLED BASELINE
on current production version
2nd EXECUTION CALLED UPGRADED
on release candidate version
UPGRADED AND BASELINE RESULTS ARE
COMPARED
When a test passes, it should pass in both versions.
When a test fails, it should fail in both versions.
Any other state is a potential bug in release candidate
version!
5. Challenges
Infrastructure setup
to execute hammer
on two different
platform versions
Persist and compare
test execution results
Tests are executed
highly secured data
centers
150+ million
customer tests and
growing
that inspired Hammer…
Internal SLA to keep
these mammoth
efforts to under 3
weeks
6. Hammer Core: A Functional Overview
Cluster
3
Cluster
2
Cluster
1
Cluster
5
Data Ingestion
(Secured Internal APIs)
Clustering
(Spark M/L)
Each cluster is
a potential bug
Extract Transform Load
(Spark SQL)
8. Data Preparation Using Apache SparkSQL
Test Result Example Transformation Output
Baseline : Expected: 10, Actual: 10
Upgrade : Expected: 10, Actual: 10
PASS - PASS NOT A BUG
Baseline : NullPointerException: Attempt to de-reference a null object
Upgrade : NullPointerException: Attempt to de-reference a null object
FAIL - FAIL NOT A BUG
Baseline : Expected: 10, Actual: 10
Upgrade : NullPointerException: Attempt to de-reference a null object
PASS - FAIL
FOR FURTHER
ANALYSIS
Baseline : Expected: 10, Actual: 21
Upgrade : Expected: 10, Actual: 10
FAIL - PASS NOT A BUG
Baseline : Expected: 10, Actual: 21
Upgrade : Expected: 10, Actual: 51
FAIL – FAIL’
FOR FURTHER
ANALYSIS
Baseline : Expected: null, Actual: 1/25/2016
Upgrade : Expected: null, Actual: 3/10/2016
CANONIZE NOT A BUG
9. Spark Machine Learning Pipeline
Group test failure
records into K
Clusters
Each cluster is a
potential bug in
Salesforce App
Cloud platform
Enables human
inspection of cluster
to determine if it’s a
bug or not
Designed to operate
on records marked
“FOR FURTHER
ANALYSIS”
that inspired Hammer…