1. Applying data science to sales pipelines, for fun and profit
Andy Twigg
Chief Scientist
2. WHY APPLY DATA SCIENCE TO SALES?
Problem: sales teams are biased
• Unrealistic targets – “you must have 3x coverage”
• Happy ears – “they said they’ll definitely buy it”
• Sandbagging – reps want to look like heroes, so they don’t report deals until late in the quarter
We should be able to remove these biases
• Stat: since 1995, CRM data has increased ~150x, but forecast accuracy has fallen by 10%
→ data is available, but not helping
3. PROBLEMS
Opportunity Scoring
• Pr(win)?
• Pr(win in quarter)?
• How does this compare to sales team commits?
• Which deals can we influence most?
Forecasting
• How much will be won this quarter?
4. SALES OPPORTUNITIES
• Opportunities are temporal: each is either open or closed and, once closed, either won or lost
• Usually proceed through stages, except:
  • Stages are a partial order: an opportunity can skip or revisit stages
  • An opportunity can be entered as closed (no open observations)
• As the opportunity evolves, we get more and more data about it
• Sales teams mark an opportunity ‘committed’ when they predict a win within the quarter
• A pipeline is a set of open opportunities
• We want to estimate Pr(final outcome = won), Pr(closed before time t), …
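[Figure: timeline of an example opportunity: Lead created → Stage: Qualifying → Email sent → Email opened → Amount = $1000 → Call → Stage: Validate → Meeting → Demo → Close date changed → Stage: Negotiation → Outcome: Closed/won. The timeline is labelled with its open, committed, and closed periods.]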
7.
• Sales team: good precision (~70-80%) but poor recall (~10-40%)
• Model won precision ~ sales team won precision
• Model won recall ~ 3x sales team won recall

             First observation        Last observation
             precision recall F1      precision recall F1
model        0.65      0.86   0.74    0.75      0.93   0.83
sales team   0.70      0.07   0.13    0.87      0.45   0.59
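The F1 columns follow from the precision and recall columns as their harmonic mean. A minimal sketch that recomputes them from the figures in the table (the function and dictionary names are illustrative, not from the talk):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs copied from the table above.
table = {
    ("model", "first observation"): (0.65, 0.86),
    ("sales team", "first observation"): (0.70, 0.07),
    ("model", "last observation"): (0.75, 0.93),
    ("sales team", "last observation"): (0.87, 0.45),
}

f1 = {key: round(f1_score(p, r), 2) for key, (p, r) in table.items()}

for (who, when), value in f1.items():
    print(f"{who:10s} {when:18s} F1 = {value:.2f}")
```

The sales team's low F1 at first observation comes almost entirely from its low recall: the harmonic mean is dominated by the smaller of the two values.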
10. ANATOMY OF AN OPPTY
[Figure: history of a single opportunity, annotated with: Pushed out; Pulled back in; Committed here (by the sales rep); Final outcome: won]
11. ANATOMY OF AN OPPTY
[Figure: the same opportunity history, annotated with: Pushed out; Pulled back in; Committed here (by the sales rep); Final outcome: won; and the model's predictions: Predicted won from the start; Predicted won in the correct quarter]
13. SALES OPPORTUNITIES
[Figure: the example opportunity timeline (Lead created → Stage: Qualifying → Email sent → Email opened → Amount = $1000 → Call → Stage: Validate → Meeting → Demo → Close date changed → Stage: Negotiation → Outcome: Closed/won) mapped to a sequence of states x0, …, xt with target y=1]
• Sequence of observations x0, x1, …
• associated with fixed target y ∈ {0,1}
• Consider states as an MDP: state xt encodes temporal features about previous states (cf. RFM features)
  • # times this stage was previously visited, time between successive visits, time in current stage, direction of amount change, …
14. SALES OPPORTUNITIES
• States also contain
  • Sales-specific features, e.g. momentum
  • External data, e.g. firmographic
  • Global features, e.g. avg_sales_cycle(target)
• Gives examples {(x0,y),(x1,y),…} for each opportunity
• Shuffle to break correlations between successive examples
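A minimal sketch of how one opportunity's history might be turned into training examples (xt, y), assuming each observation is a (day, stage) pair. The feature names and the input format are hypothetical; the talk does not specify them, and only a few of the temporal features it lists are shown.

```python
import random


def build_examples(stage_events, won):
    """Turn one opportunity's history into (features, label) examples.

    stage_events: list of (day, stage) observations in time order
                  (an assumed input format).
    won: final outcome of the opportunity; every example gets this label.
    """
    examples = []
    visits = {}            # stage -> number of separate visits so far
    current_stage = None
    stage_entered = None
    first_day = stage_events[0][0]
    for day, stage in stage_events:
        if stage != current_stage:
            # Entering (or re-entering) a stage starts a new visit.
            visits[stage] = visits.get(stage, 0) + 1
            current_stage = stage
            stage_entered = day
        features = {
            "stage": stage,
            "previous_visits_to_stage": visits[stage] - 1,
            "days_in_current_stage": day - stage_entered,
            "days_since_created": day - first_day,
        }
        examples.append((features, int(won)))
    return examples


# Hypothetical history: the opportunity revisits Qualifying before negotiating.
history = [
    (0, "Qualifying"),
    (5, "Validate"),
    (9, "Validate"),
    (12, "Qualifying"),
    (20, "Negotiation"),
]
examples = build_examples(history, won=True)

# Shuffle so successive observations of the same opportunity are not adjacent.
shuffled = list(examples)
random.Random(0).shuffle(shuffled)
```

In practice the examples from all opportunities would be pooled before shuffling; a single history is used here only to keep the sketch short.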
15. DURATION MODEL
• Win/loss model
  • Pr(win)
  • independent of time horizon
  • RF/GBDT
• Duration model
  • Pr(win within quarter)
  • Poisson regression: assume that, in the current state xt, there is a fixed probability of closing each day
  • Train a model to predict the expected duration d, conditioned on outcome = win
  • Integrating the corresponding exponential distribution (of interarrival times) gives Pr(close < t | win)
  • Pr(win < t) = Pr(win) Pr(close < t | win)
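The last two bullets can be sketched as follows, assuming a constant daily closing rate of 1/d, so that the time to close is exponentially distributed with mean d. The function names and the example numbers are illustrative, not taken from the talk.

```python
import math


def prob_close_within(days, expected_days_to_close):
    """Pr(close < t | win) under an exponential time-to-close model.

    Assumes a constant closing rate of 1 / expected_days_to_close per day,
    so the CDF is 1 - exp(-t / d).
    """
    rate = 1.0 / expected_days_to_close
    return 1.0 - math.exp(-rate * days)


def prob_win_within(days, p_win, expected_days_to_close_given_win):
    """Pr(win < t) = Pr(win) * Pr(close < t | win)."""
    return p_win * prob_close_within(days, expected_days_to_close_given_win)


# Hypothetical opportunity: the win/loss model gives Pr(win) = 0.6, the
# duration model predicts d = 30 days, and 45 days remain in the quarter.
p_win_in_quarter = prob_win_within(
    days=45, p_win=0.6, expected_days_to_close_given_win=30
)
print(f"Pr(win in quarter) = {p_win_in_quarter:.3f}")
```

As t grows, Pr(win < t) approaches Pr(win), so the duration model only discounts deals that are unlikely to close before the horizon.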
16. FORECASTING: BOTTOM-UP
Bottom-up: predict the current quarter based on the currently open pipeline
+ Considers quality of deals in pipeline
- Ignores trends, deals not in pipeline
Obvious solution: expected amount in pipeline with respect to Pr(win in quarter) scores
Example: $157,000 at 77% + $200,000 at 37% + $82,000 at 86% = $265,410
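A minimal sketch of the expected-amount calculation, treating the three amount and percentage pairs shown on the slide as (amount, Pr(win in quarter)); the function name is illustrative.

```python
def expected_pipeline_amount(deals):
    """Sum of amount * Pr(win in quarter) over the open opportunities."""
    return sum(amount * p_win_in_quarter for amount, p_win_in_quarter in deals)


# (amount in dollars, Pr(win in quarter)) for each open opportunity on the slide.
pipeline = [
    (157_000, 0.77),
    (200_000, 0.37),
    (82_000, 0.86),
]

forecast = expected_pipeline_amount(pipeline)
print(f"${forecast:,.0f}")  # $265,410
```

The result matches the $265,410 total on the slide, which is well below the $439,000 face value of the three deals.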
17. FORECASTING: TOP-DOWN
Top-down: predict the current quarter based on previous quarters
+ Accounts for seasonality and trending
- Ignores state of current pipeline
[Figure: "Decomposition of additive time series", with panels observed, trend, seasonal, and random plotted against Time (2013.0 to 2014.4)]
Typical decomposition of a revenue time series into 3 components:
• Trend component
• Seasonal component
• Random component
Idea: try to reduce the random component by taking into account the current pipeline
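For illustration, here is a minimal pure-Python sketch of a classical additive decomposition (centred moving average for the trend, per-season means for the seasonal part). It is one standard way to obtain the three components and is not necessarily the method used for the figure; the series is synthetic.

```python
def decompose_additive(series, period):
    """Classical additive decomposition: series = trend + seasonal + remainder.

    Returns (trend, seasonal, remainder). Trend and remainder are None at the
    edges, where the centred moving average is undefined.
    Requires len(series) >= 2 * period.
    """
    n = len(series)
    half = period // 2
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half : t + half + 1]
        if period % 2 == 0:
            # Even period: half-weight the two end points (a 2 x period MA).
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(window)
        trend[t] = total / period

    # Average the detrended values for each position in the cycle.
    buckets = [[] for _ in range(period)]
    for t in range(n):
        if trend[t] is not None:
            buckets[t % period].append(series[t] - trend[t])
    means = [sum(b) / len(b) for b in buckets]
    centre = sum(means) / period  # force the seasonal effects to sum to zero
    seasonal = [means[t % period] - centre for t in range(n)]

    remainder = [
        None if trend[t] is None else series[t] - trend[t] - seasonal[t]
        for t in range(n)
    ]
    return trend, seasonal, remainder


# Synthetic quarterly revenue: linear trend plus a fixed seasonal pattern.
pattern = [5.0, -3.0, -4.0, 2.0]
revenue = [100.0 + 10.0 * t + pattern[t % 4] for t in range(12)]
trend, seasonal, remainder = decompose_additive(revenue, period=4)
```

On real revenue data the remainder would not vanish; that leftover is the "random" component the slide proposes to reduce using pipeline information.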
18. ‘HYBRID’ FORECASTING
top down + bottom up
• Idea: augment ARIMA model with side information from bottom-up model
• Allows model to adjust coefficients in response to bottom-up features (representing current pipeline) while retaining ARIMA features
  • Amount predicted to close in current quarter
  • Average score of currently open opportunities
  • Average predicted days to close
  • Historic adjusted coverage ratios
• Sometimes known as ARIMAX [1]
[1] robjhyndman.com/hyndsight/arimax
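To make the idea concrete, here is a deliberately simplified sketch: an AR(1) regression with one exogenous pipeline feature, fitted by ordinary least squares. It is not a full ARIMAX model (no differencing or moving-average terms), and the function names and data are made up for illustration.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_arx1(revenue, pipeline_feature):
    """Least-squares fit of revenue[t] = c + phi * revenue[t-1] + beta * feature[t]."""
    rows = [[1.0, revenue[t - 1], pipeline_feature[t]] for t in range(1, len(revenue))]
    targets = revenue[1:]
    k = 3
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear(xtx, xty)  # [c, phi, beta]


def forecast_next(coefs, last_revenue, current_pipeline_feature):
    """One-step forecast using the latest revenue and the current pipeline feature."""
    c, phi, beta = coefs
    return c + phi * last_revenue + beta * current_pipeline_feature


# Made-up data generated from known coefficients (c=2, phi=0.5, beta=1.5),
# so the fit can be checked; the feature stands in for a bottom-up quantity
# such as the amount predicted to close in the quarter.
feature = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0]
revenue = [10.0]
for t in range(1, len(feature)):
    revenue.append(2.0 + 0.5 * revenue[-1] + 1.5 * feature[t])

coefs = fit_arx1(revenue, feature)
next_quarter = forecast_next(coefs, revenue[-1], current_pipeline_feature=4.0)
```

The exogenous coefficient beta is what lets the forecast respond to the state of the current pipeline, while phi carries the time-series structure.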
19. WORD VECTORS
• Train word2vec model on text fields on opportunities
  • description, status, risks, …
  • “deal pushed out because no budget this quarter”
• ~200m words
• Gives 300-dimensional ‘neural’ word embeddings
• Compare to GoogleNews model
• Learned some sales-specific concepts
In [23]: model.most_similar('lost')
Out[23]:
[('disqualified', 0.7105633020401001),
('killed', 0.6871206164360046),
('won', 0.6662579774856567),
('abandoned', 0.6619119048118591),
('closing', 0.6464139223098755),
('moved', 0.6406350135803223),
('reopened', 0.6268107891082764),
('closed_lost', 0.6187739968299866),
('low_probability', 0.6092942953109741),
('closed', 0.6073518395423889)]

In [24]: gn_model.most_similar('lost')
Out[24]:
[(u'losing', 0.7544215321540833),
(u'lose', 0.7136349081993103),
(u'regained', 0.618366003036499),
(u'loses', 0.6115548610687256),
(u'loosing', 0.576453447341919),
(u'gained', 0.5561528205871582),
(u'dropped', 0.5492223501205444),
(u'loss', 0.5399519205093384),
(u'won', 0.5263957977294922),
(u'regain', 0.5241336822509766)]
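The scores in these listings are cosine similarities between word vectors. A minimal pure-Python sketch of that ranking, using tiny made-up 3-dimensional vectors in place of the 300-dimensional embeddings (the vectors and the helper names are hypothetical):

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_similar(word, vectors, topn=3):
    """Rank every other word by cosine similarity to `word`, highest first."""
    query = vectors[word]
    scored = [
        (other, cosine_similarity(query, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]


# Made-up toy embeddings, for illustration only.
toy_vectors = {
    "lost": [0.9, 0.1, 0.0],
    "disqualified": [0.8, 0.2, 0.1],
    "won": [0.5, 0.5, 0.0],
    "meeting": [0.0, 0.1, 0.9],
}

neighbours = most_similar("lost", toy_vectors)
```

Words that occur in similar contexts end up close together, which is why 'won' appears near 'lost' in both models: they are opposites in meaning but are used in the same kinds of sentences.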