Spark Summit 2017: Transforming B2B Sales with Spark-Powered Sales Intelligence


B2B sales intelligence has become an integral part of LinkedIn’s business to help companies optimize resource allocation and design effective sales and marketing strategies. This new trend of data-driven approaches has “sparked” a new wave of AI and ML needs in companies large and small. Given the tremendous complexity that arises from the multitude of business needs across different verticals and product lines, Apache Spark, with its rich machine learning libraries, scalable data processing engine and developer-friendly APIs, has been proven to be a great fit for delivering such intelligence at scale.

See how LinkedIn uses Spark to build sales intelligence products. This session introduces a comprehensive B2B intelligence system built on top of various open source stacks. The system puts advanced data science to work in a dynamic and complex scenario, in an easily controllable and interpretable way. Balancing flexibility and complexity, the system handles a variety of problems in a unified manner and yields actionable insights that support business success. You will also learn about impactful Spark ML powered applications such as prospect prediction and prioritization, churn prediction, and model interpretation, as well as the challenges and lessons learned at LinkedIn while building such a platform.

Published in: Technology


  1. Songtao Guo, Wei Di. Transforming B2B Sales with Spark Powered Sales Intelligence
  2. Songtao Guo, Sr. Staff & Tech Lead, Data Analytics Data Mining at LinkedIn, soguo@linkedin.com
     Wei Di, Staff Data Scientist, Data Analytics Data Mining at LinkedIn, wdi@linkedin.com
  3. Outline
     Overview of Sales Intelligence
       ❏ B2B landscape
       ❏ Challenges
       ❏ One-stop solution overview
     B2B Intelligent Framework Overview & Details
       ❏ Derived insightful features from LinkedIn Graph
       ❏ Centralized Data Mart
       ❏ B2B Intelligent Engine
         ❏ Model learning
         ❏ Model reasoning
         ❏ Model management
     Case Study
       ❏ Problem definition
       ❏ Labelling Logic & Interpretation
       ❏ Performance & Lessons learned
  4. B2B Analytics Landscape
     Stages: Demand Generation, Seal the deal, Onboard & Retention, Business Growth
     Lifecycle groups: Acquire New Customers, Empower Existing Customers
     Business Needs (B2B Predictive Modeling)
       ● How to prioritize marketing leads?
       ● How much is the potential spending from a client if acquired?
       ● Who are more likely to become customers? Why?
       ● What is the best channel to acquire?
       ● What is the market segmentation?
       ● How are new customers onboarding?
       ● How are the customers engaged with products based on different segments?
       ● Which customer is going to renew less, why and what can we do?
       ● Which customer has upsell potential, why and what product to buy?
  5. Challenges of B2B Data & Application
       ● Data quality issues
         ○ Incomplete, sparse, noisy and dynamic over time
         ○ Missing historical data
       ● Lack of centralized data covering various needs
       ● Unclear source of truth
       ● From score to actionable insights
       ● Unify existing solutions
         ○ Multiple owners
         ○ Multiple predictive scores exist for similar purposes
         ○ Inconsistent quality
       ● Scale model building
  6. Change Landscape with Spark
     "Simplicity is the ultimate sophistication." Leonardo da Vinci
     The Old Landscape
       ● Chaos of data
       ● Redundant/repeating work
       ● No monitoring for feature evolvement and model degradation
       ● No scalability and slow iteration
     The New Paradigm: One Stop Solution
       ● Semi-automatic with flexibility of self-configuration
       ● Scalable and fast iteration
       ● Monitoring with closed feedback loop
     MLlib
  7. B2B Intelligent Engine
     Application Layer: Segmentation, B2B predictive insights, Territory planning, Solution integration
     Intelligence Layer: Predictive Modeling, Clustering, Optimization
     Data Layer: Feature Mart, Feature management, Label management
     MLlib
  8. Data Layer
     Feature Mart
       • Provide a central feature mart that serves as source of truth for key account level metrics
       • Scale feature generation
       • Ensure high reliability and consistency (source control, peer-review)
       • Cover all the company entities and sufficiently long time series
     Feature Management
       • A standardized feature onboarding process
       • Unified metastore to support feature governance and feature application
       • Feature search and feature profiling (Spark SQL)
     Label Management
       • Standardize problem definition
       • Scale label preparation
       • Gold dataset management
       • Support continuous performance monitoring
     [Diagram: Feature Mart fed by canonical data sources (Member, Company, Job, Tracking, SFDC, ...), service logs (Prioritizer, ...) and external data sources (SEC, ...); Label Mart fed by source data label generation pipelines (batch derived labels) and human labelers (survey labels); Feature Metastore]
  9. Intelligence Layer
     Workflows
       • Highly configurable model building: Azkaban workflow, Spark ML Pipelines, Spark ParamGridBuilder
       • Highly configurable scoring: Azkaban workflow
     Feature Engineering
       • Integrate features from various sources: company level feature mart and user experimental features
       • Support feature representation for different entity types
       • Adapt to different feature granularities: daily, weekly, monthly, etc.
     Algorithms
       • Regression: RandomForest, GradientBoostedTree, XGBoostTree, LinearRegression
       • Classification: DecisionTree, RandomForest, GradientBoostedTree, XGBoostTree, LogisticRegression, MultilayerPerceptronClassification
     [Diagram: Intelligence Engine components: Feature Engineering, Feature Integration, Feature Grouping, Model Selection, Model Parameter Tuning, Model Validation, Model Chaining, Model Ensemble, Score Reasoning]
  10. Application Layer
      Stages: Demand Generation, Seal the deal, Onboard & Retention, Business Growth
      Lifecycle groups: Acquire New Customers, Empower Existing Customers
      Applications: Prospect Score, Account Propensity, Size-Of-Prize, Upsell/Churn Score, Lifetime Value
      Techniques and challenges: clustering; regression/classification; feature normalization/selection; limited/skewed training data; non-linearity but requires model interpretation; large & sparse dataset
  12. Derive Features from IN Graph
      Feature groups: Activity, Social, Member Profile, Company Growth, Product Booking, Product Usage, Product Whitespace, Product Performance, Company Profile
      [Diagram: graph of company members by role: Sales, Marketer, Decision maker, Recruiter, Talent professional, Employee]
  13. Centralized Feature/Label Mart
      Feature Coverage
        ● derived company level knowledge
        ● generic member/company level
        ● internal & external data sources
        ● monthly/daily granularity
        ● region/functionality segmentation
        ● enriched meta information
        ● history that traces back years
      Feature Management
        ● feature profiling
        ● visualized monitoring system
        ● whitelist/blacklist features as needed
        ● self-serve onboarding platform
      Label Management
        ● Unified label generation logic for the same type of problem across different verticals
        ● Vertical-independent label generation pipeline
  14. B2B Intelligence Engine
      [Diagram: Label Data and New Data flow from the Meta & Data Store through Feature Engineering and Feature Integration/Select into the Intelligence Engine. Training path: Train/Validation/Test Dataset, Model Building, Model Interpretation. Scoring path: Feature Engineering, Scoring, Reasoning. Model Management & Update compares the New Model with Previous Models before Production Deployment of the Predictive Solution.]
      Model Building
        ● leverage Spark ML: wide coverage of modeling approaches for regression, classification, and clustering
        ● do parallel multi-level model training
        ● fast iteration
        ● self-serve configuration/tuning
        ● various evaluation methods to cater to business requirements
  15. Model Interpretation
      Master Model: estimate overall conversion probability from master score
      Component Models: provide high-level overview and insights through component scores
        SOCIAL, E-LEARNING, MARKETING, AFFILIATION, ENGAGEMENT, SPENDING, GROWTH, PRODUCT USAGE (low)
        Example component levels shown: Low, High, Low, Medium, Medium, Medium
      Deep dive to finer-level predictors
        ● Monthly active seats
        ● Avg days to onboard
        ● Average CRM usage
        ● Mobile view ratio
        ● Percent of active seat-holders who use advanced features
        ● Percent of active seat-holders who imported sf contacts
        ● ...
  16. Model Management/Monitoring
        ● Monitor both feature and model performance changes over time
        ● Centralized model repo with standard format
        ● Feed in new training data to generate "challenger models" to compete
        ● Pay attention to feature upstream change
      Inherent temporal nature
        ● Business customers evolve dynamically
        ● Products update periodically
      Performance monitoring
        ● Performance degradation
        ● Failure/Outlier examples
      Feature monitoring
        ● Feature statistics over time:
          ○ non-null count, sum, median
          ○ coefficient of variation for volatility evaluation
      Actions
        ● Model refresh
        ● Feature diagnosis
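The feature statistics named on this slide can be sketched in a few lines. This is a minimal illustration, not the monitoring system described in the talk: the function names `snapshot_stats` and `coefficient_of_variation` are hypothetical, and it assumes each feature snapshot arrives as a list of numbers with `None` for missing values.

```python
import math
import statistics
from typing import Dict, Optional, Sequence


def snapshot_stats(values: Sequence[Optional[float]]) -> Dict[str, Optional[float]]:
    """Summarize one snapshot of a feature, ignoring missing (None) values."""
    present = [v for v in values if v is not None]
    return {
        "non_null_count": len(present),
        "sum": math.fsum(present),
        "median": statistics.median(present) if present else None,
    }


def coefficient_of_variation(series: Sequence[float]) -> Optional[float]:
    """Population standard deviation divided by |mean| for a statistic tracked over time.

    Returns None when the series is empty or its mean is zero (CV is undefined).
    """
    if len(series) == 0:
        return None
    mean = statistics.fmean(series)
    if mean == 0:
        return None
    return statistics.pstdev(series) / abs(mean)
```

One way to use it: compute `snapshot_stats` for each monthly snapshot, collect the monthly sums into a list, and pass that list to `coefficient_of_variation`; a high value suggests a volatile feature worth diagnosing.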
  18. Case Study
      Account Propensity Score (APS)
        Definition: Which enterprise accounts have higher chances of buying the product & why?
        ● data sparse, noisy
        ● region level; regions that are very small need to borrow information from other regions or global
        ● score accuracy is important for the whole spectrum
        ● evaluation on all tiers
      Upsell Propensity Score (UPS)
        Definition: Which existing enterprise customers have upsell potential & why?
        ● data is more reliable given rich product usage features
        ● product level
        ● score accuracy is important for the top ones
        ● evaluation on recommended potentials
      Common to both
        ● business varies significantly across region/product
        ● data evolves dynamically, time series events
        ● binary propensity model
        ● actionable insights on dissected aspects
        ● handle "confusing" samples correctly (p = 0.5?)
      Model structure
        Master model: p(r_master | f, .., f)
        Component models: w1 * p(r_compo. | f1, .., fq), w2 * p(r_compo. | fq+1, .., fq+n), ...
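The slide lists a master model next to weighted component models but does not say how the terms are combined. Purely as an illustration, the sketch below assumes a normalized weighted average of component probabilities; the function names, component names, and weights are all hypothetical.

```python
from typing import Dict, List


def master_score(component_probs: Dict[str, float], weights: Dict[str, float]) -> float:
    """Assumed combination rule: weighted average of component probabilities."""
    total_weight = sum(weights[name] for name in component_probs)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(weights[name] * prob for name, prob in component_probs.items())
    return weighted / total_weight


def top_components(component_probs: Dict[str, float],
                   weights: Dict[str, float], k: int = 2) -> List[str]:
    """Rank components by weighted contribution, largest first, and keep the top k."""
    contributions = {name: weights[name] * prob for name, prob in component_probs.items()}
    return sorted(contributions, key=contributions.get, reverse=True)[:k]
```

Under this assumption, the per-component contributions give a simple way to explain a master score: the components returned by `top_components` are the aspects that pushed the score up the most.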
  19. Label Generation: APS
      Label is defined at (account + region/etc) level
        ● Positive (1): closed won opportunity
        ● Negative (0): closed disengaged opportunity
      Opportunity evolves: time series events
        ● look back from opportunity creation time
        ● find explicit negatives
        ● try until won: flip opportunities
        ● differentiate mid-year vs. year-end modeling
      Re-touch label
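The labeling rules above can be sketched as follows. This is one possible reading, not the pipeline from the talk: the `Opportunity` record, its field names, and the stage strings are hypothetical, and "try until won" is interpreted here as "any closed won opportunity makes the (account, region) pair positive, even if other opportunities for that pair were disengaged".

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class Opportunity:
    account: str
    region: str
    stage: str  # hypothetical stage names: "closed_won", "closed_disengaged", "open"


def label_opportunity(opp: Opportunity) -> Optional[int]:
    """1 for closed won, 0 for closed disengaged, None when there is no explicit outcome."""
    if opp.stage == "closed_won":
        return 1
    if opp.stage == "closed_disengaged":
        return 0
    return None


def label_pairs(opps: Iterable[Opportunity]) -> Dict[Tuple[str, str], int]:
    """One label per (account, region); open opportunities contribute no label."""
    labels: Dict[Tuple[str, str], int] = {}
    for opp in opps:
        label = label_opportunity(opp)
        if label is None:
            continue
        key = (opp.account, opp.region)
        # Assumed flip rule: a win anywhere in the history overrides disengaged outcomes.
        labels[key] = max(labels.get(key, 0), label)
    return labels
```

Pairs with only open opportunities are left out of the result, which matches the slide's emphasis on finding explicit negatives rather than treating missing outcomes as zeros.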
  20. Label Generation: UPS
      Label is defined at (account + product) level for two scenarios: Add-On and Renewal
        ● Positive (1): increased sell & continued contract
        ● Negative (0): other cases
        ● look back from upsell creation time
        ● focus on good quality positives
        ● consider contract renewal/churn for the add-on case
      [Diagram: Renewal scenario and Add-On scenario, each illustrated with a $500 "Contract 1" followed by outcomes such as an increase (+$300, +$600), a decrease (-$200, "$500 and less"), renewal, or churn]
  21. Modeling & Interpretation
      Account Propensity Score (Acct. A / Acct. B / Acct. C)
        Score: 92 / 65 / 30
        Comp. & Growth: Top Industry / Fast Growth / Decline
        Affinity: High / Medium / Low
        Booking: High / Medium low / Low
        Social Selling: High / Medium High / Medium low
        Insight: Fast growth reflects potential demand; find out how our product can help the customer grow further and increase social selling & affinity.
      Upsell Propensity Score (Acct. A / Acct. B / Acct. C)
        Score: 90 / 75 / 30
        Comp. & Growth: Fast member growth / Hot industry / Edu./Gov
        Booking: Product A: $1000 / Product B: $500 / Product B: $700
        Usage: Highly Engaged / Engaged / Less Engaged
        Perf.: Good hiring perf.
        White Space: $2,000 / $500 / $100
        Insight: Better onboarding & training to boost product usage and performance.
  22. Model Performance & Business Impact
      Model numerical validation
        • Account Propensity model: AUROC of 0.64; High vs. Avg: 1.7x
        • Product Upsell model: AUROC of 0.84; High vs. Avg: 2.7x; Medium vs. Avg: 1.7x
      Business impact (Prioritization & Marketing)
        • Account Propensity: 17Q1: 5.3pp win-rate boost compared to no prioritization
        • Product Upsell:
          • Marketing: 2.0x open rate and 3.3x CTR compared to baseline
          • SMB sales: generated 22% ~ 46% more opportunities with customers
      [Chart: Business Impact by tier vs. average. Product Upsell (High, Medium High, Medium Low, Avg): 2.7x and 1.7x as likely to be upsold as average. Account Propensity (Tier 1, Tier 2, Tier 3, Avg): 1.7x as likely to be won as average.]
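The "High vs. Avg" figures read like lift ratios, that is, the conversion rate inside a score tier divided by the overall conversion rate. The sketch below shows that calculation under this assumption; the function name `lift_by_tier` and the input shape (pairs of tier name and outcome) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def lift_by_tier(records: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Return tier -> (conversion rate within the tier) / (overall conversion rate)."""
    totals: Dict[str, int] = defaultdict(int)
    wins: Dict[str, int] = defaultdict(int)
    for tier, converted in records:
        totals[tier] += 1
        wins[tier] += int(converted)
    if not totals:
        raise ValueError("no records supplied")
    overall = sum(wins.values()) / sum(totals.values())
    if overall == 0:
        raise ValueError("lift is undefined when nothing converted")
    return {tier: (wins[tier] / totals[tier]) / overall for tier in totals}
```

A lift of 1.7 for a tier then means accounts in that tier converted 1.7 times as often as the average account in the evaluated population.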
  23. Summary: End-to-end B2B solution
        ● Centralized large-scale data & advanced intelligence
        ● Speed up individual model development cycles
        ● Drive monetization impact and business performance
  24. Thank You. We are Hiring! Songtao Guo, Wei Di. soguo@linkedin.com, wdi@linkedin.com
