Jayesh Govindarajan
Search Relevance @ Salesforce
Improving Enterprise Findability
Jayesh
Govindarajan
Senior Director, Search Relevance & Data Science
Salesforce
1. How is search in the enterprise different?
2. The enterprise findability problem
3. Relevance and LETOR algorithms
4. Deploying models in Solr
5. A model for every customer
6. Putting the pieces together
Forward-Looking Statements
Statement under the Private Securities Litigation Reform Act of 1995:
This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties
materialize or if any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results expressed or
implied by the forward-looking statements we make. All statements other than statements of historical fact could be deemed forward-looking,
including any projections of product or service availability, subscriber growth, earnings, revenues, or other financial items and any statements
regarding strategies or plans of management for future operations, statements of belief, any statements concerning new, planned, or upgraded
services or technology developments and customer contracts or use of our services.
The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new functionality
for our service, new products and services, our new business model, our past operating losses, possible fluctuations in our operating results and
rate of growth, interruptions or delays in our Web hosting, breach of our security measures, the outcome of any litigation, risks associated with
completed and any possible mergers and acquisitions, the immature market in which we operate, our relatively limited operating history, our
ability to expand, retain, and motivate our employees and manage our growth, new releases of our service and successful customer
deployment, our limited history reselling non-salesforce.com products, and utilization and selling to larger enterprise customers. Further
information on potential factors that could affect the financial results of salesforce.com, inc. is included in our annual report on Form 10-K for the
most recent fiscal year and in our quarterly report on Form 10-Q for the most recent fiscal quarter. These documents and others containing
important disclosures are available on the SEC Filings section of the Investor Information section of our Web site.
Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently available
and may not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that
are currently available. Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements.
Largest Enterprise Search Service!
600M+ Queries / Week
7B+ Index Updates / Day
<2min Incremental Index Latency
<120ms Query Latency on Search Server
300TB+ Index Size
500B+ Documents in the Index
1.6 Average Click Rank
The Search Vision
Empower enterprise users to effortlessly find all the information they need in order to be successful with Salesforce.
Intelligent, Fast and Powerful.
Be a competitive differentiator for Salesforce.
What information do you need?
Demo time
1. How is enterprise search different?
Diversity of Data is a Challenge!
Sales Cloud - Structured data (SFA, B2C)
Service Cloud - Unstructured data (Case Mgmt, KB, Field Svc)
Community Cloud - Enterprise Social data (Q&A, Chatter, Files)
App Cloud - Search APIs
Person Search - People data
Diversity of Intentions:
A service agent exploring a community forum to educate himself: Recall
A service agent looking for a case similar to the one she is currently assigned: Precision
A sales rep looking for a named account to call: Precision
A sales rep looking for contacts in an industry within a certain geo: Recall
Patterns of search and discovery differ by user role and by the entity being searched.
Customer diversity: one size doesn’t fit all
Matching models to Customer Orgs: some orgs want a lower coefficient in some cases, while other orgs want a higher one.
2. Understanding the enterprise findability problem
Most ranking functions start off with a few boosts and end up like… this.
Form (a hand-tuned sketch of such a function follows the list):
1. Query-independent signals - multiplicative boosts in the range [1-3]
2. Entity-specific signals - additive boosts in the range [1-12]
a. Accounts, Contacts, Leads - LastActivityScore, LastModifiedScore
b. Cases - CaseStatus, CaseEscalationScore
3. ...
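To make the shape of such a hand-tuned function concrete, here is a minimal Python sketch. The signal names come from the list above, but the normalized inputs (page_views_norm, recency_norm) and the exact combination rule are a hypothetical illustration, not the production equation.

    # Hypothetical hand-tuned ranking function: multiplicative query-independent
    # boosts layered on the base text score, plus additive entity-specific boosts.
    def hand_tuned_score(base_score, doc):
        score = base_score
        # 1. Query-independent signals: multiplicative boosts in the range [1-3]
        score *= 1.0 + min(doc.get("page_views_norm", 0.0), 1.0)   # up to 2x
        score *= 1.0 + 0.5 * doc.get("recency_norm", 0.0)          # up to 1.5x
        # 2. Entity-specific signals: additive boosts in the range [1-12]
        if doc["entity"] in ("Account", "Contact", "Lead"):
            score += doc.get("LastActivityScore", 0.0)   # e.g. 0-6
            score += doc.get("LastModifiedScore", 0.0)   # e.g. 0-6
        elif doc["entity"] == "Case":
            score += doc.get("CaseStatus", 0.0)
            score += doc.get("CaseEscalationScore", 0.0)
        return score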
Getting to a machine-learned function has challenges
Constraint
1. Customers build apps on enterprise search platforms. One cannot simply cut over to a new ranking system.
Key Lessons
2. Understanding the current search equation is key to anticipating customer breakage/impact.
3. It is important to formalize the “Human Intelligence” equation behind a working system.
3. Machine learning and learning-to-rank methods
Build a probabilistic model of relevance.
Chance that the user clicks on the ith record: P(r, q, i) = L(r, q) * Rb(i)
- L(r, q) maps the (record, query) pair to the likelihood that a user clicks on r in response to q
- Rb(i) corrects for the positional bias at rank i
Master Relevance Equation
Goal: learn the best linear function of these variables from queries and clicks.
Logistic Regression
queryId      docPV   score    Clicked
-hnjnxlbxd   0       10.892   1
1ttuuy6n3    5       0.230    0
1ttuuy6n3    0       0.232    0
1ttuuy6n3    0       0.230    0
1ttuuy6n3    0       0.230    1
1ttuuy6n3    0       0.244    0
1ttuuy6n3    0       0.231    0
1ttuuy6n3    6       0.228    0
1ttuuy6n3    5       0.228    0
1ttuuy6n3    0       0.231    0
If this were the data, the simplest approach would be logistic regression:
P(clicked) = sigmoid(a0 + a1*(docPV) + a2*(score))
a1 and a2 capture the incremental effects of docPV and score on relevance; a0 is a bias term that affects all observations equally.
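A minimal sketch of fitting that model on the toy query/click table above; the feature names come from the slide, and scikit-learn stands in for whatever training stack is actually used.

    # Fit P(clicked) = sigmoid(a0 + a1*docPV + a2*score) on the toy table above.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    X = np.array([[0, 10.892], [5, 0.230], [0, 0.232], [0, 0.230], [0, 0.230],
                  [0, 0.244], [0, 0.231], [6, 0.228], [5, 0.228], [0, 0.231]])
    y = np.array([1, 0, 0, 0, 1, 0, 0, 0, 0, 0])   # Clicked column

    model = LogisticRegression().fit(X, y)
    a1, a2 = model.coef_[0]        # incremental effects of docPV and score
    a0 = model.intercept_[0]       # bias term shared by all observations
    print(a0, a1, a2)
    print(model.predict_proba(X)[:, 1])            # P(clicked) per row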
The only thing to change about this is that we want
a separate bias term for each positional rank.
Example: Query/Click Data
For each query we observe the page views and Lucene score of the result shown at each position (1 through 5), and the position that was clicked.
Goal: predict, from these features, which of the five results will be clicked.
Learning
[Figure: a design matrix with one row of features per result (Result 1 … Result 5), a shared weight vector, and a separate bias term b1 … b5 per position.]
● Five logistic regressions with shared weights, but different biases.
● Coefficients and biases are learned via MLE (SGD).
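Below is a minimal numpy sketch of that learning setup: one weight vector shared across positions, a separate bias per position, learned by SGD on the log-loss. The data layout (one row per shown result, with its position and whether it was clicked) is an assumption for illustration.

    # Shared-weight logistic regression with a separate bias per result position.
    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def train(X, pos, y, n_positions=5, lr=0.1, epochs=200):
        """X: (n, d) features; pos: (n,) position 0..4; y: (n,) clicked 0/1."""
        w = np.zeros(X.shape[1])     # weights shared across all positions
        b = np.zeros(n_positions)    # one bias per position (captures rank bias)
        for _ in range(epochs):
            for xi, pi, yi in zip(X, pos, y):
                p = sigmoid(xi @ w + b[pi])
                grad = p - yi        # gradient of the log-loss w.r.t. the logit
                w -= lr * grad * xi
                b[pi] -= lr * grad
        return w, b

    # Toy usage: features are (docPV, Lucene score) for the result at each position.
    X = np.array([[3.0, 1.2], [0.0, 1.1], [1.0, 0.9], [0.0, 0.8], [0.0, 0.7]])
    pos = np.array([0, 1, 2, 3, 4])
    y = np.array([1, 0, 0, 0, 0])
    w, b = train(X, pos, y)
    print("shared weights:", w, "per-position biases:", b)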
Results
---------- ACCOUNTS ----------
Coefficients
-------------------------
docPC 0.203
docPV 0.312
doclm_score 0.642
lastAccessed_score 0.34
score 0.251
Rank Bias
-------------------------
Rank 1 1.0
Rank 2 0.884
Rank 3 0.843
Rank 4 0.788
Rank 5 0.94
The model is much better at predicting which of the five results will be clicked: it detects 50% of occurrences where the 5th result was clicked, and is wrong on 1 out of 8 attempts.
All else being equal, the odds of clicking position 2 are about 0.884 times the odds of clicking position 1.
Opportunities are the subject of more general searches, e.g. “Which opportunities is John Smith working on?”
Searches for accounts or cases are more likely to be very specific, e.g. “I have a specific account in mind…”
Results: Coefficients (normalized by relative influence)
Columns: q1/docPV, q2/docPC, d1/doclm_score, d2/lastaccessed, d3/oppclosedate, d4/oppclosed, d5/caseEscState, d6/caseClosed, Lucene Score
users 3.12 7.24
groups 4.59 4.59
files 1.32 3.74
cases 0.85 0.15 0.49 0.33 1.21
leads 2.01 2.30 1.01 1.62
contacts 1.87 1.42 1.15 0.76 2.76
accounts 1.64 1.07 1.65 0.96 4.17
oppy 2.27 0.53 0.53 0.62 0.81 3.64
kb 0.50 0.32 2.00
● Lucene score is always most important (except for leads)
● LastModified is extremely important for accounts and leads, but not at all for cases.
4. Implementing the model representation in Solr
Relevance Metadata JSON format
{
  "schema": 1.0,
  "global": {
    "pc_s": 1.5,      // Boost Parent/Child scores by 1.5
    "pv_s": 2.0,      // Boost PageView counts by 2
    "lm_s": 1.333     // Last Modified
  },
  "entity": {
    "500": {          // Specific to Cases (key prefix 500)
      "cc_s": 4.0,    // Boost open Cases
      "lm_s": 1.0     // Apply a different boost for Last Modified
    }
  }
}
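As a rough illustration of how such metadata might be consumed at query time, here is a sketch that merges the entity-specific overrides over the global coefficients and applies them as multiplicative boosts; both the merge rule and the boost-to-the-signal combination are assumptions, not the documented Solr behavior.

    # Hypothetical consumer of the Relevance Metadata JSON above: per-entity
    # coefficients override the global ones; each coefficient is applied as a
    # multiplicative boost scaled by the document's normalized signal value.
    import json

    RMD = json.loads('{"schema": 1.0,'
                     ' "global": {"pc_s": 1.5, "pv_s": 2.0, "lm_s": 1.333},'
                     ' "entity": {"500": {"cc_s": 4.0, "lm_s": 1.0}}}')

    def boosted_score(base_score, signals, key_prefix):
        coeffs = dict(RMD["global"])
        coeffs.update(RMD["entity"].get(key_prefix, {}))   # entity overrides win
        score = base_score
        for name, coeff in coeffs.items():
            # boost**signal: signal 0 leaves the score unchanged, 1 applies the full boost
            score *= coeff ** signals.get(name, 0.0)
        return score

    # A Case (key prefix 500) with many page views and an open status.
    print(boosted_score(1.0, {"pv_s": 1.0, "cc_s": 1.0, "lm_s": 0.2}, "500"))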
Relevance Metadata (RMD)
Relevancy coefficients are expressed in a JSON data structure, so that we can easily specify per-entity or global-to-org coefficients.

Relevance Model - example RMD JSON:
{
  schemaVersion: "V1",
  orgId: "00D1234567",
  Account: {
    PV: 2.1,
    LastMod: 0.5
  }, ...
  QIR: "Solr",
  DBRerank: "CoreApp"
}

Model Builder (offline) - solutions to help us (Devs / PMs etc.) build the model JSON files.
Model Store - the JSON is stored in a blob field in Setup BPO or an HBase table; changes to the format / schema won't affect the table (it's just a blob).
AB Experiment - Name, Org, Params (JSON). The same JSON is used to run the A/B experiment and eventually deploy into production.
Model Deploy - Org, Params (JSON).
Querying - pass the same JSON to the query layer and the Solr server; each should have code that knows what to do given the JSON. Pass the entire JSON to Solr, or just the boost function.
5. Stacking base and custom models
Recap: one size doesn’t fit all
Cluster orgs based on their ACR (average click rank) response curves. Three distinct clusters are observed in hierarchical clustering:
● Green orgs are hurt badly by increasing coefficient changes
● Blue orgs are hurt badly by decreasing coefficient changes
● Red orgs are hurt badly either way
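A minimal sketch of how such a clustering might be produced, assuming each org's ACR response curve is summarized as a vector of ACR deltas over a sweep of coefficient values; the data layout here is hypothetical.

    # Hierarchical clustering of orgs by their ACR response curves.
    # Each row is one org; each column is the change in average click rank (ACR)
    # observed when a relevance coefficient is swept to a given value.
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    acr_deltas = np.array([
        [0.0, 0.1, 0.4],   # hurt when the coefficient increases
        [0.5, 0.1, 0.0],   # hurt when the coefficient decreases
        [0.4, 0.0, 0.5],   # hurt either way
        [0.0, 0.2, 0.5],
    ])

    Z = linkage(acr_deltas, method="ward")        # agglomerative clustering
    labels = fcluster(Z, t=3, criterion="maxclust")
    print(labels)                                 # cluster id per org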
Stacking models
RELEVANCE PIPELINE
The base model (all orgs, all entities) is stacked with per-entity models driven by entity signals (Accounts Model, Case Model, Knowledge Article Model, Feeds Model, ...), which are in turn stacked with per-org models (Org 1, Org 2, ..., Org n).
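The slide doesn't spell out how the layers combine, so the sketch below assumes the simplest interpretation: coefficient dictionaries where entity-level and org-level models override the base model's coefficients.

    # One plausible reading of "stacking": the base model supplies default
    # coefficients, entity models override them per entity, and org models
    # override them per org. The combination rule here is an assumption.
    BASE_MODEL   = {"pv": 2.0, "lm": 1.3, "lucene": 4.0}
    ENTITY_MODEL = {"Case": {"lm": 1.0, "caseEsc": 0.5}}
    ORG_MODEL    = {"00D1234567": {"Case": {"pv": 0.8}}}

    def resolve_coefficients(org_id, entity):
        coeffs = dict(BASE_MODEL)                                  # layer 1: base
        coeffs.update(ENTITY_MODEL.get(entity, {}))                # layer 2: entity
        coeffs.update(ORG_MODEL.get(org_id, {}).get(entity, {}))   # layer 3: org
        return coeffs

    print(resolve_coefficients("00D1234567", "Case"))
    # {'pv': 0.8, 'lm': 1.0, 'lucene': 4.0, 'caseEsc': 0.5}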
Putting the pieces together: Relevance ML Pipeline and Runtime
Relevance ML Pipeline
RELEVANCE PIPELINE
Common representation of the ranking model, plus infrastructure to automate training, A/B testing and deployment.
FEATURE DETAILS
● Config-driven model deployment
● Automated model generation
● Training and A/B experimentation
Pipeline flow (LEARN, TEST, SHIP): search query logs feed model building and training on the ML training infrastructure; trained models are emitted in the common RMD JSON relevance model representation; candidate models go through model evaluation and A/B experiments; the winning RMD JSON is then deployed to the relevance runtime, where both the Core App and the Solr ranker consume it.
Relevance Runtime Infrastructure
RELEVANCE RUNTIME
Executes machine learning models as a service in Solr, at scale.
FEATURE DETAILS
● Ranking functions in Solr (linear and non-linear)
● Support for org- and entity-specific models
● Query Understanding
Runtime flow: enterprise data from the Salesforce cloud is indexed, and training data signals (interaction and behavior: clicks, likes, mentions) feed the machine-learning pipeline. At query time, query understanding (NLP, Q&A) interprets the query/intent; a Level 1 top-K ranker retrieves candidates from the index using TF/IDF; a Level 2 ML ranker model re-ranks them using feature engineering over content, users/actions, and query/intent; and a Level 3 post-ranking model, together with snippet generation, produces the final search results.
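To make the three-level flow concrete, here is a runnable toy sketch; the function names, features and data are placeholders for illustration, not the actual Salesforce runtime API.

    # Hypothetical three-level ranking flow, reduced to a runnable toy:
    # Level 1: cheap lexical top-K; Level 2: ML re-rank over richer features;
    # Level 3: post-ranking and snippet generation.
    def level1_top_k(docs, query, k=3):
        def text_score(doc):                       # stand-in for TF/IDF / BM25
            terms = query.lower().split()
            return sum(doc["text"].lower().count(t) for t in terms)
        return sorted(docs, key=text_score, reverse=True)[:k]

    def level2_ml_rerank(candidates, coeffs):
        def ml_score(doc):                         # stand-in for the ML ranker model
            return coeffs["pv"] * doc["page_views"] + coeffs["lm"] * doc["recency"]
        return sorted(candidates, key=ml_score, reverse=True)

    def level3_post_rank(ranked, query):
        return [{"id": d["id"], "snippet": d["text"][:60]} for d in ranked]

    docs = [
        {"id": 1, "text": "Acme account renewal", "page_views": 10, "recency": 0.9},
        {"id": 2, "text": "Acme support case escalation", "page_views": 40, "recency": 0.2},
        {"id": 3, "text": "Quarterly report", "page_views": 5, "recency": 0.5},
    ]
    results = level3_post_rank(
        level2_ml_rerank(level1_top_k(docs, "acme"), {"pv": 0.1, "lm": 2.0}), "acme")
    print(results)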
Thank you
We are Hiring!
ML Engineers, Engineering Managers, Software Engineers, Data Scientists
Join Salesforce Search Cloud - Mining Intent @ Work
Results: Positional Bias
Opportunities are the subject of more general searches, e.g. “Which opportunities is John Smith working on?”
Searches for accounts or cases are more likely to be very specific, e.g. “I have a specific account in mind…”