1. Using AI To Provide Insights And
Recommendations From Activity Data
Alexis Roos
Director of Machine Learning
@alexisroos
2. Agenda
• Introduction
• Activity & AI
• Building Classifiers
• Generating Insights
• Demo
• Relationship Insights
• Wrap up and QAs
3. This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties materialize or if any of the
assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results expressed or implied by the forward-looking statements we
make. All statements other than statements of historical fact could be deemed forward-looking, including any projections of product or service availability, subscriber
growth, earnings, revenues, or other financial items and any statements regarding strategies or plans of management for future operations, statements of belief, any
statements concerning new, planned, or upgraded services or technology developments and customer contracts or use of our services.
The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new functionality for our service, new
products and services, our new business model, our past operating losses, possible fluctuations in our operating results and rate of growth, interruptions or delays in
our Web hosting, breach of our security measures, the outcome of any litigation, risks associated with completed and any possible mergers and acquisitions, the
immature market in which we operate, our relatively limited operating history, our ability to expand, retain, and motivate our employees and manage our growth, new
releases of our service and successful customer deployment, our limited history reselling non-salesforce.com products, and utilization and selling to larger enterprise
customers. Further information on potential factors that could affect the financial results of salesforce.com, inc. is included in our annual report on Form 10-K for the
most recent fiscal year and in our quarterly report on Form 10-Q for the most recent fiscal quarter. These documents and others containing important disclosures are
available on the SEC Filings section of the Investor Information section of our Web site.
Any unreleased services or features referenced in this or other presentations, press releases or public statements are not currently available and may not be
delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that are currently available.
Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements.
Statement under the Private Securities Litigation Reform Act of 1995
Forward-Looking Statement
4. Doing Well and Doing Good
#1 World’s Most
Innovative Companies
Best Places to Work
for LGBTQ Equality
#1 The World’s Best
Workplaces
#1 Workplace for
Giving Back
#1 Top 50 Companies
that Care
The World’s Most
Innovative Companies
#1 The Future 50
5. Salesforce Keeps Getting Smarter with Einstein
Guide Marketers
Einstein Engagement Scoring
Einstein Segmentation (pilot)
Einstein Vision for Social
Assist Service Agents
Einstein Bots (pilot)
Einstein Agent (pilot)
Einstein Vision for Field Service (pilot)
Coach Sales Reps
Einstein Forecasting (pilot)
Einstein Lead & Opportunity Scoring
Einstein Activity Capture
Advise Retailers
Einstein Product Recommendations
Einstein Search Dictionaries
Einstein Predictive Sort
Empower Admins & Developers
Einstein Prediction Builder (pilot)
Einstein Vision & Language
Einstein Discovery
Help Community Members
Einstein Answers (pilot)
Community Sentiment (pilot)
Einstein Recommendations
Austin Buchan
CEO, College Forward
6.
7. Agenda
• Introduction
• Activity & AI
• Building Classifiers
• Generating Insights
• Demo
• Relationship Insights
• Wrap up and QAs
8. Activities are the R in CRM
• Timestamped data
Emails, meetings, tasks, phone calls, etc
• User centric
User is the initiator or owner
• Defines relationship
Who is connected to whom, frequency of touchpoints, reciprocity, meetings
• High volume, high potential, extremely rich
Lots of traffic, content contains lots of information and signals
• Historical context is important
Length of relationship, how the relationship evolves over time
• Can be used for many different use cases
Email Insights, timelines, opportunity scoring, search, etc
9. Platform challenges
• Large scale distributed real time streaming platforms are hard!
Unique use case, multiple products and services
• Large volume of activities
Tens of thousands of orgs connected through Inbox and Einstein Activity Capture
• Automatic capture really important
Maintains high fidelity
• Generate accurate intelligence
Some events are rare
• Speed
Emails must be processed within seconds
• Data privacy and security
Security, auditable access, data retention, privacy, GDPR, etc
10. Augment CRM experience using AI and activity
• Automatic activity capture
Emails, meetings, tasks, calls, etc
• Extract Insights
Email Insights: Pricing discussed, Executive involved, Scheduling Requested, etc.
• Suggest Action(s)
Reply with price list, Insert free time, Involve Executive, etc
• Generate Context
Contextual services: Recommended connections, Best time to email, Suggested recipients, etc.
• Surfaced in AI Inbox, Timelines, Other Salesforce Apps, …
11. Agenda
• Introduction
• Activity & AI
• Building Classifiers
• Generating Insights
• Demo
• Relationship Insights
• Wrap up and QAs
12. Challenge 1: get to relevant emails
● Right language
● Automated vs non automated
● Inbound / outbound
● Within or outside the organization
● etc
13. Challenge 2: structure of an email
HEADER INFORMATION ...
INTRO
Hey Alexis,
BODY
Let’s meet with Ascander on Friday to discuss the $10,000/year rate. Ascander’s phone number is (123) 456-7890.
SIGNATURE
Thanks,
Noah Bergman
Engineer at Salesforce
(123) 456-7890
CONFIDENTIALITY NOTICE
The contents of this email and any attachments are confidential and are intended solely for addressee…
REPLY CHAIN
From: Alexis alexis@salesforce.com
Date: April 1, 2017
Subject: Important Document
Noah, how much does your product cost?
14. Challenge 3: many insights require labeling data
• No labels, and currently no mechanism to infer labels
• Pricing discussions are important, but relatively rare events
How can we get a higher yield of positive labels when labeling by hand?
-> Use filters + Word2Vec
. Train Word2Vec on unlabeled email
. Find words close in distance to “price”, “cost”, “license”, etc
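A minimal sketch of the Word2Vec step above, assuming tiny hand-made vectors in place of a model trained on unlabeled email. The `EMBEDDINGS` table, `expand_seeds`, `worth_labeling`, and the 0.9 similarity cutoff are all hypothetical; the point is only that words near the seed terms widen a keyword filter, so labelers see a higher share of candidate pricing emails.

```python
import math

# Hypothetical 3-dimensional embeddings; a real Word2Vec model trained on
# unlabeled email would supply much longer vectors.
EMBEDDINGS = {
    "price":   [0.90, 0.10, 0.00],
    "cost":    [0.85, 0.15, 0.05],
    "quote":   [0.80, 0.20, 0.10],
    "license": [0.70, 0.30, 0.00],
    "lunch":   [0.05, 0.90, 0.20],
    "friday":  [0.00, 0.20, 0.95],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def expand_seeds(seeds, embeddings, min_similarity=0.9):
    """Return the seeds plus every vocabulary word close to any seed."""
    known_seeds = [s for s in seeds if s in embeddings]
    expanded = set(seeds)
    for word, vec in embeddings.items():
        if any(cosine(vec, embeddings[s]) >= min_similarity for s in known_seeds):
            expanded.add(word)
    return expanded

def worth_labeling(email_body, keywords):
    """High-recall filter: keep any email that mentions an expanded keyword."""
    tokens = {t.strip(".,!?$").lower() for t in email_body.split()}
    return bool(tokens & keywords)

keywords = expand_seeds({"price", "cost"}, EMBEDDINGS)
```

With these toy vectors the expanded set picks up "quote" and "license", while "lunch" and "friday" stay out.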
16. Word Embedding (e.g., Word2Vec) for Feature Generation
● Word embeddings for individual tokens capture semantics.
● Aggregating word embeddings provides a powerful vectorized representation for a body of text (e.g., email).
● Aggregated word embeddings are incorporated as part of the feature vector used in machine learning model training.
Unsupervised Learning for Better Representation of Text
Word Vectors Machine Learning
Models
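One simple way to aggregate word embeddings is to average them, sketched below. The slide does not say which aggregation is used, so the mean here is an assumption, and `average_embedding` is a hypothetical helper name.

```python
def average_embedding(tokens, embeddings, dim):
    """Mean of the vectors of all known tokens; zeros if none are known."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return [0.0] * dim
    # zip(*vectors) walks the vectors column by column.
    return [sum(column) / len(vectors) for column in zip(*vectors)]

# Toy 2-dimensional vocabulary, for illustration only.
toy = {"price": [1.0, 0.0], "list": [0.0, 1.0]}
email_vector = average_embedding(["price", "list", "please"], toy, dim=2)
```

Unknown tokens such as "please" are skipped, so the result stays a fixed-length vector regardless of email length.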
17. Latent Dirichlet Allocation (LDA)
A document is a probability distribution over topics
Boeing: mixture of topics 4 and 5
Air Force One: mixture of topics 1 and 5
https://databricks.com/blog/2015/09/22/large-scale-topic-modeling-improvements-to-lda-on-apache-spark.html
-> Use entire topic distribution in the feature vector
18. Generating feature vectors and model training
Emails + Labeled Training Data -> Feature Engineering (Text Processing / TF-IDF, LDA, Word2Vec) -> Model Training -> Model Evaluation
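The feature engineering stage can be pictured as concatenating the three representations into one vector. The sketch below assumes the plain textbook TF-IDF weighting (tf * log(N / df)) and takes the LDA topic distribution and the aggregated embedding as given inputs; `tfidf_vectors` and `build_feature_vector` are hypothetical names, and the actual pipeline may weight or combine features differently.

```python
import math
from collections import Counter

def tfidf_vectors(docs, vocabulary):
    """Plain tf * log(N / df) weights for each tokenized document."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = max(len(doc), 1)  # avoid dividing by zero on empty emails
        vectors.append([
            (counts[t] / length) * math.log(n_docs / df[t]) if df[t] else 0.0
            for t in vocabulary
        ])
    return vectors

def build_feature_vector(tfidf, topic_distribution, mean_embedding):
    """Concatenate TF-IDF weights, the full LDA topic distribution,
    and the aggregated word embedding."""
    if abs(sum(topic_distribution) - 1.0) > 1e-6:
        raise ValueError("topic distribution must sum to 1")
    return list(tfidf) + list(topic_distribution) + list(mean_embedding)

docs = [["price", "list"], ["lunch", "friday"]]
tfidf = tfidf_vectors(docs, vocabulary=["price", "lunch"])
# The topic distribution and embedding values below are made up.
features = build_feature_vector(tfidf[0], [0.7, 0.3], [0.1, 0.2])
```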
20. Feeding Deep Learning Model with Word Embedding
Training Recurrent Neural Networks with LSTM for State-of-the-Art
● LSTM networks are capable of capturing subtleties in natural human language.
● Using pre-trained word embeddings reduces the demand for large quantities of labeled data.
● The combination opens up possibilities of advanced intelligence beyond classification.
21. Agenda
• Introduction
• Activity & AI
• Building Classifiers
• Generating Insights
• Demo
• Relationship Insights
• Wrap up and QAs
22. How We Generate Insights from Activity
● Each classifier runs and extracts metadata
○ Scheduling: the date you want the meeting
○ Out of Office: the person’s return date
○ Executive Involved: the name and title of the exec
● Assign Actions based on which classifier is true
○ Create event on <Date>
○ Reply with Template
○ View Profile for <Exec>
● Classifier + Extracted Metadata + Actions = Insight
○ Classifier: “Scheduling Requested”
○ Extracted Metadata: “next Tuesday”
○ Actions: Create Event, Send Times, View Calendar
● Do it all in < 2 seconds
23. Collect and Filter
● Gather as much activity as possible
● Filter out spam, marketing e-mails, etc.
Collect Filter
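A filter for automated mail can often be decided from headers alone. The sketch below is an assumption about what such a heuristic could look like, not the filters used in the talk: it checks the standard `Auto-Submitted` header, the commonly used `Precedence` header, and `List-Unsubscribe`. The function name `is_automated` is hypothetical.

```python
from email import message_from_string

def is_automated(raw_message):
    """Cheap header-only check for auto-replies and bulk or list mail."""
    msg = message_from_string(raw_message)
    # Auto-Submitted is "no" (or absent) for mail written by a person.
    auto_submitted = (msg.get("Auto-Submitted") or "no").strip().lower()
    if auto_submitted != "no":
        return True
    precedence = (msg.get("Precedence") or "").strip().lower()
    if precedence in {"bulk", "list", "junk"}:
        return True
    # Mailing lists and marketing tools usually advertise an unsubscribe link.
    return msg.get("List-Unsubscribe") is not None

out_of_office = "From: a@example.com\nAuto-Submitted: auto-replied\nSubject: OOO\n\nI am away."
human = "From: a@example.com\nSubject: Pricing\n\nHow much does it cost?"
```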
24. Score and Extract
● The Spark Structured Streaming “Scoring Pipeline” portion you saw earlier
● Identifies which E-mails contain important moments
● Extracts the relevant metadata
Filtered
Email
Stream
Scheduling
Requested
Pricing
Discussed
etc...
SCHED_REQ
date: 2018-04-04
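To illustrate the shape of the score-and-extract step, here is a toy stand-in that runs independent classifiers over one email and collects one fact per classifier that fires. The keyword rules below are placeholders for trained models, a thread pool replaces the streaming job, and the `PRICING` label is hypothetical (only `SCHED_REQ` appears on the slide).

```python
import re
from concurrent.futures import ThreadPoolExecutor

def scheduling_requested(body):
    """Placeholder rule: mentions meeting and contains an ISO date."""
    match = re.search(r"\b(\d{4}-\d{2}-\d{2})\b", body)
    if "meet" in body.lower() and match:
        return {"classifier": "SCHED_REQ", "date": match.group(1)}
    return None

def pricing_discussed(body):
    """Placeholder rule: mentions a pricing keyword."""
    if re.search(r"\b(price|pricing|cost)\b", body, re.IGNORECASE):
        return {"classifier": "PRICING"}
    return None

CLASSIFIERS = [scheduling_requested, pricing_discussed]

def score(body):
    """Run every classifier on the email body and keep the facts that fired."""
    with ThreadPoolExecutor(max_workers=len(CLASSIFIERS)) as pool:
        results = pool.map(lambda classify: classify(body), CLASSIFIERS)
        return [fact for fact in results if fact is not None]

facts = score("Can we meet on 2018-04-04 to talk pricing?")
```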
25. Make Insights Actionable
● Each Insight Type has actions or “next steps”
● Actions consume the extracted metadata and make it readily available for clients
SCHED_REQ
date: 2018-04-04
Insight
Publisher
(Context-Free)
Scheduling
Requested
● Create Event
● Send Times
● View Calendar
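The publishing step can be sketched as a lookup from classifier to actions, combined with the extracted metadata. The `Fact` and `Insight` types, the `publish` function, and the mapping table are hypothetical names; the action list for `SCHED_REQ` is the one shown on the slide.

```python
from dataclasses import dataclass, field

# Actions per insight type; only SCHED_REQ is taken from the slide.
ACTIONS_BY_CLASSIFIER = {
    "SCHED_REQ": ["Create Event", "Send Times", "View Calendar"],
}

@dataclass
class Fact:
    classifier: str
    metadata: dict = field(default_factory=dict)

@dataclass
class Insight:
    classifier: str
    metadata: dict
    actions: list

def publish(fact):
    """Classifier + extracted metadata + actions = insight."""
    actions = ACTIONS_BY_CLASSIFIER.get(fact.classifier, [])
    return Insight(fact.classifier, dict(fact.metadata), list(actions))

insight = publish(Fact("SCHED_REQ", {"date": "2018-04-04"}))
```

Keeping the mapping in a table means a new insight type only needs a new entry, not a change to the publisher itself.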
26. Agenda
• Introduction
• Activity & AI
• Building Classifiers
• Generating Insights
• Demo
• Relationship Insights
• Wrap up and QAs
28. Agenda
• Introduction
• Activity & AI
• Building Classifiers
• Generating Insights
• Demo
• Relationship Insights
• Wrap up and QAs
29. AI & Context
What do all those apps have in common? User context
Data + Algorithms + Compute = Killer Apps
Google
30. Consumer vs Enterprise Context
User isn’t the product but the customer
• Retention, privacy, GDPR, security, auditing, etc
Context has to be scoped
• Cannot be used globally: organization, team, user levels
Very rich
• Goes way beyond user context: organizations/groups/teams, products and services, companies,
different types of activities across many different products, etc
Very dynamic
• Fast-arriving data with lots of interaction points
31. Context enables us to deliver deeper insights.
Go beyond a single email when classifying and recommending actions
This sender looks familiar; how well should I know him / her?
• Are we strongly connected? Is he or she important to my accounts or opportunities? etc
Is this email discussing products or services that my company sells?
Is this email discussing competitors?
Who, in my org, can help me sell to an individual or company?
• Supply relevant background information on a particular individual or company
• Identify who is the key decision maker
• Give me historical information for that individual or company
• Make an introduction for me
etc
32. A graph is an efficient means for encoding relationships.
An org can have thousands of contacts
• These contacts exist within the org itself (e.g.,
sales rep, account exec)
• Perhaps more importantly, contacts extend
beyond the org (e.g., buyers)
That same org can have millions of
events per week
• Events (e.g., meetings, emails, phone calls)
connect contacts and indicate a relationship
• The number and nature of events between
contacts can indicate strength of connection /
relationship
15 Jan Email - Sylvia to Andrea: introduction
20 Jan Meeting - Created by Andrea with Sylvia
31 Jan Email - Andrea to Sylvia & Mark: info request
01 Feb Email - Sylvia to Andrea & Mark: product info
04 Feb Email - Andrea to Sylvia & Joe
17 Feb Meeting created by Andrea with Alex and Joe
…
Andrea
Buyer
Mark
Evaluator
Alex
Sponsor
Joe
Acct mngr
Sylvia
Sales
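The event log above can be turned into a weighted contact graph by counting shared events per pair of contacts. This is a simplified sketch: edges are undirected and every event counts as 1, which is an assumption (a real graph could use directed edges and weigh emails and meetings differently). `build_graph` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations

# Events from the slide: (date, kind, participants).
EVENTS = [
    ("15 Jan", "email",   ["Sylvia", "Andrea"]),
    ("20 Jan", "meeting", ["Andrea", "Sylvia"]),
    ("31 Jan", "email",   ["Andrea", "Sylvia", "Mark"]),
    ("01 Feb", "email",   ["Sylvia", "Andrea", "Mark"]),
    ("04 Feb", "email",   ["Andrea", "Sylvia", "Joe"]),
    ("17 Feb", "meeting", ["Andrea", "Alex", "Joe"]),
]

def build_graph(events):
    """Map each alphabetically ordered pair of contacts to the number
    of events they share."""
    weights = Counter()
    for _date, _kind, participants in events:
        for a, b in combinations(sorted(set(participants)), 2):
            weights[(a, b)] += 1
    return weights

graph = build_graph(EVENTS)
```

In this toy graph Andrea and Sylvia share five events, which makes theirs the strongest edge.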
33. Recommended Connections
Name: Joe Roos
Email: joe@salesforceuser.com
Title: Account manager
Company: Salesforce user
Recommended Connections:
{(Joe, 7.1), (Sylvia, 5), (…)}
Problem: John, a salesperson, needs more
information to polish his strategy with Andrea, an
important lead.
To whom should he turn?
Solution: recommended connections uses the
contact graph to identify Joe as the best person to
turn to for that information. Joe already knows
Andrea and has shared connections.
John
Sales Rep
Andrea
Buyer
Mark
Evaluator
Alex
Sponsor
Joe
Acct
mngr
Sylvia
Sales
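A minimal sketch of the lookup behind recommended connections: rank the requester's colleagues by how strongly each is connected to the target. The edge weights reuse the scores shown on the slide purely as illustrative values; how those scores are actually computed is not stated, and `recommend_connections` is a hypothetical name.

```python
# Illustrative connection strengths between colleagues and the target.
STRENGTH = {
    frozenset(("Joe", "Andrea")): 7.1,
    frozenset(("Sylvia", "Andrea")): 5.0,
}

def recommend_connections(strength, requester, target, colleagues):
    """Colleagues already connected to the target, strongest first."""
    scored = []
    for person in colleagues:
        if person == requester:
            continue
        weight = strength.get(frozenset((person, target)), 0.0)
        if weight > 0:
            scored.append((person, weight))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

recommended = recommend_connections(
    STRENGTH, requester="John", target="Andrea",
    colleagues=["John", "Sylvia", "Joe"],
)
```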
34. Coupled with AI models, our graph delivers Contextual Services.
ContextGraph Models
• Pricing discussed
• Scheduling requested
• Exec involved
• etc.
• Identify hot leads
• Best time to email
• Recommend connections
• Updated contact info notification
• Suggest recipients, or rooms, for meetings
• Identify contact’s role: economic buyer, evaluator, influencer, etc.
• Relationship with contact: e.g., strength of connection, communication topics
Who is a particular email from and why should I care?
Role, latest communication, meeting history, mutual
friends, contact info, etc.
35. Graph and ML/Deep Learning
Context free insights
• E.g., pricing discussed vs pricing request: does not require context of products or services
• Allows composition: pricing discussed can be combined with product mention or contact strength
AI insights enrich the graph
• Feedback
DL on graph
• Ex: DeepWalk to find who is an influencer, colleagues, similar profiles, etc
DL on Graph: convnets, RNNs
• Challenges: non-Euclidean data, invariance (node ordering), non-fixed (dynamic), directed, etc
Still an area of research
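DeepWalk starts by sampling truncated random walks over the graph and then treats each walk like a sentence for a skip-gram model. The sketch below covers only the walk-sampling step, on a hypothetical three-contact adjacency list; the embedding training is omitted.

```python
import random

def random_walks(adjacency, walks_per_node, walk_length, seed=0):
    """Sample fixed-length random walks starting from every node."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(adjacency)
        rng.shuffle(nodes)  # visit start nodes in a fresh order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:
                    break  # dead end: keep the shorter walk
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks

# Hypothetical contact graph as an adjacency list.
ADJACENCY = {
    "Andrea": ["Sylvia", "Joe"],
    "Sylvia": ["Andrea"],
    "Joe": ["Andrea"],
}
walks = random_walks(ADJACENCY, walks_per_node=2, walk_length=4)
```

Contacts that appear in similar walks end up with similar embeddings, which is what makes the vectors useful for finding similar profiles.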
37. Agenda
• Introduction
• Activity & AI
• Building Classifiers
• Generating Insights
• Demo
• Relationship Insights
• Wrap up and QAs
38. Take aways
• Activity data can tremendously reinforce Salesforce applications
• Context changes the meaning of data
• Streaming, batch, ML and Graph are complementary
• Privacy has a wide impact
• Large scale activity processing and near real time
Welcome everyone. My name is Alexis Roos, I am Director of Machine Learning at Salesforce on the Sales Cloud team, and I am here to discuss how we are building insights and recommendations from activity data.
This presentation will cover some future developments, and Salesforce being a public company, I have to remind you to only make purchasing decisions based on products that are commercially available today from Salesforce.
Salesforce offers a comprehensive customer success platform that allows our users to deliver a unique experience.
We are the fastest growing, top 5 enterprise software company, and are proud to have been called “Innovator of the Decade” last year and named
one of Fortune’s “Best Place to Work” nine years in a row.
We are over $10B in revenue for FY18 and it is thanks to our customers and partners that we could achieve this success.
Our customer success platform is the World’s #1 CRM and covers the spectrum of our users’ customer interactions across sales, marketing, service, commerce, and community, with additional capabilities in the IoT, Applications, and Analytics clouds.
Over the last few years, Salesforce has embarked on adding AI right into our platform and all our applications to deliver the World’s Smartest CRM. Salesforce has made significant investments in intelligence-related companies, ranging from analytics to Machine Learning and Deep Learning.
Using our customer data and a deep understanding of the customer experience, we are working on making all of our applications smarter across our clouds through our Einstein initiative.
Today, we are specifically going to discuss what we are doing in Sales Cloud related to activity data.
I would like to play a video introducing Salesforce Inbox.
I will start with Salesforce introduction and I will then cover why context matters for AI and why use graph. Noah will then discuss design principles for building the graph and best practices and lessons learned.
An Activity can be broadly defined as any user interaction event with a timestamp.
Activities are either originated or consumed by the user. Typically, they are point to point with a customer.
They indicate relationship strength with customers.
They help answer who is connected to whom.
Frequency, duration, and reciprocity are all important signals indicating the health and strength of the relationship.
High-potential, extremely rich dataset. Salespeople send hundreds of emails every day and have tens of meetings and calls. Very rich in signals.
Recency important but historical context is always relevant
However, dealing with Activities is challenging
Some of the challenges of working with Activities in our context.
The first challenge stems from our unique use cases and space.
Very few companies in the world process 100s of millions of events in real time over a highly distributed system of multiple extractors and evaluators.
High reliability and on-demand access are a must. The same platform powers Inbox, EAC, and other products, meaning it is extensible.
On top of that, Activities are high-volume events, which creates scale and performance challenges.
Automatic data capture of Activities is really important. Your relationships and network are not what you say they are; they are what actually happened.
People make mistakes and forget to log activities. In order to generate accurate insights, data must be accurate and reliable.
When using ML to derive insights, the rarity of some events is a huge concern.
On-time insights greatly increase user efficiency and effectiveness. Over time, the number of insights will grow, but insight delivery time should stay constant.
A side effect of being high potential, very rich datasets is that activity data needs to be protected, encrypted and handled really carefully. Contains PII.
Plus data retention and GDPR regulations mandate storage constraints.
Now we move on to how we solve these problems.
My team is working on data science using activity data such as emails, meetings, phone calls, tasks, etc.
We have developed automated activity capture, which lets us capture and federate all activities for a user or across users working on an opportunity.
Using that data and AI, we can make all Salesforce applications smarter.
Using AI we can make Salesforce Inbox smarter.
We have started by focusing on emails and extracting relevant insights in emails such as pricing discussed, scheduling request or executive involved.
We surface those insights in our AI Inbox and they will soon be available in Sales cloud UI and other applications.
This allows our users to stay on top of what matters; which is for instance identifying an important pricing discussion from a top customer.
In addition to surfacing insights, we suggest what actions our users should take next and we track direct and indirect feedback from our users to keep improving the experience.
Lastly, we organize historical activities into contextual information; and use it to reinforce models and improve recommendations as I will discuss in next few slides.
Let’s examine what a typical email might look like
May contain several parts:
Headers (incl. Auto-Reply)
Greeting
Body
Signature
Confidentiality notice
Reply chain [noise; might duplicate insights]
or none of these at all!
Filters may need access to header information (eg: Auto-Reply headers)
Classifiers usually operate on the body portion.
Once we’ve isolated the body, the E-mail can be classified
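A rough sketch of isolating the body with line-based heuristics, assuming English sign-offs and a few common reply-chain markers. The patterns and the name `extract_body` are hypothetical; real email is messier, and production segmentation would need far more cases.

```python
import re

REPLY_HEADER = re.compile(r"^(From:|On .+ wrote:|-----Original Message-----)", re.IGNORECASE)
SIGN_OFF = re.compile(r"^(thanks|regards|best|cheers)[,!.]?$", re.IGNORECASE)
NOTICE = re.compile(r"^the contents of this email", re.IGNORECASE)
GREETING = re.compile(r"^(hi|hey|hello|dear)\b", re.IGNORECASE)

def extract_body(text):
    """Drop the greeting, then keep lines until a sign-off, notice, or reply chain."""
    body = []
    for line in text.splitlines():
        stripped = line.strip()
        if REPLY_HEADER.match(stripped) or NOTICE.match(stripped) or SIGN_OFF.match(stripped):
            break
        # Only skip short opening lines such as "Hey Alexis,".
        if not body and GREETING.match(stripped) and len(stripped.split()) <= 3:
            continue
        if stripped:
            body.append(stripped)
    return " ".join(body)

sample = """Hey Alexis,
Let's meet with Ascander on Friday to discuss
the $10,000/year rate. Ascander's phone
number is (123) 456-7890.
Thanks,
Noah Bergman
Engineer at Salesforce
From: Alexis alexis@salesforce.com
Noah, how much does your product cost?"""
body = extract_body(sample)
```

Stopping at the reply chain matters because the quoted question about cost would otherwise trigger the same insight a second time.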
So a big question is, how can we get a higher yield of positive labels when labeling by hand?
Here is some synthetic data that we’ve generated in order to help illustrate this idea.
Here you see that there are a lot more negative labels than positive labels.
What if we can zoom in on the region of space around the positive labels?
CLICK
If we think of the green circle as the decision boundary of a classifier, then this classifier has really high recall
In some situations, it might be really difficult to separate the positive and negative labels, but significantly easier to build a high-recall classifier, which we can use as a filter
Labelers then only look at points inside the green circle
Our data labeling pipeline has a filtering step prior to shipping data off to labelers using our in house labeling tool.
This involves a number of steps where we remove emails such as Mass Email or marketing emails.
We also use sophisticated sampling to improve recall and address the problem mentioned on the previous slide
Now, LDA models each document as a probability distribution over topics
So, when we score a new document, we get back a distribution over topics
CLICK
For instance, if we were to score a Wikipedia article about Boeing, we would likely get back a probability distribution composed primarily of topics 4 and 5
Similarly…
CLICK
We have a pipeline for generating feature vectors and training models, and today we’re going to focus on how we use LDA to generate a piece of our feature vector
Our models are trained with a variety of NLP techniques – depends on the model
Models are optimized for precision
False positives make users lose trust
We would rather have lower recall and higher precision
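One common way to act on that preference is to pick the score threshold from held-out data so that precision stays above a floor, accepting whatever recall remains. This is a generic sketch, not the procedure used by the team; the function names, scores, and labels are made up.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    predicted = [s >= threshold for s in scores]
    tp = sum(p and y for p, y in zip(predicted, labels))
    fp = sum(p and not y for p, y in zip(predicted, labels))
    fn = sum((not p) and y for p, y in zip(predicted, labels))
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def pick_threshold(scores, labels, min_precision):
    """Lowest threshold (hence highest recall) that meets the precision floor."""
    for threshold in sorted(set(scores)):
        precision, _ = precision_recall(scores, labels, threshold)
        if precision >= min_precision:
            return threshold
    return None

scores = [0.2, 0.4, 0.6, 0.8, 0.9]
labels = [False, True, False, True, True]
threshold = pick_threshold(scores, labels, min_precision=0.9)
precision, recall = precision_recall(scores, labels, threshold)
```

On this toy data the chosen threshold gives up one of the three true positives in exchange for having no false positives.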
The trained models are deployed and then used in our near-real-time scoring pipeline.
We have a real-time scoring pipeline using structured streaming that another team at SalesforceIQ is building
The scoring pipeline is pretty straightforward: we read in data from Kafka, apply that high-recall filter, generate features, and then score
CLICK
Generating an Insight from activity involves combining the scoring pipeline I just discussed with business rules.
Data (facts) - Information - Knowledge pyramid. Classifier gives us facts, facts lend themselves to information
(facts) run classifiers
(facts) link with actions
(information) combine it all together
The biggest challenge for us is getting all of this to happen in just a few seconds.
Inbox is a very real-time use case
User reads e-mail before it’s scored = missed engagement opportunity
User takes action before it’s scored = “why do I even need this?”
So let’s explore what happens at each of these stages in a bit more detail
Step 1 is to collect as much activity as possible and filter out the noise
Spam
Marketing E-mails
etc.
Do this with a variety of heuristics which run very efficiently.
But preparing an E-mail for classification is a bit more complicated than you might think…
Our pipeline runs each classifier in parallel and generates 1 fact per classifier.
Whole process uses Kafka + Spark Structured Streaming
Facts get passed on to another Kafka topic
These facts are used as the basis for Insights
The facts enter a portion of our pipeline called the Insight Publisher
Facts are associated with Actions
Metadata is transformed into a format usable by Clients like Salesforce Inbox
Clients can surface suggested next-steps
We’ve now converted a fact into something a salesperson can act on… assuming of course that we generate the correct facts
AI has finally entered mainstream and is present everywhere in our personal lives
. Siri provides a personal assistant that has transformed the way you look for information, wherever you are.
. Amazon powers recommendations, allowing you to shop more efficiently.
. Google powers a multitude of personalized services.
What do all those AI applications have in common?
They leverage user context, that is, knowledge acquired over time about the user, which allows them to improve recommendations or services over time.
There are some notable differences between consumer and enterprise space.
In the enterprise, the user isn’t the product but a customer. As such, there are lots of capabilities that need to be supported, such as data security and auditing, but also retention, privacy, and GDPR, which gives users control over their data. These rules impact how context can be acquired and governed and add significant complexity.
Context has to be scoped: some of the data can only be accessed by individual users, while some of it might be accessed by teams or an organization, but we simply cannot share data across organizations. For example, Citibank doesn’t want to share its data with Bank of America.
Trust is our #1 value: customers own their data and have full control over who can see or access it.
The context is very rich and encompasses many different activities but also products, companies or services.
It is also very dynamic as in our case, we are handling activity data such as emails which are coming very fast and constantly reshape the user context
For our customers, context enables us to deliver deeper insights.
This allows us, for instance, to classify an email using content beyond that single email.
For instance, do I know the sender? How well do I know him or her?
Is the email discussing my company products and services or is it discussing competitors?
Who in my organization can help me sell to an individual or company?
Can I get more background information or historical data about the company, or identify the decision maker?
And automatically get an introduction?
A graph is the ultimate data structure to encode complex relationships.
An organization can have tens of thousands of contacts, which extend beyond the organization (senders or recipients of emails, for instance).
A single organization can have millions of events every week.
We want to capture these interactions as a graph, as this allows us to learn over time who the contacts are and what roles they play within and outside the organization. That lets us, for instance, power a service like recommended connections, which I will explain in the next slide.
In addition, the context can expand to company information or products and services, which can be used across insights.
Nodes of the graph can model the various types of data we need to capture, such as contacts, companies, products, etc., while edges of the graph can model relationships. These edges can be directed and have properties, for instance showing that two contacts are connected directly and have discussed certain topics such as pricing.
In the example on the upper right, we show a very simple graph that is constructed automatically over time using emails and meetings of Salesforce users who have opted in their activity. It shows contacts with the target organization on the upper side and a sales organization that is using Salesforce on the lower side.
With a graph of activities modeled for an organization, we can now implement services such as Recommended Connections.
On the left is the graph that is slightly more developed than the previous slide showing contacts derived from the salesforce users on the bottom side; and an external organization on the upper side. In it we can see John, a salesperson, who is trying to reach out to Andrea.
The graph models collective knowledge from Andrea's organization and can help us calculate who is best connected to Andrea (in that case Joe and Sylvia as they did business with Andrea before) and as such we can recommend to John to get an introduction from Joe.
Our graph allows us to deliver contextual services that improve our AI models, such as how important the sender of an email, or the sender’s company, is.
But it also allows us to deliver additional services such as recommended connections, when a contact is updated, who is the influencer, when is the best time to send an email, and so on.
The insights we generate from activities, such as email insights like pricing discussed, are written back to the graph, which allows us to further improve services such as recommended connections: for instance, looking for connections that have discussed pricing. This in turn provides a feedback loop that lets us constantly improve our models.
Noah will deliver the remainder of the presentation but I want to cover one more slide on Graph and how it fits with Machine Learning and Deep Learning.
We are still early in this journey and our approach so far has been:
. Ensuring that our insights are context free. For instance, we have defined pricing discussed vs pricing request as this doesn’t require any context such as knowing about products or services. It not only makes the model more accurate and easier to build but also allows composition, such as combining pricing discussed with product mention to identify pricing discussions about our organization’s products.
. As discussed in the previous slide, our AI insights enrich the graph. This allows us to refine models over time and develop more graph services.
. We also have been using some Deep Learning directly on the graph, such as DeepWalk, which Noah will discuss later.
. Lastly, using the graph data structure more broadly with Deep Learning is still an active area of research and has a number of challenges to overcome, including the fact that graph data is non-Euclidean, and dealing with invariance, dynamicity, and directedness.
Just walk through the diagram.
Batch, why on the right; online serving. Building vs serving graph; robust and change over time.
Looking in the peach rectangle, we have to make a tradeoff about how much information we take from the Shared Activity Store and put onto the graph. I’ll start with an example to motivate this.
It is a very exciting time to join Salesforce, and we are looking for engineers and scientists to help us build more AI capabilities into our platform and applications, so please check out our openings.