LeadGenius Co-Founder and Chief Scientist Anand Kulkarni discusses the future of sales automation, remote work, and outbound email at the SVDE Meetup Group, presented by Treasure Data, in September 2015.
Full video of presentation available at: http://blog.leadgenius.com/data-driven-sales-that-scale-ai-that-sells/
6. They analyze which companies want to buy what they’re selling
[Figure: old school vs. new school]
7. Salespeople engage those prospects in commercial conversations (“selling”)
[Figure: old school vs. new school]
8. What do salespeople do all day?
Salespeople Find Companies
The Search Problem
Salespeople Analyze Companies
The Intent Problem
Salespeople Talk to People
The Sales Problem
9. Three Problems of Interest
AI that Finds Companies
The Search Problem
AI That Understands Buying Behavior
The Intent Problem
AI that Talks to People
The Email Turing Test
12. The Company Search Problem
At LeadGenius, we want to figure out every single company in the world
who might buy somebody’s product.
We’ll start by solving the slightly more general problem of finding
every company in the United States.
After that, we’ll talk about how to decide which ones of those companies
want to buy something.
13. Grabbing data about companies
We crawled data from fifty-five sources,
including:
• Social Media
• Online Directories
• Secretary of State Listings
• SEC filings
• IRS nonprofit database
17. Entity resolution: The Fancy Way
A company p is a vector of ~30 properties that we know about it.
(Name, address, revenue, industry, founding year, technologies used,…)
18. Entity resolution: The Fancy Way
Two companies are the same if distance(p1, p2) < ε.
The distance between companies reflects the probability that they are the same: the smaller the distance, the higher the probability.
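The thresholded comparison above can be sketched as follows. This is a minimal illustration, not the system described in the talk: the property names, the weights, the EPSILON value, and the choice of "one minus the weighted share of agreeing properties" as the distance are all assumptions made for the example.

```python
from itertools import combinations

# Hypothetical property weights; the talk mentions ~30 properties per company.
WEIGHTS = {"name": 0.5, "address": 0.3, "industry": 0.1, "founding_year": 0.1}
EPSILON = 0.35  # assumed threshold, chosen only for this example


def normalize(value):
    """Compare values case-insensitively and ignore surrounding whitespace."""
    return str(value).strip().lower()


def distance(p1, p2):
    """Return 1 minus the weighted share of properties on which both records agree."""
    agree = sum(
        weight
        for key, weight in WEIGHTS.items()
        if key in p1 and key in p2 and normalize(p1[key]) == normalize(p2[key])
    )
    return 1.0 - agree


def same_company_pairs(records):
    """Compare every pair of records (quadratic in the number of records)."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(records), 2)
        if distance(a, b) < EPSILON
    ]


records = [
    {"name": "Acme Corp", "address": "1 Main St", "industry": "Retail", "founding_year": 1999},
    {"name": "acme corp", "address": "1 Main St", "industry": "Retail"},
    {"name": "Globex", "address": "9 Elm Ave", "industry": "Energy", "founding_year": 1989},
]
print(same_company_pairs(records))
```

Because every record is compared with every other record, the number of comparisons grows with the square of the number of records.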
20. This works, but…
Super slow!
Requires us to do pairwise comparisons, potentially across a huge number of data points and data sources.
Sometimes data falls out of date.
22. Let’s find a set of properties that are less likely to change often.
23. Entity Resolution: The Easy Way
Two companies are the same if and only if they have the same “official”
physical address.
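A rough sketch of the address rule above, assuming each record carries an "address" string. The normalization shown here (lowercasing, dropping punctuation, collapsing whitespace) is a simplification invented for the example; real address matching would need much more care.

```python
import re
from collections import defaultdict


def address_key(address):
    """Crude normalization: lowercase, drop punctuation, collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", address.lower())
    return " ".join(cleaned.split())


def resolve_by_address(records):
    """Group records that share a normalized address in a single pass."""
    groups = defaultdict(list)
    for record in records:
        groups[address_key(record["address"])].append(record["name"])
    return dict(groups)


records = [
    {"name": "Acme Corp", "address": "1 Main St., Springfield"},
    {"name": "ACME Corporation", "address": "1 main st springfield"},
    {"name": "Globex", "address": "9 Elm Ave, Shelbyville"},
]
groups = resolve_by_address(records)
print(groups)
```

Keying on one stable property replaces the pairwise comparisons with one dictionary lookup per record.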
24. So… how many businesses are in the US?
21,708,021 US businesses
6,049,655 US businesses have >1 person
• Yelp (~47M establishments, some of which are the same company)
• LinkedIn (~2M unique companies)
• CrunchBase (~650K unique companies)
• AngelList (~289K unique companies)
25. Some queries we can answer
• Which U.S. industries have the most distinct organizations listed in LinkedIn?
Industry                               Count
Construction                           157,533
Real Estate                            114,366
Information Technology and Services    113,292
Hospital & Health Care                  99,552
Marketing and Advertising               87,820
• Q: How many Fortune 500 companies have websites?
• A: 499!
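Once the records are deduplicated, the industry question above becomes a simple group-and-count. The sketch below assumes each company is a dict with an "industry" field; the sample rows are toy data, not the LinkedIn figures from the slide.

```python
from collections import Counter


def top_industries(companies, n):
    """Count distinct companies per industry and return the n largest."""
    counts = Counter(company["industry"] for company in companies)
    return counts.most_common(n)


# Toy data for illustration only.
companies = [
    {"name": "A", "industry": "Construction"},
    {"name": "B", "industry": "Construction"},
    {"name": "C", "industry": "Construction"},
    {"name": "D", "industry": "Real Estate"},
    {"name": "E", "industry": "Real Estate"},
    {"name": "F", "industry": "Marketing and Advertising"},
]
print(top_industries(companies, 2))
```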
26. Bonus Problems
• How long is information trustworthy after we
retrieve it? (decay functions)
• What’s the optimal frequency to retrieve
information? (expectation-maximization)
• How do we nab information from sites that don’t
have cleanly-structured schemas?
(watch humans do it)
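One possible shape for the decay function mentioned in the first bullet is exponential decay with a half-life. The talk does not say which form was used, so both the exponential form and the half-life values below are assumptions for illustration.

```python
def trust(age_days, half_life_days):
    """Exponential decay: trust in a retrieved value halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)


def needs_refresh(age_days, half_life_days, min_trust=0.5):
    """Flag a value for re-crawling once its trust drops below min_trust."""
    return trust(age_days, half_life_days) < min_trust


# Hypothetical half-lives: an address changes rarely, a headcount changes often.
print(trust(30, half_life_days=365))   # address retrieved a month ago
print(trust(30, half_life_days=60))    # headcount retrieved a month ago
```

A different half-life per property lets slow-changing fields be refreshed less often than fast-changing ones.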
28. The Problem
Given a set of companies who have bought something from us in the
past…
… which companies are interested in buying from us in the future?
This is a very hard problem.
Non-generalizable: Whether someone’s buying something depends
heavily on the specific industry.
Time-dependent: Whether some company needs a product is always
changing.
29. The Conventional Approach: Machine Learning
From our previous step, we already have a whole set of companies
represented as mathematical vectors.
We just need to train up a solid classifier to separate which ones are
going to buy from us and which ones aren’t.
31. How it Works
• We train a neural net by showing it a whole bunch (more than 10,000) of labeled examples of companies that have bought our products in the past.
32. How it Works
• Our system learns a function that separates the objects in space.
33. How it Works
• For new objects, our classifier can decide which type it is!
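The train-then-classify loop from these slides can be sketched with a much smaller model. Note the substitutions: this uses logistic regression trained by per-example gradient descent rather than a neural net, and six toy examples rather than more than 10,000. The two features (a scaled company size and a "uses a CRM" flag) are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, x):
    """Probability that a company buys; the last weight is the bias term."""
    z = sum(w * xi for w, xi in zip(weights, x)) + weights[-1]
    return sigmoid(z)


def train(examples, labels, epochs=2000, lr=0.5):
    """Fit logistic-regression weights with per-example gradient descent."""
    weights = [0.0] * (len(examples[0]) + 1)
    for _ in range(epochs):
        for x, y in zip(examples, labels):
            error = predict_proba(weights, x) - y
            for i, xi in enumerate(x):
                weights[i] -= lr * error * xi
            weights[-1] -= lr * error
    return weights


# Toy labeled data: [scaled_size, uses_crm] -> 1 if the company bought, else 0.
examples = [[0.9, 1.0], [0.8, 1.0], [0.7, 1.0], [0.1, 0.0], [0.2, 0.0], [0.3, 0.0]]
labels = [1, 1, 1, 0, 0, 0]
weights = train(examples, labels)

print(predict_proba(weights, [0.85, 1.0]))  # a new company resembling past buyers
print(predict_proba(weights, [0.15, 0.0]))  # a new company resembling non-buyers
```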
34. There are some good ways we can use them here, too!
[Figure: companies matching an ICP vs. companies not matching an ICP]
ICP: “Ideal Customer Profile”
35. A better strategy: Human Computation
• Pull a probabilistic estimate from our classifier on whether a
company is in-market for a product or not.
• If the probability is low – below 80% – we escalate it to a
trained person in a 500-person crowd who can make a human-
powered determination on whether the company is going to buy
or not. They can even add a feature.
• After we make that call, add that data to the training set to make
the classifier smarter
• Boosts likelihood of success to human levels… depending on
the human.
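The escalation loop above might look roughly like this. The 80% threshold comes from the slide; the `ask_human` callback is a hypothetical stand-in for handing the company to a trained person in the crowd, and the training set is just a list here.

```python
CONFIDENCE_THRESHOLD = 0.80  # from the slide: escalate when below 80%


def route(company, probability, ask_human, training_set):
    """Return (label, source); escalate uncertain cases and record the human label."""
    if probability >= CONFIDENCE_THRESHOLD:
        return True, "classifier"
    label = ask_human(company)             # human-powered determination
    training_set.append((company, label))  # feed the call back into training
    return label, "human"


training_set = []
print(route("Acme", 0.93, lambda company: False, training_set))
print(route("Globex", 0.40, lambda company: False, training_set))
print(training_set)
```

Every escalated case becomes a new labeled example, so the classifier should need fewer escalations over time.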
36. An even better strategy: monitor behaviors; cluster companies by stage
• Behaviors:
For each one of the companies we’ve crawled, see whether they’ve:
- Viewed a website that sells that technology
- Opened a new job position relevant to the product
- Started hiring
- Tweeted about the problem
- Replied positively to an email someone sent
• Stage:
Companies need different items at different stages in their lives:
- Figure out where they are in a “lifecycle”
- Match them against similar companies
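The behavior half of this strategy can be sketched as a weighted checklist. The signal names mirror the behaviors listed above, but the weights are invented for the example, and the stage clustering is not covered here.

```python
# Hypothetical weights for each observed behavior.
SIGNAL_WEIGHTS = {
    "viewed_vendor_site": 2.0,
    "opened_relevant_job": 3.0,
    "started_hiring": 1.0,
    "tweeted_about_problem": 1.5,
    "replied_positively": 4.0,
}


def intent_score(observed):
    """Sum the weights of the behaviors observed for one company."""
    return sum(SIGNAL_WEIGHTS[signal] for signal in observed if signal in SIGNAL_WEIGHTS)


def rank_by_intent(companies):
    """Order company names from strongest to weakest buying signal."""
    return sorted(companies, key=lambda name: intent_score(companies[name]), reverse=True)


companies = {
    "Acme": {"started_hiring"},
    "Globex": {"opened_relevant_job", "replied_positively"},
    "Initech": set(),
}
print(rank_by_intent(companies))
```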
38. How do we write (sales) messages by email that are as good as those sent by a human being?
39. Is this even plausible?
Recent results from deep learning say yes.
Xiaoice (Microsoft China) trained on real-world chatbot logs
Support chats trained on data (DeepMind, Google)
40. Example: Automated Support Interactions
Describe your problem: browser keeps crashing
Machine: hi
Human: hi
Machine: could you please let me know what is the operating system you are using ?
Human: linux
Machine: ok this is the issue with the system performance , nasser , tools .
Human: what is that ?
Machine: is it a windows machine ?
Machine: we have linux on your machine .
Machine: is it windows xp or windows 7 ?
Human: i am using linux
Machine: let me take a remote of your machine and check if issue has been pushed properly . Hum
Machine: please open terminal and run the below commands in one computer :
Machine: sudo apt - get update
42. Problem
Requirements
Given a topic, an author, an inbox, and a recipient, compose a message that:
1) has a maximum likelihood of yielding a positive reply
2) looks indistinguishable from an email that the author sent
This is a hard problem: it’s the Turing Test.
It’s been around since 1950.
Restricting it to sales and email might make it easier.
43. Secret Weapons
* We can choose to ask a human being from our crowd of trained folks for help.
* We can mine the inbox for whatever examples we need or want.
44. Strategy
1) Craft a generalized template by analyzing the sender’s email inbox
2) Collect data at scale to populate that message
3) Change content based on what you discover about that person
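Steps 2 and 3 above can be sketched with a fill-in template. The template text, the prospect field names, and the rules for picking a personal note are all hypothetical; in the strategy described, the template itself would come from analyzing the sender's inbox (step 1), which is not shown.

```python
from string import Template

# Hypothetical template standing in for one derived from the sender's inbox.
TEMPLATE = Template(
    "Hi $first_name,\n"
    "I saw you guys were hiring for $role. $personal_note\n"
    "Let me know if you'd like to chat further."
)


def personal_note(prospect):
    """Step 3: vary the content based on what we discovered about the person."""
    if prospect.get("mutual_contact"):
        return "We know each other through %s." % prospect["mutual_contact"]
    if prospect.get("recent_funding"):
        return "Congrats on your recent round!"
    return "I wanted to see if we might be able to help."


def compose(prospect):
    """Step 2: populate the template with the data collected about the prospect."""
    return TEMPLATE.substitute(
        first_name=prospect["first_name"],
        role=prospect["role"],
        personal_note=personal_note(prospect),
    )


message = compose({"first_name": "Sarah", "role": "SDRs", "mutual_contact": "Michael James"})
print(message)
```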
46. Writing Messages
Going further…
- How likely is someone to reply to us based on…
- Length?
- Tone?
- Subject complexity?
- Word choice?
Let’s show this to the user and then optimize based on that.
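As one way to start on the length question above, past emails can be bucketed by word count and the reply rate computed per bucket. The field names and the sample emails are made up for illustration; tone, subject complexity, and word choice would need their own features.

```python
from collections import defaultdict


def reply_rate_by_length(emails, bucket_size=50):
    """Map each word-count bucket (by its lower bound) to the share of emails that got a reply."""
    sent = defaultdict(int)
    replied = defaultdict(int)
    for email in emails:
        bucket = (email["word_count"] // bucket_size) * bucket_size
        sent[bucket] += 1
        replied[bucket] += int(email["got_reply"])
    return {bucket: replied[bucket] / sent[bucket] for bucket in sorted(sent)}


# Toy history of sent emails.
emails = [
    {"word_count": 40, "got_reply": True},
    {"word_count": 45, "got_reply": False},
    {"word_count": 60, "got_reply": True},
    {"word_count": 120, "got_reply": False},
    {"word_count": 130, "got_reply": False},
]
print(reply_rate_by_length(emails))
```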
47. How likely is someone to open this email?
Predicting responses from length
48. How likely is someone to open this email?
Predicting responses from templatization
49. Humans in the “crowd” can radically improve our templates automatically
Optimizing Templates
“Wish”, AAAI Human Computation 2014
50. What did someone say about our email?
Understanding responses
51. The hard way: sentiment analysis
Understanding responses
Positive sentiment corpus Negative sentiment corpus
Twitter as a Corpus for Sentiment Analysis and Opinion Mining (2010)
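A corpus-based sentiment classifier of the kind these slides point to can be sketched with naive Bayes over word counts. This is a generic textbook version with add-one smoothing and a uniform prior, not a reproduction of any particular paper's method, and the two tiny corpora are invented.

```python
import math
from collections import Counter


def train(corpus):
    """corpus maps a label to example texts; returns word counts per label."""
    return {
        label: Counter(word for text in texts for word in text.lower().split())
        for label, texts in corpus.items()
    }


def classify(model, text):
    """Pick the label with the highest add-one-smoothed log-likelihood."""
    vocabulary = set().union(*model.values())

    def score(label):
        counts = model[label]
        total = sum(counts.values()) + len(vocabulary)
        return sum(math.log((counts[word] + 1) / total) for word in text.lower().split())

    return max(model, key=score)


# Toy corpora standing in for the positive and negative sentiment corpora.
model = train({
    "positive": ["sounds great happy to talk", "yes interested please send times"],
    "negative": ["not interested", "please remove me from your list"],
})
print(classify(model, "yes happy to talk"))
print(classify(model, "remove me"))
```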
53. The easy way: human computation
Understanding responses
54. Scripting responses
From: anand@leadgenius.com
To: sarah@hotlead.com
Subj: Quick Question, Sarah
Hi Sarah,
I saw you guys were hiring for SDRs. We know each other through Michael James and I wanted to see if we might be able to help you scale your SDR team. I have a few extra SDRs we can push your way.
Let me know if you’d like to chat further – we’re doing this for SoldLead8 already.
BTW, congrats on your recent round!
Cheers!
AK
[Diagram labels:]
• Interested?
• Here’s 3 times that work for me!
• Here’s more information!
• Check back later.
• Specific question
• Automatically schedule a follow-up mail
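One way to read the diagram above is as a lookup from the category of the prospect's reply to a scripted next step. The pairing below is an assumption: the transcript does not preserve which label connects to which, and routing unscripted replies (such as a specific question) to a person is a guess based on the human-computation theme of the talk.

```python
# Hypothetical pairing of reply categories with scripted actions.
SCRIPTS = {
    "interested": ("reply", "Here's 3 times that work for me!"),
    "wants_information": ("reply", "Here's more information!"),
    "check_back_later": ("schedule_follow_up", None),
}


def next_action(reply_category):
    """Return (action, text); anything without a script goes to a person."""
    return SCRIPTS.get(reply_category, ("ask_human", None))


print(next_action("interested"))
print(next_action("check_back_later"))
print(next_action("specific_question"))
```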
56. Three Problems of Interest
AI that Finds Companies
The Search Problem
AI That Understands Buying Behavior
The Intent Problem
AI that Talks to People
The Email Turing Test
57. Conclusions
• Company search can be attacked with large-scale crawling, human computation, entity resolution, and careful data updates
• Buying intent can be deduced automatically based on classifiers but is done better with human computation
• Email communication is complex, has a lot of interesting subproblems, and is solvable!