11. The Recommender Ecosystem
Similar Profiles
Connections
Network updates
Events You May Be Interested In
News
13. LinkedIn Recommendation Engine
Recommendation Entities: Jobs, Groups, People, Ads, Companies, Searches, News, Events, Profiles, …
Products (diagram labels): Jobs You May Be Interested In, Similar Jobs, Jobs Browse Map, Similar Groups, GYML, TalentMatch, Referral Center, Similar Profiles, Browse Map, … and more
Products
A/B API
Recommendation Types: Behavior Analysis, Collaborative Filtering, Popularity, User Feedback
Shared, Dynamic, Unified Core Service: (R-T) Feature Extraction, Entity Resolution & Enrichment, (R-T) matching computations, Offline data munging (Hadoop)
16. Possible Approaches
Naïve K Nearest Neighbor solution
Complexity is O(n^2)
Clustering
Latent Factor Models like PLSI or LDA
Hierarchical Agglomerative clustering
Self Organizing Maps
Item based Collaborative Filtering
Find pairs of Users viewed in the same session
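The item-based collaborative filtering idea above (pairs of profiles viewed in the same session) can be sketched as a co-view count. This is a minimal illustration, not the production pipeline: the session format, the function names, and the ranking by raw count are all assumptions.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_view_counts(sessions):
    """Count how often two profiles were viewed in the same session."""
    pair_counts = Counter()
    for viewed in sessions:
        # De-duplicate and sort so (a, b) and (b, a) share one key.
        for a, b in combinations(sorted(set(viewed)), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def similar_profiles(pair_counts, top_n=3):
    """Turn pair counts into a ranked neighbor list per profile."""
    neighbors = defaultdict(list)
    for (a, b), n in pair_counts.items():
        neighbors[a].append((b, n))
        neighbors[b].append((a, n))
    return {
        p: [q for q, _ in sorted(lst, key=lambda x: (-x[1], x[0]))[:top_n]]
        for p, lst in neighbors.items()
    }


# Hypothetical browsing sessions: each list holds profiles viewed together.
sessions = [
    ["alice", "bob", "carol"],
    ["alice", "bob"],
    ["bob", "dave"],
]
counts = co_view_counts(sessions)
ranked = similar_profiles(counts)
```

Only pairs that actually co-occur are ever materialized, which is what keeps this cheaper than the naive all-pairs comparison.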
17. Challenges
Scale
175+ M profiles
Dimensionality
~2M companies
~200K schools
~147 industries
~200 countries
~25K titles
~40K Skills
~200 Job Functions
Similar means different things to different people
Similar Behavior doesn’t mean you can replace me at my job
Accuracy vs Relevance (me & my boss.. )
Realtime..
It’s a problem of accuracy.. Not recall..
18. Approach
Focus attention only on pairs likely to be similar
Filter out the possibly dis-similar pairs
Run Similarity Functions on filtered in pairs
FILTER
Cluster
Rank
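A rough sketch of the filter-then-rank idea, assuming profiles are flat attribute dictionaries. The inverted index used for filtering and the `overlap_score` stand-in similarity are illustrative assumptions, and the clustering stage is left out entirely.

```python
from collections import defaultdict
from itertools import combinations


def candidate_pairs(profiles):
    """FILTER: keep only pairs that share at least one attribute value."""
    index = defaultdict(set)  # (attribute, value) -> profile ids
    for pid, attrs in profiles.items():
        for key, value in attrs.items():
            index[(key, value)].add(pid)
    pairs = set()
    for ids in index.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def overlap_score(a, b):
    """Stand-in similarity: fraction of attributes with equal values."""
    keys = set(a) | set(b)
    return sum(a.get(k) == b.get(k) for k in keys) / len(keys)


def rank_similar(profiles):
    """RANK: score only the filtered-in pairs, best first."""
    scored = [
        (overlap_score(profiles[x], profiles[y]), x, y)
        for x, y in candidate_pairs(profiles)
    ]
    return sorted(scored, key=lambda t: (-t[0], t[1], t[2]))


# Hypothetical profiles; p4 shares nothing with the others and is never scored.
profiles = {
    "p1": {"title": "Eng Mgr", "industry": "Internet"},
    "p2": {"title": "Eng Mgr", "industry": "Internet"},
    "p3": {"title": "Eng Mgr", "industry": "Finance"},
    "p4": {"title": "Chef", "industry": "Hospitality"},
}
ranked_pairs = rank_similar(profiles)
```

The similarity function runs on three pairs here instead of all six, and the saving grows quickly as the number of profiles increases.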
21. Similarity Functions
Different bands of attributes
Boolean, Jaccard or Cosine Similarities across attribute pairs
Logistic Regression with Elastic Net Penalty
Learn model params on a set of hand labeled data points
Predicted value interpreted as score
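As a sketch of how per-attribute similarities might feed a logistic model, the snippet below computes one Boolean, one Jaccard, and one cosine feature and passes them through a sigmoid. The `WEIGHTS` values are made up for illustration; in the approach described above they would be learned from hand-labeled pairs with an elastic net penalty, which is not shown here. The profile field names are also assumptions.

```python
import math


def jaccard(a, b):
    """Set overlap, for multi-valued attributes such as skills."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def boolean_match(a, b):
    """Exact match, for single-valued attributes such as title."""
    return 1.0 if a == b else 0.0


def cosine(u, v):
    """Cosine similarity between sparse term-weight dictionaries."""
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical weights, NOT learned values.
WEIGHTS = {"bias": -2.0, "skills": 3.0, "title": 1.5, "summary": 1.0}


def similarity_score(p, q):
    """Combine the attribute similarities; the sigmoid output is the score."""
    features = {
        "skills": jaccard(p["skills"], q["skills"]),
        "title": boolean_match(p["title"], q["title"]),
        "summary": cosine(p["summary_tf"], q["summary_tf"]),
    }
    z = WEIGHTS["bias"] + sum(WEIGHTS[k] * v for k, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))


p = {"skills": ["ML", "RecSys"], "title": "Eng Mgr",
     "summary_tf": {"ads": 1.0, "ranking": 1.0}}
q = {"skills": ["ML", "DM", "RecSys"], "title": "Eng Mgr",
     "summary_tf": {"ranking": 1.0}}
r = {"skills": ["Sales"], "title": "Account Exec",
     "summary_tf": {"quota": 1.0}}
```

With these weights, `similarity_score(p, q)` is well above `similarity_score(p, r)`, since `r` shares no skills, title, or summary terms with `p`.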
23. Ad Ranking
Given
$U_j,\ \{(c_0, b_0), (c_1, b_1), (c_2, b_2), (c_3, b_3), \ldots, (c_n, b_n)\},\ H$
Objective
$\arg\max_{i \in C} \left( \mathrm{pCTR}_i \cdot \mathrm{bid}_i \right)$
Goal:
Increase revenue
Respect daily budgets of Advertisers
Good user experience
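A minimal sketch of the objective above: order candidates by pCTR_i * bid_i and skip any ad whose advertiser cannot cover the bid. The tuple layout, the per-advertiser `remaining_budget` map, and the simple "budget must cover the bid" rule are assumptions for illustration. The `H` term from the slide is not modeled.

```python
def rank_ads(candidates, pctr, remaining_budget, slots=2):
    """Rank creatives by expected revenue, respecting advertiser budgets.

    candidates: list of (creative_id, bid, advertiser_id) tuples.
    pctr: predicted click-through rate per creative_id.
    remaining_budget: budget left today per advertiser_id.
    """
    eligible = [
        (pctr[creative] * bid, creative)
        for creative, bid, advertiser in candidates
        if remaining_budget.get(advertiser, 0.0) >= bid
    ]
    # Highest expected revenue first; creative id breaks ties.
    eligible.sort(key=lambda t: (-t[0], t[1]))
    return [creative for _, creative in eligible[:slots]]


# Hypothetical auction: c3 has the best pCTR * bid but its advertiser is out of budget.
candidates = [
    ("c0", 2.0, "adv_a"),
    ("c1", 5.0, "adv_b"),
    ("c2", 1.0, "adv_c"),
    ("c3", 4.0, "adv_d"),
]
pctr = {"c0": 0.05, "c1": 0.01, "c2": 0.08, "c3": 0.03}
remaining_budget = {"adv_a": 10.0, "adv_b": 100.0, "adv_c": 50.0, "adv_d": 1.0}
shown = rank_ads(candidates, pctr, remaining_budget)
```

Note that `c1` carries the highest bid yet ranks last among eligible ads, because its low pCTR drags down expected revenue.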
25. Virtual Profiling
Title : Eng Mgr
Company : LinkedIn
Location : CA,USA
Skills : ML, RecSys
Title : Vice President
Company : Twitter
Location : CA,USA
Skills : DM, ML, RecSys
……………….
26. Virtual Profiling
Clicker profiles:
Title : Eng Mgr
Company : LinkedIn
Location : CA,USA
Skills : ML, RecSys

Title : Sr. SE
Company : Google
Location : PA, USA
Skills : ML, DM

Title : Eng Dir
Company : LinkedIn
Location : PA, USA
Skills : ML, Stats, DM

Virtual profile:
Title : Sr. SE<1>, Eng Mgr<1>, Eng Dir<1>
Company : LinkedIn<2>, Google<1>
Location : CA,USA <2>, PA, USA<1>
Skills : ML<2>, RecSys<1>, Stats<1>, DM<1>
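One way to picture the aggregation step is a per-field tally over the profiles of members who clicked. The sketch below assumes profiles are dictionaries whose values are strings or lists of strings. The counts are plain tallies over the input, so they need not match the illustrative numbers on the slide.

```python
from collections import Counter


def virtual_profile(clicker_profiles):
    """Aggregate clicker profiles into per-field value counts."""
    vp = {}
    for profile in clicker_profiles:
        for field, value in profile.items():
            # Treat single-valued fields as one-element lists.
            values = value if isinstance(value, list) else [value]
            vp.setdefault(field, Counter()).update(values)
    return vp


clickers = [
    {"Title": "Eng Mgr", "Company": "LinkedIn", "Location": "CA, USA",
     "Skills": ["ML", "RecSys"]},
    {"Title": "Sr. SE", "Company": "Google", "Location": "PA, USA",
     "Skills": ["ML", "DM"]},
    {"Title": "Eng Dir", "Company": "LinkedIn", "Location": "PA, USA",
     "Skills": ["ML", "Stats", "DM"]},
]
vp = virtual_profile(clickers)
```

The resulting counters live in the same feature space as member profiles, which is what lets an ad be compared against a member directly.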
27. Virtual Profiling
Information Gain
Pick Top K overrepresented features from the
clicker distribution vs the target segment
A representative projection of the item in the
member feature space
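The selection step can be sketched as scoring each feature by how much more common it is among clickers than in the target segment. The score used here, p * log(p / q) with add-one smoothing, is a simple stand-in for the information gain criterion named on the slide, not necessarily the exact formula used. The feature encoding (`"skill:ML"` and so on) is also an assumption.

```python
import math
from collections import Counter


def top_k_overrepresented(clicker_feats, target_feats, k=2, smoothing=1.0):
    """Pick the k features most overrepresented among clickers."""
    click = Counter(f for feats in clicker_feats for f in set(feats))
    target = Counter(f for feats in target_feats for f in set(feats))
    n_click, n_target = len(clicker_feats), len(target_feats)
    scores = {}
    for f, c in click.items():
        # Smoothed share of members carrying the feature in each population.
        p = (c + smoothing) / (n_click + 2 * smoothing)
        q = (target[f] + smoothing) / (n_target + 2 * smoothing)
        scores[f] = p * math.log(p / q)
    ranked = sorted(scores, key=lambda f: (-scores[f], f))
    # Keep only features that are actually overrepresented (positive score).
    return [f for f in ranked if scores[f] > 0][:k]


# Hypothetical data: one feature list per member.
clicker_feats = [
    ["skill:ML", "loc:CA"],
    ["skill:ML", "loc:CA"],
    ["skill:ML", "loc:PA"],
]
target_feats = [
    ["skill:ML", "loc:CA"],
    ["skill:Java", "loc:CA"],
    ["skill:Java", "loc:CA"],
    ["skill:Java", "loc:PA"],
    ["skill:Sales", "loc:CA"],
    ["skill:Java", "loc:CA"],
]
top_features = top_k_overrepresented(clicker_feats, target_feats)
```

Here `loc:CA` is common among clickers but even more common in the target segment, so it is dropped. `skill:ML` and `loc:PA` are the features that set clickers apart.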
28. CTR Prediction – CF Similarity
[Diagram: member features, the ad creative's virtual profile, and creative features feed the Ranker; the Ranker's score goes through a score-to-pCTR correction to produce pCTR_i]
L2 regularized Logistic Regression (Liblinear, VW, Mahout, ADMM)
For new ad creatives back-off to the advertiser / ad category nodes till
they reach critical impression/click volume (explore/exploit)
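The back-off rule can be sketched as a walk up a creative → advertiser → ad category hierarchy until a node has enough traffic to trust. The threshold `MIN_IMPRESSIONS`, the default `prior`, and the dictionary layout are hypothetical, and the explore/exploit side is not modeled.

```python
MIN_IMPRESSIONS = 1000  # hypothetical "critical volume" threshold


def backoff_ctr(creative, stats, parents, prior=0.01):
    """Return (ctr, node_used), backing off to parent nodes when data is thin.

    stats: node -> (impressions, clicks)
    parents: node -> parent node (creative -> advertiser -> category)
    """
    node = creative
    while node is not None:
        impressions, clicks = stats.get(node, (0, 0))
        if impressions >= MIN_IMPRESSIONS:
            return clicks / impressions, node
        node = parents.get(node)
    # Nothing in the hierarchy has enough volume: fall back to a global prior.
    return prior, None


parents = {"creative_42": "advertiser_7", "advertiser_7": "category_tech"}
stats = {
    "creative_42": (120, 3),          # too new to trust
    "advertiser_7": (800, 10),        # still below the threshold
    "category_tech": (50000, 400),    # enough volume
}
ctr, source = backoff_ctr("creative_42", stats, parents)
```

In this example the estimate comes from the category node. Once `creative_42` passes the threshold, its own statistics take over.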
30. Feature Engineering – Entity Resolution
Companies
‘IBM’ has 8000+ variations
- ibm – ireland
- ibm research
- T J Watson Labs
- International Bus. Machines K-Ambiguous
- Deep Blue
Huge impact on the business and UE
Ad targeting
TalentMatch
Referrals
Asonam’11, KDD’11
31. Feature Engineering – Sticky Locations
Open to relocation ?
Region similarity based on profiles or network
Region transition probability
Predict an individual's propensity to migrate and the most likely migration target
Impact on job recommendations
20% lift in views/viewers/applications/applicants
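A region transition probability can be estimated, in its simplest form, from counts of observed (origin, destination) pairs. The sketch below assumes such pairs have already been extracted, for example from consecutive positions on a profile. The region names and the data are made up.

```python
from collections import Counter, defaultdict


def transition_probabilities(moves):
    """Estimate P(destination | origin) from observed (origin, destination) pairs."""
    counts = defaultdict(Counter)
    for origin, destination in moves:
        counts[origin][destination] += 1
    return {
        origin: {d: n / sum(c.values()) for d, n in c.items()}
        for origin, c in counts.items()
    }


def migration_outlook(probs, region):
    """Return (propensity to leave, most likely target) for a region."""
    row = probs.get(region)
    if not row:
        return None, None  # no observations for this region
    targets = {d: p for d, p in row.items() if d != region}
    best = max(targets, key=targets.get) if targets else None
    return sum(targets.values()), best


# Hypothetical history: staying put is recorded as (region, same region).
moves = (
    [("SF Bay", "SF Bay")] * 6
    + [("SF Bay", "Seattle")] * 3
    + [("SF Bay", "NYC")] * 1
)
probs = transition_probabilities(moves)
propensity, target = migration_outlook(probs, "SF Bay")
```

A job recommender could use the propensity to decide whether to show out-of-region jobs at all, and the target to decide which regions to show.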
32. What should you transition to .. and when ?
[Chart: probability of switch (y-axis) vs. months since graduation (x-axis)]
35. Social Referral
LinkedIn Group: Text Analytics
From: Deepak Agarwal – Engineering Director, LinkedIn
I found this group interesting, and I think you will too
Deepak
2X conversion
Mohammad Amin, Baoshi Yan, Sripad Sriram, Anmol Bhasin, Christian Posse.
Social Referral : Using network connections to deliver recommendations. To appear in
Proceedings of the Sixth ACM conference on Recommender systems (RecSys '12)
Taking a leaf out of Paolo Cremonesi's talk.. The answer is 50%.. There, I gave it away.. It's time for coffee.
50% of connections are from recommendations (PYMK)
50% of job applications are from recommendations (JYMBII)
50% of group joins are from recommendations (GYML)
As a colleague of mine puts it.. We are the tour de force for Recommendations.. From traditional recommender problems, i.e. recommending p
I am spoilt for choice here.. There is so much interesting work I can talk about.. But today I picked a few interesting areas not classically considered to be mainline recommender products. In keeping with the Ecosystem theme, though, this application fits right in.
Let's talk about People Recommendations.. BUT not in the context of connecting or knowing or following or rating or dating.. This is about cloning.
Recruiters and Head Hunters: Interview multiple people for filling one role.
Hiring Managers: "Hire more like the superstars on my team.."
LinkedIn: Recommend Jobs/News/Groups that "people like you" act on.
More conceivable applications: Find similar leads for making a sales pitch, or let me give you a sample of people I want to show this Ad to.. Create me a segment.. or
Extensive set of tooling to target the population.. Yes, we sorta shoot ourselves in the foot sometimes.. But then members come first. Example audience, in real time.. Lets advertisers tailor their campaigns. Also gives a real-time reach estimate.
Solve the impedance mismatch by creating the Ad representation in the user space. This concept is used extensively at LinkedIn for all kinds of user recommendations, not just ads.
8000 name variants of IBM.
We use the entity resolution terminology k-ambiguous and k-variant from [10]. The same company name can denote multiple company entities, but each occurrence of a company name references a single entity only. A name referring to k different entities is called k-ambiguous. Additionally, an entity which can be referred to by k different names is called k-variant.
The ranker approach does not work: a given name may not be resolvable, in the sense that the company entity has not been created yet.
Classification problem: given a pair of (member position, company entity), a binary classifier determines whether there is enough evidence to resolve the member position to the company entity. This addresses the problem of the ranking approach, in that an unresolvable member position would most likely remain unresolved because the classifier has insufficient evidence for any company entity. It is certainly possible that there could be multiple company entities with sufficient evidence for a member position.
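The difference between ranking and classifying can be illustrated with a resolver that is allowed to return nothing. In this sketch a fuzzy string match from Python's `difflib` stands in for the trained binary classifier. The `threshold`, the entity table, and the function names are assumptions, not the system described in the talk.

```python
from difflib import SequenceMatcher


def evidence(position_name, entity_names):
    """Stand-in for a classifier score: best fuzzy match against known variants."""
    return max(
        SequenceMatcher(None, position_name.lower(), name.lower()).ratio()
        for name in entity_names
    )


def resolve(position_name, entities, threshold=0.8):
    """Return every entity with enough evidence; the list may be empty."""
    return [
        entity_id
        for entity_id, names in entities.items()
        if evidence(position_name, names) >= threshold
    ]


# Hypothetical entity table: entity id -> known name variants.
entities = {
    "ibm": ["IBM", "International Business Machines", "IBM Research"],
    "linkedin": ["LinkedIn", "LinkedIn Corp"],
}
resolved = resolve("ibm research", entities)
unresolved = resolve("Acme Rockets", entities)
```

A ranker would always return its top candidate for "Acme Rockets". The thresholded decision leaves it unresolved until a matching entity exists.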
Unreasonable effectiveness of Big Data.. This chart shows the probability of holding a title across all titles, plotted vs number of months after graduation. Notice the spikes.. They are ~12 months apart, almost perfectly aligned.. Remember the itch that you had when you finished 2 years at your company?
A brand new Recommendation Delivery paradigm, tested on LinkedIn Groups to generate 2X Group Join rate. Applicable to advertising as well.
The idea is simple: reverse the Social Proof idea. Ask the actor to recommend their connections to interact with this item.
- The message comes from the individual, not LinkedIn
- Inherently socially endorsed
- Timely and contextual
- Can be applied to Ads delivery, which we will be testing in the next few months
Incredibly powerful, vetted paradigm that we are excited to try to rope into our Ads offerings
And now the technologies that drive it all. The core of our matching algorithm uses Lucene with our custom query implementation. We use Hadoop to scale our platform. It serves a variety of needs, from computing Collaborative Filtering features and building Lucene indices offline to doing quality analysis of recommendations and a host of other exciting things.
Lucene does not provide fast real-time indexing. To keep our indices up to date, we use a real-time indexing library on top of Lucene called Zoie. We provide facets to our members for drilling down and exploring recommendation results. This is made possible by a faceted search library called Bobo. For storing features and for caching recommendation results, we use a key-value store, Voldemort. For analyzing tracking and reporting data, we use a distributed messaging system called Kafka.
Of these, Bobo, Zoie, Voldemort and Kafka were developed at LinkedIn and are open source. In fact, Kafka is an Apache Incubator project.
Historically, we have used R for model training. We have recently started experimenting with Mahout for model training.