Beyond Ratings & Followers
Anmol Bhasin
Sr. Manager, Analytics Engineering
www.linkedin.com
The answer is
The Recommender Ecosystem
• Similar Profiles
• Connections
• Network updates
• Events You May Be Interested In
• News
LinkedIn Recommendation Engine
• Products: Jobs You May Be Interested In, Jobs Browse Map, Similar Jobs, TalentMatch, Similar Groups, Groups Browse Map, GYML, Referral Center, Similar Profiles, People Browse Map, … and more
• Recommendation entities: Jobs, Groups, People, Companies, News, Events, Searches, Ads, …
• Recommendation types: Behavior Analysis, Collaborative Filtering, Popularity, User Feedback
• A/B, API
• Shared, Dynamic, Unified Core Service
  - (R-T) Feature Extraction, Entity Resolution & Enrichment
  - (R-T) matching computations
  - Offline data munging (Hadoop)
Cloning
Possible Approaches
• Naïve K-nearest-neighbor solution
  - Complexity is O(n²)
• Clustering
  - Latent factor models like PLSI or LDA
  - Hierarchical agglomerative clustering
• Self-organizing maps
• Item-based collaborative filtering
  - Find pairs of member profiles viewed in the same session
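The item-based collaborative filtering idea above (pairs of profiles viewed in the same session) can be sketched as plain co-view counting. This is a minimal Python sketch under assumed inputs, not LinkedIn's implementation: the session lists, the profile names, and the helpers `coview_counts` and `similar_profiles` are hypothetical.

```python
from collections import Counter
from itertools import combinations


def coview_counts(sessions):
    """Count how often each unordered pair of profiles is viewed in the same session."""
    counts = Counter()
    for session in sessions:
        # Deduplicate within a session and sort so (a, b) and (b, a) share one key.
        for a, b in combinations(sorted(set(session)), 2):
            counts[(a, b)] += 1
    return counts


def similar_profiles(counts, profile, k=3):
    """Return up to k profiles most often co-viewed with `profile`, highest count first."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == profile:
            scores[b] += n
        elif b == profile:
            scores[a] += n
    return [p for p, _ in scores.most_common(k)]


# Hypothetical browsing sessions: each is the list of profiles one viewer looked at.
sessions = [["alice", "bob", "carol"], ["alice", "bob"], ["bob", "dave"], ["carol", "alice", "bob"]]
counts = coview_counts(sessions)
print(counts[("alice", "bob")])           # 3
print(similar_profiles(counts, "alice"))  # ['bob', 'carol']
```

Enumerating every pair within a session is quadratic in session length, a small-scale version of the O(n²) cost that motivates filtering candidate pairs before scoring them.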
Challenges
• Scale
  - 175M+ profiles
• Dimensionality
  - ~2M companies
  - ~200K schools
  - ~147 industries
  - ~200 countries
  - ~25K titles
  - ~40K skills
  - ~200 job functions
• "Similar" means different things to different people
  - Similar behavior doesn’t mean you can replace me at my job
  - Accuracy vs. relevance (me & my boss)
• Real time
• It’s a problem of accuracy, not recall
Approach
• Focus attention only on pairs likely to be similar
  - Filter out the likely dissimilar pairs
  - Run similarity functions on the pairs that pass the filter
Diagram stages: Cluster, Filter, Rank
Locality-Sensitive Hashing
• LSH function family for cosine distance
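The standard LSH family for cosine distance is random-hyperplane hashing (Charikar, 2002): each bit records which side of a random hyperplane a vector falls on, and two vectors agree on a bit with probability 1 − θ/π, where θ is the angle between them. The sketch below assumes dense profile vectors and uses the common banding trick to turn signatures into candidate pairs, then ranks only those pairs with exact cosine similarity, mirroring the filter-then-rank approach. The function names, `n_bits`, and `bands` are illustrative assumptions; the slides do not give LinkedIn's parameters.

```python
import math
import random
from collections import defaultdict
from itertools import combinations


def random_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplane normals, one per signature bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, planes):
    """Bit i is 1 if vec is on the non-negative side of hyperplane i."""
    return tuple(1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0 for plane in planes)


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def candidate_pairs(vectors, planes, bands):
    """Items whose signatures agree on every bit of at least one band become candidates."""
    rows = len(planes) // bands  # leftover bits (if any) are ignored
    sigs = {key: signature(vec, planes) for key, vec in vectors.items()}
    pairs = set()
    for band in range(bands):
        buckets = defaultdict(list)
        for key, sig in sigs.items():
            buckets[sig[band * rows:(band + 1) * rows]].append(key)
        for members in buckets.values():
            pairs.update(combinations(sorted(members), 2))
    return pairs


def rank_pairs(vectors, pairs):
    """Score only the filtered-in pairs with exact cosine similarity, best first."""
    return sorted(((cosine(vectors[a], vectors[b]), a, b) for a, b in pairs), reverse=True)


# Hypothetical dense profile vectors.
vectors = {
    "alice": [1.0, 0.9, 0.0, 0.2],
    "alice_clone": [1.0, 0.9, 0.0, 0.2],
    "bob": [0.9, 1.0, 0.1, 0.3],
    "carol": [-1.0, 0.0, 1.0, -0.5],
}
planes = random_hyperplanes(dim=4, n_bits=16, seed=42)
pairs = candidate_pairs(vectors, planes, bands=4)
for sim, a, b in rank_pairs(vectors, pairs):
    print(f"{a} ~ {b}: cosine {sim:.3f}")
```

Identical vectors always produce identical signatures, so `alice` and `alice_clone` are always candidates; whether `bob` collides with them depends on the random hyperplanes. With a fixed number of bits, more bands (fewer rows each) raise recall at the cost of more candidate pairs.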
Similarity Functions
• Different bands of attributes
  - Boolean, Jaccard or cosine similarities across attribute pairs
• Logistic regression with an elastic-net (L1 + L2) penalty
  - Learn model parameters on a set of hand-labeled data points
  - The predicted value is interpreted as the similarity score
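A minimal sketch of per-band similarity features scored by an elastic-net logistic regression. Assumptions, all hypothetical: profiles are dicts with `company`, `skills`, and `title`; company uses a Boolean match, skills use Jaccard, titles use cosine over token counts; the six labeled feature vectors stand in for the hand-labeled data. The trainer is plain batch gradient descent with an L1 subgradient, a simple stand-in for the production solvers the slides do not name.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine_counts(a, b):
    """Cosine similarity of two token lists as bag-of-words count vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(n * cb[t] for t, n in ca.items())
    norm = math.sqrt(sum(n * n for n in ca.values())) * math.sqrt(sum(n * n for n in cb.values()))
    return dot / norm if norm else 0.0


def pair_features(p, q):
    """One similarity per attribute band (hypothetical profile fields)."""
    return [
        1.0 if p["company"] == q["company"] else 0.0,  # Boolean band
        jaccard(p["skills"], q["skills"]),  # Jaccard band
        cosine_counts(p["title"].lower().split(), q["title"].lower().split()),  # cosine band
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sign(x):
    return (x > 0) - (x < 0)


def train(X, y, l1=0.01, l2=0.01, lr=0.5, epochs=500):
    """Gradient descent on mean log-loss + l1*|w| + (l2/2)*w^2 (L1 handled by subgradient)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * (g / n + l2 * wj + l1 * sign(wj)) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def score(w, b, x):
    """Predicted probability, interpreted directly as the similarity score."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical hand-labeled feature vectors [same_company, skills_jaccard, title_cosine].
X = [[1.0, 0.8, 1.0], [1.0, 0.5, 0.5], [0.0, 0.6, 0.7],
     [0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.1, 0.2]]
y = [1, 1, 1, 0, 0, 0]
w, b = train(X, y)

me = {"company": "LinkedIn", "skills": ["ML", "RecSys"], "title": "Senior Software Engineer"}
peer = {"company": "Google", "skills": ["ML", "RecSys", "DM"], "title": "Software Engineer"}
boss = {"company": "LinkedIn", "skills": ["Management"], "title": "Engineering Director"}
print(round(score(w, b, pair_features(me, peer)), 3), round(score(w, b, pair_features(me, boss)), 3))
```

The labeled point `[1.0, 0.0, 0.0]` (same company, nothing else in common) reflects the "me & my boss" caveat from the Challenges slide: sharing an employer alone should not make two members similar.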
Ad Ranking
• Given: $U_j$, $\{(c_0, b_0), (c_1, b_1), (c_2, b_2), (c_3, b_3), \ldots, (c_n, b_n)\}$, $H$
• Objective: $\arg\max_{i \in C} \; \mathrm{pCTR}_i \cdot \mathrm{bid}_i$
• Goals:
  - Increase revenue
  - Respect advertisers' daily budgets
  - Good user experience
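Selecting the ad that maximizes pCTR × bid amounts to ranking candidates by expected revenue per impression. This sketch assumes cost-per-click bids and models "respect daily budgets" as a hypothetical eligibility rule (the remaining budget must cover one more click); the `Candidate` fields and the numbers are illustrative, not from the slides.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    ad_id: str
    pctr: float         # predicted click-through rate for this member
    bid: float          # assumed cost-per-click bid
    budget_left: float  # hypothetical: advertiser's remaining daily budget


def select_ad(candidates: List[Candidate]) -> Optional[Candidate]:
    """Return the budget-eligible candidate maximizing pCTR * bid, or None."""
    eligible = [c for c in candidates if c.budget_left >= c.bid]
    return max(eligible, key=lambda c: c.pctr * c.bid, default=None)


candidates = [
    Candidate("a", pctr=0.020, bid=2.00, budget_left=50.0),  # expected value 0.040
    Candidate("b", pctr=0.050, bid=1.00, budget_left=0.50),  # 0.050, but cannot afford a click
    Candidate("c", pctr=0.030, bid=1.50, budget_left=10.0),  # 0.045
]
print(select_ad(candidates).ad_id)  # c
```

Ad `b` has the highest pCTR × bid but its budget cannot cover another click, so `c` wins.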
Campaign creation
Virtual Profiling

Title          | Company  | Location | Skills
Eng Mgr        | LinkedIn | CA, USA  | ML, RecSys
Vice President | Twitter  | CA, USA  | DM, ML, RecSys
…              | …        | …        | …
Virtual Profiling

Title   | Company  | Location | Skills
Eng Mgr | LinkedIn | CA, USA  | ML, RecSys
Sr. SE  | Google   | PA, USA  | ML, DM
Eng Dir | LinkedIn | PA, USA  | ML, Stats, DM

Aggregated virtual profile (value<count>):
• Title: Sr. SE<1>, Eng Mgr<1>, Eng Dir<1>
• Company: LinkedIn<2>, Google<1>
• Location: PA, USA<2>, CA, USA<1>
• Skills: ML<3>, DM<2>, RecSys<1>, Stats<1>
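The aggregation above can be sketched with `collections.Counter`. The input dicts reproduce the three example profiles; the field names and the function `virtual_profile` are illustrative, and treating these members as ad clickers is an assumption based on the Information Gain slide.

```python
from collections import Counter


def virtual_profile(profiles, fields=("title", "company", "location", "skills")):
    """Aggregate member profiles into per-field value counts (the value<count> notation)."""
    agg = {f: Counter() for f in fields}
    for p in profiles:
        for f in fields:
            # Multi-valued fields (lists) contribute each value once; scalars count once.
            values = p[f] if isinstance(p[f], list) else [p[f]]
            agg[f].update(values)
    return agg


clickers = [
    {"title": "Eng Mgr", "company": "LinkedIn", "location": "CA, USA", "skills": ["ML", "RecSys"]},
    {"title": "Sr. SE", "company": "Google", "location": "PA, USA", "skills": ["ML", "DM"]},
    {"title": "Eng Dir", "company": "LinkedIn", "location": "PA, USA", "skills": ["ML", "Stats", "DM"]},
]
vp = virtual_profile(clickers)
print(vp["skills"].most_common())  # [('ML', 3), ('DM', 2), ('RecSys', 1), ('Stats', 1)]
```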
Virtual Profiling: Information Gain
• Pick the top K features that are overrepresented in the clicker distribution relative to the target segment
• The result is a representative projection of the item into the member feature space
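The slide names information gain but not the exact formula. As a hedged stand-in, this sketch scores each feature by its term in the KL divergence between the clicker distribution and the target-segment distribution, p_c(f) · log(p_c(f) / p_s(f)), with add-one smoothing, and keeps the top K. Feature strings such as `skill:ML` and the function name are hypothetical.

```python
import math
from collections import Counter


def top_k_overrepresented(clicker_features, segment_features, k, smoothing=1.0):
    """Return the k features whose clicker frequency most exceeds their segment frequency.

    Each feature f is scored by p_c(f) * log(p_c(f) / p_s(f)), its term in
    KL(clickers || segment); add-`smoothing` counts avoid division by zero.
    """
    clicks, segment = Counter(clicker_features), Counter(segment_features)
    vocab = set(clicks) | set(segment)
    c_total = sum(clicks.values()) + smoothing * len(vocab)
    s_total = sum(segment.values()) + smoothing * len(vocab)
    scores = {}
    for f in vocab:
        p_c = (clicks[f] + smoothing) / c_total
        p_s = (segment[f] + smoothing) / s_total
        scores[f] = p_c * math.log(p_c / p_s)
    # Sort by descending score; break ties alphabetically for determinism.
    return sorted(vocab, key=lambda f: (-scores[f], f))[:k]


clickers = ["skill:ML"] * 8 + ["skill:Sales"] * 2 + ["title:Eng Mgr"] * 5
segment = ["skill:ML"] * 2 + ["skill:Sales"] * 8 + ["title:Eng Mgr"] * 5
print(top_k_overrepresented(clickers, segment, k=1))  # ['skill:ML']
```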
CTR Prediction – CF Similarity
Diagram: member features, the ad creative's virtual profile, and creative features feed a ranker; its score passes through a score-to-pCTR correction to give $\mathrm{pCTR}_i$.
• L2-regularized logistic regression (Liblinear, VW, Mahout, ADMM)
• For new ad creatives, back off to the advertiser / ad category nodes until they reach a critical impression/click volume (explore/exploit)
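The back-off rule for new creatives can be sketched as: use the most specific node (creative, then advertiser, then ad category) that has reached a critical impression volume. The 1,000-impression threshold, the `NodeStats` type, and the use of raw empirical CTR are assumptions for illustration; a production system could instead blend these levels with the model's pCTR rather than switching hard between them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeStats:
    impressions: int
    clicks: int


def backoff_ctr(creative: Optional[NodeStats], advertiser: Optional[NodeStats],
                category: Optional[NodeStats], min_impressions: int = 1000) -> Optional[float]:
    """CTR of the most specific node with enough impressions, or None if none qualifies."""
    for node in (creative, advertiser, category):
        if node is not None and node.impressions >= min_impressions:
            return node.clicks / node.impressions
    return None


new_creative = NodeStats(impressions=200, clicks=10)
advertiser = NodeStats(impressions=5000, clicks=50)
category = NodeStats(impressions=1_000_000, clicks=8000)
print(backoff_ctr(new_creative, advertiser, category))  # 0.01
```

The new creative has only 200 impressions, so the estimate falls back to the advertiser's 50 / 5000 = 1% CTR.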
Feature Engineering – Entity Resolution
• Companies
  - 'IBM' has 8000+ variations, for example:
    - ibm – ireland
    - ibm research
    - T J Watson Labs
    - International Bus. Machines
    - Deep Blue
  - K-ambiguous
• Huge impact on the business and the user experience (UE)
  - Ad targeting
  - TalentMatch
  - Referrals
ASONAM ’11, KDD ’11
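A toy sketch of dictionary-based company-name resolution: normalize case and punctuation, expand abbreviations, then look up an alias table, falling back to the longest alias found as a contiguous run of tokens. The alias and abbreviation tables are hypothetical and tiny; the methods of the cited ASONAM ’11 and KDD ’11 work are not reproduced here, and K-ambiguous names such as "Deep Blue" would need context (for example industry or co-workers) that this sketch ignores.

```python
import re

# Hypothetical lookup tables; a real system would learn these from data.
ABBREVIATIONS = {"bus": "business", "intl": "international", "corp": "corporation"}
ALIASES = {
    "ibm": "IBM",
    "international business machines": "IBM",
    "t j watson labs": "IBM",
    "deep blue": "IBM",  # K-ambiguous in practice; mapped unconditionally here
}


def normalize(name):
    """Lowercase, split on non-alphanumerics, and expand known abbreviations."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return [ABBREVIATIONS.get(t, t) for t in tokens]


def resolve(name):
    """Map a free-text company name to a canonical company, or None if unknown."""
    tokens = normalize(name)
    full = " ".join(tokens)
    if full in ALIASES:
        return ALIASES[full]
    # Fall back to the longest alias appearing as a contiguous run of tokens.
    for alias in sorted(ALIASES, key=len, reverse=True):
        alias_tokens = alias.split()
        n = len(alias_tokens)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == alias_tokens:
                return ALIASES[alias]
    return None


for raw in ["ibm – ireland", "ibm research", "T J Watson Labs", "International Bus. Machines", "Acme Corp"]:
    print(raw, "->", resolve(raw))
```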
Feature Engineering – Sticky Locations
• Open to relocation?
  - Region similarity based on profiles or network
  - Region transition probability
  - Predict an individual's propensity to migrate and their most likely migration target
• Impact on job recommendations
  - 20% lift in views/viewers/applications/applicants
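Region transition probabilities can be estimated from observed (from, to) region pairs, for example consecutive positions on member profiles. This sketch assumes such pairs are available, treats the propensity to migrate as 1 − P(stay), and takes the most likely target as the highest-probability other region; the region names and functions are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(moves):
    """Estimate P(to | from) from (from_region, to_region) pairs; stays have from == to."""
    counts = defaultdict(Counter)
    for src, dst in moves:
        counts[src][dst] += 1
    probs = {}
    for src, row in counts.items():
        total = sum(row.values())
        probs[src] = {dst: n / total for dst, n in row.items()}
    return probs


def migration_outlook(probs, region):
    """Return (propensity to leave `region`, most likely destination or None)."""
    row = probs.get(region, {})
    propensity = 1.0 - row.get(region, 0.0)
    others = {dst: p for dst, p in row.items() if dst != region}
    target = max(others, key=others.get) if others else None
    return propensity, target


moves = ([("SF Bay Area", "SF Bay Area")] * 6
         + [("SF Bay Area", "New York")] * 3
         + [("SF Bay Area", "Seattle")])
probs = transition_probabilities(moves)
print(migration_outlook(probs, "SF Bay Area"))
```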
What should you transition to, and when?
Figure: probability of switch vs. months since graduation
Social Referral
Example referral:
  LinkedIn Group: Text Analytics
  From: Deepak Agarwal, Engineering Director, LinkedIn
  "I found this group interesting, and I think you will too."
  Deepak
• > 2X conversion

Mohammad Amin, Baoshi Yan, Sripad Sriram, Anmol Bhasin, Christian Posse. Social Referral: Using Network Connections to Deliver Recommendations. To appear in Proceedings of the Sixth ACM Conference on Recommender Systems (RecSys '12).
Orthogonality in A/B
Beware of some A/B testing pitfalls
1. Novelty effect
   E.g., new job recommendation algorithms have a week-long novelty effect that shows lifts twice the stationary (real) one
   Figure: job views per 5% bucket range on 6/5/11 (1-week lifts) vs. job views on 6/19/11 (2-week lifts)
2. Cannibalization
   Zero-sum game or real lift?
3. Random sampling destroys the network effect
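One way to check for the novelty effect is to compare the first week's lift against the lift once it has stabilized. This sketch assumes weekly metric totals per arm from equally sized buckets (the numbers are invented) and a one-week burn-in; a ratio near 2 would match the "twice the stationary lift" pattern described above.

```python
def lift(treatment, control):
    """Relative lift of treatment over control (0.10 means +10%)."""
    return treatment / control - 1.0


def novelty_ratio(weekly_treatment, weekly_control, burn_in_weeks=1):
    """First-week lift divided by the mean lift after the burn-in period."""
    lifts = [lift(t, c) for t, c in zip(weekly_treatment, weekly_control)]
    stationary = lifts[burn_in_weeks:]
    if not stationary:
        raise ValueError("need at least one week after burn-in")
    steady = sum(stationary) / len(stationary)  # ZeroDivisionError if the steady lift is exactly 0
    return lifts[0] / steady


# Invented weekly job-view totals for equally sized treatment and control buckets.
treatment = [1200, 1100, 1100]
control = [1000, 1000, 1000]
print(round(novelty_ratio(treatment, control), 2))  # 2.0
```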
Open Source Technologies
• Bobo
• Zoie
• Voldemort
• Kafka
http://data.linkedin.com
Credits

Engineering : Abhishek Gupta, Adam Smyczek, Adil Aijaz,
Alan Li, Baoshi Yan, Bee-Chung Chen, Deepak Agarwal,
Ethan Zhang, Haishan Liu, Igor Perisic, Jonathan
Traupman, Liang Zhang, Lokesh Bajaj, Mario Rodriguez,
Mitul Tiwari, Mohammad Amin, Monica Rogati, Parul
Jain, Paul Ogilvie, Sam Shah, Sanjay Dubey, Tarun Kumar,
Trevor Walker, Utku Irmak

Product : Andrew Hill, Christian Posse, Gyanda Sachdeva,
Mike Grishaver, Parker Barrile, Sachit Kamat

                                  Alphabetically sorted 
A Recommendation for You
Picture yourself in this new job: Applied Researcher / Research Engineer
Contact: abhasin@linkedin.com
http://data.linkedin.com/

Beyond ratings and followers (RecSys 2012)

  • 1. Beyond Ratings & Followers Anmol Bhasin Sr. Manager Analytics Engineering www.linkedin.com
  • 2.
  • 3.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11. The Recommender Ecosystem Similar Profiles Connections Network updates Events You May Be Interested In News 11
  • 12.
  • 13. LinkedIn Recommendation Engine Jobs Groups People Recommendation … Ads Entities Companies Searches be interested in Similar Groups Jobs You May Jobs Browse Browse Map Similar Jobs News Browse Map TalentMatch Groups Events GYML Referral Profiles People Similar Center Map … and more Products A/B API Recommen- dation Types Behavior Collaborative Popularity User Feedback Analysis Filtering Shared, Dynamic, Unified (R-T) Feature Extraction, Entity (R-T) matching computations Core Resolution & Enrichment Offline data munging (hadoop) Service
  • 14.
  • 16. Possible Approaches  Naïve K Nearest Neighbor solution  Complexity is O(n 2 )  Clustering  Latent Factor Models like PLSI or LDA  Hierarchical Agglomerative clustering  Self Organizing Maps  Item based Collaborative Filtering  Find pairs of Users viewed in the same session
  • 17. Challenges  Scale  175+ M profiles  Dimensionality  ~2M companies  ~200K schools  ~147 industries  ~200 countries  ~25K titles  ~40K Skills  ~200 Job Functions  Similar means different things to different people  Similar Behavior doesn’t mean you can replace me at my job  Accuracy vs Relevance (me & my boss.. )  Realtime..  It’s a problem of accuracy.. Not recall..
  • 18. Approach  Focus attention only on pairs likely to be similar  Filter out the possibly dis-similar pairs  Run Similarity Functions on filtered in pairs FILTER Cluster Rank
  • 19. Locality Sensitive Hashing  LSH function family for Cosine Distance
  • 20. Approach  Focus attention only on pairs likely to be similar  Filter out the possibly dis-similar pairs  Run Similarity Functions on filtered in pairs FILTER Cluster Rank
  • 21. Similarity Functions: different bands of attributes; Boolean, Jaccard or cosine similarities across attribute pairs; logistic regression with an elastic-net penalty, learning model parameters on a set of hand-labeled data points and interpreting the predicted value as a score.
  • 23. Ad Ranking. Given $U_j$, $\{(c_0, b_0), (c_1, b_1), (c_2, b_2), (c_3, b_3), \ldots, (c_n, b_n)\}$, $H$. Objective: $\arg\max_{i \in C} (\mathrm{pCTR}_i \cdot \mathrm{bid}_i)$. Goals: increase revenue; respect advertisers' daily budgets; good user experience.
  • 25. Virtual Profiling. Example profiles: (Title: Eng Mgr, Company: LinkedIn, Location: CA, USA, Skills: ML, RecSys); (Title: Vice President, Company: Twitter, Location: CA, USA, Skills: DM, ML, RecSys); ……
  • 26. Virtual Profiling. Member profiles: (Title: Eng Mgr, Company: LinkedIn, Location: CA, USA, Skills: ML, RecSys); (Title: Sr. SE, Company: Google, Location: PA, USA, Skills: ML, DM); (Title: Eng Dir, Company: LinkedIn, Location: PA, USA, Skills: ML, Stats, DM). Aggregated virtual profile: Title: Sr. SE<1>, Eng Mgr<1>, Eng Dir<1>; Company: LinkedIn<2>, Google<1>; Location: CA, USA<2>, PA, USA<1>; Skills: ML<2>, RecSys<1>, Stats<1>, DM<1>.
  • 27. Virtual Profiling: Information Gain. Pick the top-K overrepresented features from the clicker distribution vs. the target segment, giving a representative projection of the item in the member feature space.
  • 28. CTR Prediction: CF Similarity Ranker. Diagram labels: member features; ad creative; virtual profile; creative to features; score; pCTR; pCTR_i correction. L2-regularized logistic regression (Liblinear, VW, Mahout, ADMM). For new ad creatives, back off to the advertiser/ad category nodes until they reach critical impression/click volume (explore/exploit).
  • 30. Feature Engineering: Entity Resolution. Companies: 'IBM' has 8000+ variations, e.g. ibm – ireland, ibm research, T J Watson Labs, International Bus. Machines; K-Ambiguous: Deep Blue. Huge impact on the business and UE (user experience): ad targeting, TalentMatch, Referrals. ASONAM'11, KDD'11.
  • 31. Feature Engineering: Sticky Locations. Open to relocation? Region similarity based on profiles or network; region transition probability, used to predict an individual's propensity to migrate and most likely migration target. Impact on job recommendations: 20% lift in views/viewers/applications/applicants.
  • 32. What should you transition to, and when? [Chart: probability of switch vs. months since graduation]
  • 35. Social Referral. Example message: "LinkedIn Group: Text Analytics. From: Deepak Agarwal, Engineering Director, LinkedIn. I found this group interesting, and I think you will too. Deepak" Result: > 2X conversion. Mohammad Amin, Baoshi Yan, Sripad Sriram, Anmol Bhasin, Christian Posse. Social Referral: Using network connections to deliver recommendations. To appear in Proceedings of the Sixth ACM Conference on Recommender Systems (RecSys '12).
  • 38. Beware of some A/B testing pitfalls. 1. Novelty effect: e.g., new job recommendation algorithms have a week-long novelty effect that shows lifts twice the stationary (real) one. [Charts: job views per 5% bucket range (6/5/11, 6/19/11); 1-week lifts vs. 2-week lifts] 2. Cannibalization: zero-sum game or real lift? 3. Random sampling destroys the network effect.
  • 40. Open Source Technologies: Bobo, Zoie, Voldemort, Kafka. http://data.linkedin.com
  • 42. Credits. Engineering: Abhishek Gupta, Adam Smyczek, Adil Aijaz, Alan Li, Baoshi Yan, Bee-Chung Chen, Deepak Agarwal, Ethan Zhang, Haishan Liu, Igor Perisic, Jonathan Traupman, Liang Zhang, Lokesh Bajaj, Mario Rodriguez, Mitul Tiwari, Mohammad Amin, Monica Rogati, Parul Jain, Paul Ogilvie, Sam Shah, Sanjay Dubey, Tarun Kumar, Trevor Walker, Utku Irmak. Product: Andrew Hill, Christian Posse, Gyanda Sachdeva, Mike Grishaver, Parker Barrile, Sachit Kamat. (Alphabetically sorted)
  • 43. A recommendation for you: Picture yourself with this new job: You, Applied Researcher / Research Engineer.
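The item-based collaborative filtering option on slide 16 ("find pairs of users viewed in the same session") amounts to co-occurrence counting. Below is a minimal sketch, assuming session logs are available as lists of viewed profile IDs; the data and the helper names `co_view_counts` and `similar_profiles` are hypothetical. Counting every pair inside a session is itself quadratic in session length, which hints at why the talk turns to filtering candidate pairs first.

```python
from collections import Counter
from itertools import combinations


def co_view_counts(sessions):
    """Count how often each unordered pair of profiles is viewed in the same session."""
    counts = Counter()
    for viewed in sessions:
        # Deduplicate within a session and sort, so (a, b) and (b, a) share one key.
        for a, b in combinations(sorted(set(viewed)), 2):
            counts[(a, b)] += 1
    return counts


def similar_profiles(counts, profile, k=3):
    """Return up to k profiles most often co-viewed with `profile`."""
    scores = Counter()
    for (a, b), c in counts.items():
        if a == profile:
            scores[b] += c
        elif b == profile:
            scores[a] += c
    return [p for p, _ in scores.most_common(k)]


# Hypothetical session logs: each list holds the profiles viewed in one session.
sessions = [
    ["alice", "bob", "carol"],
    ["alice", "bob"],
    ["bob", "dave"],
    ["alice", "carol", "bob"],
]
counts = co_view_counts(sessions)
print(counts[("alice", "bob")])                # 3
print(similar_profiles(counts, "alice", k=2))  # ['bob', 'carol']
```

Raw co-view counts favor popular profiles, so in practice they would usually be normalized before ranking.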
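Slides 18 to 20 describe a filter, cluster, rank pipeline, and slide 19 names an LSH family for cosine distance without showing it. The textbook family for cosine distance is random hyperplanes (Charikar's SimHash): each hash bit records which side of a random hyperplane a vector falls on, and two vectors agree on a bit with probability $1 - \theta(x, y)/\pi$, where $\theta$ is the angle between them. The sketch below assumes that family (the slide does not say which construction was used) and bands the bits so that only pairs sharing at least one band reach the exact similarity step. `band_size`, the number of bits and the seed are illustrative choices.

```python
import math
import random
from collections import defaultdict


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian hyperplane normals of dimension dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, planes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0 for plane in planes)


def candidate_pairs(vectors, planes, band_size):
    """FILTER step: keep only pairs whose signatures agree on at least one band."""
    sigs = {key: signature(vec, planes) for key, vec in vectors.items()}
    pairs = set()
    for start in range(0, len(planes), band_size):
        buckets = defaultdict(list)
        for key, sig in sigs.items():
            buckets[sig[start:start + band_size]].append(key)
        for members in buckets.values():
            members.sort()
            for i in range(len(members)):
                for j in range(i + 1, len(members)):
                    pairs.add((members[i], members[j]))
    return pairs


def cosine(u, v):
    """Exact cosine similarity, computed only for the filtered-in pairs."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


# Hypothetical profile vectors: "b" points the same way as "a", "c" the opposite way.
vectors = {"a": [1.0, 2.0, 3.0], "b": [2.0, 4.0, 6.0], "c": [-1.0, -2.0, -3.0]}
planes = make_hyperplanes(dim=3, n_bits=16, seed=42)
pairs = candidate_pairs(vectors, planes, band_size=4)
ranked = sorted(((cosine(vectors[x], vectors[y]), x, y) for x, y in pairs), reverse=True)
print(ranked)
```

Shorter bands admit more candidate pairs (higher recall, more exact comparisons); longer bands admit fewer.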
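Slide 21 combines per-band similarities (Boolean, Jaccard or cosine) with a logistic regression trained on hand-labeled pairs. Below is a minimal scoring sketch, assuming four hypothetical attribute bands and hand-picked weights standing in for learned ones; the feature choices, `WEIGHTS` and `BIAS` are illustrative, not the model from the talk. The example profiles are taken from slide 26.

```python
import math


def jaccard(a, b):
    """Jaccard similarity of two attribute sets (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_features(p, q):
    """One similarity feature per attribute band (hypothetical band choice)."""
    return [
        1.0 if p["company"] == q["company"] else 0.0,    # Boolean match
        1.0 if p["location"] == q["location"] else 0.0,  # Boolean match
        jaccard(p["skills"], q["skills"]),               # set overlap
        jaccard(p["titles"], q["titles"]),               # set overlap
    ]


# Hypothetical parameters; in the slide's setup these would be learned by an
# elastic-net-penalized logistic regression on hand-labeled pairs.
WEIGHTS = [0.8, 0.5, 2.0, 1.5]
BIAS = -2.0


def similarity_score(p, q):
    """Logistic model output, interpreted as a similarity score in (0, 1)."""
    z = BIAS + sum(w * x for w, x in zip(WEIGHTS, pair_features(p, q)))
    return 1.0 / (1.0 + math.exp(-z))


eng_mgr = {"company": "LinkedIn", "location": "CA, USA", "skills": ["ML", "RecSys"], "titles": ["Eng Mgr"]}
eng_dir = {"company": "LinkedIn", "location": "PA, USA", "skills": ["ML", "Stats", "DM"], "titles": ["Eng Dir"]}
print(pair_features(eng_mgr, eng_dir))  # [1.0, 0.0, 0.25, 0.0]
print(similarity_score(eng_mgr, eng_dir))
```

To learn the weights instead of fixing them, scikit-learn's `LogisticRegression(penalty="elasticnet", solver="saga", l1_ratio=...)` fits this model form with an elastic-net penalty on labeled feature vectors.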
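Slide 23's objective picks, for member $U_j$, the creative that maximizes expected revenue per impression, $\mathrm{pCTR}_i \cdot \mathrm{bid}_i$. Below is a minimal sketch, assuming pCTR values are already predicted and that "respect daily budgets" is enforced by a simple hard filter on remaining budget. The `Candidate` fields and the filter rule are hypothetical; the slide does not say how budgets are paced.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    creative_id: str
    bid: float          # b_i: advertiser's bid per click
    pctr: float         # predicted click-through rate for this member
    budget_left: float  # hypothetical: advertiser's remaining daily budget


def pick_ad(candidates):
    """argmax over eligible creatives of pCTR_i * bid_i (expected revenue per impression)."""
    # Hypothetical budget rule: skip advertisers who could not pay for one more click.
    eligible = [c for c in candidates if c.budget_left >= c.bid]
    if not eligible:
        return None
    return max(eligible, key=lambda c: c.pctr * c.bid)


ads = [
    Candidate("c0", bid=2.0, pctr=0.01, budget_left=50.0),  # 0.02 expected
    Candidate("c1", bid=1.0, pctr=0.03, budget_left=10.0),  # 0.03 expected
    Candidate("c2", bid=5.0, pctr=0.02, budget_left=3.0),   # 0.10, but budget exhausted
]
print(pick_ad(ads).creative_id)  # c1
```

A hard cutoff like this tends to spend budgets early in the day; smoother pacing is a common alternative, but nothing on the slide indicates which approach was used.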
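Slides 25 to 27 build a "virtual profile" for an ad: aggregate the features of members who clicked it, then keep the top-K features that are overrepresented relative to the target segment. Below is a minimal sketch, assuming profiles are dicts mapping a field to a list of values. As a stand-in for the slide's information-gain criterion, it scores each feature by the pointwise KL term $p \log(p/q)$, where $p$ and $q$ are the feature's frequencies among clickers and in the segment. The data and function names are hypothetical.

```python
import math
from collections import Counter


def feature_distribution(profiles):
    """Fraction of profiles carrying each (field, value) feature."""
    counts = Counter()
    for profile in profiles:
        for field, values in profile.items():
            for value in set(values):
                counts[(field, value)] += 1
    return {feature: n / len(profiles) for feature, n in counts.items()}


def virtual_profile(clickers, segment, k=3, eps=1e-6):
    """Top-k features overrepresented among clickers relative to the target segment.

    Score: p * log(p / q), a KL-style stand-in for the slide's information gain.
    """
    p = feature_distribution(clickers)
    q = feature_distribution(segment)
    scores = {f: pf * math.log(pf / max(q.get(f, 0.0), eps)) for f, pf in p.items()}
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [f for f, s in ranked if s > 0][:k]


clickers = [
    {"title": ["Eng Mgr"], "company": ["LinkedIn"], "skills": ["ML", "RecSys"]},
    {"title": ["Sr. SE"], "company": ["Google"], "skills": ["ML", "DM"]},
    {"title": ["Eng Dir"], "company": ["LinkedIn"], "skills": ["ML", "Stats", "DM"]},
]
# Hypothetical target segment: the clickers plus members who did not click.
segment = clickers + [
    {"title": ["Sales Mgr"], "company": ["Oracle"], "skills": ["Sales"]},
    {"title": ["Recruiter"], "company": ["LinkedIn"], "skills": ["Recruiting"]},
    {"title": ["Accountant"], "company": ["Deloitte"], "skills": ["Finance"]},
]
print(virtual_profile(clickers, segment, k=2))  # [('skills', 'ML'), ('skills', 'DM')]
```

"LinkedIn" appears among two of three clickers but is also common in the segment, so it scores below ML and DM. That is the effect the "overrepresented vs. the target segment" criterion is meant to capture.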
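Slide 28's cold-start rule (back off from a new creative to its advertiser or ad category node until the creative has enough impressions/clicks) can be sketched as a walk up a hierarchy. Below is a minimal sketch, assuming a raw empirical CTR at each node and a fixed impression threshold. The node names, `min_impressions` and `prior_ctr` are hypothetical, and the talk's actual ranker is an L2-regularized logistic regression, not a raw CTR lookup.

```python
def backoff_ctr(node_stats, path, min_impressions=1000, prior_ctr=0.01):
    """Return (ctr, node) from the most specific node on `path` with enough impressions.

    `path` runs from specific to general, e.g. creative -> advertiser -> ad category.
    Falls back to `prior_ctr` when no node has reached `min_impressions`.
    """
    for node in path:
        clicks, impressions = node_stats.get(node, (0, 0))
        if impressions >= min_impressions:
            return clicks / impressions, node
    return prior_ctr, None


# Hypothetical (clicks, impressions) per node.
stats = {
    "creative:42": (1, 200),
    "advertiser:acme": (150, 10_000),
    "category:software": (900, 100_000),
}
path = ["creative:42", "advertiser:acme", "category:software"]
print(backoff_ctr(stats, path))                       # (0.015, 'advertiser:acme')
print(backoff_ctr(stats, path, min_impressions=100))  # (0.005, 'creative:42')
```

Once the creative itself crosses the threshold, its own estimate takes over. This is the "explore until critical volume, then exploit" behavior the slide describes.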
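Slide 31's "region transition probability" can be estimated, in its simplest form, as a first-order Markov chain over regions built from observed location changes. Below is a minimal sketch, assuming moves are available as (from_region, to_region) pairs, with a stay recorded as from == to. It estimates region-level rates only, whereas the slide predicts an individual's propensity, which would also need member features. The region names and counts are hypothetical.

```python
from collections import Counter, defaultdict


def transition_probabilities(moves):
    """Maximum-likelihood P(next region | current region) from observed (from, to) pairs."""
    counts = defaultdict(Counter)
    for src, dst in moves:
        counts[src][dst] += 1
    return {
        src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
        for src, dsts in counts.items()
    }


def propensity_to_migrate(probs, region):
    """Probability of leaving the region, i.e. 1 - P(stay)."""
    if region not in probs:
        return 0.0  # assumption: no evidence means the region is treated as sticky
    return 1.0 - probs[region].get(region, 0.0)


def likely_target(probs, region):
    """Most probable destination other than staying put, or None."""
    options = {dst: p for dst, p in probs.get(region, {}).items() if dst != region}
    return max(options, key=options.get) if options else None


moves = (
    [("SF Bay Area", "SF Bay Area")] * 6
    + [("SF Bay Area", "Seattle")] * 3
    + [("SF Bay Area", "New York")]
    + [("Seattle", "SF Bay Area")] * 2
)
probs = transition_probabilities(moves)
print(propensity_to_migrate(probs, "SF Bay Area"))  # 0.4
print(likely_target(probs, "SF Bay Area"))          # Seattle
```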
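Slide 38's novelty-effect warning can be checked by comparing the lift measured in an experiment's first week with the lift afterwards. Below is a minimal sketch with made-up daily counts. The seven-day window, the function names and the numbers are assumptions, chosen so that the early lift is about twice the later one, mirroring the slide's example.

```python
from statistics import mean


def daily_lift(treatment, control):
    """Relative lift per day: treatment / control - 1."""
    return [t / c - 1.0 for t, c in zip(treatment, control)]


def novelty_ratio(lifts, novelty_days=7):
    """Mean lift during the suspected novelty window divided by the mean lift after it."""
    return mean(lifts[:novelty_days]) / mean(lifts[novelty_days:])


# Hypothetical job views per day for two equally sized buckets over two weeks.
control = [1000] * 14
treatment = [1100] * 7 + [1050] * 7
lifts = daily_lift(treatment, control)
print(round(novelty_ratio(lifts), 2))  # 2.0
```

A ratio well above 1 suggests that reporting the first week alone would overstate the stationary lift.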

Editor's Notes

  1. 175+ M members; 2 members per second
  2. Taking a leaf out of Paolo Cremonesi's talk: the answer is 50%. There, I gave it away; it's time for coffee. 50% of connections are from recommendations (PYMK), 50% of job applications are from recommendations (JYMBII), and 50% of group joins are from recommendations (GYML).
  3. As a colleague of mine puts it, we are the tour de force for recommendations. From traditional recommender problems, i.e. recommending p
  4. I am spoilt for choice here. There is so much interesting work I could talk about, but today I picked a few interesting areas not classically considered mainline recommender products. In keeping with the ecosystem theme, though, this application fits right in. Let's talk about people recommendations, but not in the context of connecting, knowing, following, rating or dating. This is about cloning. Recruiters and head hunters: interview multiple people to fill one role. Hiring managers: "Hire more like the superstars on my team." LinkedIn: recommend jobs/news/groups that "people like you" act on. More conceivable applications: find similar leads for making a sales pitch, or "let me give you a sample of people I want to show this ad to; create me a segment", or
  5. An extensive set of tooling to target the population. Yes, we sort of shoot ourselves in the foot sometimes, but then members come first. Example audience, in real time. Lets advertisers tailor their campaigns. Also gives a real-time reach estimate.
  6. Solve the impedance mismatch by creating the Ad representation in the user space. This concept is used extensively at LinkedIn for all kinds of user recommendations, not just ads.
  7. 8000 name variants of IBM. We use the entity resolution terminology k-ambiguous and k-variant from [10]. The same company name can denote multiple company entities, but each occurrence of a company name references a single entity only. A name referring to k different entities is called k-ambiguous. Additionally, an entity which can be referred to by k different names is called k-variant. A ranker approach does not work: a given name may not be resolvable, in the sense that the company entity has not been created yet. Instead, this is a classification problem. Given a (member position, company entity) pair, a binary classifier determines whether there is enough evidence to resolve the member position to the company entity. This addresses the problem of the ranking approach, in that an unresolvable member position would most likely remain unresolved because the classifier has insufficient evidence for any company entity. It is certainly possible that multiple company entities have sufficient evidence for a member position.
  8. The unreasonable effectiveness of Big Data. This chart shows the probability of holding a title across all titles, plotted vs. the number of months after graduation. Notice the spikes: they are ~12 months apart, almost perfectly aligned. Remember the itch that you had when you finished 2 years at your company.
  9. A brand new recommendation delivery paradigm, tested on LinkedIn Groups to generate a 2X group join rate. Applicable to advertising as well. The idea is simple: reverse the social proof idea. Ask the actor to recommend that their connections interact with this item. The message comes from the individual, not LinkedIn, so it is inherently socially endorsed, timely and contextual. It can be applied to ads delivery, which we will be testing in the next few months.
  10. An incredibly powerful, vetted paradigm that we are excited to try to bring into our ads offerings.
  11. And now the technologies that drive it all. The core of our matching algorithm uses Lucene with our custom query implementation. We use Hadoop to scale our platform. It serves a variety of needs: computing collaborative filtering features, building Lucene indices offline, doing quality analysis of recommendations, and a host of other exciting things. Lucene does not provide fast real-time indexing, so to keep our indices up to date we use a real-time indexing library on top of Lucene called Zoie. We provide facets to our members for drilling down and exploring recommendation results; this is made possible by a faceted search library called Bobo. For storing features and caching recommendation results, we use a key-value store, Voldemort. For analyzing tracking and reporting data, we use a distributed messaging system called Kafka. Of these, Bobo, Zoie, Voldemort and Kafka were developed at LinkedIn and are open sourced. In fact, Kafka is an Apache incubator project. Historically, we have used R for model training; we have recently started experimenting with Mahout for model training.
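Note 7 frames company resolution (slide 30) as a binary decision per (member position, company entity) pair rather than a ranking. Under that framing, a position with no well-supported entity stays unresolved, and a position may match more than one entity. Below is a minimal sketch of that decision rule, assuming two hand-built string features and hand-picked weights. The features, alias lists, `WEIGHTS`, `BIAS` and `THRESHOLD` are hypothetical, not the classifier described in the talk.

```python
import math
import re


def tokens(name):
    """Lowercase, replace runs of non-alphanumerics with spaces, split into tokens."""
    return re.sub(r"[^a-z0-9]+", " ", name.lower()).split()


def pair_features(position_company, entity):
    """Features for one (member position, company entity) pair."""
    pos = set(tokens(position_company))
    names = [entity["name"]] + entity.get("aliases", [])
    exact = any(set(tokens(n)) == pos for n in names)
    best_jaccard = 0.0
    for n in names:
        other = set(tokens(n))
        union = pos | other
        if union:
            best_jaccard = max(best_jaccard, len(pos & other) / len(union))
    return [1.0 if exact else 0.0, best_jaccard]


# Hypothetical, hand-picked parameters (not learned).
WEIGHTS, BIAS, THRESHOLD = [6.0, 10.0], -4.0, 0.5


def resolve(position_company, entities):
    """Every entity the classifier accepts; an empty list means 'leave unresolved'."""
    accepted = []
    for entity in entities:
        z = BIAS + sum(w * x for w, x in zip(WEIGHTS, pair_features(position_company, entity)))
        if 1.0 / (1.0 + math.exp(-z)) >= THRESHOLD:
            accepted.append(entity["name"])
    return accepted


entities = [
    {"name": "IBM", "aliases": ["International Business Machines", "IBM Research"]},
    {"name": "Google", "aliases": ["Google Inc."]},
]
print(resolve("ibm – ireland", entities))    # ['IBM']
print(resolve("Google Research", entities))  # ['Google']
print(resolve("Acme Widgets", entities))     # []
```

Because every pair is judged independently, "Acme Widgets" is simply left unresolved rather than forced onto the closest existing entity. That is the advantage note 7 claims over a ranker.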