TITLE CATEGORIZATION 2.1
MENTOR: RAMESH SUBRAMONIAN
TEAM: DATA ANALYTICS
LEADER: DANIEL TUNKELANG
ACKNOWLEDGEMENTS:
SIMLA CEYHAN
DANIEL TUNKELANG
MONICA ROGATTI
LAUREN OLERICH
RON BEKKERMAN
CHRISTIAN POSSE
Motivation: CURRENT STATUS
25 JOB FUNCTIONS:
• TOO FEW
• TOO NON-SPECIFIC
• TOO BIG
Result: No Field Sales; Reporting is difficult

25000 CLEAN JOB TITLES:
• TOO MANY
• TOO BIG (“Owner” ~ 5M)
• TOO SMALL (~ 500)
• TOO SPECIFIC (“Human Resources Info. Sys. Mgr.”)
• TOO NON-SPECIFIC (“Specialist”)
CONSTRAINTS
• INPUT
  Clean title           “Impressions”
  …                     …
  facilities manager    95674
  …                     …

  Title 1     Title 2     Cosine
  …           …           (1,0)

• OUTPUT
  Clean title            Category
  …                      …
  Blonde hair stylist    Hair stylist
  Chair stylist          Furniture maker
  …                      …
  Owner                  VAGUE (UNCATEGORIZABLE)
  barista                Independent: not vague. Doesn’t fit in any
                         existing category, too small to form a category
  …                      …
CONSTRAINTS (CONTD)
• ~ 200 categories (from Sales: can be dealt with
  on human scale)
• Title maps to Unique category
• Precision over coverage
• Coverage ~ 80% of categorizable titles
• 2-3 nearest categories for each category
• 2 alternate categories for each title
Machine solution V00
User Domain Expert Feedback (Ester/Lauren in Sales)
Less than 1.5% change in coverage!
Illustrates “goodness” of computational solution!
Title - Category
Summary of Cat.s
Category Business Mgr
Software Engineer
Category Nbrs Summary
       ID   Category                    Nearest category                  ID
     1215   Account Director            Director Business Development     94
     1215   Account Director            Marketing Manager                 22
     1215   Account Director            Marketing Director                50
       16   Account Executive           Account Manager                    8
       16   Account Executive           Sales Manager                     10
       16   Account Executive           Director Sales                    39
        8   Account Manager             Sales Manager                     10
        8   Account Manager             Director Sales                    39
        8   Account Manager             Senior Account Manager           103
      478   Account Payable             Accountant                        42
      478   Account Payable             Accounting Manager               172
      478   Account Payable             Account Manager                    8
       42   Accountant                  Finance Manager                  108
       42   Accountant                  Accounting Manager               172
       42   Accountant                  Senior Accountant                147
      172   Accounting Manager          Accountant                        42
      172   Accounting Manager          Finance Manager                  108
      172   Accounting Manager          Financial Controller             191
       23   Administrative Assistant    Executive Assistant               45
       23   Administrative Assistant    Office Manager                    34
       23   Administrative Assistant    Assistant General Manager        326
      161   Area Manager                Sales Manager                     10
      161   Area Manager                Director Sales                    39
      161   Area Manager                Account Manager                    8
       83   Art Director                Creative Director                123
       83   Art Director                Web Designer                     183
       83   Art Director                Design Engineer                  107
      326   Assistant General Manager   Business Manager                3567
      326   Assistant General Manager   Officer                          254
      326   Assistant General Manager   Program Manager                   36
       37   Assistant Manager           Assistant General Manager        326
       37   Assistant Manager           Officer                          254
       37   Assistant Manager           Sales Manager                     10
       97   Assistant Professor         Instructor                       559
       97   Assistant Professor         Educator                         321
       97   Assistant Professor         Associate Professor              138
      138   Associate Professor         Instructor                       559
      138   Associate Professor         Assistant Professor               97
      138   Associate Professor         Lecturer                          85
       38   Attorney                    Counsel                          785
       38   Attorney                    Legal Assistant                  174
       38   Attorney                    Administrative Assistant          23
Cat. Nbr detail
Status
• Handed over to Ester/Lauren in Sales
• Iteratively incorporate human feedback
• Solution is public; code is documented and
  with Ramesh; final report in progress
• ~2-3 new technical innovations
• Developed a proposal for “titles” based on
  current understanding of LinkedIn needs
Feedback Functionality: Implemented
• Title:
1. Delete from Category (Independent)
2. Move to vague
3. Move to another category
4. Define new category

• Category:
1. Delete if empty
2. Rename
3. Merge with another
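
The feedback operations listed above can be pictured as edits on a title-to-category mapping. The sketch below is only an illustration under that assumption: the class `Categorization`, its method names, and the `VAGUE` / `INDEPENDENT` labels are hypothetical and are not taken from the delivered code.

```python
VAGUE = "VAGUE"              # hypothetical label for uncategorizable titles
INDEPENDENT = "INDEPENDENT"  # hypothetical label for titles outside any category


class Categorization:
    """Toy title -> category mapping supporting the listed feedback operations."""

    def __init__(self, title_to_category):
        self.title_to_category = dict(title_to_category)
        self.categories = set(self.title_to_category.values()) - {VAGUE, INDEPENDENT}

    # Title operations
    def delete_from_category(self, title):
        self.title_to_category[title] = INDEPENDENT

    def move_to_vague(self, title):
        self.title_to_category[title] = VAGUE

    def move_title(self, title, category):
        if category not in self.categories:
            raise KeyError(f"unknown category: {category}")
        self.title_to_category[title] = category

    def define_category(self, category, titles=()):
        self.categories.add(category)
        for title in titles:
            self.title_to_category[title] = category

    # Category operations
    def titles_in(self, category):
        return sorted(t for t, c in self.title_to_category.items() if c == category)

    def delete_if_empty(self, category):
        if self.titles_in(category):
            return False          # still has titles, so it is kept
        self.categories.discard(category)
        return True

    def rename(self, old, new):
        if old == new:
            return
        self.define_category(new, self.titles_in(old))
        self.categories.discard(old)

    def merge(self, source, target):
        for title in self.titles_in(source):
            self.move_title(title, target)
        self.categories.discard(source)


# Toy data only, loosely following the examples on the CONSTRAINTS slide.
cats = Categorization({
    "blonde hair stylist": "Hair stylist",
    "chair stylist": "Hair stylist",
    "owner": "Hair stylist",
})
cats.define_category("Furniture maker", ["chair stylist"])
cats.move_to_vague("owner")
```

Keeping exactly one category per title in a single dictionary enforces the "title maps to unique category" constraint by construction.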
Cool Technical stuff
• Distribution of membership over titles
  – How used
• Geometry of Title Word vector space
  – How used and should be
  – Lack of hyperstructure/scale
• How to cluster stars and “Local Dimension”
  – How used
  – Lack of asymptotic behaviour or transition point
    during clustering
Zipf’s law: Log(Imps) vs. Log(rank)
[Plot: LogImps (y-axis) vs. LogRank (x-axis); curves labelled “Zipf” and “Brot”]
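
A minimal sketch of the check behind a Zipf plot, assuming per-title impressions are available as a plain list: rank titles by decreasing impressions and fit a straight line to log10(impressions) against log10(rank). A slope near -1 is what Zipf's law predicts. The function name `zipf_slope` and the synthetic data are illustrative assumptions, not the deck's code or data.

```python
import math


def zipf_slope(impressions):
    """Least-squares slope of log10(impressions) against log10(rank).

    Assumes every impression count is strictly positive.
    """
    ranked = sorted(impressions, reverse=True)
    xs = [math.log10(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log10(value) for value in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic data following an ideal Zipf law: impressions = C / rank.
synthetic = [5_000_000 / rank for rank in range(1, 1001)]
slope = zipf_slope(synthetic)
print(f"fitted slope: {slope:.3f}")
```

On real title data the head and tail usually deviate from the straight line, so the fit is best restricted to the range of ranks of interest.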
Membership Distribution in Titles
• 90% members in 6000 titles
• 10% members in 19000 titles
[Plot: %ile of titles by impressions - %ile of titles by rank (impsminustitles, 0.0 to 0.6) VS. Rank of title (Rank_decr_imps, 0 to 20000); reference line of slope = -1; cut-off rank ~ 6000, where slope of curve is nearly -1]
• Slope drops to within some % of -1: diminishing marginal returns
• Cut-off should be based on marginal increase in potential earnings minus marginal increase in overhead costs
7/13/11 Grp Mtg    RSTate, LinkedIn
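
A cut-off rank of this kind can be found with a cumulative sum over titles sorted by decreasing impressions. The sketch below assumes a list of per-title impression counts and a fixed coverage target; `cutoff_rank` and the toy numbers are hypothetical, and the earnings-versus-overhead criterion mentioned on the slide is not modelled.

```python
def cutoff_rank(impressions, coverage=0.90):
    """Smallest number of top-ranked titles whose impressions reach `coverage`."""
    ranked = sorted(impressions, reverse=True)
    target = coverage * sum(ranked)
    running = 0.0
    for rank, value in enumerate(ranked, start=1):
        running += value
        if running >= target:
            return rank
    return len(ranked)


# Toy impression counts (total = 100), not real data.
toy = [50, 20, 15, 8, 4, 2, 1]
print(cutoff_rank(toy))        # number of titles needed for 90% of impressions
print(cutoff_rank(toy, 0.50))  # number of titles needed for 50% of impressions
```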
Projective Word-vector space
• Weighted point set embedded in Euclidean, with induced metric
• Based on Cosine Sim.
• 25008 points in 50,000 D! Recall that n points define only n-1 D
• DIMENSIONALLY SPARSE!, not just in density
• Most angles are nearly 90 deg.s
[Diagram: titles Ti (of size ni) and Tj separated by angle ϑij, drawn against XYZ, UVW and ABC axes, with boundary of nearest neighbour polyhedra of bins]
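
Why most angles sit near 90 degrees can be seen with a toy bag-of-words model: two titles that share no word have a zero dot product. The sketch below assumes whitespace tokenisation and raw word counts, which may differ from the weighting actually used; `angle_deg` is a hypothetical helper.

```python
import math
from collections import Counter


def angle_deg(title_a, title_b):
    """Angle in degrees between bag-of-words vectors of two non-empty titles."""
    a = Counter(title_a.lower().split())
    b = Counter(title_b.lower().split())
    dot = sum(a[word] * b[word] for word in a)  # Counter gives 0 for absent words
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    cosine = dot / (norm_a * norm_b)
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cosine))))


print(angle_deg("sales manager", "account manager"))  # one shared word out of two
print(angle_deg("barista", "attorney"))               # no shared word
```

With tens of thousands of distinct words and titles of two or three words each, a randomly chosen pair of titles rarely shares a word, which is consistent with the near-orthogonality noted on the slide.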
GEOMETRY OF DATA SPACE:
How should be used:
1. Project Title Word vectors onto N-1 simplex: Σ components = 1
2. Calculate Mean Word Vector
3. Drop Titles (KLPDS)
4. Recalculate the Mean Word Vector and MOVE there (increases discrimination)
5. Project vectors onto unit sphere
6. Angle is geodesic measure (distances, density etc.): Sin (θ/2) = |Ti-Tj|/2
[Diagram panels 1., 2-3., 4-5.: titles Ti and Tj separated by angle θ]
As opposed to?
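
The steps above can be sketched on toy vectors as follows. This is an illustration under stated assumptions, not the deck's implementation: step 3 (dropping titles) is skipped, so the mean of steps 2 and 4 coincide, every input vector is assumed to have a positive component sum, and no vector is assumed to coincide with the mean. All function names are hypothetical.

```python
import math


def to_simplex(v):
    """Step 1: scale so that the components sum to 1."""
    total = sum(v)
    return [x / total for x in v]


def mean_vector(vectors):
    """Steps 2 and 4: component-wise mean of the vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def to_unit_sphere(v):
    """Step 5: scale to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def geodesic_angle(u, v):
    """Step 6: angle between unit vectors via sin(theta/2) = |u - v| / 2."""
    chord = math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    return 2.0 * math.asin(min(1.0, chord / 2.0))


def prepare(vectors):
    on_simplex = [to_simplex(v) for v in vectors]
    centre = mean_vector(on_simplex)
    # Move the origin to the mean word vector before projecting to the sphere.
    centred = [[x - c for x, c in zip(v, centre)] for v in on_simplex]
    return [to_unit_sphere(v) for v in centred]


# Three toy titles, each using a different single word.
raw = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
before = math.degrees(geodesic_angle(to_unit_sphere(raw[0]), to_unit_sphere(raw[1])))
prepared = prepare(raw)
after = math.degrees(geodesic_angle(prepared[0], prepared[1]))
print(f"angle before centring: {before:.1f} deg, after: {after:.1f} deg")
```

In this toy case, moving the origin to the mean spreads the three titles further apart in angle, which is one way to read "increases discrimination".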
Radial distribution function of Titles
[Plot: count (y-axis, 0e+00 to 1e+07) vs. theta (x-axis, 10 to 70)]
Almost all angles are > 45
No SCALE OR higher order structure (for hierarchical taxonomy)
Log(count) vs. Theta
[Plot: count (y-axis, ticks 0 to 7) vs. theta (x-axis, 10 to 70)]
No scale or higher order structure (for hierarchical taxonomy)
Galaxies = Star Clusters
Dimension of Galaxies = Star Clusters
3             2+              2-        1+
LOCAL DIMENSION
Radius    Mass
1         1
2         8
3         27
4         64

Exponent (coeff. of linear term in log-log plot) = Dimension (above, it is 3)

Each point (title) has a local dimension Di, which is used to calculate the density of the cluster:

Imps/r^Di

These densities are then compared and the highest selected for categories
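
The radius/mass table can be checked numerically: the slope of log(mass) against log(radius) is the dimension, and the density then follows as Imps/r^Di. The sketch below uses the slide's four data points; `fit_exponent` and `cluster_density` are hypothetical helper names, and how radii and masses are measured around a real title is not specified here.

```python
import math


def fit_exponent(radii, masses):
    """Least-squares slope of log10(mass) against log10(radius)."""
    xs = [math.log10(r) for r in radii]
    ys = [math.log10(m) for m in masses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def cluster_density(impressions, radius, local_dim):
    """Density of a cluster: impressions divided by radius ** local dimension."""
    return impressions / radius ** local_dim


radii = [1, 2, 3, 4]
masses = [1, 8, 27, 64]
dimension = fit_exponent(radii, masses)
print(f"dimension: {dimension:.3f}")
print(cluster_density(64, 2, 3))
```

Dividing by r^Di rather than by a fixed power of r lets clusters of different local dimension be compared on a common density scale.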
Aggregate Radial Distribution of Titles
[Plot: log10(Number of Titles) (y-axis, 0 to 8) vs. log10(Theta) (x-axis, 1 to 2); series “logcount” with linear fit y = 6.5687x - 5.3293]
Average cluster dimension ~ 6.6
Log(count) vs Dim.
What does “dimension of cluster” mean?
[Plot: count (y-axis, ticks 0 to 10) vs. Dim (x-axis, 0 to 120)]
Power law evolution of clustering?
No natural break points.
[Plot: log(AvgDens, 10) (y-axis, 2.4 to 3.6) vs. log(Cats, 10) (x-axis, 2.2 to 3.4); Exponent = -1]
FLOW: Title categorization; Semantic network
Big Picture: Taxonomy
Use case 2: Search, Recc.
CLIENT: Recruiter, Advertiser, Sales Team or Search
[Diagram: “Top Level choices” row: Marketing, Management, Software, Manufacturing; next row: Marketing, Sales, Sales; “Relational” row: Sales Rep, Dir. Sales, VP Sales]
FLOW: Taxonomy; Title categorization; Semantic network
Big Picture: Relational
Use case 1: FIELD SALES
[Diagram: layers labelled Categories, Titles and Members under “Sales”; nodes Sales, Sales Rep, Sales Assoc., Sales Mgr, Reg. Sales Mgr; annotations Prob Defn 1, Prob Defn 2, PYMJPCOJ]
Inadequacy of Cosine Similarity
FLOW: Obtaining Clean titles 2.0; V2.1 LEANER DATA; Deconstruct V2.0 and V2.1; V2.2 Data Space; Title categorization; Semantic network
• Bit vectors differing in 1/3 of their 1-bits
   ~ 70% Cosine Similarity and 70% Sine Dissimilarity
• PROOF of maintaining preference order
   does NOT account for computational
   fragility: at θ = 6.3°,
   +/- 0.005 in Cosine => 2.6° to 8.5° in angle
• Vectors at 30 degs have Cosine Sim ~ 87%
• NOT a distance – NO geometry
• DOES NOT provide good discrimination
   between close neighbours

Even as intermediate means of calculating
angle, computationally fragile:
• Poor choice, prone to error in region of
   interest
• 0 < angle < pi/2 (Maximally dissimilar only
   90 degs away!)
• Inadequate notion of “maximally
   dissimilar”
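
The figures quoted on this slide can be reproduced with a few lines of arithmetic. The sketch below is a numerical check only; the particular bit vectors are an assumed example of "differing in 1/3 of their 1-bits" (three 1-bits each, two shared), and the variable names are illustrative.

```python
import math


def angle_from_cosine(cosine):
    """Angle in degrees for a cosine value in [-1, 1]."""
    return math.degrees(math.acos(cosine))


# Two bit vectors with three 1-bits each, sharing two of them.
a = [1, 1, 1, 0]
b = [1, 1, 0, 1]
dot = sum(x * y for x, y in zip(a, b))
# For bit vectors the squared norm equals the number of 1-bits.
cos_sim = dot / (math.sqrt(sum(a)) * math.sqrt(sum(b)))
sin_dis = math.sqrt(1.0 - cos_sim ** 2)

# Sensitivity of the recovered angle to a small error in the cosine.
theta = 6.3
c = math.cos(math.radians(theta))
low = angle_from_cosine(min(1.0, c + 0.005))
high = angle_from_cosine(c - 0.005)

cos_30 = math.cos(math.radians(30.0))

print(f"cosine similarity {cos_sim:.3f}, sine dissimilarity {sin_dis:.3f}")
print(f"cosine error of 0.005 at {theta} deg gives {low:.1f} to {high:.1f} deg")
print(f"cosine at 30 deg: {cos_30:.3f}")
```

The wide angle range arises because the cosine is nearly flat close to 0 degrees, so small errors in the cosine translate into large errors in the recovered angle exactly where near neighbours live.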
What does LinkedIn want from Titles?
1.   Navigational ease for Sales, Search, Recommendation
2.   Robust and maintainable structure
3.   Dynamic response to labor market changes
4.   Structure based on Domain expertise, NOT on member
     information
5.   Assignment of members based on profile and inferred info
6.   “Universal” acceptability
7.   Free and available? Has somebody else done the work?
8.   Expand use of LinkedIn as point of entry for
     recruiters, based on how they define jobs and use titles in
     searches
