TitleCategoriesLI

TITLE CATEGORIZATION 2.1
MENTOR: RAMESH SUBRAMONIAN
TEAM: DATA ANALYTICS
LEADER: DANIEL TUNKELANG
ACKNOWLEDGEMENTS:
SIMLA CEYHAN
DANIEL TUNKELANG
MONICA ROGATTI
LAUREN OLERICH
RON BEKKERMAN
FLOW
CHRISTIAN POSSE

Motivation: CURRENT STATUS
25 JOB FUNCTIONS:
• TOO FEW (no Field Sales)
• TOO NON-SPECIFIC (reporting is difficult)
• TOO BIG

25000 CLEAN JOB TITLES:
• TOO MANY
• TOO BIG (“Owner” ~ 5M)
• TOO SMALL (~ 500)
• TOO SPECIFIC (“Human Resources Info. Sys. Mgr.”)
• TOO NON-SPECIFIC (“Specialist”)

CONSTRAINTS
• INPUT
  Clean title            “IMPRESSIONS”
  …                      …
  facilities manager     95674
  …                      …

  Title 1    Title 2    Cosine
  …          …          …

• OUTPUT
  Clean title            Category
  …                      …
  Blonde hair stylist    Hair stylist
  Chair stylist          Furniture maker
  Owner                  VAGUE (UNCATEGORIZABLE) (1,0)
  barista                Independent: not vague. Doesn’t fit in any existing category, too small to form a category
  …                      …

CONSTRAINTS (CONTD)
• ~ 200 categories (from Sales: can be dealt with
on human scale)
• Title maps to Unique category
• Precision over coverage
• Coverage ~ 80% of categorizable titles
• 2-3 nearest categories for each category
• 2 alternate categories for each title

Machine solution V00

User Domain Expert Feedback (Ester/Lauren in Sales)

Less than 1.5% change in coverage!
Illustrates “goodness” of computational solution!

Category Nbrs Summary
(columns: category ID, category, neighbouring category, neighbour ID)
1215 Account Director Director Business Development 94
1215 Account Director Marketing Manager 22
1215 Account Director Marketing Director 50
16 Account Executive Account Manager 8
16 Account Executive Sales Manager 10
16 Account Executive Director Sales 39
8 Account Manager Sales Manager 10
8 Account Manager Director Sales 39
8 Account Manager Senior Account Manager 103
478 Account Payable Accountant 42
478 Account Payable Accounting Manager 172
478 Account Payable Account Manager 8
42 Accountant Finance Manager 108
42 Accountant Accounting Manager 172
42 Accountant Senior Accountant 147
172 Accounting Manager Accountant 42
172 Accounting Manager Finance Manager 108
172 Accounting Manager Financial Controller 191
23 Administrative Assistant Executive Assistant 45
23 Administrative Assistant Office Manager 34
23 Administrative Assistant Assistant General Manager 326
161 Area Manager Sales Manager 10
161 Area Manager Director Sales 39
161 Area Manager Account Manager 8
83 Art Director Creative Director 123
83 Art Director Web Designer 183
83 Art Director Design Engineer 107
326 Assistant General Manager Business Manager 3567
326 Assistant General Manager Officer 254
326 Assistant General Manager Program Manager 36
37 Assistant Manager Assistant General Manager 326
37 Assistant Manager Officer 254
37 Assistant Manager Sales Manager 10
97 Assistant Professor Instructor 559
97 Assistant Professor Educator 321
97 Assistant Professor Associate Professor 138
138 Associate Professor Instructor 559
138 Associate Professor Assistant Professor 97
138 Associate Professor Lecturer 85
38 Attorney Counsel 785
38 Attorney Legal Assistant 174
38 Attorney Administrative Assistant 23

Status
• Handed over to Ester/Lauren in Sales
• Iteratively incorporate human feedback
• Solution is public; code is documented and
with Ramesh; final report in progress
• ~2-3 new technical innovations
• Developed a proposal for “titles” based on
current understanding of LinkedIn needs

Feedback Functionality: Implemented
• Title:
1. Delete from Category (Independent)
2. Move to vague
3. Move to another category
4. Define new category

• Category:
1. Delete if empty
2. Rename
3. Merge with another
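
The feedback operations listed above can be sketched as a small data structure. This is a minimal illustration, not the deck's actual code: the class name `Categorization`, its method names, and the `VAGUE` / `INDEPENDENT` bucket labels are all assumptions made for the example. It keeps the constraint that each title maps to exactly one category.

```python
from dataclasses import dataclass, field

VAGUE = "VAGUE"              # hypothetical label for uncategorizable titles
INDEPENDENT = "INDEPENDENT"  # hypothetical label: not vague, but in no category


@dataclass
class Categorization:
    # title -> category name; a dict enforces one category per title
    title_to_category: dict = field(default_factory=dict)
    categories: set = field(default_factory=set)

    # ---- Title operations ----
    def delete_from_category(self, title):
        self.title_to_category[title] = INDEPENDENT

    def move_to_vague(self, title):
        self.title_to_category[title] = VAGUE

    def move_title(self, title, category):
        if category not in self.categories:
            raise KeyError(f"unknown category: {category}")
        self.title_to_category[title] = category

    def define_category(self, category):
        self.categories.add(category)

    # ---- Category operations ----
    def members(self, category):
        return sorted(t for t, c in self.title_to_category.items() if c == category)

    def delete_if_empty(self, category):
        if self.members(category):
            return False  # still has titles, so it is kept
        self.categories.discard(category)
        return True

    def merge(self, source, target):
        if source == target:
            return
        self.categories.add(target)
        for title in self.members(source):
            self.title_to_category[title] = target
        self.categories.discard(source)

    def rename(self, old, new):
        if new in self.categories:
            raise ValueError(f"category already exists: {new}")
        self.merge(old, new)


# Toy session using titles from the CONSTRAINTS slide
cats = Categorization()
cats.define_category("Hair stylist")
cats.define_category("Furniture maker")
cats.move_title("blonde hair stylist", "Hair stylist")
cats.move_title("chair stylist", "Hair stylist")     # machine guess
cats.move_title("chair stylist", "Furniture maker")  # expert correction
cats.move_to_vague("owner")
cats.delete_from_category("barista")
```

Treating rename as a merge into a fresh name keeps the two category operations consistent; a real tool might also log each change so feedback can be replayed iteratively.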

Cool Technical stuff
• Distribution of membership over titles
– How used
• Geometry of Title Word vector space
– How used and should be
– Lack of hyperstructure/scale
• How to cluster stars and “Local Dimension”
– How used
– Lack of asymptotic behaviour or transition point
during clustering

Zipf’s law: Log(Imps) vs. Log(rank)
[Figure: LogImps (3 to 6) vs. LogRank (0 to 4); legend: Zipf, Brot]
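
A Zipf check of this kind amounts to fitting a straight line to log10(impressions) against log10(rank). The sketch below is illustrative only: the function names are made up for the example and the data is synthetic (an exact 1/rank law scaled to 5M at rank 1), not the deck's title data.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept of y against x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def zipf_fit(impressions):
    """Fit log10(impressions) vs. log10(rank), rank 1 = most impressions."""
    ordered = sorted(impressions, reverse=True)
    log_rank = [math.log10(rank) for rank in range(1, len(ordered) + 1)]
    log_imps = [math.log10(value) for value in ordered]
    return fit_line(log_rank, log_imps)


# Synthetic data (assumption): impressions proportional to 1 / rank
synthetic = [5_000_000 / rank for rank in range(1, 1001)]
slope, intercept = zipf_fit(synthetic)  # slope is -1 for an exact Zipf law
```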

Membership Distribution in Titles
• 90% members in 6000 titles
• 10% members in 19000 titles
• Slope drops to within some % of -1: diminishing marginal returns
• Cut-off should be based on marginal increase in potential earnings minus marginal increase in overhead costs
• Slope of curve nearly -1 at cut-off rank ~ 6000
[Figure: %ile of titles by impressions - %ile of titles by rank (impsminustitles, 0.0 to 0.6) vs. rank of title (Rank_decr_imps, 0 to 20000); reference line with slope = -1]
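
The cut-off idea can be sketched as follows: sort titles by decreasing impressions, accumulate their share, and report the rank at which a target share is reached, together with the plotted gap (cumulative share of impressions minus cumulative share of titles). The function names and the toy numbers are assumptions for illustration, and impressions stand in for member counts.

```python
def cumulative_shares(impressions):
    """Cumulative share of impressions at each rank (rank 1 = largest title)."""
    ordered = sorted(impressions, reverse=True)
    total = sum(ordered)
    shares, running = [], 0
    for value in ordered:
        running += value
        shares.append(running / total)
    return shares


def cutoff_rank(impressions, share):
    """Smallest rank whose top titles hold at least `share` of all impressions."""
    for rank, covered in enumerate(cumulative_shares(impressions), start=1):
        if covered >= share:
            return rank
    return len(impressions)


def gap_curve(impressions):
    """Share of impressions minus share of titles, at each rank."""
    n = len(impressions)
    return [covered - rank / n
            for rank, covered in enumerate(cumulative_shares(impressions), start=1)]


# Toy data (assumption), not the real title counts
toy_impressions = [50, 30, 10, 5, 5]
rank_85 = cutoff_rank(toy_impressions, 0.85)
gaps = gap_curve(toy_impressions)
```

The gap curve peaks where adding one more title gains less than an average title's share, which is one way to read the "diminishing marginal returns" remark.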

7/13/11 Grp Mtg RSTate, LinkedIn 16

Projective Word-vector space
• Weighted point set embedded in Euclidean space, with induced metric
• Based on Cosine Sim.
• 25008 points in 50,000 D! Recall that n points define only n-1 D
• Boundary of nearest neighbour polyhedra of bins
• DIMENSIONALLY SPARSE!, not just in density
• Most angles are nearly 90 degrees
[Diagram: titles Ti (of size ni) and Tj separated by angle ϑij; axes labelled XYZ, UVW, ABC]

GEOMETRY OF DATA SPACE:
How should be used:
1. Project Title Word vectors onto N-1 simplex: Σ components = 1
2. Calculate Mean Word Vector
3. Drop Titles (KLPDS)
4. Recalculate the Mean Word Vector and MOVE there (increases discrimination)
5. Project vectors onto unit sphere
6. Angle is geodesic measure (distances, density etc.): sin(θ/2) = |Ti-Tj|/2
As opposed to?
[Diagram panels 1., 2-3., 4-5.: titles Ti and Tj separated by angle θ]
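
A minimal sketch of steps 1, 2 and 4 to 6, under stated assumptions: the word vectors are small toy count vectors, step 3 (dropping titles) is skipped because the slide does not say which titles are dropped, and all function names are invented for the example. The angle is recovered from the chord length using sin(θ/2) = |Ti-Tj|/2 for unit vectors.

```python
import math


def to_simplex(vector):
    """Step 1: scale so that the components sum to 1."""
    total = sum(vector)
    return [x / total for x in vector]


def mean_vector(vectors):
    """Steps 2 and 4: component-wise mean of a list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def to_unit_sphere(vector):
    """Step 5: scale to unit Euclidean length (fails for a zero vector)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def geodesic_angle_deg(u, v):
    """Step 6: angle between unit vectors from sin(theta/2) = |u - v| / 2."""
    chord = math.dist(u, v)
    return math.degrees(2 * math.asin(min(1.0, chord / 2)))


def recentre_on_sphere(word_vectors):
    """Simplex projection, move origin to the mean, project to unit sphere."""
    simplex = [to_simplex(v) for v in word_vectors]
    centre = mean_vector(simplex)
    moved = [[x - c for x, c in zip(v, centre)] for v in simplex]
    return [to_unit_sphere(v) for v in moved]


# Toy titles over a 3-word vocabulary (assumption)
toy_titles = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
raw = [to_unit_sphere(v) for v in toy_titles]
recentred = recentre_on_sphere(toy_titles)

angle_before = geodesic_angle_deg(raw[0], raw[1])              # about 60 degrees
angle_after = geodesic_angle_deg(recentred[0], recentred[1])   # about 120 degrees
```

In this toy case moving the origin to the mean widens the angle between the first two titles from about 60 to about 120 degrees, which illustrates the "increases discrimination" remark in step 4.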

Radial distribution function of Titles
[Figure: count (0 to 1e+07) vs. theta (10 to 70)]
Almost all angles are > 45
No scale or higher order structure (for hierarchical taxonomy)

Log(count) vs. Theta
[Figure: log(count) (0 to 7) vs. theta (10 to 70)]
No scale or higher order structure (for hierarchical taxonomy)

Dimension of Galaxies = Star Clusters

[Figure: example clusters labelled by dimension: 3, 2+, 2-, 1+]

LOCAL DIMENSION
Radius   Mass
1        1
2        8
3        27
4        64

Exponent (coeff. of linear term in log-log plot)
= Dimension (above, it is 3)

Each point (title) has a local dimension Di

Which is used to calculate density of the cluster:

Imps/r^Di

These densities are then compared
and highest selected for categories
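
The local-dimension idea can be sketched with the radius/mass table from the slide: the dimension is the slope of log(mass) against log(radius), and the density is then Imps/r^Di. The function names and the example impression count are assumptions for illustration only.

```python
import math


def local_dimension(radii, masses):
    """Least-squares slope of log(mass) vs. log(radius)."""
    xs = [math.log(r) for r in radii]
    ys = [math.log(m) for m in masses]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def cluster_density(impressions, radius, dimension):
    """Density as on the slide: Imps / r^Di."""
    return impressions / radius ** dimension


# Radius/mass values from the LOCAL DIMENSION table
radii = [1, 2, 3, 4]
masses = [1, 8, 27, 64]
dim = local_dimension(radii, masses)  # slope is 3 for mass = radius^3

# Hypothetical cluster: 6400 impressions within radius 4
density = cluster_density(6400, 4, dim)
```

Candidate clusters around different titles can then be compared on `density`, with the highest selected as categories, as the slide describes.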

Aggregate Radial Distribution of Titles
[Figure: log10(Number of Titles) (0 to 8) vs. log10(Theta) (1 to 2); linear fit y = 6.5687x - 5.3293]

Average cluster dimension ~ 6.6

Log(count) vs Dim.
What does “dimension of cluster” mean?
[Figure: log(count) (0 to 10) vs. Dim (0 to 120)]

Power law evolution of clustering?
No natural break points.
[Figure: log(AvgDens, 10) (2.4 to 3.6) vs. log(Cats, 10) (2.2 to 3.4); exponent = -1]

FLOW
Big Picture: Taxonomy
Title categorization
Semantic network

Use case 2: Search, Recruiter, Advertiser, Recc.
CLIENT: Sales Team or Search
[Diagram: Top Level choices (Management, Marketing, Manufacturing, Software); below them Marketing, Sales, Sales; Relational level: VP Sales, Dir. Sales, Sales Rep]


FLOW
Taxonomy
Title categorization
Semantic network

Big Picture: Relational
Use case 1: FIELD SALES
[Diagram: Sales; Categories: Sales, Sales Rep, Sales Assoc., Sales Mgr, Reg. Sales Mgr]
Prob Defn 1: Titles
Prob Defn 2: Members
PYMJPCOJ

FLOW
Obtaining Clean titles 2.0
V2.1 LEANER DATA
Deconstruct V2.0 and V2.1
V2.2 Data Space
Title categorization
Semantic network

Inadequacy of Cosine Similarity
• Bit vectors differing in 1/3 of their 1-bits
~ 70% Cosine Similarity
and 70% Sine Dissimilarity
• PROOF of maintaining preference order
does NOT account for computational
fragility: at θ = 6.3°,
+/- 0.005 in Cosine => 2.6° to 8.5° in angle
• Vectors at 30 degs have Cosine Sim ~ 85%
• NOT a distance, NO geometry
• DOES NOT provide good discrimination
between close neighbours

Even as intermediate means of calculating
angle, computationally fragile:
• Poor choice, prone to error in region of
interest
• 0 < angle < pi/2 (Maximally dissimilar only
90 degs away!)
• Inadequate notion of “maximally
dissimilar”
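
The fragility claim can be checked numerically. The sketch below recomputes the slide's figures: the angle range produced by a +/- 0.005 error in the cosine at θ = 6.3 degrees, the cosine at 30 degrees, and the cosine and sine for two bit vectors that share two of their three 1-bits. The variable names and the specific bit vectors are assumptions chosen for the example.

```python
import math


def angle_deg(cosine):
    """Angle in degrees for a given cosine similarity."""
    return math.degrees(math.acos(cosine))


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Small cosine error near theta = 6.3 degrees
theta = 6.3
cos_theta = math.cos(math.radians(theta))
angle_low = angle_deg(cos_theta + 0.005)   # roughly 2.6 degrees
angle_high = angle_deg(cos_theta - 0.005)  # roughly 8.5 degrees

# Vectors 30 degrees apart still look very similar by cosine
cos_30 = math.cos(math.radians(30))        # roughly 0.87

# Bit vectors differing in 1/3 of their 1-bits (toy example)
bits_a = [1, 1, 1, 0]
bits_b = [1, 1, 0, 1]
cos_bits = cosine_similarity(bits_a, bits_b)  # 2/3
sin_bits = math.sqrt(1 - cos_bits ** 2)       # roughly 0.75
```

Because the cosine is flat near zero angle, a tiny absolute error in the similarity translates into a threefold spread in the angle, which is the region where close neighbours need to be told apart.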

What does LinkedIn want from Titles?
1. Navigational ease for Sales, Search, Recommendation
2. Robust and maintainable structure
3. Dynamic response to labor mkt changes
4. Structure based on Domain expertise, NOT on member
information
5. Assignment of members based on profile and inferred info
6. “Universal” acceptability
7. Free and available? Somebody else done the work?
8. Expand use of LinkedIn as point of entry for
recruiters, based on how they define jobs and use titles in
searches
