© Avito
RecSys Challenge 2016: Job Recommendation Based on Factorization Machines and Topic Modelling
7th place solution
Vasily Leksin, Andrey Ostapets
Avito.ru
15-09-2016
Problem statement
Data description
∙ Impressions – details about which items (job postings) were shown to which user by the existing recommender (19 August 2015 – 9 November 2015).
∙ Interactions – interactions that users performed on the items (clicked, bookmarked, replied or deleted).
∙ Users – user details: job roles, career level, discipline, industry, location, experience and education.
∙ Items – item details: title, career level, discipline, industry, location, employment type, tags, creation time and a flag indicating whether the item was active during the test period.
Vasily Leksin, Andrey Ostapets Avito.ru 15-09-2016 2 / 23
Data description: impressions and interactions
Date interval: 2015-08-19 – 2015-11-09
Impressions
∙ 201M unique user-item-week tuples
∙ 2.7M unique users
∙ 846K unique items
Interactions
∙ 8.8M events: clicked – 7.2M, deleted – 1.0M, replied – 422K,
bookmarked – 206K
∙ 785K unique users
∙ 1.03M unique items
∙ 2.8M of 6.9M (user, item) pairs also appear in impressions
Data description: target users and items
150K target users to make recommendations for, of which:
∙ 39.7K (26.5%) have no events
∙ 59.5K (39.6%) have fewer than 2 events
∙ 70.6K (47.1%) have fewer than 3 events
327K active items, of which:
∙ 129K (39.5%) have no events
∙ 164K (50.1%) have fewer than 2 events
∙ 188K (57.6%) have fewer than 3 events
Task of the challenge
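$$\operatorname{score}(R,\hat{R}) = \sum_{u\in U}\Big[20\big(P_2(r_u,\hat{r}_u) + P_4(r_u,\hat{r}_u) + R_{30}(r_u,\hat{r}_u) + S_{30}(r_u,\hat{r}_u)\big) + 10\big(P_6(r_u,\hat{r}_u) + P_{20}(r_u,\hat{r}_u)\big)\Big],$$
where
$U = \{0, \dots, N-1\}$ – the set of target users,
$R = \{r_u\}_{u\in U}$ – lists of relevant items,
$\hat{R} = \{\hat{r}_u\}_{u\in U}$ – the submitted solution,
$P_k(r_u, \hat{r}_u)$ – precision at top k for user u,
$R_{30}(r_u, \hat{r}_u)$ – recall at top 30,
$S_{30}(r_u, \hat{r}_u)$ – user success: 1 if at least one of the top 30 recommended items is relevant, 0 otherwise.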
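A minimal Python sketch of the metric above, useful for local validation. It rests on our own reading of the formula, not on the official evaluation code. We assume that P_k always divides by k, even when fewer than k items are returned, and that R_30 divides by the number of relevant items. We also assume that users with no relevant items are skipped. All function names are hypothetical.

```python
from typing import Dict, List, Set


def precision_at(recs: List[int], relevant: Set[int], k: int) -> float:
    """P_k: share of the top-k recommendations that are relevant (divides by k)."""
    return len(set(recs[:k]) & relevant) / k


def recall_at(recs: List[int], relevant: Set[int], k: int) -> float:
    """R_k: share of the relevant items that appear in the top k."""
    return len(set(recs[:k]) & relevant) / len(relevant)


def user_success(recs: List[int], relevant: Set[int], k: int) -> float:
    """S_k: 1 if at least one relevant item is in the top k, else 0."""
    return 1.0 if set(recs[:k]) & relevant else 0.0


def challenge_score(solution: Dict[int, List[int]],
                    truth: Dict[int, Set[int]]) -> float:
    """Sum of the weighted per-user metrics over all target users."""
    total = 0.0
    for user, relevant in truth.items():
        if not relevant:
            continue  # assumption: users without relevant items add nothing
        recs = solution.get(user, [])
        total += 20 * (precision_at(recs, relevant, 2)
                       + precision_at(recs, relevant, 4)
                       + recall_at(recs, relevant, 30)
                       + user_success(recs, relevant, 30))
        total += 10 * (precision_at(recs, relevant, 6)
                       + precision_at(recs, relevant, 20))
    return total


# One user with two relevant items, both ranked within the top 6.
score = challenge_score({0: [1, 2, 3, 4, 5, 6]}, {0: {1, 5}})
```

In this example P_2 = 0.5, P_4 = 0.25, R_30 = S_30 = 1, P_6 = 1/3 and P_20 = 0.1. The user therefore contributes 20 · 2.75 + 10 · (1/3 + 0.1) ≈ 59.33.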
Models validation
∙ Validation period: the last week of interactions
∙ Validation users: 10,000 users sampled at random from those who made at least one interaction during that week
∙ Old items (created more than a month ago) without any interactions were removed
∙ The local validation score was highly correlated with the Public Leaderboard score
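The validation scheme above can be sketched as follows. The sketch makes three assumptions:

* Events are already filtered to positive interactions.
* "Last week" means the 7 days before the latest timestamp.
* "Removed" means excluded from the candidate item pool.

The helper names are hypothetical.

```python
import random
from datetime import datetime, timedelta
from typing import Dict, List, Set, Tuple

Event = Tuple[int, int, datetime]  # (user_id, item_id, timestamp)


def make_validation_split(events: List[Event], n_users: int = 10_000, seed: int = 0):
    """Hold out the last 7 days and sample validation users who were active then."""
    cutoff = max(ts for _, _, ts in events) - timedelta(days=7)
    train = [e for e in events if e[2] <= cutoff]
    holdout = [e for e in events if e[2] > cutoff]
    active = sorted({user for user, _, _ in holdout})
    rng = random.Random(seed)
    sampled = set(rng.sample(active, min(n_users, len(active))))
    truth: Dict[int, Set[int]] = {}
    for user, item, _ in holdout:
        if user in sampled:
            truth.setdefault(user, set()).add(item)
    return train, truth, cutoff


def candidate_items(created_at: Dict[int, datetime], train: List[Event],
                    as_of: datetime, max_age: timedelta = timedelta(days=30)) -> Set[int]:
    """Drop items older than max_age that have no training interactions."""
    with_events = {item for _, item, _ in train}
    return {item for item, created in created_at.items()
            if as_of - created <= max_age or item in with_events}
```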
Solution of the team
Interesting insights from data
∙ A significant proportion of users and items have few or no events. We therefore need a hybrid approach that combines collaborative filtering with the content data of items and users.
∙ Impressions change slowly over time, so the presence of a user-item pair in past impressions is a useful signal; we use it as a separate model.
∙ Geographical features (distance, region, city, geoclusters, etc.) did not improve the score significantly.
∙ Tokens from user profiles and items are good features.
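A minimal sketch of the binary Past Impressions (PI) signal described above. It assumes impressions arrive as (user_id, item_id) pairs from the weeks before the prediction period. The function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def build_past_impressions(impressions: Iterable[Tuple[int, int]]) -> Dict[int, Set[int]]:
    """Map each user to the set of items already shown to them."""
    shown: Dict[int, Set[int]] = defaultdict(set)
    for user, item in impressions:
        shown[user].add(item)
    return dict(shown)


def pi_score(shown: Dict[int, Set[int]], user: int, item: int) -> float:
    """Binary PI output: 1.0 if the pair was impressed before, else 0.0."""
    return 1.0 if item in shown.get(user, set()) else 0.0


def pi_candidates(shown: Dict[int, Set[int]], user: int, active_items: Set[int]) -> List[int]:
    """Previously shown items that are still active, usable as candidates."""
    return sorted(shown.get(user, set()) & active_items)
```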
Interesting insights from data
User profile and sessions example
CV
experience_n_ | experience | discipline_id_user | country_user | region_user | jobroles
5 or more entr | 10-15 years | Sales & Commerce | Germany | Bavaria | ['962959', '283291', '502342']
SESSIONS
created_at | impression | discipline_id_item | country_item | region_item | title | tags
09-01 1:27 | 1 | Production & Manufact | Germany | Baden-Württem | ['620383', '1118975'] | ['102823', '1335184', '624061', '73234', '1604815', '2862074'
09-01 1:27 | 1 | Other Disciplines | Germany | Hamburg | ['4572761', '3543754', '196892 | ['993979', '2426818', '792504', '4425481', '494116', '976257'
09-01 1:27 | 1 | Other Disciplines | Germany | Berlin | ['18091'] | ['4198994', '4182900', '4354582', '1399193', '1377742', '3580
09-02 0:21 | 1 | Health, Medical & Socia | non_dach | not specified | ['165415', '1986087', '2585795 | ['2426818', '3726822', '792504', '1830721', '184797', '325622
09-04 20:46 | 0 | IT & Software Developm | Germany | Brandenburg | ['655030'] | ['1491612', '972718', '2426818', '2383555', '4483314', '43216
09-08 20:16 | 1 | Other Disciplines | Germany | Hamburg | ['2915824', '4035399', '156769 | ['2110329', '503870', '2426818', '1437930', '2245760', '35922
09-08 22:40 | 1 | Sales & Commerce | Germany | not specified | ['3418410', '3413328'] | ['686709', '2036672', '3794933', '502342', '3413328', '117856
09-08 22:41 | 1 | IT & Software Developm | non_dach | not specified | ['3408137'] | ['2632767', '1491612', '2245760', '689679', '1565617', '43216
09-09 22:58 | 1 | Administration | Germany | Berlin | ['4141254', '1118975'] | ['4162864', '1491612', '1565617', '689679', '4204056', '15454
09-09 23:00 | 0 | Production & Manufact | Germany | Lower Saxony | ['4454260', '502342'] | ['543177', '4160943', '2501578', '4329775', '3085937', '23421
09-09 23:01 | 0 | Other Disciplines | Germany | Lower Saxony | ['1567693', '568776'] | ['1178568', '1248479', '370640', '2342166', '94890', '3794933
09-09 23:02 | 1 | Health, Medical & Socia | non_dach | not specified | ['1567693'] | ['1178568', '1565617', '1491612', '1601282', '2380081', '4941
09-09 23:08 | 0 | Finance, Accounting & C | Austria | not specified | ['2865345', '3294368'] | ['3391339', '3176219', '4499767', '2426818', '798840', '37295
09-09 23:08 | 0 | Production & Manufact | Germany | Lower Saxony | ['494116'] | ['3176219', '159096', '3391339', '4499767', '494116', '372957
09-09 23:08 | 0 | Health, Medical & Socia | Switzerland | not specified | ['128836', '1836819'] | ['1798728', '128836', '675557', '2976021']
09-09 23:09 | 0 | Production & Manufact | Austria | not specified | ['2846960', '76751', '4227194', | ['2846960', '2632767', '3872048', '3939477', '1469275', '1695
09-09 23:09 | 0 | Engineering & Technica | Germany | Berlin | ['4141254'] | ['749243', '362736', '692505', '3669898', '624061', '494116']
09-10 0:59 | 1 | Production & Manufact | Austria | not specified | ['1118975', '3478136'] | ['502342', '4151211', '4439048', '3210328', '624061', '313759
09-10 0:59 | 1 | Engineering & Technica | Germany | Bavaria | ['128836', '76887'] | ['816406', '3347566', '502342', '4425481', '4160943', '160481
09-10 0:59 | 1 | Engineering & Technica | Germany | North Rhine-W | ['1119117', '3705605', '347813 | ['2896178', '1357922', '2031982', '1491612', '1830721', '1335
09-10 1:00 | 0 | Other Disciplines | Austria | not specified | ['2915824', '4035399', '156769 | ['1565617', '1496767', '82994', '1625244', '1941434', '123188
09-10 1:00 | 1 | Teaching, R&D | Germany | North Rhine-W | ['1986087'] | ['3144475', '4245173', '3096790', '655817', '2969837', '43216
09-10 1:00 | 0 | Other Disciplines | Germany | not specified | ['2573697', '4035399', '448435 | ['3457262', '3658040', '2126708', '2110329', '2630003', '4017
09-10 1:00 | 0 | Engineering & Technica | Germany | Baden-Württem | ['2140778', '3241763'] | ['1734724', '2000691', '4425481', '2111897', '577140', '94890
09-10 1:00 | 1 | Management & Corpor | Germany | Bavaria | ['494116', '1119117', '2387379 | ['4245173', '1231885', '272304', '4140111', '4321623', '18307
Item-based collaborative filtering
Similarity metrics:
∙ Jaccard
∙ Cosine
∙ Pearson
Event types for training:
∙ All positive interactions
∙ Click interactions only
∙ Impressions
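The item-based recommenders can be sketched as follows. The sketch makes two assumptions:

* Each event type (all positive interactions, clicks only, impressions) is passed in separately as (user, item) pairs.
* An unseen item is scored by summing its similarities to the user's items. This is a common item-based rule; the slides do not spell out the exact one.

Pearson is computed on binary user-indicator vectors. All names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set, Tuple

Users = Set[int]


def users_by_item(events: Iterable[Tuple[int, int]]) -> Dict[int, Users]:
    """Item -> set of users who had the chosen event type with it."""
    index: Dict[int, Users] = defaultdict(set)
    for user, item in events:
        index[item].add(user)
    return dict(index)


def jaccard(a: Users, b: Users) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def cosine(a: Users, b: Users) -> float:
    return len(a & b) / math.sqrt(len(a) * len(b)) if a and b else 0.0


def pearson(a: Users, b: Users, n_users: int) -> float:
    """Pearson correlation of two binary user-indicator vectors of length n_users."""
    c, na, nb = len(a & b), len(a), len(b)
    denom = math.sqrt(na * (n_users - na) * nb * (n_users - nb))
    return (n_users * c - na * nb) / denom if denom else 0.0


def recommend(user_items: Set[int], index: Dict[int, Users],
              sim: Callable[[Users, Users], float], k: int = 30) -> List[int]:
    """Score each unseen item by the sum of its similarities to the user's items."""
    scores: Dict[int, float] = defaultdict(float)
    for seen in user_items:
        seen_users = index.get(seen, set())
        for cand, cand_users in index.items():
            if cand not in user_items:
                scores[cand] += sim(seen_users, cand_users)
    ranked = [item for item, s in scores.items() if s > 0]
    return sorted(ranked, key=lambda item: (-scores[item], item))[:k]
```

Pearson needs the total number of users, so pass it through a closure, e.g. `recommend(items, index, lambda a, b: pearson(a, b, n_users))`.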
Factorization Machines
Predicted score for user i on item j is given by:
$$p(i,j) = \mu + w_i + w_j + a^{T}x_i + b^{T}y_j + u_i^{T}v_j,$$
where
$\mu$ – a global bias term,
$w_i$ and $w_j$ are weight terms for user $i$ and item $j$ respectively,
$x_i$ and $y_j$ are the user and item side feature vectors,
$a$ and $b$ are the weight vectors for those side features,
$u_i$ and $v_j$ – latent factors, which are vectors of fixed length (the number of factors is a parameter).
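A NumPy sketch that evaluates p(i, j) term by term for already-trained parameters. Training is not shown. The layout (dense arrays indexed by user and item ids) and the class name are assumptions. Side features enter this formula only linearly, unlike a general FM with pairwise interactions between all features.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class FactorizationScorer:
    mu: float            # global bias
    w_user: np.ndarray   # (n_users,) user weight terms w_i
    w_item: np.ndarray   # (n_items,) item weight terms w_j
    a: np.ndarray        # (n_user_features,) weights of user side features
    b: np.ndarray        # (n_item_features,) weights of item side features
    U: np.ndarray        # (n_users, n_factors) latent factors u_i
    V: np.ndarray        # (n_items, n_factors) latent factors v_j

    def score(self, i: int, j: int, x_i: np.ndarray, y_j: np.ndarray) -> float:
        """p(i, j) for a single user-item pair."""
        return float(self.mu + self.w_user[i] + self.w_item[j]
                     + self.a @ x_i + self.b @ y_j + self.U[i] @ self.V[j])

    def score_all_items(self, i: int, x_i: np.ndarray, Y: np.ndarray) -> np.ndarray:
        """p(i, j) for every item j at once; Y holds item side features row-wise."""
        return (self.mu + self.w_user[i] + self.w_item + self.a @ x_i
                + Y @ self.b + self.V @ self.U[i])
```

The vectorised `score_all_items` is what a top-k ranking step would call per user. It computes the same six terms as `score`, broadcast over all items.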
Factorization Machines: main parameters
∙ Number of latent factors (30 – 400)
∙ Number of sampled negative examples (1 – 12)
∙ Maximum number of iterations (25 – 70)
∙ Regularization parameters (1e-9 – 1e-7)
∙ User and item side features
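A sketch of how "sampled negative examples" can be generated for implicit-feedback training, with the parameter ranges listed above recorded in a dictionary. The sketch assumes uniform sampling with replacement from items the user never interacted with; the slides do not state the sampling scheme. `SEARCH_SPACE`, `n_negative` and `sample_training_pairs` are hypothetical names.

```python
import random
from typing import Dict, List, Set, Tuple

# Ranges reported on the slide; how they were searched is not stated there.
SEARCH_SPACE = {
    "n_factors": (30, 400),
    "n_negative": (1, 12),
    "max_iterations": (25, 70),
    "regularization": (1e-9, 1e-7),
}


def sample_training_pairs(positives: Dict[int, Set[int]], all_items: List[int],
                          n_negative: int, seed: int = 0) -> List[Tuple[int, int, int]]:
    """(user, item, label) triples: every positive plus n_negative random negatives."""
    rng = random.Random(seed)
    triples: List[Tuple[int, int, int]] = []
    for user, items in positives.items():
        pool = [i for i in all_items if i not in items]  # items the user never touched
        for item in sorted(items):
            triples.append((user, item, 1))
            if pool:
                # Sampled uniformly with replacement from the user's unseen items.
                triples.extend((user, rng.choice(pool), 0) for _ in range(n_negative))
    return triples
```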
Factorization Machines: side features
Users - all features (OneHotEncoder)
∙ jobroles
∙ career_level, discipline_id, industry_id
∙ country, region
∙ experience: n_entries_class, years, years_in_current
∙ edu: degree, field_of_studies
Items - all features except latitude and longitude (OneHotEncoder)
∙ title, tags
∙ career_level, discipline_id, industry_id
∙ country, region
∙ employment_type
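The slide names OneHotEncoder, which may refer to scikit-learn's class; the slides do not confirm this. The sketch below instead uses a plain dictionary. That lets list-valued fields such as jobroles, title and tags become multi-hot vectors while scalar fields become one-hot. The class name and the record layout are assumptions.

```python
from typing import Dict, Iterable, Iterator, List, Union

Value = Union[str, int, List[Union[str, int]]]
Record = Dict[str, Value]


class OneHotVocabulary:
    """Maps feature=value tokens to column ids; list-valued fields become multi-hot."""

    def __init__(self) -> None:
        self.index: Dict[str, int] = {}

    @staticmethod
    def _tokens(record: Record) -> Iterator[str]:
        for feature, value in record.items():
            for v in (value if isinstance(value, list) else [value]):
                yield f"{feature}={v}"

    def fit(self, records: Iterable[Record]) -> "OneHotVocabulary":
        for record in records:
            for token in self._tokens(record):
                self.index.setdefault(token, len(self.index))
        return self

    def transform(self, record: Record) -> Dict[int, float]:
        """Sparse vector {column: 1.0}; values unseen during fit are dropped."""
        return {self.index[t]: 1.0 for t in self._tokens(record) if t in self.index}
```

For items, latitude and longitude would simply be left out of the records before `fit`, matching the exclusion above.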
Topic model: Latent Semantic Indexing (LSI)
∙ The document associated with each user consists of the title and tag tokens of all items the user interacted with, plus the job role tokens from the user's profile.
∙ Convert each document into a token-occurrence vector.
∙ Transform the values in each vector into TF-IDF weights and combine all vectors into one large token-document matrix.
∙ Apply Singular Value Decomposition (SVD) to the token-document matrix.
∙ The similarity between a user and an item is the similarity between their corresponding latent vectors.
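The LSI steps above can be sketched in NumPy. The sketch makes three assumptions:

* TF-IDF is computed as raw count × log(N / df).
* Each item's document is its own title and tag tokens.
* Both user and item documents are mapped into the latent space by projecting onto the top-k left singular vectors.

Function names are hypothetical. The hardware slide lists gensim as the library actually used.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

import numpy as np

Doc = List[str]


def build_vocab(docs: List[Doc]) -> Dict[str, int]:
    vocab: Dict[str, int] = {}
    for doc in docs:
        for token in doc:
            vocab.setdefault(token, len(vocab))
    return vocab


def idf_weights(docs: List[Doc], vocab: Dict[str, int]) -> np.ndarray:
    df = Counter(token for doc in docs for token in set(doc))
    idf = np.zeros(len(vocab))
    for token, row in vocab.items():
        idf[row] = math.log(len(docs) / df[token])
    return idf


def tfidf_matrix(docs: List[Doc], vocab: Dict[str, int], idf: np.ndarray) -> np.ndarray:
    """Token-document matrix (n_tokens x n_docs); tokens outside vocab are ignored."""
    m = np.zeros((len(vocab), len(docs)))
    for col, doc in enumerate(docs):
        for token, count in Counter(doc).items():
            if token in vocab:
                m[vocab[token], col] = count * idf[vocab[token]]
    return m


def fit_lsi(user_docs: List[Doc], k: int) -> Tuple[Dict[str, int], np.ndarray, np.ndarray, np.ndarray]:
    """Return vocab, idf, the projection U_k and the users' latent vectors."""
    vocab = build_vocab(user_docs)
    idf = idf_weights(user_docs, vocab)
    m = tfidf_matrix(user_docs, vocab, idf)
    u, _, _ = np.linalg.svd(m, full_matrices=False)
    u_k = u[:, :k]
    return vocab, idf, u_k, (u_k.T @ m).T


def project(docs: List[Doc], vocab: Dict[str, int], idf: np.ndarray, u_k: np.ndarray) -> np.ndarray:
    """Latent vectors for new documents (e.g. items), one row per document."""
    return (u_k.T @ tfidf_matrix(docs, vocab, idf)).T


def cosine_matrix(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Cosine similarity between every row of a and every row of b."""
    a = a / np.maximum(np.linalg.norm(a, axis=1, keepdims=True), 1e-12)
    b = b / np.maximum(np.linalg.norm(b, axis=1, keepdims=True), 1e-12)
    return a @ b.T
```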
Solution framework
Initial dataset → Item-based models, FM models, Topic model → Blending → Output
Linear Ensemble
Base models (blend weights and resulting local score):
FR0   SIM0   PI   Local Score
1     0      0    76995
0     1      0    69622
0     0      1    104495
1     1      1    132505
∙ SIM0 – Item-based recommender (Jaccard similarity)
∙ FR0 – Factorization Machines recommender (400 factors)
∙ PI – Past Impressions recommender (a very simple model with binary output)
∙ Local Score: score on 10,000 random users who interacted with items during the last week
∙ The score on the Public Leaderboard ≈ 3.8 × the score on our local validation
Linear Ensemble
Version "zero":
FR0   SIM0   PI   Local Score
1     2      1    134285
The first version:
FR0   SIM0   FR0^8.0 * SIM0   PI   Local Score
1     13     8                1    138073
The second version:
FR1   SIM0   FR1^8.0 * SIM0   PI   Local Score
1     13     8                1    140876
FR1 = FR_f100_i25
Linear Ensemble
The third version:
FR2   SIM2   FR2^8.0 * SIM2   PI   Local Score
1     13     8                1    143653
FR2 = 0.5*FR_f100_i25 + 0.5*FR_f400_i70
SIM2 = 0.5*SIM_jac + 0.5*SIM_click
Linear Ensemble
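Local Score (145841):
$$1.0\cdot\mathrm{FR}_3 + 15.0\cdot\left(\mathrm{FR}_3^{8.0}\cdot\mathrm{SIM}_3\right) + 13.0\cdot\mathrm{SIM}_3 + 1.0\cdot\mathrm{PI} - 0.5\cdot\mathrm{SIM\_pearson} - 0.3\cdot\mathrm{FR\_f400\_i50\_no\_side} + 0.5\cdot\left(\mathrm{FR\_imp}^{2.0}\cdot\mathrm{SIM\_imp}\right),$$
where
FR3 = 0.5*FR_f100_i25 + 0.5*FR_f400_i70
SIM3 = SIM_click
Local Score (146569):
$$1.0\cdot\mathrm{FR}_4 + 15.0\cdot\left(\mathrm{FR}_4^{8.0}\cdot\mathrm{SIM}_4\right) + 13.0\cdot\mathrm{SIM}_4 + 1.0\cdot\mathrm{PI} - 0.4\cdot\mathrm{SIM\_pearson} - 0.3\cdot\mathrm{FR\_f400\_i50\_no\_side} + 0.5\cdot\left(\mathrm{FR\_imp}^{2.0}\cdot\mathrm{SIM\_imp}\right) + 0.2\cdot\mathrm{TM},$$
where
FR4 = 0.5*FR_f100_i25 + 0.5*FR_f400_i70
SIM4 = 0.4*SIM_jac + 0.6*SIM_t=1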
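A sketch of the final blend (local score 146569) for one user-item candidate, followed by top-30 selection. The weights are copied from the formula above, and the dictionary keys follow the model names on the slides. "SIM_t=1" is kept literally because the slides do not define it.

The sketch makes two assumptions:

* Every model's scores are already on comparable scales; the slides do not describe any normalization.
* A model that did not score a candidate contributes 0.

```python
from typing import Dict, List, Mapping


def blend_final(scores: Mapping[str, float]) -> float:
    """Weighted blend of per-model scores for one (user, item) candidate."""
    def s(name: str) -> float:
        return scores.get(name, 0.0)  # assumption: missing model score counts as 0

    fr4 = 0.5 * s("FR_f100_i25") + 0.5 * s("FR_f400_i70")
    sim4 = 0.4 * s("SIM_jac") + 0.6 * s("SIM_t=1")
    return (1.0 * fr4
            + 15.0 * (fr4 ** 8.0 * sim4)
            + 13.0 * sim4
            + 1.0 * s("PI")
            - 0.4 * s("SIM_pearson")
            - 0.3 * s("FR_f400_i50_no_side")
            + 0.5 * (s("FR_imp") ** 2.0 * s("SIM_imp"))
            + 0.2 * s("TM"))


def top_k(candidates: Dict[int, Mapping[str, float]], k: int = 30) -> List[int]:
    """Rank one user's candidate items by blended score and keep the best k."""
    blended = {item: blend_final(sc) for item, sc in candidates.items()}
    return sorted(blended, key=lambda item: (-blended[item], item))[:k]
```

The product terms FR^8.0 · SIM and FR_imp^2.0 · SIM_imp act as interaction features. An item needs support from both the factorization and the similarity model to gain from them.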
Final models set
SIM_jac Item-based Jaccard similarity
SIM_click Item-based Jaccard similarity on clicks
SIM_pearson Item-based Pearson similarity
SIM_imp Item-based Jaccard similarity on impressions
FR_f100_i25 Factorization, n_factors=100, iter=25
FR_f400_i70 Factorization, n_factors=400, iter=70
FR_f400_i50_no_side Factorization, no side data
FR_imp Factorization on impressions
TM LSI topic model
PI Past Impressions
Hardware & Software
∙ 1 server: 28 cores, 56 threads, 256 GB RAM
∙ Full training + prediction: 16 hours
∙ All code was written in Python
∙ ML Libraries: graphlab, gensim
Leaderboard
Submission history
Score Rank Name Date
554655 9 Topic model 100 factors 06/27/16
548366 8 Top 150 candidates from every model 06/25/16
543284 8 8 model set: 4FR + 4SIM + TM 06/24/16
537157 9 Topic model 06/23/16
530599 10 3 models set: FR + 2SIM 06/23/16
497136 15 3 models set: FR + 2SIM 06/22/16
496241 1 FR with side data 03/20/16
397604 1 Past Impressions model 03/11/16
132790 1 Simple item-based recommender 03/10/16
Results
Rank Team Leaderboard Score Full Score
1 YunOS-OneSearch 681707.38 2052185.54
2 mim-solutions 675985.03 2035964.16
3 DaveXster 665592.06 2005263.73
4 PumpkinPie 622408.55 1866477.77
5 milk tea 613125.21 1846420.12
6 mdr_rec 605048.58 1823472.31
7 Avito 554654.72 1677898.52
8 recometric 556133.18 1677233.84
9 nodalpoints 555483.39 1671812.08
10 lucky_dog 542213.51 1632828.82
21 XING_TELECOM 461000.32 1397030.74
Thank you!