
Machine Learning at LINE


Haruka Kikuchi
LINE / Machine Learning Team

He will describe how machine learning and recommendation engine technologies have been planned and implemented in LINE services, giving concrete examples and an overall picture.

Data Labs is an independent division, separate from the other business departments, whose mission is the company-wide use of data. It supports LINE services in many ways, including high-level analysis by data scientists, infrastructure for recommendation engines and analysis, and a variety of published reports.

This session will focus on initiatives related to machine learning, in particular:

- How they define the roles and responsibilities of each team in order to provide various types of machine learning technologies
- What tactics they use to deliver these technologies to diverse services and a large number of users
- How they use emerging technologies such as deep learning



  1. MACHINE LEARNING AT LINE: Haruka Kikuchi, Data Labs
  2. Agenda • Who We Are • Infrastructure • ML Examples
  3. WHO WE ARE
  4. DATA LABS ● Approx. 80 people in total ● Independent of the service/development departments ● Aggregates various data ● Provides platforms, tools, BI/reports, and ML solutions, e.g., recommender engines ● Works with services such as Sticker, Ad, Manga, Music, Live, and News
  5. MACHINE LEARNING TEAM ● Roles: Project Manager, Server-side/Infra Engineer, Machine Learning Engineer ● ML engineers are multi-skilled: stats, math, deep learning, NLP, etc. ● Some members play multiple roles
  6. DAILY OUTPUT by the Machine Learning Team ● Supports 10+ services ● Runs 100+ trainings per day ● Runs 1000+ predictions per day
  7. INFRASTRUCTURE
  8. SYSTEM OVERVIEW
  9. SYSTEM OVERVIEW
  10. SYSTEM OVERVIEW
  11. SYSTEM OVERVIEW
  12. SYSTEM OVERVIEW
  13. DEVELOPMENT ENVIRONMENT: To Build ML Engines
  14. A/B TEST TOOLSET: To Test ML Logic
  15. SYSTEM OVERVIEW
  16. ML EXAMPLES
  17. CONTENT RECOMMENDATION
  18. STICKER RECOMMENDATIONS: Item2Item, User2Item
  19. STICKER RECOMMENDATIONS ● #sticker packages: approx. 5M ● #users per region: 100M+ ● #jobs: < 10
  20. COLLABORATIVE FILTERING for Sticker Recommendations ● Item2item: purchase history → similarity among items → top-N items for each item ● User2item: user activity → preference → top-M items for each user
  21. ML COMPUTATION: Preprocessing (ETL) → Calc. item2item → Calc. user2item
  22. PURCHASE: Generated revenue from the User2item recommendation (within the top page): 25%+
  23. OTHER CONTENT RECOMMENDATIONS: Sticker, etc.; Manga; News; Live; Part-time; Fortune-telling; Music; Store
  24. USER RECOMMENDATION
  25. RECOMMEND USERS (“LOOK-ALIKE” AUDIENCE): To Expand Customers ● From existing customers to potential customers ● 200M total LINE active users
  26. LOTS OF MODELS: Customers (Seed Users) Are Very Different ● 200M total LINE active users
  27. “LOOK-ALIKE” AUDIENCE: For Training ● #seed users: 100 to 1M (relatively small) ● #features: 10M (z-features subset) ● #daily jobs: 300 (trained models)
  28. SPARSE DNN: To Infer Potential Customers ● Input: z-features (dim: 10M) ● Output: score from 0 to 1 (dim: 1, scalar)
  29. SPARSE DNN: To Infer Potential Customers ● Input: z-features (dim: 10M) ● Output: score from 0 to 1 (dim: 1, scalar)
  30. ML COMPUTATION: Preprocessing (ETL) → Training → Inference
  31. UX IMPROVEMENT
  32. STICKER AUTO-SUGGEST: Label Semantic Tags to Sticker Images
  33. MANUAL LABELING
  34. TAG COLLOCATION
  35. TRANSFER LEARNING: Start from a Well-Trained Model ● Pre-trained: ImageNet dataset (input) → Xception model (trained) → ImageNet categories (output) ● Fine-tuned: sticker images (input) → Xception model (tuned) + additional dense layers → sticker tags (approx. 350, output)
  36. ML COMPUTATION: Preprocessing (ETL) → Train a Model → Inference
  37. EXAMPLES ● True positives: labeled and correctly predicted ● False positives: not labeled, but predicted ● False negatives: labeled, but missed by the prediction
  38. Example (image): TP / FP / FN ● Some tags were not labeled by the creator but were correctly inferred ● Language agnostic
  39. Example (image): TP / FP / FN
  40. RECALL > PRECISION: False positives are acceptable when suggesting potential sticker availability
  41. CONTENT RECOMMENDATION REVISITED
  42. IMAGE-BASED RECOMMENDATION: To Cope with the Cold Start Problem
  43. TWO SIMILARITIES ● Semantics: expressed as tags ● Appearance: depends on sticker creators
  44. TWO MODELS ● Semantics: sticker images (input) → Xception model (tuned) + additional dense layers → sticker tags (approx. 350, output) ● Appearance: sticker images (input) → Xception model (tuned) + additional dense layers → sticker creators (1000+, output)
  45. ONE REPRESENTATION per Sticker Image: the feature vector for each sticker image is the concatenation of features from the two tuned models (semantics, appearance)
  46. ML COMPUTATION: Preprocessing (ETL) → Train Model(s) → Calc. Representations
  47. IMAGE SIMILARITIES: Origin and Target ● Similar ↔ Less Similar ● More Semantic ↔ Less Semantic
  48. EX. #1
  49. EX. #2
  50. EX. #3
  51. EX. #4
  52. CONCLUSION
  53. HOW WE SCALE ML PROJECTS ● Work with great infrastructure and people, which allows us to focus on ML ● Design ML to scale by default ● Z-features (reusable, extensible) ● Computationally efficient algorithms ● Language-agnostic algorithms
  54. PRESENTED ● Who we are ● Infrastructure: datalake + ML cluster ● ML examples: sticker recommendations; DNN examples (“look-alike” audience, stickers)
  55. NOT PRESENTED ● A/B testing in detail (presented separately) ● Audio DNN (poster) ● Sparse DNN, contextual bandits (poster) ● DNN on mobile (in progress)
  56. WE’RE HIRING ● Access to virtually all LINE services and data ● Great coworkers ● All positions are open: ML engineer, server/infra engineer, PM
  57. THANKS!
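Slides 20 and 21 describe item2item and user2item collaborative filtering that runs after an ETL step. The following is a minimal single-machine sketch of that idea. It assumes binary purchase data and cosine similarity. The function names, the similarity measure, and the rule that scores a user's unseen items by summing item2item similarities are illustrative assumptions, not LINE's implementation. The slides also list user activity and preference as user2item inputs, and this sketch does not model either one.

```python
from collections import defaultdict
from math import sqrt


def item2item_top_n(history, n):
    """Top-N most similar items for each item.

    history: hypothetical mapping of user id -> set of item ids the user bought.
    Similarity is cosine similarity between binary "who bought it" vectors.
    """
    buyers = defaultdict(set)  # item id -> set of users who bought it
    for user, items in history.items():
        for item in items:
            buyers[item].add(user)

    top = {}
    for a in buyers:
        scored = []
        for b in buyers:
            if a == b:
                continue
            overlap = len(buyers[a] & buyers[b])
            if overlap:
                sim = overlap / sqrt(len(buyers[a]) * len(buyers[b]))
                scored.append((sim, b))
        # Highest similarity first; ties broken by item id for determinism.
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        top[a] = [(b, sim) for sim, b in scored[:n]]
    return top


def user2item_top_m(history, item_neighbors, m):
    """Top-M unseen items per user, scored by summed item2item similarity."""
    recs = {}
    for user, owned in history.items():
        scores = defaultdict(float)
        for item in owned:
            for other, sim in item_neighbors.get(item, []):
                if other not in owned:
                    scores[other] += sim
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        recs[user] = [item for item, _ in ranked[:m]]
    return recs


if __name__ == "__main__":
    history = {"u1": {"a", "b"}, "u2": {"a", "b", "c"}, "u3": {"b", "c"}, "u4": {"a"}}
    neighbors = item2item_top_n(history, n=2)
    print(neighbors["a"])
    print(user2item_top_m(history, neighbors, m=1))
```

The pairwise loop is quadratic in the number of items. At LINE's scale, a real job would need sparse or distributed computation. The sketch only shows the data flow from purchases to top-N items per item and top-M items per user.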
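Slides 28 and 29 describe a sparse DNN that maps 10M-dimensional z-features to a single score between 0 and 1. The sketch below shows only the forward pass of such a network, under these assumptions: features are binary and given as lists of active indices, there is one hidden ReLU layer, and the weights are random and untrained. The class `SparseDNNScorer`, the helper `top_k_candidates`, the layer sizes, and the initialization are hypothetical and do not come from the slides. The sketch illustrates that, with sparse input, the first layer only touches the weight rows of active features. Cost therefore scales with the number of nonzero features, not with the 10M input dimension.

```python
import math
import random


class SparseDNNScorer:
    """Hypothetical look-alike scorer: sparse binary features -> ReLU layer -> sigmoid.

    Weights are random here (untrained); only the forward pass is shown.
    """

    def __init__(self, input_dim, hidden_dim, seed=0):
        self.input_dim = input_dim
        self.hidden_dim = hidden_dim
        self.seed = seed
        self.rows = {}  # first-layer rows, created only for features actually seen
        rng = random.Random(seed)
        self.b1 = [0.0] * hidden_dim
        self.w2 = [rng.gauss(0.0, 0.1) for _ in range(hidden_dim)]
        self.b2 = 0.0

    def _row(self, index):
        if not 0 <= index < self.input_dim:
            raise IndexError(f"feature index {index} out of range")
        if index not in self.rows:
            # Deterministic per-feature initialization so scores are reproducible.
            rng = random.Random(f"{self.seed}:{index}")
            self.rows[index] = [rng.gauss(0.0, 0.1) for _ in range(self.hidden_dim)]
        return self.rows[index]

    def score(self, active_features):
        # Sparse x @ W1 equals the sum of the W1 rows of the active features.
        hidden = list(self.b1)
        for index in sorted(set(active_features)):  # fixed order -> identical floats
            for j, w in enumerate(self._row(index)):
                hidden[j] += w
        hidden = [max(0.0, h) for h in hidden]  # ReLU
        logit = self.b2 + sum(w * h for w, h in zip(self.w2, hidden))
        # Numerically stable sigmoid -> score in (0, 1).
        if logit >= 0:
            return 1.0 / (1.0 + math.exp(-logit))
        z = math.exp(logit)
        return z / (1.0 + z)


def top_k_candidates(scorer, users, k):
    """Rank candidate users (id -> active feature indices) by score, highest first."""
    ranked = sorted(users, key=lambda u: (-scorer.score(users[u]), u))
    return ranked[:k]
```

In a look-alike setting, a separate model like this would presumably be trained for each seed-user set, which fits the "lots of models" point on slide 26. A helper such as `top_k_candidates` would then pick the highest-scoring users as potential customers.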
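Slides 37 to 40 define true positives, false positives, and false negatives for tag prediction and argue that recall matters more than precision for auto-suggest. The minimal sketch below computes micro-averaged precision and recall from those three counts. The tag names and probabilities are made up for illustration. Choosing tags with a probability threshold is an assumption, because the slides do not say how predicted tags are selected.

```python
def predict_tags(tag_probs, threshold):
    """Keep every tag whose predicted probability reaches the threshold."""
    return {sticker: {tag for tag, p in probs.items() if p >= threshold}
            for sticker, probs in tag_probs.items()}


def precision_recall(labeled, predicted):
    """Micro-averaged precision and recall over all (sticker, tag) pairs."""
    tp = fp = fn = 0
    for sticker, true_tags in labeled.items():
        pred = predicted.get(sticker, set())
        tp += len(true_tags & pred)  # labeled and predicted
        fp += len(pred - true_tags)  # predicted but not labeled
        fn += len(true_tags - pred)  # labeled but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


if __name__ == "__main__":
    labeled = {"s1": {"happy", "thanks"}, "s2": {"sad"}}
    tag_probs = {"s1": {"happy": 0.9, "thanks": 0.4, "sad": 0.2},
                 "s2": {"sad": 0.6, "happy": 0.35}}
    for threshold in (0.5, 0.3):
        print(threshold, precision_recall(labeled, predict_tags(tag_probs, threshold)))
```

In this toy example, lowering the threshold from 0.5 to 0.3 raises recall from 2/3 to 1.0, while precision drops from 1.0 to 0.75. Slide 40 argues that this trade is acceptable for suggesting potential stickers.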
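Slides 43 to 47 describe two tuned Xception models. One captures semantics through sticker tags and the other captures appearance through sticker creators. Their features are concatenated into one representation per sticker image. The sketch below assumes each model's output is already available as a plain list of floats. The L2 normalization, the hypothetical `semantic_weight` parameter, and the cosine nearest-neighbor search are illustrative choices meant to show the "more semantic / less semantic" trade-off on slide 47, not the method used in the talk.

```python
import math


def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else [float(x) for x in vec]


def sticker_representation(semantic_vec, appearance_vec, semantic_weight=0.5):
    """Concatenate the two normalized model outputs into one feature vector.

    semantic_weight is a hypothetical knob: higher values make similarity depend
    more on tags (semantics) and less on creator style (appearance).
    """
    semantic = [semantic_weight * x for x in l2_normalize(semantic_vec)]
    appearance = [(1.0 - semantic_weight) * x for x in l2_normalize(appearance_vec)]
    return semantic + appearance


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(origin_id, representations, k):
    """Return the k stickers most similar to origin_id, excluding itself."""
    origin = representations[origin_id]
    others = [(cosine(origin, vec), sid)
              for sid, vec in representations.items() if sid != origin_id]
    others.sort(key=lambda pair: (-pair[0], pair[1]))
    return [sid for _, sid in others[:k]]
```

The representation depends only on the image, so a newly published sticker with no purchase history can still get neighbors. This is the cold-start motivation on slide 42.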
