A use case of online machine learning using Jubatus

* GOTO Berlin Conference 2013
Toru Shimogaki / NTT DATA CORPORATION


"The realtime processing for web services"
At Recruit Technologies, we are concentrating on streaming data processing and machine learning to analyze online user behavior and improve our services. We have a packaged solution named "Genn.ai" that makes these technologies widely available across the Recruit group. It will be open-sourced. With it, you can harness the power of Storm using simple scripts! In addition, we are working with NTT DATA to use the online machine learning middleware "Jubatus" in production.
http://gotocon.com/berlin-2013/presentation/The%20realtime%20processing%20for%20web%20services

Published in: Technology, Business

A use case of online machine learning using Jubatus

  1. GOTO Berlin Conference 2013
     A use case of online machine learning using Jubatus
     2013/10/18
     NTT DATA Corporation, System Platforms Sector, OSS Professional Services
     Toru Shimogaki
     Copyright © 2013 NTT DATA Corporation
  2. Who is Toru Shimogaki
     • The Elephant Wizard, team lead of “NTT DATA OSS Professional Services”
     • Deep and wide experience deploying Open Source Software technologies for enterprise customers
     • 10+ years: contributor (e.g. pg_bulkload)
     • 6+ years: leads the Japanese Hadoop community; co-author (2nd edition in Japan)
  3. About Jubatus (1/2)
     • OSS machine learning platform developed by NTT Research Laboratories and Preferred Infrastructure, Inc.
     • Motivation: take the right action at the right time, at the right place
     (Source: Hadoop Summit 2013)
  4. About Jubatus (2/2)
     • Distributed processing framework and streaming machine learning libraries
     • Classification, regression, recommendation, graph mining, anomaly detection
     • In particular, the Jubatus classifier has a small footprint and responds with very low latency, so it scales easily to many simultaneous requests.
     (Source: Hadoop Summit 2013)
  5. Background
     • “SUUMO”: online service for the real-estate business
        • Improve usability for smartphone access
        • Guide users who don’t know how to search for residences
        • Efficiently reach first-time customers and non-regular users with short visits
  6. “SUUMO” Real-Estate Service: Customer Reach and Brand Awareness
     • Compared to our competitors, “SUUMO” has the largest customer reach and has scored highest in brand awareness.
     • [Charts: customer reach (unique users, millions) and brand awareness (%) for SUUMO and competitors A, B, and C]
     • SUUMO unique users: 9.9 million/month
     • Aided SUUMO awareness: 77.2%
     • Awareness of character: 87.6%
     【SUUMO】 2013 RECRUIT SUMAI CO., LTD All Rights Reserved
  7. Background
     • “SUUMO”: online service for the real-estate business
        • Improve usability for smartphone access
        • Guide users who don’t know how to search for residences
        • Efficiently reach first-time customers and non-regular users with short visits
  8. Web service for smartphones (beta version)
     • Repeatedly present two candidate choices so that the system learns the user’s preferences
     • Existing search services simply list candidates after filtering.
     • Present two typical and salient choices; the user simply chooses the preferable one.
     • Ask 10 times, with a countdown.
     • Resulting recommendation: present recommendations based on the acquired user preferences.
  9. Inside this SUUMO Web service
     • This SUUMO web service is implemented with a combination of several algorithms
        • Multidimensional Scaling (MDS)
        • Jubatus classifier (Passive Aggressive algorithm)
        • etc.
  10. Building search space by MDS
     • Build a search space for each station with Multidimensional Scaling (MDS)
        • Daily batch processing using R
        • Goal: achieve O(log n) search (like binary search)
     • Using MDS, convert multidimensional vectors to lower-dimensional ones while preserving the distance relations among houses
     • Example dimensions (from the figure): Rent, Fraction on foot, Size, Deposit, Age of a building, ...
  11. Learning the user’s preferences using Jubatus
     • Classify the user’s real-estate preferences using Jubatus
        • Goal: reflect user actions in the results in real time (not possible with R)
        • Uses the Passive Aggressive algorithm
        • If the score of an area falls below a threshold, remove it from the search area
        • This classification runs with low latency and is easy to scale
     • [Figure: search area in the initial state and after 1, 2, 6, and 10 clicks]
     • Using this approach:
        • Keep discrete candidates that are excluded in the MDS space search but in which the user still has some interest because of unexpected attributes.
        • Keep sufficient diversity in order to present “salient” candidates. Distant choices are needed to estimate the user’s preferences without losing features.
  12. Wrap up
     • Introduced a use case of online machine learning using Jubatus
        • A beta service for smartphones is now released on SUUMO, which is one of the largest residence services in Japan.
     • Today I explained
        • Building a Multidimensional Scaling search space
        • Using the Jubatus classifier to understand users’ preferences adaptively
     • Future work
        • Learn from similar users’ activity using logs
        • Semantic analysis of sentences input by the client
  13. NTT DATA Corporation, System Platforms Sector, OSS Professional Services
     URL: http://oss.nttdata.co.jp/hadoop/
     mail: hadoop@kits.nttdata.co.jp
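
Slide 10 describes converting multidimensional house vectors into a lower-dimensional space that preserves distances, so that search can run in O(log n). The slides do not say which MDS variant or how many output dimensions were used, and the production batch ran in R. The following Python sketch is therefore only an illustration under stated assumptions: it uses classical (Torgerson) MDS, embeds into a single dimension, and finds the leading eigenvector by power iteration. The function names and the house data are hypothetical.

```python
import bisect
import math


def classical_mds_1d(points, iterations=200):
    """Embed feature vectors on a line with classical (Torgerson) MDS.

    Returns one coordinate per point, taken from the leading eigenvector
    of the double-centered squared-distance matrix.
    """
    n = len(points)
    # Squared Euclidean distances between all pairs of feature vectors.
    d2 = [[sum((a - b) ** 2 for a, b in zip(p, q)) for q in points]
          for p in points]
    row_mean = [sum(row) / n for row in d2]
    total_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J * D^2 * J (D^2 is symmetric).
    b = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    # Power iteration for the leading eigenvector of B.
    v = [float(i + 1) for i in range(n)]
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
        eigenvalue = norm
    scale = math.sqrt(eigenvalue)
    return [scale * x for x in v]


def nearest_on_line(sorted_coords, target):
    """Binary-search a sorted 1-D embedding for the index closest to target."""
    i = bisect.bisect_left(sorted_coords, target)
    candidates = [j for j in (i - 1, i) if 0 <= j < len(sorted_coords)]
    return min(candidates, key=lambda j: abs(sorted_coords[j] - target))


if __name__ == "__main__":
    # Hypothetical houses: (rent, minutes on foot, size). Not from the slides.
    houses = [(8.0, 5.0, 25.0), (9.5, 3.0, 30.0), (12.0, 10.0, 45.0),
              (7.0, 15.0, 22.0), (15.0, 2.0, 60.0)]
    coords = classical_mds_1d(houses)
    sorted_coords = sorted(coords)
    print(nearest_on_line(sorted_coords, coords[2] + 0.5))
```

Once the coordinates are sorted, each lookup is a binary search, which matches the O(log n) goal stated on the slide. In practice the features would need scaling before distances are computed, since rent and size are in different units.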

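Slide 11 describes learning the user's preferences online with the Passive Aggressive algorithm and removing an area from the search once its score falls below a threshold. The sketch below shows the textbook binary Passive Aggressive update in plain Python; it is not the Jubatus API, and the feature vectors, the class and function names, and the rule of labeling the chosen candidate +1 and the rejected one -1 are assumptions made for illustration.

```python
class PassiveAggressive:
    """Textbook binary Passive Aggressive learner (no Jubatus involved)."""

    def __init__(self, dim):
        self.w = [0.0] * dim

    def score(self, x):
        # Linear score: dot product of the weights and the feature vector.
        return sum(wi * xi for wi, xi in zip(self.w, x))

    def update(self, x, y):
        """Apply one online update; y must be +1 or -1. Returns the hinge loss."""
        loss = max(0.0, 1.0 - y * self.score(x))
        sq_norm = sum(xi * xi for xi in x)
        if loss > 0.0 and sq_norm > 0.0:
            # Smallest weight change that gives this example a margin of 1.
            tau = loss / sq_norm
            self.w = [wi + tau * y * xi for wi, xi in zip(self.w, x)]
        return loss


def learn_from_choice(model, chosen, rejected):
    """Assumed labeling: the picked candidate is +1, the other is -1."""
    model.update(chosen, +1)
    model.update(rejected, -1)


def prune_areas(model, areas, threshold):
    """Keep only the areas whose score is at or above the threshold."""
    return {name: x for name, x in areas.items()
            if model.score(x) >= threshold}


if __name__ == "__main__":
    # Hypothetical two-feature areas; the real features are not in the slides.
    model = PassiveAggressive(dim=2)
    learn_from_choice(model, chosen=[1.0, 0.0], rejected=[0.0, 1.0])
    areas = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.5, 0.5]}
    print(sorted(prune_areas(model, areas, threshold=-0.5)))
```

Each update touches only the weight vector, so a single click can be reflected in the scores immediately, which is the real-time property the slide contrasts with the daily R batch.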