GOTO Berlin Conference 2013

A use case of online machine learning
using Jubatus
2013/10/18
NTT DATA Corporation System Platforms Sector
OSS Professional Services
Toru Shimogaki

Copyright © 2013 NTT DATA Corporation
Who is Toru Shimogaki
• “The Elephant Wizard”, team lead of “NTT DATA OSS Professional Services”
• Deep and wide experience deploying open source software technologies for enterprise customers

10+ years: contributor (ex. pg_bulkload)
6+ years: leads the Japanese Hadoop community
Co-author (2nd edition, in Japan)
About Jubatus (1/2)
• OSS machine learning platform developed by NTT Research Laboratories and Preferred Infrastructure, Inc.
• Motivation: take the right action at the right time, at the right place


(Source : Hadoop Summit 2013)

About Jubatus (2/2)
• Distributed processing framework and streaming machine learning libraries
  • Classification, regression, recommendation, graph mining, anomaly detection
• In particular, the Jubatus classifier has a small footprint and responds with very low latency, so it scales easily to many simultaneous requests.
(Source : Hadoop Summit 2013)
Background
• “SUUMO”: online service for the real-estate business
  • Improve usability for smartphone access
  • Guide users who don’t know how to search for residences
  • Efficiently reach first-time customers and irregular users who visit only briefly

“SUUMO” Real-Estate Service : Customer Reach and Brand Awareness
Compared with our competitors, “SUUMO” has the largest customer reach and the highest brand awareness score.

[Figure: bar charts comparing SUUMO with real-estate competitors A, B, and C on customer reach (unique users, millions) and brand awareness (%)]

SUUMO unique users: 9.9 million/month
Aided SUUMO awareness: 77.2%
Awareness of the character 【SUUMO】: 87.6%

2013 RECRUIT SUMAI CO., LTD All Rights Reserved
Web service for smartphones (beta version)
• Repeatedly present two candidate choices so that the system learns the user’s preferences

Existing search services simply list candidates with filtering.

Present two typical and salient choices; the user simply chooses the preferable one.

Ask 10 times, with a countdown.

Resulting recommendation: presented based on the user’s acquired preferences.
Inside this SUUMO Web service
• This SUUMO web service is implemented by combining several algorithms
  • Multidimensional Scaling (MDS)
  • Jubatus classifier (Passive Aggressive algorithm)
  • etc.

Building the search space with MDS
• Build a search space for each station with Multidimensional Scaling (MDS)
  • Daily batch processing using R
• Goal: achieve O(log n) search (like binary search)
  • MDS converts multidimensional vectors into lower-dimensional ones while preserving the distance relations among houses
  • Example dimensions: rent, walking time (minutes on foot), size, deposit, age of the building, ...
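The dimension reduction described on this slide can be sketched as follows. This is a minimal illustration of classical (metric) MDS, not the service's actual pipeline: the slide says the real batch runs in R, while this sketch uses Python with NumPy. The function name `classical_mds`, the column standardization step, and the four sample houses are all assumptions made for the example.

```python
import numpy as np


def classical_mds(features, n_components=2):
    """Embed the rows of `features` into `n_components` dimensions (classical MDS)."""
    x = np.asarray(features, dtype=float)
    # Assumption: standardize each column so that large-valued features such as
    # rent do not dominate the distances. Columns must not be constant.
    x = (x - x.mean(axis=0)) / x.std(axis=0)
    # Squared Euclidean distances between every pair of houses.
    diff = x[:, None, :] - x[None, :, :]
    d2 = (diff ** 2).sum(axis=2)
    # Double centering: B = -1/2 * J * D^2 * J with J = I - (1/n) * ones.
    n = d2.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ d2 @ j
    # The largest eigenpairs of B give the low-dimensional coordinates.
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Hypothetical houses: rent, minutes on foot, size (m^2), deposit, building age.
houses = np.array([
    [80000, 5, 25.0, 80000, 10],
    [82000, 6, 26.0, 82000, 12],
    [150000, 3, 60.0, 300000, 2],
    [60000, 15, 20.0, 0, 30],
])
coords = classical_mds(houses, n_components=2)

# Similar houses (rows 0 and 1) stay close; a very different one (row 2) stays far.
d01 = np.linalg.norm(coords[0] - coords[1])
d02 = np.linalg.norm(coords[0] - coords[2])
print(d01 < d02)
```

Because the embedding keeps similar houses near each other, a search can discard whole regions of the low-dimensional space at once instead of comparing houses one by one.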
Learning the user’s preferences using Jubatus
• Classify the user’s real-estate preferences using Jubatus
  • Goal: reflect user actions in the results in real time (not possible with R)
  • Uses the Passive Aggressive algorithm
  • If the score of an area falls below a threshold, remove it from the search area
  • The classification runs with low latency and is easy to scale
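The update and pruning steps above can be sketched as follows. This is a minimal in-process illustration of the basic binary Passive Aggressive update, written with NumPy; it does not use the Jubatus client API, and the class name `PassiveAggressive`, the two-dimensional area vectors, and the threshold value are assumptions chosen for the example.

```python
import numpy as np


class PassiveAggressive:
    """Basic binary Passive Aggressive learner (illustrative, not the Jubatus API)."""

    def __init__(self, n_features):
        self.w = np.zeros(n_features)

    def score(self, x):
        return float(self.w @ x)

    def update(self, x, y):
        """Learn from one example; y is +1 (chosen) or -1 (not chosen)."""
        loss = max(0.0, 1.0 - y * self.score(x))  # hinge loss
        norm2 = float(x @ x)
        if loss > 0.0 and norm2 > 0.0:
            # Aggressive step: move w just far enough that the margin becomes 1.
            self.w += (loss / norm2) * y * x
        # Passive case: the margin is already >= 1, so w stays unchanged.


# Hypothetical feature vectors for three areas of the search space.
areas = {
    "area_a": np.array([1.0, 0.0]),
    "area_b": np.array([0.0, 1.0]),
    "area_c": np.array([0.5, 0.5]),
}

model = PassiveAggressive(n_features=2)
model.update(areas["area_a"], +1)  # the user clicked a house from area_a
model.update(areas["area_b"], -1)  # the alternative from area_b was not chosen

# Remove areas whose score fell below the (assumed) threshold.
threshold = -0.5
remaining = {name for name, x in areas.items() if model.score(x) >= threshold}
print(sorted(remaining))
```

Each click costs one dot product and one vector addition, which is why this kind of online update can follow user actions in real time.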

[Figure panels: initial state; after 1, 2, 6, and 10 clicks]

• With this approach:
  • Isolated candidates are kept: ones that the MDS-space search would exclude, but that may still interest the user because of unexpected attributes.
  • Enough diversity is kept to present “salient” candidates. Estimating the user’s preferences without losing features requires choices that are far apart.
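One way to obtain two choices that are far apart is to pick the most distant pair among the remaining candidates in the low-dimensional space. The sketch below shows that idea only; the slides do not say how the service actually selects its pairs, and the function name `most_distant_pair` and the sample coordinates are assumptions.

```python
import numpy as np


def most_distant_pair(coords):
    """Return the indices (i, j) of the two points that are farthest apart."""
    x = np.asarray(coords, dtype=float)
    diff = x[:, None, :] - x[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # argmax works on the flattened matrix; unravel_index restores (row, column).
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    return int(i), int(j)


# Hypothetical 2-D coordinates of the candidates still in the search area.
candidates = np.array([
    [0.0, 0.0],
    [0.1, 0.2],
    [3.0, 4.0],
    [1.0, 1.0],
])
pair = most_distant_pair(candidates)
print(pair)
```

This brute-force version compares every pair, which is acceptable for a small candidate set such as a single station's search area.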
Wrap up
• Introduced a use case of online machine learning using Jubatus
  • A beta service for smartphones is now released on SUUMO, one of the largest residence services in Japan
• Today I explained
  • Building a multidimensional scaling search space
  • Using the Jubatus classifier to learn users’ preferences adaptively
• Future work
  • Learn from the activity of similar users using logs
  • Semantic analysis of sentences entered by the client

NTT DATA Corporation System Platforms Sector
OSS Professional Services
URL: http://oss.nttdata.co.jp/hadoop/
mail: hadoop@kits.nttdata.co.jp
