A use case of online machine learning using Jubatus

GOTO Berlin Conference 2013

2013/10/18
NTT DATA Corporation System Platforms Sector
OSS Professional Services
Toru Shimogaki

Copyright © 2013 NTT DATA Corporation
Who is Toru Shimogaki?
• The Elephant Wizard, as team lead of “NTT DATA OSS Professional Services”
• Deep and wide experience deploying open source software technologies for enterprise customers

10+ years
Contributor
e.g., pg_bulkload

6+ years
Leads Japanese Hadoop Community

Co-Author
2nd Edition
in Japan
About Jubatus (1/2)
• OSS machine learning platform developed by NTT Research Laboratories and Preferred Infrastructure, Inc.
• Motivation: Take the Right Action at the Right Time, at the Right Place


(Source: Hadoop Summit 2013)

About Jubatus (2/2)
• Distributed Processing Framework and Streaming Machine Learning Libraries
  - Classification, Regression, Recommendation, Graph Mining, Anomaly Detection
• In particular, the Jubatus classifier has a small footprint and responds with very low latency, so it scales easily to many simultaneous requests.
(Source: Hadoop Summit 2013)

Background
• “SUUMO”: online service for the real-estate business
  - Improve usability for smartphone access
  - Guide users who do not know how to search for a residence
  - Efficiently reach first-time customers and occasional users with short visits


“SUUMO” Real-Estate Service: Customer Reach and Brand Awareness
Compared to our competitors, “SUUMO” has the largest customer reach and the highest brand awareness.

[Charts: customer reach (unique users, millions) and brand awareness (%) for SUUMO vs. competitors A, B, and C. SUUMO UU: 9.9 million/month; aided SUUMO awareness: 77.2%; awareness of the character 【SUUMO】: 87.6%.]
2013 RECRUIT SUMAI CO., LTD All Rights Reserved

Web service for smartphones (beta version)
• Repeatedly present a choice between two candidates; the system then learns the user’s preferences

[Figure callouts]
Existing search service: simply lists candidates after filtering.
New service: presents two typical and salient choices, and the user simply picks the preferable one.
Asks 10 times, with a countdown.
Presents a recommendation based on the acquired user preferences (“Resulting recommendation”).
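The slides do not say how the two candidates in each round are chosen. As a hypothetical illustration of why ten questions can go a long way (in the spirit of the O(log n) goal stated for the MDS search space), here is a minimal sketch assuming the candidates are sorted along a single MDS axis and each answer keeps the half of the list whose representative the user preferred. If every answer halves the candidate set, 10 answers narrow 2^10 = 1,024 candidates down to one. The function `narrow_by_pairs` and the simulated user are illustrative only, not part of SUUMO or Jubatus.

```python
def narrow_by_pairs(sorted_coords, prefers_left, max_rounds=10):
    """Halve a sorted 1-D candidate list once per two-choice question.

    prefers_left(a, b) returns True if the user picks candidate a over b.
    Returns the remaining half-open index range [lo, hi) and the rounds used.
    """
    lo, hi = 0, len(sorted_coords)
    rounds = 0
    while hi - lo > 1 and rounds < max_rounds:
        mid = (lo + hi) // 2
        left_rep = (lo + mid - 1) // 2   # middle of the left half [lo, mid)
        right_rep = (mid + hi - 1) // 2  # middle of the right half [mid, hi)
        if prefers_left(sorted_coords[left_rep], sorted_coords[right_rep]):
            hi = mid  # keep the left half
        else:
            lo = mid  # keep the right half
        rounds += 1
    return lo, hi, rounds


# Simulated user (hypothetical): prefers whichever candidate is closer to 700.3.
ideal = 700.3
coords = [float(i) for i in range(1024)]
print(narrow_by_pairs(coords, lambda a, b: abs(a - ideal) <= abs(b - ideal)))
# (700, 701, 10)
```

With evenly spaced candidates the kept half almost always contains the best match; with uneven spacing this halving is only approximate.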

Inside this SUUMO web service
• This SUUMO web service is implemented as a combination of several algorithms
  - Multidimensional Scaling (MDS)
  - Jubatus classifier (Passive Aggressive algorithm)
  - etc.


Building the search space with MDS
• Build a search space for each station using Multidimensional Scaling (MDS)
  - Daily batch processing in R
• Goal: achieve O(log n) search (like binary search)
  - MDS converts high-dimensional attribute vectors into lower-dimensional ones while preserving the distance relations among houses
[Figure: house attributes: Rent, Walking time (minutes on foot), Size, Deposit, Age of the building, ...]
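The slides state only that MDS runs as a daily batch job in R (R's `cmdscale` is one standard implementation of classical MDS, but the slides do not say which routine was used). Below is a minimal pure-Python sketch of classical (Torgerson) MDS, assuming Euclidean distances over standardized house attributes: it double-centers the squared distance matrix and extracts the top eigenvectors by power iteration with deflation. `standardize` and `classical_mds` are hypothetical helper names; you would run this once per station's set of houses, and a real deployment would use a linear-algebra library instead of these O(n²) loops.

```python
import math
import random


def standardize(rows):
    """Z-score each column so that, e.g., rent does not swamp walking minutes."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
           for c, m in zip(cols, means)]  # constant column -> divide by 1
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def classical_mds(points, k=2, iters=500, seed=0):
    """Embed points into k dimensions, approximately preserving Euclidean distances."""
    n = len(points)
    d2 = [[math.dist(p, q) ** 2 for q in points] for p in points]
    # Double centering: B = -1/2 * J * D2 * J with J = I - (1/n) * ones.
    # D2 is symmetric, so row means equal column means.
    row = [sum(r) / n for r in d2]
    total = sum(row) / n
    b = [[-0.5 * (d2[i][j] - row[i] - row[j] + total) for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    for dim in range(k):
        # Power iteration for the dominant eigenvector (B is PSD for Euclidean input).
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left in B
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(b[i][j] * v[j] for j in range(n)) for i in range(n))
        scale = math.sqrt(max(lam, 0.0))  # clip tiny negatives from rounding
        for i in range(n):
            coords[i][dim] = v[i] * scale
        # Deflate so the next pass finds the next eigenvector.
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Toy vectors on a line: their 1-D embedding keeps the distances 5, 5, and 10.
print(classical_mds([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]], k=1))
```

The sign of each axis is arbitrary (it depends on the random start), which is harmless because only distances matter.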

Learning user preferences with Jubatus
• Classify the user’s real-estate preferences using Jubatus
  - Goal: reflect each user action in the results in real time (not possible with the R batch process)
  - Uses the Passive Aggressive algorithm
  - If the score of an area falls below a threshold, remove it from the search area
  - The classification runs with low latency and is easy to scale

[Figure panels: Initial state, 1 clicked, 2 clicked, 6 clicked, 10 clicked]
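The slides name the Passive Aggressive algorithm but not the variant, the features, or the threshold value. Below is a minimal sketch of a binary PA-I learner over sparse feature dictionaries, assuming each click yields one positive example (the chosen candidate) and one negative example (the rejected one), followed by threshold pruning of areas. Jubatus's own classifier is a multi-class server that offers several PA variants; this code does not use the Jubatus API, and the names `PassiveAggressive`, `learn_from_click`, `prune`, and feature keys such as `"cheap"` are hypothetical.

```python
class PassiveAggressive:
    """Binary PA-I classifier over sparse feature dicts (illustrative sketch)."""

    def __init__(self, c=1.0):
        self.c = c    # cap on the step size (the "C" of PA-I)
        self.w = {}   # sparse weight vector

    def score(self, x):
        # Linear score w . x; higher means closer to what the user likes.
        return sum(self.w.get(k, 0.0) * v for k, v in x.items())

    def train(self, x, y):
        # y is +1 for the chosen candidate, -1 for the rejected one.
        loss = max(0.0, 1.0 - y * self.score(x))  # hinge loss
        sq_norm = sum(v * v for v in x.values())
        if loss == 0.0 or sq_norm == 0.0:
            return  # passive: already on the correct side with margin
        tau = min(self.c, loss / sq_norm)  # aggressive, but capped at C
        for k, v in x.items():
            self.w[k] = self.w.get(k, 0.0) + tau * y * v


def learn_from_click(model, chosen, rejected):
    # One click yields one positive and one negative example.
    model.train(chosen, +1)
    model.train(rejected, -1)


def prune(model, areas, threshold):
    # Drop areas whose score has fallen below the threshold.
    return {name: x for name, x in areas.items() if model.score(x) >= threshold}


model = PassiveAggressive(c=1.0)
learn_from_click(model, {"cheap": 1.0}, {"near_station": 1.0})
areas = {"A": {"cheap": 1.0}, "B": {"near_station": 1.0},
         "C": {"cheap": 0.5, "near_station": 0.5}}
print(sorted(prune(model, areas, threshold=-0.5)))  # ['A', 'C']
```

Each update touches only the features present in the clicked candidates, which is what makes per-click, real-time learning cheap.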

• Benefits of this approach:
  - It keeps discrete candidates that the MDS space search would exclude but that still interest the user because of unexpected attributes.
  - It keeps enough diversity to present “salient” candidates; distant choices are necessary to estimate the user’s preference without losing features.
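One simple reading of “keep sufficient diversity” is to offer the two surviving candidates that lie farthest apart in the MDS space, so each answer separates the user’s preferences as clearly as possible. This is an assumption, not the selection rule stated in the slides; the brute-force search below is O(n²) and suited only to small candidate sets. `farthest_pair` is a hypothetical name.

```python
import itertools
import math


def farthest_pair(coords):
    """Return indices (i, j), i < j, of the two points farthest apart."""
    best_pair, best_dist = None, -1.0
    for i, j in itertools.combinations(range(len(coords)), 2):
        d = math.dist(coords[i], coords[j])  # Euclidean distance (Python 3.8+)
        if d > best_dist:
            best_pair, best_dist = (i, j), d
    return best_pair


# Surviving candidates in a 2-D MDS space (toy coordinates).
survivors = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (0.5, 0.2)]
print(farthest_pair(survivors))  # (0, 2)
```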

Wrap up
• Introduced a use case of online machine learning using Jubatus
  - The beta smartphone service is now live on SUUMO, one of the largest residence services in Japan.
• Today I explained
  - Building a multidimensional scaling search space
  - Using the Jubatus classifier to learn users’ preferences adaptively

• Future work
  - Learn from the activity of similar users using logs
  - Semantic analysis of sentences entered by clients


NTT DATA Corporation System Platforms Sector
OSS Professional Services
URL: http://oss.nttdata.co.jp/hadoop/
mail: hadoop@kits.nttdata.co.jp