Applying Apache Mahout to E-Commerce
James Chen, Etu Solution
Hadoop in TW 2013
Sep 28, 2013
2
Taiwan Hadoop 2013 Status Survey
Complete the survey for a chance to win two movie tickets
Closes 2013/10/7
3
Agenda
• Apache Mahout Introduction
• Machine Learning Use Cases
• Building a recommendation system
• Collaborative Filtering
• System Architecture
• Performance
• Future Roadmap
4
Apache Mahout
• ASF project to create scalable machine learning
libraries
– http://mahout.apache.org
• Why Mahout?
– Many open source machine learning libraries either:
• Lack Community
• Lack Documentation and Examples
• Lack Scalability
• Lack the Apache License
5
Algorithms in Mahout
[Diagram: algorithm families in Mahout: Classification, Clustering, Recommenders, Regression, Freq. Pattern Mining, Vector Similarity, Dimension Reduction, Evolution, Non-MR Algorithms, Examples]
See http://cwiki.apache.org/confluence/display/MAHOUT/Algorithms
6
Algorithms in Mahout
• Classification
– Logistic Regression
– Bayesian
– Support Vector Machines
– Perceptron and Winnow
– Neural Network
– Random Forests
– Restricted Boltzmann Machines
– Online Passive Aggressive
– Boosting
– Hidden Markov Models
• Clustering
– Canopy Clustering
– K-Means
– Fuzzy K-Means
– Expectation Maximization
– Mean Shift
– Hierarchical Clustering
– Dirichlet Process Clustering
– Latent Dirichlet Allocation
– Spectral Clustering
– Minhash Clustering
– Top Down Clustering
7
Algorithms in Mahout – Cont.
• Pattern Mining
– Parallel FP Growth
• Regression
– Locally Weighted Linear Regression
• Dimension Reduction
– SVD
– Stochastic SVD with PCA
– PCA
– Independent Component Analysis
– Gaussian Discriminative Analysis
• Evolutionary Algorithms
– Genetic Algorithms
• Recommenders
– Non-distributed recommenders (“Taste”)
– Distributed Item-Based Collaborative Filtering
– Collaborative Filtering using a parallel matrix factorization
– Slope One
8
Algorithms in Mahout – Cont.
• Vector Similarity
– RowSimilarityJob (MR)
– VectorDistanceJob (MR)
• Other
– Collocations
• Non-MapReduce algorithms
See http://cwiki.apache.org/confluence/display/MAHOUT/Algorithms
9
Mahout Focus on Scalability
• Goal: Be as fast and efficient as possible given the intrinsic design of the algorithm
– Some algorithms won't scale to massive machine clusters
– Others fit logically on a MapReduce framework like Apache Hadoop
– Still others will need alternative distributed programming models
– Be pragmatic
• Most Mahout implementations are MapReduce enabled
• (Always a) Work in Progress
10
Prepare Data from Raw content
• Lucene integration
– bin/mahout lucenevector …
• Document Vectorizer
– bin/mahout seqdirectory …
– bin/mahout seq2sparse …
• Programmatically
– See the Utils module in Mahout
• Database (JDBC)
• File System (HDFS)
11
Machine Learning
• “Machine Learning is programming computers
to optimize a performance criterion using
example data or past experience”
– Intro. To Machine Learning by E. Alpaydin
• Subset of Artificial Intelligence
• Lots of related fields:
– Information Retrieval
– Stats
– Biology
– Linear algebra
– Many more
12
Use Case : Recommendation
13
Use Case : Classification
14
Use Case : Clustering
15
More use cases
• Recommend products/books/friends …
• Classify content into predefined groups
• Find similar content based on object properties
• Find associations/patterns in actions/behaviors
• Identify key topics in large collections of text
• Detect anomalies in machine output
• Ranking search results (PageRank)
• Others
16
Building a recommendation system
• Help users find items they might like based on their historical preferences
17
Approach
• Collect User Preferences -> User vs Item Matrix
• Find Similar Users or Items (Neighborhood-based approach)
• Works by finding similarly rated items in the user-item matrix (e.g. cosine, Pearson correlation, Tanimoto coefficient)
• Estimates a user's preference towards an item by looking at his/her preferences towards similar items
18
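Collaborative Filtering – User Based
Find User Similarity
1. How do we predict how much User 1 will like Item 4?
2. Find the n users who are most similar to User 1 and have also purchased Item 4 (ratings are derived from purchase records); call them User n
3. Weight each User n's rating of Item 4 by that user's similarity to User 1 and fill in the result
4. Repeat steps 1 to 3 for every user combination until all empty cells are filled
[Diagram: user-item matrix with Items as columns and User 1 through User n as rows; the "?" cell is the value to fill in]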
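The four steps above can be sketched in a few lines of Python. This is only an illustration: the ratings, the user and item names, and the choice of cosine similarity over co-rated items are assumptions, not values from the slides.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating}. Values are made up for illustration.
ratings = {
    "user1": {"item1": 5, "item2": 3, "item3": 4},
    "user2": {"item1": 4, "item2": 3, "item3": 5, "item4": 4},
    "user3": {"item1": 1, "item2": 5, "item4": 2},
}

def cosine(a, b):
    """Cosine similarity computed over the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b)

def predict_user_based(ratings, user, item, n=2):
    """Steps 2 and 3: take the n most similar users who rated `item`,
    then average their ratings weighted by similarity."""
    neighbours = [
        (cosine(ratings[user], other_ratings), other_ratings[item])
        for other, other_ratings in ratings.items()
        if other != user and item in other_ratings
    ]
    neighbours.sort(reverse=True)
    top = neighbours[:n]
    total_weight = sum(sim for sim, _ in top)
    if total_weight == 0:
        return None  # no usable neighbour for this item
    return sum(sim * rating for sim, rating in top) / total_weight

prediction = predict_user_based(ratings, "user1", "item4")
```

Step 4 would simply call `predict_user_based` for every empty (user, item) cell.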
19
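Collaborative Filtering – Item Based
Find Item Similarity (Amazon)
1. How do we predict how much User 1 will like Item 4?
2. For each Item n in User 1's history, compute the similarity between Item n and Item 4 (using other users' history)
3. Weight User 1's rating of each Item n by the item similarity and fill in the result
4. Repeat steps 1 to 3 for every item combination until all empty cells are filled
[Diagram: user-item matrix with Items as columns and Users as rows; the "?" cell is the value to fill in]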
20
Test Drive of Mahout Recommender
• GroupLens dataset:
http://www.grouplens.org/node/12
• 1,000,209 anonymous ratings of 3,900 movies made
by 6,040 MovieLens users
• movies.dat (movie ids with title and category)
• ratings.dat (ratings of movies)
• users.dat (user information)
21
Ratings File
• Each line of the ratings file has the format
UserID::MovieID::Rating::Timestamp
• Mahout requires the following CSV format
UserID,ItemID,Value
• tr -s ':' ',' < ratings.dat | cut -f1-3 -d, > rating.csv
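For readers who prefer a script to the shell one-liner, the same conversion can be sketched in Python. The function names are hypothetical, the file names follow the slide, and the `latin-1` encoding is an assumption about the data files.

```python
def ratings_line_to_csv(line):
    """Convert 'UserID::MovieID::Rating::Timestamp' to 'UserID,ItemID,Value'."""
    user_id, movie_id, rating, _timestamp = line.strip().split("::")
    return f"{user_id},{movie_id},{rating}"

def convert(src_path="ratings.dat", dst_path="rating.csv"):
    """Rewrite the whole ratings file, dropping the timestamp column."""
    with open(src_path, encoding="latin-1") as src, open(dst_path, "w") as dst:
        for line in src:
            if line.strip():
                dst.write(ratings_line_to_csv(line) + "\n")
```

Calling `convert()` in the directory that holds `ratings.dat` writes `rating.csv` next to it.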
22
Run Recommendation Job
• $ mahout recommenditembased \
-i [input-hdfs-path] \
-o [output-hdfs-path] \
--usersFile [File listing users] \
--tempDir
23
Recommendation Result
• The recommendation result will look like
UserID [ItemID:Weight, ItemID:Weight,…]
• Each line represents a UserID with its associated recommended ItemIDs
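A result line in this shape can be turned into a data structure with a small parser. The sketch below assumes the user ID is separated from the bracketed list by whitespace; the function name and the sample line are made up for illustration.

```python
def parse_recommendation_line(line):
    """Parse 'UserID [ItemID:Weight,ItemID:Weight,...]' into
    (user_id, [(item_id, weight), ...])."""
    user_part, _, item_part = line.strip().partition("[")
    user_id = user_part.strip()
    item_part = item_part.rstrip("]")
    recommendations = []
    for pair in item_part.split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate an empty recommendation list
        item_id, weight = pair.split(":")
        recommendations.append((item_id, float(weight)))
    return user_id, recommendations

# Hypothetical output line, not taken from a real job run.
example = parse_recommendation_line("42\t[1193:4.8, 661:4.5,914:4.1]")
```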
24
Collect User Behavior Events
• Implicit (Easy to collect)
– View
– Shopping Cart (0 or 1)
– Order or Buy (0 or 1)
– Duration Time (Noisy)
• Explicit (Hard to collect)
– Rating (0~5)
– Voting (0 or 1)
– Forward or Share (0 or 1)
– Add favorite (0 or 1)
– Tag (text analysis)
– Comments (text analysis)
25
Process Event into Preference
• Group events by type and calculate similarity per event type, e.g. Also View, Also Buy
• Weighting:
– Explicit Event > Implicit Event
– Order, Cart > View
• Noise Reduction
• Normalization
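One way to picture the weighting, noise reduction, and normalization steps is the sketch below. The weight values, the threshold, and the per-user scaling are all assumptions chosen for illustration; the slides only state the ordering (Order, Cart > View).

```python
from collections import defaultdict

# Hypothetical weights reflecting "Order, Cart > View"; real values are not given in the slides.
EVENT_WEIGHTS = {"view": 1.0, "cart": 3.0, "order": 5.0}

def events_to_preferences(events, min_score=0.0):
    """events: iterable of (user, item, event_type).
    Returns {(user, item): score}, with scores scaled into (0, 1] per user."""
    raw = defaultdict(float)
    for user, item, event_type in events:
        raw[(user, item)] += EVENT_WEIGHTS.get(event_type, 0.0)

    # Normalization: divide by each user's highest raw score.
    max_per_user = defaultdict(float)
    for (user, _item), score in raw.items():
        max_per_user[user] = max(max_per_user[user], score)

    # Noise reduction (simplified): drop pairs whose raw score does not exceed min_score.
    return {
        key: score / max_per_user[key[0]]
        for key, score in raw.items()
        if score > min_score
    }

prefs = events_to_preferences([
    ("u1", "A", "view"), ("u1", "A", "order"),
    ("u1", "B", "view"), ("u2", "A", "cart"),
])
```

The resulting `(user, item, score)` triples have the same shape as the UserID,ItemID,Value input described earlier.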
26
Similarity (Vector Similarity)
• Euclidean Distance
• Pearson Correlation Coefficient (-1 ~ +1)
• Cosine Similarity
• Tanimoto Coefficient
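The four measures can be written out directly. This is a plain-Python sketch for equal-length rating vectors (and item sets for Tanimoto), not Mahout's own implementation; the function names are hypothetical.

```python
from math import sqrt

def euclidean_distance(x, y):
    """Straight-line distance; smaller means more similar."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson(x, y):
    """Pearson correlation coefficient, in the range -1 to +1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for constant vectors; 0 is a convention chosen here
    return cov / sqrt(var_x * var_y)

def cosine_similarity(x, y):
    """Cosine of the angle between the two vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0

def tanimoto(set_x, set_y):
    """Size of the intersection divided by size of the union (ignores rating values)."""
    union = len(set_x | set_y)
    return len(set_x & set_y) / union if union else 0.0
```

Tanimoto suits 0-or-1 events such as cart and order, since it only looks at which items two users (or which users two items) have in common.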
27
Complementary
• Sometimes CF cannot generate enough recommendations for all users
• Cold start problem
• New user and new item
• Some statistical approaches can be complementary
• Ranking is very easy to implement with MR (remember Word Count?)
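The Word Count analogy can be made concrete with an in-memory sketch: the map step emits (item, 1) per event and the reduce step sums per item. The function names and sample events are hypothetical, and a real job would run these steps as Hadoop mappers and reducers.

```python
from collections import Counter

def map_phase(events):
    """Emit (item, 1) for every event, like the Word Count mapper."""
    for _user, item in events:
        yield item, 1

def reduce_phase(pairs):
    """Sum the counts per item, like the Word Count reducer."""
    totals = Counter()
    for item, count in pairs:
        totals[item] += count
    return totals

def top_items(events, k=3):
    """Most frequently seen items; a fallback list for new users."""
    return [item for item, _ in reduce_phase(map_phase(events)).most_common(k)]

# Hypothetical (user, item) view events.
events = [("u1", "A"), ("u2", "A"), ("u3", "B"), ("u1", "C"), ("u2", "A"), ("u3", "B")]
ranking = top_items(events, k=2)
```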
28
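The Whole System
[Diagram: Etu Recommender system overview, running on the Etu Appliance]
Data capture sources:
• Transaction data (Transaction Info)
– Historical order data
– Product purchase records
• Web interaction data
– Browse
– Click
– Search
– Cart
– Check-out
– Rating
• Mobile interaction data
– Download
– Click
– Check-in
– Payment
– Location
• Social Media (3rd party feed)
Recommendation engine (Etu Recommender):
• Collaborative Filtering analysis
• Customer similarity analysis
• Product association analysis
• Conversion rate analysis
Recommendation lists (Etu Recommender Application):
• Personalized recommendations for the user
• Customers who viewed this item also viewed
• Customers who bought this item also bought
• Recommendations for items in the cart
• Items bought together
• Based on your browsing, you might like
• Customers who viewed this item ultimately bought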
29
Data Process Flow
[Diagram: data process flow across three tiers]
• Front End: JavaScript, Event Collector (Nginx); labels: request, access log
• Backend: Log Parser (Preprocess & Dispatch), HDFS, Core Engine (Schedule & Flow Control), Mahout Job (User Based, Item Based), MR Job (Ranking & Stats.), HBase, Rec API, Item Mgmt. API
• Admin: Dashboard & Mgmt Console
30
System Components
• Nginx
– Event Collector & Request Forwarder
• Log Parser
– Preprocess collected log and dispatch log to HDFS
• HDFS
– Fundamental storage of the system
• Core Engine
– Scheduling & Workflow Control
– Job Driver
• Management Console
– Dashboard (PV,UV,Conv. Rate)
– Scheduling, Log Viewer, System Configuration
31
System Components – Cont.
• Recommendation Jobs
– Mahout jobs for CF
– MR jobs for Ranking
• HBase
– Recommendation Result for query
• Recommendation API
– API wrapper for frontend to query result from HBase table
– Handle business logic and policy here
• Item Management API
– API interface for frontend item management
– Allow List, Exception List
32
HBase Table
Table | Rowkey | Column | Description
CATEGORY | CategoryID | column=f:id | Category ID
CATEGORY | CategoryID | column=f:rank | ranking by view
CATEGORY | CategoryID | column=f:rank_cart | ranking by cart
CATEGORY | CategoryID | column=f:rank_order | ranking by order
CATEGORY | CategoryID | column=f:rank_view | ranking by view
ERUID_USER | ErUid | column=f:uid | ERUID/UID mapping
USER_ERUID | uid | column=f:eruid | UID/ERUID mapping
SEARCH | Keyword | column=f:id | search ID
SEARCH | Keyword | column=f:rec | item list
33
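HBase Table – Cont.
Table | Rowkey | Column | Description
STATS | date (e.g. 2013-06-25) | column=f:amount | site-wide transaction amount
STATS | date | column=f:item | site-wide number of items sold
STATS | date | column=f:order | site-wide number of orders
STATS | date | column=f:pv | site-wide PV count
STATS | date | column=f:uv | site-wide UV count
STATS | date | column=f:erAmount | transaction amount from recommendations
STATS | date | column=f:erItem | number of items sold from recommendations
STATS | date | column=f:erOrder | number of orders from recommendations
STATS | date | column=f:erPv | PV of recommendation slots
STATS | date | column=f:erUv | UV of recommendation slots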
34
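HBase Table – Item Table
Table | Rowkey | Column | Description
ITEM | PID | column=f:avl_cat | whether this category is currently recommendable
ITEM | PID | column=f:avl_item | whether this item is currently recommendable
ITEM | PID | column=f:cat | category this item belongs to
ITEM | PID | column=f:id | item ID
ITEM | PID | column=f:pry | Priority: recommendation priority weight
ITEM | PID | column=f:rec_view | items recommended for viewing, computed by Mahout
ITEM | PID | column=f:rec_cart | items recommended for adding to cart, computed by Mahout
ITEM | PID | column=f:rec_order | items recommended for purchase, computed by Mahout
ITEM | PID | column=f:rec_order_co_occurrence | items frequently purchased together with this item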
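The f:avl_item and f:avl_cat flags, together with the Exception List of the Item Management API, suggest a filtering step before results are returned. The sketch below shows one way such a filter might look; it uses plain dictionaries in place of HBase rows, and the function name, the `limit` parameter, and the rule that both flags must equal 1 are assumptions.

```python
def filter_recommendations(candidates, item_table, exception_list=frozenset(), limit=5):
    """Keep only recommendable items, in their original order.

    item_table maps PID -> {"avl_item": 0 or 1, "avl_cat": 0 or 1},
    standing in for rows of the ITEM table (hypothetical in-memory copy).
    """
    result = []
    for pid in candidates:
        row = item_table.get(pid)
        if row is None or pid in exception_list:
            continue  # unknown item or explicitly excluded
        if row.get("avl_item") == 1 and row.get("avl_cat") == 1:
            result.append(pid)
        if len(result) == limit:
            break
    return result

# Hypothetical data for illustration.
item_table = {
    "P1": {"avl_item": 1, "avl_cat": 1},
    "P2": {"avl_item": 0, "avl_cat": 1},
    "P3": {"avl_item": 1, "avl_cat": 1},
}
visible = filter_recommendations(["P1", "P2", "P3", "P4"], item_table, exception_list={"P3"})
```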
35
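HBase Table – User Table
Table | Rowkey | Column | Description
USER | UID | column=f:id | user ID
USER | UID | column=f:rec_cart | items recommended for adding to cart, computed by Mahout
USER | UID | column=f:rec_view | items recommended for viewing, computed by Mahout
USER | UID | column=f:rec_order | items recommended for purchase, computed by Mahout
USER | UID | column=f:rec_view_last_item_views | most-viewed items recommended to the user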
36
Tracking Code Snippet
<script id="etu-recommender" type="text/javascript">
// Host name of the recommender's event collector (template placeholder)
var erHostname='${erHostname}';
// Queue of events to report; each entry carries the tracking parameters
var _qevents = _qevents || [];
_qevents.push({
${paramName} : '${paramValue}',
...
});
// Match the protocol of the current page; the prefix already ends with '/'
var erUrlPrefix=('https:' == document.location.protocol ?
'https://':'http://')+erHostname+'/';
(function() {
// Load er.js asynchronously; the timestamp prevents caching
var er = document.createElement('script');
er.type = 'text/javascript';
er.async = true;
er.src = erUrlPrefix+'er.js?'+(new Date().getTime());
var currentJs=document.getElementById('etu-recommender');
currentJs.parentNode.insertBefore(er,currentJs);
})();
</script>
37
Sample parameters for tracking a "view" action
# | Parameter Name | Parameter Type | Sample Value | Required
1 | cid | String | "www.etusolution.com" | Yes
2 | uid | String | "johnny_nien" | Yes
3 | act | String | "view" | Yes
4 | pid | String | "P00001" | Yes
5 | cat | String Array | [ "C", "C00001" ] | No, but please treat it as required
6 | avl * | Boolean (0 or 1) | 1 | No
Note: Explanation about "avl" will be available later
38
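Query Recommendations
<script id="etu-recommender" type="text/javascript">
// Host name of the recommender (template placeholder), used to build erUrlPrefix below
var erHostname='${erHostname}';
// Queue of recommendation queries; each entry carries the query parameters
var _qquery = _qquery || [];
_qquery.push({
${paramName} : '${paramValue}',
...
});
// Called with the query parameters and the recommendation result
function etuRecQueryCallBack(queryParams,queryResult) {
// Implement Your Logic Here!!!
}
// Match the protocol of the current page; the prefix already ends with '/'
var erUrlPrefix=('https:' == document.location.protocol ?
'https://':'http://')+erHostname+'/';
(function() {
// Load er.js asynchronously; the timestamp prevents caching
var er = document.createElement('script');
er.type = 'text/javascript';
er.async = true;
er.src = erUrlPrefix+'er.js?'+(new Date().getTime());
var currentJs=document.getElementById('etu-recommender');
currentJs.parentNode.insertBefore(er,currentJs);
})();
</script>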
39
Sample parameter for Also Buy … (Item based)
# | Parameter Name | Parameter Type | Sample Value | Required
1 | cid | String | "www.etusolution.com" | Yes
2 | type | String | "item" | Yes
3 | act | String | "order" | Yes
4 | pid | String | "P001" | Yes
5 | cat | String | "C001" | No, but highly recommended
40
Network Topology
[Diagram: network topology]
• Web Server Farm: Web Servers
• Recommender Cluster: Nginx, NN, HM, and two DN/RS nodes
• Network labels: internet, Router, L2 (two), User LAN, Private LAN (isolated), Public IP (22,443,8888), Master IP (22,443,8888)
41
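User Event Collection
Embedded JavaScript captures customers' online behavior
Relevant pages
• Home page for logged-in users
• Search page
• Product detail page
• Add-to-cart page
• Order and payment page
• Payment complete page
Online Behavior: customer behavior (Event)
• Browse, Click
• Search
• Add to Cart
• Check-out
• Rating
Online / Offline Records: transaction data
• Historical orders
• Product data
[Diagram labels: Recommender, Batch Import]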
42
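Generate Recommendation Result
• Personalized recommendations for the user
– Data source: browsing records + cart records + purchase records
– Effect: makes recommendations more personal, improves the user experience, raises the order conversion rate
• Customers who viewed this item also viewed
– Data source: browsing records
– Effect: lowers the bounce rate, raises the order conversion rate
• Customers who bought this item also bought
– Data source: purchase records
– Effect: strengthens cross-selling and encourages customers to order again
• Customers who viewed this item ultimately bought
– Data source: browsing records + purchase records
– Effect: lowers the bounce rate, helps users reach a decision, raises the order conversion rate
• Recommendations for items in the cart
– Data source: cart records + purchase records
– Effect: applies up-selling; once the customer's basic needs are met, guides them to buy more items of interest, raising sales volume and gross margin
• Items bought together
– Data source: cart records + purchase records
– Effect: product bundles improve cross-marketing and raise sales volume
• Based on your browsing, you might like
– Data source: browsing records
– Effect: minimizes the bounce rate, builds customer loyalty, raises the repurchase rate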
43
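How the Recommender Is Applied to E-Commerce
[Diagram: Etu Recommender on the Etu Appliance]
• Data capture inputs: historical orders, product data, real-time orders, browse and click, add to cart, check-out, online reviews, search
• Recommendation engine: Collaborative Filtering analysis, conversion rate analysis
• Recommendation lists (Etu Recommender Application) appear on: Product Pages, Category Pages, Search Results, Cart Pages, Email Confirmation, EDM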
44
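Conversion Rate Analysis of the Recommender
Online Performance Tracking
[Diagram: Item A reached by two paths]
• Via the home page or any other page: PV1, UV1, U-Cart 1
• Via a click on the recommendation list: PV2, UV2, U-Cart 2
Recommended-item click-through rate = (PV2 or UV2) / (PV1 or UV1)
Recommended-item conversion rate (via the recommendation list) = U-Cart 2 / (UV2 or PV2)
** PV : page view
** UV : unique visitor
** U-Cart : added to cart by UV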
Algorithm Benchmark
• Train vs Test (80-20)
• A/B test
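The two ratios from the Online Performance Tracking diagram reduce to simple divisions. In the sketch below the function names and sample numbers are made up, and the zero-denominator handling is an assumption.

```python
def click_through_rate(pv1, pv2):
    """Recommended-item click-through rate = PV2 / PV1.
    The same ratio can be taken with UV2 / UV1."""
    return pv2 / pv1 if pv1 else 0.0

def conversion_rate(u_cart2, uv2):
    """Recommended-item conversion rate = U-Cart 2 / UV2 (or PV2)."""
    return u_cart2 / uv2 if uv2 else 0.0

# Hypothetical daily counts for one item.
ctr = click_through_rate(pv1=1000, pv2=50)
cvr = conversion_rate(u_cart2=5, uv2=40)
```

With these sample counts, 5% of the item's page views come through the recommendation list and 12.5% of those visitors add the item to the cart.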
45
Summary
• Mahout is very useful if you would like to build a machine learning application on top of Hadoop
• BUT, a recommendation system is not only about algorithms
• DON'T reinvent the wheel. Leverage Mahout and Hadoop
• Put most of your effort into integration, performance tuning, and business logic
46
Future Roadmaps
• Offline to online integration -> Offline User Event
Collection
• 360 Degree CRM -> CRM Connector
• Social Recommendation -> Social Connector
• Retargeting -> Customer Behavior Data Warehouse
• Go real-time!
47
WE'RE HIRING!
www.etusolution.com
info@etusolution.com
Taipei, Taiwan
318, Rueiguang Rd., Taipei 114, Taiwan
T: +886 2 7720 1888
F: +886 2 8798 6069
Beijing, China
Room B-26, Landgent Center,
No. 24, East Third Ring Middle Rd.,
Beijing, China 100022
T: +86 10 8441 7988
F: +86 10 8441 7227
Contact
49
Recommendation
User | The Matrix | Alien | Inception
Alice | 5 | 1 | 4
Bob | ? | 2 | 5
Peter | 4 | 3 | 2
50
Algorithms Examples –
Recommendation
• Prediction: Estimate Bob's preference towards “The Matrix”
1. Look at all items that
– a) are similar to “The Matrix”
– b) have been rated by Bob
=> “Alien”, “Inception”
2. Estimate the unknown preference with a weighted sum
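The weighted sum in step 2 can be checked by hand. The sketch below takes the item similarities and Bob's ratings from the later slides of this example; normalizing by the sum of absolute similarities is an assumption, chosen because it reproduces the 1.5 shown in the final matrix.

```python
# Similarities to "The Matrix" and Bob's ratings, as given on the later slides.
similarity_to_matrix = {"Alien": -0.47, "Inception": 0.47}
bob_ratings = {"Alien": 2, "Inception": 5}

def weighted_sum_prediction(similarities, user_ratings):
    """Similarity-weighted average of the user's ratings for similar items."""
    rated = [item for item in similarities if item in user_ratings]
    numerator = sum(similarities[i] * user_ratings[i] for i in rated)
    denominator = sum(abs(similarities[i]) for i in rated)
    return numerator / denominator if denominator else None

prediction = weighted_sum_prediction(similarity_to_matrix, bob_ratings)
```

Written out: (-0.47 × 2 + 0.47 × 5) / (0.47 + 0.47) = 1.41 / 0.94 = 1.5.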
51
Algorithms Examples –
Recommendation
• MapReduce phase 1
– Map – Make user the key
Input:
(Alice, Matrix, 5)
(Alice, Alien, 1)
(Alice, Inception, 4)
(Bob, Alien, 2)
(Bob, Inception, 5)
(Peter, Matrix, 4)
(Peter, Alien, 3)
(Peter, Inception, 2)
Output:
Alice (Matrix, 5)
Alice (Alien, 1)
Alice (Inception, 4)
Bob (Alien, 2)
Bob (Inception, 5)
Peter (Matrix, 4)
Peter (Alien, 3)
Peter (Inception, 2)
52
Algorithms Examples –
Recommendation
• MapReduce phase 1
– Reduce – Create inverted index
Input:
Alice (Matrix, 5)
Alice (Alien, 1)
Alice (Inception, 4)
Bob (Alien, 2)
Bob (Inception, 5)
Peter (Matrix, 4)
Peter (Alien, 3)
Peter (Inception, 2)
Output:
Alice (Matrix, 5) (Alien, 1) (Inception, 4)
Bob (Alien, 2) (Inception, 5)
Peter (Matrix, 4) (Alien, 3) (Inception, 2)
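Phase 1 can be simulated in memory to see what the map and reduce steps do. The sketch below uses the slide's data; the function names are hypothetical, and a real job would let Hadoop do the grouping by key between the two steps.

```python
from collections import defaultdict

ratings = [
    ("Alice", "Matrix", 5), ("Alice", "Alien", 1), ("Alice", "Inception", 4),
    ("Bob", "Alien", 2), ("Bob", "Inception", 5),
    ("Peter", "Matrix", 4), ("Peter", "Alien", 3), ("Peter", "Inception", 2),
]

def phase1_map(record):
    """Make the user the key: (user, item, rating) -> (user, (item, rating))."""
    user, item, rating = record
    return user, (item, rating)

def phase1_reduce(mapped):
    """Collect every (item, rating) pair under its user."""
    grouped = defaultdict(list)
    for user, item_rating in mapped:
        grouped[user].append(item_rating)
    return dict(grouped)

by_user = phase1_reduce(phase1_map(record) for record in ratings)
```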
53
Algorithms Examples –
Recommendation
• MapReduce phase 2
– Map – Isolate all co-occurred ratings (all cases where a user
rated both items)
Input:
Alice (Matrix, 5) (Alien, 1) (Inception, 4)
Bob (Alien, 2) (Inception, 5)
Peter (Matrix, 4) (Alien, 3) (Inception, 2)
Output:
Matrix, Alien (5,1)
Matrix, Alien (4,3)
Alien, Inception (1,4)
Alien, Inception (2,5)
Alien, Inception (3,2)
Matrix, Inception (4,2)
Matrix, Inception (5,4)
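The phase 2 map step emits one record per pair of items that the same user rated. A minimal in-memory sketch with the slide's data follows; the function name is hypothetical.

```python
from itertools import combinations

by_user = {
    "Alice": [("Matrix", 5), ("Alien", 1), ("Inception", 4)],
    "Bob": [("Alien", 2), ("Inception", 5)],
    "Peter": [("Matrix", 4), ("Alien", 3), ("Inception", 2)],
}

def phase2_map(by_user):
    """For each user, emit ((item_a, item_b), (rating_a, rating_b))
    for every pair of items that user rated."""
    for item_ratings in by_user.values():
        for (item_a, rating_a), (item_b, rating_b) in combinations(item_ratings, 2):
            yield (item_a, item_b), (rating_a, rating_b)

pairs = sorted(phase2_map(by_user))
```

Here every user lists items in the same order, so each pair key comes out consistently. With arbitrary input, the items in a pair would need to be sorted first so that (Alien, Matrix) and (Matrix, Alien) end up under one key.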
54
Algorithms Examples –
Recommendation
• MapReduce phase 2
– Reduce – Compute similarities
Input:
Matrix, Alien (5,1)
Matrix, Alien (4,3)
Alien, Inception (1,4)
Alien, Inception (2,5)
Alien, Inception (3,2)
Matrix, Inception (4,2)
Matrix, Inception (5,4)
Output:
Matrix, Alien (-0.47)
Matrix, Inception (0.47)
Alien, Inception (-0.63)
55
Recommendation
User | The Matrix | Alien | Inception
Alice | 5 | 1 | 4
Bob | 1.5 | 2 | 5
Peter | 4 | 3 | 2

RecSys 2015 Tutorial – Scalable Recommender Systems: Where Machine Learning...
 
Data council sf amundsen presentation
Data council sf    amundsen presentationData council sf    amundsen presentation
Data council sf amundsen presentation
 
IT webinar 2016
IT webinar 2016IT webinar 2016
IT webinar 2016
 
Machine Learning for (JVM) Developers
Machine Learning for (JVM) DevelopersMachine Learning for (JVM) Developers
Machine Learning for (JVM) Developers
 
On the way of listening to the crowd for supporting modeling activities
On the way of listening to the crowd for supporting modeling activitiesOn the way of listening to the crowd for supporting modeling activities
On the way of listening to the crowd for supporting modeling activities
 
Tableau and hadoop
Tableau and hadoopTableau and hadoop
Tableau and hadoop
 

More from James Chen

Hadoop con 2015 hadoop enables enterprise data lake
Hadoop con 2015   hadoop enables enterprise data lakeHadoop con 2015   hadoop enables enterprise data lake
Hadoop con 2015 hadoop enables enterprise data lake
James Chen
 
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和SparkEtu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
James Chen
 
Etu L2 Training - Hadoop 企業應用實作
Etu L2 Training - Hadoop 企業應用實作Etu L2 Training - Hadoop 企業應用實作
Etu L2 Training - Hadoop 企業應用實作
James Chen
 
Hadoop的典型应用与企业化之路 for HBTC 2012
Hadoop的典型应用与企业化之路 for HBTC 2012Hadoop的典型应用与企业化之路 for HBTC 2012
Hadoop的典型应用与企业化之路 for HBTC 2012
James Chen
 
Hadoop 與 SQL 的甜蜜連結
Hadoop 與 SQL 的甜蜜連結Hadoop 與 SQL 的甜蜜連結
Hadoop 與 SQL 的甜蜜連結
James Chen
 
[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex
[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex
[Hic2011] using hadoop lucene-solr-for-large-scale-search by systexJames Chen
 

More from James Chen (6)

Hadoop con 2015 hadoop enables enterprise data lake
Hadoop con 2015   hadoop enables enterprise data lakeHadoop con 2015   hadoop enables enterprise data lake
Hadoop con 2015 hadoop enables enterprise data lake
 
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和SparkEtu Solution Day 2014 Track-D: 掌握Impala和Spark
Etu Solution Day 2014 Track-D: 掌握Impala和Spark
 
Etu L2 Training - Hadoop 企業應用實作
Etu L2 Training - Hadoop 企業應用實作Etu L2 Training - Hadoop 企業應用實作
Etu L2 Training - Hadoop 企業應用實作
 
Hadoop的典型应用与企业化之路 for HBTC 2012
Hadoop的典型应用与企业化之路 for HBTC 2012Hadoop的典型应用与企业化之路 for HBTC 2012
Hadoop的典型应用与企业化之路 for HBTC 2012
 
Hadoop 與 SQL 的甜蜜連結
Hadoop 與 SQL 的甜蜜連結Hadoop 與 SQL 的甜蜜連結
Hadoop 與 SQL 的甜蜜連結
 
[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex
[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex
[Hic2011] using hadoop lucene-solr-for-large-scale-search by systex
 

Recently uploaded

UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
Product School
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 

Recently uploaded (20)

UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 

Applications of Apache Mahout in E-Commerce (Apache Mahout 於電子商務的應用)

  • 1. Applications of Apache Mahout in E-Commerce. James Chen, Etu Solution. Hadoop in TW 2013, Sep 28, 2013
  • 3. Agenda
      • Apache Mahout Introduction
      • Machine Learning Use Cases
      • Building a recommendation system
      • Collaborative Filtering
      • System Architecture
      • Performance
      • Future Roadmap
  • 4. Apache Mahout
      • ASF project to create scalable machine learning libraries
          – http://mahout.apache.org
      • Why Mahout?
          – Many open source machine learning libraries either:
              • Lack community
              • Lack documentation and examples
              • Lack scalability
              • Lack the Apache License
  • 5. Algorithms in Mahout (diagram of algorithm families): Classification, Clustering, Regression, Recommenders, Frequent Pattern Mining, Vector Similarity, Dimension Reduction, Evolution, Non-MR Algorithms, Examples. See http://cwiki.apache.org/confluence/display/MAHOUT/Algorithms
  • 6. Algorithms in Mahout
      • Classification
          – Logistic Regression
          – Bayesian
          – Support Vector Machines
          – Perceptron and Winnow
          – Neural Network
          – Random Forests
          – Restricted Boltzmann Machines
          – Online Passive Aggressive
          – Boosting
          – Hidden Markov Models
      • Clustering
          – Canopy Clustering
          – K-Means
          – Fuzzy K-Means
          – Expectation Maximization
          – Mean Shift
          – Hierarchical Clustering
          – Dirichlet Process Clustering
          – Latent Dirichlet Allocation
          – Spectral Clustering
          – Minhash Clustering
          – Top Down Clustering
  • 7. Algorithms in Mahout – Cont.
      • Pattern Mining
          – Parallel FP Growth
      • Regression
          – Locally Weighted Linear Regression
      • Dimension Reduction
          – SVD
          – Stochastic SVD with PCA
          – PCA
          – Independent Component Analysis
          – Gaussian Discriminative Analysis
      • Evolution Algorithms
          – Genetic Algorithms
      • Recommenders
          – Non-distributed recommenders ("Taste")
          – Distributed Item-Based Collaborative Filtering
          – Collaborative Filtering using a parallel matrix factorization
          – Slope One
  • 8. Algorithms in Mahout – Cont.
      • Vector Similarity
          – RowSimilarityJob (MR)
          – VectorDistanceJob (MR)
      • Other
          – Collocations
      • Non-MapReduce algorithms
      • See http://cwiki.apache.org/confluence/display/MAHOUT/Algorithms
  • 9. Mahout Focus on Scalability
      • Goal: Be as fast and efficient as possible given the intrinsic design of the algorithm
          – Some algorithms won't scale to massive machine clusters
          – Others fit logically on a MapReduce framework like Apache Hadoop
          – Still others will need alternative distributed programming models
          – Be pragmatic
      • Most Mahout implementations are MapReduce enabled
      • (Always a) work in progress
  • 10. Prepare Data from Raw Content
      • Lucene integration
          – bin/mahout lucenevector …
      • Document Vectorizer
          – bin/mahout seqdirectory …
          – bin/mahout seq2sparse …
      • Programmatically
          – See the Utils module in Mahout
      • Database (JDBC)
      • File System (HDFS)
  • 11. Machine Learning
      • "Machine Learning is programming computers to optimize a performance criterion using example data or past experience" (Introduction to Machine Learning, E. Alpaydin)
      • Subset of Artificial Intelligence
      • Lots of related fields:
          – Information Retrieval
          – Statistics
          – Biology
          – Linear algebra
          – Many more
  • 12. Use Case: Recommendation
  • 13. Use Case: Classification
  • 14. Use Case: Clustering
  • 15. More Use Cases
      • Recommend products/books/friends …
      • Classify content into predefined groups
      • Find similar content based on object properties
      • Find associations/patterns in actions/behaviors
      • Identify key topics in large collections of text
      • Detect anomalies in machine output
      • Rank search results (PageRank)
      • Others
  • 16. Building a Recommendation System
      • Help users find items they might like based on historical preferences
  • 17. Approach
      • Collect user preferences -> user vs. item matrix
      • Find similar users or items (neighborhood-based approach)
      • Works by finding similarly rated items in the user-item matrix (e.g. cosine, Pearson correlation, Tanimoto coefficient)
      • Estimates a user's preference towards an item by looking at his/her preferences towards similar items
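  • 18. Collaborative Filtering – User Based: Find User Similarity
      1. How do we predict user 1's preference for item 4?
      2. Find the n users who are most similar to user 1 and who have purchased item 4 (ratings are derived from purchase records); call them users n
      3. Fill in the missing value from the ratings that users n gave item 4, weighted by user similarity
      4. Repeat steps 1 to 3 for every user combination until all empty cells are filled
      • Figure: user-item matrix (items as columns); user 1's cell for item 4 is marked "?" and is filled in from the rows of users n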
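The four steps of user-based filtering can be sketched in a few lines of Python. This is a minimal illustration, not the deck's or Mahout's implementation: the toy `ratings` data, the function names, the choice of cosine similarity, and the neighborhood size `n` are all assumptions made for the example.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}. user1 has not rated item4.
ratings = {
    "user1": {"item1": 5.0, "item2": 3.0, "item3": 4.0},
    "user2": {"item1": 5.0, "item2": 3.0, "item3": 4.0, "item4": 4.0},
    "user3": {"item1": 1.0, "item2": 5.0, "item3": 2.0, "item4": 2.0},
}


def cosine(a, b):
    """Cosine similarity over the items both users have rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[k] * b[k] for k in common)
    norm_a = sqrt(sum(a[k] ** 2 for k in common))
    norm_b = sqrt(sum(b[k] ** 2 for k in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_user_based(ratings, user, item, n=2):
    """Similarity-weighted average of the n nearest neighbours' ratings."""
    # Step 2: similar users who have a rating for the target item.
    neighbours = [
        (cosine(ratings[user], prefs), prefs[item])
        for other, prefs in ratings.items()
        if other != user and item in prefs
    ]
    top = sorted(neighbours, reverse=True)[:n]
    total = sum(sim for sim, _ in top)
    if total == 0:
        return None  # no usable neighbours
    # Step 3: weight each neighbour's rating by its similarity.
    return sum(sim * value for sim, value in top) / total


prediction = predict_user_based(ratings, "user1", "item4")
```

Because `user2` rates exactly like `user1`, the estimate lands closer to `user2`'s rating of 4 than to `user3`'s rating of 2.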
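  • 19. Collaborative Filtering – Item Based: Find Item Similarity (Amazon)
      1. How do we predict user 1's preference for item 4?
      2. For each item n in user 1's history, compute the similarity between item n and item 4 (using other users' histories)
      3. Fill in the missing value from user 1's ratings of items n, weighted by item similarity
      4. Repeat steps 1 to 3 for every item combination until all empty cells are filled
      • Figure: user-item matrix (users as rows, items as columns); the cell marked "?" is filled in from the user's other item ratings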
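The item-based variant can be sketched the same way. Again this is an assumption-laden illustration: the toy `ratings`, the helper names, and the use of cosine similarity over co-rating users are choices made for the example, not taken from the deck.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}. user1 has not rated item4.
ratings = {
    "user1": {"item1": 5.0, "item2": 3.0, "item3": 4.0},
    "user2": {"item1": 5.0, "item2": 3.0, "item3": 4.0, "item4": 4.0},
    "user3": {"item1": 1.0, "item2": 5.0, "item3": 2.0, "item4": 2.0},
}


def item_vectors(ratings):
    """Invert user -> {item: rating} into item -> {user: rating}."""
    vectors = {}
    for user, prefs in ratings.items():
        for item, value in prefs.items():
            vectors.setdefault(item, {})[user] = value
    return vectors


def cosine(a, b):
    """Cosine similarity over the users who rated both items."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[k] * b[k] for k in common)
    norm_a = sqrt(sum(a[k] ** 2 for k in common))
    norm_b = sqrt(sum(b[k] ** 2 for k in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict_item_based(ratings, user, target):
    """Weight the user's own ratings by each item's similarity to the target."""
    vectors = item_vectors(ratings)
    weighted, total = 0.0, 0.0
    for item, value in ratings[user].items():
        sim = cosine(vectors[item], vectors[target])
        weighted += sim * value
        total += abs(sim)
    return weighted / total if total else None


prediction = predict_item_based(ratings, "user1", "item4")
```

The item similarities depend only on other users' histories, so they can be precomputed offline and reused for every user, which is one reason the item-based form suits batch jobs.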
  • 20. Test Drive of Mahout Recommender
      • GroupLens dataset: http://www.grouplens.org/node/12
      • 1,000,209 anonymous ratings of 3,900 movies made by 6,040 MovieLens users
      • movies.dat (movie IDs with title and category)
      • ratings.dat (ratings of movies)
      • users.dat (user information)
  • 21. Ratings File
      • Each line of the ratings file has the format UserID::MovieID::Rating::Timestamp
      • Mahout requires the following CSV format: UserID,ItemID,Value
      • tr -s ':' ',' < ratings.dat | cut -f1-3 -d, > rating.csv
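For readers who prefer a script to the `tr | cut` pipeline, the same conversion can be sketched in Python. The function names and default file paths are placeholders chosen for this example; only the two line formats come from the slide.

```python
def to_mahout_csv(line):
    """Turn 'UserID::MovieID::Rating::Timestamp' into 'UserID,ItemID,Value'."""
    user_id, movie_id, rating, _timestamp = line.strip().split("::")
    return f"{user_id},{movie_id},{rating}"


def convert_file(src="ratings.dat", dst="rating.csv"):
    """Convert a whole ratings file line by line (paths are placeholders)."""
    with open(src, encoding="latin-1") as fin, open(dst, "w") as fout:
        for line in fin:
            if line.strip():
                fout.write(to_mahout_csv(line) + "\n")
```

Dropping the timestamp mirrors `cut -f1-3`; the sketch assumes every non-empty line has exactly four `::`-separated fields.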
  • 22. Run Recommendation Job
      • $ mahout recommenditembased -i [input-hdfs-path] -o [output-hdfs-path] --usersFile [File listing users] --tempDir
  • 23. Recommendation Result
      • The recommendation result will look like: UserID [ItemID:Weight, ItemID:Weight,…]
      • Each line represents a UserID with its associated recommended ItemIDs
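A result line in this shape is easy to load before writing it to a serving store. The parser below is a sketch: it assumes integer IDs, that the user ID and the bracketed list are separated by whitespace, and that the sample line is made up for illustration.

```python
def parse_recommendation(line):
    """Parse 'UserID [ItemID:Weight,ItemID:Weight,...]' into (user, [(item, weight)])."""
    user_id, items = line.strip().split(None, 1)
    recommendations = []
    for pair in items.strip("[]").split(","):
        pair = pair.strip()
        if not pair:
            continue  # tolerate an empty list such as "[]"
        item_id, weight = pair.split(":")
        recommendations.append((int(item_id), float(weight)))
    return int(user_id), recommendations


# Hypothetical sample line, not real job output.
user, recs = parse_recommendation("3\t[1198:5.0,2762:4.5]")
```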
  • 24. Collect User Behavior Events
      • Implicit (easy to collect)
          – View
          – Shopping cart (0 or 1)
          – Order or buy (0 or 1)
          – Duration time (noisy)
      • Explicit (hard to collect)
          – Rating (0~5)
          – Voting (0 or 1)
          – Forward or share (0 or 1)
          – Add favorite (0 or 1)
          – Tag (text analysis)
          – Comments (text analysis)
  • 25. Process Events into Preferences
      • Group by event type, and calculate similarity per event type, e.g. "Also View", "Also Buy"
      • Weighting:
          – Explicit event > implicit event
          – Order, cart > view
      • Noise reduction
      • Normalization
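One way to read the weighting, noise-reduction and normalization bullets is sketched below. The numeric weights, the cap, and the choice to merge event types into one score are assumptions made for illustration; the slide only states the ordering (order and cart above view) and also suggests computing similarity separately per event type.

```python
from collections import defaultdict

# Hypothetical weights: only the ordering (order, cart > view) comes from the slide.
EVENT_WEIGHTS = {"view": 1.0, "cart": 3.0, "order": 5.0}


def events_to_preferences(events, cap=5.0):
    """Aggregate (user, item, event) tuples into preference values in [0, 1]."""
    scores = defaultdict(float)
    for user, item, event in events:
        weight = EVENT_WEIGHTS.get(event)
        if weight is None:
            continue  # noise reduction: ignore event types we do not trust
        scores[(user, item)] += weight
    # Normalization: cap the accumulated score, then scale it to [0, 1].
    return {key: min(value, cap) / cap for key, value in scores.items()}


events = [
    ("u1", "p1", "view"),
    ("u1", "p1", "cart"),
    ("u1", "p2", "view"),
    ("u1", "p2", "hover"),  # unknown event type, dropped
]
preferences = events_to_preferences(events)
```

The resulting `(user, item) -> value` pairs map directly onto the `UserID,ItemID,Value` input format described earlier.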
  • 26. Similarity (Vector Similarity)
      • Euclidean distance
      • Pearson correlation coefficient (-1 ~ +1)
      • Cosine similarity
      • Tanimoto coefficient
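The four measures have short textbook definitions, sketched here in plain Python for equal-length rating vectors (and sets of item IDs for Tanimoto). These are reference definitions for illustration, not Mahout's implementations, which add their own handling of missing values and weighting.

```python
from math import sqrt


def euclidean_distance(x, y):
    """Straight-line distance; smaller means more similar."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def cosine_similarity(x, y):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def pearson_correlation(x, y):
    """Cosine similarity of the mean-centred vectors, in [-1, +1]."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    return cosine_similarity([a - mean_x for a in x], [b - mean_y for b in y])


def tanimoto_coefficient(a, b):
    """Overlap of two item sets: |A and B| / |A or B|. Ignores rating values."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

Tanimoto fits boolean events such as cart or order (0 or 1), while Pearson and cosine need graded values such as ratings.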
  • 27. Complementary Approaches
      • Sometimes CF cannot generate enough recommendations for all users
      • Cold start problem: new users and new items
      • Some statistical approaches can complement CF
      • Ranking is very easy to implement in MapReduce; it is essentially word count
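The "ranking is word count" remark can be made concrete with a single-process sketch: count events per item, keep the top k, and fall back to that list when CF has nothing for a user. The function names and the sample events are hypothetical; a production job would run the same count as a MapReduce job over the event logs.

```python
from collections import Counter


def top_items(events, k=3):
    """Word-count style ranking: count events per item and keep the k most frequent."""
    counts = Counter(item for _user, item in events)
    return [item for item, _count in counts.most_common(k)]


def recommend(user, cf_results, popular):
    """Use CF output when it exists; otherwise fall back to the popularity ranking."""
    return cf_results.get(user) or popular


# Hypothetical view events as (user, item) pairs.
events = [("u1", "A"), ("u2", "A"), ("u3", "A"), ("u1", "B"), ("u2", "B"), ("u3", "C")]
popular = top_items(events, k=2)
```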
  • 28. The Whole System (Etu Recommender application on the Etu Appliance)
      • Data capture
          – Transaction info: historical order data, product purchase records
          – Web interaction data: browse, click, search, cart, check-out, rating
          – Mobile interaction data: download, click, check-in, payment, location
          – Social media (3rd party feed)
      • Recommendation engine
          – Collaborative filtering analysis: customer similarity analysis, product association analysis
          – Conversion rate analysis
      • Recommendation lists
          – Personalized recommendations for the user
          – Customers who viewed this item also viewed
          – Customers who bought this item also bought
          – Recommendations for items in the cart
          – Items bought together
          – Based on your browsing, you might like
          – Customers who viewed this item ultimately bought
  • 29. Data Process Flow (diagram)
      • Components: front-end JavaScript, Event Collector (Nginx), Log Parser, HDFS, HBase, Core Engine, Mahout jobs (user based, item based), MR jobs (ranking & stats), Rec API, Item Mgmt. API, Dashboard & Mgmt Console
      • Flow labels: request, access log, preprocess & dispatch, schedule & flow control
      • Tiers: Front End, Backend, Admin
  • 30. System Components
      • Nginx
          – Event collector & request forwarder
      • Log Parser
          – Preprocesses collected logs and dispatches them to HDFS
      • HDFS
          – Fundamental storage of the system
      • Core Engine
          – Scheduling & workflow control
          – Job driver
      • Management Console
          – Dashboard (PV, UV, conversion rate)
          – Scheduling, log viewer, system configuration
  • 31. System Components – Cont.
      • Recommendation Jobs
          – Mahout jobs for CF
          – MR jobs for ranking
      • HBase
          – Stores recommendation results for query
      • Recommendation API
          – API wrapper for the front end to query results from HBase tables
          – Handles business logic and policy
      • Item Management API
          – API interface for front-end item management
          – Allow list, exception list
  • 32. HBase Table
      • CATEGORY (rowkey: CategoryID)
          – f:id: category ID
          – f:rank: ranking by view
          – f:rank_cart: ranking by cart
          – f:rank_order: ranking by order
          – f:rank_view: ranking by view
      • ERUID_USER (rowkey: ErUid)
          – f:uid: ERUID/UID mapping
      • USER_ERUID (rowkey: uid)
          – f:eruid: UID/ERUID mapping
      • SEARCH (rowkey: Keyword)
          – f:id: search ID
          – f:rec: item list
  • 33. HBase Table – Cont.
      • STATS (rowkey: date, e.g. 2013-06-25)
          – f:amount: site-wide transaction amount
          – f:item: site-wide number of items sold
          – f:order: site-wide number of orders
          – f:pv: site-wide PV
          – f:uv: site-wide UV
          – f:erAmount: transaction amount from recommendations
          – f:erItem: number of items sold from recommendations
          – f:erOrder: number of orders from recommendations
          – f:erPv: PV of recommendation slots
          – f:erUv: UV of recommendation slots
  • 34. HBase Table – Item Table
      • ITEM (rowkey: PID)
          – f:avl_cat: whether this category is currently eligible for recommendation
          – f:avl_item: whether this item is currently eligible for recommendation
          – f:cat: category the item belongs to
          – f:id: item ID
          – f:pry: priority (recommendation priority weight)
          – f:rec_view: items recommended for viewing, computed by Mahout
          – f:rec_cart: items recommended for adding to the cart, computed by Mahout
          – f:rec_order: items recommended for purchase, computed by Mahout
          – f:rec_order_co_occurrence: items frequently bought together with this item
  • 35. HBase Table – User Table
      • USER (rowkey: UID)
          – f:id: user ID
          – f:rec_cart: items recommended for adding to the cart, computed by Mahout
          – f:rec_view: items recommended for viewing, computed by Mahout
          – f:rec_order: items recommended for purchase, computed by Mahout
          – f:rec_view_last_item_views: most-viewed items recommended to the user
  • 36. Tracking Code Snippet
      <script id="etu-recommender" type="text/javascript">
        var erHostname = '${erHostname}';
        var _qevents = _qevents || [];
        _qevents.push({
          ${paramName} : '${paramValue}',
          ...
        });
        // Match the protocol of the hosting page; the prefix already ends with '/'.
        var erUrlPrefix = ('https:' == document.location.protocol ? 'https://' : 'http://') + erHostname + '/';
        (function() {
          // Load er.js asynchronously; the timestamp defeats browser caching.
          var er = document.createElement('script');
          er.type = 'text/javascript';
          er.async = true;
          er.src = erUrlPrefix + 'er.js?' + (new Date().getTime());
          var currentJs = document.getElementById('etu-recommender');
          currentJs.parentNode.insertBefore(er, currentJs);
        })();
      </script>
  • 37. Sample parameters for tracking a "view" action
      1. cid (String, required): "www.etusolution.com"
      2. uid (String, required): "johnny_nien"
      3. act (String, required): "view"
      4. pid (String, required): "P00001"
      5. cat (String Array, not required, but treat it as required): [ "C", "C00001" ]
      6. avl * (Boolean, 0 or 1, not required): 1
      • Note: an explanation of "avl" will be available later
  • 38. Query Recommendations
      <script id="etu-recommender" type="text/javascript">
        var erHostname = '${erHostname}';
        var _qquery = _qquery || [];
        _qquery.push({
          ${paramName} : '${paramValue}',
          ……
        });
        // Called with the query parameters and the recommendation result.
        function etuRecQueryCallBack(queryParams, queryResult) {
          // Implement your logic here
        }
        // Match the protocol of the hosting page; the prefix already ends with '/'.
        var erUrlPrefix = ('https:' == document.location.protocol ? 'https://' : 'http://') + erHostname + '/';
        (function() {
          // Load er.js asynchronously; the timestamp defeats browser caching.
          var er = document.createElement('script');
          er.type = 'text/javascript';
          er.async = true;
          er.src = erUrlPrefix + 'er.js?' + (new Date().getTime());
          var currentJs = document.getElementById('etu-recommender');
          currentJs.parentNode.insertBefore(er, currentJs);
        })();
      </script>
  • 39. Sample parameters for "Also Buy" (item based)
      1. cid (String, required): "www.etusolution.com"
      2. type (String, required): "item"
      3. act (String, required): "order"
      4. pid (String, required): "P001"
      5. cat (String, not required, but highly recommended): "C001"
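  • 40. Network Topology (diagram)
      • Web server farm: web servers on the user LAN, reached from the internet through the router
      • Recommender cluster: Nginx, NN, HM, and DN/RS nodes behind L2 switches on an isolated private LAN
      • Public IP and master IP expose ports 22, 443, 8888
  • 41. User Event Collection
      • Embed JavaScript to capture customers' online behavior
      • Relevant pages
          – Home page of a logged-in user
          – Search page
          – Product detail page
          – Add-to-cart page
          – Order and payment page
          – Payment complete page
      • Online behavior: customer events
          – Browse, click
          – Search
          – Add to cart
          – Check-out
          – Rating
      • Online / offline records: transaction data (batch import into the Recommender)
          – Historical orders
          – Product data
  • 42. Generate Recommendation Result
      • Personalized recommendations for the user
          – Data source: browsing, cart and purchase records
          – Effect: strengthens personalization, improves the user experience, raises the order conversion rate
      • Customers who viewed this item also viewed
          – Data source: browsing records
          – Effect: lowers the bounce rate, raises the order conversion rate
      • Customers who bought this item also bought
          – Data source: purchase records
          – Effect: strengthens cross-selling and encourages customers to order again
      • Customers who viewed this item ultimately bought
          – Data source: browsing and purchase records
          – Effect: lowers the bounce rate, helps users decide, raises the order conversion rate
      • Recommendations for items in the cart
          – Data source: cart and purchase records
          – Effect: up-selling; once the customer's basic need is met, guide them to buy more items of interest, increasing sales volume and gross margin
      • Items bought together
          – Data source: cart and purchase records
          – Effect: product bundles that improve cross-marketing and raise sales volume
      • Based on your browsing, you might like
          – Data source: browsing records
          – Effect: minimizes the bounce rate, improves customer loyalty, raises the repurchase rate
  • 43. How the Recommender Is Applied in E-Commerce (diagram: Etu Recommender application on the Etu Appliance)
      • Inputs (data capture): historical orders, product data, real-time orders, browse and click, add to cart, check-out, online reviews, search
      • Recommendation engine: collaborative filtering analysis, conversion rate analysis
      • Outputs (recommendation lists): product pages, category pages, search results, cart pages, email confirmation, EDM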
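  • 44. Conversion Rate Analysis of the Recommender: Online Performance Tracking
      • Item A reached via the home page or any other page: PV1, UV1, U-Cart 1
      • Item A reached by clicking a recommendation list: PV2, UV2, U-Cart 2
      • Click-through rate of recommended items = (PV2 or UV2) / (PV1 or UV1)
      • Conversion rate of recommended items (via the recommendation list) = U-Cart 2 / (UV2 or PV2)
      • PV: page view; UV: unique visitor; U-Cart: added to cart by UV
      • Algorithm benchmark
          – Train vs. test (80-20)
          – A/B test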
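The two ratios are simple enough to compute directly from the counters kept in the STATS table. The sketch below uses hypothetical function names and made-up sample counts; it assumes the same unit (PV or UV) is used in the numerator and denominator of the click-through rate.

```python
def click_through_rate(recommended_views, all_views):
    """Recommended-item CTR: PV2 / PV1 (or UV2 / UV1)."""
    return recommended_views / all_views if all_views else 0.0


def conversion_rate(recommended_carts, recommended_visitors):
    """Recommended-item conversion: U-Cart 2 / UV2 (or PV2)."""
    return recommended_carts / recommended_visitors if recommended_visitors else 0.0


# Hypothetical counts for one item on one day.
pv1, pv2, u_cart2, uv2 = 1000, 50, 5, 50
ctr = click_through_rate(pv2, pv1)
conversion = conversion_rate(u_cart2, uv2)
```

Tracking the same ratios per algorithm variant is what makes the A/B test in the benchmark bullet possible.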
  • 45. Summary
      • Mahout is very useful if you would like to build a machine learning application on top of Hadoop
      • BUT a recommendation system is not only an algorithm
      • DON'T reinvent the wheel; leverage Mahout and Hadoop
      • Put most of your effort into integration, performance tuning, and business logic
  • 46. Future Roadmap
      • Offline to online integration -> offline user event collection
      • 360-degree CRM -> CRM connector
      • Social recommendation -> social connector
      • Retargeting -> customer behavior data warehouse
      • Go real-time!
  • 48. Contact
      • www.etusolution.com, info@etusolution.com
      • Taipei, Taiwan: 318, Rueiguang Rd., Taipei 114, Taiwan. T: +886 2 7720 1888, F: +886 2 8798 6069
      • Beijing, China: Room B-26, Landgent Center, No. 24, East Third Ring Middle Rd., Beijing, China 100022. T: +86 10 8441 7988, F: +86 10 8441 7227
  • 50. Algorithm Examples – Recommendation
      • Prediction: estimate Bob's preference towards "The Matrix"
      1. Look at all items that
          – a) are similar to "The Matrix"
          – b) have been rated by Bob
          => "Alien", "Inception"
      2. Estimate the unknown preference with a weighted sum
  • 51. Algorithm Examples – Recommendation
      • MapReduce phase 1 – Map – make the user the key
      • Input: (Alice, Matrix, 5) (Alice, Alien, 1) (Alice, Inception, 4) (Bob, Alien, 2) (Bob, Inception, 5) (Peter, Matrix, 4) (Peter, Alien, 3) (Peter, Inception, 2)
      • Output:
          – Alice (Matrix, 5)
          – Alice (Alien, 1)
          – Alice (Inception, 4)
          – Bob (Alien, 2)
          – Bob (Inception, 5)
          – Peter (Matrix, 4)
          – Peter (Alien, 3)
          – Peter (Inception, 2)
  • 52. Algorithm Examples – Recommendation
      • MapReduce phase 1 – Reduce – create the inverted index
      • Input: the (user, (item, rating)) pairs emitted by the phase 1 map
      • Output:
          – Alice (Matrix, 5) (Alien, 1) (Inception, 4)
          – Bob (Alien, 2) (Inception, 5)
          – Peter (Matrix, 4) (Alien, 3) (Inception, 2)
  • 53. Algorithm Examples – Recommendation
      • MapReduce phase 2 – Map – isolate all co-occurring ratings (all cases where a user rated both items)
      • Input:
          – Alice (Matrix, 5) (Alien, 1) (Inception, 4)
          – Bob (Alien, 2) (Inception, 5)
          – Peter (Matrix, 4) (Alien, 3) (Inception, 2)
      • Output:
          – Matrix, Alien (5,1)
          – Matrix, Alien (4,3)
          – Alien, Inception (1,4)
          – Alien, Inception (2,5)
          – Alien, Inception (3,2)
          – Matrix, Inception (4,2)
          – Matrix, Inception (5,4)
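The two phases up to this point can be simulated in a single Python process to check the intermediate data. This sketch only mimics the data flow with in-memory dictionaries; the function names are made up, and it relies on each user's items appearing in the same order (Matrix, Alien, Inception) so that pair keys come out as on the slide.

```python
from collections import defaultdict
from itertools import combinations

# The (user, item, rating) triples from the slides.
ratings = [
    ("Alice", "Matrix", 5), ("Alice", "Alien", 1), ("Alice", "Inception", 4),
    ("Bob", "Alien", 2), ("Bob", "Inception", 5),
    ("Peter", "Matrix", 4), ("Peter", "Alien", 3), ("Peter", "Inception", 2),
]


def phase1(ratings):
    """Map: key each rating by user. Reduce: collect each user's (item, rating) list."""
    by_user = defaultdict(list)
    for user, item, rating in ratings:
        by_user[user].append((item, rating))
    return dict(by_user)


def phase2_map(by_user):
    """Emit one (item pair, rating pair) record for every pair a user rated together."""
    pairs = defaultdict(list)
    for items in by_user.values():
        for (item_a, rating_a), (item_b, rating_b) in combinations(items, 2):
            pairs[(item_a, item_b)].append((rating_a, rating_b))
    return dict(pairs)


by_user = phase1(ratings)
co_ratings = phase2_map(by_user)
```

Each key of `co_ratings` is what a phase 2 reducer would receive, together with all rating pairs needed to compute that item pair's similarity.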
  • 54. Algorithm Examples – Recommendation
      • MapReduce phase 2 – Reduce – compute similarities
      • Input:
          – Matrix, Alien (5,1)
          – Matrix, Alien (4,3)
          – Alien, Inception (1,4)
          – Alien, Inception (2,5)
          – Alien, Inception (3,2)
          – Matrix, Inception (4,2)
          – Matrix, Inception (5,4)
      • Output:
          – Matrix, Alien (-0.47)
          – Matrix, Inception (0.47)
          – Alien, Inception (-0.63)
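With these similarities, the weighted-sum estimate of Bob's preference for "The Matrix" can be worked through in code. The similarity values and Bob's ratings are taken from the slides; dividing by the sum of absolute similarities is an assumption, chosen because it is a common way to normalize a weighted sum when similarities can be negative.

```python
# Item similarities to "Matrix" from the phase 2 reduce output on the slide.
similarity_to_matrix = {"Alien": -0.47, "Inception": 0.47}

# Bob's known ratings from the phase 1 input on the slide.
bob_ratings = {"Alien": 2, "Inception": 5}


def predict(user_ratings, similarities):
    """Similarity-weighted sum of the user's ratings, normalized by total |similarity|."""
    rated = [item for item in user_ratings if item in similarities]
    numerator = sum(similarities[item] * user_ratings[item] for item in rated)
    denominator = sum(abs(similarities[item]) for item in rated)
    return numerator / denominator if denominator else None


bob_matrix = predict(bob_ratings, similarity_to_matrix)
```

Under this normalization the estimate is (-0.47 × 2 + 0.47 × 5) / (0.47 + 0.47) = 1.5.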