2013. 06. 13(Thu)
Team 11. Junghyun Kwon
Kunwoo Park
Jongin Lee
Seungkyu Nam
I know
what items really are
• Problem
• Challenges
• Related works
• Motivation
• Approaches
• Experiment setup
• Feature extraction
• Result
• Discussion
2
Contents
• Purpose of Track 1 of KDD Cup 2012
• Predict which users (or items) a Weibo user might follow.
• Recommendation System [1]
• Save valuable time sifting through
less relevant stories
• Increase customer satisfaction
3
Problem
Twitter.com
• 90% of the world's data was generated in the last three years
• 1.0 × 10^16 bytes every day
• Sensors, mobile, SNS, online transactions
• 10 billion tweets every day
• 30 billion FB messages every day (*)
• …
4
Problem
Source: http://goo.gl/9xXaG
*: BLOTER.NET 12.01.26
• Problem
• Too much data to find the informative features
• 80 million training instances, large user and item metadata
• Few accepted results compared to many rejected results
• Data processing takes too much time
• SVM on all data: 16 days
• Lack of computing resources
• Our goal
• Train on as much of the large and complex Weibo data as
possible on a single machine
• Find effective features with a simpler (and faster) approach
5
Challenges
• Online learning [2],[3]
• Learns one instance at a time
• Ex. Product searching
• Pro – minimizes a chosen performance criterion
• Con – much incorrect label feedback
• Map-Reduce [4]
• Parallel, distributed model for processing large data
• Pro – good for large amounts of input, intermediate, and output data
• Con – poorly suited to data that requires synchronization
6
Related works
7
Motivation
User Keywords
Year of birth
Gender
Number of tweets
Tag-ids
Category
Keywords
Which item is a favorite of which user?
8
Motivation
User
Item
User IDs in User_profile.txt include item IDs in item.txt.
9
Motivation
[Figure: user metadata (keywords, year of birth, gender, number of tweets, tag IDs) and item metadata (category, keywords, plus the item's own user keywords, year of birth, gender, number of tweets, and tag IDs) are turned into Features 1 to 5, which form our training data.]
• Extract features between users and items from the
metadata of users and items.
• Train the model with a Support Vector Machine (SVM)
• LIBSVM in R
10
Initial Approach
Failure!
Too much computation time: 16 days to train the SVM
Lack of computational resources: single machine
• Apply logistic regression using stochastic gradient
descent
• Logistic regression
• Stochastic gradient descent
11
Alternative Approach
stochastic gradient descent:
gradient descent:
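The update formulas on this slide did not survive extraction. As a rough illustration only: stochastic gradient descent updates the weights after every single instance, whereas batch gradient descent sums the gradient over all instances before each update. The sketch below is a minimal pure-Python logistic regression trained with SGD. The function names, learning rate, and epoch count are hypothetical choices for illustration, not the team's actual settings.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    # Probability that the user accepts the item, given feature vector x.
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


def sgd_logistic_regression(rows, labels, learning_rate=0.1, epochs=20, seed=0):
    # learning_rate, epochs and seed are illustrative assumptions.
    rng = random.Random(seed)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    order = list(range(len(rows)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit instances in random order
        for i in order:
            x, y = rows[i], labels[i]
            error = predict_proba(weights, bias, x) - y  # d(log loss)/d(score)
            # One update per instance: this is the "stochastic" part.
            for j, v in enumerate(x):
                weights[j] -= learning_rate * error * v
            bias -= learning_rate * error
    return weights, bias
```

Because each update touches one instance, memory use stays constant in the number of training pairs, which is what makes this approach attractive on a single machine.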
1. Training data (73,209,277 user-item pairs)
- after applying target IDs: 38,332,489 user-item pairs
2. Test data (public, 2,617,106 user-item pairs)
3. Features used
- User's number of tweets
- User's number of tags
- Age similarity
- Item's number of tweets
- Item's number of tags
- Gender similarity
- Network similarity
- Item's number of followers
- Keyword similarity
4. Construct a separate model for each feature
5. Evaluation metrics: F1 score, MAP@3
6. Baseline: random prediction
12
Experiment Setup
• Age similarity = zscore( |user_age − item_age| )
• Gender similarity = $\begin{cases} 1 & \text{if same gender} \\ -1 & \text{if different gender} \\ 0 & \text{if unknown gender} \end{cases}$
• Z-scored number of tweets from user
• Z-scored number of tweets from item
• Z-scored number of tags from user
• Z-scored number of tags from item
• Z-scored number of followers of item
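The scalar features above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not say whether the population or sample standard deviation was used (the population form is assumed here), and the gender code `0` for "unknown" is an assumed encoding. The helper names are hypothetical.

```python
import statistics


def zscore(values):
    # Standardize to zero mean and unit variance.
    # Assumption: population standard deviation.
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def age_gaps(user_ages, item_ages):
    # Absolute age difference per user-item pair, before z-scoring.
    return [abs(u - i) for u, i in zip(user_ages, item_ages)]


def gender_similarity(user_gender, item_gender, unknown=0):
    # 1 if same gender, -1 if different, 0 if either one is unknown.
    if user_gender == unknown or item_gender == unknown:
        return 0
    return 1 if user_gender == item_gender else -1
```

Age similarity for a batch of pairs would then be `zscore(age_gaps(user_ages, item_ages))`.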
13
Feature Extraction
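• Keyword similarity = $\dfrac{\text{user\_keyword} \cdot \text{item\_keyword}}{\lVert \text{user\_keyword} \rVert \, \lVert \text{item\_keyword} \rVert}$ (cosine similarity)
1. Remove keywords with low document frequency (DF), under 20%. (255,141 → 2,507)
2. Using PCA, reduce the dimension (2,507 → 1,191) by choosing the first k (k = 1, ..., N, where N is the total number of principal components) for which
$\text{error} = 1 - \dfrac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{N} \lambda_i} \le 0.05$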
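Both pieces of this slide, the cosine similarity and the rule for picking k, are small enough to sketch directly. The code below assumes that the λ values are the PCA eigenvalues (the variance explained by each component) and that keyword vectors are dense lists of equal length. The function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    # Dot product divided by the product of the Euclidean norms.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # no keywords on one side: treat as no similarity
    return dot / (norm_u * norm_v)


def choose_num_components(eigenvalues, max_error=0.05):
    # Smallest k whose leading eigenvalues leave at most max_error
    # of the total variance unexplained.
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, lam in enumerate(ordered, start=1):
        running += lam
        if 1.0 - running / total <= max_error:
            return k
    return len(ordered)
```

With `max_error=0.05` the retained components explain at least 95% of the variance, which matches the threshold on the slide.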
14
Feature Extraction
• Network similarity = $\dfrac{|\text{Following}(\text{user}) \cap \text{Following}(\text{item})|}{|\text{Following}(\text{user}) \cup \text{Following}(\text{item})|}$
Following(u): set of nodes that user u follows
15
Feature Extraction
[Figure: example with a user and an item; network similarity = 2/5 = 0.4]
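The network similarity is the Jaccard index of the two "following" sets. A minimal sketch, assuming each set is available as a Python collection of node IDs (the function name and the example IDs are hypothetical):

```python
def network_similarity(user_following, item_following):
    # Jaccard index: shared followees divided by all distinct followees.
    a, b = set(user_following), set(item_following)
    union = a | b
    if not union:
        return 0.0  # neither side follows anyone
    return len(a & b) / len(union)


# Two shared followees out of five distinct ones, as in the 2/5 example.
example = network_similarity({1, 2, 3, 4}, {3, 4, 5})
```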
• Homophily
• Similar people get together!
Age Similarity, Gender Similarity
16
Background of choosing features
• Friend recommendation in Facebook
17
Background of choosing features
Common Friends
Works!!!!!
18
Results
• All models outperformed the random predictor
• Network similarity showed the highest F1 score
• The model using all features showed the best performance
• The Top-5 model covers more accepted items than the model using all features
• Interestingly, predictions made with only two features,
age similarity and network similarity, gave results similar to the Top-5 model.
• Contribution
• Successfully trained on a large data set with a light classifier
• Found many features by analyzing metadata
• We saw the unseen
• Limitation
• Our models showed fairly good prediction results,
but they do not reach the level of the KDD Cup winners
• Possible solution: ensemble learning
• to build the best model from multiple weak classifiers (predictors)
19
Discussion
• Power of feature scaling
• Importance of learning rate
• Difficulty of handling Big Data
• Data reduction techniques are essential for handling
high-dimensional data.
20
What we learned
Q & A
21
[1] Phelan, Owen, Kevin McCarthy, and Barry Smyth. "Using Twitter to recommend real-time topical news." Proceedings of the Third ACM Conference on Recommender Systems. ACM, 2009.
[2] Littlestone, Nick. "Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm." Machine Learning 2.4 (1988): 285-318.
[3] Mairal, Julien, et al. "Online learning for matrix factorization and sparse coding." The Journal of Machine Learning Research 11 (2010): 19-60.
[4] Tang, Jie, et al. "Social influence analysis in large-scale networks." Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2009.
22
References
Appendix
23
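• F-score
$= 2 \times \dfrac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}}
= 2 \times \dfrac{\frac{tp}{tp+fp} \times \frac{tp}{tp+fn}}{\frac{tp}{tp+fp} + \frac{tp}{tp+fn}}
= 2 \times \dfrac{tp}{2 \times tp + fp + fn}$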
24
Baseline for F-score
total : 2617106
tp(true positive) : 30792
fp(false positive) : 1276492
tn(true negative) : 1279030
fn(false negative) : 30792
precision : 0.0235541779751
recall : 0.5
f-score : 0.0449889982087
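The baseline figures above can be re-derived from the confusion counts alone. The short check below uses the counts listed on this slide; the function name is hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    # Precision, recall and F1 from confusion-matrix counts.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return precision, recall, f1


# Counts reported for the random-prediction baseline.
precision, recall, f1 = precision_recall_f1(tp=30792, fp=1276492, fn=30792)
```

Note that tp + fp + tn + fn = 30792 + 1276492 + 1279030 + 30792 = 2617106, which matches the reported total.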
rec_log_test.txt (target_user & public)
http://en.wikipedia.org/wiki/F1_score
Random Prediction
• MAP@3 (Mean Average Precision)
• $\text{ap@}n = \sum_{k=1}^{n} P(k) \,/\, (\text{number of items clicked in } m \text{ items})$
• $\text{AP@}n = \sum_{i=1}^{N} \text{ap@}n_i \,/\, N$
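A small sketch of the metric may help. It assumes that P(k) is the precision at cut-off k when the k-th recommended item was clicked and 0 otherwise, and that a user with no clicked items scores 0. Neither detail is stated on the slide, and the function names are hypothetical.

```python
def average_precision_at_n(recommended, clicked, n=3):
    # recommended: ranked list of item IDs; clicked: items the user accepted.
    clicked = set(clicked)
    if not clicked:
        return 0.0  # assumption: zero denominator yields a score of 0
    hits = 0
    score = 0.0
    for k, item in enumerate(recommended[:n], start=1):
        if item in clicked:
            hits += 1
            score += hits / k  # precision at cut-off k
    return score / len(clicked)


def mean_average_precision(recommendations, clicks, n=3):
    # Average of ap@n over all N users (parallel lists, one entry per user).
    scores = [average_precision_at_n(r, c, n) for r, c in zip(recommendations, clicks)]
    return sum(scores) / len(scores) if scores else 0.0
```

For a user shown items a to e who clicked a, c and d, ap@3 is (1/1 + 2/3) / 3, because only the first three positions are scored while all three clicks count in the denominator.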
25
Baseline for MAP@3
rec_log_test.txt(target_user&public)
https://www.kddcup2012.org/c/kddcup2012-track1/details/Evaluation
(UserId)\t(ItemId)\t(Result)\t(Unix-timestamp)
(UserId)\t(ItemId)\t(ItemId)\t(ItemId)
MAP@3 0.034106932193
Random Prediction

 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
20240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 202420240607 QFM018 Elixir Reading List May 2024
20240607 QFM018 Elixir Reading List May 2024
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?Elizabeth Buie - Older adults: Are we really designing for our future selves?
Elizabeth Buie - Older adults: Are we really designing for our future selves?
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfObservability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf
 
Free Complete Python - A step towards Data Science
Free Complete Python - A step towards Data ScienceFree Complete Python - A step towards Data Science
Free Complete Python - A step towards Data Science
 
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 previewState of ICS and IoT Cyber Threat Landscape Report 2024 preview
State of ICS and IoT Cyber Threat Landscape Report 2024 preview
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 

[CS570] Machine Learning Team Project (I know what items really are)

  • 1. I know what items really are. 2013. 06. 13 (Thu). Team 11: Junghyun Kwon, Kunwoo Park, Jongin Lee, Seungkyu Nam
  • 2. Contents: Problem; Challenges; Related works; Motivation; Approaches; Experiment setup; Feature extraction; Result; Discussion
  • 3. Problem: The purpose of Track 1 of the 2012 KDD Cup is to predict which users (or items) a Weibo user might follow. A recommendation system [1] saves users valuable time sifting through less relevant stories and increases customer satisfaction. (Twitter.com)
  • 4. Problem: 90% of the world's data was generated in the last three years: 1.0 × 10^16 bytes every day, from sensors, mobile devices, SNS, and online transactions; 10 billion tweets every day; 30 billion FB messages every day (*); … Source: http://goo.gl/9xXaG. *: BLOTER.NET 12.01.26
  • 5. Challenges: Problem: (1) Too much data to find the informative features: 80 million training records and large user and item metadata, with few accepted results compared to many rejected results. (2) Data processing takes too much time: SVM on all data takes 16 days. (3) Lack of computing resources. Our goal: train on as much of the large and complex Weibo data as possible on a single machine, and find effective features with a simpler (and faster) approach.
  • 6. Related works: Online learning [2],[3] learns one instance at a time (e.g., product search). Pro: minimizes some performance criterion. Con: much incorrect label feedback. Map-Reduce [4] is a parallel, distributed model for processing large data. Pro: good for large input, intermediate, and output data. Con: bad for data that requires synchronization.
  • 7. Motivation: Which item is a favorite for which user? User metadata: keywords, year of birth, gender, number of tweets, tag IDs. Item metadata: category, keywords.
  • 8. Motivation: The user IDs in User_profile.txt include the item IDs in item.txt, so every item also has a user profile.
  • 9. Motivation: Item metadata (category, keywords) plus user-profile metadata (keywords, year of birth, gender, number of tweets, tag IDs) on both the user side and the item side yield Feature 1 through Feature 5: our training data!
  • 10. Initial Approach: Extract features between users and items using the metadata of users and items, then train a model with a Support Vector Machine (LIBSVM in R). Failure! Too much computation time (16 days to train the SVM) and a lack of computational resources (a single machine).
  • 11. Alternative Approach: Apply logistic regression using stochastic gradient descent. [Formula images not recovered: logistic regression; gradient descent update; stochastic gradient descent update]
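The alternative approach on slide 11 can be illustrated with a minimal sketch of logistic regression trained by stochastic gradient descent. This is not the team's code: the function names, learning rate, epoch count, and toy data are all assumptions, and the update shown is the standard log-loss gradient step applied to one instance at a time.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_sgd(rows, labels, lr=0.1, epochs=20, seed=0):
    """Fit weights w and bias b, updating after every single instance.

    rows: list of feature lists; labels: list of 0/1 values.
    lr, epochs, and seed are illustrative defaults, not values from the slides.
    """
    rng = random.Random(seed)
    n_features = len(rows[0])
    w = [0.0] * n_features
    b = 0.0
    order = list(range(len(rows)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit instances in random order each epoch
        for i in order:
            x, y = rows[i], labels[i]
            p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))
            err = p - y  # gradient of the log loss w.r.t. the linear score
            for j in range(n_features):
                w[j] -= lr * err * x[j]
            b -= lr * err
    return w, b


def predict_proba(w, b, x):
    # Probability that the user accepts the item, under the fitted model.
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


if __name__ == "__main__":
    # Toy, hypothetical data: one feature, positive values are accepted.
    rows = [[-2.0], [-1.0], [1.0], [2.0]]
    labels = [0, 0, 1, 1]
    w, b = train_logistic_sgd(rows, labels, lr=0.5, epochs=50)
    print(predict_proba(w, b, [2.0]), predict_proba(w, b, [-2.0]))
```

Because each update touches one instance, memory use stays constant in the number of training pairs, which is what makes this kind of model feasible on a single machine.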
  • 12. Experiment Setup: 1. Training data: 73,209,277 user-item pairs; after applying target ID, 38,332,489 user-item pairs. 2. Test data: public, 2,617,106 user-item pairs. 3. Features used: user's number of tweets, user's number of tags, item's number of tweets, item's number of tags, number of item's followers, age similarity, gender similarity, network similarity, keyword similarity. 4. Construct separate models using each feature. 5. Evaluation metrics: F1 score, MAP@3. 6. Baseline: random prediction.
  • 13. Feature Extraction: Age similarity $= \operatorname{zscore}(|\text{user\_age} - \text{item\_age}|)$. Gender similarity $= \begin{cases} 1 & \text{if same gender} \\ -1 & \text{if different gender} \\ 0 & \text{if unknown gender} \end{cases}$. The count features are z-scored: number of tweets from user, number of tweets from item, number of tags from user, number of tags from item, number of followers of item.
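A small sketch of the scalar features on slide 13 follows. It assumes the z-score uses the population standard deviation and that an unknown gender is encoded as 0; neither detail is stated on the slide, and the function names are hypothetical.

```python
import math


def zscore(values):
    """Standardize a list of numbers to zero mean and unit variance.

    Assumption: population standard deviation (divide by n).
    """
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if std == 0:
        return [0.0] * n  # constant column carries no information
    return [(v - mean) / std for v in values]


def age_similarity(age_pairs):
    # z-score of the absolute age gap over all (user_age, item_age) pairs.
    return zscore([abs(user_age - item_age) for user_age, item_age in age_pairs])


def gender_similarity(user_gender, item_gender, unknown=0):
    # 1 if same gender, -1 if different, 0 if either side is unknown.
    if user_gender == unknown or item_gender == unknown:
        return 0
    return 1 if user_gender == item_gender else -1


if __name__ == "__main__":
    print(age_similarity([(1980, 1985), (1990, 1990), (1975, 1995)]))
    print(gender_similarity(1, 2))
```

The same `zscore` helper would apply unchanged to the tweet, tag, and follower counts.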
  • 14. Feature Extraction: Keyword similarity $= \frac{\text{user\_keyword} \cdot \text{item\_keyword}}{\lVert \text{user\_keyword} \rVert \, \lVert \text{item\_keyword} \rVert}$ (cosine similarity). 1. Remove keywords with low document frequency (DF), under 20% (255,141 → 2,507). 2. Using PCA, reduce the dimension (2,507 → 1,191) by choosing the smallest $k \in \{1, \dots, N\}$ ($N$ = total number of principal components) such that $\text{error} = 1 - \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{N} \lambda_i} \le 0.05$.
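The two computations on slide 14 can be sketched as follows, assuming keyword vectors are stored as sparse dictionaries from keyword ID to weight and that the PCA eigenvalues have already been computed elsewhere. The function names and toy values are illustrative only.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two sparse vectors given as {keyword: weight}."""
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm_u = math.sqrt(sum(weight * weight for weight in u.values()))
    norm_v = math.sqrt(sum(weight * weight for weight in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty keyword profile has no defined direction
    return dot / (norm_u * norm_v)


def choose_num_components(eigenvalues, max_error=0.05):
    """Smallest k whose discarded variance fraction is at most max_error.

    error(k) = 1 - (sum of the k largest eigenvalues) / (sum of all).
    """
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    kept = 0.0
    for k, lam in enumerate(ordered, start=1):
        kept += lam
        if 1.0 - kept / total <= max_error:
            return k
    return len(ordered)


if __name__ == "__main__":
    user = {"music": 0.5, "travel": 0.2}
    item = {"music": 0.9, "sports": 0.4}
    print(cosine_similarity(user, item))
    print(choose_num_components([6.0, 3.0, 0.9, 0.1]))
```

With the toy eigenvalues above, the first two components leave 10% of the variance unexplained, so a third is needed to get under the 5% threshold.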
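  • 15. Feature Extraction: Network similarity $= \frac{|\text{Following}(\text{user}) \cap \text{Following}(\text{item})|}{|\text{Following}(\text{user}) \cup \text{Following}(\text{item})|}$, where $\text{Following}(u)$ is the set of nodes that user $u$ follows. Example from the slide's user/item diagram: $2/5 = 0.4$.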
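The network similarity on slide 15 is the Jaccard index of the two following sets. A minimal sketch, with hypothetical node IDs chosen to reproduce the 2/5 example:

```python
def network_similarity(following_user, following_item):
    """Jaccard index of the sets of nodes followed by the user and by the item."""
    a, b = set(following_user), set(following_item)
    union = a | b
    if not union:
        return 0.0  # neither side follows anyone
    return len(a & b) / len(union)


if __name__ == "__main__":
    # Two shared nodes out of five distinct nodes overall.
    print(network_similarity({1, 2, 3, 4}, {3, 4, 5}))  # 0.4
```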
  • 16. Background of choosing features: Homophily. Similar people get together! This motivates age similarity and gender similarity.
  • 17. Background of choosing features: Friend recommendation in Facebook. Common friends work!
  • 18. Results: All models outperformed the random predictor. Network similarity showed the highest F1 score. The model using all features showed the best performance. The Top-5 model covers more accepted items than the model using all features. Interestingly, predictions made with only two features, age similarity and network similarity, gave results similar to the Top-5 model.
  • 19. Discussion: Contribution: successfully trained on a large data set with a light classifier; found many features by analyzing metadata; we saw the unseen. Limitation: our models showed fairly good prediction results, but they are not comparable to the level of the KDD Cup winners. Possible solution: ensemble learning, to build the best model from multiple weak classifiers (predictors).
  • 20. What we learned: The power of feature scaling. The importance of the learning rate. The difficulty of handling Big Data: a data reduction technique is essential for handling high-dimensional data.
  • 22. References: [1] Phelan, Owen, Kevin McCarthy, and Barry Smyth. "Using twitter to recommend real-time topical news." Proceedings of the third ACM conference on Recommender systems. ACM, 2009. [2] Littlestone, Nick. "Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm." Machine learning 2.4 (1988): 285-318. [3] Mairal, Julien, et al. "Online learning for matrix factorization and sparse coding." The Journal of Machine Learning Research 11 (2010): 19-60. [4] Tang, Jie, et al. "Social influence analysis in large-scale networks." Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, 2009.
  • 24. Baseline for F-score (random prediction): F-score $= \frac{2 \times \text{precision} \times \text{recall}}{\text{precision} + \text{recall}} = \frac{2 \times \frac{tp}{tp+fp} \times \frac{tp}{tp+fn}}{\frac{tp}{tp+fp} + \frac{tp}{tp+fn}} = \frac{2 \times tp}{2 \times tp + fp + fn}$ (http://en.wikipedia.org/wiki/F1_score). On rec_test_txt (target_user & public): total: 2617106; tp (true positive): 30792; fp (false positive): 1276492; tn (true negative): 1279030; fn (false negative): 30792; precision: 0.0235541779751; recall: 0.5; f-score: 0.0449889982087.
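The baseline numbers on slide 24 can be re-derived directly from the confusion counts. The sketch below uses the counts reported on the slide; the function names are illustrative.

```python
def precision(tp, fp):
    # Fraction of predicted positives that are true positives.
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    # Fraction of actual positives that were predicted positive.
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_score(tp, fp, fn):
    # Closed form of the harmonic mean of precision and recall.
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Confusion counts of the random predictor, as reported on the slide.
    tp, fp, tn, fn = 30792, 1276492, 1279030, 30792
    assert tp + fp + tn + fn == 2617106  # matches the reported total
    print(precision(tp, fp), recall(tp, fn), f_score(tp, fp, fn))
```

The very low precision reflects the class imbalance noted under Challenges: a coin-flip predictor recovers half of the accepted pairs but labels far more rejected pairs as accepted.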
  • 25. Baseline for MAP@3 (random prediction): MAP@3 (Mean Average Precision): $ap@n = \sum_{k=1}^{n} P(k) / (\text{number of items clicked in } m \text{ items})$; $AP@n = \sum_{i=1}^{N} ap@n_i / N$ (https://www.kddcup2012.org/c/kddcup2012-track1/details/Evaluation). On rec_log_test.txt (target_user & public). Line formats: (UserId)\t(ItemId)\t(Result)\t(Unix-timestamp) and (UserId)\t(ItemId)\t(ItemId)\t(ItemId). MAP@3: 0.034106932193.
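The MAP@3 formulas on slide 25 can be sketched as below. It assumes, as is usual for average precision, that P(k) is counted only at positions k where the recommended item was accepted, and that a user with no accepted items scores 0; the slide does not spell either out. The data structures and names are hypothetical.

```python
def ap_at_n(recommended, accepted, n=3):
    """Average precision at n for one user.

    recommended: ordered list of the m items shown to the user.
    accepted: collection of items the user actually accepted (clicked).
    """
    accepted = set(accepted)
    clicked_total = sum(1 for item in recommended if item in accepted)
    if clicked_total == 0:
        return 0.0  # assumption: zero denominator gives a score of 0
    hits = 0
    score = 0.0
    for k, item in enumerate(recommended[:n], start=1):
        if item in accepted:
            hits += 1
            score += hits / k  # precision at cut-off k
    return score / clicked_total


def map_at_n(per_user, n=3):
    """Mean of ap@n over users; per_user is a list of (recommended, accepted)."""
    if not per_user:
        return 0.0
    return sum(ap_at_n(rec, acc, n) for rec, acc in per_user) / len(per_user)


if __name__ == "__main__":
    # Five recommended items, of which the 1st, 3rd, and 4th were accepted.
    print(ap_at_n(["a", "b", "c", "d", "e"], {"a", "c", "d"}))
```

In the usage example only the first three positions count toward the sum, so the score is (1/1 + 2/3) / 3.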