Profiler for Smartphone Users Interests Using Modified Hierarchical Agglomerative Clustering Algorithm Based on Browsing History
1. Profiler for Smartphone Users Interests Using
Modified Hierarchical Agglomerative Clustering
Algorithm Based on Browsing History
Priagung Khusumanegara, Rischan Mafrur, and Deokjai Choi
School of Electronics & Computer Engineering
Chonnam National University
{priagung.123, rischanlab}@gmail.com, dchoi@jnu.ac.kr
The 23rd IFIP World Computer Congress (WCC 2015)
Tuesday, October 6
2. Outline
• Introduction
• Proposed System
• Data Collection
• Data Extraction
• URL Filtering Categories
• Modified Hierarchical Agglomerative Clustering
• Experimental Results
3. Introduction
Motivations:
• Smartphone provides many applications to support our activity which one of
the applications is web browser applications.
• People spend much time on browsing activity for finding useful information
that they are interested on it.
• It is not easy to find the particular pieces of information that they interested
on it.
Proposed System:
• We proposed a Modified Hierarchical Agglomerative Clustering
Algorithms on a server-based application to provides smartphone users
profiling for interests-focused based on browsing history.
5. Data Collection
• Data Collection Application
– It is built using the Funf framework. Funf framework is an
extensible sensing and data processing framework for
mobile devices.
• Number of participants
– 30 Chonnam National University Students (aged 19-22)
• Period of data collection
– 1 month (July 2014)
Figure: Data Collection Application
6. Data Extraction
• Example of Collected Data
• We extract collected data to observe a resource name
part of URL structure which is useful information to
analyze user interest.
• Example of data extraction
7. URL Filtering Categories
• URL Labeling
The URL data is labeled based on a popular Web directory.
Open Directory Project (ODP) (dmoz.org).
• URL Filtering Categories
• The matrix of filtering categories
Category Group Category Type
Business Business/Economy, Job Search/Careers, Real Estate, and Shopping
Communications and search Blog/Web Communication, Social Networks, Email, and Search Engines
General Computer/Internet, Education, News/Media, and Reference
Lifestyle
Entertainment, Games, Arts, Humor, Religion, Restaurants/Food, and Tra
vel
𝐶4𝑥𝑛 =
𝑘𝑒𝑦1,1 𝑘𝑒𝑦1,2 … 𝑘𝑒𝑦1,𝑛
𝑘𝑒𝑦2,1 𝑘𝑒𝑦2,2 … 𝑘𝑒𝑦2,𝑛
𝑘𝑒𝑦3,1
𝑘𝑒𝑦4,1
𝑘𝑒𝑦3,2
𝑘𝑒𝑦4,2
…
…
𝑘𝑒𝑦3,𝑛
𝑘𝑒𝑦4,𝑛
10. Experimental Results
(Cont’d)
• C 4.5 is the best known classification
algorithm used to generate decision trees
for continuous and discrete attributes.
• Attributes:
– Total session time
– Total time a user stays at the site
– Total number of accessed pages during the whole
session
13. Conclusion and Futures Work
• We proposed a modified Hierarchical Agglomerative Clustering that can
automatically provides a user interests profile.
• The proposed algorithms can measure degree of users’ interests and
inferring particular pieces of information that they interested on it based on
browsing history
• Our work can outperforms the C4.5 algorithm in execution time and
accuracy on all memory utilization.
• In the future, we need to implement Map-Reduce algorithm on modified
Hierarchical Agglomerative Clustering to enhance performance of clustering
algorithm.
14. Acknowledgements
This research was supported by Basic Science
Research Program through the National Research
Foundation of Korea (NRF) funded by the Ministry
of Education (2012R1A1A2007014).