Time Based Cluster Analysis for Automatic Blog Generation

Presented at the Social Web Search and Mining Workshop, WWW2008 in Beijing

Published in: Technology
  • Time Based Cluster Analysis for Automatic Blog Generation

    1. Time Based Context Cluster Analysis for Automatic Blog Generation. Luca Costabello and Laurent-Walter Goix, Telecom Italia, Italy
    2. Context as Blog Content
         • User context is gaining importance:
             • Location info
             • Nearby buddies
             • The surrounding environment in general
         • We mine context data to detect daily user actions
         • User actions are converted into natural text
         • Blog posts describing users' days enable the detection of communities of users with similar behavioral patterns
    3. Context-Based Blog Generation
         1) Raw data gathering
         2) Offline cluster analysis
         3) Blog post generation
         Diagram label: Daily actions
    4. System Architecture
    5. Cluster Analysis: Detecting User Actions
         Context history (Timestamp | Cell ID | Cell ID Virtual Place):
         2007-10-03 11:02:33 | 222-1-61101-72162201 | office,tilab
         2007-10-03 10:59:09 | 222-1-61101-72162201 | office,tilab
         2007-10-03 10:55:46 | 222-1-61101-72162201 | office,tilab
         2007-10-03 10:52:41 | 222-1-61101-64530928 | n/a,n/a
         2007-10-03 10:48:59 | 222-1-61101-72162201 | office,tilab
         2007-10-03 10:45:34 | 222-1-61101-72162201 | office,tilab
         2007-10-03 10:42:11 | 222-1-61101-64530928 | n/a,n/a
         2007-10-03 10:38:47 | 222-1-61101-72162201 | office,tilab
         2007-10-03 10:37:47 | 222-1-61101-72162201 | office,tilab
         2007-10-03 09:27:01 | 222-1-61101-72157899 | office,tilab
         2007-10-03 08:58:11 | 222-1-61104-72386176 | n/a,n/a
         2007-10-03 08:56:28 | 222-1-24650-121 | n/a,n/a
         2007-10-03 08:56:05 | 222-1-24650-122 | n/a,n/a
         2007-10-03 08:54:20 | 222-1-54650-923 | n/a,n/a
         2007-10-03 08:51:31 | 222-1-61104-72395762 | n/a,n/a
         2007-10-03 08:49:16 | 222-1-61104-72384437 | n/a,n/a
         2007-10-03 08:48:47 | 222-1-61104-72395762 | n/a,n/a
         2007-10-03 08:48:18 | 222-1-61104-72384437 | n/a,n/a
         2007-10-03 08:47:50 | 222-1-61104-72395762 | n/a,n/a
         2007-10-03 08:47:21 | 222-1-61104-72395762 | n/a,n/a
         2007-10-03 08:46:51 | 222-1-61104-72384437 | n/a,n/a
         2007-10-03 08:46:20 | 222-1-61104-72376116 | n/a,n/a
         2007-10-03 08:45:15 | 222-1-61104-72395763 | n/a,n/a
         2007-10-03 08:44:02 | 222-1-61104-72400263 | n/a,n/a
         2007-10-03 08:42:33 | 222-1-61104-72395770 | n/a,n/a
         2007-10-03 08:42:02 | 222-1-61104-72400262 | n/a,n/a
         2007-10-03 08:40:08 | 222-1-24650-1281 | residence,home
         2007-10-03 08:36:26 | 222-1-24650-1281 | residence,home
         2007-10-03 08:33:02 | 222-1-24650-1281 | residence,home
         Detected clusters:
         • Cluster 1 (Static): Start 08:58; End 11:02; CGI 222-1-61101-162201; VP CGI Office, TILab; VP Bth Not available
         • Cluster 2 (Movement): Start 08:42; End 08:56; CGI From 222-1-24550-1281; CGI To 222-1-24650-121; VP CGI From Residence,home; VP CGI To Office, TILab; VP Bth Not available
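Each context history row on this slide carries a timestamp, a cell ID and a virtual place label. A minimal Python sketch of how such a row might be parsed is given here; the `ContextRecord` type and the `parse_record` function are hypothetical names introduced for illustration, and treating `n/a,n/a` as a missing label is an assumption, not something the slides specify.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class ContextRecord:
    """One row of the context history (hypothetical structure)."""
    timestamp: datetime
    cell_id: str
    label: Optional[str]  # None when the row shows "n/a,n/a" (assumption)


def parse_record(line: str) -> ContextRecord:
    """Parse a whitespace-separated row: date, time, cell ID, virtual place."""
    date_part, time_part, cell_id, place = line.split()
    timestamp = datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S")
    label = None if place == "n/a,n/a" else place
    return ContextRecord(timestamp, cell_id, label)


if __name__ == "__main__":
    print(parse_record("2007-10-03 11:02:33 222-1-61101-72162201 office,tilab"))
```

Keeping the unlabeled case as `None` rather than a string makes it easy for later steps to check whether a user-defined label is available.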
    6. Clustering Algorithms Dimensions
         • Location
             • GSM/UMTS Cell IDs
             • User-defined Cell ID labels
         • Time
             • Chronological order of actions must be respected
         • Categorical attributes: Euclidean distance not available
         • Time must be evaluated according to "temporal distance"
         • Ad-hoc algorithms had to be designed
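The slide does not define "temporal distance", so the following Python sketch only illustrates the general idea under stated assumptions: cell IDs are compared for equality (a 0/1 mismatch, since they are categorical), and time is compared as an absolute gap in seconds. The function names and the `max_gap_s` threshold are hypothetical.

```python
from datetime import datetime


def location_mismatch(cell_a: str, cell_b: str) -> int:
    """Categorical comparison: 0 if the cell IDs match, 1 otherwise."""
    return 0 if cell_a == cell_b else 1


def temporal_distance(t_a: datetime, t_b: datetime) -> float:
    """Absolute gap between two timestamps, in seconds (assumed definition)."""
    return abs((t_b - t_a).total_seconds())


def belong_together(rec_a, rec_b, max_gap_s: float = 300.0) -> bool:
    """True if two (timestamp, cell_id) records share a cell and are close in time.

    max_gap_s is an illustrative threshold, not a value from the slides.
    """
    (t_a, cell_a), (t_b, cell_b) = rec_a, rec_b
    return (location_mismatch(cell_a, cell_b) == 0
            and temporal_distance(t_a, t_b) <= max_gap_s)


if __name__ == "__main__":
    a = (datetime(2007, 10, 3, 10, 55, 46), "222-1-61101-72162201")
    b = (datetime(2007, 10, 3, 10, 59, 9), "222-1-61101-72162201")
    print(temporal_distance(a[0], b[0]), belong_together(a, b))
```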
    7. Cell-Based Location Data Issues
         • Context updates occur with variable frequency
         • Detecting static situations vs. detecting movement
         • Base station concentration affects context data patterns
         • Frequent cell handovers during static actions
    8. Compare&Merge Algorithm
         Context history:
         2007-10-03 11:02:33 | 222-1-61101-72162201 | office,tilab
         2007-10-03 10:59:09 | 222-1-61101-72162201 | office,tilab
         2007-10-03 10:55:46 | 222-1-61101-72162201 | office,tilab
         2007-10-03 10:52:41 | 222-1-61101-64530928 | n/a,n/a
         2007-10-03 10:48:59 | 222-1-61101-72162201 | office,tilab
         2007-10-03 10:45:34 | 222-1-61101-72162201 | office,tilab
         2007-10-03 10:42:11 | 222-1-61101-64530928 | n/a,n/a
         2007-10-03 10:38:47 | 222-1-61101-72162201 | office,tilab
         2007-10-03 10:37:47 | 222-1-61101-72162201 | office,tilab
         2007-10-03 09:27:01 | 222-1-61101-72157899 | office,tilab
         2007-10-03 08:58:11 | 222-1-61104-72386176 | n/a,n/a
         2007-10-03 08:56:28 | 222-1-24650-121 | n/a,n/a
         2007-10-03 08:56:05 | 222-1-24650-122 | n/a,n/a
         2007-10-03 08:54:20 | 222-1-54650-923 | n/a,n/a
         2007-10-03 08:51:31 | 222-1-61104-72395762 | n/a,n/a
         2007-10-03 08:49:16 | 222-1-61104-72384437 | n/a,n/a
         2007-10-03 08:48:47 | 222-1-61104-72395762 | n/a,n/a
         2007-10-03 08:48:18 | 222-1-61104-72384437 | n/a,n/a
         Diagram labels: Context History; Preliminary Context Scan; Long Temporary Cluster; Short Temporary Clusters; Temporary Clusters Merge; Static Cluster; Movement Cluster; Static Cluster
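The slide only names the stages of Compare&Merge, so the Python sketch below is one possible reading and not the authors' implementation. It assumes that the preliminary scan groups consecutive records with the same cell ID into temporary clusters, that a temporary cluster is "long" when it lasts at least `long_threshold_s` seconds (a hypothetical parameter), that long clusters become static clusters, and that adjacent short clusters are merged into one movement cluster.

```python
from datetime import datetime
from itertools import groupby


def preliminary_scan(records):
    """Group consecutive (timestamp, cell_id) records with the same cell ID."""
    temporary = []
    for cell, group in groupby(records, key=lambda r: r[1]):
        times = [t for t, _ in group]
        temporary.append({"cell": cell, "start": times[0], "end": times[-1]})
    return temporary


def compare_and_merge(records, long_threshold_s=600):
    """Turn chronological records into static and movement clusters (sketch)."""
    merged = []
    for tmp in preliminary_scan(records):
        duration = (tmp["end"] - tmp["start"]).total_seconds()
        # Assumption: long temporary clusters are static, short ones are movement.
        kind = "static" if duration >= long_threshold_s else "movement"
        if merged and kind == "movement" and merged[-1]["kind"] == "movement":
            # Merge adjacent short temporary clusters into one movement cluster.
            merged[-1]["cells"].append(tmp["cell"])
            merged[-1]["end"] = tmp["end"]
        else:
            merged.append({"kind": kind, "cells": [tmp["cell"]],
                           "start": tmp["start"], "end": tmp["end"]})
    return merged


if __name__ == "__main__":
    day = lambda h, m: datetime(2007, 10, 3, h, m)
    history = [(day(8, 0), "home"), (day(8, 10), "home"), (day(8, 20), "home"),
               (day(8, 25), "a"), (day(8, 27), "b"), (day(8, 30), "c"),
               (day(8, 40), "office"), (day(8, 55), "office"), (day(9, 10), "office")]
    for cluster in compare_and_merge(history):
        print(cluster["kind"], cluster["cells"])
```

In this naive form, a single handover to a neighbouring cell splits a static stay into several temporary clusters, so a real implementation would need extra merging logic for that case.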
    9. MultiLevel Sliding Window Algorithm
         For each window iteration:
         • Check if any user-defined label is available
         • Detect user movement
         • Detect the most frequent position
         • Merge window data with the previous window iteration (if the detected position is the same)
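The per-window steps listed on this slide can be sketched in Python as follows. This is a single-level illustration under stated assumptions: the slides do not describe what makes the algorithm multi-level or how movement is detected, so the fixed `window` length, the `movement_ratio` rule (a window counts as movement when no position covers more than that share of its samples) and all names are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def sliding_window_clusters(records, window=timedelta(minutes=30), movement_ratio=0.5):
    """records: chronological list of (timestamp, cell_id, label_or_None)."""
    clusters = []
    if not records:
        return clusters
    window_start = records[0][0]
    i = 0
    while i < len(records):
        window_end = window_start + window
        chunk = []
        while i < len(records) and records[i][0] < window_end:
            chunk.append(records[i])
            i += 1
        window_start = window_end
        if not chunk:
            continue  # no context updates fell inside this window
        # Prefer the user-defined label when one is available, else the cell ID.
        positions = [label if label is not None else cell for _, cell, label in chunk]
        # Most frequent position in the window.
        position, count = Counter(positions).most_common(1)[0]
        if count / len(chunk) <= movement_ratio:
            kind, position = "movement", None  # assumed movement rule
        else:
            kind = "static"
        previous = clusters[-1] if clusters else None
        if previous and previous["kind"] == kind and previous["position"] == position:
            previous["end"] = chunk[-1][0]  # same position: merge with previous window
        else:
            clusters.append({"kind": kind, "position": position,
                             "start": chunk[0][0], "end": chunk[-1][0]})
    return clusters
```

Because each window is reduced to its most frequent position, an occasional handover inside a window does not split a static stay, at the cost of start and end times that are only accurate to within one window length.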
    10. Algorithms Comparison
         • Compare&Merge
             • Precision: very high in optimal situations (less than 2-5 minutes)
             • Optimal usage: good user labeling; cells with low handover issues
             • Critical situations: frequent cell handovers
         • MultiLevel Sliding Window
             • Precision: lower than C&M (a 30-minute window leads to an error of less than 30 minutes)
             • Optimal usage: non-labeled areas; frequent cell handovers
             • Critical situations: none
    11. Cluster Analysis Accuracy VS User Perception
    12. From Clusters To Blog Post
         Diagram labels: NLG; Natural Text Generation; Action Detector; Context Clusters; User Preferences
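The slides do not show how clusters are turned into text, so the following Python sketch only illustrates a simple template-based approach. The sentence templates, the dictionary keys and the function names are assumptions; the sample values come from the two cluster summaries on slide 5.

```python
def describe_cluster(cluster):
    """Render one cluster as a sentence using fixed templates (hypothetical)."""
    if cluster["kind"] == "static":
        return (f"From {cluster['start']} to {cluster['end']} "
                f"I was at {cluster['place']}.")
    return (f"From {cluster['start']} to {cluster['end']} "
            f"I moved from {cluster['origin']} to {cluster['destination']}.")


def generate_post(clusters):
    """Order clusters by start time (HH:MM strings) and join their sentences."""
    ordered = sorted(clusters, key=lambda c: c["start"])
    return " ".join(describe_cluster(c) for c in ordered)


if __name__ == "__main__":
    clusters = [
        {"kind": "static", "start": "08:58", "end": "11:02", "place": "Office, TILab"},
        {"kind": "movement", "start": "08:42", "end": "08:56",
         "origin": "Residence,home", "destination": "Office, TILab"},
    ]
    print(generate_post(clusters))
```

User preferences, which the diagram lists as an input, could plausibly select among alternative templates, but that is not modeled here.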
    13. Results
         • Mining context history leads to user pattern discovery
         • Daily actions sharing
         • Detection of user communities, according to daily behaviors
         • Clustering accuracy vs. personal memories perception
         • Movement detection
         • Location-labeling importance
    14. Any Questions? Thank You! Email: luca.costabello@guest.telecomitalia.it, [email_address]
