
Time Based Cluster Analysis for Automatic Blog Generation


Presented at the Social Web Search and Mining Workshop, WWW2008 in Beijing

Published in: Technology

  1. Time Based Context Cluster Analysis for Automatic Blog Generation. Luca Costabello and Laurent-Walter Goix, Telecom Italia, Italy
  2. Context as Blog Content
     • User context is gaining importance
         • Location info
         • Nearby buddies
         • The surrounding environment in general
     • We mine context data to detect daily user actions
     • User actions are converted into natural text
     • Blog posts describing users' days enable the detection of communities of users with similar behavioral patterns
  3. Context-Based Blog Generation
     1) Raw data gathering
     2) Offline cluster analysis
     3) Blog post generation
     Diagram label: Daily actions
  4. System Architecture
  5. Cluster Analysis: Detecting User Actions

     Timestamp            Cell ID               Cell ID Virtual Place
     2007-10-03 11:02:33  222-1-61101-72162201  office,tilab
     2007-10-03 10:59:09  222-1-61101-72162201  office,tilab
     2007-10-03 10:55:46  222-1-61101-72162201  office,tilab
     2007-10-03 10:52:41  222-1-61101-64530928  n/a,n/a
     2007-10-03 10:48:59  222-1-61101-72162201  office,tilab
     2007-10-03 10:45:34  222-1-61101-72162201  office,tilab
     2007-10-03 10:42:11  222-1-61101-64530928  n/a,n/a
     2007-10-03 10:38:47  222-1-61101-72162201  office,tilab
     2007-10-03 10:37:47  222-1-61101-72162201  office,tilab
     2007-10-03 09:27:01  222-1-61101-72157899  office,tilab
     2007-10-03 08:58:11  222-1-61104-72386176  n/a,n/a
     2007-10-03 08:56:28  222-1-24650-121       n/a,n/a
     2007-10-03 08:56:05  222-1-24650-122       n/a,n/a
     2007-10-03 08:54:20  222-1-54650-923       n/a,n/a
     2007-10-03 08:51:31  222-1-61104-72395762  n/a,n/a
     2007-10-03 08:49:16  222-1-61104-72384437  n/a,n/a
     2007-10-03 08:48:47  222-1-61104-72395762  n/a,n/a
     2007-10-03 08:48:18  222-1-61104-72384437  n/a,n/a
     2007-10-03 08:47:50  222-1-61104-72395762  n/a,n/a
     2007-10-03 08:47:21  222-1-61104-72395762  n/a,n/a
     2007-10-03 08:46:51  222-1-61104-72384437  n/a,n/a
     2007-10-03 08:46:20  222-1-61104-72376116  n/a,n/a
     2007-10-03 08:45:15  222-1-61104-72395763  n/a,n/a
     2007-10-03 08:44:02  222-1-61104-72400263  n/a,n/a
     2007-10-03 08:42:33  222-1-61104-72395770  n/a,n/a
     2007-10-03 08:42:02  222-1-61104-72400262  n/a,n/a
     2007-10-03 08:40:08  222-1-24650-1281      residence,home
     2007-10-03 08:36:26  222-1-24650-1281      residence,home
     2007-10-03 08:33:02  222-1-24650-1281      residence,home

     Cluster 1 (Static)
       Start:   08:58
       End:     11:02
       CGI:     222-1-61101-162201
       VP CGI:  Office, TILab
       VP Bth:  Not available

     Cluster 2 (Movement)
       Start:        08:42
       End:          08:56
       CGI From:     222-1-24550-1281
       CGI To:       222-1-24650-121
       VP CGI From:  Residence,home
       VP CGI To:    Office, TILab
       VP Bth:       Not available
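The context history on slide 5 has one record per line: a timestamp, a cell ID, and a user-defined virtual place. The following is a minimal sketch of how such lines could be parsed before clustering. The names `ContextRecord`, `parse_record`, and `is_labeled` are hypothetical and do not come from the slides, and the whitespace-separated line layout is assumed from the transcript.

```python
from dataclasses import dataclass
from datetime import datetime

UNLABELED = "n/a,n/a"  # marker the log uses for cells without a user label


@dataclass(frozen=True)
class ContextRecord:
    timestamp: datetime
    cell_id: str        # cell identifier exactly as it appears in the log
    virtual_place: str  # user-defined label, or UNLABELED


def parse_record(line: str) -> ContextRecord:
    """Parse one 'date time cell_id virtual_place' line (assumed layout)."""
    date, time, cell_id, place = line.split(maxsplit=3)
    timestamp = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
    return ContextRecord(timestamp, cell_id, place.strip())


def is_labeled(record: ContextRecord) -> bool:
    """True when the user has attached a virtual place to this cell."""
    return record.virtual_place != UNLABELED


record = parse_record("2007-10-03 10:52:41 222-1-61101-64530928 n/a,n/a")
print(record.timestamp.time(), record.cell_id, is_labeled(record))
```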
  6. Clustering Algorithms Dimensions
     • Location
         • GSM/UMTS Cell IDs
         • User-defined Cell ID Labels
     • Time
         • Chronological order of actions must be respected
     • Categorical attributes: Euclidean distance is not available
     • Time must be evaluated according to “temporal distance”
     • Ad-hoc algorithms had to be designed
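Because cell IDs and labels are categorical, two records can only be compared for equality, while time can be compared by elapsed duration. The sketch below illustrates that distinction. The function names and the rule "compare labels when both records are labeled, otherwise compare cell IDs" are assumptions made for illustration, not the authors' definitions.

```python
from datetime import datetime, timedelta

UNLABELED = "n/a,n/a"


def same_place(cell_a: str, label_a: str, cell_b: str, label_b: str) -> bool:
    """Categorical comparison: there is no numeric distance, only equality."""
    if label_a != UNLABELED and label_b != UNLABELED:
        return label_a == label_b  # two cells may share one user label
    return cell_a == cell_b


def temporal_distance(t_a: datetime, t_b: datetime) -> timedelta:
    """Elapsed time between two records, regardless of their order."""
    return abs(t_b - t_a)


# Two records from slide 5: different cells, same user-defined label.
t1 = datetime(2007, 10, 3, 9, 27, 1)
t2 = datetime(2007, 10, 3, 10, 37, 47)
print(same_place("222-1-61101-72157899", "office,tilab",
                 "222-1-61101-72162201", "office,tilab"))
print(temporal_distance(t1, t2))
```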
  7. Cell-Based Location Data Issues
     • Context updates occur with variable frequency
     • Detecting static situations vs. detecting movement
     • Base station concentration affects context data patterns
     • Frequent cell handovers during static actions
  8. Compare&Merge Algorithm

     2007-10-03 11:02:33  222-1-61101-72162201  office,tilab
     2007-10-03 10:59:09  222-1-61101-72162201  office,tilab
     2007-10-03 10:55:46  222-1-61101-72162201  office,tilab
     2007-10-03 10:52:41  222-1-61101-64530928  n/a,n/a
     2007-10-03 10:48:59  222-1-61101-72162201  office,tilab
     2007-10-03 10:45:34  222-1-61101-72162201  office,tilab
     2007-10-03 10:42:11  222-1-61101-64530928  n/a,n/a
     2007-10-03 10:38:47  222-1-61101-72162201  office,tilab
     2007-10-03 10:37:47  222-1-61101-72162201  office,tilab
     2007-10-03 09:27:01  222-1-61101-72157899  office,tilab
     2007-10-03 08:58:11  222-1-61104-72386176  n/a,n/a
     2007-10-03 08:56:28  222-1-24650-121       n/a,n/a
     2007-10-03 08:56:05  222-1-24650-122       n/a,n/a
     2007-10-03 08:54:20  222-1-54650-923       n/a,n/a
     2007-10-03 08:51:31  222-1-61104-72395762  n/a,n/a
     2007-10-03 08:49:16  222-1-61104-72384437  n/a,n/a
     2007-10-03 08:48:47  222-1-61104-72395762  n/a,n/a
     2007-10-03 08:48:18  222-1-61104-72384437  n/a,n/a

     Diagram labels: Context History; Preliminary Context Scan; Long Temporary Cluster; Short Temporary Clusters; Temporary Clusters Merge; Static Cluster; Movement Cluster; Static Cluster
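The slide names the stages of Compare&Merge (preliminary scan, long and short temporary clusters, merge) but does not give their rules. The following is one possible reading, offered only as a sketch: consecutive records at the same place form a temporary cluster, long temporary clusters become static clusters, and runs of short ones are merged into a movement cluster. The `min_static` threshold, the dictionary layout, and the sample data are all hypothetical.

```python
from datetime import datetime, timedelta
from itertools import groupby


def preliminary_scan(records):
    """Group consecutive records with the same place into temporary clusters."""
    temporary = []
    for place, group in groupby(records, key=lambda record: record[1]):
        times = [timestamp for timestamp, _ in group]
        temporary.append({"place": place, "start": times[0], "end": times[-1]})
    return temporary


def compare_and_merge(records, min_static=timedelta(minutes=10)):
    """records: chronologically sorted (timestamp, place) pairs.

    Assumed rule: a temporary cluster lasting at least min_static is static;
    adjacent shorter clusters are merged into a single movement cluster.
    """
    clusters = []
    for temp in preliminary_scan(records):
        is_long = temp["end"] - temp["start"] >= min_static
        if not is_long and clusters and clusters[-1]["kind"] == "movement":
            # Extend the movement already in progress.
            clusters[-1]["end"] = temp["end"]
            clusters[-1]["to"] = temp["place"]
        else:
            clusters.append({
                "kind": "static" if is_long else "movement",
                "start": temp["start"],
                "end": temp["end"],
                "from": temp["place"],
                "to": temp["place"],
            })
    return clusters


def at(hour, minute):
    return datetime(2007, 10, 3, hour, minute)


# Hypothetical history: home, a quick run of cells, then the office.
history = [
    (at(8, 0), "home"), (at(8, 5), "home"), (at(8, 15), "home"),
    (at(8, 20), "cell-a"), (at(8, 22), "cell-b"), (at(8, 25), "cell-c"),
    (at(8, 30), "office"), (at(8, 40), "office"), (at(8, 50), "office"),
]
for cluster in compare_and_merge(history):
    print(cluster["kind"], cluster["from"], cluster["to"])
```

Under this reading, a static period interrupted by frequent handovers is split into many short temporary clusters, which matches the critical situation the comparison slide lists for Compare&Merge.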
  9. MultiLevel Sliding Window Algorithm
     For each window iteration:
     • Check if any user-defined label is available
     • Detect user movement
     • Detect the most frequent position
     • Merge window data with the previous window iteration (if the detected position is the same)
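The per-window steps listed on slide 9 can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: windows advance by their full length without overlap, movement is assumed whenever a window contains more than `max_distinct_cells` distinct cell IDs, and all function names, thresholds, and sample records are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

UNLABELED = "n/a,n/a"


def is_moving(window, max_distinct_cells=3):
    """Assumed movement test: many distinct cells within a single window."""
    return len({cell for _, cell, _ in window}) > max_distinct_cells


def window_position(window):
    """Most frequent label when any label exists, else most frequent cell ID."""
    labels = [label for _, _, label in window if label != UNLABELED]
    values = labels if labels else [cell for _, cell, _ in window]
    return Counter(values).most_common(1)[0][0]


def sliding_window_clusters(records, window=timedelta(minutes=30),
                            max_distinct_cells=3):
    """records: chronologically sorted (timestamp, cell_id, label) tuples."""
    clusters = []
    if not records:
        return clusters
    window_start = records[0][0]
    i = 0
    while i < len(records):
        window_end = window_start + window
        chunk = []
        while i < len(records) and records[i][0] < window_end:
            chunk.append(records[i])
            i += 1
        if chunk:
            if is_moving(chunk, max_distinct_cells):
                position = "movement"
            else:
                position = window_position(chunk)
            if clusters and clusters[-1]["position"] == position:
                # Same position as the previous window: merge.
                clusters[-1]["end"] = chunk[-1][0]
            else:
                clusters.append({"position": position,
                                 "start": chunk[0][0], "end": chunk[-1][0]})
        window_start = window_end
    return clusters


def at(hour, minute):
    return datetime(2007, 10, 3, hour, minute)


# Hypothetical records: an office stay with one handover, then a trip.
sample = [
    (at(9, 0), "A", "office,tilab"), (at(9, 10), "A", "office,tilab"),
    (at(9, 20), "B", UNLABELED), (at(9, 35), "A", "office,tilab"),
    (at(9, 50), "A", "office,tilab"), (at(10, 0), "C", UNLABELED),
    (at(10, 5), "D", UNLABELED), (at(10, 10), "E", UNLABELED),
    (at(10, 15), "F", UNLABELED), (at(10, 20), "G", UNLABELED),
]
for cluster in sliding_window_clusters(sample):
    print(cluster["position"], cluster["start"].time(), cluster["end"].time())
```

Because boundaries can only fall on window edges in this sketch, the start and end of a detected action are accurate only to within one window length, which is consistent with the precision note on the comparison slide.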
  10. Algorithms Comparison
      MultiLevel Sliding Window
        • Precision: lower than C&M (a 30-minute window leads to an error of less than 30 minutes)
        • Optimal usage: non-labeled areas; frequent cell handovers
        • Critical situations: none
      Compare&Merge
        • Precision: very high in optimal situations (less than 2-5 minutes)
        • Optimal usage: good user labeling; cells with few handover issues
        • Critical situations: frequent cell handovers
  11. Cluster Analysis Accuracy VS User Perception
  12. From Clusters To Blog Post
      Diagram labels: NLG (Natural Text Generation); Action Detector; Context Clusters; User Preferences
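The slides do not describe how the NLG component turns clusters into text. As a rough illustration only, a template-based generator could map each cluster to one sentence. The sentence templates, the dictionary layout, and the function names below are hypothetical, user preferences are not modeled, and the two sample clusters reuse the values shown on slide 5.

```python
from datetime import time


def describe_cluster(cluster):
    """Render one cluster as a sentence using a fixed template."""
    start = cluster["start"].strftime("%H:%M")
    end = cluster["end"].strftime("%H:%M")
    if cluster["kind"] == "static":
        return f"From {start} to {end} I was at {cluster['place']}."
    return (f"From {start} to {end} I moved from "
            f"{cluster['from']} to {cluster['to']}.")


def generate_post(clusters):
    """Join the sentences in chronological order."""
    ordered = sorted(clusters, key=lambda cluster: cluster["start"])
    return " ".join(describe_cluster(cluster) for cluster in ordered)


clusters = [
    {"kind": "static", "start": time(8, 58), "end": time(11, 2),
     "place": "Office, TILab"},
    {"kind": "movement", "start": time(8, 42), "end": time(8, 56),
     "from": "Residence,home", "to": "Office, TILab"},
]
print(generate_post(clusters))
```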
  13. Results
      • Mining context history leads to user pattern discovery
      • Daily actions sharing
      • Detection of user communities, according to daily behaviors
      • Clustering accuracy vs. personal memories perception
      • Movement detection
      • Location-labeling importance
  14. Any Questions? Thank You! Email: luca.costabello@guest.telecomitalia.it, [email_address]
