S-CUBE LP: Identify User Tasks from Past Usage Logs

Presentation Transcript

  • Extracting Task Information from Past Process Execution Logs: Identify User Tasks from Past Usage Logs
    ISTI-CNR (CNR)
    Franco Maria Nardini, Gabriele Tolomei, CNR
  • Learning Package Categorization
    S-Cube
    Monitoring and Analysis of SBA
    Task Modeling
    Extracting Task Information from Past Process Execution Logs
  • Connections to the S-Cube IRF
    Conceptual Research Framework:
      – Service Composition and Coordination
      – Service Infrastructure
      – Adaptation and Monitoring
    Logical Run-Time Architecture:
      – Monitoring Engine
      – Adaptation Engine
      – Negotiation Engine
      – Runtime QA Engine
      – Resource Broker
  • Overview  Introduction  Goal  Methodology  Experiments  Conclusions View slide
  • Background Concepts: Usage Logs
    Most complex software systems collect usage data throughout their lifecycle in log files:
      – Web search engines store a tremendous amount of data about their users in query logs:
          - e.g., issued queries, timestamps, clicked results, etc.
      – Service-based system (SBS) event logs contain information about service components exchanging messages:
          - e.g., service invocations, service failures, registry queries, etc.
    Usage logs represent a huge source of “hidden” information (i.e., knowledge)
  • Knowledge Discovery from Usage Logs
    Data mining algorithms and techniques allow extracting valuable knowledge from usage logs
    Extracted knowledge may refer to several aspects:
      – e.g., finding usage patterns, modeling user behavior, etc.
    If properly exploited, such knowledge might help improve the overall quality of the system
  • The Web as a Task-Execution Platform
    Activities people perform are usually compositions of atomic tasks
    Those activities are increasingly accomplished on the Web
    Examples:
      – planning a trip (overused!)
      – organizing a birthday party
      – getting a U.S. visa
      – etc.
  • Overview  Introduction   Goal  Methodology  Experiments  Conclusions
  • Goal   Re-construct tasks/processes that users perform on the Web by means of issued queries to search engines: –  i.e., mining Web-mediated tasks from past issued user queries   Extracting tasks/processes from historical search data (i.e., query logs) collected by Web search engines   Task-based Session Discovery Problem: approached using clustering-based techniques 9
  • Overview  Introduction  Goal   Methodology  Experiments  Conclusions
  • Query Log Mining
    Idea: cluster queries so that queries in the same cluster are likely to be task-related
    Input: stream of queries issued by one user
    Output: set of clusters of queries representing search tasks for that user
    Key points:
      – features (e.g., lexical content, time, semantics, etc.)
      – clustering algorithm (e.g., centroid-based, density-based, novel heuristics)
      – distance metrics (e.g., Jaccard, Levenshtein, cosine, etc.)
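The lexical distance metrics named on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of character trigrams for Jaccard, and the lowercasing step are all assumptions made here for the example.

```python
def char_trigrams(query: str) -> set:
    """Return the set of character 3-grams of a lowercased query (assumed tokenization)."""
    text = query.lower().strip()
    # Queries shorter than 3 characters yield a single (short) gram.
    return {text[i:i + 3] for i in range(max(len(text) - 2, 1))}


def jaccard_distance(a: str, b: str) -> float:
    """1 - |A ∩ B| / |A ∪ B| over the trigram sets of the two queries."""
    ta, tb = char_trigrams(a), char_trigrams(b)
    return 1.0 - len(ta & tb) / len(ta | tb)


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]
```

Either function can serve as the pairwise distance fed to a clustering algorithm; time-based or semantic features would need their own distance functions.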
  • Our Solution
    A graph-based heuristic for discovering queries that are related to the same search task
    Our technique has been shown to outperform state-of-the-art approaches
    Results were presented in a research paper published at the 4th ACM Conference on Web Search and Data Mining (WSDM 2011):
      – Identifying Task-based Sessions in Search Engine Query Logs
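The slide does not spell out the heuristic, so the following is only a sketch of one generic graph-based approach: treat each query as a node, link two queries when their similarity reaches a threshold, and read each connected component as a candidate task. The token-level Jaccard similarity, the `threshold` value, and the names `token_jaccard` and `cluster_queries` are assumptions for illustration.

```python
from itertools import combinations


def token_jaccard(a: str, b: str) -> float:
    """Jaccard similarity over the word sets of two queries (assumed similarity)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cluster_queries(queries, threshold=0.2):
    """Group queries into connected components of the similarity graph."""
    parent = list(range(len(queries)))  # union-find forest over query indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Add an edge (merge components) for every sufficiently similar pair.
    for i, j in combinations(range(len(queries)), 2):
        if token_jaccard(queries[i], queries[j]) >= threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i, query in enumerate(queries):
        clusters.setdefault(find(i), []).append(query)
    return list(clusters.values())
```

For example, calling `cluster_queries` on a user's stream containing travel-related and party-related queries should separate the two groups as long as they share no words.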
  • Overview  Introduction  Goal  Methodology   Experiments  Conclusions
  • Data Set: 2006 AOL Query Log
  • Evaluation
    We manually extract a set of tasks from a portion of our test query log (i.e., the ground truth)
    We run our proposed algorithm and evaluate its accuracy in discovering the manually labeled tasks of the ground truth
    Evaluation is expressed in terms of popular IR-based metrics:
      – Precision
      – Recall
      – F-measure (i.e., harmonic mean of Precision and Recall)
      – Rand
      – Jaccard
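A common way to compute these metrics for a clustering is to count pairs of queries, and the sketch below assumes that pairwise formulation; the slides do not state which exact definitions were used. The label lists and function names are hypothetical.

```python
from itertools import combinations


def pair_counts(predicted, truth):
    """Count query pairs by agreement between predicted and ground-truth task labels."""
    tp = fp = fn = tn = 0
    for i, j in combinations(range(len(truth)), 2):
        same_pred = predicted[i] == predicted[j]
        same_true = truth[i] == truth[j]
        if same_pred and same_true:
            tp += 1  # correctly placed in the same task
        elif same_pred:
            fp += 1  # wrongly merged
        elif same_true:
            fn += 1  # wrongly split
        else:
            tn += 1  # correctly kept apart
    return tp, fp, fn, tn


def evaluate(predicted, truth):
    """Return pairwise precision, recall, F-measure, Rand index and Jaccard index."""
    tp, fp, fn, tn = pair_counts(predicted, truth)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    total = tp + fp + fn + tn
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "rand": (tp + tn) / total if total else 0.0,
        "jaccard": tp / (tp + fp + fn) if tp + fp + fn else 0.0,
    }
```

Here `predicted` and `truth` are parallel lists giving, for each query, the identifier of the task it was assigned to by the algorithm and by the manual labeling respectively.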
  • Results
  • Implications for SBS domain: Why?
    Our technique was designed for, but is not limited to, the Web search context
    Service-based Systems could be another suitable application context
    Tasks might be single service instances
    Processes might be workflows of orchestrated services
    Query/task clustering can be considered a special case of more general “activity clustering”
  • Implications for SBS domain: How?
    Past usage log data are the key to applying our technique
    Once we have logs of performed activities (e.g., service invocations), we can derive features
    Then we can cluster activities according to those features from a task/process-based perspective
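To make the step from logged activities to features concrete, here is a small sketch under stated assumptions. The `Invocation` record, its fields, the two pairwise features, and the time-gap grouping rule are all hypothetical; the slides do not define a log format or a feature set, and time-gap splitting is used only as a simple baseline, not as the authors' technique.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Invocation:
    """One logged service invocation (hypothetical log schema)."""
    timestamp: float  # seconds since the start of the log
    service: str
    operation: str


def pair_features(a: Invocation, b: Invocation) -> dict:
    """Features relating two activities, usable by a distance function."""
    return {
        "time_gap": abs(a.timestamp - b.timestamp),
        "same_service": a.service == b.service,
    }


def split_by_time_gap(events, max_gap=60.0):
    """Baseline grouping: start a new candidate task after a long pause."""
    groups = []
    for event in sorted(events, key=lambda e: e.timestamp):
        if groups and event.timestamp - groups[-1][-1].timestamp <= max_gap:
            groups[-1].append(event)
        else:
            groups.append([event])
    return groups
```

A feature-based clustering would replace `split_by_time_gap` with an algorithm that combines several such features into one distance between activities.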
  • Overview  Introduction  Goal  Methodology  Experiments   Conclusions
  • Conclusions
    We developed a technique for mining tasks/processes from Web search logs
    Our technique is based on clustering historical search data according to some features
    This approach might be generalized and applied to several other contexts (e.g., software-based services)
    We need usage logs from which we can extract suitable features, and common interfaces!