S-CUBE LP: Identify User Tasks from Past Usage Logs


Published in: Technology, Education

  1. Extracting Task Information from Past Process Execution Logs: Identify User Tasks from Past Usage Logs. ISTI-CNR (CNR). Franco Maria Nardini, Gabriele Tolomei, CNR
  2. Learning Package Categorization: S-Cube; Monitoring and Analysis of SBA; Task Modeling; Extracting Task Information from Past Process Execution Logs
  3. Connections to the S-Cube IRF
     • Conceptual Research Framework:
       – Service Composition and Coordination
       – Service Infrastructure
       – Adaptation and Monitoring
     • Logical Run-Time Architecture:
       – Monitoring Engine
       – Adaptation Engine
       – Negotiation Engine
       – Runtime QA Engine
       – Resource Broker
  4. Overview: Introduction, Goal, Methodology, Experiments, Conclusions
  5. Background Concepts: Usage Logs
     • Most complex software systems collect usage data over their lifecycle in log files:
       – Web search engines store a tremendous amount of data about their users in query logs (e.g., issued queries, timestamps, clicked results)
       – Service-based system (SBS) event logs contain a range of information about service components exchanging messages (e.g., service invocations, service failures, registry queries)
     • Usage logs represent a huge source of “hidden” information (i.e., knowledge)
  6. Knowledge Discovery from Usage Logs
     • Data mining algorithms and techniques make it possible to extract valuable knowledge from usage logs
     • Extracted knowledge may concern several aspects (e.g., finding usage patterns, modeling user behavior)
     • If properly exploited, such knowledge can help improve the overall quality of the system
  7. The Web as a Task-Execution Platform
     • The activities people perform are usually compositions of atomic tasks
     • These activities are increasingly being carried out on the Web
     • Examples include:
       – planning a trip (overused!)
       – organizing a birthday party
       – getting a U.S. visa
  8. Overview: Introduction, Goal, Methodology, Experiments, Conclusions
  9. Goal
     • Reconstruct the tasks/processes that users perform on the Web from the queries they issue to search engines, i.e., mine Web-mediated tasks from past user queries
     • Extract tasks/processes from historical search data (i.e., query logs) collected by Web search engines
     • Task-based Session Discovery Problem: approached using clustering-based techniques
  10. Overview: Introduction, Goal, Methodology, Experiments, Conclusions
  11. Query Log Mining
     • Idea: cluster queries so that queries in the same cluster are likely to be task-related
     • Input: the stream of queries issued by one user
     • Output: a set of query clusters representing that user's search tasks
     • Key points:
       – features (e.g., lexical content, time, semantics)
       – clustering algorithm (e.g., centroid-based, density-based, novel heuristics)
       – distance metrics (e.g., Jaccard, Levenshtein, cosine)
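Two of the distance metrics named on this slide can be sketched in a few lines. This is only an illustration: the tokenization (lowercase, whitespace-split), the length normalization, and the `lexical_distance` blend with its `alpha` weight are assumptions made here, not details taken from the slides.

```python
def jaccard_distance(q1: str, q2: str) -> float:
    """1 - |A ∩ B| / |A ∪ B| over the sets of lowercase terms of each query."""
    a, b = set(q1.lower().split()), set(q2.lower().split())
    if not a and not b:
        return 0.0  # two empty queries are treated as identical
    return 1.0 - len(a & b) / len(a | b)


def levenshtein(s: str, t: str) -> int:
    """Edit distance (insert, delete, substitute) via row-by-row dynamic programming."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_levenshtein(s: str, t: str) -> float:
    """Edit distance scaled to [0, 1] by the longer string's length."""
    longest = max(len(s), len(t))
    return levenshtein(s, t) / longest if longest else 0.0


def lexical_distance(q1: str, q2: str, alpha: float = 0.5) -> float:
    """Hypothetical blend of the two metrics; alpha is an illustrative weight."""
    return (alpha * jaccard_distance(q1, q2)
            + (1 - alpha) * normalized_levenshtein(q1.lower(), q2.lower()))
```

Term-level Jaccard captures shared vocabulary between queries, while character-level edit distance is more tolerant of misspellings and small reformulations, which is one reason the two might be combined.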
  12. Our Solution
     • A graph-based heuristic for discovering queries that are related to the same search task
     • Our technique has been shown to outperform state-of-the-art approaches
     • Results were presented in a research paper published at the 4th ACM Conference on Web Search and Data Mining (WSDM 2011): “Identifying Task-based Sessions in Search Engine Query Logs”
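The slides do not spell out the heuristic itself, so the following is not the authors' algorithm. It is a minimal sketch of one generic graph-based approach, assumed here for illustration: treat each query as a node, link two queries when their distance falls below a threshold, and read each connected component as a candidate task. The distance function, the threshold value, and all names are hypothetical.

```python
from itertools import combinations


def term_jaccard_distance(q1: str, q2: str) -> float:
    """Jaccard distance over lowercase query terms (assumed feature)."""
    a, b = set(q1.lower().split()), set(q2.lower().split())
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def cluster_by_connected_components(queries, distance, threshold):
    """Link queries whose distance is <= threshold; return the connected components."""
    parent = list(range(len(queries)))  # union-find forest over query indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(queries)), 2):
        if distance(queries[i], queries[j]) <= threshold:
            parent[find(i)] = find(j)  # merge the two components

    clusters = {}
    for i, query in enumerate(queries):
        clusters.setdefault(find(i), []).append(query)
    return list(clusters.values())


queries = [
    "cheap flights rome",
    "flights rome hotel",
    "rome hotel deals",
    "birthday cake recipe",
    "birthday cake ideas",
]
tasks = cluster_by_connected_components(queries, term_jaccard_distance, threshold=0.6)
```

With these inputs the first and third queries are not directly linked, but both are linked to the second, so all three land in the same component. This transitive grouping is what distinguishes a graph-based view from simple pairwise thresholding.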
  13. Overview: Introduction, Goal, Methodology, Experiments, Conclusions
  14. Data Set: 2006 AOL Query Log
  15. Evaluation
     • We manually extract a set of tasks from a portion of our test query log (i.e., the ground truth)
     • We run our proposed algorithm and evaluate its accuracy in discovering the manually labeled tasks of the ground truth
     • Evaluation is expressed in terms of popular IR-based metrics:
       – Precision
       – Recall
       – F-measure (i.e., the harmonic mean of Precision and Recall)
       – Rand
       – Jaccard
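The slides list the metrics without saying how they are computed for clusterings. One common option, assumed here, is the pairwise formulation: every pair of queries counts as a true positive if both the predicted and the ground-truth clustering put the two queries together, a false positive if only the prediction does, and so on. The function name and the dictionary-based input format are illustrative choices.

```python
from itertools import combinations


def pairwise_scores(predicted: dict, truth: dict) -> dict:
    """Compare two clusterings given as {item: cluster_label} over the same items."""
    tp = fp = fn = tn = 0
    for a, b in combinations(sorted(truth), 2):
        same_pred = predicted[a] == predicted[b]
        same_true = truth[a] == truth[b]
        if same_pred and same_true:
            tp += 1
        elif same_pred:
            fp += 1
        elif same_true:
            fn += 1
        else:
            tn += 1

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    total = tp + fp + fn + tn
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,                       # harmonic mean of the two
        "rand": (tp + tn) / total if total else 0.0,  # share of pairs judged alike
        "jaccard": tp / (tp + fp + fn) if tp + fp + fn else 0.0,
    }


truth = {"q1": "A", "q2": "A", "q3": "A", "q4": "B", "q5": "B"}
predicted = {"q1": "X", "q2": "X", "q3": "Y", "q4": "Y", "q5": "Y"}
scores = pairwise_scores(predicted, truth)
```

In this toy example the ten query pairs split into 2 true positives, 2 false positives, 2 false negatives, and 4 true negatives, giving precision, recall, and F-measure of 0.5, a Rand index of 0.6, and a Jaccard index of 1/3.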
  16. Results
  17. Implications for SBS domain: Why?
     • Our technique was designed for, but is not limited to, the Web search context
     • Service-based Systems (SBS) could be another suitable application context
     • Tasks might be single service instances
     • Processes might be workflows of orchestrated services
     • Query/task clustering can be considered a special case of more general “activity clustering”
  18. Implications for SBS domain: How?
     • Past usage log data are the key to applying our technique
     • Once we have logs of performed activities (e.g., service invocations), we can derive features from them
     • We can then cluster activities according to those features from a task/process-based perspective
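As a rough sketch of what deriving features from service logs could look like, the snippet below defines a distance between two logged service invocations that mixes a time feature with a content feature. Everything in it is hypothetical: the slides do not describe a log schema, so the `ServiceEvent` fields, the 300-second horizon, the 0.5 penalty for differing operations, and the `alpha` weight are assumptions chosen for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ServiceEvent:
    """Hypothetical log record for one service invocation."""
    timestamp: datetime
    service: str
    operation: str


def time_distance(a: ServiceEvent, b: ServiceEvent,
                  horizon_seconds: float = 300.0) -> float:
    """Time gap scaled to [0, 1]; gaps beyond the horizon count as maximally distant."""
    gap = abs((a.timestamp - b.timestamp).total_seconds())
    return min(gap / horizon_seconds, 1.0)


def content_distance(a: ServiceEvent, b: ServiceEvent) -> float:
    """0.0 for the same service and operation, 0.5 for the same service only, else 1.0."""
    if a.service != b.service:
        return 1.0
    return 0.0 if a.operation == b.operation else 0.5


def event_distance(a: ServiceEvent, b: ServiceEvent, alpha: float = 0.5) -> float:
    """Weighted blend of the time and content features."""
    return alpha * time_distance(a, b) + (1 - alpha) * content_distance(a, b)


search = ServiceEvent(datetime(2011, 1, 1, 10, 0, 0), "booking", "search")
reserve = ServiceEvent(datetime(2011, 1, 1, 10, 1, 0), "booking", "reserve")
charge = ServiceEvent(datetime(2011, 1, 1, 12, 0, 0), "billing", "charge")
```

A distance of this kind plays the same role for service invocations that a query distance plays for search logs: it can be handed to whatever clustering step groups activities into tasks or processes.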
  19. Overview: Introduction, Goal, Methodology, Experiments, Conclusions
  20. Conclusions
     • We developed a technique for mining tasks/processes from Web search logs
     • Our technique is based on clustering historical search data according to some features
     • This approach might be generalized and applied to several other contexts (e.g., software-based services)
     • We need usage logs from which we can extract suitable features and common interfaces!