S-CUBE LP: Mining Lifecycle Event Logs for Enhancing SBAs


Published in: Technology, Education

  1. Exploiting Knowledge on Past Process Execution to Improve SBA Analysis: Mining Lifecycle Event Logs for Enhancing SBAs
     ISTI-CNR (CNR), TU Wien (TUW)
     Franco Maria Nardini, Gabriele Tolomei, CNR
  2. Learning Package Categorization
     • S-Cube
       – Monitoring and Analysis of SBA
       – Process Mining
       – Exploiting Knowledge on Past Process Execution to Improve SBA Analysis
  3. Connections to the S-Cube IRF
     • Conceptual Research Framework:
       – Service Composition and Coordination
       – Service Infrastructure
       – Adaptation and Monitoring
     • Logical Run-Time Architecture:
       – Monitoring Engine
       – Adaptation Engine
       – Negotiation Engine
       – Runtime QA Engine
       – Resource Broker
  4. Overview
     • Introduction
     • Goal
     • Methodology
     • Experiments
     • Conclusions
  5. SBA Event Logs
     • Most complex software systems record their lifecycle usage data in event log files
     • SBA event logs contain a variety of information about service components exchanging messages
       – e.g., service invocation, service failure, registry querying, etc.
     • Event logs are a huge source of “hidden” information (i.e., knowledge)
  6. Mining SBA Event Logs
     • Data mining algorithms and techniques make it possible to extract valuable knowledge from event logs
     • Extracted knowledge may refer to several aspects:
       – e.g., service usage patterns, service failure patterns, etc.
     • If properly exploited, such knowledge might help improve the overall quality of the system:
       – recommending frequently invoked services;
       – avoiding/handling anomalous situations, etc.
  7. Process Mining (PM)
     • Process Mining (PM) is an application of data mining techniques to SBA event logs
     • PM aims at discovering structured process models derived from patterns that are present in actual traces of service executions
     • Each process is usually represented by a digraph, and the PM problem has been modeled as:
       – a finite state machine [CW96]
       – sequential pattern mining (SPM) [AGL98]
       – a Petri net [vdAWM04]
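
The digraph view of a process can be illustrated with a minimal sketch that builds a directly-follows graph from a few traces. The traces, the service names, and the `directly_follows` helper are made up for illustration; this does not reproduce any of the cited approaches ([CW96], [AGL98], [vdAWM04]).

```python
from collections import Counter


def directly_follows(traces):
    """Count how often service b is invoked immediately after service a."""
    edges = Counter()
    for trace in traces:
        # Pair each event with its immediate successor in the same trace.
        for a, b in zip(trace, trace[1:]):
            edges[(a, b)] += 1
    return edges


# Hypothetical service execution traces (one list per process instance).
traces = [
    ["Login", "Search", "Book"],
    ["Login", "Search", "Search", "Book"],
    ["Login", "Book"],
]

graph = directly_follows(traces)
for (a, b), n in sorted(graph.items()):
    print(f"{a} -> {b}: {n}")
```

Each key of `graph` is a directed edge of the digraph and each value is the number of times that transition was observed.
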
  8. Another Example: Web Search Engines
     • Web Search Engines (WSEs) are another example of systems that benefit from mining their event log data (i.e., query logs)
     • Query Log Mining (QLM) has proven effective for enhancing the overall performance of WSEs
     • We propose a QLM technique for identifying search patterns (tasks) from the stream of queries recorded in query logs [LOPST11]
  9. Overview
     • Introduction
     → Goal
     • Methodology
     • Experiments
     • Conclusions
  10. Goal
     • Treat PM as an instance of the SPM problem
     • Detect frequent sequential patterns of service invocation, i.e., services that are frequently co-invoked within the same sequence
       – e.g., service Y is usually invoked after service X
     • Find which services are actually used, and how
       – service recommendation
       – avoiding/handling anomalous situations
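
To make the recommendation use case concrete, here is a small sketch of how mined two-service patterns could drive a "what is usually invoked next" suggestion. The `recommend_next` helper, the pattern dictionary, and its support values are hypothetical and not taken from the slides.

```python
def recommend_next(last_service, patterns, top_k=3):
    """Rank services y by the support of the pattern (last_service, y).

    patterns maps a pair (x, y), meaning "y is invoked after x",
    to its support as a fraction between 0 and 1.
    """
    candidates = [(supp, y) for (x, y), supp in patterns.items() if x == last_service]
    # Highest support first; ties broken alphabetically for determinism.
    candidates.sort(key=lambda c: (-c[0], c[1]))
    return [y for _, y in candidates[:top_k]]


# Hypothetical mined patterns with made-up support values.
patterns = {("X", "Y"): 0.6, ("X", "Z"): 0.4, ("Y", "Z"): 0.5}

suggestions = recommend_next("X", patterns)
print(suggestions)
```
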
  11. Overview
     • Introduction
     • Goal
     → Methodology
     • Experiments
     • Conclusions
  12. Sequential Pattern Mining
     • An event log can be viewed as a set of sequences of events ordered in time (time series)
     • We are interested in finding sequences of services that are frequently invoked in a specific order, i.e., sequential patterns
     • Sequential Pattern Mining (SPM) is the process of extracting sequential patterns whose support exceeds a predefined minimum support threshold min_supp
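
The notion of support can be sketched as follows, assuming support is measured as the fraction of sequences that contain the pattern as an ordered (not necessarily contiguous) subsequence. The helper names and the toy sequences are illustrative assumptions.

```python
def is_subsequence(pattern, sequence):
    """True if the items of pattern appear in sequence in the same order (gaps allowed)."""
    it = iter(sequence)
    # "item in it" advances the iterator, so order is enforced.
    return all(item in it for item in pattern)


def support(pattern, sequences):
    """Fraction of sequences that contain pattern as a subsequence."""
    return sum(is_subsequence(pattern, s) for s in sequences) / len(sequences)


# Toy invocation sequences (hypothetical service names).
sequences = [
    ["A", "B", "C"],
    ["A", "C"],
    ["B", "A", "C"],
    ["C", "A"],
]
min_supp = 0.5

supp_ac = support(["A", "C"], sequences)
supp_ab = support(["A", "B"], sequences)
print("A,C frequent:", supp_ac > min_supp)
print("A,B frequent:", supp_ab > min_supp)
```
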
  13. PrefixSpan
     • One of the most efficient algorithms for finding sequential patterns [PHMP01]
     • Mines the complete set of patterns while greatly reducing the effort of candidate subsequence generation
     • Takes into account only the chronological order between events
       – i.e., it only cares whether X comes before Y, regardless of the actual time interval
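
The prefix-projection idea can be sketched in a few lines. This is a simplified, PrefixSpan-style pattern growth restricted to sequences of single events (the algorithm in [PHMP01] also handles itemsets), written for illustration rather than efficiency; the function name and the toy data are assumptions.

```python
def prefixspan(sequences, min_count):
    """Return {pattern tuple: support count} for all frequent sequential patterns."""
    results = {}

    def mine(prefix, projected):
        # projected holds (sequence index, start position) pairs: the suffix of
        # each sequence that remains after the first occurrence of the prefix.
        counts = {}
        for sid, start in projected:
            for item in set(sequences[sid][start:]):
                counts[item] = counts.get(item, 0) + 1
        for item, count in sorted(counts.items()):
            if count < min_count:
                continue
            pattern = prefix + (item,)
            results[pattern] = count
            # Project each suffix past the first occurrence of item.
            new_projected = []
            for sid, start in projected:
                seq = sequences[sid]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        new_projected.append((sid, pos + 1))
                        break
            mine(pattern, new_projected)

    mine((), [(sid, 0) for sid in range(len(sequences))])
    return results


sequences = [["A", "B", "C"], ["A", "C"], ["B", "A", "C"], ["C", "A"]]
frequent = prefixspan(sequences, min_count=2)
for pattern, count in sorted(frequent.items()):
    print(pattern, count)
```

No candidate patterns are generated and then tested; each pattern is grown only from items that actually occur in the projected suffixes.
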
  14. MiSTA
     • Hint: observing that two services are invoked close together, rather than far apart, in a sequence could lead to different conclusions
     • MiSTA [GNPP06] is able to deal with the actual time interval between any two consecutive service invocations
     • It needs a time threshold tau that specifies the maximum time interval between events in a frequent sequence
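
The effect of a time threshold can be illustrated with a small sketch. This is not the MiSTA algorithm of [GNPP06]; it only shows, under the assumption that tau bounds the gap between consecutive matched events, how the support of a pattern changes with tau. The helper names and timestamps are made up.

```python
def occurs_within(pattern, events, tau):
    """events is a time-sorted list of (timestamp_seconds, service).

    True if pattern occurs in order with at most tau seconds between
    each pair of consecutive matched events.
    """
    def match(p_idx, e_idx, last_time):
        if p_idx == len(pattern):
            return True
        for i in range(e_idx, len(events)):
            t, service = events[i]
            if last_time is not None and t - last_time > tau:
                break  # events are sorted, so later ones are even further away
            if service == pattern[p_idx] and match(p_idx + 1, i + 1, t):
                return True
        return False

    return match(0, 0, None)


def timed_support(pattern, logs, tau):
    """Fraction of logs in which pattern occurs under the gap constraint tau."""
    return sum(occurs_within(pattern, events, tau) for events in logs) / len(logs)


# Hypothetical timestamped invocation logs (seconds, service).
logs = [
    [(0, "A"), (100, "A"), (103, "B")],
    [(0, "A"), (50, "B")],
    [(0, "B"), (4, "A")],
]
for tau in (5, 60):
    print(tau, timed_support(["A", "B"], logs, tau))
```

The search backtracks over candidate matches because, with a maximum gap, the earliest occurrence of the first service is not always the one that leads to a valid match.
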
  15. Overview
     • Introduction
     • Goal
     • Methodology
     → Experiments
     • Conclusions
  16. Data Set: VRESCo
     • VRESCo is the runtime environment for Service-oriented Computing developed by VITALab@TUW
     • It collects usage data (i.e., events) in the form of an XML log file
     • The VRESCo event log file contains information about invoked services, service rebinding, service failure, etc.
     • We focus only on service invocation events
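
Filtering an XML event log down to invocation sequences might look like the sketch below. The XML layout is entirely hypothetical: the element name `event` and the attributes `type`, `session`, `time`, and `service` are invented for illustration and are not VRESCo's actual log schema.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical log excerpt; NOT the real VRESCo format.
SAMPLE_LOG = """
<events>
  <event type="ServiceInvoked" session="s1" time="10" service="X"/>
  <event type="ServiceFailed" session="s1" time="12" service="X"/>
  <event type="ServiceInvoked" session="s1" time="15" service="Y"/>
  <event type="ServiceInvoked" session="s2" time="11" service="Y"/>
</events>
"""


def invocation_sequences(xml_text):
    """Group invocation events by session and order them by timestamp."""
    root = ET.fromstring(xml_text.strip())
    by_session = defaultdict(list)
    for ev in root.iter("event"):
        # Keep only service invocation events; drop failures, rebindings, etc.
        if ev.get("type") == "ServiceInvoked":
            by_session[ev.get("session")].append((int(ev.get("time")), ev.get("service")))
    return {sid: [service for _, service in sorted(evs)] for sid, evs in by_session.items()}


sequences_by_session = invocation_sequences(SAMPLE_LOG)
print(sequences_by_session)
```

The resulting per-session lists have the shape that a sequential pattern miner expects as input.
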
  17. PrefixSpan: min_supp=25%
  18. PrefixSpan: min_supp=50%
  19. PrefixSpan: min_supp=66%
  20. MiSTA: min_supp=32%, tau=5 sec.
  21. MiSTA: min_supp=32%, tau=60 sec.
  22. MiSTA: min_supp=32%, tau=300 sec.
  23. Results
     • The service logs coming from the VRESCo runtime environment contain frequent patterns of services
     • Those patterns contain information about invoked services, service rebinding, service failure, etc.
     • Those patterns can be collected by considering co-occurring sequences, and also by taking time into account
     • Such inferred knowledge can be used to enhance SBAs, e.g., by means of novel design tools like service recommendation
  24. Overview
     • Introduction
     • Goal
     • Methodology
     • Experiments
     → Conclusions
  25. Conclusions
     • Event logs collected by complex software systems represent a huge source of information (knowledge)
     • Sequences of frequently co-invoked services can be found in SBA event logs using Sequential Pattern Mining (SPM)
     • Two SPM algorithms, PrefixSpan and MiSTA, were run on a real-world SBA event log (VRESCo)
     • Experimental results show that some services are often invoked together in a frequent sequence
     • Such inferred knowledge can be exploited to enhance SBAs, e.g., by means of novel design tools like service recommendation
  26. References
     – [CW96] J. E. Cook and A. L. Wolf, “Discovering models of software processes from event-based data”. Technical Report CUCS-819-96, Computer Science Dept., Univ. of Colorado, 1996.
     – [AGL98] R. Agrawal, D. Gunopulos, and F. Leymann, “Mining Process Models from Workflow Logs”. In Sixth International Conference on Extending Database Technology, pp. 469–483, 1998.
     – [vdAWM04] W. van der Aalst, T. Weijters, and L. Maruster, “Workflow Mining: Discovering Process Models from Event Logs”. IEEE Transactions on Knowledge and Data Engineering, vol. 16, no. 9, pp. 1128–1142, Sep. 2004.
     – [LOPST11] C. Lucchese, S. Orlando, R. Perego, F. Silvestri, and G. Tolomei, “Identifying task-based sessions in search engine query logs”. In WSDM ’11, ACM, pp. 277–286, 2011.
     – [PHMP01] J. Pei, J. Han, B. Mortazavi-Asl, and H. Pinto, “PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth”. In ICDE ’01, IEEE, 2001.
     – [GNPP06] F. Giannotti, M. Nanni, D. Pedreschi, and F. Pinelli, “Mining sequences with temporal annotations”. In SAC ’06, ACM, pp. 593–597, 2006.