
Mining Change Events in Large Datasets


Published in: Business, Economy & Finance

  1. 1. Hashmat Rohian, Jiashu Zhao
  2. 2.
     • Discover patterns whose frequency changes dramatically over time or along any other dimension (an extension of FP mining)
     • Discover new rules associating changes (financial markets)
     • Predict changes in one variable based on changes in other dimensions (outbreak detection)
  3. 3.
     • Design a practical and useful approach to discovering novel and interesting change knowledge from large databases
     • Analyze and present the mined knowledge in a clear and coherent manner
     • Evaluate the knowledge against a gold standard
  4. 4.
     • Qian's CPD (Change Point Detection) algorithm
         • Based on Qian's measure
     • Improved CPD1 (divide and conquer)
         • Uses divide and conquer with global ratios
     • Improved CPD2 (divide and conquer)
         • Uses divide and conquer with local ratios
     • Binomial method
     • The Kolmogorov-Smirnov test (KS-test)
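Slide 4 lists the KS-test as one of the change point detection methods but does not say how it is applied. The following is a minimal sketch under one assumed setup: every candidate split of a series is scored by the two-sample KS statistic (the largest gap between the empirical CDFs of the two halves), and the best-scoring split is reported. The names `ks_statistic`, `best_split`, and `min_size` are hypothetical, and no p-value is computed.

```python
from bisect import bisect_right


def ks_statistic(xs, ys):
    """Two-sample KS statistic: max |F_x(v) - F_y(v)| over all observed values."""
    xs, ys = sorted(xs), sorted(ys)
    d = 0.0
    for v in xs + ys:
        # Empirical CDFs evaluated at v (fraction of samples <= v).
        fx = bisect_right(xs, v) / len(xs)
        fy = bisect_right(ys, v) / len(ys)
        d = max(d, abs(fx - fy))
    return d


def best_split(series, min_size=2):
    """Return (index, D) for the split series[:index] / series[index:] with the largest D.

    Assumption: a change point is the split whose two sides differ most in
    distribution; min_size keeps both sides non-trivial.
    """
    best = (None, 0.0)
    for i in range(min_size, len(series) - min_size + 1):
        d = ks_statistic(series[:i], series[i:])
        if d > best[1]:
            best = (i, d)
    return best
```

A divide-and-conquer variant could call `best_split` again on each side of the reported index, though whether that matches the improved CPD methods on the slide is not something the transcript confirms.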
  5. 5.
     • Level-wise search
     • k-itemsets (itemsets with k items) are used to explore (k+1)-itemsets from transactional databases
     • First, the set of frequent 1-itemsets is found (denoted L1)
     • L1 is used to find L2, the set of frequent 2-itemsets
     • L2 is used to find L3, and so on, until no more frequent k-itemsets can be found
     • Strong association rules are then generated from the frequent itemsets
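The level-wise search on slide 5 is the scheme commonly known as Apriori. A minimal sketch follows, assuming transactions are given as collections of hashable items and that `min_support` is an absolute count. The function names `apriori` and `strong_rules` and the confidence threshold are illustrative choices, not taken from the slides.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support_count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Count how many transactions contain each candidate; keep the frequent ones.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support}

    # L1: frequent 1-itemsets.
    items = {item for t in transactions for item in t}
    level = count({frozenset([i]) for i in items})
    frequent = dict(level)
    k = 1
    while level:
        prev = set(level)
        # Join step: combine frequent k-itemsets into (k+1)-itemset candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k))}
        level = count(candidates)
        frequent.update(level)
        k += 1
    return frequent


def strong_rules(frequent, min_confidence):
    """Return (lhs, rhs, confidence) for rules meeting the confidence threshold."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is frequent, so lhs is in the dict.
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules
```

The loop stops as soon as a level comes back empty, which corresponds to the "until no frequent k-itemsets can be found" condition on the slide.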
  6. 6.
     • Transitional ratio
     • First derivative
     • Second derivative
         • The rate of change of the rate of change
     • Etc.
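For a discretely sampled series, the first and second derivatives on slide 6 can be approximated by finite differences. The sketch below assumes evenly spaced observations with unit spacing; the function names are hypothetical. The transitional ratio is left out because the transcript does not define it.

```python
def first_derivative(series):
    """Forward differences: change between consecutive observations."""
    return [b - a for a, b in zip(series, series[1:])]


def second_derivative(series):
    """Differences of the differences: the rate of change of the rate of change."""
    return first_derivative(first_derivative(series))
```

Each call shortens the series by one element, so a series of length n yields n - 1 first differences and n - 2 second differences.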
  7. 12.
     • A stock market index is a method of measuring a section of the stock market. We use 27 stock market indices.
  8. 15.
     • Statistical tools are more accurate for CPD
     • Binary points produce robust change points
     • The transitional ratio and the slope change measures give very similar results
     • Local change point estimation based on true and false points produces a consistent measure
     • Both the transitional ratio and the slope are robust to noisy or incomplete datasets
  9. 16.
     • Use binary data for CPD and real data for the change measure
     • Use regression to predict changes in one dimension using other variables
     • Incorporate our system into FP mining
     • Apply our methods to other real datasets
     • Make our system more efficient and automated
  10. 17.
     • Questions?
     • Comments?
     • Feedback?