Hashmat Rohian Jiashu Zhao
Mining Change Events in Large Datasets

Published in: Business, Economy & Finance
  • We tested Qian's CPD on an artificial dataset whose most significant change point is at 1550, with other change points at 180, 220, 950, 1050, and 1450. The algorithm found the point 1548, which is very close to the most significant change point in the dataset. This result indicates that Qian's CPD is effective at finding the dominant change point. However, both the definition and the result show that it fails to find some change points no matter how the threshold is set, because the counting is done globally over the whole dataset. In the divide-and-conquer variants, each time a change point is found we split the CPD problem into two subproblems at that point and search for change points in each of the new subsets. The KS-test tries to determine whether two datasets differ significantly, and it has the advantage of making no assumption about the distribution of the data.
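The divide-and-conquer idea in the note above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it does not implement Qian's measure (which the slides do not define), and instead scores each candidate split with the two-sample KS statistic. The names `ks_statistic` and `find_change_points`, the `min_size` parameter, and the raw-statistic `threshold` are all hypothetical choices made for this sketch. A fuller treatment would convert the statistic to a p-value rather than thresholding it directly.

```python
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    for x in set(sa) | set(sb):
        fa = bisect_right(sa, x) / len(sa)  # fraction of a that is <= x
        fb = bisect_right(sb, x) / len(sb)  # fraction of b that is <= x
        d = max(d, abs(fa - fb))
    return d


def find_change_points(series, lo=0, hi=None, min_size=20, threshold=0.5):
    """Split series[lo:hi] at the best-scoring index, then recurse on both
    halves so that each subproblem is scored locally, not globally."""
    if hi is None:
        hi = len(series)
    if hi - lo < 2 * min_size:
        return []  # segment too short to split further
    best_d, best_i = 0.0, None
    for i in range(lo + min_size, hi - min_size + 1):
        d = ks_statistic(series[lo:i], series[i:hi])
        if d > best_d:
            best_d, best_i = d, i
    if best_i is None or best_d < threshold:
        return []  # no sufficiently strong change in this segment
    left = find_change_points(series, lo, best_i, min_size, threshold)
    right = find_change_points(series, best_i, hi, min_size, threshold)
    return left + [best_i] + right
```

On a series with a temporary shift, such as `[0] * 50 + [1] * 50 + [0] * 50`, a single global pass scores both boundaries at only 0.5, while the recursive pass on the right-hand segment scores the second boundary at 1.0. That is the motivation for re-scoring locally after each split.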
  • Mining Change Events in Large Datasets

    1. 1. Hashmat Rohian Jiashu Zhao
    2. 2. <ul><li>Discover patterns whose frequency dramatically changes over time or any other dimension (FP mining extension) </li></ul><ul><li>Discover new rules associating changes (Financial markets) </li></ul><ul><li>Predict changes in one variable based on changes in other dimensions (Outbreak detection) </li></ul>
    3. 3. <ul><li>Design a practical and useful approach to discovering novel and interesting change knowledge from large databases </li></ul><ul><li>Analyze and present the knowledge mined in a clear and coherent manner </li></ul><ul><li>Evaluate the knowledge based on a gold standard </li></ul>
    4. 4. <ul><li>Qian's CPD (Change Point Detection) algorithm </li></ul><ul><ul><li>Based on Qian’s measure </li></ul></ul><ul><li>Improved CPD1 { Divide and Conquer } </li></ul><ul><ul><li>Using Divide & Conquer with global ratios </li></ul></ul><ul><li>Improved CPD2 { Divide and Conquer } </li></ul><ul><ul><li>Using Divide & Conquer with local ratios </li></ul></ul><ul><li>Binomial method </li></ul><ul><li>The Kolmogorov-Smirnov test (KS-test) </li></ul>
    5. 5. <ul><li>Level-wise search </li></ul><ul><li>k-itemsets (itemsets with k items) are used to explore (k+1)-itemsets from transactional databases </li></ul><ul><li>First, the set of frequent 1-itemsets is found (denoted L1) </li></ul><ul><li>L1 is used to find L2, the set of frequent 2-itemsets </li></ul><ul><li>L2 is used to find L3, and so on, until no frequent k-itemsets can be found </li></ul><ul><li>Generate strong association rules from the frequent itemsets </li></ul>
    6. 6. <ul><li>Transitional ratio </li></ul><ul><li>First Derivative </li></ul><ul><li>Second Derivative </li></ul><ul><ul><li>the rate of change of the rate of change </li></ul></ul><ul><li>Etc. </li></ul>
    7. 12. <ul><li>A stock market index is a method of measuring a section of the stock market. We use 27 stock market indices. </li></ul>
    8. 15. <ul><li>Statistical tools are more accurate for CPD </li></ul><ul><li>Binary points produce robust change points </li></ul><ul><li>The transitional ratio and the slope change measures have very similar results </li></ul><ul><li>Local change point estimation based on true and false points produces a consistent measure </li></ul><ul><li>Both the transitional ratio and the slope are robust to noisy or incomplete datasets </li></ul>
    9. 16. <ul><li>Use binary data for CPD and real data for change measure </li></ul><ul><li>Use regression to predict changes in one dimension using variables </li></ul><ul><li>Incorporate our system into FP mining </li></ul><ul><li>Apply our methods to other real datasets </li></ul><ul><li>Make our system more efficient and automated </li></ul>
    10. 17. <ul><li>Questions? </li></ul><ul><li>Comments? </li></ul><ul><li>Feedback? </li></ul>
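The level-wise search described on slide 5 (L1 is used to find L2, L2 to find L3, and so on) can be sketched as follows. This is a minimal Apriori-style illustration, not the authors' implementation. The function name `frequent_itemsets` and the absolute-count `min_support` parameter are assumptions made for this sketch, and the final step of generating strong association rules is omitted.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for every frequent itemset,
    found level by level: L1, then L2, ... until a level is empty."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Number of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t)

    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items
             if support(frozenset([i])) >= min_support}  # L1
    result = {}
    k = 1
    while level:
        for s in level:
            result[s] = support(s)
        # Join step: merge pairs of frequent k-itemsets into (k+1)-itemsets.
        candidates = {a | b for a in level for b in level
                      if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in level
                             for sub in combinations(c, k))}
        level = {c for c in candidates if support(c) >= min_support}
        k += 1
    return result
```

For example, with the transactions `{a, b, c}`, `{a, b}`, `{a, c}`, `{b, c}`, `{a, b, c}` and `min_support=3`, every single item and every pair is frequent, but the triple `{a, b, c}` appears in only two transactions and is dropped at the third level.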