Mining Dynamics of Data Streams in Multi-Dimensional Space

  • 05/10/10

    1. Mining Dynamics of Data Streams in Multi-Dimensional Space
       Jiawei Han, Department of Computer Science, University of Illinois at Urbana-Champaign
       www.cs.uiuc.edu/~hanj
    2. Challenges of Stream Data Mining
       • Mining query mode: continuous, ad hoc, or progressive?
       • Mining mode: batched vs. interactive vs. lazy mining?
       • Time constraints: real-time?
       • What patterns are to be mined?
         • Finding patterns, anomalies, differences, … in multiple streams
       • Mining dynamics (changes, trends and evolutions) of data streams
       • Multi-level/multi-dimensional processing and data mining
         • Most stream data are at a fairly low level or multi-dimensional in nature
    3. Why Mine the Dynamics of Data Streams in Multi-Dimensional Space?
       • Dynamics (changes, trends and evolutions) of data streams
         • Perhaps the most interesting thing in streams
         • Looking only at the current data is not enough: save something!
       • Multi-dimensional stream mining
         • Most real stream data are at a low level or multi-dimensional in nature
         • How to examine streams dynamically in multiple dimensions?
         • Finding dynamics: patterns and outliers in a certain dimensional space
    4. Stream Data Mining Tasks
       • Multi-dimensional (on-line) analysis of streams
       • Clustering data streams
       • Classification of data streams
       • Mining frequent patterns in data streams
       • Mining sequential patterns in data streams
       • Mining partial periodicity in data streams
       • Mining notable gradients in data streams
       • Mining outliers and unusual patterns in data streams
       • … and more?
    5. Example 1: Multi-Dimensional (OLAP) Analysis
       • Analysis of Web click streams
         • Raw data at low levels: seconds, web page addresses, user IP addresses, …
         • Analysts want: changes, trends, unusual patterns, at reasonable levels of detail
         • E.g., "Average click traffic in North America on sports in the last 15 minutes is 40% higher than that in the last 24 hours."
       • Analysis of power consumption streams
         • Raw data: power consumption flow for every household, every minute
         • Patterns one may find: average hourly power consumption of manufacturing companies in Chicago over the last 2 hours today is up 30% compared with the same day a week ago
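The click-stream statement in Example 1 compares an average over a short recent window with an average over a long one. A minimal Python sketch of that comparison follows. The class `WindowRate`, the helper `relative_change`, the per-minute bucketing, and the traffic numbers are all illustrative assumptions, not something the slides specify; a real system would keep one such pair of windows per cell of interest (for example region = North America, category = sports).

```python
from collections import deque


class WindowRate:
    """Average event count per minute over the most recent `span` minutes.

    Hypothetical helper for illustration; assumes one (minute, count)
    bucket arrives per minute in increasing time order.
    """

    def __init__(self, span):
        self.span = span
        self.buckets = deque()  # (minute, count) pairs, oldest first
        self.total = 0

    def add(self, minute, count):
        self.buckets.append((minute, count))
        self.total += count
        # Drop buckets that have fallen out of the window.
        while self.buckets and self.buckets[0][0] <= minute - self.span:
            _, old = self.buckets.popleft()
            self.total -= old

    def rate(self):
        return self.total / self.span


def relative_change(short, long_window):
    """Fractional difference of the short-window rate vs. the long-window rate."""
    return short.rate() / long_window.rate() - 1.0


# Made-up traffic: 100 clicks/minute all day, 140 in the final 15 minutes.
last_15_min = WindowRate(15)
last_24_h = WindowRate(24 * 60)
for minute in range(24 * 60):
    clicks = 140 if minute >= 24 * 60 - 15 else 100
    last_15_min.add(minute, clicks)
    last_24_h.add(minute, clicks)

change = relative_change(last_15_min, last_24_h)
print(f"last 15 minutes vs. last 24 hours: {change:+.1%}")
```

The sketch stores every minute of the long window, which is exactly the cost that the data reduction ideas later in the deck aim to avoid; it also assumes the long-window rate is non-zero.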
    6. Example 2: Multi-Dimensional Classification
       • Dynamic model update for loan or investment
         • Huge incoming flow of changing information in a multi-dimensional space (factors)
           • E.g., Should we invest in this company based on the current market situation?
       • Classification in a dynamic (volatile) stock market
         • Classification of stocks based on their current streams
           • E.g., Is Lucent going to be up in the next little while?
    7. Example 3: High-Dimensional Clustering
       • Network intrusion detection
         • Huge incoming flow of network traffic information, with multi-dimensional features in nature
         • Find bursts of activity/traffic in real time
         • On-line clustering to detect abrupt changes
       • What are the changes in e-mail or text information?
         • Clustering based on frequent terms
         • Can we perform such clustering in real time?
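One way to picture "on-line clustering to detect abrupt changes" is a single-pass, leader-style scheme: each arriving point joins the nearest existing cluster if it is close enough, and otherwise opens a new cluster, which can be treated as a change signal. The Python sketch below is a generic illustration of that idea, not a method taken from the slides; the class name `OnlineClusters`, the `radius` threshold, and the sample points are assumptions.

```python
import math


class OnlineClusters:
    """Leader-style single-pass clustering (illustrative assumption).

    A point farther than `radius` from every centroid opens a new
    cluster; otherwise the nearest centroid is updated incrementally.
    """

    def __init__(self, radius):
        self.radius = radius
        self.centroids = []  # one list of coordinates per cluster
        self.counts = []     # number of points absorbed by each cluster

    def add(self, point):
        """Absorb one point; return True if it opened a new cluster."""
        best, best_dist = None, math.inf
        for i, centroid in enumerate(self.centroids):
            d = math.dist(point, centroid)
            if d < best_dist:
                best, best_dist = i, d
        if best is None or best_dist > self.radius:
            self.centroids.append(list(point))
            self.counts.append(1)
            return True
        # Running-mean update: no past points need to be stored.
        self.counts[best] += 1
        n = self.counts[best]
        centroid = self.centroids[best]
        for j, x in enumerate(point):
            centroid[j] += (x - centroid[j]) / n
        return False


# Made-up two-feature traffic records; the fourth one is far from the rest.
clusters = OnlineClusters(radius=1.0)
points = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.0), (8.0, 8.0), (1.0, 1.1)]
flags = [clusters.add(p) for p in points]
print(flags)
```

Only centroids and counts are kept, so memory does not grow with the length of the stream. The first point always opens a cluster, so a practical detector would ignore new-cluster signals during a warm-up period.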
    8. Methodology in Stream Data Mining
       • Multi-dimensional (on-line) analysis
       • Mining dynamics of data streams
       • Time is a special dimension
         • Tilted time frame (multiple time granularities)
       • Stream data reduction and pre-computation
         • What kind of multi-dimensional data should be pre-computed and stored for OLAP analysis?
         • What kind of data should be pre-computed/stored for classification?
         • For clustering?
         • For mining frequent patterns?
         • For mining sequential patterns? Partial periodic patterns?
         • …
       • How to do incremental updates? How to find changes?
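The tilted time frame mentioned above stores recent data at fine granularity and older data at coarser granularity, so that a long history fits in a small number of slots. The Python sketch below shows one possible arrangement under stated assumptions: the granularities (quarter hours, hours, days), the capacities, the sum as the aggregate, and the class name `TiltedTimeFrame` are all chosen for illustration and are not specified in the slides.

```python
from collections import deque


class TiltedTimeFrame:
    """Keeps sums at several time granularities (illustrative assumption).

    Four quarter-hour sums are merged into one hour sum, 24 hour sums
    into one day sum, and at most 31 day sums are retained.
    """

    LEVELS = (("quarter", 4), ("hour", 24), ("day", 31))

    def __init__(self):
        # One queue of completed units per level, oldest first.
        self.slots = [deque() for _ in self.LEVELS]

    def add_quarter(self, value):
        """Record the aggregate for one finished quarter hour."""
        self._push(0, value)

    def _push(self, level, value):
        _, capacity = self.LEVELS[level]
        slots = self.slots[level]
        slots.append(value)
        if len(slots) == capacity and level + 1 < len(self.LEVELS):
            # A full coarser unit is complete: merge it and promote it.
            merged = sum(slots)
            slots.clear()
            self._push(level + 1, merged)
        elif len(slots) > capacity:
            slots.popleft()  # coarsest level: forget the oldest unit

    def snapshot(self):
        return {name: list(s) for (name, _), s in zip(self.LEVELS, self.slots)}


# Feed one day, one hour and one quarter of data (101 quarter hours).
frame = TiltedTimeFrame()
for _ in range(96 + 4 + 1):
    frame.add_quarter(1)
print(frame.snapshot())
```

After 101 quarter hours the frame holds three numbers instead of 101, and each update touches at most one slot per level. This variant empties a level when it rolls up; keeping the most recent fine-grained units alongside the coarser ones is an equally plausible design.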
    9. Questions in Stream Data Mining
       • Will stream data mining be real in practice?
       • Should we develop general stream data mining principles, or ad hoc application-oriented methods?
       • How are stream data mining methods different from incremental mining?
       • How is stream data mining linked with stream data management systems? With continuous query processing?
       • Can we do privacy-preserving mining with stream data?
    10. www.cs.uiuc.edu/~hanj
       • Thank you!