Mining Dynamics of Data Streams in Multidimensional Space Jiawei Han Department of Computer Science University of Illinois...
Outline <ul><li>Characteristics of data streams </li></ul><ul><li>Mining dynamics in data streams </li></ul><ul><li>Multi-...
Characteristics of Data Streams <ul><li>Data Streams </li></ul><ul><ul><li>Data streams — continuous, ordered, changing, f...
Stream Data Applications <ul><li>Telecommunication calling records </li></ul><ul><li>Business: credit card transaction flo...
Architecture: Stream Query Processing Scratch Space (Main memory and/or Disk) User/Application Stream Query Processor Resu...
Challenges of Stream Data Processing <ul><li>Multiple, continuous, rapid, time-varying, ordered  streams </li></ul><ul><li...
Processing Stream Queries <ul><li>Query types </li></ul><ul><ul><li>One-time query vs.  continuous query  (being evaluated...
Projects on DSMS (Data Stream Management System)  <ul><li>Research projects and system prototypes </li></ul><ul><ul><li>ST...
Stream Data Mining vs. Stream Querying <ul><li>Stream mining — A more challenging task </li></ul><ul><ul><li>It shares mos...
Stream Data Mining Tasks <ul><li>Multi-dimensional (on-line) analysis  of streams </li></ul><ul><li>Clustering  data strea...
Challenges for Mining Dynamics in Data Streams <ul><li>Most stream data are at pretty low-level or multi-dimensional in na...
Multi-Dimensional Stream Analysis: Examples <ul><li>Analysis of  Web click streams </li></ul><ul><ul><li>Raw data at low l...
A Key Step — Stream Data Reduction <ul><li>Challenges of OLAPing stream data </li></ul><ul><ul><li>Raw data cannot be stor...
Regression Cube for Time-Series <ul><li>Initially, one time-series per base cell </li></ul><ul><ul><li>Too costly to store...
Basics of General Linear Regression <ul><li>n tuples in one cell: ( x i   ,  y i ),  i  = 1..n , where  y i  is the measur...
Stock Price Example — Aggregation in Standard Dimensions <ul><li>Simple linear regression on time series data </li></ul><u...
Stock Price Example — Aggregation in Time  Dimension <ul><li>Cells of two adjacent  </li></ul><ul><li>time intervals: </li...
Feasibility of Stream Regression Analysis <ul><li>Efficient storage and scalable (independent of the number of tuples in d...
A Stream Cube Architecture <ul><li>A   tilted time frame   </li></ul><ul><ul><li>Different time granularities </li></ul></...
A Tilted Time-Frame Model Up to 7 days: Up to a year: Logarithmic (exponential) scale: 2t 1t Time Now 4t 8t 16t 31   days ...
Benefits of Tilted Time-Frame Model <ul><li>Each cell stores the measures according to tilt-time-frame </li></ul><ul><ul><...
Two Critical Layers in the Stream Cube (*, theme, quarter) (user-group, URL-group, minute) m-layer (minimal interest) (ind...
On-Line Materialization vs. On-Line Computation <ul><li>On-line materialization  </li></ul><ul><ul><li>Materialization tak...
Stream Cube Structure: from m-layer to o-layer (A1, *, C1) (A1, *, C2) (A1, *, C2) (A1, *, C2) (A1, B1, C2) (A1, B2, C1) (...
Stream Cube Computation <ul><li>Cube structure from m-layer to o-layer </li></ul><ul><li>Three approaches </li></ul><ul><u...
An H-Tree Cubing Structure root entertainment sports politics uic uiuc uic uiuc jeff mary jeff Jim Q.I. Q.I. Q.I. Observat...
Benefits of H-Tree and H-Cubing <ul><li>H-tree and H-cubing  </li></ul><ul><ul><li>Developed for computing data cubes and ...
a) Time vs. m-layer size b) Space vs. m-layer size Time and Space vs. Number of Tuples at the m-Layer  (Dataset D3L3C10T40...
Time and Space vs. the Number of Levels a) Time vs. # levels b) Space vs. # levels
Mining Other Dynamic Patterns in Stream Data <ul><li>Clustering and outlier analysis  for stream mining </li></ul><ul><ul>...
Clustering for Mining Stream Dynamics <ul><li>Network intrusion detection: one example  </li></ul><ul><ul><li>Detect burst...
Classification for Dynamic Data Streams <ul><li>Decision tree induction for stream data classification </li></ul><ul><ul><...
Frequent Patterns for Stream Data <ul><li>Frequent pattern mining is valuable in stream applications </li></ul><ul><ul><li...
Conclusions <ul><li>Stream data mining:  A rich and largely unexplored field </li></ul><ul><ul><li>Current research focus ...
References <ul><li>B. Babcock, S. Babu, M. Datar, R. Motawani, and J. Widom, “Models and issues in data stream systems ” ,...
www.cs.uiuc.edu/~hanj <ul><li>Thank you !!! </li></ul>
Upcoming SlideShare
Loading in...5
×

Mining Dynamics of Data Streams in Multidimensional Space ...

465

Published on

0 Comments
2 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
465
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
24
Comments
0
Likes
2
Embeds 0
No embeds

No notes for slide
  • 05/10/10
  • 05/10/10
  • Mining Dynamics of Data Streams in Multidimensional Space ...

    1. 1. Mining Dynamics of Data Streams in Multidimensional Space Jiawei Han Department of Computer Science University of Illinois at Urbana-Champaign www.cs.uiuc.edu/~hanj
    2. 2. Outline <ul><li>Characteristics of data streams </li></ul><ul><li>Mining dynamics in data streams </li></ul><ul><li>Multi-dimensional (regression) analysis of data streams </li></ul><ul><li>Stream cubing and stream OLAP methods </li></ul><ul><li>Mining other kinds of patterns in data streams </li></ul><ul><li>Research problems </li></ul>
    3. 3. Characteristics of Data Streams <ul><li>Data Streams </li></ul><ul><ul><li>Data streams — continuous, ordered, changing, fast, huge amount </li></ul></ul><ul><ul><li>Traditional DBMS — data stored in finite, persistent data sets </li></ul></ul><ul><li>Characteristics </li></ul><ul><ul><li>Huge volumes of continuous data, possibly infinite </li></ul></ul><ul><ul><li>Fast changing and requires fast, real-time response </li></ul></ul><ul><ul><li>Data stream captures nicely our data processing needs of today </li></ul></ul><ul><ul><li>Random access is expensive — single linear scan algorithm ( can only have one look ) </li></ul></ul><ul><ul><li>Store only the summary of the data seen thus far </li></ul></ul><ul><ul><li>Most stream data are at pretty low-level or multi-dimensional in nature, needs multi-level and multi-dimensional processing </li></ul></ul>
    4. 4. Stream Data Applications <ul><li>Telecommunication calling records </li></ul><ul><li>Business: credit card transaction flows </li></ul><ul><li>Network monitoring and traffic engineering </li></ul><ul><li>Financial market: stock exchange </li></ul><ul><li>Engineering & industrial processes: power supply & manufacturing </li></ul><ul><li>Sensor, monitoring & surveillance: video streams </li></ul><ul><li>Security monitoring </li></ul><ul><li>Web logs and Web page click streams </li></ul><ul><li>Massive data sets (even saved but random access is too expensive) </li></ul>
    5. 5. Architecture: Stream Query Processing Scratch Space (Main memory and/or Disk) User/Application Stream Query Processor Results Multiple streams SDMS (Stream Data Management System) Continuous Query
    6. 6. Challenges of Stream Data Processing <ul><li>Multiple, continuous, rapid, time-varying, ordered streams </li></ul><ul><li>Main memory computations </li></ul><ul><li>Queries are often continuous </li></ul><ul><ul><li>Evaluated continuously as stream data arrives </li></ul></ul><ul><ul><li>Answer updated over time </li></ul></ul><ul><li>Queries are often complex </li></ul><ul><ul><li>Beyond element-at-a-time processing </li></ul></ul><ul><ul><li>Beyond stream-at-a-time processing </li></ul></ul><ul><ul><li>Beyond relational queries (scientific, data mining, OLAP) </li></ul></ul><ul><li>Multi-level/multi-dimensional processing and data mining </li></ul><ul><ul><li>Most stream data are at pretty low-level or multi-dimensional in nature </li></ul></ul>
    7. 7. Processing Stream Queries <ul><li>Query types </li></ul><ul><ul><li>One-time query vs. continuous query (being evaluated continuously as stream continues to arrive) </li></ul></ul><ul><ul><li>Predefined query vs. ad-hoc query (issued on-line) </li></ul></ul><ul><li>Unbounded memory requirements </li></ul><ul><ul><li>For real-time response, main memory algorithm should be used </li></ul></ul><ul><ul><li>Memory requirement is unbounded if one will join future tuples </li></ul></ul><ul><li>Approximate query answering </li></ul><ul><ul><li>With bounded memory, it is not always possible to produce exact answers </li></ul></ul><ul><ul><li>High-quality approximate answers are desired </li></ul></ul><ul><ul><li>Data reduction and synopsis construction methods </li></ul></ul><ul><ul><ul><li>Sketches, random sampling, histograms, wavelets, etc. </li></ul></ul></ul>
    8. 8. Projects on DSMS (Data Stream Management System) <ul><li>Research projects and system prototypes </li></ul><ul><ul><li>STREAM (Stanford): A general-purpose DSMS </li></ul></ul><ul><ul><li>Cougar (Cornell): sensors </li></ul></ul><ul><ul><li>Aurora (Brown/MIT): sensor monitoring, dataflow </li></ul></ul><ul><ul><li>Hancock (AT&T): telecom streams </li></ul></ul><ul><ul><li>Niagara (OGI/Wisconsin): Internet XML databases </li></ul></ul><ul><ul><li>OpenCQ (Georgia Tech): triggers, incr. view maintenance </li></ul></ul><ul><ul><li>Tapestry ( Xerox): pub/sub content-based filtering </li></ul></ul><ul><ul><li>Telegraph (Berkeley): adaptive engine for sensors </li></ul></ul><ul><ul><li>Tradebot (www.tradebot.com): stock tickers & streams </li></ul></ul><ul><ul><li>Tribeca (Bellcore): network monitoring </li></ul></ul><ul><ul><li>Streaminer (UIUC): new project for stream data mining </li></ul></ul>
    9. 9. Stream Data Mining vs. Stream Querying <ul><li>Stream mining — A more challenging task </li></ul><ul><ul><li>It shares most of the difficulties with stream querying </li></ul></ul><ul><ul><li>Patterns are hidden and more general than querying </li></ul></ul><ul><ul><li>It may require exploratory analysis </li></ul></ul><ul><ul><ul><li>Not necessarily continuous queries </li></ul></ul></ul><ul><li>Stream data mining tasks </li></ul><ul><ul><li>Multi-dimensional on-line analysis of streams </li></ul></ul><ul><ul><li>Mining outliers and unusual patterns in stream data </li></ul></ul><ul><ul><li>Clustering data streams </li></ul></ul><ul><ul><li>Classification of stream data </li></ul></ul>
    10. 10. Stream Data Mining Tasks <ul><li>Multi-dimensional (on-line) analysis of streams </li></ul><ul><li>Clustering data streams </li></ul><ul><li>Classification of data streams </li></ul><ul><li>Mining frequent patterns in data streams </li></ul><ul><li>Mining sequential patterns in data streams </li></ul><ul><li>Mining partial periodicity in data streams </li></ul><ul><li>Mining notable gradients in data streams </li></ul><ul><li>Mining outliers and unusual patterns in data streams </li></ul><ul><li>…… , more? </li></ul>
    11. 11. Challenges for Mining Dynamics in Data Streams <ul><li>Most stream data are at pretty low-level or multi-dimensional in nature: needs ML/MD processing </li></ul><ul><li>Analysis requirements </li></ul><ul><ul><li>Multi-dimensional trends and unusual patterns </li></ul></ul><ul><ul><li>Capturing important changes at multi-dimensions/levels </li></ul></ul><ul><ul><li>Fast, real-time detection and response </li></ul></ul><ul><ul><li>Comparing with data cube: Similarity and differences </li></ul></ul><ul><li>Stream (data) cube or stream OLAP: Is this feasible? </li></ul><ul><ul><li>Can we implement it efficiently? </li></ul></ul>
    12. 12. Multi-Dimensional Stream Analysis: Examples <ul><li>Analysis of Web click streams </li></ul><ul><ul><li>Raw data at low levels: seconds, web page addresses, user IP addresses, … </li></ul></ul><ul><ul><li>Analysts want: changes, trends, unusual patterns, at reasonable levels of details </li></ul></ul><ul><ul><li>E.g., Average clicking traffic in North America on sports in the last 15 minutes is 40% higher than that in the last 24 hours .” </li></ul></ul><ul><li>Analysis of power consumption streams </li></ul><ul><ul><li>Raw data: power consumption flow for every household, every minute </li></ul></ul><ul><ul><li>Patterns one may find: average hourly power consumption surges up 30% for manufacturing companies in Chicago in the last 2 hours today than that of the same day a week ago </li></ul></ul>
    13. 13. A Key Step — Stream Data Reduction <ul><li>Challenges of OLAPing stream data </li></ul><ul><ul><li>Raw data cannot be stored </li></ul></ul><ul><ul><li>Simple aggregates are not powerful enough </li></ul></ul><ul><ul><li>History shape and patterns at different levels are desirable: multi-dimensional regression analysis </li></ul></ul><ul><li>Proposal </li></ul><ul><ul><li>A scalable multi-dimensional stream “data cube” that can aggregate regression model of stream data efficiently without accessing the raw data </li></ul></ul><ul><li>Stream data compression </li></ul><ul><ul><li>Compress the stream data to support memory- and time-efficient multi-dimensional regression analysis </li></ul></ul>
    14. 14. Regression Cube for Time-Series <ul><li>Initially, one time-series per base cell </li></ul><ul><ul><li>Too costly to store all these time-series </li></ul></ul><ul><ul><li>Too costly to compute regression at multi-dimensional space </li></ul></ul><ul><li>Regression cube </li></ul><ul><ul><li>Base cube: only store regression parameters of base cells (e.g., 2 points vs. 1000 points) </li></ul></ul><ul><ul><li>All the upper level cuboids can be computed precisely for linear regression on both standard dimensions and time dimensions </li></ul></ul><ul><ul><li>For quadratic regression, we need 5 points </li></ul></ul><ul><ul><li>In general, we need: </li></ul></ul><ul><ul><ul><li>where k = 2 for quadratic. </li></ul></ul></ul>
    15. 15. Basics of General Linear Regression <ul><li>n tuples in one cell: ( x i , y i ), i = 1..n , where y i is the measure attribute to be analyzed </li></ul><ul><li>For sample i , a vector of k user-defined predictors u i : </li></ul><ul><li>The linear regression model: </li></ul><ul><li>where η is a k × 1 vector of regression parameters </li></ul>
    16. 16. Stock Price Example — Aggregation in Standard Dimensions <ul><li>Simple linear regression on time series data </li></ul><ul><ul><li>Cells of two companies </li></ul></ul><ul><ul><li>After aggregation: </li></ul></ul>
    17. 17. Stock Price Example — Aggregation in Time Dimension <ul><li>Cells of two adjacent </li></ul><ul><li>time intervals: </li></ul><ul><li>After aggregation </li></ul>
    18. 18. Feasibility of Stream Regression Analysis <ul><li>Efficient storage and scalable (independent of the number of tuples in data cells) </li></ul><ul><li>Lossless aggregation without accessing the raw data </li></ul><ul><li>Fast aggregation: computationally efficient </li></ul><ul><li>Regression models of data cells at all levels </li></ul><ul><li>General results: covered a large and the most popular class of regression </li></ul><ul><ul><li>Including quadratic, polynomial, and nonlinear models </li></ul></ul>
    19. 19. A Stream Cube Architecture <ul><li>A tilted time frame </li></ul><ul><ul><li>Different time granularities </li></ul></ul><ul><ul><ul><li>second, minute, quarter, hour, day, week, … </li></ul></ul></ul><ul><li>Critical layers </li></ul><ul><ul><li>Minimum interest layer (m-layer) </li></ul></ul><ul><ul><li>Observation layer (o-layer) </li></ul></ul><ul><ul><li>User: watches at o-layer and occasionally needs to drill-down down to m-layer </li></ul></ul><ul><li>Partial materialization of stream cubes </li></ul><ul><ul><li>Full materialization: too space and time consuming </li></ul></ul><ul><ul><li>No materialization: slow response at query time </li></ul></ul><ul><ul><li>Partial materialization: what do we mean “partial”? </li></ul></ul>
    20. 20. A Tilted Time-Frame Model Up to 7 days: Up to a year: Logarithmic (exponential) scale: 2t 1t Time Now 4t 8t 16t 31 days 24 hours 4 qtrs 12 months Time Now 24hrs 4qtrs 15minutes 7 days Time Now 25sec.
    21. 21. Benefits of Tilted Time-Frame Model <ul><li>Each cell stores the measures according to tilt-time-frame </li></ul><ul><ul><li>Limited memory space: Impossible to store the history in full scale </li></ul></ul><ul><li>Emphasis more on recent data </li></ul><ul><ul><li>Most applications emphasize on recent data (slide window) </li></ul></ul><ul><li>Natural partition on different time granularities </li></ul><ul><ul><li>Putting different weights on remote data </li></ul></ul><ul><ul><li>Useful even for uniform weight </li></ul></ul><ul><li>Tilted time-frame forms a new time dimension </li></ul><ul><ul><li>for mining changes and evolutions </li></ul></ul><ul><li>Essential for mining dynamic patterns or outliers </li></ul><ul><ul><li>Finding those with dramatic changes </li></ul></ul><ul><ul><li>E.g., exceptional stocks —not following the trends </li></ul></ul>
    22. 22. Two Critical Layers in the Stream Cube (*, theme, quarter) (user-group, URL-group, minute) m-layer (minimal interest) (individual-user, URL, second) (primitive) stream data layer o-layer (observation)
    23. 23. On-Line Materialization vs. On-Line Computation <ul><li>On-line materialization </li></ul><ul><ul><li>Materialization takes precious resources and time </li></ul></ul><ul><ul><ul><li>Only incremental materialization (with slide window) </li></ul></ul></ul><ul><ul><li>Only materialize “cuboids” of the critical layers? </li></ul></ul><ul><ul><ul><li>Some intermediate cells that should be materialized </li></ul></ul></ul><ul><ul><li>Popular path approach vs. exception cell approach </li></ul></ul><ul><ul><ul><li>Materialize intermediate cells along the popular paths </li></ul></ul></ul><ul><ul><ul><li>Exception cells: how to set up exception thresholds? </li></ul></ul></ul><ul><ul><ul><li>Notice exceptions do not have monotonic behavior </li></ul></ul></ul><ul><li>Computation problem </li></ul><ul><ul><li>How to compute and store stream cubes efficiently? </li></ul></ul><ul><ul><li>How to discover unusual cells between the critical layer? </li></ul></ul>
    24. 24. Stream Cube Structure: from m-layer to o-layer (A1, *, C1) (A1, *, C2) (A1, *, C2) (A1, *, C2) (A1, B1, C2) (A1, B2, C1) (A2, *, C2) (A2, B1, C1) (A1, B2, C2) (A2, B1, C2) A2, B2, C1) (A2, B2, C2)
    25. 25. Stream Cube Computation <ul><li>Cube structure from m-layer to o-layer </li></ul><ul><li>Three approaches </li></ul><ul><ul><li>All cuboids approach </li></ul></ul><ul><ul><ul><li>Materializing all cells (too much in both space and time) </li></ul></ul></ul><ul><ul><li>Exceptional cells approach </li></ul></ul><ul><ul><ul><li>Materializing only exceptional cells (saves space but not time to compute and definition of exception is not flexible ) </li></ul></ul></ul><ul><ul><li>Popular path approach </li></ul></ul><ul><ul><ul><li>Computing and materializing cells only along a popular path </li></ul></ul></ul><ul><ul><ul><li>Using H-tree structure to store computed cells (which form the stream cube —a selectively materialized cube ) </li></ul></ul></ul>
    26. 26. An H-Tree Cubing Structure root entertainment sports politics uic uiuc uic uiuc jeff mary jeff Jim Q.I. Q.I. Q.I. Observation layer Minimal int. layer Regression: Sum: xxxx Cnt: yyyy Quant-Info
    27. 27. Benefits of H-Tree and H-Cubing <ul><li>H-tree and H-cubing </li></ul><ul><ul><li>Developed for computing data cubes and ice-berg cubes </li></ul></ul><ul><ul><ul><li>J. Han, J. Pei, G. Dong, and K. Wang, “ Efficient Computation of Iceberg Cubes with Complex Measures ”, SIGMOD'01 </li></ul></ul></ul><ul><ul><li>Compressed database </li></ul></ul><ul><ul><li>Fast cubing </li></ul></ul><ul><ul><li>Space preserving in cube computation </li></ul></ul><ul><li>Using H-tree for stream cubing </li></ul><ul><ul><li>Space preserving </li></ul></ul><ul><ul><ul><li>Intermediate aggregates can be computed incrementally and saved in tree nodes </li></ul></ul></ul><ul><ul><li>Facilitate computing other cells and multi-dimensional analysis </li></ul></ul><ul><ul><li>H-tree with computed cells can be viewed as stream cube </li></ul></ul>
    28. 28. a) Time vs. m-layer size b) Space vs. m-layer size Time and Space vs. Number of Tuples at the m-Layer (Dataset D3L3C10T400K)
    29. 29. Time and Space vs. the Number of Levels a) Time vs. # levels b) Space vs. # levels
    30. 30. Mining Other Dynamic Patterns in Stream Data <ul><li>Clustering and outlier analysis for stream mining </li></ul><ul><ul><li>Clustering data streams (Guha, Motwani et al. 2000-2002) </li></ul></ul><ul><ul><li>History-sensitive, high-quality incremental clustering </li></ul></ul><ul><li>Classification of stream data </li></ul><ul><ul><li>Evolution of decision trees: Domingos et al. (2000, 2001) </li></ul></ul><ul><ul><li>Incremental integration of new streams in decision-tree induction </li></ul></ul><ul><li>Frequent pattern analysis </li></ul><ul><ul><li>Approximate frequent patterns ( Manku & Motwani VLDB’02) </li></ul></ul><ul><ul><li>Evolution and dramatic changes of frequent patterns </li></ul></ul>
    31. 31. Clustering for Mining Stream Dynamics <ul><li>Network intrusion detection: one example </li></ul><ul><ul><li>Detect bursts of activities or abrupt changes in real time — by on-line clustering </li></ul></ul><ul><li>Our methodology </li></ul><ul><ul><li>Tilted time frame work: o.w. dynamic changes cannot be found </li></ul></ul><ul><ul><li>Micro-clustering: better quality than k-means/k-median </li></ul></ul><ul><ul><ul><li>incremental, online processing and maintenance) </li></ul></ul></ul><ul><ul><li>Two stages: micro-clustering and macro-clustering </li></ul></ul><ul><ul><li>With limited “overhead” to achieve high efficiency, scalability, quality of results and power of evolution/change detection </li></ul></ul>
    32. 32. Classification for Dynamic Data Streams <ul><li>Decision tree induction for stream data classification </li></ul><ul><ul><li>VFDT (Very Fast Decision Tree)/CVFDT (Domingos, Hulten, Spencer, KDD00/KDD01) </li></ul></ul><ul><ul><li>Is decision-tree good for modeling fast changing data, e.g., stock market analysis? </li></ul></ul><ul><li>Methodology </li></ul><ul><ul><li>Tilted time framework </li></ul></ul><ul><ul><li>Instead of decision-trees, should we consider other models which do not changes drastically, e.g., Naïve Bayesian with boosting? </li></ul></ul><ul><ul><li>Incremental updating, dynamic maintenance, and model construction </li></ul></ul><ul><ul><li>Comparing of models to find changes </li></ul></ul>
    33. 33. Frequent Patterns for Stream Data <ul><li>Frequent pattern mining is valuable in stream applications </li></ul><ul><ul><li>e.g., network intrusion mining (Dokas, et al’02) </li></ul></ul><ul><li>Mining precise freq patterns in stream data: unrealistic </li></ul><ul><ul><li>Even store them in a compressed form, such as FPtree </li></ul></ul><ul><li>How to mine frequent patterns with good approximation? </li></ul><ul><ul><li>Approximate frequent patterns (Manku & Motwani VLDB’02) </li></ul></ul><ul><ul><li>Keep only current frequent patterns? No changes can be detected </li></ul></ul><ul><li>Our approach </li></ul><ul><ul><li>Keep Pattern-trees at the tilted time window frame (using tree-sharing method) </li></ul></ul><ul><ul><li>Mining evolution and dramatic changes of frequent patterns </li></ul></ul>
    34. 34. Conclusions <ul><li>Stream data mining: A rich and largely unexplored field </li></ul><ul><ul><li>Current research focus in database community: </li></ul></ul><ul><ul><ul><li>DSMS system architecture, continuous query processing, supporting mechanisms </li></ul></ul></ul><ul><ul><li>Stream data mining and stream OLAP analysis </li></ul></ul><ul><ul><ul><li>Powerful tools for finding general and unusual patterns </li></ul></ul></ul><ul><ul><ul><li>Effectiveness, efficiency and scalability: lots of open problems </li></ul></ul></ul><ul><li>Our philosophy: </li></ul><ul><ul><li>A multi-dimensional stream analysis framework </li></ul></ul><ul><ul><li>Time is a special dimension: tilted time frame </li></ul></ul><ul><ul><li>What to compute and what to save? — Critical layers </li></ul></ul><ul><ul><li>Very partial materialization/precomputation : popular path approach </li></ul></ul><ul><ul><li>Mining dynamics of stream data </li></ul></ul>
    35. 35. References <ul><li>B. Babcock, S. Babu, M. Datar, R. Motawani, and J. Widom, “Models and issues in data stream systems ” , PODS'02 (tutorial). </li></ul><ul><li>S. Babu and J. Widom, “Continuous queries over data streams”, SIGMOD Record, 30:109--120, 2001. </li></ul><ul><li>Y. Chen, G. Dong, J. Han, J. Pei, B. W. Wah, and J. Wang. “Online analytical processing stream data: Is it feasible?”, DMKD'02. </li></ul><ul><li>Y. Chen, G. Dong, J. Han, B. W. Wah, and J. Wang, “Multi-dimensional regression analysis of time-series data streams”, VLDB'02. </li></ul><ul><li>P. Domingos and G. Hulten, “Mining high-speed data streams”, KDD'00. </li></ul><ul><li>M. Garofalakis, J. Gehrke, and R. Rastogi, “Querying and mining data streams: You only get one look”, SIGMOD'02 (tutorial). </li></ul><ul><li>J. Gehrke, F. Korn, and D. Srivastava, “On computing correlated aggregates over continuous data streams”, SIGMOD'01. </li></ul><ul><li>S. Guha, N. Mishra, R. Motwani, and L. O'Callaghan, “Clustering data streams”, FOCS'00. </li></ul><ul><li>G. Hulten, L. Spencer, and P. Domingos, “Mining time-changing data streams”, KDD'01. </li></ul>
    36. 36. www.cs.uiuc.edu/~hanj <ul><li>Thank you !!! </li></ul>
    1. A particular slide catching your eye?

      Clipping is a handy way to collect important slides you want to go back to later.

    ×