March 20, 2024
Data Mining: Concepts and
Techniques 1
Data Mining for Data Streams
Mining Data Streams
 What is stream data? Why Stream Data Systems?
 Stream data management systems: Issues and solutions
 Stream data cube and multidimensional OLAP analysis
 Stream frequent pattern analysis
 Stream classification
 Stream cluster analysis
 Sketching
Characteristics of Data Streams
 Data Streams Model:
 Data enters at a high speed rate
 The system cannot store the entire stream, but only a small
fraction
 How do you make critical calculations about the stream using a
limited amount of memory?
 Characteristics
 Huge volumes of continuous data, possibly infinite
 Fast changing and requires fast, real-time response
 Random access is expensive: single-scan algorithms (can only have
one look)
Architecture: Stream Query Processing
[Figure: multiple streams flow into the Stream Query Processor of the SDMS (Stream Data Management System), which uses scratch space (main memory and/or disk). The user/application issues a continuous query and receives results.]
Stream Data Applications
 Telecommunication calling records
 Business: credit card transaction flows
 Network monitoring and traffic engineering
 Financial market: stock exchange
 Engineering & industrial processes: power supply &
manufacturing
 Sensor, monitoring & surveillance: video streams, RFIDs
 Web logs and Web page click streams
 Massive data sets (even saved but random access is too
expensive)
DBMS versus DSMS
DBMS | DSMS
Persistent relations | Transient streams
One-time queries | Continuous queries
Random access | Sequential access
“Unbounded” disk store | Bounded main memory
Only current state matters | Historical data is important
No real-time services | Real-time requirements
Relatively low update rate | Possibly multi-GB arrival rate
Data at any granularity | Data at fine granularity
Assume precise data | Data stale/imprecise
Access plan determined by query processor, physical DB design | Unpredictable/variable data arrival and characteristics
Ack. From Motwani’s PODS tutorial slides
Mining Data Streams
 What is stream data? Why Stream Data Systems?
 Stream data management systems: Issues and solutions
 Stream data cube and multidimensional OLAP analysis
 Stream frequent pattern analysis
 Stream classification
 Stream cluster analysis
Processing Stream Queries
 Query types
 One-time query vs. continuous query (being evaluated
continuously as stream continues to arrive)
 Predefined query vs. ad-hoc query (issued on-line)
 Unbounded memory requirements
 For real-time response, main memory algorithm should be used
 Memory requirement is unbounded if one will join future tuples
 Approximate query answering
 With bounded memory, it is not always possible to produce exact
answers
 High-quality approximate answers are desired
 Data reduction and synopsis construction methods
 Sketches, random sampling, histograms, wavelets, etc.
Methodologies for Stream Data Processing
 Major challenges
 Keep track of a large universe, e.g., pairs of IP addresses, not ages
 Methodology
 Synopses (trade-off between accuracy and storage)
 Use synopsis data structures, much smaller (O(log^k N) space) than
their base data set (O(N) space)
 Compute an approximate answer within a small error range
(factor ε of the actual answer)
 Major methods
 Random sampling
 Histograms
 Sliding windows
 Multi-resolution model
 Sketches
 Randomized algorithms
Stream Data Processing Methods (1)
 Random sampling (but without knowing the total length in advance)
 Reservoir sampling: maintain a set of s candidates in the reservoir, which
form a true random sample of the elements seen so far in the stream. As
the data stream flows, the N-th element has probability s/N
of replacing an old element in the reservoir.
 Sliding windows
 Make decisions based only on recent data of sliding window size w
 An element arriving at time t expires at time t + w
 Histograms
 Approximate the frequency distribution of element values in a stream
 Partition data into a set of contiguous buckets
 Equal-width (equal value range for buckets) vs. V-optimal (minimizing
frequency variance within each bucket)
 Multi-resolution models
 Popular models: balanced binary trees, micro-clusters, and wavelets
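The reservoir sampling bullet above can be sketched in a few lines. This is a minimal illustration, not code from the slides: the function name `reservoir_sample` and the optional `rng` argument are assumptions made for the example.

```python
import random

def reservoir_sample(stream, s, rng=None):
    """Return a uniform random sample of s items from a stream of unknown length."""
    rng = rng or random.Random()
    reservoir = []
    for n, item in enumerate(stream, start=1):
        if n <= s:
            reservoir.append(item)              # fill the reservoir first
        elif rng.random() < s / n:              # the n-th item is kept with probability s/n
            reservoir[rng.randrange(s)] = item  # it replaces a uniformly chosen resident
    return reservoir

sample = reservoir_sample(range(1000), 10, random.Random(0))
```

Only the s reservoir slots and a counter are kept in memory, which is why the total stream length does not need to be known in advance.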
Stream Data Mining vs. Stream Querying
 Stream mining—A more challenging task in many cases
 It shares most of the difficulties with stream querying
 But often requires less “precision”, e.g., no join,
grouping, sorting
 Patterns are hidden and more general than querying
 It may require exploratory analysis
 Not necessarily continuous queries
 Stream data mining tasks
 Multi-dimensional on-line analysis of streams
 Mining outliers and unusual patterns in stream data
 Clustering data streams
 Classification of stream data
Mining Data Streams
 What is stream data? Why Stream Data Systems?
 Stream data management systems: Issues and solutions
 Stream data cube and multidimensional OLAP analysis
 Stream frequent pattern analysis
 Stream classification
 Stream cluster analysis
 Research issues
Challenges for Mining Dynamics in Data
Streams
 Most stream data are at a rather low level of abstraction or multi-dimensional
in nature: they need multi-level/multi-dimensional (ML/MD) processing
 Analysis requirements
 Multi-dimensional trends and unusual patterns
 Capturing important changes at multi-dimensions/levels
 Fast, real-time detection and response
 Comparing with data cube: Similarity and differences
 Stream (data) cube or stream OLAP: Is this feasible?
 Can we implement it efficiently?
Multi-Dimensional Stream Analysis:
Examples
 Analysis of Web click streams
 Raw data at low levels: seconds, web page addresses, user IP
addresses, …
 Analysts want: changes, trends, unusual patterns, at reasonable
levels of details
 E.g., “Average clicking traffic in North America on sports in the last
15 minutes is 40% higher than that in the last 24 hours.”
 Analysis of power consumption streams
 Raw data: power consumption flow for every household, every
minute
 Patterns one may find: average hourly power consumption of
manufacturing companies in Chicago over the last 2 hours today is
up 30% compared with the same day a week ago
A Stream Cube Architecture
 A tilted time frame
 Different time granularities
 second, minute, quarter, hour, day, week, …
 Critical layers
 Minimum interest layer (m-layer)
 Observation layer (o-layer)
 User: watches at o-layer and occasionally needs to drill down
to m-layer
 Partial materialization of stream cubes
 Full materialization: too space- and time-consuming
 No materialization: slow response at query time
 Partial materialization…
A Tilted Time Model
 Natural tilted time frame:
 Example: Minimal: quarter, then 4 quarters → 1 hour, 24 hours →
day, …
 Logarithmic tilted time frame:
 Example: Minimal: 1 minute, then 1, 2, 4, 8, 16, 32, …
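[Figure: two time axes. Natural tilted time frame: 4 qtrs, 24 hours, 31 days, 12 months. Logarithmic tilted time frame: slots of width t, t, 2t, 4t, 8t, 16t, 32t, 64t, from most recent to oldest.]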
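The space saving of a tilted time frame can be made concrete with a small sketch. The helper `log_tilted_slots` and the constant `NATURAL_FRAME` are hypothetical names chosen for this example; the slot widths follow the t, t, 2t, 4t, ... pattern and the 4 / 24 / 31 / 12 unit counts named on the slide.

```python
def log_tilted_slots(total_units):
    """Slot widths in base time units, most recent first: 1, 1, 2, 4, 8, ..."""
    slots, width, covered = [1], 1, 1
    while covered < total_units:
        slots.append(width)      # each older slot doubles in width
        covered += width
        width *= 2
    return slots

# Natural tilted time frame from the slide: 4 quarters, 24 hours, 31 days, 12 months.
NATURAL_FRAME = {"quarter": 4, "hour": 24, "day": 31, "month": 12}

units_per_year = 366 * 24 * 4                 # quarter-hours in a (leap) year
natural_units = sum(NATURAL_FRAME.values())   # units kept by the natural frame
log_units = len(log_tilted_slots(units_per_year))
```

Under these assumptions a year of quarter-hour data (35,136 base units) is covered by 71 slots in the natural frame and by 17 slots in the logarithmic frame, at the price of coarser detail for older data.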
Two Critical Layers in the Stream Cube
o-layer (observation): (*, theme, quarter)
m-layer (minimal interest): (user-group, URL-group, minute)
(primitive) stream data layer: (individual-user, URL, second)
On-Line Partial Materialization vs.
OLAP Processing
 On-line materialization
 Materialization takes precious space and time
 Only incremental materialization (with tilted time frame)
 Only materialize “cuboids” of the critical layers?
 Online computation may take too much time
 Preferred solution:
 popular-path approach: Materializing those along the popular
drilling paths
 H-tree structure: Such cuboids can be computed and stored
efficiently using the H-tree structure
 Online aggregation vs. query-based computation
 Online computing while streaming: aggregating stream cubes
 Query-based computation: using computed cuboids
Mining Data Streams
 What is stream data? Why Stream Data Systems?
 Stream data management systems: Issues and solutions
 Stream data cube and multidimensional OLAP analysis
 Stream frequent pattern analysis
 Stream classification
 Stream cluster analysis
Mining Approximate Frequent Patterns
 Mining precise freq. patterns in stream data: unrealistic
 Even storing them in a compressed form, such as an FP-tree, is unrealistic
 Approximate answers are often sufficient (e.g., trend/pattern analysis)
 Example: a router is interested in all flows:
 whose frequency is at least 1% (s) of the entire traffic stream
seen so far
 and feels that 1/10 of s (ε = 0.1%) error is comfortable
 How to mine frequent patterns with good approximation?
 Lossy Counting Algorithm (Manku & Motwani, VLDB’02)
 Based on Majority Voting…
Majority
 A sequence of N items.
 You have constant memory.
 In one pass, decide if some item is in
majority (occurs > N/2 times)?
Example: 9 3 9 9 9 4 6 7 9 9 9 2
N = 12; item 9 is majority (7 occurrences > N/2 = 6)
Misra-Gries Algorithm (‘82)
 A counter and an ID.
 If new item is same as stored ID, increment counter.
 Otherwise, decrement the counter.
 If the counter is 0, store the new item with count = 1.
 If counter > 0, then its item is the only candidate for majority.
item:  2 9 9 9 7 6 4 9 9 9 3 9
ID:    2 2 9 9 9 9 4 4 9 9 9 9
count: 1 0 1 2 1 0 1 0 1 2 1 2
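The one-counter procedure and its trace can be reproduced with a short sketch. The function name `majority_candidate` and the returned trace are assumptions made for illustration; the update rules are the ones listed on the slide.

```python
def majority_candidate(stream):
    """One pass, constant memory: returns (candidate, count, trace of (ID, count))."""
    candidate, count, trace = None, 0, []
    for item in stream:
        if count == 0:
            candidate, count = item, 1   # counter is 0: store the new item with count 1
        elif item == candidate:
            count += 1                   # same as the stored ID: increment
        else:
            count -= 1                   # different item: decrement
        trace.append((candidate, count))
    return candidate, count, trace

stream = [2, 9, 9, 9, 7, 6, 4, 9, 9, 9, 3, 9]
candidate, count, trace = majority_candidate(stream)
```

The surviving candidate is only a candidate: if a majority item exists it must be the one stored, but a second pass is needed to confirm that it really occurs more than N/2 times.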
A generalization: Frequent Items
Find the items (at most k) that each occur more than N/(k+1) times.
 Algorithm:
 Maintain k items, and their counters.
 If next item x is one of the k, increment its counter.
 Else if some counter is zero, put x there with count = 1
 Else (all counters non-zero) decrement all k counters
[Figure: a table of k entries ID1, ID2, ..., IDk, each with its count.]
Frequent Elements: Analysis
 A frequent item’s count is decremented only when all k counters
are non-zero; each such step erases k+1 items (one per counter plus the new item).
 So at most N/(k+1) such steps can happen; if x occurs > N/(k+1) times,
it cannot be completely erased.
 Similarly, x must get inserted at some point, because
there are not enough items to keep it away.
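The k-counter generalization can be sketched as follows. This is an illustrative version, assuming a dictionary for the k (ID, count) slots; the name `misra_gries` is chosen for the example, and a zero counter is represented by deleting its entry.

```python
def misra_gries(stream, k):
    """Keep at most k counters; every item occurring > N/(k+1) times survives."""
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1             # x is one of the k: increment its counter
        elif len(counters) < k:
            counters[x] = 1              # a free (zero) slot exists: store x there
        else:
            for key in list(counters):   # all k slots are in use: decrement all of them
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]    # a counter that reaches zero frees its slot
    return counters

result = misra_gries(["a", "b", "a", "c", "a", "d", "a", "e", "a"], k=2)
```

Here `a` occurs 5 times out of N = 9, more than N/(k+1) = 3, so it is guaranteed to remain; its stored count (3) is a lower bound on its true frequency, not the exact value.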
Problem of False Positives
 False positives in Misra-Gries algorithm
 It identifies all true heavy hitters, but not all reported items
are necessarily heavy hitters.
 How can we tell if the non-zero counters correspond to true
heavy hitters or not?
 A second pass is needed to verify.
 False positives are problematic if heavy hitters are
used for billing or punishment.
 What guarantees can we achieve in one pass?
Approximation Guarantees
 Find heavy hitters with a guaranteed approximation error [Demaine et
al., Manku-Motwani, Estan-Varghese…]
 Manku-Motwani (Lossy Counting)
 Suppose you want φ-heavy hitters: items with freq > φN
 An approximation parameter ε, where ε << φ.
(E.g., φ = .01 and ε = .0001; φ = 1% and ε = .01%)
 Identify all items with frequency > φN
 No reported item has frequency < (φ - ε)N
 The algorithm uses O((1/ε) log(εN)) memory
G. Manku, R. Motwani. Approximate Frequency Counts over Data Streams, VLDB’02
Lossy Counting
Step 1: Divide the stream into ‘windows’
Is window size a function of support s? Will fix later…
Window 1 Window 2 Window 3
Lossy Counting in Action ...
[Figure: the items of the first window are added to the initially empty set of frequency counts.]
At window boundary, decrement all counters by 1
Lossy Counting continued ...
[Figure: the items of the next window are added to the existing frequency counts.]
At window boundary, decrement all counters by 1
Error Analysis
If current size of stream = N
and window-size = 1/ε
then #windows = εN
How much do we undercount?
frequency error ≤ #windows = εN
Rule of thumb:
Set ε = 10% of support s
Example:
Given support frequency s = 1%,
set error frequency ε = 0.1%
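A worked instance of this error analysis, assuming a hypothetical stream length of N = 10^6 (the slides do not fix N) together with the rule-of-thumb values s = 1% and ε = 0.1%:

$$
\begin{aligned}
\text{window size} &= 1/\epsilon = 1000 \\
\#\text{windows} &= \epsilon N = 0.001 \times 10^{6} = 1000 \\
\text{frequency error} &\le \epsilon N = 1000 \\
sN &= 0.01 \times 10^{6} = 10000
\end{aligned}
$$

Under these assumptions, an item that truly occurs 10,000 times still holds a counter of at least 9,000, because each counter loses at most 1 per window.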
How many counters do we need?
Worst case: 1/ε log (ε N) counters [See paper for proof]
Output:
Elements with counter values exceeding sN – εN
Approximation guarantees
Frequencies underestimated by at most εN
No false negatives
False positives have true frequency at least sN – εN
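The windowed procedure described on these slides can be sketched as below. This follows the simplified slide version (count everything, decrement all counters by 1 at each window boundary), not necessarily every detail of the Manku-Motwani paper; the name `lossy_counting` is an assumption, and the output test uses `>=` so that items at exactly the threshold are not dropped.

```python
import math

def lossy_counting(stream, s, eps):
    """Return ({item: counter} for reported items, N)."""
    width = math.ceil(1 / eps)               # window size 1/eps
    counts, n = {}, 0
    for x in stream:
        n += 1
        counts[x] = counts.get(x, 0) + 1
        if n % width == 0:                   # window boundary
            for key in list(counts):
                counts[key] -= 1             # decrement all counters by 1
                if counts[key] == 0:
                    del counts[key]          # drop counters that reach zero
    threshold = (s - eps) * n                # report counters of at least sN - eps*N
    return {x: c for x, c in counts.items() if c >= threshold}, n

# Illustrative stream: "a" is every second item, all other items are unique.
stream = ["a" if i % 2 == 0 else f"u{i}" for i in range(100)]
frequent, n = lossy_counting(stream, s=0.3, eps=0.1)
```

In this toy run `a` truly occurs 50 times and is reported with a counter of 40: it is undercounted by 10, which matches the bound εN = 0.1 × 100.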
Enhancements ...
Frequency Errors
For counter (X, c), true frequency in [c, c+εN]
Trick: Remember window-id’s
For counter (X, c, w), true frequency in [c, c+w-1]
If (w = 1), no error!
Batch Processing
Decrements after k windows
Algorithm 2: Sticky Sampling
Stream
 Create counters by sampling
 Maintain exact counts thereafter
What rate should we sample?
Sticky Sampling contd...
For finite stream of length N
Sampling rate = 2/(Nε) · log(1/(sδ))
Same Rule of thumb:
Set ε = 10% of support s
Example:
Given support threshold s = 1%,
set error threshold ε = 0.1%
set failure probability δ = 0.01%
Output:
Elements with counter values exceeding sN – εN
Same error guarantees
as Lossy Counting
but probabilistic
Approximation guarantees (probabilistic)
Frequencies underestimated by at most εN
No false negatives
False positives have true frequency at least sN – εN
δ = probability of failure
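The finite-stream variant described on this slide can be sketched as follows. It assumes the stream length N is known up front, that the logarithm is the natural log, and that the failure probability is the symbol δ; the function name `sticky_sampling` and its parameters are illustrative. The adaptive-rate version for unknown N is in the paper and is not reproduced here.

```python
import math
import random

def sticky_sampling(stream, n, s, eps, delta, rng=None):
    """Create counters by sampling, then count exactly; report counters >= (s - eps) * n."""
    rng = rng or random.Random()
    # Sampling rate 2/(N*eps) * log(1/(s*delta)), capped at 1.
    rate = min(1.0, 2.0 / (n * eps) * math.log(1.0 / (s * delta)))
    counts = {}
    for x in stream:
        if x in counts:
            counts[x] += 1            # counter exists: maintain an exact count thereafter
        elif rng.random() < rate:
            counts[x] = 1             # no counter yet: create one with probability `rate`
    threshold = (s - eps) * n
    return {x: c for x, c in counts.items() if c >= threshold}

# Illustrative stream: "a" is every second item, all other items are unique integers.
stream = ["a" if i % 2 == 0 else i for i in range(100_000)]
result = sticky_sampling(stream, 100_000, s=0.01, eps=0.001, delta=1e-4,
                         rng=random.Random(1))
```

Unlike Lossy Counting, the guarantee here is probabilistic: an item's counter misses only the occurrences that arrived before it was first sampled.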
Sampling rate?
Finite stream of length N
Sampling rate: 2/(Nε) · log(1/(sδ))
Infinite stream with unknown N
Gradually adjust sampling rate (see paper for details)
In either case,
Expected number of counters = (2/ε) · log(1/(sδ))
Independent of N!
[Figure: number of counters versus stream length N, on a linear scale and on a log10 scale, for support s = 1% and error ε = 0.1%. Sticky Sampling expected: (2/ε) log(1/(sδ)); Lossy Counting worst case: (1/ε) log(εN).]
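The two space bounds from the plot can be evaluated numerically. The sketch below assumes natural logarithms and takes δ = 0.01% from the earlier Sticky Sampling example, since the plot itself only states s and ε; the function names are illustrative.

```python
import math

def sticky_expected(s, eps, delta):
    """Expected number of Sticky Sampling counters: (2/eps) * log(1/(s*delta))."""
    return 2 / eps * math.log(1 / (s * delta))

def lossy_worst_case(eps, n):
    """Worst-case number of Lossy Counting counters: (1/eps) * log(eps*N)."""
    return 1 / eps * math.log(eps * n)

s, eps, delta = 0.01, 0.001, 1e-4
sticky = sticky_expected(s, eps, delta)
lossy = {n: lossy_worst_case(eps, n) for n in (10**6, 10**9, 10**12)}
```

With these values the Sticky Sampling bound is a constant of about 27,600 counters, while the Lossy Counting worst case grows with log N and stays below that constant for all three stream lengths tried here.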
From elements
to sets of elements …
Frequent Itemsets Problem ...
Stream
 Identify all subsets of items whose
current frequency exceeds s = 0.1%.
Frequent Itemsets => Association Rules
Three Modules
BUFFER
TRIE
SUBSET-GEN
Module 1: TRIE
Compact representation of frequent itemsets in lexicographic order.
[Figure: trie of itemsets with frequency counts 50, 40, 30, 31, 29, 45, 32, 42.]
Sets with frequency counts
Module 2: BUFFER
Compact representation as sequence of ints
Transactions sorted by item-id
Bitmap for transaction boundaries
[Figure: Windows 1 to 6 of the stream held in main memory.]
Module 3: SUBSET-GEN
[Figure: SUBSET-GEN turns the transactions in BUFFER into frequency counts of subsets in lexicographic order.]
Overall Algorithm ...
[Figure: BUFFER feeds SUBSET-GEN; its output is combined with the TRIE to produce the new TRIE.]
Problem: Number of subsets is exponential!
SUBSET-GEN Pruning Rules
A-priori Pruning Rule
If set S is infrequent, every superset of S is infrequent.
See paper for details ...
Lossy Counting Pruning Rule
At each ‘window boundary’ decrement TRIE counters by 1.
Actually, ‘Batch Deletion’:
At each ‘main memory buffer’ boundary,
decrement all TRIE counters by b.
Bottlenecks ...
[Figure: the same BUFFER, SUBSET-GEN, TRIE and new TRIE pipeline, annotated with what consumes main memory and what consumes CPU time.]
Design Decisions for Performance
TRIE Main memory bottleneck
Compact linear array
 (element, counter, level) in preorder traversal
 No pointers!
Tries are on disk
 All of main memory devoted to BUFFER
Pair of tries
 old and new (in chunks)
mmap() and madvise()
SUBSET-GEN CPU bottleneck
Very fast implementation
 See paper for details
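The pointer-free layout mentioned above, (element, counter, level) rows in preorder, can be illustrated with a small sketch. The nested-dictionary input format and the name `flatten_trie` are assumptions for the example; the paper's actual on-disk format is not reproduced here.

```python
def flatten_trie(trie, level=0):
    """Preorder-flatten {item: (count, children)} into (element, counter, level) rows."""
    rows = []
    for item in sorted(trie):                  # lexicographic order of item ids
        count, children = trie[item]
        rows.append((item, count, level))      # the node itself comes first (preorder)
        rows.extend(flatten_trie(children, level + 1))
    return rows

# Hypothetical itemset counts: {1}:50, {1,2}:40, {1,3}:30, {2}:45.
trie = {1: (50, {2: (40, {}), 3: (30, {})}), 2: (45, {})}
rows = flatten_trie(trie)
```

Because a node's subtree is exactly the run of following rows with a larger level, parent and child relationships can be recovered from the array alone, so no pointers need to be stored.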
Mining Data Streams
 What is stream data? Why Stream Data Systems?
 Stream data management systems: Issues and solutions
 Stream data cube and multidimensional OLAP analysis
 Stream frequent pattern analysis
 Stream classification
 Stream cluster analysis
Classification for Dynamic Data Streams
 Decision tree induction for stream data classification
 VFDT (Very Fast Decision Tree)/CVFDT (Domingos, Hulten,
Spencer, KDD00/KDD01)
 Is decision-tree good for modeling fast changing data, e.g., stock
market analysis?
 Other stream classification methods
 Instead of decision-trees, consider other models
 Naïve Bayesian
 Ensemble (Wang, Fan, Yu, Han. KDD’03)
 K-nearest neighbors (Aggarwal, Han, Wang, Yu. KDD’04)
 Tilted time framework, incremental updating, dynamic
maintenance, and model construction
 Comparing models to find changes
Hoeffding Tree
 With high probability, classifies tuples the same
 Only uses small sample
 Based on Hoeffding Bound principle
 Hoeffding Bound (Additive Chernoff Bound)
r: random variable
R: range of r
n: # independent observations
Mean of r is at least ravg – ε, with probability 1 – δ, where
$\epsilon = \sqrt{\frac{R^2 \ln(1/\delta)}{2n}}$
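To see what the bound implies for sample size, take it in the form $\epsilon = \sqrt{R^2 \ln(1/\delta) / (2n)}$ and solve for n. The numbers below are illustrative assumptions (R = 1, δ = 10^{-7}, ε = 0.05), not values from the slides:

$$
n = \frac{R^2 \ln(1/\delta)}{2\epsilon^2} = \frac{1 \cdot \ln(10^{7})}{2 \cdot 0.05^{2}} \approx \frac{16.118}{0.005} \approx 3224
$$

So roughly 3,200 independent observations are enough for the true mean to be at least the observed mean minus 0.05 with probability 1 − 10^{-7}, whatever the distribution of r.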
Hoeffding Tree Algorithm
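 Hoeffding Tree Input
S: sequence of examples
X: attributes
G( ): evaluation function
δ: desired accuracy (the bound holds with probability 1 – δ)
 Hoeffding Tree Algorithm
for each example in S
    sort the example to a leaf and update that leaf's statistics
    retrieve G(Xa) and G(Xb) // the two highest G(Xi) at that leaf
    if ( G(Xa) – G(Xb) > ε ) // ε: Hoeffding bound for the n examples seen at the leaf
        split on Xa
        recurse to next node
        break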
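The split test at the heart of the algorithm can be sketched in isolation. This is not a full Hoeffding tree: it assumes the G values for a leaf have already been computed from the n examples seen there, and the names `hoeffding_bound`, `should_split` and the attribute labels are hypothetical.

```python
import math

def hoeffding_bound(value_range, delta, n):
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n))."""
    return math.sqrt(value_range ** 2 * math.log(1 / delta) / (2 * n))

def should_split(gains, value_range, delta, n):
    """gains: {attribute: G(attribute)} at one leaf. Returns (attribute or None, epsilon)."""
    ranked = sorted(gains.items(), key=lambda kv: kv[1], reverse=True)
    (best, g_a), (_, g_b) = ranked[0], ranked[1]     # the two highest G(Xi)
    eps = hoeffding_bound(value_range, delta, n)
    return (best if g_a - g_b > eps else None), eps

gains = {"packets": 0.30, "protocol": 0.10, "bytes": 0.05}   # made-up G values
early, _ = should_split(gains, value_range=1.0, delta=1e-7, n=100)
later, _ = should_split(gains, value_range=1.0, delta=1e-7, n=1000)
```

With the same observed gap of 0.20, the leaf is not split after 100 examples (ε is about 0.28) but is split on `packets` after 1,000 examples (ε is about 0.09), which shows how the bound trades waiting time for confidence.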
Decision-Tree Induction with Data Streams
[Figure: two snapshots of a decision tree built from a data stream, with yes/no tests such as Packets > 10, Protocol = http, Protocol = ftp and, in the later snapshot, Bytes > 60K.]
Ack. From Gehrke’s SIGMOD tutorial slides
Hoeffding Tree: Strengths and Weaknesses
 Strengths
 Scales better than traditional methods
 Sublinear with sampling
 Very small memory utilization
 Incremental
 Make class predictions in parallel
 New examples are added as they come
 Weakness
 Could spend a lot of time with ties
 Memory used with tree expansion
 Number of candidate attributes
VFDT (Very Fast Decision Tree)
 Modifications to Hoeffding Tree
 Near-ties broken more aggressively
 G recomputed only every n_min examples
 Deactivates certain leaves to save memory
 Poor attributes dropped
 Initialize with traditional learner (helps learning curve)
 Compare to Hoeffding Tree: Better time and memory
 Compare to traditional decision tree
 Similar accuracy
 Better runtime with 1.61 million examples
 21 minutes for VFDT
 24 hours for C4.5
CVFDT (Concept-adapting VFDT)
 Concept Drift
 Time-changing data streams
 Incorporate new and eliminate old
 CVFDT
 Increments count with new example
 Decrement old example
 Sliding window
 Nodes assigned monotonically increasing IDs
 Grows alternate subtrees
 When alternate more accurate => replace old
 O(w) better runtime than VFDT-window
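The increment/decrement bookkeeping over a sliding window can be sketched for a single node. This is only the counting part, under the assumption that the statistics are counts of (attribute value, class label) pairs; alternate subtrees and node IDs are left out, and `WindowedCounts` is a hypothetical name.

```python
from collections import Counter, deque

class WindowedCounts:
    """Counts of (attribute value, label) pairs over the last w examples."""

    def __init__(self, w):
        self.w = w
        self.window = deque()
        self.counts = Counter()

    def add(self, attr_value, label):
        key = (attr_value, label)
        self.window.append(key)
        self.counts[key] += 1              # increment count with the new example
        if len(self.window) > self.w:
            old = self.window.popleft()    # the oldest example leaves the window
            self.counts[old] -= 1          # decrement the count of the old example
            if self.counts[old] == 0:
                del self.counts[old]

wc = WindowedCounts(w=3)
for value, label in [("http", "yes"), ("ftp", "no"), ("http", "yes"), ("http", "no")]:
    wc.add(value, label)
```

Each arriving example costs a constant number of updates, whereas rebuilding the statistics from all w examples in the window would cost O(w) per example.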
Mining Data Streams
 What is stream data? Why Stream Data Systems?
 Stream data management systems: Issues and solutions
 Stream data cube and multidimensional OLAP analysis
 Stream frequent pattern analysis
 Stream classification
 Stream cluster analysis
 Research issues
Clustering Data Streams [GMMO01]
 Based on the k-median method
 Data stream points from metric space
 Find k clusters in the stream s.t. the sum of distances
from data points to their closest center is minimized
 Constant factor approximation algorithm
 In small space, a simple two step algorithm:
1. For each set of M records, Si, find O(k) centers in
S1, …, Sl
 Local clustering: Assign each point in Si to its
closest center
2. Let S’ be centers for S1, …, Sl with each center
weighted by number of points assigned to it
 Cluster S’ to find k centers
Hierarchical Clustering Tree
[Figure: a tree with data points at the bottom, level-i medians above them, and level-(i+1) medians above those.]
Hierarchical Tree and Drawbacks
 Method:
 maintain at most m level-i medians
 On seeing m of them, generate O(k) level-(i+1)
medians of weight equal to the sum of the weights of
the intermediate medians assigned to them
 Drawbacks:
 Low quality for evolving data streams (register only k
centers)
 Limited functionality in discovering and exploring
clusters over different portions of the stream over time
Clustering for Mining Stream Dynamics
 Network intrusion detection: one example
 Detect bursts of activities or abrupt changes in real time—by on-
line clustering
 Another approach:
 Tilted time framework: otherwise dynamic changes cannot be found
 Micro-clustering: better quality than k-means/k-median
 incremental, online processing and maintenance
 Two stages: micro-clustering and macro-clustering
 With limited “overhead” to achieve high efficiency, scalability,
quality of results and power of evolution/change detection
CluStream: A Framework for Clustering
Evolving Data Streams
 Design goal
 High quality for clustering evolving data streams with greater
functionality
 While keeping the stream mining requirements in mind
 One-pass over the original stream data
 Limited space usage and high efficiency
 CluStream: A framework for clustering evolving data streams
 Divide the clustering process into online and offline components
 Online component: periodically stores summary statistics about
the stream data
 Offline component: answers various user questions based on
the stored summary statistics
The CluStream Framework
 Micro-cluster
 Statistical information about data locality
 Temporal extension of the cluster-feature vector
 Multi-dimensional points $\bar{X}_1 \ldots \bar{X}_k \ldots$ with time stamps $T_1 \ldots T_k \ldots$
 Each point contains d dimensions, i.e., $\bar{X}_i = (x_i^1 \ldots x_i^d)$
 A micro-cluster for n points is defined as a (2·d + 3)
tuple $(\overline{CF2^x}, \overline{CF1^x}, CF2^t, CF1^t, n)$
 Pyramidal time frame
 Decide at what moments the snapshots of the
statistical information are stored away on disk
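The (2·d + 3)-tuple can be sketched as a small class, taking CF1 as per-dimension sums and CF2 as per-dimension sums of squares (with the same two statistics for the time stamps). The class name `MicroCluster` and its methods are assumptions made for illustration.

```python
class MicroCluster:
    """Additive summary (CF2x, CF1x, CF2t, CF1t, n) of the points absorbed so far."""

    def __init__(self, d):
        self.cf2x = [0.0] * d   # per-dimension sum of squared values
        self.cf1x = [0.0] * d   # per-dimension sum of values
        self.cf2t = 0.0         # sum of squared time stamps
        self.cf1t = 0.0         # sum of time stamps
        self.n = 0              # number of points

    def insert(self, x, t):
        for j, v in enumerate(x):
            self.cf1x[j] += v
            self.cf2x[j] += v * v
        self.cf1t += t
        self.cf2t += t * t
        self.n += 1

    def merge(self, other):
        # Every component is a sum, so two micro-clusters merge by addition.
        self.cf1x = [a + b for a, b in zip(self.cf1x, other.cf1x)]
        self.cf2x = [a + b for a, b in zip(self.cf2x, other.cf2x)]
        self.cf1t += other.cf1t
        self.cf2t += other.cf2t
        self.n += other.n

    def centroid(self):
        return [v / self.n for v in self.cf1x]

    def as_tuple(self):
        return (*self.cf2x, *self.cf1x, self.cf2t, self.cf1t, self.n)

a = MicroCluster(d=2)
a.insert([1.0, 2.0], t=1)
a.insert([3.0, 4.0], t=2)
```

Because every component is additive, summaries can also be subtracted, which is what makes it possible to work with stored snapshots over a chosen time horizon.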
CluStream: Pyramidal Time Frame
 Pyramidal time frame
 Snapshots of a set of micro-clusters are stored
following the pyramidal pattern
 They are stored at differing levels of granularity
depending on recency
 Snapshots are classified into different orders
varying from 1 to log(T)
 The i-th order snapshots occur at intervals of α^i
where α ≥ 1
 Only the last (α + 1) snapshots of each order are stored
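The retention rule can be sketched as a function that lists which snapshot times are still stored at a given clock time. It assumes integer clock ticks starting at 1 and an integer α ≥ 2 (so that the orders grow), and it does not remove snapshots that appear under several orders; `pyramidal_snapshots` is a hypothetical name.

```python
def pyramidal_snapshots(now, alpha=2):
    """Map each order i to the (at most alpha + 1) snapshot times kept at time `now`."""
    if alpha < 2:
        raise ValueError("this sketch assumes an integer alpha >= 2")
    kept, order = {}, 0
    while alpha ** order <= now:
        step = alpha ** order                  # order-i snapshots occur every alpha^i ticks
        last = now - now % step                # most recent order-i snapshot time
        times = [last - j * step for j in range(alpha + 1)]
        kept[order] = [t for t in times if t > 0]
        order += 1
    return kept

kept = pyramidal_snapshots(now=8, alpha=2)
distinct = sorted({t for times in kept.values() for t in times})
```

At time 8 with α = 2 only the snapshots taken at times 4, 6, 7 and 8 are retained: recent history is kept at fine granularity and older history more coarsely, with the number of orders growing like log(T).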
CluStream: Clustering On-line Streams
 Online micro-cluster maintenance
 Initial creation of q micro-clusters
 q is usually significantly larger than the number of natural
clusters
 Online incremental update of micro-clusters
 If new point is within max-boundary, insert into the micro-
cluster
 Otherwise, create a new micro-cluster
 May delete obsolete micro-cluster or merge two closest ones
 Query-based macro-clustering
 Based on a user-specified time-horizon h and the number of
macro-clusters K, compute macroclusters using the k-means
algorithm
References on Stream Data Mining (1)
 C. Aggarwal, J. Han, J. Wang, P. S. Yu. A Framework for Clustering Evolving Data
Streams, VLDB'03
 C. C. Aggarwal, J. Han, J. Wang and P. S. Yu. On-Demand Classification of Evolving Data
Streams, KDD'04
 C. Aggarwal, J. Han, J. Wang, and P. S. Yu. A Framework for Projected Clustering of
High Dimensional Data Streams, VLDB'04
 S. Babu and J. Widom. Continuous Queries over Data Streams. SIGMOD Record, Sept.
2001
 B. Babcock, S. Babu, M. Datar, R. Motwani and J. Widom. Models and Issues in Data
Stream Systems, PODS'02. (Conference tutorial)
 Y. Chen, G. Dong, J. Han, B. W. Wah, and J. Wang. Multi-Dimensional Regression
Analysis of Time-Series Data Streams, VLDB'02
 P. Domingos and G. Hulten, “Mining high-speed data streams”, KDD'00
 A. Dobra, M. N. Garofalakis, J. Gehrke, R. Rastogi. Processing Complex Aggregate
Queries over Data Streams, SIGMOD’02
 J. Gehrke, F. Korn, D. Srivastava. On computing correlated aggregates over continuous
data streams. SIGMOD'01
 C. Giannella, J. Han, J. Pei, X. Yan and P.S. Yu. Mining frequent patterns in data streams
at multiple time granularities, Kargupta, et al. (eds.), Next Generation Data Mining’04
References on Stream Data Mining (2)
 S. Guha, N. Mishra, R. Motwani, and L. O'Callaghan. Clustering Data Streams, FOCS'00
 G. Hulten, L. Spencer and P. Domingos: Mining time-changing data streams. KDD 2001
 S. Madden, M. Shah, J. Hellerstein, V. Raman, Continuously Adaptive Continuous Queries
over Streams, SIGMOD’02
 G. Manku, R. Motwani. Approximate Frequency Counts over Data Streams, VLDB’02
 A. Metwally, D. Agrawal, and A. El Abbadi. Efficient Computation of Frequent and Top-k
Elements in Data Streams. ICDT'05
 S. Muthukrishnan, Data streams: algorithms and applications, Proceedings of the
fourteenth annual ACM-SIAM symposium on Discrete algorithms, 2003
 R. Motwani and P. Raghavan, Randomized Algorithms, Cambridge Univ. Press, 1995
 S. Viglas and J. Naughton, Rate-Based Query Optimization for Streaming Information
Sources, SIGMOD’02
 Y. Zhu and D. Shasha. StatStream: Statistical Monitoring of Thousands of Data Streams
in Real Time, VLDB’02
 H. Wang, W. Fan, P. S. Yu, and J. Han, Mining Concept-Drifting Data Streams using
Ensemble Classifiers, KDD'03

Array and its types and it's implemented programming Final.pdfArray and its types and it's implemented programming Final.pdf
Array and its types and it's implemented programming Final.pdfajajkhan16
 
sparkbigdataanlyticspoweerpointpptt.pptx
sparkbigdataanlyticspoweerpointpptt.pptxsparkbigdataanlyticspoweerpointpptt.pptx
sparkbigdataanlyticspoweerpointpptt.pptxajajkhan16
 
20-security.ppt
20-security.ppt20-security.ppt
20-security.pptajajkhan16
 
Linked list.pptx
Linked list.pptxLinked list.pptx
Linked list.pptxajajkhan16
 
08 MK-PPT Advanced Topic 2.ppt
08 MK-PPT Advanced Topic 2.ppt08 MK-PPT Advanced Topic 2.ppt
08 MK-PPT Advanced Topic 2.pptajajkhan16
 
Unit-I Recursion.pptx
Unit-I Recursion.pptxUnit-I Recursion.pptx
Unit-I Recursion.pptxajajkhan16
 
Unit-I Pointer Data structure.pptx
Unit-I Pointer Data structure.pptxUnit-I Pointer Data structure.pptx
Unit-I Pointer Data structure.pptxajajkhan16
 
Unit-1 DataStructure Intro.pptx
Unit-1 DataStructure Intro.pptxUnit-1 DataStructure Intro.pptx
Unit-1 DataStructure Intro.pptxajajkhan16
 

More from ajajkhan16 (15)

Unit-II DBMS presentation for students.pdf
Unit-II DBMS presentation for students.pdfUnit-II DBMS presentation for students.pdf
Unit-II DBMS presentation for students.pdf
 
data stream processing.and its applications pdf
data stream processing.and its applications pdfdata stream processing.and its applications pdf
data stream processing.and its applications pdf
 
NOSQL in big data is the not only structure langua.pdf
NOSQL in big data is the not only structure langua.pdfNOSQL in big data is the not only structure langua.pdf
NOSQL in big data is the not only structure langua.pdf
 
Big-Data 5V of big data engineering.pptx
Big-Data 5V of big data engineering.pptxBig-Data 5V of big data engineering.pptx
Big-Data 5V of big data engineering.pptx
 
binarysearchtreeindatastructures-200604055006 (1).pdf
binarysearchtreeindatastructures-200604055006 (1).pdfbinarysearchtreeindatastructures-200604055006 (1).pdf
binarysearchtreeindatastructures-200604055006 (1).pdf
 
Array and its types and it's implemented programming Final.pdf
Array and its types and it's implemented programming Final.pdfArray and its types and it's implemented programming Final.pdf
Array and its types and it's implemented programming Final.pdf
 
sparkbigdataanlyticspoweerpointpptt.pptx
sparkbigdataanlyticspoweerpointpptt.pptxsparkbigdataanlyticspoweerpointpptt.pptx
sparkbigdataanlyticspoweerpointpptt.pptx
 
20-security.ppt
20-security.ppt20-security.ppt
20-security.ppt
 
Linked list.pptx
Linked list.pptxLinked list.pptx
Linked list.pptx
 
key1.pdf
key1.pdfkey1.pdf
key1.pdf
 
08 MK-PPT Advanced Topic 2.ppt
08 MK-PPT Advanced Topic 2.ppt08 MK-PPT Advanced Topic 2.ppt
08 MK-PPT Advanced Topic 2.ppt
 
Unit-I Recursion.pptx
Unit-I Recursion.pptxUnit-I Recursion.pptx
Unit-I Recursion.pptx
 
Unit-I Pointer Data structure.pptx
Unit-I Pointer Data structure.pptxUnit-I Pointer Data structure.pptx
Unit-I Pointer Data structure.pptx
 
Day2.pptx
Day2.pptxDay2.pptx
Day2.pptx
 
Unit-1 DataStructure Intro.pptx
Unit-1 DataStructure Intro.pptxUnit-1 DataStructure Intro.pptx
Unit-1 DataStructure Intro.pptx
 

Recently uploaded

Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitolTechU
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,Virag Sontakke
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...M56BOOKSTORE PRODUCT/SERVICE
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupJonathanParaisoCruz
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxEyham Joco
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxAvyJaneVismanos
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfMahmoud M. Sallam
 
Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...jaredbarbolino94
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementmkooblal
 

Recently uploaded (20)

Capitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptxCapitol Tech U Doctoral Presentation - April 2024.pptx
Capitol Tech U Doctoral Presentation - April 2024.pptx
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,भारत-रोम व्यापार.pptx, Indo-Roman Trade,
भारत-रोम व्यापार.pptx, Indo-Roman Trade,
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized Group
 
Types of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptxTypes of Journalistic Writing Grade 8.pptx
Types of Journalistic Writing Grade 8.pptx
 
Final demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptxFinal demo Grade 9 for demo Plan dessert.pptx
Final demo Grade 9 for demo Plan dessert.pptx
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdf
 
Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...Historical philosophical, theoretical, and legal foundations of special and i...
Historical philosophical, theoretical, and legal foundations of special and i...
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of management
 

data streammining and its applications.ppt

  • 1. 1 March 20, 2024 Data Mining: Concepts and Techniques 1 Data Mining for Data Streams
  • 2. March 20, 2024 Data Mining: Concepts and Techniques 2 Mining Data Streams  What is stream data? Why Stream Data Systems?  Stream data management systems: Issues and solutions  Stream data cube and multidimensional OLAP analysis  Stream frequent pattern analysis  Stream classification  Stream cluster analysis  Sketching
  • 3. March 20, 2024 Data Mining: Concepts and Techniques 3 Characteristics of Data Streams  Data Streams Model:  Data enters at a high rate  The system cannot store the entire stream, but only a small fraction  How do you make critical calculations about the stream using a limited amount of memory?  Characteristics  Huge volumes of continuous data, possibly infinite  Fast changing and requires fast, real-time response  Random access is expensive: single-scan algorithms (can only have one look)
  • 4. March 20, 2024 Data Mining: Concepts and Techniques 4 Architecture: Stream Query Processing Scratch Space (Main memory and/or Disk) User/Application Continuous Query Stream Query Processor Results Multiple streams SDMS (Stream Data Management System)
  • 5. March 20, 2024 Data Mining: Concepts and Techniques 5 Stream Data Applications  Telecommunication calling records  Business: credit card transaction flows  Network monitoring and traffic engineering  Financial market: stock exchange  Engineering & industrial processes: power supply & manufacturing  Sensor, monitoring & surveillance: video streams, RFIDs  Web logs and Web page click streams  Massive data sets (even saved but random access is too expensive)
  • 6. March 20, 2024 Data Mining: Concepts and Techniques 6 DBMS versus DSMS DBMS:  Persistent relations  One-time queries  Random access  “Unbounded” disk store  Only current state matters  No real-time services  Relatively low update rate  Data at any granularity  Assume precise data  Access plan determined by query processor, physical DB design DSMS:  Transient streams  Continuous queries  Sequential access  Bounded main memory  Historical data is important  Real-time requirements  Possibly multi-GB arrival rate  Data at fine granularity  Data stale/imprecise  Unpredictable/variable data arrival and characteristics Ack. From Motwani’s PODS tutorial slides
  • 7. March 20, 2024 Data Mining: Concepts and Techniques 7 Mining Data Streams  What is stream data? Why Stream Data Systems?  Stream data management systems: Issues and solutions  Stream data cube and multidimensional OLAP analysis  Stream frequent pattern analysis  Stream classification  Stream cluster analysis
  • 8. March 20, 2024 Data Mining: Concepts and Techniques 8 Processing Stream Queries  Query types  One-time query vs. continuous query (being evaluated continuously as stream continues to arrive)  Predefined query vs. ad-hoc query (issued on-line)  Unbounded memory requirements  For real-time response, main memory algorithm should be used  Memory requirement is unbounded if one will join future tuples  Approximate query answering  With bounded memory, it is not always possible to produce exact answers  High-quality approximate answers are desired  Data reduction and synopsis construction methods  Sketches, random sampling, histograms, wavelets, etc.
  • 9. March 20, 2024 Data Mining: Concepts and Techniques 9 Methodologies for Stream Data Processing  Major challenges  Keep track of a large universe, e.g., pairs of IP address, not ages  Methodology  Synopses (trade-off between accuracy and storage)  Use synopsis data structure, much smaller (O(log^k N) space) than their base data set (O(N) space)  Compute an approximate answer within a small error range (factor ε of the actual answer)  Major methods  Random sampling  Histograms  Sliding windows  Multi-resolution model  Sketches  Randomized algorithms
  • 10. March 20, 2024 Data Mining: Concepts and Techniques 10 Stream Data Processing Methods (1)  Random sampling (but without knowing the total length in advance)  Reservoir sampling: maintain a set of s candidates in the reservoir, which form a true random sample of the elements seen so far in the stream. As the data stream flows, every new element has a certain probability (s/N) of replacing an old element in the reservoir.  Sliding windows  Make decisions based only on recent data of sliding window size w  An element arriving at time t expires at time t + w  Histograms  Approximate the frequency distribution of element values in a stream  Partition data into a set of contiguous buckets  Equal-width (equal value range for buckets) vs. V-optimal (minimizing frequency variance within each bucket)  Multi-resolution models  Popular models: balanced binary trees, micro-clusters, and wavelets
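The reservoir sampling rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not code from the deck: the function name `reservoir_sample` and the optional `rng` argument are assumptions made for the example.

```python
import random

def reservoir_sample(stream, s, rng=None):
    """Keep a uniform random sample of s elements from a stream of unknown length."""
    rng = rng or random.Random()
    reservoir = []
    for n, item in enumerate(stream, start=1):
        if n <= s:
            # The first s elements fill the reservoir directly.
            reservoir.append(item)
        elif rng.random() < s / n:
            # The n-th element is kept with probability s/n and
            # evicts a uniformly chosen old element.
            reservoir[rng.randrange(s)] = item
    return reservoir

# Hypothetical usage: sample 5 elements from a stream of 10,000 integers.
sample = reservoir_sample(range(10_000), s=5, rng=random.Random(0))
```

The stream is consumed once and only s elements are held in memory, which matches the single-scan, bounded-memory constraint stated earlier in the deck.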
  • 11. March 20, 2024 Data Mining: Concepts and Techniques 11 Stream Data Mining vs. Stream Querying  Stream mining—A more challenging task in many cases  It shares most of the difficulties with stream querying  But often requires less “precision”, e.g., no join, grouping, sorting  Patterns are hidden and more general than querying  It may require exploratory analysis  Not necessarily continuous queries  Stream data mining tasks  Multi-dimensional on-line analysis of streams  Mining outliers and unusual patterns in stream data  Clustering data streams  Classification of stream data
  • 12. March 20, 2024 Data Mining: Concepts and Techniques 12 Mining Data Streams  What is stream data? Why Stream Data Systems?  Stream data management systems: Issues and solutions  Stream data cube and multidimensional OLAP analysis  Stream frequent pattern analysis  Stream classification  Stream cluster analysis  Research issues
  • 13. March 20, 2024 Data Mining: Concepts and Techniques 13 Challenges for Mining Dynamics in Data Streams  Most stream data are at a fairly low level or multi-dimensional in nature: needs ML/MD processing  Analysis requirements  Multi-dimensional trends and unusual patterns  Capturing important changes at multi-dimensions/levels  Fast, real-time detection and response  Comparing with data cube: Similarity and differences  Stream (data) cube or stream OLAP: Is this feasible?  Can we implement it efficiently?
  • 14. March 20, 2024 Data Mining: Concepts and Techniques 14 Multi-Dimensional Stream Analysis: Examples  Analysis of Web click streams  Raw data at low levels: seconds, web page addresses, user IP addresses, …  Analysts want: changes, trends, unusual patterns, at reasonable levels of details  E.g., “Average clicking traffic in North America on sports in the last 15 minutes is 40% higher than that in the last 24 hours.”  Analysis of power consumption streams  Raw data: power consumption flow for every household, every minute  Patterns one may find: average hourly power consumption for manufacturing companies in Chicago in the last 2 hours today is 30% higher than on the same day a week ago
  • 15. March 20, 2024 Data Mining: Concepts and Techniques 15 A Stream Cube Architecture  A tilted time frame  Different time granularities  second, minute, quarter, hour, day, week, …  Critical layers  Minimum interest layer (m-layer)  Observation layer (o-layer)  User: watches at o-layer and occasionally needs to drill down to m-layer  Partial materialization of stream cubes  Full materialization: too space and time consuming  No materialization: slow response at query time  Partial materialization…
  • 16. March 20, 2024 Data Mining: Concepts and Techniques 16 A Tilted Time Model  Natural tilted time frame:  Example: Minimal: quarter, then 4 quarters → 1 hour, 24 hours → day, …  Logarithmic tilted time frame:  Example: Minimal: 1 minute, then 1, 2, 4, 8, 16, 32, … [Figure: two time axes. Logarithmic frame with slots t, t, 2t, 4t, 8t, 16t, 32t, 64t; natural frame with 4 qtrs, 24 hours, 31 days, 12 months]
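A small arithmetic sketch shows why a tilted time frame saves space. The slot counts for the natural frame come from the slide's figure. The comparison against a full year kept at quarter-hour (or one-minute) granularity and the helper `log_frame_slots` are assumptions made for illustration.

```python
# Slots kept at each granularity in the natural tilted time frame (from the figure).
natural_frame = {"quarter": 4, "hour": 24, "day": 31, "month": 12}
slots_tilted = sum(natural_frame.values())

# Assumed baseline: one (leap) year stored entirely at quarter-hour granularity.
quarters_per_year = 366 * 24 * 4

def log_frame_slots(n_units):
    """Slots needed by a logarithmic frame (widths 1, 1, 2, 4, 8, ...) to cover n_units."""
    slots, covered, width = 1, 1, 1   # the first slot holds the most recent unit
    while covered < n_units:
        covered += width              # add a slot of the current width
        slots += 1
        width *= 2                    # each older slot is twice as wide
    return slots

# Assumed baseline: one (leap) year of one-minute units.
minutes_per_year = 366 * 24 * 60
print(slots_tilted, quarters_per_year, log_frame_slots(minutes_per_year))
```

Under these assumptions the natural frame keeps 71 slots where a flat quarter-hour history would keep 35,136, and the logarithmic frame grows only with the logarithm of the history length.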
  • 17. March 20, 2024 Data Mining: Concepts and Techniques 17 Two Critical Layers in the Stream Cube o-layer (observation): (*, theme, quarter) m-layer (minimal interest): (user-group, URL-group, minute) (primitive) stream data layer: (individual-user, URL, second)
  • 18. March 20, 2024 Data Mining: Concepts and Techniques 18 On-Line Partial Materialization vs. OLAP Processing  On-line materialization  Materialization takes precious space and time  Only incremental materialization (with tilted time frame)  Only materialize “cuboids” of the critical layers?  Online computation may take too much time  Preferred solution:  popular-path approach: Materializing those along the popular drilling paths  H-tree structure: Such cuboids can be computed and stored efficiently using the H-tree structure  Online aggregation vs. query-based computation  Online computing while streaming: aggregating stream cubes  Query-based computation: using computed cuboids
  • 19. March 20, 2024 Data Mining: Concepts and Techniques 19 Mining Data Streams  What is stream data? Why Stream Data Systems?  Stream data management systems: Issues and solutions  Stream data cube and multidimensional OLAP analysis  Stream frequent pattern analysis  Stream classification  Stream cluster analysis
  • 20. March 20, 2024 Data Mining: Concepts and Techniques 20 Mining Approximate Frequent Patterns  Mining precise freq. patterns in stream data: unrealistic  Even if they are stored in a compressed form, such as an FP-tree  Approximate answers are often sufficient (e.g., trend/pattern analysis)  Example: a router is interested in all flows:  whose frequency is at least 1% (s) of the entire traffic stream seen so far  and feels that 1/10 of s (ε = 0.1%) error is comfortable  How to mine frequent patterns with good approximation?  Lossy Counting Algorithm (Manku & Motwani, VLDB’02)  Based on Majority Voting…
  • 21. March 20, 2024 Data Mining: Concepts and Techniques 21 Majority  A sequence of N items.  You have constant memory.  In one pass, decide if some item is in majority (occurs > N/2 times)? 9 3 9 9 9 4 6 7 9 9 9 2 N = 12; item 9 is majority
  • 22. March 20, 2024 Data Mining: Concepts and Techniques 22 Misra-Gries Algorithm (‘82)  A counter and an ID.  If new item is same as stored ID, increment counter.  Otherwise, decrement the counter.  If the counter is 0, store new item with count = 1.  If counter > 0, then its item is the only candidate for majority. Stream: 2 9 9 9 7 6 4 9 9 9 3 9 ID: 2 2 9 9 9 9 4 4 9 9 9 9 count: 1 0 1 2 1 0 1 0 1 2 1 2
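The single-counter rule and the trace on this slide can be reproduced with a short Python sketch. The function name `majority_candidate` and the returned trace are illustrative assumptions. The stream is the one from the slide.

```python
def majority_candidate(stream):
    """One pass with a single (ID, counter) pair; returns the candidate, its counter, and the trace."""
    item_id, count = None, 0
    trace = []
    for x in stream:
        if count == 0:
            item_id, count = x, 1   # empty counter: store the new item with count = 1
        elif x == item_id:
            count += 1              # same as the stored ID: increment
        else:
            count -= 1              # different item: decrement
        trace.append((item_id, count))
    return item_id, count, trace

stream = [2, 9, 9, 9, 7, 6, 4, 9, 9, 9, 3, 9]
candidate, count, trace = majority_candidate(stream)
print(candidate, count)                 # candidate for majority and its final counter
print([c for _, c in trace])            # counter after each item
```

The surviving ID is only a candidate. Confirming that it occurs more than N/2 times requires a second pass, as the later slide on false positives notes.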
  • 23. March 20, 2024 Data Mining: Concepts and Techniques 23 A generalization: Frequent Items Find k items, each occurring at least N/(k+1) times.  Algorithm:  Maintain k items, and their counters.  If next item x is one of the k, increment its counter.  Else if some counter is zero, put x there with count = 1  Else (all counters non-zero) decrement all k counters [Figure: table of k (ID, count) pairs, ID1, ID2, …, IDk]
  • 24. March 20, 2024 Data Mining: Concepts and Techniques 24 Frequent Elements: Analysis  A frequent item’s count is decremented if all counters are full: it erases k+1 items.  If x occurs > N/(k+1) times, then it cannot be completely erased.  Similarly, x must get inserted at some point, because there are not enough items to keep it away.
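The k-counter generalization can be sketched as follows. This is a minimal illustration in which a dictionary stands in for the table of k (ID, count) pairs and a deleted entry plays the role of a zero counter. The name `frequent_items` is an assumption.

```python
def frequent_items(stream, k):
    """Keep at most k counters; every item occurring > N/(k+1) times survives."""
    counters = {}
    for x in stream:
        if x in counters:
            counters[x] += 1              # x is one of the k tracked items
        elif len(counters) < k:
            counters[x] = 1               # a free (zero) counter is available
        else:
            # All k counters are in use: decrement each, freeing those that hit zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

# The 12-item stream from the Majority slide, with k = 2 (threshold N/(k+1) = 4).
result = frequent_items([9, 3, 9, 9, 9, 4, 6, 7, 9, 9, 9, 2], k=2)
print(result)
```

Item 9 occurs 7 times and is retained, while item 2 also holds a counter although it occurs only once. That is an instance of the false-positive problem discussed on the following slide.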
  • 25. March 20, 2024 Data Mining: Concepts and Techniques 25 Problem of False Positives  False positives in Misra-Gries algorithm  It identifies all true heavy hitters, but not all reported items are necessarily heavy hitters.  How can we tell if the non-zero counters correspond to true heavy hitters or not?  A second pass is needed to verify.  False positives are problematic if heavy hitters are used for billing or punishment.  What guarantees can we achieve in one pass?
  • 26. March 20, 2024 Data Mining: Concepts and Techniques 26 Approximation Guarantees  Find heavy hitters with a guaranteed approximation error [Demaine et al., Manku-Motwani, Estan-Varghese…]  Manku-Motwani (Lossy Counting)  Suppose you want φ-heavy hitters: items with freq > φN (φ is the support threshold, written s on the later slides)  An approximation parameter ε, where ε << φ. (E.g., φ = .01 and ε = .0001; φ = 1% and ε = .01%)  Identify all items with frequency > φN  No reported item has frequency < (φ - ε)N  The algorithm uses O((1/ε) log(εN)) memory G. Manku, R. Motwani. Approximate Frequency Counts over Data Streams, VLDB’02
  • 27. March 20, 2024 Data Mining: Concepts and Techniques 27 Lossy Counting Step 1: Divide the stream into ‘windows’ Is window size a function of support s? Will fix later… Window 1 Window 2 Window 3
  • 28. March 20, 2024 Data Mining: Concepts and Techniques 28 Lossy Counting in Action ... Empty Frequency Counts At window boundary, decrement all counters by 1 + First Window
  • 29. March 20, 2024 Data Mining: Concepts and Techniques 29 Lossy Counting continued ... Frequency Counts At window boundary, decrement all counters by 1 Next Window +
  • 30. March 20, 2024 Data Mining: Concepts and Techniques 30 Error Analysis How much do we undercount? If current size of stream = N and window-size = 1/ε then frequency error ≤ #windows = εN Rule of thumb: Set ε = 10% of support s Example: Given support frequency s = 1%, set error frequency ε = 0.1%
  • 31. March 20, 2024 Data Mining: Concepts and Techniques 31 How many counters do we need? Worst case: 1/ε log (ε N) counters [See paper for proof] Output: Elements with counter values exceeding sN – εN Approximation guarantees Frequencies underestimated by at most εN No false negatives False positives have true frequency at least sN – εN
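The window-based version of Lossy Counting described on these slides can be sketched as below. This follows the simplified form on the slides (decrement every counter at each window boundary) rather than the per-entry bookkeeping of the paper. The function name and the toy stream are assumptions.

```python
import math

def lossy_counting(stream, s, eps):
    """Simplified Lossy Counting: windows of size 1/eps, decrement all counters per window."""
    window = math.ceil(1 / eps)
    counts, n = {}, 0
    for x in stream:
        n += 1
        counts[x] = counts.get(x, 0) + 1
        if n % window == 0:
            # Window boundary: decrement all counters by 1 and drop the zeros.
            for key in list(counts):
                counts[key] -= 1
                if counts[key] == 0:
                    del counts[key]
    # Output: elements whose counter exceeds sN - eps*N.
    threshold = (s - eps) * n
    return {x: c for x, c in counts.items() if c > threshold}, n

# Toy stream (assumed): 'a' on every even position, a distinct item otherwise.
toy = ["a" if i % 2 == 0 else f"x{i}" for i in range(100)]
frequent, n = lossy_counting(toy, s=0.3, eps=0.1)
print(frequent, n)
```

In this toy run 'a' truly occurs 50 times and its counter undercounts by exactly the number of windows, which is the εN bound from the error analysis.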
  • 32. March 20, 2024 Data Mining: Concepts and Techniques 32 Enhancements ... Frequency Errors For counter (X, c), true frequency in [c, c+εN] Trick: Remember window-id’s For counter (X, c, w), true frequency in [c, c+w-1] Batch Processing Decrements after k windows If (w = 1), no error!
  • 33. March 20, 2024 Data Mining: Concepts and Techniques 33 Algorithm 2: Sticky Sampling Stream  Create counters by sampling  Maintain exact counts thereafter What rate should we sample? 34 15 30 28 31 41 23 35 19
  • 34. March 20, 2024 Data Mining: Concepts and Techniques 34 Sticky Sampling contd... For finite stream of length N Sampling rate = 2/(Nε) log 1/(sδ) δ = probability of failure Same Rule of thumb: Set ε = 10% of support s Example: Given support threshold s = 1%, set error threshold ε = 0.1% set failure probability δ = 0.01% Output: Elements with counter values exceeding sN – εN Same error guarantees as Lossy Counting but probabilistic Approximation guarantees (probabilistic) Frequencies underestimated by at most εN No false negatives False positives have true frequency at least sN – εN
  • 35. March 20, 2024 Data Mining: Concepts and Techniques 35 Sampling rate? Finite stream of length N Sampling rate: 2/(Nε) log 1/(sδ) Infinite stream with unknown N Gradually adjust sampling rate (see paper for details) In either case, Expected number of counters = (2/ε) log 1/(sδ) Independent of N!
  • 36. March 20, 2024 Data Mining: Concepts and Techniques 36 No of counters (Support s = 1%, Error ε = 0.1%) [Figure: number of counters plotted against N (stream length) and against Log10 of N, comparing Sticky Sampling, expected (2/ε) log 1/(sδ), with Lossy Counting, worst case (1/ε) log(εN)]
  • 37. March 20, 2024 Data Mining: Concepts and Techniques 37 From elements to sets of elements …
  • 38. March 20, 2024 Data Mining: Concepts and Techniques 38 Frequent Itemsets Problem ... Stream  Identify all subsets of items whose current frequency exceeds s = 0.1%. Frequent Itemsets => Association Rules
  • 39. March 20, 2024 Data Mining: Concepts and Techniques 39 Three Modules BUFFER TRIE SUBSET-GEN
  • 40. March 20, 2024 Data Mining: Concepts and Techniques 40 Module 1: TRIE Compact representation of frequent itemsets in lexicographic order. 50 40 30 31 29 32 45 42 50 40 30 31 29 45 32 42 Sets with frequency counts
  • 41. March 20, 2024 Data Mining: Concepts and Techniques 41 Module 2: BUFFER Compact representation as sequence of ints Transactions sorted by item-id Bitmap for transaction boundaries Window 1 Window 2 Window 3 Window 4 Window 5 Window 6 In Main Memory
  • 42. March 20, 2024 Data Mining: Concepts and Techniques 42 Module 3: SUBSET-GEN BUFFER 3 3 3 4 2 2 1 2 1 3 1 1 Frequency counts of subsets in lexicographic order
  • 43. March 20, 2024 Data Mining: Concepts and Techniques 43 Overall Algorithm ... BUFFER 3 3 3 4 2 2 1 2 1 3 1 1 SUBSET-GEN TRIE new TRIE Problem: Number of subsets is exponential!
  • 44. March 20, 2024 Data Mining: Concepts and Techniques 44 SUBSET-GEN Pruning Rules A-priori Pruning Rule If set S is infrequent, every superset of S is infrequent. See paper for details ... Lossy Counting Pruning Rule At each ‘window boundary’ decrement TRIE counters by 1. Actually, ‘Batch Deletion’: At each ‘main memory buffer’ boundary, decrement all TRIE counters by b.
  • 45. March 20, 2024 Data Mining: Concepts and Techniques 45 Bottlenecks ... BUFFER 3 3 3 4 2 2 1 2 1 3 1 1 SUBSET-GEN TRIE new TRIE Consumes main memory Consumes CPU time
  • 46. March 20, 2024 Data Mining: Concepts and Techniques 46 Design Decisions for Performance TRIE Main memory bottleneck Compact linear array  (element, counter, level) in preorder traversal  No pointers! Tries are on disk  All of main memory devoted to BUFFER Pair of tries  old and new (in chunks) mmap() and madvise() SUBSET-GEN CPU bottleneck Very fast implementation  See paper for details
  • 47. March 20, 2024 Data Mining: Concepts and Techniques 47 Mining Data Streams  What is stream data? Why Stream Data Systems?  Stream data management systems: Issues and solutions  Stream data cube and multidimensional OLAP analysis  Stream frequent pattern analysis  Stream classification  Stream cluster analysis
  • 48. March 20, 2024 Data Mining: Concepts and Techniques 48 Classification for Dynamic Data Streams  Decision tree induction for stream data classification  VFDT (Very Fast Decision Tree)/CVFDT (Domingos, Hulten, Spencer, KDD’00/KDD’01)  Is decision-tree good for modeling fast changing data, e.g., stock market analysis?  Other stream classification methods  Instead of decision-trees, consider other models  Naïve Bayesian  Ensemble (Wang, Fan, Yu, Han. KDD’03)  K-nearest neighbors (Aggarwal, Han, Wang, Yu. KDD’04)  Tilted time framework, incremental updating, dynamic maintenance, and model construction  Comparison of models to find changes
  • 49. Hoeffding Tree
      • With high probability, classifies tuples the same way as a conventionally (batch) trained decision tree
      • Uses only a small sample
          • Based on the Hoeffding Bound principle
      • Hoeffding Bound (Additive Chernoff Bound)
          • r: random variable
          • R: range of r
          • n: number of independent observations
          • The true mean of r is at least $r_{avg} - \epsilon$ with probability $1 - \delta$, where
            $\epsilon = \sqrt{\dfrac{R^2 \ln(1/\delta)}{2n}}$
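A small numeric sketch of the Hoeffding bound can make the trade-off between n, δ, and ε concrete. The function names `hoeffding_bound` and `samples_needed` are illustrative choices, not names from the slides; the second function simply solves the same formula for n.

```python
import math

def hoeffding_bound(value_range, delta, n):
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n)) for range R and n observations."""
    return math.sqrt(value_range ** 2 * math.log(1 / delta) / (2 * n))

def samples_needed(value_range, delta, epsilon):
    """Smallest n for which the bound is at most epsilon (same formula, solved for n)."""
    return math.ceil(value_range ** 2 * math.log(1 / delta) / (2 * epsilon ** 2))

# Example: range 1 (e.g., a two-class information gain), delta = 1e-7.
eps = hoeffding_bound(1.0, 1e-7, 1000)   # roughly 0.09
n = samples_needed(1.0, 1e-7, 0.1)       # 806 observations
```

Quadrupling n halves ε, which is why a leaf can often commit to a split after seeing only a small sample.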
  • 50. Hoeffding Tree Algorithm
      • Hoeffding Tree input
          • S: sequence of examples
          • X: attributes
          • G( ): evaluation function
          • δ: desired accuracy (the chosen split is correct with probability 1 − δ)
      • Hoeffding Tree algorithm
          for each example in S
              retrieve G(Xa) and G(Xb)    // the two highest G(Xi)
              if ( G(Xa) - G(Xb) > ε )
                  split on Xa
                  recurse to next node
                  break
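The split test at the heart of the algorithm can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' implementation: the function name `should_split`, the `gains` dictionary (attribute name mapped to its G value on the n examples seen at a leaf), and the example attribute names are all hypothetical.

```python
import math

def should_split(gains, value_range, delta, n):
    """Return the attribute to split on, or None if more examples are needed."""
    if len(gains) < 2:
        return None
    ranked = sorted(gains, key=gains.get, reverse=True)
    best, second = ranked[0], ranked[1]          # Xa and Xb: two highest G values
    epsilon = math.sqrt(value_range ** 2 * math.log(1 / delta) / (2 * n))
    if gains[best] - gains[second] > epsilon:    # the gap is statistically reliable
        return best
    return None

gains = {"packets": 0.30, "protocol": 0.12, "bytes": 0.05}
early = should_split(gains, 1.0, 1e-7, 100)     # None: epsilon is still about 0.28
later = should_split(gains, 1.0, 1e-7, 1000)    # "packets": epsilon is about 0.09
```

The same observed gap of 0.18 is insufficient after 100 examples but sufficient after 1000, because ε shrinks as n grows.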
  • 51. Decision-Tree Induction with Data Streams
      [Figure: two decision trees grown from a data stream, with yes/no branches and split nodes such as "Packets > 10", "Protocol = http", "Protocol = ftp", and "Bytes > 60K". Ack.: from Gehrke's SIGMOD tutorial slides]
  • 52. Hoeffding Tree: Strengths and Weaknesses
      • Strengths
          • Scales better than traditional methods
              • Sublinear with sampling
              • Very small memory utilization
          • Incremental
              • Makes class predictions in parallel
              • New examples are added as they come
      • Weaknesses
          • Could spend a lot of time on ties
          • Memory used with tree expansion
          • Number of candidate attributes
  • 53. VFDT (Very Fast Decision Tree)
      • Modifications to Hoeffding Tree
          • Near-ties broken more aggressively
          • G recomputed only every nmin examples
          • Deactivates certain leaves to save memory
          • Poor attributes dropped
          • Initialized with a traditional learner (helps the learning curve)
      • Compared to Hoeffding Tree: better time and memory
      • Compared to a traditional decision tree
          • Similar accuracy
          • Better runtime with 1.61 million examples
              • 21 minutes for VFDT
              • 24 hours for C4.5
  • 54. CVFDT (Concept-adapting VFDT)
      • Concept drift
          • Time-changing data streams
          • Incorporate new examples and eliminate old ones
      • CVFDT
          • Increments counts with each new example
          • Decrements counts for the oldest example
              • Sliding window
              • Nodes assigned monotonically increasing IDs
          • Grows alternate subtrees
          • When an alternate subtree is more accurate, it replaces the old one
          • O(w) better runtime than VFDT-window (w: window size)
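The increment/decrement bookkeeping over a sliding window can be sketched as follows. This covers only the count maintenance, not alternate-subtree growth or node IDs; the class name `WindowedCounts` and the representation of an example as an attribute-to-value dictionary are assumptions for illustration.

```python
from collections import Counter, deque

class WindowedCounts:
    """Keeps (attribute, value, class) counts for the last `window_size` examples."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque()
        self.counts = Counter()

    def add(self, example, label):
        # Increment counts for the new example.
        self.window.append((example, label))
        self._update(example, label, +1)
        # Decrement counts for the example that falls out of the window.
        if len(self.window) > self.window_size:
            old_example, old_label = self.window.popleft()
            self._update(old_example, old_label, -1)

    def _update(self, example, label, step):
        for attr, value in example.items():
            key = (attr, value, label)
            self.counts[key] += step
            if self.counts[key] == 0:
                del self.counts[key]

wc = WindowedCounts(window_size=2)
wc.add({"proto": "http"}, "ok")
wc.add({"proto": "ftp"}, "bad")
wc.add({"proto": "http"}, "bad")   # evicts the first example
```

Each arriving example touches only its own counts and those of the evicted example, so the per-example cost does not depend on the window size.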
  • 56. Clustering Data Streams [GMMO01]
      • Based on the k-median method
          • Data stream points come from a metric space
          • Find k clusters in the stream such that the sum of distances from data points to their closest center is minimized
      • Constant-factor approximation algorithm
          • In small space, a simple two-step algorithm:
          1. For each set of M records, Si, find O(k) centers in S1, …, Sl
              • Local clustering: assign each point in Si to its closest center
          2. Let S' be the centers for S1, …, Sl, with each center weighted by the number of points assigned to it
              • Cluster S' to find k centers
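The two-step scheme can be sketched on one-dimensional data. Note the substitution: the local clustering routine below is a simple Lloyd-style weighted k-median heuristic with a deterministic start, not the constant-factor approximation algorithm of [GMMO01]. The names `cluster` and `stream_kmedian`, and the use of a list in place of a true stream, are assumptions for illustration.

```python
def weighted_median(values, weights):
    half, running = sum(weights) / 2, 0
    for value, weight in sorted(zip(values, weights)):
        running += weight
        if running >= half:
            return value

def assign(values, weights, centers):
    groups = [[] for _ in centers]
    for v, w in zip(values, weights):
        nearest = min(range(len(centers)), key=lambda j: abs(v - centers[j]))
        groups[nearest].append((v, w))
    return groups

def cluster(values, weights, k, iters=20):
    """Heuristic weighted k-median in 1-D; returns (centers, total weight per center)."""
    ordered = sorted(set(values))
    k = min(k, len(ordered))
    # Deterministic start: k values spread evenly over the sorted distinct values.
    centers = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]
    for _ in range(iters):
        groups = assign(values, weights, centers)
        centers = [weighted_median(*zip(*g)) if g else c
                   for g, c in zip(groups, centers)]
    groups = assign(values, weights, centers)
    return centers, [sum(w for _, w in g) for g in groups]

def stream_kmedian(stream, chunk_size, k):
    summary_centers, summary_weights = [], []
    for start in range(0, len(stream), chunk_size):      # step 1: cluster each chunk Si
        chunk = stream[start:start + chunk_size]
        centers, sizes = cluster(chunk, [1] * len(chunk), k)
        for c, s in zip(centers, sizes):
            if s > 0:
                summary_centers.append(c)                # S': centers weighted by
                summary_weights.append(s)                # the points assigned to them
    final, _ = cluster(summary_centers, summary_weights, k)   # step 2: cluster S'
    return sorted(final)

data = [1, 2, 3, 101, 102, 103, 2, 3, 4, 100, 101, 102]
centers = stream_kmedian(data, chunk_size=6, k=2)
```

Only the weighted centers of each chunk are retained between chunks, which is what keeps the memory footprint small.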
  • 57. Hierarchical Clustering Tree
      [Figure: hierarchical tree with layers labeled data points, level-i medians, and level-(i+1) medians]
  • 58. Hierarchical Tree and Drawbacks
      • Method:
          • Maintain at most m level-i medians
          • On seeing m of them, generate O(k) level-(i+1) medians, each with weight equal to the sum of the weights of the intermediate medians assigned to it
      • Drawbacks:
          • Low quality for evolving data streams (registers only k centers)
          • Limited functionality in discovering and exploring clusters over different portions of the stream over time
  • 59. Clustering for Mining Stream Dynamics
      • Network intrusion detection: one example
          • Detect bursts of activity or abrupt changes in real time by online clustering
      • Another approach:
          • Tilted time framework: otherwise, dynamic changes cannot be found
          • Micro-clustering: better quality than k-means/k-median
              • Incremental, online processing and maintenance
          • Two stages: micro-clustering and macro-clustering
          • With limited "overhead", achieves high efficiency, scalability, quality of results, and power of evolution/change detection
  • 60. CluStream: A Framework for Clustering Evolving Data Streams
      • Design goal
          • High quality for clustering evolving data streams with greater functionality
          • While keeping the stream mining requirements in mind
              • One pass over the original stream data
              • Limited space usage and high efficiency
      • CluStream: a framework for clustering evolving data streams
          • Divide the clustering process into online and offline components
              • Online component: periodically stores summary statistics about the stream data
              • Offline component: answers various user questions based on the stored summary statistics
  • 61. The CluStream Framework
      • Micro-cluster
          • Statistical information about data locality
          • Temporal extension of the cluster-feature vector
              • Multi-dimensional points $\overline{X_1} \ldots \overline{X_k} \ldots$ with time stamps $T_1 \ldots T_k \ldots$
              • Each point contains d dimensions, i.e., $\overline{X_i} = (x_i^1 \ldots x_i^d)$
              • A micro-cluster for n points is defined as a (2·d + 3)-tuple $(\overline{CF2^x}, \overline{CF1^x}, CF2^t, CF1^t, n)$
      • Pyramidal time frame
          • Decides at what moments the snapshots of the statistical information are stored away on disk
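A minimal sketch of the micro-cluster tuple follows, assuming the usual cluster-feature reading in which CF1 holds sums and CF2 holds sums of squares (per dimension for the data, scalar for the time stamps). The class and method names are illustrative, not taken from the CluStream paper.

```python
class MicroCluster:
    """(2*d + 3)-tuple: CF2x and CF1x per dimension, CF2t, CF1t, and n."""

    def __init__(self, d):
        self.cf2x = [0.0] * d   # per-dimension sum of squared values
        self.cf1x = [0.0] * d   # per-dimension sum of values
        self.cf2t = 0.0         # sum of squared time stamps
        self.cf1t = 0.0         # sum of time stamps
        self.n = 0              # number of points absorbed

    def insert(self, point, timestamp):
        for i, x in enumerate(point):
            self.cf2x[i] += x * x
            self.cf1x[i] += x
        self.cf2t += timestamp * timestamp
        self.cf1t += timestamp
        self.n += 1

    def merge(self, other):
        # Every component is a sum, so merging is component-wise addition.
        self.cf2x = [a + b for a, b in zip(self.cf2x, other.cf2x)]
        self.cf1x = [a + b for a, b in zip(self.cf1x, other.cf1x)]
        self.cf2t += other.cf2t
        self.cf1t += other.cf1t
        self.n += other.n

    def centroid(self):
        return [s / self.n for s in self.cf1x]

    def as_tuple(self):
        return (*self.cf2x, *self.cf1x, self.cf2t, self.cf1t, self.n)

mc = MicroCluster(d=2)
mc.insert((1.0, 2.0), timestamp=1)
mc.insert((3.0, 4.0), timestamp=2)
```

Because every component is additive, two micro-clusters can be merged, or an older snapshot subtracted from a newer one, without revisiting the original points.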
  • 62. CluStream: Pyramidal Time Frame
      • Pyramidal time frame
          • Snapshots of a set of micro-clusters are stored following the pyramidal pattern
              • They are stored at differing levels of granularity depending on recency
          • Snapshots are classified into different orders varying from 1 to log(T)
              • The i-th order snapshots occur at intervals of $\alpha^i$, where α ≥ 1
              • Only the last (α + 1) snapshots of each order are stored
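The retention pattern can be simulated with a short sketch. It assumes integer clock ticks starting at 1, treats a tick t as an order-i snapshot whenever t is divisible by α^i, numbers orders from 0, and requires α ≥ 2 so the loop terminates; these details and the function name `pyramidal_snapshots` are assumptions for illustration.

```python
from collections import deque

def pyramidal_snapshots(t_now, alpha=2):
    """Return {order: [time stamps still stored]} after clock ticks 1..t_now."""
    if alpha < 2:
        raise ValueError("this sketch assumes an integer alpha >= 2")
    kept = {}
    for t in range(1, t_now + 1):
        order, interval = 0, 1                  # interval == alpha ** order
        while t % interval == 0:
            # Keep only the last (alpha + 1) snapshots of each order.
            kept.setdefault(order, deque(maxlen=alpha + 1)).append(t)
            order += 1
            interval *= alpha
    return {i: list(ts) for i, ts in kept.items()}

snaps = pyramidal_snapshots(55, alpha=2)
# snaps[0] holds the three most recent ticks; higher orders reach further back
# at coarser spacing, e.g. snaps[3] holds multiples of 8.
```

Recent history is kept at fine granularity and older history at progressively coarser granularity, while the number of stored snapshots grows only logarithmically with elapsed time.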
  • 63. CluStream: Clustering On-line Streams
      • Online micro-cluster maintenance
          • Initial creation of q micro-clusters
              • q is usually significantly larger than the number of natural clusters
          • Online incremental update of micro-clusters
              • If a new point is within the max-boundary, insert it into the micro-cluster
              • Otherwise, create a new cluster
              • May delete an obsolete micro-cluster or merge the two closest ones
      • Query-based macro-clustering
          • Based on a user-specified time horizon h and the number of macro-clusters K, compute macro-clusters using the k-means algorithm
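The absorb-or-create step of online maintenance can be sketched as below. Several details are assumptions, not taken from the slides: the max-boundary is modeled as `factor` times the RMS deviation of the nearest micro-cluster, a fixed `singleton_radius` is used for clusters holding one point, time stamps are omitted, and the delete/merge step that keeps the number of micro-clusters at q is left out.

```python
import math

def new_cluster(point):
    return {"n": 1, "cf1": list(point), "cf2": [x * x for x in point]}

def centroid(mc):
    return [s / mc["n"] for s in mc["cf1"]]

def rms_radius(mc):
    # Summed per-dimension variance, computed from the stored sums only.
    var = sum(s2 / mc["n"] - (s1 / mc["n"]) ** 2
              for s1, s2 in zip(mc["cf1"], mc["cf2"]))
    return math.sqrt(max(var, 0.0))

def update(clusters, point, factor=2.0, singleton_radius=1.0):
    """Absorb `point` into the nearest micro-cluster or start a new one."""
    if clusters:
        nearest = min(clusters, key=lambda mc: math.dist(point, centroid(mc)))
        if nearest["n"] > 1:
            boundary = factor * rms_radius(nearest)
        else:
            boundary = singleton_radius
        if math.dist(point, centroid(nearest)) <= boundary:
            nearest["n"] += 1
            for i, x in enumerate(point):
                nearest["cf1"][i] += x
                nearest["cf2"][i] += x * x
            return nearest
    created = new_cluster(point)
    clusters.append(created)
    return created

clusters = []
for p in [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (0.2, 0.1)]:
    update(clusters, p)
```

In this run the three points near the origin end up in one micro-cluster and the outlier at (10, 10) starts its own.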
  • 64. References on Stream Data Mining (1)
      • C. Aggarwal, J. Han, J. Wang, and P. S. Yu. A Framework for Clustering Evolving Data Streams. VLDB'03.
      • C. C. Aggarwal, J. Han, J. Wang, and P. S. Yu. On-Demand Classification of Evolving Data Streams. KDD'04.
      • C. Aggarwal, J. Han, J. Wang, and P. S. Yu. A Framework for Projected Clustering of High Dimensional Data Streams. VLDB'04.
      • S. Babu and J. Widom. Continuous Queries over Data Streams. SIGMOD Record, Sept. 2001.
      • B. Babcock, S. Babu, M. Datar, R. Motwani, and J. Widom. Models and Issues in Data Stream Systems. PODS'02. (Conference tutorial)
      • Y. Chen, G. Dong, J. Han, B. W. Wah, and J. Wang. Multi-Dimensional Regression Analysis of Time-Series Data Streams. VLDB'02.
      • P. Domingos and G. Hulten. Mining High-Speed Data Streams. KDD'00.
      • A. Dobra, M. N. Garofalakis, J. Gehrke, and R. Rastogi. Processing Complex Aggregate Queries over Data Streams. SIGMOD'02.
      • J. Gehrke, F. Korn, and D. Srivastava. On Computing Correlated Aggregates over Continuous Data Streams. SIGMOD'01.
      • C. Giannella, J. Han, J. Pei, X. Yan, and P. S. Yu. Mining Frequent Patterns in Data Streams at Multiple Time Granularities. In Kargupta, et al. (eds.), Next Generation Data Mining, 2004.
  • 65. References on Stream Data Mining (2)
      • S. Guha, N. Mishra, R. Motwani, and L. O'Callaghan. Clustering Data Streams. FOCS'00.
      • G. Hulten, L. Spencer, and P. Domingos. Mining Time-Changing Data Streams. KDD'01.
      • S. Madden, M. Shah, J. Hellerstein, and V. Raman. Continuously Adaptive Continuous Queries over Streams. SIGMOD'02.
      • G. Manku and R. Motwani. Approximate Frequency Counts over Data Streams. VLDB'02.
      • A. Metwally, D. Agrawal, and A. El Abbadi. Efficient Computation of Frequent and Top-k Elements in Data Streams. ICDT'05.
      • S. Muthukrishnan. Data Streams: Algorithms and Applications. Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms, 2003.
      • R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge Univ. Press, 1995.
      • S. Viglas and J. Naughton. Rate-Based Query Optimization for Streaming Information Sources. SIGMOD'02.
      • Y. Zhu and D. Shasha. StatStream: Statistical Monitoring of Thousands of Data Streams in Real Time. VLDB'02.
      • H. Wang, W. Fan, P. S. Yu, and J. Han. Mining Concept-Drifting Data Streams Using Ensemble Classifiers. KDD'03.