Data Mining: Mining stream time series and sequence data
Published in: Technology

  • 1. Mining Stream, Time Series, and Sequence Data
  • 2. Methodologies for Stream Data Processing and Stream Data Systems
    Random Sampling
    Sliding Windows
    Multiresolution Methods
    Sketches and Synopses
  • 3. Randomized Algorithms to analyze Data Streams
    Randomized algorithms, in the form of random sampling and sketching, are often used to deal with massive, high-dimensional data streams.
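Random sampling over a stream of unknown length is commonly done with reservoir sampling. The block below is a minimal sketch of that technique, not something taken from the slides; the function name `reservoir_sample` and its parameters are illustrative assumptions.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     rng: Optional[random.Random] = None) -> List[T]:
    """Return a uniform random sample of up to k items from a one-pass stream."""
    rng = rng or random.Random()
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i (0-based) replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive at both ends
            if j < k:
                reservoir[j] = item
    return reservoir
```

Each element is seen exactly once and only k items are held in memory, which matches the one-pass constraint of stream processing.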
  • 4. Data Stream Management Systems and Stream Queries
    In traditional database systems, data are stored in finite and persistent databases.
    Stream data, by contrast, are potentially infinite and impossible to store fully in a database.
    In a Data Stream Management System (DSMS), there may be multiple data streams.
    Once an element from a data stream has been processed, it is discarded or archived, and it cannot be easily retrieved unless it is explicitly stored in memory.
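The discard-after-processing behavior can be illustrated with a sliding window, one of the methodologies listed in slide 2. This is a minimal sketch under assumed names (`SlidingWindowMean`, `add`); it keeps only the most recent `size` values and forgets everything older.

```python
from collections import deque


class SlidingWindowMean:
    """Running mean over the most recent `size` stream elements."""

    def __init__(self, size: int) -> None:
        if size <= 0:
            raise ValueError("size must be positive")
        self.size = size
        self.window = deque()
        self.total = 0.0

    def add(self, value: float) -> float:
        """Process one element and return the mean of the current window."""
        self.window.append(value)
        self.total += value
        if len(self.window) > self.size:
            # The oldest element leaves the window and cannot be retrieved again.
            self.total -= self.window.popleft()
        return self.total / len(self.window)
```

Feeding the values 1, 2, 3, 4, 5 into a window of size 3 yields the means 1.0, 1.5, 2.0, 3.0, 4.0.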
  • 5. Critical Layers of a Stream Data Cube
    Two critical cuboids (or layers):
    The first layer, called the minimal interest layer, is the minimally interesting layer that an analyst would like to study.
    The second layer, called the observation layer, is the layer at which an analyst (or an automated system) would like to continuously study the data.
  • 6. Hoeffding Tree Algorithm
    The Hoeffding tree algorithm is a decision tree learning method for stream data classification.
    It was initially used to track Web click streams and construct models to predict which Web hosts and Web sites a user is likely to access.
    It typically runs in sublinear time and produces a nearly identical decision tree to that of traditional batch learners.
    It uses Hoeffding trees, which exploit the idea that a small sample can often be enough to choose an optimal splitting attribute.
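The "small sample can be enough" idea rests on the Hoeffding bound: with probability 1 - delta, the true mean of a random variable with range R lies within epsilon = sqrt(R^2 ln(1/delta) / (2n)) of the mean observed over n samples. The sketch below shows the resulting split test; the function names and the example gain values are illustrative assumptions, not part of the slides.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Epsilon such that the observed mean is within epsilon of the true mean
    with probability 1 - delta after n independent observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def can_split(best_gain: float, second_gain: float,
              value_range: float, delta: float, n: int) -> bool:
    """Split once the gap between the two best attributes exceeds epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


# Hypothetical numbers: after 1000 examples, a gap of 0.20 is decisive,
# while a gap of 0.02 is not and the leaf should keep collecting examples.
decisive = can_split(0.30, 0.10, value_range=1.0, delta=1e-7, n=1000)
undecided = can_split(0.30, 0.28, value_range=1.0, delta=1e-7, n=1000)
```

Because epsilon shrinks as n grows, a leaf that cannot decide yet only needs to wait for more examples rather than rescan old ones.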
  • 7. Very Fast Decision Tree (VFDT)
    The VFDT (Very Fast Decision Tree) algorithm makes several modifications to the Hoeffding tree algorithm.
    The modifications include breaking near-ties during attribute selection more aggressively, computing the split-evaluation function G only after a number of new training examples have arrived, deactivating the least promising leaves whenever memory is running low, dropping poor splitting attributes, and improving the initialization method.
    VFDT works well on stream data and also compares extremely well to traditional classifiers in both speed and accuracy. To adapt to concept-drifting data streams, it was extended into the Concept-adapting Very Fast Decision Tree algorithm (CVFDT).
  • 8. Concept-adapting Very Fast Decision Tree algorithm (CVFDT)
    CVFDT uses a sliding window approach; however, it does not construct a new model from scratch each time.
    Rather, it updates statistics at the nodes by incrementing the counts associated with new examples and decrementing the counts associated with old ones.
    Therefore, if there is a concept drift, some nodes may no longer pass the Hoeffding bound. When this happens, an alternate subtree will be grown, with the new best splitting attribute at the root.
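The increment/decrement bookkeeping can be sketched as follows. This is a simplified illustration for a single node, assuming examples are dictionaries of attribute values plus a class label; the class name `WindowedNodeStats` is hypothetical, and growing the alternate subtree is not shown.

```python
from collections import Counter, deque


class WindowedNodeStats:
    """Per-node (attribute, value, class) counts over a sliding window."""

    def __init__(self, window_size: int) -> None:
        self.window_size = window_size
        self.window = deque()
        self.counts = Counter()

    @staticmethod
    def _keys(attributes: dict, label: str) -> list:
        return [(name, value, label) for name, value in attributes.items()]

    def add(self, attributes: dict, label: str) -> None:
        # Increment the counts associated with the new example.
        self.window.append((attributes, label))
        for key in self._keys(attributes, label):
            self.counts[key] += 1
        # Decrement the counts of the example that falls out of the window.
        if len(self.window) > self.window_size:
            old_attributes, old_label = self.window.popleft()
            for key in self._keys(old_attributes, old_label):
                self.counts[key] -= 1
                if self.counts[key] == 0:
                    del self.counts[key]
```

With the counts kept current in this way, the split test at each node can be re-evaluated periodically without rebuilding the tree.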
  • 9. A Classifier Ensemble Approach to Stream Data Classification
    The idea is to train an ensemble or group of classifiers (using, say, naïve Bayes) from sequential chunks of the data stream.
    Whenever a new chunk arrives, we build a new classifier from it.
    The individual classifiers are weighted based on their expected classification accuracy in a time-changing environment.
    Only the top-k classifiers are kept. The decisions are then based on the weighted votes of the classifiers.
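A rough sketch of the weighting, top-k pruning, and voting steps is shown below. It assumes classifiers are plain callables mapping an input to a label, and it uses accuracy on the newest (non-empty) chunk as a stand-in for "expected classification accuracy"; both choices, and the function names, are assumptions made for illustration. Scoring the newest classifier on its own training chunk is optimistic, so a real system would likely use a held-out estimate.

```python
from collections import defaultdict


def update_ensemble(ensemble, new_classifier, chunk, k):
    """Re-weight all classifiers on the newest chunk and keep the top k.

    ensemble: list of callables x -> label
    chunk: non-empty list of (x, label) pairs
    Returns a list of (weight, classifier) pairs, best first.
    """
    scored = []
    for clf in ensemble + [new_classifier]:
        correct = sum(1 for x, y in chunk if clf(x) == y)
        scored.append((correct / len(chunk), clf))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def weighted_vote(weighted_ensemble, x):
    """Predict the label with the largest total weight."""
    votes = defaultdict(float)
    for weight, clf in weighted_ensemble:
        votes[clf(x)] += weight
    return max(votes, key=votes.get)
```

Older classifiers that no longer fit the current concept receive low weights and eventually drop out of the top k.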
  • 10. Clustering in evolving data streams
    Compute and store summaries of past data
    Apply a divide-and-conquer strategy
    Incremental clustering of incoming data streams
    Perform micro clustering as well as macro clustering analysis
    Explore multiple time granularities for the analysis of cluster evolution
    Divide stream clustering into on-line and off-line processes
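One common way to store a summary of past data is a cluster feature vector (count, linear sum, squared sum), which can be updated incrementally per point. The slides do not name this structure, so the sketch below is an assumed illustration of an on-line micro-cluster summary; the class name `MicroCluster` is hypothetical.

```python
import math


class MicroCluster:
    """Additive summary of the points absorbed so far."""

    def __init__(self, dim: int) -> None:
        self.n = 0
        self.linear_sum = [0.0] * dim
        self.squared_sum = [0.0] * dim

    def add(self, point) -> None:
        """Absorb one point; the raw point itself is not stored."""
        self.n += 1
        for i, v in enumerate(point):
            self.linear_sum[i] += v
            self.squared_sum[i] += v * v

    def centroid(self) -> list:
        return [s / self.n for s in self.linear_sum]

    def radius(self) -> float:
        """Root-mean-square distance of the absorbed points from the centroid."""
        variance = sum(sq / self.n - (ls / self.n) ** 2
                       for ls, sq in zip(self.linear_sum, self.squared_sum))
        return math.sqrt(max(variance, 0.0))
```

An off-line macro-clustering step could then cluster the centroids of many such summaries instead of the raw stream.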
  • 11. Mining Time-Series Data
    A time-series database consists of sequences of values or events obtained over repeated measurements of time.
    Trend Analysis
    Similarity Search in Time-Series Analysis
  • 12. Markov Chain for Sequence Analysis
    A (first-order) Markov chain is a model that generates sequences in which the probability of a symbol depends only on the previous symbol.
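Under that definition, the probability of a whole sequence is the probability of its first symbol times the product of the transition probabilities between consecutive symbols. A minimal sketch, with a made-up two-symbol chain (the probabilities in the usage note are illustrative only):

```python
import math


def sequence_log_probability(sequence, initial, transition):
    """Log-probability of a sequence under a first-order Markov chain.

    initial[s] is P(first symbol = s); transition[a][b] is P(next = b | current = a).
    """
    if not sequence:
        return 0.0
    log_p = math.log(initial[sequence[0]])
    for prev, cur in zip(sequence, sequence[1:]):
        log_p += math.log(transition[prev][cur])
    return log_p
```

For example, with initial probabilities A: 0.5, B: 0.5 and transitions A→A 0.9, A→B 0.1, B→A 0.2, B→B 0.8, the sequence "AAB" has probability 0.5 × 0.9 × 0.1 = 0.045. Working in log space avoids underflow on long sequences.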
  • 13. Tasks using hidden Markov models include:
    Evaluation: Given a sequence, x, determine the probability, P(x), of obtaining x in the model.
    Decoding: Given a sequence, determine the most probable path through the model that produced the sequence.
    Learning: Given a model and a set of training sequences, find the model parameters (i.e., the transition and emission probabilities) that explain the training sequences with relatively high probability.
  • 14. Algorithms for Hidden Markov Model Sequence Analysis
    Forward algorithm (evaluation)
    Viterbi algorithm (decoding)
    Baum-Welch algorithm (learning)
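The evaluation and decoding tasks can be sketched compactly for a small discrete HMM. The code below is a plain, unscaled illustration (it will underflow on long sequences), the dictionary-based model layout is an assumption, and Baum-Welch is omitted for brevity.

```python
def forward(obs, states, start, trans, emit):
    """Evaluation: P(obs) summed over all hidden state paths."""
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        alpha = {
            s: emit[s][symbol] * sum(alpha[p] * trans[p][s] for p in states)
            for s in states
        }
    return sum(alpha.values())


def viterbi(obs, states, start, trans, emit):
    """Decoding: (probability, path) of the single most probable state path."""
    best = {s: (start[s] * emit[s][obs[0]], [s]) for s in states}
    for symbol in obs[1:]:
        new_best = {}
        for s in states:
            # Pick the best predecessor for state s at this step.
            prob, path = max(
                ((best[p][0] * trans[p][s], best[p][1]) for p in states),
                key=lambda t: t[0],
            )
            new_best[s] = (prob * emit[s][symbol], path + [s])
        best = new_best
    return max(best.values(), key=lambda t: t[0])
```

The two functions differ only in replacing the sum over predecessors with a maximum, which is why the Viterbi probability can never exceed the forward probability for the same observations.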