Data Mining: Mining stream time series and sequence data
 

Usage Rights

© All Rights Reserved


    • Mining Stream, Time Series, and Sequence Data
    • Methodologies for Stream Data Processing and Stream Data Systems
      Random Sampling
      Sliding Windows
      Histograms
      Multiresolution Methods
      Sketch Synopses
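As an illustration of the random sampling item above, here is a minimal sketch of reservoir sampling, one common way to keep a fixed-size uniform sample from a stream whose length is not known in advance. The slides do not name a specific sampling scheme, so the choice of reservoir sampling and the names `reservoir_sample`, `k`, and `rng` are assumptions made for this example.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Return a uniform random sample of up to k items from an iterable.

    Hypothetical helper for illustration: the stream is read once and
    only k items are held in memory at any time.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


# Usage: sample 5 values from a stream of 100,000 integers.
sample = reservoir_sample(range(100_000), k=5, rng=random.Random(0))
```

The stream is consumed in a single pass, which matches the constraint that stream elements cannot be revisited once processed.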
    • Randomized Algorithms to analyze Data Streams
      Randomized algorithms, in the form of random sampling and sketching, are often used to deal with massive, high-dimensional data streams.
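To make the sketching idea concrete, the following is a rough sketch of a Count-Min sketch, one well-known sketch structure for approximate frequency counts. The slides do not say which sketch is meant, so this choice is an assumption, as are the class name `CountMinSketch` and the parameters `width` and `depth`. Salted hashes from `hashlib` stand in for the randomly chosen hash functions of the textbook construction.

```python
import hashlib


class CountMinSketch:
    """Approximate frequency counter using a depth x width table of counts."""

    def __init__(self, width=1000, depth=5):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One salted hash per row; the salt plays the role of a random seed.
        digest = hashlib.blake2b(
            repr(item).encode("utf-8"),
            digest_size=8,
            salt=row.to_bytes(8, "little"),
        ).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # Collisions can only inflate a cell, so the minimum over rows
        # is an upper bound on the true count that is never too low.
        return min(
            self.table[row][self._index(item, row)] for row in range(self.depth)
        )


# Usage: count page hits without storing every distinct page.
sketch = CountMinSketch(width=1000, depth=5)
for page in ["home", "home", "about", "home"]:
    sketch.add(page)
home_estimate = sketch.estimate("home")
```

The memory used is fixed by `width` and `depth` regardless of how many distinct items appear in the stream, at the cost of possible overestimates.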
    • Data Stream Management Systems and Stream Queries
      In traditional database systems, data are stored in finite and persistent databases.
      In contrast, stream data are infinite and impossible to store fully in a database.
      In a Data Stream Management System (DSMS), there may be multiple data streams.
      Once an element from a data stream has been processed, it is discarded or archived, and it cannot be easily retrieved unless it is explicitly stored in memory.
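The one-pass constraint described above can be sketched with a continuous query that keeps only a bounded window of recent elements in memory. This is a minimal illustration, not the interface of any particular DSMS; the class name `SlidingWindowMean` and its `size` parameter are hypothetical.

```python
from collections import deque


class SlidingWindowMean:
    """Running mean over the most recent `size` stream elements."""

    def __init__(self, size):
        self.window = deque(maxlen=size)
        self.total = 0.0

    def update(self, value):
        # When the window is full, the oldest element is about to be
        # dropped by the deque, so remove it from the running total first.
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]
        self.window.append(value)
        self.total += value
        return self.total / len(self.window)


# Usage: each element is seen once; older elements are discarded.
query = SlidingWindowMean(size=3)
means = [query.update(v) for v in [1, 2, 3, 4]]
```

Elements that fall out of the window are gone for good here, which mirrors the point that a processed element cannot be retrieved unless it was explicitly stored.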
    • Critical Layers of stream data cube
      Two critical cuboids (or layers)
      The first layer, called the minimal interest layer, is the minimally interesting layer that an analyst would like to study.
      The second layer, called the observation layer, is the layer at which an analyst (or an automated system) would like to continuously study the data.
    • Hoeffding Tree Algorithm
      The Hoeffding tree algorithm is a decision tree learning method for stream data classification.
      It was initially used to track Web click streams and construct models to predict which Web hosts and Web sites a user is likely to access.
      It typically runs in sublinear time and produces a nearly identical decision tree to that of traditional batch learners.
      It uses Hoeffding trees, which exploit the idea that a small sample can often be enough to choose an optimal splitting attribute.
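The "small sample is often enough" idea rests on the Hoeffding bound: after n independent observations of a variable with range R, the true mean differs from the observed mean by at most ε = sqrt(R² ln(1/δ) / (2n)) with probability 1 − δ. A minimal sketch of how a split decision could use it follows; the function names and the example gain values are assumptions made for illustration.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Epsilon such that the observed mean is within epsilon of the true
    mean with probability 1 - delta after n observations."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain, second_gain, value_range, delta, n):
    """Split when the best attribute beats the runner-up by more than epsilon."""
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


# Usage with illustrative numbers: the same gain difference (0.15) is not
# convincing after 100 examples but is after 1,000.
early = should_split(0.30, 0.15, value_range=1.0, delta=1e-7, n=100)
later = should_split(0.30, 0.15, value_range=1.0, delta=1e-7, n=1000)
```

Because ε shrinks as n grows, a leaf can wait until the evidence is strong enough and then split without ever revisiting earlier examples.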
    • Very Fast Decision Tree (VFDT) 
      The VFDT (Very Fast Decision Tree) algorithm makes several modifications to the Hoeffding tree algorithm.
      The modifications include breaking near-ties during attribute selection more aggressively, computing the G function after a number of training examples, deactivating the least promising leaves whenever memory is running low, dropping poor splitting attributes, and improving the initialization method.
      VFDT works well on stream data and compares extremely well to traditional classifiers in both speed and accuracy.
      On its own, however, it does not adapt to concept-drifting data streams.
    • Concept-adapting Very Fast Decision Tree algorithm (CVFDT)
      CVFDT also uses a sliding window approach; however, it does not construct a new model from scratch each time.
      Rather, it updates statistics at the nodes by incrementing the counts associated with new examples and decrementing the counts associated with old ones.
      Therefore, if there is a concept drift, some nodes may no longer pass the Hoeffding bound. When this happens, an alternate subtree will be grown, with the new best splitting attribute at the root.
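The increment and decrement bookkeeping described above can be sketched for a single node as follows. This covers only the windowed counts, not the growth of alternate subtrees, and the class name `WindowedNodeStats` and its methods are hypothetical names chosen for this example.

```python
from collections import Counter, deque


class WindowedNodeStats:
    """Counts of (attribute, value, label) over the last `window_size` examples."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque()
        self.counts = Counter()

    def add(self, features, label):
        # Increment counts for the new example.
        self.window.append((features, label))
        self._update(features, label, +1)
        # Decrement counts for the example that just left the window.
        if len(self.window) > self.window_size:
            old_features, old_label = self.window.popleft()
            self._update(old_features, old_label, -1)

    def _update(self, features, label, delta):
        for attribute, value in features.items():
            self.counts[(attribute, value, label)] += delta


# Usage: with a window of 2, the first example is forgotten after the third.
stats = WindowedNodeStats(window_size=2)
stats.add({"color": "red"}, "yes")
stats.add({"color": "red"}, "no")
stats.add({"color": "blue"}, "yes")
```

Split heuristics recomputed from `counts` then reflect only recent data, which is what lets a node notice that its current split no longer passes the Hoeffding bound.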
    • A Classifier Ensemble Approach to Stream Data Classification
      The idea is to train an ensemble or group of classifiers (using, say, naïve Bayes) from sequential chunks of the data stream.
      Whenever a new chunk arrives, we build a new classifier from it.
      The individual classifiers are weighted based on their expected classification accuracy in a time-changing environment.
      Only the top-k classifiers are kept. The decisions are then based on the weighted votes of the classifiers.
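The chunk-based ensemble above can be sketched as follows. To keep the example self-contained, a trivial majority-class learner stands in for naïve Bayes, and accuracy on the newest chunk stands in for "expected classification accuracy"; both substitutions are assumptions, as are all class and function names. Note that scoring the newest classifier on its own training chunk is optimistic.

```python
from collections import Counter, defaultdict


class MajorityClassifier:
    """Stand-in base learner: always predicts its chunk's most common label."""

    def __init__(self, chunk):
        self.label = Counter(label for _, label in chunk).most_common(1)[0][0]

    def predict(self, x):
        return self.label


def accuracy(model, chunk):
    return sum(model.predict(x) == y for x, y in chunk) / len(chunk)


class ChunkEnsemble:
    def __init__(self, k):
        self.k = k
        self.members = []  # list of (weight, model) pairs

    def add_chunk(self, chunk):
        # Build a new classifier from the new chunk, then re-weight every
        # candidate by its accuracy on that chunk and keep the top k.
        candidates = [m for _, m in self.members] + [MajorityClassifier(chunk)]
        scored = [(accuracy(m, chunk), m) for m in candidates]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        self.members = scored[: self.k]

    def predict(self, x):
        # Weighted vote over the retained classifiers.
        votes = defaultdict(float)
        for weight, model in self.members:
            votes[model.predict(x)] += weight
        return max(votes, key=votes.get)


# Usage: the stream drifts from mostly "a" to mostly "b".
ensemble = ChunkEnsemble(k=2)
ensemble.add_chunk([(0, "a"), (0, "a"), (0, "a"), (0, "b")])
ensemble.add_chunk([(0, "b"), (0, "b"), (0, "b"), (0, "b")])
ensemble.add_chunk([(0, "b"), (0, "b"), (0, "b"), (0, "a")])
prediction = ensemble.predict(0)
```

Re-weighting on each new chunk is what lets classifiers trained on outdated concepts lose influence and eventually drop out of the top k.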
    • Clustering in evolving data streams
      Compute and store summaries of past data
      Apply a divide-and-conquer strategy
      Incremental clustering of incoming data streams
      Perform micro clustering as well as macro clustering analysis
      Explore multiple time granularity for the analysis of cluster evolution
      Divide stream clustering into on-line and off-line processes
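One way to "compute and store summaries of past data" for micro clustering is an additive cluster feature that keeps the point count, the per-dimension linear sum, and the per-dimension squared sum. The sketch below follows that general idea; the slides do not specify the summary structure, so the class `MicroCluster` and its methods are assumptions for illustration.

```python
import math


class MicroCluster:
    """Additive summary (count, linear sum, squared sum) of a group of points."""

    def __init__(self, dim):
        self.n = 0
        self.linear_sum = [0.0] * dim
        self.square_sum = [0.0] * dim

    def insert(self, point):
        # Incremental update: the raw point does not need to be stored.
        self.n += 1
        for i, v in enumerate(point):
            self.linear_sum[i] += v
            self.square_sum[i] += v * v

    def merge(self, other):
        # Summaries are additive, so two micro-clusters merge in O(dim).
        self.n += other.n
        for i in range(len(self.linear_sum)):
            self.linear_sum[i] += other.linear_sum[i]
            self.square_sum[i] += other.square_sum[i]

    def centroid(self):
        return [s / self.n for s in self.linear_sum]

    def radius(self):
        # Root of the summed per-dimension variance around the centroid.
        variance = sum(
            sq / self.n - (ls / self.n) ** 2
            for ls, sq in zip(self.linear_sum, self.square_sum)
        )
        return math.sqrt(max(variance, 0.0))


# Usage: the on-line phase maintains summaries; an off-line phase could
# later run macro clustering over the centroids.
a = MicroCluster(dim=2)
a.insert((0.0, 0.0))
a.insert((2.0, 0.0))
radius_before_merge = a.radius()
b = MicroCluster(dim=2)
b.insert((1.0, 3.0))
a.merge(b)
```

Because the summaries are additive, they support both incremental updates as points arrive and later merging during off-line analysis.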
    • Mining Time-Series Data
      A time-series database consists of sequences of values or events obtained over repeated measurements of time.
      Trend Analysis
      Similarity Search in Time-Series Analysis
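For trend analysis, a common first step is to smooth the series with a moving average so that short-term fluctuations are damped. The slides do not name a specific technique, so the moving average and the function name `moving_average` are assumptions chosen for this minimal sketch.

```python
def moving_average(series, order):
    """Return the moving average of the given order over a list of numbers."""
    if order <= 0 or order > len(series):
        raise ValueError("order must be between 1 and len(series)")
    window_sum = sum(series[:order])
    result = [window_sum / order]
    for i in range(order, len(series)):
        # Slide the window: add the new value, drop the oldest one.
        window_sum += series[i] - series[i - order]
        result.append(window_sum / order)
    return result


# Usage: smooth a short series with a window of 3.
smoothed = moving_average([3, 7, 2, 0, 4, 5, 9, 7, 2], order=3)
```

A larger `order` gives a smoother curve but loses more values at the ends of the series and reacts more slowly to real changes in trend.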
    • Markov Chain for sequence analysis
      A Markov chain is a model that generates sequences in which the probability of a symbol depends only on the previous symbol.
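Under this definition, the probability of a sequence is the probability of its first symbol times the product of the transition probabilities between consecutive symbols. A minimal sketch follows; the two-symbol alphabet and all probability values are made up for illustration, and `sequence_log_prob` is a hypothetical helper name.

```python
import math


def sequence_log_prob(sequence, initial, transition):
    """Log-probability of a sequence under a first-order Markov chain."""
    log_p = math.log(initial[sequence[0]])
    for prev, cur in zip(sequence, sequence[1:]):
        # Each symbol depends only on the symbol just before it.
        log_p += math.log(transition[prev][cur])
    return log_p


# Illustrative parameters (assumed, not from the slides).
initial = {"A": 0.5, "C": 0.5}
transition = {
    "A": {"A": 0.9, "C": 0.1},
    "C": {"A": 0.2, "C": 0.8},
}

# P("AAC") = 0.5 * 0.9 * 0.1
prob = math.exp(sequence_log_prob("AAC", initial, transition))
```

Working in log space avoids numerical underflow when sequences are long.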
    • Tasks using hidden Markov models include:
      Evaluation: Given a sequence, x, determine the probability, P(x), of obtaining x in the model.
      Decoding: Given a sequence, determine the most probable path through the model that produced the sequence.
      Learning: Given a model and a set of training sequences, find the model parameters (i.e., the transition and emission probabilities) that explain the training sequences with relatively high probability.
    • Algorithms for the hidden Markov model tasks
      Forward Algorithm (evaluation)
      Viterbi Algorithm (decoding)
      Baum-Welch Algorithm (learning)
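The forward and Viterbi algorithms can be sketched compactly for a small discrete hidden Markov model. The two-state model and its probabilities below are illustrative assumptions, as are the function signatures. The sketch uses no explicit end state and omits Baum-Welch, which iteratively re-estimates the parameters.

```python
def forward(obs, states, start, trans, emit):
    """Evaluation: total probability of the observation sequence."""
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}
    for symbol in obs[1:]:
        # Sum over all predecessor states for each current state.
        alpha = {
            s: emit[s][symbol] * sum(alpha[r] * trans[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


def viterbi(obs, states, start, trans, emit):
    """Decoding: (probability, path) of the most probable state path."""
    best = {s: (start[s] * emit[s][obs[0]], [s]) for s in states}
    for symbol in obs[1:]:
        new_best = {}
        for s in states:
            # Keep only the best predecessor instead of summing over all.
            prob, path = max(
                ((best[r][0] * trans[r][s], best[r][1]) for r in states),
                key=lambda t: t[0],
            )
            new_best[s] = (prob * emit[s][symbol], path + [s])
        best = new_best
    return max(best.values(), key=lambda t: t[0])


# Illustrative model (assumed values).
states = ["Healthy", "Fever"]
start = {"Healthy": 0.6, "Fever": 0.4}
trans = {
    "Healthy": {"Healthy": 0.7, "Fever": 0.3},
    "Fever": {"Healthy": 0.4, "Fever": 0.6},
}
emit = {
    "Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
    "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6},
}
obs = ["normal", "cold", "dizzy"]

total_prob = forward(obs, states, start, trans, emit)
best_prob, best_path = viterbi(obs, states, start, trans, emit)
```

The two functions share the same recurrence; the only difference is that the forward algorithm sums over predecessor states while Viterbi takes the maximum and remembers the path.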