Data Mining: Mining stream time series and sequence data

Published in: Technology

- 1. Mining Stream, Time Series, and Sequence Data
- 2. Methodologies for Stream Data Processing and Stream Data Systems
  - Random sampling
  - Sliding windows
  - Histograms
  - Multiresolution methods
  - Sketches (synopses)
- 3. Randomized Algorithms to Analyze Data Streams
  - Randomized algorithms, in the form of random sampling and sketching, are often used to deal with massive, high-dimensional data streams.
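
One common way to draw a random sample from a stream of unknown length is reservoir sampling. The slides do not name a specific sampling method, so the following is only an illustrative sketch; the function name `reservoir_sample` and its parameters are assumptions made for this example.

```python
import random


def reservoir_sample(stream, k, rng=None):
    """Keep a uniform random sample of k items from a stream of unknown length.

    Illustrative sketch: the slides mention random sampling but do not
    prescribe this particular algorithm.
    """
    rng = rng or random.Random()
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i replaces a stored item with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


if __name__ == "__main__":
    print(reservoir_sample(range(1_000_000), 5))
```

Only `k` items are held in memory at any time, which is what makes the approach suitable for streams that cannot be stored in full.
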
- 4. Data Stream Management Systems and Stream Queries
  - In traditional database systems, data are stored in finite, persistent databases.
  - Stream data, by contrast, are potentially infinite and cannot be stored in full in a database.
  - In a Data Stream Management System (DSMS), there may be multiple data streams.
  - Once an element from a data stream has been processed, it is discarded or archived, and it cannot be easily retrieved unless it is explicitly stored in memory.
- 5. Critical Layers of a Stream Data Cube
  - Two cuboids (or layers) are critical:
  - The first, called the minimal interest layer, is the minimally interesting layer that an analyst would like to study.
  - The second, called the observation layer, is the layer at which an analyst (or an automated system) would like to continuously study the data.
- 6. Hoeffding Tree Algorithm
  - The Hoeffding tree algorithm is a decision tree learning method for stream data classification.
  - It was initially used to track Web click streams and construct models to predict which Web hosts and Web sites a user is likely to access.
  - It typically runs in sublinear time and produces a nearly identical decision tree to that of traditional batch learners.
  - It uses Hoeffding trees, which exploit the idea that a small sample can often be enough to choose an optimal splitting attribute.
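
The claim that a small sample can be enough rests on the Hoeffding bound: after n independent observations of a variable with range R, the true mean lies within ε = sqrt(R² ln(1/δ) / (2n)) of the observed mean with probability 1 − δ. The sketch below shows how that bound could drive a split decision. The function names and the gain values in the usage lines are made up for illustration and are not taken from the slides.

```python
import math


def hoeffding_bound(value_range, delta, n):
    """Return epsilon such that, with probability 1 - delta, the true mean
    is within epsilon of the mean observed over n examples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def can_split(best_gain, second_gain, value_range, delta, n):
    """Split once the best attribute beats the runner-up by more than epsilon.

    best_gain and second_gain are the observed values of the split
    heuristic (for example, information gain) for the top two attributes.
    """
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


if __name__ == "__main__":
    # Hypothetical numbers: heuristic range 1.0, delta = 1e-7, 1000 examples.
    print(hoeffding_bound(1.0, 1e-7, 1000))
    print(can_split(0.30, 0.10, 1.0, 1e-7, 1000))
    print(can_split(0.30, 0.25, 1.0, 1e-7, 1000))
```

Because ε shrinks as n grows, a leaf that cannot yet separate its two best attributes simply waits for more examples before splitting.
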
- 7. Very Fast Decision Tree (VFDT)
  - The VFDT (Very Fast Decision Tree) algorithm makes several modifications to the Hoeffding tree algorithm.
  - The modifications include breaking near-ties during attribute selection more aggressively, recomputing the split-evaluation function G only after a number of new training examples, deactivating the least promising leaves whenever memory is running low, dropping poor splitting attributes, and improving the initialization method.
  - VFDT works well on stream data and also compares extremely well to traditional classifiers in both speed and accuracy.
  - To adapt to concept-drifting data streams, VFDT was further developed into the Concept-adapting Very Fast Decision Tree algorithm (CVFDT).
- 8. Concept-adapting Very Fast Decision Tree Algorithm (CVFDT)
  - CVFDT also uses a sliding window approach; however, it does not construct a new model from scratch each time.
  - Rather, it updates statistics at the nodes by incrementing the counts associated with new examples and decrementing the counts associated with old ones.
  - Therefore, if there is a concept drift, some nodes may no longer pass the Hoeffding bound. When this happens, an alternate subtree will be grown, with the new best splitting attribute at the root.
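
The increment/decrement bookkeeping can be sketched for a single node as follows. This is a simplified illustration only: the class name `WindowedCounts` and its layout are assumptions, and a full CVFDT implementation maintains such statistics at every node of the tree and also grows alternate subtrees, which is not shown here.

```python
from collections import Counter, deque


class WindowedCounts:
    """Per-node (attribute, value, label) counts over a sliding window.

    Hypothetical helper for illustration; not the CVFDT data structure itself.
    """

    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque()
        self.counts = Counter()

    def add(self, attributes, label):
        # Increment the counts associated with the new example.
        self.window.append((attributes, label))
        for name, value in attributes.items():
            self.counts[(name, value, label)] += 1
        # Decrement the counts of the example that falls out of the window.
        if len(self.window) > self.window_size:
            old_attributes, old_label = self.window.popleft()
            for name, value in old_attributes.items():
                key = (name, value, old_label)
                self.counts[key] -= 1
                if self.counts[key] == 0:
                    del self.counts[key]


if __name__ == "__main__":
    node = WindowedCounts(window_size=2)
    node.add({"color": "red"}, "yes")
    node.add({"color": "red"}, "no")
    node.add({"color": "blue"}, "yes")  # evicts the first example
    print(dict(node.counts))
```

Each update touches only the counts of one arriving and one departing example, so the cost per example does not depend on the window size.
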
- 9. A Classifier Ensemble Approach to Stream Data Classification
  - The idea is to train an ensemble, or group, of classifiers (using, say, naïve Bayes) from sequential chunks of the data stream.
  - Whenever a new chunk arrives, we build a new classifier from it.
  - The individual classifiers are weighted based on their expected classification accuracy in a time-changing environment.
  - Only the top-k classifiers are kept. The decisions are then based on the weighted votes of the classifiers.
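
The top-k weighted vote could look roughly like the sketch below. It assumes each ensemble member is a `(weight, predict)` pair whose weight has already been estimated; how the weights are computed is not specified on the slide, and the function name and toy classifiers are invented for this example.

```python
def weighted_vote(classifiers, x, k):
    """Predict a label for x using the k highest-weighted classifiers.

    classifiers is a list of (weight, predict) pairs, where predict is any
    callable mapping an example to a label (an assumption of this sketch).
    """
    top = sorted(classifiers, key=lambda pair: pair[0], reverse=True)[:k]
    votes = {}
    for weight, predict in top:
        label = predict(x)
        votes[label] = votes.get(label, 0.0) + weight
    # The label with the largest total weight wins.
    return max(votes, key=votes.get)


if __name__ == "__main__":
    # Toy classifiers that ignore their input, for demonstration only.
    ensemble = [
        (0.9, lambda x: "spam"),
        (0.6, lambda x: "ham"),
        (0.5, lambda x: "ham"),
        (0.1, lambda x: "spam"),
    ]
    print(weighted_vote(ensemble, None, k=3))
```
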
- 10. Clustering in Evolving Data Streams
  - Compute and store summaries of past data
  - Apply a divide-and-conquer strategy
  - Cluster incoming data streams incrementally
  - Perform micro-clustering as well as macro-clustering analysis
  - Explore multiple time granularities for the analysis of cluster evolution
  - Divide stream clustering into on-line and off-line processes
- 11. Mining Time-Series Data
  - A time-series database consists of sequences of values or events obtained over repeated measurements of time.
  - Trend analysis
  - Similarity search in time-series analysis
- 12. Markov Chains for Sequence Analysis
  - A Markov chain is a model that generates sequences in which the probability of a symbol depends only on the previous symbol.
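
Under this definition, the probability of a whole sequence is the probability of its first symbol times the product of the transition probabilities between consecutive symbols. A minimal sketch, with a made-up two-symbol chain (the probabilities below are illustrative, not from the slides):

```python
def sequence_probability(sequence, initial, transition):
    """Probability of a sequence under a first-order Markov chain.

    initial[s] is the probability of starting with symbol s, and
    transition[a][b] is the probability that b follows a.
    """
    if not sequence:
        return 1.0
    prob = initial[sequence[0]]
    for prev, cur in zip(sequence, sequence[1:]):
        prob *= transition[prev][cur]
    return prob


if __name__ == "__main__":
    # Hypothetical chain over the symbols "A" and "B".
    initial = {"A": 0.5, "B": 0.5}
    transition = {
        "A": {"A": 0.9, "B": 0.1},
        "B": {"A": 0.5, "B": 0.5},
    }
    # P("AAB") = P(A) * P(A|A) * P(B|A) = 0.5 * 0.9 * 0.1
    print(sequence_probability("AAB", initial, transition))
```
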
- 13. Tasks Using Hidden Markov Models Include:
  - Evaluation: Given a sequence, x, determine the probability, P(x), of obtaining x in the model.
  - Decoding: Given a sequence, determine the most probable path through the model that produced the sequence.
  - Learning: Given a model and a set of training sequences, find the model parameters (i.e., the transition and emission probabilities) that explain the training sequences with relatively high probability.
- 14. Algorithms for Sequence Analysis with Hidden Markov Models
  - Forward algorithm (evaluation)
  - Viterbi algorithm (decoding)
  - Baum-Welch algorithm (learning)
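
As a rough illustration of two of these algorithms, the sketch below computes P(x) by summing over all hidden paths (forward algorithm) and finds the single most probable path (Viterbi algorithm) for a small hidden Markov model. The weather model and all of its probabilities are a toy example chosen for this sketch, not taken from the slides. For long sequences, real implementations usually work with log probabilities to avoid underflow, which is omitted here.

```python
def forward(obs, states, start, trans, emit):
    """Evaluation: P(obs) summed over all hidden state paths."""
    alpha = {s: start[s] * emit[s][obs[0]] for s in states}
    for o in obs[1:]:
        alpha = {
            s: emit[s][o] * sum(alpha[p] * trans[p][s] for p in states)
            for s in states
        }
    return sum(alpha.values())


def viterbi(obs, states, start, trans, emit):
    """Decoding: (probability, path) of the most probable hidden path."""
    best = {s: (start[s] * emit[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        new_best = {}
        for s in states:
            # Best predecessor for state s at this step.
            prob, path = max(
                ((best[p][0] * trans[p][s], best[p][1]) for p in states),
                key=lambda pair: pair[0],
            )
            new_best[s] = (prob * emit[s][o], path + [s])
        best = new_best
    return max(best.values(), key=lambda pair: pair[0])


# Toy model (hypothetical parameters).
STATES = ("Rainy", "Sunny")
START = {"Rainy": 0.6, "Sunny": 0.4}
TRANS = {
    "Rainy": {"Rainy": 0.7, "Sunny": 0.3},
    "Sunny": {"Rainy": 0.4, "Sunny": 0.6},
}
EMIT = {
    "Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}

if __name__ == "__main__":
    observations = ("walk", "shop", "clean")
    print(forward(observations, STATES, START, TRANS, EMIT))
    print(viterbi(observations, STATES, START, TRANS, EMIT))
```

The two functions differ only in replacing the sum over predecessor states with a maximum, plus the bookkeeping needed to recover the path.
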
- 15. Visit More Self-Help Tutorials
  - Pick a tutorial of your choice and browse through it at your own pace.
  - The tutorials section is free, self-guiding and will not involve any additional support.
  - Visit us at www.dataminingtools.net
