Efficient motif discovery for large scale time series in healthcare

Do Your Projects With Technology Experts
To get this project, call : 9566355386 / 99625 88976
Web : http://www.lemenizinfotech.com
Web : http://www.ieeemaster.com
Mail : projects@lemenizinfotech.com
Blog : http://ieeeprojectspondicherry.weebly.com
Blog : http://www.ieeeprojectsinpondicherry.blogspot.in/
Youtube:https://www.youtube.com/watch?v=eesBNUnKvws

Do Your Projects With Domain Experts…
Copyright © 2015 LeMeniz Infotech. All rights reserved
LeMeniz Infotech 36, 100 Feet Road, Natesan Nagar, Near Indira Gandhi Statue, Pondicherry-605 005.
Call: 0413-4205444, +91 9566355386, 99625 88976.
Web : www.lemenizinfotech.com / www.ieeemaster.com
Mail : projects@lemenizinfotech.com

Efficient Motif Discovery for Large-Scale Time Series in Healthcare

ABSTRACT:
Analyzing time series data can reveal the temporal behavior of the underlying mechanism that produces the data. Time series motifs, which are similar subsequences or frequently occurring patterns, are significant for researchers, especially in the medical domain. With the fast growth of time series data, traditional methods for motif discovery are inefficient and not applicable to large-scale data. This work proposes an efficient Motif Discovery method for Large-scale time series (MDLats). By computing standard motifs, MDLats eliminates most of the redundant computation found in related methods and reuses existing information as much as possible. All the motif types and subsequences are generated for subsequent analysis and classification. Our system is implemented on a Hadoop platform and deployed in a hospital for clinical electrocardiography classification. Experiments on real-world healthcare data show that MDLats outperforms state-of-the-art methods, even on large time series.

INTRODUCTION
Large amounts of data are being collected from scientific experiments, business operations, and social media in the big data era. As a special case, thousands of research labs and commercial enterprises are generating large amounts of temporal data. For example, many research hospitals have trillions of data points of electrocardiography (ECG) data; Twitter collects 170 million temporal data items every minute and serves up 200 million queries per day. These temporal data are collected sequentially over time and are called time series.
A time series is a sequence of real numbers in which each number represents a value at a point in time. Mapped to the physical world, a time series can represent network flow, exchange rates, or weather conditions over time.
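To make the notions of time series, subsequence, and motif concrete, the following is a minimal sketch of the naive baseline: it compares every pair of non-overlapping windows and returns the closest pair. This is not the MDLats algorithm; it is the kind of exhaustive similarity search whose quadratic cost motivates more efficient methods. The class and method names (NaiveMotifSketch, zNormalize, closestPair) and the choice of z-normalized Euclidean distance are assumptions made for illustration.

```java
import java.util.Arrays;

/** Naive baseline (illustrative only): closest pair of non-overlapping windows. */
class NaiveMotifSketch {

    /** Z-normalizes one window so offset and amplitude do not dominate the distance. */
    static double[] zNormalize(double[] series, int start, int m) {
        double mean = 0.0;
        for (int i = start; i < start + m; i++) {
            mean += series[i];
        }
        mean /= m;
        double var = 0.0;
        for (int i = start; i < start + m; i++) {
            double d = series[i] - mean;
            var += d * d;
        }
        double std = Math.sqrt(var / m);
        double[] out = new double[m];
        for (int i = 0; i < m; i++) {
            // A flat window has zero deviation; map it to all zeros.
            out[i] = (std == 0.0) ? 0.0 : (series[start + i] - mean) / std;
        }
        return out;
    }

    /** Euclidean distance between two equal-length windows. */
    static double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /**
     * Returns the start offsets {i, j} of the most similar pair of
     * non-overlapping windows of length m, or null if the series is too short.
     * Cost grows quadratically with the number of windows.
     */
    static int[] closestPair(double[] series, int m) {
        if (m <= 0 || series.length < 2 * m) {
            return null;
        }
        int n = series.length - m + 1; // number of sliding windows
        double[][] windows = new double[n][];
        for (int i = 0; i < n; i++) {
            windows[i] = zNormalize(series, i, m);
        }
        double best = Double.POSITIVE_INFINITY;
        int[] bestPair = null;
        for (int i = 0; i < n; i++) {
            for (int j = i + m; j < n; j++) { // j >= i + m skips overlapping windows
                double d = distance(windows[i], windows[j]);
                if (d < best) {
                    best = d;
                    bestPair = new int[] {i, j};
                }
            }
        }
        return bestPair;
    }

    public static void main(String[] args) {
        // The shape 0,1,3,1,0 occurs at offsets 0 and 8.
        double[] series = {0, 1, 3, 1, 0, 5, 2, 7, 0, 1, 3, 1, 0, 4};
        System.out.println(Arrays.toString(closestPair(series, 5))); // prints [0, 8]
    }
}
```

Because every window is compared with every other window, doubling the series length roughly quadruples the work, which is why this approach does not carry over to series with billions of points.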
EXISTING SYSTEM
Traditional time series data mining algorithms cannot handle large-scale time series efficiently. Because similarity search is so time-consuming, it becomes the bottleneck of these algorithms for classification, clustering, motif discovery, and anomaly detection. Time series motifs, which are similar subsequences or frequently occurring patterns, are significant for researchers, especially in the healthcare domain.

DISADVANTAGE OF EXISTING SYSTEM
• Due to their poor scalability, existing motif discovery methods cannot achieve satisfactory efficiency and accuracy on large-scale time series data.

PROPOSED SYSTEM
The proposed system is a Motif Discovery method for Large-scale time series data (MDLats) that combines the advantages of both exact and approximate methods. It balances efficiency and accuracy by computing standard motifs, from which the final motifs, including all the similar subsequences, are discovered adaptively. We have implemented MDLats on Hadoop. Experiments on the public University of California, Riverside (UCR) time series data and on a random walk dataset show that it scales to very large sizes and clearly outperforms competitors in processing time, precision, and recall. Its real-world healthcare application to ECG classification achieves over 95% precision and recall.

ADVANTAGES OF PROPOSED SYSTEM
• It improves the efficiency of motif discovery while retaining accuracy. By computing standard motifs, it eliminates most of the redundant computation and reuses existing information as much as possible by exploiting the relation between existing data and newly arrived data.
• It can generate all the motifs, including not only the most similar pairs of subsequences, as is commonly done in existing methods, but also all the similar subsequences, which provide additional information for time series analysis.
• It adapts to motifs of different lengths and can avoid a full scan in preprocessing. It is extremely efficient when the time series are regular and the fluctuations in them are sparse. Generally, both ECG and network flow data have this property.
• It is scalable and can be deployed on Hadoop for parallel computing.

ARCHITECTURE:
[Figure: system architecture]
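As a rough illustration of two of the points above (reporting all similar subsequences rather than only the closest pair, and splitting the work so it can run in parallel), the following plain-Java sketch partitions a series into overlapping chunks, searches each chunk for windows close to a reference pattern, and merges the hits. It does not use the Hadoop API and is not the MDLats implementation; the sequential loop over chunks only stands in for independent parallel tasks. All names here (ChunkedMotifSearchSketch, splitWithOverlap, matchesInChunk, findAll) and the distance threshold are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative chunked search for all windows similar to a reference pattern. */
class ChunkedMotifSearchSketch {

    /**
     * Splits window start positions into chunks of chunkSize. Each chunk is
     * {start, endExclusive}; its data range is extended by m - 1 points so that
     * windows crossing a chunk boundary are still seen exactly once.
     */
    static List<int[]> splitWithOverlap(int n, int chunkSize, int m) {
        if (chunkSize <= 0 || m <= 0) {
            throw new IllegalArgumentException("chunkSize and m must be positive");
        }
        List<int[]> chunks = new ArrayList<int[]>();
        int lastStart = n - m; // last valid window start
        for (int s = 0; s <= lastStart; s += chunkSize) {
            chunks.add(new int[] {s, Math.min(n, s + chunkSize + m - 1)});
        }
        return chunks;
    }

    /** Z-normalizes one window; a flat window becomes all zeros. */
    static double[] zNormalize(double[] x, int start, int m) {
        double mean = 0.0;
        for (int i = start; i < start + m; i++) {
            mean += x[i];
        }
        mean /= m;
        double var = 0.0;
        for (int i = start; i < start + m; i++) {
            var += (x[i] - mean) * (x[i] - mean);
        }
        double std = Math.sqrt(var / m);
        double[] out = new double[m];
        for (int i = 0; i < m; i++) {
            out[i] = (std == 0.0) ? 0.0 : (x[start + i] - mean) / std;
        }
        return out;
    }

    /** Per-chunk step: offsets whose window lies within threshold of the pattern. */
    static List<Integer> matchesInChunk(double[] series, int[] chunk,
                                        double[] pattern, double threshold) {
        int m = pattern.length;
        double[] ref = zNormalize(pattern, 0, m);
        List<Integer> hits = new ArrayList<Integer>();
        for (int s = chunk[0]; s + m <= chunk[1]; s++) {
            double[] w = zNormalize(series, s, m);
            double sum = 0.0;
            for (int i = 0; i < m; i++) {
                sum += (w[i] - ref[i]) * (w[i] - ref[i]);
            }
            if (Math.sqrt(sum) <= threshold) {
                hits.add(s);
            }
        }
        return hits;
    }

    /** Merge step: concatenates the hits of all chunks in offset order. */
    static List<Integer> findAll(double[] series, double[] pattern,
                                 int chunkSize, double threshold) {
        List<Integer> all = new ArrayList<Integer>();
        for (int[] chunk : splitWithOverlap(series.length, chunkSize, pattern.length)) {
            all.addAll(matchesInChunk(series, chunk, pattern, threshold));
        }
        return all;
    }

    public static void main(String[] args) {
        double[] series = {0, 2, 0, 0, 9, 0, 1, 0, 4, 4, 0, 3, 0};
        double[] pattern = {0, 1, 0}; // a single spike, at any amplitude
        System.out.println(findAll(series, pattern, 4, 0.01)); // prints [0, 3, 5, 10]
    }
}
```

In this example the match at offset 3 spans the boundary between the first and second chunk; it is found only because each chunk's data range is extended by m - 1 points. The same overlap idea would be needed in any split-based deployment, whatever framework runs the chunks.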
HARDWARE REQUIREMENTS:
• System : Pentium IV 2.4 GHz.
• Hard Disk : 40 GB.
• Floppy Drive : 1.44 MB.
• Monitor : 15" VGA Colour.

SOFTWARE REQUIREMENTS:
• Operating system : Windows 7.
• Coding Language : Java 1.7, Hadoop 0.8.1
• Database : MySQL 5
• IDE : Eclipse
