JAVA 2013 IEEE DATAMINING PROJECT Mining Order-Preserving Submatrices from Data with Repeated Measurements
Upcoming SlideShare
Loading in...5
×
 

JAVA 2013 IEEE DATAMINING PROJECT Mining Order-Preserving Submatrices from Data with Repeated Measurements

on

  • 1,016 views

To Get any Project for CSE, IT ECE, EEE Contact Me @ 09849539085, 09966235788 or mail us - ieeefinalsemprojects@gmail.com-Visit Our Website: www.finalyearprojects.org

To Get any Project for CSE, IT ECE, EEE Contact Me @ 09849539085, 09966235788 or mail us - ieeefinalsemprojects@gmail.com-Visit Our Website: www.finalyearprojects.org

Statistics

Views

Total Views
1,016
Views on SlideShare
1,016
Embed Views
0

Actions

Likes
0
Downloads
11
Comments
0

0 Embeds 0

No embeds

Accessibility

Categories

Upload Details

Uploaded via as Microsoft Word

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

JAVA 2013 IEEE DATAMINING PROJECT Mining Order-Preserving Submatrices from Data with Repeated Measurements JAVA 2013 IEEE DATAMINING PROJECT Mining Order-Preserving Submatrices from Data with Repeated Measurements Document Transcript

  • Mining Order-Preserving Sub matrices from Data with Repeated Measurements ABSTRACT Order-preserving sub matrices (OPSM’s) have been shown useful in capturing concurrent patterns in data when the relative magnitudes of data items are more important than their exact values. For instance, in analyzing gene expression profiles obtained from microarray experiments, the relative magnitudes are important both because they represent the change of gene activities across the experiments, and because there is typically a high level of noise in data that makes the exact values untreatable. To cope with data noise, repeated experiments are often conducted to collect multiple measurements. We propose and study a more robust version of OPSM, where each data item is represented by a set of values obtained from replicated experiments. We call the new problem OPSM-RM (OPSM with repeated measurements). We define OPSM-RM based on a number of practical requirements. We discuss the computational challenges of OPSM-RM and propose a generic mining algorithm. We further propose a series of techniques to speed up two time-dominating components of the algorithm. We show the GLOBALSOFT TECHNOLOGIES IEEE PROJECTS & SOFTWARE DEVELOPMENTS IEEE FINAL YEAR PROJECTS|IEEE ENGINEERING PROJECTS|IEEE STUDENTS PROJECTS|IEEE BULK PROJECTS|BE/BTECH/ME/MTECH/MS/MCA PROJECTS|CSE/IT/ECE/EEE PROJECTS CELL: +91 98495 39085, +91 99662 35788, +91 98495 57908, +91 97014 40401 Visit: www.finalyearprojects.org Mail to:ieeefinalsemprojects@gmail.com
  • effectiveness and efficiency of our methods through a series of experiments conducted on real microarray data. EXISTING SYSTEM: To synthesize additional replicates, for each gene and each experiment, we follow standard practice to model the values by a Gaussian distribution with the mean and variance equal to the sample mean and variance of the 4 replicates. The expression values of new replicates were then sampled from the Gaussian. New columns are synthesized by randomly drawing an existing column, fitting Gaussians as described, and sampling values from it. This way of construction mimics the addition of knockout experiments of genes in the same sub pathways of the original ones. PROPOSED SYSTEM: Propose a generic mining algorithm. We further propose a series of techniques to speed up two time-dominating components of the algorithm. We show the effectiveness and efficiency of our methods through a series of experiments conducted on real microarray data. The conventional order-preserving sub matrix (OPSM) mining problem was motivated and introduced to analyze gene expression data without repeated measurements. They proved that the problem is NP hard. A greedy heuristic mining algorithm was proposed, which does not guarantee the return of all OPSM’s or the best OPSM’s.
  • Advantage: 1. We can find duplicate records and same group types. 2. Easily we can catch genes relationship from database. Modules: 1. Patient Module. 2. Doctor Module. 3. User Module. 4. Admin Module. Patient Module: Department Manager's Patient Module allows administrators to access and manage their own patient billing records at their desktop. It tracks patient diagnosis information for each admission or visit as well as a historical account of services provided to each patient. This module helps to collect complete and relevant patient information. The system automates the patient administration functions to have better and efficient patient care process. Doctor Module: In this module, doctor register and login. He inserts patient details and view patient records and he change his password and the data will be passed from the co-coordinator and thus it will be submitted to the End Users (Data Users). User Module: In this module, the Users are classified into two types they are, Data Users and Data Owner Depends on the restriction the data will be passed to the Co-coordinator.The
  • co-coordinator pass the details via broker and the data will be checked with the secret key and thus it will display for the users. Admin Module: In this module, to arrange the database based on the patient and doctor details and records. The admin needs to register and register the Organization and Users Forms. Hardware Requirements: • System : Pentium IV 2.4 GHz. • Hard Disk : 40 GB. • Floppy Drive : 1.44 Mb. • Monitor : 14’ Colour Monitor. • Mouse : Optical Mouse. • Ram : 512 Mb. • Keyboard : 101 Keyboard Software Requirements: • Operating system : Windows XP. • Coding Language : ASP.Net with C# • Data Base : SQL Server 2005. CONCLUSION In this paper we have described the problem of high noise level to the mining of OPSM’s, and discussed how it can be alleviated by exploiting repeated measurements.
  • We have listed some practical requirements for a new problem, OPSM-RM that takes into account the repeated measurements, and proposed a concrete definition that fulfills the requirements. We have described a basic Apriori mining algorithm that utilizes a monotonic property of the definition. Its performance depends on the component functions generate and verify. We have proposed the counting array data structure and a sequence compression method for reducing the running time of verify. For generate, we have proposed two pruning methods based on the MinBound and the HTBound. The latter makes use of the head and tail arrays, which are useful both in constructing and proving the bound. We have performed experiments on real microarray data to demonstrate the biological validity of the OPSM-RM model, the effectiveness of the pruning methods, and the scalability of the algorithm. As sequencing-based methods have become more popular, the noise level in new gene expression datasets is expected to decrease and more distinct states of expression can be identified. How this will affect the advantages of OPSM-RM over OPSM is to be studied when more sequencing datasets become available.