International Journal of Computer Engineering and Technology (IJCET), ISSN 0976-6367 (Print), ISSN 0976-6375 (Online), Volume 4, Issue 3, May-June (2013), pp. 176-181, © IAEME: www.iaeme.com/ijcet.asp
Journal Impact Factor (2013): 6.1302 (Calculated by GISI), www.jifactor.com

DATA MINING WITH HUMAN GENETICS TO ENHANCE GENE BASED ALGORITHM AND DNA DATABASE SECURITY

Vijay Arputharaj J
Research Scholar, Department of Computer Science, Karpagam University, Coimbatore, Tamil Nadu, India

Dr. R. Manicka Chezian
Associate Professor, Department of Computer Science, NGM College (Autonomous), Pollachi, Tamil Nadu, India

ABSTRACT

The goal of data mining in a DNA database is to check possible combinations of DNA sequences and to generate a common, comprehensible code or algorithm to formulate the sequence of mutations. Since data mining is an effective technique for analyzing and extracting data, it is also helpful in formulating such a common algorithm.

In data mining for human genetics, an important goal is to understand the mapping relationship between inter-individual variation in human DNA sequences and variability in disease and mutation susceptibility. In lay terms, it is used to find out how changes in an individual's DNA sequence affect the risk of developing common diseases and mutations, with a high level of security. This investigation also helps in parental identification algorithms for DNA sequences and genome expressions. Data mining and data extraction techniques are used to meet the need for analysis of large, complex, information-rich data sets in DNA sequences.

Regulation of gene expression includes the processes that cells and viruses use to regulate the way that the information in genes is turned into gene products. An important challenge in the use of large-scale gene expression data for biological classification arises when the expression dataset being analyzed involves multiple classes. Data mining is used to overcome this kind of problem.
Key Words: Data mining, DNA Database, DNA Sequence, Gene Expression, Biological classification, Multiple class

1. INTRODUCTION

The Human Genome Project is a worldwide scientific research project whose main aim is to determine the sequence of chemical base pairs that make up DNA, and to identify and map the genes of the human genome from both a physical and a functional standpoint.

A DNA database, or DNA databank, is a database that contains DNA data. A DNA databank can be used in the analysis of parental comparison, genetic diseases, genetic fingerprinting for criminology, genetic genealogy and so on.

In data mining for human genetics, an important goal is to understand the mapping relationship between individual variation in human DNA sequences and variability in various algorithms for database security, mutation susceptibility and parental identification differences. In India, which is densely populated, there is a great need for DNA databases, which may help stop different types of fraud, such as passport fraud and other fraud.

Data mining and data extraction techniques are used to meet the need for analysis of large, complex, information-rich data sets in DNA sequences. Several visualization and data mining techniques are already available, and they are used to validate, and attempt to discover, new methods for differentiating coding DNA sequences (exons) from non-coding DNA sequences (introns). Since data mining is an effective technique for analyzing and extracting data, it is also helpful in formulating a common algorithm.

2. LITERATURE STUDY

2.1 INTERNATIONAL STATUS

In northern countries, data exploration techniques have been designed to classify DNA sequences using many different classification techniques, including rule-based classifiers and neural networks. Visualization of both the original data and the results of the data mining is used to help verify patterns and to understand the distinctions between the different types of data and classifications.

Forensic identification problems are examples in which the study of DNA profiles is a common approach. Existing work presents some of these problems and develops their treatment with a focus on the use of Object-Oriented Bayesian Networks (OOBN). The use of DNA databases, which began in 1995 in England, has created new challenges. In Portugal, the legislation for the construction of a genetic database was defined in 2008. Cryptographic, authentication and high-definition security approaches for databases are used in several countries, such as Thailand, the US and the UK.

2.2 NATIONAL STATUS

Genetic features and environmental factors are both involved in multifactorial diseases. Data mining tools were required to study them, and a 2-phase approach using a specific genetic algorithm was proposed. For the first phase, the feature selection problem, a genetic algorithm (GA) was used. To deal with this very specific problem, some advanced mechanisms were introduced into the genetic algorithm, such as sharing, random immigrants and dedicated genetic operators, and a particular distance operator was defined. Then, in the second phase,
a clustering based on the features selected during the previous phase uses the k-means clustering algorithm.

INDIA, CHENNAI: The FBI has a DNA index system. The UK has a similar database. And if Parliament passes the DNA Profiling Bill, 2007, India will soon join the league, creating a national DNA database that will help police arrest serial offenders and give a boost to forensic investigation. The bill, drafted and sent to all ministries and departments for their feedback, has been modified. The final version has been sent to the law ministry, which has sent it to the legal department for final drafting.

2.3 SIGNIFICANCE OF THE STUDY

• This research is useful to society as a whole: the identity of each citizen can be stored in a secured DNA database, which would help prevent fraud such as passport fraud and ration card fraud.
• This research advances and aids criminal and forensic databases. The application is also useful for the government and for society.
• This research primarily deals with the advancement of genetic algorithms with proper security features in DNA databases, and it enhances the special features of DNA database security.

3. RESEARCH STUDY AND DEVELOPMENT

3.1 AIMS AND OBJECTIVES

• To enhance database security. This research primarily deals with the advancement of genetic algorithms with proper security features in DNA databases, and it enhances the special features of DNA database security.
• Mapping relationships between DNA sequences and variability in disease and mutation susceptibility.
• An effective solution for parental identification algorithms for DNA sequences and genome expressions.

3.2 MATERIAL AND METHODS

1. Data mining and information retrieval
2. Visual analytics and collaboration
3. Combination of parallel algorithms for sequence analysis
4. Seamless high-performance computing
5. Security algorithms
   a) Reverse Encryption algorithm to protect data
   b) Advance Cryptography algorithm to protect data
   c) Advanced Encryption Standard (AES)

Among the above methodologies, the data mining technique is used for knowledge discovery from the entire DNA database. There can be three levels of genome data mining. The simplest is an in-depth analysis of the result of a single query using a genome browser. At this level, one may start with a gene or marker name, or by mapping a sequence to the genome. Cross comparison of various annotation tracks may help make sense of the query region. This is the most popular use of any genome browser. Data mining is the opposite of
information retrieval in the sense that it is not based on predetermined criteria; it uncovers hidden patterns by exploring the data.

Visual analytics and parallel algorithms are used in the implementation of security measures in the database.

Seamless high-performance computing is concerned with the speed of access to the database records. Information retrieval, by contrast, is based on predetermined criteria: for example, retrieving a group of people who belong to a certain class, have a certain mortgage plan, or have certain characteristics that are already known.

Cryptography is usually referred to as "the study of secrets", although nowadays it is most closely associated with encryption. Encryption is the process of converting plain ("unhidden") text into cryptic ("hidden") text to secure it against data thieves. The process has a second part, in which the cryptic text must be decrypted at the other end in order to be understood.

In the broad field of cryptography, encryption is the process of encoding messages (or information) in such a way that hackers cannot read them, while authorized parties can.

In an encryption scheme, the message or information, also called the plaintext, is encrypted using an encryption algorithm, turning it into an unreadable ciphertext. This is usually done with an encryption key, which specifies how the message is to be encoded. Decryption is then performed by the authorized party.

Encryption is a method of hiding data so that it cannot be read by anyone who does not know the key. The key is used to lock and unlock data. To encrypt data, one performs some mathematical functions on the data, and the result of these functions is output that looks like garbage to anyone who does not know how to reverse the operations.

The Advanced Encryption Standard (AES) is a specification for the encryption of electronic data established by the U.S. National Institute of Standards and Technology (NIST) in 2001.

STEPS:
1. KeyExpansion: round keys are derived from the cipher key using Rijndael's key schedule.
2. Initial round
   1. AddRoundKey: each byte of the state is combined with the round key using bitwise XOR.
3. Rounds
   1. SubBytes: a non-linear substitution step where each byte is replaced with another according to a lookup table.
   2. ShiftRows: a transposition step where each row of the state is shifted cyclically a certain number of steps.
   3. MixColumns: a mixing operation which operates on the columns of the state, combining the four bytes in each column.
   4. AddRoundKey
4. Final round (no MixColumns)
   1. SubBytes
   2. ShiftRows
   3. AddRoundKey
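The round structure in the STEPS list can be illustrated with a minimal Python sketch that encrypts a single 16-byte block. It is a teaching aid under several assumptions that the paper does not state: a 128-bit key (hence 10 rounds), a state held as a flat column-major list of 16 bytes, an S-box computed at start-up rather than hard-coded, and illustrative function names (`expand_key`, `encrypt_block` and so on). It has no mode of operation, padding or side-channel protection, so a real DNA database should rely on a vetted cryptographic library instead.

```python
def xtime(a):
    """Multiply a byte by 2 in GF(2^8), reducing by the AES polynomial 0x11B."""
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a


def gmul(a, b):
    """Multiply two bytes in GF(2^8) by shift-and-add."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a = xtime(a)
        b >>= 1
    return result


def build_sbox():
    """Build the SubBytes lookup table: field inverse, then affine transform."""
    sbox = []
    for x in range(256):
        inv = next(y for y in range(1, 256) if gmul(x, y) == 1) if x else 0
        s = inv
        for shift in range(1, 5):  # XOR with the inverse rotated left by 1..4 bits
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()


def expand_key(key):
    """KeyExpansion for a 16-byte key: returns 11 round keys of 16 bytes each."""
    words = [list(key[i:i + 4]) for i in range(0, 16, 4)]
    rcon = 1
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:
            temp = [SBOX[b] for b in temp[1:] + temp[:1]]  # RotWord, then SubWord
            temp[0] ^= rcon
            rcon = xtime(rcon)
        words.append([a ^ b for a, b in zip(words[i - 4], temp)])
    return [sum(words[4 * r:4 * r + 4], []) for r in range(11)]


def add_round_key(state, round_key):
    """AddRoundKey: bitwise XOR of the state with the round key."""
    return [s ^ k for s, k in zip(state, round_key)]


def sub_bytes(state):
    """SubBytes: replace each byte using the lookup table."""
    return [SBOX[b] for b in state]


def shift_rows(state):
    """ShiftRows: row r is rotated left by r positions (column-major state)."""
    return [state[r + 4 * ((c + r) % 4)] for c in range(4) for r in range(4)]


def mix_columns(state):
    """MixColumns: combine the four bytes of each column."""
    out = []
    for c in range(4):
        a = state[4 * c:4 * c + 4]
        out += [gmul(a[r], 2) ^ gmul(a[(r + 1) % 4], 3)
                ^ a[(r + 2) % 4] ^ a[(r + 3) % 4] for r in range(4)]
    return out


def encrypt_block(block, key):
    """Encrypt one 16-byte block with a 16-byte key, following the STEPS list."""
    round_keys = expand_key(key)
    state = add_round_key(list(block), round_keys[0])        # initial round
    for rnd in range(1, 10):                                 # rounds 1..9
        state = mix_columns(shift_rows(sub_bytes(state)))
        state = add_round_key(state, round_keys[rnd])
    state = shift_rows(sub_bytes(state))                     # final round
    return bytes(add_round_key(state, round_keys[10]))
```

As a check, the example vector in Appendix C.1 of FIPS 197 (key `000102030405060708090a0b0c0d0e0f`, plaintext `00112233445566778899aabbccddeeff`) should encrypt to `69c4e0d86a7b0430d8cdb78070b4c55a` when passed through `encrypt_block`.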
3.3 FINDINGS

• The success of DNA aging and sequencing work in sequencing the chemical bases of DNA changes in accordance with the biological changes that come with age. It forms new knowledge about fundamental biological processes. The initial segment of the task, called mapping, fragments the chromosomes into groups as a combined set of regulated expressions. High-performance data mining processors can be used to point out the location of these grouped genes and the expression of genes.
• Age correlated with an increasing percentage of sperm with highly damaged DNA (range: 0–83%) and tended to correlate inversely with the percentage of apoptotic sperm (range: 0.3%–23%).
• Gene mutations can prevent one or more proteins from working properly. By changing a gene's instructions for making a protein, a mutation can cause the protein to malfunction or to be missing entirely. When a mutation alters a protein that plays a critical role in the body, it can disrupt normal development or cause a medical condition. A condition caused by mutations in one or more genes is called a genetic disorder.
• FUTURE OF GENOMIC RESEARCH
  Develop and apply genome-based strategies for the early detection, diagnosis and treatment of diseases.
  Develop new technologies to study genes and DNA on a large scale and to store genomic data efficiently.

5. RESULT AND DISCUSSION

This work forms new knowledge about fundamental biological processes. High-performance data mining processors can be used to point out the location of these grouped genes and the expression of genes. Various algorithms and ideas for DNA database security have also been identified.

AGE CORRELATION

• Age correlated with an increasing percentage of sperm with highly damaged DNA (range: 0–83%) and tended to correlate inversely with the percentage of apoptotic sperm (range: 0.3%–23%).
• The success of DNA aging and sequencing work in sequencing the chemical bases of DNA changes in accordance with the biological changes that come with age. It forms new knowledge about fundamental biological processes. The initial segment of the task, called mapping, fragments the chromosomes into groups as a combined set of regulated expressions. High-performance data mining processors can be used to point out the location of these grouped genes and the expression of genes.

6. CONCLUSION

The module for aging sequences of DNA genome expressions has been completed successfully. The research has yet to achieve its further goals and objectives in disease and mutation susceptibility, and in parental identification modules with DNA database security.