Nowadays, disease occurrence largely depends on hereditary factors within families. Diabetes is one example, since it is transferred from parent to child in many cases. Disease diagnosis is a tedious process when the patient's parameters are unknown. DNA (deoxyribonucleic acid) sequence and structure help in studying disease patterns, in drawing up guidelines to control the spread of a disease, and in treating it efficiently when the next stage of its progression is known to the medical field. Integrating the exact DNA sequence and structure combinations for medical analysis is difficult: it requires collecting complex DNA data from different medical resources, with proper comparison, verification, and validation. This paper proposes an optimal integration of DNA sequence and structure information, with evaluation techniques based on data mining strategies, to analyze the disease condition of an individual patient. In future, this work will be extended with machine learning analysis implemented through a genetic algorithm to attain an automated DNA analysis system.
2. Efficient Integration of Target Patient data with DNA Sequences and Structure Information
towards Medical Diagnosis
http://www.iaeme.com/IJARET/index.asp 570 editor@iaeme.com
1. INTRODUCTION
1.1. DNA
DNA is a polymer composed of two polynucleotide chains that coil around each other to form
a double helix carrying genetic instructions for the development, functioning, growth, and
reproduction of all known organisms and many viruses [1]. DNA and ribonucleic acid (RNA)
are nucleic acids. Alongside proteins, lipids and complex carbohydrates (polysaccharides),
nucleic acids are one of the four major types of macromolecules that are essential for all known
forms of life [2, 4].
1.2. Data Integration
Data integration involves combining data residing in different sources and providing users with
a unified view of them. This process becomes significant in a variety of situations, which
include both commercial (such as when two similar companies need to merge their databases)
and scientific (combining research results from different bioinformatics repositories [3, 6], for
example) domains. Data integration appears with increasing frequency as the volume (that is,
big data) and the need to share existing data explodes. It has become the focus of extensive
theoretical work, and numerous open problems remain unsolved. Data integration encourages
collaboration between internal as well as external users [5]. The data being integrated must be
received from a heterogeneous database system and transformed to a single coherent data store
that provides synchronous data across a network of files for clients [7, 9].
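As a minimal sketch of the unified-view idea described above, the following Java example merges records from two sources that share a patient ID. The class name `UnifiedView`, the method `integrate`, the sample field names, and the rule that the first source wins on conflicts are all assumptions made for illustration; they are not taken from the original work.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class UnifiedView {
    // Merges per-ID field maps from two sources into a single store.
    // Assumption: when both sources define the same field, sourceA wins.
    static Map<String, Map<String, String>> integrate(
            Map<String, Map<String, String>> sourceA,
            Map<String, Map<String, String>> sourceB) {
        Map<String, Map<String, String>> unified = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, String>> e : sourceA.entrySet()) {
            unified.put(e.getKey(), new LinkedHashMap<>(e.getValue()));
        }
        for (Map.Entry<String, Map<String, String>> e : sourceB.entrySet()) {
            Map<String, String> target =
                    unified.computeIfAbsent(e.getKey(), k -> new LinkedHashMap<>());
            for (Map.Entry<String, String> field : e.getValue().entrySet()) {
                target.putIfAbsent(field.getKey(), field.getValue());
            }
        }
        return unified;
    }

    public static void main(String[] args) {
        // Hypothetical clinical source.
        Map<String, Map<String, String>> clinic = new LinkedHashMap<>();
        Map<String, String> clinicRow = new LinkedHashMap<>();
        clinicRow.put("name", "Patient A");
        clinic.put("P001", clinicRow);

        // Hypothetical laboratory source holding a DNA accession number.
        Map<String, Map<String, String>> lab = new LinkedHashMap<>();
        Map<String, String> labRow = new LinkedHashMap<>();
        labRow.put("dnaAccession", "U03518");
        lab.put("P001", labRow);

        System.out.println(integrate(clinic, lab));
    }
}
```

A real integration layer would also need schema mapping and conflict resolution between the heterogeneous sources, which this sketch leaves out.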
1.3. Medical Diagnosis
Medical diagnosis is the process of determining which disease or condition explains a person's
symptoms and signs. It is most often referred to as diagnosis with the medical context being
implicit [8].
Figure 1 Sample Medical Diagnosis System
1.4. Problem Statement
Collecting, integrating, and comparing DNA structure and sequence data for medical analysis is
difficult. It requires a data-mining-based extraction procedure applied to medical data retrieved
from different medical resources, with proper verification and validation for disease diagnosis.
The objective of this work is to extract and compare DNA structure and sequence data from
different medical resources, and to integrate data from several medical web resources, for the
analysis of the target patient's data.
2. PROPOSED METHODOLOGY
The proposed methodology focuses on efficient data integration of DNA sequence and structure
information towards medical diagnosis. DNA data collection and integration are combined with
pattern matching against the individual patient's data so that the medical data analysis can be
performed efficiently. Figure 2 shows the structure of the proposed methodology.
Edward Daniel Christopher .B and Victor S.P
2.1. Efficient data integration of DNA information towards medical diagnosis and
analysis
Figure 2 Proposed DNA data analysis for the patient disease diagnosis and analysis.
3. IMPLEMENTATION
3.1. Patient Data
The patient data contains the following information.
Patient History
It includes the fields listed in Table 1.
Table 1 Patient History
Sl.No  Field Name
1      Patient ID
2      Name
3      Communication
4      Family details
5      Travel history
Patient Pretest
It contains the fields listed in Table 2.
Table 2 Patient Pretest
Sl.No  Field Name
1      Health issue identified
2      Basic lab diagnostic results
3      Health issue history
4      Hospital and doctor tracks
5      Consumable medicines history
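The fields of the patient history and pretest tables could be held in a single record type, sketched below. The class name `PatientRecord`, all field types, and the `hasPretest` readiness rule are assumptions for illustration only; the paper does not specify a storage format.

```java
import java.util.ArrayList;
import java.util.List;

public class PatientRecord {
    // Patient history fields (Table 1); types are assumed.
    final String patientId;
    final String name;
    String communication = "";
    String familyDetails = "";
    final List<String> travelHistory = new ArrayList<>();

    // Patient pretest fields (Table 2); types are assumed.
    String healthIssueIdentified = "";
    final List<String> basicLabResults = new ArrayList<>();
    final List<String> healthIssueHistory = new ArrayList<>();
    final List<String> hospitalAndDoctorTracks = new ArrayList<>();
    final List<String> medicinesHistory = new ArrayList<>();

    PatientRecord(String patientId, String name) {
        if (patientId == null || patientId.isEmpty()) {
            throw new IllegalArgumentException("patientId is required");
        }
        this.patientId = patientId;
        this.name = name;
    }

    // Hypothetical rule: the pretest is usable once a health issue
    // and at least one lab result have been recorded.
    boolean hasPretest() {
        return !healthIssueIdentified.isEmpty() && !basicLabResults.isEmpty();
    }

    public static void main(String[] args) {
        PatientRecord p = new PatientRecord("P001", "Patient A");
        p.healthIssueIdentified = "Diabetes";
        p.basicLabResults.add("fasting glucose: elevated");
        System.out.println(p.patientId + " pretest ready: " + p.hasPretest());
    }
}
```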
DNA data
Figure 3 shows (a) the helix DNA structure, (b) the bond DNA structure, (c) the groove DNA
structure, and (d) a sample DNA sequence.
Figure 3 DNA sample data
Sample DNA Sequence
ID AA03518 standard; DNA; FUN; 237 BP.
XX
AC U03518;
XX
DE Aspergillus awamori internal transcribed spacer 1 (ITS1) and 18S
DE rRNA and 5.8S rRNA genes, partial sequence.
XX
SQ Sequence 237 BP; 41 A; 77 C; 67 G; 52 T; 0 other;
aacctgcgga aggatcatta ccgagtgcgg gtcctttggg cccaacctcc catccgtgtc 60
tattgtaccc tgttgcttcg gcgggcccgc cgcttgtcgg ccgccggggg ggcgcctctg 120
ccccccgggc ccgtgcccgc cggagacccc aacacgaaca ctgtctgaaa gcgtgcagtc 180
tgagttgatt gaatgcaatc agttaaaact ttcaacaatg gatctcttgg ttccggc 237
//
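The SQ line of the sample record states the sequence length and base composition. As a hedged sketch, the Java snippet below recounts the bases from the sequence lines so that the result can be compared with that header. The class name `BaseComposition` and its methods are illustrative assumptions, not part of the original system.

```java
public class BaseComposition {
    // Returns counts in the order A, C, G, T, other.
    // Whitespace and digits (the position markers at line ends) are skipped.
    static int[] count(String sequenceBlock) {
        int[] counts = new int[5];
        for (char ch : sequenceBlock.toLowerCase().toCharArray()) {
            if (Character.isWhitespace(ch) || Character.isDigit(ch)) {
                continue;
            }
            switch (ch) {
                case 'a': counts[0]++; break;
                case 'c': counts[1]++; break;
                case 'g': counts[2]++; break;
                case 't': counts[3]++; break;
                default:  counts[4]++;
            }
        }
        return counts;
    }

    // Total number of bases counted.
    static int total(int[] counts) {
        int sum = 0;
        for (int c : counts) {
            sum += c;
        }
        return sum;
    }

    public static void main(String[] args) {
        // Sequence lines copied from the sample record above.
        String block =
            "aacctgcgga aggatcatta ccgagtgcgg gtcctttggg cccaacctcc catccgtgtc 60\n"
          + "tattgtaccc tgttgcttcg gcgggcccgc cgcttgtcgg ccgccggggg ggcgcctctg 120\n"
          + "ccccccgggc ccgtgcccgc cggagacccc aacacgaaca ctgtctgaaa gcgtgcagtc 180\n"
          + "tgagttgatt gaatgcaatc agttaaaact ttcaacaatg gatctcttgg ttccggc 237\n";
        int[] c = count(block);
        System.out.printf("Sequence %d BP; %d A; %d C; %d G; %d T; %d other;%n",
                total(c), c[0], c[1], c[2], c[3], c[4]);
    }
}
```

If the printed line differs from the SQ header, the record was probably damaged during collection or copying, which is one simple validation step before integration.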
3.2. DNA Analysis
Break
Divide the DNA into several smaller parts for proper identification analysis. Using MATLAB,
the break-up into codons (groups of three bases) is obtained as follows,
Daniel_DNA = ['CTCGAGGGGCCTAGACATTGCCCTCCAGAGAGAGCACCCAACA', ...
              'CCCTCCAGGCTTGACCGGCCAGGGTG'];
% Reshape the 69-base sequence into a 23-by-3 matrix, one codon per row
Codons = reshape(Daniel_DNA(:), 3, length(Daniel_DNA)/3)'
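The same break-up can be sketched in Java, the language used for the other listings in this section. The class name `CodonSplitter` and the choice to reject sequences whose length is not a multiple of three are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class CodonSplitter {
    // Splits a DNA string into consecutive, non-overlapping groups of three bases.
    static List<String> toCodons(String dna) {
        if (dna.length() % 3 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 3");
        }
        List<String> codons = new ArrayList<>();
        for (int i = 0; i < dna.length(); i += 3) {
            codons.add(dna.substring(i, i + 3));
        }
        return codons;
    }

    public static void main(String[] args) {
        String dna = "CTCGAGGGGCCTAGACATTGCCCTCCAGAGAGAGCACCCAACA"
                   + "CCCTCCAGGCTTGACCGGCCAGGGTG";
        List<String> codons = toCodons(dna);
        System.out.println(codons.size() + " codons: " + codons);
    }
}
```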
Sub Sequence
Generate multiple sub-sequences to enable the pattern matching procedures. The Java code below
prints the base pairs of each sub-sequence as alternating upper and lower lobes of a DNA pattern,
// Java program to print the
// 'n' lobes of a DNA pattern
class GFG {
    // Function to print the upper half
    // of the DNA, i.e. the upper lobe
    static void printUpperHalf(String str)
    {
        char first, second;
        int pos = 0;
        // Each lobe is made of four rows,
        // and each row pairs two bases
        for (int i = 1; i <= 4; i++) {
            // Take the next two bases
            // from the string
            first = str.charAt(pos);
            second = str.charAt(pos + 1);
            pos += 2;
            // Indent, then join the pair with a widening run of dashes
            for (int j = 4 - i; j >= 1; j--)
                System.out.print(" ");
            System.out.print(first);
            for (int j = 1; j < i; j++)
                System.out.print("--");
            System.out.println(second);
        }
    }

    // Function to print the lower half
    // of the DNA, i.e. the lower lobe
    static void printLowerHalf(String str)
    {
        char first, second;
        int pos = 0;
        for (int i = 1; i <= 4; i++) {
            first = str.charAt(pos);
            second = str.charAt(pos + 1);
            pos += 2;
            // Indent, then join the pair with a narrowing run of dashes
            for (int j = 1; j < i; j++)
                System.out.print(" ");
            System.out.print(first);
            for (int j = 4 - i; j >= 1; j--)
                System.out.print("--");
            System.out.println(second);
        }
    }

    // Function to print 'n' parts of DNA
    static void printDNA(String str[], int n)
    {
        for (int i = 0; i < n; i++) {
            // Cycle through the six stored combinations
            int x = i % 6;
            if (x % 2 == 0)
                // Even index: upper half
                printUpperHalf(str[x]);
            else
                // Odd index: lower half
                printLowerHalf(str[x]);
        }
    }

    public static void main(String[] args) {
        int n = 8;
        // Combinations stored in the array
        String DNA[] = { "ATTAATTA", "TAGCTAGC", "CGCGATAT",
                         "TAATATGC", "ATCGTACG", "CGTAGCAT" };
        printDNA(DNA, n);
    }
}
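The listing above prints a pattern from fixed 8-base strings but does not itself derive sub-sequences from a longer sequence. One common way to do so is a sliding window, sketched below. This is an assumed technique, not necessarily the authors' procedure, and the class name `SubSequences` and the `length` and `step` parameters are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class SubSequences {
    // Returns every contiguous window of the given length,
    // advancing the start position by `step` bases each time.
    static List<String> windows(String dna, int length, int step) {
        if (length <= 0 || step <= 0) {
            throw new IllegalArgumentException("length and step must be positive");
        }
        List<String> out = new ArrayList<>();
        for (int start = 0; start + length <= dna.length(); start += step) {
            out.add(dna.substring(start, start + length));
        }
        return out;
    }

    public static void main(String[] args) {
        // Windows of 4 bases, moving 2 bases at a time.
        System.out.println(windows("ATTAATTA", 4, 2)); // prints [ATTA, TAAT, ATTA]
    }
}
```

A step smaller than the window length produces overlapping sub-sequences, which helps when a pattern of interest straddles a window boundary.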
Match
Check whether the patient's DNA sub-sequence matches the existing internal or external reference
DNA extractions. The Java code is as follows,
public class DNAAlignment {
    public static void main(String[] args) {
        String dna1 = "acbcdb";
        String dna2 = "cadbd";
        Alignment best = align(dna1, dna2);
        System.out.println(best);
        System.out.println("Score = " + best.score());
    }

    // Computes and returns the optimal alignment between the
    // two DNA sequences. Sequences do not have to be the same
    // length. The plain recursion explores every alignment, so
    // it is only practical for short sequences.
    private static Alignment align(String dna1, String dna2) {
        if (dna1.length() == 0 && dna2.length() == 0) {
            return new Alignment();
        } else if (dna1.length() == 0) {
            Alignment result = align(dna1, dna2.substring(1));
            result.addMatch('-', dna2.charAt(0));
            return result;
        } else if (dna2.length() == 0) {
            Alignment result = align(dna1.substring(1), dna2);
            result.addMatch(dna1.charAt(0), '-');
            return result;
        } else {
            // Option 1: gap in the second sequence.
            Alignment first = align(dna1.substring(1), dna2);
            first.addMatch(dna1.charAt(0), '-');
            // Option 2: gap in the first sequence.
            Alignment second = align(dna1, dna2.substring(1));
            second.addMatch('-', dna2.charAt(0));
            // Option 3: pair the two leading characters.
            Alignment both = align(dna1.substring(1), dna2.substring(1));
            both.addMatch(dna1.charAt(0), dna2.charAt(0));
            if (first.score() >= second.score() && first.score() >= both.score()) {
                return first;
            } else if (second.score() >= first.score() && second.score() >= both.score()) {
                return second;
            } else {
                return both;
            }
        }
    }

    // Represents an alignment of two strands of DNA.
    private static class Alignment {
        String dna1;
        String dna2;

        // Create a new alignment with no characters included.
        public Alignment() {
            dna1 = "";
            dna2 = "";
        }

        // Adds c1 to the front of the first DNA sequence,
        // adds c2 to the front of the 2nd DNA sequence.
        public void addMatch(char c1, char c2) {
            dna1 = c1 + dna1;
            dna2 = c2 + dna2;
        }

        // Computes the score of this alignment.
        // Match is +2, mismatch (including a gap) is -1.
        public int score() {
            int score = 0;
            for (int i = 0; i < dna1.length(); i++) {
                if (dna1.charAt(i) == dna2.charAt(i)) {
                    score += 2;
                } else {
                    score -= 1;
                }
            }
            return score;
        }

        // Returns each DNA sequence on its own line.
        public String toString() {
            return dna1 + "\n" + dna2;
        }
    }
}
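The recursive listing above revisits the same suffix pairs many times, so its running time grows exponentially with sequence length. A hedged alternative is the standard dynamic-programming formulation of global alignment (Needleman-Wunsch), sketched below with the same scoring: +2 for a match and -1 for a mismatch or a gap. The class name `AlignmentScore` is an assumption, and this sketch returns only the optimal score, not the aligned strings.

```java
public class AlignmentScore {
    static final int MATCH = 2;
    static final int MISMATCH = -1;
    static final int GAP = -1; // the recursive version scores a gap like a mismatch

    // Returns the optimal global alignment score of a and b
    // in O(a.length() * b.length()) time.
    static int score(String a, String b) {
        int[][] dp = new int[a.length() + 1][b.length() + 1];
        // Aligning a prefix against the empty string costs one gap per character.
        for (int i = 1; i <= a.length(); i++) {
            dp[i][0] = i * GAP;
        }
        for (int j = 1; j <= b.length(); j++) {
            dp[0][j] = j * GAP;
        }
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int pair = a.charAt(i - 1) == b.charAt(j - 1) ? MATCH : MISMATCH;
                int diagonal = dp[i - 1][j - 1] + pair; // pair the two characters
                int gapInB = dp[i - 1][j] + GAP;        // a's character against a gap
                int gapInA = dp[i][j - 1] + GAP;        // b's character against a gap
                dp[i][j] = Math.max(diagonal, Math.max(gapInB, gapInA));
            }
        }
        return dp[a.length()][b.length()];
    }

    public static void main(String[] args) {
        System.out.println("Score = " + score("acbcdb", "cadbd"));
    }
}
```

Because each table cell is computed once, this form scales to sub-sequences far longer than the six-character example.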
Diagnosis results and Solution
1. Variant identification
Variant identification for the health issue's DNA factor is carried out by matching the external
or internal reference DNA sub-sequence against the target patient's DNA sub-sequence.
Reference DNA sub sequence: - -- - -- - --- - --- - ---- - ----
Patient DNA sub sequence: - -- - -- - ---- - ----
2. Correlation
The identified variant DNA sequence is further analyzed by mapping its correlation with existing
disease diagnosis outputs.
3. Determination
The final disease diagnosis report for the target patient is produced through the effective
integration of DNA structure and sequence analysis, and the resulting data representations are
created and stored for further study.
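For the variant identification step (step 1 above), a minimal sketch is a position-by-position comparison of a reference sub-sequence and a patient sub-sequence of equal length. The class name `VariantFinder`, the equal-length requirement, and the sample strings are assumptions for illustration; handling insertions and deletions would require an alignment first.

```java
import java.util.ArrayList;
import java.util.List;

public class VariantFinder {
    // Returns the zero-based positions at which the patient
    // sub-sequence differs from the reference sub-sequence.
    static List<Integer> variantPositions(String reference, String patient) {
        if (reference.length() != patient.length()) {
            throw new IllegalArgumentException("sub-sequences must have equal length");
        }
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < reference.length(); i++) {
            if (reference.charAt(i) != patient.charAt(i)) {
                positions.add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        // Hypothetical sub-sequences differing at one base.
        System.out.println(variantPositions("ATTAATTA", "ATTGATTA")); // prints [3]
    }
}
```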
4. RESULTS AND DISCUSSION
Proposed methodology Medical diagnosis Efficiency
The optimally integrated collection of DNA structure and sequence data for the medical
diagnosis system produced the results shown in Table 3.
Table 3 Proposed medical diagnosis efficiency.
Sl.No  Patient data        Total number of patients  Positive progress count (without DNA analysis)  Positive progress count (proposed integration of patient data with DNA information)
1      Cystic Fibrosis     11                        2                                               8
2      Diabetes            43                        13                                              29
3      Hair fall           21                        6                                               14
4      Paralysis           7                         1                                               4
5      Muscular dystrophy  6                         1                                               4
Figure 4 shows the performance of the proposed medical diagnosis system.
Figure 4 Proposed DNA data analysis for medical diagnosis
The final results show a success rate of 67% (59/88) for the proposed approach, compared with
26% (23/88) for the existing approach without DNA analysis, which provides a clear direction for
future improvements.
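The totals behind these figures can be recomputed from the rows of Table 3: 59 of 88 patients is about 67.0%, and 23 of 88 is about 26.1%. The short Java sketch below performs that arithmetic; the class name `DiagnosisEfficiency` and its helper methods are illustrative assumptions.

```java
public class DiagnosisEfficiency {
    // Sums one column of the table.
    static int sum(int[] values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Percentage of patients showing positive progress.
    static double rate(int positive, int total) {
        return 100.0 * positive / total;
    }

    public static void main(String[] args) {
        // Rows of Table 3: cystic fibrosis, diabetes, hair fall,
        // paralysis, muscular dystrophy.
        int[] patients = {11, 43, 21, 7, 6};
        int[] withoutDna = {2, 13, 6, 1, 1};
        int[] withDna = {8, 29, 14, 4, 4};

        int total = sum(patients);
        System.out.printf("Without DNA analysis: %d/%d = %.1f%%%n",
                sum(withoutDna), total, rate(sum(withoutDna), total));
        System.out.printf("With DNA integration: %d/%d = %.1f%%%n",
                sum(withDna), total, rate(sum(withDna), total));
    }
}
```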
5. CONCLUSION
Integrating patient data and clinical history with DNA analysis makes the medical diagnosis
system considerably more successful. Analyzing DNA structures such as the helix, bond, and
groove provides essential information, together with its constituent parts and hidden data, for
future processing. The DNA sequence holds the entire pattern of the patient's medical status;
dividing it into several sub-sequences and comparing them with the stored reference DNA
sequence patterns plays a vital role in identifying health issue variants. The final analysis
focuses on the target patient's current status and, by determining future impacts, informs the
guidelines and suggestions for further treatment. This paper achieved a success rate of 67%
(59 of 88 patients), compared with 26% (23 of 88) for the existing medical analysis without DNA
analysis. In future, this research will be implemented with the support of genetic algorithms and
machine learning to further develop an optimal medical diagnosis system.
REFERENCES
[1] Jacques Cohen, Computer science and bioinformatics, Communications of the ACM, Vol. 48,
No. 3, March 2005.
[2] Jana, R., Aqel, M., Srivastava, P., and Mahanti, P. K., Soft Computing Methodologies in
Bioinformatics, European Journal of Scientific Research, Vol. 26, No. 2, pp. 189-203, 2009.
[3] www.nature.com/clinicalpractice/onc
[4] Dressman MA. Gene expression profiling detects gene amplification and differentiates tumor
types in breast cancer. Cancer Res, 63:2194-2199, 2003.
[5] Subramanian S. Gastrointestinal stromal tumors (GISTs) with KIT and PDGFRA mutations
have distinct gene expression profiles. Oncogene, 23:7780-7790, 2004.
[6] Daisuke Kihara, Yifeng David Yang, and Troy Hawkins, Bioinformatics resources for cancer
research with an emphasis on gene function and structure prediction tools. Cancer Inform.
2:25–35, 2006.
[7] Rhodes DR, Barrette TR, Rubin MA, Ghosh D, Chinnaiyan AM, Meta-analysis of microarrays:
interstudy validation of gene expression profiles reveals pathway dysregulation in prostate
cancer. Cancer Res. 62(15):4427-33, 2007.
[8] Sorlie T, Tibshirani R, Parker J, Hastie T, Marron JS, Nobel A, Deng S, Johnsen H, Pesich R,
Geisler S,
[9] Demeter J, Perou CM, Lønning PE, Brown PO, Børresen-Dale AL, Botstein D, Repeated
observation of breast tumor subtypes in independent gene expression data sets. Proc Natl Acad
Sci U S A.; 100(14):8418-23, 2008.
[10] Rhodes DR, Chinnaiyan AM, Integrative analysis of the cancer transcriptome. Nat Genet. 37
Suppl:S31-7, 2008.
[11] Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ, Basic local alignment search tool.
J Mol Biol. 215(3):403-10, 2011.
[12] https://en.wikipedia.org/