EFFICIENT INTEGRATION OF TARGET PATIENT DATA WITH DNA SEQUENCES AND STRUCTURE INFORMATION TOWARDS MEDICAL DIAGNOSIS

International Journal of Advanced Research in Engineering and Technology (IJARET)
Volume 11, Issue 4, April 2020, pp. 569-579, Article ID: IJARET_11_04_055
Available online at https://iaeme.com/Home/issue/IJARET?Volume=11&Issue=4
ISSN Print: 0976-6480 and ISSN Online: 0976-6499
DOI: https://doi.org/10.34218/IJARET.12.4.2021.055
© IAEME Publication Scopus Indexed
Edward Daniel Christopher B
Research Scholar, Department of Computer Science, St.Xavier's College,
Tirunelveli, Tamil Nadu, India
Victor S.P
Associate Professor, Department of Computer Science,
Manonmaniam Sundaranar University, Tirunelveli, Tamil Nadu, India
ABSTRACT
Disease occurrence often depends on traits inherited within families. Diabetes is one
example, passing from parent to child in many cases. Disease diagnosis is a tedious process
when the parameters of the patient are unknown. DNA (deoxyribonucleic acid) sequence and
structure information helps clinicians study disease patterns, set proper guidelines to
control the spread of a disease, and cure it efficiently when the next stage of its
progress is known to the medical field. Integrating the exact DNA sequence and structure
combinations for medical analysis is difficult: it requires collecting complex DNA data
from different medical resources, comparing it properly, and verifying and validating the
result. This paper proposes an optimal integration of DNA sequence and structure
information, with evaluation techniques based on data mining strategies, to analyze the
disease condition of an individual patient. In future, this work will be extended with
machine learning analysis implemented through a genetic algorithm to attain an
automated DNA analysis system.
Key words: DNA, Sequence, Structure, Diagnosis, Integration.
Cite this Article: Edward Daniel Christopher .B and Victor S.P, Efficient Integration
of Target Patient data with DNA Sequences and Structure Information towards Medical
Diagnosis, International Journal of Advanced Research in Engineering and
Technology, 11(4), 2020, pp. 569-579.
https://iaeme.com/Home/issue/IJARET?Volume=11&Issue=4

1. INTRODUCTION
1.1. DNA
DNA is a polymer composed of two polynucleotide chains that coil around each other to form
a double helix carrying genetic instructions for the development, functioning, growth, and
reproduction of all known organisms and many viruses [1]. DNA and ribonucleic acid (RNA)
are nucleic acids. Alongside proteins, lipids and complex carbohydrates (polysaccharides),
nucleic acids are one of the four major types of macromolecules that are essential for all known
forms of life [2, 4].
1.2. Data Integration
Data integration involves combining data residing in different sources and providing users with
a unified view of them. This process becomes significant in a variety of situations, which
include both commercial (such as when two similar companies need to merge their databases)
and scientific (combining research results from different bioinformatics repositories [3, 6], for
example) domains. Data integration is needed with increasing frequency as the volume of data
(that is, big data) and the need to share existing data grow. It has become the focus of extensive
theoretical work, and numerous open problems remain unsolved. Data integration encourages
collaboration between internal as well as external users [5]. The data being integrated must be
received from a heterogeneous database system and transformed to a single coherent data store
that provides synchronous data across a network of files for clients [7, 9].
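As a minimal sketch of the unified view described above, the following Java fragment merges records from two hypothetical sources that share a patient ID. The class name, field names, and sample values are illustrative assumptions and are not part of the proposed system.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: merges per-patient fields from two hypothetical sources.
class UnifiedView {
    // Combines two sources keyed by patient ID. Fields from the second source
    // are added to those from the first and override them on a name clash.
    static Map<String, Map<String, String>> integrate(
            Map<String, Map<String, String>> sourceA,
            Map<String, Map<String, String>> sourceB) {
        Map<String, Map<String, String>> unified = new LinkedHashMap<>();
        for (Map.Entry<String, Map<String, String>> e : sourceA.entrySet()) {
            unified.put(e.getKey(), new LinkedHashMap<>(e.getValue()));
        }
        for (Map.Entry<String, Map<String, String>> e : sourceB.entrySet()) {
            unified.computeIfAbsent(e.getKey(), k -> new LinkedHashMap<>())
                   .putAll(e.getValue());
        }
        return unified;
    }

    public static void main(String[] args) {
        // Hypothetical sample records from a clinic database and a DNA lab.
        Map<String, Map<String, String>> clinic =
                Map.of("P001", Map.of("name", "Sample Patient"));
        Map<String, Map<String, String>> lab =
                Map.of("P001", Map.of("dnaSample", "ACGT"));
        System.out.println(integrate(clinic, lab));
    }
}
```

Keying both sources on one identifier is the simplest design; real heterogeneous sources would usually need a schema mapping step before such a merge.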
1.3. Medical Diagnosis
Medical diagnosis is the process of determining which disease or condition explains a person's
symptoms and signs. It is most often referred to as diagnosis with the medical context being
implicit [8].
Figure 1 Sample Medical Diagnosis System
1.4. Problem Statement
Collecting, integrating, and comparing DNA structure and sequence data for medical analysis
is difficult. It requires a data mining based extraction procedure applied to medical data
retrieved from different medical resources, with proper verification and validation for
disease diagnosis.
The objective of this work is to extract and compare DNA structure and sequence data from
several medical web resources and to integrate them for target patient data analysis.
2. PROPOSED METHODOLOGY
The proposed methodology focuses on efficient data integration of DNA sequence and structure
information towards medical diagnosis. It covers DNA data collection and an integration
methodology for pattern matching against individual patient data, so that the medical data
analysis can be performed efficiently. Figure 2 shows the structure of the proposed
methodology.
2.1. Efficient data integration of DNA information towards medical diagnosis and
analysis
Figure 2 Proposed DNA data analysis for the patient disease diagnosis and analysis.
3. IMPLEMENTATION
3.1. Patient Data
The patient data contains the following information.
Patient History
It includes the fields listed in Table 1.
Table 1 Patient History
Sl.No Field Name
1 Patient ID
2 Name
3 Communication
4 Family details
5 Travel history

Patient Pretest
It contains the fields listed in Table 2.
Table 2 Patient Pretest
Sl.No Field Name
1 Health Issue identified
2 Basic Lab diagnostic results
3 Health Issue history
4 Hospital and Doctors Tracks
5 Consumable medicines history
DNA data
Figure 3 shows (a) the helix DNA structure, (b) the bond DNA structure, (c) the groove DNA
structure, and (d) a DNA sequence sample.
Figure 3 DNA sample data
Sample DNA Sequence
ID AA03518 standard; DNA; FUN; 237 BP.
XX
AC U03518;
XX
DE Aspergillus awamori internal transcribed spacer 1 (ITS1) and 18S
DE rRNA and 5.8S rRNA genes, partial sequence.
XX
SQ Sequence 237 BP; 41 A; 77 C; 67 G; 52 T; 0 other;
aacctgcgga aggatcatta ccgagtgcgg gtcctttggg cccaacctcc catccgtgtc 60
tattgtaccc tgttgcttcg gcgggcccgc cgcttgtcgg ccgccggggg ggcgcctctg 120
ccccccgggc ccgtgcccgc cggagacccc aacacgaaca ctgtctgaaa gcgtgcagtc 180
tgagttgatt gaatgcaatc agttaaaact ttcaacaatg gatctcttgg ttccggc 237
//
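The SQ line of the entry above lists the number of each base. As a hedged sketch of how such counts could be checked, the following Java helper tallies the letters of a sequence string and ignores the spacing and position numbers used in the entry. The class and method names are illustrative assumptions.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative helper: tallies each base in a sequence string.
class BaseCounter {
    // Counts letters only, so blanks and the trailing position
    // numbers of a flat-file sequence line are skipped.
    static Map<Character, Integer> count(String sequence) {
        Map<Character, Integer> counts = new TreeMap<>();
        for (char c : sequence.toLowerCase().toCharArray()) {
            if (Character.isLetter(c)) {
                counts.merge(c, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // First block of ten bases from the sample entry.
        System.out.println(count("aacctgcgga"));
    }
}
```

Passing the four sequence lines concatenated into one string would give totals that can be compared with the counts on the SQ line.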

3.2. DNA Analysis
Break
The DNA is divided into several smaller parts for proper identification and analysis.
Using MATLAB, the break-up into groups of three bases is obtained as follows:
Daniel_DNA = 'CTCGAGGGGCCTAGACATTGCCCTCCAGAGAGAGCACCCAACACCCTCCAGGCTTGACCGGCCAGGGTG';
% The sequence length (69) is a multiple of 3, so reshape yields one codon per row.
Codons = reshape(Daniel_DNA(:), 3, length(Daniel_DNA)/3)'
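The same break-up into three-base groups can be sketched in Java, the language used for the remaining listings. This is a minimal illustration, assuming the sequence length is a multiple of three. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper: splits a DNA string into consecutive three-base groups.
class CodonSplitter {
    static List<String> split(String dna) {
        if (dna.length() % 3 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 3");
        }
        List<String> codons = new ArrayList<>();
        for (int i = 0; i < dna.length(); i += 3) {
            codons.add(dna.substring(i, i + 3));
        }
        return codons;
    }

    public static void main(String[] args) {
        String dna = "CTCGAGGGGCCTAGACATTGCCCTCCAGAGAGAGCACCCAACA"
                   + "CCCTCCAGGCTTGACCGGCCAGGGTG";
        // Prints one three-base group per line.
        for (String codon : split(dna)) {
            System.out.println(codon);
        }
    }
}
```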
Sub Sequence
Multiple subsequences are generated to enable the pattern matching procedures. The Java code
is as follows:
// Java Program to print the
// 'n' lobes of DNA pattern
import java.io.*;

class GFG {
    // Function to print upper half
    // of the DNA or the upper lobe
    static void printUpperHalf(String str)
    {
        char first, second;
        int pos = 0;
        // Each half of the DNA is made of
        // combination of two compounds
        for (int i = 1; i <= 4; i++) {
            // Taking the two carbon
            // compounds from the string
            first = str.charAt(pos);
            second = str.charAt(pos + 1);
            pos += 2;
            for (int j = 4 - i; j >= 1; j--)
                System.out.print(" ");
            System.out.print(first);
            for (int j = 1; j < i; j++)
                System.out.print("--");
            System.out.println(second);
        }
    }

    // Function to print lower half
    // of the DNA or the lower lobe
    static void printLowerHalf(String str)
    {
        char first, second;
        int pos = 0;
        for (int i = 1; i <= 4; i++) {
            first = str.charAt(pos);
            second = str.charAt(pos + 1);
            pos += 2;
            for (int j = 1; j < i; j++)
                System.out.print(" ");
            System.out.print(first);
            for (int j = 4 - i; j >= 1; j--)
                System.out.print("--");
            System.out.println(second);
        }
    }

    // Function to print 'n' parts of DNA
    static void printDNA(String str[], int n)
    {
        for (int i = 0; i < n; i++) {
            int x = i % 6;
            // Calling for upperhalf
            if (x % 2 == 0)
                printUpperHalf(str[x]);
            else
                // Calling for lowerhalf
                printLowerHalf(str[x]);
        }
    }

    public static void main(String[] args) {
        int n = 8;
        // combinations stored in the array
        String DNA[] = { "ATTAATTA", "TAGCTAGC", "CGCGATAT",
                         "TAATATGC", "ATCGTACG", "CGTAGCAT" };
        printDNA(DNA, n);
    }
}
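The listing above prints a double-helix pattern from fixed base-pair strings. As a complementary sketch of subsequence generation itself, the following Java fragment extracts every overlapping window of a given length from a DNA string. It assumes that fixed-length overlapping windows are what the pattern matching step needs. The window length and the class and method names are illustrative and are not taken from the original implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper: lists all overlapping subsequences of length k.
class SubsequenceGenerator {
    static List<String> windows(String dna, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        List<String> result = new ArrayList<>();
        // Slide a window of width k one base at a time.
        for (int i = 0; i + k <= dna.length(); i++) {
            result.add(dna.substring(i, i + k));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(windows("ATTAATTA", 4));
    }
}
```

A sequence shorter than the window yields an empty list, which lets a caller skip matching for inputs that are too short.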
Match
Check whether the patient DNA subsequence matches the existing internal or external
reference DNA extractions. The Java code is as follows:
import java.util.*;

public class DNAAlignment {
    public static void main(String[] args) {
        String dna1 = "acbcdb";
        String dna2 = "cadbd";
        Alignment score = align(dna1, dna2);
        System.out.println(score);
        System.out.println("Score = " + score.score());
    }

    // Computes and returns the optimal alignment between the
    // two DNA sequences. Sequences do not have to be the same
    // length.
    private static Alignment align(String dna1, String dna2) {
        if (dna1.length() == 0 && dna2.length() == 0) {
            return new Alignment();
        } else if (dna1.length() == 0) {
            Alignment result = align(dna1, dna2.substring(1));
            result.addMatch('-', dna2.charAt(0));
            return result;
        } else if (dna2.length() == 0) {
            Alignment result = align(dna1.substring(1), dna2);
            result.addMatch(dna1.charAt(0), '-');
            return result;
        } else {
            Alignment first = align(dna1.substring(1), dna2);
            first.addMatch(dna1.charAt(0), '-');
            Alignment second = align(dna1, dna2.substring(1));
            second.addMatch('-', dna2.charAt(0));
            Alignment both = align(dna1.substring(1), dna2.substring(1));
            both.addMatch(dna1.charAt(0), dna2.charAt(0));
            if (first.score() >= second.score() && first.score() >= both.score()) {
                return first;
            } else if (second.score() >= first.score() && second.score() >= both.score()) {
                return second;
            } else {
                return both;
            }
        }
    }

    // Represents an alignment of two strands of DNA.
    private static class Alignment {
        String dna1;
        String dna2;

        // Create a new alignment with no characters included.
        public Alignment() {
            dna1 = "";
            dna2 = "";
        }

        // Adds c1 to the front of the first DNA sequence,
        // adds c2 to the front of the 2nd DNA sequence.
        public void addMatch(char c1, char c2) {
            dna1 = c1 + dna1;
            dna2 = c2 + dna2;
        }

        // Computes the score of this alignment.
        // Match is +2; mismatch or gap is -1.
        public int score() {
            int score = 0;
            for (int i = 0; i < dna1.length(); i++) {
                if (dna1.charAt(i) == dna2.charAt(i)) {
                    score += 2;
                } else {
                    score -= 1;
                }
            }
            return score;
        }

        // Returns each DNA sequence on its own line.
        public String toString() {
            return dna1 + "\n" + dna2;
        }
    }
}
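The recursive listing above explores every possible alignment, so its running time grows exponentially with sequence length. The following sketch computes the same best score (+2 for a match, -1 for a mismatch or a gap) with a dynamic programming table, in the style of Needleman-Wunsch global alignment. It is a swapped-in technique offered for illustration, not the authors' implementation. The class and method names are hypothetical, and only the score is returned, not the aligned strings.

```java
// Illustrative sketch: best global alignment score via dynamic programming.
class AlignmentScore {
    // Scoring follows the recursive listing: +2 match, -1 mismatch or gap.
    static int score(String a, String b) {
        int[][] dp = new int[a.length() + 1][b.length() + 1];
        // Aligning a prefix against the empty string costs one gap per base.
        for (int i = 1; i <= a.length(); i++) {
            dp[i][0] = -i;
        }
        for (int j = 1; j <= b.length(); j++) {
            dp[0][j] = -j;
        }
        for (int i = 1; i <= a.length(); i++) {
            for (int j = 1; j <= b.length(); j++) {
                int pair = dp[i - 1][j - 1]
                        + (a.charAt(i - 1) == b.charAt(j - 1) ? 2 : -1);
                int gapInB = dp[i - 1][j] - 1;
                int gapInA = dp[i][j - 1] - 1;
                dp[i][j] = Math.max(pair, Math.max(gapInB, gapInA));
            }
        }
        return dp[a.length()][b.length()];
    }

    public static void main(String[] args) {
        System.out.println("Score = " + score("acbcdb", "cadbd"));
    }
}
```

The table has one cell per pair of prefixes, so the work is proportional to the product of the two sequence lengths.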
Diagnosis results and Solution
1. Variant identification
Variant identification for the health issue matches the external or internal reference
DNA subsequence against the DNA subsequence of the target patient.
Reference DNA sub sequence: - -- - -- - --- - --- - ---- - ----
Patient DNA sub sequence: - -- - -- - ---- - ----
2. Correlation
The identified variant DNA sequence is further analyzed to map its correlation with existing
disease diagnosis outputs.
3. Determination
The final disease diagnosis report for the target patient is produced through the effective
integration of DNA structure and sequence analysis, and the resulting data representations
are created and stored for further study.
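The variant identification step can be illustrated with a position-by-position comparison. The following Java sketch assumes the reference and patient subsequences are already aligned and of equal length, and it reports the zero-based positions where they differ. The names and sample strings are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative helper: finds positions where a patient subsequence
// differs from a reference subsequence of the same length.
class VariantFinder {
    static List<Integer> mismatchPositions(String reference, String patient) {
        if (reference.length() != patient.length()) {
            throw new IllegalArgumentException("subsequences must have equal length");
        }
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < reference.length(); i++) {
            if (reference.charAt(i) != patient.charAt(i)) {
                positions.add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        // Hypothetical subsequences differing at one position.
        System.out.println(mismatchPositions("ACGT", "ACCT"));
    }
}
```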
4. RESULTS AND DISCUSSION
Proposed methodology Medical diagnosis Efficiency
The integrated collection of DNA structure and sequence data for the medical diagnosis
system produced the results shown in Table 3.
Table 3 Proposed medical diagnosis efficiency (positive progress counts without DNA analysis and with the proposed integration of patient data with DNA information)
Sl.No  Patient data        Total patients  Without DNA analysis  Proposed integration
1      Cystic Fibrosis     11              02                    08
2      Diabetes            43              13                    29
3      Hair fall           21              06                    14
4      Paralysis           07              01                    04
5      Muscular dystrophy  06              01                    04

Figure 4 shows the performance of the proposed medical diagnosis system.
Figure 4 Proposed DNA data analysis for medical diagnosis
The final results show positive progress in 59 of 88 patients (about 67%) with the proposed
approach, compared with 23 of 88 patients (about 26%) with the existing approach without
DNA analysis, which points the way to future improvements.
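The totals and percentages quoted above follow directly from Table 3. As a small worked check, the following Java sketch sums the table columns and computes both rates. The class and method names are illustrative.

```java
// Worked check of the Table 3 totals and success rates.
class DiagnosisRates {
    // Each row: total patients, positive progress without DNA analysis,
    // positive progress with the proposed integration.
    static final int[][] TABLE_3 = {
        {11, 2, 8},   // Cystic Fibrosis
        {43, 13, 29}, // Diabetes
        {21, 6, 14},  // Hair fall
        {7, 1, 4},    // Paralysis
        {6, 1, 4}     // Muscular dystrophy
    };

    static int columnSum(int column) {
        int sum = 0;
        for (int[] row : TABLE_3) {
            sum += row[column];
        }
        return sum;
    }

    static double percent(int part, int whole) {
        return 100.0 * part / whole;
    }

    public static void main(String[] args) {
        int total = columnSum(0);
        int without = columnSum(1);
        int proposed = columnSum(2);
        System.out.printf("Without DNA analysis: %d/%d = %.1f%%%n",
                without, total, percent(without, total));
        System.out.printf("Proposed integration: %d/%d = %.1f%%%n",
                proposed, total, percent(proposed, total));
    }
}
```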
5. CONCLUSION
Integrating patient data and clinical history with DNA analysis substantially improves the
medical diagnosis system. Analyzing DNA structures such as the helix, bond, and groove
provides essential information, along with hidden data, for future processing. The DNA
sequence holds the entire pattern of the patient's medical status. Dividing it into several
subsequences and comparing them with the stored reference DNA sequence patterns plays a vital
role in identifying health issue variants. The final analysis focuses on the current status
of the target patient and determines future impacts, which guides further treatment paths and
suggestions. The proposed approach achieved a success rate of about 67% (59 of 88 patients),
compared with about 26% (23 of 88 patients) for the existing medical analysis without DNA
analysis. In future, this research will be implemented with the support of genetic algorithms
and machine learning for further development of an optimal medical diagnosis system.
REFERENCES
[1] Jacques Cohen, Computer science and bioinformatics, Communications of the ACM, v.48 n.3,
March 2005.
[2] Jana, R., Aqel, M., Srivastava, P., and Mahanti, P. K., Soft Computing Methodologies in
Bioinformatics, European Journal of Scientific Research, Vol. 26, No. 2, pp. 189-203, 2009.
[3] www.nature.com/clinicalpractice/onc

[4] Dressman MA. Gene expression profiling detects gene amplification and differentiates tumor
types in breast cancer. Cancer Res, 63:2194-2199, 2003.
[5] Subramanian S. Gastrointestinal stromal tumors (GISTs) with KIT and PDGFRA mutations
have distinct gene expression profiles. Oncogene, 23:7780-7790, 2004.
[6] Daisuke Kihara, Yifeng David Yang, and Troy Hawkins, Bioinformatics resources for cancer
research with an emphasis on gene function and structure prediction tools. Cancer Inform.2: 25–
35 , 2006.
[7] Rhodes DR, Barrette TR, Rubin MA, Ghosh D, Chinnaiyan AM, Meta-analysis of microarrays:
interstudy validation of gene expression profiles reveals pathway dysregulation in prostate
cancer. Cancer Res, 62(15):4427-33, 2002.
[8] Sorlie T, Tibshirani R, Parker J, Hastie T, Marron JS, Nobel A, Deng S, Johnsen H, Pesich R,
Geisler S,
[9] Demeter J, Perou CM, Lønning PE, Brown PO, Børresen-Dale AL, Botstein D, Repeated
observation of breast tumor subtypes in independent gene expression data sets. Proc Natl Acad
Sci U S A, 100(14):8418-23, 2003.
[10] Rhodes DR, Chinnaiyan AM, Integrative analysis of the cancer transcriptome. Nat Genet, 37
Suppl:S31-7, 2005.
[11] Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ, Basic local alignment search tool.
J Mol Biol, 215(3):403-10, 1990.
[12] https://en.wikipedia.org/
