This document discusses a study that uses the ke-REM (ke-Rule Extraction Method) classifier to predict promoter regions in DNA sequences. The study evaluates the performance of ke-REM compared to existing promoter prediction techniques. ke-REM constructs rules based on attribute-value pairs from a dataset of 106 E. coli DNA sequences, each containing 57 nucleotides. The results show that ke-REM competes well with existing methods for identifying promoter regions in DNA.
International Journal of Engineering Research and DevelopmentIJERD Editor
This document discusses using artificial neural networks (ANN) and adaptive neuro-fuzzy inference systems (ANFIS) to predict promoter regions in genomic DNA sequences. It analyzes 106 DNA sequences from E. coli, each 57 nucleotides long, labeled as having a promoter region (+ label) or not (- label). ANN and ANFIS classifiers are trained on most of the data and tested on the remaining data using 5-fold cross-validation. The classifiers are evaluated based on accuracy, Matthews correlation coefficient, sensitivity, and specificity metrics. The results show that ANN and ANFIS are promising approaches for identifying promoter regions that compete with existing techniques.
The document provides information about various bioinformatics tools for DNA sequence analysis. It describes tools for finding protein coding regions like GeneMark and GENSCAN. It discusses tools for predicting promoters like SoftBerry Promoter and Promoter 2.0. It outlines how Tandem Repeat Finder can detect tandem repeats and how RepeatMasker can mask interspersed repeats in a sequence. It also discusses UTRScan for finding UTR locations and CpG Islands for detecting CpG islands. For each tool, it provides the procedure and interpretation of sample results.
Three groups annotated the genome of Mycoplasma genitalium and found inconsistencies in their annotations. Of the 468 genes, 318 were annotated consistently by all three groups but 45 had conflicting annotations. Errors likely arose from insufficient sequence similarity to determine homology accurately or incorrectly inferring function based on homology alone. Database curation is needed to prevent propagation of erroneous annotations.
Analytical Study of Hexapod miRNAs using Phylogenetic Methodscscpconf
MicroRNAs (miRNAs) are a class of non-coding RNAs that regulate gene expression.
Identification of total number of miRNAs even in completely sequenced organisms is still an
open problem. However, researchers have been using techniques that can predict limited
number of miRNA in an organism. In this paper, we have used homology based approach for
comparative analysis of miRNA of hexapoda group .We have used Apis mellifera, Bombyx
mori, Anopholes gambiae and Drosophila melanogaster miRNA datasets from miRBase
repository. We have done pair wise as well as multiple alignments for the available miRNAs in
the repository to identify and analyse conserved regions among related species. Unfortunately,
to the best of our knowledge, miRNA related literature does not provide in depth analysis of
hexapods. We have made an attempt to derive the commonality among the miRNAs and to
identify the conserved regions which are still not available in miRNA repositories. The results
are good approximation with a small number of mismatches. However, they are encouraging and may facilitate miRNA biogenesis for hexapods.
The document discusses functional annotation of differentially expressed genes using various bioinformatics tools. It describes using g:Profiler and DAVID to identify gene ontology terms enriched in differentially expressed genes. Specific steps are outlined, including uploading gene lists to the tools, selecting appropriate organisms, downloading results tables containing significantly enriched terms and associated genes. Functional networks can also be generated using the ClueGO app in Cytoscape. The purpose is to understand the biological processes, molecular functions and cellular components that may be perturbed based on changes in gene expression.
Introduction
Transcriptome analysis
Goal of functional genomics
Why we need functional genomics
Technique
1. At DNA level
2.At RNA level
3. At protein level
4. loss of function
5. functional genomic and bioinformatics
Application
Latest research and reviews
Websites of functional genomics
Conclusions
Reference
Genome annotation, NGS sequence data, decoding sequence information, The genome contains all the biological information required to build and maintain any given living organism.
Single cell RNA-seq was performed on 18 mouse bone marrow dendritic cells. 982 genes were found to be differentially expressed between two cells, while the majority of genes showed similar expression levels. Future work will analyze the functions of differentially expressed genes to better understand heterogeneity between cells and potential roles in disease.
International Journal of Engineering Research and DevelopmentIJERD Editor
This document discusses using artificial neural networks (ANN) and adaptive neuro-fuzzy inference systems (ANFIS) to predict promoter regions in genomic DNA sequences. It analyzes 106 DNA sequences from E. coli, each 57 nucleotides long, labeled as having a promoter region (+ label) or not (- label). ANN and ANFIS classifiers are trained on most of the data and tested on the remaining data using 5-fold cross-validation. The classifiers are evaluated based on accuracy, Matthews correlation coefficient, sensitivity, and specificity metrics. The results show that ANN and ANFIS are promising approaches for identifying promoter regions that compete with existing techniques.
The document provides information about various bioinformatics tools for DNA sequence analysis. It describes tools for finding protein coding regions like GeneMark and GENSCAN. It discusses tools for predicting promoters like SoftBerry Promoter and Promoter 2.0. It outlines how Tandem Repeat Finder can detect tandem repeats and how RepeatMasker can mask interspersed repeats in a sequence. It also discusses UTRScan for finding UTR locations and CpG Islands for detecting CpG islands. For each tool, it provides the procedure and interpretation of sample results.
Three groups annotated the genome of Mycoplasma genitalium and found inconsistencies in their annotations. Of the 468 genes, 318 were annotated consistently by all three groups but 45 had conflicting annotations. Errors likely arose from insufficient sequence similarity to determine homology accurately or incorrectly inferring function based on homology alone. Database curation is needed to prevent propagation of erroneous annotations.
Analytical Study of Hexapod miRNAs using Phylogenetic Methodscscpconf
MicroRNAs (miRNAs) are a class of non-coding RNAs that regulate gene expression.
Identification of total number of miRNAs even in completely sequenced organisms is still an
open problem. However, researchers have been using techniques that can predict limited
number of miRNA in an organism. In this paper, we have used homology based approach for
comparative analysis of miRNA of hexapoda group .We have used Apis mellifera, Bombyx
mori, Anopholes gambiae and Drosophila melanogaster miRNA datasets from miRBase
repository. We have done pair wise as well as multiple alignments for the available miRNAs in
the repository to identify and analyse conserved regions among related species. Unfortunately,
to the best of our knowledge, miRNA related literature does not provide in depth analysis of
hexapods. We have made an attempt to derive the commonality among the miRNAs and to
identify the conserved regions which are still not available in miRNA repositories. The results
are good approximation with a small number of mismatches. However, they are encouraging and may facilitate miRNA biogenesis for hexapods.
The document discusses functional annotation of differentially expressed genes using various bioinformatics tools. It describes using g:Profiler and DAVID to identify gene ontology terms enriched in differentially expressed genes. Specific steps are outlined, including uploading gene lists to the tools, selecting appropriate organisms, downloading results tables containing significantly enriched terms and associated genes. Functional networks can also be generated using the ClueGO app in Cytoscape. The purpose is to understand the biological processes, molecular functions and cellular components that may be perturbed based on changes in gene expression.
Introduction
Transcriptome analysis
Goal of functional genomics
Why we need functional genomics
Technique
1. At DNA level
2.At RNA level
3. At protein level
4. loss of function
5. functional genomic and bioinformatics
Application
Latest research and reviews
Websites of functional genomics
Conclusions
Reference
Genome annotation, NGS sequence data, decoding sequence information, The genome contains all the biological information required to build and maintain any given living organism.
Single cell RNA-seq was performed on 18 mouse bone marrow dendritic cells. 982 genes were found to be differentially expressed between two cells, while the majority of genes showed similar expression levels. Future work will analyze the functions of differentially expressed genes to better understand heterogeneity between cells and potential roles in disease.
This document discusses gene prediction and promoter prediction. It begins by explaining that gene prediction involves locating protein-coding genes within sequenced genomes in order to understand their functional content. Various computational methods are used for gene prediction, including searching for signals like start/stop codons, searching coding content, and comparing sequences to find homologs. Promoter prediction involves locating DNA elements that regulate gene expression and is challenging due to diversity and short, conserved motifs. Ab initio and comparative phylogenetic footprinting methods are used to predict promoters and regulatory elements in prokaryotes and eukaryotes.
11.06.13 - 2013 NIH Research Festival PosterFarhoud Faraji
The study uses systems genetics to identify a network of co-expressed genes centered on Cnot2 that predicts breast cancer metastasis. Cnot2 overexpression regulates the expression of network genes and modulates metastatic potential in vivo. Small RNA sequencing revealed miR-3470b as an upstream regulator of the Cnot2 network by downregulating its hub genes. miR-3470b promotes a pro-metastatic transcriptional profile and increases metastasis in vivo, demonstrating it plays a key role in regulating the metastatic process.
This document discusses functional annotation and the Gene Ontology. It describes how functional annotation attaches biological information to sequences through searches of databases for homology, domains, and pathways as well as manual curation. Searches include BLAST for homology, Pfam and InterPro for domains, and KEGG and Reactome for pathways. Assignments include EC numbers for metabolic pathways and Gene Ontology terms from automated and manual annotation. Manual annotation combines all evidence and allows incorporation of experimental data but requires more time.
20140613 Analysis of High Throughput DNA Methylation ProfilingYi-Feng Chang
This document provides an overview of analysis of high-throughput DNA methylation profiling using bisulfite sequencing (BS-Seq) technology. It discusses DNA methylation and the bisulfite conversion process. It also reviews current BS-Seq resources, information that can be presented in BS-Seq studies, and published tools for analyzing BS-Seq data, including alignment, calling methylation status, and identifying differential methylation regions. The document concludes by introducing MethPipe, a comprehensive tool for BS-Seq data analysis.
This document discusses various methods for annotating genomes after sequencing and assembly. Sequence analysis approaches like identifying open reading frames can rapidly and inexpensively find some genes, but have weaknesses like false positives and missing short genes. More accurate methods are needed to find non-coding RNAs, pseudogenes, and other elements. As sequencing technologies generate more data, the bottleneck has shifted to analysis, requiring skills in both biology and mathematics. The document provides an example sequence to annotate and poses questions about fast, cheap and accurate annotation methods.
This document discusses identifying transcriptional start sites (TSSs) of human microRNAs (miRNAs) based on high-throughput sequencing data. It integrates cap analysis of gene expression (CAGE) tags, TSS Seq libraries, and H3K4me3 chromatin signature data to identify 847 human miRNA TSSs, including 70 TSSs of miRNA clusters. A machine learning-based Support Vector Machine (SVM) model is developed to systematically select representative TSSs for each miRNA. The TSS of the important human miRNA hsa-miR-122 is experimentally validated to demonstrate the effectiveness of the proposed resource for identifying human miRNA TSSs.
The document describes the Lab for Bioinformatics and Computational Genomics at UGent. The lab has over 100 staff working on bioinformatics projects, including engineers, scientists, and clinicians. The lab's work involves applying computational methods and software tools to analyze biological data and address problems in areas like genomics, molecular modeling, phylogeny, and medical applications.
This document provides a summary of a seminar on comparative genomics techniques. It discusses three levels of genome research: structural genomics, functional genomics, and comparative genomics. Comparative genomics involves analyzing and comparing different genomes to study gene content, function, organization, and evolution. Techniques discussed include genome sequencing, mapping, and bioinformatics tools. The document also outlines what can be compared between genomes and how comparative genomics has provided insights into evolution and gene function.
Prediction of mi-RNA related to late blight disease of potatoAnimesh Kumar
The document summarizes the process of predicting miRNAs related to late blight disease of potato using bioinformatics approaches. It discusses how miRNAs play an important role in host-pathogen interactions and regulates genes. The author proposes to identify potential pathogenic miRNAs and their targets in Solanum tuberosum in response to Phytophthora infestans infection using available EST sequences. A multi-step computational approach including screening ESTs, identifying pre-miRNAs, predicting secondary structures, and identifying targets is outlined. Relevant literature on plant miRNA prediction and P. infestans is also reviewed.
The document summarizes the genome sequencing and annotation of the mycobacteriophage Serenity. Serenity was isolated from a soil sample and found to have 98 genes through 454 pyrosequencing. Its genome was manually annotated to determine gene position and function. Serenity belongs to cluster A2 but is genetically diverse from other A2 phages. The document also introduces Single Molecule Real-Time sequencing which was then used to sequence 5 additional mycobacteriophage genomes with over 100x coverage. SMRT sequencing uses phospholinked nucleotides and zero-mode waveguides to achieve longer read lengths and detect modifications like methylation.
Ion AmpliSeq™ sequencing is one of the most promising applications
of the Ion Torrent NGS platform. It involves multiplex PCR for target
enrichment. Thermo Fisher offers online Ion AmpliSeq Designer to
customers to assist assay designs. While more and more people are
adopting Ion AmpliSeq technologies, challenges for assay designs
started to emerge. Here we present bioinformatics approaches to
improve the following areas of assay design: 1) assay specificity; 2)
primer quality control; 3) SNP under primer; and 4) flexibility to adapt
to different applications of Ion AmpliSeq sequencing including variant
calling, copy number variation detection, RNA expression, gene fusion
detection, and metagenomics. Design algorithms are developed to
ensure high coverage with controlled risk of amplification efficiency,
off-target reads and SNP effects. With the optimized design algorithm,
numerous custom and community research panels have been
created, including the Ion AmpliSeq Exome Panel, TP53 Panel, and
CFTR Panel.
Austin Neurology & Neurosciences is an open access, peer reviewed, scholarly journal dedicated to publish articles covering all areas of Neurology & Neurological Sciences.
The journal aims to promote research communications and provide a forum for doctors, researchers, physicians and healthcare professionals to find most recent advances in all areas of Neurology & Neurological Sciences. Austin Neurology & Neurosciences accepts original research articles, reviews, mini reviews, case reports and rapid communication covering all aspects of neurology & neurosciences.
Austin Neurology & Neurosciences strongly supports the scientific up gradation and fortification in related scientific research community by enhancing access to peer reviewed scientific literary works. Austin Publishing Group also brings universally peer reviewed journals under one roof thereby promoting knowledge sharing, mutual promotion of multidisciplinary science.
Structural genomics aims to understand genome content through sequencing and mapping genomes. Genetic maps show relative gene locations based on recombination rates, while physical maps use DNA analysis to place genes by base pair distance. Whole genome sequencing involves breaking genomes into fragments that are sequenced and reassembled using overlaps. Functional genomics seeks to identify all genes, RNAs, proteins and their functions through methods like homology searches, microarrays, and mutagenesis screens.
The document provides an overview of genomics and molecular profiling techniques. It discusses:
- The lab for bioinformatics and computational genomics which has 10 "genome hackers" and 42 scientists.
- An introduction to personalized medicine and biomarkers.
- First generation molecular profiling techniques like gene sequencing, microarrays, PCR.
- Next generation sequencing techniques like Roche 454, Illumina, SOLID which allow high throughput sequencing.
- Next generation applications like RNA sequencing, exome sequencing, epigenetic profiling.
- The role of bioinformatics in analyzing large genomic and molecular profiling data.
Examining gene expression and methylation with next gen sequencingStephen Turner
Slides on RNA-seq and methylation studies using next-gen sequencing given at the University of Miami Hussman Institute for Human Genomics "Genetic Analysis of Complex Human Diseases" course in 2012 (http://hihg.med.miami.edu/educational-programs/analysis-of-complex-human-diseases/genetic-analysis-of-complex-human-diseases/)
Improving DNA Barcode-based Fish Identification System on Imbalanced Data usi...TELKOMNIKA JOURNAL
Problem in imbalanced data is very common in classification or identification. The problem is
raised when the number of instances of one class far exceeds the other. In the previous research, our
DNA barcode-based Identification System of Tuna and Mackerel was developed in imbalanced dataset.
The number of samples of Tuna and Mackerel were much more than those of other fish samples.
Therefore, the accuracy of the classification model was probably still in bias. This research aimed at
employing Synthetic Minority Oversampling Technique (SMOTE) to yield balanced dataset. We used kmers
frequencies from DNA barcode sequences as features and Support Vector Machine (SVM) as
classification method. In this research we used trinucleotide (3-mers) and tetranucleotide (4-mers). The
training dataset was taken from Barcode of Life Database (BOLD). For evaluating the model, we compared
the accuracy of model using SMOTE and without SMOTE in order to classify DNA barcode sequences
which is taken from Department of Aquatic Product Technology, Bogor Agricultural University. The results
showed that the accuracy of the model in the species level using SMOTE was 7% and 13% higher than
those of non-SMOTE for trinucleotide (3-mers) and tetranucleotide (4-mers), respectively. It is expected
that the use of SMOTE, as one of data balancing technique, could increase the accuracy of DNA barcode
based fish classification system, particularly in the species level which is difficult to be identified.
The document discusses the human genome project, which aimed to sequence the entire human genome and identify all human genes. It provides background on the human genome, describing its size, number of genes, and chromosomes. It details the goals and milestones of the human genome project from 1986 to 2003. Vectors like yeast artificial chromosomes and bacterial artificial chromosomes were used to clone large fragments of DNA for sequencing.
This document discusses gene identification and discovery. It begins by describing gene identification in prokaryotes, unicellular eukaryotes, and multicellular eukaryotes. It then discusses the components of protein-coding genes and different approaches to gene prediction for prokaryotes and eukaryotes. The document also covers gene structure in prokaryotes and eukaryotes, as well as software tools and methods for gene prediction. Finally, it discusses several approaches to classifying genes by function.
International Journal of Engineering Research and DevelopmentIJERD Editor
Electrical, Electronics and Computer Engineering,
Information Engineering and Technology,
Mechanical, Industrial and Manufacturing Engineering,
Automation and Mechatronics Engineering,
Material and Chemical Engineering,
Civil and Architecture Engineering,
Biotechnology and Bio Engineering,
Environmental Engineering,
Petroleum and Mining Engineering,
Marine and Agriculture engineering,
Aerospace Engineering.
Welcome to International Journal of Engineering Research and Development (IJERD)IJERD Editor
call for paper 2012, hard copy of journal, research paper publishing, where to publish research paper,
journal publishing, how to publish research paper, Call For research paper, international journal, publishing a paper, IJERD, journal of science and technology, how to get a research paper published, publishing a paper, publishing of journal, publishing of research paper, reserach and review articles, IJERD Journal, How to publish your research paper, publish research paper, open access engineering journal, Engineering journal, Mathemetics journal, Physics journal, Chemistry journal, Computer Engineering, Computer Science journal, how to submit your paper, peer reviw journal, indexed journal, reserach and review articles, engineering journal, www.ijerd.com, research journals,
yahoo journals, bing journals, International Journal of Engineering Research and Development, google journals, hard copy of journal
This document discusses gene prediction and promoter prediction. It begins by explaining that gene prediction involves locating protein-coding genes within sequenced genomes in order to understand their functional content. Various computational methods are used for gene prediction, including searching for signals like start/stop codons, searching coding content, and comparing sequences to find homologs. Promoter prediction involves locating DNA elements that regulate gene expression and is challenging due to diversity and short, conserved motifs. Ab initio and comparative phylogenetic footprinting methods are used to predict promoters and regulatory elements in prokaryotes and eukaryotes.
11.06.13 - 2013 NIH Research Festival PosterFarhoud Faraji
The study uses systems genetics to identify a network of co-expressed genes centered on Cnot2 that predicts breast cancer metastasis. Cnot2 overexpression regulates the expression of network genes and modulates metastatic potential in vivo. Small RNA sequencing revealed miR-3470b as an upstream regulator of the Cnot2 network by downregulating its hub genes. miR-3470b promotes a pro-metastatic transcriptional profile and increases metastasis in vivo, demonstrating it plays a key role in regulating the metastatic process.
This document discusses functional annotation and the Gene Ontology. It describes how functional annotation attaches biological information to sequences through searches of databases for homology, domains, and pathways as well as manual curation. Searches include BLAST for homology, Pfam and InterPro for domains, and KEGG and Reactome for pathways. Assignments include EC numbers for metabolic pathways and Gene Ontology terms from automated and manual annotation. Manual annotation combines all evidence and allows incorporation of experimental data but requires more time.
20140613 Analysis of High Throughput DNA Methylation ProfilingYi-Feng Chang
This document provides an overview of analysis of high-throughput DNA methylation profiling using bisulfite sequencing (BS-Seq) technology. It discusses DNA methylation and the bisulfite conversion process. It also reviews current BS-Seq resources, information that can be presented in BS-Seq studies, and published tools for analyzing BS-Seq data, including alignment, calling methylation status, and identifying differential methylation regions. The document concludes by introducing MethPipe, a comprehensive tool for BS-Seq data analysis.
This document discusses various methods for annotating genomes after sequencing and assembly. Sequence analysis approaches like identifying open reading frames can rapidly and inexpensively find some genes, but have weaknesses like false positives and missing short genes. More accurate methods are needed to find non-coding RNAs, pseudogenes, and other elements. As sequencing technologies generate more data, the bottleneck has shifted to analysis, requiring skills in both biology and mathematics. The document provides an example sequence to annotate and poses questions about fast, cheap and accurate annotation methods.
This document discusses identifying transcriptional start sites (TSSs) of human microRNAs (miRNAs) based on high-throughput sequencing data. It integrates cap analysis of gene expression (CAGE) tags, TSS Seq libraries, and H3K4me3 chromatin signature data to identify 847 human miRNA TSSs, including 70 TSSs of miRNA clusters. A machine learning-based Support Vector Machine (SVM) model is developed to systematically select representative TSSs for each miRNA. The TSS of the important human miRNA hsa-miR-122 is experimentally validated to demonstrate the effectiveness of the proposed resource for identifying human miRNA TSSs.
The document describes the Lab for Bioinformatics and Computational Genomics at UGent. The lab has over 100 staff working on bioinformatics projects, including engineers, scientists, and clinicians. The lab's work involves applying computational methods and software tools to analyze biological data and address problems in areas like genomics, molecular modeling, phylogeny, and medical applications.
This document provides a summary of a seminar on comparative genomics techniques. It discusses three levels of genome research: structural genomics, functional genomics, and comparative genomics. Comparative genomics involves analyzing and comparing different genomes to study gene content, function, organization, and evolution. Techniques discussed include genome sequencing, mapping, and bioinformatics tools. The document also outlines what can be compared between genomes and how comparative genomics has provided insights into evolution and gene function.
Prediction of mi-RNA related to late blight disease of potatoAnimesh Kumar
The document summarizes the process of predicting miRNAs related to late blight disease of potato using bioinformatics approaches. It discusses how miRNAs play an important role in host-pathogen interactions and regulates genes. The author proposes to identify potential pathogenic miRNAs and their targets in Solanum tuberosum in response to Phytophthora infestans infection using available EST sequences. A multi-step computational approach including screening ESTs, identifying pre-miRNAs, predicting secondary structures, and identifying targets is outlined. Relevant literature on plant miRNA prediction and P. infestans is also reviewed.
The document summarizes the genome sequencing and annotation of the mycobacteriophage Serenity. Serenity was isolated from a soil sample and found to have 98 genes through 454 pyrosequencing. Its genome was manually annotated to determine gene position and function. Serenity belongs to cluster A2 but is genetically diverse from other A2 phages. The document also introduces Single Molecule Real-Time sequencing which was then used to sequence 5 additional mycobacteriophage genomes with over 100x coverage. SMRT sequencing uses phospholinked nucleotides and zero-mode waveguides to achieve longer read lengths and detect modifications like methylation.
Ion AmpliSeq™ sequencing is one of the most promising applications
of the Ion Torrent NGS platform. It involves multiplex PCR for target
enrichment. Thermo Fisher offers online Ion AmpliSeq Designer to
customers to assist assay designs. While more and more people are
adopting Ion AmpliSeq technologies, challenges for assay designs
started to emerge. Here we present bioinformatics approaches to
improve the following areas of assay design: 1) assay specificity; 2)
primer quality control; 3) SNP under primer; and 4) flexibility to adapt
to different applications of Ion AmpliSeq sequencing including variant
calling, copy number variation detection, RNA expression, gene fusion
detection, and metagenomics. Design algorithms are developed to
ensure high coverage with controlled risk of amplification efficiency,
off-target reads and SNP effects. With the optimized design algorithm,
numerous custom and community research panels have been
created, including the Ion AmpliSeq Exome Panel, TP53 Panel, and
CFTR Panel.
Austin Neurology & Neurosciences is an open access, peer reviewed, scholarly journal dedicated to publish articles covering all areas of Neurology & Neurological Sciences.
The journal aims to promote research communications and provide a forum for doctors, researchers, physicians and healthcare professionals to find most recent advances in all areas of Neurology & Neurological Sciences. Austin Neurology & Neurosciences accepts original research articles, reviews, mini reviews, case reports and rapid communication covering all aspects of neurology & neurosciences.
Austin Neurology & Neurosciences strongly supports the scientific up gradation and fortification in related scientific research community by enhancing access to peer reviewed scientific literary works. Austin Publishing Group also brings universally peer reviewed journals under one roof thereby promoting knowledge sharing, mutual promotion of multidisciplinary science.
Structural genomics aims to understand genome content through sequencing and mapping genomes. Genetic maps show relative gene locations based on recombination rates, while physical maps use DNA analysis to place genes by base pair distance. Whole genome sequencing involves breaking genomes into fragments that are sequenced and reassembled using overlaps. Functional genomics seeks to identify all genes, RNAs, proteins and their functions through methods like homology searches, microarrays, and mutagenesis screens.
The document provides an overview of genomics and molecular profiling techniques. It discusses:
- The lab for bioinformatics and computational genomics which has 10 "genome hackers" and 42 scientists.
- An introduction to personalized medicine and biomarkers.
- First generation molecular profiling techniques like gene sequencing, microarrays, PCR.
- Next generation sequencing techniques like Roche 454, Illumina, SOLID which allow high throughput sequencing.
- Next generation applications like RNA sequencing, exome sequencing, epigenetic profiling.
- The role of bioinformatics in analyzing large genomic and molecular profiling data.
Examining gene expression and methylation with next gen sequencingStephen Turner
Slides on RNA-seq and methylation studies using next-gen sequencing given at the University of Miami Hussman Institute for Human Genomics "Genetic Analysis of Complex Human Diseases" course in 2012 (http://hihg.med.miami.edu/educational-programs/analysis-of-complex-human-diseases/genetic-analysis-of-complex-human-diseases/)
Improving DNA Barcode-based Fish Identification System on Imbalanced Data usi...TELKOMNIKA JOURNAL
Problem in imbalanced data is very common in classification or identification. The problem is
raised when the number of instances of one class far exceeds the other. In the previous research, our
DNA barcode-based Identification System of Tuna and Mackerel was developed in imbalanced dataset.
The number of samples of Tuna and Mackerel were much more than those of other fish samples.
Therefore, the accuracy of the classification model was probably still in bias. This research aimed at
employing Synthetic Minority Oversampling Technique (SMOTE) to yield balanced dataset. We used kmers
frequencies from DNA barcode sequences as features and Support Vector Machine (SVM) as
classification method. In this research we used trinucleotide (3-mers) and tetranucleotide (4-mers). The
training dataset was taken from Barcode of Life Database (BOLD). For evaluating the model, we compared
the accuracy of model using SMOTE and without SMOTE in order to classify DNA barcode sequences
which is taken from Department of Aquatic Product Technology, Bogor Agricultural University. The results
showed that the accuracy of the model in the species level using SMOTE was 7% and 13% higher than
those of non-SMOTE for trinucleotide (3-mers) and tetranucleotide (4-mers), respectively. It is expected
that the use of SMOTE, as one of data balancing technique, could increase the accuracy of DNA barcode
based fish classification system, particularly in the species level which is difficult to be identified.
The document discusses the human genome project, which aimed to sequence the entire human genome and identify all human genes. It provides background on the human genome, describing its size, number of genes, and chromosomes. It details the goals and milestones of the human genome project from 1986 to 2003. Vectors like yeast artificial chromosomes and bacterial artificial chromosomes were used to clone large fragments of DNA for sequencing.
This document discusses gene identification and discovery. It begins by describing gene identification in prokaryotes, unicellular eukaryotes, and multicellular eukaryotes. It then discusses the components of protein-coding genes and different approaches to gene prediction for prokaryotes and eukaryotes. The document also covers gene structure in prokaryotes and eukaryotes, as well as software tools and methods for gene prediction. Finally, it discusses several approaches to classifying genes by function.
International Journal of Engineering Research and DevelopmentIJERD Editor
Electrical, Electronics and Computer Engineering,
Information Engineering and Technology,
Mechanical, Industrial and Manufacturing Engineering,
Automation and Mechatronics Engineering,
Material and Chemical Engineering,
Civil and Architecture Engineering,
Biotechnology and Bio Engineering,
Environmental Engineering,
Petroleum and Mining Engineering,
Marine and Agriculture engineering,
Aerospace Engineering.
Welcome to International Journal of Engineering Research and Development (IJERD)IJERD Editor
call for paper 2012, hard copy of journal, research paper publishing, where to publish research paper,
journal publishing, how to publish research paper, Call For research paper, international journal, publishing a paper, IJERD, journal of science and technology, how to get a research paper published, publishing a paper, publishing of journal, publishing of research paper, reserach and review articles, IJERD Journal, How to publish your research paper, publish research paper, open access engineering journal, Engineering journal, Mathemetics journal, Physics journal, Chemistry journal, Computer Engineering, Computer Science journal, how to submit your paper, peer reviw journal, indexed journal, reserach and review articles, engineering journal, www.ijerd.com, research journals,
yahoo journals, bing journals, International Journal of Engineering Research and Development, google journals, hard copy of journal
International Journal of Engineering Research and Development (IJERD)IJERD Editor
International Journal of Engineering Research and Development is an international premier peer reviewed open access engineering and technology journal promoting the discovery, innovation, advancement and dissemination of basic and transitional knowledge in engineering, technology and related disciplines.
International Journal of Engineering Research and DevelopmentIJERD Editor
This document discusses the emerging health threats posed by electronic waste (e-waste). It begins by defining e-waste and noting that it makes up 2.7-3% of total waste but contains many toxic and hazardous elements. The main constituents of e-waste are discussed, including heavy metals like lead, mercury, and cadmium which can cause health effects when exposed. India's annual e-waste generation is estimated at 400,000 tons and is growing rapidly. While formal recycling systems exist, most e-waste in developing countries is handled by the informal sector without proper health and safety practices, exposing workers and local communities to the toxic materials. Proper regulations and disposal facilities are needed to address this important environmental and public
This document summarizes an academic paper presented at the International Conference on Emerging Trends in Engineering and Management in 2014. The paper proposes a design and implementation of an elliptic curve scalar multiplier on a field programmable gate array (FPGA) using the Karatsuba algorithm. It aims to reduce hardware complexity by using a polynomial basis representation of finite fields and projective coordinate representation of elliptic curves. Key mathematical concepts like finite fields, point addition, and point doubling that are important to elliptic curve cryptography are also discussed at a high level.
1. The document describes a five-level cascaded H-bridge inverter used as a distribution static compensator (DSTATCOM) to compensate for reactive power and harmonics in a power system.
2. A DSTATCOM is connected in shunt with the distribution system and uses a voltage source converter to generate a set of three-phase output voltages that can be adjusted to control the exchange of active and reactive power with the system.
3. The five-level cascaded H-bridge inverter topology reduces device voltage stress and output harmonics. Level shifted pulse width modulation and phase shifted pulse width modulation techniques are investigated for controlling the DSTATCOM.
International Journal of Engineering Research and DevelopmentIJERD Editor
Electrical, Electronics and Computer Engineering,
Information Engineering and Technology,
Mechanical, Industrial and Manufacturing Engineering,
Automation and Mechatronics Engineering,
Material and Chemical Engineering,
Civil and Architecture Engineering,
Biotechnology and Bio Engineering,
Environmental Engineering,
Petroleum and Mining Engineering,
Marine and Agriculture engineering,
Aerospace Engineering.
Efficient implementation of bit parallel finite eSAT Journals
Abstract Arithmetic in Finite/Galois field is a major aspect for many applications such as error correcting code and cryptography. Addition and multiplication are the two basic operations in the finite field GF (2m).The finite field multiplication is the most resource and time consuming operation. In this paper the complexity (space) analysis and efficient FPGA implementation of bit parallel Karatsuba Multiplier over GF (2m) is presented. This is especially interesting for high performance systems because of its carry free property. To reduce the complexity of Classical Multiplier, multiplier with less complexity over GF (2m) based on Karatsuba Multiplier is used. The LUT complexity is evaluated on FPGA by using Xilinx ISE 8.1i.Furthermore,the experimental results on FPGAs for bit parallel Karatsuba Multiplier and Classical Multiplier were shown and the comparison table is provided. To the best of our knowledge, the bit parallel karatsuba multiplier consumes least resources among the known FPGA implementation. Keywords: Classical Multiplier, Cryptograph, FPGA, Galois field, Karatsuba Multiplier
International Journal of Engineering Research and DevelopmentIJERD Editor
Electrical, Electronics and Computer Engineering,
Information Engineering and Technology,
Mechanical, Industrial and Manufacturing Engineering,
Automation and Mechatronics Engineering,
Material and Chemical Engineering,
Civil and Architecture Engineering,
Biotechnology and Bio Engineering,
Environmental Engineering,
Petroleum and Mining Engineering,
Marine and Agriculture engineering,
Aerospace Engineering.
1) The document discusses a study analyzing the impact of gene length on detecting differentially expressed genes using RNA-seq technology.
2) The study will first test the reproducibility of RNA-seq and the effect of normalization. It will then compare different statistical tests for identifying differentially expressed genes.
3) Finally, the study will specifically test how gene length impacts the likelihood of a gene being identified as differentially expressed, as longer genes are easier to map with short reads.
BTC 506 Gene Identification using Bioinformatic Tools-230302130331.pptxChijiokeNsofor
This document discusses several bioinformatics tools and methods for identifying genes from genomic sequences, including:
1. Obtaining sequence data through sequencing technologies and preprocessing data.
2. Using tools like Ensembl, RefSeq and UCSC Genome Browser for gene identification and annotation.
3. Using gene prediction tools like Augustus, GeneMark and Glimmer to predict gene locations and structures.
4. Validating predicted genes through comparison to known genes or experimental validation with RNA-seq or RT-PCR.
This document discusses various bioinformatics tools and methods for identifying genes from genomic sequences. It begins by defining genes and genomes, then describes reference databases like RefSeq that are important for gene identification. It outlines the general workflow for gene identification, including obtaining sequences, preprocessing, annotation, prediction, and validation. Specific tools mentioned include GENSCAN, Glimmer, and Augustus for gene prediction, and BLAST for sequence alignment. The document also discusses identifying other genomic features like promoters, repeats, and open reading frames. It emphasizes that accurate gene identification requires both computational and experimental approaches.
Particle Swarm Optimization for Gene cluster IdentificationEditor IJCATR
The understanding of gene regulation is the most basic need for the classification of genes within a DNA. These genes
within the DNA are grouped together into clusters also known as Transcription Units. The genes are grouped into transcription units
for the purpose of construction and regulation of gene expression and synthesis of proteins. This knowledge further contributes as
essential information for the process of drug design and to determine the protein functions of newly sequenced genomes. It is possible
to use the diverse biological information across multiple genomes as an input to the classification problem. The purpose of this work is
to show that Particle Swarm Optimization may provide for more efficient classification as compared to other algorithms. To validate
the approach E.Coli complete genome is taken as the benchmark genome.
1) The document describes a study that uses an artificial neural network (ANN) to predict promoter regions in DNA sequences.
2) The ANN was trained on a dataset of 106 DNA sequences (57 nucleotides each) containing both promoter and non-promoter regions that were labeled.
3) The ANN achieved a maximum accuracy of 84% at predicting promoters, outperforming some existing techniques according to error rates reported in the literature.
This document discusses functional genomics and its approaches. It defines functional genomics as the worldwide experimental approach to access the function of genes by using information from structural genomics. The key functional genomics approaches discussed are transcriptomics, proteomics, metabolomics, interactomics, epigenetics, and nutrigenomics. Modern techniques discussed include expressed sequence tags (ESTs), serial analysis of gene expression (SAGE), and microarray analysis.
Analysis of Genomic and Proteomic Sequence Using Fir FilterIJMER
Bioinformatics is a field of science that implies the use of techniques from mathematics, informatics, statistics, computer science, artificial intelligence, chemistry, and biochemistry to solve biological problems usually on the molecular level. Digital Signal Processing (DSP) applications in genomic sequence analysis have received great attention in recent years.DSP principles are used to analyse genomic and proteomic sequences. The DNA sequence is mapped into digital signals in the form of binary indicator sequences. Signal processing techniques such as digital filtering is applied to genomic sequences to identify protein coding region. Frequency response of genomic sequences is used to solve many optimization problems in science, medicine and many other applications. The aim of this paper is to describe a method of generating Finite Impulse Response (FIR) of the genomic sequence. The same DNA sequence is used to convert into proteomic sequence using transcription and translation, and also digital filtering technique such as FIR filter applied to know the frequency response. The frequency response is same for both gene and proteomic sequence.
The Human Genome Project began in 1990 as a joint effort between the Department of Energy and National Institutes of Health to map the entire human genome. By 2001, both the publicly funded Human Genome Project and private company Celera Genomics had completed mapping the genome, which contains approximately 26,000 genes. DNA sequencing techniques like Sanger sequencing were used to determine the precise order of nucleotides in DNA and advance biological and medical research. The genome provides instructions to build a human through genes and proteins, with implications for medicine, biotechnology, and life sciences.
This document discusses functional genomics, which studies how genes and intergenic regions contribute to biological processes on a genome-wide scale. It describes techniques used at the DNA, RNA, and protein levels, including genetic interaction mapping, ChIP-seq, DNase-seq, FAIRE-seq, ATAC-seq, microarrays, SAGE, RNA-seq, yeast two-hybrid systems, and affinity purification mass spectrometry. Major projects like ENCODE and GTEx aim to identify all functional elements in genomic DNA and understand the role of genetic variation in gene expression across tissues.
This document describes a project submitted by Ashish Singh Tomar to fulfill the requirements for a Bachelor's Degree in Bioinformatics. The project involves developing a next-generation sequencing (NGS) data analysis pipeline using the R statistical package. The document includes sections acknowledging contributions from colleagues and supervisors, as well as chapters outlining the aims, methodology, and results of the project.
This proposed method focus on these issues by developing a novel classification algorithm by combining Gene Expression Graph (GEG) with Manhattan distance. This method will be used to express the gene expression data. Gene Expression Graph provides the optimal view about the relationship between normal and unhealthy genes. The method of using a graph-based gene expression to express gene information was first offered by the authors in [1] and [2], It will permits to construct a classifier based on an association between graphs represented for well-known classes and graphs represented for samples to evaluate. Additionally Euclidean distance is used to measure the strength of relationship which exists between the genes.
SAGE- Serial Analysis of Gene ExpressionAashish Patel
Serial Analysis of Gene Expression (SAGE) is a method to quantify gene expression in cells. It involves extracting short sequence tags from mRNA transcripts and concatenating them for efficient sequencing. This allows simultaneous analysis of thousands of transcripts. SAGE provides quantitative gene expression data without prior knowledge of genes and can identify differentially expressed genes between cell types or conditions. While powerful, it requires substantial sequencing and computational analysis of large datasets.
Efficiency of Using Sequence Discovery for Polymorphism in DNA SequenceIJSTA
This document summarizes a research paper that proposes an effective sequence pattern discovery method to find polymorphic motifs in human DNA sequences to detect genetic medical conditions. The method uses three algorithms - LDK, HVS, and MDFS. It aims to identify disease-causing sequences, like those related to Ebola, by finding matching patterns in polymorphic DNA sequences. The position scrolling matrix is used to report test results of sequence pattern matching.
A Comparative Encyclopedia Of DNA Elements In The Mouse GenomeDaniel Wachtel
The Mouse ENCODE Consortium mapped transcription, chromatin accessibility, transcription factor binding, chromatin modifications, and replication domains in over 100 mouse cell types and tissues to generate a comparative encyclopedia of functional DNA elements in the mouse genome. They found that while much is conserved with humans, many mouse genes and large portions of regulatory DNA show divergence. Mouse and human transcription factor networks are more conserved than cis-regulatory DNA. Species-specific regulatory sequences are enriched for repetitive elements. Chromatin state and replication timing landscapes are developmentally stable within species.
RT-PCR and DNA microarray measurement of mRNA cell proliferationIJAEMSJORNAL
For mRNA quantification, RT-PCR and DNA microarrays have been compared in few studies
(RT-PCR). Healing callus of adult and juvenile rats after femur injury was found to be rich in mRNA at
various stages of the healing process. We used both methods to examine ten samples and a total of 26 genes.
Internal DNA probes tagged with 32P were employed in reverse transcription-polymerase chain reaction
(RT-PCR) to identify genes (RT-PCR). Ten Affymetrix® Rat U34A cRNA microarrays were hybridized with
biotin-labeled cRNA generated from mRNA. There was a wide range of correlation coefficients (r) between
RT-PCR and microarray data for each gene. Meaning became genetically unique because of this diversity.
Relatively lowly expressed genes had the highest r values. The distance between PCR primers and
microarray probes was found to be higher than previously assumed, leading to a drop in agreement between
microarray calls and PCR outcomes. Microarray research showed that RT-PCR expression levels for two
genes had a "floor effect." As a result, PCR primers and microarray probes that overlap in mRNA expression
levels can provide good agreement between these two techniques.
After sequencing of the genome has been done, the first thing that comes to mind is "Where are the genes?". Genome annotation is the process of attaching information to the biological sequences. It is an active area of research and it would help scientists a lot to undergo with their wet lab projects once they know the coding parts of a genome.
Genome annotation is the process of analyzing genomic DNA sequences to extract biological meaning and context. It involves two main steps - structural annotation, which locates gene elements like exons and introns, and functional annotation, which predicts the functions of gene products. Computational tools are crucial given the vast amounts of sequence data. They use various approaches like identifying open reading frames, conserved sequences, statistical patterns and sequence similarities to model gene structures and infer functions. The results are then integrated into automated annotation pipelines to generate comprehensive and reliable gene annotations for genomes.
The research and application progress of transcriptome sequencing technology (i)creativebiolabs11
This document discusses the application of transcriptome sequencing technology. It describes how transcriptome sequencing can be used to detect mutations by analyzing sequence information from all transcripts, including SNPs and indels. It also discusses how transcriptome sequencing can be applied to determine gene expression patterns, discover new transcripts, study the regulatory mechanisms of non-coding RNA, and analyze single cell transcriptomes. The development of RNA-seq technology provides an effective means for studying transcriptional regulatory networks and their relationship to traits.
This document describes a method called AnG-HPR for predicting transcription start sites (TSSs) in human genes. It extracts n-gram features from DNA sequences, including n-grams of length 2-5. These n-gram features are used as inputs to neural network classifiers to distinguish promoter regions from exons, introns, and 3'UTRs. The performance of AnG-HPR is compared to other existing promoter prediction methods such as DragonGSF, Eponine, and FirstEF, showing that AnG-HPR achieves the best performance across three test datasets. N-gram-based approaches to promoter recognition are discussed.
Similar to International Journal of Engineering Research and Development (20)
A Novel Method for Prevention of Bandwidth Distributed Denial of Service AttacksIJERD Editor
Distributed Denial of Service (DDoS) Attacks became a massive threat to the Internet. Traditional
Architecture of internet is vulnerable to the attacks like DDoS. Attacker primarily acquire his army of Zombies,
then that army will be instructed by the Attacker that when to start an attack and on whom the attack should be
done. In this paper, different techniques which are used to perform DDoS Attacks, Tools that were used to
perform Attacks and Countermeasures in order to detect the attackers and eliminate the Bandwidth Distributed
Denial of Service attacks (B-DDoS) are reviewed. DDoS Attacks were done by using various Flooding
techniques which are used in DDoS attack.
The main purpose of this paper is to design an architecture which can reduce the Bandwidth
Distributed Denial of service Attack and make the victim site or server available for the normal users by
eliminating the zombie machines. Our Primary focus of this paper is to dispute how normal machines are
turning into zombies (Bots), how attack is been initiated, DDoS attack procedure and how an organization can
save their server from being a DDoS victim. In order to present this we implemented a simulated environment
with Cisco switches, Routers, Firewall, some virtual machines and some Attack tools to display a real DDoS
attack. By using Time scheduling, Resource Limiting, System log, Access Control List and some Modular
policy Framework we stopped the attack and identified the Attacker (Bot) machines
Hearing loss is one of the most common human impairments. It is estimated that by year 2015 more
than 700 million people will suffer mild deafness. Most can be helped by hearing aid devices depending on the
severity of their hearing loss. This paper describes the implementation and characterization details of a dual
channel transmitter front end (TFE) for digital hearing aid (DHA) applications that use novel micro
electromechanical- systems (MEMS) audio transducers and ultra-low power-scalable analog-to-digital
converters (ADCs), which enable a very-low form factor, energy-efficient implementation for next-generation
DHA. The contribution of the design is the implementation of the dual channel MEMS microphones and powerscalable
ADC system.
Influence of tensile behaviour of slab on the structural Behaviour of shear c...IJERD Editor
-A composite beam is composed of a steel beam and a slab connected by means of shear connectors
like studs installed on the top flange of the steel beam to form a structure behaving monolithically. This study
analyzes the effects of the tensile behavior of the slab on the structural behavior of the shear connection like slip
stiffness and maximum shear force in composite beams subjected to hogging moment. The results show that the
shear studs located in the crack-concentration zones due to large hogging moments sustain significantly smaller
shear force and slip stiffness than the other zones. Moreover, the reduction of the slip stiffness in the shear
connection appears also to be closely related to the change in the tensile strain of rebar according to the increase
of the load. Further experimental and analytical studies shall be conducted considering variables such as the
reinforcement ratio and the arrangement of shear connectors to achieve efficient design of the shear connection
in composite beams subjected to hogging moment.
Gold prospecting using Remote Sensing ‘A case study of Sudan’IJERD Editor
Gold has been extracted from northeast Africa for more than 5000 years, and this may be the first
place where the metal was extracted. The Arabian-Nubian Shield (ANS) is an exposure of Precambrian
crystalline rocks on the flanks of the Red Sea. The crystalline rocks are mostly Neoproterozoic in age. ANS
includes the nations of Israel, Jordan. Egypt, Saudi Arabia, Sudan, Eritrea, Ethiopia, Yemen, and Somalia.
Arabian Nubian Shield Consists of juvenile continental crest that formed between 900 550 Ma, when intra
oceanic arc welded together along ophiolite decorated arc. Primary Au mineralization probably developed in
association with the growth of intra oceanic arc and evolution of back arc. Multiple episodes of deformation
have obscured the primary metallogenic setting, but at least some of the deposits preserve evidence that they
originate as sea floor massive sulphide deposits.
The Red Sea Hills Region is a vast span of rugged, harsh and inhospitable sector of the Earth with
inimical moon-like terrain, nevertheless since ancient times it is famed to be an abode of gold and was a major
source of wealth for the Pharaohs of ancient Egypt. The Pharaohs old workings have been periodically
rediscovered through time. Recent endeavours by the Geological Research Authority of Sudan led to the
discovery of a score of occurrences with gold and massive sulphide mineralizations. In the nineties of the
previous century the Geological Research Authority of Sudan (GRAS) in cooperation with BRGM utilized
satellite data of Landsat TM using spectral ratio technique to map possible mineralized zones in the Red Sea
Hills of Sudan. The outcome of the study mapped a gossan type gold mineralization. Band ratio technique was
applied to Arbaat area and a signature of alteration zone was detected. The alteration zones are commonly
associated with mineralization. The alteration zones are commonly associated with mineralization. A filed check
confirmed the existence of stock work of gold bearing quartz in the alteration zone. Another type of gold
mineralization that was discovered using remote sensing is the gold associated with metachert in the Atmur
Desert.
Reducing Corrosion Rate by Welding DesignIJERD Editor
This document summarizes a study on reducing corrosion rates in steel through welding design. The researchers tested different welding groove designs (X, V, 1/2X, 1/2V) and preheating temperatures (400°C, 500°C, 600°C) on ferritic malleable iron samples. Testing found that X and V groove designs with 500°C and 600°C preheating had corrosion rates of 0.5-0.69% weight loss after 14 days, compared to 0.57-0.76% for 400°C preheating. Higher preheating reduced residual stresses which decreased corrosion. Residual stresses were 1.7 MPa for optimal X groove and 600°C
Router 1X3 – RTL Design and VerificationIJERD Editor
Routing is the process of moving a packet of data from source to destination and enables messages
to pass from one computer to another and eventually reach the target machine. A router is a networking device
that forwards data packets between computer networks. It is connected to two or more data lines from different
networks (as opposed to a network switch, which connects data lines from one single network). This paper,
mainly emphasizes upon the study of router device, it‟s top level architecture, and how various sub-modules of
router i.e. Register, FIFO, FSM and Synchronizer are synthesized, and simulated and finally connected to its top
module.
Active Power Exchange in Distributed Power-Flow Controller (DPFC) At Third Ha...IJERD Editor
This paper presents a component within the flexible ac-transmission system (FACTS) family, called
distributed power-flow controller (DPFC). The DPFC is derived from the unified power-flow controller (UPFC)
with an eliminated common dc link. The DPFC has the same control capabilities as the UPFC, which comprise
the adjustment of the line impedance, the transmission angle, and the bus voltage. The active power exchange
between the shunt and series converters, which is through the common dc link in the UPFC, is now through the
transmission lines at the third-harmonic frequency. DPFC multiple small-size single-phase converters which
reduces the cost of equipment, no voltage isolation between phases, increases redundancy and there by
reliability increases. The principle and analysis of the DPFC are presented in this paper and the corresponding
simulation results that are carried out on a scaled prototype are also shown.
Mitigation of Voltage Sag/Swell with Fuzzy Control Reduced Rating DVRIJERD Editor
Power quality has been an issue that is becoming increasingly pivotal in industrial electricity
consumers point of view in recent times. Modern industries employ Sensitive power electronic equipments,
control devices and non-linear loads as part of automated processes to increase energy efficiency and
productivity. Voltage disturbances are the most common power quality problem due to this the use of a large
numbers of sophisticated and sensitive electronic equipment in industrial systems is increased. This paper
discusses the design and simulation of dynamic voltage restorer for improvement of power quality and
reduce the harmonics distortion of sensitive loads. Power quality problem is occurring at non-standard
voltage, current and frequency. Electronic devices are very sensitive loads. In power system voltage sag,
swell, flicker and harmonics are some of the problem to the sensitive load. The compensation capability
of a DVR depends primarily on the maximum voltage injection ability and the amount of stored
energy available within the restorer. This device is connected in series with the distribution feeder at
medium voltage. A fuzzy logic control is used to produce the gate pulses for control circuit of DVR and the
circuit is simulated by using MATLAB/SIMULINK software.
Study on the Fused Deposition Modelling In Additive ManufacturingIJERD Editor
Additive manufacturing process, also popularly known as 3-D printing, is a process where a product
is created in a succession of layers. It is based on a novel materials incremental manufacturing philosophy.
Unlike conventional manufacturing processes where material is removed from a given work price to derive the
final shape of a product, 3-D printing develops the product from scratch thus obviating the necessity to cut away
materials. This prevents wastage of raw materials. Commonly used raw materials for the process are ABS
plastic, PLA and nylon. Recently the use of gold, bronze and wood has also been implemented. The complexity
factor of this process is 0% as in any object of any shape and size can be manufactured.
Spyware triggering system by particular string valueIJERD Editor
This computer programme can be used for good and bad purpose in hacking or in any general
purpose. We can say it is next step for hacking techniques such as keylogger and spyware. Once in this system if
user or hacker store particular string as a input after that software continually compare typing activity of user
with that stored string and if it is match then launch spyware programme.
A Blind Steganalysis on JPEG Gray Level Image Based on Statistical Features a...IJERD Editor
This paper presents a blind steganalysis technique to effectively attack the JPEG steganographic
schemes i.e. Jsteg, F5, Outguess and DWT Based. The proposed method exploits the correlations between
block-DCTcoefficients from intra-block and inter-block relation and the statistical moments of characteristic
functions of the test image is selected as features. The features are extracted from the BDCT JPEG 2-array.
Support Vector Machine with cross-validation is implemented for the classification.The proposed scheme gives
improved outcome in attacking.
Secure Image Transmission for Cloud Storage System Using Hybrid SchemeIJERD Editor
- Data over the cloud is transferred or transmitted between servers and users. Privacy of that
data is very important as it belongs to personal information. If data get hacked by the hacker, can be
used to defame a person’s social data. Sometimes delay are held during data transmission. i.e. Mobile
communication, bandwidth is low. Hence compression algorithms are proposed for fast and efficient
transmission, encryption is used for security purposes and blurring is used by providing additional
layers of security. These algorithms are hybridized for having a robust and efficient security and
transmission over cloud storage system.
Application of Buckley-Leverett Equation in Modeling the Radius of Invasion i...IJERD Editor
A thorough review of existing literature indicates that the Buckley-Leverett equation only analyzes
waterflood practices directly without any adjustments on real reservoir scenarios. By doing so, quite a number
of errors are introduced into these analyses. Also, for most waterflood scenarios, a radial investigation is more
appropriate than a simplified linear system. This study investigates the adoption of the Buckley-Leverett
equation to estimate the radius invasion of the displacing fluid during waterflooding. The model is also adopted
for a Microbial flood and a comparative analysis is conducted for both waterflooding and microbial flooding.
Results shown from the analysis doesn’t only records a success in determining the radial distance of the leading
edge of water during the flooding process, but also gives a clearer understanding of the applicability of
microbes to enhance oil production through in-situ production of bio-products like bio surfactans, biogenic
gases, bio acids etc.
Gesture Gaming on the World Wide Web Using an Ordinary Web CameraIJERD Editor
- Gesture gaming is a method by which users having a laptop/pc/x-box play games using natural or
bodily gestures. This paper presents a way of playing free flash games on the internet using an ordinary webcam
with the help of open source technologies. Emphasis in human activity recognition is given on the pose
estimation and the consistency in the pose of the player. These are estimated with the help of an ordinary web
camera having different resolutions from VGA to 20mps. Our work involved giving a 10 second documentary to
the user on how to play a particular game using gestures and what are the various kinds of gestures that can be
performed in front of the system. The initial inputs of the RGB values for the gesture component is obtained by
instructing the user to place his component in a red box in about 10 seconds after the short documentary before
the game is finished. Later the system opens the concerned game on the internet on popular flash game sites like
miniclip, games arcade, GameStop etc and loads the game clicking at various places and brings the state to a
place where the user is to perform only gestures to start playing the game. At any point of time the user can call
off the game by hitting the esc key and the program will release all of the controls and return to the desktop. It
was noted that the results obtained using an ordinary webcam matched that of the Kinect and the users could
relive the gaming experience of the free flash games on the net. Therefore effective in game advertising could
also be achieved thus resulting in a disruptive growth to the advertising firms.
Hardware Analysis of Resonant Frequency Converter Using Isolated Circuits And...IJERD Editor
-LLC resonant frequency converter is basically a combo of series as well as parallel resonant ckt. For
LCC resonant converter it is associated with a disadvantage that, though it has two resonant frequencies, the
lower resonant frequency is in ZCS region[5]. For this application, we are not able to design the converter
working at this resonant frequency. LLC resonant converter existed for a very long time but because of
unknown characteristic of this converter it was used as a series resonant converter with basically a passive
(resistive) load. . Here, it was designed to operate in switching frequency higher than resonant frequency of the
series resonant tank of Lr and Cr converter acts very similar to Series Resonant Converter. The benefit of LLC
resonant converter is narrow switching frequency range with light load[6] . Basically, the control ckt plays a
very imp. role and hence 555 Timer used here provides a perfect square wave as the control ckt provides no
slew rate which makes the square wave really strong and impenetrable. The dead band circuit provides the
exclusive dead band in micro seconds so as to avoid the simultaneous firing of two pairs of IGBT’s where one
pair switches off and the other on for a slightest period of time. Hence, the isolator ckt here is associated with
each and every ckt used because it acts as a driver and an isolation to each of the IGBT is provided with one
exclusive transformer supply[3]. The IGBT’s are fired using the appropriate signal using the previous boards
and hence at last a high frequency rectifier ckt with a filtering capacitor is used to get an exact dc
waveform .The basic goal of this particular analysis is to observe the wave forms and characteristics of
converters with differently positioned passive elements in the form of tank circuits.
Simulated Analysis of Resonant Frequency Converter Using Different Tank Circu...IJERD Editor
LLC resonant frequency converter is basically a combo of series as well as parallel resonant ckt. For
LCC resonant converter it is associated with a disadvantage that, though it has two resonant frequencies, the
lower resonant frequency is in ZCS region [5]. For this application, we are not able to design the converter
working at this resonant frequency. LLC resonant converter existed for a very long time but because of
unknown characteristic of this converter it was used as a series resonant converter with basically a passive
(resistive) load. . Here, it was designed to operate in switching frequency higher than resonant frequency of the
series resonant tank of Lr and Cr converter acts very similar to Series Resonant Converter. The benefit of LLC
resonant converter is narrow switching frequency range with light load[6] . Basically, the control ckt plays a
very imp. role and hence 555 Timer used here provides a perfect square wave as the control ckt provides no
slew rate which makes the square wave really strong and impenetrable. The dead band circuit provides the
exclusive dead band in micro seconds so as to avoid the simultaneous firing of two pairs of IGBT’s where one
pair switches off and the other on for a slightest period of time. Hence, the isolator ckt here is associated with
each and every ckt used because it acts as a driver and an isolation to each of the IGBT is provided with one
exclusive transformer supply[3]. The IGBT’s are fired using the appropriate signal using the previous boards
and hence at last a high frequency rectifier ckt with a filtering capacitor is used to get an exact dc
waveform .The basic goal of this particular analysis is to observe the wave forms and characteristics of
converters with differently positioned passive elements in the form of tank circuits. The supported simulation
is done through PSIM 6.0 software tool
Amateurs Radio operator, also known as HAM communicates with other HAMs through Radio
waves. Wireless communication in which Moon is used as natural satellite is called Moon-bounce or EME
(Earth -Moon-Earth) technique. Long distance communication (DXing) using Very High Frequency (VHF)
operated amateur HAM radio was difficult. Even with the modest setup having good transceiver, power
amplifier and high gain antenna with high directivity, VHF DXing is possible. Generally 2X11 YAGI antenna
along with rotor to set horizontal and vertical angle is used. Moon tracking software gives exact location,
visibility of Moon at both the stations and other vital data to acquire real time position of moon.
“MS-Extractor: An Innovative Approach to Extract Microsatellites on „Y‟ Chrom...IJERD Editor
Simple Sequence Repeats (SSR), also known as Microsatellites, have been extensively used as
molecular markers due to their abundance and high degree of polymorphism. The nucleotide sequences of
polymorphic forms of the same gene should be 99.9% identical. So, Microsatellites extraction from the Gene is
crucial. However, Microsatellites repeat count is compared, if they differ largely, he has some disorder. The Y
chromosome likely contains 50 to 60 genes that provide instructions for making proteins. Because only males
have the Y chromosome, the genes on this chromosome tend to be involved in male sex determination and
development. Several Microsatellite Extractors exist and they fail to extract microsatellites on large data sets of
giga bytes and tera bytes in size. The proposed tool “MS-Extractor: An Innovative Approach to extract
Microsatellites on „Y‟ Chromosome” can extract both Perfect as well as Imperfect Microsatellites from large
data sets of human genome „Y‟. The proposed system uses string matching with sliding window approach to
locate Microsatellites and extracts them.
Importance of Measurements in Smart GridIJERD Editor
- The need to get reliable supply, independence from fossil fuels, and capability to provide clean
energy at a fixed and lower cost, the existing power grid structure is transforming into Smart Grid. The
development of a smart energy distribution grid is a current goal of many nations. A Smart Grid should have
new capabilities such as self-healing, high reliability, energy management, and real-time pricing. This new era
of smart future grid will lead to major changes in existing technologies at generation, transmission and
distribution levels. The incorporation of renewable energy resources and distribution generators in the existing
grid will increase the complexity, optimization problems and instability of the system. This will lead to a
paradigm shift in the instrumentation and control requirements for Smart Grids for high quality, stable and
reliable electricity supply of power. The monitoring of the grid system state and stability relies on the
availability of reliable measurement of data. In this paper the measurement areas that highlight new
measurement challenges, development of the Smart Meters and the critical parameters of electric energy to be
monitored for improving the reliability of power systems has been discussed.
Study of Macro level Properties of SCC using GGBS and Lime stone powderIJERD Editor
The document summarizes a study on the use of ground granulated blast furnace slag (GGBS) and limestone powder to replace cement in self-compacting concrete (SCC). Tests were conducted on SCC mixes with 0-50% replacement of cement with GGBS and 0-20% replacement with limestone powder. The results showed that replacing 30% of cement with GGBS and 15% with limestone powder produced SCC with the highest compressive strength of 46MPa, meeting fresh property requirements. The study concluded that this ternary blend of cement, GGBS and limestone powder can improve SCC properties while reducing costs.
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc
How does your privacy program stack up against your peers? What challenges are privacy teams tackling and prioritizing in 2024?
In the fifth annual Global Privacy Benchmarks Survey, we asked over 1,800 global privacy professionals and business executives to share their perspectives on the current state of privacy inside and outside of their organizations. This year’s report focused on emerging areas of importance for privacy and compliance professionals, including considerations and implications of Artificial Intelligence (AI) technologies, building brand trust, and different approaches for achieving higher privacy competence scores.
See how organizational priorities and strategic approaches to data security and privacy are evolving around the globe.
This webinar will review:
- The top 10 privacy insights from the fifth annual Global Privacy Benchmarks Survey
- The top challenges for privacy leaders, practitioners, and organizations in 2024
- Key themes to consider in developing and maintaining your privacy program
Building Production Ready Search Pipelines with Spark and MilvusZilliz
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceIndexBug
Imagine a world where machines not only perform tasks but also learn, adapt, and make decisions. This is the promise of Artificial Intelligence (AI), a technology that's not just enhancing our lives but revolutionizing entire industries.
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Best 20 SEO Techniques To Improve Website Visibility In SERPPixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU und die Lizenzen nach dem CCB- und CCX-Modell sind für viele in der HCL-Community seit letztem Jahr ein heißes Thema. Als Notes- oder Domino-Kunde haben Sie vielleicht mit unerwartet hohen Benutzerzahlen und Lizenzgebühren zu kämpfen. Sie fragen sich vielleicht, wie diese neue Art der Lizenzierung funktioniert und welchen Nutzen sie Ihnen bringt. Vor allem wollen Sie sicherlich Ihr Budget einhalten und Kosten sparen, wo immer möglich. Das verstehen wir und wir möchten Ihnen dabei helfen!
Wir erklären Ihnen, wie Sie häufige Konfigurationsprobleme lösen können, die dazu führen können, dass mehr Benutzer gezählt werden als nötig, und wie Sie überflüssige oder ungenutzte Konten identifizieren und entfernen können, um Geld zu sparen. Es gibt auch einige Ansätze, die zu unnötigen Ausgaben führen können, z. B. wenn ein Personendokument anstelle eines Mail-Ins für geteilte Mailboxen verwendet wird. Wir zeigen Ihnen solche Fälle und deren Lösungen. Und natürlich erklären wir Ihnen das neue Lizenzmodell.
Nehmen Sie an diesem Webinar teil, bei dem HCL-Ambassador Marc Thomas und Gastredner Franz Walder Ihnen diese neue Welt näherbringen. Es vermittelt Ihnen die Tools und das Know-how, um den Überblick zu bewahren. Sie werden in der Lage sein, Ihre Kosten durch eine optimierte Domino-Konfiguration zu reduzieren und auch in Zukunft gering zu halten.
Diese Themen werden behandelt
- Reduzierung der Lizenzkosten durch Auffinden und Beheben von Fehlkonfigurationen und überflüssigen Konten
- Wie funktionieren CCB- und CCX-Lizenzen wirklich?
- Verstehen des DLAU-Tools und wie man es am besten nutzt
- Tipps für häufige Problembereiche, wie z. B. Team-Postfächer, Funktions-/Testbenutzer usw.
- Praxisbeispiele und Best Practices zum sofortigen Umsetzen
Full-RAG: A modern architecture for hyper-personalizationZilliz
Mike Del Balso, CEO & Co-Founder at Tecton, presents "Full RAG," a novel approach to AI recommendation systems, aiming to push beyond the limitations of traditional models through a deep integration of contextual insights and real-time data, leveraging the Retrieval-Augmented Generation architecture. This talk will outline Full RAG's potential to significantly enhance personalization, address engineering challenges such as data management and model training, and introduce data enrichment with reranking as a key solution. Attendees will gain crucial insights into the importance of hyperpersonalization in AI, the capabilities of Full RAG for advanced personalization, and strategies for managing complex data integrations for deploying cutting-edge AI solutions.
In his public lecture, Christian Timmerer provides insights into the fascinating history of video streaming, starting from its humble beginnings before YouTube to the groundbreaking technologies that now dominate platforms like Netflix and ORF ON. Timmerer also presents provocative contributions of his own that have significantly influenced the industry. He concludes by looking at future challenges and invites the audience to join in a discussion.
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
GraphRAG for Life Science to increase LLM accuracyTomaz Bratanic
GraphRAG for life science domain, where you retriever information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers
Removing Uninteresting Bytes in Software FuzzingAftab Hussain
Imagine a world where software fuzzing, the process of mutating bytes in test seeds to uncover hidden and erroneous program behaviors, becomes faster and more effective. A lot depends on the initial seeds, which can significantly dictate the trajectory of a fuzzing campaign, particularly in terms of how long it takes to uncover interesting behaviour in your code. We introduce DIAR, a technique designed to speedup fuzzing campaigns by pinpointing and eliminating those uninteresting bytes in the seeds. Picture this: instead of wasting valuable resources on meaningless mutations in large, bloated seeds, DIAR removes the unnecessary bytes, streamlining the entire process.
In this work, we equipped AFL, a popular fuzzer, with DIAR and examined two critical Linux libraries -- Libxml's xmllint, a tool for parsing xml documents, and Binutil's readelf, an essential debugging and security analysis command-line tool used to display detailed information about ELF (Executable and Linkable Format). Our preliminary results show that AFL+DIAR does not only discover new paths more quickly but also achieves higher coverage overall. This work thus showcases how starting with lean and optimized seeds can lead to faster, more comprehensive fuzzing campaigns -- and DIAR helps you find such seeds.
- These are slides of the talk given at IEEE International Conference on Software Testing Verification and Validation Workshop, ICSTW 2022.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
International Journal of Engineering Research and Development
1. International Journal of Engineering Research and Development
e-ISSN: 2278-067X, p-ISSN: 2278-800X, www.ijerd.com
Volume 9, Issue 10 (January 2014), PP. 01-07
1
Determining the Promoter Region in the DNA using ke-Rem
(ke-Rule Extraction Method)
Günay Karlı1
1
International Burch University, Faculty of Engineering and IT, IT Department, Sarajevo,
Bosnia and Herzegovina.
Abstract:- Biologists endeavor to reveal the secrets of life by looking into gene sequences. Concordantly,
determining the promoter region in the DNA is an important step in the process of detecting genes. However the
gene sequence data grow too huge recently. Conventional methods remain incapable to predict promoter so it
becomes increasingly important to automate the identification of functional elements, such as coding region,
genomic region or promoters. Thus many computer scientists are interested in the biological technology, and
improve some data mining methods which take advantages of computer power to see into gene sequences. In this
study, we employ ke-REM (ke-Rule Extraction Method) classifier to predict promoters of DNA sequences, and
evaluate their performances. The obtained results show that the classifier competes the existing techniques for
identifying promoter regions.
Keywords:- Promoter prediction, inductive learning, data mining, bioinformatics, DNA.
I. INTRODUCTION
Genome coding segments for transfer ribonucleic acids (tRNAs), messenger ribonucleic acids (mRNAs)
and ribosomal ribonucleic acids (rRNAs) are known as genes [1]. Proteins have amino acids whose sequence is
determined by mRNAs. Prokaryotic cells have a simple mechanism since all the genes tend to be converted into the
corresponding mRNA and then finally into proteins [2]. Gene finding or genome analysis generally relates to that
part of computational biology that involves identification of stretches of sequence algorithmically, and it is basically
the genomic DNA which is functional biologically. This particularly incorporates protein-coding genes as well as
various functional elements like regulatory regions and RNA genes. To clearly understand a species‟ genome, one of
the most crucial and significant step is to understand gene finding.
For prokaryotes, it is relatively simple to predict the Computational Gene since all the genes are basically
converted into the corresponding mRNA and finally into proteins. However, for eukaryotic cells, the process
becomes more complicated since there is interruption of coding of the DNA sequence by random sequences
commonly known as introns. There are a number of questions which biologists have currently attempted to answer
and they include [3]:
What section of a DNA sequence codes for a protein and what section is considered as junk DNA?
How can a junk DNA be classified as intron, transposes, untranslated region, regulatory elements, dead
genes, etc.?
Dividing a genome that has been newly sequenced into the coding genes and non-coding regions.
In a DNA, determination of the promoter region is a crucial step in the process of detecting genes and this
implies that the problem of determining a promoter is of major significance in biology [4] [5]. Biologists have tried
to analyze the secrets of life by investigating the gene sequence. Nevertheless, the growth of data on the gene
sequence has been growing rapidly. Therefore, most of computer scientists use biological technology to come up
with methods generated by the power of a computer to see gene sequences [6].
Despite the fact that approaches for determining regions of coding in genome DNA sequences
have been in existence since the nineteenth century, programs designed for combining coding sequences into mRNA
sequences that could be translated were invented in early 20th century [7]. Biologists have since then had various
programs at their disposal including GenViewer [8], GeneID [7], GenLang [9], GeneParser [10], FGENEH [11],
2. Determining the Promoter Region in the DNAusing ke-Rem (ke-Rule Extraction Method)
2
SORFIND [12], Xpound [13], GRAIL [14], VEIL [15], GenScan [16], etc. Two tools, which include GRAIL and
GenScan, are the most widely used tools both in the industry and in academics [17].
Most of the above approaches are founded on motifs searching in regard to DNA sequence to establish
whether it forms a promoter or not [18]. In this context, the search is done through the help of Markov models and
position weight matrices [19][20]. Artificial intelligence has also been used to supplement statistical intelligence.
More specifically artificial neural networks have produced acceptable values with the only disadvantage being high
positive rates that are false [21][22]. However, this has not curtailed their application to solve other bioinformatics
problems.
Computational method for predicting accurate promoter is still to be wholly developed. Currently,
biologists use ke-REM. The purpose of this paper is to explicate ke-REM can be successful be used for promoter
sequences.
A. What is promoter and importance of promoter prediction?
A promoter by definition is a DNA non-coding region responsible for initiating transcriptions a specific
gene. They are usually located on the upstream and on the same strand of the DNA-towards the anti-sense strand‟s
3‟ region also referred to as the template strand. A promoter may be characterized by a nucleotide sequence of
between 100 to 1000 base pair long [23]. A special enzyme, RNA polymerase, is needed for mRNA transcription.
This enzyme needs to attach itself to the DNA near a gene for it to qualify to be called a promoter sequence.
Sequences comprises of specific response and DNA sequences responsible for providing a fully secure primary
binding site for the enzyme as well as for proteins referred to as transcription factors.
Eukaryotic and Prokaryotic promoters vary from each other. In prokaryotic organism ς70 sigma factor is
able to identify specific promoter sequences, which in this case are 5‟TATAAT3‟ and 5‟TTGACA3‟ (-10 and -35
respectively) through the help of ς70 subunit of the polymerase enzyme[24]. Eukaryotic organism on the other hand
is more complex requiring at least 7 different factors for the polymerase II enzyme to bind to the promoter.
The promoter intensity correlates with identity degree to the sequence but separated by the spacer length.
Dense promoters are however founded closer to gene [25][26]. It has for long been thought that for transcription
activity to be optimal, various promoter elements‟ combinations including -35 and -10 hexamers, must be in
existence coupled with downstream and upstream regions [25]. According to this school of thought, RNAP works
both regions of the two hexamers in sequence and promoters of A+ T-rich sequences upstream of the −35 hexamer
[26][27] in several E. coli or Bacillus subtilis were identified as facilitating increased transcription in vitro when
accessory proteins were absent [28]. Different upstream sequences show different effects on transcription increasing
it from a mere 1.5 to 90 fold [29]. Those promoter sequences that are characterized by powerful binding affinity
have a direct effect on mRNA transcription.
Regardless of whether a transcribed DNA sequence can be identified through biological testing or not,
experiments are known to be time consuming and costly. The promoter prediction approach can however narrow
promoter regions amongst huge DNA sequences. A subsequent experiment can be established and tested thus saving
time and money [6].
II. METARIAL AND METHODS
There are two core classes of the promoter prediction, namely „+‟ and „-‟. These classes will denote the
existence of promoter prediction in the DNA sequence, having the „+‟ denoting for a positive indication of promoter
location in the DNA sequence and the „-‟ denoting the absence of promoter locations in the DNA sequence. This
research paper proposes to deal with a supervised learning technique in the prediction of promoter regions in the
DNA sequence.
A. Data set
The research sought to incorporate the E. Cole promoter gene arrays of DNA in the testing the proficiency
of ANN. Such data were collected from the UCI Repository [30]. This contains a set of 106 promoter and non-
promoter instances. The research paper notes that such data is viable in the comparisons of ANN with the models
existing in the literature; additionally such information involving the use of the data set is publicly available [5].
3. Determining the Promoter Region in the DNAusing ke-Rem (ke-Rule Extraction Method)
3
The 106 DNA arrays are composed of 57 nucleotides each. 53 of the DNA sequences in the data set had a
„+‟ denoting, indicating the presence of promoter location in the DNA array. The research then sought to align the
(+) parameter instances separately allowing for transcription. The following data characterize the (+) instances as
observed from the experiment. One is that for every occurrence the (+) represents for the promoter positive
presence, a name was also given in each instance and a classification of the DNA array was made composing of A,
T, G and C stand for Adenine, Thymine, Guanine, Cytosine [30].
B. ke-REM (ke-Rule Extraction Method)
This section introduces the novel development referred to as ke-REM (ke-Rule Extraction Method) and
addresses its ability in utilizing DNA promoter region predictions. As provided for above, an e.coli dataset consists
of a total of 106 DNA sequences, each containing a length of 57 nucleotides. The computer science perspective
expresses the dataset for e-coli as consisting of 106 instances containing 57 attributes bearing four values. The
attributes for these instances can be expressed as nucleotides locations for the 57-element sequence. Each attribute
accommodates 4 values, namely T-Thymine, A-Adenine, C-Cytosine and G-Guanine.
ke-REM constructs a rule-base by applying the data set attribute-value pairs. In an effort to generate a
robust rule-base, attribute-value pairs with significant importance are used. The significant question at this point
queries, “How are pairs with significant informational value determined?” the new ke-REM upgrade uses a “gain
function” in computing the informational value for the set‟s pair. ke-REM considers the higher gain value as a
higher informational value indicator. Therefore, the attribute-value with a higher value has a greater priority in the
processing of rule-base for the prediction system. keREM (ke-Rule Extraction Method) was upgraded to have the
ability to obtain IF-THEN rules from a given set of examples. It proactively discards encountered pitfalls commonly
present in inductive learning algorithms. keREM applies the gain function value, to determine which attributes are
of significant importance and are thus given a higher priority and as such, serve to further provide rules that are
more commonly acceptable.
The following is a summary of the algorithm:
Step 1: In a particular training set, a person computes class distribution and probability distribution rate of
every attribute-value.
Step 2: For every attribute in the data set, you compute the power of classification.
Step 3: for every attribute-value pairs, its Class-based Gain is calculated with the use of computed
probability distributions, power of classification and class distribution rate.
Step 4: One rule of selection is that you can select any value whose probability distributions one for n=1.
The next step is to convert the attribute-values into rules and then you mark the classified examples.
Step 5: Move to step 8.
Step 6: Starting from the first example that is unclassified, you form combinations with the n values by
using the attribute-values that has a bigger gain.
Step 7: You apply each combination in all examples. Using the values that are made up of n combinations,
those that match only with on class are converted into a rule. You mark the classified examples.
Step 8: In the training set, when all examples are classified, you move to step 11.
Step 9: perform the expression n=n+1
Step 10: go to step 6 if n<N
Step 11: Select the most general rule if there is over one rule that represents the same examples.
Step 12: End.
III. RESULTS AND DISCUSSION
The aim of this section is to experimentally analyze our approach for promoter sequences recognition with
the use of ke-REM and compare it with the existing approaches. Ke-REM has an important feature which allows it
to compute class-based gain for every attribute-value in a particular training set. First, in this context, the class
distribution and probability distribution rate of every nucleotide that forms DNA sequence is computed on the basis
of promoter and non-promoter classes. For every attribute in the data set, you contribute power of classification.
However, there is no class information in the results for the attribute-value pairs. Therefore, using class distribution
4. Determining the Promoter Region in the DNAusing ke-Rem (ke-Rule Extraction Method)
4
rate, probability distribution and the power of classification, you compute class-based gain for every value in the
DNA sequence data set. Using this method, rules that the algorithm produces were formed by attribute-value which
has a maximum information value.
There are two phases which fulfill the experiments of promoter prediction and they reflect the principles
behind a supervised learning algorithm which include testing and training. Classification model is built during
training, and during testing, the model is applied for classification of unseen examples that are DNA sequence and
they consist of promoter and non-promoter region. You evaluate the performance of ke-REM with the use of a
standard 5-fold cross-validation. Dataset is partitioned randomly in the 5-fold cross-validation into five subsets.
Every subset contains an equal ratio of both the promoter and non-promoter region. For five times, ke-REM is
trained using 4 subsets for each time for training and remaining the 5thh subset for testing. 5 models are generated in
this way during cross-validation. You obtain the final prediction performance by averaging the results that are
achieved from every model.
We determined prediction performance of the algorithm by measuring the threshold-dependent parameters
sensitivity (SE), specificity (SP), accuracy (ACC) and Matthew‟s Correlation coefficient (MCC). The following
equations were used to calculate ACC, SE, SP and MCC.
SE=TP/(TP+FN) (1)
SP=TN/(TN+FP) (2)
ACC= (TP+TN) / (TP+TN+FP+FN) (3)
MCC= ((TP*TN)-(FN*FP))/SQRT((TP+FN)*(TN+FP)*(TP+FP)*(TN+FN)) (4)
TP is true positive (promoter predicted as promoter)
FN is false negative (promoter predicted as non- promoter)
TN is true negative (non- promoter predicted as non- promoter)
FP is false positive (non- promoter predicted promoter).
The detailed performance of module in term of SE, SP, ACC and MCC is shown in Table 1.
Table 1: Performance of ke-REM in term of SE, SP, ACC and MCC
Threshold-
Dependent
Parameters
1. Model 2. Model 3. Model 4. Model 5. Model Average
ACC 0.75 0.90 0.80 0.90 0.69 0.8085
SE 0.80 0.90 0.80 1.00 0.54 0.8077
SP 0.70 0.90 0.80 0.80 0.85 0.8092
MCC 0.50 0.80 0.60 0.82 0.40 0.6246
In the literature, the way learning algorithms perform can be evaluated by applying cross-validation with
the use of a “leave-one-out” methodology. Leave-one-out cross validation (LOOCV) is considered as a special type
of k-fold cross-validation and k represents the number of instance in the data. This implies that in all iteration, close
to every data apart from the single observation are utilized for training and then you test the model on the single
observation. An estimate that is accurate that is obtained with the use LOOCV is considered as almost unbiased
[31]. This is commonly used when the data that is available is rare, particularly in bioinformatics when there is only
dozens of data sample available.
5. Determining the Promoter Region in the DNAusing ke-Rem (ke-Rule Extraction Method)
5
Table 2: The errors of some machine learning algorithms on promoter data
set.
System Errors Classifier
REX-1 0/106 Inductive L.A
ke-REM 3/106 Class-based gain
KBANN 4/106 A hybrid ML system
BP 8/106 Standard backpropagation with one layer
O'Neill 12/106 Ad hoc tech. from the bio. lit.
Near-
Neigh
13/106 A nearest neighbours algorithm
ID3 19/106 Quinlan's decision builder
Unlike the classifiers that have been applied here for promoter prediction (Table 2), ke-REM that has been
introduced in this document tends to outperform the present classifier for promoter prediction. When the
classification error is considered, it becomes better compared to ID3, KB, NN, O‟Neil and BP.
IV. CONCLUSIONS
Within the bioinformatics field, the promoter prediction is considered as a crucial problem from a
computational perspective. Newly developed ke-REM (ke-Rule Extraction Method) in this document has been
proposed as effective in tackling the problem. For rule-base with efficient rule to be generated, you employ attribute-
value pairs with higher importance. To calculate information value of the pair in the set, ke-REM uses its own “gain
function” to calculate information value. Because value of the higher gain tends to indicate higher information
value, a greater priority is given to attribute-value with higher value when it comes to production of rule-base of the
predicting system. According the results given above, it is possible to conclude that when ke-REM for promoter
prediction is employed it leads to results that are promising. Moreover, additional improvement increases the
accuracy of the results that have been obtained.
REFERENCES
[1]. D. Mount, Bioinformatics: Sequence and Genome Analysis, New York: Cold Spring Lab Press, 2013.
[2]. K. Davies, The Sequence, US: Weidenfeld & Nicholson, 2001.
[3]. P. Jayaram and K. Bhushan, Bioinformatics For Better Tomorrow, New Delhi: Indian Institute of
Technology, 2000.
[4]. B. Óscar and B. Santiago, “Cnn-promoter, new consensus promoter prediction program,” Revista EIA, no.
15, pp. 153-164, 2011.
[5]. C. Gabriela and M.-I. Bocicor, “Promoter Sequences Prediction Using Relational Association Rule
Mining,” Evolutionary Bioinformatics, vol. 8, pp. 181-196, 2012.
[6]. J.-W. Huang, Promoter Prediction in DNA Sequences, Kaohsiung,: National Sun Yat-sen University, 2003.
[7]. R. Guigo, S. Knudsen, N. Drake and T. Smith, “Prediction of gene structure,” Journal of Molecular
Biology, no. 226, pp. 141-157, 1995.
[8]. L. Milanesi, N. Kolchanov and I. Rogozin, “GenViewer: A computing tool for protein coding regions
prediction in nucleotide sequences,” in the 2nd International Congress on Bioinformatics, Supercomputing
and Complex Genome Analysis,, 573-587, 1993.
[9]. S. Dong and G. Stormo, “Gene structure prediction by linguistic methods,” Genomics, no. 23, pp. 540-551,
1994.
[10]. E. Stormo and E. Snyder, “Identification of coding regions in genomic DNA sequences: An application of
dynamic programming and neural networks,” Nucleic Acids Research, no. 21, pp. 607-613, 1993.
[11]. V. Solovyev, A. Salamov and C. Lawrence, “Prediction of internal exons by oligonucleotide composition
and discriminant analysis of spliceable open reading frames,” Nucleic Acids Research, no. 22, pp. 5156-
5163, 1994.
6. Determining the Promoter Region in the DNAusing ke-Rem (ke-Rule Extraction Method)
6
[12]. G. Hayden and M. Hutchinson, “The prediction of exons through an analysis of spliceable open reading
frames,” Nucleic Acids Research, no. 20, pp. 3453-3462, 1992.
[13]. A. Skolnick and M. Thomas, “A probabilistic model for detecting coding regions in DNA sequences,” IMA
J. Math. Appl. Med. Biol., no. 11, pp. 149-160, 1992.
[14]. Y. Xu, R. Mural and E. Uberbacher, “Constructing gene models from accurately predicted exons: An
application of dynamic programming,” Comput. Appl. Biosci, no. 10, pp. 613-623, 1994.
[15]. J. Henderson, S. Salzberg and K. Fasman, “Finding genes in DNA with a hidden Markov model,” Journal
of Computational Biology, vol. 2, no. 4, pp. 127-141, 1997.
[16]. C. Karlin and C. Burge, “Prediction of complete gene structures in human genomic DNA,” J. Mol. Biol, no.
268:, pp. 78-94, 1997.
[17]. M. Hrishikesh, S. Nitya and M. Krishna, “An ANN-GA model based promoter prediction in Arabidopsis
thaliana using tilling microarray data,” Bioinformation, vol. 6, no. 6, p. 240–243, 2011.
[18]. J. Gordon, M. Towsey and J. Hogan, “Improved prediction of bacterial transcription start sites,”
Bioinformatics, vol. 22, no. 2, pp. 142-148, 2006.
[19]. R. Liu and D. States, “Consensus promoter identification in the human genome utilizing expressed gene
markers and gene modelling,” Genome Research, no. 12, pp. 462-469, 2002.
[20]. C. Premalatha and C. a. K. K. Aravindan, “On improving the performance of promoter prediction classifier
for eukaryotes using fuzzy based distribution balanced stratified method.,” in Proceedings of the
International Conference on Advance in Computing, Control, and Telecommunication Technologies IEEE,,
ACT, 2009.
[21]. T. Abeel, Y. Saeys, E. Bonnet and P. Rouzé, “Generic eukaryotic core promoter prediction using structural
features of DNA,” Genome Research, vol. 18, no. 2, pp. 310-323, 2008.
[22]. Y.-J. Zhang, “A novel promoter prediction method inspiring by biological immune principles.,” Global
Congress on Intelligent Systems, no. 569-573, pp. 569-573, 2009.
[23]. S. S. Roded, S. Karni and Y. Felder, Promoter Sequence Analysis, Lecture 11,, 2007.
[24]. A. Dombroski, W. Walter and M. Record, “Polypeptides containing highly conserved regions of
transcription initiation factors 70 exhibit specificity of binding to promoter DNA,” Cell, p. 501–512, 1992 .
[25]. H. Bujard, M. Brenner and W. Kammerer, Structure-function relationship of Escherichia coli promoters,
New York: Elsevier, 1997.
[26]. D. Grana, T. Gardella and M. M. Susskind, “The effects of mutations in the ant promoter of phage P22,”
Genetics, p. 319–327, 1998.
[27]. M. Craig, M. L. Suh and T. Record, “HOz and DNase I probing of Es70RNA polymerase-l PR promoter
open complexes: Mg21binding and its structural consequences at the transcription start site,” Bio-
chemistry, p. 15624–15632, 1995.
[28]. D. Frisby and P. Zuber, “Analysis of the upstream activating sequence and site of carbon and nitrogen
source repression in the promoter of anearly-induced sporulation gene of Bacillus subtilis.,” Bacteriol, p.
7557–7564, 1991.
[29]. R. Wilma, A. Sarah and J. Salomon, “Escherichia Colipromoters With UP Elements Of Different Strengths:
Modular Structure Of Bacterial Promoters,” Journal Of Bacteriology, p. 5375–5383, 1998.
[30]. A. Frank and A. Asuncion, “UCI machine learning repository,” 2010. [Online]. Available:
http://archive.ics.uci.edu/ml/.
[31]. B. Efron, “Estimating the error rate of a prediction rule: improvement on cross-validation,” J. Am. Stat.
Assoc., no. 78, p. 316–331, 1983.
[32]. Essentials of Cell Biology, Nature Education, 2010.
[33]. Q. Luo and W. a. L. P. Yang, “Promoter recognition based on the interpolated Markov chains optimized via
simulated annealing and genetic algorithm,” Recognition Letters Pattern, vol. 9, no. 27, pp. 1031-1036,
2006.
[34]. S. Clancy, “Nature Education,” DNA transcription, vol. 1, no. 1, p. 41, 2008.
7. Determining the Promoter Region in the DNAusing ke-Rem (ke-Rule Extraction Method)
7
[35]. M. Guigo and R. Burset, “Evaluation of gene structure prediction programs,” Genomics, vol. 3, no. 34, pp.
353-367, 1996.
[36]. M. Wang, M. Yin and T. Jason, “GeneScout: a data mining system for predicting vertebrate genes in
genomic DNA sequences,” Information Sciences, vol. 163, no. Special issue, pp. 201-218, 2013.
[37]. T. S. Y. B. E. R. P. a. V. d. P. Y. Abeel, “Generic eukaryotic core promoter prediction using structural
features of DNA,” Genome Research, vol. 18, no. 2, pp. 310-323, 2008.