- The document discusses the serine integrases, which are phage recombinases capable of efficiently integrating, inverting, or excising DNA segments between their recognition sites. The φC31 integrase from bacteriophage φC31 is a widely used serine integrase.
- Serine integrases have potential for implementing logic and memory in genetic circuit design due to their ability to controllably invert DNA segments between two states using attachment sites. This binary behavior allows control of gene expression.
- The φC31 integrase mediates unidirectional recombination between attachment sites with high specificity. It was recently shown to be reversible with the discovery of its recombination directionality factor gp
This proposed method addresses these issues by developing a novel classification algorithm that combines the Gene Expression Graph (GEG) with Manhattan distance. The method is used to represent gene expression data. The Gene Expression Graph provides an optimal view of the relationship between normal and unhealthy genes. Representing gene information with a graph-based gene expression structure was first proposed by the authors in [1] and [2]. It permits the construction of a classifier based on the association between graphs representing well-known classes and graphs representing the samples to evaluate. Additionally, Euclidean distance is used to measure the strength of the relationship between the genes.
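As a rough illustration of distance-based graph classification, the sketch below represents each graph as a dictionary of edge weights and assigns a sample to the class graph with the smallest Manhattan distance. The edge-weight representation, the function names, and the toy data are assumptions made for illustration; the abstract does not specify how the GEG is encoded.

```python
import math

def manhattan(a, b):
    """Sum of absolute edge-weight differences; missing edges count as 0."""
    keys = set(a) | set(b)
    return sum(abs(a.get(k, 0.0) - b.get(k, 0.0)) for k in keys)

def euclidean(a, b):
    """Square root of summed squared edge-weight differences."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))

def classify(sample, class_graphs):
    """Return the label of the class graph closest to the sample (Manhattan)."""
    return min(class_graphs, key=lambda label: manhattan(sample, class_graphs[label]))

# Hypothetical toy graphs: keys are gene pairs, values are edge weights.
class_graphs = {
    "normal": {("g1", "g2"): 3, ("g2", "g3"): 1},
    "tumor": {("g1", "g2"): 0, ("g2", "g3"): 4},
}
sample = {("g1", "g2"): 2, ("g2", "g3"): 1}
label = classify(sample, class_graphs)
```

With these toy values the sample lies closer to the "normal" graph under both distances.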
Analysis of Genomic and Proteomic Sequence Using Fir Filter - IJMER
Bioinformatics is a field of science that uses techniques from mathematics, informatics, statistics, computer science, artificial intelligence, chemistry, and biochemistry to solve biological problems, usually at the molecular level. Digital Signal Processing (DSP) applications in genomic sequence analysis have received great attention in recent years. DSP principles are used to analyse genomic and proteomic sequences. The DNA sequence is mapped into digital signals in the form of binary indicator sequences. Signal processing techniques such as digital filtering are applied to genomic sequences to identify protein-coding regions. The frequency response of genomic sequences is used to solve many optimization problems in science, medicine, and many other applications. The aim of this paper is to describe a method of generating the Finite Impulse Response (FIR) of the genomic sequence. The same DNA sequence is converted into a proteomic sequence using transcription and translation, and a digital filtering technique such as an FIR filter is applied to obtain the frequency response. The frequency response is the same for both the gene and proteomic sequences.
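The mapping from DNA to binary indicator sequences, followed by FIR filtering, can be sketched as follows. This is a minimal illustration only: the two-tap averaging filter and the example sequence are assumptions, not the filter design used in the paper.

```python
def indicator_sequences(dna):
    """Map a DNA string to four binary sequences, one per base."""
    dna = dna.upper()
    return {base: [1 if ch == base else 0 for ch in dna] for base in "ACGT"}

def fir_filter(signal, taps):
    """Direct-form FIR filter: y[n] = sum_k taps[k] * x[n - k]."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:  # samples before the start are treated as zero
                acc += h * signal[n - k]
        out.append(acc)
    return out

# Hypothetical input and filter coefficients.
seqs = indicator_sequences("ATGCGA")
smoothed_a = fir_filter(seqs["A"], [0.5, 0.5])
```

Each indicator sequence marks the positions of one base, so the four sequences together carry the same information as the original string.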
SIMILARITY ANALYSIS OF DNA SEQUENCES BASED ON THE CHEMICAL PROPERTIES OF NUCL... - csandit
DNA sequence similarity analysis approaches have been based on the representation and frequency of sequence components; however, the position within the sequence is also important information. Insufficient information in sequence representations is an important cause of poor similarity results. Based on three classifications of the DNA bases according to their chemical properties, the frequencies and average positions of group mutations have been grouped into two twelve-component vectors, and the Euclidean distances among the introduced vectors are applied to compare the coding sequences of the first exon of the beta-globin gene of 11 species.
International Journal of Engineering Research and Development - IJERD Editor
This document discusses using artificial neural networks (ANN) and adaptive neuro-fuzzy inference systems (ANFIS) to predict promoter regions in genomic DNA sequences. It analyzes 106 DNA sequences from E. coli, each 57 nucleotides long, labeled as having a promoter region (+ label) or not (- label). ANN and ANFIS classifiers are trained on most of the data and tested on the remaining data using 5-fold cross-validation. The classifiers are evaluated based on accuracy, Matthews correlation coefficient, sensitivity, and specificity metrics. The results show that ANN and ANFIS are promising approaches for identifying promoter regions that compete with existing techniques.
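The four evaluation metrics named above can all be computed from the counts in a binary confusion matrix. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not results from the study.

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity and MCC from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # true negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }

# Hypothetical counts for one test fold.
metrics = classification_metrics(tp=4, tn=4, fp=1, fn=1)
```

Unlike accuracy, the Matthews correlation coefficient stays informative when the two classes are imbalanced, which is one reason it is often reported alongside sensitivity and specificity.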
Segmenting Epithelial Cells in High-Throughput RNAi Screens (Miaab 2011) - Kevin Keraudren
This document summarizes a proposed method for segmenting epithelial cells in high-throughput RNAi screens using image analysis. The method uses a pipeline that includes pre-processing images using filters to reduce noise and enhance cell structures, segmenting nuclei, generating an edge map of cell-cell contacts, and performing an adaptive watershed segmentation to extract three structures: cell-cell contacts, nuclei, and cell walls. The method is shown to accurately segment these structures and provide reliable quantification of markers in different experimental conditions, distinguishing effects of depleting different actin-binding proteins on cell-cell adhesion receptors and the cytoskeleton.
Construction of phylogenetic tree from multiple gene trees using principal co... - IAEME Publication
This document describes a method for constructing a phylogenetic tree from multiple gene trees using principal component analysis. Multiple gene trees are generated from different protein sequences from various organisms. Distance matrices are calculated for each gene tree and combined into a single data matrix. Principal component analysis is performed on the data matrix to extract the first principal component, which represents the consensus distance vector combining information from all gene trees. A phylogenetic tree is then generated from the consensus distance vector using UPGMA, providing a species tree that integrates information from multiple genes. The method is demonstrated on protein sequence data from primates and placental mammals.
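The final UPGMA step can be sketched in a few lines. This illustration covers only the clustering of a given pairwise distance table; it omits the principal component analysis step and branch lengths, and the function name, input format, and toy distances are assumptions.

```python
def upgma(names, dist):
    """Build a Newick-style topology by repeatedly merging the closest clusters.

    names: list of distinct taxon labels.
    dist:  dict mapping (label, label) pairs to distances, one entry per pair.
    """
    sizes = {name: 1 for name in names}  # cluster label -> number of taxa
    d = {frozenset(pair): value for pair, value in dist.items()}
    while len(sizes) > 1:
        closest = min(d, key=d.get)
        a, b = sorted(closest)
        merged = f"({a},{b})"
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        del d[closest]
        for c in sizes:
            # Size-weighted average distance from the new cluster to c.
            dac = d.pop(frozenset((a, c)))
            dbc = d.pop(frozenset((b, c)))
            d[frozenset((merged, c))] = (dac * size_a + dbc * size_b) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes))

# Hypothetical distances between four taxa.
toy = {
    ("A", "B"): 2, ("C", "D"): 4,
    ("A", "C"): 10, ("A", "D"): 10, ("B", "C"): 10, ("B", "D"): 10,
}
tree = upgma(["A", "B", "C", "D"], toy)
```

Weighting by cluster size is what distinguishes UPGMA from the simpler WPGMA variant, which averages the two distances equally.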
On the All or Half Law of Recombinant DNA, Lentivus Transduction and some oth... - Gang Zhang
Gang Zhang has extensive experience in research related to recombinant DNA, lentiviral transduction, CRISPR/Cas9 genome editing, and animal cloning. He received his PhD from Shandong Normal University and the Institute of Zoology, Chinese Academy of Sciences in 2005. Since then, he has held several postdoctoral and research positions focusing on topics such as neural stem cell differentiation, brain tumor cells, Parkinson's disease genes, and human integrin overexpression. Zhang has authored or co-authored several publications related to gene cloning techniques, lentiviral transduction, induced pluripotent stem cells, and animal cloning in mice. In this talk, he will discuss the "All or Half" law of
This document describes a study that uses machine learning algorithms to efficiently predict DNA-binding proteins. Support vector machines and cascade correlation neural networks are optimized and compared to determine the best performing model. The SVM model achieves 86.7% accuracy at predicting DNA-binding proteins using features like overall charge, patch size, and amino acid composition of proteins. The CCNN model achieves lower accuracy of 75.4%. The study aims to improve on previous work by using the standard jack-knife validation technique to evaluate model performance on unseen data.
This document discusses various methods for annotating genomes after sequencing and assembly. Sequence analysis approaches like identifying open reading frames can rapidly and inexpensively find some genes, but have weaknesses like false positives and missing short genes. More accurate methods are needed to find non-coding RNAs, pseudogenes, and other elements. As sequencing technologies generate more data, the bottleneck has shifted to analysis, requiring skills in both biology and mathematics. The document provides an example sequence to annotate and poses questions about fast, cheap and accurate annotation methods.
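Open reading frame identification, the fast and inexpensive approach mentioned above, can be sketched as a scan for ATG followed by an in-frame stop codon. This is a simplified illustration: it scans only the forward strand, and the function name, the `min_codons` threshold, and the example sequence are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=2):
    """Return (start, end) index pairs of forward-strand ORFs, stop codon included."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:  # codons before the stop
                    orfs.append((start, i + 3))
                start = None
    return orfs

# Hypothetical sequence with one short ORF in the third reading frame.
seq = "CCATGAAATTTTAGGG"
orfs = find_orfs(seq)
```

The length threshold shows the trade-off named in the text: raising `min_codons` suppresses false positives but also discards genuinely short genes.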
A Study on DNA based Computation and Memory Devices - Editor IJCATR
The present study describes Deoxyribonucleic Acid (DNA) based computing and storage devices, which have a promising future in information technology. Traditional devices are mostly made of silicon. These devices are costly and have physical limitations that cause electron leakage and short circuits. There is therefore a need for materials capable of fast processing and vast memory storage. DNA, a biomolecule, has these characteristics and can provide ample storage. In classical computing devices, electronic logic gates are the elements that allow information to be stored and transformed. Designing an appropriate sequence or net of “store” and “transform” operations (in the sense of building a device or writing a program) is equivalent to preparing a computation. In DNA-based computation, the situation is analogous. The main difference is the type of computing device: instead of electronic gates, DNA molecules are deployed to process the data. Moreover, the inherent massive parallelism of DNA computing may lead to methods for solving some intractable computational problems. The aim of this research study is to analyze the logical features and memory formation of DNA biomolecules in order to achieve increased speed, accuracy, and vast storage.
This document discusses intrinsically disordered proteins (IDPs), which lack a fixed three-dimensional structure under physiological conditions and instead exist as dynamic ensembles. It notes that IDPs challenge the traditional view that proteins require a well-defined structure to function. The document also mentions that IDPs often gain structure upon binding to their protein partners, and that their flexible, disordered state allows for low affinity but high specificity interactions optimal for regulation. Finally, it suggests intrinsic disorder may have evolved to allow for extended interaction surfaces and efficient signal processing.
Perturbing The Interactome: Multi-Omics And Personalized Methods For Network ... - Marc Santolini
In this talk, I will describe several recently developed methods to study disease perturbations through the lens of network science. First I will present evidence that one can accurately predict perturbation patterns from the topology of biological networks, even when lacking measurements on the kinetic parameters governing the dynamics of these interactions. Using 87 biochemical networks with experimentally measured kinetic parameters, we show that a knowledge of the network topology offers 65% to 80% accuracy in predicting the impact of perturbations. In other words, we can use the increasingly accurate topological models to approximate perturbation patterns, bypassing expensive kinetic constant measurement. These results open new avenues in modeling drug action, and in identifying drug targets relying on the human interactome only.
Then, I will present a novel approach to identify the collective impact of miRNAs in disease. Instead of focusing on the magnitude of miRNA differential expression, here we address the secondary consequences for the interactome. We developed the Impact of Differential Expression Across Layers (IDEAL), a network-based algorithm to prioritize disease-relevant miRNAs based on the central role of their targets in the molecular interactome. This method was used in the context of asthmatic Th2 inflammation and identified five Th2-related miRNAs (mir27b, mir206, mir106b, mir203, and mir23b) whose antagonization led to a sharp reduction of the Th2 phenotype. This result offers novel approaches for therapeutic interventions.
Finally, I will present an investigation of the personalized gene expression responses when inducing hypertrophy and heart failure in 100+ strains of genetically distinct mice from the Hybrid Mouse Diversity Panel (HMDP). I will show that genes whose expression change significantly correlates with the severity of the disease are either up- or down-regulated across strains, and therefore missed by traditional population-wide analyses of differential gene expression. These uncovered personalised genes are enriched in human cardiac disease genes and form a dense co-regulated module strongly interacting with the cardiac hypertrophic signaling network in the human interactome, the set of molecular interactions in the cell. We validate our approach by showing that the knockdown of Hes1, predicted as a strong candidate, induces a dramatic reduction of hypertrophy by 80-90% in neonatal rat ventricular myocytes, demonstrating that individualized approaches are crucial to identify genes underlying complex diseases as well as to develop personalized therapies.
This document summarizes research on constructing a phylogenetic tree for COX genes using multiple sequence alignments with ClustalW. It begins by introducing phylogenetic analysis and the COX gene. It then describes the methodology used, which involved obtaining nucleotide sequences from a COX protein sequence in mice, performing a tBLASTn search to find related genes, aligning the sequences with ClustalW, and constructing rooted and unrooted phylogenetic trees. The results include the input protein sequence, tBLASTn output, ClustalW alignment, and the rooted and unrooted phylogenetic trees produced. It concludes that phylogenetic analysis is important for understanding gene and protein evolution.
Homologous recombination (HR) is the exchange of genetic material between two similar or identical molecules of DNA. The document outlines the mechanism and molecular basis of HR, including key steps like double-strand break formation, strand invasion, and Holliday junction resolution. HR serves important biological roles like DNA repair and genetic diversity. It has practical applications in gene mapping, transgenics, and gene editing technologies. Precise genome editing using HR is becoming an alternative to traditional plant breeding for crop improvement.
I prepared these slides for an introductory class on Network Medicine given at UPV (Valencia) in October 2017. The fundamental principle behind Network Medicine is that disease phenotypes emerge from genotypes via the network properties of interactions between the underlying biological components. These phenotypes are best conceptualized as consequences of perturbations to disease modules of the biological networks in the cell, whether at the node level (disease genes) or the link level (disease edgotypes). With further analysis of drug-disease association and drug-target association data, one can investigate the effects, both therapeutic and undesired, of the associated medication. Understanding the molecular-level networks makes it possible to understand the connections between different diseases and the effects of drugs designed to target them, paving the way for personalized treatments based on one's own interactome.
Reading circle of Epigenome Roadmap: Roadmap Epigenomics Consortium et. al. I... - Itoshi Nikaido
This document summarizes key findings from an analysis of 111 reference human epigenomes. It finds that 1) histone mark combinations predict gene expression and have distinct methylation and accessibility profiles, 2) megabase domains show differences in activity and structure, and 3) enhancers are enriched for conserved elements and coordinated modules associated with phenotypes.
SBVRLDNACOMP: AN EFFECTIVE DNA SEQUENCE COMPRESSION ALGORITHM - ijcsa
Many specific types of data need to be compressed for easy storage and to reduce overall retrieval times. Moreover, compressed sequences can be used to understand similarities between biological sequences. DNA data compression has become a major task for many researchers over the last few years as a result of the exponential increase of sequences produced in gene databases. In this research paper we have attempted to develop an algorithm based on self-referenced bases, namely Single Base Variable Repeat Length DNA Compression (SBVRLDNAComp). There are a number of reference-based compression methods, but they are not satisfactory for newly sequenced species. SBVRLDNAComp is an optimal solution derived from results obtained on short to long, uniform, identical and non-identical strings of nucleotides checked in four different ways. Both exact repetitive and non-repetitive bases are compressed by SBVRLDNAComp. Its strength is that, without any reference database, SBVRLDNAComp achieves a compression ratio α of 1.70 to 1.73 after testing on ten benchmark DNA sequences. The compressed file can be further compressed with standard tools (such as WinZip or WinRar), but even without this SBVRLDNAComp outperforms many standard DNA compression algorithms.
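The general idea of encoding runs of a single repeated base with a variable repeat length can be illustrated with plain run-length encoding. This is not the SBVRLDNAComp algorithm itself, whose details the abstract does not give; the function names and the example sequence are assumptions.

```python
from itertools import groupby

def rle_encode(dna):
    """Collapse each run of identical bases into a (base, run_length) pair."""
    return [(base, len(list(run))) for base, run in groupby(dna)]

def rle_decode(pairs):
    """Rebuild the original sequence from (base, run_length) pairs."""
    return "".join(base * count for base, count in pairs)

# Hypothetical input sequence.
encoded = rle_encode("AAAACCGTTT")
decoded = rle_decode(encoded)
```

Plain run-length encoding only pays off when long single-base runs are common, which is why practical DNA compressors combine it with other handling for non-repetitive stretches.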
This document discusses various techniques for analyzing nucleotide and peptide sequences, including sequence alignment, searches of biological databases, and identifying intrinsic features, point mutations, and genetic diversity. It also describes sequence assembly, which involves reconstructing DNA sequences by aligning and merging small fragments from sequencing. Additionally, it discusses gene prediction to identify genomic regions that encode genes and protein structure prediction using statistical analysis of amino acid residues and inference from related proteins with known structures.
Analytical Study of Hexapod miRNAs using Phylogenetic Methods - cscpconf
MicroRNAs (miRNAs) are a class of non-coding RNAs that regulate gene expression. Identifying the total number of miRNAs, even in completely sequenced organisms, is still an open problem. However, researchers have been using techniques that can predict a limited number of miRNAs in an organism. In this paper, we have used a homology-based approach for comparative analysis of miRNAs of the Hexapoda group. We have used Apis mellifera, Bombyx mori, Anopheles gambiae, and Drosophila melanogaster miRNA datasets from the miRBase repository. We have performed pairwise as well as multiple alignments of the miRNAs available in the repository to identify and analyse conserved regions among related species. Unfortunately, to the best of our knowledge, the miRNA-related literature does not provide an in-depth analysis of hexapods. We have made an attempt to derive the commonality among the miRNAs and to identify conserved regions that are still not available in miRNA repositories. The results are a good approximation with a small number of mismatches; they are encouraging and may facilitate miRNA biogenesis for hexapods.
This document summarizes research on homologous recombination pathways and their regulation. It describes the key stages of homologous recombination, including DNA end resection, formation of the presynaptic filament by Rad51/Dmc1, synaptic complex assembly, and D-loop formation. It also discusses several factors that modulate homologous recombination by facilitating presynaptic filament assembly, synaptic complex formation, D-loop migration, and favoring crossover or non-crossover outcomes. The Yale laboratories aim to further elucidate the mechanisms and regulation of homologous recombination and its role in genome maintenance.
Particle Swarm Optimization for Gene cluster Identification - Editor IJCATR
Understanding gene regulation is the most basic need for the classification of genes within a DNA molecule. These genes are grouped together into clusters, also known as Transcription Units. The genes are grouped into transcription units for the purpose of construction and regulation of gene expression and synthesis of proteins. This knowledge further contributes essential information for the process of drug design and for determining the protein functions of newly sequenced genomes. It is possible to use the diverse biological information across multiple genomes as an input to the classification problem. The purpose of this work is to show that Particle Swarm Optimization may provide more efficient classification than other algorithms. To validate the approach, the complete E. coli genome is taken as the benchmark genome.
1. The document discusses models of homologous recombination including the Holliday model and the double-strand break repair model. It describes the key steps and proteins involved in each model.
2. Recombination involves the breakage and rejoining of DNA. In eukaryotes, the MRN/X complex processes DNA breaks. The Rad51 and Rad54 proteins then facilitate strand invasion and D-loop formation during homologous pairing.
3. Homologous recombination proteins from bacteria and eukaryotes catalyze different steps of the process. In E. coli, RecBCD introduces breaks and generates single strands for RecA to perform strand exchange, while RuvAB and Ruv
Deep learning for extracting protein-protein interactions from biomedical lit... - Yifan Peng
The document presents a method called McDepCNN for extracting protein-protein interactions from biomedical literature using a multichannel dependency-based convolutional neural network. McDepCNN incorporates both automatically learned features from different CNN layers and manually crafted features using domain knowledge. It outperforms traditional machine learning and current deep learning models on two benchmark datasets, and generalizes better across different datasets than other methods. The model achieves its best performance using word embeddings, part-of-speech tags, named entities, dependency labels, and position features as input channels, and applying convolution with window sizes of 3, 5, and 7.
This document outlines a talk on protein function and bioinformatics. It discusses why bioinformatics is needed due to the rapid increase in genomic data. It introduces various bioinformatics tools for tasks like sequence analysis, database searches, and structure prediction. As a case study, it examines the genome of the psychrophilic archaeon Methanococcoides burtonii, identifying cold-adaptation features like CSP-like proteins and modified tRNAs. It emphasizes that bioinformatics provides useful predictions but must be integrated with experimental data.
Comparative genome analysis requires high quality annotations of all genomic elements. Today’s sequencing projects face numerous challenges including lower coverage, more frequent assembly errors, and the lack of closely related species with well-annotated genomes. Precise elucidation of the many different biological features encoded in any genome requires careful examination and review. We need genome annotation editing tools to modify and refine the location and structure of the genome elements that predictive algorithms cannot yet resolve automatically. During the manual annotation process, curators identify elements that best represent the underlying biology and eliminate elements that reflect systemic errors of automated analyses.
Apollo is a web-based application that supports and enables collaborative genome curation in real time, analogous to Google Docs, allowing teams of curators to improve on existing automated gene models through an intuitive interface. Researchers from nearly one hundred institutions worldwide are currently using Apollo for distributed curation efforts in over sixty genome projects across the tree of life: from plants to arthropods, to fungi, to species of fish and other vertebrates including human, cattle (bovine), and dog.
This document outlines the process of constructing phylogenetic trees to delineate relationships among Coronaviridae species using protein sequences. It describes:
1) Choosing nucleocapsid and membrane proteins as molecular markers and collecting sequences from NCBI.
2) Performing multiple sequence alignment on the proteins using MUSCLE in MEGA, which is more accurate than ClustalW.
3) Selecting maximum likelihood as the tree-building method because it uses all sequence information without reducing it to distances and makes fewer assumptions than other methods.
The document discusses using genetic programming to predict stock market movements. It provides an overview of genetic programming, describes how it has been applied to problems like time series prediction and stock market prediction, and discusses the author's experiments applying genetic programming to predict the S&P 500 index. The results show that in some periods, genetic programming outperformed traditional models and the buy and hold benchmark strategy in terms of investment gains, particularly when transaction costs were not considered.
Realtime, Non-Intrusive Evaluation of VoIP Using Genetic Programming - adil raja
The document discusses using genetic programming to develop a non-intrusive model for evaluating voice over IP (VoIP) quality based on transport layer network metrics. It first provides background on VoIP and challenges with speech quality assessment. It then describes a VoIP simulation environment and genetic programming approach. The goal is to evolve an estimation model for VoIP listening quality as a function of packet loss rate, jitter, bitrate and other transport metrics using genetic programming, and validate it against PESQ (Perceptual Evaluation of Speech Quality) scores.
This document discusses various methods for annotating genomes after sequencing and assembly. Sequence analysis approaches like identifying open reading frames can rapidly and inexpensively find some genes, but have weaknesses like false positives and missing short genes. More accurate methods are needed to find non-coding RNAs, pseudogenes, and other elements. As sequencing technologies generate more data, the bottleneck has shifted to analysis, requiring skills in both biology and mathematics. The document provides an example sequence to annotate and poses questions about fast, cheap and accurate annotation methods.
A Study on DNA based Computation and Memory Devices Editor IJCATR
The present study describes deoxyribonucleic acid (DNA) based computing and storage devices, which have a promising future in information technology. Traditional devices are mostly made of silicon. They are costly and have physical limitations, such as electron leakage and circuit shorting. There is therefore a need for materials capable of fast processing and large memory storage. DNA, a biomolecule, has these characteristics and can provide ample storage. In classical computing devices, electronic logic gates are the elements that store and transform information. Designing an appropriate sequence or network of “store” and “transform” operations (in the sense of building a device or writing a program) is equivalent to preparing a computation. In DNA based computation the situation is analogous. The main difference is the type of computing device: instead of electronic gates, DNA molecules are deployed to process data. Moreover, the inherent massive parallelism of DNA computing may lead to methods for solving some intractable computational problems. The aim of this study is to analyze the logical features and memory formation of DNA biomolecules in order to achieve increased speed, accuracy and vast storage.
This document discusses intrinsically disordered proteins (IDPs), which lack a fixed three-dimensional structure under physiological conditions and instead exist as dynamic ensembles. It notes that IDPs challenge the traditional view that proteins require a well-defined structure to function. The document also mentions that IDPs often gain structure upon binding to their protein partners, and that their flexible, disordered state allows for low affinity but high specificity interactions optimal for regulation. Finally, it suggests intrinsic disorder may have evolved to allow for extended interaction surfaces and efficient signal processing.
Perturbing The Interactome: Multi-Omics And Personalized Methods For Network ... Marc Santolini
In this talk, I will describe several recently developed methods to study disease perturbations through the lens of network science. First I will present evidence that one can accurately predict perturbation patterns from the topology of biological networks, even when lacking measurements on the kinetic parameters governing the dynamics of these interactions. Using 87 biochemical networks with experimentally measured kinetic parameters, we show that a knowledge of the network topology offers 65% to 80% accuracy in predicting the impact of perturbations. In other words, we can use the increasingly accurate topological models to approximate perturbation patterns, bypassing expensive kinetic constant measurement. These results open new avenues in modeling drug action, and in identifying drug targets relying on the human interactome only.
Then, I will present a novel approach to identify the collective impact of miRNAs in disease. Instead of focusing on the magnitude of miRNA differential expression, here we address the secondary consequences for the interactome. We developed the Impact of Differential Expression Across Layers (IDEAL), a network-based algorithm to prioritize disease-relevant miRNAs based on the central role of their targets in the molecular interactome. This method was used in the context of asthmatic Th2 inflammation and identified five Th2-related miRNAs (mir27b, mir206, mir106b, mir203, and mir23b) whose antagonization led to a sharp reduction of the Th2 phenotype. This result offers novel approaches for therapeutic interventions.
Finally, I will present an investigation of the personalized gene expression responses when inducing hypertrophy and heart failure in 100+ strains of genetically distinct mice from the Hybrid Mouse Diversity Panel (HMDP). I will show that genes whose expression change significantly correlates with the severity of the disease are either up- or down-regulated across strains, and therefore missed by traditional population-wide analyses of differential gene expression. These uncovered personalised genes are enriched in human cardiac disease genes and form a dense co-regulated module strongly interacting with the cardiac hypertrophic signaling network in the human interactome, the set of molecular interactions in the cell. We validate our approach by showing that the knockdown of Hes1, predicted as a strong candidate, induces a dramatic reduction of hypertrophy by 80-90% in neonatal rat ventricular myocytes, demonstrating that individualized approaches are crucial to identify genes underlying complex diseases as well as to develop personalized therapies.
This document summarizes research on constructing a phylogenetic tree for COX genes using multiple sequence alignments with ClustalW. It begins by introducing phylogenetic analysis and the COX gene. It then describes the methodology used, which involved obtaining nucleotide sequences from a COX protein sequence in mice, performing a tBLASTn search to find related genes, aligning the sequences with ClustalW, and constructing rooted and unrooted phylogenetic trees. The results include the input protein sequence, tBLASTn output, ClustalW alignment, and the rooted and unrooted phylogenetic trees produced. It concludes that phylogenetic analysis is important for understanding gene and protein evolution.
Homologous recombination (HR) is the exchange of genetic material between two similar or identical molecules of DNA. The document outlines the mechanism and molecular basis of HR, including key steps like double-strand break formation, strand invasion, and Holliday junction resolution. HR serves important biological roles like DNA repair and genetic diversity. It has practical applications in gene mapping, transgenics, and gene editing technologies. Precise genome editing using HR is becoming an alternative to traditional plant breeding for crop improvement.
I prepared these slides for an introductory class on Network Medicine given at UPV (Valencia) in October 2017. The fundamental principle behind Network Medicine is that disease phenotypes emerge from genotypes via the network properties of interactions between the underlying biological components. These phenotypes are best conceptualized as consequences of perturbations to disease modules of the biological networks in the cell, whether at the node level (disease genes) or the link level (disease edgotypes). With further analysis of drug-disease and drug-target association data, one can investigate the effects, both therapeutic and undesired, of the associated medication. Understanding the molecular-level networks makes it possible to understand the connections between different diseases and the effects of drugs designed to target them, paving the way for personalized treatments based on one's own interactome.
Reading circle of Epigenome Roadmap: Roadmap Epigenomics Consortium et al. I... Itoshi Nikaido
This document summarizes key findings from an analysis of 111 reference human epigenomes. It finds that 1) histone mark combinations predict gene expression and have distinct methylation and accessibility profiles, 2) megabase domains show differences in activity and structure, and 3) enhancers are enriched for conserved elements and coordinated modules associated with phenotypes.
SBVRLDNACOMP: AN EFFECTIVE DNA SEQUENCE COMPRESSION ALGORITHM ijcsa
Many specific types of data need to be compressed for easy storage and to reduce overall retrieval times. Moreover, compressed sequences can be used to understand similarities between biological sequences. DNA data compression has become a major task for many researchers in the last few years as a result of the exponential increase in sequences produced in gene databases. In this research paper we attempt to develop an algorithm based on self-reference bases, namely Single Base Variable Repeat Length DNA Compression (SBVRLDNAComp). There are a number of reference-based compression methods, but they are not satisfactory for forthcoming new species. SBVRLDNAComp is an optimal solution of the result obtained from small to long, uniform identical and non-identical strings of nucleotides checked in four different ways. Both exact repetitive and non-repetitive bases are compressed by SBVRLDNAComp. Its strength is that, without any reference database, SBVRLDNAComp achieves a compression ratio α of 1.70 to 1.73 on ten benchmark DNA sequences. The compressed file can be further compressed with standard tools (such as WinZip or WinRar), but even without this SBVRLDNAComp outperforms many standard DNA compression algorithms.
This document discusses various techniques for analyzing nucleotide and peptide sequences, including sequence alignment, searches of biological databases, and identifying intrinsic features, point mutations, and genetic diversity. It also describes sequence assembly, which involves reconstructing DNA sequences by aligning and merging small fragments from sequencing. Additionally, it discusses gene prediction to identify genomic regions that encode genes and protein structure prediction using statistical analysis of amino acid residues and inference from related proteins with known structures.
Analytical Study of Hexapod miRNAs using Phylogenetic Methods cscpconf
MicroRNAs (miRNAs) are a class of non-coding RNAs that regulate gene expression. Identifying the total number of miRNAs, even in completely sequenced organisms, is still an open problem. However, researchers have been using techniques that can predict a limited number of miRNAs in an organism. In this paper, we have used a homology-based approach for comparative analysis of miRNAs of the Hexapoda group. We have used Apis mellifera, Bombyx mori, Anopheles gambiae and Drosophila melanogaster miRNA datasets from the miRBase repository. We have performed pairwise as well as multiple alignments of the available miRNAs in the repository to identify and analyse conserved regions among related species. Unfortunately, to the best of our knowledge, the miRNA literature does not provide an in-depth analysis of hexapods. We have attempted to derive the commonality among the miRNAs and to identify conserved regions that are not yet available in miRNA repositories. The results are a good approximation with a small number of mismatches. They are encouraging and may facilitate miRNA biogenesis for hexapods.
This document summarizes research on homologous recombination pathways and their regulation. It describes the key stages of homologous recombination, including DNA end resection, formation of the presynaptic filament by Rad51/Dmc1, synaptic complex assembly, and D-loop formation. It also discusses several factors that modulate homologous recombination by facilitating presynaptic filament assembly, synaptic complex formation, D-loop migration, and favoring crossover or non-crossover outcomes. The Yale laboratories aim to further elucidate the mechanisms and regulation of homologous recombination and its role in genome maintenance.
Particle Swarm Optimization for Gene cluster Identification Editor IJCATR
Understanding gene regulation is the most basic need for the classification of genes within a DNA. These genes are grouped together into clusters, also known as transcription units, for the purpose of construction and regulation of gene expression and synthesis of proteins. This knowledge provides essential information for drug design and for determining the protein functions of newly sequenced genomes. It is possible to use the diverse biological information across multiple genomes as an input to the classification problem. The purpose of this work is to show that Particle Swarm Optimization may provide more efficient classification than other algorithms. To validate the approach, the complete E. coli genome is taken as the benchmark genome.
1. The document discusses models of homologous recombination including the Holliday model and the double-strand break repair model. It describes the key steps and proteins involved in each model.
2. Recombination involves the breakage and rejoining of DNA. In eukaryotes, the MRN/X complex processes DNA breaks. The Rad51 and Rad54 proteins then facilitate strand invasion and D-loop formation during homologous pairing.
3. Homologous recombination proteins from bacteria and eukaryotes catalyze different steps of the process. In E. coli, RecBCD introduces breaks and generates single strands for RecA to perform strand exchange, while RuvAB and Ruv
Deep learning for extracting protein-protein interactions from biomedical lit... Yifan Peng
The document presents a method called McDepCNN for extracting protein-protein interactions from biomedical literature using a multichannel dependency-based convolutional neural network. McDepCNN incorporates both automatically learned features from different CNN layers and manually crafted features using domain knowledge. It outperforms traditional machine learning and current deep learning models on two benchmark datasets, and generalizes better across different datasets than other methods. The model achieves its best performance using word embeddings, part-of-speech tags, named entities, dependency labels, and position features as input channels, and applying convolution with window sizes of 3, 5, and 7.
This document outlines a talk on protein function and bioinformatics. It discusses why bioinformatics is needed due to the rapid increase in genomic data. It introduces various bioinformatics tools for tasks like sequence analysis, database searches, and structure prediction. As a case study, it examines the genome of the psychrophilic archaeon Methanococcoides burtonii, identifying cold-adaptation features like CSP-like proteins and modified tRNAs. It emphasizes that bioinformatics provides useful predictions but must be integrated with experimental data.
Comparative genome analysis requires high quality annotations of all genomic elements. Today’s sequencing projects face numerous challenges including lower coverage, more frequent assembly errors, and the lack of closely related species with well-annotated genomes. Precise elucidation of the many different biological features encoded in any genome requires careful examination and review. We need genome annotation editing tools to modify and refine the location and structure of the genome elements that predictive algorithms cannot yet resolve automatically. During the manual annotation process, curators identify elements that best represent the underlying biology and eliminate elements that reflect systemic errors of automated analyses.
Apollo is a web-based application that supports and enables collaborative genome curation in real time, analogous to Google Docs, allowing teams of curators to improve on existing automated gene models through an intuitive interface. Researchers from nearly one hundred institutions worldwide are currently using Apollo for distributed curation efforts in over sixty genome projects across the tree of life: from plants to arthropods, to fungi, to species of fish and other vertebrates including human, cattle (bovine), and dog.
This document outlines the process of constructing phylogenetic trees to delineate relationships among Coronaviridae species using protein sequences. It describes:
1) Choosing nucleocapsid and membrane proteins as molecular markers and collecting sequences from NCBI.
2) Performing multiple sequence alignment on the proteins using MUSCLE in MEGA, which is more accurate than ClustalW.
3) Selecting maximum likelihood as the tree-building method because it uses all sequence information without reducing it to distances and makes fewer assumptions than other methods.
An intelligent scalable stock market prediction system Harshit Agarwal
Comparative study of stock market prediction systems using ANN and GONN. Sentiment analysis was also performed on the Yahoo news feed, and the system was deployed on a Hadoop cluster.
CGP is a form of genetic programming that represents computer programs as directed acyclic graphs encoded as strings of integers. Programs are evolved using genetic operations like mutation and crossover. Mutation changes integer values representing connections, functions, inputs, and outputs. Crossover combines parts of two genotype strings. The genotype is decoded into a phenotype program. CGP uses strategies like 1+λ evolution and can experience genetic redundancy from unused genes or sub-functions not optimized.
Genetic programming is an evolutionary algorithm that uses principles of natural selection and genetics to automatically generate computer programs to solve problems. It works by generating an initial population of random programs, evaluating their performance on the task, and breeding new programs through genetic operations like crossover and mutation. The fittest programs are selected to pass their traits to the next generation, while less fit programs are removed. This process is repeated until an optimal program is found. Genetic programming represents programs as syntax trees and evolves these trees to find solutions without requiring the programmer to specify the form or structure of the solution.
The document discusses genetic algorithms and genetic programming in Python. It describes how genetic algorithms are inspired by natural selection and genetics, using techniques like selection, crossover, and mutation to evolve solutions to problems. It provides examples of using the Python library PyEvolve to implement genetic algorithms and genetic programming to solve problems like minimizing test functions and forecasting temperatures.
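The selection, crossover, and mutation loop summarized above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not PyEvolve code and not the example from the summarized slides: the function names, the bit-string encoding, tournament selection, elitism, and all parameter values are assumptions chosen for brevity.

```python
import random


def tournament(pop, fitness, rng):
    """Pick two distinct individuals at random and return the fitter one."""
    a, b = rng.sample(pop, 2)
    return a if fitness(a) >= fitness(b) else b


def evolve(fitness, n_bits=20, pop_size=30, generations=60,
           mutation_rate=0.02, seed=0):
    """Maximize `fitness` over bit strings with a simple genetic algorithm."""
    rng = random.Random(seed)  # seeded so that runs are repeatable
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: carry the current best over unchanged
        while len(children) < pop_size:
            p1 = tournament(pop, fitness, rng)
            p2 = tournament(pop, fitness, rng)
            cut = rng.randint(1, n_bits - 1)   # one-point crossover position
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each bit independently with a small probability.
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best


# Toy "OneMax" problem: fitness is simply the number of 1 bits.
best = evolve(sum)
print(best, sum(best))
```

Because the best individual is always copied into the next generation, the best fitness can never decrease from one generation to the next. A tree-based genetic programming system replaces the bit strings with program trees but keeps the same overall loop.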
This document outlines genetic programming, including an introduction describing it as emulating Darwinian evolution within computer programs. It describes the typical life cycle of genetic programming, including initializing a population, evaluating programs, selecting programs for reproduction, applying genetic operators like crossover and mutation, and repeating the process. It also discusses experimental setup parameters, genetic operators, fitness evaluation methods, and applications of genetic programming in areas like regression, classification, telecommunications, computer networks, finance, clinical domains, and more.
This presentation gives an introduction to genetic algorithms. Using an example, it explains the different concepts used in genetic algorithms. If you are new to GA or want to refresh the concepts, it is a good resource for you.
A basic overview of the application and usage of MATLAB for engineers. It covers the essential basics that will help one get started with MATLAB programming easily.
Provided by IDEAS2IGNITE
This presentation is about genetic algorithms. It also includes an introduction to soft computing and hard computing. Hope it serves the purpose and is useful for reference.
The document discusses ad-hoc networks and their key characteristics. It describes several challenges in ad-hoc networks including limited battery power, dynamic network topology, and scalability issues. It also summarizes several ad-hoc network routing protocols (e.g. DSDV, AODV, DSR), addressing both table-driven and on-demand approaches. Additionally, it outlines some ad-hoc MAC protocols like MACA and PAMAS that aim to manage shared wireless medium access.
Mobile ad hoc networks (MANETs) are formed spontaneously by wireless devices without any preexisting infrastructure. Nodes in a MANET are free to move and dynamically change the network topology. MANETs have applications in military operations, emergency response, education, and home/office use. Key challenges include dynamic topology, limited resources, and lack of centralized management. Media access control protocols address issues like hidden and exposed terminals. Routing protocols can be proactive (table-based) or reactive (on-demand) to find routes between nodes in the changing network.
Final Draft Biology Research Skills Essay Owen Walton
This document discusses the technique of prokaryotic gene cluster refactoring. It has been used to improve the efficiency of biosynthesis of secondary metabolites from active gene clusters for industrial and pharmaceutical applications. It has also been used to activate silent gene clusters to discover novel metabolites. Key achievements include refactoring gene clusters to produce high yields of metabolites like butyric acid, and refactoring the nitrogen fixation gene cluster to make it independently regulated. Refactoring aims to simplify clusters and make them easier to manipulate in heterologous hosts.
INTEGRALL is a freely available database containing over 4,800 sequences related to integrons, integrases, and gene cassettes. It provides scientists with easy access to sequence data, molecular arrangements, and genetic contexts of integrons. The database aims to organize 20 years of integron data in one place and facilitate understanding of integrons' role in bacterial adaptation and interactions. It currently includes sequences from a diverse range of bacteria and environments. Over half of gene cassettes encode antibiotic resistance genes.
This document discusses optimal tiling algorithms for selecting genomic DNA fragments for applications such as microarray design and homology searching. It defines several tiling problems involving finding the maximum weighted set of tiles (sequence fragments) within certain size bounds from a given genomic sequence. Typical parameter values are provided for applications involving sequencing lengths up to 3.4GB, tile sizes from 200bp to 1.5kb, and allowing overlaps of up to 100bp for homology searching. Efficient algorithms are sought with linear or near-linear runtimes to solve these tiling problems.
71st ICREA Colloquium "Intrinsically disordered proteins (IDPs) the challeng... ICREA
This document discusses intrinsically disordered proteins (IDPs), which lack a fixed three-dimensional structure under physiological conditions and instead exist as dynamic ensembles. It notes that IDPs challenge the traditional view that proteins require a well-defined structure to function. The document also mentions that IDPs often gain structure upon binding to their targets, and that their interactions tend to be weak but optimal for regulation due to the entropic cost of folding. Finally, it suggests intrinsic disorder may have evolved to allow low affinity interactions while maintaining high specificity.
Protein Structure Prediction Using Support Vector Machine ijsc
Support Vector Machine (SVM) is used to predict protein structure. Bioinformatics methods for protein structure prediction mostly depend on the amino acid sequence. This paper addresses the prediction of 1-D, 2-D, and 3-D protein structure. Protein structure prediction is one of the most important problems in modern computational biology. Support Vector Machines have shown strong generalization ability in protein structure prediction. Binary classification techniques of SVM are implemented, with an RBF kernel function. The Radial Basis Function (RBF) kernel produces better accuracy in terms of classification and learning results.
The document presents a cell-cycle knowledge integration framework that aims to capture the semantics, temporal aspects, and dynamics of the cell cycle regulatory process. It proposes an enhanced Cell-Cycle Ontology (CCO) as an extension to existing ontologies like the Gene Ontology. A data integration pipeline is described that covers the development and maintenance of the CCO knowledge base by integrating data from multiple sources. The CCO knowledge base is intended to facilitate computational analysis of cell cycle studies and enable hypothesis evaluation through reasoning services.
GPCR PROTEIN FEATURE REPRESENTATION USING DISCRETE WAVELET TRANSFORM AND PART... ijma
Features play an important role in representing classes in the hierarchy structure, and using unsuitable features will affect classification performance. The discrete wavelet transform (DWT) approach provides the ability to create the appropriate features to represent data. DWT can produce global and local features using different wavelet families and decomposition levels. These two parameters are essential to obtain a suitable representation for classes in the hierarchy structure. This study proposes using a particle swarm optimisation (PSO) algorithm to select the suitable wavelet family and decomposition level for G-protein coupled receptor (GPCR) hierarchical class representation. The results indicate that the PSO algorithm mostly selects Biorthogonal wavelets and decomposition level 2 to represent GPCR protein. Concerning the performance, the proposed method achieved an accuracy of 97.9%, 85.9%, and 77.5% at the family, subfamily, and sub-subfamily levels, respectively.
It is increasingly recognized that complex systems cannot be described in a reductionist view.
Understanding the behavior of such systems starts with understanding the topology of the corresponding network.
Topological information is fundamental in constructing realistic models for the function of the network.
A consistent and efficient graphical User Interface Design and Querying Organ... CSCJournals
We propose a software layer called GUEDOS-DB on top of an Object-Relational Database Management System (ORDMS). In this work we apply it to molecular biology, more precisely to complete organelle genomes. We aim to offer biologists unified access to information spread among heterogeneous genome databanks. In this paper, the first goal is to provide a visual schema graph through a number of illustrative examples. The human-computer interaction technique adopted for visual design and querying makes it much easier for biologists to formulate database queries than a linear textual query representation.
Design and development of learning model for compression and processing of d... IJECEIAES
This document describes a deep learning model for efficient compression and processing of genomic DNA sequence data. It proposes using computational optimization techniques across three stages of genomic data compression: extraction, storage, and retrieval of data. The model uses gene network embedding to combine gene interaction and expression data to learn lower dimensional representations. It predicts gene functions, reconstructs gene ontologies, and predicts gene interactions. Evaluation shows the model achieves an average precision score of 0.820 for gene interaction prediction on test data.
Fast protein binding site comparisons using zhehuan01
This document summarizes a new method called PBSword for comparing protein binding sites. PBSword represents each binding site as a "visual words" vector that encodes the frequency of geometric features clustered into a vocabulary. It is able to quickly identify geometrically similar binding sites from large databases. Experimental results show PBSword achieves comparable classification accuracy to alignment-based methods but is much faster for large-scale comparisons.
This document summarizes recent evidence that the arrangement of chromosomes, gene loci, and nuclear bodies within the cell nucleus is not random but rather exhibits spatial organization that influences gene expression and nuclear processes. Techniques such as fluorescence in situ hybridization and chromosome conformation capture have provided insights into the positioning of chromosomes and genes relative to nuclear landmarks. Chromosomes occupy distinct territories within the nucleus and preferentially localize near the nuclear interior or periphery depending on their gene content. Association with nuclear subcompartments such as the nuclear lamina, nuclear pores, nucleoli, and polycomb bodies can impact the transcriptional state of genes and chromatin domains. Advances in genome-wide and time-lapse imaging approaches are helping to further characterize nuclear organization
- The document discusses various approaches for applying machine learning and artificial intelligence to drug discovery.
- It describes how molecules and proteins can be represented as graphs, fingerprints, or sequences to be used as input for models.
- Different tasks in drug discovery like target binding prediction, generative design of new molecules, and drug repurposing are framed as questions that AI models can aim to answer.
- Techniques discussed include graph neural networks, reinforcement learning, and conditional generation using techniques like translation models.
- Several recent works applying these approaches for tasks like predicting drug-target interactions and generating synthesizable molecules are referenced.
This document summarizes a presentation on protein-protein interactions. It discusses biological aspects of PPIs and introduces several PPI databases and tools. The presentation is divided into sections on the introduction of PPIs, databases like BIND and DIP, pathways and algorithms, and visualization tools. It provides information on the types and methods of studying PPIs experimentally.
The document discusses the Biomolecular Interaction Network Database (BIND), which stores information about molecular interactions, complexes, and pathways. BIND uses standards like ASN.1 and XML to specify interactions. It stores details about molecules, interactions, publications, and more. Tools like Pajek and MCODE are used to visualize and analyze the network. The database has expanded to include additional details like post-translational modifications, cellular localization, and links to other databases. Manual and automatic submission is supported.
1) Nanomedicine uses nanotechnology to prevent and treat disease by imparting properties like antibacterial and anti-odor functionality to skin products using nanomaterials.
2) Molecular biomimetics marries materials science and molecular biology to develop functional hybrid systems composed of inorganic and protein materials. It uses protein templates designed through genetics, surface proteins to bind synthetic entities, and the ability of proteins to self-assemble into complex structures.
3) Combinatorial biology techniques like phage display and cell-surface display are used to select protein sequences that strongly bind to inorganic surfaces from large random peptide libraries. This allows obtaining inorganic-binding proteins without an a priori knowledge of their sequence.
1) The document discusses interfaces between nanoelectronic and biological systems, including applications of nanotechnology in medicine (nanomedicine) and molecular biomimetics.
2) It describes techniques like phage display and cell surface display that use libraries of random peptides to select sequences that bind to inorganic surfaces for constructing hybrid biomaterials.
3) The applications discussed include bio-nanowire devices for ultrasensitive detection, nanowire-based sensors for cancer markers, and interfaces between nanoelectronic devices and cells.
1. Biomolecular Sciences Degree Group
Critical Essay
2015-16
CELLULAR LOGIC AND MEMORY:
THE USE OF ɸC31 INTEGRASE AND
RELATED SERINE INTEGRASES IN
GENETIC CIRCUIT DESIGN
Supervised by Dr. Sean Colloms
Dylan MacPhail
Matriculation number: 2022896
2. 2022896 5461 words
Page 1 of 27
ABSTRACT
The serine integrases are a subfamily of phage recombinases which are capable of integrating,
inverting, or excising a segment of DNA between their recognition sites with a high degree of
efficiency. Integration occurs between the phage and bacterial attachment sites (attP/B), and excision
occurs at the resultant attL/R sites to resolve the original state, requiring a recombination
directionality factor (RDF). Inversion of a segment of DNA is also possible by flanking with inverted att
sites. The directionality of inversion can be tightly controlled by expression of an RDF, and thus serine
integrases allow ‘flipping’ of a segment of DNA between two states. Due to the binary nature of
computational logic this control of directionality makes these proteins particularly attractive as a
method of implementing logic and memory into genetic circuit design. This review details aspects of
the origin, structure, and function of the widely utilised serine integrase from bacteriophage ϕC31
and discusses its application in synthetic genetic circuitry.
ABBREVIATIONS
CTD – C terminal domain
LSR – Large Serine Recombinase
NTD – N terminal domain
RAD – Recombinase addressable data
RDF – Recombination Directionality Factor
INTRODUCTION
The field of biology is vast and diverse, and billions of years of trial and error through evolution
have yielded multifaceted designs integrating a near infinite assortment of complex
functionalities. It is therefore not surprising that humans often look to biodiversity to find
inspiration when engineering new materials. One area where this has traditionally not been the
case however is circuitry and computing; an industry which has grown exponentially since its
conception. Despite the vast complexity now achievable by electronics, biology still has much
inspiration to offer to this field in terms of robustness and redundancy in complex networking
(George et al., 2003).
The fact that all biology, on the cellular level, is a product of complex interactions with
and information storage in DNA is too often overlooked. At this level all events can be viewed
as the result of the labyrinthine circuitry of promoters, repressors, activators, and genes which
operate in harmony to produce all components of an entire organism from the same basic DNA
programming. This is achieved through developmental and regulatory switches which are not
dissimilar in function to those found in modern electronics (reviewed by Bonnet and Endy,
2013).
In computing, information is stored and processed using binary algorithms in which the
basic unit is one bit. One bit of information represents a switch for which there are two possible
states: 0, or 1. Two bits of information therefore represents the number of switches needed to
record one of four possible combinations (00, 01, 10, or 11), while three bits of information can
record double this amount of patterns (000, 001, 010, 011, 100, 101, 110, or 111). The number
of possible states doubles with each bit of ‘memory’ added to the system (n bits = 2ⁿ states) such
that 8 bits represents the capacity to store one of 256 possible patterns of 0 and 1; this is known
as one byte, which can instruct the display of one of 256 characters, numbers, or colours; or
performance of one of 256 possible actions (Horowitz and Hill, 2015). DNA, however, is not a
binary system because each position can be occupied by one of four possible bases, thus
allowing each base to represent one possible combination of a 2-bit system. If the information
storage capacity of DNA were to be fully exploited the region of DNA required to hold the
average bacterial gene (1,100bp) (Parakhia, 2010) could store 2,200 bits of information, or 275
bytes.
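The arithmetic above can be checked with a short sketch. The mapping of bases to bit pairs used here is an arbitrary assumption chosen for illustration (any one-to-one assignment of A, C, G, and T to 00, 01, 10, and 11 would do), and the function names are hypothetical.

```python
# Illustrative 2-bit-per-base encoding; the base-to-bits mapping is an
# arbitrary assumption, not a standard taken from the essay.
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def states(n_bits: int) -> int:
    """Number of distinct patterns that n binary switches can record."""
    return 2 ** n_bits


def capacity_bits(n_bp: int) -> int:
    """Upper bound on storage if every base pair carries 2 bits."""
    return 2 * n_bp


def encode_bytes_as_dna(data: bytes) -> str:
    """Write each byte as 8 bits, then read the bits off two at a time as bases."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode_dna_to_bytes(seq: str) -> bytes:
    """Reverse of encode_bytes_as_dna: bases to bits, then bits to bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in seq)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    print(states(8))                  # patterns available in one byte
    print(capacity_bits(1100) / 8)    # bytes in an 1,100 bp gene at 2 bits/bp
    print(encode_bytes_as_dna(b"A"))  # one byte becomes four bases
```

Under this scheme one byte always occupies four bases, which is why an 1,100 bp region corresponds to 275 bytes.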
The capacity for DNA to securely store information in a stable state is the foundation of
a new field in information technology. Goldman and co-workers (2013) have demonstrated the
in vitro storage and recovery of 739 kilobytes (739 × 10³ bytes) of information in DNA. DNA
cryptography seeks to fully utilise DNA as a stable, low cost storage and encryption tool for
sensitive information which does not require energy input to maintain (Jacob and Murugan,
2013). This information is written into the DNA by chemical synthesis and read by sequencing,
however, and as such is not suitable for use in the autonomous regulation of gene networks
within cells through in vivo read and write functions.
With the fields of synthetic biology and genetic engineering expanding rapidly,
researchers are searching for methods of increasing control over gene networks such as
metabolism, and as such have begun looking to electronics and computing for the answers they
need. Integration of memory into a circuit (a non-transient switch of state in response to a
certain input) allows programmability as both transient and ongoing inputs can inform the
output of a circuit, while layering of circuits such that the output of one circuit is an input for the
next allows complexity to be achieved (Moon et al., 2012). Circuits which can modulate their
binary output based on the states of two distinct binary inputs demonstrate Boolean logic, for
which a full range of functions can be seen in Table 1.
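As a rough illustration of what "a full range of functions" means for two binary inputs, the sketch below enumerates every possible output pattern over the four input combinations of A and B. The gate names are the conventional ones from electronics; the table from the essay itself is not reproduced here, so the selection of named gates is an assumption.

```python
from itertools import product

# The four possible input combinations, in the order (A, B) = 00, 01, 10, 11.
INPUTS = list(product((0, 1), repeat=2))

# A few conventionally named two-input gates, written on 0/1 integers.
NAMED_GATES = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "NAND": lambda a, b: 1 - (a & b),
    "NOR": lambda a, b: 1 - (a | b),
    "XNOR": lambda a, b: 1 - (a ^ b),
}


def truth_table(gate):
    """Output of a gate for each input combination, in INPUTS order."""
    return tuple(gate(a, b) for a, b in INPUTS)


def all_two_input_functions():
    """Every possible output column: one 0/1 choice per input combination."""
    return list(product((0, 1), repeat=len(INPUTS)))


if __name__ == "__main__":
    print(len(all_two_input_functions()), "distinct two-input Boolean functions")
    for name, gate in NAMED_GATES.items():
        print(f"{name:>4}: {truth_table(gate)}")
```

Because each of the four input combinations can independently map to 0 or 1, there are 2⁴ = 16 distinct functions, of which the named gates are a subset.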
Such programmable gene networks utilise a variety of methods stimulating promotion
and inhibition of genes and memory within the circuit (reviewed by Brophy and Voigt, 2014)
including protein-protein interactions (Moon et al., 2012; Stricker et al., 2008), protein-DNA
Table 1: Truth table showing the full range of Boolean Logic Functions. Output is determined by the
binary state of two inputs, A and B. The pattern of outputs for each possible input is dependent on
the architecture of the ‘logic gate’ which they inform.
interactions (Lohmueller et al., 2012), RNA – based methods (Liang et al., 2011), and even use
of the recently discovered CRISPR-Cas system (Bikard et al., 2013; Mimee et al., 2015). These
circuits often encounter problems during development not only due to their design, but also
their context within living cells where interaction with other proteins and their high demand for
cellular resources in order to function can impair both the health of the host organism, and the
stability and predictability of the circuit (reviewed by Brophy and Voigt, 2014; Cardinale and
Arkin, 2012). This has led to a drive for the standardisation of components for synthetic biology
applications akin to a ‘parts list’ for electronic circuitry, as well as development of the
computational tools necessary to design and build predictable genetic programming (Rodrigo
and Jaramillo, 2013).
Recombinases have become particularly useful as tools for genome
engineering and synthetic biology in recent years due to their ability to mediate the conservative
inversion, integration, or excision (resolution) of large segments of DNA, allowing efficient
cloning and DNA modification in vivo (reviewed by Fogg et al., 2014). Examples of recombinases
include invertases, resolvases, and integrases; classifications related to their native biological
function. Serine Integrases, a subfamily of the Large Serine Recombinases (LSRs), could be
enormously valuable in genetic circuit design. These are particularly useful due to their
predictability within a vast range of cell types, as well as the low-maintenance cost in terms of
cellular resources, small circuit size, and heritability associated with building circuitry directly
into DNA (reviewed by Fogg et al., 2014). By manipulating their attachment (att) sites phage
integrases can be made to controllably and reversibly invert DNA such that the coding strand
becomes non-coding, or to change the directionality of a gene promoter. The binary nature of this
manipulation makes them particularly amenable to the implementation of logic, memory, and
programming in living cells. This review will discuss both structural and functional aspects of the
ϕC31 Serine Integrase and its relatives as well as their current applications and future potential
in integrating logic and memory into synthetic gene networks.
INTEGRASES IN CONTEXT
Temperate bacteriophages encode an archetypal genetic switch which allows them to cycle
between lytic growth and dormant lysogeny within prokaryotic cells in response to changing
environmental cues (reviewed by Fogg et al., 2014; Oppenheim et al., 2005). During lysogeny
lytic genes are suppressed, the phage genome is not transcribed and is replicated within a
specific site of the host genome, and lysogens are immune to superinfection (Oppenheim et al.,
2005). In 1962 Campbell was the first to suggest that the λ prophage is integrated into the host
genome at a specific site by ‘recombination’ during the lysogenic switch (Campbell, 1962). λ
integrase is as such the best understood of the phage integrases, a subset of recombinases
characterised by their ability to mediate both the integration of phage DNA into the host genome and
the excision of prophage DNA from it under separate conditions (Esposito and Scocca, 1997).
λ integrase is a tyrosine recombinase, one of two large families of recombinase (the
other being the serine recombinases), which are named after the conserved residue within their
active site used to form a covalent phospho-linkage with the DNA backbone during
recombination (reviewed by Fogg et al., 2014). Integrases insert phage DNA into host genomes
by binding the phage DNA attachment site (attP) and the attachment site of the bacterial host
DNA (attB), mediating a recombination reaction in which the end product is integrated phage
DNA flanked by the left and right attachment sites (attL & attR) (Campbell, 1992). These each
consist of one half of attP and one half of attB with a small overlap of complementary nucleotides
in the middle. This process is unidirectional, and so in order to excise the phage DNA and resolve
the prophage (attL and attR recombination) an excisionase or recombination directionality factor
(RDF) is required (Lewis and Hatfull, 2001). Bacteriophage λ encodes the excisionase Xis, which
it uses to mediate phage resolution; however, as with all tyrosine integrases, a range of host cofactors
is also needed for both integration and excision of the λ prophage (Lewis and Hatfull, 2001).
Naturally att sites are direct repeats when flanking a sequence, however inversion of one att
site such that they are palindromic allows a section of DNA to be ‘flipped’ using a recombinase
(see Figure 1).
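The integration and excision reactions described above can be sketched as operations on lists of labelled DNA elements. This is a toy model under stated assumptions: each att site is represented only by the names of its two half sites, attL is taken to be the attB left half joined to the attP right half (and attR the reverse), and the gene labels are hypothetical. Sequence, orientation, and the integrase protein itself are not modelled.

```python
# Each att site is a pair of half-site labels (an assumed naming convention).
ATT_P = ("P", "P'")
ATT_B = ("B", "B'")
ATT_L = ("B", "P'")  # left half of attB joined to right half of attP
ATT_R = ("P", "B'")  # left half of attP joined to right half of attB


def integrate(host, phage):
    """Int alone: attP x attB recombination.

    host  -- linear list of elements containing ATT_B
    phage -- circular list of elements containing ATT_P
    Returns the host with the phage inserted, flanked by attL and attR.
    """
    b = host.index(ATT_B)
    p = phage.index(ATT_P)
    # Open the circle at attP: elements after the site, then those before it.
    phage_linear = phage[p + 1:] + phage[:p]
    return host[:b] + [ATT_L] + phage_linear + [ATT_R] + host[b + 1:]


def excise(lysogen, rdf_present):
    """Int + RDF: attL x attR recombination. Without the RDF nothing happens."""
    if not rdf_present:
        return lysogen, None
    left = lysogen.index(ATT_L)
    right = lysogen.index(ATT_R)
    host = lysogen[:left] + [ATT_B] + lysogen[right + 1:]
    phage = [ATT_P] + lysogen[left + 1:right]  # re-circularised phage
    return host, phage


if __name__ == "__main__":
    host = ["geneX", ATT_B, "geneY"]
    phage = [ATT_P, "int", "cargo"]
    lysogen = integrate(host, phage)
    print(lysogen)
    print(excise(lysogen, rdf_present=False))
    print(excise(lysogen, rdf_present=True))
```

The round trip restores the original attB and attP sites only when the RDF is supplied, which is the unidirectionality the text describes.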
The phage encoded serine integrases belong to the LSR family, members of which retain the catalytic
recombinase N terminal domain (NTD), but all exhibit a large, structurally diverse C terminal
extension of around 300-500 amino acids (aa) compared to a typical C terminal domain (CTD) of
40aa in other serine recombinases (Smith and Thorpe, 2002; Smith et al., 2010). While tyrosine
integrases require host co-factors for integration or resolution to occur, serine integrases do
not, and need only their cognate RDF for excision (Smith et al., 2010). In addition to this the
serine integrases also require less genomic space for their att sites, which are both <50bp and
consist of a core TT crossover site flanked by two quasi-symmetrical inverted repeats (Thorpe et
al., 2000). This is compared to >200bp attP, <30bp attB in tyrosine integrases (see Figure 2).
Serine integrases also have no topological restrictions on their att sites, while λ integrase
requires attP to be supercoiled (reviewed by Fogg et al., 2014; Smith et al., 2010).
Properties such as these make Serine integrases like the integrase (Int) from the
Streptomyces phage ϕC31 attractive targets for applications in synthetic biology and genome
Figure 1: att site positioning affects function. Integration causes the attL and attR sites to be arranged
in a directly repeating conformation, allowing excision when Int is expressed with the RDF. If att sites
are arranged to be inverted repeats inversion occurs when Int or Int + RDF are expressed.
engineering due to the smaller sequence requirement to encode all factors needed for in vivo
activity. ɸC31 Int is among the most studied of the serine integrases; first described in 1991
(Kuhstoss and Rao, 1991), it was extensively utilised to mediate recombination within
Streptomyces strains, and also found use as a reliable unidirectional recombinase within other
prokaryotic and eukaryotic cells (reviewed by Smith et al, 2010). While the first RDF for a serine
integrase was discovered for Bxb1 in 2006 (Ghosh et al., 2006), the RDF for ϕC31 Int, gp3,
remained elusive until 2011 (Khaleel et al., 2011); a discovery which finally allows full utilisation
of ϕC31 Int as a reversible integrase in synthetic biology applications.
STRUCTURE AND MECHANISM OF THE ΦC31 INTEGRASE
ɸC31 Int is a 613 amino acid protein which mediates the unidirectional recombination of attP
and attB sites with high specificity (Kuhstoss and Rao, 1991). ɸC31 Int is a dimer in solution, and
binds each att site as such, while studies in the related serine integrase TP901-1 suggest that
both sites are then brought together to form a synaptic tetramer (Yuan et al., 2008). Unlike the
synapsis mechanism of the tyrosine integrases, which relies upon the formation of a Holliday
Junction – like intermediate (reviewed by Fogg et al., 2011), ϕC31 Int and the Serine integrases
cause a staggered double stranded break in the crossover region of each site which leaves a two
Figure 2: att site structure comparison between tyrosine and serine integrases. While Tyrosine
integrases recognise a smaller attB site, their attP site is many times larger than that of the serine
integrases, and contains binding regions for integrase, RDF, and host cofactors. The overlap region of
the tyrosine integrases is also larger.
base pair overhang of Thymine or Adenine bases at each half site (reviewed by Fogg et al., 2011;
Smith and Thorpe, 2002; Smith et al., 2010). Following synapsis, it is thought that a right handed
‘gated rotation’ mechanism rotates each half site 180° relative to the other within the synaptic
tetramer and the overhanging bases are re-ligated and released, forming the attL and attR sites
(Olorunniji et al., 2012). Figure 3 shows the methods of synapsis and strand transfer used by
tyrosine and serine integrases.
Figure 3: Synapsis methods of phage integrases.
(A) Tyrosine integrases bind attP and attB sites forming a synaptic tetramer, and make a single
stranded ‘nick’ in each site (at black arrowheads) by attack with their catalytic residue. The
free 5’OH then attacks the 3’ phospho-tyrosine of the opposing att site nick, knocking off the
integrase monomer bound to it and forming a Holliday junction-like intermediate. This is then
repeated by the remaining integrase subunits (at white arrowheads) to fully integrate the
phage.
(B) Serine integrases also form a synaptic tetramer, however they attack all strands
simultaneously (black arrowheads), causing a double stranded break in each genome with a
two base pair overhang. Bound tightly to the integrase monomers by their 5’ phospho-serine
linkage, strands are rotated as each integrase dimer rotates 180° with respect to one another.
The free 3’OH of each strand then attacks the phosphoserine of the neighbouring half-site,
completing integration.
Mutation of the key active site Serine in the NTD, S12A, abolishes the ability of ϕC31 Int
to induce strand transfer, however synapsis still occurs (Rowley and Smith, 2008), suggesting
regulation is not coupled to catalysis. Sequence alignment has shown the NTD of ϕC31 Int to be
homologous to those of other recombinases (Rowley and Smith, 2008), however only
unpublished data exists for the crystal structure of ϕC31 Int NTD (McMahon et al., 2013). Figure
4 shows the crystal structure of the activated transposon serine recombinase γδ resolvase which
clearly indicates the interface at which gated rotation might occur, and the crystal structure of
Figure 4: Evidence supporting gated rotation of ϕC31 tetramer
(A) Amino acid sequence alignment of ϕC31 Int against γδ resolvase and TP901-1 Int. Conserved
residues are highlighted, showing homology between proteins. Alignment was generated
using CLUSTAL Ω sequence alignment tool.
(B) Adapted from Yuan et al. (2008). Tetramer of activated γδ resolvase NTD bound to DNA.
Interface for rotation can be clearly seen and is indicated by arrowheads.
(C) Adapted from Yuan et al. (2008). Tetramer of TP901-1 Int NTD (unbound) showing structural
similarity to γδ resolvase tetramer. Arrowheads indicate possible rotation interface.
(D) Monomer of TP901-1 Int NTD analogous to subunit II in (C). Structure is coloured in a
spectrum with blue representing the N terminus, and red representing the C terminus.
Monomer was isolated from crystal structure of a tetramer (Yuan et al., 2008). PDB ID: 3BVP.
Image was generated using The PyMOL Molecular Graphics System, Version 1.7.2.2
Schrödinger, LLC.
(E) Monomer of ϕC31 Int NTD showing similarity to TP901-1 Int. Structure is coloured in a
spectrum with blue representing the N terminus, and red representing the C terminus. NTD
was isolated from unpublished crystal structure of N terminal and recombinase domains
(McMahon et al., 2013). PDB ID: 4BQQ. The largest difference between (D) and (E) is the
orientation of the red helix, which may be due to differences in the structures from which the
segment of protein shown was isolated. Image was generated using The PyMOL Molecular
Graphics System, Version 1.7.2.2 Schrödinger, LLC.
a tetramer of the serine integrase TP901-1 Int which is consistent with this being the mechanism
used by serine integrases (Yuan et al., 2008). While it has been suggested that multiple rounds
of rotation are possible within other Serine integrases (Bai et al., 2011) this was demonstrated
to be unlikely during the canonical action of ϕC31 Int as long as the two core att site base pairs
match (Olorunniji et al., 2012).
The default function of ϕC31 Int is therefore recombination of the attP and attB sites,
and band-shift assays by Thorpe, Wilson, and Smith in 2000 showed that attL and attR
recombination (resolution) could not be mediated by ϕC31 Int alone. The mechanism by which
this directionality is controlled is poorly understood, and while it is now known that the RDF gp3
is needed for attL/attR recombination, the CTDs of serine integrases are also known to play a
large role in this regulation (Rowley et al., 2008; McEwan et al., 2009; reviewed by Smith et al.,
2010; Fogg et al., 2011; Rutherford and Van Duyne, 2014). Indeed, it has been shown that a
single amino acid substitution in the CTD of ϕC31 Int, E449K, produces a hyperactive Int which
can not only catalyse attP/attB recombination, but also attL/attR, attL/attL, and attR/attR, and
could still mediate formation of these synapses in the background of an S12A mutation (Rowley
et al., 2008). Several such hyperactive mutants, including Int E449K, were identified in a coiled
coil domain on the CTD, a motif which commonly mediates protein-protein interactions.
Furthermore, experiments with the purified histidine tagged CTD show that while the CTDs
alone are monomers in solution they interact co-operatively to bind att sites, and that L460P
and Y475H mutations abolished inter-CTD interaction, DNA binding, and synapsis (McEwan,
Rowley and Smith, 2009). The E449K mutant in the isolated CTD could bind DNA, but could not
catalyse synapse formation, suggesting that formation of a CTD synapse is not essential for ϕC31
Int binding, but is intimately involved in the control of directionality.
Although the full role of the RDF in this directionality remains poorly investigated, it is
likely that gp3 plays a structural role in the interaction, conferring a conformational change to
the CTD which allows resolution of the recombination sites. It has been shown in the serine
integrase Bxb1 that RDF binds Int attached to the attP/attB sites tightly and inhibits
recombination, while promoting excision at attL/attR sites (Ghosh, Wasil and Hatfull, 2006).
Without crystal structures for the full ϕC31 Int the mechanism of its RDF will likely
remain difficult to elucidate. Another area which remains cryptic is the mechanism of action of
the DNA binding domain(s) within this protein. Site-directed mutagenesis of conserved residues
within the CTD has indicated a cysteine rich motif and a valine rich motif to be important in DNA
binding (Liu et al., 2010), however the mechanism of this interaction has not been defined. The
cysteine rich motif in the CTD is a putative zinc finger domain (McEwan et al., 2011). Rutherford
and Van Duyne (2014) hypothesise that the specific orientation of the Int on each half site
Figure 5: Proposed role of coiled coil domains in directionality of serine integrases. Adapted from
Rutherford and Van Duyne (2014). Binding of Int dimers to att sites positions coiled coil domains on
either side of the DNA. These domains then form inter-dimer interactions, and following rotation and
ligation all coiled coil domains are on the same side of the recombined att sites. When a dimer is bound
to a recombined att site in the absence of an RDF following tetramer dissociation the coiled coil
domains form intra-dimer interactions which prevent reformation of the tetramer without the RDF.
conferred by the zinc finger domains allows the CTD coiled coil motifs to interact in an inhibitory
manner upon synapsis such that resolution is not possible without the RDF (see Figure 5).
The mode of DNA interaction is of particular interest in synthetic biology applications as
knowledge of this could allow production of integrases which could specifically bind to
endogenous sites, and also improvement of the existing specificity. It has been shown that ϕC31
Int can be targeted to non-native pseudo-sites which resemble att sequences (Combes et al.,
2002; Malla, 2005; Chalberg et al., 2006), and directed evolution through DNA shuffling has
been reported to have yielded versions of the ϕC31 Int protein which have a high binding
specificity and frequency to a pseudo attP site on human chromosome 8, as well as versions
which integrate more efficiently to pre-inserted sites within the human genome (Sclimenti,
2001; Keravala et al., 2008). High specificity targeting of recombinases has also been
demonstrated using chimeric Zinc finger domains (Akopian and Stark, 2005), however this has
not been demonstrated in ϕC31 Int, and would likely affect the regulation of the integrase.
While integrases can be targeted to ‘landing’ sites introduced into eukaryotic genomes
(reviewed by Fogg et al., 2011), concerns have been raised over the efficiency of ϕC31 Int action
within eukaryotic cells despite its widespread usage for this purpose. Inversion of the DNA
between palindromic att sites introduced using this method was used to re-arrange segments
of the human Y chromosome using ϕC31 Int (Malla, 2005), however it was found that
recombination only occurred in 56% of cells, and in some cases the action of Int had left deleted
regions of DNA, or small insertions attributed to intervention of host double stranded break
repair mechanisms such as non-homologous end joining. This would suggest that the synaptic
complex persists for much longer in eukaryotes, and is consistent with findings that
although ϕC31 Int can perform integration with 100% efficiency at pseudo attB sites in E. coli
and other prokaryotes, integration is only 50% efficient in human cells (Chalberg et al., 2006),
and efficiency in eukaryotic cells varies between organisms. It is possible that bacteriophage
integrases are not well suited to the nuclear environment of eukaryotic cells. In addition to the
lesions observed due to host repair, the efficiency of ϕC31 Int within human cells could be mildly
impaired by interaction with endogenous cell death associated protein DAXX (Chen et al., 2006).
Interestingly, this interaction occurs within the same region of the ϕC31 Int CTD as the putative
coiled coil domain. Despite this hurdle ϕC31 Int continues to be used in eukaryotic cells due to
the ability to predict insertion sites, and reliably perform integration more efficiently than
tyrosine recombinases and other methods of genomic insertion such as homologous
recombination, resulting in long lasting expression of gene products (reviewed by Fogg et al.,
2011).
The reliability of ϕC31 Int in prokaryotic cells and utility in eukaryotic cells, as well as the
reversibility conferred by use of gp3, allows it to be useful in many emerging biotechnologies.
One such advancement is a mechanism for fast and accurate plasmid assembly pioneered by
Colloms and colleagues (Colloms et al., 2013). The SIRA (Serine Integrase Recombination
Assembly) method for plasmid assembly relies on their findings that the two base pairs of the
core att site crossover region do not require conservation so long as both are the same. This
means that one ϕC31 Int enzyme can assemble multiple regions of DNA in an order pre-
determined by careful arrangement of att sites with different crossover regions flanking the
DNA. Through this method a cassette of up to six gene segments can be assembled in one
reaction using different combinations of core nucleotides, abolishing the traditionally arduous
process of using restriction enzymes and ligases in separate reactions for each inserted section
of DNA. In addition to this, any number of the segments can be removed, or replaced entirely
by a previously assembled cassette, or any expanse of DNA flanked by the appropriate
recombination sites. Furthermore, addition of further serine integrases will allow more
fragments to be assembled simultaneously. This technology vastly expands the scope of genetic
circuit design as any number of functional units can be easily integrated or changed in few steps.
Rates of transcription can also be predictably controlled by varying the distance between a gene
and its promoter, or the positioning of an inhibitory genetic signal within the assembled array.
SIRA assembly demonstrates the extensive utility of ϕC31 Int in synthetic biology and adds to a
growing list of advantages for Int use over many of the traditional methods used for genome
engineering.
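The ordering principle behind SIRA, as described above, can be sketched in a few lines. This is a deliberately simplified model and not the published protocol: each fragment is reduced to the crossover dinucleotide of its left and right att sites, two sites are assumed to recombine only when their dinucleotides are identical, and the fragment names and dinucleotides are hypothetical.

```python
def assembly_order(fragments, start, end):
    """Predict the order in which fragments join in one reaction.

    fragments -- dict mapping fragment name to (left_dinucleotide, right_dinucleotide)
    start     -- crossover dinucleotide of the vector site that accepts the first fragment
    end       -- crossover dinucleotide of the vector site that closes the assembly
    Assumes a site recombines only with a partner carrying the same dinucleotide.
    """
    by_left = {}
    for name, (left, right) in fragments.items():
        if left in by_left:
            raise ValueError(f"dinucleotide {left} is used by more than one left site")
        by_left[left] = (name, right)

    order = []
    current = start
    while current != end:
        if current not in by_left:
            raise ValueError(f"no fragment can recombine with a {current} site")
        # Removing the entry as it is used also guards against circular designs.
        name, current = by_left.pop(current)
        order.append(name)
    return order


if __name__ == "__main__":
    design = {
        "geneC": ("CT", "GA"),
        "geneA": ("TT", "CC"),
        "geneB": ("CC", "CT"),
    }
    print(assembly_order(design, start="TT", end="GA"))
```

The order is fixed by the choice of crossover dinucleotides rather than by the order in which fragments are listed or added, which is the property that lets a single integrase assemble several pieces at once.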
ENGINEERING LOGIC AND MEMORY IN GENETIC CIRCUITRY
In terms of genetic circuit or metabolic design assembly of components by means of the
SIRA method allows greater predictability of gene expression. This could aid the mathematical
design of genetic circuits and metabolic pathways by computational methods, as the functions
of components which utilise binary and Boolean logic can be predicted efficiently (reviewed by
Brophy and Voigt, 2014). Additionally, the use of differing base pairs in the crossover region of
the recombination sites could allow control of multiple outputs simultaneously by one input
which drives ϕC31 Int expression with or without its RDF. Although genetic circuitry assembled
in such a fashion could be highly predictable it is however unlikely that such ‘cellular machines’
will ever achieve widespread use as supercomputers due to the existing utility of such electronic
constructs, although it is worth noting that biochemical networks are capable of Turing Machine-
like functions and can compute large and complex calculations in as little time as simple ones
(Hjelmfelt, Weinberger and Ross, 1991).
Although the aforementioned SIRA assembly technique efficiently utilises
serine integrases for circuit construction, the full potential of these proteins in this application
can be realised by using them as the functional units of the circuit. Recombinase based
approaches to biocomputation and gene control mostly utilise the ability of these proteins to
flip a section of DNA between palindromic attP and attB sites, or attL and attR sites. This ability
therefore changes the sequence of DNA in an energy-independent and, importantly, heritable
and highly efficient way. This allows the recording of, and response to different stimuli, as well
as binary and Boolean logic functions, to be encoded on a far smaller stretch of DNA
than use of genetics and biochemical pathways alone permits (reviewed by Brophy and Voigt,
2014; Fogg et al, 2011).
One of the most important components which must be implemented into complex
circuitry is memory. Memory allows a sustained or heritable response to transient stimuli within
a circuit, and can permit a response which is informed by multiple sequential inputs (reviewed
by Brophy and Voigt, 2014; Horowitz and Hill, 2015). Due to their ability to invert a segment of
DNA between two states, memory is an inherent aspect of the recombinases, and has been
achieved within bacterial cells (Ham et al., 2008; Yang et al., 2014). The ability of serine
integrases such as those from ϕC31 and Bxb1 to controllably invert DNA without the need for
host co-factors or large att sites, however, gives them an edge in this application (Bonnet et al.,
2012; Bonnet et al., 2013; Siuti et al., 2013).
While a single recombinase can be used to achieve digital memory with a 1-bit capacity,
(Bonnet et al., 2013; Siuti et al., 2013), layering of the sites for different recombinases allows
complex memory of order, or number, of inputs in complex ‘state machines’ which fit into a
stretch of DNA smaller than the average gene (Ham et al., 2008; Bonnet et al., 2012; Yang et al.,
2014). Recombinase addressable data (RAD) modules were designed by Bonnet and co-workers
(Bonnet et al., 2012) which implemented the serine integrase from Bxb1 as a unit of
reversible memory within cells. Ham and colleagues suggest that 10 recombinases with
overlapping RAD modules could form 10¹⁰ possible states of DNA, thus recording 10¹⁰ different
patterns of input from 10 signals (Ham et al., 2008). Bonnet and co-workers estimated that using
RAD modules construction of an 8-bit memory system with 1 byte of memory capable of counting
256 input pulses would require 16 recombinases (Bonnet et al., 2012). Yang and colleagues,
however, have demonstrated recording of 1.375 bytes (11 bits) of information using 11 different RAD
switches designed using serine integrases discovered through data mining for homology to ϕC31
Int (Yang et al., 2014). This is a considerable achievement in context, as previous studies had only
demonstrated a maximum memory capacity of 2 bits (Bonnet et al., 2013; Siuti et al., 2013).
Although it can be argued that this does not utilise the full capacity of DNA (2 bits per base pair), the
utility of this method of memory is much greater, as it can be used as rewritable storage which is
able to record data in living cells, as opposed to the single-write functionality of DNA
cryptography (Ham et al., 2008; Goldman et al., 2013). Durational recording has also been shown
to be possible through genetic manipulation; however, recording of this information occurred at
the population level, in a form of analogue memory based on the increasing number of
responsive cells (Farzadfard and Lu, 2014). This process would be difficult to replicate using
recombinases due to their high efficiency and digital nature, and moreover a population
response would not be amenable to circuit design.
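The capacity figures quoted above follow from simple arithmetic, which can be sketched in Python. This is a minimal illustration only, assuming that each switch behaves as an independent one-bit register; the function names are hypothetical and are not taken from any of the cited papers. It does not model counter architectures, for which Bonnet et al. (2012) estimate a larger number of recombinases.

```python
def distinct_states(n_switches: int) -> int:
    """Number of DNA configurations if every switch is an independent 1-bit register."""
    return 2 ** n_switches


def capacity_bytes(n_switches: int) -> float:
    """Storage capacity in bytes (8 bits per byte), one bit per switch."""
    return n_switches / 8


if __name__ == "__main__":
    for n in (1, 2, 8, 11):
        print(n, "switches:", distinct_states(n), "states,", capacity_bytes(n), "bytes")
```

Under this assumption, 11 switches give 11 bits, which is the 1.375 bytes reported by Yang et al. (2014), and 8 bits are enough to distinguish the 256 input pulses mentioned above.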
Readout from large memory modules such as those described above requires
sequencing, digestion, or PCR; however, smaller memory modules can also feasibly be
interpreted using fluorescence. The full potential of such memory is only realised, however, when
it is implemented into genetic circuitry through incorporation of active genetic elements,
permitting logic within the circuit as opposed to a single-input response (reviewed by Brophy and
Voigt, 2014). The capacity of recombinases for logic in a 2-bit system was exemplified
in 2008 by bacteria which could survive on antibiotic only once the constitutively expressed
Hin recombinase (which does not require an RDF for reversion) had rearranged the DNA so that
the associated resistance gene was expressed, thereby solving the Burnt Pancake Problem (Haynes et al.,
2008). The Burnt Pancake Problem is a logic problem whereby a stack of ‘pancakes’ (DNA flanked
by hix sites in this case) must be sorted into the correct order, and each manipulation
reverses the order of one or more ‘pancakes’ within the stack (see Figure 6). While the
recombinase involved was constitutively expressed in this experiment, and did not require an RDF, the
experiment demonstrates the possibility that multiple recombinases with expression under the control of
different inputs could mediate output only in a specific combination. This demonstration is also
potentially useful in durational memory, as the duration of a specific input driving Int expression
could be estimated based on the minimum number of random inversions in a stretch of DNA
with specific recognition sites needed to reach the observed configuration from the starting
sequence (Haynes et al., 2008). Since this demonstration, the full range of Boolean logic
functions has been achieved in 2-bit systems using the TP901-1 and Bxb1 integrases, in a
genetic device termed ‘the transcriptor’ by Bonnet et al. (see Figure 7), and using the ϕC31 and
Bxb1 integrases (Bonnet et al., 2013; Siuti et al., 2013, respectively).
Figure 6: Solving the burnt pancake problem with integrases. In this logic problem a stack of
pancakes is presented which all have one good side facing up (solid colour in figure), and one burnt
side facing down (hashed colour in figure). The stack is the wrong way around, and the aim is to sort
all pancakes in the stack so that they are arranged from smallest (on top) to largest, all with burnt
side facing down. If the red ‘pancake’ represents an antibiotic resistance gene, and the purple
‘pancake’ is a promoter (both flanked by att sites), then the minimum number of flips required for gene
expression represents the quickest solution (3 flips in this case) to the burnt pancake problem. Using
att sites with different overlap regions (arrowheads), one serine integrase (and its RDF) could solve this
logic problem in cells grown on antibiotic. If three separate integrases were used on their cognate att
sites (arrowheads), each flip could be controlled by a separate input.
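The ‘minimum number of flips’ idea behind the Burnt Pancake Problem can be made concrete with a short breadth-first search. The sketch below is illustrative only and is not the method of Haynes et al. (2008): it assumes that each DNA segment is written as a signed integer (the sign standing for orientation) and that one flip inverts any contiguous run of segments, reversing their order and their orientations. The names invert and min_flips are hypothetical.

```python
from collections import deque


def invert(stack, i, j):
    """Invert stack[i:j]: reverse the order of the segments and flip each orientation."""
    segment = tuple(-x for x in reversed(stack[i:j]))
    return stack[:i] + segment + stack[j:]


def min_flips(start, target):
    """Breadth-first search for the fewest inversions that turn start into target."""
    start, target = tuple(start), tuple(target)
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        stack, flips = queue.popleft()
        if stack == target:
            return flips
        n = len(stack)
        # Try every contiguous run of segments as the next inversion.
        for i in range(n):
            for j in range(i + 1, n + 1):
                nxt = invert(stack, i, j)
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, flips + 1))
    return None  # only reached if target is not a signed rearrangement of start


if __name__ == "__main__":
    # Two segments in the wrong order but the right orientation.
    print(min_flips((2, 1), (1, 2)))
```

In this toy model, two segments that start in the wrong order but the correct orientation need three inversions to reach the sorted arrangement, whereas a stack that is simply upside down needs one.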
With the ability of serine integrases to perform logic and memory functions in living cells
demonstrated as described above, achieving any function needed for a desired gene network is a
matter of combinatorial design using these components and other known mechanisms of gene
regulation (reviewed by Brophy and Voigt, 2014; Fogg et al., 2014). For example, an important
function in some electronic circuits is oscillation of the output signal, and this has been
achieved in bacteria through negative and positive biochemical feedback loops (Stricker et al.,
2008). This could theoretically be achieved through expression of a serine integrase which flips a
promoter so that it drives expression of either the desired oscillatory gene, when flanked by
attP/B sites, or the cognate RDF, when flanked by attL/R sites. Oscillation period could be
extended via targeting of the RDF mRNA by a complementary non-coding RNA, which could be
controlled independently (see Figure 8).
Figure 7: Creation of all Boolean logic gates using two serine integrases. Adapted from Bonnet et al.
(2013). By using a one-stranded (unidirectional) transcriptional terminator (red/grey T) and a
constitutive promoter (green arrow), all Boolean logic gates can be created. These logic gates are
operated by two serine integrases, each of which recognises a distinct set of att sites (blue and
yellow or black and white arrowheads) and flips the DNA between them, hence modulating the
polymerase flow on each strand of the DNA.
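As a point of reference for the two-integrase logic gates discussed in this section, the full set of two-input Boolean functions can be enumerated directly. The sketch below is abstract, assuming only that each integrase input is treated as a binary signal (0 = absent, 1 = present); it says nothing about how a given gate is laid out in DNA. The names all_two_input_gates and NAMED are hypothetical.

```python
from itertools import product

# The four possible input combinations (integrase A, integrase B), in a fixed order.
INPUTS = list(product((0, 1), repeat=2))  # [(0, 0), (0, 1), (1, 0), (1, 1)]


def all_two_input_gates():
    """Return every truth table mapping two binary inputs to one binary output."""
    gates = {}
    for code in range(16):  # 2 ** 4 possible output columns
        gates[code] = {inp: (code >> idx) & 1 for idx, inp in enumerate(INPUTS)}
    return gates


# Familiar names for a few of the 16 codes (bit idx of the code is the output for INPUTS[idx]).
NAMED = {0b1000: "AND", 0b1110: "OR", 0b0110: "XOR",
         0b0111: "NAND", 0b0001: "NOR", 0b1001: "XNOR"}

if __name__ == "__main__":
    gates = all_two_input_gates()
    for code, name in sorted(NAMED.items()):
        print(name, [gates[code][inp] for inp in INPUTS])
```

There are exactly 16 such functions because each of the four input combinations can independently map to 0 or 1, which is the count referred to when ‘all Boolean logic gates’ are said to be achievable with two integrases.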
Logic gates operated by serine integrases could be layered in any combination with any
number of downstream effects and feedback loops, integrating any number of other genetically
encoded genomic tools, in order to perform an almost limitless range of specific functions within
cells. These simple genetic components integrate the potentials of computational logic and
memory with synthetic biology, thus allowing programmability for a wide range of highly
predictable functions to an extent which is far beyond the scope of anything achievable in this
field by any other currently known mechanism.
Figure 8: Concept for a synthetic gene oscillator using a serine integrase. Oscillation of a gene of
interest (GOI) is controlled by a promoter within inverted attP/B sites. Expression of the Integrase
(which could be constitutively expressed or under controlled expression) flips the promoter, turning
the GOI expression off. This activates transcription of the RDF, which allows the circuit to be reset.
Oscillatory period, and the state of the circuit, can be modified by controlled expression of a microRNA
which targets the RDF mRNA, or control of integrase expression.
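The oscillator concept can be illustrated with a toy discrete-time model. This is a sketch under strong assumptions, not a kinetic simulation: the promoter is either in the attP/B orientation (GOI on) or the attL/R orientation (RDF on), the integrase is assumed to be constitutively present, each flip is assumed to happen after a fixed number of time steps, and microRNA repression of the RDF is represented simply as extra delay before the reset. All parameter names and values are hypothetical.

```python
def simulate(steps, flip_delay=2, rdf_delay=3, mirna_extra=0):
    """Return the GOI output (1 = on, 0 = off) at each time step.

    flip_delay  : steps the promoter spends in the attP/B state before integrase flips it
    rdf_delay   : steps needed in the attL/R state for RDF to accumulate and allow the reset
    mirna_extra : additional steps added to the reset when a microRNA targets the RDF mRNA
    """
    state = "PB"  # promoter flanked by attP/attB, driving the gene of interest
    timer = 0
    trace = []
    for _ in range(steps):
        trace.append(1 if state == "PB" else 0)
        timer += 1
        if state == "PB" and timer >= flip_delay:
            state, timer = "LR", 0  # integrase alone: attP/B -> attL/R, GOI off, RDF on
        elif state == "LR" and timer >= rdf_delay + mirna_extra:
            state, timer = "PB", 0  # integrase + RDF: attL/R -> attP/B, circuit reset
    return trace


if __name__ == "__main__":
    print(simulate(10))
    print(simulate(14, mirna_extra=2))
```

In this toy model the output cycles with a period of flip_delay + rdf_delay steps, and increasing mirna_extra lengthens only the ‘off’ phase, mirroring the suggestion that targeting the RDF mRNA could extend the oscillation period.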
CONCLUSION
This review has discussed aspects of the origin, structure, and function of the serine integrases,
with a specific focus on the integrase encoded by bacteriophage ϕC31, establishing them as
powerful tools for synthetic biology and specifically for engineering complex and programmable
behaviours within living cells. The reliability and specificity of these proteins not only allow
efficient site-specific integration to occur within cells with higher proficiency than methods
which do not utilise recombinases, such as digestion/ligation, but also permit utility beyond
that of traditional recombinases, as they do not rely upon host cofactors and their
directionality may be controlled. Though larger than their tyrosine relatives, the serine
integrases also allow more efficient use of DNA by requiring recognition sites which can be one
third of the size of those needed by tyrosine recombinases.
ϕC31 integrase mediates recombination in a vast range of organisms, which accounts for its
prominence in the field of genetic circuit design. It is worth noting, however, that this integrase
has been reported to be only half as efficient in eukaryotic cells as it is in prokaryotes (Chalberg
et al., 2006), and thus more research is needed in order to improve its eukaryotic stability. Its
modes of recognition and DNA binding are also poorly understood, and further study is
needed because knowing how to reliably target ϕC31 Int to exogenous binding sites would
greatly extend its usefulness. Both of these areas of research would be enormously
aided by detailed knowledge of the structure of this protein; however, a crystal structure remains
elusive.
The ability to predictably programme genetic expression can benefit all fields of
synthetic biology, and will only grow as research continues in this area. Serine integrases have
been shown to provide all of the components needed for control of a genetic circuit, and
future research which integrates the use of these proteins with existing methods of control (or
focuses on mimicking the results of these other methods) could allow the construction of a
genetic circuit to control any biological application. Now that predictability can be built into such
circuits, computationally aided design should make the proposal and implementation of such
devices relatively swift and straightforward.
OUTLOOK
Further research into the ϕC31 integrase and other serine integrases is likely to revolutionise
the field of genomic engineering. The application of SIRA in rapid pathway assembly has already
demonstrated the utility of ϕC31 Int in this area, while the continued discovery of more
such proteins theoretically extends the capacity for memory and logic achievable in synthetic
circuits by 1 bit for every serine integrase incorporated. The construction of such large and
complex circuitry could one day lead to the production of a completely reprogrammable,
entirely integrase-controlled organism; however, more realistic applications are likely just
around the corner.
Using serine integrases, cells could be made to re-organise chromosomes; delete segments
of their genomes (including all synthetic circuitry); change lineage via expression of master
transcription factors; deliver drugs only in diseased cell states; cycle through production phases
in industry; indicate and record pollution levels; optimise crops to their environment; and so
much more in response to transient or lasting stimuli, be they complex or simple. Serine integrases
could become the standard unit of biological programming, such that circuits could be designed
by computers with minimal human input, resulting in the production of a linear DNA segment
which can mediate any reaction and need only be incorporated into target cells (possibly using
integrases). Predictable genetic manipulation may one day dominate the needs of the industrial,
medicinal, agricultural, and public sectors, and it is likely that ϕC31 Int and other family
members will lead the way into this new era of synthetic biology: predictable genetic circuitry.
REFERENCES
Akopian, A. and Stark, W. (2005). “Site‐Specific DNA Recombinases as Instruments for Genomic Surgery”.
Advances in Genetics, pp.1-23.
Bai, H., Sun, M., Ghosh, P., Hatfull, G., Grindley, N. and Marko, J. (2011). “Single-molecule analysis reveals
the molecular bearing mechanism of DNA strand exchange by a serine recombinase”. Proceedings of the
National Academy of Sciences, 108(18), pp.7419-7424.
Bikard, D., Jiang, W., Samai, P., Hochschild, A., Zhang, F. and Marraffini, L. (2013). “Programmable
repression and activation of bacterial gene expression using an engineered CRISPR-Cas system”. Nucleic
Acids Research, 41(15), pp.7429-7437.
Bonnet, J. and Endy, D. (2013). “Switches, Switches, Every Where, In Any Drop We Drink.” Molecular Cell,
49(2), pp.232-233.
Bonnet, J., Subsoontorn, P. and Endy, D. (2012). “Rewritable digital data storage in live cells via engineered
control of recombination directionality”. Proceedings of the National Academy of Sciences, 109(23),
pp.8884-8889.
Bonnet, J., Yin, P., Ortiz, M., Subsoontorn, P. and Endy, D. (2013). “Amplifying Genetic Logic Gates”.
Science, 340(6132), pp.599-603.
Brophy, J. and Voigt, C. (2014). “Principles of genetic circuit design”. Nature Methods, 11(5), pp.508-
520.
“This review was particularly useful as a starting point to understand the requirements for
genetic circuitry and the usefulness of recombinases in this pursuit”
Campbell, A. (1962). “Episomes”. Advances in Genetics, 11, pp.101-145.
Campbell, A. (1992). “Chromosomal insertion sites for phages and plasmids”. Journal of Bacteriology,
174, pp.7495-7499.
Cardinale, S. and Arkin, A. (2012). “Contextualizing context for synthetic biology - identifying causes of
failure of synthetic biological systems”. Biotechnology Journal, 7(7), pp.856-866.
Chalberg, T., Portlock, J., Olivares, E., Thyagarajan, B., Kirby, P., Hillman, R., Hoelters, J. and Calos, M.
(2006). “Integration Specificity of Phage ϕC31 Integrase in the Human Genome”. Journal of Molecular
Biology, 357(1), pp.28-48.
Chen, J., Ji, C., Xu, G., Pang, R., Yao, J., Zhu, H., Xue, J. and Jia, W. (2006). “DAXX interacts with phage ɸC31
integrase and inhibits recombination”. Nucleic Acids Research, 34(21), pp.6298-6304.
Colloms, S., Merrick, C., Olorunniji, F., Stark, W., Smith, M., Osbourn, A., Keasling, J. and Rosser, S.
(2013). “Rapid metabolic pathway assembly and modification using serine integrase site-specific
recombination”. Nucleic Acids Research, 42(4), pp.e23
“This paper describes the development of the SIRA method for gene assembly; a method which
showcases the utility of ϕC31 Int in synthetic biology.”
Combes, P., Till, R., Bee, S. and Smith, M. (2002). “The Streptomyces Genome Contains Multiple Pseudo-
attB Sites for the ɸC31-Encoded Site-Specific Recombination System”. Journal of Bacteriology, 184(20),
pp.5746-5752.
Esposito, D. and Scocca, J. (1997). “The integrase family of tyrosine recombinases: evolution of a
conserved active site domain”. Nucleic Acids Research, 25(18), pp.3605-3614.
Farzadfard, F. and Lu, T. (2014). “Genomically encoded analog memory with precise in vivo DNA writing
in living cell populations”. Science, 346(6211), pp.1256272-1256272.
Fogg, P., Colloms, S., Rosser, S., Stark, M. and Smith, M. (2014). “New Applications for Phage
Integrases”. Journal of Molecular Biology, 426(15), pp.2703-2716.
“This review was a useful starting point to understand the differences between phage
integrases and the utility of the serine integrases in synthetic biology.”
Gaj, T., Mercer, A., Gersbach, C., Gordley, R. and Barbas, C. (2010). “Structure-guided reprogramming of
serine recombinase DNA sequence specificity”. Proceedings of the National Academy of Sciences, 108(2),
pp.498-503.
George, S., Evans, D. and Marchette, S. (2003). “A biological programming model for self-healing.”
Proceedings of the 2003 ACM workshop on Survivable and self-regenerative systems in association with
10th ACM Conference on Computer and Communications Security - SSRS '03.
Ghosh, P., Wasil, L. and Hatfull, G. (2006). “Control of Phage Bxb1 Excision by a Novel Recombination
Directionality Factor”. PLoS Biology, 4(6), p.e186.
Goldman, N., Bertone, P., Chen, S., Dessimoz, C., LeProust, E., Sipos, B. and Birney, E. (2013). “Towards
practical, high-capacity, low-maintenance information storage in synthesized DNA”. Nature, 494(7435),
pp.77-80.
Ham, T., Lee, S., Keasling, J. and Arkin, A. (2008). “Design and Construction of a Double Inversion
Recombination Switch for Heritable Sequential Genetic Memory”. PLoS ONE, 3(7), p.e2815.
Haynes, K., Broderick, M., Brown, A., Butner, T., Dickson, J., Harden, W., Heard, L., Jessen, E., Malloy, K.,
Ogden, B., Rosemond, S., Simpson, S., Zwack, E., Campbell, A., Eckdahl, T., Heyer, L. and Poet, J. (2008).
“Engineering bacteria to solve the Burnt Pancake Problem”. J Biol Eng, 2(1), p.8.
Hjelmfelt, A., Weinberger, E. and Ross, J. (1991). “Chemical implementation of neural networks and Turing
machines”. Proceedings of the National Academy of Sciences, 88(24), pp.10983-10987.
Horowitz, P. and Hill, W. (2015). “The art of electronics”. New York, NY: CUP.
Jacob, G. and Murugan, A. (2013). “An Encryption Scheme with DNA Technology and JPEG Zigzag Coding
for Secure Transmission of Images”. [online] Arxiv.org. Available at: http://arxiv.org/abs/1305.1270v1
[Accessed 13 Feb. 2016].
Keravala, A., Lee, S., Thyagarajan, B., Olivares, E., Gabrovsky, V., Woodard, L. and Calos, M. (2008).
“Mutational Derivatives of PhiC31 Integrase with Increased Efficiency and Specificity”. Mol Ther, 17(1),
pp.112-120.
Khaleel, T., Younger, E., McEwan, A., Varghese, A. and Smith, M. (2011). “A phage protein that binds
φC31 integrase to switch its directionality”. Molecular Microbiology, 80(6), pp.1450-1463.
“This paper represents a key turning point in the research of ϕC31 Int function via the discovery
of its RDF gp3. This allows control over the directionality of recombination in genetic circuitry
with ϕC31 Int.”
Kuhstoss, S. and Rao, R. (1991). “Analysis of the integration function of the streptomycete bacteriophage
φC31”. Journal of Molecular Biology, 222(4), pp.897-908.
Lewis, J. and Hatfull, G. (2001). “Control of directionality in integrase-mediated recombination: examination
of recombination directionality factors (RDFs) including Xis and Cox proteins”. Nucleic Acids Research,
29(11), pp.2205-2216.
Liang, J., Bloom, R. and Smolke, C. (2011). “Engineering Biological Systems with Synthetic RNA Molecules”.
Molecular Cell, 43(6), pp.915-926.
Liu, S., Ma, J., Wang, W., Zhang, M., Xin, Q., Peng, S., Li, R. and Zhu, H. (2010). “Mutational Analysis of
Highly Conserved Residues in the Phage PhiC31 Integrase Reveals Key Amino Acids Necessary for the DNA
Recombination”. PLoS ONE, 5(1), p.e8863.
Lohmueller, J., Armel, T. and Silver, P. (2012). “A tunable zinc finger-based framework for Boolean logic
computation in mammalian cells”. Nucleic Acids Research, 40(11), pp.5180-5187.
Malla, S. (2005). “Rearranging the centromere of the human Y chromosome with ɸC31 integrase”. Nucleic
Acids Research, 33(19), pp.6101-6113.
McEwan, A., Raab, A., Kelly, S., Feldmann, J. and Smith, M. (2011). “Zinc is essential for high-affinity DNA
binding and recombinase activity of ɸC31 integrase”. Nucleic Acids Research, 39(14), pp.6137-6147.
McEwan, A., Rowley, P. and Smith, M. (2009). “DNA binding and synapsis by the large C-terminal domain
of ɸC31 integrase”. Nucleic Acids Research, 37(14), pp.4764-4773.
McMahon, S., McEwan, A., Smith, M. and Naismith, J. (2013). “Protein crystal structure of the N-terminal
and recombinase domains of the Streptomyces temperate phage serine recombinase, ϕC31 integrase”.
Unpublished.
Mimee, M., Tucker, A., Voigt, C. and Lu, T. (2015). “Programming a Human Commensal Bacterium,
Bacteroides thetaiotaomicron, to Sense and Respond to Stimuli in the Murine Gut Microbiota”. Cell
Systems, 1(1), pp.62-71.
Moon, T., Lou, C., Tamsir, A., Stanton, B. and Voigt, C. (2012). “Genetic programs constructed from layered
logic gates in single cells”. Nature, 491(7423), pp.249-253.
Olorunniji, F., Buck, D., Colloms, S., McEwan, A., Smith, M., Stark, W. and Rosser, S. (2012). “Gated rotation
mechanism of site-specific recombination by ɸC31 integrase”. Proceedings of the National Academy of
Sciences, 109(48), pp.19661-19666.
Oppenheim, A., Kobiler, O., Stavans, J., Court, D. and Adhya, S. (2005). “Switches in Bacteriophage Lambda
Development”. Annu. Rev. Genet., 39(1), pp.409-429.
Parakhia, M. (2010). “Molecular biology & biotechnology.” New Delhi: New India Publishing, p.112.
Rodrigo, G. and Jaramillo, A. (2013). “AutoBioCAD: Full Biodesign Automation of Genetic Circuits”. ACS
Synth. Biol., 2(5), pp.230-236.
Rowley, P. and Smith, M. (2008). “Role of the N-Terminal Domain of ɸC31 Integrase in attB-attP Synapsis”.
Journal of Bacteriology, 190(20), pp.6918-6921.
Rowley, P., Smith, M., Younger, E. and Smith, M. (2008). “A motif in the C-terminal domain of ɸC31
integrase controls the directionality of recombination”. Nucleic Acids Research, 36(12), pp.3879-3891.
Rutherford, K. and Van Duyne, G. (2014). “The ins and outs of serine integrase site-specific
recombination”. Current Opinion in Structural Biology, 24, pp.125-131.
Sclimenti, C. (2001). “Directed evolution of a recombinase for improved genomic integration at a native
human sequence”. Nucleic Acids Research, 29(24), pp.5044-5051.
Siuti, P., Yazbek, J. and Lu, T. (2013). “Synthetic circuits integrating logic and memory in living cells”. Nat
Biotechnol, 31(5), pp.448-452.
“This research fully realises the ability for serine integrases for logic and memory by
demonstrating all 16 Boolean logic functions. Released at the same time as competing research
(Bonnet, et al., 2013), this paper specifically utilises ϕC31 Int.”
Smith, M. and Thorpe, H. (2002). “Diversity in the serine recombinases”. Molecular Microbiology, 44(2),
pp.299-307.
Smith, M., Brown, W., McEwan, A. and Rowley, P. (2010). “Site-specific recombination by φC31 integrase
and other large serine recombinases”. Biochem. Soc. Trans., 38(2), pp.388-394.
Stricker, J., Cookson, S., Bennett, M., Mather, W., Tsimring, L. and Hasty, J. (2008). “A fast, robust and
tunable synthetic gene oscillator”. Nature, 456(7221), pp.516-519.
Thorpe, H., Wilson, S. and Smith, M. (2000). “Control of directionality in the site-specific recombination
system of the Streptomyces phage phiC31”. Molecular Microbiology, 38(2), pp.232-241.
Yang, L., Nielsen, A., Fernandez-Rodriguez, J., McClune, C., Laub, M., Lu, T. and Voigt, C. (2014).
“Permanent genetic memory with >1-byte capacity”. Nature Methods, 11(12), pp.1261-1266.
Yuan, P., Gupta, K. and Van Duyne, G. (2008). “Tetrameric Structure of a Serine Integrase Catalytic
Domain”. Structure, 16(8), pp.1275-1286.
LOG OF INVESTIGATION
This review began to form when I first decided I wanted to write about something within
the scope of synthetic biology, as I have a keen interest in the idea of modularity in
biological manipulation as a technological asset. I enjoy the possibility that in the future
biological research could focus on ‘plug and play’ manipulation of genomes, facilitating
the design of complex networks for new applications.
After looking through the list of university staff who worked in this area at
http://www.gla.ac.uk/researchinstitutes/biology/research/syntheticbiology/staff/ and
reading through the research interests of each staff member I sent an email to Dr. Sean
Colloms on 06/10/15 detailing my interest in synthetic biology and particularly biological
circuitry and chassis organisms. In this email I asked if Dr. Colloms would be willing to
supervise me in writing my critical review and suggested we should meet to discuss this
further.
I met with Dr. Colloms on 15/10/15 and we discussed the power of serine integrases in
genetic circuit design and some key concepts in this area. This meeting solidified the
topic of review, and I left with a list of papers to read which Dr. Colloms had provided:
(Bonnet et al., 2012; Bonnet et al., 2013; Bonnet and Endy, 2013).
On 18/10/15 I submitted “Genetic Circuitry: The Use of Serine Integrases in
Synthetic Logic and Memory” as the working title of my critical review.
In the months that followed I gathered resources with which to write a review of my
chosen topic. Some of the research papers used herein were identified through reading
the primary literature, while others were discovered through internet searches using
Google Scholar and various databases such as PubMed. The work of Professor Maggie
Smith of the University of York proved exceptionally useful when researching Integrase
structure and function.
ϕC31 integrase continued to crop up during my research as a well-utilised serine
integrase; however, much about the specific mechanism of its directionality remained
unknown, and its RDF was late to be discovered. This interested me, as a better
understanding of this mechanism would enhance its utility in genetic circuit design,
whereas most current applications utilised the integrase for unidirectional integration
into host genomes.
The use of this integrase in both the SIRA assembly mechanism (Colloms et al., 2013)
and the implementation of Boolean logic gates using serine integrases (Siuti et al.,
2013) positioned the protein at the cutting edge of the genetic circuitry and synthetic
biology fields, and cemented the focus of this review.
By early January 2016 I had a solid idea of the shape which this review was going to take
and began planning to write specific sections.
A draft version of the review was sent to Dr. Colloms on 18/02/16 for feedback.
I met again with Dr. Colloms on 24/02/16 and we discussed his feedback on my review.
I left with plenty of useful suggestions to improve the review. Notably, Dr. Colloms
suggested areas where figures were necessary, and introduced me to a paper which
suggests the role of the CTD coiled coil motifs in Int directionality (Rutherford and Van
Duyne, 2014).
Having incorporated Dr. Colloms's feedback into the review, and having proofread
and edited it in some areas, I finally submitted it to the school office
on 07/03/16.
While there are many reviews which describe emerging applications for phage
integrases, this review focuses directly on genetic circuits integrating logic and memory,
specifically in the context of a serine integrase which is at the forefront of this
technology. As such, this is a unique piece of work which explores a new field of
biotechnology with specific focus on one small group of proteins which are likely to
revolutionise the possible applications of this research.
ACKNOWLEDGEMENTS
I would like to thank Dr. Colloms for his patience and sound advice when conceiving and
reviewing this document. None of this would have been possible without his input.
I would also like to thank my partner Sarah for her continued support and understanding while
writing this review.
My family has also been incredibly patient and understanding of the time commitment it took
to prepare this piece of work.