- The document summarizes a project that assessed the diversity of the pathogenic bacterium Borrelia burgdorferi in tick samples by analyzing co-infection patterns.
- It introduced probabilistic and optimization approaches, including calculating genotype proportions in samples and proposing a minimum number of new strain types.
- Three optimization approaches were attempted: mixed integer linear programming, network flow, and a genetic algorithm. The document provided details of the mixed integer linear programming formulation.
1) The study aimed to determine environmental sources of variation in reproductive lifespan using genetically identical fruit fly lines. 2) While the lines were genetically identical, substantial variation was found between individuals' reproductive lifespans. 3) The study compared differences between treated and untreated lines, infected and cured lines, and results from different experimental sections, but no single environmental factor consistently explained the observed variation.
This document summarizes a research article that proposes a new three-parameter generalized beta-Poisson dose-response model for quantitative microbial risk assessment. The model allows for the minimum number of organisms required to cause infection to be a random variable, rather than fixed at one organism as in traditional single-hit beta-Poisson models. The researchers use an approximate Bayesian computation algorithm to estimate parameters for the new model by fitting it to four experimental dose-response data sets from previous studies. The results show that while the new model may better characterize some dose-response processes, it did not significantly improve fit to three of the four data sets, possibly due to small sample sizes. The generalized model provides a way to investigate dose-response mechanisms
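For orientation, the traditional single-hit model that the summary contrasts against is commonly written in its approximate form as P(d) = 1 - (1 + d/β)^(-α). The sketch below implements only that standard approximation, not the three-parameter generalized model proposed in the article; the function name and the parameter values in the usage lines are illustrative assumptions.

```python
def approx_beta_poisson(dose: float, alpha: float, beta: float) -> float:
    """Approximate single-hit beta-Poisson probability of infection.

    P(d) = 1 - (1 + d / beta) ** (-alpha)
    Assumes dose >= 0, alpha > 0 and beta > 0.
    """
    if dose < 0 or alpha <= 0 or beta <= 0:
        raise ValueError("dose must be >= 0; alpha and beta must be > 0")
    return 1.0 - (1.0 + dose / beta) ** (-alpha)


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to show the curve's shape.
    for d in (0, 1, 10, 100, 1000):
        print(d, round(approx_beta_poisson(d, alpha=0.5, beta=10.0), 4))
```

The curve starts at zero for a zero dose and rises monotonically toward one, which is the behavior any dose-response fit of this family is expected to reproduce.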
This document summarizes a student's summer project analyzing immune response data from seasonal influenza vaccination studies. The student: 1) Developed a protocol to make data from the ImmPort clinical studies database compatible with the GenePool genomics analysis platform; 2) Was able to reproduce some key results from a 2011 study using GenePool; and 3) Identified some additional novel findings, such as a stronger correlation between vaccine response and expression of the TXNDC5 gene compared to the CaMKIV gene as originally reported. The student was successful in adapting ImmPort data for GenePool and generating new insights from reanalyzing an existing influenza vaccination study.
This study evaluated the reproducibility of an anthrax lethal toxin neutralization assay (TNA) across multiple laboratories and species. Seven laboratories performed the TNA on 108 serum samples from rabbits, nonhuman primates, and humans. The results showed similar dose-response curves across species, with slope and asymptote values within 30% of human reference material. Dilutional linearity was also consistent among species, with slopes of spiked samples less than 1.2. Agreement between laboratories was within 10% for ED50 values and 7.5% for NF50 values. When combined across laboratories and species, the relative standard deviations were 45% for ED50s and 35% for NF50s. This demonstrates the TNA can be
The document summarizes research that aimed to validate a mouse model for studying Hepatitis A Virus (HAV) by quantifying the expression of interferon-stimulated genes (ISGs) in mouse livers infected with HAV. The researcher found elevated levels of ISG20, ISG15, ISG56, IP10, interferon alpha/beta, and interferon gamma in infected mice, similar to levels found in infected humans and chimpanzees. This supported the mouse model and showed its ability to accurately represent the human immune response to HAV infection. Further research using this model may help develop treatments and further understanding of HAV.
1) Dendritic cells can use IgE and the high-affinity IgE receptor FcεRI to take up soluble antigens and cross-present them to cytotoxic T cells at very low antigen doses.
2) This IgE/FcεRI-mediated cross-presentation pathway efficiently induces cytotoxic T cell proliferation and granzyme B production in response to soluble antigens.
3) Using tumor antigen-specific IgE and dendritic cell-based vaccination experiments in vivo, the authors demonstrate that IgE/FcεRI-mediated cross-presentation significantly improves anti-tumor immune responses and induces memory responses.
Mapping genomic regions associated with Maize Lethal Necrosis (MLN) using QTL... (ILRI)
Six bi-parental maize populations were evaluated over three seasons for resistance to Maize Lethal Necrosis (MLN) caused by co-infection with Maize chlorotic mottle virus and Sugarcane mosaic virus. Three major QTL for MLN resistance were identified on chromosomes 3 and 6, with a major QTL on chromosome 6 validated using QTL sequencing. Five additional bi-parental populations were developed and are being used for further QTL mapping and identification of genes conferring MLN resistance.
In silico approach for viral mutations and sustainability of immunizations (IJERA Editor)
In this paper we use virtual samples of individuals and a dynamical model proposed in a previous study to study the behavior of immune memory against antigenic mutation. Our results suggest that the sustainability of immunizations is not a stochastic process, which contradicts the current opinion. We show that what may cause the apparently random behavior of immune memory is viral variability. This result may be important for investigating the durability of vaccines and immunizations.
Variation analysis of Swine influenza virus (SIV) H1N1 sequences in experimen... (Álvaro L. Valiñas)
Viral replication of swine influenza virus was observed in both vaccinated and non-vaccinated pigs challenged with H1N1, though it was lower in vaccinated pigs. Next-generation sequencing identified 276 single nucleotide variants, with more nonsynonymous variants found in vaccinated pigs, suggesting natural selection driving viral evolution. Substitutions were found across influenza virus segments and in key proteins, including some near antigenic sites that could help the virus evade immunity. The study provides insights into viral evolution dynamics in vaccinated and non-vaccinated pigs.
Variation analysis of Swine influenza virus (SIV) H1N1 sequences in experimen... (Álvaro L. Valiñas)
Swine influenza is a highly contagious and widely distributed disease that causes substantial economic losses in the pig industry. Currently, one of the most widespread strategies for controlling swine influenza viruses (SIVs) is trivalent vaccination, with a formulation containing the most frequently circulating SIV subtypes: H1N1, H1N2 and H3N2. These vaccines do not provide sterilizing immunity against the virus, potentially favoring viral evolutionary dynamics. To better understand the main mechanisms that shape viral evolution, this work analyzed SIV intra-host diversity in samples collected from both vaccinated and non-vaccinated animals challenged with H1N1 influenza A virus. In total, 276 single nucleotide variants were found within 28 whole SIV genomes obtained by next-generation sequencing. Differences in nucleotide variants between groups were established, and the likely impact of each substitution was hypothesized from previous literature. Substitutions were located along all influenza genetic segments, while the most relevant non-synonymous substitutions were located in the NS1 protein, in samples collected only from vaccinated animals. These substitutions could affect both viral mRNA translation and pathogenesis. Moreover, new viral variants were found in both vaccinated and non-vaccinated pigs, showing relevant substitutions in the HA, NA and NP proteins that may contribute to evasion of the host immune system, virulence and host adaptation. Overall, the results suggest that SIV is continuously evolving despite vaccine application, and that new substitutions may therefore increase viral fitness under field conditions.
SEROLOGICAL ELISAs BASED ON MONOCLONAL ANTIBODIES AS DIAGNOSTIC TOOLS FOR LUM... (EuFMD)
The document describes the development of two serological ELISAs using monoclonal antibodies for the diagnosis of lumpy skin disease (LSD) in cattle. Four monoclonal antibodies that recognize a 32-35 kDa protein of the LSD virus were selected. Two ELISAs were developed - a competitive ELISA and a trapping-indirect ELISA. Both assays consistently detected seroconversion in experimentally infected cattle by 14 days post infection. The trapping ELISA was less sensitive in detecting seroconversion in experimentally infected goats. The assays show promise as diagnostic tools for controlling the spread of LSD.
Biomedical Informatics 706: Precision Medicine with exposures (Chirag Patel)
This document discusses the need for a more comprehensive approach to understanding disease etiology by investigating environmental exposures, or the "exposome", in addition to genetic factors. It notes that genome-wide association studies have been successful in identifying genetic risk factors, but genetics alone explains only a portion of disease risk. Large studies like the National Health and Nutrition Examination Survey collect extensive exposure and health data that could be leveraged to discover environmental risk factors through an "exposome-wide association study" approach analogous to GWAS. Characterizing both genetic and environmental contributions is crucial for advancing precision medicine.
Infectious bursal disease in Ethiopian village chickens (ILRI)
1. The study examines infectious bursal disease virus (IBDV) in village chickens in 8 villages across 2 regions in Ethiopia. Blood samples were collected from 1,280 chickens and tested for IBDV antibodies.
2. 44 chickens tested positive for IBDV antibodies. Recent mortality in growers but not chicks or adults was associated with IBDV seropositivity, consistent with IBDV disease biology.
3. IBDV appears to be circulating in 7 of the 8 villages studied. The study highlights the need for localized control strategies that consider regional socioeconomic differences in poultry keeping.
1. The document discusses using chloroplasts to produce vaccine antigens and autoantigens like myelin basic protein (MBP) to induce oral tolerance and potentially treat multiple sclerosis.
2. It describes constructing vectors to express cytokines like IL-4 and IL-10 along with MBP to simplify oral delivery and reduce inflammation.
3. The document also discusses using chloroplasts to produce the cholera toxin B subunit (CTB) fused to a rotavirus protein to develop a dual oral vaccine for cholera and rotavirus diarrhea.
Martin Grunnill has recently submitted his PhD thesis on inapparent and vertically transmitted infections in two host-virus systems. He has a broad education in tropical diseases, parasitology, and mathematical modeling of infectious diseases. His research experience includes fieldwork on dengue fever transmission in Mexico and laboratory work using molecular techniques to study covert virus infections in moths.
This study investigated the nasopharyngeal microbiome of 45 Malawian children who carried Streptococcus pneumoniae. The microbiome was characterized using 16S rRNA gene sequencing. The results showed a shift in microbial diversity with HIV infection, with Moraxella and Streptococcus influencing diversity. Multiple carriage of pneumococcal serotypes did not significantly impact microbial diversity. The major limitation was that only pneumococcal carriers were studied. Further research on non-carriers is recommended to better understand implications on pneumococcal colonization.
A study of 56,000 twins and 700,000 siblings in a large health insurance cohort found complex and elusive variation in 560 phenotypes. The variation was influenced by both genetic and environmental factors, with estimates of about 0.3 for heritability (h2) and about 0.1 for the shared (common) environmental component. There are open questions about how best to capture non-genetic variation in populations and whether it is possible to partition the genetic, non-genetic, and random factors that influence disease risk.
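As a rough illustration of how variance is partitioned in twin designs, the classical Falconer formulas estimate heritability and shared environment from the trait correlations of identical (MZ) and fraternal (DZ) twin pairs. This is a textbook sketch, not necessarily the estimator used in the study summarized above; the function name and the input correlations are hypothetical.

```python
def falconer_ace(r_mz: float, r_dz: float) -> tuple[float, float, float]:
    """Classical Falconer decomposition from twin correlations.

    h2 = 2 * (r_mz - r_dz)   additive genetic share
    c2 = 2 * r_dz - r_mz     shared (common) environment share
    e2 = 1 - r_mz            unique environment plus measurement error
    """
    h2 = 2.0 * (r_mz - r_dz)
    c2 = 2.0 * r_dz - r_mz
    e2 = 1.0 - r_mz
    return h2, c2, e2


if __name__ == "__main__":
    # Hypothetical correlations, picked so the shares land near the
    # magnitudes quoted in the summary (h2 about 0.3, c2 about 0.1).
    h2, c2, e2 = falconer_ace(r_mz=0.40, r_dz=0.25)
    print(f"h2={h2:.2f} c2={c2:.2f} e2={e2:.2f}")
```

The three shares sum to one by construction, so whatever is not attributed to genetics or shared environment falls into the unexplained remainder.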
This study estimates the effective recombination rate and distribution of selection coefficients in HIV using time series sequence data from infected patients. By examining how new haplotype combinations arise between time points, the researchers estimated an effective recombination rate of 1.4 ± 0.6 × 10^-5 recombinations per site and generation. They also found evidence that at least 15% of observed non-synonymous mutations are selected for at a rate exceeding 0.8% per generation. This provides quantitative parameters for understanding HIV evolution within patients over time.
This document compares the allele frequencies of 15 Plasmodium falciparum merozoite antigen genes in malaria infections sampled in Kenya in 2007 and 2008. It finds fluctuating allele frequencies in codons 147 and 148 of the reticulocyte-binding homologue 5 (Rh5) gene over this period in uncomplicated malaria infections. However, the dominant YH haplotype was stable over multiple years in asymptomatic and complicated infections. A regression analysis found the chance of the less common HD haplotype decreased over time from 2007 to 2009 in uncomplicated and asymptomatic infections.
1. The document analyzes how viral modification of plant phenotypes (VMPPs) that alter plant-vector interactions can impact the transmission and spread of non-persistently transmitted viruses.
2. VMPPs that initially attract aphids but then deter their settling through the accumulation of distasteful compounds can increase virus transmission early in an epidemic but ultimately limit it by decreasing the aphid population size.
3. In contrast, VMPPs that promote aphid settling and reproduction lead to greater production of winged aphids, facilitating longer-distance dispersal and potentially larger epidemics.
EWAS and the exposome: Mt Sinai in Brescia 052119 (Chirag Patel)
The document summarizes a presentation given by Chirag Patel on estimating the genetic (h2) and shared environmental (c2) contributions to phenotypic variation using a large health insurance claims dataset of over 56,000 twin pairs and 700,000 siblings in the US. The analysis of 560 phenotypes across different disease categories found significant heritable and shared environmental components for many traits, with average h2 of 0.32 and c2 of 0.09. However, factors like air pollution, climate, and socioeconomic status explained only a modest portion of the overall shared environmental variation. This highlights the complex and elusive nature of phenotypic variation that remains unexplained. The presentation emphasizes the need to leverage exposome data to better characterize
1. The evolutionary relationships between malaria parasite species have been controversial due to past studies relying on visible traits rather than molecular data and issues like taxon bias.
2. Different genes are suitable for phylogenetic analysis, with some like rRNA being problematic due to paralogs. Studies using multiple genes from different genomic compartments provide better resolution.
3. The origin of P. falciparum, which causes the most virulent human malaria, has been debated, with evidence it may have recently switched hosts from gorillas rather than co-diverging with humans. Further sampling of ape malarias is needed to resolve this.
Identification of SNP markers for resistance to Salmonella and IBDV in indige... (ILRI)
Poster prepared by Psifidi, G. Banos, O. Matika, Tadelle Dessie, R. Christley, P. Wigley, J.M. Bettridge, O. Hanotte, Takele Taye Desta and P. Kaiser for the Annual Meeting of the Society of Veterinary Epidemiology and Preventive Medicine, Madrid, Spain, 20-22 March 2013.
The document is an English lesson about the link between birthdate and susceptibility to the flu. It includes objectives, an agenda, speaking prompts, vocabulary definitions, listening comprehension questions about a podcast on the topic, and lessons on making comparatives and superlatives with "more" and "most". The key points are that exposure to flu strains as a baby may provide lifelong antibodies, susceptibility varies by birth year, and knowing vulnerability helps plan epidemic responses.
1) The study aims to assess how predation risk influences the defensive chemical compounds in striped skunk spray. Specifically, it will compare the amounts of trans-2-butene-thiol and trans-2-butenyl thioacetate, two abundant noxious chemicals, in skunk populations facing different predation pressures.
2) Gas chromatography-mass spectrometry will be used to identify and quantify the two target chemicals in spray samples collected from skunks in areas of high and low mammalian and avian predation risk.
3) Three hypotheses are that skunks in riskier areas will have more variable spray potency, juveniles will have stronger spray, and bolder
Mario H. Skiadopoulos Presentation on "Evaluation of the Antibody Threshold o... (Matthew Kirkby)
Mario H. Skiadopoulos Presentation on "Evaluation of the Antibody Threshold of Protection Conferred by a Next-Generation Anthrax Vaccine Candidate Adjuvanted with the Immunostimulatory CPG 7909 TLR9 Agonist" at Biology of Anthrax, Tampa 2016
On 12 May 2017 we held a one-day event at the Fundación Ramón Areces with ISGlobal and Unitaid on vector-borne diseases such as malaria, among others.
The document provides a summary of the professional credentials and career experience of Mohamed Asim. It lists his qualifications including certifications in accounting and finance. It then outlines his career history working in financial controller roles for various companies in UAE and Pakistan from 2006 to present. His experience includes preparing financial statements, managing accounts, implementing controls, and liaising with external auditors and management.
3) Three hypotheses are that skunks in riskier areas will have more variable spray potency, juveniles will have stronger spray, and bolder
Mario H. Skiadopoulos Presentation on "Evaluation of the Antibody Threshold o...Matthew Kirkby
Mario H. Skiadopoulos Presentation on "Evaluation of the Antibody Threshold of Protection Conferred by a NextGeneration Anthrax Vaccine Candidate Adjuvanted with the Immunostimulatory CPG 7909 TLR9 Agonist" at Biology of Anthrax, Tampa 2016
On May 12, 2017, we held a conference at the Fundación Ramón Areces with IS Global and Unitaid on vector-borne diseases, such as malaria, among others.
The document provides a summary of the professional credentials and career experience of Mohamed Asim. It lists his qualifications including certifications in accounting and finance. It then outlines his career history working in financial controller roles for various companies in UAE and Pakistan from 2006 to present. His experience includes preparing financial statements, managing accounts, implementing controls, and liaising with external auditors and management.
The lacrimal system includes structures involved in tear production and drainage. Tears are produced by the lacrimal gland and drained through the puncta, canaliculi, lacrimal sac, and nasolacrimal duct. Obstructions anywhere in the lacrimal passage can cause epiphora or swelling. Congenital anomalies such as atresia or ectasia of the lacrimal passage can also cause drainage issues. The lacrimal system has important functions in maintaining the tear film and eye health.
This document discusses microservices and strategies for scaling applications. It describes partitioning services (X axis), functional decomposition (Y axis), and data partitioning (Z axis) as ways to scale applications. The key aspects of scalability, availability, latency, manageability, and cost are covered. Spring Cloud and Netflix components like Eureka, Zuul, Ribbon, and Feign are recommended tools for building microservices. Eureka acts as a service registry. Zuul provides routing, monitoring, and security. Ribbon provides load balancing. And Feign enables declarative web services. Docker is also mentioned for containerization.
The document mentions three works by the Spanish painter Salvador Dalí: Galatea De las Esferas (1952), La Désintégration De La Persistencia De La Memoria (1954), and Rosa Meditativa (1958). It also mentions the work Noir et Blanche (1926) by the French photographer Man Ray. Finally, it provides three cited sources related to Dalí's works.
Vehicles with automatic braking, machine learning on mobile devices, biometrics on your smartphone, and indoor navigation accurate to within a few metres are some of the innovations and disruptions that will transform the world in 2017 and in the years to come.
Deloitte's annual TMT Predictions identify the main trends in technology, media and telecommunications that will have a marked impact over the next 12 to 18 months.
Understanding and Applying Cloud Hybrid SearchJeff Fried
1) The document discusses cloud hybrid search, which allows searching across on-premises and online content in Office 365 with a single search experience.
2) It highlights benefits like simplified administration and lower costs compared to traditional federated search. However, there are also limitations with the default configuration regarding features like security trimming.
3) The document provides guidance on implementing cloud hybrid search and considerations for different environments, including performance, customizations, and regulatory requirements.
The document describes the factors that affect the rate of a chemical reaction. It explains that the reaction rate is defined as the amount of product formed or of reactants consumed per unit of time. The factors that affect the rate include the nature of the reactants, the concentration of the reactants, the degree of subdivision of the reactants, temperature, pressure, and the presence of catalysts. As these factors increase, the reaction rate generally increases
The thesis investigates how travel magazines construct perfect imaginaries of travel through images and communicative contracts. It analyzes the strategies used to create desirable experiences and how tourist destinations are figurativized in accordance with this perfection. The interdisciplinary theoretical foundation engages with researchers such as Beni, Krippendorf, Augé, Baumam, Sontag, Barthes, Duran, Amirou, Türcke, Semprini, Urry and Nasio.
This document discusses various techniques used to commit cyber fraud such as hacking, cracking, data diddling, denial of service attacks, and social engineering. It also covers the impact of cyber frauds on enterprises like financial loss, legal issues, loss of credibility. Examples provided include unauthorized access of Citi Bank data and the Sony email hack. Reasons for cyber frauds mentioned are organizations needing to update security practices, smart fraudsters, and failures of internal security controls. The document defines cyber frauds and differentiates between pure cyber frauds and cyber-enabled frauds. It outlines components of an information security policy including its purpose, security infrastructure, response mechanisms, and legal compliances. Finally, the document discusses the hierarchy of
The nuclear waste storage capacity of US nuclear plants is nearing its limit. Most plants have spent fuel pools that can hold 2000-5000 assemblies, and as of 2012, 27 plants had no dry cask storage. Spent fuel pools at many plants are 3/4 full and reaching capacity. While dry cask storage is increasing, providing some additional storage, the US has still not established a permanent nuclear waste storage site as required by law. The lack of a storage solution puts pressure on plant storage. In total, US plants currently store around 70,000 metric tons of high-level nuclear waste.
P Kershaw PP and IAG LB Bexley EtG Conference 16-11-16Paul Kershaw
The document discusses key aims and provision for improving careers guidance and reducing the number of NEET (Not in Education, Employment or Training) students among disadvantaged pupils. The aims are to ensure disadvantaged pupils are well prepared for their next steps after school and supported during and after leaving school. Provision discussed includes enhanced careers guidance, work experience opportunities, and tracking students after leaving school. The document also discusses how this aligns with Department for Education guidance and typical careers support that disadvantaged pupils should receive, such as one-on-one guidance, educational trips, and help with career management skills.
This document discusses the importance of making customer service an experience. It argues that while companies focus on transactions, customers want more than just a transaction - they want to have fun, be surprised and entertained. The document provides five golden rules for delivering a great customer experience: 1) affinity trumps capability, 2) be part of the gang, 3) knowledge is king, 4) diversity is human, and 5) trust will be rewarded. It also shares stories of companies that are successfully making customer service an experience through passionate experts, being part of the community, equipping employees with core skills and knowledge, embracing radical individualism, and being bold storymakers.
Ray was a surveyor who loved the outdoors but was diagnosed with multiple sclerosis in his 30s, which gradually took away his ability to use his body from the neck down. Despite his condition, Ray maintained a positive attitude with the support of his wife, who became his primary caregiver, though their limited funds prevented home improvements. Ray survived over 20 years with MS but additional income from disability insurance could have benefited him and his exhausted wife, who passed away shortly after Ray.
This document presents a computational method for estimating the population structure of viruses using pyrosequencing reads. The method involves four steps: 1) aligning reads to a reference genome, 2) correcting sequencing errors in the reads, 3) reconstructing haplotypes consistent with the reads, and 4) estimating the frequency of each haplotype in the population. The method is validated on pyrosequencing data from four HIV populations, with over 5000 reads each, by comparing the estimated populations to those obtained from clonal sequencing.
How to transform genomic big data into valuable clinical informationJoaquin Dopazo
This document discusses how to transform genomic big data into valuable clinical information. It begins by defining genomic big data and explaining how individual genome data contains more information than the original experiment. Next, it discusses lessons learned from genome-wide association studies, including that many loci contribute to traits and there is evidence of pleiotropy. However, individual genes cannot fully explain trait heritability. The document then discusses challenges in detecting disease-related variants from exome/genome sequencing data due to the large number of variants and presence of apparently deleterious variants in healthy individuals. It suggests taking a systems approach considering interactions and multigenicity to better understand variation and disease mechanisms.
Genomic gene expression changes resulting from Trypanosomiasis: a horizontal study Examining expression changes elucidated by micro arrays in seminal tissues associated with the pathophysiology of Trypanosomiasis during disease progression
Diversity of O Antigens within the Genus Cronobacter - MartinaPauline Ogrodzki
This study analyzed the diversity of O antigens in the bacterial genus Cronobacter by testing 82 strains representing all Cronobacter species. Restriction fragment length polymorphism analysis of the O-antigen gene cluster identified 11 previously reported and 6 new serotypes. Whole genome sequencing of reference strains confirmed the new serotypes and showed some existing PCR probes did not correctly identify genomic variations. Analysis of lipopolysaccharide phenotypes also differentiated 24 total serotypes among Cronobacter strains. Certain serotypes including C. sakazakii O2, O1, and O4 and C. turicensis O1 were found to predominately cause clinical infections. This work provides an updated systematic classification of Cronobacter serotypes.
This document summarizes key points from a class on microbial phylogenomics taught by Jonathan Eisen. It discusses reading scientific papers, specifically beginning with the introduction rather than the abstract. It also provides guidance on identifying the big question a field is trying to answer, summarizing the background and limitations of prior work, stating the specific questions authors are addressing, and identifying their experimental approach. The document does not summarize any specific paper.
Probability Models for Estimating Haplotype Frequencies and Bayesian Survival...Université de Dschang
Mr. Kum Cletus Kwa defended a doctoral (PhD) thesis in mathematics on 14 June 2016 at the Université de Dschang. Following the discussion, the jury awarded him the distinction "très honorable".
This document summarizes experimental work analyzing the dynamics between bacteria (Escherichia coli strain B) and bacteriophage (T4) in a chemostat. The authors developed a mathematical model of the viral-host interactions and measured various parameters experimentally. They were able to determine growth efficiency, adsorption rate, latent period, and burst size but were still working to detect the sensitive bacterial population over time. Future work involves using the parameter data to simulate population dynamics and manipulating variables like glucose concentration and flow rate in additional chemostats.
This document discusses a study that found significant differences in gene expression variability between knockout and wild-type mice using microarray data from 25 publicly available datasets. The study found that knockouts exhibited either significantly increased or decreased variability compared to wild-types in virtually every dataset analyzed. Examination of the data distributions indicated that these differences were due to broad changes in variability across most genes, rather than being driven by outliers. The findings suggest that changes in gene expression variability due to gene knockouts may have important phenotypic consequences.
Reference for long range pcr based ngs applicationsssuser1e2788
The document describes a method for sequencing inherited retinal disease genes using long-range PCR. The researchers designed primers to amplify 35 genomic loci associated with retinal diseases in fragments up to 20 kb. They applied the method to 227 patients and were able to identify likely causative variants in 51% of previously unsolved cases and 24% of cases without a diagnosis after exome sequencing. The long-range PCR also helped characterize breakpoints of copy number variants and extended coverage of exome sequencing data.
This document summarizes a study on using rRNA sequences to analyze microbial communities from different environments through phylogenetic methods. It introduces the UniFrac metric, which calculates the phylogenetic distance between communities based on the shared and unique branch lengths in a phylogenetic tree of their rRNA sequences. The UniFrac metric can be used to test if communities are significantly different and to generate distance matrices to compare communities through clustering and ordination. The document evaluates UniFrac on data from 12 marine studies and explores how sampling depth affects clustering through jackknifing analyses. UniFrac provides a powerful way to integrate rRNA data from different studies into a single phylogenetic context to address questions about microbial ecology and diversity.
Looking Back at Mycobacterium tuberculosis Mouse Efficacy Testing To Move Ne...Sean Ekins
1) Tuberculosis kills over 1.6 million people per year and 1/3 of the world's population is infected. However, only one new drug has been approved in the past 40 years.
2) The authors have compiled a database of over 1,500 molecules tested in murine models of tuberculosis infection, along with their activities and molecular properties.
3) Machine learning models were able to retrospectively predict the activities of compounds in the murine models with up to 72.7% accuracy, suggesting these models may help prioritize compounds for further testing and identify new treatment leads.
This document describes a study that used microfluidic mechanical trapping to map protein-protein interactions between 90 proteins (over a third of which are transcription factors) and the four subunits of E. coli RNA polymerase. The study detected interactions that had been missed by previous high-throughput screens, suggesting some interactions can occur without DNA mediation more commonly than previously thought. Independent validation of selected interactions found evidence of binding between RNA polymerase and four transcription factors (lrp, narL, rhaR, and zraR), providing support for the protein-protein interactions identified through microfluidic mechanical trapping.
ASHG 2015 - Redundant Annotations in Tertiary AnalysisJames Warren
After obtaining genetic variants from next generation sequencing data, a precursory step in tertiary analysis is to annotate each variant with available relevant information. There is no standardized compendium for this purpose; researchers instead are required to compile data from a motley of annotation tools and public datasets. These sources for annotation are independently maintained, and accordingly there is limited concordance between their reported contents. The choice of annotation datasets thus has a direct and significant impact on the results of the analysis.
This research aims to identify protein-protein interactions in Anopheles gambiae that could be targeted by ligands to block transmission of malaria. Computational methods were used to predict nearly 10,000 putative interactions and 100 interaction modules. The interactions were identified using various data sources and techniques including gene expression, regulatory motifs, orthology, and literature mining. Validation with additional methods like structural analysis could help confirm some of the predicted interactions. The goal is to develop novel drugs, insecticides or repellents by disrupting interactions essential for malaria transmission.
This document describes the development and validation of a new quantitative PCR (qPCR) assay to estimate total bacterial load in stool samples.
1) The assay targets a conserved region of the 16S rRNA gene using new primers and a probe to generate a shorter amplicon compatible with clinical diagnostics.
2) Testing on 500 liquid and 50 solid stool samples showed the assay accurately measured total bacterial load compared to culture-based methods.
3) The new assay addresses previous issues with non-specific priming and amplification bias, and provides a standardized method for quantifying total bacteria in complex clinical samples.
The proposed research aims to develop a computational approach to analyze associations between transcription factor genes and diseases like cancer. The approach will extract gene-disease relationships from literature based on supporting evidence between genes, diseases, and evidence. Relationships will be quantitatively evaluated to extract strongly supported gene-disease linkages and rank them. Existing methods are reviewed that use properties of representative disease genes to find similar candidate genes, but the proposed method will emphasize verifiable evidence for predicted associations and their strength. The goal is to predict gene-disease relationships based on relationships between other entities to help discover disease genes.
This document outlines the schedule and requirements for a genomics course consisting of 9 sessions over March and May. Students are required to attend all sessions and give one 20-minute seminar and write one essay. Seminars will be 15% of the final grade and essays will also be 15%, with a final exam making up the remaining 70% of the grade. Topics for the seminars and essays will be assigned.
Ethan Willie summarizes his 8-month contribution to the Genome Sciences Centre, where he worked on several pipelines including ABySS, Trans-ABySS, and Genome-Validator. He validated tools like ChimeraScan, hg38 annotations, Trinity, and Manta. Willie analyzed multiple projects, developed scripts to improve workflows, and learned skills in bioinformatics problem-solving, scripting, visualization, and presentation. He acknowledges areas for improvement like troubleshooting and public speaking, and hopes to further develop his genomics skills and apply his experience in future roles.
ChimeraScan is a tool that uses paired-end transcriptome sequencing to discover chimeric transcripts, which are fusion events involving two different genes. It works by aligning reads, identifying discordant read pairs that map to different genes, and then nominating chimeras. It differs from other fusion finders by adding a fragmentation step before alignment. The document then describes ChimeraScan's algorithm in 12 steps and how to run it. Results are output in BEDPE format. It is compared to other tools using two libraries, finding some unique events but also having higher false positives than others. Overall improvements could include mapping event types and reducing runtime.
This laboratory report summarizes an experiment exploring RNA splicing in Drosophila melanogaster. Genomic DNA and total RNA were extracted from fruit flies and used to study the rngo gene. PCR and RT-PCR were performed on the genomic DNA and cDNA samples. The genomic PCR product was cloned and sequenced. Bioinformatics analysis showed the genomic sequence was longer, containing introns absent from the cDNA, indicating splicing of the rngo pre-mRNA. Future work could investigate other splicing sites and homology to human genes.
This document summarizes an experiment that aimed to change both the expression level and color of the fluorescent protein mCherry. The experiment involved:
1) Using restriction digestion and ligation to swap the promoter of mCherry from low to high expression, resulting in more mCherry colonies.
2) Attempting site-directed mutagenesis to change mCherry to mOrange but this was unsuccessful, as no orange colonies were observed.
3) Characterizing the fluorescence of mCherry, mOrange from a partner, and a negative control colony, finding mOrange emitted better at 500nm.
This document describes a target heart rate monitor project. It takes a 30 second video of a user's finger and measures color intensity changes to obtain their heart rate in beats per minute. It uses various signal processing methods like brightness computation, band-pass filtering, Fourier transforms, peak detection, and smoothing. These methods extract the heart rate from the signal and produce an EKG graph. The project aims to help users achieve their optimal heart rate for activities by monitoring and advising them in real-time. The document outlines the materials, methods, results, accomplishments, and individual contributions of the three student authors.
This document describes an algorithm to identify cigarette butts in images. The algorithm uses color segmentation, edge detection, and enhancement techniques in Matlab. It turns the original image into a binary image segmented by the color of cigarette butts. Color and edge detection are used to create a binary mask. Enhancement techniques like dilation and hole filling are applied to smooth edges before labeling objects with random colors for visualization. While the algorithm identifies most cigarette butts, it does not fully eliminate background noise.
1) The document describes a project to automate the identification of individual fin whales from photographs by applying machine learning techniques. It involves segmenting whale images to isolate identifying features, extracting features from the images using pre-trained convolutional neural networks, and classifying the whales based on these features.
2) The dataset contained 884 images of 79 individual whales, which is much smaller than datasets used in previous whale identification research. This limited the complexity of models that could be trained without overfitting. Significant effort was spent preprocessing the images to improve the signal-to-noise ratio before classification.
3) Various techniques were tested for segmentation, including Markov random fields and hidden Markov random field expectation maximization. Features were then
Simon Fraser University
Project report for CMPT 441/711 Bioinformatics Algorithms
Illuminating the Diversity of Pathogenic Bacteria, Borrelia Burgdorferi in Ticks
Authors: Stanley Gan, Elijah Willie, Arthur Song, Ruochen Jiang
Supervisor: Dr. Leonid Chindelevitch
Co-Supervisor: Katharine Walter
December 8, 2016
Abstract
In this group project, we assess the diversity of the pathogenic
bacterium Borrelia burgdorferi by studying the co-infection patterns
in our samples. We combine a probabilistic approach with optimization
techniques to attack this problem. First, we used several
bioinformatics toolkits, including Bowtie, IGV, and Samtools, for read
mapping and read visualization. Next, based on the read-mapping
results, we calculated the proportion of each variant observed at each
locus for each sample using Bayes' rule. Finally, we introduce several
approaches for estimating the proportions of the different
co-infecting strain types in each sample while introducing a minimum
number of new strain types into the existing reference database. The
three approaches we attempted during the project are mixed integer
linear programming (Mixed ILP), network flow (NF), and a genetic
algorithm model (GA). In this report we provide only the details of
these approaches, because the formulations need to be reviewed before
implementation in order to obtain any significant results. However,
from the calculated proportions, we are able to infer that all of our
samples are indeed co-infected by multiple strain types.
1 Introduction
Borrelia burgdorferi is the bacterial agent that causes Lyme disease in
humans and is spread by Ixodes ticks.[1] Lyme disease is a vector-borne
infection; numerous vertebrate species are capable of transmitting it,
and humans are one of the dead-end hosts.[1] There are over 30,000
cases reported in the United States, which makes it one of the most
common vector-borne diseases spreading across North America. Because
Lyme disease is so widespread, studying the strain diversity of
Borrelia is interesting yet challenging. In our project, we address two
biological questions. First, we would like to determine the number of
different strain types present in each tick sample. Second, we would
like to compare these strain types with the existing reference database
and suggest a set of new strain types if necessary. One of the
motivations behind these questions is co-infection patterns. By
understanding these patterns, we can investigate the complexity of
heterogeneous bacterial populations. One of the forces that drive
bacterial diversity is genetic recombination, which can only happen in
samples co-infected by at least two different bacterial strain
types.[1] In our context, genetic recombination means the exchange of
genetic material between chromosomes of different bacterial
strains.[7] Moreover, different bacterial genotypes have different
transmission rates among hosts and/or vectors, which can produce
distinct transmission cycles and lead to differences in the disease
risk contributed by different host populations.[8] Furthermore,
different bacterial strain types are likely to affect humans
differently. For example, [9] describes arthritis as a symptom that
often accompanies B. burgdorferi s.s. infection, whereas neurological
symptoms are associated with B. garinii and skin disorders with
B. afzelii. Therefore, understanding co-infection patterns, which in
turn illuminates bacterial strain diversity, is important for
developing more reliable prevention and control protocols.
2 Previous work
Some related biological aspects of our project were studied previously
by Katharine Walter's group[1]. They pointed out that within-host
pathogen diversity may have important implications for human health and
disease epidemiology, because hosts are frequently co-infected with
multiple pathogen species. They chose Borrelia burgdorferi as the model
for studying within-host processes and examined its genomic variation
in 98 individual field-collected tick vectors. Their experiment shows
that 70% of the ticks are infected with multiple strains. Their work
suggests that disease vectors such as ticks can be studied as
epidemiological sentinels.
The method used to capture the genome dataset used in our project was
studied by Giovanna Carpi's group[2] in 2015. They used custom probes
for multiplexed hybrid capture to sequence 30 Borrelia burgdorferi
genomes and found that it recovered nearly the complete genome (99.5%)
of Borrelia burgdorferi.
In addition, in 2013 Wibke Cramaro's group[3] studied Ixodes ricinus,
which is the most common tick species and the most important vector of
human and animal pathogens in western Europe. They sequenced the Ixodes
ricinus genome and made all sequence data available in a public
database.
3 Data Description
The data we work with are whole-genome sequences from 30 tick samples,
sequenced using hybrid capture and Illumina short-read technology
(paired-end, 75 bp)[2]. We also have a reference database of 679 strain
types based on multi-locus sequence typing (MLST) of 8 housekeeping
genes. In this context, MLST is a molecular typing method[11] in which
the following 8 housekeeping genes are sequenced: clpA, clpX, nifS,
pepX, pyrG, recG, rpiB and uvrA. A unique Borrelia strain type is
defined by its alleles at these 8 housekeeping genes (please refer to
[10] for a definition based on the ospC gene).
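One way to picture this definition is as a lookup from an eight-gene allele profile to a strain type. The sketch below is illustrative only: the allele numbers, the strain-type labels, and the helper `strain_type_of` are hypothetical and are not taken from the reference database.

```python
# The eight MLST housekeeping genes listed above, in a fixed order.
GENES = ("clpA", "clpX", "nifS", "pepX", "pyrG", "recG", "rpiB", "uvrA")

# Hypothetical reference entries: allele profile (in GENES order) -> strain type.
reference = {
    (1, 1, 1, 1, 1, 1, 1, 1): "ST-a",
    (1, 2, 1, 1, 3, 1, 1, 1): "ST-b",
}


def strain_type_of(alleles):
    """Return the strain type for a gene -> allele mapping, or None if the profile is new."""
    profile = tuple(alleles[gene] for gene in GENES)
    return reference.get(profile)


known = dict(zip(GENES, (1, 2, 1, 1, 3, 1, 1, 1)))
novel = dict(zip(GENES, (9, 9, 9, 9, 9, 9, 9, 9)))
print(strain_type_of(known), strain_type_of(novel))
```

A profile that is absent from the lookup corresponds to a candidate new strain type.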
4 Problem Illustration and Description
Figure 1: Illustration of the problem
We are given the data about the genotypes observed at all loci for N sam-
ples and their respective proportions. For example in Figure 1, we observed
genotype of type A and B with proportion 0.75 and 0.25 respectively at locus
1 for sample 1. Given the genotypes observed at all loci for N samples, we
3
5. produce different strains based on combinations of genotypes at all loci while
preserving proportions. For example in Figure 1, we have shown 2 different
examples which are J={0.25BXU, 0.25AXV, 0.5AYW} and K={0.25BYU,
0.25AYV, 0.5AXW}. In J, proportion of B is 0.25(from BXU), proportion
of A is 0.25(from AXV) + 0.5(AYW)=0.75, ... and so on. The proportions
are preserved. We restate the problem in a mathematical perspective:
Given a library of known strain types (679 types), we try to explain what
we see (the genotypes) using as few new strain types as possible. In other
words, we want to minimize the number of new strains introduced to our
existing library.
As a simple example, if our library contains {BXU, AXV, ... }, we will
choose J instead of K, as we would have to introduce at least 2 new strain
types if we chose K. There are, of course, other criteria to consider, such
as the proportions.
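The comparison between J and K can be checked with a few lines of Python. This is only an illustrative sketch: the library below contains just the two strain types named in the text (the elided entries are unknown, so the exact count of new strains for K is an assumption of this toy library), and the function names are hypothetical.

```python
from collections import defaultdict

def locus_proportions(solution):
    """Sum strain proportions per (locus, genotype) pair."""
    totals = defaultdict(float)
    for strain, proportion in solution.items():
        for locus, genotype in enumerate(strain, start=1):
            totals[(locus, genotype)] += proportion
    return dict(totals)

def new_strains(solution, library):
    """Strain types used by the solution that are missing from the library."""
    return sorted(s for s in solution if s not in library)

# The two candidate solutions from Figure 1.
J = {"BXU": 0.25, "AXV": 0.25, "AYW": 0.5}
K = {"BYU": 0.25, "AYV": 0.25, "AXW": 0.5}

# Hypothetical library: only the two entries that the text spells out.
library = {"BXU", "AXV"}

# Both solutions reproduce the same per-locus proportions ...
same_proportions = locus_proportions(J) == locus_proportions(K)
# ... but J needs fewer strain types that are not in the library.
print(same_proportions, new_strains(J, library), new_strains(K, library))
```

Under this toy library J introduces one new strain type and K introduces three, which is why J is preferred.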
5 Methods
Before dealing with any optimization, we have to calculate the proportions
as illustrated in Figure 1.
5.1 Calculating Proportions
The first step was to extract the sequences of each of the 8 housekeeping
genes and their individual variants. These sequences were provided to us,
which made advancing to the next step much easier. The next step was to
compute, for each of the thirty tick samples whose whole genome was
captured, the proportion of reads that map to each variant of every
housekeeping gene. This is accomplished by first using Bowtie [4] to map
all the reads from each of the thirty samples to each of the eight
housekeeping genes. Less than one percent of the reads in each sample map
to a gene. This is to be expected, as the whole genome was sequenced and we
are only interested in eight of its genes. Next, Integrative Genomics
Viewer (IGV) [5] and Samtools "tview" [6] were used to visualize the reads
that mapped to each variant of each gene. Visualization was necessary
because it enabled us to check whether any reads mapped to a variant only
partially. We found that over 99% of the reads that mapped to a variant
fell entirely inside that variant. For each gene, to compute the proportion
of reads that map to a variant, we are interested in computing
$$P(\mathrm{var}_i \mid \mathrm{read}_j) \tag{1}$$
That is, we are interested in computing the probability of a variant given a
read. This is very difficult to do, as we do not have any prior information
about the distribution of variants for a gene. However, Bayes’ rule states
that
$$P(\mathrm{var}_i \mid \mathrm{read}_j) = \frac{P(\mathrm{read}_j \mid \mathrm{var}_i)\, P(\mathrm{var}_i)}{P(\mathrm{read}_j)} \tag{2}$$
Thus we do not need to compute equation (1) directly; we can compute
$$P(\mathrm{read}_j \mid \mathrm{var}_i) \tag{3}$$
and multiply it by a proportionality constant to get
$$P(\mathrm{var}_i \mid \mathrm{read}_j) = P(\mathrm{read}_j \mid \mathrm{var}_i)\, k_j \tag{4}$$
where
$$k_j = \frac{P(\mathrm{var}_i)}{P(\mathrm{read}_j)} \tag{5}$$
By summing over all variants and equating to 1, we can solve for (5), i.e.
$$\sum_i k_j\, P(\mathrm{read}_j \mid \mathrm{var}_i) = 1 \tag{6}$$
To compute (3) we can appeal to the binomial distribution, since we are
given that sequencing errors introduce mismatches within a mapping at a
rate of $\frac{1}{100}$ per base. Thus we can use the number of mismatches
that Bowtie reports for a mapped read, together with the binomial
distribution, to compute (3):
$$P(\mathrm{read}_j \mid \mathrm{var}_i) = \binom{m_j}{l_j} \left(\frac{1}{100}\right)^{l_j} \left(\frac{99}{100}\right)^{m_j - l_j} \tag{7}$$
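where $m_j$ is the length of $\mathrm{read}_j$ and $l_j$ is the number of
mismatches in the mapping between $\mathrm{read}_j$ and $\mathrm{var}_i$
(so $l_j$ also depends on $i$). Plugging (7) into (6), we get
$$\sum_i k_j \binom{m_j}{l_j} \left(\frac{1}{100}\right)^{l_j} \left(\frac{99}{100}\right)^{m_j - l_j} = 1 \tag{8}$$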
Thus we are now able to compute $k_j$ for every $\mathrm{read}_j$. Now we
are fully equipped to compute the proportions for each variant in a gene.
The proportion of $\mathrm{var}_i$ for a gene G will be
$$\frac{\sum_j P(\mathrm{var}_i \mid \mathrm{read}_j)}{\sum_h \sum_j P(\mathrm{var}_h \mid \mathrm{read}_j)} \tag{9}$$
That is, we sum over all reads that map to a particular variant of gene G,
divide by the same sum taken over all variants of that gene, and multiply
by 100 to get proportions in percentages.
5.2 Optimization
For the optimization part, we have 3 approaches that we tried during the
course of our project.
5.2.1 Mixed Integer Linear Programming
The idea in this program is to formulate the problem rigorously and to
minimize both the number of new strains and the proportions assigned to
them, using 0/1 indicator weights. This program also captures the possible
errors between the true proportion of a variant and its observed
proportion, which may arise from sampling in the lab. These errors are
also included in the objective function.
Known Parameters
• Number of loci: 8
• Number of samples: 30
• Set of different genotypes observed on sample i at locus j: $G_{ij} = \{g_{ij}^{(1)}, g_{ij}^{(2)}, \dots\}$.
• $P_{ij} = \{p_{ij}^{(1)}, p_{ij}^{(2)}, \dots\}$, where $p_{ij}^{(k)}$ corresponds to the proportion of genotype $g_{ij}^{(k)}$ in $G_{ij}$. (Note: $\sum_{k=1}^{|P_{ij}|} p_{ij}^{(k)} = 1$, $\forall 1 \le i \le 30$, $\forall 1 \le j \le 8$)
• Reference = $\Omega$ where $|\Omega| = 679$
• Set of possible different strains for sample i (different combinations of the genotypes we observed at all loci): $V_i = \{V_i^{(1)}, V_i^{(2)}, \dots, V_i^{(H_i)}\}$, where $H_i = \prod_{j=1}^{L} |G_{ij}|$ and L is the number of loci
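• A representation of the strain type $V_i^{(k)}$, the k-th combination for the i-th sample, $\forall 1 \le i \le 30$:
$$V_i^{(k)} = \begin{pmatrix}
a^{(k)}_{i1,1} & a^{(k)}_{i2,1} & \cdots & a^{(k)}_{iL,1} \\
a^{(k)}_{i1,2} & \cdots & \cdots & a^{(k)}_{iL,2} \\
\vdots & \vdots & & \vdots \\
a^{(k)}_{i1,|N_{i1}|} & \vdots & & \vdots \\
\vdots & \vdots & & \vdots \\
a^{(k)}_{i1,R_i} & a^{(k)}_{i2,R_i} & \cdots & a^{(k)}_{iL,R_i}
\end{pmatrix}$$
where $a^{(k)}_{ij,r} \in \{0, 1\}$ and $R_i = \max_j |G_{ij}|$. Column j corresponds to locus j and row r to the r-th genotype observed at that locus. For those j such that $|G_{ij}| < R_i$, $a^{(k)}_{ij,r} = 0$ for $r = |G_{ij}| + 1, \dots, R_i$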
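• For the example in the problem description, if $V_1 = \{V_1^{(1)} = BXU,\ V_1^{(2)} = AXV,\ V_1^{(3)} = AYW\}$, the matrix representation is as follows:
$$V_1^{(1)} = \begin{pmatrix} 0 & 1 & 1 \\ 1 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix},\quad
V_1^{(2)} = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix},\quad
V_1^{(3)} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$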
• Weight for each strain type $V_i^{(k)}$: $w_i^{(k)}$, where $w_i^{(k)} = 1$ iff $V_i^{(k)}$ is a new strain type, otherwise $w_i^{(k)} = 0$
• Weight for the proportion of each strain type $V_i^{(k)}$: $c_i^{(k)}$, where $c_i^{(k)} = 1$ iff $V_i^{(k)}$ is a new strain type, otherwise $c_i^{(k)} = 0$
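The matrix representation and the count $H_i$ can be reproduced for the Figure 1 example with a short Python sketch. The ordering of genotypes within each locus (A before B, X before Y, U before V before W) is inferred from the example matrices above, and the function name is hypothetical.

```python
from itertools import product
from math import prod

def strain_matrix(strain, genotypes):
    """R x L 0/1 matrix: entry [r][j] is 1 iff the strain carries the
    r-th observed genotype at locus j; padded rows stay 0."""
    rows = max(len(g) for g in genotypes)  # R_i = max_j |G_ij|
    matrix = [[0] * len(genotypes) for _ in range(rows)]
    for j, genotype in enumerate(strain):
        matrix[genotypes[j].index(genotype)][j] = 1
    return matrix

# Genotypes observed at the three loci of sample 1 in Figure 1.
genotypes = [["A", "B"], ["X", "Y"], ["U", "V", "W"]]

# H_i: number of candidate strains, one genotype chosen per locus.
H = prod(len(g) for g in genotypes)
candidates = ["".join(combo) for combo in product(*genotypes)]

for strain in ("BXU", "AXV", "AYW"):
    print(strain, strain_matrix(strain, genotypes))
print(H, candidates)
```

The candidate set grows as the product of the number of genotypes per locus, which is why the number of variables in the program can become large for samples with many observed variants.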
Decision Variables
• Indicator variable $a_i^{(k)}$, where $a_i^{(k)} = 1$ iff $V_i^{(k)}$ is chosen to explain the samples, otherwise $a_i^{(k)} = 0$
• Proportion of the strain type $V_i^{(k)}$: $\pi_i^{(k)}$
• $E_{ij} = \{e_{ij}^{(1)}, e_{ij}^{(2)}, \dots\}$, where $e_{ij}^{(k)}$ corresponds to the error of the observed proportion $p_{ij}^{(k)}$ of $g_{ij}^{(k)}$ from its true proportion.
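• For convenience, let
$$\Phi_i = \begin{pmatrix}
p^{(1)}_{i1} & p^{(1)}_{i2} & \cdots & p^{(1)}_{iL} \\
p^{(2)}_{i1} & \cdots & \cdots & p^{(2)}_{iL} \\
\vdots & \vdots & & \vdots \\
p^{(|N_{i1}|)}_{i1} & \vdots & & \vdots \\
\vdots & \vdots & & \vdots \\
p^{(R_i)}_{i1} & p^{(R_i)}_{i2} & \cdots & p^{(R_i)}_{iL}
\end{pmatrix}, \quad \forall 1 \le i \le 30.$$
For those j such that $|G_{ij}| < R_i$, $p_{ij}^{(k)} = 0$ for $k = |G_{ij}| + 1, \dots, R_i$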
Constraints
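• $p_{ij}^{(k)} \in [0, 1]$, $e_{ij}^{(k)} \in [-p_{ij}^{(k)},\ 1 - p_{ij}^{(k)}]$ $\forall i, j, k$, and $\sum_{k=1}^{|P_{ij}|} \left(p_{ij}^{(k)} + e_{ij}^{(k)}\right) = 1$, $\forall 1 \le i \le 30$, $\forall 1 \le j \le 8$
• $a_i^{(k)} \in \{0, 1\}$ $\forall i, k$
• $\pi_i^{(k)} \in [0, 1]$ and $\sum_{k=1}^{H_i} \pi_i^{(k)} = 1$ for all i
• $e_{ij}^{(k)} \le T_i^{(k)} - p_{ij}^{(k)}$ and $e_{ij}^{(k)} \le p_{ij}^{(k)} - T_i^{(k)}$, where $T_i^{(k)} = \sum_{(i,k):\ g_{ij}^{(k)} \in V_i^{(k)}} \pi_i^{(k)}$
• For every sample i, where $1 \le i \le 30$: $\sum_{k=1}^{H_i} \pi_i^{(k)} \cdot V_i^{(k)} = \Phi_i$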
Objective Function
$$\min \sum_{i,k} \left( w_i^{(k)} \cdot a_i^{(k)} + c_i^{(k)} \cdot \pi_i^{(k)} \right) + \sum_{i,j,k} e_{ij}^{(k)}$$
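To make the objective concrete, the Python sketch below evaluates it for the two candidate solutions J and K of Figure 1 instead of solving the full program. It rests on several assumptions that are not part of the formulation above: all error terms are set to 0, the indicator is taken to be 1 exactly when a strain has positive proportion, the library holds only the two strain types named in the problem description, and the proportions at loci 2 and 3 are inferred from solution J. The function names are hypothetical.

```python
def matches_observed(solution, observed, tol=1e-9):
    """Check sum_k pi_k * V_k = Phi locus by locus (errors assumed 0)."""
    for j, locus in enumerate(observed):
        if any(strain[j] not in locus for strain in solution):
            return False  # strain uses a genotype that was not observed
        for genotype, proportion in locus.items():
            explained = sum(pi for strain, pi in solution.items()
                            if strain[j] == genotype)
            if abs(explained - proportion) > tol:
                return False
    return True

def objective(solution, library, total_error=0.0):
    """Objective value for one sample with w = c = 1 for new strains."""
    value = total_error
    for strain, pi in solution.items():
        if pi > 0 and strain not in library:  # a = 1 and strain is new
            value += 1 + pi                   # w * a + c * pi
    return value

# Observed proportions per locus (loci 2 and 3 inferred from solution J).
observed = [
    {"A": 0.75, "B": 0.25},
    {"X": 0.5, "Y": 0.5},
    {"U": 0.25, "V": 0.25, "W": 0.5},
]
library = {"BXU", "AXV"}  # hypothetical: only the entries named in the text

J = {"BXU": 0.25, "AXV": 0.25, "AYW": 0.5}
K = {"BYU": 0.25, "AYV": 0.25, "AXW": 0.5}

for name, solution in (("J", J), ("K", K)):
    print(name, matches_observed(solution, observed), objective(solution, library))
```

Both candidates satisfy the proportion constraint, and the lower objective value of J reflects that it introduces fewer new strain types and assigns them less proportion.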
5.2.2 Network Flow
In the network flow (NF) model, we try to tackle the problem using a
two-step approach. The first step is to maximize the proportions explained
by existing strain types, which is analogous to finding a maximum flow in a
network. The second step is to explain the remaining proportions using a
minimal number of new strains. Our group formulated and implemented the
first step. However, there are technicalities that have to be resolved
before the model can be used. In this report, we introduce the idea that we
tried and the reason it does not work.
We create a network for each sample, and we use a simplified example to
illustrate it.
Figure 2: Simplified example for a sample
Figure 3: Network for simplified example
Consider the simplified example shown in Figure 2. In this example we have
1 sample and 3 loci, and we assume that we have 4 reference strain types:
ACE, ADF, ACF and BCE. The proportion of each variant is shown in the
table. We build the network shown in Figure 3. It consists of several
layers: Source layer, Locus layer, Reference layer, Merge layer and Sink
layer. The flow paths will suggest which variants to choose to explain the
sample.
Construction: (1) Build layers: We construct the whole network layer by
layer. First we have the Source layer, which contains only the source node
required by a network flow structure. Next we build the Locus layer, which
contains 3 sub-layers, one for each locus. Each sub-layer contains one node
per observed variant. Then we have the Reference layer, which contains all
reference strain types that contain any of the observed variants. For
example, if XYZ is also a reference strain type but we did not observe any
of the variants X, Y, Z at the 3 loci respectively, we do not include it in
the network for this sample. Finally, we have the Merge layer, which
contains a merge node, and the Sink layer, which contains the sink node
required by a network flow structure.
(2) Edges: Connect the source node to all nodes in the first sub-layer of
the Locus layer. The edges between the sub-layers of the Locus layer depend
on the reference strain types. For example, since ACE is a reference strain
type, we connect A → C → E. Each node in the last sub-layer is connected to
all relevant reference nodes (those whose last variant is the same as the
node). Finally, we connect all reference nodes to the Merge layer, and the
Merge layer to the Sink layer.
(3) Assign capacities: For each edge pointing to a node in the Locus layer,
the capacity is the proportion of that node's variant in the table. Edges
pointing to reference nodes, the Merge layer and the Sink layer are not
limited; we simply assign them a maximum capacity.
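The construction above can be sketched in Python as follows. Several points are assumptions made for illustration only: the variant proportions are made up (the table of Figure 2 is not reproduced in the text) and are written as integer percentages, references containing an unobserved variant are simply skipped, and the maximum flow is computed with a plain Edmonds-Karp routine, which may differ from whatever the project's implementation used.

```python
from collections import deque

BIG = 10**9  # stands in for "no limitation" on an edge

def build_network(proportions, references):
    """Capacity map for one sample; `proportions` has one dict per locus."""
    capacity = {}

    def add_edge(u, v, cap):
        capacity.setdefault(u, {})[v] = cap

    for variant, p in proportions[0].items():
        add_edge("source", f"L0:{variant}", p)
    for ref in references:
        # Simplification: skip references that use an unobserved variant.
        if any(v not in proportions[j] for j, v in enumerate(ref)):
            continue
        for j in range(1, len(ref)):
            add_edge(f"L{j-1}:{ref[j-1]}", f"L{j}:{ref[j]}", proportions[j][ref[j]])
        add_edge(f"L{len(ref)-1}:{ref[-1]}", f"ref:{ref}", BIG)
        add_edge(f"ref:{ref}", "merge", BIG)
    add_edge("merge", "sink", BIG)
    return capacity

def max_flow(capacity, source, sink):
    """Edmonds-Karp: repeatedly augment along shortest residual paths."""
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)  # reverse edges
    total = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total
        # Find the bottleneck on the path, then push that much flow.
        bottleneck, v = BIG, sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        total += bottleneck

# Made-up variant proportions (percent) for the three loci of Figure 2.
proportions = [{"A": 60, "B": 40}, {"C": 70, "D": 30}, {"E": 50, "F": 50}]
references = ["ACE", "ADF", "ACF", "BCE"]

network = build_network(proportions, references)
print(max_flow(network, "source", "sink"))
```

Integer percentages are used so that the augmenting-path arithmetic stays exact; with fractional proportions the same code works but the result is subject to floating-point rounding.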
Figure 4: A flaw in simplified example
Problem: The maximum flow through a particular reference type represents
the maximum proportion of that reference type that we are able to use
without violating any constraint. This model is intuitive, but it does not
preserve the sequential relationship of the variants in the reference
types. The problem is illustrated in Figure 4: BCF is not a reference
strain type, and hence it is not in the Reference layer. Nevertheless, a
maximum flow path may go through node B, node C and node F in the Locus
layer. One possible way to preserve the sequential relationship of the
variants in the reference types is the following:
1. Require each node in the Locus layer to have in-degree = 1
2. If in-degree = k > 1, create k copies of the node
3. Given the sequential relationship of a known strain type, for example
uvw, u → v → w, if in-degree(w) = 1, connect v to an unmatched copy
of w
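The flaw can be reproduced without running a flow computation: it is enough to enumerate the paths that the Locus layer edges allow and compare them with the reference types. The Python sketch below does this for the four references of the simplified example; the function names are hypothetical.

```python
from itertools import product

def locus_edges(references):
    """Edges between consecutive Locus sub-layers implied by the references."""
    edges = set()
    for ref in references:
        for j in range(len(ref) - 1):
            edges.add((j, ref[j], ref[j + 1]))
    return edges

def spurious_paths(references):
    """Paths through the Locus layer that are not reference strain types."""
    edges = locus_edges(references)
    n_loci = len(references[0])
    variants = [sorted({ref[j] for ref in references}) for j in range(n_loci)]
    found = []
    for combo in product(*variants):
        connected = all((j, combo[j], combo[j + 1]) in edges
                        for j in range(n_loci - 1))
        if connected and "".join(combo) not in references:
            found.append("".join(combo))
    return found

references = ["ACE", "ADF", "ACF", "BCE"]
print(spurious_paths(references))
```

For these references the only path that is connected in the Locus layer but absent from the library is B → C → F, the case shown in Figure 4.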
5.2.3 Genetic Algorithm model
Since it may be difficult to find the correct objective function, we also
attempted to design a genetic algorithm (GA) model to solve the problem
posed in the network flow section (we want to use the references in the
library as much as possible to explain the sample). A GA is an algorithm
that uses stochastic mutation to simulate the evolutionary process of
nature, which helps us explain the variations observed.
A common GA model has several parts: the initial population, the
environment, and the mutation function. In a GA, we try to evolve the
initial population into a final population that fits the environment best.
In our problem, we need another component to make the computation faster,
which we call the torch function. In this problem, the population refers to
all the variables that need to be computed. For example, if we have three
references, BXU, AXV and AYW, we have three variables prr1 (BXU), prr2
(AXV), prr3 (AYW) to compute.
Initial population: Simply let $prr_i$ be the minimum proportion along its
path, because it is the maximum proportion that each reference node could
have.
Environment: The environment is a function that judges whether an
individual in the population is fit to survive. In this case, we test
whether the variables ($prr_i$) satisfy our constraints. For example, we
require that: (1) for every variant j, the total proportion of the
references that contain it does not exceed the proportion of that variant,
$$\sum_{i:\ \text{reference node } i \text{ contains variant } j} prr_i \le Pr_{\mathrm{variant}_j}$$
(2) $prr_i \le$ the minimum proportion among the variants that this
reference contains. If an individual satisfies these requirements, we can
infer that it possibly yields the kind of good result we expect. Hence, we
make a slight perturbation of this result and observe whether we get a
better one, which serves as the reason for the survival of this individual.
Mutation: The mutation function tells us how much modification can be made
from a previous result to a new one. For example, suppose we set the
mutation rate to be less than or equal to $\frac{3}{26}$ and apply this
mutation to the character 'D' of the alphabet: 'D' can then mutate to any
character from 'A' to 'G' in the next iteration. Usually, for a simple
version of mutation, we could use a random mutation and control its
mutation rate. This mutation rate is similar to the learning rate in the
stochastic gradient descent (SGD) method.
Torch: For the torch part, we record previous results and only keep
individuals that are better than the previous results. This serves as
guidance that helps the population evolve toward a particular target.
Finally, we combine all these components and let the initial population
live in our environment. After running a sufficient number of iterations,
we can get a good enough result. However, we may not get the optimal result
because we do not know the actual objective function F(x), and there might
be numerous x that satisfy F(x) = 0.
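The four components can be put together in a small Python sketch. It is a simplified stand-in rather than the model we implemented: the population is a single individual that is mutated repeatedly, the fitness (total explained proportion when the constraints hold, negative constraint violation otherwise) is an assumed objective, the mutation rate of 0.05 is arbitrary, and the variant proportions are made-up values for the references ACE, ADF, ACF and BCE of the network flow example.

```python
import random

def initial_population(references, variant_prop):
    """prr_i = minimum proportion along the reference's path (its upper bound)."""
    return {ref: min(variant_prop[(j, v)] for j, v in enumerate(ref))
            for ref in references}

def violation(prr, variant_prop):
    """Total amount by which the references over-use the observed variants."""
    excess = 0.0
    for (locus, variant), limit in variant_prop.items():
        used = sum(p for ref, p in prr.items() if ref[locus] == variant)
        excess += max(0.0, used - limit)
    return excess

def fits_environment(prr, variant_prop, tol=1e-9):
    """Environment: constraint (1); constraint (2) is enforced by clipping."""
    return violation(prr, variant_prop) <= tol

def score(prr, variant_prop):
    """Assumed fitness: explained proportion if feasible, else -violation."""
    if fits_environment(prr, variant_prop):
        return sum(prr.values())
    return -violation(prr, variant_prop)

def mutate(prr, caps, rate, rng):
    """Perturb every variable by at most `rate`, clipped to [0, cap]."""
    return {ref: min(caps[ref], max(0.0, p + rng.uniform(-rate, rate)))
            for ref, p in prr.items()}

def evolve(references, variant_prop, rate=0.05, generations=5000, seed=0):
    rng = random.Random(seed)
    caps = initial_population(references, variant_prop)
    best = dict(caps)
    for _ in range(generations):
        child = mutate(best, caps, rate, rng)
        # Torch: keep the child only if it beats the best result so far.
        if score(child, variant_prop) > score(best, variant_prop):
            best = child
    return best

# Made-up proportions keyed by (locus index, variant).
variant_prop = {(0, "A"): 0.6, (0, "B"): 0.4, (1, "C"): 0.7,
                (1, "D"): 0.3, (2, "E"): 0.5, (2, "F"): 0.5}
references = ["ACE", "ADF", "ACF", "BCE"]

best = evolve(references, variant_prop)
print(best, fits_environment(best, variant_prop))
```

Because the search is stochastic and the fitness is only an assumed objective, the result should be read as one good feasible assignment rather than a guaranteed optimum.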
6 Results
As the optimization problem is still in formulation, we can only present some
results for the proportions.
Figure 5: A part of the results from computing proportions for genes clpX,
clpA and nifS. Note the proportions are presented as percentages
Figure 5 shows partial results after computing the proportions for three
genes (clpX, clpA, and nifS). From these tables, we can see that the
proportion percentages are all relatively small. This indicates that no
sample has a variant with a markedly greater proportion than the others. We
can thus infer that each sample is infected with a substantial number of
different strains.
7 Discussion and Future Work
As most of our group members were unfamiliar with bioinformatics and the
biological aspects of this problem, during the course of this project we
learned a great deal about the biological importance of the problem and the
terminology used in bioinformatics, especially while using the toolkits.
Moreover, we gained experience applying a wide range of knowledge in
probability and mathematics, from simple yet powerful theorems such as
Bayes’ rule to sophisticated algorithmic techniques such as network flows
and mixed ILP.
We faced a few challenges during the course of our project. One of them was
understanding a wide range of biological terminology, which presented a
steep learning curve at the beginning of the project. Furthermore, based on
our research, few resources were available for the optimization part of
this project. We therefore had to formulate new ideas to tackle the
optimization problem, and we encountered numerous failures while trying to
produce a good formulation of the problem. This part of the project took a
substantial amount of time. Nevertheless, we are glad to have faced these
obstacles, as they gave us the chance to study the problem in great depth
and helped us formulate better representations of it.
Once there is a clear and robust formulation of the optimization problem,
we will have better insight into co-infection patterns among our tick
samples, which could subsequently contribute to more effective ecological
control of the spread of the disease. We could also tweak the mixed ILP
model by trying more complex weight functions rather than 0/1 weights,
which might capture the biological meaning better. This approach can also
be applied to pathogenic bacteria other than Borrelia burgdorferi. Hence,
the approaches explained in this report might serve as a general platform
for computational biologists or life scientists in their areas of interest.
8 Supplementary Materials
In our first step, we used a Python script to map reads and compute the
proportions of reads that map to each variant. The Python script is
available at http://tiny.cc/wexkhy. Computing the proportions produced a
large set of results. Figure 5 in the Results section shows a small part of
them, and the complete computed results are available at
http://tiny.cc/ezjkhy.
We also used IGV and Samtools tview to visualize mapped reads. Figure 6 and
Figure 7 show parts of our results.
Figure 6: A part of the visualized results using IGV. Arrows show the read
orientation, and colours show the read pairing.
Figure 7: A part of the visualized results using Samtools. The top strand
shows the reference sequence, '.' and ',' show the paired sequences, and
mismatches are shown as single characters.
References
[1] Walter KS, Carpi G, Evans BR, Caccone A, Diuk-Wasser MA (2016)
Vectors as Epidemiological Sentinels: Patterns of Within-
Tick Borrelia burgdorferi Diversity. PLoS Pathog 12(7): e1005759.
doi:10.1371/journal.ppat.1005759
[2] Carpi G, Walter KSK, Bent SJS, Hoen AGA, Diuk-Wasser M, et al.
(2015) Whole genome capture of vector-borne pathogens from
mixed DNA samples: a case study of Borrelia burgdorferi.
[3] Wibke Jochum, Anna L. Reye and Claude P. Muller (2013) Whole
genome sequencing of Ixodes ricinus, the European Lyme dis-
ease vector.
[4] Langmead B, Trapnell C, Pop M, Salzberg SL (2009) Ultrafast and
memory-efficient alignment of short DNA sequences to the hu-
man genome. Genome Biol 10:R25.
[5] James T. Robinson, Helga Thorvaldsdóttir, Wendy Winckler, Mitchell
Guttman, Eric S. Lander, Gad Getz, Jill P. Mesirov. (2011) Integrative
Genomics Viewer. Nature Biotechnology 29, 24–26
[6] Li H.*, Handsaker B.*, Wysoker A., Fennell T., Ruan J., Homer N.,
Marth G., Abecasis G., Durbin R. and 1000 Genome Project Data
Processing Subgroup (2009) The Sequence alignment/map (SAM)
format and SAMtools. Bioinformatics, 25, 2078-9.
[7] Clancy, S., (2008). Genetic recombination. Nature education, 1(1),
p.40.
[8] Jacquot, M., Abrial, D., Gasqui, P., Bord, S., Marsot, M., Masseglia,
S., Pion, A., Poux, V., Zilliox, L., Chapuis, J.L. and Vourc’h, G.,
(2016). Multiple independent transmission cycles of a tick-borne
pathogen within a local host community. Scientific Reports, 6.
[9] Tilly, K., Rosa, P.A. and Stewart, P.E., (2008). Biology of infection
with Borrelia burgdorferi. Infectious disease clinics of North Amer-
ica, 22(2), pp.217-234.
[10] Barbour, A.G. and Travinsky, B., (2010). Evolution and distribution
of the ospC gene, a transferable serotype determinant of
Borrelia burgdorferi. MBio, 1(4), pp.e00153-10.
[11] Maiden, M.C., Bygraves, J.A., Feil, E., Morelli, G., Russell, J.E., Ur-
win, R., Zhang, Q., Zhou, J., Zurth, K., Caugant, D.A. and Feavers,
I.M., (1998). Multilocus sequence typing: a portable approach
to the identification of clones within populations of pathogenic
microorganisms. Proceedings of the National Academy of Sciences,
95(6), pp.3140-3145.