This document provides a review and summary of major scientific events, trends, and publications in translational bioinformatics in 2008, presented by Russ B. Altman of Stanford University. Key topics include the sequencing and analysis of an individual's diploid genome, next-generation sequencing technologies, genome-wide association studies, pharmacogenomics, analysis of high-throughput molecular data, neuroscience datasets, and the use of molecular information to improve disease detection and treatment. The review highlights more than 25 seminal papers from 2008 and offers insights on emerging trends in the field.
Large scale machine learning challenges for systems biology
by dr. Yvan Saeys - Machine Learning and Data Mining group, Bioinformatics and Systems Biology Division, VIB-UGent Department of Plant Systems Biology
Due to technological advances, the amount of biological data, and the pace at which it is generated, have increased dramatically during the past decade. To extract new knowledge from these ever-increasing data sets, automated techniques such as data mining and machine learning have become standard practice.
In this talk, I will give an overview of large scale machine learning challenges in bioinformatics and systems biology, highlighting the importance of using scalable and robust techniques such as ensemble learning methods implemented on large computing grids.
I will present some of our state-of-the-art tools to solve problems such as biomarker discovery, large scale network inference, and biomedical text mining at PubMed scale.
Functional Genomics Journal Club presentation on the following publication:
Kuzawa, C. W., Chugani, H. T., Grossman, L. I., Lipovich, L., Muzik, O., Hof, P. R., … Lange, N. (2014). Metabolic costs and evolutionary implications of human brain development. Proceedings of the National Academy of Sciences, 111(36), 13010–13015. https://doi.org/10.1073/pnas.1323099111
Drug discovery and development is a long and expensive process that has so notoriously bucked Moore's Law over time that it now has a law of its own, Eroom's Law ("Moore" spelled backwards). The attrition rate of drug candidates is estimated at up to 96%, and the average cost to develop a new drug has reached almost $2.5 billion in recent years. One of the major causes of the high attrition rate is drug safety, which accounts for 30% of drug failures. Even after a drug is approved and on the market, it can be withdrawn because of safety problems. Evaluating drug safety as extensively and as early as possible is therefore all the more important for accelerating drug discovery and development. This talk provides a high-level overview of the rational drug design process that has been in place for many decades and covers the major areas where AI, deep learning, and ML techniques have delivered the largest gains. Specifically, it surveys a variety of drug-safety-related AI and ML techniques currently in use, which can generally be divided into three main categories: 1. Classification, 2. Regression, 3. Read-across. The talk also covers how a hierarchical classification methodology can simplify the problem of assessing the toxicity of a given chemical compound, addresses recent progress in predictive models for various toxicities, and surveys publicly available databases, tools, and platforms that make them easy to leverage. We compare and contrast various modeling techniques, including deep learning, and their accuracy using recent research. Finally, the talk addresses some of the remaining challenges and limitations yet to be addressed in drug safety assessment.
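The "read-across" category mentioned above infers a property of a query compound from structurally similar compounds with known outcomes. The sketch below is a minimal, hypothetical illustration of that idea: the fingerprints, compound names, and toxicity labels are all invented, and real pipelines would use chemical fingerprints (such as ECFP) computed from actual structures.

```python
# Illustrative sketch only: "read-across" estimates the toxicity of a query
# compound from its nearest structural neighbours. Fingerprints here are
# invented sets of structural-feature IDs, not real chemistry.

def tanimoto(a, b):
    """Tanimoto similarity between two fingerprint bit sets."""
    inter = len(a & b)
    union = len(a | b)
    return inter / union if union else 0.0

def read_across(query_fp, library, k=3):
    """Predict toxicity as the majority label among the k most similar compounds."""
    ranked = sorted(library, key=lambda c: tanimoto(query_fp, c["fp"]), reverse=True)
    top = ranked[:k]
    votes = sum(1 for c in top if c["toxic"])
    return votes > k // 2, [(c["name"], round(tanimoto(query_fp, c["fp"]), 2)) for c in top]

# Hypothetical compound library with known toxicity labels.
library = [
    {"name": "cmpd_A", "fp": {1, 2, 3, 4}, "toxic": True},
    {"name": "cmpd_B", "fp": {1, 2, 3, 9}, "toxic": True},
    {"name": "cmpd_C", "fp": {7, 8, 9},    "toxic": False},
    {"name": "cmpd_D", "fp": {6, 7, 8},    "toxic": False},
]

toxic, neighbours = read_across({1, 2, 3, 5}, library)
print(toxic, neighbours)
```

The majority vote over the top-k neighbours is the simplest possible read-across rule; production systems weight neighbours by similarity and report an applicability domain rather than a bare prediction.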
Presentation for teaching faculty about resources, data, issues, and strategies for including personal genomics in the classroom, within the context of precision medicine as an overarching theme.
Prof. Mark Coles (Oxford University) - Data-driven systems medicine
The summary of Prof. Mark Coles' presentation from the Jun 11-12th 2019 event Data-driven systems medicine at Cardiff University Brain Research Imaging Centre.
The Impact of Information Technology on Chemistry and Related Sciences, by Ashutosh Jogalekar
This is a copy of an invited talk I gave at the ACS meeting in Dallas in March 2014. The talk was about the impact of information technology on chemistry and related sciences. I interpreted 'information technology' broadly and divided the talk into three sections: Data, Simulation and Sociology.
'Data' talks about how chemical information has grown exponentially and how chemists are coming up with new techniques to store, organize and understand this information.
'Simulation' talks about how chemists are using the last two decades' spectacular progress in hardware and software to understand the behavior of molecules in a variety of applications ranging from drug design to new materials.
'Sociology' talks about the impact of blogs and social media on the practice of chemistry. More specifically I talk about how social media is serving as a 'second tier' of peer review and how this new medium is having an increasingly influential impact on many issues close to chemists' hearts including lab safety, 'chemophobia' and the public appreciation of chemistry.
Update on the Gene Wiki project; introduction to the knowledge.bio semantic search application; introduction to the biobranch.org collaborative decision tree creator
Why You Should Join a Nonprofit Board, by Don Chamberlin
Despite the myth, a nonprofit board is open to just about anybody, not just those with lots of money in their pockets. Here are some reasons you shouldn't turn down this chance.
Apricot
Apricotapp is a simple solution for dietitians to connect with, track, and monitor their clients, improving results and increasing client motivation.
Keynote presented at the Phenotype Foundation's first annual meeting.
Describes data sharing, data annotation, and the need for further development of tools, ontologies, and ontology mappings.
Amsterdam, January 18, 2016
Data analysis & integration challenges in genomics
Presentation given at the Genomics Today and Tomorrow event in Uppsala, Sweden, 19 March 2015. (http://connectuppsala.se/events/genomics-today-and-tomorrow/) Topics include APIs, "querying by data set", machine learning.
Recently, it was reported that the forkhead box O (FoxO) transcription factor promotes human cytomegalovirus (HCMV) replication via direct binding to the promoters of the major immediate-early (MIE) genes, but how the FoxO factor impacts HCMV replication remains unknown. In this report, we found that HCMV, a betaherpesvirus, dramatically induces the expression of FOXOs in infected human fibroblasts. The induced FOXOs were recruited into the viral replication compartments (vRC) in the nucleus, especially at the late stage of infection. Suppression of FOXO expression by RNA interference significantly inhibited HCMV replication, and the production of progeny virus was markedly reduced. Mechanistically, FOXO knockdown severely crippled viral late gene expression at the transcriptional level, while it only marginally affected viral DNA synthesis. This study highlights how FoxO enhances HCMV gene transcription and viral replication to promote productive infection.
The Uneven Future of Evidence-Based Medicine, by Ida Sim
An Apple ResearchKit study enrolled 22,000 people in five days. A study claims that Twitter can be used to identify depressed patients. A computer program crunches genomic data, the published literature, and electronic health record data to guide cancer treatment. The pace, the data sources, and the methods for generating medical evidence are changing radically. What will — what should — evidence-based medicine look like in a faster, personalized, data-dense tomorrow?
- Presented as the 3rd Annual Cochrane Lecture, October 2015 in Vienna, Austria.
MSeqDR consortium: a grass-roots effort to establish a global resource aimed ...
The success of whole exome sequencing (WES) for highly heterogeneous disorders, such as mitochondrial disease, is limited by substantial technical and bioinformatics challenges to correctly identify and prioritize the extensive number of sequence variants present in each patient. The likelihood of success can be greatly improved if a large cohort of patient data is assembled in which sequence variants can be systematically analysed, annotated, and interpreted relative to known phenotype. This effort has engaged and united more than 100 international mitochondrial clinicians, researchers, and bioinformaticians in the Mitochondrial Disease Sequence Data Resource (MSeqDR) consortium that formed in June 2012 to identify and prioritize the specific WES data analysis needs of the global mitochondrial disease community. Through regular web-based meetings, we have familiarized ourselves with existing strengths and gaps facing integration of MSeqDR with public resources, as well as the major practical, technical, and ethical challenges that must be overcome to create a sustainable data resource. We have now moved forward toward our common goal by establishing a central data resource (http://mseqdr.org/) that has both public access and secure web-based features that allow the coherent compilation, organization, annotation, and analysis of WES and mtDNA genome data sets generated in both clinical- and research-based settings of suspected mitochondrial disease patients. The most important aims of the MSeqDR consortium are summarized in the MSeqDR portal within the Consortium overview sections. Consortium participants are organized in 3 working groups that include (1) Technology and Bioinformatics; (2) Phenotyping, databasing, IRB concerns and access; and (3) Mitochondrial DNA specific concerns. The online MSeqDR resource is organized into discrete sections to facilitate data deposition and common reannotation, data visualization, data set mining, and access management. 
With the support of the United Mitochondrial Disease Foundation (UMDF) and the NINDS/NICHD U54-supported North American Mitochondrial Disease Consortium (NAMDC), the MSeqDR prototype has been built. Current major components include common data upload and reannotation using a novel HBCR based annotation tool that has also been made publicly available through the website; MSeqDR GBrowse, which allows ready visualization of all public and MSeqDR-specific data, including lab-specific aggregate data visualization tracks; an MSeqDR-LSDB instance of nearly 1,250 mitochondrial disease and mitochondrially localized genes based on the Locus Specific Database model; exome data set mining in individuals or families using the GEM.app tool; and Account & Access Management. Within MSeqDR GBrowse it is now possible to explore data derived from MitoMap, HmtDB, ClinVar, UCSC-NumtS, ENCODE, 1000 Genomes, and many other resources that bioinformaticians recruited to the project are organizing.
With the explosion of interest in both enhanced knowledge management and open science, the past few years have seen considerable discussion about making scientific data "FAIR" — findable, accessible, interoperable, and reusable. The problem is that most scientific datasets are not FAIR. When left to their own devices, scientists do an absolutely terrible job creating the metadata that describe the experimental datasets that make their way into online repositories. The lack of standardization makes it extremely difficult for other investigators to locate relevant datasets, to re-analyse them, and to integrate those datasets with other data. The Center for Expanded Data Annotation and Retrieval (CEDAR) has the goal of enhancing the authoring of experimental metadata to make online datasets more useful to the scientific community. The CEDAR workbench for metadata management will be presented in this webinar. CEDAR illustrates the importance of semantic technology to driving open science. It also demonstrates a means for simplifying access to scientific data sets and enhancing the reuse of the data to drive new discoveries.
Introduction:
RNA interference (RNAi), or post-transcriptional gene silencing (PTGS), is an important biological process for modulating eukaryotic gene expression.
It is a highly conserved process of post-transcriptional gene silencing in which double-stranded RNA (dsRNA) causes sequence-specific degradation of mRNA.
dsRNA-induced gene silencing (RNAi) has been reported in a wide range of eukaryotes, including worms, insects, mammals, and plants.
This process mediates resistance to both endogenous parasitic and exogenous pathogenic nucleic acids, and regulates the expression of protein-coding genes.
What are small ncRNAs?
microRNA (miRNA)
short interfering RNA (siRNA)
Properties of small non-coding RNA:
Involved in silencing mRNA transcripts.
Called “small” because they are usually only about 21-24 nucleotides long.
Synthesized by first cutting up longer precursor sequences (like the 61-nt one that Lee discovered).
Silence an mRNA by base pairing with some sequence on the mRNA.
Discovery of siRNA?
The first small RNA:
In 1993, Rosalind Lee (Victor Ambros lab) was studying a non-coding gene in C. elegans, lin-4, that was involved in silencing another gene, lin-14, at the appropriate time in the development of the worm.
Two small transcripts of lin-4 (22 nt and 61 nt) were found to be complementary to a sequence in the 3' UTR of lin-14.
Because lin-4 encoded no protein, she deduced that it must be these transcripts that were causing the silencing through RNA-RNA interactions.
Types of RNAi (non-coding RNAs)
miRNA
Length: 23-25 nt
Trans-acting
Binds its target mRNA with mismatches
Causes translational inhibition
siRNA
Length: 21 nt
Cis-acting
Binds its target mRNA with a perfectly complementary sequence
piRNA (Piwi-interacting RNA)
Length: 25-36 nt
Expressed in germ cells
Regulates transposon activity
MECHANISM OF RNAI:
First the double-stranded RNA teams up with a protein complex named Dicer, which cuts the long RNA into short pieces.
Then another protein complex called RISC (RNA-induced silencing complex) discards one of the two RNA strands.
The RISC-docked, single-stranded RNA then pairs with the homologous mRNA and destroys it.
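The pairing step described above, in which the RISC-loaded guide strand base-pairs with its target mRNA, can be illustrated with a short sketch. The sequences below are invented, and a toy 7-nt guide stands in for a real ~21-nt siRNA; the point is only to show perfect antiparallel complementarity (siRNA-like) versus tolerated mismatches (miRNA-like).

```python
# Toy illustration of guide-strand pairing. The guide binds the mRNA
# antiparallel, so each mRNA window is reversed before comparing base-by-base.
# Sequences are invented; real siRNAs are ~21 nt long.

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def target_sites(guide, mrna, max_mismatches=0):
    """Positions on the mRNA where the guide base-pairs, allowing up to
    max_mismatches mismatched bases (0 = siRNA-like perfect match)."""
    n = len(guide)
    hits = []
    for i in range(len(mrna) - n + 1):
        window = mrna[i:i + n][::-1]          # antiparallel orientation
        mismatches = sum(1 for g, m in zip(guide, window) if COMPLEMENT[g] != m)
        if mismatches <= max_mismatches:
            hits.append(i)
    return hits

mrna = "AAGCUGCCAAGGUUCAACGGUAA"
print(target_sites("UUGGCAG", mrna))                    # perfect (siRNA-like) match
print(target_sites("UUGGCAG", mrna, max_mismatches=2))  # miRNA-like tolerance
```

Raising `max_mismatches` mimics the miRNA behaviour from the table above: imperfect pairing that tends to repress translation rather than trigger cleavage.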
THE RISC COMPLEX:
RISC is a large (>500 kDa) multi-protein RNA-binding complex that triggers degradation of the target mRNA.
The double-stranded siRNA is unwound by an ATP-independent helicase.
The active component of RISC is the Argonaute (Ago) protein, an endonuclease that cleaves the target mRNA.
DICER: endonuclease (RNase Family III)
Argonaute: Central Component of the RNA-Induced Silencing Complex (RISC)
One strand of the dsRNA produced by Dicer is retained in the RISC complex in association with Argonaute
ARGONAUTE PROTEIN :
1. PAZ (Piwi/Argonaute/Zwille): recognition of the target mRNA.
2. PIWI (P-element induced wimpy testis): breaks the phosphodiester bond of the mRNA (RNase H-like activity).
miRNA:
Double-stranded RNAs are naturally produced in eukaryotic cells during development, and they have a key role in regulating gene expression.
Professional air quality monitoring systems provide immediate, on-site data for analysis, compliance, and decision-making.
Monitors common gases, weather parameters, and particulates.
(May 29th, 2024) Advancements in Intravital Microscopy - Insights for Preclini...
Intravital microscopy (IVM) is a powerful tool for studying cellular behavior over time and space in vivo. Much of our understanding of cell biology has been built using various in vitro and ex vivo methods; however, these studies do not necessarily reflect the natural dynamics of biological processes. Unlike traditional cell culture or fixed-tissue imaging, IVM allows ultra-fast, high-resolution imaging of cellular processes over time and space in their natural environment. Real-time visualization of biological processes in the context of an intact organism helps maintain physiological relevance and provides insights into the progression of disease, response to treatments, and developmental processes.
In this webinar we give an overview of advanced applications of the IVM system in preclinical research. IVIM Technology is a provider of all-in-one intravital microscopy systems and solutions optimized for in vivo imaging of live animal models at sub-micron resolution. The system's unique features and user-friendly software enable researchers to probe fast, dynamic biological processes such as immune cell tracking and cell-cell interaction, as well as vascularization and tumor metastasis, in exceptional detail. This webinar will also give an overview of IVM in drug development, offering a view into the intricate interactions between drugs/nanoparticles and tissues in vivo and allowing the evaluation of therapeutic interventions in a variety of tissues and organs. This interdisciplinary collaboration continues to drive the advancement of novel therapeutic strategies.
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters spanning 0.4–0.9 µm) and novel JWST images with 14 filters spanning 0.8–5 µm, including 7 medium-band filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data at >2.3 µm to construct an ultradeep image, reaching as deep as ≈31.4 AB mag in the stack and 30.3–31.0 AB mag (5σ, r = 0.1" circular aperture) in individual filters. We measure photometric redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts z = 11.5–15. These objects show compact half-light radii of R_1/2 ∼ 50–200 pc, stellar masses of M⋆ ∼ 10^7–10^8 M⊙, and star-formation rates of SFR ∼ 0.1–1 M⊙ yr^-1. Our search finds no candidates at 15 < z < 20, placing upper limits at these redshifts. We develop a forward-modeling approach to infer the properties of the evolving luminosity function without binning in redshift or luminosity, marginalizing over the photometric redshift uncertainty of our candidate galaxies and incorporating the impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results, and that the luminosity function normalization and UV luminosity density decline by a factor of ∼2.5 from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical models for the evolution of the dark matter halo mass function.
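For a sense of scale on the depths quoted above: AB magnitudes map directly to flux densities via m_AB = -2.5 log10(f_ν / 3631 Jy). The short sketch below (not part of the paper's analysis) converts a few magnitudes to nanojanskys; the ≈31.4 AB stack depth corresponds to roughly 1 nJy.

```python
# Convert AB magnitudes to flux densities using the definition
# m_AB = -2.5 * log10(f_nu / 3631 Jy), i.e. f_nu = 3631 Jy * 10^(-0.4 * m_AB).

def abmag_to_njy(m_ab):
    """Flux density in nanojanskys corresponding to an AB magnitude."""
    return 3631e9 * 10 ** (-0.4 * m_ab)

for m in (28.0, 30.3, 31.4):
    print(f"AB = {m:5.1f}  ->  {abmag_to_njy(m):6.2f} nJy")
```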
Richard's adventures in two entangled wonderlands, by Richard Gill
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
This PDF is about schizophrenia.
For more details, visit the SELF-EXPLANATORY channel on YouTube:
https://www.youtube.com/channel/UCAiarMZDNhe1A3Rnpr_WkzA/videos
THE IMPORTANCE OF MARTIAN ATMOSPHERE SAMPLE RETURN
The return of a sample of near-surface atmosphere from Mars would facilitate answers to several first-order science questions surrounding the formation and evolution of the planet. One of the important aspects of terrestrial planet formation in general is the role that primary atmospheres played in influencing the chemistry and structure of the planets and their antecedents. Studies of the martian atmosphere can be used to investigate the role of a primary atmosphere in its history. Atmosphere samples would also inform our understanding of the near-surface chemistry of the planet, and ultimately the prospects for life. High-precision isotopic analyses of constituent gases are needed to address these questions, requiring that the analyses are made on returned samples rather than in situ.
2. Thanks!
• Atul Butte
• Larry Hunter
• Dan Masys
• Joel Dudley
• Maureen Hillenmeyer
• Florian Schmitzberger
• Art Toga
• Jill Mesirov
3. Goals
• Provide an overview of the major scientific events, trends and publications in translational bioinformatics
• Create a “snapshot” of what seems to be important in March 2008 for the amusement of future generations.
• Revel in the richness of opportunities and challenges that we are privileged to approach at this moment in time.
5. Process
1. Think about what has had impact
2. Think about sources to trust
3. Solicit advice from colleagues
4. Surf online resources
5. Print out abstracts, cut them up, make into piles
6. Select key papers from piles
7. Tweak
6. Caveats
• Focused on human biology and clinical implications (except really important model systems)
• Focused on both data sources and informatics methods
• (Attempted to) focus on definitive results vs. intriguing preliminary results
• Tried to avoid simply following crowd mentality.
7. What were the piles?
• Sequencing and sequence analysis
• Genome Wide Association Studies
• Pharmacogenomics & Drugs
• Analysis of high-throughput molecular data for humans
• Neuroscience
• Molecular information for understanding human
8. Final Results
• 86 semi-finalist papers
• 45 finalist papers
• 27 presented here
• This talk and ENDNOTE files (and other formats) will be made available on the conference website.
• (Yikes!)
9. Making Life Hard...
“Deja vu - a study of duplicate citations in Medline” (Errami et al, Bioinformatics)
• eTBLAST tool, Deja vu database
• 62,213 Medline citations studied
• 0.04% potential plagiarism
• 1.35% seemed to be duplicates
• 120,000 duplicate citations in MEDLINE
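The Deja vu study searched Medline with eTBLAST, a text-similarity engine. The sketch below is not eTBLAST's algorithm; it is a minimal word-overlap (Jaccard) illustration of how near-duplicate abstracts can be flagged, with invented abstracts and an arbitrary 0.6 threshold.

```python
# Minimal illustration of flagging likely duplicate abstracts by word overlap.
# eTBLAST's actual text-similarity engine is far more sophisticated; the
# abstracts and the 0.6 threshold here are invented for the example.

def jaccard(a, b):
    """Jaccard similarity between the word sets of two abstracts."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

def flag_duplicates(abstracts, threshold=0.6):
    """Return index pairs of abstracts whose word-set similarity exceeds threshold."""
    return [(i, j)
            for i in range(len(abstracts))
            for j in range(i + 1, len(abstracts))
            if jaccard(abstracts[i], abstracts[j]) > threshold]

abstracts = [
    "we report a genome wide association study of type 2 diabetes",
    "we report a genome wide association study of type 2 diabetes in a new cohort",
    "structural basis of ribosome assembly revealed by cryo electron microscopy",
]
print(flag_duplicates(abstracts))
```

Pairwise comparison like this is quadratic in the number of abstracts; scanning all of Medline requires an inverted index or similar retrieval structure so that only plausible candidates are ever compared.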
11. “The diploid genome sequence of an individual human” (Levy et al, PLoS Biology)
• J. Craig Venter's genome
• 4.1 million variants vs. standard genome build
• 22% of variants not SNPs (560K indels, 90 inversions, duplications, CNVs, etc...)
• Cost: $3B > $70M > $1M > $300K
12. “Next-generation” Sequencing
“Assembling millions of short DNA sequences using SSAKE” (Warren et al, Bioinformatics)
• Solexa: millions of 25-nt DNA reads per run
“Genome-wide mapping of in vivo protein-DNA interactions”
• Replaces chromatin IP + microarrays (ChIP-chip) with ChIP + sequencing (ChIP-Seq)
• 1,946 targets found for the NRSF transcription factor
13. “Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project” (Birney et al, Nature)
• “Full court press” on all aspects of annotation of 1% of the human genome with a combination of experimental and computational modalities
• FINDINGS: (1) lots of transcription, (2) chromatin is dominant, (3) 5% conserved/selected, (4) many functional elements not conserved
• Creates a market and demo for a “human genome knowledgebase” that annotates the history, significance and function of every base in the genome
15. “Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls” (Wellcome Trust Consortium, Nature)
• 500K SNP Affymetrix platform
• 24 independent associations: bipolar disease (1), coronary artery disease (1), Crohn’s disease (9), RA (3), Type I DM (7), Type II DM (3)
• 58 other SNPs promising... data available
17. Other GWAS...
• “Psoriasis is associated with increased beta-defensin genomic copy number” (Hollox et al, Nat. Gen)
• “Strong association of de novo copy number mutations with autism” (Sebat et al, Science)
• “Genome wide expression analysis in HPV16 Cervical Cancer: identification of altered metabolic pathways” (Perez-Plasencia et al, Infect Agent Cancer)
19. “Randomized trial of genotype-guided versus standard warfarin dosing in patients initiating oral anticoagulation” (Anderson et al, Circulation)
• Genotyped VKORC1 and CYP2C9 SNPs
• Smaller and fewer dosing changes
• 206 patients
• New gene: CYP4F2 (Caldwell et al, 2008)
• Many awaiting result of major combined analysis of ~6,000 patients internationally (IWPC)
20. More on drugs...
“Drug-target network” (Yildirim et al, Nature Biotech)
• Mined FDA data to build drug-target binary associations
• Found an abundance of “me too” drugs
• New drugs more diverse
• Trend towards rational drug design noted
23. “Context-sensitive data integration and prediction of biological networks” (Myers & Troyanskaya, Bioinformatics)
• Many efforts to integrate across multiple data sets for (re)construction of networks of interactions
• Often plagued by noise in these datasets
• Hypothesis: different high-throughput data sets have different information content based on the biological questions asked
• Solution: use a gold standard of known interactions and Bayesian analysis methods to capture relationships between experiments and biological reliability
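The "gold standard plus Bayesian analysis" idea on this slide can be caricatured as a naive-Bayes combination of evidence: each dataset gets a likelihood ratio estimated against known interactions, and the ratios multiply under a conditional-independence assumption. All dataset names and numbers below are invented for illustration; the paper's actual model is context-sensitive and considerably more elaborate.

```python
# Naive-Bayes sketch of integrating noisy interaction datasets. Each dataset's
# likelihood ratio LR = P(hit | true interaction) / P(hit | no interaction)
# would be estimated from a gold standard; the values here are made up.

def posterior_odds(prior_odds, evidence, likelihood_ratios):
    """Multiply in the likelihood ratio of every dataset that reports a link."""
    odds = prior_odds
    for dataset, reports_link in evidence.items():
        if reports_link:
            odds *= likelihood_ratios[dataset]
    return odds

# Hypothetical per-dataset likelihood ratios.
lr = {"yeast_two_hybrid": 3.0, "coexpression": 1.5, "affinity_purification": 8.0}

prior_odds = 0.001  # true interactions are rare a priori
evidence = {"yeast_two_hybrid": True, "coexpression": True, "affinity_purification": False}

odds = posterior_odds(prior_odds, evidence, lr)
prob = odds / (1 + odds)
print(f"posterior probability of interaction: {prob:.4f}")
```

The conditional-independence assumption is exactly what the paper relaxes: the informativeness of a dataset depends on the biological context of the question being asked.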
25. “Enabling integrative genomic analysis of high-impact human diseases through text mining” (Dudley & Butte, Pac Symp on Biocomp)
• Problem: annotating large data sets, such as those available in the Gene Expression Omnibus (NCBI)
• Natural language processing (NLP) techniques used to map GEO datasets to UMLS terms for indexing
• Significantly, able to also identify matched control data sets (for 62% of disease data sets)
• Labels cover 30% of US disease mortality
27. “The human disease network” (Goh et al, PNAS)
• Built a network of diseases and genes from OMIM (Online Mendelian Inheritance in Man, NCBI)
• Genes associated via disease links have a higher probability of physical interaction
• Essential genes encode hubs in interaction networks
• Thus, the authors predict (and show) that cancer genes will be “central” in the network
29. “Global reconstruction of the human metabolic
network based on genomic and bibliomic
data” (Duarte et al, PNAS)
• Build 35 of human genome + 50 years of literature
• Used it to:
• Discover missing information about human
metabolism (!)
• Starting point for numerical model/simulation of
metabolism
• Set context for analysis of high-throughput data
sets
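The bookkeeping that makes such a reconstruction a "starting point for numerical simulation" is a stoichiometric matrix plus a steady-state constraint. A two-reaction toy pathway (invented for illustration, nothing like the genome-scale model):

```python
# Stoichiometric matrix S (metabolites x reactions) and the steady-state
# mass-balance check S.v = 0 for a candidate flux vector v
import numpy as np

# Metabolites: A, B; reactions: uptake of A, A -> B, secretion of B
S = np.array([
    [ 1, -1,  0],   # A: produced by uptake, consumed by A -> B
    [ 0,  1, -1],   # B: produced by A -> B, consumed by secretion
])

v_balanced = np.array([2.0, 2.0, 2.0])     # equal flux through the chain
v_unbalanced = np.array([2.0, 1.0, 1.0])   # A would accumulate

def at_steady_state(S, v, tol=1e-9):
    return bool(np.all(np.abs(S @ v) < tol))

ok = at_steady_state(S, v_balanced)
bad = at_steady_state(S, v_unbalanced)
```

Flux-balance methods then optimize over all flux vectors satisfying this constraint plus capacity bounds.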
31. “NetworkBLAST: comparative analysis of
protein networks” (Kalaev et al, Bioinformatics)
• Addresses problem of taking two networks of genes
(protein products) and finding common
substructures
• Like BLAST, looks for seed matches of similar
connection topology and then extends
• Includes estimates of significance
• One of the authors, Trey Ideker, previously created
Cytoscape, a commonly used network visualization
tool
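The seed-matching step can be sketched as follows: given two interaction networks and a mapping of sequence-similar proteins, find edges conserved in both (the seeds that would then be extended). All networks and orthologs below are invented:

```python
# Find conserved edges ("seed matches") between two toy protein networks
net1 = {("a1", "b1"), ("b1", "c1")}
net2 = {("a2", "b2"), ("c2", "d2")}
ortholog = {"a1": "a2", "b1": "b2", "c1": "c2"}  # assumed similar pairs

def normalize(edge):
    """Canonical order so undirected edges compare equal."""
    return tuple(sorted(edge))

net2_norm = {normalize(e) for e in net2}

conserved = set()
for u, v in net1:
    if u in ortholog and v in ortholog:
        if normalize((ortholog[u], ortholog[v])) in net2_norm:
            conserved.add(normalize((u, v)))
```

NetworkBLAST itself also scores matches for significance and extends seeds greedily; this shows only the matching primitive.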
33. “Universally sloppy parameter sensitivities in
systems biology models” (Gutenkunst et al,
PLoS Comput Biol)
• Systems biology mathematical models have “sloppy”
sensitivities--model depends on hard-to-predict
combinations of parameters. Authors investigated
this in 17 models.
• Sloppy sensitivities pervasive
• Even very large data sets will not yield accurate
estimates of parameters (bad news)
• Modelers should focus on prediction sensitivity with
estimated parameters as way to guide experiments
(good news)
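"Sloppiness" shows up as an eigenvalue spectrum of the Fisher-information-like matrix JᵀJ spanning many orders of magnitude: a few stiff parameter combinations are tight while most directions are poorly constrained. A toy two-exponential model (illustrative, not one of the 17 studied models):

```python
# Sloppiness diagnostic: eigenvalue spread of J^T J for a toy model with
# nearly degenerate rate constants
import numpy as np

t = np.linspace(0.0, 5.0, 50)

def model(p):
    k1, k2 = p
    return np.exp(-k1 * t) + np.exp(-k2 * t)

def jacobian(p, eps=1e-6):
    """Numerical Jacobian of model output w.r.t. the parameters."""
    cols = []
    for i in range(len(p)):
        dp = np.array(p, float)
        dp[i] += eps
        cols.append((model(dp) - model(p)) / eps)
    return np.stack(cols, axis=1)

# Nearly equal rates -> one stiff and one sloppy parameter direction
J = jacobian([1.0, 1.1])
eigvals = np.linalg.eigvalsh(J.T @ J)
spread = eigvals.max() / eigvals.min()  # many orders of magnitude = sloppy
```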
34. Figure legend:
• Red = tight parameter estimates ($$$)
• Blue = getting the key parameters inaccurately
• Yellow = collectively fitting all parameters over
multiple experiments
35. “Advancing translational research with the
Semantic Web” (Ruttenberg et al, BMC
Bioinformatics)
• Outlines a vision for semantic web and biomedical
research. Very useful primer on the subject.
• Goal: general infrastructure for aggregation,
annotation and integration of diverse biological
datatypes
• HCLSIG within W3C
• RDF, OWL and other structured representations
hold exciting promise of data integration
infrastructure. Challenges remain...
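The RDF data model underlying this vision is simply (subject, predicate, object) statements queried by pattern matching. A self-contained sketch with made-up example URIs (a real deployment would use a triple store and SPARQL):

```python
# Minimal triple store: everything is a (subject, predicate, object)
# statement; queries are pattern matches with None as wildcard
triples = [
    ("ex:BRCA1", "ex:associatedWith", "ex:BreastCancer"),
    ("ex:BRCA1", "ex:locatedOn", "ex:Chromosome17"),
    ("ex:TP53", "ex:associatedWith", "ex:LiFraumeniSyndrome"),
]

def query(s=None, p=None, o=None):
    """Return all triples matching the given pattern (None = wildcard)."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

brca1_facts = query(s="ex:BRCA1")
disease_links = query(p="ex:associatedWith")
```

Because every statement has the same shape, data from different sources aggregates by simple union, which is the integration promise of the approach.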
37. “A transcriptome database for astrocytes,
neurons, and oligodendrocytes: a new resource
for understanding brain development and
function” (Cahoy et al, J. Neurosci)
• Brain development and expression are among the
great challenges of our era. Sometimes making the
data available can catalyze an entire field.
• Isolated populations of mouse brain cells using FACS
• Affymetrix expression analysis on 20K genes
• Ingenuity pathway analysis tool to find markers
• Differences in gene expression now driving
experiments
39. “Genome-wide atlas of gene expression in the
adult mouse brain” (Lein et al, Nature)
41. “Distinct physiological states of Plasmodium
falciparum in malaria-infected patients” (Daily
et al, Nature)
• In vitro analysis of malaria showed little difference in
gene expression profiles
• BUT what about in the context of hosts (humans)
who are genetically different?
• Took 43 blood samples and analyzed gene
expression
• Found significant differences over the sample, and
these correlated to severity of disease and
molecular response to malaria.
43. “Blood gene expression signatures predict
exposure levels” (Bushel et al, PNAS)
• Can we detect exposure to toxins based on
expression changes BEFORE frank signs of toxicity
emerge?
• Tested on acetaminophen (APAP) exposure in rats,
then humans
• Conclusion: the expression profile of liver cells
shows signs of response well before clinical
symptoms emerge. This may be model for many
other exposures.
45. “Social regulation of gene expression in human
leukocytes” (Cole et al, Genome Biol)
• Can our social situation influence our gene
expression? YES
• Compared gene expression in 14 subjects who were
subjectively socially isolated with those who were not
• 209 genes found to be differentially expressed
• Anti-inflammatory = DOWN, Pro-inflammatory =
UP
• Major transcription control pathways showed
changes...
47. “Critical review of published microarray studies
for cancer outcome and guidelines on statistical
analysis and reporting” (Dupuy & Simon, J Natl
Cancer Inst)
• Do investigators repeatedly make avoidable mistakes
in analysis of microarrays? YES
• Analyzed 90 studies, many in “good” journals, listed
40 (!) dos and don’ts...50% of studies did one of:
1. Inadequate control for multiple hypothesis testing
2. Spurious claim of cluster-outcome correlation
3. Biased estimate of performance based on
cross-validation
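Mistake #1, inadequate control for multiple hypothesis testing, has a standard fix. A sketch of the Benjamini-Hochberg false-discovery-rate procedure over gene-level p-values (the p-values here are invented):

```python
# Benjamini-Hochberg: control the false discovery rate across many tests
def benjamini_hochberg(pvals, alpha=0.05):
    """Return indices of hypotheses rejected at FDR level alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Largest k with p_(k) <= (k/m)*alpha; reject the k smallest p-values
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k_max = rank
    return sorted(order[:k_max])

pvals = [0.001, 0.008, 0.039, 0.041, 0.30, 0.74]
rejected = benjamini_hochberg(pvals)  # indices of the two smallest p-values
```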
48. Great work skipped...
• Ultraconserved sequences across genomes
• Biomarker statistics
• miRNA promoters conserved in vertebrates
• Rapid RNA 3D structure determination
• Genome-wide analysis of cell-cycle transcription
• SNP and CNV impact
• Gene expression across ethnic groups
• Paired-end sequencing to identify major genome rearrangements
• Reconstructing regulatory networks
49. I can’t resist
“Conversion of amino-acid sequence in proteins to
classical music: search for auditory
patterns” (Takahashi & Miller, Genome Biology)
• Map amino acids to 13 base notes
• Variations of chords distinguish similar amino
acids
• Codon distribution for rhythm
• Help listeners find patterns: thyA and
huntingtin
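The idea is easy to sketch: a many-to-one mapping of the 20 amino acids onto 13 base notes, with similar residues sharing notes. The note assignment below is an arbitrary stand-in, not the published mapping:

```python
# Toy residue -> note mapping (arbitrary assignment, 20 residues onto 13
# base notes, so several amino acids share a note)
notes = ["C4", "D4", "E4", "F4", "G4", "A4", "B4",
         "C5", "D5", "E5", "F5", "G5", "A5"]
amino_acids = "ACDEFGHIKLMNPQRSTVWY"
aa_to_note = {aa: notes[i % len(notes)] for i, aa in enumerate(amino_acids)}

def melody(sequence):
    """Translate a protein sequence into a list of notes."""
    return [aa_to_note[aa] for aa in sequence if aa in aa_to_note]

tune = melody("MKT")  # first residues of a hypothetical protein
```

The published scheme additionally uses chord variations to distinguish similar residues and codon usage for rhythm.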
50. I can’t resist
“A single IGF1 allele is a major determinant of small
size in dogs.” (Sutter et al, Science)
• Domestic dog ~15,000 years old
• GWAS on Portuguese water dogs (size variation)
• IGF1 = insulin-like growth factor 1
• Looked at SNPs, found two haplotypes (I and B)
• B have smaller size
• Validated on a wider section of dogs...
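The basic single-marker test behind such a GWAS is an association test on a 2x2 table of allele counts by phenotype. The counts below are invented for illustration, not the paper's data:

```python
# Chi-square test of allele counts vs. phenotype (toy numbers)
# Rows: small dogs, large dogs; columns: B allele, I allele
table = [[40, 10],
         [12, 38]]

row_totals = [sum(r) for r in table]
col_totals = [sum(c) for c in zip(*table)]
n = sum(row_totals)

chi2 = 0.0
for i in range(2):
    for j in range(2):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (table[i][j] - expected) ** 2 / expected

significant = chi2 > 3.84  # 0.05 critical value, 1 degree of freedom
```

A real GWAS repeats this across hundreds of thousands of markers, which is exactly why the multiple-testing issues flagged earlier in the deck matter.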
53. I can’t resist
“A mutation in the myostatin gene increases muscle
mass and enhances racing performance in
heterozygote dogs.” (Mosher et al, PLOS Genetics)
• Two-base-pair deletion in 3rd exon of MSTN
associated with “double muscle” phenotype
• Associated with increased athletic
performance
• “performance-enhancing polymorphisms...”
55. I can’t resist...
“Linguistic tone is related to the population
frequency of the adaptive haplogroups of two brain
size genes, ASPM and Microcephalin” (Dediu &
Ladd, PNAS)
• Most languages are unambiguously tonal or
not (Chinese = YES, English = NO)
• Authors hypothesize a relationship between
distribution of tonal languages and distribution
of certain variants of ASPM and MCPH
• Population preferences for language may stem
from their genetics.
57. Crystal ball...
• Sequencing makes a comeback (watch out
microarrays....)
• Translational science projects will create
astounding data sets (hopefully available) to
catalyze research
• GWAS will continue to proliferate
• Consumer-oriented genetics will create demand
for online resources for interpretation
• Difficult decisions about when/how to bring
new molecular diagnostics to practice.