Application of Sequencing Technology and Bioinformatics to Phytomedicine



Phytomedicine is well suited to study with the next-generation DNA sequencing (NGS) technologies that have revolutionised molecular biology over the past 5 years. This holds both for lead discovery and optimisation, and for target discovery and mode-of-action studies.

Starting with the search for lead candidates: bioactive plant molecules are effectively proteins or metabolites, and given a reference sequence they can be traced back to their genomic origins. Significantly, phytomedicinal traits can be selected for like any other plant phenotype. Phytomedicinal plants with sequenced genomes, such as Cannabis, are already being investigated in this way, and there are tens if not hundreds of other species whose genomes would be similarly useful.
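
Selecting a phytomedicinal trait, like the coffee example in the speaker notes, rests on one statistical step: testing each genetic marker for association with a measured trait. The following is a minimal sketch, not the method used in any study cited here. It assumes a diploid additive model, in which each extra copy of an allele shifts the trait by a fixed amount (as in "each T allele is equivalent to 0.2 extra cups of coffee/day"), and reports the per-allele effect and r² for each marker. The function names and toy data are hypothetical. A real study would add p-values, covariates such as population structure, and multiple-testing correction, usually through dedicated tools such as PLINK.

```python
from statistics import mean


def additive_association(dosages, phenotypes):
    """Slope (per-allele effect) and r^2 of a trait regressed on allele dosage.

    dosages: copies of the alternative allele per plant (0, 1 or 2, diploid).
    phenotypes: trait values in the same sample order.
    """
    mx, my = mean(dosages), mean(phenotypes)
    sxx = sum((x - mx) ** 2 for x in dosages)
    syy = sum((y - my) ** 2 for y in phenotypes)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, phenotypes))
    if sxx == 0 or syy == 0:
        # Monomorphic marker or constant trait: no association can be estimated.
        return 0.0, 0.0
    return sxy / sxx, sxy * sxy / (sxx * syy)


def scan_markers(genotypes, phenotypes):
    """Test every marker and rank them by r^2, strongest association first."""
    results = {
        marker: additive_association(dosages, phenotypes)
        for marker, dosages in genotypes.items()
    }
    return sorted(results.items(), key=lambda item: item[1][1], reverse=True)


# Hypothetical toy data: six plants, two markers, one trait (e.g. saponin content).
genotypes = {"snp_a": [0, 1, 2, 0, 1, 2], "snp_b": [1, 1, 0, 0, 2, 2]}
trait = [1.0, 1.2, 1.4, 1.0, 1.2, 1.4]
for marker, (beta, r2) in scan_markers(genotypes, trait):
    print(f"{marker}: effect per allele = {beta:.2f}, r^2 = {r2:.3f}")
```

In this toy data the trait rises by exactly 0.2 per copy of the snp_a allele, so snp_a ranks first with an effect of 0.2. Markers that rank highly in a real scan become the candidate biomarkers that a breeding programme selects on.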

The study of phytopharmacology is similar in many respects to that of 'standard' pharmacology, and is tractable via the same genomic techniques. A notable exception concerns the synergistic interactions that can be pronounced in plant extracts: a single isolated ingredient will not always reproduce the efficacy of the whole extract. Although the biological explanation of synergistic effects is far from clear, unbiased whole-genome assays, typified by NGS, offer a promising way to study them.
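
The speaker notes define synergy as a combined effect greater than the sum of the separate effects ("2 + 2 = more"). The sketch below makes that comparison explicit under stated assumptions. Effects for each constituent and for the extract are measured on the same scale. The expected "no interaction" value comes either from simple addition (the notes' definition) or from Bliss independence, a common alternative for fractional effects that is substituted here for illustration. Function names and numbers are hypothetical. Other reference models, such as Loewe additivity, are also used in practice and require full dose-response curves.

```python
def additive_expectation(effect_a, effect_b):
    """Expected combined effect if the two effects simply add up."""
    return effect_a + effect_b


def bliss_expectation(effect_a, effect_b):
    """Bliss independence for fractional effects in [0, 1]: Ea + Eb - Ea*Eb."""
    return effect_a + effect_b - effect_a * effect_b


def synergy_excess(observed, effect_a, effect_b, model=bliss_expectation):
    """Observed minus expected combined effect.

    Positive values suggest synergy, negative values antagonism,
    and values near zero are consistent with no interaction.
    """
    return observed - model(effect_a, effect_b)


# Hypothetical extract: constituent A alone inhibits 30%, B alone 40%,
# and the whole extract inhibits 70%.
for name, model in [("additive", additive_expectation), ("Bliss", bliss_expectation)]:
    print(f"{name}: excess = {synergy_excess(0.70, 0.30, 0.40, model):+.2f}")
```

With these illustrative numbers, the extract looks merely additive under simple addition but synergistic under Bliss independence (expected 0.58, observed 0.70). Any synergy claim therefore needs to state its reference model.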

  • Over 50% of prescription drugs are derived from chemicals first identified in plants. Only in the last 20 years has rational drug discovery (Lipinski's rules, fragment-based screening) taken over from natural sources. Yet even though the attrition rate for modern drug candidates is diabolical, the pharma industry continues to abandon natural-product R&D. Could recent advances in sequencing and bioinformatics help reverse this trend, and exploit the 1,000 medicinal plants and 10,000 potentially therapeutic phytochemicals?
  • I'd like to start with a quick example of a genome-wide association study. This human data is a meta-analysis of 18,000 individuals assayed by a SNP array. The zoomed megabase on chr15 contains the strongest association, at SNP rs2470893. <CL> So what is the trait in this case? <CL> ...Coffee consumption! rs2470893 lies in the CYP1A1/CYP1A2 region; CYP1A2 is the primary caffeine-metabolising enzyme. Each T allele is equivalent, on average, to 0.2 extra cups of coffee/day. I have had my genome scanned by 23andMe and I'm heterozygous, which means I can blame my genome for about 1 extra cup per week. That coffee consumption is related to caffeine metabolism is not surprising; however, another strong association is with NRCAM, a neuronal cell adhesion gene implicated in addiction vulnerability. This simple example demonstrates two major uses for genomics in drug discovery: biomarker discovery and target discovery. It's been 10 years since the sequencing of the genome... Initial impact (Eric Lander): "Genomic medicine will transform the health of our children and our children's children." One of the major advances has been our understanding of the genetic basis behind disease, in large part through genome-wide association of genotypes (DNA differences between individuals) with observable traits (phenotypes). In this illustration, the central graph shows a region of 1 million base pairs on chromosome 15, or about 1/3,000th of the total genome. Each point on the graph is the location of a single-base-pair difference between individuals' DNA. Only a few hundred are shown here, out of the several million that will have been present in the experiment as a whole. The y axis shows how strongly the genotype at each location is associated with a specific phenotype (conventionally plotted as -log10 of the p-value), based on a sample of 18,000 individuals. So, in this case, what is the phenotype? Coffee consumption! This is used as a model of addictive behaviour.
The study itself was published in Molecular Psychiatry (advance online publication, 30 August 2011): "Genome-wide association analysis of coffee drinking suggests association with CYP1A1/CYP1A2 and NRCAM". So what do these genes do? CYP1A2 is the main caffeine-metabolising enzyme, and its neighbour CYP1A1 metabolises other aromatic compounds. NRCAM is a neuronal cell adhesion molecule that has been implicated in addiction vulnerability. So what does this mean? The reference allele at rs2470893 is a C, and for each copy of T you will drink, on average, 0.2 extra cups of coffee/day. My genotype at this location is CT (23andMe), so I can blame a cup of coffee per week on my genes. However, the heritability of coffee consumption is about 50%, so I can blame quite a few more cups on my parents! We talk about THE human genome, but genome sequences vary slightly between individuals. Even single-letter differences, inherited from your parents, can lead to differences in observable traits (phenotypes): height, hair colour, or disease susceptibility. By comparing the decoded genomes of people with a trait to those without it, you can determine which differences are statistically likely to be associated. This slide shows an example of such an experiment, comparing 27,000 people. From the whole genome (top panel) we zoom into a 1-million-base-pair region (0.03% of the genome) that contains the most strongly associated signal (y axis): the single-letter change called rs2470893. The bottom panel shows the genes in this region. <CL> So what is the trait in this case? <CL> ...Coffee consumption! The strongest genomic signal lies in the CYP1A1/CYP1A2 region; CYP1A2 is the primary caffeine-metabolising enzyme. Another strong association is with NRCAM, a neuronal cell adhesion gene implicated in addiction vulnerability. So what does this mean? For each copy of the letter T you inherit from your parents, you will drink, on average, 0.2 extra cups of coffee/day. I have had my genome decoded by 23andMe: I have one T, and the other is a C.
That means my genome at this location is responsible for about 1 cup per week of my prodigious coffee habit.
  • Starting with target discovery: discover interesting disease-related biology that can be modified using drugs. Find genetic drivers of disease (e.g. mutations in cancer) and then develop drugs that target those mutations. Modern genomics, empowered by the human genome sequence, has revolutionised our understanding of the genetic basis behind disease. As the CEO of Eli Lilly, John Lechleiter, put it: "In some cases, biological knowledge is akin to lights being turned on in a room versus groping around in the dark." The NHGRI suggests that most new drugs based on the completed genome are still perhaps 10 to 15 years in the future, although more than 350 biotech products, many based on genetic research, are currently in clinical trials.
  • Pharmaceutical companies are now able to disentangle the mode of action of potential therapies more easily using genome information. The same premise holds for phytomedicine, even more so if we consider synergy. A synergistic interaction means that the effect of two or more chemicals taken together is greater than the sum of their separate effects. A review published in the British Journal of Pharmacology in 2011 suggests that phytocannabinoid-terpenoid interactions produce therapeutic synergy in the treatment of pain, inflammation, depression, anxiety, addiction, epilepsy, cancer, and fungal and bacterial infections. There are several other examples in the literature. How do you start to investigate these complex, multi-dimensional interactions? Modern whole-genome assays would seem a reasonable approach. Such studies are indeed starting to emerge, as covered by a 2009 review in the journal Phytomedicine.
  • Moving from target discovery to biomarker discovery: pharmacogenomics, the influence of genetic variation on drug response. Take whole-genome assays, whether genotypic, transcriptomic or epigenetic, <CL> interrogate them for phenotypic associations, <CL> validate biomarkers from these, <CL> and derive clinical genetic tests from the biomarkers. Use the tests to give the right drug to the right patient at the right dose at the right time. Patients with a CYP2C9 variant that slows the metabolism of warfarin need a lower dose to avoid an adverse reaction. Only cancer patients whose tumours overexpress HER2 will get therapeutic benefit from Herceptin. More examples are being published all the time. Because of the benefits for efficacy and for avoiding adverse drug reactions (ADRs), considering pharmacogenomics in drug development is becoming widely accepted within large pharma.
  • Let's bring in the plant side of the equation. One of the earliest, and perhaps most advanced, applications of the “new genomics” is in plant breeding. Here you start with a pedigree; the example shows rice plants from the International Rice Research Institute in the Philippines. The phenotypes (traits) and genotypes (genomes) of each plant are measured and statistically associated. The associations are used to develop assays (biomarkers) for desirable phenotypes (disease or drought resistance, increased yield). These assays are then used in plant breeding programmes to select the individuals to take forward.
  • So, what phenotypes are relevant to phytomedicine? Two examples. 1. Biosynthesis, e.g. of saponins, which possess a range of potentially therapeutic biological activities but whose biosynthesis is not well understood. The Noble Foundation in Oklahoma is using genomics to identify and characterise the enzymes involved in the biosynthesis of over 30 saponins in Medicago truncatula. Plants will then be selected to produce genotypes with altered saponin levels and composition. 2. Uniformity in dosage, which is required for FDA/EMEA drug approval. This is very difficult to control in natural extracts, but it is possible: Cambridgeshire-based GW Pharmaceuticals manufactures Sativex, effectively a tincture of cannabis. Its active ingredients are THC and CBD, but other pharmacologically active cannabinoids are also present. Through careful breeding, it is sufficiently standardised in composition, formulation and dose to be approved as a licensed pharmaceutical for the treatment of cancer and MS pain.
  • Association mapping is useful, and the field is moving towards NGS techniques: falling costs, improved sensitivity, and no need for chip design compared with arrays. But big data means complex analysis, which needs a bioinformatics workflow, either conceptual or enacted in software. Here is a generic DNA-Seq example. As the vast majority of medicinal plants have no sequenced genome, I've included the process of constructing the reference. To extract the most value from the data, the reference needs to include the prediction and annotation of genes. Preparing the reference is by far the most complex task and can take several months of work; we do work with clients who have undertaken this task. Once the reference is prepared, however, the association steps are far more tractable, and standard analytic pipelines are becoming established, e.g. Galaxy, the iPlant Discovery Environment, and several others presented at this meeting.
  • Having a public reference genome for your species of interest is, of course, a huge bonus for genomic studies. Of the 30 or more published plant genome sequences, at least 4 are of species with studied phytomedicinal properties. Cannabis sativa, for example, was published in 2011 in Genome Biology. Aquilegia, a toxic plant but an effective treatment for ulcers, is available at Phytozome. Then there is the cocoa genome (antihypertensive, ACE inhibitor) and castor bean (hepatoprotective in rats).
  • What are the options for association mapping in the absence of a reference genome? RNA-Seq provides a ready method, much simpler than DNA-Seq. Start with a transcript assembly, which can be generated from the RNA-Seq data itself in the absence of an existing reference. Variants are then called from alignments to the transcript assemblies, and the association study is run on these.
  • This is the approach used, for example, by the TraitTag service, which is developed in collaboration with Ian Bancroft and Martin Trick at the John Innes Centre and based on the work on B. napus they have been publishing for the past few years. It is optimised for plants and can cope with complex polyploid genomes. It requires a mapping population of 20-100 lines and produces ~100,000 SNP markers that can be used to construct high-density genetic maps or for GWAS/QTL mapping.
  • Whilst reference genomes for phytomedicinal plants are sparse, there is a lot of current effort to produce transcriptome references. For example, the Medicinal Plant Genomics Resource from Robin Buell's lab at MSU has generated initial transcriptomes for 14 important species. Furthermore, the 1000 Plant Genomes Project, from Genome Alberta and BGI, is sequencing transcriptomes for 1,000 species, with priority given to plants producing medically active compounds and to those with suspected medicinal properties (e.g. in traditional Chinese medicine).
  • "We have an explosion of knowledge going on, and all these tools, and research networks. All of it augurs for greater research output." John Lechleiter
  • In summer 2012 a major international project, the "Encyclopedia of DNA Elements" (ENCODE), published findings from hundreds of researchers in 30 publications. It also made its data and analysis available as a virtual machine, but that's a topic for another talk. Today we know an order of magnitude more, in data terms, about the switches that regulate our genes than we did previously.
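
The RNA-Seq route in the notes calls variants from read alignments to transcript assemblies before running the association. The following deliberately simplified sketch shows only the genotype-calling idea. At one transcript position, it counts the bases seen in each sample's aligned reads and converts the alternative-allele fraction into a dosage of 0, 1 or 2. It assumes a diploid locus and uses hypothetical, untuned thresholds, and all function names are illustrative. Real variant callers also model base quality, mapping quality and sequencing error, and polyploid genomes such as B. napus need more careful treatment.

```python
from collections import Counter

BASES = "ACGT"


def choose_alt(pileups, ref):
    """Most frequent non-reference base across all samples, or None if none seen."""
    counts = Counter(
        base
        for reads in pileups.values()
        for base in reads.upper()
        if base in BASES and base != ref
    )
    return counts.most_common(1)[0][0] if counts else None


def call_dosage(reads, alt, min_depth=8, het_low=0.2, het_high=0.8):
    """Alt-allele dosage (0, 1 or 2) for one sample, or None if depth is too low.

    Assumes a diploid locus; thresholds are illustrative, not tuned values.
    """
    calls = [base for base in reads.upper() if base in BASES]
    if len(calls) < min_depth:
        return None
    alt_fraction = calls.count(alt) / len(calls) if alt else 0.0
    if alt_fraction < het_low:
        return 0
    if alt_fraction > het_high:
        return 2
    return 1


def call_site(pileups, ref):
    """Call one transcript position across samples: (alt base, {sample: dosage})."""
    ref = ref.upper()
    alt = choose_alt(pileups, ref)
    return alt, {sample: call_dosage(reads, alt) for sample, reads in pileups.items()}


# Hypothetical pileup: bases from reads aligned to one position of a transcript assembly.
pileups = {"line1": "AAAAAAAAAA", "line2": "AAAAAGGGGG", "line3": "GGGGGGGGGG", "line4": "AAG"}
print(call_site(pileups, "A"))
# → ('G', {'line1': 0, 'line2': 1, 'line3': 2, 'line4': None})
```

The resulting per-sample dosages are the genotype values an association test takes as input. Samples with too little coverage are left uncalled rather than guessed.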

    1. Application of sequencing technology and bioinformatics to phytomedicine. William Spooner, CTO and Founder, Eagle Genomics. PAG-ICPN | San Diego, 15th January 2013. ©Eagle Genomics Ltd
    2. Genome, Phenotype, Association, Genes. Image: Sartr, CC BY-NC-ND 3.0. Source: Molecular Psychiatry, advance online publication 30 August 2011; doi:10.1038/mp.2011.101
    3. Genomics in research productivity: Reactome axon guidance pathway, highlighting NRCAM interactions
    4. Synergy in phytopharmacology: 2 + 2 = more
    5. Genomics in pharmacology: pharmacogenomics and stratified medicine. Genotypic, transcriptomic or epigenetic assays → phenotypic association → validated biomarker → clinical assay → right drug, right patient, right time.
    6. Genomics in plant breeding: germplasm/pedigree → phenotyping and genotyping → phenotype/genotype association → assays → breeding.
    7. Phytomedical traits. Image by: G. Nicolella, CC BY SA
    8. Association mapping workflow (DNA-Seq): biomarker discovery from DNA-Seq. End-to-end workflow for species without a reference genome (inputs → activity → outputs):
       • Reference genome: DNA-Seq (samples) → Assemble → reference genome; reference genome + public EST/protein → Predict genes → gene models; gene models + ref genes (other species) → Build phylogenies → orthologs/paralogs; ref annot. (other species) + orthologs/paralogs → Annotate genes → annotated genes (e.g. GO).
       • Association: DNA-Seq (samples) + reference genome → Call variants (alignment) → variants; variants + annotated genes → Annotate variants → annotated variants; annotated variants + sample metadata → Associate traits (GWAS) → biomarkers.
       • Key: start, input, data/output report, activity, end, intermediate data (output from one activity that forms input for a subsequent activity).
    9. Ensembl as a GCMS. Data management platform functions: data integration, data querying, data reporting, data analysis, data QC. Components: Assembly/Genes, Variation, Regulatory Genomics, Comparative Genomics, Data Integration API, httpd, DAS, UCSC Data Hub.
    10. Sequenced phytomedicinal plants: Cannabis sativa (image by: Actv [CC]); Aquilegia coerulea (image by: Eric Lyons); Theobroma cacao (image by: Kai Yan, Joseph Wong [CC]); Ricinus communis (image by: Howard F. Schwartz [CC])
    11. Association mapping workflow (RNA-Seq): biomarker discovery. End-to-end workflow for species with or without a reference transcriptome (inputs → activity → outputs):
       • Reference transcriptome: RNA-Seq (samples) → Assemble → transcript assemblies.
       • Association: RNA-Seq (samples) + transcript assemblies → Call variants (alignment) → variants; variants + sample metadata → Associate traits (GWAS) → biomarkers.
       • Key as for slide 8.
    12. High-density genetic maps; QTL mapping: SNPs associated with glucosinolate content in rapeseed. Harper et al., Nature Biotechnology 30, 798–802 (2012); Trick et al., Plant Biotechnol J. 7, 334–46 (2009).
    13. Atropa belladonna, Camptotheca acuminata, Ginkgo biloba, Panax quinquefolius, Digitalis purpurea, Catharanthus roseus, Dioscorea villosa, Echinacea purpurea, Cannabis sativa, Hoodia gordonii, Hypericum perforatum, Rauvolfia serpentina, Rosmarinus officinalis
    14. Sequencing and bioinformatics have ready applications to phytomedicine R&D.
       • Human: improve understanding of the mode of action of bioactive plant compounds.
       • Plant: mapping of phytomedicinally important traits for use in breeding.
         – DNA-Seq against a reference genome is the gold standard.
         – RNA-Seq with or without a reference transcriptome is still useful.
       • Future:
         – Increasing standardisation of NGS workflows.
         – Increasing quality of software platforms.
         – Growing number of public genome and transcriptome reference resources.
    15. +44 (0)1223 654481 | @wspoonr | @eaglegen | 16/01/2013
    16. ©Eagle Genomics Ltd
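
The workflow slides (8 and 11) describe association mapping as a chain of activities in which the output of one activity forms the input of a later one. The following minimal sketch shows how such a workflow could be enacted in software. It encodes one reading of the DNA-Seq workflow slide as activity → inputs/outputs and derives a valid execution order with Python's standard graphlib module (Python 3.9+). The activity and data names paraphrase the slide. The exact wiring, in particular which inputs feed "annotate genes", is an interpretation of the flattened slide text. The helper functions are hypothetical and not part of Galaxy, the iPlant Discovery Environment or any other platform mentioned in the talk.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# One reading of the DNA-Seq workflow slide: activity -> (inputs, outputs).
WORKFLOW = {
    "assemble": ({"DNA-Seq (samples)"}, {"reference genome"}),
    "predict genes": ({"reference genome", "public EST/protein"}, {"gene models"}),
    "build phylogenies": ({"gene models", "ref genes (other species)"}, {"orthologs/paralogs"}),
    "annotate genes": ({"ref annot. (other species)", "orthologs/paralogs"}, {"annotated genes"}),
    "call variants": ({"DNA-Seq (samples)", "reference genome"}, {"variants"}),
    "annotate variants": ({"variants", "annotated genes"}, {"annotated variants"}),
    "associate traits": ({"annotated variants", "sample metadata"}, {"biomarkers"}),
}


def activity_order(workflow):
    """Order activities so each runs only after the activities producing its inputs."""
    producer = {data: act for act, (_, outputs) in workflow.items() for data in outputs}
    # Inputs with no producer (raw samples, public data) are external and add no edge.
    dependencies = {
        act: {producer[data] for data in inputs if data in producer}
        for act, (inputs, _) in workflow.items()
    }
    return list(TopologicalSorter(dependencies).static_order())


def external_inputs(workflow):
    """Data sets that must be supplied because no activity produces them."""
    produced = {data for _, outputs in workflow.values() for data in outputs}
    consumed = {data for inputs, _ in workflow.values() for data in inputs}
    return sorted(consumed - produced)


print(activity_order(WORKFLOW))
print(external_inputs(WORKFLOW))
```

Replacing the genome-building activities with a single transcript-assembly step would give the shorter RNA-Seq workflow of slide 11. Workflow platforms normally perform this kind of dependency resolution themselves, so a sketch like this is mainly useful for reasoning about what data a study must supply up front.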