Sequencing cannabis sativa and cannabis indica, Courtagen Life Sciences, Inc, Kevin McKernan, Copenhagenomics 2012


Published in: Education, Technology


  1. Medicinal Genomics Corporation (MGC): Digitally Measuring Nature’s Pharmacy. Translational medicine requires translation.
  2. Cannabis indica and Cannabis sativa genomes. We import $1B/year in hemp products. $45B-$113B Cannabis black market. $1.3B California “legal” dispensary-driven market (growing 50% a year). 45% grown in the US (over 22M lbs); 40% Mexican; 10% Canadian; 5% other.
  3. Therapeutic Index. 35,000 annual deaths from alcohol in the US. EtOH is involved in 25-30% of violent crimes in the US and 50% in the UK. EtOH is involved in 80% of domestic violence. Lisbon football games successfully reduce riots by promoting Cannabis. Steve Fox, Paul Armentano, Mason Tvert.
  4. Social and political considerations. Over 1M US citizens imprisoned every year for Cannabis. 7.8M citizens imprisoned in 10 years. $50B/year in prisons. Private prisons growing 17%/year. 25% of global prisoners are in the US. [Chart: rate of prescription drug overdoses per 100K; source: CDC]
  5. Medical Excuse or Medical Use? The endocannabinoid pathway is pervasive and plays a critical role in the following disease etiologies. Analgesics: estimated $75B US market. Chronic pain: estimated to be over $200B US health care cost (source: Institute for Pain Management). Cancer pain, wasting, apoptosis: estimated over $220B US health care cost (the NCI number is $117B). MS spasticity: $10B global. Diabetes and weight management: very large.
  6. Obesity and BMI: human variants in endocannabinoid genes. Certain rare human FAAH and MGLL genotypes are associated with high anandamide plasma levels and obesity. Patients with these genotypes will impact clinical trials.
  7. Human variation of the receptors. Are there populations with mutations in CB1, CB2, or “CB3” which may require custom dosages of cannabinoids?
  8. Over 85 cannabinoids discovered. Cannabis has gone through a breeding bottleneck with prohibition, and we believe many silent chemotypes will be found in the genomics of existing strains. http://en.wikipedia.org/wiki/Cannabinoids What are the genetic pathways? Which enzymes have variants in these pathways? Are there extinct synthases in the genome which can be discovered/recovered? Terpenoids?
  9. Spectrum of cannabinoid effects
  10. Common Precursors
  11. Genetic Bottleneck of Prohibition. The unit of measure in US penal codes is weight-based, not anchored to volume or %THCA. This drives the underground market towards plant matter with higher THCA concentration. Many shared precursors in the pathways suggest that higher THCA concentration has come at the cost of lower CBDA and other therapeutic cannabinoids.
  12. Many parts of the cannabinoid pathways are still unknown. Why sequence the genome? 1) Chemical synthesis produces racemics. 2) The plant grows quickly and productively. 3) The trend is towards cocktails of cannabinoids and terpenoids. Discovery of pathways can aid in breeding and synthetic biology approaches to manufacturing.
  13. Predictive Genomics. Mutations in the FAD binding domain compromise and/or deactivate THC production in strains (Sirikantaramas et al.). THC Synthase annotation.
  14. Applying Sequencing to Cannabinoids. 1.6Gb (n=2) dye-based estimate. Sequencing supports 650Mb-1.0Gb (n=2). 10 chromosomes. De novo shotgun to 327X coverage: 131Gb of 2x100 ILMN reads, 300bp inserts. De novo assembly with CLC Bio and SOAPdenovo on a Mac with 64GB RAM. 2 references: Sativa, Indica. 65% AT. 0.5% to 1% polymorphism rate. 300Mb assembly with CLC Bio.
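A rough check of the coverage figures on slide 14, assuming the usual definition of mean coverage as total sequenced bases divided by genome size. The slide does not say which genome size the 327X figure was computed against, so the function names and the choice of inputs below are illustrative assumptions, not the authors' calculation.

```python
def coverage(total_bases, genome_size):
    """Mean depth of coverage: sequenced bases divided by genome size."""
    return total_bases / genome_size


def implied_genome_size(total_bases, depth):
    """Genome size that would yield the given mean depth."""
    return total_bases / depth


TOTAL_BASES = 131e9  # 131Gb of 2x100 reads, as quoted on the slide

# Genome size implied by the quoted 327X coverage.
print(f"{implied_genome_size(TOTAL_BASES, 327) / 1e6:.0f} Mb")

# Depth if the genome is instead 650Mb or 1.0Gb
# (assumption: the slide's "650-1.0Gb" range means 650Mb to 1.0Gb).
for size in (650e6, 1.0e9):
    print(f"{size / 1e6:.0f} Mb -> {coverage(TOTAL_BASES, size):.0f}X")
```

Under this definition, 131Gb at 327X corresponds to a genome of roughly 400Mb, while the same data spread over 650Mb to 1.0Gb gives roughly 200X down to about 130X.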
  15. Alignment of Assembly to Peach. Gene finding. BLAST2GO. Pseudo-assembly to other plants helps annotate non-polyA expressed and/or conserved regions.
  16. Whole genome sequencing reveals genetics of THCA Synthase allozymes. Blue reads are paired reads. Red and green reads are unpaired. Vertical lines are SNPs. Copia transposon: mechanism for higher THC gene copy number. THCA synthase gene: 8X higher coverage than the rest of the genome, implying many more copies than just 2. Lots of SNPs in the transposons, since 740X of transposons are collapsed into this assembly.
  17. Move to Triple Backcrossed Cultivars. LA Confidential. http://uf4a.org/
  18. Cannabis Indica. Database includes: LA Confidential, highly phased DNA sequence (13.5Gb); Chemdawg, high-coverage DNA sequence (131Gb). The LA Confidential 3X backcrossed assembly sums to 722Mb (all contigs) and 676Mb (>500bp contigs).
  19. Mitochondrion Sequence. 415kb mitochondrion sequence. 115kb chloroplast sequence.
  20. Chloroplast Sequence
  21. THCA Synthase and its various paralogs. Long reads help to phase the SNPs in THC Synthase: single 454 reads of 700bp preserve phase. Are these 8 other copies of diverged THCA synthase making THC, or could they be the other silent chemotypes? RNA-Seq can demonstrate expression. Phase is critical for amino acid prediction. Failure to phase: ATTCGTCTGCA [T/A] TTCTTCCTGAT [G/C] GGGCGCTG [A/C] TTT translates to I R L (Q or H) F F L (M or I) G R (Stop or C) F, giving eight candidate peptides: IRLQFFLMGRstop, IRLQFFLMGRCF, IRLQFFLIGRstop, IRLQFFLIGRCF, IRLHFFLMGRstop, IRLHFFLMGRCF, IRLHFFLIGRstop, IRLHFFLIGRCF. 2^N peptide predictions, where N = number of unphased SNVs.
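The 2^N point on slide 21 can be sketched as follows: enumerate every allele combination at the three unphased SNVs and translate each resulting sequence with the standard genetic code. The sequence and alleles are taken from the slide; the function and variable names are made up for illustration, and this is not the authors' pipeline.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Sequence from the slide, split at the three unphased SNVs.
SEGMENTS = ["ATTCGTCTGCA", "TTCTTCCTGAT", "GGGCGCTG", "TTT"]
SNVS = [("T", "A"), ("G", "C"), ("A", "C")]


def translate(dna):
    """Translate in frame 1, writing 'stop' and halting at a stop codon."""
    peptide = []
    for pos in range(0, len(dna) - len(dna) % 3, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":
            peptide.append("stop")
            break
        peptide.append(residue)
    return "".join(peptide)


def candidate_peptides(segments, snvs):
    """Return one peptide per allele combination (2^N for N biallelic SNVs)."""
    peptides = []
    for alleles in product(*snvs):
        dna = segments[0]
        for allele, segment in zip(alleles, segments[1:]):
            dna += allele + segment
        peptides.append(translate(dna))
    return peptides


for peptide in candidate_peptides(SEGMENTS, SNVS):
    print(peptide)
```

With N = 3 this prints the eight peptides listed on the slide; a read long enough to span all three sites would select exactly one combination per haplotype instead.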
  22. Other data emerges. RNA-Seq: Mexican Sativa. Purple Kush: Indica. USO-1: hemp. Finola: hemp.
  23. Polymorphisms across 3 cultivars. ChemDawg sequenced to 327X coverage with 2x100 reads: high AT content discovered, high polymorphism rate discovered. 3x backcrossed LA Confidential (DNA Genetics) sequenced to over 15X: lower polymorphism rate. CD = Chemdawg, LA = LA Confidential, PK = Purple Kush.

      SNVs genome-wide
      Comparison   Heterozygotes   Homozygotes   Total       Ti/Tv
      CD X CD      1,413,345       100,274       1,513,619   1.64
      LA X LA        925,602             0         925,602   1.72
      CD X LA      1,960,931     1,506,345       3,467,276   1.62
      LA X CD      1,357,810     1,491,827       2,849,637   1.84
      LA X PK      1,854,661     1,988,717       3,843,378   1.76
      CD X PK      3,000,128     1,573,243       4,573,371   1.69
      PK X PK      1,085,040       221,657       1,306,697   1.66

      SNVs in the coding regions
      Comparison         Heterozygotes   Homozygotes   Total
      LA Conf X PKUSH     94,853          78,251       173,104
      Chemdawg X PKUSH   302,449          94,467       396,916
      Pkush X Pkush
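For readers unfamiliar with the Ti/Tv column on slide 23: it is the ratio of transitions (purine to purine or pyrimidine to pyrimidine changes) to transversions (purine to pyrimidine or the reverse). A minimal sketch of the calculation follows; the input list is hypothetical and the function names are illustrative, not taken from the authors' tooling.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """True for A<->G or C<->T substitutions."""
    pair = {ref, alt}
    return ref != alt and (pair <= PURINES or pair <= PYRIMIDINES)


def ti_tv_ratio(snvs):
    """Transition/transversion ratio for an iterable of (ref, alt) bases."""
    transitions = 0
    transversions = 0
    for ref, alt in snvs:
        if ref == alt:
            continue  # not a substitution
        if is_transition(ref, alt):
            transitions += 1
        else:
            transversions += 1
    if transversions == 0:
        raise ValueError("no transversions; Ti/Tv is undefined")
    return transitions / transversions


# Hypothetical SNV calls: 3 transitions and 2 transversions.
example = [("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("G", "T")]
print(ti_tv_ratio(example))  # 1.5
```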
  24. RNA-Seq data from 5 tissues: mature bud, early bud, mature leaf, early leaf/petiole, root.
  25. Characterizing THCA Synthase-like genes. LA Confidential contigs with BLAST hit to THC Synthase. Purple Kush assembly hole filled by 454 long reads.
  26. Root-expressed synthases have a novel N-terminal domain. Root-specific synthase expression. FAD binding intact. BBE domain intact. Novel N-terminal domain: PsbN.
  27. Novel synthase differentially expressed in roots. Novel candidate cannabinoid synthase gene: 81% nucleotide similarity to CBDA synthase; 81% nucleotide similarity to THCA synthase; 1655bp ORF; transcriptionally active from polyadenylated RNA; intact FAD nucleotide and AA binding domain of THC and mPIF sequence. Figure labels: ROOT, THCA Synthase, Cannabichromene synthase candidate gene, FLOWER.
  28. Family Tree of Synthase genes across cultivars
  29. What markets are enabled with this? Understanding cannabichromene requires Schedule I licenses (time) and is a longer-term project. Armed with the genome we can design qPCR assays to quantitate cannabinoid RNA and mold for better labeling. Courtagen also has the potential for Q400 ELISA assays for pesticides and mold. The medical Cannabis industry needs better labeling, and POC assays are required to manage diversion concerns inherent in centralized testing labs. Can we sequence patients to better understand cannabinoids and metabolic disease?
  30. Avantra’s Biomarker Platform Highlights. Simplified multiplex assay: fully automated multiplex ELISA (20-plex) on a chip with all reagents on board; most applications measure five to seven different analytes; minimal sample requirements (100uL); sample types: serum, plasma, blood and other non-particulate samples. Highly precise and accurate: 3-4 log dynamic range on multiple analytes; reproducibility: low intra/inter-assay CV, below 10%; instrument-to-instrument CV less than 0.3%; improved accuracy with six replicates per analyte; tens-of-picogram sensitivity. Fast and user-friendly workflow: less than 1 minute sample prep; assay run time between 15 and 40 minutes; bench-top system for non-specialized technician; compact footprint (1.8 square feet). [Figure: typical calibration curves, signal intensity (RFU) vs. log concentration (ng/mL), for TIMP-1, HGF, ICAM-1, TIE-2, VEGF-R2, FGF-Basic, IL8, E-selectin, PlGF, VEGF.] Company focus: merge genomic data with biomarker data.
  31. CLIA certified for mitochondrial sequencing. 1100 nuclear genes including CB1, CB2, FAAH and MGLL. 20,000X coverage of the mitochondrial genome.
  32. Courtagen’s CLIA sequencing pipeline. Six steps: customer acquisition (saliva, blood, tissue); Courtagen CLIA laboratory; biomarkers; ongoing bioinformatics; databases; personalized service (web/iPad app portal). Mito LR PCR, 2 different libraries. Nextera library generation. Embedded controls. Haloplex 1100 genes. 1:2000 children affected: sequencing can save $100-$200K per year in costs. Thought to be responsible for 10-20% of autism.
  33. mtSEEK PDx assay feature: embedded controls. Control human DNA 1: NA19240 (purify DNA). Control human DNA 2: NA12878 (purify DNA). Mix DNAs at precise ratios; 2 or more mixtures depending on application. Make a barcoded library for each mixture: Mix 1, 90%:10%, DNA barcode CCCCCC; Mix 2, 95%:5%, DNA barcode GGGGG; Mix 3, 98%:2%, DNA barcode CACACA; Mix 4, 99%:1%, DNA barcode GTGTGT.
  34. Barcoded Embedded DNA Controls. Barcoded mixture controls. Clinical patient DNA 1: attach unique DNA barcode. Clinical patient DNA 2…50: attach unique DNA barcode. Mix controls with clinical samples. Sequence samples and barcodes. De-multiplex barcodes. Controls in every run provide sensitivity and specificity.
  35. mtSEEK PDx Assay Features. CLIA validated assay with CPT codes. NUMTs-free capture technique (5% heteroplasmy sensitivity). Two libraries made from each patient; only genotypes observed in both libraries are reported. Automated Nextera library generation. Barcoded embedded controls. Each library sequenced to 10,000X coverage. 2 x 150bp reads assist in reducing noise from NUMTs. Dual indexing used to eliminate patient mis-ID and sequencing artifacts. Saliva, blood and tissue CLIA validated. 3-day TAT; backlog + shipping and approval = 3-week TAT. Workflow figure: consistent Nextera library generation, MiSeq 2 x 150bp sequencing, NUMTs depletion step.
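The "only report genotypes observed in both libraries" rule on slide 35 can be sketched as a simple concordance filter. This is a minimal illustration under stated assumptions: the data layout (a dict keyed by position and alternate base), the function name, applying the quoted 5% heteroplasmy figure as a per-library cutoff, and averaging the two fractions are all choices made here, not details given in the slides.

```python
HETEROPLASMY_THRESHOLD = 0.05  # assumption: the slide's 5% sensitivity used as a cutoff


def concordant_calls(lib_a, lib_b, threshold=HETEROPLASMY_THRESHOLD):
    """Keep variants seen at or above the threshold in both libraries.

    lib_a, lib_b: dicts mapping (position, alt_base) -> alternate allele fraction.
    Returns a dict mapping each reported variant to the mean of the two fractions.
    """
    reported = {}
    for variant in lib_a.keys() & lib_b.keys():
        if lib_a[variant] >= threshold and lib_b[variant] >= threshold:
            reported[variant] = (lib_a[variant] + lib_b[variant]) / 2
    return reported


# Hypothetical calls from two libraries of one patient.
library_1 = {(100, "G"): 0.12, (200, "A"): 0.06, (300, "T"): 0.50}
library_2 = {(100, "G"): 0.10, (200, "A"): 0.03}

print(concordant_calls(library_1, library_2))
```

In this example only the variant at position 100 is reported: position 200 falls below the cutoff in the second library, and position 300 is absent from it, which is the kind of library-specific artifact the two-library design is meant to suppress.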
  36. Summary: Clearing the Smoke. Phased genome sequence provides: key cannabinoid synthase pathways now resolved; synthetic biology approach for therapeutic cannabinoid manufacturing enabled; toolkit to design RT-qPCR assays for sequences predictive of cannabinoid content and mold content, critical to bring better labeling and regulation to the growing dispensary-based market for medical cannabis. 1 in 3 people will get cancer in their lifetime. 1 in 4 will die with or from it. Anything non-toxic and showing preliminary signs of cancer-specific apoptosis is a priority (Guzman et al., Nature Cancer Review, 2004).
  37. Acknowledgements. In 6 months we started a company, sequenced a genome, booked revenue and were acquired (now a division of Courtagen Life Sciences). 2 guys and a garage. Christian Giannini. Lots of outsourcing: Doug Smith (Beckman Genomics); Karin Fredrickson, James Knight (Roche 454); Brian O’Connor, Sara Grimm (Nimbus Informatics); Tim Harkins (Life Technologies); Medicinal Plant Genomics Resource; Harm Van Bakel (Toronto); CLCBio. We are hiring! Genetic counselors, bioinformatics scientists. http://www.courtagen.com/
