Cannabis has gone through a breeding bottleneck under prohibition, and we believe many silent chemotypes will be found in the genomes of existing strains.
Stay scientific and don't be influenced by 30-year-old stigmas. Better cannabis regulation is needed. FDA trials on complex drug cocktails are expensive, making cannabis unlikely to be a pharmaceutical priority given that the generic is ever present.
Sequencing Cannabis sativa and Cannabis indica, Courtagen Life Sciences, Inc., Kevin McKernan, Copenhagenomics 2012
Cannabis Indica and Cannabis Sativa genomes. We import $1B/year in hemp products. Cannabis black market: $45B - $113B. California “legal” dispensary-driven market: $1.3B (growing 50% a year). 45% grown in the US (over 22M lbs), 40% Mexican, 10% Canadian, 5% other.
Therapeutic Index: 35,000 annual deaths from alcohol in the US. EtOH is involved in 25-30% of violent crimes in the US and 50% in the UK. EtOH is involved in 80% of domestic violence. Lisbon football games successfully reduced riots by promoting cannabis. (Steve Fox, Paul Armentano, Mason Tvert)
Social and political considerations: Over 1M US citizens imprisoned every year for cannabis; 7.8M citizens imprisoned in 10 years. $50B/year in prisons. Private prisons growing 17%/year. 25% of global prisoners are in the US. [Figure: rate of prescription drug overdoses per 100K; source: CDC]
Medical Excuse or Medical Use? The endocannabinoid pathway is pervasive and plays a critical role in the following disease etiologies. Analgesics: estimated $75B US market. Chronic pain: estimated to be over $200B US health care cost (source: Institute for Pain Management). Cancer pain, wasting, apoptosis: estimated over $220B US health care cost (NCI number is $117B). MS spasticity: $10B global. Diabetes and weight management: very large.
Obesity and BMI are linked to human variants in endocannabinoid genes. Certain rare human FAAH and MGLL genotypes are associated with high anandamide plasma levels and obesity. Patients with these genotypes will impact clinical trials.
Human variation of the receptors: Are there populations with mutations in CB1, CB2 or “CB3” which may require custom dosages of cannabinoids?
Over 85 cannabinoids discovered (http://en.wikipedia.org/wiki/Cannabinoids). Cannabis has gone through a breeding bottleneck under prohibition, and we believe many silent chemotypes will be found in the genomes of existing strains. What are the genetic pathways? Which enzymes have variants in these pathways? Are there extinct synthases in the genome which can be discovered or recovered? Terpenoids?
Genetic Bottleneck of Prohibition: The unit of measure in US penal codes is weight based, not volume or %THCA anchored. This drives the underground market towards plant matter with higher THCA concentration. Many precursors in the pathways are shared, suggesting that higher THCA concentration has come at the cost of lower CBDA and other therapeutic cannabinoids.
Many parts of the cannabinoid pathways are still unknown. Why sequence the genome? 1) Chemical synthesis produces racemics. 2) The plant grows quickly and productively. 3) The trend is towards cocktails of cannabinoids and terpenoids. Discovery of pathways can aid in breeding and synthetic biology approaches to manufacturing.
Predictive Genomics: Mutations in the FAD binding domain compromise and/or deactivate THC production in strains (Sirikantaramas et al.). THC Synthase Annotation.
Applying Sequencing to Cannabinoids: Dye-based genome size estimate: 1.6Gb (n=2). Sequencing supports 650Mb-1.0Gb (n=2). 10 chromosomes. De novo shotgun to 327X coverage: 131Gb of 2x100 ILMN reads, 300bp inserts. De novo assembly with CLC Bio and SOAPdenovo on a 64Gb RAM Mac. 2 references: Sativa and Indica. 65% AT. 0.5->1% polymorphism rate. 300Mb assembly with CLC Bio.
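The coverage figure on this slide follows from the usual relation coverage = sequenced bases / genome size. The sketch below only rearranges the slide's own numbers (131Gb, 327X); the slide does not say which genome size the 327X figure was computed against, so the function names and the list of candidate sizes are illustrative assumptions, not the authors' calculation.

```python
def fold_coverage(total_bases: float, genome_size: float) -> float:
    """Average depth = sequenced bases / genome size (both in bp)."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def implied_genome_size(total_bases: float, coverage: float) -> float:
    """Genome size (bp) that a stated coverage implies for a given yield."""
    if coverage <= 0:
        raise ValueError("coverage must be positive")
    return total_bases / coverage


TOTAL_BASES = 131e9       # 131Gb of 2x100 reads (from the slide)
REPORTED_COVERAGE = 327   # 327X (from the slide)

# Candidate genome sizes taken from the slide; which one applies is an assumption.
CANDIDATE_SIZES = [
    ("300Mb assembly", 300e6),
    ("650Mb estimate", 650e6),
    ("1.0Gb estimate", 1.0e9),
    ("1.6Gb dye-based estimate", 1.6e9),
]

if __name__ == "__main__":
    size = implied_genome_size(TOTAL_BASES, REPORTED_COVERAGE)
    print(f"Genome size implied by 131Gb at 327X: {size / 1e6:.0f} Mb")
    for label, genome_size in CANDIDATE_SIZES:
        depth = fold_coverage(TOTAL_BASES, genome_size)
        print(f"{label}: {depth:.0f}X")
```

Read this way, 131Gb at 327X corresponds to a genome of roughly 400Mb, which sits between the 300Mb assembly and the 650Mb lower size estimate.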
Alignment of Assembly to Peach. Gene finding: BLAST2GO. Pseudo-assembly to other plants helps annotate non-polyA expressed and/or conserved regions.
Whole genome sequencing reveals genetics of THCA synthase allozymes. [Figure: read alignment across a Copia transposon and the THCA synthase gene. Blue reads are paired reads; red and green reads are unpaired; vertical lines are SNPs.] 8X higher coverage than the rest of the genome, implying many more copies than just 2. Mechanism for higher THC gene copy number. Lots of SNPs in the transposons, since 740X of transposons are collapsed into this assembly.
Move to Triple Backcrossed Cultivars LA Confidential http://uf4a.org/
Cannabis Indica • Database includes • LA Confidential: highly phased DNA sequence (13.5Gb) • Chemdawg: high coverage DNA sequence (131Gb) • LA Conf. 3X backcrossed assembly sums to 722Mb (all contigs) and 676Mb (>500bp contigs)
THCA Synthase and its various paralogs. Long reads help to phase the SNPs in THC synthase: single 454 reads of 700bp preserve phase. Are these 8 other copies of diverged THCA synthase making THC, or could they be the other silent chemotypes? RNA-Seq can demonstrate expression. Phase is critical for amino acid prediction. Failure to phase: ATTCGTCTGCA [T/A] TTCTTCCTGAT [G/C] GGGCGCTG [A/C] TTT translates to I R L (Q or H) F F L (M or I) G R (Stop or C) F, giving 8 candidate peptides: IRLQFFLMGRstop, IRLQFFLMGRCF, IRLQFFLIGRstop, IRLQFFLIGRCF, IRLHFFLMGRstop, IRLHFFLMGRCF, IRLHFFLIGRstop, IRLHFFLIGRCF. 2^N peptide predictions, where N = number of unphased SNVs.
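The 2^N blow-up can be reproduced directly from the sequence on this slide. The sketch below enumerates every combination of the three unphased SNVs and translates each haplotype with the standard genetic code. The sequence segments and alleles are taken from the slide; the function names, the use of "*" for a stop codon, and truncating the peptide at the first stop are illustrative choices.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Sequence from the slide: ATTCGTCTGCA [T/A] TTCTTCCTGAT [G/C] GGGCGCTG [A/C] TTT
SEGMENTS = ["ATTCGTCTGCA", "TTCTTCCTGAT", "GGGCGCTG", "TTT"]
SNVS = [("T", "A"), ("G", "C"), ("A", "C")]


def translate(dna: str) -> str:
    """Translate in frame 0, stopping at (and including) the first stop '*'."""
    peptide = []
    for i in range(0, len(dna) - len(dna) % 3, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        peptide.append(aa)
        if aa == "*":
            break
    return "".join(peptide)


def unphased_peptides(segments, snvs):
    """Return one peptide per allele combination (2^N for N biallelic SNVs)."""
    peptides = []
    for alleles in product(*snvs):
        seq = segments[0]
        for allele, segment in zip(alleles, segments[1:]):
            seq += allele + segment
        peptides.append(translate(seq))
    return peptides


if __name__ == "__main__":
    for peptide in unphased_peptides(SEGMENTS, SNVS):
        print(peptide)
```

With three unphased SNVs this yields the same 8 candidate peptides listed on the slide; a read long enough to span all three sites reduces the list to the haplotypes actually observed.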
Other data emerges. RNA-Seq: Mexican Sativa. Purple Kush: Indica. USO-1: Hemp. Finola: Hemp.
Polymorphisms across 3 cultivars. ChemDawg sequenced to 327X coverage with 2x100 reads: high AT content discovered, high polymorphism rate discovered. 3x backcrossed LA Confidential (DNA Genetics) sequenced to over 15X: lower polymorphism rate.
SNVs genome wide (CD = Chemdawg, LA = LA Confidential, PK = Purple Kush)
Comparison   Heterozygotes   Homozygotes   Total       Ti/Tv
CD X CD      1,413,345       100,274       1,513,619   1.64
LA X LA      925,602         0             925,602     1.72
CD X LA      1,960,931       1,506,345     3,467,276   1.62
LA X CD      1,357,810       1,491,827     2,849,637   1.84
LA X PK      1,854,661       1,988,717     3,843,378   1.76
CD X PK      3,000,128       1,573,243     4,573,371   1.69
PK X PK      1,085,040       221,657       1,306,697   1.66
SNVs in the coding regions (coding SNPs)
Comparison         Heterozygotes   Homozygotes   Total
LA Conf X PKUSH    94,853          78,251        173,104
Chemdawg X PKUSH   302,449         94,467        396,916
Pkush X Pkush
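The Ti/Tv column is the ratio of transitions (A<->G, C<->T) to transversions (all other substitutions). The slides do not show how it was computed, so the following is only a minimal sketch of the standard definition; the function name and the toy SNV list are invented for illustration.

```python
# Transitions are purine<->purine or pyrimidine<->pyrimidine substitutions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
VALID_BASES = set("ACGT")


def ti_tv_ratio(snvs):
    """Compute transitions / transversions for (ref, alt) base pairs."""
    ti = 0
    tv = 0
    for ref, alt in snvs:
        if ref not in VALID_BASES or alt not in VALID_BASES or ref == alt:
            raise ValueError(f"not a single-nucleotide substitution: {ref}>{alt}")
        if (ref, alt) in TRANSITIONS:
            ti += 1
        else:
            tv += 1
    if tv == 0:
        raise ValueError("no transversions; ratio is undefined")
    return ti / tv


if __name__ == "__main__":
    # Toy data, not taken from the cultivar comparisons.
    toy_snvs = [("A", "G"), ("C", "T"), ("T", "C"), ("A", "C"), ("G", "T")]
    print(f"Ti/Tv = {ti_tv_ratio(toy_snvs):.2f}")
```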
RNA-Seq data from 5 tissues: mature bud, early bud, mature leaf, early leaf/petiole, root.
Characterizing THCA synthase-like genes: LA Confidential contigs with BLAST hits to THC synthase. Purple Kush assembly hole filled by 454 long reads.
Novel synthase differentially expressed in roots. Novel candidate cannabinoid synthase gene: • 81% nucleotide similarity to CBDA synthase • 81% nucleotide similarity to THCA synthase • 1655bp ORF • Transcriptionally active from polyadenylated RNA • Intact FAD nucleotide and AA binding domain of THC and mPIF sequence. [Figure labels: ROOT, FLOWER; THCA synthase; cannabichromene synthase candidate gene]
Family Tree of Synthase genes across cultivars
What markets are enabled with this? Understanding cannabichromene requires Schedule I licenses (time) and is a longer-term project. Armed with the genome, we can design qPCR assays to quantitate cannabinoid RNA and mold for better labeling. Courtagen also has the potential for Q400 ELISA assays for pesticides and mold. The medical cannabis industry needs better labeling, and POC assays are required to manage the diversion concerns inherent in centralized testing labs. Can we sequence patients to better understand cannabinoids and metabolic disease?
Avantra’s Biomarker Platform Highlights. Simplified multiplex assay: fully automated multiplex ELISA (20-plex) on a chip with all reagents on board; most applications measure five to seven different analytes; minimal sample requirements (100uL); sample types: serum, plasma, blood and other non-particulate samples. Highly precise and accurate: 3-4 log dynamic range on multiple analytes; reproducibility: low intra/inter-assay CV, below 10%; instrument-to-instrument CV less than 0.3%; improved accuracy with six replicates per analyte. Fast and user-friendly workflow: less than 1 minute sample prep; assay run time between 15-40 minutes; bench-top system for non-specialized technician; compact footprint (1.8 square feet). Typical calibration curves: 10's of picogram sensitivity. [Figure: signal intensity (RFU) vs. log concentration (ng/mL) for TIMP-1, HGF, ICAM-1, TIE-2, VEGF-R2, FGF-Basic, IL8, E-selectin, PlGF and VEGF.] Company focus: merge genomic data with biomarker data.
CLIA Certified for Mitochondrial Sequencing: 1100 nuclear genes including CB1, CB2, FAAH and MGLL. 20,000X coverage of the mitochondrial genome.
Courtagen’s CLIA sequencing pipeline. [Diagram: six-stage pipeline running from customer sample acquisition (saliva, blood, tissue) through the Courtagen CLIA laboratory, biomarkers, bioinformatics and databases to an ongoing personalized service via a Web/iPad app portal.] Mito LR PCR, 2 different libraries. Nextera library generation. Embedded controls. Haloplex 1100 genes. 1:2000 children affected: sequencing can save $100-$200K per year in costs. Thought to be responsible for 10-20% of autism.
mtSEEK PDx assay feature: Embedded controls. Control human DNA 1: NA19240 (purify DNA). Control human DNA 2: NA12878 (purify DNA). Mix DNAs at precise ratios; 2 or more mixtures depending on application. Make a barcoded library for each mixture:
Mix     Ratio     DNA barcode
Mix 1   90%:10%   CCCCCC
Mix 2   95%:5%    GGGGG
Mix 3   98%:2%    CACACA
Mix 4   99%:1%    GTGTGT
Barcoded Embedded DNA Controls. Barcoded mixture controls: mix controls with clinical samples. Patients: attach a unique DNA barcode to clinical patient DNA 1, 2 … 50. Sequence samples and barcodes. De-multiplex barcodes. Controls in every run provide sensitivity and specificity.
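The de-multiplexing step can be sketched as sorting reads into bins by barcode. The control barcodes below are the ones listed on the embedded-controls slide; everything else is an assumption made for illustration: the slides do not describe the read layout, so the barcode is treated here as a simple prefix of each read, and the function name and toy reads are invented. Production pipelines normally read barcodes from separate index reads and tolerate mismatches.

```python
from collections import defaultdict

# Control barcodes as printed on the embedded-controls slide.
CONTROL_BARCODES = {
    "CCCCCC": "Mix 1",
    "GGGGG": "Mix 2",
    "CACACA": "Mix 3",
    "GTGTGT": "Mix 4",
}


def demultiplex(reads, barcodes):
    """Assign each read to a sample by exact barcode prefix (assumed layout).

    Returns a dict of sample name -> list of reads with the barcode trimmed.
    Reads matching no barcode are collected under "unassigned".
    """
    bins = defaultdict(list)
    for read in reads:
        for barcode, sample in barcodes.items():
            if read.startswith(barcode):
                bins[sample].append(read[len(barcode):])
                break
        else:
            bins["unassigned"].append(read)
    return dict(bins)


if __name__ == "__main__":
    # Toy reads, invented for this example.
    toy_reads = ["CCCCCCATTCGT", "GTGTGTTTAGCA", "CACACAGGCTTA", "AAAAAATTTGGG"]
    for sample, sample_reads in demultiplex(toy_reads, CONTROL_BARCODES).items():
        print(sample, sample_reads)
```

Because every run then contains mixtures of known ratio, the fraction of minor-allele reads recovered in each control bin gives a per-run check on sensitivity.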
mtSEEK PDx Assay Features. CLIA validated assay with CPT codes. NUMTs-free capture technique (5% heteroplasmy sensitivity). Two libraries are made from each patient, and only genotypes observed in both libraries are reported. Automated Nextera library generation. Barcoded embedded controls. Each library sequenced to 10,000X coverage. 2 x 150bp reads assist in reducing noise from NUMTs. Dual indexing used to eliminate patient mis-ID and sequencing artifacts. Saliva, blood and tissue CLIA validated. 3 day TAT; backlog + shipping and approval = 3 week TAT. [Figure labels: consistent Nextera library generation; MiSeq 2 x 150bp sequencing; NUMTs depletion step.]
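The rule of reporting only genotypes seen in both libraries amounts to intersecting two call sets. A minimal sketch follows; the data structure (variant mapped to allele fraction), the function name and the toy positions are assumptions, and applying the 5% heteroplasmy figure as a threshold to each library separately is an interpretation of the slide, not a documented detail of the assay.

```python
def concordant_calls(lib_a, lib_b, min_fraction=0.05):
    """Keep variants called in both libraries at or above min_fraction.

    lib_a, lib_b: dict mapping (position, alt_base) -> allele fraction.
    Returns a dict mapping variant -> (fraction_in_a, fraction_in_b).
    """
    shared = lib_a.keys() & lib_b.keys()
    return {
        variant: (lib_a[variant], lib_b[variant])
        for variant in sorted(shared)
        if lib_a[variant] >= min_fraction and lib_b[variant] >= min_fraction
    }


if __name__ == "__main__":
    # Toy call sets, invented for this example.
    library_1 = {(3243, "G"): 0.12, (8993, "C"): 0.07, (11778, "A"): 0.02}
    library_2 = {(3243, "G"): 0.11, (11778, "A"): 0.03, (14484, "C"): 0.09}
    for variant, fractions in concordant_calls(library_1, library_2).items():
        print(variant, fractions)
```

In the toy data only the first variant survives: the second and fourth appear in one library only, and the third falls below the threshold in both.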
Summary: Clearing the Smoke. Phased genome sequence provides: key cannabinoid synthase pathways now resolved; synthetic biology approach for therapeutic cannabinoid manufacturing enabled; toolkit to design RT-qPCR assays for sequences predictive of cannabinoid content and mold content. Critical to bring better labeling and regulation to the growing dispensary-based market for medical cannabis. 1 in 3 people will get cancer in their lifetime; 1 in 4 will die with or from it. Anything non-toxic and showing preliminary signs of cancer-specific apoptosis is a priority (Guzman et al., Nature Reviews Cancer, 2004).
Acknowledgements: In 6 months we started a company, sequenced a genome, booked revenue and were acquired (now a division of Courtagen Life Sciences). 2 guys and a garage. Christian Giannini. Lots of outsourcing. Doug Smith (Beckman Genomics); Karin Fredrickson, James Knight (Roche 454); Brian O’Connor, Sara Grimm (Nimbus Informatics); Tim Harkins (Life Technologies); Medicinal Plant Genomics Resource; Harm Van Bakel (Toronto); CLC Bio. We are hiring! Genetic Counselors, Bioinformatics Scientists. http://www.courtagen.com/