Could genomics (plus NGS/Cloud) save the world?



Slides from a talk at the University of East Anglia (6th December 2011) looking at recent advances in high-throughput genomics, and how these can be applied to 'big issues' such as food security and disease. Particular attention is paid to the role of open science in reacting to critical situations.
- Will Spooner, CTO.

Published in: Technology
  • Hello, I'm the CTO and founder of Eagle Genomics, a Cambridge-based professional services company specialising in genome content management. We make it easier for industry to consume open data and open source software. Two buzzwords: NGS and Cloud. Simultaneous arrival. Evangelist, selling genomics throughout the life science industry. Take on the big question: can we save the world?
  • Two big issues have bothered humanity since the dawn of time: famine and disease. What has genomics to offer? Genomics in food security: JIC/TGAC. Genomics in healthcare: PHG (I was part of the workgroup).
  • Looking into the future (speculation): synthetic life. Mother Earth, climate change: understanding using metagenomics and the basis of molecular evolution.
  • It's been 10 years since the sequencing of the genome... Initial impact (Eric Lander): genomic medicine will transform the health of our children and our children's children. One of the major advances has been our understanding of the genetic basis behind disease, significantly through the technique of genome-wide association of genotypes (DNA differences between individuals) with observable traits (phenotypes). In this illustration, the central graph shows a region of 1 million base pairs on chromosome 15, or one three-thousandth of the total genome. Each of the points on the graph is the location of a single base pair difference between individuals' DNA. Only a few hundred are shown here, from the several million that will have been present in the experiment as a whole. The y axis is the probability that the genotype at each location is related to a specific phenotype, based on a sample of 18,000 individuals. So, in this case, what is the phenotype? Coffee consumption! This is used as a model of addictive behaviour. The study itself was published this summer in the journal Molecular Psychiatry: "Genome-wide association analysis of coffee drinking suggests association with CYP1A1/CYP1A2 and NRCAM". So what do these genes do? CYP1A2 is the main caffeine-metabolising enzyme, and CYP1A1 metabolises other coffee aromatics. NRCAM is a neuronal cell adhesion molecule that has been implicated in addiction vulnerability. So what does this mean? The reference at rs2470893 is a C, and for each copy of T you will drink 0.2 extra cups of coffee per day. My genotype at this location is CT (23andMe), so I can blame about a cup of coffee per week on my genes. However, the heritability of coffee consumption is about 50%, so I can blame quite a few more cups on my parents!
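The coffee arithmetic in the note above can be checked with a short sketch. It assumes a purely additive per-allele model using the 0.2 cups/day figure quoted in the talk, and a round 3 billion base pair genome; the function names are made up for illustration and do not come from the study.

```python
# Illustrative sketch only: an additive per-allele model using the effect
# size quoted in the talk. Function names are hypothetical.

CUPS_PER_DAY_PER_T_ALLELE = 0.2   # figure quoted in the talk for rs2470893
GENOME_BP = 3_000_000_000         # assumption: roughly 3 billion base pairs


def extra_cups_per_week(genotype: str, effect_allele: str = "T") -> float:
    """Extra weekly cups attributed to copies of the effect allele."""
    copies = genotype.upper().count(effect_allele)
    return round(copies * CUPS_PER_DAY_PER_T_ALLELE * 7, 2)


def region_fraction(region_bp: int, genome_bp: int = GENOME_BP) -> float:
    """Fraction of the genome covered by a region of the given size."""
    return region_bp / genome_bp


if __name__ == "__main__":
    for genotype in ("CC", "CT", "TT"):
        print(genotype, extra_cups_per_week(genotype), "extra cups/week")
    print("1 Mb region as a fraction of the genome:", region_fraction(1_000_000))
```

Under these assumptions a CT genotype comes out a little over one extra cup a week, and a 1 million base pair region is one three-thousandth of the genome, in line with the figures in the note.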
  • Let's bring in the plant side of the equation. One of the earliest and perhaps most advanced applications of the “new genomics” is in plant breeding. Here, you start with a pedigree; the example shows rice plants from the International Rice Research Institute in the Philippines. The phenotypes and genotypes of each plant are measured and statistically associated. The associations are used to develop assays (biomarkers) for desirable phenotypes (disease/drought resistance, increased yield). These assays are used in plant breeding programmes to select the individuals to take forward.
  • Pharmacogenomics is the influence of genetic variation on drug response; i.e. correlation with pharmacological phenotypes. The idea is to use phenotypic association as the basis of a molecular biomarker. Patients that carry this biomarker can be given the right drug at the right dose at the right time. To date this approach has worked well for targeted cancer therapies such as Herceptin (not surprising, as cancer is fundamentally a disease of the genome) and also for Warfarin dosing. Such are the benefits to efficacy and ADRs that consideration of pharmacogenomics in drug development is becoming widely accepted within large pharma.
  • With the advent of "next generation" massively parallel DNA sequencing (NGS), the cost of DNA sequencing has fallen a thousand-fold in the last three years, from $500/Mb to $0.5/Mb. This decrease in cost is truly democratising genomics; data volumes that were previously the domain of only the largest international sequencing centres, such as the Sanger in Cambridge or the Whitehead in the other Cambridge, are now available to even the most meagre labs. This has led to an explosion in novel applications for NGS in genomics, transcriptomics, epigenomics, and metagenomics.
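A rough worked version of the cost figures quoted above, as a minimal sketch. It assumes a human genome of about 3,000 Mb and uses the 30x coverage mentioned on the slides; the helper names are invented for illustration.

```python
# Back-of-envelope sequencing cost arithmetic. The genome size is an
# assumption (about 3,000 Mb); the prices and 30x coverage come from the talk.

HUMAN_GENOME_MB = 3_000


def fold_reduction(old_cost_per_mb: float, new_cost_per_mb: float) -> float:
    """How many times cheaper the new price is than the old one."""
    return old_cost_per_mb / new_cost_per_mb


def genome_cost(cost_per_mb: float, coverage: int = 30,
                genome_mb: int = HUMAN_GENOME_MB) -> float:
    """Raw sequencing cost of one genome at the given coverage."""
    return cost_per_mb * coverage * genome_mb


def cost_per_mb_for_budget(budget: float, coverage: int = 30,
                           genome_mb: int = HUMAN_GENOME_MB) -> float:
    """Price per Mb needed to sequence one genome within a budget."""
    return budget / (coverage * genome_mb)


if __name__ == "__main__":
    print("Fold reduction:", fold_reduction(500, 0.5))
    print("30x genome at $500/Mb:", genome_cost(500))
    print("30x genome at $0.5/Mb:", genome_cost(0.5))
    print("$/Mb for a $1000 genome:", round(cost_per_mb_for_budget(1000), 4))
```

On these assumptions, a $1000 genome at 30x coverage implies roughly $0.01 per Mb, which matches the figure shown on the "democratising genomics" slide.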
  • Whilst the cost of data generation has reduced dramatically, the bottleneck has become its analysis, leading this 2010 paper from WashU to speculate about the holy grail of the $1000 genome being accompanied by a $100,000 price tag for analysis. New technologies: single molecule (Oxford Nanopore).
  • Is lower potential TCO the only reason to go to cloud? No, trying to lower TCO is the wrong reason to move to cloud. Simple IT cost reduction should not be the only goal of a cloud implementation, says Kirwin. Just as important are other gains that can be achieved through cloud, including heightened analytic processing of big data, greater agility, more flexibility, faster time to market and improved business process.
  • We can’t do it alone… We must collaborate… Open science/knowledge. The scientific process has four outputs: methods/protocols that generate results that can be documented/published, thereby contributing new insight/knowledge to the scientific corpus. I define 'open science' as any of the above that is made freely available for the use of anyone, with few if any restrictions. Anyone can contribute: collaborative science.
  • We must get more efficient. The situation in bioinformatics today reminds me of working in a dot com startup during the internet bubble; everyone employed web developers to hand-craft HTML web pages and CGI scripts. These days... Basic data management = CREATE, MANAGE and REPORT. Content management gives these, and adds REUSE, SHARE and EXTEND. OPEN SCIENCE.
  • The Genome-CMS concept is not new; there is a wide variety of existing platforms that fall under the umbrella. Ensembl and SRS, for instance, are both over 10 years old. Platforms can, to some extent, be classified into those which are workflow (i.e. data creation) oriented, and those which are database (i.e. data management) oriented. Another key differentiator is the license under which the software is offered: open source or proprietary.
  • One of the expectations of the genome was to empower pharmaceutical companies to create drugs based on the DNA, RNA and protein molecules associated with genes and diseases. The rate of progress has not been as fast as some pundits would have liked, hence the recent flurry of expectation-setting exercises. The US National Human Genome Research Institute, for instance, suggests that most new drugs based on the completed genome are still perhaps 10 to 15 years in the future, although more than 350 biotech products, many based on genetic research, are currently in clinical trials. Some pharma company execs, notably Lilly's CEO John Lechleiter, are more bullish. He sees increased research productivity resulting from a larger knowledge base. To quote from a recent Xconomy article: "In some cases, our knowledge of biological pathways now is akin to lights being turned on in a room versus groping around in the dark." Lilly continue to invest in R&D. The image for this slide is from Reactome, a curated database of biological processes in humans, and shows the many molecular actors in the apoptosis pathway.
  • Reference genomics opens up new routes to R&D. If rational rules-based drug design has failed, where do we look next? A great example: the sequenced model plant, Medicago truncatula, is being used to study the biosynthesis of an important class of medically significant compounds, namely saponins. A ready and therapeutically relevant example is the cardio-active agent digoxin, from common foxglove. Saponins possess a range of biological activities, including antimicrobial, anti-palatability, anti-cancer and hemolytic. But the biosynthesis of these compounds is not understood. Medicago truncatula produces over 30 different triterpene saponins. The Noble Foundation in Oklahoma are taking a functional genomics approach to identify and characterise various enzymes involved in saponin biosynthesis. Their strategy is to use gene expression analysis to provide candidate genes for further functional characterisation. Plants will then be selected to produce genotypes with altered saponin levels and composition. Transgenics (engineered genetic modification) will also be used. Another large class of phytomedicinal compounds, terpenoids, is being studied by researchers in Copenhagen, Denmark, who are using the model moss Physcomitrella patens, which also has a sequenced genome, as an alternative platform for production of terpenoid drug candidates. Terpenoid-based drugs have worldwide sales of over $12Bn.
  • The high-level solution architecture shows the main features of the Service. Each customer has an account on the Service. Accounts may have more than one user. Users have defined roles within the Service. Users interact with the Service through the web (browser or service) or the command line. The Service itself consists of the business layer and the asset layer. The business layer contains web interfaces for managing users and assets (data and applications), and for monitoring. Behind the business layer are the assets themselves: data files, servers, web apps, even compute clusters. Applications are either suspended or actively running. All assets are associated with a single account. There is a ‘special’ account that hosts a pool of public assets that can be accessed from any customer account.
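The account, user and asset rules described in this note can be sketched as a small data model. This is a minimal illustration under stated assumptions, not the Service's actual implementation: the class names, the `PUBLIC_ACCOUNT` identifier and the `visible_assets` method are all hypothetical.

```python
# Minimal sketch of the rules in the note above. All names are hypothetical;
# this is not the actual Service code.
from dataclasses import dataclass, field
from typing import List

PUBLIC_ACCOUNT = "public"  # assumed identifier for the 'special' public account


@dataclass
class Asset:
    name: str
    kind: str              # e.g. "data file", "server", "web app", "cluster"
    account: str           # every asset is associated with a single account
    running: bool = False  # applications are either suspended or running


@dataclass
class User:
    name: str
    role: str              # e.g. "scientist", "admin", "depositor"
    account: str           # an account may have more than one user


@dataclass
class Service:
    assets: List[Asset] = field(default_factory=list)

    def visible_assets(self, user: User) -> List[Asset]:
        """Assets in the user's own account plus the shared public pool."""
        allowed = {user.account, PUBLIC_ACCOUNT}
        return [a for a in self.assets if a.account in allowed]


if __name__ == "__main__":
    service = Service(assets=[
        Asset("reads.fastq", "data file", "customer1"),
        Asset("private-cluster", "cluster", "customer2", running=True),
        Asset("reference-genome", "data file", PUBLIC_ACCOUNT),
    ])
    alice = User("alice", "scientist", "customer1")
    print([a.name for a in service.visible_assets(alice)])
```

Role-based permissions are left out on purpose; the sketch only captures asset ownership and the public pool, which are the parts the note states explicitly.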
  • The deployment architecture shows the components that we are using as the foundation of the Service, and the lines/protocols of communication between the components. All user requests to, and responses from, the Service will pass through a front-end gateway, with the exception of direct terminal access via SSH. The gateway is responsible for authenticating. The security software in the gateway will be OpenAM from ForgeRock, which Eagle used extensively for Phase 1. OpenAM is a suite of open source Authentication, Authorization and Federation software. It simplifies the implementation of transparent single sign-on (SSO), which typically operates against a disparate set of identity repositories; e.g. the OpenAM IdP and the Customer IdP in the figure. For data file management we will use SEEK from the SysMO-DB consortium. This is an ISA (Investigations, Studies and Assays) compliant, open-source, web-based assets catalogue for finding, sharing and exchanging Data, Models and Processes in Systems Biology. SEEK provides an access control layer to enable administrators to restrict access to collaborators, colleagues or other individuals. SEEK is developed by the group of Carole Goble at the University of Manchester, who are existing Eagle partners and collaborators, notably on the Eagle-led, Technology Strategy Board funded project to develop cloud-based genomics workflows. For application/compute management we will use CycleCloud. This is web-based cloud software that delivers secure AWS/EC2 Clusters as a Service. End-users submit jobs and data through a GUI, and CycleCloud handles instance errors, auto scaling/shrinking, data-aware workflows, and data encryption and key management.
For reporting we will use CycleServer. This analytics engine provides performance analytics, visualization and reporting, job control and administration, and resource consumption. Back end appliances will include the Eagle Ensembl Browser (developed in Phase 1) and BioLinux, which provides 500 pre-packaged bioinformatics appliances. The Service itself will be hosted on the Amazon EC2 cloud.
  • Rather than read this slide, I'll leave you with a quote from Greg Lucier, chief executive officer of Life Technologies (ABI): "From a simple return on investment, the financial stake made in mapping the entire human genome is clearly one of the best uses of taxpayer dollars the U.S. government has ever made." I challenge you to think of ways you may make use of it in your own enterprise.
  • Could genomics (plus NGS/Cloud) save the world?

    1. Can massively parallel DNA sequencing paired with ubiquitous utility computing save the world? Will Spooner, CTO, Founder, Eagle Genomics (Image: CC-BY-NC-ND 3.0)
    2. Famine; Disease (Image credits: A. Cavanagh; licences CC-BY-NC-ND 2.0, CC-BY-ND 2.0)
    3. Synthetic Life; Gaia (Image credits: babystep-away/; licences CC-BY-NC-SA, CC-BY-NC-ND 3.0)
    4. Phenotype Association: Scientific impact of genomics. Molecular Psychiatry advance online publication 30 August 2011; doi:10.1038/mp.2011.101 (Image: Sartr, CC BY-NC-ND 3.0)
    5. Genomics in plant breeding: Germplasm/Pedigree; Phenotyping, Genotyping; Phenotype, Genotype; Phenotype/Genotype association; Assays; Breeding
    6. Genomics in pharmacology: Pharmacogenomics, Stratified Medicine. Phenotypic, Genotypic, Transcriptomic, Epigenetic; Association [biomarker]; Right drug, Right patient, Right time
    7. Genetic Determinism vs. Missing Heritability (Image: CC-BY-NC-ND 3.0)
    8. NGS - democratising genomics: $1000 genome, ~$0.01/Mbase (30x coverage)
    9. The Cost of Big Data: Now, Soon
    10. Cloud Computing = Greater Scalability. Cloud Computing = Lower Cost + Greater agility + More flexibility + Faster time to market + Improved business process. TSB project – “Cloud Analytics for the Life Sciences”: Eagle Genomics, University of Manchester, NGRL (Image: PictureGirl, CC-BY-SA)
    11. Cloud – democratising HPC: 30,000 virtual cores, $1,300/hour
    12. Technological Divergence
    13. Methods: Open Source. Results: Open Data. Publications: Open Access. Knowledge: Open Innovation.
    14. Doing Genomics Efficiently: “Genome Content Management is the set of processes and technologies that support the creating, managing, and reporting of genomic data.” TIMELINE: Bespoke ... Common Models ... Content Management Systems. Create, Share
    15. Genome Content Management Systems (G-CMS): Open Source, Proprietary; Workflow Oriented, Database Oriented
    16. Ensembl as a GCMS: Leveraging Public Resources. Diagram labels: DAS, Data Integration, Data Querying, Assembly/Genes, httpd, Data Reporting, Variation, Data Analysis, Functional Genomics, Data Integration, API, Comparative Genomics, Data QC
    17. Genomics in research productivity: Apoptotic execution phase
    18. Plant models for phytopharmacology: Physcomitrella patens (Image by: Ralf Reski, CC BY-ND; Image by: G. Nicolella, CC BY SA)
    19. Explore: work together to find a common purpose. Nurture: build trust, shared language. Exploit: turn ideas into tangible benefits. Open Innovation; Collaborate: Enterprise, Academia, Government, Foundations
    20. Collaborative Data Sharing Platform. Users: Scientist, Admin, Depositor, Collaborator (Customer 1, Customer 2, Customer 3). User Interaction Layer: web, terminal. Security Layer. Business Layer: User/Role Management, Data Management, Application Management, System Monitoring, Reporting/Billing. Sequence Service. Asset Layer: Data Files, Servers, Apps, Clusters (Customer 1, Customer 2, Customer 3, Public)
    21. Open Architecture – Open Source. Diagram labels: Customer IdP, Scientist, Admin, Depositor, Authenticate, SAML Token Exchange, SAML, HTTPS, OpenAM IdP (LDAP), Amazon EC2 Cloud, Web Gateway, OpenAM Plugin, Authorise, OpenAM Server, MySQL, SEEK Assets DB, Web Server, CycleCloud, SEEK, CycleServer, Condor, Encrypt/Decrypt, Ensembl, BioLinux, Data Files, Customer Sandbox, EC2/AMIs, S3 Storage
    22. Eagle Academic Collaborations
    23. Can genomics save the world? • Genomics is revolutionising: basic research, plant breeding, healthcare. • Genomics R&D is empowered by: NGS, cloud computing. • Collaboration is key: open knowledge/science, public datasets, open innovation.
    24. In under a month (June 2011): 10 E. coli isolates sequenced, 9 genome assemblies released, 16 genome annotations released, over 50 analyses performed, resulting in 12 citable publications
    25. Genome Content Management: Integrated, Analysed, Validated. © Eagle Genomics. Except where otherwise noted, this work is licensed under the Creative Commons Attribution 3.0 License