Sequencing Cancer Genomes - Chemical Engineering at Texas A






  • (All notes obtained from Circos Website. Accessed April 30, 2010. http://mkweb.bcgsc.ca/circos/intro/genomic_data/) The human genome comprises 22 pairs of autosomes (chromosomes 1-22) and one pair of sex chromosomes (X, Y). This graphic shows the chromosomes arranged in a circular orientation, shown as wedges, marked with a length scale. Data placed outside the chromosome ring represents the degree of small- and large-scale variation in the genome at a given position, as found between different populations. Data placed on top of the chromosome ring highlights positions of genes implicated in disease, such as cancer, diabetes, and glaucoma. Data placed inside the ring links disease-related genes found in the same biochemical pathway (grey) and shows the degree of similarity for a subset of the genome (colored).
  • (All notes obtained from Circos Website. Accessed April 30, 2010. http://mkweb.bcgsc.ca/circos/intro/genomic_data/)
    Track legend:
    (A) variation and repeats / SNPs (v126). The histogram shows the number of SNPs per 1 Mb.
    (B) variants in genome structure catalogued by the TCAG database.
    (C) mapping and sequencing tracks / chromosome band (ideogram).
    (D) locations of genes implicated in disease. Gene-to-disease mappings were done using the OMIM database.
    (F) variation and repeats / segmental duplication. A small subset of segmental duplications is drawn, filtered by locations on chromosomes 2, 3, 7, 9. The choice of locations was motivated by the need for a visually balanced set of links.
    The graphic shows the human genome annotated with data related to genes implicated in disease, regions of variation found in various populations, and regions of similarity between chromosomes. The 24 individual chromosomes (1..22 [each present in pairs in the genome], X, Y) are arranged circularly (C), and represented by labeled (C3) ideograms on which the distance scale is displayed (C1). Some chromosomes are shown at different physical scales to illustrate the rich pattern of the data (chr2 3x; chrs 18,19,20,21,22 2x; chrs 3,7,17 10x). Within each ideogram, cytogenetic bands are shown (C2). These are large-scale features used in cytogenetics to locate and reference gross changes.
    On the outside of the ideograms, genomic variation between individuals and populations is represented by tracks (A) and (B). The number of catalogued locations at which single base pair changes have been observed within populations is shown as a histogram (A). Large regions which have been seen to vary in size and copy number between individuals are marked in (B).
    Locations of genes associated with disease are superimposed on the ideograms (D). (D3) shows the location of genes implicated in cancer (very dark red), other disease (dark red) and all other genes (red). (D2) shows locations of genes implicated in lung, ovarian, breast, prostate, pancreatic, and colon cancer, colored in progressively darker shades of red. (D1) marks gene positions implicated in other diseases such as ataxia, epilepsy, glaucoma, heart disease, and neuropathy, colored in progressively darker shades of red, as well as diabetes (orange), deafness (green), and Alzheimer's disease (blue).
    Grey lines (E) connect positions on ideograms associated with genes that participate in the same biochemical pathways. The shade of the link reflects the character of the gene: dark grey indicates that the gene is implicated in cancer, grey in disease, and light grey all other genes. Colored links (F) connect a subset of genomic region pairs that are highly similar and illustrate the deep level of similarity between genomic regions (about 50% of the genome is in so-called repeat regions, which appear in the genome multiple times and in a variety of locations).
  • Each line points to a specific gene

Sequencing Cancer Genomes - Chemical Engineering at Texas A Presentation Transcript

  • Sequencing Cancer Genomes
    John A Pack
  • Overview
    DNA Sequencing
    Circos Plot
    Sequencing Genomes
    Examples of Sequenced Cancer Genomes
    Sequencing Disagreements
    Sequencing Proponents
    Small Scale Projects
    Future Research
  • Leading Causes of Death in US
    Heart Disease: 631,636
    Cancer: 559,888
    Stroke (cerebrovascular diseases): 137,119
    Chronic lower respiratory disease: 124,583
    Accidents (unintentional injuries): 121,599
    Diabetes: 72,449
    Data for 2006 obtained from Centers for Disease Control and Prevention (CDC) (http://www.cdc.gov/nchs/fastats/lcod.htm)
  • Genetics and Genomics Timeline
  • What is DNA Sequencing?
    The human genome is made up of 3 billion chemical building blocks (A, T, C, and G)
    DNA Sequencing
    Process of determining the exact order of the building blocks that make up the DNA of the 24 different human chromosomes
    Revealed the estimated 20,000-25,000 human genes within our DNA as well as the regions controlling them
  • DNA Sequencing Process
  • Circos Plot
    Circos is a software package for visualizing data and information
    Used for identification and analysis of similarities and differences arising from comparisons of genomes
    Ledford, Heidi. “The Cancer Genome Challenge”.
  • http://mkweb.bcgsc.ca/circos/
  • Previous Discovery
    Mutation in the gene IDH1 found in 2006 study of 35 colorectal cancers
    Not expected to be of importance
    Changed only a lowly housekeeping enzyme involved in metabolism
    13,000 other genes sequenced from each of 300 more samples
  • IDH1 Mutation Surfaced Again
    12% of samples of glioblastoma multiforme (a type of brain cancer)
    8% of acute myeloid leukaemia samples
  • Studying IDH1 Mutation
    Studies showed the mutation changed the activity of isocitrate dehydrogenase
    Caused a cancer-promoting metabolite to accumulate in cells
    Pharmaceutical companies hunting for a drug to stop the process
    The IDH1 mutation is the inconspicuous needle found in a veritable haystack of cancer-associated mutations, thanks to high-powered genome sequencing.
  • Sequencing Genomes
    Labs around the world are teaming up to sequence DNA from thousands of tumors as well as healthy cells from the same person
    Nearly 75 cancer genomes have been at least partly sequenced and published
    By the end of 2010 researchers expect to have over 100 cancer genomes fully sequenced
  • Difficulties in Researching
    The further the research goes, the larger the “haystack”
    Comparison of tumor cell to healthy cell reveals dozens of single-letter changes, or point mutations
    Comparison also reveals repeated, deleted, swapped, or inverted sequences
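The tumor-versus-healthy comparison described on this slide can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the two sequences are hypothetical toy strings, they are assumed to be already aligned and of equal length, and the function name `point_mutations` is invented for this example. Real pipelines work on aligned sequencing reads, and repeated, deleted, swapped, or inverted sequences need more than a position-by-position comparison.

```python
def point_mutations(normal: str, tumor: str) -> list[tuple[int, str, str]]:
    """Return (position, normal_base, tumor_base) for each single-letter change.

    Assumes both sequences are already aligned and have the same length.
    """
    if len(normal) != len(tumor):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, n, t)
        for i, (n, t) in enumerate(zip(normal, tumor))
        if n != t
    ]


# Hypothetical toy sequences, not real genomic data.
normal_seq = "ATGCGTACGTTAGC"
tumor_seq = "ATGCGTACATTAGC"

for position, normal_base, tumor_base in point_mutations(normal_seq, tumor_seq):
    print(f"position {position}: {normal_base} -> {tumor_base}")
```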
  • Difficulties
    “The difficulty is going to be figuring out how to use the information to help people rather than to just catalogue lots and lots of mutations.” – Bert Vogelstein, Johns Hopkins University
    Clinically, tumors can look the same, but most differ genetically
  • Distinguishing Mutation Data
    Drivers – mutations that cause and accelerate cancers
    Passengers – accidental by-products of thwarted DNA-repair mechanisms
    Distinguishing between the drivers and passengers is not always trivial
  • Finding Mutations
    Mutations that pop up again and again
    Identify key pathways that are mutated at different points
    Finding more questions than answers
    How do researchers decide which mutations are worthy of follow up and functional analysis?
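One simple way to look for “mutations that pop up again and again” is to count how many tumor samples carry a mutation in each gene. The sketch below assumes a hypothetical table mapping sample names to sets of mutated genes; the gene labels, sample names, and the `min_samples` threshold are all made up for illustration. Recurrence only ranks candidates for follow-up and does not by itself show that a gene is a driver.

```python
from collections import Counter


def recurrent_genes(
    samples: dict[str, set[str]], min_samples: int = 2
) -> list[tuple[str, int]]:
    """List genes mutated in at least `min_samples` samples, most frequent first."""
    counts = Counter(gene for genes in samples.values() for gene in genes)
    return sorted(
        ((gene, n) for gene, n in counts.items() if n >= min_samples),
        key=lambda item: (-item[1], item[0]),
    )


# Hypothetical data: which genes carry a mutation in each tumor sample.
tumor_mutations = {
    "tumor_1": {"GENE_A", "GENE_B", "GENE_C"},
    "tumor_2": {"GENE_A", "GENE_D"},
    "tumor_3": {"GENE_A", "GENE_B", "GENE_E"},
    "tumor_4": {"GENE_F"},
}

for gene, n in recurrent_genes(tumor_mutations):
    print(f"{gene}: mutated in {n} of {len(tumor_mutations)} samples")
```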
  • World Collaboration
    The International Cancer Genome Consortium Pilot Project
    11 Countries to sequence DNA
    20 cancer types
    500 tumor samples for each
    Cost to sequence each cancer type = US$20 Million
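The scale of the pilot project follows from the figures on this slide. The short calculation below assumes that the US$20 million per cancer type covers all 500 tumor samples for that type and that the same figure applies to every cancer type; the slide does not state either point explicitly.

```python
# Figures taken from the slide.
CANCER_TYPES = 20
SAMPLES_PER_TYPE = 500
COST_PER_TYPE_USD = 20_000_000

# Derived totals (assumes the per-type cost is uniform and covers all samples).
total_samples = CANCER_TYPES * SAMPLES_PER_TYPE
total_cost_usd = CANCER_TYPES * COST_PER_TYPE_USD
cost_per_sample_usd = COST_PER_TYPE_USD / SAMPLES_PER_TYPE

print(f"Total tumor samples: {total_samples:,}")
print(f"Total cost: US${total_cost_usd:,}")
print(f"Implied cost per sample: US${cost_per_sample_usd:,.0f}")
```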
  • Ledford, Heidi. “The Cancer Genome Challenge”.
  • Contributing Countries
    United States of America
    More than 6 types of cancer being sequenced
    Ovarian Cancer
    Brain Cancer
    Glioblastoma Multiforme (IDH1 Mutation found in 12%)
    Lung Cancer
    Acute Myeloid Leukaemia (IDH1 Mutation found in 8%)
    Colon Cancer
    Breast Cancer
    ER-, PR-, HER-
    Breast Cancer
    Breast Cancer
    ER+, HER-
  • Contributing Countries
    Breast Cancer
    HER2 overexpressing
    Liver Cancer
    Renal-cell carcinoma
    European Union Sponsored
    Pancreatic Cancer
    Ovarian Cancer
  • Contributing Countries
    Pancreatic Cancer
    Gastric Cancer
    Pediatric Brain Cancer
    Medulloblastoma
    Pilocytic Astrocytoma
    Oral Cancer
    Gingivobuccal
  • Contributing Countries
    Rare Pancreatic Cancers
    Enteropancreatic endocrine
    Pancreatic exocrine
    Liver Cancer
    Chronic lymphocytic leukaemia
  • ICGC
    The International Cancer Genome Consortium (ICGC), est. 2008, combined two older, large scale projects
    The Cancer Genome Project
    Over 100 partial genomes and roughly 15 whole genomes so far. Intends to tackle over 2,000 more in the next 5-7 years
    The US National Institutes of Health’s Cancer Genome Atlas (TCGA)
    Sequence up to 500 tumors for each of 20 cancers over next 5 years
  • TCGA Pilot Project
    The two groups in the TCGA are collaborating to sequence a subset of tumor samples (about 100) from each cancer type
    The most promising areas of the genome will then be sequenced in the remaining 400 samples
  • TCGA Network
  • From the Study
    Larger sample numbers could reveal driver mutations like the one in IDH1
    Knowledge and study of these mutations could lead to developing new cancer therapies according to researchers
  • “If there are lots of abnormalities of a particular gene, the most likely explanation is often that those mutations have been selected for by the cancers and therefore are cancer-causing.”
    – Michael Stratton (Co-Director of the Cancer Genome Project)
  • Challenging
    IDH1 was first overlooked on the basis of the colorectal cancer data alone
    Search expanded to other cancers before importance was revealed
    Some drivers are mutated at very low frequency (less than 1% of the cancers)
    Heavy sampling is needed to find these low-frequency drivers
    Sequencing 500 samples per cancer reveals mutations present in as few as 3% of the tumors, and such low-frequency mutations may still hold important biological lessons
    Need to know in order to understand the overall genomic landscape of cancer
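Why heavy sampling matters can be made concrete with a binomial calculation. The sketch below assumes each tumor independently carries a given mutation with a fixed frequency, and it uses a hypothetical rule that a mutation must appear in at least 5 tumors to stand out; both the independence assumption and the threshold are illustrative choices, not figures from the study.

```python
from math import comb


def expected_carriers(n_samples: int, frequency: float) -> float:
    """Expected number of tumors carrying the mutation."""
    return n_samples * frequency


def prob_at_least(k: int, n_samples: int, frequency: float) -> float:
    """Binomial probability that at least k of n_samples tumors carry the mutation."""
    below_k = sum(
        comb(n_samples, i) * frequency**i * (1 - frequency) ** (n_samples - i)
        for i in range(k)
    )
    return 1.0 - below_k


N_SAMPLES = 500   # samples per cancer type, from the slides
MIN_CARRIERS = 5  # hypothetical threshold for a mutation to stand out

for freq in (0.03, 0.01):
    print(
        f"frequency {freq:.0%}: expect {expected_carriers(N_SAMPLES, freq):.0f} carriers, "
        f"P(at least {MIN_CARRIERS}) = {prob_at_least(MIN_CARRIERS, N_SAMPLES, freq):.3f}"
    )
```

Under these assumptions, a mutation at 3% frequency is expected in about 15 of 500 tumors, while one at 1% is expected in only about 5, which is why rarer drivers are easier to miss.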
  • Another Popular Approach
    Look for mutations that cluster in a pathway
    In an analysis of 24 pancreatic cancers
    12 identified signaling pathways had been altered
    Very difficult approach
    Pathways overlap and their boundaries are not clear
    Many pathways that are obtained using data from different animals or cell types do not always match up with what’s found in human tissue
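The pathway approach can be sketched as grouping each sample's mutated genes by the pathways they belong to. Everything in the example is hypothetical: the gene labels, the pathway definitions, and the function name `altered_pathways`. The toy data is built so that no single gene recurs across tumors while one pathway is hit in all of them, and one gene sits in two pathways to mirror the overlap problem noted on the slide.

```python
from collections import defaultdict


def altered_pathways(
    samples: dict[str, set[str]], pathways: dict[str, set[str]]
) -> dict[str, set[str]]:
    """Map each pathway to the samples with at least one mutated member gene."""
    hits: dict[str, set[str]] = defaultdict(set)
    for sample, mutated in samples.items():
        for pathway, members in pathways.items():
            if mutated & members:  # any mutated gene belongs to this pathway
                hits[pathway].add(sample)
    return dict(hits)


# Hypothetical data: each tumor has a different mutated gene.
tumor_mutations = {
    "tumor_1": {"GENE_A"},
    "tumor_2": {"GENE_B"},
    "tumor_3": {"GENE_C"},
}
# Hypothetical pathway membership; GENE_C belongs to both pathways.
pathway_genes = {
    "pathway_1": {"GENE_A", "GENE_B", "GENE_C"},
    "pathway_2": {"GENE_C", "GENE_D"},
}

for pathway, hit_samples in altered_pathways(tumor_mutations, pathway_genes).items():
    print(f"{pathway}: altered in {len(hit_samples)} of {len(tumor_mutations)} tumors")
```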
  • There is A Lot More to Do
    Distinguishing between drivers and passengers gets harder as researchers begin to sequence entire tumor genomes
    Only a fraction of the existing cancer genomes have been completely sequenced
  • Protein and Non-Protein Coding Regions
    Most cancer genome sequencing covers only the exome
    Keeps costs low
    The exome directly codes for protein (easiest to interpret)
    Mutations in the non-protein-coding depths of the genome may also be important
    More challenging to interpret
    Scientists usually don’t know what function these regions serve
    They hold the majority of mutations
  • Cancer Genomes Coming Fast
    Some full genomes have been sequenced
    Small-cell lung carcinoma (Type of Lung Cancer)
    Metastatic melanoma (Type of Skin Cancer)
    Basal-like breast cancer (Type of Breast Cancer)
    Only the exome has been sequenced
    Glioblastoma multiforme (Type of Brain Cancer)
  • Lung Cancer
    Cancer: Small-Cell Lung Carcinoma
    Sequenced: full genome
    Source: NCI-H209 cell line
    Point mutations: 22,910
    Point mutations in gene regions: 134
    Genomic rearrangements: 58
    Copy-number changes: 334
    Duplication of the CHD7 gene confirmed in two other small-cell lung carcinoma cell lines
  • Skin Cancer
    Cancer: Metastatic Melanoma
    Sequenced: full genome
    Source: COLO-829 cell line
    Point mutations: 33,345
    Point mutations in gene regions: 292
    Genomic rearrangements: 51
    Copy-number changes: 41
    Patterns of mutation reflect damage by ultraviolet light
    Ledford, Heidi. “The Cancer Genome Challenge”.
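The counts on the two cell-line slides above show how small a share of point mutations falls in gene regions. The sketch below only divides the numbers given on the slides; the dictionary layout and the function name `gene_region_fraction` are illustrative.

```python
def gene_region_fraction(total_point_mutations: int, in_gene_regions: int) -> float:
    """Fraction of all point mutations that fall in gene regions."""
    return in_gene_regions / total_point_mutations


# (total point mutations, point mutations in gene regions), from the slides.
full_genomes = {
    "Small-cell lung carcinoma (NCI-H209)": (22_910, 134),
    "Metastatic melanoma (COLO-829)": (33_345, 292),
}

for name, (total, in_genes) in full_genomes.items():
    share = gene_region_fraction(total, in_genes)
    print(f"{name}: {in_genes:,} of {total:,} point mutations in gene regions ({share:.2%})")
```

In both genomes fewer than 1% of the point mutations lie in gene regions.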
  • Breast Cancer
    Cancer: Basal-Like Breast Cancer
    Sequenced: full genome
    Source: primary tumor, brain metastasis, and tumors transplanted into mice
    Point mutations: 27,173 in primary; 51,710 in metastasis; 109,078 in transplant
    Point mutations in gene regions: 200 in primary; 225 in metastasis; 328 in transplant
    Genomic rearrangements: 34
    Copy-number changes: 155 in primary; 101 in metastasis; 97 in transplant
    Highlights: Patterns of mutation reflect damage by ultraviolet light
    Ledford, Heidi. “The Cancer Genome Challenge”.
  • Brain Cancer
    Cancer: Glioblastoma Multiforme
    Sequenced: exome (no complete Circos plot)
    7 patient tumors
    15 tumors transplanted into mice
    Genes containing at least one protein altering mutation: 685
    Genes containing at least one protein altering point mutation: 644
    Copy-number changes: 281
    Mutations in the active site of IDH1 have been found in 12% of patients
  • Finding all mutations
    Very important to find all, even in non-protein, regions
    Possibly none of these mutations contributes to causing cancer
    Some could
    Only way to find out is to systematically investigate them
  • Researcher Disagreements
    Some researchers argue against fully sequencing genomes
    Cost of projects outweighs the benefits
    Prices will drop with technological advances in the next few years, so why not wait?
    In the meantime
    Mutations that affect how many copies of a gene are found in a genome
    Cheaper to assess
    Provide more intuitive insight into biological processes
  • Sequencing Proponents
    Detecting changes in genome copy number
    Array-based technology
    Fast and relatively inexpensive
    Sequencing gives a higher-resolution snapshot of these regions
    The higher resolution can provide
    More precision in mapping boundaries
    Ability to catch tiny duplications or deletions that an array may not detect
  • Array-Based Process
  • Don’t Wait to Sequence
    A lot of small scale hospitals are investing millions of dollars into cancer sequencing projects
    e.g., St. Jude Children’s Research Hospital
    Proponents don’t want to wait
    The real work starts after the sequencing is over
    Determining what these mutations are doing
    Old-fashioned biology and experimental analysis
  • US National Cancer Institute
    Two 2-year projects
    Develop high-throughput methods
    Test how the mutations identified by the TCGA pilot project affect cell function
    Aim to pull needles from the haystack and make sense of them (like the IDH1 mutation)
  • US National Cancer Institute Projects
    Dana-Farber Cancer Center (Boston)
    Systematically amplify and reduce the expression of genes of interest in cell cultures
    Cold Spring Harbor Laboratory (New York)
    Study cancer-associated mutations using tumors transplanted into mice
  • Other Large Scale Projects
    Assess the effects of deleting each gene in the mouse genome
    Learn more about the normal function of genes that are mutated in cancer
  • Impact
    Cancer is a world-wide disease
    Cancer Patients
    New Technology
    New Treatment Processes
    More grants to make new advances
  • Conclusions
    Sequencing tumor genomes can lead to finding cancer-causing gene mutations
    Very challenging to pinpoint gene mutations that are cancer-causing
    Very high sample numbers
    Sampling and sequencing full cancer genomes is extremely expensive
    Some opponents think the cost outweighs the benefits right now
    A lot of people think the cost is worth it, because there is a lot more work to do after sequencing, so we should not wait for prices to come down
  • Future Research
    Better sequencing technology to bring costs down
    New technology to detect mutations
    Complete full-genome sequences for all cancers
    Developing ways to block the effects of these mutations or kill the cells carrying them while leaving healthy cells unharmed
    Nanotechnology (nanopharmaceuticals could have an impact here)
  • References
    Ledford, Heidi. “The Cancer Genome Challenge”. Nature. Vol 464. 15 April 2010. p. 972-974. Macmillan Publishers Limited.
    Human Genome Project Information. Facts About Genome Sequencing. Accessed: April 29, 2010. Last modified: September 19, 2008. http://www.ornl.gov/sci/techresources/Human_Genome/faq/seqfacts.shtml.
    Krzywinski, M. et al. “Circos: an Information Aesthetic for Comparative Genomics”. Genome Research. Vol 19. 2009. p. 1639-1645.
    Collins, Francis S., et al. “A vision for the future of genomics research”. Nature. Vol 422. 2003. Nature Publishing Group. Accessed April 30, 2010. http://www.nature.com/nature/journal/v422/n6934/full/nature01626.html
    Circos Website. Accessed April 30, 2010. http://mkweb.bcgsc.ca/circos/