Exploring Your Personal Genome with Free, Online Bioinformatics Tools
by Shannon Bohle, BA, MLIS, CDS (Cantab), FRAS, AHIP

  1. Exploring Your Personal Genome with Free, Online Bioinformatics Tools, by Shannon Bohle, BA, MLIS, CDS (Cantab), FRAS, AHIP. .org 2014 Tech Conference
  2. What is the future of genomic sciences and bioinformatics? Ethical considerations of newborn screening: privacy, inaccuracy, discrimination, eugenics. Video: Gattaca (1997): http://www.youtube.com/watch?v=1Q67bMYOm7E
  3. Reduced cost: The $1,000 genome
     “Illumina’s DNA Supercomputer Ushers in the $1,000 Human Genome” (January 14, 2014) http://www.businessweek.com/articles/2014-01-14/illuminas-dna-supercomputer-ushers-in-the-1-000-human-genome
     + Genome sequencing at birth: “Baby DNA Analysis Ushers in Brave New World of Treatment” (January 16, 2014) http://www.bloomberg.com/news/2014-01-16/baby-dna-analysis-ushers-in-brave-new-world-of-treatment-health.html
     = Big industry: “Illumina and a Billionaire Want to Jump-Start Genomics Upstarts” (February 17, 2014) http://www.businessweek.com/articles/2014-02-17/illimuna-and-billionaire-yuri-milner-to-aid-genomics-startups
     The future of genomic sciences and bioinformatics is NOW.
  4. Presentation Overview: Predictive Pathology
     Hopefully you will learn a great deal today about the biological basis of disease. Specifically, we will discuss the following pathways by which disease can occur:
     • At conception, chromosomes from both parents combine to pass genetic material on to a child. Before that, while the parents' eggs and sperm are being formed, errors can occur during crossing over, the exchange of material between paired chromosomes at points called chiasmata. Such de novo variations are not inherited from either parent.
     • Chromosomal abnormalities such as additions, deletions, translocations, inversions, or insertions can be inherited. A common example of a chromosomal abnormality is Down syndrome, in which there is an extra copy of chromosome 21 (trisomy 21), although most cases arise de novo rather than being inherited.
     • Also at conception, the genetic code (the genotype) is passed on, and with it the specific traits it can produce (the phenotype). The genotype is always present, while a trait may be expressed (dominant) or hidden (recessive) in an individual. Recessive traits can skip generations and reappear further down the family line, while dominant traits usually appear in every generation. A common example of an autosomal recessive heritable disease is sickle cell anemia.
     • During childhood and adulthood, factors like the environment (such as exposure to chemicals), diet, exercise, and aging can also damage genes, mutating them, and this may lead to disease. Such factors can also change how genes are switched on and off without altering the DNA sequence; the study of these context-dependent changes, such as DNA methylation and histone modification, is called epigenetics. Protein misfolding is a further route to disease.
     • Inherited and de novo variations can be studied in detail at the level of either proteins (which are made of amino acids) or DNA (which is made of nucleotides). Therefore, the genomes of plants, animals, and other forms of life have been sequenced to try to understand and control biology, specifically biological function. The field of functional genomics designs technology tools that aid in diagnosis when biology malfunctions.
     About 40-60% of genes in a sequenced genome are related to biological function. Under different conditions, genes may be expressed in novel, transient ways, and these patterns of expression are difficult to detect. Trained professionals identify specific biomarkers, like JAK2 V617F, that have a strong association with diseases. Knowing these in advance can sometimes influence a person’s lifestyle choices, such as having children, diet, and medical decisions. Because bioinformatics is a very new field, a genetic counselor should interpret test results and give patients guidance on two points: first, their level of risk as a percentage, and second, the level of confidence scientists have that a specific biomarker actually causes a disease. Scientists assess this partly by looking across species, through phylogenetics. Most importantly, though, they learn about the genetic basis of human disease by using bioinformatics tools to compare the DNA of patients who share the same disease and by creating cell lines. That is why projects like the Personal Genome Project not only benefit the individual participant but also contribute to advances in medicine and personalized medicine. “Personalized medicine is an emerging practice of medicine that uses an individual's genetic profile to guide decisions made in regard to the prevention, diagnosis, and treatment of disease” (NLM’s Genetics Home Reference glossary). Having your genome sequenced provides an overview of your genetic background as well as the state of your genes at a given time.
  5. Courtesy of the Genetics & Public Policy Center with support from The Pew Charitable Trusts
  6. GENETICS (GENES/CHROMOSOMES): A Short Overview of Biological Inheritance (Heredity) Described Through Cell Biology
     Genome: all hereditary genetic material of an organism
     Chromosome: a package of DNA, protein, and RNA found in cells
     Gene: a stretch of DNA read in the 5’ to 3’ direction (promoter, exons, introns) (Humans have about 22,000 genes)
     Allele: one of two or more variants of a gene; one allele of each gene is inherited from each parent
     Genotype: the coded information, of two types:
       Homozygote: same alleles (AA, aa)
       Heterozygote: different alleles (Aa)
     Phenotype: physical manifestation of a characteristic
       Dominant trait: expressed when one copy of the allele is present
       Recessive trait: expressed only when two copies are present
         a) Autosomal recessive: two abnormal copies must be present to get the disorder
         b) X-linked recessive: females with one abnormal copy are usually carriers, while males, who have only one X chromosome, are affected
     Figure: CELL → GENOME → CHROMOSOME → GENE → DNA → AMINO ACIDS. Image courtesy of Mayo Clinic: http://www.mayoclinic.org/procedure/genetic-testing/multimedia/genetic-disorders/sls-20076216
     If you have a genetic disorder or are a carrier, will your children inherit it? NOT NECESSARILY. SEE AN MD OR GENETIC COUNSELOR.
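The autosomal recessive rule in this glossary can be made concrete with a Punnett square. This is a minimal sketch, assuming a single gene with one normal allele (A) and one abnormal allele (a) and simple Mendelian inheritance; it is an illustration, not a substitute for a genetic counselor's risk assessment.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def punnett(parent1: str, parent2: str) -> Counter:
    """Return genotype probabilities for one gene in a child.

    Each parent is a two-letter genotype such as "Aa"; each parent passes on
    one of its two alleles with probability 1/2, independently of the other.
    """
    counts = Counter()
    for a, b in product(parent1, parent2):
        genotype = "".join(sorted(a + b))  # "aA" and "Aa" are the same genotype
        counts[genotype] += Fraction(1, 4)
    return counts


# Two carriers of an autosomal recessive allele (a), as with sickle cell anemia
child = punnett("Aa", "Aa")
affected = child["aa"]  # two abnormal copies are needed to get the disorder
carrier = child["Aa"]
print(affected, carrier)  # 1/4 1/2
```

For two carrier parents, each child has a 25% chance of being affected and a 50% chance of being a carrier, which is why a carrier's children do not necessarily inherit the disorder.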
  7. GENETICS (GENES/CHROMOSOMES): Mitosis v. Meiosis
     Some chromosome abnormalities are not inherited. De novo variants appear for the first time in an individual. They can arise from errors during mitosis or meiosis, including during recombination, or “crossing over,” in meiosis. Image credit: OpenStax College. "Laws of Inheritance." Connexions. February 24, 2014. http://cnx.org/content/m44479/1.3
     Mitosis occurs in somatic cells. It results in two cells that are duplicates of the original cell. In other words, one cell with 46 chromosomes becomes two cells with 46 chromosomes each. This kind of cell division occurs throughout the body, except in the reproductive organs. This is how most of the cells that make up our body are made and replaced. Mutations that arise in somatic cells are not passed on to children.
     Meiosis occurs in germ cells. It results in cells with half the number of chromosomes (in diploid humans, 23 instead of the normal 46). These are the eggs and sperm. Mutations that arise in germ cells can be passed on to children. Because every cell in a child descends from the fertilized egg, such a mutation is carried by the embryo's stem cells, which during gestation specialize into the many types of somatic cells and into germ cells. Source: http://www.genome.gov/11508982#6
     Chiasma: during meiosis, chromosomal material crosses over between paired chromosomes at points called chiasmata.
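Halving the chromosome number in meiosis also shuffles which member of each pair ends up in an egg or sperm. The arithmetic below counts the combinations produced by independent assortment alone, assuming no crossing over (which only adds further variety).

```python
# Independent assortment during meiosis: each of the 23 chromosome pairs
# sends one of its two members into an egg or sperm, chosen independently.
HAPLOID_NUMBER = 23                  # chromosomes per human egg or sperm
DIPLOID_NUMBER = 2 * HAPLOID_NUMBER  # chromosomes per somatic cell


def gamete_combinations(pairs: int) -> int:
    """Distinct chromosome sets one parent can produce, ignoring crossing over."""
    return 2 ** pairs


per_parent = gamete_combinations(HAPLOID_NUMBER)
per_child = per_parent ** 2  # one egg combination times one sperm combination
print(DIPLOID_NUMBER, per_parent, per_child)  # 46 8388608 70368744177664
```

Each parent can therefore produce over 8 million chromosome combinations before crossing over is even considered.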
  8. Video: Cell Division and the Cell Cycle http://www.youtube.com/watch?v=Q6ucKWIIFmg
  9. BIOCHEMISTRY (PROTEINS): A Short Overview of Molecular Biology and Bioinformatics
  10. Video: Central dogma of molecular biology (1958): replication, transcription, and translation. Errors can occur during these processes; mutations introduced during DNA replication in germ cells can be passed on to children and sometimes cause disease. http://www.youtube.com/watch?v=Q_WRFw8KQk4
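Transcription and translation can be sketched in a few lines using the standard genetic code. This is a simplified model, assuming the input is the DNA coding strand, already in frame and starting at the first base; splicing and the rest of the cell's machinery are ignored. The example codons are illustrative: a G>T change at the first position turns GTC (valine) into TTC (phenylalanine), the kind of single-base change behind a V→F substitution such as V617F.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def transcribe(coding_strand: str) -> str:
    """mRNA has the same sequence as the DNA coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read codons from the first base until a stop codon ('*')."""
    protein = []
    for pos in range(0, len(mrna) - 2, 3):
        codon = mrna[pos:pos + 3].replace("U", "T")  # table is keyed by DNA codons
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate(transcribe("ATGGTCTTCTAA")))  # MVF
```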
  11. Video animation: The central dogma of molecular biology, "DNA The Secret of Life" by PBS. http://www.youtube.com/watch?v=D3fOXt4MrOM
  12. Video: Protein folding. After proteins are formed, they fold into various shapes based on their chemical makeup. Misfolding is another route to disease. It can result from an inherited or de novo mutation that changes a protein's amino acid sequence; the misfolded protein itself is not passed on to children, but a mutation that causes it can be. A linear analysis of the amino acid chain in a protein cannot anticipate which amino acids end up near each other once the protein folds, so 3D modeling is used. “Simulating How Proteins Self-Assemble, Or Fold” by Stanford University: http://www.youtube.com/watch?v=Pjt1Q2ZZVjA
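The point about linear analysis can be illustrated with a residue contact map: residues that are far apart in the sequence can sit close together in 3D. This sketch assumes one coordinate per residue (for example, its alpha-carbon) and an 8 Å contact cutoff, a commonly used but adjustable choice; the coordinates in the example are a hypothetical toy hairpin, not a real protein.

```python
import math
from typing import List, Tuple

Coord = Tuple[float, float, float]


def contacts(coords: List[Coord], cutoff: float = 8.0,
             min_separation: int = 4) -> List[Tuple[int, int]]:
    """Return 1-based residue pairs that are within `cutoff` angstroms in space
    but at least `min_separation` positions apart in the sequence."""
    pairs = []
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) < cutoff:
                pairs.append((i + 1, j + 1))
    return pairs


# Hypothetical six-residue hairpin: the chain runs out along x and turns back
hairpin = [(0, 0, 0), (3.8, 0, 0), (7.6, 0, 0),
           (7.6, 3.8, 0), (3.8, 3.8, 0), (0, 3.8, 0)]
print(contacts(hairpin))  # [(1, 5), (1, 6), (2, 6)]
```

Residues 1 and 6 are five positions apart in the chain but only 3.8 Å apart in space, which a purely linear reading of the sequence would miss.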
  13. The Life Cycle of DNA. When cells go bad, control decisions must be made that regulate the micro-“society.” Reform or remove? DNA ligase, an enzyme (shown left, in color), seals breaks in DNA and completes many DNA repairs. Some proteins, like p53 (shown below), enforce cell death (apoptosis). p53 malfunction is one cause of cancer, in which cells with mutations grow out of control.
  14. Sir John Gurdon: Epigenetics Founder & Nobel Laureate, "for the discovery that mature cells can be reprogrammed to become pluripotent." Turning back the clock on disease: mature, specialized cells can be reverted to a pluripotent, embryonic-stem-cell-like state. Image: University of Cambridge, 2012, the year Gurdon won the Nobel Prize. Gurdon's experiments used the frog Xenopus.
  15. Protein-Protein Interaction. How proteins interact with one another is key to understanding their function in the body. Only about 1% of the human genome codes for our roughly 20,000 proteins, and function is largely determined by how those proteins interact.
     Epigenetics: “Epigenetic mechanisms are affected by several factors and processes including development in utero and in childhood, environmental chemicals, drugs and pharmaceuticals, aging, and diet. DNA methylation is what occurs when methyl groups, an epigenetic factor found in some dietary sources, can tag DNA and activate or repress genes. Histones are proteins around which DNA can wind for compaction and gene regulation. Histone modification occurs when the binding of epigenetic factors to histone ‘tails’ alters the extent to which DNA is wrapped around histones and the availability of genes in the DNA to be activated. All of these factors and processes can have an effect on people’s health and influence their health possibly resulting in cancer, autoimmune disease, mental disorders, or diabetes among other …” Image and description credit: National Institutes of Health
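In human cells, DNA methylation occurs mainly on cytosines that are followed by a guanine (CpG sites), so counting CpGs is a common first step when looking at a region's methylation potential. The sketch below computes the observed/expected CpG ratio used in the classic Gardiner-Garden and Frommer criterion for CpG islands, where a ratio above about 0.6 is one of the conditions; the input sequence is a hypothetical example.

```python
from typing import Tuple


def cpg_stats(seq: str) -> Tuple[int, float]:
    """Return (number of CpG dinucleotides, observed/expected CpG ratio).

    Expected CpGs are estimated as (#C * #G) / length, the standard formula.
    """
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    expected = c * g / n if n else 0.0
    ratio = cpg / expected if expected else 0.0
    return cpg, ratio


print(cpg_stats("ACGTTCGA"))  # (2, 4.0)
```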
  16. Comparative Genomics and Phylogeny. To locate new disease markers and learn how pathogens function, it is helpful to examine ultra-conserved regions in protein and nucleic acid sequences across species, because these are most often linked to important bodily functions, disease, and health. (See: 1) Kumar S, Sanderford M, Gray VE, Ye J, Liu L. Evolutionary diagnosis method for variants in personal exomes. Nat Methods. 2012;9(9):855-6. doi:10.1038/nmeth.2147. 2) Liu L, Kumar S. Evolutionary balancing is critical for correctly forecasting disease-associated amino acid variants. Mol Biol Evol. 2013;30:1252-1257 (Epub 2013 March 5).) In addition to the roughly 1% of the human genome used for coding proteins, about 5%-10% consists of regulatory motifs, conserved across species, that turn genes “on” and “off” to control gene expression.
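Conserved regions are usually found by aligning the same protein or gene from several species and scoring each alignment column. This is a minimal sketch, assuming the sequences are already aligned to equal length and using the simplest score, the fraction of sequences that share the most common residue; real tools weight by evolutionary distance and substitution matrices. The alignment is hypothetical.

```python
from collections import Counter
from typing import List


def column_conservation(alignment: List[str]) -> List[float]:
    """Fraction of sequences sharing the most common residue in each column.

    Gap characters ('-') are treated like any other residue.
    """
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for col in range(length):
        residues = [s[col] for s in alignment]
        most_common_count = Counter(residues).most_common(1)[0][1]
        scores.append(most_common_count / len(alignment))
    return scores


# Hypothetical alignment of one short protein segment from three species
aln = ["MKVLAT", "MKVIAT", "MRVLGT"]
scores = column_conservation(aln)
fully_conserved = [i + 1 for i, s in enumerate(scores) if s == 1.0]
print(fully_conserved)  # [1, 3, 6]
```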
  17. Visualization of a Phylogenetic Tree Using MEGA 6. Newick notation: ((((Cucumis_sativus,Ricinus_communis),Solanum_lycopersicum),Medicago_truncatula),(Arabidopsis_thaliana,Capsella_rubella));
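Newick notation writes a tree as nested parentheses: each pair of parentheses is a clade, commas separate sister branches, and the tree ends with a semicolon. The parser below is a minimal sketch for trees with leaf names only (no branch lengths, internal labels, or quoted names); it follows the Newick convention that an underscore in an unquoted name stands for a space. For real work, a library parser such as the one in Biopython's Bio.Phylo is the safer choice.

```python
def parse_newick(text: str):
    """Parse a simple Newick string into nested lists of leaf names."""
    text = text.strip().rstrip(";")
    pos = 0

    def parse_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1  # skip "("
            children = [parse_node()]
            while text[pos] == ",":  # sister branches
                pos += 1
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError("expected ')' at position {}".format(pos))
            pos += 1  # skip ")"
            return children
        start = pos
        while pos < len(text) and text[pos] not in ",()":
            pos += 1
        return text[start:pos].strip().replace("_", " ")

    tree = parse_node()
    if pos != len(text):
        raise ValueError("unexpected text after tree at position {}".format(pos))
    return tree


def leaves(tree):
    """List leaf names from left to right."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


plants = parse_newick(
    "((((Cucumis_sativus,Ricinus_communis),Solanum_lycopersicum),"
    "Medicago_truncatula),(Arabidopsis_thaliana,Capsella_rubella));")
print(plants[1])  # ['Arabidopsis thaliana', 'Capsella rubella']
```

The top-level split separates the Arabidopsis/Capsella pair from the other four species, which matches the two main branches drawn by a tree viewer such as MEGA.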
  18. Phylogenetic trees of epigenetic protein families.
     “Proteins are clustered on branches on the basis of the similarity of their amino acid sequences. The phylogenetic representation tends to cluster structurally (and sometimes functionally) related proteins. Drugs targeting a specific protein are more likely to be active against other proteins on the same branch. Distinct phylogenetic branches are highlighted with distinct colours (in the case of the malignant brain tumour (MBT) family, where only a few MBT domains are actually binding methyl-lysines, the red colour coding indicates the branch where all known methyl-lysine-binding domains are clustered). We assembled protein families by looking for domains associated with 'writing', 'reading' and 'erasing' acetyl and methyl marks in the Human Protein Reference Database, and by complementing the list with data from the literature, as well as data from the Pfam protein family database and the SMART (Simple Modular Architecture Research Tool) database. The phylogeny outlined in the trees is derived from multiple sequence alignments of the domain after which the family was named (full-length sequences were used for acetyltransferases as the catalytic domain is not always clearly defined for this family). If a domain is present multiple times in a protein, the protein is shown multiple times in the corresponding tree, followed by the sequential iteration of the domain in parenthesis for example, L3MBTL(2) corresponds to the second MBT domain of the protein L3MBTL. If multiple variants with insertions or deletions were reported for a gene, the variant number according to Swiss-Prot nomenclature is indicated after a hyphen: for example, TRIM33-2 in the tree of bromodomain-containing proteins corresponds to the second Swiss-Prot variant of the TRIM33 (tripartite motif-containing protein 33) bromodomain. For each tree, a seed alignment was derived from available protein structures by aligning residues that were superimposed in the three-dimensional space. Additional sequences were appended by aligning them to the closest seed sequence.”
     Source: http://www.nature.com/nrd/journal/v11/n5/fig_tab/nrd3674_F2.html
  19. Mega-genomics and Next Generation Sequence Analysis
  20. Sequencing human DNA: The Human Genome Project and the Personal Genome Project. First Human Genomes Sequenced: 1) Dr. J. Craig Venter 2) Dr. James D. Watson: Molecular Biology Founder & Nobel Laureate 3) Personal Genome Project 4) Hundred Person Wellness Project 5) UK Personal Genome Project. Image: Cold Spring Harbor Laboratory, 2006. Genome-Wide Association Studies (GWAS) compare the genomes of many people, typically people with and without a disease, to look for variants that are more common in one group and might therefore be associated with the disease.
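At its core, a GWAS repeats one simple comparison at each of hundreds of thousands of variant sites: is one allele more common in people with the disease than in people without it? The sketch below shows that single-site step as a Pearson chi-square test on a 2x2 table of allele counts (1 degree of freedom, no continuity correction). The counts are hypothetical. Because so many sites are tested, genome-wide studies conventionally require a much stricter threshold (commonly p < 5 × 10^-8) than a single test would.

```python
import math


def allelic_chi_square(case_alt: int, case_ref: int,
                       control_alt: int, control_ref: int):
    """Pearson chi-square test on a 2x2 table of allele counts.

    Rows: cases, controls. Columns: alternate allele, reference allele.
    Returns (chi2, p_value) with 1 degree of freedom.
    """
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column of the table is empty")
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # With 1 degree of freedom, chi2 = z**2, so p = P(|Z| > z) = erfc(z / sqrt(2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: 30 of 100 case alleles vs 10 of 100 control alleles are alternate
chi2, p = allelic_chi_square(30, 70, 10, 90)
print(chi2)  # 12.5
```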
  21. Current understanding of the human genome, categorized by function of each gene product, given both as number of genes and as percentage of all genes. Image description and credit: Mikael Häggström (Wikimedia Commons). Our understanding of function within the human genome is incomplete. More samples are needed for improved results.
  22. The Cost Reduction for Sequencing Genomes Greatly Outpaced Moore’s Law
  23. State Direct-to-Consumer Testing Statutes and Regulations. Courtesy of the Genetics & Public Policy Center with support from The Pew Charitable Trusts
  24. Limitations of GINA. “The Genetic Information Nondiscrimination Act, known as GINA, does not apply to three types of insurance – life, disability and long-term care – that are especially important to people who may have serious inherited diseases … The American Medical Association’s code of ethics states that ‘it may be necessary’ for doctors to maintain a separate file for genetic test results so the information is not sent to insurers.” From “Fearing Punishment for Bad Genes,” The New York Times, April 7, 2014. http://www.nytimes.com/2014/04/08/science/fearing-punishment-for-bad-genes.html?_r=1 Genetic Information Nondiscrimination Act (GINA) of 2008: http://www.genome.gov/24519851
  25. Henrietta Lacks: The Ethics of Cell Line Development and Research. Henrietta Lacks, 1945. Image courtesy of The Lacks Family (source: Wikipedia). Do you own your DNA?
  26. Testing Companies: 23andMe 454 Life Sciences Advanced Healthcare, Inc AIBioTech Ancestry DNA Atlas Sports Genetics Athleticode Biologis Personal Genomics Service Bioresolve Counsyl Complete Genomics deCODE Genetics deCODEme.com DNA-CARDIOCHECK DNA DTC DNATraits Eastern Biotech & Life Sciences easyDNA EnteroLab Family Tree DNA Future Genetics Geenitesti Genelex GenePlanet Genetic Health Genetic Technologies Genetic Testing Laboratories Geneyouin Genographic Project Genotek Gentle Labs Graceful Earth HealthCheckUSA HelloGene / HelloGenome Holistic Health IDNA.com i-gene Illumina Indian Biosciences InoLife Technologies Interleukin Genetics JCVI Knome Lumigenix Map My Gene MapMyGenome meragenome.com MyGene23 Navigenics Oxford Nanopore Technologies Pacific Biosciences Pathway Genomics Pediatrix Medical Group Perkin Elmer Genetics Personal Genome Project Personalis PHENOM Biosciences Positive Bioscience Sequenom SNPedia Test Country Ubiome Viaguard/Accu-metrics vuGene Xcode Life Sciences
     As of March 10, 2014, 23andMe had 650,000+ genotyped customers.
  27. Screenings: More than 420 Conditions and Traits are Screened for During Genetic Testing.
     CONDITIONS: CANCER; LIVER; HEART; HEARING; SIGHT; DIABETES; PSYCHIATRIC/PSYCHOLOGICAL; REPRODUCTIVE / STD (FERTILITY); REGULATORY FUNCTIONS (BREATHING, SLEEP, WEIGHT, RENAL); ADDICTION (ALCOHOL, DRUG); IMMUNE SYSTEM (HIV, AIDS); MUSCULO-SKELETAL (MARFAN); PHARMACOGENOMICS/DRUG EFFICACY (CANCER, WARFARIN); NEUROLOGICAL (PARKINSON’S, ALZHEIMER’S, MS); SKIN
     ABILITIES & PHYSICAL TRAITS: INTELLIGENCE; ENDURANCE; EYE & HAIR COLOR
     NCBI Resources: https://www.ncbi.nlm.nih.gov/variation http://www.ncbi.nlm.nih.gov/guide/genetics-medicine http://www.ncbi.nlm.nih.gov/books/NBK1116 http://www.ncbi.nlm.nih.gov/medgen http://www.ncbi.nlm.nih.gov/mesh
     Other Resources: http://www.omim.org http://www.orpha.net http://www.genome.gov http://www.dnapolicy.org
     See handout for specific tests. (Image: Asclepius)
  28. How to Submit Your DNA for Sequencing and Analysis with the Personal Genome Project
     Basic eligibility:
     1. US citizen age 21 or older
     2. Additional details: http://www.personalgenomes.org/harvard/protocols
     How it works (we will be using an existing volunteer’s genome for this presentation):
     1. Provide open consent (form)
     2. Supply medical history (form)
     3. Donate DNA samples (saliva, hair, blood, tissue) by self-collection or at a designated facility
     4. Samples are sent to the lab (blood = DNA, tissue = exome, saliva = microbiome); tissue samples may be used to develop cell lines for research purposes
     5. Harvard’s PGP team analyzes the data for anomalies and creates a personalized health prognosis report
     6. The PGP team publishes your information online (your data is associated with a volunteer number, but you can also choose to use your name)
     7. Safety follow-up monitoring by email
     8. Additional details: http://www.personalgenomes.org/harvard/howitworks
  29. Volunteer huA90CE6: John Lauerman, In His Own Words. Whole Genome Sequence (WGS) Analysis http://youtu.be/YGIxMYiPLOU
  30. Volunteer huA90CE6 -- Case Study: John Lauerman (Harvard Analysis). JAK2-V617F and APOE-C130R variations.
     Step 1: Create a C:\data folder and download John Lauerman’s genome from the PGP website: https://my.pgp-hms.org/profile/huA90CE6. Examine the variant report on the same page.
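Once downloaded, the genome file (named huA90CE6--GS000006909-ASM.tsv later in this talk) is a tab-separated text file that a short script can filter. This sketch assumes the Complete Genomics "var" file layout: metadata lines starting with '#', then a header line starting with '>' that names the columns (including chromosome, begin, end, varType, reference, and alleleSeq). Those column names are an assumption from the Complete Genomics format documentation and should be checked against the header of the actual file.

```python
from typing import Dict, Iterable, Iterator


def read_cg_variants(lines: Iterable[str], chromosome: str = "chr9",
                     skip_types=("ref", "no-call")) -> Iterator[Dict[str, str]]:
    """Yield rows for one chromosome from a Complete Genomics-style var TSV.

    `lines` can be an open file or any iterable of text lines. Rows whose
    varType is in `skip_types` (reference calls, no-calls) are skipped.
    """
    header = None
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # blank line or '#' metadata
        fields = line.rstrip("\r\n").split("\t")
        if line.startswith(">"):
            header = [fields[0][1:]] + fields[1:]  # drop the leading '>'
            continue
        if header is None:
            raise ValueError("data row found before the '>' header line")
        row = dict(zip(header, fields))
        if row.get("chromosome") == chromosome and row.get("varType") not in skip_types:
            yield row
```

For example, passing `open(r"C:\data\huA90CE6--GS000006909-ASM.tsv")` and keeping rows whose `varType` is `"snp"` would list the single-base variants on chromosome 9.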
  31. Locating and Interpreting Errors: Cytogenetic Location
     JAK2-V617F is located on the short arm of chromosome 9 (9p); in myeloproliferative disorders it is frequently accompanied by loss of heterozygosity on 9p (9pLOH). Source: Kralovics R, Passamonti F, Buser AS, Teo SS, Tiedt R, Passweg JR, Tichelli A, Cazzola M, Skoda RC. A gain-of-function mutation of JAK2 in myeloproliferative disorders. N Engl J Med. 2005 Apr 28;352(17):1779-90.
     Humans have 22 pairs of numbered chromosomes (autosomes) plus the sex chromosomes, X and Y. In a cytogenetic location, the first integer is the chromosome number. The next character is the letter p or q, where p is the “short arm” and q is the “long arm.” The position is usually designated by two digits (representing a region and a band), which are sometimes followed by a decimal point and one or more additional digits (representing sub-bands within a light or dark area). http://ghr.nlm.nih.gov/handbook/howgeneswork/genelocation
     LIST OF COMMON ERRORS BY CHROMOSOME NUMBER: http://ghr.nlm.nih.gov/chromosomes
     Janus kinase 2 (JAK2), cytogenetic location: 9p24. http://ghr.nlm.nih.gov/chromosome/9
     Human gene JAK2:
       Transcript (including UTRs): chr9:4,985,245-5,128,183; size 142,939; total exon count 25; strand +
       Coding region: chr9:5,021,988-5,126,791; size 104,804; coding exon count 23
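Both kinds of location on this slide can be checked with a few lines of code. The band parser follows the GHR rule quoted above (chromosome, arm, region digit, band digit, optional sub-band after a decimal point). The size calculation assumes the UCSC browser's displayed positions are 1-based and inclusive, so a region's size is end − start + 1, which reproduces the sizes listed for JAK2.

```python
import re
from typing import NamedTuple, Optional


class Band(NamedTuple):
    chromosome: str
    arm: str                 # "p" = short arm, "q" = long arm
    region: Optional[str]
    band: Optional[str]
    sub_band: Optional[str]


BAND_RE = re.compile(r"^(\d{1,2}|X|Y)([pq])(\d)?(\d)?(?:\.(\d+))?$")


def parse_band(location: str) -> Band:
    """Split a cytogenetic location such as '9p24' or '9p24.1' into its parts."""
    match = BAND_RE.match(location.strip())
    if not match:
        raise ValueError("not a cytogenetic location: " + location)
    return Band(*match.groups())


def region_size(region: str) -> int:
    """Size of a 1-based, inclusive region such as 'chr9:4,985,245-5,128,183'."""
    _, span = region.replace(",", "").split(":")
    start, end = (int(x) for x in span.split("-"))
    return end - start + 1


print(parse_band("9p24"))                          # Band(chromosome='9', arm='p', region='2', band='4', sub_band=None)
print(region_size("chr9:4,985,245-5,128,183"))     # 142939
```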
  32. JAK2-V617F: Human Reference Genome, “Normal” JAK2, using the UCSC Genome Browser. Right-click over JAK2 and choose “Get DNA for JAK2.” Then, in the popup window, choose “get DNA.” Using the Shift key, highlight all the information and “Save As” JAK2. Open the file with Notepad to see JAK2 in more detail. http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg19&position=chr9%3A4985245-5128183
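The same reference sequence can also be requested without the browser. This sketch assumes the UCSC Genome Browser REST API's getData/sequence endpoint, which (as documented by UCSC) takes genome, chrom, start, and end parameters with a 0-based start and an exclusive end, and returns JSON with the sequence in a "dna" field; confirm these details against the current API documentation before relying on them.

```python
import json

API_ROOT = "https://api.genome.ucsc.edu/getData/sequence"


def sequence_url(genome: str, chrom: str, start_1based: int, end_inclusive: int) -> str:
    """Build a sequence request for a browser-style (1-based, inclusive) region.

    The API uses 0-based starts and exclusive ends, so only the start shifts by one.
    """
    params = [("genome", genome), ("chrom", chrom),
              ("start", str(start_1based - 1)), ("end", str(end_inclusive))]
    return API_ROOT + "?" + ";".join(key + "=" + value for key, value in params)


def dna_from_response(body: str) -> str:
    """Extract the sequence from the API's JSON reply (assumed 'dna' field)."""
    return json.loads(body)["dna"]


print(sequence_url("hg19", "chr9", 4985245, 5128183))
```

With network access, `urllib.request.urlopen(sequence_url("hg19", "chr9", 4985245, 5128183))` would fetch the JAK2 region; passing the decoded reply to `dna_from_response` gives the same bases that the browser's "Get DNA" step saves to a file.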
  33. We will examine a volunteer’s “variant” JAK2 with two free bioinformatics tools using Windows: PGA (Personal Genome Analyzer from Archivopedia) and BLAST (National Center for Biotechnology Information (NCBI), web-based). At the end of the talk there will be a list of additional tools for other systems like Linux, Mac, and iPad.
  34. Volunteer huA90CE6 -- Case Study: John Lauerman (Looking Closer with PGA & PyMOL)
     SETTING UP PYTHON (Win 7, 64-bit)
     Step 1: Download and install Python: https://www.python.org/download/releases/2.7.5 (Windows X86-64 MSI Installer (2.7.5))
     Step 2: Download and install the PyMOL extension for a free 3D molecule viewer: http://www.lfd.uci.edu/~gohlke/pythonlibs/#pymol (pymol-1.7.1.0.win-amd64-py2.7.exe). Note: installing the extension may open a command prompt window to compile. Find the application folder C:\Python27\PyMOL, find the PyMOL application file in the list, and create a shortcut. Drag the shortcut to the desktop. Double-click the icon on the desktop to run PyMOL.
     Step 3: Download and install the wxPython extension: http://downloads.sourceforge.net/wxpython/wxPython3.0-win64-3.0.0.0-py27.exe
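Once PyMOL runs, it can be scripted in Python to highlight a residue of interest such as position 617. This is a sketch, not part of PGA: the PDB ID is a placeholder you would replace with a JAK2 structure from the Protein Data Bank, and the chain letter and residue numbering must be checked for that entry. It uses PyMOL's `cmd` functions `fetch`, `select`, `show`, `color`, and `zoom`, and avoids Python 3-only syntax so it should also run under the Python 2.7 setup above.

```python
def residue_selection(object_name, chain, residue_number):
    """PyMOL selection-language string for one residue."""
    return "{} and chain {} and resi {}".format(object_name, chain, residue_number)


def highlight_residue(pdb_id, chain, residue_number):
    """Load a structure into PyMOL and show one residue as red sticks."""
    from pymol import cmd  # available inside PyMOL or where pymol is installed

    cmd.fetch(pdb_id, "target")  # download from the PDB as object "target"
    cmd.select("site", residue_selection("target", chain, residue_number))
    cmd.show("sticks", "site")
    cmd.color("red", "site")
    cmd.zoom("site")
```

Saved as a file, the script can be loaded from PyMOL's command line with `run` and then called as, for example, `highlight_residue("XXXX", "A", 617)`, where "XXXX" stands for the PDB ID you choose.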
  35. Volunteer huA90CE6 -- Case Study: John Lauerman (Looking Closer with PGA)
     Step 4: Use the Python-driven tool designed for this project to convert an isolated chromosome in your whole genome sequence from TSV to FASTA and SQL formats in under 5 minutes.
     • Insert: browse your hard drive for the volunteer’s whole genome sequence (file name: huA90CE6--GS000006909-ASM.tsv).
     • Insert: enter a single chromosome number you wish to examine (1-22, X, or Y), or leave blank for the whole genome. [Enter 9]
     • Insert: enter an exact location, or leave the defaults if you wish to scan the whole chromosome or whole genome. [Use Default]
     • Check “Generate FA” for FASTA.
     • Check “Generate SQL” for SQL.
     • Click the PROCESS button. Go to C:\data for the converted files in FASTA and SQL formats.
     Note: The following sources were used to create the tool: search engine: http://wiki.personal-genome.org/index.php?title=Talk:MtDNA_haplogroup; human mitochondrial reference sequence (rCRS): http://www.ncbi.nlm.nih.gov/nuccore/251831106?report=fasta
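The TSV-to-FASTA step can be pictured as "start from the reference sequence and substitute the volunteer's bases." The sketch below is not PGA's code; it is a minimal illustration assuming single-base substitutions given as (0-based position, reference base, alternate base) and a reference sequence already loaded as a string. Insertions, deletions, and no-calls, which a real conversion must handle, are left out.

```python
from typing import Iterable, Tuple


def apply_snvs(reference: str, snvs: Iterable[Tuple[int, str, str]]) -> str:
    """Return a copy of `reference` with single-base substitutions applied.

    Each SNV is (0-based position, reference base, alternate base). The
    reference base is checked so that coordinate mistakes fail loudly.
    """
    seq = list(reference)
    for pos, ref_base, alt_base in snvs:
        if seq[pos].upper() != ref_base.upper():
            raise ValueError("reference mismatch at {}: {} != {}".format(pos, seq[pos], ref_base))
        seq[pos] = alt_base
    return "".join(seq)


def to_fasta(name: str, sequence: str, width: int = 60) -> str:
    """Format one FASTA record: a '>' header line, then fixed-width sequence lines."""
    lines = [">" + name]
    lines += [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join(lines) + "\n"


variant = apply_snvs("ACGTACGT", [(1, "C", "T"), (6, "G", "A")])
print(to_fasta("demo_variant", variant, width=4))
```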
  36. Volunteer huA90CE6 -- Case Study: John Lauerman (Looking Closer with PGA). UNDER DEVELOPMENT. After a single search of a whole genome or chromosome, use PGA to view the FASTA file in the “View FASTA” window, or view the exact location of variants simply by clicking the “Variants” tab. This image shows some variants in John Lauerman’s chromosome 1 compared to the human reference genome. Note: The following sources were used to create the tool: search engine: http://wiki.personal-genome.org/index.php?title=Talk:MtDNA_haplogroup; human mitochondrial reference sequence (rCRS): http://www.ncbi.nlm.nih.gov/nuccore/251831106?report=fasta. Future plans include adding reports with graphs and other visualizations.
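A "Variants" view is, at its simplest, a list of positions where the volunteer's sequence differs from the reference. This sketch is a hypothetical illustration, not PGA's implementation: it assumes the two sequences are already aligned base-for-base (true for substitutions only) and skips positions where either sequence has N, the usual code for an unknown base.

```python
from typing import List, Tuple


def list_differences(reference: str, sample: str,
                     offset: int = 1) -> List[Tuple[int, str, str]]:
    """Return (position, reference base, sample base) wherever the sequences differ.

    Positions start at `offset` (1 = 1-based, as a genome browser shows them).
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    diffs = []
    for i, (ref_base, alt_base) in enumerate(zip(reference.upper(), sample.upper())):
        if ref_base != alt_base and "N" not in (ref_base, alt_base):
            diffs.append((i + offset, ref_base, alt_base))
    return diffs


print(list_differences("ACGTACGT", "ATGTACAT"))  # [(2, 'C', 'T'), (7, 'G', 'A')]
```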
  37. Volunteer huA90CE6 Case Study: John Lauerman (Looking Closer with BLAST). Step 5: Use the generated FASTA file to perform a BLASTn search. In this case, John Lauerman’s chromosome 9 file was used (after running PGA, it is located in C:\data with a .fa extension).
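A whole chromosome is far larger than is practical to paste into a web BLASTn query, so a common approach is to cut out a short window around the position of interest first. This sketch reads a FASTA file with the standard library and extracts such a window; the 250-base flank and any position you pass in are illustrative assumptions, not values from the original analysis.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Read FASTA records into {name: sequence}; the name is the first word after '>'."""
    records, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def window(sequence: str, position: int, flank: int = 250) -> str:
    """Bases within `flank` of a 1-based position, clipped at the sequence ends."""
    start = max(0, position - 1 - flank)
    end = min(len(sequence), position + flank)
    return sequence[start:end]
```

For example, `read_fasta(open(r"C:\data\<file>.fa"))["chr9"]` followed by `window(seq, position)` gives a short query, where `<file>` and `position` stand for your own file name and site of interest; the result can be pasted into the BLASTn query box.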
  38. Volunteer huA90CE6 Case Study: John Lauerman (Looking Closer with BLAST)
  39. Free Tools for Other Platforms
     CGATools (macOS or Linux only): Download Complete Genomics Analysis Tools software and User Guide documentation: http://cgatools.sourceforge.net
     CGA Tools 1.8.0 Software:
     CGA Tools 1.8.0 User Guide: http://cgatools.sourceforge.net/docs/1.8.0/cgatools-user-guide.pdf
     Illumina’s MyGenome App (requires iOS 6.1 or later; compatible with iPad): http://www.illumina.com/clinical/clinical_informatics/mygenome_app.ilmn
     Complete Genomics’ Genome Voyager: http://www.completegenomics.com/analysis-tools/voyager
     Complete Genomics’ List of Third Party Tools: http://www.completegenomics.com/analysis-tools/third-party-tools
     PyMOL for Linux and Mac: http://www.pymolwiki.org/index.php/Linux_Install http://www.pymolwiki.org/index.php/MAC_Install
  40. Analyzing Collections of Whole Human Genomes Through Multiple Sequence Alignments and Analysis: Create Your Own Database
     Using a MySQL database, it is possible to import many whole human genome sequences from the PGP project by following the PGA example in Slides #35-36. In silico human genome scientific studies can be conducted for applications such as:
     ● disease biomarker identification
     ● pharmacogenetics
     TO GET STARTED
     1. Determine the minimum and ideal sample sizes (number of volunteer DNA sequences) for significance in your study (usually 10,000). The PGP aims for a collection of 100,000 sequenced genomes.
     2. Consider the needed space allocation. Each unzipped TSV file of an entire genome is about 1.3 MB.
     3. Consider the time needed for conversion and import into a MySQL database.
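Once variants from many volunteers sit in one table, a biomarker question such as "who carries this allele at this site?" becomes a single SQL query. This sketch uses SQLite from Python's standard library as a stand-in for MySQL (the SQL shown is largely portable); the table layout and volunteer IDs are hypothetical, not PGA's output format.

```python
import sqlite3
from typing import Iterable, List, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS variants (
    volunteer  TEXT    NOT NULL,  -- participant ID
    chromosome TEXT    NOT NULL,
    position   INTEGER NOT NULL,  -- 1-based
    ref        TEXT    NOT NULL,
    alt        TEXT    NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_site ON variants (chromosome, position);
"""


def load_variants(conn: sqlite3.Connection,
                  rows: Iterable[Tuple[str, str, int, str, str]]) -> None:
    """Create the table if needed and insert (volunteer, chrom, pos, ref, alt) rows."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO variants VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


def carriers(conn: sqlite3.Connection, chromosome: str,
             position: int, alt: str) -> List[str]:
    """Volunteers who carry `alt` at a site, sorted by ID."""
    cur = conn.execute(
        "SELECT DISTINCT volunteer FROM variants "
        "WHERE chromosome = ? AND position = ? AND alt = ? ORDER BY volunteer",
        (chromosome, position, alt))
    return [row[0] for row in cur]
```

Counting carriers among people with and without a condition at each site produces exactly the allele-count tables that association studies test.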
  41. Selected Bibliography
     ALBERTS, B. (1983). Molecular biology of the cell. New York, Garland Pub.
     CAREY, N. (2013). Epigenetics revolution: how modern biology is rewriting our understanding of genetics, disease, and inheritance.
     CHURCH, G. M., & REGIS, E. (2012). Regenesis: how synthetic biology will reinvent nature and ourselves. New York, Basic Books.
     SCHRÖDINGER, E. (2012). What is life?: the physical aspect of the living cell. Cambridge, Univ. Press.
     SKLOOT, R. (2010). The immortal life of Henrietta Lacks. New York, Crown Publishers.
     VENTER, J. C. (2007). A life decoded: my genome, my life. New York, Viking.
     WATSON, J. D. (1968). The double helix; a personal account of the discovery of the structure of DNA. New York, Atheneum. [SIGNED FIRST EDITION]
     WATSON, J. D. (2008). Molecular biology of the gene. San Francisco, Pearson/Benjamin Cummings.
     ZVELEBIL, M. J., & BAUM, J. O. (2008). Understanding bioinformatics. New York, Garland Science.
  42. Credits: Personal Genome Project (Harvard); MITx 7.00x: Introduction to Biology - The Secret of Life (14 weeks): Eric Lander (MIT, Harvard); Bioinformatic Methods I | Coursera (6 weeks): Nicholas Provart (University of Toronto); Bioinformatic Methods II | Coursera (6 weeks): Nicholas Provart (University of Toronto); Illumina; Gattaca (screenshot); Genetics & Public Policy Center; Mayo Clinic; Stanford University; MEGA 6; Jmol; NIH; UCSC Genome Database; PyMOL; CGA Tools; Complete Genomics; MyGenome App; NCBI BLAST; PBS; MG-RAST; Nature; John Lauerman; Tracy Kovach; Mikael Häggström; Database of Genomic Variants; National Human Genome Research Institute; International Society of Genetic Genealogy; Personal Genome Analyzer: Architect: S. Bohle, Programmers: D. Yount
  43. Contact Information: Archivopedia.com