This document provides an overview of DNA methylation analysis. It begins with background on DNA methylation functions and diseases. It then discusses methods for measuring DNA methylation status, including bisulfite sequencing. The document reviews steps for DNA methylation data analysis using tools like methylKit in R. It presents a case study example of analyzing DNA methylation data from human stem cells and fibroblasts. Alignment, quality control, differential methylation analysis and visualization are discussed.
A description of functional and structural genomics and the techniques involved, together with the models of forward genetics and reverse genetics and the techniques each uses.
This document provides an overview and introduction to RNA-seq analysis using Next Generation Sequencing. It discusses the RNA-seq workflow including mapping reads with TopHat2, transcript assembly with Cufflinks, and differential expression analysis. Key points covered include the advantages of RNA-seq over microarrays, the exponential drop in sequencing costs, mapping strategies for junction reads including TopHat, and running TopHat from the command line.
This document summarizes three main next generation sequencing technologies: Roche/454FLX pyrosequencing, Illumina/Solexa sequencing by synthesis, and Applied Biosystems SOLiD sequencing by ligation. Pyrosequencing works by detecting pyrophosphate released during DNA polymerization, producing light signals to determine the sequence. Roche/454FLX amplifies DNA fragments on beads in emulsions and sequences in picotiter plates. Illumina attaches DNA fragments to a flow cell for bridge amplification and sequencing by synthesis. Applied Biosystems SOLiD performs sequencing by ligation, determining sequences through sequential ligation of oligos.
This document discusses next generation sequencing technologies. It provides details on several massively parallel sequencing platforms and describes their advantages over traditional Sanger sequencing such as higher throughput, lower costs, and ability to process millions of reads in parallel. It then outlines several applications of next generation sequencing like mutation discovery, transcriptome analysis, metagenomics, epigenetics research and discovery of non-coding RNAs.
This document provides an overview of functional genomics and methods for transcriptome analysis. It discusses two main approaches - sequence-based approaches like expressed sequence tags (ESTs) and serial analysis of gene expression (SAGE), and microarray-based approaches. For sequence-based approaches, it describes how ESTs can provide gene discovery and expression information but have limitations. It outlines the SAGE methodology and gene index construction to organize EST data. For microarrays, it summarizes the basic workflow including sample preparation, hybridization, image analysis and data normalization to identify differentially expressed genes through statistical tests.
Lynx Therapeutics' Massively Parallel Signature Sequencing (MPSS) is an early high-throughput DNA sequencing technique. It works by attaching cDNA from an mRNA sample to beads, determining short sequence signatures from many beads in parallel, and using the signatures to count the number of individual mRNA molecules from each gene. This digital gene expression data allows MPSS to accurately quantify genes expressed at low levels by analyzing transcripts from virtually all genes simultaneously. The technique involves converting mRNA to cDNA, attaching oligonucleotide tags, PCR amplification on beads, and using fluorescent probes to determine short sequences in increments of four nucleotides from millions of beads in parallel.
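The counting step described above can be sketched in a few lines. This is only an illustration: the signature strings are made up, and scaling the counts to transcripts per million is a common reporting convention assumed here rather than something the summary states.

```python
from collections import Counter


def count_signatures(signatures):
    """Count how many beads produced each signature (one bead = one molecule)."""
    return Counter(signatures)


def transcripts_per_million(counts):
    """Scale raw signature counts to a per-million basis (assumed convention)."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {signature: n * 1_000_000 / total for signature, n in counts.items()}


# Hypothetical signatures read from four beads; real runs involve millions of beads.
bead_signatures = ["GATCAAGT", "GATCAAGT", "GATCTTCA", "GATCAAGT"]
counts = count_signatures(bead_signatures)
tpm = transcripts_per_million(counts)
```

Because every bead is counted once, a rarely expressed gene still shows up as a small but non-zero count, which is the sense in which the data is "digital".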
ESTs are short sequences of DNA that represent genes expressed in certain tissues or organisms. They provide a quick and inexpensive way for scientists to discover new genes and map their positions in genomes. ESTs represent a snapshot of genes expressed in a tissue at a given time. Sequencing the beginning or end of cDNA clones produces 5' and 3' ESTs, which can help identify genes and study gene expression and regulation.
The document discusses genetic polymorphisms in Plasmodium falciparum, the parasite that causes malaria. It defines key terms like locus, allele, and genome. It then describes different types of genetic polymorphisms like single nucleotide polymorphisms (SNPs), insertions and deletions (INDELs), and short tandem repeats (STRs). The document focuses on polymorphisms related to drug resistance in P. falciparum, discussing genes associated with resistance to chloroquine (pfcrt) and other antimalarial drugs, along with specific mutations in those genes linked to resistance.
This document discusses different methods for genome sequencing and assembly, including restriction enzyme fingerprinting, marker sequences, and hybridization assays. It focuses on using marker sequences like sequence-tagged sites (STS), expressed sequence tags (ESTs), untranslated regions (UTRs), and single nucleotide polymorphisms (SNPs) to map genomes. Large-insert cloning vectors like BACs and PACs can be used with restriction enzyme fingerprinting and FPC software to assemble contigs and map genomes at a large scale. Marker sequences provide a dense set of physical markers to build accurate physical maps of genomes.
Dr. Shamalamma S. presented on DNA microarrays. DNA microarrays allow thousands of genes to be compared simultaneously by attaching DNA probes to a chip which fluorescently labeled samples can bind to. The chip is then scanned to analyze gene expression levels. Applications include disease diagnosis, toxicology studies, and pharmacogenomics. While a powerful tool, microarrays have limitations such as lack of knowledge about many genes and lack of standardization.
The Protein Information Resource (PIR) is an integrated public bioinformatics resource that supports genomic and proteomic research and scientific studies; it contains protein sequence databases.
Genome-wide association study (GWAS) technology has been a primary method for identifying the genes responsible for diseases and other traits for the past ten years. GWAS continues to be highly relevant as a scientific method. Over 2,000 human GWAS reports now appear in scientific journals. Our free eBook aims to explain the basic steps and concepts to complete a GWAS experiment.
This document provides an overview of next generation sequencing (NGS) analysis. It discusses various NGS platforms such as Illumina, Roche 454, PacBio, and Ion Torrent. It also covers common file formats for sequencing data like FASTQ, quality control measures to assess data quality, and applications of NGS such as RNA-seq and ChIP-seq. The document aims to introduce researchers to basic concepts in NGS analysis and highlights available resources for storing and analyzing large sequencing datasets.
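As a rough illustration of the FASTQ format and a basic quality-control measure, the sketch below reads four-line FASTQ records and converts quality characters to Phred scores. It assumes the Phred+33 encoding and unwrapped four-line records; the function names and the example read are hypothetical, not taken from the summarized document.

```python
import io
from typing import Iterator, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def read_fastq(handle) -> Iterator[FastqRecord]:
    """Yield records from a FASTQ stream, assuming four lines per record."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


def phred_scores(quality: str, offset: int = 33) -> list:
    """Convert quality characters to Phred scores (offset 33 is an assumption)."""
    return [ord(char) - offset for char in quality]


def mean_error_probability(quality: str) -> float:
    """Average per-base error probability, using P = 10 ** (-Q / 10)."""
    scores = phred_scores(quality)
    return sum(10 ** (-q / 10) for q in scores) / len(scores)


# Hypothetical single-read example.
example = io.StringIO("@read1\nACGT\n+\nII5!\n")
records = list(read_fastq(example))
```

A Phred score of 20 corresponds to a 1 in 100 chance that the base call is wrong, and 40 to 1 in 10,000, which is why low-scoring read ends are often trimmed before mapping.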
Nanopore sequencing is a fourth generation DNA sequencing technique that involves monitoring changes in electric current as DNA molecules pass through nanopores. There are two main types of nanopores: biological nanopores made of protein complexes like alpha-hemolysin, and solid state nanopores made in thin silicon nitride membranes. Nanopore sequencing has advantages of being label-free, producing long reads at high throughput with low material requirements, but challenges include slowing DNA translocation and reducing noise. Potential applications are in single molecule sensing for analysis of biomolecules.
This workshop is intended for those who are interested in, and in the planning stages of, conducting an RNA-Seq experiment. Topics to be discussed will include:
* Experimental Design of RNA-Seq experiment
* Sample preparation, best practices
* High throughput sequencing basics and choices
* Cost estimation
* Differential Gene Expression Analysis
* Data cleanup and quality assurance
* Mapping your data
* Assigning reads to genes and counting
* Analysis of differentially expressed genes
* Downstream analysis/visualizations and tables
AGRF in conjunction with EMBL Australia recently organised a workshop at Monash University Clayton. This workshop was targeted at beginners and biologists who are new to analysing Next-Gen Sequencing data. The workshop also aimed to provide users with a snapshot of bioinformatics and data analysis tips on how to begin to analyse project data. An introduction to RNA-seq data analysis was presented by AGRF Senior Bioinformatician Dr. Sonika Tyagi.
Presented: 1st August 2012
This document provides an overview of DNA sequencing technologies. It begins with a brief history of DNA sequencing, including the discovery of DNA's structure and Sanger sequencing. The document then focuses on next generation sequencing technologies, describing several platforms such as 454 sequencing, Illumina sequencing, Ion Torrent sequencing, and Pacific Biosciences sequencing. It also discusses third generation sequencing and compares the sequencing approaches, workflows, and applications of various sequencing technologies. In conclusion, the document notes the progress and future directions of sequencing, including increased clinical applications and reduced costs.
This document discusses genome database systems. It begins with an introduction to bioinformatics and genomes. It then discusses the background of genome databases, including some examples. The major characteristics of genome database systems are described as highly complex data, rapid schema changes, and complex queries. The key areas of data management in genome databases are discussed as non-standard data, complex queries, data interpretation, integration across databases, and uniform management solutions. Major research areas and applications that impact society are also summarized.
The chain-termination method was developed by Frederick Sanger and coworkers in 1977. This method used fewer toxic chemicals and lower amounts of radioactivity than the Maxam and Gilbert method. Because of its comparative ease, the Sanger method was soon automated and was the method used in the first generation of DNA sequencers.
Nanopore DNA sequencing is a fourth generation sequencing technique that involves passing single strands of DNA through a nanopore and detecting changes in electrical current caused by each nucleotide base. There are two main types of nanopores - biological nanopores which are protein channels inserted into membranes, and solid-state nanopores fabricated in thin materials like silicon nitride or graphene. Some examples of biological nanopores used for sequencing are the alpha-hemolysin pore and the MspA pore. Nanopore sequencing has advantages over other techniques in being label-free, capable of very long reads, and requiring low sample amounts. However, challenges remain in slowing DNA translocation for higher resolution and reducing noise in the electrical signals.
Course: Bioinformatics for Biomedical Research (2014).
Session: 4.1- Introduction to RNA-seq and RNA-seq Data Analysis.
Statistics and Bioinformatics Unit (UEB) & High Technology Unit (UAT) from Vall d'Hebron Research Institute (www.vhir.org), Barcelona.
Pyrosequencing is a sequencing by synthesis technique that uses a luciferase enzyme system to monitor DNA synthesis. It works by adding DNA polymerase and a single nucleotide to the DNA fragments, generating pyrophosphate that is converted to light. The light is detected and identifies the nucleotide incorporated. Pyrosequencing has applications in cDNA analysis, mutation detection, re-sequencing of disease genes, and identifying single nucleotide polymorphisms and typing bacteria and viruses.
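The one-nucleotide-at-a-time readout can be sketched as a small simulation. This is a simplified model under stated assumptions: nucleotides are dispensed in a fixed cyclic order (the order "TACG" is arbitrary here), the light signal is taken to be exactly proportional to the number of bases incorporated, and the function names are hypothetical.

```python
def simulate_pyrogram(read, dispensation_order="TACG", max_cycles=100):
    """Return (nucleotide, signal) pairs for a strand being synthesized.

    The signal is the number of consecutive bases incorporated when that
    nucleotide is dispensed; zero means no incorporation and no light.
    """
    read = read.upper()
    flows = []
    position = 0
    for _ in range(max_cycles):  # guard against bases outside the dispensation order
        for nucleotide in dispensation_order:
            run = 0
            while position < len(read) and read[position] == nucleotide:
                run += 1
                position += 1
            flows.append((nucleotide, run))
            if position == len(read):
                return flows
    return flows


def decode_pyrogram(flows):
    """Rebuild the sequence from the flow signals."""
    return "".join(nucleotide * signal for nucleotide, signal in flows)


flows = simulate_pyrogram("TTAG")
```

In this idealized model a homopolymer such as "TT" gives a signal of 2 in a single flow; on real instruments that proportionality becomes harder to resolve as runs get longer.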
Ion Torrent (Proton/PGM) and SOLiD sequencing are two types of next-generation sequencing technologies. Ion Torrent uses semiconductor sequencing to detect hydrogen ions released during DNA synthesis, while SOLiD uses ligation of octamer probes and fluorescent dyes to determine sequences in color space. Both have advantages such as fast run times and high throughput but also limitations including errors in homopolymers for Ion Torrent and issues with palindromic sequences for SOLiD.
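To make "color space" concrete, here is a minimal sketch of the two-base encoding commonly described for SOLiD, in which each color (0 to 3) represents a pair of adjacent bases. The summary does not spell out the scheme, so the mapping below is an assumption: with A=0, C=1, G=2, T=3, the color is the XOR of the two base codes. The leading base is kept as a stand-in for the known primer base, which is a simplification.

```python
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def encode_color_space(sequence):
    """Encode a base sequence as a leading base followed by one color per base pair."""
    sequence = sequence.upper()
    if not sequence:
        return ""
    colors = [
        str(BASE_TO_BITS[first] ^ BASE_TO_BITS[second])
        for first, second in zip(sequence, sequence[1:])
    ]
    return sequence[0] + "".join(colors)


def decode_color_space(encoded):
    """Decode by walking the colors from the known leading base."""
    if not encoded:
        return ""
    bases = [encoded[0].upper()]
    for color in encoded[1:]:
        previous = BASE_TO_BITS[bases[-1]]
        bases.append(BITS_TO_BASE[previous ^ int(color)])
    return "".join(bases)


encoded = encode_color_space("ATGGC")
decoded = decode_color_space(encoded)
```

Because each base is decoded from the one before it, a single miscalled color corrupts every base downstream, so color-space reads are usually aligned in color space rather than decoded first.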
Once a genome has been sequenced, the first question that comes to mind is "Where are the genes?". Genome annotation is the process of attaching information to the biological sequences. It is an active area of research, and knowing the coding parts of a genome helps scientists considerably in planning their wet-lab projects.
This document discusses forward and reverse genetic approaches for understanding gene function. Forward genetics begins with a phenotype and identifies the underlying gene, while reverse genetics starts with a gene and determines its phenotype. Specific reverse genetic techniques described include large-scale random mutagenesis, homologous recombination, transposable element excision, RNA interference, genome editing using ZFNs/TALENs/CRISPR, and site-directed mutagenesis combined with transgenics. The document provides details on how each technique is used to alter genes and study their function.
Ensembl is a genome browser that annotates genes and predicts regulatory functions for vertebrate genomes. It uses raw DNA sequence data to create a tracking database and automatically finds genes and other features. Ensembl incorporates data from other sources and provides web-based access to genomic information through views of genes, transcripts, proteins, DNA homology, and more. It aims to make genome annotation freely accessible to support research.
20140613 Analysis of High Throughput DNA Methylation Profiling - Yi-Feng Chang
This document provides an overview of analysis of high-throughput DNA methylation profiling using bisulfite sequencing (BS-Seq) technology. It discusses DNA methylation and the bisulfite conversion process. It also reviews current BS-Seq resources, information that can be presented in BS-Seq studies, and published tools for analyzing BS-Seq data, including alignment, calling methylation status, and identifying differential methylation regions. The document concludes by introducing MethPipe, a comprehensive tool for BS-Seq data analysis.
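The bisulfite conversion step and the resulting methylation call can be illustrated with a short in-silico sketch. It assumes complete conversion (every unmethylated cytosine is read as thymine, every methylated cytosine stays cytosine) and looks at one strand only; the function names are hypothetical and are not part of MethPipe or any other tool named above.

```python
def bisulfite_convert(sequence, methylated_positions):
    """Return the sequence as read after bisulfite treatment.

    methylated_positions holds 0-based indices of methylated cytosines,
    which are protected from conversion.
    """
    converted = []
    for index, base in enumerate(sequence.upper()):
        if base == "C" and index not in methylated_positions:
            converted.append("T")  # unmethylated C -> U, read as T
        else:
            converted.append(base)
    return "".join(converted)


def methylation_level(c_reads, t_reads):
    """Fraction of reads supporting methylation at one cytosine."""
    total = c_reads + t_reads
    if total == 0:
        raise ValueError("no coverage at this position")
    return c_reads / total


# Hypothetical example: the C at index 1 is methylated, the C at index 4 is not.
treated = bisulfite_convert("ACGTCG", {1})
level = methylation_level(c_reads=8, t_reads=2)
```

After alignment, the methylation status of a cytosine is estimated from how many covering reads still show C versus T, which is the ratio computed by `methylation_level`.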
The document introduces bioinformatics and discusses its goals and applications. Bioinformatics involves using computational tools and databases to analyze and understand biological data like DNA, RNA, and proteins. It has two main subfields - developing computational tools and databases, and applying these tools to generate biological knowledge and insights into living systems. Bioinformatics aims to better understand cells at the molecular level and how they function. It has applications in areas like drug design, forensics, agriculture, and medicine.
dkNET Webinar: Illuminating The Druggable Genome With Pharos 10/23/2020 - dkNET
Abstract
Pharos (https://pharos.nih.gov/) is an integrated web-based informatics platform for the analysis of data aggregated by the Illuminating the Druggable Genome (IDG) Knowledge Management Center, an NIH Common Fund initiative. The current version of Pharos (as of October 2019) spans 20,244 proteins in the human proteome, 19,880 disease and phenotype associations, and 226,829 ChEMBL compounds. This resource not only collates and analyzes data from over 60 high-quality resources to generate these types, but also uses text indexing to find less apparent connections between targets, and has recently begun to collaborate with institutions that generate data and resources. Proteins are ranked according to a knowledge-based classification system, which can help researchers to identify less studied “dark” targets that could be potentially further illuminated. This is an important process for both drug discovery and target validation, as more knowledge can accelerate target identification, and previously understudied proteins can serve as novel targets in drug discovery. In this webinar, Dr. Tudor Oprea will introduce how to use Pharos to find targets of interest for drug discovery.
The top 3 key questions that Pharos can answer:
1. What are the novel drug targets that may play a role in a specific disease?
2. What are the diseases that are related directly or indirectly to a drug target?
3. Which researchers are related directly or indirectly to a drug target?
Presenter: Tudor Oprea, MD, PhD, Professor of Medicine, Chief of Translational Informatics Division & Internal Medicine, University of New Mexico
dkNET Webinar Information: https://dknet.org/about/webinar
Integrative bioinformatics analysis of Parkinson's disease related omics data - Enrico Glaab
Presentation on a statistical meta-analysis of omics data from Parkinson's disease case-control studies. The results are used for a comparative analysis against aging-related omics alterations in the brain and for prioritizing new candidate disease genes using the phenologs approach.
The document provides an overview of the Human Genome Project (HGP). It describes the HGP's goal of mapping and sequencing the entire human genome. The HGP was an international research effort that worked alongside a private company, Celera Genomics, to complete a rough draft of the human genome by 2000. The completion of the HGP marked a major scientific achievement and has transformed fields like medicine, biotechnology, and genetics by providing a comprehensive map of the human genetic code.
Next Generation Sequencing and its Applications in Medical Research - Frances... - Sri Ambati
The so-called “next-generation” sequencing (NGS) technologies allow us to sequence massive amounts of DNA in parallel and in a short time, overcoming the limitations of the original Sanger sequencing methods used to sequence the first human genome. NGS technologies have had an enormous impact on biomedical research within a short time frame. This talk will give an overview of these applications with specific examples from Mendelian genomics and cancer research.
Next generation sequencing (NGS) provides a high-throughput and cheaper alternative to traditional DNA sequencing through massively parallel sequencing of millions of DNA fragments simultaneously. NGS can be used for targeted sequencing to identify disease-causing mutations and for RNA sequencing to study entire transcriptomes, and it has various applications in cancer research and treatment, including identifying mutations that predict responses to immunotherapy. However, NGS also faces challenges such as accurately sequencing regions with repeats and detecting fusion genes.
This document describes using computational methods to identify potential drug candidates that can inhibit breast cancer metastatic beta arrestin 2 (ARRB2). Ensemble-based virtual screening and pharmacophore modeling were used to screen drug molecules from the DrugBank database and identify top candidates. The 15 molecules with best binding were further analyzed with molecular dynamics simulations. The results suggest two molecules as the best ARRB2 inhibitor candidates based on their binding affinity and stability in simulations. The study provides a framework for discovering novel ARRB2 inhibitors using integrated computational approaches.
DNA Methylation and Epigenetic Events Underlying Renal Cell Carcinomaskomalicarol
Renal cell carcinoma (RCC) refers to a group of tumors that develop from the epithelium of the kidney tubes, including clear cell
RCC, papillary RCC, and chromophobe RCC. Most clear cell renal
carcinomas have a large histologic subtype, genetic or epigenetic
genetic von Hippel-Lindau (VHL). A comprehensive analysis of
the genetic modification genome suggested that chromosome 3p
loss and chromosome gains 5q and 7 may be a significant copy
defect in the development of clear kidney cell cancer. A more potent renal cell carcinoma may develop if chromosome 1p, 4, 9,
13q, or 14q is also lost. Renal carcinogenesis is not associated with
chronic inflammation or histological changes. However, regional hypermethylation of DNA in CpG C-type islands has already
accumulated in cancer-free kidney tissue, implying that the presence of malignant kidney lesions may also be detected by modified
DNA methylation. Modification of DNA methylation in cancerous
kidney tissue may advance kidney tissue to epigenetic mutations
and genes, leading to more serious cancers and even determining
a patient’s outcome
Amanda Myers provides her contact information and career statement, indicating her interest in areas like genetics and human health. She then lists her extensive technical expertise in areas such as chemical hazard identification, risk assessment, and laboratory skills. Her employment history includes work at an advanced testing laboratory conducting human risk assessments and toxicological profiles. She earned degrees in biology and chemistry from Ball State University and has research experience in projects involving database design, protein expression, and earthworm densities based on soil composition.
Introduction to data integration in bioinformaticsYan Xu
This document provides an introduction to data integration in bioinformatics. It discusses different types of biological data including gene expression, copy number variations, epigenetic data like DNA methylation, microRNA data, clinical data, and pathways. It also discusses challenges of data integration like data heterogeneity, incompleteness, and frequent changes. Finally, it provides examples of two case studies that integrate different types of biological and imaging data to study liver cancer and lung cancer.
This study sequenced four genes (TNNT3, TNNI2, TPM2, MYH3) in 19 individuals with distal arthrogryposis (DA) to search for pathogenic mutations. No mutations were found in the sequenced regions for the 13 individuals with unclassified DA or 5 with Sheldon-Hall syndrome. A previously reported pathogenic mutation in MYH3 was identified in the single individual with Freeman-Sheldon syndrome, providing further evidence that mutations in this gene cause this condition. The results suggest that mutations in other regions of the genes or in non-coding regions may be responsible for the unclassified DA cases.
This curriculum vitae summarizes the qualifications and experience of Ximiao He. He received his Ph.D. in Genomics from the Beijing Institute of Genomics in China, where he conducted research on genome databases and the analysis of human CpG islands and DNA methylation in cancer. He is currently a postdoctoral research fellow at the National Cancer Institute studying the effects of nucleosome occupancy and methylation on gene regulation. His research interests include DNA methylation, alternative splicing, and computational genomics tools. He has over 20 publications in peer-reviewed journals and has presented his research at several conferences.
This document provides an introduction to network medicine and discusses its application to chronic obstructive pulmonary disease (COPD). It defines network medicine as studying cellular, disease, and social networks to quantify factors contributing to individual diseases. For COPD, network approaches are being used to build disease networks, define molecular pathways, identify optimal disease phenotypes, develop new classifications, and integrate multi-omics data. Genome-wide association studies have identified risk loci for COPD, which are now being functionally validated in cellular and animal models. Network medicine aims to take a holistic rather than reductionist approach to complex diseases like COPD.
BRN Seminar 12/06/14 Introduction to Network Medicine brnmomentum
This document provides an introduction to network medicine and discusses its application to chronic obstructive pulmonary disease (COPD). It defines network medicine as studying cellular, disease, and social networks to quantify factors contributing to individual diseases. For COPD, network approaches are being used to build disease networks, define molecular pathways, identify optimal disease phenotypes, develop new classifications, and integrate multi-omics data. Genome-wide association studies have identified risk loci for COPD, which are now being functionally validated in cellular and animal models. Network medicine aims to take a holistic rather than reductionist approach to complex diseases like COPD.
Detecting clinically actionable somatic structural aberrations from targeted ...Ronak Shah
Structural aberrations including deletions, insertions, inversions, tandem duplications, translocations, and more complex rearrangements constitute a frequent type of alteration in human tumors. Here, we sought to explore the potential to discover such events from targeted DNA sequence data in our CLIA-compliant molecular diagnostics laboratory. To detect somatic structural aberrations in individual tumors, we have developed an analytic framework in Perl & Python to detect these events in data generated by a hybridization capture-based, targeted sequencing clinical assay (MSK-IMPACT), which can reveal structural rearrangements as small as 500bp.
This document discusses genetic markers for myocardial infarction (MI). It reports on several studies that have identified single nucleotide polymorphisms (SNPs) in genes that are associated with increased risk of MI:
- A study of over 4,000 Japanese patients found polymorphisms in the connexin 37 and p22phox genes increased risk of MI in men, and polymorphisms in the plasminogen activator inhibitor type 1 and stromelysin-1 genes increased risk in women.
- Other studies have found polymorphisms in genes involved in processes like leukocyte trafficking, cell cycle control, apoptosis, and lipid metabolism are associated with increased MI risk.
- While MI has many risk factors, no single genetic
1) The document discusses genetic markers that may be associated with vulnerability of plaques and risk of myocardial infarction (MI). It summarizes various studies examining gene expression and polymorphisms in patients with MI.
2) One study identified polymorphisms in 19 genes in men and 18 genes in women that were associated with increased risk of MI. The strongest associations were found with connexin 37 and PAI-1 genes.
3) Determining genotypes of connexin 37, PAI-1, and stromelysin-1 genes may help predict genetic risk of MI. However, MI is a complex disease that could be influenced by more than 1,000 genetic polymorphisms, each with small effect.
12. Techniques for Enrichment of Methylated or Target Regions Prior to BS-Seq
[Figure; recoverable labels: Genomic DNA, Deep Sequencing]
Lister, R. & Ecker, J.R. Finding the fifth base: genome-wide sequencing of cytosine methylation. Genome Res 19, 959-66 (2009).
18. WGBS Coverage Depth vs Replicates
• For DMR identification
• Per-sample coverage in the range of 5–15× is recommended, depending on the magnitude of methylation differences between the groups and on whether a smoothing or single CpG-based DMR identification strategy is used
• To identify long DMRs with large methylation differences, reducing coverage to 1× or 2× per sample is acceptable
• Biological replicates should be analyzed separately to increase power, as opposed to being pooled together for analysis
• The authors strongly argue for the use of at least two separate biological replicates for DMR analysis
• Choosing an appropriate number of biological replicates is a complex issue influenced by the degree of within-group heterogeneity, the magnitude of between-group differences and the presence of confounding factors such as batch effects.
Ziller, M. J., Hansen, K. D., Meissner, A. & Aryee, M. J. Coverage recommendations for methylation analysis by whole-genome bisulfite sequencing. Nat Methods 12, 230-232, doi:10.1038/nmeth.3152 (2015).
29. Required Software on Your Laptop
• Mac OS X Terminal
• Applications → Utilities → Terminal
• Linux console
• PuTTY: http://the.earth.li/~sgtatham/putty/latest/x86/putty.exe
• SCP/SFTP/FTP client
• WinSCP: http://winscp.net/download/winscp556.zip
• PDF viewer
• http://get.adobe.com/tw/reader/
• R
• https://cran.r-project.org/
30. Required R Packages
• Bioconductor
• http://www.bioconductor.org/install/#install-bioconductor-packages
• methylKit:
• https://github.com/al2na/methylKit
> R
# dependencies
> install.packages(c("data.table", "devtools"))
# biocLite() is the installer used when these slides were written;
# on R >= 3.5 use BiocManager::install() instead
> source("http://bioconductor.org/biocLite.R")
> biocLite(c("GenomicRanges", "IRanges"))
# install the development version from GitHub
> library(devtools)
> install_github("al2na/methylKit", build_vignettes=FALSE)
31. Analysis Pipeline
Pipeline steps and tools (as labeled in the slide figure):
• Quality Trimming: fastq_masker
• Mapping: walt
• Remove Duplicate Reads: duplicate-remover
• Bisulfite Conversion Rate: bsrate
• Methylation Calling: methcounts
• Hypo/Hyper-Methylation Regions: hmr, hyperhmr, pmr
• Large Hypo/Hyper-Methylation Domains: pmd
• Differential Methylation Region: dmr
• Allele-specific Methylated Regions: amrfinder, allelicmeth
• Generate Methylation BED file: Bedtools, bedGraphToBigWig
• Calculating Methylation Ratio for Regions: bigWigAverageOverBed, roimethstat, bwtool
• Cross-species Comparison of Methylomes: liftOver
• Sorting mr files (labeled twice in the figure)
Software:
• fastx toolkit: http://hannonlab.cshl.edu/fastx_toolkit/
• MethPipe: http://smithlabresearch.org/software/methpipe/
• Bedtools: https://github.com/arq5x/bedtools2
• Programs from UCSC Genome Browser: http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64
• bwtool: https://github.com/CRG-Barcelona/bwtool/wiki
• MethPipe manual: http://smithlabresearch.org/downloads/methpipe-manual.pdf
32. Public BS-Seq Datasets
http://smithlabresearch.org/software/methbase/
Other species in NCBI GEO Database
• Glycine max (Soybean)
• Schistocerca gregaria (Locust)
• Rattus norvegicus (Rat)
• Danio rerio (Zebrafish)
• Drosophila melanogaster (Fruit fly)
• Oryza sativa (Rice)
• Macaca mulatta (Rhesus monkey)
• Mus musculus domesticus (Western European house mouse)
• Xenopus (Silurana) tropicalis (Frog)
• Cynoglossus semilaevis (Tongue sole, bony fish)
• Bombyx mori (Silkworm)
• Harpegnathos saltator (Jerdon's jumping ant)
• Camponotus floridanus (Florida carpenter ant)
35. DEMO Files
> cd /work3/LSLNGSDNAMETH
> ls -alh
total 12G
drwxr-xr-x 4 u00gel00 u00ycm02 4.0K Nov 16 00:29 .
drwxrwxrwt 109 root root 4.0K Nov 15 14:10 ..
-rwxr-xr-x 1 u00gel00 u00ycm02 65K Nov 15 17:22 h1.chrX.hmr
-rwxr-xr-x 1 u00gel00 u00ycm02 4.6G Nov 15 14:51 h1.chrX.mr.dremove
-rwxr-xr-x 1 u00gel00 u00ycm02 9.8K Nov 15 17:22 h1.chrX.pmd
-rwxr-xr-x 1 u00gel00 u00ycm02 34M Nov 15 17:39 h1.chrX_CpG.meth
-rwxr-xr-x 1 u00gel00 u00ycm02 39M Nov 15 23:52 h1.chrX_CpG.meth.for.methylKit
-rwxr-xr-x 1 u00gel00 u00ycm02 161K Nov 15 17:22 h1_gt_imr90.chrX.dmr
-rwxr-xr-x 1 u00gel00 u00ycm02 45M Nov 15 17:22 h1_imr90.chrX.methdiff
-rwxr-xr-x 1 u00gel00 u00ycm02 55K Nov 15 17:22 h1_lt_imr90.chrX.dmr
-rwxr-xr-x 1 u00gel00 u00ycm02 194K Nov 15 17:22 imr90.chrX.hmr
-rwxr-xr-x 1 u00gel00 u00ycm02 7.3G Nov 15 14:52 imr90.chrX.mr.dremove
-rwxr-xr-x 1 u00gel00 u00ycm02 5.6K Nov 15 17:22 imr90.chrX.pmd
-rwxr-xr-x 1 u00gel00 u00ycm02 35M Nov 15 17:39 imr90.chrX_CpG.meth
-rwxr-xr-x 1 u00gel00 u00ycm02 40M Nov 15 23:52 imr90.chrX_CpG.meth.for.methylKit
drwxr-xr-x 6 u00gel00 u00ycm02 4.0K Nov 15 14:28 methpipe-3.3.1
drwxr-xr-x 4 u00gel00 u00ycm02 4.0K Nov 15 14:46 methpipe-data
36. Quality Trimming and Split FASTQ Files into Smaller Files (Example ONLY)
# e.g. SRR018975.fastq.gz
> for f in *.gz;              # loop over the gzip files one by one
do
b=$(basename "$f" .gz);       # strip the suffix, e.g. SRR018975.fastq
echo "$f"
# per-file job: uncompress the gzip file to stdout, mask low-quality bases
# (quality below 30) as N, then split the FASTQ into files of 6,000,000 lines
# (1,500,000 four-line records) each
bsub -q 4G -o "$f.stdout" -e "$f.stderr" "
gzip -dc $f |
fastq_masker -q 30 -Q33 |
split -dl 6000000 - $b-";
done
> ls
SRR018975.fastq-00
SRR018975.fastq-01
SRR018975.fastq-02
…
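The split step only keeps reads intact because the chunk size is a multiple of four, the number of lines in a standard FASTQ record. The following is a minimal sketch of that property on a tiny made-up file; the read names, sequences, and temporary paths are hypothetical, and it assumes GNU `split` (for the `-d` option), as used on the slide.

```shell
# Build a tiny 4-read FASTQ (hypothetical reads) in a temporary directory.
tmp=$(mktemp -d)
for i in 1 2 3 4; do
  printf '@read%s\nACGTACGT\n+\nIIIIIIII\n' "$i"
done > "$tmp/demo.fastq"

# -d: numeric suffixes, -l 8: eight lines (two whole reads) per chunk.
split -d -l 8 "$tmp/demo.fastq" "$tmp/demo.fastq-"

# Every chunk should hold a whole number of 4-line records.
for chunk in "$tmp"/demo.fastq-*; do
  n=$(( $(wc -l < "$chunk") ))
  echo "$(basename "$chunk") lines=$n reads=$(( n / 4 )) leftover=$(( n % 4 ))"
done
```

A chunk size that is not divisible by four would cut a record in half at a chunk boundary, which is why the slide's value of 6000000 is safe.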
37. Mapping BS-Seq FASTQ Files (Example ONLY)
> export AdapterTrich=AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
> export AdapterArich=CAAGCAGAAGACGGCATACGAGCTCTTCCGATCT
# -c: chromosomes (FASTA file or dir), -o: output file, -m: maximum mismatches,
# -L: maximum fragment length, -C: adaptors to clip
# the trailing backslashes keep the quoted job command on one logical line
> bsub -q 4G -o rmapbs.stdout -e rmapbs.stderr "
/work3/LSLNGSDNAMETH/methpipe-3.3.1/bin/rmapbs-pe \
-c /work3/LSLNGSDNAMETH/methpipe-data/data/genome \
-o /work3/USERNAME/Output/test.mr \
-m 3 -L 400 -C $AdapterTrich:$AdapterArich \
/work3/LSLNGSDNAMETH/methpipe-data/data/snippet_1.fq \
/work3/LSLNGSDNAMETH/methpipe-data/data/snippet_2.fq"
> /work3/LSLNGSDNAMETH/methpipe-3.3.1/bin/rmapbs-pe
Usage: rmapbs-pe [OPTIONS] <fastq-reads-file>
Options:
-o, -output output file name
-c, -chrom chromosomes in FASTA file or dir
-T, -start index of first read to map
-N, -number number of reads to map
-s, -suffix suffix of chrom files (assumes dir provided)
-m, -mismatch maximum allowed mismatches
-M, -max-map maximum allowed mappings for a read
-C, -clip clip the specified adaptor
-L, -fraglen max fragment length
-suffix-len Suffix length of reads name
-v, -verbose print more run info
Help options:
-?, -help print this help message
-about print about message
38. Example Output of imr90 chrX
> head -n 30 /work3/LSLNGSDNAMETH/imr90.chrX.mr.dremove |column
MR Format
•RNAME (chromosome name)
•SPOS (start position, 0-based)
•EPOS (end position, 0-based)
•QNAME (read name)
•MISMATCH (number of mismatches)
•STRAND (forward or reverse strand)
•SEQ
•QUAL
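As a rough illustration of the eight MR columns listed above, the sketch below parses one made-up record with `awk`. The record itself is hypothetical, and treating EPOS minus SPOS as the mapped span is an assumption about how the end coordinate is defined; check the MethPipe manual before relying on it.

```shell
# One hypothetical MR record, tab-separated:
# RNAME SPOS EPOS QNAME MISMATCH STRAND SEQ QUAL
mr_line=$(printf 'chrX\t100\t108\tread1\t0\t+\tTTGTAGGT\tIIIIIIII')

# Report the fields by name; skip lines that do not have exactly 8 columns.
mr_summary=$(printf '%s\n' "$mr_line" | awk -F '\t' '
  NF != 8 { print "malformed line " NR; next }
  { printf "%s:%d-%d strand=%s span=%d seqlen=%d mismatches=%d\n",
           $1, $2, $3, $6, $3 - $2, length($7), $5 }')
echo "$mr_summary"
```

The same `awk` program can be pointed at the first lines of a real file, for example the output of `head` on `imr90.chrX.mr.dremove`, to sanity-check the column layout.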
39. Remove Duplicates (Example ONLY)
> export PATH=$PATH:/pkg/biology/methpipe/methpipe-3.3.1/bin/
# sort reads by chromosome, start, end and strand, then remove duplicates
# the trailing backslashes keep each quoted command on one logical line
> bsub -q 16G -o stdout -e stderr "
LC_ALL=C sort -S 14G -k 1,1 -k 2,2n -k 3,3n -k 6,6 \
-o /work3/USERNAME/h1.chrX.mr.sorted_start \
/work3/LSLNGSDNAMETH/h1.chrX.mr;
duplicate-remover -S /work3/USERNAME/h1.chrX_dremove_stat.txt \
-o /work3/USERNAME/h1.chrX.mr.dremove \
/work3/USERNAME/h1.chrX.mr.sorted_start"
> cat stdout
Successfully completed.
Resource usage summary:
CPU time : 343.80 sec.
Max Processes : 3
Max Threads : 4
> cat /work3/USERNAME/h1.chrX_dremove_stat.txt
TOTAL READS IN: 24350707
GOOD BASES IN: 1987943796
TOTAL READS OUT: 22884736
GOOD BASES OUT: 1867152730
DUPLICATES REMOVED: 1465971
READS WITH DUPLICATES: 1219174
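The statistics file can be summarized as a duplication rate. The sketch below copies the numbers shown above into a temporary file (the temporary path and the variable names are illustrative only) and computes the share of input reads that were removed.

```shell
# Reproduce the duplicate-remover statistics shown above in a temporary file.
stats=$(mktemp)
cat > "$stats" <<'EOF'
TOTAL READS IN: 24350707
GOOD BASES IN: 1987943796
TOTAL READS OUT: 22884736
GOOD BASES OUT: 1867152730
DUPLICATES REMOVED: 1465971
READS WITH DUPLICATES: 1219174
EOF

# Split each line on ": " and pick out the two counts needed for the rate.
dup_rate=$(awk -F ': *' '
  $1 == "TOTAL READS IN"     { total = $2 }
  $1 == "DUPLICATES REMOVED" { dups = $2 }
  END { if (total > 0) printf "%.2f", 100 * dups / total }' "$stats")
echo "duplicate rate: ${dup_rate}%"
```

For this run about 6% of the reads were removed, which matches the difference between TOTAL READS IN and TOTAL READS OUT (24350707 - 22884736 = 1465971).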
48. R Packages: methylKit
The following examples were adapted from the methylKit tutorials
• Akalin, A. et al. methylKit: a comprehensive R package for the analysis of genome-wide DNA methylation profiles. Genome Biol 13, R87, doi:10.1186/gb-2012-13-10-r87 (2012).
• Tutorial: http://methylkit.googlecode.com/files/methylKitTutorial_feb2012.pdf
• Tutorial Slide: http://methylkit.googlecode.com/files/methylKitTutorialSlides_2013.pdf
49. Convert MethPipe mr Format to methylKit Format
Id             chr    base     strand  coverage  freqC  freqT
Chr21.9764539  chr21  9764539  R       12        25.00  75.00
Chr21.9764513  chr21  9764513  R       12        0.00   100.00
Chr21.9820622  chr21  9820622  F       13        0.00   100.00
Chr21.9837545  chr21  9837545  F       11        0.00   100.00
Chr21.9849022  chr21  9849022  F       124       72.58  27.42
Chr21.9853326  chr21  9853326  F       17        70.59  29.41
# input: the per-CpG .meth files from the demo directory
# keep sites with coverage ($6) > 0; turn the methylation level ($5) into a percentage
> awk -F $'\t' -v OFS=$'\t' '$6>0{$5=int($5*100); print $1"."$2, $1, $2, "F", $6, $5, (100-$5)}' \
/work3/LSLNGSDNAMETH/h1.chrX_CpG.meth > /work3/USERNAME/Output/h1.chrX_CpG.meth.for.methylKit
> awk -F $'\t' -v OFS=$'\t' '$6>0{$5=int($5*100); print $1"."$2, $1, $2, "F", $6, $5, (100-$5)}' \
/work3/LSLNGSDNAMETH/imr90.chrX_CpG.meth > /work3/USERNAME/Output/imr90.chrX_CpG.meth.for.methylKit
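To see what the conversion does to individual rows, here is a minimal sketch on three made-up input lines. It assumes the input columns are chromosome, position, strand, context, methylation level (0 to 1) and coverage, which is what the slide's use of `$5` and `$6` implies. Unlike the slide's command, this variation rounds the percentage instead of truncating it and carries the strand through as F or R; both changes are assumptions about what is wanted, not part of the original workflow.

```shell
# Three hypothetical per-CpG rows: chr, pos, strand, context, level, coverage.
meth=$(mktemp)
printf 'chrX\t1000\t+\tCpG\t0.75\t12\n' >  "$meth"
printf 'chrX\t2000\t-\tCpG\t0.5\t0\n'   >> "$meth"   # zero coverage: dropped
printf 'chrX\t3000\t-\tCpG\t0\t8\n'     >> "$meth"

# Emit methylKit-style columns: id, chr, base, strand, coverage, freqC, freqT.
converted=$(awk -F '\t' -v OFS='\t' '$6 > 0 {
  pctC = int($5 * 100 + 0.5)            # round to the nearest percent
  strand = ($3 == "-") ? "R" : "F"      # assumed mapping: + -> F, - -> R
  print $1 "." $2, $1, $2, strand, $6, pctC, 100 - pctC
}' "$meth")
printf '%s\n' "$converted"
```

Rounding matters because a level such as 0.29 is stored as a binary fraction slightly below 0.29, so truncating `0.29 * 100` can yield 28 rather than 29.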
50. Read Methylation Files into methylKit Objects
> library(methylKit)
# load methylation files (change to your datasets)
> file.list=list(
system.file("extdata", "test1.myCpG.txt", package = "methylKit"),
system.file("extdata", "test2.myCpG.txt", package = "methylKit"),
system.file("extdata", "control1.myCpG.txt", package = "methylKit"),
system.file("extdata", "control2.myCpG.txt", package = "methylKit") )
# read the files into a methylRawList object: myobj
# (newer methylKit releases name this function methRead())
> myobj=read( file.list, sample.id=list("test1", "test2","ctrl1","ctrl2"),
assembly="hg18",treatment=c(1,1,0,0))
> head(myobj)
53. Get bases covered by all samples and cluster samples
# merge all samples into one table, keeping only base-pair locations that are covered in all samples
> meth=unite(myobj)
# cluster all samples using correlation distance and plot the hierarchical clustering
> png("cluster.png", width=600, height=600)
> hc = clusterSamples(meth, dist="correlation", method="ward", plot=TRUE)
> dev.off()
# plot the samples on their principal components
> png("pca.png", width=600, height=600)
> PCASamples(meth)
> dev.off()
54. Calculate differential methylation
# calculate differential methylation p-values and q-values
> myDiff=calculateDiffMeth(meth)
# get differentially methylated bases with at least 25% difference and q-value < 0.01
# (newer methylKit releases name this function getMethylDiff())
> myDiff25p=get.methylDiff(myDiff,difference=25,qvalue=0.01)
# get hypo-methylated bases with at least 25% difference and q-value < 0.01
> myDiff25pHypo=get.methylDiff(myDiff,difference=25,qvalue=0.01,type="hypo")
# get hyper-methylated bases with at least 25% difference and q-value < 0.01
> myDiff25pHyper=get.methylDiff(myDiff,difference=25,qvalue=0.01,type="hyper")