The document discusses curating sequence and literature data for RefSeq and Gene at the National Center for Biotechnology Information. It provides an overview of RefSeq, describing what RefSeq is, how it compares to GenBank, its advantages, and how the RefSeq dataset is built through curated data and sequence analysis. It then discusses the curation process in depth, including examples of curating genes, transcripts, proteins, and literature. It also describes the tools and quality assurance checks used in curation.
RefSeq curation in-depth. Examples of targeted transcript and protein curation, presented at the 8th International Biocuration conference (April, 2015).
The National Center for Biotechnology Information (NCBI) provides one of the most extensive sets of web-based tools for biological research. The tools are indispensable when planning genomics experiments, including qPCR, NGS, and CRISPR. In this presentation, Dr Matt McNeill takes a practical look at getting started with the wealth of NCBI tools, and shares some relevant tips to help you sift through the tools and options that we regularly use. In particular, he focuses on commonly adjusted parameters that will allow you to use the Basic Local Alignment Search Tool (BLAST) more effectively to identify off-target hybridization/annealing events. Dr McNeill also covers practical examples using NCBI tools to design assays.
Examining gene expression and methylation with next gen sequencing - Stephen Turner
Slides on RNA-seq and methylation studies using next-gen sequencing given at the University of Miami Hussman Institute for Human Genomics "Genetic Analysis of Complex Human Diseases" course in 2012 (http://hihg.med.miami.edu/educational-programs/analysis-of-complex-human-diseases/genetic-analysis-of-complex-human-diseases/)
NCBI has developed a powerful suite of online biomedical and bioinformatics resources, including old friends like PubMed and OMIM and newer resources such as Genome. This collection of databases and tools is widely used by scientists and medical professionals across the world. With such a wealth of information, it is easy to get overwhelmed. Join us for an overview of NCBI resources for the information professional, with an emphasis on biodata connectivity. No science degree required!
The National Center for Biotechnology Information (NCBI) Pathogen Analysis Pi... - ExternalEvents
http://www.fao.org/about/meetings/wgs-on-food-safety-management/en/
Real-time sequencing of foodborne pathogens: Pathogen Analysis Pipeline at the National Center for Biotechnology Information (NCBI). Presentation from the Technical Meeting on the impact of Whole Genome Sequencing (WGS) on food safety management, 23-25 May 2016, Rome, Italy.
Presentation by Justin Zook at GRC/GIAB ASHG 2017 workshop "Getting the most from the reference assembly and reference materials" on benchmarks for indels and structural variants.
Abstract: The focus in this session will be put on the differences between standard DNA mapping and RNAseq-specific transcript mapping: identifying splice variants and isoforms. The issue of transcript quantification and genomic variants that can be identified from RNAseq data will be discussed.
The NCBI Boot Camp for Beginners was designed to offer an overview of the NCBI suite of resources. In the first half of the presentation, highlighted databases were covered in four main categories: literature, sequences, genes & genomes and expression & structure. The second half of the class used the apolipoprotein A as a query that was explored through many of the NCBI databases, from identifying the reference sequences to a structural analysis of the Cys130Arg variant.
Presentation at 2019 ASHG GRC/GIAB workshop describing recent updates to the MANE project, which aims to provide matched annotation from RefSeq and GENCODE.
Molecular insight into Gene Expression Using Digital RNAseq: Digital RNAseq W... - QIAGEN
Gene expression profiling is the key to understanding biological pathways and complex cellular systems. In this webinar we will discuss the challenges of targeted RNA-seq data analysis and present the solutions provided by the QIAGEN automated online data analysis tools. Using raw sequencing data from targeted sequencing, the output of the QIAseq primary data analysis tool and the options in QIAseq secondary analysis, such as normalization strategies, will be described. The use of Ingenuity Pathway Analysis (IPA) to unlock the molecular insights buried in experimental data by quickly identifying relationships, mechanisms, functions, and pathways of relevance will be shown with an example.
Development of FDA MicroDB: A Regulatory-Grade Microbial Reference Database - nist-spin
"Development of FDA MicroDB: A Regulatory-Grade Microbial Reference Database" presentation at the Standards for Pathogen Identification via NGS (SPIN) workshop hosted by the National Institute of Standards and Technology in October 2014, given by Heike Sichtig, PhD, from the FDA and Luke Tallon from IGS UMSOM.
The amount of oncogenomic data available has grown in recent years, and more is to come. The main challenges the scientific community faces, and will continue to face, are integrating this data to extract new knowledge and visualizing the results of the analysis intuitively. Here two complementary but independent tools for the analysis of oncogenomic data are presented: IntOGen and GiTools.
IntOGen is a framework that collects public oncogenomic data and integrates it in different ways. Its main purpose is to identify genes that are consistently altered (up- or down-regulated) across many samples in a specific experiment, and then to combine all experiments from the same cancer type to obtain a p-value for each gene and cancer type. The same principle can be applied to gene modules, or sets, which are groups of genes that share a biological property (module analysis). IntOGen has a web page where the user can explore the datasets included in the database, from individual genes in all cancer types to different experiments, or gene modules (GO terms, KEGG pathways or user-defined groups of genes) across all the experiments.
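The abstract does not say how the per-experiment results are combined into a single p-value for a gene and cancer type. As a rough illustration only, the sketch below uses Fisher's method, a familiar way to combine independent p-values, as a stand-in. The function name and the example values are hypothetical and are not taken from IntOGen.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values with Fisher's method.

    Assumption: Fisher's method is a stand-in chosen for illustration;
    the source does not state which combination method IntOGen uses.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(p_values)
    # Fisher's statistic is X = -2 * sum(ln p); we keep X / 2 for convenience.
    half_stat = -sum(math.log(p) for p in p_values)

    # X follows a chi-square distribution with 2k degrees of freedom.
    # For even degrees of freedom the upper tail has a closed form:
    #   P(chi2_2k >= X) = exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Hypothetical p-values for one gene in three experiments of one cancer type.
combined = fisher_combine([0.04, 0.10, 0.03])
print(f"combined p-value: {combined:.4g}")
```

With a single experiment the function returns the input p-value unchanged, which is a quick sanity check on the closed-form tail.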
GiTools is a desktop-based framework, also developed by the lab, for the analysis and visualization of genomic data. It supports several input formats (all plain text), and data can be imported from BioMart, so everything stored in that database can be used directly in GiTools. There is also an IntOGen data importer, so users can download matrices or oncomodules at different levels (experiments or combined results) and use them directly. At present it performs a limited number of analyses (enrichment analysis, correlations, results combination...), but it is built in a modular fashion and can easily be expanded to include more matrix-based statistical tests. It allows flexible exploration of the data and the creation of figures for papers, which can be exported in many different formats.
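To make "enrichment analysis" concrete, here is a minimal sketch of one common formulation: a one-sided hypergeometric test that asks whether a gene module is over-represented among a set of selected genes. This is an assumption made for illustration; the abstract does not say which statistical test GiTools applies, and the function name and numbers below are hypothetical.

```python
from math import comb


def enrichment_p_value(universe, module_size, selected, overlap):
    """Probability of seeing at least `overlap` module genes among the
    `selected` genes when drawing without replacement from `universe` genes.

    Assumption: a one-sided hypergeometric test, used here as a generic
    example of enrichment analysis rather than as GiTools' own method.
    """
    if not (0 <= module_size <= universe and 0 <= selected <= universe):
        raise ValueError("module_size and selected must fit in the universe")
    max_overlap = min(module_size, selected)
    if not (0 <= overlap <= max_overlap):
        raise ValueError("overlap is out of range")

    total = comb(universe, selected)
    # Sum the upper tail: overlap, overlap + 1, ..., max_overlap.
    tail = sum(
        comb(module_size, k) * comb(universe - module_size, selected - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical numbers: 20,000 genes, a 150-gene module, 400 up-regulated
# genes, 12 of which belong to the module.
p = enrichment_p_value(universe=20_000, module_size=150, selected=400, overlap=12)
print(f"enrichment p-value: {p:.3g}")
```

Because the computation uses exact integer binomials, it stays accurate for small tails, at the cost of being slower than a library routine for very large inputs.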
Two case studies are presented to illustrate the combined usefulness of these tools, aiming to answer two main questions: “what biological processes are enriched in genes significantly up-regulated in cancer?” and “what is the correlation between different tumour types for the pattern of up-regulated genes?”. Several real applications of these tools are also presented, from both published and unpublished research, stressing that they can be used not only in oncogenomics projects, but also in evolution and global gene regulation.
In the near future GiTools will incorporate new analyses, such as GSEA and clustering, and connections with the R statistical framework. IntOGen will soon have a BioMart-compatible interface, which will make the data even more easily available.
Aequatus browser: Visualising complex similarity relationships among species - Anil Thanki
The Aequatus Browser is a web-based tool with novel rendering approaches to visualise homologous, orthologous and paralogous gene structures among differing species or subtypes of a common species.
P.S. The tool was previously named Synteny Browser, so these slides refer to it by that name.
This is the second part of the lecture slides from the BITS bioinformatics training session on the UCSC Genome Browser.
See http://www.bits.vib.be/index.php?option=com_content&view=article&id=17203990:orange-genome-browsers-ucsc-training&catid=81:training-pages&Itemid=190
Tutorial on the DisGeNET Discovery Platform, with a special focus on its use in the Semantic Web, showing how to retrieve DisGeNET data and integrate it with other RDF linked datasets.
Event: Plant and Animal Genomes conference 2012
Speaker: Rachael Huntley
The Gene Ontology (GO) is a well-established, structured vocabulary used in the functional annotation of gene products. GO terms are used to replace the multiple nomenclatures used by scientific databases that can hamper data integration. Currently, GO consists of more than 35,000 terms describing the molecular function, biological process and subcellular location of a gene product in a generic cell. The UniProt-Gene Ontology Annotation (UniProt-GOA) database [1] provides high-quality manual and electronic GO annotations to proteins within UniProt. By annotating well-studied proteins with GO terms and transferring this knowledge to less well-studied and novel proteins that are highly similar, we offer a valuable contribution to the understanding of all proteomes. UniProt-GOA provides annotated entries for over 387,000 species and is the largest and most comprehensive open-source contributor of annotations to the GO Consortium annotation effort. Annotation files for various proteomes are released each month, including human, mouse, rat, zebrafish, cow, chicken, dog, pig, Arabidopsis and Dictyostelium, as well as a file for the multiple species within UniProt. The UniProt-GOA dataset can be queried through our user-friendly QuickGO browser [2] or downloaded in a parsable format via the EBI [3] and GO Consortium FTP [4] sites. The UniProt-GOA dataset has increasingly been integrated into tools that aid in the analysis of large datasets resulting from high-throughput experiments, thus assisting researchers in the biological interpretation of their results. The annotations produced by UniProt-GOA are additionally cross-referenced in databases such as Ensembl and NCBI Entrez Gene.
[1] http://www.ebi.ac.uk/GOA
[2] http://www.ebi.ac.uk/QuickGO
[3] ftp://ftp.ebi.ac.uk/pub/databases/GO/goa
[4] ftp://ftp.geneontology.org/pub/go/gene-associations
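The abstract mentions that the annotations can be downloaded in a parsable format without naming it. Assuming the tab-delimited GO Annotation File (GAF) layout, in which comment lines start with `!`, column 2 holds the protein identifier, column 5 the GO term and column 7 the evidence code, a minimal reader might look like the sketch below. The sample rows use invented identifiers purely for illustration and are not real UniProt-GOA records.

```python
from collections import defaultdict


def parse_gaf(lines):
    """Group (GO term, evidence code) pairs by protein identifier.

    Assumption: input follows the tab-delimited GAF layout (at least 15
    columns per record, comment lines starting with '!').
    """
    annotations = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue  # skip blank lines and header comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 15:
            continue  # not a complete annotation record
        protein_id, go_id, evidence = cols[1], cols[4], cols[6]
        annotations[protein_id].add((go_id, evidence))
    return dict(annotations)


def make_row(protein_id, symbol, go_id, evidence, aspect):
    """Build one illustrative 17-column GAF row (all values are made up)."""
    return "\t".join([
        "UniProtKB", protein_id, symbol, "", go_id, "PMID:0000000",
        evidence, "", aspect, "", "", "protein", "taxon:9606",
        "20120101", "UniProt", "", "",
    ])


SAMPLE = [
    "!gaf-version: 2.2",
    make_row("X00001", "GENE1", "GO:0005515", "IPI", "F"),  # manual evidence
    make_row("X00001", "GENE1", "GO:0005634", "IEA", "C"),  # electronic evidence
    make_row("X00002", "GENE2", "GO:0006915", "IEA", "P"),
]

for protein, terms in sorted(parse_gaf(SAMPLE).items()):
    electronic = sorted(go for go, ev in terms if ev == "IEA")
    manual = sorted(go for go, ev in terms if ev != "IEA")
    print(protein, "manual:", manual, "electronic:", electronic)
```

Splitting on the `IEA` evidence code mirrors the manual versus electronic distinction the abstract draws, since `IEA` marks annotations inferred electronically.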
Bioinformatics is defined as the application of tools of computation and analysis to the capture and interpretation of biological data. It is an interdisciplinary field that draws on computer science, mathematics, physics, and biology.
Next Generation Sequencing for Identification and Subtyping of Foodborne Pat... - Nathan Olson
"Next Generation Sequencing for Identification and Subtyping of Foodborne Pathogens" presentation at the Standards for Pathogen Identification via NGS (SPIN) workshop hosted by the National Institute of Standards and Technology in October 2014, given by Rebecca Lindsey, PhD, from the Enteric Diseases Laboratory Branch of the CDC.
Using VarSeq to Improve Variant Analysis Research Workflows - Delaina Hawkins
Many questions must be answered when analyzing DNA sequence variants: How do I determine which variants are potentially deleterious? Is the sequencing quality sufficient? How do I prioritize the results? Which annotation sources may help answer my research question?
In this webinar presentation, we will review workflow strategies for quality control and analysis of DNA sequence variants using the VarSeq software package from Golden Helix. VarSeq is a powerful platform for analysis of DNA sequence variants in clinical and translational research settings. VarSeq provides researchers with easy access to curated public databases of variant annotation information, and also enables users to incorporate their own local databases or downloaded information about variants and genomic regions.
The presentation will include interactive demonstrations using VarSeq to analyze variants found by exome sequencing of an extended family with a complex disease. We will review strategies for assessing variant quality, applying genomic annotations, incorporating custom annotation sources, and creating variant filters in VarSeq. We will also demonstrate the PhoRank gene ranking algorithm and its application for prioritizing variants.
Using VarSeq to Improve Variant Analysis Research Workflows - Golden Helix Inc
Introduction:
RNA interference (RNAi) or Post-Transcriptional Gene Silencing (PTGS) is an important biological process for modulating eukaryotic gene expression.
It is a highly conserved process of post-transcriptional gene silencing in which double-stranded RNA (dsRNA) causes sequence-specific degradation of mRNA sequences.
dsRNA-induced gene silencing (RNAi) has been reported in a wide range of eukaryotes, including worms, insects, mammals and plants.
This process mediates resistance to both endogenous parasitic and exogenous pathogenic nucleic acids, and regulates the expression of protein-coding genes.
What are small ncRNAs?
micro RNA (miRNA)
short interfering RNA (siRNA)
Properties of small non-coding RNA:
Involved in silencing mRNA transcripts.
Called “small” because they are usually only about 21-24 nucleotides long.
Synthesized by first cutting up longer precursor sequences (like the 61nt one that Lee discovered).
Silence an mRNA by base pairing with some sequence on the mRNA.
Discovery of siRNA?
The first small RNA:
In 1993 Rosalind Lee (Victor Ambros lab) was studying a non-coding gene in C. elegans, lin-4, that was involved in silencing another gene, lin-14, at the appropriate time in the development of the worm.
Two small transcripts of lin-4 (22nt and 61nt) were found to be complementary to a sequence in the 3' UTR of lin-14.
Because lin-4 encoded no protein, she deduced that it must be these transcripts that are causing the silencing by RNA-RNA interactions.
Types of RNAi (non-coding RNA)
miRNA
Length: 23-25 nt
Trans-acting
Binds target mRNA with mismatches
Translation inhibition
siRNA
Length: 21 nt
Cis-acting
Binds target mRNA with perfect complementarity
Piwi-interacting RNA (piRNA)
Length: 25 to 36 nt
Expressed in germ cells
Regulates transposon activity
MECHANISM OF RNAi:
First the double-stranded RNA teams up with a protein complex named Dicer, which cuts the long RNA into short pieces.
Then another protein complex called RISC (RNA-induced silencing complex) discards one of the two RNA strands.
The RISC-docked, single-stranded RNA then pairs with the homologous mRNA and destroys it.
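The three steps above can be mimicked with a toy string model: cut a long trigger RNA into short fragments, keep the strand complementary to the target as the guide, and look for the site on an mRNA that the guide can base-pair with. This is a deliberately simplified sketch; the fixed 21-nt fragment size, the perfect-match rule and all sequences are assumptions made for illustration and do not model real Dicer or RISC biochemistry.

```python
# Watson-Crick complements for RNA.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the strand that would base-pair with `rna` (read 5' to 3')."""
    return rna.translate(COMPLEMENT)[::-1]


def dice(sense_strand, size=21):
    """Toy 'Dicer': cut the sense strand of a dsRNA into fixed-size pieces."""
    return [
        sense_strand[i:i + size]
        for i in range(0, len(sense_strand) - size + 1, size)
    ]


def guide_strands(fragments):
    """Toy 'RISC loading': keep the antisense strand of each fragment."""
    return [reverse_complement(fragment) for fragment in fragments]


def find_targets(guide, mrna):
    """Return every start position where the guide pairs perfectly with the mRNA."""
    site = reverse_complement(guide)
    hits = []
    start = mrna.find(site)
    while start != -1:
        hits.append(start)
        start = mrna.find(site, start + 1)
    return hits


# Invented sequences: a 42-nt trigger that matches part of a longer mRNA.
trigger = "AUGGCUAGCUAGGAUCCGAUC" + "GGAUUCCAAGGUUACGCAUGC"
mrna = "AAAA" + trigger + "CCCC"

for guide in guide_strands(dice(trigger)):
    print(guide, "pairs at", find_targets(guide, mrna))
```

Requiring a perfect match corresponds to the siRNA behaviour described in the slides; modelling miRNA-style pairing would need a mismatch-tolerant comparison instead of `str.find`.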
THE RISC COMPLEX:
RISC is a large (>500 kDa) RNA-multiprotein complex that triggers mRNA degradation in response to siRNA.
Unwinding of the double-stranded siRNA by an ATP-independent helicase.
The active components of RISC are Argonaute (Ago) proteins, endonucleases that cleave the target mRNA.
Dicer: endonuclease (RNase III family)
Argonaute: central component of the RNA-induced silencing complex (RISC)
One strand of the dsRNA produced by Dicer is retained in the RISC complex in association with Argonaute.
ARGONAUTE PROTEIN:
1. PAZ (PIWI/Argonaute/Zwille): recognition of the target mRNA
2. PIWI (P-element induced wimpy testis): breaks the phosphodiester bond of the mRNA (RNase H-like activity)
miRNA:
Double-stranded RNAs are naturally produced in eukaryotic cells during development, and they have a key role in regulating gene expression.
Slide 1: Title Slide
Extrachromosomal Inheritance
Slide 2: Introduction to Extrachromosomal Inheritance
Definition: Extrachromosomal inheritance refers to the transmission of genetic material that is not found within the nucleus.
Key Components: Involves genes located in mitochondria, chloroplasts, and plasmids.
Slide 3: Mitochondrial Inheritance
Mitochondria: Organelles responsible for energy production.
Mitochondrial DNA (mtDNA): Circular DNA molecule found in mitochondria.
Inheritance Pattern: Maternally inherited, meaning it is passed from mothers to all their offspring.
Diseases: Examples include Leber’s hereditary optic neuropathy (LHON) and mitochondrial myopathy.
Slide 4: Chloroplast Inheritance
Chloroplasts: Organelles responsible for photosynthesis in plants.
Chloroplast DNA (cpDNA): Circular DNA molecule found in chloroplasts.
Inheritance Pattern: Maternally inherited in most plants, but can vary in some species.
Examples: Variegation in plants, where leaf color patterns are determined by chloroplast DNA.
Slide 5: Plasmid Inheritance
Plasmids: Small, circular DNA molecules found in bacteria and some eukaryotes.
Features: Can carry antibiotic resistance genes and can be transferred between cells through processes like conjugation.
Significance: Important in biotechnology for gene cloning and genetic engineering.
Slide 6: Mechanisms of Extrachromosomal Inheritance
Non-Mendelian Patterns: Do not follow Mendel’s laws of inheritance.
Cytoplasmic Segregation: During cell division, organelles like mitochondria and chloroplasts are randomly distributed to daughter cells.
Heteroplasmy: Presence of more than one type of organellar genome within a cell, leading to variation in expression.
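Random cytoplasmic segregation and heteroplasmy can be illustrated with a small simulation. The sketch below assumes a deliberately simple model that is not taken from the slides: each cell holds a fixed number of organelle genomes, every genome is copied once before division, and one daughter receives a random half of the doubled pool. The function names and parameters are hypothetical.

```python
import random


def divide(mutant, total, rng):
    """Duplicate every organelle genome, then pass a random half to one daughter.

    Returns the number of mutant genomes in that daughter cell.
    """
    pool = [True] * (2 * mutant) + [False] * (2 * (total - mutant))
    daughter = rng.sample(pool, total)  # sampling without replacement
    return sum(daughter)


def simulate_lineage(mutant, total, generations, seed=0):
    """Follow one cell lineage and record its mutant fraction each generation."""
    if not (0 <= mutant <= total) or total <= 0:
        raise ValueError("need 0 <= mutant <= total and total > 0")
    rng = random.Random(seed)  # seeded so the run is repeatable
    history = [mutant / total]
    for _ in range(generations):
        mutant = divide(mutant, total, rng)
        history.append(mutant / total)
    return history


# A heteroplasmic cell: 10 of 20 organelle genomes carry the variant.
fractions = simulate_lineage(mutant=10, total=20, generations=30, seed=1)
print(" ".join(f"{f:.2f}" for f in fractions))
```

In this model the mutant fraction drifts from generation to generation, and a lineage that reaches 0 or 1 stays there, which is one way to picture how sectors with a single organelle genotype, such as the patches of variegated leaves, can arise from a mixed cell.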
Slide 7: Examples of Extrachromosomal Inheritance
Four O’clock Plant (Mirabilis jalapa): Shows variegated leaves due to different cpDNA in leaf cells.
Petite Mutants in Yeast: Result from mutations in mitochondrial DNA affecting respiration.
Slide 8: Importance of Extrachromosomal Inheritance
Evolution: Provides insight into the evolution of eukaryotic cells.
Medicine: Understanding mitochondrial inheritance helps in diagnosing and treating mitochondrial diseases.
Agriculture: Chloroplast inheritance can be used in plant breeding and genetic modification.
Slide 9: Recent Research and Advances
Gene Editing: Techniques like CRISPR-Cas9 are being used to edit mitochondrial and chloroplast DNA.
Therapies: Development of mitochondrial replacement therapy (MRT) for preventing mitochondrial diseases.
Slide 10: Conclusion
Summary: Extrachromosomal inheritance involves the transmission of genetic material outside the nucleus and plays a crucial role in genetics, medicine, and biotechnology.
Future Directions: Continued research and technological advancements hold promise for new treatments and applications.
Slide 11: Questions and Discussion
Invite Audience: Open the floor for any questions or further discussion on the topic.
Deep Behavioral Phenotyping in Systems Neuroscience for Functional Atlasing a... - Ana Luísa Pinho
Functional Magnetic Resonance Imaging (fMRI) provides a means to characterize brain activations in response to behavior. However, cognitive neuroscience has been limited to group-level effects referring to the performance of specific tasks. To obtain the functional profile of elementary cognitive mechanisms, the combination of brain responses to many tasks is required. Yet, to date, both structural atlases and parcellation-based activations do not fully account for cognitive function and still present several limitations. Further, they do not adapt overall to individual characteristics. In this talk, I will give an account of deep-behavioral phenotyping strategies, namely data-driven methods in large task-fMRI datasets, to optimize functional brain-data collection and improve inference of effects-of-interest related to mental processes. Key to this approach is the employment of fast multi-functional paradigms rich in features that can be well parametrized and, consequently, facilitate the creation of psycho-physiological constructs to be modelled with imaging data. Particular emphasis will be given to music stimuli when studying high-order cognitive mechanisms, due to their ecological nature and quality to enable complex behavior compounded by discrete entities. I will also discuss how deep-behavioral phenotyping and individualized models applied to neuroimaging data can better account for the subject-specific organization of domain-general cognitive systems in the human brain. Finally, the accumulation of functional brain signatures brings the possibility to clarify relationships among tasks and create a univocal link between brain systems and mental functions through: (1) the development of ontologies proposing an organization of cognitive processes; and (2) brain-network taxonomies describing functional specialization.
To this end, tools to improve commensurability in cognitive science are necessary, such as public repositories, ontology-based platforms and automated meta-analysis tools. I will thus discuss some brain-atlasing resources currently under development, and their applicability in cognitive as well as clinical neuroscience.
Richard's adventures in two entangled wonderlands - Richard Gill
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
(May 29th, 2024) Advancements in Intravital Microscopy- Insights for Preclini... - Scintica Instrumentation
Intravital microscopy (IVM) is a powerful tool used to study cellular behavior over time and space in vivo. Much of our understanding of cell biology has come from various in vitro and ex vivo methods; however, these studies do not necessarily reflect the natural dynamics of biological processes. Unlike traditional cell culture or fixed tissue imaging, IVM allows ultra-fast, high-resolution imaging of cellular processes over time and space as they occur in their natural environment. Real-time visualization of biological processes in the context of an intact organism helps maintain physiological relevance and provides insights into the progression of disease, response to treatments or developmental processes.
In this webinar we give an overview of advanced applications of the IVM system in preclinical research. IVIM Technology is a provider of all-in-one intravital microscopy systems and solutions optimized for in vivo imaging of live animal models at sub-micron resolution. The system’s unique features and user-friendly software enable researchers to probe fast dynamic biological processes such as immune cell tracking, cell-cell interaction, vascularization and tumor metastasis in exceptional detail. This webinar will also give an overview of IVM in drug development, where it offers a view into the intricate interaction between drugs/nanoparticles and tissues in vivo and allows the evaluation of therapeutic interventions in a variety of tissues and organs. This interdisciplinary collaboration continues to drive the advancement of novel therapeutic strategies.
Observation of Io’s Resurfacing via Plume Deposition Using Ground-based Adapt... - Sérgio Sacani
Since volcanic activity was first discovered on Io from Voyager images in 1979, changes on Io’s surface have been monitored from both spacecraft and ground-based telescopes. Here, we present the highest spatial resolution images of Io ever obtained from a ground-based telescope. These images, acquired by the SHARK-VIS instrument on the Large Binocular Telescope, show evidence of a major resurfacing event on Io’s trailing hemisphere. When compared to the most recent spacecraft images, the SHARK-VIS images show that a plume deposit from a powerful eruption at Pillan Patera has covered part of the long-lived Pele plume deposit. Although this type of resurfacing event may be common on Io, few have been detected due to the rarity of spacecraft visits and the previously low spatial resolution available from Earth-based telescopes. The SHARK-VIS instrument ushers in a new era of high resolution imaging of Io’s surface using adaptive optics at visible wavelengths.
A brief information about the SCOP protein database used in bioinformatics.
The Structural Classification of Proteins (SCOP) database is a comprehensive and authoritative resource for the structural and evolutionary relationships of proteins. It provides a detailed and curated classification of protein structures, grouping them into families, superfamilies, and folds based on their structural and sequence similarities.
This presentation explores a brief idea about the structural and functional attributes of nucleotides, the structure and function of genetic materials along with the impact of UV rays and pH upon them.
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Sérgio Sacani
We characterize the earliest galaxy population in the JADES Origins Field (JOF), the deepest
imaging field observed with JWST. We make use of the ancillary Hubble optical images (5 filters
spanning 0.4−0.9µm) and novel JWST images with 14 filters spanning 0.8−5µm, including 7 mediumband filters, and reaching total exposure times of up to 46 hours per filter. We combine all our data
at > 2.3µm to construct an ultradeep image, reaching as deep as ≈ 31.4 AB mag in the stack and
30.3-31.0 AB mag (5σ, r = 0.1” circular aperture) in individual filters. We measure photometric
redshifts and use robust selection criteria to identify a sample of eight galaxy candidates at redshifts
z = 11.5 − 15. These objects show compact half-light radii of R1/2 ∼ 50 − 200pc, stellar masses of
M⋆ ∼ 107−108M⊙, and star-formation rates of SFR ∼ 0.1−1 M⊙ yr−1
. Our search finds no candidates
at 15 < z < 20, placing upper limits at these redshifts. We develop a forward modeling approach to
infer the properties of the evolving luminosity function without binning in redshift or luminosity that
marginalizes over the photometric redshift uncertainty of our candidate galaxies and incorporates the
impact of non-detections. We find a z = 12 luminosity function in good agreement with prior results,
and that the luminosity function normalization and UV luminosity density decline by a factor of ∼ 2.5
from z = 12 to z = 14. We discuss the possible implications of our results in the context of theoretical
models for evolution of the dark matter halo mass function.
1. National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, DHHS, USA
Curating sequence and literature
data for RefSeq and Gene
Kim D. Pruitt
8th International Biocuration Conference
Training workshop
April 23, 2015
2. National Center for Biotechnology Information
RefSeq overview
What is RefSeq?
How does it compare to GenBank?
What are the advantages?
How is the dataset built?
• Curated data
• Sequence analysis
• Curation in-depth – examples
• Data access
3. National Center for Biotechnology Information
What is RefSeq?
An NCBI project to provide reference sequence standards that incorporate current knowledge for genomes, transcripts, and proteins.
Vertebrates Eukaryotes Prokaryotes Virus
Genomes 169 503 31,000 4,538
Genes 4 million 9.2 million 2 million 200,000
Transcripts 5.6 million 11 million 20,000 na
Proteins 4.9 million 10 million 38 million 214,287
Counts taken in early March 2015
4. National Center for Biotechnology Information
RefSeq versus GenBank
• Is archival (member of INSDC). GenBank: Yes; RefSeq: No
• Source of sequence. GenBank: Submitter; RefSeq: GenBank (INSDC)
• Source of annotation. GenBank: Submitter; RefSeq: GenBank, Collaboration, Literature, Curation, Computation
• Genome is always annotated. GenBank: No; RefSeq: Yes for archaea, bacteria, eukaryotes
• ‘Owner’ of sequence records and annotation. GenBank: Submitter; RefSeq: NCBI
• NCBI staff can update based on user requests. GenBank: Submitter must authorize; RefSeq: may drop contamination, may add transcript/protein/pseudogene based on data analysis and curation, may update annotation
• Annotation may be curated by NCBI staff. GenBank: No; RefSeq: Yes
5. National Center for Biotechnology Information
Advantages:
Consistency
Non-redundant
Use current names
Expanded feature annotation
Connected to Gene information
Products & Access:
Annotated genomes, transcripts, proteins
Gene, BLAST, FTP, programming API
15 years of building RefSeq
www.ncbi.nlm.nih.gov/refseq/
Curation:
Correct errors
Add new records
Add functional information
Connect sequence to function
Gene & protein names
Functional sequence elements
Curation focus
Human
Mouse
Rat
Zebrafish
Cow
Chicken
6. National Center for Biotechnology Information
RefSeq's unique contribution for vertebrates
• Correct transcript/protein sequence even if genome is incomplete/wrong
• Clear information on data source & evidence
• Connect DNA<>RNA<>Protein
• Connect sequence regions to function
- for both transcripts and proteins
NM_001033952.2
7. National Center for Biotechnology Information
RefSeq Genomes in a Nutshell
[Flow diagram; labels as extracted]
• Submitter: Sequence, Assembly, (Annotate), Submit
• GenBank/INSDC: Genome, Sequence, Meta-data, Nucleotide, Protein, BioSample, Assembly, BioProject, SRA (reads)
• Access: FTP, BLAST, Web, eUtils
• RefSeq Creation: Annotation Pipeline, RefSeq Curation, Collaboration
• BLAST, FTP, RefSeq Gene, Genome Tracks, Reports, Assembly, HomoloGene
• Legend: Data Submissions, RefSeq Process Flows, Resources
8. National Center for Biotechnology Information
RefSeq genomes: Leveraging computation & curation
www.ncbi.nlm.nih.gov/genome/annotation_euk/process/
Genes
Curation
International CCDS
Collaboration
Genome Reference
Consortium (GRC)
RefSeqs
Nomenclature
Groups
Model Organism
Databases
UniProtKB/
SwissProt
miRBase
Sequence Analysis
Literature Review
Iterative process
Iterative process
Quality Checks
Model
RefSeqs
Gene
FTP
Nucleotide
Protein
Annotation Pipeline
Align:
RefSeq
cDNAs
Proteins
RNA-Seq
Interpret:
Build models
Call orthologs:
vs. human
Filter:
Best hits
Assign GeneID
Assign Accession
Public release
User Feedback!
Curated RefSeqs
9. National Center for Biotechnology Information
Annotation - a conservative approach
2. stromal antigen 3-like 5 pseudogene
3. poliovirus receptor related immunoglobulin domain pseudogene
4. paired immunoglobin-like type 2 receptor beta
(regulation of inflammatory responses)
1. STAG3L5P-PVRIG2P-PILRB readthrough
Annotate every exon
that is observed once?
Consolidate information
to represent supported
genes and transcripts!
X
10. National Center for Biotechnology Information
Exon coverage
Log2 scale graphs
Interpreted introns
Model RefSeqs
Curated
Track names
Rabbit - GeneID:103352519 - Assembly: OryCun2.0
Annotation pipeline results in NCBI Gene
Access genome annotation information including RNA-Seq tracks
Not annotated in Ensembl 76
RNA-Seq tracks
Ensembl track
Configure
11. National Center for Biotechnology Information
How to identify a RefSeq sequence record
Keyword:
• RefSeq
Accession format:
Two alpha + _ + 6-9 digits – or –
Two alpha + _ + GenBank accession
RefSeq categories
(transcripts & proteins):
• Known RefSeq
• Subject to curation
• Accession prefix N*_
• Model RefSeq
• Evidence-based predictions
• Accession prefix X*_
www.ncbi.nlm.nih.gov/nucleotide/NM_002197.2
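The accession format and the Known/Model categories on this slide can be sketched as a small parser. This is an illustrative sketch, not an NCBI tool: the function name `classify_refseq_accession` and the returned labels are assumptions, and only the N*_ and X*_ transcript and protein prefixes named on the slide are classified. Any other two-letter prefix is reported as "other".

```python
import re

# Two letters, an underscore, then 6-9 digits or a GenBank-style accession,
# optionally followed by ".version".
_ACCESSION_RE = re.compile(r"^([A-Z]{2})_([A-Z0-9]+)(?:\.(\d+))?$")

# Second prefix letter for transcript and protein records.
_MOLECULE = {"M": "mRNA", "R": "non-coding RNA", "P": "protein"}


def classify_refseq_accession(accession):
    """Return prefix, category, molecule and version, or None if not RefSeq-like."""
    match = _ACCESSION_RE.match(accession.strip())
    if match is None:
        return None
    prefix, _body, version = match.groups()
    molecule = _MOLECULE.get(prefix[1])
    if prefix[0] == "N" and molecule:
        category = "known"   # subject to curation
    elif prefix[0] == "X" and molecule:
        category = "model"   # evidence-based prediction
    else:
        category, molecule = "other", None  # e.g. genomic records
    return {
        "prefix": prefix,
        "category": category,
        "molecule": molecule,
        "version": int(version) if version else None,
    }


print(classify_refseq_accession("NM_002197.2"))
```

A check like this is only a first filter on the accession string; the RefSeq keyword on the record remains the reliable indicator.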
12. National Center for Biotechnology Information
RefSeq overview
Curated data
Genes
Sequence
Publications
Imported data
• Sequence analysis
• Curation in-depth – examples
• Data access
13. National Center for Biotechnology Information
CURATION
Review data
• Gene information
• Gene-2-sequence associations
• Publications
• Data from collaborators
Resolve errors
• Remove wrong name synonyms, publications
• Fix sequence associations
• Update gene type
• Correct collaborator Gene: NCBI Gene associations
Add data
• Create RefSeq records
• RefSeq Attributes & Summary
• Transcript variant description
• Alternate names, publications
BULK PROCESSES
Import
• Add data from collaborators
Update DB
• Add, update, remove accessions to match GenBank
QA
• Identify data conflicts for curator review
14. National Center for Biotechnology Information
How do we curate?
• Collaborations
• Nomenclature, MODs, UniProt, Genome
Reference Consortium, individual scientists
• In-depth sequence analysis
• Genome, transcript and protein sequence
• Alignments
• RNA-Seq
• QA tests
• Epigenomics
• Clinical variants
• Literature review
mRNA, ncRNA, protein,
and pseudogene records
Collaboration
Sequence Analysis
Literature
Curation
Guidelines
Validation
Vertebrate transcripts
WWW – FTP - BLAST
Genome Annotation
15. National Center for Biotechnology Information
Tracking data & curation consistency
Curation management
• Standard operating procedures
• Curation decision trees
• ncRNA <> pseudo <> protein-coding?
• 5’ complete transcript <> partial?
• Sequence analysis tools and CGIs
• Support collaborations
Data management
• Specifications for the product
• Relational database to track data and curation decisions over time
• Process flows
• Data validation
• Disaster recovery/backup
• Public access
16. National Center for Biotechnology Information
What do we curate?
•Genes:
• Type, location, length
• Names, Summary
• Publications
• Gene-2-accession bins
•Imported data
•Sequence:
• Accuracy, length
• Alternate splice products
• Sequence features
• Functional regions
RefSeq: www.ncbi.nlm.nih.gov/refseq/ Gene: www.ncbi.nlm.nih.gov/gene/
Protein-coding Pseudogene
ncRNAs Unknown ???
17. National Center for Biotechnology Information
Curating Literature
• Curation Review for Genes
• Move to correct gene
• Add functional citations
• Mark to include on RefSeq
• GeneRIF submissions from public
• Add RefSeq attribute and citation
• Most publications are added from:
• National Library of Medicine MeSH
indexing service
• Sequence records
• Nomenclature groups, MODs, GO,
OMIM, GWAS catalog, more…
18. National Center for Biotechnology Information
GeneRIFs – an annotated bibliography
http://www.ncbi.nlm.nih.gov/gene/10309
RefSeq curators review GeneRIF submissions from
individuals to correct spelling, check the gene
association, and remove irrelevant submissions.
19. National Center for Biotechnology Information
Curation supports data import processes
Gene
Backend
Database
HGNC
MGD
RGD
XenBase
ZFIN
QTL db
Pseudogene.org
miRBase
OMIM
CGNC
Generic
Processing
Dataflow
FTP/API
Compare to known data
Update if OK
Report for curation if
conflicts found
20. National Center for Biotechnology Information
Curating data import errors
• Manually add or update some data
• HGNC may have: HGNC ID 1 = genome location ‘x’ = ENSG ID 1
• Processing can’t identify corresponding GeneID
• Curator reviews genomic location and either updates or creates a Gene record.
• Coordinate with data sources to reconcile data association conflicts
between sites
• NCBI may have: Gene ID 1 = HGNC ID 1 = Accession 123
• HGNC may have: HGNC ID 1 = Gene ID 1 = Accession 234
• NCBI may have: Accession 234 = GeneID 2 = HGNC ID 2 (a paralog)
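The conflict in this example can be flagged mechanically before a curator steps in. Below is a minimal sketch, assuming each source is reduced to rows of (gene ID, HGNC ID, accession). The function name and the row layout are hypothetical and do not describe NCBI's in-house processing.

```python
from collections import defaultdict


def find_accession_conflicts(*sources):
    """Return accessions that are linked to more than one (GeneID, HGNC ID) pair."""
    links = defaultdict(set)
    for rows in sources:
        for gene_id, hgnc_id, accession in rows:
            links[accession].add((gene_id, hgnc_id))
    # Keep only accessions whose gene assignment differs between rows.
    return {acc: sorted(pairs) for acc, pairs in links.items() if len(pairs) > 1}


# The placeholder identifiers from the slide.
ncbi_rows = [
    ("GeneID 1", "HGNC ID 1", "Accession 123"),
    ("GeneID 2", "HGNC ID 2", "Accession 234"),
]
hgnc_rows = [
    ("GeneID 1", "HGNC ID 1", "Accession 234"),
]

conflicts = find_accession_conflicts(ncbi_rows, hgnc_rows)
print(conflicts)
```

Here Accession 234 is reported because NCBI places it on Gene ID 2 (the paralog) while HGNC places it on Gene ID 1. Deciding which association is right remains a curation task coordinated with the data source.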
21. National Center for Biotechnology Information
RefSeq overview
Curated data
Sequence analysis
Tools
Quality assurance checks
• Curation in-depth - examples
• Data access
22. National Center for Biotechnology Information
Quick access to stored BLAST results
View hits in NCBI’s genome browser
Gene back-end curation database
In-house: Set of BLAST searches per accession
Results are stored for 3 months
Quick access to results
UniVec
EST
NR
Genome
blastn
blastx
blastp
23. National Center for Biotechnology Information
Sequence and alignment analysis using NCBI’s
Genome Workbench
www.ncbi.nlm.nih.gov/tools/gbench/
An application for viewing and
analyzing sequence data from
NCBI databases, or upload your
data for analysis
• Compiled for several
operating systems
• Analysis: BLAST and more
• Supports many display
options
• graphical
• alignments
• dot plot
• phylogenetic trees
• more
24. National Center for Biotechnology Information
General layout
Data display area
Project Tree shows loaded data
Search for features, search the sequence, search for open reading frames
Monitor the progress of analysis tasks
25. National Center for Biotechnology Information
Multi-pane cross alignment view
Turkey_5.0
Chromosome 1
Turkey_2.01
Chromosome 1
28. National Center for Biotechnology Information
Load a set of protein accession.version numbers
Select accessions to include in your analysis
Select the analysis option from the Tool menu
29. National Center for Biotechnology Information
Load a set of protein accession.version numbers
Select accessions to include in your analysis
Select analysis option from the Tool menu
30. National Center for Biotechnology Information
Display the phylogenetic tree calculated
from selected CELF proteins.
31. National Center for Biotechnology Information
Genome Workbench - Multiple protein
alignment display
Curation use:
- Orthology review
- Gene type review
- Sequence conservation
32. National Center for Biotechnology Information
RADAR – a Genome Workbench plug-in for RefSeq Curation
Displays Information on:
Genomic region, gene annotation
RNA-seq called introns
CpG Islands, Repeats, variation, more
QA results for newly built RefSeq
Aligned RefSeqs, cDNAs, ESTs
Coding sequence region (green)
Strain data
Clone library
Stored in DB with quality concern (D)
Multiple alignments to the genome (M)
Consensus splice sites (‘a’, ‘d’)
Mismatches
Indels
Unaligned ends (not shown)
Library Strain New RefSeq
QA
RefSeq Analysis, Display, and Recommendation
33. National Center for Biotechnology Information
RADAR
• Functions
• RNAseq supported intron
• ORF finder
• Signal peptides
• Transmembrane regions
• Compare/diff transcripts
• Find similar transcripts
• Integrated QA tests
• View nucleotide
• View translation
• Links to web for details
34. National Center for Biotechnology Information
CURATION
Review data
• Gene information
• Gene-2-sequence associations
• Publications
• Data from collaborators
Resolve errors
• Remove wrong name synonyms, publications
• Fix sequence associations
• Update gene type
• Correct collaborator Gene: NCBI Gene associations
Add data
• Create RefSeq records
• RefSeq Attributes & Summary
• Transcript variant description
• Alternate names, publications and GeneRIF
PROCESS
Import
• Add data from collaborators
Update DB
• Add, update, remove accessions to match GenBank
QA
• Identify data conflicts for curator review
35. National Center for Biotechnology Information
Quality assurance tests
Tests are available in the NCBI C++ toolkit – http://www.ncbi.nlm.nih.gov/toolkit/
Transcript tests – protein tests – genome tests – alignment tests
Results
over time
Sequence
tested
Results
summary
Details (not
shown)
36. National Center for Biotechnology Information
RefSeq overview
Curated data
Sequence analysis
Curation in-depth – examples
Work flow
Making decisions
Working with collaborators
RefSeq curated data is in Gene
Annotating RefSeq records
• Data access
37. National Center for Biotechnology Information
General process flow for manual transcript-based curation
• Identify quality full-length cDNAs or ESTs
• Determine the supported complete CDS
• Extend 5’ and 3’ ends using all aligning transcript data
• Identify splice variants and assess their protein-coding capacity
Representative RefSeqs
• NMs: protein-coding variants, including a variant that encodes an alternate C-terminus
• NR: non-coding variant that is subject to nonsense-mediated decay (NMD)
38. National Center for Biotechnology Information
Transcript-based curation process
Example: Human DNAJC22 gene (Gene ID: 79962) - RefSeqs are constructed using RADAR
Curated NMs are based on full-length transcripts
UTRs are extended
Model XMs are created computationally based on transcript and RNA-seq data and often lack full-length support.
RNA-seq
alignments
Model
Known
Aligned
cDNAs
Chr 12
NCBI RADAR: NC_000012.12 Chromosome 12 GRCh38.p2 (similar to UCSC hg38)
39. National Center for Biotechnology Information
Determining protein-coding potential of a variant
Example: Human CCNO gene (Gene ID: 10309) – Three non-coding RefSeqs (NRs) were made to represent full-length transcript variants that either lack an open reading frame (ORF) that meets our quality criteria or have an ORF that renders the transcript a candidate for nonsense-mediated decay (NMD).
non-coding variants (NR_)
protein-coding variant (NM_)
NMD candidate
ORFs are short < 60 aa
NCBI RADAR: NC_000005.10 Chromosome 5 GRCh38.p2 (similar to UCSC hg38)
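The ORF length check behind a decision like this one can be illustrated with a simple forward-strand ORF scan. This is a sketch only and not the RADAR ORF finder: the function names are hypothetical, only ATG-to-stop ORFs on the given strand are considered, and the 100 aa default follows the ORF length criterion listed in the curation documentation slide.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_aa(seq):
    """Length in amino acids of the longest ATG-to-stop ORF on the forward strand."""
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None  # index of the first ATG of the ORF currently open in this frame
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)  # stop codon not counted
                start = None
    return best


def meets_orf_length(seq, min_aa=100):
    """True if the longest forward-strand ORF reaches the length threshold."""
    return longest_orf_aa(seq) >= min_aa


print(longest_orf_aa("ATGAAATTTTAG"))
```

A transcript failing this check is not automatically non-coding. As the slides note, homology, conservation, and publications are weighed as well.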
40. National Center for Biotechnology Information
Detailed documentation improves consistency
Protein-coding RNA loci
• 1 long cDNA
• Or, 2 lines of support:
• Overlapping partial transcripts + more support
• Protein homology or ORF conservation or publication
• Consensus splice sites
• ORF length >=100 aa
• If <100 aa require more support
• Not apparently pseudogene
Non-coding RNA loci
• 1 long cDNA if > 2 exons
• 2 independent lines of support if 2 exons
• 5 lines of support if 1 exon
• ORF length <100 aa
• No quality protein hits (blastx)
• Consensus splice sites
• Consider if syntenic region in human, mouse
• No other data (publication) indicates it is protein-coding
• 3’ end does not correspond to genomic polyA
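Two of the numeric rules on this slide are simple enough to write down as code, which is one way a documented decision tree keeps curators consistent. The sketch below encodes only the exon-count support rule for non-coding loci and the 100 aa ORF threshold for protein-coding loci. The function names are hypothetical, and the remaining criteria (splice sites, homology, synteny, literature) need curator judgement and are not modelled.

```python
def min_support_noncoding(exon_count):
    """Minimum evidence for a non-coding RNA locus, by number of exons.

    Returns 1 (one long cDNA) for more than 2 exons, 2 independent lines
    of support for 2 exons, and 5 lines of support for a single exon.
    """
    if exon_count < 1:
        raise ValueError("exon_count must be at least 1")
    if exon_count > 2:
        return 1
    if exon_count == 2:
        return 2
    return 5


def orf_needs_more_support(orf_length_aa, threshold_aa=100):
    """True if a candidate protein-coding ORF is below the length threshold."""
    return orf_length_aa < threshold_aa


print(min_support_noncoding(1), orf_needs_more_support(54))
```

For example, a single-exon non-coding candidate needs 5 lines of support, and a 54 aa ORF is flagged as needing additional evidence.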
41. National Center for Biotechnology Information
Using Epigenomic data to determine 5’ completeness
H3K4me3 tracks
from the UCSC
Genome Browser
Example: mouse Fgd4 gene (Gene ID: 224014). NCBI RADAR: NC_000082.6 Chromosome 16 GRCm38
UCSC Browser
42. National Center for Biotechnology Information
Representing genes based on published data
Example: Human APELA gene (Gene ID: 100506013) – transcript data supports an independent gene
with a short ORF (54 aa) that typically would not meet RefSeq criteria for a protein-coding locus.
Literature review confirms the short ORF is functional.
Assembly: GRCh38.p2, chromosome 4.
54 aa ORF
Functional data support the 54 aa ORF
NCBI RADAR: NC_000004.12 Chromosome 4 GRCh38.p2
43. National Center for Biotechnology Information
Gene type decisions depend on transcript data,
epigenomics and functional studies
Example: Human FALEC gene (Gene ID: 100874054)
Assembly: GRCh38.p2; chromosome 1
The locus is supported by a single
two-exon EST (AL713297.1)
Epigenomic marks support the 5’
completeness of the transcript data
Published data support a functional
role for this lncRNA
NCBI RADAR: NC_000001.11 Chromosome 1 GRCh38.p2 (hg38)
UCSC - NC_000001.10 Chromosome 1 GRCh37 (hg19)
44. National Center for Biotechnology Information
Working with nomenclature groups to coordinate changes
Example: Non-coding gene LINC00948 was updated to a protein-coding gene MRLN (GeneID: 100507027).
Human Annotation Release 107
Private comments in the in-house Gene database record the curation history
RefSeq
proteins
(red)
45. National Center for Biotechnology Information
Functional annotation on the RefSeq record
Example: Human GHRL gene (Gene ID: 51738)
- ghrelin/obestatin prepropeptide
GHRL gene
Prepro-ghrelin
Mature
peptides
pro-ghrelin
Ghrelin-28 Obestatin
Ghrelin C-Ghrelin
Signal
peptide
Ghrelin C-Ghrelin
http://www.ncbi.nlm.nih.gov/protein/NP_057446.1
46. National Center for Biotechnology Information
http://www.ncbi.nlm.nih.gov/gene/51738
• Mature peptides were annotated on protein products of 8
alternatively spliced transcripts (red arrows).
• The Graphics display shown in NCBI’s Gene resource was
reconfigured to show all transcripts and proteins, and to
show the protein features.
GHRL annotation display
in NCBI’s Gene resource
47. National Center for Biotechnology Information
Micro RNA annotation – collaboration with miRBase
RefSeq annotates the mature microRNAs
RefSeq represents the miRNA stem-loop precursor
Gene Graphics view
NCBI imports data directly from miRBase (mirbase.org)
miRBase ID:
MI0000443
Example: Human MIR124-1 (Gene ID: 406907)
NR_029668.1
http://www.ncbi.nlm.nih.gov/gene/406907
48. National Center for Biotechnology Information
RefSeq NR_029668.1
- Human MIR124-1
- Gene ID: 406907
RefSeq record – feature annotation for miRNAs
http://www.ncbi.nlm.nih.gov/nuccore/NR_029668.1
49. National Center for Biotechnology Information
Feature annotation –
More examples of feature annotation will be provided in Session 1
50. RefSeq collaborates to improve genome annotation
GRCh38 – The gap is fixed in
the updated assembly. RefSeq
and Sanger collaborate to
produce matching annotation
on the new assembly.
GRCh37 – Several exons of the
human COPG2 RefSeq were
missing in the reference genome
assembly. Curators constructed
the RefSeq from transcripts and
reported the assembly gap to
the Genome Reference
Consortium (GRC).
Chromosome 7 GRCh37/hg19 NC_000007.13
Chromosome 7 GRCh38/hg38 NC_000007.14
CCDS – The annotated CDS is
tracked by the Consensus CDS
(CCDS) collaboration once NCBI
and Ensembl have both
annotated the protein
51. National Center for Biotechnology Information
Caution: using RefSeq data from non-NCBI resources
missing XM_ variant
missing pseudogene
locus
missing locus
UCSC’s Genome Browser
RefSeq Genes track
GRCh37/hg19
- Also missing for UCSC
GRCh38/hg38
NCBI’s Graphics Viewer
GRCh38/hg38
52. National Center for Biotechnology Information
RefSeq overview
Curated data
Sequence analysis
Curation in-depth – examples
Data access
53. National Center for Biotechnology Information
Finding RefSeq data in NCBI’s Gene resource
• NCBI’s Gene resource is primarily based on RefSeq
• Gene integrates data from many sources:
• RefSeq & GeneRIF
• Official Nomenclature
• Gene Ontology
• Orthologs, Pathways, Phenotypes, Variation, Protein interactions, and
more
• Gene provides a unique ID and includes RefSeq details:
• RefSeq genome annotation
• RefSeq details including transcript variant descriptions
• Report of exon coordinates
54. National Center for Biotechnology Information
RefSeq data in Gene
• Genomic regions, transcripts, proteins
• Find genome annotation details
• NCBI Reference Sequences (RefSeqs)
• Find information for individual accessions
55. National Center for Biotechnology Information
Manual curation provides annotation for Gene
Example: human GHRL (GeneID:51738)
Nomenclature
Summary
Publications
RefSeq transcript
variant
descriptions
56. National Center for Biotechnology Information
Navigating from Gene to Sequence to download
57. National Center for Biotechnology Information
Nucleotide & Protein queries
• Build a query starting with: refseq[filter]
• Add an organism: AND human[organism]
• Add a name, a RefSeq attribute, or a specific feature type
• AND ghrelin-27[protein name]
• Or… ‘AND mat_peptide[feature key]’ Or … ‘AND obestatin[protein name]’
Protein database query example:
refseq[filter] AND human[orgn] AND ghrelin-27[protein name] AND mat_peptide[feature key]
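The query-building steps on this slide can be sketched as a small helper that assembles the search string. The function name `build_refseq_query` is hypothetical. The field tags ([filter], [orgn], [protein name], [feature key]) are the ones shown in the slide's example.

```python
def build_refseq_query(organism, terms=()):
    """Assemble an Entrez query restricted to RefSeq records.

    `terms` is a sequence of (value, field) pairs, for example
    ("mat_peptide", "feature key").
    """
    parts = ["refseq[filter]", f"{organism}[orgn]"]
    parts += [f"{value}[{field}]" for value, field in terms]
    return " AND ".join(parts)


query = build_refseq_query(
    "human",
    [("ghrelin-27", "protein name"), ("mat_peptide", "feature key")],
)
print(query)
```

The resulting string can be pasted into the Protein database search box or passed as the search term in a scripted retrieval.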
59. National Center for Biotechnology Information
Bulk retrievals
• RefSeq FTP site – ftp://ftp.ncbi.nlm.nih.gov/refseq/
• Comprehensive bi-monthly release organized by major groups (e.g.,
vertebrate_mammals, etc.)
• Weekly updates of transcript/protein records for some organisms
• Genomes FTP site – ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/
• Releases of genome assembly and annotation data. Updated when new file formats are added, when an assembly is updated, or when there is a major annotation update.
• Gene FTP site – ftp://ftp.ncbi.nlm.nih.gov/gene/
• Reports Gene to RefSeq accession associations, and more.
• NCBI Programming Utilities (eUtils) – supports scripted retrievals
• Introduction: http://www.ncbi.nlm.nih.gov/books/NBK25497/
• Help: http://www.ncbi.nlm.nih.gov/books/NBK25501/
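A scripted retrieval with eUtils usually comes down to building an ESearch or EFetch URL and reading the response. The sketch below uses only the Python standard library. The helper names are hypothetical, while the endpoint names (esearch.fcgi, efetch.fcgi) and the parameters db, term, id, retmax, rettype and retmode are standard E-utilities parameters. The example accession NP_057446.1 is the GHRL protein record shown earlier in these slides.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term, retmax=20):
    """URL for an ESearch request returning matching record IDs as JSON."""
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?" + urlencode(params)


def efetch_url(db, ids, rettype="fasta"):
    """URL for an EFetch request returning the listed records as text."""
    params = {"db": db, "id": ",".join(ids), "rettype": rettype, "retmode": "text"}
    return f"{EUTILS_BASE}/efetch.fcgi?" + urlencode(params)


def fetch_text(url):
    """Perform the request; needs network access, so it is not called here."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


print(efetch_url("protein", ["NP_057446.1"]))
```

Calling `fetch_text(efetch_url("protein", ["NP_057446.1"]))` would return the FASTA record. For large retrievals, check the eUtils help pages listed above for current usage guidelines, or use the FTP sites instead.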
60. National Center for Biotechnology Information
User feedback and RefSeq updates
• Feedback:
http://www.ncbi.nlm.nih.gov/projects/RefSeq/update.cgi
• RefSeq Updates: subscribe to the refseq-announce mail list
http://www.ncbi.nlm.nih.gov/mailman/listinfo/refseq-announce/
• NCBI News
http://www.ncbi.nlm.nih.gov/news/
RefSeq Home page Gene report pages
61. National Center for Biotechnology Information
Acknowledgements
Databases & programming
• Terence Murphy
• Olga Ermolaeva
• Craig Wallin
• Alex Astashyn
• David Maganadze
• Mike DiCuccio
• Andrei Shkeda
• Donna Maglott
RefSeq Curators (Vertebrates & Other taxa)
Stacy Ciufo
Eric Cox
Diana Haddad
Catherine Farrell
Tamara Goldfarb
Tripti Gupta
Vinita Joardar
Vamsi Kodali
Wenjun Li
Kelly McGarvey
Mike Murphy
Nuala O'Leary
Kathleen O’Neill
Shashi Pujar
Bhanu Rajput
Sanjida Rangwala
Lillian Riddick
Barbara Robberts
Brian Smith-White
Anjana Raina Vatsan
Dave Webb
Matt Wright
NCBI Leadership
• David Lipman
• James Ostell
Genome Workbench & RADAR
• Anatoliy Kuznetsov
• David Falk
• Andrei Shkeda