The document outlines the usage of the NCBI E-utilities API for programmatically accessing data from NCBI databases such as PubMed, Nucleotide, and Gene. It describes the 8 E-utilities programs (ESearch, EPost, ESummary, EFetch, ELink, EInfo, EGQuery, ESpell) and provides examples of building analysis pipelines from combinations of the E-utilities, such as searching PubMed and linking the results to Gene records or nucleotide sequences. Sample applications demonstrated include finding human genes related to PubMed articles on osteosarcoma, downloading nucleotide sequences for the Burkholderia cepacia complex, and identifying PubMed articles discussing cancer copy number changes that used specific GEO microarray platforms.
This document provides a short guide to using Entrez and the E-utilities. It describes Entrez as a global query system that searches over 40 databases at NCBI, including PubMed, Nucleotide, and Protein. It also describes the E-utilities, which provide a stable interface for programmers to query and retrieve data from Entrez. The document outlines the main functions of nine E-utilities including ESearch, ESummary, EFetch, and ELink. It also provides examples of constructing pipelines using multiple E-utilities to retrieve relevant document summaries and sequences from Entrez based on search queries or lists of IDs.
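The E-utilities pipelines described above chain services such as ESearch, ELink, and ESummary by passing IDs between URL-based calls. A minimal sketch in Python of how such request URLs are composed (the base URL and parameter names are the real E-utilities ones; the PubMed and Gene IDs below are placeholders, not values taken from the document):

```python
from urllib.parse import urlencode

BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

def eutils_url(tool, **params):
    """Build a request URL for one E-utilities call (esearch, elink, esummary...)."""
    return BASE + tool + ".fcgi?" + urlencode(params)

# Step 1: ESearch -- find PubMed articles matching a query.
search = eutils_url("esearch", db="pubmed", term="osteosarcoma", retmax=20)

# Step 2: ELink -- map PMIDs from step 1 to linked Gene records
# (the ID here is a placeholder, not a real result).
link = eutils_url("elink", dbfrom="pubmed", db="gene", id="12345678")

# Step 3: ESummary -- fetch document summaries for the linked genes.
summary = eutils_url("esummary", db="gene", id="4609")

# Each URL would then be fetched, e.g. with urllib.request.urlopen(url).read()
```

In a real pipeline the IDs fed to ELink and ESummary come from parsing the XML or JSON returned by the previous step, rather than being hard-coded.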
The document provides information about a bioinformatics course including the syllabus, topics, and schedule. The course focuses on using relational databases to manage and analyze large biological datasets. Topics include relational databases, web application development, genome browsers, text mining, and systems biology. The schedule lists the topics to be covered each class over 12 weeks from March to May.
This document discusses next generation DNA sequencing technologies. It begins by describing some of the limitations of traditional Sanger sequencing, such as read lengths of 500-1000 bases and throughput of 57,000 bases per run. It then introduces some key next generation sequencing technologies, such as 454 sequencing, which uses emulsion PCR and pyrosequencing to achieve read lengths of 20-100 bases but higher throughput of 20-100 Mb per run. Illumina/Solexa sequencing is also discussed, which uses sequencing by synthesis with reversible terminators and laser-based detection. Finally, third generation sequencing technologies are mentioned, such as Pacific Biosciences' single molecule real time sequencing and nanopore sequencing. In summary, the document provides a high-level overview of the evolution of DNA sequencing technologies.
The document discusses the ISA infrastructure, which provides a standardized format (ISA-TAB) for experimental metadata and data exchange. It can be used across various domains like toxicology, systems biology, and nanotechnology. The Risa R package integrates experimental metadata with analysis and allows updating metadata. Nature Scientific Data is a new publication for describing valuable datasets. The ISA framework has been adopted by over 30 public and private resources and is growing in use for facilitating reuse of investigations in various life science domains. Toxicity examples include EU projects on predictive toxicology and a rat study of drug candidates. Questions can be directed to the ISA tools group.
Metagenomic Data Provenance and Management using the ISA infrastructure - overview, implementation patterns & software tools (Alejandra Gonzalez-Beltran)
Slides presented at EBI Metagenomics Bioinformatics course: http://www.ebi.ac.uk/training/course/metagenomics2014
Bio-GraphIIn is a graph-based, integrative and semantically enabled repository for life science experimental data. It addresses the need for a system that supports retrospective data submissions, handles heterogeneous experimental data, and overcomes the fragmentation of existing data formats and databases. Bio-GraphIIn uses the Investigation/Study/Assay (ISA) framework and ontologies to semantically represent experimental metadata and enable rich queries across studies, with the goal of facilitating integrative data analysis.
ONTO-Toolkit is a collection of tools within the Galaxy framework that enables bio-ontology engineering using OBO file format ontologies. It includes wrappers for functions from the ONTO-PERL API to retrieve ontology terms and substructures. Two use cases are demonstrated: 1) identifying common ancestor terms between two molecular functions, and 2) finding the intersection between sub-ontologies for two biological processes to investigate overlap. The toolkit provides rich ontology-driven solutions for biologists within Galaxy.
The document discusses views and materialized views in data warehousing and decision support systems. It covers three main points:
1) OLAP queries typically involve aggregate queries, so precomputation is essential for fast response times. Materialized views allow precomputing aggregates across multiple dimensions.
2) Warehouses can be thought of as collections of asynchronously replicated tables and periodically maintained views, renewing interest in efficient view maintenance.
3) Materialized views store the results of views in the database for fast access like a cache, but they require maintenance as underlying tables change. Incremental maintenance algorithms are ideal to efficiently update materialized views.
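The incremental-maintenance idea in point 3 can be sketched in Python: rather than recomputing an aggregate view from scratch whenever a base table changes, each insert or delete applies a delta to the stored result. This is a toy SUM view under invented names, not any particular warehouse's algorithm:

```python
from collections import defaultdict

class SumView:
    """Materialized SUM(amount) GROUP BY key, maintained incrementally:
    each base-table change applies a delta instead of triggering a full
    recomputation over the base table."""
    def __init__(self):
        self.totals = defaultdict(float)

    def on_insert(self, key, amount):   # a base row was inserted
        self.totals[key] += amount

    def on_delete(self, key, amount):   # a base row was deleted
        self.totals[key] -= amount

view = SumView()
view.on_insert("2024-Q1", 100.0)
view.on_insert("2024-Q1", 50.0)
view.on_delete("2024-Q1", 30.0)
# view.totals["2024-Q1"] is now 120.0
```

Real incremental maintenance must also handle aggregates that are not self-inverting (MIN/MAX after a delete, for instance, may force a partial rescan), which is why it is an active research topic rather than a one-liner.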
The document describes MOLGENIS, an open-source software system that allows users to define data models and generate full-featured web applications and databases from those models. Key features include a graphical user interface, database integration, support for common data formats, and the ability to rapidly develop applications by editing simple domain-specific models. The system has been applied to build several genomic and biomedical databases.
The document discusses various database concepts including normalization, which is used to design optimal relation schemas by removing redundant data. It also covers transaction processing, which involves executing logical database operations as transactions to maintain data integrity. Database systems use techniques like logging and concurrency control to prevent transaction anomalies and ensure failures can be recovered from.
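The transaction behaviour described above (all-or-nothing execution with rollback on failure) can be illustrated with Python's built-in sqlite3 module; the schema and amounts are invented for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
con.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 0)")
con.commit()

# A transfer must be atomic: either both updates apply, or neither does.
try:
    with con:  # the connection context manager commits on success, rolls back on error
        con.execute("UPDATE accounts SET balance = balance - 150 WHERE name = 'alice'")
        cur = con.execute("SELECT balance FROM accounts WHERE name = 'alice'")
        if cur.fetchone()[0] < 0:
            raise ValueError("insufficient funds")  # triggers rollback
        con.execute("UPDATE accounts SET balance = balance + 150 WHERE name = 'bob'")
except ValueError:
    pass  # the failed transfer left both balances unchanged

balances = dict(con.execute("SELECT name, balance FROM accounts"))
```

After the failed transfer, `balances` still shows alice at 100 and bob at 0: the partial update was rolled back, which is exactly the anomaly-prevention the text describes.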
This document provides information about bioinformatics resources including databases of nucleotide and protein sequences. It discusses flat file databases like GenBank that store sequence data in plain text files and relational databases that improve data organization. Examples of popular biological databases are described, such as GenBank, EMBL, and DDBJ for nucleotide sequences and Swiss-Prot and TrEMBL for protein sequences. The document also covers sequence file formats, web tools for querying databases, and trace files used in sequence assembly.
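As an illustration of the plain-text sequence file formats mentioned above, a minimal FASTA reader in Python (the records are made-up examples, not sequences from any of the databases named):

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into {header: sequence} pairs.
    A record is a '>' header line followed by one or more sequence lines."""
    records, header, chunks = {}, None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:          # close the previous record
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if header is not None:                  # close the final record
        records[header] = "".join(chunks)
    return records

demo = """>seq1 example record
ATGCGT
ACGT
>seq2
GGGCCC"""
records = parse_fasta(demo)
# records["seq1 example record"] == "ATGCGTACGT"
```

Libraries such as Biopython provide robust parsers for FASTA and the other flat-file formats (GenBank, EMBL) in production use; this sketch only shows why such formats are easy to query with simple tools.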
The document describes a framework for biological relation extraction using biomedical ontologies and text mining. It discusses introducing biomedical text mining and outlines the problem, motivation, and challenges. It then presents the overall system components and architecture, including searching/browsing, Swanson's algorithm, protein-protein interactions, and gene clustering applications. The framework concept issues, design issues, sequence diagram, and database are also covered at a high level.
This presentation discusses standards for sharing functional genomics data. It summarizes lessons learned from the Minimum Information About a Microarray Experiment (MIAME) standard, including that simply depositing data is not enough - metadata, analysis code, and usable formats are also needed for reproducibility. For high-throughput sequencing data, a Minimum Information about a high-throughput Nucleotide Sequencing Experiment (MINSEQE) standard is proposed with similar requirements as MIAME. The presentation emphasizes keeping standards simple while ensuring machine-readability for reuse.
The document describes the Mouse Gene Expression Database (GXD), which integrates data on gene expression during mouse development from various sources and assay types. GXD provides search tools to query expression data for genes, anatomical structures, developmental stages, mutants, and references. It also offers visual summaries of expression assay results, images, and links to detailed annotations. Recent improvements include enhanced search capabilities, sortable summaries, and direct links to expression images.
Overview of three areas where the ENCODE DCC is facilitating the integration of diverse datasets: (1) defining a metadata standard (2) using ontologies for annotation (3) creating a RESTful interface for data access
This document discusses the Biological Databases project being conducted by a group of students. The project involves using the video game Minecraft to visualize protein structures retrieved from the Protein Data Bank (PDB). Python scripts are used to import PDB data files and place blocks in Minecraft to represent atoms, with different block colors used to distinguish atom types. SPARQL queries are also employed to search the RDF version of the PDB for protein entries. The goal is to build 3D protein models inside Minecraft for educational and visualization purposes.
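The PDB-to-blocks idea above can be sketched directly, since PDB ATOM records are fixed-column text from which coordinates can be sliced; the element-to-block mapping below is a hypothetical stand-in for whatever scheme the students used:

```python
def parse_atoms(pdb_text):
    """Extract (element, x, y, z) from ATOM/HETATM records of a PDB file.
    PDB is a fixed-column format: x, y, z occupy columns 31-54 and the
    element symbol columns 77-78."""
    atoms = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            x = float(line[30:38])
            y = float(line[38:46])
            z = float(line[46:54])
            element = line[76:78].strip()
            atoms.append((element, x, y, z))
    return atoms

# Hypothetical element-to-Minecraft-block colour mapping (illustrative only).
BLOCK_FOR = {"C": "gray_wool", "N": "blue_wool", "O": "red_wool", "S": "yellow_wool"}

line = "ATOM      1  N   MET A   1      38.000  13.500   9.250  1.00  0.00           N"
atoms = parse_atoms(line)
```

Each parsed atom would then be scaled and snapped to integer Minecraft coordinates before a block of the mapped colour is placed at that position.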
The Microsoft Biology Foundation (MBF) is an open-source library of bioinformatics algorithms and services built on .NET. MBF provides modular and reusable code for tasks like genomics, sequencing, and analysis. It leverages existing Microsoft technologies and allows distribution of computations across platforms from local to cloud. The first version was released in June 2010. MBF is developed openly on CodePlex and aims to benefit both commercial and non-commercial users.
This document provides an overview of bioinformatics and biological databases. It discusses how bioinformatics draws from fields like biology, computer science, statistics, and machine learning. Biological databases are important resources for bioinformatics that can be searched and analyzed to answer questions, find similar sequences, locate patterns, and make predictions. The document also outlines common uses of biological databases, such as annotation searches, homology searches, pattern searches, and predictive analyses.
The document discusses the ISA infrastructure, which provides a framework for tracking metadata in bioscience experiments from data collection to sharing in linked data clouds. The infrastructure includes a metadata syntax, open source software tools, and a user community. It allows annotation of experimental metadata, materials, and processes using ontologies to make semantics explicit and enable integration and knowledge discovery. The infrastructure is growing with over 30 public and private resources adopting it to facilitate standards-compliant sharing of investigations across life science domains.
The document discusses knowledge management of experimental data through the ISA ecosystem. It describes the ISA-tab format and software suite that allows annotation and curation of experimental metadata. As a use case, it analyzes a dataset on metabolite profiling from a study of fatty acid amide hydrolase knockout mice. The ISA tools can represent investigations and assays, convert data to standardized formats, and facilitate sharing and analysis of experimental data.
ContentMine Presentation for WHO Health Data Seminar (Jenny Molloy)
This document summarizes content mining technology and policy developments. It discusses what content and mining are, provides a brief history of content mining, and outlines legal considerations around copyright and database rights. It then describes the ContentMine software and pipeline for scraping, normalizing, and extracting facts from scholarly documents at scale. Examples of mining applications in chemistry, clinical trials, phylogenetics, and genome annotation are provided. The document concludes with a discussion of the potential value of content mining for public health researchers.
This document discusses community standards for reproducible and reusable bioscience research. It outlines the importance of consistent reporting to maximize the value of collective scientific outputs. However, there are challenges due to the large number of bioscience reporting standards and lack of knowledge about how they relate. The document calls for a coherent catalogue of data sharing resources to evaluate standards, show relationships among them, and promote interoperability. This would help researchers make informed choices about standards and facilitate structured descriptions of experiments across domains.
This document provides an overview and agenda for a GenomeSpace workshop. It introduces GenomeSpace as an online community for sharing diverse computational genomics tools. The document reviews several popular tools integrated with GenomeSpace, including Cytoscape, Galaxy, Genomica, and GenePattern. It also outlines basic recipes for using GenomeSpace, such as uploading data, launching tools, and transferring data between tools. The workshop aims to demonstrate the GenomeSpace user interface and provide hands-on experience with key tools and integrative analysis workflows.
This document discusses establishing a national repository for microarray gene expression data using MOLGENIS and MAGE-TAB. The objectives are to populate the repository with well-annotated microarray experiments from over 6,500 biobank samples, share the software as a microarray database solution for all biobanks, and combine gene expression data with GWAS studies to create novel eQTL datasets for complex diseases. The repository was created using MOLGENIS and populated with over 12,000 curated experiments from GEO and ArrayExpress for testing purposes. Future work includes populating with local data, integrating analysis tools, and enabling data and tool sharing between local installations while maintaining privacy.
The document discusses the ISA infrastructure, which provides a generic format for experimental description and data exchange. The ISA infrastructure aims to support bio-scientists from experimental design to data publication. It does this through developing community standards, open source software tools, and engaging communities. The infrastructure provides a common framework to describe experiments in a way that allows data to flow between different systems and communities.
This document provides an introduction to bio-ontologies and the semantic web. It discusses what ontologies are and how they are used in the bio domain through initiatives like the OBO Foundry. It introduces key semantic web technologies like RDF, URIs, Turtle syntax, and SPARQL query language. It provides examples of ontologies like the Gene Ontology and how ontologies can be represented and queried using these semantic web standards.
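The RDF data model and SPARQL querying mentioned above can be sketched without any library: RDF statements are (subject, predicate, object) triples, and a SPARQL basic graph pattern amounts to filtering triples on fixed positions. The snippet mimics, rather than implements, SPARQL; the two GO terms are real (GO:0008150 biological_process and its subclass GO:0009987 cellular process), written as prefixed names:

```python
# Triples: (subject, predicate, object), using compact prefixed names.
triples = [
    ("go:0008150", "rdfs:label", "biological_process"),
    ("go:0009987", "rdfs:subClassOf", "go:0008150"),
    ("go:0009987", "rdfs:label", "cellular process"),
]

def match(pattern, store):
    """Return triples matching one pattern; None marks a variable position,
    like the single-pattern SPARQL query
    SELECT ?s WHERE { ?s rdfs:subClassOf go:0008150 }"""
    s, p, o = pattern
    return [t for t in store
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

subclasses = match((None, "rdfs:subClassOf", "go:0008150"), triples)
```

A real SPARQL engine joins multiple such patterns and works over full URIs rather than prefixed strings; libraries like rdflib provide that machinery for Python.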
Education takes philosophical ideals and principles and makes them practical through application and implementation in society. Philosophy provides the purpose and aims, while education renews social structures according to those philosophical ideals. Education is thus considered applied philosophy, as it dynamically enacts the fundamental principles of philosophy and works out philosophical ideals, values, and methods in real-world practice.
Biography of Stephen King and His Works (Jenny Reyes)
Stephen King was born in 1947 in Portland, Maine and raised primarily by his mother. He graduated from the University of Maine in 1970 with a degree in English and married Tabitha Spruce the following year. While struggling to find work as a teacher, King supported himself with short story writing and began work on his first novel, Carrie. The success of Carrie allowed King to leave teaching and become a full-time writer, living in Maine where he still resides today with his family and continuing to produce bestselling novels and stories.
1. The document provides an overview of the Northern Mindanao region of the Philippines, which includes the provinces of Camiguin, Misamis Oriental, Bukidnon, Misamis Occidental, and Lanao del Norte.
2. Some key facts are that Bukidnon is the pineapple capital of the world, Misamis Occidental celebrates Christmas well, and Cagayan de Oro City is the economic center of the region.
3. The region's economy is driven primarily by agriculture, fishing, forestry, and food processing, with crops including rice, banana, sugarcane, coconut, corn, and pineapple.
This document discusses the importance of classroom learning environments. It addresses the physical, psychological, and social aspects of the classroom that can influence student learning. Specifically, it examines the role of the teacher, classroom climate, environmental factors, and tools for measuring and designing effective learning environments. The goal is to understand how to create a comprehensive setting where all students can successfully learn regardless of individual differences.
The National Competency-Based Teacher Standards (NCBTS) provides a single framework that defines effective teaching in the Philippines. It is intended to guide all aspects of a teacher's professional development and practice. The NCBTS framework is organized into 7 domains that represent distinct areas of the teaching and learning process, with each domain defined by a principle related to enhancing student learning. The NCBTS aims to establish consistent standards for quality teaching across the country and provide a common language for teachers to evaluate and improve their practice.
This copy from the DepEd is the same copy available on the Expereincial Learning Book developed by the DepEd in collaboration with the Academe and other experts all over the country.
This worksheet will enable teachers to self assess in order to remain relevant and in line with the goal of transforming education into the 21st cenury. This worksheet was developed by the DepEd. I am making it available in my site with the sole goal of spreading information to the farthest corners of the nation.
This is the National Competency Based Teachers Standard or NCBTS developed by the Academic Community, and other concerned government agencies to transform teaching into 21st century standards and the teachers as a globally competent individuals.
The document discusses the National Competency-Based Teacher Standards (NCBTS) framework for Philippine teachers. It describes the NCBTS as defining effective teaching and providing a single framework to guide teacher development. The NCBTS contains 7 domains that describe the knowledge and skills of effective teachers, including social regard for learning, learning environment, diversity of learners, curriculum, planning/assessing/reporting, community linkages, and personal growth. It emphasizes the importance of helping all students learn and recognizing individual differences. The document provides details on various strands within each domain and their related performance indicators.
N.C.B.T.S.-National Competency-Based Teacher's Standard (2013)Marianne Seras
The document outlines the National Competency-Based Teacher Standards (NCBTS) framework in the Philippines. It describes NCBTS as defining effective teaching and providing a single framework for teacher development from school to national levels. It aims to minimize confusion about teaching standards. The NCBTS framework has 7 domains: (1) Social Regard for Learning, (2) The Learning Environment, (3) Diversity of Learners, (4) Curriculum, (5) Planning/Assessing/Reporting, (6) Community Linkages, and (7) Personal Growth. It also discusses the Code of Ethics for teachers established by the Philippine Teachers Professionalization Act.
Code of Ethics for Professional Teachers of the PhilippinesJohn Bernal
This powerpoint presentation contains salient features of Code of Ethics for Professional Teachers of the Philippines citing Supreme Court Jurisprudence related to education.
BioThings API: Building a FAIR API Ecosystem for Biomedical KnowledgeChunlei Wu
My talk at NCI's CBIIT speaker series:
https://wiki.nci.nih.gov/display/CBIITSpeakers/2019/01/02/Jan+16%2C+Chunlei+Wu%2C+BioThings+API
A companion blog post: https://ncip.nci.nih.gov/blog/the-network-of-biothings/
See more details about BioThings project at http://biothings.io.
The Role of Metadata in Reproducible Computational ResearchJeremy Leipzig
Reproducible computational research (RCR) provides the keystone to the scientific method, packaging the transformation of raw data to published results in a manner than can be communicated to others. Developing RCR standards has been a growing concern of statisticians, data scientists, and informatics professionals. Metadata provides context and provenance to raw data, and is essential to both discovery and validation RCR. This presentation will give an overview for emerging metadata standards in data, analysis, pipelines tools, and publications.
The document describes Biothings.api, a framework for building RESTful web APIs that query biological data stored in Elasticsearch. It generalizes code from existing APIs like MyGene.info and MyVariant.info. The framework includes handlers for common API endpoints, classes for constructing Elasticsearch queries, and a project template. Using this framework, new biological data APIs can be quickly created with minimal additional code. It has been used to rebuild MyVariant.info and create new APIs for taxonomy and chemicals. Future work includes improving documentation and integrating data loading/indexing functionality.
Metadata-based tools at the ENCODE PortalENCODE-DCC
This document describes metadata-driven tools developed by the ENCODE Data Coordination Center to provide access to data from the ENCODE project. It discusses how detailed metadata describing experimental variables is captured and structured to enable faceted browsing and filtering of the large dataset. Elasticsearch allows full-text searches. The metadata schema and relationships between metadata objects are shown. The ENCODE data portal allows browsing and searching across 4000+ experiments and associated files, biosamples, antibodies, and annotations.
GenBank, EMBL, and DDBJ are primary nucleotide sequence databases that collaborate to store publicly available DNA sequences. NCBI's GenBank is one of the largest primary sequence databases, containing over 240,000 organisms' sequences submitted from laboratories. PubMed and Entrez are literature and biomedical databases maintained by NCBI that allow users to search biomedical research articles and integrate related data from multiple sources. SRS is a sequence retrieval system developed by EBI that integrates over 250 molecular biology databases and allows complex queries across data sources.
Bioinformatic Harvester is a software tool that acts as a meta search engine for genes and protein information. It collects and indexes data from 16 major bioinformatics databases and allows users to search across these databases simultaneously. Search results are displayed on a single HTML page and are ranked based on relevance. Users can query the system using terms like gene names, sequences, protein domains, and literature to retrieve integrated information from databases on genes and proteins.
Role of bioinformatics in life sciences researchAnshika Bansal
1. The document discusses bioinformatics and summarizes some of its key applications and tools. It describes how bioinformatics merges biology and computer science to solve biological problems by applying computational tools to molecular data.
2. It provides examples of common bioinformatics tasks like retrieving sequences from databases, comparing sequences, analyzing genes and proteins, and viewing 3D structures.
3. The document lists several popular databases for nucleotide sequences, protein sequences, literature, and other biological data. It also introduces common bioinformatics tools for tasks like sequence alignment, translation, and structure analysis.
The document describes tools and web services from the National Center for Biomedical Ontology (NCBO) including the Ontology Web Services, Ontology Widgets, NCBO Annotator, NCBO Resource Index, and Ontology Recommender. The NCBO Annotator is an open access web service that annotates text with terms from ontologies in BioPortal and includes a variety of customization parameters. The NCBO Resource Index provides an ontology-based search across publicly available biomedical resources.
This document provides an overview of downstream analyses that can be performed after variant identification and filtering in a typical variant calling pipeline. It discusses visualization of variant data in each gene to identify potential causative variants. It also mentions association studies as another type of downstream analysis where variants are tested for association with disease phenotypes. The goal of downstream analyses is to help prioritize variants for further investigation.
Databases store organized information in tables and fields. A database management system interacts with users and applications to capture and analyze data. Biological databases contain life sciences information from experiments, literature, and computational analysis. They classify sequences, structures, and functions. Common biological databases include GenBank, UniProt, and PDB.
This document describes several text-based biological databases and how to search them. It discusses Entrez, which searches multiple databases and links related entries. It also describes the Sequence Retrieval System (SRS) which allows searching over 80 biological databases. Additionally, it outlines DBGET/LinkDB, an integrated system that searches about 20 databases and links results to associated information. The document provides an example of using each system to retrieve information on a specific protein entry.
Biological databases store and organize large amounts of biological data for research use. There are many types of biological databases that classify data by type, such as nucleotide sequences, protein sequences, genomes, protein structures, gene expression, and metabolic pathways. Databases can also be classified by their data source as primary databases containing experimental results or secondary databases that analyze primary database results. Database availability varies, with some publicly open and others proprietary. Common biological databases discussed include GenBank, UniProt, PDB, KEGG, and FlyBase.
Bioinformatics is defined as the application of tools of computation and analysis to the capture and interpretation of biological data. It is an interdisciplinary field, which harnesses computer science, mathematics, physics, and biology
openEHR Developers Workshop at #MedInfo2015Pablo Pazos
The document discusses Pablo Pazos Gutiérrez's experiences developing open source software using the openEHR standard, including EHRServer, a clinical data repository, and EHRGen, an EHR generator framework. EHRServer allows committing and querying clinical data in an openEHR format. EHRGen auto-generates user interfaces and menus from openEHR archetypes and templates. The document also describes an XML rule engine for clinical decision support using openEHR data and an approach to generalized UI generation across technologies.
Presentation to ImmPort Science Meeting, February 27, 2014 on the proper treatment of value sets in the Immport Immunology Database and Analysis Portal
The document discusses different text-based database retrieval systems for accessing biological data, including Entrez, SRS, and DBGET/LinkDB. It describes their key features and how each system allows users to search text databases using queries, with Entrez providing linked related data across multiple databases. An example shows how each system can be used to retrieve and view related information for a SwissProt protein entry.
Yde de Jong & Dave Roberts - ZooBank and EDIT: Towards a business model for Z...ICZN
This document discusses developing a business model for ZooBank, a proposed online registry of zoological nomenclature. It outlines elements to consider for the business model, including the scientific, technical, social, and financial models. It also discusses how ZooBank could operate within the EDIT network to establish a prototype web taxonomy and help coordinate taxonomic data infrastructure. Funding opportunities that could support ZooBank are also mentioned.
The Logical Model Designer - Binding Information Models to TerminologySnow Owl
This presentation demonstrates the functionality provided by the Logical Model Designer (LMD) and Snow Owl tools, which enables terminology to be bound to the Singapore Logical Information Model.
Abstract:
A critical enabler in the journey towards semantic interoperability in Singapore is the Singapore "˜Logical Information Model' (LIM). The LIM is a model of the healthcare information shared within Singapore, and is defined as a set of reusable "˜archetypes' for each clinical concept (e.g. Problem/Diagnosis, Pharmacy Order). These archetypes are then constrained and composed into "˜templates' to support specific use cases.
The Singapore LIM harmonises the semantics of the information structures with the terminology, using multiple types of terminology bindings, including semantic, value domain and constraint bindings. Value domain bindings are defined to both national "˜reference terminology' (used for querying nationally-collated data), as well as to a variety of "˜interface terminologies' used within local clinical systems (required to enforce conformance-compliance rules over message specifications generated from the LIM). To support the diversity of pre-coordination captured in local interface terms, "˜design patterns' are included in the LIM, based on the SNOMED CT concept model. These design patterns represent a logical model of meaning for a specific concept, and allow more than one split between the information model and the terminology model to be represented in a semantically-consistent manner.
This presentation will demonstrate the "˜Logical Model Designer' (LMD) - an Eclipse-based tool that is being used to maintain Singapore's Logical Information Model. A number of features of the LMD tooling will be demonstrated, with a specific focus on how the information structure is bound to the terminology via an interface to the Snow Owl platform. Value Domains are defined as reference sets within Snow Owl and then linked to the information structures defined in the LMD.
Please see our website http://b2i.sg for further information.
This document provides an introduction to biological databases. It discusses primary databases like GenBank which contain original sequence submissions and secondary databases derived from primary data, maintained by third parties like NCBI. Some key databases mentioned include GenBank, PDB, Swiss-Prot. The document also provides an overview of the NCBI and Entrez retrieval system, which allows integrated searches across literature and sequences.
Similar to NCBI API - Integration into analysis code (20)
Immersive Learning That Works: Research Grounding and Paths ForwardLeonel Morgado
We will metaverse into the essence of immersive learning, into its three dimensions and conceptual models. This approach encompasses elements from teaching methodologies to social involvement, through organizational concerns and technologies. Challenging the perception of learning as knowledge transfer, we introduce a 'Uses, Practices & Strategies' model operationalized by the 'Immersive Learning Brain' and ‘Immersion Cube’ frameworks. This approach offers a comprehensive guide through the intricacies of immersive educational experiences and spotlighting research frontiers, along the immersion dimensions of system, narrative, and agency. Our discourse extends to stakeholders beyond the academic sphere, addressing the interests of technologists, instructional designers, and policymakers. We span various contexts, from formal education to organizational transformation to the new horizon of an AI-pervasive society. This keynote aims to unite the iLRN community in a collaborative journey towards a future where immersive learning research and practice coalesce, paving the way for innovative educational research and practice landscapes.
Anti-Universe And Emergent Gravity and the Dark UniverseSérgio Sacani
Recent theoretical progress indicates that spacetime and gravity emerge together from the entanglement structure of an underlying microscopic theory. These ideas are best understood in Anti-de Sitter space, where they rely on the area law for entanglement entropy. The extension to de Sitter space requires taking into account the entropy and temperature associated with the cosmological horizon. Using insights from string theory, black hole physics and quantum information theory we argue that the positive dark energy leads to a thermal volume law contribution to the entropy that overtakes the area law precisely at the cosmological horizon. Due to the competition between area and volume law entanglement the microscopic de Sitter states do not thermalise at sub-Hubble scales: they exhibit memory effects in the form of an entropy displacement caused by matter. The emergent laws of gravity contain an additional ‘dark’ gravitational force describing the ‘elastic’ response due to the entropy displacement. We derive an estimate of the strength of this extra force in terms of the baryonic mass, Newton’s constant and the Hubble acceleration scale a0 = cH0, and provide evidence for the fact that this additional ‘dark gravity force’ explains the observed phenomena in galaxies and clusters currently attributed to dark matter.
Signatures of wave erosion in Titan’s coastsSérgio Sacani
The shorelines of Titan’s hydrocarbon seas trace flooded erosional landforms such as river valleys; however, it isunclear whether coastal erosion has subsequently altered these shorelines. Spacecraft observations and theo-retical models suggest that wind may cause waves to form on Titan’s seas, potentially driving coastal erosion,but the observational evidence of waves is indirect, and the processes affecting shoreline evolution on Titanremain unknown. No widely accepted framework exists for using shoreline morphology to quantitatively dis-cern coastal erosion mechanisms, even on Earth, where the dominant mechanisms are known. We combinelandscape evolution models with measurements of shoreline shape on Earth to characterize how differentcoastal erosion mechanisms affect shoreline morphology. Applying this framework to Titan, we find that theshorelines of Titan’s seas are most consistent with flooded landscapes that subsequently have been eroded bywaves, rather than a uniform erosional process or no coastal erosion, particularly if wave growth saturates atfetch lengths of tens of kilometers.
The cost of acquiring information by natural selectionCarl Bergstrom
This is a short talk that I gave at the Banff International Research Station workshop on Modeling and Theory in Population Biology. The idea is to try to understand how the burden of natural selection relates to the amount of information that selection puts into the genome.
It's based on the first part of this research paper:
The cost of information acquisition by natural selection
Ryan Seamus McGee, Olivia Kosterlitz, Artem Kaznatcheev, Benjamin Kerr, Carl T. Bergstrom
bioRxiv 2022.07.02.498577; doi: https://doi.org/10.1101/2022.07.02.498577
Sexuality - Issues, Attitude and Behaviour - Applied Social Psychology - Psyc...PsychoTech Services
A proprietary approach developed by bringing together the best of learning theories from Psychology, design principles from the world of visualization, and pedagogical methods from over a decade of training experience, that enables you to: Learn better, faster!
TOPIC OF DISCUSSION: CENTRIFUGATION SLIDESHARE.pptxshubhijain836
Centrifugation is a powerful technique used in laboratories to separate components of a heterogeneous mixture based on their density. This process utilizes centrifugal force to rapidly spin samples, causing denser particles to migrate outward more quickly than lighter ones. As a result, distinct layers form within the sample tube, allowing for easy isolation and purification of target substances.
Microbial interaction
Microorganisms interacts with each other and can be physically associated with another organisms in a variety of ways.
One organism can be located on the surface of another organism as an ectobiont or located within another organism as endobiont.
Microbial interaction may be positive such as mutualism, proto-cooperation, commensalism or may be negative such as parasitism, predation or competition
Types of microbial interaction
Positive interaction: mutualism, proto-cooperation, commensalism
Negative interaction: Ammensalism (antagonism), parasitism, predation, competition
I. Mutualism:
It is defined as the relationship in which each organism in interaction gets benefits from association. It is an obligatory relationship in which mutualist and host are metabolically dependent on each other.
Mutualistic relationship is very specific where one member of association cannot be replaced by another species.
Mutualism require close physical contact between interacting organisms.
Relationship of mutualism allows organisms to exist in habitat that could not occupied by either species alone.
Mutualistic relationship between organisms allows them to act as a single organism.
Examples of mutualism:
i. Lichens:
Lichens are excellent example of mutualism.
They are the association of specific fungi and certain genus of algae. In lichen, fungal partner is called mycobiont and algal partner is called
II. Syntrophism:
It is an association in which the growth of one organism either depends on or improved by the substrate provided by another organism.
In syntrophism both organism in association gets benefits.
Compound A
Utilized by population 1
Compound B
Utilized by population 2
Compound C
utilized by both Population 1+2
Products
In this theoretical example of syntrophism, population 1 is able to utilize and metabolize compound A, forming compound B but cannot metabolize beyond compound B without co-operation of population 2. Population 2is unable to utilize compound A but it can metabolize compound B forming compound C. Then both population 1 and 2 are able to carry out metabolic reaction which leads to formation of end product that neither population could produce alone.
Examples of syntrophism:
i. Methanogenic ecosystem in sludge digester
Methane produced by methanogenic bacteria depends upon interspecies hydrogen transfer by other fermentative bacteria.
Anaerobic fermentative bacteria generate CO2 and H2 utilizing carbohydrates which is then utilized by methanogenic bacteria (Methanobacter) to produce methane.
ii. Lactobacillus arobinosus and Enterococcus faecalis:
In the minimal media, Lactobacillus arobinosus and Enterococcus faecalis are able to grow together but not alone.
The synergistic relationship between E. faecalis and L. arobinosus occurs in which E. faecalis require folic acid
Candidate young stellar objects in the S-cluster: Kinematic analysis of a sub...Sérgio Sacani
Context. The observation of several L-band emission sources in the S cluster has led to a rich discussion of their nature. However, a definitive answer to the classification of the dusty objects requires an explanation for the detection of compact Doppler-shifted Brγ emission. The ionized hydrogen in combination with the observation of mid-infrared L-band continuum emission suggests that most of these sources are embedded in a dusty envelope. These embedded sources are part of the S-cluster, and their relationship to the S-stars is still under debate. To date, the question of the origin of these two populations has been vague, although all explanations favor migration processes for the individual cluster members. Aims. This work revisits the S-cluster and its dusty members orbiting the supermassive black hole SgrA* on bound Keplerian orbits from a kinematic perspective. The aim is to explore the Keplerian parameters for patterns that might imply a nonrandom distribution of the sample. Additionally, various analytical aspects are considered to address the nature of the dusty sources. Methods. Based on the photometric analysis, we estimated the individual H−K and K−L colors for the source sample and compared the results to known cluster members. The classification revealed a noticeable contrast between the S-stars and the dusty sources. To fit the flux-density distribution, we utilized the radiative transfer code HYPERION and implemented a young stellar object Class I model. We obtained the position angle from the Keplerian fit results; additionally, we analyzed the distribution of the inclinations and the longitudes of the ascending node. Results. The colors of the dusty sources suggest a stellar nature consistent with the spectral energy distribution in the near and midinfrared domains. Furthermore, the evaporation timescales of dusty and gaseous clumps in the vicinity of SgrA* are much shorter ( 2yr) than the epochs covered by the observations (≈15yr). 
In addition to the strong evidence for the stellar classification of the D-sources, we also find a clear disk-like pattern following the arrangements of S-stars proposed in the literature. Furthermore, we find a global intrinsic inclination for all dusty sources of 60 ± 20◦, implying a common formation process. Conclusions. The pattern of the dusty sources manifested in the distribution of the position angles, inclinations, and longitudes of the ascending node strongly suggests two different scenarios: the main-sequence stars and the dusty stellar S-cluster sources share a common formation history or migrated with a similar formation channel in the vicinity of SgrA*. Alternatively, the gravitational influence of SgrA* in combination with a massive perturber, such as a putative intermediate mass black hole in the IRS 13 cluster, forces the dusty objects and S-stars to follow a particular orbital arrangement. Key words. stars: black holes– stars: formation– Galaxy: center– galaxies: star formation
3. NCBI & Entrez
• The National Center for Biotechnology Information (NCBI) advances science and health by providing access to biomedical and genomic information.
• Entrez is NCBI's primary text search and retrieval system. It integrates the PubMed database of biomedical literature with 39 other literature and molecular databases, including DNA and protein sequence, structure, gene, genome, genetic variation, and gene expression.
4. E-utilities
• Entrez Programming Utilities
– The Entrez Programming Utilities (E-utilities) are a set of eight server-side programs that provide a stable interface into the Entrez query and database system at the NCBI.
– The E-utilities use a fixed URL syntax that translates a standard set of input parameters into the values necessary for various NCBI software components to search for and retrieve the requested data.
• Input: an E-utility URL → Output: XML, FASTA, text, …
5. Usage Guidelines and Requirements
• Use the E-utility URL
– Base URL: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/ …
– Clients: Python urllib/urlopen, Perl LWP::Simple, Linux wget, …
• Frequency, timing, and registration of E-utility URL requests
– Make no more than 3 requests per second → sleep(0.5) between calls
– Run large jobs on weekends or between 5 PM and 9 AM EST
– Include &tool and &email in all requests
• Minimizing the number of requests
– e.g. set &retmax=500 to retrieve up to 500 records per request
• Handling special characters within URLs
– Space → +, " → %22, # → %23
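These guidelines can be sketched in Python with the standard library's urllib, which the deck already names as a client. The tool name and email below are placeholders you would replace with your own registration details:

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen

BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"
# Placeholder identification -- NCBI asks that real requests carry your
# actual tool name and contact email.
COMMON = {"tool": "example_tool", "email": "user@example.org"}

def eutil_url(program, **params):
    # urlencode applies the escaping listed above: space -> +, '"' -> %22, '#' -> %23
    return f"{BASE}{program}.fcgi?{urlencode({**params, **COMMON})}"

_last = [0.0]

def polite_fetch(url):
    # Stay under the 3-requests-per-second limit by pausing between calls.
    delay = 0.5 - (time.monotonic() - _last[0])
    if delay > 0:
        time.sleep(delay)
    _last[0] = time.monotonic()
    return urlopen(url).read()

url = eutil_url("esearch", db="pubmed", term='"osteosarcoma"[majr:noexp]')
print(url)
```

The query string comes out fully escaped (the term above becomes %22osteosarcoma%22%5Bmajr%3Anoexp%5D); the service accepts the percent-encoded brackets as well as literal ones.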
7. ESearch (text searches)
• Responds to a text query with the list of matching UIDs in a given database (for later use in ESummary, EFetch or ELink), along with the term translations of the query.
• Syntax: esearch.fcgi?db=<database>&term=<query>
– Input: Entrez database (&db); any Entrez text query (&term)
– Output: list of UIDs matching the Entrez query
• Example: get the PubMed IDs (PMIDs) for articles about osteosarcoma
– http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=%22osteosarcoma%22[majr:noexp]
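The UID list in an ESearch response can be pulled out with the standard library's XML parser. The response below is an abbreviated, illustrative sample rather than live output (real responses also carry fields such as the term translations):

```python
import xml.etree.ElementTree as ET

# Abbreviated ESearch-style response (illustrative, not live output).
sample = """<eSearchResult>
  <Count>3</Count>
  <RetMax>3</RetMax>
  <IdList>
    <Id>24450072</Id>
    <Id>24333720</Id>
    <Id>24333432</Id>
  </IdList>
</eSearchResult>"""

root = ET.fromstring(sample)
count = int(root.findtext("Count"))                     # total matches
pmids = [node.text for node in root.findall("./IdList/Id")]
print(count, pmids)
```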
9. ESummary (document summary downloads)
• Responds to a list of UIDs from a given database with the corresponding document summaries.
• Syntax: esummary.fcgi?db=<database>&id=<uid_list>
– Input: list of UIDs (&id); Entrez database (&db)
– Output: XML DocSums
• Example: download DocSums for PubMed IDs 24450072, 24333720, 24333432
– http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=24450072,24333720,24333432
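DocSums are XML records whose fields are Item elements keyed by a Name attribute; a minimal parsing sketch over an abbreviated, made-up DocSum:

```python
import xml.etree.ElementTree as ET

# Abbreviated DocSum-style response (illustrative; real DocSums carry many Items).
sample = """<eSummaryResult>
  <DocSum>
    <Id>24333432</Id>
    <Item Name="Title" Type="String">Example article title</Item>
    <Item Name="PubDate" Type="Date">2014 Jan</Item>
  </DocSum>
</eSummaryResult>"""

root = ET.fromstring(sample)
for docsum in root.findall("DocSum"):
    uid = docsum.findtext("Id")
    # Items are selected by their Name attribute, not by tag.
    title = docsum.find("Item[@Name='Title']")
    print(uid, title.text)
```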
11. EFetch (data record downloads)
• Responds to a list of UIDs in a given database with the corresponding data records in a specified format.
• Syntax: efetch.fcgi?db=<database>&id=<uid_list>&rettype=<retrieval_type>&retmode=<retrieval_mode>
– Input: list of UIDs (&id); Entrez database (&db); retrieval type (&rettype); retrieval mode (&retmode)
– Output: formatted data records as specified
• Example: download the abstract of PubMed ID 24333432
– http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=24333432&rettype=abstract&retmode=text
12. ELink (Entrez links)
• Responds to a list of UIDs in a given database with either a list of related UIDs (and relevancy scores) in the same database or a list of linked UIDs in another Entrez database.
• Checks for the existence of a specified link from a list of one or more UIDs.
• Creates a hyperlink to the primary LinkOut provider for a specific UID and database, or lists LinkOut URLs and attributes for multiple UIDs.
13. ELink (Entrez links)
• Syntax: elink.fcgi?dbfrom=<source_db>&db=<destination_db>&id=<uid_list>
– Input: list of UIDs (&id); source Entrez database (&dbfrom); destination Entrez database (&db)
– Output: XML containing linked UIDs from source and destination databases
• Example: find Gene IDs linked to PubMed IDs 24333432 and 24314238, either as one combined set (comma-separated &id) or as one set per article (a separate &id for each UID)
– http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&db=gene&id=24333432,24314238
– http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&db=gene&id=24333432&id=24314238
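The only difference between the two example URLs is how &id is supplied: a single comma-separated value yields one combined linked set, while repeating &id yields one linked set per input article. With urllib the repeated form is produced by doseq=True (URL construction only, no request is made):

```python
from urllib.parse import urlencode

base = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"
pmids = ["24333432", "24314238"]

# One combined linked set: a single comma-separated &id value.
combined = base + "?" + urlencode({"dbfrom": "pubmed", "db": "gene",
                                   "id": ",".join(pmids)})

# One linked set per input article: repeat &id for each PMID (doseq=True).
separate = base + "?" + urlencode({"dbfrom": "pubmed", "db": "gene",
                                   "id": pmids}, doseq=True)

print(combined)
print(separate)
```

Note that urlencode percent-encodes the comma to %2C, which the service accepts.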
15. EGQuery (global query)
• Responds to a text query with the number of records matching the query in each Entrez database.
• Syntax: egquery.fcgi?term=<query>
– Input: Entrez text query (&term)
– Output: XML containing the number of hits in each database
• Example: determine the number of records for mouse in each Entrez database
– http://eutils.ncbi.nlm.nih.gov/entrez/eutils/egquery.fcgi?term=mouse[orgn]&retmode=xml
17. ESpell (spelling suggestions)
• Retrieves spelling suggestions for a text query in a given database.
• Syntax: espell.fcgi?term=<query>&db=<database>
– Input: Entrez text query (&term); Entrez database (&db)
– Output: XML containing the original query and spelling suggestions
• Example: find spelling suggestions for the query "osteosacoma" in PubMed Central (db=pmc)
– http://eutils.ncbi.nlm.nih.gov/entrez/eutils/espell.fcgi?term=osteosacoma&db=pmc
18. EInfo (database statistics)
• Provides the number of records indexed in each field of a given database, the date of the last update of the database, and the available links from the database to other Entrez databases.
• Syntax: einfo.fcgi?db=<database>
– Input: Entrez database (&db)
– Output: XML containing database statistics
• Example: find database statistics for Entrez Protein
– http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=protein
19. EPost (UID uploads)
• Accepts a list of UIDs from a given database, stores the set on
the History Server, and responds with a query key and web
environment for the uploaded dataset.
• Syntax: epost.fcgi?db=<database>&id=<uid_list>
– Input: List of UIDs (&id); Entrez database (&db)
– Output: Web environment (&WebEnv) and query key (&query_key)
parameters specifying the location on the Entrez history server of the
list of uploaded UIDs
• Example: Upload five Gene IDs (7173, 22018, 54314, 403521,
525013) for later processing.
– http://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi?db=gene&id=7173,22018,54314,403521,525013
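A sketch of taking the QueryKey and WebEnv from the EPost response and reusing them in a follow-up ESummary call; the WebEnv value here is a placeholder, and the real one comes back from the live request:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

# Abridged EPost response (WebEnv value is a placeholder).
sample = """<ePostResult>
  <QueryKey>1</QueryKey>
  <WebEnv>NCID_1_EXAMPLE</WebEnv>
</ePostResult>"""

root = ET.fromstring(sample)
query_key = root.findtext("QueryKey")
webenv = root.findtext("WebEnv")

# Later E-utility calls reference the stored UID list on the History
# Server instead of repeating &id=...
summary_url = BASE + "esummary.fcgi?" + urlencode(
    {"db": "gene", "query_key": query_key, "WebEnv": webenv})
print(summary_url)
```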
20. Application 1
• Find human genes related to articles retrieved with the
unexpanded MeSH major topic "Osteosarcoma" (PubMed → Gene)
1. http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=%22osteosarcoma%22[majr:noexp]&usehistory=y
2. http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&db=gene&query_key=1&WebEnv=NCID_1_220057266_130.14.18.34_9001_1396281951_1196950266&term=%22homo+sapiens%22[organism]&cmd=neighbor_history
3. http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=gene&query_key=3&WebEnv=NCID_1_220057266_130.14.18.34_9001_1396281951_1196950266
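The three steps above can be sketched as URL builders; the query_key values (1 and 3) follow the slide's example, the function names are ours, and the real WebEnv is parsed from step 1's XML rather than hard-coded:

```python
from urllib.parse import urlencode, quote

BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

def step1_search():
    """ESearch: PubMed for the unexpanded MeSH major topic, results to history."""
    return BASE + "esearch.fcgi?" + urlencode(
        {"db": "pubmed",
         "term": '"osteosarcoma"[majr:noexp]',
         "usehistory": "y"}, quote_via=quote)

def step2_link(webenv):
    """ELink: PubMed history set -> human Gene records, back onto history."""
    return BASE + "elink.fcgi?" + urlencode(
        {"dbfrom": "pubmed", "db": "gene",
         "query_key": "1", "WebEnv": webenv,
         "term": '"homo sapiens"[organism]',
         "cmd": "neighbor_history"}, quote_via=quote)

def step3_summary(webenv):
    """ESummary: document summaries for the linked Gene set."""
    return BASE + "esummary.fcgi?" + urlencode(
        {"db": "gene", "query_key": "3", "WebEnv": webenv})

# The real WebEnv comes back in step 1's XML; placeholder shown here.
print(step1_search())
print(step2_link("NCID_EXAMPLE"))
print(step3_summary("NCID_EXAMPLE"))
```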
21. Application 1
• Find human genes related to articles retrieved with the
unexpanded MeSH major topic "Osteosarcoma" (PubMed → Gene)
– ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2pubmed.gz
• Can be used in place of ELink.
– ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz
• Can be used in place of ESummary.
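A sketch of using the bulk file in place of ELink, assuming gene2pubmed's tab-separated layout (tax_id, GeneID, PubMed_ID); the demonstration writes a tiny fabricated excerpt rather than downloading the full FTP file:

```python
import gzip
import tempfile
from collections import defaultdict

def load_gene2pubmed(path, tax_id="9606"):
    """Map PubMed ID -> set of Gene IDs from a gene2pubmed.gz file
    (assumed tab-separated columns: tax_id, GeneID, PubMed_ID),
    restricted to one organism (9606 = Homo sapiens)."""
    pmid2genes = defaultdict(set)
    with gzip.open(path, "rt") as fh:
        for line in fh:
            if line.startswith("#"):          # skip the header line
                continue
            tax, gene_id, pmid = line.rstrip("\n").split("\t")[:3]
            if tax == tax_id:
                pmid2genes[pmid].add(gene_id)
    return pmid2genes

# Tiny fabricated excerpt standing in for the real file.
with tempfile.NamedTemporaryFile(suffix=".gz", delete=False) as tmp:
    path = tmp.name
with gzip.open(path, "wt") as fh:
    fh.write("#tax_id\tGeneID\tPubMed_ID\n"
             "9606\t7157\t10570149\n"      # human gene
             "10090\t22059\t10570149\n")   # mouse gene, filtered out
mapping = load_gene2pubmed(path)
print(dict(mapping))  # → {'10570149': {'7157'}}
```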
22. Application 2
• Find nucleotide sequences of "Burkholderia cepacia complex"
and download in GenBank format
1. http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nuccore&term=%22burkholderia+cepacia+complex%22[organism]&usehistory=y
2. http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&query_key=1&WebEnv=NCID_1_264773253_130.14.22.215_9001_1396244608_457974498&rettype=gb&retmode=text
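For large result sets, the EFetch step is usually paged with the standard &retstart/&retmax parameters rather than pulled in one request. A sketch of generating the batch URLs (the helper name and batch size are ours):

```python
from urllib.parse import urlencode

BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

def efetch_batch_urls(webenv, query_key, count, batch=500):
    """EFetch URLs that page through a history-stored result set in
    GenBank format, &retmax records per request starting at &retstart,
    which is gentler than one huge download."""
    urls = []
    for start in range(0, count, batch):
        urls.append(BASE + "efetch.fcgi?" + urlencode(
            {"db": "nuccore", "query_key": query_key, "WebEnv": webenv,
             "rettype": "gb", "retmode": "text",
             "retstart": start, "retmax": batch}))
    return urls

# The record count and real WebEnv come from the ESearch response.
for u in efetch_batch_urls("NCID_EXAMPLE", "1", 1200):
    print(u)
```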
23. Application 3
• Find "cancer copy number" articles with "Affymetrix Genome-Wide Human SNP Array"
platform GEO Datasets
– Article branch: esearch.fcgi?db=pubmed with the query cancer "copy number",
then esummary.fcgi?db=pubmed (via WebEnv, query_key)
– Platform branch: esearch.fcgi?db=gds with the query Affymetrix "Genome-Wide"
Human "SNP Array" AND gpl[Filter], then esummary.fcgi?db=gds (via WebEnv,
query_key) → platform accessions GPL9704, GPL8226, GPL6804, GPL6801
– Join: elink.fcgi?dbfrom=pubmed&db=gds (or esearch.fcgi?db=gds per platform),
then parse both branches and tabulate the common records with their
PubMed titles
24. "cancer copy number" articles
"Affymetrix Genome-Wide Human SNP Array"
platform GEO Datasets
25. Application 3
• Find "cancer copy number" articles with "Affymetrix Genome-Wide Human SNP Array"
platform GEO Datasets
cancer "copy number"
esearch.fcgi?db=pubmed
Affymetrix "Genome-Wide" Human "SNP Array" AND gpl[Filter]
esearch.fcgi?db=gds
esummary.fcgi?db=pubmed
WebEnv, query_key
esummary.fcgi?db=gds
WebEnv, query_key
GPL9704
GPL8226
GPL6804
GPL6801
elink.fcgi?dbfrom=pubmed&db=gds esearch.fcgi?db=gds
Parsing
Result table
Common
PubMed title
26. Application 3
• Find "cancer copy number" articles with "Affymetrix Genome-Wide Human SNP Array"
platform GEO Datasets
27. Application 3
• Find "cancer copy number" articles with "Affymetrix Genome-Wide Human SNP Array"
platform GEO Datasets
cancer "copy number"
esearch.fcgi?db=pubmed
Affymetrix "Genome-Wide" Human "SNP Array" AND gpl[Filter]
esearch.fcgi?db=gds
esummary.fcgi?db=pubmed
WebEnv, query_key
esummary.fcgi?db=gds
WebEnv, query_key
GPL9704
GPL8226
GPL6804
GPL6801
elink.fcgi?dbfrom=pubmed&db=gds esearch.fcgi?db=gds
Parsing
Result table
Common
PubMed title
28. Application 3
• Find "cancer copy number" articles with "Affymetrix Genome-Wide Human SNP Array"
platform GEO Datasets
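The join step reduces to intersecting the GDS IDs reached from the two branches. A sketch with abridged ESearch/ELink-style ID lists (all ID values are illustrative):

```python
import xml.etree.ElementTree as ET

def ids_from_result(xml_text):
    """Pull every <Id> out of an ESearch or ELink response."""
    return {e.text for e in ET.fromstring(xml_text).iter("Id")}

# Abridged responses: GDS IDs linked from the PubMed articles, and GDS
# IDs matching the platform search (values illustrative).
linked = """<eSearchResult><IdList>
  <Id>200012340</Id><Id>200012341</Id><Id>200012342</Id>
</IdList></eSearchResult>"""
platform = """<eSearchResult><IdList>
  <Id>200012341</Id><Id>200012342</Id><Id>200012399</Id>
</IdList></eSearchResult>"""

# Datasets present in both branches feed the final result table.
common = ids_from_result(linked) & ids_from_result(platform)
print(sorted(common))  # → ['200012341', '200012342']
```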
30. EBot
• EBot is an interactive web tool that first allows
users to construct an arbitrary E-utility
analysis pipeline and then generates a Perl
script to execute it. The script can be
downloaded and run on any computer with a
Perl installation. For more details, see the
EBot page at the link below.
– http://www.ncbi.nlm.nih.gov/Class/PowerTools/eutils/ebot/ebot.cgi
31. Entrez Direct
• E-utilities on the UNIX Command Line
• Download from ftp://ftp.ncbi.nih.gov/entrez/entrezdirect/
• Entrez Direct Functions
– esearch performs a new Entrez search using terms in indexed fields.
– elink looks up neighbors (within a database) or links (between databases).
– efilter filters or restricts the results of a previous query.
– efetch downloads records or reports in a designated format.
– xtract converts XML into a table of data values.
– einfo obtains information on indexed fields in an Entrez database.
– epost uploads unique identifiers (UIDs) or sequence accession numbers.
– nquire sends a URL request to a web page or CGI service.
• Entering Query Commands
– esearch -db pubmed -query "opsin gene conversion" | elink -related
32. Links
• References
– Entrez Programming Utilities Help
• http://www.ncbi.nlm.nih.gov/books/NBK25501/
– Entrez Help
• http://www.ncbi.nlm.nih.gov/books/NBK3836/
• Useful Links
– Entrez Unique Identifiers (UIDs) for selected databases
• http://www.ncbi.nlm.nih.gov/books/NBK25497/table/chapter2.chapter2_table1/?report=objectonly
– Valid values of &retmode and &rettype for EFetch (null = empty string)
• http://www.ncbi.nlm.nih.gov/books/NBK25499/table/chapter4.chapter4_table1/?report=objectonly
– The full list of Entrez links
• http://eutils.ncbi.nlm.nih.gov/entrez/query/static/entrezlinks.html