NCBI API – Integration into
analysis code
QBRC Tech Talk
Jiwoong Kim
Outline
• Introduction
• Usage Guidelines of the E-utilities
• Sample Applications of the E-utilities
NCBI & Entrez
• The National Center for
Biotechnology Information
advances science and health by
providing access to biomedical
and genomic information.
• Entrez is NCBI’s primary text
search and retrieval system
that integrates the PubMed
database of biomedical
literature with 39 other
literature and molecular
databases including DNA and
protein sequence, structure,
gene, genome, genetic
variation and gene expression.
E-utilities
• Entrez Programming Utilities
– The Entrez Programming Utilities (E-utilities) are a set of
eight server-side programs that provide a stable interface
into the Entrez query and database system at the NCBI.
– The E-utilities use a fixed URL syntax that translates a
standard set of input parameters into the values necessary
for various NCBI software components to search for and
retrieve the requested data.
– Input: URL → E-utilities → Output: XML, FASTA, Text, …
Usage Guidelines and Requirements
• Use the E-utility URL
– baseURL: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/ …
– Python urllib/urlopen, Perl LWP::Simple, Linux wget, …
• Frequency, Timing and Registration of E-utility URL Requests
– Make no more than 3 requests per second → sleep(0.5)
– Run large jobs on weekends or between 9 PM and 5 AM Eastern time on weekdays
– Include &tool and &email in all requests
• Minimizing the Number of Requests
– Retrieve records in batches rather than one at a time, e.g. &retmax=500
• Handling Special Characters Within URLs
– Space → +, " → %22, # → %23
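The guidelines above can be sketched in Python as follows. This is a minimal illustration, not NCBI's reference client: the names build_url and fetch, the placeholder tool and email values, and the use of the https scheme are assumptions of this sketch. The 0.5 s pause follows the sleep(0.5) suggestion on the slide.

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: https variant of the base URL shown on the slide.
BASE_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"
MIN_INTERVAL = 0.5  # seconds between requests, i.e. at most 2 requests per second

_last_request = float("-inf")  # time of the previous request


def build_url(utility, **params):
    """Return a full E-utility URL with &tool and &email included."""
    params.setdefault("tool", "my_tool")            # placeholder value
    params.setdefault("email", "user@example.org")  # placeholder value
    # urlencode escapes special characters: space -> +, " -> %22, # -> %23
    return BASE_URL + utility + ".fcgi?" + urlencode(params)


def fetch(url):
    """Download one URL, sleeping first if the previous request was too recent."""
    global _last_request
    wait = MIN_INTERVAL - (time.monotonic() - _last_request)
    if wait > 0:
        time.sleep(wait)
    _last_request = time.monotonic()
    with urlopen(url) as response:
        return response.read().decode("utf-8")
```

For example, build_url("esearch", db="pubmed", term='"osteosarcoma"', retmax=500) produces a URL that can be passed to fetch.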
ESearch (text searches)
• Responds to a text query with the list of matching UIDs in a
given database (for later use in ESummary, EFetch or ELink),
along with the term translations of the query.
• Syntax: esearch.fcgi?db=<database>&term=<query>
– Input: Entrez database (&db); Any Entrez text query (&term)
– Output: List of UIDs matching the Entrez query
• Example: Get the PubMed IDs (PMIDs) for articles about
osteosarcoma
– http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=%22osteosarcoma%22[majr:noexp]
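A short sketch of reading the UID list out of an ESearch reply with Python's standard XML parser. The sample XML is hand-written in the shape of an ESearch result (Count and IdList/Id elements) and is not real search output; parse_esearch is a name chosen for this illustration.

```python
import xml.etree.ElementTree as ET

# Hand-written sample shaped like an ESearch reply; not actual search results.
SAMPLE_ESEARCH_XML = """<eSearchResult>
  <Count>3</Count>
  <RetMax>3</RetMax>
  <RetStart>0</RetStart>
  <IdList>
    <Id>24450072</Id>
    <Id>24333720</Id>
    <Id>24333432</Id>
  </IdList>
</eSearchResult>"""


def parse_esearch(xml_text):
    """Return (total hit count, list of UIDs in this reply)."""
    root = ET.fromstring(xml_text)
    count = int(root.findtext("Count"))
    uids = [element.text for element in root.findall("./IdList/Id")]
    return count, uids


count, uids = parse_esearch(SAMPLE_ESEARCH_XML)
print(count, ",".join(uids))
```

The comma-joined UID string is the form expected by the &id parameter of ESummary, EFetch and ELink.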
ESummary (document summary downloads)
• Responds to a list of UIDs from a given database with the
corresponding document summaries.
• Syntax: esummary.fcgi?db=<database>&id=<uid_list>
– Input: List of UIDs (&id); Entrez database (&db)
– Output: XML DocSums
• Example: Download DocSums for these PubMed IDs:
24450072, 24333720, 24333432
– http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=24450072,24333720,24333432
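As a hedged illustration of handling the XML DocSums, the sketch below pulls the Title item out of each DocSum. The sample XML is hand-written with placeholder titles in the DocSum/Item layout and is not real ESummary output; docsum_titles is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hand-written sample shaped like an ESummary reply; titles are placeholders.
SAMPLE_ESUMMARY_XML = """<eSummaryResult>
  <DocSum>
    <Id>24450072</Id>
    <Item Name="Title" Type="String">Placeholder title A</Item>
  </DocSum>
  <DocSum>
    <Id>24333720</Id>
    <Item Name="Title" Type="String">Placeholder title B</Item>
  </DocSum>
</eSummaryResult>"""


def docsum_titles(xml_text):
    """Map each UID to the text of its Item element named 'Title'."""
    titles = {}
    for docsum in ET.fromstring(xml_text).iter("DocSum"):
        uid = docsum.findtext("Id")
        titles[uid] = docsum.findtext("Item[@Name='Title']")
    return titles


for uid, title in docsum_titles(SAMPLE_ESUMMARY_XML).items():
    print(uid, title, sep="\t")
```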
EFetch (data record downloads)
• Responds to a list of UIDs in a given database with the
corresponding data records in a specified format.
• Syntax: efetch.fcgi?db=<database>&id=<uid_list>&rettype=<retrieval_type>&retmode=<retrieval_mode>
– Input: List of UIDs (&id); Entrez database (&db); Retrieval type
(&rettype); Retrieval mode (&retmode)
– Output: Formatted data records as specified
• Example: Download the abstract of PubMed ID 24333432
– http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=24333432&rettype=abstract&retmode=text
ELink (Entrez links)
• Responds to a list of UIDs in a given database with either a list
of related UIDs (and relevancy scores) in the same database
or a list of linked UIDs in another Entrez database
• Checks for the existence of a specified link from a list of one
or more UIDs
• Creates a hyperlink to the primary LinkOut provider for a
specific UID and database, or lists LinkOut URLs and attributes
for multiple UIDs.
ELink (Entrez links)
• Syntax: elink.fcgi?dbfrom=<source_db>&db=<destination_db>&id=<uid_list>
– Input: List of UIDs (&id); Source Entrez database (&dbfrom);
Destination Entrez database (&db)
– Output: XML containing linked UIDs from source and destination
databases
• Example: Find Gene IDs linked to PubMed IDs 24333432 and 24314238
– One combined set (comma-separated &id): http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&db=gene&id=24333432,24314238
– Separate sets, one per PubMed ID (repeated &id): http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&db=gene&id=24333432&id=24314238
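The difference between a comma-separated &id and a repeated &id can be sketched in Python. The helper names and the https scheme are assumptions of this sketch, and the sample XML is hand-written with made-up Gene IDs in the LinkSet layout rather than real ELink output.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumption: https variant of the ELink URL shown on the slide.
ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?"


def elink_url(dbfrom, db, ids, separate=False):
    """Comma-joined &id gives one linked set; repeated &id gives one set per UID."""
    id_value = list(ids) if separate else ",".join(ids)
    query = {"dbfrom": dbfrom, "db": db, "id": id_value}
    # doseq=True expands a list into repeated id= parameters; safe="," keeps commas
    return ELINK + urlencode(query, doseq=True, safe=",")


# Hand-written sample shaped like an ELink reply; the Gene IDs are made up.
SAMPLE_ELINK_XML = """<eLinkResult>
  <LinkSet>
    <DbFrom>pubmed</DbFrom>
    <IdList><Id>24333432</Id></IdList>
    <LinkSetDb>
      <DbTo>gene</DbTo>
      <LinkName>pubmed_gene</LinkName>
      <Link><Id>111</Id></Link>
      <Link><Id>222</Id></Link>
    </LinkSetDb>
  </LinkSet>
  <LinkSet>
    <DbFrom>pubmed</DbFrom>
    <IdList><Id>24314238</Id></IdList>
    <LinkSetDb>
      <DbTo>gene</DbTo>
      <LinkName>pubmed_gene</LinkName>
      <Link><Id>333</Id></Link>
    </LinkSetDb>
  </LinkSet>
</eLinkResult>"""


def parse_linksets(xml_text):
    """Return one (source UIDs, linked UIDs) pair per LinkSet element."""
    pairs = []
    for linkset in ET.fromstring(xml_text).iter("LinkSet"):
        source = [e.text for e in linkset.findall("./IdList/Id")]
        linked = [e.text for e in linkset.findall("./LinkSetDb/Link/Id")]
        pairs.append((source, linked))
    return pairs
```

With separate sets, each LinkSet keeps the association between one PubMed ID and its linked Gene IDs, which is what a per-article result table needs.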
EGQuery (global query)
• Responds to a text query with the number of records
matching the query in each Entrez database.
• Syntax: egquery.fcgi?term=<query>
– Input: Entrez text query (&term)
– Output: XML containing the number of hits in each database.
• Example: Determine the number of records for mouse in
Entrez.
– http://eutils.ncbi.nlm.nih.gov/entrez/eutils/egquery.fcgi?term=mouse[orgn]&retmode=xml
ESpell (spelling suggestions)
• Retrieves spelling suggestions for a text query in a given
database.
• Syntax: espell.fcgi?term=<query>&db=<database>
– Input: Entrez text query (&term); Entrez database (&db)
– Output: XML containing the original query and spelling suggestions.
• Example: Find spelling suggestions for the PubMed Central query "osteosacoma".
– http://eutils.ncbi.nlm.nih.gov/entrez/eutils/espell.fcgi?term=osteosacoma&db=pmc
EInfo (database statistics)
• Provides the number of records indexed in each field of a
given database, the date of the last update of the database,
and the available links from the database to other Entrez
databases.
• Syntax: einfo.fcgi?db=<database>
– Input: Entrez database (&db)
– Output: XML containing database statistics
• Example: Find database statistics for Entrez Protein.
– http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=protein
EPost (UID uploads)
• Accepts a list of UIDs from a given database, stores the set on
the History Server, and responds with a query key and web
environment for the uploaded dataset.
• Syntax: epost.fcgi?db=<database>&id=<uid_list>
– Input: List of UIDs (&id); Entrez database (&db)
– Output: Web environment (&WebEnv) and query key (&query_key)
parameters specifying the location on the Entrez history server of the
list of uploaded UIDs
• Example: Upload five Gene IDs (7173, 22018, 54314, 403521,
525013) for later processing.
– http://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi?db=gene&id=7173,22018,54314,403521,525013
Application 1
• Find human genes related to the articles retrieved with the non-exploded MeSH major topic "Osteosarcoma" (PubMed → Gene)
1. http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=%22osteosarcoma%22[majr:noexp]&usehistory=y
2. http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&db=gene&query_key=1&WebEnv=NCID_1_220057266_130.14.18.34_9001_1396281951_1196950266&term=%22homo+sapiens%22[organism]&cmd=neighbor_history
3. http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=gene&query_key=3&WebEnv=NCID_1_220057266_130.14.18.34_9001_1396281951_1196950266
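The hand-off between steps 1 and 2 can be sketched as below: read the query key and web environment from the first reply, then pass them to the next utility. The helper names, the https scheme, and the placeholder WebEnv value are assumptions of this sketch; the sample XML is hand-written, not real output.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumption: https variant of the base URL shown on the slides.
BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"

# Hand-written sample of an ESearch reply made with &usehistory=y.
SAMPLE_HISTORY_XML = """<eSearchResult>
  <Count>2</Count>
  <QueryKey>1</QueryKey>
  <WebEnv>NCID_EXAMPLE</WebEnv>
  <IdList><Id>24450072</Id><Id>24333720</Id></IdList>
</eSearchResult>"""


def history_from_xml(xml_text):
    """Return (query_key, WebEnv) from a reply that used the history server."""
    root = ET.fromstring(xml_text)
    return root.findtext("QueryKey"), root.findtext("WebEnv")


def next_step_url(utility, query_key, webenv, **params):
    """Build the URL for the next utility, reusing the stored result set."""
    params.update(query_key=query_key, WebEnv=webenv)
    return BASE + utility + ".fcgi?" + urlencode(params)


query_key, webenv = history_from_xml(SAMPLE_HISTORY_XML)
print(next_step_url("elink", query_key, webenv,
                    dbfrom="pubmed", db="gene", cmd="neighbor_history"))
```

Reading the values from each reply, rather than hard-coding them, matters because the WebEnv string and the query key numbers change from session to session.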
Application 1
• Find human genes related to the articles retrieved with the non-exploded MeSH major topic "Osteosarcoma" (PubMed → Gene)
– ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2pubmed.gz
• It can be used instead of "ELink".
– ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz
• It can be used instead of "ESummary".
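A minimal sketch of the bulk-file route, assuming gene2pubmed.gz is a tab-separated file whose first three columns are tax_id, GeneID and PubMed_ID, with a header line starting with "#". The function names are hypothetical, and 9606 is the NCBI Taxonomy ID for human.

```python
import gzip


def genes_for_pmids(lines, pmids, tax_id="9606"):
    """Collect GeneIDs from rows matching the taxon and any wanted PubMed ID."""
    genes = set()
    for line in lines:
        if line.startswith("#"):  # skip the header line
            continue
        row_tax, gene_id, pmid = line.rstrip("\n").split("\t")[:3]
        if row_tax == tax_id and pmid in pmids:
            genes.add(gene_id)
    return genes


def genes_from_file(path, pmids, tax_id="9606"):
    """Same filter, reading a local copy of gene2pubmed.gz."""
    with gzip.open(path, "rt") as handle:
        return genes_for_pmids(handle, pmids, tax_id)
```

Filtering a local file avoids one ELink request per article, which helps when the PubMed result set is large.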
Application 2
• Find nucleotide sequences of "Burkholderia cepacia complex"
and download in GenBank format
1. http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nuccore&term=%22burkholderia+cepacia+complex%22[organism]&usehistory=y
2. http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&query_key=1&WebEnv=NCID_1_264773253_130.14.22.215_9001_1396244608_457974498&rettype=gb&retmode=text
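For a large result set, step 2 is usually issued in batches. The sketch below generates one EFetch URL per batch using &retstart and &retmax against the stored history set; the function name, the batch size of 500, the https scheme and the placeholder WebEnv are assumptions of this illustration.

```python
from urllib.parse import urlencode

# Assumption: https variant of the EFetch URL shown on the slide.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?"


def efetch_batch_urls(db, query_key, webenv, count, batch_size=500, **fmt):
    """Yield one EFetch URL per batch of records stored on the history server."""
    for start in range(0, count, batch_size):
        params = dict(db=db, query_key=query_key, WebEnv=webenv,
                      retstart=start, retmax=batch_size, **fmt)
        yield EFETCH + urlencode(params)


# count would come from the <Count> element of the ESearch reply in step 1.
for url in efetch_batch_urls("nuccore", "1", "NCID_EXAMPLE", 1200,
                             rettype="gb", retmode="text"):
    print(url)
```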
Application 3
• Find "cancer copy number" articles with "Affymetrix Genome-Wide Human SNP Array" platform GEO Datasets
• Workflow (slide diagram):
– PubMed side: esearch.fcgi?db=pubmed with the query cancer "copy number" → WebEnv, query_key → esummary.fcgi?db=pubmed (PubMed title) and elink.fcgi?dbfrom=pubmed&db=gds
– GEO side: esearch.fcgi?db=gds with the query Affymetrix "Genome-Wide" Human "SNP Array" AND gpl[Filter] → WebEnv, query_key → esummary.fcgi?db=gds → platforms GPL9704, GPL8226, GPL6804, GPL6801 → esearch.fcgi?db=gds
– Parsing → GEO DataSets common to both sides → Result table
• Make custom scripts with an XML parser
EBot
• EBot is an interactive web tool that first allows
users to construct an arbitrary E-utility
analysis pipeline and then generates a Perl
script to execute the pipeline. The Perl script
can be downloaded and executed on any
computer with a Perl installation. For more
details, see the EBot page linked below.
– http://www.ncbi.nlm.nih.gov/Class/PowerTools/eutils/ebot/ebot.cgi
Entrez Direct
• E-utilities on the UNIX Command Line
• Download from ftp://ftp.ncbi.nih.gov/entrez/entrezdirect/
• Entrez Direct Functions
– esearch performs a new Entrez search using terms in indexed fields.
– elink looks up neighbors (within a database) or links (between databases).
– efilter filters or restricts the results of a previous query.
– efetch downloads records or reports in a designated format.
– xtract converts XML into a table of data values.
– einfo obtains information on indexed fields in an Entrez database.
– epost uploads unique identifiers (UIDs) or sequence accession numbers.
– nquire sends a URL request to a web page or CGI service.
• Entering Query Commands
– esearch -db pubmed -query "opsin gene conversion" | elink -related
Links
• References
– Entrez Programming Utilities Help
• http://www.ncbi.nlm.nih.gov/books/NBK25501/
– Entrez Help
• http://www.ncbi.nlm.nih.gov/books/NBK3836/
• Useful Links
– Entrez Unique Identifiers (UIDs) for selected databases
• http://www.ncbi.nlm.nih.gov/books/NBK25497/table/chapter2.chapter2_table1/?report=objectonly
– Valid values of &retmode and &rettype for EFetch (null = empty string)
• http://www.ncbi.nlm.nih.gov/books/NBK25499/table/chapter4.chapter4_table1/?report=objectonly
– The full list of Entrez links
• http://eutils.ncbi.nlm.nih.gov/entrez/query/static/entrezlinks.html
NCBI databases
• Literature: PubMed, PubMed Central, NLM Catalog, MeSH, Books, Site
Search
• Health: PubMed Health, MedGen, GTR, dbGaP, ClinVar, OMIM, OMIA
• Organisms: Taxonomy
• Nucleotide Sequences: Nucleotide, GSS, EST, SRA, PopSet, Probe
• Genomes: Genome, Assembly, Epigenomics, UniSTS, SNP, dbVar,
BioProject, BioSample, Clone
• Genes: Gene, HomoloGene, UniGene, GEO Profiles, GEO DataSets
• Proteins: Protein, Conserved Domains, Protein Clusters, Structure
• Chemicals: PubChem Compound, PubChem Substance, PubChem BioAssay
• Pathways: BioSystems
E-utilities
• Eight server-side programs
– ESearch : Searching a Database
– EPost : Uploading UIDs to Entrez
– ESummary : Downloading Document Summaries
– EFetch : Downloading Full Records
– ELink : Finding Related Data Through Entrez Links
– EInfo : Getting Database Statistics and Search Fields
– EGQuery : Performing a Global Entrez Search
– ESpell : Retrieving Spelling Suggestions
Sample Applications of the E-utilities
• Basic pipelines
– ESearch - ESummary/EFetch
– EPost - ESummary/EFetch
– ELink - ESummary/EFetch
– ESearch - ELink - ESummary/EFetch
– EPost - ELink - ESummary/EFetch
– EPost - ESearch
– ELink - ESearch
Application 3
• Find "cancer copy number" articles with "Affymetrix Genome-Wide Human SNP Array"
platform GEO Datasets
1. tr '\n' '\t' < cancer_copy_number.pubmed_result.txt | sed 's/\t\t/\n/g' | sed 's/^\t[0-9]*: //' | sed 's/\t/ /g' > cancer_copy_number.pubmed_result.oneLine.txt
2. sed 's/^.* PubMed *PMID: *//' cancer_copy_number.pubmed_result.oneLine.txt | sed 's/; .*//' | sed 's/\.$//' > cancer_copy_number.pubmed_ids.txt
3. for id in $(cat cancer_copy_number.pubmed_ids.txt); do perl ~/scripts/elink.pl pubmed gds $id pubmed_gds | sed "s/^/$id\t/"; done > cancer_copy_number.pubmed_gds_ids.txt
4. awk -F'\t' '($1 == "Platform")' Affymetrix_Genome-Wide_Human_SNP_Array.gds_result.txt | cut -f2 | sed 's/^Accession: //' > Affymetrix_Genome-Wide_Human_SNP_Array.platform_accessions.txt
5. for platform in $(cat Affymetrix_Genome-Wide_Human_SNP_Array.platform_accessions.txt); do perl ~/scripts/esearch.pl gds $platform; done | sort -nu > Affymetrix_Genome-Wide_Human_SNP_Array.gds_ids.txt
6. paste cancer_copy_number.pubmed_ids.txt cancer_copy_number.pubmed_result.oneLine.txt | perl ~/scripts/table.addColumns.pl cancer_copy_number.pubmed_gds_ids.txt 0 - 0 1 | perl ~/scripts/table.search.pl Affymetrix_Genome-Wide_Human_SNP_Array.gds_ids.txt 0 - 1 | perl ~/scripts/table.mergeLines.pl -d ', ' - 0,2 > cancer_copy_number.Affymetrix_Genome-Wide_Human_SNP_Array.pubmed_gds.txt
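The final join can also be expressed in a few lines of Python. This is one reading of what step 6 is meant to achieve (keep the articles whose linked GEO DataSets IDs also appear in the platform ID list), not a port of the Perl helper scripts, whose behavior is not shown in the slides. The function name and the sample IDs are made up for illustration.

```python
def articles_with_platform_datasets(pubmed_to_gds, platform_gds_ids):
    """Keep PMIDs linked to at least one GDS ID that also belongs to the platforms."""
    platform_ids = set(platform_gds_ids)
    common = {}
    for pmid, gds_ids in pubmed_to_gds.items():
        shared = sorted(set(gds_ids) & platform_ids)
        if shared:
            common[pmid] = shared
    return common


# Made-up IDs standing in for the outputs of steps 3 and 5.
links = {"1001": ["200", "300"], "1002": ["400"], "1003": []}
platform_ids = ["300", "200", "500"]
for pmid, shared in articles_with_platform_datasets(links, platform_ids).items():
    print(pmid, ", ".join(shared), sep="\t")
```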

Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptx
 
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxRESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
 
BREEDING FOR RESISTANCE TO BIOTIC STRESS.pptx
BREEDING FOR RESISTANCE TO BIOTIC STRESS.pptxBREEDING FOR RESISTANCE TO BIOTIC STRESS.pptx
BREEDING FOR RESISTANCE TO BIOTIC STRESS.pptx
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Evidences of Evolution General Biology 2
Evidences of Evolution General Biology 2Evidences of Evolution General Biology 2
Evidences of Evolution General Biology 2
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
 
FREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by naFREE NURSING BUNDLE FOR NURSES.PDF by na
FREE NURSING BUNDLE FOR NURSES.PDF by na
 

NCBI API – Integration into analysis code and custom scripts

  • 1. NCBI API – Integration into analysis code QBRC Tech Talk Jiwoong Kim
  • 2. Outlines • Introduction • Usage Guidelines of the E-utilities • Sample Applications of the E-utilities
  • 3. NCBI & Entrez • The National Center for Biotechnology Information advances science and health by providing access to biomedical and genomic information. • Entrez is NCBI’s primary text search and retrieval system that integrates the PubMed database of biomedical literature with 39 other literature and molecular databases including DNA and protein sequence, structure, gene, genome, genetic variation and gene expression.
  • 4. E-utilities • Entrez Programming Utilities – The Entrez Programming Utilities (E-utilities) are a set of eight server-side programs that provide a stable interface into the Entrez query and database system at the NCBI. – The E-utilities use a fixed URL syntax that translates a standard set of input parameters into the values necessary for various NCBI software components to search for and retrieve the requested data. • Input: URL → E-utilities → Output: XML, FASTA, Text …
  • 5. Usage Guidelines and Requirements • Use the E-utility URL – baseURL: http://eutils.ncbi.nlm.nih.gov/entrez/eutils/ … – Python urllib/urlopen, Perl LWP::Simple, Linux wget, … • Frequency, Timing and Registration of E-utility URL Requests – Make no more than 3 requests per second → sleep(0.5) – Run large jobs on weekends or between 5 PM and 9 AM EST – Include &tool and &email in all requests • Minimizing the Number of Requests – &retmax=500 • Handling Special Characters Within URLs – Space → +, " → %22, # → %23
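The guidelines above can be sketched in Python as follows. This is only an illustration, not the speaker's code: `build_url`, `Throttle`, the `my_tool` name and the `me@example.org` address are placeholders invented here, and the base URL is written with `https`, which is an assumption about what the server currently expects (the slide shows `http`). `urlencode` takes care of the special characters listed on the slide (space → +, " → %22, # → %23).

```python
import time
from urllib.parse import urlencode

# Assumption: HTTPS form of the base URL shown on the slide.
BASE_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def build_url(utility, tool="my_tool", email="me@example.org", **params):
    """Return an E-utility URL with &tool and &email attached.

    `tool` and `email` defaults are placeholders; use your own values.
    """
    query = dict(params, tool=tool, email=email)
    # urlencode escapes spaces, quotes, '#', brackets and so on.
    return BASE_URL + utility + ".fcgi?" + urlencode(query)


class Throttle:
    """Keep at least `min_interval` seconds between calls to wait()."""

    def __init__(self, min_interval=0.5, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the class can be tested
        self._sleep = sleep
        self._last = None

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `Throttle().wait()` before every request keeps the script at two requests per second, which stays under the limit of three quoted above.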
  • 7. ESearch (text searches) • Responds to a text query with the list of matching UIDs in a given database (for later use in ESummary, EFetch or ELink), along with the term translations of the query. • Syntax: esearch.fcgi?db=<database>&term=<query> – Input: Entrez database (&db); Any Entrez text query (&term) – Output: List of UIDs matching the Entrez query • Example: Get the PubMed IDs (PMIDs) for articles about osteosarcoma – http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=%22osteosarcoma%22[majr:noexp]
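A minimal sketch of reading the ESearch output with Python's standard XML parser. The sample document below is hand-written to mimic the usual `eSearchResult` layout (it is not a captured response, and the count is made up), and `parse_esearch` is a name chosen for this example.

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the shape of an ESearch reply (values are illustrative).
SAMPLE_ESEARCH_XML = (
    "<eSearchResult>"
    "<Count>3</Count><RetMax>3</RetMax><RetStart>0</RetStart>"
    "<IdList><Id>24450072</Id><Id>24333720</Id><Id>24333432</Id></IdList>"
    "<QueryTranslation>\"osteosarcoma\"[majr:noexp]</QueryTranslation>"
    "</eSearchResult>"
)


def parse_esearch(xml_text):
    """Return the hit count, the UID list and the query translation."""
    root = ET.fromstring(xml_text)
    return {
        "count": int(root.findtext("Count", default="0")),
        "ids": [el.text for el in root.findall("./IdList/Id")],
        "translation": root.findtext("QueryTranslation"),
    }


result = parse_esearch(SAMPLE_ESEARCH_XML)
print(result["count"], result["ids"])
```

Note that `count` is the total number of matches, while `ids` holds only as many UIDs as `&retmax` allowed in that request.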
  • 9. ESummary (document summary downloads) • Responds to a list of UIDs from a given database with the corresponding document summaries. • Syntax: esummary.fcgi?db=<database>&id=<uid_list> – Input: List of UIDs (&id); Entrez database (&db) – Output: XML DocSums • Example: Download DocSums for these PubMed IDs: 24450072, 24333720, 24333432 – http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=24450072,24333720,24333432
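The XML DocSums can be turned into plain dictionaries along these lines. The sample assumes the older `DocSum`/`Item` layout of ESummary output, and the titles and dates in it are invented placeholders rather than the real records for these PMIDs; `parse_docsums` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; field values are placeholders, not real PubMed data.
SAMPLE_ESUMMARY_XML = (
    "<eSummaryResult>"
    "<DocSum><Id>24450072</Id>"
    "<Item Name=\"PubDate\" Type=\"Date\">2014 Jan</Item>"
    "<Item Name=\"Title\" Type=\"String\">Placeholder title A</Item>"
    "</DocSum>"
    "<DocSum><Id>24333720</Id>"
    "<Item Name=\"PubDate\" Type=\"Date\">2013 Dec</Item>"
    "<Item Name=\"Title\" Type=\"String\">Placeholder title B</Item>"
    "</DocSum>"
    "</eSummaryResult>"
)


def parse_docsums(xml_text):
    """Map each UID to a dict of its top-level Item fields (name -> text)."""
    root = ET.fromstring(xml_text)
    summaries = {}
    for docsum in root.findall("DocSum"):
        uid = docsum.findtext("Id")
        # Nested list items (e.g. author lists) have no direct text; store "".
        summaries[uid] = {
            item.get("Name"): (item.text or "") for item in docsum.findall("Item")
        }
    return summaries


for uid, fields in parse_docsums(SAMPLE_ESUMMARY_XML).items():
    print(uid, fields["Title"])
```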
  • 11. EFetch (data record downloads) • Responds to a list of UIDs in a given database with the corresponding data records in a specified format. • Syntax: efetch.fcgi?db=<database>&id=<uid_list>&rettype=<retrieval_type>&retmode=<retrieval_mode> – Input: List of UIDs (&id); Entrez database (&db); Retrieval type (&rettype); Retrieval mode (&retmode) – Output: Formatted data records as specified • Example: Download the abstract of PubMed ID 24333432 – http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=24333432&rettype=abstract&retmode=text
  • 12. ELink (Entrez links) • Responds to a list of UIDs in a given database with either a list of related UIDs (and relevancy scores) in the same database or a list of linked UIDs in another Entrez database • Checks for the existence of a specified link from a list of one or more UIDs • Creates a hyperlink to the primary LinkOut provider for a specific UID and database, or lists LinkOut URLs and attributes for multiple UIDs.
  • 13. ELink (Entrez links) • Syntax: elink.fcgi?dbfrom=<source_db>&db=<destination_db>&id=<uid_list> – Input: List of UIDs (&id); Source Entrez database (&dbfrom); Destination Entrez database (&db) – Output: XML containing linked UIDs from source and destination databases • Example: Find one set/separate sets of Gene IDs linked to PubMed IDs 24333432 and 24314238 – http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&db=gene&id=24333432,24314238 – http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&db=gene&id=24333432&id=24314238
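The two URL forms in the example differ only in how `&id` is written: one comma-separated value gives one combined link set, repeated `&id` parameters give one link set per UID. A sketch of both, plus a parser for the resulting link sets, is below. `elink_query` and `parse_linksets` are names made up for this example, and the Gene IDs in the sample XML (111, 222, 333) are placeholders, not real links.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def elink_query(dbfrom, db, ids, separate=False):
    """Build the elink.fcgi query string for one set or separate sets."""
    pairs = [("dbfrom", dbfrom), ("db", db)]
    if separate:
        pairs += [("id", str(uid)) for uid in ids]           # &id=a&id=b
    else:
        pairs.append(("id", ",".join(str(uid) for uid in ids)))  # &id=a,b
    return "elink.fcgi?" + urlencode(pairs, safe=",")


# Hand-written sample shaped like a "separate sets" reply; IDs are placeholders.
SAMPLE_ELINK_XML = (
    "<eLinkResult>"
    "<LinkSet><DbFrom>pubmed</DbFrom><IdList><Id>24333432</Id></IdList>"
    "<LinkSetDb><DbTo>gene</DbTo><LinkName>pubmed_gene</LinkName>"
    "<Link><Id>111</Id></Link><Link><Id>222</Id></Link></LinkSetDb></LinkSet>"
    "<LinkSet><DbFrom>pubmed</DbFrom><IdList><Id>24314238</Id></IdList>"
    "<LinkSetDb><DbTo>gene</DbTo><LinkName>pubmed_gene</LinkName>"
    "<Link><Id>333</Id></Link></LinkSetDb></LinkSet>"
    "</eLinkResult>"
)


def parse_linksets(xml_text):
    """Return a list of (source UIDs, linked UIDs) pairs, one per LinkSet."""
    root = ET.fromstring(xml_text)
    return [
        (
            [el.text for el in linkset.findall("./IdList/Id")],
            [el.text for el in linkset.findall("./LinkSetDb/Link/Id")],
        )
        for linkset in root.findall("LinkSet")
    ]


print(elink_query("pubmed", "gene", [24333432, 24314238]))
print(elink_query("pubmed", "gene", [24333432, 24314238], separate=True))
print(parse_linksets(SAMPLE_ELINK_XML))
```

The separate-sets form is the one to use when each linked Gene ID has to be traced back to the article it came from.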
  • 15. EGQuery (global query) • Responds to a text query with the number of records matching the query in each Entrez database. • Syntax: egquery.fcgi?term=<query> – Input: Entrez text query (&term) – Output: XML containing the number of hits in each database. • Example: Determine the number of records for mouse in Entrez. – http://eutils.ncbi.nlm.nih.gov/entrez/eutils/egquery.fcgi?term=mouse[orgn]&retmode=xml
  • 17. ESpell (spelling suggestions) • Retrieves spelling suggestions for a text query in a given database. • Syntax: espell.fcgi?term=<query>&db=<database> – Input: Entrez text query (&term); Entrez database (&db) – Output: XML containing the original query and spelling suggestions. • Example: Find spelling suggestions for the PubMed query "osteosacoma". – http://eutils.ncbi.nlm.nih.gov/entrez/eutils/espell.fcgi?term=osteosacoma&db=pmc
  • 18. EInfo (database statistics) • Provides the number of records indexed in each field of a given database, the date of the last update of the database, and the available links from the database to other Entrez databases. • Syntax: einfo.fcgi?db=<database> – Input: Entrez database (&db) – Output: XML containing database statistics • Example: Find database statistics for Entrez Protein. – http://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=protein
  • 19. EPost (UID uploads) • Accepts a list of UIDs from a given database, stores the set on the History Server, and responds with a query key and web environment for the uploaded dataset. • Syntax: epost.fcgi?db=<database>&id=<uid_list> – Input: List of UIDs (&id); Entrez database (&db) – Output: Web environment (&WebEnv) and query key (&query_key) parameters specifying the location on the Entrez history server of the list of uploaded UIDs • Example: Upload five Gene IDs (7173, 22018, 54314, 403521, 525013) for later processing. – http://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi?db=gene&id=7173,22018,54314,403521,525013
  • 20. Application 1 • Find human genes related to articles retrieved with the non-extended MeSH term "Osteosarcoma" (PubMed → Gene) 1. http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=%22osteosarcoma%22[majr:noexp]&usehistory=y 2. http://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=pubmed&db=gene&query_key=1&WebEnv=NCID_1_220057266_130.14.18.34_9001_1396281951_1196950266&term=%22homo+sapiens%22[organism]&cmd=neighbor_history 3. http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=gene&query_key=3&WebEnv=NCID_1_220057266_130.14.18.34_9001_1396281951_1196950266
  • 21. Application 1 • Find human genes related to articles retrieved with the non-extended MeSH term "Osteosarcoma" (PubMed → Gene) – ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2pubmed.gz • It can be used instead of "ELink". – ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz • It can be used instead of "ESummary".
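Application 1 chains its three requests through the History Server: each reply carries a query key and a web environment, and the next URL passes them back instead of a UID list. A rough sketch of that hand-off follows. `history_handle` and `next_step` are hypothetical helpers, and the sample reply and its `NCID_EXAMPLE` value are invented; real WebEnv strings look like the long one on the slide.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Hand-written sample of an ESearch reply made with &usehistory=y.
SAMPLE_HISTORY_XML = (
    "<eSearchResult><Count>2</Count>"
    "<QueryKey>1</QueryKey><WebEnv>NCID_EXAMPLE</WebEnv>"
    "<IdList><Id>24450072</Id><Id>24333720</Id></IdList>"
    "</eSearchResult>"
)


def history_handle(xml_text):
    """Return (query_key, WebEnv) found anywhere in an E-utility reply."""
    root = ET.fromstring(xml_text)
    key = root.find(".//QueryKey")
    env = root.find(".//WebEnv")
    if key is None or env is None:
        raise ValueError("reply has no QueryKey/WebEnv; was history requested?")
    return key.text, env.text


def next_step(utility, query_key, web_env, **params):
    """Build the query string for the next utility in the pipeline."""
    query = dict(params, query_key=query_key, WebEnv=web_env)
    return utility + ".fcgi?" + urlencode(query)


key, env = history_handle(SAMPLE_HISTORY_XML)
print(next_step("elink", key, env, dbfrom="pubmed", db="gene", cmd="neighbor_history"))
```

Because the helper searches the whole reply for `QueryKey` and `WebEnv`, the same function should also work on the reply to the `cmd=neighbor_history` request in step 2, assuming that reply nests the query key deeper in the document.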
  • 22. Application 2 • Find nucleotide sequences of "Burkholderia cepacia complex" and download in GenBank format 1. http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nuccore&term=%22burkholderia+cepacia+complex%22[organism]&usehistory=y 2. http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&query_key=1&WebEnv=NCID_1_264773253_130.14.22.215_9001_1396244608_457974498&rettype=gb&retmode=text
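A search like this one can match far more records than one EFetch call should return, so the download in step 2 is usually split into batches. The sketch below only builds the batch URLs and does not contact the server. It assumes EFetch accepts `&retstart` together with `&retmax` (the slides mention only `&retmax=500`), and `batch_ranges`, `efetch_batches` and the `NCID_EXAMPLE` value are made up for illustration.

```python
from urllib.parse import urlencode


def batch_ranges(total, batch_size=500):
    """Split `total` records into (retstart, retmax) pairs."""
    return [
        (start, min(batch_size, total - start))
        for start in range(0, total, batch_size)
    ]


def efetch_batches(query_key, web_env, total, batch_size=500,
                   db="nuccore", rettype="gb", retmode="text"):
    """Return one efetch.fcgi query string per batch of a history-server set."""
    urls = []
    for retstart, retmax in batch_ranges(total, batch_size):
        query = {
            "db": db,
            "query_key": query_key,
            "WebEnv": web_env,
            "rettype": rettype,
            "retmode": retmode,
            "retstart": retstart,
            "retmax": retmax,
        }
        urls.append("efetch.fcgi?" + urlencode(query))
    return urls


# Example: 1200 matching records fetched 500 at a time.
for url in efetch_batches("1", "NCID_EXAMPLE", total=1200):
    print(url)
```

The `total` argument would come from the `Count` element of the ESearch reply in step 1.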
  • 23. Application 3 • Find "cancer copy number" articles with "Affymetrix Genome-Wide Human SNP Array" platform GEO Datasets • [Flowchart] PubMed side: query cancer "copy number" → esearch.fcgi?db=pubmed → esummary.fcgi?db=pubmed (WebEnv, query_key) → elink.fcgi?dbfrom=pubmed&db=gds • GEO side: query Affymetrix "Genome-Wide" Human "SNP Array" AND gpl[Filter] → esearch.fcgi?db=gds → esummary.fcgi?db=gds (WebEnv, query_key) → platforms GPL9704, GPL8226, GPL6804, GPL6801 → esearch.fcgi?db=gds • Both sides: Parsing → Result table (Common PubMed title)
  • 24. "cancer copy number" articles "Affymetrix Genome-Wide Human SNP Array" platform GEO Datasets
  • 29. Make custom scripts with XML-parser
  • 30. EBot • EBot is an interactive web tool that first allows users to construct an arbitrary E-utility analysis pipeline and then generates a Perl script to execute the pipeline. The Perl script can be downloaded and executed on any computer with a Perl installation. For more details, see the EBot page linked above. – http://www.ncbi.nlm.nih.gov/Class/PowerTools/eutils/ebot/ebot.cgi
  • 31. Entrez Direct • E-utilities on the UNIX Command Line • Download from ftp://ftp.ncbi.nih.gov/entrez/entrezdirect/ • Entrez Direct Functions – esearch performs a new Entrez search using terms in indexed fields. – elink looks up neighbors (within a database) or links (between databases). – efilter filters or restricts the results of a previous query. – efetch downloads records or reports in a designated format. – xtract converts XML into a table of data values. – einfo obtains information on indexed fields in an Entrez database. – epost uploads unique identifiers (UIDs) or sequence accession numbers. – nquire sends a URL request to a web page or CGI service. • Entering Query Commands – esearch -db pubmed -query "opsin gene conversion" | elink -related
  • 32. Links • References – Entrez Programming Utilities Help • http://www.ncbi.nlm.nih.gov/books/NBK25501/ – Entrez Help • http://www.ncbi.nlm.nih.gov/books/NBK3836/ • Useful Links – Entrez Unique Identifiers (UIDs) for selected databases • http://www.ncbi.nlm.nih.gov/books/NBK25497/table/chapter2.chapter2_table1/?report=objectonly – Valid values of &retmode and &rettype for EFetch (null = empty string) • http://www.ncbi.nlm.nih.gov/books/NBK25499/table/chapter4.chapter4_table1/?report=objectonly – The full list of Entrez links • http://eutils.ncbi.nlm.nih.gov/entrez/query/static/entrezlinks.html
  • 33. NCBI databases • Literature: PubMed, PubMed Central, NLM Catalog, MeSH, Books, Site Search • Health: PubMed Health, MedGen, GTR, dbGaP, ClinVar, OMIM, OMIA • Organisms: Taxonomy • Nucleotide Sequences: Nucleotide, GSS, EST, SRA, PopSet, Probe • Genomes: Genome, Assembly, Epigenomics, UniSTS, SNP, dbVar, BioProject, BioSample, Clone • Genes: Gene, HomoloGene, UniGene, GEO Profiles, GEO DataSets • Proteins: Protein, Conserved Domains, Protein Clusters, Structure • Chemicals: PubChem Compound, PubChem Substance, PubChem BioAssay • Pathways: BioSystems
  • 34. E-utilities • Eight server-side programs – ESearch : Searching a Database – EPost : Uploading UIDs to Entrez – ESummary : Downloading Document Summaries – EFetch : Downloading Full Records – ELink : Finding Related Data Through Entrez Links – EInfo : Getting Database Statistics and Search Fields – EGQuery : Performing a Global Entrez Search – ESpell : Retrieving Spelling Suggestions
  • 35. Sample Applications of the E-utilities • Basic pipelines – ESearch - ESummary/EFetch – EPost - ESummary/EFetch – ELink - ESummary/EFetch – ESearch - ELink - ESummary/EFetch – EPost - ELink - ESummary/EFetch – EPost - ESearch – ELink - ESearch
  • 36. Application 3 • Find "cancer copy number" articles with "Affymetrix Genome-Wide Human SNP Array" platform GEO Datasets
    1. tr '\n' '\t' < cancer_copy_number.pubmed_result.txt | sed 's/\t\t/\n/g' | sed 's/^\t[0-9]*: //' | sed 's/\t/ /g' > cancer_copy_number.pubmed_result.oneLine.txt
    2. sed 's/^.* PubMed *PMID: *//' cancer_copy_number.pubmed_result.oneLine.txt | sed 's/; .*//' | sed 's/.$//' > cancer_copy_number.pubmed_ids.txt
    3. for id in $(cat cancer_copy_number.pubmed_ids.txt); do perl ~/scripts/elink.pl pubmed gds $id pubmed_gds | sed "s/^/$id\t/"; done > cancer_copy_number.pubmed_gds_ids.txt
    4. awk -F'\t' '($1 == "Platform")' Affymetrix_Genome-Wide_Human_SNP_Array.gds_result.txt | cut -f2 | sed 's/^Accession: //' > Affymetrix_Genome-Wide_Human_SNP_Array.platform_accessions.txt
    5. for platform in $(cat Affymetrix_Genome-Wide_Human_SNP_Array.platform_accessions.txt); do perl ~/scripts/esearch.pl gds $platform; done | sort -nu > Affymetrix_Genome-Wide_Human_SNP_Array.gds_ids.txt
    6. paste cancer_copy_number.pubmed_ids.txt cancer_copy_number.pubmed_result.oneLine.txt | perl ~/scripts/table.addColumns.pl cancer_copy_number.pubmed_gds_ids.txt 0 - 0 1 | perl ~/scripts/table.search.pl Affymetrix_Genome-Wide_Human_SNP_Array.gds_ids.txt 0 - 1 | perl ~/scripts/table.mergeLines.pl -d ', ' - 0,2 > cancer_copy_number.Affymetrix_Genome-Wide_Human_SNP_Array.pubmed_gds.txt
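The last step of this pipeline is a join: keep the PubMed articles whose linked GEO DataSets IDs also appear in the list of IDs found for the Affymetrix platforms. The Perl helper scripts used above are not shown in the slides, so the sketch below is only a guess at the same idea in Python, not a port of them. `articles_on_platform` is a made-up name, and every ID in the example is a placeholder.

```python
def articles_on_platform(pubmed_to_gds, platform_gds_ids):
    """Keep PubMed IDs that link to at least one GDS ID on the platform.

    pubmed_to_gds:    dict mapping a PubMed ID to the GDS IDs it links to
                      (the content of step 3's output file).
    platform_gds_ids: iterable of GDS IDs found for the platforms
                      (the content of step 5's output file).
    Returns a dict of PubMed ID -> sorted list of shared GDS IDs.
    """
    wanted = set(platform_gds_ids)
    hits = {}
    for pmid, gds_ids in pubmed_to_gds.items():
        shared = sorted(wanted.intersection(gds_ids))
        if shared:
            hits[pmid] = shared
    return hits


# Placeholder data: three articles, two of which link to platform datasets.
links = {
    "pmid_A": ["gds_1", "gds_2"],
    "pmid_B": ["gds_9"],
    "pmid_C": ["gds_3"],
}
platform_ids = ["gds_2", "gds_3", "gds_4"]

for pmid, shared in articles_on_platform(links, platform_ids).items():
    print(pmid, ", ".join(shared))
```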