Advanced NCBI

Published on 25 Sept 2013

Course: NCBI / Entrez Web-Services, BLAST, XML, XSLT, Linux, Bioinformatics
Université de Nantes, France

  1. Advanced NCBI. The Entrez API
      https://github.com/lindenb/courses
      Pierre Lindenbaum (@yokofakun), pierre.lindenbaum@univ-nantes.fr, http://plindenbaum.blogspot.com
      Institut du Thorax. Nantes. France
      September 27, 2016
  2. NCBI? What about EBI, ENSEMBL, ...
  3.
  4. What will be covered today?
      - file formats...
      - EInfo, GQuery, ESearch, ESummary, EFetch...
      - processing XML answers with XSLT: HTML, SVG, R...
      - generating a Java parser for dbSNP
      - NCBI EBot
      - using standalone BLAST
  5. cURL
      curl "http://en.wikipedia.org/wiki/Main_page"
      wget -O - "http://en.wikipedia.org/wiki/Main_page"
  6. XML
  7. XSLT
  8. XSLT
  9. xsltproc
      xsltproc stylesheet.xsl file.xml > result.xml
  10. JSON
  11. Formats
  12. Formats: GenBank
      https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=25&rettype=gb
      LOCUS       X53813                   422 bp    DNA     linear   MAM 22-JUN-1992
      DEFINITION  Blue Whale heavy satellite DNA.
      ACCESSION   X53813 X17460
      VERSION     X53813.1  GI:25
      KEYWORDS    satellite DNA.
      SOURCE      Balaenoptera musculus (Blue whale)
        ORGANISM  Balaenoptera musculus
                  Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
                  Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Cetacea;
                  Mysticeti; Balaenopteridae; Balaenoptera.
      REFERENCE   1  (bases 1 to 422)
        AUTHORS   Arnason,U. and Widegren,B.
        TITLE     Composition and chromosomal localization of cetacean highly
                  repetitive DNA with special reference to the blue whale,
                  Balaenoptera musculus
        JOURNAL   Chromosoma 98 (5), 323-329 (1989)
         PUBMED   2612291
      COMMENT     See also <X52700-2> for 1,760 bp common cetacean component
                  clones and <X52703-6>,<X53811-4> for the 422 bp heavy satellite
                  clones.
      FEATURES             Location/Qualifiers
           source          1..422
                           /organism="Balaenoptera musculus"
                           /mol_type="genomic DNA"
                           /db_xref="taxon:9771"
                           /clone="7"
           misc_feature    1..422
                           /note="heavy satellite DNA"
      ORIGIN
              1 tagttattca acctatccca ctctctagat accccttagc acgtaaagga atattatttg
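As a minimal sketch (Python, standard library only), the LOCUS line above can be split into named fields. It assumes the eight-token layout of this particular record; LOCUS lines that omit a column would need a column-based parser instead. `Locus` and `parse_locus` are hypothetical names, not part of any NCBI tool.

```python
from typing import NamedTuple


class Locus(NamedTuple):
    """Fields of a GenBank LOCUS line (illustrative subset)."""
    name: str
    length: int
    moltype: str
    topology: str
    division: str
    date: str


def parse_locus(line: str) -> Locus:
    """Split a LOCUS line on whitespace (hypothetical helper).

    Assumes the layout of the X53813 record:
    LOCUS <name> <length> bp <moltype> <topology> <division> <date>
    """
    tokens = line.split()
    if len(tokens) != 8 or tokens[0] != "LOCUS" or tokens[3] != "bp":
        raise ValueError(f"unexpected LOCUS layout: {line!r}")
    _, name, length, _, moltype, topology, division, date = tokens
    return Locus(name, int(length), moltype, topology, division, date)


locus = parse_locus("LOCUS       X53813                   422 bp    DNA     linear   MAM 22-JUN-1992")
print(locus.name, locus.length, locus.division)  # X53813 422 MAM
```

Raising on an unexpected layout is deliberate: silently mis-assigning columns is worse than failing loudly.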
  13. Formats: ASN.1
      https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=25
      Seq-entry ::= seq {
        id {
          embl {
            accession "X53813",
            version 1 },
          gi 25 },
        descr {
          title "Blue Whale heavy satellite DNA",
          source {
            org {
              taxname "Balaenoptera musculus",
              common "Blue whale",
              db {
                {
                  db "taxon",
                  tag
                    id 9771 } },
              orgname {
                name
                  binomial {
                    genus "Balaenoptera",
                    species "musculus" },
                lineage "Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Cetacea; Mysticeti; Balaenopteridae; Balaenoptera",
                gcode 1,
                mgcode 2,
                div "MAM" } },
            subtype {
  14. Formats: ASN.1 (schema)
      http://www.ncbi.nlm.nih.gov/data_specs/asn/insdseq.asn
      INSDSeq ::= SEQUENCE {
          locus VisibleString,
          length INTEGER,
          strandedness VisibleString OPTIONAL,
          moltype VisibleString,
          topology VisibleString OPTIONAL,
          division VisibleString,
          update-date VisibleString,
          create-date VisibleString OPTIONAL,
          update-release VisibleString OPTIONAL,
          create-release VisibleString OPTIONAL,
          definition VisibleString,
          primary-accession VisibleString OPTIONAL,
          entry-version VisibleString OPTIONAL,
          accession-version VisibleString OPTIONAL,
          other-seqids SEQUENCE OF INSDSeqid OPTIONAL,
          secondary-accessions SEQUENCE OF INSDSecondary-accn OPTIONAL,
          project VisibleString OPTIONAL,
          keywords SEQUENCE OF INSDKeyword OPTIONAL,
          segment VisibleString OPTIONAL,
          source VisibleString OPTIONAL,
          organism VisibleString OPTIONAL,
          taxonomy VisibleString OPTIONAL,
          references SEQUENCE OF INSDReference OPTIONAL,
          comment VisibleString OPTIONAL,
          comment-set SEQUENCE OF INSDComment OPTIONAL,
          struc-comments SEQUENCE OF INSDStrucComment OPTIONAL,
          primary VisibleString OPTIONAL,
          source-db VisibleString OPTIONAL,
  15. Formats: ASN.1 (tools)
      DATATOOL
      - Generate C++ data storage classes based on ASN.1 serialization streams.
      - Convert data between ASN.1, XML and JSON formats.
  16. Formats: XML
      https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=25&retmode=xml
      <?xml version="1.0"?>
      <!DOCTYPE GBSet PUBLIC "-//NCBI//NCBI GBSeq/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_G
      <GBSet>
        <GBSeq>
          <GBSeq_locus>X53813</GBSeq_locus>
          <GBSeq_length>422</GBSeq_length>
          <GBSeq_strandedness>double</GBSeq_strandedness>
          <GBSeq_moltype>DNA</GBSeq_moltype>
          <GBSeq_topology>linear</GBSeq_topology>
          <GBSeq_division>MAM</GBSeq_division>
          <GBSeq_update-date>22-JUN-1992</GBSeq_update-date>
          <GBSeq_create-date>13-JUL-1990</GBSeq_create-date>
          <GBSeq_definition>Blue Whale heavy satellite DNA</GBSeq_definition>
          <GBSeq_primary-accession>X53813</GBSeq_primary-accession>
          <GBSeq_accession-version>X53813.1</GBSeq_accession-version>
          <GBSeq_other-seqids>
            <GBSeqid>emb|X53813.1|</GBSeqid>
            <GBSeqid>gi|25</GBSeqid>
          </GBSeq_other-seqids>
          <GBSeq_secondary-accessions>
            <GBSecondary-accn>X17460</GBSecondary-accn>
          </GBSeq_secondary-accessions>
          <GBSeq_keywords>
            <GBKeyword>satellite DNA</GBKeyword>
          </GBSeq_keywords>
          <GBSeq_source>Balaenoptera musculus (Blue whale)</GBSeq_source>
          <GBSeq_organism>Balaenoptera musculus</GBSeq_organism>
          <GBSeq_taxonomy>Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Cetacea; Mysticeti; Balaenopteridae; Balaenoptera</GBSeq_taxonomy>
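A minimal sketch of reading that GBSeq XML with Python's standard `xml.etree.ElementTree`. The embedded document is an abbreviated copy of the answer above (DOCTYPE and most fields dropped), and `summarize` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the efetch retmode=xml answer for id=25.
GBSET_XML = """<GBSet>
  <GBSeq>
    <GBSeq_locus>X53813</GBSeq_locus>
    <GBSeq_length>422</GBSeq_length>
    <GBSeq_moltype>DNA</GBSeq_moltype>
    <GBSeq_accession-version>X53813.1</GBSeq_accession-version>
    <GBSeq_other-seqids>
      <GBSeqid>emb|X53813.1|</GBSeqid>
      <GBSeqid>gi|25</GBSeqid>
    </GBSeq_other-seqids>
    <GBSeq_organism>Balaenoptera musculus</GBSeq_organism>
  </GBSeq>
</GBSet>"""


def summarize(xml_text: str) -> list:
    """Return one dict per <GBSeq> record with a few selected fields."""
    root = ET.fromstring(xml_text)
    records = []
    for seq in root.iter("GBSeq"):  # matches the tag GBSeq exactly, not GBSeq_locus
        records.append({
            "accession": seq.findtext("GBSeq_accession-version"),
            "length": int(seq.findtext("GBSeq_length")),
            "organism": seq.findtext("GBSeq_organism"),
            "seqids": [e.text for e in seq.findall("GBSeq_other-seqids/GBSeqid")],
        })
    return records


for rec in summarize(GBSET_XML):
    print(rec["accession"], rec["length"], rec["organism"], rec["seqids"])
```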
  17. Formats: XML (DTD)
      http://www.ncbi.nlm.nih.gov/dtd/NCBI_GBSeq.mod.dtd
      <!ELEMENT GBSeq (
          GBSeq_locus,
          GBSeq_length,
          GBSeq_strandedness?,
          GBSeq_moltype,
          GBSeq_topology?,
          GBSeq_division,
          GBSeq_update-date,
          GBSeq_create-date?,
          GBSeq_update-release?,
          GBSeq_create-release?,
          GBSeq_definition,
          GBSeq_primary-accession?,
          GBSeq_entry-version?,
          GBSeq_accession-version?,
          GBSeq_other-seqids?,
          GBSeq_secondary-accessions?,
          GBSeq_project?,
          GBSeq_keywords?,
          GBSeq_segment?,
          GBSeq_source?,
          GBSeq_organism?,
          GBSeq_taxonomy?,
          GBSeq_references?,
          GBSeq_comment?,
          GBSeq_comment-set?,
          GBSeq_struc-comments?,
          (...)
  18. E-Utilities
  19. GI
  20. GI
      http://www.ncbi.nlm.nih.gov/news/03-02-2016-phase-out-of-GI-numbers/
      "NCBI is phasing out sequence GIs - use Accession.Version instead!"
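Since GIs are being phased out, code that still receives legacy FASTA-style identifiers such as `gi|507866428|gb|KC524742.1|` (as in the ESummary and EFetch answers later in this deck) may want to keep only the Accession.Version. A minimal sketch, assuming the accession follows one of the database tags `gb`, `emb`, `dbj` or `ref`; this tag set is an illustrative subset, and `accession_version` is a hypothetical name.

```python
from typing import Optional

# Tags that precede an accession.version in legacy pipe-delimited NCBI ids.
# Illustrative subset only, not an exhaustive list.
ACCESSION_TAGS = {"gb", "emb", "dbj", "ref"}


def accession_version(seqid: str) -> Optional[str]:
    """Return the accession.version from an id like 'gi|25|emb|X53813.1|', else None."""
    tokens = seqid.strip().lstrip(">").split("|")
    for i, tag in enumerate(tokens[:-1]):
        # The token right after a known database tag is the accession.version.
        if tag in ACCESSION_TAGS and tokens[i + 1]:
            return tokens[i + 1]
    return None


print(accession_version("gi|507866428|gb|KC524742.1|"))  # KC524742.1
print(accession_version("gi|25"))                          # None
```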
  21. E-Utilities
      Set of seven server-side programs that provide a stable interface to the search, retrieval, and linking functions of the Entrez system, using a fixed URL syntax. The output provided by the E-Utilities is in XML format, sometimes JSON, (...)
  22. Entrez Direct
      http://www.ncbi.nlm.nih.gov/books/NBK179288/
      "Entrez Direct (EDirect) is an advanced method for accessing the NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window. Functions take search terms from command-line arguments. Individual operations are combined to build multi-step queries. Record retrieval and formatting normally complete the process."
  23. EInfo
  24. EInfo
      - Provides a list of the names of all valid Entrez databases.
      - Provides statistics for a single database, including lists of indexing fields and available link names.
  25. EInfo
      Base URL: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi
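All the E-utilities share the same base path and only differ by program name and query parameters, so a small URL builder covers EInfo, ESearch, ESummary and EFetch alike. A minimal sketch using `urllib.parse.urlencode`; `eutils_url` is a hypothetical helper, and passing `quote_via=quote` is a choice made here so that spaces become `%20` as in the URLs on these slides.

```python
from urllib.parse import quote, urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def eutils_url(program: str, **params: object) -> str:
    """Build an E-utilities URL such as .../einfo.fcgi?db=pubmed (hypothetical helper)."""
    url = EUTILS_BASE + program + ".fcgi"
    if not params:
        return url
    # quote_via=quote percent-encodes spaces as %20, quotes as %22, brackets as %5B/%5D
    return url + "?" + urlencode(params, quote_via=quote)


print(eutils_url("einfo"))
print(eutils_url("einfo", db="pubmed", retmode="json"))
print(eutils_url("esearch", db="nucleotide", term='"Mammuthus primigenius"[ORGN]'))
```

Letting `urlencode` do the escaping avoids hand-writing `%22...%5BORGN%5D` sequences as in the ESearch slides.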
  26. EInfo: XML output
      https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi
      <eInfoResult>
        <DbList>
          <DbName>pubmed</DbName>
          <DbName>protein</DbName>
          <DbName>nuccore</DbName>
          <DbName>nucleotide</DbName>
          <DbName>nucgss</DbName>
          <DbName>nucest</DbName>
          <DbName>structure</DbName>
          <DbName>genome</DbName>
          <DbName>assembly</DbName>
          <DbName>gcassembly</DbName>
          <DbName>genomeprj</DbName>
          <DbName>bioproject</DbName>
          <DbName>biosample</DbName>
          <DbName>biosystems</DbName>
          <DbName>blastdbinfo</DbName>
  27. EInfo: JSON output
      https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?retmode=json
      {
        "header": {
          "type": "einfo",
          "version": "0.3"
        },
        "einforesult": {
          "dblist": [
            "pubmed",
            "protein",
            "nuccore",
            (...)
            "unigene",
            "gencoll",
            "gtr"
          ]
        }
      }
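The XML and JSON answers carry the same list of databases; only the access path differs. A minimal sketch with the standard library, using abbreviated copies of the two answers above (first three databases only); the function names are hypothetical.

```python
import json
import xml.etree.ElementTree as ET

# Abbreviated einfo.fcgi answers: the first three databases only.
EINFO_XML = """<eInfoResult>
  <DbList>
    <DbName>pubmed</DbName>
    <DbName>protein</DbName>
    <DbName>nuccore</DbName>
  </DbList>
</eInfoResult>"""

EINFO_JSON = """{"header": {"type": "einfo", "version": "0.3"},
 "einforesult": {"dblist": ["pubmed", "protein", "nuccore"]}}"""


def dbs_from_xml(text: str) -> list:
    """Collect the text of every <DbName> under <DbList>."""
    return [e.text for e in ET.fromstring(text).findall("DbList/DbName")]


def dbs_from_json(text: str) -> list:
    """Read the einforesult.dblist array."""
    return json.loads(text)["einforesult"]["dblist"]


print(dbs_from_xml(EINFO_XML))  # ['pubmed', 'protein', 'nuccore']
print(dbs_from_xml(EINFO_XML) == dbs_from_json(EINFO_JSON))  # True
```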
  28. EInfo
      Return statistics for a given Entrez database:
      https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=DbName
  29. EInfo: statistics for PubMed
      https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=pubmed
      <?xml version="1.0"?>
      <eInfoResult>
        <DbInfo>
          <DbName>pubmed</DbName>
          <MenuName>PubMed</MenuName>
          <Description>PubMed bibliographic record</Description>
          <DbBuild>Build130805-2117m.4</DbBuild>
          <Count>22974581</Count>
          <LastUpdate>2013/08/06 08:33</LastUpdate>
          <FieldList>
            (...)
            <Field>
              <Name>UID</Name>
              <FullName>UID</FullName>
              <Description>Unique number assigned to publication</Description>
              <TermCount>0</TermCount>
              <IsDate>N</IsDate>
              <IsNumerical>Y</IsNumerical>
              <SingleToken>Y</SingleToken>
              <Hierarchy>N</Hierarchy>
              <IsHidden>Y</IsHidden>
            </Field>
            <Field>
            (...)
  30. EInfo: statistics for PubMed (JSON)
      https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=pubmed&retmode=json
      {
        "header": {
          "type": "einfo",
          "version": "0.3"
        },
        "einforesult": {
          "dbinfo": {
            "dbname": "pubmed",
            "menuname": "PubMed",
            "description": "PubMed bibliographic record",
            "dbbuild": "Build160921-2207m.6",
            "count": "26470199",
            "lastupdate": "2016/09/22 16:32",
            "fieldlist": [
              {
                "name": "ALL",
                "fullname": "All Fields",
                "description": "All terms from all searchable fields",
                "termcount": "179424126",
                "isdate": "N",
                "isnumerical": "N",
                "singletoken": "N",
                "hierarchy": "N",
                "ishidden": "N"
              },
              {
                "name": "UID",
                "fullname": "UID",
                "description": "Unique number assigned to publication",
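Note that in the JSON answer the counts are strings (`"count": "26470199"`), so they must be converted before any arithmetic. A minimal sketch that reads the record count and the non-hidden search fields; the embedded JSON is abbreviated, with the `UID` flags taken from the XML version of the same answer, and `pubmed_stats` is a hypothetical name.

```python
import json

# Abbreviated einfo?db=pubmed&retmode=json answer; "UID" flags copied from the XML slide.
EINFO_PUBMED = """{"einforesult": {"dbinfo": {
  "dbname": "pubmed",
  "count": "26470199",
  "fieldlist": [
    {"name": "ALL", "fullname": "All Fields", "isdate": "N", "ishidden": "N"},
    {"name": "UID", "fullname": "UID", "isdate": "N", "ishidden": "Y"}
  ]}}}"""


def pubmed_stats(text: str):
    """Return (record count as int, list of (name, fullname) for non-hidden fields)."""
    info = json.loads(text)["einforesult"]["dbinfo"]
    count = int(info["count"])  # JSON carries the count as a string
    visible = [(f["name"], f["fullname"]) for f in info["fieldlist"] if f["ishidden"] != "Y"]
    return count, visible


count, visible = pubmed_stats(EINFO_PUBMED)
print(count, visible)  # 26470199 [('ALL', 'All Fields')]
```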
  31. EInfo with Entrez Direct
      $ einfo -dbs
      $ einfo -db pubmed
  32. GQuery
  33. GQuery
      Provides the number of records retrieved in all Entrez databases by a single text query.
  34. GQuery example
      $ curl "https://eutils.ncbi.nlm.nih.gov/gquery?term=tyrannosaurus%20rex&retmode=xml"
      <Result>
        <Term>tyrannosaurus rex</Term>
        <eGQueryResult>
          <ResultItem><DbName>pubmed</DbName><MenuName/><Count>41</Count><Status>Ok</Status></ResultItem>
          <ResultItem><DbName>pmc</DbName><MenuName/><Count>160</Count><Status>Ok</Status></ResultItem>
          <ResultItem><DbName>mesh</DbName><MenuName/><Count>15</Count><Status>Ok</Status></ResultItem>
          <ResultItem><DbName>books</DbName><MenuName/><Count>179</Count><Status>Ok</Status></ResultItem>
          <ResultItem><DbName>pubmedhealth</DbName><MenuName/><Count>21</Count><Status>Ok</Status></ResultItem>
          <ResultItem><DbName>omim</DbName><MenuName/><Count>10</Count><Status>Ok</Status></ResultItem>
          <ResultItem><DbName>omia</DbName><MenuName/><Count>0</Count><Status>Term or Database is not found</Status></ResultItem>
          <ResultItem><DbName>ncbisearch</DbName><MenuName/><Count>1</Count><Status>Ok</Status></ResultItem>
          <ResultItem><DbName>nuccore</DbName><MenuName/><Count>0</Count><Status>Term or Database is not found</Status></ResultItem>
          (...)
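A minimal sketch of ranking the databases that actually matched, keeping only items whose `Status` is `Ok`. The XML is an abbreviated copy of the answer above, and `hits` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the GQuery answer for "tyrannosaurus rex".
GQUERY_XML = """<Result>
<Term>tyrannosaurus rex</Term>
<eGQueryResult>
<ResultItem><DbName>pubmed</DbName><MenuName/><Count>41</Count><Status>Ok</Status></ResultItem>
<ResultItem><DbName>pmc</DbName><MenuName/><Count>160</Count><Status>Ok</Status></ResultItem>
<ResultItem><DbName>omia</DbName><MenuName/><Count>0</Count><Status>Term or Database is not found</Status></ResultItem>
</eGQueryResult>
</Result>"""


def hits(xml_text: str) -> list:
    """Return (database, count) pairs with Status 'Ok', largest count first."""
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.findall("eGQueryResult/ResultItem"):
        if item.findtext("Status") == "Ok":
            rows.append((item.findtext("DbName"), int(item.findtext("Count"))))
    return sorted(rows, key=lambda row: row[1], reverse=True)


print(hits(GQUERY_XML))  # [('pmc', 160), ('pubmed', 41)]
```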
  35. GQuery: transforming to HTML using XSLT
      The XSLT stylesheet:
      https://raw.githubusercontent.com/lindenb/courses/master/about.ncbi/gquery2html.xsl
      <?xml version='1.0' encoding="UTF-8" ?>
      <xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>
      <xsl:output method="html"/>

      <xsl:template match="/"><html><body>
      <xsl:apply-templates select="Result"/>
      </body></html></xsl:template>

      <xsl:template match="Result">
      <table><caption><xsl:value-of select="Term"/></caption>
      <tr><th>Database</th><th>Count</th><th>Status</th></tr>
      <xsl:apply-templates select="eGQueryResult/ResultItem"/>
      </table>
      </xsl:template>

      <xsl:template match="ResultItem">
      <tr>
      <td><a>
      <xsl:attribute name="href">http://www.ncbi.nlm.nih.gov/<xsl:value-of select="DbName"/>?cmd=search&amp;term=<xsl:value-of select="translate(/Result/Term,' ','+')"/></xsl:attribute>
      <xsl:value-of select="DbName"/></a></td>
      <td><xsl:value-of select="Count"/></td>
      <td><xsl:value-of select="Status"/></td>
      </tr>
      </xsl:template>

      </xsl:stylesheet>
  36. GQuery: transforming to HTML
      $ curl "https://eutils.ncbi.nlm.nih.gov/gquery?term=tyrannosaurus%20rex&retmode=xml" | xsltproc gquery2html.xsl -
      <html>
      <body>
      <table>
      <caption>tyrannosaurus rex</caption>
      <tr>
      <th>Database</th>
      <th>Count</th>
      <th>Status</th>
      </tr>
      <tr>
      <td><a href="https://www.ncbi.nlm.nih.gov/pubmed?cmd=search&amp;term=tyrannosaurus
      </td>
      <td>41</td>
      <td>Ok</td>
      </tr>
      <tr>
      <td><a href="https://www.ncbi.nlm.nih.gov/pmc?cmd=search&amp;term=tyrannosaurus+re
      </td>
      <td>160</td>
      <td>Ok</td>
      </tr>
      <tr>
      <td><a href="https://www.ncbi.nlm.nih.gov/mesh?cmd=search&amp;term=tyrannosaurus+r
      </td>
      <td>15</td>
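When `xsltproc` is not available, the same transformation can be approximated in a few lines of Python. This is a sketch that mirrors `gquery2html.xsl` (one row per `ResultItem`, spaces in the term replaced by `+` as `translate(/Result/Term,' ','+')` does); it is not the author's tool, and `gquery_to_html` is a hypothetical name. `html.escape` turns the `&` of the link into `&amp;`, matching the xsltproc output.

```python
import html
import xml.etree.ElementTree as ET

# Abbreviated GQuery answer (two databases only).
GQUERY_XML = """<Result><Term>tyrannosaurus rex</Term><eGQueryResult>
<ResultItem><DbName>pubmed</DbName><MenuName/><Count>41</Count><Status>Ok</Status></ResultItem>
<ResultItem><DbName>pmc</DbName><MenuName/><Count>160</Count><Status>Ok</Status></ResultItem>
</eGQueryResult></Result>"""


def gquery_to_html(xml_text: str) -> str:
    """Build an HTML table with one row per ResultItem, like gquery2html.xsl."""
    root = ET.fromstring(xml_text)
    term = root.findtext("Term", default="")
    query = term.replace(" ", "+")  # same as translate(/Result/Term,' ','+')
    lines = ["<html><body>",
             "<table><caption>%s</caption>" % html.escape(term),
             "<tr><th>Database</th><th>Count</th><th>Status</th></tr>"]
    for item in root.findall("eGQueryResult/ResultItem"):
        db = item.findtext("DbName", default="")
        href = "http://www.ncbi.nlm.nih.gov/%s?cmd=search&term=%s" % (db, query)
        lines.append('<tr><td><a href="%s">%s</a></td><td>%s</td><td>%s</td></tr>' % (
            html.escape(href), html.escape(db),
            html.escape(item.findtext("Count", default="")),
            html.escape(item.findtext("Status", default=""))))
    lines += ["</table>", "</body></html>"]
    return "\n".join(lines)


print(gquery_to_html(GQUERY_XML))
```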
  37. ESearch
  38. ESearch
      - Provides a list of UIDs matching a text query
      - Posts the results of a search on the History server
      - Downloads all UIDs from a dataset stored on the History server
      - Combines or limits UID datasets stored on the History server
      - Sorts sets of UIDs
  39. ESearch syntax
      Base URL: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi
  40. ESearch: searching for 'Mammuthus primigenius'
      curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D" | xmllint --format -
      <eSearchResult>
        <Count>684</Count>
        <RetMax>20</RetMax>
        <RetStart>0</RetStart>
        <IdList>
          <Id>507866428</Id>
          <Id>124056416</Id>
          <Id>383843869</Id>
          <Id>383843867</Id>
          <Id>383843865</Id>
          <Id>383843863</Id>
          <Id>383843861</Id>
          <Id>383843859</Id>
          <Id>383843857</Id>
          <Id>383843855</Id>
          <Id>383843853</Id>
          <Id>383843851</Id>
          <Id>383843849</Id>
          <Id>383843847</Id>
          <Id>383843845</Id>
          <Id>157367690</Id>
          <Id>157367676</Id>
          <Id>157367662</Id>
          <Id>157367648</Id>
          <Id>157367634</Id>
        </IdList>
        <TranslationSet>
          <Translation>
  41. ESearch: searching for 'Mammuthus primigenius' (JSON)
      curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D&retmode=json"
      {
        "header": {
          "type": "esearch",
          "version": "0.3"
        },
        "esearchresult": {
          "count": "811",
          "retmax": "20",
          "retstart": "0",
          "idlist": [
            "1059791223", "198241525", "198241523", "198241521", "198241519",
            "198241517", "198241515", "198241513", "198241511", "198241509",
            "198241507", "198241505", "198241503", "198241501", "198241499",
            "198241497", "198241495", "198241493", "198241491",
  42. ESearch: the retmax parameter
      curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D&retmax=2" | xmllint --format -
      <eSearchResult>
        <Count>684</Count>
        <RetMax>2</RetMax>
        <RetStart>0</RetStart>
        <IdList>
          <Id>507866428</Id>
          <Id>124056416</Id>
        </IdList>
        <TranslationSet>
          <Translation>
            <From>"Mammuthus primigenius"[ORGN]</From>
            <To>"Mammuthus primigenius"[Organism]</To>
          </Translation>
        </TranslationSet>
        <TranslationStack>
          <TermSet>
            <Term>"Mammuthus primigenius"[Organism]</Term>
            <Field>Organism</Field>
            <Count>684</Count>
            <Explode>Y</Explode>
          </TermSet>
          <OP>GROUP</OP>
        </TranslationStack>
        <QueryTranslation>"Mammuthus primigenius"[Organism]</QueryTranslation>
      </eSearchResult>
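A minimal sketch of extracting the total count, the UIDs of the current page and the query translation from an ESearch answer. The XML is an abbreviated copy of the `retmax=2` answer above, and `parse_esearch` is a hypothetical name. `findtext("Count")` only looks at direct children of the root, so it does not pick up the second `<Count>` nested in `TranslationStack/TermSet`.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the retmax=2 answer (TranslationSet/TranslationStack omitted).
ESEARCH_XML = """<eSearchResult>
  <Count>684</Count>
  <RetMax>2</RetMax>
  <RetStart>0</RetStart>
  <IdList>
    <Id>507866428</Id>
    <Id>124056416</Id>
  </IdList>
  <QueryTranslation>"Mammuthus primigenius"[Organism]</QueryTranslation>
</eSearchResult>"""


def parse_esearch(text: str):
    """Return (total hit count, UIDs in this page, query as translated by NCBI)."""
    root = ET.fromstring(text)
    count = int(root.findtext("Count"))  # direct child only
    ids = [e.text for e in root.findall("IdList/Id")]
    return count, ids, root.findtext("QueryTranslation")


count, ids, translation = parse_esearch(ESEARCH_XML)
print(count, ids, translation)
```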
  43. ESearch: the retstart parameter
      curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D&retmax=3&retstart=100" | xmllint --format -
      <eSearchResult>
        <Count>684</Count>
        <RetMax>3</RetMax>
        <RetStart>100</RetStart>
        <IdList>
          <Id>300810656</Id>
          <Id>300810655</Id>
          <Id>300810654</Id>
        </IdList>
        <TranslationSet>
          <Translation>
            <From>"Mammuthus primigenius"[ORGN]</From>
            <To>"Mammuthus primigenius"[Organism]</To>
          </Translation>
        </TranslationSet>
        <TranslationStack>
          <TermSet>
            <Term>"Mammuthus primigenius"[Organism]</Term>
            <Field>Organism</Field>
            <Count>684</Count>
            <Explode>Y</Explode>
          </TermSet>
          <OP>GROUP</OP>
        </TranslationStack>
        <QueryTranslation>"Mammuthus primigenius"[Organism]</QueryTranslation>
      </eSearchResult>
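Combining `retmax` and `retstart` gives a simple way to page through all hits: first read `Count`, then request consecutive windows. A minimal sketch that only builds the URLs (no network access); `page_urls` is a hypothetical name and the page size of 200 is an arbitrary example value.

```python
from urllib.parse import quote, urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def page_urls(db: str, term: str, count: int, page_size: int = 100) -> list:
    """One ESearch URL per window [retstart, retstart + page_size) covering `count` hits."""
    urls = []
    for start in range(0, count, page_size):
        params = {"db": db, "term": term, "retstart": start, "retmax": page_size}
        urls.append(ESEARCH + "?" + urlencode(params, quote_via=quote))
    return urls


# 684 hits, as reported by <Count> above, in windows of 200.
urls = page_urls("nucleotide", '"Mammuthus primigenius"[ORGN]', count=684, page_size=200)
print(len(urls))  # 4
print(urls[-1])
```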
  44. ESearch: rettype=count
      curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D&rettype=count" | xmllint --format -
      <eSearchResult>
        <Count>684</Count>
      </eSearchResult>
  45. ESearch: sort=Date Released
      curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D&sort=Date+Released" | xmllint --format -
      <eSearchResult><Count>811</Count><RetMax>20</RetMax>
          <Id>1033204644</Id>
          <Id>1033204658</Id>
          <Id>1033204672</Id>
          <Id>1033204686</Id>
          <Id>1033204729</Id>
          <Id>1033204771</Id>
          <Id>1033204785</Id>
          <Id>1033204799</Id>
          <Id>1033204813</Id>
          <Id>1033204827</Id>
          <Id>1033204871</Id>
          <Id>1033205124</Id>
          <Id>1033205194</Id>
  46. ESummary
  47. ESummary
      - Returns document summaries (DocSums) for a list of input UIDs
      - Returns DocSums for a set of UIDs stored on the Entrez History server
  48. ESummary syntax
      Base URL: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=(DB)&id=(TERM)
  49. ESummary: retrieve nucleotide gi=507866428
      $ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=nucleotide&id=507866428"
      <eSummaryResult>
      <DocSum>
        <Id>507866428</Id>
        <Item Name="Caption" Type="String">KC524742</Item>
        <Item Name="Title" Type="String">Mammuthus primigenius isolate CME2005/915 myoglobin (Mb
        <Item Name="Extra" Type="String">gi|507866428|gb|KC524742.1|[507866428]</Item>
        <Item Name="Gi" Type="Integer">507866428</Item>
        <Item Name="CreateDate" Type="String">2013/06/15</Item>
        <Item Name="UpdateDate" Type="String">2013/06/21</Item>
        <Item Name="Flags" Type="Integer">0</Item>
        <Item Name="TaxId" Type="Integer">37349</Item>
        <Item Name="Length" Type="Integer">9042</Item>
        <Item Name="Status" Type="String">live</Item>
        <Item Name="ReplacedBy" Type="String"></Item>
        <Item Name="Comment" Type="String"><![CDATA[ ]]></Item>
      </DocSum>
      </eSummaryResult>
  50. ESummary: retrieve nucleotide gi=507866428 in JSON
      $ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=nucleotide&id=507866428&retmode=json"
      {
        "header": {
          "type": "esummary",
          "version": "0.3"
        },
        "result": {
          "uids": [
            "507866428"
          ],
          "507866428": {
            "uid": "507866428",
            "caption": "KC524742",
            "title": "Mammuthus primigenius isolate CME2005/915 myoglobin (Mb) gene, par
            "extra": "gi|507866428|gb|KC524742.1|",
            "gi": 507866428,
            "createdate": "2013/06/15",
            "updatedate": "2013/06/21",
            "flags": "",
            "taxid": 37349,
            "slen": 9042,
            "biomol": "genomic",
            "moltype": "dna",
            "topology": "linear",
            "sourcedb": "insd",
            "segsetsize": "",
            "projectid": "0",
            (...)
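In the JSON answer, `result.uids` lists the requested UIDs, and each summary is stored under a key equal to its UID. A minimal sketch that walks the summaries in that order; the JSON is an abbreviated copy of the answer above and `docsums` is a hypothetical name.

```python
import json

# Abbreviated esummary retmode=json answer for nucleotide 507866428.
ESUMMARY_JSON = """{"header": {"type": "esummary", "version": "0.3"},
 "result": {"uids": ["507866428"],
  "507866428": {"uid": "507866428", "caption": "KC524742",
                "extra": "gi|507866428|gb|KC524742.1|", "taxid": 37349, "slen": 9042}}}"""


def docsums(text: str) -> list:
    """Return the summary records in the order given by result.uids."""
    result = json.loads(text)["result"]
    return [result[uid] for uid in result["uids"]]


for doc in docsums(ESUMMARY_JSON):
    print(doc["caption"], doc["taxid"], doc["slen"])  # KC524742 37349 9042
```

Iterating over `uids` rather than over all keys of `result` avoids treating the `uids` list itself as a record.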
  51. ESummary: retrieve snp rs25
      $ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=snp&id=25"
      <eSummaryResult>
      <DocSum>
        <Id>25</Id>
        <Item Name="SNP_ID" Type="Integer">25</Item>
        <Item Name="Organism" Type="String"></Item>
        <Item Name="ALLELE_ORIGIN" Type="String"></Item>
        <Item Name="GLOBAL_MAF" Type="String">0.4913</Item>
        <Item Name="GLOBAL_POPULATION" Type="String"></Item>
        <Item Name="GLOBAL_SAMPLESIZE" Type="Integer">0</Item>
        <Item Name="SUSPECTED" Type="String"></Item>
        <Item Name="CLINICAL_SIGNIFICANCE" Type="String"></Item>
        <Item Name="GENE" Type="String">THSD7A</Item>
        <Item Name="LOCUS_ID" Type="Integer">221981</Item>
        <Item Name="ACC" Type="String">NM_015204.2,NT_007819.17</Item>
        <Item Name="CHR" Type="String">7</Item>
        <Item Name="WEIGHT" Type="Integer">1</Item>
        <Item Name="HANDLE" Type="String">1000GENOMES,BGI,BL,BUSHMAN,COMPLETE_GENOMICS,CSHL-HAPM
        <Item Name="FXN_CLASS" Type="String">intron-variant</Item>
        <Item Name="VALIDATED" Type="String">by-1000G,by-cluster,by-frequency,by-hapmap</Item>
        <Item Name="GTYPE" Type="String">true</Item>
        <Item Name="NONREF" Type="String">false</Item>
        <Item Name="DOCSUM" Type="String">HGVS=NC_000007.13:g.11584142T&gt;C,NG_027670.1:g.29268
        <Item Name="HET" Type="Integer">50</Item>
        <Item Name="SRATE" Type="Integer">0</Item>
        <Item Name="TAX_ID" Type="Integer">9606</Item>
        <Item Name="CHRRPT" Type="String">25|2|0|1|1|1|7|NT_007819.17|11574141|11584142|THSD7A|0
        <Item Name="ORIG_BUILD" Type="Integer">36</Item>
        <Item Name="UPD_BUILD" Type="Integer">138</Item>
        <Item Name="CREATEDATE" Type="String">2000-09-19 17:02</Item>
  52. ESummary: retrieve pubmed pmid=7939126
      $ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=7939126"
      <eSummaryResult>
      <DocSum>
        <Id>7939126</Id>
        <Item Name="PubDate" Type="Date">1994 Apr</Item>
        <Item Name="EPubDate" Type="Date"></Item>
        <Item Name="Source" Type="String">Sleep</Item>
        <Item Name="AuthorList" Type="List">
          <Item Name="Author" Type="String">Broughton R</Item>
          <Item Name="Author" Type="String">Billings R</Item>
          <Item Name="Author" Type="String">Cartwright R</Item>
          <Item Name="Author" Type="String">Doucette D</Item>
          <Item Name="Author" Type="String">Edmeads J</Item>
          <Item Name="Author" Type="String">Edwardh M</Item>
          <Item Name="Author" Type="String">Ervin F</Item>
          <Item Name="Author" Type="String">Orchard B</Item>
          <Item Name="Author" Type="String">Hill R</Item>
          <Item Name="Author" Type="String">Turrell G</Item>
        </Item>
        <Item Name="LastAuthor" Type="String">Turrell G</Item>
        <Item Name="Title" Type="String">Homicidal somnambulism: a case report.</Item>
        <Item Name="Volume" Type="String">17</Item>
        <Item Name="Issue" Type="String">3</Item>
        <Item Name="Pages" Type="String">253-64</Item>
        <Item Name="LangList" Type="List">
          <Item Name="Lang" Type="String">English</Item>
        </Item>
        <Item Name="NlmUniqueID" Type="String">7809084</Item>
        <Item Name="ISSN" Type="String">0161-8105</Item>
        <Item Name="ESSN" Type="String">1550-9109</Item>
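DocSum `Item` elements can nest: `Type="List"` items such as `AuthorList` contain further `Item` children. A minimal recursive sketch that turns each DocSum into a dict, mapping `List` items to Python lists and `Integer` items to `int`; the XML is an abbreviated copy of the PubMed answer above (two of the ten authors), and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Abbreviated esummary answer for pmid 7939126.
ESUMMARY_XML = """<eSummaryResult><DocSum>
<Id>7939126</Id>
<Item Name="PubDate" Type="Date">1994 Apr</Item>
<Item Name="Source" Type="String">Sleep</Item>
<Item Name="AuthorList" Type="List">
  <Item Name="Author" Type="String">Broughton R</Item>
  <Item Name="Author" Type="String">Billings R</Item>
</Item>
<Item Name="Volume" Type="String">17</Item>
</DocSum></eSummaryResult>"""


def item_value(item):
    """Convert one <Item>: List -> list (recursively), Integer -> int, else text."""
    kind = item.get("Type")
    if kind == "List":
        return [item_value(child) for child in item.findall("Item")]
    if kind == "Integer":
        return int(item.text) if item.text else None
    return item.text or ""


def parse_docsums(xml_text: str) -> dict:
    """Map each DocSum <Id> to a dict of its top-level Items keyed by Name."""
    docs = {}
    for docsum in ET.fromstring(xml_text).findall("DocSum"):
        fields = {item.get("Name"): item_value(item) for item in docsum.findall("Item")}
        docs[docsum.findtext("Id")] = fields
    return docs


doc = parse_docsums(ESUMMARY_XML)["7939126"]
print(doc["Source"], doc["AuthorList"])  # Sleep ['Broughton R', 'Billings R']
```

Keying by `Name` works for top-level items here because each name appears once; repeated names only occur inside `List` items, which become lists.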
  53. 53. EFetch
  54. 54. EFetch syntax
      Base URL: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=(db)&id=(ID)
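The EFetch URLs on the following slides differ only in their rettype and retmode parameters. A small sketch of a URL builder, assuming that leaving a parameter out lets the server apply its default (ASN.1 for nucleotide records, as the next slide shows); EfetchUrl and build are illustrative names:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Sketch: assemble an EFetch URL from db, id and optional rettype/retmode.
// Parameters passed as null are omitted from the query string.
class EfetchUrl {
    static final String BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi";

    static String build(String db, String id, String rettype, String retmode) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("db", db);
        params.put("id", id);
        if (rettype != null) params.put("rettype", rettype);
        if (retmode != null) params.put("retmode", retmode);
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> p : params.entrySet()) {
            query.add(p.getKey() + "=" + URLEncoder.encode(p.getValue(), StandardCharsets.UTF_8));
        }
        return BASE + "?" + query;
    }

    public static void main(String[] args) {
        System.out.println(build("nucleotide", "507866428", null, null));     // ASN.1 (default)
        System.out.println(build("nucleotide", "507866428", "fasta", null));  // FASTA
        System.out.println(build("nucleotide", "507866428", "fasta", "xml")); // TinySeq XML
        System.out.println(build("nucleotide", "507866428", null, "xml"));    // GenBank XML
        System.out.println(build("nucleotide", "KC524742", "gb", null));      // GenBank flat file
    }
}
```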
  55. 55. EFetch: retrieve nucleotide gi=507866428 as ASN.1 (default)
      https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=507866428
      Seq-entry ::= set {
        class nuc-prot ,
        descr {
          source {
            genome genomic ,
            org {
              taxname "Mammuthus primigenius" ,
              common "woolly mammoth" ,
              db {
                {
                  db "taxon" ,
                  tag id 37349 } } ,
              orgname {
                name binomial {
                  genus "Mammuthus" ,
                  species "primigenius" } ,
                mod {
                  { (...)
  56. 56. EFetch: retrieve nucleotide gi=507866428 as FASTA
      https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=507866428&rettype=fasta
      >gi|507866428|gb|KC524742.1| Mammuthus primigenius isolate CME2005/915 myoglobin (Mb) gene, partial cds
      GCACTTGCTTTTTTTGTCTTCTTCAGACCACGACATGGGACTCAGCGACGGGGAATGGGAGTTGGTGTTG
      AAAACCTGGGGGAAAGTGGAGGCTGACATCCCGGGCCATGGGCTGGAAGTCTTCGTCAGGTAAAGGAAGA
      AATCCTGTGGCCCCCATCACCCACCCCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
      NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
      NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
      NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
      NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
      NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
      NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
      (...)
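The FASTA answer wraps the sequence over several lines and, for this record, contains long runs of the ambiguity code N. A minimal sketch (FastaStats is a hypothetical class; it reads only the first record of its input) that joins the sequence lines and counts the N positions:

```java
// Sketch: parse the first record of a FASTA text, join its wrapped
// sequence lines and count the 'N' (or 'n') positions.
class FastaStats {
    final String header;
    final String sequence;

    FastaStats(String fasta) {
        String[] lines = fasta.split("\\R"); // split on any line break
        if (lines.length == 0 || !lines[0].startsWith(">")) {
            throw new IllegalArgumentException("not a FASTA record");
        }
        header = lines[0].substring(1);
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].startsWith(">")) break; // stop at the next record
            sb.append(lines[i].trim());
        }
        sequence = sb.toString();
    }

    long countN() {
        return sequence.chars().filter(c -> c == 'N' || c == 'n').count();
    }

    public static void main(String[] args) {
        FastaStats f = new FastaStats(">gi|507866428|gb|KC524742.1| Mammuthus primigenius\nGCACTTNNNN\nNNACGT\n");
        System.out.println(f.sequence.length() + " bp, " + f.countN() + " N"); // 16 bp, 6 N
    }
}
```

Piping the real EFetch answer into such a class would show how much of this partial myoglobin record is unresolved sequence.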
  57. 57. EFetch: retrieve nucleotide gi=507866428 as TinySeq
      https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=507866428&rettype=fasta&retmode=xml
      <?xml version="1.0"?>
      <!DOCTYPE TSeqSet PUBLIC "-//NCBI//NCBI TSeq/EN" (...)
      <TSeqSet>
      <TSeq>
        <TSeq_seqtype value="nucleotide"/>
        <TSeq_gi>507866428</TSeq_gi>
        <TSeq_accver>KC524742.1</TSeq_accver>
        <TSeq_taxid>37349</TSeq_taxid>
        <TSeq_orgname>Mammuthus primigenius</TSeq_orgname>
        <TSeq_defline>Mammuthus primigenius isolate CME2 (...)
        <TSeq_length>9042</TSeq_length>
        <TSeq_sequence>GCACTTGCTTTTTTTGTCTTCTTCAGACCACGA (...)
      </TSeq>
      </TSeqSet>
  58. 58. EFetch: retrieve nucleotide gi=507866428 as GenBank XML
      https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=507866428&retmode=xml
      <GBSeq>
        <GBSeq_locus>KC524742</GBSeq_locus>
        <GBSeq_length>9042</GBSeq_length>
        <GBSeq_strandedness>double</GBSeq_strandedness>
        <GBSeq_moltype>DNA</GBSeq_moltype>
        <GBSeq_topology>linear</GBSeq_topology>
        <GBSeq_division>MAM</GBSeq_division>
        <GBSeq_update-date>21-JUN-2013</GBSeq_update-date>
        <GBSeq_create-date>15-JUN-2013</GBSeq_create-date>
        <GBSeq_definition>Mammuthus primigenius isolate CME2005/915 myoglobin (Mb) gene, parti (...)
        <GBSeq_primary-accession>KC524742</GBSeq_primary-accession>
        <GBSeq_accession-version>KC524742.1</GBSeq_accession-version>
        <GBSeq_other-seqids>
          <GBSeqid>gb|KC524742.1|</GBSeqid>
          <GBSeqid>gi|507866428</GBSeqid>
        </GBSeq_other-seqids>
        <GBSeq_source>Mammuthus primigenius (woolly mammoth)</GBSeq_source>
        <GBSeq_organism>Mammuthus primigenius</GBSeq_organism>
        (...)
  59. 59. EFetch: retrieve nucleotide gi=507866428 as GenBank
      https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=507866428&rettype=gb
      LOCUS       KC524742                9042 bp    DNA     linear   MAM 21-JUN-2013
      DEFINITION  Mammuthus primigenius isolate CME2005/915 myoglobin (Mb) gene, partial cds.
      ACCESSION   KC524742
      VERSION     KC524742.1  GI:507866428
      KEYWORDS    .
      SOURCE      Mammuthus primigenius (woolly mammoth)
        ORGANISM  Mammuthus primigenius
                  Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Afrotheria; Proboscidea; Elephantidae; Mammuthus.
      REFERENCE   1  (bases 1 to 9042)
        AUTHORS   Mirceta,S., Signore,A.V., Burns,J.M., Cossins,A.R., Campbell,K.L. and Berenbrink,M.
        TITLE     Evolution of mammalian diving capacity traced by myoglobin net surface charge
        JOURNAL   Science 340 (6138), 1234192 (2013)
         PUBMED   23766330
      REFERENCE   2  (bases 1 to 9042)
        AUTHORS   Signore,A.V., Campbell,K.L. and Poinar,H.N.
        TITLE     Direct Submission
        JOURNAL   Submitted (09-JAN-2013) Biological Sciences, University of Manitoba, 50 Sifton Road, Winnipeg, Manitoba R3T2N2, Canada
      COMMENT     ##Assembly-Data-START##
                  Sequencing Technology :: Sanger dideoxy sequencing
                  ##Assembly-Data-END##
      FEATURES             Location/Qualifiers
           source          1..9042
                           /organism="Mammuthus primigenius"
      (...)
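The LOCUS line at the top of the GenBank flat file packs the record name, length, molecule type, topology, division and date. A sketch that splits it on whitespace, assuming every field is present (LocusLine is a hypothetical name). Protein records say "aa" instead of "bp" and are rejected here, and robust parsers usually rely on the fixed column layout rather than whitespace:

```java
// Sketch: whitespace-split a GenBank LOCUS line, assuming all eight
// tokens (LOCUS, name, length, "bp", molecule, topology, division, date).
class LocusLine {
    final String name;
    final int length;
    final String molType;
    final String topology;
    final String division;
    final String date;

    LocusLine(String line) {
        String[] f = line.trim().split("\\s+");
        if (f.length < 8 || !f[0].equals("LOCUS") || !f[3].equals("bp")) {
            throw new IllegalArgumentException("unexpected LOCUS line: " + line);
        }
        name = f[1];
        length = Integer.parseInt(f[2]);
        molType = f[4];
        topology = f[5];
        division = f[6];
        date = f[7];
    }

    public static void main(String[] args) {
        LocusLine l = new LocusLine("LOCUS       KC524742                9042 bp    DNA     linear   MAM 21-JUN-2013");
        System.out.println(l.name + " " + l.length + " " + l.molType + " " + l.division);
    }
}
```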
  60. 60. EFetch also works with accession numbers
      https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=KC524742&rettype=gb
      LOCUS       KC524742                9042 bp    DNA     linear   MAM 21-JUN-2013
      DEFINITION  Mammuthus primigenius isolate CME2005/915 myoglobin (Mb) gene, partial cds.
      ACCESSION   KC524742
      VERSION     KC524742.1  GI:507866428
      (...) the rest of the record is identical to the one retrieved with gi=507866428 on the previous slide.
  61. 61. EFetch: using the WebEnv parameter
      WebEnv is the Web environment string returned by a previous ESearch, EPost or ELink call. When it is given to ESearch, the results of the search are posted to that pre-existing WebEnv; when it is given to EFetch together with query_key, EFetch retrieves the records stored under that key.
  62. 62. EFetch: using the WebEnv parameter
      Searching for extinct species in the NCBI taxonomy ('extinct[PROP]'):
      $ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?usehistory=y&db=taxonomy&term=extinct%5BPROP%5D"
      <eSearchResult>
        <Count>145</Count>
        <RetMax>20</RetMax>
        <RetStart>0</RetStart>
        <QueryKey>1</QueryKey>
        <WebEnv>NCID_1_75550312_130.14.18.34_9001_1375948145_325582538</WebEnv>
        <IdList>
          <Id>1225531</Id> <Id>1225530</Id> <Id>1211276</Id> <Id>1211275</Id> <Id>1027716</Id>
          <Id>948961</Id> <Id>943952</Id> <Id>867394</Id> <Id>867393</Id> <Id>748142</Id>
          <Id>748141</Id> <Id>741158</Id> <Id>703576</Id> <Id>703571</Id> <Id>703559</Id>
          <Id>693865</Id> <Id>686441</Id> <Id>665113</Id> <Id>659069</Id> <Id>656807</Id>
      (...)
  63. 63. EFetch: using the WebEnv parameter
      Fetch the extinct species in the NCBI taxonomy ('extinct[PROP]') using the WebEnv parameter:
      $ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=taxonomy&query_key=1&WebEnv=NCID_1_75550312_130.14.18.34_9001_1375948145_325582538&retmode=xml"
      <TaxaSet><Taxon>
        <TaxId>1225531</TaxId>
        <ScientificName>Equus ovodovi</ScientificName>
        <OtherNames>
          <Synonym>Equus (Sussemionus) ovodovi</Synonym>
          <Name>
            <ClassCDE>authority</ClassCDE>
            <DispName>Equus ovodovi Eisenmann &amp; Sergej, 2011</DispName>
          </Name>
        </OtherNames>
        <ParentTaxId>1225530</ParentTaxId>
        <Rank>species</Rank>
        <Division>Mammals</Division>
        <GeneticCode>
          <GCId>1</GCId>
          <GCName>Standard</GCName>
        </GeneticCode>
        <MitoGeneticCode>
      (....)
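The two calls above are linked only through the WebEnv and QueryKey values of the first answer. A hedged sketch (HistoryFetch and efetchUrl are illustrative names) that extracts them with XPath and builds the history-based EFetch URL; the WebEnv is URL-encoded defensively, which leaves NCID_... strings like the one shown here unchanged:

```java
import java.io.StringReader;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Sketch: read WebEnv and QueryKey from an ESearch answer obtained with
// usehistory=y and build the EFetch URL that retrieves the stored set.
class HistoryFetch {
    static String text(String xml, String path) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(path, new InputSource(new StringReader(xml))).trim();
        } catch (Exception ex) {
            throw new IllegalStateException("cannot evaluate " + path, ex);
        }
    }

    static String efetchUrl(String db, String esearchXml, String retmode) {
        String webEnv = text(esearchXml, "/eSearchResult/WebEnv");
        String queryKey = text(esearchXml, "/eSearchResult/QueryKey");
        if (webEnv.isEmpty() || queryKey.isEmpty()) {
            throw new IllegalArgumentException("no history: was usehistory=y set?");
        }
        return "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"
            + "?db=" + db
            + "&query_key=" + queryKey
            + "&WebEnv=" + URLEncoder.encode(webEnv, StandardCharsets.UTF_8)
            + "&retmode=" + retmode;
    }

    public static void main(String[] args) {
        String xml = "<eSearchResult><Count>145</Count><QueryKey>1</QueryKey>"
            + "<WebEnv>NCID_1_75550312_130.14.18.34_9001_1375948145_325582538</WebEnv>"
            + "</eSearchResult>";
        System.out.println(efetchUrl("taxonomy", xml, "xml"));
    }
}
```

WebEnv values expire on the NCBI side, so in practice the ESearch and EFetch calls are made back to back.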
  64. 64. EPOST
  65. 65. EPost
      - Uploads a list of UIDs to the Entrez History server
      - Appends a list of UIDs to an existing set of UID lists attached to a Web Environment
  66. 66. EPost: post UIDs to EPost
      Get the list of taxonomy UIDs of extinct animals:
      $ wget -O - 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=taxonomy&term=extinct[PROP]&retmax=1000' | xmllint --format - | grep '<Id>' | cut -d '<' -f 2 | cut -d '>' -f 2 | tr '\n' ','
      Output:
      1860150,1860149,1849957,1825730,1825729,1636722,1607772,1607771,1607767,1607757,1607756 (...)
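The xmllint | grep | cut | tr pipeline above depends on xmllint printing one Id element per line. A sketch of the same extraction done with the JDK's XPath API instead of line-based tools (EsearchIds and joinIds are hypothetical names); unlike tr, it leaves no trailing comma:

```java
import java.io.StringReader;
import java.util.StringJoiner;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch: join the <IdList>/<Id> values of an eSearchResult with commas,
// ready to be used as the id= parameter of EPost.
class EsearchIds {
    static String joinIds(String esearchXml) {
        try {
            NodeList ids = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                "/eSearchResult/IdList/Id",
                new InputSource(new StringReader(esearchXml)), XPathConstants.NODESET);
            StringJoiner joined = new StringJoiner(",");
            for (int i = 0; i < ids.getLength(); i++) {
                joined.add(ids.item(i).getTextContent().trim());
            }
            return joined.toString();
        } catch (Exception ex) {
            throw new IllegalStateException("cannot read IdList", ex);
        }
    }

    public static void main(String[] args) {
        String xml = "<eSearchResult><Count>3</Count><IdList>"
            + "<Id>1860150</Id><Id>1860149</Id><Id>1849957</Id>"
            + "</IdList></eSearchResult>";
        System.out.println(joinIds(xml)); // 1860150,1860149,1849957
    }
}
```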
  67. 67. EPost: post UIDs to EPost
      $ wget -O - 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi?db=taxonomy&WebEnv=NCID_1_15435144_130.14.22.215_9001_1474637318_669113391_0MetA0_S_MegaStore_F_1&id=1860150,1860149,1849957,1825730,1825729,1636722,1607772...'
      Output:
      <?xml version="1.0"?>
      <!DOCTYPE ePostResult PUBLIC "-//NLM//DTD ePostResult, 11 May 2002//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/ePost_020511.dtd">
      <ePostResult>
        <QueryKey>1</QueryKey>
        <WebEnv>NCID_1_15467192_130.14.22.215_9001_1474637456_570452194_0MetA0_S_MegaStore_F_1</WebEnv>
      </ePostResult>
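Long UID lists quickly make the GET URL above unwieldy; the E-utilities documentation recommends HTTP POST when more than about 200 UIDs are sent. A sketch using java.net.http (Java 11 or later) that only builds the POST request; EpostRequest is an illustrative name, and actually sending it would be HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()):

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Sketch: an EPost request whose UIDs travel in a form-encoded POST body.
// The request is only built here; nothing is sent over the network.
class EpostRequest {
    static final String EPOST = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi";

    static String formBody(String db, List<String> ids) {
        // URLEncoder turns the commas into %2C; form decoding restores them
        return "db=" + URLEncoder.encode(db, StandardCharsets.UTF_8)
            + "&id=" + URLEncoder.encode(String.join(",", ids), StandardCharsets.UTF_8);
    }

    static HttpRequest build(String db, List<String> ids) {
        return HttpRequest.newBuilder(URI.create(EPOST))
            .header("Content-Type", "application/x-www-form-urlencoded")
            .POST(HttpRequest.BodyPublishers.ofString(formBody(db, ids)))
            .build();
    }

    public static void main(String[] args) {
        List<String> ids = List.of("1860150", "1860149", "1849957");
        HttpRequest request = build("taxonomy", ids);
        System.out.println(request.method() + " " + request.uri());
        System.out.println(formBody("taxonomy", ids));
    }
}
```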
  68. 68. EPost: searching in the WebEnv
      Search Homo sapiens in the WebEnv?
      $ curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=taxonomy&term=Homo%20Sapiens&usehistory=y&WebEnv=NCID_1_75550312_130.14.18.34_9001_1375948145_325582538&query_key=1"
      <eSearchResult>
        <Count>0</Count>
        <RetMax>0</RetMax>
        <RetStart>0</RetStart>
        <QueryKey>8</QueryKey>
        <WebEnv>NCID_1_75550312_130.14.18.34_9001_1375948145_325582538</WebEnv>
        <IdList/>
        <TranslationSet/>
        <TranslationStack>
          <OP>GROUP</OP>
          <TermSet>
            <Term>homo sapiens[All Names]</Term>
            <Field>All Names</Field>
            <Count>0</Count>
            <Explode>N</Explode>
          </TermSet>
          <OP>AND</OP>
        </TranslationStack>
        <QueryTranslation>(#2) AND homo sapiens[All Names]</QueryTranslation>
      </eSearchResult>
  69. 69. EPost: searching in the WebEnv
      Search Tyrannosaurus in the WebEnv?
      $ curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=taxonomy&term=Tyrannosaurus&usehistory=y&WebEnv=NCID_1_75550312_130.14.18.34_9001_1375948145_325582538&query_key=1"
      <eSearchResult>
        <Count>1</Count>
        <RetMax>1</RetMax>
        <RetStart>0</RetStart>
        <QueryKey>9</QueryKey>
        <WebEnv>NCID_1_75550312_130.14.18.34_9001_1375948145_325582538</WebEnv>
        <IdList>
          <Id>436494</Id>
        </IdList>
        <TranslationSet/>
        <TranslationStack>
          <OP>GROUP</OP>
          <TermSet>
            <Term>Tyrannosaurus[All Names]</Term>
            <Field>All Names</Field>
            <Count>1</Count>
            <Explode>N</Explode>
          </TermSet>
          <OP>AND</OP>
        </TranslationStack>
        <QueryTranslation>(#2) AND Tyrannosaurus[All Names]</QueryTranslation>
      </eSearchResult>
  70. 70. EDirect: combining tools
  71. 71. Piping EDirect
      $ esearch -db taxonomy -query "Tyrannosaurus" | efetch -format xml
  72. 72. Piping EDirect
      $ esearch -db pubmed -query "Tyrannosaurus" | efilter -mindate 2005 | efetch -format docsum | xtract -pattern DocumentSummary -element MedlineCitation/PMID -element Id SortFirstAuthor
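xtract turns each DocumentSummary into one tab-separated row. A rough sketch of that idea for the Id and SortFirstAuthor columns, assuming each DocumentSummary carries them as descendant elements, as the command above expects. This is not EDirect's implementation, and XtractLike is a hypothetical name:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch: one "Id<TAB>SortFirstAuthor" row per <DocumentSummary>.
class XtractLike {
    // text of the first descendant named 'name', or "" when it is missing
    static String first(Element parent, String name) {
        NodeList nl = parent.getElementsByTagName(name);
        return nl.getLength() == 0 ? "" : nl.item(0).getTextContent().trim();
    }

    static List<String> rows(String xml) {
        List<String> rows = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList sums = doc.getElementsByTagName("DocumentSummary");
            for (int i = 0; i < sums.getLength(); i++) {
                Element sum = (Element) sums.item(i);
                rows.add(first(sum, "Id") + "\t" + first(sum, "SortFirstAuthor"));
            }
        } catch (Exception ex) {
            throw new IllegalStateException("cannot parse docsum XML", ex);
        }
        return rows;
    }

    public static void main(String[] args) {
        // hypothetical two-record docsum, for illustration only
        String xml = "<DocumentSummarySet>"
            + "<DocumentSummary><Id>1</Id><SortFirstAuthor>Author A</SortFirstAuthor></DocumentSummary>"
            + "<DocumentSummary><Id>2</Id><SortFirstAuthor>Author B</SortFirstAuthor></DocumentSummary>"
            + "</DocumentSummarySet>";
        for (String row : rows(xml)) System.out.println(row);
    }
}
```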
  73. 73. Elink
  74. 74. Elink
      - Returns UIDs linked to an input set of UIDs in either the same or a different Entrez database
      - Returns UIDs linked to other UIDs in the same Entrez database that match an Entrez query
      - Checks for the existence of Entrez links for a set of UIDs within the same database
      - Lists the available links for a UID
      - Lists LinkOut URLs and attributes for a set of UIDs
      - Lists hyperlinks to primary LinkOut providers for a set of UIDs
      - Creates hyperlinks to the primary LinkOut provider for a single UID
  75. 75. Elink
      Base URL: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi
  76. 76. ELink: searching the PubMed records associated with sequence gi:507866428
      https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=nucleotide&db=pubmed&id=507866428&cmd=neighbor_score
      <eLinkResult>
        <LinkSet>
          <DbFrom>nuccore</DbFrom>
          <IdList>
            <Id>507866428</Id>
          </IdList>
          <LinkSetDb>
            <DbTo>pubmed</DbTo>
            <LinkName>nuccore_pubmed</LinkName>
            <Link>
              <Id>23766330</Id>
              <Score>0</Score>
            </Link>
          </LinkSetDb>
        </LinkSet>
      </eLinkResult>
      $ curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=23766330&rettype=medline&retmode=text"
      PMID- 23766330
      TI  - Evolution of mammalian diving capacity traced by myoglobin net surface charge.
      PG  - 1234192
      LID - 10.1126/science.1234192 [doi]
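The ELink answer lists the source ID under IdList and the linked IDs under LinkSetDb/Link/Id, so the two must not be confused. A sketch (ElinkIds, url and linkedIds are illustrative names) that builds the neighbor_score URL used above and keeps only the Link IDs for a given LinkName such as nuccore_pubmed; the link name is pasted into the XPath as-is, so it should come from trusted code:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch: build an ELink URL and read the linked UIDs of one LinkName.
class ElinkIds {
    static String url(String dbfrom, String db, String id) {
        return "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=" + dbfrom
            + "&db=" + db + "&id=" + id + "&cmd=neighbor_score";
    }

    static List<String> linkedIds(String xml, String linkName) {
        List<String> ids = new ArrayList<>();
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                "/eLinkResult/LinkSet/LinkSetDb[LinkName='" + linkName + "']/Link/Id",
                new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            for (int i = 0; i < nodes.getLength(); i++) {
                ids.add(nodes.item(i).getTextContent().trim());
            }
        } catch (Exception ex) {
            throw new IllegalStateException("cannot read eLinkResult", ex);
        }
        return ids;
    }

    public static void main(String[] args) {
        // copy of the answer shown above
        String xml = "<eLinkResult><LinkSet><DbFrom>nuccore</DbFrom>"
            + "<IdList><Id>507866428</Id></IdList>"
            + "<LinkSetDb><DbTo>pubmed</DbTo><LinkName>nuccore_pubmed</LinkName>"
            + "<Link><Id>23766330</Id><Score>0</Score></Link></LinkSetDb>"
            + "</LinkSet></eLinkResult>";
        System.out.println(url("nucleotide", "pubmed", "507866428"));
        System.out.println(linkedIds(xml, "nuccore_pubmed")); // [23766330]
    }
}
```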
  77. 77. Transformations
  78. 78. Efetch: transforming to SVG
      Using the stylesheet https://github.com/lindenb/xslt-sandbox/blob/master/stylesheets/bio/ncbi/gb2svg.xsl
      (xsltproc cannot download https URLs itself, so both files are fetched with curl)
      $ curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=14971102&retmode=xml&rettype=gbc" | xsltproc <(curl -sL "https://raw.github.com/lindenb/xslt-sandbox/master/stylesheets/bio/ncbi/gb2svg.xsl") -
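xsltproc is not the only XSLT 1.0 processor: the JDK ships one behind javax.xml.transform. A minimal sketch of the same "stylesheet + XML in, text out" step (XsltTransform is an illustrative name; the stylesheet in main is a toy that prints the locus and length of a GBSeq record, not gb2svg.xsl):

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Sketch: apply an XSLT 1.0 stylesheet to an XML document, both given as strings.
class XsltTransform {
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception ex) {
            throw new IllegalStateException("XSLT failed", ex);
        }
    }

    public static void main(String[] args) {
        String xsl = "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
            + "<xsl:output method='text'/>"
            + "<xsl:template match='/'>"
            + "<xsl:value-of select='/GBSet/GBSeq/GBSeq_locus'/>"
            + "<xsl:text>:</xsl:text>"
            + "<xsl:value-of select='/GBSet/GBSeq/GBSeq_length'/>"
            + "</xsl:template></xsl:stylesheet>";
        String xml = "<GBSet><GBSeq><GBSeq_locus>KC524742</GBSeq_locus>"
            + "<GBSeq_length>9042</GBSeq_length></GBSeq></GBSet>";
        System.out.println(transform(xsl, xml)); // KC524742:9042
    }
}
```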
  79. 79. Efetch: transforming to SVG
      <?xml version="1.0" encoding="UTF-8"?>
      <svg:svg xmlns:svg="http://www.w3.org/2000/svg" height="121" width="920" style="stroke-width:1px;">
        <svg:title>Human rotavirus segment 7 NSP3 gene, complete cds</svg:title>
        <svg:defs>
          <svg:linearGradient x1="0%" y1="0%" x2="0%" y2="100%" id="grad">
            <svg:stop offset="5%" stop-color="black"/>
            <svg:stop offset="50%" stop-color="whitesmoke"/>
            <svg:stop offset="95%" stop-color="black"/>
          </svg:linearGradient>
          <svg:linearGradient x1="0%" y1="0%" x2="0%" y2="100%" id="verticalbodygradient">
            <svg:stop offset="5%" stop-color="white"/>
            <svg:stop offset="95%" stop-color="lightgray"/>
          </svg:linearGradient>
        </svg:defs>
        <svg:style type="text/css"/>
        <svg:g>
          <svg:g transform="translate(0,0)">
            <svg:rect x="0" y="0" width="920" height="120" fill="url(#verticalbodygradient)" stroke="black"/>
            <svg:text style="color:red;font-size:35px;" x="10" y="35">Human rotavirus segment 7 NSP3 gene, complete cds</svg:text>
            <svg:g>
              <svg:rect x="10" y="40" width="900" height="18" style="fill:url(#grad);stroke:black;" title="1..1074"/>
              <svg:text y="54" x="460" text-anchor="middle"><svg:tspan style="font-weight:bold;">source</svg:tspan><svg:tspan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" font-weight="bold">organism</svg:tspan>:Human rotavirus A <svg:tspan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" font-weight="bold">mol_type</svg:tspan>:genomic RNA <svg:tspan xmlns:xsi="http://www. (...)
  80. 80. Efetch Transforming to SVG
  81. 81. Efetch: transforming to R
      $ curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=Tyrannosaurus&usehistory=y" | xmllint --format -
      $ curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&usehistory=true&WebEnv=NCID_1_52434791_130.14.22.215_9001_1375957034_1619786167&query_key=1&retmode=xml"
  82. 82. Efetch: transforming to R
      <?xml version='1.0' encoding="UTF-8" ?>
      <xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>
      <xsl:output method="text"/>


      <xsl:template match="/">
      date2count &lt;- list()
      <xsl:apply-templates select="/PubmedArticleSet/PubmedArticle[MedlineCitation/DateCreated/Year]"/>
      df &lt;- data.frame(
        Year=as.integer(names(date2count)),
        Count=unlist(date2count)
        )
      png('jeterpubmed.png')
      plot(df)
      title('pubmed: count(articles)=f(year)')
      dev.off()
      </xsl:template>

      <xsl:template match="PubmedArticle">
      <xsl:variable name="year" select="MedlineCitation/DateCreated/Year"/>
      date2count[["<xsl:value-of select="$year"/>"]] &lt;- ifelse(is.null(date2count[["<xsl:value-of select="$year"/>"]]),1,1+date2count[["<xsl:value-of select="$year"/>"]])
      </xsl:template>

      </xsl:stylesheet>
  83. 83. Efetch: transforming to R
      $ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&usehistory=true&WebEnv=NCID_1_52434791_130.14.22.215_9001_1375957034_1619786167&query_key=1&retmode=xml" | xsltproc pubmed2rstats.xsl -
      date2count <- list()
      date2count[["2013"]] <- ifelse(is.null(date2count[["2013"]]),1,1+date2count[["2013"]])
      date2count[["2012"]] <- ifelse(is.null(date2count[["2012"]]),1,1+date2count[["2012"]])
      date2count[["2012"]] <- ifelse(is.null(date2count[["2012"]]),1,1+date2count[["2012"]])
      date2count[["2011"]] <- ifelse(is.null(date2count[["2011"]]),1,1+date2count[["2011"]])
      date2count[["2011"]] <- ifelse(is.null(date2count[["2011"]]),1,1+date2count[["2011"]])
      (..)
      df <- data.frame(
        Year=as.integer(names(date2count)),
        Count=unlist(date2count)
        )
      png('jeterpubmed.png')
      plot(df)
      title('pubmed: count(articles)=f(year)')
      dev.off()
  84. 84. Efetch: transforming to R
      $ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&usehistory=true&WebEnv=NCID_1_52434791_130.14.22.215_9001_1375957034_1619786167&query_key=1&retmode=xml" | xsltproc pubmed2rstats.xsl - | R --no-save
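The stylesheet above generates R code that counts articles per year before plotting. As an alternative sketch, the same XPath used by the stylesheet (MedlineCitation/DateCreated/Year) can be counted directly in Java; PubmedYearCount is a hypothetical name and the plotting step is left out:

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Sketch: count PubmedArticle records per MedlineCitation/DateCreated/Year;
// articles without that element are ignored, as in the stylesheet.
class PubmedYearCount {
    static Map<String, Integer> countByYear(String xml) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by year
        try {
            NodeList years = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                "/PubmedArticleSet/PubmedArticle/MedlineCitation/DateCreated/Year",
                new InputSource(new StringReader(xml)), XPathConstants.NODESET);
            for (int i = 0; i < years.getLength(); i++) {
                counts.merge(years.item(i).getTextContent().trim(), 1, Integer::sum);
            }
        } catch (Exception ex) {
            throw new IllegalStateException("cannot read PubmedArticleSet", ex);
        }
        return counts;
    }

    public static void main(String[] args) {
        // hypothetical three-article input, for illustration only
        String xml = "<PubmedArticleSet>"
            + "<PubmedArticle><MedlineCitation><DateCreated><Year>2013</Year></DateCreated></MedlineCitation></PubmedArticle>"
            + "<PubmedArticle><MedlineCitation><DateCreated><Year>2012</Year></DateCreated></MedlineCitation></PubmedArticle>"
            + "<PubmedArticle><MedlineCitation><DateCreated><Year>2012</Year></DateCreated></MedlineCitation></PubmedArticle>"
            + "</PubmedArticleSet>";
        System.out.println(countByYear(xml)); // {2012=2, 2013=1}
    }
}
```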
  85. 85. Generating a JAVA parser
  86. 86. Using the XML schema
      XML Schema for dbSNP: ftp://ftp.ncbi.nlm.nih.gov/snp/specs/docsum_3.4.xsd
      <?xml version="1.0" encoding="UTF-8"?>
      <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://www.ncbi.nlm.nih. (...) elementFormDefault="qualified" attributeFormDefault="unqualified">
        <xsd:element name="ExchangeSet">
          <xsd:annotation>
            <xsd:documentation>Set of dbSNP refSNP docsums, version 3.4</xsd:documentation>
          </xsd:annotation>
          <xsd:complexType>
            <xsd:sequence>
              <xsd:element name="SourceDatabase" minOccurs="0">
                <xsd:complexType>
                  <xsd:attribute name="taxId" type="xsd:int" use="required">
                    <xsd:annotation>
                      <xsd:documentation>NCBI taxonomy ID for variation</xsd:documentation>
                    </xsd:annotation>
                  </xsd:attribute>
                  <xsd:attribute name="organism" type="xsd:string" use="required">
                    <xsd:annotation>
                      <xsd:documentation>common name for species used as part of database name (...)
                    </xsd:annotation>
                  </xsd:attribute>
                  <xsd:attribute name="dbSnpOrgAbbr" type="xsd:string">
                    <xsd:annotation>
                      <xsd:documentation>organism abbreviation used in dbSNP. (...)
                    </xsd:annotation>
                  </xsd:attribute>
                  <xsd:attribute name="gpipeOrgAbbr" type="xsd:string">
                    <xsd:annotation>
                      <xsd:documentation>organism abbreviation used within NCBI genome pipelin (...)
  87. 87. Using the XML schema
      Compiling the XML Schema for dbSNP with XJC
      $ xjc -d . "ftp://ftp.ncbi.nlm.nih.gov/snp/specs/docsum_3.4.xsd"
      parsing a schema...
      compiling a schema...
      https/www_ncbi_nlm_nih_gov/snp/docsum/Assay.java
      https/www_ncbi_nlm_nih_gov/snp/docsum/Assembly.java
      https/www_ncbi_nlm_nih_gov/snp/docsum/BaseURL.java
      https/www_ncbi_nlm_nih_gov/snp/docsum/Component.java
      https/www_ncbi_nlm_nih_gov/snp/docsum/ExchangeSet.java
      https/www_ncbi_nlm_nih_gov/snp/docsum/FxnSet.java
      https/www_ncbi_nlm_nih_gov/snp/docsum/MapLoc.java
      https/www_ncbi_nlm_nih_gov/snp/docsum/ObjectFactory.java
      https/www_ncbi_nlm_nih_gov/snp/docsum/PrimarySequence.java
      https/www_ncbi_nlm_nih_gov/snp/docsum/Rs.java
      https/www_ncbi_nlm_nih_gov/snp/docsum/RsLinkout.java
      https/www_ncbi_nlm_nih_gov/snp/docsum/RsStruct.java
      https/www_ncbi_nlm_nih_gov/snp/docsum/Ss.java
      https/www_ncbi_nlm_nih_gov/snp/docsum/package-info.java
  88. 88. Using the XML schema
      Compiling the XML Schema for dbSNP with XJC
      Search the non-genomic rs# in dbSNP.
      import https.www_ncbi_nlm_nih_gov.snp.docsum.*;
      import javax.xml.bind.*;
      import javax.xml.stream.*;
      import javax.xml.stream.events.*;

      class ParseDbSnp
      {
          public static void main(String[] args) throws Exception
          {
              // JAXB context built from the package generated by xjc
              JAXBContext jaxbCtxt = JAXBContext.newInstance("https.www_ncbi_nlm_nih_gov.snp.docsum");
              Unmarshaller unmarshaller = jaxbCtxt.createUnmarshaller();
              // stream the (huge) dbSNP XML with StAX instead of loading it into memory
              XMLInputFactory ifactory = XMLInputFactory.newInstance();
              XMLEventReader r = ifactory.createXMLEventReader(System.in);
              while (r.hasNext())
              {
                  XMLEvent evt = r.peek();
                  // skip every event that is not the start of an <Rs> element
                  if (!(evt.isStartElement() && evt.asStartElement().getName().getLocalPart().equals("Rs")))
                  {
                      evt = r.nextEvent();
                      continue;
                  }
                  // unmarshal one <Rs> subtree; the reader then moves past its end tag
                  Rs rs = unmarshaller.unmarshal(r, Rs.class).getValue();
                  if ("genomic".equals(rs.getMolType())) continue;
                  System.out.println("rs" + rs.getRsId() + " " + rs.getMolType());
              }
              r.close();
          }
      }
  89. 89. Using the XML schema
      Compiling the XML Schema for dbSNP with XJC
      compile...
      $ javac ParseDbSnp.java https/www_ncbi_nlm_nih_gov/snp/docsum/*.java
      and run...
      $ curl -s "ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/XML/ds_ch1.xml.gz" | gunzip -c | java ParseDbSnp
      rs701 cDNA
      rs860 cDNA
      rs861 cDNA
      rs862 cDNA
      rs863 cDNA
      rs864 cDNA
      rs865 cDNA
      rs866 cDNA
      rs877 cDNA
      rs878 cDNA
      rs879 cDNA
      rs880 cDNA
      rs882 cDNA
      rs883 cDNA
      rs884 cDNA
      rs885 cDNA
      rs886 cDNA
      rs913 cDNA
      rs945 cDNA
      rs946 cDNA
      (...)
  90. 90. NCBI EBot
  91. 91. NCBI EBot
      URL: https://www.ncbi.nlm.nih.gov/Class/PowerTools/eutils/ebot/ebot.cgi
  92. 92. NCBI EBot
      Sample output:
      #!/usr/bin/perl
      (...)
      # PUBLIC DOMAIN NOTICE
      # National Center for Biotechnology Information
      use LWP::Simple;
      use LWP::UserAgent;
      use Net::FTP;
      my $delay = 0;
      my $maxdelay = 3;
      my $base = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/";
      $params{email} = 'nobody@nowhere.com';
      $params{db} = "nuccore";
      $params{tool} = "ebot";
      $params{term} = "Mammuthus+primigenius[ORGN]";
      %params = esearch(%params);
      $params{retmode} = "xml";
      $params{outfile} = "result.xml";
      $params{rettype} = "native";
      efetch_batch(%params);
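The body of EBot's efetch_batch routine is not shown on the slide; a loop of this kind typically pages through a history result with retstart and retmax. A hedged sketch that only builds the batch URLs (EfetchBatches is an illustrative name; rettype and retmode mirror the Perl parameters above). NCBI asks clients to stay below three requests per second without an API key, so a real loop should pause between requests:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch: one EFetch URL per block of 'retmax' records of a history set
// holding 'count' records (Count, WebEnv and QueryKey come from ESearch).
class EfetchBatches {
    static List<String> urls(String db, String webEnv, String queryKey,
                             int count, int retmax, String rettype, String retmode) {
        if (retmax <= 0) throw new IllegalArgumentException("retmax must be positive");
        List<String> urls = new ArrayList<>();
        for (int retstart = 0; retstart < count; retstart += retmax) {
            urls.add("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=" + db
                + "&WebEnv=" + URLEncoder.encode(webEnv, StandardCharsets.UTF_8)
                + "&query_key=" + queryKey
                + "&retstart=" + retstart + "&retmax=" + retmax
                + "&rettype=" + rettype + "&retmode=" + retmode);
        }
        return urls;
    }

    public static void main(String[] args) {
        // hypothetical history values; a real run would take them from ESearch
        for (String u : urls("nuccore", "NCID_1_EXAMPLE", "1", 25, 10, "native", "xml")) {
            System.out.println(u);
            // when fetching for real: Thread.sleep(334) here keeps the rate under 3 requests/s
        }
    }
}
```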
  93. 93. BLAST
  94. 94. Standalone Blast: downloading
      Standalone tools are available at ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
      # add BLAST to your PATH
      export PATH=${PATH}:/path/to/ncbi-blast-2.2.28+/bin
  95. 95. Standalone Blast: download a sample of Apis mellifera proteins
      $ curl -o protein.fa.gz "ftp://ftp.ncbi.nih.gov/genomes/Apis_mellifera/protein/protein.fa.gz"
      $ gunzip protein.fa.gz
  96. 96. Standalone Blast: create a BLAST database with makeblastdb
      Getting help...
      $ makeblastdb -help
      (...)
       -dbtype <String, `nucl', `prot'>
         Molecule type of target db
       -in <File_In>
         Input file/database name
         Default = `-'
       -input_type <String, `asn1_bin', `asn1_txt', `blast (...)
         Type of the data specified in input_file
         Default = `fasta'
      (..)
  97. Standalone Blast: create a Blast database with makeblastdb
       Create the BLAST database:
       $ makeblastdb -in protein.fa -dbtype prot
       Building a new DB, current time: 09/02/2013 18:29:38
       New DB name: protein.fa
       New DB title: protein.fa
       Sequence type: Protein
       Keep Linkouts: T
       Keep MBits: T
       Maximum file size: 1000000000B
       Adding sequences from FASTA; added 10570 sequences
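makeblastdb reports how many sequences it added. A quick sanity check, which is an addition and not part of the course, is to count the '>' header lines of the FASTA file and compare the two numbers. This minimal Python sketch assumes one header line per record. The helper names are hypothetical.

```python
import gzip


def count_records(lines):
    """Count FASTA records: one per header line starting with '>'."""
    return sum(1 for line in lines if line.startswith(">"))


def count_fasta_file(path):
    """Open a FASTA file (plain, or gzip-compressed if the name ends in .gz) and count its records."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_records(handle)
```

On the honeybee file, `count_fasta_file("protein.fa")` (or `"protein.fa.gz"` before gunzip) would be expected to agree with the "added 10570 sequences" line above.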
  98. Standalone Blast: query a Blast database with blastp
       Get help:
       $ blastp -help
       (...)
        -query <File_In>
          Input file name
          Default = `-'
        -db <String>
          BLAST database name
       (...)
  99. Standalone Blast: Blast human EIF4G1 (gi:187956781)
       $ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&rettype=fasta&id=187956781" | blastp -db protein.fa
       Query= gi|187956781|gb|AAI40897.1| EIF4G1 protein [Homo sapiens]
       (...)
                                                                                Score      E
       Sequences producing significant alignments:                       (Bits)  Value
       gi|328782175|ref|XP_394628.4| PREDICTED: eukaryotic translation...   189  4e-49
       gi|328779480|ref|XP_003249661.1| PREDICTED: hypothetical protei...  38.1  0.017
       gi|110762568|ref|XP_001121713.1| PREDICTED: hypothetical protei...  38.1  0.018
       (...)
       > gi|328782175|ref|XP_394628.4| PREDICTED: eukaryotic translation initiation factor 4 gamma 2-like [Apis mellifera]
       Length=899

        Score = 189 bits (479), Expect = 4e-49, Method: Compositional matrix adjust.
        Identities = 115/319 (36%), Positives = 175/319 (55%), Gaps = 39/319 (12%)

       Query  717  KEPRKIIATVLMTEDIKLNKAEKAWKPSS--KRTAADKDRGEEDADGSKTQDLFRRVRSI  774
                   ++P +    +++ +DI+    E+ W P S  +R A   +        S+   +FR+VR I
       Sbjct  22   RKPSETTVGLVIKDDIRSLSTEQRWIPPSTLRRDALTPE--------SRNNFIFRKVRGI  73

       Query  775  LNKLTPQMFQQLMKQVTQLAIDTEERLKGVIDLIFEKAISEPNFSVAYANMCRCL-----  829
                   LNKLTP+ F +L   +  + ++++  LKGVI LIFEKA+ EP +S  YA +C+ L
       Sbjct  74   LNKLTPEKFAKLSNDLLNVELNSDVILKGVIFLIFEKALDEPKYSSMYAQLCKRLSDEAA  133

       Query  830  -MALKVPTTEKPTVTVNFRKLLLNRCQKEFEKDKDDDEVFEKKQKEMDEAATAEERGRLK  888
                       K    E       F  LLL++C+ EFE      E FE +    DE    EE
       Sbjct  134  NFEPKKALIESQKGQSTFTFLLLSKCRDEFENRSKASEAFENQ----DELGPEEE-----  184
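The slide pipes curl into blastp. Below is a Python sketch of the same pipeline. It assumes blastp is on the PATH and that protein.fa was indexed with makeblastdb as shown above. `efetch_fasta_url`, `blastp_command` and `blast_record` are hypothetical helper names. The query is written to blastp's standard input, which works because `-query` defaults to `-`, as the help output above shows.

```python
import subprocess
import urllib.parse
import urllib.request

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(uid, db="protein"):
    """Same request as the curl call on the slide: one record, FASTA format."""
    return EFETCH + "?" + urllib.parse.urlencode({"db": db, "rettype": "fasta", "id": uid})


def blastp_command(db, outfmt=None):
    """Argument list for blastp; no -query, so the query is read from stdin."""
    cmd = ["blastp", "-db", db]
    if outfmt is not None:
        cmd += ["-outfmt", str(outfmt)]
    return cmd


def blast_record(uid, db="protein.fa", outfmt=None):
    """Download the query over HTTP, feed it to a local blastp, return blastp's stdout."""
    with urllib.request.urlopen(efetch_fasta_url(uid)) as response:
        fasta = response.read().decode("utf-8")
    result = subprocess.run(blastp_command(db, outfmt), input=fasta,
                            capture_output=True, text=True, check=True)
    return result.stdout
```

`print(blast_record("187956781"))` should print a report like the one above, and `blast_record("187956781", outfmt=5)` should give its XML form. `check=True` makes a failing blastp raise `CalledProcessError` instead of returning an empty string.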
  100. Standalone Blast: Blast human EIF4G1 (gi:187956781), output as XML (-outfmt 5)
       $ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&rettype=fasta&id=187956781" | blastp -db protein.fa -outfmt 5
       (...)
       <Hit_hsps>
         <Hsp>
           <Hsp_num>1</Hsp_num>
           <Hsp_bit-score>189.119</Hsp_bit-score>
           <Hsp_score>479</Hsp_score>
           <Hsp_evalue>3.78314e-49</Hsp_evalue>
           <Hsp_query-from>717</Hsp_query-from>
           <Hsp_query-to>1017</Hsp_query-to>
           <Hsp_hit-from>22</Hsp_hit-from>
           <Hsp_hit-to>319</Hsp_hit-to>
           <Hsp_query-frame>0</Hsp_query-frame>
           <Hsp_hit-frame>0</Hsp_hit-frame>
           <Hsp_identity>115</Hsp_identity>
           <Hsp_positive>175</Hsp_positive>
           <Hsp_gaps>39</Hsp_gaps>
           <Hsp_align-len>319</Hsp_align-len>
           <Hsp_qseq>KEPRKIIATVLMTEDIKLNKAEKAWKPSS--KRTAADKDRGEEDADGSKTQDLFRRVRSILNKLTPQMFQQ(...)IARRRSLGNIKFIGELFKLKMLTEAIMHDCVVKLL--------KNHDEESLECLCRLLTTIGKDLDFEKAKPRMDQYFNQMEKIIKEKK(...)
           <Hsp_hseq>RKPSETTVGLVIKDDIRSLSTEQRWIPPSTLRRDALTPE--------SRNNFIFRKVRGILNKLTPEKFAKLS(...)VAKRKMLGNIKFIGELGKLGIVSETILHRCILQLLEKKRRRRSRGDTAEDIECLCQIMRTCGRILDSDKGRGLMDQYFKRMNSLAESRD(...)
           <Hsp_midline>++P + +++ +DI+ E+ W P S +R A + S+ +FR+VR ILNKLTP+ F + + ++++ LKGVI LIFEKA+ EP +S YA +C+ L K E F LLL++C+ EFE E FE + DE EE E R +A+R+ LGNIKFIGEL KL +++E I+H C+++LL + E +ECLC+++ T G+ LD +K + MDQYF +M + + + RI+FML+DV++LR WVPR+ +GP I+QI + E</Hsp_midline>
         </Hsp>
       (...)
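With `-outfmt 5` the report becomes machine-readable. Below is a minimal Python sketch that walks every `<Hit>` and `<Hsp>` with the standard `xml.etree.ElementTree` module. Besides the `Hsp_*` tags shown above, it assumes the `Hit` and `Hit_def` elements of NCBI's BlastOutput XML format. Percent identity is recomputed as `Hsp_identity / Hsp_align-len`, which gives the 36% printed in the text report (115/319).

```python
import xml.etree.ElementTree as ET


def iter_hsps(root):
    """Yield one summary dict per <Hsp>, labelled with the <Hit_def> of its parent <Hit>."""
    for hit in root.iter("Hit"):
        hit_def = hit.findtext("Hit_def")
        for hsp in hit.iter("Hsp"):
            identity = int(hsp.findtext("Hsp_identity"))
            positive = int(hsp.findtext("Hsp_positive"))
            align_len = int(hsp.findtext("Hsp_align-len"))
            yield {
                "hit_def": hit_def,
                "bit_score": float(hsp.findtext("Hsp_bit-score")),
                "evalue": float(hsp.findtext("Hsp_evalue")),
                "pct_identity": 100.0 * identity / align_len,
                "pct_positive": 100.0 * positive / align_len,
            }


def hsps_from_string(xml_text):
    """Parse a BLAST XML report held in memory (use ET.parse(path).getroot() for a file)."""
    return list(iter_hsps(ET.fromstring(xml_text)))
```

For a report saved to a file, here called `blast.xml` (a hypothetical name), `for h in iter_hsps(ET.parse("blast.xml").getroot()): print(h["hit_def"], h["evalue"], round(h["pct_identity"]))` prints one line per HSP.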
  101. NCBI URL-API Blast
  102. NCBI URL-API Blast: https://www.ncbi.nlm.nih.gov/blast/Doc/urlapi.html
       $ curl "https://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Put&QUERY=PAERLMERKADIE&DATABASE=nr&PROGRAM=blastp&FILTER=L&HITLIST_SIZE=500"
       (...)
       <!--QBlastInfoBegin
           RID = 1NRYGX9K014
           RTOE = 29
       QBlastInfoEnd
       -->
       (...)
       The answer contains a request ID (RID) and an estimated time to completion in seconds (RTOE).
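CMD=Put only queues the search. Below is a Python sketch of the whole round trip. It assumes the CMD=Get options described in NCBI's URL API documentation: FORMAT_OBJECT=SearchInfo to poll the Status field (WAITING, READY, ...) and FORMAT_TYPE=XML to download the results. The function names are hypothetical. The host is the one on the slide and may have moved since. The 60-second polling interval follows NCBI's request not to poll a given RID more than once a minute.

```python
import re
import time
import urllib.parse
import urllib.request

# Endpoint as written on the slide; check NCBI's current URL API page before relying on it.
BLAST_CGI = "https://www.ncbi.nlm.nih.gov/blast/Blast.cgi"


def put_url(query, database="nr", program="blastp", hitlist_size=500):
    """CMD=Put URL with the same parameters as the curl call on the slide."""
    params = {"CMD": "Put", "QUERY": query, "DATABASE": database,
              "PROGRAM": program, "FILTER": "L", "HITLIST_SIZE": hitlist_size}
    return BLAST_CGI + "?" + urllib.parse.urlencode(params)


def get_url(rid, **extra):
    """CMD=Get URL for a request ID, plus extra options such as FORMAT_TYPE."""
    params = {"CMD": "Get", "RID": rid}
    params.update(extra)
    return BLAST_CGI + "?" + urllib.parse.urlencode(params)


def parse_qblast_info(text):
    """Collect the key = value pairs between QBlastInfoBegin and QBlastInfoEnd."""
    block = re.search(r"QBlastInfoBegin(.*?)QBlastInfoEnd", text, re.S)
    if block is None:
        return {}
    return dict(re.findall(r"(\w+)\s*=\s*(\S+)", block.group(1)))


def fetch(url):
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def remote_blast(query, poll_seconds=60):
    """Submit, wait RTOE seconds, poll Status until READY, then download the XML report."""
    info = parse_qblast_info(fetch(put_url(query)))
    rid = info["RID"]
    time.sleep(int(info.get("RTOE", poll_seconds)))
    while True:
        status = parse_qblast_info(fetch(get_url(rid, FORMAT_OBJECT="SearchInfo"))).get("Status")
        if status == "READY":
            return fetch(get_url(rid, FORMAT_TYPE="XML"))
        if status != "WAITING":
            raise RuntimeError(f"BLAST request {rid} ended with status {status}")
        time.sleep(poll_seconds)
```

`remote_blast("PAERLMERKADIE")` should then return a report in the same BlastOutput XML format that `blastp -outfmt 5` writes locally.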
  103. The End
