Harvester II

    1. 1. Bioinformatic Harvester II: Education and Training. Academia Sinica, Institute of Biomedical Sciences, Biomedical IT Core. Ming-Fang Tsai [email_address]
    2. 2. Agenda: 1. Overview 2. Advanced Search 3. Introduction to Databases 4. Summary
    3. 3. Overview
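    4. 4. Feature Overview <ul><li>Accepts many kinds of input: gene name, ID, chromosomal position, disease, literature, sequence, and more.</li></ul><ul><li>Filters out duplicate records.</li></ul><ul><li>Ranks query results, with the most relevant records listed first.</li></ul><ul><li>Results on which two or more databases agree are ranked higher.</li></ul><ul><li>Query results can be saved as .xls files.</li></ul><ul><li>You can write your own programs to query Harvester data.</li></ul><ul><li>The cross-linked sites are highly reliable data sources (UniProt, NCBI, UCSC, Ensembl, SOURCE, ...).</li></ul><ul><li>Page data are refreshed at least once every 14 days.</li></ul>
    5. 5. http://harvester.embl.de <ul><li>Gene: ATM</li></ul><ul><li>GenBank: NM_000051</li></ul><ul><li>Domain: FHA</li></ul><ul><li>Localization: nuclear</li></ul>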
    6. 6. Flow Diagram (figure labels: Input: gene, position, localization, domain, sequence; Genome: human, mouse, rat; page rank; text results; graphical results; cross link; saved as a file)
    7. 7. General Query. Input: chromosome 4q32; genome: human. Returns 107 records (e.g. P42262). Output options: text results, graphical results, save as a file.
    8. 8. General Query. Input: chromosome 4q32, combined with brain and excluding hypothetical; genome: human. Making good use of the AND and NOT functions narrows the search effectively: the original 107 records drop to 20.
    9. 9. Exercise <ul><li>http://harvester.embl.de/</li></ul><ul><li>How would you query records between 6q21 and 6q24 while filtering out hypothetical entries?</li></ul>
    10. 10. 6q21-6q24, not hypothetical. You can use the user-defined protein list feature, which is introduced later. Q: What if you want to query 6q21, 6q24, brain?
    11. 11. How do I save query results to a file? <ul><li>File → Save As → save as filename.xls (include the file extension)</li></ul>Text results and graphical results are saved the same way.
    12. 12. Advanced Search
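    13. 13. User-Defined Protein List <ul><li>Default: all available proteins</li></ul><ul><li>User-defined protein list</li></ul>Set a password the first time (example: 2345); afterwards, use that password to retrieve your data. Each list has a protein set ID. A link returns to the function page. Proteins can be separated by spaces or line breaks; either works.
    14. 14. Protein-Set Search (example password: 2345). Note: choose a reasonably complex password, because anyone who enters the same password can access your data.
    15. 15. Exercise <ul><li>http://harvester.embl.de/</li></ul><ul><li>Define your own protein set</li></ul><ul><li>Query your own proteins, or enter the password 5522</li></ul><ul><li>Query ID 17</li></ul>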
    16. 16. How do I query 6q21, 6q24, brain? <ul><li>STEP 1: First query 6q21, 6q24, not hypothetical</li></ul>This returns 272 records.
    17. 17. How do I query 6q21, 6q24, brain? <ul><li>STEP 2: Save the results as an .xls file</li></ul>
    18. 18. How do I query 6q21, 6q24, brain? <ul><li>STEP 3: Process the .xls file to obtain the protein list</li></ul>Copy the protein list in column B.
    19. 19. How do I query 6q21, 6q24, brain? <ul><li>STEP 4: Paste the copied data into the user-defined protein list</li></ul>
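
Steps 3 and 4 above can also be scripted. The following is a minimal Python sketch, assuming the saved results have already been read into a list of rows (one list of cell strings per row) with a header row first. The function name `protein_list`, the header labels, and the sample rows are hypothetical; only "column B holds the protein IDs" and "IDs may be separated by spaces" come from the slides.

```python
def protein_list(rows, column=1, has_header=True):
    """Collect the protein IDs from one column (default: column B, index 1).

    Returns a space-separated string, since the user-defined protein list
    accepts IDs separated by spaces or line breaks.
    """
    data_rows = rows[1:] if has_header else rows
    ids = []
    for row in data_rows:
        if len(row) <= column:
            continue  # skip short or empty rows
        value = row[column].strip()
        if value and value not in ids:
            ids.append(value)  # keep first occurrence, preserve order
    return " ".join(ids)


# Illustrative rows only; the real export layout may differ.
sample_rows = [
    ["Rank", "Protein", "Description"],
    ["1", "P42262", "example entry"],
    ["2", "Q13315", "example entry"],
    ["3", "P42262", "duplicate entry"],
]
print(protein_list(sample_rows))
```

The resulting string can be pasted into the user-defined protein list form as in Step 4.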
    20. 20. Exercise <ul><li>http://harvester.embl.de/</li></ul><ul><li>Query 17q22, 17q23, cancer</li></ul>
    21. 21. Sequence Search
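    22. 22. Sequence Search. Example input: QARENKDFVR. http://www-db.embl.de/jss/servlet/de.embl.bk.htmlfind.HarvesterOutputMysql?m=seqSearch <ul><li>PQ__A + any characters, any length + NK__VR (the gap between the two fragments can hold any characters of any length)</li></ul><ul><li>Up to 4 conditions can be entered</li></ul>
    23. 23. Sequence Pattern Search <ul><li>'%': matches any characters, of any length</li></ul><ul><ul><li>The % can be omitted when the unspecified region is at the start or end of the query string</li></ul></ul><ul><ul><li>A % can also be placed inside the string</li></ul></ul><ul><li>'_': one _ matches exactly one character; two or more underscores match that many characters</li></ul><ul><li>Up to four conditions can be entered</li></ul>
    24. 24. Send sequence to STRING, BLAST, SMART, CDART <ul><li>http://harvester.embl.de/sequence-input.html</li></ul><ul><li>The sequence is submitted to all four sites at once, and the results are returned together on a single page</li></ul>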
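
To make the wildcard rules above concrete, here is a small Python sketch that translates such a pattern into a regular expression and tests it against a sequence locally. It illustrates the matching semantics described on the slide and is not Harvester's own implementation; the function names `pattern_to_regex` and `matches` and the test sequences are made up for the example.

```python
import re


def pattern_to_regex(pattern):
    """Translate '%' (any run of characters) and '_' (exactly one character)
    into an equivalent regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts))


def matches(pattern, sequence):
    """Return True if the pattern occurs in the sequence.

    The search is unanchored because the slide says a leading or
    trailing '%' may be omitted.
    """
    return pattern_to_regex(pattern).search(sequence) is not None


print(matches("NK__VR", "QARENKDFVR"))            # two free positions between NK and VR
print(matches("PQ__A%NK__VR", "MMPQXXAQQQNKYYVRZZ"))
print(matches("VDH__EYGNL", "VDHAEYGNL"))          # only one residue between VDH and EYGNL
```

Each `_` consumes exactly one residue, so the last call fails: the pattern needs two residues between VDH and EYGNL and the sequence offers only one.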
    25. 25. Term Search by application <ul><li>http://www-db.embl.de/jss/servlet/de.embl.bk.htmlfind.HarvesterOutputMysql?m=doSearch&fH=0&search=brain</li></ul><ul><li>"&fH=0" for searches in human</li></ul><ul><li>"&fH=1" for searches in mouse</li></ul><ul><li>"&fH=2" for searches in rat</li></ul><ul><li>"search=" is followed by the search term, which can be anything</li></ul>
    26. 26. Sequence Search by application <ul><li>http://www-db.embl.de/jss/servlet/de.embl.bk.htmlfind.HarvesterOutputMysql?m=seqSearch&sequence_1=%25KDEL%25</li></ul><ul><li>"%25" is the URL-encoded (percent-encoded) form of the "%" wildcard used in Harvester searches</li></ul>
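
A program can assemble the term-search URL from its parts. The sketch below, in Python, uses only the endpoint and the `m`, `fH`, and `search` parameters shown on the slide; the helper name `term_search_url` and the `SPECIES_CODES` table are introduced here for illustration. How the server treats multi-word terms is not stated on the slide, so the standard form encoding used by `urlencode` is an assumption.

```python
from urllib.parse import urlencode

BASE_URL = "http://www-db.embl.de/jss/servlet/de.embl.bk.htmlfind.HarvesterOutputMysql"

# fH values as listed on the slide.
SPECIES_CODES = {"human": 0, "mouse": 1, "rat": 2}


def term_search_url(term, species="human"):
    """Build a Harvester term-search URL for the given term and species."""
    params = {"m": "doSearch", "fH": SPECIES_CODES[species], "search": term}
    return BASE_URL + "?" + urlencode(params)


print(term_search_url("brain"))
print(term_search_url("brain", species="rat"))
```

Keeping the species codes in one dictionary means an unsupported species name fails early with a `KeyError` instead of producing a URL with an undefined `fH` value.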
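
The `%25` in the URL above is what standard percent-encoding produces for a literal `%`. A minimal Python sketch of building the sequence-search URL follows; the helper name `sequence_search_url` is hypothetical, and only the `sequence_1` parameter is used because it is the only one shown on the slide.

```python
from urllib.parse import quote

BASE_URL = "http://www-db.embl.de/jss/servlet/de.embl.bk.htmlfind.HarvesterOutputMysql"


def sequence_search_url(pattern):
    """Build a Harvester sequence-search URL, percent-encoding the pattern
    so that each '%' wildcard becomes '%25'."""
    return BASE_URL + "?m=seqSearch&sequence_1=" + quote(pattern, safe="")


print(sequence_search_url("%KDEL%"))
```

The `_` wildcard is left unchanged by `quote`, so a pattern such as `%VDH__EYGNL%` only has its `%` characters rewritten.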
    27. 27. Template <ul><li>Harvester → Harvester Wiki</li></ul>Figure labels: keyword, URL, linking method, sequence
    28. 28. Practice <ul><li>Example: %VDH__EYGNL% (no space between the two underscores)</li></ul><ul><li>Example: %VDH_ _EYGNL%</li></ul>http://www-db.embl.de/jss/servlet/de.embl.bk.htmlfind.HarvesterOutputMysql?m=seqSearch <ul><li>Note: the string must not contain spaces, or any symbols other than % and _</li></ul>
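
The rule in the note above is easy to check before submitting a pattern. The following Python sketch accepts only letters, `%`, and `_`; restricting the remaining characters to letters is an assumption based on one-letter amino acid codes, and the function name `is_valid_pattern` is hypothetical.

```python
import re

# Letters (amino acid codes) plus the two wildcards; nothing else, no spaces.
_ALLOWED = re.compile(r"[A-Za-z%_]+")


def is_valid_pattern(pattern):
    """Return True if the pattern contains only letters, '%' and '_'."""
    return _ALLOWED.fullmatch(pattern) is not None


print(is_valid_pattern("%VDH__EYGNL%"))   # no space between the underscores
print(is_valid_pattern("%VDH_ _EYGNL%"))  # contains a space
```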
    29. 29. Introduction to Databases
    30. 30. <ul><li>SOURCE: Stanford Online Universal Resource for Clones and ESTs</li></ul><ul><li>Gene summary</li></ul><ul><li>Dynamically collects and compiles data from many scientific databases</li></ul><ul><ul><li>UniGene, dbEST, Swiss-Prot, GeneMap99, RHdb, GeneCards, and LocusLink</li></ul></ul><ul><li>Query by LocusLink ID, UniGene ID, or gene name</li></ul><ul><li>Currently available for Homo sapiens, Mus musculus, and Rattus.</li></ul>
    31. 31. http://genome-www5.stanford.edu/cgi-bin/source/sourceSearch
    32. 33. Source Batch Search <ul><li>http://genome-www5.stanford.edu/cgi-bin/source//sourceBatchSearch </li></ul>
    33. 34. GFP-cDNA <ul><li>Fluorescent protein localization </li></ul>
    34. 35. GFP-cDNA <ul><li>GFP-cDNA localizes proteins to subcellular compartments of the eukaryotic cell using fluorescence microscopy.</li></ul><ul><li>Experimental data are complemented with bioinformatics analyses and published online.</li></ul><ul><li>A collaboration between EMBL and the German Cancer Research Centre (DKFZ).</li></ul>
    35. 36. GFP-cDNA <ul><li>http://gfp-cdna.embl.de/</li></ul>Input works the same way as in Harvester.
    36. 37. SMART <ul><li>Simple Modular Architecture Research Tool: provides protein sequence analysis and domain lookup. Enter a protein sequence or a domain to explore how protein sequences relate to domain architecture.</li></ul><ul><li>Protein domain analysis and identification</li></ul><ul><li>More than 500 domain families found in signaling, extracellular, and chromatin-associated proteins are detectable.</li></ul><ul><li>SMART is built on manually curated alignments of protein domain families.</li></ul><ul><li>Source databases are Swiss-Prot, SP-TrEMBL, and Ensembl.</li></ul>
    37. 38. SMART <ul><li>http://smart.embl-heidelberg.de/</li></ul><ul><li>Normal mode:</li></ul><ul><li>Input a UniProt/Ensembl sequence ID or accession (ACC)</li></ul><ul><li>or a protein sequence</li></ul><ul><li>Includes only proteins from 170 completely sequenced genomes</li></ul>
    38. 39. SMART <ul><li>http://smart.embl-heidelberg.de/</li></ul>
    39. 40. SMART <ul><li>The SMART database is already integrated into the NCBI CDD search and the InterPro search.</li></ul>
    40. 41. PSORT <ul><li>http://psort.ims.u-tokyo.ac.jp/</li></ul><ul><li>Prediction of subcellular localization</li></ul><ul><li>MCCRQLEHDRA TERKKEVEKF KRLIRDPETI KHLDRHSDSK QGKYLNWDAV</li></ul>
    41. 42. SOSUI <ul><li>http://bp.nuap.nagoya-u.ac.jp/sosui/sosui_submit.html</li></ul><ul><li>The algorithm was developed in 1996 at Tokyo University.</li></ul><ul><li>The name roughly means "hydrophobic".</li></ul><ul><li>Predicts whether a protein is soluble or transmembrane.</li></ul><ul><li>The prediction is based on physicochemical properties of the amino acid sequence, such as hydrophobicity and charge.</li></ul>
    42. 43. SOSUI Result http://sosui.proteome.bio.tuat.ac.jp/cgi-bin/adv_sosui.cgi
    43. 44. iHOP <ul><li>http://www.ihop-net.org/UniPub/iHOP/</li></ul>Figure labels: gene, defining information, interaction
    44. 45. GoPubMed <ul><li>http://www.gopubmed.org/</li></ul><ul><li>Gene Ontology: classifies all genes by function or property into three categories: Biological Process, Cellular Component, and Molecular Function.</li></ul>
    45. 46. STRING <ul><li>http://string.embl.de/ </li></ul>
    46. 47. Summary
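    47. 48. Summary <ul><li>Harvester currently covers three genomes: human, mouse, and rat</li></ul><ul><li>Advantages</li></ul><ul><ul><li>Data come from well-known sites or publications (e.g. UniProt, SMART, PSORT, etc.)</li></ul></ul><ul><ul><li>Broad coverage of gene and protein data: experimental data (e.g. GFP-cDNA), predicted domains (e.g. SMART), interactions (e.g. STRING)</li></ul></ul><ul><ul><li>Cross-links to multiple sites (e.g. NCBI, Ensembl, Genome Browser), so data can be compared side by side</li></ul></ul><ul><ul><li>Site data are updated at least once every 14 days</li></ul></ul><ul><li>The one drawback is that links to many tools (e.g. sequence input) are hard to find, so it takes some time to become familiar with the site</li></ul>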
    48. 49. Thank you!