Kusarinoko: developing the public next generation sequencing data search interface that works.
Published in: Technology, Education
  • Transcript

    • 1. Kusarinoko: developing the public next generation sequencing data search interface that works. Tazro Ohta, Database Center for Life Science, Research Organization of Information and Systems
    • 2. Today’s topics: Problems for NGS data archive (managing large-scale data). Kusarinoko project, for a better way to search and browse (metadata: fix and add). Inside of Sequence Read Archive (statistics of SRA reveal how it is).
    • 3. Problems for NGS data archive
    • 4. Cost of storing large-scale sequence data. Storing large-scale NGS data causes many problems: data transfer, storage, backup... Metadata management is one big problem for a public NGS database. Metadata: description of sequencing data (sample, sequencer platform, application, etc.). Fixing metadata is a lifeline for a public NGS database.
    • 5. [Figure: Public NGS database, Sequence Read Archive. A lab / research institute submits sequence data with metadata (Data ID: 000001; organism: mouse; cell: nervous cell; sequencer: 454; date: 2011 12 08) to the Sequence Read Archive. SRA, DRA, and ENA exchange and share data through INSDC, the int’l nucleotide seq DB collaboration.]
    • 6. It cannot be easy to find the data you want. Over 55,000 submissions and over 350,000 sequence runs, and the amount and size of the data are still increasing. Metadata is provided separately (submission / study / experiment / sample / run) and is not described perfectly. Fixing metadata and adding extra information is NEEDED.
    • 7. Kusarinoko project, for better way to search and browse
    • 8. Aim of Kusarinoko project. Cutting the cost of using public data of SRA: search, browse, download, check. Giving more resources to support using data: is the data really sound?
    • 9. [Figure: Integrate metadata, add extra information. Sequence data metadata (Submission.xml, Study.xml, Experiment.xml, Sample.xml, Run.xml) is integrated in Kusarinoko with the PubMed ID (obtained from sra.dbcls.jp) and the FastQC result (sequence quality calculated by FastQC).]
    • 10. Limitation and features. Covering only the data which has at least one published article: if a paper is not published yet, Kusarinoko cannot find it (publication info: sra.dbcls.jp). Quality checking is still a beta version: still validating and trying to offer better information, which will take more time.
    • 11. http://g86.dbcls.jp/kusarinoko or google “kusarinoko”
    • 12. Inside of Sequence Read Archive
    • 13. Statistics for stepping into SRA. Statistics of SRA by publication and seq quality. ONLY PUBLIC NGS DATA IN SRA WHICH HAS PUBLICATION. Detailed stat will be available online at project website soon.
    • 14. [Figure: platform trend statistics. Number of submissions, 2007~2011. Blue: Roche, Yellow: Illumina, Green: AB, Pink: Helicos, Red: PacBio.]
    • 15. [Figure: Journal statistics. Number of PubMed IDs colored by library type. Blue: genomic, Red: transcriptomic, Brown: metagenomic, Yellow: synthetic, Purple: Viral RNA, Green: non genomic. Total 97 journals (unidentified); total # of pmid: 587.]
    • 16. [Figure: minimum read length vs average quality value. Quick quality calc; total average qual (phred). Blue: Roche, Yellow: Illumina, Green: AB, Pink: Helicos, Red: PacBio. Same as max read length. Total # of items (run): 16,006 (continuing).]
    • 17. [Figure: total number of reads vs N content. Total N content rate; no correlation with number of reads or library prep methods. Total # of items (run): 16,006 (continuing).]
    • 18. [Figure: total number of reads vs duplication rate. Total sequence duplication; same as previous stat: the amount of reads does not seem to affect duplication. Total # of items (run): 16,006 (continuing).]
    • 19. Conclusion
    • 20. Conclusion: for making use of public resources. Developed a service to help searching and browsing SRA data: publication information and the result of quality check support the metadata. Statistics revealed the inside of SRA and gave some insights: they showed NGS trends, and some items don’t have enough quality even if they have a published article. Detailed results and more at poster presentation: 2P-0132 (today).
    • 21. Thank You
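
The per-run statistics on slides 16 to 18 (average Phred quality, N content rate, duplication rate) can be illustrated with a minimal Python sketch. This is not the Kusarinoko pipeline and not FastQC's own algorithm: FastQC estimates duplication differently, and the function names, the `RunStats` type, and the Phred+33 offset below are all assumptions made for illustration.

```python
import io
from collections import Counter
from typing import Iterable, Iterator, NamedTuple, TextIO, Tuple


class RunStats(NamedTuple):
    """Hypothetical summary of one sequencing run."""
    reads: int
    mean_quality: float      # mean Phred score over all bases
    n_rate: float            # fraction of bases called as N
    duplication_rate: float  # fraction of reads repeating an earlier read


def parse_fastq(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (sequence, quality) pairs from simple 4-line FASTQ records."""
    while True:
        header = handle.readline()
        if not header:
            return                # end of file
        if not header.strip():
            continue              # skip stray blank lines
        seq = handle.readline().strip()
        handle.readline()         # '+' separator line
        qual = handle.readline().strip()
        yield seq, qual


def summarize(records: Iterable[Tuple[str, str]], offset: int = 33) -> RunStats:
    """Compute simple run-level statistics (assumes Phred+33 by default)."""
    reads = bases = n_bases = qual_chars = qual_sum = 0
    seen: Counter = Counter()
    for seq, qual in records:
        reads += 1
        bases += len(seq)
        n_bases += seq.upper().count("N")
        qual_chars += len(qual)
        qual_sum += sum(ord(c) - offset for c in qual)
        seen[seq] += 1
    if reads == 0 or bases == 0 or qual_chars == 0:
        raise ValueError("no reads to summarize")
    duplicates = reads - len(seen)  # exact-match duplicates only
    return RunStats(reads, qual_sum / qual_chars,
                    n_bases / bases, duplicates / reads)


# Tiny made-up example: three reads, one exact duplicate, one N base.
SAMPLE = "@r1\nACGT\n+\nIIII\n@r2\nACGN\n+\nII5!\n@r3\nACGT\n+\nIIII\n"

if __name__ == "__main__":
    print(summarize(parse_fastq(io.StringIO(SAMPLE))))
```

Counting only exact-match duplicates keeps the sketch short; a real archive-scale run would need a sampling or hashing strategy to bound memory.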
