Technology development of database integration to make sense of big data in lifescience

To promote life science research in Japan, the National Bioscience Database Center (NBDC) works to make databases easier to use. As one of the core technology development programs at NBDC, the Database Center for Life Science (DBCLS) has been tackling the problem of how to organize big data in life science, including the huge amount of nucleotide sequence data from next-generation sequencers and various types of gene expression data.

For nucleotide sequence data, we have organized the data deposited in the Sequence Read Archive (SRA) so that they can be reused, in collaboration with DDBJ, one of the organizations that jointly maintain SRA. By analyzing SRA metadata, we maintain statistics on SRA by study type, sequencer type (platform), and sample species, and this information is available from the DBCLS SRA website (http://sra.dbcls.jp/). Notably, we are collecting SRA entries associated with publications and diseases, and the corresponding search forms are also accessible from the DBCLS SRA website.
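
As a rough illustration of the kind of metadata tally described above, the following Python sketch counts records by study type, platform, and species. The records and the field names (`study_type`, `platform`, `species`) are made up for this example; they are not the actual SRA schema, and this is not the DBCLS implementation.

```python
from collections import Counter

# Hypothetical, hand-written records standing in for parsed SRA metadata.
# The field names are assumptions for illustration, not the SRA schema.
records = [
    {"study_type": "Transcriptome Analysis", "platform": "ILLUMINA", "species": "Homo sapiens"},
    {"study_type": "Whole Genome Sequencing", "platform": "ILLUMINA", "species": "Mus musculus"},
    {"study_type": "Transcriptome Analysis", "platform": "LS454", "species": "Homo sapiens"},
    {"study_type": "Metagenomics", "platform": "ILLUMINA", "species": "marine metagenome"},
]


def tally(records, field):
    """Count how many records carry each value of one metadata field."""
    return Counter(r.get(field, "unknown") for r in records)


# Print one small frequency table per metadata field.
for field in ("study_type", "platform", "species"):
    print(field)
    for value, count in tally(records, field).most_common():
        print(f"  {value}: {count}")
```

Keeping the tally generic over the field name means the same function can serve all three breakdowns mentioned in the text.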
We are also developing a search engine for nucleotide sequence data that uses a compressed suffix array. We have developed a GooGle-like search engine for RNA molecules, called GGRNA [1], which is available from the GGRNA website (http://ggrna.dbcls.jp/).
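
To illustrate the idea behind suffix-array search, here is a minimal Python sketch. It uses a plain, uncompressed suffix array with a naive construction, not the compressed suffix array used in GGRNA, and the function names and the demo sequence are assumptions made only for this example.

```python
def build_suffix_array(text):
    """Return the start positions of all suffixes of text, in sorted order."""
    # Naive construction that compares whole suffixes; fine for a short
    # demo sequence, far too slow for real transcript collections.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_all(text, sa, pattern):
    """Return the sorted positions where pattern occurs in text."""
    m = len(pattern)
    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    # All suffixes between the two bounds start with the pattern.
    return sorted(sa[start:lo])


seq = "GATTACAGATTACA"  # made-up demo sequence
sa = build_suffix_array(seq)
print(find_all(seq, sa, "ATTA"))  # prints [1, 8]
```

Because all suffixes that begin with the query sit next to each other in the sorted order, two binary searches are enough to find every occurrence, which is what makes this family of indexes fast for exact sequence lookup.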

To handle various types of gene expression data, we built an integrated dataset and its interface, called RefEx (Reference Expression dataset: http://refex.dbcls.jp/), for browsing gene expression data derived from public databases and measured by the following four methods in human, mouse, and rat.

1. Expressed Sequence Tag (EST) counts in the EST division of INSDC (DDBJ/ENA/GenBank)
2. DNA microarray (Affymetrix GeneChip)
3. CAGE (Cap Analysis of Gene Expression) tag counts around transcription start sites
4. Transcriptome sequence counts from next-generation sequencers (RNA-seq)

The RefEx web interface provides a form in which users can search by gene name, various types of IDs, chromosomal region in genetic maps, keyword, and nucleotide sequence. Gene expression values are mapped onto the 3D body image from BodyParts3D [2], and graphical histograms of those values are available for each measurement method.
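
One use of such an expression dataset, named in the slides, is the search for tissue-specific genes. The Python sketch below shows one simple way such a search could be expressed. The score (the share of a gene's total expression found in its top organ), the 0.8 threshold, and all gene names and values are assumptions made for illustration; this is not RefEx's actual method or data.

```python
# Made-up expression values (arbitrary units) for illustration only;
# gene and organ names are placeholders, not RefEx data.
expression = {
    "geneA": {"brain": 2.0, "liver": 96.0, "heart": 2.0},
    "geneB": {"brain": 30.0, "liver": 35.0, "heart": 35.0},
}


def top_organ_share(values):
    """Return (organ, share) for the organ with the highest value."""
    total = sum(values.values())
    if total == 0:
        return None, 0.0
    organ = max(values, key=values.get)
    return organ, values[organ] / total


def tissue_specific(expression, threshold=0.8):
    """Map each gene whose top organ holds at least `threshold` of its total to that organ."""
    result = {}
    for gene, values in expression.items():
        organ, share = top_organ_share(values)
        if organ is not None and share >= threshold:
            result[gene] = organ
    return result


print(tissue_specific(expression))  # prints {'geneA': 'liver'}
```

A share-of-total score is only one of several possible definitions of tissue specificity; it is used here because it is easy to read off from per-organ values like those shown in the histograms.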

We will present the current status of the project and the utility of the systems developed.

[1] Naito, Y. and H. Bono (2012) GGRNA: an ultrafast, transcript-oriented search engine for genes and transcripts. Nucleic Acids Research. 40: W592-W596.
[2] Mitsuhashi, N., Fujieda, K., Tamura, T., Kawamoto, S., Takagi, T. and K. Okubo (2009) BodyParts3D: 3D structure database for anatomical concepts. Nucleic Acids Research. 37: D782-D785.

You can see the TogoTV version of this presentation at http://togotv.dbcls.jp/20130903.html


Transcript

  • 1. © 2013 DBCLS, licensed under CC BY 2.1 JAPAN. Technology development of database integration to make sense of big data in lifescience. Hidemasa Bono, Database Center for Life Science (DBCLS), Research Organization of Information and Systems (ROIS), JAPAN
  • 2. Who we are: togoDB
      • The integrated database project in Japan
      • Collaborative effort to recycle data
          – Provide data that can easily be reused
          – Retain data that is part of 'public data'
      • Diagram labels: Togo Headquarters, Technology developer, DNA data archiver, Universities & institutes, Data organizer
      • http://biosciencedbc.jp/
  • 3. NBDC portal: http://biosciencedbc.jp/
  • 4. http://integbio.jp/dbcatalog/
  • 5. Free!
      • No registration
      • Not only for academia, also for-profit
      • Photo by @hirabat (1st Bono Conference on 20130113)
  • 6. Big data in lifescience
      • Output mostly from machines
          – NGS (Next Generation Sequencers): over 100M lines, 2 Gbyte in size per sample
      • Ethical issues: personal human genome
      • So many variations in...
          – Data format
          – Application: re-sequencing, de novo seq, RNA-seq, ...
          – Annotation: granularity of metadata
      • Pictures from Togo Picture Gallery: http://g86.dbcls.jp/togopic/
      • Diagram labels: NGS (SRA), GEO, ArrayExpress, Genome, Metagenome, RNAseq, ChIPseq, microarray (GeneChip, Oligoarray)
  • 7. Making sense of big data...
      1. Exhaustive, but functional index
      2. Search engine for lifescience
      3. Highly curated dataset
      • Diagram labels: NGS (SRA), GEO, ArrayExpress, Genome, Metagenome, RNAseq, ChIPseq, microarray (GeneChip, Oligoarray)
  • 8. What we have developed
      1. Yellow pages for archived NGS data: http://SRA.dbcls.jp/
      2. Search engine for nucleotide sequences: http://GGRNA.dbcls.jp/
      3. Summarization and visualization of reference transcriptome data: http://RefEx.dbcls.jp/
  • 9. 1. DBCLS SRA (http://SRA.dbcls.jp/)
      • Yellow pages for archived NGS data
          – Indexed by metadata. Search by: statistics, publications, diseases
          – Direct link to the original DB (SRA)
      • Pre-calculated QC data
      • Pipeline to help users re-use public NGS data: Search data, Download, Quality Check, Data processing, Analysis
  • 10. Statistics: studies (http://SRA.dbcls.jp/)
  • 11. (no text)
  • 12. Statistics: samples (http://SRA.dbcls.jp/)
  • 13. Search by publications
  • 14. Search by diseases
  • 15. Search by diseases (cont.) (http://SRA.dbcls.jp/)
  • 16. 2. GGRNA: GooGle-like RNA search engine (http://GGRNA.dbcls.jp/)
      • Quickly finds nucleotide sequences as well as other fields in RefSeq transcripts using a suffix array
      • Easily highlights PCR primers, microarray probes and target sequences of siRNA
      • Naito Y. & Bono H. Nucleic Acids Res. (2012) 40: W592-6. doi: 10.1093/nar/gks448
  • 17. Genome version of GGRNA? Yes, we can! GooGle-like Genome search engine: http://GGGenome.dbcls.jp/
  • 18. 3. RefEx: Reference Expression Dataset (http://RefEx.dbcls.jp/)
      • 40 organs dataset, 4 different methods, with BodyParts3D
          – Reference of gene expression in normal organs throughout the mammalian body
          – Practical example of reuse of useful public data
      • The search for "tissue-specific genes"
      • Methods:
          – EST: Classical Expressed Sequence Tags
          – GeneChip: Affymetrix's microarray
          – CAGE: Cap Analysis of Gene Expression
          – RNAseq: Transcriptome Sequencing
  • 19. http://RefEx.dbcls.jp/
  • 20. (no text)
  • 21. (no text)
  • 22. What we have developed (and are developing)
      1. Yellow pages for archived NGS data: http://SRA.dbcls.jp/
      2. Search engine for nucleotide sequences: http://GGRNA.dbcls.jp/
      3. Summarization and visualization of reference transcriptome data: http://RefEx.dbcls.jp/
  • 23. TogoTV: archive of talks and tutorial videos expounding how to use biological databases and tools (http://togotv.dbcls.jp/en/)
      • Acknowledgement
          – Members in DBCLS for technology development
          – NBDC for funding / DDBJ for storage & CPU time
          – All people for sharing precious data
