Database Integration to Improve Accessibility to High-Throughput Sequence Data (Tazro Ohta)
This document discusses the need for reliable and accessible databases to store high-throughput sequencing (HTS) data. It describes how integrating databases with publications and metadata can improve searchability and reuse of HTS data. Specifically, it notes that integrating Japan's DRA HTS database with PubMed, PMC, and sequence read quality data sources allows for more efficient searches based on metadata. This enhanced metadata searchability addresses the earlier problem that data without descriptions were difficult to find. The integration demonstrates the power of combining resources to improve data accessibility while maintaining reliability through curation. Going forward, the document suggests the database could further integrate alignment data and require that analysis pipelines be publicly available to improve reproducibility.
This document discusses large-scale data in life sciences. It covers next-generation sequencing data such as short reads from sequencing platforms like Illumina HiSeq 2000. It also discusses techniques for analyzing sequencing data such as de novo assembly and reference alignment. Key algorithms mentioned include Velvet and SOAPdenovo for de novo assembly. Issues around processing large datasets with tools like Galaxy are also briefly covered.
Kusarinoko: developing the public next generation sequencing data search inte... (Tazro Ohta)
The Kusarinoko project aims to develop a public next-generation sequencing data search interface that makes it easier to find and access data from the Sequence Read Archive (SRA). Problems with managing and searching the large amount of metadata in public databases motivated the creation of Kusarinoko. It integrates metadata from various SRA files and adds publication information and quality check results to give users more support. Statistics analyzed by the project provide insights into trends in the SRA, such as changes in sequencing platforms over time, and show that some data may lack sufficient quality despite being associated with published articles.
The document provides information about an NGS database that maps public expression data to chromosomes and CPUs. It also includes stock photos and image links from photo sharing websites to illustrate the mapping of data.
This document appears to be a collection of random Twitter links from various accounts posted over time. It does not form a coherent story or provide any essential information in its current state. The document references Twitter accounts and post IDs but does not include any tweet text or summaries.
Now and then: next-generation sequencing database to encourage the big data science
1. The past and future of next-generation sequencing research, as seen from the database
What should databases do to help researchers?
Now and then: next-generation sequencing database to encourage the big data science
Database Center for Life Science
大田達郎 Tazro Ohta