Tyler cshlte18 repeat_f_unl

Developing a comprehensive, integrative repeat database for the broad scientific community

Tyler A. Elliott and Sujeevan Ratnasingham
Centre for Biodiversity Genomics, University of Guelph, Ontario, Canada

Repeat Data Challenges
• Mobile genetic element (MGE) and repeat information is of value to a variety of disciplines (evolution, ecology, agriculture, medicine, biotechnology)
• MGE and repeat data are difficult to generate and require curation, and there are few standards for storage, classification and annotation
• Long-read and cheaper sequencing will enable large projects to generate millions of genomes over the next decade, and managing repeat information will be crucial (Figure 1)
• Many databases exist (Table 1), but they can be hard to search and download from, and data are duplicated and fragmented across multiple databases
• Repeat information would greatly benefit from better connectivity and searchability

Figure 1. Projected growth in genomes sequenced over the next decade.
[Plot: Number of Genomes Sequenced. x-axis: Year, 2018 to 2028. y-axis: Genomes (millions), 0 to 10. Series: Archaea and Bacteria, Eukaryote, Plasmid, Virus.]

Table 1. Current repeat and MGE information. * indicates an underestimate.

MGE/Repeat Statistic                Number
MGE records in Databases            1.3 million
Accessions with MGEs in GenBank     6 million*
Repeat records in Databases         8 million
Species with MGE/repeat records     ~3000

RepeatFUnL: Filterable Universal Library
• RepeatFUnL will aggregate MGE and repeat information across databases, supporting and enhancing current databases rather than replacing them
• The central units of RepeatFUnL are Repeat Records
• Data stored in NoSQL format to aid in searching and filtering a large distributed dataset
• Will include data from databases and primary literature, data uploaded by users, and data generated de novo

[Diagram labels, data sources: Genomes, Databases, MGE/Repeat Community, Literature]
[Diagram labels, user actions: Search, Analyze, Download, Upload, Curate, Collaborate]
[Diagram labels, record structure: Taxonomy, Repeat Records, References, Associated Data, External IDs]

Value Added by RepeatFUnL
• Aggregate data across sources in a single, searchable format for easy download
• Build on the expertise and reputation of the Centre for Biodiversity Genomics in developing and maintaining mature sequence databases and NGS analysis resources (BOLD, mBRAVE)
• Make computationally intensive data generated by experts more discoverable and usable to the general scientific community
• Universal data schema for repeat and MGE transactions and storage of data

Future Goals
• Provide analysis tools to aid in data curation and generation for users
• Serve as a platform to enforce community-developed standards for MGE and repeat annotation and classification
• Develop teaching applications to introduce students to genomic data and curation
• Understand the impact of MGEs and repeats on phenotypic variation and disease across the Tree of Life
• Unravel the evolutionary diversity of MGEs and other mobile DNA

Next Steps
• What types of data need to be included to make RepeatFUnL useful?
• What formats should be supported/compatible?
• Please fill out our questionnaire at www.repeatfunl.org
• Gaining the cooperation and support of the MGE and repeat community and private databases
• Please contact us if you are interested in furthering this project
• telliott@boldsystems.org
• @TransposableMan

Acknowledgements
Thanks go to Elizabeth Smikle, Jireh Agda, Nolan Dickson, Dina Soliman and Megan Milton for programming, image generation and layout support.
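The poster describes Repeat Records as the central unit of RepeatFUnL, stored as NoSQL-style documents and aggregated across databases that may duplicate one another. The following is a minimal sketch of that idea, not the project's actual schema or code: the field names (`name`, `taxon`, `classification`, `source`, `external_id`), the functions `aggregate` and `search`, and all sample values are hypothetical placeholders invented for illustration.

```python
# Hypothetical sketch of aggregating Repeat Records from several sources.
# Field names, function names and sample values are assumptions made for
# illustration; none of them come from the RepeatFUnL project itself.

# Invented placeholder records, shaped like schemaless (NoSQL-style) documents.
SAMPLE_RECORDS = [
    {"name": "ExampleElement1", "taxon": "Example species A",
     "classification": "DNA transposon", "source": "database_a",
     "external_id": "A-0001"},
    {"name": "exampleelement1", "taxon": "Example species A",
     "classification": None, "source": "database_b",
     "external_id": "B-9001"},
    {"name": "ExampleElement2", "taxon": "Example species B",
     "classification": "LTR retrotransposon", "source": "database_a",
     "external_id": "A-0002"},
]


def aggregate(records):
    """Merge records that describe the same element in the same taxon.

    Records are matched on a case-insensitive (name, taxon) key. The merged
    document keeps every contributing source and its external ID, so the
    original databases stay traceable rather than being replaced.
    """
    merged = {}
    for rec in records:
        key = (rec["name"].lower(), rec["taxon"].lower())
        if key not in merged:
            merged[key] = {
                "name": rec["name"],
                "taxon": rec["taxon"],
                "classification": rec.get("classification"),
                "sources": [],
                "external_ids": {},
            }
        entry = merged[key]
        if rec["source"] not in entry["sources"]:
            entry["sources"].append(rec["source"])
        entry["external_ids"][rec["source"]] = rec["external_id"]
        # Fill in a missing classification from a later source if available.
        if entry["classification"] is None:
            entry["classification"] = rec.get("classification")
    return list(merged.values())


def search(records, **criteria):
    """Return records whose fields equal every given keyword criterion."""
    return [
        rec for rec in records
        if all(rec.get(field) == value for field, value in criteria.items())
    ]


if __name__ == "__main__":
    repeat_records = aggregate(SAMPLE_RECORDS)
    for record in search(repeat_records, taxon="Example species A"):
        print(record["name"], record["sources"], record["external_ids"])
```

Matching on name and taxon is only one possible deduplication rule. A real system would likely need sequence-level comparison and curated cross-references, which this sketch does not attempt.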