Acknowledgements
Thanks go to Elizabeth Smikle, Jireh Agda, Nolan Dickson, Dina Soliman and Megan Milton for programming, image generation and layout support.
Next Steps
• What types of data need to be included to make RepeatFUnL useful?
• What formats should be supported or compatible?
• Please fill out our questionnaire at www.repeatfunl.org
• Gain the cooperation and support of community and private MGE and repeat databases
• Please get in touch if you are interested in furthering this project:
• telliott@boldsystems.org
• @TransposableMan
Future Goals
• Provide analysis tools to help users curate and generate data
• Serve as a platform for enforcing community-developed standards for MGE and repeat annotation and classification
• Develop teaching applications that introduce students to genomic data and curation
• Understand the impact of MGEs and repeats on phenotypic variation and disease across the Tree of Life
• Unravel the evolutionary diversity of MGEs and other mobile DNA
Value Added by RepeatFUnL
• Aggregate data across sources into a single, searchable format for easy download
• Build on the expertise and reputation of the Centre for Biodiversity Genomics in developing and maintaining mature sequence databases and NGS analysis resources (BOLD, mBRAVE)
• Make computationally intensive data generated by experts more discoverable and usable by the general scientific community
• Provide a universal data schema for exchanging and storing repeat and MGE data
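The poster does not publish the Repeat Record schema. As a rough illustration of what a universal schema with a single download format could look like, the sketch below models a record using the components shown in the Repeat Record diagram (taxonomy, references, associated data, external IDs) and serializes any number of records to one JSON document. The class name, field names, example record ID and accession are all hypothetical.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class RepeatRecord:
    """Hypothetical Repeat Record layout; field names are illustrative, not RepeatFUnL's."""
    record_id: str
    repeat_class: str                                           # e.g. "DNA transposon"
    taxonomy: dict[str, str]                                    # rank -> taxon name
    references: list[str] = field(default_factory=list)         # literature citations or DOIs
    external_ids: dict[str, str] = field(default_factory=dict)  # source database -> accession
    associated_data: dict[str, object] = field(default_factory=dict)  # free-form extras


def to_download_json(records: list[RepeatRecord]) -> str:
    """Serialize records, whatever their original source, into one JSON document."""
    return json.dumps([asdict(r) for r in records], indent=2, sort_keys=True)


# Hypothetical example record; "SourceDB_A" and "A123" are placeholders.
rec = RepeatRecord(
    record_id="RFU-0001",
    repeat_class="DNA transposon",
    taxonomy={"kingdom": "Animalia", "species": "Drosophila melanogaster"},
    external_ids={"SourceDB_A": "A123"},
)
print(to_download_json([rec]))
```

Keeping cross-references in a dedicated `external_ids` field is one way to let a record point back to the databases it was aggregated from, rather than replacing them.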
RepeatFUnL: Filterable Universal Library
• RepeatFUnL will aggregate MGE and repeat information across databases, supporting and enhancing current databases rather than replacing them
• The central units of RepeatFUnL are Repeat Records
• Data are stored in a NoSQL format to aid searching and filtering of a large, distributed dataset
• RepeatFUnL will include data from existing databases and the primary literature, data uploaded by users, and data generated de novo
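The poster states that data will be stored in a NoSQL format to aid filtering but does not name a database. A minimal in-memory sketch, assuming records are schemaless nested documents, of how such filtering might work: dotted paths reach into nested fields (similar in spirit to the dot notation used by document stores such as MongoDB), and a list in the query means "match any of these values". The function names and record fields are hypothetical.

```python
from typing import Any


def get_path(doc: dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'taxonomy.order' into nested dicts; None if absent."""
    value: Any = doc
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def matches(doc: dict[str, Any], query: dict[str, Any]) -> bool:
    """True if every query field matches; a list value means 'any of these'."""
    for path, wanted in query.items():
        value = get_path(doc, path)
        if isinstance(wanted, list):
            if value not in wanted:
                return False
        elif value != wanted:
            return False
    return True


def filter_records(docs: list[dict[str, Any]], query: dict[str, Any]) -> list[dict[str, Any]]:
    return [d for d in docs if matches(d, query)]


# Hypothetical documents; they need not all share the same fields.
records = [
    {"id": "r1", "repeat_class": "LTR", "taxonomy": {"order": "Diptera"}},
    {"id": "r2", "repeat_class": "LINE", "taxonomy": {"order": "Diptera"}},
    {"id": "r3", "repeat_class": "LTR", "taxonomy": {"order": "Primates"}},
    {"id": "r4", "repeat_class": "satellite"},
]

hits = filter_records(records, {"taxonomy.order": "Diptera", "repeat_class": ["LTR", "LINE"]})
print([d["id"] for d in hits])  # ['r1', 'r2']
```

Because missing fields simply fail to match, records from heterogeneous sources can share one collection without being forced into identical shapes.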
Repeat Data Challenges
• Mobile genetic element (MGE) and repeat information is valuable to a variety of disciplines (evolution, ecology, agriculture, medicine, biotechnology)
• MGE and repeat data are difficult to generate and require curation, and there are few standards for their storage, classification and annotation
• Long-read and cheaper sequencing will enable large projects to generate millions of genomes over the next decade, making the management of repeat information crucial (Figure 1)
• Many databases exist (Table 1), but they can be hard to search and download from, and their data are often duplicated and fragmented across multiple databases
• Repeat information would benefit greatly from better connectivity and searchability
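Duplication of the same element across several databases is one of the challenges listed above. One plausible way to collapse duplicates, assuming each record carries (database, accession) cross-references, is to treat records that share any cross-reference as the same element and merge them transitively with a union-find structure. This is a sketch under that assumption, not RepeatFUnL's documented method; the database names `db_A` to `db_C` and all accessions are made up.

```python
from collections import defaultdict


def merge_duplicates(records: list[dict]) -> list[dict]:
    """Group records sharing any external ID (directly or transitively) and merge each group."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    def union(i: int, j: int) -> None:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri

    # Link each record to the first record that cited the same (database, accession) pair.
    first_seen: dict[tuple[str, str], int] = {}
    for idx, rec in enumerate(records):
        for key in rec["external_ids"]:
            if key in first_seen:
                union(first_seen[key], idx)
            else:
                first_seen[key] = idx

    groups: dict[int, list[int]] = defaultdict(list)
    for idx in range(len(records)):
        groups[find(idx)].append(idx)

    # Each merged entry keeps the union of cross-references and contributing sources.
    merged = []
    for members in groups.values():
        ids: set[tuple[str, str]] = set()
        sources: set[str] = set()
        for idx in members:
            ids.update(records[idx]["external_ids"])
            sources.add(records[idx]["source"])
        merged.append({"external_ids": sorted(ids), "sources": sorted(sources)})
    return merged


# Hypothetical records; "X0001" is a placeholder accession, not a real GenBank entry.
records = [
    {"source": "db_A", "external_ids": [("db_A", "A1"), ("GenBank", "X0001")]},
    {"source": "db_B", "external_ids": [("db_B", "B7"), ("GenBank", "X0001")]},
    {"source": "db_C", "external_ids": [("db_C", "C3"), ("db_B", "B7")]},
    {"source": "db_C", "external_ids": [("db_C", "C9")]},
]

for group in merge_duplicates(records):
    print(group)
```

Union-find makes the grouping transitive: the first `db_C` record shares nothing with `db_A` directly but is linked to it through `db_B`, so all three collapse into one merged entry, while the unrelated `C9` record stays separate.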
[Diagram: RepeatFUnL activities: Analyze, Download, Upload, Curate, Collaborate, Search]
Tyler A. Elliott and Sujeevan Ratnasingham
Centre for Biodiversity Genomics, University of Guelph, Ontario, Canada
Developing a comprehensive, integrative repeat database for the broad scientific community
[Diagram: MGE/Repeat information linked to Genomes, Databases, Community and Literature]
Figure 1. Projected growth in genomes sequenced over the next decade.
Table 1. Current repeat and MGE information. * indicates an underestimate.

| MGE/repeat statistic | Number |
|---|---|
| MGE records in databases | 1.3 million |
| Accessions with MGEs in GenBank | 6 million* |
| Repeat records in databases | 8 million |
| Species with MGE/repeat records | ~3000 |
[Diagram: Repeat Record structure, with Taxonomy, References, Associated Data and External IDs]
[Figure 1 plot: Number of Genomes Sequenced; y-axis Genomes (millions), 0 to 10; x-axis Year, 2018 to 2028; series: Archaea and Bacteria, Eukaryote, Plasmid, Virus]
Tyler cshlte18 repeat_f_unl