Fifth presentation at the Eagle Genomics 2nd symposium, given by Mario Caccamo of TGAC.
"As the cost of sequencing continues to drop, the use of these technologies to directly genotype large populations is becoming the method of choice. Direct sequencing goes beyond the detection of single-nucleotide polymorphisms, allowing for screening of more complex variants, including insertions/deletions and translocations. The manipulation of large genotype datasets using conventional relational database solutions, however, does not scale, in many cases resulting in situations where the size of the indexes surpasses the size of the data. There are many characteristics of genotype data that cannot be efficiently exploited by relational approaches. For instance, genotype data are generated once and used many times, a pattern called WORM (write-once, read-many) data; therefore, an indexing structure supporting updates is not required. Another important observation is that even for complex structural variants, simple and uniform data types can model genotypes. This suggests that simple flat files on high-performance disks can be a satisfactory solution for implementing genotype warehouse databases. Unfortunately, when genotype data are combined with phenotype information, this simple solution will not suffice. In this presentation we will discuss the development of a database platform that can efficiently support genotypes-by-sequencing datasets, with some examples of applications to plant genomics data."
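To make the flat-file point in the abstract concrete, here is a minimal Python sketch of a write-once genotype store with uniform, fixed-width records. It is not the platform described in the talk. The one-byte encoding (0, 1, 2 for the number of alternate alleles, 255 for a missing call), the row-major sample-by-variant layout, and all function names are assumptions made for illustration. Because every record has the same size, the position of any call can be computed arithmetically, so no index is stored or maintained.

```python
import os
import tempfile

# Assumed encoding (hypothetical): one byte per genotype call.
# 0 = homozygous reference, 1 = heterozygous, 2 = homozygous alternate.
MISSING = 255  # assumed sentinel for a missing call


def write_genotype_matrix(path, matrix):
    """Write a samples x variants matrix once, row-major, one byte per call.

    Returns (n_samples, n_variants), which the reader needs to compute offsets.
    """
    if not matrix:
        raise ValueError("matrix must contain at least one sample")
    n_variants = len(matrix[0])
    with open(path, "wb") as fh:
        for row in matrix:
            if len(row) != n_variants:
                raise ValueError("all samples must have the same number of variants")
            fh.write(bytes(row))  # bytes() rejects values outside 0..255
    return len(matrix), n_variants


def read_genotype(path, n_variants, sample, variant):
    """Read one call by seeking to its computed byte offset."""
    if sample < 0 or not 0 <= variant < n_variants:
        raise IndexError("sample or variant out of range")
    with open(path, "rb") as fh:
        fh.seek(sample * n_variants + variant)
        data = fh.read(1)
    if not data:
        raise IndexError("sample out of range")
    return data[0]


def read_sample(path, n_variants, sample):
    """Read all calls for one sample with a single contiguous read."""
    if sample < 0:
        raise IndexError("sample out of range")
    with open(path, "rb") as fh:
        fh.seek(sample * n_variants)
        data = fh.read(n_variants)
    if len(data) != n_variants:
        raise IndexError("sample out of range")
    return list(data)


if __name__ == "__main__":
    # Toy data: 3 samples x 4 variants.
    calls = [
        [0, 1, 2, MISSING],
        [2, 2, 0, 1],
        [1, 0, 0, 0],
    ]
    path = os.path.join(tempfile.mkdtemp(), "genotypes.bin")
    n_samples, n_variants = write_genotype_matrix(path, calls)
    print(read_genotype(path, n_variants, sample=1, variant=3))
    print(read_sample(path, n_variants, sample=2))
```

The file holds exactly one byte per call and nothing else, in contrast to the relational case the abstract mentions, where indexes can outgrow the data. The sketch also shows the limitation the abstract raises: a layout addressed purely by position has no natural place for phenotype attributes or for queries that filter on them.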