Data from nearly 30,000 experiments have been deposited in the Sequence Read Archive, a repository of genomic and transcriptomic data corresponding to approximately 2 x 10^15 nucleotides (equivalent to approximately 600,000 human genomes). This is a huge resource that could be mined for many different purposes. It is also a vast and varied collection of samples, gathered using a variety of technologies and experimental protocols. Jamie Alnasir and I have demonstrated that the annotation of these data is in general very poor, which undermines the usefulness of the resource. We draw some conclusions from this and discuss where such databases need to go in the future.