24. … but with a wrinkle
• Lab personnel accept the software you give them
• Analysts are more than happy to develop their own
• We need to make it easy for analysts to build tools within the system
ddooling@wustl.edu
31. Challenges
• There is still much more work to do
• Sequencing is demolishing Moore’s law
• The cult of traces
• The richness of data
• Visualization
33. Thanks
Web Site
http://genome.wustl.edu/
Blog
http://www.politigenomics.com/
LIMS Paper
http://www.biomedcentral.com/1471-2105/8/362
UR Presentation
http://www.media-landscape.com/yapc/2006-06-27.ScottSmith/
Editor's Notes
There is too much data
4 genomes to more than an order of magnitude increase
Move from processing regions to single genomes to multi-genome comparisons
This is a story about how we are trying to deal with this problem
This creates tension
Sample in -> answer out
Don’t care how the sausage was made.
Never the same pipe twice (TJ Maxx)
And expanding beyond the laboratory
Different aligners, genotypers
How do we even begin to tackle this problem?
How do we resolve the tension between changing pipelines and production systems?
Metadata
Store DNA types, equipment, reagents, even process steps as rows rather than tables
So maq is not maq, it is an aligner
Standards like SAM help
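A minimal sketch of the "rows rather than tables" idea, assuming a single hypothetical `entity_type` table held in an in-memory SQLite database. The table name, column names, helper functions, and sample values are all illustrative assumptions, not the schema of the actual system. The point it shows: registering a new aligner is an INSERT, not a schema change, so maq is just one row in the "aligner" category.

```python
import sqlite3


def build_catalog():
    """Create a hypothetical catalog where every kind of thing is a row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entity_type ("
        " id INTEGER PRIMARY KEY,"
        " category TEXT NOT NULL,"
        " name TEXT NOT NULL,"
        " UNIQUE (category, name))"
    )
    # Illustrative placeholder rows; the real categories and names are unknown.
    rows = [
        ("dna_type", "genomic"),
        ("equipment", "sequencer"),
        ("reagent", "buffer"),
        ("process_step", "align"),
        ("aligner", "maq"),
    ]
    conn.executemany(
        "INSERT INTO entity_type (category, name) VALUES (?, ?)", rows
    )
    return conn


def register(conn, category, name):
    """Add a new type as data; no new table is created."""
    conn.execute(
        "INSERT INTO entity_type (category, name) VALUES (?, ?)",
        (category, name),
    )


def names_in(conn, category):
    """Return all names in a category, sorted alphabetically."""
    cur = conn.execute(
        "SELECT name FROM entity_type WHERE category = ? ORDER BY name",
        (category,),
    )
    return [row[0] for row in cur]


def table_count(conn):
    """Count the tables in the schema, to show it does not grow."""
    cur = conn.execute(
        "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table'"
    )
    return cur.fetchone()[0]
```

Pipeline code written this way would ask for "an aligner" by category and receive whichever rows are registered, rather than naming maq directly.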
Solexa/Maq-specific commands
Generic medical resequencing pipeline
Never write SQL
XML and flow chart
Click on any box to see processing details including file system location
Screenshot of script vs. module
photograph
What I have talked about here is automation
There is still much work to do in data reduction
How do you compare more than three genomes?
How do you track all the analysis?
So that’s one problem