3. Congrats!
• first ever genome assembly
• complete approach
• with real-world cutting edge tools
• Some shortcuts:
• we used 2% of a eukaryotic (ant) genome
• only 1 type of paired reads
• only used “one-step” software.
6. So you want to sequence a genome…
• Sampling?
• algorithms prefer low diversity
• Sequencing approach?
• paired end?
• which sequencer?
• what is needed for scaffolding?
• Input data QA?
• sequencer statistics (e.g., FastQC): unable to detect all errors!
• bio-relevant measurements? (e.g., % mapping to known data)
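As a rough illustration of the "sequencer statistics" bullet, the sketch below computes the mean Phred quality of each read in a FASTQ file. It assumes plain four-line FASTQ records and Phred+33 encoding (an assumption; some older Illumina data uses Phred+64). The function name and the two sample reads are hypothetical and not from the slides; a dedicated tool such as FastQC reports far more than this.

```python
import io

def mean_read_qualities(handle, offset=33):
    """Yield (read_id, mean Phred quality) for each four-line FASTQ record.

    offset=33 assumes Phred+33 encoding; pass 64 for Phred+64 data.
    """
    while True:
        header = handle.readline().strip()
        if not header:              # end of file (or blank line): stop
            break
        handle.readline()           # sequence line, not needed here
        handle.readline()           # '+' separator line
        qual = handle.readline().strip()
        scores = [ord(c) - offset for c in qual]
        mean = sum(scores) / len(scores) if scores else 0.0
        yield header[1:], mean      # drop the leading '@' from the read ID

# Hypothetical two-read example: 'I' encodes Q40 and '!' encodes Q0 in Phred+33.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n!!II\n")
results = list(mean_read_qualities(example))
print(results)
```

A low mean flags a suspicious read, but as the slide notes, statistics like this cannot detect all errors.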
7. So you want to sequence a genome…
• trimming/deduplicating/filtering
• removing excess/redundant data
• removing errors
• Which assembler?
• used by others? (publications / online lists / forums / Assemblathon)
• something new?
• assembly result QA
• sequence statistics (e.g., QUAST)
• bio-relevant measures (e.g., CEGMA)
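One common sequence statistic for assembly QA is N50: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The sketch below is a minimal illustration with made-up contig lengths, not a replacement for QUAST, and the function name is hypothetical.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    ordered = sorted(lengths, reverse=True)   # longest contigs first
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:             # half the assembly is now covered
            return length
    return 0

# Hypothetical contig lengths: total 300, so half is 150.
# 100 covers 100; adding 80 reaches 180, which passes 150, so N50 is 80.
contig_lengths = [100, 80, 50, 30, 20, 20]
print(n50(contig_lengths))
```

N50 only describes contiguity; it says nothing about whether the contigs are correct, which is why the bio-relevant measures matter as well.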
8. Perfect parameters
• Instead: need to test many combinations
• of trimming
• of filtering
• different assembly software
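Testing many combinations amounts to a grid over the choices listed above. The sketch below enumerates such a grid with the standard library. Every value in it is a hypothetical placeholder (the quality thresholds, the minimum read lengths, and the assembler names are not from the slides), and actually launching each run is left out.

```python
from itertools import product

# Hypothetical option lists; substitute the settings relevant to your data.
trim_quality = [10, 20, 30]                   # trimming: quality cutoff
min_read_length = [30, 50]                    # filtering: shortest read kept
assemblers = ["assembler_A", "assembler_B"]   # placeholder names

# One dictionary per combination of trimming, filtering, and assembler.
runs = [
    {"trim_quality": q, "min_read_length": m, "assembler": a}
    for q, m, a in product(trim_quality, min_read_length, assemblers)
]

print(len(runs))   # 3 * 2 * 2 combinations
print(runs[0])
```

The number of runs grows multiplicatively with each option added, so it helps to keep each list short.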
9. Take home messages
• No “best way”
• Need to install a lot of software
• A lot of work in UNIX: launching software, converting formats…
• Need to test many parameters
• Be careful with qualities!
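As one small example of format conversion, the sketch below turns FASTQ records into FASTA. It assumes plain four-line FASTQ records; the function name and the sample read are hypothetical. Note that FASTA has no place for quality scores, so this conversion discards them.

```python
import io

def fastq_to_fasta(fastq_handle, fasta_handle):
    """Write each four-line FASTQ record as a two-line FASTA record.

    Returns the number of records converted. Quality scores are dropped.
    """
    count = 0
    while True:
        header = fastq_handle.readline().strip()
        if not header:                    # end of input
            break
        sequence = fastq_handle.readline().strip()
        fastq_handle.readline()           # '+' separator line
        fastq_handle.readline()           # quality line (discarded)
        fasta_handle.write(">" + header[1:] + "\n" + sequence + "\n")
        count += 1
    return count

# Hypothetical single-read example using in-memory files.
source = io.StringIO("@read1\nACGT\n+\nIIII\n")
target = io.StringIO()
converted = fastq_to_fasta(source, target)
print(converted, target.getvalue())
```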
10. No need to understand everything!
20% effort for 80% result