Computers are particularly well suited to searching through the billions of bases in a genome for information that can alter treatments and save lives. Done incorrectly, however, even finding a short stretch of sequence in the genome can take a computer an impractically long time. In this discussion we will show practical examples of how to encode genomic sequences correctly for very fast, programmatic search. We’ll explore in detail how computers work, think, and read. Given the right “genomic language”, they can be our greatest, and fastest, allies in finding harmful variants.
Timothy Collinson
2. Who is Tim?
• A long-time programmer
• Worked for some interesting companies!
• Technical Fellow working at HP Labs
• Interested in technical challenges
• Has a pretty big family (wife and 6 kids)
• Scoutmaster (plenty of experience with blood)
3. Why is reading the human genome hard?
DNA seems quite simple:
A C G T
(Note: RNA is nearly as simple)
4. Why is reading the human genome hard?
The human genome has a few of those simple letters smashed together in pairs…
5. Why is reading the human genome hard?
3,200,000,000 base pairs*
* or so, give or take
6. Why is reading the human genome hard?
Add image showing entire genome here*
* We would need 40,000 slides, or so, to show it**
** This would make my presentation boring***
*** This is a wild assumption, bear with me
7. Reading the human genome is hard?
Finding patterns in letters can be simple!
Let’s try:
8. Reading the human genome is hard?
Find the changed bases:
Normal - A C G T T G C A
Variant - A C G T G T C A
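At eight bases this is easy for a person. As a minimal sketch of the same check in C, a program walks both sequences in step and records every position where they disagree. The function name `find_changed_bases` is hypothetical, and the sketch assumes both sequences have the same length (substitutions only, no insertions or deletions). In this example two positions differ: the `TG` at zero-based positions 4 and 5 became `GT`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: compares two equal-length sequences base by base.
 * Writes the zero-based index of each mismatch into `positions`
 * (which must have room for strlen(normal) entries) and returns how
 * many mismatches were found. Returns -1 if the lengths differ, since
 * this sketch only handles substitutions. */
int find_changed_bases(const char *normal, const char *variant,
                       size_t *positions) {
    size_t len = strlen(normal);
    if (strlen(variant) != len) {
        return -1;
    }
    int count = 0;
    for (size_t i = 0; i < len; i++) {
        if (normal[i] != variant[i]) {
            positions[count++] = i; /* remember where the bases disagree */
        }
    }
    return count;
}
```

Called with `"ACGTTGCA"` and `"ACGTGTCA"` (the slide's letters without spaces), it should report two mismatches, at positions 4 and 5. The loop does one comparison per base, so its work grows linearly with the length of the sequence.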
9. Reading the human genome is hard?
Let’s try an example in the BRCA1 gene*
* It’s slightly larger
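A simple approach does not get smarter as the sequence grows; it just does more work. As an illustration (not the BRCA1 example from the original slides, which is not reproduced here), a naive search tries a pattern at every offset of a sequence. The function `naive_search` and its comparison counter are hypothetical, and the counter is only a rough proxy for work done. In the worst case it makes about (n - m + 1) × m character comparisons for a sequence of length n and a pattern of length m.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical brute-force search: try the pattern at every offset.
 * Returns the first matching offset, or -1 if there is none.
 * *comparisons receives the number of character comparisons made,
 * a rough stand-in for how much work the search did. */
long naive_search(const char *sequence, const char *pattern,
                  unsigned long *comparisons) {
    size_t n = strlen(sequence);
    size_t m = strlen(pattern);
    *comparisons = 0;
    if (m == 0) return 0;   /* an empty pattern matches at offset 0 */
    if (m > n) return -1;   /* pattern longer than the sequence */
    for (size_t start = 0; start + m <= n; start++) {
        size_t k = 0;
        while (k < m) {
            (*comparisons)++;
            if (sequence[start + k] != pattern[k]) {
                break;      /* mismatch: slide the pattern one base right */
            }
            k++;
        }
        if (k == m) {
            return (long)start; /* every base of the pattern matched */
        }
    }
    return -1;
}
```

Searching the arbitrary test string `"ACGTTGCAACGTGTCA"` for `"GTGT"` should return offset 10 after 17 comparisons. On a gene, or a whole genome, the same loop simply runs for many more offsets.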
13. Computers are fast?
Computers can be fast, but computers are stupid.
They must be taught to do everything.
Deep down, they don’t do much.
14. Computers are fast?
Sometimes we try to teach computers to think like humans.
Doing so reduces the cognitive dissonance between humans and computers.
15. Computers are fast?
Humans would rather look at this:
A C G T
Than this:
00011011
But let’s not get ahead of ourselves.
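The bit string on the slide is what you get by giving each base a 2-bit code in alphabetical order (A = 00, C = 01, G = 10, T = 11) and writing the codes side by side. Below is a minimal sketch of that packing in C. It assumes exactly four uppercase bases per byte, with the first base in the highest bits. The names `base_to_bits` and `pack4` are hypothetical, and ambiguous bases such as N are simply rejected.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapping: A=00, C=01, G=10, T=11.
 * Returns -1 for anything else (N, lowercase, end of string). */
int base_to_bits(char base) {
    switch (base) {
    case 'A': return 0;
    case 'C': return 1;
    case 'G': return 2;
    case 'T': return 3;
    default:  return -1;
    }
}

/* Hypothetical packer: stores four bases in one byte, first base in
 * the two highest bits. Returns 0 on success, -1 on an unsupported
 * base (which also stops safely at a string's terminating NUL). */
int pack4(const char *bases, uint8_t *out) {
    uint8_t packed = 0;
    for (size_t i = 0; i < 4; i++) {
        int bits = base_to_bits(bases[i]);
        if (bits < 0) {
            return -1;
        }
        packed = (uint8_t)((packed << 2) | bits); /* shift in the next base */
    }
    *out = packed;
    return 0;
}
```

`pack4("ACGT", &b)` should leave `b` equal to 0x1B, which is 00011011 in binary. The payoff is size: at one byte per letter, 3,200,000,000 bases take about 3.2 GB, while at 2 bits per base they take about 800 MB, and each byte now carries four bases.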
27. Computers think differently.
Computers can process a lot of instructions.
This Mac can handle about 3,200,000,000 per second*
Everything a computer does comes down to a set of simple instructions operating on bits.
* That number seems familiar!
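The footnote's coincidence makes for a handy back-of-the-envelope estimate. Assume the slide's figure of roughly 3.2 billion instructions per second, and ignore memory access, caching, and parallelism. Then the time for one pass over the genome depends entirely on how many instructions are spent per base, written here as I. N_bases is the number of bases, and R is the instruction rate:

```latex
t \approx \frac{N_{\text{bases}} \times I}{R}
  = \frac{3.2 \times 10^{9} \times I}{3.2 \times 10^{9}\ \text{instructions/s}}
  = I\ \text{seconds}
```

So one instruction per base means roughly a second per pass, and a hundred instructions per base means roughly a hundred seconds.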
28. Computers think differently.
int i = 7;
int j = 8;
int k = i + j;
We see:
• Create a variable, called i, set to 7.
• Create a variable, called j, set to 8.
• Create a variable, called k, set to the sum of i and j.
29. Computers think differently.
A computer sees:
• Create a storage location of a certain size
• Name it i
• Store a value in that location
• Keep track of that location for further use
• Create a storage location of a certain size
• Name it j
• Store a value in that location
• Keep track of that location for further use
• Create a storage location of a certain size
• Name it k
• Check to see if there is a storage location named i
• Get the value
• Check to see if there is a storage location named j
• Get the value
• Use operations to sum the values from the locations found in previous instructions
• Place the resulting value into the location from the previous instruction
• Keep track of that location for further use
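To make the list concrete, here is a hedged C sketch that performs each of those steps explicitly for `int k = i + j;`. The function name `sum_step_by_step` is hypothetical. It takes the address of each variable to stand in for "keep track of that location", and it prints each location's size. In compiled C, the name lookups actually happen once, when the program is compiled, and an optimizing compiler would fold this whole function into the constant 15. The sketch only makes the steps visible.

```c
#include <stdio.h>

/* Hypothetical illustration of the steps behind `int k = i + j;`.
 * Addresses printed will differ from run to run. */
int sum_step_by_step(void) {
    int i;              /* create a storage location of sizeof(int) bytes, named i */
    i = 7;              /* store a value in that location */
    int *where_i = &i;  /* keep track of that location for further use */

    int j;              /* the same three steps for j */
    j = 8;
    int *where_j = &j;

    int k;              /* create a storage location named k */
    int a = *where_i;   /* find the location named i and get its value */
    int b = *where_j;   /* find the location named j and get its value */
    k = a + b;          /* sum the values and place the result into k */

    printf("i: %zu bytes at %p\n", sizeof i, (void *)where_i);
    printf("j: %zu bytes at %p\n", sizeof j, (void *)where_j);
    printf("k: %zu bytes at %p, value %d\n", sizeof k, (void *)&k, k);
    return k;
}
```

Every line in the function body corresponds to one or more bullets above, which is a lot of bookkeeping for a single addition.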
34. Computers think differently.
As we move forward, finding variants is going to take more than new chemistry techniques and algorithms.
We’re also going to have to reduce the number of instructions a computer must complete to find them.
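One way to cut the instruction count is sketched here under stated assumptions, not as the method used in this talk. Pack up to 32 bases into a 64-bit word at 2 bits per base (A = 00, C = 01, G = 10, T = 11), then compare two words with a single XOR. Any nonzero 2-bit pair in the result marks a changed base, so a handful of instructions examines 32 bases where a character-by-character loop needs 32 separate comparisons. The function name `count_mismatches_packed` is hypothetical. Both words are assumed to use the same mapping, the same base order, and identical padding in unused positions.

```c
#include <stdint.h>

/* Hypothetical helper: each word holds up to 32 bases at 2 bits per base.
 * Returns how many 2-bit base positions differ between a and b. */
int count_mismatches_packed(uint64_t a, uint64_t b) {
    uint64_t diff = a ^ b; /* nonzero 2-bit pairs mark changed bases */
    /* Fold each pair onto its low bit (a base differs if either of its
     * two bits differs), then keep only the low bit of every pair. */
    uint64_t per_base = (diff | (diff >> 1)) & 0x5555555555555555ULL;
    int count = 0;
    while (per_base != 0) {      /* portable population count */
        per_base &= per_base - 1; /* clear the lowest set bit */
        count++;
    }
    return count;
}
```

Packing the slide's sequences `ACGTTGCA` and `ACGTGTCA` into the low 16 bits gives 0x1BE4 and 0x1BB4, and the function should count 2 mismatches. GCC and Clang also provide `__builtin_popcountll`, which, with suitable target flags, can replace the bit-clearing loop with a single hardware instruction.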