2. The human genome is a string of about 3 billion
nucleotides: [ACTG]{3000000000}
Next-generation sequencing allows the genome to be
sequenced at low cost and in a short time
For example, 454 sequencing generates >1 million 400 bp
sequences
Homopolymer problems!
[Figure: observed vs. expected homopolymer length. x-axis: homopolymer size; y-axis: fluorescence; series: Observed, Expected]
3. Define a homopolymer block AAAA starting at
position 10 as 10A4
Position = 10
Nucleotide = A
Altitude = 4
ACGTAGGTTTCCA
1 1 1 1 1 2 3 2 1 (altitude of each homopolymer block)
Genomic pyramid: 1 2 3 2 1
Start=5; Stop=13; Surface=9; Height=3; Width=5
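The block notation and the pyramid numbers above can be reproduced with a short sketch. The function names are hypothetical, positions are 1-based as in 10A4, and the reading of "surface" as the sum of the block altitudes is an assumption inferred from the slide's example (1+2+3+2+1 = 9).

```python
from itertools import groupby


def homopolymer_blocks(sequence):
    """Yield (position, nucleotide, altitude) for each homopolymer run.

    Positions are 1-based, matching the 10A4 notation.
    """
    position = 1
    for nucleotide, run in groupby(sequence):
        altitude = sum(1 for _ in run)
        yield position, nucleotide, altitude
        position += altitude


def block_name(block):
    """Format a block as position + nucleotide + altitude, e.g. 10A4."""
    position, nucleotide, altitude = block
    return f"{position}{nucleotide}{altitude}"


def pyramid_stats(blocks):
    """Summarise a list of consecutive blocks that form one pyramid.

    Assumption: surface = sum of altitudes, width = number of blocks.
    """
    altitudes = [altitude for _, _, altitude in blocks]
    last_position, _, last_altitude = blocks[-1]
    return {
        "start": blocks[0][0],
        "stop": last_position + last_altitude - 1,
        "surface": sum(altitudes),
        "height": max(altitudes),
        "width": len(altitudes),
    }


blocks = list(homopolymer_blocks("ACGTAGGTTTCCA"))
altitudes = [altitude for _, _, altitude in blocks]
stats = pyramid_stats(blocks[4:])  # the last five blocks form the pyramid
```

For the example sequence, `altitudes` is the row of numbers shown on the slide and `stats` holds the slide's Start, Stop, Surface, Height and Width values.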
4. Genomic pyramids
Have a peak at least 3 high (fails: 1 2 1)
Are bordered by a 1 on the left and a 1 on the right (fails: 2 3 4 3 2)
Have a minimal width of 5 (fails: 1 2 3 1)
Do NOT have to be symmetric (passes: 1 2 3 4 3 1)
Do NOT have ‘a local minimum’ (fails: 1 2 3 5 3 4 1)
Do NOT overlap with another pyramid (see: 1 2 3 2 1 2 3 1)
Examples of good pyramids:
1 2 3 2 1, 1 2 5 4 3 2 1, 1 2 5 3 1, 1 3 5 4 3 1 …
Each pyramid has a start position, a stop, a height, a
width and a surface
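One possible reading of these rules, as a single left-to-right scan over the block altitudes. Several points are assumptions, not statements from the slides: altitudes must rise strictly to one peak and then fall strictly (so plateaus are rejected), a closing 1 is not reused as the opening 1 of the next pyramid, and when two candidates would overlap the leftmost one is kept. `find_pyramids` is a hypothetical name.

```python
def find_pyramids(altitudes, min_height=3, min_width=5):
    """Return (first_block, last_block) index pairs, 0-based and inclusive.

    Assumptions: strictly increasing to a single peak, strictly
    decreasing back to 1, and no block shared between two pyramids.
    """
    pyramids = []
    n = len(altitudes)
    i = 0
    while i < n - 1:
        # A pyramid must open with a 1 that is followed by a higher block.
        if altitudes[i] != 1 or altitudes[i + 1] <= 1:
            i += 1
            continue
        j = i
        while j + 1 < n and altitudes[j + 1] > altitudes[j]:  # climb
            j += 1
        peak = altitudes[j]
        while j + 1 < n and altitudes[j + 1] < altitudes[j]:  # descend
            j += 1
        closes_on_one = altitudes[j] == 1
        if closes_on_one and peak >= min_height and j - i + 1 >= min_width:
            pyramids.append((i, j))
            i = j + 1  # assumption: the closing 1 is not shared
        else:
            i += 1
    return pyramids
```

Each block is visited a bounded number of times, so the scan stays linear in the number of blocks. Changing the plateau or overlap assumption changes the counts, so the Chromosome Y figures given on the slides are a useful check on whichever reading is chosen.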
5. You are given:
The complete human genome sequence @
http://athos.ugent.be/BPC/genome (be gentle on the
server when downloading…)
Chromosome Y contains 27744 pyramids: biggest=45,
highest=38, widest=7
You are asked to give:
The highest, the widest and the biggest-surface pyramid in
the human genome, each identified by its start position
In case of a draw, give the number of ‘draw pyramids’
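A minimal sketch of the bookkeeping for this question: one pass over the pyramids that keeps the current record per criterion together with every pyramid that ties it. The dictionary layout, the `chrom` field and the toy values are made up for illustration and are not results from the genome.

```python
def record_holders(pyramids, keys=("height", "width", "surface")):
    """Return {key: (best_value, [(chrom, start), ...])} in a single pass."""
    best = {key: (0, []) for key in keys}
    for pyramid in pyramids:
        location = (pyramid["chrom"], pyramid["start"])
        for key in keys:
            top, holders = best[key]
            if pyramid[key] > top:
                best[key] = (pyramid[key], [location])  # new record
            elif pyramid[key] == top:
                holders.append(location)  # a draw with the current record
    return best


# Toy data, invented for the example (not real genome results).
toy = [
    {"chrom": "chrT", "start": 5, "height": 3, "width": 5, "surface": 9},
    {"chrom": "chrT", "start": 40, "height": 5, "width": 5, "surface": 11},
    {"chrom": "chrU", "start": 7, "height": 5, "width": 7, "surface": 19},
]
records = record_holders(toy)
```

The number of "draw pyramids" for a criterion is then the length of its holder list, for example `len(records["height"][1])`. Because `pyramids` can be any iterable, the function also works on a generator, so the full pyramid list never has to sit in memory.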
6. In case the previous questions were not exactly
difficult enough, answer the following
Is there a gene (or multiple genes) that overlaps with 2 of
the 3 special pyramids (biggest, widest, highest)? If so,
give the name(s) of the gene(s)
Given that π is 3.1415, we define πramids as 3 consecutive
genomic pyramids (start locations) that are spaced a
multiple of 14 and a multiple of 15 apart on the same chromosome.
How many πramids can you find? In which genes?
Δ ---- x*14 ---- Δ ---- x*15 ---- Δ
Δ ---- x*14 ---- Δ Δ Δ ---- x*15 ---- Δ
Δ - 10 - Δ - 18 - Δ ---- x*15 ---- Δ
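The πramid definition leaves some room for interpretation, so the following is only a sketch under stated assumptions: "consecutive" means adjacent in start order on one chromosome, the first gap must be a multiple of 14 and the second a multiple of 15 (in that order), and the multiplier x may differ between the two gaps. `find_piramids` is a hypothetical name.

```python
def find_piramids(starts, first_gap=14, second_gap=15):
    """Return triples of adjacent pyramid starts that form a πramid.

    `starts` holds the pyramid start positions of ONE chromosome.
    Assumption: gap 1 is a multiple of 14, gap 2 a multiple of 15.
    """
    ordered = sorted(starts)
    hits = []
    for a, b, c in zip(ordered, ordered[1:], ordered[2:]):
        if (b - a) % first_gap == 0 and (c - b) % second_gap == 0:
            hits.append((a, b, c))
    return hits
```

With made-up starts `[100, 128, 173, 200]` the gaps are 28, 45 and 27, so only `(100, 128, 173)` qualifies. If the intended rule requires the same x for both gaps, the test becomes `(b - a) // 14 == (c - b) // 15` in addition to the two divisibility checks.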
7. You are allowed to work together, but submit results
individually
Submit your final (partial) answers (!= questions) to
joachim.deschrijver@ugent.be
Questions? After the practicum exercises @ Thursday
Submission deadline: 2011
The winner receives
A place in the FBW Bioinformatics Hall of Fame
Life long respect and recognition :-)
Life long free linux (web)hosting
… a secret prize
8. Enjoy!
Don’t crash your computer!
Don’t waste computing time
and/or memory:
CODE EFFICIENTLY
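One way to keep memory use flat is to stream the sequence and emit homopolymer blocks as they are completed, instead of loading a whole chromosome into a string. The sketch below assumes FASTA-style input (a `>` header line per sequence, then sequence lines); the slides do not state the file format. It also uppercases the input and treats every character, including N, as a nucleotide, which is a further assumption. `stream_blocks` is a hypothetical name.

```python
import io
from itertools import groupby


def stream_blocks(lines):
    """Yield (header, position, nucleotide, altitude), one block at a time.

    Runs that continue across a line break are merged. Positions are
    1-based and restart at every '>' header.
    """
    header = None
    current = None  # nucleotide of the run in progress
    start = 0       # 1-based start position of that run
    length = 0      # its length so far
    position = 0    # nucleotides consumed in this sequence
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                yield header, start, current, length
            header = line[1:].strip()
            current, length, position = None, 0, 0
            continue
        for nucleotide, run in groupby(line.upper()):
            count = sum(1 for _ in run)
            if nucleotide == current:
                length += count  # run continues from the previous line
            else:
                if current is not None:
                    yield header, start, current, length
                current, start, length = nucleotide, position + 1, count
            position += count
    if current is not None:
        yield header, start, current, length


# The T run is split over two lines on purpose.
demo = io.StringIO(">chrTest\nACGTAGGT\nTTCCA\n")
demo_blocks = list(stream_blocks(demo))
```

Passing an open file object instead of the `io.StringIO` demo reads the input line by line, so only one line and one pending block are held at a time.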