The document appears to discuss sequence alignment between ribosomal proteins of bacteria. It includes the amino acid sequences of the 60S and 50S ribosomal proteins of a bacterium. It discusses comparing the sequences, calculating the percent similarity between them, and identifying exact matches, physical similarities, and gaps. Pairwise sequence alignment is described as slow for large databases.
48. Bucketed spans
Chaos Game Representation w/ geohash
Common Terms query
Fuzzy query
Synonym injection
... many hours of experiments
FAILED
Tuesday, August 13, 13
67. Random Projections
2) iterate over sequence
MAQDQGEKENPMRELRIRKL
13
Emit a “1”
QDK  1
68. Random Projections
3) repeat for all trigrams
MAQDQGEKENPMRELRIRKL
CVV  0
QDK  1
69. Random Projections
3) repeat for all trigrams
MAQDQGEKENPMRELRIRKL
CVV  0
QED  1
MRE  1
QDK  1
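Steps 2 and 3 can be sketched roughly as follows. This is a minimal illustration, assuming each "projection" is a presence test for one fixed trigram that emits 1 on a match and 0 otherwise. The slides do not say how the trigrams are chosen or how the resulting bits are used afterwards, and the function name `trigram_signature` is hypothetical.

```python
from typing import Sequence


def trigram_signature(sequence: str, trigrams: Sequence[str]) -> str:
    """Return one bit per trigram: '1' if it occurs in the sequence, else '0'."""
    # Step 2: iterate over the sequence once, collecting every overlapping trigram.
    present = {sequence[i:i + 3] for i in range(len(sequence) - 2)}
    # Step 3: repeat the presence test for all chosen trigrams.
    return "".join("1" if t in present else "0" for t in trigrams)


# Assumption: only the 20 residues printed on the slide are used here.
fragment = "MAQDQGEKENPMRELRIRKL"
print(trigram_signature(fragment, ["CVV", "QED", "MRE", "QDK"]))
```

On the 20-residue fragment alone, only MRE matches, so the bits differ from those on the slide. The slide's bits were presumably computed on a longer sequence than the one printed.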