2. http://iddo-friedberg.net Twitter: @iddux
About me
● 2003: PhD, Hebrew University, Jerusalem
● 2003-2007: Postdoc, Burnham Institute, CA
● 2007-2009: Researcher, UC San Diego
● 2009-2015: Assistant Professor, Miami University, Ohio
● 2015-present: Associate Professor, Iowa State University
7.
Gene Blocks
[Figure: Gene 1, Gene 2 and Gene 3 undergo transcription into mRNA transcripts, followed by translation]
● A gene block is any suspected syntenic group of open reading frames (ORFs) in which adjacent ORFs are separated by no more than a maximum allowed spacing. In my research, this maximum is 500 nt.
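The grouping rule above can be sketched in Python. This is a minimal illustration under stated assumptions: ORFs are given as start/end coordinates on a single strand (strand is ignored), ORFs do not nest inside one another, and the gap is the number of nucleotides between the end of one ORF and the start of the next. The `ORF` class and `group_gene_blocks` function are hypothetical names, not part of the original work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ORF:
    name: str
    start: int  # 1-based start coordinate (assumption: single strand)
    end: int    # inclusive end coordinate


def group_gene_blocks(orfs: list[ORF], max_gap: int = 500) -> list[list[ORF]]:
    """Group ORFs into candidate gene blocks.

    ORFs are sorted by start; an ORF joins the current block when the
    number of nucleotides between it and the previous ORF is <= max_gap
    (overlapping ORFs give a negative gap and are always grouped).
    """
    blocks: list[list[ORF]] = []
    for orf in sorted(orfs, key=lambda o: o.start):
        if blocks and orf.start - blocks[-1][-1].end - 1 <= max_gap:
            blocks[-1].append(orf)
        else:
            blocks.append([orf])
    return blocks


orfs = [ORF("g3", 2600, 3000), ORF("g1", 1, 900), ORF("g2", 1001, 1800)]
print([[o.name for o in block] for block in group_gene_blocks(orfs)])
```

Here g1 and g2 are 100 nt apart and land in one block, while the 799 nt gap before g3 starts a new block. A real pipeline would also require the ORFs to share a strand.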
8.
Background
● Operons are an important feature of prokaryotic genetics.
– Often contain complete metabolic pathways, i.e. sets of chemical processes that transform one compound into another.
– Allow groups of genes to be regulated together.
– Allow for the frequent transfer of gene blocks between organisms.
● Therefore, studying operon evolution helps us understand how metabolic pathways form.
9.
How we model changes in gene blocks
● We borrow ideas from sequence evolution, but genes, rather than nucleotides, are the atomic unit of change.
– Changes are called events.
– Gene block evolution has more possible kinds of events than biological sequence evolution.
[Figure: a sequence 5' ATCCGA 3' and two derived sequences, ATCCGT (one substitution) and ATC-GA (one deletion)]
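To make "genes are the atom of change" concrete, here is a hypothetical sketch that compares two versions of a gene block gene by gene, the way an alignment compares nucleotides. The event types counted here (deletions, insertions, duplications, and a single rearrangement flag) are illustrative assumptions; the slide does not list the events used in the actual model, and `gene_block_events` is not the author's code.

```python
from collections import Counter


def gene_block_events(ancestor: list[str], descendant: list[str]) -> dict[str, int]:
    """Count simple gene-level events between two versions of a gene block.

    Each list holds gene names in genomic order. The event types are
    illustrative assumptions, not the model used in the talk.
    """
    anc, desc = Counter(ancestor), Counter(descendant)
    # Copies of ancestral genes that were lost in the descendant.
    deletions = sum(max(0, anc[g] - desc[g]) for g in anc)
    # Genes with no ancestral copy at all.
    insertions = sum(desc[g] for g in desc if g not in anc)
    # Extra copies of genes that already existed in the ancestor.
    duplications = sum(max(0, desc[g] - anc[g]) for g in desc if g in anc)
    # Rearrangement flag: shared genes (first occurrences) appear in a different order.
    anc_order = [g for g in dict.fromkeys(ancestor) if g in desc]
    desc_order = [g for g in dict.fromkeys(descendant) if g in anc]
    rearrangements = int(anc_order != desc_order)
    return {
        "deletions": deletions,
        "insertions": insertions,
        "duplications": duplications,
        "rearrangements": rearrangements,
    }


print(gene_block_events(["A", "B", "C", "D"], ["A", "C", "B", "B", "E"]))
# → {'deletions': 1, 'insertions': 1, 'duplications': 1, 'rearrangements': 1}
```

Even this toy version already tracks four event types where a nucleotide alignment tracks substitutions and indels, which is the point the slide makes.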
24.
Antibiotic resistance
WHO (2014): “A problem so serious that it threatens the achievements
of modern medicine... A post-antibiotic era, in which common infections and
minor injuries can kill, far from being an apocalyptic fantasy,
is instead a very real possibility for the 21st century.”
26. Example: image classification
[Figure: a deep learning algorithm learning to classify images, trained on ImageNet (huge dataset) → separate but relevant task on a smaller, different dataset of hotdogs and not-hotdogs]
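The diagram describes transfer learning. One common variant keeps the pretrained network frozen as a feature extractor and trains only a new classifier head on the small target dataset. The sketch below shows that variant in plain Python under loud assumptions: `PRETRAINED_W` is a hypothetical fixed linear map standing in for a real ImageNet-trained network, and the four raw inputs are made-up toy data, not real images.

```python
import math

# Hypothetical stand-in for a network pretrained on a large dataset
# (e.g. ImageNet). Its weights are FROZEN: nothing below updates them.
PRETRAINED_W = ((1.0, 0.0, 0.0),
                (0.0, 1.0, 1.0))


def pretrained_features(x: list[float]) -> list[float]:
    """Frozen feature extractor: a fixed linear map from raw input to features."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in PRETRAINED_W]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_head(data: list[tuple[list[float], int]], lr: float = 0.5, epochs: int = 100):
    """Train only a new logistic-regression head with batch gradient descent."""
    n_feat = len(PRETRAINED_W)
    w, b = [0.0] * n_feat, 0.0
    # Features are computed once because the extractor never changes.
    feats = [(pretrained_features(x), y) for x, y in data]
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_feat, 0.0
        for f, y in feats:
            err = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) - y
            grad_w = [g + err * fi for g, fi in zip(grad_w, f)]
            grad_b += err
        w = [wi - lr * g / len(feats) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(feats)
    return w, b


def predict(w: list[float], b: float, x: list[float]) -> int:
    f = pretrained_features(x)
    return int(sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b) >= 0.5)


# Toy target task (1 = "hotdog", 0 = "not hotdog"); raw inputs are made up.
toy = [([1.0, 1.0, 1.0], 1), ([2.0, 0.0, 1.0], 1),
       ([-1.0, -1.0, -1.0], 0), ([-2.0, 0.0, -1.0], 0)]
w, b = train_head(toy)
print([predict(w, b, x) for x, _ in toy])  # → [1, 1, 0, 0]
```

Freezing the extractor means the small dataset only has to fit a handful of head parameters; full fine-tuning, where the pretrained weights are also updated, is the other common option.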
27. Natural Language Processing
[Figure: an LSTM language model trained on English Wikipedia → sentiment classification task on a smaller, different dataset]
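A language model can be pretrained on a large unlabeled corpus such as Wikipedia because its training targets come from the text itself: each target is simply the next token. The sketch below builds such (context, next token) pairs. Whitespace tokenization and a fixed-size sliding window are simplifying assumptions; LSTM language models are usually trained on whole sequences with a next-token target at every position. `language_model_pairs` is a hypothetical helper.

```python
def language_model_pairs(tokens: list[str], context: int) -> list[tuple[list[str], str]]:
    """Build (context, next token) training pairs for a language model.

    The model sees `context` consecutive tokens and learns to predict the
    token that follows, so no human-provided labels are required.
    """
    return [(tokens[i:i + context], tokens[i + context])
            for i in range(len(tokens) - context)]


text = "the movie was surprisingly good"
for ctx, nxt in language_model_pairs(text.split(), context=2):
    print(ctx, "->", nxt)
```

After pretraining on such pairs, the same network can be given a new output layer and fine-tuned on the much smaller labeled sentiment dataset.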
28. Antibiotic resistance classification
[Figure: an LSTM protein sequence language model trained on UniRef50 bacterial protein sequences (734,948 sequences) → classify protein sequences into 14 classes of antibiotic resistance]
Classification on the COALA dataset, clustered with CD-HIT at a 0.4 threshold
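Before an LSTM can read protein sequences, each sequence has to be turned into integer ids. A minimal sketch, assuming character-level tokenization over the 20 standard amino acids with two hypothetical special ids (0 for padding, 1 for unknown or non-standard residues); the talk does not specify the tokenization actually used, and `encode`/`pad_batch` are illustrative names.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
PAD, UNK = 0, 1                        # special ids (hypothetical convention)
VOCAB = {aa: i + 2 for i, aa in enumerate(AMINO_ACIDS)}


def encode(seq: str) -> list[int]:
    """Map each residue to an integer id; non-standard letters (X, B, U, ...) -> UNK."""
    return [VOCAB.get(aa, UNK) for aa in seq.upper()]


def pad_batch(seqs: list[str]) -> list[list[int]]:
    """Encode a batch and right-pad every sequence to the longest one."""
    encoded = [encode(s) for s in seqs]
    width = max(len(e) for e in encoded)
    return [e + [PAD] * (width - len(e)) for e in encoded]


print(pad_batch(["MEEPQ", "MKX"]))  # → [[12, 5, 5, 14, 15], [12, 10, 1, 0, 0]]
```

The same encoding would serve both stages: next-residue prediction during language-model pretraining and the 14-class resistance classifier afterwards.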
31.
Why CAFA?
“On the one hand, we have enormous “protein” databases that are replete with
errors, wishful thinking, phantoms, and uncertainties. On the other, we have a
tiny fraction of real proteins that have been studied in any depth.”
– Dan Graur
Biggest problem in molecular biology: sequencing a genome costs < $1,000, BUT annotation costs from $20,000 to > $10,000,000.
32.
CAFA
● The Critical Assessment of Function Annotation
● Hundreds of scientists trying to predict protein function from sequence
● A friendly competition between scientific teams
33. The protein function prediction problem
>sp|P04637|P53_HUMAN
MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTED
PGPDEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFL
HSGTAKSVTCTYSPALNKMFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTE
VVRRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFRHSVVVPYEPPEVGSD
CTTIHYNYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRTE
EENLRKKGEPHHELPPGSTKRALPNNTSSSPQPKKKPLDGEYFTLQIRGRERFEMFR
ELNEALELKDAQAGKEPGGSRAHSSHLKSKKGQSTSRHKKLMFKTEGPDSD
[Figure: which functions does this protein have? Candidate terms: Biological process, Cell differentiation, Apoptosis, ?]
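The input to the prediction problem is a FASTA record like the one above, whose header follows the UniProt convention `db|accession|entry name` (`sp` marks a Swiss-Prot entry). A minimal parsing sketch, assuming well-formed records with no `>` characters inside descriptions; `parse_uniprot_fasta` is a hypothetical helper, not a library function.

```python
def parse_uniprot_fasta(text: str) -> list[dict[str, str]]:
    """Parse UniProt-style FASTA records (>db|accession|entry_name ...)."""
    records = []
    for chunk in text.split(">")[1:]:  # each record starts with '>'
        lines = chunk.strip().splitlines()
        header, seq_lines = lines[0], lines[1:]
        db, accession, rest = header.split("|", 2)
        records.append({
            "db": db,                        # e.g. 'sp' = Swiss-Prot
            "accession": accession,          # e.g. 'P04637'
            "entry_name": rest.split()[0],   # e.g. 'P53_HUMAN'
            "sequence": "".join(line.strip() for line in seq_lines),
        })
    return records


fasta = """>sp|P04637|P53_HUMAN
MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTED
PGPDEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFL
"""
rec = parse_uniprot_fasta(fasta)[0]
print(rec["db"], rec["accession"], rec["entry_name"], rec["sequence"][:10])
```

The parsed sequence is all a predictor gets; the task is to output function terms such as those in the figure.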
34.
CAFA Timeline
[Figure: CAFA2 timeline showing the phases preparation, prediction, annotation growth and assessment for Participants and Organizers; milestones: Sep. 2013 Launch CAFA2; Jan. 2014 Close submission (t0); Sep. 2014 (t1); Mar. 2015 Release results]
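The annotation growth window between t0 and t1 is what makes assessment possible: predictions are frozen at t0, and experimental annotations that appear afterwards serve as the answer key. The sketch below assumes a "no-knowledge" style benchmark (proteins with no experimental annotation at t0 that gain one by t1). The input format, protein ids and terms are hypothetical, and the first of each month stands in for dates the slide gives only to the month.

```python
from datetime import date


def no_knowledge_benchmark(annotations: list[tuple[str, str, date]],
                           t0: date, t1: date) -> set[str]:
    """Proteins with no experimental annotation at t0 that gained one in (t0, t1].

    `annotations` holds (protein_id, term, date_added) tuples for
    experimental annotations only (hypothetical input format).
    """
    known_at_t0 = {p for p, _, d in annotations if d <= t0}
    gained = {p for p, _, d in annotations if t0 < d <= t1}
    return gained - known_at_t0


T0 = date(2014, 1, 1)  # placeholder day for "Jan. 2014"
T1 = date(2014, 9, 1)  # placeholder day for "Sep. 2014"
toy_annotations = [
    ("P1", "term_a", date(2013, 5, 1)),  # already known at t0: excluded
    ("P1", "term_b", date(2014, 3, 1)),
    ("P2", "term_a", date(2014, 4, 1)),  # first annotation inside the window
    ("P3", "term_c", date(2015, 1, 1)),  # annotated after t1: excluded
]
print(no_knowledge_benchmark(toy_annotations, T0, T1))  # → {'P2'}
```

Only predictions for the returned proteins would then be scored against the annotations they gained during the window.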