Formal languages to map Genotype to Phenotype in Natural Genomes

Laura Adam
GBCB student

Outline
1. The Genotype to Phenotype (G2P) mapping problem
2. Using formal languages to formalize G2P mapping
3. Implementation in synthetic/systems biology design software to study mutants

1. THE GENOTYPE TO PHENOTYPE (G2P) MAPPING PROBLEM
G2P = Genotype to Phenotype

Genotype to Phenotype mapping
Genotype
• genetic makeup of a cell, an organism, or an individual
• specific alleles
• inherited
Phenotype
• observable characteristics or traits
– morphology, development, biochemical or physiological properties, behavior, and products of behavior (such as a bird's nest)
Definitions from Wikipedia
Phenotypes result from:
• the expression of an organism's genes
• the influence of environmental factors and developmental conditions
• the interactions between the two
Mapping?

The ultimate goal of a G2P map
G2P map → “The phenotype is X”

Historical perspectives: Mendelian genetics
Law of Segregation:
“Mendelian genetics considers traits that are determined completely by individual genes.”

Traditional G2P mapping is linear
[Figure labels: Central dogma, Phenotypes]
• Sui Huang, Rational drug discovery: what can we learn from regulatory networks?, Drug Discovery Today, Volume 7, Issue 20, 15 October 2002
• Peccoud, J., Velden, K. V., Podlich, D., Winkler, C., Arthur, L., & Cooper, M. (2004). The selective values of alleles in a molecular network model are context dependent. Genetics, 166(4), 1715–25.

Current Formalisms
Databases:
• genetic mapping, genome annotation, genotype, mutant, transcriptome, proteome and metabolomic data
Ontologies:
• controlled vocabulary for annotation of genes and their products (cellular component, molecular function, biological process)
Actually, G2P maps are nonlinear: Gene Networks
• Priest, N. K., Rudkin, J. K., Feil, E. J., van den Elsen, J. M. H., Cheung, A., Peacock, S. J., Laabei, M., et al. (2012). From genotype to phenotype: can systems biology be used to predict Staphylococcus aureus virulence? Nature Reviews Microbiology, 10(11), 791–7. doi:10.1038/nrmicro2880
• Benfey, P. N., & Mitchell-Olds, T. (2008). From genotype to phenotype: systems biology meets natural variation. Science.
“replacing the linear pathways with interconnected networks.”

Gene expression mechanisms also matter
“The current understanding of the mechanisms of gene expression indicates the importance of nonlinear effects resulting from gene interactions.”
– Peccoud, J., Velden, K. V., Podlich, D., Winkler, C., Arthur, L., & Cooper, M. (2004). The selective values of alleles in a molecular network model are context dependent. Genetics.
• Trans-regulatory element = gene which may modify (or regulate) the expression of distant genes
– Phosphorylation, protein complex, transcription inhibition, etc.
• Cis-regulatory element = a region of DNA or RNA that regulates the expression of genes located on the same section of DNA
– Translation rate depends on RBS and CDS, etc.
– Folding alters function and dynamics

What is missing in current G2P maps?
Gene expression mechanisms: the dynamics
trans and cis interactions

2. HOW TO FORMALIZE G2P MAPPING TO MAKE PREDICTIONS?
What formal languages can bring.

Is “language of life” just a metaphor?
Or what insights can we get from computational studies of natural language?

Natural Language Processing
How about a computational linguistics approach to the G2P mapping problem?
• Like a text, biology has a support for information (Genotype) and a meaning (Phenotype)
• Anaphora as trans-interactions:
– “type of expression whose reference depends upon another referential element”
– e.g., the noun/pronoun relation (Mary – she)
• Inflectional morphology as cis-interactions:
– e.g., subject + verb (+ tense)
• Handling context:
– Wittgenstein: language-game
– “He went there.”

Natural languages and Computers?
• We are learning about formal languages
• Nous apprenons les langages formels
• (Nosotros) estamos estudiando los lenguajes formales
>> Linguistic universal

Intuition: Formal languages
• <subject> <verb> <object> = (SVO)
– A linguistic typology
– Could be <subject> <object> <verb> = (SOV)
• We are learning about formal languages
• Nous apprenons les langages formels
• (Nosotros) estamos estudiando los lenguajes formales

Intuition: Formal language
• SVO_sentence → Subject, Verb, Object
• Object → Noun phrase | Relative_clause
• Subject → “I” | “You” | “He” | “She” | “We” | “They”
• Verb → “are learning” | “is learning”
• Noun phrase → “about formal languages”
• Relative_clause → “that formal languages are awesome”
A grammar is a set of rules describing how to form sentences from a language’s vocabulary
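The rules above can be sketched in a few lines of Python to make the idea of "the language generated by a grammar" concrete. This is only an illustration, not part of the original slides: the dictionary layout and the function name expand are assumptions, and the non-terminal "Noun phrase" is written Noun_phrase so that it can be used as a key.

```python
from itertools import product

# Each non-terminal maps to its alternatives; each alternative is a
# sequence of symbols. Any symbol that is not a key is a terminal.
GRAMMAR = {
    "SVO_sentence": [["Subject", "Verb", "Object"]],
    "Object": [["Noun_phrase"], ["Relative_clause"]],
    "Subject": [["I"], ["You"], ["He"], ["She"], ["We"], ["They"]],
    "Verb": [["are learning"], ["is learning"]],
    "Noun_phrase": [["about formal languages"]],
    "Relative_clause": [["that formal languages are awesome"]],
}


def expand(symbol):
    """Return every terminal string that can be derived from `symbol`."""
    if symbol not in GRAMMAR:  # terminal: a word or fixed phrase
        return [symbol]
    sentences = []
    for alternative in GRAMMAR[symbol]:
        # Expand each symbol of the alternative, then combine the results.
        expansions = [expand(s) for s in alternative]
        for combination in product(*expansions):
            sentences.append(" ".join(combination))
    return sentences


language = expand("SVO_sentence")
print(len(language))
print("We are learning about formal languages" in language)
```

Note that this purely syntactic grammar also produces sentences such as "I is learning about formal languages": subject-verb agreement is exactly the kind of local dependency that the rules alone do not capture.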

Example: Formal language
Rules applied:
• SVO_sentence → Subject Verb Object
• Object → Noun phrase
Parse tree:
SVO_sentence
  Subject: We
  Verb: are learning
  Object
    Noun phrase: about formal languages.
A parse tree represents the syntactic structure of a string according to some formal grammar.

Context free grammar
• Terminals = words
• Non-terminals = intermediary steps
• Rules:
– Non-terminal → {Terminals and Non-terminals}
• Start
>> The language is the set of all sentences that can be produced
Noam Chomsky, "father of modern linguistics"

The repressilator
Elowitz, M. B., & Leibler, S. (2000). A synthetic oscillatory network of transcriptional regulators. Nature, 403(6767), 335-8. doi:10.1038/35002125

The toggle switch
Gardner, T. S., Cantor, C. R., & Collins, J. J. (2000). Construction of a genetic toggle switch in Escherichia coli. Nature, 403(6767), 339-42. doi:10.1038/35002131
[Figure labels: lacI, tetR]

Grammar and Biology?
• Pattern to express protein (typology):
– <promoter> <rbs> <coding_seq> <ter> <ter>
>> Some underlying rules must govern biology!

What would a CFG for Biology be like?
• “Sentence” to express proteins
– Transcription: promoter, terminator
– Translation: ribosome binding site
• Central dogma:
– Cassette: Promoter + RBS + CDS + Terminator

Example: SynBio CFG
Rules
• CONSTRUCT → CAS | 2CAS | 2CASREV | 3CAS
• 2CAS → CAS, CAS
• 2CASREV → CAS, [, CAS, ]
• 3CAS → CAS, CAS, CAS
• CAS → PROMOTER, CIX, TERMINATOR
• CIX → CISTRON | CISTRON, CISTRON
• CISTRON → RBS, CDS
• TERMINATOR → TERMINATOR, TERMINATOR
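As a rough sketch of how such rules can be checked mechanically, the Python below tests whether a list of parts is a valid CONSTRUCT. It is an illustration under stated assumptions, not GenoCAD's implementation: the part-to-category table is taken from the example designs in this deck, the function names are hypothetical, and the recursive TERMINATOR rule is read as "one or more terminator parts".

```python
# Part categories (terminals), taken from the example designs in this deck.
PARTS = {
    "ptetr": "PROMOTER", "placi": "PROMOTER",
    "rbsA": "RBS", "rbsB": "RBS",
    "laci": "CDS", "tetr": "CDS", "gfp": "CDS",
    "t1": "TERMINATOR", "t2": "TERMINATOR",
}


def category(tokens, i):
    """Category of the part at position i, or None past the end or for brackets."""
    return PARTS.get(tokens[i]) if i < len(tokens) else None


def parse_cas(tokens, i):
    """CAS -> PROMOTER, CIX, TERMINATOR. Return the index after the cassette, or None."""
    if category(tokens, i) != "PROMOTER":
        return None
    i += 1
    cistrons = 0
    # CIX -> CISTRON | CISTRON, CISTRON ; CISTRON -> RBS, CDS
    while (cistrons < 2 and category(tokens, i) == "RBS"
           and category(tokens, i + 1) == "CDS"):
        i += 2
        cistrons += 1
    if cistrons == 0 or category(tokens, i) != "TERMINATOR":
        return None
    # TERMINATOR -> TERMINATOR, TERMINATOR (assumed: one or more terminators)
    while category(tokens, i) == "TERMINATOR":
        i += 1
    return i


def is_valid_construct(tokens):
    """CONSTRUCT -> CAS | 2CAS | 2CASREV | 3CAS."""
    i = parse_cas(tokens, 0)
    if i is None:
        return False
    if i == len(tokens):
        return True                      # CAS
    if tokens[i] == "[":                 # 2CASREV -> CAS, [, CAS, ]
        j = parse_cas(tokens, i + 1)
        return j is not None and j == len(tokens) - 1 and tokens[j] == "]"
    count = 1                            # 2CAS or 3CAS
    while i is not None and i < len(tokens) and count < 3:
        i = parse_cas(tokens, i)
        count += 1
    return i == len(tokens)


toggle = ["ptetr", "rbsA", "laci", "t1", "t2",
          "[", "placi", "rbsB", "tetr", "rbsB", "gfp", "t1", "]"]
print(is_valid_construct(toggle))
print(is_valid_construct(["ptetr", "laci", "t1"]))  # CDS without an RBS
```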

Example: SynBio CFG
Express a gene
1. CONSTRUCT → CAS
2. CAS → PROMOTER, CIX, TERMINATOR
3. CIX → CISTRON
4. CISTRON → RBS, CDS
5. TERMINATOR → TERMINATOR, TERMINATOR
Parse tree:
CONSTRUCT
  CAS
    PROMOTER: ptetr
    CIX
      CISTRON
        RBS: rbsA
        CDS: laci
    TERMINATOR
      TERMINATOR: t1
      TERMINATOR: t2

Example: SynBio CFG
A toggle switch
[Figure labels: lacI, tetR, GFP]
Parse tree:
CONSTRUCT
  CAS
    PROMOTER: ptetr
    CIX
      CISTRON
        RBS: rbsA
        CDS: laci
    TERMINATOR
      TERMINATOR: t1
      TERMINATOR: t2
  [
  CAS
    PROMOTER: placi
    CIX
      CISTRON
        RBS: rbsB
        CDS: tetr
      CISTRON
        RBS: rbsB
        CDS: gfp
    TERMINATOR: t1
  ]

And the Phenotype? The meaning
• Use of Attribute Grammars
• An attribute grammar is a CFG plus:
– Terminals and non-terminals have attributes
– Rules have semantic actions to compute attribute values
>> While going through the parse tree, we now also evaluate the semantics (meaning)

And the Phenotype? The meaning
– Transcription: dna → dna + mrna
– Translation: mrna → mrna + protein
– Degradation mrna: mrna → []
– Degradation protein: protein → []
– Interaction promoter protein: dna + repressor <-> dna_repressor_x

Example - Attributes
• Promoter: transcription rate, repressor
– Promoter(transcription_rate, repressor) → ptetr (50, tetr)
– Promoter(transcription_rate, repressor) → placi (10, laci)
• RBS: translation rate
– RBS(translation_rate) → rbsA (25)
– RBS(translation_rate) → rbsB (50)
• CDS: degradation rate for the protein and the mRNA
– CDS(protein_deg, mrna_deg) → laci (1, 1)
– CDS(protein_deg, mrna_deg) → tetr (1, 1)
• Terminator
– Terminator → t1

Example: Semantic Actions
• CAS → PROMOTER(transcription_rate, repressor), CIX, TERMINATOR
– Transcription: dna → dna + mrna, [transcription_rate]
– Interaction: if repressor in construct then dna + repressor → dna_repressor_X
• CISTRON → RBS(translation_rate), CDS(protein_deg, mrna_deg)
– Translation: mrna → mrna + protein, [translation_rate]
– Degradation_mrna: mrna → φ, [mrna_deg]
– Degradation_protein: protein → φ, [protein_deg]
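The semantic actions above can be mimicked with plain string templates. The sketch below is a simplified illustration, not the Prolog compiler used in GenoCAD: the function names are hypothetical, the attribute values are the ones listed for ptetr, placi, rbsA, rbsB, laci and tetr, the species naming follows the worked toggle-switch example, and only single-cistron cassettes are handled.

```python
# Attribute values copied from the "Example - Attributes" slide.
PROMOTER = {
    "ptetr": {"transcription_rate": 50, "repressor": "tetr"},
    "placi": {"transcription_rate": 10, "repressor": "laci"},
}
RBS = {"rbsA": {"translation_rate": 25}, "rbsB": {"translation_rate": 50}}
CDS = {
    "laci": {"protein_deg": 1, "mrna_deg": 1},
    "tetr": {"protein_deg": 1, "mrna_deg": 1},
}


def cistron_actions(rbs, cds):
    """Semantic actions of CISTRON -> RBS, CDS."""
    mrna, protein = f"mrna_{rbs}_{cds}", f"protein_{cds}"
    return [
        f"Translation: {mrna} -> {mrna} + {protein}, [{RBS[rbs]['translation_rate']}]",
        f"Degradation_mrna: {mrna} -> φ, [{CDS[cds]['mrna_deg']}]",
        f"Degradation_protein: {protein} -> φ, [{CDS[cds]['protein_deg']}]",
    ]


def cassette_actions(promoter, rbs, cds, cds_in_construct):
    """Semantic actions of CAS -> PROMOTER, CIX, TERMINATOR (one cistron only)."""
    dna, mrna = f"dna_{promoter}_{rbs}_{cds}", f"mrna_{rbs}_{cds}"
    attrs = PROMOTER[promoter]
    equations = [
        f"Transcription: {dna} -> {dna} + {mrna}, [{attrs['transcription_rate']}]"
    ]
    repressor = attrs["repressor"]
    if repressor in cds_in_construct:  # the "if repressor in construct" condition
        equations.append(
            f"Interaction: {dna} + protein_{repressor} -> {dna}_{repressor}_X"
        )
    return equations + cistron_actions(rbs, cds)


for line in cassette_actions("ptetr", "rbsA", "laci", {"laci", "tetr", "gfp"}):
    print(line)
```

The interaction equation is only emitted when the repressor's coding sequence is present somewhere in the construct, which is how a trans-interaction between two cassettes enters the generated model.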

Semantic DNA Compilation
Genetic Design A → Attribute Grammar → Get Chemical Equations for A
Genetic Design B → Attribute Grammar → Get Chemical Equations for B

Example: Toggle switch
Step-by-step evaluation of the first cassette of the toggle switch construct (ptetr, rbsA, laci, t1, t2):
1. Select the parts: PROMOTER: ptetr, RBS: rbsA, CDS: laci
2. CISTRON semantic actions (templates):
• Translation: mrna → mrna + protein, [translation_rate]
• Degradation_mrna: mrna → φ, [mrna_deg]
• Degradation_protein: protein → φ, [protein_deg]
3. CISTRON semantic actions with the attributes of rbsA and laci:
• Translation: mrna_rbsA_laci → mrna_rbsA_laci + protein_laci, [25]
• Degradation_mrna: mrna_rbsA_laci → φ, [1]
• Degradation_protein: protein_laci → φ, [1]
4. CAS semantic actions (templates):
• Transcription: dna → dna + mrna, [transcription_rate]
• Interaction: if repressor in construct then dna + repressor → dna_repressor_X
5. CAS semantic actions with the attributes of ptetr:
• Transcription: dna_ptetr_rbsA_laci → dna_ptetr_rbsA_laci + mrna_rbsA_laci, [50]
• Interaction: if tetr in construct then dna_ptetr_rbsA_laci + protein_tetr → dna_ptetr_rbsA_laci_tetr_X

Toggle switch laci/tetr → Attribute Grammar → Get Chemical Equations for A
Construct: CAS (ptetr, rbsA laci, t1 t2) [ CAS (placi, rbsB tetr, rbsB gfp, t1) ]
First cassette:
• Transcription: dna_ptetr_rbsA_laci → dna_ptetr_rbsA_laci + mrna_rbsA_laci, [50]
• Interaction: if tetr in construct then dna_ptetr_rbsA_laci + protein_tetr → dna_ptetr_rbsA_laci_tetr_X
• Translation: mrna_rbsA_laci → mrna_rbsA_laci + protein_laci, [25]
• Degradation_mrna: mrna_rbsA_laci → φ, [1]
• Degradation_protein: protein_laci → φ, [1]
Second cassette:
• Transcription: dna_placi_rbsB_tetr → dna_placi_rbsB_tetr + mrna_rbsB_tetr, [10]
• Interaction: if laci in construct then dna_placi_rbsB_tetr + protein_laci → dna_placi_rbsB_tetr_laci_X
• Translation: mrna_rbsB_tetr → mrna_rbsB_tetr + protein_tetr, [50]
• Degradation_mrna: mrna_rbsB_tetr → φ, [1]
• Degradation_protein: protein_tetr → φ, [1]

Scaling up to Natural Genomes
[Diagram labels: Natural language, Natural genomes, Formal languages, Synthetic biology]
Building a Yeast Cell Cycle Attribute Grammar:
A projection of the Cell Cycle model onto the Genome

Attribute grammar (AG)?
1. The syntax
2. The chemical equations

Genome database – Wild-type
[Figure: location of 22 genes on chromosomes I to XVI. x-axis: position (bp), 0 to 1,600,000; y-axis: chromosome number. Gene labels with the < and > markers as drawn: < CLN3, < LTE1, < CDC15, CDC28 >, < PDS1, < SWI5, BCK2 >, < CDC14, CDC20 >, < CDH1, < ESP1, CDC6 >, NET1 >, MAD2 >, SBF >, SIC1 >, TEM1 >, MCM1 >, BUB2 >, < CLN2, < CLB2, < CLB5]

Syntax of the yeast cell cycle grammar
GENOME → MODEL ( CHRI ) ( CHRII ) ( CHRIII ) ( CHRIV ) ( CHRV ) ( CHRVI ) ( CHRVII ) ( CHRVIII ) ( CHRIX ) ( CHRX ) ( CHRXI ) ( CHRXII ) ( CHRXIII ) ( CHRXIV ) ( CHRXV ) ( CHRXVI )
CHRI → CHRI_L [CLN3] CHRI_M1 [LTE1] CHRI_M2 [CDC15] CHRI_R
CHRII → CHRII_L CDC28 CHRII_R
CHRIV → CHRIV_L [PDS1] CHRIV_M [SWI5] CHRIV_R
CHRV → CHRV_L BCK2 CHRV_R
CHRVI → CHRVI_L [CDC14] CHRVI_R
CHRVII → CHRVII_L CDC20 CHRVII_M1 [CDH1] CHRVII_M2 [ESP1] CHRVII_R
CHRX → CHRX_L CDC6 CHRX_M1 NET1 CHRX_M2 MAD2 CHRX_R
CHRXI → CHRXI_L SBF CHRXI_R
CHRXII → CHRXII_L SIC1 CHRXII_R
CHRXIII → CHRXIII_L [TEM1] CHRXIII_M1 MCM1 CHRXIII_M2 [BUB2] CHRXIII_R
CHRXVI → CHRXVI_L [CLN2] CHRXVI_M1 CLB2 CHRXVI_M2 [CLB5] CHRXVI_R

Chen’s Model Cell Cycle
• 150 parameters
• >100 mutants
• 59 ODEs
• 4 events
Chen, K. C., Calzone, L., Csikasz-Nagy, A., Cross, F. R., Novak, B., & Tyson, J. J. (2004). Integrative analysis of cell cycle control in budding yeast. Molecular Biology of the Cell, 15(8), 3841-62. doi:10.1091/mbc.E03-11-0794

Rules’ Semantic Actions
Trans interactions:
• Synthesis of {proteinX} by {proteinY}
• synthesis (X, Y, background_synthesis, Y_dependant_synthesis)
• Degradation of {protein}
• Phosphorylation of {protein}
• Dephosphorylation of {protein}
• Association of {proteinA} and {proteinB}
• Dissociation of {proteinA} and {proteinB}
• Degradation of {proteinA} in {proteinB}
• {proteinA}/{proteinB} complex formation
• {proteinA}/{proteinB} dissociation
• …
• Growth
Events:
• Reset ORI
• Start DNA synthesis
• Spindle checkpoint
• Cell division
Kinetic laws/functions:
• BB
• Michaelis-Menten
• Mass action 1 (1 element)
• Mass action 2 (2 elements)
• Goldbeter-Koshland function
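For reference, four of the kinetic laws listed above can be written as small Python functions. These are the standard textbook forms, given here as a sketch: the function and parameter names are assumptions, the exact forms used in the grammar's template equations are not shown in the slides, and "BB" is left out because the slides do not define it.

```python
import math


def mass_action_1(k, x):
    """First-order mass action: rate = k * [X]."""
    return k * x


def mass_action_2(k, x, y):
    """Second-order mass action: rate = k * [X] * [Y]."""
    return k * x * y


def michaelis_menten(vmax, km, s):
    """Michaelis-Menten rate: vmax * [S] / (Km + [S])."""
    return vmax * s / (km + s)


def goldbeter_koshland(va, vi, ja, ji):
    """Goldbeter-Koshland function G(Va, Vi, Ja, Ji).

    Steady-state active fraction of a protein switched on at rate Va and
    off at rate Vi, with Michaelis constants Ja and Ji.
    """
    b = vi - va + va * ji + vi * ja
    return 2 * va * ji / (b + math.sqrt(b * b - 4 * (vi - va) * va * ji))


# With equal activation and inactivation rates, half of the protein is active.
print(goldbeter_koshland(1.0, 1.0, 0.1, 0.1))
```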

Semantic – BCK2 example
• Definitions: Chemical Equation “ Protein, mass [rate]”
– synthesis(Name, Rate) :- Write(Name . " = [mass] . " . Rate).
• Rules
– ChrV → ChrV_L Bck2(B0) ChrV_R, {synthesis('Bck2', B0)}
• Parts = alleles
– Bck2(0.054)[part_bck2_wt].
– Bck2(0)[part_bck2_Δ].

Compile and Get wild type SBML file

Future: Mutant design
• Consider:
– What genes are modified? → New parts
– How do biologists make the mutant? → New grammar rules
– How does it relate to the mathematical model? → New semantic actions
>> We can compute what the behavior of new mutants would be according to the model
Phenotype:
• Inviable (phase blocked?)
• Viable (size at onset of DNA synthesis, size at bud emergence, size at division, and duration of G1 phase?)

3. IMPLEMENTATION IN SYNTHETIC/SYSTEMS BIOLOGY DESIGN SOFTWARE TO STUDY MUTANTS
GenoCAD.org

To Switches and Oscillators, Yeast Cell Cycle… and beyond!
Working on a workflow for users to define their OWN Attribute Grammar:
• Define the syntax
• Define your template equations (regular, trans and cis), choose kinetic laws >> parameters
1. Link equations to grammar rules as semantic actions
2. Link parameters to categories
3. Add any cis interaction
Attribute Grammars can be a formalism for G2P maps

Use generated compiler to analyze your designs in GenoCAD
[Workflow diagram labels: Your project’s grammar; AG editor; Database; Design1; Design2 (design mutants); Prolog compiler; Java (libSBML) → SBML, shown once per design]

Conclusions
• Semantic models of DNA sequences:
– formalize G2P mapping and confer predictive power with Attribute Grammars:
  • translate DNA sequences into mathematical models
  • predict the phenotype they encode
– fill a gap in annotating genetic information by integrating gene expression mechanisms
• Attribute grammar for the yeast cell cycle:
– in a logical and structured fashion, information from genomic databases and mathematical models will be utilized in the exploration of novel mutants
– semantic models for natural genomes
• Genetic design tools that are user-friendly to the majority and still adaptable to specific projects:
– GenoCAD: create libraries of parts, rule-based design and simulation, generation of SBML files
– Define your own project’s Attribute Grammars: GUI editor
• Design mutants in minutes and simulate them!

Acknowledgements
• VBI SynBio Group
– J. Peccoud (P.I.)
– N. Adames
– D. Ball
– M. Lux
– C. Overend
– M. Wilson
– and Patrick (Yizhi) Cai
– and R. Hertzberg
Cai, Y., Lux, M. W., Adam, L., & Peccoud, J. (2009). Modeling structure-function relationships in synthetic DNA sequences using attribute grammars. PLoS Computational Biology.
• My PhD committee:
– Dr. Bevan
– Dr. Garner
– Dr. Kepes
– Dr. Peccoud
– Dr. Ramakrishnan
– Dr. Tyson
And Dennie Munson!

ADDITIONAL INFORMATION
Resources

Syntactic Limitation
The Chomsky hierarchy
Searls, D.B. “Linguistic approaches to biological sequences.” Bioinformatics 13, no. 4 (1997): 333.
http://bioinformatics.oxfordjournals.org/cgi/content/abstract/13/4/333.

Parsing
The Parse Tree of the Sentence "The boy went home"
[Figure panels: Left to Right Top-Down Parse; Right to Left Top-Down Parse; Left to Right Bottom-Up Parse; Right to Left Bottom-Up Parse]

Use of attribute grammar in synthetic biology
Formal definition | Semantic | In the synthetic biology context
V, a finite set of non-terminals | Attributes | Parts categories
Σ, a finite set of terminals | Attribute values | Genetic parts
R, a finite relation from V to (V∪Σ)* | Semantic actions | Design rules
S∈V, the start symbol | Hard-coded declarations | Start
