NON-destructive design
Lena Chuklina
Artur Zalevsky
Lecture 1. Basic principles of design
SMTB 2014
Disclaimer
• This lecture shamelessly borrows:
•  Ideas and examples from Robin Williams's book “The Non-Designer's Design Book”*
•  Posters by participants of the school “Modern Biology & the Future of Biotechnology”, 2013 and 2014**
*Copyright was not respected. The lecturer is very ashamed…
**These people knew what they were getting into. They submitted their posters for critique at the school. For some of them, it helped make the posters better.
The grand plan
1.  Why do we need design?
•  How much algebra must one learn to achieve harmony?
2.  The four principles of design
• Contrast
• Repetition
• Alignment
• Proximity
3.  Examples
Design is not about beauty
•  “… the important part must stand out and the unimportant must be subdued …”
•  Jan Tschichold, 1935
Basic principles of design
Elements of design
•  Colors
•  Shapes and lines
•  Fonts
•  Relative placement (+ alignment)
Proximity (a.k.a. grouping)
•  Closeness in space implies closeness in meaning
•  Group elements into meaningful units
Proximity: before and after
Alignment
• No element should be placed arbitrarily
• Everything has its place
Types of alignment
A strong line gives support
What has changed?
This poster can be made better in three clicks
Repetition
•  Repeat design elements. Repetition creates structure and is calming
•  What can be repeated?
•  Color
•  Font
•  Line thickness
•  Sizes (of fonts, columns, images)
What is repeated here?
Modelling Leaf Shape Evolution with Gaussian Processes
N. A. Raharinirina, L. Rusaitis, H. Jackson, N. S. Jones, J. W. J. Anderson, M. Tsiantis, M. Cartolano and J. Hein‡
Department of Statistics, University of Oxford, 1 South Parks Road, OX1 3TG, United Kingdom
‡ hein@stats.ox.ac.uk .
Motivation
Leaf shapes display a tremendous variation over their evolution, which makes them an attractive system to study. Our focus of investigation is to find some ways of quantifying this leaf shape diversity and to infer the existing phylogenetic trees from sample leaf data. Although there are many techniques available already for phylogenetic inference, in our implementation, we will take the edges of the leaves as a 2-D function, and assume that they come from a phylogenetic Gaussian process.
Varying the topology of the phylogeny that we assume the leaves come from, we intend to be able to select the correct one simply by maximum likelihood methods.
Representing Leaf Shapes
a) Olimarabidopsis pumila b) Arabidopsis neglecta
We quantify the leaves by taking a 2-D representation of them, and finding the distances from the vein to the edge of the leaf, as well as using the gradient or just the very tip of the leaf to compare the effectiveness of each different data type.
Gaussian Process Regression Model
We infer a Gaussian Process on our leaf data and find the mean and the covariance function of the GP. Firstly, we analyse one leaf shape GP regression, and get covariance in space only:
$$k(x, x'; l) = e^{-\frac{(x - x')^2}{2l^2}} + \sigma^2 \delta(x - x').$$
Then, to do a phylogenetic inference, we introduce a covariance in evolutionary time t for the leaves u and v:
$$k(x_u, x'_v; l, t = (t_1, t_2)) = e^{-(t_1 + t_2)} e^{-\frac{(x_u - x'_v)^2}{2l^2}} + \sigma^2 \delta(u - v)\, \delta(x_u - x'_v).$$
Maximizing the likelihood over (l, t) we find the most likely phylogeny:
$$p(y \mid X, (l, t)) = \frac{1}{(2\pi)^{n/2} |K_y|^{1/2}} e^{-\frac{1}{2} (y - \mu)^T K_y^{-1} (y - \mu)}.$$
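The likelihood maximization described on the poster can be illustrated with a small pure-Python sketch. It covers only the single-leaf, space-only case: a squared-exponential covariance with lengthscale `l`, plus a noise variance `sigma2` on the diagonal (the reading of the second covariance term as noise is an assumption). The function names are hypothetical, and the evolutionary-time factor is not modelled here.

```python
import math

def kernel_matrix(xs, l, sigma2):
    """Squared-exponential covariance matrix with noise variance on the diagonal."""
    n = len(xs)
    return [[math.exp(-((xs[i] - xs[j]) ** 2) / (2.0 * l * l))
             + (sigma2 if i == j else 0.0)
             for j in range(n)] for i in range(n)]

def cholesky(a):
    """Lower-triangular factor L with L * L^T = a (a must be positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def log_marginal_likelihood(xs, ys, l, sigma2, mu=0.0):
    """log p(y | X, l) for a GP with constant mean mu."""
    n = len(xs)
    low = cholesky(kernel_matrix(xs, l, sigma2))
    r = [y - mu for y in ys]
    # Forward substitution: solve low * z = r, so that z.z = r^T K^-1 r.
    z = []
    for i in range(n):
        z.append((r[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i])
    quad = sum(v * v for v in z)
    log_det = 2.0 * sum(math.log(low[i][i]) for i in range(n))
    return -0.5 * quad - 0.5 * log_det - 0.5 * n * math.log(2.0 * math.pi)

def best_lengthscale(xs, ys, candidates, sigma2):
    """Pick the candidate lengthscale with the highest log marginal likelihood."""
    return max(candidates, key=lambda l: log_marginal_likelihood(xs, ys, l, sigma2))
```

A grid search over candidate lengthscales is the simplest stand-in for the maximization; a small positive `sigma2` also keeps the Cholesky factorization stable when sample points lie close together.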
Inference on Simulated Data
Simulating ’leaves’ from a GP for which we know all the relevant parameters, we can see how well we are able to recover them using our inference procedure. Most simulated data sets we tried this on gave reasonable results, and the estimate of the time between leaves was not overly sensitive to incorrect lengthscales.
[Figure: a) Proportion of correct trees vs. length scale, comparison with UPGMA (red). b) Estimate of total time of tree vs. length scale: allowed evolutionary time (red is true total time).]
The benchmark we are trying to beat is the proportion of times that the correct phylogeny was inferred using the UPGMA method, using a distance matrix given by the sum of squared distances between points on the leaves.
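The UPGMA benchmark can be sketched as follows. This is a minimal illustration, not the authors' code: each leaf is assumed to be a list of numbers sampled at matching positions along the outline, and the names `sum_sq_distance` and `upgma` are hypothetical.

```python
from itertools import combinations

def sum_sq_distance(p, q):
    """Sum of squared differences between matching points on two leaf profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def upgma(profiles):
    """Cluster leaves by UPGMA; returns the tree topology as nested tuples of names.

    profiles: dict mapping leaf name -> list of numbers (same length for all leaves).
    """
    trees = sorted(profiles)              # current clusters, initially single leaves
    size = {t: 1 for t in trees}          # number of leaves in each cluster
    d = {frozenset((a, b)): sum_sq_distance(profiles[a], profiles[b])
         for a, b in combinations(trees, 2)}
    while len(trees) > 1:
        # Merge the closest pair of clusters.
        a, b = min(combinations(trees, 2), key=lambda p: d[frozenset(p)])
        merged = (a, b)
        size[merged] = size[a] + size[b]
        trees = [t for t in trees if t not in (a, b)]
        # Distance to the new cluster is the size-weighted average (the "UPGMA" rule).
        for t in trees:
            d[frozenset((merged, t))] = (size[a] * d[frozenset((a, t))]
                                         + size[b] * d[frozenset((b, t))]) / size[merged]
        trees.append(merged)
    return trees[0]
```

Only the topology is returned here; branch heights, which UPGMA also provides, are omitted to keep the sketch short.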
Simulating 4 leaves with a total of 15 possible phylogenies, we took 100 datasets. The proportion of phylogenies selected correctly by UPGMA was 0.385. At the correct lengthscale (l = 1), the proportion selected correctly by the Gaussian process inference was 0.53. So we can say with confidence that Gaussian Process regression performs better than UPGMA when we get the covariance structure correct. As our lengthscale guess gets further from the truth, though, the performance of the GP inference decreases a lot.
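The figure of 15 phylogenies for 4 leaves matches the standard count of rooted, leaf-labelled, strictly bifurcating trees, (2n - 3)!!. That the poster counts trees this way is an assumption; the sketch below, with the hypothetical name `count_rooted_binary_trees`, simply evaluates the formula.

```python
def count_rooted_binary_trees(n_leaves):
    """Number of rooted, leaf-labelled, bifurcating trees: (2n - 3)!! = 1 * 3 * 5 * ... * (2n - 3)."""
    if n_leaves < 1:
        raise ValueError("need at least one leaf")
    count = 1
    for k in range(3, 2 * n_leaves - 2, 2):   # odd factors 3, 5, ..., 2n - 3
        count *= k
    return count

# Probability of picking the right tree by uniform random guessing among all topologies.
chance_level = 1.0 / count_rooted_binary_trees(4)
```

Under uniform random guessing the chance level for 4 leaves would be 1/15, roughly 0.067, which puts the reported proportions of 0.385 and 0.53 in context.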
Results on Real Leaf Data
[Figure panels: a) Original b) Polar Form c) Consensus d) Gradient e) Tip of the Leaf]
The previous analysis on the simulated data showed that it is possible to use the GP to infer phylogenies. Encouraged by this, we used a general squared exponential space covariance and a simple exponential covariance in time on a real sample of 5 leaves in the Arabidopsis family. In the figures above, we present the maximum likelihood surfaces of all the different data type representations we used for the leaf shape. The green point is the maximum likelihood of the true phylogeny, so we can straightforwardly quantify the strength of our predictions. The normal space covariance seems to give us reasonably good results for some data sets, and very poor predictions for the others. Therefore, the model is highly sensitive to the type of leaf shape representation.
[Figure: Tree Comparison. True Tree (left) against our most likely inferred Tree (right), each over Olimarabidopsis pumila, Arabidopsis halleri, Arabidopsis lyrata, Arabidopsis neglecta and Arabidopsis thaliana. The tree has log likelihood = 67.75.]
Further Results and Extensions
Another area of interest was to investigate the consequences of assuming a non-homogeneous space covariance, by increasing correlation between special points on the leaves. We chose these points as the turning points in the leaf outlines by analysing the gradient, and we changed the space covariance matrix accordingly. We observed some drastic improvement in the prediction for some data types, particularly for the original, gradient and tip representations. Thus, provided we can find the right covariance structure that represents the leaf shape, we can make much better predictions.
[Figure: a) A significant improvement for true tree likelihood in original leaf representation. b) Modified space-covariance matrix for leaves Halleri, Thaliana, Pumila, Neglecta.]
The project points to many other areas still left to investigate, from studying these modified covariance matrices and hyperparameter sensitivity in more depth to experimenting with 2-dimensional regression models and other representations of the leaf shapes. Gaussian Process regression proves to be a powerful method worthy of more investigation.
Acknowledgements
This work was carried out as part of the Oxford Summer School in Computational Biology, 2011, in conjunction with the Department of Plant Sciences, and with support from the Department of Zoology. Funding was provided by J. Hein’s PRA. We especially thank J. W. J. Anderson, N. S. Jones and J. Hein for guidance, and everyone at Plant Sciences who made this project possible.
What here:
1. Is repeated?
2. Is aligned?
Contrast
•  Avoid elements that are merely similar. If they are not the same (exactly the same), make them really different
Color contrast
Contrast in hue, saturation and brightness
Absent
Present
Contrast in structure
Strengthening contrast in fonts
Strengthening contrast in lines
[Figure: amount of mRNA under different conditions (stat, nod); axis label FC]
The contrast that is missing
There is not enough contrast because there are too many elements
Let's practice on cats… (mice)
• Which principles are violated?
• In the images
• In the fonts
• In the colors
How Russians eat (slon.ru)
What is wrong with the repetition/contrast?
Which principles are followed, and which are violated?
The last one…
Transcription Start Site Map of Soy Symbiont Bradyrhizobium japonicum Based on dRNA-seq
Jelena Chuklina1,2, Nikolay Lyubimov3, Maxim Imakaev4, Elena Evguenieva-Hackenberg5 and Mikhail S. Gelfand2,3
1 Moscow Institute of Physics and Technology, Dolgoprudny, Russia; 2 A.A. Kharkevich Institute for Information Transmission Problems, Moscow, Russia; 3 M.V. Lomonosov Moscow State University, Moscow, Russia; 4 Massachusetts Institute of Technology, Boston, USA; 5 Institute of Microbiology and Molecular Biology, Justus-Liebig-Universität Gießen, Gießen, Germany
chuklina.jelena@gmail.com
TSS mapping and transcript repertoire
TSS position in relation to gene is key to its function.
Promoter motif prediction
Outline
•  Perform new round of machine-learning with updated training set
•  Update gTSS and 5’-aTSS classification
•  Compare dRNA-seq data with expression array and proteome data
Acknowledgments
•  Julia Hahn and Sebastian Thalmann for experimental validation of transcription start sites and promoter motifs
•  Iakov Davydov and Aleksandr Chuklin for extensive advice on program development
•  Cynthia Sharma, Konrad Förstner, Jorg Vogel for sequencing and read mapping
•  Gabriella Pessi and Hans-Martin Fischer for nodule RNA
Summary
1. We detected 17574 peaks; after machine learning 10071 were left as TSS.
2. We detected 3979 RpoD promoters and 485 RpoN motifs; 159 TSSes have both.
3. After re-annotation, 73 ncTSS and 682 iTSSes were re-classified as gTSSes.
Abstract
dRNA-seq was designed for selective sequencing of native transcripts originating from transcription start sites (TSS). Here we present TSSF (Transcription Start Site Finder), a software package which allows comprehensive analysis of the bacterial transcriptomic landscape. A TSS map makes it possible to assess the repertoire of small non-coding RNAs, investigate promoter motifs and improve gene annotation.
In this study we use TSSF to compare the transcriptome of the soy symbiont Bradyrhizobium japonicum in liquid cultures and in root nodule populating bacteroids.
Re-annotation
TSS detection. Machine learning
The (+) library is RNA selected for primary transcripts; the (-) library is all RNA, including processed (Fig. 1). All peaks matching in the (+) and (-) libraries were treated as candidate TSS and were subjected to automated machine learning. Expert-assessed peaks served as a training set (Fig. 2 and Table 1). Machine learning was performed separately for free-living bacteria (FR) and nodules (NO). To compute support vectors, the following parameters were selected:
i.  Height of (+) and (-) peak (Fig. 3)
ii.  Ratio of (+) and (-) peak
iii. Average expression in 30 b.p. radius
Fig. 3. Peak detection: RNA-seq
read coverage (blue), salience
function (green), peaks (red)
Fig. 5. Best-scoring patterns were used to construct Positional weight matrix (PWM).
PWM threshold determination (upper): score distribution density of normal
upstreams is skewed towards higher scores when compared to random sequences.
Resulting logos (lower) of RpoD (σ70) and RpoN(σ54).
[Figure: RpoN, score distribution density (totalScore vs. density; normal vs. random). TSSes overexpressed in nodules.]
[Figure: allowed deviations of box1 and box2: substitution, extension, shift]
                      ISGA vs old   RAST vs old   RAST vs ISGA
matching CDSes        4749          4669          7690
matching genes        4796          -             -
re-annotated start    3050          2941          898
new genes             1351          1105          556
discarded             525           707           127

          old     ISGA    RAST
genes     8373    9197    -
CDS       8317    9144    8715
sRNA length assessment
A typical transcript starts with a TSS and ends with a terminator. We used 3 publicly available tools (ARNold, TransTermHP, WebGesterDB) for rho-independent terminator prediction. Only ARNold predicts terminators independently of the annotated gene end, and we used it to assess sRNA length.
Only 247 TSSes were matched with terminators; their length was usually 40-200 nt, rarely more than 400 nt.
See also: poster by Julia Hahn
Fig. 2. Expert assessment of candidate TSS
for training set.
Fig. 1. dRNA-seq
data. (+) library –
red, (-) library – blue.
Table 1. Training set: manually assessed 0-130 kb and 1681..1920 kb (symb. island) of genome
Fig. 9. Start-codon re-annotation: change in protein lengths after re-annotation with RAST and ISGA.
There is clear skew of ORFs which became shorter for both ISGA and RAST. This leads to iTSS re-
classification as gTSS
5’-untranslated region length
Fig. 7. While most 5’-UTRs have a typical length of 20-40 nt, there is a considerable number of leaderless transcripts, which seems to be a common property of bacteria
Fig. 8. Re-annotation of RegR (blr0904): now
the TSS №1 precedes start-codon. Old
annotation is grey, new is cyan. P1, P2, P3
are predicted promoters.
Table 2. Number of genes (CDS) predicted by different AGEs
Table 3. Different B.japonicum USDA 110 annotations
Anti-sense transcript mapping
Most of the TSS classified as gTSS and aTSS belong to the 5’-UTR and often don’t intersect the corresponding anti-sense transcripts, and thus are gTSS/oTSS, transcribed divergently (as dashed arrow above). Overlap in various aTSS types is due to overlap of annotated genes.
Protein-coding genes:
•  4084 protein-coding genes have TSS
•  Maximal number of TSS per gene is 4
•  873 protein-coding genes have more than one TSS
Anti-sense RNAs:
•  4013 genes have anti-sense TSS (2056 of them expressed)
Internal TSSes:
•  4167 genes have iTSSes (2368 of them are expressed)
gTSS = gene TSS
iTSS = internal TSS
oTSS = orphan TSS
aTSS_5, aTSS_i, aTSS_3 = anti-sense
Fig. 6. Different TSS type (=transcript type) distribution. Abundance of iTSS may be due to: 1) Operon intrinsic promoter; 2) RNA cleavage products misclassified as TSS. For aTSS misclassification analysis, see below.
[Figure label: 1340 oTSS]
TSS mapping allows for correction of annotation errors, especially re-annotation of start codons. We applied automated genome annotation engines (AGE) RAST and ISGA to improve Bradyrhizobium japonicum USDA 110 annotation.
If a TSS candidate's upstream sequence is enriched with promoter motifs, the promoters support the TSS candidate as a true TSS.
Usually promoters possess:
1.  Conserved two-box sequences
2.  Conserved distance to TSS
3.  Conserved distance between boxes
We scanned 60 nt sequences upstream of each predicted TSS (or subset of TSS upregulated in nodules) to find overrepresented 6-nt motifs. We allowed 1-2 nt shift of boxes from the ideal distance, 1-2 nt extension of distance between boxes and 1-2 substitutions in each box, penalizing for each.
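The two-box scan with penalties can be illustrated with a short sketch. It is not the TSSF implementation: the function name, the scoring scheme (one point per box position, minus penalties) and the penalty weights are all assumptions made for illustration, and the penalty for shifting the boxes relative to the TSS is left out for brevity.

```python
def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def best_two_box_hit(upstream, box1, box2, ideal_gap,
                     max_sub=2, max_ext=2, sub_penalty=1.0, ext_penalty=0.5):
    """Find the best box1 ... box2 occurrence in an upstream sequence.

    Allows up to max_sub substitutions per box and up to max_ext extra
    nucleotides between the boxes, penalizing each deviation.
    Returns (score, position of box1, gap length), or None if nothing qualifies.
    """
    k1, k2 = len(box1), len(box2)
    best = None
    for i in range(len(upstream) - k1 + 1):
        m1 = mismatches(upstream[i:i + k1], box1)
        if m1 > max_sub:
            continue
        for gap in range(ideal_gap, ideal_gap + max_ext + 1):
            j = i + k1 + gap                      # start of the second box
            if j + k2 > len(upstream):
                break
            m2 = mismatches(upstream[j:j + k2], box2)
            if m2 > max_sub:
                continue
            score = (k1 + k2) - sub_penalty * (m1 + m2) - ext_penalty * (gap - ideal_gap)
            if best is None or score > best[0]:
                best = (score, i, gap)
    return best
```

For example, with the canonical E. coli sigma-70 consensus boxes `TTGACA` and `TATAAT` and a 17 nt spacer (used here only as familiar placeholders, not as the B. japonicum motifs), an exact occurrence scores the full 12 points, and each substitution or extra spacer nucleotide lowers the score.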
Fig. 4. Correlated positions are most concentrated in the regions around -35 and -10 nt, respectively. The illustration is based on the 5000 best patterns
[Figure: RpoD, score distribution density (totalScore vs. density; normal vs. random)]
[Figure: RegR, bll0904. Labels: P1, P2, P3; ATG, TTG; old, new]
Brightness and saturation
Sequential colors

Basic_principles_of_design.

  • 1. НЕдеструктивный дизайн Лена Чуклина Артур Залевский Лекция 1. Базовые принципы дизайна SMTB 2014
  • 2. Disclaimer (ответственное заявление) • В этой лекции нещадно используются •  Идеи и примеры из книги Robin Williams “The Non- designer’s Design Book”* •  Постеры участников школы “Современная биология & будущее биотехнологий” 2013 и 2014** *Авторские права не соблюдены. Автору лекции очень стыдно… **Эти люди знали, на что шли. Они подавали свои постеры для разбора на школе. Некоторым это помогло сделать постеры лучше.
  • 3. Отличный план 1.  Зачем нужен дизайн? •  Сколько нужно выучить алгебры для достижения гармонии? 2.  Четыре принципа дизайна • Contrast (контраст) • Repetition (повтор) • Alignment (выравнивание) • Proximity (близость) 3.  Примеры
  • 4. Дизайн – не для красоты •  … the important part must stand out and the unimportant must be subdued . . . . •  Jan Tschichold 1935 •  … важное должно выделяться, а второстепенное должно отойти на второй план… •  Ян Щичольд 1935 г.
  • 7. Элементы дизайна •  Цвета •  Формы и линии •  Шрифты •  Взаимное расположение ( + выравнивание)
  • 8. Близость (она же группировка) •  Близость в пространстве подразумевает смысловую близость •  Группируйте элементы в смысловые единицы
  • 12. Выравнивание • Ни один • Элемент • Не должен быть • Расположен • произвольно • Всему • свое • место
  • 17. Повторение •  Повторяйте элементы дизайна. Повтор создает структуру и успокаивает •  Что можно повторять? •  Цвет •  Шрифт •  Толщину линий •  Размеры (шрифтов, колонок, картинок)
  • 19. Modelling Leaf Shape Evolution with Gaussian Processes N. A. Raharinirina, L. Rusaitis, H. Jackson, N. S. Jones, J. W. J. Anderson, M. Tsiantis, M. Cartolano and J. Hein‡ Department of Statistics, University of Oxford, 1 South Parks Road, OX1 3TG, United Kingdom ‡ hein@stats.ox.ac.uk . Motivation Leaf shapes display a tremendous variation over their evolution, which makes them an attractive system to study. Our focus of investigation is to find some ways of quantifying this leaf shape di- versity and to infer the existing phylogenetic trees from sample leaf data. Although there are many techniques available already for phylogenetic infer- ence, in our implementation, we will take the edges of the leaves as a 2-D function, and assume that they come from a phylogenetic Gaussian process. Varying the topology of the phylogeny that we assume the leaves come from, we in- tend to be able to select the correct one simply by maxi- mum likelihood methods. Representing Leaf Shapes a) Olimarabidopsis Pumila b) Arabidopsis Neglecta We quantify the leaves by taking a 2-D represen- tation of them, and finding the distances from the vein to the edge of the leaf, as well as using the gra- dient or just the very tip of the leaf to compare the effectiveness of each different data type. Gaussian Process Regression Model We infer a Gaussian Process on our leaf data and find the mean and the covariance function of the GP. Firstly, we analyse one leaf shape GP regression, and get covariance in space only: k(x, x0 ; l) = e (x x0)2 2l2 + 2 (x x0 ). Then, to do a phylogenetic inference, we introduce a covariance in evolutionary time t for the leaves u and v: k(xu, x0 v; l, t = (t1, t2)) = e (t1+t2) e (xu x0 v)2 2l2 + 2 (u v) (xu x0 v). Maximizing the likelihood over (l, t) we find the most likely phylogeny: p(y|X, (l, t)) = 1 (2⇡) n 2 |Ky| 1 2 e 1 2 (y µ)T K 1 y (y µ) . 
Inference on Simulated Data Simulating 'leaves' from a GP for which we know all the relevant parameters, we can see how well we are able to recover them using our inference procedure. Most simulated data sets we tried this on gave reasonable results, and the estimate of the time between leaves was not overly sensitive to incorrect lengthscales. [Figure: a) Proportion of correct trees vs. length scale, comparison with UPGMA (red); b) Estimate of total time of tree vs. length scale, allowed evolutionary time (red is true total time)] The benchmark we are trying to beat is the proportion of times that the correct phylogeny was inferred using the UPGMA method, using a distance matrix given by the sum of squared distances between points on the leaves. Simulating 4 leaves with a total of 15 possible phylogenies, we took 100 datasets. The proportion of phylogenies selected correctly by UPGMA was 0.385. At the correct lengthscale (l = 1), the proportion selected correctly by the Gaussian process inference was 0.53. So we can say with confidence that Gaussian Process regression performs better than UPGMA when we get the covariance structure correct. As our lengthscale guess gets further from the truth, though, the performance of the GP inference decreases a lot. Results on Real Leaf Data a) Original b) Polar Form c) Consensus d) Gradient e) Tip of the Leaf The previous analysis on the simulated data showed that it is possible to use the GP to infer phylogenies. Encouraged by this, we used a general squared exponential space covariance and a simple exponential covariance in time on a real sample of 5 leaves in the Arabidopsis family. 
In the figures above, we present the maximum likelihood surfaces of all the different data type representations we used for the leaf shape. The green point is the maximum likelihood of the true phylogeny, so we can straightforwardly quantify the strength of our predictions. The normal space covariance seems to give us reasonably good results for some data sets, and very poor predictions for the others. Therefore, the model is highly sensitive to the type of leaf shape representation. [Figure: Tree Comparison. True tree (left) against our most likely inferred tree (right); the tree has log likelihood = 67.75; leaves: Olimarabidopsis pumila, Arabidopsis halleri, Arabidopsis lyrata, Arabidopsis neglecta, Arabidopsis thaliana] Further Results and Extensions Another area of interest was to investigate the consequences of assuming a non-homogeneous space covariance, by increasing correlation between special points on the leaves. We chose these points as the turning points in the leaf outlines by analysing the gradient, and we changed the space covariance matrix accordingly. We observed some drastic improvement in the prediction for some data types, particularly for the original, gradient and tip representations. Thus, provided we can find the right covariance structure that represents the leaf shape, we can make much better predictions. a) A significant improvement for true tree likelihood in original leaf representation b) Modified space-covariance matrix for leaves Halleri, Thaliana, Pumila, Neglecta The project points to many other areas still left to investigate, from studying these modified covariance matrices and hyperparameter sensitivity more in-depth to experimenting with 2-dimensional regression models and other representations of the leaf shapes. Gaussian Process regression proves to be a powerful method worthy of more investigation. References [1] Nick S. 
Jones and John Moriarty (2010). Evolutionary Inference for Functional Data: Using Gaussian Processes on Phylogenies to Study Shape Evolution. [2] C. E. Rasmussen, C. K. I. Williams (2006). Gaussian Processes for Machine Learning, the MIT Press. Acknowledgements This work was carried out as part of the Oxford Summer School in Computational Biology, 2011, in conjunction with the Department of Plant Sciences, and with support from the Department of Zoology. Funding was provided by J. Hein's PRA. We specially thank J. W. J. Anderson, N. S. Jones and J. Hein for guidance, and everyone at the Plant Sciences that made this project possible.
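The "Gaussian Process Regression Model" part of the poster above can be illustrated with a small sketch. It evaluates the space-only covariance (read here as a squared exponential plus a diagonal noise term, which is an assumption about the garbled formula) and the Gaussian log-likelihood that the poster maximizes. The function names, the default noise level `sigma=0.1` and the constant mean `mu` are hypothetical choices for illustration, not the authors' implementation, and the evolutionary-time factor is left out.

```python
import math

def space_kernel(x, x2, l, sigma=0.1):
    """Squared-exponential covariance in space plus a noise term when x == x2."""
    noise = sigma ** 2 if x == x2 else 0.0  # plays the role of sigma^2 * delta(x - x')
    return math.exp(-(x - x2) ** 2 / (2 * l ** 2)) + noise

def cholesky(K):
    """Return lower-triangular L with L L^T = K (K must be positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def log_likelihood(y, xs, l, sigma=0.1, mu=0.0):
    """log p(y | X, l) for a GP with constant mean mu (an assumed simplification)."""
    n = len(y)
    K = [[space_kernel(a, b, l, sigma) for b in xs] for a in xs]
    L = cholesky(K)
    # Forward substitution solves L z = (y - mu); then (y-mu)^T K^-1 (y-mu) = z.z
    z = []
    for i in range(n):
        z.append((y[i] - mu - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    log_det = 2 * sum(math.log(L[i][i]) for i in range(n))  # log |K|
    return -0.5 * sum(v * v for v in z) - 0.5 * log_det - 0.5 * n * math.log(2 * math.pi)
```

Calling `log_likelihood` for several candidate values of `l` and keeping the largest result mirrors, in a very reduced form, the maximum-likelihood selection described on the poster.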
  • 21. Contrast • Avoid elements that are merely similar. If they are not the same (exactly the same), then make them really different
  • 23. Contrast in hue, saturation and brightness Absent ☺ Present ☺
  • 26. Strengthening contrast in lines [Figure: amount of mRNA under different conditions (stat, nod); axis: FC]
  • 28.
  • 30. Let's practice on kittens… (mice) • Which principles are violated? • In the images • In the fonts • In the colors
  • 31. How Russians eat (slon.ru) What is wrong with the repetition/contrast?
  • 32.
  • 34. The last one… TSS mapping and transcript repertoire TSS position in relation to gene is key to its function. Promoter motif prediction Transcription Start Site Map Of Soy Symbiont Bradyrhizobium japonicum Based On dRNA-seq 1 Moscow Institute of Physics and Technology, Dolgoprudny, Russia 2 A.A. Kharkevich Institute for Information Transmission Problems, Moscow, Russia, 3 M.V. Lomonosov Moscow State University, Moscow, Russia, 4 Massachusetts Institute of Technology, Boston, USA, 5 Institute of Microbiology and Molecular Biology, Justus-Liebig-Universität Giessen, Gießen, Germany chuklina.jelena@gmail.com Jelena Chuklina1,2, Nikolay Lyubimov3, Maxim Imakaev4, Elena Evguenieva-Hackenberg5 and Mikhail S. Gelfand2,3 Outline • Perform new round of machine-learning with updated training set • Update gTSS and 5'-aTSS classification • Compare dRNA-seq data with expression array and proteome data Acknowledgments • Julia Hahn and Sebastian Thalmann for experimental validation of transcription start sites and promoter motifs • Iakov Davydov and Aleksandr Chuklin for extensive advice on program development • Cynthia Sharma, Konrad Förstner, Jorg Vogel for sequencing and read mapping • Gabriella Pessi and Hans-Martin Fischer for nodule RNA Summary 1. We detected 17574 peaks, after machine learning 10071 were left as TSS. 2. We detected 3979 RpoD promoters, 485 RpoN motifs, 159 TSSes have both. 3. After re-annotation 73 ncTSS and 682 iTSSes were re-classified as gTSSes. Abstract dRNA-seq was designed for selective sequencing of native transcripts originating from transcription start sites (TSS). Here we present TSSF (Transcription Start Site Finder), a software package which allows comprehensive analysis of bacterial transcriptomic landscape. TSS map allows one to assess the repertoire of small non-coding RNA, investigate promoter motifs and improve gene annotation. 
In this study we use TSSF to compare the transcriptome of soy symbiont Bradyrhizobium japonicum in liquid cultures and root nodule populating bacteroids. Re-annotation TSS detection. Machine learning (+) library is RNA, selected for primary transcripts, (-) library is all RNA, including processed (Fig. 1). All peaks matching in (+) and (-) library were treated as candidate TSS and were subjected to automated machine learning. Expert-assessed peaks were used as a training set (Fig. 2 and Table 1). Machine learning was performed separately for free-living bacteria (FR) and nodules (NO). To compute support vectors, the following parameters were selected: i. Height of (+) and (-) peak (Fig. 3) ii. ratio of (+) and (-) peak iii. average expression in 30 b.p. radius Fig. 3. Peak detection: RNA-seq read coverage (blue), salience function (green), peaks (red) Fig. 5. Best-scoring patterns were used to construct Positional weight matrix (PWM). PWM threshold determination (upper): score distribution density of normal upstreams is skewed towards higher scores when compared to random sequences. Resulting logos (lower) of RpoD (σ70) and RpoN (σ54). [Figure: RpoN, score distribution density (totalScore vs. density; normal, random). TSSes overexpressed in nodules] [Figure labels: substitution, extension and shift of box1 and box2] Table (columns: ISGA vs old, RAST vs old, RAST vs ISGA): matching CDSes 4749, 4669, 7690; matching genes 4796, –, –; re-annotated start 3050, 2941, 898; new genes 1351, 1105, 556; discarded 525, 707, 127. Table (columns: old, ISGA, RAST): genes 8373, 9197, –; CDS 8317, 9144, 8715. sRNA length assessment Typical transcript starts with TSS and ends with terminator. We used 3 publicly available tools (ARNold, TransTermHP, WebGesterDB) for rho-independent terminator prediction. Only ARNold predicts terminators independently of annotated gene end and we used it to assess sRNA length. 
Only 247 TSSes were matched with terminators, their length was usually 40-200 nt, rarely more than 400 nt. See also: poster by Julia Hahn Fig. 2. Expert assessment of candidate TSS for training set. Fig. 1. dRNA-seq data. (+) library: red, (-) library: blue. Table 1. Training set: Manually assessed 0-130 kb and 1681-1920 kb (symbiotic island) of the genome Fig. 9. Start-codon re-annotation: change in protein lengths after re-annotation with RAST and ISGA. There is a clear skew of ORFs which became shorter for both ISGA and RAST. This leads to iTSS re-classification as gTSS 5'-untranslated region length Fig. 7. While most of the 5'-UTRs have a typical length of 20-40 nt, there is a considerable amount of leaderless transcripts, which seems to be a common property of bacteria Fig. 8. Re-annotation of RegR (blr0904): now the TSS №1 precedes start-codon. Old annotation is grey, new is cyan. P1, P2, P3 are predicted promoters. Table 2. Number of genes (CDS) predicted by different AGEs Table 3. Different B. japonicum USDA 110 annotations Anti-sense transcript mapping Most of the TSS classified as gTSS and aTSS belong to 5'-UTR and often don't intersect corresponding anti-sense transcripts and thus are gTSS/oTSS, transcribed divergently (as dashed arrow above). Overlap in various aTSS types is due to overlap of annotated genes. Protein-coding genes: • 4084 protein-coding genes have TSS • Maximal number of TSS per gene is 4 • 873 protein-coding genes have more than one TSS Anti-sense RNAs: • 4013 genes have anti-sense TSS (2056 of them expressed) Internal TSSes: • 4167 genes have iTSSes (2368 of them are expressed) gTSS = gene TSS iTSS = internal TSS oTSS = orphan TSS aTSS_5, aTSS_i, aTSS_3 = anti-sense Fig. 6. Different TSS type (=transcript type) distribution. Abundance of iTSS may be due to: 1) Operon intrinsic promoter; 2) RNA cleavage products misclassified as TSS. For aTSS misclassification analysis, see below. [Figure label: 1340 oTSS] TSS 
mapping allows for correction of annotation errors, especially re-annotation of start codons. We applied automated genome annotation engines (AGE) RAST and ISGA to improve Bradyrhizobium japonicum USDA 110 annotation. TSS candidate upstream sequences are enriched with promoter motifs; promoters support a TSS candidate as a true TSS. Usually promoters possess: 1. Conserved two-box sequences 2. Conserved distance to TSS 3. Conserved distance between boxes We scanned 60 nt sequences upstream of each predicted TSS (or subset of TSS upregulated in nodules) to find overrepresented 6-nt motifs. We allowed 1-2 nt shift of boxes from the ideal distance, 1-2 nt extension of distance between boxes and 1-2 substitutions in each box, penalizing for each. Fig. 4. The highest concentration of correlated positions is found around -35 and -10 nt. Illustration is based on 5000 best patterns [Figure: RpoD, score distribution density (totalScore vs. density; normal, random)] [Figure: RegR, bll0904; promoters P1, P2, P3; start codons ATG (old) and TTG (new)]
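The promoter scan described on this poster (two boxes, with tolerated shift, gap extension and substitutions, each penalized) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function names, the unit penalty per deviation and the convention that the upstream sequence ends right at the TSS are all hypothetical, since the poster does not give the actual scoring scheme used in TSSF.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))

def best_two_box_score(upstream, box1, box2, gap, dist_to_tss,
                       max_shift=2, max_ext=2, max_sub=2, penalty=1.0):
    """Best (least penalized) placement of a two-box motif, or None if none fits.

    upstream    -- sequence whose last character is the base just before the TSS
    gap         -- ideal number of bases between box1 and box2
    dist_to_tss -- ideal number of bases between the end of box2 and the TSS
    Every shifted base, extra gap base and substitution costs `penalty`
    (an assumed, uniform weighting).
    """
    n, k1, k2 = len(upstream), len(box1), len(box2)
    best = None
    for shift in range(-max_shift, max_shift + 1):
        end2 = n - dist_to_tss + shift       # exclusive end of box2
        start2 = end2 - k2
        for ext in range(max_ext + 1):
            start1 = start2 - gap - ext - k1
            if start1 < 0 or end2 > n:
                continue                      # placement falls outside the sequence
            sub1 = hamming(upstream[start1:start1 + k1], box1)
            sub2 = hamming(upstream[start2:end2], box2)
            if sub1 > max_sub or sub2 > max_sub:
                continue                      # too many substitutions in a box
            score = 0.0 - penalty * (abs(shift) + ext + sub1 + sub2)
            if best is None or score > best:
                best = score
    return best
```

A score of 0.0 means both boxes were found exactly at their ideal positions; more negative values mean more deviations were needed, and `None` means no placement within the tolerances exists.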