[Figure residue. Panel titles: Kermen Model Prediction; Previous Models; Challenge Overview; Odor Intensity; Boelens model. Plots: Fitted Pleasantness vs. Pleasantness (R = 0.493, P < 0.001); two plots of Pleasantness vs. Complexity.]
Predicting olfactory perception from chemical structure
Chung Wen Yu (1), Yusuke Ihara (1,2), Joel D. Mainland (1,3)
(1) Monell Chemical Senses Center, Philadelphia, PA
(2) Institute for Innovation, Ajinomoto Co., Inc., Kawasaki, Japan
(3) University of Pennsylvania, Philadelphia, PA
• Goal: Predict olfactory perception from chemical structure.
• Data: 49 human subjects rated the perceptual features of 476 odorants.
• Subchallenge 1: Predict the ratings of every subject.
• Subchallenge 2: Predict the mean and standard deviation of ratings across subjects.
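The Subchallenge 2 targets can be illustrated with a minimal sketch. The odorant names and ratings below are hypothetical toy values, and the use of the sample (n - 1) standard deviation is an assumption, since the poster does not say which estimator was used.

```python
from statistics import mean, stdev

# Hypothetical toy data: ratings[odorant] lists one rating per subject.
ratings = {
    "odorant_A": [10, 30, 20, 40],
    "odorant_B": [80, 60, 70, 90],
}

def summarize_across_subjects(ratings_by_odorant):
    """Return {odorant: (mean, sample standard deviation)} across subjects."""
    return {
        name: (mean(values), stdev(values))
        for name, values in ratings_by_odorant.items()
    }

# One (mean, standard deviation) pair per odorant: the quantities to predict.
targets = summarize_across_subjects(ratings)
```

Subchallenge 1 would instead keep each subject's rating as a separate target.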
• Khan et al. (2007) predicted the pleasantness of odorants using a linear model built on the first seven principal components of physicochemical space [1].
• This model performed similarly on the DREAM challenge dataset.
• Kermen et al. (2011) found that molecular complexity (a combination of size and symmetry) predicted odor pleasantness [2].
• The complexity model does not perform as well as Khan’s model.
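A linear model on the first seven principal components can be sketched as follows. This is an illustration under stated assumptions, not the published implementation: the descriptor matrix and pleasantness values are synthetic, the column standardization is an assumption, and the function names are hypothetical.

```python
import numpy as np

def fit_pc_linear_model(X, y, n_components=7):
    """Fit ordinary least squares on the first principal components of X."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0  # avoid dividing by zero for constant columns
    Z = (X - mu) / sd  # assumption: standardize descriptors before PCA
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    components = vt[:n_components].T  # shape: (n_descriptors, n_components)
    scores = Z @ components
    design = np.column_stack([np.ones(len(y)), scores])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return {"mu": mu, "sd": sd, "components": components, "coef": coef}

def predict(model, X):
    """Project new descriptors onto the stored components and apply the fit."""
    Z = (X - model["mu"]) / model["sd"]
    scores = Z @ model["components"]
    return model["coef"][0] + scores @ model["coef"][1:]

# Synthetic stand-in for a physicochemical descriptor matrix (60 x 20).
rng = np.random.default_rng(0)
latent = rng.normal(size=(60, 3))
loadings = rng.normal(size=(3, 20))
X = latent @ loadings + 0.05 * rng.normal(size=(60, 20))
y = 50 + 10 * latent[:, 0] + rng.normal(size=60)  # synthetic "pleasantness"

model = fit_pc_linear_model(X, y)
pred = predict(model, X)
r = np.corrcoef(pred, y)[0, 1]  # in-sample correlation, as a sanity check only
```

The correlation computed here is in-sample; comparing models fairly would require held-out molecules.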
[Figure panels, in source order: Khan et al., 2007; Kermen et al., 2011; Khan et al. (2007) Figure 5C; Khan Model Prediction; (R = 0.304, P < 0.03); Complexity Model (Kermen et al., 2011); Pleasantness; (R = 0.286, P < 0.001).]
• The intensity model relies on structural features to make predictions.
• The Boelens model predicts whether a molecule has an odor based on its volatility and lipophilicity [4].
• Previous studies predicting olfactory thresholds modeled air-to-receptor transport [5,6].
[Figure: logP plotted against Boiling Point (°C). Labeled molecules: ethylene glycol, water, glycerin, sorbitol, maltitol, caffeine, L-Arginine, L-Histidine, TNT, ethyl salicylate, ethane, propane, butane, methane, Krypton, acetone, ethyl mercaptan, ethanol, carbon monoxide, hexane, pentane. Group label: Alkanes. Chemical structure drawings omitted.]
[Figure: r value for CV and LB, and the difference (LB r value minus CV r value), for each of 21 attributes: INTENSITY, SWEET, VALENCE, FRUIT, CHEMICAL, BAKERY, GARLIC, DECAYED, BURNT, SOUR, FLOWER, SWEATY, ACID, MUSKY, FISH, COLD, SPICES, AMMONIA, WOOD, GRASS, WARM.]
Generate Predictive Models
• Features: we used physicochemical descriptors [3], Morgan fingerprints, and NSPDK fingerprints.
• Initial data cleaning: we removed non-informative variables and applied a cube-root transformation and normalization.
• Model building: we built predictive models using the Extra-Trees algorithm.
• Cross-validation: we performed 5-fold CV, repeated twice.
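The cleaning and cross-validation steps listed above can be sketched in plain Python. Several choices here are assumptions rather than details from the poster: "non-informative" is read as "constant column", "normalization" is read as z-scoring, the cube root is applied with its sign preserved, and all function names and the toy matrix are hypothetical. The Extra-Trees fitting step is left out.

```python
import math
import random
from statistics import mean, pstdev

def drop_constant_columns(rows):
    """Remove columns whose value never changes (one reading of 'non-informative')."""
    keep = [j for j in range(len(rows[0])) if len({row[j] for row in rows}) > 1]
    return [[row[j] for j in keep] for row in rows], keep

def signed_cube_root(x):
    """Cube root that also handles negative values."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)

def cube_root_and_normalize(rows):
    """Apply the cube-root transform, then z-score each column."""
    transformed = [[signed_cube_root(v) for v in row] for row in rows]
    columns = list(zip(*transformed))
    means = [mean(col) for col in columns]
    sds = [pstdev(col) or 1.0 for col in columns]  # guard against zero spread
    return [
        [(v - m) / s for v, m, s in zip(row, means, sds)]
        for row in transformed
    ]

def repeated_kfold(n_samples, n_splits=5, n_repeats=2, seed=0):
    """Yield (train_indices, test_indices) for each fold of each repeat."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        order = list(range(n_samples))
        rng.shuffle(order)  # a fresh shuffle for every repeat
        for k in range(n_splits):
            test = order[k::n_splits]
            held_out = set(test)
            train = [i for i in order if i not in held_out]
            yield train, test

# Hypothetical descriptor matrix: 4 molecules x 3 descriptors.
raw = [
    [8.0, 1.0, -27.0],
    [27.0, 1.0, 0.0],
    [64.0, 1.0, 1.0],
    [125.0, 1.0, 8.0],
]
filtered, kept_columns = drop_constant_columns(raw)
features = cube_root_and_normalize(filtered)
splits = list(repeated_kfold(n_samples=10))  # 5 folds x 2 repeats
```

In a full pipeline, a model would be fit on each training split and scored on the matching test split, and the scores averaged over all ten folds.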
[Figure: Importance of Similarity Descriptors for each attribute (BURNT, BAKERY, WOOD, WARM, SPICES, FRUIT, GRASS, CHEMICAL, SWEATY, SOUR, SWEET, ACID, AMMONIA, COLD, MUSKY, GARLIC, INTENSITY, FISH, FLOWER, DECAYED, VALENCE). Legend: Similarity Features; Physicochemical Features.]
[Figure: Variable Importance Score for X1v, X0sol, VvdwMG, Vx, MW, X1sol, ATS1p, X0v, AMR, and Dilution. Plot of r value vs. Number of Variables (0 to 200, and 6823) for CV and LB, with Test-Retest = 0.76.]
[Figure: Musks. Labeled structures: Moskone, Musk ketone, Musk xylene, Musk ambrette.]
[Figure: Variable Importance Score for WiA_B.m., NSPDK155708, SpMaxA_B.m., Ho_Dz.Z., Ho_Dz.m., NSPDK8767, NSPDK8768, NSPDK56642963, SpMax7_Bh.s., and NSPDK61229. Panel title: Odor Pleasantness. Annotations: NSPDK61229: Acetylvanillin; NSPDK155708: Ethyl vanillin acetate; 2D Structural Matrix: atomic mass, polarizability, charges, etc.]
[Figure: r value vs. Number of Variables (0 to 200, and 6823) for CV and LB, with Test-Retest = 0.71.]
• 20 molecules were rated twice by each subject, allowing us to calculate the test-retest correlation.
• The test-retest correlation sets the performance ceiling for predictive models.
• Averaged subject ratings are more reliable than individual subject ratings.
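A test-retest correlation of this kind can be computed as a Pearson correlation between the first and second ratings of the repeated molecules. The sketch below uses hypothetical ratings for two subjects and five molecules, chosen so that the subjects' rating noise partly cancels when averaged; it is not the poster's data.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def average_across_subjects(by_subject):
    """Average the ratings of all subjects, molecule by molecule."""
    return [sum(vals) / len(vals) for vals in zip(*by_subject.values())]

# Hypothetical ratings of the same five molecules on two presentations.
first = {"s1": [10, 40, 35, 80, 60], "s2": [30, 20, 55, 70, 90]}
second = {"s1": [25, 30, 50, 60, 75], "s2": [15, 45, 40, 90, 65]}

# Test-retest correlation for each individual subject.
per_subject_r = {s: pearson_r(first[s], second[s]) for s in first}

# Test-retest correlation of the subject-averaged ratings.
averaged_r = pearson_r(
    average_across_subjects(first),
    average_across_subjects(second),
)
```

In this toy example the averaged ratings give a higher test-retest correlation than either subject alone, which is the pattern the last bullet describes.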
[Figure: Test-Retest r value for Subchallenge 1 and Subchallenge 2, by attribute (FRUIT, SWEET, VALENCE, GARLIC, INTENSITY, BAKERY, DECAYED, CHEMICAL, AMMONIA, FLOWER, SOUR, ACID, SPICES, FISH, BURNT, MUSKY, WOOD, COLD, WARM, SWEATY, GRASS). Second panel: Best DREAM model (Subchallenge 2) compared with Test-Retest.]
1. Khan M.K., Luk C., Flinker A., Aggarwal A., Lapid H., Haddad R., Sobel N. (2007). Predicting Odor Pleasantness from Odorant Structure: Pleasantness as a Reflection of the Physical World. The Journal of Neuroscience 27(37): 10015-10023.
2. Kermen F., Chakirian A., Sezille C., Joussain P., Le Goff G., Ziessel A., Chastrette M., Mandairon N., Didier A., Rouby C., Bensafi M. (2011). Molecular complexity determines the number of olfactory notes and the pleasantness of smells. Scientific Reports. DOI: 10.1038/srep00206
3. Talete srl. DRAGON (Software for Molecular Descriptor Calculation), Version 6.0, 2012. http://www.talete.mi.it/
4. Boelens H. (1983). Structure-activity relationships in chemoreception by human olfaction. TIPS: 421-426.
5. Abraham M.H., Sanchez-Moreno R., Cometto-Muniz J.E., Cain W.S. (2012). An Algorithm for 353 Odor Detection Thresholds in Humans. Chem. Senses 37: 207-218.
6. Hau K.M., Connell D.W. (1998). Quantitative Structure-Activity Relationships (QSARs) for Odor Thresholds of Volatile Organic Compounds (VOCs). Indoor Air 8: 23-33.
[Figure: example template features for four descriptors]
Bakery - Vanillin
• NSPDK539829: Vanillin Isobutyrate
• Morgan8467: Ethylvanillin
• NSPDK1183: Vanillin
Burnt - Furfuryl
• NSPDK61660: S-Furfuryl thioacetate (sulfurous)
• NSPDK62131: Furfuryl Methyl Disulfide (alliaceous)
• NSPDK7363: Furfuryl Mercaptan (coffee)
Decayed - Thiol
• NSPDK62444: S-Methyl thiobutyrate (cheesy)
• NSPDK13582: 4-Butyrothiolactone (garlic)
• NSPDK7969: Benzenethiol (meaty)
Chemical - Phenol
• morgan6997: 2-Ethylphenol
• morgan18515: 2-Butylphenol
• morgan6943: 2-Isopropylphenol
Conclusion
• Averaged subject ratings are reliable for both odor intensity and pleasantness. Most of the perceptual descriptors show large within- and across-subject variance.
• The intensity model performed comparably to the test-retest correlation, while the pleasantness model performed worse than the test-retest correlation.
• Models for the 19 descriptors performed worse than the intensity and pleasantness models.
• Some models rely on physicochemical features, whereas others use molecular template matching.
[Panel titles: Literature Cited; Predicting Odor from Structure]