Chemoinformatics

Masaaki Kotera,
Laboratory of Chemical Life Science,
Bioinformatics Centre, Institute for Chemical Research,
Kyoto University.

Introduction
• Genes
• Proteins
• Metabolic compounds
• Pathways
• Functions
• Bioinformatics
• Drugs
• Systems biology
• Chemoinformatics

Reference pathways

• Combined pathways from many organisms

Organism-specific pathways

• Reconstructed from many lines of evidence such as genome, metabolome, …

KEGG global map

http://www.genome.jp/kegg/pathway.html

KEGG global map of human

Human metabolome

Human gut meta-metabolome


Why bother with chemical structures? ①

http://chemgarden.littlestar.jp/

• Metabolic network is "small world".
– Jeong et al., Nature, 2000.
– Fell and Wagner, Nature Metabolic Engineering, 2000.
• No, it is not.
– Ma and Zeng, Bioinformatics, 2003.
– Arita, PNAS, 2004.

Large-scale organization of metabolic networks

Metabolic world is not small

An example in which the EC number changes because the domain composition differs, even though the reaction type is the same

Why bother with chemical structures? ②


IUBMB's Enzyme List (EC numbers)

IUBMB = International Union of Biochemistry and Molecular Biology

• Class: 1. oxidoreductases, 2. transferases, 3. hydrolases, 4. lyases, 5. isomerases, 6. ligases
• Subclass: 1.1, 1.2, 1.3, 1.4, 1.5, …
• Sub-subclass: 1.3.1, 1.3.2, 1.3.3, 1.3.5, …
• Complete EC number: 1.3.1.1, 1.3.1.2, 1.3.1.3, …, 1.3.1.69 zeatin reductase
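The four-level EC hierarchy can be handled as a dotted prefix structure. The following is a minimal sketch, not an official IUBMB tool; the names `ECNumber`, `parse_ec`, `ec_prefixes`, and `shared_depth` are hypothetical helpers introduced only for illustration.

```python
from typing import List, NamedTuple, Optional

# Top-level EC classes as listed on the slide.
EC_CLASSES = {
    1: "oxidoreductases",
    2: "transferases",
    3: "hydrolases",
    4: "lyases",
    5: "isomerases",
    6: "ligases",
}


class ECNumber(NamedTuple):
    ec_class: int
    subclass: Optional[int]
    sub_subclass: Optional[int]
    serial: Optional[int]


def parse_ec(text: str) -> ECNumber:
    """Parse '1.3.1.69' (or a shorter prefix such as '1.3') into its levels."""
    parts = text.strip().split(".")
    if not 1 <= len(parts) <= 4:
        raise ValueError(f"not an EC number: {text!r}")
    numbers: List[Optional[int]] = [int(p) for p in parts]
    numbers += [None] * (4 - len(numbers))  # pad missing levels with None
    return ECNumber(*numbers)


def ec_prefixes(text: str) -> List[str]:
    """Return class, subclass, sub-subclass and full number as strings."""
    parts = text.strip().split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def shared_depth(a: str, b: str) -> int:
    """Number of leading hierarchy levels two EC numbers have in common."""
    depth = 0
    for x, y in zip(a.split("."), b.split(".")):
        if x != y:
            break
        depth += 1
    return depth
```

For example, `shared_depth("1.3.1.69", "1.3.1.2")` is 3, meaning the two enzymes agree down to the sub-subclass level.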

EC classification criteria

• 1. Oxidoreductases
– Subclass: functional groups of reductants
– Sub-subclass: oxidants
– Remarks: Which compounds are reductants, or oxidants?
• 2. Transferases
– Subclass: transferred groups
– Sub-subclass: transferred groups in detail
– Remarks: From where to where?
• 3. Hydrolases
– Subclass: hydrolyzed bond
– Sub-subclass: hydrolyzed bond in detail
– Remarks: Nucleases and peptidases are classified in much more detail.
• 4. Lyases
– Subclass: cleaved bond
– Sub-subclass: types of products
– Remarks: Some hydrolase-like reactions
• 5. Isomerases
– Subclass: types of isomerization (RS, EZ, redox, transfer, elimination)
– Sub-subclass: types of reacting bonds, or products
– Remarks: Any unimolecular reaction.
• 6. Ligases
– Subclass: generated bond
– Sub-subclass: types of substrate
– Remarks: Multi-step reactions

SCOPEC

Figure labels:
• Thioesterase domain of polypeptide, polyketide and fatty acid synthases
• Aspartate aminotransferase-like domain
• NAD(P)-binding Rossmann-fold domain
• Phosphotransferases on alcohol groups
• Hydro-lyase
• (Trans)glycosidases
• Trypsin-like serine proteases
• Alkyl or aryl transferase
• Alcohol dehydrogenase using NAD(P)+
• P-loop containing nucleoside triphosphate hydrolases
• O- or S-glycosidases

What is meant by being "similar"?

• Enzyme "proteins" are similar

            Sequence      3D
Globally    Full-length   Fold
Locally     Motif         Cavity

• Enzyme "reactions" are similar

            Reaction      Substrates
Globally    ?             ?
Locally     ?             ?

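Why bother with chemical structures?

① We need to consider the flow of metabolic pathways.
② We need to distinguish the similarity of enzyme "proteins" from the similarity of enzyme "reactions".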

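Genome annotation from a chemical point of view

• KEGG Orthology (KO): sequence similarity groups (genes to reactions)
– Organism #1: gene G, enzyme E, reaction R
– Organism #2: gene G', enzyme E', reaction R
– Genes are similar, enzymes are similar, reactions are identical
• Reaction Class (RC): reaction similarity groups (reactions to genes)
– Gene G, enzyme E, reaction R and gene G', enzyme E', reaction R'
– Genes are similar, enzymes are similar, reactions are similar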

Basics of chemoinformatics

• Chemical data storage and retrieval
– Chemical file formats (SMILES, Molfiles, etc.)
– Chemical databases
• Virtual screening
• Quantitative structure-activity relationship (QSAR)

http://www.amazon.co.jp/dp/4621075527

Chemical structure formats

• Chemical line notations
– SMILES … Simplified molecular input line entry specification
– SLN … SYBYL Line Notation
– InChI … The IUPAC International Chemical Identifier
• Chemical table files
– Molfiles, SDF
– KCF … KEGG Chemical Function
– Protein Data Bank Format
• Chemical XML
– CML … Chemical Markup Language
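To make the difference between a line notation and a table file concrete, here is a minimal sketch that writes and reads the atom and bond tables of a V2000 Molfile for ethanol (SMILES `CCO`). It assumes the fixed-column V2000 layout and ignores everything beyond element symbols and bond orders; `build_molfile` and `parse_molfile` are hypothetical helper names, and the 2D coordinates are arbitrary.

```python
def build_molfile(name, atoms, bonds):
    """atoms: list of (symbol, x, y); bonds: list of (from, to, order), 1-based."""
    lines = [name, "  sketch", ""]  # three header lines
    # Counts line: atom count and bond count occupy the first two 3-char fields.
    lines.append(f"{len(atoms):3d}{len(bonds):3d}  0  0  0  0  0  0  0  0999 V2000")
    for symbol, x, y in atoms:
        # x, y, z in 10-char fields, one space, then a 3-char element symbol.
        lines.append(
            f"{x:10.4f}{y:10.4f}{0.0:10.4f} {symbol:<3}"
            " 0  0  0  0  0  0  0  0  0  0  0  0"
        )
    for a, b, order in bonds:
        lines.append(f"{a:3d}{b:3d}{order:3d}  0  0  0  0")
    lines.append("M  END")
    return "\n".join(lines) + "\n"


def parse_molfile(text):
    """Return (element symbols, bonds) read from a V2000 Molfile string."""
    lines = text.splitlines()
    counts = lines[3]
    n_atoms, n_bonds = int(counts[0:3]), int(counts[3:6])
    atom_lines = lines[4 : 4 + n_atoms]
    bond_lines = lines[4 + n_atoms : 4 + n_atoms + n_bonds]
    symbols = [line[31:34].strip() for line in atom_lines]
    bonds = [(int(l[0:3]), int(l[3:6]), int(l[6:9])) for l in bond_lines]
    return symbols, bonds


# Ethanol: the same molecule that SMILES writes as "CCO".
ETHANOL = build_molfile(
    "ethanol",
    atoms=[("C", 0.0, 0.0), ("C", 1.5, 0.0), ("O", 2.25, 1.3)],
    bonds=[(1, 2, 1), (2, 3, 1)],
)
```

The SMILES string carries only connectivity, while the Molfile additionally stores coordinates and an explicit bond table, which is why drawing tools and search services typically exchange Molfiles or SDF.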

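IUPAC nomenclature
• International Union of Pure and Applied Chemistry
• Basic rule: find the longest straight carbon chain and name the compound with the corresponding numerical prefix.
• A little more detail:
– http://kusuri-jouhou.com/chemistry/iupac.html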

Morgan algorithm
http://www.amazon.co.jp/dp/4621075527
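The core of the Morgan algorithm is an iterative relabelling of atoms by the sum of their neighbours' values. The sketch below covers only that extended-connectivity step, assuming a hydrogen-suppressed graph given as an adjacency dictionary; the tie-breaking and final canonical numbering of the full algorithm are omitted, and `morgan_values` is a hypothetical name.

```python
def morgan_values(adjacency):
    """Extended-connectivity values for a molecular graph.

    adjacency maps each atom index to the list of its neighbours.
    Start from the degree of each atom, then repeatedly replace every
    value by the sum of the neighbours' values. Stop when an iteration
    no longer increases the number of distinct values, and return the
    values from the last iteration that did.
    """
    values = {atom: len(nbrs) for atom, nbrs in adjacency.items()}
    n_classes = len(set(values.values()))
    while True:
        new_values = {
            atom: sum(values[n] for n in nbrs) for atom, nbrs in adjacency.items()
        }
        new_classes = len(set(new_values.values()))
        if new_classes <= n_classes:
            return values
        values, n_classes = new_values, new_classes


# Carbon skeleton of n-pentane: 0-1-2-3-4
PENTANE = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
```

For n-pentane the degrees (1, 2, 2, 2, 1) give two classes; one summation step gives (2, 3, 4, 3, 2) with three classes, and a further step would fall back to two, so the second set of values is kept. Symmetry-equivalent atoms (0 and 4, 1 and 3) receive the same value.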

Representing compound structures in the KEGG Chemical Function (KCF) format
(Hattori et al., 2003)

Chemical structure fingerprints
Φ(C) = ( 0, 1, 0, 1, 1, 0, 1, 0, 0, … )
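A fingerprint Φ(C) is a binary vector whose k-th element records whether the k-th predefined feature occurs in compound C. The sketch below builds such a vector from a set of feature names and compares two vectors with the Tanimoto coefficient. The Tanimoto coefficient is an assumption here (a widely used measure, not one named on the slide), and the feature dictionary is purely illustrative.

```python
def fingerprint(present_features, dictionary):
    """Binary vector: 1 where a dictionary feature occurs in the compound."""
    return [1 if feature in present_features else 0 for feature in dictionary]


def tanimoto(fp_a, fp_b):
    """Bits set in both vectors divided by bits set in at least one."""
    if len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for a, b in zip(fp_a, fp_b) if a and b)
    either = sum(1 for a, b in zip(fp_a, fp_b) if a or b)
    # Convention chosen for this sketch: two all-zero vectors score 0.0.
    return both / either if either else 0.0


# Hypothetical substructure dictionary, for illustration only.
DICTIONARY = ["hydroxyl", "carboxyl", "amine", "aromatic ring", "phosphate"]

fp_lactate = fingerprint({"hydroxyl", "carboxyl"}, DICTIONARY)
fp_alanine = fingerprint({"carboxyl", "amine"}, DICTIONARY)
```

Real fingerprints such as MACCS or PubChem use fixed, published feature dictionaries computed by a toolkit; only the comparison step is shown here.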

Representative fingerprints

Fingerprint                      Dimension
CDK fingerprint                  1024
CDK extended fingerprint         1022
CDK graph-only fingerprint       1024
CDK hybridization fingerprint    1024
E-state fingerprint              71
Klekota-Roth fingerprint         4860
MACCS fingerprint                164
PubChem fingerprint              879

OpenBabel

• Free software mainly used for converting chemical file formats.
• Available for Windows, Unix, and Mac OS.
• Distributed under the GNU GPL.
• http://openbabel.org/
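File-format conversion with Open Babel is usually done through its `obabel` command-line program. The following is a minimal sketch of calling it from Python, assuming `obabel` is installed and on the PATH; the wrapper names `obabel_command` and `convert` and the file names are hypothetical. Only the basic `obabel <infile> -O <outfile>` form, with optional `-i<format>` and `-o<format>`, is used.

```python
import shutil
import subprocess
from typing import List, Optional


def obabel_command(
    infile: str,
    outfile: str,
    in_format: Optional[str] = None,
    out_format: Optional[str] = None,
) -> List[str]:
    """Build the argument list for an obabel conversion.

    When no format is given, obabel infers it from the file extension.
    """
    cmd = ["obabel"]
    if in_format:
        cmd.append(f"-i{in_format}")
    cmd.append(infile)
    if out_format:
        cmd.append(f"-o{out_format}")
    cmd += ["-O", outfile]
    return cmd


def convert(infile: str, outfile: str, **formats: Optional[str]) -> None:
    """Run the conversion; raises if obabel is missing or exits with an error."""
    if shutil.which("obabel") is None:
        raise RuntimeError("obabel was not found on PATH")
    subprocess.run(obabel_command(infile, outfile, **formats), check=True)
```

A call such as `convert("compound.smi", "compound.mol")` would then turn a SMILES file into a Molfile, provided Open Babel is available on the machine.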

http://cdk.github.io/cdk/1.4/docs/api/
The quickest way to find out what tools CDK provides is to follow the link to the javadoc from the CDK site.

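ChemAxon's Marvin View
• Marvin View reads files in various structure formats such as SD files and SMILES, and can display them in table or grid layouts.
• Marvin View can be used as a stand-alone structure browsing tool, and it can also be embedded in web applications or custom applications.
• With the simple renderer, tens of thousands of molecules can be displayed quickly in tiled or tabular form. Property fields, molecule names, IUPAC names, and SMILES can also be displayed.
• (The free version is reasonably usable as well.)
• http://chemaxon.jp/wp/product/marvin-view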

Compound databases
• KEGG COMPOUND
– http://www.genome.jp/kegg/compound/
• ChEBI
– http://www.ebi.ac.uk/chebi/
• PubChem
– http://pubchem.ncbi.nlm.nih.gov/
• ChemSpider
– http://www.chemspider.com/
• ZINC
– http://zinc.docking.org/
• Chemical Substances Database
– http://www.saglasie.com/tr/chemical/

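KEGG COMPOUND (pronounced "keg compound")
Compounds can be searched by the functional classification of biomolecules (searching by molecule name is of course also possible).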

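PubChem (pronounced "pub-chem")

http://pubchem.ncbi.nlm.nih.gov/
Molecules can be searched by molecular weight, number of chiral atoms, element types, and so on (searching by molecule name is of course also possible).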

PubChem

Clicking "SDF" gives you the compound structure data in SDF format
(SDF = an extended form of the mol file).
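Since an SDF file is a series of mol blocks, each optionally followed by named data items and terminated by a `$$$$` line, it can be split with plain string handling. This is a minimal sketch under that assumption; `split_sdf`, `molblock`, and `data_items` are hypothetical helper names, and the two-record sample is made up for illustration rather than downloaded from PubChem.

```python
SAMPLE_SDF = """\
water
  sketch

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
M  END
> <NAME>
water

$$$$
methane
  sketch

  1  0  0  0  0  0  0  0  0  0999 V2000
    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
M  END
> <NAME>
methane

$$$$
"""


def split_sdf(text):
    """Split SDF text into records; each record ends at a '$$$$' line."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current) + "\n")
            current = []
        else:
            current.append(line)
    return records


def molblock(record):
    """The mol-file part of a record, up to and including 'M  END'."""
    head, sep, _ = record.partition("M  END")
    return head + sep + "\n"


def data_items(record):
    """Named data items ('> <NAME>' followed by value lines) after 'M  END'."""
    _, _, tail = record.partition("M  END")
    items, name = {}, None
    for line in tail.splitlines():
        if line.startswith(">") and "<" in line:
            name = line[line.index("<") + 1 : line.rindex(">")]
            items[name] = []
        elif not line.strip():
            name = None  # a blank line closes the current data item
        elif name is not None:
            items[name].append(line)
    return {key: "\n".join(value) for key, value in items.items()}
```

Each string returned by `molblock` can be saved as a separate `.mol` file and pasted into tools that accept a single structure.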

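ChEBI (pronounced "kebi" or "chebi")

http://www.ebi.ac.uk/chebi/
In addition to molecular weight, charge, and so on, molecules can be searched with a molecule drawing tool (searching by molecule name is of course also possible).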

ChEBI

Clicking here downloads the structure data of several compounds together in SDF format
(SDF = an extended form of the mol file).

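MassBank

• Similar-spectrum search for MSn
• Quick spectrum search by keyword
• Search by ion, neutral loss, or molecular formula
• Search by chemical substructure
• MSn spectrum search using neutral losses
• Spectrum viewing and comparison tool
• Batch similar-spectrum search service for MSn
• Spectrum browsing
• Display of spectra by category
• Access to MassBank through the Web API
Excerpted from http://www.massbank.jp/ja/database.html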

MS and NMR databases
• MassBank
– http://www.massbank.jp/ - MS
• Human Metabolome Database
– http://www.hmdb.ca/ - both MS and NMR
• Biological Magnetic Resonance Data Bank
– http://www.bmrb.wisc.edu/structgen/ - NMR
• CHENOMX
– http://www.chenomx.com/ - MS, commercial
• METLIN
– http://metlin.scripps.edu/ - MS
• Fiehn Lib
– http://fiehnlab.ucdavis.edu/Metabolite-Library-2007/ - MS

Recent research

Supervised de novo reconstruction of metabolic pathways from metabolome-scale compound sets
Bioinformatics, 29, i135-144 (2013).

KCF-S: KEGG Chemical Function and Substructure for improved interpretability and prediction in chemical bioinformatics
BMC Systems Biology, in press (2013).

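De novo metabolic pathway prediction
• Intermediate prediction, or reaction-step prediction?
• Dependent on chemical structure transformation rules, or independent of them?
• Targeted, or untargeted?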

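"Enzymatic-reaction likeness"
Is this compound pair "enzymatic-reaction-like" or not?
What about this compound pair?
Reaction-step prediction can be regarded as the problem of deciding whether a compound pair is "enzymatic-reaction-like".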

Feature vector of a compound pair
• Φ(C), Φ(C'): fingerprints of the two compounds
• Differential features
• Common features
where I(E) is an indicator function that returns 1 if E is true and 0 otherwise.
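One way to read the differential and common features is sketched below. The exact definitions are an assumption, since only the names and the indicator function I(E) are given: a differential bit is taken as I(the feature occurs in exactly one of the two compounds) and a common bit as I(the feature occurs in both). The function names are hypothetical.

```python
def differential_features(fp_c, fp_c2):
    """Assumed definition: 1 where exactly one compound has the feature."""
    _check(fp_c, fp_c2)
    return [1 if a != b else 0 for a, b in zip(fp_c, fp_c2)]


def common_features(fp_c, fp_c2):
    """Assumed definition: 1 where both compounds have the feature."""
    _check(fp_c, fp_c2)
    return [1 if a == 1 and b == 1 else 0 for a, b in zip(fp_c, fp_c2)]


def pair_vector(fp_c, fp_c2, use_common=True):
    """Concatenate differential and (optionally) common features.

    use_common=True corresponds to a 'diff-common' representation,
    use_common=False to 'diff-only'.
    """
    diff = differential_features(fp_c, fp_c2)
    return diff + common_features(fp_c, fp_c2) if use_common else diff


def _check(fp_c, fp_c2):
    if len(fp_c) != len(fp_c2):
        raise ValueError("fingerprints must have the same length")


phi_c = [0, 1, 0, 1, 1]
phi_c2 = [0, 1, 1, 1, 0]
```

Both feature types are symmetric in C and C', so the pair vector does not depend on which compound is treated as substrate and which as product.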

L2-regularized linear SVM for compound pairs (L2SVM)
Given …, we solve the following optimization problem: …
where … +1 … -1
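For orientation, a common textbook form of the L2-regularized linear SVM is written out below. It is an assumption that this matches the slide's formulation: the hinge loss, the cost parameter C, and the symbols x_i (pair feature vector), y_i (label), w (weight vector), and N (number of training pairs) are introduced here, and the original may differ, for instance by using a squared hinge loss or a bias term.

```latex
\min_{\mathbf{w}} \;\; \frac{1}{2}\,\lVert \mathbf{w} \rVert_2^2
  \;+\; C \sum_{i=1}^{N} \max\bigl(0,\; 1 - y_i\,\mathbf{w}^{\top}\mathbf{x}_i\bigr),
\qquad y_i \in \{+1, -1\}
```

Under this reading, a new compound pair with feature vector x is scored by w^T x, and a larger score means the pair looks more "enzymatic-reaction-like".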

Five-fold cross-validation (AUC scores)

                     Diff-common        Diff-only
Fingerprint          L1SVM    L2SVM     L1SVM    L2SVM     BASELINE   RANDOM
CDK                  0.957    0.942     0.958    0.943     0.873      0.500
CDK extended         0.960    0.945     0.960    0.946     0.876      0.500
CDK graph only       0.938    0.921     0.941    0.923     0.823      0.500
CDK hybridization    0.951    0.935     0.952    0.936     0.826      0.500
E-state              0.817    0.777     0.817    0.778     0.719      0.500
Klekota Roth         0.951    0.935     0.952    0.936     0.854      0.500
MACCS                0.909    0.902     0.908    0.902     0.799      0.500
PubChem              0.952    0.947     0.954    0.925     0.871      0.500

Baseline: a control experiment in which compound pairs with high chemical structure similarity were judged to be "enzymatic-reaction-like".
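The evaluation procedure can be sketched in a few lines: split the labelled compound pairs into five folds, score the held-out pairs, and compute the AUC. The code below is a minimal, library-free sketch and not the pipeline behind the table; `auc` and `k_fold_indices` are hypothetical names, and the AUC is computed through its rank interpretation (the fraction of positive/negative pairs in which the positive example scores higher, with ties counted as one half).

```python
import random


def auc(scores, labels):
    """Area under the ROC curve for labels in {+1, -1}."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == -1]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def k_fold_indices(n_examples, k=5, seed=0):
    """Shuffle the example indices and deal them into k disjoint folds."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]
```

In each round one fold serves as the test set and the remaining four as training data. A random scorer yields an AUC near 0.5, which is the RANDOM column of the table.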

KCF-S = an extension of KCF

Generates the "substructure fragments" that are familiar to biochemists

Exercises

1. Try KegDraw
• Compound drawing tool
2. Try SIMCOMP
• Search tool for structurally similar compounds
3. PathPred
• Prediction tool for compound metabolism

SIMCOMP
1. Go to http://www.genome.jp/tools/simcomp/
2. Paste the data and click "View Structure"
• Sample data: the enshu*.mol files at http://web.kuicr.kyoto-u.ac.jp/~kot/enshu/
3. Then click "Compute"
4. When the results appear, select "Map to Pathway" or "Map to BRITE" and click "Exec"
• Consider which class the input compound belongs to and in which pathway it is likely to be synthesized.

Unknown molfile

Once you have a mol file or an SDF file ...

SIMCOMP search

... you can run a similarity search with SIMCOMP or SUBCOMP ...

... and map the compounds that come up as hits.

KegDraw
1. Go to http://www.genome.jp/download/
2. Under KegDraw 0.1.11beta, click "Mac OSX [dmg]" to download it
3. Load the data
• Sample data: the enshu*.mol files at http://web.kuicr.kyoto-u.ac.jp/~kot/enshu/
4. Try deleting bonds or drawing new ones
5. Then choose "Search Structure" → "SIMCOMP" from the "Tools" menu
• The rest is the same as before.

KegArray

KegDraw is a drawing tool for compound and glycan structures, and it also supports SIMCOMP searches.

PathPred
1. Go to http://www.genome.jp/tools/pathpred/
2. Check "Biosynthesis of Secondary Metabolites (plants)" and click "Next"
3. Paste the data and click "View Structure"
• Sample data: the enshu*.mol files at http://web.kuicr.kyoto-u.ac.jp/~kot/enshu/
4. Then click "Compute"
5. When the results appear, click <show all path> near the middle of the page
• Compounds whose IDs start with CX are predicted intermediates; compounds whose IDs start with C are registered in KEGG (that is, known to exist).
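When reading a predicted path, the ID prefix rule above can be applied mechanically. This is a minimal sketch based only on that prefix rule; `classify_compound_id` is a hypothetical helper, the ID `CX0001` is a made-up example of a predicted intermediate, and no assumption is made about the exact ID format beyond the leading letters.

```python
def classify_compound_id(compound_id):
    """Classify a compound ID from a PathPred result by its prefix."""
    # "CX" must be tested first, because such IDs also start with "C".
    if compound_id.startswith("CX"):
        return "predicted intermediate"
    if compound_id.startswith("C"):
        return "KEGG compound"
    return "unknown"


def summarize_path(compound_ids):
    """Count how many compounds on a path are known and how many are predicted."""
    summary = {"KEGG compound": 0, "predicted intermediate": 0, "unknown": 0}
    for compound_id in compound_ids:
        summary[classify_compound_id(compound_id)] += 1
    return summary
```

A path with many predicted intermediates rests on more hypothetical steps than one that passes mostly through compounds already registered in KEGG.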

PathPred

If a compound is not on any pathway, it may be worth running it through the pathway prediction program PathPred.

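Report assignment
① Create a Molfile for compound X.
② Create a SMILES string for compound X.
③ Does the same compound exist in the KEGG database?
– If it does, give the KEGG ID of that compound.
– If it does not, give the KEGG ID of the most similar compound. In addition, assuming that compound X is biosynthesized from that most similar compound, discuss what kind of enzymatic reaction is likely to take place.

Compress the Molfile, the SMILES string, and your answer to ③ into a single archive and e-mail it to the address below.

Submit to: kyomu@bic.kyoto-u.ac.jp  Deadline: Monday, December 2, 2013
Compound X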
