Using semiempirical methods for fast and automated predictions
Feel free to tweet, record, …
#WATOC
@janhjensen
Jan H. Jensen, University of Copenhagen
Fast and Accurate Prediction of the Regioselectivity of Electrophilic Aromatic Substitution Reactions of Heteroaromatic Systems
Jimmy C. Kromann and Jan H. Jensen, University of Copenhagen
Morten Jørgensen, Monika Kruszyk, Mikkel Jessing, Lundbeck A/S
Contact me for a preprint
[Scheme: N-heteroaromatic substrate + Br+, -H+ -> which of the possible bromo regioisomers forms?]
Streitwieser: proton affinity (PA) correlates with reaction rate,
i.e. the reaction occurs at the site with the highest PA
[Figure: protonated isomers of the substrate with their PM3/COSMO heats of formation: 169.4, 182.2, and 181.1 kcal/mol]
90% success rate for 520 compounds
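All protonated isomers share the same neutral parent, so the isomer with the lowest heat of formation corresponds to the site with the highest PA. A minimal sketch of that ranking step, using the three values from the slide; the site labels (`site_a` etc.) and the function name are hypothetical, since the slide does not say which value belongs to which atom:

```python
def rank_sites(heats_of_formation):
    """Rank protonation sites by heat of formation relative to the lowest one.

    heats_of_formation maps a site label to the heat of formation (kcal/mol)
    of the corresponding protonated isomer. The lowest value marks the site
    with the highest proton affinity.
    """
    lowest = min(heats_of_formation.values())
    relative = {site: round(h - lowest, 1) for site, h in heats_of_formation.items()}
    return sorted(relative.items(), key=lambda item: item[1])


# Values from the slide; the site labels are placeholders (assumption).
heats = {"site_a": 169.4, "site_b": 182.2, "site_c": 181.1}
ranking = rank_sites(heats)
predicted_site = ranking[0][0]
print(predicted_site, ranking)
```

Only the energy differences between isomers matter here, which is why the absolute PA never has to be computed.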
Workflow / Automation
Molecule -> Protonated isomers -> Conformational search -> Find lowest energy isomer -> Display result
Check for proton transfer
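The slide does not say how the proton-transfer check is done. One possible way, sketched here purely as an assumption, is a geometric test after optimization: if the added proton is now closest to a heavy atom other than the one it was placed on, the isomer is flagged. The function names and the coordinates are hypothetical.

```python
import math


def nearest_heavy_atom(h_xyz, heavy_atoms):
    """Return the index of the heavy atom closest to the added proton."""
    return min(heavy_atoms, key=lambda idx: math.dist(h_xyz, heavy_atoms[idx]))


def proton_transferred(h_xyz, heavy_atoms, intended_idx):
    """True if the proton ended up nearest to an atom other than the intended one."""
    return nearest_heavy_atom(h_xyz, heavy_atoms) != intended_idx


# Hypothetical optimized coordinates in Angstrom: atom index -> (x, y, z).
heavy = {0: (0.0, 0.0, 0.0), 1: (2.5, 0.0, 0.0)}

stayed = proton_transferred((0.3, 1.0, 0.0), heavy, intended_idx=0)
moved = proton_transferred((2.4, 1.0, 0.0), heavy, intended_idx=0)
print(stayed, moved)
```

A flagged isomer would then be excluded before the lowest-energy isomer is selected, since its energy no longer describes the intended protonation site.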
ChemDraw / RDKit
SMILES: c1cnc(cc1c1n(c(c(n1)c1ccc(cc1Cl)Cl)C(=O)OC)COCC[Si](C)(C)C)NC(=O)C
[Figure: 2D structure of the example molecule with candidate sites numbered 1-6, labelled RegioSQM]
6 isomers x 20 conformers
Web server (SMILES input): regiosqm.org
Lessons learned – Asking the right questions
[Figure: protonated isomers with PM3/COSMO heats of formation of 169.4, 182.2, and 181.1 kcal/mol]
“Which atom has the highest PA?” instead of “What is the PA?”
“What is the relative pKa?” instead of “What is the pKa?”
Jensen, Swain, Olsen JPCA 2017 (10.1021/acs.jpca.6b10990)
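Asking for a relative pKa can be sketched as follows: the pKa is obtained from the free energy change of proton exchange with a reference compound of known pKa, BH+ + B_ref -> B + B_refH+, so that systematic errors largely cancel. The function name and the numbers in the usage lines are illustrative assumptions, not values from the talk.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def relative_pka(delta_g, pka_ref, temperature=298.15):
    """pKa of BH+ relative to a reference acid B_refH+.

    delta_g is the free energy change (kcal/mol) of
    BH+ + B_ref -> B + B_refH+, and pka_ref is the known pKa of B_refH+.
    """
    return pka_ref + delta_g / (R_KCAL * temperature * math.log(10))


# Hypothetical inputs: an exchange free energy of 1.364 kcal/mol
# shifts the pKa by about one unit at 298.15 K.
pka = relative_pka(delta_g=1.364, pka_ref=10.0)
print(round(pka, 2))
```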
Lessons learned – Newer SQM methods not more accurate
[Figure: pKa prediction accuracy for PM3, AM1, PM6, PM7, and PM6-DH+]
Accuracy of solvation energy may play a role
COSMO errors for the MNSOL database (kcal/mol)
      AM1   PM3   PM6
MAE   3.6   3.4   3.9
ME    0.0   0.5   1.8
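The two rows of the table measure different things: the mean absolute error (MAE) measures the typical size of the error, while the mean error (ME) measures systematic bias. A short sketch with made-up numbers (not MNSOL data) shows how a method can have an ME of zero and still a large MAE, as AM1 does in the table:

```python
def mean_error(predicted, reference):
    """Mean signed error: positive and negative errors cancel."""
    errors = [p - r for p, r in zip(predicted, reference)]
    return sum(errors) / len(errors)


def mean_absolute_error(predicted, reference):
    """Mean absolute error: the typical size of the error, regardless of sign."""
    errors = [abs(p - r) for p, r in zip(predicted, reference)]
    return sum(errors) / len(errors)


# Hypothetical solvation free energies in kcal/mol (illustration only).
reference = [-5.0, -5.0]
predicted = [-2.0, -8.0]  # errors of +3.0 and -3.0

me = mean_error(predicted, reference)
mae = mean_absolute_error(predicted, reference)
print(me, mae)
```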
Lessons learned – Learn cheminformatics
All aspects of the calculation must be automated, including “chemical transformation” and analysis
Cheminformatics toolkits, e.g. RDKit, are incredibly useful for “chemical transformation” problems
Datasets must include SMILES. XYZ -> SMILES is an unsolved problem.
[Figure: molecule with annotations “Protonate here” and “But not here”]
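Deciding where to protonate and where not to is a substructure-matching problem. A minimal sketch with RDKit, assuming that the candidate sites are the aromatic carbons carrying one hydrogen; the SMARTS pattern `[c;H1]` and the function name are assumptions for illustration and not necessarily what RegioSQM uses. The SMILES is the example molecule from the workflow slide.

```python
from rdkit import Chem

# Example molecule from the workflow slide.
SMILES = "c1cnc(cc1c1n(c(c(n1)c1ccc(cc1Cl)Cl)C(=O)OC)COCC[Si](C)(C)C)NC(=O)C"


def aromatic_ch_sites(smiles):
    """Return the atom indices of aromatic carbons with exactly one hydrogen."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"Could not parse SMILES: {smiles}")
    pattern = Chem.MolFromSmarts("[c;H1]")  # assumed definition of a candidate site
    return [match[0] for match in mol.GetSubstructMatches(pattern)]


sites = aromatic_ch_sites(SMILES)
print(len(sites), sites)
```

For this molecule the pattern matches six atoms, which agrees with the six isomers quoted for it on the workflow slide; substituted ring carbons and heteroatoms are skipped.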
Lessons learned - Cheminformatics
Creating intermediates/products is as easy as creating protonation states
Setup for TS search?
Must solve the atom mapping problem
Lessons learned – Use by non-experts
Automation ✔
Interface ✔ (SMILES input/HTML)
Installation ? (MOPAC, Python, RDKit, Open Babel)
Parallel computing/queueing ?
[Figure: 2D structure of the example molecule]
~100 geometry optimizations -> 100-200 minutes on 1 core
-> 4-8 minutes on 24 cores
Options
Webserver/dedicated node
Pay-per-use cloud computing
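The timings above follow from simple arithmetic, assuming the optimizations are independent and spread evenly over the cores. The per-job time of 1-2 minutes is inferred from 100-200 minutes for ~100 optimizations; the function name is hypothetical.

```python
def ideal_wall_time(n_jobs, minutes_per_job, n_cores):
    """Wall time in minutes, assuming independent jobs and perfect load balancing."""
    return n_jobs * minutes_per_job / n_cores


# ~100 geometry optimizations at an assumed 1-2 minutes each.
fast = ideal_wall_time(n_jobs=100, minutes_per_job=1.0, n_cores=24)
slow = ideal_wall_time(n_jobs=100, minutes_per_job=2.0, n_cores=24)
print(round(fast, 1), round(slow, 1))
```

In practice the jobs run in batches and vary in length, so real wall times will be somewhat longer than this ideal estimate.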
Thank you