Ashutosh (Ash) Jogalekar
(http://wavefunction.fieldofscience.com)
About me
•  Medicinal and computational chemist working in the biopharma
industry in Cambridge, MA.
•  Blogger “The Curious Wavefunction”
•  Contact:
- Blog: http://wavefunction.fieldofscience.com
- Twitter: @curiouswavefn
- Email: curiouswavefunction@gmail.com
Two kinds of scientific revolutions
•  Idea-driven (Kuhn): physics (quantum theory), astronomy
(expanding universe), biology (evolution)
•  Tool-driven (Galison): engineering (transistor), biology
(sequencing), astronomy (telescope).
Thomas Kuhn: The Structure of Scientific Revolutions (1962)
Peter Galison: Image and Logic (1997)
Chemistry as an experimental science has benefited
much more from tool-driven revolutions.
Major tool-driven revolutions in chemistry
Our latest (and greatest) tool
The Computer
“I think it's fair to say that personal computers have become the
most empowering tool we've ever created. They're tools of
communication, they're tools of creativity, and they can be
shaped by their user.” – Bill Gates
A brief history of computers in chemistry
•  1950s: Driven by quantum chemistry and crystallography.
•  Early efforts needed access to centralized machines, travel. Computations
enormously expensive: 1.5 years (1959) vs one day (2014).
Punched card (2014)
Punched card (1960)
UNIVAC 1: 1.5 yrs to calculate 12 molecules
Apple MacBook Air: 4 hours for same calculation
•  1965: Moore’s Law; transistor counts double roughly every two years (as revised in 1975).
•  1970s: Use of computers started becoming routine. Still slow.
•  1990s: Exponential developments in desktop computing, software, internet.
•  2000s: Applications to biology, materials science become routine.
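As a rough, back-of-the-envelope check on the UNIVAC 1 vs MacBook Air comparison above, the two run times can be turned into a speedup factor and an equivalent number of doublings. The 365.25-day year and the variable names are assumptions made for illustration only.

```python
import math

HOURS_PER_YEAR = 365.25 * 24  # assumed average year length

univac_hours = 1.5 * HOURS_PER_YEAR  # 1.5 years on the UNIVAC 1 (from the slide)
macbook_hours = 4.0                  # 4 hours on the MacBook Air (from the slide)

# How many times faster the same calculation became
speedup = univac_hours / macbook_hours

# How many successive doublings in speed that factor corresponds to
doublings = math.log2(speedup)

print(f"speedup: about {speedup:,.0f}x")
print(f"equivalent doublings in speed: about {doublings:.1f}")
```

This compares wall-clock time for one calculation only; it says nothing about cost, memory, or software improvements.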
	
  
How have computers affected chemistry?
•  Publications: ~25 major journals, also described in others.
•  Companies: Schrödinger, OpenEye, CCG, PerkinElmer etc.
•  Conferences: Gordon Conference, IAQMS.
•  ACS Division of Computers in Chemistry.
•  Awards: ACS Award for Computers in Chemistry.
Data
Simulation & Analysis
Sociology
The Future
Data
“It is a capital mistake to theorize before one has data.”
-- Arthur Conan Doyle (“Sherlock Holmes: A Scandal in Bohemia”)
Chemical data has grown exponentially
Growth of the Cambridge Structural Database (Image: CSD)
Why? Better tools to determine and record structures, properties.
Data repositories have enabled easy and
instant global access to data.
	
  
•  Chemical Abstracts Service
(CAS): 75 million registered
substances.
•  Protein Data Bank (PDB):
97,000 protein structures.
•  Cambridge Structural
Database (CSD): 40,000
added every year.
•  SciFinder, Google Scholar.
Standardization
•  Chemical structure representation: drawing, manipulation.
Standard, multiple compressed file formats (e.g. SMILES strings),
error-free sharing of data.
•  E-Notebook: Standardized and safe record keeping, organization,
analysis and visualization.
ChemDraw
SMILES
Data is easier to compare, verify and reproduce.
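To give a feel for how compact a SMILES string is, here is a minimal sketch that counts the non-hydrogen atoms in a few simple SMILES strings. It is not a real SMILES parser: the regular expression and the helper name `count_heavy_atoms` are assumptions for illustration, and bracket atoms such as [Na+] are deliberately ignored. A cheminformatics toolkit should be used for real work.

```python
import re
from collections import Counter

# Matches atoms of the SMILES "organic subset" only; two-letter symbols come first
# so that "Cl" is not read as "C". Bracket atoms (e.g. [Na+], [13C]) are not handled.
ATOM_PATTERN = re.compile(r"Cl|Br|[BCNOPSFI]|[bcnops]")


def count_heavy_atoms(smiles: str) -> Counter:
    """Count non-hydrogen atoms in a simple SMILES string (hypothetical helper)."""
    # Lowercase symbols denote aromatic atoms; capitalize() maps "c" to "C".
    return Counter(symbol.capitalize() for symbol in ATOM_PATTERN.findall(smiles))


molecules = {
    "ethanol": "CCO",
    "benzene": "c1ccccc1",
    "aspirin": "CC(=O)Oc1ccccc1C(=O)O",
}

for name, smiles in molecules.items():
    print(name, smiles, dict(count_heavy_atoms(smiles)))
```

A whole molecule fits in a short line of text, which is part of what makes such formats easy to store, search and share.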
  
What can we do with all
this data?
Visualization
•  Instant visualization of data in various forms, user-friendly presentation;
e.g. Spotfire, Instant JChem etc.
•  Tools ranging from basic plots to advanced, on-the-fly statistical analysis
(e.g. principal component analysis, regression) now available.
•  Instant comprehension of complex biomolecular and inorganic structures
(e.g. PyMOL).
Much easier to make sense of data and property relationships.
Software for chemical analysis
•  What do you use software for? Analytical, spectroscopic,
purification?
•  Advanced techniques now more easily accessible.
•  Enormous savings in time and labor.
NMR	
   Crystallography	
   GC-­‐MS	
  
Ubiquitously affected everyday chemical research and the
work of bench chemists.
Using data intelligently: Cheminformatics
•  Applying tools from informatics and computer science to extract
meaning from data.
•  Most common problems: Searching, finding trends, correlating
chemical structures to various properties (descriptors).
If only all correlations were this good…
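Correlating a structural descriptor with a property can be sketched in a few lines. The example below computes a Pearson correlation coefficient by hand; the descriptor (carbon count of n-alkanes) and the approximate boiling points are chosen purely for illustration and are not taken from the slides.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / math.sqrt(var_x * var_y)


# Descriptor: number of carbons in methane through hexane
carbon_count = [1, 2, 3, 4, 5, 6]
# Property: approximate normal boiling points in degrees Celsius (illustrative values)
boiling_point_c = [-161.5, -88.6, -42.1, -0.5, 36.1, 68.7]

r = pearson_r(carbon_count, boiling_point_c)
print(f"Pearson r = {r:.3f}")
```

A high r on a small homologous series like this is the easy case; most structure-property relationships are far noisier.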
  
Case Study I: Similarity searching
- Simplified representations (e.g. bit strings) make searches of
millions of molecules very fast.
- Tanimoto similarity: Efficient, can be calculated for any property.
- Drug side effects similarity prediction especially promising.
Tanimoto similarity between molecules
J. Med. Chem., 2010, 53, 4830
Drug side effects: Nature Biotechnology 2007, 25, 197
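The Tanimoto similarity between two bit-string fingerprints is the number of bits set in both divided by the number of bits set in either. A minimal sketch, assuming each fingerprint is stored as a Python set of "on" bit positions; the molecule names and bit positions below are made up for illustration.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Tanimoto coefficient |A & B| / |A | B| for two sets of 'on' bit positions."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(fp_a & fp_b) / union


def rank_by_similarity(query: Set[int], library: Dict[str, Set[int]]) -> List[Tuple[str, float]]:
    """Return (name, similarity) pairs sorted from most to least similar."""
    scores = [(name, tanimoto(query, fp)) for name, fp in library.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical fingerprints: each set lists which bits are switched on
query = {1, 4, 7, 9, 12}
library = {
    "mol_A": {1, 4, 7, 9, 13},
    "mol_B": {2, 4, 8},
    "mol_C": {1, 4, 7, 9, 12},
}

ranked = rank_by_similarity(query, library)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Because the comparison reduces to set (or bitwise) operations, it scales to very large libraries; production tools typically use packed bit vectors rather than Python sets.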
  
Case Study II: Diversity analysis
•  Humans are pattern-seeking; often ignore diversity to focus on
similarity.
•  Maximizing diversity = Maximize probability of finding new
molecules with novel properties.
•  Create molecular libraries of millions of compounds; screening
collections for drug discovery, materials science etc.
Shape diversity: Nat. Chem. Biol. 2012, 8, 358
Voltage vs safety of Li-ion batteries: Nat. Mat. 2013, 12, 191
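One simple way to pick a diverse subset from a library is a greedy "MaxMin" selection: repeatedly add the molecule whose nearest already-picked neighbour is farthest away. The sketch below is one possible approach, not necessarily the method used in the cited papers; it assumes fingerprints stored as sets of bit positions and uses 1 minus Tanimoto similarity as the distance. All data are hypothetical.

```python
from typing import List, Set


def tanimoto_distance(fp_a: Set[int], fp_b: Set[int]) -> float:
    """Distance defined as 1 - Tanimoto similarity (0 = identical, 1 = no shared bits)."""
    union = len(fp_a | fp_b)
    if union == 0:
        return 0.0  # two empty fingerprints are treated as identical here
    return 1.0 - len(fp_a & fp_b) / union


def maxmin_pick(fingerprints: List[Set[int]], n_pick: int, seed_index: int = 0) -> List[int]:
    """Greedy MaxMin selection; returns the indices of the picked fingerprints."""
    if not fingerprints or n_pick <= 0:
        return []
    picked = [seed_index]
    while len(picked) < min(n_pick, len(fingerprints)):
        best_index, best_score = -1, -1.0
        for i, fp in enumerate(fingerprints):
            if i in picked:
                continue
            # Distance from this candidate to its closest already-picked molecule
            score = min(tanimoto_distance(fp, fingerprints[j]) for j in picked)
            if score > best_score:
                best_index, best_score = i, score
        picked.append(best_index)
    return picked


# Hypothetical library: indices 0, 1 and 3 are near-duplicates of each other
library = [{1, 2, 3, 4}, {1, 2, 3, 5}, {10, 11, 12}, {1, 2, 4, 5}, {20, 21}]
selection = maxmin_pick(library, n_pick=3)
print(selection)
```

The greedy loop is quadratic in library size, so for collections of millions of compounds a toolkit implementation with precomputed or lazily evaluated distances would be the practical choice.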
  
Simulation and Analysis
“Nobody believes a theoretical result, except the person who
calculated it. Everybody believes an experimental result, except
the person who measured it.”
-- Paul Labute (Chemical Computing Group)
How it happened
Michael Levitt, Nobel Lecture (http://tinyurl.com/jvhsjvr)
Major applications: QM and MM
•  Quantum chemistry made computers; computers made quantum
chemistry.
•  Molecular mechanics: Classical mechanics applied to molecules.
•  QM equations cannot be solved exactly. Need approximations,
iterative processing, and computing power.
•  Useful for calculating many properties (energies, dipole moments,
reactivity).
Potential energy surface for chemical reactions
Fullerene from graphene: Nat. Chem. 2010, 2, 450
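Molecular mechanics treats atoms as classical particles joined by springs and interacting through simple pairwise terms. As a minimal sketch, here are two textbook energy terms, a harmonic bond stretch and a Lennard-Jones interaction. The parameter values are placeholders chosen for illustration and do not come from any real force field.

```python
def harmonic_bond_energy(r: float, r0: float, k: float) -> float:
    """Bond-stretch energy E = 0.5 * k * (r - r0)^2."""
    return 0.5 * k * (r - r0) ** 2


def lennard_jones_energy(r: float, epsilon: float, sigma: float) -> float:
    """Non-bonded energy E = 4 * epsilon * ((sigma / r)^12 - (sigma / r)^6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


# Placeholder parameters (illustrative only, arbitrary but consistent units)
K_BOND, R0 = 300.0, 1.0
EPSILON, SIGMA = 0.24, 3.4

# Scan a one-dimensional slice of the potential energy surface
for r in (3.2, 3.4, 3.6, 3.8, 4.0, 4.5, 5.0):
    print(f"r = {r:.1f}  E_LJ = {lennard_jones_energy(r, EPSILON, SIGMA):+.4f}")

# The Lennard-Jones minimum lies at r = 2^(1/6) * sigma, with depth -epsilon
r_min = 2 ** (1 / 6) * SIGMA
print(f"minimum near r = {r_min:.3f}, E = {lennard_jones_energy(r_min, EPSILON, SIGMA):+.4f}")
```

Summing many such cheap terms over all bonds and atom pairs is what lets molecular mechanics handle systems far larger than quantum chemistry can.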
  
The 2013 Nobel Prize
•  Tradeoff: Quantum mechanics (QM) - accurate but expensive.
Molecular mechanics (MM) – inaccurate but cheap.
•  QM/MM: Best of both worlds, multiscale.
•  Applicable to large biological systems (proteins, DNA), extended
materials (zeolites, polymers).
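One common way to write the QM/MM partition is the additive scheme below. The notation is an assumption made here for illustration and is not taken from the slides: a small, chemically active region is treated with QM, its surroundings with MM, and a coupling term accounts for their interaction.

```latex
E_{\text{total}} = E_{\text{QM}}(\text{active region})
                 + E_{\text{MM}}(\text{surroundings})
                 + E_{\text{QM/MM}}(\text{coupling})
```

The expensive QM term is confined to a few dozen or a few hundred atoms, while the cheap MM term covers the rest of the protein or material.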
A Few Good Applications
Molecular Dynamics
•  Molecular Dynamics (MD): Newton’s laws of motion applied to
molecules, millions of steps; large amounts of data.
•  Parallel processing, special-purpose machines allow MD to surpass
Moore’s Law.
•  Simulations approaching biological timescales becoming routine.
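At its core, an MD step integrates Newton's equations of motion forward by a small time increment. A minimal sketch using the velocity Verlet scheme on a single particle attached to a harmonic spring; the integrator choice, units and parameters are assumptions for illustration, and real MD codes handle many atoms, force fields, thermostats and much more.

```python
from typing import Callable, List, Tuple


def velocity_verlet(
    x: float,
    v: float,
    force: Callable[[float], float],
    mass: float,
    dt: float,
    n_steps: int,
) -> List[Tuple[float, float]]:
    """Integrate one-dimensional motion; returns a list of (position, velocity)."""
    trajectory = [(x, v)]
    f = force(x)
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * (f / mass) * dt * dt   # update position
        f_new = force(x)                              # force at the new position
        v = v + 0.5 * (f + f_new) / mass * dt         # update velocity with averaged force
        f = f_new
        trajectory.append((x, v))
    return trajectory


# Toy system: harmonic oscillator with unit mass and spring constant (arbitrary units)
K_SPRING, MASS = 1.0, 1.0


def spring_force(x: float) -> float:
    return -K_SPRING * x


def total_energy(x: float, v: float) -> float:
    return 0.5 * MASS * v * v + 0.5 * K_SPRING * x * x


traj = velocity_verlet(x=1.0, v=0.0, force=spring_force, mass=MASS, dt=0.01, n_steps=1000)
e_start = total_energy(*traj[0])
e_end = total_energy(*traj[-1])
print(f"energy drift over {len(traj) - 1} steps: {e_end - e_start:+.2e}")
```

Even this toy run stores one state per step; with millions of steps and many thousands of atoms, the data volumes mentioned above follow directly.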
Knowledge-Based Protein Folding
•  Knowledge-based protein structure prediction: Taking advantage of
existing information in PDB to predict folded structures.
•  Use advanced statistical methods based on PDB data for assigning
probabilities to various solutions: Rosetta.
•  Outstanding success in CASP (Critical Assessment of protein
Structure Prediction).
“The amazing thing is that Rosetta had 31 points and the next best group had 8 points. It is
like baseball in 1927, when Babe Ruth hit 60 home runs and the runner up hit 14, and
entire teams didn't hit as many as he did.” – Peter Kollman (UCSF), CASP 2000.
Overlap between predicted (red) and experimental (green) protein structures
Protein design
•  Protein design: Given a structure, find alternative sequences.
•  Uses of alternative sequences: Enzymes catalyzing new reactions, new
small molecule-binding proteins (e.g. for environmental cleanups).
•  2003: First protein designed entirely de novo.
•  2008: First enzyme catalyzing reaction with no natural precedent.
•  As PDB grows, protein design becomes better.
Top7: Protein designed from scratch.
(Science, 2003, 302, 1364)
Kemp eliminase enzyme from scratch
(Nature, 2008, 453, 190)
Structure-Based Drug Design
•  Predict structure of drug bound to protein, suggest modifications to
improve properties.
•  Combination of crystallography data and simulation.
•  Outstanding success in some areas: e.g. HIV protease inhibitors against
AIDS.
Impact of addition of HIV protease inhibitors
to antiretroviral therapy among AIDS patients in
San Francisco (Am J Epidemiol. 152, 2, 2000)
HIV protease bound to indinavir
Katharine Holloway
The wisdom of crowds (and clouds)
•  FoldIt: Computer game to solve
protein folding and design problems.
•  Led to HIV protein structure and
algorithm discovery.
PNAS, 2011, 108, 18949
Comparison of Folding@Home
with leading supercomputers
•  Distributed computing,
Folding@Home: 100 million hours
logged on Sony PS3, also enabled
on cloud.
•  Used to study folding of proteins
involved in cancer, Alzheimer’s
disease; drug design.
Exciting Future Areas
New materials for the new millennium
•  Based on Density Functional Theory (Nobel Prize 1998).
•  Application of materials simulations and computational screening:
- Hydrogen storage (metal-organic frameworks)
- Photovoltaics and solar cells
- Alloys and new materials for batteries
- Semiconductor design
The age of biology
•  Human Genome Project: Computers made it possible.
•  Sequencing has greatly surpassed Moore’s Law. New techniques;
IonTorrent, Nanopore etc.
•  Computational Biology and Bioinformatics: Comparing genomes,
predicting diseases, mapping ancestral differences.
•  Aided by massive amounts of data: GenBank, Cancer Genome,
Ensembl, UniProt etc.
•  Ripe territory for Big Data and new informatics techniques.
Sociology
“The democratization of information and expertise that springs
from the world wide web, and the power of groups of motivated
amateurs to strike out on their own in technical subjects, is
weakening the authority of “experts” in society.” -- George
Whitesides.
The chemical blogosphere
•  Chemistry blogs took off in 2002, initially focused on research, grad
school hijinks.
•  Quickly diversified; peer review, job market, academic culture, publishing,
issues in industry, safety culture.
Derek Lowe: drug discovery, industry
Chemjobber: The Job Market, safety culture, industry
Paul Bracher: academic culture, peer review
Ash Jogalekar: Nature and evolution of chemistry, peer review
SeeArrOh: Chemophobia, food, peer review
C&EN official blog
James Ashenhurst: Org Chem tutoring
What are blogs good for?
Peer Review 2.0
•  Timely, democratic review of
latest research.
•  Interesting research highlighted
immediately.
•  Critiqued by large audience.
•  Self-selecting.
•  Instrumental in spotting:
fallacious research, self-plagiarism,
dubious methodologies and fabrication.
Non-research contributions
•  Lab safety (C&EN).
•  Academic culture (Chembark).
•  The (sad) state of the job market
(Chemjobber).
•  Representation of women, minorities
(Dr Rubidium).
•  Chemophobia, Industry (SeeArrOh,
Derek Lowe).
Peer Review 2.0: case study I
•  First reported instance of comprehensive informal peer review of chemical
literature.
•  2006: 37-step synthesis of hexacyclinol described in single-author paper in
Angew. Chem. by James LaClair (Xenobe Institute).
•  Commenter on blog of Stanford grad student Dylan Stiles points out
inconsistencies in structure, others weigh in and point out many more.
•  Other official papers refute data, suggest alternative structure.
•  Extensive discussion of problems with paper on multiple blogs, hundreds of
comments. Paper retracted in 2012, long after problems were clear.
“The proof is in the product”.
Peer Review 2.0: case study II
•  April 2012: Paper in JACS on amino acid chirality and origin of life.
•  Two issues: Bad scientific communication and charges of self-plagiarism.
•  Extensive similarities with two previous articles highlighted by Nature
Chemistry editor Stuart Cantrill exclusively on Twitter.
•  Paper retracted in May 2012.
•  Case illustrates peer review operating entirely outside formal channels.
ACS Press Release: “New scientific research
raises the possibility that advanced versions of
T. rex and other dinosaurs – monstrous
creatures with the intelligence and cunning of
humans – may be the life forms that evolved
on other planets in the universe.”
Photo uploaded by Stuart Cantrill on Twitter
Chemists: Embrace open access
•  Open-access, arXiv@Cornell
–  ASAP publishing
–  Open access
–  Instant and free peer review by large community
•  Chemical community less open to sharing and arXiv-style
publishing?
•  Cultural differences between various scientific communities
(e.g. particle physicists vs total synthesis chemists).
The Future
“Prediction is difficult, especially about the future” -- Niels Bohr.
Challenges and promises
•  Data:
- Bigger, better annotated databases with quality control.
- Statistics becoming more useful and appreciated.
- Greater awareness of data mining tools among
experimental chemists.
•  Simulation:
- Long molecular dynamics simulations approaching
realistic timescales.
- Insights from network theory used in synthetic planning.
- First quantum chemistry calculation on quantum
computer (2010).
- Better statistical validation of results, quality control.
Challenges and promises
•  Sociology
- More open access journals, more open access options.
- Widespread publicity of research results.
- Discussion, criticism on blogs being taken seriously.
Retraction Watch.
- Better use of multimedia (Twitter, Skype, podcasts).
- Cultural changes:
- More cross-talk between chemists, statisticians and
computer scientists.
- More cross-talk between academia and industry.
- Willingness to share data, code, experimental results.
- Willingness to present and discuss negative data.
But…be afraid of the hype
Fortune Magazine, October 1981
•  Jetpacks
•  Artificial intelligence
•  Nuclear fusion
•  Robot maids
In 30 years… (1950-2014)
Translating hype into reality
•  Fearlessness; ability to jump across boundaries, question
received wisdom.
•  Resilience; ability to bounce back from failure.
•  Adaptability; ability to welcome change.
•  Teamwork; ability to collaborate and share.
•  Imagination; ability to think outside the box.
The future of information technology in
chemistry…
…is us
“The best way to predict the future
is to invent it.” – Alan Kay.
“Be the change that you wish
to see in the world.” – Gandhi.

The Impact of Information Technology on Chemistry and Related Sciences
