UNM Cyberinfrastructure Day 2010 presentation: Applications in Biocomputing, covering biomedical and cheminformatics research computing and cyberinfrastructure issues.
DNA computing has several advantages including performing millions of operations in parallel, using large amounts of data storage in a small space, being lightweight, low power, and environmentally friendly. However, it also faces challenges such as molecular operations not being perfect and having a relatively high error rate. DNA computing shows promise for medical applications such as cancer diagnosis and targeted drug delivery but introducing genetic material into humans safely requires overcoming challenges like immune system reactions. While concerns around ethics and computers/DNA taking control exist, DNA computing remains an emerging field with opportunities in healthcare.
Biological computers use biological components like DNA to store and process data analogous to human body processes. They are implantable devices with a CPU and use DNA as software to monitor body activities and process data faster than traditional computers. DNA contains all genetic information in its molecular structure and biological computers use DNA computing, storing information in DNA molecules that can perform calculations much faster than regular computers using DNA's four basic components - adenine, cytosine, guanine and thymine. While biological computers are more efficient, accurate and environmentally friendly than traditional silicon-based computers, they also face challenges like potential hacking, need for human assistance, DNA degradation over time and rare pairing errors.
Bio computing uses DNA and biochemical processes to store and manipulate data similarly to human biology. DNA can store vast amounts of data densely due to its structure of paired chemical bases. A DNA computer operates massively in parallel and with extraordinary energy efficiency compared to conventional computers. While DNA computing shows potential for medical and data applications, it still requires further development to overcome challenges such as reduced accuracy compared to conventional computing.
Bio computers use systems of biologically derived molecules—such as DNA and proteins—to perform computational calculations involving storing, retrieving, and processing data. The development of biocomputers has been made possible by the expanding new science of nanobiotechnology.
The document discusses DNA computers as an alternative to silicon-based computers. DNA computers use DNA strands as a means of data storage and processing. Some key advantages of DNA computers include massive parallelism, as all DNA strands can be operated on simultaneously, and large storage capacity, as a single gram of DNA can store over 1x10^14 megabytes of data. DNA computers also have error correction mechanisms that allow them to resist viruses. They may help solve computationally difficult problems like the Hamiltonian problem more efficiently through massive parallelization. However, more work is still needed to develop DNA computing into a fully practical technology.
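As a rough sanity check on storage claims like the one above, DNA's data density can be estimated from the average mass of a nucleotide. The figures below (average nucleotide mass of about 330 daltons, 2 bits per base) are back-of-envelope assumptions for illustration, not values taken from the summarized document.

```python
# Back-of-envelope estimate of DNA storage density (assumed figures, order-of-magnitude only).
AVOGADRO = 6.022e23            # molecules per mole
AVG_NT_MASS_G_PER_MOL = 330    # assumed average mass of one nucleotide (g/mol)
BITS_PER_BASE = 2              # A/C/G/T -> 2 bits per base, assuming a simple direct encoding

nucleotides_per_gram = AVOGADRO / AVG_NT_MASS_G_PER_MOL   # ~1.8e21 bases
megabytes_per_gram = nucleotides_per_gram * BITS_PER_BASE / 8 / 1e6

print(f"{nucleotides_per_gram:.2e} bases per gram")
print(f"{megabytes_per_gram:.2e} MB per gram")   # on the order of 10^14-10^15 MB, consistent with the claim
```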
The Blue Brain is a virtual brain, a machine that can function like the human brain and keep working even after the person's death.
This topic covers the functionality of the Blue Brain, its advantages and disadvantages, and what a virtual brain actually is.
Much research is under way in this field, with results expected after roughly 2020. It is a new technology that requires a good understanding of the brain, its internal parts, and their functions. The basic aim is to upload a human brain into a machine so that thinking and remembering take no effort; even after death, the virtual brain could act in the person's place.
Professional Issues in IT course project presentation discussing how DNA can be used to store and manipulate information, and why and how DNA can be used in computing.
This seminar presentation discusses bio molecular computing. It explains that bio molecular computing uses biological molecules like DNA instead of silicon chips for computing. It works by pairing DNA bases and using enzymes to cut or splice DNA molecules. While bio molecular computing has potential advantages in terms of memory, it also faces challenges like being resource intensive, producing errors, and not allowing easy transmission of information. Whether bio molecular computing becomes practical will depend on overcoming these challenges and finding applications where it has clear advantages over traditional computing.
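The Watson-Crick pairing that this summary mentions (A with T, C with G) is easy to illustrate in software. The following is a minimal, hypothetical sketch of computing the reverse complement of a strand; it is not code from the presentation.

```python
# Minimal illustration of Watson-Crick base pairing (A-T, C-G).
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(strand: str) -> str:
    """Return the strand that would hybridize to the input, read 5' to 3'."""
    return "".join(PAIR[base] for base in reversed(strand))

print(reverse_complement("ATCGGC"))  # -> GCCGAT
```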
The document presents an overview of DNA computers. DNA computers use DNA molecules as the data storage medium and enzymes as the processing units. Some key advantages of DNA computers include massive data storage capacity using a small physical space, highly parallel processing, and low cost. However, DNA computers also currently have limitations such as high error rates and the need for human assistance in laboratory procedures. Potential applications of DNA computing include DNA chips, genetic programming, and pharmaceutical analysis. While DNA computers show promise, further work is still needed to develop them into a practical product.
Molecular computing is an emerging field to which chemistry, biophysics, molecular biology, electronic engineering, solid-state physics and computer science all contribute. It involves the encoding, manipulation and retrieval of information at a macromolecular level, in contrast to current techniques, which accomplish these functions via IC miniaturization of bulk devices. Bio-molecular computers have real potential for solving problems of high computational complexity, although many problems are still associated with this field.
DNA computers have potential to replace silicon-based computers by storing vast amounts of data within DNA strands. DNA computers operate in parallel through chemical reactions rather than linearly like silicon. While early DNA computers were test tubes and gold plates, they now include a 2002 gene analysis biochip and a 2003 self-powered programmable computer. DNA computers could be smaller and more powerful than supercomputers, but current challenges include lack of full accuracy and DNA degradation. Further development is still needed but DNA computing shows promise for medical and data processing uses.
This document discusses DNA computing and provides an overview of key concepts. It summarizes Adleman's 1994 experiment solving the Hamiltonian path problem using DNA strands to represent graph connections. While DNA computing shows promise for massively parallel processing, current limitations include slow laboratory procedures and inability to represent data universally. Future advances may address these issues and enable DNA computers to solve certain complex problems not feasible with electronic computers.
Sequencing Genomics: The New Big Data Driver - Larry Smarr
1. Genomic sequencing is driving big data as the cost of sequencing DNA falls faster than Moore's Law and the amount of data produced increases dramatically.
2. The Beijing Genome Institute is the world's largest genomic institute, using over 130 sequencing machines each producing 25 gigabases per day, with over 12 petabytes of data storage in total (a rough throughput calculation follows this list).
3. Interdisciplinary teams of computer scientists, data analysts, and geneticists are needed to analyze the massive amounts of genomic and metagenomic data being produced to gain insights into human health and disease.
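To put the figures quoted above in perspective, here is a rough throughput calculation using only the numbers stated (130 machines at 25 gigabases per day). The bytes-per-base figure is an illustrative assumption, since real storage depends on read format, quality scores, and compression.

```python
# Rough daily throughput implied by the quoted figures (illustrative assumptions).
machines = 130
gigabases_per_machine_per_day = 25
daily_gigabases = machines * gigabases_per_machine_per_day       # 3,250 gigabases/day

bytes_per_base = 1   # assumption: ~1 byte per base for sequence alone, order-of-magnitude only
daily_terabytes = daily_gigabases * 1e9 * bytes_per_base / 1e12  # ~3.25 TB/day of raw bases

print(f"{daily_gigabases} gigabases/day, roughly {daily_terabytes:.2f} TB/day")
```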
DNA has potential as a material for computing due to its large storage capacity and ability to perform calculations in parallel. While DNA computing is currently slow and can only return yes or no answers, researchers are working to address these issues and develop more complex DNA-based computers. Examples include the MAYA computers which used DNA logic gates to play Tic-Tac-Toe, and experiments using light-sensitive bacteria to create a basic binary system for information storage and processing.
The document discusses using DNA tokens for parallel computing. It outlines the basic structure and operations on DNA, including hybridization, denaturation, cutting, and amplification. It then discusses how the travelling salesman problem was solved using DNA computing. An experiment was conducted using DNA tokens to send data between molecular processors. The results showed this approach was successful for massively parallel computation.
High Performance Cyberinfrastructure to Support Data-Intensive Biomedical Res... - Larry Smarr
08.06.16
Invited Talk
Association of University Research Parks BioParks 2008
"From Discovery to Innovation"
Salk Institute
Title: High Performance Cyberinfrastructure to Support Data-Intensive Biomedical Research Instruments
La Jolla, CA
PowerPoint presentation of seminar topic DNA-based computing - Paushali Sen
This document provides an overview of DNA computing. It discusses how Adleman solved the Hamiltonian path problem using DNA molecules to represent the graph and encode possible paths. The key steps involved encoding the cities as DNA sequences, encoding all possible paths as complementary DNA strands, merging the strands so that complementary bases adhere to represent all possible solutions simultaneously, and using various DNA techniques like PCR and electrophoresis. While DNA computing shows promise for massively parallel processing and energy efficiency, current limitations include error rates and the need for manual intervention.
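The generate-and-filter strategy described above can be mimicked in ordinary software. The sketch below enumerates candidate paths for a small directed graph and keeps only the Hamiltonian ones, standing in for the biochemical steps (hybridization, PCR, electrophoresis) that the presentation describes; the graph is a made-up example, not Adleman's original instance.

```python
# In-silico analogue of Adleman's generate-and-filter DNA computation:
# generate candidate paths, then keep only Hamiltonian paths (every vertex
# visited exactly once) from a chosen start vertex to a chosen end vertex.
from itertools import permutations

edges = {("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("B", "D")}  # toy directed graph (assumption)
vertices = {v for e in edges for v in e}

def hamiltonian_paths(start, end):
    for middle in permutations(vertices - {start, end}):
        path = (start, *middle, end)
        if all((u, v) in edges for u, v in zip(path, path[1:])):
            yield path

for p in hamiltonian_paths("A", "D"):
    print(" -> ".join(p))   # e.g. A -> B -> C -> D
```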
The document discusses DNA computers. It explains that DNA computers can store vastly more information than conventional computers and solve complex problems faster. DNA computers use DNA's ability to store genetic information through nucleotide base pairing to process and solve computational problems in a massively parallel way. The first successful DNA computer was demonstrated in 1994 by Leonard Adleman, who used DNA to solve a small instance of the Hamiltonian path problem. The document then provides details on the structure of DNA, including its double helix shape, nucleotide base pairing rules of A-T and C-G, and directionality of strands.
Molecular computers are systems in which molecules or macromolecules individually mediate information processing functions. Molecular computing provides an alternative to computing using silicon integrated circuits. It aims at developing intelligent computers using biological molecules as computational devices. It is a promising means of unconventional computation owing to its capability for massive parallelism. It offers to augment digital computing with biology-like capabilities. This paper provides a brief introduction to molecular computing.
Genomic Big Data Management, Integration and Mining - Emanuel Weitschek, Data Driven Innovation
This document summarizes genomic big data management, integration and mining. It discusses the exponential growth of biological data due to advances in sequencing technologies. Next generation sequencing techniques generate large amounts of short DNA reads. Several public databases contain heterogeneous biological data sources. Effective data management and integration methods are needed to analyze these large and complex datasets. Supervised machine learning can be used to extract knowledge and classify samples. Tools like CAMUR apply rule-based classification to problems like analyzing gene expression from cancer datasets. Future work involves advanced integration systems and new big data approaches for biological data.
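Rule-based classification of the kind attributed to CAMUR above can be illustrated with a toy example. The rules, genes, and thresholds below are invented for illustration only and are not CAMUR's actual output or API.

```python
# Toy rule-based classifier over gene-expression values (illustrative only; not CAMUR).
samples = [
    {"GENE1": 8.2, "GENE2": 1.1, "label": "tumor"},
    {"GENE1": 2.0, "GENE2": 5.4, "label": "normal"},
    {"GENE1": 7.9, "GENE2": 0.8, "label": "tumor"},
]

# Each rule: (gene, threshold, class predicted when expression exceeds the threshold).
rules = [("GENE1", 6.0, "tumor"), ("GENE2", 4.0, "normal")]

def classify(sample, default="unknown"):
    for gene, threshold, predicted in rules:
        if sample.get(gene, 0.0) > threshold:
            return predicted
    return default

correct = sum(classify(s) == s["label"] for s in samples)
print(f"{correct}/{len(samples)} correctly classified")
```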
A DNA computer can store billions of times more information than your PC hard drive and solve complex problems in less time. Computer chip manufacturers are racing to make the next microprocessor that will run faster, but microprocessors made of silicon will eventually reach their limits of speed and miniaturization. Chip makers need a new material to produce faster computing speeds.
2015-08-13 ESA: NextGen tools for scaling from seeds to traits to ecosystems - TimeScience
This document discusses using new technologies to monitor ecosystems and plants at high resolution over time. It proposes collecting detailed data on individual plants and trees in fields and forests through methods like:
- Automated imaging of plant growth in controlled lab conditions
- Sensor networks and remote sensing to generate 3D models of field sites from aerial drones and ground-based laser scanning
- Genotyping every plant to correlate phenotypes with genetics
The goal is to generate massive, multilayer datasets that track environmental and genetic factors over time at plant and ecosystem scales, analogous to advances in high-throughput genomics and phenomics. This would transform ecological understanding and address global challenges around food security and climate change.
1. Synthetic biology enables the extreme genetic engineering of lifeforms through techniques like designing DNA, splicing genes, and synthesizing genomes.
2. There are concerns about potential misuse for biowarfare and rapid digital biopiracy that can bypass traditional benefit-sharing systems.
3. As synthetic biology advances, it may allow for the mass construction of new lifeforms and genomes at an unprecedented scale and speed, with uncertain and potentially dangerous implications.
The document discusses the potential for nanomedicine and cryonics to revolutionize medicine in the future. It describes how nanotechnology could enable the precise arrangement of atoms to create molecular machines like robotic arms and computers smaller than cells. These devices could guide cell repair and restore lost functions, preserving life. The document also discusses cryonics and the possibility that those preserved could be revived in the future when medical technologies are advanced enough to repair the damage of freezing.
The document discusses the growth of data-intensive science and the need for new computing infrastructures to manage the large amounts of data being produced. It covers three perspectives on infrastructure: grid computing which enables sharing of distributed resources over the internet, data centers which provide integrated storage and computing services, and e-science which combines grids, collaboration tools, and data analysis services. Examples are given of different scientific domains using these infrastructures.
The document discusses how computation can accelerate the generation of new knowledge by enabling large-scale collaborative research and extracting insights from vast amounts of data. It provides examples from astronomy, physics simulations, and biomedical research where computation has allowed more data and researchers to be incorporated, advancing various fields more quickly over time. Computation allows for data sharing, analysis, and hypothesis generation at scales not previously possible.
Cool Informatics Tools and Services for Biomedical Research - David Ruau
This document provides an overview of bioinformatics tools and services for analyzing big data in biomedical research. It discusses traditional bioinformatics tools, analyzing genomic data from microarrays and next-generation sequencing without and with code, interpreting results using protein interaction networks and pathways, tools for data storage, cleaning and visualization, and making research reproducible. Galaxy, R, and programming are presented as useful for automated, reproducible analysis of large genomic datasets.
Introduction to Biological Network Analysis and Visualization with Cytoscape ... - Keiichiro Ono
Introduction to biological network analysis and visualization with Cytoscape (using the latest version 3.4).
This is a first half of the lecture for Applied Bioinformatics lecture at TSRI.
This is a talk I gave at a Northwestern University - Complete Genomics Workshop on April 21, 2011 about using clouds to support research in genomics and related areas.
Opening talk at the "Interdisciplinary Data Resources to Address the Challenges of Urban Living" Workshop at the Urban Big Data Centre, University of Glasgow, 4 April 2016
Foundations for the future of science discusses using artificial intelligence and machine learning to advance scientific research. Key points discussed include using AI to analyze large datasets, develop scientific models, and automate experimental workflows. The document also outlines several examples of how the Globus data platform is currently enabling AI-powered scientific applications across multiple domains. Overall, the document advocates that embracing "AI for science" has the potential to accelerate scientific discovery by overcoming limitations in human analysis capabilities and computational resources.
Building an Information Infrastructure to Support Microbial Metagenomic Sciences - Larry Smarr
06.01.14
Presentation for the Microbe Project Interagency Team
Title: Building an Information Infrastructure to Support Microbial Metagenomic Sciences
La Jolla, CA
This document discusses challenges and opportunities for integrating large, heterogeneous biological data sets. It outlines the types of analysis and discovery that could be enabled, such as comparing data across studies. Technical challenges include incompatible identifiers and schemas between data sources. Common solutions attempt standardization but have limitations. The document examines Amazon's approach as a model, with principles like exposing all data through programmatic interfaces. It argues for a "platform" approach and combining data-driven and model-driven analysis to gain new insights. Developing services with end users in mind could help maximize data reuse.
Stephen Friend, Dana Farber Cancer Institute, 2011-10-24 - Sage Base
The document discusses building disease models using data intensive science and open medical information systems, with the goal of better understanding disease biology before testing drugs. It describes the Sage Bionetworks non-profit organization, which aims to create a commons for shared disease maps and models through several pilot projects including clinical trial data sharing and identifying cancer patients who do not respond to approved drug regimens.
This document discusses the challenges of handling large-scale genomic and biological data and proposes potential solutions. It notes that data volumes are increasing rapidly due to advances in sequencing technology but dissemination and data handling methods have not kept pace. Several hurdles to data sharing are described including technical issues around data size, heterogeneity and longevity as well as economic and cultural barriers. Potential solutions discussed include providing incentives for data sharing through attribution and citation, adopting data citation practices using Digital Object Identifiers, establishing funding models for long-term curation, and launching new databases and journals focused on publishing and analyzing large-scale datasets.
In this deck from the 2014 HPC User Forum in Seattle, Jack Collins from the National Cancer Institute presents: Genomes to Structures to Function: The Role of HPC.
Watch the video presentation: http://wp.me/p3RLHQ-d28
08.04.14
Invited Talk
National Astrobiology Institute Executive Council Meeting
Astrobiology Science Conference 2008
Santa Clara Convention Center
Title: High Performance Collaboration
Santa Clara, CA
Taverna is a free and open-source workflow management system that allows researchers to design and execute scientific workflows. It was developed by the University of Manchester to support in silico experiments in biology. Taverna provides a graphical user interface for designing workflows using a variety of distributed data sources and web services without having to learn complex programming. It has been widely adopted by researchers in fields such as biology, healthcare, astronomy, and cheminformatics to automate analysis pipelines and share workflows.
This talk presents areas of investigation underway at the Rensselaer Institute for Data Exploration and Applications. First presented at Flipkart, Bangalore India, 3/2015.
Jubatus: Realtime deep analytics for BigData @ Rakuten Technology Conference 2012 - Preferred Networks
Currently, we face new challenges in realtime analytics of BigData, such as social monitoring, M2M sensors, online advertising optimization, smart energy management and security monitoring. To analyze these data, scalable machine learning technologies are essential. Jubatus is an open-source platform for online distributed machine learning on BigData streams. We explain the technologies inside Jubatus and show how it can achieve realtime analytics on various problems.
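The online/streaming learning idea behind Jubatus can be sketched with scikit-learn's incremental API as a generic stand-in (this is not Jubatus's API): each mini-batch arriving from a stream updates the model without retraining from scratch.

```python
# Online (incremental) learning sketch using scikit-learn as a stand-in for Jubatus-style streaming ML.
import numpy as np
from sklearn.linear_model import SGDClassifier

clf = SGDClassifier()                  # linear classifier trained by stochastic gradient descent
classes = np.array([0, 1])
rng = np.random.default_rng(0)

for _ in range(100):                   # each iteration stands in for a mini-batch from a data stream
    X = rng.normal(size=(32, 5))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)   # synthetic labels for illustration
    clf.partial_fit(X, y, classes=classes)    # incremental update, no full retrain

X_test = rng.normal(size=(8, 5))
print(clf.predict(X_test))
```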
The National Resource for Network Biology aims to provide freely available, open-source software tools to enable researchers to assemble biological data into networks and pathways and use these networks to better understand biological systems and disease; it pursues this mission through technology research and development projects, driving biological projects, collaboration and service projects, training, and dissemination; key components include the Cytoscape software platform, supercomputing infrastructure, and partnerships with over 30 external research groups.
TIGA: Target Illumination GWAS Analytics - Jeremy Yang
Aggregating and assessing experimental evidence for interpretable, explainable, accountable gene-trait associations. Presentation for NIH IDG Annual Meeting, Feb 9-11, 2021.
DrugCentralDb and BioClients: Dockerized PostgreSql with Python API-tizer - Jeremy Yang
DrugCentralDb, a biomedical research database developed at UNM and widely used by drug discovery scientists, has been Dockerized and deployed via AWS EC2. Additionally, we have developed a Python package BioClients, with module 'drugcentral' API for DrugCentral. Source code and Docker image are available via GitHub and DockerHub, respectively. These tools are new and in testing, with full release planned for later in 2020.
Mining ClinicalTrials.gov via CTTI AACT for drug target hypotheses - Jeremy Yang
This document discusses methods for mining clinical trial data from ClinicalTrials.gov to infer disease-target associations. It presents several proposed confidence metrics for evaluating these associations and provides examples. Over 1.2 million associations were found involving 164,000 unique disease-gene pairs. Named entity recognition tools were used to extract drugs, diseases, and other information from trial descriptions. Top drugs, diseases, and targets were identified based on number of mentions.
TIN-X v2: modernized architecture with REST API - Jeremy Yang
TIN-X v2: modernized architecture with REST API for sustainability and interoperability. Presented at the IDG Face2Face meeting in Arlington, VA, Feb 26-27, 2019.
Ex-files: Sex-Specific Gene Expression Profiles Explorer - Jeremy Yang
Poster prepared for NIH Data Commons Pilot Project Consortium (DCPPC) scientific use case, developed in 2018, with GTEx gene expression data and deployed as online application.
Illuminating the Druggable Genome with Knowledge Engineering and Machine Lear... - Jeremy Yang
Talk given at 14th Annual New Mexico BioInformatics, Science and Technology (NMBIST) Symposium, entitled Integrative Omics, on March 14-15, 2019. Most slides c/o IDG KMC PI Tudor Oprea, MD, PhD.
Badapple: promiscuity patterns from noisy evidence (poster) - Jeremy Yang
Badapple: promiscuity patterns from noisy evidence. Bioassay data analysis using scaffold associations. Presented at the UNM Staff Research Expo, Jan 27, 2017. Adapted from "Badapple: promiscuity patterns from noisy evidence", Yang JJ, Ursu O, Lipinski CA, Sklar LA, Oprea TI Bologa CG, J. Cheminfo. 8:29 (2016), DOI: 10.1186/s13321-016-0137-3.
Bibliological data science and drug discovery - Jeremy Yang
Presented at the 2016 ACS Fall Meeting in Philadelphia, session "Effectively Harnessing the World's Literature to Inform Rational Compound Design", on 8/21/16.
BioMISS: Language Diversity of Computing - Jeremy Yang
Talk given at the UNM BioMedical Informatics Seminar Series, Oct 15, 2015. Because the languages of computing are numerous and diverse, it can be challenging to choose an appropriate language for a given task. Yet data are of little value unless represented by semantic systems of languages with appropriate levels of abstraction. We consider the analogy between object-oriented programming and abstraction in biomedical vocabulary and the Sapir-Whorf Hypothesis (that an individual’s thoughts and actions are determined by the language he or she speaks). As an example, we consider the differences between ICD-10 and disease ontology.
This document discusses the many languages used in computing to communicate and represent information. It begins by providing examples of programming languages like Python and Perl, as well as other computer languages for various purposes. It then defines what is meant by the term "language" and lists some major advances in computing languages over time. The document notes challenges like language gaps, differing standards, and issues representing biomedical knowledge and disease codes across systems. It describes a project to illuminate the druggable genome and the language challenges of representing drug names and disease ontologies consistently. The conclusion reflects on the goal of computers being able to effectively "talk" with each other through the diversity of languages.
RMSD (root mean square deviation) is commonly used to measure the geometric difference between molecular conformations, but it has limitations. It reduces multiple atomic distance comparisons to a single number, missing important details. While useful, RMSD may be insufficient on its own, and larger values should be inspected more closely. The document examines alternative measures like variance, maximum distance, shape similarity, and graph-based measures that can provide additional insight into structural relationships not fully captured by RMSD alone. It concludes that investigators should be aware of RMSD's limitations and consider supplementing it with other measures.
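To make the RMSD discussion concrete, here is a minimal numpy sketch comparing two pre-aligned conformations by both RMSD and maximum per-atom displacement. The coordinates are fabricated for illustration, and no superposition/alignment step is shown.

```python
# RMSD vs. maximum per-atom displacement for two pre-aligned conformations (toy coordinates).
import numpy as np

conf_a = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [3.0, 0.2, 0.0]])
conf_b = np.array([[0.1, 0.0, 0.0], [1.4, 0.1, 0.0], [3.0, 1.8, 0.0]])

per_atom = np.linalg.norm(conf_a - conf_b, axis=1)   # distance moved by each atom
rmsd = np.sqrt(np.mean(per_atom ** 2))

print(f"RMSD      = {rmsd:.3f}")
print(f"Max shift = {per_atom.max():.3f}")  # a large local change can hide inside a modest RMSD
```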
Canonicalized systematic nomenclature in cheminformatics - Jeremy Yang
This document discusses canonicalization in chemoinformatics and new canonicalization tools from OpenEye. It reviews existing canonicalization methods like the Morgan algorithm and describes how OpenEye has implemented and expanded on these methods to canonicalize molecular structures, tautomers, and pKa states. OpenEye tools like OEChem and QuacPac can generate canonical SMILES, connection tables, and representations of different chemical forms and standard file formats.
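Canonicalization itself can be demonstrated with the open-source RDKit toolkit as a stand-in for the OpenEye tools discussed above (the API shown is RDKit's, not OEChem's or QuacPac's): different input SMILES for the same molecule map to a single canonical string.

```python
# Canonical SMILES with RDKit (illustrative stand-in for the OpenEye canonicalization tools discussed).
from rdkit import Chem

inputs = ["OCC", "CCO", "C(O)C"]   # three ways to write ethanol
canonical = {Chem.MolToSmiles(Chem.MolFromSmiles(s)) for s in inputs}
print(canonical)                   # a single canonical form, e.g. {'CCO'}
```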
Molecular scaffolds are special and useful guides to discovery, poster (36x54"). Presented at ACS National Meeting SciMix in Indianapolis, Sep 9, 2013.
Molecular scaffolds are special and useful guides to discovery - Jeremy Yang
Molecular scaffolds are special structures that can be used to guide discovery in fields like chemical biology and drug discovery. Scaffolds represent the core structure or framework of molecules. They are useful because they allow clustering and organization of chemical data, exploration of chemical space, and prediction of properties like bioactivity. Examples of famous drug scaffolds discussed include the beta-lactam, steroid, and benzodiazepine scaffolds. Software tools are available for scaffold analysis and applications include database clustering, navigation of chemical space, and prediction of promiscuity. While the definition of a scaffold is not always consistent, cheminformatics methods can help address challenges in scaffold analysis.
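Scaffold extraction of the kind described can be shown with RDKit's Bemis-Murcko scaffold utility, an open-source stand-in rather than whichever tools the presentation itself used; the example molecule is chosen here for illustration.

```python
# Bemis-Murcko scaffold extraction with RDKit (illustrative).
from rdkit import Chem
from rdkit.Chem.Scaffolds import MurckoScaffold

# Diazepam, a benzodiazepine, as an example molecule.
smiles = "CN1C(=O)CN=C(c2ccccc2)c2cc(Cl)ccc21"
scaffold = MurckoScaffold.GetScaffoldForMol(Chem.MolFromSmiles(smiles))
print(Chem.MolToSmiles(scaffold))   # the benzodiazepine ring framework with side chains stripped
```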
How am I supposed to organize a protein database when I can't even organize m... - Jeremy Yang
This document discusses challenges in organizing protein and human databases using cheminformatics approaches. It provides examples of using algorithms to determine if protein or small molecule structures are the same, but questions how useful this is given the complex nature of proteins and humans. While cheminformatics has been very successful in organizing small molecule data, the document argues it may not directly transfer to domains like proteins or human identities that are more complex and context-dependent. The conclusion is that cheminformatics approaches could potentially be used for indexing related domains, but their power is limited when logical assumptions break down, such as determining the number of individuals with the same name.
Promiscuous patterns and perils in PubChem and the MLSCN - Jeremy Yang
The document discusses analyzing promiscuous compounds and patterns in PubChem and the Molecular Libraries Screening Center Network (MLSCN) database. It defines promiscuity and discusses types of promiscuous compounds like aggregators and reactives. It also examines very active scaffolds and histograms of compounds versus assays for different scaffolds. The goal is to improve hit rates in high-throughput screening by pre-filtering or post-analysis of promiscuous compounds and patterns.
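The promiscuity statistics described (compounds or scaffolds active across many assays) reduce to simple ratios. The sketch below computes an illustrative active/tested ratio per scaffold from made-up counts; it is not the Badapple scoring formula.

```python
# Illustrative promiscuity ratio per scaffold (made-up counts; not the Badapple score).
scaffold_stats = {
    # scaffold_id: (assays in which compounds containing it were tested, assays in which they were active)
    "benzodiazepine": (420, 12),
    "quinone":        (380, 190),
    "rhodanine":      (300, 210),
}

for scaffold, (tested, active) in sorted(
        scaffold_stats.items(), key=lambda kv: kv[1][1] / kv[1][0], reverse=True):
    ratio = active / tested
    flag = "promiscuous?" if ratio > 0.25 else ""
    print(f"{scaffold:15s} active/tested = {active}/{tested} = {ratio:.2f} {flag}")
```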
Cyberinfrastructure Day 2010: Applications in Biocomputing
1. Jeremy Yang
Software Systems Manager
Division of Biocomputing
Dept. of Biochemistry & Molecular Biology
UNM School of Medicine
Cyberinfrastructure Day -- April 22, 2010
2. I. What is Biocomputing?
II. Cyber Revolution (~1980-2010+)
III. Cyberinfrastructure (To be or not to be?)
IV. Super Computing, Redefined
3. Division of Biocomputing
http://biocomp.health.unm.edu/
Department of Biochemistry & Molecular Biology
School of Medicine
Also affiliated with the NIH Roadmap-funded UNM Center for Molecular Discovery
4. Biomolecular screening informatics
Cheminformatics
Bioinformatics
Genomics
Virtual screening
Molecular modeling
SAR (Structure-Activity-Relationship)
Data mining, machine learning
3D visualization
Public data integration
Collaborations in chemistry, biology, medicine, comp sci
BIOMED 505 course
Software development, management, deployment & support
5. Larry Sklar, et al., UNMCMD (NIH Roadmap)
~$20M NIH awarded to date
6. 32 cpu Linux cluster
32GB RAM server
Linux: OpenSUSE, CentOS, RedHat, Fedora, Ubuntu
SGI/IRIX
Windows, Mac OS X
Automated integration with NIH databases
2+ Oracle instances
PostgreSQL, MySQL
Stereo graphics workstation
25+ scientific software packages
Supported in-house applications
We are cyberinfrastructure users and providers!
9. Nucleotide and protein sequence analysis
Genomics, proteomics
Merging with chemical biology, etc.
10. Computational search for likely biological actives
Database may be real or virtual compounds
2D and 3D methods
2D similarity search
3D similarity search (shape, pharmacophore)
docking (3D, protein binding site)
Example: 3D shape search; prozac & paxil (c/o OpenEye ROCS)
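As an aside, the 2D similarity search mentioned on this slide can be illustrated with an open-source toolkit. The sketch below uses RDKit Morgan fingerprints and Tanimoto similarity as a generic example (the slide itself credits OpenEye ROCS, a 3D shape method, which is not what is shown here); the SMILES are approximate structures for fluoxetine (Prozac) and paroxetine (Paxil) with stereochemistry omitted.

```python
# 2D fingerprint similarity (Tanimoto) with RDKit -- a generic illustration of "2D similarity search".
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

query = Chem.MolFromSmiles("CNCCC(Oc1ccc(C(F)(F)F)cc1)c1ccccc1")             # fluoxetine (approx.)
candidate = Chem.MolFromSmiles("C1CC(c2ccc(F)cc2)C(COc2ccc3c(c2)OCO3)CN1")   # paroxetine (approx.)

fp_q = AllChem.GetMorganFingerprintAsBitVect(query, 2, nBits=2048)
fp_c = AllChem.GetMorganFingerprintAsBitVect(candidate, 2, nBits=2048)
print("Tanimoto:", DataStructs.TanimotoSimilarity(fp_q, fp_c))
```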
12. Computational models for protein-ligand binding
Abl kinase (1iep.pdb)
[Figure: interaction potentials; hydrophobic (green), H-bond acceptors; Gleevec in binding site]
Gleevec is a leukemia drug known to bind with Abl kinase.
17. Rapid change, challenge and opportunity
Learning from history, trends (new not enough)
Winners and losers
Science, experts have led and followed.
~1980-2010 covers 3σ (99.7%)
And evolution...
18. Rapid change, challenge and opportunity
Learning from history, trends
Winners and losers
Science, experts have led and followed.
~1980-2010 covers 3σ (99.7%)
And evolution...
19. 1977: Atari 2600
1978: Space Invaders
1981: IBM-PC (MS-DOS)
1983: cellphone
1983: GNU Project
1984: Neuromancer, William Gibson, "cyberspace"
1984: Apple Mac, mouse, windows & icons
20. 1985: Oracle 5 (client-server)
1989: Intel 486 (1M transistors, 50MHz)
1990: MS Windows 3.0
1990: WWW (Berners-Lee)
1991: High Perf Comp & Comm Act (Al Gore)
1991: Linux (Linus Torvalds)
1991: AOL
1991: ETrade
21. 1993: Jurassic Park (via SGI)
1993: NCSA Mosaic
1994: Netscape Navigator
1994: “Good Times” hoax
1994: Match.com
1995: “Concept” virus (Word)
1995: Internet Explorer
1995: Apache project
1995: Yahoo!
22. 1995: Amazon.com
1995: My mother gets email
1997: Google
1997: eBay
1999: Melissa virus (Outlook)
1999: Napster (p2p)
2000: MS convicted
2000: 3M USA broadband*
2000: dot-com bubble pops
*Fixed non dial-up internet connections >56k (FCC).
23. 2000: 802.11b wireless
2001: Apple iPod
2001: Apple iTunes
2001: Wikipedia
2003: Skype
2005: YouTube
2005: Rio power grid hacked
2005: NSA domestic surveillance
2006: Facebook
24. 2006: Amazon Cloud
2007: DOD hacked
2008: 70M USA broadband*
2009: Cyberdefense USA priority
2009: Twitter role in Iran election
protests
2010: UAVs are SOPs
2011: Cyber terrorism?
*Fixed non dial-up internet connections >56k (FCC).
25. The dotted line keeps moving...
Case study: database cheminformatics in pharma research, 1990→2000.
26. In 1990, high speed chemical searching was beyond standard capabilities.
Research groups managed local servers in their labs & specialized DB engines (e.g. Daylight Inc.).
By 2000, this function had moved to IT (via Oracle cartridges, etc.) and corporate informatics infrastructure.
Transition not smooth, but very beneficial.
27. Standard chemical searching functions: substructure, similarity, identity
[Figure examples: cocaine; imidazoles]
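The substructure and identity searches named on this slide can likewise be illustrated in a few lines with RDKit, a generic open-source stand-in for the Daylight and Oracle-cartridge systems mentioned on the previous slide; the small compound library below is invented for the example.

```python
# Substructure and identity checks with RDKit -- a generic illustration of these search functions.
from rdkit import Chem

library = {
    "histamine": "NCCc1c[nH]cn1",
    "caffeine":  "CN1C=NC2=C1C(=O)N(C)C(=O)N2C",
    "ibuprofen": "CC(C)Cc1ccc(C(C)C(=O)O)cc1",
}
imidazole = Chem.MolFromSmarts("c1cnc[nH]1")        # substructure query

for name, smi in library.items():
    mol = Chem.MolFromSmiles(smi)
    print(name, "contains imidazole:", mol.HasSubstructMatch(imidazole))

# Identity check: two input forms of ethanol reduce to the same canonical SMILES.
print(Chem.MolToSmiles(Chem.MolFromSmiles("OCC")) == Chem.MolToSmiles(Chem.MolFromSmiles("CCO")))  # True
```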
28. (1) office equipment
(2) lab equipment
(3) experimental apparatus
(4) the experiment
(5) a commodity
(6) custom configured experimental vehicle for exploration
(7) all of the above
29. (1) office equipment
(2) lab equipment
(3) experimental apparatus
(4) the experiment
(5) a commodity
(6) custom configured experimental vehicle for exploration
(7) all of the above
31. Scientific research
Computational research
High performance computing as a research tool
High performance infrastructure as a productivity tool
Appropriate tiers and domains
Scientific software for experts
Enabling software for scientists
Commoditization (e.g. cloud computing)
Plumbing vs. experimental apparatus
32. IT: "Poorly managed computers and needy ill-trained users put the system at risk."
Research: "We need power, flexibility and access and not another lame PC."
34. In ~5 yrs, super → un-super
Super computing? Define computer.
Advances from unexpected places:
gaming, movies (graphics -- vs. AI)
social networking (crowdsourcing)
even business (web standards, UIs, security)
Super computing is pushing the current limits
But where are the key frontiers?