Jeremy Yang
                    Software Systems Manager
                    Division of Biocomputing
                    Dept. of Biochemistry & Molecular Biology
                    UNM School of Medicine


Cyberinfrastructure Day -- April 22, 2010
I. What is Biocomputing?
 II. Cyber Revolution (~1980-2010+)
III. Cyberinfrastructure (To be or not to be?)
IV. Super Computing, Redefined
Division of Biocomputing
http://biocomp.health.unm.edu/
Department of Biochemistry & Molecular Biology
School of Medicine

Also affiliated with the NIH Roadmap-funded UNM Center for Molecular Discovery
 Biomolecular screening informatics
 Cheminformatics
 Bioinformatics
 Genomics
 Virtual screening
 Molecular modeling
 SAR (Structure-Activity Relationship)
 Data mining, machine learning
 3D visualization
 Public data integration
 Collaborations in chemistry, biology, medicine, comp sci
 BIOMED 505 course
 Software development, management, deployment & support
Larry Sklar, et al., UNMCMD (NIH Roadmap)




                               ~$20M NIH awarded to date
 32 cpu Linux cluster
 32GB RAM server
 Linux: OpenSUSE, CentOS, RedHat, Fedora, Ubuntu
 SGI/IRIX
 Windows, Mac OS X
 Automated integration with NIH databases
 2+ Oracle instances
 PostgreSQL, MySQL
 Stereo graphics workstation
 25+ scientific software packages
 Supported in-house applications


     We are cyberinfrastructure users and providers!
Virtual chemistry: property prediction, chemspace navigation, computer aided molecular design, graph theory, databases
 Nucleotide and protein sequence analysis
 Genomics, proteomics


 Merging with chemical biology, etc.
 Computational search for likely biological actives
 Database may be real or virtual compounds
 2D and 3D methods
 2D similarity search
 3D similarity search (shape, pharmacophore)
 Docking (3D, protein binding site)

Example: 3D shape search; Prozac & Paxil (c/o OpenEye ROCS)
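
The 2D similarity search listed above can be illustrated with a small sketch. The slides do not say which metric or fingerprint is used; this example assumes the widely used Tanimoto coefficient over sets of structural feature ids. The compound names and feature ids are made up for illustration, and a real screen would derive fingerprints with a cheminformatics toolkit.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(fp_a: FrozenSet[int], fp_b: FrozenSet[int]) -> float:
    """Tanimoto coefficient: shared features / total distinct features."""
    if not fp_a and not fp_b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


def similarity_search(
    query: FrozenSet[int],
    database: Dict[str, FrozenSet[int]],
    threshold: float = 0.5,
) -> List[Tuple[str, float]]:
    """Return (name, score) pairs scoring at or above threshold, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


# Hypothetical toy fingerprints; the feature ids carry no chemical meaning.
database = {
    "cmpd_A": frozenset({1, 2, 3, 4}),
    "cmpd_B": frozenset({1, 2, 5}),
    "cmpd_C": frozenset({7, 8, 9}),
}
query = frozenset({1, 2, 3})
hits = similarity_search(query, database, threshold=0.4)
print(hits)
```

The database here is an in-memory dictionary; the same scoring loop applies whether the compounds are real or virtual, which is the point made on the slide.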
atoms, bonds, surfaces, fields, interactions, stereo




     serotonin




                                  hemoglobin
Computational models for protein-ligand binding


           Abl kinase
           (1iep.pdb)
                                                                                 interaction potentia
                                                                                 hydrophobic (green
                                                                                 hbond acceptors (r


Gleevec in binding site




                     Gleevec is a leukemia drug known to bind with Abl kinase.
(Watch movies...)



PyMOL movie:
http://video.google.com/videoplay?docid=-5859274887925224981#

Jmol interactive DNA modeling demo:
http://chemapps.stolaf.edu/pe/protexpl/htm/top.htm?id=1d66&&&chpa=true




  Expert users can advance understanding via rich, dynamic, visual interfaces.
E.g., Searching NIH PubChem for non-selectivity
Many biomedical data sources worldwide




Division of Biocomputing in 2008
 Rapid change, challenge and opportunity
 Learning from history, trends (new not enough)


 Winners and losers


 Science, experts have led and followed.


 ~1980-2010 covers 3σ (99.7%)


 And evolution...
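
The "3σ (99.7%)" figure above can be checked with a short calculation. This sketch assumes the slide refers to the normal distribution, where the fraction of values within k standard deviations of the mean is erf(k/√2). The function name is illustrative.

```python
import math


def normal_coverage(k: float) -> float:
    """Fraction of a normal distribution lying within k standard deviations."""
    return math.erf(k / math.sqrt(2.0))


coverage_3_sigma = normal_coverage(3.0)
print(f"{coverage_3_sigma:.4f}")  # prints 0.9973
```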
1977: Atari 2600
1978: Space Invaders
1981: IBM-PC (MS-DOS)
1983: cellphone
1983: GNU Project
1984: Neuromancer,
  William Gibson,
  “cyberspace”
1984: Apple Mac, mouse,
  windows & icons
1985: Oracle 5 (client-server)
1989: Intel 486 (~1M
  transistors; 50MHz by 1991)
1990: MS Windows 3.0
1990: WWW (Berners-Lee)
1991: High Perf Comp &
  Comm Act (Al Gore)
1991: Linux (Linus Torvalds)
1991: AOL
1991: ETrade
1993: Jurassic Park (via SGI)
1993: NCSA Mosaic
1994: Netscape Navigator
1994: “Good Times” hoax
1994: Match.com
1995: “Concept” virus (Word)
1995: Internet Explorer
1995: Apache project
1995: Yahoo!
1995: Amazon.com
1995: My mother gets email
1997: Google
1997: eBay
1999: Melissa virus (Outlook)
1999: Napster (p2p)
2000: MS found to have violated antitrust law
2000: 3M USA broadband*
2000: dot-com bubble pops

 *Fixed non dial-up internet connections >56k (FCC).
2000: 802.11b wireless
2001: Apple iPod
2001: Apple iTunes
2001: Wikipedia
2003: Skype
2005: YouTube
2005: Rio power grid hacked
2005: NSA domestic surveillance
2006: Facebook
2006: Amazon Cloud
2007: DOD hacked
2008: 70M USA broadband*
2009: Cyberdefense USA priority
2009: Twitter role in Iran election
  protests
2010: UAVs are SOPs
2011: Cyber terrorism?


 *Fixed non dial-up internet connections >56k (FCC).
The dotted line keeps moving...

Case study: database cheminformatics in pharma research, 1990→2000.
 In 1990, high speed chemical searching was beyond standard capabilities.
 Research groups managed local servers in their labs & specialized DB engines (e.g. Daylight Inc.).
 By 2000, this function had moved to IT corporate informatics infrastructure (via Oracle cartridges, etc.).
 Transition not smooth, but very beneficial.
Standard chemical searching functions: substructure, similarity, identity
[Figure: example structures labeled cocaine and imidazoles]
(1) office equipment
(2) lab equipment
(3) experimental apparatus
(4) the experiment
(5) a commodity
(6) custom configured experimental
  vehicle for exploration
(7) all of the above
 Scientific software
 Computational science
 Commodity software
 Engineering enables science
 Science requires agile development, high performance, experimentation, risk taking, play.
 Cyberinfrastructure users and developers/maintainers

 Scientific research
 Computational research
 High performance computing as a research tool
 High performance infrastructure as a productivity tool
 Scientific software for experts
 Enabling software for scientists
 Commoditization (e.g. cloud computing)
 Plumbing vs. experimental apparatus
 Appropriate tiers and domains
IT: “Poorly managed computers and needy ill-trained users put the system at risk.”
Research: “We need power, flexibility and access and not another lame PC.”
And with other cyberfolks too. And with great results.
 In ~5 yrs, super → un-super
 Super computing? Define computer.


 Advances from unexpected places:


           gaming, movies (graphics -- vs. AI)
           social networking (crowdsourcing)
           even business (web standards, UIs, security)
 Super computing is pushing the current limits
 But where are the key frontiers?
Advances from unexpected places...
Colossus code breaking computer, UK.
Eniac computer, Univ of Pennsylvania.
Cray computer
High performance (super) computing is pushing the current limits.
This is what a “computer” looks like.
“The network is the computer.” - John Gage (Sun, NetDay founder)
Corollaries:
 The network is the (semantic) database


 The network is cyberspace


 The network is us too
 Super users → super computing
  Blackbox AI/monolith paradigm limiting


  Human/computer co-evolution




Cytoscape biological network visualizer with drug-target interactions
“Super Computers” @ Division of Biocomputing
 Tudor Oprea
 Cristian Bologa
 Stephen Mathias
 Oleg Ursu
 Jerome Abear
 Ramona Curpan
 Liliana Halip
 Andrei Leitao

Happy Earth Day!

Jeremy Yang
jjyang@salud.unm.edu



                    Cyberinfrastructure Day -- April 22, 2010

Cyberinfrastructure Day 2010: Applications in Biocomputing

  • 1. Jeremy Yang Software Systems Manager Division of Biocomputing Dept. of Biochemistry & Molecular Biology UNM School of Medicine Cyberinfrastructure Day -- April 22, 2010
  • 2. I. What is Biocomputing? II. Cyber Revolution (~1980-2010+) III. Cyberinfrastructure (To be or not to be?) IV. Super Computing, Redefined
  • 3. Division of Biocomputing http://biocomp.health.unm.edu/ Department of Biochemistry & Molecular Biology School of Medicine Also affiliated with the NIH Roadmap-funded UNM Center for Molecular Discovery
  • 4.   Biomolecular screening  Data mining, machine informatics learning   Cheminformatics   3D visualization   Bioinformatics   Public data integration   Genomics   Collaborations in   Virtual screening chemistry, biology, medicine, comp sci   Molecular modeling   BIOMED 505 course   SAR (Structure- Activity-Relationship)   Software development, management, deployment & support
  • 5. Larry Sklar, et al., UNMCMD (NIH Roadmap) ~$20M NIH awarded to date
  • 6. 32 cpu Linux cluster; 32GB RAM server; Linux: OpenSUSE, CentOS, RedHat, Fedora, Ubuntu; SGI/IRIX; Windows, Mac OS X; Automated integration with NIH databases; 2+ Oracle instances; PostgreSQL, MySQL; Stereo graphics workstation; 25+ scientific software packages; Supported in-house applications. We are cyberinfrastructure users and providers!
  • 7.
  • 8. Virtual chemistry; property prediction, chemspace navigation, computer aided molecular design, graph theory, databases
  • 9. Nucleotide and protein sequence analysis; Genomics, proteomics; Merging with chemical biology, etc.
  • 10. Computational search for likely biological actives; Database may be real or virtual compounds; 2D and 3D methods: 2D similarity search, 3D similarity search (shape, pharmacophore), docking (3D, protein binding site). Example: 3D shape search; prozac & paxil, c/o OpenEye ROCS
  • 11. atoms, bonds, surfaces, fields, interactions, stereo serotonin hemoglobin
  • 12. Computational models for protein-ligand binding. Abl kinase (1iep.pdb). Interaction potentials: hydrophobic (green), hbond acceptors (r...). Gleevec in binding site. Gleevec is a leukemia drug known to bind with Abl kinase.
  • 13. (Watch movies...) PyMol movie: http://video.google.com/videoplay?docid=-5859274887925224981# Jmol interactive DNA modeling demo: http://chemapps.stolaf.edu/pe/protexpl/htm/top.htm?id=1d66&&&chpa=true Expert users can advance understanding via rich, dynamic, visual interfaces.
  • 14. E.g., Searching NIH PubChem for non-selectivity
  • 15. Many biomedical data sources worldwide SLIDE 15 (15 MIN?)
  • 17. Rapid change, challenge and opportunity; Learning from history, trends (new not enough); Winners and losers; Science, experts have led and followed; ~1980-2010 covers 3σ (99.7%); And evolution...
  • 18. Rapid change, challenge and opportunity; Learning from history, trends; Winners and losers; Science, experts have led and followed; ~1980-2010 covers 3σ (99.7%); And evolution...
  • 19. 1977: Atari 2600; 1978: Space Invaders; 1981: IBM-PC (MS-DOS); 1983: cellphone; 1983: GNU Project; 1984: Neuromancer, William Gibson, “cyberspace”; 1984: Apple Mac, mouse, windows & icons
  • 20. 1985: Oracle 5 (client-server); 1989: Intel 486 (1M transistors, 50MHz); 1990: MS Windows 3.0; 1990: WWW (Berners-Lee); 1991: High Perf Comp & Comm Act (Al Gore); 1991: Linux (Linus Torvalds); 1991: AOL; 1991: ETrade
  • 21. 1993: Jurassic Park (via SGI); 1993: NCSA Mosaic; 1994: Netscape Navigator; 1994: “Good Times” hoax; 1994: Match.com; 1995: “Concept” virus (Word); 1995: Internet Explorer; 1995: Apache project; 1995: Yahoo!
  • 22. 1995: Amazon.com; 1995: My mother gets email; 1997: Google; 1997: eBay; 1999: Melissa virus (Outlook); 1999: Napster (p2p); 2000: MS convicted; 2000: 3M USA broadband*; 2000: dot-com bubble pops. *Fixed non dial-up internet connections >56k (FCC).
  • 23. 2000: 802.11b wireless; 2001: Apple iPod; 2001: Apple iTunes; 2001: Wikipedia; 2003: Skype; 2005: YouTube; 2005: Rio power grid hacked; 2005: NSA domestic surveillance; 2006: Facebook
  • 24. 2006: Amazon Cloud; 2007: DOD hacked; 2008: 70M USA broadband*; 2009: Cyberdefense USA priority; 2009: Twitter role in Iran election protests; 2010: UAVs are SOPs; 2011: Cyber terrorism? *Fixed non dial-up internet connections >56k (FCC).
  • 25. The dotted line keeps moving... Case study: database cheminformatics in pharma research, 1990→2000.
  • 26. In 1990, high speed chemical searching was beyond standard capabilities; Research groups managed local servers in their labs & specialized DB engines (e.g. Daylight Inc.); By 2000, this function had moved to IT and the corporate informatics infrastructure (via Oracle cartridges, etc.); Transition not smooth, but very beneficial.
  • 27. Standard chemical searching functions: substructure, similarity, identity. (Figure labels: cocaine, imidazoles)
  • 28. (1) office equipment (2) lab equipment (3) experimental apparatus (4) the experiment (5) a commodity (6) custom configured experimental vehicle for exploration (7) all of the above
  • 29. (1) office equipment (2) lab equipment (3) experimental apparatus (4) the experiment (5) a commodity (6) custom configured experimental vehicle for exploration (7) all of the above
  • 30. Scientific software; Computational science; Commodity software; Engineering enables science; Science requires agile development, high performance, experimentation, risk taking, play; Cyberinfrastructure users and developers/maintainers. SLIDE 30 (30 MIN?)
  • 31. Scientific research; Computational research; High performance computing as a research tool; High performance infrastructure as an experimental apparatus; Scientific software for experts; Enabling software for scientists; Commoditization (e.g. cloud computing); Plumbing vs. productivity tool; Appropriate tiers and domains
  • 32. IT: “Poorly managed computers and needy ill-trained users put the system at risk.” Research: “We need power, flexibility and access and not another lame PC.”
  • 33. And with other cyberfolks too. And with great results.
  • 34. In ~5 yrs, super → un-super; Super computing? Define computer; Advances from unexpected places: gaming, movies (graphics -- vs. AI), social networking (crowdsourcing), even business (web standards, UIs, security); Super computing is pushing the current limits; But where are the key frontiers?
  • 36. Colossus code breaking computer, UK.
  • 37. ENIAC computer, Univ of Pennsylvania.
  • 39.
  • 40. SLIDE 40 (40 MIN?)
  • 41.
  • 42. High performance (super) computing is pushing the current limits.
  • 43.
  • 44.
  • 45.
  • 46. This is what a “computer” looks like.
  • 47.
  • 48. “The network is the computer.” - John Gage (Sun, NetDay founder)
  • 49. Corollaries: The network is the (semantic) database; The network is cyberspace; The network is us too
  • 50. Super users → super computing; Blackbox AI/monolith paradigm limiting; Human/computer co-evolution. Cytoscape biological network visualizer with drug-target interactions
  • 51.
  • 52. “Super Computers” @ Division of Biocomputing: Tudor Oprea, Cristian Bologa, Stephen Mathias, Oleg Ursu, Jerome Abear, Ramona Curpan, Liliana Halip, Andrei Leitao. Happy Earth Day! Jeremy Yang, jjyang@salud.unm.edu. Cyberinfrastructure Day -- April 22, 2010
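
The 2D similarity search named on slides 10 and 27 can be illustrated with a minimal sketch. It assumes each molecule has already been reduced to a set of integer feature IDs (a stand-in for a real 2D fingerprint) and ranks database entries by the Tanimoto coefficient, a common measure for this kind of search. The talk does not say which measure or toolkit was used, so the function names, the toy fingerprints, and the 0.5 threshold are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Tanimoto coefficient: shared features / features present in either set."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def similarity_search(
    query: Fingerprint,
    database: Dict[str, Fingerprint],
    threshold: float = 0.5,  # hypothetical cutoff, not from the talk
) -> List[Tuple[str, float]]:
    """Return (name, score) pairs scoring at least `threshold`, best first."""
    scored = [(name, tanimoto(query, fp)) for name, fp in database.items()]
    hits = [(name, score) for name, score in scored if score >= threshold]
    # Sort by descending score, then by name so ties are deterministic.
    return sorted(hits, key=lambda hit: (-hit[1], hit[0]))


# Toy fingerprints (hypothetical feature IDs, not real molecules).
DATABASE: Dict[str, Fingerprint] = {
    "mol_A": frozenset({1, 2, 3, 4}),
    "mol_B": frozenset({1, 2, 3, 9}),
    "mol_C": frozenset({7, 8}),
}
QUERY: Fingerprint = frozenset({1, 2, 3, 4})

if __name__ == "__main__":
    for name, score in similarity_search(QUERY, DATABASE):
        print(f"{name}\t{score:.2f}")
```

In practice a cheminformatics toolkit would generate the fingerprints from chemical structures; the ranking step would look much the same.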