SlideShare a Scribd company logo
1 of 54
Download to read offline
Open Source Visualization of
      Scientific Data
          8 August 2011
         Dr. Marcus D. Hanwell
         marcus.hanwell@kitware.com




                                      1	
  
Outline
•    Background
•    Why is open science important?
•    Opening up chemistry over the last four years
•    The Visualization Toolkit (VTK)
•    ParaView – a client-server Qt based VTK GUI
•    New frontiers – web, mobile and tablets
•    Future directions



                                                     2	
  
My Background
•  Ph.D. (Physics) – University of Sheffield
•  Google Summer of Code – Avogadro
•  Postdoc (Chemistry) – University of Pittsburgh
•  R&D engineer – Kitware, Inc
•  Passionate about physics, chemistry, and the
   growing need to improve computational tools
•  See the need for powerful open source, cross
   platform frameworks and applications
•  Develop(ed): Gentoo, KDE, Kalzium, Avogadro,
   Open Babel, VTK, ParaView, Titan, CMake

                                                    3	
  
Kitware
•  Founded in 1998: 5 former GE Research employees
•  95 employees: 42% PhD
•  Privately held, profitable from creation, no debt
•  Rapidly Growing: >30% in 2010, 7M web-visitors/quarter
•  Offices                                  •  2011 Small Business
   –  Albany, NY                               Administration’s
                                               Tibbetts Award
   –  Carrboro, NC
                                            •  HPCWire Readers
   –  Lyon, France                             and Editor’s Choice

   –  Bangalore, India                      •  Inc’s 5000 List: 2008
                                               to 2010
Kitware: Core Technologies




                CMake




                CDash




                             5	
  
What Is “Open Science”?

“Open science is the idea that scientific
knowledge of all kinds should be openly
shared as early as is practical in the
discovery process.”
                          openscience.org



                                            6	
  
What Is The Problem?
“…when the journal system was developed
in the 17th and 18th centuries it was an
excellent example of open science. The
journals are perhaps the most open system
for the dissemination of knowledge that can
be constructed — if you’re working with 17th
century technology. But, of course, today
we can do a lot better.”
                          openscience.org
                                               7	
  
Opening Up Chemistry
•  Computational chemistry is currently one
   of the more closed sciences
•  Lots of black box proprietary codes
  –  Only a few have access to the code
  –  Publishing results from black box codes
  –  Many file formats in use, little agreement
•  More papers should be including data
•  Growing need for open standards

                                                  8	
  
Movements for Open Chemistry
•  Formed an “unorganization” – Blue Obelisk
  –  Published first article in 2005
  –  Open data, open standards and open source
  –  Meet at ACS and other conferences when possible
  –  Follow-up article currently in press



•  Quixote collaboration more recently
  –  Provide meaningful data storage and exchange
  –  Principally targeting computational chemistry

                                                       9	
  
Typical Chemistry Workflow

        Log File                      Input File
                   Edit/Analyze	
  




   Results	
           Data	
         Job	
  Submission	
  




                                        Local
                   Calcula@on	
        Remote
                                                          10	
  
Problem: Pretty Complex/Manual
•    Most steps require user intervention
•    Obtain starting structure (previous work, databases)
•    Edit structure
•    Write input file
•    Move input file to cluster
•    Submit to queue
•    Wait for completion
•    Retrieve input file
•    Analyze output file
•    Extract the relevant data, change formats
•    Store results
•    Repeat

                                                            11	
  
Improved Chemistry Workflow

       Log File                      Input File
                  Edit/Analyze	
  



  Results	
           Data	
         Job	
  Submission	
  




                                       Local
                  Calcula@on	
        Remote

                                                         12	
  
Avogadro
•  Project began 2006
•  Split into library and
application (plugin based)
•  One of very few open source editors
•  Designed to be extensible from the start
•  Generate input & read output from many codes
•  An active and growing community
•  Chemistry needs a free, open framework


                                                  13	
  
Avogadro’s Roots
•  Avogadro projected started in 2006
•  First funded work in 2007 by Marcus Hanwell
  –  Google Summer of Code student
  –  Final year of Ph.D. spent the summer coding
  –  Funded as part of KDE project – Kalzium editor
•  Built on several other open source projects
  –  Qt, Eigen, Open Babel, Blue Obelisk Data Repository
•  Also uses open standards, e.g. OpenGL
•  Cross platform, open source stack

                                                       14	
  
Avogadro Vital Statistics
•    Supports Linux, Windows and Mac OS X
•    Contributions from over 20 developers
•    Over 180,000 downloads over 4 years
•    Translated into 19 languages
•    Used by Kalzium for molecular editor
•    Featured by Trolltech/Nokia,
     –  Qt in use
     –  Qt ambassador program
                                             15	
  
16	
  
Desktop Database
•  Use of “document store” NoSQL
  •  Doesn’t force too much structure
     •  Some entries have experimental data available
     •  Some have computational jobs
  •  Employ a “pile of stuff” approach
     •  Can store both source and derived data
     •  Calculate identifiers, QSAR properties, etc
•  MongoDB is a scalable, open solution
  •  Proven scaling with large web applications

                                                        17	
  
Chemistry Data Explorer
•    Qt application
•    Connects to local or remote database
•    Uses VTK for visual data exploration
•    Can ingest new data
     –  Uses Open Babel to generate descriptors
     –  Standard InChi, SMILES, molecular weight
     –  More could be added
        •  All derived from files stored in the database

                                                           18	
  
Chemistry Data Explorer




                          19	
  
Database Interaction on the Web
•  Avogadro directly accesses some (read-
   only) public databases:
  •  PDB, NIH “fetch by name”
  •  More could be added
•  ChemData follows this approach
•  Quixote aims to support both public and
   private sharing models – open framework


                                             20	
  
Z-Matrix/Cartesian Molecule Editor




                                     21	
  
Avogadro




           22	
  
Calling Stand Alone Programs
•  Many already supported:
  •  GAMESS, GAMESS-UK, Molpro, Q-Chem,
     MOPAC, NWChem, Gaussian, Dalton
  •  Easy to add more
•  Some codes writing Avogadro based
   custom applications,
  •  Q-Chem, Molpro…
•  DLPOLY author approached me:
  •  Open sourced DLPOLY2, want a GUI

                                          23	
  
GAMESS Input Generation




                          24	
  
OpenQube – Quantum Data
•  Reads in key quantum data
  –  Basis set used in calculation
  –  Eigenvectors for molecular orbitals
  –  Density matrix for electron density
  –  Standard geometry
•  Multithreaded calculation
  –  Produce regular grids of scalar data
  –  Molecular orbitals, electron density…

                                             25	
  
Molecular Orbitals and Electron Density

•  Quantum files store basis sets and
   matrices
                −αr 2
  GTO = ce
  φ i = ∑ c µiφ µ
        µ

  ρ(r) = ∑ ∑ Pµν φ µ φν
            µ   ν
•  Using these equations, and the supplied
   matrices – calculate cubes
                                             26	
  
Advanced Visualization: VTK
•  New Avogadro plugin:
  •  Takes volumetric data from Avogadro
  •  Uses GPU accelerated rendering in VTK
•  Widespread excitement from many in the
   chemistry community
•  Several groups interested in collaborating
•  Google Summer of Code project
•  Leverage significant capabilities in VTK
                                                27	
  
Volume Rendered With Contours




                                28	
  
Electron Density Volume Render




                                 29	
  
Electron Density Ray Tracing




                               30	
  
VTK: The Toolkit
•  Collection of C++ libraries
  –  Leveraged by many applications
  –  Divided into logical areas, e.g.
     •  Filtering – data processing in visualization pipeline
     •  InfoVis – informatics visualization
     •  Widgets – 3D interaction widgets
     •  VolumeRendering – 3D volume rendering
•  Cross platform, using OpenGL
•  Wrapped in Python, Tcl and Java

                                                            31	
  
VTK Development Team
 • From Ohloh: Very large, active development team: Over the past twelve
  months, 100 developers contributed new code to VTK. This is one of the largest open-source
  teams in the world, and is in the top 2% of all project teams on Ohloh.




                                                      and many others...
                                                                                               32	
  
ParaView
•    Parallel visualization application
•    Open source, BSD licensed
•    Turn-key application wrapper around VTK
•    Parallel data processing and rendering




                                               33	
  
Large Data Visualization
•  BlueGene/L at LLNL
  –  65,536 compute nodes (32 bit PPC)
  –  1,024 I/O nodes (32 bit PPC)
  –  512 MB of RAM per node
•  Sandia Red Storm
  –  12,960 compute nodes (AMD Opteron dual)
  –  640 service and I/O nodes
  –  40 TB of DDR RAM per node

                                               34	
  
1 Billion Cell Asteroid Simulation




                                     35	
  
Tiled Displays




                 36	
  
Parallel Processing/Rendering




                                37	
  
3D Chemistry Visualization
•  Some existing features specific to chemistry
   –  Gaussian cube, PDB, and a few others
•  Excellent handling of volumetric data:
   –  Marching cubes
   –  Volume rendering
   –  Contouring
•  Advanced rendering:
   –  Point sprites
   –  Manta – real time ray tracing

                                                  38	
  
Titan: VTK and Informatics
•  Led by Sandia National Laboratories
•  Substantial expansion of VTK:
   –  Informatics & analysis
•  Actively developed, growing feature set
•  Improved 2D rendering and API
•  Database connectivity, client-server, pipeline
   based approach
•  Uses web technologies such as ProtoViz
•  Scalable, interactive infoviz

                                                    39	
  
Manta: Real Time Ray Tracing




                               40	
  
New Frontiers
•  New work porting VTK
  –  Use C++ as the common core
    •  iOS port in the early stages
    •  Android port
  –  Use OpenGL ES 2.0 – new rendering code
•  Also ParaViewWeb – delivering over web
  –  Use image delivery and rendering on server
  –  Also using WebGL for rendering (optionally)


                                                   41	
  
Future Directions
•  VTK modularization (in progress)
   –  Developing more agile build systems
   –  Automating more with CMake
•  Using Git more fully to improve stability
   –  Use of master and next
   –  Topic branches - merge when ready
•  Code review using Gerrit
   –  Integration with continuous integration
   –  Test before merge


                                                42	
  
Standard Representations




                           43	
  
Standard Representations




                           44	
  
Volumetric Data: Molecular Orbitals




                                  45	
  
Biomolecules




               46	
  
Nanomaterials




                47	
  
Periodic Systems




                   48	
  
Simplified Views




                   49	
  
Hybrid Views: CPK + MO + Ball & Stick




                                    50	
  
Linked Views of Live Data




                            51	
  
2D: Graphs and Charts




                        52	
  
Informatics




              53	
  
3D Interaction Widgets




                         54	
  

More Related Content

What's hot

Evolution of database access technologies in Java-based software projects
Evolution of database access technologies in Java-based software projectsEvolution of database access technologies in Java-based software projects
Evolution of database access technologies in Java-based software projectsTom Mens
 
Survival analysis of database technologies in open source Java projects
Survival analysis of database technologies in open source Java projectsSurvival analysis of database technologies in open source Java projects
Survival analysis of database technologies in open source Java projectsTom Mens
 
LDV: Light-weight Database Virtualization
LDV: Light-weight Database VirtualizationLDV: Light-weight Database Virtualization
LDV: Light-weight Database VirtualizationTanu Malik
 
PTU: Using Provenance for Repeatability
PTU: Using Provenance for RepeatabilityPTU: Using Provenance for Repeatability
PTU: Using Provenance for RepeatabilityTanu Malik
 
GlobusWorld 2015
GlobusWorld 2015GlobusWorld 2015
GlobusWorld 2015Tanu Malik
 
Tim Pugh-SPEDDEXES 2014
Tim Pugh-SPEDDEXES 2014Tim Pugh-SPEDDEXES 2014
Tim Pugh-SPEDDEXES 2014aceas13tern
 
CloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use CaseCloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use CaseCloudLightning
 
Best pratices at BGI for the Challenges in the Era of Big Genomics Data
Best pratices at BGI for the Challenges in the Era of Big Genomics DataBest pratices at BGI for the Challenges in the Era of Big Genomics Data
Best pratices at BGI for the Challenges in the Era of Big Genomics DataXing Xu
 
GEN: A Database Interface Generator for HPC Programs
GEN: A Database Interface Generator for HPC ProgramsGEN: A Database Interface Generator for HPC Programs
GEN: A Database Interface Generator for HPC ProgramsTanu Malik
 
Cytoscape: Now and Future
Cytoscape: Now and FutureCytoscape: Now and Future
Cytoscape: Now and FutureKeiichiro Ono
 
WBDB 2015 Performance Evaluation of Spark SQL using BigBench
WBDB 2015 Performance Evaluation of Spark SQL using BigBenchWBDB 2015 Performance Evaluation of Spark SQL using BigBench
WBDB 2015 Performance Evaluation of Spark SQL using BigBencht_ivanov
 
What's New in Cytoscape
What's New in CytoscapeWhat's New in Cytoscape
What's New in CytoscapeKeiichiro Ono
 
GeoDataspace: Simplifying Data Management Tasks with Globus
GeoDataspace: Simplifying Data Management Tasks with GlobusGeoDataspace: Simplifying Data Management Tasks with Globus
GeoDataspace: Simplifying Data Management Tasks with GlobusTanu Malik
 
Ipaw14 presentation Quan, Tanu, Ian
Ipaw14 presentation Quan, Tanu, IanIpaw14 presentation Quan, Tanu, Ian
Ipaw14 presentation Quan, Tanu, IanBoris Glavic
 
How to expand the Galaxy from genes to Earth in six simple steps (and live sm...
How to expand the Galaxy from genes to Earth in six simple steps (and live sm...How to expand the Galaxy from genes to Earth in six simple steps (and live sm...
How to expand the Galaxy from genes to Earth in six simple steps (and live sm...Raffaele Montella
 

What's hot (20)

Evolution of database access technologies in Java-based software projects
Evolution of database access technologies in Java-based software projectsEvolution of database access technologies in Java-based software projects
Evolution of database access technologies in Java-based software projects
 
Survival analysis of database technologies in open source Java projects
Survival analysis of database technologies in open source Java projectsSurvival analysis of database technologies in open source Java projects
Survival analysis of database technologies in open source Java projects
 
LDV: Light-weight Database Virtualization
LDV: Light-weight Database VirtualizationLDV: Light-weight Database Virtualization
LDV: Light-weight Database Virtualization
 
Bioinformatics on Azure
Bioinformatics on AzureBioinformatics on Azure
Bioinformatics on Azure
 
PTU: Using Provenance for Repeatability
PTU: Using Provenance for RepeatabilityPTU: Using Provenance for Repeatability
PTU: Using Provenance for Repeatability
 
GlobusWorld 2015
GlobusWorld 2015GlobusWorld 2015
GlobusWorld 2015
 
Tim Pugh-SPEDDEXES 2014
Tim Pugh-SPEDDEXES 2014Tim Pugh-SPEDDEXES 2014
Tim Pugh-SPEDDEXES 2014
 
CloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use CaseCloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use Case
 
Best pratices at BGI for the Challenges in the Era of Big Genomics Data
Best pratices at BGI for the Challenges in the Era of Big Genomics DataBest pratices at BGI for the Challenges in the Era of Big Genomics Data
Best pratices at BGI for the Challenges in the Era of Big Genomics Data
 
GEN: A Database Interface Generator for HPC Programs
GEN: A Database Interface Generator for HPC ProgramsGEN: A Database Interface Generator for HPC Programs
GEN: A Database Interface Generator for HPC Programs
 
Cytoscape: Now and Future
Cytoscape: Now and FutureCytoscape: Now and Future
Cytoscape: Now and Future
 
Hybrid Cloud for CERN
Hybrid Cloud for CERN Hybrid Cloud for CERN
Hybrid Cloud for CERN
 
WBDB 2015 Performance Evaluation of Spark SQL using BigBench
WBDB 2015 Performance Evaluation of Spark SQL using BigBenchWBDB 2015 Performance Evaluation of Spark SQL using BigBench
WBDB 2015 Performance Evaluation of Spark SQL using BigBench
 
Technical Challenges and Approaches to Build an Open Ecosystem of Heterogeneo...
Technical Challenges and Approaches to Build an Open Ecosystem of Heterogeneo...Technical Challenges and Approaches to Build an Open Ecosystem of Heterogeneo...
Technical Challenges and Approaches to Build an Open Ecosystem of Heterogeneo...
 
What's New in Cytoscape
What's New in CytoscapeWhat's New in Cytoscape
What's New in Cytoscape
 
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
 
GeoDataspace: Simplifying Data Management Tasks with Globus
GeoDataspace: Simplifying Data Management Tasks with GlobusGeoDataspace: Simplifying Data Management Tasks with Globus
GeoDataspace: Simplifying Data Management Tasks with Globus
 
Ipaw14 presentation Quan, Tanu, Ian
Ipaw14 presentation Quan, Tanu, IanIpaw14 presentation Quan, Tanu, Ian
Ipaw14 presentation Quan, Tanu, Ian
 
How to expand the Galaxy from genes to Earth in six simple steps (and live sm...
How to expand the Galaxy from genes to Earth in six simple steps (and live sm...How to expand the Galaxy from genes to Earth in six simple steps (and live sm...
How to expand the Galaxy from genes to Earth in six simple steps (and live sm...
 
Iwesep19.ppt
Iwesep19.pptIwesep19.ppt
Iwesep19.ppt
 

Similar to Open Source Visualization of Scientific Data

Novo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNovo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNeo4j
 
Reproducibility - The myths and truths of pipeline bioinformatics
Reproducibility - The myths and truths of pipeline bioinformaticsReproducibility - The myths and truths of pipeline bioinformatics
Reproducibility - The myths and truths of pipeline bioinformaticsSimon Cockell
 
E Afgan - Zero to a bioinformatics analysis platform in four minutes
E Afgan - Zero to a bioinformatics analysis platform in four minutesE Afgan - Zero to a bioinformatics analysis platform in four minutes
E Afgan - Zero to a bioinformatics analysis platform in four minutesJan Aerts
 
QuantCell Research - The Big Data Spreadsheet
QuantCell Research - The Big Data SpreadsheetQuantCell Research - The Big Data Spreadsheet
QuantCell Research - The Big Data Spreadsheetinside-BigData.com
 
OpenStack Documentation in the Open
OpenStack Documentation in the OpenOpenStack Documentation in the Open
OpenStack Documentation in the OpenAnne Gentle
 
Clouds, Clusters, and Containers: Tools for responsible, collaborative computing
Clouds, Clusters, and Containers: Tools for responsible, collaborative computingClouds, Clusters, and Containers: Tools for responsible, collaborative computing
Clouds, Clusters, and Containers: Tools for responsible, collaborative computingMatthew Vaughn
 
CloudLab Overview
CloudLab OverviewCloudLab Overview
CloudLab OverviewEd Dodds
 
IMPACT Interoperability Framework - Clemens Neudecker
IMPACT Interoperability Framework - Clemens NeudeckerIMPACT Interoperability Framework - Clemens Neudecker
IMPACT Interoperability Framework - Clemens NeudeckerIMPACT Centre of Competence
 
The Tale of Two Deployments: Greenfield and Monolith Apps with Docker Enterpr...
The Tale of Two Deployments: Greenfield and Monolith Apps with Docker Enterpr...The Tale of Two Deployments: Greenfield and Monolith Apps with Docker Enterpr...
The Tale of Two Deployments: Greenfield and Monolith Apps with Docker Enterpr...Docker, Inc.
 
GEO Analytics Canada Overview April 2020
GEO Analytics Canada Overview April 2020GEO Analytics Canada Overview April 2020
GEO Analytics Canada Overview April 2020GEO Analytics Canada
 
How static analysis supports quality over 50 million lines of C++ code
How static analysis supports quality over 50 million lines of C++ codeHow static analysis supports quality over 50 million lines of C++ code
How static analysis supports quality over 50 million lines of C++ codecppfrug
 
2018 ABRF Tools for improving rigor and reproducibility in bioinformatics
2018 ABRF Tools for improving rigor and reproducibility in bioinformatics2018 ABRF Tools for improving rigor and reproducibility in bioinformatics
2018 ABRF Tools for improving rigor and reproducibility in bioinformaticsStephen Turner
 
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...BigData_Europe
 
Running Mixed Workloads on Kubernetes at IHME
Running Mixed Workloads on Kubernetes at IHMERunning Mixed Workloads on Kubernetes at IHME
Running Mixed Workloads on Kubernetes at IHMETyrone Grandison
 
Utilising Cloud Computing for Research through Infrastructure, Software and D...
Utilising Cloud Computing for Research through Infrastructure, Software and D...Utilising Cloud Computing for Research through Infrastructure, Software and D...
Utilising Cloud Computing for Research through Infrastructure, Software and D...David Wallom
 
Data-intensive bioinformatics on HPC and Cloud
Data-intensive bioinformatics on HPC and CloudData-intensive bioinformatics on HPC and Cloud
Data-intensive bioinformatics on HPC and CloudOla Spjuth
 

Similar to Open Source Visualization of Scientific Data (20)

Novo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNovo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4j
 
Reproducibility - The myths and truths of pipeline bioinformatics
Reproducibility - The myths and truths of pipeline bioinformaticsReproducibility - The myths and truths of pipeline bioinformatics
Reproducibility - The myths and truths of pipeline bioinformatics
 
E Afgan - Zero to a bioinformatics analysis platform in four minutes
E Afgan - Zero to a bioinformatics analysis platform in four minutesE Afgan - Zero to a bioinformatics analysis platform in four minutes
E Afgan - Zero to a bioinformatics analysis platform in four minutes
 
Open sourcery
Open sourceryOpen sourcery
Open sourcery
 
QuantCell Research - The Big Data Spreadsheet
QuantCell Research - The Big Data SpreadsheetQuantCell Research - The Big Data Spreadsheet
QuantCell Research - The Big Data Spreadsheet
 
OpenStack Documentation in the Open
OpenStack Documentation in the OpenOpenStack Documentation in the Open
OpenStack Documentation in the Open
 
Clouds, Clusters, and Containers: Tools for responsible, collaborative computing
Clouds, Clusters, and Containers: Tools for responsible, collaborative computingClouds, Clusters, and Containers: Tools for responsible, collaborative computing
Clouds, Clusters, and Containers: Tools for responsible, collaborative computing
 
Beyond the Science Gateway
Beyond the Science GatewayBeyond the Science Gateway
Beyond the Science Gateway
 
CloudLab Overview
CloudLab OverviewCloudLab Overview
CloudLab Overview
 
IMPACT Interoperability Framework - Clemens Neudecker
IMPACT Interoperability Framework - Clemens NeudeckerIMPACT Interoperability Framework - Clemens Neudecker
IMPACT Interoperability Framework - Clemens Neudecker
 
The Tale of Two Deployments: Greenfield and Monolith Apps with Docker Enterpr...
The Tale of Two Deployments: Greenfield and Monolith Apps with Docker Enterpr...The Tale of Two Deployments: Greenfield and Monolith Apps with Docker Enterpr...
The Tale of Two Deployments: Greenfield and Monolith Apps with Docker Enterpr...
 
GEO Analytics Canada Overview April 2020
GEO Analytics Canada Overview April 2020GEO Analytics Canada Overview April 2020
GEO Analytics Canada Overview April 2020
 
How static analysis supports quality over 50 million lines of C++ code
How static analysis supports quality over 50 million lines of C++ codeHow static analysis supports quality over 50 million lines of C++ code
How static analysis supports quality over 50 million lines of C++ code
 
2018 ABRF Tools for improving rigor and reproducibility in bioinformatics
2018 ABRF Tools for improving rigor and reproducibility in bioinformatics2018 ABRF Tools for improving rigor and reproducibility in bioinformatics
2018 ABRF Tools for improving rigor and reproducibility in bioinformatics
 
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
 
COPO - Collaborative Open Plant Omics, by Rob Davey
COPO - Collaborative Open Plant Omics, by Rob DaveyCOPO - Collaborative Open Plant Omics, by Rob Davey
COPO - Collaborative Open Plant Omics, by Rob Davey
 
Virtualization for HPC at NCI
Virtualization for HPC at NCIVirtualization for HPC at NCI
Virtualization for HPC at NCI
 
Running Mixed Workloads on Kubernetes at IHME
Running Mixed Workloads on Kubernetes at IHMERunning Mixed Workloads on Kubernetes at IHME
Running Mixed Workloads on Kubernetes at IHME
 
Utilising Cloud Computing for Research through Infrastructure, Software and D...
Utilising Cloud Computing for Research through Infrastructure, Software and D...Utilising Cloud Computing for Research through Infrastructure, Software and D...
Utilising Cloud Computing for Research through Infrastructure, Software and D...
 
Data-intensive bioinformatics on HPC and Cloud
Data-intensive bioinformatics on HPC and CloudData-intensive bioinformatics on HPC and Cloud
Data-intensive bioinformatics on HPC and Cloud
 

Recently uploaded

Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesBoston Institute of Analytics
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024SynarionITSolutions
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Principled Technologies
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CVKhem
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProduct Anonymous
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024The Digital Insurer
 

Recently uploaded (20)

Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
HTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation StrategiesHTML Injection Attacks: Impact and Mitigation Strategies
HTML Injection Attacks: Impact and Mitigation Strategies
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024Top 10 Most Downloaded Games on Play Store in 2024
Top 10 Most Downloaded Games on Play Store in 2024
 
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
Deploy with confidence: VMware Cloud Foundation 5.1 on next gen Dell PowerEdg...
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024Manulife - Insurer Innovation Award 2024
Manulife - Insurer Innovation Award 2024
 

Open Source Visualization of Scientific Data

  • 1. Open Source Visualization of Scientific Data 8 August 2011 Dr. Marcus D. Hanwell marcus.hanwell@kitware.com 1  
  • 2. Outline •  Background •  Why is open science important? •  Opening up chemistry over the last four years •  The Visualization Toolkit (VTK) •  ParaView – a client-server Qt based VTK GUI •  New frontiers – web, mobile and tablets •  Future directions 2  
  • 3. My Background •  Ph.D. (Physics) – University of Sheffield •  Google Summer of Code – Avogadro •  Postdoc (Chemistry) – University of Pittsburgh •  R&D engineer – Kitware, Inc •  Passionate about physics, chemistry, and the growing need to improve computational tools •  See the need for powerful open source, cross platform frameworks and applications •  Develop(ed): Gentoo, KDE, Kalzium, Avogadro, Open Babel, VTK, ParaView, Titan, CMake 3  
  • 4. Kitware •  Founded in 1998: 5 former GE Research employees •  95 employees: 42% PhD •  Privately held, profitable from creation, no debt •  Rapidly Growing: >30% in 2010, 7M web-visitors/quarter •  Offices •  2011 Small Business –  Albany, NY Administration’s Tibbetts Award –  Carrboro, NC •  HPCWire Readers –  Lyon, France and Editor’s Choice –  Bangalore, India •  Inc’s 5000 List: 2008 to 2010
  • 5. Kitware: Core Technologies CMake CDash 5  
  • 6. What Is “Open Science”? “Open science is the idea that scientific knowledge of all kinds should be openly shared as early as is practical in the discovery process.” openscience.org 6  
  • 7. What Is The Problem? “…when the journal system was developed in the 17th and 18th centuries it was an excellent example of open science. The journals are perhaps the most open system for the dissemination of knowledge that can be constructed — if you’re working with 17th century technology. But, of course, today we can do a lot better.” openscience.org 7  
  • 8. Opening Up Chemistry •  Computational chemistry is currently one of the more closed sciences •  Lots of black box proprietary codes –  Only a few have access to the code –  Publishing results from black box codes –  Many file formats in use, little agreement •  More papers should be including data •  Growing need for open standards 8  
  • 9. Movements for Open Chemistry •  Formed an “unorganization” – Blue Obelisk –  Published first article in 2005 –  Open data, open standards and open source –  Meet at ACS and other conferences when possible –  Follow-up article currently in press •  Quixote collaboration more recently –  Provide meaningful data storage and exchange –  Principally targeting computational chemistry 9  
  • 10. Typical Chemistry Workflow Log File Input File Edit/Analyze   Results   Data   Job  Submission   Local Calcula@on   Remote 10  
  • 11. Problem: Pretty Complex/Manual •  Most steps require user intervention •  Obtain starting structure (previous work, databases) •  Edit structure •  Write input file •  Move input file to cluster •  Submit to queue •  Wait for completion •  Retrieve input file •  Analyze output file •  Extract the relevant data, change formats •  Store results •  Repeat 11  
  • 12. Improved Chemistry Workflow Log File Input File Edit/Analyze   Results   Data   Job  Submission   Local Calcula@on   Remote 12  
  • 13. Avogadro •  Project began 2006 •  Split into library and application (plugin based) •  One of very few open source editors •  Designed to be extensible from the start •  Generate input & read output from many codes •  An active and growing community •  Chemistry needs a free, open framework 13  
  • 14. Avogadro’s Roots •  Avogadro projected started in 2006 •  First funded work in 2007 by Marcus Hanwell –  Google Summer of Code student –  Final year of Ph.D. spent the summer coding –  Funded as part of KDE project – Kalzium editor •  Built on several other open source projects –  Qt, Eigen, Open Babel, Blue Obelisk Data Repository •  Also uses open standards, e.g. OpenGL •  Cross platform, open source stack 14  
  • 15. Avogadro Vital Statistics •  Supports Linux, Windows and Mac OS X •  Contributions from over 20 developers •  Over 180,000 downloads over 4 years •  Translated into 19 languages •  Used by Kalzium for molecular editor •  Featured by Trolltech/Nokia, –  Qt in use –  Qt ambassador program 15  
  • 16. 16  
  • 17. Desktop Database •  Use of “document store” NoSQL •  Doesn’t force too much structure •  Some entries have experimental data available •  Some have computational jobs •  Employ a “pile of stuff” approach •  Can store both source and derived data •  Calculate identifiers, QSAR properties, etc •  MongoDB is a scalable, open solution •  Proven scaling with large web applications 17  
  • 18. Chemistry Data Explorer •  Qt application •  Connects to local or remote database •  Uses VTK for visual data exploration •  Can ingest new data –  Uses Open Babel to generate descriptors –  Standard InChi, SMILES, molecular weight –  More could be added •  All derived from files stored in the database 18  
  • 20. Database Interaction on the Web •  Avogadro directly accesses some (read- only) public databases: •  PDB, NIH “fetch by name” •  More could be added •  ChemData follows this approach •  Quixote aims to support both public and private sharing models – open framework 20  
  • 22. Avogadro 22  
  • 23. Calling Stand Alone Programs •  Many already supported: •  GAMESS, GAMESS-UK, Molpro, Q-Chem, MOPAC, NWChem, Gaussian, Dalton •  Easy to add more •  Some codes writing Avogadro based custom applications, •  Q-Chem, Molpro… •  DLPOLY author approached me: •  Open sourced DLPOLY2, want a GUI 23  
  • 25. OpenQube – Quantum Data •  Reads in key quantum data –  Basis set used in calculation –  Eigenvectors for molecular orbitals –  Density matrix for electron density –  Standard geometry •  Multithreaded calculation –  Produce regular grids of scalar data –  Molecular orbitals, electron density… 25  
  • 26. Molecular Orbitals and Electron Density •  Quantum files store basis sets and matrices −αr 2 GTO = ce φ i = ∑ c µiφ µ µ ρ(r) = ∑ ∑ Pµν φ µ φν µ ν •  Using these equations, and the supplied matrices – calculate cubes 26  
  • 27. Advanced Visualization: VTK •  New Avogadro plugin: •  Takes volumetric data from Avogadro •  Uses GPU accelerated rendering in VTK •  Widespread excitement from many in the chemistry community •  Several groups interested in collaborating •  Google Summer of Code project •  Leverage significant capabilities in VTK 27  
  • 28. Volume Rendered With Contours 28  
  • 29. Electron Density Volume Render 29  
  • 30. Electron Density Ray Tracing 30  
  • 31. VTK: The Toolkit •  Collection of C++ libraries –  Leveraged by many applications –  Divided into logical areas, e.g. •  Filtering – data processing in visualization pipeline •  InfoVis – informatics visualization •  Widgets – 3D interaction widgets •  VolumeRendering – 3D volume rendering •  Cross platform, using OpenGL •  Wrapped in Python, Tcl and Java 31  
  • 32. VTK Development Team • From Ohloh: Very large, active development team: Over the past twelve months, 100 developers contributed new code to VTK. This is one of the largest open-source teams in the world, and is in the top 2% of all project teams on Ohloh. and many others... 32  
  • 33. ParaView •  Parallel visualization application •  Open source, BSD licensed •  Turn-key application wrapper around VTK •  Parallel data processing and rendering 33  
  • 34. Large Data Visualization •  BlueGene/L at LLNL –  65,536 compute nodes (32 bit PPC) –  1,024 I/O nodes (32 bit PPC) –  512 MB of RAM per node •  Sandia Red Storm –  12,960 compute nodes (AMD Opteron dual) –  640 service and I/O nodes –  40 TB of DDR RAM per node 34  
  • 35. 1 Billion Cell Asteroid Simulation 35  
  • 36. Tiled Displays 36  
  • 38. 3D Chemistry Visualization •  Some existing features specific to chemistry –  Gaussian cube, PDB, and a few others •  Excellent handling of volumetric data: –  Marching cubes –  Volume rendering –  Contouring •  Advanced rendering: –  Point sprites –  Manta – real time ray tracing 38  
  • 39. Titan: VTK and Informatics •  Led by Sandia National Laboratories •  Substantial expansion of VTK: –  Informatics & analysis •  Actively developed, growing feature set •  Improved 2D rendering and API •  Database connectivity, client-server, pipeline based approach •  Uses web technologies such as ProtoViz •  Scalable, interactive infoviz 39  
  • 40. Manta: Real Time Ray Tracing 40  
  • 41. New Frontiers •  New work porting VTK –  Use C++ as the common core •  iOS port in the early stages •  Android port –  Use OpenGL ES 2.0 – new rendering code •  Also ParaViewWeb – delivering over web –  Use image delivery and rendering on server –  Also using WebGL for rendering (optionally) 41  
  • 42. Future Directions •  VTK modularization (in progress) –  Developing more agile build systems –  Automating more with CMake •  Using Git more fully to improve stability –  Use of master and next –  Topic branches - merge when ready •  Code review using Gerrit –  Integration with continuous integration –  Test before merge 42  
  • 45. Volumetric Data: Molecular Orbitals 45  
  • 46. Biomolecules 46  
  • 47. Nanomaterials 47  
  • 50. Hybrid Views: CPK + MO + Ball & Stick 50  
  • 51. Linked Views of Live Data 51  
  • 52. 2D: Graphs and Charts 52  
  • 53. Informatics 53