SlideShare a Scribd company logo
Open Source Visualization of
      Scientific Data
          8 August 2011
         Dr. Marcus D. Hanwell
         marcus.hanwell@kitware.com




                                      1	
  
Outline
•    Background
•    Why is open science important?
•    Opening up chemistry over the last four years
•    The Visualization Toolkit (VTK)
•    ParaView – a client-server Qt based VTK GUI
•    New frontiers – web, mobile and tablets
•    Future directions



                                                     2	
  
My Background
•  Ph.D. (Physics) – University of Sheffield
•  Google Summer of Code – Avogadro
•  Postdoc (Chemistry) – University of Pittsburgh
•  R&D engineer – Kitware, Inc
•  Passionate about physics, chemistry, and the
   growing need to improve computational tools
•  See the need for powerful open source, cross
   platform frameworks and applications
•  Develop(ed): Gentoo, KDE, Kalzium, Avogadro,
   Open Babel, VTK, ParaView, Titan, CMake

                                                    3	
  
Kitware
•  Founded in 1998: 5 former GE Research employees
•  95 employees: 42% PhD
•  Privately held, profitable from creation, no debt
•  Rapidly Growing: >30% in 2010, 7M web-visitors/quarter
•  Offices                                  •  2011 Small Business
   –  Albany, NY                               Administration’s
                                               Tibbetts Award
   –  Carrboro, NC
                                            •  HPCWire Readers
   –  Lyon, France                             and Editor’s Choice

   –  Bangalore, India                      •  Inc’s 5000 List: 2008
                                               to 2010
Kitware: Core Technologies




                CMake




                CDash




                             5	
  
What Is “Open Science”?

“Open science is the idea that scientific
knowledge of all kinds should be openly
shared as early as is practical in the
discovery process.”
                          openscience.org



                                            6	
  
What Is The Problem?
“…when the journal system was developed
in the 17th and 18th centuries it was an
excellent example of open science. The
journals are perhaps the most open system
for the dissemination of knowledge that can
be constructed — if you’re working with 17th
century technology. But, of course, today
we can do a lot better.”
                          openscience.org
                                               7	
  
Opening Up Chemistry
•  Computational chemistry is currently one
   of the more closed sciences
•  Lots of black box proprietary codes
  –  Only a few have access to the code
  –  Publishing results from black box codes
  –  Many file formats in use, little agreement
•  More papers should be including data
•  Growing need for open standards

                                                  8	
  
Movements for Open Chemistry
•  Formed an “unorganization” – Blue Obelisk
  –  Published first article in 2005
  –  Open data, open standards and open source
  –  Meet at ACS and other conferences when possible
  –  Follow-up article currently in press



•  Quixote collaboration more recently
  –  Provide meaningful data storage and exchange
  –  Principally targeting computational chemistry

                                                       9	
  
Typical Chemistry Workflow

        Log File                      Input File
                   Edit/Analyze	
  




   Results	
           Data	
         Job	
  Submission	
  




                                        Local
                   Calcula@on	
        Remote
                                                          10	
  
Problem: Pretty Complex/Manual
•    Most steps require user intervention
•    Obtain starting structure (previous work, databases)
•    Edit structure
•    Write input file
•    Move input file to cluster
•    Submit to queue
•    Wait for completion
•    Retrieve input file
•    Analyze output file
•    Extract the relevant data, change formats
•    Store results
•    Repeat

                                                            11	
  
Improved Chemistry Workflow

       Log File                      Input File
                  Edit/Analyze	
  



  Results	
           Data	
         Job	
  Submission	
  




                                       Local
                  Calcula@on	
        Remote

                                                         12	
  
Avogadro
•  Project began 2006
•  Split into library and
application (plugin based)
•  One of very few open source editors
•  Designed to be extensible from the start
•  Generate input & read output from many codes
•  An active and growing community
•  Chemistry needs a free, open framework


                                                  13	
  
Avogadro’s Roots
•  Avogadro projected started in 2006
•  First funded work in 2007 by Marcus Hanwell
  –  Google Summer of Code student
  –  Final year of Ph.D. spent the summer coding
  –  Funded as part of KDE project – Kalzium editor
•  Built on several other open source projects
  –  Qt, Eigen, Open Babel, Blue Obelisk Data Repository
•  Also uses open standards, e.g. OpenGL
•  Cross platform, open source stack

                                                       14	
  
Avogadro Vital Statistics
•    Supports Linux, Windows and Mac OS X
•    Contributions from over 20 developers
•    Over 180,000 downloads over 4 years
•    Translated into 19 languages
•    Used by Kalzium for molecular editor
•    Featured by Trolltech/Nokia,
     –  Qt in use
     –  Qt ambassador program
                                             15	
  
16	
  
Desktop Database
•  Use of “document store” NoSQL
  •  Doesn’t force too much structure
     •  Some entries have experimental data available
     •  Some have computational jobs
  •  Employ a “pile of stuff” approach
     •  Can store both source and derived data
     •  Calculate identifiers, QSAR properties, etc
•  MongoDB is a scalable, open solution
  •  Proven scaling with large web applications

                                                        17	
  
Chemistry Data Explorer
•    Qt application
•    Connects to local or remote database
•    Uses VTK for visual data exploration
•    Can ingest new data
     –  Uses Open Babel to generate descriptors
     –  Standard InChi, SMILES, molecular weight
     –  More could be added
        •  All derived from files stored in the database

                                                           18	
  
Chemistry Data Explorer




                          19	
  
Database Interaction on the Web
•  Avogadro directly accesses some (read-
   only) public databases:
  •  PDB, NIH “fetch by name”
  •  More could be added
•  ChemData follows this approach
•  Quixote aims to support both public and
   private sharing models – open framework


                                             20	
  
Z-Matrix/Cartesian Molecule Editor




                                     21	
  
Avogadro




           22	
  
Calling Stand Alone Programs
•  Many already supported:
  •  GAMESS, GAMESS-UK, Molpro, Q-Chem,
     MOPAC, NWChem, Gaussian, Dalton
  •  Easy to add more
•  Some codes writing Avogadro based
   custom applications,
  •  Q-Chem, Molpro…
•  DLPOLY author approached me:
  •  Open sourced DLPOLY2, want a GUI

                                          23	
  
GAMESS Input Generation




                          24	
  
OpenQube – Quantum Data
•  Reads in key quantum data
  –  Basis set used in calculation
  –  Eigenvectors for molecular orbitals
  –  Density matrix for electron density
  –  Standard geometry
•  Multithreaded calculation
  –  Produce regular grids of scalar data
  –  Molecular orbitals, electron density…

                                             25	
  
Molecular Orbitals and Electron Density

•  Quantum files store basis sets and
   matrices
                −αr 2
  GTO = ce
  φ i = ∑ c µiφ µ
        µ

  ρ(r) = ∑ ∑ Pµν φ µ φν
            µ   ν
•  Using these equations, and the supplied
   matrices – calculate cubes
                                             26	
  
Advanced Visualization: VTK
•  New Avogadro plugin:
  •  Takes volumetric data from Avogadro
  •  Uses GPU accelerated rendering in VTK
•  Widespread excitement from many in the
   chemistry community
•  Several groups interested in collaborating
•  Google Summer of Code project
•  Leverage significant capabilities in VTK
                                                27	
  
Volume Rendered With Contours




                                28	
  
Electron Density Volume Render




                                 29	
  
Electron Density Ray Tracing




                               30	
  
VTK: The Toolkit
•  Collection of C++ libraries
  –  Leveraged by many applications
  –  Divided into logical areas, e.g.
     •  Filtering – data processing in visualization pipeline
     •  InfoVis – informatics visualization
     •  Widgets – 3D interaction widgets
     •  VolumeRendering – 3D volume rendering
•  Cross platform, using OpenGL
•  Wrapped in Python, Tcl and Java

                                                            31	
  
VTK Development Team
 • From Ohloh: Very large, active development team: Over the past twelve
  months, 100 developers contributed new code to VTK. This is one of the largest open-source
  teams in the world, and is in the top 2% of all project teams on Ohloh.




                                                      and many others...
                                                                                               32	
  
ParaView
•    Parallel visualization application
•    Open source, BSD licensed
•    Turn-key application wrapper around VTK
•    Parallel data processing and rendering




                                               33	
  
Large Data Visualization
•  BlueGene/L at LLNL
  –  65,536 compute nodes (32 bit PPC)
  –  1,024 I/O nodes (32 bit PPC)
  –  512 MB of RAM per node
•  Sandia Red Storm
  –  12,960 compute nodes (AMD Opteron dual)
  –  640 service and I/O nodes
  –  40 TB of DDR RAM per node

                                               34	
  
1 Billion Cell Asteroid Simulation




                                     35	
  
Tiled Displays




                 36	
  
Parallel Processing/Rendering




                                37	
  
3D Chemistry Visualization
•  Some existing features specific to chemistry
   –  Gaussian cube, PDB, and a few others
•  Excellent handling of volumetric data:
   –  Marching cubes
   –  Volume rendering
   –  Contouring
•  Advanced rendering:
   –  Point sprites
   –  Manta – real time ray tracing

                                                  38	
  
Titan: VTK and Informatics
•  Led by Sandia National Laboratories
•  Substantial expansion of VTK:
   –  Informatics & analysis
•  Actively developed, growing feature set
•  Improved 2D rendering and API
•  Database connectivity, client-server, pipeline
   based approach
•  Uses web technologies such as ProtoViz
•  Scalable, interactive infoviz

                                                    39	
  
Manta: Real Time Ray Tracing




                               40	
  
New Frontiers
•  New work porting VTK
  –  Use C++ as the common core
    •  iOS port in the early stages
    •  Android port
  –  Use OpenGL ES 2.0 – new rendering code
•  Also ParaViewWeb – delivering over web
  –  Use image delivery and rendering on server
  –  Also using WebGL for rendering (optionally)


                                                   41	
  
Future Directions
•  VTK modularization (in progress)
   –  Developing more agile build systems
   –  Automating more with CMake
•  Using Git more fully to improve stability
   –  Use of master and next
   –  Topic branches - merge when ready
•  Code review using Gerrit
   –  Integration with continuous integration
   –  Test before merge


                                                42	
  
Standard Representations




                           43	
  
Standard Representations




                           44	
  
Volumetric Data: Molecular Orbitals




                                  45	
  
Biomolecules




               46	
  
Nanomaterials




                47	
  
Periodic Systems




                   48	
  
Simplified Views




                   49	
  
Hybrid Views: CPK + MO + Ball & Stick




                                    50	
  
Linked Views of Live Data




                            51	
  
2D: Graphs and Charts




                        52	
  
Informatics




              53	
  
3D Interaction Widgets




                         54	
  

More Related Content

What's hot

Evolution of database access technologies in Java-based software projects
Evolution of database access technologies in Java-based software projectsEvolution of database access technologies in Java-based software projects
Evolution of database access technologies in Java-based software projects
Tom Mens
 
Survival analysis of database technologies in open source Java projects
Survival analysis of database technologies in open source Java projectsSurvival analysis of database technologies in open source Java projects
Survival analysis of database technologies in open source Java projects
Tom Mens
 
LDV: Light-weight Database Virtualization
LDV: Light-weight Database VirtualizationLDV: Light-weight Database Virtualization
LDV: Light-weight Database Virtualization
Tanu Malik
 
Bioinformatics on Azure
Bioinformatics on AzureBioinformatics on Azure
Bioinformatics on Azure
Tillmann Eitelberg
 
PTU: Using Provenance for Repeatability
PTU: Using Provenance for RepeatabilityPTU: Using Provenance for Repeatability
PTU: Using Provenance for RepeatabilityTanu Malik
 
GlobusWorld 2015
GlobusWorld 2015GlobusWorld 2015
GlobusWorld 2015
Tanu Malik
 
Tim Pugh-SPEDDEXES 2014
Tim Pugh-SPEDDEXES 2014Tim Pugh-SPEDDEXES 2014
Tim Pugh-SPEDDEXES 2014
aceas13tern
 
CloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use CaseCloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use Case
CloudLightning
 
Best pratices at BGI for the Challenges in the Era of Big Genomics Data
Best pratices at BGI for the Challenges in the Era of Big Genomics DataBest pratices at BGI for the Challenges in the Era of Big Genomics Data
Best pratices at BGI for the Challenges in the Era of Big Genomics Data
Xing Xu
 
GEN: A Database Interface Generator for HPC Programs
GEN: A Database Interface Generator for HPC ProgramsGEN: A Database Interface Generator for HPC Programs
GEN: A Database Interface Generator for HPC Programs
Tanu Malik
 
Cytoscape: Now and Future
Cytoscape: Now and FutureCytoscape: Now and Future
Cytoscape: Now and Future
Keiichiro Ono
 
Hybrid Cloud for CERN
Hybrid Cloud for CERN Hybrid Cloud for CERN
Hybrid Cloud for CERN
Helix Nebula The Science Cloud
 
WBDB 2015 Performance Evaluation of Spark SQL using BigBench
WBDB 2015 Performance Evaluation of Spark SQL using BigBenchWBDB 2015 Performance Evaluation of Spark SQL using BigBench
WBDB 2015 Performance Evaluation of Spark SQL using BigBench
t_ivanov
 
Technical Challenges and Approaches to Build an Open Ecosystem of Heterogeneo...
Technical Challenges and Approaches to Build an Open Ecosystem of Heterogeneo...Technical Challenges and Approaches to Build an Open Ecosystem of Heterogeneo...
Technical Challenges and Approaches to Build an Open Ecosystem of Heterogeneo...
CSUC - Consorci de Serveis Universitaris de Catalunya
 
What's New in Cytoscape
What's New in CytoscapeWhat's New in Cytoscape
What's New in Cytoscape
Keiichiro Ono
 
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
The Statistical and Applied Mathematical Sciences Institute
 
GeoDataspace: Simplifying Data Management Tasks with Globus
GeoDataspace: Simplifying Data Management Tasks with GlobusGeoDataspace: Simplifying Data Management Tasks with Globus
GeoDataspace: Simplifying Data Management Tasks with Globus
Tanu Malik
 
Ipaw14 presentation Quan, Tanu, Ian
Ipaw14 presentation Quan, Tanu, IanIpaw14 presentation Quan, Tanu, Ian
Ipaw14 presentation Quan, Tanu, Ian
Boris Glavic
 
How to expand the Galaxy from genes to Earth in six simple steps (and live sm...
How to expand the Galaxy from genes to Earth in six simple steps (and live sm...How to expand the Galaxy from genes to Earth in six simple steps (and live sm...
How to expand the Galaxy from genes to Earth in six simple steps (and live sm...
Raffaele Montella
 
Iwesep19.ppt
Iwesep19.pptIwesep19.ppt

What's hot (20)

Evolution of database access technologies in Java-based software projects
Evolution of database access technologies in Java-based software projectsEvolution of database access technologies in Java-based software projects
Evolution of database access technologies in Java-based software projects
 
Survival analysis of database technologies in open source Java projects
Survival analysis of database technologies in open source Java projectsSurvival analysis of database technologies in open source Java projects
Survival analysis of database technologies in open source Java projects
 
LDV: Light-weight Database Virtualization
LDV: Light-weight Database VirtualizationLDV: Light-weight Database Virtualization
LDV: Light-weight Database Virtualization
 
Bioinformatics on Azure
Bioinformatics on AzureBioinformatics on Azure
Bioinformatics on Azure
 
PTU: Using Provenance for Repeatability
PTU: Using Provenance for RepeatabilityPTU: Using Provenance for Repeatability
PTU: Using Provenance for Repeatability
 
GlobusWorld 2015
GlobusWorld 2015GlobusWorld 2015
GlobusWorld 2015
 
Tim Pugh-SPEDDEXES 2014
Tim Pugh-SPEDDEXES 2014Tim Pugh-SPEDDEXES 2014
Tim Pugh-SPEDDEXES 2014
 
CloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use CaseCloudLightning and the OPM-based Use Case
CloudLightning and the OPM-based Use Case
 
Best pratices at BGI for the Challenges in the Era of Big Genomics Data
Best pratices at BGI for the Challenges in the Era of Big Genomics DataBest pratices at BGI for the Challenges in the Era of Big Genomics Data
Best pratices at BGI for the Challenges in the Era of Big Genomics Data
 
GEN: A Database Interface Generator for HPC Programs
GEN: A Database Interface Generator for HPC ProgramsGEN: A Database Interface Generator for HPC Programs
GEN: A Database Interface Generator for HPC Programs
 
Cytoscape: Now and Future
Cytoscape: Now and FutureCytoscape: Now and Future
Cytoscape: Now and Future
 
Hybrid Cloud for CERN
Hybrid Cloud for CERN Hybrid Cloud for CERN
Hybrid Cloud for CERN
 
WBDB 2015 Performance Evaluation of Spark SQL using BigBench
WBDB 2015 Performance Evaluation of Spark SQL using BigBenchWBDB 2015 Performance Evaluation of Spark SQL using BigBench
WBDB 2015 Performance Evaluation of Spark SQL using BigBench
 
Technical Challenges and Approaches to Build an Open Ecosystem of Heterogeneo...
Technical Challenges and Approaches to Build an Open Ecosystem of Heterogeneo...Technical Challenges and Approaches to Build an Open Ecosystem of Heterogeneo...
Technical Challenges and Approaches to Build an Open Ecosystem of Heterogeneo...
 
What's New in Cytoscape
What's New in CytoscapeWhat's New in Cytoscape
What's New in Cytoscape
 
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
2019 GDRR: Blockchain Data Analytics - QuTrack: Model Life Cycle Management f...
 
GeoDataspace: Simplifying Data Management Tasks with Globus
GeoDataspace: Simplifying Data Management Tasks with GlobusGeoDataspace: Simplifying Data Management Tasks with Globus
GeoDataspace: Simplifying Data Management Tasks with Globus
 
Ipaw14 presentation Quan, Tanu, Ian
Ipaw14 presentation Quan, Tanu, IanIpaw14 presentation Quan, Tanu, Ian
Ipaw14 presentation Quan, Tanu, Ian
 
How to expand the Galaxy from genes to Earth in six simple steps (and live sm...
How to expand the Galaxy from genes to Earth in six simple steps (and live sm...How to expand the Galaxy from genes to Earth in six simple steps (and live sm...
How to expand the Galaxy from genes to Earth in six simple steps (and live sm...
 
Iwesep19.ppt
Iwesep19.pptIwesep19.ppt
Iwesep19.ppt
 

Similar to Open Source Visualization of Scientific Data

Novo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNovo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4j
Neo4j
 
Reproducibility - The myths and truths of pipeline bioinformatics
Reproducibility - The myths and truths of pipeline bioinformaticsReproducibility - The myths and truths of pipeline bioinformatics
Reproducibility - The myths and truths of pipeline bioinformatics
Simon Cockell
 
E Afgan - Zero to a bioinformatics analysis platform in four minutes
E Afgan - Zero to a bioinformatics analysis platform in four minutesE Afgan - Zero to a bioinformatics analysis platform in four minutes
E Afgan - Zero to a bioinformatics analysis platform in four minutes
Jan Aerts
 
QuantCell Research - The Big Data Spreadsheet
QuantCell Research - The Big Data SpreadsheetQuantCell Research - The Big Data Spreadsheet
QuantCell Research - The Big Data Spreadsheet
inside-BigData.com
 
OpenStack Documentation in the Open
OpenStack Documentation in the OpenOpenStack Documentation in the Open
OpenStack Documentation in the Open
Anne Gentle
 
Clouds, Clusters, and Containers: Tools for responsible, collaborative computing
Clouds, Clusters, and Containers: Tools for responsible, collaborative computingClouds, Clusters, and Containers: Tools for responsible, collaborative computing
Clouds, Clusters, and Containers: Tools for responsible, collaborative computing
Matthew Vaughn
 
Beyond the Science Gateway
Beyond the Science GatewayBeyond the Science Gateway
Beyond the Science Gateway
Boston Consulting Group
 
CloudLab Overview
CloudLab OverviewCloudLab Overview
CloudLab Overview
Ed Dodds
 
IMPACT Interoperability Framework - Clemens Neudecker
IMPACT Interoperability Framework - Clemens NeudeckerIMPACT Interoperability Framework - Clemens Neudecker
IMPACT Interoperability Framework - Clemens NeudeckerIMPACT Centre of Competence
 
The Tale of Two Deployments: Greenfield and Monolith Apps with Docker Enterpr...
The Tale of Two Deployments: Greenfield and Monolith Apps with Docker Enterpr...The Tale of Two Deployments: Greenfield and Monolith Apps with Docker Enterpr...
The Tale of Two Deployments: Greenfield and Monolith Apps with Docker Enterpr...
Docker, Inc.
 
GEO Analytics Canada Overview April 2020
GEO Analytics Canada Overview April 2020GEO Analytics Canada Overview April 2020
GEO Analytics Canada Overview April 2020
GEO Analytics Canada
 
How static analysis supports quality over 50 million lines of C++ code
How static analysis supports quality over 50 million lines of C++ codeHow static analysis supports quality over 50 million lines of C++ code
How static analysis supports quality over 50 million lines of C++ code
cppfrug
 
2018 ABRF Tools for improving rigor and reproducibility in bioinformatics
2018 ABRF Tools for improving rigor and reproducibility in bioinformatics2018 ABRF Tools for improving rigor and reproducibility in bioinformatics
2018 ABRF Tools for improving rigor and reproducibility in bioinformatics
Stephen Turner
 
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
BigData_Europe
 
COPO - Collaborative Open Plant Omics, by Rob Davey
COPO - Collaborative Open Plant Omics, by Rob DaveyCOPO - Collaborative Open Plant Omics, by Rob Davey
COPO - Collaborative Open Plant Omics, by Rob Davey
AIMS (Agricultural Information Management Standards)
 
Virtualization for HPC at NCI
Virtualization for HPC at NCIVirtualization for HPC at NCI
Virtualization for HPC at NCI
inside-BigData.com
 
Running Mixed Workloads on Kubernetes at IHME
Running Mixed Workloads on Kubernetes at IHMERunning Mixed Workloads on Kubernetes at IHME
Running Mixed Workloads on Kubernetes at IHME
Tyrone Grandison
 
Utilising Cloud Computing for Research through Infrastructure, Software and D...
Utilising Cloud Computing for Research through Infrastructure, Software and D...Utilising Cloud Computing for Research through Infrastructure, Software and D...
Utilising Cloud Computing for Research through Infrastructure, Software and D...
David Wallom
 
Data-intensive bioinformatics on HPC and Cloud
Data-intensive bioinformatics on HPC and CloudData-intensive bioinformatics on HPC and Cloud
Data-intensive bioinformatics on HPC and Cloud
Ola Spjuth
 

Similar to Open Source Visualization of Scientific Data (20)

Novo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4jNovo Nordisk's journey in developing an open-source application on Neo4j
Novo Nordisk's journey in developing an open-source application on Neo4j
 
Reproducibility - The myths and truths of pipeline bioinformatics
Reproducibility - The myths and truths of pipeline bioinformaticsReproducibility - The myths and truths of pipeline bioinformatics
Reproducibility - The myths and truths of pipeline bioinformatics
 
E Afgan - Zero to a bioinformatics analysis platform in four minutes
E Afgan - Zero to a bioinformatics analysis platform in four minutesE Afgan - Zero to a bioinformatics analysis platform in four minutes
E Afgan - Zero to a bioinformatics analysis platform in four minutes
 
Open sourcery
Open sourceryOpen sourcery
Open sourcery
 
QuantCell Research - The Big Data Spreadsheet
QuantCell Research - The Big Data SpreadsheetQuantCell Research - The Big Data Spreadsheet
QuantCell Research - The Big Data Spreadsheet
 
OpenStack Documentation in the Open
OpenStack Documentation in the OpenOpenStack Documentation in the Open
OpenStack Documentation in the Open
 
Clouds, Clusters, and Containers: Tools for responsible, collaborative computing
Clouds, Clusters, and Containers: Tools for responsible, collaborative computingClouds, Clusters, and Containers: Tools for responsible, collaborative computing
Clouds, Clusters, and Containers: Tools for responsible, collaborative computing
 
Beyond the Science Gateway
Beyond the Science GatewayBeyond the Science Gateway
Beyond the Science Gateway
 
CloudLab Overview
CloudLab OverviewCloudLab Overview
CloudLab Overview
 
IMPACT Interoperability Framework - Clemens Neudecker
IMPACT Interoperability Framework - Clemens NeudeckerIMPACT Interoperability Framework - Clemens Neudecker
IMPACT Interoperability Framework - Clemens Neudecker
 
The Tale of Two Deployments: Greenfield and Monolith Apps with Docker Enterpr...
The Tale of Two Deployments: Greenfield and Monolith Apps with Docker Enterpr...The Tale of Two Deployments: Greenfield and Monolith Apps with Docker Enterpr...
The Tale of Two Deployments: Greenfield and Monolith Apps with Docker Enterpr...
 
GEO Analytics Canada Overview April 2020
GEO Analytics Canada Overview April 2020GEO Analytics Canada Overview April 2020
GEO Analytics Canada Overview April 2020
 
How static analysis supports quality over 50 million lines of C++ code
How static analysis supports quality over 50 million lines of C++ codeHow static analysis supports quality over 50 million lines of C++ code
How static analysis supports quality over 50 million lines of C++ code
 
2018 ABRF Tools for improving rigor and reproducibility in bioinformatics
2018 ABRF Tools for improving rigor and reproducibility in bioinformatics2018 ABRF Tools for improving rigor and reproducibility in bioinformatics
2018 ABRF Tools for improving rigor and reproducibility in bioinformatics
 
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
Big Data Europe SC6 WS 3: Ron Dekker, Director CESSDA European Open Science A...
 
COPO - Collaborative Open Plant Omics, by Rob Davey
COPO - Collaborative Open Plant Omics, by Rob DaveyCOPO - Collaborative Open Plant Omics, by Rob Davey
COPO - Collaborative Open Plant Omics, by Rob Davey
 
Virtualization for HPC at NCI
Virtualization for HPC at NCIVirtualization for HPC at NCI
Virtualization for HPC at NCI
 
Running Mixed Workloads on Kubernetes at IHME
Running Mixed Workloads on Kubernetes at IHMERunning Mixed Workloads on Kubernetes at IHME
Running Mixed Workloads on Kubernetes at IHME
 
Utilising Cloud Computing for Research through Infrastructure, Software and D...
Utilising Cloud Computing for Research through Infrastructure, Software and D...Utilising Cloud Computing for Research through Infrastructure, Software and D...
Utilising Cloud Computing for Research through Infrastructure, Software and D...
 
Data-intensive bioinformatics on HPC and Cloud
Data-intensive bioinformatics on HPC and CloudData-intensive bioinformatics on HPC and Cloud
Data-intensive bioinformatics on HPC and Cloud
 

Recently uploaded

RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
Alan Dix
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
ControlCase
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
Jemma Hussein Allen
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
sonjaschweigert1
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
Safe Software
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
James Anderson
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
ThousandEyes
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
Peter Spielvogel
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
Alpen-Adria-Universität
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
mikeeftimakis1
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
Kari Kakkonen
 

Recently uploaded (20)

RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Epistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI supportEpistemic Interaction - tuning interfaces to provide information for AI support
Epistemic Interaction - tuning interfaces to provide information for AI support
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...A tale of scale & speed: How the US Navy is enabling software delivery from l...
A tale of scale & speed: How the US Navy is enabling software delivery from l...
 
Essentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FMEEssentials of Automations: The Art of Triggers and Actions in FME
Essentials of Automations: The Art of Triggers and Actions in FME
 
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
Alt. GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using ...
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Assure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyesAssure Contact Center Experiences for Your Customers With ThousandEyes
Assure Contact Center Experiences for Your Customers With ThousandEyes
 
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdfSAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
SAP Sapphire 2024 - ASUG301 building better apps with SAP Fiori.pdf
 
Video Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the FutureVideo Streaming: Then, Now, and in the Future
Video Streaming: Then, Now, and in the Future
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Introduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - CybersecurityIntroduction to CHERI technology - Cybersecurity
Introduction to CHERI technology - Cybersecurity
 
Climate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing DaysClimate Impact of Software Testing at Nordic Testing Days
Climate Impact of Software Testing at Nordic Testing Days
 

Open Source Visualization of Scientific Data

  • 1. Open Source Visualization of Scientific Data 8 August 2011 Dr. Marcus D. Hanwell marcus.hanwell@kitware.com 1  
  • 2. Outline •  Background •  Why is open science important? •  Opening up chemistry over the last four years •  The Visualization Toolkit (VTK) •  ParaView – a client-server Qt based VTK GUI •  New frontiers – web, mobile and tablets •  Future directions 2  
  • 3. My Background •  Ph.D. (Physics) – University of Sheffield •  Google Summer of Code – Avogadro •  Postdoc (Chemistry) – University of Pittsburgh •  R&D engineer – Kitware, Inc •  Passionate about physics, chemistry, and the growing need to improve computational tools •  See the need for powerful open source, cross platform frameworks and applications •  Develop(ed): Gentoo, KDE, Kalzium, Avogadro, Open Babel, VTK, ParaView, Titan, CMake 3  
  • 4. Kitware •  Founded in 1998: 5 former GE Research employees •  95 employees: 42% PhD •  Privately held, profitable from creation, no debt •  Rapidly Growing: >30% in 2010, 7M web-visitors/quarter •  Offices •  2011 Small Business –  Albany, NY Administration’s Tibbetts Award –  Carrboro, NC •  HPCWire Readers –  Lyon, France and Editor’s Choice –  Bangalore, India •  Inc’s 5000 List: 2008 to 2010
  • 5. Kitware: Core Technologies CMake CDash 5  
  • 6. What Is “Open Science”? “Open science is the idea that scientific knowledge of all kinds should be openly shared as early as is practical in the discovery process.” openscience.org 6  
  • 7. What Is The Problem? “…when the journal system was developed in the 17th and 18th centuries it was an excellent example of open science. The journals are perhaps the most open system for the dissemination of knowledge that can be constructed — if you’re working with 17th century technology. But, of course, today we can do a lot better.” openscience.org 7  
  • 8. Opening Up Chemistry •  Computational chemistry is currently one of the more closed sciences •  Lots of black box proprietary codes –  Only a few have access to the code –  Publishing results from black box codes –  Many file formats in use, little agreement •  More papers should be including data •  Growing need for open standards 8  
  • 9. Movements for Open Chemistry •  Formed an “unorganization” – Blue Obelisk –  Published first article in 2005 –  Open data, open standards and open source –  Meet at ACS and other conferences when possible –  Follow-up article currently in press •  Quixote collaboration more recently –  Provide meaningful data storage and exchange –  Principally targeting computational chemistry 9  
  • 10. Typical Chemistry Workflow Log File Input File Edit/Analyze   Results   Data   Job  Submission   Local Calcula@on   Remote 10  
  • 11. Problem: Pretty Complex/Manual •  Most steps require user intervention •  Obtain starting structure (previous work, databases) •  Edit structure •  Write input file •  Move input file to cluster •  Submit to queue •  Wait for completion •  Retrieve input file •  Analyze output file •  Extract the relevant data, change formats •  Store results •  Repeat 11  
  • 12. Improved Chemistry Workflow Log File Input File Edit/Analyze   Results   Data   Job  Submission   Local Calcula@on   Remote 12  
  • 13. Avogadro •  Project began 2006 •  Split into library and application (plugin based) •  One of very few open source editors •  Designed to be extensible from the start •  Generate input & read output from many codes •  An active and growing community •  Chemistry needs a free, open framework 13  
  • 14. Avogadro’s Roots •  Avogadro projected started in 2006 •  First funded work in 2007 by Marcus Hanwell –  Google Summer of Code student –  Final year of Ph.D. spent the summer coding –  Funded as part of KDE project – Kalzium editor •  Built on several other open source projects –  Qt, Eigen, Open Babel, Blue Obelisk Data Repository •  Also uses open standards, e.g. OpenGL •  Cross platform, open source stack 14  
  • 15. Avogadro Vital Statistics •  Supports Linux, Windows and Mac OS X •  Contributions from over 20 developers •  Over 180,000 downloads over 4 years •  Translated into 19 languages •  Used by Kalzium for molecular editor •  Featured by Trolltech/Nokia, –  Qt in use –  Qt ambassador program 15  
  • 16. 16  
  • 17. Desktop Database •  Use of “document store” NoSQL •  Doesn’t force too much structure •  Some entries have experimental data available •  Some have computational jobs •  Employ a “pile of stuff” approach •  Can store both source and derived data •  Calculate identifiers, QSAR properties, etc •  MongoDB is a scalable, open solution •  Proven scaling with large web applications 17  
  • 18. Chemistry Data Explorer •  Qt application •  Connects to local or remote database •  Uses VTK for visual data exploration •  Can ingest new data –  Uses Open Babel to generate descriptors –  Standard InChi, SMILES, molecular weight –  More could be added •  All derived from files stored in the database 18  
  • 20. Database Interaction on the Web •  Avogadro directly accesses some (read- only) public databases: •  PDB, NIH “fetch by name” •  More could be added •  ChemData follows this approach •  Quixote aims to support both public and private sharing models – open framework 20  
  • 22. Avogadro 22  
  • 23. Calling Stand Alone Programs •  Many already supported: •  GAMESS, GAMESS-UK, Molpro, Q-Chem, MOPAC, NWChem, Gaussian, Dalton •  Easy to add more •  Some codes writing Avogadro based custom applications, •  Q-Chem, Molpro… •  DLPOLY author approached me: •  Open sourced DLPOLY2, want a GUI 23  
  • 25. OpenQube – Quantum Data •  Reads in key quantum data –  Basis set used in calculation –  Eigenvectors for molecular orbitals –  Density matrix for electron density –  Standard geometry •  Multithreaded calculation –  Produce regular grids of scalar data –  Molecular orbitals, electron density… 25  
  • 26. Molecular Orbitals and Electron Density •  Quantum files store basis sets and matrices −αr 2 GTO = ce φ i = ∑ c µiφ µ µ ρ(r) = ∑ ∑ Pµν φ µ φν µ ν •  Using these equations, and the supplied matrices – calculate cubes 26  
  • 27. Advanced Visualization: VTK •  New Avogadro plugin: •  Takes volumetric data from Avogadro •  Uses GPU accelerated rendering in VTK •  Widespread excitement from many in the chemistry community •  Several groups interested in collaborating •  Google Summer of Code project •  Leverage significant capabilities in VTK 27  
  • 28. Volume Rendered With Contours 28  
  • 29. Electron Density Volume Render 29  
  • 30. Electron Density Ray Tracing 30  
  • 31. VTK: The Toolkit •  Collection of C++ libraries –  Leveraged by many applications –  Divided into logical areas, e.g. •  Filtering – data processing in visualization pipeline •  InfoVis – informatics visualization •  Widgets – 3D interaction widgets •  VolumeRendering – 3D volume rendering •  Cross platform, using OpenGL •  Wrapped in Python, Tcl and Java 31  
  • 32. VTK Development Team • From Ohloh: Very large, active development team: Over the past twelve months, 100 developers contributed new code to VTK. This is one of the largest open-source teams in the world, and is in the top 2% of all project teams on Ohloh. and many others... 32  
  • 33. ParaView •  Parallel visualization application •  Open source, BSD licensed •  Turn-key application wrapper around VTK •  Parallel data processing and rendering 33  
  • 34. Large Data Visualization •  BlueGene/L at LLNL –  65,536 compute nodes (32 bit PPC) –  1,024 I/O nodes (32 bit PPC) –  512 MB of RAM per node •  Sandia Red Storm –  12,960 compute nodes (AMD Opteron dual) –  640 service and I/O nodes –  40 TB of DDR RAM per node 34  
  • 35. 1 Billion Cell Asteroid Simulation 35  
  • 36. Tiled Displays 36  
  • 38. 3D Chemistry Visualization •  Some existing features specific to chemistry –  Gaussian cube, PDB, and a few others •  Excellent handling of volumetric data: –  Marching cubes –  Volume rendering –  Contouring •  Advanced rendering: –  Point sprites –  Manta – real time ray tracing 38  
  • 39. Titan: VTK and Informatics •  Led by Sandia National Laboratories •  Substantial expansion of VTK: –  Informatics & analysis •  Actively developed, growing feature set •  Improved 2D rendering and API •  Database connectivity, client-server, pipeline based approach •  Uses web technologies such as ProtoViz •  Scalable, interactive infoviz 39  
  • 40. Manta: Real Time Ray Tracing 40  
  • 41. New Frontiers •  New work porting VTK –  Use C++ as the common core •  iOS port in the early stages •  Android port –  Use OpenGL ES 2.0 – new rendering code •  Also ParaViewWeb – delivering over web –  Use image delivery and rendering on server –  Also using WebGL for rendering (optionally) 41  
  • 42. Future Directions •  VTK modularization (in progress) –  Developing more agile build systems –  Automating more with CMake •  Using Git more fully to improve stability –  Use of master and next –  Topic branches - merge when ready •  Code review using Gerrit –  Integration with continuous integration –  Test before merge 42  
  • 45. Volumetric Data: Molecular Orbitals 45  
  • 46. Biomolecules 46  
  • 47. Nanomaterials 47  
  • 50. Hybrid Views: CPK + MO + Ball & Stick 50  
  • 51. Linked Views of Live Data 51  
  • 52. 2D: Graphs and Charts 52  
  • 53. Informatics 53