2. Speakers & Topics
§ William Schroeder, President & CEO, Kitware, Inc.
- The whys and hows of Open Science
§ Dr. Marcus Hanwell, R&D Engineer, Kitware, Inc.
- Building an open-source research program (in Chemistry)
§ Brian Wylie, Sandia National Labs
- Research collaborations from a government perspective
3. The Scientific Method
• Document
• Share
• Data
• Methodology
• Archive
Galileo Galilei 1613
4. Open Science
Ensuring reproducibility
§ Open Documents
- Hypothesis
- Descriptions
- Results
§ Open Data
§ Open Methodology
- Experimental apparatus
- Software
- Workflow
- Parameter Sets
If it isn’t reproducible, it isn’t science
Diagram labels: REPRODUCIBILITY; Positive Evidence (Accumulate, Support); Negative Evidence (Disproof, Hypothesis)
5. Example: OSA Interactive Science Publishing (ISP)
§ Augmented PDF
§ Contains links to executable viewer
§ Downloads data and viewer as necessary to reproduce paper images (results)
6. Example: Insight Journal
§ Timely publishing of publications, data, and software
§ Evaluated automatically; further reviewed by community
Diagram labels: Author; PDF doc; Code; Input Data; Results Data; Journal; Web Site; Git Repository; Build Machines
7. Benefits of Open Science
§ Collaboration
- Leveraging international communities and expertise
“…much of our intelligence and creativity results from interactions with tools and artifacts and from collaborating with other individuals.”
-- Shneiderman
§ Agile Innovation
- Facilitate technology mashups
- Move science to application faster
- More focus on technology; less on protection
§ Business Models
- Growing the pie, creating new opportunities
- Customization, software integration
8. Example: Collaboration
§ NIH National Center of Biomedical Computing NA-MIC
§ Developing the OS NA-MIC Kit; 3D Slicer application
9. Example: Agile Innovation (Open Source for Medical Imaging)
Creating VTK (Visualization Toolkit) led to the creation of:
- ITK
- VolView
- BioImageXD
- OsiriX
- MedINRIA
- VisTrails
- NIH / NCI caBIG – XIP
- VR-Renderer
- IGSTK
- ParaView
- Etc.
and finally…
10. Example: Business Models
§ Kitware: Building open source collaboration platforms
- The usual support and training
- Consulting
- Engaging in collaborative R&D
- Providing technology integration services, aka creating custom solutions
CMake, CDash
11. The Open Technology Highway
§ Provide an open infrastructure
- Support research, teaching, non-profit and commercial activities
- Any (legal) activity can hang off of the highway
- Spur innovation, create opportunities
- Get from idea to product faster
- Do not have to replicate technology
- Too many toll gates (i.e., closed systems, unreasonable IP) slow everything down
- Prefer non-reciprocal licenses
12. Next Up
§ Marcus: Building a research program for chemistry
§ Brian: Open science and research collaboration from a government perspective
14. Grass Roots Effort
§ Bootstrapped several efforts without funding
- Spare time
- Parts of other projects when possible
§ Formed an “unorganization” – Blue Obelisk
- Published first article in 2005
- Open data, open standards and open source
- Meet at ACS and other conferences when possible
- Follow-up article currently in press
§ Quixote collaboration more recently
- Provide meaningful data storage and exchange
- Principally targeting computational chemistry
15. The Early Years
§ Avogadro project started in 2006
§ First funded work in 2007 by Marcus Hanwell
- Google Summer of Code student
- Spent the summer of the final Ph.D. year coding
- Funded as part of KDE project – Kalzium editor
§ Built on several other open source projects
- Qt, Eigen, Open Babel, Blue Obelisk Data Repository
§ Also uses open standards, such as OpenGL for rendering
§ Cross platform, open source stack
16. Community Tools, Standards and Resources
§ Make extensive use of Qt for standard GUI elements
- Much more than just GUI – multithreading, web resources
- Avogadro chosen as an outstanding example of “Qt in Use”
- Marcus Hanwell recently chosen as a “Qt Ambassador”
§ OpenGL for cross platform 3D rendering
- Accelerated rendering of 3D molecular geometry
- Facilitates interacting with the scene
- Use of GLSL for impressive, fast rendering
§ Open Babel for chemical input/output and more
- There are a lot of chemical file formats…
- Has a lot of chemical knowledge, e.g. bond perception
§ Git for distributed version control
- We work across multiple sites, time zones and institutions
- Gerrit for code review more recently – improving code quality
17. Evangelizing: Getting the Message Out
§ Traditional social media used to communicate
- Blogs, Planets, Twitter, Identi.ca, Friendfeed, Google+
§ Talks and posters at conferences
- Open source conferences talking about chemistry
- Chemistry conferences talking about open source chemistry
§ Several meetings and workshops about open chemistry
- Daresbury Laboratory: Chemical Visualization and Quixote
- NIH National Cancer Institute – Databases and Open Chemistry
§ Publications in the traditional journals
§ Screencasts showing off what the software can do
§ In-person workshops and training sessions
18. Bringing About Real Change
§ 2011 is the “International Year of Chemistry”
§ Chemistry has been quite closed traditionally
§ We are working hard to change this
§ Recently led a Phase I SBIR to develop “open chemistry tools”
- GUI acting as the center of the chemical workflow
- Database application using MongoDB, chemically aware
- Cluster integration on the desktop – submit, monitor and retrieve
§ Chemical simulation/calculation is now the biggest HPC user in the military
§ Open tools can use both open and closed computational codes
- Largely written in Fortran to run on clusters
- NWChem recently open sourced – PNNL quantum code
- Already work with GAMESS, GAMESS-UK, Q-Chem, Gaussian…
§ The time is right for change in chemistry
- Opportunity to accelerate the rate of research
19. Funding Open Chemistry Tools
§ Kitware’s core business is based on “open collaboration platforms”
§ Led a Phase I Small Business Innovation Research project (US Army)
- Invited to apply for Phase II funding, currently pending
§ Make use of Apache and BSD licenses
- Allow for participation of a wider cross-section of the community
- Reduced licensing complications
- Important for industry and government collaboration
§ Successfully taken part in Google Summer of Code – funded students
- Student in 2007 working on Avogadro and Kalzium
- Mentor for KDE in 2008-2010
- VTK organization administrator and mentor in 2011
§ Looking to other funding agencies and collaborations in future
20. Developing in Niche Areas
§ The population of active researchers in chemistry is relatively small
- The number of those researchers who code is even smaller
- Of those, the number that wish to contribute to open source is tiny
§ Developing and nurturing these communities can be challenging
§ Some students develop a feature in a summer and disappear
§ Other professors might develop code over the summers
§ Have to lower the barrier to entry as much as possible
§ Often need to help with tools, build systems, etc.
21. Enabling Technologies in Chemistry
§ Large number of computational chemistry codes
- Many do not have dedicated user interfaces
- Forming a new area enabling chemical workflows
- Some of the open source codes that can benefit
- NWChem – quantum chemistry code
- Quantum Espresso – plane wave code
- Free-to-use codes such as GAMESS
- Commercial codes such as Molpro, Q-Chem, others
- These codes are executed in a separate process
§ Libraries that can be used in the GUI:
- The Visualization Toolkit (VTK) provides advanced rendering
- ParaView library provides client-server technology for large data
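As an illustration of running a computational code in a separate process, here is a minimal Python sketch. It is not taken from Avogadro or any of the codes listed above: the function name run_code, the input file name job.inp, and the stand-in command are all hypothetical, and a real code such as NWChem has its own input format and command line that are not shown here.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_code(command, input_text, workdir):
    """Run an external code in its own process.

    Writes the input to a file in workdir, launches the command with that
    file as its only argument, and returns (exit code, captured stdout).
    """
    workdir = Path(workdir).resolve()
    input_path = workdir / "job.inp"  # hypothetical input file name
    input_path.write_text(input_text)
    result = subprocess.run(
        [*command, str(input_path)],
        cwd=workdir,          # keep scratch files out of the GUI's directory
        capture_output=True,  # collect stdout/stderr instead of printing them
        text=True,
        timeout=60,           # do not let a hung job block the caller forever
    )
    return result.returncode, result.stdout


# Stand-in for a real chemistry code (assumption): a one-line Python script
# that reports how many lines the input file contains.
STAND_IN = [
    sys.executable,
    "-c",
    "import sys; print('atoms', len(open(sys.argv[1]).read().splitlines()))",
]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        code, output = run_code(STAND_IN, "O 0 0 0\nH 0 0 1\nH 0 1 0\n", scratch)
        print(code, output.strip())
```

Because the code runs in its own process, the GUI and the computational code only share files and exit status, which is one reason open tools can drive both open and closed codes.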
22. Working With Academia, Industry and Government
§ In the past licensing has not been ideal
- Some form of GPL or non-commercial only license fine for most academics
- Industry and government need more liberal licenses in general, e.g. BSD, Apache 2
§ Can be challenging to ensure everyone gets something out of the deal
§ Avoiding the trap of dual-licensing – often kills community and shared ownership
§ Funders can find it harder to understand commercialization
§ We normally employ a services/consulting role
23. Government Open Source Collaborations
Brian Wylie
Sandia National Laboratories
Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.
24. Government Open Source Resources
• GOSCON Government Open Source Conference (goscon.org)
• Open Source Center: Foreign open source intelligence data (opensource.gov)
• Open Source Software Institute: Non-profit corp/govt/acad (oss-institute.org)
• Government Open Source Software Resource Centre (gossrc.org)
• Center for Strategic and International Studies (tracks open source legislation, csis.org)
25. Government Open Source Around the World
Chart: Open Source Initiatives by Region (2000-2009). Series: Failed, Proposed, Approved. Regions: Europe, Asia, Latin America, North America, Africa, Middle East. Vertical axis: 0 to 180.
Data Courtesy of the Center for Strategic and International Studies
26. Government Open Source Example Projects
Sandia, Los Alamos, Kitware, University of Utah
Open source data analysis and visualization platform
28. Government Open Source Collaboration Benefits
§ Government
- No specific vendor “lock-in/out”
- Allows a diversified development team
- Known code base (strengths and weaknesses)
- Typically easier to integrate with other OS tools
- Improvement of the OS project
§ Commercial
- Money
- Leveraging project for other/future work
- Improvement of the OS project
§ Academic
- Student/Professor support
- Publishing/Sharing
- Improvement of the OS project
29. Government Open Source Collaboration Issues
§ Government
- Need to relax into existing OS license*
- New projects should pick a liberal OS license
- Funding source may hesitate on Open Source
§ Commercial
- Proprietary projects / Intellectual Property
- Government bureaucracy
- Mixed software skill set
- Deliverables can get distorted
§ Academic
- Work may not be publication material
- If you do publish, it may be a joint publication
* No gov’t sell back clause