Communicating Systems Biology - Why and How We Should Do Better in a Digital World
1. Communicating Systems Biology –
Why and How We Should Do Better
in a Digital World?
Philip E. Bourne
University of California San Diego
pbourne@ucsd.edu
http://www.slideshare.net/pebourne/
ICBP Houston April 27, 2012
2. Why We Should Do Better
• Discovery processes are increasingly complex
and broad in scope
• Data must be connected more closely to the
methods under study
• Science is an increasingly social endeavor
http://www.discoveryinformaticsinitiative.org/
Yolanda Gil and Haym Hirsh
3. Why We Should Do Better
The Scientific Process is Too Slow to Respond to
a Crisis – Either Global or Personal
By the time the paper is published
we could all be dead
http://knol.google.com/k/plos-currents-influenza#
Motivation
4. In a time of crisis, fast access
to accurate data, and to any knowledge associated
with that data, is paramount
[Figure: Structure Summary page activity for H1N1 influenza-related structures, Jan. 2008 to Jul. 2010. Structures shown: 3B7E (neuraminidase of A/Brevig Mission/1/1918 H1N1 strain in complex with zanamivir) and 1RUZ (1918 H1 hemagglutinin).]
* http://www.cdc.gov/h1n1flu/estimates/April_March_13.htm
5. If that is not enough…
For some people the scientific
process may be too slow to save
their life
6. Josh Sommer – A Remarkable Young Man
Co-founder & Executive Director of the Chordoma Foundation
http://sagecongress.org/Presentations/Sommer.pdf
7. Chordoma
• A rare form of brain
cancer
• No known drugs
• Treatment – surgical
resection followed by
intense radiation
therapy
http://upload.wikimedia.org/wikipedia/commons/2/2b/Chordoma.JPG
14. Science is an Increasingly Social
Endeavor
Witness the Story of Meredith
15. A Requirement is More Open Science
But ….
16. Openness is Misunderstood by
Scientists
• Witness the confusion regarding open access
• Witness PubMed Central
17. What Are the Impediments to Open
Science?
Change Reward
You don’t get tenure for starting
a blog!
18. How Can We Do Better? …
19. How Can We Do Better?
• Better communication, data and knowledge access,
and new modes of discovery, which means:
– We need data and knowledge about that data to
interoperate i.e. we need new kinds of fast, versatile
publications and data archives
– We need to be more open with both
– We need to think more about the tools that analyze,
visualize and annotate data to maximize knowledge
discovery
– Reward systems need to change
– We need scientist management and discovery tools
– We need to be less fixated on the big data problems
– We need to unleash the full power of the Internet
[Scale: Easy to Hard]
20. Both Are Under Stress
Journals
• PubMed contains ~21M entries (May 2011)
• ~100,000 papers indexed per month
• In Feb 2009:
– 67,406,898 interactive searches were done
– 92,216,786 entries were viewed
Databases
• 1330 databases reported in NAR 2011
• MetaBase (http://biodatabase.org) reports 2,651 entries edited 12,587 times
PLoS Comp. Biol. 2005 1(3) e34
21. Some More Comparisons
Journals
• Journals have a pretty standardized interface
• Journals have a business model
• The quality is declining as numbers increase (?)
• Audience believes they are sustainable
Databases
• Efforts to make the interfaces different!
• Little attempt at a business model compared to the Web 2.0 world
• Quality is increasing (?)
• Not well sustained
Databases versus journals PLoS Comp. Biol. 2008. 4(7): e1000136
22. We Need Data and Knowledge About That Data to Interoperate
The Knowledge and Data Cycle
[Figure: a numbered cycle linking literature and database content, annotated with two sets of step labels]
First set of labels:
0. Full text of PLoS papers stored in a database
1. A link brings up figures from the paper
2. Clicking the paper figure retrieves data from the PDB which is analyzed
3. A composite view of journal and database content results
4. The composite view has links to pertinent blocks of literature text and back to the PDB
Second set of labels:
1. User clicks on content
2. Metadata and webservices to data provide an interactive view that can be annotated
3. Selecting features provides a data/knowledge mashup
4. Analysis leads to new content I can share
PLoS Comp. Biol. 2005 1(3) e34
23. We Need Data and Knowledge About That
Data to Interoperate – What is Stopping Us?
• Governance – publishers vs. database
providers
• Reward
• Metadata standards for provenance, privacy
etc.
• Exemplars
• ….
Caveat: Each discipline is different – I speak very much from a biomedical
sciences perspective
24. A Small Example - The World Wide
Protein Data Bank
• The single worldwide
repository for data on
the structure of
biological
macromolecules
• Vital for drug discovery
and the life sciences
• 41 years old
• Free to all
http://www.wwpdb.org
PLoS Comp. Biol. 2005 1(3) e34
We need data and knowledge about that data to interoperate
25. The World Wide Protein Data Bank –
The Best Case Scenario
• Paper not published
unless data are
deposited – strong data
to literature
correspondence
• Highly structured data
conforming to extensive
ontologies
• DOIs assigned to every
structure
http://www.wwpdb.org
PLoS Comp. Biol. 2005 1(3) e34
26. Example Interoperability: The Database View
www.rcsb.org/pdb/explore/literature.do?structureId=1TIM
BMC Bioinformatics 2010 11:220
27. Example Interoperability: The Literature View
http://biolit.ucsd.edu
Nucleic Acids Research 2008 36(S2) W385-389
28. Semantic Tagging & Widgets are a
Powerful Tool to Integrate Data and
Knowledge of that Data, But as Yet
Not Used Much
Will Widgets and Semantic Tagging Change Computational Biology?
PLoS Comp. Biol. 6(2) e1000673
29. Semantic Tagging of Database Content
in The Literature or Elsewhere
http://www.rcsb.org/pdb/static.do?p=widgets/widgetShowcase.jsp
Semantic Tagging PLoS Comp. Biol. 6(2) e1000673
30. Where Will It All End?
http://richard.cyganiak.de/2007/10/lod/lod-datasets_2010-09-22_colored.html
31. This is Literature Post-processing
Better to Get the Authors Involved
• Authors are the absolute experts on the
content
• More effective distribution of labor
• Add metadata before the article enters the
publishing process
32. Word Add-in for Authors
• Allows authors to add metadata as they write, before they
submit the manuscript
• Authors are assisted by automated term recognition
– OBO ontologies
– Database IDs
• Metadata are embedded directly into the manuscript
document via XML tags, OOXML format
– Open
– Machine-readable
• Open source, Microsoft Public License
http://www.codeplex.com/ucsdbiolit
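A minimal sketch of the term-recognition idea, for database IDs only. This is not the add-in's code: the `<term>` element, its attributes, the function name and the PDB ID pattern (one digit followed by three alphanumerics, at least one a letter) are all assumptions made for illustration, and the real add-in stores its metadata in OOXML rather than in a plain string.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Assumption: a PDB ID is a digit 1-9 followed by three alphanumerics,
# at least one of which is a letter (so years such as 2012 are skipped).
PDB_ID = re.compile(r"\b[1-9](?=[A-Za-z0-9]{0,2}[A-Za-z])[A-Za-z0-9]{3}\b")


def tag_pdb_ids(text: str) -> str:
    """Wrap candidate PDB IDs in a hypothetical <term> element."""
    pieces = []
    last = 0
    for match in PDB_ID.finditer(text):
        # Escape the untouched text so the result stays well-formed XML.
        pieces.append(escape(text[last:match.start()]))
        pdb_id = match.group(0).upper()
        pieces.append(
            f"<term source=\"PDB\" id={quoteattr(pdb_id)}>"
            f"{escape(match.group(0))}</term>"
        )
        last = match.end()
    pieces.append(escape(text[last:]))
    return "".join(pieces)


if __name__ == "__main__":
    print(tag_pdb_ids("Structure 1TIM was solved long before 2012."))
```

A pattern this loose will produce false positives, which is one reason the slide has the author, not the software, confirm each suggested term.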
33. Challenges
• Authors
– Carrot: if one or more publishers fast-tracked papers that had semantic markup, it might catch on
• Publishers
– Carrot: competitive advantage
34. The Promise – A Hypothetical Example
[Figure: the cardiac disease literature and the immunology literature, linked through a shared function]
35. How Can We Do Better?
• Better communication, data and knowledge access,
and new modes of discovery, which means:
– We need data and knowledge about that data to
interoperate i.e. we need new kinds of fast, versatile
publications and data archives
– We need to be more open with both
– We need to think more about the tools that analyze,
visualize and annotate data to maximize knowledge
discovery
– Reward systems need to change
– We need scientist management and discovery tools
– We need to be less fixated on the big data problems
– We need to unleash the full power of the Internet
[Scale: Easy to Hard]
36. One Small Example of the Problem
• Jmol, VMD … are important de facto standard tools for rendering biological molecules, but
• They are not versatile, i.e. they do not, for example:
– Respond to the data they are reading
– Offer views that match the user’s interests
– Allow the user to annotate
the data
– Allow those annotations to
be shared (published?)
Think More About the Tools
37. GitHub is Great But We Need Apps for
Science
Computational Biology Resources Lack Persistence and Usability. PLoS
Comp. Biol. 2008 4(7): e1000136
38. A Few Things to Accelerate the Rate of
Scientific Discovery
• Better communication, data and knowledge access,
and new modes of discovery, which means:
– We need data and knowledge about that data to
interoperate i.e. we need new kinds of fast, versatile
publications and data archives
– We need to be more open with both
– We need to think more about the tools that analyze,
visualize and annotate data to maximize knowledge
discovery
– Reward systems need to change
– We need scientist management tools
– We need to be less fixated on the big data problems
– We need to unleash the full power of the Internet
[Scale: Easy to Hard]
39. Reward Systems Need to Change
What is Needed?
• Author disambiguation
• Auditing (identification and metrics) of all
scholarship - means new tools
• Seniors need to promote alternative forms of
scholarship
• Juniors need to respond
Ten Simple Rules for Getting Promoted as a Computational Biologist in Academia
PLoS Comp. Biol. 2011 7(10) e1002001
Reward Systems Need to Change
40. What Are these Alternative Forms of
Scholarship?
[Diagram of forms of scholarship: Research, [Grants], Journal Article, Conference Paper, Poster Session, Reviews, Curation, Blogs, Community Service/Data]
42. A Unique Identifier is Going to Happen
• It is DOIs for people
• Some scientists will
resist
• The winner is ORCID?
43. Ideally the ID will be Tagged to Every
Piece of Scholarly Communication
I Am Not a Scientist, I Am a Number
PLoS Comp. Biol. 2008 4(12) e1000247
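If the identifier does turn out to be ORCID, one practical consequence for tagging is that an ID can be checked for typos before it is attached to a piece of scholarship. ORCID iDs are 16 characters, the last being an ISO 7064 MOD 11-2 check character. A minimal sketch of that check follows; the function names are illustrative, not part of any ORCID library.

```python
import re


def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(orcid: str) -> bool:
    """True if the string has ORCID iD shape and a matching check character.

    This only checks the format; it does not confirm that the iD is registered.
    """
    compact = orcid.replace("-", "").upper()
    if not re.fullmatch(r"[0-9]{15}[0-9X]", compact):
        return False
    return orcid_check_character(compact[:15]) == compact[15]


if __name__ == "__main__":
    print(is_well_formed_orcid("0000-0002-1825-0097"))
```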
44. One Solution:
Use the Traditional Reward System in New Ways
The Wikipedia Experiment – Topic Pages
• Identify areas of Wikipedia that
relate to the journal and are
missing or stubs
• Develop a Wikipedia page in the
sandbox
• Have a Topic Page Editor review
the page
• Publish the copy of record with
associated rewards
• Release the living version into
Wikipedia
45. How Can We Do Better?
• Better communication, data and knowledge access,
and new modes of discovery, which means:
– We need data and knowledge about that data to
interoperate i.e. we need new kinds of fast, versatile
publications and data archives
– We need to be more open with both
– We need to think more about the tools that analyze,
visualize and annotate data to maximize knowledge
discovery
– Reward systems need to change
– We need scientist management and discovery tools
– We need to be less fixated on the big data problems
– We need to unleash the full power of the Internet
[Scale: Easy to Hard]
46. The Truth About My Laboratory
• I have ?? mail folders!
• The intellectual
memory of my
laboratory is in those
folders
• This is an unhealthy hub
and spoke mentality
We Need Scientist Management Tools
47. The Truth About My Laboratory
• I generate way more negative than
positive data, but where is it?
• Content management is a mess
– Slides, posters…..
– Data, lab notebooks ….
– Collaborations, Journal clubs …
• Software is open but where is it?
• Farewell is for the data too
(Image: http://artbyvida.com/portfolio.php)
Computational Biology Resources Lack Persistence and Usability. PLoS
Comp. Biol. 2008 4(7): e1000136
48. Many Great Tools Out There
Taverna
49. The Dream of Discovery Informatics
• At the end of the day a software agent reviews
all of our lab’s electronic notebooks. Common
themes and individual interests are extracted
and searched against recent literature, public
data, blogs and other social media, and the results
are returned and ranked for perusal the next morning
over coffee.
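A toy sketch of the agent's core loop, to make the dream concrete. Everything here is an assumption: notebooks and incoming items are plain strings, a "theme" is simply a word that appears in at least two notebooks, and ranking is by the number of shared themes. A real agent would need far better text mining than this.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case words longer than three letters (a crude stand-in for term extraction)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3]


def common_themes(notebooks: dict[str, str], min_members: int = 2) -> set[str]:
    """Terms that occur in the notebooks of at least `min_members` lab members."""
    counts = Counter()
    for text in notebooks.values():
        counts.update(set(tokenize(text)))
    return {term for term, n in counts.items() if n >= min_members}


def rank_items(themes: set[str], items: dict[str, str]) -> list[tuple[str, int]]:
    """Rank literature, data or blog items by how many lab themes they mention."""
    scored = [(title, len(themes & set(tokenize(body)))) for title, body in items.items()]
    hits = [pair for pair in scored if pair[1] > 0]
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    notebooks = {
        "alice": "Docking zanamivir against neuraminidase",
        "bob": "Neuraminidase mutations and zanamivir resistance",
    }
    overnight = {
        "paper A": "Zanamivir binding to neuraminidase",
        "paper B": "Chordoma cell lines",
        "blog C": "neuraminidase structures",
    }
    for title, score in rank_items(common_themes(notebooks), overnight):
        print(score, title)
```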
50. How Can We Do Better?
• Better communication, data and knowledge access,
and new modes of discovery, which means:
– We need data and knowledge about that data to
interoperate i.e. we need new kinds of fast, versatile
publications and data archives
– We need to be more open with both
– We need to think more about the tools that analyze,
visualize and annotate data to maximize knowledge
discovery
– Reward systems need to change
– We need scientist management tools
– We need to be less fixated on the big data problems
– We need to unleash the full power of the Internet
[Scale: Easy to Hard]
51. Yes YouTube Can Increase the Rate of
Discovery
Unleash the full power of the Internet
52. The Lab Experiment
Paper+Rich Media
• My students enjoyed the experience
• The shyest student was actually the most bold
in front of the camera
• “We will become a generation of science
casters”
• They liked the exposure for the most part –
rather than the PI it puts them out in front
53. Organic Growth
3 Years Later
www.scivee.tv
• Some of their work viewed 20,000+ times
• Global audience of researchers, educators and
academic/research institutions
– 60,000 unique visitors & 2M pageviews/month
– 16,000 registered users & 600 communities
– 5,000 uploads of video content (about journal articles,
conferences, research news and classes)
– Growing 4-5% monthly
• Sustainability - evolving a business model
supporting journals and conferences
54. Products
What Emerged: SciveeCasts
Application   Product                        Primary Customers
Journals      PubCast                        Journals, publishers, societies
Meetings      PosterCast, SlideCast          Societies, conference orgs.
Comm.         PaperCast, Podcast, SlideCast  Societies, journals
Education     PosterCast, SlideCast          Societies, universities
Books         BookCast                       Publishers, book sellers
55. Proposal - The TeachU Workflow
[Workflow diagram]
Step 1: presenter starts PowerPoint (Mac or PC)
Step 2: presenter starts recording on smart phone (Android, iPhone, Windows Phone 7)
Step 3: presenter stops recording and initiates upload (podcast and sync file)
Step 4: slides are uploaded to the website
Step 5: slides and podcast are automatically synchronized
Step 6: listener plays back synchronized presentation
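A sketch of how the synchronization in Steps 5 and 6 could work. The proposal does not specify the sync file format, so this assumes the simplest possible one: a sorted list of offsets, in seconds from the start of the recording, at which each slide came on screen. The function name is hypothetical.

```python
from bisect import bisect_right


def slide_at(slide_start_times: list[float], position: float) -> int:
    """Return the 1-based slide number showing at `position` seconds into the podcast.

    `slide_start_times[i]` is the assumed sync-file entry for slide i + 1.
    """
    if position < 0:
        raise ValueError("position must be non-negative")
    # Number of slides that have started at or before this position;
    # before the first recorded start, fall back to slide 1.
    return max(1, bisect_right(slide_start_times, position))


if __name__ == "__main__":
    sync = [0.0, 42.5, 97.0]  # hypothetical start times for slides 1 to 3
    for seconds in (10.0, 42.5, 100.0):
        print(seconds, "->", "slide", slide_at(sync, seconds))
```

With a lookup like this the player only needs the podcast's current position to decide which slide to show, so the listener can seek freely during playback.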
56. Acknowledgements
• BioLit Team
– Lynn Fink • wwPDB team
– Parker Williams
– Marco Martinez – Andreas Prlić
– Rahul Chandran – Dimitris Dimitropoulos
– Greg Quinn
• MBT • SciVee Team
– John Moreland – Apryl Bailey
– John Beaver
– Leo Chalupa
– Lynn Fink http://www.scivee.tv
• Microsoft Scholarly Communications
– Pablo Fernicola – Marc Friedman (CEO)
– Lee Dirks – Ken Liu
– Savas Parastatidis – Alex Ramos
– Alex Wade – Willy Suwanto
– Tony Hey
– Ben Yukich
http://biolit.ucsd.edu
http://www.pdb.org
http://www.codeplex.com/ucsdbiolit
58. What Is Open Science?
• Unrestricted access and reuse of scientific
knowledge as found in the literature and
elsewhere provided attribution is given
• Ditto the data, protocols, software etc. from
which that knowledge is derived
• Something catalyzed by the Fourth Paradigm
59. What Motivates Me to Talk About
Open Science?
• I am a domain (life) scientist not a computer or information
scientist
• I have been co-directing a major open and freely accessible
biological data source – the Protein Data Bank (PDB) for the past 11
years.
• Almost 6 years ago I co-founded and remain the founding Editor in
Chief of the open access journal PLoS Computational Biology
• I co-founded SciVee.tv to disseminate science in new ways
• There must be a business model to enable persistence and growth
60. What Are the Promises of Open
Science?
• To accelerate the rate of scientific discovery
worldwide
• To enable contributions from a broader
geographic and economic base
• To approach learning and comprehension in
new ways
• To reach a broader audience including the
general public
61. MBT Features
http://mbt.sdsc.edu
• Offers a framework, not an end-user application
• Responds to the data type
• Supports read/write access
• Encourages others to write end-user applications
• Discourages feature creep
[Figure: example applications for immunologists (Immunome Research, 2007 3(1):3) and medicinal chemists (BMC Bioinformatics 2005, 6:21)]