A Cabinet of Web 2.0 Scientific Curiosities
Ian Mulvany, Product Development Manager, Nature Publishing Group
This talk takes a tour through science-related web 2.0 efforts and discusses areas of the practice of science that web 2.0 approaches can change. A video of this presentation will be posted at http://videolectures.net/
Some of the people involved
• Timo Hannay - Director, Nature.com
• Jason Wilde - Publisher, Physical Sciences
• Amanda Ward - Head of Platform Technologies
• Tony Hammond - Applications Architect
• Alf Eaton - Product Development Manager
• Euan Adie - Product Development Manager
• Gavin Bell - Product Development Manager
• Hilary Spencer - Product Development Manager
• Ian Mulvany - Product Development Manager
• Publishing Industry Facts & Figures
• Nature
• (Some) Issues that Web 2.0 can impact
• Identity and Authority
• Content Discovery
• Citizen Science
• Google Wave
• Ongoing Challenge
• The Future
Costs of research (Source: Research Information Network)
A significant part of the total cost of research is the time researchers spend finding the right material to read. Better tools for information discovery offer an opportunity to reduce this cost. Source: http://www.rin.ac.uk/
• "It is intended, first, to place before the general public the grand results of scientific work and scientific discovery"
• "to aid scientific men ... by affording them an opportunity of discussing the various scientific questions that arise from time to time"
Norman Lockyer
Nature is principally a scientific communication company, so we have to engage with the methods of communication that matter to science. If we started today, our natural starting point would be the web, not a print journal.
(Some) Publishing Milestones
• 1896, Wilhelm Röntgen, X-Rays
• 1925, Raymond Dart, Australopithecus africanus
• 1938, P Kapitza, Superfluidity
• 1953, J D Watson and F H C Crick, DNA
• 1985, J C Farman, B G Gardiner and J D Shanklin, Ozone Hole
• 1995, Michel Mayor and Didier Queloz, Extra Solar Planets
• 2001, Human Genome
Journal Evolution
• 1869 Journal Founded
• 1899 Journal Makes a Profit
• 1967 Peer Review
• 1971 First Expansion (until 1974)
• 1992 Nature Genetics
• 1995 Holtzbrinck Ownership
• 1995 Nature.com
• 2004 Connotea
• 2007 Nature Network
Peer review was only introduced in 1967, to deal with a backlog of about 3,000 manuscripts.
2.0
Web 2.0 is about getting and using data. There are two aspects: lowering the barrier to participation, and mining the resulting information to provide better services or tools. This can also create a strong first-mover advantage: as the network of data or participation grows, so does its value.
Web 1.0                        Web 2.0
DoubleClick                    Google AdSense
Ofoto                          Flickr
Akamai                         BitTorrent
mp3.com                        Napster
Britannica Online              Wikipedia
personal websites              blogging
evite                          upcoming.org and EVDB
domain name speculation        search engine optimization
page views                     cost per click
screen scraping                web services
publishing                     participation
CMS                            wikis
directories (taxonomy)         tagging (folksonomy)
stickiness                     syndication
Image credit: Sam Brown, explodingdog
We should be careful not to focus only on the technology.
Building for machines:
• semantic markup
• well-documented APIs
Building for humans:
• reduce the barrier to participation
• increase the usefulness of serendipity and recommendation
Stay Classy, SXSW: Building Respectful Software
http://panelpicker.sxsw.com/ideas/view/3691?return=%2Fideas%2Findex%2Finteractive%2Fq%3Abuilding+respectful
Make your software respectful.
“While scientists have gloried in the disruptive effect that the Web is having on publishers and libraries, with many fields strongly pushing open publication models, we are much more resistant to letting it be a disruptive force in the practice of our disciplines.” Jim Hendler
Scientists resist
Although a data-driven approach should appeal to scientists, science changes slowly. There are many implicit norms that are hard to change.
• NIH requests all fundholders deposit their manuscripts in the PubMed Central archive: 4% compliance
• Nature offers to upload to PubMed Central on behalf of authors, with their permission: 70% of scientists can’t even be bothered to say ”yes”, giving 30% compliance
Scientists resist
One example of low participation in open data models is the low uptake of article deposition in PubMed Central.
[Diagram: a framework with two axes, Humans vs. Machines and Public vs. Academic]
This is the framework I'm going to use to think about the topics in this talk. These are just two dimensions along which one can look at things; there are many other ways of looking at these issues. While putting together these slides I became interested in the tension between machine-oriented and human-oriented efforts on the web. Web 2.0 can also have a big impact on public engagement with science, so I wanted to see whether I could line these two trends up together.
Identity on the web is a fractured thing. This makes it difficult to manage all of the accounts a person has, but on the other hand it makes it easy to present different personas to different online communities.
100,000
Identity is a significant and growing issue in science. Each year India produces 100,000 postdocs, and full names are often not revealed, owing to caste discrimination. http://www.nature.com/nature/journal/v452/n7187/full/452530d.html
1.1 billion > 129
Photo: Szymon Kochanski
129 surnames are shared by 1.1 billion people, 85% of the Chinese population. Generally, identity is a self-enforcing protocol. It works most of the time, but... Surgeon Liu Hui padded his CV with publications by another researcher who shared his surname and initial, and rose to become an assistant dean at Tsinghua University. Discrepancies were noticed, and the university dismissed him in March 2006.
http://www.mluvany.net
Scopus Author ID: 6603325879
Thomson ResearcherID: B-2805-2008
CrossRef Contributor ID: 62.1000/182
These are currently the most commonly discussed options for managing identity in an academic context. Each has pros and cons, and none has gained enough momentum to be universally adopted. Nature is currently taking a wait-and-see approach, but we would like to see an open system gain adoption.
1619 - 1677
Henry Oldenburg, first secretary of the Royal Society, invented the practice of peer review with the Philosophical Transactions of the Royal Society. His own reputation suffered: he was suspected of being a Dutch spy and imprisoned in the Tower of London for a while.
Main-path analysis and path-dependent transitions in HistCite™-based historiograms
Diana Lucio-Arias & Loet Leydesdorff, Amsterdam School of Communications Research (ASCoR), University of Amsterdam, Kloveniersburgwal 48, 1012 CX Amsterdam, The Netherlands. Journal of the American Society for Information Science and Technology (forthcoming).
This is the main-path analysis technique, but so far such analysis tends to be done on a case-by-case basis.
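Main-path analysis is usually described in terms of traversal weights on an acyclic citation network. The sketch below uses one common weighting, search path count (SPC: how many source-to-sink paths pass through an edge), followed by a simple greedy forward search. It is an illustration under those assumptions, not necessarily the exact variant used by Lucio-Arias and Leydesdorff or by HistCite; the function names and the toy graph are hypothetical.

```python
from collections import defaultdict
from functools import lru_cache


def search_path_counts(edges):
    """Search Path Count (SPC) weight for every edge of an acyclic citation graph.

    Edges point from the cited (older) paper to the citing (newer) one.
    SPC(u, v) = (paths from any source to u) * (paths from v to any sink),
    i.e. the number of source-to-sink paths that run through the edge.
    """
    succ, pred = defaultdict(list), defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
        pred[v].append(u)

    @lru_cache(maxsize=None)
    def paths_in(node):
        # A source (cites nothing in the graph) starts exactly one path.
        return 1 if not pred[node] else sum(paths_in(p) for p in pred[node])

    @lru_cache(maxsize=None)
    def paths_out(node):
        # A sink (never cited afterwards) ends exactly one path.
        return 1 if not succ[node] else sum(paths_out(s) for s in succ[node])

    return {(u, v): paths_in(u) * paths_out(v) for u, v in edges}


def main_path(edges):
    """Greedy forward main path: start with the heaviest edge leaving a source,
    then keep following the heaviest outgoing edge until a sink is reached.
    Ties are broken by the order in which edges were listed."""
    spc = search_path_counts(edges)
    has_predecessor = {v for _, v in edges}
    start = max((e for e in spc if e[0] not in has_predecessor), key=spc.get)
    path = list(start)
    while True:
        outgoing = [e for e in spc if e[0] == path[-1]]
        if not outgoing:
            return path
        path.append(max(outgoing, key=spc.get)[1])


# Hypothetical toy network: A and B are early papers, F and G recent ones.
edges = [("A", "C"), ("B", "C"), ("C", "D"), ("C", "E"),
         ("D", "F"), ("E", "F"), ("E", "G")]
print(search_path_counts(edges)[("C", "E")])  # 4
print(main_path(edges))                       # ['A', 'C', 'E', 'F']
```

The greedy search is the simplest choice; other formulations pick the globally heaviest path instead, which can differ on larger networks.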
Cox, D.R. (1972) Regression models and life-tables. J. Roy. Statist. Soc. B 34
21,000
Some papers act as a kind of black hole for citations: once they get into the literature, they are cited and cited and cited. This paper has over 21,000 citations. The mis-citations to this paper alone have an h-index of 12, a level that Hirsch had concluded “…might be a typical value for advancement to tenure…” http://network.nature.com/people/boboh/blog/2008/06/24/outdone-by-mis-prints
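For readers unfamiliar with the measure, the h-index is the largest number h such that h items each have at least h citations. A minimal Python sketch; the function name and the sample counts are illustrative only, not data from the blog post:

```python
def h_index(citations):
    """Hirsch's h-index: the largest h such that h items have at least h citations each."""
    h = 0
    # Rank items from most to least cited; h is the last rank whose count still reaches it.
    for rank, count in enumerate(sorted(citations, reverse=True), start=1):
        if count < rank:
            break
        h = rank
    return h


print(h_index([10, 8, 5, 4, 3]))  # 4
print(h_index([]))                # 0
```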
[Figure: ease of contributing plotted against ease of mining, for plain text, emails, hyperlinks, Twitter, views, tags, citations(?), microformats, academic papers and the Semantic Web]
PDF sucks: academic papers are hard to create, and it is hard to extract any useful information from a PDF programmatically.
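To make the contrast with PDF concrete: a page marked up with a microformat such as hCard exposes its meaning through class names, so a few lines of standard-library Python can pull it out. This is a deliberately minimal sketch, assuming the hCard `fn` (formatted name) class and plain text inside the `fn` element; the class name `HCardNames` is hypothetical, and a real consumer would use a full microformats parser.

```python
from html.parser import HTMLParser


class HCardNames(HTMLParser):
    """Collect the text of elements whose class list contains 'fn' (hCard formatted name)."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._capture = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if "fn" in classes:
            self._capture = True
            self.names.append("")

    def handle_endtag(self, tag):
        # Minimal assumption: the fn element contains no nested tags.
        self._capture = False

    def handle_data(self, data):
        if self._capture:
            self.names[-1] += data


html = ('<div class="vcard"><span class="fn">Ian Mulvany</span>, '
        '<span class="org">Nature Publishing Group</span></div>')
parser = HCardNames()
parser.feed(html)
print(parser.names)  # ['Ian Mulvany']
```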
[Diagram: the Humans vs. Machines / Public vs. Academic framework, populated with Article Writing, Peer Review, Author Identification and Article Publishing]
This is where most of the academic publishing workflow currently lives: manual work that can only be done by highly trained experts.
Case Study: Nature Chemistry
We have started extracting entities from our Nature Chemistry journal, and we hope to roll this program out to other journals.
[Image: structure of serotonin]
Serotonin
CAS – 50-67-9
SMILES – Oc1cc2c(cc1)ncc2CCN
InChI – 1S/C10H12N2O/c11-4-3-7-6-12-10-2-1-8(13)5-9(7)10/h1-2,5-6,12-13H,3-4,11H2
InChIKey – QZAYGJVTTNCVMB-UHFFFAOYSA-N
Chemistry is a visual science! Identifiers for molecules:
• CAS numbers: CAS first appeared in 1907, is owned by the ACS, and the numbers contain no semantics
• SMILES: 1987, not unique to a compound
• InChI/InChIKey: 200/2005
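The difference in machine-friendliness between these identifiers fits in a few lines. A CAS number carries only a check digit, so the most a program can do is validate it, while an InChI string splits into layers, the first of which after the version is the molecular formula. This Python sketch assumes the serotonin InChI reconstructed from the slide; the function names are hypothetical.

```python
def cas_checksum_ok(cas: str) -> bool:
    """Validate a CAS Registry Number such as '50-67-9'.

    The last digit is a check digit: number the other digits 1, 2, 3, ...
    starting from the right, multiply each by its position, sum, and take mod 10.
    """
    body, check = cas.rsplit("-", 1)
    digits = body.replace("-", "")
    total = sum(pos * int(d) for pos, d in enumerate(reversed(digits), start=1))
    return total % 10 == int(check)


def inchi_layers(inchi: str) -> dict:
    """Split an InChI string into version, formula and single-letter-prefixed layers."""
    prefix = "InChI="
    if not inchi.startswith(prefix):
        raise ValueError("not an InChI string")
    version, formula, *rest = inchi[len(prefix):].split("/")
    layers = {layer[0]: layer[1:] for layer in rest}  # e.g. 'c' connections, 'h' hydrogens
    return {"version": version, "formula": formula, **layers}


serotonin = ("InChI=1S/C10H12N2O/c11-4-3-7-6-12-10-2-1-8(13)5-9(7)10"
             "/h1-2,5-6,12-13H,3-4,11H2")
print(cas_checksum_ok("50-67-9"))           # True
print(inchi_layers(serotonin)["formula"])  # C10H12N2O
```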
Organise metadata: create a good architecture so that generated data can easily be reused across a range of applications.
Photo: http://www.flickr.com/photos/timecollapse/
We hope to extend the types of entities that we extract from our articles.
• Expanding the annotation of journal articles from Nature Chemistry to Nature Chemical Biology, and then to all NPG journals
• Creating a central NPG database of compounds and related journal articles
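In its simplest form, a central compound database is an index from a canonical compound identifier to the articles that mention it. The sketch below assumes compounds are keyed by standard InChIKey (14 letters, then 8 letters plus the standard flag S and version A, then a protonation letter) and articles by an opaque string; the `CompoundIndex` class and the `article-…` identifiers are hypothetical, not an NPG system.

```python
import re
from collections import defaultdict

# Standard InChIKey layout: 14 letters - 8 letters + 'S' + version 'A' - 1 letter.
INCHIKEY = re.compile(r"^[A-Z]{14}-[A-Z]{8}SA-[A-Z]$")


class CompoundIndex:
    """Hypothetical in-memory index linking compounds (by InChIKey) to articles."""

    def __init__(self):
        self._articles = defaultdict(set)

    def add(self, inchikey: str, article_id: str) -> None:
        # Reject anything that is not shaped like a standard InChIKey.
        if not INCHIKEY.match(inchikey):
            raise ValueError(f"not a standard InChIKey: {inchikey!r}")
        self._articles[inchikey].add(article_id)

    def articles_for(self, inchikey: str) -> list:
        return sorted(self._articles.get(inchikey, ()))


index = CompoundIndex()
index.add("QZAYGJVTTNCVMB-UHFFFAOYSA-N", "article-1")  # serotonin
print(index.articles_for("QZAYGJVTTNCVMB-UHFFFAOYSA-N"))  # ['article-1']
```

Keying on a canonical identifier rather than a name is what lets articles from different journals meet at the same compound.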
There are many curated databases that search the literature for domain-specific results. One example is FlyBase, which collects results obtained with the model organism Drosophila.
WormBase does the same for C. elegans. Both require a large amount of human curation. Semantic annotation of the body of scientific literature should help with this kind of curation.
Nature Precedings was the ﬁrst
preprint server for the life sciences. It also includes the ability to vote and comment on submissions and provides each submission with a unique identiﬁer.
PLoS has launched PLoS Currents: Influenza, built on top of Google Knol. Both Precedings and Currents have editorial curation of content, and allow easy publication of objects such as posters, proceedings papers and white papers.
The kind of information that we can capture with Connotea includes:
• full citation information
• usage patterns (when was an item added to our database, and how many times has it been added?)
• extra metadata, such as tags
• potentially, social network information (how many of my friends have added this item?)
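To illustrate how those signals could be derived, here is a hypothetical, simplified bookmark record and a function that computes usage, tag and friend signals for one item. This is not Connotea's actual data model; the field names, users and dates are invented for the example.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Bookmark:
    """One user saving one item (hypothetical, simplified record)."""
    user: str
    item: str          # e.g. a DOI or URL identifying the article
    added: date
    tags: tuple = ()


def item_signals(bookmarks, item, friends):
    """Usage, tag and social signals for one item, derived from a list of bookmarks."""
    hits = [b for b in bookmarks if b.item == item]
    return {
        "first_added": min((b.added for b in hits), default=None),
        "times_added": len(hits),
        "top_tags": Counter(t for b in hits for t in b.tags).most_common(3),
        "friends_who_added": sorted({b.user for b in hits} & set(friends)),
    }


bookmarks = [
    Bookmark("alice", "item-1", date(2008, 5, 1), ("proteomics", "methods")),
    Bookmark("bob", "item-1", date(2008, 6, 3), ("proteomics",)),
    Bookmark("carol", "item-2", date(2008, 6, 4), ("ecology",)),
    Bookmark("dave", "item-1", date(2008, 7, 9), ("methods", "proteomics")),
]
print(item_signals(bookmarks, "item-1", friends=["bob", "dave", "erin"]))
```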
This allows one to see
data on what is being read in Mendeley libraries. This starts to open up a new layer of information about the impact of papers that goes beyond what can be captured by the impact factor.
Nature Network
Online social communities also let us begin to capture conversations about science. NPG launched Nature Network in 2007, and it is one of the most active online forums for the discussion of science.
People are using these rooms to hold real-time conversations around live events. This broadcasts both the event and the conversation around it to the web, enabling real-time remote participation.
So now we can see
a world in which the article is no longer the only digital artefact of note. Much more of the process of science is becoming visible through online engagement of scientists.
[Diagram: the Humans vs. Machines / Public vs. Academic framework, populated with Article Writing, Peer Review, Author Identification, Article Publishing, Science Blogging/Tweeting/Social Communities, SIOC and Entity Extraction]
Social media as it exists now is problematic:
• effervescent
• closed
• siloed
• unstructured
Tools like SIOC, an ontology for social media, can help make this layer of information available to machines.
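To show what exposing this layer to machines might look like, the sketch below describes a single forum post with SIOC Core terms (`sioc:Post`, `sioc:has_container`, `sioc:has_creator`, `sioc:content`, `sioc:reply_of`) and writes it out as N-Triples using only the Python standard library. The example.org URIs and the function names are placeholders; a production system would more likely use an RDF library.

```python
SIOC = "http://rdfs.org/sioc/ns#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(text: str) -> str:
    """Quote a plain string as an N-Triples literal (escaping backslash, quote, newline)."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def post_to_ntriples(post_uri, forum_uri, author_uri, content, reply_to=None):
    """Describe one forum post with SIOC terms, one N-Triples statement per line."""
    post = f"<{post_uri}>"
    triples = [
        (post, f"<{RDF_TYPE}>", f"<{SIOC}Post>"),
        (post, f"<{SIOC}has_container>", f"<{forum_uri}>"),
        (post, f"<{SIOC}has_creator>", f"<{author_uri}>"),
        (post, f"<{SIOC}content>", literal(content)),
    ]
    if reply_to:
        triples.append((post, f"<{SIOC}reply_of>", f"<{reply_to}>"))
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


print(post_to_ntriples(
    "http://example.org/forum/1/post/2",
    "http://example.org/forum/1",
    "http://example.org/user/ian",
    "Great talk!",
    reply_to="http://example.org/forum/1/post/1",
))
```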
But another, more interesting, approach is to get people to interact directly with your data!
• Stardust at home: http://stardustathome.ssl.berkeley.edu/about.php
• Folding at home: http://folding.stanford.edu/
• Fold it: http://fold.it/portal/
• Citizen science blog: http://citizensci.com/
• Great Backyard Bird Count: http://www.birdsource.org/gbbc/
You need to make it engaging, like the Fold it project or Galaxy Zoo. Even if machines and machine learning could eventually answer some of these questions (such as image analysis of galaxy rotation), humans can do it now. You get the scientific benefit now, and you engage the public with science now.
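Projects like Galaxy Zoo collect many independent volunteer classifications of the same object and then combine them. One minimal way to do that is a majority vote that only reports a label when enough volunteers agree. The 80% agreement threshold, the five-vote minimum, the labels and the function name are illustrative assumptions, not Galaxy Zoo's published pipeline.

```python
from collections import Counter, defaultdict


def consensus(classifications, threshold=0.8, min_votes=5):
    """Combine (object_id, label) volunteer votes into one label per object.

    Returns the majority label when at least `min_votes` votes were cast and
    the majority holds at least `threshold` of them; otherwise 'uncertain'.
    """
    votes = defaultdict(Counter)
    for obj, label in classifications:
        votes[obj][label] += 1
    result = {}
    for obj, counts in votes.items():
        total = sum(counts.values())
        label, n = counts.most_common(1)[0]
        agreed = total >= min_votes and n / total >= threshold
        result[obj] = label if agreed else "uncertain"
    return result


votes = ([("g1", "clockwise")] * 9 + [("g1", "anticlockwise")]
         + [("g2", "clockwise")] * 3 + [("g2", "anticlockwise")] * 3
         + [("g3", "elliptical")] * 2)
print(consensus(votes))
# {'g1': 'clockwise', 'g2': 'uncertain', 'g3': 'uncertain'}
```

Objects that come back "uncertain" are natural candidates for more volunteer attention, or for the machine-learning approaches mentioned above.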
[Diagram: the Humans vs. Machines / Public vs. Academic framework, now also including Fold it, Stardust at home, Peer to Patent, Galaxy Zoo, Turk, RDF, Seti at Home and Folding at home]
Now we have an interesting picture, but most of the arrows in it point down. Where are the efforts to make computers friendlier to people? One pointer to how that will happen in the future is Google Wave.
Google Wave
Photo credit: Flickr user prgibbs
A new product from Google, launching in September 2009. For the definitive guide to Google Wave, see: http://www.youtube.com/watch?v=v_UyVmITiYQ
[Diagram: Robot, App Engine, Gadget, HTML5, Embed, Container (Blogger)]
Of interest to developers are the APIs that Wave exposes. Naively, one can think of robots as allowing two-way communication with a wave, gadgets as a way of pulling content into a wave, and the Embed API as a tool for pushing waves into other contexts, such as blogs or wikis.
[Diagram: Email thread? Document? Game server? IM? Gallery? Group? ...]
The metaphors for what Wave is have not settled down yet. This is a consequence of the current interface; new interfaces will be possible. The key is that Wave can expose third-party APIs to the user while hiding the details completely, which makes it easier for people to interact with computers.
[Diagram: the Humans vs. Machines / Public vs. Academic framework with WAVE added alongside Fold it, Stardust at home, Article Writing, Peer to Patent, Peer Review, Galaxy Zoo, Author Identification, Article Publishing, Science Blogging/Tweeting/Social Communities, Turk, SIOC, RDF, Entity Extraction, Seti at Home and Folding at home]
Final thoughts
Some predictions for scientific publishing:
• Publishers will continue to exist, but will become communication companies
• They must learn to treat the web as a network, not a distribution channel
• Journals should be more like databases, and vice versa
• Publishing and broadcasting are merging (or colliding?); to some extent, the same goes for publishing and software
• The disruptive forces include new economics, lower barriers to entry, and a complex competitive environment
Final thoughts
Some predictions for science:
• Mobile devices as sensors, e.g. noisetube.net
• Rich web applications building on HTML 5 will be a real competitor to the desktop
• The problem of scientific identity will be solved
• We will have a scientific recommendation engine that works
• Frameworks for programming genetic code, much as we now program computer code, will be available
• Computers will do much of the heavy lifting of science
• http://www.nature.com/nature/focus/arts/futures
“The future is already here. It's just not very evenly distributed” - William Gibson
Sci Foo is an annual weekend un-conference that brings together people doing interesting things at the interface between science, technology and culture. Looking at what these people are doing gives us a hint of things to come.
Extra image acknowledgements
• Matthew Field, Lots Of People: http://www.flickr.com/people/matthewfield/
• Garth Burgess, Southampton Docks: http://www.flickr.com/people/garthimage/
• Pamela Bumstead, 50 reasons not to: http://13c4.wordpress.com/
• clock: http://www.flickr.com/people/mayeve/
• Sarah Gerke, Rolodex: http://www.flickr.com/people/sublimelyhappy/
• Kate Andrews, Library: http://www.flickr.com/people/thedepartment/
• Alexander Hauser, new mail: http://www.flickr.com/people/sirstick/
• The Thinker: http://commons.wikimedia.org/wiki/User:CJ
• Gavin Bell, helpful discussions about OpenID