Linking chemistry: wider lessons for how we publish research


Published in: Technology, Education
  • Re-engineer to have a single web site that integrates all our content types. Users can come in and explore our content, find relationships, and find the value in their subscriptions. Involves a lot of complex database issues and historical content issues.
  • Starting point: ask our customers. Started with our internal customers (US). 30 interviews, like this one we did in China: USA 10, UK 10, China 10. Looking for similarities, differences and common threads, with a focus on what you do and how we can help you do it.
  • Customer-focussed requirements: PDF, not Prospect; simplicity; few mouse clicks; Google-like search.
  • This is how people search: over 60%. What’s the point?
  • Users want PDF: “lovely, and beautiful”. We’ve had a long-standing commitment to HTML as the article of the future. We have demonstrated embedded movies, semantic enrichment (data mining) and Project Prospect. None of that can be achieved through a static PDF. Enable a system for faster delivery of PDFs: PDF Advance Articles.
  • Execution: joint announcement between Microsoft Research and Creative Commons at O’Reilly eTech. Binary and source code available on CodePlex as of 3/11/2009. Based on a research project with Dr. Phil Bourne at University of California-San Diego (2008). Goals: facilitate semantic mark-up using ontologies and controlled vocabularies; facilitate or automate referencing to PDB, NCBO and other bio-related resources from the manuscript. Scenario: authors do not need to be aware of the use of semantic technologies; a domain-specific ontology is downloaded and made available from within Microsoft Word; authors can record their intention, the meaning of the terms they use, based on their community’s agreed vocabulary.

    1. 1. Linking chemistry: wider lessons for how we publish research
    2. 2. Anomalocaris
    3. 4. What have we gained?
       • Very interesting
       • Very clever
       • We’re all innovating into niches, growing extra legs, fins, ears...
    4. 5. Who has this been for?
       • US!
       • All good publicity, shows we’re all good at our jobs, pats on the back all round
       • Have any of these widgets solved any more research problems? Saved time? Money?
       • And how much are they used? Really?
       • But it helps us become flexible enough to try out new things, test models, approaches, technologies
    5. 6. OK, what do we think? 2007 version
       • Publishing methods are changing
       • Reading methods are changing
       • We know what our readers want
       • We know how the researchers work
       • How many did we get right?
    6. 7. RSC Prospect – semantic publishing
       • What were we trying to improve?
           • Discoverability
           • Use
           • Understanding
           • Linking
       • Steering how the chemical sciences develop on the web...
    7. 10. What do we learn with Prospect?
       • This is probably the way to go
       • How do we cover all subjects?
       • Scale-up in manual QA
       • Scale-up during huge growth and scope
       • How to use all that real chemistry data
       • Pump prime to change what we ask from authors
       • Is our vision the day-glo article?
           • “Free headache for every user”
    8. 11. Do we know what’s going on?
    9. 12. Vision … eBooks Journals Databases
    10. 13. Market Research
    11. 14. Students
       • “Starting point for finding information is Wikipedia”
       • “I’d like to use images from articles”
       • “Colleagues tell me about papers so I don’t really need to go looking for them”
       • Not a very sophisticated outlook: they have access to expensive services and content but don’t use them.
    12. 15. Academic Faculty
       • “Too busy to spend much time online”
       • “My students tell me about everything that I need to know”
       • “I use email alerts to find content”
    13. 16. Industrial Researchers
       • “Need to find information any way we can”
       • “Don’t have access to expensive systems”
       • Most ingenious and sophisticated users.
    14. 17. Librarians
       • “Don’t change any URLs”
       • “Users are confused when they don’t have access to all the content”
       • “When things go wrong, we have to sort it out”
       • “I need my customers to know that the library has paid for this content”
    15. 18. Key Findings Simplicity
    16. 19. Key Findings
    17. 20. So what have we built?
       • Simple interface to fulfill the main use: come in, find a paper
       • RSC Publishing platform
       • All our journals, books and databases as XML, in MarkLogic. Normalised. A single query interface.
    18. 21. The article of the future
    19. 22. PDFs
    20. 24. Dumbing down can be good
       • Lowest common denominator
           • PDF
           • Word & ChemDraw
           • JPG
           • Enables participation, exchange & comparison
               • familiarity for the user
    21. 25. Authors?
       • number
       • tech ability/interest of authors & readers
    22. 26. Rzepa,H.
    23. 28. Ontology Add-in for Word 2007
       • Phil Bourne
       • Lynn Fink
       • John Wilbanks
       • Source code and binary:
       • Relationships: Ontology browser
       • Intent: Term recognition & disambiguation based on OBO or OWL formats
       • Services: Ontology download web service
    24. 29. Authoring: Chem4Word – Chemistry Drawing in Word
       • Author and edit 1D and 2D chemistry.
       • Relationships: Navigate and link referenced chemistry
       • Data: Semantics stored in Chemistry Markup Language
       • Intent: Recognizes chemical dictionary and ontology terms
       • Intelligence: Verifies validity of authored chemistry
       • Available soon:

       <?xml version="1.0"?>
       <cml version="3" convention="org-synth-report" xmlns="...">
         <molecule id="m1">
           <atomArray>
             <atom id="a1" elementType="C" x2="-2.9149999618530273" y2="0.7699999809265137" />
             <atom id="a2" elementType="C" x2="-1.5813208400249916" y2="1.5399999809265137" />
             <atom id="a3" elementType="O" x2="-0.24764171819695613" y2="0.7699999809265134" />
             <atom id="a4" elementType="O" x2="-1.5813208400249912" y2="3.0799999809265137" />
             <atom id="a5" elementType="H" x2="-4.248679083681063" y2="1.5399999809265137" />
             <atom id="a6" elementType="H" x2="-2.914999961853028" y2="-0.7700000190734864" />
             <atom id="a7" elementType="H" x2="-4.248679083681063" y2="-1.907348645691087E-8" />
             <atom id="a8" elementType="H" x2="1.0860374036310796" y2="1.5399999809265132" />
           </atomArray>
           <bondArray>
             <bond atomRefs2="a1 a2" order="1" />
             <bond atomRefs2="a2 a3" order="1" />
             <bond atomRefs2="a2 a4" order="2" />
             <bond atomRefs2="a1 a5" order="1" />
             <bond atomRefs2="a1 a6" order="1" />
             <bond atomRefs2="a1 a7" order="1" />
             <bond atomRefs2="a3 a8" order="1" />
           </bondArray>
         </molecule>
       </cml>
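To make "semantics stored in Chemistry Markup Language" concrete, here is a minimal Python sketch that reads the atoms and bonds of the CML fragment on the Chem4Word slide and derives a molecular formula. It is an illustration only, not how Chem4Word itself works. Assumptions: the 2D coordinates are dropped for brevity, the namespace is omitted because its URI is missing from the transcript, and the helper names `molecular_formula` and `bond_list_from` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Atoms and bonds copied from the slide's CML fragment.
# Assumption: coordinates and the xmlns attribute are left out on purpose.
CML = """<cml version="3" convention="org-synth-report">
  <molecule id="m1">
    <atomArray>
      <atom id="a1" elementType="C"/>
      <atom id="a2" elementType="C"/>
      <atom id="a3" elementType="O"/>
      <atom id="a4" elementType="O"/>
      <atom id="a5" elementType="H"/>
      <atom id="a6" elementType="H"/>
      <atom id="a7" elementType="H"/>
      <atom id="a8" elementType="H"/>
    </atomArray>
    <bondArray>
      <bond atomRefs2="a1 a2" order="1"/>
      <bond atomRefs2="a2 a3" order="1"/>
      <bond atomRefs2="a2 a4" order="2"/>
      <bond atomRefs2="a1 a5" order="1"/>
      <bond atomRefs2="a1 a6" order="1"/>
      <bond atomRefs2="a1 a7" order="1"/>
      <bond atomRefs2="a3 a8" order="1"/>
    </bondArray>
  </molecule>
</cml>"""


def molecular_formula(molecule):
    """Count elementType values and write them in Hill order."""
    counts = Counter(atom.get("elementType") for atom in molecule.iter("atom"))
    if "C" in counts:
        # With carbon present: C first, then H, then the rest alphabetically.
        rest = sorted(e for e in counts if e not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        order = sorted(counts)
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "") for e in order)


def bond_list_from(molecule):
    """Return (atom, atom, order) tuples, checking that every bond refers to a known atom."""
    atom_ids = {atom.get("id") for atom in molecule.iter("atom")}
    bonds = []
    for bond in molecule.iter("bond"):
        first, second = bond.get("atomRefs2").split()
        if first not in atom_ids or second not in atom_ids:
            raise ValueError(f"bond refers to unknown atom: {first} {second}")
        # Assumption: bond orders are numeric, as they are in this fragment.
        bonds.append((first, second, int(bond.get("order"))))
    return bonds


root = ET.fromstring(CML)
molecule = root.find("molecule")
formula = molecular_formula(molecule)
bond_list = bond_list_from(molecule)
print(formula, len(bond_list))  # C2H4O2 7
```

Because the structure is stored as data rather than as a picture, a few lines are enough to recover the formula or to check that the bonds are internally consistent.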
    25. 30. We forget our place
       • We assume our readers spend their lives in our platforms
    26. 31. What do humans want?
       • As few interfaces as possible
    27. 32. What do computers want?
       • Web services
    28. 33. What is ChemSpider? + + Free to use
    29. 34. Quick history
       • Launched 2007 with a vision
       • Bedroom project for 2 years
       • Acquired by RSC in 2009
       • Since acquisition
           • Infrastructure and development resource
           • Community and user support added
           • Integration with RSC Publishing resources
           • Engagement with RSC’s community of influence
    30. 35. ChemSpider as aggregator
       • Public and non-public compound sets
       • Publications
       • Domain resources – DailyMed, Wikipedia
       • Personal data sets
       • SureChem patents
       • Multimedia additions, blogs, Open Notebook Sci
       • Data normalised, errors rejected
    31. 36. ChemSpider as chemical search
       • Single search across 400 data sources
       • Text, structure and substructure search tools
           • Simple and very advanced
       • Validated names for external search expansion
           • Google Scholar, PubMed, RSC, Google patents
       • Links to original sources
       • Mobile version
    32. 37. ChemSpider as wiki
       • Adds quality and quantity
       • Users can comment, others action
       • Registered users can curate names, add links, spectra, data and multimedia resources
       • Depositors can load data sets, curated on load
       • Curators and Master Curators approve additions and decide what is the correct structure
    33. 38. ChemSpider as resource
       • Download and reuse structures
       • Download and reuse search structure set
       • 2D and 3D images
       • Embed and link - never draw a structure again
       • Predicted properties
           • Physical, Environmental, Biological
       • Spectra, crystal structures
       • ChemSpider SyntheticPages
       • Spectral game
    34. 39. ChemSpider as compound hub
       • Extensive web services
           • Utility conversion services
           • MassSpec services
           • Name + structure lookups
           • Prediction services
       • Validated name:structure pairs provide a public disambiguation service
       • InChI Resolver – links InChIKeys to compounds: MYPYJXKWCTUITO-LYRMYLQWSA-N
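An InChIKey works as a link target because it has a fixed, machine-checkable shape: 14 letters for the connectivity hash, 8 letters for the remaining layers, a standard/non-standard flag, a version letter, and a final protonation letter. The Python sketch below only normalises and splits a key like the one on the slide. It does not call ChemSpider or any resolver, and the function name `parse_inchikey` is a hypothetical helper, not part of any published API.

```python
import re

# Shape of an InChIKey: 14 letters, hyphen, 8 letters + flag + version, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"([A-Z]{14})-([A-Z]{8})([A-Z])([A-Z])-([A-Z])")


def parse_inchikey(text):
    """Strip stray whitespace and split an InChIKey into its blocks.

    Only the format is checked; whether the key resolves to a real
    compound can only be answered by a lookup service.
    """
    key = re.sub(r"\s+", "", text)
    match = INCHIKEY_RE.fullmatch(key)
    if match is None:
        raise ValueError(f"not a well-formed InChIKey: {text!r}")
    connectivity, detail, flag, version, protonation = match.groups()
    return {
        "key": key,
        "connectivity_block": connectivity,  # hash of the molecular skeleton
        "detail_block": detail,              # hash of the remaining layers
        "standard": flag == "S",             # "S" marks a standard InChIKey
        "version": version,
        "protonation": protonation,          # "N" means neutral
    }


# The key as it appears in the slide transcript, stray spaces included.
parsed = parse_inchikey("MYPYJXKWCTUITO - LYRMYLQWSA-N")
print(parsed["key"])  # MYPYJXKWCTUITO-LYRMYLQWSA-N
```

Two records that share the first block share a skeleton even when the full keys differ, which is one reason a fixed-format key suits linking and disambiguation.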
    35. 40. For the RSC (more selfishly)
       • Critical mass for a chemistry portal
       • Support for semantic developments
           • Proven user need to search for compounds
           • Shop window for our other publications
       • Ability to test other business models (some of which are a little scary)
       • Engagement with new publishing methods
    36. 41. ChemSpider as chemistry portal
    37. 42. What will the RSC be doing?
       • Facilitating scientific communication
       • Publish articles – with the semantics
       • But what else can be part of this?
       • Partial publication syntheses, blogs, unpublished data
       • How does all this sit alongside traditional articles?
    38. 43. Micropublication?
    39. 44. Nanopublication?
       • Slide from Jan Velterop
    40. 45. Exploring standards and extraction of semantic assertions
       • Pistoia Alliance
       • “An initiative to provide an open foundation of data standards, ontologies and web-services to streamline the Pharmaceutical Drug Discovery workflow”
       • Semantic Enrichment of the Scientific Literature (SESL) Oct09-Oct10
       • Pistoia Alliance-funded
       • EBI
       • Elsevier, NPG, OUP, RSC
    41. 47. Do we know what the answer is?
       • Probably not, but RSC Publishing & ChemSpider allow us to cover pretty much everything.
           • Serve existing behaviours
           • Allow experimentation
       • Science – keying into the readers as well as the researchers is more important than the technology.