Acs denver dirks potenzone 30 aug2011


Presentation at the American Chemical Society Meeting in Denver CO, August 30, 2011 at the Skolnick Award Symposium honoring Sandy Lawson.

Published in: Technology, Education

  1. 1. Enriched research documents at the cutting edge: When research papers no longer make sense on paper • Rudy Potenzone, SciencePoint Solutions • Lee Dirks, Education & Scholarly Communication, Microsoft Research | Connections • Presented at the American Chemical Society National Meeting, Denver CO, August 30, 2011, at the Skolnick Award Symposium in Honor of Sandy Lawson
  2. 2. Agenda• Part 1 – The Scientific Paper• Part 2 – Emergence of the ePaper• Part 3 – Of Workflows and Add-ins• Part 4 – Impact of the ePaper• Part 5 – A Glimpse to the Future
  3. 3. Agenda• Part 1 – The Scientific Paper• Part 2 – Emergence of the ePaper• Part 3 – Of Workflows and Add-ins• Part 4 – Impact of the ePaper• Part 5 – A Glimpse to the Future
  4. 4. A Brief History of Enriched Scientific Papers• Printed research papers have long been accompanied by enriched content• Embedded figures and associated electronic items – Chemical structures that include full bonding and structural information – Crystallographic databases – Spectral databases – Biological sequence and pathway databases – Supplemental material repositories
  5. 5. Issues with External Repositories• Often not complete• Poorly audited with some notable exceptions• References between the paper and the files are often lost or incorrect• There is a real loss of context due to the separation of all the information• Reproducibility is not certain!
  6. 6. Agenda• Part 1 – The Scientific Paper• Part 2 – Emergence of the ePaper• Part 3 – Of Workflows and Add-ins• Part 4 – Impact of the ePaper• Part 5 – A Glimpse to the Future
  7. 7. My Bio – a Content Perspective• The NIH/EPA Chemical Information System – SANSS, MSSS, FRSS, etc.• Chemical Abstracts Service – CA, Registry, CASREACT, CHEMCATS, SciFinder• MDL Information Systems/Elsevier – ACD, various synthesis, Beilstein• LION bioscience, Ingenuity Systems – SRS and Ingenuity Pathway Analysis (IPA)• CambridgeSoft – ACX, etc.
  8. 8. Why Are We NOT Focusing On Authoring Tools?
  9. 9. On the Verge of a Major Revolution• Technology that enables authors to create elaborate versions of results of research• Capturing the full context of research in progress: – The formal scientific report – The very METHODS used – Full data repository – Complete workflows• With the resulting documentation offering information for completely reproducible results
  10. 10. Envisioning a New Era of Research Reporting • Reproducible Research • Collaboration • Interactive Data • Dynamic Documents • Reputation & Influence
  11. 11. Benefits of a Scientific ePaper• Helping to improve the quality of science• Facilitating the intellectual transfer of the core discoveries• Fully documenting the provenance of the research• Preserving the knowledge with complete context• Services easily accessible on top of the data – a new value-added layer – visualization and analysis – discovery through simulation and modeling – etc.• Accessible Reproducible Research!!
  12. 12. Reproducible Research • Scientific publications have at least two goals: 1. to announce a result and 2. to convince readers that the result is correct. 3. Preservation of knowledge • Jill P. Mesirov. Accessible Reproducible Research. Science Vol. 327 (22) Jan 2010 (from
  13. 13. [Diagram labels: Rich Original Content; Fully Reproducible Workflow; Fully Reproducible Process; Content Sharing; Embedded Services; Full Data Content Embedded; Driving Better Science]
  14. 14. Agenda• Part 1 – The Scientific Paper• Part 2 – Emergence of the ePaper• Part 3 – Of Workflows and Add-ins• Part 4 – Impact of the ePaper• Part 5 – A Glimpse to the Future
  15. 15. Redefining the Document• Microsoft introduced its open document format – Open XML – in Office 2007
  16. 16. Project "Chem4Word" – Chemical Drawing in Microsoft Word • Semantic chemistry for students and publishers • Author/edit 1D and 2D chemistry. Change chemical layout styles. • Intent: Recognizes chemical dictionary and ontology terms • Relationships: Navigate and link referenced chemistry • Data: Semantics stored in Chemistry Markup Language (CML) • Intelligence: Verifies validity of authored chemistry • V1.0 now available (binary and open source) • CML shown on the slide:
        <?xml version="1.0" ?>
        <cml version="3" convention="org-synth-report" xmlns="">
          <molecule id="m1">
            <atomArray>
              <atom id="a1" elementType="C" x2="-2.9149999618530273" y2="0.7699999809265137" />
              <atom id="a2" elementType="C" x2="-1.5813208400249916" y2="1.5399999809265137" />
              <atom id="a3" elementType="O" x2="-0.24764171819695613" y2="0.7699999809265134" />
              <atom id="a4" elementType="O" x2="-1.5813208400249912" y2="3.0799999809265137" />
              <atom id="a5" elementType="H" x2="-4.248679083681063" y2="1.5399999809265137" />
              <atom id="a6" elementType="H" x2="-2.914999961853028" y2="-0.7700000190734864" />
              <atom id="a7" elementType="H" x2="-4.248679083681063" y2="-1.907348645691087E-8" />
              <atom id="a8" elementType="H" x2="1.0860374036310796" y2="1.5399999809265132" />
            </atomArray>
            <bondArray>
              <bond atomRefs2="a1 a2" order="1" />
              <bond atomRefs2="a2 a3" order="1" />
              <bond atomRefs2="a2 a4" order="2" />
              <bond atomRefs2="a1 a5" order="1" />
              <bond atomRefs2="a1 a6" order="1" />
              <bond atomRefs2="a1 a7" order="1" />
              <bond atomRefs2="a3 a8" order="1" />
            </bondArray>
          </molecule>
        </cml>
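To make "semantics stored in CML" concrete, here is a minimal sketch of how a program might read the atoms and bonds from the slide's molecule and check that every bond points at a declared atom. It is an illustration only, not Chem4Word's own code: the coordinates are omitted, the snippet carries no XML namespace (a real CML file normally does, and ElementTree would then need namespace-qualified tag names), and the helper names `hill_formula` and `summarize` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Atoms and bonds copied from the slide; coordinates and namespace dropped
# for brevity (an assumption made for this sketch).
CML = """<cml version="3">
  <molecule id="m1">
    <atomArray>
      <atom id="a1" elementType="C" />
      <atom id="a2" elementType="C" />
      <atom id="a3" elementType="O" />
      <atom id="a4" elementType="O" />
      <atom id="a5" elementType="H" />
      <atom id="a6" elementType="H" />
      <atom id="a7" elementType="H" />
      <atom id="a8" elementType="H" />
    </atomArray>
    <bondArray>
      <bond atomRefs2="a1 a2" order="1" />
      <bond atomRefs2="a2 a3" order="1" />
      <bond atomRefs2="a2 a4" order="2" />
      <bond atomRefs2="a1 a5" order="1" />
      <bond atomRefs2="a1 a6" order="1" />
      <bond atomRefs2="a1 a7" order="1" />
      <bond atomRefs2="a3 a8" order="1" />
    </bondArray>
  </molecule>
</cml>"""


def hill_formula(counts):
    """Format element counts in Hill order: C, then H, then the rest alphabetically."""
    keys = sorted(counts)
    if "C" in counts:
        rest = [k for k in keys if k not in ("C", "H")]
        keys = ["C"] + (["H"] if "H" in counts else []) + rest
    return "".join(k + (str(counts[k]) if counts[k] > 1 else "") for k in keys)


def summarize(cml_text):
    """Return (formula, bond count, atom ids referenced by bonds but never declared)."""
    mol = ET.fromstring(cml_text).find("molecule")
    atoms = {a.get("id"): a.get("elementType") for a in mol.iter("atom")}
    bonds = [b.get("atomRefs2").split() for b in mol.iter("bond")]
    dangling = [ref for pair in bonds for ref in pair if ref not in atoms]
    return hill_formula(Counter(atoms.values())), len(bonds), dangling


formula, n_bonds, dangling = summarize(CML)
print(formula, n_bonds, dangling)
```

Because the structure is stored as data rather than as a picture, the same few lines could feed a structure search or a validity check without redrawing anything.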
  17. 17. GenePattern Reproducible Research Add-in • Services: Connects to GenePattern database • Relationships: Inline graphics are synchronized to dataset • Data: Control and execute query pipelines into GenePattern • Data: Resulting data (and provenance) stored within Word document • Source code and binary:
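The idea of keeping data and provenance inside the document rests on the fact that an Open XML file is a ZIP package of named XML parts. The sketch below shows only that packaging idea using Python's standard library. It does not produce a valid Word file (a real package also needs content-type and relationship parts), and the part names and XML payloads are illustrative assumptions, not the add-in's actual layout.

```python
import io
import zipfile


def pack(parts):
    """Write a dict of {part name: XML text} into an in-memory ZIP package."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, text in parts.items():
            zf.writestr(name, text)
    return buf.getvalue()


def read_part(package, name):
    """Read one named part back out of the package as text."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return zf.read(name).decode("utf-8")


# Hypothetical parts: the visible text plus a data part that travels with it.
package = pack({
    "word/document.xml": "<document>Figure 1 shows the clustered samples.</document>",
    "customXml/item1.xml": "<result pipeline='example' samples='3' />",
})

with zipfile.ZipFile(io.BytesIO(package)) as zf:
    print(zf.namelist())
print(read_part(package, "customXml/item1.xml"))
```

The design point is that the reader receives one file: the narrative and the data behind its figures cannot drift apart the way a paper and an external repository can.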
  18. 18. Research Information Centre (RIC) Project • Virtual Research Environment (VRE) Toolkit for SharePoint • Collaborative environment for research groups • Personal site for each researcher and project site for each project • Document management, federated search, social networking, real-time communication, blogs, wikis • Project Overview: • Version 1.1 (Open Source under Ms-PL):
  19. 19. oreChem – The Chemical Semantic Web • Lee Giles • Geoffrey Fox • Carl Lagoze • Jeremy Frey • Peter Murray-Rust • Karl Mueller • Simon Coles • Jim Downing • Prasenjit Mitra • Nico Adams • Demonstrating: • Large collaboration project focusing on interoperability • At-source capture of chemistry data • Chemical structure search • Compound object authoring • Retrospective harvesting of chemistry data • Reuse through common ORE data model • Semantic authoring • Virtualized triple storage • [Diagram labels: experiments, documents, scientists, text, measurements, molecules, data; semantic storage; compound document authoring; mash-up (re-use) of data]
  20. 20. Enabling the Chemical Semantic Web • “RSC Publishing and Southampton University drive the chemical semantic web…”
  21. 21. Recent developments of interest • Elsevier's Article of the Future Competition: Grand Challenge & Article of the Future contest, an ongoing collaboration between Elsevier and the scientific community to redefine how a scientific article is presented online. • PLoS Currents: Influenza (in conjunction with NIH & Google Knol): a rapid research note service that provides an open-access online resource for immediate, open communication and discussion of new scientific data, analyses, and ideas in the field of influenza. All content is moderated by an expert group of influenza researchers but, in the interest of timeliness, does not undergo in-depth peer review. • Nature Precedings: Connects thousands of researchers and provides a platform for sharing new and preliminary findings with colleagues on a global scale via pre-print manuscripts, posters and presentations. Claim priority and receive feedback on your findings prior to formal publication. • Mendeley (and Papers): Called “iTunes” for academic papers; 400,000+ users have signed up and 30+ million scientific papers have been uploaded.
  22. 22. Several Commercial Data Sharing + Analysis Services• Swivel• IBM’s “Many Eyes”• Gapminder & Google’s Trendalyzer• Metaweb’s “Freebase”• CSA’s “Illustrata”
  23. 23. Harvard’s “Dataverse” Project • Via web application software, data citation standards, and statistical methods, the Dataverse Network project increases scholarly recognition and distributed control for authors, journals, archives, teachers, and others who produce or organize data; facilitates data access and analysis for researchers and students; and ensures long-term preservation whether or not the data are in the public domain. [From the Institute for Quantitative Social Science (IQSS) at Harvard University]
  24. 24. Taverna• Taverna is an open source and domain-independent Workflow Management System – A suite of tools used to design and execute scientific workflows and aid in silico experimentation.• Taverna has been created by the myGrid team and funded through OMII-UK. The project has guaranteed funding until 2014.• The Taverna Suite is written in Java and includes the Taverna Engine (used for enacting workflows) that powers both the Taverna Workbench (desktop client) and the Taverna Server.
  25. 25. More on Taverna• Integrated with other myGrid tools – social networking and workflow sharing environment for scientists – curated catalogue of Web services for Life Sciences
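As a rough picture of what a workflow engine does when it "enacts" a workflow, the sketch below runs named steps in dependency order and hands each step the outputs of the steps it depends on. It is written in Python with the standard-library `graphlib` module (Python 3.9+) and is not Taverna's API; the function `run_workflow` and the three example steps are hypothetical.

```python
from graphlib import TopologicalSorter


def run_workflow(steps, deps, inputs):
    """Run each step once all of its dependencies have produced a result.

    steps:  {step name: callable}
    deps:   {step name: [names of the steps or inputs it consumes, in argument order]}
    inputs: {input name: value} supplied up front
    """
    results = dict(inputs)
    for name in TopologicalSorter(deps).static_order():
        if name in results:  # a supplied input, nothing to execute
            continue
        args = [results[d] for d in deps.get(name, ())]
        results[name] = steps[name](*args)
    return results


# A toy three-step analysis: clean the data, average it, write a one-line report.
steps = {
    "clean": lambda raw: [x for x in raw if x is not None],
    "mean": lambda xs: sum(xs) / len(xs),
    "report": lambda xs, m: f"n={len(xs)} mean={m:.2f}",
}
deps = {"clean": ["raw"], "mean": ["clean"], "report": ["clean", "mean"]}

out = run_workflow(steps, deps, {"raw": [1.0, None, 2.0, 3.0]})
print(out["report"])
```

Because the workflow is itself data (a set of steps and their dependencies), it can be shared, re-run on new inputs, or modified, which is the property the sharing and reuse tools build on.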
  26. 26. Provenance • Log what, where, when, who • For data and for publications • [Figure: a synthesis plan and its annotated process record. Ingredient list: fluorinated biphenyl 0.9 g; Br11OCB 1.59 g; potassium carbonate 2.07 g; butanone 40 ml. To-do list: dissolve 4-fluorinated biphenyl in butanone; add K2CO3 powder; heat at reflux for 1.5 hours; cool and add Br11OCB; heat at reflux until completion; cool and add water (30 ml); extract with DCM (3x40 ml); combine organics, dry over MgSO4 and filter. Plan and process steps include Weigh, Measure, Add, Reflux, Cool, Liquid-liquid extraction, Dry, Filter (Buchner) and Annotate. Process record annotations: "0.9031 grammes"; "Butanone dried via silica column and measured into 100ml RB flask. Used 1ml extra solvent to wash out container."; "Started reflux at 13.30. (Had to change heater stirrer) Only reflux for 45min, next step 14:15."; "Inorganics dissolve 2 layers. Added brine ~20ml."; "Washed MgSO4 with DCM ~ 50ml"; "Organics are yellow".]
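The "what, where, when, who" of a provenance log can be sketched as a small append-only record. This is a minimal illustration in Python, not the system shown on the slide: the class names are hypothetical, the researcher identifier is invented, and the calendar date is a placeholder because the slide gives only times of day.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ProvenanceEntry:
    what: str   # the action or observation
    where: str  # vessel, instrument or location
    when: str   # ISO 8601 timestamp
    who: str    # person or process responsible


class ProvenanceLog:
    """Append-only list of provenance entries that can be exported as JSON."""

    def __init__(self):
        self._entries = []

    def record(self, what, where, who, when=None):
        # Default to the current UTC time when no timestamp is supplied.
        stamp = (when or datetime.now(timezone.utc)).isoformat()
        entry = ProvenanceEntry(what, where, stamp, who)
        self._entries.append(entry)
        return entry

    def to_json(self):
        return json.dumps([asdict(e) for e in self._entries], indent=2)


log = ProvenanceLog()
day = dict(year=2011, month=1, day=1)  # placeholder date, not from the slide
log.record("Weighed fluorinated biphenyl: 0.9031 g", "100 ml RB flask", "researcher-1",
           when=datetime(**day, hour=13, minute=0, tzinfo=timezone.utc))
log.record("Started reflux (had to change heater stirrer)", "100 ml RB flask", "researcher-1",
           when=datetime(**day, hour=13, minute=30, tzinfo=timezone.utc))
print(log.to_json())
```

Exported this way, the record can travel with the data and the publication, so a reader can see how each result was produced.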
  27. 27. myGrid Open Suite of Tools • [Diagram labels: Workflow Repository; Workflow GUI Workbench Client; User Interfaces and 3rd party plug-ins; Web Portals; Service Catalogue; Provenance Store; Workflow Server; Activity and Service Plug-in Manager; Open Provenance Model; Secure Service Access, and Programming APIs]
  28. 28. Recycling, Reuse, Repurposing • Share • Search • Re-use • Re-purpose • Execute • Communicate • Record
  29. 29. Project Trident • Built on Windows Workflow Foundation • Author, Execute and Monitor Workflows • Compose and modify workflows via drag & drop canvas • View data products, performance metrics, and provenance data • Version 1.2 (Open Source under Apache 2.0 License):
  30. 30. KNIME• KNIME (Konstanz Information Miner)• A user-friendly and comprehensive Open-Source platform for: – Data integration – Processing – Analysis – Exploration• Growing vendor adoption – PerkinElmer, Schrödinger, Tripos, CCG, ChemAxon, etc.
  31. 31. Accelrys Pipeline Pilot Chemistry
  32. 32. Accelrys Pipeline Pilot ADME
  33. 33. Accelrys Pipeline Pilot Biology
  34. 34. Accelrys Pipeline Pilot Genomics
  35. 35. Envisioning a New Era of Research Reporting • Imagine… • Live research reports – multiple end-user ‘views’ – dynamically tailor presentations • An authoring environment that absorbs and encapsulates – research workflows – outputs from the lab experiments • A report that can be dropped into an electronic lab workbench and reconstitute an entire experiment • Dynamic mash-up of data and workflows across experiments • Apply new analyses and visualizations and perform new in silico experiments • [Diagram labels: Reproducible Research; Collaboration; Interactive Data; Dynamic Documents; Reputation & Influence]
  36. 36. Agenda• Part 1 – The Scientific Paper• Part 2 – Emergence of the ePaper• Part 3 – Of Workflows and Add-ins• Part 4 – Impact of the ePaper• Part 5 – A Glimpse to the Future
  37. 37. Impact of These Innovations• On Science• On the Business of Science• On the Scientific Community• And Other Emotional Factors . . .
  38. 38. Overall Impacts • Authors will be somewhat inconvenienced by having to learn new things . . . but as readers and consumers they will clearly benefit! • Across industry and academia it will be a positive advance • The vendors will be skeptical and reluctant to change, but will move with the spending community!
  39. 39. On the Scientific Community• This will provide a significantly more capable platform for science – Extending collaboration – Easing validation of research – Offering transfer of knowledge and ease of extension of research projects• But it DOES further erode the status quo system of rewards and tenure!
  40. 40. And Other Emotional Factors Is There An Elephant In This Room??• The Publishers??• CAS?? Other A&I companies??• Well what about Electronic Lab Notebooks??
  41. 41. On the Business of Science• Publishers will need to continue to evolve to find a role as “cool provider” of these tools and become a “hot” distribution center• A&I companies will need to redefine their role• Software vendors have a real opportunity, if they can adapt . . .
  42. 42. The Value of the A & I Layers: Abstracting and Indexing in the Future • The Old Days: Abstracting was Key; True Assessment of Content • Today: Indexing is Key; Precision and Recall; “Beats” Google every time • Going Forward: Indexing with Context “Built-In”; Will Abstracting, or more correctly ‘Content Monitoring’, become the value add? Or be a reliable data aggregator?
  43. 43. Agenda• Part 1 – The Scientific Paper• Part 2 – Emergence of the ePaper• Part 3 – Of Workflows and Add-ins• Part 4 – Impact of the ePaper• Part 5 – A Glimpse to the Future
  44. 44. Challenge Or Opportunity • Rich Content Sources • Direct Search Tools • Reproducible Science • Complete Provenance
  45. 45. The Opportunity Before Us • Faster Development in an Increasingly Complex World – Improving reproducibility of scientific results – Data Sharing and collaboration services – Reliable maintenance of provenance – Faster availability and efficient query tools – Secure and/or controlled access to data – Finding related data and research partners – Assurance that data will be preserved • A Brave New World for Scientific Discovery and Research – Cross-domain partnerships – Enhanced broad availability of data and prior research • Improved Knowledge Transfer – Both upstream and downstream – Realizing the promise of translational medicine
  46. 46. Thank You! • Rudy Potenzone, SciencePoint Solutions • Lee Dirks, Education & Scholarly Communication, Microsoft Research | or scholar@microsoft.com • URL – Scholarly Communication at Microsoft