Berlin 6 Open Access Conference: Tony Hey


Published in: Education, Technology
  • Berlin 6 Open Access Conference: Tony Hey

    1. 1. eScience and Open Access: Supporting Data-Centric Research with Client + Cloud. Tony Hey, Corporate Vice President, Microsoft Research
    2. 2. The Fourth Paradigm: Data-Centric Science
    3. 3. <ul><li>Data collection </li></ul><ul><ul><li>Sensor networks, satellite surveys, high throughput laboratory instruments, observation devices, supercomputers, LHC … </li></ul></ul><ul><li>Data processing, analysis, visualization </li></ul><ul><ul><li>Legacy codes, workflows, data mining, indexing, searching, graphics … </li></ul></ul><ul><li>Archiving </li></ul><ul><ul><li>Digital repositories, libraries, preservation, … </li></ul></ul><ul><li>SensorMap </li></ul><ul><ul><li>Functionality: Map navigation </li></ul></ul><ul><ul><li>Data: sensor-generated temperature, video camera feed, traffic feeds, etc. </li></ul></ul><ul><li>Scientific visualizations </li></ul><ul><ul><li>NSF Cyberinfrastructure report, March 2007 </li></ul></ul>
    4. 4. <ul><li>Thousand years ago – Experimental Science </li></ul><ul><ul><li>Description of natural phenomena </li></ul></ul><ul><li>Last few hundred years – Theoretical Science </li></ul><ul><ul><li>Newton’s Laws, Maxwell’s Equations… </li></ul></ul><ul><li>Last few decades – Computational Science </li></ul><ul><ul><li>Simulation of complex phenomena </li></ul></ul><ul><li>Today – Data-centric Science </li></ul><ul><ul><li>Scientists overwhelmed with data sets </li></ul></ul><ul><ul><li>from many different sources </li></ul></ul><ul><ul><ul><li>Data captured by instruments </li></ul></ul></ul><ul><ul><ul><li>Data generated by simulations </li></ul></ul></ul><ul><ul><ul><li>Data generated by sensor networks </li></ul></ul></ul><ul><ul><li>eScience is the set of tools and technologies </li></ul></ul><ul><ul><li>to support data federation and collaboration </li></ul></ul><ul><ul><ul><li>For analysis and data mining </li></ul></ul></ul><ul><ul><ul><li>For data visualization and exploration </li></ul></ul></ul><ul><ul><ul><li>For scholarly communication and dissemination </li></ul></ul></ul><ul><li>(With thanks to Jim Gray) </li></ul>
    5. 5. <ul><li>The Sloan Digital Sky Survey is the first major astronomical survey project: </li></ul><ul><ul><li>5 color images of ¼ of the sky </li></ul></ul><ul><ul><li>Pictures of 300 million celestial objects </li></ul></ul><ul><ul><li>Distances to the closest 1 million galaxies </li></ul></ul><ul><li>Jim Gray from Microsoft Research worked with astronomer Alex Szalay to build the public ‘SkyServer’ archive for the survey </li></ul><ul><li>New model of scientific publishing </li></ul><ul><ul><li>Have to publish the data before astronomers publish their analysis </li></ul></ul>
    6. 6. <ul><li>Poster child for 21st-century data publishing </li></ul><ul><ul><li>380 million web hits in 6 years </li></ul></ul><ul><ul><li>930,000 distinct users vs 10,000 astronomers </li></ul></ul><ul><ul><li>1600 scientific papers </li></ul></ul><ul><ul><li>Delivered 50,000 hours of lectures to high schools </li></ul></ul><ul><ul><li>Delivered 100B rows of data </li></ul></ul><ul><li>World’s most used astronomy facility over last 2 years </li></ul>
    7. 7. <ul><li>Goal of 1 million visual galaxy classifications by the public </li></ul><ul><li>Enormous publicity (CNN, Times, Washington Post, BBC) </li></ul><ul><li>100,000 people participating, blogs, poems … </li></ul><ul><li>Allows general public to search for photographs and classify different types of galaxies </li></ul>
    8. 9. <ul><li>Participants </li></ul><ul><li>Alyssa Goodman; Harvard University </li></ul><ul><li>Alex Szalay; Johns Hopkins University </li></ul><ul><li>Curtis Wong, Jonathan Fay; Microsoft Research </li></ul><ul><li>Goals </li></ul><ul><li>Integration of data sets and one-click contextual access </li></ul><ul><li>Easy access and use </li></ul><ul><li>In just over two months, a million users have downloaded, installed and launched the application (2,206,497 unique sessions) </li></ul><ul><li>We invite you to experience it! </li></ul>Seamless Rich Social Media Virtual Sky Web application for science and education
    9. 11. <ul><li>Journal subscriptions rising faster than library budgets. No freedom for new journals in new and emerging fields. </li></ul><ul><li>Web technology and digital media now make dissemination of knowledge ‘easy’ and ‘free’ without the traditional paper journals. </li></ul><ul><li>As Dean of Engineering at Southampton: </li></ul><ul><ul><li>Supposed to monitor the research output of over 200 Faculty and 500 Post Docs and Grad Students </li></ul></ul><ul><ul><li>University library could not afford to subscribe to all the journals that my staff published in, not to mention conference proceedings and workshop contributions … </li></ul></ul>
    10. 14. Requests for ETDs grew from around 220,000 in 1997/98 to nearly 20M by 2006/07
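The growth quoted on slide 14 can be turned into an overall factor and an average yearly rate. The sketch below is only an illustration of that arithmetic: it assumes "nearly 20M" is exactly 20,000,000 and counts 1997/98 to 2006/07 as nine yearly steps. The function name `growth` is hypothetical.

```python
def growth(start, end, years):
    """Return (overall growth factor, compound annual growth rate)."""
    factor = end / start
    # Compound annual growth rate: the constant yearly rate that would
    # turn `start` into `end` after `years` steps.
    cagr = factor ** (1 / years) - 1
    return factor, cagr


# Assumed inputs: slide figures rounded, nine yearly steps.
FACTOR, CAGR = growth(220_000, 20_000_000, 9)

if __name__ == "__main__":
    print(f"overall factor: {FACTOR:.1f}x")
    print(f"average yearly growth: {CAGR:.0%}")
```

Under these assumptions the requests grew about 91-fold, which corresponds to roughly 65% growth per year.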
    11. 15. <ul><li>SciELO (Scientific Electronic Library Online) is a virtual library for Latin America, the Caribbean, Spain and Portugal. </li></ul><ul><li>It consists of a network: </li></ul><ul><li>Regional collections (SciELO Brazil, SciELO Chile, SciELO Cuba, SciELO Colombia, etc.) </li></ul><ul><li>Thematic areas (SciELO public health) </li></ul><ul><li>The library forms part of a project being developed by FAPESP (Fundação de Amparo à Pesquisa do Estado de São Paulo) in collaboration with BIREME (Centro Latinoamericano y del Caribe de Información en Ciencias de la Salud). </li></ul><ul><li>The FAPESP/BIREME project envisages developing a common methodology for preparing, storing, disseminating and evaluating scientific literature in electronic form. </li></ul>
    12. 17. Chart: SciELO pages translated, by month
    13. 18. Supporting researchers worldwide The Research Lifecycle
    14. 19. <ul><li>Open access </li></ul><ul><li>Open source </li></ul><ul><li>Open data </li></ul> “In order to help catalyze and facilitate the growth of advanced CI, a critical component is the adoption of open access policy for data, publications and software.” NSF Advisory Committee on Cyberinfrastructure (ACCI) <ul><li>Microsoft Interoperability Principles </li></ul><ul><ul><li>Open Connections to Microsoft Products </li></ul></ul><ul><ul><li>Support for Standards </li></ul></ul><ul><ul><li>Data Portability </li></ul></ul><ul><ul><li>Engagement with Open Source community </li></ul></ul>
    15. 20. What does this mean? You go to a great web site. It supports OpenID. No need to create/manage yet another account. You can now use Live ID to authenticate.
    16. 21. <ul><li>Great for interoperability </li></ul><ul><ul><li>Many sites support OpenID-based authentication </li></ul></ul><ul><ul><li>Other major ID providers are working on v2.0 of the protocol (e.g. Yahoo, Google) </li></ul></ul><ul><li>No need to manage account-related information on multiple sites </li></ul><ul><ul><li>Name, email, web site, interests, etc. </li></ul></ul><ul><li>You control which sites are allowed to access your profile information and/or authenticate </li></ul>
    17. 22. Insert Creative Commons licenses from any Office 2007 application. Incorporate license information in the OOXML so that the license can be read even without Office installed. Integration with the Creative Commons Web API so that new licenses can be created.
    18. 23. <ul><li>Data Acquisition and Modeling </li></ul><ul><ul><li>Data capture from source, cleaning, storage, etc. </li></ul></ul><ul><ul><li>SQL Server, SSIS, Windows WF </li></ul></ul><ul><li>Support Collaboration </li></ul><ul><ul><li>Allow researchers to work together, share context, facilitate interactions </li></ul></ul><ul><ul><li>SharePoint Server, OneNote 2007 (shared) </li></ul></ul><ul><li>Data Analysis, Modeling, and Visualization </li></ul><ul><ul><li>Mining techniques (OLAP, cubes) and visual analytics </li></ul></ul><ul><ul><li>SQL Analysis Services, BI, Excel, Optima, SILK (MSR-A) </li></ul></ul><ul><li>Disseminate and Share Research Outputs </li></ul><ul><ul><li>Publish, Present, Blog, Review and Rate </li></ul></ul><ul><ul><li>Word, PowerPoint </li></ul></ul><ul><li>Archiving </li></ul><ul><ul><li>Published literature, reference data, curated data, etc. </li></ul></ul><ul><ul><li>SQL Server </li></ul></ul>Microsoft has technologies that can offer end-to-end support. eScience is the set of tools and technologies to support Data-centric Science.
    19. 25. Semantic Annotations in Word <ul><li>Phil Bourne and Lynn Fink, UCSD </li></ul><ul><li>Goals </li></ul><ul><li>Semantic mark-up using ontologies and controlled vocabularies </li></ul><ul><li>Facilitate/automate referencing to PDB (and other resources) from manuscript </li></ul><ul><li>Conversion of manuscript to NLM DTD for direct submission to publisher </li></ul><ul><li>Scenario </li></ul><ul><li>Authors do not need to be aware of the use of semantic technologies </li></ul><ul><li>A domain-specific ontology is downloaded and made available from within Microsoft Word 2007 </li></ul><ul><li>Authors can record their intention, the meaning of the terms they use based on their community’s agreed vocabulary </li></ul>Attribution: Richard Cyganiak
    20. 26. Chemistry Drawing for Office <ul><li>Peter Murray-Rust, Univ. of Cambridge </li></ul><ul><li>Murray Sargent, Office </li></ul><ul><li>Geraldine Wade, Advanced Reading Technologies </li></ul><ul><li>Goals </li></ul><ul><li>Support students/researchers in simple chemistry structure authoring/editing </li></ul><ul><li>Enable ecosystem of tools around lifecycle of chemistry-related scholarly works </li></ul><ul><li>Support the Chemistry Markup Language </li></ul><ul><li>Proof of concept plug-in </li></ul><ul><li>Execution </li></ul><ul><li>MSR Developer working on the proof of concept </li></ul><ul><li>Post-doc in Cambridge using prototype plug-in and giving feedback </li></ul><ul><li>Advanced Reading Technologies creating necessary glyphs </li></ul>
    21. 27. CML stored in OOXML:
        <?xml version="1.0"?>
        <cml version="3" convention="org-synth-report" xmlns="">
          <molecule id="m1">
            <atomArray>
              <atom id="a1" elementType="C" x2="-2.9149999618530273" y2="0.7699999809265137" />
              <atom id="a2" elementType="C" x2="-1.5813208400249916" y2="1.5399999809265137" />
              <atom id="a3" elementType="O" x2="-0.24764171819695613" y2="0.7699999809265134" />
              <atom id="a4" elementType="O" x2="-1.5813208400249912" y2="3.0799999809265137" />
              <atom id="a5" elementType="H" x2="-4.248679083681063" y2="1.5399999809265137" />
              <atom id="a6" elementType="H" x2="-2.914999961853028" y2="-0.7700000190734864" />
              <atom id="a7" elementType="H" x2="-4.248679083681063" y2="-1.907348645691087E-8" />
              <atom id="a8" elementType="H" x2="1.0860374036310796" y2="1.5399999809265132" />
            </atomArray>
            <bondArray>
              <bond atomRefs2="a1 a2" order="1" />
              <bond atomRefs2="a2 a3" order="1" />
              <bond atomRefs2="a2 a4" order="2" />
              <bond atomRefs2="a1 a5" order="1" />
              <bond atomRefs2="a1 a6" order="1" />
              <bond atomRefs2="a1 a7" order="1" />
              <bond atomRefs2="a3 a8" order="1" />
            </bondArray>
          </molecule>
        </cml>
        Molecule added in Word*
        * This is just a screenshot from a very early prototype. We are actively working on significantly improving the quality of the rendering.
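To show how tools outside Word could consume the CML fragment on slide 27, here is a minimal sketch that reads the atoms and bonds with Python's standard `xml.etree.ElementTree`. It is an illustration, not the plug-in's code: the embedded sample is a trimmed copy of the slide's markup with the 2D coordinates and the namespace declaration left out (the namespace is missing from the transcript), and the helper names `molecular_formula` and `bonds` are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Trimmed copy of the slide's CML (coordinates and namespace omitted).
CML = """<cml version="3" convention="org-synth-report">
  <molecule id="m1">
    <atomArray>
      <atom id="a1" elementType="C"/>
      <atom id="a2" elementType="C"/>
      <atom id="a3" elementType="O"/>
      <atom id="a4" elementType="O"/>
      <atom id="a5" elementType="H"/>
      <atom id="a6" elementType="H"/>
      <atom id="a7" elementType="H"/>
      <atom id="a8" elementType="H"/>
    </atomArray>
    <bondArray>
      <bond atomRefs2="a1 a2" order="1"/>
      <bond atomRefs2="a2 a3" order="1"/>
      <bond atomRefs2="a2 a4" order="2"/>
      <bond atomRefs2="a1 a5" order="1"/>
      <bond atomRefs2="a1 a6" order="1"/>
      <bond atomRefs2="a1 a7" order="1"/>
      <bond atomRefs2="a3 a8" order="1"/>
    </bondArray>
  </molecule>
</cml>"""


def molecular_formula(cml_text):
    """Count atoms by element; list carbon, then hydrogen, then the rest."""
    root = ET.fromstring(cml_text)
    counts = Counter(atom.get("elementType") for atom in root.iter("atom"))
    first = [e for e in ("C", "H") if e in counts]
    rest = sorted(e for e in counts if e not in ("C", "H"))
    return "".join(e + (str(counts[e]) if counts[e] > 1 else "")
                   for e in first + rest)


def bonds(cml_text):
    """Return (atom id, atom id, bond order) for every bond element."""
    root = ET.fromstring(cml_text)
    return [(*b.get("atomRefs2").split(), int(b.get("order")))
            for b in root.iter("bond")]


if __name__ == "__main__":
    print(molecular_formula(CML))
    for bond in bonds(CML):
        print(bond)
```

For the slide's molecule the formula comes out as C2H4O2: a methyl carbon bonded to a carbon that carries one double-bonded oxygen and one hydroxyl group. A document that keeps the real namespace declaration would need namespace-qualified tag names in the `iter` calls.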
    22. 28. <ul><li>Large collaboration project focusing on interoperability </li></ul><ul><li>OAI-ORE as the focus </li></ul><ul><li>Model for semantically representing eChemistry research work </li></ul><ul><li>Repository to store generated models </li></ul><ul><li>Models will be generated and consumed by tools used by chemistry researchers (e.g. PubChem, eCrystals, TableSheer, Excel, Word) </li></ul><ul><li>Project Collaborators </li></ul><ul><li>Lee Dirks, Alex Wade (Microsoft) </li></ul><ul><li>Carl Lagoze (Cornell University) </li></ul><ul><li>Geoffrey Fox, Marlon Pierce (Indiana University) </li></ul><ul><li>Peter Murray-Rust (University of Cambridge) </li></ul><ul><li>Herbert Van de Sompel (LANL)* </li></ul><ul><li>Steve Bryant (PubChem)* </li></ul><ul><li>C. Lee Giles, Prasenjit Mitra, Karl Mueller (Penn State) </li></ul><ul><li>Jeremy Frey, Simon Coles (University of Southampton) </li></ul><ul><li>* Advisory roles </li></ul>
    23. 29. <ul><li>Organization </li></ul><ul><li>High-profile EU Commission Project, €14M for 4 years </li></ul><ul><li>Consortium of 5 national libraries, 4 national archives, 4 universities and 4 industry partners </li></ul><ul><li>Goals </li></ul><ul><li>Preservation of Office Documents based on OpenXML </li></ul><ul><li>Deliver converters for MS Office binary formats </li></ul><ul><li>Funded open source project for ODF to/from OpenXML converter </li></ul><ul><li>Deliver Preservation Toolkit </li></ul>PLANETS Tools and methods for sustainable long-term preservation of digital objects
    24. 30. Cloud Computing
    25. 31. <ul><li>A Windows OS for the Cloud </li></ul><ul><ul><li>Platform for developing and hosting Cloud applications </li></ul></ul><ul><ul><li>Built to take advantage of datacenters </li></ul></ul><ul><ul><li>Exposes services that hosted applications can leverage </li></ul></ul><ul><ul><ul><li>Compute, storage, messaging, etc. </li></ul></ul></ul><ul><ul><ul><li>VMs on the horizon </li></ul></ul></ul>Source: Azure Services Platform whitepaper
    26. 32. <ul><li>Live Services </li></ul><ul><ul><li>Live Mesh, Desktop-in-the-cloud, files in the cloud </li></ul></ul><ul><ul><li>Mostly consumer-oriented services </li></ul></ul><ul><ul><li>Developer APIs for building consumer-oriented services and applications in the Cloud </li></ul></ul><ul><li>.NET Services </li></ul><ul><ul><li>Enterprise services </li></ul></ul><ul><ul><ul><li>Workflow, Identity and authentication, etc. </li></ul></ul></ul><ul><ul><li>Can be integrated by business applications </li></ul></ul>
    27. 33. <ul><li>A collaboration space in the Cloud </li></ul><ul><li>Current functionality </li></ul><ul><ul><li>Upload documents, invite people to join the workspace, share, collaborate on documents </li></ul></ul><ul><li>More great features coming soon </li></ul><ul><ul><li>Office Web applications announced at PDC08 </li></ul></ul>
    28. 34. <ul><li>Expect scientific research environments will follow similar trends to the commercial sector </li></ul><ul><ul><li>Leverage computing and data storage in the cloud </li></ul></ul><ul><ul><li>Scientists already experimenting with services </li></ul></ul><ul><li>For many of the same reasons </li></ul><ul><ul><li>Siloed research teams, no resource sharing across labs </li></ul></ul><ul><ul><li>High storage costs </li></ul></ul><ul><ul><li>Low resource utilization </li></ul></ul><ul><ul><li>Excess capacity </li></ul></ul><ul><ul><li>High costs of reliably keeping machines up-to-date </li></ul></ul><ul><ul><li>Little support for developers, system operators </li></ul></ul>
    29. 35. RIC VRE VISION: VIRTUAL RESEARCH ENVIRONMENT <ul><li>Online tools and services to enhance the research process. </li></ul><ul><li>Facilitate collaboration among researchers. </li></ul><ul><li>Provide a more effective means to work together. </li></ul><ul><li>Available as a service with minimal barrier to entry… </li></ul><ul><li>Most of the technology comprising a VRE is evolutionary. </li></ul><ul><li>It is the convergence of these technologies that is unique… </li></ul>
    30. 36. CONTENT MANAGEMENT: Tools to organize, manage and control digital content. Typical features include automated templates, organization, versioning, workflow management, document management, and content virtualization. KNOWLEDGE MANAGEMENT: Tools for individuals and teams to distribute and share knowledge. Tools such as blogs and wikis support a more unstructured, self-governing approach to knowledge transfer, and the capture and creation of knowledge through the development of new forms of community (rating, ranking, etc.). SOCIAL NETWORKING: Individuals or teams can express identity and identify collaborators. Primary features are profile pages, groups, self-expression, content creation tools, content sharing, blogs and forums, recommendations and tagging. ONLINE COLLABORATION: Tools that facilitate working together to achieve a common goal. Enable individuals to find each other and the information they need, communicate, and work together. Core elements are messaging, groupware, real-time collaboration and communication.
    31. 37. British Library for Research: a one-stop solution for carrying out research studies in a planned and phased manner and networking with fellow community members. Currently in beta evaluation, directed by The British Library. Plan the Research: search for study ideas, plan the study, and apply for funding. Network: connect with fellow researchers to share ideas, resources, etc. Experiment: use online tools to achieve faster results. Publish: disseminate the study results to the public.
    32. 38. <ul><li>Exchange, SharePoint, Live Meeting, Dynamics CRM, etc. </li></ul><ul><li>No need to build your own infrastructure or maintain/manage servers </li></ul><ul><li>Moving forward, even research services could move to the Cloud </li></ul>
    33. 39. <ul><li>Important/key considerations </li></ul><ul><ul><li>Formats or “well-known” representations of data/information </li></ul></ul><ul><ul><li>Pervasive access protocols are key (e.g. HTTP) </li></ul></ul><ul><ul><li>Data/information is uniquely identified (e.g. URIs) </li></ul></ul><ul><ul><li>Links/associations between data/information </li></ul></ul><ul><li>Data/information is inter-connected through machine-interpretable information (e.g. paper X is about star Y ) </li></ul><ul><li>Social networks are a special case of ‘data meshes’ </li></ul>Attribution: Richard Cyganiak
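The considerations on slide 39 (unique identifiers, links between items, machine-interpretable statements such as "paper X is about star Y") can be sketched as a small in-memory set of subject-predicate-object links. This is an illustrative sketch only: the `example.org` URIs, the `isAbout` and `cites` predicates, and the `DataMesh` class are hypothetical and do not come from the talk.

```python
# Hypothetical identifiers: example.org URIs stand in for real ones.
PAPER_X = "http://example.org/paper/X"
PAPER_Z = "http://example.org/paper/Z"
STAR_Y = "http://example.org/star/Y"
IS_ABOUT = "http://example.org/vocab/isAbout"
CITES = "http://example.org/vocab/cites"


class DataMesh:
    """A tiny in-memory set of (subject, predicate, object) links."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        """Record one machine-interpretable statement."""
        self._triples.add((subject, predicate, obj))

    def objects(self, subject, predicate):
        """Everything `subject` points to through `predicate`."""
        return sorted(o for s, p, o in self._triples
                      if s == subject and p == predicate)

    def subjects(self, predicate, obj):
        """Everything that points to `obj` through `predicate`."""
        return sorted(s for s, p, o in self._triples
                      if p == predicate and o == obj)


def build_example():
    """Build a three-statement mesh about two papers and one star."""
    mesh = DataMesh()
    mesh.add(PAPER_X, IS_ABOUT, STAR_Y)   # "paper X is about star Y"
    mesh.add(PAPER_Z, IS_ABOUT, STAR_Y)
    mesh.add(PAPER_Z, CITES, PAPER_X)
    return mesh


if __name__ == "__main__":
    mesh = build_example()
    print(mesh.subjects(IS_ABOUT, STAR_Y))  # papers about star Y
    print(mesh.objects(PAPER_Z, CITES))     # what paper Z cites
```

Because every item is named by a URI, a question like "which papers are about star Y?" becomes a lookup over links rather than a text search. A social network fits the same shape, with people as subjects and a "knows" predicate.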
    34. 40. Vision of Future Research Environment with both Software + Services. Diagram labels: scholarly communications; domain-specific services; instant messaging; identity; document store; blogs & social networking; mail; notification; search; books; citations; visualization and analysis services; storage/data services; compute services; virtualization; project management; reference management; knowledge management; knowledge discovery.
    35. 41. <ul><li> </li></ul><ul><li>MSR downloads: </li></ul><ul><li> </li></ul><ul><li> </li></ul><ul><li>CodePlex: </li></ul><ul><li>The Faculty Connection; </li></ul><ul><li>MSDN Academic Alliance; </li></ul>