eResearch overview: Gahegan, Bioinformatics Workshop 2010


Published on

Published in: Education
  • Be the first to comment

  • Be the first to like this

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide
Speaker notes:
  • Reproducible science
  • Some of these components might be the team's own and some might be publicly available (e.g. the Journal of Visualized Experiments). Workflows are methods; blue marks outputs; yellow is publishing; pink is third-party commentary and the relationship to third parties, pointing to all of these. Each component can also point to a range of articles.
  • Some of these components might be the team's own and some might be publicly available (e.g. the Journal of Visualized Experiments). Think of the pebble business. Links point the other way too: it's an open, linked-data web of science.
  • Research is becoming more interdisciplinary and teams are becoming more distributed, so a substantial investment is being made in eResearch and virtual research environments (VREs) to support this. Investment in e-Science, cyberinfrastructure and web-based infrastructure for scientific collaboration remains very large: this was the whole point of the UK's eScience programme (£240 million) and the US Cyberinfrastructure program. The Research Information Centre (Microsoft and the British Library) is a small step. How does the British Library plan to support internationally distributed, multi- and cross-disciplinary researchers who sometimes collaborate and sometimes compete?

    1. eResearch: the evolution of science. Mark Gahegan, Center for eResearch, The University of Auckland
    2. Vannevar Bush, “As We May Think” (1945)
         • “There is a growing mountain of research. But there is increased evidence that we are being bogged down today as specialization extends. The investigator is staggered by the findings and conclusions of thousands of other workers - conclusions which he cannot find time to grasp, much less to remember, as they appear.”
         • “Professionally our methods of transmitting and reviewing the results of research are generations old and by now are totally inadequate for their purpose…”
         • “… A record, if it is to be useful to science, must be continuously extended, it must be stored, and above all it must be consulted.” (Bush, 1945)
    3. The data explosion (from Wired, ‘Big Data’, July 2008)
         Terabytes            What it stores
         1                    2,600 songs; large hard disk ($200)
         20                   Photos uploaded to Facebook every month
         120                  All the data collected by the Hubble telescope
         330                  Weekly data produced by the Large Hadron Collider (est.)
         440                  All the international climate / weather data compiled by the National Climatic Data Center in the USA
         530                  All the videos on YouTube
         1,000 (1 petabyte)   Data processed by Google servers every 72 minutes
    4. Sarah E. Fratesi (2008). “Scientific Journals as Fossil Traces of Sweeping Change in the Structure and Practice of Modern Geology.” Journal of Research Practice, Volume 4, Issue 1, Article M1.
    5. Problems with Science
         • The three pillars of Science:
             – Communicable
             – Repeatable
             – Refutable
         • Science efficiency:
             – Share expensive facilities / equipment
             – Find, use and understand relevant resources
             – Question assumptions and reasoning effectively
    6. [Diagram: the components of eResearch. Labels: connectivity resources; societal context; science drivers; global issues; theories, concepts; knowledge representation; data (observations, measurements, experiments); instrumentation; information (real-time, archives, analyses); informatics resources; models, simulations; supercomputing; people; collaboration, visualization, education resources; awareness / outreach; education; support / enabling.]
    7. Reproducible science means context, quality, trust and easy access to the sources
    8. Methods / workflows are scientific commodities (http://myexperiment.org)
         • Scripts, workflows, simulations, experimental plans, statistical models, ...
         • Repeatable, reproducible, comparable and reusable research.
         • Sharing propagates expertise and builds reputation.
    9. Reproducible, or rather “fully supported”, transparent science: composite research components. [Diagram labels: methods, lab books, preprints, data, video, blogs, podcasts, codes, algorithms, models, presentations, ontologies, intermediate results, related articles, comments & reviews, plans.] (Carole Goble, UK eScience)
    10. Connections run both ways… [Diagram labels: methods, lab books, preprints, data, video, blogs, podcasts, codes, algorithms, models, presentations, ontologies, intermediate results, related articles, comments & reviews.] (Carole Goble, UK eScience)
    11. Virtual Research Environments: support for knowledge communities; social networks of collaboration; use cases; emergent trends and patterns
    12. Example: GEON, the Geosciences Network (www.geongrid.org)
    13. 3D earthquake modeling
    14. Earthquake scenarios
    15. Some challenges and consequences
          • Bigger infrastructures: some institutionally focussed, some nationally focussed, some community focussed
          • Who ‘OWNS’ our research? Where is it physically housed? How is access managed?
          • eResearch may also change the nature of the ‘Library’, the ‘Institution’ and even the ‘Academy’. Consider: publishing, peer review, contribution, tenure.
    16. What next for NZ? Aligning the research institutions around eResearch
          • Planning with MoRST for a long-term integrated landscape of HPC and eResearch: a National eResearch Infrastructure
          • What research needs, tools, applications, environments and computing capabilities will we need over the next 10 years?
          • Please get in touch if you would like to include your ideas and needs:
              – [email_address]
              – [email_address]
    17. Questions, comments: [email_address]
    18. Example 1: Fossils and climate. Paleo-Integration (community and data integration): Graphic Correlation Database, PGAP, PaleoIntegration Project. Allister Rees, University of Arizona
    19. Architecture: a simplified 3-tier architecture. How?
          • Front: user interface (computer terminal, user-friendly search terms and tools)
          • Back: databases (schema, ontology coding: age, geography, content)
          • Middleware:
              – translates user-selected parameters for database searches
              – keeps track of user selections (workflow), so a modified search doesn't mean “starting over”
              – routes user requests to different software components (e.g. data query, spatial data conversion), bringing results from multiple databases and tools together on one screen
    20. Early Jurassic climates, vegetation, and dinosaur distributions
          • Integration of various data, datasets and databases
          • Download search results; analyze and interpret data
          • Fossil collection and publication
          • Publish new results and interpretations?
    21. Late Jurassic plant diversity: Paleobiology Database (PBDB), Paleomap Project
    22. Late Jurassic coals and evaporites: Paleogeographic Atlas Project (PGAP), Oil Source Rocks Dataset (OSR), Paleomap Project
    23. Late Jurassic dinosaurs: Dinosauria Dataset (DINO), Paleomap Project
    24. Climate / biome reconstruction: Tendaguru, Morrison
    25. Example 2: Earthquake simulation (data integration & HPC). GEON SYNSEIS Integration Platform, Dogan Seber, SDSC. [Diagram labels: GEON portal and HPC environment; seismic; gravity; magnetic; internal and external datasets; subsurface model; simulation, analyses and integration; scientific discoveries.]
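The rows of slide 3 (The data explosion) mix time scales: per month, per week and every 72 minutes. Below is a back-of-envelope normalization of two rows to a daily rate. It assumes the slide's figures as given and its decimal units (1 petabyte = 1,000 terabytes).

```python
# Figures from slide 3, in terabytes; the slide treats 1 PB as 1,000 TB.
GOOGLE_TB = 1000       # data processed by Google servers...
GOOGLE_MINUTES = 72    # ...every 72 minutes
LHC_WEEKLY_TB = 330    # weekly LHC output (the slide marks this as an estimate)

MINUTES_PER_DAY = 24 * 60


def tb_per_day(tb: float, minutes: float) -> float:
    """Scale a volume produced over `minutes` to a 24-hour rate."""
    return tb * MINUTES_PER_DAY / minutes


google_daily = tb_per_day(GOOGLE_TB, GOOGLE_MINUTES)  # 1000 * 1440 / 72
lhc_daily = LHC_WEEKLY_TB / 7                         # spread the weekly figure over 7 days

print(f"Google: {google_daily:,.0f} TB/day ({google_daily / 1000:.0f} PB/day)")
print(f"LHC:    {lhc_daily:.1f} TB/day")
```

On the slide's numbers, the Google row alone comes to about 20 PB a day. That is over 400 times the LHC's estimated daily output.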
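Slide 10 and its speaker note make the point that connections between research components run both ways. A dataset should be able to find the articles that use it, just as an article points to its data. Here is a minimal in-memory sketch of that idea. The `LinkIndex` class and the identifiers are hypothetical and are not part of any system named in the talk.

```python
from collections import defaultdict


class LinkIndex:
    """Hypothetical index of links between research components
    (articles, datasets, workflows, reviews, ...). Each link is stored
    in both directions, so any component can list what it points to
    and what points to it."""

    def __init__(self) -> None:
        self._outgoing = defaultdict(set)  # component -> components it points to
        self._incoming = defaultdict(set)  # component -> components pointing to it

    def link(self, source: str, target: str) -> None:
        """Record that `source` points to `target`, plus the reverse edge."""
        self._outgoing[source].add(target)
        self._incoming[target].add(source)

    def points_to(self, component: str) -> set:
        # .get avoids creating empty entries for unknown components
        return set(self._outgoing.get(component, ()))

    def pointed_from(self, component: str) -> set:
        return set(self._incoming.get(component, ()))


# Illustrative identifiers only.
index = LinkIndex()
index.link("article:A", "dataset:D1")
index.link("article:A", "workflow:W1")
index.link("article:B", "dataset:D1")
index.link("review:R1", "article:A")

print(sorted(index.pointed_from("dataset:D1")))  # ['article:A', 'article:B']
print(sorted(index.points_to("article:A")))      # ['dataset:D1', 'workflow:W1']
```

An open linked-data web would use shared, resolvable identifiers rather than in-process sets. The sketch only shows why storing the reverse edge matters.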
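Slide 19 gives the middleware tier three jobs:

- translating user selections into database queries;
- remembering earlier selections, so a modified search does not start over;
- routing requests to the right components and bringing the results together.

The following is a minimal sketch under stated assumptions. The `Middleware` class, the translator and backend functions, and the toy records are all hypothetical. This is not the Paleo-Integration Project's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

# A translator maps the user's selections onto one database's query
# format, or returns None if that database cannot answer the request.
Translator = Callable[[dict], Optional[dict]]
# A backend runs a translated query and returns matching records.
Backend = Callable[[dict], list]


@dataclass
class Middleware:
    translators: dict  # database name -> Translator
    backends: dict     # database name -> Backend
    history: list = field(default_factory=list)  # selections at each search step

    def search(self, **changes) -> list:
        # Build on the previous selections so a refinement is not a restart.
        previous = self.history[-1] if self.history else {}
        selections = {**previous, **changes}
        self.history.append(selections)

        merged = []
        for name, translate in self.translators.items():
            query = translate(selections)
            if query is None:  # route only to databases that can answer
                continue
            for record in self.backends[name](query):
                merged.append({"source": name, **record})
        return merged  # results from every database, in one list


# Toy in-memory tables; the records are invented for illustration only.
FOSSILS = [{"taxon": "taxon-A", "age": "Late Jurassic"},
           {"taxon": "taxon-B", "age": "Early Jurassic"}]
ROCKS = [{"rock": "coal", "period": "Late Jurassic"}]

mw = Middleware(
    translators={
        "fossils": lambda s: {"age": s["age"]} if "age" in s else None,
        "rocks": lambda s: ({"period": s["age"]}
                            if s.get("include_rocks") and "age" in s else None),
    },
    backends={
        "fossils": lambda q: [r for r in FOSSILS if r["age"] == q["age"]],
        "rocks": lambda q: [r for r in ROCKS if r["period"] == q["period"]],
    },
)

first = mw.search(age="Late Jurassic")   # queries the fossil table only
second = mw.search(include_rocks=True)   # keeps the age, adds the rock table
print(first)
print(second)
```

Keeping the selection history in the middleware, rather than in the user interface or the databases, lets any front end refine a search step by step. It also means each back end only has to understand its own query format.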