Data 2.0


Presentation given at Supercomputing 2007 (SC07, Reno) on the progress of data sharing models, highlighting in particular where the data grid / data service and Web 2.0 worlds collide.

    1. Data 2.0: a new way of integrating data?
        Neil Chue Hong, SC07, Reno
    2. Summary
        • From Data Grids
        • To Data Services
        • The Rise of Web 2.0
        • Towards Data 2.0
    3. Grid versus Users
        • Grid is about:
            - sharing resources
            - interoperable middleware
            - allowing bigger problems
            - integrating communities
            - improving security
            - bringing together data
        • Users want to:
            - access more resources
            - ignore middleware
            - solve bigger problems
            - form communities
            - have simple security
            - bring together data
        • Grid and Users want very similar things
            - and yet there is still a “want-got-gap” between them
            - how can this be bridged?
    4. Data Grids
        • The first generation of Grids concentrated on Compute Grids
            - harnessing capacity to improve capability
        • Then came the first Data Grids
            - mechanisms for dealing with the large amounts of data generated by sensors and simulations
    5. Data Challenges
        • Diversity: of data resource types, vendors, middleware, schema, metadata
        • Scale: of collections, formats, geographical, political and social distance
        • Ownership: on individual, group, and organisation levels; intersecting yet independent
        • Security: for client, service and data owner; at many levels, with many tradeoffs
    6. Move towards data services
        • Defined interface to stored collection of data
            - e.g. Google and Amazon
        • But the data could be:
            - replicated
            - shared
            - federated
            - virtual
            - incomplete
        • Improve the ability to discover, reference, annotate, search, and provide provenance
        • Make access transparent
        • Make integration easy
        • Make management simple
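The idea on slide 6 of a defined interface that hides whether the data behind it is replicated can be illustrated with a minimal Python sketch. It assumes a hypothetical `DataService` protocol with a single `get(key)` method, and in-memory stores stand in for real resources; an actual data service would sit behind a network protocol and would also have to keep its replicas consistent.

```python
from typing import Dict, List, Protocol


class DataService(Protocol):
    """Hypothetical uniform interface: clients only ever call get()."""

    def get(self, key: str) -> dict: ...


class InMemoryStore:
    """Stands in for one stored collection of records, keyed by identifier."""

    def __init__(self, records: Dict[str, dict], online: bool = True) -> None:
        self.records = records
        self.online = online

    def get(self, key: str) -> dict:
        if not self.online:
            raise ConnectionError("store unavailable")
        return self.records[key]  # raises KeyError if the record is absent


class ReplicatedService:
    """Hides a set of replicas behind the same get() interface.

    Replicas are tried in order and the first successful answer is returned,
    so the caller never needs to know which copy served the request.
    """

    def __init__(self, replicas: List[DataService]) -> None:
        self.replicas = replicas

    def get(self, key: str) -> dict:
        last_error: Exception = KeyError(key)
        for replica in self.replicas:
            try:
                return replica.get(key)
            except (ConnectionError, KeyError) as err:
                last_error = err  # remember the failure, try the next copy
        raise last_error


# Usage: the primary copy is offline, so the mirror answers transparently.
primary = InMemoryStore({"run42": {"temp": 290.1}}, online=False)
mirror = InMemoryStore({"run42": {"temp": 290.1}})
service = ReplicatedService([primary, mirror])
print(service.get("run42"))  # {'temp': 290.1}
```

Because `ReplicatedService` exposes the same `get()` as a single store, it can itself be wrapped by further services; that composability is the point here, while the simple in-order failover policy is an arbitrary choice.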
    7. Grid Data Services
        • Data middleware provides a way of publishing data in a uniform way
            - accessible
            - discoverable
            - searchable
        • Provide tools such as
            - registries
            - replica catalogs
            - mediators
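The registry and replica catalog roles listed on slide 7 answer two different questions: "which datasets match these metadata terms?" and "where are copies of this dataset held?". The Python sketch below separates them under that reading; the class names, dataset names and `example.org` URLs are hypothetical, this is not the interface of any particular grid middleware, and mediators are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Entry:
    """One registry record: a logical dataset name plus descriptive keywords."""
    name: str
    keywords: Set[str]


@dataclass
class Registry:
    """Discovery service: find datasets whose metadata contains every query term."""
    entries: List[Entry] = field(default_factory=list)

    def register(self, name: str, *keywords: str) -> None:
        self.entries.append(Entry(name, {k.lower() for k in keywords}))

    def search(self, *keywords: str) -> List[str]:
        wanted = {k.lower() for k in keywords}
        # Subset test: every requested term must appear in the entry's keywords.
        return [e.name for e in self.entries if wanted <= e.keywords]


@dataclass
class ReplicaCatalog:
    """Maps a logical dataset name to the physical locations holding copies."""
    locations: Dict[str, List[str]] = field(default_factory=dict)

    def add_replica(self, name: str, url: str) -> None:
        self.locations.setdefault(name, []).append(url)

    def lookup(self, name: str) -> List[str]:
        return list(self.locations.get(name, []))  # copy, so callers cannot mutate


# Usage: discover by metadata, then resolve the logical name to its replicas.
registry = Registry()
registry.register("census-2001", "census", "population", "uk")
registry.register("borders-2001", "boundaries", "uk")
catalog = ReplicaCatalog()
catalog.add_replica("census-2001", "https://site-a.example.org/census")
catalog.add_replica("census-2001", "https://site-b.example.org/census")
for name in registry.search("uk", "population"):
    print(name, catalog.lookup(name))  # only census-2001 matches both terms
```

Keeping discovery and location separate means a dataset can gain or lose replicas without its registry entry changing.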
    8. Grid versus User: Round 2
        • Grids provide:
            - data
            - discovery services
            - distributed queries
            - basic provenance
            - workflows to represent analysis process
        • Users want:
            - information
            - to find the right data
            - cross-database searches
            - sophisticated annotation
            - to explore the information space
        • Data 2.0 must go beyond simple data access
            - domain-specific vs generic data services
            - composability, interoperability and ease of use
    10. The Rise of Web 2.0
        • New sites allow non-technical users to share information and interact in programmable environments
            - Social Networking: MySpace, Bebo, Facebook
            - GIS: Google Maps, Google Earth
            - Preference Matching: Amazon
            - Meta-clustering: digg
            - Information Publishing: Flickr
        • An army of curators, a world of information
    11. The Four Levels of e-Science Enlightenment
        1) Resources: Providing access to a larger and wider diversity of resources
        2) Automation: Increasing the automation and repeatability of experimentation
        3) Collaboration: Allowing intra- and cross-disciplinary collaboration through enabling networks
        4) Participation: Increasing access to a wider set of users and increasing knowledge in a domain by bringing new people to the subject
    12. From DSs to VREs
        • Virtual Research Environments
            - bridge the gap between middleware and users
            - integrate functionality and facilities
        • Harness interest in communities and make it easy to contribute and easy to benefit
            - infrastructure
            - annotation tools
            - graphical environment
    13. SEE-GEO: Geolinking
        [Diagram: Portal, GLS, Feature Portrayal Service (FPS) and Map Server, drawing on a Census DB and a Borders DB through WFS, GDAS and OGSA-DAI (getData, getFeature, geoLink); labelled steps include sending a parameterised query, receiving a ticket for results, requesting features and attributes, caching attributes, streaming polygons, running the algorithm, streaming relevant annotated polygons and retrieving the annotated image.]
        • Concentrate on algorithm
        • Access domain-specific data sets
        • Utilise existing services
        • Efficient delivery methods
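The core geolinking step on slide 13, joining attribute data from one source (the Census DB) onto boundary features from another (the Borders DB) through a shared area identifier, can be sketched in memory as follows. This is an illustration only, not the SEE-GEO or OGC service interface: the record layout, the `area_code` key and the sample codes and values are assumptions. It mirrors the labelled steps of caching attributes, streaming polygons and emitting only the relevant annotated polygons.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical record shapes: a boundary feature is (area_code, polygon),
# a polygon is a list of (x, y) vertices, and attribute rows are dicts that
# carry the same area code under `key`.
Polygon = List[Tuple[float, float]]
Feature = Tuple[str, Polygon]
Row = Dict[str, object]


def geolink(
    features: Iterable[Feature],
    attribute_rows: Iterable[Row],
    key: str = "area_code",
) -> Iterator[Row]:
    """Join attribute rows onto boundary features by a shared area identifier.

    Attributes are cached in memory first; features are then streamed one at
    a time, and only features with a matching attribute row are yielded.
    """
    cache = {row[key]: row for row in attribute_rows}
    for code, polygon in features:
        attrs = cache.get(code)
        if attrs is None:
            continue  # no attribute data for this area: not relevant
        annotated = dict(attrs)  # copy so the cached row is not modified
        annotated["polygon"] = polygon
        yield annotated


# Usage with made-up area codes and values.
borders = [
    ("E01", [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]),
    ("E02", [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0)]),
    ("E03", [(2.0, 0.0), (3.0, 0.0), (3.0, 1.0)]),
]
census = [
    {"area_code": "E01", "population": 1200},
    {"area_code": "E03", "population": 800},
]
for feature in geolink(borders, census):
    print(feature["area_code"], feature["population"])
# E01 1200
# E03 800
```

Caching the (typically smaller) attribute table and streaming the (typically larger) polygon set keeps memory use bounded by the attribute data; that trade-off is assumed here rather than taken from the slide.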
    14. Virtual Workspace for the Study of Ancient Documents
        • An interface allowing browsing and searching of multiple image collections, including tools to compare and annotate the researcher’s personal collection
    15. Data 2.0: From Silos to Sharing
        • Choose data based on stored metadata
            - bring together for each user
        • Build a community by providing tools to contribute back
        [Diagram: a VRE Portal where users choose datasets from Manc, Soton and Edin data sources (each behind an OD service) and add annotations, held as personal annotations (Amy, Bob) and as central annotations.]
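The annotation model in the slide 15 diagram, where each user keeps personal annotations and can also contribute to a central store, can be sketched in Python as two instances of one store class. The names `AnnotationStore`, `annotate` and `view_for_user`, and the dataset identifiers, are hypothetical; a real deployment would also need distributed or replicated storage, identity and access control.

```python
from collections import defaultdict
from typing import Dict, List


class AnnotationStore:
    """Annotations keyed by dataset id; one class serves personal and central stores."""

    def __init__(self) -> None:
        self._notes: Dict[str, List[str]] = defaultdict(list)

    def add(self, dataset_id: str, text: str) -> None:
        self._notes[dataset_id].append(text)

    def get(self, dataset_id: str) -> List[str]:
        # dict.get does not trigger defaultdict's factory, so lookups add no keys.
        return list(self._notes.get(dataset_id, []))


def annotate(
    dataset_id: str,
    text: str,
    personal: AnnotationStore,
    central: AnnotationStore,
    share: bool = False,
) -> None:
    """Keep a note private, or contribute it back to the community store."""
    (central if share else personal).add(dataset_id, text)


def view_for_user(
    dataset_id: str, personal: AnnotationStore, central: AnnotationStore
) -> List[str]:
    """What one user sees: shared annotations first, then their own."""
    return central.get(dataset_id) + personal.get(dataset_id)


# Usage: Amy keeps a private note; Bob shares his with everyone.
central, amy, bob = AnnotationStore(), AnnotationStore(), AnnotationStore()
annotate("manc/obs-17", "check calibration", amy, central)
annotate("manc/obs-17", "matches soton/obs-3", bob, central, share=True)
print(view_for_user("manc/obs-17", amy, central))
# ['matches soton/obs-3', 'check calibration']
print(view_for_user("manc/obs-17", bob, central))
# ['matches soton/obs-3']
```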
    16. Data 2.0: a new way of integrating data?
        • Many diverse data sources
            - independently owned and curated
        • Many diverse users
            - each sharing and utilising multiple datasets
        • A personalised, virtual data warehouse
            - bring together many sources to appear as one
        • Allow shared, distributed, centralised, replicated annotation to build a community
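The "personalised, virtual data warehouse" on slide 16 can be read as a federated view: each user picks sources, and queries run across all of them as if they were one. A minimal Python sketch under that reading follows; `VirtualWarehouse`, the `_source` tag and the sample rows are hypothetical, and a fuller implementation would typically push the filter down to each source rather than evaluating it in the client.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class VirtualWarehouse:
    """Per-user view that makes several independently owned sources look like one.

    Nothing is copied into a central warehouse: each query is forwarded to every
    source the user chose, and matching rows are returned as one list, each tagged
    with the source it came from (a very basic form of provenance).
    """

    def __init__(self, sources: Dict[str, List[Row]]) -> None:
        self.sources = sources

    def query(self, predicate: Callable[[Row], bool]) -> List[Row]:
        results: List[Row] = []
        for name, rows in self.sources.items():
            for row in rows:
                if predicate(row):
                    results.append({**row, "_source": name})
        return results


# Usage with made-up data: Amy's view combines two of the three sources.
all_sources: Dict[str, List[Row]] = {
    "manc": [{"species": "oak", "count": 3}],
    "soton": [{"species": "oak", "count": 5}, {"species": "ash", "count": 1}],
    "edin": [{"species": "oak", "count": 2}],
}
amy_view = VirtualWarehouse({k: all_sources[k] for k in ("manc", "soton")})
oaks = amy_view.query(lambda row: row["species"] == "oak")
print([(r["_source"], r["count"]) for r in oaks])  # [('manc', 3), ('soton', 5)]
```

Each user's view is just a different selection of sources, so the same underlying collections can serve many personalised warehouses without duplication.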
    17. What is the future of data?
        • Data must be available to all to be useful
        • Individuals must be able to harness the data to make it important to them
        • The work you have seen today will help this happen
        • Data 2.0 is not as far away as you think!