An On-line Collaborative Data Management System

A presentation I prepared that was presented by Rob Simmonds at the Gateway Computing Environments 2010 Workshop in New Orleans on November 14, 2010. It provides an overview of a data management system that was developed for GeoChronos - an on-line collaborative platform for Earth observation scientists.


    • An On-line Collaborative Data Management System
      Roger Curry (1), Cameron Kiddle (1), Rob Simmonds (1) and Gilberto Z. Pastorello Jr. (2)
      (1) Grid Research Centre, University of Calgary
      (2) Centre for Earth Observation Science, University of Alberta
      • Data Challenges
      • Related Work
      • Data Management System
      • Use Case: GeoChronos
      • Summary and Future Work
      Outline GCE 2010 Nov. 14, 2010
      • Data Acquisition
        • Much scientific data stored on off-line media
        • Cumbersome and time consuming to access
        • Making data available on-line difficult
        • Insufficient storage and bandwidth
      • Sharing of Data
        • Lack of willingness to share data
        • Proprietary data - need for controlled access
      Data Challenges - I
      • Usability of Data
        • Insufficient metadata to describe data
        • Some domains have several competing metadata standards and many have none, so many scientists use their own metadata format
      • Finding Data
        • Difficult to find data that you need
        • Different data organized / stored differently
        • Tools to browse, search, visualize data often lacking
      Data Challenges - II
      • Content Management Systems
        • e.g., Drupal, Joomla!, Microsoft SharePoint, Plone, ...
        • Offer rich set of features but do not handle:
          • Meaningful support for specific data formats
          • Efficient association of metadata and ancillary files to data sets
          • Access to a variety of data processing tools
          • Uniform handling of outputs from processing tools
      • Spectral Libraries
        • e.g., USGS, ASTER, Vegetation Spectral Library (VSL)
        • Are available on-line but lack:
          • ability to dynamically restructure metadata for browsing
          • collaboration features enabled by social networking
      Related Work - I
      • Spectral Library Tools
        • e.g., DLR-DFD Spectral Archive, SPECCHIO
        • Flexible in creating / handling metadata but:
          • Have a fixed metadata schema – do not support new metadata needs
      • Data repositories for other domains
        • e.g., Astrophysics Data System, FLUXNET, European Bioinformatics (EBI) Databases
        • Offer wide range of functionality but:
          • Primarily focus on data that is already validated and structured
          • Do not handle preliminary, intermediate, untested data (i.e. research in progress)
      • Digital Libraries
        • e.g., Planetary Data Systems, NCore, SciPort
        • Have flexible functionality but:
          • Most focus on well-defined digital artefacts
          • Limited in handling collaboration on evolving data, metadata and schemas
      Related Work - II
      • Supports the following functionality:
        • On-line access to data
        • Enables scientists to share data while maintaining control of who sees it
        • Ability to add and edit metadata while working with multiple schemas
        • Collaboratively create new schemas to facilitate consistent/accurate recording of metadata
        • Dynamically restructure the way data is browsed
      Data Management System - Overview
    • Data Management System - Framework
      • User & Data:
        • User acquires data from sensor and uploads to portal
        • Direct acquisition of data also possible
      • Elgg Portal:
        • Built on top of Elgg – Open source social networking platform
        • Fine grained access control
        • Flexible data model
      • Data Storage:
        • Currently local NFS storage
        • Working on distributed iRODS based system
      • Data Ingestion Service:
        • Creates records, parses metadata, establishes ancillary relationships
        • Deployed on cloud-based Condor pool
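
The three ingestion steps named above (create records, parse metadata, establish ancillary relationships) can be sketched roughly as follows. This is only an illustration under stated assumptions: the `key: value` header format, the `.txt` data-file convention, the matching of ancillary files by shared file stem, and the function names `parse_header` and `ingest` are all hypothetical and are not taken from the GeoChronos implementation. The Condor deployment is not modelled.

```python
from pathlib import PurePath


def parse_header(text):
    """Parse leading 'key: value' lines; stop at the first line without a colon."""
    metadata = {}
    for line in text.splitlines():
        if ":" not in line:
            break  # assumed: the numeric data section starts here
        key, value = line.split(":", 1)
        metadata[key.strip()] = value.strip()
    return metadata


def ingest(files):
    """Turn {filename: text} into records keyed by file stem (hypothetical layout)."""
    records = {}
    # Pass 1: every .txt data file becomes a record with parsed metadata.
    for name, text in files.items():
        path = PurePath(name)
        if path.suffix == ".txt":
            records[path.stem] = {
                "file": name,
                "metadata": parse_header(text),
                "ancillary": [],
            }
    # Pass 2: non-data files that share a stem with a record are linked to it.
    for name in files:
        path = PurePath(name)
        if path.suffix != ".txt" and path.stem in records:
            records[path.stem]["ancillary"].append(name)
    return records


uploaded = {
    "leaf_01.txt": "site: A\nsensor: UNISPEC\n350 0.041\n351 0.043\n",
    "leaf_01.jpg": "",   # photo of the sample, linked as ancillary
    "notes.pdf": "",     # no matching record, so left unlinked
}
records = ingest(uploaded)
```

In this sketch `records["leaf_01"]` ends up with the two header fields as metadata and `leaf_01.jpg` as its only ancillary file. A real service would need one parser per supported format rather than a single header convention.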
    • Data Management System – Data Model (Source: http://docs.Elgg.org/wiki/File:Elgg_data_model.png)
      • Arbitrary metadata can be assigned to any entity
      • Annotations allow users to comment on entities not owned by them
      • Data management system adds three new types of ElggObjects
        • Schema
        • Collection
        • Record
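
A minimal sketch of the data model described above, written in Python for illustration rather than in Elgg's own PHP. The class `Entity` and its fields are hypothetical stand-ins for Elgg entities, and the attributes given to `Schema`, `Collection` and `Record` are assumptions. The sketch only shows the three ideas from the slide: free-form metadata on any entity, annotations by non-owners, and three new object types.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entity:
    """Hypothetical stand-in for an Elgg entity."""
    owner: str
    metadata: dict = field(default_factory=dict)      # arbitrary key/value pairs
    annotations: list = field(default_factory=list)   # (user, text) pairs

    def annotate(self, user: str, text: str) -> None:
        # Any user may comment, including users who do not own the entity.
        self.annotations.append((user, text))


@dataclass
class Schema(Entity):
    namespace: str = ""


@dataclass
class Record(Entity):
    filename: Optional[str] = None  # a record need not be tied to a file


@dataclass
class Collection(Entity):
    name: str = ""
    records: list = field(default_factory=list)


library = Collection(owner="alice", name="Lichen samples")
sample = Record(owner="alice", filename="sample_001.txt")
sample.metadata["dc:title"] = "Lichen sample 1"
sample.annotate("bob", "Check the calibration date.")
library.records.append(sample)
```

Keeping metadata as an open dictionary on the base class is what lets schemas be added or changed later without altering the record type itself.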
    • Data Management System - Schemas
      • Create schemas
        • Custom or standards-based (e.g., Dublin Core)
        • Individually or as a collaborative team
      • Schemas consist of
        • Namespace
        • Description
        • Read/write access permissions
        • Series of metadata keys
      • Metadata keys consist of
        • Name
        • Description
        • Type (text, latlong, ancillary)
        • Optionality: required, recommended, optional
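
The schema structure listed above can be sketched as a pair of small classes plus a completeness check. This is an illustration only: the class names, the `namespace:key` naming of metadata entries, and the `check` method are assumptions, and read/write permissions are left out. The key types and optionality levels are the ones named on the slide.

```python
from dataclasses import dataclass, field

KEY_TYPES = ("text", "latlong", "ancillary")
OPTIONALITY = ("required", "recommended", "optional")


@dataclass
class MetadataKey:
    name: str
    description: str
    type: str = "text"
    optionality: str = "optional"

    def __post_init__(self):
        if self.type not in KEY_TYPES:
            raise ValueError(f"unknown key type: {self.type}")
        if self.optionality not in OPTIONALITY:
            raise ValueError(f"unknown optionality: {self.optionality}")


@dataclass
class Schema:
    namespace: str
    description: str
    keys: list = field(default_factory=list)

    def check(self, metadata: dict) -> dict:
        """Report required and recommended keys absent from a record's metadata."""
        report = {"missing_required": [], "missing_recommended": []}
        for key in self.keys:
            qualified = f"{self.namespace}:{key.name}"  # assumed naming scheme
            if qualified in metadata:
                continue
            if key.optionality == "required":
                report["missing_required"].append(qualified)
            elif key.optionality == "recommended":
                report["missing_recommended"].append(qualified)
        return report


dc = Schema("dc", "Small subset of Dublin Core", [
    MetadataKey("title", "Name of the resource", optionality="required"),
    MetadataKey("creator", "Who produced it", optionality="recommended"),
    MetadataKey("coverage", "Where it was measured", type="latlong"),
])
report = dc.check({"dc:title": "Lichen sample 1"})
# report == {"missing_required": [], "missing_recommended": ["dc:creator"]}
```

Separating "required" from "recommended" lets an interface block a save in the first case and only warn in the second, which is one plausible way to encourage consistent metadata without forcing it.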
    • Data Management System - Collections
      • Group of related data
        • e.g., spectral library, set of satellite data
      • Collection consists of
        • Name, description, read/write access permissions, metadata, records
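
As a rough illustration of per-collection read/write permissions, the sketch below filters the collections a user may see. It does not reproduce Elgg's own access mechanism. The `readers` and `writers` sets, the rule that writers can also read, and the helper `visible_collections` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    name: str
    owner: str
    readers: set = field(default_factory=set)  # users allowed to view
    writers: set = field(default_factory=set)  # users allowed to modify

    def can_write(self, user: str) -> bool:
        return user == self.owner or user in self.writers

    def can_read(self, user: str) -> bool:
        # Assumption: anyone who may write may also read.
        return self.can_write(user) or user in self.readers


def visible_collections(collections, user):
    """Names of the collections the given user is allowed to browse."""
    return [c.name for c in collections if c.can_read(user)]


libraries = [
    Collection("Tar sand samples", owner="alice", readers={"bob"}),
    Collection("Lichen samples", owner="alice", writers={"carol"}),
]
names_for_bob = visible_collections(libraries, "bob")
# names_for_bob == ["Tar sand samples"]
```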
    • Data Management System - Records
      • Atomic unit of data management system
        • Usually represents a single file, but does not need to be associated with a file
      • Tabbed interface for viewing:
        • Spectral plot, metadata, ancillary data, map, comments
        • Custom tabs based on data type
    • Data Management System – Virtual Directory Structure
      • Dynamic restructuring of data for browsing purposes
      • Folders based on metadata keys/values
      • User can customize the metadata keys used to establish the directory hierarchy
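
One way to picture the virtual directory structure is as a grouping of records by the values of a user-chosen, ordered list of metadata keys. The sketch below is an assumption about how this could work, not the system's actual code. The function name, the dictionary layout of a record, the `"(unspecified)"` folder for missing values and the `"_records"` leaf key are all made up for illustration.

```python
def build_virtual_tree(records, keys):
    """Nest records into folders named after their value for each key, in order."""
    tree = {}
    for record in records:
        node = tree
        for key in keys:
            # Records lacking the key are grouped in a placeholder folder.
            value = record["metadata"].get(key, "(unspecified)")
            node = node.setdefault(value, {})
        # The innermost folder lists the records it contains.
        node.setdefault("_records", []).append(record["name"])
    return tree


samples = [
    {"name": "r1", "metadata": {"species": "lichen", "site": "A"}},
    {"name": "r2", "metadata": {"species": "lichen", "site": "B"}},
    {"name": "r3", "metadata": {"species": "barley", "site": "A"}},
]

by_species_then_site = build_virtual_tree(samples, ["species", "site"])
# {"lichen": {"A": {"_records": ["r1"]}, "B": {"_records": ["r2"]}},
#  "barley": {"A": {"_records": ["r3"]}}}

by_site = build_virtual_tree(samples, ["site"])
# {"A": {"_records": ["r1", "r3"]}, "B": {"_records": ["r2"]}}
```

Because the tree is computed from metadata on request, changing the key list restructures the browsing view without moving any stored files.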
    • Use Case - GeoChronos (http://geochronos.org/)
      • An on-line platform
        • For:
          • Earth Observation Scientists
        • Facilitating:
          • Collaboration between scientists
          • Data access, management and sharing
          • Application access, management and sharing
        • Leveraging:
          • Web 2.0 and social networking technologies
          • Cloud computing technologies
        • Funded by:
          • CANARIE - Network Enabled Platform (NEP-1) program
          • Cybera
      GeoChronos - Overview
    • GeoChronos - Project Team
      • Principal Investigators
        • Dr. Arturo Sanchez-Azofeifa, University of Alberta
        • Dr. John Gamon, University of Alberta
        • Dr. Benoit Rivard, University of Alberta
        • Dr. Rob Simmonds, University of Calgary
      • Project Coordination
      • Platform Development
      • Domain Scientists
    • GeoChronos - Virtual Organization
      • Libraries created
        • Ingested some existing on-line libraries
          • USGS, ASTER, Vegetation Spectral Library (VSL)
          • Many enhanced features as part of GeoChronos Spectral Library module - improved browsing, dynamic plotting, mapping, annotations, ...
        • Domain scientists have contributed libraries
          • Rock samples, tar sand samples, lichen samples, vegetation samples, alfalfa/barley field samples
      • Data formats / parsers supported
        • ENVI, UNISPEC, ASD, several ASCII formats
      • Schemas incorporated
        • Library specific – USGS, ASTER, VSL, ...
        • Sensor/Format specific – UNISPEC, ENVI, ...
        • Other Standards – Dublin Core
      • Currently hosting (including MODIS data)
        • 10+ schemas,
        • 20+ collections (libraries),
        • 20,000+ records
      GeoChronos – Spectral Libraries
    • GeoChronos – MODIS Satellite Data
      • Developed automated workflow service for mosaicing, subsetting, reprojecting and masking MODIS satellite data
      • Significantly reduces time that scientists have spent manually doing such workflows
      • Data management system used to store raw MODIS satellite data and data products derived from the workflow
      • Parsers/schemas specific to MODIS data have been added to system
      • User provided with same powerful interface as Spectral Libraries for browsing, accessing and viewing data
      • Have developed data management system in an interactive, iterative fashion
      • Domain scientists on project have provided much guidance, testing and feedback
      • Have customized and enhanced the data management system based on feedback received
      GeoChronos – User Feedback
      • Identified data related challenges facing scientists
      • Discussed some related efforts and shortcomings of these approaches
      • Presented an on-line collaborative data management system addressing many data challenges
      • Showed example usage of the data management system by GeoChronos
      Summary
      • Currently have a single local data repository
        • Working on extending data management system to work with distributed data repositories using iRODS
      • Currently have powerful browsing functionality
        • Need to add search functionality across collections and based on metadata values
      • Currently support custom metadata schemas
        • Plan to make use of Semantic Web technologies to better relate data and provide ontological mapping between different metadata schemas / standards
      • Currently work with spectral and MODIS satellite data
        • Plan to incorporate other data such as carbon flux data, other satellite data, meteorological data, phenology tower data
      Next Steps
    • Contact Information
      • http://geochronos.org/
      • [email_address]
      • http://grid.ucalgary.ca/
      • http://ceos.ualberta.ca/
      • http://www.cybera.ca/