An On-line Collaborative Data Management System

A presentation I prepared that was presented by Rob Simmonds at the Gateway Computing Environments 2010 Workshop in New Orleans on November 14, 2010. It provides an overview of a data management system that was developed for GeoChronos - an on-line collaborative platform for Earth observation scientists.

Presentation Transcript

  • An On-line Collaborative Data Management System
    Roger Curry¹, Cameron Kiddle¹, Rob Simmonds¹ and Gilberto Z. Pastorello Jr.²
    ¹ Grid Research Centre, University of Calgary
    ² Centre for Earth Observation Science, University of Alberta
    GCE 2010, Nov. 14, 2010
  • Outline
    • Data Challenges
    • Related Work
    • Data Management System
    • Use Case: GeoChronos
    • Summary and Future Work
  • Data Challenges - I
    • Data Acquisition
      • Much scientific data stored on off-line media
      • Cumbersome and time consuming to access
      • Making data available on-line difficult
      • Insufficient storage and bandwidth
    • Sharing of Data
      • Lack of willingness to share data
      • Proprietary data - need for controlled access
  • Data Challenges - II
    • Usability of Data
      • Insufficient metadata to describe data
      • Some domains have several metadata standards, but many have none – many scientists use their own metadata format
    • Finding Data
      • Difficult to find data that you need
      • Different data organized / stored differently
      • Tools to browse, search, visualize data often lacking
  • Related Work - I
    • Content Management Systems
      • e.g., Drupal, Joomla!, Microsoft SharePoint, Plone, ...
      • Offer rich set of features but do not provide:
        • Meaningful support for specific data formats
        • Efficient association of metadata and ancillary files to data sets
        • Access to a variety of data processing tools
        • Uniform handling of outputs from processing tools
    • Spectral Libraries
      • e.g., USGS, ASTER, Vegetation Spectral Library (VSL)
      • Are available on-line but lack:
        • Ability to dynamically restructure metadata for browsing
        • Collaboration features enabled by social networking
  • Related Work - II
    • Spectral Library Tools
      • e.g., DLR-DFD Spectral Archive, SPECCHIO
      • Flexible in creating / handling metadata but:
        • Have a fixed metadata schema – do not support new metadata needs
    • Data repositories for other domains
      • e.g., Astrophysics Data System, FLUXNET, European Bioinformatics (EBI) Databases
      • Offer wide range of functionality but:
        • Primarily focus on data that is already validated and structured
        • Do not handle preliminary, intermediate, untested data (i.e., research in progress)
    • Digital Libraries
      • e.g., Planetary Data Systems, NCore, SciPort
      • Have flexible functionality but:
        • Most focus on well-defined digital artefacts
        • Limited in handling collaboration on evolving data, metadata and schemas
  • Data Management System - Overview
    • Supports the following functionality:
      • On-line access to data
      • Enables scientists to share data while maintaining control of who sees it
      • Ability to add and edit metadata while working with multiple schemas
      • Collaboratively create new schemas to facilitate consistent/accurate recording of metadata
      • Dynamically restructure the way data is browsed
  • Data Management System - Framework
    • User & Data:
      • User acquires data from sensor and uploads to portal
      • Direct acquisition of data also possible
    • Elgg Portal:
      • Built on top of Elgg – Open source social networking platform
      • Fine grained access control
      • Flexible data model
    • Data Storage:
      • Currently local NFS storage
      • Working on distributed iRODS based system
    • Data Ingestion Service:
      • Creates records, parses metadata, establishes ancillary relationships
      • Deployed on cloud-based Condor pool
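
The three ingestion steps listed above (create records, parse metadata, establish ancillary relationships) might look roughly like the following Python sketch. It is not the GeoChronos ingestion service: the `key = value` header format, the `.spec` extension, the rule that ancillary files share a file stem with their data file, and the names `parse_header` and `ingest` are all assumptions made for illustration.

```python
from pathlib import PurePath
from typing import Dict, List, Tuple


def parse_header(text: str) -> Dict[str, str]:
    """Parse 'key = value' lines (a made-up format); other lines are ignored."""
    metadata = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip():
            metadata[key.strip()] = value.strip()
    return metadata


def ingest(uploads: Dict[str, str],
           ancillary_suffixes: Tuple[str, ...] = (".jpg", ".txt")) -> List[dict]:
    """Create one record per data file, then attach ancillary files by shared stem."""
    records = {}
    # Pass 1: every non-ancillary upload becomes a record with parsed metadata.
    for filename, content in uploads.items():
        path = PurePath(filename)
        if path.suffix not in ancillary_suffixes:
            records[path.stem] = {
                "file": filename,
                "metadata": parse_header(content),
                "ancillary": [],
            }
    # Pass 2: link ancillary files (hypothetical rule: same stem as the data file).
    for filename in uploads:
        path = PurePath(filename)
        if path.suffix in ancillary_suffixes and path.stem in records:
            records[path.stem]["ancillary"].append(filename)
    return list(records.values())


# Hypothetical upload: two data files and one photo of the first sample.
uploads = {
    "leaf01.spec": "instrument = UNISPEC\nsite = field A\n",
    "leaf01.jpg": "",
    "leaf02.spec": "instrument = UNISPEC\n",
}
records = ingest(uploads)
for record in records:
    print(record)
```

A real service would dispatch to a format-specific parser per file type; the two-pass structure only illustrates why records must exist before ancillary links can be made.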
  • Data Management System – Data Model (Source: http://docs.Elgg.org/wiki/File:Elgg_data_model.png)
    • Arbitrary metadata can be assigned to any entity
    • Annotations allow users to comment on entities not owned by them
    • Data management system adds three new types of ElggObjects
      • Schema
      • Collection
      • Record
  • Data Management System - Schemas
    • Create schemas
      • Custom or standards-based (e.g., Dublin Core)
      • Individually or as a collaborative team
    • Schemas consist of
      • Namespace
      • Description
      • Read/write access permissions
      • Series of metadata keys
    • Metadata keys consist of
      • Name
      • Description
      • Type (text, latlong, ancillary)
      • Optionality: required, recommended, optional
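
The schema structure listed above can be sketched as plain data classes. This is a minimal illustration in Python, not the GeoChronos code (which is built on Elgg); the class names, the reader/writer sets, the `validate` helper and the sample keys are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Set


class Optionality(Enum):
    REQUIRED = "required"
    RECOMMENDED = "recommended"
    OPTIONAL = "optional"


@dataclass
class MetadataKey:
    name: str
    description: str
    value_type: str  # "text", "latlong" or "ancillary", as listed on the slide
    optionality: Optionality = Optionality.OPTIONAL


@dataclass
class Schema:
    namespace: str
    description: str
    readers: Set[str] = field(default_factory=set)  # assumed permission model
    writers: Set[str] = field(default_factory=set)
    keys: List[MetadataKey] = field(default_factory=list)

    def validate(self, metadata: Dict[str, str]) -> Dict[str, List[str]]:
        """Report which required and recommended keys a record's metadata lacks."""
        missing = {"required": [], "recommended": []}
        for key in self.keys:
            if key.name in metadata:
                continue
            if key.optionality is Optionality.REQUIRED:
                missing["required"].append(key.name)
            elif key.optionality is Optionality.RECOMMENDED:
                missing["recommended"].append(key.name)
        return missing


# Illustrative schema using a few Dublin Core element names.
dc = Schema(
    namespace="dc",
    description="Subset of Dublin Core (illustrative only)",
    keys=[
        MetadataKey("title", "Name of the resource", "text", Optionality.REQUIRED),
        MetadataKey("creator", "Who made the resource", "text", Optionality.RECOMMENDED),
        MetadataKey("coverage", "Where the sample was taken", "latlong"),
    ],
)

report = dc.validate({"title": "Lichen sample 12"})
print(report)
```

Separating required from recommended keys lets an interface block a save on the former and only warn on the latter.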
  • Data Management System - Collections
    • Group of related data
      • e.g., spectral library, set of satellite data
    • Collection consists of
      • Name, description, read/write access permissions, metadata, records
  • Data Management System - Records
    • Atomic unit of data management system
      • Usually represents a single file, but does not need to be associated with a file
    • Tabbed interface for viewing:
      • Spectral plot, metadata, ancillary data, map, comments
      • Custom tabs based on data type
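
Collections and records as described on the two slides above can be sketched as follows. This is a hedged Python illustration only: the per-user reader and writer sets stand in for the portal's access control and are an assumption, as are the class, method and sample names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Record:
    name: str
    metadata: Dict[str, str] = field(default_factory=dict)
    file_path: Optional[str] = None  # a record need not be associated with a file


@dataclass
class Collection:
    name: str
    description: str
    owner: str
    readers: Set[str] = field(default_factory=set)
    writers: Set[str] = field(default_factory=set)
    metadata: Dict[str, str] = field(default_factory=dict)
    records: List[Record] = field(default_factory=list)

    def can_read(self, user: str) -> bool:
        return user == self.owner or user in self.readers or user in self.writers

    def can_write(self, user: str) -> bool:
        return user == self.owner or user in self.writers

    def add_record(self, user: str, record: Record) -> None:
        """Append a record, refusing users without write permission."""
        if not self.can_write(user):
            raise PermissionError(f"{user} may not write to {self.name}")
        self.records.append(record)


# Hypothetical library shared read-only with one colleague.
library = Collection(
    name="Lichen library",
    description="Field spectra of lichen samples",
    owner="alice",
    readers={"bob"},
)
library.add_record("alice", Record("sample-001", {"site": "A"}, "sample-001.asd"))
print(library.can_read("bob"), library.can_read("carol"))
```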
  • Data Management System – Virtual Directory Structure
    • Dynamic restructuring of data for browsing purposes
    • Folders based on metadata keys/values
    • User can customize the metadata keys used to establish the directory hierarchy
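
One way to picture the virtual directory structure is as a grouping of records by an ordered list of metadata keys, where each key contributes one folder level. The sketch below is an assumption about how such a tree could be built, not the system's actual code; the function name, the `(unspecified)` folder for missing values and the sample records are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Union

Tree = Union[Dict[str, "Tree"], List[str]]


def build_virtual_tree(records: List[Dict[str, str]], keys: List[str]) -> Tree:
    """Nest records into folders, one level per metadata key, in the given order."""
    if not keys:
        # Leaf level: list the names of the records that reached this folder.
        return [record["name"] for record in records]
    first, rest = keys[0], keys[1:]
    groups = defaultdict(list)
    for record in records:
        groups[record.get(first, "(unspecified)")].append(record)
    return {value: build_virtual_tree(members, rest)
            for value, members in sorted(groups.items())}


# Hypothetical records with two metadata keys.
records = [
    {"name": "r1", "library": "USGS", "material": "rock"},
    {"name": "r2", "library": "USGS", "material": "vegetation"},
    {"name": "r3", "library": "ASTER", "material": "rock"},
    {"name": "r4", "library": "ASTER"},
]

by_library = build_virtual_tree(records, ["library", "material"])
by_material = build_virtual_tree(records, ["material"])
print(by_library)
print(by_material)
```

Because the hierarchy is computed from metadata rather than stored, changing the key order regroups the same records without moving any files.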
  • Use Case - GeoChronos (http://geochronos.org/)
  • GeoChronos - Overview
    • An on-line platform
      • For:
        • Earth Observation Scientists
      • Facilitating:
        • Collaboration between scientists
        • Data access, management and sharing
        • Application access, management and sharing
      • Leveraging:
        • Web 2.0 and social networking technologies
        • Cloud computing technologies
      • Funded by:
        • CANARIE - Network Enabled Platform (NEP-1) program
        • Cybera
  • GeoChronos - Project Team
    • Principal Investigators
      • Dr. Arturo Sanchez-Azofeifa, University of Alberta
      • Dr. John Gamon, University of Alberta
      • Dr. Benoit Rivard, University of Alberta
      • Dr. Rob Simmonds, University of Calgary
    • Project Coordination
    • Platform Development
    • Domain Scientists
  • GeoChronos - Virtual Organization
  • GeoChronos – Spectral Libraries
    • Libraries created
      • Ingested some existing on-line libraries
        • USGS, ASTER, Vegetation Spectral Library (VSL)
        • Many enhanced features as part of GeoChronos Spectral Library module - improved browsing, dynamic plotting, mapping, annotations, ...
      • Domain scientists have contributed libraries
        • Rock samples, tar sand samples, lichen samples, vegetation samples, alfalfa/barley field samples
    • Data formats / parsers supported
      • ENVI, UNISPEC, ASD, several ASCII formats
    • Schemas incorporated
      • Library specific – USGS, ASTER, VSL, ...
      • Sensor/Format specific – UNISPEC, ENVI, ...
      • Other Standards – Dublin Core
    • Currently hosting (including MODIS data)
      • 10+ schemas
      • 20+ collections (libraries)
      • 20,000+ records
  • GeoChronos – MODIS Satellite Data
    • Developed automated workflow service for mosaicing, subsetting, reprojecting and masking MODIS satellite data
    • Significantly reduces time that scientists have spent manually doing such workflows
    • Data management system used to store raw MODIS satellite data and data products derived from the workflow
    • Parsers/schemas specific to MODIS data have been added to system
    • User provided with same powerful interface as Spectral Libraries for browsing, accessing and viewing data
  • GeoChronos – User Feedback
    • Have developed data management system in an interactive, iterative fashion
    • Domain scientists on project have provided much guidance, testing and feedback
    • Have customized, enhanced the data management system based on feedback received
  • Summary
    • Identified data related challenges facing scientists
    • Discussed some related efforts and shortcomings of these approaches
    • Presented an on-line collaborative data management system addressing many data challenges
    • Showed example usage of the data management system by GeoChronos
  • Next Steps
    • Currently have a single local data repository
      • Working on extending data management system to work with distributed data repositories using iRODS
    • Currently have powerful browsing functionality
      • Need to add search functionality across collections and based on metadata values
    • Currently support custom metadata schemas
      • Plan to make use of Semantic Web technologies to better relate data and provide ontological mapping between different metadata schemas / standards
    • Currently work with spectral and MODIS satellite data
      • Plan to incorporate other data such as carbon flux data, other satellite data, meteorological data, phenology tower data
  • Contact Information
    • http://geochronos.org/
    • [email_address]
    • http://grid.ucalgary.ca/
    • http://ceos.ualberta.ca/
    • http://www.cybera.ca/