Project Matsu: Elastic Clouds for Disaster Relief

This is a talk I gave at OGF 29 in Chicago on June 21, 2010.

    Presentation Transcript

    • www.opencloudconsortium.org
      Project Matsu: Large Scale On-Demand Image Processing for Disaster Relief
      Collin Bennett, Robert Grossman, Yunhong Gu, and Andrew Levine, Open Cloud Consortium
      June 21, 2010
    • Project Matsu Goals
      Provide persistent data resources and elastic computing to assist in disasters:
      Make imagery available for disaster relief workers
      Elastic computing for large scale image processing
      Change detection for temporally different and geospatially identical image sets
      Provide a resource for testing standards and running interoperability studies for large data clouds
    • Part 1: Open Cloud Consortium
    • A 501(c)(3) not-for-profit corporation
      Supports the development of standards, interoperability frameworks, and reference implementations.
      Manages testbeds: Open Cloud Testbed and Intercloud Testbed.
      Manages cloud computing infrastructure to support scientific research: Open Science Data Cloud.
      Develops benchmarks.
    • OCC Members
      Companies: Aerospace, Booz Allen Hamilton, Cisco, InfoBlox, Open Data Group, Raytheon, Yahoo
      Universities: CalIT2, Johns Hopkins, Northwestern Univ., University of Illinois at Chicago, University of Chicago
      Government agencies: NASA
      Open Source Projects: Sector Project
    • Operates Clouds
      500 nodes
      3000 cores
      1.5+ PB
      Four data centers
      10 Gbps
      Target to refresh 1/3 each year.
      • Open Cloud Testbed
      • Open Science Data Cloud
      • Intercloud Testbed
      • Project Matsu: Cloud-based Disaster Relief Services
    • Open Science Data Cloud
      Astronomical data
      Biological data (Bionimbus)
      Networking data
      Image processing for disaster relief
    • Focus of OCC Large Data Cloud Working Group
      [Diagram: a layered stack in which applications run directly on each layer: Table-based Data Services and Relational-like Data Services on top, Cloud Compute Services (MapReduce, UDF, & other programming frameworks) below them, and Cloud Storage Services at the bottom.]
      Developing APIs for this framework.
    • Tools and Standards
      Apache Hadoop/MapReduce
      Sector/Sphere large data cloud
      Open Geospatial Consortium
      Web Map Service (WMS)
      OCC tools are open source (matsu-project)
      http://code.google.com/p/matsu-project/
    • Part 2: Technical Approach
      Hadoop – Lead Andrew Levine
      Hadoop with Python Streams – Lead Collin Bennett
      Sector/Sphere – Lead Yunhong Gu
    • Implementation 1: Hadoop & MapReduce – Andrew Levine
    • Image Processing in the Cloud - Mapper
      Step 1: Input to Mapper
      Mapper Input Key: Bounding Box (e.g., minx = -135.0, miny = 45.0, maxx = -112.5, maxy = 67.5)
      Mapper Input Value: the original image
      Step 2: Processing in Mapper
      The Mapper resizes and/or cuts up the original image into pieces, one per output Bounding Box
      Step 3: Mapper Output
      Mapper Output Key: Bounding Box (one per piece)
      Mapper Output Value: image piece + Timestamp
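      The slides do not include the mapper code itself (Implementation 1 is written against the Hadoop MapReduce Java API). Purely as an illustrative sketch of the tiling step described above, with hypothetical names (BBox, map_image, splits) and a plain 2-D pixel list standing in for the real image type, the logic might look like this in Python:

      # Illustrative sketch only (not the project's Hadoop/Java code): cut an image into
      # tiles and emit (bounding box, timestamp + tile) pairs, as in Steps 2 and 3 above.
      from dataclasses import dataclass

      @dataclass(frozen=True)
      class BBox:                     # hypothetical bounding-box key
          minx: float
          miny: float
          maxx: float
          maxy: float

      def map_image(bbox, image, timestamp, splits=2):
          """Cut `image` (a 2-D list of pixel values) into splits x splits tiles and
          assign each tile the geographic sub-box it covers."""
          h, w = len(image), len(image[0])
          dx = (bbox.maxx - bbox.minx) / splits
          dy = (bbox.maxy - bbox.miny) / splits
          for r in range(splits):
              for c in range(splits):
                  tile = [row[c * w // splits:(c + 1) * w // splits]
                          for row in image[r * h // splits:(r + 1) * h // splits]]
                  key = BBox(bbox.minx + c * dx, bbox.maxy - (r + 1) * dy,
                             bbox.minx + (c + 1) * dx, bbox.maxy - r * dy)
                  yield key, (timestamp, tile)   # output key: bounding box; value: tile + timestamp
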
    • Image Processing in the Cloud - Reducer
      Step 1: Input to Reducer
      Reducer Input Key: Bounding Box (e.g., minx = -45.0, miny = -2.8125, maxx = -43.59375, maxy = -2.109375)
      Reducer Input Value: the image pieces for that Bounding Box, each with a Timestamp
      Step 2: Process difference in Reducer
      Assemble images based on timestamps and compare; the result is a delta of the two images
      Step 3: Reducer Output
      The Timestamp 1 set, the Timestamp 2 set, and the Delta set each go to a different map layer for display in WMS
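      Again as an illustrative sketch only (the slides do not spell out the comparison method), the reducer's change-detection step for one bounding box, assuming exactly two timestamps and grayscale tiles represented as 2-D lists, might look like:

      # Illustrative sketch: for one bounding-box key, group tiles by timestamp and
      # emit a per-pixel delta image alongside the two originals.
      def reduce_bbox(bbox, values):
          """`values` is an iterable of (timestamp, tile) pairs for one bounding box."""
          by_time = {}
          for timestamp, tile in values:
              by_time[timestamp] = tile                # assemble images by timestamp
          (t1, img1), (t2, img2) = sorted(by_time.items())[:2]   # assumes two timestamps
          delta = [[abs(p2 - p1) for p1, p2 in zip(row1, row2)]
                   for row1, row2 in zip(img1, img2)]  # per-pixel difference of the two images
          return {t1: img1, t2: img2, "delta": delta}  # three layers: timestamp 1, timestamp 2, delta
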
    • Implementation 2: Hadoop & Python Streams – Collin Bennett
    • Preprocessing Step
      • All images (in a batch to be processed) are combined into a single file.
      • Each line contains the image’s byte array transformed to pixels (raw bytes don’t seem to work well with the one-line-at-a-time Hadoop streaming paradigm).
      geolocation timestamp | tuple size ; image width ; image height; comma-separated list of pixels
      the metadata fields (tuple size, image width, image height) are needed to reconstruct and process the image in the reducer
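      A minimal sketch of that one-line-per-image serialization, assuming grayscale images decoded with Pillow (the helper name image_to_record and the manifest list are hypothetical):

      # Sketch of the preprocessing step: one image per line in the
      # "geolocation timestamp | tuple size ; width ; height ; pixels" format above.
      from PIL import Image   # assumes Pillow is available for decoding image files

      def image_to_record(path, geolocation, timestamp):
          img = Image.open(path).convert("L")            # assume grayscale, one value per pixel
          width, height = img.size
          pixels = ",".join(str(p) for p in img.getdata())
          tuple_size = 1                                 # values per pixel (1 for grayscale)
          return f"{geolocation} {timestamp}|{tuple_size};{width};{height};{pixels}"

      # with open("batch.txt", "w") as out:              # one combined file for the batch
      #     for path, geo, ts in manifest:               # `manifest` is a hypothetical list
      #         out.write(image_to_record(path, geo, ts) + "\n")
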
    • Map and Shuffle
      • We can use the identity mapper
      • All of the work for mapping was done in the pre-process step
      • Map / Shuffle key is the geolocation
      • In the reducer, the timestamp will be the first field of each record when splitting on ‘|’ (see the sketch below)
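      As a sketch only, and assuming the streaming job is configured so the reducer sees each record as geolocation<TAB>timestamp|metadata;pixels (the exact streaming configuration is not shown in the slides), the reducer side might look like:

      #!/usr/bin/env python
      # Sketch of the streaming reducer: read key<TAB>value lines from stdin, group by
      # geolocation, and recover the timestamp and image metadata by splitting on '|' and ';'.
      import sys
      from itertools import groupby

      def parse(line):
          geolocation, value = line.rstrip("\n").split("\t", 1)
          timestamp, payload = value.split("|", 1)             # timestamp is the first field
          tuple_size, width, height, pixels = payload.split(";", 3)
          return geolocation, timestamp.strip(), int(width), int(height), pixels

      if __name__ == "__main__":
          records = (parse(line) for line in sys.stdin)
          # streaming reducer input arrives sorted by key, so groupby is safe here
          for geolocation, group in groupby(records, key=lambda r: r[0]):
              tiles = {ts: (w, h, px) for _, ts, w, h, px in group}
              # change detection for this location would go here (compare the two timestamps)
              print(f"{geolocation}\t{len(tiles)} timestamped images")
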
    • Implementation 3: Sector/Sphere – Yunhong Gu
    • Sector Distributed File System
      Sector aggregates hard disk storage across commodity computers
      It provides a single namespace, file-system-level reliability (using replication), and high availability
      Sector does not split files
      A single image will not be split, so when it is being processed the application does not need to read data from other nodes over the network
      As an option, a directory can also be kept together on a single node
    • Sphere UDF
      Sphere allows a User Defined Function (UDF) to be applied to each file (whether it holds a single image or multiple images)
      Existing applications can be wrapped up in a Sphere UDF
      In many situations, the Sphere streaming utility accepts a data directory and an application binary as inputs, e.g.:
      ./stream -i haiti -c ossim_foo -o results
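      Sphere UDFs themselves are written against Sphere's native API, which is not shown in these slides. Purely to illustrate the wrap-an-existing-binary pattern, a stand-alone Python sketch (with a hypothetical ossim_foo executable and hypothetical paths) of applying an application to each file in a directory:

      # Illustration only (not Sphere's API): apply an existing image-processing binary
      # to every file in an input directory, the way a Sphere UDF wraps an application.
      import subprocess
      from pathlib import Path

      def process_file(input_path, output_dir, binary="./ossim_foo"):
          """Run the wrapped binary on one image and write its result to output_dir."""
          out_path = Path(output_dir) / (Path(input_path).name + ".result")
          subprocess.run([binary, str(input_path), str(out_path)], check=True)
          return out_path

      # for path in sorted(Path("haiti").iterdir()):   # the 'haiti' input directory above
      #     process_file(path, "results")
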
    • For More Information
      info@opencloudconsortium.org
      www.opencloudconsortium.org