Project Matsu: Large Scale On-Demand Image Processing for Disaster Relief
Collin Bennett, Robert Grossman, Yunhong Gu, and Andrew Levine
Open Cloud Consortium
June 21, 2010
www.opencloudconsortium.org
Project Matsu Goals
Provide persistent data resources and elastic computing to assist in disasters:
- Make imagery available for disaster relief workers
- Elastic computing for large-scale image processing
- Change detection for temporally different and geospatially identical image sets
- Provide a resource for standards testing and interoperability studies of large data clouds
Part 1: Open Cloud Consortium
501(c)(3) not-for-profit corporation
- Supports the development of standards, interoperability frameworks, and reference implementations.
- Manages testbeds: the Open Cloud Testbed and the Intercloud Testbed.
- Manages cloud computing infrastructure to support scientific research: the Open Science Data Cloud.
- Develops benchmarks.
www.opencloudconsortium.org
OCC Members
- Companies: Aerospace, Booz Allen Hamilton, Cisco, InfoBlox, Open Data Group, Raytheon, Yahoo
- Universities: CalIT2, Johns Hopkins, Northwestern Univ., University of Illinois at Chicago, University of Chicago
- Government agencies: NASA
- Open source projects: Sector Project
Operates Clouds
- 500 nodes
- 3,000 cores
- 1.5+ PB
- Four data centers
- 10 Gbps
- Target to refresh 1/3 each year.
Open Cloud Testbed
Open Science Data Cloud
Intercloud Testbed
Project Matsu: Cloud-based Disaster Relief Services
Open Science Data Cloud:
- Astronomical data
- Biological data (Bionimbus)
- Networking data
- Image processing for disaster relief
Focus of OCC Large Data Cloud Working Group
Layered stack (top to bottom): applications; table-based and relational-like data services; cloud compute services (MapReduce, UDF, and other programming frameworks); cloud storage services.
Developing APIs for this framework.
Tools and Standards
- Apache Hadoop/MapReduce
- Sector/Sphere large data cloud
- Open Geospatial Consortium Web Map Service (WMS)
- OCC tools are open source (matsu-project): http://code.google.com/p/matsu-project/
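Processed map layers are exposed through OGC WMS. As a hedged illustration of the standard (not a real Matsu endpoint), a WMS 1.1.1 GetMap request can be built as below; the host name and layer name are placeholders:

    # Sketch of a WMS 1.1.1 GetMap request URL. The host and LAYERS value are
    # illustrative placeholders; the query parameters follow the WMS standard.
    from urllib.parse import urlencode

    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": "delta_set",                 # placeholder layer name
        "SRS": "EPSG:4326",
        "BBOX": "-135.0,45.0,-112.5,67.5",     # minx,miny,maxx,maxy
        "WIDTH": "512",
        "HEIGHT": "512",
        "FORMAT": "image/png",
    }
    url = "http://wms.example.org/wms?" + urlencode(params)
    print(url)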
Part 2: Technical Approach
- Hadoop – Lead: Andrew Levine
- Hadoop with Python Streams – Lead: Collin Bennett
- Sector/Sphere – Lead: Yunhong Gu
Implementation 1: Hadoop & MapReduce
Andrew Levine
Image Processing in the Cloud - Mapper
Step 1: Input to the mapper. Input key: a bounding box (e.g., minx = -135.0, miny = 45.0, maxx = -112.5, maxy = 67.5). Input value: the image.
Step 2: Processing in the mapper. The mapper resizes and/or cuts the original image into pieces, producing a smaller bounding box for each piece.
Step 3: Mapper output. Output key: the bounding box of a piece. Output value: the image piece plus its timestamp.
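A minimal Python sketch of the tiling logic described above, assuming Pillow for image handling and a fixed grid size; it illustrates the mapper's behavior rather than reproducing the project's actual Hadoop mapper:

    # Split one source image covering a known bounding box into a grid of tiles,
    # each keyed by its own bounding box plus the acquisition timestamp.
    # Grid size, the BBox type, and the use of Pillow are illustrative assumptions.
    from collections import namedtuple
    from PIL import Image

    BBox = namedtuple("BBox", "minx miny maxx maxy")

    def tile_image(image: Image.Image, bbox: BBox, timestamp: str, grid: int = 4):
        """Yield (tile_bbox, (timestamp, tile)) pairs for a grid x grid split."""
        width, height = image.size
        dx = (bbox.maxx - bbox.minx) / grid
        dy = (bbox.maxy - bbox.miny) / grid
        px, py = width // grid, height // grid   # edge pixels dropped if not divisible
        for row in range(grid):
            for col in range(grid):
                # Pixel window for this tile; image row 0 is the top, which
                # corresponds to maxy in geographic coordinates.
                window = (col * px, row * py, (col + 1) * px, (row + 1) * py)
                tile = image.crop(window)
                tile_bbox = BBox(
                    minx=bbox.minx + col * dx,
                    miny=bbox.maxy - (row + 1) * dy,
                    maxx=bbox.minx + (col + 1) * dx,
                    maxy=bbox.maxy - row * dy,
                )
                yield tile_bbox, (timestamp, tile)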
Image Processing in the Cloud - Reducer
Step 1: Input to the reducer. Input key: a bounding box (e.g., minx = -45.0, miny = -2.8125, maxx = -43.59375, maxy = -2.109375). Input values: the image pieces for that bounding box.
Step 2: Processing in the reducer. Assemble the images based on timestamps and compare them; the result is a delta of the two images.
Step 3: Reducer output. Three sets of images (the timestamp 1 set, the timestamp 2 set, and the delta set), each going to a different map layer for display in WMS.
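A minimal Python sketch of the change-detection step, assuming exactly two timestamps per bounding box and Pillow's ImageChops for the pixel-wise difference. The project's reducer assembles multiple pieces per timestamp before comparing; one image per timestamp is assumed here for brevity:

    # Group incoming tiles for one bounding-box key by timestamp and emit a
    # pixel-wise difference image alongside the two originals.
    from PIL import Image, ImageChops

    def reduce_bbox(bbox, values):
        """values: iterable of (timestamp, image) pairs for the same bounding box."""
        by_time = {}
        for timestamp, image in values:
            by_time.setdefault(timestamp, []).append(image)

        # Assume exactly two acquisition times for this bounding box.
        (t1, images1), (t2, images2) = sorted(by_time.items())
        image1, image2 = images1[0], images2[0]

        # Pixel-wise absolute difference; both images must share mode and size.
        delta = ImageChops.difference(image1, image2)
        return {t1: image1, t2: image2, "delta": delta}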
Implementation 2: Hadoop & Python Streams
Collin Bennett
Preprocessing Step
All images (in a batch to be processed) are combined into a single file.
Each line contains the image's byte array transformed to pixels (raw bytes don't seem to work well with the one-line-at-a-time Hadoop streaming paradigm):

geolocation \t timestamp | tuple size ; image width ; image height ; comma-separated list of pixels

The tuple size, image width, and image height fields are metadata needed to process the image in the reducer.
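A minimal sketch of how one image could be serialized into this one-line record, assuming Pillow and an RGB image; the geolocation string format is an assumption, since it is not spelled out here:

    # Serialize an image file into the record format above:
    # geolocation \t timestamp | tuple size ; width ; height ; pixels
    from PIL import Image

    def image_to_record(path: str, geolocation: str, timestamp: str) -> str:
        image = Image.open(path).convert("RGB")
        width, height = image.size
        pixels = list(image.getdata())          # list of (R, G, B) tuples
        tuple_size = len(pixels[0])             # 3 for RGB
        flat = ",".join(str(channel) for pixel in pixels for channel in pixel)
        return f"{geolocation}\t{timestamp}|{tuple_size};{width};{height};{flat}"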
Map and Shuffle
We can use the identity mapper.
All of the work for mapping was done in the pre-process step.
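Because the preprocessing step already emits key-value lines, the streaming mapper can simply pass its input through, and Hadoop's shuffle groups records by the key before the first tab (the geolocation). A minimal identity mapper is sketched below; the streaming job could equally be launched with -mapper /bin/cat:

    #!/usr/bin/env python
    # Identity mapper for Hadoop Streaming: every preprocessed record is echoed
    # unchanged, so the shuffle groups lines by the geolocation key that
    # precedes the first tab.
    import sys

    for line in sys.stdin:
        sys.stdout.write(line)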
