Project Matsu: Large Scale On-Demand
Image Processing for Disaster Relief
Collin Bennett, Robert Grossman,
Yunhong Gu, and Andrew Levine
Open Cloud Consortium
June 21, 2010
www.opencloudconsortium.org
Project Matsu Goals
• Provide persistent data resources and elastic
computing to assist in disasters:
– Make imagery available for disaster relief workers
– Elastic computing for large scale image processing
– Change detection for temporally different and
geospatially identical image sets
• Provide a resource for testing standards and
for interoperability studies of large data clouds
Part 1:
Open Cloud Consortium
• 501(c)(3) Not-for-profit corporation
• Supports the development of standards,
interoperability frameworks, and reference
implementations.
• Manages testbeds: Open Cloud Testbed and
Intercloud Testbed.
• Manages cloud computing infrastructure to support
scientific research: Open Science Data Cloud.
• Develops benchmarks.
OCC Members
• Companies: Aerospace, Booz Allen Hamilton,
Cisco, InfoBlox, Open Data Group, Raytheon,
Yahoo
• Universities: CalIT2, Johns Hopkins,
Northwestern Univ., University of Illinois at
Chicago, University of Chicago
• Government agencies: NASA
• Open Source Projects: Sector Project
Operates Clouds
• 500 nodes
• 3000 cores
• 1.5+ PB
• Four data centers
• 10 Gbps
• Target to refresh 1/3
each year.
• Open Cloud Testbed
• Open Science Data Cloud
• Intercloud Testbed
• Project Matsu: Cloud-based Disaster
Relief Services
Open Science Data Cloud
Astronomical data
Biological data
(Bionimbus)
Networking data
Image processing for disaster relief
Focus of OCC Large Data Cloud Working Group
[Diagram: applications (App) built on a stack of cloud services:]
• Cloud Storage Services
• Cloud Compute Services (MapReduce, UDF, & other programming frameworks)
• Table-based Data Services
• Relational-like Data Services
• Developing APIs for this framework.
Tools and Standards
• Apache Hadoop/MapReduce
• Sector/Sphere large data cloud
• Open Geospatial Consortium
– Web Map Service (WMS)
• OCC tools are open source (matsu-project)
– http://code.google.com/p/matsu-project/
Part 2: Technical Approach
• Hadoop – Lead Andrew Levine
• Hadoop with Python Streams – Lead Collin
Bennett
• Sector/Sphere – Lead Yunhong Gu
Implementation 1:
Hadoop & MapReduce
Andrew Levine
Image Processing in the Cloud - Mapper
Step 1: Input to Mapper
Mapper Input Key: Bounding Box
Mapper Input Value: [image, shown as a picture on the slide]
Step 2: Processing in Mapper
Mapper resizes and/or cuts up the original
image into pieces to output Bounding Boxes
(minx = -135.0 miny = 45.0 maxx = -112.5 maxy = 67.5)
Step 3: Mapper Output (one key/value pair per piece)
Mapper Output Key: Bounding Box
Mapper Output Value: [image piece, shown as a picture on the slide] + Timestamp
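The mapper step above can be sketched in plain Python. This is only an illustration, not the project's Hadoop code: the function name `map_image`, the `splits` parameter, the nested-list image type, and the assumption that the image dimensions divide evenly are all hypothetical.

```python
from typing import Iterator, List, Tuple

BBox = Tuple[float, float, float, float]  # (minx, miny, maxx, maxy)
Image = List[List[int]]                   # rows of pixel values; row 0 is the top (maxy) edge


def map_image(bbox: BBox, image: Image, timestamp: str,
              splits: int = 2) -> Iterator[Tuple[BBox, Tuple[str, Image]]]:
    """Cut one image into splits x splits pieces, keyed by each piece's bounding box.

    Assumption: the image height and width are exact multiples of `splits`.
    """
    minx, miny, maxx, maxy = bbox
    rows, cols = len(image), len(image[0])
    tile_h, tile_w = rows // splits, cols // splits
    dx, dy = (maxx - minx) / splits, (maxy - miny) / splits
    for r in range(splits):
        for c in range(splits):
            # Slice out the pixels belonging to this piece.
            tile = [row[c * tile_w:(c + 1) * tile_w]
                    for row in image[r * tile_h:(r + 1) * tile_h]]
            # Row 0 is the northern edge, so y decreases as r grows.
            tile_bbox = (minx + c * dx, maxy - (r + 1) * dy,
                         minx + (c + 1) * dx, maxy - r * dy)
            # Output key: bounding box. Output value: piece + timestamp.
            yield tile_bbox, (timestamp, tile)
```

Keying every piece by its bounding box means that pieces covering the same area at different times end up at the same reducer.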
Image Processing in the Cloud - Reducer
Step 1: Input to Reducer
Reducer Key Input: Bounding Box
(minx = -45.0 miny = -2.8125 maxx = -43.59375 maxy = -2.109375)
Reducer Value Input: [images, shown as pictures on the slide]
Step 2: Process difference in Reducer
Assemble images based on timestamps and compare. Result is a delta of the two images.
Step 3: Reducer Output
Timestamp 1 Set, Timestamp 2 Set, Delta Set
Each set of images goes to a different map layer for display in WMS.
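A minimal sketch of the reducer's compare step, under stated assumptions: exactly one image piece per timestamp for a given bounding box, timestamps that sort chronologically as strings, and an absolute per-pixel difference as the "delta". The slides do not say which difference measure was used, and `reduce_tiles` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Image = List[List[int]]  # rows of pixel values


def reduce_tiles(values: List[Tuple[str, Image]]) -> Dict[str, Image]:
    """Compare the two images that share one bounding box.

    `values` holds (timestamp, image) pairs for a single reducer key.
    Assumption: exactly two distinct timestamps are present.
    """
    by_time = dict(values)
    if len(by_time) != 2:
        raise ValueError("expected images for exactly two timestamps")
    t1, t2 = sorted(by_time)  # earlier timestamp first
    before, after = by_time[t1], by_time[t2]
    # Assumed delta: absolute per-pixel difference between the two images.
    delta = [[abs(b - a) for a, b in zip(row_before, row_after)]
             for row_before, row_after in zip(before, after)]
    # Three outputs, matching the three sets on the slide.
    return {t1: before, t2: after, "delta": delta}
```

Each of the three returned images corresponds to one output set, which could then be published as its own map layer.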
Implementation 2:
Hadoop & Python Streams
Collin Bennett
Preprocessing Step
• All images (in a batch to be processed) are
combined into a single file.
• Each line contains the image’s byte array
transformed to pixels (raw bytes don’t seem
to work well with the one-line-at-a-time
Hadoop streaming paradigm).
geolocation \t timestamp | tuple size ; image width ; image height ; comma-separated list of pixels
The fields shown in red on the original slide are metadata needed to process the image in the
reducer.
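The record layout above could be produced along these lines. This is a sketch only: `encode_record` is a hypothetical helper, "tuple size" is assumed to mean the number of values per pixel (3 for RGB), and the delimiters are written without surrounding spaces.

```python
from typing import List, Tuple


def encode_record(geolocation: str, timestamp: str, width: int, height: int,
                  pixels: List[Tuple[int, ...]]) -> str:
    """Serialize one image as a single text line for Hadoop streaming.

    Assumption: every pixel is a tuple of the same length (e.g. 3 for RGB).
    """
    tuple_size = len(pixels[0])
    # Flatten the pixel tuples into one comma-separated list of integers.
    flat = ",".join(str(value) for pixel in pixels for value in pixel)
    # Key and value are tab-separated; the timestamp comes first in the value.
    return f"{geolocation}\t{timestamp}|{tuple_size};{width};{height};{flat}"
```

Writing pixels as decimal text keeps newline and tab bytes out of the record, so each image stays on one line.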
Map and Shuffle
• We can use the identity mapper
• All of the work for mapping was
done in the pre-process step
• Map / Shuffle key is the geolocation
• In the reducer, the timestamp will be
1st field of each record when
splitting on ‘|’
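On the reducer side, a streaming script sees lines sorted by key, with key and value separated by a tab. The sketch below groups those lines by geolocation and pulls the timestamp out as the first field when splitting on '|'. The function names are hypothetical and the payload is left unparsed.

```python
from itertools import groupby
from typing import Iterable, Iterator, List, Tuple


def parse_line(line: str) -> Tuple[str, str, str]:
    """Split one streaming record into (geolocation, timestamp, payload)."""
    key, value = line.rstrip("\n").split("\t", 1)
    timestamp, payload = value.split("|", 1)  # timestamp is the first field
    return key, timestamp, payload


def group_by_geolocation(
        lines: Iterable[str]) -> Iterator[Tuple[str, List[Tuple[str, str]]]]:
    """Yield (geolocation, [(timestamp, payload), ...]) for key-sorted input."""
    records = (parse_line(line) for line in lines if line.strip())
    # groupby relies on the shuffle having already sorted the lines by key.
    for key, group in groupby(records, key=lambda record: record[0]):
        yield key, sorted((ts, payload) for _, ts, payload in group)
```

In an actual streaming job the `lines` argument would be `sys.stdin`, and each group would be handed to the image comparison.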
Implementation 3:
Sector/Sphere
Yunhong Gu
Sector Distributed File System
• Sector aggregates hard disk storage across
commodity computers
– With single namespace, file system level reliability
(using replication), high availability
• Sector does not split files
– A single image will not be split, therefore when it
is being processed, the application does not need
to read the data from other nodes via network
– A directory can be kept together on a single node
as well, as an option
Sphere UDF
• Sphere allows a User Defined Function to be
applied to each file (whether it holds a single image
or multiple images)
• Existing applications can be wrapped up in a
Sphere UDF
• In many situations, the Sphere streaming utility
accepts a data directory and an application
binary as inputs
• ./stream -i haiti -c ossim_foo -o results
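As a rough local analogy for applying one existing application to every file in a directory, the sketch below runs a command once per file and stores each output. It does not use Sphere's API and runs sequentially on one machine rather than on the nodes holding the data. The helper name and the convention that the command takes the file path as its last argument and writes to standard output are assumptions.

```python
import subprocess
from pathlib import Path
from typing import List


def apply_to_each_file(input_dir: str, command: List[str],
                       output_dir: str) -> List[Path]:
    """Run `command <file>` for every file in input_dir and save its stdout.

    Assumption: the wrapped application reads the path given as its last
    argument and writes its result to standard output.
    """
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(Path(input_dir).iterdir()):
        if not path.is_file():
            continue
        # One invocation per file: files are never split across invocations.
        result = subprocess.run(command + [str(path)],
                                capture_output=True, check=True)
        target = out_dir / (path.name + ".out")
        target.write_bytes(result.stdout)
        written.append(target)
    return written
```

Because each file is processed whole, an unmodified image tool can be wrapped without knowing anything about the storage layer.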
For More Information
info@opencloudconsortium.org
www.opencloudconsortium.org
