Building a geospatial processing pipeline using Hadoop and HBase and how Monsanto is using it to help farmers increase their yield

Jul 10, 2013

Monsanto built a geospatial platform on Hadoop and HBase capable of managing over 120 billion polygons. Because of the extreme data volumes and computational complexity, we were forced to migrate our data processing from a more traditional RDBMS to a scale-out Hadoop implementation. Data processing that took over 30 days on 8% of the data now runs in under 12 hours on the entire data set.

Very little concrete material exists on how to process spatial data with MapReduce or model it in HBase. We will provide concrete and novel examples for processing and storing spatial data on Hadoop and HBase. As part of the data processing pipeline, we integrated the popular open-source geospatial processing library GDAL with MapReduce to convert all geospatial datasets to a common format and projection. We developed a method for splitting and processing images via MapReduce in which the boundaries of splits had to be shared by multiple tasks because of the nature of the computation performed on the data. Bulk writes to HBase were performed by writing HFiles directly. Finally, we developed a novel method for storing geospatial data in HBase that met the needs of our access pattern.
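The abstract says split boundaries had to be shared by several tasks but does not say how that was done. A common way to get that behavior is to give each task a "core" tile it owns plus a read-only halo of neighboring pixels. The sketch below shows that idea in plain Python, under stated assumptions: it is not the talk's implementation, it uses no Hadoop APIs, the function names (`split_with_halo`, `process_tile`) are hypothetical, and a 3x3 neighborhood sum stands in for the real per-pixel computation.

```python
from typing import Dict, Iterator, List, NamedTuple, Tuple

Grid = List[List[int]]


class Window(NamedTuple):
    """A rectangular pixel window: column/row offset plus width/height."""
    col: int
    row: int
    width: int
    height: int


def split_with_halo(width: int, height: int, tile: int, halo: int) -> Iterator[Tuple[Window, Window]]:
    """Yield (core, padded) windows that together cover a width x height raster.

    Every pixel belongs to exactly one core window (the pixels a task is
    responsible for writing). The padded window grows the core by `halo`
    pixels on each side, clipped to the raster, so neighboring tasks both
    read the shared boundary pixels they need.
    """
    for row in range(0, height, tile):
        for col in range(0, width, tile):
            core = Window(col, row, min(tile, width - col), min(tile, height - row))
            c0, r0 = max(0, col - halo), max(0, row - halo)
            c1 = min(width, col + core.width + halo)
            r1 = min(height, row + core.height + halo)
            yield core, Window(c0, r0, c1 - c0, r1 - r0)


def neighborhood_sum(grid: Grid, r: int, c: int) -> int:
    """Stand-in computation: sum of the 3x3 neighborhood, ignoring cells off the grid."""
    rows, cols = len(grid), len(grid[0])
    return sum(
        grid[i][j]
        for i in range(max(0, r - 1), min(rows, r + 2))
        for j in range(max(0, c - 1), min(cols, c + 2))
    )


def process_tile(raster: Grid, core: Window, padded: Window) -> Dict[Tuple[int, int], int]:
    """Hypothetical per-task work: read the padded window, compute, keep only core pixels."""
    sub = [line[padded.col:padded.col + padded.width]
           for line in raster[padded.row:padded.row + padded.height]]
    out = {}
    for r in range(core.row, core.row + core.height):
        for c in range(core.col, core.col + core.width):
            # Translate raster coordinates into the padded sub-grid's coordinates.
            out[(r, c)] = neighborhood_sum(sub, r - padded.row, c - padded.col)
    return out
```

With `halo=1` (the radius of the 3x3 neighborhood), merging the outputs of every tile reproduces the result of running `neighborhood_sum` over the whole raster. A wider neighborhood would need a halo at least as large as its radius. Only core pixels are emitted, so no pixel is written twice even though boundary pixels are read by more than one task.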

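The abstract does not describe how Monsanto laid out its HBase rows, so the following is a sketch of a common alternative, not the scheme from the talk: a Z-order (Morton) row key built from quantized longitude/latitude, followed by a feature id. The 31-bit grid, the 16-byte key layout, and all function names are hypothetical. The keys are packed big-endian and sorted before writing because HFiles must be written in row-key order, and HBase compares row keys as unsigned bytes. Python compares `bytes` objects the same way.

```python
import struct


def quantize(value: float, lo: float, hi: float, bits: int) -> int:
    """Map a value in [lo, hi] to an integer cell index in [0, 2**bits - 1]."""
    cells = 1 << bits
    idx = int((value - lo) / (hi - lo) * cells)
    return min(max(idx, 0), cells - 1)


def interleave(x: int, y: int, bits: int) -> int:
    """Morton (Z-order) code: bit i of x goes to bit 2i, bit i of y to bit 2i+1."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def row_key(lon: float, lat: float, feature_id: int, bits: int = 31) -> bytes:
    """Hypothetical key layout: 8-byte big-endian Z-order cell, then 8-byte feature id.

    Big-endian packing makes unsigned byte-wise order agree with numeric order
    of the Z-order code, so cells that are close on the curve sort close together.
    """
    x = quantize(lon, -180.0, 180.0, bits)
    y = quantize(lat, -90.0, 90.0, bits)
    return struct.pack(">QQ", interleave(x, y, bits), feature_id)


def sorted_keys_for_bulk_load(points: list) -> list:
    """Build keys for (lon, lat) points and sort them, as HFile writing requires."""
    return sorted(row_key(lon, lat, fid) for fid, (lon, lat) in enumerate(points))
```

A bounding-box query against such a table becomes a small set of row-key range scans over the Z-order curve. Whether that suits a given access pattern is exactly the kind of trade-off the talk says drove its own storage design.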

Presentation Transcript