Hadoop World 2011: Indexing the Earth - Large Scale Satellite Image Processing Using Hadoop - Oliver Guinan, Skybox Imaging

Skybox Imaging is using Hadoop as the engine of its satellite image processing system. Using CDH to store and process vast quantities of raw satellite image data enables Skybox to create a system that scales as they launch larger numbers of ever more complex satellites. Skybox has developed a CDH-based framework that allows image processing specialists to develop complex processing algorithms using native code and then publish those algorithms into the highly scalable Hadoop MapReduce interface. This session will provide an overview of how we use HDFS, HBase and MapReduce to process raw camera data into high-resolution satellite images.

Published in: Technology, Business


  1. Indexing the Earth. Hadoop World NYC 2011. Oliver Guinan, VP Ground Data Systems, ollie@skyboximaging.com
  2. Session Agenda ‣ Skybox ‣ The Big Data problem ‣ Indexing the planet at scale ‣ Questions
  3. Today’s data is old. [Image annotations: Bridge under construction (completed 2009); Convention center under construction (completed 2010); Stadium under construction (completed 2010). Image taken September 2008, more than three years old.]
  4. A problem of scale
  5. Satellite Imagery = Transparency... [Chart labels: -15% vegetation; 43% damage; 55,245 gallons of crude oil; 215 automobiles; 6,254 containers. Horizontal axis: months, January through December.]
  6. The problem of capacity
  7. Sensor network in space
  8. New approach: Many distributed, low-cost satellites
  9. Total Raw Data (compute), Sensor Network • Satellites produce ~1TB of raw data/day [Chart: Data Captured per Year (PB), axis 0 to 15, and Sensors in Network, axis 0 to 20, for Year1 through Year5; series: Single Satellite, Sensors in Network]
  10. Total Raw Data (storage), Sensor Network • Satellites produce ~1TB of raw data/day [Chart: Data Captured per Year (PB), axis 0 to 30, and Sensors in Network, axis 0 to 20, for Year1 through Year5; series: Single Satellite, Sensors in Network]
  11. Enter the elephant
  12. Hadoop from space - processing bits. Hadoop is bad at: ๏ Calling native C code or libraries at scale ๏ Scientific computing is immature in Java
  13. Hadoop from space - processing bits. Standard Java Hadoop ๏ Hadoop knows where data is stored ๏ Jobs efficiently scheduled close to data ๏ Throughput optimized
  14. Hadoop from space - processing bits. Hadoop Pipes & Streaming ๏ Hadoop schedules jobs without regard to the data required by the job ๏ Native code reads data across the network ๏ Drives up network costs and drives down throughput
  15. Hadoop from space - processing bits. BusBoy ✓ Hadoop manages data reads & writes ✓ Hadoop schedules jobs close to the data ✓ Jobs read data and hand off to native code for processing
  16. Architecture Overview [Diagram labels: Hadoop Task; BusBoy; C code; math.lib; gdal.lib; cv.lib; Inputs; Outputs; Logging; Progress; Hadoop JobTracker; HDFS; HBase; Hive]
  17. Framework Benefits - Deployment ✓ Low time to first byte ✓ Insight into job progress ✓ Diagnostics for large scale operations ✓ Logging
  18. Framework Benefits - Development ✓ Prototyping outside of Hadoop ✓ Rapid turnaround ✓ Testable interfaces
  19. Skybox providing Big Data ✓ Produce the most complete and timely data about the world ✓ Make data available to users to mine the raw data for information ✓ Turn Big Data into knowledge, at Earth scale [Diagram labels: Skybox; BusBoy]
  20. Color Images. Simulated from aerial platform using flight sensor
  21. HD Video
  22. Questions? Sample Data? bigdata@skyboximaging.com
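
The hand-off described on the BusBoy and Architecture Overview slides (the framework owns data reads and writes, and native code only processes the bytes it is given) can be illustrated with a small sketch. This is not Skybox's BusBoy code, which the slides do not show. The function name `run_native` and the stand-in command `INVERT_CMD` are hypothetical, a child Python process plays the role of the native C binary, and a real task would receive its input from HDFS through Hadoop rather than from a bytes literal.

```python
import subprocess
import sys


def run_native(command, payload, timeout=60):
    """Hand already-read bytes to an external process and return its stdout.

    The caller (standing in for the Hadoop task) owns all I/O; the external
    program only sees bytes on stdin and writes its result to stdout.
    """
    result = subprocess.run(
        command,
        input=payload,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


# Hypothetical stand-in for a native image-processing binary: a child
# Python process that inverts every byte it reads from stdin.
INVERT_CMD = [
    sys.executable,
    "-c",
    "import sys; d = sys.stdin.buffer.read(); "
    "sys.stdout.buffer.write(bytes(255 - b for b in d))",
]

if __name__ == "__main__":
    raw_tile = bytes([0, 10, 200, 255])  # pretend this block came from HDFS
    processed = run_native(INVERT_CMD, raw_tile)
    print(list(processed))
```

Because the processing step only consumes and produces bytes, the same executable can be run by hand on a local file, which is in the spirit of the "Prototyping outside of Hadoop" and "Testable interfaces" points on the development slide.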
