Indexing the Earth
Hadoop World NYC 2011
Oliver Guinan, VP Ground Data Systems
ollie@skyboximaging.com
Session Agenda

‣ Skybox
‣ The Big Data problem
‣ Indexing the planet at scale
‣ Questions

                                   HadoopWorld 2011
Today’s data is old

Image taken September 2008, more than three years old. It still shows:
‣ Bridge under construction (completed 2009)
‣ Convention center under construction (completed 2010)
‣ Stadium under construction (completed 2010)

A problem of scale

Satellite Imagery = Transparency...

Example measurements shown against a month-by-month timeline:
‣ -15% vegetation
‣ 43% damage
‣ 55,245 gallons of crude oil
‣ 215 automobiles
‣ 6,254 containers

The problem of capacity

Sensor network in space

New approach: Many distributed, low-cost satellites

Total Raw Data compute
• Satellites produce ~1 TB of raw data/day
[Chart "Sensor Network": data captured per year in PB (left axis, 0 to 15) and number of sensors in the network (right axis, 0 to 20), Year 1 to Year 5; series: Single Satellite, Sensors in Network]

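The ~1 TB/day figure scales linearly with the number of satellites. A minimal sketch of that arithmetic, assuming a constant per-satellite rate and decimal units (1 PB = 1000 TB); `raw_pb_per_year` and the fleet sizes printed are illustrative, not Skybox's actual plan:

```python
# Back-of-the-envelope raw data volume for a satellite fleet.
# Assumptions (illustrative, not from the slides): each satellite produces a
# constant ~1 TB/day, and units are decimal (1 PB = 1000 TB).

TB_PER_DAY_PER_SATELLITE = 1.0
DAYS_PER_YEAR = 365
TB_PER_PB = 1000


def raw_pb_per_year(num_satellites: int,
                    tb_per_day: float = TB_PER_DAY_PER_SATELLITE) -> float:
    """Raw data captured by the whole fleet in one year, in petabytes."""
    return num_satellites * tb_per_day * DAYS_PER_YEAR / TB_PER_PB


if __name__ == "__main__":
    for fleet in (1, 5, 10, 20):  # hypothetical fleet sizes
        print(f"{fleet:>2} satellites: {raw_pb_per_year(fleet):.2f} PB/year")
```

Under these assumptions one satellite captures about 0.37 PB of raw data per year, and a 20-satellite network about 7.3 PB per year.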
Total Raw Data storage
• Satellites produce ~1 TB of raw data/day
[Chart "Sensor Network": data captured per year in PB (left axis, 0 to 30) and number of sensors in the network (right axis, 0 to 20), Year 1 to Year 5; series: Single Satellite, Sensors in Network]

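Storage differs from compute in that retained data accumulates year over year and is usually stored more than once. A minimal sketch, assuming nothing is ever deleted and every byte is kept `replication` times (3 is the HDFS default for `dfs.replication`); `cumulative_storage_pb` and the ramp-up schedule are hypothetical:

```python
# Cumulative on-disk footprint when raw data is retained indefinitely.
# Assumptions (hypothetical): constant ~1 TB/day per satellite, decimal
# units, no deletion, and each byte stored `replication` times.
from typing import Iterable, List

DAYS_PER_YEAR = 365
TB_PER_PB = 1000


def cumulative_storage_pb(fleet_by_year: Iterable[int],
                          tb_per_day: float = 1.0,
                          replication: int = 3) -> List[float]:
    """Return the stored PB (including replicas) at the end of each year.

    fleet_by_year: number of active satellites during each year.
    """
    totals: List[float] = []
    stored_tb = 0.0
    for satellites in fleet_by_year:
        # Add this year's raw capture, multiplied by the replica count.
        stored_tb += satellites * tb_per_day * DAYS_PER_YEAR * replication
        totals.append(stored_tb / TB_PER_PB)
    return totals


if __name__ == "__main__":
    ramp = [2, 6, 12, 18, 20]  # hypothetical constellation ramp-up
    for year, pb in enumerate(cumulative_storage_pb(ramp), start=1):
        print(f"Year {year}: {pb:.1f} PB on disk")
```

Because the total is cumulative, the footprint keeps growing even after the fleet stops growing, and the replication factor multiplies it directly.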
Enter the elephant

Hadoop from space - processing bits

Hadoop is bad at:
๏ Calling native C code or libraries at scale
๏ Scientific computing, which is immature in Java

Hadoop from space - processing bits

Standard Java Hadoop
๏ Hadoop knows where data is stored
๏ Jobs are scheduled efficiently, close to the data
๏ Throughput is optimized

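Standard Hadoop gets its throughput by moving computation to the data rather than data to the computation. A minimal sketch of that idea as a greedy toy model, assuming each task reads exactly one block and each host has a fixed number of free slots; `place_tasks` is a hypothetical helper, not Hadoop's actual scheduler:

```python
# Toy model of data-local task placement: prefer a free host that already
# stores a replica of the task's input block; otherwise fall back to any
# free host, which means reading the block over the network.
from typing import Dict, Set, Tuple


def place_tasks(block_replicas: Dict[str, Set[str]],
                free_slots: Dict[str, int]) -> Tuple[Dict[str, str], int]:
    """Greedily assign one task per input block, preferring data-local hosts.

    block_replicas: block id -> hosts that store a replica of the block.
    free_slots: host -> number of free task slots on that host.
    Returns (block id -> chosen host, number of tasks that read remotely).
    """
    slots = dict(free_slots)  # copy so the caller's dict is not mutated
    assignment: Dict[str, str] = {}
    remote_reads = 0
    for block, hosts in sorted(block_replicas.items()):
        # Data-local candidates: hosts holding a replica with a free slot.
        local = [h for h in sorted(hosts) if slots.get(h, 0) > 0]
        if local:
            host = local[0]
        else:
            # No local slot is free: run anywhere and pull the block remotely.
            anywhere = [h for h in sorted(slots) if slots[h] > 0]
            if not anywhere:
                raise RuntimeError(f"no free slot for block {block}")
            host = anywhere[0]
            remote_reads += 1
        slots[host] -= 1
        assignment[block] = host
    return assignment, remote_reads


if __name__ == "__main__":
    replicas = {"b1": {"n1", "n2"}, "b2": {"n2", "n3"}, "b3": {"n1", "n3"}}
    plan, remote = place_tasks(replicas, {"n1": 1, "n2": 1, "n3": 1})
    print(plan, "remote reads:", remote)
```

A single greedy pass like this is not optimal in general; Hadoop itself also distinguishes node-local from rack-local placement when no node-local slot is available.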
Hadoop from space - processing bits
Hadoop Pipes & Streaming
๏ Hadoop schedules jobs without regard to the data required by the job
๏ Native code reads data across the network
๏ Drives up network costs and drives down throughput

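One way to feed large binary files to external code through Streaming is to give the job a text file of image paths and let each task open the images itself. In that layout Hadoop computes locality for the small text file, not for the images, so each image read may cross the network. A minimal sketch of such a mapper, assuming that layout; `run_mapper` and `read_local_file` are hypothetical names, and `read_local_file` stands in for an HDFS client read:

```python
# Hypothetical Hadoop Streaming mapper that receives image paths as records.
# Streaming contract: records arrive as lines on stdin; key<TAB>value lines
# go to stdout.
import sys
from typing import Callable, Iterable, Iterator, TextIO


def run_mapper(lines: Iterable[str],
               read_image: Callable[[str], bytes]) -> Iterator[str]:
    """Emit 'path<TAB>size' for every non-empty input line."""
    for line in lines:
        path = line.strip()
        if not path:
            continue
        # The task, not Hadoop, fetches the image: Hadoop never saw where
        # these bytes live when it scheduled the task.
        data = read_image(path)
        yield f"{path}\t{len(data)}"


def read_local_file(path: str) -> bytes:
    """Stand-in for reading the image from HDFS."""
    with open(path, "rb") as f:
        return f.read()


def main(stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    for record in run_mapper(stdin, read_local_file):
        stdout.write(record + "\n")
```

A real mapper script would end with `if __name__ == "__main__": main()` and be passed to the streaming jar with `-mapper`; the `len(data)` call marks where the image would be handed to native code.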
Hadoop from space - processing bits

BusBoy
✓ Hadoop manages data reads & writes
✓ Hadoop schedules jobs close to the data
✓ Jobs read data and hand off to native code for processing

Architecture Overview

Hadoop Task
  BusBoy
    C code (math.lib, gdal.lib, cv.lib)
  Inputs | Outputs | Logging | Progress
Hadoop JobTracker
HDFS | HBase | Hive

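The division of labour in the diagram can be sketched as a small harness: the framework side owns inputs, outputs, logging and progress, and the processing kernel only sees bytes. This is a minimal sketch under a hypothetical interface (`run_task`, a bytes-in/bytes-out kernel, and callbacks for output and progress); BusBoy's real API is not shown in the slides, and a Python callable stands in for the native C code:

```python
# Sketch of a BusBoy-style task harness (hypothetical interface).
import logging
from typing import Callable, List, Tuple

log = logging.getLogger("busboy_sketch")

Record = Tuple[str, bytes]         # (key, raw input bytes read by the framework)
Kernel = Callable[[bytes], bytes]  # stand-in for the native C processing call


def run_task(records: List[Record],
             kernel: Kernel,
             emit: Callable[[str, bytes], None],
             report_progress: Callable[[float], None]) -> int:
    """Hand each record's bytes to the kernel, emit the result, track progress.

    Returns the number of records whose processing failed.
    """
    failures = 0
    total = len(records)
    for done, (key, payload) in enumerate(records, start=1):
        try:
            emit(key, kernel(payload))
        except Exception:
            # Design choice for this sketch: log and continue, so one bad
            # record does not fail the whole task.
            failures += 1
            log.exception("kernel failed on record %s", key)
        report_progress(done / total)  # fraction of records handled so far
    return failures


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    results = {}

    def invert(pixels: bytes) -> bytes:
        # Toy kernel: invert 8-bit pixel values.
        return bytes(255 - p for p in pixels)

    failed = run_task([("tile-1", b"\x00\x10"), ("tile-2", b"\xff")],
                      invert,
                      emit=results.__setitem__,
                      report_progress=lambda f: log.info("progress %.0f%%", f * 100))
    log.info("outputs=%r failures=%d", results, failed)
```

In this split the kernel never touches HDFS, so the framework can keep reads local and report progress while the kernel stays a plain function.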
Framework Benefits - Deployment

✓ Low time to first byte
✓ Insight into job progress
✓ Diagnostics for large-scale operations
✓ Logging

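The slides do not say how BusBoy surfaces progress and diagnostics to Hadoop. For comparison, plain Hadoop Streaming lets a task update counters and its status message by writing specially formatted lines to stderr. A minimal sketch of helpers for that convention; the helper names are hypothetical:

```python
# Helpers for Hadoop Streaming's stderr reporting convention:
#   reporter:counter:<group>,<counter>,<amount>
#   reporter:status:<message>
import sys
from typing import TextIO


def counter_line(group: str, counter: str, amount: int = 1) -> str:
    """Format a Streaming counter update."""
    # The fields are comma-separated, so commas in names would be ambiguous.
    if "," in group or "," in counter:
        raise ValueError("group and counter names must not contain commas")
    return f"reporter:counter:{group},{counter},{amount}"


def status_line(message: str) -> str:
    """Format a Streaming task status update."""
    return f"reporter:status:{message}"


def report(line: str, stream: TextIO = sys.stderr) -> None:
    """Write one reporter line; Streaming reads these from the task's stderr."""
    print(line, file=stream, flush=True)
```

A mapper could call `report(counter_line("Imagery", "TilesProcessed"))` after each tile (the group and counter names are hypothetical) and `report(status_line(...))` periodically; the counters then appear in the job's web UI alongside Hadoop's built-in ones.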
Framework Benefits - Development

✓ Prototyping outside of Hadoop
✓ Rapid turnaround
✓ Testable interfaces

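When the processing kernel sits behind a plain bytes-in interface, it can be developed and unit-tested without a cluster. A minimal sketch, assuming a hypothetical kernel (`histogram_8bit`, a 256-bin histogram of 8-bit pixel values) and standard `unittest` tests that need no Hadoop installation:

```python
# A hypothetical processing kernel plus tests that run without Hadoop.
import unittest
from typing import List


def histogram_8bit(pixels: bytes) -> List[int]:
    """Count occurrences of each 8-bit value in a raw pixel buffer."""
    counts = [0] * 256
    for value in pixels:  # iterating over bytes yields ints in 0..255
        counts[value] += 1
    return counts


class HistogramTest(unittest.TestCase):
    def test_counts_each_value(self):
        counts = histogram_8bit(bytes([0, 0, 255, 7]))
        self.assertEqual(counts[0], 2)
        self.assertEqual(counts[7], 1)
        self.assertEqual(counts[255], 1)
        self.assertEqual(sum(counts), 4)

    def test_empty_buffer(self):
        self.assertEqual(sum(histogram_8bit(b"")), 0)
```

Saved as a module, these tests run with `python -m unittest <module>` on a laptop, and the same function could later be called from a task harness unchanged.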
Skybox providing Big Data

✓ Produce the most complete and timely data about the world
✓ Make the raw data available for users to mine for information
✓ Turn Big Data into knowledge, at Earth scale

Color Images

Simulated from an aerial platform using the flight sensor

HD Video

Questions?
Sample Data?
bigdata@skyboximaging.com