Indexing the Earth - Hadoop World 2011
1. Indexing the Earth
Hadoop World NYC 2011
Oliver Guinan -VP Ground Data Systems
ollie@skyboximaging.com
2. Session Agenda
‣ Skybox
‣ The Big Data problem
‣ Indexing the planet at scale
‣ Questions
HadoopWorld 2011
3. Today’s data is old
‣ Bridge under construction (completed 2009)
‣ Convention center under construction (completed 2010)
‣ Stadium under construction (completed 2010)
‣ Image taken September 2008: more than three years old
5. Satellite Imagery = Transparency...
‣ Vegetation: -15%
‣ Damage: 43%
‣ 55,245 gallons of crude oil
‣ 215 automobiles
‣ 6,254 containers
[Chart: monthly time series, x-axis labelled by month]
9. Total Raw Data compute
• Satellites produce ~1TB of raw data/day
[Chart: Data Captured per Year (PB), axis 0 to 15, and Sensors in Network, axis 0 to 20, for Year1 through Year5; series: Sensor Network, Single Satellite]
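As a rough check on the scale involved, the per-satellite figure above can be turned into a yearly volume. This is a back-of-the-envelope sketch only: it assumes decimal units (1 PB = 1000 TB), a constant ~1 TB/day per satellite, and constellation sizes that are hypothetical rather than taken from the chart.

```python
# Back-of-the-envelope raw data volume.
# Assumptions (not from the slides): decimal units, constant daily rate.
TB_PER_DAY_PER_SATELLITE = 1.0   # "~1TB of raw data/day" from the slide
DAYS_PER_YEAR = 365
TB_PER_PB = 1000.0


def raw_pb_per_year(satellites: int) -> float:
    """Raw data captured per year, in PB, for a constellation of this size."""
    return satellites * TB_PER_DAY_PER_SATELLITE * DAYS_PER_YEAR / TB_PER_PB


if __name__ == "__main__":
    for n in (1, 5, 10, 20):  # hypothetical constellation sizes
        print(f"{n:2d} satellites: {raw_pb_per_year(n):.3f} PB/year")
```

Under these assumptions a single satellite contributes roughly a third of a petabyte of raw data per year, before any derived products are counted.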
10. Total Raw Data storage
• Satellites produce ~1TB of raw data/day
[Chart: Data Captured per Year (PB), axis 0 to 30, and Sensors in Network, axis 0 to 20, for Year1 through Year5; series: Sensor Network, Single Satellite]
12. Hadoop from space - processing bits
Hadoop is bad at:
๏Calling native C code or libraries at scale
๏Scientific computing, which is immature in Java
13. Hadoop from space - processing bits
Standard Java Hadoop
๏Hadoop knows where data is stored
๏Jobs efficiently scheduled close to data
๏Throughput optimized
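The idea of scheduling work close to the data can be illustrated with a toy placement function. This is a simplified sketch, not Hadoop's actual scheduler: `pick_node`, the replica map, and the node names are all hypothetical.

```python
from typing import Dict, List, Optional, Set


def pick_node(block_id: str,
              replicas: Dict[str, Set[str]],
              free_nodes: List[str]) -> Optional[str]:
    """Prefer a free node that already stores the block.

    Falls back to the first free node (a remote read) when no free
    node holds a replica, and returns None when nothing is free.
    """
    holders = replicas.get(block_id, set())
    for node in free_nodes:
        if node in holders:
            return node  # data-local: no network read needed
    return free_nodes[0] if free_nodes else None


if __name__ == "__main__":
    replica_map = {"block-1": {"node-2", "node-3"}}  # hypothetical layout
    print(pick_node("block-1", replica_map, ["node-1", "node-2"]))
```

The fallback branch is where a job ends up reading across the network.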
14. Hadoop from space - processing bits
Hadoop Pipes & Streaming
๏Hadoop schedules jobs without regard to
the data required by the job
๏Native code reads data across the network
๏Drives up network costs and drives down
throughput
15. Hadoop from space - processing bits
BusBoy
✓Hadoop manages data reads & writes
✓Hadoop schedules jobs close to the data
✓Jobs read data and hand off to native code
for processing
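The hand-off pattern in the last bullet can be sketched as follows. The slides do not describe BusBoy's internals, so this is not its implementation: it is a minimal illustration in which the framework side has already read each record and passes the bytes to a separate process over stdin. `hand_off` and `STAND_IN_CMD` are hypothetical names, and a child Python process stands in for the native binary.

```python
import subprocess
import sys
from typing import Iterable, Iterator, List


def hand_off(records: Iterable[bytes], native_cmd: List[str]) -> Iterator[bytes]:
    """Pass each already-read record to a native process via stdin.

    The native code never opens the data itself; it only sees bytes
    that the framework side has read and handed over.
    """
    for record in records:
        result = subprocess.run(native_cmd, input=record,
                                stdout=subprocess.PIPE, check=True)
        yield result.stdout


# Stand-in for a native binary: a child process that upper-cases its input.
STAND_IN_CMD = [
    sys.executable, "-c",
    "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read().upper())",
]

if __name__ == "__main__":
    for out in hand_off([b"tile-a", b"tile-b"], STAND_IN_CMD):
        print(out)
```

Spawning one process per record keeps the sketch short; a real pipeline would more likely keep a long-lived native worker and stream records to it.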
17. Framework Benefits - Deployment
✓Low time to first byte
✓Insight into job progress
✓Diagnostics for large scale operations
✓Logging
18. Framework Benefits - Development
✓Prototyping outside of Hadoop
✓Rapid turnaround
✓Testable interfaces
19. Skybox providing Big Data
✓Produce the most complete and timely data
about the world
✓Make data available to users to mine the raw
data for information
✓Turn Big Data into knowledge, at Earth scale
Skybox
BusBoy
20. Color Images
Simulated from aerial platform using flight sensor