Use of Cloud Computing for scalable geospatial
data processing and access
Andrew Turner
CTO, FortiusOne
andrew@fortiusone.com

Partner: U.S. Federal Geographic Data Committee
What is GeoCommons?
                      A Brief History
Vulnerability Identification

[Map labels: Los Angeles, Denver, Chicago, Atlanta; Route 1, Route 2; Fiber Density; Electric Transmission Line Density]
[Map labels: Columbus Circle, Holland Tunnel, WTC]

Baseline connectivity of a fiber network provider in NYC. This
particular provider is a good proxy for the structure of the entire
island of Manhattan, since it holds about 80% of the rights-of-way on
the island and a large number of egress points off the island. The
higher the peak in the map, the more frequently the path is used as a
possible routing path.
Lastly, a scenario is run in which just 10,000 sq ft of damage is done
to the Holland Tunnel and the impact is calculated. The result is an
8.6% loss of network connectivity, 134 times the impact of the WTC
simulation. The dramatic impact can be seen in the image, both in the
loss itself and in the stress put on the GW Bridge route out of the city.
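
The scenario above reports impact as a percentage loss of network connectivity. As a minimal sketch of how such a figure could be computed, the example below assumes connectivity is measured as the fraction of node pairs that can still reach each other. The toy network, the function names, and the metric itself are illustrative assumptions; the slides do not describe the model used in the original analysis.

```python
from collections import defaultdict


def connected_pairs(nodes, edges):
    """Count unordered node pairs that are joined by some path."""
    adjacency = defaultdict(set)
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    pairs = 0
    for start in nodes:
        if start in seen:
            continue
        # Depth-first search to measure one connected component.
        stack, component_size = [start], 0
        seen.add(start)
        while stack:
            node = stack.pop()
            component_size += 1
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        pairs += component_size * (component_size - 1) // 2
    return pairs


def connectivity_loss(nodes, edges, damaged):
    """Fraction of previously connected pairs lost when `damaged` edges are removed."""
    before = connected_pairs(nodes, edges)
    remaining = [edge for edge in edges if edge not in damaged]
    after = connected_pairs(nodes, remaining)
    return (before - after) / before


# Hypothetical toy network: a ring on the island plus a single tunnel off it.
NODES = ["A", "B", "C", "D", "mainland"]
EDGES = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A"), ("A", "mainland")]

ring_cut = connectivity_loss(NODES, EDGES, {("B", "C")})
tunnel_cut = connectivity_loss(NODES, EDGES, {("A", "mainland")})
print(f"ring cut: {ring_cut:.1%}, tunnel cut: {tunnel_cut:.1%}")
```

In this toy case, cutting one ring segment loses nothing because traffic reroutes the other way, while cutting the only egress link loses 40% of connected pairs. That is the same qualitative effect the slide describes for a chokepoint such as a tunnel.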
GeoCommons: Version 1
• Find interesting data
• Map a relevant area
• Visualize to find meaning
• Layer, Modify, and Analyze
• Collaborate with others
• Publish and share results
Visualization
Analysis
Applying Lessons Learned
Modularize

[Diagram labels: Application Programming Interface; Finder; Maker; Core; RESTful Interfaces]
Relational Databases Don’t Scale Well
Datasets as Databases

[Diagram, built up across slides: Upload (KML, Shapefile, CSV (Excel), GeoRSS, Documents) -> Finder (Parse & Store) -> Download -> Maker (Analyze, Visualize), all on top of Core]
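
To make the "Datasets as Databases" idea concrete, here is a minimal sketch of a "Parse & Store" step, assuming each uploaded CSV is stored as its own small SQLite database. The function name, the `features` table name, and the choice to store every value as text are illustrative assumptions, not GeoCommons internals.

```python
import csv
import io
import sqlite3


def store_csv_dataset(csv_text, db_path=":memory:"):
    """Parse CSV text and store it as a one-table SQLite database."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, records = rows[0], rows[1:]

    conn = sqlite3.connect(db_path)
    # Quote column names so headers containing spaces remain valid identifiers.
    columns = ", ".join('"{}" TEXT'.format(name.replace('"', '""')) for name in header)
    conn.execute(f"CREATE TABLE features ({columns})")
    placeholders = ", ".join("?" for _ in header)
    conn.executemany(f"INSERT INTO features VALUES ({placeholders})", records)
    conn.commit()
    return conn


SAMPLE = "name,lat,lon\nChicago,41.88,-87.63\nDenver,39.74,-104.99\n"
conn = store_csv_dataset(SAMPLE)

# "Analyze" then becomes an ordinary query against the dataset's own database.
northernmost = conn.execute(
    "SELECT name FROM features ORDER BY CAST(lat AS REAL) DESC LIMIT 1"
).fetchone()[0]
print(northernmost)
```

Because each dataset lives in its own file, datasets can be copied, cached, or moved between machines independently, which is one plausible reading of why the deck contrasts this with a single relational database.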
Geospatial Catalog and Server
Delivery Mechanisms
Appliances

• Sun 4150
• RAID Array
Web Scaled Racks

• 3 Appliances
• Network File Storage
• Load Balancer
• Monitoring and Tunnels
• Production & Staging racks
• Racks in office for development
Limits in Scaling

• People
• Power
• Size
• Cost
• Time

Limits in Development

• Testing on “clean” machines
• Deployment testing of upgrades
• Controlled Environments
Leveraging the Cloud




http://www.flickr.com/photos/kky/704056791
Amazon Web Services
Management Consoles
Processing via MapReduce
Launching New Instances
Elastic Compute Cloud - EC2

• Virtual Servers
• Machine Images (AMI)
• On-Demand

[Diagram, built up across slides: CentOS AMI -> build -> bundle -> register -> instantiate]
Elastic Block Store - EBS

[Diagram, built up across slides: Create EBS (100 GB) -> attach -> snapshot -> S3 (Diff v1, Diff v2) -> Create & Attach]
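
The "Diff v1" and "Diff v2" labels in the EBS diagram suggest that each snapshot stores only what changed since the previous one. The sketch below illustrates that idea with block-level diffs. The block contents, function names, and data structures are illustrative assumptions and do not represent the actual EBS snapshot format.

```python
def take_snapshot(volume, previous_state):
    """Return only the blocks that differ from the previously snapshotted state."""
    return {
        index: block
        for index, block in enumerate(volume)
        if index >= len(previous_state) or previous_state[index] != block
    }


def restore(size, diffs):
    """Rebuild a volume by replaying diffs oldest-first onto empty blocks."""
    volume = [b""] * size
    for diff in diffs:
        for index, block in diff.items():
            volume[index] = block
    return volume


# Hypothetical four-block volume.
volume = [b"aaaa", b"bbbb", b"cccc", b"dddd"]
diff_v1 = take_snapshot(volume, previous_state=[])  # first snapshot holds every block
state_v1 = list(volume)

volume[2] = b"CCCC"  # one block changes after the first snapshot
diff_v2 = take_snapshot(volume, previous_state=state_v1)  # second holds one block

restored = restore(len(volume), [diff_v1, diff_v2])
print(len(diff_v1), len(diff_v2), restored == volume)
```

Replaying the diffs in order corresponds to the "Create & Attach" step: a new volume can be created from the stored snapshots and attached to another instance.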
Public Datasets
Additional Benefits

• Federation
• Tile generation
• Content-delivery System
• Simple Queue Service (SQS)

S3 Storage:
tiles/openstreetmap/9/74/97.png
tiles/openstreetmap/9/74/98.png
tiles/bluemarble/9/74/97.png
tiles/bluemarble/9/74/98.png
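
The tile paths above follow a layer/number/number/number pattern. As a hedged sketch of how such storage keys could be generated, the example below assumes the common Web Mercator "slippy map" scheme with keys ordered as layer/zoom/x/y. The slides show only the key pattern, so the ordering, the projection, and the function name are assumptions.

```python
import math


def tile_key(layer, lon, lat, zoom):
    """Build a storage key for the tile containing (lon, lat) at a zoom level."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the eastern or southern edge stay inside the grid.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return f"tiles/{layer}/{zoom}/{x}/{y}.png"


print(tile_key("openstreetmap", 0.0, 0.0, 1))
```

Because every tile maps to a predictable key, pre-generated tiles can be served straight from object storage or a CDN without touching the application servers.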
Cloud Architecture

• EC2 image of current system architecture
• EBS image of default database, stored to S3
• Current application release in S3
• Start an EC2 instance, attach data, attach code, start up

[Diagram labels, built up across slides: create instance; v1.4.3; attach data; Default Datasets; Snapshot; Backup, Backup, Backup; Cache; Downloads; S3]
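
The "start an instance, attach data" steps can be sketched as a short script. The example below assumes a boto3-style EC2 client (`run_instances`, `create_volume`, `attach_volume`, and waiters); the deck does not say which client library was used. The function name, the default instance type and device, and all identifiers are hypothetical. A recording stand-in client is used so the sketch runs without AWS credentials.

```python
def launch_release(ec2, image_id, snapshot_id, zone,
                   instance_type="m1.small", device="/dev/sdf"):
    """Start an instance and attach a data volume created from a snapshot."""
    reservation = ec2.run_instances(
        ImageId=image_id, InstanceType=instance_type, MinCount=1, MaxCount=1,
        Placement={"AvailabilityZone": zone},
    )
    instance_id = reservation["Instances"][0]["InstanceId"]
    # The instance must be running before a volume can be attached.
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])

    volume = ec2.create_volume(AvailabilityZone=zone, SnapshotId=snapshot_id)
    volume_id = volume["VolumeId"]
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    ec2.attach_volume(Device=device, InstanceId=instance_id, VolumeId=volume_id)
    return instance_id, volume_id


class FakeEC2:
    """Stand-in client (an assumption for this sketch) that records calls."""

    def __init__(self):
        self.calls = []

    def run_instances(self, **kwargs):
        self.calls.append("run_instances")
        return {"Instances": [{"InstanceId": "i-example"}]}

    def create_volume(self, **kwargs):
        self.calls.append("create_volume")
        return {"VolumeId": "vol-example"}

    def attach_volume(self, **kwargs):
        self.calls.append("attach_volume")
        return {}

    def get_waiter(self, name):
        self.calls.append(f"wait:{name}")
        return self

    def wait(self, **kwargs):
        return None


fake = FakeEC2()
ids = launch_release(fake, "ami-example", "snap-example", "us-east-1a")
print(ids, fake.calls)
```

The "attach code" step, fetching the current application release from S3 and starting it, is omitted here; it would typically run on the instance itself at boot.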
Scaling

• RESTful architecture
• Caching for speed, and CDN support
• Amazon Web Services
  • CloudWatch
  • Elastic Scaling
  • Load Balancer
Private Instances
First Users: Meedan, Media
Repeatable
Data Federation


                  community
Geospatial Federated Search
Geocoding
Geocoding - Scale as Required

[Diagram labels: Upload CSV; Geocode API; Cache Results; Geocoding Engine; TIGER/Line SQLite]
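
As a minimal sketch of the geocoding flow in the diagram, the example below puts a result cache in front of a lookup engine backed by SQLite. The `addresses` table, its single sample row, and the class and function names are illustrative assumptions; a real engine would load TIGER/Line data and handle address parsing and interpolation.

```python
import sqlite3


def build_engine():
    """Create a stand-in address table; a real engine would load TIGER/Line data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE addresses (address TEXT PRIMARY KEY, lat REAL, lon REAL)")
    conn.execute("INSERT INTO addresses VALUES ('1 MAIN ST', 38.9, -77.0)")
    return conn


class CachedGeocoder:
    """Geocoder that consults the engine only for addresses it has not seen."""

    def __init__(self, engine):
        self.engine = engine
        self.cache = {}
        self.engine_lookups = 0

    def geocode(self, address):
        # Normalize case and whitespace so trivially different inputs share a cache entry.
        key = " ".join(address.upper().split())
        if key not in self.cache:
            self.engine_lookups += 1
            row = self.engine.execute(
                "SELECT lat, lon FROM addresses WHERE address = ?", (key,)
            ).fetchone()
            self.cache[key] = row  # misses (None) are cached too, so they are not retried
        return self.cache[key]


geocoder = CachedGeocoder(build_engine())
uploaded_rows = ["1 Main St", "1 main st ", "99 Nowhere Rd"]
results = [geocoder.geocode(address) for address in uploaded_rows]
print(results, geocoder.engine_lookups)
```

Since the cache sits in front of the engine, additional engine instances can be added only when the volume of uncached lookups requires it, which fits the "scale as required" framing.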
Best Practices Applied to the Government

• Built using open, established tools
• Full choice - Linux, Windows
• Full Control
• Repeatable processes
• Continual backup
• Scaling dynamic and large datasets
• Synchronous and Asynchronous analysis
Level of Maturity

• Widely adopted
• Broad support and ecosystem
• Full stack support
Perceived Impediments to Adoption

• Single Vendor (open-source alternatives arising)
• Maintenance and Location
• Data Security
Thank you




                  Andrew Turner
          andrew@fortiusone.com
http://highearthorbit.com/presentations

Geospatial Analysis in the Cloud
