SlideShare a Scribd company logo
1 of 22
Experiences on Processing Spatial
      Data with MapReduce

   Ariel Cary, Zhengguo Sun, Vagelis Hristidis, Naphtali Rishe
                  Florida International University
          School of Computing and Information Sciences
                11200 SW 8th St, Miami, FL 33199
           {acary001,sunz,vagelis,rishen}@cis.fiu.edu

         Sponsored by: NSF Cluster Exploratory (CluE)
Agenda
    1. Introduction
    2. Solving Spatial Problems in MapReduce
          β€’      R-Tree Index Construction
          β€’      Aerial Image Processing
    3. Experimental Results
    4. Related Work
    5. Conclusion




2   Florida International University
Introduction
    ο‚— Spatial databases mainly store:
        ο‚— Raster data (satellite/aerial digital images), and
        ο‚— Vector data (points, lines, polygons).
    ο‚— Traditional sequential computing models may
      take excessive time to process large and complex
      spatial repositories.
    ο‚— Emerging parallel computing models, such as
      MapReduce, provide a potential for scaling data
      processing in spatial applications.



3   Florida International University
Introduction (cont.)
    ο‚— MapReduce is an emerging massively parallel
       computing model (Google) composed of two
       functions:
        ο‚— Map: takes a key/value pair, executes some
          computation, and emits a set of intermediate
          key/value pairs as output.
        ο‚— Reduce: merges its intermediate values, executes
          some computation on them, and emits the final
          output.
    ο‚— In this work, we present our experiences in
       applying the MapReduce model to:
        ο‚— Bulk-construct R-Trees (vector) and
4   Florida International University
        ο‚— Compute aerial image quality (raster)
Introduction (cont.)
    ο‚— Apache's Hadoop
                                                                                                    Proxy
    ο‚— Linux operating                                       Cloud

        system
    ο‚—   XEN hypervisor
    ο‚—   Hadoop Distributed                          1
                                                                          Internet
        File System (HDFS)             bin/hadoop dfs put




    ο‚—   SOCKS proxy
        server                                                              2

                                                                    bin/hadoop jar

    ο‚—   480 computers                                                                         3

                                                                                     bin/hadoop dfs get
        (nodes), each half
        terabytes storage
5   Florida International University
2. Solving Spatial Problems in
          MapReduce
  β€’   R-Tree Index Construction
  β€’   Aerial Image Processing
MapReduce (MR) R-Tree
    Construction
    ο‚— R-Tree Bulk-Construction
        ο‚— Every object o in database D has two attributes:
          ο‚— o.id - the object’s unique identifier.
          ο‚— o.P - the object’s location in some spatial domain.
        ο‚— The goal is to build an R-Tree index on D.
    ο‚— MapReduce Algorithm
          1. Database partitioning function computation
             (MR).
          2. A small R-Tree is created for each partition (MR).
          3. The small R-Trees are merged into the final R-
             Tree.

7   Florida International University
Phase 1 – Partitioning Function

    – Goal: compute f to assign objects of D into one of
       RMap and Reduce inputs/outputss.t.:
           possible partitions in computing partitioning function f.
         –Function Input: (Key, Value) Output: (Key,are
             R (ideally) equally-sized partitions Value)
             generated (minimal variance(C, U(o.P))
              Map                   (o.id, o.P)       is acceptable).
             Reduce         (C, list(ui, i=1, .., L))    SΒ΄
         – Objects close in the spatial domain are placed
    Where:   within the same partition.
          β€’ o is an solution:
    – Proposed spatial object in the database.
          β€’ C which is a constant that helps in sending Mappers'
         – Use Z-order space-filling curve to map spatial
              outputs to a single Reducer.
          β€’ coordinate samples (3%) into an value.
              U is a space-filling curve, e.g. Z-order sorted
          β€’ sequence. containing R-1 splitting points.
              S' is an array

8
         – Collect splitting points that partition the
    Florida International University
             sequence in R ranges.
Phase 2 - R-Tree Construction in MR
    ο‚— Mappers compute f() values for objects.
    ο‚— Reducers compute an R-Tree for each group of
       objects with identical f() value
                         MapReduce functions in constructing R-Trees.
            Function        Input: (Key, Value)              Output: (Key, Value)
              Map                 (o.id, o.P)                     (f(o.P), o)
             Reduce         (f(o.P), list(oi, i=1, .., A))         tree.root


    Where:
      β€’ o is an spatial object in the database.
      β€’ f is the partitioning function computed in phase
        1.
      β€’ Tree.root is the R-Tree root node.
9   Florida International University
Phase 3 - R-Tree Consolidation
     ο‚— sequential process




10   Florida International University
Image Processing in MapReduce
     ο‚— Aerial Image Quality Computation
         ο‚— Let d be a orthorectified aerial photography (DOQQ)
           file and t be a tile inside d, d.name is d’s file name
           and t.q is the quality information of tile t.
         ο‚— The goal is to compute a quality bitmap for d.
     ο‚— MapReduce Algorithm
         ο‚— A customized InputFormatter partitions each DOQQ
           file d into several splits containing multiple tiles.
         ο‚— The Mappers compute the quality bitmap for each
           tile inside a split.
         ο‚— The Reducers merge all the bitmaps that belongs to
           a file d and write them to an output file.

11   Florida International University
Image Processing in MapReduce
     ο‚— MapReduce Algorithm

                           Input and output of map and reduce functions
       Function          Input: (Key, Value)           Output: (Key, Value)
         Map                (d.name+t.id, t)            (d.name, (t.id,t.q))
        Reduce           (d.name, list(t.id,t.q))       Quality-bitmap of d


     Where:
       β€’ d is a DOQQ file.
       β€’ t is a tile in d.
       β€’ t.q is the quality bitmap of t.

12   Florida International University
3. Experimental Results
Experimental Results: Setting
     ο‚— Data Set                                   Table 4. Spatial data sets used in experiments*.

           Problem      Data                    Data size
                                   Objects                                        Description
                         set                     (GB)
            R-Tree      FLD         11.4 M           5       Points of properties in the state of Florida.
                                                             Yellow pages directory of points of businesses mostly
                         YPD         37 M           5.3
                                                             in the United States but also in other countries.
            Image      Miami-                                Aerial imagery of Miami-Dade county, FL (3-inch
                                    482 files       52
            Quality    Dade                                  resolution)

            * Data sets supplied by the High Performance Database Research Center at Florida International University

     ο‚— Environment
       ο‚— The cluster was provided by the Google and IBM
         Academic Cluster Computing Initiative.
       ο‚— The cluster contains around 480 computers running
         Hadoop - open source MapReduce.
14     Florida International University
Experimental Results: R-Tree
     ο‚— R-Tree Construction Performance Metrics

                         30.00                                                  60.00

                         25.00                                                  50.00
                                                           MR2                                                    MR2
                         20.00                                                  40.00




                                                                   Time (min)
            Time (min)




                                                           MR1                                                    MR1
                         15.00                                                  30.00

                         10.00                                                  20.00

                          5.00                                                  10.00

                          0.00                                                   0.00
                                 2    4     8   16    32    64                          4     8      16      32    64
                                          Reducers                                                Reducers
                                  (a) FLD data set                                          (b) YPD data set
                          MapReduce job completion times for various number of reducers in phase-2 (MR2).




15   Florida International University
Experimental Results: R-Tree
     ο‚— MapReduce R-Trees vs. Single Process (SP)
                                        Objects per Reducer       Consolidated R-Tree

                Data set       R        Average       Stdev        Nodes       Height
                   FLD         2          5,690,419    12,183        172,776     4
                               4          2,845,210     6,347        172,624     4
                               8          1,422,605     2,235        173,141     4
                              16            711,379     2,533        162,518     4
                              32            355,651     2,379        173,273     3
                              64            177,826     1,816        173,445     3
                              SP         11,382,185           0      172,681     4
                   YPD         4          9,257,188    22,137        568,854     4
                               8          4,628,594     9,413        568,716     4
                              16          2,314,297     7,634        568,232     4
                              32          1,157,149     6,043        567,550     4
                              64            578,574     2,982        566,199     4
                              SP         37,034,126           0      587,353     5

16   Florida International University
Experimental Results: Imagery
     ο‚— Tile Quality Computation


                       25                                                         4
                                                    Reduce                       3.5
                                                                                       Reduce
                       20                                                              Map
                                                    Map                           3
         Time (min)




                                                                    Time (min)
                       15                                                        2.5
                                                                                  2
                       10
                                                                                 1.5

                        5                                                         1
                                                                                 0.5
                        0                                                         0
                             4   8   16 32 64 128 256 512                              2       4      8        16
                                       Reducers                                            Size of data (GB)

                      (a) Fixed data size, variable Reducers  (b) Variable data size, fixed Reducers
                               Fig. 9. MapReduce job completion time for tile quality computation


17   Florida International University
4. Related Work
Related Work
     ο‚— Previous works on R-Tree parallel construction faced
        intrinsic distributed computing problems: load
        balancing, process scheduling, fault tolerance, etc.
     ο‚— Schnitzer and Leutenegger [16] proposed a Master-
        Client R-Tree, where the data set is first partitioned
        using Hilbert packing sort algorithm, then the
        partitions are declustered into a number of
        processors, where individual trees are built. At the
        end, a master process combines the individual trees
        into the final R-Tree.
     ο‚— Papadopoulos and Manolopoulos [17] proposed a
        methodology for sampling-based space partitionining,
        load balancing, and partition assignment into a set of
     Florida International University
19
        processors in parallely building R-Trees.
5. Conclusion
Conclusion
     ο‚— We used the MapReduce model to solve two
        spatial problems on a Google&IBM cluster:
         ο‚— (a) Bulk-construction of R-Trees and
         ο‚— (b) Aerial image quality computation
     ο‚— MapReduce can dramatically improve task
       completion times. Our experiments show close to
       linear scalability.
     ο‚— Our experience in this work shows MapReduce
       has the potential to be applicable to more
       complex spatial problems.

21   Florida International University
References
     [1] Antonin Guttman: R-Trees: A Dynamic Index Structure for Spatial Searching.
         SIGMOD 1984:47-57.
     [2] NSF Cluster Exploratory Program:
         http://www.nsf.gov/pubs/2008/nsf08560/nsf08560.htm
     [3] Google&IBM Academic Cluster Computing Initiative:
     http://www.google.com/intl/en/press/pressrel/20071008_ibm_univ.html
     [4] Apache Hadoop project: http://hadoop.apache.org
     [6] Jeffrey Dean, Sanjay Ghemawat: MapReduce: Simplified data processing on
         large clusters. In Proceedings of the 6th Conference on Symposium on
         Opearting Systems Design & Implementation, USENIX Association, Volume 6,
         pp. 10-10, December 2004.
     [12]        High Performance Database Research Center (HPDRC), Research
         Division of the Florida International University, School of Computing and
         Information Sciences, University Park, Telephone: (305) 348-1706, FIU ECS-243,
         Miami, FL 33199.
     [16]        Schnitzer B., Leutenegger S.T.: Master-client R-trees: a new parallel R-
         tree architecture, In Proceedings of the 11th International Conference on
         Scientific and Statistical Database Management, pp. 68-77, August 1999.
     [17]        Apostolos Papadopoulos, Yannis Manolopoulos: Parallel bulk-loading of
         spatial data, Parallel Computing, Volume 29, Issue 10, pp. 1419 - 1444, October
         2003.
22   Florida International University

More Related Content

What's hot

Hadoop with Lustre WhitePaper
Hadoop with Lustre WhitePaperHadoop with Lustre WhitePaper
Hadoop with Lustre WhitePaperDavid Luan
Β 
Integrate Hive and R
Integrate Hive and RIntegrate Hive and R
Integrate Hive and RJunHo Cho
Β 
Algorithms on Hadoop at Last.fm
Algorithms on Hadoop at Last.fmAlgorithms on Hadoop at Last.fm
Algorithms on Hadoop at Last.fmMark Levy
Β 
Simulation Informatics
Simulation InformaticsSimulation Informatics
Simulation InformaticsDavid Gleich
Β 
Cerba ppt gi2011-harmonization-of-spatial-planning-data_final
Cerba ppt gi2011-harmonization-of-spatial-planning-data_finalCerba ppt gi2011-harmonization-of-spatial-planning-data_final
Cerba ppt gi2011-harmonization-of-spatial-planning-data_finalIGN Vorstand
Β 
Enabling R on Hadoop
Enabling R on HadoopEnabling R on Hadoop
Enabling R on HadoopDataWorks Summit
Β 
Implementação do Hash Coalha/Coalesced
Implementação do Hash Coalha/CoalescedImplementação do Hash Coalha/Coalesced
Implementação do Hash Coalha/CoalescedCriatividadeZeroDocs
Β 
Big Data Analysis With RHadoop
Big Data Analysis With RHadoopBig Data Analysis With RHadoop
Big Data Analysis With RHadoopDavid Chiu
Β 
Hadoop World 2011: The Powerful Marriage of R and Hadoop - David Champagne, R...
Hadoop World 2011: The Powerful Marriage of R and Hadoop - David Champagne, R...Hadoop World 2011: The Powerful Marriage of R and Hadoop - David Champagne, R...
Hadoop World 2011: The Powerful Marriage of R and Hadoop - David Champagne, R...Cloudera, Inc.
Β 
Hw09 Production Deep Dive With High Availability
Hw09   Production Deep Dive With High AvailabilityHw09   Production Deep Dive With High Availability
Hw09 Production Deep Dive With High AvailabilityCloudera, Inc.
Β 
Mapreduce by examples
Mapreduce by examplesMapreduce by examples
Mapreduce by examplesAndrea Iacono
Β 
D49996 gc11 intro
D49996 gc11 introD49996 gc11 intro
D49996 gc11 introHkn Crk
Β 
Map reduce and Hadoop on windows
Map reduce and Hadoop on windowsMap reduce and Hadoop on windows
Map reduce and Hadoop on windowsMuhammad Shahid
Β 

What's hot (15)

Hadoop with Lustre WhitePaper
Hadoop with Lustre WhitePaperHadoop with Lustre WhitePaper
Hadoop with Lustre WhitePaper
Β 
Integrate Hive and R
Integrate Hive and RIntegrate Hive and R
Integrate Hive and R
Β 
Algorithms on Hadoop at Last.fm
Algorithms on Hadoop at Last.fmAlgorithms on Hadoop at Last.fm
Algorithms on Hadoop at Last.fm
Β 
Simulation Informatics
Simulation InformaticsSimulation Informatics
Simulation Informatics
Β 
Cerba ppt gi2011-harmonization-of-spatial-planning-data_final
Cerba ppt gi2011-harmonization-of-spatial-planning-data_finalCerba ppt gi2011-harmonization-of-spatial-planning-data_final
Cerba ppt gi2011-harmonization-of-spatial-planning-data_final
Β 
Enabling R on Hadoop
Enabling R on HadoopEnabling R on Hadoop
Enabling R on Hadoop
Β 
Implementação do Hash Coalha/Coalesced
Implementação do Hash Coalha/CoalescedImplementação do Hash Coalha/Coalesced
Implementação do Hash Coalha/Coalesced
Β 
Ads
AdsAds
Ads
Β 
Big Data Analysis With RHadoop
Big Data Analysis With RHadoopBig Data Analysis With RHadoop
Big Data Analysis With RHadoop
Β 
Hadoop World 2011: The Powerful Marriage of R and Hadoop - David Champagne, R...
Hadoop World 2011: The Powerful Marriage of R and Hadoop - David Champagne, R...Hadoop World 2011: The Powerful Marriage of R and Hadoop - David Champagne, R...
Hadoop World 2011: The Powerful Marriage of R and Hadoop - David Champagne, R...
Β 
Hadoop exercise
Hadoop exerciseHadoop exercise
Hadoop exercise
Β 
Hw09 Production Deep Dive With High Availability
Hw09   Production Deep Dive With High AvailabilityHw09   Production Deep Dive With High Availability
Hw09 Production Deep Dive With High Availability
Β 
Mapreduce by examples
Mapreduce by examplesMapreduce by examples
Mapreduce by examples
Β 
D49996 gc11 intro
D49996 gc11 introD49996 gc11 intro
D49996 gc11 intro
Β 
Map reduce and Hadoop on windows
Map reduce and Hadoop on windowsMap reduce and Hadoop on windows
Map reduce and Hadoop on windows
Β 

Similar to Experiences on Processing Spatial Data with MapReduce ssdbm09

MAP REDUCE IN DATA SCIENCE.pptx
MAP REDUCE IN DATA SCIENCE.pptxMAP REDUCE IN DATA SCIENCE.pptx
MAP REDUCE IN DATA SCIENCE.pptxHARIKRISHNANU13
Β 
Mapreduce advanced
Mapreduce advancedMapreduce advanced
Mapreduce advancedChirag Ahuja
Β 
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and Cassandra
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and CassandraBrief introduction on Hadoop,Dremel, Pig, FlumeJava and Cassandra
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and CassandraSomnath Mazumdar
Β 
Hadoop pig
Hadoop pigHadoop pig
Hadoop pigSean Murphy
Β 
Hadoop 101 for bioinformaticians
Hadoop 101 for bioinformaticiansHadoop 101 for bioinformaticians
Hadoop 101 for bioinformaticiansattilacsordas
Β 
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop ClustersHDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop ClustersXiao Qin
Β 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to HadoopApache Apex
Β 
Hadoop trainting in hyderabad@kelly technologies
Hadoop trainting in hyderabad@kelly technologiesHadoop trainting in hyderabad@kelly technologies
Hadoop trainting in hyderabad@kelly technologiesKelly Technologies
Β 
Hadoop, MapReduce and R = RHadoop
Hadoop, MapReduce and R = RHadoopHadoop, MapReduce and R = RHadoop
Hadoop, MapReduce and R = RHadoopVictoria LΓ³pez
Β 
Hadoop trainting-in-hyderabad@kelly technologies
Hadoop trainting-in-hyderabad@kelly technologiesHadoop trainting-in-hyderabad@kelly technologies
Hadoop trainting-in-hyderabad@kelly technologiesKelly Technologies
Β 
Zaharia spark-scala-days-2012
Zaharia spark-scala-days-2012Zaharia spark-scala-days-2012
Zaharia spark-scala-days-2012Skills Matter Talks
Β 
Sector CloudSlam 09
Sector CloudSlam 09Sector CloudSlam 09
Sector CloudSlam 09Robert Grossman
Β 
Knowledg graphs yosi mass
Knowledg graphs yosi massKnowledg graphs yosi mass
Knowledg graphs yosi massdiannepatricia
Β 
Hadoop & MapReduce
Hadoop & MapReduceHadoop & MapReduce
Hadoop & MapReduceNewvewm
Β 

Similar to Experiences on Processing Spatial Data with MapReduce ssdbm09 (20)

MAP REDUCE IN DATA SCIENCE.pptx
MAP REDUCE IN DATA SCIENCE.pptxMAP REDUCE IN DATA SCIENCE.pptx
MAP REDUCE IN DATA SCIENCE.pptx
Β 
Map Reduce
Map ReduceMap Reduce
Map Reduce
Β 
Mapreduce advanced
Mapreduce advancedMapreduce advanced
Mapreduce advanced
Β 
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and Cassandra
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and CassandraBrief introduction on Hadoop,Dremel, Pig, FlumeJava and Cassandra
Brief introduction on Hadoop,Dremel, Pig, FlumeJava and Cassandra
Β 
Democratizing Big Semantic Data management
Democratizing Big Semantic Data managementDemocratizing Big Semantic Data management
Democratizing Big Semantic Data management
Β 
Hadoop pig
Hadoop pigHadoop pig
Hadoop pig
Β 
Hadoop
HadoopHadoop
Hadoop
Β 
Yarn spark next_gen_hadoop_8_jan_2014
Yarn spark next_gen_hadoop_8_jan_2014Yarn spark next_gen_hadoop_8_jan_2014
Yarn spark next_gen_hadoop_8_jan_2014
Β 
Hadoop 101 for bioinformaticians
Hadoop 101 for bioinformaticiansHadoop 101 for bioinformaticians
Hadoop 101 for bioinformaticians
Β 
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop ClustersHDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
HDFS-HC: A Data Placement Module for Heterogeneous Hadoop Clusters
Β 
Hadoop-Introduction
Hadoop-IntroductionHadoop-Introduction
Hadoop-Introduction
Β 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
Β 
Hadoop trainting in hyderabad@kelly technologies
Hadoop trainting in hyderabad@kelly technologiesHadoop trainting in hyderabad@kelly technologies
Hadoop trainting in hyderabad@kelly technologies
Β 
Hadoop, MapReduce and R = RHadoop
Hadoop, MapReduce and R = RHadoopHadoop, MapReduce and R = RHadoop
Hadoop, MapReduce and R = RHadoop
Β 
Hadoop trainting-in-hyderabad@kelly technologies
Hadoop trainting-in-hyderabad@kelly technologiesHadoop trainting-in-hyderabad@kelly technologies
Hadoop trainting-in-hyderabad@kelly technologies
Β 
Apache Spark with Scala
Apache Spark with ScalaApache Spark with Scala
Apache Spark with Scala
Β 
Zaharia spark-scala-days-2012
Zaharia spark-scala-days-2012Zaharia spark-scala-days-2012
Zaharia spark-scala-days-2012
Β 
Sector CloudSlam 09
Sector CloudSlam 09Sector CloudSlam 09
Sector CloudSlam 09
Β 
Knowledg graphs yosi mass
Knowledg graphs yosi massKnowledg graphs yosi mass
Knowledg graphs yosi mass
Β 
Hadoop & MapReduce
Hadoop & MapReduceHadoop & MapReduce
Hadoop & MapReduce
Β 

Recently uploaded

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
Β 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
Β 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
Β 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfEnterprise Knowledge
Β 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...gurkirankumar98700
Β 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
Β 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
Β 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
Β 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
Β 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
Β 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
Β 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
Β 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
Β 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
Β 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
Β 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
Β 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
Β 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
Β 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
Β 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
Β 

Recently uploaded (20)

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Β 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
Β 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
Β 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
Β 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Β 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
Β 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
Β 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Β 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Β 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
Β 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
Β 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
Β 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
Β 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Β 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
Β 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
Β 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
Β 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
Β 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
Β 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
Β 

Experiences on Processing Spatial Data with MapReduce ssdbm09

  • 1. Experiences on Processing Spatial Data with MapReduce Ariel Cary, Zhengguo Sun, Vagelis Hristidis, Naphtali Rishe Florida International University School of Computing and Information Sciences 11200 SW 8th St, Miami, FL 33199 {acary001,sunz,vagelis,rishen}@cis.fiu.edu Sponsored by: NSF Cluster Exploratory (CluE)
  • 2. Agenda 1. Introduction 2. Solving Spatial Problems in MapReduce β€’ R-Tree Index Construction β€’ Aerial Image Processing 3. Experimental Results 4. Related Work 5. Conclusion 2 Florida International University
  • 3. Introduction ο‚— Spatial databases mainly store: ο‚— Raster data (satellite/aerial digital images), and ο‚— Vector data (points, lines, polygons). ο‚— Traditional sequential computing models may take excessive time to process large and complex spatial repositories. ο‚— Emerging parallel computing models, such as MapReduce, provide a potential for scaling data processing in spatial applications. 3 Florida International University
  • 4. Introduction (cont.) ο‚— MapReduce is an emerging massively parallel computing model (Google) composed of two functions: ο‚— Map: takes a key/value pair, executes some computation, and emits a set of intermediate key/value pairs as output. ο‚— Reduce: merges its intermediate values, executes some computation on them, and emits the final output. ο‚— In this work, we present our experiences in applying the MapReduce model to: ο‚— Bulk-construct R-Trees (vector) and 4 Florida International University ο‚— Compute aerial image quality (raster)
  • 5. Introduction (cont.) ο‚— Apache's Hadoop Proxy ο‚— Linux operating Cloud system ο‚— XEN hypervisor ο‚— Hadoop Distributed 1 Internet File System (HDFS) bin/hadoop dfs put ο‚— SOCKS proxy server 2 bin/hadoop jar ο‚— 480 computers 3 bin/hadoop dfs get (nodes), each half terabytes storage 5 Florida International University
  • 6. 2. Solving Spatial Problems in MapReduce β€’ R-Tree Index Construction β€’ Aerial Image Processing
  • 7. MapReduce (MR) R-Tree Construction ο‚— R-Tree Bulk-Construction ο‚— Every object o in database D has two attributes: ο‚— o.id - the object’s unique identifier. ο‚— o.P - the object’s location in some spatial domain. ο‚— The goal is to build an R-Tree index on D. ο‚— MapReduce Algorithm 1. Database partitioning function computation (MR). 2. A small R-Tree is created for each partition (MR). 3. The small R-Trees are merged into the final R- Tree. 7 Florida International University
  • 8. Phase 1 – Partitioning Function – Goal: compute f to assign objects of D into one of RMap and Reduce inputs/outputss.t.: possible partitions in computing partitioning function f. –Function Input: (Key, Value) Output: (Key,are R (ideally) equally-sized partitions Value) generated (minimal variance(C, U(o.P)) Map (o.id, o.P) is acceptable). Reduce (C, list(ui, i=1, .., L)) SΒ΄ – Objects close in the spatial domain are placed Where: within the same partition. β€’ o is an solution: – Proposed spatial object in the database. β€’ C which is a constant that helps in sending Mappers' – Use Z-order space-filling curve to map spatial outputs to a single Reducer. β€’ coordinate samples (3%) into an value. U is a space-filling curve, e.g. Z-order sorted β€’ sequence. containing R-1 splitting points. S' is an array 8 – Collect splitting points that partition the Florida International University sequence in R ranges.
  • 9. Phase 2 - R-Tree Construction in MR ο‚— Mappers compute f() values for objects. ο‚— Reducers compute an R-Tree for each group of objects with identical f() value MapReduce functions in constructing R-Trees. Function Input: (Key, Value) Output: (Key, Value) Map (o.id, o.P) (f(o.P), o) Reduce (f(o.P), list(oi, i=1, .., A)) tree.root Where: β€’ o is an spatial object in the database. β€’ f is the partitioning function computed in phase 1. β€’ Tree.root is the R-Tree root node. 9 Florida International University
  • 10. Phase 3 - R-Tree Consolidation ο‚— sequential process 10 Florida International University
  • 11. Image Processing in MapReduce ο‚— Aerial Image Quality Computation ο‚— Let d be a orthorectified aerial photography (DOQQ) file and t be a tile inside d, d.name is d’s file name and t.q is the quality information of tile t. ο‚— The goal is to compute a quality bitmap for d. ο‚— MapReduce Algorithm ο‚— A customized InputFormatter partitions each DOQQ file d into several splits containing multiple tiles. ο‚— The Mappers compute the quality bitmap for each tile inside a split. ο‚— The Reducers merge all the bitmaps that belongs to a file d and write them to an output file. 11 Florida International University
  • 12. Image Processing in MapReduce ο‚— MapReduce Algorithm Input and output of map and reduce functions Function Input: (Key, Value) Output: (Key, Value) Map (d.name+t.id, t) (d.name, (t.id,t.q)) Reduce (d.name, list(t.id,t.q)) Quality-bitmap of d Where: β€’ d is a DOQQ file. β€’ t is a tile in d. β€’ t.q is the quality bitmap of t. 12 Florida International University
  • 14. Experimental Results: Setting ο‚— Data Set Table 4. Spatial data sets used in experiments*. Problem Data Data size Objects Description set (GB) R-Tree FLD 11.4 M 5 Points of properties in the state of Florida. Yellow pages directory of points of businesses mostly YPD 37 M 5.3 in the United States but also in other countries. Image Miami- Aerial imagery of Miami-Dade county, FL (3-inch 482 files 52 Quality Dade resolution) * Data sets supplied by the High Performance Database Research Center at Florida International University ο‚— Environment ο‚— The cluster was provided by the Google and IBM Academic Cluster Computing Initiative. ο‚— The cluster contains around 480 computers running Hadoop - open source MapReduce. 14 Florida International University
  • 15. Experimental Results: R-Tree ο‚— R-Tree Construction Performance Metrics 30.00 60.00 25.00 50.00 MR2 MR2 20.00 40.00 Time (min) Time (min) MR1 MR1 15.00 30.00 10.00 20.00 5.00 10.00 0.00 0.00 2 4 8 16 32 64 4 8 16 32 64 Reducers Reducers (a) FLD data set (b) YPD data set MapReduce job completion times for various number of reducers in phase-2 (MR2). 15 Florida International University
  • 16. Experimental Results: R-Tree ο‚— MapReduce R-Trees vs. Single Process (SP) Objects per Reducer Consolidated R-Tree Data set R Average Stdev Nodes Height FLD 2 5,690,419 12,183 172,776 4 4 2,845,210 6,347 172,624 4 8 1,422,605 2,235 173,141 4 16 711,379 2,533 162,518 4 32 355,651 2,379 173,273 3 64 177,826 1,816 173,445 3 SP 11,382,185 0 172,681 4 YPD 4 9,257,188 22,137 568,854 4 8 4,628,594 9,413 568,716 4 16 2,314,297 7,634 568,232 4 32 1,157,149 6,043 567,550 4 64 578,574 2,982 566,199 4 SP 37,034,126 0 587,353 5 16 Florida International University
  • 17. Experimental Results: Imagery ο‚— Tile Quality Computation 25 4 Reduce 3.5 Reduce 20 Map Map 3 Time (min) Time (min) 15 2.5 2 10 1.5 5 1 0.5 0 0 4 8 16 32 64 128 256 512 2 4 8 16 Reducers Size of data (GB) (a) Fixed data size, variable Reducers (b) Variable data size, fixed Reducers Fig. 9. MapReduce job completion time for tile quality computation 17 Florida International University
  • 19. Related Work ο‚— Previous works on R-Tree parallel construction faced intrinsic distributed computing problems: load balancing, process scheduling, fault tolerance, etc. ο‚— Schnitzer and Leutenegger [16] proposed a Master- Client R-Tree, where the data set is first partitioned using Hilbert packing sort algorithm, then the partitions are declustered into a number of processors, where individual trees are built. At the end, a master process combines the individual trees into the final R-Tree. ο‚— Papadopoulos and Manolopoulos [17] proposed a methodology for sampling-based space partitionining, load balancing, and partition assignment into a set of Florida International University 19 processors in parallely building R-Trees.
  • 21. Conclusion ο‚— We used the MapReduce model to solve two spatial problems on a Google&IBM cluster: ο‚— (a) Bulk-construction of R-Trees and ο‚— (b) Aerial image quality computation ο‚— MapReduce can dramatically improve task completion times. Our experiments show close to linear scalability. ο‚— Our experience in this work shows MapReduce has the potential to be applicable to more complex spatial problems. 21 Florida International University
  • 22. References [1] Antonin Guttman: R-Trees: A Dynamic Index Structure for Spatial Searching. SIGMOD 1984:47-57. [2] NSF Cluster Exploratory Program: http://www.nsf.gov/pubs/2008/nsf08560/nsf08560.htm [3] Google&IBM Academic Cluster Computing Initiative: http://www.google.com/intl/en/press/pressrel/20071008_ibm_univ.html [4] Apache Hadoop project: http://hadoop.apache.org [6] Jeffrey Dean, Sanjay Ghemawat: MapReduce: Simplified data processing on large clusters. In Proceedings of the 6th Conference on Symposium on Opearting Systems Design & Implementation, USENIX Association, Volume 6, pp. 10-10, December 2004. [12] High Performance Database Research Center (HPDRC), Research Division of the Florida International University, School of Computing and Information Sciences, University Park, Telephone: (305) 348-1706, FIU ECS-243, Miami, FL 33199. [16] Schnitzer B., Leutenegger S.T.: Master-client R-trees: a new parallel R- tree architecture, In Proceedings of the 11th International Conference on Scientific and Statistical Database Management, pp. 68-77, August 1999. [17] Apostolos Papadopoulos, Yannis Manolopoulos: Parallel bulk-loading of spatial data, Parallel Computing, Volume 29, Issue 10, pp. 1419 - 1444, October 2003. 22 Florida International University