
20131106 acm geocrowd


Published in: Technology, Education

  1. The One and Many Maps: Participatory and Temporal Diversities in OpenStreetMap. Tyng-Ruey Chuang (1), Dong-Po Deng (1,3), Chun-Chen Hsu (1,2), Rob Lemmens (3). (1) Institute of Information Science, Academia Sinica, Taiwan; (2) Department of Computer Science and Information Engineering, National Taiwan University; (3) Faculty of Geo-Information Science and Earth Observation (ITC), University of Twente, Netherlands. Second ACM SIGSPATIAL International Workshop on Crowdsourced and Volunteered Geographic Information (GEOCROWD) 2013, in conjunction with ACM SIGSPATIAL 2013.
  2. Background • OSM is a wiki-style online mapping platform in which tens of thousands of people voluntarily contribute geospatial data to the making of a global map (Haklay & Weber, 2008). • Its peer production model demonstrates that more and more mapping is done by citizens. • It represents the success of a collective form of geospatial content creation.
  3. Collaborative Geospatial Content Creation • OSM is a type of PPGIS (public participation GIS), and the characteristics of data collaboration in OSM are subjects of VGI (volunteered geographic information) research. • The current state of OSM is an assembly of many edits and updates made over a period of time. • Every edit or update should be treated as a meaningful unit in understanding data collaboration activities in OSM.
  4. Aims • We intend to look for ways to systematically and efficiently discover data collaboration patterns and diversities in OSM. • This is an initial study of the OSM dataset (at least the part covering Taiwan) that develops a set of metrics to summarize user participation and the spatiotemporal variations of updates in defined areas of OSM. • We hope to see OSM not as one collective map but as many overlapping maps concurrently in the making, each with its own characteristics.
  5. OSM Data Model: Node; Way (open polyline, closed polyline, area); Relation; Tag.
  6. OSM Data: An example of a Node

         <node id='1762782473' timestamp='2012-12-12T03:49:16Z' uid='1048' user='dongpo' visible='true' version='2' changeset='14245247' lat='23.864527' lon='121.5217101'>
           <tag k='name' v='立川漁場' />
           <tag k='tourism' v='attraction' />
           <tag k='source' v='survey' />
           <tag k='addr:housenumber' v='45' />
           <tag k='addr:district' v='魚池' />
           <tag k='addr:town' v='壽豐鄉' />
           <tag k='addr:county' v='花蓮縣' />
         </node>
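As a minimal sketch of how such a node element could be read programmatically, the following uses Python's standard `xml.etree.ElementTree`. The function name `parse_node` and the shape of the returned dictionary are assumptions made for illustration, not part of the authors' tooling, and the sample is abbreviated to two tags.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Abbreviated copy of the node shown on the slide.
NODE_XML = """
<node id='1762782473' timestamp='2012-12-12T03:49:16Z' uid='1048' user='dongpo'
      visible='true' version='2' changeset='14245247'
      lat='23.864527' lon='121.5217101'>
  <tag k='tourism' v='attraction' />
  <tag k='source' v='survey' />
</node>
"""

def parse_node(xml_text):
    """Hypothetical helper: turn one <node> element into a plain dict."""
    elem = ET.fromstring(xml_text.strip())
    # The timestamp attribute is in UTC, ISO 8601 form with a trailing 'Z'.
    stamp = datetime.strptime(elem.get("timestamp"), "%Y-%m-%dT%H:%M:%SZ")
    return {
        "id": int(elem.get("id")),
        "uid": int(elem.get("uid")),
        "timestamp": stamp.replace(tzinfo=timezone.utc),
        "position": (float(elem.get("lat")), float(elem.get("lon"))),
        # Tags are free-form key/value pairs attached to the element.
        "tags": {t.get("k"): t.get("v") for t in elem.findall("tag")},
    }

node = parse_node(NODE_XML)
```

Note that the `timestamp` attribute belongs to this particular version of the node (here `version='2'`), so it is not necessarily the date the node was first created.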
  7. OSM Data: An example of a Way

         <way id='118416207' timestamp='2012-05-23T17:43:06Z' uid='1048' user='dongpo' visible='true' version='4' changeset='14246301'>
           <nd ref='1088092959' />
           <nd ref='1088092953' />
           ....
           <nd ref='1600948228' />
           <tag k='highway' v='primary' />
           <tag k='lanes' v='2' />
           <tag k='oneway' v='yes' />
           <tag k='ref' v='Hwy 11C' />
           <tag k='ref:zh' v='台11丙線' />
         </way>
  8. OSM Data: An example of a Relation

         <relation id='2498406' timestamp='2012-10-14T19:01:55Z' uid='1048' user='dongpo' visible='true' version='1' changeset='13497007'>
           <member type='way' ref='185846446' role='outer' />
           <member type='way' ref='185846444' role='outer' />
           <member type='way' ref='151063000' role='outer' />
           <member type='way' ref='185846448' role='outer' />
           <member type='way' ref='185846445' role='outer' />
           <tag k='admin_level' v='8' />
           <tag k='boundary' v='administrative' />
           <tag k='name' v='草屯鎮 (Caotun)' />
           <tag k='name:en' v='Caotun' />
           <tag k='name:zh' v='草屯鎮' />
           <tag k='type' v='boundary' />
         </relation>
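A relation mostly points at other elements, so a reader typically wants its member list. The sketch below extracts it from an abbreviated copy of the relation above; `relation_members` is a hypothetical helper name, and only two of the five members are included to keep the sample short.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the relation shown on the slide.
RELATION_XML = """<relation id='2498406' version='1'>
  <member type='way' ref='185846446' role='outer' />
  <member type='way' ref='185846444' role='outer' />
  <tag k='boundary' v='administrative' />
  <tag k='admin_level' v='8' />
</relation>"""

def relation_members(xml_text):
    """Hypothetical helper: list (type, ref, role) for each <member>, in order."""
    root = ET.fromstring(xml_text)
    return [
        (m.get("type"), int(m.get("ref")), m.get("role"))
        for m in root.findall("member")
    ]

members = relation_members(RELATION_XML)
```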
  9. The definitions for the metrics: To measure participatory and temporal differences among cells in OSM, we define the following functions on $D_c$, where the subscript $c$ denotes a cell. When the context is clear, we omit the subscript $c$ and simply write $D = (d_0, d_1, \ldots, d_{n-1})$, where $d_i$, $0 \le i \le n-1$, is a node in the cell, and $n$ is the total number of nodes.
  10. The definitions for the metrics (cont.): For a node $d_i$, we write $d_i = (k_i, t_i, u_i, p_i)$, where $k_i$ is the node id of $d_i$, $t_i$ the age, $u_i$ the user id of its contributor, and $p_i$ the position (i.e., the pair of its lat and lon values). Note that, by definition, $p_i$ is geographically within the boundary of $c$ for all $i$.
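The node tuple described above maps naturally onto a small record type. The following is a sketch under stated assumptions: the names `NodeRecord` and `Cell` are hypothetical, the age is assumed to be in days, and the cell is assumed to be a lat/lon bounding box, since the slides do not specify the cell shape.

```python
from typing import NamedTuple, Tuple

class NodeRecord(NamedTuple):
    k: int                  # node id
    t: float                # age (assumed here to be in days)
    u: int                  # user id of the contributor
    p: Tuple[float, float]  # position as (lat, lon)

class Cell(NamedTuple):
    # Assumption: a cell is a lat/lon bounding box.
    south: float
    west: float
    north: float
    east: float

    def contains(self, p):
        """True if position p = (lat, lon) lies within the cell boundary."""
        lat, lon = p
        return self.south <= lat <= self.north and self.west <= lon <= self.east

# The id, user id, and position come from the node example; the age is made up.
d = NodeRecord(k=1762782473, t=30.0, u=1048, p=(23.864527, 121.5217101))
cell = Cell(south=23.8, west=121.5, north=23.9, east=121.6)
inside = cell.contains(d.p)
```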
  11. Node and Mapper Density: In general, we use $\mathit{area}_c$ to denote the area covered by a cell $c$, and $\mathit{pop}_c$ for the human population in $c$. When it is clear from context, we omit the subscript and simply write $\mathit{area}$ and $\mathit{pop}$. We measure the densities of nodes, as well as those of their contributors (the mappers), over both area and population.
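The density formulas themselves did not survive in this transcript. Judging from the figure captions later in the deck (nodes and mappers, each over area and over population), one plausible reading is a simple count divided by area or population. The sketch below implements that reading; the function name and dictionary keys are assumptions, and the units of `area` are left to the caller.

```python
def cell_densities(nodes, area, pop):
    """Assumed density measures for one cell.

    nodes: iterable of (k, t, u, p) tuples for the nodes in the cell.
    area:  area covered by the cell (units chosen by the caller).
    pop:   human population of the cell.
    """
    nodes = list(nodes)
    n = len(nodes)                             # number of nodes
    m = len({u for _, _, u, _ in nodes})       # number of distinct mappers
    return {
        "nodes_per_area": n / area,
        "nodes_per_pop": n / pop,
        "mappers_per_area": m / area,
        "mappers_per_pop": m / pop,
    }

# Made-up data: three nodes contributed by two mappers.
sample = [
    (1, 10.0, 7, (23.0, 121.0)),
    (2, 5.0, 7, (23.1, 121.1)),
    (3, 1.0, 9, (23.2, 121.2)),
]
result = cell_densities(sample, area=2.0, pop=100)
```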
  12. Node age and temporality: Recall the following auxiliary functions for a sequence $s$: $\min s$ is the minimum of $s$, $\max s$ the maximum of $s$, $\bar{s}$ the average of $s$, and $\mathrm{cv}(s)$ the coefficient of variation of the elements in $s$.
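These auxiliary functions can be sketched with Python's `statistics` module. The coefficient of variation is the standard deviation divided by the mean; whether the authors use the population or the sample standard deviation is not stated, so the population form (`pstdev`) is an assumption here, as is the helper name `summary`.

```python
from statistics import mean, pstdev

def cv(s):
    """Coefficient of variation: standard deviation divided by the mean.

    Assumption: population standard deviation; the slides do not say which.
    """
    return pstdev(s) / mean(s)

def summary(s):
    """The 4-tuple (min s, max s, average of s, cv(s))."""
    return (min(s), max(s), mean(s), cv(s))

# Made-up sequence with mean 5 and population standard deviation 2.
lo, hi, avg, variation = summary([2, 4, 4, 4, 5, 5, 7, 9])
```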
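  13. Node age and temporality (cont.): The 4-tuple $\langle \min t, \max t, \bar{t}, \mathrm{cv}(t) \rangle$ measures the age characteristics of the nodes in a cell. That is, the youngest age, the oldest age, the average age, and the coefficient of variation of the ages.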
  14. Node age and temporality (cont.): We define a sequence $g = (g_0, g_1, \ldots, g_{n-2})$ to measure the gaps between consecutive elements in $t$. That is, $g_i$ is the gap in days between the dates when the two nodes $d_{i+1}$ and $d_i$ were added into the cell.
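The gap sequence can be sketched as follows. The exact formula is not preserved in this transcript, so this assumes that the ages are in days, that nodes are ordered from oldest to youngest, and that each gap is the non-negative difference between neighbouring ages. The function name `gaps` is hypothetical.

```python
def gaps(ages):
    """Gaps in days between consecutive node additions in one cell.

    Assumptions: ages are in days, and nodes are ordered oldest first,
    so each gap t[i] - t[i + 1] is non-negative.
    """
    t = sorted(ages, reverse=True)  # oldest node (largest age) first
    return [t[i] - t[i + 1] for i in range(len(t) - 1)]

# Three nodes added 10, 7, and 3 days ago yield n - 1 = 2 gaps.
g = gaps([10, 3, 7])
```

A cell with n nodes yields n - 1 gaps, matching the index range 0 to n - 2 of the sequence defined above.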
  15. Node age and temporality (cont.): The 4-tuple $\langle \min g, \max g, \bar{g}, \mathrm{cv}(g) \rangle$ measures the
  16. Graphing cells by two metrics • As multiple metrics are in use, a cell can be measured by two metrics and the two results compared. • Often we compare the two sets of measurements over all cells to see whether patterns emerge.
  17. Number of mappers and number of nodes. Figure 1: Distribution of the cells by both mapper count and node count. Figure 2: Mapping the cells in Taiwan by their types (cf. Figure 1).
  18. Locations and Cities in Taiwan. Figure 3: Cities in Taiwan.
  19. Distributions of Mappers. Figure 4: Spatial distribution of mappers over area. Figure 5: Spatial distribution of mappers over population.
  20. Distributions of Nodes. Figure 6: Spatial distribution of nodes over area. Figure 7: Spatial distribution of nodes over population.
  21. Node Age Average and Variance. Figure 8: Spatial distribution of average node age. Figure 9: Spatial distribution of the variance of node age.
  22. Number of Mappers and Update Interval. Figure 10: Distribution of the cells by both mapper count and average time gap between two additions. Figure 11: Spatial distribution of the cells by their types (cf. Figure 10).
  23. The 80/20 Hypothesis. Figure 12: Distribution of the cells by both mapper count and ratio of mappers needed for combined 80% node contribution. Figure 13: Spatial distribution of the cells by their types (cf. Figure 12).
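The "ratio of mappers needed for combined 80% node contribution" in Figure 12 can be read as follows: rank the mappers in a cell by how many nodes each contributed, take the smallest group of top mappers whose nodes add up to at least 80% of the cell's nodes, and divide that group's size by the total number of mappers. The sketch below implements this reading; it is an interpretation of the caption, not the authors' code, and the function name is hypothetical.

```python
from collections import Counter

def mapper_ratio_for_share(user_ids, share=0.8):
    """Fraction of mappers needed to account for `share` of the nodes.

    user_ids: one contributor user id per node in the cell.
    """
    if not user_ids:
        raise ValueError("cell has no nodes")
    # Node counts per mapper, most prolific mapper first.
    counts = sorted(Counter(user_ids).values(), reverse=True)
    total = sum(counts)
    running = 0
    for needed, count in enumerate(counts, start=1):
        running += count
        if running >= share * total:
            return needed / len(counts)
    return 1.0  # not reached when 0 < share <= 1

# Made-up cell: one mapper contributed 8 of 10 nodes, two others 1 each.
skewed = mapper_ratio_for_share([1] * 8 + [2, 3])
# Made-up cell: four mappers contributed one node each.
even = mapper_ratio_for_share([1, 2, 3, 4])
```

Under this reading, a low ratio means a few mappers dominate the cell, while a ratio near 1 means contributions are spread evenly.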
  24. Related work • Previous investigations into the data quality of OSM have shown that the OSM dataset can be fairly accurate, and is mostly comparable to commercial and government datasets, at least in urban areas [3, 8, 6, 9]. • Researchers have also developed visual analytics to gain insights into the spatial diversity of OSM datasets, e.g., to see whether users in different countries exhibit distinct mapping activities and habits [13]. • These visualization tools can provide valuable information for improving the data quality of OSM. Neis and Zipf identified active mappers and casual mappers by quantitatively examining their contributions in OSM [12]. • Mooney and Corcoran directly examined the characteristics of “heavily edited” objects in OpenStreetMap for the UK, and suggested that these characteristics might be developed into data quality indicators for OpenStreetMap in the future [10].
  25. Conclusion and future work • This paper is a preliminary study in two senses: we analyze only the Taiwan part of OpenStreetMap, and we analyze the cells independently (though their spatial distribution is visualized and discussed). • Because of time constraints, we have not looked into other geographical areas in OSM. • Also, as a mapper may contribute to multiple cells, we ought to look into mapping activities across cells. We intend to pursue these directions in the future. • The programs we use to analyze the data are at an early stage of development, and the way we prepare the data for analysis is rather ad hoc.
  26. Conclusion and future work (cont.) • We are currently considering how better to structure the programs so that they can be easily ported and reused. • Metrics-based analysis tools like these can be very useful in improving the data quality of OpenStreetMap, as they help discover areas where there is participatory or temporal unevenness in the map-making process itself.
  27. Thank you for your attention! Questions? To the extent possible under law, the authors have waived all copyright and related or neighboring rights to this paper. For the avoidance of doubt, this work is released under the CC0 Public Domain Dedication. Anyone can copy, modify, distribute and perform this work, even for commercial purposes, all without asking permission.