
Using Crowdsourcing, Automated Methods and Google Street View to Collect Sidewalk Accessibility Data

In this presentation, I describe a system that uses crowdsourcing, computer vision, machine learning, and Google Street View to collect sidewalk accessibility data.


Using Crowdsourcing, Automated Methods and Google Street View to Collect Sidewalk Accessibility Data

  1. 1. makeability lab A Method for Collecting Sidewalk Accessibility Data Using Crowdsourcing, Computer Vision, and Street View Kotaro Hara | Project Sidewalk (PI: Jon E. Froehlich)
  2. 2. A B C
  3. 3. D A B C
  4. 4. Human-Computer Interaction Lab
  5. 5. Characterizing Sidewalk Accessibility at Scale using Google Street View, Crowdsourcing, and Automated Methods Kotaro Hara | Project Sidewalk (PI: Prof. Jon Froehlich) makeability lab
  6. 6. I want to start with a story…
  7. 7. You Your Friend
  8. 8. 30.6million U.S. adults with mobility impairment
  9. 9. 15.2million use an assistive aid
  10. 10. Incomplete Sidewalks Physical Obstacles Surface Problems No Curb Ramps Stairs/Businesses
  11. 11. The lack of street-level accessibility information can have a significant impact on the independence and mobility of citizens cf. Nuernberger, 2008; Thapar et al., 2004
  12. 12. Accessibility-aware Navigation
  13. 13. Visualizing Accessibility of a City
  14. 14. Our goal is to collect and deliver data for the accessibility of every city in the world
  15. 15. Physical Street Audits
  16. 16. Time-consuming and expensive
  17. 17. Mobile Crowdsourcing SeeClickFix.com
  18. 18. These mobile tools require people to be on-site Mobile Crowdsourcing SeeClickFix.com
  19. 19. Use Google Street View (GSV) as a massive data source for scalably finding and characterizing street-level accessibility
  20. 20. AutomationCrowdsourcing How can we efficiently collect accurate accessibility data with…
  21. 21. Amazon Mechanical Turk is an online labor market where you can hire workers to complete small tasks
  22. 22. Task: Find the company name from an email domain $0.02 per task Task interface
  23. 23. Timer: 00:07:00 of 3 hours University of Maryland: Help make our sidewalks more accessible for wheelchair users with Google Maps Kotaro Hara 10 3 hours Crowdsourcing Data Collection Hara K., Le V., and Froehlich J.E [ASSETS2012, CHI2013] Crowdsourcing | Image Labeling
  24. 24. Manual labeling is accurate, but labor intensive
  25. 25. Manual labeling is accurate, but labor intensive
  26. 26. Computer Vision
  27. 27. Computer vision automatically finds curb ramps Automatic Curb Ramp Detection
  28. 28. Automatic Curb Ramp Detection Curb Ramp Labels Detected with Computer Vision
  29. 29. Automatic Curb Ramp Detection Curb Ramp Labels Detected with Computer Vision
  30. 30. Some curb ramps never get detected False detections Automatic Curb Ramp Detection
  31. 31. 2x Manual Label Verification
  32. 32. Computer vision + verification is cheaper but less accurate compared to manual labeling
  33. 33. Automatic Task Allocation Research Question How can we combine manual labeling and computer vision to achieve high accuracy and low cost?
  34. 34. Tohme遠目 Remote Eye・
  35. 35. Computer vision + verification is cheaper but less accurate Manual labeling is accurate, but labor intensive Design Principles
  36. 36. Computer vision + verification is cheaper but less accurate (not true for easy tasks) Manual labeling is accurate, but labor intensive Design Principles
  37. 37. Dataset svDetect Automatic Curb Ramp Detection svCrawl Web Scraper Tohme 遠目 Remote Eye・
  38. 38. svCrawl Web Scraper Dataset svDetect Automatic Curb Ramp Detection svControl Automatic Task Allocation Tohme 遠目 Remote Eye・
  39. 39. svCrawl Web Scraper Dataset svDetect Automatic Curb Ramp Detection svControl Automatic Task Allocation svVerify Manual Label Verification Tohme 遠目 Remote Eye・
  40. 40. svCrawl Web Scraper Dataset svDetect Automatic Curb Ramp Detection svControl Automatic Task Allocation svVerify Manual Label Verification svLabel Manual Labeling Tohme 遠目 Remote Eye・
  41. 41. Tohme 遠目 Remote Eye・ .
  42. 42. Tohme 遠目 Remote Eye・
  43. 43. Tohme 遠目 Remote Eye・ Complexity: Cardinality: Depth: CV: 0.14 0.33 0.21 0.22
  44. 44. Tohme 遠目 Remote Eye・ Complexity: Cardinality: Depth: CV: 0.14 0.33 0.21 0.22 Predict computer vision performance
  45. 45. Tohme 遠目 Remote Eye・ Complexity: Cardinality: Depth: CV: 0.14 0.33 0.21 0.22 The easy task is passed to the cheaper verification workflow.
  46. 46. Tohme 遠目 Remote Eye・ .
  47. 47. Tohme 遠目 Remote Eye・
  48. 48. Tohme 遠目 Remote Eye・ Complexity: Cardinality: Depth: CV: 0.82 0.25 0.96 0.54
  49. 49. Tohme 遠目 Remote Eye・ Complexity: Cardinality: Depth: CV: 0.82 0.25 0.96 0.54
  50. 50. Tohme 遠目 Remote Eye・ Complexity: Cardinality: Depth: CV: 0.82 0.25 0.96 0.54The difficult task is passed to the more accurate labeling workflow.
  51. 51. svCrawl Web Scraper Dataset svDetect Automatic Curb Ramp Detection svControl Automatic Task Allocation svVerify Manual Label Verification svLabel Manual Labeling Tohme 遠目 Remote Eye・
  52. 52. Google Street View Panoramas and Metadata 3D Point-cloud Data Top-down Google Maps Imagery Scraper
  53. 53. Saskatoon Los Angeles Baltimore Washington D.C. Washington D.C. Baltimore Los Angeles Saskatoon
  54. 54. D.C. | Downtown D.C. | Residential Scraper | Areas of Study
  55. 55. Washington D.C. Dense urban area Semi-urban residential areas Scraper
  56. 56. Washington D.C. Baltimore Los Angeles Saskatoon Total Area: 11.3 km2 Intersections: 1,086 Curb Ramps: 2,877 Missing Curb Ramps: 647 Avg. GSV Data Age: 2.2 yr* * At the time of downloading data in summer 2013 Scraper
  57. 57. How well does GSV data reflect the current state of the physical world?
  58. 58. Vs.Vs.
  59. 59. Washington D.C. Baltimore Physical Audit Areas GSV and Physical World > 97.7% agreement 273 Intersections Dataset | Validating Dataset Small disagreement due to construction.
  60. 60. svCrawl Web Scraper Dataset svDetect Automatic Curb Ramp Detection svControl Automatic Task Allocation svVerify Manual Label Verification svLabel Manual Labeling Tohme 遠目 Remote Eye・
  61. 61. svCrawl Web Scraper Dataset svDetect Automatic Curb Ramp Detection svControl Automatic Task Allocation svVerify Manual Label Verification svLabel Manual Labeling Tohme 遠目 Remote Eye・
  62. 62. Dataset
  63. 63. Ground Truth Curb Ramp Dataset 2 researchers labeled curb ramps in our dataset 2,877 curb ramp labels (M=2.6 per intersection) Dataset
  64. 64. svCrawl Web Scraper Dataset svDetect Automatic Curb Ramp Detection svControl Automatic Task Allocation svVerify Manual Label Verification svLabel Manual Labeling Tohme 遠目 Remote Eye・
  65. 65. svCrawl Web Scraper Dataset svDetect Automatic Curb Ramp Detection svControl Automatic Task Allocation svVerify Manual Label Verification svLabel Manual Labeling Tohme 遠目 Remote Eye・
  66. 66. Deformable Part Models Felzenszwalb et al. 2008 Automatic Curb Ramp Detection http://www.cs.berkeley.edu/~rbg/latent/
  67. 67. Deformable Part Models Felzenszwalb et al. 2008 Automatic Curb Ramp Detection http://www.cs.berkeley.edu/~rbg/latent/ Root filter Parts filter Displacement cost
  68. 68. Automatic Curb Ramp Detection Multiple redundant detection boxes Detected Labels Stage 1: Deformable Part Model Correct 1 False Positive 12 Miss 0
  69. 69. Automatic Curb Ramp Detection Curb ramps shouldn’t be in the sky or on roofs Correct 1 False Positive 12 Miss 0 Detected Labels Stage 1: Deformable Part Model
  70. 70. Automatic Curb Ramp Detection Detected Labels Stage 2: Post-processing
  71. 71. Automatic Curb Ramp Detection Detected Labels Stage 3: SVM-based Refinement Filter out labels based on their size, color, and position. Correct 1 False Positive 5 Miss 0
  72. 72. Automatic Curb Ramp Detection Correct 1 False Positive 3 Miss 0 Detected Labels Stage 3: SVM-based Refinement
  73. 73. Google Street View Panoramic Image Curb Ramp Labels Detected by Computer Vision Automatic Curb Ramp Detection
  74. 74. Good example!
  75. 75. Bad Example :(
  76. 76. Used two-fold cross validation to evaluate CV sub-system
  77. 77. Automatic Curb Ramp Detection | COMPUTER VISION SUB-SYSTEM RESULTS (precision-recall chart) Precision: higher means fewer false positives. Recall: higher means fewer false negatives.
  78. 78. Automatic Curb Ramp Detection | COMPUTER VISION SUB-SYSTEM RESULTS (precision-recall chart) Goal: maximize the area under the curve.
  79. 79. Automatic Curb Ramp Detection | COMPUTER VISION SUB-SYSTEM RESULTS (precision-recall curves for Stage 1: DPM, Stage 2: Post-Processing, Stage 3: SVM) More than 20% of curb ramps were missed.
  80. 80. Automatic Curb Ramp Detection | COMPUTER VISION SUB-SYSTEM RESULTS (precision-recall curves for Stage 1: DPM, Stage 2: Post-Processing, Stage 3: SVM) Confidence threshold of -0.99, which results in 26% precision and 67% recall.
  81. 81. Occlusion Illumination Scale Viewpoint Variation Structures Similar to Curb Ramps Curb Ramp Design Variation Automatic Curb Ramp Detection CURB RAMP DETECTION IS A HARD PROBLEM
  82. 82. svCrawl Web Scraper Dataset svDetect Automatic Curb Ramp Detection svControl Automatic Task Allocation svVerify Manual Label Verification svLabel Manual Labeling Tohme 遠目 Remote Eye・
  83. 83. svCrawl Web Scraper Dataset svDetect Automatic Curb Ramp Detection svControl Automatic Task Allocation svVerify Manual Label Verification svLabel Manual Labeling Tohme 遠目 Remote Eye・
  84. 84. Automatic Task Allocation | Features to Assess Scene Difficulty for CV The number of streets connected at an intersection Depth information for road width and variance in distance Top-down images to assess the complexity of an intersection The number of detections and confidence values
  85. 85. Automatic Task Allocation | Features to Assess Scene Difficulty for CV The number of streets from metadata Depth information to assess road width and variance in distance Top-down images to assess the complexity of an intersection The number of detections and confidence values
  86. 86. Depth information for road width and variance in distance Automatic Task Allocation | Features to Assess Scene Difficulty for CV
  87. 87. Automatic Task Allocation | Features to Assess Scene Difficulty for CV The number of streets from metadata Depth information for road width and variance in distance Top-down images to assess the complexity of an intersection The number of detections and confidence values
  88. 88. Google Maps Styled Maps Top-down images to assess the complexity of an intersection Automatic Task Allocation | Features to Assess Scene Difficulty for CV
  89. 89. Automatic Task Allocation | Features to Assess Scene Difficulty for CV The number of streets from metadata Depth information for road width and variance in distance Top-down images to assess the complexity of an intersection CV Output: the number of detections and confidence values
  90. 90. svCrawl Web Scraper Dataset svDetect Automatic Curb Ramp Detection svControl Automatic Task Allocation svVerify Manual Label Verification svLabel Manual Labeling Tohme 遠目 Remote Eye・
  91. 91. svCrawl Web Scraper Dataset svDetect Automatic Curb Ramp Detection svControl Automatic Task Allocation svVerify Manual Label Verification svLabel Manual Labeling Tohme 遠目 Remote Eye・
  92. 92. 3x Manual Labeling | Labeling Interface
  93. 93. svCrawl Web Scraper Dataset svDetect Automatic Curb Ramp Detection svControl Automatic Task Allocation svVerify Manual Label Verification svLabel Manual Labeling Tohme 遠目 Remote Eye・
  94. 94. svCrawl Web Scraper Dataset svDetect Automatic Curb Ramp Detection svControl Automatic Task Allocation svVerify Manual Label Verification svLabel Manual Labeling Tohme 遠目 Remote Eye・
  95. 95. 2x Manual Label Verification
  96. 96. Automatic Task Allocation Can we combine manual labeling and computer vision to achieve high accuracy and low cost?
  97. 97. Evaluation | STUDY METHOD: CONDITIONS Manual labeling without smart task allocation vs. CV + Verification without smart task allocation vs. Tohme 遠目 Remote Eye
  98. 98. Accuracy Task Completion Time Evaluation STUDY METHOD: MEASURES
  99. 99. Recruited workers from MTurk Used 1,046 GSV images (40 used for gold standard insertion) Evaluation STUDY METHOD: APPROACH
  100. 100. Evaluation | RESULTS (Labeling Tasks / Verification Tasks) # of distinct turkers: 242 / 161 # of HITs completed: 1,270 / 582 # of tasks completed: 6,350 / 4,820 # of tasks allocated: 769 / 277 We used Monte Carlo simulations for evaluation
  101. 101. Evaluation | Labeling Accuracy and Time Cost (bar charts; error bars are standard deviations) ACCURACY (precision / recall / F-measure): Manual Labeling 84% / 88% / 86%; CV and Manual Verification 68% / 58% / 63%; Tohme 遠目 Remote Eye 83% / 86% / 84%. COST (TIME) per scene: Manual Labeling 94 s; CV and Manual Verification 42 s; Tohme 81 s.
  102. 102. Evaluation | Labeling Accuracy and Time Cost (same charts) Tohme 遠目 Remote Eye matches the accuracy of manual labeling with a 13% reduction in cost.
  103. 103. svControl Automatic Task Allocation svVerify Manual Label Verification svLabel Manual Labeling Evaluation | Smart Task Allocator ~80% of svVerify tasks were correctly routed ~50% of svLabel tasks were correctly routed
  104. 104. svControl Automatic Task Allocation svVerify Manual Label Verification svLabel Manual Labeling Evaluation | Smart Task Allocator If svControl worked perfectly, Tohme’s cost would drop by 28% compared to a manual labeling approach alone.
  105. 105. Example Labels from Manual Labeling
  106. 106. Evaluation | Example Labels from Manual Labeling
  107. 107. Evaluation | Example Labels from Manual Labeling
  108. 108. Evaluation | Example Labels from Manual Labeling
  109. 109. Evaluation | Example Labels from Manual Labeling
  110. 110. Evaluation | Example Labels from Manual Labeling
  111. 111. This is a driveway. Not a curb ramp. Evaluation | Example Labels from Manual Labeling
  112. 112. Evaluation | Example Labels from Manual Labeling
  113. 113. Evaluation | Example Labels from Manual Labeling
  114. 114. Examples Labels from CV + Verification
  115. 115. Raw Street View Image Evaluation | Example Labels from CV + Verification
  116. 116. False detection Automatic Detection Evaluation | Example Labels from CV + Verification
  117. 117. Automatic Detection + Human Verification Evaluation | Example Labels from CV + Verification
  118. 118. 8,209 Intersections in DC
  119. 119. 8,209 Intersections in DC | BACK OF THE ENVELOPE CALCULATIONS Manually labeling GSV with our custom interfaces would take 214 hours With Tohme, this drops to 184 hours We think we can do better
  120. 120. makeability lab Takeaway: Smart task management can improve the efficiency of a semi-automatic, crowd-powered system. We can combine crowdsourcing and automated methods to collect accessibility data from Street View.
  121. 121. FUTURE WORK: COMPUTER VISION Context integration & scene understanding 3D-data integration Improve training & sample size Mensuration
  122. 122. FUTURE WORK: DEPLOYMENT OF VOLUNTEER WEB SITE
  123. 123. This work is supported by Faculty Research Award makeability lab
  124. 124. THE CROWD-POWERED STREETVIEW ACCESSIBILITY TEAM! Kotaro Hara Jin Sun Victoria Le Robert Moore Sean Pannella Jonah Chazan David Jacobs Jon Froehlich Zachary Lawrence Graduate Student Undergraduate High School Professor Thanks! @kotarohara_en | kotaro@cs.umd.edu

Editor's Notes

  • My name is Kotaro Hara. Today, I will talk about how we can use automated methods and crowdsourcing to collect accessibility information about cities
  • My name is Kotaro Hara. Today, I will talk about how we can use automated methods and crowdsourcing to collect accessibility information about cities
  • I want to tell you a story…
  • Imagine that you and a friend are on a walk. You’re both somewhat unfamiliar with the area.

    Suddenly, in the middle of the sidewalk, you encounter a fire hydrant

    -- Image Reference
    http://www.iconsdb.com/black-icons/fire-hydrant-icon.html
  • In this case, you manage to go around because there is a driveway, but your friend is temporarily forced onto the street, which is dangerous.
  • Now, you get to the end of the block and discover that there is no curb cut. You are forced to turn around and find another way.

    The problem is not only that the sidewalks remain inaccessible, but also that there are currently few mechanisms to find out about the accessibility of a route in advance
  • Now, you get to the end of the block and discover that there is no curb cut. You are forced to turn around and find another way.

    The problem is not only that the sidewalks remain inaccessible, but also that there are currently few mechanisms to find out about the accessibility of a route in advance

    -- Quote from paper
    The problem is not just that sidewalk accessibility fundamentally affects where and how people travel in cities but also that there are few, if any, mechanisms to determine accessible areas of a city a priori


    -- What Jon wrote
    The problem is not just that there are inaccessible areas of cities but that there are currently few methods for us to determine them a priori
  • Now, you get to the end of the block and discover that there is no curb cut. You are forced to turn around and find another way.

    The problem is not only that the sidewalks remain inaccessible, but also that there are currently few mechanisms to find out about the accessibility of a route in advance

    -- Quote from paper
    The problem is not just that sidewalk accessibility fundamentally affects where and how people travel in cities but also that there are few, if any, mechanisms to determine accessible areas of a city a priori


    -- What Jon wrote
    The problem is not just that there are inaccessible areas of cities but that there are currently few methods for us to determine them a priori
  • According to the most recent US Census (2010), roughly 30.6 million adults have physical disabilities that affect their ambulatory activities [128].

    -----
    Flickr: 3627562740_c74f7bfb82_o.jpg
  • Of these, nearly half report using an assistive aid such as a wheelchair (3.6 million) or a cane, crutches, or walker (11.6 million)

    According to data from Japan's Cabinet Office, the corresponding total in Japan is 3.663 million people.

    ----

    Flickr: 14816521847_5c3c7af348_o.jpg

  • Despite comprehensive civil rights legislation for Americans with disabilities (e.g., [9,75]), many city streets, sidewalks, and businesses in the US remain inaccessible [90,96,120].
  • The lack of street-level accessibility information can have a significant negative impact on the independence and mobility of citizens [99,120].

    99: Nuernberger, A. (2008). Presenting accessibility to mobility-impaired travelers. (Doctoral dissertation,
    University of California, Berkeley).
    120: Thapar, N., Warner, G., Drainoni, M., Williams, S., Ditchfield, H., Wierbicky, J., & Nesathurai, S.
    (2004). A pilot of functional access to public buildings and facilities for persons with
    impairments. Disability and Rehabilitation, 26(5), 280-9.
  • So we would like to develop technologies such as an accessibility-aware navigation system. It shows an accessible path instead of the shortest path, based on your mobility level.
  • We also want to build an application that allows you to visualize the accessibility of a city. You can quickly compare which area of a city is more accessible. We need geo-data to make these.
  • To do this, we need a lot of data about accessibility. Our group’s goal is to collect and deliver street-level accessibility data for every city in the world.

    -- Image
    http://www.flickr.com/photos/rgb12/6225459696/lightbox/
  • Traditionally, information about a neighborhood has been gathered by volunteers or government organizations through physical audits.
  • However, this is time-consuming and expensive.


  • Mobile crowdsourcing such as SeeClickFix.com
  • Mobile crowdsourcing such as SeeClickFix.com
  • And NYC 311 allows citizens to report neighborhood sidewalk accessibility issues.
  • But this requires people to be on-site
  • Our approach is different though complementary. Use Google Street View as a massive data source…
  • Today, I am going to talk about how we can use crowdsourcing and automated methods to collect accessibility data from Google Street View.
  • Amazon Mechanical Turk is an online labor market where you can hire workers to complete small tasks.
  • For example, if you are a worker, you can go to Amazon’s website to browse through available tasks
  • Choose one of the tasks. For example, this task is about finding the company name from an email domain. You can get 2 cents for completing a task through this web interface.
  • We recruit crowd workers from Amazon Mechanical Turk. For those of you who don’t know Mechanical Turk, it is an online labor market where you can work or recruit workers to perform small tasks over the Internet.
  • Using this platform, we recruit workers to work on our task. We developed an interface where you can see Google Street View imagery and label, in this case, an obstacle in the path. (A minimal sketch of posting such a task to MTurk follows below.)
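    -- A minimal sketch of how a labeling HIT like this could be posted with the boto3 MTurk client (this study predates that client; the task URL, reward, and durations below are illustrative assumptions):

    # Hypothetical sketch: post an external-URL labeling HIT to Amazon Mechanical Turk.
    # Assumes AWS credentials are configured; the URL and parameters are placeholders.
    import boto3

    EXTERNAL_QUESTION = """
    <ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
      <ExternalURL>https://example.org/label-curb-ramps?pano=_AUz5cV_ofocoDbesxY3Kw</ExternalURL>
      <FrameHeight>800</FrameHeight>
    </ExternalQuestion>
    """

    mturk = boto3.client("mturk", region_name="us-east-1")

    hit = mturk.create_hit(
        Title="Label curb ramps in Google Street View images",
        Description="Outline the curb ramps you see at the intersection shown in Street View.",
        Keywords="image labeling, accessibility, street view",
        Reward="0.05",                       # dollars, as a string; illustrative value
        MaxAssignments=3,                    # e.g., three independent workers per scene ("3x" labeling)
        AssignmentDurationInSeconds=10 * 60,
        LifetimeInSeconds=7 * 24 * 3600,
        Question=EXTERNAL_QUESTION,
    )
    print("HIT created:", hit["HIT"]["HITId"])
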
  • We showed that this is an effective method, but it is labor intensive.
  • We showed that this is an effective method, but it is labor intensive.
  • To more efficiently find accessibility attributes, we turned to computer vision, which is used for applications like face detection.
  • Different attributes affect sidewalk accessibility for people with mobility impairment. For example, presence of curb ramps, surface conditions, obstacles, steep gradients, and more.
  • And removed even more errors
  • And removed even more errors
  • Computer vision is not perfect: there are false positives, which can be fixed by verification, and it misses some curb ramps, which humans then need to label.
  • Here you see detected curb ramps as green boxes on top of the Street View image (to the next slide to play).
  • The question is: can the system achieve the same or better accuracy at a lower time cost than manual labeling?

    5 min
  • To do this, we developed a system called Tohme. It combines the two approaches.
  • This is an overview of the system. A custom web scraper collects a dataset including Street View images. A computer vision based detector finds curb ramps.
  • So we designed a smart task allocator.
  • It routes detection results to a cheap manual verification workflow to remove false positive errors. However, since our verification task does not allow workers to fix false negatives, curb ramps that are missed never get detected.
  • So if the allocator predicts a false negative, it passes the task to the manual labeling workflow.
  • We get a Street View image.
  • We run a detector
  • Then extract features.
  • Our task allocator predicts the presence of false negatives. If it predicts no false negatives, it allocates the task to the verification workflow.
  • Our task allocator predicts the presence of false negatives. If it predicts no false negatives, it allocates the task to the verification workflow.
  • Another example.
  • Run a detector
  • Extract features.
  • If the allocator predicts a false negative, it passes the task to the labeling workflow.
  • If the allocator predicts a false negative, it passes the task to the labeling workflow.
  • Let’s first talk about our web scraper
  • Let’s first talk about our web scraper
  • We scraped GSV panoramas and metadata from the intersections, along with their accompanying 3D point cloud data and top-down Google Maps imagery. These datasets are used to train the automatic task allocator. (A rough sketch of fetching Street View imagery follows below.)

    _AUz5cV_ofocoDbesxY3Kw
    -dlUzxwCI_-k5RbGw6IlEg
    0C6PG3Zpuwz11kZKfG_vUg
    D-2VNbhqOqYAKTU0hFneIw
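    -- As a rough illustration of this step, a sketch using the public Street View Static API; svCrawl itself scraped richer data (full panoramas, depth/3D point clouds, top-down map tiles) than this API exposes, and the API key and coordinates below are placeholders:

    # Illustrative sketch only: fetch Street View metadata and a few panorama crops
    # for an intersection via the public Street View Static API.
    import requests

    API_KEY = "YOUR_GOOGLE_MAPS_API_KEY"   # placeholder

    def fetch_pano_metadata(lat, lng):
        """Return the nearest panorama's id, capture date, and exact location."""
        r = requests.get(
            "https://maps.googleapis.com/maps/api/streetview/metadata",
            params={"location": f"{lat},{lng}", "key": API_KEY},
        )
        r.raise_for_status()
        return r.json()    # includes 'status', 'pano_id', 'date', 'location'

    def save_streetview_crop(pano_id, heading, path):
        """Save one 640x640 view of the panorama looking toward the given heading."""
        r = requests.get(
            "https://maps.googleapis.com/maps/api/streetview",
            params={"pano": pano_id, "size": "640x640", "heading": heading,
                    "pitch": -10, "fov": 90, "key": API_KEY},
        )
        r.raise_for_status()
        with open(path, "wb") as f:
            f.write(r.content)

    meta = fetch_pano_metadata(38.8977, -77.0365)    # an example D.C. location
    if meta.get("status") == "OK":
        for heading in (0, 90, 180, 270):            # look toward each corner
            save_streetview_crop(meta["pano_id"], heading, f"{meta['pano_id']}_{heading}.jpg")
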
  • Because sidewalk infrastructure can vary in design and appearance across cities and countries, we included 4 regions: Washington DC, Baltimore, Los Angeles, and Saskatoon.
  • We also looked at different types of city areas.
  • Blue regions represent dense urban areas, and red regions represent residential area.
  • In all, we had 11.3 square kilometers. There were 1,086 intersections. We found 2,877 curb ramps and 647 missing curb ramps based on the ground truth data. Average Street View image age was 2.2 years old.
  • (pause) But how well does Street View data reflect the current state of curb ramp infrastructure.
  • To answer this question, we compared Street View intersections with physical intersections
  • To answer this question, we compared Street View intersections with physical intersections
  • First, we physically visited intersections and took multiple pictures.
    The audited areas included four subset regions and consisted of 273 intersections.
    We then counted the numbers of curb ramps and missing curb ramps in both datasets and evaluated their concordance.
    As a result, we observed over 97% agreement between Google Street View and the real world. The small disagreement was due to construction.
  • Moving on to our dataset
  • Moving on to our dataset
  • Moving on to our dataset
  • To train and evaluate our computer vision program, 2 members of our research team manually labeled curb ramps in Street View images. In total, we collected 2,877 curb ramp labels.
  • To train and evaluate our computer vision program, 2 members of our research team manually labeled curb ramps in Street View images. In total, we collected 2,877 curb ramp labels.
  • Our computer vision component has three parts.
  • Our computer vision component has three parts.
  • We experimented with various object detection approaches. We chose to build on a framework called DPM (Deformable Part Models), one of the most successful approaches to object detection.
  • DPM models a target object and its parts with histogram of oriented gradients features. It also models the spatial relationship between the parts.
  • DPM sweeps through an entire image, and detects areas that look like a curb ramp. Detections are shown in red boxes. Numbers of correct detections and errors are shown in this table. There are some redundant labels such as overlapping boxes.

    h7ZW0_VasRt3vhevz1mjeg
  • And there shouldn’t be curb ramps in the sky.

    h7ZW0_VasRt3vhevz1mjeg
  • We use non-maxima suppression to remove overlapping labels, and 3D point cloud data to remove detections that are not at ground level. Note that this 3D data is coarse, so we cannot identify the detailed structure of curb ramps.

    h7ZW0_VasRt3vhevz1mjeg
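    -- A generic sketch of the greedy, IoU-based non-maxima suppression mentioned above (the standard technique, not necessarily the exact implementation used here):

    # Greedy non-maximum suppression: keep the highest-scoring curb ramp detection,
    # drop remaining boxes that overlap it too much, and repeat.
    def iou(a, b):
        """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
        x1, y1 = max(a[0], b[0]), max(a[1], b[1])
        x2, y2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, x2 - x1) * max(0, y2 - y1)
        area_a = (a[2] - a[0]) * (a[3] - a[1])
        area_b = (b[2] - b[0]) * (b[3] - b[1])
        return inter / float(area_a + area_b - inter)

    def non_max_suppression(boxes, scores, iou_threshold=0.5):
        """Return indices of the boxes to keep, highest score first."""
        order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
        keep = []
        while order:
            best = order.pop(0)
            keep.append(best)
            order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
        return keep
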
  • We get a cleaner result, but we still have some errors. We try to remove them by utilizing other information such as the size of a bounding box and RGB information.

    h7ZW0_VasRt3vhevz1mjeg
  • This is the final result with computer vision alone.

    h7ZW0_VasRt3vhevz1mjeg
  • I will talk about how we can combine crowdsourcing and automated methods to collect curb ramp data from Google Street View efficiently.

    Today, how algorithmic work management plays a role in this process.
  • And removed even more errors
  • Our curve is less ideal
  • For our system, we set the confidence threshold to emphasize recall over precision, because false positives are easier to correct.
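    -- A sketch of sweeping the detector confidence threshold to trade precision for recall; the -0.99 value comes from the slides, while the data layout (one (score, matched_to_ground_truth) pair per detection) is an assumption for illustration:

    # Report precision/recall at each candidate threshold so one can pick a setting
    # that favors recall (false positives are cheap to fix via crowd verification).
    def precision_recall_at(detections, n_ground_truth, threshold):
        """detections: list of (score, is_true_positive) pairs after matching to ground truth."""
        kept = [d for d in detections if d[0] >= threshold]
        tp = sum(1 for d in kept if d[1])
        precision = tp / float(len(kept)) if kept else 1.0
        recall = tp / float(n_ground_truth)
        return precision, recall

    def sweep(detections, n_ground_truth, thresholds):
        for t in thresholds:
            p, r = precision_recall_at(detections, n_ground_truth, t)
            print(f"threshold={t:+.2f}  precision={p:.2f}  recall={r:.2f}")

    # e.g. sweep(all_detections, 2877, [-1.5, -0.99, -0.5, 0.0, 0.5])
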
  • We observed various image properties that could cause computer vision to make errors, including occlusion, illumination, scale, viewpoint variation, structures similar to curb ramps, and variation in the design of curb ramps.
  • That’s what we do with the task allocator.
  • That’s what we do with the task allocator.
  • We used the following features.
    To assess the complexity of intersections, we used street cardinality from the metadata.
  • Depth data
  • It allows us to estimate the size of a street, which is useful because the farther away a curb ramp is, the harder it is to detect.
  • We also assessed the complexity of each intersection with top-down imagery.
  • Because the appearance of curb ramps varies more at irregular intersections, computer vision tends to fail to find curb ramps there. For example, the intersection on the right is arguably more complex than the one on the left.
  • We also used the number of detection boxes, their positions, and their confidence values to see how confused the computer vision program was. (A sketch of an allocation classifier over these features follows below.)
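    -- A rough sketch of an svControl-style allocator: a classifier over the scene features described in these notes predicts whether the CV output likely contains a missed curb ramp (route to svLabel) or not (route to svVerify). The feature names and the scikit-learn SVM pipeline are illustrative assumptions, not the actual implementation:

    # Train a scene-difficulty classifier and use it to route tasks.
    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    def scene_features(scene):
        """Hypothetical feature vector for one intersection scene."""
        return [
            scene["cardinality"],                 # number of connected streets (metadata)
            scene["mean_depth"],                  # road-width proxy from the 3D point cloud
            scene["depth_variance"],
            scene["map_complexity"],              # from top-down Google Maps imagery
            scene["num_detections"],              # CV detector output
            scene["mean_detection_confidence"],
        ]

    def train_allocator(scenes, needs_labeling):
        """needs_labeling[i] is 1 if CV missed a curb ramp in scenes[i] (per ground truth), else 0."""
        X = np.array([scene_features(s) for s in scenes])
        y = np.array(needs_labeling)
        clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
        clf.fit(X, y)
        return clf

    def route(clf, scene):
        return "svLabel" if clf.predict([scene_features(scene)])[0] == 1 else "svVerify"
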
  • Our manual labeling tool allows people to control the viewing angle. You select the curb ramp button at the top and label the target. We collect outline labels of curb ramps to gather rich data for training computer vision.
  • Let’s talk about the verification task
  • Let’s talk about the verification task
  • Here you see detected curb ramps as green boxes on top of the Street View image (to the next slide to play).
  • The question is: can the system achieve the same or better accuracy at a lower time cost than manual labeling?
  • We compare the performance of manual labeling without smart task allocation, computer vision plus verification without smart task allocation, and finally Tohme.
  • We measured accuracy and average task completion time of each workflow.
  • Turkers completed over 6,300 labeling tasks and 4,800 verification tasks, and we used Monte Carlo simulations for evaluation.
  • On the left, I show accuracy. On the right, I show cost. We want accuracy to be high, and cost to be low.
  • On the left, I show accuracy. On the right, I show cost. We want accuracy to be high, and cost to be low.

    For the manual labeling approach alone, our accuracy measures are 84–86%, at 94 seconds per intersection.
    For CV + manual verification, accuracy dropped substantially, but so did the time cost, by more than half.
    For Tohme, we saw accuracies similar to the manual baseline approach.

  • 217 of 277 tasks correctly routed to svVerify
  • We compare the performance of manual labeling without smart task allocation, computer vision plus verification without smart task allocation, and finally Tohme.
  • We compare the performance of manual labeling without smart task allocation, computer vision plus verification without smart task allocation, and finally Tohme.
  • We measured accuracy and average task completion time of each workflow.
  • We recruited multiple workers to work on the labeling and verification tasks. We evaluated the results with Monte Carlo simulation. (A rough sketch of such a simulation follows below.)
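    -- The exact simulation procedure is not spelled out here; a plausible sketch, assuming several redundant worker results per scene and repeatedly sampling one result per scene to estimate the expected accuracy of a single pass through the workflow:

    # Monte Carlo over redundant crowd work: per trial, sample one worker result per
    # scene, score the whole pass, and average across trials. The data layout and
    # scoring function are assumptions for illustration.
    import random
    from statistics import mean

    def simulate(results_per_scene, score_fn, n_trials=1000, seed=0):
        """results_per_scene: {scene_id: [worker_result, ...]};
        score_fn(sampled) -> an accuracy measure (e.g., F-measure)."""
        rng = random.Random(seed)
        scores = []
        for _ in range(n_trials):
            sampled = {scene: rng.choice(workers)
                       for scene, workers in results_per_scene.items()}
            scores.append(score_fn(sampled))
        return mean(scores), scores
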
  • Let’s see how turkers labeled.
  • In general, their labels were high quality
  • In general, their labels were high quality
  • Even in a difficult scene with shadows, they labeled correctly most of the time.
  • Even in a difficult scene with shadows, they labeled correctly most of the time.
  • But sometimes there were errors.
  • For example, this person labeled a driveway as a curb ramp.
  • And some workers were a little lazy.
  • And labeled two curb ramps with a single label.
  • Here are some examples.
  • Here are some examples.
  • With only computer vision, there are false positive detections.
  • With human verification, errors get corrected.
  • Based on the shapefile downloaded from data.dc.gov, there are 8,209 intersections in DC

    Manual labeling: 94 s per intersection * 8,209 intersections ≈ 771,600 s, about 214 hours
    Tohme: 81 s per intersection * 8,209 intersections ≈ 664,900 s, about 184 hours

    ----
    Source:
    http://data.dc.gov/Metadata.aspx?id=2106
  • Based on the shapefile downloaded from data.dc.gov, there are 8,209 intersections in DC

    Manual labeling: 94 s per intersection * 8,209 intersections ≈ 771,600 s, about 214 hours
    Tohme: 81 s per intersection * 8,209 intersections ≈ 664,900 s, about 184 hours

    ----
    Source:
    http://data.dc.gov/Metadata.aspx?id=2106
  • (i) Context integration. While we use some context information in Tohme (e.g., 3D-depth data, intersection complexity inference), we are exploring methods to include broader contextual cues about buildings, traffic signal poles, crosswalks, and pedestrians as well as the precise location of corners from top-down map imagery.

    (ii) 3D-data integration. Due to low-resolution and noise, we currently use 3D-point cloud data as a ground plane mask rather than as a feature to our CV algorithms. We plan to explore approaches that combine the 3D and 2D imagery to increase scene structure understanding (e.g., [28]). If higher resolution depth data becomes available, this may be useful to directly detect the presence of a curb or corner, which would likely improve our results.

    (iii) Training. Our CV algorithms are currently trained using GSV scenes from all eight city regions in our dataset. Given the variation in curb ramp appearance across geographic areas, we expect that performance could be improved if we trained and tested per city.
