
Characterizing Physical World Accessibility at Scale Using Crowdsourcing, Computer Vision, & Machine Learning


This talk was given as part of the Human-Computer Interaction Institute seminar series at Carnegie Mellon University. My host was Professor Jeffrey Bigham. More info here: https://www.hcii.cmu.edu/news/seminar/event/2014/10/characterizing-physical-world-accessibility-scale-using-crowdsourcing-computer-vision-machine-learning

You can download the original PowerPoint deck with videos here:
http://www.cs.umd.edu/~jonf/talks.html

Abstract: Roughly 30.6 million individuals in the US have physical disabilities that affect their ambulatory activities; nearly half of those individuals report using an assistive aid such as a wheelchair, cane, crutches, or walker. Despite comprehensive civil rights legislation, many city streets, sidewalks, and businesses remain inaccessible. The problem is not just that street-level accessibility affects where and how people travel in cities but also that there are few, if any, mechanisms to determine accessible areas of a city a priori.

In this talk, I will describe our research developing novel, scalable data-collection methods for acquiring accessibility information about the built environment using a combination of crowdsourcing, computer vision, and online map imagery (e.g., Google Street View). Our overarching goal is to transform the ways in which accessibility information is collected and visualized for every sidewalk, street, and building façade in the world. This work is in collaboration with University of Maryland Professor David Jacobs and graduate students Kotaro Hara and Jin Sun along with a number of undergraduate students and high school interns.



  1. 1. Human Computer Interaction Laboratory makeability lab CHARACTERIZING PHYSICAL WORLD ACCESSIBILITY AT SCALE USING CROWDSOURCING, COMPUTER VISION, & MACHINE LEARNING
  2. 2. My Group Started in 2012
  3. 3. Human-Computer Interaction Lab
  4. 4. Ben Shneiderman Ben Bederson Jon Froehlich Jen Golbeck Leah Findlater Marshini Chetty Jenny Preece Allison Druin Mona Leigh Guha Tammy Clegg June Ahn Evan Golub Tim Clausner Kent Norman Ira Chinoy Kari Kraus Anne Rose Catherine Plaisant computer science hcil Jessica Vitak Niklas Elmqvist Nicholas Diakopoulos
  5. 5. HCIL Hackerspace Founded in 2012
  6. 6. HCIL Hackerspace Looking North
  7. 7. HCIL Hackerspace Looking South
  8. 8. Three Soldering Stations HCIL Hackerspace
  9. 9. Craft/Textile Station HCIL Hackerspace
  10. 10. Two Mannequins HCIL Hackerspace
  11. 11. Wall of Electronic Components HCIL Hackerspace
  12. 12. Quadcopters HCIL Hackerspace
  13. 13. Two 3D-Printers HCIL Hackerspace
  14. 14. One CNC Machine HCIL Hackerspace
  15. 15. Physical Making HCIL Student Leyla Norooz
  16. 16. Electronics Making HCIL student Tansy McBurnie
  17. 17. E-Textile Design HCIL Student Michael Gubbels showing SFF
  18. 18. Collaborative Working HCIL students Joseph, Cy, Matt, and Jonah
  19. 19. Student Sewing HCIL student Matt sewing
  20. 20. Fun! HCIL students Kotaro Hara and Allan Fong
  21. 21. More Fun! HCIL students Sean, Michael, Alexa, and me
  22. 22. Human-Computer Interaction Lab
  23. 23. Human Computer Interaction Laboratory makeability lab CHARACTERIZING PHYSICAL WORLD ACCESSIBILITY AT SCALE USING CROWDSOURCING, COMPUTER VISION, & MACHINE LEARNING
  24. 24. 30.6 million U.S. adults with mobility impairment
  25. 25. 15.2 million use an assistive aid
  26. 26. Incomplete Sidewalks Physical Obstacles Surface Problems No Curb Ramps Stairs/Businesses
  27. 27. The National Council on Disability noted that there is no comprehensive information on “the degree to which sidewalks are accessible” in cities. National Council on Disability, 2007 The impact of the Americans with Disabilities Act: Assessing the progress toward achieving the goals of the ADA
  28. 28. The lack of street-level accessibility information can have a significant impact on the independence and mobility of citizens cf. Nuernberger, 2008; Thapar et al., 2004
  29. 29. I usually don’t go where I don’t know [about accessible routes] -P3, congenital polyneuropathy
  30. 30. “Man in Wheelchair Hit By Vehicle Has Died From Injuries” -The Aurora, May 9, 2013
  31. 31. http://youtu.be/gWuryTNRFzw
  32. 32. http://accesscore.org This is a mockup interface based on walkscore.com and walkshed.com
  33. 33. http://accesscore.org This is a mockup interface based on walkscore.com and walkshed.com
  34. 34. How might a tool like AccessScore: Change the way people think about and understand their neighborhoods Influence property values Impact where people choose to live Change how governments/citizens make decisions about infrastructural investments
  35. 35. AccessScore would not change how people navigate the city; for that, we need a different tool…
  36. 36. NAVIGATION TOOLS ARE NOT ACCESSIBILITY AWARE
  37. 37. Routing for: Manual Wheelchair 1st of 3 Suggested Routes 16 minutes, 0.7 miles, 1 obstacle ! ! ! ! A B Route 1 Route 2 Surface Problem Avg Severity: 3.6 (Hard to Pass) Recent Comments: “Obstacle is passable in a manual chair but not in a motorized chair” Routing for: Manual Wheelchair A 1st of 3 Suggested Routes 16 minutes, 0.7 miles, 1 obstacle ! ACCESSIBILITY AWARE NAVIGATION SYSTEMS
  38. 38. Where is this data going to come from?
  39. 39. Safe Routes to School Walkability Audit Rock Hill, South Carolina Walkability Audit Wake County, North Carolina Walkability Audit Wake County, North Carolina TRADITIONAL WALKABILITY AUDITS
  40. 40. Safe Routes to School Walkability Audit Rock Hill, South Carolina Walkability Audit Wake County, North Carolina Walkability Audit Wake County, North Carolina TRADITIONAL WALKABILITY AUDITS
  41. 41. http://www1.nyc.gov/311/index.page MOBILE REPORTING SOLUTIONS
  42. 42. http://www1.nyc.gov/311/index.page MOBILE REPORTING SOLUTIONS
  43. 43. Similar to physical audits, these tools are built for in situ reporting and do not support remote, virtual inquiry—which limits scalability Not designed for accessibility data collection
  44. 44. MARK & FIND ACCESSIBLE BUSINESSES wheelmap.org axsmap.com
  45. 45. MARK & FIND ACCESSIBLE BUSINESSES wheelmap.org axsmap.com Focuses on businesses rather than streets & sidewalks Model is still to report on places you’ve visited
  46. 46. Our Approach: Use Google Street View (GSV) as a massive data source for scalably finding and characterizing street-level accessibility
  47. 47. HIGH-LEVEL RESEARCH QUESTIONS 1.Can we use Google Street View (GSV) to find street- level accessibility problems? 2.Can we create interactive systems to allow minimally trained crowdworkers to quickly and accurately perform remote audit tasks? 3.Can we use computer vision and machine learning to scale our approach?
  48. 48. ASSETS’12 Poster Feasibility study + labeling interface evaluation HCIC’13 Workshop Exploring early solutions to computer vision (CV) HCOMP’13 Poster 1st investigation of CV + crowdsourced verification CHI’13 Large-scale turk study + label validation with wheelchair users ASSETS’13 Applied to new domain: bus stop accessibility for visually impaired UIST’14 Crowdsourcing + CV + “smart” work allocation TOWARDS SCALABLE ACCESSIBILITY DATA COLLECTION
  49. 49. ASSETS’12 Poster Feasibility study + labeling interface evaluation HCIC’13 Workshop Exploring early solutions to computer vision (CV) HCOMP’13 Poster 1st investigation of CV + crowdsourced verification CHI’13 Large-scale turk study + label validation with wheelchair users ASSETS’13 Applied to new domain: bus stop accessibility for visually impaired UIST’14 Crowdsourcing + CV + “smart” work allocation TODAY’S TALK
  50. 50. ASSETS’12 Poster Feasibility study + labeling interface evaluation HCIC’13 Workshop Exploring early solutions to computer vision (CV) HCOMP’13 Poster 1st investigation of CV + crowdsourced verification CHI’13 Large-scale turk study + label validation with wheelchair users ASSETS’13 Applied to new domain: bus stop accessibility for visually impaired UIST’14 Crowdsourcing + CV + “smart” work allocation TODAY’S TALK
  51. 51. ASSETS’12 GOALS: 1.Investigate viability of reappropriating online map imagery to determine sidewalk accessibility via crowd workers 2.Examine the effect of three different interactive labeling interfaces on task accuracy and duration
  52. 52. WEB-BASED LABELING INTERFACE
  53. 53. WEB-BASED LABELING INTERFACE FOUR STEP PROCESS 1. Find and mark accessibility problem 2. Select problem category 3. Rate problem severity 4. Submit completed image
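A small, hypothetical sketch of the kind of record this four-step flow might produce per label; the field names and severity scale below are illustrative assumptions, not the study's actual data schema.

```python
# Hypothetical sketch of one crowd label produced by the four-step flow.
# Field names and the severity scale are illustrative, not the study's schema.
label = {
    "image_id": "gsv_pano_0042",    # which Street View image was shown
    "x": 312, "y": 455,             # pixel location marked in step 1
    "category": "Object in Path",   # problem category chosen in step 2
    "severity": 4,                  # severity rating from step 3 (e.g., 1-5)
    "worker_id": "turker_017",      # recorded when the image is submitted (step 4)
}
print(label["category"], label["severity"])
```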
  54. 54. WEB-BASED LABELING INTERFACE VIDEO Video shown to crowd workers before they labeled their first image http://youtu.be/aD1bx_SikGo
  55. 55. WEB-BASED LABELING INTERFACE VIDEO http://youtu.be/aD1bx_SikGo
  56. 56. THREE LABELING INTERFACES Point-and-click Rectangular Outline Polygonal Outline Pixel Granularity
  57. 57. Los Angeles DATASET: 100 IMAGES New York Baltimore Washington DC
  58. 58. DATASET BREAKDOWN 34 29 27 11 19 0 10 20 30 40 No Curb Ramp Surface Problem Object in Path Sidewalk Ending No Sidewalk Accessibility Issues Manually curated 100 images from urban neighborhoods in LA, Baltimore, Washington DC, and NYC
  59. 59. DATASET BREAKDOWN 34 29 27 11 19 0 10 20 30 40 No Curb Ramp Surface Problem Object in Path Sidewalk Ending No Sidewalk Accessibility Issues Manually curated 100 images from urban neighborhoods in LA, Baltimore, Washington DC, and NYC Used to evaluate false positive labeling activity
  60. 60. DATASET BREAKDOWN 34 29 27 11 19 0 10 20 30 40 No Curb Ramp Surface Problem Object in Path Sidewalk Ending No Sidewalk Accessibility Issues Manually curated 100 images from urban neighborhoods in LA, Baltimore, Washington DC, and NYC Used to evaluate false positive labeling activity This breakdown based on majority vote data from 3 independent researcher labels
  61. 61. Our ground truth process
  62. 62. What accessibility problems exist in this image?
  63. 63. R1 R2 R3 Researcher Label Table Object in Path Curb Ramp Missing Sidewalk Ending Surface Problem Other
  64. 64. Object in Path Curb Ramp Missing R1 R2 R3 Researcher Label Table
  65. 65. Object in Path Curb Ramp Missing R1 R2 R3 Researcher Label Table x2 Researcher 1
  66. 66. x4 Object in Path Curb Ramp Missing R1 R2 R3 Researcher Label Table Researcher 2
  67. 67. Researcher 3 Object in Path Curb Ramp Missing R1 R2 R3 Researcher Label Table x8
  68. 68. Researcher 1 Researcher 2 Researcher 3
  69. 69. There are multiple ways to examine the labels.
  70. 70. Object in Path Curb Ramp Missing R1 R2 R3 Researcher Label Table Image Level Analysis This table tells us what accessibility problems exist in the image
  71. 71. Pixel Level Analysis Labeled pixels tell us where the accessibility problems exist in the image.
  72. 72. Why do we care about image level vs. pixel level?
  73. 73. Coarse Precise Point Location Level Sub-block Level Block Level (Pixel Level) (Image Level)
  74. 74. Coarse Precise Point Location Level Sub-block Level Block Level (Pixel Level) (Image Level)
  75. 75. Coarse Precise Point Location Level Sub-block Level Block Level (Pixel Level) (Image Level) Pixel level labels could be used for training machine learning algorithms for detection and recognition tasks
  76. 76. Coarse Precise Localization Spectrum Point Location Level Sub-block Level Block Level Class Spectrum Multiclass Object in Path Curb Ramp Missing Prematurely Ending Sidewalk Surface Problem Binary Problem No Problem (Pixel Level) (Image Level) TWO ACCESSIBILITY PROBLEM SPECTRUMS Different ways of thinking about accessibility problem labels in GSV Coarse Precise
  77. 77. Object in Path Curb Ramp Missing R1 R2 R3 Researcher Label Table Problem Multiclass label Binary Label Sidewalk Ending Surface Problem Other
  78. 78. To produce a single ground truth dataset, we used majority vote.
  79. 79. R1 R2 R3 Maj. Vote Researcher Label Table Object in Path Curb Ramp Missing Sidewalk Ending Surface Problem Other
  80. 80. R1 R2 R3 Maj. Vote Researcher Label Table Object in Path Curb Ramp Missing Sidewalk Ending Surface Problem Other
  81. 81. R1 R2 R3 Maj. Vote Researcher Label Table Object in Path Curb Ramp Missing Sidewalk Ending Surface Problem Other
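A minimal sketch of the majority-vote step described above, collapsing three researchers' image-level category labels into a single ground-truth set (multiclass) plus a problem/no-problem flag (binary). The quorum rule and category names follow the slides; the exact aggregation code is an assumption.

```python
from collections import Counter

CATEGORIES = ["Object in Path", "Curb Ramp Missing",
              "Sidewalk Ending", "Surface Problem", "Other"]

def majority_vote(researcher_labels):
    """researcher_labels: list of per-researcher category sets for one image.
    A category enters ground truth if at least 2 of the 3 researchers marked it."""
    counts = Counter(cat for labels in researcher_labels for cat in set(labels))
    quorum = len(researcher_labels) // 2 + 1
    multiclass = {cat for cat in CATEGORIES if counts[cat] >= quorum}
    binary = "Problem" if multiclass else "No Problem"   # collapse to the binary spectrum
    return multiclass, binary

# Example: R1 and R2 mark a missing curb ramp; R3 marks nothing.
print(majority_vote([{"Curb Ramp Missing"}, {"Curb Ramp Missing"}, set()]))
# -> ({'Curb Ramp Missing'}, 'Problem')
```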
  82. 82. ASSETS’12 MTURK STUDY METHOD Independently posted 3 labeling interfaces to MTurk. Crowdworkers could work with only one interface. For training, turkers required to watch first 1.5 mins of 3-min instructional video. Hired ~7 workers per image to explore avg accuracy Turkers paid ~3-5 cents per HIT. We varied number of images/HIT from 1-10.
  83. 83. ASSETS’12 MTURK DESCRIPTIVE RESULTS Hired 132 unique workers Worked on 2,325 assignments Provided a total of 4,309 labels (AVG=1.9/image)
  84. 84. MAIN FINDINGS: IMAGE-LEVEL ANALYSIS 0% 20% 40% 60% 80% 100% Point-and-click Outline Rectangle AVERAGE ACCURACY Higher is better 0 10 20 30 40 50 Point-and-click Outline Rectangle MEDIAN TASK TIME (SECS) Lower is better
  85. 85. MAIN FINDINGS: IMAGE-LEVEL ANALYSIS 83.0% 82.6% 79.2% 0% 20% 40% 60% 80% 100% Point-and-click Outline Rectangle AVERAGE ACCURACY All three interfaces performed similarly. This is without quality control. 0 10 20 30 40 50 Point-and-click Outline Rectangle MEDIAN TASK TIME (SECS) Higher is better Lower is better
  86. 86. MAIN FINDINGS: IMAGE-LEVEL ANALYSIS 83.0% 82.6% 79.2% 0% 20% 40% 60% 80% 100% Point-and-click Outline Rectangle 32.9 41.5 43.3 0 10 20 30 40 50 Point-and-click Outline Rectangle AVERAGE ACCURACY MEDIAN TASK TIME (SECS) All three interfaces performed similarly. This is without quality control. Point-and-click is the fastest; 26% faster than Outline & 32% faster than Rectangle Higher is better Lower is better
  87. 87. ASSETS’12 CONTRIBUTIONS: 1.Demonstrated that minimally trained crowd workers could locate and categorize sidewalk accessibility problems in GSV images with > 80% accuracy 2.Showed that point-and-click fastest labeling interface but that outline faster than rectangle
  88. 88. ASSETS’12 Poster Feasibility study + labeling interface evaluation HCIC’13 Workshop Exploring early solutions to computer vision (CV) HCOMP’13 Poster 1st investigation of CV + crowdsourced verification CHI’13 Large-scale turk study + label validation with wheelchair users ASSETS’13 Applied to new domain: bus stop accessibility for visually impaired UIST’14 Crowdsourcing + CV + “smart” work allocation TODAY’S TALK
  89. 89. ASSETS’12 Poster Feasibility study + labeling interface evaluation HCIC’13 Workshop Exploring early solutions to computer vision (CV) HCOMP’13 Poster 1st investigation of CV + crowdsourced verification CHI’13 Large-scale turk study + label validation with wheelchair users ASSETS’13 Applied to new domain: bus stop accessibility for visually impaired UIST’14 Crowdsourcing + CV + “smart” work allocation TODAY’S TALK
  90. 90. CHI’13 GOALS: 1.Expand ASSETS’12 study with larger sample. •Examine accuracy as function of turkers/image •Evaluate quality control mechanisms •Gain qualitative understanding of failures/successes 2.Validate researcher ground truth with labels from three wheelchair users
  91. 91. Los Angeles DATASET: EXPANDED TO 229 IMAGES New York Baltimore Washington DC
  92. 92. CHI’13 GOALS: 1.Expand ASSETS’12 study with larger sample. •Examine accuracy as function of turkers/image •Evaluate quality control mechanisms •Gain qualitative understanding of failures/successes 2.Validate researcher ground truth with labels from three wheelchair users
  93. 93. GROUND TRUTH: MAJORITY VOTE 3 RESEARCHER LABELS How “good” is our ground truth?
  94. 94. IN-LAB STUDY METHOD Three wheelchair participants independently labeled 75 of the 229 GSV images. Used a think-aloud protocol; sessions were video recorded. 30-minute post-study interview. We used Fleiss’ kappa to measure agreement between wheelchair users and researchers
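A hedged sketch of the agreement computation named above, assuming each rater contributes one multiclass code per image and that statsmodels' Fleiss' kappa implementation is acceptable; the study's exact coding scheme may differ.

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# rows = images, columns = raters, values = category codes
# (e.g., 0 = No Problem, 1 = Curb Ramp Missing, 2 = Object in Path, ...)
ratings = np.array([
    [1, 1, 1],
    [2, 2, 0],
    [0, 0, 0],
    [3, 3, 3],
])

table, _ = aggregate_raters(ratings)        # images x categories count table
print(fleiss_kappa(table, method="fleiss")) # kappa in [-1, 1]; ~0.74 reported in the study
```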
  95. 95. Here is an example recording from the study session
  96. 96. IN-LAB STUDY RESULTS Strong agreement (κmulticlass=0.74) between wheelchair participants and researcher labels (ground truth) In interviews, one participant mentioned using GSV to explore areas prior to travel
  97. 97. CHI’13 GOALS: 1.Expand ASSETS’12 study with larger sample. •Examine accuracy as function of turkers/image •Evaluate quality control mechanisms •Gain qualitative understanding of failures/successes 2.Validate researcher ground truth with labels from three wheelchair users
  98. 98. CHI’13 GOALS: 1.Expand ASSETS’12 study with larger sample. •Examine accuracy as function of turkers/image •Evaluate quality control mechanisms •Gain qualitative understanding of failures/successes 2.Validate researcher ground truth with labels from three wheelchair users
  99. 99. CHI’13 MTURK STUDY METHOD Similar to ASSETS’12 but more images (229 vs. 100) and more turkers (185 vs. 132) Added crowd verification quality control Recruited 28+ turkers per image to investigate accuracy as function of workers
  100. 100. University of Maryland: Help make our sidewalks more accessible for wheelchair users with Google Maps Kotaro Hara Timer: 00:07:00 of 3 hours 10 3 hours Labeling Interface
  101. 101. Kotaro Hara Timer: 00:07:00 of 3 hours University of Maryland: Help make our sidewalks more accessible for wheelchair users with Google Maps 3 hours 10 Verification Interface
  102. 102. Kotaro Hara Timer: 00:07:00 of 3 hours University of Maryland: Help make our sidewalks more accessible for wheelchair users with Google Maps 3 hours 10 Verification Interface
  103. 103. CHI’13 MTURK LABELING STATS Hired 185 unique workers Worked on 7,517 labeling tasks (AVG=40.6/turker) Provided a total of 13,379 labels (AVG=1.8/image) Hired 273 unique workers Provided a total of 19,189 verifications CHI’13 MTURK VERIFICATION STATS
  104. 104. CHI’13 MTURK LABELING STATS Hired 185 unique workers Worked on 7,517 labeling tasks (AVG=40.6/turker) Provided a total of 13,379 labels (AVG=1.8/image) CHI’13 MTURK VERIFICATION STATS Hired 273 unique workers Provided a total of 19,189 verifications Median image labeling time vs. verification time: 35.2s vs. 10.5s
  105. 105. CHI’13 MTURK KEY FINDINGS 81% accuracy without quality control 93% accuracy with quality control
  106. 106. Some turker labeling successes...
  107. 107. TURKER LABELING EXAMPLES Curb Ramp Missing
  108. 108. TURKER LABELING EXAMPLES Curb Ramp Missing
  109. 109. TURKER LABELING EXAMPLES Object in Path
  110. 110. TURKER LABELING EXAMPLES Object in Path
  111. 111. TURKER LABELING EXAMPLES Prematurely Ending Sidewalk
  112. 112. TURKER LABELING EXAMPLES Prematurely Ending Sidewalk
  113. 113. TURKER LABELING EXAMPLES Surface Problems
  114. 114. TURKER LABELING EXAMPLES Surface Problems
  115. 115. TURKER LABELING EXAMPLES Surface Problems Object in Path
  116. 116. TURKER LABELING EXAMPLES Surface Problems Object in Path
  117. 117. And now some turker failures…
  118. 118. TURKER LABELING ISSUES Overlabeling Some Turkers Prone to High False Positives No Curb Ramp
  119. 119. No Curb Ramp TURKER LABELING ISSUES Overlabeling Some Turkers Prone to High False Positives Incorrect Object in Path label. Stop sign is in grass.
  120. 120. TURKER LABELING ISSUES Overlabeling Some Turkers Prone to High False Positives Surface Problems
  121. 121. TURKER LABELING ISSUES Overlabeling Some Turkers Prone to High False Positives Surface Problems Tree not actually an obstacle
  122. 122. TURKER LABELING ISSUES Overlabeling Some Turkers Prone to High False Positives No problems in this image
  123. 123. TURKER LABELING ISSUES Overlabeling Some Turkers Prone to High False Positives
  124. 124. T1 T2 T3 Maj. Vote 3 Turker Majority Vote Label Object in Path Curb Ramp Missing Sidewalk Ending Surface Problem Other T3 provides a label of low quality
  125. 125. To look into the effect of turker majority vote on accuracy, we had 28 turkers label each image
  126. 126. 28 groups of 1: We had 28 turkers label each image:
  127. 127. 28 groups of 1: We had 28 turkers label each image: 9 groups of 3:
  128. 128. 28 groups of 1: We had 28 turkers label each image: 9 groups of 3: 5 groups of 5:
  129. 129. 28 groups of 1: We had 28 turkers label each image: 9 groups of 3: 5 groups of 5:
  130. 130. 28 groups of 1: We had 28 turkers label each image: 9 groups of 3: 5 groups of 5: 4 groups of 7: 3 groups of 9:
  131. 131. 78.3% 83.8% 86.8% 86.6% 87.9% 50% 60% 70% 80% 90% 100% 1 turker (N=28) 3 turkers (N=9) 5 turkers (N=5) 7 turkers (N=4) 9 turkers (N=3) Average Image-level Accuracy (%) Error bars: standard error Image-Level Accuracy Object in Path Curb Ramp Missing Sidewalk Ending Surface Problem 4 Labels Multiclass Accuracy
  132. 132. 78.3% 83.8% 86.8% 86.6% 87.9% 50% 60% 70% 80% 90% 100% 1 turker (N=28) 3 turkers (N=9) 5 turkers (N=5) 7 turkers (N=4) 9 turkers (N=3) Average Image-level Accuracy (%) Error bars: standard error Image-Level Accuracy Multiclass Accuracy Object in Path Curb Ramp Missing Sidewalk Ending Surface Problem 4 Labels
  133. 133. 78.3% 83.8% 86.8% 86.6% 87.9% 50% 60% 70% 80% 90% 100% 1 turker (N=28) 3 turkers (N=9) 5 turkers (N=5) 7 turkers (N=4) 9 turkers (N=3) Average Image-level Accuracy (%) Error bars: standard error Image-Level Accuracy Multiclass Accuracy Object in Path Curb Ramp Missing Sidewalk Ending Surface Problem 4 Labels Accuracy saturates after 5 turkers
  134. 134. 78.3% 83.8% 86.8% 86.6% 87.9% 50% 60% 70% 80% 90% 100% 1 turker (N=28) 3 turkers (N=9) 5 turkers (N=5) 7 turkers (N=4) 9 turkers (N=3) Average Image-level Accuracy (%) Error bars: standard error Image-Level Accuracy Multiclass Accuracy Object in Path Curb Ramp Missing Sidewalk Ending Surface Problem 4 Labels Stderr: 0.2% Stderr=0.2%
  135. 135. 78.3% 83.8% 86.8% 86.6% 87.9% 50% 60% 70% 80% 90% 100% 1 turker (N=28) 3 turkers (N=9) 5 turkers (N=5) 7 turkers (N=4) 9 turkers (N=3) Average Image-level Accuracy (%) Error bars: standard error Image-Level Accuracy Multiclass Accuracy Object in Path Curb Ramp Missing Sidewalk Ending Surface Problem 4 Labels Binary Accuracy
  136. 136. 78.3% 83.8% 86.8% 86.6% 87.9% 50% 60% 70% 80% 90% 100% 1 turker (N=28) 3 turkers (N=9) 5 turkers (N=5) 7 turkers (N=4) 9 turkers (N=3) Average Image-level Accuracy (%) Error bars: standard error Image-Level Accuracy Multiclass Accuracy Object in Path Curb Ramp Missing Sidewalk Ending Surface Problem 4 Labels Binary Accuracy 1 Label Object in Path Curb Ramp Missing Sidewalk Ending Surface Problem Problem
  137. 137. 78.3% 83.8% 86.8% 86.6% 87.9% 80.6% 86.9% 89.7% 90.6% 90.2% 50% 60% 70% 80% 90% 100% 1 turker (N=28) 3 turkers (N=9) 5 turkers (N=5) 7 turkers (N=4) 9 turkers (N=3) Average Image-level Accuracy (%) Error bars: standard error Image-Level Accuracy Multiclass Accuracy Object in Path Curb Ramp Missing Sidewalk Ending Surface Problem 4 Labels Binary Accuracy 1 Label Problem
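A sketch of how the grouping analysis above could be computed: partition the 28 labels per image into disjoint groups of size k (28 groups of 1, 9 of 3, 5 of 5, 4 of 7, 3 of 9), take a majority vote within each group, and score the groups against ground truth. The helper below is an illustration, not the study's analysis code.

```python
import random

def group_accuracy(turker_labels, ground_truth, k):
    """turker_labels: 28 binary labels (True = 'problem') for one image.
    Partition them into disjoint groups of size k, majority-vote within each
    group, and return the fraction of groups that match ground_truth."""
    labels = turker_labels[:]
    random.shuffle(labels)
    usable = (len(labels) // k) * k          # yields 28, 9, 5, 4, 3 groups for k = 1, 3, 5, 7, 9
    groups = [labels[i:i + k] for i in range(0, usable, k)]
    votes = [sum(g) > k / 2 for g in groups]
    return sum(v == ground_truth for v in votes) / len(groups)

# Example: 20 of 28 turkers say "problem" and ground truth is "problem".
labels = [True] * 20 + [False] * 8
print(group_accuracy(labels, True, 3))   # accuracy of 3-turker majority-vote groups
```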
  138. 138. 81.2% 85.8% 88.1% 89.3% 91.8% 92.7% 90.7% 50% 60% 70% 80% 90% 100% EVALUATING QUALITY CONTROL MECHANISMS Image-Level, Binary Classification 1 labeler 1 labeler, 3 verifiers (majority vote) 1 labeler, 3 verifiers (zero tolerance) 3 labelers (majority vote) 3 labelers (majority vote) 3 verifiers (majority vote) 3 labelers (majority vote) 3 verifiers (zero tolerance) 5 labelers (majority vote)
  139. 139. 81.2% 85.8% 88.1% 89.3% 91.8% 92.7% 90.7% 50% 60% 70% 80% 90% 100% EVALUATING QUALITY CONTROL MECHANISMS Image-Level, Binary Classification 1 labeler 1 labeler, 3 verifiers (majority vote) 1 labeler, 3 verifiers (zero tolerance) 3 labelers (majority vote) 3 labelers (majority vote) 3 verifiers (majority vote) 3 labelers (majority vote) 3 verifiers (zero tolerance) 5 labelers (majority vote) 3 labelers + 3 verifiers = 93%
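A small sketch of the two verification policies compared above, under the assumption that "zero tolerance" means a single verifier rejection discards the label, while "majority vote" keeps it if most verifiers accept it.

```python
def keep_label(verifier_votes, policy="majority"):
    """verifier_votes: booleans, True = verifier agreed the label is correct.
    'majority' keeps the label if most verifiers agree; 'zero tolerance'
    (as interpreted here) drops the label if any single verifier rejects it."""
    if policy == "majority":
        return sum(verifier_votes) > len(verifier_votes) / 2
    if policy == "zero tolerance":
        return all(verifier_votes)
    raise ValueError(policy)

print(keep_label([True, True, False], "majority"))        # True: 2 of 3 agreed
print(keep_label([True, True, False], "zero tolerance"))  # False: one rejection vetoes
```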
  140. 140. CHI’13 CONTRIBUTIONS: 1.Extended and reaffirmed findings from ASSETS’12 about viability of GSV and crowd work for locating and categorizing accessibility problems 2.Validated our ground truth labeling approach 3.Assessed simple quality control approaches
  141. 141. ASSETS’12 Poster Feasibility study + labeling interface evaluation HCIC’13 Workshop Exploring early solutions to computer vision (CV) HCOMP’13 Poster 1st investigation of CV + crowdsourced verification CHI’13 Large-scale turk study + label validation with wheelchair users ASSETS’13 Applied to new domain: bus stop accessibility for visually impaired UIST’14 Crowdsourcing + CV + “smart” work allocation TODAY’S TALK
  142. 142. ASSETS’12 Poster Feasibility study + labeling interface evaluation HCIC’13 Workshop Exploring early solutions to computer vision (CV) HCOMP’13 Poster 1st investigation of CV + crowdsourced verification CHI’13 Large-scale turk study + label validation with wheelchair users ASSETS’13 Applied to new domain: bus stop accessibility for visually impaired UIST’14 Crowdsourcing + CV + “smart” work allocation TODAY’S TALK
  143. 143. All of the approaches so far relied purely on manual labor, which limits scalability
  144. 144. & Manual Labor Computation
  145. 145. Automatic Workflow Adaptation for Crowdsourcing Lin et al. 2012; Dai et al. 2011 ; Kamar et al. 2012
  146. 146. Computer Vision & Streetview Goodfellow et al., 2014; Chu et al., 2014; Naik et al., 2014
  147. 147. Tohme 遠目 Remote Eye ・
  148. 148. svCrawl Web Scraper Dataset svDetect Automatic Curb Ramp Detection svControl Automatic Task Allocation svVerify Manual Label Verification svLabel Manual Labeling Tohme 遠目 Remote Eye ・ Design Principles 1.Computer vision is cheap (zero cost) 2.Manual verification is far cheaper than manual labeling 3.Automatic curb ramp detection is hard and error prone 4.Fixing a false positive is easy, fixing a false negative is hard (requires manual labeling).
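A minimal sketch of how the design principles above translate into task routing; the helper functions and threshold below are stand-ins, not Tohme's actual API.

```python
# Minimal sketch of Tohme's routing logic, following the stated design principles.
# Helper functions here are stubs, not the system's actual components.

def run_cv_detector(scene):                    # svDetect: CV output is essentially free
    return scene.get("detections", [])

def predict_missed_ramps(scene, detections):   # svControl: are false negatives likely?
    return scene.get("difficulty", 0.0) > 0.5  # e.g., learned from scene features

def process_scene(scene):
    detections = run_cv_detector(scene)
    if predict_missed_ramps(scene, detections):
        # False negatives can't be fixed by verification, so pay for full labeling.
        return ("svLabel", scene)
    # Verification is far cheaper than labeling and can remove false positives.
    return ("svVerify", detections)

print(process_scene({"detections": ["ramp@(120,300)"], "difficulty": 0.8}))  # -> svLabel
print(process_scene({"detections": ["ramp@(120,300)"], "difficulty": 0.2}))  # -> svVerify
```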
  149. 149. The “lack of curb cuts is a primary obstacle to the smooth integration of those with disabilities into the commerce of daily life.” Kinney et al. vs. Yerusalim & Hoskins, 1993 3rd Circuit Court of Appeals
  150. 150. “Without curb cuts, people with ambulatory disabilities simply cannot navigate the city” Kinney et al. vs. Yerusalim & Hoskins, 1993 3rd Circuit Court of Appeals
  151. 151. Dataset svDetect Automatic Curb Ramp Detection svCrawl Web Scraper Tohme 遠目 Remote Eye ・
  152. 152. svCrawl Web Scraper svDetect Automatic Curb Ramp Detection Tohme 遠目 Remote Eye ・ Curb Ramp Detection on Street View image False positives False negatives = missed curb ramps
  153. 153. svCrawl Web Scraper Dataset svDetect Automatic Curb Ramp Detection svControl Automatic Task Allocation Tohme 遠目 Remote Eye ・
  154. 154. svCrawl Web Scraper Dataset svDetect Automatic Curb Ramp Detection svControl Automatic Task Allocation svVerify Manual Label Verification Tohme 遠目 Remote Eye ・
  155. 155. svCrawl Web Scraper Dataset svDetect Automatic Curb Ramp Detection svControl Automatic Task Allocation svVerify Manual Label Verification Tohme 遠目 Remote Eye ・ svVerify can only fix false positives, not false negatives! That is, there is no way for a worker to add new labels at this stage!
  156. 156. svCrawl Web Scraper Dataset svDetect Automatic Curb Ramp Detection svControl Automatic Task Allocation svVerify Manual Label Verification svLabel Manual Labeling Tohme 遠目 Remote Eye ・
  157. 157. svCrawl Web Scraper Dataset svDetect Automatic Curb Ramp Detection svControl Automatic Task Allocation svVerify Manual Label Verification svLabel Manual Labeling Tohme 遠目 Remote Eye ・ .
  158. 158. svCrawl Web Scraper Dataset svDetect Automatic Curb Ramp Detection svControl Automatic Task Allocation svVerify Manual Label Verification svLabel Manual Labeling Tohme 遠目 Remote Eye ・
  159. 159. svCrawl Web Scraper Dataset svDetect Automatic Curb Ramp Detection svVerify Manual Label Verification svLabel Manual Labeling Tohme 遠目 Remote Eye ・ Complexity: Cardinality: Depth: CV: 0.14 0.33 0.21 0.22
  160. 160. svVerify Manual Label Verification Tohme 遠目 Remote Eye ・ Complexity: Cardinality: Depth: CV: 0.14 0.33 0.21 0.22 Predict presence of false negatives with linear SVM and Lasso regression svCrawl Web Scraper Dataset svLabel Manual Labeling
  161. 161. svCrawl Web Scraper Dataset svDetect Automatic Curb Ramp Detection svVerify Manual Label Verification Tohme 遠目 Remote Eye ・ Complexity: Cardinality: Depth: CV: 0.82 0.25 0.96 0.54
  162. 162. svCrawl Web Scraper Dataset svDetect Automatic Curb Ramp Detection svControl Automatic Task Allocation svVerify Manual Label Verification svLabel Manual Labeling Tohme 遠目 Remote Eye ・
  163. 163. svCrawl Web Scraper Dataset svDetect Automatic Curb Ramp Detection svControl Automatic Task Allocation svVerify Manual Label Verification svLabel Manual Labeling Tohme 遠目 Remote Eye ・
  164. 164. Google Street View Intersection Panoramas and GIS Metadata 3D Point-cloud Data Top-down Google Maps Imagery Scraper & Dataset
  165. 165. Saskatoon Los Angeles Baltimore Washington D.C. Washington D.C. Baltimore Los Angeles Saskatoon
  166. 166. Washington D.C. Dense urban area Semi-urban residential areas Scraper & Dataset
  167. 167. Washington D.C. Baltimore Los Angeles Saskatoon * At the time of downloading data in summer 2013 Scraper & Dataset Total Area: 11.3 km2 Intersections: 1,086 Curb Ramps: 2,877 Missing Curb Ramps: 647 Avg. GSV Data Age: 2.2 yrs
  168. 168. How well does GSV data reflect the current state of the physical world?
  169. 169. Vs.
  170. 170. Washington D.C. Baltimore Physical Audit Areas GSV and Physical World > 97.7% agreement 273 Intersections Dataset | Validating Dataset Small disagreement due to construction.
  171. 171. Washington D.C. Baltimore Physical Audit Areas 273 Intersections > 97.7% agreement Dataset Key Takeaway Google Street View is a viable source of curb ramp data
  172. 172. svCrawl Web Scraper Dataset svDetect Automatic Curb Ramp Detection svControl Automatic Task Allocation svVerify Manual Label Verification svLabel Manual Labeling Tohme 遠目 Remote Eye ・
  173. 173. svCrawl Web Scraper Dataset svDetect Automatic Curb Ramp Detection svControl Automatic Task Allocation svVerify Manual Label Verification svLabel Manual Labeling Tohme 遠目 Remote Eye ・
  174. 174. AUTOMATIC CURB RAMP DETECTION 1.Curb Ramp Detection 2.Post-Processing Output 3.SVM-Based Classification
  175. 175. Deformable Part Models Felzenszwalb et al. 2008 Automatic Curb Ramp Detection http://www.cs.berkeley.edu/~rbg/latent/
  176. 176. Deformable Part Models Felzenszwalb et al. 2008 Automatic Curb Ramp Detection http://www.cs.berkeley.edu/~rbg/latent/ Root filter Parts filter Displacement cost
  177. 177. Automatic Curb Ramp Detection Multiple redundant detection boxes Detected Labels Stage 1: Deformable Part Model Correct 1 False Positive 12 Miss 0
  178. 178. Automatic Curb Ramp Detection Curb ramps shouldn’t be in the sky or on roofs Correct 1 False Positive 12 Miss 0 Detected Labels Stage 1: Deformable Part Model
  179. 179. Automatic Curb Ramp Detection Detected Labels Stage 2: Post-processing
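One plausible implementation of the post-processing stage shown above: greedy non-maximum suppression to merge redundant detection boxes, plus a position filter that drops boxes above an estimated horizon (sky/roof detections). The IoU threshold and horizon cutoff are assumptions.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / float(union)

def postprocess(boxes, scores, horizon_y, iou_thresh=0.5):
    """Greedy non-maximum suppression to merge redundant detections, plus a
    position filter that drops boxes ending above the horizon (sky/roofs)."""
    keep = []
    for i in np.argsort(scores)[::-1]:          # strongest detections first
        if boxes[i][3] < horizon_y:             # box bottom above horizon: discard
            continue
        if all(iou(boxes[i], boxes[j]) < iou_thresh for j in keep):
            keep.append(i)
    return [boxes[i] for i in keep]

print(postprocess([(10, 300, 60, 350), (12, 302, 62, 352), (100, 20, 150, 70)],
                  [0.9, 0.8, 0.7], horizon_y=200))
```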
  180. 180. Automatic Curb Ramp Detection Detected Labels Stage 3: SVM-based Refinement Filter out labels based on their size, color, and position. Correct 1 False Positive 5 Miss 0
  181. 181. Automatic Curb Ramp Detection Correct 1 False Positive 3 Miss 0 Detected Labels Stage 3: SVM-based Refinement
  182. 182. Automatic Curb Ramp Detection Correct 6 False Positive 11 Miss 1 Detected Labels Stage 1: Deformable Part Model
  183. 183. Automatic Curb Ramp Detection Correct 6 False Positive 6 Miss 1 Detected Labels Stage 2: Post-processing
  184. 184. Automatic Curb Ramp Detection Correct 6 False Positive 4 Miss 1 Detected Labels Stage 3: SVM-based Refinement
  185. 185. Some curb ramps never get detected False positive detections Automatic Curb Ramp Detection Correct 6 False Positive 4 Miss 1
  186. 186. Some curb ramps never get detected False positive detections Automatic Curb Ramp Detection Correct 6 False Positive 4 Miss 1 These false negatives are expensive to correct!
  187. 187. Used two-fold cross validation to evaluate CV sub-system
  188. 188. 0% 20% 40% 60% 80% 100% 0% 20% 40% 60% 80% 100% Precision (%) Recall (%) Automatic Curb Ramp Detection COMPUTER VISION SUB-SYSTEM RESULTS Precision: higher means fewer false positives. Recall: higher means fewer false negatives.
  189. 189. 0% 20% 40% 60% 80% 100% 0% 20% 40% 60% 80% 100% Precision (%) Recall (%) Automatic Curb Ramp Detection COMPUTER VISION SUB-SYSTEM RESULTS Goal: maximize area under curve
  190. 190. 0% 20% 40% 60% 80% 100% 0% 20% 40% 60% 80% 100% Precision (%) Recall (%) Stage 1: DPM Stage 2: Post-Processing Stage 3: SVM Automatic Curb Ramp Detection COMPUTER VISION SUB-SYSTEM RESULTS
  191. 191. 0% 20% 40% 60% 80% 100% 0% 20% 40% 60% 80% 100% Precision (%) Recall (%) Stage 1: DPM Stage 2: Post-Processing Stage 3: SVM Automatic Curb Ramp Detection COMPUTER VISION SUB-SYSTEM RESULTS
  192. 192. 0% 20% 40% 60% 80% 100% 0% 20% 40% 60% 80% 100% Precision (%) Recall (%) Stage 1: DPM Stage 2: Post-Processing Stage 3: SVM Automatic Curb Ramp Detection COMPUTER VISION SUB-SYSTEM RESULTS More than 20% of curb ramps were missed
  193. 193. 0% 20% 40% 60% 80% 100% 0% 20% 40% 60% 80% 100% Precision (%) Recall (%) Stage 1: DPM Stage 2: Post-Processing Stage 3: SVM Automatic Curb Ramp Detection COMPUTER VISION SUB-SYSTEM RESULTS Confidence threshold of -0.99, which results in 26% precision and 67% recall
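For reference, precision and recall at a given confidence threshold can be computed as sketched below; sweeping the threshold traces out the curve, and the operating point reported above (threshold -0.99) corresponds to roughly 26% precision and 67% recall.

```python
def precision_recall(detections, num_ground_truth, threshold):
    """detections: list of (confidence, is_true_positive) pairs from the detector.
    Keep detections at or above the confidence threshold and score them against
    the known number of ground-truth curb ramps in the evaluation set."""
    kept = [is_tp for conf, is_tp in detections if conf >= threshold]
    tp = sum(kept)
    fp = len(kept) - tp
    fn = num_ground_truth - tp
    precision = tp / (tp + fp) if kept else 0.0
    recall = tp / (tp + fn) if num_ground_truth else 0.0
    return precision, recall

print(precision_recall([(0.4, True), (-0.5, True), (-1.2, False)], 3, -0.99))
```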
  194. 194. Occlusion Illumination Scale Viewpoint Variation Structures Similar to Curb Ramps Curb Ramp Design Variation Automatic Curb Ramp Detection CURB RAMP DETECTION IS A HARD PROBLEM
  195. 195. Can we predict difficult intersections & CV performance?
  196. 196. svCrawl Web Scraper Dataset svDetect Automatic Curb Ramp Detection svControl Automatic Task Allocation svVerify Manual Label Verification svLabel Manual Labeling Tohme 遠目 Remote Eye ・
  197. 197. svCrawl Web Scraper Dataset svDetect Automatic Curb Ramp Detection svControl Automatic Task Allocation svVerify Manual Label Verification svLabel Manual Labeling Tohme 遠目 Remote Eye ・
  198. 198. Automatic Task Allocation | Features to Assess Scene Difficulty for CV Number of connected streets from metadata Depth information for intersection complexity analysis Top-down images to assess complexity of an intersection Number of detections and confidence values
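A hedged sketch of how these scene-difficulty features could feed the linear SVM and Lasso predictors mentioned earlier; the feature vectors, labels, and hyperparameters below are illustrative, not the paper's training setup.

```python
# Illustrative sketch of svControl's predictor: scene-difficulty features feed
# linear models that flag scenes where the CV stage likely missed curb ramps.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.linear_model import Lasso

# Per-scene features: [connected streets, depth-based intersection complexity,
# top-down image complexity, number of CV detections, mean detection confidence]
X_train = np.array([
    [4, 0.82, 0.25, 9, -0.2],
    [3, 0.14, 0.33, 3,  0.5],
    [5, 0.96, 0.54, 7, -0.8],
    [3, 0.21, 0.22, 2,  0.4],
])
y_missed = np.array([1, 0, 1, 0])   # 1 = CV likely produced false negatives

clf = LinearSVC(C=1.0, max_iter=10000).fit(X_train, y_missed)
reg = Lasso(alpha=0.1).fit(X_train, y_missed)   # e.g., regress on missed-ramp likelihood

scene = np.array([[4, 0.6, 0.3, 5, -0.1]])
route_to_labeling = clf.predict(scene)[0] == 1 or reg.predict(scene)[0] > 0.5
print("svLabel" if route_to_labeling else "svVerify")
```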
  199. 199. svCrawl Web Scraper Dataset svDetect Automatic Curb Ramp Detection svControl Automatic Task Allocation svVerify Manual Label Verification svLabel Manual Labeling Tohme 遠目 Remote Eye ・
  200. 200. svCrawl Web Scraper Dataset svDetect Automatic Curb Ramp Detection svControl Automatic Task Allocation svVerify Manual Label Verification svLabel Manual Labeling Tohme 遠目 Remote Eye ・
  201. 201. 3x Manual Labeling | Labeling Interface
  202. 202. svCrawl Web Scraper Dataset svDetect Automatic Curb Ramp Detection svControl Automatic Task Allocation svVerify Manual Label Verification svLabel Manual Labeling Tohme 遠目 Remote Eye ・
  203. 203. svCrawl Web Scraper Dataset svDetect Automatic Curb Ramp Detection svControl Automatic Task Allocation svVerify Manual Label Verification svLabel Manual Labeling Tohme 遠目 Remote Eye ・
  204. 204. 2x Manual Label Verification
  205. 205. 2x Manual Label Verification
  206. 206. Automatic Detection and Manual Verification Automatic Task Allocation Can Tohme achieve equivalent or better accuracy at a lower time cost compared to a completely manual approach?
  207. 207. STUDY METHOD: CONDITIONS Manual labeling without smart task allocation & vs. CV + Verification without smart task allocation Tohme 遠目 Remote Eye ・ vs. Evaluation
  208. 208. Accuracy Task Completion Time Evaluation STUDY METHOD: MEASURES
  209. 209. Recruited workers from MTurk Used 1,046 GSV images (40 used for golden insertion) Evaluation STUDY METHOD: APPROACH
  210. 210. Evaluation | RESULTS Labeling tasks: 242 distinct turkers, 1,270 HITs completed, 6,350 tasks completed, 769 tasks allocated. Verification tasks: 161 distinct turkers, 582 HITs completed, 4,820 tasks completed, 277 tasks allocated. We used Monte Carlo simulations for evaluation
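A toy sketch of the Monte Carlo idea referenced above: repeatedly resample which worker response counts for each scene and average the resulting accuracy. This illustrates the concept only; the paper's simulation procedure may differ.

```python
import random

def monte_carlo_accuracy(responses_per_scene, ground_truth, runs=1000):
    """responses_per_scene[i]: list of worker answers collected for scene i.
    Each run samples one answer per scene and scores it against ground truth;
    the average over runs estimates expected accuracy."""
    accuracies = []
    for _ in range(runs):
        correct = sum(random.choice(answers) == truth
                      for answers, truth in zip(responses_per_scene, ground_truth))
        accuracies.append(correct / len(ground_truth))
    return sum(accuracies) / runs

print(monte_carlo_accuracy([["ramp", "ramp", "no ramp"], ["no ramp", "no ramp"]],
                           ["ramp", "no ramp"]))
```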
  211. 211. 84% 88% 86% 0% 20% 40% 60% 80% 100% Accuracy Measures (%) Precision Recall F-measure Manual Labeling CV and Manual Verification & 94 0 20 40 60 80 100 Task Completion Time / Scene (s) Manual Labeling CV and Manual Verification & Tohme 遠目 Remote Eye ・ Tohme 遠目 Remote Eye ・ Evaluation | Labeling Accuracy and Time Cost Error bars are standard deviations. ACCURACY COST (TIME)
  212. 212. 84% 68% 88% 58% 86% 63% 0% 20% 40% 60% 80% 100% Accuracy Measures (%) Precision Recall F-measure Manual Labeling CV and Manual Verification & 94 42 0 20 40 60 80 100 Task Completion Time / Scene (s) Manual Labeling CV and Manual Verification & Tohme 遠目 Remote Eye ・ Tohme 遠目 Remote Eye ・ Evaluation | Labeling Accuracy and Time Cost Error bars are standard deviations. ACCURACY COST (TIME)
  213. 213. 84% 68% 83% 88% 58% 86% 86% 63% 84% 0% 20% 40% 60% 80% 100% Accuracy Measures (%) Precision Recall F-measure Manual Labeling CV and Manual Verification & 94 42 81 0 20 40 60 80 100 Task Completion Time / Scene (s) Manual Labeling CV and Manual Verification & Tohme 遠目 Remote Eye ・ Tohme 遠目 Remote Eye ・ Evaluation | Labeling Accuracy and Time Cost Error bars are standard deviations. ACCURACY COST (TIME)
  214. 214. 84% 68% 83% 88% 58% 86% 86% 63% 84% 0% 20% 40% 60% 80% 100% Accuracy Measures (%) Precision Recall F-measure Manual Labeling CV and Manual Verification & 94 42 81 0 20 40 60 80 100 Task Completion Time / Scene (s) Manual Labeling CV and Manual Verification & Tohme 遠目 Remote Eye ・ Tohme 遠目 Remote Eye ・ Evaluation | Labeling Accuracy and Time Cost Error bars are standard deviations. 13% reduction in cost ACCURACY COST (TIME)
  215. 215. svControl Automatic Task Allocation svVerify Manual Label Verification svLabel Manual Labeling Evaluation | Smart Task Allocator ~80% of svVerify tasks were correctly routed ~50% of svLabel tasks were correctly routed
  216. 216. svControl Automatic Task Allocation svVerify Manual Label Verification svLabel Manual Labeling Evaluation | Smart Task Allocator If svControl worked perfectly, Tohme’s cost would drop to 28% of a manual labeling approach alone.
  217. 217. Example Labels from Manual Labeling
  218. 218. Evaluation | Example Labels from Manual Labeling
  219. 219. Evaluation | Example Labels from Manual Labeling
  220. 220. Evaluation | Example Labels from Manual Labeling
  221. 221. Evaluation | Example Labels from Manual Labeling
  222. 222. Evaluation | Example Labels from Manual Labeling
  223. 223. This is a driveway. Not a curb ramp. Evaluation | Example Labels from Manual Labeling
  224. 224. Evaluation | Example Labels from Manual Labeling
  225. 225. Evaluation | Example Labels from Manual Labeling
  226. 226. Examples Labels from CV + Verification
  227. 227. Raw Street View Image Evaluation | Example Labels from CV + Verification
  228. 228. False detection Automatic Detection Evaluation | Example Labels from CV + Verification
  229. 229. Automatic Detection + Human Verification Evaluation | Example Labels from CV + Verification
  230. 230. Automatic Detection Evaluation | Example Labels from CV + Verification
  231. 231. Automatic Detection + Human Verification Evaluation | Example Labels from CV + Verification
  232. 232. False verification Automatic Detection + Human Verification Evaluation | Example Labels from CV + Verification
  233. 233. UIST’14 CONTRIBUTIONS: 1.First CV system for automatically detecting curb ramps in images 2.Showed that automated methods could be used to improve labeling efficiency for curb ramps 3.Validated GSV as a viable curb ramp dataset
  234. 234. TOWARDS SCALABLE ACCESSIBILITY DATA COLLECTION ASSETS’12 Poster Feasibility study + labeling interface evaluation HCIC’13 Workshop Exploring early solutions to computer vision (CV) HCOMP’13 Poster 1st investigation of CV + crowdsourced verification CHI’13 Large-scale turk study + label validation with wheelchair users ASSETS’13 Applied to new domain: bus stop accessibility for visually impaired UIST’14 Crowdsourcing + CV + “smart” work allocation The Future
  235. 235. 8,209 Intersections in DC
  236. 236. 8,209 Intersections in DC BACK OF THE ENVELOPE CALCULATIONS Manually labeling GSV with our custom interfaces would take 214 hours With Tohme, this drops to 184 hours We think we can do better  Unclear how long a physical audit would take
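These figures are consistent with the per-scene task times reported in the evaluation (~94 s/scene fully manual, ~81 s/scene with Tohme) applied to all 8,209 intersections:

```python
# Back-of-the-envelope check using the per-scene times from the evaluation.
intersections = 8209
manual_hours = intersections * 94 / 3600   # ~214 hours
tohme_hours = intersections * 81 / 3600    # ~184-185 hours
print(round(manual_hours, 1), round(tohme_hours, 1))
```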
  237. 237. FUTURE WORK: COMPUTER VISION Context integration & scene understanding 3D-data integration Improve training & sample size Mensuration
  238. 238. FUTURE WORK: FASTER LABELING & VERIFICATION INTERFACES
  239. 239. FUTURE WORK: TRACK PHYSICAL ACCESSIBILITY CHANGES OVER TIME
  240. 240. FUTURE WORK: ADDITIONAL SURVEYING TECHNIQUES Transmits real-time imagery of physical space along with measurements
  241. 241. THE CROWD-POWERED STREETVIEW ACCESSIBILITY TEAM! Kotaro Hara Jin Sun Victoria Le Robert Moore Sean Pannella Jonah Chazan David Jacobs Jon Froehlich Zachary Lawrence Graduate Student Undergraduate High School Professor
  242. 242. Flickr User: Pedro Rocha https://www.flickr.com/photos/pedrorocha/3627562740/ Flickr User: Brooke Hoyer https://www.flickr.com/photos/brookehoyer/14816521847/ Flickr User: Jen Rossey https://www.flickr.com/photos/jenrossey/3185264564/ Flickr User: Steven Vance https://www.flickr.com/photos/jamesbondsv/8642938765 Flickr User: Jorge Gonzalez https://www.flickr.com/photos/macabrephotographer/6225178809/ Flickr User: Mike Fraser https://www.flickr.com/photos/67588280@N00/10800029263// PHOTO CREDITS Flickr User: Susan Sermoneta https://www.flickr.com/photos/en321/344387583/
  243. 243. This work is supported by: Faculty Research Award Human Computer Interaction Laboratory makeability lab
  244. 244. Human Computer Interaction Laboratory makeability lab CHARACTERIZING PHYSICAL WORLD ACCESSIBILITY AT SCALE USING CROWDSOURCING, COMPUTER VISION, & MACHINE LEARNING
