Development of Real-World Sensor Optimal
Placement Support Software
Ayane Saito 1), Wataru Kawai 2) and Yuta Sugiura 1)
1) Keio University
2) The University of Tokyo
Asian CHI Symposium 2020
• Gesture recognition and posture estimation can be performed by
combining real-world sensors and machine learning.
• The number and position of sensors that can acquire unique sensor
data are often determined by trial and error.
2
Combining Real-World Sensors and Machine Learning
The flow of system design: examination of sensor placement → create device → accumulate learning data → acquire identification result
Combining real-world sensors and machine learning [1]
[1] Munehiko Sato, Ivan Poupyrev, and Chris Harrison. Touché: Enhancing touch interaction on humans, screens, liquids, and everyday objects. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI '12), pp. 483-492. ACM, New York, NY, USA (2012).
• Learn from virtual egocentric video and the posture of a humanoid model walking in a virtual world
• Estimate the walking postures of people in the real world by combining images shot by a camera attached to a pedestrian with the learning data from the virtual world
• Use learning data from the virtual world to make estimates for the real-world system
3
Simulation of Real-World System on Computer
[2] Yuan Y., Kitani K. (2018) 3D Ego-Pose Estimation via Imitation Learning. In: Ferrari V., Hebert M., Sminchisescu C., Weiss Y. (eds) Computer Vision – ECCV 2018. ECCV 2018. Lecture Notes in Computer Science, vol 11220.
Springer, Cham
• Difficulty in accumulating learning data in the real world
→ Generate a classifier using the learning data acquired in the virtual world
4
Software to Support Layout and Data Collection
for Machine-learning-based Distance Sensors
Real world: Kinect records the measurement target → acquire 3D depth information → convert to a 3D mesh. Because the recording can be replayed, the movement of the measurement target can be reused multiple times.
Unity (virtual world): objects imitating distance sensors measure the distance between each sensor and the 3D mesh using the RayCast function and a binary search algorithm; this distance is the sensor value. (Shown on Unity’s Game screen.)
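The distance measurement described above can be sketched outside Unity. The following is a minimal Python stand-in (the actual system uses Unity's RayCast in C#): an "inside the mesh" test plays the role of the mesh query, a coarse scan brackets the first surface crossing along the ray, and binary search refines the distance. The function names and the sphere stand-in are illustrative assumptions, not the authors' code.

```python
def ray_distance(origin, direction, inside, max_range=1.0, steps=100, tol=1e-6):
    """Distance from `origin` along unit `direction` to the first point
    where `inside(point)` becomes true (i.e., the mesh surface)."""
    point = lambda t: tuple(o + t * d for o, d in zip(origin, direction))
    # Coarse scan to bracket the first entry into the mesh.
    hi = None
    for i in range(1, steps + 1):
        t = max_range * i / steps
        if inside(point(t)):
            hi = t
            break
    if hi is None:
        return None                  # ray never reaches the mesh
    lo = hi - max_range / steps
    # Binary search refines the entry distance to within `tol`.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if inside(point(mid)):
            hi = mid                 # surface is closer than mid
        else:
            lo = mid                 # surface is farther than mid
    return hi

# Example: a sphere of radius 0.1 centred 0.5 m in front of the sensor,
# standing in for the face mesh.
inside_sphere = lambda p: sum((c - t) ** 2
                              for c, t in zip(p, (0.0, 0.0, 0.5))) <= 0.1 ** 2
d = ray_distance((0, 0, 0), (0, 0, 1), inside_sphere)  # ≈ 0.4
```

Unity's own `Physics.Raycast` returns the hit distance directly for collider-backed meshes; the binary search mentioned on the slide is useful when only a point-in-mesh test is available.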
• Purpose
• Develop a system that presents the optimal distance sensor placement, using software to support layout and data collection for machine-learning-based real-world sensors
• Principle
• Arrange many sensors in the software and calculate the identification rate when each sensor is used
• Calculate the combination of sensors with a high identification rate
5
Our Purpose and Principle
• Facial expression identification based on changes in the distance
between distance sensors placed on an eyeglass frame and the skin
surface on a face
6
Facial Expression Identification by Distance Sensor
AffectiveWear [5]
[5] Masai, K., Sugiura, Y., Ogata, M., Kunze, K., Inami, M., Sugimoto, M.: Facial Expression Recognition in Daily Life by Embedded Photo Reflective Sensors on Smart Eyewear. In: Proceedings of the 21st International Conference on
Intelligent User Interfaces (IUI '16), pp. 317-326. ACM, New York, NY, USA (2016).
• The proposed system consists of a learning phase using the software and an estimation phase using real-world sensors
7
Flow of Proposed System
• Fix the head with a chin rest and record the facial expressions using Kinect
• Play back the recorded facial expressions on Unity and measure the distance between each sensor and the face
• Display the measured distance data as a graph and save them as a CSV file
8
Distance Data Acquisition on Software
(Figure: sensors recorded using Kinect, sensor values on Unity’s game screen, and the Kinect with a chin rest)
• Calculate the identification rate with one sensor for each of the sensors placed on Unity
• Save the sensor position with the highest single-sensor identification rate
• Calculate the identification rate with two sensors: the saved optimal single sensor plus one sensor selected from the remaining sensors
• Save the two-sensor combination with the highest identification rate
• By repeating the same operation with three sensors, four sensors, and so on, the placement with the highest identification rate is obtained for each number of sensors
9
Optimal Placement Calculation
(Figure: identification rates of candidate sensor combinations at each step of the search, e.g., 30%, 35%, 40%, 50%, 60%, 70%, 80%, 90%; markers distinguish sensors used for identification from sensors not used; percentages are identification rates)
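The steps above are a greedy forward selection. The following is a minimal sketch under stated assumptions: `score(subset)` stands in for training and evaluating the classifier (an SVM with leave-one-out cross-validation in the actual system), and the toy scorer with fixed per-sensor gains is purely illustrative.

```python
def greedy_placement(sensor_ids, score, max_sensors=None):
    """At each step, add the sensor that maximizes the identification
    rate of the chosen subset; return (subset, rate) per step."""
    max_sensors = max_sensors or len(sensor_ids)
    chosen, results = [], []
    remaining = list(sensor_ids)
    while remaining and len(chosen) < max_sensors:
        # Try each remaining sensor together with those already chosen.
        best = max(remaining, key=lambda s: score(chosen + [s]))
        chosen.append(best)
        remaining.remove(best)
        results.append((list(chosen), score(chosen)))
    return results

# Toy scorer (assumption): each sensor contributes a fixed gain, capped at 1.0.
gain = {0: 0.30, 1: 0.50, 2: 0.10, 3: 0.05}
toy_score = lambda subset: min(1.0, sum(gain[s] for s in subset))
steps = greedy_placement(range(4), toy_score, max_sensors=2)
# First chosen sensor is 1 (highest single-sensor rate), then 0.
```

Note that a greedy search only guarantees the best subset at each step given the previous choices; as the slides' future work acknowledges, another algorithm (e.g., exhaustive search over small subsets) may find a combination with a higher rate.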
• Present the optimal placement using 22 sensors placed around the eye
• Create 1 frame of data by averaging 10 consecutive frames for each sensor
• This reduces the influence of noise on the Kinect 3D depth information
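The frame averaging described above can be sketched as follows; this is a hedged illustration, not the authors' code, and the function name and window handling (dropping an incomplete trailing window) are assumptions.

```python
def average_frames(readings, window=10):
    """Collapse a sensor's distance readings into one averaged value per
    `window` consecutive frames, dropping any incomplete trailing window."""
    return [sum(readings[i:i + window]) / window
            for i in range(0, len(readings) - window + 1, window)]

# Ten noisy distance readings (metres) collapse into a single frame.
noisy = [0.40, 0.42, 0.38, 0.41, 0.39, 0.40, 0.42, 0.38, 0.41, 0.39]
averaged = average_frames(noisy)  # one frame, the mean of the 10 readings
```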
10
Present Optimal Placement
Total number of sensors: 22 (around the eye)
Number of facial expressions to identify: 4
Number of data acquisitions per facial expression: 10
Number of frames: 1 (average of 10 frames)
Learning: support vector machine
Evaluation method: leave-one-out cross-validation
(Figure: the 4 facial expressions and the 22 sensors placed on Unity)
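The evaluation method in the table above, leave-one-out cross-validation, holds each sample out once, trains on the rest, and reports the fraction classified correctly. A minimal sketch follows; a nearest-centroid classifier stands in for the SVM used in the actual system, and the toy data are purely illustrative.

```python
def nearest_centroid_predict(train, x):
    """Predict the label whose class centroid is closest to `x`.
    `train` is a list of (feature_vector, label) pairs."""
    by_label = {}
    for vec, label in train:
        by_label.setdefault(label, []).append(vec)
    def dist2(vecs):
        n = len(vecs)
        centroid = [sum(v[i] for v in vecs) / n for i in range(len(vecs[0]))]
        return sum((a - b) ** 2 for a, b in zip(centroid, x))
    return min(by_label, key=lambda lab: dist2(by_label[lab]))

def leave_one_out_rate(samples):
    """Identification rate under leave-one-out cross-validation."""
    correct = 0
    for i, (x, y) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]   # train on all but sample i
        if nearest_centroid_predict(rest, x) == y:
            correct += 1
    return correct / len(samples)

# Toy data: two "facial expressions" with well-separated sensor values.
data = [([0.10, 0.20], "neutral"), ([0.12, 0.18], "neutral"),
        ([0.50, 0.60], "smile"), ([0.52, 0.58], "smile")]
rate = leave_one_out_rate(data)  # → 1.0
```

Leave-one-out is a natural fit here because only 10 acquisitions exist per expression; with so few samples, a fixed train/test split would waste data.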
11
Results of Presenting Optimal Placement
(Figure: identification rate for each number of sensors; markers distinguish sensors used for identification from sensors not used)
Optimal placement and identification result with 4 sensors: 95.0%
Optimal placement and identification result with 9 sensors: 100.0%
• The optimal placement with 9 sensors was reproduced in the real world using ToF sensors, and distance data were acquired
• Verified whether the ToF sensor data can be successfully classified by referring to the classifier generated from the distance data of the 9 sensors placed optimally on Unity
12
Real-World Sensor Data Acquisition
Total number of sensors: 9
Number of facial expressions to identify: 4
Number of data acquisitions per facial expression: 10
Number of frames: 1 (average of 30 frames)
Learning: support vector machine
Evaluation method: leave-one-out cross-validation
(Figure: ToF sensor)
• Identified the data of the 9 ToF sensors by referring to the classifier generated from the distance data of the 9 sensors placed optimally on Unity
• The identification rate was 85.0%
13
Identification Results of ToF Sensors
• Influence of noise on the 3D depth information of Kinect
• Increase the processing speed of the software
• A combination of sensors calculated by another algorithm may have a
higher identification rate
14
Limitations and Future Work
15
Conclusion
Background
• The number and position of sensors that can acquire unique sensor data are often determined by trial and error
Purpose
• Develop a system that presents the optimal distance sensor placement using software to support layout and data collection for machine-learning-based real-world sensors
Implementation / Evaluation
• Optimal placement and number of sensors were examined by increasing the number of sensors one by one and performing SVM identification
Contact: ayane-3110@keio.jp
