While end-users can acquire full 3D gestures with many input devices, they often capture only 3D trajectories, i.e., 3D uni-path, uni-stroke, single-point gestures performed in thin air. Such trajectories, with their (x, y, z) coordinates, can be interpreted as three 2D stroke gestures projected on three planes, i.e., XY, YZ, and ZX, thus making them admissible for established 2D stroke gesture recognizers. To investigate whether 3D trajectories can be recognized effectively and efficiently, four 2D stroke gesture recognizers, i.e., $P, $P+, $Q, and Rubine, are extended to the third dimension: $P3, $P+3, $Q3, and Rubine-Sheng, an extension of Rubine for 3D with more features. Two new variations are also introduced: $F for flexible cloud matching and FreeHandUni for uni-path recognition. Rubine3D, another extension of Rubine for 3D that projects the 3D gesture on three orthogonal planes, is also included. These seven recognizers are compared on three challenging datasets containing 3D trajectories, i.e., SHREC2019 and 3DTCGS in a user-independent scenario, and 3DMadLabSD with its four domains in both user-dependent and user-independent scenarios, with varying numbers of templates and sampling rates. Individual recognition rates and execution times per dataset, as well as aggregated ones on all datasets, show a highly significant advantage of $P+3 over its competitors. The potential effects of the dataset, the number of templates, and the sampling are also studied.
Recognizing 3D Trajectories as 2D Multi-stroke Gestures
ACM ISS 2020 (Lisbon, Portugal, November 9-11, 2020)
Paolo Roselli (Università degli Studi di Roma, Italy; Université catholique de Louvain, Belgium)
Jean Vanderdonckt (LouRIM, Université catholique de Louvain, Belgium)
Nathan Magrofuoco (LouRIM, Université catholique de Louvain, Belgium)
Arthur Sluÿters (LouRIM, Université catholique de Louvain, Belgium)
Mehdi Ousmer (LouRIM, Université catholique de Louvain, Belgium)
Ousmer, M., Vanderdonckt, J., Buraga, S. (2019). An ontology for reasoning on body-based gestures. In Proceedings of the ACM SIGCHI Symposium on Engineering Interactive Computing Systems (EICS '19), ACM Press, Valencia, Spain, pp. 1–6. https://doi.org/10.1145/3319499.3328238
• Deep Learning
• SVM
• Neural Networks
• Machine Learning
• Nearest-Neighbor Classification (NNC)
• Pattern matching
p_i = (x_i, y_i, z_i)
What are the suitable state-of-the-art 2D gesture recognizers that could answer our main question? How can we recognize a 3D gesture?
Which sampling rate and number of templates give the best results?
Are these recognizers efficient? Can we compare them with existing 3D recognizers?
2D Recognizers
GRANDMA, Rubine's recognizer (1991): the first published algorithm for stroke gesture recognition; it uses statistical matching and describes gestures with a vector of geometrical features.
$P: one of the accurate $-family recognizers; invariant to scaling, rotation, stroke order, number, and direction; represents the gesture as a point cloud.
$Q: an improvement of the $P recognizer for low-power devices that reduces its computational needs; it uses a lookup table for point matching and implements early abandoning.
$P+: the most accurate recognizer of the $-family; an optimization of $P with good results for low-vision users; its cloud matching is more flexible than $P's.
3D Recognizers
Rubine3D:
• Projects the gesture on each plane (XY, YZ, ZX).
• Uses the original feature vector to describe a gesture.
• Uses a heuristic to determine the candidate gesture class when the resulting classes differ between planes.
3D Recognizers
Rubine-Sheng:
• Extension of Rubine's recognizer to three dimensions.
• Describes the 3D gesture with a vector of 16 features.
• Extends the original feature vector to 3D by adding three features (Sheng, "A Study of AdaBoost in 3D Gesture Recognition").
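To give a flavor of what a 3D geometric feature looks like, here is one plausible example: the total path length of the trajectory computed with full 3D Euclidean distances. This is a hypothetical illustration, not necessarily one of the three features Sheng actually added.

```javascript
// Total 3D path length of a trajectory: the sum of Euclidean
// distances between consecutive {x, y, z} points.
// (Hypothetical example of a 3D feature, for illustration only.)
function pathLength3D(points) {
  let length = 0;
  for (let i = 1; i < points.length; i++) {
    const dx = points[i].x - points[i - 1].x;
    const dy = points[i].y - points[i - 1].y;
    const dz = points[i].z - points[i - 1].z;
    length += Math.sqrt(dx * dx + dy * dy + dz * dz);
  }
  return length;
}
```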
3D Recognizers
$P3:
• Extension of the $P recognizer to three dimensions.
• The point cloud is defined as a set of 3D points.
• Pre-processes the gesture (resample, scale, translate to origin).
• Implements the Greedy-5 heuristic used in $P.
$Q3:
• Uses a 16×16×16 3D grid for the lookup technique.
• Stores the indices (row, column, layer) of the closest point in the grid.
• The best matching point cloud is the one with the lowest dissimilarity score.
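The resampling part of the pre-processing step can be sketched in 3D as follows (a sketch with an assumed `{x, y, z}` point format, in the spirit of the $-family pre-processing, not the authors' implementation): the trajectory is rewritten as n points spaced equally along its path.

```javascript
// 3D Euclidean distance between two {x, y, z} points.
function dist3D(a, b) {
  return Math.hypot(b.x - a.x, b.y - a.y, b.z - a.z);
}

// Total path length: sum of distances between consecutive points.
function totalLength3D(points) {
  let len = 0;
  for (let i = 1; i < points.length; i++) len += dist3D(points[i - 1], points[i]);
  return len;
}

// Resample a trajectory into n points spaced equally along its path.
function resample(points, n) {
  const interval = totalLength3D(points) / (n - 1);
  const pts = points.map(p => ({ ...p })); // work on a copy
  const out = [{ ...pts[0] }];
  let accumulated = 0;
  for (let i = 1; i < pts.length; i++) {
    const d = dist3D(pts[i - 1], pts[i]);
    if (accumulated + d >= interval && d > 0) {
      // Interpolate a new point at the exact interval boundary.
      const t = (interval - accumulated) / d;
      const q = {
        x: pts[i - 1].x + t * (pts[i].x - pts[i - 1].x),
        y: pts[i - 1].y + t * (pts[i].y - pts[i - 1].y),
        z: pts[i - 1].z + t * (pts[i].z - pts[i - 1].z),
      };
      out.push(q);
      pts.splice(i, 0, q); // q becomes the next segment's start point
      accumulated = 0;
    } else {
      accumulated += d;
    }
  }
  // Guard against floating-point shortfall at the path's end.
  while (out.length < n) out.push({ ...pts[pts.length - 1] });
  return out;
}
```

Scaling and translating to the origin then operate on the resampled points, exactly as in 2D but with the z coordinate included.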
3D Recognizers
$F:
• Extension of the $P3 recognizer with the flexible cloud matching of $P+.
• Pre-processes the gestures.
• The best matching template is the one with the lowest dissimilarity score.
$P+3:
• Extension of the $P+ recognizer to the third dimension.
• Pre-processes the gestures.
• Includes the z coordinate in the turning angle and point distance computations.
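How the z coordinate joins the turning-angle computation can be sketched as follows (illustrative only; names and shapes are assumptions, not the paper's code): the angle at a point is taken between its two adjacent segments, using full 3D vectors.

```javascript
// Turning angle at `curr`, between the segment arriving from `prev`
// and the segment leaving towards `next`, with the z coordinate
// included in both the dot product and the norms.
function turningAngle3D(prev, curr, next) {
  const u = { x: curr.x - prev.x, y: curr.y - prev.y, z: curr.z - prev.z };
  const v = { x: next.x - curr.x, y: next.y - curr.y, z: next.z - curr.z };
  const dot = u.x * v.x + u.y * v.y + u.z * v.z;
  const nu = Math.hypot(u.x, u.y, u.z);
  const nv = Math.hypot(v.x, v.y, v.z);
  if (nu === 0 || nv === 0) return 0;
  // Clamp to [-1, 1] to guard against floating-point drift.
  const c = Math.min(1, Math.max(-1, dot / (nu * nv)));
  return Math.acos(c); // 0 on a straight segment, up to PI for a reversal
}
```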
3D Recognizers
FreeHandUni:
• Derives from the FreeHand recognizer pseudocode (Craciun et al.).
• Replaces the hand pose structure with a 3D point (x, y, z).
• A $P3 extension with the flexible cloud matching of $P+.
• No early abandoning.
• The best matching template is the one with the lowest dissimilarity score.
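The "lowest dissimilarity score" selection shared by the cloud matchers is plain nearest-neighbor classification, which can be sketched as follows (`dissimilarity` is a placeholder for whatever cloud distance the recognizer uses, not a specific function from the paper):

```javascript
// Nearest-neighbor step: score the candidate against every stored
// template and keep the one with the lowest dissimilarity.
function classify(candidate, templates, dissimilarity) {
  let bestName = null;
  let bestScore = Infinity;
  for (const t of templates) {
    const score = dissimilarity(candidate, t.points);
    if (score < bestScore) {
      bestScore = score;
      bestName = t.name;
    }
  }
  return { name: bestName, score: bestScore };
}
```

Early abandoning, where $Q-style recognizers stop scoring a template as soon as its partial score exceeds the current best, would hook into the inner loop; FreeHandUni deliberately omits it.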
Experiment
• Recognizers: $P3, $Q3, $P+3, $F, FreeHandUni (FH), Rubine3D (R3D), Rubine-Sheng (RS)
• Gesture sets: SHREC2019, 3DTCGS, 3DMadLabSD (4 domains)
• Number of templates (T): {1, 2, 4, 8, 16}
• Sampling (N): {4, 8, 16, 32, 64}
• User-independent scenario: 6 (datasets) × 5 (sampling) × 5 (templates) × 100 (repetitions) × 7 (recognizers) = 105,000 recognition trials
• User-dependent scenario: 4 (datasets) × 10 (users) × 5 (sampling) × 4 (templates) × 100 (repetitions) × 7 (recognizers) = 560,000 recognition trials
• Quantitative measures: recognition rate, execution time
• Evaluated effects: dataset, number of templates T, sampling N
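The trial counts above are just products of the experimental factors; a quick sanity check (not part of the original framework):

```javascript
// datasets × sampling × templates × repetitions × recognizers
const userIndependentTrials = 6 * 5 * 5 * 100 * 7;
// datasets × users × sampling × templates × repetitions × recognizers
const userDependentTrials = 4 * 10 * 5 * 4 * 100 * 7;
console.log(userIndependentTrials, userDependentTrials,
            userIndependentTrials + userDependentTrials);
// → 105000 560000 665000
```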
Datasets:
• SHREC 2019: Cross "X", Circle "O", V-mark "V", Caret "^", Square "[]".
• 3DTCGS: arc3Dleft, arc3Dright, caret, check, circle, curly-braket-left, curly-braket-right, delete, left-swipe, pigtail, poly3Dxyz, poly3Dxzy, poly3Dyxz, poly3Dyzx, poly3Dzxy, poly3Dzyx, rectangle, right-swipe, spiral, square-braket-left, square-braket-right, star, triangle, V, X, zig-zag.
• 3DMadLabSD, four domains: digits 0–9; letters a–j; Spring, Mass, Wheel, Pulley, Hinge, Fast Forward, Rewind, Play, Pause, Delete; Cuboid, Cylinder, Sphere, Rectangular Pipe, Hemisphere, Cylindrical Pipe, Pyramid, Tetrahedron, Cone, Toroid.
SHREC2019 Results: recognition tables and execution time tables
SHREC2019 Results: confusion matrix for $P+3
SHREC2019 Results: recognition rate
Results: recognition rate
• Loss when switching from the user-dependent (UD) to the user-independent (UI) scenario
Discussion
• $P+3 is the best recognizer in most conditions.
• Rubine-Sheng should be avoided.
• $-like recognizers are more accurate than R3D and RS.
• $F and FH show only small variations.
• Significant effects on recognition rate (see paper for details): number of templates, sampling, and dataset.
Limitations
• Some recognizers were rejected due to their presumed computational complexity.
• Lack of diversity in the evaluated gestures.
• Other factors may impact the measures.
Conclusion
• A 3D trajectory can be interpreted as three 2D trajectories projected on each plane.
• Four 2D stroke gesture recognizers were extended to three dimensions.
• A testing framework was implemented in JavaScript.
• An extensive range of 3D trajectories was tested.
• The winner of the test is the $P+3 recognizer.
Future work
• Include the omitted recognizers in the test to compare them against $P+3.
• Evaluate the impact of the gestures' depth variation.
• Test uni-stroke single-point gestures in a real-world application.
• Evaluate complex gestures.
• Compare with other, non-template-based algorithms.
Hello everyone, my name is Mehdi Ousmer, and I'm a PhD student at the Université catholique de Louvain in Belgium.
Today I have the pleasure of presenting our recent work, "Recognizing 3D Trajectories as 2D Multi-stroke Gestures".
I will start with the introduction and then move on to the other parts.
In a previous gesture elicitation study, we asked participants to elicit gestures that fit actions and commands to control objects in an IoT environment.
As we can see in this video, some participants elicited 2D symbolic gestures in thin air.
In the 3D space, we can represent gestures as sequential data of a single point. In the literature, we call these kinds of gestures “trajectories”.
Moreover, I would like to mention that there are several gesture recognizers for 2D or 3D gestures based on different techniques.
For that reason, this work raises the question: "Can we recognize 3D trajectories as 2D multi-stroke gestures?"
Now I'd like to move on to the challenges we faced in answering this question.
After conducting a targeted literature review of 2D stroke gesture recognizers, we selected four recognizers for multiple reasons, mainly because they are simple and well known.
these recognizers are:
Rubine's algorithm
$P
$Q
$P+
Next, we extended these recognizers to three dimensions.
Rubine 3D: a 3D version of Rubine's recognizer formed of three 2D Rubine recognizers.
RubineSheng: An extension of Rubine’s recognizer to three dimensions.
$Pcube: The 3D version of $P.
$Qcube: The 3D version of $Q.
$F: An improved version of the $Pcube with a flexible cloud matching of $P+.
$P+cube: The 3D version of $P+.
FreeHandUni: An improved version of $Pcube with the flexible cloud matching of $P+. However, it differs from $F because early abandoning is not included.
After creating these new 3D recognizers, we had to prove their efficiency. Thus, we evaluated them in both user-independent and user-dependent scenarios, using three trajectory datasets.
Furthermore, we tested the recognizers under different conditions defined by two variables: the number of templates and the number of points (sampling).
Now, I would like to draw your attention to the total number of recognition trials performed, which is 665,000 in total.
We collected the test results through two quantitative measures: the recognition rate and the execution time.
Thanks to the collected results, we can analyze the effects of three factors on the recognition rate and the execution time: the datasets, the number of templates, and the sampling.
During this experimentation, we used three datasets:
SHREC2019: a dataset proposed in a recent contest.
3DTCGS: a dataset used in the evaluation of the 3Cents recognizer.
3DMadLabSD: a dataset composed of four subsets (domains).
We present some examples of graphics and tables produced from the test results on each dataset.
The tables contain the recognition rate and execution time of every recognizer under the diverse conditions; the colour scheme is defined by the median value in each table.
We generated a normalized balanced confusion matrix for $P+cube.
We plotted individual results for the different conditions.
In this slide, we can examine the averaged recognition rate of each recognizer on all datasets.
As you can notice in this diagram, $P+cube has the best recognition rate, at 87.48%. Although this is already high, $P+cube can reach even higher rates depending on the gesture dataset, exceeding 90%.
The other $-like recognizers have similar rates, at around 80%.
Finally, we can observe that R3D and RS have low recognition rates, under 70%.
The SHREC2019 and 3DTCGS datasets are evaluated only in the user-independent scenario. As we can see in these diagrams, the recognition rates are similar to the averaged recognition rates from the previous slide.
In this slide, we take note of the difference between the user-dependent and user-independent scenarios.
We observe a loss of nearly 10% for $P+cube, and around 20% for the other $-like recognizers.
In these diagrams, we notice the low rates of all recognizers on Domain 4; the principal cause is the type of gestures in this subset.
This figure shows the recognition rates of all recognizers on 3DMadLabSD in the user-independent and user-dependent scenarios.
We remark a significant difference when switching from the user-dependent to the user-independent scenario, though there are some exceptions.
Based on the results of the evaluation, we can say that $P+cube is the most efficient and effective recognizer in most conditions.
Rubine-Sheng should be avoided, due to its low recognition rate in all conditions compared to the other recognizers.
Concerning the effects of the factors on the measures, we can cite the significant effect of the number of templates on the recognition rates of all recognizers, the significant effect of the number of points on the $-like recognizers, and the significant effect of the datasets on the recognition rates of all recognizers in the user-independent scenario.
This experiment has some limitations: we rejected some recognizers, we evaluated only one kind of gesture, and we did not include the dataset size as a factor.
To sum up, we built seven 3D gesture recognizers based on four 2D recognizers selected from the state of the art.
We implemented a testing framework in JavaScript and tested the recognizers. The test results showed that $P+cube provides excellent results.
We have a list of improvements for future work:
Evaluate the effect of depth variations in gestures.
Compare with other recognizers based on other techniques.