With advances in wireless technology and miniaturization, earphones are now worn for longer hours than before. Their applications have also diversified, increasing the opportunities to access highly confidential information through them. We propose a user-authentication method for hearable devices that uses ear images captured by a small camera mounted on the device, improving the device's security. Ear images are first captured with the camera; the ear regions are then extracted using a mask region-based convolutional neural network (Mask R-CNN); finally, the user is identified using histogram-of-oriented-gradients (HOG) features and a support vector machine (SVM). Our method identified 18 participants with an accuracy of 84.1%, and authenticated users through unsupervised anomaly detection with an autoencoder at an equal error rate of 8.36%. The method enables hands- and eyes-free operation without requiring any explicit authentication action by the user.
1. EarAuthCam:
Personal Identification and Authentication Method Using Ear Images Acquired with a Camera-Equipped Hearable Device
Yurina Mizuho, Yohei Kawasaki, Takashi Amesaka, Yuta Sugiura
Keio University
The Augmented Humans International Conference 2024
2. Background
• Hearable devices
• Used for more diverse purposes
• Listening to music, phone calls, operating applications, etc.
• Record daily activities
• Lifelog, location
• Access to private information via voice control
• Calendar, mail, lifelog
• Risk of information leakage due to theft or loss
3. Our Approach: EarAuthCam
Personal authentication using a camera attached to hearable devices
• Purpose
• Add security to hearable devices
• Reduce the risk of information leakage
• Enable additional functions
• Payment, healthcare
• Proposed method
• Attach a small camera to a hearable device
• Identify / authenticate the wearer using an ear image
5. Ear Biometrics
• Ear shape varies from individual to individual
• Stable against aging [Iannarelli, 1989]
• Has a highly complex, asymmetric structure [Hurley, 2007]
• Many biometrics use ear shape
• But existing ones cannot be integrated into hearable devices
Bodyprint [Holz, 2015]
Model-based human ear identification [Jeges, 2006]
Characteristic ear parts
6. Flow of the proposed method
Collect ear image → Remove background (Mask R-CNN) → Resize
→ HOG features → SVM → Identification
→ Autoencoder → Calculate error rates → Authentication
Capturing Ear Images → Image Processing → Identification / Authentication
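The flow above can be sketched in Python (a minimal sketch with NumPy; the stage bodies are placeholders for the actual Mask R-CNN, HOG extractor, SVM, and autoencoder, and all names here are illustrative, not the authors' implementation):

```python
import numpy as np

def remove_background(image, ear_mask):
    """Keep only pixels inside the (Mask R-CNN-estimated) ear region."""
    return np.where(ear_mask[..., None], image, 0)

def resize(image, factor=5):
    """Block-average downsample: 640x480 -> 128x96 when factor is 5."""
    h, w, c = image.shape
    return image.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))

def hog_features(image):
    """Placeholder for the real 5940-dimensional HOG descriptor."""
    return image.mean(axis=2).ravel()[:5940]

image = np.random.rand(480, 640, 3)          # one 640x480 capture (rows x cols x RGB)
ear_mask = np.zeros((480, 640), dtype=bool)
ear_mask[120:360, 160:480] = True            # stand-in for the estimated ear region

x = hog_features(resize(remove_background(image, ear_mask)))
print(x.shape)  # (5940,)
# Identification: classify x with an SVM trained on all enrolled users.
# Authentication: reconstruct x with a user-specific autoencoder and
# threshold the reconstruction error.
```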
7. Method: Hardware
• Attach a small camera (3 mm²) to an earphone
• Angle and position of the camera fixed with tape
• Connected to a PC and operated manually
Camera (SONY, IU233N2-Z) connected to the PC via an evaluation board; earphone with the camera attached
8. Method: Capturing Ear Images
• Capture the upper part of the ear
• Particularly high concentration of individuality [Ryu, 1998]
• 640×480 pixels
Camera direction; example of an ear image
9. Method: Background Removal
• Mask R-CNN [3]
• A framework for object detection and instance segmentation
• Used to
• Estimate the ear region
• Remove the background

Mask R-CNN training settings
Annotation: ear regions annotated with VIA (VGG Image Annotator)
Train : validation = 8 : 2
Pre-trained weights: Microsoft COCO [4]
Environment: Google Colaboratory
Epochs: 50

[3] K. He, G. Gkioxari, P. Dollár, et al. "Mask R-CNN". Proceedings of the IEEE International Conference on Computer Vision, pp. 2961-2969 (2017).
[4] T.-Y. Lin, M. Maire, et al. "Microsoft COCO: Common Objects in Context". In Proceedings of the 13th European Conference on Computer Vision, pp. 740-755 (2014).
10. Method: Background Removal
• Remove the background outside the estimated ear regions
• Resize
• 640×480 pixels → 128×96 pixels
The estimated ear region; the image without background
11. Method
• Difference between identification and authentication

Task: Identification
What is decided: Whose data the input data is (classification)
Evaluation: Accuracy (correct answer rate)
Use case: Shared hearable device → personalized information

Task: Authentication
What is decided: Whether the input data is the user's data or not
Evaluation: Equal Error Rate (EER)
Use case: One person uses a device → additional functions
12. Method: Identification
• HOG features
• Histograms of Oriented Gradients
• Robust to local brightness changes
• 5940 dimensions per image
• SVM
• Support Vector Machine
• Trained using data from all the people to be identified
• Evaluation
• 10-fold cross-validation

Parameters of HOG features
window size: 128 × 96
block size: 16 × 16
block stride: 8 × 8
cell size: 8 × 8
bins: 9
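The 5940-dimension figure follows directly from the HOG parameters above; a quick check using the standard HOG dimensionality formula (as used, for example, by OpenCV's HOGDescriptor):

```python
# Verify that the stated HOG parameters yield 5940 dimensions per image.
win_w, win_h = 128, 96        # window size (the resized ear image)
block, stride, cell, bins = 16, 8, 8, 9

blocks_x = (win_w - block) // stride + 1      # 15 block positions horizontally
blocks_y = (win_h - block) // stride + 1      # 11 block positions vertically
cells_per_block = (block // cell) ** 2        # 4 cells of 8x8 in a 16x16 block

dims = blocks_x * blocks_y * cells_per_block * bins
print(dims)  # 5940
```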
13. Method: Authentication
• Autoencoder
• Compresses the input data, extracts features, and reconstructs the output
• Trained to minimize the difference between input and output (the reconstruction error is used as the score)
• Uses only the enrolled user's data
• If the score is
• below the threshold ⇒ user
• above the threshold ⇒ stranger
Authentication using autoencoder
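The accept/reject rule can be sketched as follows (a minimal sketch with NumPy; `reconstruct` is a stand-in for the trained autoencoder's forward pass, and the toy threshold and data are illustrative, not values from the paper):

```python
import numpy as np

def is_user(x, reconstruct, threshold):
    """Accept if the autoencoder's reconstruction error is below the threshold.

    `reconstruct` stands in for an autoencoder trained only on the
    enrolled user's ear images, so it reproduces that user's data well
    and strangers' data poorly.
    """
    error = np.mean((x - reconstruct(x)) ** 2)   # mean squared reconstruction error
    return error < threshold

# Toy stand-in: near-perfect reconstruction for the enrolled user.
reconstruct = lambda x: x * 0.99
print(is_user(np.ones(10), reconstruct, threshold=0.01))  # True
```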
14. Method: Authentication
• FRR (False Rejection Rate)
• Percentage of the user's inputs rejected as a stranger's
• FAR (False Acceptance Rate)
• Percentage of strangers' inputs accepted as the user's
• EER (Equal Error Rate)
• The error rate at the threshold where FRR and FAR are equal
• FRR and FAR trade off against each other as the threshold changes
Error rates as the threshold changes (EER at the crossing point)
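FRR, FAR, and the EER search can be sketched with NumPy (a sketch only; scores here are reconstruction errors, so lower means more user-like, and the toy score arrays are illustrative):

```python
import numpy as np

def frr_far(genuine, impostor, threshold):
    """FRR: fraction of genuine scores rejected; FAR: fraction of impostor scores accepted."""
    frr = np.mean(genuine >= threshold)    # the user rejected as a stranger
    far = np.mean(impostor < threshold)    # a stranger accepted as the user
    return frr, far

def eer(genuine, impostor):
    """Sweep candidate thresholds; return the (FRR, FAR) pair where they are closest."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    rates = [frr_far(genuine, impostor, t) for t in thresholds]
    return min(rates, key=lambda r: abs(r[0] - r[1]))

genuine = np.array([0.1, 0.2, 0.3])    # toy reconstruction errors for the user
impostor = np.array([0.8, 0.9, 1.0])   # toy reconstruction errors for strangers
print(eer(genuine, impostor))  # perfectly separated scores -> (0.0, 0.0)
```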
15. Experiment
• 20 participants × 50 images = 1000 images
• Conditions
• Indoors
• Bright/dark × 4 directions
• Sitting
• Earphone reattached after each shot

Participants' information
Number of participants: 20 (11 males, 9 females)
Age: 24.2 ± 6.9
Number of images per person: 50
Train : test = 9 : 1 (10-fold cross-validation)
Wearing position: right ear
Mask R-CNN training: 100 images (2 participants × 50 images)
Identification/authentication: 900 images (18 participants × 50 images)

Captured ear image
16. Result
• Identification: accuracy 84.1%
• Authentication: EER 8.36%
Identification and authentication results for 18 participants
17. Additional Experiment
• 6 participants × 30 images × 3 conditions = 540 images
• Lighting conditions
• Bright (1000 lx)
• Normal (300 lx)
• Dark (30 lx)

Participants' information
Number of participants: 6 (2 males, 4 females)
Age: 22.3 ± 0.75
Number of images per person: 30 × 3 conditions
Wearing position: right ear
18. Additional Experiment: Results and Discussion
• Model trained using images from all three conditions
• Identification: 98.3%
• Authentication: EER 5.61%
• Recommended enrollment: set up in a bright condition, then add data from the other conditions
Results under different lighting conditions
19. Limitations and Future Work
• Constraints in the experiment
• Lighting, hair, body movement
• Combining with non-image-based methods
• Acoustic microphone, infrared sensor, etc.
• Improving accuracy
• Use images of both ears
• Increase the training data for Mask R-CNN
• Camera angle
• Include more ear features in the images using a fisheye lens
20. Summary
Background: Access to more sensitive information through hearable devices
Related work: Biometrics using ear acoustics and ear shape
Our approach: Identification/authentication using an earphone with a camera
Use case: Tailored information presentation to the user
Implementation: Attach a small camera to an earphone
Results: Identification accuracy 84.1%, authentication EER 8.36%
Future work: Improving accuracy, combining with other methods
21. References
• [Gao, 2019] Yang Gao, Wei Wang, Vir V. Phoha, Wei Sun, and Zhanpeng Jin. 2019. EarEcho: Using ear canal echo for wearable authentication. Proc. ACM Interact. Mob.
Wearable Ubiquitous Technol. 3, 3, Article 81 (Sep 2019), 1–24 pages. https://doi.org/10.1145/3351239
• [Choi, 2023] Seokmin Choi, Junghwan Yim, Yincheng Jin, Yang Gao, Jiyang Li, and Zhanpeng Jin. 2023. EarPPG: Securing Your Identity with Your Ears. In Proceedings of
the 28th International Conference on Intelligent User Interfaces (Sydney, NSW, Australia) (IUI ’23). Association for Computing Machinery, New York, NY, USA, 835–849.
https://doi.org/10.1145/3581641.3584070
• [Wang, 2022] Zi Wang, Yili Ren, Yingying Chen, and Jie Yang. 2022. ToothSonic: Earable authentication via acoustic toothprint. Proc. ACM Interact. Mob.Wearable
Ubiquitous Technol. 6, 2, Article 78 (Jul 2022), 1–24 pages. https://doi.org/10.1145/3534606
• [Iannarelli, 1989] A.V. Iannarelli. 1989. Ear identification. Paramount Publishing Company, Fremont, CA. https://books.google.co.jp/books?id=jgPkAAAACAAJ
• [Hurley,2007] D. J. Hurley, B. Arbab-Zavar, and M. S. Nixon. 2007. The ear as a biometric. In 2007 15th European Signal Processing Conference. IEEE, Poznan, Poland,
25–29.
• [Holz, 2015] Christian Holz, Senaka Buthpitiya, and Marius Knaust. 2015. Bodyprint: Biometric user identification on mobile devices using the capacitive touchscreen to
scan body parts. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (CHI ’15). Association for Computing Machinery, New
York, NY, USA, 3011–3014. https://doi.org/10.1145/2702123.2702518
• [Jeges, 2006] Erno Jeges and Laszlo Mate. 2006. Model-based human ear identification. In 2006 World Automation Congress. IEEE, Budapest, Hungary, 1–6.
https://doi.org/10.1109/WAC.2006.375757
• [Tamaki, 2009] Emi Tamaki, Takashi Miyaki, and Jun Rekimoto. 2009. Brainy Hand: An Ear-Worn Hand Gesture Interaction Device. In CHI ’09 Extended Abstracts on
Human Factors in Computing Systems (Boston, MA, USA) (CHI EA ’09). Association for Computing Machinery, New York, NY, USA, 4255-4260.
https://doi.org/10.1145/1520340.1520649
• [Chen, 2020] Tuochao Chen, Benjamin Steeper, Kinan Alsheikh, Songyun Tao, François Guimbretière, and Cheng Zhang. 2020. C-Face: Continuously Reconstructing
Facial Expressions by Deep Learning Contours of the Face with Ear-Mounted Miniature Cameras. In Proceedings of the 33rd Annual ACM Symposium on User Interface
Software and Technology (Virtual Event, USA) (UIST ’20). Association for Computing Machinery, New York, NY, USA, 112-125.
https://doi.org/10.1145/3379337.3415879
• [Ryu, 1998] Onebae Ryu, Tatsuhiro Tamura, and Katsuyuki Shinohara. 1998. The Individual Identification with Pinna Image: An Examination for Auto Detection of Pinna
Area and Individuality. ITE Tech. Report 22, 45 (1998), 49–53.
23. Result
• Identification: accuracy 84.1%
Identification results for 18 participants
Identification results for 18 participants
Classifier Identification Accuracy (%)
Random forest 77.8
K-nearest neighbor 78.6
SVM (our approach) 84.1
Comparison with other classifiers
24. Result
• Authentication: total EER 8.36%
The EER for each participant; authentication results for 18 participants
25. Discussion: Number of Training Images
• Number of ear images per participant
• 50 images: identification 84.1%, authentication EER 8.36%
• 10 images: identification 82.8%, authentication EER 10.9%
Relationship between the number of ear images and accuracy
26. Discussion: Usability
• Number of ear images
• Only 10 images already give identification 82.8% and authentication EER 10.9%
• Processing time (in Google Colaboratory)
• Setup: 2 minutes
• Processing a single image
• Identification: 4.33 seconds
• Authentication: 4.7 seconds
Relationship between the number of ear images and accuracy
27. Discussion: Background Removal
• The accuracy of background removal
• Directly affects identification/authentication accuracy
• Increase the training data
• Data augmentation
• Use a larger number of individuals
• Increase the variety of ear image patterns
• Dataset quality
• Remove outlier images before training
28. Discussion: Battery Consumption
• Keeping the camera running all the time consumes a lot of battery
• It also leads to heat generation
• Activate the camera only when necessary
• e.g., when attachment is detected or when more personal information is accessed
• Reduce the frame rate
29. Discussion: Privacy
• Camera-based methods raise privacy issues
• Extract only the computed features from the image
• Use an infrared camera
• Make people around the camera aware of its use
• e.g., light an LED while the camera is active