With advances in wireless technology and miniaturization, earphones are now worn for longer hours than before. Their applications have also diversified, increasing the opportunities to access highly confidential information through them. We propose a user-authentication method for hearable devices that uses ear images captured by a small camera mounted on the device, improving the device's security. Ear images are first captured with the camera; the ear regions are then extracted using a mask region-based convolutional neural network (Mask R-CNN); finally, the user is identified using histogram-of-oriented-gradients (HOG) features and a support vector machine (SVM). Our method identified 18 participants with an accuracy of 84.1%, and authenticated users through unsupervised anomaly detection with an autoencoder at an equal error rate of 8.36%. The method enables hands- and eyes-free operation without requiring any explicit authentication action by the user.
1. EarAuthCam:
Personal Identification and Authentication Method Using Ear Images Acquired with a Camera-Equipped Hearable Device
Yurina Mizuho, Yohei Kawasaki, Takashi Amesaka, Yuta Sugiura
Keio University
The Augmented Humans International Conference 2024
2. Background
• Hearable devices
• Used for more diverse purposes
• Listening to music, phone calls, operating applications, etc.
• Record daily activities
• Lifelog, location
• Access to private information via voice control
• Calendar, mail, lifelog
• Risk of information leakage due to theft or loss
3. Our Approach: EarAuthCam
Personal authentication using a camera attached to hearable devices
• Purpose
• Add security to hearable devices
• Reduce the risk of information leakage
• Enable additional functions
• Payment, healthcare
• Proposed method
• Attach a small camera to a hearable device
• Identify / authenticate the wearer using an ear image
5. Ear Biometrics
• Ear shape varies from individual to individual
• Stable against aging [Iannarelli, 1989]
• Has a highly complex, asymmetric structure [Hurley, 2007]
• Many biometrics use ear shape
• But existing ones cannot be integrated into hearable devices
Bodyprint [Holz, 2015]
Model-based human ear identification [Jeges, 2006]
Characteristic ear parts
6. Flow of the proposed method
Collect ear image → Remove background (Mask R-CNN) → Resize
→ HOG features → SVM → Identification
→ Autoencoder → Calculate error rates → Authentication
Capturing Ear Images → Image Processing → Identification / Authentication
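The flow above can be sketched in Python (a minimal sketch with NumPy; the stage bodies are placeholders for the actual Mask R-CNN, HOG extractor, SVM, and autoencoder, and all names here are illustrative, not the authors' implementation):

```python
import numpy as np

def remove_background(image, ear_mask):
    """Keep only pixels inside the (Mask R-CNN-estimated) ear region."""
    return np.where(ear_mask[..., None], image, 0)

def resize(image, factor=5):
    """Block-average downsample: 640x480 -> 128x96 when factor is 5."""
    h, w, c = image.shape
    return image.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))

def hog_features(image):
    """Placeholder for the real 5940-dimensional HOG descriptor."""
    return image.mean(axis=2).ravel()[:5940]

image = np.random.rand(480, 640, 3)          # one 640x480 capture (rows x cols x RGB)
ear_mask = np.zeros((480, 640), dtype=bool)
ear_mask[120:360, 160:480] = True            # stand-in for the estimated ear region

x = hog_features(resize(remove_background(image, ear_mask)))
print(x.shape)  # (5940,)
# Identification: classify x with an SVM trained on all enrolled users.
# Authentication: reconstruct x with a user-specific autoencoder and
# threshold the reconstruction error.
```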
7. Method: Hardware
• Attach a small camera (3 mm²) to an earphone
• Angle and position of the camera fixed with tape
• Connected to a PC and operated manually
Camera (SONY, IU233N2-Z) connected to the PC via an evaluation board; earphone with the camera attached
8. Method: Capturing Ear Images
• Capture the upper part of the ear
• Particularly high concentration of individuality [Ryu, 1998]
• 640×480 pixels
Camera direction; example of an ear image
9. Method: Background Removal
• Mask R-CNN [3]
• A framework for object detection and instance segmentation
• Used to
• Estimate the ear region
• Remove the background

Mask R-CNN training settings
Annotation: ear regions annotated with VIA (VGG Image Annotator)
Train : validation = 8 : 2
Pre-trained weights: Microsoft COCO [4]
Environment: Google Colaboratory
Epochs: 50

[3] K. He, G. Gkioxari, P. Dollár, et al. "Mask R-CNN". Proceedings of the IEEE International Conference on Computer Vision, pp. 2961-2969 (2017).
[4] T.-Y. Lin, M. Maire, et al. "Microsoft COCO: Common Objects in Context". In Proceedings of the 13th European Conference on Computer Vision, pp. 740-755 (2014).
10. Method: Background Removal
• Remove the background outside the estimated ear regions
• Resize
• 640×480 pixels → 128×96 pixels
The estimated ear region; the image without background
11. Method
• Difference between identification and authentication

Task: Identification
What is decided: Whose data the input data is (classification)
Evaluation: Accuracy (correct answer rate)
Use case: Shared hearable device → personalized information

Task: Authentication
What is decided: Whether the input data is the user's data or not
Evaluation: Equal Error Rate (EER)
Use case: One person uses a device → additional functions
12. Method: Identification
• HOG features
• Histograms of Oriented Gradients
• Robust to local brightness changes
• 5940 dimensions per image
• SVM
• Support Vector Machine
• Trained using data from all the people to be identified
• Evaluation
• 10-fold cross-validation

Parameters of HOG features
window size: 128 × 96
block size: 16 × 16
block stride: 8 × 8
cell size: 8 × 8
bins: 9
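The 5940-dimension figure follows directly from the HOG parameters above; a quick check using the standard HOG dimensionality formula (as used, for example, by OpenCV's HOGDescriptor):

```python
# Verify that the stated HOG parameters yield 5940 dimensions per image.
win_w, win_h = 128, 96        # window size (the resized ear image)
block, stride, cell, bins = 16, 8, 8, 9

blocks_x = (win_w - block) // stride + 1      # 15 block positions horizontally
blocks_y = (win_h - block) // stride + 1      # 11 block positions vertically
cells_per_block = (block // cell) ** 2        # 4 cells of 8x8 in a 16x16 block

dims = blocks_x * blocks_y * cells_per_block * bins
print(dims)  # 5940
```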
13. Method: Authentication
• Autoencoder
• Compresses the input data, extracts features, and reconstructs the output
• Trained to minimize the difference between input and output (the reconstruction error is used as the score)
• Uses only the enrolled user's data
• If the score is
• below the threshold ⇒ user
• above the threshold ⇒ stranger
Authentication using autoencoder
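The accept/reject rule can be sketched as follows (a minimal sketch with NumPy; `reconstruct` is a stand-in for the trained autoencoder's forward pass, and the toy threshold and data are illustrative, not values from the paper):

```python
import numpy as np

def is_user(x, reconstruct, threshold):
    """Accept if the autoencoder's reconstruction error is below the threshold.

    `reconstruct` stands in for an autoencoder trained only on the
    enrolled user's ear images, so it reproduces that user's data well
    and strangers' data poorly.
    """
    error = np.mean((x - reconstruct(x)) ** 2)   # mean squared reconstruction error
    return error < threshold

# Toy stand-in: near-perfect reconstruction for the enrolled user.
reconstruct = lambda x: x * 0.99
print(is_user(np.ones(10), reconstruct, threshold=0.01))  # True
```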
14. Method: Authentication
• FRR (False Rejection Rate)
• Percentage of the user's inputs rejected as a stranger's
• FAR (False Acceptance Rate)
• Percentage of strangers' inputs accepted as the user's
• EER (Equal Error Rate)
• The error rate at the threshold where FRR and FAR are equal
• FRR and FAR trade off against each other as the threshold changes
Error rates as the threshold changes (EER at the crossing point)
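FRR, FAR, and the EER search can be sketched with NumPy (a sketch only; scores here are reconstruction errors, so lower means more user-like, and the toy score arrays are illustrative):

```python
import numpy as np

def frr_far(genuine, impostor, threshold):
    """FRR: fraction of genuine scores rejected; FAR: fraction of impostor scores accepted."""
    frr = np.mean(genuine >= threshold)    # the user rejected as a stranger
    far = np.mean(impostor < threshold)    # a stranger accepted as the user
    return frr, far

def eer(genuine, impostor):
    """Sweep candidate thresholds; return the (FRR, FAR) pair where they are closest."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    rates = [frr_far(genuine, impostor, t) for t in thresholds]
    return min(rates, key=lambda r: abs(r[0] - r[1]))

genuine = np.array([0.1, 0.2, 0.3])    # toy reconstruction errors for the user
impostor = np.array([0.8, 0.9, 1.0])   # toy reconstruction errors for strangers
print(eer(genuine, impostor))  # perfectly separated scores -> (0.0, 0.0)
```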
15. Experiment
• 20 participants × 50 images = 1000 images
• Conditions
• Indoors
• Bright/dark × 4 directions
• Sitting
• Earphone reattached after each shot

Participants' information
Number of participants: 20 (11 males, 9 females)
Age: 24.2 ± 6.9
Number of images per person: 50
Train : test = 9 : 1 (10-fold cross-validation)
Wearing position: right ear
Mask R-CNN training: 100 images (2 participants × 50 images)
Identification/authentication: 900 images (18 participants × 50 images)

Captured ear image
16. Result
• Identification: accuracy 84.1%
• Authentication: EER 8.36%
Identification and authentication results for 18 participants
17. Additional Experiment
• 6 participants × 30 images × 3 conditions = 540 images
• Lighting conditions
• Bright (1000 lx)
• Normal (300 lx)
• Dark (30 lx)

Participants' information
Number of participants: 6 (2 males, 4 females)
Age: 22.3 ± 0.75
Number of images per person: 30 × 3 conditions
Wearing position: right ear
18. Additional Experiment: Results and Discussion
• Model trained using images from all three conditions
• Identification: 98.3%
• Authentication: EER 5.61%
• Recommended enrollment: set up in a bright condition, then add data from the other conditions
Results under different lighting conditions
19. Limitations and Future Work
• Constraints in the experiment
• Lighting, hair, body movement
• Combining with non-image-based methods
• Acoustic microphone, infrared sensor, etc.
• Improving accuracy
• Use images of both ears
• Increase the training data for Mask R-CNN
• Camera angle
• Include more ear features in the images using a fisheye lens
20. Summary
Background: Access to more sensitive information through hearable devices
Related work: Biometrics using ear acoustics and ear shape
Our approach: Identification/authentication using an earphone with a camera
Use case: Tailored information presentation to the user
Implementation: Attach a small camera to an earphone
Results: Identification accuracy 84.1%, authentication EER 8.36%
Future work: Improving accuracy, combining with other methods
21. References
• [Gao, 2019] Yang Gao, Wei Wang, Vir V. Phoha, Wei Sun, and Zhanpeng Jin. 2019. EarEcho: Using ear canal echo for wearable authentication. Proc. ACM Interact. Mob.
Wearable Ubiquitous Technol. 3, 3, Article 81 (Sep 2019), 1–24 pages. https://doi.org/10.1145/3351239
• [Choi, 2023] Seokmin Choi, Junghwan Yim, Yincheng Jin, Yang Gao, Jiyang Li, and Zhanpeng Jin. 2023. EarPPG: Securing Your Identity with Your Ears. In Proceedings of
the 28th International Conference on Intelligent User Interfaces (Sydney, NSW, Australia) (IUI ’23). Association for Computing Machinery, New York, NY, USA, 835–849.
https://doi.org/10.1145/3581641.3584070
• [Wang, 2022] Zi Wang, Yili Ren, Yingying Chen, and Jie Yang. 2022. ToothSonic: Earable authentication via acoustic toothprint. Proc. ACM Interact. Mob.Wearable
Ubiquitous Technol. 6, 2, Article 78 (Jul 2022), 1–24 pages. https://doi.org/10.1145/3534606
• [Iannarelli, 1989] A.V. Iannarelli. 1989. Ear identification. Paramount Publishing Company, Fremont, CA. https://books.google.co.jp/books?id=jgPkAAAACAAJ
• [Hurley,2007] D. J. Hurley, B. Arbab-Zavar, and M. S. Nixon. 2007. The ear as a biometric. In 2007 15th European Signal Processing Conference. IEEE, Poznan, Poland,
25–29.
• [Holz, 2015] Christian Holz, Senaka Buthpitiya, and Marius Knaust. 2015. Bodyprint: Biometric user identification on mobile devices using the capacitive touchscreen to
scan body parts. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems (CHI ’15). Association for Computing Machinery, New
York, NY, USA, 3011–3014. https://doi.org/10.1145/2702123.2702518
• [Jeges, 2006] Erno Jeges and Laszlo Mate. 2006. Model-based human ear identification. In 2006 World Automation Congress. IEEE, Budapest, Hungary, 1–6.
https://doi.org/10.1109/WAC.2006.375757
• [Tamaki, 2009] Emi Tamaki, Takashi Miyaki, and Jun Rekimoto. 2009. Brainy Hand: An Ear-Worn Hand Gesture Interaction Device. In CHI ’09 Extended Abstracts on
Human Factors in Computing Systems (Boston, MA, USA) (CHI EA ’09). Association for Computing Machinery, New York, NY, USA, 4255-4260.
https://doi.org/10.1145/1520340.1520649
• [Chen, 2020] Tuochao Chen, Benjamin Steeper, Kinan Alsheikh, Songyun Tao, François Guimbretière, and Cheng Zhang. 2020. C-Face: Continuously Reconstructing
Facial Expressions by Deep Learning Contours of the Face with Ear-Mounted Miniature Cameras. In Proceedings of the 33rd Annual ACM Symposium on User Interface
Software and Technology (Virtual Event, USA) (UIST ’20). Association for Computing Machinery, New York, NY, USA, 112-125.
https://doi.org/10.1145/3379337.3415879
• [Ryu, 1998] Onebae Ryu, Tatsuhiro Tamura, and Katsuyuki Shinohara. 1998. The Individual Identification with Pinna Image: An Examination for Auto Detection of Pinna
Area and Individuality. ITE Tech. Report 22, 45 (1998), 49–53.
23. Result
• Identification: accuracy 84.1%
Identification results for 18 participants
Identification results for 18 participants
Classifier Identification Accuracy (%)
Random forest 77.8
K-nearest neighbor 78.6
SVM (our approach) 84.1
Comparison with other classifiers
24. Result
• Authentication: total EER 8.36%
The EER for each participant; authentication results for 18 participants
25. Discussion: Number of Training Images
• Number of ear images per participant
• 50 images: identification 84.1%, authentication EER 8.36%
• 10 images: identification 82.8%, authentication EER 10.9%
Relationship between the number of ear images and accuracy
26. Discussion: Usability
• Number of ear images
• Only 10 images already give identification 82.8% and authentication EER 10.9%
• Processing time (in Google Colaboratory)
• Setup: 2 minutes
• Processing a single image
• Identification: 4.33 seconds
• Authentication: 4.7 seconds
Relationship between the number of ear images and accuracy
27. Discussion: Background Removal
• The accuracy of background removal
• Directly affects identification/authentication accuracy
• Increase the training data
• Data augmentation
• Use a larger number of individuals
• Increase the variety of ear image patterns
• Dataset quality
• Remove outlier images before training
28. Discussion: Battery Consumption
• Keeping the camera running all the time consumes a lot of battery
• It also leads to heat generation
• Activate the camera only when necessary
• e.g., when attachment is detected or when more personal information is accessed
• Reduce the frame rate
29. Discussion: Privacy
• Camera-based methods raise privacy issues
• Extract only the computed features from the image
• Use an infrared camera
• Make people around the camera aware of its use
• e.g., light an LED while the camera is active