Xeric Facial Recognition Whitepaper

Introduction
Facial recognition is both a science and an art. While there are various theories on getting it right, the intended result depends on the real situation on the ground. Apart from the quality of enrolled and probe images, it is important to consider non-technical aspects, which can be equally important. This guide does not give you a special formula that covers every situation, but it provides sufficient guidance to achieve an optimum result. Many of these considerations are so minor that one could easily miss them and then spend much time and many resources rectifying the result.
What can be achieved?
Once those essential considerations are taken care of during design and installation, you will be able to:
• Set and manage the appropriate level of expectations with customers.
• Design and propose the most suitable solution that deals with real onsite conditions.
• Achieve the highest probability of intercept with the best quality face indexes under the given conditions.
Key Considerations
This guide illustrates and explains the key considerations that are critical to achieving an optimum result with Xeric. A few fundamental principles of CCTV still apply when using Xeric. On top of those, we also need to consider the quality of images, camera installation, lighting, etc. We will also be introducing the COMAH concept, which represents a few critical considerations when designing the solution onsite.
These factors are inter-related and should not be considered individually. Instead, they must be examined holistically onsite to understand how each impacts the others and the solution as a whole.

Image Quality
In this section, we first examine the quality of images, which are broadly classified into 2 categories. Enrolled images are images that are imported into Xeric as the watchlist database. These images form the watchlist, which is stored on Xeric and compared against live images. Images that are captured by cameras are usually referred to as probe images or live images.
Quality differences between enrolled and probe images will affect Xeric’s accuracy. Facial recognition is fundamentally a pattern matching algorithm at work. It has evolved from matching algorithms using simple geometric models to sophisticated mathematical computations and matching processes. Thus these differences contribute to the overall performance of Xeric. In this section, we will examine the recommended quality of both enrolled and probe images.
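Since matching comes down to scoring how similar a probe image is to each enrolled image, the idea can be sketched as follows. This is a minimal illustration and not Xeric's algorithm: the feature vectors, the use of cosine similarity, the names `cosine_similarity` and `best_match`, and the 0.8 threshold are all assumptions made for the example.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def best_match(probe, watchlist, threshold=0.8):
    """Return (name, score) for the most similar enrolled entry whose
    score reaches the threshold, or None when nothing qualifies.
    The threshold value is hypothetical."""
    best = None
    for name, enrolled in watchlist.items():
        score = cosine_similarity(probe, enrolled)
        if score >= threshold and (best is None or score > best[1]):
            best = (name, score)
    return best

# Made-up feature vectors standing in for enrolled (watchlist) faces.
watchlist = {
    "person_a": [0.9, 0.1, 0.3],
    "person_b": [0.1, 0.8, 0.5],
}
# Made-up feature vector standing in for a probe face.
probe = [0.85, 0.15, 0.35]
match = best_match(probe, watchlist)
print(match)
```

The closer the probe vector is to an enrolled vector, the higher the score, which is why enrolled and probe images of similar quality tend to match more reliably.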
Enrolled Images
Enrolled images are images that are imported into the Xeric watchlist database. They form the database that probe images are searched against for similar faces. There are a few ways to obtain images for importing into Xeric. The following are some common means:
1. Passport Photographs / Mugshots
Usually these photos comply with a standard, such as ICAO, to ensure the images are of good quality. It is recommended to use a higher resolution image, as high resolution images can be resized for input into the database.
2. Live video feeds
Xeric will index all the faces that are correctly presented in front of cameras. These indexed images often provide the best form of enrolled images, as they are obtained from a source that will also be used to capture probe images.
3. Images from identity card, newspaper, magazine
Some users, such as law enforcement, might need to provide images that are not of high quality. These can be images from an identity card or faces that appear in a newspaper or magazine. If the photos are of decent size and quality, they can still be used as enrolled images.
An identity card, such as the one pictured, can be used to extract the photo of the person as an enrolled image. However, the quality of the photo plays a part in the accuracy.
A passport photo or mugshot, such as the one pictured, can be used as an enrolled image.
A live video feed, such as the one pictured, can be input into Xeric to extract face images, which can be used as enrolled images.

Probe Images
Probe images are images that are compared or matched against the enrolled images in the database. Typically, a probe image is obtained from a live video feed when a face is presented in front of a camera. Xeric’s algorithm provides face tracking and face extraction, and stores the face as an indexed image within Xeric. All indexed images are searched against the watchlist database to determine if there is any match. Apart from live video feeds, facial images or recorded video can also be input into Xeric to compare against the watchlist database.
You will notice that the difference between enrolled images and probe images lies in the way they are used. Both image types need to fulfill certain minimum requirements in order to provide optimal performance. This is because, fundamentally, facial recognition is pattern matching between 2 images, for which a similarity score is provided. Thus the higher the level of similarity between the 2 images, the higher the accuracy of the match.
The pictured example shows the outcome of a successful match. The leftmost image is the indexed image from a live video feed. It is matched against a watchlist database that contains the enrolled image, which shows the same person close to 20 years ago.
Lighting Level
A good lighting level is essential to capture the details of a face clearly for analysis and so achieve the aim of facial recognition. Good lighting should light the face evenly and clearly show the facial details of a person. It should avoid an image that is either too bright or too dark, where details are lost or hidden. This affects the accuracy of the facial recognition engine.
Photo 1 below shows an example of Good Lighting. The face is evenly lit without any burnout or shadow zone. There is a distinct difference between the foreground, which is the face, and the background, which could be a wall. Ideally, all photos should be enrolled or indexed like this to provide optimal performance.
Photo 2 below shows an example of an Overexposed image. There are no or very few shadow areas, and many parts of the face are burned out or show glare. Compared to Photo 1, the glare erases the boundary of the spectacles and other details around the cheeks, nose and forehead.
Photo 3 below shows the opposite extreme, an Underexposed image. An underexposed image does not have sufficient lighting, and thus a lot of details are hidden.
Photo 1 - Good Lighting    Photo 2 - Overexposed    Photo 3 - Underexposed
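As a rough illustration of the overexposed and underexposed cases described for Photo 2 and Photo 3, the sketch below flags an image when too many of its pixels are nearly white or nearly black. It assumes 8-bit grayscale pixel values (0 to 255). The function name `exposure_report` and all three cut-off values are hypothetical choices for the example, not Xeric settings.

```python
def exposure_report(gray_pixels, dark_level=30, bright_level=225, max_clipped=0.25):
    """Classify exposure from a flat list of 8-bit grayscale values.

    dark_level, bright_level and max_clipped are illustrative
    assumptions, not values taken from the whitepaper.
    """
    total = len(gray_pixels)
    if total == 0:
        raise ValueError("no pixels supplied")
    # Fraction of pixels that are nearly black or nearly white.
    dark = sum(1 for p in gray_pixels if p <= dark_level) / total
    bright = sum(1 for p in gray_pixels if p >= bright_level) / total
    # Burned-out highlights are checked first, then hidden shadows.
    if bright > max_clipped:
        return "overexposed"
    if dark > max_clipped:
        return "underexposed"
    return "acceptable"

# Synthetic pixel lists standing in for a face crop.
print(exposure_report([250] * 80 + [120] * 20))
print(exposure_report([10] * 80 + [120] * 20))
print(exposure_report([120] * 100))
```

A check like this only hints at lost detail. Judging the actual face images onsite remains the more reliable test.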

Camera Selection
The camera is the sensor that captures the facial image for the facial recognition engine to perform its analytic task. Customers might want a CCTV camera to perform multiple roles, thereby reducing the number of cameras they need to install. Many existing cameras are not suitable for facial recognition due to their resolution, field of view (FoV) and camera angle. We will cover more on these aspects in later sections.
The most versatile camera type, and the one we always recommend, is the box camera, which can be fitted with a telephoto lens. This provides flexibility of installation as well as the required pixels between eyes and FoV needed for facial recognition. Xeric does not limit itself to box cameras, but matches the type of camera to the customer’s requirement. In extreme cases, Xeric uses a webcam or the built-in camera on a laptop to perform its role.
The overexposed and underexposed results in Photo 2 and Photo 3 could be caused by improper lighting conditions or camera settings. In most cases, IP cameras are able to cater to a relatively wide range of lighting. It is highly recommended to leave the exposure setting to the camera itself. One might just indicate a specific zone or area that needs to be properly exposed to cater for various lighting conditions. Lighting, apart from fixed lighting, could also mean sunrise, sunset, overcast skies, reflections, etc. In most cases, the lighting level of an image falls between the Good Lighting condition and the two extremes of overexposed and underexposed images.
Resolution
Resolution can be defined as the level of visual detail or clarity in an image. In CCTV terminology, it defines the horizontal and vertical pixel count of a camera. For example, a typical IP camera could be 720p, 1080p, 3.1 megapixels or 5.2 megapixels. It is further divided by whether the aspect ratio is 4:3 or 16:9. In a typical 720p 16:9 format, one gets 1280 pixels horizontally and 720 pixels vertically.
So what is a pixel? Pixel means “Picture Element”, the smallest unit that makes up a picture. Translating this into 720p resolution, we have 1280 picture elements laid horizontally and 720 picture elements laid vertically to form an image. The table below shows different resolution levels of a CCTV camera.
How does resolution affect facial recognition? Xeric requires a minimum of 30 pixels between eyes, with 120 recommended, to perform accurate facial recognition. A high-resolution camera produces a higher pixel count, which means it can index and match more people than a lower resolution camera within a common FoV.
Resolution, together with frames per second, activity level and duration of activities, determines your network bandwidth and required storage. In Xeric’s case, a huge amount of storage is not really required, as Xeric does not need to store videos. Images are processed at the front end, and sending the processed images of a single person requires approximately 10kB per image. By default, Xeric captures 5 images per person, which makes it 50kB per person.
The Resolution Table on the next page illustrates the various common resolutions that you will encounter, the number of pixels and their common terms. We will also show you the relationship between resolution, frame rate and duration in computing the video size.
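The pixel counts and the per-person storage figure quoted in this section can be reproduced with a few lines of arithmetic. The sketch below is illustrative only. It assumes 1kB = 1000 bytes, uses the standard 1920 x 1080 dimensions for 1080p, and the figure of 10,000 people per day is a made-up site load.

```python
# Pixel dimensions of the two 16:9 formats named in the text.
RESOLUTIONS = {
    "720p": (1280, 720),
    "1080p": (1920, 1080),
}

BYTES_PER_IMAGE = 10_000  # approximately 10kB per processed image (from the text)
IMAGES_PER_PERSON = 5     # default number of images captured per person (from the text)

def megapixels(width, height):
    """Total pixel count expressed in millions of pixels."""
    return width * height / 1_000_000

def storage_bytes(people):
    """Approximate storage needed for the face images of `people` persons."""
    return people * IMAGES_PER_PERSON * BYTES_PER_IMAGE

for name, (w, h) in RESOLUTIONS.items():
    print(f"{name}: {w} x {h} = {megapixels(w, h):.2f} megapixels")

# Hypothetical site load: 10,000 people indexed per day.
print(f"10,000 people: {storage_bytes(10_000) / 1_000_000:.0f} MB")
```

Under these assumptions, one person costs 50kB and 10,000 people cost about 500 MB, which shows why storage is a minor concern compared with storing video.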

Field of View
Field of View (FoV) refers to the area that can be seen through a CCTV camera, which is directly related to the lens attached. The 2 images below are examples of the same scene taken with 2 different horizontal FoVs.
At first glance, the difference in FoV does not seem very significant. The image on the left is approximately 4 feet wider than the image on the right. However, this slight difference significantly affects whether the image is suitable for facial recognition. In the image on the left, we achieve approximately 32 pixels between eyes on a 720p image. This image just manages to meet the minimum requirement for facial recognition. Using the same camera setting and just a narrower FoV, we are able to achieve 65 pixels between eyes. This is illustrated in the images below.
Wide Field of View    Narrow Field of View
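The link between scene width and pixels between eyes can be approximated with simple proportion: the camera's horizontal pixels are spread across the width of the scene at the subject's distance. The sketch below is an estimate under stated assumptions and not a Xeric tool. The 0.063m eye spacing is an assumed typical adult value, the function names are hypothetical, and only the 30 and 120 pixel thresholds come from this whitepaper.

```python
MIN_EYE_PIXELS = 30            # minimum stated in this whitepaper
RECOMMENDED_EYE_PIXELS = 120   # recommended value stated in this whitepaper
ASSUMED_EYE_SPACING_M = 0.063  # assumption: typical adult distance between eyes

def pixels_between_eyes(horizontal_pixels, scene_width_m,
                        eye_spacing_m=ASSUMED_EYE_SPACING_M):
    """Estimate pixels between eyes for a face at the distance where the
    camera sees a scene `scene_width_m` metres wide."""
    if scene_width_m <= 0:
        raise ValueError("scene width must be positive")
    return horizontal_pixels * eye_spacing_m / scene_width_m

def rate(eye_pixels):
    """Compare an eye-pixel count against the stated thresholds."""
    if eye_pixels < MIN_EYE_PIXELS:
        return "below minimum"
    if eye_pixels < RECOMMENDED_EYE_PIXELS:
        return "meets minimum"
    return "recommended"

# Hypothetical scene widths for a 720p (1280 pixel wide) camera.
for width_m in (2.5, 1.25):
    px = pixels_between_eyes(1280, width_m)
    print(f"{width_m} m wide scene: {px:.1f} pixels between eyes ({rate(px)})")
```

With the assumed eye spacing, a 720p view about 2.5m wide gives roughly 32 pixels between eyes, and a view about 1.25m wide gives roughly 65, which is in line with the figures quoted for the wide and narrow FoV.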

The images above further illustrate the difference made by using a suitable lens to get the appropriate FoV. While a camera is looking at a particular scene, a varifocal lens is able to bring your zone of focus “closer” to you.
Using this principle, facial recognition requires a narrow FoV where we can achieve a good pixel count between eyes. Apart from a good pixel count, it is important to get the image in good focus, which is easier to achieve with a narrower FoV. By controlling the pixel count between eyes, we are also able to define our “capture zone”, where the faces of people within the zone will be indexed.
There is another term to take note of, as it affects your object focus: Depth of Field (DoF). DoF is a zone defined by a minimum distance and a maximum distance away from the lens, within which objects will be in focus. This zone is affected by the iris setting of the camera, which controls the amount of light that enters the lens.
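The effect of the iris on DoF can be illustrated with the standard thin-lens depth of field approximation. This is a general optics sketch, not something specified by Xeric. The circle of confusion value `coc_mm`, the function name and the example lens settings are assumptions, and the iris is expressed as an f-number, where a larger f-number means a smaller opening.

```python
import math

def depth_of_field(focal_mm, f_number, focus_distance_mm, coc_mm=0.005):
    """Return (near_mm, far_mm), the limits of acceptable focus.

    Uses the standard thin-lens approximation. coc_mm (circle of
    confusion) is an assumed value that depends on the sensor.
    """
    hyperfocal = focal_mm ** 2 / (f_number * coc_mm) + focal_mm
    s = focus_distance_mm
    near = s * (hyperfocal - focal_mm) / (hyperfocal + s - 2 * focal_mm)
    if s >= hyperfocal:
        far = math.inf  # everything beyond the near limit is in focus
    else:
        far = s * (hyperfocal - focal_mm) / (hyperfocal - s)
    return near, far

# Hypothetical setup: 10mm lens focused at 5m, iris at f/2 and then f/4.
for f_number in (2, 4):
    near, far = depth_of_field(10, f_number, 5000)
    print(f"f/{f_number}: in focus from {near / 1000:.1f} m to {far / 1000:.1f} m")
```

Closing the iris (a larger f-number) widens the in-focus zone but admits less light, so DoF and lighting need to be balanced together.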
You can see from the previous example that having an appropriate FoV increases the pixels between eyes. This can be achieved using a varifocal lens, which gives you the flexibility to zoom into your Area of Interest (AOI). Many people mix up optical zoom and digital zoom. While both enable you to focus on your AOI, they do not bring the same result.
Optical zoom is achieved using a physical zoom lens. There is no change in resolution and you keep the same pixel count both horizontally and vertically. Digital zoom is achieved using in-camera processing that enlarges the AOI and crops away the edges. While you can maintain the same aspect ratio, enlarging the AOI enlarges the pixels and reduces both resolution and image quality.
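The difference between the two kinds of zoom can be shown with a small calculation. This sketch is illustrative only: the function names are hypothetical, and it assumes an ideal lens where an optical zoom factor narrows the scene width by the same factor.

```python
def digital_zoom_source_pixels(width, height, zoom):
    """Sensor pixels left inside the cropped AOI at a digital zoom factor."""
    if zoom < 1:
        raise ValueError("zoom factor must be at least 1")
    return int(width / zoom), int(height / zoom)

def eye_pixels_after_zoom(eye_pixels, zoom, optical):
    """Captured pixels between eyes after zooming.

    Optical zoom narrows the FoV, so more sensor pixels land on the
    face. Digital zoom only enlarges pixels that were already captured,
    so the captured detail does not increase.
    """
    return eye_pixels * zoom if optical else eye_pixels

# A 720p image with 32 pixels between eyes, zoomed by a factor of 2.
print(digital_zoom_source_pixels(1280, 720, 2))
print(eye_pixels_after_zoom(32, 2, optical=True))
print(eye_pixels_after_zoom(32, 2, optical=False))
```

A 2x digital zoom on a 720p image leaves only 640 x 360 captured pixels in the displayed area, while a 2x optical zoom keeps all 1280 x 720 pixels on the narrower scene.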
3.0mm lens    10.0mm lens    40.0mm lens
Resolution + Field of View

Using the same settings as the images above, what would be the result of using a higher resolution camera? You must have gotten it correct! Yes, you can expect a higher pixel count between eyes even with a wide FoV. However, look at the pixels between eyes when the appropriate FoV is achieved: a whopping 99 pixels between eyes! You can see this illustrated in the images below.
A good rule of thumb is to fit the face into a box of approximately 800 x 600, as shown in the image below. When using a 1080p camera, the FoV should fit only approximately 3-4 people within the view. These recommendations enable you to obtain an optimal pixel count between eyes so that Xeric can achieve its best performance. However, there are other factors at work, which include enrolled images and COMAH. We will cover COMAH in a later section, which presents field issues that you might also be facing.

Camera Installation
Having a good facial recognition engine, good lighting and a good camera configuration solves only part of the equation of good facial recognition. If your camera is not installed at an appropriate position, your indexing rate and accuracy will be affected. This section provides recommendations for physically installing your cameras at positions that increase the probability of facial recognition.
The best place to place a camera is directly in front of the person at eye level. This gives you a direct frontal face image, which is the best for facial recognition. Realistically, this is often not possible without obstructing human movement. Alternatively, this could be a covert operation in which the general public cannot be alarmed and will be non-cooperative. Before looking at how cameras could be mounted, let’s see what limitations we are addressing.
Angle of Capture
Facial recognition works fundamentally by comparing two images and reporting their similarity. A probe image that looks similar to the enrolled image yields a better result than one that looks different. Pose differences play a significant part when matching the same person in different images looking in different directions. Laboratory studies show that when templating error is taken out of consideration, most facial recognition engines give similar performance in terms of accuracy. Xeric’s engine takes this into consideration and automatically performs pose correction on probe images prior to matching them against enrolled images. What’s even better is that a current JPEG image database can also be pose corrected before enrolment, ensuring that legacy databases can be used.
Pitch, Roll & Yaw
Even with a good algorithm, it is important to take the pitch, roll and yaw of faces into consideration when planning where to position facial cameras. This ensures that the offset on each axis is kept within tolerance, so that the algorithm accepts and indexes a face instead of rejecting it. The images below illustrate the differences between pitch, roll and yaw.
Pitch Down    Pitch Up
Roll Right    Roll Left
Yaw Right    Yaw Left

Placement of the camera must always aim to achieve a frontal facial view as much as possible. However, in real situations, this is hard to achieve, especially with covert cameras and uncooperative people. It is thus important to consider COMAH for each camera in relation to the overall objectives of your deployment. The next section provides a rule of thumb and a simple formula to calculate where you could mount your cameras.
Camera Mounting Formula
The best position for a facial recognition camera to capture a face is right in front of the person. However, this is not possible in most cases, where people move around freely and without obstruction. An alternative is to mount the camera at a higher position, achieving approximately a 20% Vertical Angle of Incidence, that is, the camera sits 0.2m above eye level for every 1m of capture distance. There is a simple formula that you can use to determine the recommended capture distance for a given mounting height of the camera.
(CH - EL) / 0.2 = CD
Where CH = Ceiling Height, CD = Capture Distance and EL = Eye Level
Example 1:
You have a ceiling height of 3.5m and you need to determine how far away from the camera to set up a chokepoint. A typical Asian male has his eye level at approximately 1.5m. Applying this information to the formula gives the following:
(3.5m - 1.5m) / 0.2 = 10m
In this example, you can set up the chokepoint approximately 10m away from the camera mounting point.
Example 2:
You need to set up a chokepoint 8m away from the camera mounting point and the ceiling height is 3.5m. How can you apply this information to the formula?
Rearranging the above formula gives (CD * 0.2) + EL = CH
(8.0m * 0.2) + 1.5m = 3.1m
The recommended mounting height is 3.1m, which is slightly lower than the ceiling. You can consider using a dropdown pole to bring the camera height to 3.1m for best performance.
Choosing the mounting position puts your understanding of FoV, resolution and pixels between eyes to the test. A good installation takes all of these factors into consideration to determine the appropriate camera, lens, location and height.
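The mounting formula and its two worked examples can be checked with a short script. This is a minimal sketch: the function names are hypothetical, and the only value taken from the whitepaper is the 0.2 factor. The angle helper is an added convenience that converts this ratio into degrees below horizontal.

```python
import math

INCIDENCE_RATIO = 0.2  # the 0.2 factor from the mounting formula

def capture_distance(ceiling_height_m, eye_level_m, ratio=INCIDENCE_RATIO):
    """CD = (CH - EL) / 0.2"""
    return (ceiling_height_m - eye_level_m) / ratio

def mounting_height(capture_distance_m, eye_level_m, ratio=INCIDENCE_RATIO):
    """CH = (CD * 0.2) + EL"""
    return capture_distance_m * ratio + eye_level_m

def vertical_angle_degrees(ratio=INCIDENCE_RATIO):
    """Downward viewing angle that corresponds to the height/distance ratio."""
    return math.degrees(math.atan(ratio))

# Example 1: ceiling at 3.5m, eye level at 1.5m.
print(round(capture_distance(3.5, 1.5), 2))
# Example 2: chokepoint 8m away, eye level at 1.5m.
print(round(mounting_height(8.0, 1.5), 2))
print(round(vertical_angle_degrees(), 1))
```

A ratio of 0.2 corresponds to a camera looking down at roughly 11 degrees, which keeps the pitch of the captured face small.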

COMAH
The performance of a facial recognition system involves much more than having a very accurate engine. The primary function of facial recognition is to capture facial images of people for matching against a database. One therefore needs to consider both human and environmental aspects to optimise the opportunity and quality needed to achieve accuracy. To do that, it is recommended to consider COMAH (read “coma”). COMAH stands for Chokepoint, Obstruction, Movement, Attraction and Human.
Chokepoint
A chokepoint can be defined as a narrow passage that one needs to pass through to reach a destination. Depending on the customer’s requirement, a chokepoint can be a gantry, metal detector, gate, etc. Capturing faces in this zone provides many advantages:
1. The walking speed of people is reduced, which increases the probability of getting images of people looking straight and level. A stationary or slower-moving subject reduces motion blur without the need to increase the frame rate. A higher frame rate could result in a higher storage requirement, which means higher cost for the customer.
2. Infrastructure can be enhanced in this region to provide evenly diffused lighting that improves the quality of images. Uneven lighting, backlighting and natural lighting could reduce the accuracy of facial recognition.
3. This zone enables the camera to focus on a specific width to achieve an optimal pixel count between eyes. Taking this into consideration, the setup might require only a 720p or 1080p camera. Compare this to a wide FoV, which might require a much higher megapixel camera and result in a high processing requirement.
Obstruction
An obstruction could be a permanent or temporary object that impedes or prevents passage. In our context, an obstruction could be a pillar, wall, plants or people standing between our subject and the camera. Besides looking at the floor plan to determine the placement of the camera, it is even more critical to visit the site.
Visiting the site at different times of the day and on different days of the week lets one find out whether there could be temporary obstructions. Talking to the customer about their future plans for the area lets us know about potential permanent obstructions. An obstruction might divert people away from the area or reduce the probability of getting facial images of the subject.

Movement
Having a camera look over an area with more people increases the probability of capturing their faces compared to an area with a smaller crowd. Studying the movement of people in the area is important for positioning the camera at the most appropriate location. Peak movement usually occurs before and after working and school hours, during major events and on holidays. In some circumstances, people might take a short cut to save time or to avoid a detour.
To learn all this, it is important to visit the site over various periods to get a better understanding of these matters. Depending on the requirement, changes to the site could be proposed to create a natural flow of people to where you want them to be.
Attraction
An attraction could be anything that draws the attention of people, whether they look at it or physically move towards it. A subject could already be in the chokepoint when something draws their view away from the front. Depending on the resulting angle of pitch, roll and yaw, the accuracy of the match could be lower. The face could also fall outside the limits, so that no face is indexed.
However, we could use attraction to our advantage. Instead of letting subjects look towards another zone or place, we could “encourage” them to look straight ahead. There could be some interesting feature that makes them look directly ahead, capturing their best pose. One point to note is that this attraction needs to change constantly to keep the subject engaged or interested enough to continue looking at it.
Human
Besides being an art and a science, facial recognition also deals with culture and human behaviour. Without taking this aspect into consideration, the effectiveness of capturing people’s faces could be limited.
In some countries, ladies might need to wear a veil and men might need to keep their beards for religious reasons. However, with the increase in global travel, similar situations could happen anywhere as people move around. Humans behave differently when they are approaching a swinging door, escalator, elevator, obstruction, etc. Before fixing a camera directly in front of their walking path according to the placement formula, it is important to study how they move in order to determine the capture area.
Another growing phenomenon is what is known as the “Mobile Zombie”: people who keep staring down at their mobile phones. In order to capture their faces, you might need to think of creative ways to capture their attention and make them lift their heads.

Conclusion
While it is important to have an accurate, effective and efficient facial recognition engine, it is equally important to take other factors into consideration. A physically challenging installation that has been considered comprehensively could achieve a high probability of indexing and high accuracy. A relatively simple installation without much consideration could leave one wondering why performance is not up to par.
The following summarises the major considerations discussed in this whitepaper:
1. Facial recognition matches 2 images together to produce the result. The quality of the enrolled and probe images has to be as close as possible.
2. Lighting on both enrolled and probe images affects the clarity of facial details needed to achieve high accuracy.
3. Camera type, resolution and lens are important considerations. Depending on each physical requirement and its effect on the storage system, different combinations of camera + resolution + lens are used. It is not recommended to stick to a permanent configuration for every scenario.
4. Chokepoint, Obstruction, Movement, Attraction and Human affect various considerations. Camera height, mounting location, etc. are affected by COMAH in order to give the best facial angle, evenly distributed lighting and maximum frontal facing path.
5. Most importantly, it is highly recommended to test all your assumptions onsite to eliminate issues that cannot be uncovered during tabletop planning.
This whitepaper is not meant to be exhaustive and is not a “must follow” rule. It is a guideline based on situations we saw in the field and the experience we accumulated after spending many nights working on installations. This whitepaper aims to enable you to leverage what we already know and jump-start the delivery of an effective, efficient and accurate solution for your customer.
Sunny Tan | MSc, MCiiSCM, CCSMP
https://sg.linkedin.com/in/sunnytan30
May 2016

Info-Technologies & Trade Pte Ltd
81 Ubi Avenue 4,
#07-22 UB.One
Singapore 408830
Xeric@inter2tkorea.com
