Fall Detection Using Microsoft Kinect
Prashantkumar Patel(pnp249), Vatsal Gopani(vbg221), Jatri Dave (jad752)
Department of Computer Science and Engineering, New York University Tandon School of Engineering
ABSTRACT
Elderly people are at risk of falling unexpectedly, and such accidents demand
serious attention. Using the Microsoft Kinect, we present a
fall detection system with two alternative methods for
detecting a fall, and we analyze the pros and cons of
each. The first method detects falls using the
Kinect's core APIs and the skeleton data they provide. The second
extracts the contour of the human blob and
detects falls by tracking that contour.
Index Terms— Fall detection, Kinect, computer vision
1. INTRODUCTION
Falls among elderly people are a serious clinical problem of aging.
They often lead to severe injuries and can be fatal; unstable
equilibrium frequently results in a fall, contributing to high
mortality among the elderly.
According to the CDC [1], one out of five falls causes a
serious injury such as broken bones or a head injury, and only
about half of elderly patients hospitalized after a fall survive
more than one year. Also,
adjusted for inflation, the direct medical costs of fall injuries
are $31 billion annually. Older adults living alone are at
particular risk of delayed assistance following a fall. A
system that can autonomously detect a fall incident could
reduce these injuries and, consequently, the treatment
expenditures.
We propose a system that can be implemented at low
cost. There have been several attempts
to solve this problem using various sensors and
methodologies. One class of methods uses wearable sensors
such as accelerometers to detect the fall. Another
detects floor vibration, which
is a complex and expensive approach. We,
on the other hand, propose a computer-vision-based
system that requires neither wearable sensors nor a
complex environment setup. The proposed approach
runs in real time, detects falls accurately, and
reliably reports whether a fall has occurred.
We use the Microsoft Kinect sensor to perform
the analysis. It is an RGB-D sensor capable of
performing complex computer vision tasks efficiently.
Under insufficient light, the Kinect's distance sensor
captures image contour and texture more clearly than a
traditional tracking system based on a binocular camera.
Its infrared sensor scans the environment to generate
a depth map of the current scene.
In this report, we present two fundamentally different
methods. The first uses Kinect's core APIs, from which the
skeleton structure is obtained. In the second, we avoid the
core APIs and instead apply classical computer vision
algorithms, implemented with EmguCV, an OpenCV
wrapper for C#.
2. MOTIVATION
In the majority of old age homes, residents have private rooms,
or sometimes shared rooms for two people. As a result, these
elderly people often do not get the attention required for
their health care when they are alone in their rooms: if they
need urgent medical attention, they cannot ask for it
immediately, and a major cause of such need is an
unanticipated collapse. Over 1.6 million US adults are
treated for fall-related injuries every year, ranging from
minor injuries to death [3]. One of the major reasons behind
these accidents is aging; elderly people are clearly the most
prone to this threat. Considering this issue, it is logical to
build a system that can partially solve the problem or
contribute towards its solution. We therefore decided to work
on a system that can detect a fall so that immediate action
can be taken without requiring the patient to do anything.
There are many systems on the market that can be
used to detect falls. Some are robust and some are not, but
none is used widely enough to make a significant change in
people's lives. The reason is that they are not user friendly:
surveys report that elderly people dislike having to wear a
device around the clock, feeling that their freedom is taken
away. Hence a system is needed that is robust and smart
enough to monitor the movements of people without
interfering with them. To implement this idea, we decided to
build a system that uses a camera to detect the fall.
3. RELATED WORK
Prior work on fall detection falls into a few broad categories.
Wearable-sensor systems use devices such as accelerometers to
detect the sudden motion of a fall, but require the user to wear
the sensor continuously. Ambient systems detect floor vibration,
which demands a complex and expensive installation. Vision-based
systems, including the approach proposed here, monitor the scene
with a camera and require neither wearables nor an instrumented
environment.
4. THE PROPOSED APPROACH
We propose two approaches to solve the problem, each using
the Microsoft Kinect sensor for a different purpose. A little
background on the device and its capabilities helps in
understanding the larger problem.
4.1 The Kinect Sensor.
As stated earlier, the Kinect is an RGB-D sensor manufactured
by Microsoft and used mainly in gaming and related research.
Two main variants are available on the market, Kinect V1 and
Kinect V2; we use the Kinect for Windows V1. The device, shown
in Fig. 1, consists of an RGB camera and an infrared (IR)
depth sensor, both with a resolution of 640×480
at 30 fps. It was developed by PrimeSense [11], which
also provides the software library for full-body 3D motion
capture.
We implement our algorithms using OpenCV. Although there is no
official support for programming the sensor and manipulating
the data stream coming from Kinect this way, some experimental
packages are available to work with.
4.2 Fall detection by skeleton structure.
Kinect detects the skeleton points of the human body needed
for this task. Figure 2 shows which skeleton points can be
obtained with Kinect.
Since the joint positions are available from this skeleton,
we can use them to track the person in front of the Kinect
and determine their position. The first approach is entirely
based on this concept.
In this method, we obtain the coordinates of the
“Head”, “Spine”, “Foot Right” and “Foot Left” joints
from the skeleton. Once the (x, y) coordinates of these
joints are available, the remaining analysis is straightforward.
We consider only the Y coordinates of the moving person's
skeleton points, since the X coordinates make no difference
when deciding whether a person has fallen. We then calculate
the differences between the Y coordinates of the points
listed above. If all the differences drop below a threshold
value, the system reports a fall. The threshold must be
chosen carefully, as an erroneous value will generate false
results.
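This decision rule can be sketched in a few lines. The function below is a minimal illustration, not our C# implementation: the argument names and the threshold value are hypothetical, and Y is assumed to be measured in depth-image pixels.

```python
def is_fall(head_y, spine_y, foot_left_y, foot_right_y, threshold):
    # When a person is upright, the vertical (Y) spread between head,
    # spine, and feet is large; after a fall all joints end up at a
    # similar height, so every pairwise difference shrinks.
    ys = [head_y, spine_y, foot_left_y, foot_right_y]
    diffs = [abs(a - b) for a, b in zip(ys, ys[1:])]
    # Report a fall only when ALL differences drop below the threshold.
    return all(d < threshold for d in diffs)
```

With an upright skeleton (head near the top of the frame, feet near the bottom) the spread stays above any reasonable threshold; after a fall all Y values cluster together and the rule fires.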
The figure below shows the flow diagram of the first
approach. In the accompanying screenshot, the system has
detected a fall and overlaid the person's skeleton structure
on the depth map; a color frame is also associated with the
data.
4.2.1 Benefits of using this approach
- Kinect's core APIs can track the human body and
estimate the pose even under partial occlusion.
- This approach also carries a smaller computational
workload, which lowers the hardware (CPU) cost of
keeping the system running.
4.2.2 Disadvantages of using this approach
1. Using Kinect's core APIs limits the system's
capabilities, because the APIs themselves are limited.
Kinect was designed as a gaming controller
and can only recognize a human in a standing or
sitting position. It cannot recognize a body lying
horizontally, so no skeletal data is available once
a person is lying on the floor.
2. When multiple people are present, Kinect constantly
switches between them and can track only one person
at a time.
This interferes with smooth operation of the system.
Because of these problems, we developed a second
approach that is more complex and computationally
heavier, but gives better results.
4.3 Fall detection by human blob tracking.
One limitation noted above is that Kinect cannot
recognize a human body in a horizontal position. This can be
solved by not relying on the core APIs of Kinect and building
our own approach. We therefore replaced the core APIs with
classical computer vision methods. This increases the amount
of computation but yields better results.
4.3.1 Detecting the human without Kinect's core APIs.
Since we are not using Kinect's core APIs, we use OpenCV to
analyze the raw Kinect data. OpenCV has no official support
for the C# programming language, but EmguCV, a C# wrapper
around OpenCV, fills this gap, and we used it for our
analysis. The following flow diagram explains the approach
in detail.
Kinect provides depth data of the surrounding environment,
and can simultaneously provide a color stream at 30 fps with
640×480 resolution. Processing the depth map is a necessary
step before the data can be used: a depth map is not RGB
data to which our algorithms can be applied directly. We
needed a way to visualize the depth map coming from Kinect,
converting its raw depth data into an OpenCV-compatible
image object so that further processing could be performed.
We assign an approximate RGB value to each depth pixel,
converting the depth data into a visual form, and then
process the resulting depth frame further.
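As an illustration of this depth-to-visual conversion, the sketch below maps raw Kinect depth readings (millimetres) linearly onto an 8-bit intensity. The 4 m working range is an assumption for illustration, and our actual C# code assigns RGB values rather than a single gray level.

```python
def depth_to_gray(depth_mm, max_depth_mm=4000):
    # Kinect v1 reports depth in millimetres; 0 means "no reading".
    # Clamp invalid and out-of-range values to black.
    if depth_mm <= 0 or depth_mm > max_depth_mm:
        return 0
    # Linear scale into 0..255: near pixels dark, far pixels bright.
    return int(255 * depth_mm / max_depth_mm)
```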
Although a depth map is extremely useful for many computer
vision problems, preprocessing is necessary. The depth map is
highly unstable, with countless local variations from one
frame to the next. We needed to clean the depth map, or at
least minimize the error in the depth output. We used a
Gaussian filter and a median filter for this purpose and
thereby reduced the noise in the depth data.
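The effect of median filtering on depth noise can be seen with a one-dimensional toy version. The real smoothing runs on the 2-D depth image (e.g. with a 3×3 kernel), so this is only a sketch of the principle.

```python
def median3(samples):
    # Replace each interior sample with the median of its 3-neighbourhood.
    # Isolated spikes -- the "speckle" typical of raw depth data -- vanish,
    # while genuine step edges are largely preserved.
    out = list(samples)
    for i in range(1, len(samples) - 1):
        out[i] = sorted(samples[i - 1:i + 2])[1]
    return out
```

Here a single dropped reading (a 0 amid ~810 mm values) is replaced by a plausible neighbour, which a Gaussian blur alone would only smear.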
To detect a moving object in the depth map without the Kinect
APIs, we needed a way to dynamically obtain the blob of the
moving object in the scene and then perform calculations on
it. We used the well-known adaptive background subtraction
technique. Of its common variants we chose the MOG2
algorithm, which suppresses the trailing shadows of moving
objects. An important feature of MOG2 is that it selects an
appropriate number of Gaussian distributions for each pixel,
giving better adaptability to scene changes such as varying
illumination.
In MOG2, each pixel is modeled using six distributions, three
background and three foreground, and the background
distributions are initialized from a set of training frames
(T sec). When a new pixel value arrives, any distribution it
matches has its range updated and its weight increased;
unmatched distributions slowly decrease in weight until they
reach zero. If the new pixel value matches no active
distribution, the foreground distribution with the least
weight (or an inactive one) is reinitialized from that value.
After updating, any foreground distribution whose weight
reaches a predefined threshold replaces the least-weighted
(or inactive) background distribution. The parameters are set
so that a stationary object placed in the scene is absorbed
into the background after approximately 5 min, and background
distributions become inactive if unmatched for 20 min.
Because the depth imagery uses an actively emitted pattern of
infrared light, many of the problems associated with
background modeling in color imagery, such as lighting
changes and shadows, are avoided.
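The gist of adaptive background subtraction can be shown with a drastically simplified stand-in: a single exponential running average per pixel instead of MOG2's per-pixel Gaussian mixtures. All constants here are illustrative only; the real system uses OpenCV's MOG2 implementation through EmguCV.

```python
def update_background(bg, frame, alpha=0.01):
    # Blend each new frame into the model; stationary content converges
    # to the background while brief changes barely move it.
    return [(1 - alpha) * b + alpha * f for b, f in zip(bg, frame)]

def foreground_mask(bg, frame, thresh=30):
    # A pixel is foreground when it deviates strongly from the model.
    return [abs(f - b) > thresh for b, f in zip(bg, frame)]
```

This toy model also absorbs stationary objects into the background over time, mirroring the 5-minute absorption behaviour described above, though it has no notion of shadows or multiple modes.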
Image of a moving person.
As the example above shows, we extract the contour of the
moving object along with its bounding rectangle. From this
data we compute the height/width ratio and convert it into a
dip angle. Another image below shows the mask extracted from
the background image; the mask highlights the moving object,
the data subtracted from the background.
After obtaining the results from MOG2, we extract the
contour region of the moving object. The core assumption
of this algorithm is that when a person is standing, the
bounding rectangle around them has a height/width ratio
greater than 1.0. When the person falls, the width of the
human blob increases and its height decreases, so the ratio
becomes very low and we conclude that a fall has been
detected. Following is a working screenshot of the system
and the mask of the fallen person.
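The bounding-box test reduces to a one-line comparison. The sketch below uses the height/width ratio directly with an illustrative threshold of 1.0; our implementation additionally converts the ratio into a dip angle before thresholding.

```python
def is_fallen(bbox_width, bbox_height, ratio_threshold=1.0):
    # Standing: a tall, narrow box, so height/width is well above 1.
    # Fallen: the box flattens and the ratio drops below the threshold.
    if bbox_width == 0:
        return False
    return (bbox_height / bbox_width) < ratio_threshold
```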
5. EXPERIMENTAL ANALYSIS
5.1. Room setup.
We used an average room of approximately 12' × 8' (in our
case a kitchen) and kept it almost empty to simplify the
process. We mounted the Kinect near the top of one wall at
approximately 6.5' height.
5.2. Experimental results of first approach.
In the first approach, we tested multiple threshold values
and, to our surprise, obtained excellent results. Consider
the following table for the analysis of our classifier.
Threshold TPR FPR
0 0 0
100 0 0.3
250 0 0.8
350 0.016 1
400 0.2 1
900 1 1
The calculated ROC for the above results is shown below.
Note, however, that all our test cases were run in the ideal
working environment we set up.
5.3. Experimental results of second approach.
For the second approach, the data we collected are shown
below.
Threshold TPR FPR
1 0 0
0.7 0 0
0.5 0.6 0.1
0.4 0.8 0.2
0.2 0.9 0.4
0.1 1 0.8
0 1 1
Again, the above plot is based on our ideal-case analysis,
where testing was performed in the experimental space.
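Each row of these tables is one operating point of the classifier. As a sketch of how such a point is computed (the score here is a generic fall score in [0, 1], not our exact dip-angle statistic):

```python
def roc_point(scores, labels, threshold):
    # TPR = detected falls / actual falls; FPR = false alarms / non-falls.
    # A sample is classified "fall" when its score exceeds the threshold.
    tp = sum(1 for s, y in zip(scores, labels) if s > threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s > threshold and not y)
    pos = sum(1 for y in labels if y)
    neg = len(labels) - pos
    return (tp / pos if pos else 0.0, fp / neg if neg else 0.0)
```

Sweeping the threshold from high to low moves the operating point from (0, 0) towards (1, 1), tracing out the ROC curve.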
6. CONCLUSION
We presented a low-cost fall detection system based on the
Microsoft Kinect and compared two methods: one built on
Kinect's core skeleton-tracking APIs and one based on contour
tracking of the human blob after adaptive background
subtraction. The skeleton-based approach is computationally
cheap but cannot recognize a person lying on the floor and
struggles with multiple people; the blob-tracking approach
requires more computation but handles these cases and gave
better results in our experiments. Both methods were
evaluated in a controlled room setup, where careful threshold
selection allowed falls to be detected reliably.
7. REFERENCES
[1] Centers for Disease Control and Prevention (CDC),
“Falls among older adults: An overview.” [Online]. Available:
http://www.cdc.gov/homeandrecreationalsafety/Falls/adultfalls.html
[3] NIHSeniorHealth, “About falls.” [Online]. Available:
http://nihseniorhealth.gov/falls/aboutfalls/01.html (accessed 10
December 2013).