A Feasibility Study of Smart Stadium Watching
Nedal Raboey, Information System
Appendix A:
Abstract: Sports events at stadiums are often marred by
violence, rioting, and objects thrown at players or among
spectators. These incidents cause physical injuries and
financial losses through repair costs and bans on hosting
future events. Authorities have deployed sophisticated
surveillance systems with arrays of CCTV cameras to monitor
stadiums and their surroundings. However, given the massive
volume of video and limited human resources, it is impractical
to manually review these feeds and turn them into meaningful
information. Smart Stadium Watching (SSW) is an intelligent
system that emulates aspects of human vision and classifies
human behavior as normal or abnormal. SSW combines multiple
techniques to counter common incidents at sports events, such
as rioting and the throwing of objects. It uses face
recognition to identify miscreants, crowd analysis to
determine crowd behavior [1], and object detection to find the
path, origin, and target of thrown objects.
1. INTRODUCTION
Rioting in sports stadiums has a long history. Stadiums across
the globe have seen many incidents of rioting and of objects
(fireworks, plastic bottles, banners, rocks, etc.) being
thrown. Such incidents not only endanger and harm players and
match officials but also cost stadium authorities money through
fines and bans on organizing future events, and damage their
reputation. The most affected sport is football, where large
numbers of fans attend matches. High tension and a few
miscreants create a serious risk to human lives. Stadium
security depends on human resources: security personnel must
review large amounts of video from numerous cameras across the
stadium. Human error, lapses in judgment, a shortage of
personnel, delayed response, etc. leave current security
systems vulnerable to serious incidents. Human intelligence and
the ability to assess situations and emotions remain important,
but there is a need for a smart, intelligent system that
analyzes feeds and responds faster.
The proposed system, SSW, will enable stadium authorities to
stop or limit such incidents by identifying miscreants involved
in throwing objects or other unruly behavior. The system uses
robust image/video processing to track an object thrown into
the arena [2] [3] and analyzes its trajectory to identify the
location of origin, down to the seat number, and even to
recognize the person involved [4]. It also estimates the
target, so that damage and danger can be minimized by
intercepting the object or clearing the target area [26]. The
system thus supports both proactive and pre-emptive action.
Human vision and sensory awareness are highly efficient at
scanning a large area, including a group of people, and
interpreting what is happening and why. Human interaction
recognition (HIR) and human action recognition (HAR) are two
important applications of computer vision and are useful for
developing the next generation of intelligent surveillance
systems [27]. Machine learning [5] for human behavior detection
is a sophisticated technology still in its developmental stage.
A robust approach is to use methods that describe video
sequences as an unordered set of local space-time features.
With the increase in the number of CCTV cameras [6] and the
amount of data they produce, manual supervision is prone to
numerous errors and problems. Challenges in transmitting and
archiving the huge feeds received from many cameras need to be
addressed, and the quality of feeds must be optimized when
storing them. There is a need for an intelligent system that
can detect an event as suspicious or abnormal and interpret its
meaning based on probabilistic models. Such a system aids
decision making, speeds up transmission, and can process feeds
from a number of sources at the same time [28]. This paper
discusses the fundamental concept of how the SSW system works,
along with the various factors and algorithms used to build it.
2. Smart Stadium Watching (SSW)
The SSW technique follows four basic steps, as shown in Fig 1:
1. Surveillance
2. Detection
3. Tracking
4. Risk Assessment
Fig 1. Entities in SSW
2.1 Surveillance in SSW
Surveillance in SSW collects video/image feeds from CCTV
cameras and provides monitoring assistance. Many cameras are
installed in a region, and each produces its own feed. The
video footage or images received are stitched together into a
single panoramic feed, which allows a user to monitor a large
area in one view. There are two methods to achieve a panoramic
view:
1. Single-Shot: A fisheye lens, or a single lens on a rotating
shaft, creates a single-shot video or image. These lenses are
expensive.
2. Multi-Shot: The feeds from multiple cameras are stitched
together into a single feed. This requires image processing
such as image alignment, image projection, and image blending.
Image stitching is comparatively easier to perform than video
stitching. Moving objects and differing levels of color and
illumination are the major issues in video stitching; image
blending and dynamic alignment are used to overcome them. Block
matching and feature matching techniques are applied to the
overlapping regions of images. The position and description of
objects are used to determine the degree of similarity between
different images [7]. Fig 2 shows how the panoramic view is
produced through toning, stitching, and color normalization.
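The block matching and blending steps can be illustrated with a minimal sketch. It is not the algorithm of [7]: it assumes two grayscale images of equal height stored as nested lists, searches only for a horizontal overlap, and cross-fades linearly. The function names (sad, find_overlap, stitch) are hypothetical.

```python
def sad(a, b):
    """Sum of absolute differences between two equally sized pixel blocks."""
    return sum(abs(p - q) for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b))


def find_overlap(left, right, min_overlap=1):
    """Return the overlap width (in columns) that best aligns two images.

    For each candidate width w, the last w columns of `left` are compared
    with the first w columns of `right`. The width with the lowest mean
    absolute difference per pixel wins; ties go to the wider overlap.
    """
    height = len(left)
    max_overlap = min(len(left[0]), len(right[0]))
    best_w, best_cost = min_overlap, float("inf")
    for w in range(min_overlap, max_overlap + 1):
        left_block = [row[-w:] for row in left]
        right_block = [row[:w] for row in right]
        cost = sad(left_block, right_block) / (w * height)
        if cost <= best_cost:
            best_w, best_cost = w, cost
    return best_w


def stitch(left, right, overlap):
    """Join two images, cross-fading linearly over the overlap (overlap >= 1)."""
    out = []
    for row_l, row_r in zip(left, right):
        blended = []
        for i in range(overlap):
            alpha = (i + 1) / (overlap + 1)  # weight given to the right image
            blended.append((1 - alpha) * row_l[-overlap + i] + alpha * row_r[i])
        out.append(row_l[:-overlap] + blended + row_r[overlap:])
    return out
```

A real multi-camera setup would also need projection and vertical alignment, and video stitching must repeat or update the alignment as objects move.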
Fig 2. Flow chart of the proposed video stitching algorithm.
Fig 3. The results with drift problem (Top) and without drift
problem (Bottom)
Fig 4. An example of the drifting problem (as calculated in [7])
The panoramic image then goes through basic pre-processing
steps such as color blending, drift adjustment, and alignment
(as illustrated in Fig 3 and Fig 4). Alignment is adjusted by
joining the front edge of the leftmost image of the panorama to
the tail edge. The images in Fig 5 show the results of aligning
images properly.
Fig 5. An example of the blending problem
(Top) With blending, (Bottom) Without blending
2.2 Detection in SSW
Detection provides the baseline for all surveillance systems.
It can mean face detection [8] of an individual or a group of
individuals, recognition of a crowd or mob, or detection of
moving or still objects, attended or unattended. Detecting
events [32] in an image or a video can be classified as
human-centered action detection or pooling of features and
actions.
Face detection is the most basic operation in a surveillance
system. Face recognition automatically identifies or verifies a
person in an image or video sequence against a database of
images of serial or past offenders, and provides relevant
information about the individual. It compares previously stored
images with the currently received feeds. It faces several
difficulties, such as pose variation, misalignment, facial
aging, illumination variation, expression variation, and the
distance between face and camera. Different algorithms are
available for face detection. The algorithm discussed here uses
facial points/locations such as the nose, eyes, lips, and marks
on the face.
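To illustrate how facial points can be compared against stored images, the following minimal sketch matches a probe face to a gallery by landmark geometry. It is a simple stand-in, not the light field method of [8] or [9]. It assumes that 2D landmark coordinates have already been extracted by some detector; the landmark names, the 0.1 threshold, and the function names are hypothetical.

```python
import math


def landmark_signature(points):
    """Turn named 2D landmarks into scale-invariant pairwise distances.

    Distances are divided by the inter-eye distance so the signature does
    not depend on how far the face is from the camera.
    """
    names = sorted(points)
    scale = math.dist(points["left_eye"], points["right_eye"])
    return [
        math.dist(points[a], points[b]) / scale
        for i, a in enumerate(names)
        for b in names[i + 1:]
    ]


def best_match(probe, gallery, threshold=0.1):
    """Return the closest gallery identity, or None if none is close enough."""
    probe_sig = landmark_signature(probe)
    best_id, best_dist = None, float("inf")
    for identity, points in gallery.items():
        d = math.dist(probe_sig, landmark_signature(points))
        if d < best_dist:
            best_id, best_dist = identity, d
    return best_id if best_dist <= threshold else None
```

Normalizing by the inter-eye distance addresses only the camera-distance issue listed above. Pose, expression, and illumination variation would need a richer representation.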
All visual information in a scene can be considered as local
change along one or more dimensions of a single function known
as the plenoptic function, or light field function.
A light field camera (LFC) captures not only the intensity but
also the direction of all incident rays on each photosensor
pixel. This property of the LFC has been used to reconstruct
both super-resolution and high-dynamic-range images for face
and iris recognition. In a conventional 2D image, only a few
regions are in sharp focus. The principle of light field
imaging is to capture the information of the entire scene in
the image. A 3D reconstruction of the image is made [9] and
matched against the stored image, which reduces the issues of
aging and illumination. An example of face detection is shown
in Fig 6.
Fig 6. The algorithm can construct a likely facial image of a
target
This detection technique can also be used to identify
restricted items or objects. The basic idea of the face
detection technique is to create a 3D reconstruction of an
image and match it with a stored image. It works most
effectively on specific facial features such as a scar or other
visible marking, but it can also be applied to objects such as
knives or fireworks.
Fig 7. Types of items that allow for easy detection in facial
recognition systems.
In places like stadiums, identifying an individual is difficult
and cumbersome. Riots or clashes occur in a crowd, so it is
necessary to detect a crowd and its behavioral response. The
technique used here [10] detects a crowd at two levels: i) at a
narrow scale, to identify human appearances, and ii) at a
larger scale, to identify repetitive elements. Crowd
segmentation serves the higher-level purposes of analyzing
behavior, interaction, and count [19].
Foreground/background segmentation [24] on depth image streams
is used to coarsely segment persons; depth information is then
used to localize head candidates, which are tracked over time
on an automatically estimated ground plane. People counting is
an instantaneous estimate of the number of persons present in a
scene, while crowd analysis is the higher-level analysis of the
behavior of groups of people in crowded scenes. Techniques for
counting people are classified as i) Detection + Counting
approach: by motion segmentation of moving objects; ii)
Feature-Based Counting: by using multiple features associated
with crowd density.
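The detection and counting approach can be sketched as follows. This is a simplified illustration, not the pipeline of [19]: it omits head localization and tracking. It assumes depth frames as nested lists of distances in metres and a reference depth map of the empty scene; the margin, the minimum area, and the function names are hypothetical.

```python
from collections import deque


def foreground_mask(depth, background, margin=0.3):
    """Mark pixels at least `margin` metres closer than the empty scene."""
    return [
        [bg - d >= margin for d, bg in zip(row_d, row_bg)]
        for row_d, row_bg in zip(depth, background)
    ]


def count_blobs(mask, min_area=2):
    """Count 4-connected foreground regions with at least `min_area` pixels."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first flood fill over one connected region.
            area, queue = 0, deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                area += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if area >= min_area:
                count += 1  # regions below min_area are treated as noise
    return count
```

In a dense crowd, neighbouring people merge into one region, which is one reason feature-based counting is used when crowd density is high.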
2.3 Tracking in SSW
Tracking an object or person of interest after detection is
easy when the object or person is still [30]. Objects thrown
into a sports arena can be detected, and their origin and
target determined, using the Time of Flight (ToF) technique
[11]. Most motion detectors suffer from motion blur and
shadowing [21] [23], but ToF uses the relative photometry
principle [17]: it calculates the relative amount of photons at
different time intervals.
Color measurement algorithms that work at low light levels are
important, for example in night vision and in rapid industrial
inspection, where exposure times must be minimized. Any
movement of objects or cameras causes phase mixing within a
given time frame, leading to motion blur. A shorter time frame
and shadow removal techniques [31] help improve the
signal-to-noise ratio (SNR) [14]. Obscured visual environments
impair visual perception and degrade performance in object
detection, identification, and recognition. Blurred images may
also induce negative affective responses to the visual
environment, because blur increases the ambiguity, uncertainty,
and unpredictability of the scene.
A ToF camera collects multiple electric charges at different
time intervals to calculate a depth value. ToF depth cameras
emit IR signals of a fixed wavelength and measure the phase
offset between the emitted and reflected IR signals to obtain
the depth from the camera to objects, as shown in Fig 8.
Fig 8. Comparison of the times of the proposed system with
other current methods
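The relation between phase offset and depth can be made concrete with a short sketch. It assumes a continuous-wave ToF camera whose IR signal is amplitude-modulated at a known frequency, which the text above does not state. The 20 MHz value in the usage note is purely illustrative, and the function names are hypothetical.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def depth_from_phase(phase_offset_rad, modulation_hz):
    """Distance for a continuous-wave ToF measurement.

    The light travels to the object and back, so the round trip adds a
    factor of two: d = c * phase / (4 * pi * f).
    """
    return SPEED_OF_LIGHT * phase_offset_rad / (4 * math.pi * modulation_hz)


def unambiguous_range(modulation_hz):
    """Largest distance before the phase wraps past 2*pi: c / (2 * f)."""
    return SPEED_OF_LIGHT / (2 * modulation_hz)
```

At an assumed 20 MHz modulation, a phase offset of pi corresponds to about 3.75 m and the phase wraps at about 7.5 m, so stadium-scale distances would need a lower modulation frequency or phase unwrapping.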
ToF determines the depth value, which is used to locate the
origin of the thrown object and its projectile path, and from
these the target location, as in Fig 8. The sensitivity of the
landing area is marked, and the projectile path helps in
determining the landing area. Based on the depth value, an
accident can be averted. Object tracking approaches fall into
two categories: discriminative and generative [12].
Discriminative methods segment the target from the background
by treating tracking as a classification problem. Generative
methods formulate tracking by building an appearance model of
the target [25] [29].
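How a tracked projectile path yields an origin and a landing point can be sketched under simple assumptions: the tracker delivers timestamped 3D positions in metres with z pointing up, air drag is ignored, and at least two samples at distinct times are available. The function names and the release height used in the example are hypothetical; this is not the method of [25] or [29].

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def fit_line(ts, vs):
    """Least-squares fit of v = a + b * t; returns (a, b)."""
    n = len(ts)
    t_mean = sum(ts) / n
    v_mean = sum(vs) / n
    b = sum((t - t_mean) * (v - v_mean) for t, v in zip(ts, vs)) / sum((t - t_mean) ** 2 for t in ts)
    return v_mean - b * t_mean, b


def fit_trajectory(samples):
    """Fit a drag-free ballistic path to (t, x, y, z) samples."""
    ts = [s[0] for s in samples]
    x0, vx = fit_line(ts, [s[1] for s in samples])
    y0, vy = fit_line(ts, [s[2] for s in samples])
    # Adding back the known gravity term makes the height linear in t.
    z0, vz = fit_line(ts, [s[3] + 0.5 * G * s[0] ** 2 for s in samples])
    return (x0, y0, z0), (vx, vy, vz)


def crossing_point(position, velocity, height, later=True):
    """Ground-plane (x, y) where the fitted path crosses `height`.

    `later=True` returns the later crossing (landing side);
    `later=False` returns the earlier crossing (release side).
    """
    (x0, y0, z0), (vx, vy, vz) = position, velocity
    disc = vz ** 2 + 2 * G * (z0 - height)
    if disc < 0:
        raise ValueError("the fitted path never reaches this height")
    root = math.sqrt(disc)
    t = (vz + root) / G if later else (vz - root) / G
    return x0 + vx * t, y0 + vy * t
```

For an upward throw, calling crossing_point with the height of a seating tier and later=False estimates where the object was released, and calling it with height 0 estimates where it will land on the pitch. Mapping the release point to a seat number would additionally need a model of the stands.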
2.4 Risk Assessment in SSW
Risk assessment is the most crucial part of a surveillance
system. Risk is determined by analyzing individual and crowd
behavior [22] and by pinpointing the origin and target of
thrown objects. Even a sophisticated detection and tracking
system is useless unless the feed is converted into meaningful
information. Crowd behavior analysis [13] is most important in
a place like a stadium. The major challenge in abnormality
detection is that there is no clear definition of an
abnormality: abnormalities are context dependent and can be
defined as outliers of normal distributions. Techniques for
detecting abnormalities are classified as i) Object-Based
Method: by segmenting the frame into individual objects, which
are tracked and described by their trajectories; ii) Holistic
Approach: by treating the crowd as a single entity rather than
isolating each individual; iii) Hybrid Approach: by combining
both methods using optical flow histograms of crowd movement.
Human behavior can be classified into two categories [14],
i) Normal and ii) Abnormal, as in Fig 9.
Fig 9. Defining normal and abnormal behaviors.
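The idea of abnormalities as outliers of a normal distribution can be illustrated with a minimal sketch. It assumes each frame has been reduced to one scalar feature, for example the mean optical flow magnitude of the crowd, and that footage of normal behavior is available for training. The feature choice, the threshold of three standard deviations, and the function names are hypothetical.

```python
from statistics import mean, stdev


def fit_normal_model(training_values):
    """Estimate mean and standard deviation from footage labelled as normal."""
    return mean(training_values), stdev(training_values)


def is_abnormal(value, model, z_threshold=3.0):
    """Flag a value whose z-score against the normal model exceeds the threshold."""
    mu, sigma = model
    return abs(value - mu) / sigma > z_threshold
```

Because abnormality is context dependent, a separate model per camera or per phase of the match is likely to be needed. The sketch also assumes that the training values are not all identical, since the standard deviation would then be zero.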
Feature point tracking is another method for risk assessment.
It acquires a series of features, which are classified into
three categories: undesired motion, moving object, and static
object. It uses a three-step holistic approach: i) Image
Analysis to eliminate vibration and noise; ii) Pattern
Recognition with predefined gestures and moves; iii) Behavioral
Analysis based on context decisions.
Fig 10. A model that calculates risk for pedestrian situations.
The crowd behavior analysis is still in its early development
phase. Machine learning and contextual and motion pattern
understanding are still improving.
3. ARCHITECTURE OF SSW
SSW uses a networked structure, shown in Fig 11, based on
distributed IP addresses with a dedicated server at each point.
The networked structure provides security and privacy [15] [18]
by protecting the system from unauthorized use. IP CCTV systems
do not require local recording; they can transmit their images
across local networks, the Internet, and wide area networks to
a central location, where the images can be recorded, viewed,
and managed. IP CCTV systems [20] convert all images to data
and have no theoretical limit on resolution, provided that
enough bandwidth exists to transmit the images.
The components (cameras, security terminals, storage (NAS)
server, and SSW server) are interconnected by a private WLAN,
each with its own IP address. The architecture provides
hierarchical access for the security team. Security feeds are
stored on a separate server for future viewing and analysis.
Fig 11. Architecture of Proposed SSW
4. CONCLUSION
The loss of human lives and physical injuries to players and
audience members are tragic. The financial loss is also a
concern, due to bans on hosting future games in the stadium,
stadium repair costs, and compensation paid to victims. The
proposed SSW aims to decrease the occurrence of these incidents
by reducing human error and overcoming limitations of current
systems. Better image/video processing will make it easier to
monitor and identify people across a large area. Steady and
robust algorithms are expected to give better results than
existing systems while requiring less manual intervention and
decreasing the chance of error. Preventing dangerous incidents
will benefit stadium authorities: it will protect the stadium’s
reputation and reduce losses from fines and lost business. The
cost-effective SSW system also has wider applications. It can
be implemented in public places such as railway stations,
shopping complexes, and airports to make public areas more
secure.
SSW uses various algorithms that combine human-like
intelligence, response, behavior, and awareness with computer
vision. The faster processing speed makes the system reliable
and less dependent on human judgment. The implementation cost
and time are justified, since the system provides a safe
experience for audiences and saves sports clubs the cost of
fines, reputational damage, and even property damage.
References
[1] Crowd Analysis using Computer Vision Tools, IEEE Signal
Processing Magazine, Sep 2010
[2] Moving shadow detection and removal – a wavelet transform
based approach, Manish Khare, Rajneesh Kumar Srivastava,
Ashish Khare, Department of Electronics and Communication,
University of Allahabad, Allahabad 211002, India
[4] Analyzing Tracklets for the Detection of Abnormal Crowd
Behavior, 2015 IEEE Winter Conference on Applications of
Computer Vision
[5] A survey on computer vision tools for action recognition,
crowd surveillance and suspect retrieval, Teófilo E. de
Campos, Centre for Vision, Speech and Signal Processing,
University of Surrey, Guildford, GU2 7XH, UK
[6] Overview of Recent Advances in CCTV Processing Chain in
the INDECT and INSIGMA Projects, 2013 International
Conference on Availability, Reliability and Security
[7] A 360-degree Panoramic Video System Design, Kai-Chen
Huang, Po-Yu Chien, Cheng-An Chien, Hsiu-Cheng Chang and
Jiun-In Guo, National Chiao Tung University, Taiwan
[8] Presentation Attack Detection for Face Recognition Using
Light Field Camera, IEEE TRANSACTIONS ON IMAGE
PROCESSING, VOL. 24, NO. 3, MARCH 2015
[9] A Taxonomy of 2D and 3D Face Recognition Methods,
Radhey Shyam and Yogendra Narain Singh, Department of
Computer Science & Engineering Institute of Engineering and
Technology, Lucknow - 226 021, India
[10] Crowd Detection from Still Images, Ognjen Arandjelović,
Department of Engineering, University of Cambridge, CB2 1PZ,
UK
[11] Time-of-Flight Depth Camera Motion, Seungkyu Lee,
Member, IEEE
[12] An Algorithm for Real-time Object Tracking in Complex
Environment, 2014 International Joint Conference on Neural
Networks (IJCNN)
[13] Behavior Tracking Model in Dynamic Situation using the
risk ratio EM, Yuchae Jung, Dept. of Multimedia, SookMyung
Women’s University, Seoul, Korea
[14] Effects of image blur on visual perception and affective
response, Kohske Takahashi & Katsumi Watanabe, Research
Center for Advanced Science and Technology, The University of
Tokyo, Tokyo, Japan
[15] A System of Abnormal Behaviour Detection in Aerial
surveillance, 2013 IEEE
[16] Surveillance Camera System to Achieve Privacy Protection
and Crime Prevention, 2014 Tenth International Conference on
Intelligent Information Hiding and Multimedia Signal
Processing
[17] Photon Detection and Color Perception at Low Light
Levels, 2014 Canadian Conference on Computer and Robot
Vision
[18] A Privacy Preserving Human Tracking Scheme, IEEE ICC
2014 - Communication and Information Systems Security
Symposium
[19] Real-time people counting from depth imagery of crowded
environments, 2014 11th IEEE International Conference on
Advanced Video and Signal Based Surveillance (AVSS)
[20] IP Camera:
http://www.cisco.com/c/en/us/td/docs/solutions/Enterprise/Video/IPVS/IPVS_DG/IPVS-DesignGuide.pdf
[21] Efficient Shadow Removal Technique for Tracking Human
Objects, Aniket K Shahade, Department of Information
Technology Shri Sant Gajanan Maharaj College of Engineering,
Shegaon, Maharashtra, India
[22] Automated Real-Time Detection of Potentially Suspicious
Behavior in Public Transport Areas, IEEE TRANSACTIONS
ON INTELLIGENT TRANSPORTATION SYSTEMS, VOL. 14,
NO. 2, JUNE 2013
[23] Moving shadow detection and removal – a wavelet
transform based approach, Manish Khare, Rajneesh Kumar
Srivastava, Ashish Khare, Department of Electronics and
Communication, University of Allahabad, Allahabad 211002,
India
[24] Reference Face Graph for Face Recognition, Mehran Kafai,
Member, IEEE, Le An, Student Member, IEEE, and Bir Bhanu,
Fellow, IEEE
[25] Accurate Prediction of Interception Positions for Catching
Thrown Objects in Production Systems, IEEE 2009
[26] OCCLUSION-AWARE 3D MULTIPLE OBJECT
TRACKER WITH TWO CAMERAS FOR VISUAL
SURVEILLANCE, 2014 11th IEEE International Conference on
Advanced Video and Signal Based Surveillance (AVSS)
[27] Human Interaction Recognition from Distance Signature of
Body Centers During Time, 2014 7th International Symposium
on Telecommunications (IST'2014)
[28] Detecting Abnormal Behaviors in Crowded Scenes,
Oluwatoyin P. Popoola, Systems Engineering Department,
Faculty of Engineering, University of Lagos, Nigeria, and
Hui Ma, College of Electronic Engineering, Heilongjiang
University, Harbin 150080, PR China
[29] Catching Objects in Flight, IEEE TRANSACTIONS ON
ROBOTICS, VOL. 30, NO. 5, OCTOBER 2014
[30] REAL-TIME TRACKING WITH AN EMBEDDED 3D
CAMERA WITH FPGA PROCESSING, Alessandro Muscoloni
and Stefano Mattoccia, Dipartimento di Informatica - Scienza e
Ingegneria (DISI), University of Bologna, Viale Risorgimento
2, 40136 Bologna (Italy)
[31] Motion Area based Exposure Fusion Algorithm for Ghost
Removal in High Dynamic Range Video Generation, Shu-Yi
Huang, Qin Liu, Hao Wang and Takeshi Ikenaga,
Software Institute, Nanjing University, Nanjing, China
[32] On Criminal Identification in Color Skin Images Using
Skin Marks (RPPVSM) and Fusion with Inferred Vein Patterns,
Arfika Nurhudatiana, Student Member, IEEE, and Adams
Wai-Kin Kong, Member, IEEE
http://www.arupassociates.com/en/projects/king-abdullah-sports-city/
https://www.youtube.com/channel/UCAyCedE9vZKDMqyX0fyQVug