Human Activity Recognition
By Gadam Srikanth, K. Sudarshan, R. Gopinath
Guided by Dr. Mahesh Kumar H. Kolekar



   Object-level understanding
    Locations of persons and objects.
    e.g.: 'Car' appeared in the video
   Tracking-level understanding
    Object trajectories – correspondence
   Activity-level understanding
    e.g.: recognition of human activities and events



    Human activity recognition is an important area of
     computer vision research and applications.
   The goal of activity recognition is the automated
    analysis, or interpretation, of ongoing events and their
    context from video data.
   Its applications include surveillance systems, patient
    monitoring systems, and a variety of systems that
    involve interactions between persons and electronic
    devices, such as human-computer interfaces.
   Most of these applications require recognition of
    high-level activities, often composed of multiple
    simple actions of persons.

Categorized based on their complexity:
 Actions: single-actor movements.
   e.g.: bending, walking, etc.
 Interactions: human-human/object
  interactions.
   e.g.: punching, lifting a bag, etc.
 Group activities: activities of groups.
   e.g.: group dancing, group stealing, etc.

   Surveillance: cameras installed in areas that
    may need monitoring, such as
    banks, airports, military installations, and
    convenience stores.
    Currently, surveillance systems are mainly used
    for recording.
    The aim of activity detection using CCTVs
    is to detect suspicious activities, such as fighting
    and stealing, for real-time reactions.


  Sports play analysis:
   Analyzing the play and deducing the actions in
   the sport.




   Unmanned Aerial Vehicles (UAVs):
    Automated understanding of aerial images.
    Recognition of military activities, e.g. border
    security, people in bunkers.

                           Figure: a UAV capturing three
                           Taliban insurgents planting an
                           IED (improvised explosive
                           device).



   The kit we are using for processing the
    video is the Devkit8000.
   It has a TI OMAP3530 processor based on a
    600 MHz ARM Cortex-A8 core.
   Memory: up to 256 MB DDR
    SDRAM and 256 MB NAND flash.
   Through add-on modules, it also supports
    Ethernet, audio, USB OTG, SD/MMC, keyboard,
    UART (Universal Asynchronous Receiver/Transmitter),
    camera, Wi-Fi, GPRS, and GPS.
   The device includes state-of-the-art
    power-management techniques required for
    high-performance mobile products and supports
    high-level operating systems such as Windows
    CE, Linux, Symbian OS, and Android.
   The board can boot the system
    from either the SD card or NAND flash.




   Autonomous All-Terrain Vehicle:
    Build an autonomous ground vehicle in a
    modular way, employing sensor fusion at
    various levels and leading to software APIs for
    several sensors, an Attitude & Heading
    Reference System, a path planner, and a map
    builder.
   Real-time images for radar and micro air
    vehicles:
    A computer-vision platform for micro air
    vehicles.

   Unmanned Aerial Vehicle with real-time
    wireless video transmission capability:
    The UAV transmits video captured by its
    sensor to a base station in real time.
   HDD-based multimedia system with video and
    audio:
    A multimedia system based on OMAP and
    Linux.



   Car Assisting System with image and location
    processing:
    A car management system that assists the
    driver by providing a model of the outer
    environment with the support of cameras, GPS,
    and other sensors.
   Autonomous-Seeway:
    The project makes an already-built Seeway
    autonomous and implements its control
    algorithms for tracking people or vehicles,
    through vision algorithms with a camera and
    laser mapping.

 x-loader is a bootstrap program that initializes the CPU.
 u-boot is a second-level bootstrap program for interacting with users,
  updating the images required for the OS, and loading the kernel.
 The 2.6.x kernel (the interface between software and hardware) is employed
  and can be customized for the Devkit8000.
 The root filesystem (rootfs) employs an open-source system; it is small in capacity but powerful.



 The board boots from NAND flash by
  default, but can also boot from the SD card.
 Using HyperTerminal in Windows, we interfaced the
  LCD and the board.
 Installed the cross-compilation environment (toolchain)
  in Ubuntu.
Cross compilation: compiling for a platform
  on which it is not feasible to compile directly, such as
  microcontrollers that do not support an operating
  system.
 Installed the other required tools and drivers in Linux.




   The scope of our project is recognition of
    common activities like walking, clapping, etc.
   Object level:
   This is the first level in the recognition. We
    have to fix our object(s) of interest.
   In this technique, after acquiring the first
    frame of a video, the user manually fixes some
    points, called feature points, on the frame
    according to human anatomy.

   The feature points are chosen so that the parts
    between the points are rigid. We finally form
    the skeleton structure of the human body.
   These points are then used to form rectangles
    resembling the human structure; the final
    structure formed is the model on which the
    computer works. The figure that follows
    illustrates this.
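As a minimal sketch of this modeling step, the feature points and rigid parts can be represented as follows. The joint names, coordinates, and fixed segment width here are all hypothetical, chosen only to illustrate turning joint pairs into rectangles:

```python
def segment_rect(p1, p2, width=10):
    """Axis-aligned bounding rectangle (x, y, w, h) around the rigid part joining p1 and p2."""
    (x1, y1), (x2, y2) = p1, p2
    x, y = min(x1, x2) - width // 2, min(y1, y2) - width // 2
    w = abs(x1 - x2) + width
    h = abs(y1 - y2) + width
    return (x, y, w, h)

# Hypothetical feature points placed manually on the first frame (pixel coordinates).
joints = {
    "head": (50, 10), "neck": (50, 30), "hip": (50, 80),
    "left_hand": (20, 60), "right_hand": (80, 60),
    "left_foot": (35, 130), "right_foot": (65, 130),
}

# Rigid parts: pairs of joints whose connecting segment does not bend.
parts = [("head", "neck"), ("neck", "hip"),
         ("neck", "left_hand"), ("neck", "right_hand"),
         ("hip", "left_foot"), ("hip", "right_foot")]

# The resulting set of rectangles is the model the computer works on.
model = {f"{a}-{b}": segment_rect(joints[a], joints[b]) for a, b in parts}
print(model["head-neck"])   # -> (45, 5, 10, 30)
```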



Tracking:
Once we have divided the human into
 rectangular segments, we can track them in
 the following frames, and hence track
 their motion.
This is done by searching for the
 rectangular region that matches the original
 rectangular region from the first frame and
 tracking it. Thus, at any point in time, we
 keep track of the rectangular segments, which
 helps us track the human motion as a whole.

Here, "searching" means finding the region
  where the pixel-by-pixel match is
  very high.
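The search described above can be sketched as sum-of-squared-differences (SSD) template matching: slide the first-frame rectangle over the new frame and keep the position where the pixel-by-pixel difference is lowest. This is an illustrative brute-force version with a toy synthetic frame, not the project's actual implementation:

```python
import numpy as np

def match_template(frame, template):
    """Return (row, col) where `template` best matches inside `frame` (lowest SSD)."""
    fh, fw = frame.shape
    th, tw = template.shape
    best, best_pos = np.inf, (0, 0)
    for r in range(fh - th + 1):
        for c in range(fw - tw + 1):
            patch = frame[r:r + th, c:c + tw].astype(float)
            ssd = np.sum((patch - template) ** 2)   # pixel-by-pixel mismatch
            if ssd < best:
                best, best_pos = ssd, (r, c)
    return best_pos

# Toy example: a bright 2x2 patch (the tracked segment) moved to (3, 4).
frame = np.zeros((8, 8))
frame[3:5, 4:6] = 255
template = np.full((2, 2), 255.0)
print(match_template(frame, template))   # -> (3, 4)
```

In practice the search would be restricted to a small window around the segment's previous position instead of scanning the whole frame.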
Other, simpler tracking methods can also be
  used, but the conditions under which they
  apply may differ.
An image segmentation method can also be used,
  provided that the background is well known,
  as shown next.


   One image segmentation technique, known as
    background subtraction, extracts our desired
    object of interest.
   Once the image is separated into background
    and foreground, tracking is easy: the movement
    of the human is interpreted as the movement of
    the foreground, and it can thus be tracked.
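A minimal background-subtraction sketch, assuming the background image is known in advance: the foreground (the moving human) is wherever the current frame differs from the background by more than a threshold. The images and threshold here are toy values:

```python
import numpy as np

def foreground_mask(frame, background, threshold=30):
    """Boolean mask, True where the frame differs from the known background."""
    diff = np.abs(frame.astype(int) - background.astype(int))
    return diff > threshold

# Known (empty) background and a frame where a "person" has entered.
background = np.zeros((6, 6), dtype=np.uint8)
frame = background.copy()
frame[2:4, 2:4] = 200            # bright 2x2 region standing in for the person

mask = foreground_mask(frame, background)
print(mask.sum())                # -> 4 foreground pixels
```

Tracking then reduces to following the foreground region (e.g. its bounding box or centroid) from frame to frame.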



   Activity Recognition:
   There are several ways to recognize the
    activity.
   Model fitting:
   In this method, the resulting pattern of motion
    is compared with the activity templates that
    are already stored in memory.
   The activity is recognized by finding the
    best match among the templates.
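The template-matching step above can be sketched as a nearest-neighbor lookup: represent the observed motion as a feature vector and pick the stored activity template closest to it. The activity names, feature choices, and numbers here are hypothetical:

```python
import math

# Hypothetical activity templates: motion-feature vectors, e.g.
# (mean horizontal speed, mean vertical limb swing).
templates = {
    "walking":  (1.2, 0.4),
    "clapping": (0.0, 0.9),
    "bending":  (0.1, 0.2),
}

def recognize(observed):
    """Return the template name whose feature vector best matches `observed`."""
    return min(templates, key=lambda name: math.dist(observed, templates[name]))

print(recognize((1.0, 0.5)))     # -> walking
```

Richer versions of this idea compare whole motion trajectories rather than two summary numbers, but the "best match wins" principle is the same.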
   References:
   Devkit8000 user manual
   Abstract of Dr. Omaima Nomir (Computer
    Sciences Department, Faculty of Computers and
    Information, Mansoura University)
   Google





Editor's Notes

  • #10 BSP – board support package; WinCE – Windows Embedded Compact
  • #11 API – application programming interface