PhD presentation bboulay


PhD presentation of Bernard Boulay, 23 January 2007.


  • Talk about the 3D posture avatar.
  • The general posture recognition rate allows postures belonging to the same general posture to be mixed; the detailed posture recognition rate differentiates each detailed posture.
  • We will focus in the following on three rotation steps: 36, 45 and 90 degrees. Representation and comparison times are negligible compared to the generation time for rotation steps greater than 36 degrees. Representation and comparison times are similar for the other representations.
  • The GPRRs are higher than the DPRRs. A rotation step of 36 degrees gives the best recognition rates. The best recognition rates are obtained with the H. & V. projections. Hu moments give the worst results. This happens because of the invariance properties of this representation, in particular the orientation invariance: for example, a standing posture can be mixed with a lying posture.
  • We are also interested in the problem of intermediate postures, which are postures between two postures of interest. We can see in this example the video sequence of a person lowering her left arm. This video contains two postures of interest: standing with one arm up and standing with arms near the body. We hope to recognise the succession of three postures: standing, one arm up, and standing. The recognitions are displayed on the graphics for each 2D approach, on the left without temporal filtering and on the right with temporal filtering. First, we can remark that the H. & V. representation correctly recognises the succession of the 3 postures even without filtering. Second, we see that temporal filtering corrects wrong recognitions for the other representations. Moreover, we see that for the Hu moments representation, the standing posture is mixed with lying postures.
  • We have also used synthetic videos to identify the ambiguous cases. For example, we can see in the table how the T-shape posture is recognised for a given viewpoint.
  • This graphical interface is composed of 3 parts. The filtered postures can then be used for behaviour analysis.
  • This table represents the general posture recognition rates for the different 2D approaches with the watershed algorithm. H. & V. projections give the best recognition rates, followed by the geometric features. The recognition is correct, with rates above 80%. We can notice that the Hu moments representation does not work correctly: in particular, as seen previously, because of the orientation invariance, and also because when a hole occurs in the silhouette the error affects all the terms of the Hu moments. Similar results are obtained with the VSIP algorithm. In the following we will focus on the H. & V. projections representation.
  • We see here the recognition of the detailed postures. Recognition rates are similar for both segmentations, except for the sitting on a chair posture. The recognition rates are quite good, from 70 up to 80%.
  • We have also tested our approach on other kinds of video sequences. In particular, we are interested in video sequences used in gait analysis. For this purpose we have introduced a new posture of interest: the walking posture. During the recognition we aim to recognise the succession of the standing with arms near the body and walking postures. In this video the silhouettes obtained are good since there is a strong contrast between the person and the background. We can see on the graph that the postures are well recognised, and in particular that the gait cycles are well detected.
  • We have also tested our approach on video sequences acquired for the gait competition. In the video the person walks from right to left and from left to right on a semi-ellipse. Even if the silhouettes are noisy, the postures and the gait cycles are well recognised, except for a few cases. On the different videos we have tested, … are correctly recognised out of … total postures.
  • Our proposed approach has also been tested for action recognition. We focus on self-actions, i.e. actions where only one person is involved.
  • The first action we have recognised is the fall, which is an important action for medical purposes. For example, it can be used for helping elderly persons at home. The fall action is characterised by the transition between a standing posture and a lying one. We can see on the video that the falling action is well recognised. Since it is based on general postures, and since these postures are well recognised, the action is also well recognised.
  • The second action we have tested is the walking action. The tests are carried out on sequences taken from the gait challenge competition. The table shows the number of gait cycles correctly recognised. The action is correctly recognised except for a small number of cases.
  • In conclusion we can say that the properties highlighted with the synthetic data are verified with the real data. In particular …. The Hu moments are definitely not adapted to our approach. Finally the processing time …
  • In conclusion, our approach is able to recognise 9 detailed postures, which correspond to 4 general postures. The approach has been successfully tested for different types of silhouettes. It has also been tested for self-action recognition. We identified 4 constraints at the beginning of the introduction. Some work remains on the automated approach and on real-time processing.

    1. 1. Human Posture Recognition for Behaviour Analysis 23 January 2007 Presented by Bernard Boulay ORION team Advisor : Monique Thonnat
    2. 2. Outline <ul><li>Introduction on human posture </li></ul><ul><li>State of the art </li></ul><ul><li>Overview of the hybrid approach </li></ul><ul><li>3D posture avatar </li></ul><ul><li>Hybrid approach </li></ul><ul><li>Evaluation </li></ul><ul><li>Conclusion – Future works </li></ul>
    3. 3. Introduction on human posture
    4. 4. Introduction <ul><li>Human posture is a specific configuration of the human body. </li></ul><ul><li>Human posture recognition is a difficult and challenging problem: </li></ul><ul><ul><li>degree of freedom of the human body </li></ul></ul><ul><ul><li>morphology of the person (height, corpulence, …) </li></ul></ul><ul><ul><li>clothes </li></ul></ul>Many possible perceptions of a given posture Standing Sitting Bending
    5. 5. Objectives <ul><li>To propose a generic human posture recognition approach: </li></ul><ul><ul><li>video interpretation context </li></ul></ul><ul><ul><li>various set of postures </li></ul></ul><ul><li>Four constraints : </li></ul><ul><li>One monocular static camera </li></ul><ul><ul><li>easily adaptation to different environments </li></ul></ul><ul><li>Automated approach </li></ul><ul><ul><li>no interaction from a human operator during the recognition process </li></ul></ul><ul><li>Independence from the viewpoint </li></ul><ul><ul><li>capability of providing results for any position of the camera </li></ul></ul><ul><li>Real-time </li></ul><ul><ul><li>generation of an alarm (or other action) as soon as an event occurs </li></ul></ul>
    6. 6. Applications Human Behaviour analysis Human posture recognition Video surveillance systems Aware house applications Virtual reality Intelligent user interfaces Sport monitoring
    7. 7. State of the art
    8. 8. Video interpretation <ul><li>Human posture recognition is one task of video interpretation </li></ul>Object segmentation Object classification Person tracking Posture recognition Behaviour analysis People detection
    9. 9. Previous work <ul><li>Physiological and mechanical sensors </li></ul><ul><li>Video sensors </li></ul><ul><li>Several taxonomies are possible: </li></ul><ul><ul><ul><li>dimensionality of workspace (e.g. 2D, 3D, ) </li></ul></ul></ul><ul><ul><ul><li>type of model used (statistical, explicit, learning based) </li></ul></ul></ul><ul><ul><ul><li>quantity of sensors (one or several cameras) </li></ul></ul></ul><ul><li>Approach types described: </li></ul><ul><ul><ul><li>2D approaches with explicit models </li></ul></ul></ul><ul><ul><ul><li>2D approaches with statistical models </li></ul></ul></ul><ul><ul><ul><li>3D approaches for mono camera configuration </li></ul></ul></ul><ul><ul><ul><li>3D approaches for multi cameras configuration </li></ul></ul></ul>
    10. 10. 2D approaches with explicit models <ul><li>Based on a set of 2D models depending on the viewpoint. </li></ul><ul><li>Body parts are detected : </li></ul><ul><ul><li>extremities (head, hands and feet) </li></ul></ul><ul><ul><li>limbs (legs and arms) </li></ul></ul><ul><li>+ Low processing time </li></ul><ul><li>Sensitive to segmentation errors </li></ul>Cardboard model [Ju et al. 1996]
    11. 11. 2D approaches with statistical models <ul><li>Based on a statistical modeling of postures </li></ul><ul><ul><li>statistical terms are generally derived from the silhouette. </li></ul></ul><ul><li>+ Low processing time </li></ul><ul><li>Dependent on the training </li></ul><ul><li>viewpoint </li></ul>Horizontal 2D probability map for standing posture [Panini et al. 2003] (green points have a high probability to belong to a standing posture) Silhouettes Horizontal projections
    12. 12. 3D approaches – mono camera <ul><li>Explicit-based approaches </li></ul><ul><ul><li>an articulated 3D body model </li></ul></ul><ul><ul><li>search the 3D model parameters such that the model projection on the image plane fits with the input image (silhouette, contour) </li></ul></ul><ul><li>+ Independent from the viewpoint </li></ul><ul><li>- High processing time </li></ul><ul><li>Learning-based approaches </li></ul><ul><ul><li>do not need an explicit 3D model </li></ul></ul><ul><ul><li>store annotated 3D postures in an image database </li></ul></ul><ul><li>- High processing time </li></ul><ul><li>- Need of an exhaustive database </li></ul>3D human model [Sminchisescu et al. 2002]
    13. 13. 3D approaches – multiple cameras <ul><li>Improve the 3D measures </li></ul><ul><li>Solve self-occlusion problem </li></ul><ul><li>Model-based approaches </li></ul><ul><ul><li>use an articulated 3D body model </li></ul></ul><ul><ul><li>compute the 3D model parameters </li></ul></ul><ul><li>Learning-based approaches </li></ul><ul><ul><li>learning is made on annotated images </li></ul></ul><ul><ul><li>images can be treated separately </li></ul></ul><ul><ul><li>or used to reconstruct a 3D shape </li></ul></ul><ul><ul><li>[Cohen et al. 2003] </li></ul></ul><ul><li>+ Independent from the viewpoint </li></ul><ul><li>- High processing time </li></ul>3D shape representation [Cohen et al. 2003]
    14. 14. Previous work – video sensor
    2D approaches – Advantages: low processing time; Drawbacks: dependence on the viewpoint
    3D approaches – Advantages: independence from the viewpoint; Drawbacks: high processing time
    Hybrid approach = 2D approaches + 3D approaches: low processing time + independence from the viewpoint
    15. 15. Overview of the hybrid approach
    16. 16. Proposed approach Object segmentation Object classification Person tracking People detection Behaviour analysis Posture filter Detected silhouette Identifier Filtered posture Posture detector Camera parameters Posture recognition
    17. 17. 3D posture avatar
    18. 18. 3D posture avatar Joints Body parts 3D human body model Joint parameters Body primitives 3D posture avatar A B B is composed of A
    19. 19. 3D human body model <ul><li>Defines the relation between the body parts and the joints. </li></ul><ul><li>Our model is : </li></ul><ul><ul><li>20 body parts </li></ul></ul><ul><ul><li>9 joints (main articulations) </li></ul></ul><ul><ul><li>1 positioning joint </li></ul></ul>
    20. 20. Joint parameters <ul><li>Define the rotation of the different body parts in the 3D space. </li></ul><ul><li>Rotations are represented with Euler angles: </li></ul><ul><ul><li>three angles according to each axis </li></ul></ul><ul><ul><li>allows to easily model a 3D posture avatar </li></ul></ul><ul><li>3 degrees of freedom for each joint except the knees: </li></ul><ul><ul><li>23 parameters </li></ul></ul>
    21. 21. Body primitives <ul><li>Define the visual aspect of the body parts. </li></ul><ul><li>Surfacic model: </li></ul><ul><ul><li>composed of vertices (2D facets which live in 3D space ) </li></ul></ul><ul><ul><li>realistic silhouettes </li></ul></ul><ul><ul><li>computation time is similar to a volumetric human model (cylinder, parallelepiped, …) </li></ul></ul>
    22. 22. Postures of interest Standing Sitting Bending Lying Hierarchical representation of postures General postures Detailed postures Accurate postures
    23. 23. Hybrid approach
    24. 24. Proposed approach Posture detector Camera parameters Object segmentation Object classification Person tracking People detection Behaviour analysis Posture filter detected silhouette Identifier Filtered posture
    25. 25. Posture detector – silhouettes generation <ul><li>To generate silhouettes according to the detected person. </li></ul>Detected silhouette Virtual camera 3D silhouette generator 3D posture avatars Generated silhouettes Camera parameters 3D position
    26. 26. 3D posture avatar positioning <ul><li>Calibration matrix of the camera: </li></ul><ul><ul><li>3D position of a point on the image </li></ul></ul><ul><ul><li>the point is on the ground </li></ul></ul><ul><li>Posture avatar position depends on the posture type </li></ul><ul><ul><li>it corresponds to the rotation axis </li></ul></ul>Generated silhouettes Rotation step
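The positioning step above can be sketched in code. Under the stated assumptions (the point lies on the ground, i.e. Z = 0, and the calibration matrix P is known), dropping the third column of the 3x4 projection matrix yields a 3x3 homography from the ground plane to the image, and inverting it maps a foot pixel back to a 3D ground position. The thesis does not give the exact computation; this is one standard way to do it:

```python
def invert3(m):
    """Inverse of a 3x3 matrix via the adjugate (no numpy needed)."""
    a, b, c = m[0]; d, e, f = m[1]; g, h, i = m[2]
    det = a*(e*i - f*h) - b*(d*i - f*g) + c*(d*h - e*g)
    adj = [[e*i - f*h, c*h - b*i, b*f - c*e],
           [f*g - d*i, a*i - c*g, c*d - a*f],
           [d*h - e*g, b*g - a*h, a*e - b*d]]
    return [[x / det for x in row] for row in adj]

def ground_position(P, pixel):
    """Ground-plane (Z=0) position of an image point.

    P is the 3x4 calibration/projection matrix; keeping columns
    1, 2 and 4 gives the homography from the Z=0 plane to the image.
    """
    H = [[P[r][0], P[r][1], P[r][3]] for r in range(3)]
    Hinv = invert3(H)
    u, v = pixel
    x, y, w = (sum(Hinv[r][c] * p for c, p in enumerate((u, v, 1.0)))
               for r in range(3))
    return x / w, y / w
```

The recovered ground point is then used as the rotation axis when placing the 3D posture avatar.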
    27. 27. Posture detector – silhouettes comparison Generated silhouettes Detected silhouette 2D silhouettes comparison Detected posture
    28. 28. Silhouette Comparison <ul><li>Comparing two silhouettes is a problem of shape similarity. </li></ul><ul><li>Existing approaches can be classified in 3 categories: </li></ul><ul><ul><li>feature based approaches </li></ul></ul><ul><ul><li>boundary based approaches </li></ul></ul><ul><ul><li>structural based approaches </li></ul></ul><ul><li>Different approaches have been studied according to two issues: </li></ul><ul><ul><li>computation rapidity </li></ul></ul><ul><ul><li>independence from the silhouette quality </li></ul></ul>
    29. 29. Silhouette Comparison Classification of 2D methods to represent silhouettes (computation rapidity / independence from the silhouette quality):
    Hu moments: ++ / --
    Geometric features: ++ / --
    H. & V. projections: ++ / -
    Skeletonisation: + / --
    Shape from context: --- / --
    Distance transform: + / ---
    30. 30. Silhouette Comparison <ul><li>Feature based approach: </li></ul>horizontal and vertical projections
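The horizontal and vertical projections feature can be sketched in a few lines. This is a minimal illustration, not the thesis implementation: the silhouette is assumed to be a binary 2D array, the projections are normalised by the area, and the L1 distance between projection vectors is one plausible comparison measure (the slides do not specify the distance used):

```python
def hv_projections(silhouette):
    """Horizontal and vertical projections of a binary silhouette.

    The silhouette is a 2D list of 0/1 values; the horizontal
    projection counts foreground pixels per row, the vertical
    projection per column.
    """
    h = [sum(row) for row in silhouette]
    v = [sum(col) for col in zip(*silhouette)]
    # Normalise by the silhouette area so the comparison is less
    # sensitive to scale (an assumption, not stated on the slide).
    area = float(sum(h)) or 1.0
    return [x / area for x in h], [x / area for x in v]

def projection_distance(p1, p2):
    """L1 distance between two equal-length projection vectors."""
    return sum(abs(a - b) for a, b in zip(p1, p2))
```

In the posture detector, the detected silhouette would be compared against every generated silhouette and the minimum-distance posture kept.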
    31. 31. Silhouette Comparison
    32. 32. Silhouette Comparison <ul><li>Boundary based approach : skeletonisation [Fujiyoshi et al., 2004] </li></ul><ul><ul><li>two dilatations (remove small hole) </li></ul></ul><ul><ul><li>erosion (smooth out anomalies) </li></ul></ul><ul><ul><li>centroid computation </li></ul></ul><ul><ul><li>distance from centroid to boundary computation </li></ul></ul><ul><ul><li>distance curve smoothing (mean window algorithm) </li></ul></ul><ul><ul><li>local maxima extraction </li></ul></ul>0 7 9 11 21 41 Skeleton for several window sizes
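The last four steps of the skeletonisation pipeline above (centroid, centroid-to-boundary distances, mean-window smoothing, local maxima) can be sketched as follows; the morphological clean-up is omitted, and the boundary is assumed to be given as an ordered list of points:

```python
import math

def centroid(points):
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return sum(xs) / len(xs), sum(ys) / len(ys)

def distance_curve(boundary):
    """Distance from the centroid to each boundary point, in order."""
    cx, cy = centroid(boundary)
    return [math.hypot(x - cx, y - cy) for x, y in boundary]

def mean_window(curve, w):
    """Mean-window smoothing on a closed (circular) curve."""
    n = len(curve)
    return [sum(curve[(i + k) % n] for k in range(-w, w + 1)) / (2 * w + 1)
            for i in range(n)]

def local_maxima(curve):
    """Indices of strict local maxima on the closed curve: the
    skeleton extremities (head, hands, feet) in [Fujiyoshi et al.]."""
    n = len(curve)
    return [i for i in range(n)
            if curve[i] > curve[i - 1] and curve[i] > curve[(i + 1) % n]]
```

On an elongated shape the maxima land on the two extremities, which is exactly what the "star" skeleton exploits.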
    33. 33. Temporal posture filtering <ul><li>Temporal posture coherency is exploited to repair posture recognition errors. </li></ul><ul><li>Posture stability principle: </li></ul><ul><ul><li>for a high enough frame-rate (>5 fps) the posture remains similar for two consecutive frames </li></ul></ul><ul><li>Tracking information (the identifier) provides the list of the previous postures. </li></ul><ul><li>Size of the window depends on the frame rate (10 images/s → window of 2*10+1 frames) </li></ul><ul><ul><li>→ delay of 1 second </li></ul></ul>Previous recognised postures Posture filter Detected posture Filtered posture Identifier
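The posture filter above can be sketched as a sliding window over the previous recognised postures. The slides only state the stability principle and the 2*k+1 window size; taking the majority vote inside the window is one plausible filtering rule, used here for illustration:

```python
from collections import Counter, deque

class PostureFilter:
    """Temporal posture filter: majority vote over a sliding window.

    The window size 2*k+1 depends on the frame rate (10 frames/s
    -> k=10, i.e. a delay of one second before the filtered
    posture is emitted).
    """
    def __init__(self, k=10):
        self.window = deque(maxlen=2 * k + 1)

    def update(self, detected_posture):
        self.window.append(detected_posture)
        # The filtered posture is the most frequent one in the window.
        return Counter(self.window).most_common(1)[0][0]
```

A single misrecognised frame inside an otherwise stable run is then voted out by its neighbours.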
    34. 34. Evaluation
    35. 35. Evaluation method <ul><li>The approach is evaluated on both </li></ul><ul><ul><li>Synthetic video </li></ul></ul><ul><ul><ul><li>many viewpoints </li></ul></ul></ul><ul><ul><ul><li>experimentations for different problems (intermediate postures, ambiguous postures) </li></ul></ul></ul><ul><ul><ul><li>automatic ground truth </li></ul></ul></ul><ul><ul><ul><li>difficulty for simulating realistic noisy videos </li></ul></ul></ul><ul><ul><li>Real video </li></ul></ul><ul><ul><ul><li>erroneous silhouettes </li></ul></ul></ul><ul><ul><ul><li>intermediate postures </li></ul></ul></ul>
    36. 36. Evaluation method <ul><li>True positive for a given posture Pi </li></ul><ul><li>General posture Pg recognition rate </li></ul><ul><li>with ng the number of general posture, ngi the number of posture Pgi </li></ul><ul><li>Detailed posture Pd recognition rate </li></ul>
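The recognition-rate formulas on this slide did not survive the scrape, so the sketch below uses an unweighted per-class reading: for each ground-truth posture, the rate is the fraction of true positives, and the overall rate is the mean over classes. The thesis may weight classes by their sample counts:

```python
def recognition_rates(confusion):
    """Per-class and mean recognition rates from a confusion matrix.

    `confusion[i][j]` counts ground-truth posture i recognised as
    posture j; the diagonal holds the true positives.
    """
    per_class = []
    for i, row in enumerate(confusion):
        total = sum(row)
        per_class.append(row[i] / total if total else 0.0)
    mean = sum(per_class) / len(per_class)
    return per_class, mean
```

For instance, the T-shape row of slide 41 (205 correct out of 360 orientations) gives a per-class rate of 205/360 ≈ 57%.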
    37. 37. Synthetic video - data generation <ul><li>Rotation around the w axis with the rotation angle on the ground </li></ul><ul><li>Position of a virtual camera with the rotation angle </li></ul><ul><ul><li>10 avatars </li></ul></ul><ul><ul><li>360 orientations </li></ul></ul><ul><ul><li>19 viewpoints </li></ul></ul><ul><ul><li>→ 68400 silhouettes </li></ul></ul><ul><li>Database and tools are available online [] </li></ul>
    38. 38. Synthetic video – processing time <ul><li>Silhouette processing time is characterised by: </li></ul><ul><ul><li>generation time (independent from 2D method) </li></ul></ul><ul><ul><li>representation time (dependent on 2D method) </li></ul></ul><ul><ul><li>comparison time (dependent on 2D method) </li></ul></ul>Generation time Rotation step (degrees) Time (s.) Rotation step (degrees) Time (s.) Representation time Comparison time (for geometric features)
    39. 39. Synthetic video – 2D methods comparison
    Recognition rates by rotation step (GPRR = general, DPRR = detailed posture recognition rate):
    Hu moments: 36° → GPRR 69%, DPRR 57%; 45° → 68%, 54%; 90° → 59%, 43%
    Geometric features: 36° → GPRR 89%, DPRR 75%; 45° → 88%, 72%; 90° → 69%, 52%
    H. & V. projections: 36° → GPRR 90%, DPRR 76%; 45° → 89%, 72%; 90° → 75%, 54%
    Skeletonisation: 36° → GPRR 84%, DPRR 68%; 45° → 82%, 63%; 90° → 71%, 47%
    40. 40. Synthetic video – key postures With temporal posture filtering Without temporal posture filtering 0=standing with one arm up 2=standing with arms near the body 3=T-shape 7=lying with spread legs <ul><li>Attempt: </li></ul><ul><li>standing with arms near the body </li></ul><ul><li>standing with one arm up </li></ul><ul><li>standing with arms near the body </li></ul>
    41. 41. Synthetic video – ambiguous cases <ul><li>Ambiguous cases = 2 postures have similar silhouettes for a certain viewpoint </li></ul><ul><ul><li>depend on 2D method, viewpoint, posture, orientation </li></ul></ul><ul><li>For each viewpoint, recognition results are analysed </li></ul><ul><li>A confidence value can be computed according to a viewpoint and a posture (T-Shape: 205/360) </li></ul>T-shape recognition with H. & V. projections (ground truth: T-shape, over the 360 orientations): recognised as T-shape 205 times, standing with one arm up 102 times, standing with arms near the body 53 times.
    42. 42. Synthetic video - conclusion <ul><li>Optimal rotation step = 36 degrees → 10 silhouettes by posture avatar </li></ul><ul><ul><li>trade-off between computation time and recognition rate </li></ul></ul><ul><li>General posture recognition rate > detailed posture recognition rate </li></ul><ul><li>Horizontal and vertical projections representation gives the best recognition. </li></ul><ul><ul><li>less sensitive to intermediate postures </li></ul></ul><ul><li>Hu moments not adapted to human posture recognition </li></ul><ul><ul><li>properties of invariance (translation, scale and orientation) introduce wrong recognitions. </li></ul></ul>
    43. 43. Real video – segmentation algorithms <ul><li>3 segmentation algorithms </li></ul><ul><ul><li>“ VSIP algorithm” (ORION) </li></ul></ul><ul><ul><ul><li>reference image in different color space </li></ul></ul></ul><ul><ul><ul><li>under segmented silhouette </li></ul></ul></ul><ul><ul><ul><li>algorithm is manually tuned to obtain good silhouettes </li></ul></ul></ul><ul><ul><li>“ Watershed algorithm” [Lerallut,2006,CMM] </li></ul></ul><ul><ul><ul><li>watershed </li></ul></ul></ul><ul><ul><ul><li>over segmented silhouette </li></ul></ul></ul><ul><ul><ul><li>algorithm is manually tuned </li></ul></ul></ul><ul><ul><li>“ Gait algorithm” [, University of South Florida] </li></ul></ul><ul><ul><ul><li>statistical modeling of the background </li></ul></ul></ul><ul><ul><ul><li>noisy over segmented silhouette </li></ul></ul></ul><ul><ul><ul><li>silhouettes are available online </li></ul></ul></ul>
    44. 44. Real video – own sequences Current image Binary image Detailed postures General postures not filtered filtered
    45. 45. Real video – own sequences General posture recognition rate (%) for the different silhouette representations with the “Watershed algorithm”:
    H. & V. projections: standing 100, sitting 89, bending 78, lying 93
    Geometric features: standing 94, sitting 82, bending 77, lying 83
    Skeletonisation: standing 93, sitting 68, bending 82, lying 65
    Hu moments: standing 68, sitting 73, bending 27, lying 35
    46. 46. Real video – own sequences <ul><li>Watershed VSIP </li></ul><ul><li>1.Standing with one arm up 71 71 </li></ul><ul><li>2.Standing 74 74 </li></ul><ul><li>3.T-shape 80 80 </li></ul><ul><li>4.Sitting on a chair 66 52 </li></ul><ul><li>5.Sitting on the floor 69 74 </li></ul><ul><li>6.Bending 78 91 </li></ul><ul><li>7.Lying with spread legs 77 81 </li></ul><ul><li>8.Lying with curled up legs 63 60 </li></ul>Detailed posture recognition rate (%) with horizontal and vertical projections for different segmentation algorithms
    47. 47. Real video – gait sequence New posture of interest: the walking posture. 78/81 postures correctly recognised (graph: recognised vs. ground-truth postures; 2=standing posture, 3=walking posture)
    48. 48. Real video – gait sequence 162/186 (87%) postures correctly recognised. For the 5 sequences: 711/911 (78%) postures are correctly recognised (graph: recognised vs. ground-truth postures; 2=standing posture, 3=walking posture)
    49. 49. Action recognition using postures <ul><li>Representation: finite state machine </li></ul><ul><li>Recognition </li></ul><ul><ul><li>stack associated to the detected person </li></ul></ul><ul><ul><li>matching between the current stack and the different action templates </li></ul></ul>P1 min 1 max 1 P2 min 2 max 2 Pn min n max n P1,32 P2,5 P1,21
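The finite-state-machine matching above can be sketched directly: the posture sequence is compressed into a stack of (posture, duration) pairs (like P1,32 P2,5 P1,21 on the slide) and the end of the stack is matched against an action template of (allowed postures, min, max) states. This is a minimal sketch; optional states with min = 0 frames are not handled:

```python
def compress(postures):
    """Compress a frame-by-frame posture sequence into a stack of
    (posture, duration) pairs associated to the tracked person."""
    stack = []
    for p in postures:
        if stack and stack[-1][0] == p:
            stack[-1] = (p, stack[-1][1] + 1)
        else:
            stack.append((p, 1))
    return stack

def matches(stack, template):
    """Match the end of the stack against an action template:
    a list of (allowed_postures, min_frames, max_frames) states."""
    if len(stack) < len(template):
        return False
    tail = stack[-len(template):]
    return all(p in allowed and lo <= n <= hi
               for (p, n), (allowed, lo, hi) in zip(tail, template))
```

With the fall template of the next slide (standing [3, ∞] → bending or sitting [0, 10] → lying [3, ∞]), a standing-bending-lying run of sufficient durations is recognised as a fall.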
    50. 50. Action recognition – the fall Based on general postures. State machine (posture [min, max duration]): Standing [3, ∞] → Bending or sitting [0, 10] → Lying [3, ∞]. Recognised falling action: TP 10, FP 0, FN 0
    51. 51. Action recognition – the walk Based on detailed postures. State machine (posture [min, max duration]): Standing with arms near the body [2, 10] → Walking [3, 15]. Recognised walking action: TP 62, FP 0, FN 3
    52. 52. Real video - conclusion <ul><li>General posture recognition rate > detailed posture recognition rate </li></ul><ul><li>Horizontal and vertical projections representation gives the best recognition. </li></ul><ul><li>Hu moments not adapted to human posture recognition </li></ul><ul><ul><li>properties of invariance </li></ul></ul><ul><ul><li>erroneous silhouettes </li></ul></ul><ul><li>Processing time </li></ul><ul><ul><li>acquisition → action recognition = 5-6 frames per second </li></ul></ul>
    53. 53. Conclusion – Future works
    54. 54. Conclusion <ul><li>Recognise correctly 9 detailed postures </li></ul><ul><ul><li>several types of silhouette </li></ul></ul><ul><ul><li>self-action recognition </li></ul></ul><ul><li>Constraints </li></ul><ul><ul><li>One monocular static camera </li></ul></ul><ul><ul><ul><li>a priori knowledge about the context </li></ul></ul></ul><ul><ul><li>Automated approach </li></ul></ul><ul><ul><ul><li>adaptation of the 3D posture avatar by the height </li></ul></ul></ul><ul><ul><li>Independence from the viewpoint </li></ul></ul><ul><ul><ul><li>3D posture avatars </li></ul></ul></ul><ul><ul><li>Real-time </li></ul></ul><ul><ul><ul><li>treat 5-6 frames per second </li></ul></ul></ul>
    55. 55. Limitations <ul><li>Computation time (5-6 images by second) </li></ul><ul><ul><li>silhouette generation time </li></ul></ul><ul><li>3D posture avatar adaptability </li></ul><ul><ul><li>person height </li></ul></ul><ul><li>Limitation of the number of interesting postures (11 postures) </li></ul><ul><ul><li>computation time </li></ul></ul><ul><ul><li>discrimination power </li></ul></ul><ul><li>Occlusion </li></ul>
    56. 56. Future works <ul><li>Dealing with static occlusion by using context information </li></ul>
    57. 57. Future works <ul><li>Intelligent camera </li></ul><ul><ul><li>integration of the approach on a dedicated chip (collaboration with STMicroelectronics) </li></ul></ul><ul><li>Dynamic body primitive adaptability </li></ul><ul><ul><li>need of a database of several body primitives </li></ul></ul><ul><li>3D posture avatar variability or gesture recognition </li></ul><ul><ul><li>varying the joint parameters (thesis on gesture recognition) </li></ul></ul><ul><li>Processing time improvement </li></ul><ul><ul><li>authorised posture transitions can be represented with an automata to decrease the number of silhouette to generate </li></ul></ul><ul><ul><li>orientation of the person can also be used to only generate silhouettes for the correct orientation </li></ul></ul>
    58. 58. Questions ? <ul><li>List of the publications: </li></ul><ul><li>International Journal: </li></ul><ul><ul><li>[1] Applying 3D Human Model in a Posture Recognition System . Boulay, B. and Bremond, F. and Thonnat, M. Pattern Recognition Letter, Special Issue on Vision for Crime Detection and Prevention . November 2006, 15(27), pp 1788-1796 </li></ul></ul><ul><li>International conferences: </li></ul><ul><ul><li>[1] Posture Recognition with a 3D Human Model . Boulay, B. and Bremond, F. and Thonnat, M. Proceedings of IEE International Symposium on Imaging for Crime Detection and Prevention . 2005 </li></ul></ul><ul><ul><li>[2] Human Posture Recognition in Video Sequence . Boulay, B. and Bremond, F. and Thonnat, M. Proceeding Joint IEEE International Workshop on VS-PETS, Visual Surveillance and Performance Evaluation of Tracking and Surveillance . October 2003, pp 23-29 </li></ul></ul>
    59. 60. Questions ?
    60. 61. Proposed approach Video stream People detection Contextual Knowledge base People tracking Silhouette 3D position Identifier Posture detector Posture filter Recognised posture Behaviour analysis
    61. 62. Contextual knowledge base <ul><li>Necessary to interpret a scene : </li></ul><ul><ul><li>Positions of the contextual objects (chair, desk, …) </li></ul></ul><ul><ul><li>Location of the zones of interest (forbidden zone, safe zone, …) </li></ul></ul><ul><ul><li>Characteristics of the camera (calibration matrix and position) </li></ul></ul><ul><ul><li>Semantic associated to contextual objects for behaviour analysis </li></ul></ul>Characteristics of the camera
    62. 63. Overview of the proposed approach <ul><li>Hybrid approach which combines : </li></ul><ul><ul><li>3D posture avatar </li></ul></ul><ul><ul><li>2D silhouette representation techniques </li></ul></ul><ul><li>Approach is composed of 4 tasks </li></ul><ul><ul><li>3D posture avatar are generated </li></ul></ul><ul><ul><li>Silhouettes are represented and compared with 2D techniques </li></ul></ul><ul><ul><li>Posture is detected </li></ul></ul><ul><ul><li>Posture is filtered </li></ul></ul>
    63. 64. 3D posture avatar <ul><li>3D human model (description) body parts + articulations </li></ul><ul><li>Body primitives (vertices type) polygonal meshes visualisation of the body parts </li></ul><ul><li>Articulation parameters (Euler representation) localisation of the body parts </li></ul><ul><li>Body animation (the transformation matrix, the animation algorithms) </li></ul><ul><li>Posture of interest : hierarchical general and detailed </li></ul><ul><li>General = set of 3D posture avatars </li></ul><ul><li>Detailed = a 3D posture avatar </li></ul>
    64. 65. <ul><li>A 3D posture avatar model </li></ul><ul><li>A 3D avatar animation algorithm </li></ul><ul><li>A hierarchical posture classification </li></ul><ul><li>A 3D engine implementation </li></ul>
    65. 66. Hybrid approach <ul><li>Overview (the approach + video of the 3D avatar rotating on itself) </li></ul><ul><li>Silhouette generation </li></ul><ul><ul><li>virtual camera (the 2 matrices) </li></ul></ul><ul><ul><li>3D posture avatar position (localisation + orientations)(equations…) </li></ul></ul><ul><ul><li>silhouette extraction (z-buffer) </li></ul></ul><ul><li>Silhouette modeling and comparison </li></ul><ul><ul><li>4 representations kept (computation time, silhouette quality dependence) description + table </li></ul></ul><ul><li>Posture filtering </li></ul><ul><ul><li>posture stability principle </li></ul></ul>
    66. 67. Results <ul><li>Ground truth model : description of the different attributes </li></ul><ul><li>Viper </li></ul><ul><li>comparison results + gt file </li></ul><ul><li>Synthetic data (generation + results) </li></ul><ul><li>Real data (acquisition + results) </li></ul><ul><li> database </li></ul><ul><li>Action recognition (the fall + the walk) state machine, general or detailed postures </li></ul>
    67. 68. Silhouette Comparison <ul><li>Boundary based approach : Shape from context [Belongie et al.] </li></ul><ul><ul><li>Chamfer distance </li></ul></ul>
    68. 69. Silhouette Comparison <ul><li>Structural based approach : distance transform [Sminchisescu et al., 2002] </li></ul>
    69. 70. 3D posture avatar generation <ul><li>For each body primitive </li></ul><ul><li>For each 3D point </li></ul><ul><li>1. the point is translated to the origin </li></ul><ul><li>2. the point is rotated around the X, Y and Z axis </li></ul><ul><li>3. the point is translated to its original location </li></ul>
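The three steps above (translate to the origin, rotate around X, Y and Z, translate back) can be sketched for a single 3D point. The X→Y→Z order is one Euler convention; the thesis may compose the rotations differently:

```python
import math

def rotate_point(point, joint, angles):
    """Rotate a 3D point of a body primitive around a joint.

    `angles` = (ax, ay, az) are Euler angles in radians. Steps:
    1. translate the point so the joint is at the origin,
    2. rotate around the X, then Y, then Z axis,
    3. translate back to the original location.
    """
    x, y, z = (p - j for p, j in zip(point, joint))
    ax, ay, az = angles
    # rotation around X
    y, z = (y * math.cos(ax) - z * math.sin(ax),
            y * math.sin(ax) + z * math.cos(ax))
    # rotation around Y
    x, z = (x * math.cos(ay) + z * math.sin(ay),
            -x * math.sin(ay) + z * math.cos(ay))
    # rotation around Z
    x, y = (x * math.cos(az) - y * math.sin(az),
            x * math.sin(az) + y * math.cos(az))
    return tuple(p + j for p, j in zip((x, y, z), joint))
```

Applying this to every vertex of every body primitive, with the joint parameters of the avatar, produces the posed 3D posture avatar.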
    70. 71. Virtual camera <ul><li>The camera transform (4x4) </li></ul><ul><ul><li>extrinsic set of parameters defines the transformation from the world reference to the camera transform (eye, center, up) </li></ul></ul><ul><li>The perspective transform (4x4) </li></ul><ul><ul><li>intrinsic set of parameters define the transformation to model the distortion of the camera (fovy, aspect, znear, zfar) </li></ul></ul>
    71. 72. Silhouette Comparison <ul><li>Feature based approach: Hu moments [Bobick and Davis, 2001] </li></ul><ul><ul><li>2D moments </li></ul></ul><ul><ul><li>Centered (translation invariance) </li></ul></ul><ul><ul><li>Normalised by the area (scale invariance) </li></ul></ul><ul><ul><li>Rotation invariance </li></ul></ul>
    72. 73. Silhouette Comparison <ul><li>Feature based approach: geometric features </li></ul><ul><ul><li>Area </li></ul></ul><ul><ul><li>Centroid </li></ul></ul><ul><ul><li>Orientation </li></ul></ul><ul><ul><li>Eccentricity </li></ul></ul><ul><ul><li>Compactness </li></ul></ul><ul><li>Similarity </li></ul>
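The geometric-features representation can be sketched from a binary silhouette. The slide only lists the feature names; the formulas below (eccentricity from second-order central moments, compactness as perimeter²/(4π·area) with the perimeter approximated by boundary-pixel counting) are common definitions assumed for illustration, and orientation is omitted:

```python
import math

def geometric_features(silhouette):
    """Area, centroid, eccentricity and compactness of a binary
    silhouette given as a 2D list of 0/1 values."""
    pts = [(x, y) for y, row in enumerate(silhouette)
                  for x, v in enumerate(row) if v]
    area = len(pts)
    cx = sum(x for x, _ in pts) / area
    cy = sum(y for _, y in pts) / area
    # Second-order central moments.
    mu20 = sum((x - cx) ** 2 for x, _ in pts)
    mu02 = sum((y - cy) ** 2 for _, y in pts)
    mu11 = sum((x - cx) * (y - cy) for x, y in pts)
    common = math.sqrt((mu20 - mu02) ** 2 + 4 * mu11 ** 2)
    lmax = (mu20 + mu02 + common) / 2
    lmin = (mu20 + mu02 - common) / 2
    ecc = math.sqrt(1 - lmin / lmax) if lmax else 0.0
    # Compactness: boundary pixels are those with a missing 4-neighbour.
    pset = set(pts)
    boundary = sum(1 for x, y in pts
                   if any((a, b) not in pset
                          for a, b in ((x+1, y), (x-1, y), (x, y+1), (x, y-1))))
    comp = boundary ** 2 / (4 * math.pi * area)
    return {"area": area, "centroid": (cx, cy),
            "eccentricity": ecc, "compactness": comp}
```

A similarity measure between two silhouettes can then be defined as a (weighted) distance between their feature vectors.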
    73. 74. Ground-truth <ul><li>A ground-truth is necessary to evaluate the hybrid approach. </li></ul><ul><li>The ground-truth attributes are: </li></ul><ul><ul><li>Identifier </li></ul></ul><ul><ul><li>Bounding box </li></ul></ul><ul><ul><li>Posture </li></ul></ul><ul><ul><li>Orientation </li></ul></ul><ul><ul><li>Occlusion type </li></ul></ul><ul><li>The ground-truth is acquired with the Viper software (XML format) [Mariano et al., 2002]. </li></ul>
    74. 75. Synthetic video – ambiguous cases
    Confusion matrix (columns: ground-truth posture 9 … 0; each row is labelled by the recognised posture):
    9: 239 53 50 0 0 0 0 0 0 0
    8: 121 284 18 0 1 0 0 0 0 0
    7: 0 23 283 0 0 0 0 0 0 0
    6: 0 0 9 323 0 0 0 0 0 0
    5: 0 0 0 37 357 22 0 0 0 0
    4: 0 0 0 0 2 338 0 0 0 0
    3: 0 0 0 0 0 0 205 0 29 35
    2: 0 0 0 0 0 0 53 294 33 27
    1: 0 0 0 0 0 0 60 53 249 113
    0: 0 0 0 0 0 0 42 18 49 185
    75. 76. Synthetic video – 2D methods comparison
    For each method and rotation step (36 / 45 / 90 degrees): GPRR (%), DPRR (%), generation time tg, representation time tr and comparison time tc (s/frame).
    Hu moments: GPRR 69/68/59, DPRR 57/54/43, tg 1.28/1.04/0.52, tr 0.04/0.03/0.01, tc 4e-4/3e-4/4e-5
    Geometric features: GPRR 89/88/69, DPRR 75/72/52, tg 1.28/1.04/0.52, tr 0.04/0.03/0.02, tc 3.9e-4/2.5e-4/6e-5
    H. & V. projections: GPRR 90/89/75, DPRR 76/72/54, tg 1.28/1.04/0.52, tr 0.04/0.03/0.02, tc 6e-3/5e-3/3e-3
    Skeletonisation: GPRR 84/82/71, DPRR 68/63/47, tg 1.28/1.04/0.52, tr 0.04/0.03/0.01, tc 6e-4/4e-4/1e-4
    76. 77. Proposed approach Posture filter Object segmentation Object classification Person tracking People detection Behaviour analysis Detected silhouette Identifier Filtered posture Posture detector Camera parameters
    77. 78. Posture filter <ul><li>Temporal posture coherency is exploited to repair posture recognition errors. </li></ul>Detected posture Identifier Posture filter Filtered posture Previous recognised postures
    78. 79. Posture detector – silhouettes generation Camera parameters 3D posture avatars 3D silhouette generator 3D position Virtual camera Generated silhouettes
    79. 80. Posture detector – silhouette generation Camera parameters 3D posture avatars 3D silhouette generator 3D position Virtual camera Generated silhouettes
    80. 81. Silhouette generation <ul><li>3D posture avatars are rotated. </li></ul><ul><li>Z-buffer technique is used to extract silhouette </li></ul><ul><li>To avoid problem of misdrawing: </li></ul><ul><ul><li>Operation buffer: all the drawing operations are done in this buffer </li></ul></ul><ul><ul><li>Current buffer: it contains the considered 3D posture avatar. The silhouette is extracted from this buffer. </li></ul></ul>Generated silhouettes
    81. 82. Posture detector – silhouettes comparison Comparison Generated silhouettes Detected silhouette Detected posture
    82. 83. Real video – Own sequences (watershed / VSIP )
    Confusion matrix (columns: ground-truth posture 8 … 1; each row is labelled by the recognised posture):
    8.Lying with curled up legs: 196/192, 45/35, 0/0, 0/0, 0/0, 0/0, 0/0, 0/0
    7.Lying with spread legs: 81/86, 162/158, 0/0, 0/0, 0/0, 0/0, 0/0, 0/0
    6.Bending: 0/0, 2/2, 54/63, 0/8, 0/0, 0/0, 0/0, 0/0
    5.Sitting on the floor: 25/28, 0/0, 6/0, 100/106, 1/1, 0/0, 0/0, 0/0
    4.Sitting on a chair: 11/12, 0/0, 7/0, 44/20, 51/40, 0/0, 0/0, 0/0
    3.T-shape: 0/0, 0/0, 0/0, 0/0, 0/0, 55/55, 3/3, 27/27
    2.Standing: 0/0, 0/0, 2/1, 0/10, 22/36, 1/1, 67/67, 5/5
    1.Standing with one arm up: 0/0, 0/0, 0/5, 0/0, 3/0, 13/13, 21/21, 79/79
    Success percentage: 63/60, 77/81, 78/91, 69/74, 66/52, 80/80, 74/74, 71/71