Human activity recognition in videos can be addressed with an HMM, since video is inherently sequential data. We define a new SVM kernel for this task by designing it as an HMM-based intermediate matching kernel, referred to as HMM-IMK.
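As a rough illustration of how such a kernel could be computed, the sketch below fits a single Gaussian HMM (via the hmmlearn package) over all training videos and uses its state means as the virtual feature vectors of an intermediate matching kernel. Each video is assumed to be a (frames x dim) array of frame-level features; the function names, state count, and Gaussian base kernel are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

def fit_virtual_vectors(videos, n_states=8):
    """Fit one HMM over all training videos; its state means serve as
    the virtual feature vectors of the intermediate matching kernel."""
    X = np.vstack(videos)                      # stack (frames x dim) arrays
    lengths = [len(v) for v in videos]
    hmm = GaussianHMM(n_components=n_states, covariance_type="diag",
                      random_state=0)
    hmm.fit(X, lengths)
    return hmm.means_                          # (n_states, dim)

def hmm_imk(video_a, video_b, virtual_vectors, gamma=0.1):
    """For each state mean, match the closest frame vector in each video
    and accumulate a Gaussian base kernel between the matched pair."""
    k = 0.0
    for mu in virtual_vectors:
        fa = video_a[np.argmin(np.linalg.norm(video_a - mu, axis=1))]
        fb = video_b[np.argmin(np.linalg.norm(video_b - mu, axis=1))]
        k += np.exp(-gamma * np.sum((fa - fb) ** 2))
    return k

# The Gram matrix K[i, j] = hmm_imk(videos[i], videos[j], vv) can then be
# passed to sklearn.svm.SVC(kernel="precomputed") for classification.
```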
Chain code is a technique used in digital image processing for contour detection and representation. It encodes contours as a sequence of direction codes indicating the path from one pixel to the next along the boundary. Chain code provides a compact representation of shapes and is used for applications like contour matching, object recognition, and shape analysis. While offering efficient storage and computation, chain code can be sensitive to noise and may lose detail for complex contours.
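A minimal sketch of Freeman 8-direction chain coding, assuming the boundary is already available as an ordered list of (row, col) pixel coordinates; the direction table and toy example are illustrative.

```python
# Direction codes: 0=E, 1=NE, 2=N, 3=NW, 4=W, 5=SW, 6=S, 7=SE.
DIRECTIONS = {(0, 1): 0, (-1, 1): 1, (-1, 0): 2, (-1, -1): 3,
              (0, -1): 4, (1, -1): 5, (1, 0): 6, (1, 1): 7}

def chain_code(boundary):
    """Encode consecutive boundary steps as Freeman direction codes."""
    codes = []
    for (r0, c0), (r1, c1) in zip(boundary, boundary[1:]):
        codes.append(DIRECTIONS[(r1 - r0, c1 - c0)])
    return codes

# Example: a unit square traversed clockwise from the top-left pixel.
square = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]
print(chain_code(square))  # [0, 6, 4, 2]
```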
This document outlines a crime detection project that uses machine learning to analyze crime data and predict future crimes. It discusses the current limitations of crime detection systems and how the project aims to provide more accurate and efficient prediction through tools like Python, Pandas, NumPy, Matplotlib and Scikit-learn. The project will preprocess and visualize crime data, train a KNN classifier model, and use location, crime type and time features to predict crimes.
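A hedged sketch of that pipeline with pandas and scikit-learn; the file name and feature columns (latitude, longitude, hour) are hypothetical stand-ins for the project's location, crime-type, and time features.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("crime_data.csv")           # hypothetical dataset
X = df[["latitude", "longitude", "hour"]]    # hypothetical feature columns
y = df["crime_type"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

scaler = StandardScaler().fit(X_train)       # k-NN is distance based,
X_train = scaler.transform(X_train)          # so scale the features
X_test = scaler.transform(X_test)

model = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print("accuracy:", model.score(X_test, y_test))
```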
Digital image processing has evolved significantly since the early 20th century. Some key developments include the first use of digital images in newspapers in the 1920s, improvements to space imagery in the 1960s that aided NASA missions, and the growth of medical applications like CAT scans in the 1970s. Today, digital image processing is used widely across many domains like enhancement, artistic effects, medicine, mapping, industrial inspection, security, and human-computer interfaces. It involves fundamental steps such as acquisition, enhancement, restoration, segmentation, and compression.
Edge detection algorithms identify points in a digital image where the image brightness changes sharply or has discontinuities. Common edge detection methods include gradient operators like Prewitt and Sobel, the Laplacian of Gaussian (LoG) used in Marr-Hildreth edge detection, and the Canny edge detector. The Canny edge detector applies smoothing, finds the image gradient, performs non-maximum suppression and double thresholding to detect edges with good localization and a single response to each edge.
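A minimal OpenCV rendering of the Canny pipeline just described; the file name, blur parameters, and the two hysteresis thresholds are placeholders.

```python
import cv2

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)
blurred = cv2.GaussianBlur(img, (5, 5), 1.4)  # smoothing step
# Gradient computation, non-maximum suppression, and double thresholding
# all happen inside cv2.Canny.
edges = cv2.Canny(blurred, threshold1=50, threshold2=150)
cv2.imwrite("edges.png", edges)
```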
Lecture 1 for Digital Image Processing (2nd Edition) - Moe Moe Myint
- What is Digital Image Processing?
- The Origins of Digital Image Processing
- Examples of Fields that Use Digital Image Processing
- Fundamental Steps in Digital Image Processing
- Components of an Image Processing System
Crime rate analysis using k-NN in Python
Fingerprint recognition is a process that compares fingerprints to identify or verify individuals. It involves extracting minutiae features like ridge endings and bifurcations from scanned fingerprints. The process includes image enhancement, binarization, thinning the image to extract minutiae, and matching minutiae between two fingerprints. Fingerprint recognition has applications in security systems due to fingerprints being unique and durable over a person's lifetime. However, high accuracy is needed and fingerprints could be stolen, posing a security threat. Future work involves improving minutiae extraction and matching algorithms.
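A sketch of the classic crossing-number method for the minutiae-extraction step, assuming the fingerprint has already been enhanced, binarized, and thinned to a one-pixel-wide 0/1 skeleton; this is the standard textbook formulation, not necessarily the document's exact algorithm.

```python
import numpy as np

def minutiae(skel):
    """Crossing number CN == 1 marks a ridge ending, CN == 3 a bifurcation."""
    endings, bifurcations = [], []
    rows, cols = skel.shape
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            if not skel[r, c]:
                continue
            # The 8 neighbours of (r, c), visited in circular order.
            p = [skel[r-1, c], skel[r-1, c+1], skel[r, c+1], skel[r+1, c+1],
                 skel[r+1, c], skel[r+1, c-1], skel[r, c-1], skel[r-1, c-1]]
            cn = sum(abs(int(p[i]) - int(p[(i + 1) % 8])) for i in range(8)) // 2
            if cn == 1:
                endings.append((r, c))
            elif cn == 3:
                bifurcations.append((r, c))
    return endings, bifurcations
```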
Introduction to digital image processing, image processing, digital image, analog image, formation of digital image, level of digital image processing, components of a digital image processing system, advantages of digital image processing, limitations of digital image processing, fields of digital image processing, ultrasound imaging, x-ray imaging, SEM, PET, TEM
The document discusses edge detection methods including gradient based approaches like Sobel and zero crossing based techniques like Laplacian of Gaussian. It proposes a new algorithm that applies fuzzy logic to the results of gradient and zero crossing edge detection on an image to more accurately identify edges. The algorithm calculates gradient and zero crossings, applies fuzzy rules to classify pixels, and thresholds to determine final edge pixels.
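A simplified sketch of the fused idea, assuming SciPy: Sobel gradient magnitude and Laplacian-of-Gaussian zero crossings are turned into [0, 1] memberships and combined with a fuzzy AND (minimum). The membership functions and threshold are illustrative and far simpler than a full fuzzy rule base.

```python
import numpy as np
from scipy import ndimage

def fuzzy_edges(img, sigma=2.0, threshold=0.5):
    img = img.astype(float)
    gx = ndimage.sobel(img, axis=1)
    gy = ndimage.sobel(img, axis=0)
    grad = np.hypot(gx, gy)
    mu_grad = grad / (grad.max() + 1e-9)        # gradient membership in [0, 1]

    sign = ndimage.gaussian_laplace(img, sigma) > 0
    zc = np.zeros(img.shape, dtype=bool)        # zero-crossing evidence
    zc[:, :-1] |= sign[:, :-1] != sign[:, 1:]
    zc[:-1, :] |= sign[:-1, :] != sign[1:, :]

    # Fuzzy rule: a pixel is an edge when BOTH sources agree (min = fuzzy AND).
    membership = np.minimum(mu_grad, zc.astype(float))
    return membership > threshold
```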
Human Activity Recognition (HAR) systems aim to recognize human activities through sensors in order to provide assistance. The key steps in designing a HAR system, illustrated by the code sketch after the list, are:
1) Acquiring sensor data and preprocessing it by removing noise.
2) Segmenting the preprocessed data into windows that may contain activities.
3) Extracting features from each window to reduce the data into discriminative features.
4) Training a classification model on the extracted features to predict activity labels, and evaluating the model's performance using methods like a confusion matrix.
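A compact sketch of steps 2-4 for a single accelerometer channel, assuming NumPy arrays and scikit-learn; the file names, window sizes, and train/test split point are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix

def windows(signal, size=128, step=64):            # step 2: segmentation
    return np.stack([signal[i:i + size]
                     for i in range(0, len(signal) - size + 1, step)])

def extract_features(w):                           # step 3: feature extraction
    return np.column_stack([w.mean(axis=1), w.std(axis=1),
                            w.min(axis=1), w.max(axis=1)])

signal = np.load("accel_x.npy")                    # hypothetical data files
labels = np.load("window_labels.npy")              # one label per window
X = extract_features(windows(signal))

# Step 4: train a classifier and evaluate with a confusion matrix.
clf = RandomForestClassifier(random_state=0).fit(X[:800], labels[:800])
print(confusion_matrix(labels[800:], clf.predict(X[800:])))
```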
This document summarizes a study that used data mining techniques to predict crime using real-world crime datasets from Denver and Los Angeles. The goals were to identify crime hotspots and predict future crime types based on location, time, and other attributes. The models tested included the Apriori algorithm to identify frequent crime patterns, a naïve Bayesian classifier to predict crime type based on location/time features, and a decision tree classifier. Key results identified crime hotspots and showed the Bayesian classifier achieved prediction accuracies of 51-54% while the decision tree was more complex and achieved lower accuracy.
Edge detection is the name for a set of mathematical methods which aim at identifying points in a digital image at which the image brightness changes sharply or, more formally, has discontinuities.
This document discusses various techniques for image segmentation. It begins by defining image segmentation as dividing an image into constituent regions or objects based on visual characteristics. There are two main categories of segmentation techniques: edge-based techniques which detect discontinuities, and region-based techniques which partition images into regions of similarity. Popular region-based techniques include region growing, region splitting and merging, and watershed transformation. Edge-based techniques detect edges using methods like edge detection. The document provides an overview of these segmentation techniques and their applications in image analysis tasks.
The document discusses image segmentation techniques. It describes image segmentation as partitioning a digital image into multiple regions based on characteristics like color or texture. Common applications of image segmentation include industrial inspection, optical character recognition, and medical imaging. The techniques discussed are fixed thresholding, iterative thresholding, and fuzzy c-means clustering. Fuzzy c-means clustering is identified as the most suitable for pest image segmentation based on its lower entropy and normalized mutual information values. Simulated annealing is also proposed to improve upon the limitations of fuzzy c-means clustering.
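A short sketch of the iterative (ISODATA-style) thresholding mentioned above, in plain NumPy: the threshold is repeatedly reset to the midpoint of the two class means until it stabilizes. The stopping tolerance is illustrative, and the sketch assumes both classes stay non-empty.

```python
import numpy as np

def iterative_threshold(img, eps=0.5):
    """Return a data-driven threshold for a grayscale image array."""
    t = img.mean()                      # initial guess: global mean
    while True:
        low, high = img[img <= t], img[img > t]
        t_new = 0.5 * (low.mean() + high.mean())
        if abs(t_new - t) < eps:        # converged
            return t_new
        t = t_new

# Usage: segmented = img > iterative_threshold(img)
```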
Digital image processing refers to processing digital images using a computer. A digital image is composed of pixels that each have a location and value. Digital image processing originated in the 1920s with transmitting newspaper images via cable. It advanced with space missions in the 1960s and medical uses in the 1970s. Today, digital image processing is used widely for tasks like image enhancement, artistic effects, medical visualization, industrial inspection, and law enforcement.
This document discusses face recognition systems and the use of artificial neural networks for face recognition. It describes the basic steps in a face recognition system as face detection, alignment, feature extraction, and matching. Two types of neural networks that can be used for recognition are described - Radial Basis Function Networks and Back Propagation Networks. RBF Networks have an input, hidden, and output layer while BPN uses backpropagation of errors to adjust weights. The document also outlines some applications of face recognition systems such as ID verification and criminal investigations.
Lab manual of Digital image processing using python by Khalid Shaikh - khalidsheikh24
This document is a practical workbook for a digital image processing course. It contains 8 lab sessions where a student learns how to install Python and PyCharm, read and display images, extract image pixel information, convert images between color spaces and formats, apply filters like blurring, and perform operations like edge detection and resizing. Each lab has the objective, task description, source code, and output for tasks related to foundational digital image processing techniques.
An evaluation of two popular segmentation algorithms, the mean shift-based segmentation algorithm and a graph-based segmentation scheme. We also consider a hybrid method which combines the other two methods.
"FingerPrint Recognition Using Principle Component Analysis(PCA)”Er. Arpit Sharma
Fingerprint recognition is one of the oldest and most popular biometric technologies, used in criminal investigations as well as civilian and commercial applications. Fingerprint matching is the process used to determine whether two sets of fingerprint details come from the same finger. This work focuses on the feature extraction and minutiae matching stages. Many matching techniques are used in fingerprint recognition systems, such as minutiae-based matching, pattern-based matching, correlation-based matching, and image-based matching.
A new method based on Principal Component Analysis (PCA) for fingerprint enhancement is proposed in this paper. PCA is a useful statistical technique that has found application in fields such as face recognition and image compression, and is a common technique for finding patterns in high-dimensional data. In the proposed method, the image is first decomposed into directional images using a decimation-free directional filter bank (DDFB). PCA is then applied to these directional fingerprint images, yielding PCA-filtered images, which are themselves directional images. These directional images are then reconstructed into one image, which is the enhanced result. Simulation results are included illustrating the capability of the proposed method.
The document discusses the eigenface approach for face recognition. It provides an overview of eigenfaces, how they are calculated from a training set of faces, and how they can be used to identify faces by projecting faces onto the eigenface space. Major steps include calculating the eigenfaces from a training set, projecting new images into eigenface space to get weight coefficients, and comparing the weights to known individuals' weights or thresholds to classify faces as known or unknown. Advantages are ease of implementation and the little preprocessing required, while limitations include sensitivity to head scale and applicability only to frontal views under controlled conditions.
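A minimal eigenface sketch using scikit-learn's PCA; the data file, component count, and distance threshold are hypothetical. The principal components recovered by PCA are the eigenfaces, and identification compares projection weights exactly as described above.

```python
import numpy as np
from sklearn.decomposition import PCA

faces = np.load("faces.npy")            # hypothetical (n_samples, n_pixels) matrix
pca = PCA(n_components=50).fit(faces)   # pca.components_ are the eigenfaces
known_weights = pca.transform(faces)    # weight vectors of enrolled faces

def identify(new_face, threshold=1e4):
    """Return the index of the closest enrolled face, or None if unknown."""
    w = pca.transform(new_face.reshape(1, -1))
    dists = np.linalg.norm(known_weights - w, axis=1)
    best = int(dists.argmin())
    return best if dists[best] < threshold else None
```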
The document discusses object tracking in computer vision. It begins with an introduction and overview of applications of object tracking. It then discusses object representation, detection, tracking algorithms and methodologies. It compares different tracking methods and provides an example of object tracking in MATLAB. Key steps in object tracking include object detection, tracking the detected objects across frames using algorithms like point tracking, kernel tracking and silhouette tracking. Common challenges with object tracking are also summarized.
Applications of Digital image processing in Medical Field - Ashwani Srivastava
This document discusses different types of electromagnetic radiation used for imaging. It describes digital images as composed of pixels and notes that digital image processing involves manipulating digital images on a computer. It outlines different levels of image processing from low-level tasks like noise reduction to mid-level tasks like segmentation to high-level tasks like image analysis. It provides examples of imaging applications using gamma rays, X-rays, ultraviolet light, microwaves, radio waves, and magnetic resonance imaging.
Lec2: Digital Images and Medical Imaging Modalities - Ulaş Bağcı
This document discusses an introductory lecture on digital images and medical imaging modalities. It provides background on several modalities including X-ray, ultrasound, computed tomography, magnetic resonance imaging, positron emission tomography, and diffusion weighted imaging. For each modality, it describes the basic physics principles, clinical applications, and examples of images. The document emphasizes that medical image analysis is an important and active area of research that can help address challenges in measurement, detection, and diagnosis.
Design of a hand geometry based biometric system - Bhavi Bhatia
This document provides details about the design of a hand geometry-based biometric system. It discusses the methodology used, which includes image acquisition, preprocessing, feature extraction, matching, and decision stages. Image acquisition involves capturing grayscale images of hands using a digital camera. Preprocessing includes binarization to separate the hand from the background. Feature extraction measures finger lengths and widths. The extracted features are then matched against templates in a database to verify a user's identity. The overall goal is to develop a biometric verification system using geometric features of the hand.
This document provides an overview of a digital image processing lecture given by Dr. Moe Moe Myint at Technological University in Kyaukse, Myanmar. It includes information about the instructor's contact information and office hours. The document then summarizes the contents of Chapter 2, which covers topics like visual perception, light and the electromagnetic spectrum, image sensing and acquisition, and basic relationships between pixels. Examples and diagrams are provided to illustrate concepts like the structure of the human eye, image formation, brightness adaptation, and the electromagnetic spectrum. Optical illusions are also discussed as examples of how visual perception does not always match physical light intensities.
The document discusses human activity recognition from video data using computer vision techniques. It describes recognizing activities at different levels from object locations to full activities. Basic activities like walking and clapping are the focus. Key steps involve tracking segmented objects across frames and comparing motion patterns to templates to identify activities through model fitting. The DEV8000 development kit and Linux are used to process video and recognize activities in real-time. Applications discussed include surveillance, sports analysis, and unmanned vehicles.
This document discusses the development of an Android application for physical activity recognition using the accelerometer sensor. It provides background on the Android operating system and its open development environment. It then summarizes relevant research papers on activity recognition using mobile sensors. The document outlines the process of collecting and labeling accelerometer data from smartphone sensors during different physical activities. Features are extracted from the sensor data and several machine learning classifiers are evaluated for activity recognition. The application will recognize activities and track metrics like calories burned, distance traveled, and implement fall detection and medical reminders.
This document summarizes a student project on human activity recognition using smartphones. A group of 4 students submitted the project to partially fulfill requirements for a Bachelor of Technology degree in computer science and engineering. The project involved developing a system to recognize human activities using the accelerometer and gyroscope sensors in smartphones. Various machine learning algorithms were tested and evaluated on experimental data collected from smartphone sensors. The goal of the project was to create an accurate and lightweight activity recognition system for smartphones, while also exploring active learning methods to reduce the amount of labeled training data needed.
This document summarizes a presentation on gesture recognition technology. It discusses the introduction of gesture recognition, the types of gestures, uses of gesture recognition including sign language recognition and virtual controllers. It also discusses input devices such as wired gloves and depth cameras. The document outlines algorithms for gesture recognition including 3D model-based, skeletal-based, and appearance-based algorithms. It concludes with discussion of challenges for gesture recognition including limitations of equipment and variations in recognition accuracy.
The document discusses a project to develop a desktop application that converts sign language to speech and text to sign language. It aims to help communicate with deaf people by removing barriers. The team plans to use EmguCV and C# Speech Engine. It has created an application that converts signs to text using image processing. Future work includes completing the software to cover all words in Arabic sign language.
Human-computer interaction: gesture recognition provides a way for computers to understand human body language. It deals with interpreting hand gestures via mathematical algorithms, enabling humans to interface with the machine (HMI) and interact naturally without any mechanical devices.
Gesture recognition using artificial neural network, a technology for identify... - NidhinRaj Saikripa
The presentation covers a technology for identifying any type of body motion, commonly originating from the hand or face, using an artificial neural network. This includes identifying sign language as well. The technology is aimed at speech-impaired individuals.
cvpr2011: human activity recognition - part 3: single layer - zukun
The document discusses approaches to human activity analysis from the 1990s to 2000s, including early approaches that modeled activities as sequences of body poses using hidden Markov models, and later space-time approaches that analyzed activities as 3D video volumes, extracting global features or sparse interest points from the volumes to enable action recognition. It reviews key works that used techniques such as motion history images, volume matching, and spatio-temporal interest point detectors to analyze and recognize human activities in video data.
This presentation was prepared by Ishara Amarasekera based on the paper, Activity Recognition using Cell Phone Accelerometers by Jennifer R. Kwapisz, Gary M. Weiss and Samuel A. Moore.
This presentation contains a summary of the content provided in this research paper and was presented as a paper discussion for the course, Mobile and Ubiquitous Application Development in Computer Science.
On the Development of A Real-Time Multi-Sensor Activity Recognition System - Oresti Banos
There exist multiple activity recognition solutions offering good results under controlled conditions. However, little attention has been given to the development of functional systems operating in realistic settings. In that vein, this work presents the complete process for the design, implementation, and evaluation of a real-time activity recognition system. The proposed recognition system consists of three wearable inertial sensors used to register the user's body motion, and a mobile application that collects and processes the sensory data to recognize the user's activity. The system shows good recognition capabilities both after online evaluation and in analysis at runtime. In view of the obtained results, this system may serve for the recognition of some of the most frequent daily physical activities.
Wearable Computing - Part III: The Activity Recognition Chain (ARC) - Daniel Roggen
This document discusses activity recognition from sensor data. It describes how simple binary sensors can provide some information but full activity detection requires interpreting multiple correlated sensor streams using techniques like signal processing, machine learning and reasoning. Key steps in activity recognition systems are preprocessing, segmentation, feature extraction, and classification of sensor data. Challenges include continuous recognition, dealing with variable executions of activities, and separating activities from non-activities.
https://imatge-upc.github.io/activitynet-2016-cvprw/
This thesis explores different approaches using convolutional and recurrent neural networks to classify and temporally localize activities in videos, and proposes an implementation to achieve this. As a first step, features are extracted from video frames using a state-of-the-art 3D convolutional neural network. These features are fed into a recurrent neural network that solves the activity classification and temporal localization tasks in a simple and flexible way. Different architectures and configurations were tested to achieve the best performance and learning on the provided video dataset. In addition, different kinds of post-processing over the trained network's output were studied to achieve better results on the temporal localization of activities in the videos. The results produced by the neural network developed in this thesis were submitted to the ActivityNet Challenge 2016 at CVPR, achieving competitive results with a simple and flexible architecture.
The document discusses human action recognition using spatio-temporal features. It proposes using optical flow and shape-based features to form motion descriptors, which are then classified using Adaboost. Targets are localized using background subtraction. Optical flows within localized regions are organized into a histogram to describe motion. Differential shape information is also captured. The descriptors are used to train a strong classifier with Adaboost that can recognize actions in testing videos.
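A hedged sketch of the motion-descriptor step with OpenCV: dense Farneback optical flow is computed between consecutive grayscale frames and histogrammed by orientation (weighted by magnitude) inside the localized person box. The flow parameters, bin count, and the AdaBoost training snippet are illustrative.

```python
import cv2
import numpy as np

def flow_histogram(prev_gray, gray, box, bins=8):
    """Orientation histogram of dense optical flow inside a detected box."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    x, y, w, h = box                         # box from background subtraction
    fx = flow[y:y + h, x:x + w, 0]
    fy = flow[y:y + h, x:x + w, 1]
    mag = np.hypot(fx, fy)
    ang = np.arctan2(fy, fx) % (2 * np.pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0, 2 * np.pi), weights=mag)
    return hist / (hist.sum() + 1e-9)        # normalized motion descriptor

# Descriptors from labeled training clips would then train the classifier:
# from sklearn.ensemble import AdaBoostClassifier
# clf = AdaBoostClassifier(n_estimators=100).fit(descriptors, labels)
```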
This document outlines the generalised method of moments (GMM) estimation technique. It begins with the basic principles of GMM, including that it uses theoretical relations that parameters should satisfy to choose parameter estimates. It then discusses estimating GMM, hypothesis testing with GMM, and extensions such as using GMM with dynamic stochastic general equilibrium (DSGE) models. The document provides details on how population moments relate to sample moments, and how method of moments estimation and instrumental variables estimation can both be viewed as special cases of GMM. It concludes by explaining how the generalized method of moments estimator works by minimizing a weighted distance between sample and population moments.
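The estimator the document builds up to can be stated compactly; the notation below is the standard one rather than anything quoted from the slides.

```latex
% Population moment conditions satisfied by the true parameter:
%   E[ g(w_t, \theta_0) ] = 0
% Sample analogue of the moments over T observations:
\bar{g}(\theta) = \frac{1}{T} \sum_{t=1}^{T} g(w_t, \theta)
% GMM estimator: minimize the weighted distance between the sample
% moments and zero, for a positive-definite weighting matrix W:
\hat{\theta}_{\mathrm{GMM}} = \arg\min_{\theta} \; \bar{g}(\theta)^{\top} W \, \bar{g}(\theta)
```

Method-of-moments and instrumental-variables estimation correspond to particular choices of g and W, which is why the document treats them as special cases of GMM.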
Linear SVM: A Case Study, First Part (Coding And Theory) discusses spam filtering using a linear support vector machine (SVM) model. It introduces the concepts of term frequency-inverse document frequency (TF-IDF) to transform text into numeric vectors for modeling, defines the separating hyperplane and role of bias terms in linear SVMs, and explains that the goal is to find the hyperplane that maximizes the margin between examples of different classes for classification.
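A self-contained scikit-learn sketch of that setup; the toy messages and labels are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

texts = ["win a free prize now", "meeting moved to 3pm",
         "cheap pills online", "lunch tomorrow?"]
labels = [1, 0, 1, 0]                      # 1 = spam, 0 = ham

# TF-IDF turns text into numeric vectors; the linear SVM then finds the
# maximum-margin separating hyperplane between the two classes.
clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(texts, labels)
print(clf.predict(["free pills, no prescription"]))  # likely [1]
```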
This document provides an overview of gesture recognition technology, including what gestures are, the history and basic workings of gesture recognition, different types of gesture recognition and sensing technologies, algorithms used, applications, and challenges. It discusses hand, facial, and sign language recognition and technologies like wired gloves, cameras, and controllers. Benefits include interacting without mouse/keyboard and with 3D environments without physical contact. Applications include rehabilitation, sign language, gaming, and assisting those with disabilities.
Red Tacton is a new human area network technology developed by Japanese company NTT that uses weak electric fields on the surface of the human body for high-speed network transmission. It works by creating changes in electric fields from a transmitting transceiver that are received by a sensing receiving transceiver. Data transfer is faster and more secure compared to other technologies, though it has a limited range of a few centimeters and is more costly. Potential applications include touch-based controls and transferring various media types through the human body.
This document summarizes a presentation on video indexing and retrieval using XML and XQuery. It outlines the proposed model, which indexes sports videos semantically using XML. It extracts features from raw video data and indexes events and objects. Queries are processed using XQuery to dynamically return customized summaries and search results to users. The document also discusses existing methods, the methods planned for key frames, objects, motion, and classification, as well as why XML is suitable for the indexing structure.
Key Frame Extraction in Video Stream using Two Stage Method with Colour and S... - ijtsrd
Key frame extraction, the summarization of videos for applications like video object recognition and classification, video retrieval and archival, and surveillance, is an active research area in computer vision. This paper describes a new criterion for well-representative key frames and, correspondingly, a key frame selection algorithm based on a two-stage method, used to extract accurate key frames that cover the content of the whole video sequence. First, an alternative sequence is obtained from the original sequence based on color-characteristic differences between adjacent frames. Second, the final key frame sequence is obtained by analyzing structural-characteristic differences between adjacent frames of the alternative sequence. An optimization step is then added, based on the number of final key frames, to ensure the effectiveness of the extraction. Khaing Thazin Min, Wit Yee Swe, Yi Yi Aung, and Khin Chan Myae Zin, "Key Frame Extraction in Video Stream using Two-Stage Method with Colour and Structure," International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN 2456-6470, Volume 3, Issue 5, August 2019. URL: https://www.ijtsrd.com/papers/ijtsrd27971.pdf Paper URL: https://www.ijtsrd.com/computer-science/data-processing/27971/key-frame-extraction-in-video-stream-using-two-stage-method-with-colour-and-structure/khaing-thazin-min
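A hedged sketch of the two-stage idea with OpenCV and scikit-image: stage one keeps frames whose HSV color histogram differs enough from the last kept frame, and stage two drops survivors that remain structurally similar under SSIM. Both thresholds are illustrative, not the paper's values.

```python
import cv2
import numpy as np
from skimage.metrics import structural_similarity as ssim

def color_hist(frame):
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    h = cv2.calcHist([hsv], [0, 1], None, [16, 16], [0, 180, 0, 256])
    return cv2.normalize(h, h).flatten()

def key_frames(frames, color_t=0.3, struct_t=0.8):
    kept = [frames[0]]
    for f in frames[1:]:                       # stage 1: color difference
        d = cv2.compareHist(color_hist(kept[-1]), color_hist(f),
                            cv2.HISTCMP_BHATTACHARYYA)
        if d > color_t:
            kept.append(f)
    final = [kept[0]]
    for f in kept[1:]:                         # stage 2: structural difference
        g0 = cv2.cvtColor(final[-1], cv2.COLOR_BGR2GRAY)
        g1 = cv2.cvtColor(f, cv2.COLOR_BGR2GRAY)
        if ssim(g0, g1) < struct_t:
            final.append(f)
    return final
```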
This document discusses action recognition in videos. It begins by defining action recognition and describing its applications such as surveillance, video search, and medical monitoring. Challenges of action recognition like scale variations, camera motion, and human pose differences are presented. The document reviews papers on local space-time features with SVMs and two-stream convolutional networks. It shows that local features combined with SVMs achieved the best results on a dataset of human actions. Two-stream ConvNets, which use spatial and temporal streams, became the state-of-the-art by capturing shape from frames and motion from optical flow. Future work may explore deeper ConvNets with larger datasets.
Video Content Identification using Video Signature: Survey - IRJET Journal
This document summarizes previous research on video content identification using video signatures. It discusses three types of video signatures (spatial, temporal, and spatio-temporal) that have been used to generate unique descriptors to identify identical video scenes. The document then reviews several existing methods for video signature extraction and matching, including techniques based on ordinal signatures, motion signatures, color histograms, local descriptors using interest points, and compressed video shot matching using dominant color profiles. It concludes by proposing a new temporal signature-based method that aims to accurately detect a video segment embedded in a longer unrelated video by extracting frame-level features, generating fine and coarse signatures, and performing frame-by-frame signature matching.
This document discusses a content-based video retrieval system based on dominant color and texture features. It begins with an introduction to content-based video retrieval and the challenges involved. It then describes representing video through segmentation into shots and frames. The proposed method extracts dominant color, texture, and color histogram features from frames. Texture is captured through gray-level co-occurrence matrix analysis. A combined feature vector is constructed and similarity measured through Euclidean distance. The system is aimed at efficient video retrieval through analyzing dominant color and texture information.
Video Key-Frame Extraction using Unsupervised Clustering and Mutual Comparison - CSCJournals
The document presents a novel method for extracting key frames from videos using unsupervised clustering and mutual comparison. It assigns weights of 70% to color (HSV histogram) and 30% to texture (GLCM) when computing frame similarity for clustering. It then performs mutual comparison of extracted key frames to remove near duplicates, improving accuracy. The algorithm is computationally simple and able to detect unique key frames, improving concept detection performance as validated on open databases.
The document proposes a method to estimate the authenticity of online videos by analyzing visual quality and video structure. It aims to distinguish original videos from edited versions. The method bridges visual quality assessment and digital forensics by using shot identification to detect deleted shots and enable normalization. Visual quality is used as a proxy for authenticity, with lower quality and more deleted shots indicating lower authenticity. The proposed algorithm estimates the original parent video, detects information loss through editing, applies quality metrics, and aggregates penalties to calculate a video's authenticity degree.
How to prepare a perfect video abstract for your research paper – Pubrica.pdf - Pubrica
This document provides guidance on creating a video abstract for a research paper. It defines a video abstract as a shorter video that retains the essential meaning of the original through a series of moving pictures. The document discusses techniques for video abstraction, including keyframes, animations, and PowerPoint presentations. It provides technical specifications for video quality and formatting and guidelines for video submission.
Key frame extraction methodology for video annotation - IAEME Publication
This document summarizes a research paper that proposes a key frame extraction methodology to facilitate video annotation. The methodology uses edge difference between consecutive video frames to determine if the content has significantly changed. Frames where the edge difference exceeds a threshold are selected as key frames. The algorithm calculates edge differences for all frame pairs in a video. It then computes statistics like mean and standard deviation to determine a threshold. Frames with differences above this threshold are extracted as key frames. The key frames extracted represent important content changes in the video. Extracting key frames reduces processing requirements for video annotation compared to analyzing all frames. The methodology was tested on videos from domains like transportation and performed well at selecting representative frames.
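A short sketch of that methodology, assuming OpenCV: edge maps come from Canny, the per-pair difference is the count of disagreeing edge pixels, and the threshold is the mean plus a multiple of the standard deviation of those differences. The Canny thresholds and the multiplier k are illustrative.

```python
import cv2
import numpy as np

def edge_differences(frames):
    """Edge-map disagreement between each pair of consecutive frames."""
    edges = [cv2.Canny(cv2.cvtColor(f, cv2.COLOR_BGR2GRAY), 50, 150)
             for f in frames]
    return np.array([np.count_nonzero(a != b)
                     for a, b in zip(edges, edges[1:])], dtype=float)

def key_frame_indices(frames, k=1.0):
    d = edge_differences(frames)
    threshold = d.mean() + k * d.std()     # statistics-based threshold
    return [i + 1 for i, di in enumerate(d) if di > threshold]
```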
Coronary heart disease is a disease with the highest mortality rates in the world. This makes the development of the diagnostic system as a very interesting topic in the field of biomedical informatics, aiming to detect whether a heart is normal or not. In the literature there are diagnostic system models by combining dimension reduction and data mining techniques. Unfortunately, there are no review papers that discuss and analyze the themes to date. This study reviews articles within the period 2009-2016, with a focus on dimension reduction methods and data mining techniques, validated using a dataset of UCI repository. Methods of dimension reduction use feature selection and feature extraction techniques, while data mining techniques include classification, prediction, clustering, and association rules.
Key frame extraction is an essential technique in the computer vision field. The extracted key frames should brief the salient events with excellent feasibility, great efficiency, and a high level of robustness. It is not an easy problem to solve because it depends on many visual features. This paper addresses the problem by investigating the relationship between the detection of these features and the accuracy of key frame extraction techniques using TRIZ. An improved key frame extraction algorithm is then proposed, based on accumulative optical flow with a self-adaptive threshold (AOF_ST) as recommended by TRIZ inventive principles. Several video shots, including original and forged videos with complex conditions, are used to verify the experimental results. Comparison with state-of-the-art algorithms shows that the proposed algorithm accurately briefs the videos and generates a meaningful, compact number of key frames. On top of that, it achieves compression rates of 124.4 and 31.4 in the best and worst cases on the KTH dataset, while the best state-of-the-art case achieved 8.90.
How to prepare a perfect video abstract for your research paper – Pubrica.pptx - Pubrica
A video abstract is a series of moving pictures taken from a lengthier movie that is significantly shorter than the original yet retains the original's essential meaning.
VISUAL ATTENTION BASED KEYFRAMES EXTRACTION AND VIDEO SUMMARIZATION - cscpconf
Recent developments in digital video and the drastic increase in internet use have increased the number of people searching for and watching videos online. To make searching easy, a summary may be provided along with each video. The summary should be effective enough that users learn the content of a video without having to watch it fully, and should consist of key frames that effectively express the content and context of the video. This work suggests a method to extract key frames that express most of the information in the video. This is achieved by quantifying the visual attention each frame commands, using a descriptor called the attention quantifier. The quantification of visual attention is based on the human attention mechanism, in which color conspicuousness and motion attract more attention; each frame is therefore assigned an attention parameter based on its color conspicuousness and the motion involved. Based on the attention quantifier value, the key frames are extracted and summarized adaptively, producing a meaningful video summary.
Content based video retrieval using discrete cosine transform - nooriasukmaningtyas
A content-based video retrieval (CBVR) framework is built in this paper. One of the essential features of the video retrieval process and CBVR is color value. The discrete cosine transform (DCT) is used to extract the query video's features for comparison with the video features stored in our database. After applying the DCT to the database we created and collected, an average result of 0.6475 was obtained across all categories. The technique was tested against 100 database videos, 5 in each category.
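A hedged sketch of DCT-based frame features for such a system, using SciPy: each grayscale frame keeps a small top-left block of its 2-D DCT (the low-frequency coefficients), and a video's signature is the mean of those vectors. The block size and the distance-based similarity are illustrative choices, not the paper's exact design.

```python
import numpy as np
from scipy.fft import dctn

def dct_signature(gray_frames, block=8):
    """One compact feature vector per video from low-frequency DCT terms."""
    feats = [dctn(f.astype(float), norm="ortho")[:block, :block].ravel()
             for f in gray_frames]
    return np.mean(feats, axis=0)

def similarity(sig_a, sig_b):
    return -np.linalg.norm(sig_a - sig_b)   # higher = more similar
```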
Real-Time Video Copy Detection in Big Data - IRJET Journal
This document summarizes research on real-time video copy detection algorithms using Hadoop. It discusses existing algorithms like TIRI-DCT and brightness sequence that have limitations such as being slow and inaccurate. The paper proposes implementing improved versions of these algorithms using Hadoop for faster search times. Fingerprint extraction and indexing techniques like inverted file-based similarity search and cluster-based similarity search are also summarized. The paper concludes that using Hadoop can significantly improve efficiency for processing large video datasets while optimizing algorithms for speed, accuracy and robustness against various attacks.
Key frame extraction for video summarization using motion activity descriptors - eSAT Journals
This document presents a method for video summarization using motion activity descriptors. It extracts key frames from videos by comparing motion between consecutive frames using block-matching algorithms like diamond search and three-step search. These algorithms determine which blocks to compare from consecutive frames to find the closest block match and derive a motion activity descriptor. Frames with high motion descriptors, indicating more difference between frames, are selected as key frames for the video summary. The method was tested on various video categories and showed high precision and summarization for some videos but lower values for others, depending on factors like scene changes, motion detectability, and object/area properties. An effective summary balances high precision with a high summarization factor by selecting frames that best represent the video's content.
Key frame extraction for video summarization using motion activity descriptors - eSAT Publishing House
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
This paper addresses the need for a scientific framework enabling the adaptive delivery of omnidirectional video within heterogeneous environment. It considers state of the art techniques for the adaptive video streaming over HTTP and extends it towards omnidirectional/360-degree videos.
Human Vision and Electronic Imaging 2018, 28 January - 2 February, 2018 • Burlingame, California USA
1. The document proposes an efficient algorithm to retrieve videos from a database using a video clip as a query.
2. Key features like color, texture, edges and motion are extracted from video shots and clusters are created using these features to reduce search time complexity.
3. When a query video is given, its features are used to search the closest cluster. Then sequential matching of additional features and shot lengths is done to find the most similar matching videos from the database.
This document presents a framework for automatic semantic content extraction from videos. It discusses extracting frames from videos, using a genetic algorithm-based classifier to identify objects in frames, and applying an ontology and rules to extract semantic concepts and events from the identified objects based on their spatial and temporal relationships. The proposed approach uses a domain-independent ontology model and rules to semantically represent video content without relying on specific domains or assumptions. The framework has been implemented and tested on multiple domains, providing satisfactory results for semantic video content retrieval.
Similar to Human Activity Recognition (HAR) using HMM based Intermediate matching kernel by representing video as sequence of sets of feature vectors
Batteries: introduction; types of batteries; discharging and charging of a battery; characteristics of a battery; battery rating; various tests on batteries; primary battery: silver button cell; secondary battery: Ni-Cd battery; modern battery: lithium-ion battery; maintenance of batteries; choice of batteries for electric vehicle applications.
Fuel cells: introduction; importance and classification of fuel cells; description, principle, components, and applications of fuel cells: H2-O2 fuel cell, alkaline fuel cell, molten carbonate fuel cell, and direct methanol fuel cells.
International Conference on NLP, Artificial Intelligence, Machine Learning an... - gerogepatton
International Conference on NLP, Artificial Intelligence, Machine Learning and Applications (NLAIM 2024) offers a premier global platform for exchanging insights and findings in the theory, methodology, and applications of NLP, Artificial Intelligence, Machine Learning, and their applications. The conference seeks substantial contributions across all key domains of NLP, Artificial Intelligence, Machine Learning, and their practical applications, aiming to foster both theoretical advancements and real-world implementations. With a focus on facilitating collaboration between researchers and practitioners from academia and industry, the conference serves as a nexus for sharing the latest developments in the field.
Literature Review Basics and Understanding Reference Management.pptxDr Ramhari Poudyal
Three-day training on academic research focuses on analytical tools at United Technical College, supported by the University Grant Commission, Nepal. 24-26 May 2024
Understanding Inductive Bias in Machine LearningSUTEJAS
This presentation explores the concept of inductive bias in machine learning. It explains how algorithms come with built-in assumptions and preferences that guide the learning process. You'll learn about the different types of inductive bias and how they can impact the performance and generalizability of machine learning models.
The presentation also covers the positive and negative aspects of inductive bias, along with strategies for mitigating potential drawbacks. We'll explore examples of how bias manifests in algorithms like neural networks and decision trees.
By understanding inductive bias, you can gain valuable insights into how machine learning models work and make informed decisions when building and deploying them.
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Sinan KOZAK
Sinan from the Delivery Hero mobile infrastructure engineering team shares a deep dive into performance acceleration with Gradle build cache optimizations. Sinan shares their journey into solving complex build-cache problems that affect Gradle builds. By understanding the challenges and solutions found in our journey, we aim to demonstrate the possibilities for faster builds. The case study reveals how overlapping outputs and cache misconfigurations led to significant increases in build times, especially as the project scaled up with numerous modules using Paparazzi tests. The journey from diagnosing to defeating cache issues offers invaluable lessons on maintaining cache integrity without sacrificing functionality.
Advanced control scheme of doubly fed induction generator for wind turbine us...IJECEIAES
This paper describes a speed control device for generating electrical energy on an electricity network based on the doubly fed induction generator (DFIG) used for wind power conversion systems. At first, a double-fed induction generator model was constructed. A control law is formulated to govern the flow of energy between the stator of a DFIG and the energy network using three types of controllers: proportional integral (PI), sliding mode controller (SMC) and second order sliding mode controller (SOSMC). Their different results in terms of power reference tracking, reaction to unexpected speed fluctuations, sensitivity to perturbations, and resilience against machine parameter alterations are compared. MATLAB/Simulink was used to conduct the simulations for the preceding study. Multiple simulations have shown very satisfying results, and the investigations demonstrate the efficacy and power-enhancing capabilities of the suggested control system.
DEEP LEARNING FOR SMART GRID INTRUSION DETECTION: A HYBRID CNN-LSTM-BASED MODELgerogepatton
As digital technology becomes more deeply embedded in power systems, protecting the communication
networks of Smart Grids (SG) has emerged as a critical concern. Distributed Network Protocol 3 (DNP3)
represents a multi-tiered application layer protocol extensively utilized in Supervisory Control and Data
Acquisition (SCADA)-based smart grids to facilitate real-time data gathering and control functionalities.
Robust Intrusion Detection Systems (IDS) are necessary for early threat detection and mitigation because
of the interconnection of these networks, which makes them vulnerable to a variety of cyberattacks. To
solve this issue, this paper develops a hybrid Deep Learning (DL) model specifically designed for intrusion
detection in smart grids. The proposed approach is a combination of the Convolutional Neural Network
(CNN) and the Long-Short-Term Memory algorithms (LSTM). We employed a recent intrusion detection
dataset (DNP3), which focuses on unauthorized commands and Denial of Service (DoS) cyberattacks, to
train and test our model. The results of our experiments show that our CNN-LSTM method is much better
at finding smart grid intrusions than other deep learning algorithms used for classification. In addition,
our proposed approach improves accuracy, precision, recall, and F1 score, achieving a high detection
accuracy rate of 99.50%.
Human Activity Recognition (HAR) using HMM based Intermediate matching kernel by representing video as sequence of sets of feature vectors
1. Project Presentation on
Human Activity Recognition using HMM based Intermediate Matching Kernel by representing videos as sets of feature vectors
Presented by: Rupali Bhatnagar (14CSE2013)
Under the guidance of: Dr. Veena T., Assistant Professor
Department of Computer Science and Engineering
National Institute of Technology Goa
8 July 2016
2. Outline
• Human Activity Recognition
• Types of patterns in a video
• Challenges to the task of classification of videos
• Problem Statement
• Related Work
• SVM based methods
• GMM based methods
• HMM based methods
• Proposed Solution
• Feature Extraction Module
• Classification using HIMK based SVM
• Results
• Conclusions and Future Directions
• References
3. Human Activity Recognition
• Automatic detection of human activity events from videos by:-
• Detecting when the activity takes place
• Determining what activity has taken place
• APPLICATIONS:-
• Surveillance Systems
• Patient Monitoring Systems
• Crowd Behaviour Prediction Systems
• Sports play analysis
• Content based video search
4. Classification of videos
• Video is composed of a sequence of frames.
• The number of frames depends on the duration of the video.
• The images are temporally related to one another.
• The images themselves have local spatial correlations.
Figure: A video is composed of a sequence of frames (t = 0, 1, 2, …, T)
5. Classification of videos : Types of patterns
• A video has 2 categories of patterns:-
• SPATIAL PATTERNS
• TEMPORAL PATTERNS
6. Classification of videos : Types of patterns
• A video has 2 categories of patterns:-
• SPATIAL PATTERNS
• Local features of the frames of a video.
• Appearance-based features – corners, edges, colors, etc.
• Helps in detecting:-
• Edges
• Backgrounds
• Textures
• Objects
• TEMPORAL PATTERNS
Figure: Spatial patterns in an image
7. Classification of videos : Types of patterns
• A video has 2 categories of patterns:-
• SPATIAL PATTERNS
• TEMPORAL PATTERNS
• Capture the sequence of frames.
• Motion information embedded in the video can be extracted.
Figure: Motion information embedded in a video (action = handclapping)
8. Classification of videos : Challenges
• Varying length representations[1]
• High dimensionality
• Intra-class variability
• Inter-class similarity
9. Classification of videos : Challenges
• Varying length representations[1]
• High dimensionality
• Intra-class variability
• Inter-class similarity
Figure: Varying-length representations for videos of different sizes: Video 1 has T1 frames F1 … FT1, while Video 2 has T2 frames F1 … FT2
10. Classification of videos : Challenges
• Varying length representations
• High dimensionality
• Intra-class variability
• Inter-class similarity
Figure: High dimensionality of video data: each of the T frames F1 … FT yields a D-dimensional feature vector
11. Classification of videos : Challenges
• Varying length representations
• High dimensionality
• Intra-class variability
• Inter-class similarity
Figure: Variations in the running class
12. Classification of videos : Challenges
• Varying length representations
• High dimensionality
• Intra-class variability
• Inter-class similarity
Figure: (a) Similarity between the Karate and Taekwondo classes; (b) similarity between the running and walking classes
13. Problem Statement
• For the task of human activity recognition, we need a methodology with the following properties:-
• The model should capture the appearance-based information in the video.
• It should also capture the temporal information of the video.
• The model should capture the sequential information in the video accurately.
• The model should classify a given video on the basis of the information captured above.
14. Related Work
• SVM based methods
• GMM based methods
• HMM based methods
15. Related Work
• SVM based methods
• Method 1: By Yegnanarayana et al.[2]
• Uses three kinds of features: color, shape & motion features
• Uses a 1-vs-rest approach for SVM classification
• GMM based methods
• HMM based methods
16. Related Work
• SVM based methods
• Method 2: Directed Acyclic Graph based SVM (DAGSVM) by Jiang et al.[3]
• Uses features based on video editing, color, texture and motion.
• Uses 1-vs-1 SVM classifiers arranged as a directed acyclic graph.
• GMM based methods
• HMM based methods
Figure: The DAGSVM approach
17. Related Work
• SVM based methods
• Method 3: Hierarchical SVM by Yuan et al.[4]
• Uses spatial features – face-frame ratio, brightness & entropy.
• Uses temporal features – average shot length, cut percentage, average color difference & camera motion.
• Creates 2 trees:
• Local optimal SVM binary tree
• Global optimal SVM binary tree
• GMM based methods
• HMM based methods
18. • SVM based methods
• Method 4: String Kernel by Ballan et al.[5]
• Events are modeled as a sequence of histograms of visual features, computed using the Bag of Words (BoW) approach.
• The sequences are treated as strings (phrases), where each histogram is considered a character.
• The string kernel is based on the Needleman-Wunsch edit distance d(x, x') and is computed as:
$K(x, x') = e^{-d(x, x')}$
• GMM based methods
• HMM based methods
Related Work
Figure: String kernel approach by Ballan et al.
19. Related Work
• SVM based methods
• GMM based methods
• Method 1: Approach by Xu et al.[6]
• They combine 3 video features and 1 audio feature to create a supervector, and then apply Principal Component Analysis (PCA) to reduce the dimensionality.
• They model the features for the various classes using GMMs and train the GMM parameters using the Expectation-Maximization (EM) algorithm.
• HMM based methods
20. Related Work
• SVM based methods
• GMM based methods
• HMM based methods
• Method: ACTIVE (Activity Concept Transitions in Video Events) by Nevatia et al.[7]
• A video event is defined as a sequence of activity concepts.
• A new concept is generated with certain probabilities based on the previous concept.
• An observation is a low-level feature vector from a sub-clip, generated based on the concepts.
• The feature vector is obtained by using Fisher Kernel over the HMM.
21. Proposed Solution
Figure: Model of the proposed solution. Feature Extraction Module: HoG feature extraction from the video dataset (with class labels), followed by video representation using a bag-of-words model. Classification Module: the HIMK kernel produces a Gram matrix that is fed to an SVM classifier.
22. Proposed Model: Feature Extraction
• Histogram of Oriented Gradients (HoG)[8] is scale- and rotation-invariant within a cell; block normalization makes it illumination-invariant.
• Useful for object detection.
Figure: An image divided into blocks, each block containing overlapping cells C11 … C55
23. Proposed Model: Feature Extraction
• 2 methods to extract features:-
• Dense HoG features by using overlapping blocks
• Dense HoG features by using non-overlapping blocks
Method 1: Overlapping blocks based HoG algorithm by Dalal et al.[8]
• Feature vector dimension = (no. of blocks in the image) × (descriptor length per block)
• No. of overlapping blocks per image = $\frac{(\text{no. of rows} - 1) \times (\text{no. of cols} - 1)}{\text{block size}}$
• Due to the overlapping nature of the blocks in the image, the dimensionality of the local feature vector increases.
• This results in a very large training feature-vector set, which is computationally inefficient.
• With data of such high dimensionality, it is also not feasible to apply statistical dimensionality-reduction methods such as PCA.
24. Proposed Model: Feature Extraction
• 2 methods to extract features:-
• Dense HoG features by using overlapping blocks
• Dense HoG features by using non-overlapping blocks
Method 2: Non-overlapping blocks based HoG algorithm by Dalal et al.[8]
• The overlap between blocks is what inflates the dimensionality of the local feature vectors.
• We observe that the dimensionality of the feature set for each frame of the video drops drastically when only non-overlapping blocks are kept:
[266×36]-dimensional ⟶ [70×36]-dimensional
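To make this extraction step concrete, here is a minimal sketch of dense HoG extraction with non-overlapping blocks, assuming scikit-image's hog implementation and illustrative cell/block sizes (the slide does not give the exact project parameters):

```python
import numpy as np
from skimage.color import rgb2gray
from skimage.feature import hog

def frame_hog_nonoverlap(frame, cell=(8, 8), block=(2, 2)):
    """Dense HoG histograms for one RGB video frame.

    With feature_vector=False, skimage returns the block grid, so a
    non-overlapping subset is obtained by striding over whole blocks.
    Cell/block sizes here are illustrative, not the slide's settings.
    """
    gray = rgb2gray(frame)
    blocks = hog(gray, orientations=9, pixels_per_cell=cell,
                 cells_per_block=block, feature_vector=False)
    # blocks: (n_blocks_row, n_blocks_col, block_rows, block_cols, 9);
    # adjacent blocks overlap by all but one cell, so keep every
    # block[0]-th / block[1]-th block to make them disjoint.
    disjoint = blocks[::block[0], ::block[1]]
    return disjoint.reshape(-1, block[0] * block[1] * 9)  # rows of 36-dim histograms
```

Each row is one 36-dimensional block histogram, so a frame contributes a set of such vectors, consistent with the [70×36] per-frame representation above.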
25. Video Representation: Bag of words model
Figure: Codebook generation (bag-of-words model). The training dataset, represented as the set of HoG feature vectors extracted from each frame of each training video, is clustered; the resulting codewords form the generated codebook.
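A hedged sketch of how a video could then be encoded against the codebook, assuming a codebook of k medoid histograms and the Histogram Matching Score similarity defined on the next slide (function and variable names are illustrative):

```python
import numpy as np

def bow_sequence(frame_histograms, codebook):
    """Encode a video as a sequence of codeword-count histograms.

    frame_histograms: list of (n_blocks, 36) arrays, one per frame.
    codebook: (k, 36) array of medoid histograms.
    Each block histogram is assigned to its most-similar codeword
    using the Histogram Matching Score (bin-wise minimum, averaged).
    """
    k = len(codebook)
    seq = []
    for H in frame_histograms:
        # HMS between every block histogram and every codeword: (n_blocks, k)
        sims = np.minimum(H[:, None, :], codebook[None, :, :]).mean(axis=2)
        seq.append(np.bincount(sims.argmax(axis=1), minlength=k))
    return np.stack(seq)  # (n_frames, k): one codeword histogram per frame
```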
26. Histogram Matching Score based K-medoid clustering
• INTUITION:-
• Features used: Histogram of Oriented Gradients (HoG).
• To calculate the similarity between histograms, we use the Histogram Matching Score.
• HISTOGRAM MATCHING SCORE:-
$\mathrm{HMS}(h_1, h_2) = \frac{1}{N} \sum_{n=1}^{N} \min(h_{1n}, h_{2n})$
where N = number of bins in histograms h1 and h2.
27. Histogram Matching Score based K-medoid clustering
$\mathrm{HMS}(H_1, H_2) = \frac{4+3+2+4+6+1+1+6+5}{9} = \frac{32}{9} \approx 3.56$
Figure: Calculation of the Histogram Matching Score
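In code the score is a one-liner; the histograms below are hypothetical values whose bin-wise minima reproduce the worked example above:

```python
import numpy as np

def histogram_matching_score(h1, h2):
    """HMS(h1, h2) = (1/N) * sum_n min(h1[n], h2[n]) over the N bins."""
    h1, h2 = np.asarray(h1, dtype=float), np.asarray(h2, dtype=float)
    return np.minimum(h1, h2).mean()

# Hypothetical 9-bin histograms whose bin-wise minima are
# [4, 3, 2, 4, 6, 1, 1, 6, 5], as in the worked example above.
H1 = [4, 5, 2, 7, 6, 1, 3, 6, 5]
H2 = [6, 3, 4, 4, 8, 2, 1, 7, 9]
print(histogram_matching_score(H1, H2))  # 32/9 ≈ 3.56
```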
28. Histogram Matching Score based K-medoid clustering
Algorithm: Histogram Matching Score based K-medoid algorithm
Input: k := number of clusters
Initialize: {x1, x2, …, xk} ← k random cluster centers
while the cluster centers have not converged do
    for each data vector vi do
        for each cluster center xc do
            calculate the Histogram Matching Score HMS(vi, xc)
        index(vi) ← argmax over c of HMS(vi, xc)
    for each cluster c do
        xc,new ← medoid of all the vectors assigned to cluster c
    if no cluster center changed then
        declare converged
    else
        xc ← xc,new for every cluster c
end
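A runnable sketch of this procedure, assuming the histograms are stacked row-wise in a NumPy array (a direct, unoptimized reading of the pseudocode):

```python
import numpy as np

def hms_kmedoids(X, k, max_iter=100, seed=0):
    """K-medoid clustering with the Histogram Matching Score as similarity.

    X: (n, N) array of histograms. Returns (medoid row indices, labels).
    """
    rng = np.random.default_rng(seed)
    n = len(X)
    # Pairwise HMS similarities; higher means more similar.
    S = np.minimum(X[:, None, :], X[None, :, :]).mean(axis=2)
    medoids = rng.choice(n, size=k, replace=False)
    labels = S[:, medoids].argmax(axis=1)
    for _ in range(max_iter):
        labels = S[:, medoids].argmax(axis=1)   # assign to most-similar medoid
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if len(members):
                # Medoid: the member with maximum total similarity to its cluster.
                sub = S[np.ix_(members, members)]
                new_medoids[c] = members[sub.sum(axis=1).argmax()]
        if np.array_equal(new_medoids, medoids):  # converged
            break
        medoids = new_medoids
    return medoids, labels
```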
29. SVM Classifier
• SVM is a discriminative classifier with the following properties:-
It is a binary classifier.
It constructs an optimum hyperplane to divide the data.[9]
Figure: Maximum-margin hyperplane for linearly separable data; soft-margin hyperplane for non-linearly separable and overlapping data[10]
30. Kernel based methods for SVM
• The kernel method was proposed to handle non-linearly separable data & overlapping data:-
Nonlinear transformation of the data to a higher-dimensional feature space induced by a Mercer kernel.
Construction of optimal linear solutions in the kernel feature space.
Figure: Illustration of Kernel method for non-linearly separable data
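Since any Mercer kernel enters the SVM only through the Gram matrix, a sequence kernel such as HIMK can be plugged into a standard solver via a precomputed kernel. A toy sketch with scikit-learn, using a stand-in RBF Gram matrix where the HIMK values would go:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))              # stand-in fixed-length data
y = rng.integers(0, 2, size=40)

def gram(A, B):
    """Toy RBF Gram matrix; in the proposed method each entry would
    instead be HIMK(video_i, video_j) on the codeword-histogram sequences."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2)

clf = SVC(kernel="precomputed").fit(gram(X, X), y)
print(clf.predict(gram(X[:3], X)))        # test rows: K(test_i, train_j)
```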
31. Sequence Kernel/Dynamic Kernel
• Videos are a sequence of frames. To capture the motion information, we model a video as a sequence of feature vectors.
• ADVANTAGE: no need to convert varying-length representations into a fixed-length representation.
• Examples of Sequence Kernels:
• Fisher Kernel
• Probabilistic Sequence Kernel
• GMM Supervector kernel
• CIGMM-IMK[11]
• HIMK[12]
Figure: A sequence kernel K(xi, xj) computed between two feature-vector sequences xi (length T1) and xj (length T2)
32. Intermediate Matching Kernel(IMK)
• The Intermediate Matching Kernel makes use of virtual feature vectors to match two varying-length representations.
Figure: Matching using virtual feature vectors. The sets X1 … Xm and Y1 … Yn are each mapped to the virtual feature vectors, selecting X1*, …, XQ* and Y1*, …, YQ*, and the base kernels K(X1*, Y1*), K(X2*, Y2*), …, K(XQ*, YQ*) are summed.
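A generic sketch of the IMK computation under the assumptions in the figure: Q virtual feature vectors (e.g. component means), closest-member selection by Euclidean distance, and a Gaussian base kernel (all illustrative choices, not the only possible ones):

```python
import numpy as np

def imk(X, Y, V, gamma=1.0):
    """Intermediate Matching Kernel between sets X (m, d) and Y (n, d),
    given virtual feature vectors V (Q, d): for each virtual vector pick
    the closest member of each set and sum the Gaussian base kernels."""
    def closest(A):
        d2 = ((A[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)  # (|A|, Q)
        return A[d2.argmin(axis=0)]                              # (Q, d)
    Xs, Ys = closest(X), closest(Y)
    return float(np.exp(-gamma * ((Xs - Ys) ** 2).sum(axis=1)).sum())
```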
33. HMM-based Intermediate Matching Kernel(HIMK)
• At its core, it uses an HMM, which is an apt model for representing sequential information.
• The Intermediate Matching Kernel makes use of virtual feature vectors to match two varying-length representations.
• Proposed by Dileep et al.[12], the HIMK for speech is calculated as the sum of the base kernels of all the components of all the GMMs present at each state of the HMM.
Figure: HMM based IMK calculation for speech signals [12]
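Following [12], a hedged sketch of the HIMK: the virtual feature vectors are the means of every GMM component at every HMM state, and the selected base kernels are accumulated (the HMM/GMM training itself is assumed to be done elsewhere):

```python
import numpy as np

def himk(X, Y, state_gmm_means, gamma=1.0):
    """HMM-based IMK between two sequences of local feature vectors
    X (m, d) and Y (n, d). state_gmm_means: list over HMM states of
    (n_components, d) arrays of GMM component means. Each component
    mean acts as one virtual feature vector, as in the IMK sketch above."""
    total = 0.0
    for means in state_gmm_means:                 # one GMM per HMM state
        for mu in means:                          # one virtual vector per component
            xs = X[((X - mu) ** 2).sum(axis=1).argmin()]  # closest in X
            ys = Y[((Y - mu) ** 2).sum(axis=1).argmin()]  # closest in Y
            total += np.exp(-gamma * ((xs - ys) ** 2).sum())
    return total
```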
34. HMM-based Intermediate Matching Kernel(HIMK)
Figure: HIMK for videos
35. Results
              Boxing   Handclapping  Handwaving  Jogging  Running  Walking
Boxing        63.7%    6.72%         2.61%       7.34%    15.9%    3.73%
Handclapping  11.31%   71.48%        8.41%       2.64%    5.11%    1.05%
Handwaving    18.26%   12.39%        65.34%      1.4%     2.03%    0.58%
Jogging       8.61%    1.26%         1.4%        49.54%   22.9%    16.29%
Running       4.65%    0.16%         0.67%       19.61%   62.18%   12.73%
Walking       5.13%    2.19%         4.31%       23.47%   12.29%   52.61%

Overall accuracy: 60.81%
Table: Percent-wise confusion matrix using the proposed method for k = 32
36. Results
Representation                                   Accuracy
String kernel with Chi-square metric             52.5%
String kernel with Intersection metric           51.48%
String kernel with Kolmogorov-Smirnov metric     48.37%
Proposed method                                  60.81%
Table: Comparison of classification accuracy
37. Conclusions & Future Directions
• Conclusion
• We proposed an HIMK-based SVM classifier for the task of human activity recognition.
• We discussed the feature-extraction process that produces a varying-length representation for videos, using the bag-of-features model with the Histogram Matching Score based K-medoid algorithm.
• We then discussed the HMM-based IMK and how to use the HIMK for the task of video classification.
• Future Work
• Use of motion features for better representation.
• Use of deep learning based feature representations for videos.
38. References
1. Roach, M., Mason, J. S., Evans, N. W., Xu, L. Q., and Stentiford, F., "Recent trends in video analysis: a taxonomy of video classification problems", in IMSA, 2002, pp. 348-353.
2. Suresh, V., Krishna, M. C., Swamy, R., and Yegnanarayana, B., "Content-based video classification using support vector machines", in International Conference on Neural Information Processing, 2004, pp. 726-731.
3. Jiang, X., Sun, T., and Wang, S., "An automatic video content classification scheme based on combined visual features model with modified DAGSVM", Multimedia Tools and Applications, 2010, vol. 52, no. 1, pp. 105-120.
4. Yuan, X., Lai, W., Mei, T., Hua, X. S., Wu, X. Q., and Li, S., "Automatic video genre categorization using hierarchical SVM", in International Conference on Image Processing, 2006, pp. 2905-2908.
5. Ballan, L., Bertini, M., Del Bimbo, A., and Serra, G., "Video event classification using string kernels", Multimedia Tools and Applications, 2009, vol. 48, no. 1, pp. 69-87.
6. Xu, L. Q., and Li, Y., "Video classification using spatial-temporal features and PCA", in International Conference on Multimedia and Expo, 2003, vol. 3, pp. III-485-488.
7. Sun, C., and Nevatia, R., "ACTIVE: activity concept transitions in video event classification", in Proceedings of the IEEE International Conference on Computer Vision, 2013, pp. 913-920.
8. Dalal, N., and Triggs, B., "Histograms of oriented gradients for human detection", in IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2005, vol. 1, pp. 886-893.
39. References
9. Vapnik, V. N., "An overview of statistical learning theory", IEEE Transactions on Neural Networks, 1999, vol. 10, no. 5, pp. 988-999.
10. Dileep, A. D., Veena, T., and Chandra Sekhar, C., "A review of kernel methods based approaches to classification and clustering of sequential patterns, part I: sequences of continuous feature vectors", in Data Mining: Concepts, Methodologies, Tools, and Applications, 2012, vol. 1, pp. 1-251.
11. Dileep, A. D., and Chandra Sekhar, C., "GMM-based intermediate matching kernel for classification of varying length patterns of long duration speech using support vector machines", IEEE Transactions on Neural Networks and Learning Systems, 2014, vol. 25, no. 8, pp. 1421-1432.
12. Dileep, A. D., and Chandra Sekhar, C., "HMM based intermediate matching kernel for classification of sequential patterns of speech using support vector machines", IEEE Transactions on Audio, Speech, and Language Processing, 2013, vol. 21, no. 12, pp. 2570-2582.
40. THANK YOU