The document presents a method for human action recognition using both RGB and depth data from an RGB-D sensor. Motion History Images (MHIs) are generated from RGB videos and Depth Motion Maps (DMMs) are generated from depth data after rotating the 3D point clouds. A 4-channel Deep Convolutional Neural Network is trained with one channel for MHIs and three channels for the rotated DMM views. Evaluated on the UTD-MHAD dataset, the proposed method achieves better recognition accuracy when fusing both RGB and depth modalities compared to using each individually.
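To illustrate the MHI generation step, here is a minimal sketch of the classic Bobick-Davis update rule (the motion threshold and decay duration tau are assumed values, not taken from the paper):

```python
import numpy as np

def update_mhi(mhi, prev_frame, frame, tau=30, thresh=25):
    """One MHI update step: pixels where inter-frame motion exceeds the
    threshold are stamped with tau; all others decay by 1 toward zero."""
    motion = np.abs(frame.astype(int) - prev_frame.astype(int)) > thresh
    return np.where(motion, tau, np.maximum(mhi - 1, 0))

# toy example: a bright block moves one pixel between frames
f0 = np.zeros((4, 4), dtype=np.uint8); f0[1, 1] = 200
f1 = np.zeros((4, 4), dtype=np.uint8); f1[1, 2] = 200
mhi = update_mhi(np.zeros((4, 4)), f0, f1)
print(mhi[1, 1], mhi[1, 2])  # both changed pixels are stamped with tau
```

Repeating this step over a clip leaves brighter values at more recently moving pixels, which is what gives the MHI its motion-trail appearance.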
These slides provide an initial introduction to image processing, covering:
1. Introduction to image processing
2. Elements of visual perception
3. Image sensing and acquisition
4. A simple image formation model
5. Basic concepts of sampling and quantization
Readers should find the topics easy to follow; a detailed description of each topic is illustrated here.
Please read, and feel free to leave comments. Thanks.
Information search using text and image query (eSAT Journals)
Abstract: This paper proposes an image retrieval and re-ranking system built on a visual re-ranking framework. The system first retrieves a dataset from the World Wide Web based on a textual query submitted by the user; these results serve as the dataset for information retrieval. The dataset is then re-ranked using a visual query (multiple images selected by the user from the dataset) that conveys the user's intent semantically. MPEG-7 visual descriptors, which describe an image in terms of low-level features such as color and texture, are used to compute distances that measure similarity between the query images and members of the dataset. The proposed system has been assessed on different types of queries, such as apples, Console, and Paris, and shows significant improvement over the initial text-based search results. The system is well suited to online shopping applications. Index Terms: MPEG-7, Color Layout Descriptor (CLD), Edge Histogram Descriptor (EHD), image retrieval and re-ranking system
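The re-ranking idea can be sketched as follows. This is a toy version: real MPEG-7 CLD/EHD descriptors are replaced by plain feature vectors, Euclidean distance stands in for the descriptor-specific distance measures, and the item names are invented for illustration:

```python
import math

def rerank(dataset, query_vectors):
    """Re-rank dataset items by minimum distance to any query image.
    dataset: list of (name, feature_vector); query_vectors: list of vectors."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    scored = [(min(dist(v, q) for q in query_vectors), name)
              for name, v in dataset]
    return [name for _, name in sorted(scored)]

# text search returned a mixed result set; the user picks an apple image
items = [("red_apple", [0.9, 0.1]), ("console", [0.1, 0.8]),
         ("green_apple", [0.7, 0.3])]
print(rerank(items, [[0.85, 0.15]]))
```

Items visually close to the user-selected query images move to the top, which is exactly the improvement over the initial text-only ranking that the abstract describes.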
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together scientists, academicians, field engineers, scholars and students of related fields of Engineering and Technology.
Students can learn the basics of image processing using MATLAB.
It explains image operations with the help of examples and MATLAB code.
Students can find sample images and .m code at the link given in the slides.
Introduction to Digital Image Processing Using MATLAB (Ray Phan)
This was a 3-hour presentation given to undergraduate and graduate students at Ryerson University in Toronto, Ontario, Canada, introducing Digital Image Processing using the MATLAB programming environment. It covers the basics of performing the most common image processing tasks, as well as an introduction to how digital images work and how they are formed.
You can access the images and code that I created and used here: https://www.dropbox.com/sh/s7trtj4xngy3cpq/AAAoAK7Lf-aDRCDFOzYQW64ka?dl=0
A Review on Image Compression using DCT and DWT (IJSRD)
Image compression addresses the problem of reducing the amount of data needed to represent a digital image. Several transformation techniques are used for data compression; the Discrete Cosine Transform (DCT) and the Discrete Wavelet Transform (DWT) are the most widely used. The DCT transforms an image from the spatial domain to the frequency domain; it has a high energy-compaction property and requires less computational resources. The DWT, on the other hand, is a multi-resolution transformation. This paper reviews the various approaches that different researchers have used for image compression. The analysis is carried out in terms of the performance parameters peak signal-to-noise ratio, bit error rate, compression ratio, mean square error, and the time taken for decomposition and reconstruction.
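The energy-compaction property mentioned above can be seen with a small sketch (a pure-NumPy orthonormal DCT-II, applied to a smooth 8x8 block; the gradient block is an invented example):

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix (rows are basis vectors)."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    m = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    m[0] *= 1 / np.sqrt(2)
    return m * np.sqrt(2 / n)

D = dct_matrix(8)
block = np.outer(np.linspace(50, 200, 8), np.ones(8))  # smooth 8x8 gradient
coeffs = D @ block @ D.T                               # 2D DCT of the block
energy = coeffs ** 2
# nearly all the energy collapses into the low-frequency corner,
# so the remaining coefficients can be quantized away cheaply
print(energy[:2, :2].sum() / energy.sum())
```

Because smooth image regions dominate natural images, this concentration of energy into a few coefficients is what makes DCT-based compression (as in JPEG) effective.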
A REGULARIZED ROBUST SUPER-RESOLUTION APPROACH FOR ALIASED IMAGES AND LOW RESO... (cscpconf)
This paper presents a hybrid approach to image and video super-resolution, proposed for enhancing the resolution of images and of low-resolution, under-sampled videos. We exploit the shift- and motion-based robust super-resolution (SR) algorithm [1] and the diffusion image regularization method proposed in [2] to obtain an alias-free, jerk-free, smooth SR image. We present a framework for obtaining super-resolution video that is robust even in the presence of fast-changing video frames. We compare our hybrid framework's simulation results with different resolution-enhancement techniques reported in the literature, i.e., robust super-resolution, IBP, and interpolation methods. The approach shows good results in terms of different quality parameters.
Fundamental concepts and basic techniques of digital image processing. Algorithms and recent research in image transformation, enhancement, restoration, encoding and description. Fundamentals and basic techniques of pattern recognition.
It is a basic introduction to how images are captured and converted from analog to digital format using the sampling and quantization processes, after which further algorithms can be applied to the digitized image.
Introduction to digital image processing, image processing, digital image, analog image, formation of digital image, level of digital image processing, components of a digital image processing system, advantages of digital image processing, limitations of digital image processing, fields of digital image processing, ultrasound imaging, x-ray imaging, SEM, PET, TEM
Different Approach of VIDEO Compression Technique: A Study (Editor IJCATR)
The main objective of video compression is to compress video with the fewest possible losses, in order to reduce transmission bandwidth and storage memory. This paper discusses different approaches to video compression for better transmission of video frames in multimedia applications. Video compression methods such as the frame-difference approach, the PCA-based method, the accordion function, the fuzzy concept, EZW, and FSBM are analyzed, and compared for performance, speed, accuracy, and which method produces better visual quality.
Internet data almost doubles every year. Multimedia communication needs less storage space and fast transmission, so the large volume of video data has become the driver for video compression. The aim of this paper is to achieve temporal compression for three-dimensional (3D) videos using motion estimation-compensation and wavelets. Instead of performing a two-dimensional (2D) motion search, as is common in conventional video codecs, the use of a 3D motion search is proposed, which is better able to exploit the temporal correlations of 3D content. This leads to more accurate motion prediction and a smaller residual. A discrete wavelet transform (DWT) compression scheme is added for a better compression ratio; the DWT's high energy-compaction property has greatly impacted the field of compression. The quality parameters peak signal-to-noise ratio (PSNR) and mean square error (MSE) are calculated, and the simulation results show that the proposed work improves the PSNR over existing work.
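The PSNR and MSE quality parameters used above are straightforward to compute; a minimal sketch for 8-bit pixel data (the sample values are invented):

```python
import math

def mse(a, b):
    """Mean square error between two equally sized pixel sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical frames."""
    m = mse(a, b)
    return float("inf") if m == 0 else 10 * math.log10(peak ** 2 / m)

orig = [52, 55, 61, 59]
recon = [54, 55, 60, 58]
print(round(psnr(orig, recon), 2))
```

Higher PSNR (equivalently, lower MSE against the original frame) is what "improves the PSNR over existing work" means in practice.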
A Hardware Model to Measure Motion Estimation with Bit Plane Matching Algorithm (TELKOMNIKA JOURNAL)
The multistep approach involving a combination of techniques is referred to as motion estimation. The proposed approach is an adaptive control system that measures motion from the starting point to the limit of the search. Motion patterns are used to analyze and avoid stationary regions of the image. The proposed algorithm is robust and efficient, and the calculations justify its advantages. The motivation of the work is to maximize encoding speed and visual quality with the help of a motion-vector algorithm. In this work a hardware model is developed in which frames of pictures are captured and sent via serial port to the system. The MATLAB simulation tool is used to detect motion among the picture frames. Once any motion is detected, a signal is sent to the hardware, which gives the appropriate sign accordingly. This system is developed on two platforms (hardware as well as software) to estimate and measure the motion vectors.
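Bit-plane matching replaces the usual sum-of-absolute-differences cost with a cheap XOR count on a single bit plane, which is why it suits hardware implementations. A toy sketch of the matching cost (the choice of the 4th bit plane and the 2x2 blocks are assumptions for illustration):

```python
def bit_plane(block, bit=4):
    """Extract one bit plane from a block of 8-bit pixel values."""
    return [[(p >> bit) & 1 for p in row] for row in block]

def match_cost(a, b, bit=4):
    """Number of mismatching bits between the bit planes of two blocks
    (an XOR count); a lower cost means a better motion-vector candidate."""
    pa, pb = bit_plane(a, bit), bit_plane(b, bit)
    return sum(x ^ y for ra, rb in zip(pa, pb) for x, y in zip(ra, rb))

ref = [[16, 16], [0, 0]]   # bit-4 plane: [[1, 1], [0, 0]]
cand = [[16, 0], [0, 0]]   # bit-4 plane: [[1, 0], [0, 0]]
print(match_cost(ref, cand))
```

In a full search, this cost is evaluated for every candidate displacement inside the search window and the displacement with the minimum cost becomes the motion vector.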
International Journal of Engineering Research and Development (IJERD Editor)
Electrical, Electronics and Computer Engineering,
Information Engineering and Technology,
Mechanical, Industrial and Manufacturing Engineering,
Automation and Mechatronics Engineering,
Material and Chemical Engineering,
Civil and Architecture Engineering,
Biotechnology and Bio Engineering,
Environmental Engineering,
Petroleum and Mining Engineering,
Marine and Agriculture engineering,
Aerospace Engineering.
Image fusion is the process of combining two or more images so that specific objects appear with more precision. It is very common that when one object is in focus, the remaining objects are less highlighted; to get an image highlighted in all areas, a different means is necessary, and this is achieved by image fusion. In remote sensing, the increasing availability of spaceborne images and synthetic aperture radar images motivates different kinds of image fusion algorithms. A number of time-domain image fusion techniques are available in the literature, and a few transform-domain fusion techniques have been proposed. In transform-domain fusion techniques, the source images are decomposed, integrated into a single dataset, and reconstructed back into the time domain. In this paper, singular value decomposition is used as a tool to obtain transform-domain data for image fusion. In the literature, the quality of fusion techniques is mainly assessed by subjective tests; here, objective quality-assessment metrics are calculated for the existing and proposed techniques. The new image fusion technique is found to outperform the existing ones.
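One simple way to use the SVD for fusion can be sketched as follows. This is a toy rule only (weighting each source by the relative energy of its singular values); the paper's actual fusion rule is not specified here, and the two 3x3 "images" are invented:

```python
import numpy as np

def svd_fuse(img_a, img_b):
    """Fuse two grayscale images by weighting each with the relative
    energy of its singular values (a simple transform-domain heuristic:
    the image with more singular-value energy contributes more)."""
    sa = np.linalg.svd(img_a, compute_uv=False).sum()
    sb = np.linalg.svd(img_b, compute_uv=False).sum()
    wa, wb = sa / (sa + sb), sb / (sa + sb)
    return wa * img_a + wb * img_b

a = np.eye(3) * 4.0   # source with strong structure
b = np.ones((3, 3))   # flat source
fused = svd_fuse(a, b)
print(fused.shape)
```

Objective metrics such as PSNR or entropy would then be computed on `fused` against reference images, matching the paper's move from subjective to objective quality assessment.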
IMAGE RECOGNITION USING MATLAB SIMULINK BLOCKSET (IJCSEA Journal)
The world over, image recognition systems are essential players in promoting quality object recognition, especially in emergency and search-and-rescue operations. In this paper a precise image recognition system using the MATLAB Simulink Blockset to detect a selected object in a crowd is presented. The process involves extracting object features and then recognizing the object while accounting for illumination, direction, and pose. A Simulink model has been developed to eliminate tiny elements from the image and then create segments for precise object recognition. Furthermore, the simulation explores image recognition from colored and gray-scale images through image processing techniques in the Simulink environment. The tool employed for computation and simulation is the MATLAB image processing blockset. The process comprises a morphological operation method that is effective for captured images and video. The results of extensive simulations indicate that this method is suitable for identifying a person in a crowd. The model can be used in emergency and search-and-rescue operations as well as in medicine, information security, access control, law enforcement, surveillance systems, microscopy, etc.
Markerless motion capture for 3D human model animation using depth camera (TELKOMNIKA JOURNAL)
3D animation is created using a keyframe-based system in 3D animation software such as Blender and Maya. Due to the long time required and the high expertise needed for 3D animation, motion capture devices are used as an alternative, and the Microsoft Kinect v2 sensor is one of them. This research analyses the capabilities of the Kinect sensor in producing 3D human model animations, comparing motion capture and a keyframe-based animation system against a live motion performance. The quality, time taken, and cost of both animation results were compared. The experimental results show that the motion capture system with the Kinect sensor consumed less time (only 2.6%) and cost (30%) in the long run (10 minutes of animation) compared to the keyframe-based system, but it produced lower-quality animation. This was due to the lack of body-detection accuracy under obstruction. Moreover, the sensor's constant assumption that the performer's body faces forward made it unreliable for a wide variety of movements. Furthermore, the standard test defined in this research covers most body parts' movements and can be used to evaluate other motion capture systems.
Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classif... (CSCJournals)
In today's era of digitization and fast internet, many videos are uploaded to websites, and a mechanism is required to access these videos accurately and efficiently. Semantic concept detection achieves this task accurately and is used in many applications, such as multimedia annotation, video summarization, indexing, and retrieval. Video retrieval based on semantic concepts is an efficient and challenging research area. Semantic concept detection bridges the semantic gap between the low-level features extracted from a key-frame or shot of video and their high-level interpretation as semantics, by automatically assigning labels to video from a predefined vocabulary; the task is treated as a supervised machine learning problem. The support vector machine (SVM) emerged as the default classifier choice for this task, but recently deep convolutional neural networks (CNNs) have shown exceptional performance in this area, although a CNN requires a large dataset for training. In this paper, we present a framework for semantic concept detection using a hybrid model of SVM and CNN. Global features like color moments, HSV histogram, wavelet transform, grey-level co-occurrence matrix, and edge orientation histogram are selected as the low-level features extracted from the annotated ground-truth video dataset of TRECVID. In a second pipeline, deep features are extracted using a pretrained CNN. The dataset is partitioned into three segments to deal with the data-imbalance issue. Two classifiers are trained separately on all segments, and a fusion of scores is performed to detect the concepts in the test dataset. System performance is evaluated using Mean Average Precision for the multi-label dataset; the performance of the proposed hybrid SVM-CNN framework is comparable to existing approaches.
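The score-fusion step can be sketched as a simple late fusion. This is a toy version: the weighted-average rule, the weight, and the concept names are assumptions for illustration, not the paper's stated fusion method:

```python
def fuse_scores(svm_scores, cnn_scores, w_cnn=0.6):
    """Late fusion: weighted average of per-concept confidence scores from
    the SVM (hand-crafted features) and CNN (deep features) pipelines."""
    return {c: (1 - w_cnn) * svm_scores[c] + w_cnn * cnn_scores[c]
            for c in svm_scores}

svm = {"car": 0.70, "person": 0.20}
cnn = {"car": 0.90, "person": 0.40}
fused = fuse_scores(svm, cnn)
print(fused["car"])
```

Thresholding or ranking the fused scores per concept then yields the multi-label predictions that Mean Average Precision evaluates.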
Wavelet-Based Warping Technique for Mobile Devices (csandit)
The role of digital images is increasing rapidly in mobile devices. They are used in many applications including virtual tours, virtual reality, e-commerce, etc. Such applications synthesize realistic-looking novel views of the reference images on mobile devices using techniques like image-based rendering (IBR). However, with this increasing role of digital images comes the serious issue of processing large images, which requires considerable time. Hence, methods to compress these large images are very important. Wavelets are excellent data compression tools that can be used with IBR algorithms to generate novel views of compressed image data. This paper proposes a framework that uses a wavelet-based warping technique to render novel views of compressed images on mobile/handheld devices. The experiments are performed using the Android Development Tools (ADT), which show that the proposed framework gives better results for large images in terms of rendering time.
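A one-level Haar wavelet step shows the compression idea in miniature (a sketch only; the paper's actual wavelet, warping, and quantization choices are not specified here):

```python
def haar_step(signal):
    """One level of the Haar DWT: pairwise averages (approximation) and
    differences (detail). Dropping near-zero details compresses the data."""
    avg = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    det = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return avg, det

def haar_inverse(avg, det):
    """Exact inverse of haar_step."""
    out = []
    for a, d in zip(avg, det):
        out += [a + d, a - d]
    return out

avg, det = haar_step([10, 12, 14, 14])
assert haar_inverse(avg, det) == [10, 12, 14, 14]  # perfect reconstruction
print(avg, det)
```

Smooth image rows produce small detail coefficients, so most of them can be discarded or coarsely quantized; the warping stage can then operate on the compact wavelet representation rather than the full-size image.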
RADAR images are strongly preferred for the analysis of geospatial information about the earth's surface to assess environmental conditions. Radar images are captured by different remote sensors, and those images are combined to obtain complementary information. To collect radar images, SAR (Synthetic Aperture Radar) sensors are used; these are active sensors that can gather information during day and night, unaffected by weather conditions. We discuss DCT and DWT image-fusion methods, which yield a more informative fused image, and we compare performance parameters between the two methods to determine the superior technique.
Computer Vision Based 3D Reconstruction: A Review (IJECEIAES)
3D reconstruction is used in many fields, starting from the reconstruction of objects such as sites and cultural artifacts both on the ground and under the sea. These tasks benefit scientists by allowing them to study and preserve the environment as 3D data in the face of extinction. This paper explains the vision setups that are commonly used, such as a single camera, stereo camera, Kinect / structured light / time-of-flight camera, and fusion approaches. The prior works also explain how 3D reconstruction is performed in many fields using various algorithms.
A Detailed Analysis on Feature Extraction Techniques of Panoramic Image Stitc... (IJEACS)
Image stitching is a technique used to obtain a high-resolution panoramic image. In this technique, distinct images taken from different views and angles are combined to produce a panoramic image. In the fields of computer graphics, photography, and computer vision, image stitching techniques are considered current research areas. To obtain a stitched image, one must know the geometric relations among the multiple image coordinate systems [1]. First, image stitching is done based on feature key-point matches; the final image, with its seam, is then blended using an image-blending technique. Hence, in this paper we address multiple distinct techniques, including invariant features such as the Scale Invariant Feature Transform and Speeded-Up Robust Features, and corner techniques such as the Harris corner detection technique, that are useful in sorting out the issues related to stitching images.
This paper presents the maneuvering of the mouse pointer and performs various mouse operations such as left click, right click, double click, drag, etc. using a gesture recognition technique. Recognizing gestures is a complex task that involves many aspects such as motion modeling, motion analysis, pattern recognition, and machine learning.
Keeping all the essential factors in mind, a system has been created that recognizes the movement of fingers and the various patterns formed by them. Colored caps are used on the fingers to distinguish them from background colors such as skin color. By recognizing these gestures, various mouse events are performed. The application was created in the MATLAB environment on the Windows 7 operating system.
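Detecting the colored caps can be sketched as a simple channel-dominance threshold. This is a toy RGB rule with invented pixel values; the actual system's color model and thresholds are not stated in the abstract:

```python
def is_red_cap(pixel, margin=60):
    """Classify an (R, G, B) pixel as part of a red finger cap if the red
    channel dominates both others by a margin (skin tones fail this test,
    since their red, green and blue values are closer together)."""
    r, g, b = pixel
    return r - g > margin and r - b > margin

frame = [(210, 40, 30), (190, 140, 120), (200, 30, 200)]
print([is_red_cap(p) for p in frame])
```

Grouping the matching pixels into blobs and tracking the blob centroids over frames would then give the finger trajectories from which mouse gestures are recognized.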
Image Processing Compression and Reconstruction by Using New Approach Artific... (CSCJournals)
In this paper a neural-network-based image compression method is presented. Neural networks offer the potential for a novel solution to the problem of data compression through their ability to generate an internal data representation. The network, an application of the back-propagation network, accepts a large amount of image data, compresses it for storage or transmission, and subsequently restores it when desired. A new approach for reducing training time by reconstructing representative vectors has also been proposed. Performance of the network has been evaluated using some standard real-world images. It is shown that the developed architecture and training algorithm provide a high compression ratio and low distortion while maintaining the ability to generalize, and are very robust as well.
Immunizing Image Classifiers Against Localized Adversary Attacks (gerogepatton)
This paper addresses the vulnerability of deep learning models, particularly convolutional neural networks (CNNs), to adversarial attacks and presents a proactive training technique designed to counter them. We introduce a novel volumization algorithm, which transforms 2D images into 3D volumetric representations. When combined with 3D convolution and deep curriculum learning optimization (CLO), it significantly improves the immunity of models against localized universal attacks, by up to 40%. We evaluate our proposed approach using contemporary CNN architectures and the modified Canadian Institute for Advanced Research (CIFAR-10 and CIFAR-100) and ImageNet Large Scale Visual Recognition Challenge (ILSVRC12) datasets, showcasing accuracy improvements over previous techniques. The results indicate that the combination of volumetric input and curriculum learning holds significant promise for mitigating adversarial attacks without necessitating adversarial training.
Welcome to WIPAC Monthly, the magazine brought to you by the LinkedIn group Water Industry Process Automation & Control.
In this month's edition, along with the industry news, and to celebrate the 13 years since the group was created, we have articles including:
A case study of the use of Advanced Process Control at the wastewater treatment works at Lleida in Spain
A look back at an article on smart wastewater networks, to see how the industry has measured up in the interim around the adoption of Digital Transformation in the Water Industry.
Sachpazis: Terzaghi Bearing Capacity Estimation in simple terms with Calculati... (Dr. Costas Sachpazis)
Terzaghi's soil bearing capacity theory, developed by Karl Terzaghi, is a fundamental principle in geotechnical engineering used to determine the bearing capacity of shallow foundations. The theory provides a method to calculate the ultimate bearing capacity of soil: the maximum load per unit area that the soil can support without undergoing shear failure. The calculation HTML code is included.
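For reference, a sketch of the Terzaghi ultimate-bearing-capacity computation for a strip footing, q_ult = c*Nc + gamma*Df*Nq + 0.5*gamma*B*Ngamma. Nq and Nc follow Terzaghi's expressions; Ngamma here uses Meyerhof's closed-form approximation, since Terzaghi's own Ngamma is tabulated rather than closed-form (an assumption of this sketch):

```python
import math

def terzaghi_factors(phi_deg):
    """Bearing capacity factors for friction angle phi (degrees).
    Nq and Nc per Terzaghi; Ngamma via Meyerhof's approximation."""
    phi = math.radians(phi_deg)
    a = math.exp((0.75 * math.pi - phi / 2) * math.tan(phi))
    nq = a * a / (2 * math.cos(math.radians(45) + phi / 2) ** 2)
    nc = (nq - 1) / math.tan(phi)
    ngamma = (nq - 1) * math.tan(1.4 * phi)
    return nc, nq, ngamma

def q_ultimate(c, gamma, depth, width, phi_deg):
    """Ultimate bearing capacity of a strip footing (kPa for kPa/kN/m inputs):
    cohesion, surcharge, and self-weight terms summed."""
    nc, nq, ng = terzaghi_factors(phi_deg)
    return c * nc + gamma * depth * nq + 0.5 * gamma * width * ng

nc, nq, ng = terzaghi_factors(30)
print(round(nc, 1), round(nq, 1), round(ng, 1))
```

For phi = 30 degrees this reproduces the familiar tabulated values of roughly Nc = 37.2 and Nq = 22.5; an allowable bearing pressure is then obtained by dividing q_ult by a factor of safety.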
Student information management system project report ii.pdf (Kamal Acharya)
Our project covers student management. It mainly handles the various actions related to student details, making it easy to add, edit, and delete student records, and providing a less time-consuming process for viewing, adding, editing, and deleting students' marks.
Forklift Classes Overview (Intella Parts)
Discover the different forklift classes and their specific applications. Learn how to choose the right forklift for your needs to ensure safety, efficiency, and compliance in your operations.
For more technical information, visit our website https://intellaparts.com
Event Management System Vb Net Project Report.pdf (Kamal Acharya)
In the present era, the scope of information technology is growing very fast; we do not see any area untouched by this industry. Its scope has become wider and includes business and industry, household business, communication, education, entertainment, science, medicine, engineering, distance learning, weather forecasting, career searching, and so on.
My project, named "Event Management System", is software that stores and maintains all events coordinated in a college. It is also helpful for printing related reports. It records the events coordinated by faculty members with their name, event subject, date, and details in an efficient and effective way.
In my system, a user can record all events coordinated by a particular faculty member. Compared with the existing system, the proposed system adds further features such as security.
Courier management system project report.pdf (Kamal Acharya)
It is nowadays very important for people to send or receive articles like imported furniture, electronic items, gifts, business goods, and the like. People depend vastly on different transport systems, which mostly use manual ways of receiving and delivering the articles. There is no way to track the articles until they are received, and no way to let the customer know what happened in transit once some articles have been booked. In such a situation, we need a system that completely computerizes cargo activities, including time-to-time tracking of the articles sent. This need is fulfilled by the Courier Management System, online software for cargo management that enables staff to receive goods from a source, send them to the required destination, and track their status from time to time.
TECHNICAL TRAINING MANUAL GENERAL FAMILIARIZATION COURSE (DuvanRamosGarzon1)
AIRCRAFT GENERAL
The Single Aisle is the most advanced aircraft family in service today, with fly-by-wire flight controls.
The A318, A319, A320 and A321 are twin-engine subsonic medium-range aircraft.
The family offers a choice of engines.
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...Amil Baba Dawood bangali
Contact with Dawood Bhai Just call on +92322-6382012 and we'll help you. We'll solve all your problems within 12 to 24 hours and with 101% guarantee and with astrology systematic. If you want to take any personal or professional advice then also you can call us on +92322-6382012 , ONLINE LOVE PROBLEM & Other all types of Daily Life Problem's.Then CALL or WHATSAPP us on +92322-6382012 and Get all these problems solutions here by Amil Baba DAWOOD BANGALI
#vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore#blackmagicformarriage #aamilbaba #kalajadu #kalailam #taweez #wazifaexpert #jadumantar #vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore #blackmagicforlove #blackmagicformarriage #aamilbaba #kalajadu #kalailam #taweez #wazifaexpert #jadumantar #vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore #Amilbabainuk #amilbabainspain #amilbabaindubai #Amilbabainnorway #amilbabainkrachi #amilbabainlahore #amilbabaingujranwalan #amilbabainislamabad
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...
Imran2016
Human Action Recognition Using RGB-D Sensor and
Deep Convolutional Neural Networks
Javed Imran
Department of Computer Science and Engineering
IIT Roorkee
Roorkee, India
javed.csit@gmail.com
Praveen Kumar
Department of Computer Science and Engineering
Visvesvaraya National Institute of Technology
Nagpur, India
praveen.kverma@gmail.com
Abstract— In this paper, we propose an approach to recognize
human actions by the fusion of RGB and Depth data. Firstly,
Motion History Images (MHI) are generated from RGB videos
which represent the temporal information about the action. Then
the original depth data is converted to 3D point clouds and
rotated, and three Depth Motion Maps (DMM) are generated over the entire depth
sequence corresponding to the front, side and top projection
views. A 4 Channel Deep Convolutional Neural Network is
trained, where the first channel is for classifying MHIs and the
remaining three for the front, side and top view generated from
depth data respectively. The proposed method is evaluated on the
publicly available UTD-MHAD dataset, which contains both
RGB and depth videos. Experimental results show that
combining two modalities gives better recognition accuracy than
using each modality individually.
I. INTRODUCTION
Human action recognition is one of the most important topics
in computer vision. It has applications in the field of security
systems, human-computer interaction, video indexing and
querying, content based video analytics, web-video search and
retrieval, bio-mechanics, monitoring and intelligent
environments. Prior to the release of depth cameras, research
on action recognition was mainly focused on learning and
recognizing actions from image sequences captured by
traditional RGB video cameras [1]. But with the introduction
of low cost depth sensors like Microsoft Kinect and ASUS
Xtion, many researchers have focused on action recognition
using depth information [2-4]. The depth cameras have several
advantages as compared to RGB cameras. For example, the
outputs of depth cameras are insensitive to changes in
lighting conditions. Secondly, the 3D structure and shape
information provided by the depth maps makes it easier to deal
with problems like segmentation and detection.
Recently Deep Convolutional Neural Networks (CNN) have
given state-of-the-art performance in image recognition,
segmentation and classification tasks [5-6]. In this paper, we
have used a pretrained ImageNet model because no RGB-D
dataset exists that is large enough to train a deep CNN from
scratch. This approach bears similarity to other multi-stream
approaches [7-9]. As suggested in [8], we first rotate the 3D
point clouds constructed from original depth data to handle
view invariance. These rotated depth frames are used to
generate depth motion maps (DMMs) by accumulating motion
energy in three projected views [10]. Motion History Images
(MHIs) are generated from RGB videos where the intensity of
each pixel is a function of the recency of motion in a sequence
[11]. Four CNNs are trained separately corresponding to front
view, side view, top view and MHI, and their results are fused
to produce the final classification score.
The rest of this paper is organized as follows. In Section 2,
related works are presented. In Section 3, implementation
details are discussed which includes MHI and DMM
generation along with the proposed 4-stream CNN
architecture. Section 4 describes the experimental results.
Section 5 concludes the paper.
II. RELATED WORK
Motion History Image based action recognition has been
actively studied since Bobick and Davis proposed the concept
of Motion Energy Image (MEI) and Motion History Image
(MHI) to recognize many types of aerobics exercises [11].
Although the computation of the MHI is inexpensive, this
template-matching approach is susceptible to noise and to
variations in how different individuals perform the same
action. In [12], Meng et al. combined Motion
History Image (MHI) and Modified Motion History Image
(MMHI) and used SVM_2K as linear binary classifier. With
the release of the Kinect camera in 2010, researchers shifted
their focus to action recognition based on depth data. In [2], Li et
al. construct an action graph to describe the pattern of the
action. Their action graph consists of multiple nodes where
each node represents a set of salient postures that is
characterized by a bag of 3D points. But this sampling scheme
is view dependent. In [3], a view-invariant Histogram of 3D
Joint locations (HOJ3D) was computed from action depth
sequences. The histograms are reprojected using LDA and
clustered into posture visual words.
Discrete hidden Markov models were used to model the
temporal evolutions of these visual words. In [4], Depth
Motion Maps (DMM) are generated by projecting depth maps
onto three orthogonal planes. Histogram of Gradient (HOG)
features are then extracted from DMMs and classified using
linear SVM. In [10], normalized DMMs are generated by
absolute differencing between two consecutive depth maps
without thresholding, and an l2-regularized classifier is
2016 Intl. Conference on Advances in Computing, Communications and Informatics (ICACCI), Sept. 21-24, 2016, Jaipur, India
978-1-5090-2029-4/16/$31.00 @2016 IEEE 144
employed for action recognition. In general, all the above
mentioned methods are based on hand-crafted features which
are either time consuming or dataset dependent.
With the recent success of Convolutional Neural Networks
[5], deep neural architectures are widely used in the area of
image and video classification tasks [13, 7]. The availability of
pretrained ImageNet models [14] further enables researchers
to apply them in the domain of RGB-D action and
object recognition. In [15], Ji et al. have used 3-dimensional
(3D) CNN model for action recognition. This model extracts
features from both spatial and temporal dimensions by
performing 3D convolutions. The first layer in their
architecture was hardwired to encode prior knowledge on
features. Thus, it is not clear how this approach will perform
when applied to a new dataset. In [7], Simonyan and
Zisserman have proposed a Two-Stream CNN for action
recognition in videos. They have used the fact that a video can
be decomposed into spatial and temporal components. Each
stream is implemented using a deep ConvNet, softmax scores
of which are combined by late fusion. In [9], two separate
CNNs are trained for RGB and depth images. A jet colormap
is applied to depth images to convert them into three channel
image so as to make effective use of ImageNet pretrained
model. Concatenation of last layers of both CNNs into one
fully connected layer followed by a softmax classifier is used
for object recognition.
Our work bears the most similarity to [8], in which each depth
map is rotated in 3D point clouds before generating three
DMMs for front, side and top view. However, we have also
used RGB video as a second modality to generate MHIs. A
four-channel CNN is then trained, and the channel scores are
combined by late fusion to give the final classification score.
III. MHI, DMM AND 4-STREAM CNN
In this section, we describe the generation of Motion History
Image from RGB videos and Depth Motion Maps from depth
data. We also discuss the input preprocessing, proposed CNN
architecture, network training and class score fusion.
A. Motion History Image
The concept of Motion History Image was proposed by
Bobick and Davis [11] in 2001. They proposed the generation
of motion energy image (MEI), which captures the occurrence
of motion in a video sequence in one image. Next they
generated motion history image (MHI), which gives the
temporal information of the motion in the image plane. The
brightness of the pixels in this image is higher where the
motion has occurred more recently as compared to where the
motion has occurred earlier.
Together, the MEI and the MHI form a temporal template for a
video sample. The MHI H(x, y, t) can be computed from an
update function D(x, y, t) [16]:

H(x, y, t) = τ,                          if D(x, y, t) = 1
             max(0, H(x, y, t−1) − δ),   otherwise          (1)
Here, (x, y) represents the pixel location, t denotes the time, D
(x, y, t) shows the object presence (or motion) in the current
video image, the duration τ governs the temporal extent of the
movement, and δ is the decay parameter.
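The update rule in Eq. (1) can be sketched in a few lines of NumPy (a minimal illustration, not the authors' implementation; the frame-differencing threshold xi and the decay step used here are assumed values):

```python
import numpy as np

def update_mhi(mhi, prev_frame, frame, tau=255.0, delta=25.0, xi=30):
    """One step of the Bobick-Davis MHI update (Eq. 1).

    D(x, y, t) = 1 where the absolute frame difference exceeds the
    threshold xi; those pixels are set to tau, and all remaining
    pixels decay by delta, floored at zero.
    """
    diff = np.abs(frame.astype(np.float32) - prev_frame.astype(np.float32))
    motion = diff > xi                                  # D(x, y, t)
    return np.where(motion, tau, np.maximum(mhi - delta, 0.0))
```

Applying this update over a whole RGB sequence and keeping the final array yields one MHI per video, with the brightest pixels marking the most recent motion.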
B. Depth Motion Maps
Yang et al. [4] proposed the concept of Depth Motion Maps
(DMM) to capture the 3D structure and depth information.
The depth images in the entire depth video sequence are
projected onto three orthogonal planes. Then the absolute
difference between consecutive projected depth maps are
calculated and combined to form three 2D depth motion maps.
Before projection, the depth data is rotated in 3D point clouds
as discussed in [8]. This is done to handle the problem of view
invariance and also provides a method to generate more
training data. Finally, the DMM is generated as follows [10]:

DMM_v = Σ_{i=2}^{N} | map_v^i − map_v^{i−1} |          (2)

where map_v^i is the projected map of the ith frame under
projection view v ∈ {front, side, top} and N is the number of
frames in the sequence.
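Equation (2) can be sketched as follows (a simplified NumPy illustration, not the authors' code; for brevity the side and top views are collapsed to 1-D depth profiles, whereas the actual method bins depth values into full 2-D side and top projection maps):

```python
import numpy as np

def depth_motion_maps(depth_seq):
    """Accumulate absolute differences of consecutive projected depth
    maps over a whole sequence (Eq. 2), without thresholding as in [10].

    depth_seq: array of shape (N, H, W) holding N depth frames.
    """
    d = depth_seq.astype(np.float32)
    projections = {
        "front": d,             # (N, H, W): the depth image itself
        "side": d.max(axis=2),  # (N, H): simplified side profile
        "top": d.max(axis=1),   # (N, W): simplified top profile
    }
    # DMM_v = sum over i of |map_v^i - map_v^(i-1)|
    return {v: np.abs(np.diff(p, axis=0)).sum(axis=0)
            for v, p in projections.items()}
```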
C. Input preprocessing
The MHI and three DMMs (front view, side view and top
view) generated using the techniques discussed above are in
grayscale. So we colorize them into 3 channel RGB images so
as to fully utilize the power of CNN pre-trained on ImageNet.
For the MHIs, we apply five different colormaps: copper, hot,
pink, gray and bone available in Matlab R2015. The rotated
DMMs are colorized using the improved rainbow
pseudocoloring technique proposed in [9]. Figs. 1 and 2 show
the results after coloring. The resulting images are then resized
to 224×224 to make them compatible with the pre-trained
ImageNet model.
(a) Original MHI
(b) MHIs obtained after applying different colormaps
Fig. 1. Examples of original and colored MHIs for Swipe-right action
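The colorize-and-resize preprocessing can be sketched as below. This is a hedged illustration only: a simple linear "hot"-style ramp stands in for the Matlab colormaps and for the rainbow pseudocoloring of [9], and nearest-neighbour sampling stands in for whatever interpolation the authors used.

```python
import numpy as np

def colorize(gray, size=224):
    """Expand a grayscale image (values 0-255) into a 3-channel RGB
    image and resize it to size x size with nearest-neighbour sampling.

    The ramp below only illustrates the idea of turning one channel
    into three so that a CNN pre-trained on ImageNet can be reused.
    """
    g = gray.astype(np.float32) / 255.0
    rgb = np.stack([np.clip(3 * g, 0, 1),      # red ramps up first,
                    np.clip(3 * g - 1, 0, 1),  # then green,
                    np.clip(3 * g - 2, 0, 1)], # then blue
                   axis=-1)
    h, w = gray.shape
    ys = np.arange(size) * h // size           # nearest-neighbour rows
    xs = np.arange(size) * w // size           # nearest-neighbour cols
    return rgb[ys][:, xs]
```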
D. Proposed CNN Architecture
In [6], Simonyan & Zisserman have shown that very high
recognition accuracy can be achieved by using very deep
architecture and filters with very small (3×3) receptive fields.
(a) Original front, side and top view DMM
(b) DMMs obtained after applying pseudocoloring
Fig. 2. Examples of DMM for Wave action
In addition to this, very deep models generalize well to other
datasets. This motivated us to use their pre-trained VGG-16
model for training our network. VGG-16 consists of sixteen
layers, including thirteen convolutional layers and three fully
connected layers. In our proposed architecture, four such
networks are combined by fusing their softmax scores as
shown in Fig. 3. Two fusion schemes are analyzed: the first is
the average rule, and the second is the product rule.
E. Network Training & Class Score Fusion
Four CNNs are trained; one for the color coded MHIs and
remaining three for front, side and top view DMMs
respectively. A dropout layer with ratio 0.8 is added between
the last two fully connected layers to avoid overfitting. The
learning rate is set to 10^-4, the weight decay to 0.0005 and
the momentum to 0.9, with a batch size of 16. The entire
network is trained using the MatConvNet [17] toolbox on a
system with NVIDIA Quadro K4200 GPU. During testing, the
posterior probabilities generated by the softmax layer of four
CNNs are combined using average and product rule.
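The decision-level fusion described above can be sketched as follows (a minimal illustration; the per-stream posteriors would come from the softmax layers of the four trained CNNs):

```python
import numpy as np

def fuse_scores(stream_scores, rule="product"):
    """Decision-level fusion of per-stream softmax posteriors.

    stream_scores: list of 1-D arrays (one per CNN stream, e.g. MHI
    and front-, side- and top-view DMM), each of length num_classes.
    Returns the index of the predicted action class.
    """
    scores = np.stack(stream_scores)  # (num_streams, num_classes)
    if rule == "average":
        fused = scores.mean(axis=0)
    elif rule == "product":
        fused = scores.prod(axis=0)
    else:
        raise ValueError("rule must be 'average' or 'product'")
    return int(np.argmax(fused))
```

The product rule rewards classes on which all streams agree, which is consistent with it giving the best accuracy in Table I.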
IV. EXPERIMENTAL RESULTS
Our proposed framework is evaluated on the publicly available
UTD-MHAD dataset [18], which contains both RGB and depth
data captured using Kinect. It contains 27 actions as shown in
Fig. 4. The same experimental setting as in [18] is followed,
where the data from the subject numbers 1, 3, 5, 7 is used for
training, and the data for the subject numbers 2, 4, 6, 8 is used
for testing. The results are given in Table 1 and the individual
class accuracy is shown in Fig. 5.
Fig. 3. DMM and MHI based 4-stream deep CNN architecture for action recognition
Swipe-left Swipe-right Wave Clap Throw Arm-cross
Basketball-shoot Draw-X Draw-circle-CW Draw-circle-CCW Draw-triangle Bowling
Boxing Baseball-swing Tennis-swing Arm-curl Tennis-serve Push
Knock Catch Pickup-Throw Jog Walk Sit-to-stand
Stand-to-sit Lunge Squat
Fig. 4. Samples of UTD-MHAD dataset
Fig. 5. Class-specific accuracy for UTD-MHAD dataset (using product-rule-based decision-level fusion)
TABLE I. COMPARISON OF RECOGNITION ACCURACY ON UTD-MHAD DATASET

Method                                          Accuracy (%)
C. Chen et al. [18]                             79.1
Bulbul et al. [19]                              88.4
Ours: Depth (avg. of front, side & top DMM)     87.9
Ours: RGB (using only MHI)                      70.0
Ours: Depth + RGB (average rule)                88.8
Ours: Depth + RGB (product rule)                91.2
V. CONCLUSION
In this paper, we have presented a deep convolutional neural
network based framework to classify human actions from
RGB-D data. The experimental results on the UTD-MHAD
dataset demonstrate that fusing different modalities can give
better performance than using each modality individually.
Our approach also proves to be more robust and efficient than
traditional hand-crafted feature extraction techniques.
State-of-the-art results can be achieved even on a small dataset
by fine tuning a pre-trained model like VGG-16. In the future,
we will combine other modalities like skeleton stream and
handle confusion between similar classes by applying
Dempster-Shafer Belief theory.
REFERENCES
[1] Aggarwal, Jake K., and Michael S. Ryoo. "Human activity analysis: A
review." ACM Computing Surveys (CSUR) 43.3 (2011): 16.
[2] Li, Wanqing, Zhengyou Zhang, and Zicheng Liu. "Action recognition
based on a bag of 3d points." Computer Vision and Pattern Recognition
Workshops (CVPRW), 2010 IEEE Computer Society Conference on.
IEEE, 2010.
[3] Xia, Lu, Chia-Chih Chen, and J. K. Aggarwal. "View invariant human
action recognition using histograms of 3d joints." Computer Vision and
Pattern Recognition Workshops (CVPRW), 2012 IEEE Computer
Society Conference on. IEEE, 2012.
[4] Yang, Xiaodong, Chenyang Zhang, and YingLi Tian. "Recognizing
actions using depth motion maps-based histograms of oriented
gradients." Proceedings of the 20th ACM international conference on
Multimedia. ACM, 2012.
[5] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet
classification with deep convolutional neural networks." Advances in
neural information processing systems. 2012.
[6] Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional
networks for large-scale image recognition." arXiv preprint
arXiv:1409.1556 (2014).
[7] Simonyan, Karen, and Andrew Zisserman. "Two-stream convolutional
networks for action recognition in videos." Advances in Neural
Information Processing Systems. 2014.
[8] Wang, Pichao, et al. "ConvNets-Based Action Recognition from Depth
Maps through Virtual Cameras and Pseudocoloring." Proceedings of the
23rd Annual ACM Conference on Multimedia Conference. ACM, 2015.
[9] Eitel, Andreas, et al. "Multimodal deep learning for robust RGB-D
object recognition." Intelligent Robots and Systems (IROS), 2015
IEEE/RSJ International Conference on. IEEE, 2015.
[10] Chen, Chen, Kui Liu, and Nasser Kehtarnavaz. "Real-time human action
recognition based on depth motion maps." Journal of Real-Time Image
Processing (2013): 1-9.
[11] Bobick, Aaron F., and James W. Davis. "The recognition of human
movement using temporal templates." Pattern Analysis and Machine
Intelligence, IEEE Transactions on 23.3 (2001): 257-267.
[12] Meng, Hongying, et al. "Motion history histograms for human action
recognition." Embedded Computer Vision. Springer London, 2009. 139-
162.
[13] Krizhevsky, Alex, and Geoffrey E. Hinton. "Using very deep
autoencoders for content-based image retrieval." ESANN. 2011.
[14] http://www.vlfeat.org/matconvnet/pretrained/
[15] Ji, Shuiwang, et al. "3D convolutional neural networks for human action
recognition." Pattern Analysis and Machine Intelligence, IEEE
Transactions on 35.1 (2013): 221-231.
[16] Ahad, Md Atiqur Rahman, et al. "Motion history image: its variants and
applications." Machine Vision and Applications 23.2 (2012): 255-281.
[17] Vedaldi, Andrea, and Karel Lenc. "MatConvNet: Convolutional neural
networks for matlab." Proceedings of the 23rd Annual ACM Conference
on Multimedia Conference. ACM, 2015.
[18] Chen, Chen, Roozbeh Jafari, and Nasser Kehtarnavaz. "UTD-MHAD: a
multimodal dataset for human action recognition utilizing a depth
camera and a wearable inertial sensor." Image Processing (ICIP), 2015
IEEE International Conference on. IEEE, 2015.
[19] Bulbul, Mohammad Farhad, Yunsheng Jiang, and Jinwen Ma. "DMMs-
Based Multiple Features Fusion for Human Action
Recognition." International Journal of Multimedia Data Engineering
and Management (IJMDEM) 6.4 (2015): 23-39.