In the healthcare industry, patients are often exposed to harmful pathogens due to a lack of compliance with hand hygiene protocol. The vast majority of healthcare professionals do not abide by the hand hygiene standards established by the World Health Organization, facilitating the spread of nosocomial (hospital-acquired) infections. When representatives trained in proper handwashing procedures monitored medical professionals, compliance with proper protocol increased significantly. Given this correlation between observance and adherence, a Hygiene Monitoring System was developed to monitor handwashing through the application of machine learning. The embedded system captured, processed, and compared instances of handwashing against the proper procedure. An implementation of this system would encourage healthcare professionals to follow the official protocol set out by the World Health Organization and dramatically reduce the likelihood of healthcare-associated infections.
5 ijaems sept-2015-9-video feature extraction based on modified lle using ada... (INFOGAIN PUBLICATION)
Locally linear embedding (LLE) is an unsupervised learning algorithm which computes low-dimensional, neighborhood-preserving embeddings of high-dimensional data. LLE attempts to discover non-linear structure in high-dimensional data by exploiting the local symmetries of linear reconstructions. In this paper, video feature extraction is done using modified LLE along with an adaptive nearest neighbor approach to find the nearest neighbors and the connected components. The proposed feature extraction method is applied to a video. The video feature description gives a new tool for the analysis of video.
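The core of LLE, reconstructing each point as an affine combination of its nearest neighbors, can be sketched in plain NumPy. This is a generic illustration of step one of standard LLE, not the paper's modified variant; the data here is a random stand-in for per-frame feature vectors.

```python
import numpy as np

def lle_weights(X, k):
    """Step 1 of LLE: express each point as an affine combination of
    its k nearest neighbors, minimizing local reconstruction error."""
    n = X.shape[0]
    W = np.zeros((n, n))
    for i in range(n):
        dists = np.linalg.norm(X - X[i], axis=1)
        nbrs = np.argsort(dists)[1:k + 1]       # k nearest, excluding self
        Z = X[nbrs] - X[i]                      # center neighbors on X[i]
        G = Z @ Z.T                             # local Gram matrix
        G += 1e-3 * np.trace(G) * np.eye(k)     # regularize (G may be singular)
        w = np.linalg.solve(G, np.ones(k))
        W[i, nbrs] = w / w.sum()                # affine: weights sum to 1
    return W

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))                    # stand-in frame feature vectors
W = lle_weights(X, k=6)
```

The second LLE step then finds low-dimensional coordinates that preserve these same weights; the weight matrix above is what encodes the "local symmetries of linear reconstructions" the abstract refers to.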
IRJET- Study of SVM and CNN in Semantic Concept Detection (IRJET Journal)
1) The document discusses approaches for semantic concept detection in videos using techniques like support vector machines (SVM) and convolutional neural networks (CNN).
2) It proposes a concept detection system that uses SVM and CNN together, extracting features from key frames using Hue moments and classifying the features with SVM and CNN.
3) The outputs of SVM and CNN are fused to improve concept detection accuracy compared to using the classifiers individually. Fusing the two classifiers is intended to better identify the concepts in video frames.
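The fusion step in 3) is commonly implemented as a weighted average of the two classifiers' per-concept scores. The sketch below uses that common late-fusion rule with illustrative weights, thresholds, and scores; the paper's exact fusion rule may differ.

```python
import numpy as np

def fuse_scores(svm_probs, cnn_probs, w=0.5):
    """Late fusion: weighted average of per-concept probabilities
    from the SVM and the CNN classifiers."""
    return w * np.asarray(svm_probs) + (1 - w) * np.asarray(cnn_probs)

svm_p = [0.8, 0.2, 0.6]          # hypothetical SVM concept scores
cnn_p = [0.6, 0.4, 0.9]          # hypothetical CNN concept scores
fused = fuse_scores(svm_p, cnn_p)  # approx 0.7, 0.3, 0.75
detected = fused > 0.5           # concepts passing the decision threshold
```

A concept missed by one classifier but scored confidently by the other can still pass the threshold after fusion, which is the accuracy gain the abstract describes.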
The objective of a video communication system is to deliver as much video data as possible from the source to the destination through a communication channel using all of its available bandwidth. To achieve this objective, the source coding should compress the original video sequence as much as possible, and the compressed video data should be robust and resilient to channel errors. However, while achieving high coding efficiency, compression also makes the coded video bitstream vulnerable to transmission errors. Thus, the process of video data compression tends to work against the objectives of robustness and resilience to errors. In addition, the extra information that must be transmitted for 3-D video brings new challenges, and consumer applications will not gain popularity unless these 3-D video coding problems are addressed.
This document discusses video quality analysis for H.264 based on the human visual system. It proposes an improved video quality assessment method that adds color comparison to structural similarity measurement. The method separates similarity measurement into four comparisons: luminance, contrast, structure, and color. Experimental results on video sets with two distortion types show the proposed method's quality scores are more consistent with visual quality than classical methods. It also discusses the H.264 video coding standard and provides examples of encoding and decoding experimental results.
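The separated comparisons can be sketched as global (single-window) SSIM components on a whole frame. This is a simplified illustration: the standard windowed SSIM formulation and the paper's added color comparison are not reproduced, and the constants are the usual values for 8-bit data.

```python
import numpy as np

def ssim_components(x, y, C1=6.5025, C2=58.5225):
    """Global luminance/contrast/structure comparisons of the SSIM
    index, on float images in [0, 255]. C1=(0.01*255)^2, C2=(0.03*255)^2."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    C3 = C2 / 2
    l = (2 * mx * my + C1) / (mx**2 + my**2 + C1)        # luminance
    c = (2 * np.sqrt(vx * vy) + C2) / (vx + vy + C2)     # contrast
    s = (cov + C3) / (np.sqrt(vx * vy) + C3)             # structure
    return l, c, s

x = np.random.default_rng(1).uniform(0, 255, (32, 32))
l, c, s = ssim_components(x, x)
score = l * c * s   # identical images score ~1.0
```

A fourth, color comparison, as the paper proposes, could be computed the same way on chroma channels and multiplied into the product.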
This document discusses a structural similarity based approach for efficient multi-view video coding. It begins with an introduction to multi-view video coding and the structural similarity index metric. It then proposes using structural similarity to exploit structural information between different video views. The method uses structural similarity for rate distortion optimization in encoding. Experimental results show the left and right views of a video, their structural similarity image, the decoded 3D video, and the achieved minimum distortion level. The document aims to improve multi-view video quality by using structural similarity during the encoding process.
Optimized image processing and clustering to mitigate security threats in mob... (TELKOMNIKA JOURNAL)
Mobile ad hoc networks (MANETs) are extensively deployed because they provide capabilities that conventional networks cannot easily offer. Applications range from the defense sector and sensor nodes in hostile territory to devices coordinating congested traffic in general transportation, and to providing infrastructure during disaster recovery. Given the importance of MANET applications, security is a critical concern in ad hoc networks, and using image processing to secure MANETs is the focus of this research. This article therefore assesses the security threats and summarizes representative proposals in the ad hoc network context, reviewing the state of the art in security provision for mobile ad hoc wireless networking. Known threats are identified and existing solutions examined; the study also summarizes lessons learned, discusses general open issues, and identifies future directions. In addition, the forecast weighted clustering algorithm (FWCA) is employed for cluster-head selection and compared against the weighted clustering algorithm (WCA), since quality of service in cluster-based routing is highly significant for MANETs.
This document compares the performance of three lossless image compression techniques: Run Length Encoding (RLE), Delta encoding, and Huffman encoding. It tests these algorithms on binary, grayscale, and RGB images to evaluate compression ratio, storage savings percentage, and compression time. The results found that Delta encoding achieved the highest compression ratio and storage savings, while Huffman encoding had the fastest compression time. In general, the document evaluates and compares the performance of different lossless image compression algorithms.
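Two of the compared techniques can be sketched in a few lines. The row of pixels and the resulting compression ratio below are illustrative, not the paper's test images or measured results.

```python
def rle_encode(data):
    """Run-length encoding: collapse runs of equal values
    into (value, count) pairs."""
    out = []
    for b in data:
        if out and out[-1][0] == b:
            out[-1][1] += 1
        else:
            out.append([b, 1])
    return out

def delta_encode(data):
    """Delta encoding: keep the first value, then store successive
    differences (small for smooth data, so they compress well)."""
    return [data[0]] + [data[i] - data[i - 1] for i in range(1, len(data))]

row = [0] * 20 + [255] * 12 + [0] * 8        # one row of a binary image
pairs = rle_encode(row)                       # [[0, 20], [255, 12], [0, 8]]
ratio = len(row) / (2 * len(pairs))           # input values / encoded values
```

RLE shines on binary images with long runs, as here, while delta encoding helps most on grayscale images with smooth gradients, which is consistent with the comparison the document reports.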
Deep hypersphere embedding for real-time face recognition (TELKOMNIKA JOURNAL)
With the advancement of the human-computer interaction capabilities of robots, computer vision surveillance systems for security have had a large impact on the research industry by helping to digitalize certain security processes. Recognizing a face in computer vision involves identifying and classifying which faces belong to the same person by comparing face embedding vectors. In an organization with a large and diverse labelled dataset trained over a large number of epochs, incompatibilities between different versions of face embeddings often create training difficulties that lead to poor face recognition accuracy. In this paper, we design and implement a robotic vision security surveillance system incorporating a hybrid combination of MTCNN for face detection and FaceNet as the unified embedding for face recognition and clustering.
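The embedding-comparison step described above can be sketched as a distance test between L2-normalized vectors, in the style of FaceNet verification. The embeddings, perturbation, and threshold below are illustrative assumptions; a real threshold must be tuned per model and dataset.

```python
import numpy as np

def same_person(emb_a, emb_b, threshold=1.1):
    """Decide whether two face embeddings belong to the same person
    by Euclidean distance between their L2-normalized forms.
    The threshold here is a hypothetical, illustrative value."""
    emb_a = emb_a / np.linalg.norm(emb_a)
    emb_b = emb_b / np.linalg.norm(emb_b)
    return np.linalg.norm(emb_a - emb_b) < threshold

rng = np.random.default_rng(2)
e1 = rng.normal(size=128)               # stand-in for a 128-D face embedding
e2 = e1 + 0.05 * rng.normal(size=128)   # slightly perturbed: "same" face
e3 = rng.normal(size=128)               # unrelated: "different" face
```

The version-incompatibility problem the abstract mentions arises exactly here: embeddings produced by different model versions are not comparable under this distance, so stored vectors must be regenerated when the embedding model changes.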
International Journal of Computational Engineering Research (IJCER) (ijceronline)
The International Journal of Computational Engineering Research (IJCER) is an international, monthly, online journal published in English. The journal publishes original research work that contributes significantly to furthering scientific knowledge in engineering and technology.
This survey proposes a novel joint data-hiding and compression scheme (JDHC) for digital images using side-match vector quantization (SMVQ) and image inpainting. In the JDHC scheme, image compression and data hiding are combined into a single module. On the client side, data is hidden and compressed using a sub-codebook for the remaining blocks, i.e. all blocks except those in the leftmost column and topmost row of the image. The data hiding and compression scheme follows raster scanning order, block by block along each row. Vector quantization with SMVQ and image inpainting is applied to complex blocks to control distortion and errors. The receiver side operates in two stages. First, the received image is divided into a series of blocks, and the receiver recovers the hidden data and the original image according to the index values in the segmented blocks. Second, edge-based harmonic inpainting is used to restore the original image if any loss occurred.
This document describes a system for Tamil video retrieval based on categorization in the cloud. The system first categorizes Tamil videos into subcategories based on camera motion parameters. It then segments the videos into shots and extracts representative key frames from each shot based on edge and color features. These features are stored in a feature library in the cloud. When a Tamil query is submitted, the system retrieves similar videos from the cloud based on matching the query features to the stored features. The system is implemented using the Eucalyptus cloud computing platform for its flexibility and ability to handle large computational loads.
This document provides an overview of digital image processing. It discusses key concepts like image types (intensity, binary, indexed, RGB), image file formats (TIFF, JPEG), image resolutions, and the steps involved in digital image processing. The MATLAB Image Processing Toolbox is also mentioned as a tool for performing operations on images like visualization, analysis, and processing. Edge detection is highlighted as an important but difficult task in digital image processing.
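Edge detection, the task highlighted above, can be sketched with the classic Sobel gradient kernels. This is a generic NumPy illustration (a deliberately naive loop for clarity) rather than a use of the MATLAB toolbox's own edge-detection functions.

```python
import numpy as np

def sobel_edges(img):
    """Approximate gradient magnitude with 3x3 Sobel kernels, the
    classic first step of edge detection (valid region only)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)
    ky = kx.T
    H, W = img.shape
    out = np.zeros((H - 2, W - 2))
    for i in range(H - 2):
        for j in range(W - 2):
            patch = img[i:i + 3, j:j + 3]
            # Combine horizontal and vertical responses.
            out[i, j] = np.hypot((patch * kx).sum(), (patch * ky).sum())
    return out

img = np.zeros((8, 8))
img[:, 4:] = 1.0                 # a vertical step edge at column 4
edges = sobel_edges(img)         # strong response only near the step
```

The difficulty the document mentions shows up immediately in practice: real images add noise and gradual transitions, which is why thresholding and smoothing choices dominate edge-detector quality.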
M.Tech second progress presentation on video summarization (NEERAJ BAGHEL)
This document presents a second progress report on video summarization research. It provides an outline of topics covered, including an introduction to video summarization, a literature review summarizing 5 papers on the topic, identified research gaps, challenges, the problem statement of finding key frames based on extracted text, overview of relevant datasets and tools used, and conclusions. The literature review analyzes the objectives, methods, strengths and limitations of the summarized papers.
This document discusses image processing and summarizes several key techniques. It begins by defining image processing and describing how images are digitized and processed. It then summarizes three main categories of image processing: image enhancement, image restoration, and image compression. Specific techniques discussed include contrast stretching, density slicing, and edge enhancement. The document also discusses visual saliency models, motion saliency, and using conditional random fields for video object extraction.
Medical video compression has to be lossless to avoid the danger of diagnostic errors. This presentation outlines an approach to improve the compression ratio of medical video sequences using HEVC.
This document discusses a content-based video retrieval system based on dominant color and texture features. It begins with an introduction to content-based video retrieval and the challenges involved. It then describes representing video through segmentation into shots and frames. The proposed method extracts dominant color, texture, and color histogram features from frames. Texture is captured through gray-level co-occurrence matrix analysis. A combined feature vector is constructed and similarity measured through Euclidean distance. The system is aimed at efficient video retrieval through analyzing dominant color and texture information.
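The gray-level co-occurrence matrix (GLCM) analysis and Euclidean similarity mentioned above can be sketched as follows. The feature set (contrast, energy, homogeneity), the quantization to 8 levels, and the single pixel offset are common choices used here for illustration; the paper's exact feature vector may differ.

```python
import numpy as np

def glcm(img, levels=8, dx=1, dy=0):
    """Gray-level co-occurrence matrix for one pixel offset,
    normalized to a joint probability table."""
    q = (img * levels / (img.max() + 1)).astype(int)   # quantize gray levels
    M = np.zeros((levels, levels))
    H, W = q.shape
    for i in range(H - dy):
        for j in range(W - dx):
            M[q[i, j], q[i + dy, j + dx]] += 1
    return M / M.sum()

def texture_features(P):
    """Common GLCM statistics: contrast, energy, homogeneity."""
    i, j = np.indices(P.shape)
    contrast = (P * (i - j) ** 2).sum()
    energy = (P ** 2).sum()
    homogeneity = (P / (1 + np.abs(i - j))).sum()
    return np.array([contrast, energy, homogeneity])

# Frame similarity = Euclidean distance between feature vectors.
f1 = texture_features(glcm(np.eye(16) * 255))          # high-contrast texture
f2 = texture_features(glcm(np.ones((16, 16)) * 128))   # perfectly uniform frame
dist = np.linalg.norm(f1 - f2)
```

In the full system these texture statistics would be concatenated with the dominant-color and color-histogram features before the distance is computed.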
Image Authentication Using Digital Watermarking (ijceronline)
International Journal of Computational Engineering Research (IJCER) is dedicated to protecting personal information and will make every reasonable effort to handle collected information appropriately. All information collected, as well as related requests, will be handled as carefully and efficiently as possible in accordance with IJCER standards for integrity and objectivity.
IRJET- Automated Student’s Attendance Management using Convolutional Neural N... (IRJET Journal)
This document describes a proposed system to automate student attendance management using convolutional neural networks and face recognition. The system would take attendance automatically by detecting faces in the classroom and comparing them to a database of student faces. This would make the attendance process more efficient than current manual methods like calling roll numbers or paper sign-ins. The system would use a CNN algorithm and face detection/recognition techniques like PCA to detect and identify student faces during lectures and automatically update attendance records.
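The PCA step mentioned above (the eigenfaces approach) can be sketched with NumPy's SVD. The data sizes, random "faces", and nearest-code matching rule are illustrative assumptions, not the proposed system's actual pipeline.

```python
import numpy as np

def pca_basis(faces, k):
    """Eigenfaces: top-k principal components of flattened face images."""
    mean = faces.mean(axis=0)
    centered = faces - mean
    # Rows of Vt are the principal directions of the centered data.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return mean, Vt[:k]

def project(face, mean, basis):
    """Low-dimensional code of a face in the eigenface basis."""
    return basis @ (face - mean)

rng = np.random.default_rng(3)
faces = rng.normal(size=(20, 64))          # 20 flattened "face" vectors
mean, basis = pca_basis(faces, k=5)
codes = np.array([project(f, mean, basis) for f in faces])

# Identify a query by nearest stored code (hypothetical matching rule).
query = faces[7] + 0.01 * rng.normal(size=64)
best = np.argmin(np.linalg.norm(codes - project(query, mean, basis), axis=1))
```

Attendance marking then reduces to looking up which student the best-matching stored code belongs to and updating that record.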
An Overview Survey on Various Video Compressions and its Importance (INFOGAIN PUBLICATION)
With the rise of digital computing and visual data processing, the need for storage and transmission of video data became prevalent. Storing and transmitting uncompressed raw visual data is not good practice, because it requires a large storage space and great bandwidth. Video compression algorithms can compress this raw visual data or video into smaller files with little sacrifice in quality. This paper gives an overview and comparison of the standardization efforts on the video compression algorithms MPEG-1, MPEG-2, MPEG-4, and MPEG-7.
The document discusses digital image processing and representation. It defines digital image processing as using computer algorithms to process digital images. Some key applications mentioned include remote sensing, medical processing, and robotics. The fundamental components of an image processing system are described as image sensors, specialized hardware, computer, software, storage, display and networking. Key steps in digital image processing include image acquisition, enhancement, restoration, representation and description, recognition, and using a knowledge base. Specific techniques like color processing, wavelets, compression, and morphology are also outlined.
The document discusses several key topics related to digital images:
- Raster images are composed of pixels arranged in a grid, while vector images use mathematical descriptions of lines, curves and shapes. Raster images lose quality when scaled while vector images maintain quality.
- Resolution refers to the number of pixels per inch in a raster image, affecting quality of on-screen and printed display. Higher resolutions have more pixels and finer detail.
- Aspect ratio expresses the proportional relationship between an image's width and height, such as 16:9 for HDTV. Formats with unequal ratios require enlarging or adding borders for presentation.
- Common file formats include GIF for simple graphics, JPEG for photographs
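The "enlarging or adding borders" step for mismatched aspect ratios reduces to simple arithmetic, sketched below as letterboxing. The function name and the 4:3-to-16:9 example are illustrative.

```python
def fit_with_borders(src_w, src_h, dst_w, dst_h):
    """Letterbox: scale an image to fit a display with a different
    aspect ratio, then report the border (padding) left over."""
    scale = min(dst_w / src_w, dst_h / src_h)   # largest scale that still fits
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    return new_w, new_h, dst_w - new_w, dst_h - new_h

# A 4:3 frame (640x480) shown on a 16:9 display (1280x720):
result = fit_with_borders(640, 480, 1280, 720)   # (960, 720, 320, 0)
```

Here the frame scales to 960x720 and leaves 320 pixels of horizontal border, split between the two sides as the familiar black bars.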
The document discusses various topics relating to raster and vector images, including:
- Raster images are composed of pixels while vector images are composed of mathematical objects. Vector images can be scaled without quality loss.
- Common file formats for raster images include JPEG, TIFF, PNG, and GIF while common vector formats are EPS, AI, and PDF.
- Other topics covered include color models (RGB, CMYK), resolution, aspect ratio, and image editing software like Photoshop, Illustrator, and InDesign.
DISTRIBUTED SYSTEM FOR 3D REMOTE MONITORING USING KINECT DEPTH CAMERAS (cscpconf)
This article describes the design and development of a system for remote indoor 3D monitoring using an undetermined number of Microsoft® Kinect sensors. In the proposed client-server system, the Kinect cameras can be connected to different computers, addressing this way the hardware limitation of one sensor per USB controller. The reason behind this limitation is the high bandwidth needed by the sensor, which also becomes an issue for the distributed system's TCP/IP communications. Since traffic volume is too high, 3D data has to be compressed before it can be sent over the network. The solution consists in self-coding the Kinect data into RGB images and then using a standard multimedia codec to compress the color maps. Information from different sources is collected on a central client computer, where point clouds are transformed to reconstruct the scene in 3D. An algorithm is proposed to conveniently merge the skeletons detected locally by each Kinect, so that monitoring of people is robust to self- and inter-user occlusions. Final skeletons are labeled, and the trajectories of every joint can be saved for event reconstruction or further analysis.
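The "self-coding the Kinect data into RGB images" idea can be sketched as packing each 16-bit depth value into two 8-bit channels of an RGB frame. This is a simple variant of the concept for illustration; the paper's exact mapping may differ.

```python
import numpy as np

def depth_to_rgb(depth16):
    """Pack a 16-bit depth map into two 8-bit channels of an RGB
    image so a standard image/video codec can carry it."""
    rgb = np.zeros(depth16.shape + (3,), dtype=np.uint8)
    rgb[..., 0] = depth16 >> 8        # high byte
    rgb[..., 1] = depth16 & 0xFF      # low byte
    return rgb

def rgb_to_depth(rgb):
    """Inverse mapping: reassemble the 16-bit depth values."""
    return (rgb[..., 0].astype(np.uint16) << 8) | rgb[..., 1]

depth = np.array([[0, 1234], [40000, 65535]], dtype=np.uint16)
recovered = rgb_to_depth(depth_to_rgb(depth))   # round-trips exactly
```

Note the design tension: this naive packing round-trips exactly only under a lossless codec, since a lossy codec corrupts the low byte badly; practical self-coding schemes therefore choose mappings that degrade gracefully under compression.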
IRJET- ROI based Automated Meter Reading System using Python (IRJET Journal)
This document describes an automated meter reading system using image processing and Python. Key points:
- The system takes images of a meter panel containing multiple meters. It then extracts the meter reading from each meter using image processing techniques in Python like thresholding, contour detection and digit recognition.
- The extracted meter readings are uploaded to a server (ThingSpeak) for remote access. This avoids the need for a service provider to manually record readings.
- An algorithm was developed to detect the region of interest containing each meter display, identify the segments that make up each digit, and recognize the digits based on segment patterns.
- Tests on sample meter images successfully extracted readings from individual meters as well as a panel with
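The "recognize the digits based on segment patterns" step above amounts to a lookup from seven on/off segment states to a decimal digit. The segment ordering below (top, top-left, top-right, middle, bottom-left, bottom-right, bottom) is an assumed convention for illustration.

```python
# 1 = segment lit, in the order: top, top-left, top-right,
# middle, bottom-left, bottom-right, bottom.
SEGMENT_PATTERNS = {
    (1, 1, 1, 0, 1, 1, 1): 0,
    (0, 0, 1, 0, 0, 1, 0): 1,
    (1, 0, 1, 1, 1, 0, 1): 2,
    (1, 0, 1, 1, 0, 1, 1): 3,
    (0, 1, 1, 1, 0, 1, 0): 4,
    (1, 1, 0, 1, 0, 1, 1): 5,
    (1, 1, 0, 1, 1, 1, 1): 6,
    (1, 0, 1, 0, 0, 1, 0): 7,
    (1, 1, 1, 1, 1, 1, 1): 8,
    (1, 1, 1, 1, 0, 1, 1): 9,
}

def read_digit(lit_segments):
    """Map a tuple of on/off segment states to a decimal digit."""
    return SEGMENT_PATTERNS[tuple(lit_segments)]

# Segment states extracted (hypothetically) from three digit ROIs:
reading = int("".join(str(read_digit(d)) for d in [
    (0, 0, 1, 0, 0, 1, 0),   # 1
    (1, 1, 1, 0, 1, 1, 1),   # 0
    (1, 0, 1, 1, 0, 1, 1),   # 3
]))                          # reading == 103
```

In the full system, the thresholding and contour-detection stages produce the per-segment on/off states fed into this lookup before the reading is uploaded to the server.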
Indoor 3D video monitoring using multiple Kinect depth cameras (ijma)
The document describes a system for remote indoor 3D video monitoring using multiple Kinect depth cameras. The system addresses hardware limitations of connecting multiple Kinect cameras to individual computers by implementing a client-server architecture that allows an unlimited number of Kinects to be connected across different computers. 3D data from the Kinects is compressed before being sent over the network to reconstruct the scene and merge skeleton detections in the central client. An optimal camera layout is also proposed to minimize infrared interference while ensuring overlapping coverage for robust skeleton tracking of moving subjects.
PERCEPTUALLY LOSSLESS COMPRESSION WITH ERROR CONCEALMENT FOR PERISCOPE AND SO... (sipij)
We present a video compression framework that has two key features. First, we aim at achieving perceptually lossless compression for low frame rate videos (6 fps). Four well-known video codecs in the literature have been evaluated, and the performance was assessed using four well-known performance metrics. Second, we investigated the impact of error concealment algorithms for handling corrupted pixels due to transmission errors in communication channels. Extensive experiments using actual videos have been performed to demonstrate the proposed framework.
International Journal of Engineering Research and Development (IJERD) (IJERD Editor)
Detection of medical instruments project - PART 1 (Sairam Adithya)
This presentation covers a computer vision project done by me and my colleague. The project classifies uploaded images of biomedical instruments into prominent categories such as ECG, EEG, X-ray machine, CT, and MRI. A website has been developed on which users can upload any image they do not recognize, and the model identifies the instrument along with a crisp paragraph explaining it.
The document describes a major project report on a cloud-based intrusion detection system using a backpropagation neural network based on particle swarm optimization. It discusses cloud computing concepts, characteristics, service models, and security threats. The proposed methodology uses particle swarm optimization to optimize training data sets for a backpropagation neural network intrusion detection system. Soft computing techniques like artificial neural networks, fuzzy logic, genetic algorithms, and particle swarm optimization are applied. The objectives are to design an intrusion detection system and evaluate its performance on test data sets.
SIGN LANGUAGE INTERFACE SYSTEM FOR HEARING IMPAIRED PEOPLE (IRJET Journal)
The document describes a proposed sign language interface system for hearing impaired people. The system aims to use machine learning algorithms like convolutional neural networks to classify hand gestures captured by a webcam into corresponding letters or words. The system would preprocess the images, extract features, then use a trained CNN model to predict the sign and output it as text and speech for better understanding by users. The goal is to help bridge communication between deaf/mute and normal people without requiring specialized gloves or sensors.
The document discusses several key topics related to digital images:
- Raster images are composed of pixels arranged in a grid, while vector images use mathematical descriptions of lines, curves and shapes. Raster images lose quality when scaled while vector images maintain quality.
- Resolution refers to the number of pixels per inch in a raster image, affecting quality of on-screen and printed display. Higher resolutions have more pixels and finer detail.
- Aspect ratio expresses the proportional relationship between an image's width and height, such as 16:9 for HDTV. Formats with unequal ratios require enlarging or adding borders for presentation.
- Common file formats include GIF for simple graphics, JPEG for photographs
The document discusses various topics relating to raster and vector images, including:
- Raster images are composed of pixels while vector images are composed of mathematical objects. Vector images can be scaled without quality loss.
- Common file formats for raster images include JPEG, TIFF, PNG, and GIF while common vector formats are EPS, AI, and PDF.
- Other topics covered include color models (RGB, CMYK), resolution, aspect ratio, and image editing software like Photoshop, Illustrator, and InDesign.
DISTRIBUTED SYSTEM FOR 3D REMOTE MONITORING USING KINECT DEPTH CAMERAScscpconf
This article describes the design and development ofa system for remote indoor 3D monitoring
using an undetermined number of Microsoft® Kinect sensors. In the proposed client-server
system, the Kinect cameras can be connected to different computers, addressing this way the
hardware limitation of one sensor per USB controller. The reason behind this limitation is the
high bandwidth needed by the sensor, which becomes also an issue for the distributed system
TCP/IP communications. Since traffic volume is too high, 3D data has to be compressed before
it can be sent over the network. The solution consists in self-coding the Kinect data into RGB
images and then using a standard multimedia codec to compress color maps. Information from
different sources is collected into a central client computer, where point clouds are transformed
to reconstruct the scene in 3D. An algorithm is proposed to conveniently merge the skeletons
detected locally by each Kinect, so that monitoring of people is robust to self and inter-user
occlusions. Final skeletons are labeled and trajectories of every joint can be saved for event
reconstruction or further analysis.
IRJET- ROI based Automated Meter Reading System using PythonIRJET Journal
This document describes an automated meter reading system using image processing and Python. Key points:
- The system takes images of a meter panel containing multiple meters. It then extracts the meter reading from each meter using image processing techniques in Python like thresholding, contour detection and digit recognition.
- The extracted meter readings are uploaded to a server (ThingSpeak) for remote access. This avoids the need for a service provider to manually record readings.
- An algorithm was developed to detect the region of interest containing each meter display, identify the segments that make up each digit, and recognize the digits based on segment patterns.
- Tests on sample meter images successfully extracted readings from individual meters as well as a panel with
Indoor 3 d video monitoring using multiple kinect depth camerasijma
The document describes a system for remote indoor 3D video monitoring using multiple Kinect depth cameras. The system addresses hardware limitations of connecting multiple Kinect cameras to individual computers by implementing a client-server architecture that allows an unlimited number of Kinects to be connected across different computers. 3D data from the Kinects is compressed before being sent over the network to reconstruct the scene and merge skeleton detections in the central client. An optimal camera layout is also proposed to minimize infrared interference while ensuring overlapping coverage for robust skeleton tracking of moving subjects.
PERCEPTUALLY LOSSLESS COMPRESSION WITH ERROR CONCEALMENT FOR PERISCOPE AND SO...sipij
We present a video compression framework that has two key features. First, we aim at achieving
perceptually lossless compression for low frame rate videos (6 fps). Four well-known video codecs in the
literature have been evaluated and the performance was assessed using four well-known performance
metrics. Second, we investigated the impact of error concealment algorithms for handling corrupted pixels
due to transmission errors in communication channels. Extensive experiments using actual videos have
been performed to demonstrate the proposed framework.
International Journal of Engineering Research and Development (IJERD)IJERD Editor
journal publishing, how to publish research paper, Call For research paper, international journal, publishing a paper, IJERD, journal of science and technology, how to get a research paper published, publishing a paper, publishing of journal, publishing of research paper, reserach and review articles, IJERD Journal, How to publish your research paper, publish research paper, open access engineering journal, Engineering journal, Mathemetics journal, Physics journal, Chemistry journal, Computer Engineering, Computer Science journal, how to submit your paper, peer reviw journal, indexed journal, reserach and review articles, engineering journal, www.ijerd.com, research journals,
yahoo journals, bing journals, International Journal of Engineering Research and Development, google journals, hard copy of journal
Detection of medical instruments project- PART 1Sairam Adithya
this presentation is about a project done by me and my colleague related to computer vision. This project is used to classify the uploaded images of biomedical instruments into prominent ones like ECG, EEG, x-ray machine, CT, MRI, and so on. A website has been developed on which the user can upload any image he is unknown of and the model will tell what instrument it is along with a paragraph explaining the instrument in a crisp manner
The document describes a major project report on a cloud-based intrusion detection system using a backpropagation neural network based on particle swarm optimization. It discusses cloud computing concepts, characteristics, service models, and security threats. The proposed methodology uses particle swarm optimization to optimize training data sets for a backpropagation neural network intrusion detection system. Soft computing techniques like artificial neural networks, fuzzy logic, genetic algorithms, and particle swarm optimization are applied. The objectives are to design an intrusion detection system and evaluate its performance on test data sets.
SIGN LANGUAGE INTERFACE SYSTEM FOR HEARING IMPAIRED PEOPLEIRJET Journal
The document describes a proposed sign language interface system for hearing impaired people. The system aims to use machine learning algorithms like convolutional neural networks to classify hand gestures captured by a webcam into corresponding letters or words. The system would preprocess the images, extract features, then use a trained CNN model to predict the sign and output it as text and speech for better understanding by users. The goal is to help bridge communication between deaf/mute and normal people without requiring specialized gloves or sensors.
Gesture Recognition System using Computer VisionIRJET Journal
This document presents a gesture recognition system using computer vision and convolutional neural networks. It discusses developing classifiers to recognize hand gestures and facial expressions. A dataset of 87,000 images is used to train models to classify 26 letters of the American Sign Language alphabet, as well as additional classes for space, delete and nothing. The models are trained using transfer learning with MobileNet, achieving validation accuracies of over 90% for hand gesture classification and implementing a system that recognizes and translates gestures in real-time. It concludes the paper developed robust models for American Sign Language translation and facial expression recognition using CNNs.
Implementation of embedded arm9 platform using qt and open cv for human upper...Krunal Patel
: In this Paper, A novel architecture for automotive vision using an embedded device will be
implemented on ARM9 Board with highly computing capabilities and low processing power. Currently,
achieving real-time image processing routines such as convolution, thresholding, edge detection and some of the
complex media applications is a challenging task in embedded Device, because of limited memory. An open
software framework, Linux OS is used in embedded devices to provide a good starting point for developing the
multitasking kernel, integrated with communication protocols, data management and graphical user interface for
reducing the total development time. To resolve the problems faced by the image processing applications in
embedded Device a new application environment was developed. This environment provides the resources
available in the operating system which runs on the hardware with complex image processing libraries. This
paper presents the capture of an image from the USB camera, applied to image processing algorithms to Detect
Human Upper Body. The application (GUI) Graphical User Interface was designed using Qt and ARM Linux
gcc Integrated Development Environment (IDE) for implementing image processing algorithm using Open
Source Computer Vision Library (OpenCV). This developed software integrated in mobiles by the cross
compilation of Qt and the OpenCV software for Linux Operating system. The result utilized by Viola and Jones
Algorithm with Haar Features of the image using OpenCV.
This document describes the implementation of a telemedical network application for a health smart home (HSH) system. The HSH system allows patients to be monitored remotely while remaining in their own homes. It involves setting up biomedical sensors in the home to monitor vital signs and transmitting the data via a wireless network to a local monitoring station. The data can then be accessed via a telemedicine interface that allows text, voice, and video communication between patients and doctors. The interface was developed in Visual Basic and uses sockets to enable real-time communication over TCP/IP networks, including transmission of video, audio, and file data through different windows.
IRJET- An Optimized Approach for Deaf and Dumb People using Air WritingIRJET Journal
This document presents a proposed approach for communication using air writing detection and keyword recognition to help deaf and dumb people communicate. The system works by capturing motion from a camera, detecting colors in the HSV color space, preparing the image for character recognition using thresholding, performing optical character recognition to detect characters, and then matching the detected characters to keywords to convey messages through audio or video output. Some benefits of the system include enabling more expressive communication than limited gestures and providing a cost-effective alternative to other sensors. Future work could make the system more compact and wearable for real-time communication.
An Approach Towards Lossless Compression Through Artificial Neural Network Te...IJERA Editor
This document describes research on using an artificial neural network technique for lossless image compression. It proposes using a feed-forward neural network with the Levenberg-Marquardt algorithm for image coding and decoding. Image blocks are encoded through the hidden layer and decoded through the output layer. Training is conducted to select optimal weight matrices for encoding and reconstructing images with high peak signal-to-noise ratio and low mean squared error between original and reconstructed images. The technique aims to achieve lossless compression by eliminating inter-pixel redundancies in images.
IRJET- Analysing Wound Area Measurement using Android AppIRJET Journal
This document describes an Android app that uses image processing techniques to measure wound areas from digital images. The app first pre-processes images to remove noise and enhance edges. It then uses Sobel edge detection, kernel algorithms, and fuzzy c-means clustering to segment the wound from the image. Pixels within the wound boundary are counted and scaled to calculate the actual wound area. The app was found to accurately measure wound areas in clinical tests to within 90% compared to traditional measurement methods. Future work could expand the technique to other medical imaging applications like fractures or retinal diseases.
Metaphorical Analysis of diseases in Tomato leaves using Deep Learning Algori...IRJET Journal
The document discusses using deep learning methods like Convolutional Neural Networks (CNN) and ResNet-50 to identify and detect diseases in tomato plant leaves. A pretrained ResNet-50 model is used as part of a CNN-based disease detection model developed in Keras. Images are classified using Tensorflow. The model is tested on a tomato leaf dataset and achieves successful identification of leaf diseases.
This document presents a distributed framework for analyzing multimodal data from multiple sensors. The framework uses a publish/subscribe architecture to synchronize data collection across sensor nodes. Data is streamed from sensor nodes to processing nodes for analysis. To validate the framework, researchers built a multimodal learning system that collected audio, video, and motion data from presentations to provide feedback. Fifty-four students tested the system, which received positive feedback regarding usability and learning experience. The distributed framework allows scalable and efficient multimodal data collection and analysis.
Background differencing algorithm for moving object detection using system ge...eSAT Publishing House
IJRET : International Journal of Research in Engineering and Technology is an international peer reviewed, online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together Scientists, Academician, Field Engineers, Scholars and Students of related fields of Engineering and Technology
Detection of medical instruments project- PART 2Sairam Adithya
this presentation is a continuation of the previous one. In this presentation, the work process for individual steps has been clearly explained with snippets of code taken from the source code. This is present along with output visualization, advantages and conclusion.
Linux-Based Data Acquisition and Processing On Palmtop ComputerIOSR Journals
This document describes a Linux-based data acquisition and processing system implemented on a palmtop computer. The system uses a PCMCIA data acquisition card and free Linux drivers and libraries to acquire signals from sensors. As a demonstration, a phonometer application was created that can sample 1024 signals at 100 ksamples/s and compute the fast Fourier transform of the signal up to 6 times per second. The document outlines the hardware and software design of the system, including using a custom Linux kernel, COMEDI libraries for device control, and TCL/Tk for the user interface. Experimental results showed the system could successfully implement the phonometer application for acoustic signal analysis on the palmtop computer.
Linux-Based Data Acquisition and Processing On Palmtop ComputerIOSR Journals
This document describes the development of a data acquisition and processing system using a palmtop computer running Linux. The system uses a PCMCIA data acquisition card and free Linux drivers and libraries. A demo application was created that can sample 1024 signals from a microphone at 100 ksamples/s and compute the fast Fourier transform of the signal up to 6 times per second. The document outlines the hardware and software implementation including developing the C code on a desktop, cross compiling it for the palmtop, and downloading and testing the executable on the palmtop computer. It provides details on using COMEDI libraries for data acquisition and TCL/Tk for the graphical user interface.
Clo architecture for video surveillance service based on p2 p and cloud compu...manish bhandare
This document proposes an architecture for video surveillance services based on peer-to-peer (P2P) and cloud computing technologies. The architecture aims to address issues with traditional centralized video surveillance systems, such as limited bandwidth, storage space, and scalability. It uses concepts from Hadoop such as data replication across multiple nodes. The proposed system has directory nodes that manage metadata and peer nodes that store video data. Each video is replicated across a primary peer node and two secondary peer nodes. This allows the system to be scalable, fault tolerant, efficient and reliable. The document describes the system components, operation flows for video recording and monitoring, and discusses implementation considerations.
This document discusses face detection on embedded systems. It begins by providing background on face detection applications and existing solutions. It then describes implementing a Viola-Jones face detection algorithm on a PC as a software prototype, achieving 80% accuracy. This implementation is then ported to an embedded system using a Nios II softcore processor. Profile analysis shows the bottleneck is searching locations. The document explores reducing the search space through downsampling images using bicubic interpolation, achieving a 4x speedup with no loss of accuracy on test images.
FACE COUNTING USING OPEN CV & PYTHON FOR ANALYZING UNUSUAL EVENTS IN CROWDSIRJET Journal
The document discusses face counting using OpenCV and Python by analyzing unusual events in crowds. It proposes using the Haar cascade algorithm for face detection and counting. Feature extraction is performed using gray-level co-occurrence matrix (GLCM) to extract texture and edge features. Discriminant analysis is then used to differentiate between samples accurately. The system aims to correctly detect and count faces in images using Python tools like OpenCV for digital image processing tasks and feature extraction algorithms like GLCM and discrete wavelet transform (DWT). It is intended to have good recognition accuracy compared to previous methods.
IRJET- A Vision based Hand Gesture Recognition System using Convolutional...IRJET Journal
This document describes a vision-based hand gesture recognition system using convolutional neural networks. The system captures images of hand gestures using a camera, pre-processes the images, and classifies the gestures using a CNN model. The CNN architecture includes convolutional layers, max pooling layers, dropout layers, and fully connected layers. The system was trained on a dataset of images representing 7 different hand gestures. Testing achieved over 90% accuracy in recognizing the gestures. This vision-based approach allows for natural human-computer interaction without physical devices.
An Stepped Forward Security System for Multimedia Content Material for Cloud ...IRJET Journal
The document discusses a proposed system for securing multimedia content on cloud infrastructures. The system uses a two-level approach: 1) generating signatures for 3D videos to robustly represent them with little storage, and 2) a distributed matching engine for scalably storing and matching signatures of original and query objects. The system was tested on over 11,000 3D videos and 1 million images, achieving high accuracy and scalability when deployed on Amazon cloud resources.
ViewShift: Hassle-free Dynamic Policy Enforcement for Every Data LakeWalaa Eldin Moustafa
Dynamic policy enforcement is becoming an increasingly important topic in today’s world where data privacy and compliance is a top priority for companies, individuals, and regulators alike. In these slides, we discuss how LinkedIn implements a powerful dynamic policy enforcement engine, called ViewShift, and integrates it within its data lake. We show the query engine architecture and how catalog implementations can automatically route table resolutions to compliance-enforcing SQL views. Such views have a set of very interesting properties: (1) They are auto-generated from declarative data annotations. (2) They respect user-level consent and preferences (3) They are context-aware, encoding a different set of transformations for different use cases (4) They are portable; while the SQL logic is only implemented in one SQL dialect, it is accessible in all engines.
#SQL #Views #Privacy #Compliance #DataLake
Orchestrating the Future: Navigating Today's Data Workflow Challenges with Ai...Kaxil Naik
Navigating today's data landscape isn't just about managing workflows; it's about strategically propelling your business forward. Apache Airflow has stood out as the benchmark in this arena, driving data orchestration forward since its early days. As we dive into the complexities of our current data-rich environment, where the sheer volume of information and its timely, accurate processing are crucial for AI and ML applications, the role of Airflow has never been more critical.
In my journey as the Senior Engineering Director and a pivotal member of Apache Airflow's Project Management Committee (PMC), I've witnessed Airflow transform data handling, making agility and insight the norm in an ever-evolving digital space. At Astronomer, our collaboration with leading AI & ML teams worldwide has not only tested but also proven Airflow's mettle in delivering data reliably and efficiently—data that now powers not just insights but core business functions.
This session is a deep dive into the essence of Airflow's success. We'll trace its evolution from a budding project to the backbone of data orchestration it is today, constantly adapting to meet the next wave of data challenges, including those brought on by Generative AI. It's this forward-thinking adaptability that keeps Airflow at the forefront of innovation, ready for whatever comes next.
The ever-growing demands of AI and ML applications have ushered in an era where sophisticated data management isn't a luxury—it's a necessity. Airflow's innate flexibility and scalability are what makes it indispensable in managing the intricate workflows of today, especially those involving Large Language Models (LLMs).
This talk isn't just a rundown of Airflow's features; it's about harnessing these capabilities to turn your data workflows into a strategic asset. Together, we'll explore how Airflow remains at the cutting edge of data orchestration, ensuring your organization is not just keeping pace but setting the pace in a data-driven future.
Session in https://budapestdata.hu/2024/04/kaxil-naik-astronomer-io/ | https://dataml24.sessionize.com/session/667627
The Ipsos - AI - Monitor 2024 Report.pdfSocial Samosa
According to Ipsos AI Monitor's 2024 report, 65% Indians said that products and services using AI have profoundly changed their daily life in the past 3-5 years.
A Hygiene Monitoring System
Brian Cherin Rohan Jinturkar
brncherin@gmail.com rjinturkar@gmail.com
Bhargavi Lanka Daniel McKeon Nivedhitha Sivakumar
bhargavilanka29@gmail.com dftmckeon@gmail.com blueflute19@gmail.com
Alex Weiner*
alexweiner@alexweiner.com
New Jersey’s Governor’s School of Engineering and Technology
July 27, 2018
*Corresponding Author
Abstract—In the healthcare industry, patients are often exposed to harmful pathogens due to a lack of compliance with hand hygiene protocol. The vast majority of healthcare professionals do not abide by hand hygiene standards established by the World Health Organization, facilitating the spread of nosocomial (hospital-acquired) infections. When representatives trained in proper handwashing procedures monitored medical professionals, there was a significant increase in compliance with proper protocol. Given this correlation between observance and adherence, a Hygiene Monitoring System was developed to monitor handwashing through the application of machine learning. The embedded system captured, processed, and compared instances of handwashing to the proper procedure. An implementation of this system would encourage healthcare professionals to follow the official protocol denoted by the World Health Organization and dramatically reduce the likelihood of healthcare-associated infections.
I. INTRODUCTION
Hands are the primary pathways of germ transmission, yet healthcare professionals often overlook proper handwashing protocol [1]. In one study, only 22% of hospital workers complied with protocol [2]. Under surveillance, compliance increased to 52.2%, demonstrating a trend between observation and adherence. This issue has global ramifications: healthcare-associated infections (HCAI) contribute to over 135,000 deaths in developed nations each year [3]. The Hygiene Monitoring System developed in this study monitors and identifies proper hand hygiene in hospitals to increase adherence to protocol.
II. BACKGROUND
A. Correct Handwashing Procedure
The World Health Organization (WHO) defines handwashing as proper when six motions are performed: the simple press; the roof; the webs; the links; the thumbs; and the mortar and pestle. Pictured in Figure 1, these motions should be performed for at least 40-60 seconds. The webs and thumbs motions involve the right hand acting on the left and vice versa; for instance, one must scrub both the left and right thumbs to abide by WHO guidelines [4].
Figure 1. Top left to right: simple press, roof, web; Bottom left to right: link, thumbs, mortar & pestle
The WHO also recommends that healthcare workers follow proper handwashing procedure at five key moments. These are defined as 1) before touching a patient, 2) before clean/aseptic procedures, 3) after body fluid exposure/risk, 4) after touching a patient, and 5) after touching patient surroundings. The concept that healthcare workers must consistently abide by WHO protocol is essential to the purpose of this system [5].
B. Software
C was the computer programming language chosen for this project. It was developed in 1972 by Dennis Ritchie at Bell Labs and is closely associated with the Unix operating system. It is commonly considered a middle-level programming language, a label that refers not to its difficulty or programming power but to its ability to access the system's more basic functions while still supplying higher-level constructs. Compared to higher-level programming languages such as Python, C is more efficient in its processing abilities, allowing for rapid analysis of adherence to protocol.
Linux is a family of free, open-source operating systems that can be installed on a variety of electronic devices. This system utilized Ubuntu Linux 12.04, which came pre-installed on the NVIDIA Jetson TX2. The operating system supports C programming as well as direct command-line input in the form of UNIX commands, also referred to as running commands in the terminal.
Bash is a command processor that allows the system to read commands directly from a file. Various Bash scripts were executed in order to install the libraries defined later in this paper.
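As a sketch of the kind of script involved (the project's actual install scripts are not reproduced here, and the package names are assumptions based on Ubuntu's repositories), such a Bash installer might look like:

```shell
#!/bin/bash
# Hypothetical install script for the image/video libraries used in this study.
# Package names assume Ubuntu's apt repositories, as on the Jetson TX2.
set -e                       # stop on the first failed command
sudo apt-get update          # refresh the package index
sudo apt-get install -y imagemagick ffmpeg
```

Running the file with `bash install.sh` (or marking it executable) performs the whole installation in one step.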
C. Hardware
The NVIDIA Jetson TX2 is an embedded system with machine learning capabilities. Equipped with USB ports and digital/analog pins, it is capable of connecting to a variety of sensors and peripheral devices. The NVIDIA Jetson TX2 runs on C++ but has C functionality and works with the Ubuntu operating system. The included Graphics Processing Unit (GPU) supports the machine learning algorithms used in this study. The system also supports a Camera Serial Interface (CSI), an embedded camera interface that captures high-definition images and videos [6].
A GPU, unlike a Central Processing Unit (CPU), has the ability to execute thousands of threads simultaneously. A GPU contains many more processing units (cores) than a CPU, enabling it to process data very quickly. This dramatically reduces computation time for running complex algorithms, increasing efficiency. The practice of using GPUs to perform such computations is referred to as general-purpose computing on graphics processing units [7].
D. Image Processing
This study involved conversions between multiple file types. They are defined below for reference:
• MP4, or MPEG-4 Part 14, is a modern multimedia file format for storing video and audio in a space- and quality-efficient format with widespread compatibility.
• AVI, or Audio Video Interleave, is an older multimedia file format for storing video and audio with less compatibility than more modern formats.
• JPEG, named for its development by the Joint Photographic Experts Group, is a file format for storing images in smaller sizes than other formats. It utilizes approximations while compressing pictures and thus cannot support transparency.
• PNG, or Portable Network Graphics, is a file format for storing images with greater quality than most image formats as a result of utilizing lossless compression. Unlike JPEG, it supports transparency but trades a larger file size for that greater quality.
The red, green, and blue (RGB) color model is a format for
producing a broad array of colors by adding varying amounts
of red, green, and blue light, with each channel ranging in
value from 0 to 255. By describing colors in this format,
electronic systems can differentiate between shades. In this
project, RGB pixel values were extracted from images to
facilitate the identification of hand motions within each image.
The Gaussian Blur uses a Gaussian distribution to blur an
image. This method of blurring the image reduces image noise
and increases the accuracy of the Support Vector Machines
used in the system. In this system, two constants necessary
to execute the Gaussian blur, the radius and the sigma value,
were assigned to be 10 and 5, respectively.
ImageMagick is a photo and video editing library com-
patible with UNIX terminal commands. Using it eliminated
the need to implement complex image-manipulation algorithms
from scratch. Through
the application of a Gaussian Blur, alteration of light contrast,
and cropping of images, this library aided in maintaining the
visual consistency of the images throughout this study [8].
FFmpeg is a library used to edit and manipulate information
stored in various video formats. FFmpeg commands available
within the UNIX terminal could separate a recorded video into
individual frames without compromising quality.
Lossless image compression preserves the original image
data exactly, at the cost of additional storage space. This
technique was utilized in place of lossy compression, which
reduces quality in order to reduce the amount of storage space
an image uses. The lossless compression provided by the PNG
file format was chosen because image quality was preferred
over storage savings while training a machine learning model.
E. Relevant Machine Learning
Machine learning is a subset of artificial intelligence that
provides systems with the ability to learn and refine models
through experience, as opposed to relying on explicit coding.
This field intends to help systems learn automatically to
make better decisions in the future based on given data and
examples.
The K-Means segmentation algorithm, one processing tech-
nique common in machine learning, can be used to isolate the
hands from the background of the images. K-Means segments
an image into k groups based on the similarity of pixel color
values. If an image contains someone's hands in front of a
white background (e.g., a sink), K-Means with a clustering of
k=2 will classify the pixels into either the group of background
pixels or the group of hand pixels. Once the two regions have
been identified, the relevant pixels can be preserved, while the
unwanted background pixels can be removed before the image
data is sent to the Support Vector Machine [9].
Support Vector Machines (SVM) are learning algorithms
that analyze data for regression and classification purposes.
An SVM attempts to draw a line between two data sets, as
depicted in Figure 2, in order to classify those above the line
into one set, and those below the line into another. SVMlight
is an implementation of a Support Vector Machine in the
programming language C that provides various algorithms.
SVMlight's pattern recognition function trained the system to
classify new datasets of hand motions against a standard and
check for a match.
Figure 2. Support Vector Machine Separation
SVMlight uses kernel functions in order to classify data
in different forms. Kernels are methods of organizing data
based on an equation. SVMlight offers the option to create a
custom kernel in addition to four predefined types of kernels:
Linear, Polynomial, Radial Basis Function (RBF), and Sigmoid
Tanh. This system employed the linear kernel, K(X, Y) = XᵀY,
which attempted to match images to known handwashing
motions based on a linear equation [10].
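In practice, the linear kernel is simply the dot product of two feature vectors. The following sketch is illustrative (the function name and vector representation are assumptions, not SVMlight's internal API):

```c
#include <stddef.h>

/* Linear kernel K(X, Y) = X^T Y: the dot product of two feature vectors.
 * Larger values indicate that the two examples are more closely aligned
 * in feature space. */
double linear_kernel(const double *x, const double *y, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += x[i] * y[i];
    return sum;
}
```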
A trained model is generated by giving a machine learning
algorithm examples of categorized data in order to “teach” it
to categorize new information, which it does through kernels.
This trained model is what is used to match new images to
This trained model is what is used to match new images to
already known handwashing images. In SVMlight, a model
is trained with raw data provided in specialized text files.
These data files are in a specific format with attributes called
features and values, which are paired together. When new data
is collected, it can be compared against a trained model to
determine if the data matches the data represented within the
model.
III. EXPERIMENTAL PROCEDURE
The experimental process comprised three primary stages:
data collection, image processing, and machine learning.
A. Preface
The final implementation of the Hygiene Monitoring System
would consist of activation, recording, image processing, and
handwashing procedure verification. However, before imple-
menting the system, it was necessary to develop a trained
model by manually collecting data in a video format, which
was then split into images stored in the PNG format. This
training data was prepared by applying a light contrast and
a Gaussian Blur and by resizing each image. The data was
then iterated through
with a C script to convert the images into raw data to feed
into SVMlight. When a recording is started, data would
be automatically collected, processed, and analyzed by the
system.
The terminal command nvgstcapture, pre-installed on the
Jetson TX2, begins the camera recording. This study utilized
the embedded CSI camera system with a resolution of 1920
pixels by 1080 pixels and a frame rate of 60 frames per second.
The above command produced JPEG files at an interval of
one picture per second for 40 seconds, for a total of 40
pictures. This differs from the collection of training data as
the tremendous amount of data stored in a video is only
necessary for training the model. Using ImageMagick, these
images designed for testing were cropped and modified with
a slight contrast and Gaussian Blur. Subsequently, a K-Means
algorithm was applied to each image to remove extraneous
information by setting the values of all background pixels
to black. At the end of this process, the images contained
a slightly blurred pair of hands with a black background. The
pixel values of these images were then run through a script in
C (Appendix Reference to Code) to format the image data to
make them readable by the SVM. This formatting was done so
that the SVM could classify the testing data more accurately,
because the images would be as similar as possible to the data
the SVM used to learn. The formatted data was input into
SVMlight to test the model.
The model was eventually tested by evaluating the accuracy
of the system in identifying a single motion. After performing
the first test, additional test data was created to test the ability
of the system to identify all of the motions correctly when
provided with an entirely proper test dataset. These various
tests were performed in order to gauge the efficacy of the
trained model in various expected circumstances.
B. Segmenting Training Data into Images
The collection of data was necessary in order to train the
model. Video was recorded of the subject performing the
six handwashing motions, centered in a well-lit environment
with a white background. The demonstration was performed
within two feet of the lens. The videos were recorded and
saved in an AVI video format. The data consisted of the
handwashing motions each recorded for approximately 55
seconds in the order of R over L web, L over R web, Link,
R Thumb, L Thumb, Simple Press, Roof, and Mortar Pestle,
where “R” represents the right hand, and “L” stands for the
left hand.
Afterward, the video was edited to remove transitions
between motions and rendered into an AVI format before
being split into individual images in a PNG format. Through
this process, the eight minutes and six seconds of video were
separated into 28,806 individual images using the FFmpeg
library in the UNIX terminal. The command
ffmpeg -i 5motions_scoop.avi ./Frames/motions%05d.png
was run, where 5motions_scoop.avi was the rendered video
of purely the hand motions and ./Frames/motions%05d.png
saved each individual frame of the video into a directory
named Frames.
C. Image Processing
1) Preprocessing of Training Images
ImageMagick was used to alter the images before using
them as training data. The command
mogrify -shave 300x75 -contrast -brightness-contrast
5 -gaussian-blur 10x5 "filename"
allowed for the cropping of 300 pixels off the sides
and 75 pixels off the top and bottom, the addition of minor
contrast, and the application of a Gaussian Blur. The cropping
removed any undesired details while the contrast intensified
the difference between the lighter and darker elements of the
image [10]. This command was added into a bash script that
iterated through each of the 28,806 PNG files in a specified
directory to crop and apply a contrast to each image.
A script was then written in C to first apply a K-Means
algorithm and then to transition each of the PNG files into
data files in a format readable by the SVMlight
library. The
K-Means algorithm categorized each pixel of the image into
either a background class or a hands class. Each pixel assigned
to the background class was then effectively removed from the
image by being set to black, while the pixels assigned to the
hands class were maintained to be sent to the SVM training
process.
The reformatting algorithm proceeded to break down each
image into a text file consisting of many rows representing the
pixels of each image. Each row consisted of attributes called
features and values in the format of feature:value, where the
feature was the index of every fifth pixel and the value was
a representation of the color of that pixel. An interval of five
pixels was chosen to significantly reduce the file size required
to store the data. The color value used to represent each feature
(pixel) was the sum of the red value multiplied by 100, the
green value multiplied by 10, and the blue value multiplied
by 1. In this way, three values of color (red, green, blue) were
consolidated into one large value to match to every fifth pixel.
Together, these rows of features and corresponding values were
combined into text files that would be interpretable by the
Support Vector Machines.
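The encoding described above can be sketched as follows; the function names are illustrative, and the exact file layout used in the study appears in the appendix code:

```c
#include <stdio.h>

/* Consolidate a pixel's three color channels into one value, as described
 * above: value = 100 * red + 10 * green + 1 * blue. */
int encode_color(int r, int g, int b) {
    return 100 * r + 10 * g + b;
}

/* Write one image as a single SVMlight-style row: a class label (+1 here,
 * marking a correct motion) followed by feature:value pairs for every
 * fifth pixel. The R, G, and B channels are passed as parallel arrays;
 * feature indices start at 1 and increase, as SVMlight requires. */
void write_svm_row(FILE *out, const int *r, const int *g, const int *b,
                   int n_pixels) {
    fprintf(out, "+1");
    for (int i = 0; i < n_pixels; i += 5)
        fprintf(out, " %d:%d", i / 5 + 1, encode_color(r[i], g[i], b[i]));
    fprintf(out, "\n");
}
```

One consequence of this packing is that distinct colors can share an encoding (for example, (1, 0, 0) and (0, 10, 0) both map to 100), a trade-off against the smaller file sizes it enables.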
2) Testing Images’ Processing Pipeline
In order to accurately test data against the trained Support
Vector Machines, the images supplied to the machine learning
algorithms needed to appear in a similar format to those
used to train the model. Thus, the image adjustments and
background removal from the preprocessing steps were also
used in the image processing pipeline.
The images collected from the Jetson TX2’s camera were
also resized to match the training images. Each image was
then provided with a level of contrast, brightness, and a
Gaussian Blur after being resized using the ImageMagick
command
mogrify -resize 1320x930! -contrast -brightness-contrast
5 -gaussian-blur 10x5.
These images were then converted into the PNG format
with the ImageMagick command
magick convert filename.jpg filename.png
where filename.jpg is a JPEG file captured from the
CSI camera on the NVIDIA Jetson TX2 and filename.png
represents the output of the new file.
Next, to remove any bias that would be introduced by irrel-
evant pixel data, the background of each image was removed
to isolate the hands in the image. This was an essential step,
as isolating the hands generated a clearly-defined shape and
outline for each hand motion. This information, in the form
of the absence of the surrounding pixels, would be a signif-
icant factor in the machine learning process. The K-Means
clustering algorithm was used to remove the backgrounds.
This algorithm segments an image into k groups based on the
similarity of pixel color values. The algorithm treats the data
in this system as a three-dimensional system, with dimensions
of red, green, and blue color values. This allowed for the
distance between the color values of two given pixels to be
numerically calculated, thus providing a direct measure of the
pixels' similarity.
The general K-means algorithm proceeds as follows:
1) Select k initial centroids (the center of each group)
randomly within the bounds of the set of data.
2) For each data point, find the closest centroid to that
point by calculating distance values between each pixel
and the centroid pixels. Assign the point to the group
belonging to that centroid.
3) For each group, take the RGB values of each pixel and
add them separately, so that there is a total red value, a
total green value, and a total blue value for all the pixels
in that group.
4) Divide each value by the total number of pixels in that
group in order to get the average red, green, and blue
values.
5) Set the centroid pixel’s color value of that group equal
to those average RGB values.
6) Repeat steps 2, 3, 4, and 5, until the location of each
centroid does not change.
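The steps above can be sketched in C as follows; this is an illustrative implementation for k = 2 (the data types and function name are assumptions, not the study's exact code):

```c
typedef struct { double r, g, b; } Pixel;

/* Squared distance between two colors, treating (R, G, B) as a point in
 * three dimensions (the distance calculation of step 2). */
static double dist2(Pixel a, Pixel b) {
    double dr = a.r - b.r, dg = a.g - b.g, db = a.b - b.b;
    return dr * dr + dg * dg + db * db;
}

/* K-Means with k = 2 over n pixels. 'centroids' holds the two starting
 * centers (for this system, the average color of the top-left corner and
 * of the image center) and is updated in place; 'labels' receives each
 * pixel's final group, 0 or 1. The loop repeats the assign/average steps
 * until neither centroid changes. */
void kmeans2(const Pixel *px, int n, Pixel centroids[2], int *labels) {
    for (;;) {
        double sum[2][3] = {{0.0}};
        int count[2] = {0, 0};
        /* Step 2: assign every pixel to its nearest centroid. */
        for (int i = 0; i < n; i++) {
            int g = dist2(px[i], centroids[0]) <= dist2(px[i], centroids[1])
                        ? 0 : 1;
            labels[i] = g;
            sum[g][0] += px[i].r;
            sum[g][1] += px[i].g;
            sum[g][2] += px[i].b;
            count[g]++;
        }
        /* Steps 3-5: recompute each centroid as its group's average color. */
        int moved = 0;
        for (int g = 0; g < 2; g++) {
            Pixel next = centroids[g];
            if (count[g] > 0) {
                next.r = sum[g][0] / count[g];
                next.g = sum[g][1] / count[g];
                next.b = sum[g][2] / count[g];
            }
            if (next.r != centroids[g].r || next.g != centroids[g].g ||
                next.b != centroids[g].b)
                moved = 1;
            centroids[g] = next;
        }
        /* Step 6: stop once the centroids are stationary. */
        if (!moved) break;
    }
}
```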
For the purposes of this system of SVMs, a value of k = 2
was established in order to segment the image into two parts:
the background and the hands. However, the initial results
from the K-Means algorithm appeared to be inaccurate and
inconsistent, resulting in unsuitable training data. This was
likely because of the random initialization of the centroids,
which may have resulted in color values that were too close
to each other to allow for effective separation of the image
into groups.
To account for this, rather than beginning with random
centroids, a targeted centroid was selected as the average color
4
5. of the top-left corner of the image (an area assumed to be part
of the background), and a second centroid from the average
color of the center of the image (an area assumed to be the
hand). At the end of the algorithm, the group resulting from the
centroid from the top-left would contain all of the background
pixels, and the group resulting from the centroid from the
center would contain all of the relevant hand pixels. Once the
groups were determined, the pixels in the background group
were changed to black pixels. This effectively removed the
background pixels from the machine learning training process,
isolating the hands from the background of each image.
The final product of these image processing techniques was
an image of hands that retained their most distinguishing
characteristics surrounded by a black background. Figure 3
depicts the results of each image processing technique.
Figure 3. Left to right: example hands image, Gaussian blur applied, K-Means
applied to isolate hands
At this point, the algorithm written in the C programming
language used in the preprocessing step was again used to
convert the images into the text files readable by the SVMs.
The data was then compared to the trained model created via
the SVMlight
library within the Ubuntu 12.04 command-line.
(Appendix Reference to Code)
D. Machine Learning
Machine learning was used to analyze processed images and
create a trained model representing the data. The trained model
later served as a reference to categorize given hand motions
as correct or incorrect.
1) Using Data to Train a Model
For the system to identify a certain hand motion as proper,
the input data must be formatted to have features identical to
those of the trained model. Once the image data was formatted
correctly,
the learn function of SVMlight
was used to create a model
(support vector machine) for each of the eight hand motion
variations. To do so, correct training examples of each hand
motion were fed into the machine learning algorithms of
SVMlight. The models were created while retaining the greatest
amount of detail using a lossless compression format with
PNG image files and splitting the original video into 60 frames
per second.
The data was then converted into a format easily interpreted
by the SVMlight
library and its algorithms by using a C script
(Appendix Reference to Code). Once converted into a suitable
format, the data was used to train the model with the command
svm_learn training_data model,
where training_data was a file containing rows of image
data to train on, as described previously, and model was the
output file name for the resultant trained model.
2) Comparing Collected Data to the Model
The final step in the implementation of the Hygiene
Monitoring System was to classify recorded data as correct or
incorrect. Classification occurred through the comparison of
collected data to the trained model, using the SVMlight
command
svm_classify collected_data model results
where collected_data was a file containing the recorded
data, which would be compared against the model file
containing the trained model, and results was the output file
of the comparison. The output file described how closely the
collected data adhered to the model by reporting the
percentage of the six
handwashing motions completed out of those represented
within the model to determine the person’s overall conformity
to the procedure. A value of 80-90% would demonstrate the
completion of a correct handwashing procedure. A value of
100% is not expected because several frames of a sample
test data set are expected to contain no hand motions, such
as the transition between two hand motions. However, if a
percentage significantly lower than 80-90% was returned, it
would be clear that incomplete handwashing was observed.
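The reported figure can be sketched as a simple ratio; the representation of detected motions below is illustrative, as the study derives it from the svm_classify output:

```c
/* Percentage of the handwashing procedure completed: the share of the
 * model's motions that were detected in at least one frame. 'detected[i]'
 * is nonzero if motion i was observed; 'n_motions' is the number of
 * motions represented in the trained model. */
double compliance_percent(const int *detected, int n_motions) {
    int seen = 0;
    for (int i = 0; i < n_motions; i++)
        if (detected[i]) seen++;
    return 100.0 * seen / n_motions;
}
```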
3) Testing the Model
Two tests were performed to evaluate the accuracy of
the Hygiene Monitoring System. The first test compared data
composed solely of the link motion against the trained
model. This returned a percentage of compliance to proper
handwashing procedure, depending on the motions detected.
Then, the percentage returned was analyzed to determine if
the link motion was detected.
In the second test, new testing data was recorded and
processed in an identical format to the training model data in
order to perform more intensive testing. Thus, the new images
were processed with a Gaussian Blur, K-Means partitioning,
resizing to 1320 pixels by 930 pixels, and an application
of light contrast. This new data included the simple press,
webs, mortar and pestle, thumbs, and roof motions, omitting
the link motion due to data collection limitations. The result
was indicated by the percentage derived from the correctly
recognized images out of the total number of images tested for
each specific motion. Subsequently, the data was also analyzed
to report the overall percentage of accuracy for all motions,
or the number of correctly classified images out of the total
number of tested images.
IV. RESULTS
The performance of the Hygiene Monitoring System was
assessed with test data collected in a simulated environment
with the intent of verifying the functionality of the system in
a controlled setting.
This evaluation tested the system’s accuracy in correctly
categorizing images as distinct hand motions. Multiple sets
5
6. of test data were created and compared against the trained
models.
In order to test the ability of the model to detect a single
motion at a time, the first test example contained only the link
hand motion being performed. When this data was classified
through the support vector machine’s model, it yielded results
showing that it fulfilled 14% of the expected handwashing
procedure. Since the model expects eight distinct motions (six
unique techniques, but two are repeated for each hand), the
observed percentage of 14% is extremely close to the expected
value of 14.28%.
In the next test, the system attempted to classify a data
set containing images of a complete handwashing procedure.
However, due to data collection limitations, the link motion
was excluded from this set. The other hand motions and
associated data were unaffected. The images in the test data
set included two to five frames for each motion. This data
was then compared to the trained model through SVMlight
to identify the percentage compliance to the handwashing
procedure by the test subject. The number of frames that were
properly identified is indicated below in Figure 4. 18 images
were correctly identified out of 24 tested, corresponding to an
aggregate of 75% accuracy in correctly identifying the hand
motions.
Figure 4. Results
V. CONCLUSIONS
The results produced indicate that the Hygiene Monitoring
System can correctly identify compliance to proper handwash-
ing procedure.
A. Significance
A successful implementation of the Hygiene Monitoring
System could dramatically reduce the rate of infection and
deaths related to healthcare negligence. On any given day,
about one in twenty-five hospital patients contract at least one
healthcare-associated infection [11]. Although handwashing
seems like an elementary process, multiple studies illustrate
the immense ramifications of a lack of regard for procedure.
This system encourages greater compliance, a major step
toward limiting the spread of disease and prioritizing patient
welfare.
B. Applications
In the future, the Hygiene Monitoring System can be im-
plemented in hospitals around the world. The system's ability
to monitor and identify correct handwashing practices will in-
crease transparency within the healthcare industry. Moreover,
the system's records could give insight into any liability issues
or legal matters, offering an objective viewpoint into the often
subjective and complex world of medical lawsuits.
The concept of the Hygiene Monitoring System also lends
itself to the food service industry. The spread of germs from
workers to food accounts for 89% of outbreaks involving
food contamination [12]. The implementation of the
Hygiene Monitoring System in restaurants and food vendors
could advance the state of sanitation in today's world and
mitigate the spread of foodborne illness.
Furthermore, the system developed in this study demon-
strates a computer’s ability to differentiate between human
hand motions. This would prove useful in the interpretation
of sign language hand positions, as a system trained with
enough images could develop models representing various
words and phrases. Such a development would significantly
improve accessibility for hearing-impaired individuals and
those attempting to communicate with them.
C. Future Improvements
Certain improvements can be made to refine the reliability
of this Hygiene Monitoring System. Most significantly, the
collection and usage of more data will improve the training
models, increasing the probability that real world data will be
correctly classified by the system.
The K-Means algorithm could be improved to suit a wider
range of image cases. The current algorithm operates under
the assumption that the hands are in the center of the image;
removing this assumption would allow the system to handle
footage in which the hands are not centered in the frame.
A stereographic camera (a camera that records from two
slightly different positions simultaneously in order to generate
a 3D representation of an image) would provide depth to the
data collected. By deriving SVM values from the physical
location of a hand rather than from the color values of pixels,
the system could be more reliable when testing data with hands
of varied skin tones, or with a background of a similar color
to the hands.
The Hygiene Monitoring System could be more user-
friendly. Currently, the system is activated by a keyboard
press, which is not appropriate for a real-life context such
as a hospital environment. In the future, the system will
ideally be activated by a motion sensor to detect when an
employee begins washing their hands. Further integration with
a hospital system would also be an ideal addition. Each
employee, before washing their hands, would check into the
system using a radio-frequency identification device (RFID)
tag, and once the employee washed their hands and the system
determined the level of compliance with the six hand motions,
the system would record the level of compliance in a database
of employee data.
The kernel currently being used in the SVM is linear, but
there are other, more complex kernels that the SVM could
use as well. The polynomial kernel separates classes with a
polynomial decision boundary of a chosen order. The RBF
kernel places normal curves around the data points and sums
them, allowing the decision boundary to take on more complex
geometric shapes. The Sigmoid Tanh kernel uses a logistic-style
function to define the decision boundary.
Working with any of these kernels in the SVM could increase
the system’s accuracy.
APPENDIX
ACKNOWLEDGMENT
The authors of this paper gratefully thank the following:
project mentor Alex Weiner for his extensive experience and
hands-on involvement; Residential Teaching Assistant Siddhi
Shah for her invaluable support and efforts in coordination
and communication; Research Coordinator Brian Lai and Head
Counselor Nicholas Ferraro for their guidance and feedback;
Program Director Ilene Rosen, Ed. D. and Associate Program
Director Jean Patrick Antoine, without whom this project
would not have been possible. Finally, the authors appreciate
the generosity of the sponsors of the New Jersey Governor's
School of Engineering and Technology: Rutgers University,
Rutgers School of Engineering, The State of New Jersey,
Lockheed Martin, Silver Line, Rubiks, other Corporate Spon-
sors, and NJ GSET Alumni.
REFERENCES
[1] “WHO Guidelines on Hand Hygiene in Health
Care”, Apps.who.int, 2018. [Online]. Available:
http://apps.who.int/iris/bitstream/handle/10665/44102/9789241597906
eng.pdf?sequence=1. [Accessed: 22- Jul- 2018].
[2] Sebille V, Chevret S, Valleron AJ. Modeling the spread of resistant
nosocomial pathogens in an intensive-care unit. Infection Control and
Hospital Epidemiology, 1997, 18:84-92.
[3] Klevens R et al. Estimating health care-associated infections and deaths
in U.S. hospitals, 2002. Public Health Report, 2007, 122:160-166.
[4] “How to Handwash?”, Who.int, 2018. [Online]. Available:
http://www.who.int/gpsc/tools/HAND WASHING.pdf. [Accessed:
22- Jul- 2018].
[5] “Your 5 Moments for Hand Hygiene”, Who.int, 2018. [Online]. Avail-
able: http://www.who.int/gpsc/tools/5momentsHandHygiene A3.pdf.
[Accessed: 22- Jul- 2018].
[6] “Jetson TX2 Module”, NVIDIA Developer, 2018. [Online]. Available:
https://developer.nvidia.com/embedded/buy/jetson-tx2. [Accessed: 22-
Jul- 2018].
[7] “GPU Programming—NVIDIA,” NVIDIA Developer, 2018. [online].
Available: http://www.nvidia.in/object/gpu-programming-in.html [Ac-
cessed 26 Jul. 2018].
[8] I. LLC, “Convert, Edit, Or Compose Bitmap Images @
ImageMagick”, Imagemagick.org, 2018. [Online]. Available:
https://www.imagemagick.org/script/index.php#features. [Accessed:
22- Jul- 2018].
[9] Ng, A. and Piech, C. “K Means”, 2013. [Online]. Available:
http://stanford.edu/~cpiech/cs221/handouts/kmeans.html. [Accessed: 26
Jul. 2018].
[10] “SVM-Light Support Vector Machine”, Svmlight.joachims.org, 2018.
[Online]. Available: http://svmlight.joachims.org/. [Accessed: 22- Jul-
2018].