Multimodal Operation for Visually Impaired People Using Deep Learning
Arockia Jaya J1, Senthilkumar B2, Mahalakshmi K3
1Department of Computer Science and Engineering, Idhaya Engineering College for Women, Chinnasalem
2Department of Electronics and Communication Engineering, Kalaignarkarunanidhi Institute of Technology,
Coimbatore
3Department of Computer Science and Engineering, Kalaignarkarunanidhi Institute of Technology, Coimbatore
ABSTRACT In this paper, we propose a technique called the multi-view object tracking (MVOT) system for settings in which multiple cameras monitor an area from different angles. Videos recorded by the cameras contain complementary information, and fusing the knowledge embedded in the videos facilitates the development of a robust and accurate system. For a set of cameras that have different settings, we propose a correspondence YOLO v3 algorithm that maps each segmented group of objects in one view to the corresponding group in another view. We call these corresponding groups matched blob clusters, each of which enables knowledge to be shared between cameras. Building on this, we present a two-pass regression framework for multi-view objects.
INDEX TERMS MVOT (multi-view object tracking), two-pass regression framework
1 INTRODUCTION
The term digital image processing refers to the processing of a two-dimensional picture by a digital computer. In a broader context, it implies digital processing of any two-dimensional data. A digital image is an array of real or complex numbers represented by a finite number of bits. An image given in the form of a transparency, slide, photograph or an X-ray is first digitized and stored as a matrix of binary digits in computer memory. This digitized image can then be processed and/or displayed on a high-resolution television monitor. For display, the image is stored in a rapid-access buffer memory, which refreshes the monitor at a rate of 25 frames per second to produce a visually continuous display.
1.1 The image processing system
Figure 1.1 Block diagram of an image processing system
1.1.1 Digitizer
A digitizer converts an image into a numerical representation suitable for
input into a digital computer. Some common digitizers are
1. Microdensitometer
2. Flying spot scanner
3. Image dissector
4. Vidicon camera
5. Photosensitive solid-state arrays.
Image Processor
An image processor performs the functions of image acquisition, storage, preprocessing, segmentation, representation, recognition and interpretation, and finally displays or records the resulting image. The following block diagram gives the fundamental sequence involved in an image processing system.
Figure 1.2 Block diagram of the fundamental sequence involved in an image processing system
As detailed in the diagram, the first step in the process is image acquisition by an imaging sensor in conjunction with a digitizer to digitize the image. The next step is the preprocessing step, where the image is improved before being fed as an input to the other processes. Preprocessing typically deals with enhancing, removing noise, isolating regions, etc. Segmentation partitions an image into its constituent parts or objects. The output of segmentation is usually raw pixel data, which consists of either the boundary of the region or the pixels in the region themselves. Representation is the process of transforming the raw pixel data into a form useful for subsequent processing by the computer. Description deals with extracting features that are basic in differentiating one class of objects from another. Recognition assigns a label to an object based on the information provided by its descriptors. Interpretation involves assigning meaning to an ensemble of recognized objects. The knowledge about a problem domain is incorporated into the knowledge base. The knowledge base guides the operation of each processing module and also controls the interaction between the modules. Not all modules need necessarily be present for a specific function. The composition of the image processing system depends on its application. The frame rate of the image processor is normally around 25 frames per second.
Digital computer
Mathematical processing of the digitized image, such as convolution, averaging, addition, subtraction, etc., is done by the computer.
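Operations such as averaging can be expressed as convolution of the image with a small kernel. A minimal NumPy sketch, where the 3x3 mean kernel and valid-mode borders are illustrative choices, not prescribed by the text:

```python
import numpy as np

def convolve2d(image, kernel):
    """Valid-mode 2D convolution of a grayscale image with a kernel."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    # Flip the kernel so this is true convolution, not cross-correlation.
    k = np.flipud(np.fliplr(kernel))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * k)
    return out

# A 3x3 averaging (mean) kernel smooths the image.
mean_kernel = np.ones((3, 3)) / 9.0
img = np.arange(25, dtype=float).reshape(5, 5)
smoothed = convolve2d(img, mean_kernel)
```

Each output pixel is simply the mean of the 3x3 neighbourhood around it; addition and subtraction of images are element-wise array operations.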
Mass storage
The secondary storage devices normally used are floppy disks, CD-ROMs, etc.
Hard copy device
The hard copy device is used to produce a permanent copy of the image and
for the storage of the software involved.
Operator console
The operator console consists of equipment and arrangements for verification
of intermediate results and for alterations in the software as and when required.
The operator is also capable of checking for any resulting errors and for the
entry of requisite data.
1.1.2 BACKGROUND
Image processing fundamental
Image processing is the use of computer algorithms to perform image
processing on digital images. As a subcategory or field of digital signal
processing, digital image processing has many advantages over analog
image processing. It allows a much wider range of algorithms to be applied
to the input data and can avoid problems such as the build-up of noise and
signal distortion during processing. Since images are defined over two
dimensions (perhaps more), digital image processing may be modeled in the
form of multidimensional systems.
Many of the techniques of digital image processing, or digital picture
processing as it was often called, were developed in the 1960s at the Jet
Propulsion Laboratory, Massachusetts Institute of Technology, Bell
Laboratories, University of Maryland, and a few other research facilities, with
application to satellite imagery, wire-photo standards-conversion, medical
imaging, videophone, character recognition, and photograph
enhancement. The cost of processing was fairly high, however, with the
computing equipment of that era. That changed in the 1970s, when digital
image processing proliferated as cheaper computers and dedicated hardware
became available. Images then could be processed in real time, for some
dedicated problems such as television standards conversion. As general-
purpose computers became faster, they started to take over the role of
dedicated hardware for all but the most specialized and computer-intensive
operations.
Digital image processing allows the use of much more complex
algorithms, and hence can offer both more sophisticated performance at
simple tasks and the implementation of methods which would be impossible
by analog means.
In particular, digital image processing is the only practical technology
for:
Classification
Feature extraction
Pattern recognition
Projection
Multi-scale signal analysis
Image processing is a method to convert an image into digital form and
perform some operations on it, in order to get an enhanced image or to extract
some useful information from it. It is a type of signal processing in which the
input is an image, such as a video frame or photograph, and the output may be an
image or characteristics associated with that image. Usually, an image processing
system treats images as two-dimensional signals while applying established
signal processing methods to them. It is among the rapidly growing
technologies today, with applications in various aspects of business.
Image processing forms a core research area within the engineering and computer
science disciplines as well.
Digital image processing refers to processing of the image in digital form.
Modern cameras may directly capture the image in digital form, but
generally images originate in optical form. They are captured by video
cameras and digitized. The digitization process includes sampling and
quantization. These images are then processed by at least one of the five
fundamental processes, not necessarily all of them.
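The quantization step mentioned above maps continuous intensities onto a finite number of levels. A minimal sketch, where the uniform-bin scheme and the 4-level example are illustrative assumptions:

```python
import numpy as np

def quantize(image, levels):
    """Uniformly quantize intensities in [0, 255] to the given number of levels."""
    step = 256 / levels
    # Map each pixel to the midpoint of its quantization bin.
    bins = np.floor(image / step)
    return (bins * step + step / 2).astype(np.uint8)

img = np.array([[0, 64, 128, 255]], dtype=np.uint8)
q = quantize(img, 4)  # only 4 distinct output intensities remain
```

Sampling, the other half of digitization, is the analogous reduction along the spatial axes (e.g. keeping every n-th pixel).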
Problems and challenges:
Current methodologies count the number of objects in images captured by individual cameras. When integrating detections to estimate the size of a crowd, intra-camera visual features alone are not sufficiently effective, accurate, or scalable; a counting model must also explore inter-camera knowledge, exploiting and adapting heterogeneous information to handle the difficult aspects of object counting. Finally, we match blobs to compensate for the variations among cameras, and propose a blob matching algorithm that derives a set of consistent entities from different views and ensures that knowledge sharing is successful. Intra-camera visual features are not very effective or accurate when adapting to the different aspects of object counting.
PROPOSED METHODOLOGY
It has often been seen that in an uncontrolled condition only some regions of the whole image are affected by facial changes due to variations of pose, illumination, expression, etc. The conventional appearance-based global feature extraction methods are usually applied on the whole face image. As a result, these methods are not suitable to cope with the above-mentioned local facial changes. Therefore, to improve the robustness of a face recognition system, it is necessary to consider local features from these parts of the face region along with the global features in the feature extraction process. Our methodology involves two main modules. The first stage involves acquiring an image from the web camera and converting it into a text document using Optical Character Recognition (OCR). The second stage involves natural language processing and digital signal processing for converting the text into speech using a Text-to-Speech synthesizer (TTS).
Steps involved in our methodology
1. Image acquisition by the web camera
2. Loading the image into the axial panel of the created Graphical User
Interface (GUI)
3. Pre-processing of the image (RGB to gray image, contrast adjustment,
adaptive threshold)
4. Converting pre-processed image into text document using OCR
5. Converting text document into speech using TTS.
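Step 3 above (RGB-to-gray conversion, contrast adjustment, thresholding) can be sketched in NumPy. The luminance weights and the mean-based threshold below are standard stand-ins, not parameters from the text, and the actual OCR and TTS calls (steps 4 and 5, e.g. via Tesseract) are omitted:

```python
import numpy as np

def rgb_to_gray(rgb):
    """Convert an RGB image to grayscale using standard luminance weights."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def contrast_stretch(gray):
    """Linearly stretch intensities to span the full [0, 255] range."""
    lo, hi = gray.min(), gray.max()
    return (gray - lo) * 255.0 / (hi - lo)

def threshold(gray):
    """Binarize around the mean intensity (a simple stand-in for adaptive thresholding)."""
    return (gray > gray.mean()).astype(np.uint8)

rgb = np.zeros((2, 2, 3))
rgb[0, 0] = [255, 255, 255]          # one white pixel on a black page
gray = rgb_to_gray(rgb)
binary = threshold(contrast_stretch(gray))
```

The binarized image is what an OCR engine would then segment into characters.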
Image analysis is the extraction of meaningful information from images, mainly from digital images by means of digital image processing techniques. Many important image analysis tools, such as edge detectors and neural networks, are inspired by human visual perception models. Computer image analysis largely encompasses the fields of computer or machine vision and medical imaging, and makes heavy use of pattern recognition, digital geometry, and signal processing.
1.6 YOLO V3
Classification with YOLO v3 is one of the most dynamic research and application areas. YOLO v3 is based on deep learning, a branch of Artificial Intelligence (AI). The neural network was trained by the YOLO v3 algorithm. The different combinations of functions and their effect while using YOLO v3 as a classifier are studied, and the correctness of these functions is analysed for various kinds of datasets. YOLO v3 can be used as a highly successful tool for dataset classification with a suitable combination of training, learning and transfer functions. When the maximum likelihood method was compared with the COCO method, YOLO v3 was more accurate than the maximum likelihood method. A high predictive ability with a stable and well-functioning YOLO v3 model is possible. It proves to be more effective than other classification algorithms.
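YOLO v3's post-processing keeps the best-scoring box among overlapping detections via non-maximum suppression. A minimal sketch, where the `[x1, y1, x2, y2]` box format and the 0.5 IoU threshold are conventional choices, not taken from the text:

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    order = np.argsort(scores)[::-1]
    keep = []
    while len(order) > 0:
        best = order[0]
        keep.append(int(best))
        # Drop every remaining box that overlaps the best one too much.
        order = order[1:][[iou(boxes[best], boxes[i]) <= thresh for i in order[1:]]]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], dtype=float)
kept = nms(boxes, np.array([0.9, 0.8, 0.7]))
```

Here the second box is suppressed because it heavily overlaps the higher-scoring first box, while the distant third box survives.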
1.7 COCO Method
Though the layers are colloquially referred to as convolutions, this is only by
convention. Mathematically, it is technically a sliding dot product or cross-
correlation. This has significance for the indices in the matrix, in that it affects
how weight is determined at a specific index point. Convolutional networks may
include local or global pooling layers to streamline the underlying computation.
Pooling layers reduce the dimensions of the data by combining the outputs of
neuron clusters at one layer into a single neuron in the next layer. Local pooling
combines small clusters, typically 2 x 2. Global pooling acts on all the neurons of
the convolutional layer. In addition, pooling may compute a max or an average. Max pooling uses the maximum value from each cluster of neurons at the prior layer. Average pooling uses the average value from each cluster of neurons at the prior layer.
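The 2 x 2 max and average pooling described above can be sketched with a NumPy reshape:

```python
import numpy as np

def pool2x2(x, mode="max"):
    """Non-overlapping 2x2 pooling over an (H, W) array with even H and W."""
    h, w = x.shape
    blocks = x.reshape(h // 2, 2, w // 2, 2)
    # Reduce each 2x2 block to a single value.
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

x = np.array([[1, 2, 5, 6],
              [3, 4, 7, 8],
              [0, 0, 1, 1],
              [0, 4, 1, 1]], dtype=float)
max_pooled = pool2x2(x, "max")
avg_pooled = pool2x2(x, "mean")
```

Both variants halve each spatial dimension; global pooling is the limiting case where the block covers the whole feature map.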
1.8 TEXT TO SPEECH SYNTHESIS (TTS)
Speech synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech synthesizer, and it can be implemented in software or hardware. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech. Text-to-Speech (TTS) refers to the ability of computers to read text aloud. A TTS engine converts written text to a phonemic representation, then converts the phonemic representation to waveforms that can be output as sound. TTS engines with different languages, dialects and specialized vocabularies are available through third-party publishers.
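The first TTS stage, text to phonemic representation, can be illustrated with a toy lexicon lookup. The mini-lexicon and ARPAbet-style symbols below are made up for illustration; real engines use full pronunciation dictionaries plus letter-to-sound rules:

```python
# Toy grapheme-to-phoneme stage of a TTS front-end.
# The lexicon entries below are illustrative, not a real pronunciation dictionary.
LEXICON = {
    "read": ["R", "IY", "D"],
    "text": ["T", "EH", "K", "S", "T"],
    "aloud": ["AH", "L", "AW", "D"],
}

def to_phonemes(sentence):
    """Map each word to its phoneme list; unknown words fall back to spelling out."""
    phones = []
    for word in sentence.lower().split():
        phones.extend(LEXICON.get(word, list(word.upper())))
    return phones

phones = to_phonemes("read text aloud")
```

The second stage, which this sketch omits, turns such a phoneme sequence into a waveform by concatenative or parametric synthesis.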
Preprocessing
Data preprocessing is an important step in the image processing pipeline. Data gathering methods are often loosely controlled, and analysing data that has not been carefully screened for such problems can produce misleading results. Thus, the representation and quality of the data come first and foremost before running an analysis. Detection of moving objects and motion-based tracking are important components of many computer vision applications, including animal recognition, monitoring, and automotive safety. The problem of motion-based animal tracking can be divided into the following parts:
1. Detecting moving objects in each frame
The detection of moving objects uses a background subtraction algorithm based on Gaussian mixture models. Morphological operations are applied to the resulting foreground mask to eliminate noise. Finally, blob analysis detects groups of connected pixels, which are likely to correspond to moving objects.
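The text cites Gaussian-mixture background models; as a simpler illustration of the same background-subtraction idea, here is a running-average background with a fixed foreground threshold (the learning rate and threshold values are illustrative assumptions):

```python
import numpy as np

class RunningAverageBackground:
    """Simplified background subtractor: running-average model plus threshold.

    A stand-in for the Gaussian-mixture model: one mean per pixel, no variances.
    """
    def __init__(self, first_frame, alpha=0.05, thresh=30.0):
        self.bg = first_frame.astype(float)
        self.alpha = alpha      # how quickly the background adapts
        self.thresh = thresh    # intensity difference that counts as foreground

    def apply(self, frame):
        mask = np.abs(frame.astype(float) - self.bg) > self.thresh
        # Update the background model toward the current frame.
        self.bg = (1 - self.alpha) * self.bg + self.alpha * frame
        return mask.astype(np.uint8)

static = np.full((4, 4), 100, dtype=np.uint8)
sub = RunningAverageBackground(static)
moving = static.copy()
moving[1, 1] = 200                      # a bright "object" enters one pixel
fg = sub.apply(moving)
```

The morphological opening and blob analysis described above would then clean this foreground mask and group its pixels into candidate objects.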
2. Associating the detections corresponding to the same object over time
The association of detections to the same object is based solely on motion. The
motion of each track is estimated by a Kalman filter. The filter is used to predict
the track's location in each frame, and determine the likelihood of each
detection being assigned to each track.
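The Kalman predict/update cycle used for each track can be sketched for a 1-D constant-velocity model; the noise covariances below are illustrative assumptions:

```python
import numpy as np

# 1-D constant-velocity Kalman filter: state = [position, velocity].
F = np.array([[1.0, 1.0], [0.0, 1.0]])   # state transition (dt = 1)
H = np.array([[1.0, 0.0]])               # we observe position only
Q = np.eye(2) * 0.01                     # process noise (assumed)
R = np.array([[1.0]])                    # measurement noise (assumed)

x = np.array([[0.0], [0.0]])             # initial state
P = np.eye(2)                            # initial covariance

def step(x, P, z):
    """One predict/update cycle for measurement z."""
    # Predict where the track will be in this frame.
    x = F @ x
    P = F @ P @ F.T + Q
    # Update with the detection assigned to this track.
    y = z - H @ x                        # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
    x = x + K @ y
    P = (np.eye(2) - K @ H) @ P
    return x, P

for z in [1.0, 2.0, 3.0, 4.0]:           # object moving one unit per frame
    x, P = step(x, P, np.array([[z]]))
```

The predicted position is what the assignment step compares against new detections; the innovation covariance S quantifies how likely each detection is under each track.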
3. Track maintenance for assigned and unassigned tracks
Track maintenance is an important aspect of this system. In any given frame, some detections may be assigned to tracks, while other detections and tracks may remain unassigned. The assigned tracks are updated using the corresponding detections. The unassigned tracks are marked invisible. An unassigned detection begins a new track.
4. Counting the tracks
Each track keeps count of the number of consecutive frames where it remained unassigned. If the count exceeds a specified threshold, the system assumes that the object left the field of view and deletes the track.
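The maintenance and deletion rules in steps 3 and 4 can be sketched as a small bookkeeping class; the threshold value and field names are illustrative:

```python
class Track:
    """Bookkeeping for one tracked object."""
    def __init__(self, track_id):
        self.id = track_id
        self.invisible_count = 0   # consecutive frames without a detection

class TrackManager:
    """Updates tracks each frame and deletes those unseen for too long."""
    def __init__(self, max_invisible=3):
        self.max_invisible = max_invisible
        self.tracks = {}
        self.next_id = 0

    def update(self, assigned_ids):
        # Assigned tracks are reset; unassigned tracks age toward deletion.
        for t in self.tracks.values():
            if t.id in assigned_ids:
                t.invisible_count = 0
            else:
                t.invisible_count += 1
        self.tracks = {i: t for i, t in self.tracks.items()
                       if t.invisible_count <= self.max_invisible}

    def new_track(self):
        t = Track(self.next_id)
        self.tracks[self.next_id] = t
        self.next_id += 1
        return t.id

mgr = TrackManager(max_invisible=2)
tid = mgr.new_track()
for _ in range(3):          # object missing for 3 consecutive frames
    mgr.update(assigned_ids=set())
```

After three consecutive unassigned frames the track's invisible count exceeds the threshold and it is dropped, modelling an object that has left the field of view.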
3.1.3 Feature Extraction
Feature extraction is a technique applied to remove noisy data, while background subtraction corrects inconsistencies in the data. It involves transformations to correct wrong data and is performed as a data preprocessing step while preparing the data for a data warehouse.
CONCLUSION AND FUTURE ENHANCEMENTS
This project shows that object counting and tracking are key activities in many computer vision applications. The detection of moving objects uses a background subtraction algorithm based on Gaussian mixture models. Morphological operations are applied to the resulting foreground mask to eliminate noise. Finally, a Gabor filter mechanism detects groups of connected pixels, which are likely to correspond to moving objects. YOLO v3 is used for tracking objects and focuses on three important features, namely: 1) prediction of an object's future location; 2) reduction of noise introduced by inaccurate detections; and 3) facilitating the association of multiple objects to their tracks. This method of detecting and counting objects can be used for analysis on any platform. Detection is also a first step prior to performing more sophisticated tasks such as tracking or categorization of objects by their type.