Smart blind stick book

Published in: Education

  1. 1. Mansoura University Faculty of Engineering Dept. of Electronics and Communication Engineering Smart Blind Stick A B. Sc. Project in Electronics and Communications Engineering Supervised by Assist. Prof. Mohamed Abdel-Azim Eng. Ahmed Shabaan, Eng. Mohamed Gamal, Eng. Eman Ashraf Department of Electronics and Communications Engineering Faculty of Engineering-Mansoura University 2011-2012
  2. 2. Mansoura University Faculty of Engineering Dept. of Electronics and Comm. Engineering Smart Blind Stick A B. Sc. Project in Electronics and Communications Engineering Supervised by Assist. Prof. Mohamed Abdel-Azim Eng. Ahmed Shabaan, Eng. Mohamed Gamal, Eng. Eman Ashraf Department of Electronics and Communications Engineering Faculty of Engineering-Mansoura University 2011-2012
  3. 3. Team Work
     No. | Name | Contact Information
     1 | Ahmed Helmy Abd-Ghaffar | Ahmed2033@gmail.com
     2 | Nesma Zein El-Abdeen Mohammed | eng_nesma.zein@yahoo.com
     3 | Aya Gamal Osman El-Mansy | eng_tota_20@hotmail.com
     4 | Fatma Ayman Mohammed | angel_whisper89@hotmail.com
     5 | Ahmed Moawad Abo-Elenin Awad | ahmedmowd@gmail.com
  4. 4. Acknowledgement We would like to express our gratitude to our advisor and supervisor, Dr. Mohammed Abd ElAzim, for guiding this work with interest. We would also like to thank Eng. Ahmed Shaaban, Eng. Mohammed Gamal and Eng. Eman Ashraf, Teaching Assistants, for the countless hours they spent in the labs. We are grateful to them for setting high standards and giving us the freedom to explore. We would like to thank our colleagues for their assistance and constant support. Our Team
  6. 6. Abstract According to the World Health Organization, approximately 36.9 million people in the world were blind in 2002. The majority of them use a conventional white cane to aid navigation. The limitation of the white cane is that information is gained only by touching objects with the tip of the cane. The traditional length of a white cane depends on the height of the user; it extends from the floor to the person's sternum. We therefore design an ultrasound sensor system that detects barriers regardless of their shape or height and warns the user by vibration. Blind people also face great problems in moving from one place to another in town, and their only aid is a guide dog, which can cost about $20,000 and is useful for only about 5 to 6 years. We therefore design a GPS navigation application for blind people that guides the user from place to place in town with voice directions; the user names the destination by voice and does not need to type anything. To help with movement indoors, or in closed places the user visits daily, we design an indoor navigation system that works offline and guides the user by voice from one location to another in specific places such as homes, malls and libraries. The user may also have great difficulty controlling electric devices, so we design a fully wireless system that lets the user control all electric devices by voice. It is connected to a security system that warns the user, whether indoors or out, if anything goes wrong and helps resolve the problem.
  7. 7. Contents
     Chapter 1: Introduction
       1.1 Problem Definition
       1.2 Problem Solution
       1.3 Business Model
       1.4 Block Diagram
       1.5 Detailed Technical Description
       1.6 Pre-Project Planning
       1.7 Time Planning
     Chapter 2: Speech Recognition
       2.1 Introduction
       2.2 Literature review: 2.2.1 Pattern recognition; 2.2.2 Generation of voice; 2.2.3 Voice as biometric; 2.2.4 Speech recognition; 2.2.5 Speaker recognition; 2.2.6 Speech/speaker modeling
       2.3 Implementation details: 2.3.1 Pre-processing and feature extraction
       2.4 Artificial neural network: 2.4.1 Introduction; 2.4.2 Models; 2.4.3 Network function; 2.4.4 ANN dependency graph; 2.4.5 Learning; 2.4.6 Choosing a cost function; 2.4.7 Learning paradigms; 2.4.8 Supervised learning; 2.4.9 Unsupervised learning; 2.4.10 Reinforcement learning; 2.4.11 Learning algorithms; 2.4.12 Employing artificial neural network; 2.4.13 Application; 2.4.14 Types of models; 2.4.15 Neural network software; 2.4.16 Types of artificial neural network; 2.4.17 Confidence analysis of neural network
     Chapter 3: Image Processing
       3.1 Introduction: 3.1.1 What is digital image processing?; 3.1.2 Motivating problems
       3.2 Color vision: 3.2.1 Fundamentals; 3.2.2 Image formats supported by MATLAB; 3.2.3 Working formats in MATLAB
       3.3 Aspects of image processing
  8. 8. Contents
       3.4 Image types: 3.4.1 Intensity image; 3.4.2 Binary image; 3.4.3 Indexed image; 3.4.4 RGB image; 3.4.5 Multi frame image
       3.5 How to: 3.5.1 How to convert between different formats; 3.5.2 How to read file; 3.5.3 Loading and saving variables in MATLAB; 3.5.4 How to display an image in MATLAB
       3.6 Some important definitions: 3.6.1 Imread function; 3.6.2 Rotation; 3.6.3 Scaling; 3.6.4 Interpolation
       3.7 Edge detection: 3.7.1 Canny edge detection; 3.7.2 Edge tracing
       3.8 Mapping: 3.8.1 Mapping image onto surface overview; 3.8.2 Mapping an image onto elevation data; 3.8.3 Initializing the IDL display objects; 3.8.4 Displaying image and geometric surface object; 3.8.5 Mapping an image onto sphere
       3.9 Mapping offline
     Chapter 4: GPS Navigation
       4.1 Introduction: 4.1.1 What is GPS?; 4.1.2 How it works?
       4.2 Basic concepts of GPS
       4.3 Position calculation
       4.4 Communication
       4.5 Message format
       4.6 Satellite frequencies
       4.7 Navigation equations
       4.8 Bancroft's method
       4.9 Trilateration
       4.10 Multidimensional Newton-Raphson calculation
       4.11 Additional method for more than four satellites
       4.12 Error sources and analysis
       4.13 Accuracy enhancement and surveying: 4.13.1 Augmentation; 4.13.2 Precise monitoring
       4.14 Time keeping: 4.14.1 Time keeping and leap seconds
  9. 9. Contents
       4.14 Time keeping (continued): 4.14.2 Time keeping accuracy; 4.14.3 Time keeping format; 4.14.4 Carrier phase tracking
       4.15 GPS navigation
     Chapter 5: Ultrasound
       5.1 Introduction: 5.1.1 History
       5.2 Wave motion
       5.3 Wave characteristics
       5.4 Ultrasound intensity
       5.5 Ultrasound velocity
       5.6 Attenuation of ultrasound
       5.7 Reflection
       5.8 Refraction
       5.9 Absorption
       5.10 Hardware part: 5.10.1 Introduction; 5.10.2 Calculating the distance; 5.10.3 Changing beam pattern and beam width; 5.10.4 The development of the sensor
     Chapter 6: Microcontroller
       6.1 Introduction: 6.1.1 History of microcontroller; 6.1.2 Embedded design; 6.1.3 Interrupt; 6.1.4 Programs; 6.1.5 Other microcontroller feature; 6.1.6 Higher integration; 6.1.7 Programming environment
       6.2 Types of microcontroller: 6.2.1 Interrupt latency
       6.3 Microcontroller embedded memory technology: 6.3.1 Data; 6.3.2 Firmware
       6.4 PIC microcontroller: 6.4.1 Family core architecture
       6.5 PIC component: 6.5.1 Logic circuit; 6.5.2 Power supply
       6.6 Development tools: 6.6.1 Device programs; 6.6.2 Debugging
       6.7 LCD display: 6.7.1 LCD display pins; 6.7.2 LCD screen; 6.7.3 LCD memory
  10. 10. Contents
       6.7 LCD display (continued): 6.7.4 LCD basic command; 6.7.5 LCD connecting; 6.7.6 LCD initialization
     Chapter 7: System Implementation
       7.1 Introduction
       7.2 Survey
       7.3 Searches: 7.3.1 Ultrasound sensor; 7.3.2 Indoor navigation systems; 7.3.3 Outdoor navigation
       7.4 Sponsors
       7.5 Pre-design: 7.5.1 List of matrices; 7.5.2 Competitive Benchmarking Information; 7.5.3 Ideal and marginally acceptable target values; 7.5.4 Time plan diagram
       7.6 Design: 7.6.1 Speech recognition; 7.6.2 Ultra sensors; 7.6.3 Outdoor navigation
       7.7 Product architecture: 7.7.1 Product schematic; 7.7.2 Rough geometric layout; 7.7.3 Incidental interactions
       7.8 Defining secondary system
       7.9 Detailed interface specification
       7.10 Establishing the architecture of the chunks
     Chapter 8: Conclusion
       8.1 Introduction
       8.2 Overview: 8.2.1 Outdoor navigation (8.2.1.1 Outdoor navigation online; 8.2.1.2 Outdoor navigation offline); 8.2.2 Ultrasound sensor; 8.2.3 Object identifier
       8.3 Features
  11. 11. CHAPTER 1 Introduction
  12. 12. Chapter 1 | Introduction 1.1 | PROBLEM DEFINITION According to the World Health Organization, approximately 36.9 million people in the world were blind in 2002. The majority of them use a conventional white cane to aid navigation. The limitation of the white cane is that information is gained only by touching objects with the tip of the cane. The traditional length of a white cane depends on the height of the user; it extends from the floor to the person's sternum. Blind people also face great problems in moving from one place to another in town, and their only aid is a guide dog, which can cost about $20,000 and is useful for only about 5 to 6 years. Blind people also have great difficulty identifying the objects they frequently use at home, such as kitchen tools and clothes. They may also have great difficulty controlling their electric devices, or face a security problem that they cannot handle. 1.2 | PROBLEM SOLUTION We try to solve all of the problems above. To help the user move easily indoors and outdoors, we use an ultrasound sensor to detect barriers in the user's way and alert the user in two ways: a vibration motor whose speed increases as the distance decreases, and a voice alert that tells the user the distance to the barrier. To solve the problem of moving outside the home from one place to another, we design software for smartphones that guides the user with voice directions without any external help: the user just says the destination, and the phone then gives voice directions to reach it. To help the user identify objects we use RFID: every important object carries a tag or ID, and when the reader reads the ID, the user is told by voice what the object is.
Inside the home, we design a system to control all electronic devices by voice commands, together with a security system designed especially for blind users. Its most important part is the fire alarm: when it detects a fire, it alerts the user with a call to the user's mobile phone and places another call to nearby friends for help. The security system also warns the user if the door has been left open. After finishing these applications, we plan to add features after graduation, using new technologies to make moving in the street easier and to help the user cross roads and read books. The products available for blind people in the Egyptian market do not cover their needs.
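The vibration alert described in section 1.2 (motor speed rising as the barrier gets closer) can be illustrated with a short sketch. This is not the project's firmware: the function name `vibration_duty`, the linear mapping, and the 2 cm and 400 cm limits are all assumptions chosen for illustration.

```python
def vibration_duty(distance_cm, min_cm=2.0, max_cm=400.0):
    """Map a measured distance to a motor duty cycle in [0.0, 1.0].

    Assumed behaviour: no vibration at or beyond max_cm, full speed at or
    below min_cm, and a linear ramp in between (closer means faster).
    """
    if distance_cm >= max_cm:
        return 0.0
    if distance_cm <= min_cm:
        return 1.0
    return (max_cm - distance_cm) / (max_cm - min_cm)


if __name__ == "__main__":
    for d in (400, 201, 50, 2):
        print(f"{d:>3} cm -> duty {vibration_duty(d):.2f}")
```

A linear ramp is only one possible choice; a stepped or logarithmic mapping may be easier for a user to interpret by touch.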
  13. 13. Blind users need to move, control their devices and do their tasks by themselves, without help from anybody. The only product available is a white stick without any technology or features. So, finally, we install a sensor and an RFID reader on the white stick; the other part is software on the mobile phone that performs the navigation and automation tasks. 1.3 | BUSINESS MODEL Our customers are blind and visually impaired people; almost 1 million people in Egypt have one of these problems. Our product covers some of our customers' needs: it helps them avoid barriers in their way and guides them by voice toward the direction that avoids each barrier. It also helps them move freely, without any external help, in different countries through an Android application on the mobile phone, designed especially for them, that guides them by voice through roads and tells them which direction to take to reach their destination. To reach our goal, we met with different customers to learn exactly what they need and to form a vision of a final product that is comfortable to use. We were also guided technically by our sponsors to find the best way to cover all these needs. In our market, the available products do not cover these needs; we found only a white stick without any technology to help the user. 1.4 | BLOCK DIAGRAMS Fig. (1.1): General Project Block Diagram
  14. 14. 1.5 | DETAILED TECHNICAL DESCRIPTION Our project is built on the simplest available technologies that reach our goal in a way that is comfortable for the user. We divided the project into two parts, software and hardware. The hardware part consists of a PIC MCU, an MP3 module, a camera module and an ultrasound sensor module. The software part is an Android application that can be installed on the mobile phone. The hardware part has two modes, indoor and outdoor. In indoor mode, only one sensor measures range, and the camera module takes a photo of an object when the user comes within 2 cm of it, to capture the code placed on the object. The photo is sent to the MCU, which processes it, identifies the code number and retrieves the object name from a database. The MCU then addresses the WT588D MP3 module with the address of the MP3 file that contains the object name, and the name is played through the speaker. In outdoor mode, three HC-SR04 sensors are activated in three directions to determine the best way, the one with no barriers. They send the measured data to the MCU, which selects the best way and sends the address of the MP3 file that contains the corresponding direction, which is then played as the output. For outdoor navigation, we design an Android application using Google Maps: the user names the destination by voice, the application determines the current position using GPS, the digital compass determines the angle of view, and the application guides the user using the GPS and compass data. Fig. (1.2): Button Configuration (labels: Choose Mode, Left Button, Right Button, Outdoor, Indoor)
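The outdoor mode above can be sketched in a few lines: convert each HC-SR04 echo time to a distance, then pick the direction with the most clearance. This is a minimal illustration, not the PIC firmware; the function names, the direction labels, the 100 cm clearance threshold and the "stop" fallback are assumptions. The distance formula is the standard time-of-flight relation, distance = echo time × speed of sound / 2, because the pulse travels to the barrier and back.

```python
SPEED_OF_SOUND_CM_PER_US = 0.0343  # about 343 m/s in air at roughly 20 °C


def echo_to_distance_cm(echo_us):
    """Convert a round-trip echo time in microseconds to a one-way distance in cm."""
    return echo_us * SPEED_OF_SOUND_CM_PER_US / 2.0


def choose_direction(distances_cm, min_clear_cm=100.0):
    """Return the direction with the greatest clearance.

    distances_cm maps a direction name to its measured distance. If even the
    best direction is closer than min_clear_cm (an assumed threshold), the
    hypothetical result "stop" is returned instead.
    """
    direction, best = max(distances_cm.items(), key=lambda item: item[1])
    return direction if best >= min_clear_cm else "stop"


if __name__ == "__main__":
    echoes_us = {"left": 4600, "front": 14500, "right": 7000}
    distances = {name: echo_to_distance_cm(t) for name, t in echoes_us.items()}
    print(distances)
    print(choose_direction(distances))
```

In the system described above, the chosen direction would then be translated into the address of the matching MP3 file on the WT588D module.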
  15. 15. Fig. (1.3): Indoor & Outdoor Processes Block Diagram 1.6 | PRE-PROJECT PLANNING We started by searching for a problem that no one was addressing. We found that the problems of blind people receive little attention and that suitable products are not available in Egypt. We therefore considered it a good field in which to start: an opportunity to solve a problem and to enter a new market with few competitors. 1.7 | TIME PLANNING Project Timing: The three main parts are independent in execution time, but each part has many branches that execute in series. Timing of Product Introductions: The timing of the product launch depends on marketing and on a further market study of competing products; the product must have low cost and high quality.
  16. 16. Technology Readiness: Technology is one of the fundamental components of the product, because Android and ultrasonic technologies are gaining importance among Egyptian customers. Market Readiness: The market is always ready for a new product; products compete in the market so that customers can choose the one that suits them best. The Product Plan: This plan makes the project easier to implement, because work that is arranged and planned in advance gives the best results.
  17. 17. CHAPTER 2 Speech Recognition
  18. 18. Chapter 2 | Speech Recognition 2.1 | INTRODUCTION Biometrics is, in the simplest definition, something you are. It is a physical characteristic unique to each individual, such as a fingerprint, retina, iris or voice. Biometrics has a very useful application in security; it can be used to authenticate a person's identity and control access to a restricted area, based on the premise that the set of these physical characteristics can be used to uniquely identify individuals. The speech signal conveys two important types of information: primarily the speech content and, on a secondary level, the speaker identity. Speech recognizers aim to extract the lexical information from the speech signal independently of the speaker by reducing the inter-speaker variability. Speaker recognition, on the other hand, is concerned with extracting the identity of the person speaking the utterance. So both speech recognition and speaker recognition are possible from the same voice input. We use speech recognition in our project because we want to recognize the spoken word on which the stick will act. Mel-Frequency Cepstral Coefficients (MFCC) are used as features for both speech and speaker recognition. We also combine energy features with the delta and delta-delta features of energy and MFCC. After the features are calculated, neural networks are used to model the speech for recognition. Based on the speech model, the system decides whether or not the uttered speech matches what the user was prompted to utter. 2.2 | LITERATURE REVIEW 2.2.1 | Pattern Recognition Pattern recognition, a branch of artificial intelligence and a sub-field of machine learning, is the study of how machines can observe the environment, learn to distinguish patterns of interest from their background, and make sound and reasonable decisions about the categories of the patterns.
A pattern can be a fingerprint image, a handwritten cursive word, a human face, a speech signal, a sales pattern, etc. The applications of pattern recognition include data mining, document classification, financial forecasting, organization and retrieval of multimedia databases, and biometrics (personal identification based on various physical attributes such as face, retina, speech, ear and fingerprints). The essential steps of
  19. 19. pattern recognition are: Data Acquisition, Preprocessing, Feature Extraction, Training and Classification. Features are used to denote the descriptor. Features must be selected so that they are discriminative and invariant. They can be represented as a vector, matrix, tree, graph, or string. They are ideally similar for objects in the same class and very different for objects in different classes. A pattern class is a family of patterns that share some common properties. Pattern recognition by machine involves techniques for assigning patterns to their respective classes automatically and with as little human intervention as possible. Learning and classification usually use one of the following approaches: Statistical Pattern Recognition is based on statistical characterizations of patterns, assuming that the patterns are generated by a probabilistic system. Syntactical (or Structural) Pattern Recognition is based on the structural interrelationships of features. Given a pattern, its recognition/classification may consist of one of the following two tasks, according to the type of learning procedure: 1) Supervised Classification (e.g., Discriminant Analysis), in which the input pattern is identified as a member of a predefined class. 2) Unsupervised Classification (e.g., clustering), in which the pattern is assigned to a previously unknown class. Fig. (2.1): General block diagram of pattern recognition system
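The training and classification steps of supervised classification can be illustrated with a very small example. The sketch below uses a nearest-centroid classifier, which is a stand-in chosen for brevity and not the classifier used in this project (the project uses neural networks); the function names and the two-dimensional feature vectors are hypothetical.

```python
import math


def train_centroids(samples):
    """Training: average the feature vectors of each predefined class.

    samples is a list of (feature_vector, label) pairs; the result maps each
    label to the centroid (mean vector) of its training features.
    """
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def classify(features, centroids):
    """Classification: assign the pattern to the class with the nearest centroid."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


if __name__ == "__main__":
    training = [
        ([1.0, 1.0], "a"), ([1.2, 0.8], "a"),
        ([5.0, 5.0], "b"), ([4.8, 5.2], "b"),
    ]
    model = train_centroids(training)
    print(classify([0.9, 1.1], model))
    print(classify([5.1, 4.9], model))
```

The example shows why features should be discriminative: classification works here only because the two classes occupy well-separated regions of the feature space.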
  20. 20. 2.2.2 | Generation of Voice Speech begins with the generation of an airstream, usually by the lungs and diaphragm, a process called initiation. This air then passes through the larynx, where it is modulated by the glottis (vocal cords). This step is called phonation or voicing, and is responsible for the generation of pitch and tone. Finally, the modulated air is filtered by the mouth, nose, and throat, a process called articulation, and the resultant pressure wave excites the air. Fig. (2.2): Vocal Schematic Depending on the positions of the various articulators, different sounds are produced. The positions of the articulators can be modeled by a linear time-invariant system whose frequency response is characterized by several peaks called formants. The change in the frequencies of the formants characterizes the phoneme being articulated. As a consequence of this physiology, we can notice several characteristics of the frequency-domain spectrum of speech. First of all, the oscillation of the glottis
  21. 21. results in an underlying fundamental frequency and a series of harmonics at multiples of this fundamental. This is shown in the figures below, where we have plotted a brief audio waveform for the phoneme /i:/ and its magnitude spectrum. The fundamental frequency (180 Hz) and its harmonics appear as spikes in the spectrum. The location of the fundamental frequency is speaker dependent, and is a function of the dimensions and tension of the vocal cords. For adults it usually falls between 100 Hz and 250 Hz, and females average significantly higher than males. Fig. (2.3): Audio Sample for /i:/ phoneme showing stationary property of phonemes for a short period Speech comes out in phonemes, which are the building blocks of speech. Each phoneme resonates at a fundamental frequency and its harmonics, and thus has high energy at those frequencies; in other words, phonemes have different formants. This is the feature that enables the identification of each phoneme at the recognition stage. Fig. (2.4): Audio Magnitude Spectrum for /i:/ phoneme showing fundamental frequency and its harmonics The variations in
  22. 22. inter-speaker features of the speech signal during the utterance of a word are modeled in word training for speech recognition. For speaker recognition, the intra-speaker variations in features over long speech content are modeled. Besides the configuration of the articulators, the acoustic manifestation of a phoneme is affected by: • Physiology and emotional state of the speaker. • Phonetic context. • Accent. 2.2.3 | Voice as Biometric The underlying premise for voice authentication is that each person's voice differs in pitch, tone, and volume enough to make it uniquely distinguishable. Several factors contribute to this uniqueness: the size and shape of the mouth, throat, nose, and teeth (articulators) and the size, shape, and tension of the vocal cords. The chance that all of these are exactly the same in any two people is very low. Voice biometrics have the following advantages over other forms of biometrics: • Speech is a natural signal to produce. • Implementation cost is low, since no specialized input device is required. • It is acceptable to users. • It is easily combined with other forms of authentication for multifactor authentication. • It is the only biometric that allows users to authenticate remotely. 2.2.4 | Speech Recognition Speech is the dominant means of communication between humans, and promises to be important for communication between humans and machines, if it can just be made a little more reliable. Speech recognition is the process of converting an acoustic signal to a set of words. The applications include voice command and control, data entry, voice user interfaces, automating the telephone operator's job in telephony, etc. The recognized words can also serve as the input to natural language processing. There are two variants of speech recognition, based on the duration of the speech signal: isolated word recognition, in which each word is surrounded by some sort of pause, is much easier than continuous speech recognition, in which words run into each other and have to be segmented.
Speech recognition is a difficult task because
  23. 23. of the many sources of variability associated with the signal. First, the acoustic realizations of phonemes, the smallest sound units of which words are composed, are highly dependent on the context in which they appear. Second, acoustic variability can result from changes in the environment as well as in the position and characteristics of the transducer. Third, within-speaker variability can result from changes in the speaker's physical and emotional state, speaking rate, or voice quality. Finally, differences in sociolinguistic background, dialect, and vocal tract size and shape can contribute to cross-speaker variability. Such variability is modeled in various ways. At the level of signal representation, representations that emphasize speaker-independent features are developed. 2.2.5 | Speaker Recognition Speaker recognition is the process of automatically recognizing who is speaking on the basis of individual information contained in speech waves. Speaker recognition can be classified into identification and verification. Speaker recognition has been applied most often as a means of biometric authentication. 2.2.5.1 | Types of Speaker Recognition Speaker identification: Speaker identification is the process of determining which registered speaker provides a given utterance. In a Speaker Identification (SID) system, no identity claim is provided; the test utterance is scored against a set of known (registered) references for each potential speaker, and the speaker whose model best matches the test utterance is selected. There are two types of speaker identification task: closed-set and open-set. In closed-set identification, the test utterance belongs to one of the registered speakers. During testing, a matching score is estimated for each registered speaker, and the speaker corresponding to the model with the best matching score is selected. This requires N comparisons for a population of N speakers.
In open-set identification, any speaker can access the system; those who are not registered should be rejected. This requires another model, referred to as a garbage model, imposter model or background model, which is trained with data provided by speakers other than the registered ones. During testing, the matching score of the best speaker model is compared with the matching score estimated using the garbage model in order to accept or reject the speaker, making the total number of comparisons equal to N + 1.
  24. 24. Speaker identification performance tends to decrease as the population size increases. Speaker verification: Speaker verification, on the other hand, is the process of accepting or rejecting the identity claim of a speaker. That is, the goal is to automatically accept or reject an identity that is claimed by the speaker. During testing, a verification score is estimated using the claimed speaker model and the anti-speaker model. This verification score is then compared to a threshold. If the score is higher than the threshold, the speaker is accepted; otherwise, the speaker is rejected. Thus, speaker verification involves a hypothesis test requiring a simple binary decision: accept or reject the claimed identity, regardless of the population size. Hence, the performance is largely independent of the population size, but it depends on the number of test utterances used to evaluate the performance of the system. 2.2.6 | Speaker/Speech Modeling There are various pattern modeling/matching techniques. They include Dynamic Time Warping (DTW), Gaussian Mixture Models (GMM), Hidden Markov Models (HMM), Artificial Neural Networks (ANN), and Vector Quantization (VQ). These are used interchangeably for speech and speaker modeling. The preferred approach is statistical learning: GMM for speaker recognition, which models the variations in the features of a speaker over a long sequence of utterances, and HMM, which is widely used for speech recognition. An HMM models the Markovian nature of the speech signal, where each phoneme represents a state and a sequence of such phonemes represents a word. The sequences of features of such phonemes from different speakers are modeled by the HMM. 2.3 | IMPLEMENTATION DETAILS The implementation of the system includes a common pre-processing and feature extraction module, and speaker-independent speech modeling and classification by ANNs. 2.3.1 | Pre-Processing and Feature Extraction
Starting from the capture of the audio signal, feature extraction consists of the following steps, as shown in the block diagram below:

[Block diagram: Speech signal → Silence removal → Pre-emphasis → Framing → Windowing → DFT → Mel filter bank → Log → IDFT → CMS → Delta, with a parallel Energy branch. Outputs: 12 MFCC, 12 ΔMFCC, 12 ΔΔMFCC, 1 energy, 1 Δ energy, 1 ΔΔ energy.]

Fig. (2.5): Pre-Processing and Feature Extraction

2.3.1.1 | Capture

The first step in processing speech is to convert the analog representation (first air pressure, and then analog electric signals in a microphone) into a digital signal x[n], where n is an index over time. Analysis of the audio spectrum shows that nearly all energy resides in the band between DC and 4 kHz, and beyond 10 kHz there is virtually no energy whatsoever.

Used sound format: 22050 Hz, 16-bit signed, little-endian, mono channel, uncompressed PCM.

2.3.1.2 | End point detection and Silence removal

The captured audio signal may contain silence at different positions, such as the beginning of the signal, between the words of a sentence, the end of the signal, etc. If silent frames are included, modeling resources are spent on parts of the signal which do not contribute to the identification. The silence present must therefore be removed before further processing. There are several ways of doing this; the most popular are Short Time Energy and Zero Crossing Rate. But they have their own limitations, because their thresholds are set on an ad hoc basis. The algorithm we used uses
statistical properties of background noise as well as physiological aspects of speech production, and does not assume any ad hoc threshold. It assumes that the background noise present in the utterances is Gaussian in nature. Usually the first 200 ms or more (we used 4410 samples for the sampling rate of 22050 samples/sec) of a speech recording corresponds to silence (or background noise), because the speaker takes some time to read when recording starts.

Endpoint Detection Algorithm:

Step 1: Calculate the mean (μ) and standard deviation (σ) of the first 200 ms of samples of the given utterance. The background noise is characterized by this μ and σ.

Step 2: Go from the first sample to the last sample of the speech recording. For each sample x, check whether the one-dimensional Mahalanobis distance |x − μ| / σ is greater than 3. If it is, the sample is treated as a voiced sample; otherwise it is unvoiced/silence. In a Gaussian distribution, P[|x − μ| ≤ 3σ] = 0.997, so this threshold rejects about 99.7% of the background-noise samples and accepts only the voiced samples.

Step 3: Mark each voiced sample as 1 and each unvoiced sample as 0. Divide the whole speech signal into 10 ms non-overlapping windows. The complete speech is now represented by zeros and ones only.

Step 4: Suppose there are M zeros and N ones in a window. If M ≥ N, convert each of the ones to zeros, and vice versa. This method is adopted keeping in mind that a speech production system, consisting of vocal cords, tongue, vocal tract, etc., cannot change abruptly within a short time window, taken here as 10 ms.

Step 5: Collect the samples labeled 1 in the windowed array and copy the corresponding samples of the original speech signal into a new array, which then holds the voiced part only.
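The five steps above can be sketched in a few lines of Python. This is a minimal illustration and not the project's actual code: the function name `remove_silence`, its parameter names, and the choice of truncating the 10 ms window to 220 whole samples at 22050 Hz are assumptions made for the example.

```python
import statistics


def remove_silence(samples, sample_rate=22050, noise_ms=200, window_ms=10, k=3.0):
    """Return only the voiced part of `samples` (hypothetical helper)."""
    # Step 1: characterize the background noise by the first 200 ms.
    n_noise = int(sample_rate * noise_ms / 1000)  # 4410 samples at 22050 Hz
    noise = samples[:n_noise]
    mu = statistics.fmean(noise)
    sigma = statistics.pstdev(noise) or 1e-12  # guard against division by zero

    # Steps 2-3: label each sample 1 (voiced) or 0 (unvoiced) using the
    # one-dimensional Mahalanobis distance |x - mu| / sigma > 3.
    labels = [1 if abs(x - mu) / sigma > k else 0 for x in samples]

    # Step 4: majority vote inside each non-overlapping 10 ms window.
    win = max(1, int(sample_rate * window_ms / 1000))  # assumption: 220 samples
    voiced = []
    for start in range(0, len(samples), win):
        block = labels[start:start + win]
        ones = sum(block)
        zeros = len(block) - ones
        # If zeros >= ones the whole window counts as silence and is dropped.
        if ones > zeros:
            # Step 5: keep the original samples of a voiced window.
            voiced.extend(samples[start:start + win])
    return voiced
```

Because the decision is taken per window, a voiced window is kept whole, including any few low-amplitude samples it contains.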
Fig. (2.6): Input signal to End-point detection system

Fig. (2.7): Output signal from End point Detection System

2.3.1.3 | PCM Normalization

The extracted pulse code modulated amplitude values are normalized to avoid amplitude variation during capturing.

2.3.1.4 | Pre-emphasis

The speech signal is usually pre-emphasized before any further processing. If we look at the spectrum of voiced segments such as vowels, there is more energy at lower frequencies than at higher frequencies. This drop in energy across frequencies is caused by the nature of the glottal pulse. Boosting the high-frequency energy makes information from the higher formants more available to the acoustic model and improves phone detection accuracy. The pre-emphasis filter is a first-order high-pass filter. In the time domain, with input x[n] and 0.9 ≤ α ≤ 1.0, the filter equation is:

y[n] = x[n] − α x[n−1]

We used α = 0.95.
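As a small illustration of the filter equation, the sketch below applies y[n] = x[n] − α x[n−1] to a list of samples. The function name `pre_emphasis` is hypothetical, and treating the sample before the first one as zero is an assumption made for the example.

```python
def pre_emphasis(x, alpha=0.95):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1]."""
    if not x:
        return []
    # Assumption: x[-1] is taken as 0, so the first sample passes unchanged.
    y = [x[0]]
    for n in range(1, len(x)):
        y.append(x[n] - alpha * x[n - 1])
    return y
```

A constant (low-frequency) input is strongly attenuated after the first sample, while a rapidly alternating (high-frequency) input is amplified, which is the intended boost of the higher formants.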
Fig. (2.8): Signal before Pre-Emphasis

Fig. (2.9): Signal after Pre-Emphasis

2.3.1.5 | Framing and windowing

Speech is a non-stationary signal, meaning that its statistical properties are not constant across time. Instead, we want to extract spectral features from a small window of speech that characterizes a particular subphone and for which we can make the (rough) assumption that the signal is stationary (i.e., its statistical properties are constant within this region). We used frame blocks of 23.22 ms, i.e., 512 samples per frame at 22050 Hz, with 50% overlap.
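The framing step can be sketched as follows. The function name `frame_signal` is hypothetical, and dropping a trailing partial frame (rather than zero-padding it) is an assumption, since the text does not say how the end of the signal is handled.

```python
def frame_signal(x, frame_len=512, hop=256):
    """Split x into frames of frame_len samples; hop=256 gives 50% overlap."""
    frames = []
    # Assumption: a final partial frame is discarded instead of padded.
    for start in range(0, len(x) - frame_len + 1, hop):
        frames.append(x[start:start + frame_len])
    return frames


# Frame duration at the 22050 Hz sampling rate, in milliseconds.
FRAME_MS = 512 / 22050 * 1000
```

With a hop of 256 samples, each frame shares its second half with the first half of the next frame.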
Fig. (2.10): Frame Blocking of the Signal

The rectangular window (i.e., no window) can cause problems when we do Fourier analysis, because it abruptly cuts off the signal at its boundaries. A good window function has a narrow main lobe and low side-lobe levels in its transfer function, and it shrinks the values of the signal toward zero at the window boundaries, avoiding discontinuities. The most commonly used window function in speech processing is the Hamming window, defined as follows:

$S_w[n] = 0.54 - 0.46 \cos\left(\frac{2\pi n}{N-1}\right)$ for $0 \le n \le N-1$, and $S_w[n] = 0$ otherwise,

where N is the number of samples in the frame.

Fig. (2.11): Hamming window

The extraction of the signal takes place by multiplying the value of the signal at time n, Sframe[n], by the value of the window at time n, Sw[n]:

Y[n] = Sw[n] × Sframe[n]
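A minimal sketch of the windowing step is given below. It assumes the common Hamming definition w[n] = 0.54 − 0.46 cos(2πn/(N−1)) for a frame of N samples; the function names `hamming` and `apply_window` are hypothetical.

```python
import math


def hamming(N):
    """Hamming window of length N (N > 1), assuming the 0.54/0.46 form."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]


def apply_window(frame):
    """Y[n] = Sw[n] * Sframe[n]: multiply the frame by the window, sample by sample."""
    w = hamming(len(frame))
    return [wi * si for wi, si in zip(w, frame)]
```

The window is about 0.08 at both ends and close to 1 in the middle, so the frame edges are pulled toward zero without being cut off abruptly.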
Fig. (2.12): A single frame before and after windowing

2.3.1.6 | Discrete Fourier Transform

A Discrete Fourier Transform (DFT) of the windowed signal is used to extract the frequency content (the spectrum) of the current frame, i.e., how much energy the signal contains at discrete frequency bands. The input to the DFT is a windowed signal x[n]...x[m], and the output, for each of N discrete frequency bands, is a complex number X[k] representing the magnitude and phase of that frequency component in the original signal:

$|X[k]| = \left| \sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi k n / N} \right|$

The commonly used algorithm for computing the DFT is the Fast Fourier Transform, or FFT for short.

2.3.1.7 | Mel Filter

For calculating the MFCC, a transformation is first applied according to the following formula:

$\mathrm{mel}(x) = 2595 \log_{10}\left(1 + \frac{x}{700}\right)$

where x is the linear frequency in Hz. Then, a filter bank is applied to the amplitude of the Mel-scaled spectrum. The Mel frequency warping is most conveniently done by utilizing a filter bank with filters centered according to Mel
frequencies. The width of the triangular filters varies according to the Mel scale, so that the log total energy in a critical band around the center frequency is included. The centers of the filters are uniformly spaced on the Mel scale.

Fig. (2.13): Equally spaced Mel values

The result of the Mel filter is information about the distribution of energy in each Mel-scale band. We obtain a vector of outputs, one value per filter, from which the 12 coefficients are later derived.

Fig. (2.13): Triangular filter bank in frequency scale

We have used 30 filters in the filter bank.
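The placement of the triangular filters can be sketched as below. The sketch assumes the widely used conversion mel(f) = 2595 log10(1 + f/700) and a bank that spans 0 Hz to 11025 Hz (half of the 22050 Hz sampling rate); neither the band edges nor the function names come from the text.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def filter_centers(num_filters=30, f_low=0.0, f_high=11025.0):
    """Center frequencies (Hz) of filters spaced uniformly on the Mel scale."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (num_filters + 1)
    return [mel_to_hz(lo + step * i) for i in range(1, num_filters + 1)]


def triangular_weight(f, left, center, right):
    """Weight of a triangular filter at frequency f (0 outside, 1 at center)."""
    if f <= left or f >= right:
        return 0.0
    if f <= center:
        return (f - left) / (center - left)
    return (right - f) / (right - center)
```

Because the centers are equally spaced in Mel but not in Hz, the filters are narrow at low frequencies and become progressively wider at high frequencies.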
2.3.1.8 | Cepstrum by Inverse Discrete Fourier Transform

A cepstrum transform is applied to the filter outputs in order to obtain the MFCC features of each frame. The triangular filter outputs Y(i), i = 1, 2, …, M, are compressed using the logarithm, and the discrete cosine transform (DCT) is applied; because the log filter-bank spectrum is real, the DCT plays the role of the inverse transform here. M is the number of filters in the filter bank, i.e., 30.

$C[n] = \sum_{i=1}^{M} \log(Y(i)) \cos\left[\frac{\pi n}{M}\left(i - \frac{1}{2}\right)\right]$

where C[n] is the MFCC vector for each frame. The resulting vector is called the Mel-frequency cepstrum (MFC), and the individual components are the Mel-frequency cepstral coefficients (MFCCs). We extracted 12 features from each speech frame.

2.3.1.9 | Post Processing

Cepstral Mean Subtraction (CMS)

A speech signal may be subjected to some channel noise when recorded, also referred to as the channel effect. A problem arises if the channel effect when recording training data for a given person is different from the channel effect in later recordings when the person uses the system: a false distance between the training data and the newly recorded data is introduced by the different channel effects. The channel effect is removed by subtracting the mean Mel-cepstrum coefficients from the Mel-cepstrum coefficients:

$\hat{c}_n(t) = c_n(t) - \frac{1}{T} \sum_{\tau=1}^{T} c_n(\tau)$

where $c_n(t)$ is the n-th coefficient of frame t and T is the number of frames.

The energy feature

The energy in a frame is the sum over time of the power of the samples in the frame; thus, for a signal x in a window from time sample t1 to time sample t2, the energy is:

$E = \sum_{t=t_1}^{t_2} x^2[t]$

Delta feature

Another interesting fact about the speech signal is that it is not constant from frame to frame. Co-articulation (influence of a speech sound during another
adjacent or nearby speech sound) can provide a useful cue for phone identity. It can be preserved by using delta features. Velocity (delta) and acceleration (delta-delta) coefficients are usually obtained from the static window-based information. These delta and delta-delta coefficients model the speed and acceleration of the variation of the cepstral feature vectors across adjacent windows. A simple way to compute deltas would be just to compute the difference between frames; thus the delta value d(t) for a particular cepstral value c(t) at time t can be estimated as:

$d(t) = \frac{c[t+1] - c[t-1]}{2}$

This differencing method is simple, but since it acts as a high-pass filtering operation on the parameter domain, it tends to amplify noise. The solution to this is linear regression, i.e., a first-order polynomial fit, whose least-squares solution is easily shown to be of the following form:

$d[t] = \frac{\sum_{m=-M}^{M} m\, c[t+m]}{\sum_{m=-M}^{M} m^2}$

where M is the regression window size. We used M = 4.

Composition of Feature Vector

We calculated 39 features from each frame:
• 12 MFCC features.
• 12 delta MFCC.
• 12 delta-delta MFCC.
• 1 energy feature.
• 1 delta energy feature.
• 1 delta-delta energy feature.

2.4 | ARTIFICIAL NEURAL NETWORKS

2.4.1 | Introduction

We used ANNs to model our system: the network is trained on recorded voices and then tested by classifying utterances into word categories, each of which returns an action. This section gives an overview of artificial neural networks. The original inspiration for the term Artificial Neural Network came from examination of central nervous systems and their neurons, axons, dendrites, and synapses, which constitute the processing elements of biological neural networks investigated by neuroscience. In an artificial neural network, simple artificial nodes, variously called "neurons", "neurodes", "processing elements" (PEs) or
"units", are connected together to form a network of nodes mimicking the biological neural networks, hence the term "artificial neural network". Because neuroscience is still full of unanswered questions, and since there are many levels of abstraction and therefore many ways to take inspiration from the brain, there is no single formal definition of what an artificial neural network is. Generally, it involves a network of simple processing elements that exhibit complex global behavior determined by connections between processing elements and element parameters. While an artificial neural network does not have to be adaptive per se, its practical use comes with algorithms designed to alter the strength (weights) of the connections in the network to produce a desired signal flow. These networks are also similar to the biological neural networks in the sense that functions are performed collectively and in parallel by the units, rather than there being a clear delineation of subtasks to which various units are assigned (see also connectionism). Currently, the term Artificial Neural Network (ANN) tends to refer mostly to neural network models employed in statistics, cognitive psychology and artificial intelligence. Neural network models designed with emulation of the central nervous system (CNS) in mind are a subject of theoretical neuroscience and computational neuroscience. In modern software implementations of artificial neural networks, the approach inspired by biology has been largely abandoned for a more practical approach based on statistics and signal processing. In some of these systems, neural networks or parts of neural networks (such as artificial neurons) are used as components in larger systems that combine both adaptive and non-adaptive elements.
While the more general approach of such adaptive systems is more suitable for real-world problem solving, it has far less to do with the traditional artificial intelligence connectionist models. What they do have in common, however, is the principle of non-linear, distributed, parallel and local processing and adaptation. Historically, the use of neural network models marked a paradigm shift in the late eighties from high-level (symbolic) artificial intelligence, characterized by expert systems with knowledge embodied in if-then rules, to low-level (sub-symbolic) machine learning, characterized by knowledge embodied in the parameters of a dynamical system.

2.4.2 | Models
Neural network models in artificial intelligence are usually referred to as artificial neural networks (ANNs); these are essentially simple mathematical models defining a function f : X → Y, or a distribution over X or over both X and Y, but sometimes models are also intimately associated with a particular learning algorithm or learning rule. A common use of the phrase ANN model really means the definition of a class of such functions (where members of the class are obtained by varying parameters, connection weights, or specifics of the architecture such as the number of neurons or their connectivity).

2.4.3 | Network Function

The word network in the term 'artificial neural network' refers to the interconnections between the neurons in the different layers of each system. An example system has three layers. The first layer has input neurons, which send data via synapses to the second layer of neurons, and then via more synapses to the third layer of output neurons. More complex systems will have more layers of neurons, some having increased layers of input neurons and output neurons. The synapses store parameters called "weights" that manipulate the data in the calculations. An ANN is typically defined by three types of parameters:
• The interconnection pattern between different layers of neurons
• The learning process for updating the weights of the interconnections
• The activation function that converts a neuron's weighted input to its output activation.

Mathematically, a neuron's network function f(x) is defined as a composition of other functions g_i(x), which can further be defined as a composition of other functions. This can be conveniently represented as a network structure, with arrows depicting the dependencies between variables. A widely used type of composition is the nonlinear weighted sum, $f(x) = K\left(\sum_i w_i g_i(x)\right)$, where K (commonly referred to as the activation function) is some predefined function, such as the hyperbolic tangent.
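The nonlinear weighted sum can be illustrated with a short Python sketch, assuming the usual form in which a neuron outputs K(Σ w_i g_i) with K the hyperbolic tangent. The function names `neuron` and `layer` and the example weights are illustrative only and are not taken from the project.

```python
import math


def neuron(inputs, weights, activation=math.tanh):
    """Output of one neuron: activation applied to the weighted sum of inputs."""
    weighted_sum = sum(w * g for w, g in zip(weights, inputs))
    return activation(weighted_sum)


def layer(inputs, weight_rows, activation=math.tanh):
    """A layer is a collection of neurons that all see the same inputs."""
    return [neuron(inputs, row, activation) for row in weight_rows]


# Hypothetical example: 2 inputs feeding a layer of 3 neurons.
hidden = layer([1.0, 2.0], [[0.5, -0.25], [0.1, 0.1], [-0.3, 0.2]])
```

Each neuron in a layer is computed independently of the others, which is what makes a parallel implementation natural.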
It will be convenient in the following to refer to a collection of functions g_i as simply a vector g = (g_1, g_2, …, g_n).

2.4.4 | ANN dependency graph

This figure depicts such a decomposition of f, with dependencies between variables indicated by arrows. These can be interpreted in two ways. The first view is the functional view: the input x is transformed into a 3-dimensional vector h, which is then transformed into a 2-dimensional vector g, which is finally transformed into f. This view is most commonly encountered in the context of optimization.
The second view is the probabilistic view: the random variable F = f(G) depends upon the random variable G = g(H), which depends upon H = h(X), which depends upon the random variable X. This view is most commonly encountered in the context of graphical models. The two views are largely equivalent. In either case, for this particular network architecture, the components of individual layers are independent of each other (e.g., the components of g are independent of each other given their input h). This naturally enables a degree of parallelism in the implementation.

Two separate depictions of the recurrent ANN dependency graph.

Networks such as the previous one are commonly called feed-forward, because their graph is a directed acyclic graph. Networks with cycles are commonly called recurrent. Such networks are commonly depicted in the manner shown at the top of the figure, where f is shown as being dependent upon itself. However, an implied temporal dependence is not shown.

2.4.5 | Learning

What has attracted the most interest in neural networks is the possibility of learning. Given a specific task to solve and a class of functions F, learning means using a set of observations to find f* ∈ F which solves the task in some optimal sense. This entails defining a cost function C : F → ℝ such that, for the optimal solution f*, C(f*) ≤ C(f) for all f ∈ F, i.e., no solution has a cost less than the cost of the optimal solution (see Mathematical optimization). The cost function C is an important concept in learning, as it is a measure of how far away a particular solution is from an optimal solution to the problem to be solved. Learning algorithms search through the solution space to find a function that has the smallest possible cost. For applications where the solution is dependent on some data, the cost must necessarily be a function of the observations; otherwise we would not be modeling anything related to the data. It is frequently defined as a statistic to which only approximations can be made.
As a simple example, consider the problem of finding the model f which minimizes $C = E\left[(f(x) - y)^2\right]$, for data pairs (x, y) drawn from some distribution D. In practical situations we would only have N samples from D and thus, for the above example, we would only minimize $\hat{C} = \frac{1}{N} \sum_{i=1}^{N} (f(x_i) - y_i)^2$. Thus, the cost is minimized over a sample of the data rather than the entire data set.
When N → ∞, some form of online machine learning must be used, where the cost is partially minimized as each new example is seen. While online machine learning is often used when D is fixed, it is most useful in the case where the distribution changes slowly over time. In neural network methods, some form of online machine learning is frequently used for finite datasets.

2.4.6 | Choosing a cost function

While it is possible to define some arbitrary, ad hoc cost function, frequently a particular cost will be used, either because it has desirable properties (such as convexity) or because it arises naturally from a particular formulation of the problem (e.g., in a probabilistic formulation the posterior probability of the model can be used as an inverse cost). Ultimately, the cost function will depend on the desired task. An overview of the three main categories of learning tasks is provided below.

2.4.7 | Learning paradigms

There are three major learning paradigms, each corresponding to a particular abstract learning task. These are supervised learning, unsupervised learning and reinforcement learning.

2.4.8 | Supervised learning

In supervised learning, we are given a set of example pairs (x, y), x ∈ X, y ∈ Y, and the aim is to find a function f : X → Y in the allowed class of functions that matches the examples. In other words, we wish to infer the mapping implied by the data; the cost function is related to the mismatch between our mapping and the data, and it implicitly contains prior knowledge about the problem domain. A commonly used cost is the mean-squared error, which tries to minimize the average squared error between the network's output, f(x), and the target value y over all the example pairs. When one tries to minimize this cost using gradient descent for the class of neural networks called multilayer perceptrons, one obtains the common and well-known back-propagation algorithm for training neural networks.
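To make the idea of minimizing the mean-squared error by gradient descent concrete, the sketch below fits a single linear neuron f(x) = w·x + b to a handful of example pairs. It is a deliberately simplified stand-in and not back-propagation through a multilayer perceptron; the function names, the learning rate, and the toy data are all assumptions made for the example.

```python
def mse(w, b, data):
    """Mean-squared error of f(x) = w*x + b over the example pairs."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def fit(data, lr=0.05, steps=2000):
    """Plain batch gradient descent on the MSE cost."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        # Derivatives of the cost with respect to each parameter.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        # Move the parameters against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy example pairs generated by y = 2x + 1 (hypothetical data).
pairs = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w_fit, b_fit = fit(pairs)
```

In a multilayer network the same update rule is used, but the derivatives are obtained by propagating the output error backwards through the layers.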
Tasks that fall within the paradigm of supervised learning are pattern recognition (also known as classification) and regression (also known as function approximation). The supervised learning paradigm is also applicable to sequential
data (e.g., for speech and gesture recognition). This can be thought of as learning with a "teacher," in the form of a function that provides continuous feedback on the quality of solutions obtained thus far.

2.4.9 | Unsupervised learning

In unsupervised learning, some data x is given, together with the cost function to be minimized, which can be any function of the data x and the network's output f. The cost function is dependent on the task (what we are trying to model) and our a priori assumptions (the implicit properties of our model, its parameters and the observed variables). As a trivial example, consider the model f(x) = a, where a is a constant, and the cost $C = E\left[(x - f(x))^2\right]$. Minimizing this cost will give us a value of a that is equal to the mean of the data. The cost function can be much more complicated. Its form depends on the application: for example, in compression it could be related to the mutual information between x and f(x), whereas in statistical modeling it could be related to the posterior probability of the model given the data. (Note that in both of those examples those quantities would be maximized rather than minimized.) Tasks that fall within the paradigm of unsupervised learning are in general estimation problems; the applications include clustering, the estimation of statistical distributions, compression and filtering.

2.4.10 | Reinforcement learning

In reinforcement learning, data x are usually not given, but generated by an agent's interactions with the environment. At each point in time t, the agent performs an action y_t and the environment generates an observation x_t and an instantaneous cost c_t, according to some (usually unknown) dynamics. The aim is to discover a policy for selecting actions that minimizes some measure of a long-term cost, i.e., the expected cumulative cost. The environment's dynamics and the long-term cost for each policy are usually unknown, but can be estimated.
More formally, the environment is modeled as a Markov decision process (MDP) with states s_1, …, s_n ∈ S and actions a_1, …, a_m ∈ A with the following probability distributions: the instantaneous cost distribution P(c_t | s_t), the observation distribution P(x_t | s_t) and the transition P(s_{t+1} | s_t, a_t), while a policy is defined as a conditional distribution over actions given the observations. Taken together, the two define a Markov chain (MC). The aim is to
discover the policy that minimizes the cost, i.e., the MC for which the cost is minimal. ANNs are frequently used in reinforcement learning as part of the overall algorithm. Dynamic programming has been coupled with ANNs (neuro-dynamic programming) by Bertsekas and Tsitsiklis and applied to multi-dimensional nonlinear problems, such as those involved in vehicle routing or natural resources management, because of the ability of ANNs to mitigate losses of accuracy even when reducing the discretization grid density for numerically approximating the solution of the original control problems. Tasks that fall within the paradigm of reinforcement learning are control problems, games and other sequential decision-making tasks.

2.4.11 | Learning algorithms

Training a neural network model essentially means selecting one model from the set of allowed models (or, in a Bayesian framework, determining a distribution over the set of allowed models) that minimizes the cost criterion. There are numerous algorithms available for training neural network models; most of them can be viewed as a straightforward application of optimization theory and statistical estimation. Most of the algorithms used in training artificial neural networks employ some form of gradient descent. This is done by simply taking the derivative of the cost function with respect to the network parameters and then changing those parameters in a gradient-related direction. Evolutionary methods, simulated annealing, expectation-maximization, non-parametric methods and particle swarm optimization are some other commonly used methods for training neural networks.

2.4.12 | Employing artificial neural networks

Perhaps the greatest advantage of ANNs is their ability to be used as an arbitrary function approximation mechanism that 'learns' from observed data. However, using them is not so straightforward, and a relatively good understanding of the underlying theory is essential.
Choice of model: This will depend on the data representation and the application. Overly complex models tend to lead to problems with learning. 28
Learning algorithm: There are numerous trade-offs between learning algorithms. Almost any algorithm will work well with the correct hyperparameters for training on a particular fixed data set. However, selecting and tuning an algorithm for training on unseen data requires a significant amount of experimentation.

Robustness: If the model, cost function and learning algorithm are selected appropriately, the resulting ANN can be extremely robust. With the correct implementation, ANNs can be used naturally in online learning and large data set applications. Their simple implementation and the existence of mostly local dependencies exhibited in the structure allow for fast, parallel implementations in hardware.

2.4.13 | Applications

The utility of artificial neural network models lies in the fact that they can be used to infer a function from observations. This is particularly useful in applications where the complexity of the data or task makes the design of such a function by hand impractical.

2.4.13.1 | Real-life applications

The tasks artificial neural networks are applied to tend to fall within the following broad categories:
• Function approximation, or regression analysis, including time series prediction, fitness approximation and modeling.
• Classification, including pattern and sequence recognition, novelty detection and sequential decision making.
• Data processing, including filtering, clustering, blind source separation and compression.
• Robotics, including directing manipulators and computer numerical control.

Application areas include system identification and control (vehicle control, process control, natural resources management), quantum chemistry, game-playing and decision making (backgammon, chess, poker), pattern recognition (radar systems, face identification, object recognition and more), sequence recognition (gesture, speech, handwritten text recognition), medical diagnosis, financial
applications (automated trading systems), data mining (or knowledge discovery in databases, "KDD"), visualization and e-mail spam filtering.

Artificial neural networks have also been used to diagnose several cancers. An ANN-based hybrid lung cancer detection system named HLND improves the accuracy of diagnosis and the speed of lung cancer radiology. These networks have also been used to diagnose prostate cancer. The diagnoses can be used to make specific models taken from a large group of patients compared to information of one given patient. The models do not depend on assumptions about correlations of different variables. Colorectal cancer has also been predicted using neural networks. Neural networks could predict the outcome for a patient with colorectal cancer with much more accuracy than the current clinical methods. After training, the networks could predict multiple patient outcomes from unrelated institutions.

2.4.13.2 | Neural networks and neuroscience

Theoretical and computational neuroscience is the field concerned with the theoretical analysis and computational modeling of biological neural systems. Since neural systems are intimately related to cognitive processes and behavior, the field is closely related to cognitive and behavioral modeling. The aim of the field is to create models of biological neural systems in order to understand how biological systems work. To gain this understanding, neuroscientists strive to make a link between observed biological processes (data), biologically plausible mechanisms for neural processing and learning (biological neural network models) and theory (statistical learning theory and information theory).

2.4.14 | Types of models

Many models are used in the field, defined at different levels of abstraction and modeling different aspects of neural systems.
They range from models of the short-term behavior of individual neurons, through models of how the dynamics of neural circuitry arise from interactions between individual neurons, to models of how behavior can arise from abstract neural modules that represent complete subsystems. These include models of the long-term and short-term plasticity of neural systems and their relations to learning and memory, from the individual neuron to the system level.
2.4.15 | Neural network software

Neural network software is used to simulate, research, develop and apply artificial neural networks, biological neural networks and, in some cases, a wider array of adaptive systems.

2.4.16 | Types of artificial neural networks

Artificial neural network types vary from those with only one or two layers of single-direction logic to complicated multi-input, many-directional feedback loops and layers. On the whole, these systems use algorithms in their programming to determine control and organization of their functions. Some may be as simple as a one-neuron layer with an input and an output, and others can mimic complex systems such as dANN, which can mimic chromosomal DNA through sizes at cellular level, into artificial organisms, and simulate reproduction, mutation and population sizes. Most systems use "weights" to change the parameters of the throughput and the varying connections to the neurons. Artificial neural networks can be autonomous and learn by input from outside "teachers" or even by self-teaching from written-in rules.

2.4.17 | Confidence analysis of a neural network

Supervised neural networks that use an MSE cost function can use formal statistical methods to determine the confidence of the trained model. The MSE on a validation set can be used as an estimate for variance. This value can then be used to calculate the confidence interval of the output of the network, assuming a normal distribution. A confidence analysis made this way is statistically valid as long as the output probability distribution stays the same and the network is not modified. By assigning a softmax activation function on the output layer of the neural network (or a softmax component in a component-based neural network) for categorical target variables, the outputs can be interpreted as posterior probabilities. This is very useful in classification, as it gives a certainty measure on classifications.
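Both ideas in this section can be sketched briefly. The code below assumes the standard softmax definition and a two-sided 95% interval under a normal distribution (z = 1.96); the function names, the example scores, and the choice of the 95% level are assumptions for illustration.

```python
import math


def softmax(scores):
    """Turn raw output-layer scores into values that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def confidence_interval(output, validation_mse, z=1.96):
    """Interval around a network output, using the validation MSE as variance."""
    half_width = z * math.sqrt(validation_mse)
    return output - half_width, output + half_width


# Hypothetical raw scores for three word categories.
probs = softmax([2.0, 1.0, 0.1])
best = probs.index(max(probs))
```

The largest softmax output gives the predicted category, and its value can be read as how certain the network is about that choice.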
CHAPTER 3 Image Processing
Chapter 3 | Image Processing

3.1 | INTRODUCTION

This chapter is an introduction to handling images in MATLAB. When working with images in MATLAB, there are many things to keep in mind, such as loading an image, using the right format, saving the data as different data types, displaying an image, converting between different image formats, etc. This worksheet presents some of the commands designed for these operations. Most of these commands require you to have the Image Processing Toolbox installed with MATLAB. To find out if it is installed, type ver at the MATLAB prompt. This gives you a list of the toolboxes that are installed on your system. For further reference on image handling in MATLAB, you are recommended to use MATLAB's help browser. There is an extensive (and quite good) on-line manual for the Image Processing Toolbox that you can access via MATLAB's help browser. The first sections of this worksheet are quite heavy. The only way to understand how the presented commands work is to carefully work through the examples given at the end of the worksheet. Once you can get these examples to work, experiment on your own using your favorite image!

3.1.1 | What Is Digital Image Processing?

Transforming digital information representing images.

3.1.2 | Motivating Problems:

1. Improve pictorial information for human interpretation.
2. Remove noise.
3. Correct for motion, camera position, and distortion.
4. Enhance by changing contrast, color.
5. Segmentation: dividing an image up into constituent parts.
6. Representation: representing an image by some more abstract models.
7. Classification.
8. Reduce the size of image information for efficient handling.
9. Compression with loss of digital information that minimizes loss of "perceptual" information. JPEG and GIF, MPEG.
  45. 45. 3.2 | COLOR VISION
The color-responsive chemicals in the cones are called cone pigments and are very similar to the chemicals in the rods. The retinal portion of the chemical is the same; however, the scotopsin is replaced with photopsins. Therefore, the color-responsive pigments are made of retinal and photopsins. There are three kinds of color-sensitive pigments:
• Red-sensitive pigment
• Green-sensitive pigment
• Blue-sensitive pigment
Each cone cell has one of these pigments so that it is sensitive to that color. The human eye can sense almost any gradation of color when red, green and blue are mixed. The peak absorbance of the blue-sensitive pigment is at 445 nanometers, that of the green-sensitive pigment at 535 nanometers, and that of the red-sensitive pigment at 570 nanometers.
MATLAB stores most images as two-dimensional arrays (i.e., matrices), in which each element of the matrix corresponds to a single pixel in the displayed image. For example, an image composed of 200 rows and 300 columns of different colored dots would be stored in MATLAB as a 200-by-300 matrix. Some images, such as RGB, require a three-dimensional array, where the first plane in the third dimension represents the red pixel intensities, the second plane represents the green pixel intensities, and the third plane represents the blue pixel intensities. To reduce memory requirements, MATLAB supports storing image data in arrays of class uint8 and uint16. The data in these arrays is stored as 8-bit or 16-bit unsigned integers. These arrays require one-eighth or one-fourth, respectively, as much memory as data in double arrays. An image whose data matrix has class uint8 is called an 8-bit image; an image whose data matrix has class uint16 is called a 16-bit image.
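The memory saving described above is simple arithmetic, sketched here in plain Python for the 200-by-300 example. The helper name image_bytes is hypothetical; the only facts used are that a double occupies 8 bytes, a uint16 2 bytes and a uint8 1 byte per element.

```python
# Bytes needed per matrix element for each storage class.
BYTES_PER_ELEMENT = {"double": 8, "uint16": 2, "uint8": 1}

def image_bytes(rows, cols, planes=1, cls="double"):
    """Raw storage for an image matrix (planes=3 for an RGB image)."""
    return rows * cols * planes * BYTES_PER_ELEMENT[cls]

for cls in ("double", "uint16", "uint8"):
    print(cls, image_bytes(200, 300, cls=cls), "bytes")

# An RGB image needs three planes of the same size.
print("RGB uint8", image_bytes(200, 300, planes=3, cls="uint8"), "bytes")
```

For the 200-by-300 matrix this gives 480000 bytes as double, 120000 as uint16 and 60000 as uint8, which matches the one-fourth and one-eighth ratios quoted in the text.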
3.2.1 | Fundamentals
A digital image is composed of pixels, which can be thought of as small dots on the screen. A digital image is an instruction of how to color each pixel. We will see in detail later on how this is done in practice. A typical size of an image is 512-by-512 pixels. Later on in the course you will see that it is convenient to let the dimensions of the image be a power of 2; for example, 2^9 = 512. In the general case we say that an image is of size m-by-n if it is composed of m pixels in the vertical direction and n pixels in the horizontal direction.
  46. 46. Let us say that we have an image of size 512-by-1024 pixels. This means that the data for the image must contain information about 524288 pixels, which requires a lot of memory! Hence, compressing images is essential for efficient image processing. You will later on see how Fourier analysis and wavelet analysis can help us to compress an image significantly. There are also a few "computer scientific" tricks (for example entropy coding) to reduce the amount of data required to store an image.
3.2.2 | Image Formats Supported By Matlab
The following image formats are supported by Matlab:
• BMP
• HDF
• JPEG
• PCX
• TIFF
• XWD
Most images you find on the Internet are JPEG images; JPEG is the name of one of the most widely used compression standards for images. If you have stored an image, you can usually see from the suffix what format it is stored in. For example, an image named myimage.jpg is stored in the JPEG format, and we will see later on that we can load an image of this format into Matlab.
3.2.3 | Working Formats In Matlab:
If an image is stored as a JPEG image on your disc, we first read it into Matlab. However, in order to start working with the image, for example to perform a wavelet transform on it, we must convert it into a different format. Section 3.4 explains the common working formats.
3.3 | ASPECTS OF IMAGE PROCESSING
  47. 47. Image Enhancement: Processing an image so that the result is more suitable for a particular application (sharpening or deblurring an out-of-focus image, highlighting edges, improving image contrast, brightening an image, removing noise).
Image Restoration: This may be considered as reversing the damage done to an image by a known cause (removal of blur caused by linear motion, removal of optical distortions).
Image Segmentation: This involves subdividing an image into constituent parts, or isolating certain aspects of an image (finding lines, circles, or particular shapes in an image; in an aerial photograph, identifying cars, trees, buildings, or roads).
3.4 | IMAGE TYPES
3.4.1 | Intensity Image (Gray Scale Image)
This is the equivalent of a "gray scale image", and it is the image type we will mostly work with in this course. It represents an image as a matrix where every element has a value corresponding to how bright or dark the pixel at the corresponding position should be colored. There are two ways to represent the number that gives the brightness of the pixel:
• The double class (or data type). This assigns a floating-point number ("a number with decimals") between 0 and 1 to each pixel. The value 0 corresponds to black and the value 1 corresponds to white.
• The uint8 class, which assigns an integer between 0 and 255 to represent the brightness of a pixel. The value 0 corresponds to black and 255 to white.
The class uint8 requires only roughly 1/8 of the storage of the class double. On the other hand, many mathematical functions can only be applied to the double class. We will see later how to convert between double and uint8.
Fig. (3.1)
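The relation between the two intensity representations in Section 3.4.1 can be sketched in a few lines of Python. This is an illustration of the scaling only (0 to 1 on the double side, 0 to 255 on the uint8 side); the function names to_uint8 and to_double are hypothetical, and images are represented as plain nested lists.

```python
def to_uint8(gray_double):
    """Scale brightness values in [0, 1] to integers in 0..255.
    Values outside [0, 1] are clipped first."""
    result = []
    for row in gray_double:
        new_row = []
        for v in row:
            clipped = min(1.0, max(0.0, v))
            new_row.append(int(clipped * 255 + 0.5))  # round to nearest
        result.append(new_row)
    return result

def to_double(gray_uint8):
    """Scale integers in 0..255 back to floating-point values in [0, 1]."""
    return [[v / 255 for v in row] for row in gray_uint8]

image = [[0.0, 0.25], [0.75, 1.0]]
as_uint8 = to_uint8(image)
print(as_uint8)
print(to_double(as_uint8))
```

Going from double to uint8 loses precision, because only 256 brightness levels are available; going back returns values close to, but not always identical to, the originals.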
  48. 48. 3.4.2 | Binary Image:
This image format also stores an image as a matrix but can only color a pixel black or white (and nothing in between). It assigns a 0 for black and a 1 for white.
3.4.3 | Indexed Image:
This is a practical way of representing color images. (In this course we will mostly work with gray scale images, but once you have learned how to work with a gray scale image you will also know the principle of how to work with color images.) An indexed image stores an image as two matrices. The first matrix has the same size as the image and holds one number for each pixel. The second matrix is called the color map, and its size may be different from that of the image. Each number in the first matrix is an instruction of which entry in the color map matrix to use.
Fig. (3.2)
3.4.4 | RGB Image
This is another format for color images. It represents an image with three matrices of sizes matching the image format. Each matrix corresponds to one of the colors red, green or blue and gives an instruction of how much of each of these colors a certain pixel should use.
3.4.5 | Multi-frame Image:
In some applications we want to study a sequence of images. This is very common in biological and medical imaging, where you might study a sequence of slices of a cell. For these cases, the multi-frame format is a convenient way of working with a sequence of images.
  49. 49. In case you choose to work with biological imaging later on in this course, you may use this format.
3.5 | HOW TO?
3.5.1 | How To Convert Between Different Formats:
The following table shows how to convert between the different formats given above. All these commands require the Image Processing Toolbox!
Table (3.1): Image format conversion (within the parentheses you type the name of the image you wish to convert)
Operation | Matlab command
Convert from intensity/indexed/RGB format to binary format | dither()
Convert from intensity format to indexed format | gray2ind()
Convert from indexed format to intensity format | ind2gray()
Convert from indexed format to RGB format | ind2rgb()
Convert a regular matrix to intensity format by scaling | mat2gray()
Convert from RGB format to intensity format | rgb2gray()
Convert from RGB format to indexed format | rgb2ind()
The command mat2gray is useful if you have a matrix representing an image but the values representing the gray scale range between, let's say, 0 and 1000. The command mat2gray automatically rescales all entries so that they fall between 0 and 1 (class double); the result can then be converted to the range 0 to 255 with im2uint8 if the uint8 class is needed.
3.5.2 | How to Read Files
When you encounter an image you want to work with, it is usually in the form of a file (for example, if you download an image from the web, it is usually stored as a JPEG file). Once we are done processing an image, we may want to write it back to a JPEG file so that we can, for example, post the processed image on the web. This is done using the imread and imwrite commands. These commands require the Image Processing Toolbox!
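Looking back at the conversions of Table (3.1), the ideas behind three of them can be sketched in plain Python. These are simplified illustrations, not the toolbox implementations: the function names are hypothetical, images are nested lists, the gray weights are the commonly used luma coefficients (an assumption here), and color map indices are 0-based for simplicity.

```python
def rescale_to_unit(matrix):
    """Idea behind mat2gray: map the smallest entry to 0.0 and the largest to 1.0."""
    flat = [v for row in matrix for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # constant matrix: return zeros (a choice made for this sketch)
        return [[0.0 for _ in row] for row in matrix]
    return [[(v - lo) / (hi - lo) for v in row] for row in matrix]

def rgb_to_gray(rgb):
    """Idea behind rgb2gray: a weighted sum of the red, green and blue values."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]

def indexed_to_rgb(index_matrix, color_map):
    """Idea behind ind2rgb: look up each pixel's index in the color map."""
    return [[color_map[i] for i in row] for row in index_matrix]

print(rescale_to_unit([[0, 500], [1000, 250]]))
print(rgb_to_gray([[(1.0, 0.0, 0.0), (1.0, 1.0, 1.0)]]))
print(indexed_to_rgb([[0, 1], [1, 0]], [(0, 0, 0), (1, 1, 1)]))
```

The first call rescales the 0 to 1000 range mentioned in the text to the 0 to 1 range of the double class.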
  50. 50. Table (3.2): Reading and writing image files
Operation | Matlab command
Read an image. (Within the parentheses you type the name of the image file you wish to read. Put the file name within single quotes.) | imread()
Write an image to a file. (As the first argument within the parentheses you type the name of the image you have worked with. As the second argument you type the name of the file and format that you want to write the image to. Put the file name within single quotes.) | imwrite()
Make sure to end these commands with a semicolon (;), otherwise you will get LOTS of numbers scrolling on your screen. The commands imread and imwrite support the formats given in the section "Image Formats Supported By Matlab" above.
3.5.3 | Loading And Saving Variables in Matlab
This section explains how to load and save variables in Matlab. Once you have read a file, you probably convert it into an intensity image (a matrix) and work with this matrix. Once you are done, you may want to save the matrix representing the image in order to continue working with it at another time. This is easily done using the commands save and load. Note that save and load are commonly used Matlab commands and work independently of which toolboxes are installed.
Table (3.3): Loading and saving variables
Operation | Matlab command
Save the variable X | save X
Load the variable X | load X
3.5.4 | How to Display an Image in MATLAB
Here are a couple of basic Matlab commands (which do not require any toolbox) for displaying an image.
  51. 51. Table (3.4): Displaying an image given in matrix form
Operation | Matlab command
Display an image represented as the matrix X | imagesc(X)
Adjust the brightness. s is a parameter such that -1<s<0 gives a darker image and 0<s<1 gives a brighter image | brighten(s)
Change the colors to gray | colormap(gray)
Sometimes your image may not be displayed in gray scale even though you have converted it into a gray scale image. You can then use the command colormap(gray) to "force" Matlab to use a gray scale when displaying an image. If you are using Matlab with the Image Processing Toolbox installed, we recommend using the command imshow to display an image.
Table (3.5): Displaying an image given in matrix form (with the Image Processing Toolbox)
Operation | Matlab command
Display an image represented as the matrix X | imshow(X)
Zoom in (using the left and right mouse buttons) | zoom on
Turn off the zoom function | zoom off
3.6 | SOME IMPORTANT DEFINITIONS
3.6.1 | Imread Function
A = imread(filename, fmt) reads a grayscale or true color image named filename into A. If the file contains a grayscale intensity image, A is a two-dimensional array. If the file contains a true color (RGB) image, A is a three-dimensional (m-by-n-by-3) array.
3.6.2 | Rotation
>> B = imrotate(A, ANGLE, METHOD)
where:
A: Your image.
ANGLE: The angle (in degrees) by which you want to rotate your image in the counterclockwise direction.
METHOD: A string naming the interpolation method, one of the values listed in Section 3.6.4 ('nearest', 'bilinear' or 'bicubic').
If you omit the METHOD argument, imrotate uses the default method of 'nearest'.
  52. 52. Note: to rotate the image clockwise, specify a negative angle. The returned image matrix B is, in general, larger than A so that it includes the whole rotated image. imrotate sets invalid values on the periphery of B to 0.
3.6.3 | Scaling
imresize resizes an image of any type using the specified interpolation method. The supported interpolation methods are listed below.
3.6.4 | Interpolation
• 'nearest': nearest neighbor interpolation (the default in older Matlab releases; recent releases default to bicubic, so it is safest to pass METHOD explicitly)
• 'bilinear': bilinear interpolation
• 'bicubic': bicubic interpolation
B = imresize(A, M, METHOD) returns an image that is M times the size of A. If M is between 0 and 1.0, B is smaller than A. If M is greater than 1.0, B is larger than A.
B = imresize(A, [MROWS MCOLS], METHOD) returns an image of size MROWS-by-MCOLS. If the specified size does not produce the same aspect ratio as the input image has, the output image is distorted.
>> a = imread('image.fmt'); % put your image in place of image.fmt
>> B = imresize(a, [100 100], 'nearest');
>> imshow(B);
>> B = imresize(a, [100 100], 'bilinear');
>> imshow(B);
>> B = imresize(a, [100 100], 'bicubic');
>> imshow(B);
3.7 | EDGE DETECTION
3.7.1 | Canny Edge Detector
The Canny edge detector is designed to meet three criteria:
1. Low error rate of detection: the results should match human perception well.
2. Good localization of edges: the distance between actual edges in an image and the edges found by a computational algorithm should be minimized.
3. Single response:
  53. 53. The algorithm should not return multiple edge pixels when only a single edge exists.
3.7.2 | Edge Detectors
Fig. (3.4), Fig. (3.5): edge detector outputs (panels: bw, color, Canny, Sobel)
3.7.3 | Edge Tracing
b = rgb2gray(a); % convert to gray; edge tracing works only on gray images
edge(b, 'prewitt');
edge(b, 'sobel');
edge(b, 'sobel', 'vertical');
edge(b, 'sobel', 'horizontal');
edge(b, 'sobel', 'both');
We can only do edge tracing using gray scale images (i.e., images without color).
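To make the 'sobel' option above less of a black box, here is a minimal Python sketch of the underlying idea: two 3-by-3 masks estimate the horizontal and vertical brightness change, and a pixel is marked as an edge when the gradient magnitude exceeds a threshold. It is a simplified illustration, not the toolbox implementation (Matlab's edge function does more, for example choosing a threshold when none is given); the function name and the fixed threshold are assumptions.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # responds to vertical edges
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # responds to horizontal edges

def sobel_edges(gray, threshold):
    """Return a binary edge map (1 = edge) for a gray image given as nested lists.
    Border pixels are left at 0 because the 3-by-3 mask does not fit there."""
    rows, cols = len(gray), len(gray[0])
    edges = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = gy = 0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    pixel = gray[r + dr][c + dc]
                    gx += SOBEL_X[dr + 1][dc + 1] * pixel
                    gy += SOBEL_Y[dr + 1][dc + 1] * pixel
            if math.hypot(gx, gy) > threshold:
                edges[r][c] = 1
    return edges

# Test image: dark left half, bright right half (one vertical edge).
image = [[0, 0, 0, 1, 1, 1] for _ in range(5)]
for row in sobel_edges(image, threshold=1):
    print(row)
```

Using only SOBEL_X or only SOBEL_Y in the magnitude corresponds to the 'vertical' and 'horizontal' options, and using both corresponds to 'both'.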
  54. 54. >> BW = rgb2gray(A);
>> edge(BW, 'prewitt')
Fig. (3.6): the resulting edge image.
>> edge(BW, 'sobel', 'vertical')
>> edge(BW, 'sobel', 'horizontal')
>> edge(BW, 'sobel', 'both')
Table (3.6): Data types
Type | Description | Range
int8 | 8-bit integer | -128 to 127
uint8 | 8-bit unsigned integer | 0 to 255
int16 | 16-bit integer | -32768 to 32767
double | Double precision real number | Machine specific
3.8 | MAPPING
3.8.1 | Mapping Images onto Surfaces Overview
  55. 55. Mapping an image onto geometry, also known as texture mapping, involves overlaying an image or function onto a geometric surface. Images may be realistic, such as satellite images, or representational, such as color-coded functions of temperature or elevation. Unlike volume visualizations, which render each voxel (volume element) of a three-dimensional scene, mapping an image onto geometry efficiently creates the appearance of complexity by simply layering an image onto a surface. The resulting realism of the display also provides information that is not as readily apparent as with a simple display of either the image or the geometric surface.
Mapping an image onto a geometric surface is a two-step process. First, the image is mapped onto the geometric surface in object space. Second, the surface undergoes view transformations (relating to the viewpoint of the observer) and is then displayed in 2D screen space. You can use IDL Direct Graphics or Object Graphics to display images mapped onto geometric surfaces. The following table introduces the tasks and routines.
Table (3.7): Tasks and Routines Associated with Mapping an Image onto Geometry
Routine(s)/Object(s) | Description
SHADE_SURF | Display the elevation data.
IDLgrWindow::Init, IDLgrView::Init, IDLgrModel::Init | Initialize the objects necessary for an Object Graphics display.
IDLgrSurface::Init | Initialize a surface object containing the elevation data.
IDLgrImage::Init | Initialize an image object containing the satellite image.
XOBJVIEW | Display the object in an interactive IDL utility allowing rotation and resizing.
3.8.2 | Mapping an Image onto Elevation Data
The following Object Graphics example maps a satellite image of the Los Angeles, California vicinity onto a DEM (Digital Elevation Model) containing the area's topographical features. The realism resulting from mapping the image onto the corresponding elevation data provides a more informative view of the area's topography.
The process is segmented into the following three sections:
• "Opening Image and Geometry Files"
• "Initializing the IDL Display Objects"
• "Displaying the Image and Geometric Surface Objects"
  56. 56. Note: Data can be either regularly gridded (defined by a 2D array) or irregularly gridded (defined by irregular x, y, z points). Both the image and elevation data used in this example are regularly gridded. If you are dealing with irregularly gridded data, use GRIDDATA to map the data to a regular grid.
Complete the following steps for a detailed description of the process.
Example Code: See elevation_object.pro in the examples/doc/image subdirectory of the IDL installation directory for code that duplicates this example. Run the example procedure by entering elevation_object at the IDL command prompt, or view the file in an IDL Editor window by entering .EDIT elevation_object.pro.
Opening Image and Geometry Files:
The following steps read in the satellite image and DEM files and display the elevation data.
1. Select the satellite image:
imageFile = FILEPATH('elev_t.jpg', $
   SUBDIRECTORY = ['examples', 'data'])
2. Import the JPEG file:
READ_JPEG, imageFile, image
3. Select the DEM file:
demFile = FILEPATH('elevbin.dat', $
   SUBDIRECTORY = ['examples', 'data'])
4. Define an array for the elevation data, open the file, read in the data and close the file:
dem = READ_BINARY(demFile, DATA_DIMS = [64, 64])
5. Enlarge the size of the elevation array for display purposes:
dem = CONGRID(dem, 128, 128, /INTERP)
6. To quickly visualize the elevation data before continuing on to the Object Graphics section, initialize the display, create a window and display the elevation data using the SHADE_SURF command:
DEVICE, DECOMPOSED = 0
  57. 57. WINDOW, 0, TITLE = 'Elevation Data'
SHADE_SURF, dem
Fig. (3.7): Visual Display of the Elevation Data
After reading in the satellite image and DEM data, continue with the next section to create the objects necessary to map the satellite image onto the elevation surface.
3.8.3 | Initializing the IDL Display Objects
After reading in the image and surface data in the previous steps, you will need to create objects containing the data. When creating an IDL Object Graphics display, it is necessary to create a window object (oWindow), a view object (oView) and a model object (oModel). These display objects, shown in the conceptual representation in the following figure, will contain a geometric surface object (the DEM data) and an image object (the satellite image). These user-defined objects are instances of existing IDL object classes and provide access to the properties and methods associated with each object class.
  58. 58. Note: The XOBJVIEW utility (described in "Mapping an Image Object onto a Sphere") automatically creates window and view objects.
Complete the following steps to initialize the necessary IDL objects.
1. Initialize the window, view and model display objects. For detailed syntax, arguments and keywords available with each object initialization, see IDLgrWindow::Init, IDLgrView::Init and IDLgrModel::Init. The following three lines use the basic syntax oNewObject = OBJ_NEW('Class_Name') to create these objects:
oWindow = OBJ_NEW('IDLgrWindow', RETAIN = 2, COLOR_MODEL = 0)
oView = OBJ_NEW('IDLgrView')
oModel = OBJ_NEW('IDLgrModel')
2. Assign the elevation surface data, dem, to an IDLgrSurface object. The IDLgrSurface::Init keyword STYLE = 2 draws the elevation data using a filled line style:
oSurface = OBJ_NEW('IDLgrSurface', dem, STYLE = 2)
3. Assign the satellite image to a user-defined IDLgrImage object using IDLgrImage::Init:
oImage = OBJ_NEW('IDLgrImage', image, INTERLEAVE = 0, $
   /INTERPOLATE)
INTERLEAVE = 0 indicates that the satellite image is organized using pixel interleaving, and therefore has the dimensions (3, m, n). The INTERPOLATE keyword forces bilinear interpolation instead of the default nearest neighbor interpolation method.
3.8.4 | Displaying the Image and Geometric Surface Objects
This section displays the objects created in the previous steps. The image and surface objects will first be displayed in an IDL Object Graphics window and then with the interactive XOBJVIEW utility.
  59. 59. 1. Center the elevation surface object in the display window. The default object graphics coordinate system is [-1,-1], [1,1]. To center the object in the window, position the lower left corner of the surface data at [-0.5,-0.5,-0.5] for the x, y and z dimensions:
oSurface -> GETPROPERTY, XRANGE = xr, YRANGE = yr, $
   ZRANGE = zr
xs = NORM_COORD(xr)
xs[0] = xs[0] - 0.5
ys = NORM_COORD(yr)
ys[0] = ys[0] - 0.5
zs = NORM_COORD(zr)
zs[0] = zs[0] - 0.5
oSurface -> SETPROPERTY, XCOORD_CONV = xs, $
   YCOORD_CONV = ys, ZCOORD_CONV = zs
2. Map the satellite image onto the geometric elevation surface using the IDLgrSurface::Init TEXTURE_MAP keyword:
oSurface -> SetProperty, TEXTURE_MAP = oImage, $
   COLOR = [255, 255, 255]
For clearest display of the texture map, set COLOR = [255, 255, 255]. If the image does not have dimensions that are exact powers of 2, IDL resamples the image into a larger size whose dimensions are the next powers of two greater than the original dimensions. This resampling may cause unwanted sampling artifacts. In this example, the image does have dimensions that are exact powers of two, so no resampling occurs.
Note: If your texture does not have dimensions that are exact powers of 2 and you do not want to introduce resampling artifacts, you can pad the texture with unused data to a power of two and tell IDL to map only a subset of the texture onto the surface. For example, if your image is 40 by 40, create a 64 by 64 image and fill part of it with the image data:
textureImage = BYTARR(64, 64, /NOZERO)
textureImage[0:39, 0:39] = image ; image is 40 by 40
oImage = OBJ_NEW('IDLgrImage', textureImage)
Then, construct texture coordinates that map the active part of the texture to a surface (oSurface):
textureCoords = [[], [], [], []]
  60. 60. oSurface -> SetProperty, TEXTURE_COORD = textureCoords
The surface object in IDL 5.6 has been enhanced to perform the above calculation automatically. In the above example, just use the image data (the 40 by 40 array) to create the image texture and do not supply texture coordinates. IDL computes the appropriate texture coordinates to correctly use the 40 by 40 image.
Note: Some graphic devices have a limit for the maximum texture size. If your texture is larger than the maximum size, IDL scales it down to dimensions that work on the device. This rescaling may introduce resampling artifacts and loss of detail in the texture. To avoid this, use the TEXTURE_HIGHRES keyword to tell IDL to draw the surface in smaller pieces that can be texture mapped without loss of detail.
3. Add the surface object, covered by the satellite image, to the model object. Then add the model to the view object:
oModel -> Add, oSurface
oView -> Add, oModel
4. Rotate the model for better display in the object window. Without rotating the model, the surface is displayed at a 90° elevation angle, containing no depth information. The following lines rotate the model 90° away from the viewer along the x-axis and 30° clockwise along the y-axis and the x-axis:
oModel -> ROTATE, [1, 0, 0], -90
oModel -> ROTATE, [0, 1, 0], 30
oModel -> ROTATE, [1, 0, 0], 30
5. Display the result in the Object Graphics window:
oWindow -> Draw, oView
Fig. (3.9): Image Mapped onto a Surface in an Object Graphics Window
  61. 61. 6. Display the results using XOBJVIEW, setting SCALE = 1 (instead of the default value of 1/SQRT(3)) to increase the size of the initial display:
XOBJVIEW, oModel, /BLOCK, SCALE = 1
This results in the following display:
Fig. (3.10): Displaying the Image Mapped onto the Surface in XOBJVIEW
After displaying the model, you can rotate it by clicking in the application window and dragging your mouse. Select the magnify button, then click near the middle of the image. Drag your mouse away from the center of the display to magnify the image or toward the center of the display to shrink the image. Select the left-most button on the XOBJVIEW toolbar to reset the display.
7. Destroy unneeded object references after closing the display windows:
OBJ_DESTROY, [oView, oImage]
The oModel and oSurface objects are automatically destroyed when oView is destroyed. For an example of mapping an image onto a regular surface using both Direct and Object Graphics displays, see "Mapping an Image onto a Sphere".
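The power-of-two padding described in the texture note of this section (a 40 by 40 image placed in a 64 by 64 texture) can be sketched in Python as follows. This is not IDL code and not the original example's texture coordinates, which are not given in the text; the function names are hypothetical, and the returned fractions are only one plausible way to express which part of the padded texture is actually used.

```python
def next_power_of_two(n):
    """Smallest power of two that is greater than or equal to n."""
    p = 1
    while p < n:
        p *= 2
    return p

def pad_texture(image, fill=0):
    """Copy the image into the corner of a power-of-two sized array.
    Returns the padded array and the used fraction in x (columns) and y (rows)."""
    rows, cols = len(image), len(image[0])
    padded_rows = next_power_of_two(rows)
    padded_cols = next_power_of_two(cols)
    padded = [[fill] * padded_cols for _ in range(padded_rows)]
    for r in range(rows):
        padded[r][:cols] = image[r]  # same-length slice, so the row keeps its size
    return padded, (cols / padded_cols, rows / padded_rows)

image = [[1] * 40 for _ in range(40)]          # stand-in for a 40 by 40 image
padded, used = pad_texture(image)
print(len(padded), "by", len(padded[0]), "texture, used fraction:", used)
```

For the 40 by 40 case this yields a 64 by 64 texture of which the fraction 0.625 is used in each direction, so only that part would be mapped onto the surface.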
  62. 62. 3.8.5 | Mapping an Image onto a Sphere
The following example maps an image containing a color representation of world elevation onto a sphere using both Direct and Object Graphics displays. The example is broken down into two sections:
• "Mapping an Image onto a Sphere Using Direct Graphics"
• "Mapping an Image Object onto a Sphere"
3.9 | MAPPING OFF LINE:
In the absence of a network connection or online services, we can still identify and follow a route by using image processing techniques. We store a map image of the places familiar to the user and determine how to reach them and return from them clearly and safely. We calculate distances using the Matlab function imdistline and, assuming a walking speed, compute the time it takes to get from one point to another. We guide the user through voice commands, for example to move forward, backward, left or right. This gives us another way of navigating with a map without being online.
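The distance, time and direction calculation described in Section 3.9 can be sketched in Python as follows. It is an illustration under assumptions that are not stated in the text: the map scale (meters per pixel), the walking speed, the angle convention (degrees measured clockwise) and the function names are all hypothetical, and the project itself measures distances on the map image with imdistline.

```python
import math

def route_leg(start, end, meters_per_pixel, speed_m_per_s):
    """Distance in meters and walking time in seconds between two map pixels (x, y)."""
    pixels = math.hypot(end[0] - start[0], end[1] - start[1])
    distance = pixels * meters_per_pixel
    return distance, distance / speed_m_per_s

def voice_command(heading_deg, bearing_deg):
    """Choose a spoken direction from the angle between the user's heading
    and the bearing of the next point (both in degrees, clockwise)."""
    turn = (bearing_deg - heading_deg) % 360
    if turn < 45 or turn >= 315:
        return "forward"
    if turn < 135:
        return "right"
    if turn < 225:
        return "back"
    return "left"

# Assumed values: 0.5 m per pixel map scale, 1.25 m/s walking speed.
distance, seconds = route_leg((0, 0), (30, 40), meters_per_pixel=0.5, speed_m_per_s=1.25)
print(f"{distance:.1f} m, about {seconds:.0f} s")
print(voice_command(heading_deg=0, bearing_deg=90))
```

Each leg of a stored route can be processed this way, and the resulting command and estimated time can be passed to the voice output.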
  63. 63. CHAPTER 4 GPS Navigation
