Voice Recognition


Published on

Published in: Technology
  • Be the first to comment

  • Be the first to like this

No Downloads
Total views
On SlideShare
From Embeds
Number of Embeds
Embeds 0
No embeds

No notes for slide

Voice Recognition

  1. 1. Syllabus for AI 2nd Mst notes Expert system from ES.pdf file from slides till page 8 Rest of the topics are listed below Voice Recognition AI continues to become increasingly a part of our daily lives. Voice recognition is is a key enabler of human-machine interface. Although we often talk (or curse) at our computers, the thought of them talking back to us is somewhat disturbing. The ultimate computer with artificial intelligence was the HAL computer in the science fiction classic. Speech recognition (also known as automatic speech recognition or computer speech recognition) converts spoken words to text. The term "voice recognition" is sometimes used to refer to recognition systems that must be trained to a particular speaker—as is the case for most desktop recognition software. Recognizing the speaker can simplify the task of translating speech. Speech recognition is a broader solution which refers to technology that can recognize speech without being targeted at single speaker—such as a call center system that can recognize arbitrary voices. Speech recognition applications include voice user interfaces such as voice dialing (e.g., "Call home"), call routing (e.g., "I would like to make a collect call"), domotic appliance control, search (e.g., find a podcast where particular words were spoken), simple data entry (e.g., entering a credit card number), preparation of structured documents (e.g., a radiology report), speech-to-text processing (e.g., word processors or emails), and aircraft (usually termed Direct Voice Input). Applications Health care In the health care domain, even in the wake of improving speech recognition technologies, medical transcriptionists (MTs) have not yet become obsolete. The services provided may be redistributed rather than replaced. Speech recognition can be implemented in front-end or back- end of the medical documentation process. Military High-performance fighter aircraft Substantial efforts have been devoted in the last decade to the test and evaluation of speech recognition in fighter aircraft. Helicopters The problems of achieving high recognition accuracy under stress and noise pertain strongly to the helicopter environment as well as to the fighter environment. Training air traffic controllers Training for air traffic controllers (ATC) represents an excellent application for speech recognition systems. Many ATC training systems currently require a person to act as a "pseudo-pilot", engaging in a voice dialog with the trainee controller, which simulates the dialog which the controller would have to conduct with pilots in a real ATC situation. Telephony and other domains
  2. 2. ASR in the field of telephony is now commonplace and in the field of computer gaming and simulation is becoming more widespread. Despite the high level of integration with word processing in general personal computing, however, ASR in the field of document production has not seen the expected increases in use. Further applications • Automatic translation; • Automotive speech recognition (e.g., Ford Sync); • Robotics; • Video games, with Tom Clancy's EndWar and Lifeline as working examples; • Transcription (digital speech-to-text); • Speech-to-text (transcription of speech into mobile text messages); • Air Traffic Control Speech Recognition • Home automation; • Interactive voice response; Pattern Recognition In machine learning, pattern recognition is the assignment of some sort of output value (or label) to a given input value (or instance), according to some specific algorithm. An example of pattern recognition is classification, which attempts to assign each input value to one of a given set of classes (for example, determine whether a given email is "spam" or "non-spam"). However, pattern recognition is a more general problem that encompasses other types of output as well. Other examples are regression, which assigns a real-valued output to each input; sequence labeling, which assigns a class to each member of a sequence of values (for example, part of speech tagging, which assigns a part of speech to each word in an input sentence); and parsing, which assigns a parse tree to an input sentence, describing the syntactic structure of the sentence. Pattern recognition algorithms generally aim to provide a reasonable answer for all possible inputs and to do "fuzzy" matching of inputs. This is opposed to pattern matching algorithms, which look for exact matches in the input with pre-existing patterns. A common example of a pattern-matching algorithm is regular expression matching, which looks for patterns of a given sort in textual data and is included in the search capabilities of many text editors and word processors. In contrast to pattern recognition, pattern matching is generally not considered a type of machine learning, although pattern-matching algorithms (especially with fairly general, carefully tailored patterns) can sometimes succeed in providing similar-quality output to the sort provided by pattern-recognition algorithms. Pattern recognition is generally categorized according to the type of learning procedure used to generate the output value. Supervised learning assumes that a set of training data (the training set) has been provided, consisting of a set of instances that have been properly labeled by hand with the correct output. A learning procedure then generates a model that attempts to meet two sometimes conflicting objectives: Perform as well as possible on the training data, and generalize as well as possible to new data (usually, this means being as simple as possible, for some technical definition of "simple", in accordance with Occam's Razor). Unsupervised learning, on the other hand, assumes training data that has not been hand-labeled, and attempts to find inherent patterns in the data that can then be used to determine the correct output value for new data instances. A combination of the two that has recently been explored is semi-supervised learning, which uses a combination of labeled and unlabeled data (typically a small set of labeled data combined with a large amount of unlabeled data). Note that in cases of unsupervised learning, there may be no training data at all to speak of; in other words, the data to be labeled is the training data. Uses
  3. 3. Within medical science, pattern recognition is the basis for computer-aided diagnosis (CAD) systems. CAD describes a procedure that supports the doctor's interpretations and findings. Typical applications are automatic speech recognition, classification of text into several categories (e.g. spam/non-spam email messages), the automatic recognition of handwritten postal codes on postal envelopes, or the automatic recognition of images of human faces. The last two examples form the subtopic image analysis of pattern recognition that deals with digital images as input to pattern recognition systems. Features of AI The general problem of simulating (or creating) intelligence has been broken down into a number of specific sub-problems. These consist of particular traits or capabilities that researchers would like an intelligent system to display. The traits described below have received the most attention. Deduction, reasoning, problem solving Early AI researchers developed algorithms that imitated the step-by-step reasoning that humans were often assumed to use when they solve puzzles, play board games or make logical deductions.[39] By the late 1980s and '90s, AI research had also developed highly successful methods for dealing with uncertain or incomplete information, employing concepts from probability and economics. Human beings solve most of their problems using fast, intuitive judgments rather than the conscious, step-by-step deduction that early AI research was able to model. AI has made some progress at imitating this kind of "sub-symbolic" problem solving: embodied agent approaches emphasize the importance of sensorimotor skills to higher reasoning; neural net research attempts to simulate the structures inside human and animal brains that give rise to this skill. Knowledge representation Knowledge representation[43] and knowledge engineering[44] are central to AI research. Many of the problems machines are expected to solve will require extensive knowledge about the world. Among the things that AI needs to represent are: objects, properties, categories and relations between objects;[45] situations, events, states and time;[46] causes and effects;[47] knowledge about knowledge (what we know about what other people know);[48] and many other, less well researched domainsAmong the most difficult problems in knowledge representation are: 1 Default reasoning and the qualification problem Many of the things people know take the form of "working assumptions." For example, if a bird comes up in conversation, people typically picture an animal that is fist sized, sings, and flies. None of these things are true about all birds. 2 The breadth of commonsense knowledge The number of atomic facts that the average person knows is astronomical. A major goal is to have the computer understand enough concepts to be able to learn by reading from sources like the internet, and thus be able to add to its own ontology.[citation needed] 3 The subsymbolic form of some commonsense knowledge Much of what people know is not represented as "facts" or "statements" that they could express verbally. For example, a chess master will avoid a particular chess position because it "feels too exposed"[53] or an art critic can take one look at a statue and instantly realize that it is a fake.[54]
  4. 4. Planning Intelligent agents must be able to set goals and achieve them.[56] They need a way to visualize the future (they must have a representation of the state of the world and be able to make predictions about how their actions will change it) and be able to make choices that maximize the utility (or "value") of the available choices.[57] Learning Machine learning[61] has been central to AI research from the beginning.[62] Unsupervised learning is the ability to find patterns in a stream of input. Supervised learning includes both classification and numerical regression. Classification is used to determine what category something belongs in, after seeing a number of examples of things from several categories. Regression takes a set of numerical input/output examples and attempts to discover a continuous function that would generate the outputs from the inputs. In reinforcement learning[63] the agent is rewarded for good responses and punished for bad ones. These can be analyzed in terms of decision theory, using concepts like utility. The mathematical analysis of machine learning algorithms and their performance is a branch of theoretical computer science known as computational learning theory. Natural language processing ASIMO uses sensors and intelligent algorithms to avoid obstacles and navigate stairs. Natural language processing[64] gives machines the ability to read and understand the languages that humans speak. Many researchers hope that a sufficiently powerful natural language processing system would be able to acquire knowledge on its own, by reading the existing text available over the internet. Some straightforward applications of natural language processing include information retrieval (or text mining) and machine translation.[65] Motion and manipulation The field of robotics[66] is closely related to AI. Intelligence is required for robots to be able to handle such tasks as object manipulation[67] and navigation, with sub-problems of localization (knowing where you are), mapping (learning what is around you) and motion planning (figuring out how to get there).[68] Perception Machine perception[69] is the ability to use input from sensors (such as cameras, microphones, sonar and others more exotic) to deduce aspects of the world. Computer vision[70] is the ability to analyze visual input. A few selected subproblems are speech recognition,[71] facial recognition and object recognition.[72] Social intelligence Emotion and social skills[73] play two roles for an intelligent agent. First, it must be able to predict the actions of others, by understanding their motives and emotional states. (This involves elements of game theory, decision theory, as well as the ability to model human emotions and the perceptual skills to detect emotions.) Also, for good human-computer interaction, an intelligent machine also needs to display emotions. At the very least it must appear polite and sensitive to the humans it interacts with. At best, it should have normal emotions itself.
  5. 5. Creativity A sub-field of AI addresses creativity both theoretically (from a philosophical and psychological perspective) and practically (via specific implementations of systems that generate outputs that can be considered creative, or systems that identify and assess creativity). A related area of computational research is Artificial Intuition and Artificial Imagination. General intelligence Most researchers hope that their work will eventually be incorporated into a machine with general intelligence (known as strong AI), combining all the skills above and exceeding human abilities at most or all of them.[12] A few believe that anthropomorphic features like artificial consciousness or an artificial brain may be required for such a project.[74] Difference between AI and traditional programming Artificial Intelligence *primary symbolic process *Heuristic search -steps are implicit *Control structure usually separate from the domain knowledge *usually easy to modify update and enlarge *some incorrect answers are tolerable *satisfactory answers usually acceptable Conventional Programming *numeric *Algorithmic--steps are explicit *information and control are integrated together *difficult to modify *correct answers are required *best possible solution usually sought TravellingSalesman problem The Travelling Salesman Problem (TSP) is an NP-hard problem in combinatorial optimization studied in operations research and theoretical computer science. Given a list of cities and their pairwise distances, the task is to find a shortest possible tour that visits each city exactly once. “A salesman has a list of cities, each of which he must visit exactly once. There are direct roads between each pair of cities on the list. Find the route the salesman should follow for the shortest possible round trip that both starts and finishes at any one of the cities.” Nearest neighbour heuristic: 1. Select a starting city. 2. Select the one closest to the current city. 3. Repeat step 2 until all cities have been visited.
  6. 6. TSP can be modeled as an undirected weighted graph, such that cities are the graph's vertices, paths are the graph's edges, and a path's distance is the edge's length. A TSP tour becomes a Hamiltonian cycle, and the optimal TSP tour is the shortest Hamiltonian cycle. Often, the model is a complete graph (i.e., an edge connects each pair of vertices). If no path exists between two cities, adding an arbitrarily long edge will complete the graph without affecting the optimal tour. The most direct solution would be to try all permutations (ordered combinations) and see which one is cheapest (using brute force search). The running time for this approach lies within a polynomial factor of O(n!), the factorial of the number of cities, so this solution becomes impractical even for only 20 cities. One of the earliest applications of dynamic programming is an algorithm that solves the problem in time O(n22n).[14] The dynamic programming solution requires exponential space. Using inclusion–exclusion, the problem can be solved in time within a polynomial factor of 2n and polynomial space.[15] Improving these time bounds seems to be difficult. For example, it is an open problem if there exists an exact algorithm for TSP that runs in time O(1.9999n)[16] Other approaches include:  Various branch-and-bound algorithms, which can be used to process TSPs containing 40–60 cities.  Progressive improvement algorithms which use techniques reminiscent of linear programming. Works well for up to 200 cities.  Implementations of branch-and-bound and problem-specific cut generation; this is the method of choice for solving large instances. This approach holds the current record, solving an instance with 85,900 cities, Asymmetric TSP In most cases, the distance between two nodes in the TSP network is the same in both directions. The case where the distance from A to B is not equal to the distance from B to A is called asymmetric TSP. A practical application of an asymmetric TSP is route optimisation using street-level routing (asymmetric due to one-way streets, slip-roads and motorways). [edit] Solving by conversion to Symmetric TSP
  7. 7. Solving an asymmetric TSP graph can be somewhat complex. The following is a 3x3 matrix containing all possible path weights between the nodes A, B and C. One option is to turn an asymmetric matrix of size N into a symmetric matrix of size 2N, doubling the complexity. Asymmetric Path Weights A B C A 1 2 B 6 3 C 5 4 To double the size, each of the nodes in the graph is duplicated, creating a second ghost node. Using duplicate points with very low weights, such as −∞, provides a cheap route "linking" back to the real node and allowing symmetric evaluation to continue. The original 3×3 matrix shown above is visible in the bottom left and the inverse of the original in the top-right. Both copies of the matrix have had their diagonals replaced by the low-cost hop paths, represented by −∞. Symmetric Path Weights A B C A' B' C' A −∞ 6 5 B 1 −∞ 4 C 2 3 −∞ A' −∞ 1 2 B' 6 −∞ 3 C' 5 4 −∞ The original 3x3 matrix would produce two Hamiltonian cycles (a path that visits every node once), namely A-B-C-A [score 9] and A-C-B-A [score 12]. Evaluating the 6x6 symmetric version of the same problem now produces many paths, including A-A'-B-B'-C-C'-A, A-B'-C-A'-A, A-A'-B- C'-A [all score 9–∞]. The important thing about each new sequence is that there will be an alternation between dashed (A',B',C') and un-dashed nodes (A,B,C) and that the link to "jump" between any related pair (A-A') is effectively free. A version of the algorithm could use any weight for the A-A' path, as long as that weight is lower than all other path weights present in the graph. As the path weight to "jump" must effectively be "free", the value zero (0) could be used to represent this cost — if zero is not being used for another purpose already (such as designating invalid paths). In the two examples above, non-existent paths between nodes are shown as a blank square. Logic Logic[106] is used for knowledge representation and problem solving, but it can be applied to other problems as well. For example, the satplan algorithm uses logic for planning[107] and inductive logic programming is a method for learning.[108] Several different forms of logic are used in AI research. Propositional or sentential logic[109] is the logic of statements which can be true or false. First-order logic[110] also allows the use of quantifiers and predicates, and can express facts about objects, their properties, and their relations with each other.
  8. 8. Fuzzy logic,[111] is a version of first-order logic which allows the truth of a statement to be represented as a value between 0 and 1, rather than simply True (1) or False (0). Fuzzy systems can be used for uncertain reasoning and have been widely used in modern industrial and consumer product control systems. Subjective logic models uncertainty in a different and more explicit manner than fuzzy-logic: a given binomial opinion satisfies belief + disbelief + uncertainty = 1 within a Beta distribution. By this method, ignorance can be distinguished from probabilistic statements that an agent makes with high confidence. Default logics, non-monotonic logics and circumscription[51] are forms of logic designed to help with default reasoning and the qualification problem. Several extensions of logic have been designed to handle specific domains of knowledge, such as: description logics;[45] situation calculus, event calculus and fluent calculus (for representing events and time);[46] causal calculus;[47] belief calculus; and modal logics.[48 ExpertSystems: An expert system is a computer application that performs a task that would otherwise be performed by a human expert. For example, there are expert systems that can diagnose human illnesses, make financial forecasts, and schedule routes for delivery vehicles. Some expert systems are designed to take the place of human experts, while others are designed to aid them. To design an expert system, one needs a knowledge engineer, an individual who studies how human experts make decisions and translates the rules into terms that a computer can understand. NeuralNetworks: Neural Networks are systems that simulate intelligence by attempting to reproduce the types of physical connections that occur in animal brains. A type of artificial intelligence that attempts to imitate the way a human brain works. Rather than using a digital model, in which all computations manipulate zeros and ones, a neural network works by creating connections between processing elements, the computer equivalent of neurons. The organization and weights of the connections determine the output. Neural networks are particularly effective for predicting events when the networks have a large database of prior examples to draw on. Strictly speaking, a neural network implies a non-digital computer, but neural networks can be simulated on digital computers. Neural networks are an exciting technology with the potential to change the way we solve "real- world" problems in science, engineering and economics.
  9. 9. Today, the hottest area of artificial intelligence is neural networks, which are proving successful in a number of disciplines such as voice recognition and natural-language processing. The pros: 1 Utility for a variety of pattern recognition tasks (handwriting, spoken word, face, etc.) 2 Simplicity of programming (reinforcement learning) 3 An elegant model of emergence in programming: the idea that large numbers of simple programming elements are collectively capable of interesting, higher-level behaviors. The cons: 1 A vague feeling that we haven’t really learned anything about these pattern recognition skills 2 Neural nets seem to be a far less natural means for demonstrating “explicit rule learning”