Prof. Mrs. M. P. Atre
PVG’s COET, SPPU
9/18/2017 1
 Perception is the process of acquiring,
interpreting, selecting, and organizing
sensory information.
 Perception presumes sensation, where
sensors of various types each convert a
certain type of simple signal into data for the
system.
 Putting the data together and making sense
of it is the job of the perception
mechanism.
 Perception can be seen as a special type
of categorization (or classification, pattern
recognition) where the inputs are sensory
data, and the outputs are categorical
judgments and conceptual relations.
 The difficulty of the task comes from the
need for multiple levels of abstraction, where
the relations among data items are
many-to-many, uncertain, and changing over time.
 Strictly speaking, we never "see things as
they are", and the perception process of an
intelligent system is often (and should be)
influenced by internal and external factors
besides the signals themselves.
 Furthermore, perception is not a purely
passive process driven by the input.
 In AI, the study of perception is mostly
focused on the reproduction of human
perception, especially on the perception of
aural and visual signals.
 However, this focus is not a necessity,
since the perception mechanism of a
computer system does not have to be
identical to that of a human being.
Hearing
Vision
 Speech recognition is the front end of a
system that can perceive and understand
spoken language, as used in voice-command
interfaces and speech-to-speech translation.
Acoustic-phonetic approach
Pattern-matching approach
Artificial intelligence approach
 The acoustic-phonetic approach postulates
that there exist finite, distinctive
phonetic units (phonemes) in spoken
language and that these units are broadly
characterized by a set of acoustic properties.
 Even though the acoustic properties of
phonetic units are highly variable, both with
speakers and with neighboring sounds, it is
assumed in the acoustic-phonetic approach
that the rules governing the variability are
straightforward and can be readily learned.
 The pattern-matching approach represents
each speech pattern in the form of a
mathematical model.
 The unknown speech (the speech to be
recognized) is compared directly with each
possible pattern learned in the training
stage in order to determine its identity.
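The comparison of unknown speech against stored patterns can be illustrated with a small sketch. It assumes, purely for illustration, that each utterance has already been reduced to a short sequence of numbers and that dynamic time warping (DTW) is used as the distance measure; the slides do not name a specific model, and the templates and feature values below are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(unknown, templates):
    """Return the label of the stored template closest to the unknown sequence."""
    return min(templates, key=lambda label: dtw_distance(unknown, templates[label]))


# Hypothetical patterns "learned" in the training stage (made-up feature values).
TEMPLATES = {
    "yes": [1.0, 3.0, 4.0, 3.0, 1.0],
    "no": [4.0, 2.0, 1.0, 1.0, 1.0],
}

# A time-stretched version of the "yes" pattern, as a slower speaker might produce.
unknown = [1.0, 1.0, 3.0, 4.0, 4.0, 3.0, 1.0]
print(recognize(unknown, TEMPLATES))
```

DTW is chosen here only because it tolerates differences in speaking rate; statistical models are another common way to represent the patterns.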
 The artificial intelligence approach attempts
to do speech recognition using various AI
techniques, such as knowledge-based
systems or neural networks.
 Speech synthesis is the translation from
text to speech.
 After the text analysis capabilities
pre-process the text (digit sequences,
abbreviations, etc.), the pronunciations of
most ordinary words and proper names are
decided by dictionary-based methods.
 Finally, there are methods responsible for
post-processing (prosodic phrasing, word
accentuation, sentence intonation) and the
actual speech synthesis.
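The first two steps, text pre-processing and dictionary lookup, can be sketched as follows. This is a toy illustration only: the abbreviation table, the digit-by-digit reading of numbers, and the phoneme strings are all hypothetical stand-ins for the much larger resources a real system would use, and the later prosody and synthesis steps are not covered.

```python
import re

# Hypothetical lookup tables, kept tiny for illustration.
ABBREVIATIONS = {"dr.": "doctor", "st.": "street"}
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]
PRONUNCIATIONS = {  # illustrative phoneme strings, not an authoritative lexicon
    "doctor": "D AA K T ER",
    "smith": "S M IH TH",
}


def normalize(text):
    """Expand abbreviations and digit sequences into plain lowercase words."""
    words = []
    for raw in text.lower().split():
        if raw in ABBREVIATIONS:
            words.append(ABBREVIATIONS[raw])
        elif re.fullmatch(r"[0-9]+", raw):
            # Read digit sequences one digit at a time (a simplifying assumption).
            words.extend(DIGITS[int(ch)] for ch in raw)
        else:
            word = raw.strip(".,!?")
            if word:
                words.append(word)
    return words


def to_phonemes(words):
    """Look each word up in the dictionary, flagging words it does not contain."""
    return [PRONUNCIATIONS.get(w, "<unknown:" + w + ">") for w in words]


words = normalize("Dr. Smith lives at 42 Main St.")
print(words)
print(to_phonemes(words))
```

Words missing from the dictionary are only flagged here; a fuller system would need a fallback such as letter-to-sound rules.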
 A major remaining problem is naturalness,
especially context- and meaning-related
adjustments (emotion, stress, tone, ...).
 To fully solve this problem, it is probably
necessary to fully understand the meaning of
the message and the purpose of the speech.
 Music perception and composition are also
studied in AI. For example, there are musical
works produced by computer programs,
some of them in the styles of various
classical composers.
 Vision begins with a large array of
measurements of the light reflected from
object surfaces onto the eye.
 Analysis then proceeds in multiple stages,
each producing increasingly useful
representations of information in the scene.
 Stage 1
 Early representations may capture information
such as the location, contrast, and sharpness of
significant intensity changes or edges in the
image.
 Such changes correspond to physical features
such as object boundaries, texture contours, and
markings on object surfaces, shadow boundaries,
and highlights.
 In the case of a dynamically changing scene, the
early representations may also describe the
direction and speed of movement of image
intensity changes.
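A minimal sketch of such an early representation, assuming a grayscale image stored as a list of rows of integer intensities. It records only the location and contrast of horizontal intensity changes; the function name, the threshold, and the sample image are hypothetical, and practical systems use smoothed two-dimensional gradient operators instead.

```python
def horizontal_edges(image, threshold):
    """Return (row, col, contrast) for each horizontal intensity change
    whose magnitude is at least `threshold`."""
    edges = []
    for r, row in enumerate(image):
        for c in range(len(row) - 1):
            contrast = row[c + 1] - row[c]  # signed change between neighbors
            if abs(contrast) >= threshold:
                edges.append((r, c, contrast))
    return edges


# Hypothetical 3x5 image: a dark region on the left, a bright one on the right.
IMAGE = [
    [10, 10, 10, 200, 200],
    [10, 10, 10, 200, 200],
    [10, 10, 12, 200, 200],
]

for row, col, contrast in horizontal_edges(IMAGE, threshold=50):
    print("edge after column", col, "in row", row, "with contrast", contrast)
```

The small change from 10 to 12 in the last row falls below the threshold, so only the boundary between the dark and bright regions is reported.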
 Stage 2
 Intermediate representations describe
information about the three-dimensional (3-D)
shape of object surfaces from the
perspective of the viewer, such as the
orientation of small surface regions or the
distance to surface points from the eye.
 Such representations may also describe the
motion of surface features in three
dimensions.
 Stage 3
 Higher-level representations of objects
describe their 3-D shape, form, and
orientation relative to a coordinate frame
based on the objects or on a fixed location in
the world
 Tasks such as object recognition, object
manipulation, and navigation may operate
from the intermediate or higher-level
representations of the 3-D layout of objects
in the world.
 For relatively simple pattern
recognition problems, a neural network is
often used to map input directly to output
via a learning process.
 In recent years, hierarchical learning
methods have made remarkable progress
on various problems, such as solving CAPTCHAs.
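The idea of learning a direct input-to-output mapping can be illustrated with a single perceptron, the simplest neural unit, trained on a toy pattern (logical AND). This is a sketch under stated assumptions, not a method from the lecture: the training data, learning rate, and epoch count are all hypothetical choices.

```python
def predict(weights, bias, x):
    """Threshold unit: output 1 if the weighted sum is positive, else 0."""
    total = weights[0] * x[0] + weights[1] * x[1] + bias
    return 1 if total > 0 else 0


def train_perceptron(samples, epochs=20, lr=1.0):
    """Classic perceptron rule: nudge the weights toward each misclassified sample."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)
            weights[0] += lr * error * x[0]
            weights[1] += lr * error * x[1]
            bias += lr * error
    return weights, bias


# Toy "pattern recognition" task: output 1 only when both inputs are on.
AND_SAMPLES = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(AND_SAMPLES)
for x, target in AND_SAMPLES:
    print(x, "->", predict(weights, bias, x), "(expected", str(target) + ")")
```

A single unit like this can only learn linearly separable patterns; the hierarchical methods mentioned above stack many layers of such units to handle harder problems.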
 Vision is not purely an input process.
 Eye movement has an important impact on
human visual perception.
 An active vision system is one that is able to
interact with its environment by altering its
viewpoint rather than passively observing it,
and by operating on sequences of images
rather than on a single frame.
 There is also some research on using the
eye gaze of a computer user in the interface
to help control the application.
 By "higher-level perception", we mean how
the given input data is categorized.
 In low-level perception, the processing
is mostly "bottom-up", i.e., the output is
more or less a function of the input, whereas
in higher-level perception many more
factors are involved.
 "One of the most important properties of
high-level perception is that it is extremely flexible.
 A given set of input data may be perceived in a
number of different ways, depending on the
context and the state of the perceiver
 Due to this flexibility, it is a mistake to regard
perception as a process that associates a fixed
representation with a particular situation.
 Both contextual factors and top-down cognitive
influences make the process far less rigid than
this."
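One way to make this flexibility concrete is a small sketch in which the same ambiguous input is categorized differently under different contexts. It assumes, purely for illustration, that top-down context can be modeled as a prior probability combined with bottom-up evidence by Bayes' rule; this modeling choice and all the numbers are hypothetical and are not taken from the quoted passage.

```python
def perceive(likelihood, context_prior):
    """Combine bottom-up evidence with a top-down context prior (Bayes' rule).

    Returns the most probable category and the full posterior distribution."""
    unnormalized = {c: likelihood[c] * context_prior[c] for c in likelihood}
    total = sum(unnormalized.values())
    posterior = {c: v / total for c, v in unnormalized.items()}
    best = max(posterior, key=posterior.get)
    return best, posterior


# An ambiguous glyph that looks equally like the letter "B" and the number "13".
LIKELIHOOD = {"B": 0.5, "13": 0.5}

# Hypothetical priors: what the perceiver expects in each surrounding context.
LETTER_CONTEXT = {"B": 0.9, "13": 0.1}   # e.g. seen between "A" and "C"
NUMBER_CONTEXT = {"B": 0.1, "13": 0.9}   # e.g. seen between "12" and "14"

print(perceive(LIKELIHOOD, LETTER_CONTEXT)[0])
print(perceive(LIKELIHOOD, NUMBER_CONTEXT)[0])
```

The input data is identical in both calls; only the context changes, and with it the resulting categorization.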
 Letter Spirit
 The style of a composer
 Cartoon creation and comprehension
https://cis.temple.edu/~wangp/3203-AI/Lecture/IO-2.htm

Perception in artificial intelligence
