
Computer vision

Published in: Technology, Business

  1. 1. COMPUTER VISION Introduction Computer vision is the study and application of methods which allow computers to "understand" image content or the content of multidimensional data in general. The term "understand" means here that specific information is extracted from the image data for a specific purpose: either for presenting it to a human operator (e.g., if cancerous cells have been detected in a microscopy image), or for controlling some process (e.g., an industrial robot or an autonomous vehicle). The image data that is fed into a computer vision system is often a digital gray-scale or colour image, but can also be in the form of two or more such images (e.g., from a stereo camera pair), a video sequence, or a 3D volume (e.g., from a tomography device). In most practical computer vision applications, the computers are pre-programmed to solve a particular task, but methods based on learning are now becoming increasingly common. Computer vision can also be described as the complement (but not necessarily the opposite) of biological vision. In biological vision and visual perception, the real vision systems of humans and various animals are studied, resulting in models of how these systems are implemented in terms of neural processing at various levels. State Of The Art Relation between Computer vision and various other fields The field of computer vision can be characterized as immature and diverse. Even though earlier work exists, it was not until the late 1970s that a more focused study of the field
  2. 2. started when computers could manage the processing of large data sets such as images. However, these studies usually originated from various other fields, and consequently there is no standard formulation of the "computer vision problem". Also, and to an even larger extent, there is no standard formulation of how computer vision problems should be solved. Instead, there exists an abundance of methods for solving various well-defined computer vision tasks, where the methods are often very task specific and can seldom be generalized over a wide range of applications. Many of the methods and applications are still in the state of basic research, but more and more methods have found their way into commercial products, where they often constitute a part of a larger system which can solve complex tasks (e.g., in the area of medical images, or quality control and measurements in industrial processes). A significant part of artificial intelligence deals with planning or deliberation for systems which can perform mechanical actions such as moving a robot through some environment. This type of processing typically needs input data provided by a computer vision system, acting as a vision sensor and providing high-level information about the environment and the robot. Other parts which are sometimes described as belonging to artificial intelligence and which are used in relation to computer vision are pattern recognition and learning techniques. As a consequence, computer vision is sometimes seen as a part of the artificial intelligence field. Since a camera can be seen as a light sensor, there are various methods in computer vision based on correspondences between a physical phenomenon related to light and images of that phenomenon. For example, it is possible to extract information about motion in fluids and about waves by analyzing images of these phenomena. 
Also, a subfield within computer vision deals with the physical process which, given a scene of objects, light sources, and camera lenses, forms the image in a camera. Consequently, computer vision can also be seen as an extension of physics. A third field which plays an important role is neurobiology, specifically the study of the biological vision system. Over the last century, there has been an extensive study of eyes, neurons, and the brain structures devoted to processing of visual stimuli in both humans and various animals. This has led to a coarse, yet complicated, description of how "real" vision systems
  3. 3. operate in order to solve certain vision related tasks. These results have led to a subfield within computer vision where artificial systems are designed to mimic the processing and behaviour of biological systems, at different levels of complexity. Also, some of the learning-based methods developed within computer vision have their background in biology. Yet another field related to computer vision is signal processing. Many existing methods for processing of one-variable signals, typically temporal signals, can be extended in a natural way to processing of two-variable signals or multi-variable signals in computer vision. However, because of the specific nature of images there are many methods developed within computer vision which have no counterpart in the processing of one-variable signals. A distinct character of these methods is the fact that they are non-linear which, together with the multi-dimensionality of the signal, defines a subfield in signal processing as a part of computer vision. Besides the above-mentioned views on computer vision, many of the related research topics can also be studied from a purely mathematical point of view. For example, many methods in computer vision are based on statistics, optimization or geometry. Finally, a significant part of the field is devoted to the implementation aspect of computer vision: how existing methods can be realized in various combinations of software and hardware, or how these methods can be modified in order to gain processing speed without losing too much performance. Related Fields Computer vision, Image processing, Image analysis, Robot vision and Machine vision are closely related fields. If you look inside textbooks which have any of these names in the title, there is a significant overlap in terms of what techniques and applications they cover. 
This implies that the basic techniques that are used and developed in these fields are more or less identical, something which can be interpreted as meaning that there is only one field with different names. On the other hand, it appears to be necessary for research groups, scientific journals, conferences and companies to present or market themselves as
  4. 4. belonging specifically to one of these fields and, hence, various characterizations which distinguish each of the fields from the others have been presented. The following characterizations appear relevant but should not be taken as universally accepted. Image processing and Image analysis tend to focus on 2D images and on how to transform one image into another, e.g., by pixel-wise operations such as contrast enhancement, local operations such as edge extraction or noise removal, or geometrical transformations such as rotating the image. This characterization implies that image processing/analysis neither requires assumptions nor produces interpretations about the image content. Computer vision tends to focus on the 3D scene projected onto one or several images, e.g., how to reconstruct structure or other information about the 3D scene from one or several images. Computer vision often relies on more or less complex assumptions about the scene depicted in an image. Machine vision tends to focus on applications, mainly in industry, e.g., vision based autonomous robots and systems for vision based inspection or measurement. This implies that image sensor technologies and control theory are often integrated with the processing of image data to control a robot and that real-time processing is emphasized by means of efficient implementations in hardware and software. There is also a field called Imaging which primarily focuses on the process of producing images, but sometimes also deals with processing and analysis of images. For example, Medical imaging contains a great deal of work on the analysis of image data in medical applications. Finally, pattern recognition is a field which uses various methods to extract information from signals in general, mainly based on statistical approaches. 
A significant part of this field is devoted to applying these methods to image data. A consequence of this state of affairs is that you can be working in a lab related to one of these fields, apply methods from a second field to solve a problem in a third field and present the result at a conference related to a fourth field! Typical Tasks Of Computer Vision
  5. 5. Each of the application areas described above employs a range of computer vision tasks: more or less well-defined measurement problems or processing problems, which can be solved using a variety of methods. Some examples of typical computer vision tasks are presented below. Recognition The classical problem in computer vision, image processing and machine vision is that of determining whether or not the image data contains some specific object, feature, or activity. This task can normally be solved robustly and without effort by a human, but is still not satisfactorily solved in computer vision for the general case: arbitrary objects in arbitrary situations. The existing methods for dealing with this problem can at best solve it only for specific objects, such as simple geometric objects (e.g., polyhedrons), human faces, printed or hand-written characters, or vehicles, and in specific situations, typically described in terms of well-defined illumination, background, and pose of the object relative to the camera. Different varieties of the recognition problem are described in the literature: • Recognition: one or several pre-specified or learned objects or object classes can be recognized, usually together with their 2D positions in the image or 3D poses in the scene. • Identification: an individual instance of an object is recognized. Examples: identification of a specific person's face or fingerprint, or identification of a specific vehicle. • Detection: the image data is scanned for a specific condition. Examples: detection of possible abnormal cells or tissues in medical images or detection of a vehicle in an automatic road toll system. Detection based on relatively simple and fast computations is sometimes used for finding smaller regions of interesting image data which can be further analyzed by more computationally demanding techniques to produce a correct interpretation. Several specialized tasks based on recognition exist, such as:
  6. 6. • Content-based image retrieval: find all images which have a specific content in a larger set or database of images. • Pose estimation: estimation of the position and orientation of a specific object relative to the camera. Example: allowing a robot arm to pick up objects from a conveyor belt. • Optical character recognition (or OCR): images of printed or handwritten text are converted to computer readable text such as ASCII or Unicode. Motion Several tasks relate to motion estimation, in which an image sequence is processed to produce an estimate of the local image velocity at each point. Examples of such tasks are • Egomotion: determine the 3D rigid motion of the camera. • Tracking of one or several objects (e.g., vehicles or humans) through the image sequence. • Surveillance: detection of possible activities based on motion. Scene Reconstruction Given two or more images of a scene, or a video, scene reconstruction aims at computing a 3D model of the scene. In the simplest case the model can be a set of 3D points. More sophisticated methods produce a complete 3D surface model. Image Restoration Given an image, an image sequence, or a 3D volume which has been degraded by noise, image restoration aims at producing the image data without the noise. Examples of noise processes which are considered are sensor noise (e.g., in ultrasonic images) and motion blur (e.g., because of a moving camera or moving objects in the scene). Computer Vision Systems
  7. 7. A typical computer vision system can be divided into the following subsystems: Image acquisition The image or image sequence is acquired with an imaging system (camera, radar, lidar, tomography system). Often the imaging system has to be calibrated before being used. Preprocessing In the preprocessing step, the image is treated with "low-level" operations. The aim of this step is to reduce noise in the image (i.e., to separate the signal from the noise) and to reduce the overall amount of data. This is typically done by employing different (digital) image processing methods such as: 1. Downsampling the image. 2. Applying digital filters. 3. Computing the x- and y-gradient (possibly also the time-gradient). 4. Segmenting the image. a. Pixelwise thresholding. 5. Performing an eigentransform on the image. a. Fourier transform. 6. Doing motion estimation for local regions of the image (also known as optical flow estimation). 7. Estimating disparity in stereo images. 8. Multiresolution analysis. Feature extraction
  8. 8. The aim of feature extraction is to further reduce the data to a set of features, which ought to be invariant to disturbances such as lighting conditions, camera position, noise and distortion. Examples of feature extraction are: 1. Performing edge detection or estimation of local orientation. 2. Extracting corner features. 3. Detecting blob features. 4. Extracting spin images from depth maps. 5. Extracting geons or other three-dimensional primitives, such as superquadrics. 6. Acquiring contour lines and possibly curvature zero crossings. 7. Generating features with the Scale-invariant feature transform. 8. Calculating the co-occurrence matrix of the image or sub-images to measure texture. Registration The aim of the registration step is to establish correspondence between the features in the acquired set and the features of known objects in a model database and/or the features of the preceding image. The registration step has to produce a final hypothesis. To name a few methods: 1. Least squares estimation 2. Hough transform in many variations 3. Geometric hashing 4. Particle filtering Applications Of Computer Vision The following is an incomplete list of applications which are studied in computer vision. In this category, the term application should be interpreted as a high level function which solves a problem at a higher level of complexity. Typically, the various technical problems related to an application can be solved and implemented in different ways.
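As an illustration of the least squares estimation named among the registration methods above, the following is a minimal sketch in Python. It assumes that feature correspondences are already known and that the two point sets differ only by a 2D rotation and translation; the function name fit_rigid_2d and the sample points are hypothetical and chosen for illustration only.

```python
import math

def fit_rigid_2d(src, dst):
    """Least-squares rotation angle and translation that map src onto dst.

    src and dst are equally long lists of (x, y) pairs, where src[i] is
    assumed to correspond to dst[i].
    """
    n = len(src)
    # Centroids of both point sets.
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    # Accumulate dot and cross products of the centred correspondences.
    dot = 0.0
    cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - sx, py - sy
        bx, by = qx - dx, qy - dy
        dot += ax * bx + ay * by
        cross += ax * by - ay * bx
    # The angle that minimises the summed squared distances.
    theta = math.atan2(cross, dot)
    c, s = math.cos(theta), math.sin(theta)
    # Translation that maps the rotated source centroid onto the target centroid.
    tx = dx - (c * sx - s * sy)
    ty = dy - (s * sx + c * sy)
    return theta, (tx, ty)

# Model points and the same points rotated by 90 degrees and shifted by (2, 3).
model = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
observed = [(2.0, 3.0), (2.0, 4.0), (1.0, 3.0)]
theta, (tx, ty) = fit_rigid_2d(model, observed)
```

With noisy correspondences the same function returns the best fit in the least squares sense; methods such as the Hough transform or geometric hashing address the harder case where the correspondences themselves are unknown.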
  9. 9. A facial recognition system is a computer-driven application for automatically identifying a person from a digital image. It does that by comparing selected facial features in the live image with those in a facial database. It is typically used for security systems and can be compared to other biometrics such as fingerprint or eye iris recognition systems. Popular recognition algorithms include eigenface, fisherface, the Hidden Markov model, and the neuronally motivated Dynamic Link Matching. A newly emerging trend, claimed to achieve previously unseen accuracies, is three-dimensional face recognition. Another emerging trend uses the visual details of the skin, as captured in standard digital or scanned images. Tests on the FERET database, the widely used industry benchmark, showed that this approach is substantially more reliable than previous algorithms. Polly (robot) Polly was a robot created at the MIT Artificial Intelligence Laboratory by Ian Horswill for his PhD, which was published in 1993 as a technical report. It was the first mobile robot to move at animal-like speeds (1 m per second) using computer vision for its navigation. It was an example of behavior based robotics. For a few years, Polly was able to give tours of the AI laboratory's seventh floor, using canned speech to point out landmarks such as Anita Flynn's office. The Polly algorithm is a way to navigate in a cluttered space using very low resolution vision to find uncluttered areas to move forward into, assuming that the pixels at the bottom of the frame (the closest to the robot) show an example of an uncluttered area. Since this could be done 60 times a second, the algorithm only needed to discriminate three categories: telling the robot at each instant to go straight, towards the right or towards the left. Mobile robot
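The idea behind the Polly algorithm described above can be sketched in a few lines of Python. This is a simplified illustration and not Horswill's implementation: it assumes a tiny gray-scale frame given as a list of rows, treats each bottom-row pixel as a sample of free floor, and uses a hypothetical tolerance parameter and function names (free_height, steer) introduced only for this example.

```python
def free_height(image, column, tolerance):
    """Count pixels in a column, from the bottom up, that resemble the floor."""
    floor = image[-1][column]  # bottom pixel is assumed to show free floor
    height = 0
    for row in reversed(image):
        if abs(row[column] - floor) > tolerance:
            break  # first pixel that no longer looks like floor
        height += 1
    return height

def steer(image, tolerance=10):
    """Return 'straight', 'left' or 'right' for one low-resolution frame."""
    width = len(image[0])
    heights = [free_height(image, c, tolerance) for c in range(width)]
    third = width // 3
    regions = {
        "straight": heights[third:width - third],
        "left": heights[:third],
        "right": heights[width - third:],
    }
    # Average free height per region; ties favour 'straight' (listed first).
    scores = {name: sum(h) / len(h) for name, h in regions.items()}
    return max(scores, key=scores.get)

# 4 x 6 frame: floor has value 100, clutter has value 0.
frame = [
    [0,   0,   0,   0,   100, 100],
    [0,   0,   0,   0,   100, 100],
    [100, 100, 0,   0,   100, 100],
    [100, 100, 100, 100, 100, 100],
]
direction = steer(frame)
```

Because only three outcomes are distinguished and the frame is very small, a decision of this kind is cheap enough to repeat for every frame.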
  10. 10. Mobile robots are automatic machines that are capable of movement in a given environment. Robots generally fall into two classes, linked manipulators (or industrial robots) and mobile robots. Mobile robots have the capability to move around in their environment and are not fixed to one physical location. In contrast, industrial manipulators usually consist of a jointed arm and gripper assembly (or end effector) that is attached to a fixed surface. The most common class of mobile robots are wheeled robots. A second class of mobile robots includes legged robots while a third smaller class includes aerial robots, usually referred to as unmanned aerial vehicles (UAVs). Mobile robots are the focus of a great deal of current research and almost every major university has one or more labs that focus on mobile robot research. Mobile robots are also found in industry, military and security environments, and appear as consumer products. Robot A humanoid robot manufactured by Toyota "playing" a trumpet. The word robot is used to refer to a wide range of machines, the common feature of which is that they are all capable of movement and can be used to perform physical tasks. Robots take on many different forms, ranging from humanoid, which mimic the human form and way of moving, to industrial, whose appearance is dictated by the function they are to perform. Robots can be grouped generally as mobile robots (e.g., autonomous vehicles), manipulator robots (e.g., industrial robots) and self-reconfigurable robots, which can conform themselves to the task at hand. Robots may be controlled directly by a human, such as remotely-controlled bomb-disposal robots, robotic arms, or shuttles, or may act according to their own decision making ability, provided by artificial intelligence. However, the majority of robots fall in between these extremes, being controlled by pre-programmed computers. 
Such robots may include feedback loops such that they can interact with their environment, but do not display actual intelligence.
  11. 11. The word "robot" is also used in a general sense to mean any machine which mimics the actions of a human (biomimicry), in the physical sense or in the mental sense. It comes from the Czech and Slovak word robota, meaning labour or work (also used in the sense of a serf). The word robot first appeared in Karel Čapek's science fiction play R.U.R. (Rossum's Universal Robots) in 1921. History The construction of a Soviet-made robot of the 1970s. The robot was able to move, reproduce pre-recorded sounds, imitate clever conversation using the built-in radio station and show movies on the built-in screen. It was used in various
  12. 12. shows. The word robot was introduced by Czech writer Karel Čapek in his play R.U.R. (Rossum's Universal Robots), which was written in 1920 (see also Robots in literature for details of the play). However, the verb robotovat, meaning "to work" or "to slave", and the noun robota (meaning corvée), used in the Czech and Slovak languages, have been used since the early 10th century. It was suggested that the word robot had been coined by Karel Čapek's brother, painter and writer Josef Čapek. An early automaton was created in 1738 by Jacques de Vaucanson, who built a mechanical duck that was able to eat grain, flap its wings, and excrete. The first human to be killed by a robot was 37-year-old Kenji Urada, a Japanese factory worker, in 1981. According the, Urada "climbed over a safety fence at a Kawasaki plant to carry out some maintenance work on a robot. In his haste, he failed to switch the robot off properly. Unable to sense him, the robot's powerful hydraulic arm kept on working and accidentally pushed the engineer into a grinding machine." Smart Camera A smart camera is an integrated machine vision system which, in addition to image capture circuitry, includes a processor, which can extract information from images without need for an external processing unit, and interface devices used to make results available to other devices. A smart camera or "intelligent camera" is a self-contained, standalone vision system with a built-in image sensor in the housing of an industrial video camera. It contains all necessary communication interfaces, e.g., Ethernet. It is not necessarily larger than an
  13. 13. industrial or surveillance camera. This architecture has the advantage of a more compact volume compared to PC-based vision systems and often achieves lower cost, at the expense of a somewhat simpler (or altogether missing) user interface. Early smart camera (ca. 1985, in red) with an 8 MHz Z80 compared to a modern device featuring Texas Instruments' C64 at 1 GHz. A smart camera usually consists of several (but not necessarily all) of the following components: 1. Image sensor (matrix or linear, CCD or CMOS) 2. Image digitization circuitry 3. Image memory 4. Communication interface (RS232, Ethernet) 5. I/O lines (often optoisolated) 6. Lens holder or built-in lens (usually C- or CS-mount) Examples Of Applications For Computer Vision Another way to describe computer vision is in terms of application areas. One of the most prominent application fields is medical computer vision or medical image processing. This area is characterized by the extraction of information from image data for the purpose of making a medical diagnosis of a patient. Typically, image data is in the form of microscopy images, X-ray images, angiography images, ultrasonic images, and tomography images. An example of information which can be extracted from such image data is detection of tumours, arteriosclerosis or other malignant changes. It can also be measurements of organ dimensions, blood flow, etc. This application area also supports
  14. 14. medical research by providing new information, e.g., about the structure of the brain, or about the quality of medical treatments. A second application area in computer vision is in industry. Here, information is extracted for the purpose of supporting a manufacturing process. One example is quality control, where details or final products are automatically inspected in order to find defects. Another example is measurement of the position and orientation of details to be picked up by a robot arm. See the article on machine vision for more details on this area. Military applications are probably one of the largest areas for computer vision, even though only a small part of this work is open to the public. The obvious examples are detection of enemy soldiers or vehicles and guidance of missiles to a designated target. More advanced systems for missile guidance send the missile to an area rather than a specific target, and target selection is made when the missile reaches the area, based on locally acquired image data. Modern military concepts, such as "battlefield awareness," imply that various sensors, including image sensors, provide a rich set of information about a combat scene which can be used to support strategic decisions. In this case, automatic processing of the data is used to reduce complexity and to fuse information from multiple sensors to increase reliability. Artist's Concept of Rover on Mars. Notice the stereo cameras mounted on top of the Rover. (credit: Maas Digital LLC) One of the newer application areas is autonomous vehicles, which include submersibles, land-based vehicles (small robots with wheels, cars or trucks), and aerial vehicles. An unmanned aerial vehicle is often denoted UAV. The level of autonomy ranges from fully autonomous (unmanned) vehicles to vehicles where computer vision based systems support a driver or a pilot in various situations.
Fully autonomous vehicles typically use computer vision for navigation, e.g., a UAV looking for forest fires. Examples of supporting systems are obstacle warning systems in cars and systems for autonomous landing of aircraft. Several car manufacturers have demonstrated systems for autonomous driving of cars, but this technology has still not reached a level where it can be put on the market.
  15. 15. Software For Computer Vision Animal Animal (first implementation: 1988, revised: 2004) is an interactive environment for image processing that is oriented toward the rapid prototyping, testing, and modification of algorithms. To create ANIMAL (AN IMage ALgebra), XLISP of David Betz was extended with some new types: sockets, arrays, images, masks, and drawables. The theoretical framework and the implementation of the working environment are described in the paper ANIMAL: AN IMage ALgebra. In the theoretical framework of ANIMAL a digital image is a boundless matrix. However, in the implementation it is bounded by a rectangular region in the discrete plane and the elements outside the region have a constant value. The size and position of the region in the plane (focus) is defined by the coordinates of the rectangle. In this way all the pixels, including those on the border, have the same number of neighbors (useful in local operators, such as digital filters). Furthermore, pixelwise commutative operations remain commutative on image level, independently of focus. OpenCV OpenCV is an open source computer vision library developed by Intel. The library is cross-platform, and runs on both Windows and Linux. It focuses mainly on real-time image processing. The application areas include 1. Human-Computer Interface (HCI) 2. Object Identification 3. Segmentation and Recognition 4. Face Recognition 5. Gesture Recognition 6. Motion Tracking Visualization Toolkit (VTK)
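The ANIMAL image model described above (a bounded rectangle, the focus, with a constant value everywhere outside it) can be illustrated with a short Python sketch. This is not ANIMAL's XLISP interface; the class FocusedImage and the helpers add and neighbour_sum are hypothetical names used only to show why border pixels keep a full neighbourhood and why a pixelwise sum does not depend on the order or focus of its operands.

```python
class FocusedImage:
    """Conceptually unbounded image: stored rows inside a focus, a constant outside."""

    def __init__(self, x0, y0, rows, outside=0):
        self.x0, self.y0 = x0, y0  # top-left corner of the focus in the plane
        self.rows = rows           # pixel values inside the focus
        self.outside = outside     # constant value everywhere else

    @property
    def width(self):
        return len(self.rows[0])

    @property
    def height(self):
        return len(self.rows)

    def at(self, x, y):
        """Value at plane coordinates (x, y), inside or outside the focus."""
        i, j = y - self.y0, x - self.x0
        if 0 <= i < self.height and 0 <= j < self.width:
            return self.rows[i][j]
        return self.outside

def add(a, b):
    """Pixelwise sum over the bounding rectangle of both focuses."""
    x0, y0 = min(a.x0, b.x0), min(a.y0, b.y0)
    x1 = max(a.x0 + a.width, b.x0 + b.width)
    y1 = max(a.y0 + a.height, b.y0 + b.height)
    rows = [[a.at(x, y) + b.at(x, y) for x in range(x0, x1)]
            for y in range(y0, y1)]
    return FocusedImage(x0, y0, rows, a.outside + b.outside)

def neighbour_sum(img, x, y):
    """Sum over the 3 x 3 neighbourhood; border pixels need no special case."""
    return sum(img.at(x + dx, y + dy) for dy in (-1, 0, 1) for dx in (-1, 0, 1))

a = FocusedImage(0, 0, [[1, 2], [3, 4]])
b = FocusedImage(1, 1, [[10]], outside=5)
ab, ba = add(a, b), add(b, a)  # same image regardless of operand order
```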
  16. 16. Visualization Toolkit (VTK) is an open source, freely available software system for 3D computer graphics, image processing, and visualization used by thousands of researchers and developers around the world. VTK consists of a C++ class library, and several interpreted interface layers including Tcl/Tk, Java, and Python. Professional support and products for VTK are provided by Kitware, Inc. VTK supports a wide variety of visualization algorithms including scalar, vector, tensor, texture, and volumetric methods; and advanced modeling techniques such as implicit modelling, polygon reduction, mesh smoothing, cutting, contouring, and Delaunay triangulation. Commercial Computer Vision Systems Automatix Inc., founded in January 1980, was the first company to market industrial robots with built-in machine vision. Its founders were Victor Scheinman, inventor of the Stanford arm; Phillippe Villers, Michael Cronin, and Arnold Reinhold of Computervision; Jake Dias and Dan Nigro of Data General; Gordon VanderBrug of NBS; and Norman Wittels of Clark University. Automatix Robots at the Robots 1985 show in Detroit, Michigan. Clockwise from lower left: AID 600, AID 900 Seamtracker, Yaskawa Motoman. Automatix mostly used robot mechanisms imported from Hitachi at first and later from Yaskawa and KUKA. It did design and manufacture a Cartesian robot called the AID-600. The 600 was intended for use in precision assembly but was adapted for welding use, particularly tungsten inert gas (TIG) welding, which demands high accuracy and immunity from the intense electromagnetic interference that the TIG process creates. Automatix was the first company to market a vision-guided welding robot, called Seamtracker. Structured laser
  17. 17. light and monochromatic filters were used to allow an image to be seen in the presence of the welding arc. Another concept, invented by Mr. Scheinman, was RobotWorld, a system of cooperating small modules suspended from a 2-D linear motor. The product line was later sold to Yaskawa. Automatix raised large amounts of venture capital, and went public in 1983, but was not profitable until the early 1990s. In 1994, Automatix merged with another machine vision company, Itran Corp., to form Acuity Imaging, Inc. Acuity was acquired by Robotics Vision Systems Inc. (RVSI) in September 1995. As of 2004, RVSI still supported the evolved Automatix machine vision package under the PowerVision brand. RapidEye is a commercial multispectral remote sensing satellite mission being designed and implemented by MDA for RapidEye AG. The RapidEye sensor images five optical bands in the 400-850 nm range and provides 5 m pixel size at nadir. Rapid delivery and short revisit times are provided through the use of a five-satellite constellation. Scantron is the name of a United States company that makes and sells Scantron exam answer sheets and the machines to grade them. The Scantron system usually takes the form of a "multiple choice, fill-in-the-circle/square/rectangle" form of varying length and width, from single-column 50-answer tests to multiple 8.5" x 11" page forms used in standardized testing such as the SAT and ACT. The forms are sensed optically, using optical mark recognition to detect markings in each place, in a "Scantron Machine" that tabulates and can automatically grade results. Earlier versions were sensed electrically.
  18. 18. A typical 100-answer Scantron answer sheet. This is only half of it (the front side) with the back side not being shown. Commonly, there are two sides to Scantron answer sheets.
  19. 19. They can contain 50 answer blanks, 100 answer blanks, and so on. There is even a smaller form called a "Quiz Strip" that contains only about 20 answer boxes to bubble in. On the larger sheets, there is a space on the back where answers can be manually written in for separate questions, if a test giver issues them. The full-sized 8.5" x 11" form may contain a larger area for working on math formulas, writing short answers, etc. Answers "A" and "B" are commonly used for "True" and "False" questions, as shown in the image to the right on the top of each row. Grading of Scantron sheets is performed first by creating an answer key. The answer key is simply a standard Scantron answer sheet with all of the correct answers filled in, along with the "key" rectangle at the top of the sheet. Once the answer key is ready, the Scantron machine is powered on and the answer key is fed through. This stores the answer key in the memory of the Scantron machine, and any further sheets that are fed through will be graded and marked according to the key in memory. Switching off the Scantron machine will stop the paper feed and clear the memory.
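The grading procedure described above (read the key first, then mark every later sheet against it) can be sketched in Python. This is an illustrative model only, not Scantron's software: it assumes each answer row has already been reduced to one darkness value per bubble, and the names CHOICES, read_marks and grade as well as the threshold of 0.5 are hypothetical.

```python
CHOICES = "ABCDE"

def read_marks(darkness_rows, threshold=0.5):
    """Turn per-bubble darkness values (0 = blank, 1 = black) into answers.

    A row counts as answered only if exactly one bubble is dark enough;
    blank or multiply marked rows yield None.
    """
    answers = []
    for row in darkness_rows:
        filled = [CHOICES[i] for i, d in enumerate(row) if d >= threshold]
        answers.append(filled[0] if len(filled) == 1 else None)
    return answers

def grade(key, sheet):
    """Number of rows on the sheet that match the stored answer key."""
    return sum(1 for k, s in zip(key, sheet) if k is not None and k == s)

# The key sheet is read first and kept in memory, as in the description above.
key = read_marks([
    [0.9, 0.1, 0.0, 0.0, 0.0],  # A
    [0.0, 0.0, 0.8, 0.1, 0.0],  # C
    [0.1, 0.9, 0.0, 0.0, 0.0],  # B
])
# A student sheet: first row correct, second row doubly marked, third row wrong.
student = read_marks([
    [0.8, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.7, 0.9, 0.0, 0.0],
    [0.9, 0.1, 0.0, 0.0, 0.0],
])
score = grade(key, student)
```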
  20. 20. Conclusion Computer vision, unlike for example factory machine vision, happens in unconstrained environments, potentially with changing cameras, changing lighting and changing camera views. Also, some "objects" such as roads, rivers, bushes, etc. are just difficult to describe. In these situations, engineering a model a priori can be difficult. With learning-based vision, one just "points" the algorithm at the data, and useful models for detection, segmentation, and identification can often be formed. Learning can often easily fuse or incorporate other sensing modalities such as sound, vibration, or heat. Since cameras and sensors are becoming cheap and powerful and learning algorithms have a vast appetite for computational threads, Intel is very interested in enabling geometric and learning-based vision routines in its OpenCV library, since such routines are vast consumers of computational power.