Materi_01_VK_2223_3.pdf

INTRODUCTION TO COMPUTER VISION
COMPUTER VISION
2022/2023 - 3

SCHEDULE
Lecture via Zoom meeting or in the classroom:
Every week on Saturday, starting July 01, 2023
Online attendance is noted every week

INSTRUCTOR
Dr. Ichsan Ibrahim, S.Si., M.Si.
ichsanibrahim@stmik-im.ac.id
Cell phone: 08158210073
Telegram: @ichsanibrahim

REPORT FORMAT
• An assignment report must have:
  • Title page: assignment title, student's name and student ID number (NIM), instructor's name, campus logo, and date of report preparation (5 points)
  • Summary: a summary of the major points, conclusions, and recommendations (6 points)
  • Contents (2 points)
  • Body of the report
    • Introduction: the problem statement or aim of the assignment and a short overview of the basic theory related to the task questions (10 points)
    • Main: this part should clearly reflect the specific achievements of the assignment, including the process and the results; for some assignments, this section contains the paper review (50 points)
    • Conclusions & recommendations (12 points)
  • References (4 points)
  • Appendix (if necessary)
• Reports must follow the rules of scientific writing and use the correct format (6 points)
• All assignments are submitted (uploaded) in digital format to the Computer Vision course on the Kuliah Online website, following the instructions given in the Assignment section.

GRADING
Attendance and participation (attending the course and Zoom meetings, and participating in the Forum): 10%
Homework and assignments: 20% (4 to 6 assignments)
Mid-term exam: 30%
Final exam: 40%
Failing to complete or submit all assignments will reduce your chances of passing the course.
Assignment submission deadlines: pay close attention to the settings and information on each task.

REFERENCE
• Szeliski, R. (2022). Computer Vision: Algorithms and Applications (Texts in Computer Science) (2nd ed.), Springer Nature Switzerland AG.
• Klette, R. (2014). Concise Computer Vision: An Introduction into Theory and Algorithms (1st ed.), Springer-Verlag London.
• Forsyth, D. A. and Ponce, J. (2012). Computer Vision: A Modern Approach (2nd ed.), Pearson Education, Inc.
• Fisher, R. B., Breckon, T. P., Dawson-Howe, K., Fitzgibbon, A., Robertson, C., Trucco, E., Williams, C. K. I. (2014). Dictionary of Computer Vision and Image Processing (2nd ed.), John Wiley & Sons Ltd.

COURSE ETHOS
• “It's your road & yours alone. Others may walk it with you, but no one can walk it for you.”
  Jalāl ad-Dīn Muḥammad Rūmī
• “If you want to build a boat, don't gather your men and women to give them orders, explain every detail, to tell them where to find everything. If you want to build a boat, give birth in the hearts of your men and women to the desire for the sea.”
  Saint-Exupéry

HISTORY & MILESTONE
• 1959 - Most experiments started here, when neurophysiologists showed an array of images to a cat in an attempt to correlate the responses in its brain. They found that it reacted first to lines or hard edges, which made it clear that image processing starts with simple shapes, such as straight edges.
• 1963 - Computers were able to interpret the three-dimensionality of a scene from a picture, and AI was already an academic field.
• 1974 - Optical character recognition (OCR) was introduced to help interpret text printed in any typeface.
• 1980 - Dr. Kunihiko Fukushima, a neuroscientist from Japan, proposed the Neocognitron, a hierarchical multilayered neural network capable of robust visual pattern recognition, including corner, curve, edge, and basic shape detection.
• 1982 - David Marr, a British neuroscientist, published another influential work, “Vision: A computational investigation into the human representation and processing of visual information”.
https://blog.superannotate.com/introduction-to-computer-vision/
https://hackernoon.com/a-brief-history-of-computer-vision-and-convolutional-neural-networks-8fe8aacc79f3

HISTORY & MILESTONE
• 1997 - Jitendra Malik (along with his student Jianbo Shi) released a paper describing his attempts to tackle perceptual grouping.
• 1999 - David Lowe published “Object Recognition from Local Scale-Invariant Features”.
• 2000-2001 - Studies on object recognition increased, helping in the development of the first real-time face recognition application.
• 2009 - Pedro Felzenszwalb, David McAllester, and Deva Ramanan developed the “Deformable Part Model”.
• 2010 - The ImageNet data set was made available, containing millions of tagged images across various object classes; it provided the foundation for the CNNs and other deep learning models used today.
• 2014 - COCO was developed to offer a data set for object detection and to support future research.

COMPUTER VISION
• What kind of scene?
• Where are the cars?
• How far is the building?
• Make computers understand images and video.

INFORMATION AND KNOWLEDGE
The first meaning of information and/or knowledge conceives of it as a physical
objective entity which can be passed from one person to another.
Knowledge, expressed as information, is encapsulated within a physical or electronic
artefact so that it can be communicated from one person to another; in this sense,
we might speak of a textbook, a research paper, a website or a documentary film as
containing knowledge which has been articulated by the author and which can be
interpreted by many others without any loss of meaning.

IS VISION EASY OR HARD?
http://persci.mit.edu/pub_pdfs/adelson_spie_01.pdf

VISION IS REALLY HARD
• Vision is an amazing feat of natural intelligence
• The visual cortex occupies about 50% of the macaque brain
• More of the human brain is devoted to vision than to anything else

WHY COMPUTER VISION MATTERS
Safety, Health, Security, Comfort, Access, Fun

THE 3 “R” OF COMPUTER VISION
The classic problems of computational vision:
• Reconstruction
• Recognition
• (Re)organization
Jitendra Malik, pioneer of computer vision (student of early AI researchers)

LET’S THINK
• Please think about these questions for 1 minute:
  • Have you ever used computer vision?
  • How? Where?
  • Which of the three categories does it fall into: Reconstruction? Recognition? (Re)organization?

EXISTING APPLICATIONS THAT USE COMPUTER VISION
• Laptop: biometric auto-login (face recognition, 3D), OCR
• Smartphones: QR codes, computational photography (Android Lens Blur, iPhone Portrait Mode), panorama construction (Google Photo Spheres), face detection, expression detection (smile), Snapchat filters (face tracking), FaceID (iPhone), Night Sight (Pixel), iPhone 12 Pro (LiDAR)
• Web: image search, Google Photos (face recognition, object recognition, scene recognition, geolocalization from vision), Facebook (image captioning), Google Maps aerial imaging (image stitching), YouTube (content categorization)
• VR/AR: outside-in tracking (HTC VIVE), inside-out tracking (simultaneous localization and mapping, HoloLens), object occlusion (dense depth estimation)
• Motion: Kinect, full-body skeleton tracking, gesture recognition, virtual try-on
• Medical imaging: CAT / MRI reconstruction, assisted diagnosis, automatic pathology, connectomics, endoscopic surgery

EXISTING APPLICATIONS THAT USE COMPUTER VISION
• Industry: vision-based robotics (marker-based), machine-assisted router (jig), automated post, ANPR (number plates), surveillance, drones, shopping
• Transportation: assisted driving (everything), face tracking/iris dilation for drunkenness and drowsiness, automated distribution (all modes)
• Media: visual effects for film and TV (reconstruction), virtual sports replay (reconstruction), semantics-based auto edits (reconstruction, recognition)
• Robotics: navigation and control
• Remote sensing: land use and environmental monitoring
• Psychology, AI: exploring representation and computation in natural vision

OPTICAL CHARACTER RECOGNITION (OCR)
• Technology to convert images of text into text
• If you have a scanner, it probably came with OCR software
• Or while using some translation apps:
  • Word Lens, a feature in Google Translate, https://en.wikipedia.org/wiki/Word_Lens
  • TextGrabber, https://www.textgrabber.pro/en/
  • Microsoft Translator, https://www.microsoft.com/en-us/translator/
  • Waygo, http://www.waygoapp.com/
Mail digit recognition, AT&T Labs
http://www.research.att.com/~yann/
License plate readers
http://en.wikipedia.org/wiki/Automatic_number_plate_recognition

FACE DETECTION
 Almost all digital cameras detect faces
 Snapchat face filters
 Why would this be useful?
 Main reason is focus.
 Also enables “smart” cropping.
Photo: http://thetechjournal.com/how-to/tutorial-face-swap-snapchat.xhtml
http://www.pleated-jeans.com/2016/03/02/21-snapchat-face-swaps-that-went-horribly-wrong/

SMILE DETECTION
https://www.sony.com/content/sony/en/en_us/SCA/company-news/press-releases/sony-electronics/2008/sony-adds-smile-shutter-function-to-cybershot-wseries-digital-cameras.html

VISION-BASED BIOMETRICS
• “How the Afghan Girl was Identified by Her Iris Patterns”, read the story: http://www.cl.cam.ac.uk/~jgd1000/afghan.html
• Wikipedia

OBJECT RECOGNITION (IN MOBILE PHONES)
Point & Find, Nokia (obsolete)
Google Lens, https://lens.google/

OBJECT RECOGNITION (IN SUPERMARKETS)
Amazon Go,
https://www.amazon.com/b?ie=UTF8&node=16008589011

LANEHAWK
https://www.datalogic.com/eng/retail/fixed-retail-scanners/lanehawk-lh5000-pd-830.html
LaneHawk is a loss-prevention solution that turns bottom-of-basket (BOB) losses into profits in real time.
“Like its predecessors, the LH5000 utilizes advanced Visual Pattern Recognition (ViPR) software plus newly added support for 1D bar codes and Digimarc Barcode digital watermarks to eliminate up to 90% of shrink caused by Bottom-Of-Basket (BOB) items … Simply stated, the LaneHawk system is the best to ensure all items on the bottom of shopping carts are paid for. …”

3D FROM IMAGES
• Building Rome in a Day:
  • Paper: Agarwal et al. 2009, https://grail.cs.washington.edu/rome/rome_paper.pdf
  • Website: https://grail.cs.washington.edu/rome/

HUMAN SHAPE CAPTURE
http://gl.ict.usc.edu/Research/presidentialportrait

SHAPE CAPTURE
• Shape capture refers to techniques for capturing the shape of physical objects.
The Matrix movies, ESC Entertainment, XYZRGB, NRC
http://cinetropolis.net/tarkin-care-of-business-rogue-ones-digital-peter-cushing/

MOTION CAPTURE
Motion capture is a cutting-edge method of capturing all or part of an actor's performance so that it can be translated into the action of a computer-generated 3D character on screen.
http://www.digitalspy.com/movies/oscars/feature/a584704/why-andy-serkis-deserves-an-oscar-nomination-for-planet-of-the-apes/?zoomable

SPORTS: VIRTUAL PITCH MARKINGS
• Sportvision first-down line
• 1st & Ten is a computer system that augments televised coverage of American football by inserting graphical elements on the field of play as if they were physically present; the inserted element stays fixed within the coordinates of the playing field and obeys the visual rules of foreground objects occluding background objects.
• Nice explanation on www.howstuffworks.com
• http://www.sportvision.com/video.html

COMPUTER VISION IN SPORT
• Thomas, G., Gade, R., Moeslund, T. B., Carr, P., & Hilton, A. (2017). Computer vision for sports: Current applications and research topics. Computer Vision and Image Understanding, 159, 3-18. https://www.sportperformanceanalysis.com/s/Computer-vision-for-sports-current-applications-and-research-topics.pdf
• https://www.sportperformanceanalysis.com/article/computer-vision-in-sport

INTERACTIVE GAMES
• Object recognition: http://www.youtube.com/watch?feature=iv&v=fQ59dXOo63o
• Mario: http://www.youtube.com/watch?v=8CTJL5lUjHg
• 3D: http://www.youtube.com/watch?v=7QrnwoO1-8A
• Robot: http://www.youtube.com/watch?v=w8BmgtMKFbY

AUTO CAR
• Mobileye
• Vision systems currently in high-end BMW, GM, and Volvo models. By 2010: 70% of car manufacturers.

GOOGLE CAR
• Oct 9, 2010. "Google Cars Drive Themselves, in Traffic". The New York Times. John Markoff
• June 24, 2011. "Nevada state law paves the way for driverless cars". Financial Post. Christine Dobby
• Aug 9, 2011. "Human error blamed after Google's driverless car sparks five-vehicle crash". The Star (Toronto)

AUTOCARS
• Uber bought a Carnegie Mellon University (CMU) lab (2015), http://www.cmu.edu/news/stories/archives/2015/february/uber-partnership.html
• http://www.wsj.com/articles/is-uber-a-friend-or-foe-of-carnegie-mellon-in-robotics-1433084582
• http://www.freep.com/story/money/cars/ford/2016/08/21/uber--lyft-gm-pittsburgh-autonomous-vehicles-self-driving-autos-equal-profits/88944036/
• Then sold it (2020), https://techcrunch.com/2020/12/07/uber-sells-self-driving-unit-uber-atg-in-deal-that-will-push-auroras-valuation-to-10b/

COMPUTER VISION IN SPACE
• Vision systems (JPL) used for several tasks
  • Panorama stitching
  • 3D terrain modeling
  • Obstacle detection, position tracking
• For more, read “Computer Vision on Mars” by Matthies et al., https://www.ri.cmu.edu/pub_files/pub4/matthies_larry_2007_1/matthies_larry_2007_1.pdf

COMPUTER VISION IN SPACE
• NASA Perseverance lander and rover
https://mars.nasa.gov/mars2020/mission/technology/#Terrain-Relative-Navigation

COMPUTER VISION ON MARS
• The rover has 23 cameras on it. https://mars.nasa.gov/mars2020/spacecraft/rover/cameras/
• https://mars.nasa.gov/mars2020/spacecraft/rover/brains
  • The CPU is a 200 MHz PowerPC architecture
  • 2 GB storage
  • 256 MB RAM

INDUSTRIAL ROBOTS
Vision-guided robots position nut runners on wheels

MOBILE ROBOTS
• Robotic Grasping of Novel Objects using Vision: http://ai.stanford.edu/~asaxena/learninggrasp/IJRR_saxena_etal_roboticgraspingofnovelobjects.pdf
• RoboCup is an international scientific initiative with the goal of advancing the state of the art of intelligent robots. When it was established in 1997, the original mission was to field a team of robots capable of winning against the human soccer World Cup champions by 2050, https://robocup.org/
• Mars Spirit Rover: one of two rovers launched in 2003 to explore Mars and search for signs of past life, Spirit far outlasted her planned 90-day mission, lasting over six years. https://www.jpl.nasa.gov/missions/mars-exploration-rover-spirit-mer-spirit

MEDICAL IMAGING
• 3D Reconstruction, Visualization, and Measurement of MRI Images, https://sci-hub.se/https://doi.org/10.1117/12.341059
• Three-Dimensional Medical CT Image Reconstruction, https://sci-hub.se/10.1109/ICMTMA.2009.10
• Image Guided Surgery, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.469.8474&rep=rep1&type=pdf

HUMANOID ROBOTS
• Boston Dynamics (2021), https://blog.bostondynamics.com/flipping-the-script-with-atlas

AUGMENTED REALITY AND VIRTUAL REALITY
MS HoloLens, Oculus, Magic Leap, ARCore / ARKit

AUGMENTED REALITY AND VIRTUAL REALITY
Oculus (Quest)
Niantic

AI FOR PHYSICAL INTERACTION

COMPUTER VISION AND NEARBY FIELDS
Computer Graphics: Models to Images
Image Processing: Images to Images
Computer Vision: Images to Models

COMPUTER VISION AND NEARBY FIELDS
Derogatory summary of computer vision:
“Machine learning applied to visual data.”
[Figure: Computer Vision maps images, videos, and sensor data from the real world to information (a model of the visual world) in the digital world; Computer Graphics maps that model back to images, videos, and interaction.]

SUPERHUMAN STATE OF THE ART?
Deep learning is an enormous disruption to the field. Since 2012, rapid
expansion and commercialization.
Why?
“With enough data, computer vision matches or even outperforms human
vision at most recognition tasks.”
What.

VISION AND SOCIETY
• Lots of data = lots of potential bias in the data.
Needs understanding of possible failures + a responsible approach + techniques to overcome bias.

VISION AND SOCIETY
• “Vision, in my view, is the cause of the greatest benefit to us, inasmuch as none of the accounts now given concerning the Universe would ever have been given if men had not seen the stars or the sun or the heavens.”
  Plato (Timaeus, 360 BC)
“Worldview” vs. “World-sense”

VISION AND SOCIETY
Societal Categorizations
Prioritization of Vision • Visual Categorization • Visual Biases
• “The reason that the body has so much presence in the West is that the world is primarily perceived by sight. The differentiation of human bodies in terms of sex, skin color, and cranium size is a testament to the powers attributed to "seeing." It is believed that just by looking at it [the body] one can tell a person's beliefs and social position or lack thereof.”
  Oyeronke Oyewumi (The Invention of Women, 1997)

VISION AND SOCIETY
Derogatory summary of computer vision:
“Machine learning applied to visual data.”
[Figure: the Computer Vision / Computer Graphics loop between the real world (images, videos, sensor data…; images, videos, interaction) and the digital world (information, models of the visual world), annotated as culturally defined and technologically constrained: vision priority, visual categories, digital constraints.]

VISION AND SOCIETY
https://www.bbc.com/news/technology-51148501
As of 2020/01/22, Google has come out in favour of the ban; Microsoft against.

IMAGE PROCESSING: EXAMPLE
• Smoothing is used to reduce noise or to produce a less pixelated image. Most smoothing methods are based on low-pass filters, but you can also smooth an image using the average or median value of a group of pixels (a kernel) that moves across the image.
• Image smoothing is a preprocessing technique intended to remove possible image perturbations (noise) without losing image information. Analogously, sharpening is a preprocessing technique that plays an important role in feature extraction.
• Contrast stretching (often called normalization) is a simple image enhancement technique that attempts to improve the contrast in an image by ‘stretching’ the range of intensity values it contains to span a desired range of values, for example the full range of pixel values that the image type concerned allows.
• Noise removal is the process of removing or reducing the noise in an image. Noise removal algorithms reduce the visibility of noise by smoothing the entire image while leaving areas near contrast boundaries untouched. However, these methods can obscure fine, low-contrast details.
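The median-based smoothing and the contrast stretching described on this slide can be sketched in a few lines. This is a minimal NumPy sketch, not the course's reference implementation; the function names `median_filter` and `contrast_stretch`, the 3 x 3 default kernel, edge-replicating padding, and the 0 to 255 output range are all assumptions chosen for illustration.

```python
import numpy as np

def median_filter(image, k=3):
    """Smooth a 2D grayscale image with a k x k moving median (k odd)."""
    pad = k // 2
    # Replicate border pixels so the output has the same size as the input.
    padded = np.pad(image.astype(float), pad, mode="edge")
    h, w = image.shape
    # Stack the k*k shifted copies; the median over axis 0 is the kernel median.
    windows = np.stack(
        [padded[dy:dy + h, dx:dx + w] for dy in range(k) for dx in range(k)],
        axis=0,
    )
    return np.median(windows, axis=0)

def contrast_stretch(image, out_min=0.0, out_max=255.0):
    """Linearly map the image's intensity range onto [out_min, out_max]."""
    img = image.astype(float)
    lo, hi = img.min(), img.max()
    if hi == lo:
        # A flat image has no contrast to stretch.
        return np.full(img.shape, out_min)
    return (img - lo) / (hi - lo) * (out_max - out_min) + out_min

# A flat patch with one bright "salt" pixel: the median filter removes it.
noisy = np.full((5, 5), 100.0)
noisy[2, 2] = 255.0
cleaned = median_filter(noisy)

# A low-contrast patch stretched to the full 8-bit range.
stretched = contrast_stretch(np.array([[50, 100], [150, 200]]))
```

The median is used here rather than the average because a single outlier pixel does not shift it, which is why it suits impulse ("salt and pepper") noise.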

COMPUTER VISION METHODS: EXAMPLE
• Shape recovery from images is a fundamental problem in computer vision. Common methods typically fall into one of two classes: geometric or photometric approaches. Geometric approaches take images of a scene from multiple viewpoints, find point correspondences across the images, and establish their geometric positions to recover the shape. Photometric approaches, on the other hand, recover per-pixel surface orientation using shading cues. For example, Shape from Shading (SFS) recovers per-pixel surface normal vectors from a single image taken under a single distant light source.
• Cell segmentation is the task of splitting a microscopic image domain into segments that represent individual instances of cells. It is a fundamental step in many biomedical studies, and it is regarded as a cornerstone of image-based cellular research.

COMPUTER VISION METHODS: EXAMPLE
• Shape-from-shading (SFS) is an important method for reconstructing the three-dimensional (3D) shape of a surface in photometry and computer vision.
• Lambertian surface reflectance and orthographic camera projection are the two fundamental assumptions of SFS; they generally lead to undesirable reconstruction results because the imaging model they imply is inaccurate.
[Figure: 3D surface shape from shading]
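To make the two SFS assumptions concrete, the sketch below renders the image that a Lambertian sphere would produce under orthographic projection and one distant light. It shows only the forward imaging model that SFS tries to invert, not an SFS solver. The function name `render_lambertian_sphere`, the unit sphere, and the default light direction are assumptions made for this illustration.

```python
import numpy as np

def render_lambertian_sphere(size=65, light=(0.0, 0.0, 1.0), albedo=1.0):
    """Render a unit sphere with intensity = albedo * max(0, n . l)."""
    light = np.asarray(light, dtype=float)
    light = light / np.linalg.norm(light)  # distant light: one unit direction

    # Orthographic camera: pixel (x, y) sees the surface point (x, y, z(x, y)).
    ys, xs = np.mgrid[-1:1:size * 1j, -1:1:size * 1j]
    r2 = xs ** 2 + ys ** 2
    mask = r2 <= 1.0                        # pixels that fall on the sphere
    zs = np.sqrt(np.clip(1.0 - r2, 0.0, None))

    # On a unit sphere the surface normal equals the surface point itself.
    normals = np.stack([xs, ys, zs], axis=-1)

    # Lambertian reflectance: brightness depends only on the angle between
    # the normal and the light, clamped to zero where the surface faces away.
    shading = np.clip(normals @ light, 0.0, None)
    return albedo * shading * mask, mask

image, mask = render_lambertian_sphere()
```

With the light along the viewing direction, as in the default call, each pixel's intensity equals the z-component of its surface normal, which is the kind of relationship an SFS method exploits to recover orientation from a single image.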

COMPUTER VISION OUTPUT: EXAMPLE
• In stereo video or images you have more information per frame or image, which allows a 3D representation (depth) of the image or video signal to be created.
• Stereo image may refer to a stereogram, an image intended to give a 3-dimensional visual impression (perception of depth).
[Figure: 3D surface shape from stereo images]
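The extra information in a stereo pair becomes depth through triangulation. The sketch below uses the standard relation depth = focal length x baseline / disparity, which holds under assumptions the slide does not state: two identical pinhole cameras, a rectified image pair, and disparity measured in pixels. The function name and the numbers in the usage lines are hypothetical.

```python
import numpy as np

def depth_from_disparity(disparity_px, focal_length_px, baseline_m):
    """Convert a disparity map (pixels) to depth (metres): Z = f * B / d."""
    d = np.asarray(disparity_px, dtype=float)
    depth = np.full(d.shape, np.inf)   # zero disparity means "infinitely far"
    valid = d > 0
    depth[valid] = focal_length_px * baseline_m / d[valid]
    return depth

# Hypothetical rig: 700-pixel focal length, cameras 10 cm apart.
depth = depth_from_disparity([10.0, 20.0, 0.0],
                             focal_length_px=700.0, baseline_m=0.1)
```

Doubling the disparity halves the depth, so nearby objects shift more between the two views than distant ones.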

REMEMBER: THE THREE “R”
• Jitendra Malik, UC Berkeley: Three ‘R’s of Computer Vision
• “[Further progress in] the classic problems of computational vision:
  • reconstruction
  • recognition
  • (re)organization
  [requires us to study the interaction among these processes].”
Note: organization means building taxonomies of the visual world so that we can move towards reasoning, not just recognition.

HUMAN VISION VS COMPUTER VISION
Human vision                Computer vision
• Retina                    • CCD array
• Organization in layers    • Compaction of information
• Color vision              • RGB device
• Vision of depth           • Geometric stereoscopy

COMPUTER VISION SYSTEM
A Computer Vision System (CVS) is expected to have a level of capability as high as that of the Human Visual System (HVS):
• Object detection: is an object present in the scene? If so, where are its boundaries?
• Recognition: putting a label on an object
• Description: assigning properties to objects
• 3D inference: interpreting a 3D scene from 2D views
• Interpreting motion

TOOLS
Image processing: noise removal, edge detection, morphology
Feature extraction and clustering
Measurement
Modelling, model fitting, and optimization
Statistics and classification

THE THREE STAGES OF COMPUTER VISION
• Low-level processing (image to image): noise removal, image enhancement
• Intermediate processing (image to symbolic): a set of lines/vectors that represent the boundaries of an object in the image
• High-level processing (symbolic to symbolic): the symbolic representation of object boundaries produces the object’s description
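The step from image to symbols can be illustrated with edge detection followed by extraction of edge coordinates. This is a minimal sketch using the Sobel operator, one common choice that the slides do not prescribe; the function names, the threshold value, and the list-of-points output format are assumptions.

```python
import numpy as np

def sobel_edges(image, threshold):
    """Image to image: mark pixels whose Sobel gradient magnitude is large."""
    img = image.astype(float)
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    # Slide both 3x3 kernels over the image (correlation form; the sign of
    # the response does not affect the magnitude computed below).
    for dy in range(3):
        for dx in range(3):
            window = padded[dy:dy + h, dx:dx + w]
            gx += kx[dy, dx] * window
            gy += ky[dy, dx] * window
    return np.hypot(gx, gy) > threshold

def edge_points(edge_mask):
    """Image to symbolic: turn the edge mask into a list of (row, col) points."""
    rows, cols = np.nonzero(edge_mask)
    return list(zip(rows.tolist(), cols.tolist()))

# A dark left half next to a bright right half gives one vertical boundary.
step = np.zeros((6, 6))
step[:, 3:] = 100.0
mask = sobel_edges(step, threshold=200.0)
points = edge_points(mask)
```

A high-level stage would then reason over `points` (for example, fit a line and describe it as "a vertical boundary") rather than over pixel values.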

LOW LEVEL
blurring
sharpening
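Blurring and sharpening can both be written as a small kernel slid over the image. The sketch below is one possible illustration: the helper `filter2d`, the 3 x 3 box-blur kernel, and the Laplacian-style sharpening kernel are assumptions, and the slide's own example images may have been produced differently.

```python
import numpy as np

# Box blur: each output pixel is the average of its 3x3 neighbourhood.
BLUR_KERNEL = np.full((3, 3), 1.0 / 9.0)

# Sharpening: boost the centre pixel and subtract its four direct neighbours.
SHARPEN_KERNEL = np.array([[0, -1, 0],
                           [-1, 5, -1],
                           [0, -1, 0]], dtype=float)

def filter2d(image, kernel):
    """Apply a small kernel to a 2D grayscale image (same-size output)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(image.astype(float), ((ph, ph), (pw, pw)), mode="edge")
    h, w = image.shape
    out = np.zeros((h, w))
    # Accumulate the weighted, shifted copies of the image.
    for dy in range(kh):
        for dx in range(kw):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out

step = np.zeros((6, 6))
step[:, 3:] = 100.0
blurred = filter2d(step, BLUR_KERNEL)      # the edge becomes a gradual ramp
sharpened = filter2d(step, SHARPEN_KERNEL)  # the edge overshoots on both sides
```

Both kernels sum to 1, so flat regions keep their brightness. The sharpened result can leave the valid intensity range near edges, so a real pipeline would typically clip it before display.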

INTERMEDIATE LEVEL
[Figure: K-means clustering (followed by connected component analysis) turns the original color image into regions of homogeneous color, stored in a data structure.]
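K-means clustering of pixel colors, the intermediate-level step named on this slide, can be sketched as follows. This is a plain NumPy version for illustration only: the function names, the fixed iteration count, and the random initialisation from image pixels are assumptions, and the connected component analysis that follows in the slide is not included.

```python
import numpy as np

def kmeans_colors(pixels, k, iterations=10, seed=0):
    """Cluster an (N, C) array of pixel colors into k groups."""
    rng = np.random.default_rng(seed)
    pixels = pixels.astype(float)
    # Start from k distinct pixels picked at random.
    centers = pixels[rng.choice(len(pixels), size=k, replace=False)]
    for _ in range(iterations):
        # Assignment step: each pixel joins its nearest center.
        distances = np.linalg.norm(
            pixels[:, None, :] - centers[None, :, :], axis=2
        )
        labels = distances.argmin(axis=1)
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = pixels[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return labels, centers

def segment_by_color(image, k):
    """Return an (H, W) label map of homogeneous-color regions."""
    h, w, c = image.shape
    labels, centers = kmeans_colors(image.reshape(-1, c), k)
    return labels.reshape(h, w), centers

# A tiny image: left half red, right half blue.
image = np.zeros((4, 4, 3))
image[:, :2] = (255, 0, 0)
image[:, 2:] = (0, 0, 255)
label_map, centers = segment_by_color(image, k=2)
```

The label map groups pixels by color only; two separate red objects would share a label, which is why a connected component pass is needed afterwards to split them into individual regions.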

LOW- TO HIGH-LEVEL
[Figure: building recognition. Low level: edge image; mid level: consistent line clusters; high level: building recognition.]

CONSIDERATIONS IN COMPUTER VISION DESIGN
• What information do you want to get, and how is that information manifested in the images?
  • It is necessary to determine the relationship between physical entities and their intrinsic characteristics. For example, a house can be distinguished from a tree because straight lines are one of its intrinsic properties, and the sea can be distinguished from other objects because it has a uniform appearance.
http://persci.mit.edu/pub_pdfs/adelson_spie_01.pdf
An intrinsic property is a property that an object or a thing has of itself, independently of other things, including its context.
https://yandex.com/company/technologies/vision/

• What knowledge is needed to recover the information?
  • Models are needed to determine the relationship between pixel intensity and image properties:
    • The scene model: types of features, textures, smoothness
    • The illumination model: the position and characteristics of the light source and the reflectance properties of the object's surface
    • The sensor model: position and optical performance of the camera used, noise and distortion in the digitization process

• Processing speed and knowledge representation
  • It is necessary to anticipate real-time processing requirements, for example in a go/no-go quality inspection process.
  • Encoding knowledge in a form that is appropriate and easy to understand is another essential consideration in the design of a vision system.
In general, go/no-go testing refers to a pass/fail test (or check) principle using two boundary conditions or a binary classification. The test is passed only when the go condition is met and the no-go condition fails.
Encoded knowledge is expressed in terms of an accepted ‘language’ that is understood (or must be learned) by the recipients, such as professional ‘jargon’, accepted disciplinary concepts, technical languages such as statistics, or the argot of street language, and that is used to decode the meaning.
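The go/no-go rule defined above maps directly onto a two-condition check. The sketch below assumes a hypothetical inspection task in which a vision system has measured a hole diameter; the function name, parameter names, and limit values are invented for illustration.

```python
def passes_go_no_go(measured_diameter_mm, go_limit_mm, no_go_limit_mm):
    """Pass only if the go condition is met and the no-go condition fails."""
    # Go condition: the hole is at least as wide as the "go" gauge.
    go_fits = measured_diameter_mm >= go_limit_mm
    # No-go condition: the hole is so wide that the "no-go" gauge fits too.
    no_go_fits = measured_diameter_mm >= no_go_limit_mm
    return go_fits and not no_go_fits

# Hypothetical tolerance band: 10.00 mm up to (but not including) 10.05 mm.
accepted = passes_go_no_go(10.02, go_limit_mm=10.00, no_go_limit_mm=10.05)
too_small = passes_go_no_go(9.98, go_limit_mm=10.00, no_go_limit_mm=10.05)
too_large = passes_go_no_go(10.06, go_limit_mm=10.00, no_go_limit_mm=10.05)
```

Because the result is a single boolean per part, the check itself is cheap; in a real-time inspection line the time budget is spent on obtaining the measurement from the image.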

THANK YOU
https://apod.nasa.gov/apod/astropix.html?
Astronomy Picture of the Day (6 Feb 2022): Blue Marble Earth
Image Credit: NASA, Apollo 17 Crew
