4. REPORT FORMAT
The assignment report must contain:
Title page: assignment title, student's name and student ID number (NIM), instructor's name, campus logo, and date of report preparation (5 points)
Summary: a summary of the major points, conclusions, and recommendations (6 points)
Contents (2 points)
The body of the report:
Introduction: the problem statement or aim of the assignment and a short overview of the basic theory related to the task questions (10 points)
Main: this part should clearly reflect the specific achievements of the assignment, including the process and the results; for some assignments, this section contains the paper review (50 points)
Conclusions & Recommendations (12 points)
References (4 points)
Appendix (if necessary)
The report must follow the rules of scientific writing and use the correct format (6 points).
All assignments are submitted (uploaded) in digital format to the Computer Vision course on the Kuliah Online website, following the instructions given in the Assignment section.
5. GRADING
Attendance and participation (attending the course and Zoom meetings, and participating in the Forum): 10%
Homework and assignments: 20% (4 to 6 assignments)
Mid-term exam: 30%
Final exam: 40%
Failing to complete or submit all assignments will reduce your chances of passing the course.
Assignment submission deadlines: pay close attention to the settings and information given for each task.
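As a rough illustration of how the four weights combine, the sketch below computes a weighted course score. It assumes every component is marked on a 0-100 scale, which the slides do not state; the function name course_score and the example marks are hypothetical.

```python
# Weights in percent, taken from the grading slide.
WEIGHTS = {
    "attendance": 10,
    "assignments": 20,
    "midterm": 30,
    "final": 40,
}

def course_score(scores):
    """Weighted average of component scores (each assumed to be on a 0-100 scale)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100

# Hypothetical marks for one student.
example = {"attendance": 90, "assignments": 80, "midterm": 70, "final": 75}
print(course_score(example))
```

The final exam carries twice the weight of the assignments, but the slide also notes that missing assignments lowers the chance of passing, so the arithmetic alone may not decide the outcome.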
6. REFERENCE
Szeliski, R. (2022). Computer Vision: Algorithms and Applications (Texts in Computer Science) (2nd ed.),
Springer Nature Switzerland AG.
Klette, R. (2014). Concise Computer Vision: An Introduction into Theory and Algorithms (1st ed.),
Springer-Verlag London.
Forsyth, D. A., & Ponce, J. (2012). Computer Vision: A Modern Approach (2nd ed.), Pearson Education, Inc.
Fisher, R. B., Breckon, T. P., Dawson-Howe, K., Fitzgibbon, A., Robertson, C., Trucco, E., & Williams, C. K. I. (2014). Dictionary of Computer Vision and Image Processing (2nd ed.), John Wiley & Sons Ltd.
7. COURSE ETHOS
It's your road & yours alone. Others may walk it with you, but no one
can walk it for you.
Jalāl ad-Dīn Muḥammad Rūmī
“If you want to build a boat, don't gather your men and women to give
them orders, explain every detail, to tell them where to find everything.
If you want to build a boat, give birth in the hearts of your men and
women to the desire for the sea.”
Antoine de Saint-Exupéry
8. HISTORY & MILESTONE
1959 - Early experiments began when neurophysiologists showed an array of images to a cat in an attempt to correlate the responses in its brain. They found that it reacted first to lines and hard edges, which indicated that image processing starts with simple shapes such as straight edges.
1963 - Computers were able to interpret the three-dimensional structure of a scene from a picture, and AI was already an academic field.
1974 - Optical character recognition (OCR) was introduced to help interpret text printed in any typeface.
1980 - Dr. Kunihiko Fukushima, a neuroscientist from Japan, proposed the Neocognitron, a hierarchical multilayered neural network capable of robust visual pattern recognition, including corner, curve, edge, and basic shape detection.
1982 - David Marr, a British neuroscientist, published another influential work, “Vision: A computational investigation into the human representation and processing of visual information”.
https://blog.superannotate.com/introduction-to-computer-vision/
https://hackernoon.com/a-brief-history-of-computer-vision-and-convolutional-neural-networks-8fe8aacc79f3
9. HISTORY & MILESTONE
1997 - Jitendra Malik, along with his student Jianbo Shi, released a paper describing his attempts to tackle perceptual grouping.
1999 - David Lowe published “Object Recognition from Local Scale-Invariant Features”.
2000-2001 - Studies on object recognition increased, helping in the development of the first real-time face recognition application.
2009 - Pedro Felzenszwalb, David McAllester, and Deva Ramanan developed the “Deformable Part Model”.
2010 - The ImageNet data set was made available, containing millions of tagged images across various object classes; it provided the foundation for the CNNs and other deep learning models used today.
2014 - The COCO data set was developed for object detection and to support future research.
10. COMPUTER VISION
What kind of scene?
Where are the cars?
How far is the building?
Make computers understand images
and video.
11. INFORMATION AND KNOWLEDGE
The first meaning of information and/or knowledge conceives of it as a physical
objective entity which can be passed from one person to another.
Knowledge, expressed as information, is encapsulated within a physical or electronic
artefact so that it can be communicated from one person to another; in this sense,
we might speak of a textbook, a research paper, a website or a documentary film as
containing knowledge which has been articulated by the author and which can be
interpreted by many others without any loss of meaning.
12. IS VISION EASY OR HARD?
http://persci.mit.edu/pub_pdfs/adelson_spie_01.pdf
13. VISION IS REALLY HARD
Vision is an amazing feat of natural intelligence.
The visual cortex occupies about 50% of the macaque brain.
More of the human brain is devoted to vision than to anything else.
15. THE 3 “R”s OF COMPUTER VISION
The classic problems of computational vision:
Reconstruction
Recognition
(Re)organization
Jitendra Malik – pioneer of computer vision (student of
early AI researchers)
16. LET’S THINK
Please think about this question for one minute:
Have you ever used computer vision?
How? Where?
Which of the three categories does it belong to: Reconstruction? Recognition? (Re)organization?
17. LIST OF EXISTING APPLICATIONS THAT USE COMPUTER VISION
Laptop: biometric auto-login (face recognition, 3D), OCR
Smartphones: QR codes, computational photography (Android Lens Blur, iPhone Portrait Mode), panorama construction (Google Photo Spheres), face detection, expression detection (smile), Snapchat filters (face tracking), FaceID (iPhone), Night Sight (Pixel), iPhone 12 Pro (LiDAR)
Web: image search, Google Photos (face recognition, object recognition, scene recognition, geolocalization from vision), Facebook (image captioning), Google Maps aerial imaging (image stitching), YouTube (content categorization)
VR/AR: outside-in tracking (HTC VIVE), inside-out tracking (simultaneous localization and mapping, HoloLens), object occlusion (dense depth estimation)
Motion: Kinect, full-body skeleton tracking, gesture recognition, virtual try-on
Medical imaging: CAT/MRI reconstruction, assisted diagnosis, automatic pathology, connectomics, endoscopic surgery
18. LIST OF EXISTING APPLICATIONS THAT USE COMPUTER VISION
Industry: vision-based robotics (marker-based), machine-assisted router (jig), automated post, ANPR (number plates), surveillance, drones, shopping
Transportation: assisted driving (everything), face tracking/iris dilation for drunkenness and drowsiness, automated distribution (all modes)
Media: visual effects for film and TV (reconstruction), virtual sports replay (reconstruction), semantics-based auto edits (reconstruction, recognition)
Robotics: navigation and control
Remote sensing: land use and environmental monitoring
Psychology, AI: exploring representation and computation in natural vision
19. OPTICAL CHARACTER RECOGNITION (OCR)
• Technology to convert images of text into text
• If you have a scanner, it probably came with OCR
software
• Or while using some translation apps:
• Word Lens, a feature in Google Translate,
https://en.wikipedia.org/wiki/Word_Lens
• TextGrabber, https://www.textgrabber.pro/en/
• Microsoft Translator,
https://www.microsoft.com/en-us/translator/
• Waygo, http://www.waygoapp.com/
Mail digit recognition, AT&T Labs
http://www.research.att.com/~yann/
License plate readers
http://en.wikipedia.org/wiki/Automatic_number_plate_recognition
20. FACE DETECTION
Almost all digital cameras detect faces
Snapchat face filters
Why would this be useful?
Main reason is focus.
Also enables “smart” cropping.
Photo - http://thetechjournal.com/how-to/tutorial-face-swap-snapchat.xhtml
http://www.pleated-jeans.com/2016/03/02/21-snapchat-face-swaps-that-went-horribly-wrong/
22. VISION-BASED BIOMETRICS
“How the Afghan Girl was Identified by Her Iris Patterns”
Read the story:
Wikipedia
http://www.cl.cam.ac.uk/~jgd1000/afghan.html
24. OBJECT RECOGNITION (IN MOBILE PHONES)
Point & Find, Nokia (obsolete)
Google Lens, https://lens.google/
25. OBJECT RECOGNITION (IN SUPERMARKETS)
Amazon Go,
https://www.amazon.com/b?ie=UTF8&node=16008589011
26. LANEHAWK
https://www.datalogic.com/eng/retail/fixed-retail-scanners/lanehawk-lh5000-pd-830.html
LaneHawk is a loss-prevention solution that turns
bottom-of-basket (BOB) losses into profits in real time.
“Like its predecessors, the LH5000 utilizes advanced
Visual Pattern Recognition (ViPR) software plus newly
added support for 1D bar codes and Digimarc Barcode
digital watermarks to eliminate up to 90% of shrink
caused by Bottom-Of-Basket (BOB) items … Simply
stated, the LaneHawk system is the best to ensure all
items on the bottom of shopping carts are paid for. …”
27. 3D FROM IMAGES
Building Rome in a Day:
Paper: Agarwal et al. 2009, https://grail.cs.washington.edu/rome/rome_paper.pdf
Website: https://grail.cs.washington.edu/rome/
29. SHAPE CAPTURE
Shape capture refers to techniques for capturing the shape of physical objects.
The Matrix movies, ESC Entertainment, XYZRGB, NRC
http://cinetropolis.net/tarkin-care-of-business-rogue-ones-digital-peter-cushing/
30. MOTION CAPTURE
Motion capture is a method of capturing all or part of an actor's performance so that it can be translated into the action of a computer-generated 3D character on screen.
http://www.digitalspy.com/movies/oscars/feature/a584704/why-andy-serkis-deserves-an-oscar-nomination-for-planet-of-the-apes/?zoomable
31. SPORTS: VIRTUAL PITCH MARKINGS
Sportvision first-down line
1st & Ten is a computer system that augments televised coverage of American football by inserting graphical elements on the field of play as if they were physically present; the inserted element stays fixed within the coordinates of the playing field and obeys the visual rules of foreground objects occluding background objects.
A clear explanation is available on www.howstuffworks.com
http://www.sportvision.com/video.html
32. COMPUTER VISION IN SPORT
Thomas, G., Gade, R., Moeslund, T. B., Carr, P., & Hilton, A. (2017). Computer vision for sports: Current applications and research topics. Computer Vision and Image Understanding, 159, 3-18.
https://www.sportperformanceanalysis.com/s/Computer-vision-for-sports-current-applications-and-research-topics.pdf
https://www.sportperformanceanalysis.com/article/computer-vision-in-sport
34. AUTO CAR
Mobileye
Vision systems are currently in high-end BMW, GM, and Volvo models. By 2010: 70% of car manufacturers.
35. GOOGLE CAR
Oct 9, 2010. "Google Cars Drive Themselves, in Traffic". The New York Times. John Markoff
June 24, 2011. "Nevada state law paves the way for driverless cars". Financial Post. Christine Dobby
Aug 9, 2011. "Human error blamed after Google's driverless car sparks five-vehicle crash". The Star (Toronto)
36. AUTOCARS
Uber bought a Carnegie Mellon University (CMU) lab (2015),
http://www.cmu.edu/news/stories/archives/2015/february/uber-partnership.html
http://www.wsj.com/articles/is-uber-a-friend-or-foe-of-carnegie-mellon-in-robotics-1433084582
http://www.freep.com/story/money/cars/ford/2016/08/21/uber--lyft-gm-pittsburgh-autonomous-vehicles-self-driving-autos-equal-profits/88944036/
Then sold it (2020),
https://techcrunch.com/2020/12/07/uber-sells-self-driving-unit-uber-atg-in-deal-that-will-push-auroras-valuation-to-10b/
37. COMPUTER VISION IN SPACE
Vision systems (JPL) are used for several tasks:
Panorama stitching
3D terrain modeling
Obstacle detection, position tracking
For more, read “Computer Vision on Mars” by Matthies et al.,
https://www.ri.cmu.edu/pub_files/pub4/matthies_larry_2007_1/matthies_larry_2007_1.pdf
38. COMPUTER VISION IN SPACE
NASA Perseverance lander and rover
https://mars.nasa.gov/mars2020/mission/technology/#Terrain-Relative-Navigation
39. COMPUTER VISION ON MARS
The Perseverance rover has 23 cameras.
https://mars.nasa.gov/mars2020/spacecraft/rover/cameras/
https://mars.nasa.gov/mars2020/spacecraft/rover/brains
The CPU is a 200 MHz PowerPC architecture.
2 GB storage
256 MB RAM
41. MOBILE ROBOTS
Robotic Grasping of Novel Objects using Vision:
http://ai.stanford.edu/~asaxena/learninggrasp/IJRR_saxena_etal_roboticgraspingofnovelobjects.pdf
RoboCup is an international scientific initiative with the goal to advance the state of the art of intelligent robots. When established in 1997, the original mission was to field a team of robots capable of winning against the human soccer World Cup champions by 2050, https://robocup.org/
Mars Spirit Rover: one of two rovers launched in 2003 to explore Mars and search for signs of past life, Spirit far outlasted her planned 90-day mission, lasting over six years.
https://www.jpl.nasa.gov/missions/mars-exploration-rover-spirit-mer-spirit
42. MEDICAL IMAGING
3D Reconstruction, Visualization, and Measurement of MRI Images, https://sci-hub.se/https://doi.org/10.1117/12.341059
Three-Dimensional Medical CT Image Reconstruction, https://sci-hub.se/10.1109/ICMTMA.2009.10
Image Guided Surgery
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.469.8474&rep=rep1&type=pdf
47. COMPUTER VISION AND NEARBY FIELDS
Computer Graphics: Models to Images
Image Processing: Images to Images
Computer Vision: Images to Models
48. COMPUTER VISION AND NEARBY FIELDS
Derogatory summary of computer vision:
“Machine learning applied to visual data.”
[Diagram: Computer Vision leads from the real world (images, videos, sensor data…) to a model of the visual world in the digital world; Computer Graphics leads from that model back to images, videos, and interaction. The diagram is labelled "Information".]
49. SUPERHUMAN STATE OF THE ART?
Deep learning is an enormous disruption to the field. Since 2012, rapid
expansion and commercialization.
Why?
“With enough data, computer vision matches or even outperforms human
vision at most recognition tasks.”
What.
50. VISION AND SOCIETY
Lots of data = lots of potential bias in the data.
Needs understanding of possible failures.
+
Responsible approach.
+
Techniques to overcome bias.
51. VISION AND SOCIETY
“Vision, in my view, is the cause of the greatest
benefit to us, inasmuch as none of the accounts
now given concerning the Universe would ever
have been given if men had not seen the stars or
the sun or the heavens.”
- Plato (Timaeus, 360 BC)
“Worldview” vs. “World-sense”
52. VISION AND SOCIETY
Societal Categorizations
Prioritization of Vision • Visual Categorization • Visual Biases
“The reason that the body has so much presence in the
West is that the world is primarily perceived by sight. The
differentiation of human bodies in terms of sex, skin
color, and cranium size is a testament to the powers
attributed to "seeing." It is believed that just by looking at
it [the body] one can tell a person's beliefs and social
position or lack thereof.”
- Oyeronke Oyewumi
(The Invention of Women, 1997)
53. VISION AND SOCIETY
Derogatory summary of computer vision:
“Machine learning applied to visual data.”
[Diagram: the Computer Vision / Computer Graphics loop between the real world (images, videos, sensor data…; images, videos, interaction) and the digital world (models of the visual world), annotated as "culturally defined & technologically constrained", with the labels "Vision priority", "Visual categories", and "Digital constraints".]
56. IMAGE PROCESSING: EXAMPLES
Smoothing is used to reduce noise or to produce a less pixelated image. Most smoothing methods are based on low-pass filters, but you can also smooth an image using the average or median value of a group of pixels (a kernel) that moves through the image.
Image smoothing is a preprocessing technique intended to remove image perturbations (noise) while losing as little image information as possible. Analogously, sharpening is a preprocessing technique that plays an important role in feature extraction.
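The moving-kernel averaging described above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: the function name mean_filter, the 3 x 3 default kernel, and the choice to clamp coordinates at the image border are all assumptions that do not come from the slides.

```python
def mean_filter(image, k=3):
    """Smooth a 2D grayscale image (list of rows) with a k x k averaging kernel.

    Border pixels are handled by clamping coordinates to the nearest edge
    (an assumed border rule; other rules such as zero padding are possible).
    """
    h, w = len(image), len(image[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += image[yy][xx]
            out[y][x] = total / (k * k)
    return out

# Flat background with a single bright noise pixel in the centre.
noisy = [
    [10, 10, 10, 10, 10],
    [10, 10, 10, 10, 10],
    [10, 10, 100, 10, 10],
    [10, 10, 10, 10, 10],
    [10, 10, 10, 10, 10],
]
smoothed = mean_filter(noisy)
print(smoothed[2][2])  # the spike is averaged with its eight neighbours
```

Averaging lowers the spike but also spreads it into neighbouring pixels, which is the loss of detail that smoothing trades for noise reduction.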
Contrast stretching (often called normalization) is a simple image enhancement technique that attempts to improve the contrast in an image by ‘stretching’ the range of intensity values it contains to span a desired range of values, typically the full range of pixel values that the image type allows.
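A linear stretch is one common way to do this. The sketch below maps the darkest pixel to out_min and the brightest to out_max; the function name contrast_stretch and the 0-255 target range (an 8-bit image) are assumptions made for illustration.

```python
def contrast_stretch(image, out_min=0, out_max=255):
    """Linearly rescale pixel values so they span [out_min, out_max]."""
    lo = min(min(row) for row in image)
    hi = max(max(row) for row in image)
    if hi == lo:
        # A constant image has no contrast to stretch.
        return [[out_min for _ in row] for row in image]
    span = out_max - out_min
    return [
        [round(out_min + (p - lo) * span / (hi - lo)) for p in row]
        for row in image
    ]

# A low-contrast image whose values only cover 100..150.
dull = [[100, 110], [120, 150]]
print(contrast_stretch(dull))  # [[0, 51], [102, 255]]
```

Because the mapping depends on the single darkest and brightest pixels, one outlier can defeat the stretch; practical variants often clip at chosen percentiles instead.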
Noise removal is the process of removing or reducing the noise in an image. Noise removal algorithms reduce the visibility of noise by smoothing the entire image except for areas near contrast boundaries. However, these methods can obscure fine, low-contrast details.
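A median filter is one simple example of noise removal that tends to keep contrast boundaries sharp. The slides do not name a specific algorithm, so the choice of a median filter, the function name median_filter, and the edge-clamping border rule are assumptions of this sketch.

```python
from statistics import median

def median_filter(image, k=3):
    """Replace each pixel by the median of its k x k neighbourhood.

    Coordinates are clamped at the border (an assumed border rule).
    """
    h, w = len(image), len(image[0])
    r = k // 2
    out = [row[:] for row in image]
    for y in range(h):
        for x in range(w):
            window = [
                image[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in range(-r, r + 1)
                for dx in range(-r, r + 1)
            ]
            out[y][x] = median(window)
    return out

# Dark region on the left, bright region on the right, one "salt" pixel (255).
img = [
    [0, 0, 0, 200, 200],
    [0, 255, 0, 200, 200],
    [0, 0, 0, 200, 200],
]
for row in median_filter(img):
    print(row)
```

In this small example the isolated noise pixel is removed while the vertical boundary between the dark and bright regions stays where it was. A thin line only one pixel wide would be removed in the same way, which illustrates how such filters can obscure fine details.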
57. COMPUTER VISION METHODS: EXAMPLE
Shape recovery from images is a fundamental problem in computer vision. Common methods typically fall into one of two classes: geometric or photometric approaches. Geometric approaches take images of a scene from multiple viewpoints, find point correspondences across images, and establish their geometric position to recover the shapes. Photometric approaches, on the other hand, recover per-pixel surface orientation using shading cues. For example, Shape from Shading (SFS) recovers per-pixel surface normal vectors from a single image taken under a single distant light source.
Cell segmentation is the task of splitting a microscopic image domain into segments that represent individual instances of cells. It is a fundamental step in many biomedical studies and is regarded as a cornerstone of image-based cellular research.
58. COMPUTER VISION METHODS: EXAMPLE
Shape-from-shading (SFS) is an important method for reconstructing the three-dimensional (3D) shape of a surface in photometry and computer vision. Lambertian surface reflectance and orthographic camera projection are its two fundamental assumptions; because they describe the imaging process inaccurately, they often lead to poor reconstruction results.
3D Surface Shape from Shading
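To make the Lambertian assumption concrete, the sketch below evaluates the forward shading model that shape-from-shading tries to invert: intensity = albedo x max(0, n . l), where n is the unit surface normal and l is the unit direction towards a distant light. It is only the forward model, not an SFS solver, and the function names and example vectors are hypothetical.

```python
import math

def normalize(v):
    """Scale a 3D vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)

def lambertian_intensity(normal, light, albedo=1.0):
    """Brightness of a Lambertian surface patch under one distant light.

    normal: surface normal (any length); light: direction towards the light.
    Patches facing away from the light receive zero intensity.
    """
    n = normalize(normal)
    l = normalize(light)
    cos_angle = sum(a * b for a, b in zip(n, l))
    return albedo * max(0.0, cos_angle)

light = (0, 0, 1)                                # light along the viewing axis
print(lambertian_intensity((0, 0, 1), light))    # patch facing the light
print(lambertian_intensity((1, 0, 1), light))    # patch tilted by 45 degrees
print(lambertian_intensity((0, 0, -1), light))   # patch facing away
```

Many different normals produce the same intensity (every normal on a cone around the light direction), which is one reason why recovering shape from a single shaded image needs extra assumptions.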
59. COMPUTER VISION OUTPUT: EXAMPLE
Stereo video and stereo images carry more information per frame, which allows a 3D representation (depth) of the image or video signal to be created.
Stereo image may also refer to a stereogram, an image intended to give a three-dimensional visual impression (perception of depth).
3D Surface Shape from Stereo Images
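For a rectified pair of pinhole cameras, depth follows from disparity through the standard relation Z = f x B / d, where f is the focal length in pixels, B the baseline between the cameras, and d the horizontal disparity in pixels. The slides do not state this setup, so the rectified-pinhole assumption, the function name, and the example numbers are all illustrative.

```python
import math

def depth_from_disparity(disparity_px, focal_px, baseline_m):
    """Depth in metres of a point seen in a rectified stereo pair.

    disparity_px: horizontal shift of the point between left and right image.
    A zero (or negative) disparity is treated as a point at infinity.
    """
    if disparity_px <= 0:
        return math.inf
    return focal_px * baseline_m / disparity_px

# Hypothetical camera: 700 px focal length, 12 cm baseline.
for d in (84, 42, 21):
    print(d, "px ->", depth_from_disparity(d, focal_px=700, baseline_m=0.12), "m")
```

Halving the disparity doubles the estimated depth, so distant points (small disparities) are measured much less precisely than nearby ones.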
60. REMEMBER: THE THREE “R”s
Jitendra Malik, UC Berkeley: Three ‘R’s of Computer Vision
“[Further progress in] the classic problems of computational vision:
reconstruction
recognition
(re)organization
[requires us to study the interaction among these processes].”
Note: organization means building taxonomies of the visual world so that we can move towards reasoning, not just recognition.
61. HUMAN VISION VS COMPUTER VISION
Human vision | Computer vision
Retina | CCD array
Organization in layers | Compaction of information
Color vision | RGB device
Vision of depth | Geometric stereoscopy
62. COMPUTERVISION SYSTEM
ComputerVision System (CVS) is expected to have the level capability as high as HumanVisual System(HVS)
Object detection – is an object present inthe scene ? If so, where is its boundaries ?
Recognition – putting a label on an object
Description – assigning properties to objects
3D inference – interpreting a 3D scene from 2D views
Interpreting motion
61
63. TOOLS
Image processing: noise removal, edge detection, morphology
Feature extraction and clustering
Measurement
Modelling, fitting the model, and optimization
Statistics and classification
64. THE THREE STAGES OF COMPUTER VISION
• Low-level processing (image to image): noise removal, image enhancement
• Intermediate processing (image to symbolic): a set of lines/vectors that represent the boundaries of an object in the image
• High-level processing (symbolic to symbolic): the symbolic representation of object boundaries produces the object’s description
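The three stages can be illustrated with a toy pipeline. Everything in it is a deliberately simplified assumption: thresholding stands in for low-level processing, a bounding box stands in for the boundary representation, and a coarse shape label stands in for the object description. The function names are hypothetical.

```python
def low_level(image, threshold=128):
    """Image to image: binarise so that faint background variation disappears."""
    return [[1 if p >= threshold else 0 for p in row] for row in image]

def intermediate(binary):
    """Image to symbolic: bounding box (top, left, bottom, right) of the object."""
    coords = [(y, x) for y, row in enumerate(binary)
              for x, v in enumerate(row) if v]
    if not coords:
        return None
    ys = [y for y, _ in coords]
    xs = [x for _, x in coords]
    return (min(ys), min(xs), max(ys), max(xs))

def high_level(box):
    """Symbolic to symbolic: turn the boundary into a coarse description."""
    if box is None:
        return "no object"
    top, left, bottom, right = box
    height, width = bottom - top + 1, right - left + 1
    if height == width:
        return "square"
    return "wide" if width > height else "tall"

image = [
    [10, 12, 11, 9],
    [10, 200, 210, 9],
    [11, 205, 220, 12],
    [10, 11, 12, 10],
]
box = intermediate(low_level(image))
print(box, high_level(box))  # (1, 1, 2, 2) square
```

Each stage consumes only the output of the previous one, which mirrors the image to image, image to symbolic, and symbolic to symbolic progression.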
70. CONSIDERATIONS IN COMPUTER VISION DESIGN
What information do you want to get, and how is that information manifested in the images?
It is necessary to determine the relationship between physical entities and their intrinsic characteristics. For example, a house can be distinguished from a tree because it has straight lines as an intrinsic property, and the sea can be distinguished from other objects because it has a uniform appearance.
http://persci.mit.edu/pub_pdfs/adelson_spie_01.pdf
An intrinsic property is a property that an object or a thing has of itself, independently of other things, including its context.
https://yandex.com/company/technologies/vision/
71. CONSIDERATIONS IN COMPUTER VISION DESIGN
What knowledge is needed to recover the information?
Models are needed to determine the relationship between pixel intensity and image properties, namely:
The scene model: types of features, textures, smoothness
The illumination model: the position and characteristics of the light source and the reflectance properties of the object's surface
The sensor model: the position and optical performance of the camera used, and the noise and distortion in the digitization process
72. CONSIDERATIONS IN COMPUTER VISION DESIGN
Processing speed and knowledge representation
It is necessary to anticipate real-time processing requirements, for example in a go/no-go quality inspection process.
Encoding knowledge in a form that is appropriate and easy to understand is another essential consideration in the design of a vision system.
In general, go/no-go testing refers to a pass/fail test (or check) principle using two boundary conditions or a binary classification. The test is passed only when the Go condition is met and the No-go condition fails.
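One possible reading of this rule for a dimensional inspection is sketched below. The interpretation of the two conditions (Go: the part is not oversized; No-go: the part is undersized), the function name go_no_go, and the tolerance values are assumptions for illustration, for example a diameter measured from an image.

```python
def go_no_go(measurement, lower_limit, upper_limit):
    """Return True (pass) only if the Go check succeeds and the No-go check fails."""
    go = measurement <= upper_limit      # Go condition: not oversized
    no_go = measurement < lower_limit    # No-go condition: undersized
    return go and not no_go

# Hypothetical tolerance band: 9.95 mm to 10.05 mm.
for diameter in (10.02, 10.10, 9.90):
    verdict = "pass" if go_no_go(diameter, 9.95, 10.05) else "fail"
    print(diameter, verdict)
```

The result is a single boolean per part, which keeps the decision cheap enough for the real-time inspection setting mentioned above.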
Encoded knowledge is expressed in terms of an accepted ‘language’ that is understood (or must be learned) by the recipients, such as professional ‘jargon’, accepted disciplinary concepts, technical languages such as statistics, or the argot of street language, and that is used to decode the meaning.