The "Deep Learning for Computer Vision" material covers deep learning fundamentals with a focus on transfer learning using TensorFlow, model evaluation, and deployment. In essence, transfer learning reuses the knowledge of a previously trained model to improve performance on a new task. Model evaluation assesses the quality and reliability of the trained model, while deployment covers moving the model into a production environment for practical use. The material provides a holistic view of applying deep learning to computer vision, spanning the essential stages from development to deploying models in real-world applications.
10. “Deep learning allows computational models of multiple processing layers to learn and represent data with multiple levels of abstraction, mimicking how the brain perceives and understands multimodal information, thus implicitly capturing intricate structures of large-scale data.”
12. Annisa Darmawahyuni
Computer vision is a field of artificial intelligence (AI) that enables
computers and systems to derive meaningful information from digital
images, videos and other visual inputs — and take actions or make
recommendations based on that information. If AI enables computers to
think, computer vision enables them to see, observe and understand.
COMPUTER
VISION
18. OBJECT DETECTION
Object detection is the process of detecting instances of semantic objects of a certain class (such as humans,
airplanes, or birds) in digital images and video.
[Figure: ground truth, bounding box with region approach, and bounding box with region and semantic segmentation approach]
19. OBJECT DETECTION
You can choose from two key approaches to get started with object detection using deep learning:
Create and train a custom object detector.
To train a custom object detector from scratch, you need to design a network architecture to learn
the features for the objects of interest. You also need to compile a very large set of labeled data to
train the CNN. The results of a custom object detector can be remarkable. That said, you need to
manually set up the layers and weights in the CNN, which requires a lot of time and training data.
Use a pretrained object detector.
Many object detection workflows using deep learning leverage transfer learning, an approach that
enables you to start with a pretrained network and then fine-tune it for your application. This
method can provide faster results because the object detectors have already been trained on
thousands, or even millions, of images.
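The pretrained-detector idea can be sketched in miniature with plain NumPy: a frozen "pretrained" feature extractor whose weights never change, plus a small new head trained on the target task. Everything here (weights, data, task) is synthetic and purely illustrative:

```python
import numpy as np

# Toy sketch of transfer learning: a frozen "pretrained" feature
# extractor (W_feat) plus a new classification head trained from
# scratch. All weights and data are synthetic illustrations.

rng = np.random.default_rng(0)

# Frozen pretrained extractor: maps 8-dim inputs to 4-dim features.
W_feat = rng.normal(size=(8, 4))

def features(x):
    """Frozen feature extractor (its weights are never updated)."""
    return np.maximum(x @ W_feat, 0.0)  # ReLU

# Synthetic binary task: the label depends on the first feature.
X = rng.normal(size=(200, 8))
F = features(X)
y = (F[:, 0] > F[:, 0].mean()).astype(float)

# Append a bias column and train only the head (logistic regression).
Fb = np.hstack([F, np.ones((len(F), 1))])
W_head = np.zeros(Fb.shape[1])
lr = 0.1
for _ in range(1000):
    p = 1.0 / (1.0 + np.exp(-(Fb @ W_head)))  # sigmoid
    grad = Fb.T @ (p - y) / len(y)            # log-loss gradient
    W_head -= lr * grad                       # only the head moves

acc = (((Fb @ W_head) > 0) == y.astype(bool)).mean()
print(f"head-only training accuracy: {acc:.2f}")
```

In a real TensorFlow workflow, the frozen part would be a pretrained backbone (e.g., from `tf.keras.applications`) with its layers set to non-trainable, and only the new head would be optimized.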
26. SEMANTIC SEGMENTATION
Semantic segmentation is a deep learning technique that associates a label or category with every pixel in an image. It is used to recognize a collection of pixels that form distinct categories.
A simple example of semantic segmentation is separating the images into two classes. For example, in Figure 1, an image showing a person
at the beach is paired with a version showing the image's pixels segmented into two separate classes: person and background.
27. HOW DOES SEMANTIC SEGMENTATION DIFFER FROM OBJECT DETECTION?
Semantic segmentation can be a useful alternative to object detection because it allows the object of interest to span
multiple areas in the image at the pixel level. This technique cleanly detects objects that are irregularly shaped, in
contrast to object detection, where objects must fit within a bounding box (Figure 2).
Figure 2. Object detection, showing bounding boxes to identify objects.
29. SEMANTIC SEGMENTATION
The process of training a semantic segmentation network to
classify images follows these steps:
Analyze a collection of pixel-labeled images.
Create a semantic segmentation network.
Train the network to classify images into pixel categories.
Assess the accuracy of the network.
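The four steps above can be mimicked end-to-end on synthetic data, substituting a single intensity threshold for the segmentation network (a deliberately minimal stand-in, not a real model):

```python
import numpy as np

# Minimal sketch of the four segmentation steps, using a toy
# per-pixel classifier instead of a real network. All data is synthetic.

rng = np.random.default_rng(1)

# Step 1: analyze pixel-labeled images — a synthetic 16x16 "image"
# whose foreground pixels are brighter on average than the background.
labels = np.zeros((16, 16), dtype=int)
labels[4:12, 4:12] = 1                      # square "object" mask
image = rng.normal(0.2, 0.1, (16, 16))
image[labels == 1] += 0.5                   # foreground is brighter

# Step 2: "create a network" — here just a single intensity threshold,
# the simplest possible per-pixel classifier.
# Step 3: "train" — place the threshold midway between class means.
threshold = (image[labels == 0].mean() + image[labels == 1].mean()) / 2
pred = (image > threshold).astype(int)

# Step 4: assess accuracy — fraction of correctly labeled pixels.
pixel_acc = (pred == labels).mean()
print(f"pixel accuracy: {pixel_acc:.2f}")
```

A real pipeline would replace the threshold with a segmentation network trained on the pixel-labeled images, but the analyze / create / train / assess loop is the same.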
31. DATASETS FOR COMPUTER VISION
Grayscale Images. The most widely used grayscale image dataset is MNIST (https://www.kaggle.com/datasets/hojjatk/mnist-dataset) and its variations, i.e., NIST and perturbed NIST. The typical application scenario is handwritten digit recognition.
RGB Natural Images. The Caltech RGB image datasets (https://euclid.caltech.edu/image/euclid20231107b-ngc-6822) and the CIFAR datasets (https://www.cs.toronto.edu/~kriz/cifar.html) consist of thousands of 32 × 32 color images in various classes.
Hyperspectral Images. The SCIEN hyperspectral image data and AVIRIS sensor-based datasets, for example, contain hyperspectral images.
Facial Characteristics Images. The Adience benchmark dataset is commonly used for age and gender estimation from unconstrained face images.
Medical Images. The Chest X-ray dataset (https://www.kaggle.com/datasets/paultimothymooney/chest-xray-pneumonia) comprises 112,120 frontal-view X-ray images of 30,805 unique patients.
Video Streams. The WR datasets can be used for video-based activity recognition in assembly lines.
YouTube-8M is a dataset of 8 million YouTube video URLs, along with video-level labels from a diverse set of 4,800 Knowledge Graph entities.
33. HYPERPARAMETER TUNING (DL)
Learning rate (LR). If the learning rate is too small, overfitting can occur. Large learning rates help to regularize the training, but if the learning rate is too large, training will diverge.
Number of hidden layers.
Number of nodes/neurons per layer.
Optimizer
Batch Size
Epochs
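The learning-rate trade-off above can be illustrated on a toy 1-D quadratic loss, where the stability threshold is easy to see (all values here are illustrative, not tied to any real network):

```python
import numpy as np

# Gradient descent on L(w) = w^2: a small LR converges slowly, a
# well-chosen LR converges fast, and an LR above 1.0 (for this
# particular loss) makes the iterates grow without bound.

def run_gd(lr, steps=50, w0=1.0):
    """Return |w| after gradient-descent steps on L(w) = w^2."""
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w          # dL/dw = 2w
    return abs(w)

small = run_gd(lr=0.01)   # converges, but slowly
good = run_gd(lr=0.5)     # converges in one step for this loss
large = run_gd(lr=1.1)    # diverges: |w| grows every step

print(small, good, large)
```

The 1.0 stability bound is specific to this quadratic; real networks have no such closed-form bound, which is why the learning rate is tuned empirically.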
Scientific Articles on Computer Vision and Deep Learning
Intelligent System Research Group
https://docs.google.com/spreadsheets/d/13MLJnecd32B3H-f342M-Uoqd_y5wRVgGDK1aT-bQg3w/edit#gid=0