2. DONE
• Information Retrieval (query the dataset)
• TF-IDF
• Machine Learning (classify new instances)
• CF-IIF
• Technologies: OpenIMAJ / Spark
• Image Pipeline
• Classification logic
• Java implementation (and tests)
3. TO REVIEW
• KNN with KD-tree
• Cosine similarity and distance metrics
• Improve the CF-IIF extractor (logic in Spark)
• Hyperparameter tuning
• Reduce the feature space: SVD
(choosing 0.01% for the clusters gives 6000+ features)
4. TODO
• Re-engineering in Spark
• Test different SIFT features (pyramid?)
• Replace KNN with CNeuralNetworks (GraphLab/Deep4J)
5. IMAGE PIPELINE
(Diagram: pipeline stages LOADER, FEATURE EXTRACTOR, MODEL TEST, PREDICTION)
Build a classifier based on a vocabulary of salient features created from the dataset:
1. Load an image dataset covering 3 distinct classes
2. Extract local features from each image
3. Create the codebook for classification
4. Train and test the model
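As an illustration of how a codebook is used once it exists, here is a minimal Java sketch that maps each local descriptor of an image to its nearest codebook entry and counts the hits per cluster. The class and method names are hypothetical, the centroids are assumed to come from a clustering step such as k-means, and the 2-D toy data only stands in for real descriptors.

```java
import java.util.Arrays;

/** Sketch: quantize local descriptors against a codebook (hypothetical helper, not the deck's code). */
final class CodebookQuantizer {

    /** Squared Euclidean distance between two vectors of equal length. */
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Index of the codebook centroid closest to the descriptor. */
    static int nearestCentroid(double[] descriptor, double[][] codebook) {
        int best = 0;
        double bestDist = Double.POSITIVE_INFINITY;
        for (int j = 0; j < codebook.length; j++) {
            double dist = squaredDistance(descriptor, codebook[j]);
            if (dist < bestDist) {
                bestDist = dist;
                best = j;
            }
        }
        return best;
    }

    /** Counts how many descriptors of one image fall into each cluster. */
    static int[] histogram(double[][] descriptors, double[][] codebook) {
        int[] counts = new int[codebook.length];
        for (double[] descriptor : descriptors) {
            counts[nearestCentroid(descriptor, codebook)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        double[][] codebook = { {0.0, 0.0}, {10.0, 10.0} };            // toy centroids (assumed)
        double[][] descriptors = { {1.0, 0.5}, {9.0, 11.0}, {0.2, 0.1} };
        System.out.println(Arrays.toString(histogram(descriptors, codebook))); // prints [2, 1]
    }
}
```

The resulting histogram gives every image a vector of the same length, whatever the number of interest points found in it.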
6. FEATURES EXTRACTION
• Feature extraction is where the interesting image processing happens, and it is the key part of the pipeline.
• Feature extractors produce a featureVector that represents an image in a vector space.
• A visual application needs robust features for classification and search.
(Diagram: pipeline stages IMAGES, FEATURE EXTRACTOR, MODEL TEST, PREDICTION)
7. WHY FEATURES EXTRACTION
• Typically, image features are numerical vectors that can be used with ML techniques.
• FeatureVectors can be compared by measuring a distance.
• It is useful to group similar features and reduce the dimensionality of the feature space.
• Indexing for Information Retrieval
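To make "comparing by measuring a distance" concrete, the following Java sketch computes two common measures between feature vectors: Euclidean distance and cosine similarity. The class name is hypothetical, and returning 0.0 for a zero-length vector is an assumption made here to avoid dividing by zero.

```java
/** Sketch: two common ways to compare feature vectors (hypothetical helper class). */
final class VectorDistances {

    /** Euclidean (L2) distance: smaller means more similar. */
    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Cosine similarity: 1.0 for vectors pointing the same way, 0.0 for orthogonal ones. */
    static double cosineSimilarity(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // assumption: an all-zero vector is treated as similar to nothing
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] a = {1.0, 2.0};
        double[] b = {2.0, 4.0};
        System.out.println(euclidean(a, b));
        System.out.println(cosineSimilarity(a, b));
    }
}
```

Cosine similarity ignores vector length, which can help when images yield very different numbers of interest points; Euclidean distance does not.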
10. IMAGE FEATURES
Image features can be:
LOCAL: multiple featureVectors from interest points (different for each image)
11. IMAGE FEATURES
Image features can be:
SEGMENTED: multiple featureVectors from region points (different for each image)
12. SIFT FEATURES
Scale-Invariant Feature Transform (SIFT) is an advanced technique for extracting local features from the interest points of an image; these features are invariant to rotation, lighting changes…
It builds on the idea of a local gradient histogram by adding spatial binning: in essence, it creates multiple gradient histograms around each interest point and appends them all together.
Standard SIFT geometry uses a 4x4 spatial grid of histograms with 8 orientations each.
This leads to a 128-dimensional (4 x 4 x 8) feature vector that is highly discriminative and robust.
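The arithmetic behind the 128 dimensions can be sketched in Java as follows: 16 spatial cells times 8 orientation bins, appended into one flat vector. The class name and the cell-by-cell ordering are assumptions for illustration; real SIFT implementations may order the bins differently and also weight and normalise the values.

```java
/** Sketch: how a 4x4 grid of 8-bin histograms becomes one 128-value descriptor (hypothetical layout). */
final class SiftLayout {
    static final int GRID = 4;          // 4x4 spatial cells around the interest point
    static final int ORIENTATIONS = 8;  // 8 gradient-orientation bins per cell
    static final int LENGTH = GRID * GRID * ORIENTATIONS; // 4 * 4 * 8 = 128

    /** Position of one histogram bin inside the flat descriptor. */
    static int index(int row, int col, int bin) {
        return (row * GRID + col) * ORIENTATIONS + bin;
    }

    /** Appends all cell histograms, cell by cell, into a single vector. */
    static float[] concatenate(float[][][] cellHistograms) {
        float[] descriptor = new float[LENGTH];
        for (int row = 0; row < GRID; row++) {
            for (int col = 0; col < GRID; col++) {
                for (int bin = 0; bin < ORIENTATIONS; bin++) {
                    descriptor[index(row, col, bin)] = cellHistograms[row][col][bin];
                }
            }
        }
        return descriptor;
    }

    public static void main(String[] args) {
        float[][][] cells = new float[GRID][GRID][ORIENTATIONS];
        cells[3][3][7] = 1.0f; // last bin of the last cell
        float[] descriptor = concatenate(cells);
        System.out.println(descriptor.length); // prints 128
    }
}
```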
13. OPENIMAJ
• Java image processing library that includes many feature extractors and other utilities for visual applications:
• DoGSIFT
• DenseSIFT
• PyramidSIFT
• …
14. SPARK
• Spark is used to scale out the application: large datasets, high feature dimensionality…
• MLlib
19. MY IMAGE PIPELINE
(Diagram: pipeline stages DATASET, SIFT EXTRACTOR, KMEANS, QUANTIZATOR, KNN)
At this point we associate each image with a "cluster vector" (the analogue of the keyword vector for text):
Cw = <w1, w2, ..., wn>, n = |C|
For each image and for each descriptor cluster j, the corresponding weight wj is the product of two factors:
• Cluster Frequency: the percentage of that image's points that were mapped to cluster j
• Inverse Image Frequency: the logarithm of the ratio between the cardinality of the database and the number of images that contain descriptors mapped to that cluster
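The weighting can be sketched in Java as below, taking one cluster histogram per image as input. Several details are assumptions made for the example: the class name is hypothetical, the cluster frequency is expressed as a fraction in [0, 1] rather than a percentage, the logarithm is the natural one, and a cluster that appears in no image gets weight 0.

```java
/** Sketch: CF-IIF weights from per-image cluster histograms (hypothetical helper class). */
final class CfIif {

    /**
     * histograms[i][j] = number of points of image i mapped to cluster j.
     * Returns weights[i][j] = CF(i, j) * IIF(j).
     */
    static double[][] weights(int[][] histograms) {
        int images = histograms.length;
        int clusters = images == 0 ? 0 : histograms[0].length;

        // Number of images in which each cluster occurs at least once.
        int[] imageFrequency = new int[clusters];
        for (int[] histogram : histograms) {
            for (int j = 0; j < clusters; j++) {
                if (histogram[j] > 0) {
                    imageFrequency[j]++;
                }
            }
        }

        double[][] result = new double[images][clusters];
        for (int i = 0; i < images; i++) {
            int totalPoints = 0;
            for (int count : histograms[i]) {
                totalPoints += count;
            }
            for (int j = 0; j < clusters; j++) {
                if (totalPoints == 0 || imageFrequency[j] == 0) {
                    continue; // assumption: no points or unused cluster means weight 0
                }
                double cf = (double) histograms[i][j] / totalPoints;        // Cluster Frequency
                double iif = Math.log((double) images / imageFrequency[j]); // Inverse Image Frequency
                result[i][j] = cf * iif;
            }
        }
        return result;
    }

    public static void main(String[] args) {
        int[][] histograms = { {2, 2}, {0, 4} }; // toy data: 2 images, 2 clusters
        double[][] w = weights(histograms);
        System.out.println(w[0][0] + " " + w[0][1]);
    }
}
```

As with TF-IDF for text, a cluster that occurs in every image gets an inverse image frequency of log(1) = 0, so it contributes nothing to the comparison.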
32. IMAGE PIPELINES PAIN POINTS
• Image feature extraction: find interest points and extract discriminative and robust features
• OPENIMAJ: multiple algorithms
• Learn large codebooks from features
• Train the model (KNN)
• SPARK: scalable models (3 days to train a KNN model on 15 images with OpenIMAJ)
• SPARK: multiple models (Bayes models, Neural Networks)
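For reference, a KNN model in its simplest form can be sketched in Java as a brute-force search followed by a majority vote. This is not the OpenIMAJ or Spark implementation; the class name, the toy labels and the tie-breaking rule (the first label to reach the highest vote count wins) are assumptions for the example.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

/** Sketch: brute-force k-nearest-neighbour classification (hypothetical helper class). */
final class BruteForceKnn {

    static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Returns the majority label among the k training vectors closest to the query. */
    static String classify(double[][] train, String[] labels, double[] query, int k) {
        // Sort the training indices by distance to the query.
        Integer[] order = new Integer[train.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> squaredDistance(train[i], query)));

        // Majority vote over the k nearest neighbours.
        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestVotes = 0;
        for (int n = 0; n < Math.min(k, order.length); n++) {
            String label = labels[order[n]];
            int count = votes.merge(label, 1, Integer::sum);
            if (count > bestVotes) {
                bestVotes = count;
                best = label;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] train = { {0, 0}, {0, 1}, {5, 5}, {6, 5} };
        String[] labels = { "a", "a", "b", "b" };
        System.out.println(classify(train, labels, new double[] {0.2, 0.3}, 3)); // prints a
    }
}
```

Every query is compared against every training vector, so the cost grows with both the dataset size and the feature dimension.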
33. IMAGE PIPELINES PAIN POINTS
• Efficient Nearest Neighbour Search (test)
• KDTree
• Hyperparameter tuning (in my own pipeline, used for KMeans, CF-IIF and KNN)
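A KD-tree speeds up nearest-neighbour search by splitting the points along one axis at a time and skipping subtrees that cannot contain a closer point. The Java sketch below is a minimal version written for illustration, not the OpenIMAJ class; the class name and the median-split strategy are assumptions. Note that the pruning tends to become less effective as the number of dimensions grows.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Sketch: minimal KD-tree with exact nearest-neighbour search (hypothetical helper class). */
final class KdTree {
    private final double[] point;
    private final int axis;
    private final KdTree left;
    private final KdTree right;

    private KdTree(double[] point, int axis, KdTree left, KdTree right) {
        this.point = point;
        this.axis = axis;
        this.left = left;
        this.right = right;
    }

    /** Builds a balanced tree by splitting on the median, cycling through the axes. */
    static KdTree build(double[][] points) {
        return build(points.clone(), 0, points.length, 0);
    }

    private static KdTree build(double[][] pts, int from, int to, int depth) {
        if (from >= to) {
            return null;
        }
        int axis = depth % pts[from].length;
        Arrays.sort(pts, from, to, Comparator.comparingDouble((double[] p) -> p[axis]));
        int mid = (from + to) >>> 1;
        return new KdTree(pts[mid], axis,
                build(pts, from, mid, depth + 1),
                build(pts, mid + 1, to, depth + 1));
    }

    private static double squaredDistance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    /** Returns the stored point closest to the query (Euclidean distance). */
    double[] nearest(double[] query) {
        return nearest(this, query, null);
    }

    private static double[] nearest(KdTree node, double[] query, double[] best) {
        if (node == null) {
            return best;
        }
        if (best == null || squaredDistance(node.point, query) < squaredDistance(best, query)) {
            best = node.point;
        }
        double diff = query[node.axis] - node.point[node.axis];
        KdTree near = diff < 0 ? node.left : node.right;
        KdTree far = diff < 0 ? node.right : node.left;
        best = nearest(near, query, best);
        // Visit the other side only if the splitting plane is closer than the best match so far.
        if (diff * diff < squaredDistance(best, query)) {
            best = nearest(far, query, best);
        }
        return best;
    }

    public static void main(String[] args) {
        double[][] points = { {2, 3}, {5, 4}, {9, 6}, {4, 7}, {8, 1}, {7, 2} };
        double[] hit = KdTree.build(points).nearest(new double[] {9, 2});
        System.out.println(Arrays.toString(hit)); // prints [8.0, 1.0]
    }
}
```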
34. OPENIMAJ++
• Image features can be used to match music!
• Extractors can be used to find objects! (Face Detection)