This page lists a number of example VLFeat applications. The code can be found in the VLROOT/apps/ subdirectory of the VLFeat package.
This sample application uses VLFeat to train and test an image classifier on the Caltech-101 data. The classifier achieves 65% average accuracy using a single feature and 15 training images per class.
The program is fully contained in a single MATLAB M-file and can easily be adapted to use your own data (change conf.calDir).
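As a minimal sketch, adapting the demo to a new dataset amounts to editing its configuration structure. The path below is a placeholder, and conf.numTrain is assumed to be the name of the training-size option:

```matlab
% Sketch: pointing the demo at custom data, assuming the dataset follows
% the Caltech-101 layout (one subdirectory of images per class).
conf.calDir = 'data/my-dataset' ;  % placeholder path to your images
conf.numTrain = 15 ;               % assumed option: training images per class
```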
This example application extends the Caltech-101 demo above in several ways: it supports multiple encoding methods, including BoVW, VLAD, and Fisher vectors, improved image features, and multiple benchmark datasets. The code is located in apps/recognition. Start from the main file.
The following table reports results on a few standard benchmark datasets (the PASCAL VOC 2007 classification challenge, Caltech-101 with 30 training images, MIT Scene 67, and the Flickr Material Dataset) for a number of different encodings:
| Method | VOC07 | Caltech 101 | Scene 67 | FMD |
|---|---|---|---|---|
| FV | 59.12% mAP | 73.02% Acc | 58.25% Acc | 59.60% Acc |
| FV + aug. | 60.25% mAP | 75.61% Acc | 57.57% Acc | 60.80% Acc |
| FV + s.p. | 62.23% mAP | 77.63% Acc | 61.83% Acc | 60.80% Acc |
| VLAD + aug. | 54.66% mAP | 78.68% Acc | 53.29% Acc | 49.40% Acc |
| BOVW + aug. | 49.87% mAP | 75.98% Acc | 50.22% Acc | 46.00% Acc |
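As a rough sketch of how such encodings are computed with VLFeat's MATLAB interface (the image name and cluster count are placeholders, and the vocabulary is learned inline here rather than on a separate training set, as the full application does):

```matlab
% Assumes VLFeat is on the MATLAB path (run vl_setup first).
im = im2single(rgb2gray(imread('image.jpg'))) ;  % placeholder image

% Dense SIFT with 8-pixel bins and a 4-pixel step.
[frames, descrs] = vl_dsift(im, 'Size', 8, 'Step', 4, 'Fast') ;
descrs = single(descrs) ;

numClusters = 64 ;  % illustrative vocabulary size

% Fisher vector: GMM vocabulary, then soft-assignment encoding.
[means, covariances, priors] = vl_gmm(descrs, numClusters) ;
fv = vl_fisher(descrs, means, covariances, priors, 'Improved') ;

% VLAD: k-means vocabulary with hard assignments instead.
centers = vl_kmeans(descrs, numClusters) ;
kdtree = vl_kdtreebuild(centers) ;
nn = vl_kdtreequery(kdtree, centers, descrs) ;
assign = zeros(numClusters, size(descrs,2), 'single') ;
assign(sub2ind(size(assign), double(nn), 1:size(descrs,2))) = 1 ;
vlad = vl_vlad(descrs, centers, assign, 'NormalizeComponents') ;

% Linear SVM on the encoded training vectors X with labels y in {+1,-1}.
% vl_svmtrain is regularized by lambda; lambda = 1/(C*N) is the usual
% mapping from the C parameter quoted in the text.
% [w, b] = vl_svmtrain(X, y, 1 / (10 * numTrain)) ;
```

This is a sketch of the encoding step only; the full application also handles spatial pooling, feature augmentation, and per-dataset evaluation.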
The baseline feature is SIFT (vl_dsift), computed at seven scales with a factor $\sqrt{2}$ between successive scales, with bins 8 pixels wide and a step of 4 pixels. All experiments except the Caltech-101 ones start by doubling the resolution of the input image. The details of the encodings are as follows:
- VLAD encoding (vl_vlad).
- Fisher vector encoding (vl_fisher).
- Linear SVM classifier (vl_svmtrain). The parameter $C$ is set to 10 for all datasets except PASCAL VOC, for which it is set to 1.

This sample application uses VLFeat to extract SIFT features from a pair of images and match them. It then filters the matches with RANSAC and produces a mosaic. Read the code.