INTERNATIONAL ORGANISATION FOR STANDARDISATION
ORGANISATION INTERNATIONALE DE NORMALISATION
ISO/IEC JTC1/SC29/WG11
CODING OF MOVING PICTURES AND AUDIO
ISO/IEC JTC1/SC29/WG11
MPEG2011/m19188
January 2011, Daegu, Korea
Source Peking University, Harbin Institute of Technology, China
Status Input Contribution
Title Peking University Landmarks: A Context Aware Visual Search Benchmark Database
Author lingyu@pku.edu.cn, Lingyu Duan
jirongrong@gmail.com, Rongrong Ji
cjie@pku.edu.cn, Jie Chen
syang@pku.edu.cn, Shuang Yang
tjhuang@pku.edu.cn, Tiejun Huang
hongxun.yao@gmail.com, Hongxun Yao
wgao@pku.edu.cn, Wen Gao
 
1 Introduction 
The 93rd MPEG meeting output draft requirements documents (w11529, w11530 and w11531) of
Compact Descriptors for Visual Search. To advance this work, this contribution presents our
work on establishing a context-aware visual search benchmark database for mobile landmark
search. In the input contribution m18542 at the 94th MPEG meeting [9], Peking University
proposed a compact descriptor for visual search, which combines location cues to learn a
discriminative and compact visual descriptor well suited to mobile landmark search.
We believe our practice, as well as the benchmark dataset, will enhance the use cases and
help identify requirements for Compact Descriptors for Visual Search.
While mobile visual search has attracted ever growing attention in recent years, a
comprehensive benchmark database for fair evaluation among different strategies is still missing.
In particular, the rich contextual cues available on mobile devices, such as GPS information and
camera parameters, are left unexploited in the current visual search benchmarks. This contribution
introduces the Peking University Landmarks benchmark for quantitative evaluation of mobile
visual search performance with the support of GPS information. It contains over 13,179 images
organized into 198 distinct landmark locations within the Peking University campus, collected
by 20 volunteers during November and December 2010. Each location is captured with
multiple shot sizes and viewing angles, using both digital cameras and phone cameras, and each
photo is tagged with the rich contextual information available in mobile scenarios. Moreover, this
benchmark covers typical quality degeneration scenarios in mobile photographing, including
variable resolutions, blurring, lighting changes, occlusions, and various viewing angles.
Together with this benchmark, we provide bag-of-visual-words search baselines that use either
spatial or contextual information in ranking returned images. Finally, distractor images are
introduced to evaluate the robustness of visual search methods over the database.
2 Motivation 
Coming with the explosive growth of phone cameras, mobile visual search has received
increasing interest in the computer vision, multimedia analysis, and information retrieval
communities. However, state-of-the-art works are rarely compared with each other over a
well-established benchmark database, which should be designed to target real-world mobile
visual search scenarios that involve lots of photographing variances using phone cameras. In
addition, the rich contextual cues, such as GPS, time stamp, and base station information, are
extremely beneficial to refine solely visual ranking. However, the effectiveness and robustness
of such cues are left unexploited in the existing visual search benchmarks.
We believe a real-world, context rich benchmark with sufficient coverage of users'
photographing variances is important to put forward mobile visual search research and
applications. In this contribution, we introduce the Peking University Landmarks benchmark to
evaluate GPS context assisted mobile visual search performance. The dataset is collected from
over 198 landmark locations within the Peking University campus. Our benchmark provides
sufficient real-world photographing variances typical for mobile phone cameras. We put more
focus on the availability of contextual cues to improve the visual search performance, with a
systematic methodology to evaluate the robustness of the contextual cue by adding context
distractors.
3 Benchmark Database Statistics 
Scale and Constitution: The Peking University Landmarks benchmark (PKUBench)
contains over 13,179 scene photos, organized into 198 landmark locations and captured via both
digital and phone cameras. There are in total 6,193 photos captured from digital cameras (SONY
DSC-W290, Samsung Techwin <Digimax S830 / Kenox S830>, Canon DIGITAL IXUS 100 IS,
NIKON COOLPIX L12 and Canon IXUS 210 with resolution 2592×1944) and 6,986 photos
from mobile phone cameras (Nokia E72-1, HTC Desire, Nokia 5235, Apple iphone, Apple
iphone 3G and LG Electronics KP500 with resolutions 640×480, 1600×1200 and 2048×1536),
respectively. We recruited over 20 volunteers for data acquisition; each landmark is captured by a
pair of volunteers, one using a digital camera and the other using a mobile phone, with a portable
GPS device (HOLUX M-1200E) with them. The averaged viewing angle variations between
digital and phone camera photographers are within 10 degrees for both volunteers. Note that both
blurring and shaking are more frequent happenings in mobile phone capturing. In such cases,
the volunteers compensated their bad photos with new ones, which thus produced more mobile
phone photos than digital camera photos. All the images in the entire database were collected
during November and December, 2010.
Fig. 1. Two typical scenarios of capturing landmark photos in different shot sizes and angles.
As illustrated in Figure 1, we capture photos in three different shot sizes, namely long shot,
medium shot and close up. For each shot size, there are at most 8 directions in photographing,
which attempt to cover 360 degrees from the frontal view of the landmark, captured every 45
degrees respectively. The capturing of both digital camera and mobile phone photos undergoes
different weathers (sunny, cloudy, etc.) during November and December. The photo distributions
with respect to landmark locations are given in Figure 2, where different colors denote the sample
images of different landmarks. The percentage of mobile and camera photos is given in Figure 3.
Fig. 2. The landmark photo distribution, obtained by overlaying the location point of each collected photo
on the Google Map of the Peking University campus.
Fig.3. The percentages of both phone camera and digital camera photos in PKUBench.
Contextual Cues: Comparing with generalized visual search, the mobile visual search
scenario is closely related to rich contextual information on the mobile phone. For instance, a
mobile user's geographical location can be leveraged to pre-filter most of the unrelated scenes
without visual ranking. Over PKUBench, we focus on the use of such contextual cues
in facilitating visual search, including: (1) GPS tag (both latitude and longitude); (2) landmark
name label; (3) shot size (long, medium, and close-up) and viewpoint (frontal, side, and others)
of those photos; (4) camera type (digital camera or mobile phone camera); (5) capture time
stamp. We also provide EXIF information: camera settings (focal length, resolution).
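The per-photo context tags enumerated above can be modeled as a simple record. The field and type names below are illustrative assumptions for the sake of the sketch, not the benchmark's actual distribution format:

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum

class ShotSize(Enum):
    LONG = "long"
    MEDIUM = "medium"
    CLOSE_UP = "close-up"

class Viewpoint(Enum):
    FRONTAL = "frontal"
    SIDE = "side"
    OTHER = "other"

@dataclass
class PhotoRecord:
    """One PKUBench photo with its contextual tags (illustrative schema)."""
    latitude: float            # (1) GPS tag
    longitude: float
    landmark_name: str         # (2) landmark name label
    shot_size: ShotSize        # (3) shot size ...
    viewpoint: Viewpoint       #     ... and viewpoint
    is_mobile: bool            # (4) camera type: phone vs. digital camera
    timestamp: datetime        # (5) capture time stamp
    focal_length_mm: float     # EXIF camera setting
    resolution: tuple          # EXIF resolution, e.g. (1600, 1200)

# Example record with made-up values in plausible ranges
photo = PhotoRecord(39.992, 116.305, "BoYa Tower", ShotSize.LONG,
                    Viewpoint.FRONTAL, True,
                    datetime(2010, 11, 20, 14, 30), 5.4, (1600, 1200))
```
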
In addition, we show the performance improvement from using contextual information by
providing baselines that leverage GPS to refine the visual ranking. Furthermore, the effect of less
precise contextual information is also investigated by adding distractor images whose GPS tags
are produced by imposing random distortion on the original GPS locations of the images.
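The distractor construction described above can be sketched as follows. The noise magnitude `max_offset_deg` is an assumption (the contribution does not specify how strongly the GPS tags are distorted):

```python
import random

def distort_gps(lat, lon, rng, max_offset_deg=0.001):
    """Impose random distortion on a GPS tag.

    max_offset_deg is an assumed bound (~100 m); the actual magnitude
    used for the PKUBench distractors is not specified.
    """
    return (lat + rng.uniform(-max_offset_deg, max_offset_deg),
            lon + rng.uniform(-max_offset_deg, max_offset_deg))

def tag_distractors(distractor_ids, database_gps_tags, seed=0):
    """Randomly assign (distorted) GPS tags drawn from the original
    database to distractor images, as when injecting external photos."""
    rng = random.Random(seed)
    return {img_id: distort_gps(*rng.choice(database_gps_tags), rng)
            for img_id in distractor_ids}
```
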
Scene Diversity: We provide landmark appearances as diverse as possible to simulate the
real-world difficulty of visual search. Hence, the volunteers were encouraged to capture both
queries and ground truth photos (with both digital and phone cameras) without any particular
intent to avoid intruding foreground objects, e.g., cars, human faces, and tree occlusions.
4 Comparing with Related Benchmarks 
ZuBuD Database [2] is widely adopted to evaluate vision-based geographical location
recognition. It contains 1,005 color images of 201 buildings or scenes (5 images per
building or scene) in Zurich, Switzerland.
Oxford Buildings Database [3] contains 5,062 images collected from Flickr by
searching for particular Oxford landmarks, with manually annotated ground truth for 11 different
landmarks, each represented by 5 possible queries.
SCity Database [4] contains 20,000 street-side photos used for mobile visual search
validation in the Microsoft Photo2Search system [4]. It was captured automatically along the
main streets of Seattle by a car equipped with six surrounding cameras and a GPS device. The
location of each captured photo is obtained by aligning the time stamps of the photos with the
GPS record.
UKBench Database [5] contains 10,000 images of 2,500 objects, including indoor
objects such as CD covers and book sets. There are four images per object, offering sufficient
variances in viewpoint, rotation, lighting conditions, scale, occlusion, and affine transforms.
Stanford Mobile Visual Search Data Set [6] contains camera-phone images of
products, CDs, books, outdoor landmarks, business cards, text documents, museum paintings and
video clips. It provides several unique characteristics, e.g., varying lighting conditions,
perspective distortion, and mobile phone queries.
Table 1. Brief comparison of related benchmarking databases.

Database                        PKUBench  ZuBuD  Oxford  SCity   UKBench  Stanford
Data Scale                      13,179    1,005  5,062   20,000  10,000   -
Images per Landmark/
  Object Category               66        5      92      6       4        -
Mobile Capture                  √         ×      ×       ×       ×        √
Categorized shot size, view
  angle, landmark/object scale  √         ×      ×       ×       Indoor   ×
Blurring Query                  √         ×      ×       ×       ×        ×
Context                         √         ×      ×       ×       ×        ×
PKUBench Database: Our database provides rich query scenarios in the following
aspects: (1) rich contextual information that simulates what we can get from mobile phones; (2)
low quality cellphone queries, with comparison to the corresponding digital camera queries to
quantize the performance degeneration of cellphone queries; (3) occlusions in both queries and
database caused by cars, people, trees, and nearby buildings, as well as blurring and shaking.
Table 1 presents a brief comparison of related benchmarking databases in the state-of-the-art.
5 Exemplar Mobile Query Scenarios 
Five groups of exemplar mobile query scenarios (in total 168 queries) are demonstrated to
evaluate the real-world visual search performance in challenging situations (see Figure 4):
Occlusive Query Set contains 20 mobile queries and 20 corresponding digital camera
queries, occluded by foreground cars, people, and buildings.
Background Clutter Query Set contains 20 mobile queries and 20 digital camera
queries. These are often captured far away from a landmark, where GPS based search would
yield worse results due to the bias of other nearby buildings.
Night Query Set contains 9 mobile phone queries and 9 digital camera queries. The
photo quality heavily depends on the lighting conditions.
Blurring and Shaking Query Set contains 20 mobile queries with blurring or shaking
and 20 corresponding mobile queries without any blurring or shaking.
Adding Distractors into Database is to evaluate the effects of applying less precise
contextual information to visual search. We collect a distractor set of 6,630 photos from the
Summer Palace (note: the landmark buildings in the Summer Palace are visually similar to those
in Peking University) and 2,012 photos from PKU, then randomly assign them the GPS tagging
of the original database. We then select 10 locations (30 queries) from PKU to evaluate the mAP
degeneration.
Fig.4. Examples of query scenarios (Digital Camera versus Mobile Phone) (From Top to Bottom:
Occlusive, Background Clutters, Blurring/Shaking, and Night).
Landmark Scale: We try to categorize the landmark scale by measuring the range of the
walking distances of the photographers around each assigned landmark location. We come up
with three scales: small, medium, and large. The typical distance for the small scale is 0-12 m, the
medium scale is 12-30 m, and the large scale is over 30 m. As shown in Figure 5, we have 63
small ones, 75 medium ones, and 60 large ones.

Table 2. Typical landmark types of the three different landmark scales
Small Scale (0-12m):   Sculptures, stones, pavilions, gates and others.
Medium Scale (12-30m): Courtyards, and small or medium sized buildings, such as office
                       buildings and historic buildings (smaller floor area).
Large Scale (> 30m):   Large buildings, such as a library or complex building, or a long shot
                       of a very large object (e.g. BoYa Tower).

Fig. 5. The photo volumes of three different landmark scales in PKUBench (Examples of Different
Scales, From Top to Down: Small, Medium, and Large).
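The three-scale categorization described above reduces to a simple thresholding of the photographers' walking range around a landmark:

```python
def landmark_scale(walking_distance_m: float) -> str:
    """Categorize a landmark by the photographers' walking range
    (thresholds from Table 2: 0-12 m, 12-30 m, over 30 m)."""
    if walking_distance_m <= 12:
        return "small"    # sculptures, stones, pavilions, gates
    elif walking_distance_m <= 30:
        return "medium"   # courtyards, small/medium sized buildings
    else:
        return "large"    # libraries, complex buildings, long shots

assert landmark_scale(8) == "small"
assert landmark_scale(20) == "medium"
assert landmark_scale(45) == "large"
```
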
Finally, we provide more photographing details in Figure 6 and Figure 7.
Fig.6. Photo volume distribution by different shot sizes.
Fig.7. Photo volume distribution by different viewing angles.
6 Mobile Visual Search Baselines 
We provide several visual search baselines, including purely visual search as well as context
assisted visual search:
(1) BoW: We extract SIFT [7] features from each photo, the ensemble of which is used to
build a Scalable Vocabulary Tree [5] to generate the initial vocabulary V. The SVT generates a
bag-of-words signature Vi for each database photo Ii. We denote the hierarchical level as H and
the branching factor as B. In a typical setting, we have H = 6 and B = 10, producing
approximately 100,000 codewords. We use mean Average Precision at N (mAP@N) to evaluate
search performance, which reveals the position-sensitive ranking precision at the top N returned
results.
(2) GPS + BoW: We further leverage the location context to refine the visual ranking,
multiplying the GPS distance with the BoW distance to the query example, based on the
weighting function:

    Dis(A,Q) = GeoDis(A,Q) × BoWDis(A,Q)    (1)

where Dis(A,Q) is the overall distance between query Q and database image A; GeoDis(A,Q)
and BoWDis(A,Q) stand for the geographical distance (measured by GPS distance) and the BoW
based visual distance between query Q and database image A, respectively. Our ranking is based
on the similarity measurement in Equation (1).
It is worth mentioning that we have discovered that RANSAC based spatial re-ranking,
while it typically gives promising results in traditional visual experiments, does not produce
satisfactory performance in this database. There are two possible reasons: (1) PKUBench usually
contains lots of trees that are irregular for spatial re-ranking; (2) there are lots of similar
buildings (such as ancient Chinese buildings) that have very similar local features, which cannot
be well distinguished by RANSAC.
mAP Performance with respect to different challenging scenarios: We discuss the
performance of each query scenario respectively as follows.
Fig.8. The performance of occlusive queries with respect to different methods using digital
camera and mobile phone camera (Y axis: mAP@N performance; X axis: top N returned results).
Note that most occlusive queries come from large scale landmarks, as occlusion often
happens in the long or medium shot of a large scale landmark. In such cases, the GPS position
tends to favor other nearby landmarks around the query location, which would lead to
performance degeneration using solely GPS information. This may even degenerate the visual
search performance when combining visual search with GPS information.
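Equation (1) multiplies a geographical distance by a BoW distance. A minimal sketch of this fused ranking, assuming a haversine geographic distance and an L1 distance between BoW histograms (the contribution does not specify either metric):

```python
import math

def geo_dis(a, q):
    """Haversine distance in metres between (lat, lon) pairs (assumed metric)."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000 * math.asin(math.sqrt(h))

def bow_dis(a, q):
    """L1 distance between two BoW histograms (assumed metric)."""
    return sum(abs(x - y) for x, y in zip(a, q))

def dis(a_gps, q_gps, a_bow, q_bow):
    """Equation (1): Dis(A,Q) = GeoDis(A,Q) * BoWDis(A,Q)."""
    return geo_dis(a_gps, q_gps) * bow_dis(a_bow, q_bow)

def rank(query, database):
    """Rank database images by ascending fused distance to the query."""
    return sorted(database,
                  key=lambda img: dis(img["gps"], query["gps"],
                                      img["bow"], query["bow"]))
```

Note that a pure multiplicative weighting lets either cue veto the other (a zero GPS distance zeroes the fused score), which is consistent with the degeneration effects discussed above when the GPS position favors the wrong nearby landmark.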
Fig.9. The performance of background clutter queries with respect to different methods.
In practice, background clutter typically happens in capturing small scale landmarks. In such
cases, the purely visual search performs worse, because in most queries the major part of a query
photo is actually occupied by backgrounds.
Fig.10. The performance of Night queries with respect to different methods.
The Night query is an interesting case, where GPS (contextual information) dominates the
location recognition performance. Extracting distinguishing local features is very difficult at
night, which is quite different from the daytime. Hence, we can observe that using solely GPS is
almost already enough at night. It is worth mentioning that, due to the better image capturing
quality, using a digital camera can achieve better visual search performance than a mobile phone.
Fig. 11. The performance of blurring and shaking queries of phone camera photos.
From Fig.11, we find that introducing blurring and shaking definitely degenerates the
visual search performance. However, by incorporating GPS into the similarity ranking, the results
become much more acceptable compared with the pure visual query results.
Overall Performance Comparison: We further show the overall performance (168 queries of
exemplar scenarios) with respect to using either digital cameras or mobile phone cameras in
Figure 12, which gives an intuitive finding about the mAP difference between digital camera and
phone camera.
Note that using solely visual search, the performance of camera photos is better than using
mobile phone photos; but with the combination of GPS, the performances of using either camera
or mobile phone are almost identical.
Fig.12. Overall performance comparison between using camera and mobile phones.
Figure 13 further compares the performance over the whole database (one image as the query,
the rest as the searched dataset) among different landmark scales. It is worth mentioning that the
visual search performance of large scale landmarks is much better than the medium and small
scales, due to less background clutter and more distinguishing interest points. The GPS based
search performance of the small scale is better than the large scale, as the GPS signal may be
distorted around a larger scale landmark. Moreover, as GPS plays a relatively important role, the
small-scale landmarks yield better results when fusing visual search and GPS information.
Fig. 13. Performance comparison among different scales of landmarks.
Finally, we investigate the overall performance of in total 570 queries (including the above
168 queries), as shown in Figure 14. Undoubtedly, the best results come from fusing both GPS
and visual search together. Although distractor images typically degenerate the performance of
pure visual search, these degeneration effects are alleviated by integrating GPS with visual cues.
Fig. 14. Overall performance of 570 queries in Peking University Landmarks.
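The mAP@N measure used throughout these comparisons averages, over all queries, the precision accumulated at each rank within the top N where a relevant image is returned. A sketch of the standard definition (the exact truncation convention used by the authors is assumed):

```python
def average_precision_at_n(ranked_ids, relevant_ids, n):
    """AP@N: sum of precision values at each relevant hit within the
    top N, normalized by min(#relevant, N)."""
    hits, precision_sum = 0, 0.0
    for position, img_id in enumerate(ranked_ids[:n], start=1):
        if img_id in relevant_ids:
            hits += 1
            precision_sum += hits / position
    denom = min(len(relevant_ids), n)
    return precision_sum / denom if denom else 0.0

def mean_ap_at_n(runs, n):
    """mAP@N over a list of (ranked_ids, relevant_ids) query results."""
    return sum(average_precision_at_n(r, rel, n) for r, rel in runs) / len(runs)

# A perfect ranking yields AP@N = 1.0
assert average_precision_at_n(["a", "b", "x"], {"a", "b"}, 3) == 1.0
```
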
mAP Performance with respect to Different Search Baselines: Furthermore, from Figures
15-16, over those 168 queries, it is quite obvious that by adding contextual information the
performance can be more or less improved, while different mobile query scenarios present
diverse performances. Generally speaking, the worst performances originate from the queries of
both Blurring/Night and adding distractor images. The former indicates that the use of visual
interest points would be challenged by mobile blurring queries. By comparing the results of
adding distractors in Fig. 15 and 16, we can see that the use of contextual information should be
taken seriously, while the simple combination is not robust enough for dealing with distractors.
Fig.15. Solely BoW performance comparisons in five typical query scenarios.
Fig.16. GPS+BoW performance comparisons in five typical query scenarios.
7 Application Scenarios 
We briefly describe possible application scenarios of our Peking University Landmarks database as follows:
A benchmark dataset for mobile visual search: We hope the Peking University Landmarks
can become a useful resource for validating mobile visual search systems. It emphasizes two
important factors in mobile visual search: query quality and contextual cues. To the best of our
knowledge, both are beyond the state-of-the-art benchmark databases. In addition, it offers a
dataset to evaluate the effectiveness and robustness of contextual information.
A benchmark dataset for location recognition: This dataset can be used to evaluate
traditional location recognition systems, since a GPS location is bound to each image instance.
A training resource for scene modeling: This dataset may facilitate scene analysis and
modeling, since our photographing is designed to cover multi-shot, multi-view appearances of
landmarks at multiple scales. To this end, we will provide the camera calibration information in
our future work.
A training resource to learn better photographing practices: Our landmark photo collection
can be further exploited to learn recommended mobile photographing practices (proper
angle and shot size for different types of landmarks) towards better visual search results.
8 References 
[1] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and F.-F. Li. ImageNet: A Large-Scale
Hierarchical Image Database. CVPR, 2009.
[2] H. Shao, T. Svoboda, and L. Van Gool. ZuBuD: Zurich buildings database for image based
recognition. Technical Report, Computer Vision Lab, Swiss Federal Institute of Technology,
2006.
[3] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Object retrieval with large
vocabularies and fast spatial matching. CVPR, 2007.
[4] R. Ji, X. Xie, H. Yao, and W.-Y. Ma. Hierarchical Optimization of Visual Vocabulary for
Effective and Transferable Retrieval. CVPR, 2009.
[5] D. Nister and H. Stewenius. Scalable recognition with a vocabulary tree. CVPR, 2006.
[6] S. Tsai, D. Chen, G. Takacs, V. Chandrasekhar, J. Singh, and B. Girod. Location Coding for
Mobile Image Retrieval. Proc. 5th International Mobile Multimedia Communications
Conference (MobiMedia), 2009.
[7] D. G. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 2004.
[8] M. Fischler and R. C. Bolles. Random sample consensus: A paradigm for model fitting with
applications to image analysis and automated cartography. Comm. of the ACM, 24: 381-395,
1981.
[9] R. Ji, L. Duan, T. Huang, H. Yao, and W. Gao. Compact Descriptors for Visual Search:
Location Discriminative Mobile Landmark Search. CDVS Ad Hoc Group, Input Contribution
m18542, 94th MPEG Meeting, Oct. 2010.
Schema on read is obsolete. Welcome metaprogramming..pdf
Lars Albertsson
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
MarinCaroMartnezBerg
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
shambhavirathore45
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
AroojKhan71
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 

Recently uploaded (20)

Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
Carero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptxCarero dropshipping via API with DroFx.pptx
Carero dropshipping via API with DroFx.pptx
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
Best VIP Call Girls Noida Sector 22 Call Me: 8448380779
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 

peking-university-landmarks-a-context-aware-visual-search-benchmark-database

  • 1. INTERNATIONAL ORGANISATION FOR STANDARDISATION
ORGANISATION INTERNATIONALE DE NORMALISATION
ISO/IEC JTC1/SC29/WG11
CODING OF MOVING PICTURES AND AUDIO

ISO/IEC JTC1/SC29/WG11 MPEG2011/m19188
January 2011, Daegu, Korea

Source: Peking University, Harbin Institute of Technology, China
Status: Input Contribution
Title: Peking University Landmarks: A Context Aware Visual Search Benchmark Database
Authors:
  Lingyu Duan, lingyu@pku.edu.cn
  Rongrong Ji, jirongrong@gmail.com
  Jie Chen, cjie@pku.edu.cn
  Shuang Yang, syang@pku.edu.cn
  Tiejun Huang, tjhuang@pku.edu.cn
  Hongxun Yao, hongxun.yao@gmail.com
  Wen Gao, wgao@pku.edu.cn

1 Introduction

The 93rd MPEG meeting produced draft requirements documents (w11529, w11530 and w11531) for Compact Descriptors for Visual Search. To advance this work, this contribution presents our work on establishing a context-aware visual search benchmark database for mobile landmark search. In input contribution m18542 at the 94th MPEG meeting [9], Peking University proposed a compact descriptor for visual search that combines location cues to learn a discriminative and compact visual descriptor well suited to mobile landmark search. We believe our practice, as well as the benchmark dataset, will enrich the use cases and help identify requirements for Compact Descriptors for Visual Search.

While mobile visual search has attracted ever-growing attention in recent years, a comprehensive benchmark database for fair evaluation among different strategies is still missing. In particular, the rich contextual cues available on mobile devices, such as GPS information and camera parameters, are left unexploited in current visual search benchmarks. This contribution introduces the Peking University Landmarks benchmark for the quantitative evaluation of mobile visual search performance with the support of GPS information.
It contains 13,179 images organized into 198 distinct landmark locations within the Peking University campus, collected by 20 volunteers during November and December 2010. Each location is captured with multiple shot sizes and viewing angles, using both digital cameras and phone cameras, and each photo is tagged with rich contextual information from the mobile scenario. Moreover, this benchmark covers typical quality-degradation scenarios in mobile photographing, including variable resolutions, blurring, lighting changes, occlusions, and various viewing angles. Together with the benchmark, we provide bag-of-visual-words search baselines that use either spatial or contextual information in ranking the returned images. Finally, distractor images are further introduced to evaluate the robustness of visual search methods on the database.
  • 2. 2 Motivation

Coming with the explosive growth of phone cameras, mobile visual search has received increasing interest in the computer vision, multimedia analysis, and information retrieval communities. However, state-of-the-art works are rarely compared with each other over a well-established benchmark database, which should be designed to target real-world mobile visual search scenarios that involve lots of photographing variances using phone cameras. In addition, the rich contextual cues, such as GPS, time stamp, and base station information, are extremely beneficial to refine solely visual ranking; however, such cues are left unexploited in the existing visual search benchmarks.

We believe a real-world, context-rich benchmark with sufficient coverage of users' photographing variances is important to put forward mobile visual search research and applications. In this contribution, we introduce the Peking University Landmarks benchmark to evaluate GPS-context-assisted mobile visual search performance. The dataset is collected from 198 landmark locations within the Peking University campus. Our benchmark provides sufficient real-world photographing variances, typical of mobile phone cameras. We put more focus on the availability of contextual cues to improve visual search performance, with a systematic methodology to evaluate the robustness of the contextual cue by adding context distractors.

3 Benchmark Database Statistics

Scale and Constitution: The Peking University Landmarks benchmark (PKUBench) contains over 13,179 scene photos, organized into 198 landmark locations and captured via both digital and phone cameras. There are in total 6,193 photos captured from digital cameras (SONY DSC-W290, Samsung Techwin <Digimax S830 / Kenox S830>, Canon DIGITAL IXUS 100 IS, NIKON COOLPIX L12, and Canon IXUS 210, with resolution 2592×1944) and 6,986 photos from mobile phone cameras (Nokia E72-1, HTC Desire, Nokia 5235, Apple iphone, Apple iphone 3G, and LG Electronics KP500, with resolutions 640×480, 1600×1200, and 2048×1536), respectively. We recruited over 20 volunteers for data acquisition; each landmark is captured by a pair of volunteers, one using a digital camera and the other using a mobile phone, carrying a portable GPS device (HOLUX M-1200E) with them. The averaged viewing-angle variation between the digital and phone camera photographers is within 10 degrees for both volunteers. Note that blurring and shaking happen more frequently in mobile phone capturing; in such cases, the volunteers compensate their bad photo with a new one, which thus produces more mobile phone photos than digital camera photos. All the images in the entire database were collected during November and December 2010.

Fig. 1. Two typical scenarios of capturing landmark photos in different shot sizes and angles.

  • 3. As illustrated in Figure 1, we capture photos in three different shot sizes, namely long shot, medium shot, and close-up. For each shot size, there are at most 8 directions of photographing, which attempt to cover 360 degrees around the frontal view of the landmark, captured every 45 degrees. The capturing of both digital camera and mobile phone photos undergoes different weathers (sunny, cloudy, etc.) during November and December. The photo distributions with respect to landmark locations are given in Figure 2, where different colors denote the sample images of different landmarks. The percentages of mobile and camera photos are given in Figure 3.

Fig. 2. The landmark photo distribution, obtained by overlaying the location point of each collected photo on the Google Map of the Peking University campus.

Fig. 3. The percentages of both phone camera and digital camera photos in PKUBench.

Contextual Cues: Compared with generalized visual search, mobile visual search is closely related to the rich contextual information on the mobile phone. For instance, a mobile user's geographical location can be leveraged to pre-filter most of unrelated scenes
  • 4. without visual ranking. Over PKUBench, we pay particular attention to the use of such contextual cues in facilitating visual search, including: (1) GPS tag (both latitude and longitude); (2) landmark name label; (3) shot size (long, medium, and close-up) and viewpoint (frontal, side, and others) of each photo; (4) camera type (digital camera or mobile phone camera); (5) capture time stamp. We also provide EXIF information: camera settings (focal length, resolution). In addition, we show the performance improvement from using contextual information by providing baselines that leverage GPS to refine visual ranking. Furthermore, the effects of less precise contextual information are investigated by adding distractor images and by imposing random GPS distortion on the original GPS location of an image.

Scene Diversity: We provide as diverse landmark appearances as possible to simulate the real-world difficulty of visual search. Hence, the volunteers are encouraged to capture both queries and ground-truth photos (for both digital and phone cameras) without any particular intent to avoid intruding foreground objects, e.g. cars, human faces, and tree occlusions.

4 Comparing with Related Benchmarks

ZuBuD Database [2] is widely adopted to evaluate vision-based geographical location recognition. It contains 1,005 color images of 201 buildings or scenes (5 images per building or scene) in Zurich, Switzerland.

Oxford Buildings Database [3] contains 5,062 images collected from Flickr by searching for particular Oxford landmarks, with manually annotated ground truth for 11 different landmarks, each represented by 5 possible queries.

SCity Database [4] contains 20,000 street-side photos for mobile visual search validation in the Microsoft Photo2Search system [4]. It was captured automatically along the main urban streets of Seattle by a car equipped with six surrounding cameras and a GPS device.
The location of each captured photo is obtained by aligning the time stamps of the photos with the GPS record.

UKBench Database [5] contains 10,000 images of 2,500 objects, including indoor objects like CD covers, book sets, etc. There are four images per object to offer sufficient variances in viewpoint, rotation, lighting condition, scale, occlusion, and affine transform.

Stanford Mobile Visual Search Data Set [6] contains camera-phone images of products, CDs, books, outdoor landmarks, business cards, text documents, museum paintings, and video clips. It provides several unique characteristics, e.g. varying lighting conditions, perspective distortion, and mobile phone queries.

Table 1. Brief comparison of related benchmarking databases ("-" means not reported).

  Criterion                              PKUBench  ZuBuD  Oxford  SCity   UKBench  Stanford
  Data Scale                             13,179    1,005  5,062   20,000  10,000   -
  Images per Landmark/Object Category    66        5      92      6       4        -
  Mobile Capture                         √         ×      ×       ×       ×        √
  Categorized Shot Size, View Angle,
  Landmark/Object Scale                  √         ×      ×       ×       Indoor   ×
  Blurring Query                         √         ×      ×       ×       ×        ×
  Context                                √         ×      ×       ×       ×        ×
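The per-photo contextual cues enumerated above (GPS tag, landmark name label, shot size and viewpoint, camera type, capture time stamp, and EXIF camera settings) can be sketched as a per-photo record. The field names, types, and sample values below are our illustrative assumptions, not the dataset's actual schema:

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class PhotoRecord:
    """One PKUBench-style photo with its contextual cues (illustrative schema)."""
    image_path: str
    latitude: float              # (1) GPS tag
    longitude: float
    landmark_name: str           # (2) landmark name label
    shot_size: str               # (3) "long" | "medium" | "close-up"
    viewpoint: str               #     "frontal" | "side" | "other"
    camera_type: str             # (4) "digital" | "mobile"
    timestamp: datetime          # (5) capture time stamp
    focal_length_mm: float       # EXIF camera setting (hypothetical value)
    resolution: tuple            # EXIF resolution, e.g. (1600, 1200)

# Hypothetical sample record; the coordinates are only roughly on campus.
rec = PhotoRecord("photos/boya_tower_001.jpg", 39.9926, 116.3051,
                  "BoYa Tower", "long", "frontal", "mobile",
                  datetime(2010, 11, 15, 14, 30), 4.3, (1600, 1200))
```

Such a record makes explicit which fields a purely visual baseline ignores and a context-assisted baseline can exploit.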
  • 5. PKUBench Database: Our database provides rich query scenarios in the following aspects: (1) rich contextual information to simulate what we can get from mobile phones; (2) low-quality cellphone queries with comparison to the corresponding queries of the digital camera, to quantize the performance degeneration of cellphone queries; (3) occlusions in both queries and database caused by cars, people, trees, and nearby buildings, as well as blurring and shaking. Table 1 presents a brief comparison of related benchmarking databases in the state-of-the-art.

5 Exemplar Mobile Query Scenarios

Five groups of exemplar mobile query scenarios (in total 168 queries) are demonstrated to evaluate real-world visual search performance in challenging situations (see Figure 4):

Occlusive Query Set contains 20 mobile queries and 20 corresponding digital camera queries, occluded by foreground cars, people, and buildings.

Background Clutters Query Set contains 20 mobile queries and 20 digital camera queries. These are often captured far away from a landmark, where GPS-based search would yield worse results due to the bias of other nearby buildings.

Night Query Set contains 9 mobile phone queries and 9 digital camera queries, whose photo quality heavily depends on the lighting conditions.

Blurring and Shaking Query Set contains 20 mobile queries with blurring or shaking and 20 corresponding mobile queries without any blurring or shaking.

Adding Distractors into Database evaluates the effects of applying less precise contextual information to visual search. We collect a distractor set of 6,630 photos from the Summer Palace (note: the landmark buildings in the Summer Palace are visually similar to those in Peking University) and 2,012 photos from PKU, then randomly assign them the GPS tagging of the original database. We then select 10 locations (30 queries) from PKU to evaluate the mAP degeneration.

Fig. 4. Examples of query scenarios (Digital Camera versus Mobile Phone) (from top to bottom: Occlusive, Background Clutters, Blurring/Shaking, and Night).
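The distractor construction described above, re-tagging Summer Palace and extra PKU photos with GPS positions drawn from the original database, can be sketched as follows; sampling with `random.choice` is our reading of "randomly assigned", and the field names are illustrative:

```python
import random

def add_distractors(database, distractor_photos, seed=0):
    """Re-tag each distractor photo with a GPS position sampled from the
    original database, so GPS alone can no longer filter distractors out."""
    rng = random.Random(seed)          # fixed seed for reproducibility
    positions = [p["gps"] for p in database]
    tagged = [dict(p, gps=rng.choice(positions)) for p in distractor_photos]
    return database + tagged
```

After this step, a GPS-filtered candidate set contains distractors that only the visual descriptor can reject, which is exactly the robustness the benchmark wants to measure.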
  • 6. Landmark Scale: We try to categorize the landmark scale by measuring the range of the walking distances of the photographers around each assigned landmark location. We come up with three scales: small, medium, and large. The typical distance for the small scale is 0-12 m, for the medium scale 12-30 m, and for the large scale over 30 m. As shown in Figure 5, we have 63 small ones, 75 medium ones, and 60 large ones.

Table 2. Typical landmark types of three different landmark scales

  Small Scale (0-12 m):    Sculptures, stones, pavilions, gates, and others.
  Medium Scale (12-30 m):  Courtyards and small or medium sized buildings, such as office buildings and historic buildings (smaller floor area).
  Large Scale (> 30 m):    Large buildings, such as a library or complex building, or a long shot of a very large object (e.g. BoYa Tower).

Fig. 5. The photo volumes of three different landmark scales in PKUBench (examples of different scales, from top to down: small, medium, and large).
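The scale thresholds above reduce to a simple categorisation rule. A minimal sketch follows; the function name and the treatment of the exact 12 m and 30 m boundary values are our assumptions, since the contribution only gives the ranges:

```python
def landmark_scale(walking_range_m):
    """Categorise a landmark by the photographers' walking range around it.
    Boundary handling (<=) at 12 m and 30 m is our assumption."""
    if walking_range_m < 0:
        raise ValueError("walking range must be non-negative")
    if walking_range_m <= 12:
        return "small"    # sculptures, stones, pavilions, gates
    if walking_range_m <= 30:
        return "medium"   # courtyards, small/medium sized buildings
    return "large"        # libraries, complex buildings, e.g. BoYa Tower
```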
  • 7. Finally, we provide more details in Figure 6 and Figure 7.

Fig. 6. Photo volume distribution by different shot sizes.

Fig. 7. Photo volume distribution by different viewing angles.

6 Mobile Visual Search Baselines

We provide several visual search baselines, including purely visual search as well as context-assisted visual search:

(1) BoW: We extract SIFT [7] features from each photo, the ensemble of which is used to build a Scalable Vocabulary Tree [5] to generate the initial vocabulary V. The SVT generates a
  • 8. bag-of-words signature Vi for each database photo Ii. In a typical settlement, we denote the hierarchical level as H and the branching factor as B; we have H = 6 and B = 10, producing approximately 100,000 codewords. We use mean Average Precision at N (mAP@N) to evaluate the search performance, which reveals the position-sensitive ranking precision over the top N returns.

(2) GPS + BoW: We further leverage the location context to refine the visual ranking function by multiplying the GPS distance with the BoW distance to the query example, based on the weighting function:

    Dis(A,Q) = GeoDis(A,Q) × BoWDis(A,Q)    (1)

where Dis(A,Q) is the overall distance between query Q and database image A; GeoDis(A,Q) and BoWDis(A,Q) stand for the geographical distance (measured by GPS distance) and the BoW-based visual distance between query Q and database image A, respectively. Our ranking is based on the similarity measurement in Equation (1).

It is worth mentioning that we have discovered that RANSAC-based spatial re-ranking, while it typically gives promising results in traditional visual experiments, does not produce satisfactory performance on this database. There are two possible reasons: (1) PKUBench usually contains lots of trees, which are un-regular for spatial re-ranking; (2) there are lots of similar buildings (such as ancient Chinese buildings) that have very similar local features, which cannot be well distinguished by RANSAC.

mAP Performance with respect to different challenging scenarios: We discuss the performance of each query scenario respectively as follows:

Fig. 8. The performance of occlusive queries with respect to methods using digital camera and mobile phone camera (Y axis: mAP@N performance; X axis: top N returning results).

Note that most occlusive queries come from large scale landmarks, as occlusion often happens in the long or medium shot of a large scale landmark. In such cases, the GPS position tends to favor other nearby landmarks around the query location, which would lead to performance degeneration using solely GPS information. This may even degenerate the visual search performance when combining visual search with GPS information.
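The combined ranking of Equation (1) can be sketched as follows. The contribution does not specify the exact distance functions, so this sketch assumes a haversine GeoDis and an L1 distance between L1-normalised BoW histograms; the toy coordinates and histograms are made up:

```python
import math

def geo_dis(a, q):
    """Haversine distance in metres between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *q))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000 * math.asin(math.sqrt(h))

def bow_dis(a, q):
    """L1 distance between L1-normalised bag-of-words histograms."""
    na, nq = sum(a), sum(q)
    return sum(abs(x / na - y / nq) for x, y in zip(a, q))

def dis(photo, query):
    """Equation (1): Dis(A, Q) = GeoDis(A, Q) * BoWDis(A, Q)."""
    return geo_dis(photo["gps"], query["gps"]) * bow_dis(photo["bow"], query["bow"])

def rank(database, query):
    """Sort database photos by the combined distance, closest first."""
    return sorted(database, key=lambda p: dis(p, query))

# Toy example: photo A is metres from the query with a similar histogram,
# photo B is over a kilometre away with a very different histogram.
query = {"gps": (39.9926, 116.3051), "bow": [3, 0, 1, 0]}
database = [
    {"name": "B", "gps": (39.9990, 116.3200), "bow": [0, 3, 0, 1]},
    {"name": "A", "gps": (39.9927, 116.3052), "bow": [2, 0, 1, 0]},
]
```

One property of the multiplicative form worth noting: a database photo whose BoW histogram exactly matches the query gets a combined distance of zero regardless of its GPS position, so either term can dominate the ranking.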
  • 9. Fig. 9. The performance of background clutters queries with respect to different methods.

In practice, background clutter typically happens when capturing small scale landmarks. In such cases, the purely visual search performs worse, since in most queries the major part of the query photo is actually occupied by backgrounds.

Fig. 10. The performance of Night queries with respect to different methods.

The Night query is an interesting case, where GPS (contextual information) plays the dominant role in location recognition performance. Extracting distinguishing local features is very difficult at night, which is quite different from the daytime; hence, we can observe that using solely GPS is almost already enough at night. It is worth mentioning that, due to better image capturing quality, using a digital camera can achieve better visual search performance than a mobile phone.
  • 10. Fig. 11. The performance of blurring and shaking queries of phone cameras.

From Fig. 11, we find that introducing blurring and shaking would definitely degenerate the visual search performance. However, by incorporating GPS into similarity ranking, the results become much more acceptable compared with the pure visual query results.

Overall Performance Comparison: We further show the overall performance (168 queries of exemplar scenarios) with respect to using either digital cameras or mobile phone cameras in Figure 12, which gives an intuitive finding about the mAP difference between digital camera and phone camera.

Note that using solely visual search, the performance of camera photos is better than that of mobile phone photos; but with the combination of GPS, the performance of using either camera or mobile phone is almost identical.

Fig. 12. Overall performance comparison between using camera and mobile phone photos.
  • 11. Figure 13 further compares the performance over the whole database (one image as the query, the rest as the searched dataset) among different landmark scales. It is worth mentioning that the visual search performance of large scale landmarks is much better than that of the medium and small scales, due to fewer background clutters and more distinguishing interest points. The GPS-based search performance of the small scale is better than the large scale, as the GPS signal may be distorted around a larger scale landmark. Moreover, as GPS plays a relatively important role, the small-scale landmarks yield better results when fusing visual search and GPS information.

Fig. 13. Performance comparison among different scales of landmarks.

Finally, we investigate the overall performance of in total 570 queries (including the above 168 queries), as shown in Figure 14. Undoubtedly, the best results come from fusing both GPS and visual search together. Although distractor images typically degenerate the performance of pure visual search, this degeneration effect is alleviated by integrating GPS with visual cues.

Fig. 14. Overall performance of 570 queries in Peking University Landmarks.
  • 12. mAP Performance with respect to Different Search Baselines: Furthermore, from Figures 15-16, over those 168 queries, it is quite obvious that by adding contextual information the search performance can be more or less improved, while different mobile query scenarios present diverse performances. Generally speaking, the worst performances originate from the blurring/Night queries and from adding distractor images. The former indicates that the use of visual interest points would be challenged by mobile blurring queries. By comparing the results of adding distractors in Figs. 15 and 16, we can see that the use of contextual information should be taken seriously, while the simple combination is not robust enough for dealing with distractors.

Fig. 15. Solely BoW performance comparisons in five typical query scenarios.

Fig. 16. GPS+BoW performance comparisons in five typical query scenarios.
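The mAP@N measure reported throughout Figures 8-16 can be sketched as below. The contribution does not spell out the exact formula, so this assumes the standard definition of average precision truncated at the top N returns:

```python
def average_precision_at_n(ranked_ids, relevant_ids, n):
    """AP@N: average of the precision values at each rank (<= N) where a
    relevant photo is returned, normalised by min(#relevant, N)."""
    relevant = set(relevant_ids)
    hits, score = 0, 0.0
    for r, pid in enumerate(ranked_ids[:n], start=1):
        if pid in relevant:
            hits += 1
            score += hits / r          # precision at rank r
    denom = min(len(relevant), n)
    return score / denom if denom else 0.0

def mean_ap_at_n(query_results, n):
    """mAP@N over (ranked_ids, relevant_ids) pairs, one per query."""
    return sum(average_precision_at_n(r, g, n)
               for r, g in query_results) / len(query_results)
```

For example, a ranking [1, 2, 3] against ground truth {1, 3} at N = 3 scores (1/1 + 2/3) / 2, reflecting the position-sensitive nature of the measure.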
  • 13. 7 Application Scenarios

We briefly describe possible application scenarios of our Peking University Landmarks database as follows:

A benchmark dataset for mobile visual search: We hope Peking University Landmarks can become a useful resource to validate mobile visual search systems. It emphasizes two important factors in mobile visual search: query quality and contextual cues. To the best of our knowledge, both are beyond the state-of-the-art benchmark databases. In addition, it offers a dataset to evaluate the effectiveness and robustness of contextual information.

A benchmark dataset for location recognition: This dataset can be used to evaluate traditional location recognition systems, since a GPS location is bound to each image instance.

A training resource for scene modeling: This dataset may facilitate scene analysis and modeling, since our photographing is well designed to cover multi-shot, multi-view appearances of landmarks at multiple scales. To this end, we will provide the camera calibration information in our future work.

A training resource to learn better photographing manners: Our landmark photo collection can be further exploited to learn the (recommended) mobile photographing manners (proper angle and shot size for different types of landmarks) towards better visual search results.

8 References

[1] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and F.-F. Li. ImageNet: A Large-Scale Hierarchical Image Database. CVPR, 2009.
[2] H. Shao, T. Svoboda, and L. Van Gool. ZuBuD - Zurich Buildings Database for Image Based Recognition. Technical Report, Computer Vision Lab, Swiss Federal Institute of Technology, 2006.
[3] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Object Retrieval with Large Vocabularies and Fast Spatial Matching. CVPR, 2007.
[4] R. Ji, X. Xie, H. Yao, and W.-Y. Ma. Hierarchical Optimization of Visual Vocabulary for Effective and Transferable Retrieval. CVPR, 2009.
[5] D. Nister and H. Stewenius.
Scalable Recognition with a Vocabulary Tree. CVPR, 2006.
[6] S. Tsai, D. Chen, G. Takacs, V. Chandrasekhar, J. Singh, and B. Girod. Location Coding for Mobile Image Retrieval. Proc. 5th International Mobile Multimedia Communications Conference (MobiMedia), 2009.
[7] D. G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints. IJCV, 2004.
[8] M. Fischler and R. C. Bolles. Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography. Comm. of the ACM, 24: 381-395, 1981.
[9] R. Ji, L. Duan, T. Huang, H. Yao, and W. Gao. Compact Descriptors for Visual Search - Location Discriminative Mobile Landmark Search. CDVS Ad Hoc Group, Input Contribution m18542, 94th MPEG Meeting, Oct. 2010.