Semantic Indexing of Wearable Camera Images: Kids’Cam Concepts
Alan F. Smeaton
(Dublin City University)
… and …
... Kevin McGuinness and Cathal Gurrin and Jiang Zhou and Noel E. O’Connor and Peng Wang and Brian Davis and Lucas Azevedo and Andre Freitas and Louise Signal and Moira Smith and James Stanley and Michelle Barr and Tim Chambers and Cliona Ní Mhurchu
Overview
• Automatic concept detection, with one detector trained per concept class, is now commonplace.
• We’re interested in the challenging case of processing
images from wearable cameras where improvement is
necessary.
• We try to exploit some limited manual annotations to improve the accuracy of automatic concept weights.
• This work is not complete; it’s ongoing, but the story is interesting.
Analysis of Visual Media
• More progress has been made in the last few years than in the previous decade
• Incorporation of deep learning plus availability of huge searchable
image resources and training data
• Automatic image tagging is now hosted and offered by websites like Aylien, Imagga, Clarifai, and others, and is very cost-effective.
Analysis of Visual Media
• These developments are welcome … but … restrictive tagging
vocabularies.
• How do these map to the vocabulary users employ when formulating queries?
• An alternative approach is tagging at query time, but it’s expensive and not scalable to huge collections.
• Almost all work on concept detection based on one concept at a time.
• TRECVid tried simultaneous detection of concept pairs like “computer screen with telephone” and “airplane with clouds”.
• Limited success, but “Government Leader with Flag” was OK!
• Detecting concepts independently needs a course correction because it:
– Doesn’t avail of all available information sources
– Doesn’t map to a user’s search vocabulary
Long-term approach …
[Diagram: Images → Concept Set → Mapping → User Search Vocabulary]
How can a single image be mapped to two different vocabularies?
Using NL for image search … tagging
• NL is fraught with complexity and ambiguity at all levels …
– Lexical level: polysemy
– Syntactic level: structural ambiguity
– Semantic level: multiple interpretations
– Discourse level: pronoun resolution
• Plus vocabulary limitations when finding a word or phrase to describe something
• When using computers to help search for image data, language challenges are exacerbated, yet we assume a “simplistic” approach of tagging by a set of concepts, notwithstanding what we’re seeing with captioning here today
• Tagging is very useful for smaller, niche applications in restricted domains with manual tagging, but we see scalability problems
– Addressed by progress in automatic tagging, but we’re tolerant of inaccuracies!
In this paper …
• We are interested in images from wearable cameras with lots of juicy
challenges.
• Notoriously difficult to process automatically because …
– Blurring caused by wearer motion at the moment of image capture
– Occlusions from wearer’s hands
– Lighting conditions
– Fisheye lens for wider perspective causing distortion
– First-person viewpoint, but not exactly what the wearer sees
– Content varies hugely across subjects
• Applications in memory support, behaviour recording and analysis, security, other work-related uses, and quantified self (QS).
• In this paper we work with wearable camera data from school children,
for analysis of their environments
Wearable Camera Images
The Kids’Cam Project
• Child obesity is a significant public health concern, worldwide.
• There is unequivocal evidence that marketing of energy-dense and nutrient-poor foods and beverages is a causal factor in child obesity.
• Children’s total exposure to advertising of such foodstuffs has not been quantified.
• The Kids’Cam study aimed to determine the frequency, nature and duration of children’s exposure to such marketing.
• 169 randomly selected children aged 11 to 13, from 16 schools in Wellington, NZ, each wore an Autographer and carried a GPS logger for 4 days … images every 7 seconds, GPS every 5 seconds.
– 1.5M images, 2.5M GPS datapoints
• Manual annotation for food / beverage marketing using a 3-level, 53-concept ontology … inter-annotator reliability of 90%.
Manual Annotation
[Diagram: a table of per-image manual annotations over the 85-concept vocabulary, with GPS, date/time and user metadata attached to each image]
Shop front > sign > sugary drinks/juices
Convenience store indoors > in-store marketing > convenience store
School > sign > fast food
Processing the Kids’Cam Data
• Following integration of different data sources and after the
manual annotation of images, we processed the image
collection in the following way …
[Pipeline diagram, numbering 14 processing steps and repeated on the following slides with individual steps highlighted: the 1.5M images, each carrying GPS, date/time and user metadata plus the 85 manual concepts, are scored by GPU-hosted concept models covering some 6,000 concepts, and the resulting per-image concept weights are post-processed with Training-Free Refinement (TFR)]
7. Using a CNN to apply tags to images: we used the VGG-16 network, a deep CNN trained on 1,000 object classes using 1.2M images from ImageNet.
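To make step 7 concrete, here is a minimal sketch of tagging one frame with an ImageNet-trained VGG-16. The slides do not name a framework, so torchvision is an assumption here, as are the image path and the top-5 printout.

```python
# Hedged sketch of step 7: score one wearable-camera frame against the
# 1,000 ImageNet classes with a pretrained VGG-16 (torchvision assumed).
import torch
from torchvision import models, transforms
from PIL import Image

model = models.vgg16(pretrained=True).eval()   # 1,000-class ImageNet model

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])

img = Image.open("frame_000001.jpg").convert("RGB")    # hypothetical path
x = preprocess(img).unsqueeze(0)                       # shape (1, 3, 224, 224)
with torch.no_grad():
    probs = torch.softmax(model(x), dim=1)[0]          # per-class probabilities

top = torch.topk(probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"class {idx.item()}: {p.item():.3f}")
```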
8. The trained models were used to predict probabilities for each concept in each of the 1.5M images, processed in batches of 64 on an NVIDIA GPU and taking 4 days to complete.
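A hedged sketch of step 8’s batched GPU scoring, again assuming torchvision; the directory path and layout are invented, and ImageFolder expects images grouped into subfolders.

```python
# Hedged sketch of step 8: batched inference at batch size 64 on a GPU.
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, models, transforms

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = models.vgg16(pretrained=True).eval().to(device)

preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

# "kidscam_frames/" is a hypothetical path; ImageFolder wants subdirectories.
dataset = datasets.ImageFolder("kidscam_frames/", transform=preprocess)
loader = DataLoader(dataset, batch_size=64, num_workers=4)

chunks = []
with torch.no_grad():
    for batch, _ in loader:
        chunks.append(torch.softmax(model(batch.to(device)), dim=1).cpu())
scores = torch.cat(chunks)        # (num_images, 1000) concept probabilities
```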
Training Free Refinement
• Current concept-at-a-time classifiers do not consider inter-concept relationships or dependencies, yet these do exist
• To improve one-per-class detectors, we post-process the detection scores
– We take advantage of concept co-occurrence and re-occurrence, which depend on the particular collection
– We take advantage of local (temporal) neighbourhood information, where concepts are likely to re-occur close in time
– We use GPS location information, where concepts identified by a person at a location may re-occur subsequently at that same location
• TFR is based on non-negative matrix factorisation, described elsewhere; a sketch of the core idea follows below
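The published TFR method is described elsewhere; the sketch below only illustrates the underlying low-rank smoothing idea under simplifying assumptions: factorise the image-by-concept score matrix with NMF at low rank, so the reconstruction smooths scores using collection-wide co-occurrence structure. The matrix here is random stand-in data.

```python
# Illustrative sketch of the low-rank smoothing idea behind TFR, not the
# published algorithm. X stands in for per-image concept detection scores.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
X = rng.random((500, 1000))          # 500 images x 1,000 concept scores

nmf = NMF(n_components=50, init="nndsvda", max_iter=300)
W = nmf.fit_transform(X)             # per-image factor weights
H = nmf.components_                  # factors capturing concept co-occurrence

X_refined = W @ H                    # reconstruction fills gaps, damps outliers
```

Temporal or GPS neighbourhood information could be folded in by, for example, averaging each image’s row with its neighbours before factorising.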
9. As previously described, we then applied Training-Free Refinement to improve the probability assignments.
• We do not know the accuracy of the assignment of the 1,000 concepts, but we do know the accuracy of the assignment of the 53 concepts … and we have 1.5M images, each mapped into 2 concept spaces
• Can we adjust the values in (b), anchored on and pivoting around (a), in addition to having already used local, within-collection distributions?
[Figure: two 2-D concept spaces plotted side by side: (a) manual annotations with points a1, a2, known to be correct, and (b) automatic concepts with points b1, b2, of unknown accuracy; a second version of the figure adds panel (c), showing adjusted values a1’, a2’, b1’, b2’ after pivoting around the manual anchors]
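Purely as an illustration of the open question above (this is not a method from the paper), one could fit an affine map on a few (automatic, manual) anchor pairs and apply it to the remaining automatic scores. The anchor values and the helper name pivot_rescale are invented.

```python
# Hypothetical sketch: rescale automatic scores (b) so that chosen anchors
# line up with trusted manual values (a). All numbers are invented.
import numpy as np

def pivot_rescale(b, anchors_b, anchors_a):
    """Least-squares affine map fitted on (automatic, manual) anchor pairs."""
    A = np.vstack([anchors_b, np.ones_like(anchors_b)]).T
    slope, intercept = np.linalg.lstsq(A, anchors_a, rcond=None)[0]
    return slope * np.asarray(b) + intercept

b = np.array([0.12, 0.40, 0.75, 0.90])           # automatic, unknown accuracy
print(pivot_rescale(b, np.array([0.2, 0.8]), np.array([0.35, 0.90])))
```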
Cross-mapping concept spaces
• Distributional semantics – a corpus-driven approach – is based on the hypothesis that words co-occurring in similar contexts have similar meaning
• Using word2vec in DINFRA, we can map all words in a vocabulary to an n-dimensional vector space, where we can obtain relatedness scores among the words
• The figure illustrates an example
• For each image in Kids’Cam we can evaluate the relatedness between the human annotation and the highest-probability automatic concepts; a sketch follows below
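The slides use word2vec via DINFRA; as a stand-in, here is a hedged sketch with gensim and pretrained Google News vectors. The vector-file path and the example words are assumptions.

```python
# Hedged sketch: word2vec relatedness between a manual tag and automatic
# concepts, using gensim in place of DINFRA. Vector file path is assumed.
from gensim.models import KeyedVectors

kv = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

manual_tag = "school"                                  # from the human annotation
auto_concepts = ["classroom", "whiteboard", "bottle"]  # example CNN outputs

for concept in auto_concepts:
    print(concept, kv.similarity(manual_tag, concept))  # cosine relatedness
```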
School > availability > drink bottle
• We have the top-ranked concepts, their confidences, and their relatedness to the manual tags …
• A first effort is to simply multiply them, as in the table, but it’s hard to see the impact of this; a sketch follows below
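A minimal sketch of that first effort: multiply each concept’s detector confidence by its relatedness to the manual tag and re-rank. All numbers are invented for illustration.

```python
# Sketch of the "simply multiply" combination; all values are invented.
concepts = [
    ("classroom",  0.62, 0.71),   # (concept, confidence, relatedness)
    ("whiteboard", 0.44, 0.55),
    ("bottle",     0.38, 0.30),
]

reranked = sorted(((name, conf * rel) for name, conf, rel in concepts),
                  key=lambda pair: pair[1], reverse=True)
for name, score in reranked:
    print(f"{name}: {score:.3f}")
```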
And the result is …
• … and that’s where we currently are!
Conclusions and Future Work
• Since automatic concept detection using pre-defined models has made so much progress recently, we’re seeing vocabulary / concept-space mismatches
• Using 1.5M Kids’Cam images from wearable cameras, we have used
within-collection distributions to “smooth” concept weights (outliers and
gaps) in TFR
• We are trying to pivot around some manual annotations in order to
improve concept accuracies
• But, we need …
– More concepts – a richer vocabulary of them
– More varied manual annotations, not just fast food adverts
– A more global or collection-wide way to combine concept confidences and relatedness to known manual annotations
– Some validation of accuracy of automatic concepts to measure
accuracy of our post-processing
Finally, a plug …
• TRECVid Video Captioning Pilot task, 2016
• 2,000 Vine videos, each manually annotated with captions, twice
• 8 participating groups (CMU, CUHK, DCU, GMU,
NII, UvA, Sheffield)
• Two tasks …
– For each video, rank the 2,000 captions – the metric is MRR (see the sketch below)
– For each video, generate your own caption – the metrics are BLEU, METEOR, and the UMBC STS (Semantic Textual Similarity) service
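For reference, MRR is just the mean of the reciprocal ranks at which each video’s correct caption appears; a tiny sketch with invented ranks:

```python
# Mean Reciprocal Rank over 1-based ranks of each video's correct caption.
def mrr(ranks):
    return sum(1.0 / r for r in ranks) / len(ranks)

print(mrr([1, 3, 2, 10]))   # 0.483... for these invented ranks
```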
• Lots of lessons learned, which we will build on for a full task in 2017, probably again using Vine videos