Automatic Image Filtering on Social Networks Using Deep Learning and Perceptual Hashing During Crises

QCRI
QCRIQCRI
Automa'c	Image	Filtering	on	Social	
Networks	Using	Deep	Learning	and	
Perceptual	Hashing	During	Crises	
Dat	Tien	Nguyen,	Firoj	Alam	
Ferda	Ofli,	Muhammad	Imran		
Qatar	Compu>ng	Research	Ins>tute	
@mimran15	@ferdaofli	@firojalam04	@ndat
Outline	
Background	
Data	
Image	Processing	Pipeline	
Summary	and	Future	Works
Outline	
Background	
Data	
Image	Processing	Pipeline	
Summary	and	Future	Works
Ar'ficial	Intelligence	for										
Digital	Response	
Background	
hIp://aidr.qcri.org	
What	about	images?	
Expert/User
“A	picture	is	worth	a	thousand	words.”	
Background
Example	
Background	
Irrelevant	Images	for	the	event	
Relevant	Images	for	the	event
Challenges	of	Social	Media	Image	
Processing	
•  High	velocity	and	high	volume	
– Crowd	cannot	keep	up	
•  Sandy	Hurricane	
–  20	millions	tweets	in	5	days	
–  16	thousands	tweets	per	minute	
Background
Challenges	of	Social	Media	Image	
Processing	
•  High	velocity	and	high	volume	
– Crowd	cannot	keep	up	
•  Sandy	Hurricane	
–  20	millions	tweets	in	5	days	
–  16	thousands	tweets	per	minute	
•  Large	por>on	of	redundant/duplicate	and	
irrelevant	content	
– We		should	not	waste	crowd’s	>me	and	effort		
Background
Task/Goal	
	
“is	to	filter	irrelevant	and	
duplicate	images,	and	
categorize	them	into	different	
damage	level,	which	can	help	
in	the	decision	making	process”	
Background
Image	Filtering?	
•  Social	media	image	filtering	
– Real->me	image	collec>on	
– Duplicate	or	near-duplicate	image	detec>on	
– Irrelevant	image	content	detec>on	
Background
Image	Filtering?	
•  Social	media	image	filtering	
– Real->me	image	collec>on	
– Duplicate	or	near-duplicate	image	detec>on	
– Irrelevant	image	content	detec>on	
•  Ac'onable	informa'on	extrac'on	
– Infrastructure	damage	assessment	
– Injured	people	detec>on	
– Fine-grained	object	analysis	
Background
Automa'c	Image	Processing	Pipeline	
Background	
Tweet	
Collector	
Image	
Collector	
-	Receives	tweets	from	Tweet	collector	
-	Image	URL	extrac>on	
-	Image	downloading	from	the	Web	
Relevancy	
Filtering	
De-duplica>on	
Filtering	
Tweets	 Images	
Web	
Image	
downloading	
Images	 Images	
All	images	+	meta-data		
pass	through	this	channel	
Only	relevant	images	+	
meta-data	pass	
through	this	channel	
Only	relevant	and	unique	
images	+	meta-data	pass	
through	this	channel	
Damage	
Assessment
Related	Works	
•  Very	few:	
–  Images	with	on-topic	tweets	are	more	relevant	(Peters	and	
Joao,	2015)	
–  Visually	relevant	and	irrelevant	tweets	(F1	70.5%	in	binary	
classifica>on	task)	(Chen	et	al.,	2013)	
–  Damage	level	from	aerial	Images	(Turker	et	al.,	2004;	Fernandez	et	
al.	2015)	
•  Mostly	used	bag-of-visual-word	features	
Background
Outline	
Background	
Data	
Image	Processing	Pipeline	
Summary	and	Future	Works
Data:	Previous	Disaster	Datasets	
•  Four	disaster	events	
Corpus
Data:	Damage	Assessment	
Corpus	
Annota'on:	
•  Severe	damage:		
–  Substan>al	destruc>on	of	an	infrastructure	
–  Non-usable	buildings,	roads,	bridges	for	example.		
•  Mild	damage:	
–  Minor	damages	with	up	to	50%	
•  LiQle-to-no	damage	
Number	of	labeled	images	for	each	dataset	and	each	damage	category
Data:	Relevancy	Detec>on	
Corpus	
Annota'on	Mapping:	
•  Images	has	been	selected	from	the	data	we	annotated	for	
damage	assessment	
–  {severe,	mild}	à	relevant	(randomly	sampled	3518	images)	
•  Resulted	7036	Images	{relevant,	irrelevant}	
Image	
annotated	as		
LiQle-to-None	
ImageNet	Object	
Detec'on	Model	
Object	
category		
website,	suit,	lab	
coat,	envelope,	dust	
jacket,	candle,	menu,	
vestment,	monitor,	
street	sign,	puzzle,	
television,	cash	
machine,	screen	
irrelevant	
Most	frequently	
occurring	
Human	
Supervision
Outline	
Background	
Data	
Image	Processing	Pipeline	
Summary	and	Future	Works
Automa'c	Image	Processing	Pipeline	
Image	Processing	Pipeline	
Tweet	
Collector	
Image	
Collector	
-	Receives	tweets	from	Tweet	collector	
-	Image	URL	extrac>on	
-	Image	downloading	from	the	Web	
Relevancy	
Filtering	
De-duplica>on	
Filtering	
Tweets	 Images	
Web	
Image	
downloading	
Images	 Images	
All	images	+	meta-data		
pass	through	this	channel	
Only	relevant	images	+	
meta-data	pass	
through	this	channel	
Only	relevant	and	unique	
images	+	meta-data	pass	
through	this	channel	
Damage	
assessment
Image	Collector	
Image	Processing	Pipeline	
Tweet	
Collector	
Image	
Collector	
-	Receives	tweets	from	Tweet	collector	
-	Image	URL	extrac'on	
-	Image	downloading	from	the	Web	
Relevancy	
Filtering	
De-duplica>on	
Filtering	
Tweets	 Images	
Web	
Image	
downloading	
Images	 Images	
All	images	+	meta-data		
pass	through	this	
channel	
Only	relevant	images	+	meta-data		
pass	through	this	channel	
Only	relevant	and	unique	images	
+	meta-data	pass	through	this	
channel	
Damage	assessment
Relevancy	Filtering	
Image	Processing	Pipeline	
Tweet	
Collector	
Image	
Collector	
-	Receives	tweets	from	Tweet	collector	
-	Image	URL	extrac>on	
-	Image	downloading	from	the	Web	
Relevancy	
Filtering	
De-duplica>on	
Filtering	
Tweets	 Images	
Web	
Image	downloading	
Images	 Images	
All	images	+	meta-data		
pass	through	this	channel	
Only	relevant	images	+	
meta-data		
pass	through	this	
channel	
Only	relevant	and	unique	images	
+	meta-data		
pass	through	this	channel	
Damage	
assessment
Relevancy	Filtering	(Cont.)	
Irrelevant	images	showing	cartoons,	banners,	
adver>sements,	celebri>es,	etc.	
Image	Processing	Pipeline
Relevancy	Filtering	(Cont.)	
Image	Processing	Pipeline	
•  Deep	Learning	Model:	Build	a	binary	classifier	
–  Transfer	learning		
–  Extracted	features	from	VGG16,	a	pre-trained	convolu>onal	neural	
network,	e.g.,	VGG16	(Simonyan	et	al.,	2014	)	
–  Fine	tuned	the	parameters	
1x2	
Image	credit:	hIps://blog.heuritech.com
Relevancy	Filtering	(Cont.)	
*	Simonyan,	K.	and	Zisserman,	A.	(2014).	“Very	deep	convolu>onal	networks	for		
large-scale	image	recogni>on”.	In:	arXiv	preprint	arXiv:1409.1556	
Image	Processing	Pipeline	
•  Data	Split	for	Experiments:	
–  Training	60%	
–  Valida>on	20%	
–  Test	20%	
•  Performance
Duplicate	Filtering	
Image	Processing	Pipeline	
Tweet	
Collector	
Image	
Collector	
-	Receives	tweets	from	Tweet	collector	
-	Image	URL	extrac>on	
-	Image	downloading	from	the	Web	
Relevancy	
Filtering	
De-duplica'on	
Filtering	
Tweets	 Images	
Web	
Image	downloading	
Images	 Images	
All	images	+	meta-data		
pass	through	this	channel	
Only	relevant	images	+	
meta-data		
pass	through	this	
channel	
Only	relevant	and	unique	images	
+	meta-data		
pass	through	this	channel	
Damage	
assessment
Duplicate	Filtering	
Examples	of	near-duplicate	images	
Image	Processing	Pipeline
Duplicate	Filtering	(Cont.)	
Image	Processing	Pipeline	
•  Selected	1100	images	for	the	experiment	
•  Approach:		
–  Compute	perceptual	hash	for	each	image	
	
		
–  Compute	hamming	distance	between	a	pair	of	images	
Two	images	are	duplicate	if:	
	
distance(Vi, Vj) 10
Vi = pHash(Imagei)
Vi ∈ Rn
distance(Vi, Vj) =
n
k=0
|(Vi,k − Vj,k)|
Vi = pHash(Imagei)
Vi ∈ Rn
distance(Vi, Vj) =
n
k=0
|(Vi,k − Vj,k)|
hamming distance(Vi, Vj) =
n
k=1
(Vi,k ̸= Vj,k)
(Lei	et	al.	2011)
Before/AZer	Image	Filtering	
Image	Processing	Pipeline	
•  Number	of	images	that	remain	in	our	dataset	
auer	each	image	filtering	opera>on
Before/AZer	Image	Filtering	(Cont.)	
Image	Processing	Pipeline	
•  Number	of	images	that	remain	in	our	dataset	
auer	each	image	filtering	opera>on	
~	2	%	
~	2	%	
~	50	%
Before/AZer	Image	Filtering	(Cont.)	
Image	Processing	Pipeline	
•  Number	of	images	that	remain	in	our	dataset	
auer	each	image	filtering	opera>on	
~	2	%	
~	2	%	
~	50	%	
~	58	%	
~	50	%	
~	30	%
Before/AZer	Image	Filtering	(Cont.)	
Image	Processing	Pipeline	
•  Number	of	images	that	remain	in	our	dataset	
auer	each	image	filtering	opera>on	
~	2	%	
~	2	%	
~	50	%	
~	58	%	
~	50	%	
~	30	%	
Assume	tagging	an	image	costs	$1,		
We	can	save	~$17k,	almost	saving	62%	of	the	budget!!!
Damage	Assessment	
Image	Processing	Pipeline	
Tweet	
Collector	
Image	
Collector	
-	Receives	tweets	from	Tweet	collector	
-	Image	URL	extrac>on	
-	Image	downloading	from	the	Web	
Relevancy	
Filtering	
De-duplica>on	
Filtering	
Tweets	 Images	
Web	
Image	downloading	
Images	 Images	
All	images	+	meta-data		
pass	through	this	channel	
Only	relevant	images	+	
meta-data	pass	
through	this	channel	
Only	relevant	and	unique	
images	+	meta-data		
pass	through	this	channel	
Damage	
assessment
Damage	Assessment	
•  Three-class	classifica>on	
– Categories:	severe,	mild	&	liIle-to-none	
•  Dis>nc>on	between	categories	is	ambiguous.	
Image	Processing	Pipeline	
Descrip'on	 Severe	 Mild	 None	 All	
Number	of	Images	 1765	 483	 3751	 6000
Damage	Assessment	
•  Fine-tuning	a	pre-trained	CNN	(e.g.,	VGG16)	
•  5-fold	cross	valida>on	
Image	Processing	Pipeline
Outline	
Background	
Data	
Image	Processing	Pipeline	
Summary	and	Future	Works
Summary	
•  We	propose	a	real	>me	image	processing	pipeline,	
which	filters	
–  duplicate,	near	duplicate,	and	
–  irrelevant	images	
and	categories	
–  images	into	severe,	mild	and	liIle-to-no	damage	
•  Relevancy	filter	model:	AUC:	0.98;	F1:	0.98	
•  Damage	assessment	model:	AUC:	0.72;	F1:	0.67	
•  We	can	save	62%	of	the	whole	budget	
Summary	and	Future	Works	
62%	images	are	
irrelevant	and	
duplicates	
	
D.	T.	Nguyen,	F.	Alam,	F.	Ofli,	M.	Imran,	“Automa>c	image	filtering	on	social	networks	
using	deep	learning	and	perceptual	hashing	during	crises”,	ISCRAM	2017.
Future	Work	
•  Image	processing	pipeline	
– Please	check:	hIp://aidr.qcri.org	
•  Fine-grained	object	analysis	
– Severity	of	injury,	quality	of	aid/shelter,	etc.	
•  Mul>modal	analysis	
– text	+	image	+	metadata	
Summary	and	Future	Works
Thank	you!	
	
Please	follow	us	on	twiIer	@aidr_qcri	
	
Researchgate:	goo.gl/TvqPpw	
	
To	get	the	data:	hQp://crisisnlp.qcri.org/	
D.	T.	Nguyen,	F.	Alam,	F.	Ofli,	M.	Imran,	“Automa>c	image	filtering	on	social	networks	
using	deep	learning	and	perceptual	hashing	during	crises”,	ISCRAM	2017.	
Please	cite	us:
1 of 38

More Related Content

Similar to Automatic Image Filtering on Social Networks Using Deep Learning and Perceptual Hashing During Crises

Similar to Automatic Image Filtering on Social Networks Using Deep Learning and Perceptual Hashing During Crises(20)

Techexpo bigdata ml_ai_hanoiTechexpo bigdata ml_ai_hanoi
Techexpo bigdata ml_ai_hanoi
Lam Pham229 views
 Situational Awareness Through Social Media  Situational Awareness Through Social Media
Situational Awareness Through Social Media
Partners in Emergency Preparedness Conference709 views
8-170122180211.pptx8-170122180211.pptx
8-170122180211.pptx
MARSTONGLENNTUGAHAN82 views
Social media marketing for photographersSocial media marketing for photographers
Social media marketing for photographers
Intranet Future134 views
Social media for service deliverySocial media for service delivery
Social media for service delivery
Miles Maier391 views

Automatic Image Filtering on Social Networks Using Deep Learning and Perceptual Hashing During Crises