Social Media Verification Challenges, Approaches and Applications

As grassroots and social media-based journalism becomes more widespread, verifying information that arrives through such channels becomes imperative. This talk explores the challenges of computational verification in social media, that is, automatically classifying unreliable media content as fake or real. After a generic conceptual architecture is presented, the focus turns to tweets around big events that link to images (fake or real) whose reliability can be verified by independent online sources. The REVEALr platform will also be demonstrated: a scalable and efficient content-based media crawling and indexing framework featuring a novel and resilient near-duplicate detection approach and intelligent content- and context-based aggregation capabilities (e.g. clustering, named entity extraction).

Published in: Social Media

  1. 1. Social Media Verification: Challenges, Approaches and Applications. Dr. Yiannis Kompatsiaris, ikom@iti.gr. Multimedia, Knowledge and Social Media Analytics Lab, Head, CERTH-ITI. 3rd International Conference on Internet Science (INSCI 2016)
  2. 2. Overview • Introduction – Motivation – Challenges • Social Media in News and Journalism • The problem of verification • Approaches – Context extraction from Web and Social Media – Image Forensics – Computational verification • Demos - Resources • Conclusions
  3. 3. Pope Francis, Pope Benedict. 2007: iPhone release. 2008: Android release. 2010: iPad release. http://petapixel.com/2013/03/14/a-starry-sea-of-cameras-at-the-unveiling-of-pope-francis/
  4. 4. http://blog.tyronesystems.com/how-much-data-is-created-every-minute-by-the-social-media
  5. 5. Social Media aspects (diagram labels): Caption, Time, User Profile, Favs, Comms, Tags
  6. 6. Rise of the networks
  7. 7. Multi-modal graphs
  8. 8. Social Networks as Graphs
  9. 9. Social Networks as Real-Life Sensors • Social networks are a data source with an extremely dynamic nature that reflects events and the evolution of community focus (users' interests) • The huge penetration of smartphones and mobile devices provides real-time and location-based user feedback • Transform individually rare but collectively frequent media into meaningful topics, events, points of interest, emotional states and social connections • Present them in an efficient way for a variety of applications (news, marketing, science, health, entertainment)
  10. 10. Real-life Social Networks • Social networks have emergent properties. Emergent properties are new attributes of a whole that arise from the interaction and interconnection of the parts • Emotions, health and sexual relationships depend on our connections (e.g. their number) and on our position in the structure of the social graph • Central – Hub • Outlier • Transitivity (connections between friends)
  11. 11. Examples - Science. Xin Jin, Andrew Gallagher, Liangliang Cao, Jiebo Luo, and Jiawei Han. The wisdom of social multimedia: using Flickr for prediction and forecast. International Conference on Multimedia (MM '10), ACM. “…if you're more than 100 km away from the epicenter [of an earthquake] you can read about the quake on Twitter before it hits you…” Many Twitter examples at: What can Twitter tell us about the real world? Twitter and the Real World, CIKM'13 Tutorial, https://sites.google.com/site/twitterandtherealworld/home
  12. 12. Examples - Science
  13. 13. Example – News (Boston bombing) “Following the Boston Marathon bombings, one quarter of Americans reportedly looked to Facebook, Twitter and other social networking sites for information, according to The Pew Research Center. When the Boston Police Department posted its final “CAPTURED!!!” tweet of the manhunt, more than 140,000 people retweeted it.” “Authorities have recognized that one of the first places people go in events like this is to social media, to see what the crowd is saying about what to do next” “I have been following my friend's Facebook [account] who is near the scene and she is updating everyone before it even gets to the news”
  14. 14. Many other examples: smellymaps. Smell-related words in geo-located social media. http://researchswinger.org/smellymaps/
  15. 15. Be careful of correlation diagrams
  16. 16. Conceptual Architecture. Pipeline stages: CRAWLING, INDEXING, MINING, RANKING, VISUALIZATION. Component labels in diagram order: API Wrapper, Website Wrapper, Scheduler, Visual Indexing, Near-duplicates, Text Indexing, Media Fetcher, SNA, Sentiment - Influence, Trends - Topics, Model Building, Concepts, Relevance, Diversity, Popularity, Veracity, Crawling Specs, Sources, Interaction, Responsiveness, Aggregation, Aesthetics
  17. 17. Challenges – Content (Indexing - Mining) • Multi-modality: e.g. image + tags, video, audio • Rich social context: spatio-temporal, social connections, relations and social graph • Specific messages: short, conversations, errors, no context • Inconsistent quality: noise, spam, fake, propaganda • Huge volume: massively produced and disseminated • Multi-source: may be generated by different applications and user communities • Dynamic: fast updates, real-time
  18. 18. Policy – Licensing – Legal challenges • Fragmented access to data – Separate wrappers/APIs for each source (Twitter, Facebook, etc.) – Different data collection/crawling policies • Limitations imposed by API providers (“Walled Gardens”) • Full access to data impossible or extremely expensive (e.g. see data licensing plans for GNIP and DataSift) • Non-transparent data access practices (e.g. access is provided to an organization/person if they have a contact in Twitter) • Constant change of model and ToS of social APIs – No backwards compatibility, additional development costs • Ephemeral nature of content • Social search results often lead to removed content → inconsistent and unreliable referencing • User Privacy & Purpose of use • Fuzzy regulatory framework regarding mining user-contributed data
  19. 19. “It has changed the way we do news” (MSN) “Social media is the key place for emerging stories – internationally, nationally, locally” (BBC) “Social media is transforming the way we do journalism” (New York Times) Source: picture alliance / dpa
  20. 20. “It’s really hard to find the nuggets of useful stuff in an ocean of content” (BBC) “Things that aren’t relevant crowd out the content you are looking for” (MSN) “The filters aren’t configurable enough” (CNN) Source: Getty Images
  21. 21. Verification was simpler in the past... Source: Frank Grätz
  22. 22. News Requirements. Quickly surface trusted and relevant material from social media – with context. • “quickly”: in real time • “surfaces”: automatically discovers, clusters and searches • “trusted”: automatic support in verification process • “relevant”: to the specific event • “material”: any material (text, image, audio, video = multimedia), aggregated with other sources (e.g. web) • “social media”: across all relevant social media platforms • “with context”: location, time, sentiment, influence
  23. 23. Can multimedia on the Web be trusted? Real photo captured April 2011 by WSJ but heavily tweeted during Hurricane Sandy (29 Oct 2012). Tweeted by multiple sources & retweeted multiple times. Original online at: http://blogs.wsj.com/metropolis/2011/04/28/weather-journal-clouds-gathered-but-no-tornado-damage/
  24. 24. Can multimedia on the Web be trusted?
  25. 25. The Problem • Everyone can easily publish content on the Web • Content can be easily repurposed and manipulated • Not only for fun but also for propaganda • News outlets are competing for views and clicks → Pressure for airing stories very quickly leaves very little room for verification. → Very often, even well-reputed news providers fall for fake news content. • Multiple tools and services available for individual tasks → complex verification process. Very hard and time-consuming to check the veracity of Web multimedia
  26. 26. Image verification: tools of the trade • Metadata analysis – E.g. do the dates/locations match? Is the image already copyrighted? By whom? • Context Extraction from Web and Social Networks – Reverse image search using e.g. Google or TinEye – Clustering • Has the image been posted elsewhere? Does it originate from a different context? • Supervised machine learning for automatic classification – Exploiting patterns of usage, content, linking of fake/real content • Content analysis (forensics) for tampering localization – Most commonly, Error Level Analysis (ELA)
  27. 27. Monitoring and intelligence system for Web multimedia verification
  28. 28. Media REVEALr • Developed within the REVEAL project: http://revealproject.eu/ • Framework for collecting, indexing and browsing multimedia content from the Web and social media • Support for verification: – Near-duplicate detection against an indexed collection – Clustering of social media posts by visual similarity → comparative view of the same incident – Aggregation and visualization of Named Entities around an incident
  29. 29. Overview of Media REVEALr. Diagram layers: Media collection; Media pre-processing & feature extraction; Media analysis, mining & indexing; Persistence (storage, indexing); Access (API); Visualization, front-end. Modalities: TEXT, VISUAL
  30. 30. Named Entity Detection • The brevity and noisy nature of text in social media pose a serious challenge • Employed solution: – Pre-processing: tokenization, user mention resolution, text cleaning – Stanford NER + user mention resolution – Regular expressions to remove special characters and symbols (e.g., #, @, URLs, etc.)
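The pre-processing step on slide 30 can be illustrated with a short sketch. This is a minimal example, not the REVEALr implementation: the regular expressions, the `clean_tweet` name and the `display_names` lookup table are all assumptions made for illustration, and the cleaned text would still have to be passed to a recognizer such as Stanford NER.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@(\w+)")
SYMBOL_RE = re.compile(r"[^\w\s.,!?'-]")


def clean_tweet(text, display_names=None):
    """Prepare tweet text for a named-entity recognizer (illustrative only)."""
    display_names = display_names or {}
    # Drop URLs first so their punctuation is not mistaken for text.
    text = URL_RE.sub(" ", text)
    # Resolve @mentions to a display name when one is known
    # (hypothetical lookup table); otherwise keep the bare handle.
    text = MENTION_RE.sub(
        lambda m: display_names.get(m.group(1).lower(), m.group(1)), text
    )
    # Keep the hashtag word but drop '#' and other stray symbols.
    text = SYMBOL_RE.sub(" ", text)
    # Collapse the whitespace left behind by the substitutions.
    return " ".join(text.split())
```

Resolving a mention before recognition gives the recognizer a plain name instead of an opaque handle, which is presumably why the slide pairs Stanford NER with user mention resolution.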
  31. 31. Visual Indexing • Content-based image retrieval to solve the Near-Duplicate Search (NDS) problem • Based on local descriptors (SURF), aggregation (VLAD), dimensionality reduction (PCA), quantization (PQ) and indexing (IVFADC) • State-of-the-art visual similarity search – High precision/recall – Very efficient and scalable implementation (search many millions of images in a few msec, maintain full index in memory using ~1GB/10M images)
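The aggregation step named on slide 31 (VLAD) can be sketched in plain Python. This is a simplified illustration under stated assumptions: descriptors and centroids are small lists of floats, the centroids are given rather than learned with k-means, and SURF extraction, PCA, product quantization and the IVFADC index are all left out.

```python
import math


def vlad(descriptors, centroids):
    """Aggregate local descriptors into one L2-normalised VLAD vector."""
    dim = len(centroids[0])
    residuals = [[0.0] * dim for _ in centroids]
    for d in descriptors:
        # Assign the descriptor to its nearest centroid (squared Euclidean).
        k = min(
            range(len(centroids)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(d, centroids[i])),
        )
        # Accumulate the residual between the descriptor and that centroid.
        for j in range(dim):
            residuals[k][j] += d[j] - centroids[k][j]
    # Concatenate the per-centroid residual sums and L2-normalise.
    flat = [v for row in residuals for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat
```

The output length is the number of centroids times the descriptor dimension, which is why the pipeline on the slide follows aggregation with PCA and quantization before indexing.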
  32. 32. Improving NDS Resilience (NDS+) • Often, NDS performance suffers from overlay graphics and fonts • To address this issue, we integrate a descriptor-level classifier that tries to remove the font/graphic descriptors from the VLAD vector
  33. 33. Example: Filtering Out Font Descriptors • Assuming that in most cases the classifier is correct, the resulting VLAD vector is of much higher quality compared to the one without filtering
  34. 34. Classifier Details • Random Forest used as base classifier • Cost Sensitive meta-classifier to penalize misclassification of True Positives • Challenge due to Class Imbalance (overlay descriptors << useful image content descriptors) – Cost Sensitive meta-classifier performs over-sampling of minority class to balance the training set • Training set created by collecting images with overlays (e.g., memes) from the Web and manually annotating them (selecting areas with fonts/overlays)
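Slide 34 states that the minority class (overlay descriptors) is over-sampled to balance the training set, without giving the mechanism. The sketch below shows plain random over-sampling as one possible reading; the function name, the seed parameter and the toy labels are assumptions, and the Random Forest and cost-sensitive wrapper themselves are not reproduced here.

```python
import random
from collections import Counter


def oversample_minority(samples, labels, seed=0):
    """Duplicate minority-class samples at random until classes are balanced."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, n in counts.items():
        # All original samples of this class are candidates for duplication.
        pool = [s for s, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels
```

Over-sampling should be applied to the training split only; duplicating samples before splitting would place copies of the same descriptor on both sides of the evaluation.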
  35. 35. Mining: Clustering and Aggregation • Visual aggregation – DBSCAN on the visual feature representation (PCA-reduced VLAD vectors) – Element (tweet) selected based on the largest amount of keywords (expected to result in more information) • Entity aggregation – NER on individual items – Entity categorization (→ Persons, Location, Organizations) – Entity ranking based on frequency of occurrence
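The entity aggregation and the choice of a representative tweet on slide 35 can be sketched as follows. The data shapes are assumptions: entities arrive as `(name, type)` pairs from an upstream recognizer, and "largest amount of keywords" is read here as the number of distinct keywords matched, which is one interpretation of the slide. The DBSCAN step is not shown.

```python
from collections import Counter, defaultdict


def rank_entities(items):
    """items: iterable of lists of (entity, type) pairs, one list per post."""
    counts = defaultdict(Counter)
    for entities in items:
        for name, etype in entities:
            counts[etype][name] += 1
    # Most frequent entities first within each type.
    return {etype: c.most_common() for etype, c in counts.items()}


def representative(cluster, keywords):
    """Pick the post in a cluster that mentions the most distinct keywords."""
    kw = {k.lower() for k in keywords}
    return max(cluster, key=lambda text: len(kw & set(text.lower().split())))
```

For example, `rank_entities` applied to posts tagged with `("Boston", "LOCATION")` twice and `("Watertown", "LOCATION")` once lists Boston ahead of Watertown under `LOCATION`.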
  36. 36. User Interface: Collections View
  37. 37. User Interface: Items View & Search
  38. 38. User Interface: Clusters View
  39. 39. User Interface: Entities View
  40. 40. Evaluation: NER • Manual annotation of 400 tweets from the SNOW Data Challenge dataset (Papadopoulos et al., 2014) • Measure: Accuracy → an instance is considered correct when both entity and type are correctly identified • Three competing solutions: – Base Stanford NER (S-NER) – S-NER + Extensions/Post-processing (S-NER+) – Ellogon library (http://www.ellogon.org)
  41. 41. Evaluation: NDS • Benchmark Datasets – Holidays: 1,491 images, 500 queries (Jegou et al., 2008) – Oxford: 5,063 images, 55 queries (Philbin et al., 2008) – Paris: 6,412 images, 55 queries (Philbin et al., 2008) • Accuracy: mean Average Precision (mAP)
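The mAP measure named on slide 41 can be computed as in the sketch below. It uses the standard definition (average of precision at each rank where a relevant image appears, then the mean over queries); benchmark-specific conventions, such as ignoring the query image itself or "junk" images, are not modelled, and the function names are illustrative.

```python
def average_precision(ranked, relevant):
    """AP for one query: mean of precision@k over ranks k holding a relevant item."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / k
    # Relevant items that were never retrieved contribute zero precision.
    return total / len(relevant)


def mean_average_precision(results):
    """results: list of (ranked_list, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in results) / len(results)
```

A query whose two relevant images are returned at ranks 1 and 3 scores (1/1 + 2/3) / 2 ≈ 0.83.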
  42. 42. Use Cases: Real-world Datasets: sandy, boston, malaysia, ferry
  43. 43. NDS Use Case (boston)
  44. 44. Clustering Use Case (boston) • Visual clustering enables comparative view and analysis over time (in this case showing increasing confidence in the picture). • When journalists see many similar photos of the same scene, they have more confidence that it is real and not fabricated.
  45. 45. Entity Aggregation Use Case (snow): LOCATIONS, PERSONS, ORGANIZATIONS
  46. 46. Image Forensics for Verification
  47. 47. Image Forensics for Verification
  48. 48. Computational Verification in Social Media
  49. 49. Computational Verification in Social Media • Create a computational verification framework to classify tweets with unreliable media content. • Events used for experimentation: fake images posted during the Hurricane Sandy natural disaster; fake images posted during the Boston Marathon bombings
  50. 50. Goals/Contributions • Distinguish between fake and real content shared on Twitter using a supervised approach • Provide closer-to-reality estimates of automatic verification performance • Explore methodological issues with respect to evaluating classifier performance • Create reusable resources – Fake (and real) tweets (incl. images) corpus – Open-source implementation
  51. 51. Methodology • Corpus Creation – Topsy API – Near-duplicate image detection • Feature Extraction – Content-based features – User-based features – Link-based features • Classifier Building & Evaluation – Cross-validation – Independent photo sets – Cross-dataset training
  52. 52. Corpus Creation • Define a set of keywords K around an event of interest. • Use the Topsy API (keyword-based search) and keep only the tweets containing images (set T). • Using independent online sources, define a set of fake images I_F and a set of real ones I_R. • Select the subset T_C ⊂ T of tweets that contain any of the images in I_F or I_R. • Use near-duplicate visual search (VLAD+SURF) to extend T_C with tweets that contain near-duplicate images. • Manually check that the returned near-duplicates indeed correspond to the images of I_F or I_R.
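The labelling logic of the corpus creation steps on slide 52 can be sketched as follows. The data shapes are assumptions for illustration: tweets are dictionaries with `id` and `image` keys, and `near_duplicates` stands in for the output of the visual search after manual checking. Keyword search through the Topsy API is not shown.

```python
def build_corpus(tweets, fake_images, real_images, near_duplicates):
    """
    tweets: list of dicts with 'id' and 'image' keys (the set T).
    fake_images / real_images: reference image ids (I_F and I_R).
    near_duplicates: dict mapping an image id to the reference image it
        was matched to by visual search and confirmed manually.
    Returns a dict tweet id -> 'fake' or 'real' (the set T_C).
    """
    labelled = {}
    for t in tweets:
        # Map a near-duplicate back to its reference image, if any.
        ref = near_duplicates.get(t["image"], t["image"])
        if ref in fake_images:
            labelled[t["id"]] = "fake"
        elif ref in real_images:
            labelled[t["id"]] = "real"
        # Tweets with any other image stay outside T_C.
    return labelled
```

Keeping the reference image alongside each label is useful later, because the evaluation slides require training and test tweets to come from different reference images.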
  53. 53. Features (verification handbook). User features: 1 Username; 2 Number of friends; 3 Number of followers; 4 Number of followers/number of friends; 5 Number of times the user was listed; 6 If the user’s status contains URL; 7 If the user is verified or not. Content features: 1 Length of the tweet; 2 Number of words; 3 Number of exclamation marks; 4 Number of quotation marks; 5 Contains emoticon (happy/sad); 6 Number of uppercase characters; 7 Number of hashtags; 8 Number of mentions; 9 Number of pronouns; 10 Number of URLs; 11 Number of sentiment words; 12 Number of retweets; 13 Readability [1]. Link-based features: 1 Web Of Trust score (WOT) [2]; 2 In-degree and harmonic centralities [3]; 3 Alexa rankings [4]. Notes: [1] Flesch reading ease method to compute a score in the [0,100] range, where 0 is hard-to-read and 100 is easy-to-read text. [2] A metric for how trustworthy a website is, based on user ratings. [3] Rankings computed based on the Web graph. [4] Alexa rankings, which evaluate the frequency of visits on various websites.
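Several of the content features listed on slide 53 are simple counts over the tweet text. The sketch below computes a subset of them; the feature names and regular expressions are assumptions, and features that need external resources (pronoun and sentiment lexicons, retweet counts, readability) are omitted.

```python
import re


def content_features(text):
    """Compute a few of the listed content features for one tweet."""
    words = text.split()
    return {
        "length": len(text),
        "num_words": len(words),
        "num_exclamation_marks": text.count("!"),
        "num_quotation_marks": text.count('"'),
        "num_uppercase_chars": sum(ch.isupper() for ch in text),
        "num_hashtags": len(re.findall(r"#\w+", text)),
        "num_mentions": len(re.findall(r"@\w+", text)),
        "num_urls": len(re.findall(r"https?://\S+", text)),
    }
```

The resulting dictionary can be turned into a fixed-order feature vector for whichever classifier is being trained.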
  54. 54. Training and Testing the Classifier • Care should be taken to make sure that no knowledge from the training set enters the test set. • This is NOT the case when using standard cross-validation.
  55. 55. The Problem with Cross-Validation. Training/test tweets are randomly selected, but there are multiple tweets per reference image (figure: one of the reference fake images).
  56. 56. Independence of Training-Test Set. Training/test tweets are constrained to correspond to different reference images.
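The constraint on slide 56 amounts to splitting by reference image rather than by tweet. The sketch below is one way to do that; the function name, the `test_fraction` and `seed` parameters, and the `(tweet_id, reference_image_id)` data shape are assumptions for illustration.

```python
import random


def split_by_reference_image(tweets, test_fraction=0.3, seed=0):
    """
    tweets: list of (tweet_id, reference_image_id) pairs.
    Whole reference images are assigned to one side, so tweets sharing
    an image never end up in both the training and the test set.
    """
    images = sorted({img for _, img in tweets})
    random.Random(seed).shuffle(images)
    # Hold out at least one reference image for testing.
    n_test = max(1, round(len(images) * test_fraction))
    test_images = set(images[:n_test])
    train = [t for t in tweets if t[1] not in test_images]
    test = [t for t in tweets if t[1] in test_images]
    return train, test
```

With random tweet-level splits, near-identical tweets about the same image land on both sides, which inflates the measured accuracy; grouping by image removes that leak.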
  57. 57. Cross-dataset Training-Testing • In the most unfavourable case, the dataset used for training should refer to a different event than the one used for testing. • Simulates the real-world scenario of a breaking story, where no prior information is available to news professionals. • Variants: – Different event, same domain – Different event, different domain (very challenging!)
  58. 58. Evaluation • Datasets – Hurricane Sandy – Boston Marathon bombings • Evaluation of two sets of features (content/user) • Evaluation of different classifier settings
  59. 59. Dataset – Hurricane Sandy. Natural disaster that affected the USA from October 22nd to 31st, 2012. Fake images and content, such as sharks inside New York and a flooded Statue of Liberty, went viral. Keywords: Hurricane Sandy, Hurricane, Sandy. Hashtags: #hurricaneSandy, #hurricane, #Sandy
  60. 60. Dataset – Boston Marathon Bombings. The bombings occurred on 15 April 2013 during the Boston Marathon, when two pressure cooker bombs exploded at 2:49 pm EDT, killing three people and injuring an estimated 264 others. Keywords: Boston Marathon, Boston bombings, Boston suspect, manhunt, watertown, Tsarnaev, 4chan, Sunil Tripathi. Hashtags: #bostonMarathon, #bostonbombings, #bostonSuspect, #manhunt, #watertown, #Tsarnaev, #4chan, #prayForBoston
  61. 61. Dataset Statistics. Hurricane Sandy: tweets with other image URLs 343,939; tweets with fake images 10,758; tweets with real images 3,540. Boston Marathon: tweets with other image URLs 112,449; tweets with fake images 281; tweets with real images 460. [Pie charts of the class distribution per dataset; for Hurricane Sandy: 3% fake images, 1% real images, 96% other image URLs]
  62. 62. Prediction accuracy (1) • 10-fold cross-validation results using different classifiers: ~80% [Bar charts: accuracy on a 0-100 axis for the Total, User and Content feature sets; panels: J48 decision tree, KStar, Random Forest; series: Boston Marathon, Hurricane Sandy]
  63. 63. Prediction accuracy (2) • Results using different training and testing sets from the Hurricane Sandy dataset: ~75% • Results using Hurricane Sandy for training and Boston Marathon for testing: ~58% → separate classifiers might be built for certain types of incidents [Bar charts: accuracy on a 0-100 axis for the Total, User and Content feature sets; series: Random Forest, KStar, J48 decision tree]
  64. 64. Sample Results • Real tweet: “My friend's sister's Trampolene in Long Island. #HurricaneSandy”. Classified as real • Real tweet: “23rd street repost from @wendybarton #hurricanesandy #nyc”. Classified as fake • Fake tweet: “Sharks in people's front yard #hurricane #sandy #bringing #sharks #newyork #crazy http://t.co/PVewUIE1”. Classified as fake • Fake tweet: “Statue of Liberty + crushing waves. http://t.co/7F93HuHV #hurricaneparty #sandy”. Classified as real
  65. 65. Sample fake and real images in Sandy • Fake pictures shared on social media • Real pictures shared on social media
  66. 66. Reusable results • Computational verification – Dataset: https://github.com/MKLab-ITI/image-verification-corpus – Code: https://github.com/socialsensor/computational-verification • The Wild Web Tampered Image Dataset – 80 confirmed digital forgeries, 10,870 images, ground truth binary masks – Dataset: https://mklab.iti.gr/project/wild-web-tampered-image-dataset • The Deutsche Welle Tampered Image Dataset – 6 original images, 3 image sources, 7 different modified versions – Surprisingly tough to crack using the state-of-the-art – Dataset: https://revealproject.eu/the-deutsche-welle-image-forensics-dataset/ • Open-source projects (Apache License v2): https://github.com/socialsensor – Data collection (stream-manager, storm-focused-crawler) – Indexing (framework-client, multimedia-indexing) – Mining (topic-detection, multimedia-analysis, community-evolution-analysis, social-event-detection)
  67. 67. Contributions • Dr. Symeon Papadopoulos – Social network analysis, social media content mining and multimedia indexing and retrieval – http://mklab.iti.gr/people/papadop – Twitter: @sympap • Dr. Zampoglou Markos – Web multimedia verification, image forensics for verification – markzampoglou@iti.gr • Boididou Christina – Computational approaches for verification – boididou@iti.gr
  68. 68. Support • Tools and services for Social Media verification from a journalistic and enterprise perspective. • Knowledge verification platform to detect emerging stories and assess the reliability of newsworthy video files and content spread via social media. • EU funded projects
  69. 69. Conclusions • Social media data useful in many applications – From confirming existing and known correlations to prediction and decision-making • Many challenges exist – Data availability (infrastructure, policies) – Personal data value (legal, ethical) – Real-time and scalable approaches – Fusion of various modalities (content, social, temporal, location) • Verification requires contribution from various disciplines – Content Analytics – Machine Learning – Network Analysis – Psychology – Social Sciences (patterns of presentation, sharing) – Visualization
  70. 70. Thank you for your attention! ikom@iti.gr http://mklab.iti.gr
