News-oriented multimedia search over
multiple social networks
Katerina Iliakopoulou, Symeon Papadopoulos and Yiannis Kompatsiaris
Centre for Research and Technology Hellas (CERTH) – Information Technologies Institute (ITI)
CBMI 2015, June 11, 2015, Prague, Czech Republic
Presented by Katerina Andreadou
The rise of Online Social Networks (OSNs)
#2
• Increasingly popular → Massive amounts of data
– Both text and multimedia
• Content peaks when
– A planned event takes place (e.g., Olympic games)
– An unexpected news story breaks (e.g., earthquake)
Journalistic practices now involve the use of user-
generated content from OSNs for reporting on news
stories and events
The Problem
#3
• News stories are covered in multiple OSNs
– Twitter, Facebook, Google+, Instagram, Tumblr, Flickr
• No effective means of searching over multiple OSNs
– Necessary to build appropriate queries
– Find relevant hashtags and query keywords
• Effective querying is not straightforward
– Long complicated queries retrieve no results
– Vague queries bring back irrelevant content
The Problem is also OSN-specific
#4
• Flickr search is more flexible
– It returns results that contain all requested keywords or a
portion of them with the appropriate ranking
• Instagram is more restrictive
– It can only handle hashtags
– It returns very few or no results to multi-keyword queries
• The order of keywords is also crucial for some OSNs
Query formulation has to be OSN-specific
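As a rough illustration of OSN-specific formulation, the sketch below formats the same keyword list differently per network. The rule table `OSN_RULES` and the function `format_query` are hypothetical names introduced here; the rules only mirror the behaviours listed above (Flickr accepts multi-keyword queries, Instagram handles hashtags only) and are not the actual API contracts of these services.

```python
# Hypothetical per-OSN rules, derived only from the behaviours listed on the slide.
OSN_RULES = {
    "flickr": {"hashtag_only": False},     # flexible multi-keyword search
    "instagram": {"hashtag_only": True},   # can only handle hashtags
}


def format_query(keywords, osn):
    """Format an ordered keyword list for one OSN (illustrative rules only)."""
    rules = OSN_RULES.get(osn, {"hashtag_only": False})
    terms = list(keywords)  # keep the given order, since order matters for some OSNs
    if rules["hashtag_only"]:
        # Collapse all keywords into a single lower-case hashtag.
        return "#" + "".join(t.lower() for t in terms)
    return " ".join(terms)
```

For example, `format_query(["Academy", "Awards"], "instagram")` yields a single hashtag, while the same call for `"flickr"` keeps the keywords as a space-separated query.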
Content requirements
#5
• High relevance to the topic of interest
• High quality of multimedia
• Diversity of retrieved content
• Usefulness with respect to reporting and publication
Related work
• Optimization of query formulation methods utilizing
terms, proximities and phrases with respect to their
frequency and text position
– Markov random field models (Metzler et al., 2005)
– Positional language models (Lv et al., 2009)
– Query operations (Mishne et al., 2005)
• Improve query formulation by modelling query
concepts
– Learning concept importance (Bendersky et al., 2010)
– Latent concept expansion using Markov random fields
(Metzler et al., 2007)
#6
Goals and Contributions
• A novel graph-based query formulation method
– Tailored to the special characteristics of each OSN
– Captures the primary entities and their associations
– Builds numerous queries by greedy graph traversal
• A relevance classification method
– 12 features based on content (text, visual) and context
(popularity, publication time)
• Evaluation of the framework in real-world events
and stories
#7
Overview of the Framework
#8
Step I: Collection of highly relevant content
• Query six OSNs with a high precision query q0 to
build an initial collection M0
– news story headline
– official name of the event
• Lower the possibility of noisy content by
– discarding all material retrieved before the story broke
• Only some OSNs were found to contribute to the
collection: Twitter, Flickr, Google+
#9
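The noise-reduction rule in Step I can be sketched as a simple time filter. This is a minimal sketch, assuming each collected item is a dict with a `published` timestamp (a hypothetical field name) and that the rule refers to the item's publication time.

```python
from datetime import datetime


def drop_items_before(items, story_start):
    """Keep only items published at or after the moment the story broke."""
    return [item for item in items if item["published"] >= story_start]


# Usage with made-up items; the field names are assumptions.
story_start = datetime(2014, 3, 2, 0, 0)
items = [
    {"id": "a", "published": datetime(2014, 3, 1, 12, 0)},
    {"id": "b", "published": datetime(2014, 3, 2, 20, 30)},
]
m0 = drop_items_before(items, story_start)
```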
Step II: Keyword and hashtag extraction
• Extract the Named Entities from the M0 metadata
• Discard all stop-words and filter out HTML tags, web
links and social network account names
• Perform stemming for keywords that are not listed
as Named Entities to group keywords with similar
meaning
Create a list of keywords and a list of hashtags, each
associated with a frequency count
#10
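A minimal sketch of the cleaning and counting part of Step II follows. It uses only regular expressions and a tiny hand-written stop-word list, both assumptions made for illustration; Named Entity extraction and stemming are deliberately left out because the slides do not say which tools were used.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption, not the list used by the authors).
STOP_WORDS = {"the", "a", "an", "of", "in", "at", "and", "to", "for", "on", "is"}


def extract_terms(texts):
    """Return (keyword counts, hashtag counts) from item text metadata."""
    keywords, hashtags = Counter(), Counter()
    for text in texts:
        text = re.sub(r"<[^>]+>", " ", text)       # filter out HTML tags
        text = re.sub(r"https?://\S+", " ", text)  # filter out web links
        text = re.sub(r"@\w+", " ", text)          # filter out account names
        hashtags.update(h.lower() for h in re.findall(r"#(\w+)", text))
        text = re.sub(r"#\w+", " ", text)          # hashtags are counted separately
        words = re.findall(r"[a-z0-9]+", text.lower())
        keywords.update(w for w in words if w not in STOP_WORDS)
    return keywords, hashtags


texts = [
    "<b>Oscars</b> at the Dolby Theatre #Oscars2014 http://t.co/x @user",
    "Oscars red carpet #oscars2014",
]
keywords, hashtags = extract_terms(texts)
```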
Step III: Graph construction
• Vertices → set of selected keywords
• Edges → their pairwise adjacency relations
– adjacency is computed with respect to the text metadata
• Edge weight → frequency of appearance of the phrase
composed of the edge keywords
• Only significant keywords are considered →
keywords with greater frequency than the average
– elimination of noisy keywords
– cost-effectiveness
#11
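The graph construction can be sketched as below, assuming each item's text has already been tokenized into an ordered keyword list. The function name `build_keyword_graph` and the choice of directed edges (first keyword precedes second) are assumptions; the slides only state that edges capture adjacency and carry phrase frequencies.

```python
from collections import Counter


def build_keyword_graph(token_lists):
    """Build {keyword: {next_keyword: phrase_frequency}} over significant keywords."""
    freq = Counter(t for tokens in token_lists for t in tokens)
    if not freq:
        return {}
    average = sum(freq.values()) / len(freq)
    # Only keywords with greater frequency than the average are kept.
    significant = {t for t, count in freq.items() if count > average}
    graph = {t: {} for t in significant}
    for tokens in token_lists:
        for first, second in zip(tokens, tokens[1:]):
            if first in significant and second in significant and first != second:
                graph[first][second] = graph[first].get(second, 0) + 1
    return graph


token_lists = [
    ["academy", "awards", "red", "carpet"],
    ["academy", "awards", "winner"],
    ["academy", "awards"],
]
graph = build_keyword_graph(token_lists)
```

In this toy input the average keyword frequency is 1.8, so only "academy" and "awards" survive the pruning.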
Step IV: Query building
• Query → path from a starting
node to an end node given a
maximum number of L hops
• Starting node → high out-degree
or connected to heavily weighted
edges
• Total score for a node
• Penalize queries with high text
similarity → Jaccard coefficient
#12
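A hedged sketch of greedy query building over such a graph is given below. The node scoring formula is not recoverable from the slide, so start nodes are ranked here by total outgoing edge weight, and similar queries are dropped with a hard Jaccard threshold rather than penalized; both are simplifying assumptions, as are the parameter names and the rule that skips one-word paths.

```python
def jaccard(a, b):
    """Jaccard coefficient between two keyword collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def greedy_path(graph, start, max_hops):
    """Follow the heaviest outgoing edge at each step, up to max_hops hops."""
    path, node = [start], start
    for _ in range(max_hops):
        candidates = {n: w for n, w in graph.get(node, {}).items() if n not in path}
        if not candidates:
            break
        node = max(candidates, key=candidates.get)
        path.append(node)
    return path


def build_queries(graph, max_hops=2, max_queries=20, max_similarity=0.5):
    # Assumption: rank start nodes by total outgoing edge weight.
    starts = sorted(graph, key=lambda n: sum(graph[n].values()), reverse=True)
    paths = []
    for start in starts:
        path = greedy_path(graph, start, max_hops)
        if len(path) < 2:
            continue  # assumption: skip one-word paths to avoid vague queries
        if all(jaccard(path, other) <= max_similarity for other in paths):
            paths.append(path)
        if len(paths) == max_queries:
            break
    return [" ".join(p) for p in paths]


graph = {
    "academy": {"awards": 3, "museum": 1},
    "awards": {"ceremony": 2},
    "ceremony": {},
    "museum": {},
}
queries = build_queries(graph)
```

Here the path starting at "awards" is discarded because it overlaps too strongly with the path already built from "academy".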
Example: 86th Academy Awards
#13
Step V: Relevance classification
Textual relevance is computed with respect to the high precision query q0
• title & description
• tags
#14
Popularity
Textual relevance
Visual similarity
Temporal proximity to the story
Image dimensions
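One possible way to compute the two textual relevance features is sketched below as the fraction of q0 terms found in the title and description, and in the tags. The slides do not give the actual formula, so this term-overlap measure and the names `tokenize` and `textual_relevance` are assumptions for illustration.

```python
import re


def tokenize(text):
    """Lower-case alphanumeric tokens as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def textual_relevance(q0, title, description, tags):
    """Share of q0 terms covered by title+description and by tags (illustrative)."""
    query_terms = tokenize(q0)
    if not query_terms:
        return {"title_description": 0.0, "tags": 0.0}
    text_terms = tokenize(title + " " + description)
    tag_terms = tokenize(" ".join(tags))
    return {
        "title_description": len(query_terms & text_terms) / len(query_terms),
        "tags": len(query_terms & tag_terms) / len(query_terms),
    }


features = textual_relevance(
    "86th Academy Awards",
    "Academy Awards red carpet",
    "Arrivals at the 86th ceremony",
    ["oscars", "awards"],
)
```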
Evaluation
#15
• Choose 20 events and news stories which took place
up to five months before data collection
– the older the event, the more content disappears from the
OSNs
• Choose events with considerable size and variety
• Set the maximum number of keyword-based queries to
Mmax=20 and the maximum number of hashtag-based
queries to Mmax=10
Data statistics
#16
• More than 88K images for all
20 events
• ~4.4K images per event/news
story on average
• Events are associated on
average with more images
(5.5K) than news stories (3.3K)
[Figure: number of images collected during the first and second querying steps]
Media volume per OSN
#17
• Flickr contributes the most (66.9%) with Twitter
following (19%)
• Instagram and Google+ contribute less, but still considerable, content
• Tumblr and Facebook contribute the least content
– Tumblr has significantly lower usage
– Facebook has very poor search API behaviour
• Increase between the two retrieval steps
– Facebook, Flickr, Tumblr: 5x
– Google+, Instagram: even higher (8.1x and 6.8x)
– Twitter: 3x
Quality of formulated queries
#18
• Evaluate the relevance and quality of the retrieved
content in the second step (Mext)
– A large majority (90%) of the images retrieved in the first
step (M0) were relevant
– Four human annotators
• Relevance is high (>50%) for 3 events
• Relevance is decent (>40%) for 3 news stories
• Half of the events and news stories are characterized
by low-to-medium relevance (10% - 40%)
• Relevance is very low (<10%) for two events and two
news stories
Why is irrelevant content collected?
#19
• Vague keyword-based queries or hashtags
– Example: British Academy Film Awards → most popular
hashtag → british
– Example: Sundance Film Festival → vague query → film
festival
• False keyword-based queries
– They contain keywords irrelevant to the subject
– They are leftovers from the graph pruning that should
have been eliminated
Relevance classification
#20
DT → Decision Tree, RF → Random Forest
SVM → Support Vector Machine, MP → Multilayer Perceptron
Relevance classification
#21
• RF outperforms the
rest in all cases
• DT is also very good
• SVM has the worst
performance
– Input features are
not normalized
– A few of them are
quantized to a small
set of possible
values
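Since the slide attributes the weak SVM results partly to un-normalized input features, the sketch below shows min-max scaling as one common remedy. It is an illustration only: the slides do not report whether normalization was later applied or how much it would help, and the feature values used here are made up.

```python
def min_max_scale(rows):
    """Rescale each feature column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Made-up feature rows: [popularity count, image width in pixels].
rows = [[10, 640.0], [1000, 1280.0], [505, 960.0]]
scaled = min_max_scale(rows)
```

After scaling, both features span the same range, so a distance-based or margin-based classifier is no longer dominated by the feature with the largest raw values.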
Conclusion - Contributions
• Searching for multimedia content around events and
news stories over multiple OSNs is challenging!
– Collect high quality relevant content in spite of the
different behaviors and requirements of the OSNs
• We proposed a multi-step process including
– a graph-based query building method
– a relevance classification step
• We evaluated the framework on a set of 20 large-
scale events and news stories of global interest
#22
Future Work
• Improve the performance of the query building
method when the number of collected items in the
first step is small
• Extract statistically grounded relevance features
– Take into account distribution differences in different OSNs
• Apply the method while the event evolves
• Add support for the collection of video content
#23
Thank you!
• Slides:
http://www.slideshare.net/sympapadopoulos/newsoriented-multimedia-search-over-multiple-social-networks
• Get in touch:
@matzika00 / katerina.iliakopoulou@gmail.com
@sympapadopoulos / papadop@iti.gr
#24

 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 

News-oriented multimedia search over multiple social networks

  • 1. News-oriented multimedia search over multiple social networks. Katerina Iliakopoulou, Symeon Papadopoulos and Yiannis Kompatsiaris, Centre for Research and Technology Hellas (CERTH) – Information Technologies Institute (ITI). CBMI 2015, June 11, 2015, Prague, Czech Republic. Presented by Katerina Andreadou
  • 2. The rise of Online Social Networks (OSNs) • Increasingly popular → Massive amounts of data – Both text and multimedia • Content peaks when – A planned event takes place (e.g., Olympic games) – An unexpected news story breaks (e.g., earthquake) Journalistic practices now involve the use of user-generated content from OSNs for reporting on news stories and events
  • 3. The Problem • News stories are covered in multiple OSNs – Twitter, Facebook, Google+, Instagram, Tumblr, Flickr • No effective means of searching over multiple OSNs – Necessary to build appropriate queries – Find relevant hashtags and query keywords • Effective querying is not straightforward – Long complicated queries retrieve no results – Vague queries bring back irrelevant content
  • 4. The Problem is also OSN-specific • Flickr search is more flexible – It returns results that contain all requested keywords or a portion of them with the appropriate ranking • Instagram is more restrictive – It can only handle hashtags – It returns very few or no results to multi-keyword queries • The order of keywords is also crucial for some OSNs Query formulation has to be OSN-specific
  • 5. Content requirements • High relevance to the topic of interest • High quality of multimedia • Diversity of retrieved content • Usefulness with respect to reporting and publication
  • 6. Related work • Optimization of query formulation methods utilizing terms, proximities and phrases with respect to their frequency and text position – Markov random field models (Metzler et al., 2005) – Positional language models (Lv et al., 2009) – Query operations (Mishne et al., 2005) • Improve query formulation by modelling query concepts – Learning concept importance (Bendersky et al., 2010) – Latent concept expansion using Markov random fields (Metzler et al., 2007)
  • 7. Goals and Contributions • A novel graph-based query formulation method – Tailored to the special characteristics of each OSN – Captures the primary entities and their associations – Builds numerous queries by greedy graph traversal • A relevance classification method – 12 features based on content (text, visual) and context (popularity, publication time) • Evaluation of the framework on real-world events and stories
  • 8. Overview of the Framework
  • 9. Step I: Collection of highly relevant content • Query six OSNs with a high-precision query q0 to build an initial collection M0 – news story headline – official name of the event • Lower the possibility of noisy content by – discarding all material published before the story broke • Only some OSNs were found to contribute to the collection: Twitter, Flickr, Google+
  • 10. Step II: Keyword and hashtag extraction • Extract the Named Entities from the M0 metadata • Discard all stop-words and filter out HTML tags, web links and social network account names • Perform stemming for keywords that are not listed as Named Entities to group keywords with similar meaning Create a list of keywords and a list of hashtags, each associated with a frequency count
  • 11. Step III: Graph construction • Vertices → set of selected keywords • Edges → their pairwise adjacency relations – adjacency is computed with respect to the text metadata • Each edge is weighted by the frequency of appearance of the phrase composed of the edge keywords • Only significant keywords are considered → keywords with greater frequency than the average – elimination of noisy keywords – cost-effectiveness
  • 12. Step IV: Query building • Query → path from a starting node to an end node given a maximum number of L hops • Starting node → high out-degree or connected to heavily weighted edges • Total score for a node • Penalize queries with high text similarity → Jaccard coefficient
  • 13. Example: 86th Academy Awards
  • 14. Step V: Relevance classification • Feature groups shown on the slide: popularity, textual relevance, visual similarity, temporal proximity to the story, image dimensions • Textual relevance is computed with respect to the high-precision query q0 over – title & description – tags
  • 15. Evaluation • Choose 20 events and news stories which took place up to five months before data collection – the older the event, the more content disappears from the OSNs • Choose events with considerable size and variety • Set the maximum number of keyword-based queries to Mmax=20 and the maximum number of hashtag-based queries to Mmax=10
  • 16. Data statistics • More than 88K images for all 20 events • ~4.4K images per event/news story on average • Events are associated on average with more images (5.5K) than news stories (3.3K) [Charts: number of images collected during the first querying step; number of images collected during the second querying step]
  • 17. Media volume per OSN • Flickr contributes the most (66.9%), with Twitter following (19%) • Instagram and Google+ contribute less, but still a considerable amount • Tumblr and Facebook contribute the least content – Tumblr has significantly lower usage – Facebook has very poor search API behaviour • Increase between the two retrieval steps – Facebook, Flickr, Tumblr: 5x – Google+, Instagram: even higher (8.1x and 6.8x) – Twitter: 3x
  • 18. Quality of formulated queries • Evaluate the relevance and quality of the retrieved content in the second step (Mext) – A large majority (90%) of the images retrieved in the first step (M0) were relevant – Four human annotators • Relevance is high (>50%) for 3 events • Relevance is decent (>40%) for 3 news stories • Half of the events and news stories are characterized by low-to-medium relevance (10% - 40%) • Relevance is very low (<10%) for two events and two news stories
  • 19. Why is irrelevant content collected? • Vague keyword-based queries or hashtags – Example: British Academy Film Awards → most popular hashtag → british – Example: Sundance Film Festival → vague query → film festival • False keyword-based queries – They contain keywords irrelevant to the subject – They are leftovers from the graph pruning that should have been eliminated
  • 20. Relevance classification • Classifier abbreviations: DT → Decision Tree, RF → Random Forest, SVM → Support Vector Machine, MP → Multilayer Perceptron
  • 21. Relevance classification • RF outperforms the rest in all cases • DT is also very good • SVM has the worst performance – Input features are not normalized – A few of them are quantized to a small set of possible values
  • 22. Conclusion - Contributions • Searching for multimedia content around events and news stories over multiple OSNs is challenging! – Collect high quality relevant content in spite of the different behaviors and requirements of the OSNs • We proposed a multi-step process including – a graph-based query building method – a relevance classification step • We evaluated the framework on a set of 20 large-scale events and news stories of global interest
  • 23. Future Work • Improve the performance of the query building method when the number of collected items in the first step is small • Extract statistically grounded relevance features – Take into account distribution differences in different OSNs • Apply the method while the event evolves • Add support for the collection of video content
  • 24. Thank you! • Slides: http://www.slideshare.net/sympapadopoulos/newsoriented-multimedia-search-over-multiple-social-networks • Get in touch: @matzika00 / katerina.iliakopoulou@gmail.com @sympapadopoulos / papadop@iti.gr
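
Slide 10 (Step II) describes cleaning the metadata of the initial collection and counting keywords and hashtags. A minimal sketch of that cleaning and counting follows, assuming plain-text metadata strings. The function name `extract_terms` and the tiny stop-word list are illustrative assumptions, and the Named Entity extraction and stemming mentioned on the slide are deliberately left out.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real list would be far larger).
STOP_WORDS = {"a", "an", "and", "at", "for", "in", "is", "of", "on", "the", "to", "with"}


def extract_terms(texts):
    """Return (keyword_counts, hashtag_counts) for a list of metadata strings."""
    keywords, hashtags = Counter(), Counter()
    for text in texts:
        text = re.sub(r"<[^>]+>", " ", text)       # filter out HTML tags
        text = re.sub(r"https?://\S+", " ", text)  # filter out web links
        text = re.sub(r"@\w+", " ", text)          # filter out account names
        # Count hashtags separately, then remove them from the keyword text.
        hashtags.update(tag.lower() for tag in re.findall(r"#(\w+)", text))
        text = re.sub(r"#\w+", " ", text)
        words = re.findall(r"[a-z0-9']+", text.lower())
        keywords.update(w for w in words if w not in STOP_WORDS)
    return keywords, hashtags
```

Both return values are `Counter` objects, so each keyword and hashtag carries the frequency count that the slide asks for.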
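
Slide 11 (Step III) builds a keyword graph whose vertices are the keywords with above-average frequency and whose edges count how often two keywords appear next to each other. The sketch below is one possible reading of that description, not the authors' implementation: it assumes the text has already been tokenized, treats adjacency as directed (first word to next word), and uses the hypothetical name `build_graph`.

```python
from collections import Counter, defaultdict


def build_graph(token_lists):
    """Return (significant_keywords, graph) where graph[a][b] is the number
    of times keyword b directly follows keyword a in the tokenized texts."""
    freq = Counter(t for tokens in token_lists for t in tokens)
    if not freq:
        return set(), {}
    mean = sum(freq.values()) / len(freq)
    # Keep only keywords that are more frequent than the average keyword.
    significant = {t for t, c in freq.items() if c > mean}
    graph = defaultdict(Counter)
    for tokens in token_lists:
        for a, b in zip(tokens, tokens[1:]):
            if a in significant and b in significant and a != b:
                graph[a][b] += 1
    return significant, graph
```

Pruning by the mean frequency keeps the graph small, which matches the slide's stated motives of eliminating noisy keywords and keeping the method cost-effective.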
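
Slide 12 (Step IV) forms each query as a path of at most L hops through the keyword graph and uses the Jaccard coefficient to discourage near-duplicate queries. The node scoring formula is not preserved in this transcript, so the sketch below substitutes simple assumptions: starting nodes are ranked by total outgoing edge weight, each hop greedily follows the heaviest edge, and a query is dropped (rather than penalized) when its Jaccard similarity to an accepted query exceeds a threshold. The names `build_queries`, `max_similarity` and the default values are hypothetical.

```python
def jaccard(a, b):
    """Jaccard coefficient between two keyword collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def build_queries(graph, max_hops=2, max_queries=20, max_similarity=0.5):
    """Greedy path-based query building over graph[a][b] = edge weight."""
    # Assumption: rank starting nodes by total outgoing edge weight.
    starts = sorted(graph, key=lambda n: (-sum(graph[n].values()), n))
    queries = []
    for start in starts:
        path = [start]
        for _ in range(max_hops):
            candidates = {n: w for n, w in graph.get(path[-1], {}).items()
                          if n not in path}
            if not candidates:
                break
            # Follow the heaviest edge; ties are broken alphabetically.
            path.append(min(candidates, key=lambda n: (-candidates[n], n)))
        # Assumption: hard filter instead of a score penalty.
        if len(path) > 1 and all(jaccard(path, q) <= max_similarity
                                 for q in queries):
            queries.append(path)
        if len(queries) == max_queries:
            break
    return [" ".join(q) for q in queries]
```

A hard similarity cut-off is the simplest stand-in for the penalty on the slide; a score-based penalty would instead lower the rank of similar queries without discarding them outright.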
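
Slide 21 attributes the weak SVM results partly to input features that were not normalized. As an illustration of what such a preprocessing step could look like, the sketch below applies min-max scaling to each feature column. This is a generic technique chosen for the example, not something the talk reports having used, and `min_max_scale` is a hypothetical helper.

```python
def min_max_scale(rows):
    """Scale every feature column of a list of feature vectors to [0, 1].

    Constant columns are mapped to 0.0 to avoid division by zero.
    """
    if not rows:
        return []
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]
```

For example, a popularity count in the hundreds and an image width in pixels end up on the same scale, so neither dominates a distance-based classifier purely because of its units.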

Editor's Notes

  1. http://irevolution.net/2014/04/03/using-aidr-to-collect-and-analyze-tweets-from-chile-earthquake/