Automatically Selecting Striking Images for Social Cards
Shawn M. Jones *† · Martin Klein † · Michele C. Weigle * · Michael L. Nelson *
* Old Dominion University, Web Science and Digital Libraries Research Group
† Los Alamos National Laboratory, Research Library Prototyping Team
@shawnmjones
This work is part of the Dark and Stormy Archives (DSA) project: an automated solution that turns a web archive collection of 1000s of documents into a story that conveys understanding at a glance.
Social cards provide a visual summary of the content behind a URL.
Long URL:
https://www.google.com/maps/dir/Old+Dominion+University,+Norfolk,+VA/Los+Alamos+National+Laboratory,+New+Mexico/@35.3644614,-109.356967,4z/data=!3m1!4b1!4m13!4m12!1m5!1m1!1s0x89ba99ad24ba3945:0xcd2bdc432c4e4bac!2m2!1d-76.3067676!2d36.8855515!1m5!1m1!1s0x87181246af22e765:0x7f5a90170c5df1b4!2m2!1d-106.287162!2d35.8440582
The same URL represented by a social card:
[Figure: the social card for this URL]
Social cards consist of different units
S. M. Jones, M. C. Weigle, M. Klein, and M. L. Nelson. 2021. Automatically Selecting Striking Images for Social Cards. In ACM WebSci ’21. https://arxiv.org/pdf/2103.04899.pdf [to be published June 2021]
Social cards allow resources to compete for clicks.
In addition to summarizing the resource, social cards drive clicks to the resource by answering the question “What does the underlying page contain?”
[Figure: a Nature article shared on Twitter, and a disinformation source shared on Twitter]
Which of these is more appealing?
This is also a case of “The Truth Is Paywalled but the Lies Are Free.”
Cards are generated from the HTML metadata that authors provide:
• title: og:title or twitter:title or <title>
• description: og:description or twitter:description or description
• image: og:image or twitter:image
Without twitter:card and either og:title or twitter:title, Twitter typically gives up and does not generate a card.
Facebook parses the <title> and produces a card with just a title.
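As a rough illustration of these alternatives, the Python sketch below uses the standard library's `html.parser` to collect `<meta>` and `<title>` values, then takes the first available element for each card field. It assumes the elements listed above are tried in the order shown, and the names `FALLBACKS`, `MetadataParser`, and `card_fields` are hypothetical. It does not model platform-specific rules such as Twitter's need for `twitter:card`.

```python
from html.parser import HTMLParser

# Assumed preference order per card field; "<title>" means the <title> element text.
FALLBACKS = {
    "title": ["og:title", "twitter:title", "<title>"],
    "description": ["og:description", "twitter:description", "description"],
    "image": ["og:image", "twitter:image"],
}


class MetadataParser(HTMLParser):
    """Collects <meta> name/property -> content pairs and the <title> text."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self._in_title = False
        self._title_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            # OGP uses property=; Twitter and plain description use name=
            key = attrs.get("property") or attrs.get("name")
            if key and attrs.get("content") is not None:
                # keep the first occurrence of each key
                self.meta.setdefault(key.lower(), attrs["content"].strip())
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)

    def title_text(self):
        text = "".join(self._title_parts).strip()
        return text or None


def card_fields(html):
    """Return the first available value for each card field, or None."""
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    fields = {}
    for field, keys in FALLBACKS.items():
        fields[field] = None
        for key in keys:
            value = parser.title_text() if key == "<title>" else parser.meta.get(key)
            if value:
                fields[field] = value
                break
    return fields
```

A page with only a `<title>` element yields a title and nothing else, which mirrors the title-only card described above.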
RQ1: What are the distributions of HTML metadata elements (general and social card elements) in news articles (over time) and scholarly publications published on the web?
If the metadata is prevalent and of high quality, then we can rely on it. If not, then to create good cards, we need to develop methods to fill in the missing metadata.
We analyzed 198,523 news articles captured by the Internet Archive from 1998 to 2016 and found different rates of metadata adoption.
[Figure: metadata adoption in news articles over time, annotated with the year each metadata element was released, estimated to have been released, or proposed; OGP = Open Graph Protocol (Facebook cards)]
150 billion documents in the Internet Archive were captured before 2010.
We evaluated the HTML pages of 110,900 scholarly articles from the PubMed Central dataset: 100 articles each from 1,109 journals.
These are not archived pages, but how these articles were presented in 2020.
77.86% looks good, until we look at the images presented in these cards...
74% of scholarly publications use publisher and journal logos as striking images; 52% reuse the same image for all articles
[Figure: striking images reused across scholarly articles, each labeled with the number of articles that use it (from 101 to 2,034 articles)]
For news articles, most striking images are of article content, and those that repeat across articles tend to be author photos
[Figure: striking images that repeat across news articles, each labeled with the number of articles that use it (from 3 to 15,823 articles), including a publisher logo and author photos]
RQ2: What approaches and image features are best suited to automatically select striking images from news articles and scholarly publications, and do the approaches differ between the two resource types?
[Figure: a good striking image for a news article, and a good striking image for a scholarly publication]
If no metadata exists, we can select a striking image from the images available in the document.
Which of the images outlined in red is the striking one chosen by the author?
How would a machine know which one to choose if there were no striking image specified in the metadata?
Our generic selection approach has 3 steps
1. Score each image in the document by some approach (e.g., ML probability, feature value)
2. Sort the list of images by descending score (e.g., the image with the highest ML probability, or the one with the most colors, is first)
3. Choose the image at the beginning of the list
[Figure: the same candidate images sorted by color count (154,131; 48,020; 44,737; 30,940; 3,816 colors) and sorted by classifier probability (0.3623; 0.1948; 0.1259; 0.1116; 0.11)]
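The three steps can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: `rank_images` and `select_striking_image` are hypothetical names, the image URLs are made up, and the score function is whatever the approach supplies (a classifier probability, a color count, and so on).

```python
from typing import Callable, List, Optional, Sequence, TypeVar

T = TypeVar("T")  # any image representation: a URL, a dict of features, ...


def rank_images(images: Sequence[T], score: Callable[[T], float]) -> List[T]:
    """Steps 1 and 2: score each image and sort by descending score."""
    # sorted() stays stable with reverse=True, so tied images keep document order
    return sorted(images, key=score, reverse=True)


def select_striking_image(
    images: Sequence[T], score: Callable[[T], float]
) -> Optional[T]:
    """Step 3: choose the image at the beginning of the ranked list."""
    ranked = rank_images(images, score)
    return ranked[0] if ranked else None


# Example: the "most colors" approach, scoring images by color count
candidates = [
    {"url": "a.png", "colors": 3816},
    {"url": "b.png", "colors": 154131},
    {"url": "c.png", "colors": 48020},
]
print(select_striking_image(candidates, lambda image: image["colors"])["url"])  # prints b.png
```

Keeping the ranking separate from the final choice also makes the full ranked list available for rank-based evaluation.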
We sampled from two datasets to determine which approaches worked best for selecting striking images
NEWSROOM dataset sample (news articles)
• News articles tend to use images that represent their stories
• 37,522 news articles
PLOS ONE dataset sample (scholarly publications)
• Submission guidelines encourage authors to choose their own striking images after acceptance
• 198,523 scholarly articles
In both, the metadata gives us the ground truth: the image that an author chose for their article.
A social card creation service needs to select a striking image in close to real time, so we considered base features that image libraries can calculate quickly.
Base features for an example image:
• byte size: 71,934 bytes
• width: 320 pixels
• height: 242 pixels
• size in pixels: 77,440 pixels (320 × 242)
• aspect ratio: 1.3223 (320 / 242)
• number of colors: 13,891
• negative space: 53 histogram columns equal to 0
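The sketch below shows one way the base features could be computed from already decoded pixels, using only the standard library. It is not the authors' code. Decoding is assumed to happen elsewhere, so the function takes a row-major list of RGB tuples and the encoded byte size. The negative-space rule (count the empty bins of a 256-bin luminance histogram built with ITU-R BT.601 weights) is an assumed reading of "histogram columns equal to 0", since the slide does not say which histogram is used.

```python
from dataclasses import dataclass
from typing import List, Tuple

RGB = Tuple[int, int, int]


@dataclass
class BaseFeatures:
    byte_size: int
    width: int
    height: int
    size_in_pixels: int
    aspect_ratio: float
    number_of_colors: int
    negative_space: int  # number of empty histogram bins (assumed definition)


def base_features(pixels: List[RGB], width: int, height: int, byte_size: int) -> BaseFeatures:
    """Compute base features from decoded pixels (row-major RGB tuples)."""
    if len(pixels) != width * height:
        raise ValueError("pixel count does not match width * height")
    # Assumption: 256-bin luminance histogram with BT.601 luma weights (0..255)
    histogram = [0] * 256
    for r, g, b in pixels:
        histogram[(r * 299 + g * 587 + b * 114) // 1000] += 1
    return BaseFeatures(
        byte_size=byte_size,  # size of the encoded file, measured by the caller
        width=width,
        height=height,
        size_in_pixels=width * height,
        aspect_ratio=width / height,
        number_of_colors=len(set(pixels)),
        negative_space=sum(1 for count in histogram if count == 0),
    )
```

For a 320 × 242 image like the example, this gives 77,440 pixels and an aspect ratio of about 1.3223. Every feature is a single pass over the pixels, which fits the near-real-time requirement.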
The consistent structure of scholarly publications allows us to quickly calculate additional features for each image
• Figure position features: figure position; figure position (scaled)
• Section features: section index; scaled section index; character position in section; word position in section
• Caption features: caption TF rank; caption TF rank (scaled); Jaccard distance of title and caption
Example values for one figure: character position 7,508; word position 1,196; section index 2; scaled section index 0.333; figure position 6; figure position (scaled) 0.857; caption TF rank 3; caption TF rank (scaled) 0.429; Jaccard distance 0.85
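Two of these features are simple to sketch. The example values are consistent with dividing a position or rank by a total count (2/6 ≈ 0.333, 6/7 ≈ 0.857, 3/7 ≈ 0.429), so the sketch assumes that rule; the totals of 6 sections and 7 figures are inferred, not stated. The Jaccard distance is the standard 1 − |A ∩ B| / |A ∪ B| over word sets, with a simple lowercase tokenizer chosen here for illustration. The function names are hypothetical.

```python
import re
from typing import Set


def scaled(value: int, total: int) -> float:
    """Scale a position or rank by the total count (assumed scaling rule)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return value / total


def word_set(text: str) -> Set[str]:
    # Simple lowercase alphanumeric tokenizer, chosen for illustration
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard_distance(title: str, caption: str) -> float:
    """1 - |A ∩ B| / |A ∪ B| over the word sets of the title and caption."""
    a, b = word_set(title), word_set(caption)
    union = a | b
    if not union:
        return 0.0  # two empty texts: treat as identical
    return 1.0 - len(a & b) / len(union)
```

A distance near 1, like the 0.85 above, means the caption shares few words with the title.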
We evaluate our approaches with P@1 and MRR
• Precision@1 (P@1): Does the approach choose the right image?
− P@1 = 1.0 if yes, 0 if no
• Mean Reciprocal Rank (MRR): If it failed, how far off was it?
− the reciprocal rank (RR) is 1 divided by the rank that the approach gives the ground-truth striking image; MRR is the mean RR over all documents
− e.g., if the approach ranks the ground-truth striking image #5, then RR = 1/5 = 0.2
− an MRR of 1.0, the best possible, means the ground truth was always ranked first
• But how do we know what the correct image is?
− Did the image have the same URL as the one in the metadata?
− If not, was it perceptually the same?
[Figure: worked example with images sorted by color count (154,131; 48,020; 44,737; 30,940; 3,816 colors). The approach (most colors) chooses the first image. The second image is perceptually the same as the image chosen by the author (ground truth), as determined by pHash, so P@1 = 0 and RR = 1/2 = 0.5.]
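The evaluation can be sketched as follows. A candidate counts as the ground truth if its URL matches the metadata image or, failing that, if its perceptual hash is close enough to the ground truth's. The hashes are assumed to be precomputed 64-bit pHash values, and `max_distance` is a hypothetical Hamming-distance threshold, since the slides do not state one; all names here are illustrative.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence, Tuple


@dataclass
class Candidate:
    url: str
    phash: int  # precomputed 64-bit perceptual hash (assumption)


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def is_ground_truth(candidate: Candidate, truth: Candidate, max_distance: int = 0) -> bool:
    # Same URL as the metadata image? If not, is it perceptually the same?
    if candidate.url == truth.url:
        return True
    return hamming(candidate.phash, truth.phash) <= max_distance


def precision_at_1(ranked: Sequence[Candidate], truth: Candidate, max_distance: int = 0) -> float:
    return 1.0 if ranked and is_ground_truth(ranked[0], truth, max_distance) else 0.0


def reciprocal_rank(ranked: Sequence[Candidate], truth: Candidate, max_distance: int = 0) -> float:
    for rank, candidate in enumerate(ranked, start=1):
        if is_ground_truth(candidate, truth, max_distance):
            return 1.0 / rank
    return 0.0  # ground truth never found among the candidates


def mean_reciprocal_rank(
    results: Iterable[Tuple[Sequence[Candidate], Candidate]], max_distance: int = 0
) -> float:
    rrs = [reciprocal_rank(ranked, truth, max_distance) for ranked, truth in results]
    return sum(rrs) / len(rrs) if rrs else 0.0
```

In the worked example, the author's image appears second under a different URL but with a matching hash, so `precision_at_1` returns 0.0 and `reciprocal_rank` returns 0.5.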
Different features work best to predict the striking image for news articles vs. scholarly publications
• 37,522 news articles from NEWSROOM: P@1 = 0.83, MRR = 0.88
• 198,523 scholarly articles from PLOS ONE: P@1 = 0.78, MRR = 0.86
Conclusions
• News articles quickly adopted social cards
• Prior to 2010, there were no social card standards; the 150 billion documents that the Internet Archive captured before 2010 need automatic summarization
• The striking images in news article metadata are drawn from the article
• Scholarly publishers favor company or journal logos as their striking images rather than images that summarize the document
• For predicting striking images based on the content of the document:
− Random Forest with base features performed best for news articles (P@1=0.83)
− Random Forest with base features and figure position performed best for scholarly publications (P@1=0.78)
• For more information, see the Dark and Stormy Archives Project: https://oduwsdl.github.io/dsa/

The Off-Topic Memento Toolkit
 
The Many Shapes of Archive-It
The Many Shapes of Archive-ItThe Many Shapes of Archive-It
The Many Shapes of Archive-It
 
Improving Collection Understanding in Web Archives
Improving Collection Understanding in Web ArchivesImproving Collection Understanding in Web Archives
Improving Collection Understanding in Web Archives
 
Reference Rot
Reference RotReference Rot
Reference Rot
 
Where Can We Post Stories Summarizing Web Archive Collections
Where Can We Post Stories Summarizing Web Archive CollectionsWhere Can We Post Stories Summarizing Web Archive Collections
Where Can We Post Stories Summarizing Web Archive Collections
 
Avoiding Spoilers On MediaWiki Fan Sites Using Memento
Avoiding Spoilers On MediaWiki Fan Sites Using MementoAvoiding Spoilers On MediaWiki Fan Sites Using Memento
Avoiding Spoilers On MediaWiki Fan Sites Using Memento
 
Continuous Integration: Finding problems soonest
Continuous Integration: Finding problems soonestContinuous Integration: Finding problems soonest
Continuous Integration: Finding problems soonest
 
A Brief Introduction to Test-Driven Development
A Brief Introduction to Test-Driven DevelopmentA Brief Introduction to Test-Driven Development
A Brief Introduction to Test-Driven Development
 
Reconstructing the past with media wiki
Reconstructing the past with media wikiReconstructing the past with media wiki
Reconstructing the past with media wiki
 

Recently uploaded

Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
Cheryl Hung
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
g2nightmarescribd
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
Product School
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Jeffrey Haguewood
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
UiPathCommunity
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
91mobiles
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 

Recently uploaded (20)

Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Generating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using SmithyGenerating a custom Ruby SDK for your web service or Rails API using Smithy
Generating a custom Ruby SDK for your web service or Rails API using Smithy
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
Slack (or Teams) Automation for Bonterra Impact Management (fka Social Soluti...
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 

Automatically Selecting Striking Images for Social Cards

  • 1. Automatically Selecting Striking Images for Social Cards Shawn M. Jones *† · Martin Klein † · Michele C. Weigle * · Michael L. Nelson * * Old Dominion University, Web Science and Digital Libraries Research Group † Los Alamos National Laboratory, Research Library Prototyping Team
  • 2. @shawnmjones This work is part of the Dark and Stormy Archives (DSA) project: a web archive collection of 1000s of documents → automated solution → a story that conveys understanding at a glance.
  • 3. Social cards provide a visual summary of the content behind a URL. Long URL: https://www.google.com/maps/dir/Old+Dominion+University,+Norfolk,+VA/Los+Alamos+National+Laboratory,+New+Mexico/@35.3644614,-109.356967,4z/data=!3m1!4b1!4m13!4m12!1m5!1m1!1s0x89ba99ad24ba3945:0xcd2bdc432c4e4bac!2m2!1d-76.3067676!2d36.8855515!1m5!1m1!1s0x87181246af22e765:0x7f5a90170c5df1b4!2m2!1d-106.287162!2d35.8440582 The same URL represented by a social card: [image of the social card]
  • 4. Social cards consist of different units. S. M. Jones, M. C. Weigle, M. Klein, and M. L. Nelson. 2021. Automatically Selecting Striking Images for Social Cards. In ACM WebSci ’21. https://arxiv.org/pdf/2103.04899.pdf [to be published June 2021]
  • 5. Social cards allow resources to compete for clicks. In addition to summarizing the resource, social cards drive clicks to it by answering the question “What does the underlying page contain?” The slide compares a Nature article shared on Twitter with a disinformation source shared on Twitter: which of these is more appealing? This is also a case of “The Truth Is Paywalled but the Lies Are Free.”
  • 6. Cards are generated from the HTML metadata that authors provide: the title comes from og:title, or twitter:title, or <title>; the description from og:description, or twitter:description, or description; and the image from og:image or twitter:image. Without twitter:card and either og:title or twitter:title, Twitter typically gives up and does not generate a card, while Facebook parses the <title> and produces a card with just a title.
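The fallback chain on this slide can be illustrated with a minimal sketch using Python's standard-library `html.parser`. The names `FALLBACKS`, `MetaCollector`, and `card_units` are hypothetical. The sketch makes three assumptions: Open Graph tags carry their key in `property=`, the other tags use `name=`, and the first occurrence of a key wins. It does not model platform-specific behavior such as Twitter's `twitter:card` requirement.

```python
from html.parser import HTMLParser
from typing import Dict, Optional

# Fallback order for each card unit, following the slide; "<title>" means the
# text of the HTML <title> element rather than a <meta> tag.
FALLBACKS = {
    "title": ["og:title", "twitter:title", "<title>"],
    "description": ["og:description", "twitter:description", "description"],
    "image": ["og:image", "twitter:image"],
}


class MetaCollector(HTMLParser):
    """Collects <meta> key/content pairs and the text of <title>."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: Dict[str, str] = {}
        self._in_title = False
        self._title_parts = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta":
            # Assumption: OGP tags use property=, the others use name=.
            key = attributes.get("property") or attributes.get("name")
            content = attributes.get("content")
            if key and content is not None:
                # Keep the first occurrence of each key.
                self.meta.setdefault(key.lower(), content.strip())
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self._title_parts.append(data)

    def title(self) -> Optional[str]:
        text = "".join(self._title_parts).strip()
        return text or None


def card_units(html: str) -> Dict[str, Optional[str]]:
    """Returns the first non-empty value for each card unit, or None."""
    parser = MetaCollector()
    parser.feed(html)
    parser.close()
    units: Dict[str, Optional[str]] = {}
    for unit, sources in FALLBACKS.items():
        units[unit] = None
        for source in sources:
            value = parser.title() if source == "<title>" else parser.meta.get(source)
            if value:
                units[unit] = value
                break
    return units


page = (
    "<html><head><title>Fallback Title</title>"
    '<meta name="twitter:title" content="Tweet Title">'
    '<meta name="description" content="Plain description"></head></html>'
)
print(card_units(page))
# {'title': 'Tweet Title', 'description': 'Plain description', 'image': None}
```

In this page, og:title is missing, so the title falls back to twitter:title. No image metadata exists at all, which is exactly the gap that automatic striking-image selection is meant to fill.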
  • 7. RQ1: What are the distributions of HTML metadata elements (general and social card elements) in news articles (over time) and scholarly publications published on the web? If the metadata is prevalent and of high quality, then we can rely on it. If not, then to create good cards, we need to develop methods to fill in the missing metadata.
  • 8. We analyzed 198,523 news articles captured by the Internet Archive from 1998 to 2016 and found different rates of metadata adoption. [Figure: adoption of each metadata element over time, annotated with the year each element was released or proposed (some estimated), ranging from 1995 to 2014; OGP = Open Graph Protocol (Facebook cards).] 150 billion documents in the Internet Archive were captured before 2010.
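Adoption rates like those on this slide are per-year percentages. The sketch below shows one way to compute them. It is not the authors' code. It assumes a hypothetical input format: one `(capture_year, set_of_element_names)` pair per archived article, with the element names already extracted from each page. `adoption_by_year` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def adoption_by_year(
    articles: Iterable[Tuple[int, Set[str]]], elements: List[str]
) -> Dict[int, Dict[str, float]]:
    """Percentage of articles per capture year that contain each element."""
    totals: Dict[int, int] = defaultdict(int)
    counts: Dict[int, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for year, present in articles:
        totals[year] += 1
        for element in elements:
            if element in present:
                counts[year][element] += 1
    # Years are reported in ascending order.
    return {
        year: {e: 100.0 * counts[year][e] / totals[year] for e in elements}
        for year in sorted(totals)
    }


sample = [
    (2009, {"description"}),
    (2012, {"og:image", "description"}),
    (2012, {"twitter:card"}),
]
print(adoption_by_year(sample, ["og:image", "description"]))
# {2009: {'og:image': 0.0, 'description': 100.0}, 2012: {'og:image': 50.0, 'description': 50.0}}
```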
  • 9. We evaluated the HTML pages of 110,900 scholarly articles from the PubMed Central dataset (100 articles each from 1,109 journals). These are not archived pages but how these articles were presented in 2020. 77.86% looks good, until we look at the images presented in these cards...
  • 10. 74% of scholarly publications use publisher and journal logos as striking images; 52% reuse the same image for all articles. [Figure: the most frequently reused striking images in scholarly publications, each labeled with the number of articles using it (from 101 to 2,034 articles), including one blank image.]
  • 11. For news articles, most striking images are of article content, and those that repeat across articles tend to be author photos. [Figure: the most frequently repeated striking images in news articles, labeled as publisher logos or author photos, each with the number of articles using it (from 3 to 15,823 articles).]
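The reuse counts on the two previous slides can be approximated by counting how many articles share each striking image URL. The sketch below is a minimal version of that idea. It assumes each article's striking image URL (for example, its og:image) has already been extracted. Matching by exact URL is also an assumption: the same logo may be served from different URLs. `image_reuse` is a hypothetical name.

```python
from collections import Counter
from typing import List, Optional, Tuple


def image_reuse(
    striking_images: List[Optional[str]], top: int = 3
) -> Tuple[List[Tuple[str, int]], float]:
    """Returns the most common image URLs with their article counts, and the
    share of articles (with an image) whose image appears in more than one."""
    urls = [url for url in striking_images if url]  # skip missing images
    counts = Counter(urls)
    reused = sum(n for n in counts.values() if n > 1)
    share = reused / len(urls) if urls else 0.0
    return counts.most_common(top), share


images = ["logo.png", "logo.png", "a.jpg", "logo.png", None, "b.jpg"]
print(image_reuse(images))  # ([('logo.png', 3), ('a.jpg', 1), ('b.jpg', 1)], 0.6)
```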
  • 12. RQ2: What approaches and image features are best suited to automatically select striking images from news articles and scholarly publications, and do the approaches differ for both resource types? [Examples: a good striking image for news; a good striking image for a scholarly publication.]
  • 13. If no metadata exists, we can select a striking image from the images available in the document. Which of the images outlined in red is the striking one chosen by the author? How would a machine know which one to choose if no striking image were specified in the metadata?
  • 14. Our generic selection approach has 3 steps: (1) score each image in the document by some approach (e.g., ML probability, feature value); (2) sort the list of images by descending score (e.g., highest ML probability first, or image with most colors first); (3) choose the image at the beginning of the list. [Figure: the same candidate images sorted by color count (154,131; 48,020; 44,737; 30,940; 3,816 colors) and by classifier probability (0.3623; 0.1948; 0.1259; 0.1116; 0.11); some images are shown resized or cropped.]
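The three steps on this slide amount to "score, sort descending, take the first". The sketch below takes the scoring function as a parameter, so a color count or a classifier probability can be plugged in. The candidate dictionaries and the names `rank_images` and `select_striking_image` are hypothetical.

```python
from typing import Any, Callable, List, Optional, Sequence


def rank_images(images: Sequence[Any], score: Callable[[Any], float]) -> List[Any]:
    """Steps 1 and 2: score each image and sort by descending score.

    sorted() is stable even with reverse=True, so ties keep document order.
    """
    return sorted(images, key=score, reverse=True)


def select_striking_image(
    images: Sequence[Any], score: Callable[[Any], float]
) -> Optional[Any]:
    """Step 3: choose the image at the beginning of the ranked list."""
    ranked = rank_images(images, score)
    return ranked[0] if ranked else None


# Hypothetical candidates; "colors" stands in for any per-image score, such as
# a color count or a classifier's probability that the image is striking.
candidates = [
    {"url": "fig1.png", "colors": 3816},
    {"url": "fig2.png", "colors": 154131},
    {"url": "fig3.png", "colors": 48020},
]
best = select_striking_image(candidates, score=lambda image: image["colors"])
print(best["url"])  # fig2.png
```

Keeping the full ranked list, rather than only the winner, is what makes rank-based metrics such as MRR possible later.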
  • 15. We sampled from two datasets to determine which approaches worked best for selecting striking images. News articles (NEWSROOM dataset sample, 37,522 news articles): news articles tend to select images that represent their stories. Scholarly publications (PLOS ONE dataset sample, 198,523 scholarly articles): submission guidelines encourage authors to choose their own striking images after acceptance. In both, the metadata gives us the ground truth: the image that an author chose for their article.
  • 16. A social card creation service needs to select a striking image in close to real time, so we considered base features that image libraries calculate quickly. Example base features for one image: byte size: 71,934 bytes; width: 320 pixels; height: 242 pixels; negative space: 53 (histogram columns equal to 0); size in pixels: 77,440 pixels; aspect ratio: 1.3223; number of colors: 13,891.
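To show how cheap these base features are, here is a sketch over already-decoded RGB pixels. It is written in pure Python to stay self-contained; in practice an image library would decode the file. Negative space is assumed to be the number of empty bins in a 256-bin grayscale histogram. The slide only says "histogram cols = 0", so both the histogram choice and the integer luma weights used here are assumptions. `BaseFeatures` and `base_features` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class BaseFeatures:
    byte_size: int
    width: int
    height: int
    size_in_pixels: int
    aspect_ratio: float
    number_of_colors: int
    negative_space: int


def base_features(
    width: int, height: int, byte_size: int, pixels: List[Tuple[int, int, int]]
) -> BaseFeatures:
    """Computes base features from decoded (r, g, b) pixels in 0..255."""
    if len(pixels) != width * height:
        raise ValueError("pixel count does not match width * height")
    histogram = [0] * 256
    for r, g, b in pixels:
        # Assumed grayscale conversion with integer luma weights (sum = 1000).
        histogram[(299 * r + 587 * g + 114 * b) // 1000] += 1
    return BaseFeatures(
        byte_size=byte_size,
        width=width,
        height=height,
        size_in_pixels=width * height,
        aspect_ratio=width / height,
        number_of_colors=len(set(pixels)),
        negative_space=sum(1 for count in histogram if count == 0),  # empty bins
    )


print(base_features(2, 1, 100, [(0, 0, 0), (255, 255, 255)]))
# BaseFeatures(byte_size=100, width=2, height=1, size_in_pixels=2, aspect_ratio=2.0, number_of_colors=2, negative_space=254)
```

For the slide's 320 × 242 example, size in pixels is 320 × 242 = 77,440 and the aspect ratio is 320 / 242 ≈ 1.3223, matching the values shown.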
  • 17. The consistent structure of scholarly publications allows us to quickly calculate additional features for each image. Figure position features: figure position; figure position (scaled). Section features: section index; scaled section index; character position in section; word position in section. Caption features: caption TF rank; caption TF rank (scaled); Jaccard distance of title and caption. Example values for one figure: character position: 7,508; word position: 1,196; section index: 2; scaled section index: 0.333; figure position: 6; figure position (scaled): 0.857; caption TF rank: 3; caption TF rank (scaled): 0.429; Jaccard distance: 0.85.
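The slide lists these features without formulas. The sketch below shows how two of them could be computed, under two assumptions:

- "Scaled" means a position divided by the number of items. The slide's 6 → 0.857 and 3 → 0.429 are consistent with seven figures, but that count is our inference.
- Jaccard distance (1 − |A ∩ B| / |A ∪ B|) is computed over lowercase word sets.

`scaled` and `jaccard_distance` are hypothetical helper names.

```python
import re


def scaled(position: int, count: int) -> float:
    """Scales a 1-based position by the number of items (assumed definition)."""
    return position / count


def jaccard_distance(title: str, caption: str) -> float:
    """1 - |A ∩ B| / |A ∪ B| over lowercase word sets (tokenization assumed)."""
    a = set(re.findall(r"\w+", title.lower()))
    b = set(re.findall(r"\w+", caption.lower()))
    union = a | b
    if not union:
        return 0.0  # two empty strings are treated as identical
    return 1.0 - len(a & b) / len(union)


print(round(scaled(6, 7), 3))  # 0.857
print(jaccard_distance("a b c", "b c d"))  # 0.5
```

A high distance such as the slide's 0.85 means the caption and the title share few words.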
  • 18. We evaluate our approaches with P@1 and MRR. Precision@1 (P@1): does the prediction approach choose the right image? P@1 = 1.0 if yes, 0 if no. Mean Reciprocal Rank (MRR): if it failed, how far off was it? MRR is the mean of the reciprocal ranks of all results; e.g., if an approach ranks the ground-truth striking image #5, then RR = 1/5 = 0.2. An MRR of 1.0 is desirable. But how do we know what the correct image is? Did the image have the same URL as the one in the metadata? If not, was it perceptually the same? Example, ranking by image color count (154,131; 48,020; 44,737; 30,940; 3,816 colors): the approach (most colors) chooses the first image, but the second image is perceptually the same as the image chosen by the author (ground truth), as determined by pHash, so P@1 = 0 and RR = 1/2 = 0.5.
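The evaluation on this slide can be sketched as follows. A ranked image matches the ground truth if it has the same URL, or if its perceptual hash is within a small Hamming distance of the author's image. P@1 and MRR are then averaged over documents. The sketch makes several assumptions:

- Each image is a `(URL, 64-bit hash as int)` pair.
- The hashes are precomputed by some pHash implementation.
- The threshold of 4 differing bits is hypothetical, not taken from the paper.

`same_image`, `reciprocal_rank`, and `evaluate` are illustrative names.

```python
from typing import List, Sequence, Tuple

# Assumed representation of an image: (URL, 64-bit perceptual hash as an int).
Image = Tuple[str, int]


def same_image(candidate: Image, truth: Image, max_distance: int = 4) -> bool:
    """Same URL, or perceptual hashes differing in at most max_distance bits."""
    if candidate[0] == truth[0]:
        return True
    return bin(candidate[1] ^ truth[1]).count("1") <= max_distance


def reciprocal_rank(ranked: Sequence[Image], truth: Image) -> float:
    """1/rank of the first image matching the ground truth; 0.0 if none match."""
    for rank, image in enumerate(ranked, start=1):
        if same_image(image, truth):
            return 1.0 / rank
    return 0.0


def evaluate(documents: List[Tuple[Sequence[Image], Image]]) -> Tuple[float, float]:
    """Returns (P@1, MRR) over (ranked images, ground-truth image) pairs."""
    rrs = [reciprocal_rank(ranked, truth) for ranked, truth in documents]
    p_at_1 = sum(1 for rr in rrs if rr == 1.0) / len(rrs)
    mrr = sum(rrs) / len(rrs)
    return p_at_1, mrr


docs = [
    # Mirrors the slide: the second-ranked image is perceptually the same
    # as the author's image but has a different URL, so RR = 0.5.
    ([("a.png", 0x0000), ("b.png", 0xFF00)], ("author.png", 0xFF01)),
    # Exact URL match at rank 1, so RR = 1.0.
    ([("x.png", 0x1), ("y.png", 0x2)], ("x.png", 0xFFFF)),
]
print(evaluate(docs))  # (0.5, 0.75)
```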
  • 19. Different features work best to predict the striking image for news articles vs. scholarly publications: for 37,522 news articles from NEWSROOM, P@1 = 0.83 and MRR = 0.88; for 198,523 scholarly articles from PLOS ONE, P@1 = 0.78 and MRR = 0.86.
  • 20. Conclusions: • News articles quickly adopted social cards. • Prior to 2010 there were no social card metadata standards; this period corresponds to 150 billion documents in the Internet Archive that need automatic summarization. • News article metadata have striking images drawn from the article. • Scholarly publishers favor company or journal logos for their striking images rather than images that summarize the document. • For predicting striking images based on the content of the document: Random Forest with base features performed best for news articles (P@1 = 0.83); Random Forest with base features and figure position performed best for scholarly publications (P@1 = 0.78). • For more information, see the Dark and Stormy Archives Project: https://oduwsdl.github.io/dsa/