Blog Comments Organizer
An Interface for Organizing News Comments
Sweta Vajjhala, Nicholas Diakopoulos, Irfan Essa
Georgia Institute of Technology | College of Computing
801 Atlantic Drive, Atlanta, GA 30332
firstname.lastname@example.org, email@example.com, firstname.lastname@example.org
ABSTRACT
This paper focuses on the organization of comments on a particular blog post. The research that was done was the first of its kind. Background research was done in the field of computational journalism and its relation to the blogosphere, in addition to research into the categorization of blog posts. Several design ideas were then considered for ways to organize blog comments. The deciding factor was whether or not quotes from the post were used in the comment. A specific algorithm was used to determine this, and the design was then applied to the actual blog post itself. Results indicate that this would be a successful application for all news blogs, should it be applied to the websites accordingly.

Categories and Subject Descriptors
H.5.2 [User Interfaces]: Graphical user interfaces; H.5.3 [Group and Organization Interfaces]: Collaborative computing

General Terms
Design, Human Factors

Keywords
design, computational journalism, blog, news, articles

1. INTRODUCTION
There have been many advances in technology that have helped organize the information that is on the Internet. One of these fields, called computational journalism, is specifically tailored to finding new ways to organize media information via technical advancements.

Since the emergence of Web 2.0, interactive media has become very popular. Not only has it allowed for sharing information across the world, but it has created an environment that encourages collaboration among media articles. These collaborations have formed millions of communities on the Internet. News articles, in the form of blogs, have become very popular, allowing readers to become contributors and express their opinions.

Although there has been some research on the organization of media articles, little has been done to organize readers’ comments on these articles. This project focuses on that new aspect of computational journalism with the creation of a blog comments organizer.

2. BACKGROUND
There can be many different ways to organize an article’s comments. Today, the Internet has become the largest medium in the world for reading about news and interactively discussing it. Not only has the number of readers increased, but the number of blogs overall, especially news blogs, has drastically increased. Baumer et al. state that readers now have the mentality of: “I know what’s there and I know where to find it when I need it.” With this mindset, readers are able to read about any type of news article that they wish. With news blogs becoming increasingly popular, readers are slowly taking on the role of contributors as well, by posting comments to their favorite blogs.

With the variety of different news articles and comments that are posted, blogging has become a multi-faceted and heterogeneous activity. Articles in news blogs today are often organized into different categories. In addition to this, people can add their own tags, which are collections of keywords attached to blog entries that help describe what the entries are about.

Brooks & Montanez analyzed the effectiveness of tags for classifying blog entries. Their results indicate that tags are useful for grouping articles into broad categories, but less effective in indicating the particular content of an article. However, the idea of sharing tags could potentially be applied to help organize the comments, based on the text of each comment. There are three main uses of tagging: annotating information for personal use, placing information into broadly defined categories, and annotating particular articles so as to describe their content. Each of these uses could also be applied to comments.

One problem that comes with tags is trying to identify appropriate tags while eliminating noise and spam. Another problem is that several different tags might all be used to describe the same concept, so this duplication also creates extra clutter. A similar problem needs to be addressed in the organization of blog comments: which comments are useful to readers, and which ones are spam or irrelevant to the topic of the post? One solution is to automatically generate content-based tags, while also considering when the tag was originally created. For comments, their organization could be based on chronological order, with the most recent comments showing up first and the oldest ones showing up last.

After finding a way to organize the blog comments, the last thing to do is to find a way to collect and organize the blog articles and their comments. The online, public nature of blogs provides incredible resources for data mining. Kramer and Rodden state that, after collecting a variety of blogs, they used clustering to group the blogs into categories based on five different factors: melancholy, social, ranting, metaphysical, and work. They found that blog articles are difficult to group into categories because the blogging community is so heterogeneous, so each blog does not cleanly fit into any single category. Comments on blogs are comparable: since many different discussions can be happening in the comments, it could be very difficult to place the comments into one category objectively.

In the following sections, the design, algorithm, and evaluation of the system will be presented, concluding with a discussion of the results and future work.

3. BLOG COMMENTS ORGANIZER
3.1 Data Mining
The data that was used to implement the blog comments organizer was pulled from Dot Earth, an environmental blog written by Andrew Revkin of The New York Times newspaper. On average, each of his articles tends to generate over 80 comments. Because of the vast popularity of the blog and the variety of comments, data from this blog was used in the testing of the blog comments organizer.

Five articles were randomly chosen to undergo an analysis by hand. During this time, information and statistics about the set of comments corresponding to each article were collected. The information that was collected included the number of comments for each of the following: comments that were multiple paragraphs long, comments that used quotes from the article within them, comments that used statistics (or some other numbers) to support their point of view, comments that referenced other related articles, comments that were a response to a previous comment, comments that used the same key words (e.g. “history” or “future” or “evolution”), and finally, the number of posts per day.

Out of the data that was collected above, the number that seemed to yield the highest value was the number of comments that used quotes from the article within them. As a result, it was decided that the best way to organize the comments for this blog would be to show users a list of comments for each part of the article that was used in a quote.

3.2 Design
The design for the blog comments organizer was done first with some sketches. It was then implemented using PHP, HTML, and other web technologies.

The blog comments organizer can be easily integrated into the Dot Earth page. For each article, it highlights the parts of the article that are quoted in a comment. When a user then moves the mouse over the highlighted part of the article, the comment(s) that reference(s) it will show up at the top of the page in reverse chronological order, so that the most recent comment will show up first. A sketch of this design can be seen below. When the blog comments organizer is implemented, the style of the blog comments organizer will match that of the Dot Earth page, so the integration of the application will seem transparent to the user.

Figure 1. Sketch of the blog comments organizer design. By moving the mouse over the yellow highlighted text, the box at the top will show up. If the user is not moused over the highlighted text anymore, then the box will disappear.

The rationale for this design choice is supported by the fact that the data mining showed that quotes were very often used in the comments of the Dot Earth blog. The blog comments organizer would be a great tool for new readers to quickly get acquainted with the traditional posting style of contributors to the Dot Earth blog. Moreover, the blog comments organizer offers a way for readers to find out more information on a specific part of the article without having to read all of the 100+ comments. It gives the reader the advantage of being able to read only the comments that he/she is interested in, based on the parts of the article that the reader liked.

3.3 Data Collection
In order to collect the data from Dot Earth, a blog scraper script was written in PHP5. The scraper script gets the 60 most recent articles in the Dot Earth blog and places them into a MySQL database. For each article, the scraper also gets all of the comments and places those in the database too. The schema for the database is as follows: the article is linked to each of its comments using the field articleID.

Figure 2. Schema for the database that stores all of the articles and comments.

3.3.1 Algorithm for Gathering Data
The algorithm for gathering all of the articles and respective comments is given here.

First, connect to the Dot Earth homepage and get its HTML source. Inside the source, look for the title of each news article based on the corresponding HTML tags. For each of the articles, look for the corresponding HTML tags for the comments. Read the text between all of the open and close HTML tags for each article and its comments. Insert all of this information into a database with the schema above. In order to get articles across multiple pages, loop through the same process, after finding the corresponding HTML tags for each page.

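The gathering steps above can be sketched roughly as follows. This is an illustrative Python sketch, not the PHP5 script described in the paper: Python's standard `html.parser` and `sqlite3` modules stand in for the scraper and the MySQL database. The class names `entry-title` and `comment-body`, the table and column names other than `articleID`, and the helpers `scrape_page`, `create_schema`, and `store` are all assumptions made for the example, since the actual Dot Earth markup and the full schema are not given in the text.

```python
import sqlite3
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}  # tags with no closing tag


class ArticleParser(HTMLParser):
    """Collects the text between the open and close tags of a title or comment.

    The class names are hypothetical; a real scraper would use whatever
    markup the target blog actually emits.
    """

    TITLE_CLASS = "entry-title"     # assumed class for the article title
    COMMENT_CLASS = "comment-body"  # assumed class for each comment

    def __init__(self):
        super().__init__()
        self.title = ""
        self.comments = []
        self._target = None  # "title", "comment", or None
        self._depth = 0      # nesting depth inside the current target tag
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._target is not None:
            self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.TITLE_CLASS in classes:
            self._target = "title"
        elif self.COMMENT_CLASS in classes:
            self._target = "comment"
        if self._target is not None:
            self._depth, self._buffer = 1, []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or self._target is None:
            return
        self._depth -= 1
        if self._depth == 0:
            text = "".join(self._buffer).strip()
            if self._target == "title":
                self.title = text
            else:
                self.comments.append(text)
            self._target = None

    def handle_data(self, data):
        if self._target is not None:
            self._buffer.append(data)


def scrape_page(html):
    """Return (title, [comment, ...]) for one article page."""
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    return parser.title, parser.comments


def create_schema(conn):
    # Only the articleID link is taken from the paper; the rest is assumed.
    conn.executescript("""
        CREATE TABLE articles (articleID INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE comments (
            commentID INTEGER PRIMARY KEY,
            articleID INTEGER REFERENCES articles(articleID),
            body TEXT
        );
    """)


def store(conn, article_id, title, comments):
    conn.execute("INSERT INTO articles (articleID, title) VALUES (?, ?)",
                 (article_id, title))
    conn.executemany("INSERT INTO comments (articleID, body) VALUES (?, ?)",
                     [(article_id, body) for body in comments])


SAMPLE_PAGE = """
<html><body>
  <h2 class="entry-title">Ice and <em>Climate</em></h2>
  <div class="comment-body">First comment.</div>
  <div class="comment-body">Another <b>reader</b> replies.</div>
</body></html>
"""

conn = sqlite3.connect(":memory:")
create_schema(conn)
title, comments = scrape_page(SAMPLE_PAGE)
store(conn, 1, title, comments)
rows = conn.execute(
    "SELECT body FROM comments WHERE articleID = ? ORDER BY commentID", (1,)
).fetchall()
```

The sketch works on an HTML string so that it runs without network access. Fetching the live page (for example with `urllib.request.urlopen`) and looping over further pages would wrap around `scrape_page` and are left out here.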
Soon after this research was done, an API was introduced for Dot Earth. In the future, it might be easier to collect all of the data via the API. However, this would also mean that the information would be stored in an XML file, not in a database, and this could make it harder to find quotes in the comments.

3.4 Finding Quotes in Comments
Once the articles and comments are in the database, the next step is to go through all of the comments for each given article and see whether they contain quotes from the article.

First, check the comment for opening quote (“) and closing quote (”) symbols. If both exist, then see if the text between the two quotes matches any phrase from the article. It is important to make sure that the quotes are not links to external pages, because the quoted link targets would otherwise match links to external pages that appear in the article. Therefore, this case must be excluded when checking for quotes in the comments. If a quote in the comment matches text from the article, then the starting index of the text in the article should be stored in quote_index_start in the comments database table. The end of the quote should be stored in quote_index_end.

3.4.1 Algorithm for Finding Quotes
The algorithm for finding quotes from the article within a comment is below.

for each article in the database:
    get all comments for that article
    for each comment:
        quote_start_index = 0
        quote_end_index = true
        as long as there is an ending quote (quote_end_index is not false):
            go through the text and find the opening quote
            if there is no opening quote:
                exit the loop by setting quote_end_index to false
            if there is an opening quote:
                search for the ending quote starting from quote_start_index
                search for the text between the starting and ending quotes in the article
                if the text exists:
                    store quote_start_index and quote_end_index in the database for that comment
                else:
                    do not store anything and exit

Using the algorithm above, quotes from the article were found, and the indices where they were found were stored in the database.

Once the quote indices were known, another script was written to insert <div></div> tags around the quotes in the article text. These tags trigger a mouse-over event, so that if readers put their mouse over the highlighted part of the article text, the list of comments that contained that part of the article as a quote would show up on the right-hand side, as was shown in Figure 1 above.

4. EVALUATION
The reception of the blog comments organizer by some volunteer testers revealed some advantages and disadvantages. First and foremost, although the design is integrated nicely into this particular blog (Dot Earth), it would require a lot of customization for each blog for which it was used. This is because each blog will have a different style, and therefore the scraping will have to be done all over again. However, the actual algorithm that is used to find the quotes would still be the same. Displaying the blog comments organizer for each blog would again differ, based on the style of the blog. However, the algorithm for inserting the <div></div> tags would still remain the same, once the source code of the other sites was obtained.

One disadvantage of this blog comments organizer is that the algorithm searches for the start and end quote characters. However, a comment might have text from the article in it paraphrased or presented without quotation marks. If this were the case, then the presented algorithm would not find it as a quote, because it is not located within quotation marks. Allowing for such cases would give the user more comments to see in the blog comments organizer. However, detecting paraphrasing would require changing the fundamental algorithm to use some artificial intelligence techniques, in addition to what it is already doing, while searching the article text.

One major advantage of this design is that the user is given a choice whether he or she wants to read the comments. Since the comments show up on a mouse-over event, a user who does not want to use the feature after the first time will not see all of the different comments show up. Moreover, the comments are placed strategically towards the right-hand side of the page, where there is whitespace. This way, they do not cover up any possibly important information that is on the page. The blog comments organizer acts as a supplement that makes it easier for the reader to find the comments that he may be looking for, but it does not require the user to use it.

For example, someone who just wanted to browse the Dot Earth blog and get an idea of the contributors might want to browse all of the comments, not just the ones that pertain to certain parts of the text of the article. In this case, the user does not have to use the blog comments organizer. However, if this user becomes a frequent visitor and contributor to the Dot Earth blog, he may start to look for specific comments which pertain to parts of an article that he likes. In this case, the user would find the blog comments organizer an ideal tool to get the information that he needs without having to go through hundreds of comments.

5. DISCUSSION
Because of the variety of uses of the blog comments organizer, there are many different ways that this tool can be useful. Namely, it focuses on the growing field of computational journalism. There are sites that would benefit from organization of their reader comments, and this tool would suit them well.

In the Evaluation section above, there was an example of the user who just wanted to find information about a specific part of the
article. This blog comments organizer could also be useful for data analysts in the media profession. Based on a posted news article, the author or the company that posted it can find out which parts of the article triggered the most comments. Based on this, the company could post more articles that pertain to very similar topics. This would attract new users, as well as retain the current users.

The blog comments organizer could revolutionize the way that articles are written and read. Based on the popularity of a certain part of an article, blogs can be tailored to suit the majority of their readers. This would introduce a new level of specificity for the blog. If many blogs were to follow this and focus on specific topics, it might make blogs easier to categorize and tag.

6. FUTURE WORK
There are many different applications and related work that could be done based on the blog comments organizer.

First and foremost, a different metric could be used to organize comments. Right now, only quotes are being used, but blog comments could also be gauged based on the themes of the posts (e.g. history, evolution, etc.) or on whether they use statistics. Blogs have become a source for data mining, and if users are looking for certain quotes or numbers and comments contain those statistics, this would be very useful for the user.

The blog comments organizer could also be used to analyze different types of media. Right now, only written blogs are being analyzed. However, video blogs are slowly becoming more popular, so being able to find comments that quoted parts of a video in a blog post would also prove to be very useful.

7. CONCLUSION
It is possible to organize blog comments in a plethora of different ways. Depending on the medium and the type of blog that is being used, there could be a number of ways to analyze and organize the comments in a meaningful way for the users that come by. Organization of blog comments will soon become a very powerful tool that can be used to target the type of users that the blog is tailored towards.

While there are many different ways to organize comments, and using quotes (as in this particular blog comments organizer) is just one, it is important to realize that this growing field could soon redefine the way that media is presented to the world.

8. ACKNOWLEDGMENTS
Many thanks to all volunteer evaluators, especially Sekhar Vajjhala, Carolina Gomez, Blair Daly, and Nicholas Bowen.

9. REFERENCES
Baumer, Eric, Mark Sueyoshi, and Bill Tomlinson. "Exploring the Role of the Reader in the Activity of Blogging." CHI 2008 (2008): 1111-20.
Brooks, Christopher H., and Nancy Montanez. "An Analysis of the Effectiveness of Tagging in Blogs." American Association for Artificial Intelligence (2006).
Gill, Alastair J., et al. "Emotion Rating from Short Blog Texts." CHI 2008 (2008): 1121-24.
Kramer, Adam D.I., and Kerry Rodden. "Word Usage and Posting Behaviors: Modeling Blogs with Unobtrusive Data Collection Methods." CHI 2008 (2008): 1125-28.
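
As a supplementary illustration of the quote-finding procedure in Section 3.4.1 and the <div></div> insertion step that follows it, here is a minimal Python sketch. It is not the authors' PHP implementation. The function names `find_quotes` and `wrap_quotes`, the `quoted` class name, the acceptance of straight quotes alongside curly ones, and the choice to strip HTML tags as a way of excluding links are all assumptions made for the example. Unlike the pseudocode, it returns every matching span for a comment rather than storing a single start and end index.

```python
import re

# Opening (“) and closing (”) quote symbols as in the paper; straight quotes
# are also accepted here as an assumption about how readers type.
QUOTED = re.compile(r'[“"]([^“”"]+)[”"]')
TAG = re.compile(r"<[^>]*>")


def find_quotes(article_text, comment_html):
    """Return (start, end) indices into article_text for each quoted phrase
    in the comment that appears verbatim in the article."""
    # Assumed way to exclude links: drop tags, so quoted href values in
    # <a href="..."> are never treated as quotes from the article.
    comment_text = TAG.sub("", comment_html)
    spans = []
    for match in QUOTED.finditer(comment_text):
        phrase = match.group(1).strip()
        if not phrase:
            continue
        start = article_text.find(phrase)
        if start != -1:
            spans.append((start, start + len(phrase)))
    return spans


def wrap_quotes(article_text, spans):
    """Wrap each quoted span in <div> tags, working right to left so that
    earlier indices stay valid; spans that overlap one already wrapped
    are skipped."""
    out = article_text
    last_start = len(article_text) + 1
    for start, end in sorted(set(spans), reverse=True):
        if end > last_start:
            continue
        out = (out[:start] + '<div class="quoted">' + out[start:end]
               + "</div>" + out[end:])
        last_start = start
    return out


article = ("The ice sheet is melting faster than models predicted. "
           "Sea levels may rise.")
comment = ('You wrote “melting faster than models predicted” but see '
           '<a href="http://example.com">this</a>.')

spans = find_quotes(article, comment)
highlighted = wrap_quotes(article, spans)
```

In a page built this way, a mouse-over handler attached to each wrapped span would look up the comments whose stored indices match that span. As the Evaluation section notes, paraphrased or unquoted text is not detected by this kind of exact matching.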