Blog Comments Organizer
                                An Interface for Organizing News Comments

A new proposed mechanism for organizing blog comments for a particular post so that they are most relevant to what the user is looking for when they read the post.

Published in: Technology, News & Politics

Blog Comments Organizer
An Interface for Organizing News Comments

Sweta Vajjhala, Nicholas Diakopoulos, Irfan Essa
Georgia Institute of Technology | College of Computing
801 Atlantic Drive, Atlanta, GA 30332

ABSTRACT
This paper focuses on the organization of comments on a particular blog post. This research is the first of its kind. Background research was done in the field of computational journalism and its relation to the blogosphere, in addition to research into the categorization of blog posts. Several design ideas were then considered for ways to organize blog comments. The deciding factor was whether or not quotes from the post were used in the comment. A specific algorithm was used to determine this, and the design was then applied to the actual blog post itself. Results indicate that this would be a successful application for all news blogs, should it be applied to the websites accordingly.

Categories and Subject Descriptors
H.5.2 [User Interfaces]: Graphical user interfaces, H.5.3 [Group and Organization Interfaces]: Collaborative computing

General Terms
Design, Human Factors

Keywords
design, computational journalism, blog, news, articles

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

1. INTRODUCTION
There have been many advances in technology that have helped organize the information that is on the Internet. One of these fields, called computational journalism, is specifically tailored to finding new ways to organize media information via technical advancements.

Since the emergence of Web 2.0, interactive media has become very popular. Not only has it allowed for sharing information across the world, but it has also created an environment that encourages collaboration around media articles. These collaborations have formed millions of communities on the Internet. News articles, in the form of blogs, have become very popular, allowing readers to become contributors [1] and express their opinions.

Although there has been some research on the organization of media articles, little has been done to organize readers' comments on these articles. This project focuses on that new aspect of computational journalism with the creation of a blog comments organizer.

2. BACKGROUND
There are many different ways to organize an article's comments. Today, the Internet has become the largest medium in the world for reading about news and interactively discussing it. Not only has the number of readers increased, but the number of blogs overall, especially news blogs, has drastically increased [1]. Baumer et al. state that readers now have the mentality of: "I know what's there and I know where to find it when I need it." With this mindset, readers are able to read about any type of news article that they wish. With news blogs becoming increasingly popular, readers are slowly taking on the role of contributors as well, by posting comments to their favorite blogs.

With the variety of news articles and comments that are posted, blogging has become a multi-faceted and heterogeneous activity. Articles in news blogs today are often organized into different categories. In addition to this, people can add their own tags, which are collections of keywords attached to blog entries that help describe what the entries are about [2].

Brooks & Montanez analyzed the effectiveness of tags for classifying blog entries. Their results indicate that tags are useful for grouping articles into broad categories, but less effective in indicating the particular content of an article. However, the idea of sharing tags could potentially be applied to help organize the comments, based on the text of each comment. There are three main uses of tagging: annotating information for personal use, placing information into broadly defined categories, and annotating particular articles so as to describe their content [2]. Each of these uses could also be applied to the comments on a blog post.

One problem that comes with tags is identifying appropriate tags while eliminating noise and spam [3]. Another problem is that several different tags might all be used to describe the same concept, and this duplication creates extra clutter [2]. A similar problem needs to be addressed in the organization of blog comments: which comments are useful to readers, and which ones are spam or irrelevant to the topic of the post? One solution is to automatically generate content-based tags, while also considering when the tag was originally created. For comments, the organization could be based on chronological order, with the most recent comments showing up first and the oldest ones showing up last.
After finding a way to organize the blog comments, the last thing to do is to find a way to collect and organize the blog articles and their comments. The online, public nature of blogs provides incredible resources for data mining. Kramer and Rodden state that, after collecting a variety of blogs, they used clustering to group the blogs into categories based on five different factors: melancholy, social, ranting, metaphysical, and work. They found that blog articles are difficult to group into categories because the blogging community is so heterogeneous, so each blog does not cleanly fit into any single category [4]. Comments on blogs are comparable: since many different discussions can be happening within the comments, it could be very difficult to place the comments into one category objectively.

In the following sections, the design, algorithm, and evaluation of the system will be presented, concluding with a discussion of the results and future work.

3. BLOG COMMENTS ORGANIZER
3.1 Data Mining
The data that was used to implement the blog comments organizer was pulled from Dot Earth, an environmental blog written by Andrew Revkin of The New York Times. On average, each of his articles tends to generate over 80 comments. Because of the popularity of the blog and the variety of its comments, data from this blog was used in the testing of the blog comments organizer.

Five articles were randomly chosen to be analyzed by hand. During this analysis, information and statistics about the set of comments corresponding to each article were collected. The information that was collected included the number of comments for each of the following: comments that were multiple paragraphs long, comments that used quotes from the article within them, comments that used statistics (or some other numbers) to support their point of view, comments that referenced other related articles, comments that were a response to a previous comment, comments that used the same key words (e.g. "history" or "future" or "evolution"), and finally, the number of posts per day.

Out of the data that was collected, the category with the highest count was the number of comments that used quotes from the article within them. As a result, it was decided that the best way to organize the comments for this blog would be to show users a list of comments for each part of the article that was used in a quote.

3.2 Design
The design for the blog comments organizer was done first with some sketches. It was then implemented using PHP, HTML, JavaScript, and Greasemonkey.

The blog comments organizer can be easily integrated into the Dot Earth page. For each article, it highlights the parts of the article that are quoted in a comment. When a user then moves the mouse over the highlighted part of the article, the comment(s) that reference(s) it will show up at the top of the page in reverse chronological order, so that the most recent comment shows up first. A sketch of this design can be seen in Figure 1. When the blog comments organizer is implemented, the style of the blog comments organizer will match that of the Dot Earth page, so the integration of the application will seem transparent to the user.

Figure 1. Sketch of the blog comments organizer design. By mousing over the yellow highlighted text, the box at the top will show up. If the user is no longer moused over the highlighted text, then the box will disappear.

The rationale for this design choice is supported by the data mining, which showed that quotes were very often used in the comments of the Dot Earth blog. The blog comments organizer would be a useful tool for new readers to quickly get acquainted with the traditional posting style of contributors to the Dot Earth blog. Moreover, the blog comments organizer offers a way for readers to find out more information on a specific part of the article without having to read all of the 100+ comments. It gives the reader the advantage of being able to read only the comments that he/she is interested in, based on the parts of the article that the reader liked.

3.3 Data Collection
In order to collect the data from Dot Earth, a blog scraper script was written in PHP5. The scraper script gets the 60 most recent articles in the Dot Earth blog and places them into a MySQL database. For each article, the scraper also gets all of the comments and places those in the database too. The schema for the database is as follows: the article is linked to each of its comments using the field articleID.

Figure 2. Schema for the database that stores all of the articles and comments.

3.3.1 Algorithm for Gathering Data
The algorithm for gathering all of the articles and their respective comments is given here.

First, connect to the Dot Earth homepage and get its HTML source. Inside the source, look for the title of each news article based on the corresponding HTML tags. For each of the articles, look for the corresponding HTML tags for the comments. Read the text between the opening and closing HTML tags for each article and its comments. Insert all of this information into a database with the schema above. In order to get articles across multiple pages, loop through the same process after finding the corresponding HTML tags for each page.
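The gathering steps above can be sketched as follows. This is only a minimal illustration and not the authors' PHP5 scraper: it uses Python's standard `html.parser` and an in-memory SQLite database in place of MySQL, it skips the network fetch and takes the page source as a string, and the markup it looks for (`<h2 class="title">`, `<div class="comment">`), the table names, and every column other than articleID are assumptions made for the example.

```python
import sqlite3
from html.parser import HTMLParser


class PostParser(HTMLParser):
    """Collects the title and the comment texts from one article page.

    Hypothetical markup: the title is in <h2 class="title"> and each
    comment is in a (non-nested) <div class="comment">.
    """

    def __init__(self):
        super().__init__()
        self.title = ""
        self.comments = []
        self._target = None  # "title", "comment", or None

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "h2" and css_class == "title":
            self._target = "title"
        elif tag == "div" and css_class == "comment":
            self._target = "comment"
            self.comments.append("")

    def handle_endtag(self, tag):
        # Leaving an <h2> or <div> ends the element being collected.
        if tag in ("h2", "div"):
            self._target = None

    def handle_data(self, data):
        # Read the text between the opening and closing tags.
        if self._target == "title":
            self.title += data
        elif self._target == "comment":
            self.comments[-1] += data


def make_db():
    """Creates the two tables; comments point at articles via articleID."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE articles (articleID INTEGER PRIMARY KEY, title TEXT)"
    )
    conn.execute(
        "CREATE TABLE comments (commentID INTEGER PRIMARY KEY, "
        "articleID INTEGER REFERENCES articles(articleID), body TEXT)"
    )
    return conn


def store_page(conn, html):
    """Parses one page of HTML and inserts the article and its comments."""
    parser = PostParser()
    parser.feed(html)
    parser.close()
    cur = conn.execute(
        "INSERT INTO articles (title) VALUES (?)", (parser.title.strip(),)
    )
    article_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO comments (articleID, body) VALUES (?, ?)",
        [(article_id, text.strip()) for text in parser.comments],
    )
    return article_id
```

To cover several pages, `store_page` would be called once per page source; a real scraper would also need to handle nested markup inside comments, which this sketch deliberately ignores.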
Soon after this research was done, an API was introduced for Dot Earth. In the future, it might be easier to collect all of the data via the API. However, this would also mean that the information would be stored in an XML file, not in a database, and this could make it harder to find quotes in the comments.

3.4 Finding Quotes in Comments
Once the articles and comments are in the database, the next step is to go through all of the comments for each given article and see whether they contain quotes from the article.

First, check the comment for opening quote (“) and closing quote (”) symbols. If both exist, then see if the text between the two quotes matches any phrase from the article. It is important to make sure that the quotes are not links to external pages, because such links would match the links to external pages that appear in the article. Therefore, this case must be excluded when checking for quotes in the comments. If a quote in the comment matches text from the article, then the starting index of the text in the article should be stored in quote_index_start in the comments database table. The end of the quote should be stored in quote_index_end.

3.4.1 Algorithm for Finding Quotes
The algorithm for finding quotes from the article within a comment is below.

    for each article in the database:
        get all comments for that article
        for each comment:
            search_from = 0
            has_ending_quote = true
            as long as has_ending_quote:
                find the next opening quote (“) in the comment, starting from search_from
                if there is no opening quote:
                    set has_ending_quote to false to exit the loop
                if there is an opening quote:
                    search for the ending quote (”) after the opening quote
                    if there is no ending quote:
                        set has_ending_quote to false to exit the loop
                    otherwise:
                        search the article for the text between the opening and ending quotes
                        if the text exists in the article:
                            store its quote_index_start and quote_index_end in the database for that comment
                        else:
                            do not store anything
                        set search_from to the position after the ending quote

Using the algorithm above, quotes from the article were found, and the indices of where they were found were stored in the database.

Once the quote indices were known, another script was written to insert <div></div> tags around the quotes in the article text, which highlighted that part of the article. JavaScript was then used to handle a mouse-over event, so that if the reader put their mouse over the highlighted part of the article text, the list of comments that contained that part of the article as a quote would show up on the right-hand side, as was shown in Figure 1 above.

4. EVALUATION
Feedback from volunteer testers revealed both advantages and disadvantages of the blog comments organizer. First and foremost, although the design is integrated nicely into this particular blog (Dot Earth), it would require a lot of customization for each blog on which it was used. This is because each blog has a different style, and therefore the scraping would have to be done all over again. However, the actual algorithm that is used to find the quotes would stay the same. Displaying the blog comments organizer would again differ from blog to blog, based on the style of the blog. However, the algorithm for inserting the <div></div> tags would remain the same once the source code of the other sites was figured out.

One disadvantage of this blog comments organizer is that the algorithm searches for the start and end quote characters. However, a comment might contain text from the article that is paraphrased or presented without quotation marks. In that case, the presented algorithm would not recognize it as a quote, because it is not located within quotation marks. Handling such cases would give the user more comments to see in the blog comments organizer. However, detecting paraphrasing would require changing the fundamental algorithm to use some artificial intelligence techniques, in addition to what it is already doing, while searching the article text.

One major advantage of this design is that the user is given a choice of whether or not to read the comments. Since the comments show up on a mouse-over event, a user who does not want to use the feature after the first time will not see the comments show up. Moreover, the comments are placed strategically towards the right-hand side of the page, where there is whitespace. This way, they do not cover up any important information that is on the page. The blog comments organizer acts as a supplement that makes it easier for readers to find the comments they may be looking for, but it does not require the user to use it.

For example, someone who just wanted to browse the Dot Earth blog and get an idea of the contributors might want to browse all of the comments, not just the ones that pertain to certain parts of the text of the article. In this case, the user does not have to use the blog comments organizer. However, if this user becomes a frequent visitor and contributor to the Dot Earth blog, they may start to look for specific comments which pertain to the parts of an article that they like. In this case, the user would find the blog comments organizer an ideal tool for getting the information they need without having to go through hundreds of comments.

5. DISCUSSION
Because the blog comments organizer can be used in a variety of ways, there are many different ways in which this tool can be useful. Namely, it addresses a core concern of the growing field of computational journalism: the organization of media. There are many news sites that would benefit from organizing their reader comments, and this tool is well suited to that purpose.

In the Evaluation section above, there was an example of the user who just wanted to find information about a specific part of the article. This blog comments organizer could be useful for data
analysts in the media profession. Based on a posted news article, the author or the company that posted it can find out which parts of the article triggered the most comments. Based on this, the company could post more articles that pertain to very similar topics. This would attract new users, as well as retain the current users.

The blog comments organizer could change the way that articles are written and read. Based on the popularity of a certain part of an article, a blog can be tailored to suit the majority of its readers. This would introduce a new level of specificity for the blog. If many blogs were to follow this approach and focus on specific topics, it might make blogs easier to categorize and make tags more universal.

6. FUTURE WORK
There are many different applications and related work that could be done based on the blog comments organizer.

First and foremost, a different metric could be used to organize comments. Right now, only quotes are being used, but blog comments could also be grouped based on the themes of the posts (e.g. history, evolution, etc.) or on whether the comments used statistics. Blogs have become a source for data mining, and if users are looking for certain quotes or numbers and comments contain those statistics, this would be very useful for the user.

The blog comments organizer could also be used to analyze different types of media. Right now, only written blogs are being analyzed. However, video blogs are slowly becoming more popular, so being able to find comments that quote parts of a video in a blog post would also prove to be very useful.

7. CONCLUSION
It is possible to organize blog comments in many different ways. Depending on the medium and the type of blog that is being used, there could be a number of ways to analyze and organize the comments in a meaningful way for the users that come by. Organization of blog comments could become a very powerful tool for targeting the type of users that the blog is tailored towards.

While there are many different ways to organize comments, and using quotes (as in this particular blog comments organizer) is just one, it is important to realize that this growing field could soon redefine the way that media is presented to the world.

8. ACKNOWLEDGMENTS
Many thanks to all volunteer evaluators, especially Sekhar Vajjhala, Carolina Gomez, Blair Daly, and Nicholas Bowen.

9. REFERENCES
[1] Baumer, Eric, Mark Sueyoshi, and Bill Tomlinson. "Exploring the Role of the Reader in the Activity of Blogging." CHI 2008 (2008): 1111-20.
[2] Brooks, Christopher H., and Nancy Montanez. "An Analysis of the Effectiveness of Tagging in Blogs." American Association for Artificial Intelligence (2006).
[3] Gill, Alastair J., et al. "Emotion Rating from Short Blog Texts." CHI 2008 (2008): 1121-24.
[4] Kramer, Adam D.I., and Kerry Rodden. "Word Usage and Posting Behaviors: Modeling Blogs with Unobtrusive Data Collection Methods." CHI 2008 (2008): 1125-28.