Blogosphere: Research
Issues, Tools, and Applications
         Nitin Agarwal and Huan Liu
                             Sunil Bandla
                       INF384H – Fall 2011
Agenda
   Introduction
   Research issues
   Tools and Methods
   Case Study
   Blogosphere and Social Networks
Web 2.0
   It is the reason behind the surge of interest in online
    communities
   Former consumers are now producers
   Collaborative environment
   User-generated content
   Collective wisdom
   Web 2.0 services:
       Blogs, wikis, social networking sites, social tagging
       WordPress, Wikipedia, Facebook, YouTube, Twitter, Yelp
Social Networks
   “A social network is a social structure made up of
    individuals connected by one or more types of
    interdependency, such as friendship, common
    interest…” – Wikipedia
   Web 2.0 is enabling virtual social networks
   Size and connectedness vary across networks
   Examples:
       Friendship networks (Facebook, MySpace)
       Media sharing (Flickr, YouTube)
   “The site, chock full of advertising, is a moneymaking machine – so much so
    that Ms. Armstrong and her husband have both quit their regular jobs.”
   The reason? The advertisers are eager to influence her 850,000 readers.
   Arnold Kim, founder and senior editor of MacRumors.com:
   “The site places MacRumors No. 2 on a list of the ‘25 most valuable blogs,’ …”
   What is the potential value? “Two of the other tech-oriented blogs on its
    list, …, were sold earlier this year, reportedly for sums in excess of
    $25 million.”
   Source: The New York Times
   Slide Credit: Liu & Nitin
Blogosphere
   Blog sites
   Bloggers
   Blog posts
   Blogroll
   Permalinks
   Low barrier to publication
   Readers can comment instantly, which gives the blogger
    a feeling of satisfaction
   Individual vs community blogs
Blogosphere
   Complex social networks
   Bloggers/blog posts/blog sites become nodes
   Relationships are represented by edges between
    nodes
   Inlinks & Outlinks
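A minimal sketch of this graph view, assuming blog posts are the nodes and a hyperlink from one post to another is a directed edge. The post names and the build_link_graph helper are hypothetical and only illustrate how inlinks and outlinks fall out of the edge list.

```python
from collections import defaultdict

def build_link_graph(links):
    """Build outlink and inlink adjacency sets from (source, target) pairs."""
    outlinks = defaultdict(set)
    inlinks = defaultdict(set)
    for src, dst in links:
        outlinks[src].add(dst)   # src links to dst
        inlinks[dst].add(src)    # dst is linked from src
    return outlinks, inlinks

# Hypothetical posts: each pair means "first post links to second post".
links = [("post_a", "post_b"), ("post_c", "post_b"), ("post_b", "post_d")]
outlinks, inlinks = build_link_graph(links)
print(sorted(inlinks["post_b"]))   # posts that link to post_b
print(sorted(outlinks["post_b"]))  # posts that post_b links to
```

The same structure works if bloggers or blog sites are used as nodes instead of posts.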
Modeling the Blogosphere
   Helps in generating an artificial dataset to compare
    algorithms
   Study patterns that could explain community
    discovery, spam blogs, influence, etc.
   Key differences between Web and Blogosphere
Web                                       Blogosphere
Web models assume dense graph structure   Blogosphere has a very sparse hyperlink structure
Not much interaction                      Interaction in the form of comments and replies
Static web pages                          Dynamic blog posts
Conventional web pages do not have tags   Blog posts have tags and categories
Modeling the Blogosphere
   Web models:
       Random graph
       Preferential attachment graph models
       Hybrid graph models
   Blogosphere models:
       Study temporal patterns of the blogosphere, such as how often
        people create blog posts and how posts are linked
       Use blogrolls to create a network of connected posts
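To make the preferential attachment idea concrete, here is a minimal sketch of one common variant: each new node links to m existing nodes chosen with probability proportional to their current degree. The function name, the star-shaped seed graph, and the parameters are assumptions for illustration, not a model taken from the slides.

```python
import random

def preferential_attachment(n, m, seed=0):
    """Grow an undirected graph to n nodes; every new node attaches to m
    distinct existing nodes, picked proportionally to their degree."""
    if m < 1 or n <= m:
        raise ValueError("need n > m >= 1")
    rng = random.Random(seed)
    edges = []
    endpoints = []  # each node appears once per incident edge
    # Seed graph (assumption): node m is linked to nodes 0..m-1.
    for v in range(m):
        edges.append((m, v))
        endpoints += [m, v]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            # Sampling from the endpoint list is degree-proportional sampling.
            chosen.add(rng.choice(endpoints))
        for v in sorted(chosen):
            edges.append((new, v))
            endpoints += [new, v]
    return edges

edges = preferential_attachment(n=50, m=2, seed=1)
print(len(edges))
```

Graphs grown this way tend to have a few highly connected nodes, which is why such models are used to generate artificial link data for comparing algorithms.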
Blog Clustering
   Automatic organization of the content
   Helps readers focus on interesting categories
   Keyword based:
       Brooks and Montanez 2006 pick the top 3 keywords to
        cluster blog posts
       Li et al. 2007 assign different weights to the title, body and
        comments of blog posts
   Collective wisdom based:
       Agarwal et al. 2008 use a category relation graph to merge
        categories and cluster blogs
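A rough sketch of the keyword-based idea: pick the top k keywords of each post so that posts sharing keywords can be grouped. Ranking words by TF-IDF is an assumption made here for illustration, as are the top_keywords helper and the sample posts; the cited papers' exact scoring is not reproduced.

```python
import math
from collections import Counter

def top_keywords(posts, k=3):
    """Return the k highest-scoring words of each post (TF-IDF, an assumption)."""
    docs = [p.lower().split() for p in posts]
    n = len(docs)
    # Document frequency: in how many posts each word occurs.
    df = Counter(w for d in docs for w in set(d))
    result = []
    for d in docs:
        tf = Counter(d)
        score = {w: tf[w] * math.log(n / df[w]) for w in tf}
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(score, key=lambda w: (-score[w], w))
        result.append(ranked[:k])
    return result

# Hypothetical posts.
posts = [
    "apple iphone iphone launch",
    "apple macbook macbook review",
    "apple politics election election",
]
keywords = top_keywords(posts, k=2)
print(keywords)
```

Words that appear in every post (here "apple") score zero, so they do not dominate the clusters.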
Blog Mining
   Valuable resources to track:
       Consumers’ beliefs and opinions
       Initial reaction to a launch
       Trends and buzzwords
   Blog conversations provide insights into how
    information flows and how opinions are shaped and
    influenced
   Pulse uses a Naïve Bayes classifier trained on
    annotated sentences to classify unlabeled data
   Attardi and Simi 2006 use opinionated words
    acquired from WordNet to improve blog retrieval
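The classifier step can be sketched as a small multinomial Naive Bayes with add-one smoothing, trained on annotated sentences and applied to unlabeled text. This is a generic illustration, not the Pulse implementation; the function names and the training sentences are hypothetical.

```python
import math
from collections import Counter, defaultdict

def train_nb(labeled):
    """Count classes and per-class words from (sentence, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in labeled:
        class_counts[label] += 1
        for w in text.lower().split():
            word_counts[label][w] += 1
            vocab.add(w)
    return class_counts, word_counts, vocab

def classify(model, text):
    """Return the label with the highest log-probability for the text."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    best, best_lp = None, -math.inf
    for label in sorted(class_counts):
        lp = math.log(class_counts[label] / total)           # class prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.lower().split():
            if w in vocab:                                    # skip unseen words
                lp += math.log((word_counts[label][w] + 1) / denom)
        if lp > best_lp:
            best, best_lp = label, lp
    return best

# Hypothetical annotated sentences.
training = [
    ("great phone love it", "pos"),
    ("love the screen", "pos"),
    ("terrible battery hate it", "neg"),
    ("hate the slow screen", "neg"),
]
model = train_nb(training)
print(classify(model, "love it"), classify(model, "hate the battery"))
```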
Community Discovery
   Content analysis and text analysis of the blog posts
    to identify communities
   Kleinberg et al. cluster all the expert communities
    together as authorities using an authority-based
    approach
   Kumar et al. extend it to include co-citations to
    extract all communities on the web
   Some researchers studied community extraction
    using newsgroups and discussion boards
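One well-known authority-based link analysis technique is Kleinberg's HITS, which scores pages as hubs and authorities. The sketch below assumes this is the kind of approach meant; it shows only the basic scoring iteration, not the community extraction step, and the edge list is hypothetical.

```python
import math

def hits(edges, iterations=50):
    """Iteratively compute hub and authority scores for a directed graph."""
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority of a node: sum of hub scores of nodes linking to it.
        auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in auth.values())) or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # Hub score of a node: sum of authority scores of nodes it links to.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += auth[dst]
        norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {n: v / norm for n, v in new_hub.items()}
    return hub, auth

# Hypothetical links: "a" and "b" both cite "c"; "a" also cites "d".
edges = [("a", "c"), ("b", "c"), ("a", "d")]
hub, auth = hits(edges)
print(max(auth, key=auth.get), max(hub, key=hub.get))
```

Nodes with high authority scores that are cited by the same hubs are natural candidates for one community.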
Influence in Blogs
   Influential bloggers:
       Are potential market-movers
       Sway opinions in political campaigns
       Troubleshoot the problems of peer consumers
       Useful for “word-of-mouth” advertising of products
   Finding influential blog sites is different from
    identifying influential bloggers
   Agarwal et al. studied the influence of a blogger by
    modeling the blog site as a graph
Trust and Reputation
   Overwhelming amount of collective wisdom
   Difficult for a reader to decide whom to trust
   Assess the reputation of influential members in the
    community
   Not much work deals with trust in the Blogosphere
   Kale et al. 2007 mined sentiments about the cited
    blog post using a window of words around the links
   They compute trust in a network of blog sites

   Use comments on the blog post to judge a blogger’s
    trust
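The "window of words around the links" idea can be sketched as follows: take a few tokens on each side of a link and count positive and negative words. The tiny lexicons, the <LINK> placeholder token, and the link_sentiment helper are hypothetical; this is not the method of Kale et al., only an illustration of windowed sentiment scoring.

```python
POSITIVE = {"great", "insightful", "agree", "excellent"}    # hypothetical lexicon
NEGATIVE = {"wrong", "misleading", "disagree", "terrible"}  # hypothetical lexicon

def link_sentiment(tokens, link_index, window=3):
    """Score the words within `window` tokens on each side of the link."""
    start = max(0, link_index - window)
    before = tokens[start:link_index]
    after = tokens[link_index + 1:link_index + 1 + window]
    context = before + after
    pos = sum(1 for w in context if w.lower() in POSITIVE)
    neg = sum(1 for w in context if w.lower() in NEGATIVE)
    return pos - neg  # > 0 suggests a positive citation, < 0 a negative one

tokens = "I completely agree with this excellent <LINK> although the title is misleading".split()
score = link_sentiment(tokens, tokens.index("<LINK>"), window=3)
print(score)
```

Aggregating such scores over all links between two blog sites would give a signed edge weight that a trust computation could start from.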
Filtering Spam blogs
   Splogs == Spam blogs
   Degrade search quality and waste network
    resources
   Early research used web spam detection
    techniques
   Kolari et al. 2006 use content and hyperlinks to train
    an SVM-based classifier to classify a blog post as
    spam
   Content on blog sites is dynamic, so content-based
    spam filters are ineffective
   Lin et al. propose a self similarity based splog
    detection algorithm based on patterns in posting
    times of splogs, content similarity and similar links in
Tools and APIs
   Tools to simulate social networks to study their
    properties
   Multi-agent simulation tools
   Analysis of social networks
   Visualization of social networks
   APIs:
       Facebook
       StumbleUpon
       Del.icio.us
Methodologies
   Centrality measures
   Content analysis
   Link analysis
   Decision theoretic approaches
   Agent-based modeling
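As an example of a centrality measure, the sketch below computes degree centrality on an undirected graph, that is, each node's degree divided by the maximum possible degree. The node names are hypothetical, and degree centrality is only one of several measures that could be meant here.

```python
def degree_centrality(edges):
    """Degree centrality for an undirected graph: degree / (n - 1)."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}

# Hypothetical friendship links.
edges = [("ann", "bob"), ("ann", "cat"), ("ann", "dan"), ("bob", "cat")]
centrality = degree_centrality(edges)
print(centrality)
```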
Datasets
   Nielsen Buzzmetrics dataset
       About 14M blog posts from 3M blog sites
       Annotated with 1.7M blog-blog links
       Up to half of the blog outlinks are missing
       Only 51% of the total blog posts are in English
   Enron Email dataset
       Emails from about 150 users at Enron
       0.5M messages
       Social networks between users were studied based on link
        construction
       Email senders and recipients are used to construct links
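Link construction from senders and recipients can be sketched as a weighted, directed edge list in which the weight counts how many emails went from one user to another. The addresses and the email_edges helper are hypothetical, and the record format is an assumption rather than the layout of the Enron data.

```python
from collections import Counter

def email_edges(messages):
    """Count directed sender -> recipient links; weight = number of emails."""
    weights = Counter()
    for sender, recipients in messages:
        for recipient in recipients:
            if recipient != sender:  # ignore self-addressed copies
                weights[(sender, recipient)] += 1
    return weights

# Hypothetical messages as (sender, list of recipients).
messages = [
    ("alice@example.com", ["bob@example.com", "carol@example.com"]),
    ("bob@example.com", ["alice@example.com"]),
    ("alice@example.com", ["bob@example.com"]),
]
weights = email_edges(messages)
print(weights.most_common(1))
```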
Experiments and Performance Metrics
   Concepts like influence, trust, etc. in Blogosphere
    are socio-psychological and subjective
   Evaluating them is non-trivial
   Hard to compare different approaches since there is
    no ground truth!
   Most existing work uses search engine rankings as
    the baseline
   A Web 2.0 application, Digg, was used to evaluate
    influence in the blogosphere
Finding influential bloggers
   “A blogger can be influential if s/he has more than
    one influential blog post”
   Properties that represent influential blog posts:
       Recognition – An influential blog post is recognized by
        many
       Activity Generation – Number of comments received and
        amount of discussion initiated
       Novelty – Number of outlinks
       Eloquence – Length of a post
   Data Collection
       The Unofficial Apple Weblog
       Crawled 10,000 posts
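The four properties above can be combined into a single score per post. The sketch below is a simplified, non-recursive illustration: the additive form, the weights, and the choice to subtract outlinks (assuming that more outlinks mean less novelty) are all assumptions, as is scoring a blogger by their best post. It is not the model from the case study.

```python
def post_influence(inlinks, comments, outlinks, length,
                   w_in=1.0, w_com=1.0, w_out=1.0, w_len=0.001):
    """Combine the four properties into one score (illustrative weights)."""
    return (w_in * inlinks        # recognition
            + w_com * comments    # activity generation
            - w_out * outlinks    # novelty: outlinks reduce the score (assumption)
            + w_len * length)     # eloquence

def blogger_influence(posts):
    """Score each blogger by their most influential post (an assumption)."""
    best = {}
    for author, stats in posts:
        score = post_influence(**stats)
        best[author] = max(score, best.get(author, float("-inf")))
    return sorted(best.items(), key=lambda item: item[1], reverse=True)

# Hypothetical post statistics.
posts = [
    ("blogger_x", dict(inlinks=12, comments=30, outlinks=2, length=800)),
    ("blogger_x", dict(inlinks=1, comments=2, outlinks=5, length=300)),
    ("blogger_y", dict(inlinks=3, comments=4, outlinks=1, length=1500)),
]
ranking = blogger_influence(posts)
print(ranking)
```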
Results
   Top 5 bloggers according to TUAW and proposed
    model
   Some bloggers are both active and influential
   Some are active but not influential
   Some influential bloggers are not active
   The rest are inactive and non-influential
Verification
   Challenges:
       No testing and training data
       Absence of ground truth
   Use another Web 2.0 site, Digg, to provide a reference
    point
   A better-liked post will have a higher score on Digg
   Digg returns the top 100 voted posts
   Take the intersection of the Digg top 100 and the top 20 from their model
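The comparison against Digg amounts to a set intersection between two ranked lists. A minimal sketch, with hypothetical post identifiers and a hypothetical overlap helper:

```python
def overlap(model_ranking, reference_ranking, k=20, n=100):
    """Count how many of the model's top k posts appear in the reference top n."""
    top_model = set(model_ranking[:k])
    top_reference = set(reference_ranking[:n])
    shared = top_model & top_reference
    return len(shared), len(shared) / max(1, len(top_model))

# Hypothetical rankings (best first).
model_ranking = ["p1", "p2", "p3", "p4"]
digg_ranking = ["p3", "p9", "p1"]
count, fraction = overlap(model_ranking, digg_ranking, k=3, n=100)
print(count, fraction)
```

A larger overlap suggests the model's notion of influence agrees with what Digg users voted for, although Digg is a reference point and not ground truth.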
Verification
   Importance of each parameter
   Inlinks > comments > outlinks > blog post length in
    decreasing order of importance to influence
    estimation
Blogosphere and Social Networks
Blogosphere                                       Social Networks
Influential nodes have “been influencing”         Influential nodes “could influence”
To share ideas or opinions                        To stay in touch or make friends
Reputation is based on previous responses         Reputation is based on the number of connections
Person-to-group interaction                       Person-to-person interaction
Community experience                              Friendship experience
Loosely defined graph                             Strictly defined graph
Nodes could be bloggers, blog posts, blog sites   Nodes are members
Implicit links                                    Predefined links
Directed graph                                    Undirected graph
Conclusion
   Virtual communities and a low barrier to publication are
    helping the growth of the blogosphere
   A lot is yet to be done in terms of research specific to
    the blogosphere
   Need accurate ground truth data
   An experiment and evaluation plan should be devised
    to allow objective analysis of different algorithms
   Thank you!
References
   http://www.sigkdd.org/explorations/issues/10-1-2008-07/V10N1-Blogosphere.pdf
   http://videolectures.net/kdd08_liu_briat/

H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 

Blogosphere

  • 1. Blogosphere: Research Issues, Tools, and Applications
      Nitin Agarwal and Huan Liu
      Sunil Bandla
      INF384H – Fall 2011
  • 2. Agenda
      - Introduction
      - Research issues
      - Tools and Methods
      - Case Study
      - Blogosphere and Social Networks
  • 3. Web 2.0
      - It is the reason behind the surge of interest in online communities
      - Former consumers are now producers
      - Collaborative environment
      - User-generated content
      - Collective wisdom
      - Web 2.0 services:
          - Blogs, wikis, social networking sites, social tagging
          - WordPress, Wikipedia, Facebook, YouTube, Twitter, Yelp
  • 4. Social Networks
      - “A social network is a social structure made up of individuals connected by one or more types of interdependency, such as friendship, common interest…” – Wikipedia
      - Web 2.0 is enabling virtual social networks
      - Size and connectedness vary across networks
      - Examples:
          - Friendship networks (Facebook, MySpace)
          - Media sharing (Flickr, YouTube)
  • 5. Two examples of valuable blogs
      - “The site, chock full of advertising, is a moneymaking machine – so much so that Ms. Armstrong and her husband have both quit their regular jobs.” The reason? The advertisers are eager to influence her 850,000 readers.
      - Arnold Kim, founder and senior editor of MacRumors.com. “The site places MacRumors No. 2 on a list of the ‘25 most valuable blogs,’ …” What is the potential value? “Two of the other tech-oriented blogs on its list, …, were sold earlier this year, reportedly for sums in excess of $25 million.”
      Source: The New York Times
      Slide Credit: Liu & Nitin
  • 6. Blogosphere
      - Blog sites
      - Bloggers
      - Blog posts
      - Blogroll
      - Permalinks
      - Low barrier to publication
      - Readers can comment instantly, which gives the blogger a feeling of satisfaction
      - Individual vs. community blogs
  • 7. Blogosphere
      - Complex social networks
      - Bloggers, blog posts, and blog sites become nodes
      - Relationships are represented by edges between nodes
      - Inlinks and outlinks
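The graph view on slide 7 can be sketched in a few lines. This is a minimal illustration only: the blog names and the `build_link_index` helper are hypothetical and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical link data: (source blog, target blog) pairs.
links = [
    ("blog_a", "blog_b"),
    ("blog_a", "blog_c"),
    ("blog_b", "blog_c"),
    ("blog_d", "blog_c"),
]

def build_link_index(edges):
    """Return (outlinks, inlinks) adjacency maps for a directed blog graph."""
    outlinks = defaultdict(set)
    inlinks = defaultdict(set)
    for source, target in edges:
        outlinks[source].add(target)  # edge leaving the source node
        inlinks[target].add(source)   # the same edge, seen from the target
    return outlinks, inlinks

outlinks, inlinks = build_link_index(links)
print(sorted(inlinks["blog_c"]))   # blogs that link to blog_c
print(sorted(outlinks["blog_a"]))  # blogs that blog_a links to
```

Keeping both maps makes inlink counts (a common recognition signal) and outlink counts available without rescanning the edge list.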
  • 8. Agenda
      - Introduction
      - Research issues
      - Tools and Methods
      - Case Study
      - Blogosphere and Social Networks
  • 9. Modeling the Blogosphere
      - Helps in generating an artificial dataset to compare algorithms
      - Study patterns that could explain community discovery, spam blogs, influence, etc.
      - Key differences between Web and Blogosphere:
          Web                                         | Blogosphere
          Web models assume dense graph structure     | Blogosphere has a very sparse hyperlink structure
          Not much interaction                        | Interaction in the form of comments and replies
          Static web pages                            | Dynamic blog posts
          Conventional web pages do not have tags     | Blog posts have tags and categories
  • 10. Modeling the Blogosphere
      - Web models:
          - Random graph
          - Preferential attachment graph models
          - Hybrid graph models
      - Blogosphere models:
          - Study temporal patterns of the blogosphere, such as how often people create blog posts and how the posts are linked
          - Use blogrolls to create a network of connected posts
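As an illustration of the preferential attachment family named on slide 10, here is a minimal sketch. It assumes each new node adds exactly one link, which is a simplification chosen for brevity; the function name and parameters are illustrative.

```python
import random

def preferential_attachment(num_nodes, seed=0):
    """Grow a graph one node at a time; each new node links to one existing
    node chosen with probability proportional to that node's current degree."""
    rng = random.Random(seed)
    edges = [(0, 1)]    # start from a single edge
    endpoints = [0, 1]  # each node appears once per incident edge
    for new_node in range(2, num_nodes):
        # Sampling uniformly from the endpoint list is degree-proportional.
        target = rng.choice(endpoints)
        edges.append((new_node, target))
        endpoints.extend([new_node, target])
    return edges

edges = preferential_attachment(50)
print(len(edges))
```

Synthetic graphs generated this way are one option for the artificial datasets mentioned on slide 9, although a blog-specific model would also need the sparsity and temporal patterns the slides describe.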
  • 11. Blog Clustering
      - Automatic organization of the content
      - Helps readers focus on interesting categories
      - Keyword based:
          - Brooks and Montanez (2006) pick the top 3 keywords to cluster blog posts
          - Li et al. (2007) assign different weights to the title, body, and comments of blog posts
      - Collective wisdom based:
          - Agarwal et al. (2008) use a category relation graph to merge categories and cluster blogs
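A rough sketch of the keyword-based idea follows. It is not the method of any paper cited on slide 11: keywords are ranked by raw term frequency, posts are grouped only when their top-3 keyword sets match exactly, and the stopword list and sample posts are made up for illustration.

```python
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "with"}

def top_keywords(text, k=3):
    """Return the k most frequent non-stopword tokens as a frozenset."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    counts = Counter(w for w in words if w and w not in STOPWORDS)
    # Sort by frequency, then alphabetically so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return frozenset(word for word, _ in ranked[:k])

def cluster_by_keywords(posts, k=3):
    """Group post ids whose top-k keyword sets are identical."""
    clusters = defaultdict(list)
    for post_id, text in posts.items():
        clusters[top_keywords(text, k)].append(post_id)
    return dict(clusters)

posts = {
    "p1": "iPhone battery iPhone battery Apple Apple",
    "p2": "Apple iPhone battery tips",
    "p3": "Election debate election polls debate polls",
}
clusters = cluster_by_keywords(posts)
print(clusters)
```

A realistic version would weight terms (for example with TF-IDF) and compare keyword vectors by similarity rather than requiring exact matches.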
  • 12. Blog Mining
      - Valuable resources to track:
          - Consumers’ beliefs and opinions
          - Initial reaction to a launch
          - Trends and buzzwords
      - Blog conversations provide insights into how information flows and how opinions are shaped and influenced
      - Pulse uses a Naïve Bayes classifier trained on annotated sentences to classify unlabeled data
      - Attardi and Simi (2006) use opinionated words acquired from WordNet to improve blog retrieval
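To make the Naïve Bayes step on slide 12 concrete, here is a small multinomial Naïve Bayes sketch with add-one smoothing. It is a generic textbook version, not the Pulse system; the training sentences and labels are invented for illustration.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labeled_sentences):
    """Count words per label from (sentence, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocabulary = set()
    for sentence, label in labeled_sentences:
        tokens = sentence.lower().split()
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return word_counts, label_counts, vocabulary

def classify(sentence, model):
    """Return the label with the highest log posterior (add-one smoothing)."""
    word_counts, label_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocabulary)
        for token in sentence.lower().split():
            score += math.log((word_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical annotated sentences.
training = [
    ("great battery life", "positive"),
    ("love the new screen", "positive"),
    ("terrible battery life", "negative"),
    ("hate the slow screen", "negative"),
]
model = train_naive_bayes(training)
print(classify("love the battery", model))
print(classify("hate the screen", model))
```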
  • 13. Community Discovery
      - Content analysis and text analysis of the blog posts to identify communities
      - Kleinberg et al. cluster all the expert communities together as authorities using an authority-based approach
      - Kumar et al. extend it to include co-citations to extract all communities on the web
      - Some researchers studied community extraction using newsgroups and discussion boards
  • 14. Influence in Blogs
      - Influential bloggers:
          - Are potential market-movers
          - Sway opinions in political campaigns
          - Troubleshoot the problems of peer consumers
          - Are useful for “word-of-mouth” advertising of products
      - Finding influential blog sites is different from identifying influential bloggers
      - Agarwal et al. studied the influence of a blogger by modeling the blog site as a graph
  • 15. Trust and Reputation
      - Overwhelming amount of collective wisdom
      - Difficult for the reader to decide whom to trust
      - Assess the reputation of influential members in the community
      - Not much work deals with trust in the blogosphere
      - Kale et al. (2007) mined sentiments about the cited blog post using a window of words around the links
      - They compute trust in a network of blog sites
      - Use comments on the blog post to judge a blogger’s trust
  • 16. Filtering Spam Blogs
      - Splogs = spam blogs
      - Degrade search quality and waste network resources
      - Initial researchers used web spam detection techniques
      - Kolari et al. (2006) use content and hyperlinks to train an SVM-based classifier to classify a blog post as spam
      - Content on blog sites is dynamic, so content-based spam filters are ineffective
      - Lin et al. propose a self-similarity-based splog detection algorithm based on patterns in posting times of splogs, content similarity and similar links in
  • 17. Agenda
      - Introduction
      - Research issues
      - Tools and Methods
      - Case Study
      - Blogosphere and Social Networks
  • 18. Tools and APIs
      - Tools to simulate social networks to study their properties
      - Multi-agent simulation tools
      - Analysis of social networks
      - Visualization of social networks
      - APIs:
          - Facebook
          - StumbleUpon
          - Del.icio.us
  • 19. Methodologies
      - Centrality measures
      - Content analysis
      - Link analysis
      - Decision theoretic approaches
      - Agent-based modeling
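Of the methodologies listed on slide 19, centrality measures are the simplest to show in code. The sketch below computes normalized degree centrality on an undirected graph; it assumes the edge list has no self-loops, and the node names are hypothetical.

```python
def degree_centrality(edges):
    """Normalized degree centrality: degree / (n - 1) for each node.

    Assumes an undirected edge list without self-loops.
    """
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    n = len(neighbors)
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}

# Hypothetical small network with one well-connected node.
edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("a", "b")]
centrality = degree_centrality(edges)
print(centrality)
```

Other measures such as betweenness or closeness follow the same pattern of scoring nodes from graph structure, but need shortest-path computations.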
  • 20. Datasets
      - Nielsen Buzzmetrics dataset
          - About 14M blog posts from 3M blog sites
          - Annotated with 1.7M blog-blog links
          - Up to half of the blog outlinks are missing
          - Only 51% of the total blog posts are in English
      - Enron Email dataset
          - Emails from about 150 users at Enron
          - 0.5M messages
          - Social networks between users were studied based on link construction
          - Email senders and recipients are used to construct links
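The link construction described for the email dataset can be sketched as follows. The message records and field names (`sender`, `recipients`) are hypothetical and do not reflect the actual Enron file format.

```python
from collections import Counter

# Hypothetical parsed messages; real data would need header parsing first.
messages = [
    {"sender": "alice", "recipients": ["bob", "carol"]},
    {"sender": "bob", "recipients": ["alice"]},
    {"sender": "alice", "recipients": ["bob"]},
]

def build_email_links(msgs):
    """Count directed sender -> recipient links across all messages."""
    links = Counter()
    for msg in msgs:
        for recipient in msg["recipients"]:
            links[(msg["sender"], recipient)] += 1
    return links

links = build_email_links(messages)
print(links.most_common())
```

The counts act as edge weights, so frequent correspondents stand out from one-off contacts.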
  • 21. Experiments and Performance Metrics
      - Concepts like influence and trust in the blogosphere are socio-psychological and subjective
      - Evaluating them is non-trivial
      - Hard to compare different approaches since there is no ground truth!
      - Search engine rankings serve as the baseline for most existing work
      - A Web 2.0 application, Digg, was used to evaluate influence in the blogosphere
  • 22. Agenda
      - Introduction
      - Research issues
      - Tools and Methods
      - Case Study
      - Blogosphere and Social Networks
  • 23. Finding Influential Bloggers
      - “A blogger can be influential if s/he has more than one influential blog post”
      - Properties that represent influential blog posts:
          - Recognition: an influential blog post is recognized by many
          - Activity generation: number of comments received and amount of discussion initiated
          - Novelty: number of outlinks
          - Eloquence: length of a post
      - Data collection:
          - The Unofficial Apple Weblog (TUAW)
          - Crawled 10,000 posts
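The four properties on slide 23 can be combined into a toy score. The sketch below is a deliberately simplified stand-in, not the model evaluated in the case study: the weights, the length cap, the threshold, and the sample posts are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class Post:
    author: str
    inlinks: int    # recognition
    comments: int   # activity generation
    outlinks: int   # novelty (more outlinks suggests less novelty)
    length: int     # eloquence, in words

def post_score(post, w_in=1.0, w_com=1.0, w_out=1.0, length_scale=500):
    """Toy influence score; all weights are illustrative assumptions."""
    length_factor = min(post.length / length_scale, 1.0)
    raw = w_in * post.inlinks + w_com * post.comments - w_out * post.outlinks
    return length_factor * raw

def influential_bloggers(posts, threshold=10.0):
    """Bloggers with more than one post scoring at or above the threshold."""
    counts = {}
    for post in posts:
        if post_score(post) >= threshold:
            counts[post.author] = counts.get(post.author, 0) + 1
    return sorted(author for author, count in counts.items() if count > 1)

posts = [
    Post("ann", inlinks=12, comments=30, outlinks=2, length=600),
    Post("ann", inlinks=8, comments=10, outlinks=1, length=500),
    Post("bob", inlinks=1, comments=2, outlinks=5, length=300),
    Post("bob", inlinks=20, comments=5, outlinks=0, length=250),
]
print(influential_bloggers(posts))
```

The "more than one influential post" rule in `influential_bloggers` follows the wording quoted on the slide; the scoring formula itself is only a placeholder.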
  • 24. Results
      - Top 5 bloggers according to TUAW and the proposed model
      - Some bloggers are both active and influential
      - Some of them are active but not influential
      - Some influential bloggers are not active
      - Inactive and non-influential bloggers
  • 25. Verification
      - Challenges:
          - No testing and training data
          - Absence of ground truth
      - Use another Web 2.0 site, Digg, to provide a reference point
      - A more liked post will have a higher score on Digg
      - Digg returns the top 100 voted posts
      - Intersection of the Digg top 100 and the top 20 from their model
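The intersection check on slide 25 amounts to a set overlap between two rankings. A minimal sketch, with a hypothetical `overlap` helper and made-up post ids:

```python
def overlap(model_ranking, reference_ranking, k=20):
    """Return the model's top-k posts that also appear in the reference list,
    together with the fraction of the top-k they represent."""
    top_k = model_ranking[:k]
    if not top_k:
        return [], 0.0
    reference = set(reference_ranking)
    shared = [post for post in top_k if post in reference]
    return shared, len(shared) / len(top_k)

# Hypothetical rankings; the slides use the Digg top 100 and a model top 20.
model_top = ["p3", "p1", "p7", "p9"]
digg_top = ["p1", "p2", "p3", "p4"]
shared, fraction = overlap(model_top, digg_top, k=4)
print(shared, fraction)
```

Because the reference list is only a proxy for ground truth, the overlap is better read as a sanity check than as an accuracy figure.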
  • 26. Verification
      - Importance of each parameter
      - Inlinks > comments > outlinks > blog post length, in decreasing order of importance to influence estimation
  • 27. Agenda
      - Introduction
      - Research issues
      - Tools and Methods
      - Case Study
      - Blogosphere and Social Networks
  • 28. Blogosphere and Social Networks
          Blogosphere                                        | Social Networks
          Influential nodes have “been influencing”          | Influential nodes “could influence”
          To share ideas or opinions                         | To stay in touch or make friends
          Reputation is based on previous responses          | Reputation is based on the number of connections
          Person-to-group interaction                        | Person-to-person interaction
          Community experience                               | Friendship experience
          Loosely defined graph                              | Strictly defined graph
          Nodes could be bloggers, blog posts, blog sites    | Nodes are members
          Implicit links                                     | Predefined links
          Directed graph                                     | Undirected graph
  • 29. Conclusion
      - Virtual communities and a low barrier to publication are helping the growth of the blogosphere
      - A lot is yet to be done in terms of research specific to the blogosphere
      - Need accurate ground truth data
      - Experiments and an evaluation plan should be devised to allow objective analysis of different algorithms
  • 30. Thank you!
  • 31. References
      - http://www.sigkdd.org/explorations/issues/10-1-2008-07/V10N1-Blogosphere.pdf
      - http://videolectures.net/kdd08_liu_briat/

Editor's Notes

  1. http://www.nytimes.com/2008/08/14/technology/14women.html?pagewanted=all and http://www.nytimes.com/2008/07/21/technology/21blogger.html?_r=1&oref=slogin. The two examples show the new trend of advertising and the value of good blogs.
  2. Reported improved clustering as compared to that using tags
  3. Mining sentiments from free text forms poses several challenges
  4. Moreover, spammers can copy the content from some regular blog posts to evade content-based spam filters. Link-based spam filters can easily be beaten by creating legitimate links.
  5. Various social networking sites provide APIs nowadays. This helps developers get limited access to data. APIs are also used to write numerous applications that extend the functionalities of these sites and create mashups.
  6. In experiments we observe that the number of outlinks is negatively correlated with the number of comments received on a blog post, which means more outlinks reduce people's interest/attention. We also observe that blog post length is positively correlated with the number of comments received, which means longer blog posts attract people's interest/attention.
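The correlations mentioned in note 6 can be checked with a Pearson coefficient. The sketch below uses invented numbers purely to show the calculation; it does not reproduce the experiments' data or results.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-post measurements.
outlinks = [0, 1, 2, 3, 4]
comments = [20, 16, 11, 9, 4]
r = pearson(outlinks, comments)
print(r < 0)  # a negative r means more outlinks go with fewer comments
```

A value near -1 or +1 indicates a strong linear relationship; it does not by itself show that outlinks cause the drop in comments.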