Your SlideShare is downloading. ×
Mike Thelwall: Introduction to Webometrics
Mike Thelwall: Introduction to Webometrics
Mike Thelwall: Introduction to Webometrics
Mike Thelwall: Introduction to Webometrics
Mike Thelwall: Introduction to Webometrics
Mike Thelwall: Introduction to Webometrics
Mike Thelwall: Introduction to Webometrics
Mike Thelwall: Introduction to Webometrics
Mike Thelwall: Introduction to Webometrics
Mike Thelwall: Introduction to Webometrics
Mike Thelwall: Introduction to Webometrics
Mike Thelwall: Introduction to Webometrics
Mike Thelwall: Introduction to Webometrics
Mike Thelwall: Introduction to Webometrics
Mike Thelwall: Introduction to Webometrics
Mike Thelwall: Introduction to Webometrics
Mike Thelwall: Introduction to Webometrics
Mike Thelwall: Introduction to Webometrics
Mike Thelwall: Introduction to Webometrics
Mike Thelwall: Introduction to Webometrics
Mike Thelwall: Introduction to Webometrics
Mike Thelwall: Introduction to Webometrics
Mike Thelwall: Introduction to Webometrics
Mike Thelwall: Introduction to Webometrics
Mike Thelwall: Introduction to Webometrics
Mike Thelwall: Introduction to Webometrics
Mike Thelwall: Introduction to Webometrics
Mike Thelwall: Introduction to Webometrics
Mike Thelwall: Introduction to Webometrics
Mike Thelwall: Introduction to Webometrics
Mike Thelwall: Introduction to Webometrics
Mike Thelwall: Introduction to Webometrics
Mike Thelwall: Introduction to Webometrics
Mike Thelwall: Introduction to Webometrics
Mike Thelwall: Introduction to Webometrics
Mike Thelwall: Introduction to Webometrics
Mike Thelwall: Introduction to Webometrics
Mike Thelwall: Introduction to Webometrics
Mike Thelwall: Introduction to Webometrics
Mike Thelwall: Introduction to Webometrics
Mike Thelwall: Introduction to Webometrics
Mike Thelwall: Introduction to Webometrics
Mike Thelwall: Introduction to Webometrics
Mike Thelwall: Introduction to Webometrics
Upcoming SlideShare
Loading in...5
×

Thanks for flagging this SlideShare!

Oops! An error has occurred.

×
Saving this for later? Get the SlideShare app to save on your phone or tablet. Read anywhere, anytime – even offline.
Text the download link to your phone
Standard text messaging rates apply

Mike Thelwall: Introduction to Webometrics

2,631

Published on

Presentation to the second LIS DREaM workshop held at the British Library on Monday 30th January 2012. …

Presentation to the second LIS DREaM workshop held at the British Library on Monday 30th January 2012.

More information available at: http://lisresearch.org/dream-project/dream-event-3-workshop-monday-30-january-2012/

Published in: Technology
0 Comments
5 Likes
Statistics
Notes
  • Be the first to comment

No Downloads
Views
Total Views
2,631
On Slideshare
0
From Embeds
0
Number of Embeds
5
Actions
Shares
0
Downloads
57
Comments
0
Likes
5
Embeds 0
No embeds

Report content
Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
No notes for slide
  • Example of highly popular amateur video triggering lots of discussions
  • Webometric Analyst uses the YouTube API to download information on one or more videos. [Is Windows only]
  • Friend network
  • Friends in common
  • Intelligent design debate
  • Arjona link
  • Transcript

    • 1. Introduction to Webometrics Mike Thelwall @mikethelwall Professor of Information Science, Statistical Cybermetrics Research Gp, University of Wolverhampton
    • 2. Reminder of pre-workshop task Delegates were asked to join YouTube and leave comments and replies to earlier comments on the video: Department of Library and Information Science, Delhi http://www.youtube.com/watch?v=_-OAsF9uRfc These contributions will form part of the discussion at the end of the session, and include reference to the self-declared age and gender information from YouTube.
    • 3. Overview of “Webometrics”
      • What is Webometrics?
        • Gathering, processing and analysing large scale data from the web (web pages, hyperlinks, blogs, Web 2.0) for many purposes that include online communication
      • What can Webometrics offer other researchers?
        • Software to gather data from web sites, search engines, social network sites and blogs; methods to extract useful patterns
      • Common data sources
        • Webometric Analyst software: Twitter, YouTube, the Web, Technorati, Bing
        • Bespoke: Any resource with an API, page scraping of other sites, SocSciBot web crawler
      http://lexiurl.wlv.ac.uk
    • 4. 1. Background: Webometrics
      • Webometrics is about gathering data on the Web, and measuring aspects of the Web:
        • web sites
        • web pages
        • hyperlinks
        • YouTube video commenter networks
        • web search engine results
        • MySpace Friend networks
        • Twitter or blog trends
      • … for varied social science purposes
    • 5. New problems: Web-based phenomena
      • Webometrics can analyse online academic communication
        • Why do academic web sites interlink?
        • Which academic web sites interlink?
        • What academic interlinking patterns exist?
        • Which web sites/groups/documents have the most online impact, and why?
    • 6. Old problems: Offline phenomena reflected online
      • Some offline phenomena have measurable online reflections
        • International communication
        • Inter-university collaboration
        • University-business collaboration
        • The impact or spread of ideas
        • Public opinion about science
    • 7. Example: The online impact of research groups (NetReAct)
    • 8. Normalised linking, smallest countries removed Geopolitical connected Sweden Finland Norway UK Germany Austria Switzerland Poland Italy Belgium Spain France NL Example: Links between EU universities
    • 9. International biofuels research network
    • 10. Data Gathering/Processing Tools
      • Webometric Analyst – web citations, web text, YouTube, Flickr, Technorati
        • Submits thousands of queries to Bing and summarises the results in standard ways
      • SocSciBot – links, web text
        • Web Crawler &
        • analyser
    • 11. 2. altmetrics in traditional research evaluation
      • Altmetrics can supplement traditional citation impact with non-traditional online impact
        • E.g., educational, discussion-based
      • Often weaker than citation data but useful for research groups that have non-standard types of impacts
    • 12. The Integrated Online Impact Indicator (IOI)
      • Combines a range of online sources into one indicator
        • Google Scholar +
        • Google Books +
        • Course reading lists +
        • Google Blogs +
        • PowerPoint presentations = IOI
      • OR select individual separate components
      Invented by Kayvan Kousha
    • 13. New source 1: Google Scholar
      • Wider evidence of academic impact
      • Wider types of academic publications, some non-academic publications
      • Not reliable
      • Coverage variable
      • Can’t be automatically queried
      • Free
    • 14. New source 2: Google Books
      • Books typically not indexed in WoS or Scopus
      • Relevant in book-based disciplines (arts, humanities, some social sciences)
      • Reliability unknown but probably not good
      • Coverage variable
      • Can be automatically queried
      • Free [ Clifford Lynch ]
    • 15. New source 3: Course reading lists
      • Evidence of educational impact
      • Can automatically construct queries to detect individual articles in online syllabuses
      • Get results via advanced Google/Yahoo/Live Search queries
      • Works for most articles
        • Fails for short common article titles
    • 16. New source 4: Blogs
      • Evidence of impact on discussions
      • Educational impact, public dissemination evidence, academic impact in discursive subjects?
      • Not possible to automate in the largest database (Google Blogs)?
      • Not a well researched area
    • 17. New source 5: PowerPoint Presentations
      • Evidence of educational/scholarly impact
      • Especially relevant for discursive subjects?
      • Automated Live Search/Yahoo advanced queries
      • IOI = a*Scholar + b*PowerPoint + c*Blogs + d* Syllabus + e* Books
      Or use qualitative analyses of the different sources
    • 18. 2. Sentiment Strength Detection in the Social Web with SentiStrength
      • Detect positive and negative sentiment strength in short informal text
        • Develop workarounds for lack of standard grammar and spelling
        • Harness emotion expression forms unique to MySpace or CMC (e.g., :-) or haaappppyyy!!!)
        • Classify simultaneously as positive 1-5 AND negative 1-5 sentiment
      Thelwall, M., Buckley, K., & Paltoglou, G. (in press).  Sentiment strength detection for the social Web . Journal of the American Society for Information Science and Technology . Thelwall, M., Buckley, K., Paltoglou, G., Cai, D., & Kappas, A. (2010).  Sentiment strength detection in short informal text . Journal of the American Society for Information Science and Technology , 61(12), 2544-2558.
    • 19. SentiStrength Algorithm - Core
      • List of 2,489 positive and negative sentiment term stems and strengths (1 to 5), e.g.
        • ache = -2, dislike = -3, hate=-4, excruciating -5
        • encourage = 2, coolest = 3, lover = 4
      • Sentiment strength is highest in sentence; or highest sentence if multiple sentences
    • 20.
      • My legs ache.
      • You are the coolest.
      • I hate Paul but encourage him.
      -2 3 -4 2 1, -2 positive, negative 3, -1 2, -4
    • 21. Extra sentiment methods
      • spelling correction nicce -> nice
      • booster words alter strength very happy
      • negating words flip emotions not nice
      • repeated letters boost sentiment/+ve niiiice
      • emoticon list :) =+2
      • exclamation marks count as +2 unless –ve hi!
      • repeated punctuation boosts sentiment good!!!
      • negative emotion ignored in questions u h8 me?
      • Sentiment idiom list shock horror = -2
      Online as http://sentistrength.wlv.ac.uk/
    • 22. Tests against human coders SentiStrength agrees with humans as much as they agree with each other 1 is perfect agreement, 0 is random agreement Data set Positive scores -correlation with humans Negative scores -correlation with humans YouTube 0.589 0.521 MySpace 0.647 0.599 Twitter 0.541 0.499 Sports forum 0.567 0.541 Digg.com news 0.352 0.552 BBC forums 0.296 0.591 All 6 data sets 0.556 0.565
    • 23. Why the bad results for BBC? (and Digg)
      • Irony, sarcasm and expressive language e.g.,
        • David Cameron must be very happy that I have lost my job.
        • It is really interesting that David Cameron and most of his ministers are millionaires.
        • Your argument is a joke .
      $
    • 24. 2. Twitter – sentiment in major media events
      • Analysis of a corpus of 1 month of English Twitter posts (35 Million, from 2.7M accounts)
      • Automatic detection of spikes (events)
      • Assessment of whether sentiment changes during major media events
    • 25. Automatically-identified Twitter spikes 9 Mar 2010 9 Feb 2010 Proportion of tweets mentioning keyword Thelwall, M., Buckley, K., & Paltoglou, G. (2011).  Sentiment in Twitter events .  Journal of the American Society for Information Science and Technology,  62(2), 406-418.
    • 26. Chile matching posts Sentiment strength Subj. Increase in –ve sentiment strength 9 Feb 2010 9 Feb 2010 Date and time Date and time 9 Mar 2010 9 Mar 2010 Av. +ve sentiment Just subj. Av. -ve sentiment Just subj. Proportion of tweets mentioning Chile
    • 27. #oscars % matching posts Sentiment strength Subj. Increase in –ve sentiment strength Date and time Date and time 9 Feb 2010 9 Feb 2010 9 Mar 2010 9 Mar 2010 Av. +ve sentiment Just subj. Av. -ve sentiment Just subj. Proportion of tweets mentioning the Oscars
    • 28. Sentiment and spikes
      • Statistical analysis of top 30 events:
        • Strong evidence that higher volume hours have stronger negative sentiment than lower volume hours
        • No evidence that higher volume hours have different positive sentiment strength than lower volume hours
      • => Spikes are typified by small increases in negativity
    • 29. 3. YouTube Video comments
      • 1000 comm. per video via Webometric Analyst (or the YouTube API)
      • Good source of social web text data
      • Analysis of all comments on a pseudo-random sample of 35,347 videos with < 1000 comments
    • 30.  
    • 31. Using Webometric Analyst
      • Download free from http://lexiurl.wlv.ac.uk
      • Start, select classic interface, YouTube Tab
    • 32. Reply networks
      • Illustrate the replies to a YouTube video in network form
      • Reveal age and gender of posters
      • Reveal patterns of discussion in the replies (if any)
      • Take up to 25 minutes to make per video with Webometric Analyst
    • 33. Reply network Extended core interactions 2x2=5 video Nodes (people) blue = male pink = female Arrows (replies) red = happy replies black = angry replies
    • 34.
      • The 10 most ridiculous Black Metal videos
      • A very sparse reply network
      • Nodes are mostly connected in 2s and 3s
    • 35.
      • Black Metal vs. Deathcore
      • Denser reply network
      • On-going debates and
      • contentious issues
    • 36. Other networks
      • Connections can be
      • Friendship in YouTube (must be reciprocal)
      • Subscription in YouTube (non-reciprocal, based upon interest in video content)
      • Friends in common in YouTube
        • suggests factors in common (e.g., bands) rather than people in common
      • Subscriptions in common in YouTube
        • again suggests factors in common (e.g., bands) rather than people in common
    • 37. Very sparse Friend network = common
    • 38. Common Friends network – with a densely connected core
    • 39. Large-scale analysis of YouTube
      • Purpose: to discover patterns, norms and unusual behaviour in YouTube
      • Method:
      • Generate a large sample of YouTube videos
        • Running searches for many terms from a large word list
        • Selecting a video at random from each set of results
      • Extract properties of the videos and commenters
      • Calculate averages and distributions
      • Examine extreme videos
    • 40. Comments are mainly positive
    • 41.  
    • 42. Controversial and non-controversial ?
    • 43. Conclusions
      • Sentiment analysis allows large-scale social web analysis
      • YouTube videos can be analysed easily with a network perspective on their comments
      • http://lexiurl.wlv.ac.uk/ - Webometric Analyst
      • Can make a reply network of the discussions of video
      • _-OAsF9uRfc by following the instructions at http://lexiurl.wlv.ac.uk/searcher/youtubereplies.html
    • 44.  

    ×