Lecture 5: Mining, Analysis and Visualisation


This is the fourth lecture in the Social Web course at the VU University Amsterdam

Visit the website for more information: Social Web 2012

Published in: Technology, Education
  • Anastasios Martidis
  • Statistics: define a hypothesis, then test it. Data Mining: test all possible hypotheses; crosslink the data.
  • Include visualization -> as part of the interaction with the data, add-on analysis.
  • Validity of the data -> choosing the data and whether it is reliable. Is it static or dynamic data? How often does it change, and what changes?
  • Ashton Setoe. Also submitted by: Lilian Tjon, Sahar, Binyam Mersa, Niels, Mustafa & Paul
  • A heterogeneous network model for social media with 'like' function, using Facebook as an example. A blue bidirectional arrow is the friendship link. A red dashed arrow is a like action, while a green arrow denotes a post action, annotated with the time stamp when it was posted. The dashed blue line shows that the two photos are visually similar.
  • Create vectors that describe image content features (color histogram, color correlogram, texture features, etc.) and textual content (topics from text).
  • Classification: spam/no-spam. Association: a supermarket finding out which items are frequently bought together, e.g. chips & beer.
  • Maarten Groen
  • Data product: interest-based recommendations. Evaluation is missing from the paper. In the example for Twitter, also mention the followers (as part of the PageRank).
  • Lecture 5: Mining, Analysis and Visualisation

    1. Social Web Lecture 5: How can we MINE, ANALYSE and VISUALISE the Social Web? (1)
    2. Why?
        • User-generated content (UGC) provides an enormous wealth of data
            • insights into users’ daily lives
            • insights into communities
            • insights into trends
    3. What’s the added value of mining social web data for the individual?
    4. To whom it may concern
        • Politicians
        • Companies
        • Governmental institutions
        • You?
    5. The Age of Big Data
        • 25 billion tweets on Twitter in 2010, by 175 million users
        • 360 billion pieces of content on Facebook in 2010, by 600 million different users
        • 35 hours of video uploaded to YouTube every minute
        • 130 million photos uploaded to Flickr per month
    6. Questions to Ask
        • Who uploads/talks? (age, gender, nationality, community)
        • What are the trending topics?
        • What else do these users like?
        • Who are the most/least active users?
        • etc.
    7. The Rise of the Data Scientist
        http://radar.oreilly.com/2010/06/what-is-data-science.html
    8. The Rise of the Data Scientist
        • Data Science enables the creation of data products
        • Data products are applications that acquire their value from the data, and create more data as a result.
        • Users are in a feedback loop: they constantly provide information about the products they use, which gets used in the data product.
    9. Popular Data Products
    10. Data Mining 101
        Data mining is the exploration and analysis of large quantities of data in order to discover valid, novel, potentially useful, and ultimately understandable patterns in data.
        (Inspired by George Tziralis’ FOSS Conf’09, John Elder IV’s Salford Systems Data Mining Conf. and Toon Calders’ slides)
        http://www.freefoto.com/images/33/12/33_12_7---Pebbles_web.j
    11. Data Mining 101: Databases, Statistics, Artificial Intelligence
    12. Steps
        • Data input & exploration
        • Preprocessing
        • Data mining algorithms
        • Evaluation & Interpretation
    13. Data Input & Exploration
        • What data do I need to answer question X?
        • What variables are in the data?
        • Basic stats of my data?
    14. Are all likes equal? Do they all mean the same? Do people like for the same reason? The ‘likes’ across the different systems?
    15. Input & Exploration in ‘LikeMiner’
    16. Preprocessing
        • Cleanup!
        • Choose a suitable data model
            • What happens if you integrate data from multiple sources?
        • Reformat your data
    17. Preprocessing in ‘LikeMiner’
    18. Data mining algorithms
        • Classification: Generalising a known structure and applying it to new data
        • Association: Finding relationships between variables
        • Clustering: Discovering groups and structures in data
    19. How do you know you measured what you wanted to measure?
    20. Mining in ‘LikeMiner’
        • Filter users by interests
        • Construct user graphs
        • PageRank on graphs to mine representativeness
        • Result: set of influential users
        • Compare page topics to user interests to find pages most representative for topics
    21. Interpreting your results
    22. Data Mining is not easy
    23. Mining Social Web Data
        Source: http://kunau.us/wp-content/uploads/2011/02/Screen-shot-2011-02-09-at-9.03.46-PM-w600-h900.png
    24. Single Person
        Source: http://infosthetics.com/archives/2011/12/all_the_information_facebook_knows_about_you.html
        See also: http://www.youtube.com/watch?feature=player_embedded&v=kJvAUqs3Ofg
    25. Populations
        http://www.brandrants.com/brandrants/obama/
    26. Brand Sentiment via Twitter
        http://flowingdata.com/2011/07/25/brand-sentiment-showdown/
    27. Assignment 3: Data Analysis
        • Analyse an existing social data analysis report
        • Apply same analyses to your own data
        • Write research report
        http://www.actmedia.eu/media/img/text_zones/English/small_38421.jpg
    28. Final Assignment: Your SocWeb App
        • Create a Social Web app with your group
        • Use structured data, relationships between entities, data analysis, visualisation
        • Write individual research report on one of the main aspects of your app
        Image source: http://blog.compete.com/wp-content/uploads/2012/03/Like.jpg
    29. Hands-on Teaser
        • Your Facebook Friends’ popularity in a spreadsheet
        • Locations of your Facebook Friends
        • Tag Cloud of your wall posts
        Image source: http://www.flickr.com/photos/bionicteaching/1375254387/
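
Sketch for slide 20 (PageRank on user graphs): the slide only says that PageRank is run on user graphs to find influential users. The code below is a minimal power-iteration sketch of generic PageRank, not the LikeMiner implementation. The `follows` graph, the user names, the damping factor of 0.85 and the iteration count are all assumptions made for illustration.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Generic PageRank by power iteration.

    graph maps each node to the list of nodes it points to
    (for example: user -> users they follow). Illustrative only.
    """
    # Collect every node, including ones that only appear as targets.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)

    # Start from a uniform distribution.
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Every node receives the "random jump" share first.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = graph.get(node, [])
            if targets:
                # Split this node's rank evenly over its outgoing links.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical "who follows whom" graph (made-up users).
follows = {
    "alice": ["carol"],
    "bob": ["carol"],
    "carol": ["alice"],
    "dave": ["carol"],
}

ranks = pagerank(follows)
for user in sorted(ranks, key=ranks.get, reverse=True):
    print(user, round(ranks[user], 3))
```

Users who are pointed to by many others (here `carol`) end up with the highest score, which is the sense of "influential" used on the slide. For real data a graph library would be a more practical choice than this hand-written loop.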
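
Sketch for slide 18 (Association): the slide notes give the supermarket example of items that are frequently bought together, such as chips and beer. A minimal way to make that concrete is to count how often pairs of items co-occur and derive support and confidence for each pair. The baskets, the function name `pair_rules` and the support threshold below are hypothetical and chosen only for illustration; full algorithms such as Apriori extend the same idea to larger item sets.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for item pairs.

    support    = fraction of baskets that contain both items
    confidence = fraction of baskets with the antecedent that also
                 contain the consequent
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # ignore duplicates within one basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support >= min_support:
            # Each frequent pair yields one rule per direction.
            rules.append((a, b, support, count / item_counts[a]))
            rules.append((b, a, support, count / item_counts[b]))
    return rules


# Made-up shopping baskets.
baskets = [
    ["chips", "beer", "salsa"],
    ["chips", "beer"],
    ["beer", "diapers"],
    ["chips", "salsa"],
]

for a, b, support, confidence in pair_rules(baskets):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

On social web data the "baskets" could be, for instance, the sets of pages each user has liked, which would answer the slide 6 question "What else do these users like?".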
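
Sketch for slide 29 (Tag Cloud of your wall posts): a tag cloud boils down to counting words and mapping each count to a font size. The code below is a rough sketch under stated assumptions: the posts are made-up strings rather than data fetched from Facebook, and the stopword list, the function name `tag_weights` and the font-size range are illustrative choices.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "a", "an", "and", "to", "of", "in", "is", "it", "for", "on", "at"}


def tag_weights(posts, top_n=10, min_size=10, max_size=40):
    """Return (word, count, font_size) for the most frequent words."""
    words = []
    for post in posts:
        for word in re.findall(r"[a-z']+", post.lower()):
            if word not in STOPWORDS and len(word) > 2:
                words.append(word)

    counts = Counter(words).most_common(top_n)
    if not counts:
        return []

    highest = counts[0][1]
    lowest = counts[-1][1]
    span = (highest - lowest) or 1  # avoid dividing by zero when all counts are equal

    # Scale counts linearly into the font-size range.
    return [
        (word, count, round(min_size + (count - lowest) * (max_size - min_size) / span))
        for word, count in counts
    ]


# Made-up wall posts.
posts = [
    "Great lecture on data mining today",
    "Mining social data is fun",
    "Data, data everywhere",
]

for word, count, size in tag_weights(posts):
    print(f"{word}: count={count}, font size={size}")
```

The resulting sizes can be handed to any rendering layer (HTML, a plotting library, a spreadsheet chart); only the counting and scaling step is shown here.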