Data Mining for Moderation of Social Data

Published in: Technology, Education

Data Mining for Moderation of Social Data

  1. 1. Data Mining for Moderation of Social Data
     Fernando G. Guerrero
     CEO, SolidQ
     fguerrero@solidq.com
  2. 2. © 2011 SolidQ 3
  3. 3. Introductions
     • Fernando G. Guerrero
     • Global CEO of SolidQ
     • fguerrero@solidq.com
     • Microsoft Regional Director for Spain since 2004
     • SQL Server MVP from 2000 to 2007
     • Usual suspect at many international conferences
  4. 4. SolidQ 2012… 10th anniversary
     • 160 people in 23 countries:
       • Argentina, Australia, Austria, Bulgaria, Canada, Chile, Costa Rica, Croatia, Denmark, France, Germany, India, Israel, Italy, Mexico, Saudi Arabia, Serbia, Slovakia, Slovenia, Spain, Sweden, UK, USA
     • 50 current or former RDs or MVPs
     • Authors of many books, articles, and whitepapers
     • Research collaboration with:
       • Universidad de Alicante
       • Universidad de les Illes Balears
       • Universidad de Santiago de Compostela
       • The European Union
       • The Spanish Ministry of Economy and Innovation
  5. 5. Agenda
     • Social Data
     • Market Research
     • Sentiment Analysis, Text Mining
     • Moderation, Data Mining
     • SolidQ Research Lines in Social Data
     © 2012 SolidQ
  6. 6. Social data is everywhere
  8. 8. Social data is about everything
     • Music
  9. 9. Social is there
     • Is your organization promoting itself on social?
       • Products
       • Services
       • Stories
  10. 10. Social is there, reputation
      • What is social saying about you?
        • Product
        • Services
        • Decisions
        • Image
  11. 11. Market Research
      • What is social requesting from you?
        • Future services
        • Product updates
      • Can you ask social questions?
        • Is this service going to succeed?
        • How can I fix the current problem?
        • Is society ready for this law?
  12. 12. Sentiment Analysis, Text Mining
      • "The movie was fabulous!" [Sentimental]
      • "The movie stars Mr. X" [Factual]
      • "The movie was horrible!" [Sentimental]
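
The sentimental/factual split on this slide can be sketched with a tiny word lexicon. This is only an illustration, not the deck's method: the word lists and the function names `classify` and `polarity` are assumptions made for the example, and a real system would need a far larger, curated lexicon or a trained model.

```python
import re

# Hypothetical toy lexicon, invented for this sketch.
POSITIVE = {"fabulous", "great", "wonderful"}
NEGATIVE = {"horrible", "awful", "terrible"}


def tokenize(sentence):
    """Lower-case the sentence and split it into alphabetic tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def classify(sentence):
    """Return 'sentimental' if any opinion word appears, else 'factual'."""
    words = set(tokenize(sentence))
    if words & (POSITIVE | NEGATIVE):
        return "sentimental"
    return "factual"


def polarity(sentence):
    """Count positive words minus negative words."""
    words = tokenize(sentence)
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


for text in ("The movie was fabulous!",
             "The movie stars Mr. X",
             "The movie was horrible!"):
    print(text, "->", classify(text), polarity(text))
```

A lexicon lookup like this misses negation and sarcasm, which is one reason text mining and sentiment analysis are treated as separate, harder problems.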
  14. 14. What is Data Mining?
      • Informs actionable business decisions
      • Contrasts with "machine learning"
  15. 15. Media Case Study
      • Millions of posts per year (different moderation scenarios)
      • About 25% are moderated by humans
      • About 10% of the moderated posts fail moderation
      • No Business Intelligence applications for analysis or reporting
  16. 16. Moderation, Data Mining
      • Contextual information
        • Time
        • Location
        • User
      • Comments posted at 10 AM are safer than those posted at 2 AM.
      • A user may be safe when talking about science but dangerous when talking about sports.
      • If a thread is hot (dangerous), a new comment in it may be hot too.
      • By combining context patterns, the system assigns a risk to each post without looking at its text.
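
One way to picture context-only risk scoring is to look up the historical fail rate for each context value of a new post and combine the rates. The sketch below is an assumption-laden illustration: the sample history is invented, and averaging the rates is a placeholder for whatever combination the production system uses, which the slides do not describe.

```python
from collections import defaultdict


def fail_rates(history, key):
    """Fraction of failed posts for each value of one context attribute."""
    totals = defaultdict(int)
    fails = defaultdict(int)
    for post in history:
        totals[post[key]] += 1
        fails[post[key]] += post["failed"]
    return {value: fails[value] / totals[value] for value in totals}


def context_risk(post, rates_by_key, default=0.0):
    """Average the historical fail rates of the post's context values.

    Plain averaging is an assumption made for this sketch.
    """
    scores = [rates.get(post[key], default) for key, rates in rates_by_key.items()]
    return sum(scores) / len(scores)


# Invented moderation history: 1 means the post failed moderation.
history = [
    {"hour": 10, "user": "ana", "thread": "science", "failed": 0},
    {"hour": 10, "user": "bob", "thread": "science", "failed": 0},
    {"hour": 2, "user": "bob", "thread": "sports", "failed": 1},
    {"hour": 2, "user": "ana", "thread": "sports", "failed": 1},
    {"hour": 2, "user": "ana", "thread": "science", "failed": 0},
]
rates_by_key = {key: fail_rates(history, key) for key in ("hour", "user", "thread")}

night_sports = context_risk({"hour": 2, "user": "bob", "thread": "sports"}, rates_by_key)
morning_science = context_risk({"hour": 10, "user": "ana", "thread": "science"}, rates_by_key)
print(round(night_sports, 3), round(morning_science, 3))
```

Note that the text of the post never enters the calculation, which is what lets the same idea apply to links, pictures, and videos.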
  17. 17. Solution – Logical Model
      • Post context (behavior analysis)
        • Patterns, data mining
      • Post content (text analysis)
        • Profanity, low-scoring sentences, text mining, mood or tone (sentiment analysis)
  18. 18. Typically Available Data on Posts
      • Historical and real-time data for:
        • User (e.g. userid, email, nationalid)
        • Location (e.g. Life & Style > Fashion)
        • Time (e.g. 12 March 2011 18:56)
        • Content (e.g. text, link, picture, video)
        • Moderation result
      • Other attributes such as geography, age, and education could also be used
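
The fields listed on this slide could be carried in a record like the one below. It is a minimal sketch: the class name `Post`, the field names, and the sample values other than the slide's own location and timestamp examples are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Post:
    """One post with the context and content fields named on the slide."""
    user_id: str
    location: str            # site section, e.g. "Life & Style > Fashion"
    posted_at: datetime
    content_type: str        # "text", "link", "picture" or "video"
    content: str
    passed_moderation: Optional[bool] = None  # None until a decision exists


example = Post(
    user_id="u123",                      # hypothetical identifier
    location="Life & Style > Fashion",
    posted_at=datetime(2011, 3, 12, 18, 56),
    content_type="text",
    content="Nice article!",             # hypothetical content
)
print(example.posted_at.hour, example.passed_moderation)
```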
  19. 19. Post Context, Patterns, Data Mining
      • User behavior
      • Time behavior
      • Location behavior
  20. 20. Building useful attributes
      • 1.- Thread (% fails in a given thread)
      • 2.- User (% fails per user)
      • 3.- Diff Hour Forum Created (TimeDatePosted - TimeForumCreated)
      • 4.- User Forum (% fails in a given forum)
      • 5.- Diff Last Fail for User (TimeDatePosted - TimeLastFailUser)
      • 6.- Hour of the day
      • 7.- Diff Hour UserJoined-Now (TimeDatePosted - TimeUserJoined)
      • 8.- User Thread (% fails per user in a thread)
      • 9.- Diff Hour Thread Created (TimeDatePosted - TimeThreadCreated)
      • 10.- Day of week
      • More than 100 attributes in total
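
Several of the attributes above can be derived from a post's timestamp and the moderation history. The following is a sketch under assumptions: the function name `build_attributes`, the dictionary keys, and the sample rows are invented, and only seven of the listed attributes are shown.

```python
from datetime import datetime


def build_attributes(post, history, thread_created, user_joined):
    """Derive a few context attributes for one post.

    `history` holds earlier posts as dicts with the keys
    'user', 'thread', 'posted' (datetime) and 'failed' (bool).
    """
    def pct_fails(rows):
        return 100.0 * sum(r["failed"] for r in rows) / len(rows) if rows else 0.0

    def hours(delta):
        return delta.total_seconds() / 3600.0

    same_user = [r for r in history if r["user"] == post["user"]]
    same_thread = [r for r in history if r["thread"] == post["thread"]]
    user_fail_times = [r["posted"] for r in same_user if r["failed"]]

    return {
        "pct_fails_thread": pct_fails(same_thread),
        "pct_fails_user": pct_fails(same_user),
        "hours_since_thread_created": hours(post["posted"] - thread_created),
        "hours_since_user_joined": hours(post["posted"] - user_joined),
        "hours_since_last_user_fail": (
            hours(post["posted"] - max(user_fail_times)) if user_fail_times else None
        ),
        "hour_of_day": post["posted"].hour,
        "day_of_week": post["posted"].weekday(),  # Monday is 0
    }


# Invented sample data.
history = [
    {"user": "ana", "thread": "t1", "posted": datetime(2012, 1, 25, 20, 0), "failed": True},
    {"user": "ana", "thread": "t1", "posted": datetime(2012, 1, 25, 21, 0), "failed": False},
    {"user": "bob", "thread": "t1", "posted": datetime(2012, 1, 25, 21, 30), "failed": False},
    {"user": "bob", "thread": "t2", "posted": datetime(2012, 1, 25, 21, 45), "failed": True},
]
post = {"user": "ana", "thread": "t1", "posted": datetime(2012, 1, 25, 22, 0)}
attrs = build_attributes(
    post,
    history,
    thread_created=datetime(2012, 1, 25, 19, 0),
    user_joined=datetime(2012, 1, 24, 22, 0),
)
print(attrs)
```

In practice the percentages would be computed only from posts that precede the one being scored, as here, so that the attribute never leaks the outcome it is meant to predict.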
  21. 21. Hard Work
      • Periods
      • Algorithms
      • Algorithm parameters
      • Model refreshing
      • Attribute analysis
      • Outliers
      • Overpopulating
      • Behavior after the system is in production
  22. 22. Data Mining Algorithms
      • Decision Trees/Linear Regression
      • Sequence Analysis
      • Neural Networks/Logistic Regression
      • Clustering
      • Text Mining (Words and Phrases)
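
As an illustration of one algorithm family from this list, the sketch below fits a logistic regression to two context attributes using plain gradient descent. It is not the project's implementation: the training rows, the learning rate, the epoch count, and the function names are all assumptions chosen to keep the example self-contained.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Estimated probability that a post with attributes x fails moderation."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = predict(weights, bias, x) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


# Invented training rows: [user fail rate, posted at night (1) or not (0)].
rows = [[0.0, 0], [0.1, 0], [0.2, 1], [0.6, 1], [0.8, 0], [0.9, 1]]
labels = [0, 0, 0, 1, 1, 1]
weights, bias = train_logistic(rows, labels)

print(round(predict(weights, bias, [0.9, 1]), 3))
print(round(predict(weights, bias, [0.05, 0]), 3))
```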
  23. 23. Conclusion on Context
      • Risk is based on the context of the post
        • Time
        • User's history
        • Publish location
      • Enables risk analysis for all types of content
        • Comments (in any language)
        • Links
        • Pictures
        • Videos
  24. 24. Logical Model: Post Content
      • Profanity Analysis
      • Text Mining
        • Example: "The first minister and his secretary found sleeping together last night. They got drunk at a nearby pub."
      • Sentiment Analysis
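
Profanity analysis is the simplest of the three content checks and can be sketched as a blocklist lookup. The blocklist below is a hypothetical stand-in with mild placeholder words, and `profanity_hits` is a name invented for this example.

```python
import re

# Hypothetical blocklist with placeholder words.
BLOCKLIST = {"darn", "heck"}


def profanity_hits(text):
    """Return the blocklisted words found in the text, in order of appearance."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token in BLOCKLIST]


print(profanity_hits("Well, DARN it!"))
print(profanity_hits("They got drunk at a nearby pub."))
```

A lookup like this finds nothing in the slide's example about the minister, which contains no profane word at all. That is presumably why text mining and sentiment analysis appear next to profanity analysis in the model.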
  26. 26. Moderation, Data Mining System
  28. 28. Analysis and Reporting
      • Published through an integrated web application
        • Moderation statistics
        • User statistics
        • News and stories statistics
        • Peaks
  29. 29. Conclusion: Benefits
      • By moderating half of all posts, the solution captures 90% of failing posts. The remaining 10% appear likely to be safe posts.
      • Using Intelligent Moderation, media companies scan the whole universe of posts at a comparatively low cost.
      • At peak times, Intelligent Moderation works perfectly.
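
The headline figure (90% of failing posts found by reviewing half of all posts) is a capture rate at a fixed review budget. The sketch below shows how such a figure could be measured once posts carry a risk score. The scores and outcomes are invented, so the printed number does not reproduce the 90% result.

```python
def capture_rate(scored_posts, fraction):
    """Share of failing posts found when reviewing the riskiest `fraction` of posts.

    `scored_posts` is a list of (risk_score, failed) pairs.
    """
    ranked = sorted(scored_posts, key=lambda pair: pair[0], reverse=True)
    n_review = int(len(ranked) * fraction)
    total_fails = sum(failed for _, failed in ranked)
    if total_fails == 0:
        return 0.0
    caught = sum(failed for _, failed in ranked[:n_review])
    return caught / total_fails


# Invented scores and moderation outcomes for ten posts.
posts = [
    (0.95, True), (0.90, True), (0.80, False), (0.70, True), (0.60, False),
    (0.40, False), (0.30, True), (0.20, False), (0.10, False), (0.05, False),
]
print(capture_rate(posts, 0.5))
```

Sweeping `fraction` from 0 to 1 gives the trade-off curve between moderator workload and the share of failing posts caught.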
  30. 30. Football night in Europe
      • On January 25th, 2012:
        • Liverpool defeated Manchester City in the Carling Cup
        • Barcelona defeated Real Madrid in the Copa del Rey
      • More than 100,000 comments arrived at the various BBC sites over 10 hours
      • All comments were filtered through our system
      • No problems were observed during that time
  31. 31. SolidQ Team in this project
      • Project Managers
        • Francisco Gonzalez, Javier Torrenteras, Alejandro Leguizamo
      • Developers
        • Itzik Ben-Gan, Enrique Puig, Ruben Pertusa, Carlos Martinez, Fernando G. Guerrero
      • Technical Reviewers
        • Mark Tabladillo, Dejan Sarka
      • Social Media Specialists
        • Jose Quinto, Rocio Díaz
  32. 32. SolidQ Research
      • Incomplete Grammar Analysis
      • Human interaction with IT systems
        • Collaboration
        • Contextual analysis
      • Sentiment Analysis
        • Market Research
        • Reputation
      • Data Mining of Social Context
        • Moderation
        • Market Research
        • Reputation
  33. 33. Invisible computing… driven by Social Data
  34. 34. THANK YOU!
      Fernando G. Guerrero
      Global CEO, SolidQ
      fguerrero@solidq.com