The Power of Social Media (Ricardo Baeza-Yates)


The Power of Social Media. Slides presented by Ricardo Baeza-Yates, director of the Yahoo! Research labs in Barcelona, Spain and Santiago, Chile, during the public kickoff of the LiveMemories project.

Published in: Education, Technology
  1. 1. The Power of Social Media
        Ricardo Baeza-Yates, VP, Yahoo! Research, Barcelona, Spain & Santiago, Chile
        Today is the Memory of Tomorrow. Remember!
  2. 2. Yahoo! Research
     Agenda
        The Internet and the Web today
        Web 2.0 and Social Media
        Example: Social Search
        The Wisdom of the Crowds
        The Future
     Internet and the Web
  3. 3. Internet and the Web Today
        Between 1 and 2.5 billion people connected
          – 5 billion estimated for 2015
        1.8 billion mobile phones today
          – 500 million expected to have mobile broadband in 2010
        Internet traffic has increased 20 times in the last 5 years
        Today there are more than 185 million Web servers
          – 50% Apache, 34% Windows
        The Web is in practice unbounded
          – Dynamic pages are unbounded
          – Static pages are over 12 billion?
     Trends
        • Web 2.0, social networks
          – Fragmentation of content ownership
          – Fragmentation of the access (age, topic, etc.)
          – Fragmentation of the right to access
        • Increase of the Semantic Web
          – RDF, microformats, metadata in general
        • Increase of Internet advertising associated with search/content
  4. 4. Advertising
        [Chart; surviving labels: USA, 2007, 2011, 2012]
     Advertising and the Web 2.0
        The power of word of mouth
        The power of the influential bloggers
        Viral Marketing
          – Positive (Dove)
          – Negative (HSBC)
        Presence in virtual(?) worlds (Second Life)
  5. 5. Yahoo! Scale (2007)
        24 languages, 20 countries
        > 4 billion page views per day (largest in the world)
        > 500 million unique users each month (half the Internet users!)
        > 250 million mail users (1 million new accounts a day)
        95 million Groups members
        7 million moderators
        4 billion music videos streamed in 2005
        20 PB of storage (20M GB)
          – US Library of Congress every day (28M books, 20 TB)
        15 TB of data processed per day
        7 billion song ratings
        2 billion photos stored
        2 billion Mail + Messenger messages sent per day
     Social Media
  6. 6. New Trends
     The Web: A Play in Three Acts
        Public “The” Web
        Personal “My” Web
        Social “Our” Web
  7. 7. Web 2.0: Ingredients
        Reviews, Groups, APIs, RSS, IM, Blogs, VoIP, Photos, Tags, Video, Podcasts, Bookmarks, Audio, Playlists
     Some Social Networks
        Blogs – Directed collaborative topical discussions
        Instant messenger – Buddy list
        Yahoo! Groups – Topically focused communities
        MySpace, Facebook, Friendster, Orkut – Friendship network
        – Collaborative bookmarking
        Flickr, YouTube – Photo/video sharing and tagging
        Yahoo! Answers – People answering people
  8. 8. Web 2.0 in Yahoo!
        Social sites had 115M unique visitors, 56M “under 35”.
        • Yahoo! Groups 8 million, 1 of every 10 members
        • 2 million users
        • Flickr 1 million pictures per day
        • Yahoo! Respuestas (Yahoo! Answers) 100M users, 150M answers
        • Messenger 85M unique users
        (2007 data)
     Why do people come online?
        To communicate
        To be informed
        To be entertained
        Increasingly… to be part of new forms of participation, belonging and sharing
        To be part of social media, also referred to as Social Networks
  9. 9. “One-way” Content
        Film Clips, Competition, Critics, Picture Gallery
     Community Content
        User’s photos, User’s reviews, User knowledge
     Social Networks
        Mainly young people (13-25)
        Mobile use
  10. 10. Who are they?
        [Table; surviving column headers: Age, %, Representative interests]
     What makes Flickr special?
        1. User Generated Content: content not licensed from providers such as Corbis or Getty, but rather contributed by users.
        2. User Organized Content: content is tagged, described, organized, discovered, etc. not by “editors” but by the users themselves.
        3. User Distributed Content: Flickr achieved distribution across the internet, not through “business deals” per se, but rather through the Flickr community, which distributed Flickr content on 3rd-party blogs.
        4. User Developed Functionality: Flickr exposed APIs (PHP, Perl, etc.) that allowed the community of developers to build against the Flickr platform.
        Entire ecosystem created by fewer than ten employees… aided by millions in the Flickr community.
  11. 11. Visualizing Tags: Tag Cloud from Flickr
     A Digression: Computer Vision is hard
  12. 12. Yahoo! Research
  13. 13. Internet UGC (User Generated Content)
        Have you experienced UGC? (No / Yes)
        Types of Content (multiple choice), as a publisher and as a consumer: Photos/Images, Text, Videos, Music, Animation/Flash, Others
        Source: National Internet Development Agency Report in June, 2006 (South Korea)
  14. 14. Simple acts create value and opportunity
        Using a system of user-assigned ratings, LAUNCHcast builds up a profile of preferences for each individual.
        Users can then share their custom radio station with friends through Yahoo! Messenger.
        The more ratings users make, the more intelligent the radio becomes.
        We have over 6 billion ratings, taking all the hassle out of discovering new music.
        LAUNCHcast = music that listens to you
     Community Dynamics
        1 creators, 10 synthesizers, 100 consumers
        Next generation products will blur distinctions between Creators, Synthesizers, and Consumers
        Example: Launchcast
        Every act of consumption is an implicit act of production that requires no incremental effort…
        Listening itself implicitly creates a radio station…
  15. 15. Social Process
        Millions of users of Flickr share and tag each other’s photographs (why???)
        Fernando Flores: Blogs
          – Look into the future
          – Warning
          – Commotion
          – Institution
        Individual or collaborative
          – Community newspaper: Power law distribution
     Social Search
  16. 16. The Knowledge Challenge
        Challenge: Enabling users to share knowledge with their community to create a better search experience
        Example query                                Number of results
        Vacation Chile                               26,800,000
        “Everything Ricardo knows about Chile”       0
     Subjective Queries
        The kinds of queries that rely on domain expertise…
          “Do you know a reputable plumber in Southampton?”
          “Where is the cool nightlife in Trento?”
          “What political blogs do you think I’d enjoy reading?”
          “Where can I buy a cool pair of shoes?”
        These kinds of queries are ill-served by today’s search engines, but are ironically the most valuable (i.e., transactional queries).
        How do we capture people’s experience?
  17. 17. Social Powered Search: Yahoo! Answers
        Democratize the process of “voting” (whether explicit or implicit)
        Move out of the purview of webmasters and hand control back to users
        Allow dynamic assignment to various authorities of trust, a new degree of freedom
        “Better Search Through People”
     Challenges in Social Search
        How do we use UGC for better search?
        What is the rating and reputation system?
        How do you cope with (social) spam?
        What are the incentive mechanisms?
        The bigger challenge: Where else can you leverage the power of the people?
  18. 18. Yahoo! Research Agenda
        European search vision
        Leader board
        Knowledge - the next challenge
        People power
        Making knowledge pay
        Poorly formed questions
  19. 19. Askers, Answerers
        [Figure: graph linking askers to answerers]
        P. Jurczyk, E. Agichtein: “Discovering authorities in Q.A. communities by using link analysis”, CIKM'07
     [Untitled slide]
        No definitive answer
        Unverifiable answer
        Community consensus
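The link-analysis idea cited on the askers/answerers slide can be illustrated with a small sketch. The slide does not say which algorithm is meant, so the code below assumes a HITS-style hub/authority iteration over a graph with one edge from each asker to each user who answered them. The function name `hits`, the user names and the edge list are hypothetical.

```python
def hits(edges, iterations=50):
    """HITS-style scoring on (asker, answerer) edges; an assumed stand-in
    for the link analysis the slide cites, not the paper's exact method."""
    nodes = {n for edge in edges for n in edge}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of hub scores of the askers a user has answered.
        new_auth = {n: 0.0 for n in nodes}
        for asker, answerer in edges:
            new_auth[answerer] += hub[asker]
        norm = sum(v * v for v in new_auth.values()) ** 0.5 or 1.0
        auth = {n: v / norm for n, v in new_auth.items()}
        # Hub: sum of authority scores of the users who answered an asker.
        new_hub = {n: 0.0 for n in nodes}
        for asker, answerer in edges:
            new_hub[asker] += auth[answerer]
        norm = sum(v * v for v in new_hub.values()) ** 0.5 or 1.0
        hub = {n: v / norm for n, v in new_hub.items()}
    return hub, auth


# Hypothetical toy data: three askers were answered by "carl", one by "eve".
edges = [("ana", "carl"), ("bob", "carl"), ("dina", "carl"), ("ana", "eve")]
hub, auth = hits(edges)
top = max(auth, key=auth.get)
print(top)
```

In this toy graph the user who answered the most askers ends up with the highest authority score; on real data the score would also depend on who is asking.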
  20. 20. What are the Problems?
        Which questions are legitimate?
        What is the incentive system?
        How do we validate answers?
        What is the role of the community?
        What is the reputation system?
     What are the challenges?
        Community of users – Social system
        Incentives and reputations – Economic system
        Poorly phrased, “grammatically” limited queries – Language analysis
        Improving user experience from past data – Data mining
  21. 21. What are the sciences?
        Information retrieval & language processing
        Microeconomics
        Data Mining
        Sociology and human-computer interaction
        Community networks
        [Book cover: Duncan Watts, Six Degrees of Separation]
     The Wisdom of the Crowds
  22. 22. The Rationale behind Web Mining
        The Wisdom of Crowds, James Surowiecki, 2004
          – “Under the right circumstances, groups are remarkably intelligent”
            • Importance of diversity, independence and decentralization
          – “large groups of people are smarter than an elite few, no matter how brilliant – they are better at solving problems, fostering innovation, coming to wise decisions, even predicting the future”.
            • How to deploy this in the next generation of social search and media services?
          – SEMEDIA video retrieval EU Project (with BBC, Glasgow U., Smoke & Mirrors, Joanneum & UPF)
  23. 23. Anchor Text
        The wisdom of the crowds can be used to search
        The principle is not new: anchor text is used in “standard” search
        When indexing a document D, include anchor text from links pointing to D
        [Figure: three pages linking to www.ibm.com]
          – “Armonk, NY-based computer giant IBM announced today”
          – “Big Blue today announced record profits for the quarter”
          – “Joe’s computer hardware links: Compaq, HP, IBM”
     Quality and Frequency
        Chris Anderson: “The Long Tail”. Hyperion, 2006.
        [Chart; axes: Frequency, Quality; regions: Traditional publishing, User-generated]
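The anchor-text rule on the Anchor Text slide (when indexing a document D, include anchor text from links pointing to D) can be sketched as a tiny inverted index. This is a minimal illustration under assumed inputs: the `build_index` function, the page dictionary layout and the two non-IBM URLs are hypothetical, and the tokenizer is a plain whitespace split.

```python
from collections import defaultdict


def build_index(pages):
    """Build term -> set of URLs, crediting anchor text to the link target.

    pages maps url -> {"text": str, "links": [(target_url, anchor_text), ...]}
    (a hypothetical layout chosen for this sketch).
    """
    index = defaultdict(set)
    for url, page in pages.items():
        # Index the page under its own words.
        for term in page["text"].lower().split():
            index[term].add(url)
        # Index each link TARGET under the words other pages use to describe it.
        for target, anchor in page["links"]:
            for term in anchor.lower().split():
                index[term].add(target)
    return index


pages = {
    "www.ibm.com": {"text": "IBM home page", "links": []},
    "news.example/a": {
        "text": "Big Blue today announced record profits for the quarter",
        "links": [("www.ibm.com", "Big Blue")],
    },
    "joes.example": {
        "text": "Joe's computer hardware links",
        "links": [("www.ibm.com", "IBM")],
    },
}
index = build_index(pages)
print(sorted(index["blue"]))
```

The IBM page never uses the words "Big Blue" itself, yet a query for "blue" can now reach it because another author described it that way.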
  24. 24. Quality and Quantity
        Chris Anderson: “The Long Tail”. Hyperion, 2006.
        [Chart; axes: Quantity, Quality; regions: User-generated, Traditional publishing]
     Chris Martin from Coldplay in Rolling Stone, Fortieth Anniversary, July 2007
        “We think it's all about quality over quantity now, because there's so much noise everywhere, there's no point in putting anything out unless it's fucking amazing.”
        [Chart; axes: Quantity, Quality]
  25. 25. The Push for Quality
        [Chart; axes: Quantity, Quality; regions: User-generated, Traditional publishing, “?”]
  26. 26. ¼ of questions want an opinion: informal polls
        ¾ of questions seek information or advice
  27. 27. Q. Su, D. Pavlov, J.-H. Chow, W. C. Baker. “Internet-scale collection of human-reviewed data”. WWW'07.
        17%-45% of answers were correct
        65%-90% of questions had at least one correct answer
  28. 28. There are top contributors ...
        ... but they don't have all the answers
     What about real quality?
        Question quality
        Answer quality
        Question quality and answer quality are not independent and can be predicted reasonably well (Castillo et al., 2008)
  29. 29. Influence Leadership (Bopal et al., 2008)
        Influence of the social graph on particular actions
          – Social graph: Yahoo! Instant Messenger
          – Actions log: Yahoo! Movies
            • Action = user u rated movie m at time t
          – Joined through common user identifiers
        Started from the Yahoo! Instant Messenger subgraph of “most active” users (110M nodes) and 21M ratings from Yahoo! Movies.
          – Ended with 217.5K nodes, 221.4K edges and 1.8M ratings.
     Leaders vs. Tribe leaders
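The join described on the Influence Leadership slide, a social graph combined with an action log of (user u, movie m, time t) records through shared user identifiers, can be sketched as follows. The scoring rule is an assumption made for illustration only: a user earns one credit each time a direct contact rates the same movie later. The slide does not give the study's actual definition of a leader, and the function name and toy data are hypothetical.

```python
from collections import Counter


def influence_counts(friends, actions):
    """Credit a user whenever a direct contact rates the same item later.

    friends: iterable of undirected (u, v) edges from the social graph.
    actions: iterable of (user, item, time) records from the action log.
    The "later rating by a contact" rule is an assumed proxy for influence.
    """
    # Keep each user's earliest rating time per item.
    first = {}
    for user, item, t in actions:
        key = (user, item)
        if key not in first or t < first[key]:
            first[key] = t
    # Adjacency sets for the undirected social graph.
    neighbours = {}
    for u, v in friends:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    # Join the two sources on user id and count later actions by contacts.
    credit = Counter()
    for (user, item), t in first.items():
        for friend in neighbours.get(user, ()):
            t_friend = first.get((friend, item))
            if t_friend is not None and t_friend > t:
                credit[user] += 1
    return credit


# Hypothetical toy data.
friends = [("u1", "u2"), ("u1", "u3"), ("u2", "u3")]
actions = [("u1", "m1", 1), ("u2", "m1", 5), ("u3", "m1", 9),
           ("u2", "m2", 2), ("u1", "m2", 3)]
credit = influence_counts(friends, actions)
print(credit.most_common())
```

Counting co-occurring ratings like this cannot separate influence from shared taste, which is one reason a real study needs a more careful definition than this sketch.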
  30. 30. The Wisdom of Crowds
        Crucial for Search Ranking
        Text content: Web Writers
          – not only for the Web!
        Links: Web Publishers
        Annotations: Web 2.0 Users
          – Tags, bookmarks, comments, ratings, etc.
        Queries: All Web Users!
          – Queries and actions
     Query Intention (Broder, 2000)
        • ~25% Informational
        • ~40% Navigational
        • ~35% Transactional
  31. 31. Mining Queries for ...
        Improved Web Search Ranking
        Query recommendations
        User Driven Design
          – Information Scent
          – The Web Site that the Users Want
          – The Web Site that You should Have
          – Improve content & structure
        Bootstrap of pseudo-semantic resources
     Query Mining: Relating Similar Queries
  32. 32. Implicit Folksonomy
     Implicit Knowledge (Baeza-Yates et al., 2007)
  33. 33. Experimental Evaluation
     Some Open Issues
        • Implicit social network
          – Any fundamental similarities?
        • How to evaluate with partial knowledge?
          – Data volume amplifies the problem
        • User aggregation vs. personalization
          – Optimize common tasks: help more people
          – Move away from privacy issues
  34. 34. Epilogue
     The Future
        The Web is scientifically young
        It is intellectually diverse
          – The human element
          – The social element
        The technology mirrors the economic, legal and sociological reality
  35. 35. Mirror of the Society
     Exports/Imports vs. Domain Links
        Web Spam Challenge:
        Baeza-Yates & Castillo, WWW2006
        • UK Web Collection
        • Training set with thousands of judged sites
  36. 36. What’s next?
        Fourth generation: From Information Retrieval to Information Supply
        Explicit demand for information driven by a user query
        Increase use of context
        Active supply of information driven by user activity and context
     Web 3.0?
        We are at Web 2.0 beta
        People want to get tasks done
          – Where do I go for an original holiday with 1,000 euros?
        Take into account the context of the task
        “I want to book a vacation in Tuscany.”
        [Diagram; labels: Start, Finish]
        Yahoo! Experience