Social Media Meets Business Intelligence
Vue Magazine & Canadian Marketing Association Report on Social Media Analytics
Whitepaper, April 2009
Dr. Nick Koudas

Sysomos Inc.
Business Intelligence for Social Media
120 Baldwin St, Suite 3, Toronto, ON M5T 1L6
contact@sysomos.com, www.sysomos.com, (866) 483 3338
ABSTRACT

User generated content (e.g., web logs or blogs, social networks, micro-blogging services such as Twitter, wikis, collaborative tagging, news, music sites, podcast and video sharing sites) is proliferating at unprecedented rates. The numbers quantifying user participation are astonishing: more than half a billion individuals in social networking sites worldwide, in excess of 100 million blogs, millions of users utilizing micro-blogging services, etc. In aggregate such services generate very large amounts of data on a daily basis. The term social media is commonly used to refer to this information collective, primarily contributed by individuals online.

User generated content: blogs, social networks, wikis, podcasts, microblogs, news, video.

Historically, corporate databases have been accumulating information about every aspect of a business, from employment records to corporate supplies and sales records. Business Intelligence is a mature set of technologies offering deeper understanding of such data. For example, such technologies inform analysts and executives not only that the volume of sales in a region is increasing or decreasing, but also point to the reasons behind such changes.

In this article we argue that business intelligence technologies and functionalities can be of great service to social media as well. This wealth of information contains invaluable knowledge and intelligence for marketers, public relations agencies, advertisers, political campaigns and law enforcement agencies, among many others.
INTRODUCTION – FROM DATA TO INTELLIGENCE

Efficient collection, archiving and cleaning of such collective knowledge provides the foundation for robust analytics from which business intelligence may be extracted. Such analytics can enable one to contrast events across multiple time periods and identify when and where, and most importantly for what reasons, perceptions or acceptance of brands and products are changing.

Analytical capacities:

1-2. Demographics and sentiment. For example, we can identify the target demographic (e.g., males in eastern US of ages 19-29) that has extremely positive opinions about a new tech gadget. Similarly, we may locate the target demographic with the most negative opinions about the same gadget and automatically discover the reasons behind such sentiment (a lot of disappointment around the screen size and graphics in the new gadget).

3. Crisis management. By tapping into the information collective generated by users on a daily basis, one can identify crisis events for brands, products, individuals or the general public as they happen, and react effectively.

4. Influencer identification. One can effectively identify how ideas or rumors spread across the globe and identify key influential sources that primarily contribute to such spread (e.g., the crisis situation for a brand started when a particular posting appeared on a blog with wide readership, escalated into several online forums and subsequently made it to wide circulation news sources; the top 10 blogs and forums actively writing about the crisis with the highest readership can be subsequently identified).

5. Measurement. Gain insight on the impact of an advertising campaign around the globe as a function of time (positive sentiment towards a brand increased steadily in a particular demographic following a particular online ad campaign, with the video ad receiving the most positive engagement online).

6. Comparison. Obtain feedback on how key competitive products and brands fare comparatively at several locations around the world or in aggregate, across select demographic groups (professional males in their 30s prefer product A to B in the US, but the situation is different in Canada).

7. Engagement. Identify key communities or individuals online to engage with (a technology centered community with specific focus on wifi enabled gadgets).

Taking such desired functionality one step further, it is possible to contrast such information with key performance indicators (e.g., volume of sales as a function of time) and understand correlations and influence of social media on the bottom line. It is evident that the ability to make sense of the information in social media offers tremendous value to marketing, public relations and brand conscious corporations, as they are now able to obtain continuous feedback from their
customers, understand issues, react to them, as well as efficiently engage with consumers. Such ability is becoming increasingly important.

A variety of studies point to changing trends in the ways people communicate and exchange information, the ways people choose to be informed, and the types of preferred media and internet media for their entertainment. New media, social media, and internet connectivity have become increasingly important. As a result, there have been fundamental shifts in the way people research products to obtain information and the way in which opinions are expressed. Given the volume of information involved, it is evident that any attempt to manually make sense of this information collective is at best extremely slow and expensive, and more likely destined to fail.

Technological advances enable the effective processing of very large volumes of information in real time. Hence we outline the challenges associated with an attempt to provide business intelligence insights utilizing the social media collective, and highlight how technology can aid the ability to collect, process, and most importantly interpret social media.

DATA COLLECTION

Social media, which consists of blogs, wikis, message boards, social networks, podcasts, microblogging, bookmarking, and online videos, is a highly diverse and heterogeneous set of data. Collecting data from heterogeneous sources presents several challenges, especially related to dissimilar data formats, data types and non-standard meta-data tags. An additional challenge is related to data source discovery.
Blogs (especially those not residing on hosted services such as blogspot or livejournal) need to be discovered the moment they are created. If this data is to be used effectively, new information from blogs (new blog posts), social networks and the rest of the social media collective should be collected the moment it is publicly available. Collectively, several million opinions are posted online by consumers on a daily basis. Given the rapid evolution of social media services and the ever increasing volume of data, robust and exhaustive data collection is a fundamental first step in enabling business intelligence on social media.

DATA CLEANING AND SPAM REMOVAL

A large fraction of the information in user generated content is spam. According to some statistics, more than half of the content hosted at blogspot (a blog hosting service provided by Google) is spam. Spam is primarily created for search engine optimization purposes, but also for malicious advertising and phishing attacks. Although there are many ways to create spam, the one that is more pronounced is malicious content. Spammers inject content unrelated to, say, a blog post in order to cause the post to be relevant to many possible search queries.
Then the actual (spam) post commonly contains links to several sites (advertising or malicious content).

The presence of spam is harmful to any attempt to understand content from social media. Spam content introduces noise and clutters any emerging discussion or themes around topics of interest. It is imperative to identify and remove spam before processing social media content. A variety of techniques to remove spam exist. Such techniques aid, to a certain extent, further processing of content to facilitate automated comprehension; as spam filtering techniques evolve, identifying and removing spam remains an ongoing battle for providers of business intelligence solutions on social media.

ANALYTICS

The key to unleashing the power of social media is gaining enhanced understanding of its content. A starting point would be to obtain simple metrics on the volume of mentions of entities of interest in online content. Having spam free content is imperative in order for such counts of mentions to be meaningful. This functionality makes it possible to observe an increase or decrease in the volume of mentions (or buzz) around an entity of interest, and to compare several entities of interest in terms of mentions online (related products, competitors, etc). Several solutions exist along these lines, including some free solutions.

We argue that it is possible to obtain understanding of social media far superior to what is presently available in the form of counts of mentions and content clippings. An increase or a decrease in the volume of mentions of an entity probably corresponds to some event of interest.
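The mention counting described above can be sketched in a few lines. This is a minimal illustration only, not a description of any particular product: the function names `daily_mentions` and `flag_spikes`, the substring matching, the doubling threshold and the sample posts are all assumptions made for the example, and the input is assumed to be spam free already.

```python
from collections import Counter
from datetime import date


def daily_mentions(posts, entity):
    """Count posts per day whose text mentions `entity` (case-insensitive substring match)."""
    needle = entity.lower()
    counts = Counter()
    for day, text in posts:
        if needle in text.lower():
            counts[day] += 1
    # Days with no mentions are simply absent from the result.
    return dict(sorted(counts.items()))


def flag_spikes(counts, factor=2.0):
    """Return days whose count is at least `factor` times the mean of all earlier days."""
    days = list(counts)
    spikes = []
    for i in range(1, len(days)):
        baseline = sum(counts[d] for d in days[:i]) / i
        if counts[days[i]] >= factor * baseline:
            spikes.append(days[i])
    return spikes


# Hypothetical sample data: (publication date, post text).
posts = [
    (date(2009, 4, 1), "Trying the new Gadget X today"),
    (date(2009, 4, 1), "Unrelated post about lunch"),
    (date(2009, 4, 2), "Gadget X battery life is fine"),
    (date(2009, 4, 3), "Gadget X screen is too small"),
    (date(2009, 4, 3), "Why is the gadget x screen so small?"),
    (date(2009, 4, 3), "Returning my Gadget X"),
]

counts = daily_mentions(posts, "Gadget X")
spikes = flag_spikes(counts)
print(counts)
print(spikes)
```

A count like this shows that buzz around the entity jumped on a given day, but not why; that gap is what the rest of this section addresses.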
Being able to understand such events requires significant effort if the volume is high and/or if multiple attributes, features or properties are commonly associated with the entity of interest.

The first powerful functionality is the ability to understand discussion topics and identify the main conversations around an entity of interest. This will enable us to further focus on what is actually important and refine our search for information (e.g., there is a lot of active discussion around the screen size of a specific tech gadget, as well as a different discussion thread around its wifi capabilities; based on this information we can quickly focus on each discussion separately).

In many cases the volume of chatter or discussion around a specific topic will be vast, easily surpassing thousands of mentions of the brand of interest. In that case powerful summarization features can readily distil what is important in the discussion, saving the time and cost associated with reading all the content. Recent advancements in summarization technology enable very fast summaries of large document collections. The basic idea behind summarization is to transform documents into points in a high dimensional document space and subsequently reduce the dimension of this space, essentially keeping only the subspaces that contain the most information.

Diverse and polar opinions commonly exist about topics, products and their features. The ability to understand the sentiment of opinions towards
specific topics or entities of interest is imperative. This will enable powerful grouping and summarization of discussion around positive and negative sentiment, enabling further topical analysis across sentiments.

Social media is a truly global phenomenon; content exists in all languages. Support for search as well as advanced analytics across all languages is a key requirement. The requirement becomes a necessity for global corporations to track issues across the planet.

Given the diversity of social media sources, the ability to identify influential and authoritative individuals in each media source is very important. Although several simple measures of authority exist today (e.g., number of in links to a blog), we argue that the accumulated history of the online activity of individuals enables much richer forms of authority and influence to emerge, especially when they are coupled with specific business objectives. Availability of technical tools with the ability to understand varied business objectives in order to conduct influencer search is therefore of critical importance.

Finally, across social media sources, several topical communities emerge. Supporting efficient identification of such communities across topics as they emerge and evolve is an additional key requirement. Given the fragmentation of social media today across multiple heterogeneous sources, the ability to understand and associate communities together across sources is a pressing need.
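As a point of reference for the influencer discussion in this section, the simple authority measure mentioned there (number of in links to a blog) can be sketched as follows. This is only the baseline measure, not the richer history-based notion of influence argued for above; the function names, the `(source, target)` link format and the sample site names are assumptions made for illustration.

```python
from collections import defaultdict


def inlink_authority(links):
    """Map each target site to the number of distinct other sites linking to it."""
    sources = defaultdict(set)
    for src, dst in links:
        if src != dst:  # ignore self-links
            sources[dst].add(src)
    return {site: len(linkers) for site, linkers in sources.items()}


def top_sources(links, k=10):
    """Return the k sites with the most distinct in-links, ties broken by name."""
    scores = inlink_authority(links)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


# Hypothetical link data: (linking site, linked site).
links = [
    ("blog-a", "blog-c"),
    ("blog-b", "blog-c"),
    ("blog-a", "blog-b"),
    ("blog-c", "blog-c"),  # self-link, ignored
    ("blog-a", "blog-c"),  # repeated link, counted once
]

ranking = top_sources(links, k=2)
print(ranking)
```

Counting distinct linking sites rather than raw links is a design choice in this sketch, so that one site linking repeatedly does not inflate a score.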
GEOGRAPHY AND DEMOGRAPHICS

Social media content is contributed by users: individuals residing in different places of the world, belonging to different demographic groups, and having diverse interests, backgrounds and professions. Capturing information about the location as well as demographic information about the individuals contributing information offers a way to obtain understanding of social media in various geographies and diverse demographic or interest groups.

Such information offers enhanced understanding of particular interest to those aiming to comprehend geographies or demographics. Coupled with the ability to factor time in the analysis, it offers a powerful capability to understand how perceptions and opinions change temporally.
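The combination of geography, demographics and time described above amounts to grouping opinions by those attributes and comparing the groups. The sketch below assumes each post has already been annotated with a region, an age group, a month and a numeric sentiment score in [-1, 1]; those field names, the scoring scale and the sample records are hypothetical.

```python
from collections import defaultdict


def mean_sentiment(records, keys):
    """Average the `sentiment` score of records grouped by the given field names."""
    totals = defaultdict(lambda: [0.0, 0])  # group -> [sum of scores, count]
    for rec in records:
        group = tuple(rec[k] for k in keys)
        totals[group][0] += rec["sentiment"]
        totals[group][1] += 1
    return {group: total / n for group, (total, n) in totals.items()}


# Hypothetical annotated posts.
records = [
    {"region": "US", "age_group": "19-29", "month": "2009-03", "sentiment": 1.0},
    {"region": "US", "age_group": "19-29", "month": "2009-03", "sentiment": 0.5},
    {"region": "US", "age_group": "19-29", "month": "2009-04", "sentiment": -0.5},
    {"region": "CA", "age_group": "30-39", "month": "2009-04", "sentiment": 1.0},
]

by_segment_month = mean_sentiment(records, ["region", "age_group", "month"])
by_region = mean_sentiment(records, ["region"])
print(by_segment_month)
print(by_region)
```

Passing a different list of keys changes the granularity, for example region only, or region and month, without changing the function.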
CONCLUSION

Social media are empowering individuals and are redefining the media landscape. We argued that technology can empower marketers and public relations specialists to gain enhanced understanding of social media, offering functionality far beyond what is available today in the market place. The second generation of social media analytics is here as a result of technology breakthroughs aligned to meet business needs. This new generation of social media analysis platforms offers a convergence of business intelligence and social media.