In 2006, Time magazine selected “You” as its Person of the Year. In 2011, Web 2.0 and social media enabled, or more precisely helped, people to topple decades-old authoritarian regimes in the MENA (Middle East and North Africa) region. Essentially, social media helped people accomplish in 5 years what they had tried to do for the last 40 years.
Social media has irreversibly transformed how people communicate, organize, mobilize, and respond.
Formation of collective action, manifestation of social movements, etc.
Hundreds of millions of blogs, billions of tweets, several thousand YouTube videos. Many region-specific sources. Strictly network-based approaches usually do not perform well.
The Georgian cyber campaign (2008) brought Internet traffic to a standstill in the Republic of Georgia. The attacks, which coincided with the Russian military’s invasion of Georgia, were carried out in large part by civilians and Russian crime gangs. They were significant because they made it almost impossible for citizens and officials to communicate about what was happening on the ground during the military operation. According to an August 2009 special report by the US Cyber Consequences Unit (US-CCU) on this cyber campaign, social networking forums were the primary means used to recruit and arm the attackers. Social media therefore also has a key role in monitoring and tracking cyber-threats.
Chicken-and-egg problem: to identify a good event dictionary you need good sources, and to identify good sources you need an event dictionary.
ef-iEf is a generalization of tf-idf (term frequency-inverse document frequency), a standard content analysis measure.
AlchemyAPI was used to extract entities. Our approach looks at the quality (closeness) of the entities and not just their quantity, so it is robust to the skewed distribution depicted above.
This also motivates the need to study region-specific, often non-English-language, sources.
In order to select highly specific sources, we propose a novel ‘specificity’ measure, which estimates the unique information that a source ($S_k$) can offer vis-à-vis an event ($E_i$). It is important to note that a source’s specificity is always estimated with respect to a given event. The measure draws upon the theory of information gain and is defined as $\kappa = IG(E_i, S_k) = H(E_i) - H(E_i, S_k)$, where $IG(E_i, S_k)$ denotes the information gain of source $S_k$ with respect to event $E_i$; $S_k$ is the $k$-th source in the set of sources for event $E_i$; $E_i$ is the $i$-th event in the set of events selected for the study; $H(E_i)$ is the total entropy of event $E_i$; and $H(E_i, S_k)$ is the total entropy of source $S_k$ with respect to event $E_i$. Since $H(E_i)$ is constant for a given event, $IG(E_i, S_k)$ differs from $-H(E_i, S_k)$ only by that constant, so ranking the sources of an event by $IG(E_i, S_k)$ is equivalent to ranking them by $-H(E_i, S_k)$. We therefore only calculate $H(E_i, S_k)$ in order to find $\kappa$. The formulation above is generic; Section 6.3.1 explains a specific implementation.
Analyzing Events Through the Lens of Social Media
Debanjan Mahata (firstname.lastname@example.org), Nitin Agarwal (email@example.com)
University of Arkansas at Little Rock
This work is supported in part by grants from the US Office of Naval Research (ONR) and US National Science Foundation (NSF).
Outline
• Introduction
• Motivation
• Challenges
• Proposed Framework
• Data collection and processing
• Experiments: Results and Analysis
• Looking Ahead
Social Media’s Influence
• Social media played a phenomenal role in organizing these events
• Citizen journalism at its best
Goals of the Research
• We study how social media can be leveraged to analyze
  – Events and their characteristics
  – Coverage differences from mainstream media
  – Socio-demographic and socio-technical behavioral patterns
• We also explore further implications of the research
Challenges
• Identifying the right social media sources
• Language barrier
• Colloquial usage, misspellings, sparse links
• Extracting relevant information from the sources
  – Entity extraction and resolution
• Evaluation, due to the lack of benchmark datasets
Proposed Methodology
• Identifying the right social media sources
  – Specificity (κ) of a source ‘S’ for an event ‘E’:
$$IG(E, S) = H(E) - H(E \mid S) = \sum_{e \in E} p(e) \log \frac{1}{p(e)} - \sum_{e \in E,\, s \in S} p(e, s) \log \frac{p(s)}{p(e, s)}$$
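The information-gain formula IG(E, S) = H(E) − H(E | S) can be checked numerically. The following is a minimal Python sketch, assuming E and S are treated as discrete random variables whose joint distribution p(e, s) is already known. The slides do not show how that distribution is estimated from blog data, so the function name `information_gain` and the toy distributions are hypothetical. Logarithms are base 2, so the result is in bits.

```python
import math
from collections import defaultdict


def information_gain(joint: dict[tuple[str, str], float]) -> float:
    """IG(E, S) = H(E) - H(E | S) for a joint distribution p(e, s).

    `joint` maps (entity, source-item) pairs to probabilities summing to 1.
    Hypothetical helper for illustration; uses log base 2 (bits).
    """
    p_e: dict[str, float] = defaultdict(float)
    p_s: dict[str, float] = defaultdict(float)
    for (e, s), p in joint.items():
        p_e[e] += p  # marginal p(e)
        p_s[s] += p  # marginal p(s)
    # H(E) = sum_e p(e) log 1/p(e)
    h_e = sum(p * math.log2(1.0 / p) for p in p_e.values() if p > 0)
    # H(E | S) = sum_{e,s} p(e, s) log p(s)/p(e, s)
    h_e_given_s = sum(
        p * math.log2(p_s[s] / p) for (e, s), p in joint.items() if p > 0
    )
    return h_e - h_e_given_s


if __name__ == "__main__":
    # Entity fully determined by the source item: IG equals H(E).
    print(information_gain({("tahrir", "s1"): 0.5, ("tripoli", "s2"): 0.5}))  # 1.0
    # Entity independent of the source item: IG is zero.
    uniform = {(e, s): 0.25 for e in ("a", "b") for s in ("s1", "s2")}
    print(information_gain(uniform))  # 0.0
```

The two cases bracket the measure: a source whose content pins down the event's entities gains the full entropy H(E), while a source unrelated to them gains nothing.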
Proposed Methodology
• Identifying the right social media sources
  – Closeness (τ) of a term/entity ‘e’ to an event ‘E’:
$$\tau = P(e, E) = P(E)\,P(e \mid E), \qquad P(e \mid E) = \mathrm{efiEf} = ef(e, E) \times iEf(e)$$
• Creating event dictionaries
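The closeness measure τ = P(E) × ef(e, E) × iEf(e) mirrors tf-idf. The slides do not define ef and iEf, so the following Python sketch fills them in by analogy and labels every choice as an assumption: ef(e, E) is the relative frequency of entity e among all entities extracted for event E, iEf(e) = log(N / n_e) where N is the number of events and n_e the number of events mentioning e, and P(E) is a uniform prior 1/N. The function name `closeness` and the toy counts are hypothetical.

```python
import math


def closeness(counts: dict[str, dict[str, int]], event: str, entity: str) -> float:
    """Hypothetical closeness tau = P(E) * ef(e, E) * iEf(e).

    counts[event][entity] is how often `entity` was extracted for `event`.
    Assumed (not stated on the slides):
      - ef(e, E): relative frequency of e among all entities of E
      - iEf(e): log(N / n_e), N = number of events, n_e = events mentioning e
      - P(E): uniform prior 1 / N
    """
    n_events = len(counts)
    event_counts = counts[event]
    total = sum(event_counts.values())
    if total == 0:
        return 0.0
    ef = event_counts.get(entity, 0) / total
    n_e = sum(1 for c in counts.values() if c.get(entity, 0) > 0)
    if n_e == 0:
        return 0.0  # entity never extracted for any event
    inv_event_freq = math.log(n_events / n_e)
    p_event = 1.0 / n_events
    return p_event * ef * inv_event_freq


if __name__ == "__main__":
    counts = {
        "egypt": {"Tahrir Square": 8, "Twitter": 2},
        "libya": {"Tripoli": 6, "Twitter": 4},
    }
    print(closeness(counts, "egypt", "Tahrir Square"))  # 0.5 * 0.8 * ln 2
    print(closeness(counts, "egypt", "Twitter"))  # 0.0, appears in every event
```

As with idf, an entity mentioned for every event (here, Twitter) gets an inverse event frequency of log 1 = 0, so it contributes nothing to any single event's closeness.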
Construction of Event Dictionaries
• Reference point to construct event vocabulary
• Independent of the sources
• Globalvoicesonline.org
• Extract entities from the Global Voices Online source
• Use closeness measure to order the entities based on relevance to the event
  – Event-specific dictionary
  – Event category-specific dictionary
Top 5 entities in the event-specific and event category-specific dictionaries:
  – Egyptian revolution-specific dictionary: Tahrir Square, Egyptian government, Gigi Ibrahim, Alexandria, Wael Abbas, …
  – Libyan revolution-specific dictionary: Tripoli, Muammar Al Gaddafi, North Atlantic Treaty Organization, Chad, United Kingdom, …
  – Tunisian revolution-specific dictionary: Tunisian government, Lina Ben Mhenni, Samir Feriani, Kasbah Square, RCD, …
  – Socio-political (global) event dictionary: Twitter, Iranian Government, Tear gas devices, Facebook, Big Social network, …
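The ordering step, where entities are ranked by closeness and the top entries form the event dictionary, can be sketched in a few lines of Python. This assumes the closeness scores for one event have already been computed. The function name `build_event_dictionary`, the deterministic alphabetical tie-break, and the scores in the example are hypothetical illustrations, not values from the study.

```python
def build_event_dictionary(closeness_scores: dict[str, float], k: int = 5) -> list[str]:
    """Return the k entities with the highest closeness to one event.

    closeness_scores maps entity -> tau for that event (hypothetical input).
    Ties are broken alphabetically so the dictionary is deterministic.
    """
    ranked = sorted(closeness_scores.items(), key=lambda item: (-item[1], item[0]))
    return [entity for entity, _ in ranked[:k]]


if __name__ == "__main__":
    scores = {"Tahrir Square": 0.9, "Alexandria": 0.4, "Twitter": 0.0, "Wael Abbas": 0.4}
    print(build_event_dictionary(scores, k=3))
    # ['Tahrir Square', 'Alexandria', 'Wael Abbas']
```

Setting k = 5 matches the "top 5 entities" shown per dictionary on the slide.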
Data collection
• Collected using Google Blog Search
• From blogspot.com
Event | Query terms | Number of blogs | Dates
Egyptian Revolution | “egyptian revolution” OR “egypt protest” | 579 | 25th January, 2011 – 7th December, 2011
Libyan Revolution | “libyan revolution” OR “libya protest” | 600 | 15th February, 2011 – 7th December, 2011
Tunisian Revolution | “tunisian revolution” OR “tunisia protest” | 484 | 17th December, 2010 – 7th December, 2011
Data Description
• Blogger-specific: URL, work information, gender, blogs followed, blogs owned
• Blog post-specific: URL, timestamp, blogging tags, text, outlinks
• Blog-specific: URL, topic category, language
Further Analysis: Source Specificity vs. Location (figure: all sources)
Further Analysis: Source Specificity vs. Location (figure: sources localized to Egypt)
Conclusions
• Relevance of social media in various events
• Methodology to analyze events via social media
• Associated challenges
• Proposed measures to identify specific sources with respect to atomic information units/entities
• Evaluation framework
• Popular sources may not be specific
• Localized sources tend to be more specific
• Expand the dataset to include more, and more varied, types of events
• Use as an apparatus to analyze social movements, collective actions, marketing research, etc.
Observation
• Socio-demographic
  – Location
  – Age
  – Gender
  – Profession (occupation, industry)
  – etc.
• Socio-technical
  – Links
  – Devices
  – Other social media profiles
• Network of bloggers from the extracted data
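One way to form the network of bloggers from the extracted data is to treat each "blogs followed" or outlink entry as a directed edge. The following Python sketch assumes the data has already been reduced to a mapping from a blogger identifier (for example, a profile URL) to the identifiers they follow or link to. The function names, input format, and toy data are hypothetical; in-degree is shown only as one simple statistic such a network supports.

```python
def build_blogger_network(follows: dict[str, list[str]]) -> dict[str, set[str]]:
    """Directed graph as adjacency sets: edge a -> b means a follows/links to b.

    `follows` is a hypothetical pre-extracted mapping; self-links are ignored,
    and every blogger or target appears as a node even without out-edges.
    """
    graph: dict[str, set[str]] = {}
    for blogger, targets in follows.items():
        graph.setdefault(blogger, set())
        for target in targets:
            if target == blogger:
                continue
            graph[blogger].add(target)
            graph.setdefault(target, set())
    return graph


def in_degree(graph: dict[str, set[str]]) -> dict[str, int]:
    """Number of incoming edges per node (how many others follow/link to it)."""
    degree = {node: 0 for node in graph}
    for targets in graph.values():
        for target in targets:
            degree[target] += 1
    return degree


if __name__ == "__main__":
    net = build_blogger_network({"a": ["b", "c"], "b": ["c", "b"], "d": []})
    print(in_degree(net))  # {'a': 0, 'b': 1, 'c': 2, 'd': 0}
```

Adjacency sets keep the sketch dependency-free and collapse duplicate links between the same pair of bloggers.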
Specificity
$$\kappa = IG(E_i, S_k) = H(E_i) - H(E_i, S_k)$$
$$\kappa = -H(E_i, S_k) = \frac{\sum_{i=1}^{n} f_i \tau_i}{\sum_{i=1}^{n} f_i}$$
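Assuming the second specificity equation is the frequency-weighted average κ = Σ f_i τ_i / Σ f_i, source specificity can be sketched in Python. This sketch further assumes that f_i is the frequency of entity i in the source and τ_i is that entity's closeness to the event. Entities outside the event dictionary get τ = 0. The function name `specificity` and all numbers below are hypothetical.

```python
def specificity(source_entity_counts: dict[str, int], event_closeness: dict[str, float]) -> float:
    """kappa = sum_i f_i * tau_i / sum_i f_i over the entities found in a source.

    Assumptions (hypothetical reading of the slide):
      - f_i: frequency of entity i in the source
      - tau_i: closeness of entity i to the event (0 if not in the dictionary)
    """
    total = sum(source_entity_counts.values())
    if total == 0:
        return 0.0  # no entities extracted, so no evidence of specificity
    weighted = sum(
        f * event_closeness.get(entity, 0.0)
        for entity, f in source_entity_counts.items()
    )
    return weighted / total


if __name__ == "__main__":
    tau = {"Tahrir Square": 0.9, "Wael Abbas": 0.6, "Twitter": 0.1}
    local_blog = specificity({"Tahrir Square": 3, "Wael Abbas": 1}, tau)
    popular_blog = specificity({"Twitter": 10, "Tahrir Square": 1}, tau)
    print(local_blog, popular_blog)  # local_blog is about 0.825, popular_blog about 0.173
```

Because κ averages closeness rather than summing it, a source that mentions many entities is not rewarded for volume alone. This matches the earlier point that the approach looks at the quality (closeness) of entities and not just their quantity.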