Concepts and Applications of Link Prediction Methods (Kyunghoon Kim)
The document discusses link prediction in social networks. It begins with an introduction to social networks and link prediction. It then covers the framework of link prediction, including common methods and applications. As an example, it discusses using link prediction to analyze terrorist networks. Finally, it discusses performing link prediction using Python tools like NumPy, Pandas, and NetworkX.
Presentation for tutorial session 'Measuring scholarly impact: Methods and practice' at ISSI2015
Explains how to use linkpred: https://github.com/rafguns/linkpred
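To make the Python workflow concrete, here is a minimal, hypothetical sketch (not taken from the slides or from linkpred) of neighborhood-based link prediction using NetworkX's built-in Jaccard coefficient scorer:

```python
import networkx as nx

# Toy network; in practice this would be loaded from real data
# with Pandas/NumPy as the slides describe.
G = nx.karate_club_graph()

# Score all non-adjacent node pairs by the Jaccard coefficient of
# their neighborhoods: |N(u) & N(v)| / |N(u) | N(v)|.
scores = nx.jaccard_coefficient(G)

# The highest-scoring pairs are the predicted future links.
top = sorted(scores, key=lambda t: t[2], reverse=True)[:5]
for u, v, p in top:
    print(f"predicted link ({u}, {v}) with score {p:.3f}")
```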
Twitter: Social Network or News Medium? (Serge Beckers)
This document analyzes Twitter as a social network and a news medium by studying its topological characteristics and information diffusion. The authors:
1) Crawled over 41 million user profiles, 1.47 billion social connections, and 106 million tweets to analyze Twitter's structure and behavior.
2) Found that Twitter has a non-power-law distribution of followers, short paths of separation between users, and low reciprocity - distinguishing it from other social networks.
3) Ranked users by followers, PageRank and retweets, finding influence inferred from followers differs from popularity of tweets.
4) Analyzed trending topics and found most are news headlines that persist for days with participation from many users.
Information access can be limited in situations where traditional media outlets can't cover events due to geographical limitations or censorship, such as civil unrest, war, or natural disasters. In these situations, citizen journalism replaces or complements traditional media in documenting such events. Microblogging services such as Twitter have become especially useful in these scenarios due to their mobile nature and multimedia capabilities.
In this research we propose a method to create searchable, semantically annotated news articles from tweets in an automated way using the cloud of linked open data.
This document summarizes a study that analyzed how information spreads and evolves on the social network Facebook. The researchers examined thousands of "memes" (ideas or messages that spread analogously to genes) that were collectively replicated hundreds of millions of times on Facebook. They found that as memes spread from person to person, the information undergoes mutations and the variants exhibit population distributions that follow patterns predicted by the Yule process of evolution. Variants further apart in the diffusion process tend to have greater differences, and some text sequences can confer an advantage to variants and transfer between memes. Subpopulations on Facebook also preferentially transmit variants that match their beliefs.
Groundhog Day: Near-Duplicate Detection on Twitter (Dan Nguyen)
This document presents a framework for detecting near-duplicate tweets on Twitter. The framework analyzes tweet pairs using three approaches: (1) comparing syntactic characteristics like word overlap, (2) measuring semantic similarity, and (3) analyzing contextual information. Machine learning is used to learn patterns that help identify duplicate tweets. The framework is integrated into a Twitter search engine called Twinder to diversify search results and improve search quality. Extensive experiments evaluate strategies for detecting duplicate tweets and analyzing features that impact detection. The results show semantic features can boost duplicate detection performance.
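To illustrate the first, syntactic approach, here is a minimal sketch (hypothetical, not from the paper) that scores tweet pairs by word-level Jaccard overlap:

```python
def jaccard_overlap(tweet_a: str, tweet_b: str) -> float:
    """Word-level Jaccard similarity between two tweets."""
    a, b = set(tweet_a.lower().split()), set(tweet_b.lower().split())
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

# Pairs above a tuned threshold become near-duplicate candidates;
# the framework would then layer semantic and contextual features on top.
print(jaccard_overlap("quake hits city center", "quake hits the city center"))
```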
How to Make People Click on a Dangerous Link Despite Their Security Awareness (mark-smith)
It is possible to make virtually any person click on a link: everyone is curious about something or interested in some topic, and a message can seem plausible because the recipient knows the sender or because it fits their expectations (context).
This document summarizes a study that quantifies information overload on social media platforms using data from Twitter. The study models social media users as information processing systems that receive information in queues and process it at certain rates. By analyzing timestamps of tweets received and forwarded, the study estimates users' information processing behaviors and limits. Key findings include evidence that most users have processing limits of around 30 tweets/hour, and that overloaded users take longer to process information and prioritize tweets from select sources. The study also finds that information overload reduces the effectiveness of information spreading on social issues.
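The rate-estimation idea can be illustrated with a toy calculation (invented timestamps, not the study's data or method):

```python
import numpy as np

# Hypothetical timestamps (seconds) at which a user forwarded tweets.
forward_times = np.array([0, 90, 200, 260, 400, 410, 900])

# A crude processing-rate estimate: forwards per hour over the window.
window_hours = (forward_times[-1] - forward_times[0]) / 3600
rate_per_hour = (len(forward_times) - 1) / window_hours
print(f"estimated processing rate: {rate_per_hour:.1f} tweets/hour")
```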
Detection and Resolution of Rumours in Social Media (ObedullahFahad)
This document provides a survey of research on the detection and resolution of rumors on social media. It discusses characteristics of rumors, how they spread on social media, challenges they pose, and approaches that have been studied for rumor detection, tracking, stance classification, and veracity classification. Key points include defining rumors and their temporal characteristics, how early studies differed from current social media analysis, challenges rumors pose in domains like news, crises, public opinion and stock markets, and machine learning approaches that have been applied to these rumor analysis tasks.
Online Search And Society: Could Your Best Friend Be Your Worst Enemy? (Rachel Noonan)
The document discusses how search engines have become one of the most disruptive technologies, with Google alone processing over 1.5 billion searches per day. Because search engines personalize results based on users' search histories and online activities, concerns arise about how this may influence society and individuals. It provides background on the rise of search engines and their evolution from basic web directories to personalized assistants that track extensive user data.
Anatoliy Gruzd and Philip Mai
Workshop presented at the TTRA Annual International Conference in Quebec City (June 20, 2017)
https://2017ttraannualinternationalconfe.sched.com/event/9yCg/social-listening-how-to-do-it-and-how-to-use-it-veille-sociale-comment-faire-et-comment-lutiliser?iframe=no&w=100%&sidebar=no&bg=no
The document provides an overview of social media and search engine optimization techniques. It discusses key metrics related to internet usage and search engines. Various case studies are presented that demonstrate how organizations have used blogs, podcasts, online video and social networks like Facebook and Twitter to engage audiences and optimize search engine results. Strategies for social bookmarking, wikis and monitoring online conversations are also covered.
Identifying and Characterizing User Communities on Twitter during Crisis Events (IIIT Hyderabad)
Twitter is a prominent online social medium used to share information and opinions. Previous research has shown that current real-world news topics and events dominate the discussions on Twitter. In this paper, we present a preliminary study to identify and characterize communities from a set of users who post messages on Twitter during crisis events. We present our work in progress by analyzing three major crisis events of 2011 as case studies (Hurricane Irene, the riots in England, and the earthquake in Virginia). Hurricane Irene alone caused damage of about 7-10 billion USD and claimed 56 lives. The aim of this paper is to identify the different user communities and characterize them by their top central users. First, we defined a similarity metric between users based on their links, posted content, and metadata. Second, we applied spectral clustering to obtain the communities of users formed during the three crisis events. Third, we evaluated a mechanism to identify top central users using degree centrality; we showed that the top users represent the topics and opinions of all the users in the community with 81% accuracy on average. The top central people identified represent what the entire community shares; therefore, to understand a community, we need to monitor and analyze only these top users rather than all the users in it.
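A minimal, hypothetical sketch of that pipeline (similarity matrix → spectral clustering → degree-central representatives), assuming a precomputed similarity matrix rather than the paper's exact metric:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Hypothetical symmetric user-user similarity matrix built from
# links, content, and metadata (the paper defines its own metric).
S = np.array([[1.0, 0.9, 0.1, 0.0],
              [0.9, 1.0, 0.2, 0.1],
              [0.1, 0.2, 1.0, 0.8],
              [0.0, 0.1, 0.8, 1.0]])

# Spectral clustering directly on the precomputed affinity matrix.
labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                            random_state=0).fit_predict(S)

# Within each community, rank users by weighted degree and keep the
# top user as the community's representative.
for c in set(labels):
    members = np.where(labels == c)[0]
    degree = S[members][:, members].sum(axis=1)
    print(f"community {c}: top user = {members[degree.argmax()]}")
```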
This document discusses how Twitter data can be analyzed to predict election outcomes and understand political behavior. It describes how researchers extracted tweets mentioning political candidates and measured the volume of tweets between candidates to predict winners in 404 of 435 congressional races in 2010. Regression analysis was also used to show a relationship between a candidate's "tweet share" and their actual vote share. The document also outlines how hashtags, mentions, retweets and other elements of tweets can provide insight into political conversations and predictions.
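The "tweet share" regression can be illustrated with a toy ordinary-least-squares fit (invented numbers, not the study's data):

```python
import numpy as np

# Hypothetical per-race data: a candidate's share of tweet mentions
# vs. the vote share they actually received.
tweet_share = np.array([0.35, 0.48, 0.55, 0.62, 0.71])
vote_share = np.array([0.41, 0.47, 0.52, 0.58, 0.66])

# Ordinary least squares fit: vote_share ≈ a * tweet_share + b.
a, b = np.polyfit(tweet_share, vote_share, 1)
print(f"vote_share ≈ {a:.2f} * tweet_share + {b:.2f}")

# Predicting a winner then reduces to comparing the candidates'
# predicted vote shares within a race.
```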
Social Media Analytics for Official Statistics (Ismail Fahmi)
This document summarizes a presentation given by Ismail Fahmi on using social media analytics for official statistics. It discusses how big data from sources like social media and online news can supplement traditional statistics by providing more timely information. Examples are given of analyzing social media data from Twitter and online news to derive indicators related to topics like tourism, economic activity, and violence in news. Methods described include sentiment analysis, demographic analysis, and detecting bots. The takeaway message is that while big data cannot replace official statistics, it can provide complementary data to help obtain more up-to-date information when official statistics are delayed.
Social media is now the place where people gather en masse to discuss the news with their friends, neighbors, and complete strangers. This change in news consumers' behavior is proving to be a challenge for local news, but it is also an opportunity: user- and system-generated data from social media can be a boon for content creators. This presentation will feature a case study showing how publishers can use social media analytics to gain insights into their audience and how to use this information to foster a stronger sense of community around their brand of journalism. The case study will focus on how to use Netlytic, a cloud-based social media analytics tool, to mine the public Facebook interactions of the readers of BlogTO, a regional, Canadian-based media outlet, to find out what their readers are interested in and what engages them.
The document discusses how social browsing and information filtering works on social media sites like Digg and Flickr. It finds that on Digg, users are more likely to vote for stories submitted by friends and stories that their friends have voted for. On Flickr, users put significant effort into sharing photos with groups, and photos are more likely to receive comments from the uploader's social connections than strangers. Social networks and browsing the activities of connections helps drive promotion and discovery of content on these social media sites.
Social Media in Australia: A ‘Big Data’ Perspective on Twitter (Axel Bruns)
Invited presentation at the University of Melbourne, 4 April 2017.
Twitter research to date has focussed mainly on the study of isolated events, as described for example by specific hashtags or keywords relating to elections, natural disasters, public events, and other moments of heightened activity in the network. This limited focus is determined in part by the limitations placed on large-scale access to Twitter data by Twitter, Inc. itself. This research presents the first ever comprehensive study of a national Twittersphere as an entity in its own right. It examines the structure of the follower network amongst some 4 million Australian Twitter accounts and the dynamics of their day-to-day activities, and explores the Australian Twittersphere’s engagement with specific recent events.
Twitter has impacted journalism and news media in several ways. It has empowered journalists to report news from the source as it happens through live tweets. It has changed the way people consume news by providing more immediate updates directly from eyewitnesses. It has also led both traditional and independent media outlets to use Twitter as a platform to publish and promote their news, thereby changing how news is distributed and monetized.
This document provides a high-level and low-level description of a sentiment analysis system. At the high level, it collects text data, splits it into sentences, assigns polarity, checks for repeated words, and extracts sentiment. The low-level description details how it collects data from Facebook using APIs, processes the data by tagging parts of speech, analyzes polarity vs neutral sets, lists features, and builds a classifier using naive Bayes and dependencies between n-grams and parts of speech. The system aims to analyze sentiment from social media texts at both the document and sentence level.
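A minimal naive Bayes sentiment classifier in scikit-learn, standing in for the pipeline described (the actual system adds part-of-speech tags and n-gram dependencies):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny invented training set; a real system would use labeled
# social media posts collected via platform APIs.
texts = ["great service, loved it", "terrible app, total waste",
         "really happy with this", "awful and disappointing"]
labels = ["pos", "neg", "pos", "neg"]

# Bag-of-words unigrams and bigrams feeding a multinomial naive Bayes model.
clf = make_pipeline(CountVectorizer(ngram_range=(1, 2)), MultinomialNB())
clf.fit(texts, labels)

print(clf.predict(["happy with the service"]))  # expected: ['pos']
```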
Twitter analytics: some thoughts on sampling, tools, data, ethics and user requirements (Farida Vis)
Keynote delivered at the SRA Social Media in Social Research conference, London, 24 June, 2013. The presentation highlights some thoughts on sampling, tools, data, ethics and user requirements for Twitter analytics, including an overview of a series of recent tools.
This document summarizes the key findings of a Pew Research Center study analyzing 117 million interactions with news articles on cellphones from September 2015. The study found that readers spend significantly more time (about twice as long) engaging with long-form articles (1,000+ words) compared to short-form articles on their phones. However, short-form articles make up the vast majority (76%) of the content. Both long and short-form articles have very brief lifespans, with over 80% of interactions beginning within two days of publication. While most readers (72-79%) only view one article per site, return visitors and those arriving via internal links spend the most time engaged.
Abstract: The existence of spam URLs in emails and on Online Social Media (OSM) has become a growing phenomenon. To counter the dissemination issues associated with long, complex URLs in emails and the character limits imposed by various OSM (like Twitter), the concept of URL shortening gained a lot of traction. URL shorteners take a long URL as input and return a short URL with the same landing page. With its immense popularity over time, the technique has become a prime target for attackers, giving them an advantage in concealing malicious content. Bitly, a leading service in this domain, is being exploited heavily to carry out phishing attacks, work-from-home scams, pornographic content propagation, etc. This imposes additional performance pressure on Bitly and other URL shorteners to detect and take timely action against illegitimate content. In this study, we analyzed a dataset marked as suspicious by Bitly in October 2013 to highlight some ground issues in its spam detection mechanism. In addition, we identified some short-URL-based features and coupled them with two domain-specific features to classify a Bitly URL as malicious or benign, achieving a maximum accuracy of 86.41%. To the best of our knowledge, this is the first large-scale study to highlight the issues with Bitly's spam detection policies and propose a suitable countermeasure.
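A hypothetical sketch of the feature-plus-classifier idea; the feature names below are invented stand-ins, not the paper's actual feature set:

```python
from sklearn.ensemble import RandomForestClassifier

# Invented per-URL features: [click count, creator account age (days),
# landing-domain reputation score]; labels: 1 = malicious, 0 = benign.
X = [[5000, 2, 0.10], [120, 900, 0.90], [3000, 10, 0.20], [40, 1500, 0.95]]
y = [1, 0, 1, 0]

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict([[4000, 5, 0.15]]))  # expected: [1] (malicious)
```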
Twitter is now an established and widely popular news medium. Be it normal banter or a discussion of high-impact events like the Boston Marathon blasts or the February 2014 US ice storm, people use Twitter to get updates and broadcast their thoughts and views. Twitter bots have become very common and accepted: people use them to get updates about emergencies like natural disasters and terrorist strikes, and about different places and events, both local and global. Twitter bots give users a means to perform tasks on Twitter that are simple and structurally repetitive at a much higher rate than would be possible for a human alone. During high-impact events, these bots tend to provide a time-critical and comprehensive information source, with information aggregated from various sources. In this study, we present how these bots participate in and augment discussions during high-impact events. We identify bots in five high-impact events from 2013: the Boston blasts, the February 2014 US ice storm, the Washington Navy Yard shooting, the Oklahoma tornado, and Cyclone Phailin. We identify bots among the top tweeters by having all such accounts manually annotated. We then study their activity and present many important insights. We determine the impact bots have on information diffusion during these events and how they tend to aggregate and broker information from various sources to different users. We also analyzed their tweets and list important features that differentiate bots from non-bots (normal, human accounts) during high-impact events. We further show how bots are slowly moving away from traditional API-based posts toward web automation platforms like IFTTT, dlvr.it, etc. Using standard machine learning, we propose a methodology to identify bots and non-bots in real time during high-impact events. This study also looks into how the bot scenario has changed by comparing data from high-impact events in 2013 against data from similar events in 2011. Bots active in high-impact events generally don't spread malicious content. Lastly, we also present an in-depth analysis of the Twitter bots that were active during the 2013 Boston Marathon blast. We show that bots, because of their programming structure, don't pick up rumors easily during these events, and even when they do, they do so only after a long delay.
Programs are susceptible to malformed data coming from untrusted sources. Occasionally, the programming logic or constructs used are inappropriate to handle all the constraints that legal, well-formed data must satisfy. As a result, programs produce unexpected results or, even worse, crash. Program behavior in both cases is highly undesirable.
In this thesis work, we present a novel hybrid approach that saves programs from crashing when the failures originate from malformed strings or inappropriate handling of strings. Our approach statically analyses a program to identify statements that are vulnerable to failures related to associated string data. It then generates patches that are likely to satisfy constraints on the data and, in case of failure, produce program behavior close to what is expected. The precision of the patches is improved with the help of a dynamic analysis. The patches are activated only after a failure is detected, and the technique incurs no runtime overhead during the normal course of execution and negligible overhead in case of failure.
We have experimented with the Java String API and applied Clotho to several hugely popular open-source libraries to patch 30 bugs, several of them rated critical or major. Our evaluation shows that Clotho is both practical and effective. Comparing the patches generated by our technique with the actual patches developed by the programmers in later versions shows that they are semantically similar.
+ Background & Basics of Web App Security, The HTTP Protocol
+ Web Application Insecurities, OWASP Top 10 Vulnerabilities (XSS, SQL Injection, CSRF, etc.)
+ Web App Security Tools (Scanners, Fuzzers, etc.), Remediation of Web App Vulnerabilities
+ Web Application Audits and Risk Assessment
Web Application Security 101 was conducted by:
Vaibhav Gupta, Vishal Ashtana, Sandeep Singh from Null.
Various open-source cryptographic libraries are used these days to implement general-purpose cryptographic functions and to provide a secure communication channel over the Internet. These libraries, which implement SSL/TLS, have been targeted in the past by various side-channel attacks that result in the leakage of sensitive information flowing over the network. Side-channel attacks rely on inadvertent leakage of information from devices through observable attributes of online communication; some of the common side-channel attacks discovered so far rely on packet arrival and departure times (timing attacks), power usage, and packet sizes. Our research explores a novel side-channel attack that relies on CPU architecture and instruction sets. We explored such side-channel vectors against popular SSL/TLS implementations that were previously believed to be patched against padding oracle attacks like the POODLE attack, and we were able to successfully extract plaintext bits from the information exchanged using the APIs of two popular SSL/TLS libraries.
Today, more than two hundred Online Social Networks (OSNs) exist, each offering distinct services to its users, such as eased access to news or better business opportunities. To enjoy each distinct service, a user innocuously registers herself on multiple OSNs. For each OSN, she defines her identity with a different set of attributes, genre of content, and friends to suit her purpose in using that OSN. Thus, the quality, quantity, and veracity of the identity vary with the OSN. This results in dissimilar identities of the same user, scattered across the Internet, with no explicit links directing to one another. These disparate, unlinked identities worry various stakeholders. For instance, security practitioners find it difficult to verify attributes across unlinked identities; enterprises fail to create a holistic overview of their customers.
Research that finds and links disconnected identities of a user across OSNs is termed identity resolution. Access to unique and private attributes of a user, like email, makes the task trivial; in the absence of such attributes, however, identity resolution is challenging. In this dissertation, we make an effort to leverage intelligent cues and patterns extracted from the partially overlapping lists of public attributes of compared identities. These patterns emerge from consistent user behavior, like sharing the same mobile number, content, or profile picture across OSNs. Translating these patterns into features, we devise novel heuristic, unsupervised, and supervised frameworks to search for and link user identities across social networks. The proposed search methods use an exhaustive set of public attributes, look for consistent behavior patterns, and fetch the correct identity of the searched user in the candidate set for an additional 11% of users. An improvement on the proposed search mechanisms further optimizes time and space complexity. The suggested linking method compares past attribute value sets and correctly connects identities of an additional 48% of users missed by literature methods that compare only current values. Evaluations on popular OSNs like Twitter, Instagram, and Facebook prove the significance and generalizability of the linking method.
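A toy sketch of how cross-network attribute comparisons could be turned into match features (hypothetical attributes and profiles, not the dissertation's frameworks):

```python
from difflib import SequenceMatcher

def match_features(profile_a: dict, profile_b: dict) -> dict:
    """Compare two candidate profiles on overlapping public attributes."""
    return {
        "name_sim": SequenceMatcher(None, profile_a["name"],
                                    profile_b["name"]).ratio(),
        "same_phone": profile_a.get("phone") == profile_b.get("phone"),
        "same_avatar_hash": profile_a.get("avatar") == profile_b.get("avatar"),
    }

a = {"name": "Jane Q. Doe", "phone": "555-0101", "avatar": "e3b0c4"}
b = {"name": "Jane Doe", "phone": "555-0101", "avatar": "e3b0c4"}
# Such features would feed a supervised linker; here we just inspect them.
print(match_features(a, b))
```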
In recent years, due to advancements in video and image editing tools, it has become increasingly easy to modify multimedia content. Doctored videos are very difficult to identify through visual examination, as the artifacts left behind by processing steps are subtle and cannot easily be captured visually. Therefore, the integrity of digital videos can no longer be taken for granted, and they are not readily acceptable as proof of evidence in a court of law. Hence, identifying the authenticity of videos has become an important field of information security.
In this thesis work, we present a novel approach to detect and temporally localize video inpainting forgery based on optical flow consistency. The proposed algorithm comprises two stages. In the first stage, we detect whether the given video is inpainted or authentic; in the second stage, we perform temporal localization. Towards this, we first compute the optical flow between frames. Further, we analyze the goodness of fit of chi-square values obtained from optical flow histograms using a Gaussian mixture model. A threshold is then applied to classify between authentic and inpainted videos. In the next stage, we extract Transition Probability Matrices (TPMs) by modelling the optical flow as a first-order Markov process. SVM-based classification is then applied to the obtained TPM features to decide whether a block of non-overlapping frames is authentic or inpainted, thus achieving temporal localization. To evaluate the robustness of the proposed algorithm, we perform experiments against two popular and efficient inpainting techniques. We test our algorithm on public datasets like PETS and SULFA. The results show that the approach is effective against the inpainting techniques. In addition, it detects and localizes the inpainted frames in a video with high accuracy and low false positives.
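The pipeline starts from dense optical flow; as a minimal sketch (hypothetical file name and histogram binning; the GMM, thresholding, TPM, and SVM stages are omitted), the per-frame flow histograms could be computed with OpenCV like this:

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("suspect.mp4")  # hypothetical input path
ok, prev = cap.read()
assert ok, "could not read the first frame"
prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)

histograms = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Dense Farneback optical flow between consecutive frames.
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag, _ = cv2.cartToPolar(flow[..., 0], flow[..., 1])
    # Per-frame histogram of flow magnitudes; the thesis analyzes
    # chi-square goodness-of-fit values over such histograms with a
    # Gaussian mixture model, then models the flow as a Markov process.
    hist, _ = np.histogram(mag, bins=32, range=(0, 8))
    histograms.append(hist)
    prev_gray = gray

print(f"collected {len(histograms)} per-frame flow histograms")
```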
This year, in the month of May, the tenure of the 15th Lok Sabha was to end and elections to the 543 parliamentary seats were to be held. With 813 million registered voters, of which 100 million were first-time voters, India is the world's largest democracy. A whopping $5 billion was spent on these elections, second only to the US Presidential elections ($7 billion) in terms of money spent. The different phases of the elections were held on nine days spanning April and May, making it the most elaborate exercise to choose the Prime Minister of India. The swelling numbers of Internet and Online Social Media (OSM) users turned these unconventional media platforms into a key medium in these elections, one that could affect 3-4% of urban votes according to a report by the IAMAI (Internet & Mobile Association of India). Political parties using Google+ Hangouts to interact with people and party workers, posting campaign photos on Instagram and videos on YouTube, and debating on Twitter and Facebook were strong indicators of the impact of OSM on the India General Elections 2014. With hardly any political leader or party lacking an account on the microblogging site Twitter, the surge in political conversations there inspired us to study and analyze this huge ocean of election data. Our count of tweets related to the elections from September 2013 to May 2014, collected with the help of Twitter's Streaming API, was close to 17.07 million.
We analyzed the complete dataset to find interesting patterns in it and to verify whether the expected trends were also evident in the data collected. We found that activity on Twitter peaked during important events related to the elections. It was evident from our data that politicians' political behavior affected their follower counts and thus their popularity on Twitter. We analyzed our data to identify the topics most discussed on Twitter during these elections. Yet another aim of our work was to find an efficient way to classify the political orientation of users on Twitter. To accomplish this task, we used four different techniques: two based on the content of users' tweets, one on user-based features, and another based on a community detection algorithm applied to the retweet and user-mention networks. We found that the community detection algorithm worked best, with an efficiency of more than 80%. With an aim to monitor the daily incoming data, we built a portal showing an analysis of the tweets of the last 24 hours. To the best of our knowledge, this is the first academic pursuit to analyze the election data and classify users in the India General Elections 2014.
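As a hypothetical sketch of the community detection route (NetworkX's greedy modularity communities as a stand-in for whatever algorithm the study actually used):

```python
import networkx as nx
from networkx.algorithms import community

# Hypothetical retweet graph: an edge means "user u retweeted user v".
G = nx.Graph()
G.add_edges_from([("a", "b"), ("b", "c"), ("a", "c"),   # one cluster
                  ("x", "y"), ("y", "z"), ("x", "z"),   # another cluster
                  ("c", "x")])                          # weak bridge

# Partition the graph by greedy modularity maximization; each community
# would then be labeled with the dominant political orientation of its
# known (seed) members.
for i, comm in enumerate(community.greedy_modularity_communities(G)):
    print(f"community {i}: {sorted(comm)}")
```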
Faking Sandy: Characterizing and Identifying Fake Images on Twitter during Hurricane Sandy (IIIT Hyderabad)
In today's world, online social media plays a vital role during real-world events, especially crisis events. There are both positive and negative effects of social media coverage of events: it can be used by authorities for effective disaster management or by malicious entities to spread rumors and fake news. The aim of this paper is to highlight the role of Twitter during Hurricane Sandy (2012) in spreading fake images about the disaster. We identified 10,350 unique tweets containing fake images that were circulated on Twitter during Hurricane Sandy. We performed a characterization analysis to understand the temporal, social reputation, and influence patterns in the spread of fake images. Eighty-six percent of the tweets spreading fake images were retweets; very few were original tweets. Our results showed that the top thirty users out of 10,215 (0.3%) accounted for 90% of the retweets of fake images; network links such as Twitter follower relationships contributed very little (only 11%) to the spread of these fake photo URLs. Next, we used classification models to distinguish fake images of Hurricane Sandy from real ones. The best results were obtained from a Decision Tree classifier, with 97% accuracy in predicting fake images versus real ones. Tweet-based features were very effective in distinguishing fake-image tweets from real ones, while the performance of user-based features was very poor. Our results showed that automated techniques can be used to identify real images versus fake images posted on Twitter.
$1.00 per RT #BostonMarathon #PrayForBoston: Analyzing Fake Content on Twitter (IIIT Hyderabad)
Researchers analyzed the spread of fake content on Twitter during the 2013 Boston Marathon bombings. They found that over 7 million tweets were posted in the week after the attack, with fake rumors spreading rapidly in the first few hours. The researchers characterized the spread of rumors based on temporal patterns, user attributes, and networks of suspended accounts. They also developed a linear regression model to predict how viral a rumor might become based on the characteristics of early spreaders. The analysis provided insights into curbing the spread of misinformation during crisis events.
This document summarizes the challenges of computational verification of content shared on social media. It discusses how fake content is often shared quickly on platforms like Twitter, leaving little time for verification. The authors examine using machine learning to verify shared images from events like Hurricane Sandy and the Boston Marathon bombings. They find accuracy is around 80% via cross-validation but drops to 58% when training on one event and testing on the other, highlighting challenges verifying new events. Future work could expand verification to other features and more events.
Social Media Training at AED by Eric Schwartzman. This is Day 2 of a 2-Day Seminar delivered on Nov. 10, 2010 in Washington, D.C. Feel free to use this deck but please credit www.ericschwartzman.com
Privacy and Security in Online Social Media: Trust and Credibility on OSM (IIIT Hyderabad)
This document summarizes a lecture on privacy and security in online social media. It discusses analyzing misinformation spread on social media during real-world events like hurricanes and bombings. Features of tweets and user profiles are used to classify tweets as real or fake. A Chrome extension called TweetCred is demonstrated that analyzes tweets in real-time to assess credibility using machine learning models trained on these features. The lecture covers collecting, filtering, and annotating social media data from events. Network and linguistic analysis are used to understand information flow and credibility.
Privacy and Security on Online Social Media: Workshop on Data Analytics & Its... (IIIT Hyderabad)
This document summarizes a workshop presentation on privacy and security in online social media. The presenter discusses their research interests in privacy, cybercrime, social media and usable security. They describe tools they have developed including TweetCred to assess the credibility of tweets, and OCEAN, an open government data repository. The presentation warns of privacy and identity risks when personal details from multiple sources are combined, and calls for more research on these issues in India.
Social Media Training at AED by Eric Schwartzman. This is Day 1 of a 2-Day Seminar delivered on Nov. 9, 2010 in Washington, D.C. Feel free to use this deck but please credit www.ericschwartzman.com
Credibility, Identity Resolution, Privacy, and Policing in Online Social Media (IIIT Hyderabad)
This document summarizes a presentation given by Ponnurangam Kumaraguru on credibility, identity resolution, privacy, and policing on online social media. Some key points include:
- Kumaraguru is an associate professor at IIIT-Delhi whose research interests include social computing, computational social science, and complex networks related to human behavior and security/privacy.
- His team has developed methods for credibility modeling of tweets during crisis events using machine learning techniques. They also built a system called TweetCred to analyze tweet credibility.
- The presentation discussed challenges around user identity resolution across multiple online social networks and described some of Kumaraguru's research in this area, including using profile attributes and network connections to find and link matching identities.
This document discusses real-time web search and the challenges it presents. It defines real-time search as having a small delay between data creation and indexing, allowing search against current hot queries or within a small time window. However, real-time search faces quality issues due to spam and bias in sources like Twitter. Moving forward, real-time search needs to broaden coverage, provide topic-focused and informative results, and balance the needs for recency and quality while discovering information in real time.
Social Media Training :: Market Research Association :: First Outlook Conference (Eric Schwartzman)
This document provides an overview of a social media research workshop. It includes sections on social media metrics and analytics, keyword discovery techniques using tools like Twitter search and Wordles, monitoring conversations through RSS feeds and tools like Google Reader, and demonstrations of social media monitoring dashboards and analytics platforms like Radian6 and Google Analytics. The workshop aims to help attendees develop skills in researching and analyzing social media conversations and engagement.
Social media-training-market-research-assoc-2010Eric Schwartzman
This document provides an overview of a social media research workshop. It includes sections on social media metrics and analytics, keyword discovery techniques using tools like Twitter search and Wordles, monitoring conversations using RSS feeds and Google Reader, and demonstrations of social media monitoring dashboards and analytics platforms like Radian6 and Google Analytics. The workshop aims to teach participants how to effectively research social media for business purposes.
Social Media Training :: Market Research Assoc. 2010Eric Schwartzman
Social Media Training by Eric Schwartzman, presented at the Market Research Association's First Outlook Conference on Nov. 2, 2010 in Orlando, Florida.
Data! Action! Data journalism issues to watch in the next 10 yearsPaul Bradshaw
Keynote at the Nordic data journalism conference #NODA16 - an outline of issues facing data journalism which journalists and academics need to focus on in the next decade.
TweetCred: Real-Time Credibility Assessment of Content on Twitter @ Socinfo...IIIT Hyderabad
This document describes research on real-time credibility assessment of tweets. The researchers created a system called TweetCred that scores tweets for credibility in real-time based on a semi-supervised ranking model. TweetCred was deployed live and scored over 7 million tweets from over 1,400 Twitter users. The researchers evaluated TweetCred on response time, effectiveness, and usability based on surveys of 67 users, finding an average usability score of 70. Future work could focus on personalizing credibility scores based on a user's social network and exploring psychological factors influencing information credibility on Twitter.
Presentation to Conservation Communications Forum in NairobiSochin Limited
We spoke at the inaugural Conservation Communications Forum which investigated the communication strategy that conservancies ought to undertake to better connect with audiences and push a refreshed conservation narrative.
Making smart decision: Thornley Fallis whitepaper looks at important trends, metrics and benchmarks to inform digital communications strategies for 2014 and beyond.
Social media refers to channels for interacting and sharing user-generated content using accessible web-based technologies. 51% of Americans get news from people they follow on social media, and spend an average of 57 minutes on traditional media and 13 additional minutes consuming news online daily. 95% of online shoppers conduct research before purchasing, with 60% using search engines. The average social network user is 37 years old.
The document summarizes a presentation by team CDTW on the topic of fatal incidents between African American males and police. It includes the team members, purpose of presenting their research findings, and a disclaimer. It then outlines the problem statement, hypothesis, and two paths of data collection and analysis - one using Twitter data and the other using structured datasets. It describes the data sources and modeling for each path, and provides recommendations to address issues encountered, such as limited Twitter data collection and disparate structured datasets. Visualizations from each path are also presented.
Similar to Designing and Evaluating Techniques to Mitigate Misinformation Spread on Micro-blogging Web Services
Responsible & Safe AI Systems at ACM India ROCS at IIT BombayIIIT Hyderabad
This document discusses responsible and safe artificial intelligence. It summarizes PK's work on developing context-aware models to reduce bias in large language models and techniques for removing harmful knowledge from models. The talk outlines issues like inconsistency in models, bias indicators, and corrective machine unlearning. It encourages collaboration to advance this important research and calls for building more accountability as models grow more powerful.
International Collaboration: Experiences, Challenges, Success storiesIIIT Hyderabad
This document discusses strategies for successful international collaboration, including maintaining an active website, pursuing joint grant proposals, student exchanges through co-advising and visits, organizing workshops together, and publishing joint papers. It emphasizes finding connections through one's existing network to avoid cold emails, and developing equal partnerships.
The document summarizes a workshop on responsible and safe AI held at IIT Madras. It discusses topics like legal bias and inconsistency in large language models, bias in AI systems, and approaches to make models more interpretable and remove harmful knowledge. Live demonstrations of ChatGPT were shown to illustrate issues like factual inconsistencies and how context is needed to avoid confusion. Overall, the workshop highlighted challenges with AI systems and ongoing research efforts to address issues like bias, lack of context, and removal of harmful information.
Identify, Inspect and Intervene Multimodal Fake NewsIIIT Hyderabad
Fake news refers to intentionally and verifiably false stories created to manipulate people’s perceptions of reality.
The concept of fake news is not new: its presence dates back to AD 1475, when it affected the citizens of Italy on Easter Sunday, and extends to the COVID-19 pandemic in 2020. Fake news has gained traction among audiences, created a buzz online, and had repercussions offline. For instance, injecting hyperbolized fake articles into political campaigns or health and climate studies wreaks havoc. In addition, the proliferation of fabricated stories has played a crucial role in inflaming or suppressing social events. In conclusion, fake news is destructive and can lead to hatred against religions, political figures, celebrities or organizations, resulting in riots, protests or even death.
The massive growth in the proliferation of fake news online might result from numerous technological advancements. Fake news seems to be the permanent reality, with social media being a primary conduit for its creation and dissemination. Despite the difficulty in identifying, tracking, and controlling unreliable content, there must be an effort to halt its expansion. Our research endeavors contribute to tackling various aspects of fake news, encompassing identification, inspection, and intervention. The premise of our thesis is firmly placed at the point where we analyze multiple facets of user-generated content produced online in the form of text and visuals to investigate the field of fake news.
First, we focus on devising different methods to Identify, a.k.a. detect fake news online, by extracting different feature sets from the given information. By designing foundational detection mechanisms, our work accelerates research innovations. Second, our research closely Inspects the fake stories from two perspectives. First, from the information point of view, one can inspect fabricated content to identify the patterns of false reports disseminating over the web, the modality used to create the fabricated content and the platform used for dissemination. Next, from the model point of view, we inspect detection mechanisms used in prior work and their generalizability to other datasets. The thesis also suggests Intervention techniques to help internet users broaden their comprehension of fake news. We discuss potential practical implications for social media platform owners and policymakers.
Data Science for Social Good: #MentalHealth #CodeMix #LegalNLP #AISafetyIIIT Hyderabad
Discuss work on using technology for Judiciary, Lawyers, etc. Analyse social media data, music listening habits for mental health. Bias and Safety in AI Systems.
Papers are available at https://precog.iiit.ac.in/pages/publications.html
It is our choices, Harry, that show what we truly are, far more than our abil...IIIT Hyderabad
This document appears to be the transcript from a B.Tech orientation presentation given by Ponnurangam Kumaraguru at IIIT Hyderabad. The presentation provides advice and encouragement to new students on managing their time at IIIT Hyderabad. It emphasizes making friends, trying new activities and clubs, controlling wants and FOMO, celebrating failures, and using social media to connect with others in a positive way. References are made to movies to illustrate points about perseverance, finding passion, and having a growth mindset during the transition to university life.
Beyond the Surface: A Computational Exploration of Linguistic AmbiguityIIIT Hyderabad
We investigate two specific forms of linguistic ambiguities - polysemy, which is the multiplicity of meanings for a specific word, and tautology, which are seemingly uninformative and ambiguous phrases used in conversations. Both phenomena are widely-known manifestations of linguistic ambiguity at the lexical and pragmatic level, respectively.
The first part of the thesis focuses on addressing this challenge by proposing a new method for quantifying the degree of polysemy in words, which refers to the number of distinct meanings that a word can have. The proposed approach is a novel, unsupervised framework to compute and estimate polysemy scores for words in multiple languages, infusing syntactic knowledge in the form of dependency structures. The proposed framework is tested on curated datasets controlling for different sense distributions of words in three typologically diverse languages - English, French, and Spanish. The framework leverages contextual language models and syntactic structures to empirically support the widely held theoretical linguistic notion that syntax is intricately linked to ambiguity/polysemy.
The second part of the thesis explores how language models handle colloquial tautologies, a type of redundancy commonly used in conversational speech. We first present a dataset of colloquial tautologies and evaluate several state-of-the-art language models on this dataset using perplexity scores. We conduct probing experiments while controlling for the noun type, context and form of tautologies. The results reveal that BERT and GPT2 perform better with modal forms and human nouns, which aligns with previous literature and human intuition.
Data Science for Social Good: #LegalNLP #AlgorithmicBias...IIIT Hyderabad
The talk describes legal NLP work and discusses the following papers:
HLDC: Hindi Legal Documents Corpus https://precog.iiit.ac.in/pubs/HLDC_ACL_2022.pdf
Drug consumption: https://precog.iiit.ac.in/pubs/Effect_oF_Feedback_on_Drug_Consumption_Disclosures_on_Social_Media___ICWSM2023___16Sept1730hrs.pdf
This document provides tips for writing a good research paper. It discusses selecting an appropriate topic and audience, developing an outline, writing drafts for feedback, choosing a descriptive title, writing a literature review, crafting an introduction, including figures and tables, addressing reviewer comments, avoiding plagiarism, and acknowledging collaborators. The goal is to write papers that clearly communicate research and can be improved based on feedback from others.
Data Science for Social Good: #LegalNLP #AlgorithmicBiasIIIT Hyderabad
This document summarizes research on evaluating algorithmic bias in models trained on Hindi legal documents. The researchers collected a dataset of 900k legal documents from Uttar Pradesh courts in Hindi. They trained a bail prediction model on this data and evaluated it for demographic parity bias related to religious attributes. The results showed the model predictions changed more when replacing Hindu names with Muslim names compared to the reverse, indicating a potential bias against Muslims. Overall, the study highlights the need to evaluate models trained on real-world legal data for fairness to avoid perpetuating societal biases.
I discussed our work on #LegalAI #CodeMixing #FakeNews #Elections and other cool projects that we are currently working on at https://precog.iiit.ac.in/
The document discusses social computing research in India, focusing on legal AI and natural language processing applications. It summarizes work analyzing over 900,000 Hindi legal documents from district courts in Uttar Pradesh. Models were developed for tasks like bail prediction and legal document summarization. The research also addresses challenges in processing code-mixed text and fact-checking social media. Overall, the document outlines current research areas and opportunities in social computing for Indian contexts and languages, and provides contact information for those interested in the work.
Modeling Online User Interactions and their Offline effects on Socio-Technica...IIIT Hyderabad
Do online interactions trigger reactions back in the offline world? How can these reactions be detected and quantified? Specifically, what insights can be extracted for users, platform owners, and policymakers to minimize the potential harm of such reactions?
Society functions based on the complex interactions between individuals, communities, and organizations. The advent of the Internet has enabled these interactions to move online. A website or an application that facilitates the digitization of social interactions is called a socio-technical platform. For instance, individuals converse with each other via direct messaging applications (e.g., WhatsApp, Telegram) and share thoughts and gather feedback from communities (e.g., Reddit, Twitter, YouTube). Trade of goods occurs via e-commerce (e.g., Flipkart, Amazon) and online marketplaces (e.g., Google Play store). At times, interactions happening in the online world trigger reactions in the offline world, which we call overflow. Such overflows can have either a positive or negative impact. Socio-technical platforms save every interaction and associated metadata, providing a unique opportunity to analyze rich data at scale: to discover interaction patterns, detect and quantify the overflow of interactions, and extract insights for users and policymakers.
This report aims to study these interactions by keeping the individual as the focal point. We focus on two broad forms of interactions: i) the effect online community feedback can have on an individual's offline actions, and ii) organizations leveraging individual customers' online presence to optimize business processes. In the first part, we work on two scenarios: (a) How does community feedback affect an individual's future drug consumption frequency in a drug community forum? and (b) What changes does an individual undergo immediately after gaining sudden popularity on online social media, and what actions help in maintaining popularity for longer? In the second part, we leverage online information about a customer to improve the prediction of Return-to-Origin on the e-commerce platform.
Privacy. Winter School on “Topics in Digital Trust”. IIT BombayIIIT Hyderabad
The document provides an overview of privacy concepts including definitions of privacy, forms of privacy, social media privacy, data anonymity, and ethics around studying privacy. It discusses Westin's four states of privacy (solitude, intimacy, anonymity, reserve) and Solove's taxonomy of privacy harms. It also covers Westin's privacy indexes, privacy studies conducted in India, OECD and FTC privacy principles, and the costs of reading privacy policies.
The document then discusses privacy-enhancing technologies like communication anonymizers, shared bogus online accounts, obfuscation, and anonymization. Examples of privacy-invasive technologies like spyware and RFID are also provided. Privacy decision-making frameworks like the Platform for Privacy Preferences (P3P) are also covered.
It is our choices, Harry, that show what we truly are, far more than our abil...IIIT Hyderabad
The document provides advice and guidance for students transitioning to campus life from PK, a professor at IIIT Hyderabad. It includes quotes and links related to time management, pursuing interests, controlling wants and FOMO, exploring options before deciding on majors, aiming high while making consistent small progress, and asking for help when needed. The document acknowledges students who provided inputs and the knowledge gained from others to help with advising students.
It is our choices, Harry, that show what we truly are, far more than our abil...IIIT Hyderabad
The document is a transcript of a talk given by PK to B.Tech orientation students at IIIT Hyderabad. Some key points from the talk include:
- Encouraging students to make friends, participate in clubs and extracurricular activities.
- Emphasizing time management and not missing out on opportunities in the first two semesters.
- Advising students to explore different projects and areas before deciding on a focus or specialization.
- Noting that managing courses, social life, hobbies and more will be challenging but important during their time at IIIT.
Development of Stress Induction and Detection System to Study its Effect on B...IIIT Hyderabad
Stress has become a significant mental health problem of the 21st century. The number of people suffering from stress is increasing rapidly. Thus, easy-to-use, inexpensive, and accurate biomarkers are needed to detect stress during its inception. Early detection of stress-related diseases allows people to access healthcare services. This thesis focuses on the development of stress stimuli and the detection of stress induced by these stimuli. Brain regions affected while exposing the subject to these stressful stimuli have also been identified. Three different stimuli, viz. videos, a gamified application, and a game, are investigated to study their effect as stress induction stimuli. To this end, in this thesis, a system is proposed to classify participants into stressed and non-stressed categories using machine learning, deep learning, and statistical techniques. The statistical significance between stressed and non-stressed states was established using the Higuchi Fractal Dimension (HFD) feature extracted from EEG. This feature also helped identify the brain's most affected region due to stress. Another outcome of this thesis is the additional annotation of the ground truth, which further helps validate the participant's experience under the influence of stressful stimuli. This annotation was performed by evaluating participant performance under time pressure. In addition, a technique based on in-game analytics is presented to complement and improve self-reported data. Further, another dimension is presented that utilizes signatures from WiFi Media Access Control (MAC) layer traffic to detect stress indicators in a device-agnostic way.
A Framework for Automatic Question Answering in Indian LanguagesIIIT Hyderabad
The distribution of research efforts in the field of Natural Language Processing (NLP) has not been uniform across all natural languages. It has been observed that there is a significant gap between the development of NLP tools for Indic languages (indic-NLP) and for European languages. We aim to explore different directions to develop an automatic question answering system for Indic languages. We built a FAQ-retrieval based chatbot for healthcare workers and young mothers of India. It supported the Hindi language in either Devanagari script or Roman script. We observed that, if our FAQ database contains a question similar to the query asked by the user, then the developed chatbot is able to find a relevant Question-Answer pair (QnA) among its top-3 suggestions 70% of the time. We also observed that the performance of our chatbot depends on the diversity of the FAQ database. Since database creation requires substantial manual effort, we decided to explore other ways to curate knowledge from raw text irrespective of domain.
We developed an Open Information Extraction (OIE) tool for Indic languages. During preprocessing, chunking of text is performed with our fine-tuned chunker, and the phrase-level dependency tree is constructed using the predicted chunks. In order to generate triples, various rules were handcrafted using the dependency relations in Indic languages. Our method performed better than other multilingual OIE tools in manual and automatic evaluations. The contextual embeddings used in this work do not take the syntactic structure of the sentence into consideration. Hence, we devised an architecture that takes the dependency tree of the sentence into consideration to calculate Dependency-aware Transformer (DaT) embeddings. Since the dependency tree is also a graph, we used a Graph Convolution Network (GCN) to incorporate the dependency information into the contextual embeddings, thus producing DaT embeddings. We used a hate-speech detection task to evaluate the effectiveness of DaT embeddings. Our future plan is to evaluate the applicability of DaT embeddings for the task of chunking. Moreover, the broader aim for the future is to develop an end-to-end pronoun resolution model to improve the quality of triples and DaT embeddings. We also aim to explore the applicability of all our works to the problem of long-context question answering.
Designing and Evaluating Techniques to Mitigate Misinformation Spread on Micro-blogging Web Services
1. Designing and Evaluating Techniques to Mitigate Misinformation Spread on Micro-blogging Web Services
Aditi Gupta
Under the supervision of Dr. Ponnurangam Kumaraguru
Indraprastha Institute of Information Technology, Delhi
July 6, 2015
2. Power of Social Media
– 300 hours of video uploaded every minute
– 500 million tweets posted every day
– 1.44 billion monthly active users
– 60 million photos shared every day
* 2015 statistics
9. Aim
Designing and Evaluating Techniques to Mitigate Misinformation Spread on Micro-blogging Web Services
10. Proposed Solution
– Learning to Rank model for assessing credibility of tweets
– Model based on ground truth data for 20 real-world events and 45 features
– System evaluation using a year-long real-world experiment
– 1,800+ users requested credibility scores for more than 14.2 million tweets
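One common way to realize a Learning to Rank model over per-tweet feature vectors is the pairwise transform: train a linear classifier on feature differences of tweet pairs that carry different credibility grades, then rank tweets by the learned linear score. The sketch below illustrates that idea on synthetic data; it is not the exact ranking model used in the thesis, and all data and values here are hypothetical.

```python
# Minimal pairwise learning-to-rank sketch (synthetic data; not the
# exact ranking model from the thesis).
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 45))      # 45 features per tweet, as in the talk
y = rng.integers(0, 3, size=120)    # graded credibility labels (0=low, 2=high)

# Pairwise transform: for tweets i, j with different grades, learn
# sign(w . (x_i - x_j)) = sign(y_i - y_j).
pairs, signs = [], []
for i in range(len(y)):
    for j in range(i + 1, len(y)):
        if y[i] != y[j]:
            pairs.append(X[i] - X[j])
            signs.append(np.sign(y[i] - y[j]))

clf = LinearSVC(max_iter=10000).fit(np.array(pairs), np.array(signs))

# Score of a tweet = w . x; sorting by this score yields the ranking.
scores = X @ clf.coef_.ravel()
ranking = np.argsort(-scores)
print(ranking[:10])
```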
12. Approach
– Characterizing Misinformation and Fake Content
- Detecting fake images (Hurricane Sandy)
- Analyzing rumor propagation (Boston blasts)
- Detecting user communities (three events)
- Analyzing rumors spread in India-centric events (Mumbai blasts and Assam riots)
– Ranking Framework to Assess Credibility
- 14 events, data tagging: 30% of tweets provide information (17% credible information)
- Linear logistic regression
- Present ranking algorithm to assess credibility of tweets using pseudo relevance feedback
- 45 features computable for a single tweet
– Building and Evaluating a Real-time System
- Live deployment: 1,800+ Twitter users
- Credibility score computed for 14+ million tweets
- Evaluated TweetCred in terms of response time, effectiveness and usability
13. Data Collection
– Created a 24x7 data collection framework
- Streaming / REST APIs
- JSON format
- MySQL databases
– Collected 2+ billion tweets from 2011-14
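As a rough illustration of the storage side of such a framework, the sketch below persists tweets collected as JSON (one object per line, as the Streaming API returns them) into a database. The talk mentions MySQL; sqlite3 from the standard library is used here only to keep the sketch self-contained, and the dump file name is hypothetical.

```python
# Persist streamed tweet JSON into a database for later analysis.
import json
import sqlite3

conn = sqlite3.connect("tweets.db")
conn.execute("""CREATE TABLE IF NOT EXISTS tweets (
    id TEXT PRIMARY KEY,   -- tweet id_str
    user TEXT,             -- author screen name
    created_at TEXT,       -- timestamp string from the API
    payload TEXT           -- full JSON, kept for later feature extraction
)""")

with open("stream_dump.jsonl") as fh:   # hypothetical dump file
    for line in fh:
        t = json.loads(line)
        conn.execute(
            "INSERT OR IGNORE INTO tweets VALUES (?, ?, ?, ?)",
            (t["id_str"], t["user"]["screen_name"], t["created_at"], line.strip()),
        )
conn.commit()
```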
14. Approach
(Roadmap slide repeated; see slide 12.)
15. Background: Hurricane Sandy
– Dates: Oct 22-31, 2012
– Damages worth $75 billion
– Coast of NE America

Faking Sandy: Characterizing and Identifying Fake Images on Twitter during Hurricane Sandy. Aditi Gupta, Hemank Lamba, Ponnurangam Kumaraguru and Anupam Joshi. 2nd International Workshop on Privacy and Security in Online Social Media (PSOSM), in conjunction with the 22nd International World Wide Web Conference (WWW), Rio de Janeiro, Brazil, 2013. Best Paper Award.
17. Data Description
Total tweets: 1,782,526
Total unique users: 1,174,266
Tweets with URLs: 622,860
Tweets with fake images: 10,350
Users with fake images: 10,215
Tweets with real images: 5,767
Users with real images: 5,678
18. Network Analysis
Tweet-retweet graph for the propagation of fake images during the first 2 hours.
Node -> user ID; Edge -> retweet
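Such a propagation graph is straightforward to build with NetworkX; here is a minimal sketch on a hypothetical retweet edge list (retweeting user -> original author):

```python
# Build a directed retweet propagation graph and inspect its structure.
import networkx as nx

retweets = [                      # (retweeter, original author), toy sample
    ("u2", "u1"), ("u3", "u1"), ("u4", "u2"), ("u5", "u1"),
]

G = nx.DiGraph()
G.add_edges_from(retweets)

# Users whose tweets were re-shared most often drive the propagation.
top_spreaders = sorted(G.in_degree(), key=lambda kv: kv[1], reverse=True)
print(top_spreaders[:3])
print(nx.number_weakly_connected_components(G))  # separate cascades
```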
19. Role of Twitter Network
– Analyzed the role of the follower network in fake image propagation
– Crawled the Twitter network for all users who tweeted the fake image URLs
– Graph 1 - Nodes: users, Edges: retweets
– Graph 2 - Nodes: users, Edges: follow relationships
20. Results
Total edges in retweet network: 10,508
Total edges in follower-followee network: 10,799,122
Common edges: 1,215
% overlap: 11%
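The 11% figure is the share of retweet edges that also appear in the follower-followee network; a minimal sketch of that computation on hypothetical edge sets:

```python
# Overlap between the retweet graph and the follower graph as edge sets.
retweet_edges = {("u2", "u1"), ("u3", "u1"), ("u4", "u2")}   # toy data
follow_edges = {("u2", "u1"), ("u4", "u2"), ("u5", "u1")}

common = retweet_edges & follow_edges
overlap = 100.0 * len(common) / len(retweet_edges)
print(f"common={len(common)}, overlap={overlap:.0f}%")  # here: 2 edges, 67%
```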
21. Classification
5-fold cross validation with two feature sets.

User Features [F1]: number of friends; number of followers; follower-friend ratio; number of times listed; user has a URL; user is a verified user; age of user account.

Tweet Features [F2]: length of tweet; number of words; contains question mark?; contains exclamation mark?; number of question marks; number of exclamation marks; contains happy emoticon; contains sad emoticon; contains first order pronoun; contains second order pronoun; contains third order pronoun; number of uppercase characters; number of negative sentiment words; number of positive sentiment words; number of mentions; number of hashtags; number of URLs; retweet count.
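To make the feature sets concrete, here is a minimal sketch of computing a few of the F2 (tweet text) features from raw text; the sentiment lexicons are hypothetical placeholders:

```python
# Extract a handful of the tweet-based (F2) features from raw text.
import re

POSITIVE = {"good", "great", "safe"}   # placeholder lexicons
NEGATIVE = {"bad", "fake", "scam"}

def tweet_features(text: str) -> dict:
    words = text.split()
    return {
        "length": len(text),
        "n_words": len(words),
        "n_question_marks": text.count("?"),
        "n_exclamation_marks": text.count("!"),
        "has_happy_emoticon": ":)" in text or ":-)" in text,
        "n_uppercase": sum(c.isupper() for c in text),
        "n_positive": sum(w.lower() in POSITIVE for w in words),
        "n_negative": sum(w.lower() in NEGATIVE for w in words),
        "n_mentions": len(re.findall(r"@\w+", text)),
        "n_hashtags": len(re.findall(r"#\w+", text)),
        "n_urls": len(re.findall(r"https?://\S+", text)),
    }

print(tweet_features("OMG fake photo!! http://t.co/x #sandy @user"))
```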
22. Classification Results
Classifier | F1 (user) | F2 (tweet) | F1+F2
Naive Bayes | 56.32% | 91.97% | 91.52%
Decision Tree | 53.24% | 97.65% | 96.65%

• Best results were obtained with the Decision Tree classifier: 97% accuracy in predicting fake images from real.
• Tweet-based features are very effective in distinguishing fake-image tweets from real ones, while the performance of user-based features was very poor.
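This evaluation protocol is easy to reproduce with scikit-learn; below is a minimal sketch on synthetic stand-in data (the actual feature matrix came from the labeled Sandy tweets above):

```python
# 5-fold cross validation of the two classifiers from the slide,
# run on synthetic stand-in features.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 18))      # 18 tweet-based (F2) features
y = rng.integers(0, 2, size=1000)    # 1 = fake image, 0 = real image

for name, clf in [("Naive Bayes", GaussianNB()),
                  ("Decision Tree", DecisionTreeClassifier())]:
    scores = cross_val_score(clf, X, y, cv=5)   # 5-fold CV, as in the talk
    print(f"{name}: {scores.mean():.2%}")
```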
23. Boston Blasts
– Twin blasts occurred during the Boston Marathon - April 15th, 2013 at 18:50 GMT
– 3 people were killed and 264 were injured
– First image on Twitter (within 4 mins)

$1.00 per RT #BostonMarathon #PrayForBoston: Analyzing Fake Content on Twitter. Aditi Gupta, Hemank Lamba and Ponnurangam Kumaraguru. IEEE APWG eCrime Research Summit (eCRS), San Francisco, USA, 2013.
25. Data Description
Total tweets: 7,888,374
Total users: 3,677,531
Time of the blast: Mon Apr 15 18:50 2013
Time of first tweet: Mon Apr 15 18:53 2013
31. Spread of Fake Content
– Using linear regression
– Predict how viral a rumor would get
- Based on attributes of users who are propagating the rumor
– Attributes used: followers, friends, favorited count, status count, verified
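A minimal sketch of the regression idea: predict a rumor's future virality from aggregate attributes of the users spreading it so far. The data below is synthetic and the feature choices simply mirror the list above; the thesis fit this on real cascades.

```python
# Linear regression from propagator attributes to a virality proxy.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)
# One row per rumor: mean followers, mean friends, mean favourites,
# mean statuses, fraction of verified users among current propagators.
X = rng.normal(size=(300, 5))
true_w = np.array([2.0, 0.3, 0.5, 0.1, 1.5])     # synthetic ground truth
y = X @ true_w + rng.normal(scale=0.5, size=300)  # future virality proxy

model = LinearRegression().fit(X, y)
print(model.coef_)        # which user attributes drive the spread
print(model.score(X, y))  # R^2 of the fit
```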
32. Predicting Spread of Fake Content
Results show it is possible to predict how viral a rumor will become in the future based on attributes of the users currently propagating it.
34. Approach
(Roadmap slide repeated; see slide 12.)

Credibility Ranking of Tweets during High Impact Events. Aditi Gupta and Ponnurangam Kumaraguru. Workshop on Privacy and Security on Online Social Media (PSOSM), co-located with the 21st International World Wide Web Conference (WWW), Lyon, France, 2012.
35. Tweets about an Event
Tweets about #event split into:
– Tweets with information
- Credible information (e.g., no. of people affected, place of event, pictures / videos)
- Non-credible information (fake news / rumors)
– No information (personal opinions / spam)
38. Data Statistics
Events | Tweets | Trending topics
UK Riots | 542,685 | #ukriots, #londonriots, #prayforlondon
Libya Crisis | 389,506 | libya, tripoli
Earthquake in Virginia | 277,604 | #earthquake, Earthquake in SF
JanLokPal Bill Agitation | 182,692 | Anna Hazare, #janlokpal, #anna
Apple CEO Steve Jobs resigns | 158,816 | Steve Jobs, Tim Cook, Apple CEO
US Downgrading | 148,047 | S&P, AAA to AA
Hurricane Irene | 90,237 | Hurricane Irene, Tropical Storm Irene
Google acquires Motorola Mobility | 68,527 | Google, Motorola Mobility
News of the World Scandal | 67,602 | Rupert Murdoch, #murdoch
Abercrombie & Fitch stocks drop | 54,763 | Abercrombie & Fitch, A&F
Muppets Bert and Ernie were gay | 52,401 | Bert and Ernie
Indiana State Fair Tragedy | 49,924 | Indiana State Fair
Mumbai Blast, 2011 | 32,156 | #mumbaiblast, Dadar, #needhelp
New Facebook Messenger | 28,206 | Facebook Messenger
39. Annotation
– Step 1
- R1. Contains information about the event
- R2. Is related to the event, but contains no information
- R3. Not related to the event
- R4. Skip tweet
– Step 2
- C1. Definitely credible
- C2. Seems credible
- C3. Definitely incredible
- C4. Skip tweet
40. Annotation Results
– Each tweet annotated by 3 people
– Inter-annotator agreement (Cronbach's alpha) = 0.748
– 30% of tweets provide information (17% credible information) and 14% was spam
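For reference, Cronbach's alpha over k annotators is k/(k-1) multiplied by (1 minus the sum of per-annotator score variances divided by the variance of the summed scores). A minimal sketch on hypothetical ratings (rows = tweets, columns = annotators):

```python
# Cronbach's alpha as an inter-annotator agreement measure.
import numpy as np

def cronbach_alpha(ratings: np.ndarray) -> float:
    k = ratings.shape[1]                     # number of annotators
    item_vars = ratings.var(axis=0, ddof=1)  # variance per annotator
    total_var = ratings.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

ratings = np.array([[1, 1, 2],   # credibility grades from 3 annotators
                    [3, 3, 3],
                    [2, 1, 2],
                    [1, 2, 1],
                    [3, 3, 2]])
print(round(cronbach_alpha(ratings), 3))
```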
41. Feature Sets
Message based features: length of the tweet; number of words; number of unique characters; number of hashtags; number of retweets; number of swear language words; number of positive sentiment words; number of negative sentiment words; tweet is a retweet; number of special symbols [$, !]; number of emoticons [:-), :-(]; tweet is a reply; number of @-mentions; time lapse since the query; has URL; number of URLs; use of URL shortener service.

Source based features: registration age of the user; number of statuses; number of followers; number of friends; is a verified account; length of description; length of screen name; has URL; ratio of followers to followees.
42. Evaluation Metric
Evaluation metric: NDCG (Normalized Discounted Cumulative Gain). NDCG is the standard metric used to evaluate "graded" results.
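For reference, the standard formulation (not shown on the slide) is:

$$\mathrm{DCG@}k=\sum_{i=1}^{k}\frac{2^{rel_i}-1}{\log_2(i+1)},\qquad \mathrm{NDCG@}k=\frac{\mathrm{DCG@}k}{\mathrm{IDCG@}k}$$

where $rel_i$ is the graded credibility label of the tweet at rank $i$ and $\mathrm{IDCG@}k$ is the DCG of the ideal (best possible) ordering, so NDCG lies in [0, 1].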
43. Ranking Results
• Tweet and user based features both contribute to determining credibility - it matters "what you post and who you are".
44. PRF
– PRF (Pseudo Relevance Feedback)
- Extract the top-k ranked documents and then re-rank those documents according to a defined score
- Re-ranking based on 'top words' of an event
- Top n unigrams selected with the BM25 ranking function
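A minimal sketch of this PRF step under stated assumptions: weight each unigram appearing in the top-k tweets with a BM25-style score, keep the top-n 'top words', and re-rank tweets by how many of those words they contain. The toy corpus and the parameter values below are hypothetical; the thesis applied this per event.

```python
# BM25-weighted 'top words' and PRF-style re-ranking of tweets.
import math
from collections import Counter

top_k_tweets = [
    "blast near marathon finish line",
    "police confirm blast at marathon",
    "prayers for everyone at the marathon",
]
docs = [t.split() for t in top_k_tweets]
N = len(docs)
avgdl = sum(map(len, docs)) / N
k1, b, n_top = 1.5, 0.75, 3          # standard BM25 parameters (assumed)

df = Counter(term for d in docs for term in set(d))  # document frequency

def bm25_weight(term: str) -> float:
    idf = math.log(1 + (N - df[term] + 0.5) / (df[term] + 0.5))
    score = 0.0
    for d in docs:
        tf = d.count(term)
        score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(d) / avgdl))
    return score

top_words = sorted(df, key=bm25_weight, reverse=True)[:n_top]

# Re-rank: tweets mentioning more event 'top words' move up.
reranked = sorted(top_k_tweets,
                  key=lambda t: sum(w in t.split() for w in top_words),
                  reverse=True)
print(top_words, reranked, sep="\n")
```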
47. Approach
(Roadmap slide repeated; see slide 12.)

TweetCred: Real-Time Credibility Assessment of Content on Twitter. Aditi Gupta, Ponnurangam Kumaraguru, Carlos Castillo and Patrick Meier. Proceedings of the 6th International Conference on Social Informatics (SocInfo), Barcelona, Spain, 2014. Honorable Mention for Best Paper.
49. Features for Real-time Analysis
Feature set | Features (45)
Tweet meta-data | Number of seconds since the tweet; source of tweet (mobile / web / etc.); tweet contains geo-coordinates
Tweet content (simple) | Number of characters; number of words; number of URLs; number of hashtags; number of unique characters; presence of stock symbol; presence of happy smiley; presence of sad smiley; tweet contains 'via'; presence of colon symbol
Tweet content (linguistic) | Presence of swear words; presence of negative emotion words; presence of positive emotion words; presence of pronouns; mention of self words in tweet (I; my; mine)
Tweet author | Number of followers; friends; time since the user joined Twitter; etc.
Tweet network | Number of retweets; number of mentions; tweet is a reply; tweet is a retweet
Tweet links | WOT score for the URL; ratio of likes / dislikes for a YouTube video
50. Training Data
– 500 tweets per event
– Used the CrowdFlower service

Event | Tweets | Users
Boston Marathon Blasts (2013) | 7,888,374 | 3,677,531
Typhoon Haiyan / Yolanda (2013) | 671,918 | 368,269
Cyclone Phailin (2013) | 76,136 | 34,776
Washington Navy Yard shootings (2013) | 484,609 | 257,682
Polar vortex cold wave (2014) | 143,959 | 116,141
Oklahoma Tornadoes (2013) | 809,154 | 542,049
Total | 10,074,150 | 4,996,448
51. Annotation
– Step 1
- R1. Contains information about the event
- R2. Is related to the event, but contains no information
- R3. Not related to the event
- R4. Skip tweet
Distribution: 45% (class R1), 40% (class R2), and 15% (class R3)
– Step 2
- C1. Definitely credible
- C2. Seems credible
- C3. Definitely incredible
- C4. Skip tweet
Distribution: 52% (class C1), 35% (class C2), and 13% (class C3)
53. Top Ten Features
– No. of characters in tweet
– Unique characters in tweet
– No. of words in tweet
– User has location in profile
– Number of retweets
– Age of tweet
– Tweet contains URL
– Tweet contains 'via'
– Statuses / followers
– Friends / followers
56. Usage Statistics
Date of launch of TweetCred: 27 Apr, 2014
Credibility score requests received: 14,234,131
Unique Twitter users: 1,808
Tweets for which feedback was given: 1,654
Unique users who gave feedback: 364
* Data as of April 2015
57. Users of TweetCred
Sample users:
- Emergency responders
- Firefighters
- Journalists / news media
- General users
- Researchers (requested API tokens)
60. Limitations & Future Work
– Current research focuses on Twitter; we would like to analyze the credibility of content on other social media using a similar framework
– We would like to enhance the current system to indicate tweets that are timely, factual, well-written, etc.
61. Contributions Summary
– Analyzed how real and fake content is propagated through the Twitter network, with the purpose of assessing the reliability of Twitter as an information source during real-world events.
– Proposed a learning-to-rank framework for assessing credibility of content on Twitter using a combination of content, meta-data, network, user profile and temporal features.
– Evaluated and deployed a novel framework for providing an indication of the trustworthiness / credibility of tweets posted during events.
62. Real-world Impact
– The real-time system TweetCred, built to assess the credibility of content on Twitter, has been used by 1,808 real Twitter users to obtain credibility scores for more than 14.2 million tweets.
– A unique data set of thousands of fake images, rumor tweets and malicious profiles for 25+ real-world events.
63. Publications
– Peer Reviewed Publications
- TweetCred: Real-Time Credibility Assessment of Content on Twitter. Aditi Gupta, Ponnurangam Kumaraguru, Carlos Castillo and Patrick Meier. Proceedings of the 6th International Conference on Social Informatics (SocInfo), Barcelona, Spain, 2014. Honorable Mention for Best Paper.
- $1.00 per RT #BostonMarathon #PrayForBoston: Analyzing Fake Content on Twitter. Aditi Gupta, Hemank Lamba and Ponnurangam Kumaraguru. IEEE APWG eCrime Research Summit (eCRS), San Francisco, USA, 2013.
- Faking Sandy: Characterizing and Identifying Fake Images on Twitter during Hurricane Sandy. Aditi Gupta, Hemank Lamba, Ponnurangam Kumaraguru and Anupam Joshi. 2nd International Workshop on Privacy and Security in Online Social Media (PSOSM), in conjunction with the 22nd International World Wide Web Conference (WWW), Rio de Janeiro, Brazil, 2013. Best Paper Award.
- Identifying and Characterizing User Communities on Twitter during Crisis Events. Aditi Gupta, Anupam Joshi and Ponnurangam Kumaraguru. Workshop on Data-driven User Behavioral Modeling and Mining from Social Media (UMSOCIAL), co-located with the 21st ACM International Conference on Information and Knowledge Management (CIKM), Hawaii, USA, 2012.
- Credibility Ranking of Tweets during High Impact Events. Aditi Gupta and Ponnurangam Kumaraguru. Workshop on Privacy and Security on Online Social Media (PSOSM), co-located with the 21st International World Wide Web Conference (WWW), Lyon, France, 2012.
- Beware of What You Share: Inferring Home Location in Social Networks. Tatiana Pontes, Gabriel Magno, Marisa Vasconcelos, Aditi Gupta, Jussara Almeida, Ponnurangam Kumaraguru and Virgilio Almeida. Privacy in Social Data (PinSoda), in conjunction with the International Conference on Data Mining (ICDM), 2012.
64. Publications
– Peer Reviewed Publications (Posters)
- Analyzing and Measuring Spread of Fake Content on Twitter during High Impact Events. Aditi Gupta, Hemank Lamba, Ponnurangam Kumaraguru. Security and Privacy Symposium, IIT Kanpur, 2014. Best Poster Winner.
- Twit-Digest Version 2: An Online Solution for Analyzing and Visualizing Twitter in Real-Time. Aditi Gupta, Mayank Gupta, Ponnurangam Kumaraguru. Security and Privacy Symposium, IIT Kanpur, 2014.
- Twit-Digest: Real-time Twitter search portal for extracting, tracking and visualizing information. Aditi Gupta, Akshit Chhabra and Ponnurangam Kumaraguru. IBM ICARE, 2012. 2nd Runner's-Up Prize, Best Poster.
- U2P2: Understanding User Privacy Perceptions. Niharika Sachdeva, Ponnurangam Kumaraguru and Aditi Gupta. Poster at IBM-ICARE, 2011.
– Book Chapter
- Misinformation on Twitter during Crisis Events. Aditi Gupta, Ponnurangam Kumaraguru. Encyclopedia of Social Network Analysis and Mining (ESNAM). Springer, 2012.