Social network and publishing platforms, such as Twitter, support the concept of verification. Verified accounts are deemed worthy of platform-wide public interest and are separately authenticated by the platform itself. These platforms have repeatedly asserted that verification is not tantamount to endorsement. However, a significant body of prior work suggests that possessing a verified status symbolizes enhanced credibility in the eyes of the platform audience. As a result, such a status is highly coveted among public figures and influencers. Hence, we attempt to characterize the network of verified users on Twitter and compare the results to similar analyses performed for the entire Twitter network. We extracted the complete graph of verified users on Twitter (as of July 2018), obtaining 231,246 English user profiles and 79,213,811 connections. In our network analysis, we found that the sub-graph of verified users mirrors the full Twitter graph in some respects, such as possessing a short diameter. However, our findings contrast with earlier results on multiple fronts: the out-degree distribution follows a power law, the network is slightly dissortative, and the reciprocity rate is significantly higher, as elucidated in the paper. Moreover, we gauge the presence of salient components within this sub-graph and detect the absence of homophily with respect to popularity, again in stark contrast to the full Twitter graph. Finally, we demonstrate stationarity in the time series of verified user activity levels.
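Two of the measurements named above, reciprocity and the out-degree distribution, can be illustrated with a minimal pure-Python sketch on a toy directed follower graph. The graph and numbers below are invented for illustration; they are not the paper's data or pipeline.

```python
from collections import Counter

# Toy directed "follows" graph: edge (u, v) means u follows v.
edges = {(1, 2), (2, 1), (1, 3), (3, 4), (4, 3), (2, 3)}

# Reciprocity: fraction of edges whose reverse edge also exists.
reciprocity = sum((v, u) in edges for (u, v) in edges) / len(edges)

# Empirical out-degree distribution, the quantity one would test for a power law.
out_degree = Counter(u for (u, v) in edges)
degree_counts = Counter(out_degree.values())

print(reciprocity, dict(degree_counts))
```

On this toy graph, 4 of the 6 edges are reciprocated, so the reciprocity is 2/3; on the real verified-user graph the same ratio is computed over tens of millions of edges.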
Against this backdrop, we attempt to deconstruct the extent to which Twitter's verification policy conflates the notions of authenticity and authority. To this end, we seek to unravel the aspects of a user's profile that likely engender or preclude verification. The aim of the paper is two-fold: first, we test whether discerning the verification status of a handle from profile metadata and content features is feasible; second, we unravel the characteristics that have the most significant bearing on a handle's verification status. We augmented our dataset with all 494 million tweets posted by the aforementioned users over a one-year collection period, along with their temporal social reach and activity characteristics. Our proposed models reliably identify verification status (area under the ROC curve > 0.99). We show that the number of public list memberships, the presence of neutral sentiment in tweets, and an authoritative language style are the most pertinent predictors of verification status.
To the best of our knowledge, this work represents the first quantitative attempt at characterizing verified users on Twitter, and also the first attempt at discerning and classifying verification-worthy users on Twitter.
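The classification setup can be sketched as follows. Everything here is a synthetic stand-in: the three features, the labels, and the plain logistic-regression model are illustrative assumptions, not the authors' dataset or models.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
# Hypothetical standardized features: log list memberships,
# neutral-sentiment share, log follower count.
X = rng.normal(0, 1, (n, 3))
# Synthetic labels: verification driven mostly by list memberships.
logit = 2.5 * X[:, 0] + 1.0 * X[:, 1] + 0.5 * X[:, 2]
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

# Logistic regression fitted by plain gradient descent on the mean log-loss.
w = np.zeros(3)
for _ in range(500):
    p = 1 / (1 + np.exp(-X @ w))
    w -= 0.1 * (X.T @ (p - y)) / n

def roc_auc(y_true, scores):
    # AUC = probability a random positive outscores a random negative.
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties

auc = roc_auc(y, X @ w)
print(f"AUC = {auc:.3f}")
```

The fitted weights recover the relative importance of the features, which mirrors how the paper ranks predictors of verification status.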
Link Prediction in (Partially) Aligned Heterogeneous Social Networks (Sina Sajadmanesh)
This document discusses link prediction in homogeneous and heterogeneous social networks. It begins by introducing the problem of link prediction and its applications. It then discusses various unsupervised and supervised methods for link prediction in homogeneous networks. Next, it covers relationship prediction and collective link prediction in heterogeneous networks. It also discusses link prediction in aligned heterogeneous networks using link transfer and anchor link inference. Finally, it outlines future work on this topic.
In the age of social media communication, it is easy to influence users' minds and, in some cases, even instigate them to violent action. There is a need for a system that can analyze the threat level of tweets from influential users and rank their Twitter handles, so that dangerous tweets can be stopped from going public before fact-checking, as such tweets can hurt people's sentiments and escalate into violence. This study aims to analyze and rank Twitter users according to their influence and the extremism of their tweets, to help prevent major protests and violent events. We scraped top trending topics and fetched tweets using those hashtags. We propose a custom ranking algorithm that considers source-based and content-based features, along with a knowledge graph, to generate a score and rank Twitter users accordingly. Our aim with this study is to identify and rank extremist Twitter users with regard to their impact and influence, using a technique that accounts for both source-based and content-based features of tweets to rank high-impact extremist users.
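A simplified version of such a source-plus-content score might look like the sketch below. The weights, feature names, and thresholds are purely illustrative assumptions; the study's actual algorithm and knowledge-graph component are not reproduced here.

```python
# Hypothetical combined score: a source-based influence proxy plus a
# content-based extremism proxy, mixed with illustrative weights.
def threat_score(user, w_source=0.4, w_content=0.6):
    # Source-based: follower count normalized into [0, 1].
    influence = min(user["followers"] / 1_000_000, 1.0)
    # Content-based: share of the user's tweets flagged as extreme.
    extremism = user["flagged_tweets"] / max(user["total_tweets"], 1)
    return w_source * influence + w_content * extremism

users = [
    {"handle": "@a", "followers": 900_000, "flagged_tweets": 40, "total_tweets": 100},
    {"handle": "@b", "followers": 50_000, "flagged_tweets": 5, "total_tweets": 200},
]
ranking = sorted(users, key=threat_score, reverse=True)
print([u["handle"] for u in ranking])
```

The point of the sketch is only the structure: each handle gets one scalar score combining who is tweeting (source) with what is tweeted (content), and the handles are then ranked by that score.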
AN INTELLIGENT CLASSIFICATION MODEL FOR PHISHING EMAIL DETECTION (IJNSA Journal)
Phishing attacks are among the trending cyber-attacks; they use socially engineered messages, crafted by professional hackers, that aim to fool users into revealing their sensitive information, and email is the most popular channel for delivering such messages. This paper presents an intelligent classification model for detecting phishing emails using knowledge discovery, data mining, and text processing techniques. It introduces the concept of phishing term weighting, which evaluates the weight of phishing terms in each email. The pre-processing phase is enhanced by applying text stemming and the WordNet ontology to enrich the model with word synonyms. The model applies knowledge discovery procedures using five popular classification algorithms and achieves a notable enhancement in classification accuracy: 99.1% with the Random Forest algorithm and 98.4% with J48, which is, to our knowledge, the highest accuracy rate reported for this accredited dataset. The paper also presents a comparative study with similar proposed classification techniques.
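The idea of phishing term weighting can be illustrated with a toy scorer. The term list and the crude suffix stripper below are stand-ins for the paper's stemming and WordNet-synonym steps, not its actual method.

```python
import re

# Illustrative phishing vocabulary (an assumption, not the paper's list).
PHISHING_TERMS = {"verify", "account", "password", "urgent", "suspend"}

def stem(token):
    # Very rough stemmer standing in for a real one (e.g. Porter).
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def phishing_weight(email_text):
    # Weight = share of stemmed tokens that are known phishing terms.
    tokens = [stem(t) for t in re.findall(r"[a-z]+", email_text.lower())]
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in PHISHING_TERMS)
    return hits / len(tokens)

w = phishing_weight("URGENT: verify your account password now")
print(round(w, 2))
```

Each email thus receives a numeric weight that a downstream classifier (Random Forest, J48, etc.) can consume alongside other features.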
Survey in Online Social Media Skelton by Network based Spam (IRJET Journal)
This document proposes a new system called Net Spam to identify spam reviews in online social media networks. Net Spam models review datasets as heterogeneous information networks where nodes can be reviews, users, products, or spam features. It introduces a novel weighting method to determine the relative importance of each spam feature in identifying spam reviews. When evaluated on real review datasets from Yelp and Amazon, Net Spam outperforms existing methods and shows that review behavioral features perform better than other feature types like user behavioral, review linguistic, and user semantic features. The results demonstrate that Net Spam can identify spam reviews with high accuracy and less computational complexity by utilizing the most important features.
Scalable recommendation with social contextual information (eSAT Journals)
Abstract: Recommender systems are used to achieve effective and useful results in social networks. Social recommendation builds on the social network structure, but it is challenging to fuse social contextual factors, derived from users' motivations for social behavior, into social recommendation. Here, we introduce two contextual factors into recommender systems: (a) individual preference and (b) interpersonal influence. Individual preference analyzes the social interest of an item's content against a user's interests and retains only the user's recommended results. Interpersonal influence analyzes user-user interactions and their specific social relations. Beyond this, we propose a novel probabilistic matrix factorization method to fuse the two factors in a latent space. The scalable algorithm provides useful results by analyzing the ranking probability of each user's social contextual information, and it incrementally processes contextual data over large datasets.
Keywords: social recommendation, individual preference, interpersonal influence, matrix factorization
FRAMEWORK FOR ANALYZING TWITTER TO DETECT COMMUNITY SUSPICIOUS CRIME ACTIVITY (cscpconf)
This document summarizes previous research on analyzing social media like Twitter to detect criminal or suspicious activity. It discusses frameworks that have used features of Twitter like hashtags, mentions, retweets etc. to analyze user profiles and communication networks. Some key approaches discussed include using graph analysis to identify influencers and automated accounts, predicting crime locations using Twitter data and linear regression, and detecting criminal networks through node analysis. However, many of the past studies did not fully address challenges like evaluating tweet and user credibility to reduce false positives.
Social Trust-aware Recommendation System: A T-Index Approach (Nima Dokoohaki)
"Social Trust-aware Recommendation System: A T-Index Approach"
Workshop on Web Personalization, Reputation and Recommender Systems (WPRRS09)
Held in conjunction with 2009 IEEE/ WIC/ ACM International Conference on Web Intelligence (WI 2009) and Intelligent Agent Technology,
http://www.wprrs.scitech.qut.edu.au/
Università degli Studi di Milano Bicocca, Milano, Italy
September 15–18, 2009
Probabilistic Relational Models for Link Prediction Problem (Sina Sajadmanesh)
This document discusses probabilistic relational models for link prediction. It introduces probabilistic relational networks, including relational Bayesian networks (RBNs) and relational Markov networks (RMNs). RBNs define a joint probability distribution over attributes and relations. RMNs focus on symmetric interactions using clique templates. Both approaches can be used for link prediction by modeling factors that affect relations between entities, such as attributes, structural properties, and complex patterns.
Tweet Segmentation and Its Application to Named Entity Recognition (1crore projects)
Graph Based User Interest Modeling in Twitter (raghavr186)
This document summarizes a research project on modeling user interests on Twitter using a graph-based approach. The project aims to predict a user's interest profile based on the interests of the users they follow on Twitter. Various graph features are explored as weighting schemes to calculate the influence of followers on a user's interests. Experimental results show that features based on retweets and mentions perform best at predicting interests, with F1 scores around 0.6. A composite model is also proposed that combines predictions from different weighting schemes using learned quality scores. Additionally, a machine learning model is trained to predict interests directly from graph features.
Are Positive or Negative Tweets More "Retweetable" in Brazilian Politics? (Molly Gibbons)
This document summarizes an analysis of tweets containing the terms "Brazil" and "Michel Temer" to understand the political and economic scenario in Brazil. RapidMiner was used to collect tweets over 17 days and the Rosette Text Toolkit categorized tweets and analyzed sentiment. For "Michel Temer", there was a weak to moderate negative correlation between sentiment and retweets, and 75% of tweets were negative. For tweets about "Brazil" categorized as law/politics, 62% were negative and the most mentioned entities were the Senate, President, and Supreme Court. The analysis demonstrates how RapidMiner and Rosette can be used together to understand sentiment in social media posts about political topics.
Prediction of links in graphs based on content and information propagation.
Lecture for the M. Sc. Data Science, Sapienza University of Rome, Spring 2016.
The document summarizes a research paper that proposes a personalized recommendation approach combining social network factors like interpersonal interest similarity and interpersonal rating behavior similarity. It uses probabilistic matrix factorization to predict ratings by considering these social network factors. The approach is evaluated on two large real-world social rating datasets and shows improved performance over approaches that only use social network information.
This document is a project report submitted by four students - Anil Shrestha, Bijay Sahani, Bimal Shrestha, and Deshbhakta Khanal - to the Department of Electronics and Computer Engineering at Tribhuvan University in partial fulfillment of the requirements for a Bachelor's degree in Computer Engineering. The report details the development of a web application called "Tweezer" to perform sentiment analysis on tweets in order to determine public sentiment towards various products, services, or personalities. Literature on previous work related to sentiment analysis, especially on social media data like tweets, is also reviewed in the report.
TWO WAY CHAINED PACKETS MARKING TECHNIQUE FOR SECURE COMMUNICATION IN WIRELES... (pharmaindexing)
This document discusses an efficient semantic data alignment approach based on fuzzy c-means (FCM) clustering to infer user search goals using feedback sessions. It aims to cluster similar pseudo-documents representing user feedback sessions to better understand user search intents for a given query. The approach first collects feedback sessions from search results and generates pseudo-documents. It then uses FCM clustering to group similar pseudo-documents while also measuring semantic similarity between terms. This is an improvement over k-means clustering which does not consider semantic similarity. The results are evaluated using metrics like classified average precision and show this FCM-based approach performs better than clustering without semantic alignment.
A Survey Of Collaborative Filtering Techniques (tengyue5i5j)
This document provides a survey of collaborative filtering techniques. It begins with an introduction to collaborative filtering and its main challenges, such as data sparsity, scalability, and synonymy. It then describes three main categories of collaborative filtering techniques: memory-based, model-based, and hybrid approaches. Representative algorithms from each category are discussed and analyzed in terms of their predictive performance and ability to address collaborative filtering challenges. The document concludes with a discussion of evaluating collaborative filtering algorithms and commonly used datasets.
DYNAMIC LARGE SCALE DATA ON TWITTER USING SENTIMENT ANALYSIS AND TOPIC MODELING (Andry Alamsyah)
1. The document presents a case study analyzing tweets about Uber using sentiment analysis and topic modeling to understand public opinion from large-scale social media data.
2. Sentiment analysis classified tweets as positive, negative, or neutral, while topic modeling identified dominant topics of discussion, like promotions or driver complaints.
3. The analyses found that positive tweets often discussed promotions while negative tweets addressed issues like sexual harassment allegations or unsatisfactory drivers.
- The study analyzed over 43,000 ratings of tweets collected through a website that had users rate tweets in exchange for receiving feedback on their own tweets.
- They found that 36% of rated tweets were considered worth reading, 25% were not worth reading, and 39% were neutral. This suggests that users tolerate a large amount of less desirable content in their feeds.
- Through regression analysis, they determined that tweets sharing information, asking questions of followers, and self-promotion links were most valued, while presence maintenance updates, conversations, and personal status updates were less valued.
Prediction of Reaction towards Textual Posts in Social Networks (Mohamed El-Geish)
Posting on social networks could be a gratifying or a terrifying experience depending on the reaction the post and its author —by association— receive from the readers. To better understand what makes a post popular, this project inquires into the factors that determine the number of likes, comments, and shares a textual post gets on LinkedIn; and finds a predictor function that can estimate those quantitative social gestures.
In this paper, we consider a criminal investigation into the collective guilt of members of a working group. Assuming that the statistics we used are reliable, we present a PageRank model based on mutual information. First, we use the average mutual information between non-suspicious topics and suspicious topics to score topics by degree of suspicion. Second, we build the correlation matrix based on the degree of suspicion and derive the corresponding Markov state-transition matrix. Then, we set the initial value for each member of the working group based on the degree of suspicion. Finally, we calculate the suspicion degree of each member. In the small 10-person case, we build the improved PageRank model; by computing the statistics of this case, we obtain a table ranking the members by degree of suspicion. Comparing with the results given for this problem, the two rankings essentially match, indicating that the model is feasible. In the current case, we first obtain a ranking of 15 topics in order of suspicion via the mutual-information-based PageRank model. Second, we obtain the fixed point of the Markov state-transition matrix using the Markov chain. Then, we build the connection matrix based on the degree of suspicion and derive the corresponding Markov state-transition matrix. Finally, we calculate the suspicion degree of all 83 candidates. In the result, the suspicious members sit at the top of the ranking while the innocent sit at the bottom, again indicating that the model is feasible. When the suspicious topics and conspirators change, the model still yields relatively good results. In the current case, we have evidence to believe that Dolores and Jerome, who are the senior managers, are under significant suspicion, and we recommend that future attention be paid to them.
The PageRank model based on mutual information takes full account of the information flow in the message-distribution network. It can not only handle the statistics used in the conspiracy case, but can also be applied to detect infected cells in a biological network. Finally, we present the advantages and disadvantages of this model and directions for improvement.
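The core computation described above, iterating a Markov state-transition matrix until member scores stabilize, can be sketched as a generic power iteration. The 4x4 matrix below is a small made-up example, not the case data or the suspicion-derived matrix from the paper.

```python
import numpy as np

# Column-stochastic transition matrix over 4 group members:
# entry [i, j] is the probability of moving from member j to member i.
P = np.array([
    [0.0, 0.5, 0.3, 0.2],
    [0.4, 0.0, 0.3, 0.3],
    [0.3, 0.2, 0.0, 0.5],
    [0.3, 0.3, 0.4, 0.0],
])

score = np.full(4, 0.25)      # uniform initial suspicion
for _ in range(100):          # power iteration toward the stationary point
    score = P @ score
score /= score.sum()
print(np.round(score, 3))
```

The stationary vector plays the role of the final suspicion ranking: members with higher stationary mass sit higher on the list.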
Social media recommendation based on people and tags (final) (es712)
1) The document proposes methods to generate personalized recommendations in social media platforms based on people relationships and tags.
2) An evaluation of three recommendation approaches that utilize direct tags, indirect tags through related items, and incoming tags from other users found that a combination of direct tags and incoming tags most accurately represented a user's interests.
3) A user study tested five recommendation approaches and found that combining people relationships and tags into a user profile achieved the highest ratings for interesting recommendations and lowest for non-interesting items.
IRJET- Quantify Mutually Dependent Privacy Risks with Locality Data (IRJET Journal)
This document discusses how co-location information shared on social networks can threaten users' location privacy by enabling more accurate localization of users' locations over time. It formalizes the problem of quantifying privacy risks from co-location data and location information, and proposes optimal and approximate localization attack algorithms to incorporate co-location data. Experimental evaluations on mobility trace data show that considering a single friend's co-locations can decrease a user's median location privacy by up to 62%. Differential privacy perspectives are also discussed. The study aims to quantify the effect of co-location information on location privacy risks.
KnowMe and ShareMe: Understanding Automatically Discovered Personality Trai... (Wookjae Maeng)
There is much recent work on using the digital footprints left by people on social media to predict personal traits and gain a deeper understanding of individuals. Due to the veracity of social media, imperfections in prediction algorithms, and the sensitive nature of one's personal traits, much research is still needed to better understand the effectiveness of this line of work, including users' preferences for sharing their computationally derived traits. In this paper, we report a two-part study involving 256 participants, which (1) examines the feasibility and effectiveness of automatically deriving three types of personality traits from Twitter, including Big 5 personality, basic human values, and fundamental needs, and (2) investigates users' opinions on using and sharing these traits. Our findings show there is a potential feasibility of automatically deriving one's personality traits from social media, with various factors impacting the accuracy of models. The results also indicate that over 61.5% of users are willing to share their derived traits in the workplace and that a number of factors significantly influence their sharing preferences. Since our findings demonstrate the feasibility of automatically inferring a user's personal traits from social media, we discuss their implications for designing a new generation of privacy-preserving, hyper-personalized systems.
Abstract: This paper introduces a system for visual analysis of news articles and emails. The system was developed in response to VAST MiniChallenge 1 and comprises different interfaces for mining textual data and network data.
For more information, please visit: http://people.cs.vt.edu/parang/ or contact parang at firstname at cs vt edu
In this research work, we have built a system that pulls tweets, pre-processes each tweet to remove unwanted artifacts, and applies stemming to each token, finally computing a score value based on the numerical strength expressed in a mathematical expression for intimacy, trust, and social distance between the parties interacting on Twitter. The nature of a relationship with respect to tie strength is estimated using multivariate linear regression analysis; to improve on this method, we further conducted polynomial regression to find a better and more accurate fit to the Twitter tie dataset, as is apparent from the coefficient of determination and other statistical tests. The y-intercept values are interpreted and illustrated in the results section.
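The comparison of linear versus polynomial fits by coefficient of determination can be sketched as follows. The tie-strength data here are synthetic; the study's actual features and dataset are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 50)                                # e.g. interaction intensity
y = 0.5 + 2 * x - 1.5 * x**3 + rng.normal(0, 0.05, 50)   # synthetic tie strength

def r_squared(y_true, y_pred):
    # Coefficient of determination: 1 - SS_res / SS_tot.
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1 - ss_res / ss_tot

r2_lin = r_squared(y, np.polyval(np.polyfit(x, y, 1), x))
r2_cub = r_squared(y, np.polyval(np.polyfit(x, y, 3), x))
print(round(r2_lin, 3), round(r2_cub, 3))
```

When the underlying relationship is nonlinear, the higher-degree fit yields a higher R^2 on the same data, which is the kind of evidence the study cites for preferring polynomial over linear regression.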
This document summarizes a study analyzing social media influence and credibility. A team of students and professors extracted different types of data from Twitter, including mentions of users, tweets by authorities, and keywords. They developed a semantic parser to analyze tweet content using ontological semantic technology. An initial linear score was formulated to measure user influence, and network analysis identified pivotal users between communities. Validation will compare content analysis to real-world events to supplement credibility assessment. The study has potential applications in public policy, business, psychology, and traffic monitoring.
The document summarizes an MSR presentation on rumor detection on real-time Twitter data using supervised learning. It discusses introducing rumor detection and reviewing literature on current methods. The proposed work involves collecting Twitter data, preprocessing it, extracting features, and classifying tweets using techniques like decision trees and SVM. It aims to accurately detect rumors on Twitter in real-time by analyzing sentiment and verifying information with news sources. The implementation strategy and environment are also outlined along with conclusions and future work.
Twitter: Social Network Or News Medium? (Serge Beckers)
This document analyzes Twitter as a social network and news media by studying its topological characteristics and information diffusion. The authors:
1) Crawled over 41 million user profiles, 1.47 billion social connections, and 106 million tweets to analyze Twitter's structure and behavior.
2) Found that Twitter has a non-power law distribution of followers, short paths of separation between users, and low reciprocity - distinguishing it from other social networks.
3) Ranked users by followers, PageRank and retweets, finding influence inferred from followers differs from popularity of tweets.
4) Analyzed trending topics and found most are news headlines that persist for days with participation from many users.
Probabilistic Relational Models for the Link Prediction Problem (Sina Sajadmanesh)
This document discusses probabilistic relational models for link prediction. It introduces probabilistic relational networks, including relational Bayesian networks (RBNs) and relational Markov networks (RMNs). RBNs define a joint probability distribution over attributes and relations. RMNs focus on symmetric interactions using clique templates. Both approaches can be used for link prediction by modeling factors that affect relations between entities, such as attributes, structural properties, and complex patterns.
Tweet Segmentation and Its Application to Named Entity Recognition (1crore projects)
Graph Based User Interest Modeling in Twitter (raghavr186)
This document summarizes a research project on modeling user interests on Twitter using a graph-based approach. The project aims to predict a user's interest profile based on the interests of the users they follow on Twitter. Various graph features are explored as weighting schemes to calculate the influence of followers on a user's interests. Experimental results show that features based on retweets and mentions perform best at predicting interests, with F1 scores around 0.6. A composite model is also proposed that combines predictions from different weighting schemes using learned quality scores. Additionally, a machine learning model is trained to predict interests directly from graph features.
Are Positive or Negative Tweets More "Retweetable" in Brazilian Politics? (Molly Gibbons)
This document summarizes an analysis of tweets containing the terms "Brazil" and "Michel Temer" to understand the political and economic scenario in Brazil. RapidMiner was used to collect tweets over 17 days and the Rosette Text Toolkit categorized tweets and analyzed sentiment. For "Michel Temer", there was a weak to moderate negative correlation between sentiment and retweets, and 75% of tweets were negative. For tweets about "Brazil" categorized as law/politics, 62% were negative and the most mentioned entities were the Senate, President, and Supreme Court. The analysis demonstrates how RapidMiner and Rosette can be used together to understand sentiment in social media posts about political topics.
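The weak-to-moderate sentiment-retweet correlations quoted above are Pearson coefficients; a minimal sketch on invented toy numbers (not the study's data):

```python
import math

# Toy data: sentiment score per tweet (negative = more negative)
# and its retweet count. Illustrative values only.
sentiment = [-0.8, -0.5, -0.2, 0.1, 0.4, 0.7]
retweets = [120, 90, 60, 40, 30, 10]

def pearson(x, y):
    # Pearson correlation: covariance over the product of std deviations.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

r = pearson(sentiment, retweets)  # strongly negative on this toy data
```

A value near -1 means more negative tweets get more retweets; the study reported only a weak-to-moderate negative value for "Michel Temer".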
Prediction of links in graphs based on content and information propagation.
Lecture for the M. Sc. Data Science, Sapienza University of Rome, Spring 2016.
The document summarizes a research paper that proposes a personalized recommendation approach combining social network factors like interpersonal interest similarity and interpersonal rating behavior similarity. It uses probabilistic matrix factorization to predict ratings by considering these social network factors. The approach is evaluated on two large real-world social rating datasets and shows improved performance over approaches that only use social network information.
This document is a project report submitted by four students - Anil Shrestha, Bijay Sahani, Bimal Shrestha, and Deshbhakta Khanal - to the Department of Electronics and Computer Engineering at Tribhuvan University in partial fulfillment of the requirements for a Bachelor's degree in Computer Engineering. The report details the development of a web application called "Tweezer" to perform sentiment analysis on tweets in order to determine public sentiment towards various products, services, or personalities. Literature on previous work related to sentiment analysis, especially on social media data like tweets, is also reviewed in the report.
TWO WAY CHAINED PACKETS MARKING TECHNIQUE FOR SECURE COMMUNICATION IN WIRELES... (pharmaindexing)
This document discusses an efficient semantic data alignment approach based on fuzzy c-means (FCM) clustering to infer user search goals using feedback sessions. It aims to cluster similar pseudo-documents representing user feedback sessions to better understand user search intents for a given query. The approach first collects feedback sessions from search results and generates pseudo-documents. It then uses FCM clustering to group similar pseudo-documents while also measuring semantic similarity between terms. This is an improvement over k-means clustering which does not consider semantic similarity. The results are evaluated using metrics like classified average precision and show this FCM-based approach performs better than clustering without semantic alignment.
A Survey of Collaborative Filtering Techniques (tengyue5i5j)
This document provides a survey of collaborative filtering techniques. It begins with an introduction to collaborative filtering and its main challenges, such as data sparsity, scalability, and synonymy. It then describes three main categories of collaborative filtering techniques: memory-based, model-based, and hybrid approaches. Representative algorithms from each category are discussed and analyzed in terms of their predictive performance and ability to address collaborative filtering challenges. The document concludes with a discussion of evaluating collaborative filtering algorithms and commonly used datasets.
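A minimal example of the memory-based category the survey covers: user-based collaborative filtering with cosine similarity over co-rated items, on an invented toy rating matrix:

```python
import numpy as np

# Toy user-item rating matrix (0 = unrated); rows are users, columns items.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
], dtype=float)

def cosine_sim(a, b):
    # Cosine similarity computed over co-rated items only.
    mask = (a > 0) & (b > 0)
    if not mask.any():
        return 0.0
    a, b = a[mask], b[mask]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict(R, user, item):
    # Similarity-weighted average of neighbours' ratings for the item.
    num = den = 0.0
    for v in range(len(R)):
        if v == user or R[v, item] == 0:
            continue
        s = cosine_sim(R[user], R[v])
        num += s * R[v, item]
        den += abs(s)
    return num / den if den else 0.0

p = predict(R, 0, 2)  # user 0's predicted rating for unrated item 2
```

The prediction is pulled toward user 1's low rating because user 1 is the more similar neighbour, illustrating the data-sparsity sensitivity the survey discusses: with few co-rated items, similarities rest on little evidence.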
DYNAMIC LARGE SCALE DATA ON TWITTER USING SENTIMENT ANALYSIS AND TOPIC MODELING (Andry Alamsyah)
1. The document presents a case study analyzing tweets about Uber using sentiment analysis and topic modeling to understand public opinion from large-scale social media data.
2. Sentiment analysis classified tweets as positive, negative, or neutral, while topic modeling identified dominant topics of discussion, like promotions or driver complaints.
3. The analyses found that positive tweets often discussed promotions while negative tweets addressed issues like sexual harassment allegations or unsatisfactory drivers.
- The study analyzed over 43,000 ratings of tweets collected through a website that had users rate tweets in exchange for receiving feedback on their own tweets.
- They found that 36% of rated tweets were considered worth reading, 25% were not worth reading, and 39% were neutral. This suggests that users tolerate a large amount of less desirable content in their feeds.
- Through regression analysis, they determined that tweets sharing information, asking questions of followers, and self-promotion links were most valued, while presence maintenance updates, conversations, and personal status updates were less valued.
Prediction of Reaction towards Textual Posts in Social Networks (Mohamed El-Geish)
Posting on social networks could be a gratifying or a terrifying experience depending on the reaction the post and its author —by association— receive from the readers. To better understand what makes a post popular, this project inquires into the factors that determine the number of likes, comments, and shares a textual post gets on LinkedIn; and finds a predictor function that can estimate those quantitative social gestures.
In this paper, we consider a criminal investigation into the collective guilt of some members of a working group. Assuming the statistics we use are reliable, we present a PageRank model based on mutual information. First, we use the average mutual information between non-suspicious and suspicious topics to score the topics by degree of suspicion. Second, we build a correlation matrix based on the degree of suspicion and derive the corresponding Markov state-transition matrix. We then set initial values for all members of the working group based on their degree of suspicion, and finally calculate each member's suspected degree. In a small 10-person case, we build the improved PageRank model; computing the statistics for this case yields a ranking of suspected degrees which basically matches the results given in the problem statement, indicating that the model is feasible. In the main case, we first obtain a ranking of 15 topics by suspicion via the mutual-information PageRank model; second, we find the stable point of the Markov state-transition matrix using a Markov chain; then we build the connection matrix based on the degree of suspicion and derive the corresponding transition matrix; finally, we calculate the degrees of 83 candidates. The results place the suspicious people at the top of the ranking and the innocent at the bottom, again indicating that the model is feasible, and the model still yields relatively good results when the suspicious topics and conspirators change. In the present case, we have evidence to believe that Dolores and Jerome, the senior managers, are under significant suspicion, and we recommend that future attention be paid to them.
The PageRank model based on mutual information takes full account of the information flow in the message-distribution network. It can not only handle the statistics used in the conspiracy case but can also be applied to detect infected cells in a biological network. Finally, we present the advantages and disadvantages of this model and directions for improvement.
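The PageRank iteration at the heart of the model above can be sketched with plain power iteration. The 4-node transition matrix and damping factor are invented illustration, not the paper's 83-candidate network:

```python
import numpy as np

# Hypothetical 4-node message network. M is column-stochastic:
# M[j, i] is the probability of moving from node i to node j.
M = np.array([
    [0.0, 0.5, 0.5, 0.0],
    [0.5, 0.0, 0.5, 1.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
], dtype=float)

def pagerank(M, d=0.85, tol=1e-10):
    # Power iteration: repeatedly apply the damped transition matrix
    # until the rank vector stops changing.
    n = M.shape[0]
    r = np.full(n, 1.0 / n)
    while True:
        r_new = d * (M @ r) + (1 - d) / n
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new

r = pagerank(M)  # stationary suspicion-style scores, summing to 1
```

Node 3 receives no incoming probability mass, so it ends up with the minimum score, mirroring how innocents fall to the bottom of the paper's ranking.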
Social media recommendation based on people and tags (final) (es712)
1) The document proposes methods to generate personalized recommendations in social media platforms based on people relationships and tags.
2) An evaluation of three recommendation approaches that utilize direct tags, indirect tags through related items, and incoming tags from other users found that a combination of direct tags and incoming tags most accurately represented a user's interests.
3) A user study tested five recommendation approaches and found that combining people relationships and tags into a user profile achieved the highest ratings for interesting recommendations and lowest for non-interesting items.
IRJET: Quantify Mutually Dependent Privacy Risks with Locality Data (IRJET Journal)
This document discusses how co-location information shared on social networks can threaten users' location privacy by enabling more accurate localization of users' locations over time. It formalizes the problem of quantifying privacy risks from co-location data and location information, and proposes optimal and approximate localization attack algorithms to incorporate co-location data. Experimental evaluations on mobility trace data show that considering a single friend's co-locations can decrease a user's median location privacy by up to 62%. Differential privacy perspectives are also discussed. The study aims to quantify the effect of co-location information on location privacy risks.
KnowMe and ShareMe: Understanding Automatically Discovered Personality Trai... (Wookjae Maeng)
There is much recent work on using the digital footprints left by people on social media to predict personal traits and gain a deeper understanding of individuals. Due to the veracity of social media, imperfections in prediction algorithms, and the sensitive nature of one's personal traits, much research is still needed to better understand the effectiveness of this line of work, including users' preferences of sharing their computationally derived traits. In this paper, we report a two-part study involving 256 participants, which (1) examines the feasibility and effectiveness of automatically deriving three types of personality traits from Twitter, including Big 5 personality, basic human values, and fundamental needs, and (2) investigates users' opinions of using and sharing these traits. Our findings show there is a potential feasibility of automatically deriving one's personality traits from social media, with various factors impacting the accuracy of models. The results also indicate over 61.5% of users are willing to share their derived traits in the workplace and that a number of factors significantly influence their sharing preferences. Since our findings demonstrate the feasibility of automatically inferring a user's personal traits from social media, we discuss their implications for designing a new generation of privacy-preserving, hyper-personalized systems.
Abstract: This paper introduces a system for visual analysis of news articles and emails. The system was developed in response to VAST MiniChallenge 1 and comprises different interfaces for mining textual data and network data.
For more information, please visit: http://people.cs.vt.edu/parang/ or contact parang at firstname at cs vt edu
PURGING OF UNTRUSTWORTHY RECOMMENDATIONS FROM A GRID (ijngnjournal)
In grid computing, trust has massive significance, and there is a lot of research proposing models for trusted resource-sharing mechanisms. Trust is a belief or perception that various researchers have tried to capture in a computational model. Trust in an entity can be direct or indirect: direct trust arises from a first impression of the entity or from direct interaction, while indirect trust stems from reputation or from recommendations received from recommenders inside or outside the grid. Unfortunately, malicious indirect trust leads to the misuse of valuable grid resources. This paper proposes a mechanism for identifying and purging untrustworthy recommendations in the grid environment and, through the obtained results, shows how untrustworthy entities can be purged.
Recommender systems (RSs) provide individualized suggestions of data or products related to users' needs. Although RSs have made substantial progress in theory and algorithm development and have achieved many business successes, ways to exploit the widely accessible information in online social networks (OSNs) have been largely overlooked. Noticing this gap in existing RS research, and considering that a user's choices are greatly influenced by trustworthy friends and their opinions, this paper proposes a Fact Finder technique that improves prevailing recommendation approaches by exploring a new source of data: friends' short posts on microblogs, treated as micro-reviews. The degree of friends' sentiment and its bearing on a user's choices are learned with machine learning methods including Naive Bayes, logistic regression, and decision trees. To verify the proposed Fact Finder, experiments using real social data from Twitter are presented, and the results show the effectiveness and promise of the proposed approach.
This document outlines a proposed framework called TKmeans++ for identifying grey sheep users (GSU) and recommending items to them based on trust relations. The framework has two phases: a GSU identification phase that calculates user weights based on similarity, influence, and trust to assign users to clusters, and a recommendation phase that recommends top items to a GSU based on items positively rated by other GSU. An experimental study on the Epinions dataset shows TKmeans++ outperforms other clustering algorithms on MAE and coverage metrics for recommending to GSU. Future work could explore matrix factorization approaches or combining clustering and matrix factorization.
IRJET: Personalised Privacy-Preserving Social Recommendation based on Ranking... (IRJET Journal)
This document proposes a framework called PrivRank that enables privacy-preserving social recommendations. PrivRank aims to protect users' private data from inference attacks while still allowing personalized ranking-based recommendations. It does this by obfuscating users' public data before publishing it to third parties. This allows third parties to provide accurate recommendations without accessing sensitive private user information. The framework is designed to efficiently provide continuous privacy protection for users' data streams over time as new data is published.
IMPROVING HYBRID REPUTATION MODEL THROUGH DYNAMIC REGROUPING (ijp2p)
Peer-to-peer (P2P) systems have the ability to connect millions of clients in business and knowledge scenarios. The mechanism that lets users distribute files without centralized servers has achieved wide recognition among internet users and also permits a range of applications beyond simple file sharing. The main problem lies in the fact that peers customarily have to interact with unknown peers in the absence of trusted third parties, and the lack of incentives often makes these unknown peers act as free-riders, reducing system performance. The trustworthiness among peers is portrayed by applying knowledge obtained through reputation mechanisms. This paper provides a new reputation model together with a detailed survey of diverse reputation models; the proposed model is a hybrid reputation model with dynamic regrouping.
Scalable recommendation with social contextual information (eSAT Journals)
Abstract: Recommender systems are used to achieve effective and useful results in social networks. Social recommendation can exploit the social network structure, but it is challenging to fuse social contextual factors, derived from users' motivations for social behavior, into social recommendation. Here, we introduce two contextual factors: (a) individual preference and (b) interpersonal influence. Individual preference matches an item's content against a user's social interests and keeps only the results recommended for that user; interpersonal influence analyzes user-user interactions and their specific social relations. Beyond this, we propose a novel probabilistic matrix factorization method to fuse the two in a latent space. The scalable algorithm produces useful results by analyzing the ranking probability of each user's social contextual information and incrementally processes contextual data in large datasets. Keywords: social recommendation, individual preference, interpersonal influence, matrix factorization.
Unification Algorithm in Hefty Iterative Multi-tier Classifiers for Gigantic ... (Editor IJAIEM)
Dr.G.Anandharaj1, Dr.P.Srimanchari2
1Associate Professor and Head, Department of Computer Science
Adhiparasakthi College of Arts and Science (Autonomous), Kalavai, Vellore (Dt) -632506
2 Assistant Professor and Head, Department of Computer Applications
Erode Arts and Science College (Autonomous), Erode (Dt) - 638001
ABSTRACT
With the unpredictable increase in mobile apps, more and more threats migrate from traditional PC clients to mobile devices. Compared with the traditional Windows-Intel alliance on the PC, the Android platform dominates the mobile internet, and apps have replaced PC client software as the foremost target of malicious use. In this paper, to improve the security status of recent mobile apps, we propose a methodology to evaluate mobile apps based on a cloud computing platform and data mining. Compared with traditional methods, such as permission-pattern-based methods, it combines dynamic and static analysis to comprehensively evaluate an Android application. The Internet of Things (IoT) denotes a worldwide network of interconnected, uniquely addressable items communicating via standard protocols. Accordingly, to prepare for the forthcoming invasion of things, data fusion can be used to manipulate and manage such data in order to improve processing efficiency and provide advanced intelligence. We propose an efficient multidimensional fusion algorithm for IoT data based on partitioning; attribute reduction and rule extraction methods are then used to obtain the synthesis results. By proving a few theorems and by simulation, the correctness and effectiveness of this algorithm are illustrated. This paper also introduces and investigates large iterative multitier ensemble (LIME) classifiers specifically tailored for big data. These classifiers are very hefty but quite easy to generate and use; they can be so large that it makes sense to use them only for big data. Our experiments compare LIME classifiers with various base classifiers and standard ensemble meta-classifiers. The results demonstrate that LIME classifiers can significantly increase classification accuracy, outperforming both the base classifiers and the standard ensemble meta-classifiers.
Keywords: LIME classifiers, ensemble meta-classifiers, Internet of Things, big data
AIST 2015 Conference Paper Presentation (Falguni Roy)
The document presents a framework for computing user similarity for collaborative filtering using dynamic implicit trust. It proposes combining similarity, trust, and time into a single function to address issues with existing implicit trust recommender systems. The framework includes modules for similarity computation, trust computation, and a combined trust and similarity computation. The trust computation module defines a new implicit trust method that considers users' changing interests over time. An experiment on Movielens data shows the proposed approach achieves reasonable accuracy in recommendations.
SENTIMENT ANALYSIS OF SOCIAL MEDIA DATA USING DEEP LEARNING (IRJET Journal)
This document describes a system for performing sentiment analysis on social media data using deep learning techniques. The system uses entity recognition and sentiment analysis to automatically generate random variables and rules for a Bayesian network model. The model is trained using Twitter data to determine the likelihood that a user will visit a location based on their tweets. The system achieved 93% accuracy in classifying tweets as positive or negative sentiment towards different locations. The authors propose that this approach could be adapted to analyze sentiment across different social media platforms and topics.
This document summarizes a research paper that proposes a novel approach to discovering user interests on e-commerce websites based on their clickstream data. The approach involves developing a rough leader clustering algorithm using indicators like category visiting paths, visiting frequencies, and durations to measure user similarities and group users into clusters with similar interests. The algorithm starts with a random leader and assigns other users to clusters based on similarity thresholds. It allows users to belong to multiple clusters to account for overlapping interests.
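A bare-bones leader-clustering sketch, as a simplification of the rough leader algorithm described above (1-D toy points and a made-up threshold, not the paper's clickstream indicators):

```python
def leader_cluster(points, threshold):
    # Leader clustering: the first point starts a cluster; each later
    # point joins the nearest existing leader within the threshold,
    # otherwise it becomes a new leader itself.
    leaders = []
    clusters = []
    for p in points:
        best, best_d = None, None
        for i, leader in enumerate(leaders):
            d = abs(p - leader)
            if d <= threshold and (best_d is None or d < best_d):
                best, best_d = i, d
        if best is None:
            leaders.append(p)
            clusters.append([p])
        else:
            clusters[best].append(p)
    return clusters

cl = leader_cluster([1.0, 1.2, 5.0, 5.3, 9.0], threshold=1.0)
```

The paper's rough variant additionally lets a user fall into multiple clusters when similarity to several leaders exceeds the threshold; the sketch above assigns each point to one cluster only.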
Testing Vitality Ranking and Prediction in Social Networking Services With Dy... (reshma reshu)
Social networking services are prevalent in many online communities, such as Twitter.com and Weibo.com, where millions of users interact with each other every day. One interesting and important problem in social networking services is ranking users by their vitality in a timely fashion. An accurate vitality-based ranking list of users could benefit many parties in social network services, such as ad providers and site operators. Although obtaining such a ranking is very promising, there are many technical challenges due to the large scale and dynamics of social networking data.
Participatory Sensing through Social Networks: The Tension between Participat... (Ioannis Krontiris)
This paper corresponds to publication:
I. Krontiris, F.C. Freiling, "Urban Sensing through Social Networks: The Tension between Participation and Privacy", International Tyrrhenian Workshop on Digital Communications (ITWDC), Island of Ponza, Italy, September 2010.
https://pi1.informatik.uni-mannheim.de/filepool/publications/ITWDC_2010.pdf
Quality of Claim Metrics in Social Sensing Systems: A case study on IranDeal (Maynooth University)
There is an ongoing trend in social sensing where people act as sensors and report events happening in their surroundings. These claims are often reported from smartphones and need to be processed to discover new patterns of events. Since these claims are not generated with consistent quality, the processing and evaluation tasks can become a challenge. In this paper, we address how the quality of each claim can be evaluated and which factors should be considered in qualifying claim quality. To do this, we investigate the sources of low-quality claims and propose a new form of Quality of Claim (QoC) metrics, categorizing the QoC factors into two classes: content measures and feedback measures. The study is performed on two datasets: the main dataset is #IranDeal, extracted from Twitter, and to compare the quality metrics, a second dataset was crawled from the Foursquare social network. The metrics follow a power-law pattern and are modeled by a Zipfian distribution function, with power degrees varying from 1.75 to 5. Several factors are discussed as influencing this variation, such as the query criteria of the extracted dataset, the characteristics of the QoC metric, and the type of social network.
https://ieeexplore.ieee.org/document/7802128
Avoiding Anonymous Users in Multiple Social Media Networks (SMN) (paperpublications3)
Abstract: The main aim of this project is to secure user login and data sharing across social networks like Gmail and Facebook, and to detect anonymous users on these networks. If the original user is not active on the network but a friend or anonymous user knows their login details, their chats can be misused: an unauthorized user may log in to chat, share images or videos, and so on. To overcome this, users first register their details along with a security question and answer. Since an anonymous user can delete chats or data, the security questions are used to recover the unauthorized user's chat history or sharing details, along with their IP address or MAC address. The project thus provides a way to prevent anonymous users from misusing the original user's login details.
Twitter is a free social networking microblogging service that allows registered members to broadcast, in real-time, short posts called tweets. Twitter members can broadcast tweets and follow other users’ tweets by using multiple devices, making this information system one of the fastest in the world. In this chapter, we leverage this characteristic to introduce a novel topic-detection method aimed at informing, in real-time, a specific user about the most emerging arguments expressed by the network around his/her domain interests. With this goal, we aim at formalizing the information spread over the network by studying the topology of the network and by modeling the implicit and explicit connections among the users. Then, we propose an innovative term aging model, based on a biological metaphor, to retrieve the freshest arguments of discussion, represented through a minimal set of terms, expressed by the community within the foci of interest of a specific user. We finally test the proposed model through various experiments and user studies.
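One simple way to realize "term aging" is exponential decay of each mention's contribution over time, so that recently bursting terms score as freshest. This is an illustrative sketch under that assumption, not the chapter's biological metaphor:

```python
import math

def freshness(mention_times, now, half_life=6.0):
    # Each mention contributes energy that halves every `half_life`
    # hours; the term's freshness is the sum over all its mentions.
    decay = math.log(2) / half_life
    return sum(math.exp(-decay * (now - t)) for t in mention_times)

# A term bursting in the last few hours versus one that burst long ago
# (timestamps in hours; invented values).
fresh_term = freshness([20, 21, 22, 23], now=24)
stale_term = freshness([1, 2, 3, 4], now=24)
```

Ranking terms by this score and keeping the top few yields the kind of minimal term set the chapter uses to represent the freshest arguments of discussion.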
- The document presents a study analyzing factors that influence continued participation in Twitter group chats. It develops a 5F model examining individual initiative, group characteristics, perceived receptivity, linguistic affinity, and geographic proximity.
- The study analyzes data from 30 educational Twitter chats over two years involving over 71,000 users. It also conducted a user survey.
- The 5F model effectively predicts whether a new user who attends one session will return based on analysis of their contributions, how well they fit with the group linguistically, and other metrics.
Similar to What Sets Verified Users apart? Insights Into, Analysis of and Prediction of Verified Users on Twitter (20)
Responsible & Safe AI Systems at ACM India ROCS at IIT BombayIIIT Hyderabad
This document discusses responsible and safe artificial intelligence. It summarizes PK's work on developing context-aware models to reduce bias in large language models and techniques for removing harmful knowledge from models. The talk outlines issues like inconsistency in models, bias indicators, and corrective machine unlearning. It encourages collaborating to advance this important research and addresses building more accountability as models grow more powerful.
International Collaboration: Experiences, Challenges, Success storiesIIIT Hyderabad
This document discusses strategies for successful international collaboration, including maintaining an active website, pursuing joint grant proposals, student exchanges through co-advising and visits, organizing workshops together, and publishing joint papers. It emphasizes finding connections through one's existing network to avoid cold emails, and developing equal partnerships.
The document summarizes a workshop on responsible and safe AI held at IIT Madras. It discusses topics like legal bias and inconsistency in large language models, bias in AI systems, and approaches to make models more interpretable and remove harmful knowledge. Live demonstrations of ChatGPT were shown to illustrate issues like factual inconsistencies and how context is needed to avoid confusion. Overall, the workshop highlighted challenges with AI systems and ongoing research efforts to address issues like bias, lack of context, and removal of harmful information.
Identify, Inspect and Intervene Multimodal Fake NewsIIIT Hyderabad
Fake news refers to intentionally and verifiably false stories created to manipulate people’s perceptions of reality.
The concept of fake news is not new and has marked its presence dating back to AD 1475, affecting the citizens of Italy on eastern Sunday to the COVID-19 pandemic in 2020. Fake news has gained traction among audiences, created a buzz online, and faced repercussions offline. For instance, intruding hyperbolized fake articles into political campaigns or health and climate studies is havoc. In addition, the proliferation of fabricated stories has played a crucial role in inflaming or suppressing a social event. In conclusion, fake news is destructive and can lead to hatred against religion, politics, celebrities or organizations, resulting in riots/protests or even death.
The massive growth in the proliferation of fake news online might result from numerous technological advancements. Fake news seems to be the permanent reality, with social media being a primary conduit for its creation and dissemination. Despite the difficulty in identifying, tracking, and controlling unreliable content, there must be an effort to halt its expansion. Our research endeavors contribute to tackling various aspects of fake news, encompassing identification, inspection, and intervention. The premise of our thesis is firmly placed at the point where we analyze multiple facets of user-generated content produced online in the form of text and visuals to investigate the field of fake news.
First, we focus on devising different methods to Identify, a.k.a. detect fake news online, by extracting different feature sets from the given information. By designing foundational detection mechanisms, our work accelerates research innovations. Second, our research closely Inspects the fake stories from two perspectives. First, from the information point of view, one can inspect fabricated content to identify the patterns of false reports disseminating over the web, the modality used to create the fabricated content and the platform used for dissemination. Next, from the model point of view, we inspect detection mechanisms used in prior work and their generalizability to other datasets. The thesis also suggests Intervention techniques to help internet users broaden their comprehension of fake news. We discuss potential practical implications for social media platform owners and policymakers.
Data Science for Social Good: #MentalHealth #CodeMix #LegalNLP #AISafetyIIIT Hyderabad
Discuss work on using technology for Judiciary, Lawyers, etc. Analyse social media data, music listening habits for mental health. Bias and Safety in AI Systems.
Papers are available at https://precog.iiit.ac.in/pages/publications.html
It is our choices, Harry, that show what we truly are, far more than our abil...IIIT Hyderabad
This document appears to be the transcript from a B.Tech orientation presentation given by Ponnurangam Kumaraguru at IIIT Hyderabad. The presentation provides advice and encouragement to new students on managing their time at IIIT Hyderabad. It emphasizes making friends, trying new activities and clubs, controlling wants and FOMO, celebrating failures, and using social media to connect with others in a positive way. References are made to movies to illustrate points about perseverance, finding passion, and having a growth mindset during the transition to university life.
Beyond the Surface: A Computational Exploration of Linguistic AmbiguityIIIT Hyderabad
We investigate two specific forms of linguistic ambiguities - polysemy, which is the multiplicity of meanings for a specific word, and tautology, which are seemingly uninformative and ambiguous phrases used in conversations. Both phenomena are widely-known manifestations of linguistic ambiguity at the lexical and pragmatic level, respectively.
The first part of the thesis focuses on addressing this challenge by proposing a new method for quantifying the degree of polysemy in words, which refers to the number of distinct meanings that a word can have. The proposed approach is a novel, unsupervised framework to compute and estimate polysemy scores for words in multiple languages, infusing syntactic knowledge in the form of dependency structures. The proposed framework is tested on curated datasets controlling for different sense distributions of words in three typologically diverse languages - English, French, and Spanish. The framework leverages contextual language models and syntactic structures to empirically support the widely held theoretical linguistic notion that syntax is intricately linked to ambiguity/polysemy.
The second part of the thesis explores how language models handle colloquial tautologies, a type of redundancy commonly used in conversational speech. We first present a dataset of colloquial tautologies and evaluate several state-of-the-art language models on this dataset using perplexity scores. We conduct probing experiments while controlling for the noun type, context and form of tautologies. The results reveal that BERT and GPT2 perform better with modal forms and human nouns, which aligns with previous literature and human intuition.
Data Science for Social Good: #LegalNLP #AlgorithmicBias...IIIT Hyderabad
Talk describes legal NLP idea discusses the following papers:
HLDC: Hindi Legal Documents Corpus https://precog.iiit.ac.in/pubs/HLDC_ACL_2022.pdf
Drug consumption: https://precog.iiit.ac.in/pubs/Effect_oF_Feedback_on_Drug_Consumption_Disclosures_on_Social_Media___ICWSM2023___16Sept1730hrs.pdf
This document provides tips for writing a good research paper. It discusses selecting an appropriate topic and audience, developing an outline, writing drafts for feedback, choosing a descriptive title, writing a literature review, crafting an introduction, including figures and tables, addressing reviewer comments, avoiding plagiarism, and acknowledging collaborators. The goal is to write papers that clearly communicate research and can be improved based on feedback from others.
Data Science for Social Good: #LegalNLP #AlgorithmicBiasIIIT Hyderabad
This document summarizes research on evaluating algorithmic bias in models trained on Hindi legal documents. The researchers collected a dataset of 900k legal documents from Uttar Pradesh courts in Hindi. They trained a bail prediction model on this data and evaluated it for demographic parity bias related to religious attributes. The results showed the model predictions changed more when replacing Hindu names with Muslim names compared to the reverse, indicating a potential bias against Muslims. Overall, the study highlights the need to evaluate models trained on real-world legal data for fairness to avoid perpetuating societal biases.
I discussed our work on #LegalAI #CodeMixing #FakeNews #Elections and other cool projects that we are currently working on at https://precog.iiit.ac.in/
The document discusses social computing research in India, focusing on legal AI and natural language processing applications. It summarizes work analyzing over 900,000 Hindi legal documents from district courts in Uttar Pradesh. Models were developed for tasks like bail prediction and legal document summarization. The research also addresses challenges in processing code-mixed text and fact-checking social media. Overall, the document outlines current research areas and opportunities in social computing for Indian contexts and languages, and provides contact information for those interested in the work.
Modeling Online User Interactions and their Offline effects on Socio-Technica...IIIT Hyderabad
Do online interactions trigger reactions back in the offline world? How can these reactions be detected and quantified? Specifically, what insights can be extracted for users, platform owners, and policymakers to minimize the potential harm of such reactions?
Society functions based on the complex interactions between individuals, communities, and organizations. The advent of the Internet has enabled these interactions to move online. A website or an application that facilitates the digitization of social interactions is called a socio-technical platform. For instance, individuals converse with each other via direct messaging applications (e.g., WhatsApp, Telegram), share thoughts, and gather feedback from communities (e.g., Reddit, Twitter, Youtube). Trade of goods occurs via e-commerce (e.g., Flipkart, Amazon) and online marketplaces (e.g., Google Play store). At times interactions happening in the online world, trigger reactions in the offline world, which we call overflow. Such overflows can have either a positive or negative impact. Socio-technical platforms save every interaction and associated metadata, providing a unique opportunity to analyze rich data at scale. Discover interaction patterns, detect and quantify overflow of interactions, and extract insights for users and policymakers.
This report aims to study the interactions by keeping the individual as the focal point. We focus on two broad forms of interactions - i) the effect online community feedback can have on individual offline actions and ii) organizations leveraging individual customers' online presence to optimize business processes. In the first part, we work on two scenarios - (a) How does community feedback affect an individual future drug consumption frequency in a drug community forum? and (b) What changes does an individual undergo immediately after getting sudden popularity in Online social media? What actions help in maintaining popularity for longer? In the second part, we leverage online information about a customer to improve the prediction of Return-to-Origin in the e-commerce platform.
Privacy. Winter School on “Topics in Digital Trust”. IIT BombayIIIT Hyderabad
The document provides an overview of privacy concepts including definitions of privacy, forms of privacy, social media privacy, data anonymity, and ethics around studying privacy. It discusses Westin's four states of privacy (solitude, intimacy, anonymity, reserve) and Solove's taxonomy of privacy harms. It also covers Westin's privacy indexes, privacy studies conducted in India, OECD and FTC privacy principles, and the costs of reading privacy policies.
The document then discusses privacy enhancing technologies like communication anonymizers, shared bogus online accounts, obfuscation, and anonymization. Examples of privacy invasive technologies like spyware and RFID are also provided. Privacy decision making frameworks like Platform for Privacy Preferences (P3P)
It is our choices, Harry, that show what we truly are, far more than our abil...IIIT Hyderabad
The document provides advice and guidance for students transitioning to campus life from PK, a professor at IIIT Hyderabad. It includes quotes and links related to time management, pursuing interests, controlling wants and FOMO, exploring options before deciding on majors, aiming high while making consistent small progress, and asking for help when needed. The document acknowledges students who provided inputs and the knowledge gained from others to help with advising students.
It is our choices, Harry, that show what we truly are, far more than our abil...IIIT Hyderabad
The document is a transcript of a talk given by PK to B.Tech orientation students at IIIT Hyderabad. Some key points from the talk include:
- Encouraging students to make friends, participate in clubs and extracurricular activities.
- Emphasizing time management and not missing out on opportunities in the first two semesters.
- Advising students to explore different projects and areas before deciding on a focus or specialization.
- Noting that managing courses, social life, hobbies and more will be challenging but important during their time at IIIT.
Development of Stress Induction and Detection System to Study its Effect on B...IIIT Hyderabad
Stress has become a significant mental health problem of the 21st century. The number of people suffering from stress is increasing rapidly. Thus, easy-to-use, inexpensive, and accurate biomarkers are needed to detect stress during its inception. Early detection of stress-related diseases allows people to access healthcare services. This thesis focuses on the development of stress stimuli and the detection of stress induced by these stimuli. Identifying brain regions affected while exposing the subject to these stressful stimuli has also been done. Three different stimuli, viz. videos, gamified application, and a game, are investigated to study their effect as stress induction stimuli. To this end, in this thesis, a system is proposed to classify participants into stressed and non-stressed categories using machine learning, deep learning, and statistical techniques. The statistical significance between stressed and non-stressed was found using Higuchi Fractal Dimensions (HFD) feature extracted from EEG. This feature also helped identify the brain’s most affected region due to stress. Another outcome of this thesis is the extra annotation of the ground truth which further helps to validate the participant’s experience under the influence of stressful stimuli. This annotation was performed by evaluating participant performance under time pressure. In addition, a technique based on in-game analytics is presented to complement the betterment of self-reported data. Further, another dimension utilizing signatures from WiFi Media Access Control (MAC) layer traffic is presented to detect stress indicators in a device-agnostic way.
A Framework for Automatic Question Answering in Indian LanguagesIIIT Hyderabad
The distribution of research efforts done in the field of Natural Language
Processing (NLP) has not been uniform across all natural languages. It has
been observed that there is a significant gap between the development of
NLP tools in Indic languages (indic-NLP), and in European languages. We
aim to explore different directions to develop an automatic question answering system for Indic languages. We built a FAQ-retrieval based chatbot for
healthcare workers and young mothers of India. It supported Hindi language in either Devanagri script or Roman script. We observed that, in our
FAQ database, if there exists a question similar to the query asked by the
user, then the developed chatbot is able to find a relevant Question-Answer
pair (QnA) among its top-3 suggestions 70% of the time. We also observed
that performance of our chatbot is dependent on the diversity in the FAQ
database. Since database creation requires substantial manual efforts, we decided to explore other ways to curate knowledge from raw text irrespective
of domain.
We developed an Open Information Extraction (OIE) tool for Indic languages. During the preprocessing, chunking of text is performed with our
fine-tuned chunker, and the phrase-level dependency tree was constructed
using the predicted chunks. In order to generate triples, various rules were
handcrafted using the dependency relations in Indic languages. Our method
performed better than other multilingual OIE tools on manual and automatic evaluations. The contextual embeddings used in this work does not
take syntactic structure of sentence into consideration. Hence, we devised
an architecture that takes the dependency tree of the sentence into consideration to calculate Dependency-aware Transformer (DaT) embeddings.
Since the dependency tree is also a graph, we used Graph Convolution
Network (GCN) to incorporate the dependency information into the contextual embeddings, thus producing DaT embeddings. We used a hate-speech
detection task to evaluate the effectiveness of DaT embeddings. Our future
plan is to evaluate the applicability of DaT embeddings for the task of chunking. Moreover, the broader aim for the future is to develop an end-to-end
pronoun resolution model to improve the quality of triples and DaT embeddings. We also aim to explore the applicability of all our works to solve the
problem of long-context question answering.
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...IJECEIAES
Medical image analysis has witnessed significant advancements with deep learning techniques. In the domain of brain tumor segmentation, the ability to
precisely delineate tumor boundaries from magnetic resonance imaging (MRI)
scans holds profound implications for diagnosis. This study presents an ensemble convolutional neural network (CNN) with transfer learning, integrating
the state-of-the-art Deeplabv3+ architecture with the ResNet18 backbone. The
model is rigorously trained and evaluated, exhibiting remarkable performance
metrics, including an impressive global accuracy of 99.286%, a high-class accuracy of 82.191%, a mean intersection over union (IoU) of 79.900%, a weighted
IoU of 98.620%, and a Boundary F1 (BF) score of 83.303%. Notably, a detailed comparative analysis with existing methods showcases the superiority of
our proposed model. These findings underscore the model’s competence in precise brain tumor localization, underscoring its potential to revolutionize medical
image analysis and enhance healthcare outcomes. This research paves the way
for future exploration and optimization of advanced CNN models in medical
imaging, emphasizing addressing false positives and resource efficiency.
Electric vehicle and photovoltaic advanced roles in enhancing the financial p...IJECEIAES
Climate change's impact on the planet forced the United Nations and governments to promote green energies and electric transportation. The deployments of photovoltaic (PV) and electric vehicle (EV) systems gained stronger momentum due to their numerous advantages over fossil fuel types. The advantages go beyond sustainability to reach financial support and stability. The work in this paper introduces the hybrid system between PV and EV to support industrial and commercial plants. This paper covers the theoretical framework of the proposed hybrid system including the required equation to complete the cost analysis when PV and EV are present. In addition, the proposed design diagram which sets the priorities and requirements of the system is presented. The proposed approach allows setup to advance their power stability, especially during power outages. The presented information supports researchers and plant owners to complete the necessary analysis while promoting the deployment of clean energy. The result of a case study that represents a dairy milk farmer supports the theoretical works and highlights its advanced benefits to existing plants. The short return on investment of the proposed approach supports the paper's novelty approach for the sustainable electrical system. In addition, the proposed system allows for an isolated power setup without the need for a transmission line which enhances the safety of the electrical network
Embedded machine learning-based road conditions and driving behavior monitoringIJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
Discover the latest insights on Data Driven Maintenance with our comprehensive webinar presentation. Learn about traditional maintenance challenges, the right approach to utilizing data, and the benefits of adopting a Data Driven Maintenance strategy. Explore real-world examples, industry best practices, and innovative solutions like FMECA and the D3M model. This presentation, led by expert Jules Oudmans, is essential for asset owners looking to optimize their maintenance processes and leverage digital technologies for improved efficiency and performance. Download now to stay ahead in the evolving maintenance landscape.
Introduction- e - waste – definition - sources of e-waste– hazardous substances in e-waste - effects of e-waste on environment and human health- need for e-waste management– e-waste handling rules - waste minimization techniques for managing e-waste – recycling of e-waste - disposal treatment methods of e- waste – mechanism of extraction of precious metal from leaching solution-global Scenario of E-waste – E-waste in India- case studies.
Null Bangalore | Pentesters Approach to AWS IAMDivyanshu
#Abstract:
- Learn more about the real-world methods for auditing AWS IAM (Identity and Access Management) as a pentester. So let us proceed with a brief discussion of IAM as well as some typical misconfigurations and their potential exploits in order to reinforce the understanding of IAM security best practices.
- Gain actionable insights into AWS IAM policies and roles, using hands on approach.
#Prerequisites:
- Basic understanding of AWS services and architecture
- Familiarity with cloud security concepts
- Experience using the AWS Management Console or AWS CLI.
- For hands on lab create account on [killercoda.com](https://killercoda.com/cloudsecurity-scenario/)
# Scenario Covered:
- Basics of IAM in AWS
- Implementing IAM Policies with Least Privilege to Manage S3 Bucket
- Objective: Create an S3 bucket with least privilege IAM policy and validate access.
- Steps:
- Create S3 bucket.
- Attach least privilege policy to IAM user.
- Validate access.
- Exploiting IAM PassRole Misconfiguration
-Allows a user to pass a specific IAM role to an AWS service (ec2), typically used for service access delegation. Then exploit PassRole Misconfiguration granting unauthorized access to sensitive resources.
- Objective: Demonstrate how a PassRole misconfiguration can grant unauthorized access.
- Steps:
- Allow user to pass IAM role to EC2.
- Exploit misconfiguration for unauthorized access.
- Access sensitive resources.
- Exploiting IAM AssumeRole Misconfiguration with Overly Permissive Role
- An overly permissive IAM role configuration can lead to privilege escalation by creating a role with administrative privileges and allow a user to assume this role.
- Objective: Show how overly permissive IAM roles can lead to privilege escalation.
- Steps:
- Create role with administrative privileges.
- Allow user to assume the role.
- Perform administrative actions.
- Differentiation between PassRole vs AssumeRole
Try at [killercoda.com](https://killercoda.com/cloudsecurity-scenario/)
What Sets Verified Users apart? Insights Into, Analysis of and Prediction of Verified Users on Twitter
1. What sets Verified Users apart? Insights into, Analysis of and Prediction of Verified Users on Twitter
Indraneil Paul
Masters Thesis Defense, 14th September 2019
IIIT Hyderabad
Committee Members:
Dr. Ponnurangam Kumaraguru (Advisor)
Dr. Kamalakar Karlapalem
Dr. Vikram Pudi
2. Outline
A: PROBLEM AND MOTIVATION
➢ Perceived influence of verification
➢ Understanding what sets verified users apart
B: DATASET DESCRIPTION
➢ Description of data collection
➢ Summary data statistics
C: METADATA/ACTIVITY ANALYSIS
➢ Study divergence of verified users from the rest for temporal activity and metadata signatures
➢ Deconstruct users into profiles
D: TOPIC ANALYSIS
➢ Study divergence between verified users and the rest for tweet topics
➢ Study divergence in topic diversity
4. Ambiguity in Perception
● Twitter, Facebook and Instagram have incorporated a verification process to authenticate handles they deem important enough to be worth impersonating.
● However, despite repeated statements by Twitter that verification is not equivalent to endorsement, users have conflated the authenticity it is meant to convey with credibility.
● The rarity of the status and its prominent visual signalling are the likely causes of this.
5. Ambiguity in Perception
This perception of verification lending credence has earned Twitter considerable flak in recent times, especially accusations of harbouring bias against certain groups.
We try to demonstrate that the attainment of verified status can be explained by less insidious factors based on user activity trajectory, tweet sentiment and tweet contents.
6. Visual Incentive
1. Presence of authority and authenticity indicators: lends further credibility to the tweets made by a user handle.
2. Presentation over relevance: psychological testing reveals that credibility evaluation of online content is influenced by its presentation rather than its relevance or apparent credibility.
Attaining verified status might therefore lead to a user's content being liked and retweeted more frequently.
7. Heuristic Models
The average user devotes only three seconds of attention per tweet, symptomatic of users resorting to content-evaluation heuristics.
One relevant heuristic is the Endorsement heuristic, which associates credibility with visual markers on content.
The presence of a marker such as a verified badge could hence be the difference between a user reading a tweet in a congested feed and ignoring it completely.
8. Heuristic Models
Another pertinent heuristic is the Consistency heuristic, which stems from endorsements by several authorities. This matters because a user verified on one social media platform is likelier to be verified on other platforms as well.
Hence, we posit that possessing a verified status can make a world of difference to the extent and quality of a brand's or individual's outreach and influence.
9. Coveted Nature
Unsurprisingly, a verified status is highly sought after by preeminent entities and businesses, as evidenced by the prevalence of get-verified-quick schemes.
Instead of resorting to questionable schemes, accounts can follow our insights to increase their platform reach and improve their chances of verification.
11. Collection Approach
We queried the Twitter REST API as follows:
1. The @verified handle on Twitter follows all accounts on the platform that are currently verified. We queried this handle on the 18th of July 2018 and extracted the user IDs.
2. We obtained the user objects for all verified users and restricted them to English-speaking users, obtaining 231,235 users.
3. Additionally, we leveraged Twitter's Firehose API, a near real-time stream of public tweets and accompanying author metadata.
12. Collection Approach
We used the Firehose to sample a set of 175,930 non-verified users, controlling for number of followers, a conventional metric of public interest.
This was done by ensuring that the follower count of every non-verified user was within 2% of that of a unique verified user we had previously acquired. For each of these users, data and metadata were gathered, including friends, tweet content and sentiment, activity time series, and profile reach trajectories.
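The 2% follower-count matching described above can be sketched as a simple greedy pairing. A minimal Python illustration (function and variable names are illustrative, not from the thesis):

```python
def match_controls(verified_counts, candidate_counts, tol=0.02):
    """Greedily pair each verified user's follower count with an unused
    non-verified candidate whose count lies within +/- tol (fractional)
    of it; verified users with no eligible candidate are skipped."""
    used, pairs = set(), {}
    for v in verified_counts:
        for i, c in enumerate(candidate_counts):
            if i not in used and abs(c - v) <= tol * v:
                used.add(i)
                pairs[v] = c
                break
    return pairs
```

For example, `match_controls([100, 1000], [101, 990, 5])` pairs 100 with 101 and 1000 with 990, leaving the 5-follower candidate unused.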
14. Collected Metadata
For each verified member, we collected the following metadata:
1. Followers count
2. Friends count
3. Status count
4. Public list memberships
16. Verified User Network
494 million tweets collected over a one-year period
231,235 English-language Twitter verified users
175,930 English-language Twitter non-verified users
17. Class Imbalance
To prevent a skewed class distribution from affecting results, we applied two class-rebalancing methods.
The first is ADASYN, a minority-oversampling technique that creates synthetic minority samples by interpolating between already existing samples.
18. Class Imbalance
Additionally, we use SMOTE Tomek, a hybrid over- and under-sampling technique that also eliminates samples of the overrepresented class: for a pair of opposing-class points that are each other's nearest neighbours (a Tomek link), the majority-class point is eliminated.
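As a rough illustration of the two ideas, the sketch below interpolates synthetic minority points (the core move in SMOTE/ADASYN) and drops majority members of Tomek links. This is a simplified stand-in; the actual experiments would use the ADASYN and SMOTETomek implementations from the imbalanced-learn library:

```python
import random

def interpolate_minority(minority, n_new, seed=0):
    """Create synthetic minority samples by interpolating between
    random pairs of existing minority points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a, b = rng.sample(minority, 2)
        t = rng.random()  # position along the segment from a to b
        synthetic.append(tuple(ai + t * (bi - ai) for ai, bi in zip(a, b)))
    return synthetic

def _dist(p, q):
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q)) ** 0.5

def remove_tomek_majority(majority, minority):
    """Drop each majority point that forms a Tomek link, i.e. it and a
    minority point are each other's nearest neighbours."""
    points = [(p, "maj") for p in majority] + [(p, "min") for p in minority]

    def nearest(p):
        return min((q for q in points if q[0] != p), key=lambda q: _dist(p, q[0]))

    kept = []
    for m in majority:
        nn, label = nearest(m)
        if label == "min" and nearest(nn)[0] == m:
            continue  # Tomek link: discard the majority member
        kept.append(m)
    return kept
```

Here points are plain coordinate tuples; the sketch assumes distinct points and ignores the density weighting that distinguishes ADASYN from plain SMOTE.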
21. Attracting Components
Attracting components are components of a directed graph that a random
walk, once it enters, can never leave.
The acquired network consists of 6091
attracting components.
At the core of these components lie
famous personalities (high in-degree users)
who do not follow any other handle.
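Formally, an attracting component is a strongly connected component with no outgoing edges (a sink SCC); a celebrity who follows no one is a size-one example. A self-contained sketch using an iterative Tarjan SCC pass over a toy follow graph (not the authors' code):

```python
from collections import defaultdict

def attracting_components(edges):
    """Return the sink SCCs of a directed graph given as (u, v) edges:
    components a random walk can enter but never leave."""
    graph = defaultdict(set)
    nodes = set()
    for u, v in edges:
        graph[u].add(v)
        nodes.update((u, v))

    index, low, on_stack, stack, sccs = {}, {}, set(), [], []
    counter = [0]

    def strongconnect(root):
        # Iterative Tarjan: each work item is (node, iterator over successors)
        index[root] = low[root] = counter[0]; counter[0] += 1
        stack.append(root); on_stack.add(root)
        work = [(root, iter(graph[root]))]
        while work:
            v, it = work[-1]
            advanced = False
            for w in it:
                if w not in index:
                    index[w] = low[w] = counter[0]; counter[0] += 1
                    stack.append(w); on_stack.add(w)
                    work.append((w, iter(graph[w])))
                    advanced = True
                    break
                elif w in on_stack:
                    low[v] = min(low[v], index[w])
            if advanced:
                continue
            work.pop()
            if work:
                low[work[-1][0]] = min(low[work[-1][0]], low[v])
            if low[v] == index[v]:          # v is the root of an SCC
                comp = set()
                while True:
                    w = stack.pop(); on_stack.discard(w); comp.add(w)
                    if w == v:
                        break
                sccs.append(comp)

    for n in nodes:
        if n not in index:
            strongconnect(n)

    # Keep only sink SCCs: no edge leaves the component
    return [c for c in sccs if all(w in c for v in c for w in graph[v])]

# "star" follows nobody, so {star} is the lone attracting component
follows = [("alice", "star"), ("bob", "star"), ("alice", "bob"), ("bob", "alice")]
sinks = attracting_components(follows)
```

The mutually-following pair {alice, bob} is strongly connected but not attracting, since its edges to "star" let a walk escape.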
22. Reciprocity
The verified network has a reciprocity rate of 33.7%.
This is lower than usually seen in other social networks such as Flickr (68%).
The likely cause of this is the prevalence of brands and third-party sources
of curated and crawled information, which typically do not reciprocate
engagements.
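Reciprocity is simply the fraction of directed edges whose reverse edge also exists. A minimal sketch on a toy edge list (illustrative, not the measurement code used in the paper):

```python
def reciprocity(edges):
    """Fraction of directed edges (u, v) for which (v, u) also exists."""
    edge_set = set(edges)
    if not edge_set:
        return 0.0
    mutual = sum(1 for (u, v) in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)

# a<->b is mutual, a->c is not: 2 of 3 edges are reciprocated
rate = reciprocity([("a", "b"), ("b", "a"), ("a", "c")])
```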
23. Power Law
Power laws are a key component in characterizing the degree distributions
of networks gathered from various sources. A power law refers to the
presence of the following distributional property:
p(x) ∝ x^(−α), for x ≥ xmin
This is closely related to the Pareto distribution and the 80-20 rule,
whereby roughly 20 percent of the entities account for 80 percent of the
effect.
We explore the presence of power laws in the network degree distribution
and the Laplacian eigenvalue distribution.
24. Power Law Inference
The modern, statistically sound methodology for testing the presence of a
power law involves a two-step procedure: first estimate xmin, then α.
After performing the MLE inference, the value of the exponent is given by
the closed-form expression:
α̂ = 1 + n [ Σ_{i=1..n} ln(x_i / xmin) ]^(−1)
25. Power Law Inference
The optimal value of xmin is determined by iterating over candidate values
and choosing the one for which the Kolmogorov-Smirnov distance between
the CDF of the data above it and the power law that best approximates it
is smallest.
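The two steps above can be sketched in numpy: the closed-form continuous MLE for α at each candidate xmin, with xmin chosen by minimum KS distance. This is a simplified illustration, not the authors' code; the bootstrap that produces the quoted p-values is omitted, and the `min_tail` cut-off is an arbitrary choice:

```python
import numpy as np

def fit_power_law(data, min_tail=50):
    """Clauset-style fit: for each candidate xmin, estimate alpha by the
    closed-form continuous MLE, then keep the xmin whose tail minimizes
    the Kolmogorov-Smirnov distance to the fitted power-law CDF."""
    data = np.sort(np.asarray(data, dtype=float))
    best_ks, best_xmin, best_alpha = np.inf, None, None
    for xmin in np.unique(data):
        tail = data[data >= xmin]
        n = tail.size
        if n < min_tail:                     # too few points to fit reliably
            break
        alpha = 1.0 + n / np.sum(np.log(tail / xmin))   # closed-form MLE
        emp = np.arange(1, n + 1) / n                   # empirical CDF
        model = 1.0 - (tail / xmin) ** (1.0 - alpha)    # power-law CDF
        ks = np.max(np.abs(emp - model))
        if ks < best_ks:
            best_ks, best_xmin, best_alpha = ks, float(xmin), float(alpha)
    return best_xmin, best_alpha

# Synthetic Pareto data with true exponent alpha = 2.5 and xmin = 1
rng = np.random.default_rng(0)
samples = rng.pareto(1.5, size=2000) + 1.0
xmin_hat, alpha_hat = fit_power_law(samples)
```

On clean synthetic data the recovered exponent lands near the true 2.5; on real degree data only the tail beyond the estimated xmin follows the power law.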
26. Eigenvalue Distribution
We computed the 10,000 largest eigenvalues of the Laplacian matrix, using
the power-iteration method in existing solvers.
Inference of the power-law parameters α and xmin is done using the
continuous maximum-likelihood algorithm. Continuous MLE inference for the
eigenvalue distribution yields parameter estimates of 3.18 for α and
9377.26 for xmin, with a p-value of 0.3.
This is in keeping with earlier such findings on the Laplacian eigenvalue
distributions of synthetic and real-world undirected social network
datasets.
27. Degree Distribution
Further, we carry out a similar inference procedure for the out-degree
distribution of the nodes.
Inference of the power-law parameters α and xmin is done using the
discrete maximum-likelihood algorithm, yielding parameter estimates of
3.24 for α and 1334 for xmin, with a p-value of 0.13.
Our findings contrast with the absence of a power law in the degree
distribution reported by existing work analyzing the whole Twitter
network.
28. Degrees of Separation
Existing work, such as the six degrees of separation and the small-world
model, is named after the finding that many social and technological
networks possess small average path lengths.
The verified network is even more extreme in this respect, with an
average node distance of 2.74, much lower than previous sampling
estimates for all of Twitter (3.43, 4.12).
29. Bio Analysis
Each user on Twitter can have a biography (or bio) allowing them to
describe themselves in a limited number of characters.
We attempt to gain insights from some of the most popular unigrams,
bigrams and trigrams occurring in the bios of verified users, filtering
out n-grams constituted largely of non-informative words.
A running theme common to all three cases is the dominance of journalists
and of news and weather outlets. Being a preeminent journalist at an
English-language media outlet seems to be one of the surest ways to get
verified on Twitter.
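The n-gram counting with a stopword filter can be sketched directly with `collections.Counter` (an illustrative toy; the stopword list and tokenizer here are simplified assumptions, not the paper's):

```python
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "at", "for", "i"}

def top_ngrams(bios, n, k=10):
    """Most frequent n-grams across bios, skipping n-grams made up
    entirely of non-informative stopwords."""
    counts = Counter()
    for bio in bios:
        tokens = bio.lower().split()
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            if all(t in STOPWORDS for t in gram):
                continue
            counts[gram] += 1
    return counts.most_common(k)
```

For example, `top_ngrams(bios, 2)` over a set of journalist bios would surface bigrams like ("news", "editor") while dropping pure-stopword pairs like ("of", "the").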
30. Bio Analysis
The most frequent unigrams reflect several underlying themes:
1. Cross-links to other social media handles (e.g. Instagram)
2. Professional descriptors (e.g. Tech)
Bigrams and trigrams reiterate a largely similar narrative, dominated by
generic descriptors (e.g. Official Account) and business descriptors
(e.g. Weather Alerts).
32. Network Centrality
We delve into how a user’s centrality in this network correlates with
conventional metrics of reach such as follower and list membership count.
Public list membership has been shown to be a robust predictor of
influence and topical relevance on Twitter.
33. Network Centrality
We observe that public list membership and follower count in the entire
Twitter network is positively correlated with PageRank and Betweenness
centrality of that user in the English verified user sub-graph.
This backs up the general perception that a verified status is afforded
not just as a mark of authenticity, but also of sufficient public interest.
34. Autocorrelation
We check for autocorrelation in the activity time series using the
Ljung-Box and Box-Pierce portmanteau tests.
Both tests take the absence of time-lagged correlation as their null
hypothesis: a p-value below 0.05 rejects that null at the 95% confidence
level. The Ljung-Box and Box-Pierce tests return maximum p-values of
3.81×10-38 and 7.57×10-38 respectively, thus strongly indicating the
presence of lagged correlation.
This is consistent with the intuitive expectation of significant
autocorrelation at a week's lag, given that activity rates on Sundays are
reliably lower than those on weekdays.
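The Ljung-Box statistic itself is compact enough to sketch in numpy (the paper presumably used a standard statistics package; the weekly toy series below is illustrative only):

```python
import numpy as np

def ljung_box(x, lags):
    """Ljung-Box Q statistic. Under the null hypothesis of no
    autocorrelation up to `lags`, Q follows a chi-square distribution
    with `lags` degrees of freedom, so large Q means tiny p-values."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    n = x.size
    denom = np.sum(x * x)
    q = 0.0
    for k in range(1, lags + 1):
        r_k = np.sum(x[:-k] * x[k:]) / denom   # lag-k autocorrelation
        q += r_k * r_k / (n - k)
    return n * (n + 2) * q

# A year of daily activity with reliably lower Sundays
rng = np.random.default_rng(1)
weekly = np.tile([5.0, 5, 5, 5, 5, 5, 2], 52) + rng.normal(0, 0.5, 364)
q7 = ljung_box(weekly, lags=7)
```

For this seasonal series Q at lag 7 lands far above the chi-square(7) 95% cutoff of 14.07, i.e. a vanishingly small p-value, mirroring the result on the activity data.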
36. Stationarity
We next inquire whether the activity time series is stationary, using a
time-series changepoint detection mechanism called Pruned Exact Linear
Time (PELT).
We assume that this time series is drawn from a normal distribution, with
mean and variance that can change at a discrete number of change-points.
We use the PELT algorithm to maximize the log-likelihood for the means
and variances of the underlying distribution with a penalty for the number of
change-points.
37. Stationarity
Results from several runs of the algorithm are recorded while cooling
down the penalty factor and ramping up the number of change-points.
Dates that fall in the change-point list in a significant number of runs
are considered viable change-point candidates.
We only find weak evidence for a change-point around Christmas of 2017.
38. Stationarity
Existing work on smaller social networks, such as Gab, reveals that
activity time series change drastically in response to socio-political
events occurring outside the network.
Hence, to investigate further, we employ an Augmented Dickey-Fuller test
with both a constant term and a trend term. For upwards of 250
observations (we have 366), the critical value of the test with a
constant and a trend term is −3.42 at the 95% confidence level. If the
test statistic is more negative than this critical threshold, we reject
the null hypothesis of a unit root and conclude the presence of
stationarity.
Our test returns a test statistic of −3.86, significantly more negative
than the critical threshold, thus strongly suggesting stationarity.
39. User Data Classification
We commence our analysis by eliminating all features that could be
deemed surplus to requirements. To this end, we employed an all-relevant
feature selection model which classifies features into three categories:
confirmed, tentative and rejected. We only retain features that the model
is able to confirm over 100 iterations.
Using the rich set of features collected, we are able to consistently attain a
near-perfect classification accuracy of over 98%. Our results suggest that a
very competent classification of the Twitter user verification status is
possible without resorting to complex deep-learning pipelines that sacrifice
interpretability.
41. Feature Importance
To compare the usefulness of various categories of features, we trained
a gradient-boosting classifier, our most competitive model, using each
category of features alone.
Evaluated on randomized train-test splits of our dataset, user metadata
and content features were each able to consistently surpass 0.88 AUC;
temporal features alone consistently attain an AUC of over 0.79.
42. Feature Importance
The individual feature importances
were determined using the Gini
impurity reduction metric output by the
gradient boosting model.
To rank the most important features
reliably, the model was trained 100 times
with varying combinations of
hyperparameters.
The most reliable discriminative features
are shown.
43. Feature Importance
Some features are intuitively separable,
making an informed prediction possible.
The top 6 features are sufficient to
attain 0.9 AUC in their own right.
For instance, the very highest public list
membership counts and prevalence of
positive sentiment in Tweets are
populated exclusively by verified users
while the very lowest propensities for
authoritative speech as indicated by
LIWC Clout summary scores are
exclusively shown by non-verified users.
44. Profile Clustering
In order to characterize accounts with a
higher resolution, we attempt to cluster
them. We apply K-Means++ on the
normalized user vectors selecting the 30
most discriminative features indicated by
the XGBoost model, eventually settling
on 8 different clusters by tuning the
perplexity metric.
In the interest of intuitive visualization,
two dimensional embeddings obtained
via t-SNE are shown alongside.
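The clustering step can be sketched as K-Means with k-means++ seeding over normalized feature vectors. This is an illustrative numpy re-implementation on toy data, not the authors' pipeline (which first selects the 30 most discriminative features):

```python
import numpy as np

def kmeans_pp(X, k, iters=50, seed=0):
    """K-Means with k-means++ seeding: initial centers are drawn with
    probability proportional to squared distance from existing centers,
    then refined with standard Lloyd iterations."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]
    for _ in range(k - 1):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers) ** 2).sum(axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):          # keep empty clusters unchanged
                centers[j] = X[labels == j].mean(axis=0)
    return labels, centers

# Two well-separated groups of toy user vectors
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 0.5, (30, 2)), rng.normal(10, 0.5, (30, 2))])
labels, centers = kmeans_pp(X, k=2)
```

The k-means++ seeding is what makes the result stable: spreading the initial centers apart avoids the degenerate local optima plain random initialization is prone to.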
45. Strongly Non-Verified
Cluster C0 can largely be characterized
as the Twitter layman, with a high
proportion of experiential tweets. They
have short tweets, a high incidence of
verb usage, and score very high in the
LIWC Authenticity summary.
Cluster C2 can be characterized as an
amalgamation of accounts exhibiting bot-
like behavior. Members of this cluster
scored highly on the network and
content automation scores in our feature
set. Extensive usage of hashtags and
outlinks is observed.
46. Strongly Verified
Cluster C4 has a tendency to post
longer tweets and to retweet more
frequently than it authors content, while
members of Cluster C6 almost
exclusively retweet on the platform.
Cluster C5 is nearly entirely comprised of
verified users and includes elite Twitter
users that comprise the core of verified
users on the platform. These users have
by far the highest list memberships on
average.
47. Mixed Clusters
Clusters C1, C3 and C7 are comprised of
a mix of verified and non-verified users.
Members of cluster C1 are ascendant
both in terms of reach and activity levels
as evidenced by the proportion of their
followers gained and statuses authored
recently. Many users in C1 have obtained
verification in the data collection period.
Members of C3 and C7 are either
stagnant or declining in their reach and
activity levels, and show very low
engagement with the rest of the platform
in terms of retweets and mentions.
49. Topic Classification
To glean Tweet topics, we ran Gibbs-sampling-based LDA for 1000
sampling iterations.
The number of topics was tuned to 100 after trying values from 30 to
300, using perplexity.
Instead of topic modelling on a per-Tweet basis and aggregating per user,
we apply the author-topic model, collating all of a user's Tweets and
topic modelling them in one go. This works around the fact that most
Tweets are too short to meaningfully infer topics from.
We use the default document-topic and term-topic densities suggested in
prior topic modelling studies.
50. Topic Classification
Our classification models demonstrate that it is eminently possible to
infer the verification status of a user with high accuracy, purely from
the distribution of topics they tweet about.
The most competitive classifier attained a classification accuracy of 88.2%.
51. Topic Importance
In the interest of interpretability, we
evaluate the predictive power of each
topic with respect to verification status.
We obtain individual topic importances
using the ANOVA F-Scores output by
GAM.
The procedure is run on 50 random
train-test splits of the dataset and the
topics with the highest F-Scores noted.
Most discriminative topics with their top
3 keywords were noted.
52. Topic Importance
Though there is some overlap between
topics, there are clear patterns to be
observed on some topics using which an
informed prediction can be made.
For instance, the users who tweet most
frequently about consequential topics
like climate change and national
politics are all verified, while
controversial topics like Middle East
geopolitics and mundane topics like
online sales are ones verified users
devote limited attention to.
53. Topical Span
We next inquire about the diversity of
Tweet topics.
In order to obtain an optimal mix of
the number of topics per user in an
unsupervised manner, we leveraged a
Hierarchical Dirichlet Process.
Inference is done via Online Variational
Bayes estimation using the previously
stated hyperparameters.
54. Topical Span
A trend is observed with non-verified users
clearly being over-represented in the
lower reaches of the distribution (1–4
topics), while a comparatively substantial
portion of verified users are situated in the
middle of the distribution (5–10 topics).
Also noteworthy is the fact that the very
upper echelons of topical variety in tweets
are occupied exclusively by verified users.
Shown are the two most topically diverse
handles with 13 and 21 topics respectively.
56. Key Contributions
Full-Featured Dataset: released a fully featured dataset of 407k+ users,
containing 79+ million edges and 494+ million time-stamped Tweets.
Successful Classification: we are the first study to successfully discern
and classify verification-worthy users on Twitter, obtaining a
near-perfect classifier in the process.
Actionable Findings: we unravel the aspects of a profile's activity and
presence that have the greatest bearing on a user's verification status.
57. Future Applications
1. Superior verification heuristic
The aforementioned deviations likely constitute a unique fingerprint for
verified users, which can be leveraged to gauge the strength of a user's
case for such status.
2. Actionable insights to improve online presence
The obtained insights can be used to significantly enhance the quality
and reach of one's online presence before resorting to prohibitively
priced social media management solutions.
3. Realistic synthetic influential profile generation
58. Related Publications
1. Elites Tweet? Characterizing the Twitter Verified User Network.
ICDE Workshop on Large Scale Graph Data Analytics 2019, Macau SAR.
2. What sets Verified Users apart? Insights, Analysis and Prediction of
Verified Users on Twitter. WebSci 2019, Boston.