Social networks have grown tremendously, and with them the amount of new data created every minute on these sites. The notion of community in social networking has attracted a lot of attention. Studying Twitter is useful for understanding how people use new communication technologies to form social connections and maintain existing ones. We analysed how geo-tagged tweets can be used to identify useful user features and behavior as well as landmarks and places of interest. We also analysed several clustering algorithms and proposed different similarity measures to detect communities.
1. Data Mining and Analysis on Twitter
Company Proprietary and Confidential Copyright Info Goes Here Just Like
This
2. Professor
• Prof. Pascal Frossard
Project Supervisor
• Xiaowen Dong
Students
• Pulkit Goyal (twitter.com/pulkit110)
• Sapan Diwakar (twitter.com/diwakarsapan)
3. Contents
• Objective
• Twitter at a glance
• Modules
• Data Collection
• Visualization Results
• Community Detection
• Future Mentions on Twitter
4. Objective
• A large amount of new data is created every minute on social networking sites.
– Difficult to obtain and interpret
– Collect data to allow for further analysis
• Identify online communities of users on Twitter
• Explore the reasons for user interactions as a step towards predicting future interactions
6. Twitter at a glance
• Micro-blogging platform
• Since March 2006
• Status updates
• 300 million users (June 2011)
• Giant chat room
• Instant messaging
7. Lingo
• Tweet - A message of 140 characters or less
• Retweet - Repeat a tweet from somebody else
• Hashtag - A #term included in a tweet (used for tracking topics)
• Reply/Mention - Mentioning another user in a tweet
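The lingo above can be pulled out of raw tweet text with a few patterns. This is a minimal sketch, not the project's own parser: the regular expressions are simplified assumptions (Twitter's real entity rules are more involved), the "RT @user" prefix is the old manual retweet convention, and `parse_tweet` is a hypothetical helper name.

```python
import re

# Simplified patterns (assumptions): real Twitter entity extraction has more rules.
MENTION_RE = re.compile(r"@(\w+)")
HASHTAG_RE = re.compile(r"#(\w+)")
RETWEET_RE = re.compile(r"^RT @(\w+)")  # old-style manual retweet prefix


def parse_tweet(text):
    """Return the lingo elements found in one tweet's text."""
    retweet = RETWEET_RE.match(text)
    return {
        "is_valid_length": len(text) <= 140,
        "retweet_of": retweet.group(1) if retweet else None,
        "mentions": MENTION_RE.findall(text),
        "hashtags": [tag.lower() for tag in HASHTAG_RE.findall(text)],
    }


# Made-up example text for illustration only.
example = parse_tweet("RT @pulkit110: Clustering results are in #Twitter #DataMining")
```

The mention and hashtag lists produced this way are the kind of input the later similarity measures would need.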
9. Modules
• Data Collection
– Setup system to collect data based on some constraints
• Visualization
– Build some visualizations based on the collected data
– Analyze the results
• Community Detection
– Identify communities of users on Twitter based on several different similarity measures
• Analysis of Future Mentions
– Identify factors for future mentions between users on Twitter.
11. Data Collection | Data based on location
• Collect data based on locations:
– London
– New York
– Paris
– San Francisco
– Mumbai
Objectives:
• Model the spread of interests
• Time
• Location
• Rate of information flow
• Identify future events
• Identify landmarks
• Model Relationships among users
• Friendship/Social Connections
• Common Interests
12. Data Collection | Data based on topics
• Collect data based on keywords
– Apple (Tech)
– Manchester United (Soccer)
Objectives:
• Model the spread of interests
• Time
• Location
• Rate of information flow
• Identify future events
• Identify landmarks
• Model Relationships among users
• Friendship/Social Connections
• Common Interests
13. Data Collection | Data from a group of users
• Collect tweets from a "group of users"
– Group of around 25k users
• Created by a specified user
• Explicitly in-reply-to a status created by a specified user (pressed reply button)
Objectives:
• Model the spread of interests
• Time
• Location
• Rate of information flow
• Identify future events
• Identify landmarks
• Model Relationships among users
• Friendship/Social Connections
• Common Interests
Figure: Overview of links we use to collect users
15. Visualization Results | Streets of London
• Setup
– Geo-tagged tweets for one week (16 to 22 August 2011)
• 111,206 tweets
16. Visualization Results | Streets of London | 1 week
• Analysis
• High density of tweets from famous places/tourist attractions
• Clustering of tweets
• Content of tweets can be used to predict the place
• More tweets along the roads/streets
Map labels: National Gallery, London Waterloo Rail, The Big Ben, London Victoria Rail, Oval Cricket Ground, Greenwich
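The observation that tweets pile up around famous places can be checked by counting geo-tagged tweets per grid cell. This is only a sketch under assumptions: the slides do not say how density was computed, `densest_cells` is a hypothetical helper, the 0.005 degree cell size is an arbitrary choice, and the coordinates are made up.

```python
from collections import Counter


def densest_cells(points, cell_size=0.005, top=3):
    """Count geo-tagged tweets per grid cell and return the busiest cells.

    points: iterable of (latitude, longitude) pairs in degrees.
    cell_size: cell edge in degrees (hypothetical value, about 550 m in latitude).
    """
    counts = Counter(
        (int(lat // cell_size), int(lon // cell_size)) for lat, lon in points
    )
    return counts.most_common(top)


# Made-up coordinates: two tweets close together, one further away.
points = [(51.5081, -0.1281), (51.5082, -0.1283), (51.5033, -0.1132)]
busiest = densest_cells(points, top=1)
```

Cells with unusually high counts are candidates for landmarks; the tweet text in those cells could then be inspected to name the place.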
18. Tweets in London | Aggregated by wards
Legend: No. of tweets in increasing order
19. Tweets about a topic | Manchester United
• Setup
– Data for two weeks (27 Oct to 8 Nov 2011)
• Keywords
– "manchesterunited", "manchester united", "manchester utd", "man united", "manutd", "man utd", "manu", "mufc"
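A keyword list like the one above can be turned into a single case-insensitive matcher. This is a minimal sketch: whole-word matching is an assumption (the slides do not say how keywords were matched against tweets), and `build_matcher` and `matches_topic` are hypothetical names.

```python
import re

MANCHESTER_UNITED_KEYWORDS = [
    "manchesterunited", "manchester united", "manchester utd",
    "man united", "manutd", "man utd", "manu", "mufc",
]


def build_matcher(keywords):
    """Compile one case-insensitive pattern matching any keyword as a whole word or phrase."""
    # Longest keywords first so that phrases are tried before their prefixes.
    ordered = sorted(keywords, key=len, reverse=True)
    alternatives = "|".join(re.escape(keyword) for keyword in ordered)
    return re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)", re.IGNORECASE)


def matches_topic(text, matcher):
    """True if the tweet text contains at least one of the keywords."""
    return matcher.search(text) is not None


matcher = build_matcher(MANCHESTER_UNITED_KEYWORDS)
```

With whole-word matching, "#MUFC" is accepted while a word such as "manual" is not picked up by the short keyword "manu".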
20. Visualization Results | Tweets About Manchester United
Analysis
• More tweets in and around Europe
• Manchester United plays in the English Premier League and has its home ground in Manchester
• High number of tweets from countries whose players play for Manchester United
• High popularity of Manchester United in Indonesia and Malaysia
21. Tweets about a topic | Apple
• Setup
– Data for two weeks (27 Oct to 8 Nov 2011)
• Keywords
– "apple", "mac", "macbook", "macbookair", "macbookpro", "os x", "osx",
"osxlion", "ipod", "ipodshuffle", "ipodnano", "ipodclassic", "ipodtouch",
"itunes", "iphone", "iphone3", "iphone3s", "iphone4", "iphone4s",
"iphone5", "ios", "ios4", "ios5", "ipad", "ipad2", "ipad3"
22. Visualization Results | Tweets About Apple
Analysis
• High volume of tweets in the USA and Europe
• Popularity of Apple products in Europe and the USA
• Volume of data as compared to Manchester United
• 32k tweets (with geo-location) about Apple as opposed to 1.4k for Manchester United
• Interest in Apple is spread over the world, whereas for Manchester United it is limited to a few countries
24. Community Detection | Background
• Community
– A set of users having strong connections.
– Held together by some common interests of a large group of users.
• Similarity Measures
– Users’ Social Connection
– User Mentions
– Description Content Similarity
– Tweet Content Similarity
– Hash-Tag Similarity
• Algorithms for community detection
– Modularity Maximization Clustering
• Spectrum Based
• Greedy Bottom-up Fast Modularity Clustering
– Spectral Clustering
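The modularity-based algorithms listed above search for the partition of users that maximises Newman's modularity Q. The sketch below only evaluates Q for a given partition; it implements neither the spectrum-based nor the greedy bottom-up search. It assumes an undirected, unweighted graph, and `modularity` is a hypothetical helper name.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once, no self-loops.
    community_of: dict mapping every node to a community label.
    """
    m = len(edges)
    degree = defaultdict(int)
    internal_edges = defaultdict(int)  # edges with both ends in the same community
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
        if community_of[u] == community_of[v]:
            internal_edges[community_of[u]] += 1
    degree_sum = defaultdict(int)  # total degree per community
    for node, k in degree.items():
        degree_sum[community_of[node]] += k
    # Q = sum over communities c of  e_c / m - (d_c / 2m)^2
    return sum(
        internal_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Toy graph (made up): two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}
q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
```

Splitting the toy graph into its two triangles gives a positive Q, while putting every node in one community gives Q = 0, which is why a search over partitions prefers the split.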
25. Community Detection | Analysis on small dataset
• Experimental setup
– 501 users from three different lists on twitter
• List id 4293757, 12932674 and 33222959
– Tweets collected for 2 weeks
• 26th October, 2011 to 7th November 2011
• Goal
– Recover ground truth clusters
– Evaluation based on Normalized Mutual Information (NMI) and Rand Index (RI)
• Similarity Measures used
– Users’ social connections
– User mentions
– Users’ Description content similarity
– Users’ Tweet content similarity
• Algorithms used
– Spectrum based Modularity Maximization
– Spectral Algorithm – Normalized Laplacian Matrix
Figure: Spy plot for social connections with users ordered by the list to which they belong
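Both evaluation scores compare a predicted clustering with the ground truth lists. The sketch below is an illustration, not the project's evaluation code: the Rand index counts user pairs on which the two labelings agree, and NMI is normalised here by the geometric mean of the two entropies, which is one of several common conventions and an assumption about the variant used.

```python
from collections import Counter
from itertools import combinations
from math import log, sqrt


def rand_index(truth, predicted):
    """Fraction of user pairs on which the two labelings agree (same or different cluster)."""
    pairs = list(combinations(range(len(truth)), 2))
    agree = sum(
        (truth[i] == truth[j]) == (predicted[i] == predicted[j]) for i, j in pairs
    )
    return agree / len(pairs)


def entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def nmi(truth, predicted):
    """Mutual information normalised by sqrt(H(truth) * H(predicted))."""
    n = len(truth)
    joint = Counter(zip(truth, predicted))
    count_t, count_p = Counter(truth), Counter(predicted)
    mutual_info = sum(
        (c / n) * log(c * n / (count_t[t] * count_p[p]))
        for (t, p), c in joint.items()
    )
    denominator = sqrt(entropy(truth) * entropy(predicted))
    return mutual_info / denominator if denominator > 0 else 0.0


# Made-up labels: a perfect clustering (up to renaming) and an uninformative one.
truth = [0, 0, 1, 1]
perfect = [1, 1, 0, 0]
unrelated = [0, 1, 0, 1]
```

Both scores ignore the names of the cluster labels, so a clustering that recovers the lists under different labels still scores 1.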
26. Analysis on small dataset | Modularity Based Clustering
Figure panels: Ground truth clusters; Clusters for spectrum based modularity maximization clustering on User Connections; Clusters for spectrum based modularity maximization clustering on combined similarity measure; Similarity Matrix; Modularity Matrix
Similarity measure     NMI      RI
User Connections       0.3868   0.7174
Mention                0.0130   0.3398
Tweet content          0.0074   0.3371
Description content    0.0780   0.5254
All combined           0.2500   0.6175
Analysis
• Social connections most dominating for clustering this group of users.
• Individual similarity measures perform inaccurately
• Combined similarity measures not as good as user connections alone
• Addition of low information content to user connections decreases accuracy.
• User behavior not consistent with ground truth.
• Users in these lists post similar content
27. Analysis on small dataset | Laplacian Based Clustering
Figure panels: Ground truth clusters; Clusters for Normalized Laplacian based spectral clustering on combined similarity measure; Similarity Matrix; Symmetric Normalized Laplacian Matrix
Similarity measure     NMI      RI
User Connections       0.0077   0.3374
Mention                0.0077   0.3374
Tweet content          0.0077   0.3374
Description content    0.0088   0.3381
All combined           0.2931   0.6472
Analysis
• Clustering on Social connections fails.
• Laplacian based methods are sensitive to the presence of disconnected nodes.
• Individual similarity measures (including social connections) fail to reconstruct any cluster information.
• Combined similarity measures give results consistent with the modularity based approach.
• Addition of different information to the social connections makes it connected.
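The sensitivity to disconnected nodes can be seen in the definition of the symmetric normalized Laplacian, L_sym = I - D^(-1/2) A D^(-1/2): a user with no connections has degree zero, so D^(-1/2) is undefined for that row. The sketch below is an illustration under assumptions: it reads "disconnected nodes" as zero-degree nodes, raising an error is one possible way to surface the problem, and `normalized_laplacian` is a hypothetical helper name.

```python
from math import sqrt


def normalized_laplacian(adjacency):
    """L_sym = I - D^(-1/2) A D^(-1/2) for a symmetric adjacency matrix (list of lists).

    Raises ValueError for zero-degree nodes, where D^(-1/2) is undefined.
    """
    n = len(adjacency)
    degrees = [sum(row) for row in adjacency]
    isolated = [i for i, d in enumerate(degrees) if d == 0]
    if isolated:
        raise ValueError(f"isolated nodes {isolated}: D^(-1/2) is undefined")
    return [
        [
            (1.0 if i == j else 0.0)
            - adjacency[i][j] / sqrt(degrees[i] * degrees[j])
            for j in range(n)
        ]
        for i in range(n)
    ]


# Toy matrices (made up): a 3-node path, and a graph whose node 2 has no connections.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
with_isolated_node = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
laplacian = normalized_laplacian(path)
```

Adding a second similarity matrix that links the isolated user to someone removes the zero degree, which matches the slide's remark that combining information makes the graph connected.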
28. Community Detection | Analysis on large dataset
• Experimental setup
– 11273 users from the set of all users collected during data-collection
– Tweets collected for 4 weeks
• 26th October, 2011 to 22nd November 2011
• Similarity Measures used
– Users’ social connections
– User mentions
– Users’ Hash tag similarity
– Users’ Tweet content similarity
• Algorithm used
– Bottom up Fast Modularity Clustering
Figure: Spy plot for social connections
29. Analysis on large dataset | Clustering on Social Connections
Figure: Spy plot for social connections with users ordered by the clusters that they are present in
Figure: Visualization of clustering results
30. Analysis on large dataset | Clustering on Social Connections
Figure: Visualization of clustering results
Tag cloud 1: Frequent keywords in tweets from cluster 2
Tag cloud 2: Frequent keywords in tweets from cluster 6
Analysis
• The largest cluster (cluster 0) contains mostly users from the UK, who are mostly web developers/software developers and talk consistently about these topics.
• Users in cluster 2 talk mostly about technologies such as ‘Google’, ‘server’ and ‘SQL’, as shown in tag cloud 1.
• Users in cluster 4 are from the same university in India, ‘IIIT Hyderabad’.
• Users in cluster 6 are football fans, as shown in tag cloud 2. Most of them support the Italian club Juventus.
31. Analysis on large dataset | Clustering on Combined matrices
Figure panels: Results for data from week 1; Results for week 2; Results for week 3; Results for week 4; Results for only social connections
Combined = Connection + Mention + Hashtag + Tweet
Analysis
• Using combined data leads to much finer clustering results as compared to clustering on social connections.
• Additional information allowed making a division between users who weren’t tightly connected.
• Division into smaller clusters is consistent across the different weeks
• Not due to some shifts of interests for a small period of time.
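The combined measure, Combined = Connection + Mention + Hashtag + Tweet, is an element-wise sum of the per-measure similarity matrices. The sketch below assumes all matrices are the same size and already on comparable scales; the slides do not say whether any normalisation was applied, `combine_similarities` is a hypothetical helper, and the 2x2 matrices are made up.

```python
def combine_similarities(matrices, weights=None):
    """Element-wise weighted sum of equally sized similarity matrices (lists of lists).

    matrices: dict mapping a measure name to its matrix.
    weights: optional dict mapping a measure name to its weight (default 1.0 each).
    """
    names = list(matrices)
    weights = weights or {name: 1.0 for name in names}
    n = len(matrices[names[0]])
    return [
        [sum(weights[name] * matrices[name][i][j] for name in names) for j in range(n)]
        for i in range(n)
    ]


# Made-up 2-user matrices for illustration only.
matrices = {
    "connection": [[0, 1], [1, 0]],
    "mention": [[0, 2], [2, 0]],
    "hashtag": [[0, 0.5], [0.5, 0]],
    "tweet": [[0, 0.1], [0.1, 0]],
}
combined = combine_similarities(matrices)
```

Passing explicit weights gives a weighted combination instead of the plain sum.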
33. Future Mentions | Reasons for mentions on Twitter
• Social Connections
– Users see the tweets of their friends in their timeline and are therefore more likely to mention them in their future tweets.
– Mentions should occur only if two users share a ‘following’ or ‘being followed’ relationship.
• Past mentions
– Users who have mentioned each other often in the past are more likely to mention each other in the future.
– Past mentions suggest that the users may have had a conversation on Twitter, which indicates that they share a good relationship.
• Hash Tag Similarity
– Hash tags are used to highlight important keywords in tweets and make it easy to find tweets or set trending topics on Twitter.
– If two users discuss the same topic/keyword (hashtag), they are more likely to mention each other in the future.
• Tweet Content Similarity
– Users can mention others if they find their tweets interesting.
– Highly similar tweet content means a higher probability of a mention event between two users.
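Hashtag and tweet content similarity both need a score for how alike two users' term usage is. The slides do not name the measure, so cosine similarity over term counts is an assumption here, as are the helper names and the naive whitespace tokenisation; the example tweets are made up.

```python
from collections import Counter
from math import sqrt


def cosine_similarity(counts_a, counts_b):
    """Cosine similarity between two term-count vectors stored as Counters."""
    shared_terms = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[term] * counts_b[term] for term in shared_terms)
    norm_a = sqrt(sum(v * v for v in counts_a.values()))
    norm_b = sqrt(sum(v * v for v in counts_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hashtag_counts(tweets):
    """Count lower-cased hashtags over a user's tweets (simple whitespace tokenisation)."""
    return Counter(
        word.lower() for text in tweets for word in text.split() if word.startswith("#")
    )


# Made-up tweets for three users.
user_a = hashtag_counts(["Loving #python and #data", "#Python again"])
user_b = hashtag_counts(["#python rocks"])
user_c = hashtag_counts(["#football tonight"])
```

The same `cosine_similarity` function applies to tweet content if the Counters hold all words rather than only hashtags.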
34. Future Mentions | Correlation between features and future mentions
Weighted combination = 2*Mention + 5*Hashtag + Tweet Similarity

Correlation between features of week 1 as compared to mentions in week 2
W1/W2      Mention   Hash Tag   Tweet     Combined   Class
Mention    1         0.0528     0.003     0.919      0.1656
Hash Tag   0.0528    1          0.0031    0.4422     0.0565
Tweet      0.003     0.0031     1         0.0134     0.0272
Combined   0.919     0.4422     0.0134    1          0.1713
Class      0.1656    0.0565     0.0272    0.1713     1

Correlation between features of week 1, 2 and 3 as compared to mentions in week 4
W123/W4    Mention   Hash Tag   Tweet     Combined   Class
Mention    1         0.1428     0.0219    0.8912     0.1906
Hash Tag   0.1428    1          0.0193    0.5761     0.0861
Tweet      0.0219    0.0193     1         0.0343     -0.006
Combined   0.8912    0.5761     0.0343    1          0.1968
Class      0.1906    0.0861     -0.006    0.1968     1

Correlation between features of week 1 as compared to mentions in week 2 only for users of cluster 1
W1/W2      Mention   Hash Tag   Tweet     Combined   Class
Mention    1         0.0343     -0.0062   0.7492     0.1616
Hash Tag   0.0343    1          -0.0049   0.6876     0.2192
Tweet      -0.0062   -0.0049    1         -0.0001    -0.0116
Combined   0.7492    0.6876     -0.0001   1          0.2625
Class      0.1616    0.2192     -0.0116   0.2625     1

Analysis
• Past user mentions have a high correlation with mentions in the next week.
• Combined similarity measure provides some increase in the correlation as compared to past mentions.
• We can improve accuracy by increasing the learning data.
• Correlation for only one cluster is very good.
• Only 1-week learning data outperforms 3 weeks learning data for the complete set of users.
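Each table entry is a correlation between two columns computed over user pairs. The sketch below assumes the Pearson coefficient was used and that "Class" is 1 when a mention occurs in the later week and 0 otherwise; neither is stated on the slide. The weights come from the slide's weighted combination, the helper names are hypothetical, and the feature values are made up.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for a constant sequence; 0.0 is a convention chosen here
    return cov / sqrt(var_x * var_y)


def combined_score(mention, hashtag, tweet):
    """Weighted combination from the slide: 2*Mention + 5*Hashtag + Tweet Similarity."""
    return 2 * mention + 5 * hashtag + tweet


# Made-up features for four user pairs, followed by the assumed binary class label.
pairs = [
    (3, 0.4, 0.2, 1),
    (0, 0.0, 0.1, 0),
    (1, 0.6, 0.3, 1),
    (0, 0.1, 0.0, 0),
]
combined = [combined_score(m, h, t) for m, h, t, _ in pairs]
labels = [label for _, _, _, label in pairs]
correlation = pearson(combined, labels)
```

Computing the coefficient for every pair of columns would fill a table of the same shape as the ones above.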
35. Future Work
• Landmark detection
– Tweets collected from different cities can be used to identify landmarks/places of interest in these cities.
• Identify future events
– Algorithms can be developed to identify future events with the help of tweets collected for different topics.
• Combined similarity measure for community detection
– Different weighted combinations of similarity measures such as mentions, tweet, hashtag, description and social connection can be used to improve clustering results.
• Future Mentions
– Causes of mentions such as past mentions and hashtag similarity can be used to predict future mentions.
Only about 0.5-1% of tweets contain a GPS location, but this is still enough because there are millions of tweets.
The organisation into groups should be such that similar objects belong to the same cluster whereas there is little or no similarity between objects that belong to different clusters.
Lists are a way of grouping users on Twitter. Users can follow lists to obtain updates from a group of users. The three lists used are @prolificd/met, @rahulkalra_e/entrepreneurs and @8hasin/mildly-interesting respectively.
A reason for the poor performance of the similarity measures based on tweets, descriptions and mentions may be that the users in these groups are similar and generally post similar content on the web. This also means that user behaviour does not seem to be consistent with the ground truth data (lists @prolificd/met, @rahulkalra_e/entrepreneurs and @8hasin/mildly-interesting).
Note that there is no special ordering enforced on the users here so we cannot immediately see some cluster structure in the network.
We can now observe a community structure in the graph, i.e. the users have more connections within their community than with users in other communities. Clusters are ordered by the number of users present in each cluster. Red is the largest cluster, followed by green, blue, purple and cyan. This is just layout; colors define the distribution of users into clusters. In fact, the top 4 communities in the graph cover more than 93% of the total nodes.
Uses connections, mentions, hash tags and tweet content. Uses weekly data.
If two users discuss the same topic/keyword (hashtag), they are more likely to see each other's tweets and therefore more likely to share a mention relationship in the future.
Tweet Content Similarity: Here we implicitly assume that users post about things they are interested in.