Power-Law Distributions in Twitter Data
Conor Feeney
University of Limerick
Supervisor: Prof James Gleeson
April 15, 2016
Conor Feeney (UL) Power-Law Distributions in Twitter Data April 15, 2016 1 / 20
Outline
Twitter.
Power-law distributions.
Data collection.
Initial results.
Synthetic data & Kolmogorov-Smirnov test.
Comparing “poweRlaw” and Aaron Clauset’s code.
Twitter’s structure: changes in the last three years.
Conclusion.
Twitter
Twitter was founded in March 2006, and in a little
over 10 years has amassed over 300 million users
worldwide.
As a result, a large amount of social media data
can be obtained from Twitter.
The purpose of this paper is to examine the
potential presence of power-law distributions in
Twitter data.
Introduction to Power-Law Distributions
A power-law probability distribution is a distribution
whose density function (or mass function in the
discrete case) has the form
$p(x) = C x^{-\alpha}$,
where $C$ is a normalising constant.
Power-law distributions are described as “heavy-tailed”.
This means that extreme values are much more likely
than under, for example, a Gaussian distribution.
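To make the “heavy-tailed” statement concrete, the sketch below compares tail probabilities of a continuous power law with those of a standard Gaussian. It assumes the continuous case, where normalising over $x \ge x_{\min}$ gives $C = (\alpha - 1)\,x_{\min}^{\alpha - 1}$ and therefore $P(X \ge x) = (x/x_{\min})^{-(\alpha - 1)}$. The value $\alpha = 2.2$ is chosen only for illustration (it is close to the follower-count estimates in this study), and the function names are hypothetical.

```python
import math

def powerlaw_pdf(x: float, alpha: float, xmin: float) -> float:
    """Continuous power-law density p(x) = C * x**(-alpha) for x >= xmin.

    C = (alpha - 1) * xmin**(alpha - 1) makes the density integrate to 1 on [xmin, inf).
    """
    if x < xmin:
        return 0.0
    c = (alpha - 1.0) * xmin ** (alpha - 1.0)
    return c * x ** (-alpha)

def powerlaw_ccdf(x: float, alpha: float, xmin: float) -> float:
    """P(X >= x) = (x / xmin)**(-(alpha - 1)) for x >= xmin, and 1 below xmin."""
    if x < xmin:
        return 1.0
    return (x / xmin) ** (-(alpha - 1.0))

def gaussian_ccdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """P(X >= x) for a normal distribution, via the complementary error function."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))

# Power law with alpha = 2.2 and xmin = 1 versus a standard normal distribution.
for x in (2, 5, 10, 100):
    print(f"x={x:>3}  power law: {powerlaw_ccdf(x, 2.2, 1.0):.3e}  "
          f"Gaussian: {gaussian_ccdf(x):.3e}")
```

The power-law tail decays polynomially, so at $x = 10$ it still assigns a probability of order $10^{-1}$, whereas the Gaussian tail is vanishingly small.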
Plotting a Power-Law Distribution
The most common approach is to plot the
complementary cumulative distribution function
(CCDF) on a log-log scale.
In theory, a power law then appears as an
approximately straight line, with slope $-(\alpha - 1)$.
Very few empirical phenomena obey a power law for
all values of $x$; generally, the power law holds only
for values greater than some $x_{\min}$.
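The CCDF plot described above can be sketched as follows. This is a minimal illustration, not the plotting code used in the study: `empirical_ccdf` and `to_log10` are hypothetical helper names, and in practice the log-transformed points would be handed to a plotting library. The demonstration uses values that are exact quantiles of a power law with $\alpha = 2$ and $x_{\min} = 1$, so the log-log line should have slope $-(\alpha - 1) = -1$.

```python
import math
from collections import Counter

def empirical_ccdf(data):
    """Return (x, P(X >= x)) for each distinct value x in data, in ascending order of x."""
    n = len(data)
    counts = Counter(data)
    points = []
    remaining = n  # number of observations >= the current x
    for x in sorted(counts):
        points.append((x, remaining / n))
        remaining -= counts[x]
    return points

def to_log10(points):
    """Convert positive (x, CCDF) pairs to log10 coordinates, as shown on a log-log plot."""
    return [(math.log10(x), math.log10(p)) for x, p in points if x > 0]

# Values N / i (i = 1..N) are exact quantiles of a power law with alpha = 2 and
# xmin = 1, so P(X >= x) = 1 / x and the log-log slope should be -(alpha - 1) = -1.
N = 1000
demo = [N / i for i in range(1, N + 1)]
log_points = to_log10(empirical_ccdf(demo))
(x0, y0), (x1, y1) = log_points[0], log_points[-1]
slope = (y1 - y0) / (x1 - x0)
print(f"end-to-end log-log slope: {slope:.3f}")
```

For real data the line is straight only above $x_{\min}$, which is why the slope of such a plot is a visual check rather than a way to estimate α.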
Data Collection
First, a Twitter application was set up to obtain API
(Application Programming Interface) access, because
the credentials (keys and access tokens) generated by
the app are needed to establish a connection from R.
Data was collected using the “twitteR” package in
R. This package is designed to work specifically with
Twitter and its API.
There were some initial difficulties, because only
specific versions of R work with this package.
Results
Initially, a data set of $1.89 \times 10^5$ rows
(data set 1) was obtained.
“poweRlaw” and Clauset’s code were run on it to
estimate α and $x_{\min}$.
Over the break between semesters, further data was
collected and stored. This took nearly two weeks.
The new data set (data set 2) contained
$8.3 \times 10^5$ rows, and the same tests were
carried out.
For the second data set, the 64-bit version of R
was required to obtain our results.
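Both tools estimate α and $x_{\min}$ from the data. A minimal sketch of the general approach described by Clauset and co-authors is given below, assuming continuous data: for each candidate $x_{\min}$, α is estimated by maximum likelihood as $\hat\alpha = 1 + n\left[\sum_i \ln(x_i/x_{\min})\right]^{-1}$ over the $n$ observations with $x_i \ge x_{\min}$, and the $x_{\min}$ whose fitted tail has the smallest Kolmogorov-Smirnov distance $D$ is kept. The function names and the `min_tail` cut-off are hypothetical. Follower and friend counts are discrete, and both “poweRlaw” and Clauset’s code offer discrete estimators that this sketch does not reproduce.

```python
import math

def mle_alpha(tail, xmin):
    """Continuous MLE: alpha = 1 + n / sum(ln(x / xmin)); None if the sum is zero."""
    s = sum(math.log(x / xmin) for x in tail)
    return 1.0 + len(tail) / s if s > 0 else None

def ks_distance(tail, xmin, alpha):
    """Largest gap between the empirical CDF of the tail and the fitted power-law CDF."""
    xs = sorted(tail)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        model_cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
        # Compare with the empirical CDF just before and just after the jump at x.
        d = max(d, abs(i / n - model_cdf), abs((i + 1) / n - model_cdf))
    return d

def fit_power_law(data, min_tail=10):
    """Scan candidate xmin values; return (xmin, alpha, D) with the smallest KS distance D."""
    best = None
    for xmin in sorted(set(x for x in data if x > 0)):
        tail = [x for x in data if x >= xmin]
        if len(tail) < min_tail:
            break  # too few points left for a meaningful fit
        alpha = mle_alpha(tail, xmin)
        if alpha is None:
            continue
        d = ks_distance(tail, xmin, alpha)
        if best is None or d < best[2]:
            best = (xmin, alpha, d)
    return best
```

Scanning every distinct value is quadratic in the sample size, so for data sets of the size used here one would restrict the grid of candidate $x_{\min}$ values or rely on the optimised packages. The choice of $x_{\min}$ matters: a different $x_{\min}$ changes which observations enter the tail and therefore changes $\hat\alpha$.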
Results
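Table: Estimated $x_{\min}$ and α for the two Twitter data sets (C = Clauset’s code, R = “poweRlaw”).
| Statistic | $x_{\min}$ (C) | $x_{\min}$ (R) | α (C) | α (R) |
|---|---|---|---|---|
| Followers (1) | 360 | 329 | 2.2 | 2.19 |
| Friends (1) | 23251 | 9924 | 2.79 | 2.38 |
| Rate of Posting (1) | 0.99 | 0.99 | 2.04 | 2.04 |
| Followers (2) | 404 | 364 | 2.2 | 2.18 |
| Friends (2) | 51033 | 9986 | 3.08 | 2.37 |
| Rate of Posting (2) | 0.35 | n/a | 2.01 | n/a |
n/a: “poweRlaw” failed to produce an output.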
Figure: CCDF for the number of followers k of a random sample of
$8.3 \times 10^5$ Twitter users.
Figure: CCDF of the number of friends per user. The fits from R
(“poweRlaw”) and Clauset’s code differ, with the R fit deviating
increasingly from the data further along the tail.
Figure: CCDF of the rate of posting per user. “poweRlaw” failed
to produce an output, so the black line shows the fit from
Aaron Clauset’s code.
Synthetic Data
Synthetic data is any data applicable to a given
situation that is not obtained by direct
measurement.
It is generated to meet specific needs or conditions
that may not be found in the original, real data.
The method used in this paper is inverse transform
sampling.
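Inverse transform sampling draws $u$ uniformly from $[0, 1)$ and maps it through the inverse of the CDF. For a continuous power law, $F(x) = 1 - (x/x_{\min})^{-(\alpha - 1)}$, so solving $F(x) = u$ gives $x = x_{\min}(1 - u)^{-1/(\alpha - 1)}$. The sketch below assumes the continuous case and uses a hypothetical function name; discrete counts would need a discrete generator, which it does not attempt. The parameters $\alpha = 2.2$, $x_{\min} = 360$ are taken from the Followers (1) row of the results table purely for illustration.

```python
import math
import random

def sample_power_law(n, alpha, xmin, seed=None):
    """Draw n values from a continuous power law by inverse transform sampling.

    Uses x = xmin * (1 - u)**(-1 / (alpha - 1)) with u uniform on [0, 1).
    random() never returns 1.0, so 1 - u lies in (0, 1] and the power is always defined.
    """
    if alpha <= 1:
        raise ValueError("alpha must exceed 1 for a normalisable power law")
    rng = random.Random(seed)
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]

# Illustrative synthetic sample with alpha = 2.2 and xmin = 360.
synthetic = sample_power_law(100_000, alpha=2.2, xmin=360, seed=1)

# Sanity check: ln(x / xmin) is exponential with mean 1 / (alpha - 1).
mean_log = sum(math.log(x / 360) for x in synthetic) / len(synthetic)
print(f"mean ln(x/xmin) = {mean_log:.3f} (theory: {1 / 1.2:.3f})")
```

Fixing the seed makes a synthetic data set reproducible, which helps when the same synthetic samples are reused in a goodness-of-fit test.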
Synthetic Data
We needed to create synthetic data sets that follow
power-law distributions for each of our samples.
Each synthetic data set had to have the same α value
as the corresponding empirical data.
These synthetic data sets were needed to perform the KS test.
Kolmogorov-Smirnov Test Results
The KS test is needed to assess whether the
power-law model is a plausible fit for the data.
It was run on two random samples, of sizes
$2 \times 10^4$ (A) and $4 \times 10^4$ (B).
For most statistics, the resulting p-values indicated
that the data could plausibly be drawn from the
power-law model.
The exception was the follower counts in the larger
sample, which gave a p-value of around 0.08 with
“poweRlaw”, below the threshold of 0.1 commonly
used to rule out a power law.
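The p-values come from comparing the empirical data with synthetic power-law data sets. A common version of this procedure, described by Clauset and co-authors, fits the model, records the KS distance $D$, then repeatedly generates synthetic data from the fitted model, refits each synthetic set and counts how often its distance is at least as large as the observed $D$; under their convention, $p \le 0.1$ is taken as grounds to rule out the power law. The sketch below is a simplified version under stated assumptions: continuous data, $x_{\min}$ held fixed rather than re-estimated for each synthetic set, and only the tail $x \ge x_{\min}$ modelled. The helper names are hypothetical.

```python
import math
import random

def mle_alpha(tail, xmin):
    """Continuous MLE for alpha on the tail x >= xmin."""
    return 1.0 + len(tail) / sum(math.log(x / xmin) for x in tail)

def ks_distance(tail, xmin, alpha):
    """Largest gap between the empirical CDF of the tail and the fitted power-law CDF."""
    xs = sorted(tail)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        model_cdf = 1.0 - (x / xmin) ** (1.0 - alpha)
        d = max(d, abs(i / n - model_cdf), abs((i + 1) / n - model_cdf))
    return d

def ks_p_value(data, xmin, n_sims=1000, seed=None):
    """Fraction of synthetic power-law data sets whose KS distance is >= the observed one."""
    rng = random.Random(seed)
    tail = [x for x in data if x >= xmin]
    alpha = mle_alpha(tail, xmin)
    d_obs = ks_distance(tail, xmin, alpha)
    exceed = 0
    for _ in range(n_sims):
        # Synthetic tail of the same size, drawn by inverse transform sampling.
        synth = [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in tail]
        # Refit alpha on the synthetic data before measuring its KS distance.
        if ks_distance(synth, xmin, mle_alpha(synth, xmin)) >= d_obs:
            exceed += 1
    return exceed / n_sims
```

Refitting α on every synthetic set is what makes the comparison fair: the observed $D$ was computed with a fitted α, so the synthetic distances must be computed the same way. The number of simulations controls the precision of the p-value, so borderline values such as those near 0.1 benefit from more simulations.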
Sample Results
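Table: α estimates and KS p-values for the two random samples, A and B (C = Clauset’s code, R = “poweRlaw”).
| Statistic | α (C) | α (R) | p-value (C) | p-value (R) |
|---|---|---|---|---|
| Followers (A) | 2.18 | 2.17 | 0.405 | 0.2213 |
| Friends (A) | 3 | 2.92 | 0.506 | 0.3455 |
| Rate of Posting (A) | 2.03 | 2.05 | 0.96 | 0.746 |
| Followers (B) | 2.2 | 2.19 | 0.103 | 0.086 |
| Friends (B) | 3.07 | 2.95 | 0.855 | 0.438 |
| Rate of Posting (B) | 2.03 | 2.01 | 0.427 | 0.45 |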
A Comparison of R and Clauset’s Code
“poweRlaw” almost always estimated a smaller
$x_{\min}$ than Clauset’s code, leading to less
accurate α estimates.
“poweRlaw” also gave lower p-values than Clauset’s
code in five of the six cases.
While “poweRlaw” is adequate for general use,
borderline results should be double-checked using
additional tools.
Comparing 2013 Twitter to 2016 Twitter
A data set collected in 2013 for a different paper
was obtained and used for comparison.
It contained only each user’s number of followers,
but covered $8.2 \times 10^5$ users.
The estimated α values were similar: 2.13 for 2013
versus 2.2 for 2016.
Figure: CCDFs of the 2013 data set plotted alongside the 2016 data
set (x against CCDF, both on logarithmic axes).
Conclusion
Twitter & Power-law Distributions.
Data Collection.
Initial Results.
Synthetic Data & Kolmogorov-Smirnov Test.
Comparing “poweRlaw” and Aaron Clauset’s Code.
Twitter’s Structure: Changes in the Last Three Years.
Thank You for Listening
Questions?