This document describes using social network analysis techniques to analyze power in organizations using an Enron email dataset. It discusses parsing emails to extract sender and recipient addresses to build a graph model. Various centrality measures like degree, closeness, and betweenness are calculated to identify influential individuals. The data is stored in a hashmap for serialization. The analysis finds that degree centrality, measured by outbound connections, best identifies powerful individuals in the organization.
LOKESH SHANMUGANANDAM | NORTHEASTERN UNIVERSITY
Using Social Networking Theory to
Understand Power in Organizations
Under the guidance of Prof. KAL BUGRARA
Project Report
TABLE OF CONTENTS
1. Objective
2. Enron E-Mail Data Set
3. Parsing E-Mails from the Dataset
4. Natural Language Processing (NLP)
5. Extracting E-Mail IDs from the Dataset
6. Data Structure to Store the E-Mail IDs
7. Serializing the HashMap Data
8. Deserializing the HashMap Data
9. Graph Analysis
10. Degree Centrality
11. Farness
12. Transitivity
13. Closeness Centrality
14. Conclusion
15. References
1. Objective
To study power in organizations using graph algorithm design and analysis. We analyze the e-mail communication between people to understand who is in a better bargaining position and has a greater chance of influencing others. We create a graph model of this communication, apply graph algorithms to it, and analyze the model to understand who has more bargaining power, more chances of making things happen, and more flexibility.
2. Enron E-Mail Data Set
In this project we use Enron email dataset to study and understand the power in organizations.
The Enron email dataset is valuable because it is one of the very few collections of
organizational emails that are publicly available. The emails of this period (1998.11 - 2002.6)
record the dynamics of Enron, from glory to collapse.
The Enron email dataset contains 517,431 messages organized into 150 folders. The folder’s
name is given as the employee’s last name, followed by a dash, followed by the initial letter of
the employee’s first name. For example, folder “allen-p” is named after Enron employee Phillip
K. Allen. Each employee folder contains subfolders, such as “inbox”, “sent”, “_sent_mail”,
“discussion_threads”, “all_documents”, “deleted_items”, and subfolders created by the
employee. A large number of duplicate emails exist in those folders.
An Enron email message contains the following header fields in order (the header field in
parenthesis is optional): “Message-ID”, “Date”, “From”, (“To”), “Subject”, (“Cc”), “Mime-Version”, “Content-Type”, “Content-Transfer-Encoding”, (“Bcc”), “X-From”, “X-To”, “X-cc”, “X-bcc”, “X-Folder”, “X-Origin”, and “X-FileName”. The email content is separated from the headers by a blank line.
3. Parsing E-Mails from the Dataset
The Apache OpenNLP library is used to parse and read the contents of each email. Emails from each employee's folder are parsed using OpenNLP.
4. Natural Language Processing (NLP)
The Apache OpenNLP library is a machine learning based toolkit for the processing of natural
language text. It supports the most common NLP tasks, such as tokenization, sentence
segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and
coreference resolution. These tasks are usually required to build more advanced text processing
services. OpenNLP also includes maximum entropy and perceptron based machine learning.
5. Extracting E-Mail IDs from the Dataset
Once an e-mail has been parsed with OpenNLP, the next task is to extract the email IDs from it. A regular expression is used to match and extract the addresses. Each email contains headers such as Message-ID, Date, From, To, and Subject, followed by the email content.
Sample Email header format
Message-ID: <18782981.1075855378110.JavaMail.evans@thyme>
Date: Mon, 14 May 2001 16:39:00 -0700 (PDT)
From: phillip.allen@enron.com
To: tim.belden@enron.com
Subject:
Here we extract the email IDs from the “From:” and “To:” header lines using the regular expression below.
Regular expression used to match email IDs
[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+
A Pattern and Matcher are used to extract each email ID encountered in the parsed data (note that the dot before the domain suffix must be escaped, and the final character class needs the full a-zA-Z0-9 range):
Matcher matcher =
Pattern.compile("[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+").matcher(a);
Once the From and To emails are extracted they are saved in the corresponding
fromEmail and toEmail variables.
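As a hedged illustration, the header-matching step can be sketched as a small self-contained Java class (the class and method names here are hypothetical, not from the project code), with the domain dot escaped so it is matched literally:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmailExtractor {
    // Same pattern as above, with the dot before the domain suffix escaped;
    // an unescaped "." would match any character.
    private static final Pattern EMAIL =
        Pattern.compile("[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+");

    // Returns the first email address found on a header line, or null if none.
    public static String firstAddress(String headerLine) {
        Matcher m = EMAIL.matcher(headerLine);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstAddress("From: phillip.allen@enron.com"));
        System.out.println(firstAddress("To: tim.belden@enron.com"));
    }
}
```

A header line with no address (for example an empty "Subject:") simply yields no match.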
6. Data Structure to Store the E-Mail IDs
Once the From and To emails have been extracted, they need to be stored in a data structure to perform the analysis. The data structure used to store the email IDs is a hash map.
HashMap<Key, HashMap<Key, Value>> map = new HashMap<>()
• The Outer HashMap key contains the From email id.
• The inner HashMap key contains the To email id.
• The inner HashMap value is the count of the emails sent between the From and To email
id’s.
Sample HashMap Representation:
**********************HashMap Representation******************
[phillip.allen@enron.com = {tim.belden@enron.com=2 , jsmith@austintx.com=1}]
In the sample representation above:
• The outer HashMap key, phillip.allen@enron.com, is the From email ID.
• The inner HashMap contains two keys, tim.belden@enron.com and jsmith@austintx.com, which are the To email IDs.
• The inner HashMap values hold the count of the email communication between the From and the To email IDs.
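A minimal sketch of how such a nested map can be populated and queried (the class and method names are hypothetical; `merge` increments the per-recipient count):

```java
import java.util.HashMap;

public class EmailGraph {
    // Outer key: From address; inner key: To address; inner value: message count.
    private final HashMap<String, HashMap<String, Integer>> map = new HashMap<>();

    // Record one message from 'from' to 'to', incrementing the edge count.
    public void record(String from, String to) {
        map.computeIfAbsent(from, k -> new HashMap<>())
           .merge(to, 1, Integer::sum);
    }

    // Number of messages sent from 'from' to 'to' (0 if none recorded).
    public int count(String from, String to) {
        return map.getOrDefault(from, new HashMap<>()).getOrDefault(to, 0);
    }

    public static void main(String[] args) {
        EmailGraph g = new EmailGraph();
        g.record("phillip.allen@enron.com", "tim.belden@enron.com");
        g.record("phillip.allen@enron.com", "tim.belden@enron.com");
        g.record("phillip.allen@enron.com", "jsmith@austintx.com");
        System.out.println(g.count("phillip.allen@enron.com", "tim.belden@enron.com")); // 2
    }
}
```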
Graph Representation
• All email IDs are added as nodes in a HashSet.
• From email IDs, the keys of the outer HashMap, are added as vertices.
• An edge is created from each From email ID to each To email ID.
• The inner HashMap value is used as the weight of the corresponding edge.
Sample Graph Representation
phillip.allen@enron.com --(2)--> tim.belden@enron.com
phillip.allen@enron.com --(1)--> jsmith@austintx.com
7. Serializing the HashMap Data
Parsing the full set of email data requires a long runtime. To improve the application's performance, once all the emails have been parsed and added to the HashMap, we serialize the HashMap to a file that can be loaded directly the next time the application runs.
The HashMap is serialized using a FileOutputStream and an ObjectOutputStream.
FileOutputStream fos = new FileOutputStream("hashmap.txt");
ObjectOutputStream oos = new ObjectOutputStream(fos);
oos.writeObject(graphMap);
oos.close();
fos.close();
8. Deserializing the HashMap Data
Once the HashMap data has been serialized, subsequent runs of the application can simply deserialize it, saving the time that would otherwise be spent parsing all the data again.
FileInputStream fis = new FileInputStream("hashmap.txt");
ObjectInputStream ois = new ObjectInputStream(fis);
deserializedGraphMap = (HashMap) ois.readObject();
ois.close();
fis.close();
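The two snippets above can be combined into a round-trip helper. This sketch (hypothetical class name, reusing the "hashmap.txt" file name from above) uses try-with-resources so the streams are closed even if an exception is thrown:

```java
import java.io.*;
import java.util.HashMap;

public class MapPersistence {
    // Serialize the nested HashMap to a file.
    public static void save(HashMap<String, HashMap<String, Integer>> map,
                            String path) throws IOException {
        try (ObjectOutputStream oos =
                 new ObjectOutputStream(new FileOutputStream(path))) {
            oos.writeObject(map);
        }
    }

    // Deserialize the nested HashMap back from the file.
    @SuppressWarnings("unchecked")
    public static HashMap<String, HashMap<String, Integer>> load(String path)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois =
                 new ObjectInputStream(new FileInputStream(path))) {
            return (HashMap<String, HashMap<String, Integer>>) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        HashMap<String, HashMap<String, Integer>> map = new HashMap<>();
        map.put("phillip.allen@enron.com", new HashMap<>());
        map.get("phillip.allen@enron.com").put("tim.belden@enron.com", 2);
        save(map, "hashmap.txt");
        System.out.println(load("hashmap.txt").equals(map)); // true
    }
}
```

Note that despite the ".txt" extension used in the project, the serialized output is binary, not text.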
9. Graph Analysis
Email ID vs. No. of Outgoing Edges

E-Mail                        Outgoing Edges
sally.beck@enron.com          89892
vince.kaminski@enron.com      7719
tana.jones@enron.com          705
jeff.dasovich@enron.com       642
Employee Email Id vs No. of Incoming Edges
10. Degree Centrality
Degree Centrality of a node refers to the number of edges attached to the node. In order to find
the standardized score, we need to divide each score by n-1 (n = number of nodes). In the case
of a directed network, we usually define two separate measures of degree centrality, namely
indegree and outdegree.
Indegree: Count of the number of ties directed to the node. Indegree is often interpreted
as a form of popularity.
Outdegree: Number of ties that the node directs to others. Outdegree is often interpreted
as gregariousness.
Degree Centrality Formula:
Degree Centrality = (No. of Inbound edges + No. of Outbound edges) / (n-1)
Where n is the number of nodes.
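Under the formula above, degree centrality can be computed directly from the nested From -> (To -> count) map. This is an illustrative sketch (hypothetical class name), counting each distinct sender/recipient pair as one edge regardless of its weight:

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class DegreeCentrality {
    // Degree centrality = (inbound edges + outbound edges) / (n - 1),
    // computed from a From -> (To -> count) adjacency map.
    public static double score(Map<String, Map<String, Integer>> graph, String node) {
        Set<String> nodes = new HashSet<>(graph.keySet());
        int out = 0, in = 0;
        for (Map.Entry<String, Map<String, Integer>> e : graph.entrySet()) {
            nodes.addAll(e.getValue().keySet());           // collect every node
            if (e.getKey().equals(node)) out = e.getValue().size(); // outdegree
            if (e.getValue().containsKey(node)) in++;               // indegree
        }
        return (in + out) / (double) (nodes.size() - 1);
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> g =
            Map.of("phillip.allen@enron.com",
                   Map.of("tim.belden@enron.com", 2, "jsmith@austintx.com", 1));
        System.out.println(score(g, "phillip.allen@enron.com")); // 1.0
    }
}
```

In the three-node sample map, phillip.allen has two outbound edges and no inbound ones, giving (0 + 2) / (3 - 1) = 1.0.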
Bar Chart Representation of the Degree Centrality Score
Outdegree Graph Visualization for Max Degree Centrality Score
Above is the outdegree graph visualization for the employee email ID with the maximum degree centrality score; the image below shows the indegree graph visualization for the same employee email ID.
Comparing the two graphs, we find that the number of outbound links is considerably higher than the number of inbound links, and that the outbound links contribute substantially to the degree centrality score.
With directed data, however, it can be important to distinguish centrality based on in-degree
from centrality based on out-degree. If an actor receives many ties, they are often said to
be prominent, or to have high prestige. That is, many other actors seek to direct ties to them, and
this may indicate their importance. Actors who have unusually high out-degree are actors who
are able to exchange with many others, or make many others aware of their views. Actors who
display high out-degree centrality are often said to be influential actors.
Indegree Graph Visualization for Max Degree Centrality Score
11. Farness
The farness of a node x is defined as the sum of its distances to all other nodes: we compute the weighted shortest-path distance from the node to every other node and sum the results.
Formula
Farness(x) = sum of the weighted shortest-path distances from x to all other nodes
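The farness score can be computed with Dijkstra's algorithm over the weighted adjacency map. This is a sketch (hypothetical class name and sample addresses) that sums distances only over the nodes the source can actually reach:

```java
import java.util.*;

public class Farness {
    // Farness(x): sum of weighted shortest-path distances from x to every node
    // it can reach, via Dijkstra over a From -> (To -> weight) adjacency map.
    public static int farness(Map<String, Map<String, Integer>> g, String src) {
        Map<String, Integer> dist = new HashMap<>();
        dist.put(src, 0);
        PriorityQueue<Map.Entry<String, Integer>> pq =
            new PriorityQueue<>(Map.Entry.comparingByValue());
        pq.add(new AbstractMap.SimpleEntry<>(src, 0));
        while (!pq.isEmpty()) {
            Map.Entry<String, Integer> cur = pq.poll();
            String u = cur.getKey();
            int d = cur.getValue();
            if (d > dist.getOrDefault(u, Integer.MAX_VALUE)) continue; // stale entry
            for (Map.Entry<String, Integer> e : g.getOrDefault(u, Map.of()).entrySet()) {
                int nd = d + e.getValue();
                if (nd < dist.getOrDefault(e.getKey(), Integer.MAX_VALUE)) {
                    dist.put(e.getKey(), nd);
                    pq.add(new AbstractMap.SimpleEntry<>(e.getKey(), nd));
                }
            }
        }
        int sum = 0;
        for (int d : dist.values()) sum += d;
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> g = Map.of(
            "a@enron.com", Map.of("b@enron.com", 2, "c@enron.com", 1),
            "b@enron.com", Map.of("c@enron.com", 5));
        System.out.println(farness(g, "a@enron.com")); // 0 + 2 + 1 = 3
    }
}
```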
Farness Chart Representation
12. Transitivity
Transitivity of a relation means that when there is a tie from i to j, and also from j to h, then there
is also a tie from i to h: friends of my friends are my friends.
Here the transitive closure is calculated, considering how far a node can reach out to other nodes to which it is not directly connected.
Employee Email ID            Transitivity Score
lavorato@enron.com           76
sally.beck@enron.com         43
kay.chapman@enron.com        42
louise.kitchen@enron.com     40
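One way to read a reach-based score like those above is as the size of a node's reachable set. A breadth-first sketch (hypothetical class name and sample addresses):

```java
import java.util.*;

public class Reach {
    // Transitive reach: the number of distinct nodes a source can reach
    // through any chain of directed edges, computed by breadth-first search.
    public static int reach(Map<String, Map<String, Integer>> g, String src) {
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        queue.add(src);
        while (!queue.isEmpty()) {
            String u = queue.poll();
            for (String v : g.getOrDefault(u, Map.of()).keySet()) {
                if (seen.add(v)) queue.add(v); // visit each node once
            }
        }
        seen.remove(src); // the source itself does not count toward its score
        return seen.size();
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> g = Map.of(
            "a@enron.com", Map.of("b@enron.com", 1),
            "b@enron.com", Map.of("c@enron.com", 1));
        System.out.println(reach(g, "a@enron.com")); // reaches b and c: 2
    }
}
```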
13. Closeness Centrality
Closeness centrality is a more global measure of centrality than degree, indegree, or outdegree: it takes the entire network of ties into consideration when calculating the centrality of an individual actor.
Closeness centrality is determined by the shortest path lengths linking the actors together. It measures centrality in terms of distance between actors: actors who have the shortest distances to all other actors are seen as having the highest closeness centrality.
Formula
Closeness centrality = (n - 1) / (sum of the weighted shortest-path distances from all other nodes)
Where n is the number of nodes.
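Given a farness total, the closeness score is simple arithmetic. A sketch with hypothetical numbers (a 10-node network in which one node's distances to the other nine sum to 18):

```java
public class Closeness {
    // Closeness centrality = (n - 1) / farness, where farness is the sum of
    // weighted shortest-path distances from the node to all other nodes.
    public static double closeness(int n, int farness) {
        return (n - 1) / (double) farness;
    }

    public static void main(String[] args) {
        // 10 nodes, distances to the other 9 sum to 18: (10 - 1) / 18 = 0.5
        System.out.println(closeness(10, 18)); // 0.5
    }
}
```

Because farness sits in the denominator, smaller total distances yield higher closeness, matching the definition above.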
14. Conclusion
In this project we constructed a graph from the Enron email dataset and analyzed its graph-theoretic properties, applying social network analysis and graph techniques to understand power in organizations. The analysis indicates that degree centrality, and in particular the number of outbound connections, is the most effective of the measures studied for identifying powerful individuals in the organization.
15. References
http://research.cs.queensu.ca/~skill/proceedings/yener.pdf
http://www.egr.msu.edu/waves/publications_files/2012_09_mohammad.pdf
https://en.wikipedia.org/wiki/Centrality#Closeness_centrality
http://www.cs.rpi.edu/~goldberg/publications/cleaning.pdf
https://books.google.com/books?id=wZYQAgAAQBAJ&pg=PA108&lpg=PA108&dq=how+to+calculate+farness&source=bl&ots=9S_U030wAf&sig=DyjoeJ2DxTkKzC6q7fkpDW7s6RY&hl=en&sa=X&ved=0CCcQ6AEwAWoVChMIj7Ogpqe1xwIVTJseCh2YAwff#v=onepage&q=how%20to%20calculate%20farness&f=false