LOKESH SHANMUGANANDAM | NORTHEASTERN UNIVERSITY
Using Social Networking Theory to
Understand Power in Organizations
Under the guidance of Prof. KAL BUGRARA
Project Report
TABLE OF CONTENTS
1. Objective
2. Enron E-Mail Data Set
3. Parsing E-Mails from the dataset
4. Natural Language Processing (NLP)
5. Extracting E-Mail IDs from the Dataset
6. Data structure to store the Email IDs
7. Serializing the HashMap Data
8. Deserializing the HashMap Data
9. Graph Analysis
10. Degree Centrality
11. Farness
12. Transitivity
13. Closeness Centrality
14. Conclusion
15. References
1. Objective
The objective of this project is to study power in organizations using graph algorithm design and
analysis. We analyze e-mail communication between people to understand who is in a better
bargaining position and has more chance of influencing others. To do so, we create a graph model
of the communication, apply graph algorithms to it, and analyze the results to understand who is
in a better bargaining position, has more chances of making things happen, and has more flexibility.
2. Enron E-Mail Data Set
In this project, we use the Enron email dataset to study and understand power in organizations.
The Enron email dataset is valuable because it is one of the very few collections of
organizational emails that are publicly available. The emails from this period (November 1998 to
June 2002) record the dynamics of Enron, from its glory to its collapse.
The Enron email dataset contains 517,431 messages organized into 150 folders. Each folder is
named after an employee: the employee’s last name, followed by a dash, followed by the initial
of the employee’s first name. For example, folder “allen-p” is named after Enron employee Phillip
K. Allen. Each employee folder contains subfolders, such as “inbox”, “sent”, “_sent_mail”,
“discussion_threads”, “all_documents”, and “deleted_items”, as well as subfolders created by the
employee. These folders contain a large number of duplicate emails.
An Enron email message contains the following header fields, in order (header fields in
parentheses are optional): “Message-ID”, “Date”, “From”, (“To”), “Subject”, (“Cc”),
“Mime-Version”, “Content-Type”, “Content-Transfer-Encoding”, (“Bcc”), “X-From”, “X-To”, “X-cc”,
“X-bcc”, “X-Folder”, “X-Origin”, and “X-FileName”. The email body is separated from the
headers by a blank line.
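The header/body split described above can be sketched in Java, the language used for the other snippets in this report. This is a minimal sketch, not the report's parsing code: the class and method names (EmailHeaderParser, headers, body) are hypothetical, and it assumes every header line has the form “Name: value”, with long values optionally continued on lines that start with whitespace.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: splits one raw Enron message into header fields and body.
class EmailHeaderParser {

    /** Header fields in order of appearance, e.g. "From" -> "phillip.allen@enron.com". */
    static Map<String, String> headers(String raw) {
        Map<String, String> fields = new LinkedHashMap<>();
        String lastKey = null;
        for (String line : raw.split("\r?\n", -1)) {
            if (line.isEmpty()) break; // the first blank line ends the headers
            if (lastKey != null && (line.startsWith(" ") || line.startsWith("\t"))) {
                // folded header: append the continuation to the previous field
                fields.put(lastKey, fields.get(lastKey) + " " + line.trim());
                continue;
            }
            int colon = line.indexOf(':');
            if (colon > 0) {
                lastKey = line.substring(0, colon).trim();
                fields.put(lastKey, line.substring(colon + 1).trim());
            }
        }
        return fields;
    }

    /** Everything after the first blank line, or "" if there is none. */
    static String body(String raw) {
        String normalized = raw.replace("\r\n", "\n");
        int split = normalized.indexOf("\n\n");
        return split < 0 ? "" : normalized.substring(split + 2);
    }
}
```

Because only the first blank line is treated as the separator, blank lines inside the body are preserved as part of the body.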
3. Parsing E-Mails from the dataset
The Apache OpenNLP library is used to parse and read the contents of each email. The emails in
each employee’s folder are parsed using OpenNLP.
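A minimal sketch of visiting every employee folder before parsing, assuming the dataset has been unpacked into a local directory whose immediate subfolders are the employee folders (such as “allen-p”). The class name MaildirWalker and the root path are hypothetical, and the sketch uses only java.nio.file rather than OpenNLP; each collected file would then be handed to the parsing step.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Stream;

// Hypothetical helper: groups every message file by the employee folder it lives in.
class MaildirWalker {

    /** Maps each employee folder name (e.g. "allen-p") to the message files beneath it. */
    static Map<String, List<Path>> filesByEmployee(Path root) throws IOException {
        Map<String, List<Path>> result = new TreeMap<>();
        try (Stream<Path> paths = Files.walk(root)) {
            paths.filter(Files::isRegularFile).forEach(p -> {
                Path relative = root.relativize(p);
                if (relative.getNameCount() < 2) return; // loose file directly under root
                String employee = relative.getName(0).toString();
                result.computeIfAbsent(employee, k -> new ArrayList<>()).add(p);
            });
        }
        return result;
    }
}
```

Subfolders such as “inbox” and “sent” are walked recursively, so all of an employee's messages end up in one list.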
4. Natural Language Processing (NLP)
The Apache OpenNLP library is a machine-learning-based toolkit for processing natural
language text. It supports the most common NLP tasks, such as tokenization, sentence
segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and
coreference resolution. These tasks are usually required to build more advanced text processing
services. OpenNLP also includes maximum entropy and perceptron-based machine learning.
5. Extracting E-Mail IDs from the Dataset
Once an email has been parsed using OpenNLP, the next task is to extract the email IDs from it.
A regular expression is used to match and extract the email IDs. Each email contains headers
such as Message-ID, Date, From, To, and Subject, followed by the email content.
Sample Email header format
Message-ID: <18782981.1075855378110.JavaMail.evans@thyme>
Date: Mon, 14 May 2001 16:39:00 -0700 (PDT)
From: phillip.allen@enron.com
To: tim.belden@enron.com
Subject:
Here we extract the email IDs that follow the “From:” and “To:” keywords using the regular
expression below.
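Regular expression used to match email IDs:
[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9.-]+
The dot before the domain suffix is escaped (\.) so that it matches only a literal period; an
unescaped . would match any character. A java.util.regex Matcher is used to extract each email
ID found in the parsed data.
import java.util.regex.*;
// 'a' holds the parsed email text; "\\." is the escaped dot inside a Java string literal
Matcher matcher =
        Pattern.compile("[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9.-]+").matcher(a);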
Once the From and To email IDs are extracted, they are saved in the corresponding
fromEmail and toEmail variables.
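The extraction step can be sketched as a small helper that applies the escaped expression to a single header line. The class and method names (EmailIdExtractor, idsFor) are hypothetical. The sketch only inspects the first line that starts with the given keyword, so a “To:” list that continues onto further lines would need those lines joined first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: returns the email IDs on one header line such as "From:" or "To:".
class EmailIdExtractor {
    // Same expression as above, with the dot before the domain suffix escaped.
    private static final Pattern EMAIL =
            Pattern.compile("[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9.-]+");

    static List<String> idsFor(String headers, String keyword) {
        List<String> ids = new ArrayList<>();
        for (String line : headers.split("\r?\n")) {
            // startsWith keeps "X-To:" from being mistaken for "To:"
            if (line.startsWith(keyword)) {
                Matcher m = EMAIL.matcher(line);
                while (m.find()) ids.add(m.group());
                break; // only the first matching header line is used
            }
        }
        return ids;
    }
}
```

Calling idsFor(headers, "From:") gives the value for fromEmail, and idsFor(headers, "To:") gives every recipient for toEmail.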
6. Data structure to store the Email IDs
Once the From and To email IDs have been extracted, they need to be stored in a data structure
for analysis. The data structure used to store the email IDs is a nested HashMap.
HashMap<String, HashMap<String, Integer>> map = new HashMap<>();
• The outer HashMap key is the From email ID.
• The inner HashMap key is the To email ID.
• The inner HashMap value is the number of emails sent from the From email ID to the To email
ID.
Sample HashMap Representation:
**********************HashMap Representation******************
[phillip.allen@enron.com = {tim.belden@enron.com=2 , jsmith@austintx.com=1}]
In the above sample representation:
• The outer HashMap key, phillip.allen@enron.com, is the From email ID.
• The inner HashMap contains two keys, tim.belden@enron.com and jsmith@austintx.com,
which are To email IDs.
• The inner HashMap values are the counts of emails sent from the From email ID to each
To email ID.
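A minimal sketch of how the nested map could be filled as emails are processed. The EmailGraphMap class and its record method are hypothetical, and the sketch assumes each email adds 1 to the count of every To email ID it lists.

```java
import java.util.HashMap;
import java.util.List;

// Hypothetical wrapper around the nested map:
// outer key = From ID, inner key = To ID, inner value = emails sent From -> To.
class EmailGraphMap {
    final HashMap<String, HashMap<String, Integer>> map = new HashMap<>();

    /** Records one email sent from fromEmail to every address in toEmails. */
    void record(String fromEmail, List<String> toEmails) {
        HashMap<String, Integer> inner = map.computeIfAbsent(fromEmail, k -> new HashMap<>());
        for (String to : toEmails) {
            inner.merge(to, 1, Integer::sum); // start at 1, or add 1 to the existing count
        }
    }
}
```

For example, recording one email to tim.belden@enron.com and a second email to both tim.belden@enron.com and jsmith@austintx.com gives tim.belden@enron.com=2 and jsmith@austintx.com=1 under phillip.allen@enron.com, matching the sample counts.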
Graph Representation
• All the email IDs are added as nodes in a HashSet.
• The From email IDs, which are the keys of the outer HashMap, are added as vertices.
• An edge is created from each From email ID to each of its To email IDs.
• The value stored for each To email ID (the email count) is used as the weight of the
corresponding edge.
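The steps above can be sketched as follows, taking the nested HashMap from section 6 as input. GraphBuilder and its Edge class are hypothetical names. The node set includes both From and To IDs, since an address that only ever receives email is still a node in the communication graph.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: turns the nested map into a node set and a weighted edge list.
class GraphBuilder {

    /** A directed edge From -> To weighted by the number of emails sent. */
    static final class Edge {
        final String from;
        final String to;
        final int weight;

        Edge(String from, String to, int weight) {
            this.from = from;
            this.to = to;
            this.weight = weight;
        }
    }

    /** Every email ID that appears as a From or a To address. */
    static Set<String> nodes(HashMap<String, HashMap<String, Integer>> graphMap) {
        Set<String> nodes = new HashSet<>(graphMap.keySet());
        for (HashMap<String, Integer> inner : graphMap.values()) nodes.addAll(inner.keySet());
        return nodes;
    }

    /** One edge per (From, To) pair; the inner map value becomes the weight. */
    static List<Edge> edges(HashMap<String, HashMap<String, Integer>> graphMap) {
        List<Edge> edges = new ArrayList<>();
        for (Map.Entry<String, HashMap<String, Integer>> from : graphMap.entrySet())
            for (Map.Entry<String, Integer> to : from.getValue().entrySet())
                edges.add(new Edge(from.getKey(), to.getKey(), to.getValue()));
        return edges;
    }
}
```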
Sample Graph Representation
[Figure: directed edges from phillip.allen@enron.com to tim.belden@enron.com (weight 2) and to jsmith@austintx.com (weight 1)]
7. Serializing the HashMap Data
Parsing a large amount of email data takes a long time. To improve performance, once all the
emails have been parsed and added to the HashMap, we serialize the HashMap to a file that can be
loaded again the next time the application runs. (Although the file is named hashmap.txt, Java
object serialization writes a binary format, not plain text.)
The HashMap is serialized using an ObjectOutputStream wrapped around a FileOutputStream.
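import java.io.*;
// Write the nested map with Java object serialization;
// try-with-resources closes both streams even if writeObject throws.
try (FileOutputStream fos = new FileOutputStream("hashmap.txt");
     ObjectOutputStream oos = new ObjectOutputStream(fos)) {
    oos.writeObject(graphMap);
}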
8. Deserializing the HashMap Data
Once the HashMap data has been serialized, the next run of the application can simply
deserialize it, saving the time needed to parse all the emails again.
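// Read the map back; readObject returns Object, so the result is cast
// to the nested map type (an unchecked cast).
try (FileInputStream fis = new FileInputStream("hashmap.txt");
     ObjectInputStream ois = new ObjectInputStream(fis)) {
    deserializedGraphMap = (HashMap<String, HashMap<String, Integer>>) ois.readObject();
}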
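The two snippets above can be combined into a self-contained round trip. The class name GraphMapStore and the File parameter (in place of the fixed "hashmap.txt") are hypothetical. The sketch relies on HashMap, String, and Integer all implementing Serializable, which is what allows the nested map to be written with writeObject and restored intact with readObject.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;

// Hypothetical helper: saves and reloads the nested email map.
class GraphMapStore {

    static void save(HashMap<String, HashMap<String, Integer>> graphMap, File file) throws IOException {
        // Closing the ObjectOutputStream also closes the FileOutputStream it wraps.
        try (ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream(file))) {
            oos.writeObject(graphMap);
        }
    }

    @SuppressWarnings("unchecked") // readObject returns Object; the stored type is known
    static HashMap<String, HashMap<String, Integer>> load(File file)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new FileInputStream(file))) {
            return (HashMap<String, HashMap<String, Integer>>) ois.readObject();
        }
    }
}
```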
9. Graph Analysis
Email ID vs. No. of Outgoing Edges
E-Mail                        Outgoing Edges
sally.beck@enron              89892
vince.kaminski@enron.com      7 719
tana.jones@enron.com          705
jeff.dasovich@enron.com       642
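A minimal sketch of how the outgoing and incoming edge counts behind these charts could be computed from the nested map. It assumes an outgoing edge is one distinct To email ID (so the count is the inner map's size) and ignores the email-count weights; EdgeCounts and its methods are hypothetical names.

```java
import java.util.*;

// Hypothetical helper: per-ID outgoing/incoming edge counts and a simple ranking.
class EdgeCounts {

    /** Number of distinct To IDs each From ID has sent email to. */
    static Map<String, Integer> outgoing(HashMap<String, HashMap<String, Integer>> graphMap) {
        Map<String, Integer> out = new HashMap<>();
        graphMap.forEach((from, inner) -> out.put(from, inner.size()));
        return out;
    }

    /** Number of distinct From IDs that have sent email to each To ID. */
    static Map<String, Integer> incoming(HashMap<String, HashMap<String, Integer>> graphMap) {
        Map<String, Integer> in = new HashMap<>();
        for (HashMap<String, Integer> inner : graphMap.values())
            for (String to : inner.keySet()) in.merge(to, 1, Integer::sum);
        return in;
    }

    /** The k IDs with the highest counts, highest first (ties in no particular order). */
    static List<String> top(Map<String, Integer> counts, int k) {
        List<String> ids = new ArrayList<>(counts.keySet());
        ids.sort((a, b) -> Integer.compare(counts.get(b), counts.get(a)));
        return ids.subList(0, Math.min(k, ids.size()));
    }
}
```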
Employee Email ID vs. No. of Incoming Edges
10. Degree Centrality
The degree centrality of a node is the number of edges attached to the node. To obtain a
standardized score, each node's degree is divided by n - 1, where n is the number of nodes. In a
directed network, we usually define two separate measures of degree centrality: indegree and
outdegree.
Indegree: The number of ties directed to the node. Indegree is often interpreted
as a form of popularity.
Outdegree: The number of ties that the node directs to others. Outdegree is often interpreted
as gregariousness.
Degree Centrality Formula:
Degree Centrality = (No. of Inbound edges + No. of Outbound edges) / (n-1)
Where n is the number of nodes.
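The degree centrality formula can be sketched over the nested map as follows. DegreeCentrality is a hypothetical name, and n is taken to be the number of distinct email IDs appearing as either a From or a To address. Because inbound and outbound edges are both counted, a node's score in a directed graph can reach 2 rather than 1.

```java
import java.util.*;

// Hypothetical helper: (inbound + outbound edges) / (n - 1) for every email ID.
class DegreeCentrality {

    static Map<String, Double> scores(HashMap<String, HashMap<String, Integer>> graphMap) {
        Map<String, Integer> degree = new HashMap<>();
        for (Map.Entry<String, HashMap<String, Integer>> e : graphMap.entrySet()) {
            degree.merge(e.getKey(), e.getValue().size(), Integer::sum); // outbound edges
            for (String to : e.getValue().keySet())
                degree.merge(to, 1, Integer::sum);                       // one inbound edge
        }
        int n = degree.size(); // every From or To ID appears in the degree map
        Map<String, Double> result = new HashMap<>();
        degree.forEach((id, d) -> result.put(id, n > 1 ? d / (double) (n - 1) : 0.0));
        return result;
    }
}
```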
Bar Chart Representation of the Degree Centrality Score
Outdegree Graph Visualization for Max Degree Centrality Score
The figure above shows the outdegree graph visualization for the employee email ID with the
maximum degree centrality score; the figure below shows the indegree graph visualization for the
same email ID.
Comparing the two graphs, we find that the number of outbound links is considerably higher than
the number of inbound links, so the outbound links contribute substantially to this employee's
degree centrality score.
With directed data, however, it can be important to distinguish centrality based on in-degree
from centrality based on out-degree. If an actor receives many ties, they are often said to
be prominent, or to have high prestige. That is, many other actors seek to direct ties to them, and
this may indicate their importance. Actors who have unusually high out-degree are actors who
are able to exchange with many others, or make many others aware of their views. Actors who
display high out-degree centrality are often said to be influential actors.
Indegree Graph Visualization for Max Degree Centrality Score
11. Farness
The farness of a node x is defined as the sum of its distances to all other nodes. To compute it,
we calculate the weighted shortest-path distance between x and every other node.
Formula
Farness(x) = sum of the weighted shortest-path distances between x and all other nodes
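Computing farness requires single-source shortest paths. The sketch below uses Dijkstra's algorithm, a standard choice for non-negative edge weights, but this is an assumption, since the report does not name the algorithm it used. It also assumes that the email count on each edge is used directly as its length, that distances follow edge direction away from x, and that nodes unreachable from x are skipped. Farness and its methods are hypothetical names.

```java
import java.util.*;

// Hypothetical helper: Dijkstra-based farness on the weighted, directed email graph.
class Farness {

    /** Shortest weighted distance from source to every node reachable from it. */
    static Map<String, Long> distancesFrom(HashMap<String, HashMap<String, Integer>> graphMap, String source) {
        Map<String, Long> dist = new HashMap<>();
        // Min-heap of (node, tentative distance) pairs ordered by distance.
        PriorityQueue<Map.Entry<String, Long>> queue =
                new PriorityQueue<>(Map.Entry.<String, Long>comparingByValue());
        queue.add(new AbstractMap.SimpleEntry<>(source, 0L));
        while (!queue.isEmpty()) {
            Map.Entry<String, Long> cur = queue.poll();
            String node = cur.getKey();
            if (dist.containsKey(node)) continue; // already settled at a shorter distance
            long d = cur.getValue();
            dist.put(node, d);
            HashMap<String, Integer> out = graphMap.get(node);
            if (out == null) continue; // this ID never sent an email
            for (Map.Entry<String, Integer> e : out.entrySet())
                if (!dist.containsKey(e.getKey()))
                    queue.add(new AbstractMap.SimpleEntry<>(e.getKey(), d + e.getValue()));
        }
        return dist;
    }

    /** Sum of shortest-path distances from x to every reachable node (x itself adds 0). */
    static long farness(HashMap<String, HashMap<String, Integer>> graphMap, String x) {
        long sum = 0;
        for (long d : distancesFrom(graphMap, x).values()) sum += d;
        return sum;
    }
}
```

Because larger email counts become longer distances under this assumption, some analyses use the reciprocal of the count as the edge length instead; swapping that in only changes how the edge length is computed in distancesFrom.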
Farness Chart Representation
12. Transitivity
Transitivity of a relation means that when there is a tie from i to j and also a tie from j to h,
there is also a tie from i to h: friends of my friends are my friends.
Here, the transitive closure is calculated to measure how far a node can reach other nodes to
which it is not directly connected.
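One way to compute such a score is a breadth-first search from each node, which explores that node's row of the transitive closure. The sketch below counts the nodes reachable from a given email ID through at least one intermediary but not directly connected to it. This interpretation of the score, and the IndirectReach name, are assumptions rather than the report's confirmed definition.

```java
import java.util.*;

// Hypothetical helper: counts nodes reachable from source only through intermediaries.
class IndirectReach {

    static int indirectlyReachable(HashMap<String, HashMap<String, Integer>> graphMap, String source) {
        Set<String> direct = graphMap.getOrDefault(source, new HashMap<>()).keySet();
        Set<String> visited = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        visited.add(source);
        queue.add(source);
        // Breadth-first search: visited ends up holding everything reachable from source.
        while (!queue.isEmpty()) {
            String cur = queue.poll();
            for (String next : graphMap.getOrDefault(cur, new HashMap<>()).keySet())
                if (visited.add(next)) queue.add(next); // add() returns false if already seen
        }
        int count = 0;
        for (String node : visited)
            if (!node.equals(source) && !direct.contains(node)) count++;
        return count;
    }
}
```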
Employee Email ID             Transitivity Score
lavorato@enron.com            76
sally.beck@enron.com          43
kay.chapman@enron.com         42
louise.kitchen@enron.com      40
13. Closeness Centrality
Closeness centrality is considered a more global measure of centrality than degree, indegree,
and outdegree, because it takes the entire network of ties into account when calculating the
centrality of an individual actor.
Closeness centrality is determined by the shortest path lengths linking actors together: it
measures centrality in terms of the distance between actors, and the actors with the shortest
distances to all other actors have the highest closeness centrality.
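Formula
Closeness centrality = (n - 1) / (sum of the weighted shortest-path distances from all other nodes)
Where n is the number of nodes. The denominator is the node's farness score, so closeness is
the reciprocal of farness scaled by n - 1.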
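Given the shortest-path distances from one node, the formula reduces to a single division. The Closeness class is a hypothetical name, the distance map could come from any shortest-path routine, and returning 0 for a node with zero farness (nothing reachable) is a convention chosen for this sketch.

```java
import java.util.Map;

// Hypothetical helper: closeness = (n - 1) / farness.
class Closeness {

    /** distances: shortest-path distance from one node to each node it reaches; n: total nodes. */
    static double closeness(Map<String, Long> distances, int n) {
        long farness = 0;
        for (long d : distances.values()) farness += d;
        return farness == 0 ? 0.0 : (n - 1) / (double) farness;
    }
}
```

For example, in a four-node graph, a node at distances 1, 2, and 3 from the other three nodes has farness 6 and closeness 3 / 6 = 0.5.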
14. Conclusion
In this project, we constructed a graph from the Enron email dataset and analyzed its
graph-theoretical properties. We used several social network theory and graph analysis
techniques to understand power in organizations.
15. References
http://research.cs.queensu.ca/~skill/proceedings/yener.pdf
http://www.egr.msu.edu/waves/publications_files/2012_09_mohammad.pdf
https://en.wikipedia.org/wiki/Centrality#Closeness_centrality
http://www.cs.rpi.edu/~goldberg/publications/cleaning.pdf
https://books.google.com/books?id=wZYQAgAAQBAJ&pg=PA108&lpg=PA108&dq=how+to+calculate+farness&source=bl&ots=9S_U030wAf&sig=DyjoeJ2DxTkKzC6q7fkpDW7s6RY&hl=en&sa=X&ved=0CCcQ6AEwAWoVChMIj7Ogpqe1xwIVTJseCh2YAwff#v=onepage&q=how%20to%20calculate%20farness&f=false