LOKESH SHANMUGANANDAM | NORTHEASTERN UNIVERSITY
Using Social Networking Theory to
Understand Power in Organizations
Under the guidance of Prof. KAL BUGRARA
Project Report
TABLE OF CONTENTS
1. Objective
2. Enron E-Mail Data Set
3. Parsing E-Mails from the dataset
4. Natural Language Processing (NLP)
5. Extracting E-Mail IDs from Dataset
6. Data structure to store the Email IDs
7. Serializing the HashMap Data
8. Deserializing the HashMap Data
9. Graph Analysis
10. Degree Centrality
11. Farness
12. Transitivity
13. Closeness Centrality
14. Conclusion
15. References
1. Objective
To study power in organizations using graph algorithm design and analysis. We analyze e-mail
communication between people, build a graph model from it, and apply graph algorithms to the
model to determine who is in a better bargaining position, who has more chances of influencing
others and making things happen, and who has more flexibility.
2. Enron E-Mail Data Set
In this project we use the Enron email dataset to study and understand power in organizations.
The Enron email dataset is valuable because it is one of the very few collections of
organizational emails that are publicly available. The emails from this period (November 1998
to June 2002) record the dynamics of Enron, from glory to collapse.
The Enron email dataset contains 517,431 messages organized into 150 folders. The folder’s
name is given as the employee’s last name, followed by a dash, followed by the initial letter of
the employee’s first name. For example, folder “allen-p” is named after Enron employee Phillip
K. Allen. Each employee folder contains subfolders, such as “inbox”, “sent”, “_sent_mail”,
“discussion_threads”, “all_documents”, “deleted_items”, and subfolders created by the
employee. A large number of duplicate emails exist in those folders.
An Enron email message contains the following header fields, in order (fields in parentheses
are optional): “Message-ID”, “Date”, “From”, (“To”), “Subject”, (“Cc”), “Mime-Version”,
“Content-Type”, “Content-Transfer-Encoding”, (“Bcc”), “X-From”, “X-To”, “X-cc”, “X-bcc”,
“X-Folder”, “X-Origin”, and “X-FileName”. The email content is separated from the headers by
a blank line.
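Because the first blank line separates the headers from the body, a message can be split into its fields with a short routine. The Java sketch below is illustrative only and is not the report's parser (the report uses OpenNLP); the class names ParsedEmail and EnronHeaderParser are hypothetical. It assumes “Name: value” header lines and treats a line that starts with whitespace as a continuation of the previous field, which is how long “To:” lists are folded under RFC 5322.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical holder for one parsed message (not a type from the report).
final class ParsedEmail {
    final Map<String, String> headers; // header name -> value, in file order
    final String body;                 // everything after the first blank line

    ParsedEmail(Map<String, String> headers, String body) {
        this.headers = headers;
        this.body = body;
    }
}

final class EnronHeaderParser {

    static ParsedEmail parse(String raw) {
        String[] lines = raw.split("\r?\n", -1);
        Map<String, String> headers = new LinkedHashMap<>();
        String current = null; // field being read, so folded lines can be appended
        int i = 0;
        for (; i < lines.length; i++) {
            String line = lines[i];
            if (line.trim().isEmpty()) {
                i++; // skip the blank separator; the body starts on the next line
                break;
            }
            boolean folded = line.startsWith(" ") || line.startsWith("\t");
            if (folded && current != null) {
                // Continuation line: append it to the previous field's value.
                headers.put(current, headers.get(current) + " " + line.trim());
            } else {
                int colon = line.indexOf(':');
                if (colon < 0) {
                    continue; // not a "Name: value" line; ignore it
                }
                current = line.substring(0, colon).trim();
                headers.put(current, line.substring(colon + 1).trim());
            }
        }
        String body = String.join("\n", Arrays.asList(lines).subList(i, lines.length));
        return new ParsedEmail(headers, body);
    }
}
```

For example, a “To:” header split over two lines is returned as a single comma-separated value, and an empty “Subject:” is returned as an empty string.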
3. Parsing E-Mails from the dataset
The Apache OpenNLP library is used to parse and read the contents of each email. The emails in
every employee’s folder are parsed using OpenNLP.
4. Natural Language Processing (NLP)
The Apache OpenNLP library is a machine-learning-based toolkit for processing natural-language
text. It supports the most common NLP tasks, such as tokenization, sentence segmentation,
part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution.
These tasks are usually required to build more advanced text processing services. OpenNLP also
includes maximum entropy and perceptron-based machine learning.
5. Extracting E-Mail IDs from Dataset
Once an e-mail has been parsed using OpenNLP, the next task is to extract the email IDs from it.
A regular expression is used to match and extract the email IDs. Each email contains headers
such as Message-ID, Date, From, To, and Subject, followed by the email content.
Sample Email header format
Message-ID: <18782981.1075855378110.JavaMail.evans@thyme>
Date: Mon, 14 May 2001 16:39:00 -0700 (PDT)
From: phillip.allen@enron.com
To: tim.belden@enron.com
Subject:
Here we extract the email IDs that follow the “From:” and “To:” keywords using the regular
expression below. The dot before the domain suffix is escaped so that it matches only a literal
period rather than any character.
Regular expression used to match email IDs
[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+
A java.util.regex Matcher is used to extract each email ID found in the parsed data (in a Java
string literal the backslash is doubled):
Matcher matcher =
    Pattern.compile("[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+").matcher(a);
Once the From and To email IDs are extracted, they are saved in the corresponding
fromEmail and toEmail variables.
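A minimal sketch of the extraction step, assuming that the text of a header value (such as the part after “To:”) is already available as a string. EmailAddressExtractor is a hypothetical name. The pattern is the expression above with the escaped dot, and the loop collects every match, so a “To:” field with several recipients yields several addresses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EmailAddressExtractor {

    // Same character classes as the report's expression; "\\." matches only a literal dot.
    private static final Pattern EMAIL =
            Pattern.compile("[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\\.[a-zA-Z0-9-.]+");

    /** Returns every email ID found in one header value, in order of appearance. */
    static List<String> extract(String headerValue) {
        List<String> found = new ArrayList<>();
        Matcher matcher = EMAIL.matcher(headerValue);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }
}
```

Calling extract on “tim.belden@enron.com, jsmith@austintx.com” returns both addresses, because the comma is not in the final character class and ends the first match. With the dot unescaped, a string with no dot in the domain, such as “someone@hostxcom”, would also be accepted.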
6. Data structure to store the Email IDs
Once the From and To email IDs have been extracted, they need to be stored in a data structure
for analysis. The data structure used to store the email IDs is a nested hash map:
HashMap<String, HashMap<String, Integer>> map = new HashMap<>();
• The outer HashMap key is the From email ID.
• The inner HashMap key is the To email ID.
• The inner HashMap value is the number of emails sent from the From email ID to the To
email ID.
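One way to fill this nested map, sketched under the assumption that each parsed email yields one From address and a list of To addresses. The class and method names are hypothetical. computeIfAbsent creates the inner map the first time a sender is seen, and merge adds 1 to the count for each recipient.

```java
import java.util.HashMap;
import java.util.List;

final class EmailCountMap {

    /**
     * Records one email from `from` to every address in `toList`:
     * outer key = From email ID, inner key = To email ID, inner value = running count.
     */
    static void record(HashMap<String, HashMap<String, Integer>> map,
                       String from, List<String> toList) {
        // Create the sender's inner map the first time this sender is seen.
        HashMap<String, Integer> counts = map.computeIfAbsent(from, k -> new HashMap<>());
        for (String to : toList) {
            counts.merge(to, 1, Integer::sum); // start at 1, or add 1 to the existing count
        }
    }
}
```

Recording two emails from phillip.allen@enron.com to tim.belden@enron.com and one to jsmith@austintx.com gives that sender the inner map {tim.belden@enron.com=2, jsmith@austintx.com=1}. This matches the sample representation in this section.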
Sample HashMap Representation:
**********************HashMap Representation******************
[phillip.allen@enron.com = {tim.belden@enron.com=2 , jsmith@austintx.com=1}]
In the sample representation above:
• The outer HashMap key, phillip.allen@enron.com, is the From email ID.
• The inner HashMap contains two keys, tim.belden@enron.com and jsmith@austintx.com, which
are the To email IDs.
• Each inner HashMap value is the number of emails sent from the From email ID to that To
email ID.
Graph Representation
• All email IDs are added as nodes in a HashSet.
• The From email IDs, which are the keys of the outer HashMap, are added as vertices.
• An edge is created from each From email ID to each of its To email IDs.
• The inner HashMap value for each To email ID (the email count) is used as the weight of the
corresponding edge.
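The steps above can be sketched as a conversion from the nested map to a node set and a list of weighted, directed edges. This is a hypothetical illustration: EmailGraph and Edge are not names from the report. It assumes that both senders and recipients become nodes, consistent with adding all email IDs to the node set.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class EmailGraph {

    /** Directed edge from a sender to a recipient, weighted by the email count. */
    static final class Edge {
        final String from;
        final String to;
        final int weight;

        Edge(String from, String to, int weight) {
            this.from = from;
            this.to = to;
            this.weight = weight;
        }

        @Override
        public String toString() {
            return from + " -> " + to + " (" + weight + ")";
        }
    }

    final Set<String> nodes = new HashSet<>();
    final List<Edge> edges = new ArrayList<>();

    /** Builds the node set and weighted edge list from the From -> (To -> count) map. */
    static EmailGraph fromMap(Map<String, HashMap<String, Integer>> map) {
        EmailGraph graph = new EmailGraph();
        for (Map.Entry<String, HashMap<String, Integer>> sender : map.entrySet()) {
            graph.nodes.add(sender.getKey());
            for (Map.Entry<String, Integer> recipient : sender.getValue().entrySet()) {
                graph.nodes.add(recipient.getKey()); // recipients become nodes too
                graph.edges.add(new Edge(sender.getKey(), recipient.getKey(), recipient.getValue()));
            }
        }
        return graph;
    }
}
```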
Sample Graph Representation
phillip.allen@enron.com -> tim.belden@enron.com (weight 2)
phillip.allen@enron.com -> jsmith@austintx.com (weight 1)
7. Serializing the HashMap Data
Parsing a large amount of email data takes a long time. To improve performance, once all the
emails have been parsed and added to the HashMap, we serialize the map to a file that can be
loaded again the next time the application runs. Java object serialization writes a binary
file, even though the file here is named hashmap.txt.
The HashMap is serialized using an ObjectOutputStream wrapped around a FileOutputStream:
try (FileOutputStream fos = new FileOutputStream("hashmap.txt");
     ObjectOutputStream oos = new ObjectOutputStream(fos)) {
    oos.writeObject(graphMap); // both streams are closed automatically
}
8. Deserializing the HashMap Data
Once the HashMap has been serialized, subsequent runs of the application can simply deserialize
it and skip the time-consuming step of parsing all the emails again.
try (FileInputStream fis = new FileInputStream("hashmap.txt");
     ObjectInputStream ois = new ObjectInputStream(fis)) {
    // readObject() returns Object, so the result is cast back to the map type (unchecked cast)
    deserializedGraphMap = (HashMap<String, HashMap<String, Integer>>) ois.readObject();
}
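A self-contained sketch of the caching workflow described in Sections 7 and 8, assuming the nested map type from Section 6. GraphMapStore and loadOrBuild are hypothetical names. The Supplier stands in for the full parsing step, which runs only when no cache file exists yet.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.function.Supplier;

final class GraphMapStore {

    static void save(HashMap<String, HashMap<String, Integer>> map, File file) throws IOException {
        // HashMap, String and Integer are Serializable, so the nested map is written in one call.
        try (FileOutputStream fos = new FileOutputStream(file);
             ObjectOutputStream out = new ObjectOutputStream(fos)) {
            out.writeObject(map);
        }
    }

    @SuppressWarnings("unchecked")
    static HashMap<String, HashMap<String, Integer>> load(File file)
            throws IOException, ClassNotFoundException {
        try (FileInputStream fis = new FileInputStream(file);
             ObjectInputStream in = new ObjectInputStream(fis)) {
            return (HashMap<String, HashMap<String, Integer>>) in.readObject();
        }
    }

    /** Returns the cached map if the cache file exists; otherwise builds it once and caches it. */
    static HashMap<String, HashMap<String, Integer>> loadOrBuild(
            File cache, Supplier<HashMap<String, HashMap<String, Integer>>> parseAll)
            throws IOException, ClassNotFoundException {
        if (cache.isFile()) {
            return load(cache);
        }
        HashMap<String, HashMap<String, Integer>> map = parseAll.get(); // the slow parsing step
        save(map, cache);
        return map;
    }
}
```

With this design, deleting the cache file forces a fresh parse. That is needed whenever the parsing or extraction logic changes, because the cached map would otherwise be out of date.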
9. Graph Analysis
Email Id vs. No. of Outgoing Edges
E-Mail Outgoing Edges
sally.beck@enron 89892
vince.kaminski@enron.com 7 719
tana.jones@enron.com 705
jeff.dasovich@enron.com 642
Employee Email Id vs No. of Incoming Edges
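The outgoing and incoming edge counts can be derived directly from the nested map. This sketch assumes that an edge is a distinct sender/recipient pair. Under that assumption, a sender's outgoing edges are the size of its inner map, and a recipient's incoming edges are the number of distinct senders who emailed it. The report does not state whether repeated emails are counted separately. EdgeCounts and topK are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class EdgeCounts {

    /** Outgoing edges per sender = number of distinct recipients in its inner map. */
    static Map<String, Integer> outgoing(Map<String, HashMap<String, Integer>> map) {
        Map<String, Integer> counts = new HashMap<>();
        map.forEach((from, recipients) -> counts.put(from, recipients.size()));
        return counts;
    }

    /** Incoming edges per address = number of distinct senders that emailed it. */
    static Map<String, Integer> incoming(Map<String, HashMap<String, Integer>> map) {
        Map<String, Integer> counts = new HashMap<>();
        for (HashMap<String, Integer> recipients : map.values()) {
            for (String to : recipients.keySet()) {
                counts.merge(to, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** The k email IDs with the highest counts, highest first; ties are broken alphabetically. */
    static List<String> topK(Map<String, Integer> counts, int k) {
        List<String> ids = new ArrayList<>(counts.keySet());
        ids.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a)); // descending count
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return new ArrayList<>(ids.subList(0, Math.min(k, ids.size())));
    }
}
```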
10. Degree Centrality
Degree Centrality of a node refers to the number of edges attached to the node. In order to find
the standardized score, we need to divide each score by n-1 (n = number of nodes). In the case
of a directed network, we usually define two separate measures of degree centrality, namely
indegree and outdegree.
Indegree: Count of the number of ties directed to the node. Indegree is often interpreted
as a form of popularity.
Outdegree: Number of ties that the node directs to others. Outdegree is often interpreted
as gregariousness.
Degree Centrality Formula:
Degree Centrality = (No. of Inbound edges + No. of Outbound edges) / (n-1)
Where n is the number of nodes.
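A sketch of the degree centrality formula above applied to the nested map. It assumes that each distinct From/To pair counts as one edge and ignores the email counts. DegreeCentrality is a hypothetical name.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class DegreeCentrality {

    /** (inbound edges + outbound edges) / (n - 1) for every node, per the formula above. */
    static Map<String, Double> compute(Map<String, HashMap<String, Integer>> map) {
        Set<String> nodes = new HashSet<>(map.keySet());
        Map<String, Integer> degree = new HashMap<>();
        for (Map.Entry<String, HashMap<String, Integer>> e : map.entrySet()) {
            for (String to : e.getValue().keySet()) {
                nodes.add(to);
                degree.merge(e.getKey(), 1, Integer::sum); // one outbound edge for the sender
                degree.merge(to, 1, Integer::sum);         // one inbound edge for the recipient
            }
        }
        int n = nodes.size();
        Map<String, Double> centrality = new HashMap<>();
        for (String v : nodes) {
            double score = n > 1 ? degree.getOrDefault(v, 0) / (double) (n - 1) : 0.0;
            centrality.put(v, score);
        }
        return centrality;
    }
}
```

With this normalization, a node that both sends to and receives from every other node scores 2. Dividing by 2(n - 1) instead would keep the combined score between 0 and 1.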
Bar Chart Representation of the Degree Centrality Score
Outdegree Graph Visualization for Max Degree Centrality Score
The figure above shows the out-degree graph visualization for the employee email ID with the
maximum degree centrality score. The figure below shows the in-degree graph visualization for
the same employee email ID.
Comparing the two graphs shows that this employee has considerably more outbound links than
inbound links, so the outbound links contribute most of the degree centrality score.
With directed data, however, it can be important to distinguish centrality based on in-degree
from centrality based on out-degree. If an actor receives many ties, they are often said to
be prominent, or to have high prestige. That is, many other actors seek to direct ties to them, and
this may indicate their importance. Actors who have unusually high out-degree are actors who
are able to exchange with many others, or make many others aware of their views. Actors who
display high out-degree centrality are often said to be influential actors.
Indegree Graph Visualization for Max Degree Centrality Score
11. Farness
The farness of a node x is defined as the sum of its distances to all other nodes. To compute it,
we calculate the weighted shortest-path distance from the node to every other node.
Formula
Farness score = sum of the weighted shortest-path distances from the node to all other nodes
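Because email counts are never negative, the weighted shortest-path distances can be computed with Dijkstra's algorithm. The sketch below makes three assumptions that the report does not spell out. First, edges are followed in the sender-to-recipient direction. Second, each email count is used directly as the edge length. Third, nodes that cannot be reached are left out of the sum. Farness is a hypothetical class name.

```java
import java.util.AbstractMap;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

final class Farness {

    /** Sum of weighted shortest-path distances from source to every node it can reach (Dijkstra). */
    static long farness(Map<String, HashMap<String, Integer>> graph, String source) {
        Map<String, Long> dist = new HashMap<>();
        PriorityQueue<Map.Entry<String, Long>> queue =
                new PriorityQueue<>(Map.Entry.<String, Long>comparingByValue());
        dist.put(source, 0L);
        queue.add(new AbstractMap.SimpleEntry<>(source, 0L));

        while (!queue.isEmpty()) {
            Map.Entry<String, Long> top = queue.poll();
            String u = top.getKey();
            long d = top.getValue();
            if (d > dist.get(u)) {
                continue; // stale entry: a shorter path to u was already found
            }
            HashMap<String, Integer> out = graph.get(u);
            if (out == null) {
                continue; // u never sent any email
            }
            for (Map.Entry<String, Integer> edge : out.entrySet()) {
                long candidate = d + edge.getValue(); // email count used as edge length
                Long best = dist.get(edge.getKey());
                if (best == null || candidate < best) {
                    dist.put(edge.getKey(), candidate);
                    queue.add(new AbstractMap.SimpleEntry<>(edge.getKey(), candidate));
                }
            }
        }

        long total = 0;
        for (long distance : dist.values()) {
            total += distance; // the source itself contributes 0
        }
        return total;
    }
}
```

Using the count directly as a length makes frequent correspondents look farther apart. If the weights are meant to represent tie strength, a common alternative is to use 1/count as the length.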
Farness Chart Representation
12. Transitivity
Transitivity of a relation means that when there is a tie from i to j, and also from j to h, then there
is also a tie from i to h: friends of my friends are my friends.
Here, the transitive closure is computed to measure how far a node can reach other nodes to
which it is not directly connected.
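The reachability idea above can be sketched as a breadth-first search. It counts the nodes a person can reach through chains of emails but does not email directly. This is one possible reading of the transitivity score and not necessarily the report's exact computation. IndirectReach is a hypothetical name, and edge weights are ignored.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class IndirectReach {

    /** Counts nodes reachable from source along directed paths but not through a direct edge. */
    static int count(Map<String, HashMap<String, Integer>> graph, String source) {
        Set<String> reached = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        reached.add(source);
        queue.add(source);
        while (!queue.isEmpty()) { // breadth-first search over sender -> recipient edges
            String u = queue.poll();
            HashMap<String, Integer> out = graph.get(u);
            if (out == null) {
                continue;
            }
            for (String v : out.keySet()) {
                if (reached.add(v)) { // add() returns false if v was already visited
                    queue.add(v);
                }
            }
        }
        Set<String> direct = graph.containsKey(source)
                ? graph.get(source).keySet()
                : Collections.<String>emptySet();
        int indirect = 0;
        for (String v : reached) {
            if (!v.equals(source) && !direct.contains(v)) {
                indirect++;
            }
        }
        return indirect;
    }
}
```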
Employee Email ID            Transitivity Score
lavorato@enron.com           76
sally.beck@enron.com         43
kay.chapman@enron.com        42
louise.kitchen@enron.com     40
13. Closeness Centrality
Closeness centrality is considered a more global measure of centrality than degree, in-degree,
and out-degree, because it takes the entire network of ties into account when calculating the
centrality of an individual actor.
Closeness centrality is determined by the shortest path lengths linking the actors together. It
measures centrality in terms of distance, so actors with the shortest distances to other actors
have the highest closeness centrality.
Formula
Closeness centrality = (n - 1) / (sum of the weighted shortest-path distances from the node to all other nodes)
Where n is the number of nodes. In other words, closeness centrality is (n - 1) divided by the
farness score.
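The following worked example applies the formula to hypothetical distances, not results from the Enron graph. Suppose a graph has n = 4 nodes, and a node's weighted shortest-path distances to the other three nodes are 1, 2 and 3.

```latex
\[
C(x) = \frac{n-1}{\sum_{y \neq x} d(x,y)}
     = \frac{4-1}{1+2+3}
     = \frac{3}{6}
     = 0.5
\]
```

If all three distances were 1, the score would be 3/3 = 1.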
14. Conclusion
In this project we construct a graph from the Enron email dataset and analyze its
graph-theoretical properties. We apply several social network theory and graph analysis
techniques (degree centrality, farness, transitivity, and closeness centrality) to understand
power in organizations.
15. References
http://research.cs.queensu.ca/~skill/proceedings/yener.pdf
http://www.egr.msu.edu/waves/publications_files/2012_09_mohammad.pdf
https://en.wikipedia.org/wiki/Centrality#Closeness_centrality
http://www.cs.rpi.edu/~goldberg/publications/cleaning.pdf
https://books.google.com/books?id=wZYQAgAAQBAJ&pg=PA108&lpg=PA108&dq=how+to+calculate+farness&source=bl&ots=9S_U030wAf&sig=DyjoeJ2DxTkKzC6q7fkpDW7s6RY&hl=en&sa=X&ved=0CCcQ6AEwAWoVChMIj7Ogpqe1xwIVTJseCh2YAwff#v=onepage&q=how%20to%20calculate%20farness&f=false
P a g e | 17|

More Related Content

What's hot

IRJET- Automatic Text Summarization using Text Rank
IRJET- Automatic Text Summarization using Text RankIRJET- Automatic Text Summarization using Text Rank
IRJET- Automatic Text Summarization using Text RankIRJET Journal
 
A CLUSTERING TECHNIQUE FOR EMAIL CONTENT MINING
A CLUSTERING TECHNIQUE FOR EMAIL CONTENT MININGA CLUSTERING TECHNIQUE FOR EMAIL CONTENT MINING
A CLUSTERING TECHNIQUE FOR EMAIL CONTENT MININGijcsit
 
A scalable gibbs sampler for probabilistic entity linking
A scalable gibbs sampler for probabilistic entity linkingA scalable gibbs sampler for probabilistic entity linking
A scalable gibbs sampler for probabilistic entity linkingSunny Kr
 
Web Page Recommendation using Domain Knowledge and Web Usage Knowledge
Web Page Recommendation using Domain Knowledge and Web Usage KnowledgeWeb Page Recommendation using Domain Knowledge and Web Usage Knowledge
Web Page Recommendation using Domain Knowledge and Web Usage KnowledgeIRJET Journal
 
Exploiting web search engines to search structured
Exploiting web search engines to search structuredExploiting web search engines to search structured
Exploiting web search engines to search structuredNita Pawar
 
IRJET - Event Notifier on Scraped Mails using NLP
IRJET - Event Notifier on Scraped Mails using NLPIRJET - Event Notifier on Scraped Mails using NLP
IRJET - Event Notifier on Scraped Mails using NLPIRJET Journal
 
Ijarcet vol-2-issue-3-881-883
Ijarcet vol-2-issue-3-881-883Ijarcet vol-2-issue-3-881-883
Ijarcet vol-2-issue-3-881-883Editor IJARCET
 
Hybrid approach for generating non overlapped substring using genetic algorithm
Hybrid approach for generating non overlapped substring using genetic algorithmHybrid approach for generating non overlapped substring using genetic algorithm
Hybrid approach for generating non overlapped substring using genetic algorithmeSAT Publishing House
 
Dhiraj Gurnan Twitter prototype with calculator
Dhiraj Gurnan Twitter prototype with calculatorDhiraj Gurnan Twitter prototype with calculator
Dhiraj Gurnan Twitter prototype with calculatorDhiraj Gurnani
 
EFFICIENTLY PROCESSING OF TOP-K TYPICALITY QUERY FOR STRUCTURED DATA
EFFICIENTLY PROCESSING OF TOP-K TYPICALITY QUERY FOR STRUCTURED DATAEFFICIENTLY PROCESSING OF TOP-K TYPICALITY QUERY FOR STRUCTURED DATA
EFFICIENTLY PROCESSING OF TOP-K TYPICALITY QUERY FOR STRUCTURED DATAcsandit
 
Optimisation towards Latent Dirichlet Allocation: Its Topic Number and Collap...
Optimisation towards Latent Dirichlet Allocation: Its Topic Number and Collap...Optimisation towards Latent Dirichlet Allocation: Its Topic Number and Collap...
Optimisation towards Latent Dirichlet Allocation: Its Topic Number and Collap...IJECEIAES
 

What's hot (14)

IRJET- Automatic Text Summarization using Text Rank
IRJET- Automatic Text Summarization using Text RankIRJET- Automatic Text Summarization using Text Rank
IRJET- Automatic Text Summarization using Text Rank
 
A CLUSTERING TECHNIQUE FOR EMAIL CONTENT MINING
A CLUSTERING TECHNIQUE FOR EMAIL CONTENT MININGA CLUSTERING TECHNIQUE FOR EMAIL CONTENT MINING
A CLUSTERING TECHNIQUE FOR EMAIL CONTENT MINING
 
A scalable gibbs sampler for probabilistic entity linking
A scalable gibbs sampler for probabilistic entity linkingA scalable gibbs sampler for probabilistic entity linking
A scalable gibbs sampler for probabilistic entity linking
 
What is merkle tree
What is merkle treeWhat is merkle tree
What is merkle tree
 
Web Page Recommendation using Domain Knowledge and Web Usage Knowledge
Web Page Recommendation using Domain Knowledge and Web Usage KnowledgeWeb Page Recommendation using Domain Knowledge and Web Usage Knowledge
Web Page Recommendation using Domain Knowledge and Web Usage Knowledge
 
Exploiting web search engines to search structured
Exploiting web search engines to search structuredExploiting web search engines to search structured
Exploiting web search engines to search structured
 
Mcs 021
Mcs 021Mcs 021
Mcs 021
 
Mcs 021 solve assignment
Mcs 021 solve assignmentMcs 021 solve assignment
Mcs 021 solve assignment
 
IRJET - Event Notifier on Scraped Mails using NLP
IRJET - Event Notifier on Scraped Mails using NLPIRJET - Event Notifier on Scraped Mails using NLP
IRJET - Event Notifier on Scraped Mails using NLP
 
Ijarcet vol-2-issue-3-881-883
Ijarcet vol-2-issue-3-881-883Ijarcet vol-2-issue-3-881-883
Ijarcet vol-2-issue-3-881-883
 
Hybrid approach for generating non overlapped substring using genetic algorithm
Hybrid approach for generating non overlapped substring using genetic algorithmHybrid approach for generating non overlapped substring using genetic algorithm
Hybrid approach for generating non overlapped substring using genetic algorithm
 
Dhiraj Gurnan Twitter prototype with calculator
Dhiraj Gurnan Twitter prototype with calculatorDhiraj Gurnan Twitter prototype with calculator
Dhiraj Gurnan Twitter prototype with calculator
 
EFFICIENTLY PROCESSING OF TOP-K TYPICALITY QUERY FOR STRUCTURED DATA
EFFICIENTLY PROCESSING OF TOP-K TYPICALITY QUERY FOR STRUCTURED DATAEFFICIENTLY PROCESSING OF TOP-K TYPICALITY QUERY FOR STRUCTURED DATA
EFFICIENTLY PROCESSING OF TOP-K TYPICALITY QUERY FOR STRUCTURED DATA
 
Optimisation towards Latent Dirichlet Allocation: Its Topic Number and Collap...
Optimisation towards Latent Dirichlet Allocation: Its Topic Number and Collap...Optimisation towards Latent Dirichlet Allocation: Its Topic Number and Collap...
Optimisation towards Latent Dirichlet Allocation: Its Topic Number and Collap...
 

Similar to UsingSocialNetworkingTheoryToUnderstandPowerinOrganizations

Tip: Data Scoring: Convert data with XQuery
Tip: Data Scoring: Convert data with XQueryTip: Data Scoring: Convert data with XQuery
Tip: Data Scoring: Convert data with XQueryGeert Josten
 
Emailphishing(deep anti phishnet applying deep neural networks for phishing e...
Emailphishing(deep anti phishnet applying deep neural networks for phishing e...Emailphishing(deep anti phishnet applying deep neural networks for phishing e...
Emailphishing(deep anti phishnet applying deep neural networks for phishing e...Venkat Projects
 
miniproject.ppt.pptx
miniproject.ppt.pptxminiproject.ppt.pptx
miniproject.ppt.pptxAnush90
 
Extraction of Data Using Comparable Entity Mining
Extraction of Data Using Comparable Entity MiningExtraction of Data Using Comparable Entity Mining
Extraction of Data Using Comparable Entity Miningiosrjce
 
Sherlock a deep learning approach to semantic data type dete
Sherlock a deep learning approach to semantic data type deteSherlock a deep learning approach to semantic data type dete
Sherlock a deep learning approach to semantic data type detemayank272369
 
Excel analysis assignment this is an independent assignment me
Excel analysis assignment this is an independent assignment meExcel analysis assignment this is an independent assignment me
Excel analysis assignment this is an independent assignment mejoney4
 
New Features in Apache Pinot
New Features in Apache PinotNew Features in Apache Pinot
New Features in Apache PinotSiddharth Teotia
 
RDataMining slides-text-mining-with-r
RDataMining slides-text-mining-with-rRDataMining slides-text-mining-with-r
RDataMining slides-text-mining-with-rYanchang Zhao
 
How to Trace an E-mail Part 1
How to Trace an E-mail Part 1How to Trace an E-mail Part 1
How to Trace an E-mail Part 1Lebowitzcomics
 
India build problem
India build problemIndia build problem
India build problemICE CUBE
 
Study, analysis and formulation of a new method for integrity protection of d...
Study, analysis and formulation of a new method for integrity protection of d...Study, analysis and formulation of a new method for integrity protection of d...
Study, analysis and formulation of a new method for integrity protection of d...ijsrd.com
 
Text Based Fuzzy Clustering Algorithm to Filter Spam E-mail
Text Based Fuzzy Clustering Algorithm to Filter Spam E-mailText Based Fuzzy Clustering Algorithm to Filter Spam E-mail
Text Based Fuzzy Clustering Algorithm to Filter Spam E-mailijsrd.com
 
The Detection of Suspicious Email Based on Decision Tree ...
The Detection of Suspicious Email Based on Decision Tree                     ...The Detection of Suspicious Email Based on Decision Tree                     ...
The Detection of Suspicious Email Based on Decision Tree ...IRJET Journal
 
Modeling employees relationships with Apache Spark
Modeling employees relationships with Apache SparkModeling employees relationships with Apache Spark
Modeling employees relationships with Apache SparkWassim TRIFI
 
Write better python code with these 10 tricks | by yong cui, ph.d. | aug, 202...
Write better python code with these 10 tricks | by yong cui, ph.d. | aug, 202...Write better python code with these 10 tricks | by yong cui, ph.d. | aug, 202...
Write better python code with these 10 tricks | by yong cui, ph.d. | aug, 202...amit kuraria
 
MiningEmailSocialNetworks
MiningEmailSocialNetworksMiningEmailSocialNetworks
MiningEmailSocialNetworkswebuploader
 

Similar to UsingSocialNetworkingTheoryToUnderstandPowerinOrganizations (20)

Tip: Data Scoring: Convert data with XQuery
Tip: Data Scoring: Convert data with XQueryTip: Data Scoring: Convert data with XQuery
Tip: Data Scoring: Convert data with XQuery
 
Emailphishing(deep anti phishnet applying deep neural networks for phishing e...
Emailphishing(deep anti phishnet applying deep neural networks for phishing e...Emailphishing(deep anti phishnet applying deep neural networks for phishing e...
Emailphishing(deep anti phishnet applying deep neural networks for phishing e...
 
miniproject.ppt.pptx
miniproject.ppt.pptxminiproject.ppt.pptx
miniproject.ppt.pptx
 
Extraction of Data Using Comparable Entity Mining
Extraction of Data Using Comparable Entity MiningExtraction of Data Using Comparable Entity Mining
Extraction of Data Using Comparable Entity Mining
 
E017252831
E017252831E017252831
E017252831
 
Sherlock a deep learning approach to semantic data type dete
Sherlock a deep learning approach to semantic data type deteSherlock a deep learning approach to semantic data type dete
Sherlock a deep learning approach to semantic data type dete
 
Excel analysis assignment this is an independent assignment me
Excel analysis assignment this is an independent assignment meExcel analysis assignment this is an independent assignment me
Excel analysis assignment this is an independent assignment me
 
New Features in Apache Pinot
New Features in Apache PinotNew Features in Apache Pinot
New Features in Apache Pinot
 
3iemail
3iemail3iemail
3iemail
 
RDataMining slides-text-mining-with-r
RDataMining slides-text-mining-with-rRDataMining slides-text-mining-with-r
RDataMining slides-text-mining-with-r
 
How to Trace an E-mail Part 1
How to Trace an E-mail Part 1How to Trace an E-mail Part 1
How to Trace an E-mail Part 1
 
India build problem
India build problemIndia build problem
India build problem
 
Study, analysis and formulation of a new method for integrity protection of d...
Study, analysis and formulation of a new method for integrity protection of d...Study, analysis and formulation of a new method for integrity protection of d...
Study, analysis and formulation of a new method for integrity protection of d...
 
Text Based Fuzzy Clustering Algorithm to Filter Spam E-mail
Text Based Fuzzy Clustering Algorithm to Filter Spam E-mailText Based Fuzzy Clustering Algorithm to Filter Spam E-mail
Text Based Fuzzy Clustering Algorithm to Filter Spam E-mail
 
Does sizematter
Does sizematterDoes sizematter
Does sizematter
 
Selenium-Locators
Selenium-LocatorsSelenium-Locators
Selenium-Locators
 
The Detection of Suspicious Email Based on Decision Tree ...
The Detection of Suspicious Email Based on Decision Tree                     ...The Detection of Suspicious Email Based on Decision Tree                     ...
The Detection of Suspicious Email Based on Decision Tree ...
 
Modeling employees relationships with Apache Spark
Modeling employees relationships with Apache SparkModeling employees relationships with Apache Spark
Modeling employees relationships with Apache Spark
 
Write better python code with these 10 tricks | by yong cui, ph.d. | aug, 202...
Write better python code with these 10 tricks | by yong cui, ph.d. | aug, 202...Write better python code with these 10 tricks | by yong cui, ph.d. | aug, 202...
Write better python code with these 10 tricks | by yong cui, ph.d. | aug, 202...
 
MiningEmailSocialNetworks
MiningEmailSocialNetworksMiningEmailSocialNetworks
MiningEmailSocialNetworks
 

UsingSocialNetworkingTheoryToUnderstandPowerinOrganizations

  • 1. LOKESH SHANMUGANANDAM | NORTHEASTERN UNIVERSITY Using Social Networking Theory to Understand Power in Organizations Under the guidance of Prof. KAL BUGRARA Using Social Networking Theory to Understand Power in Organizations P a g e | 1|
  • 2. Using Social Networking Theory to Understand Power in Organizations Project Report P a g e | 2|
  • 3. Using Social Networking Theory to Understand Power in Organizations TABLE OF CONTENTS TABLE OF CONTENTS.................................................................................................................3 .........................................................................................................................................................3 1.Objective.......................................................................................................................................4 2.Enron E-Mail Data Set..................................................................................................................4 3.Parsing E-Mails from the dataset..................................................................................................4 4.Natural Language Processing (NLP)............................................................................................4 5.Extracting E-Mail Id’s from Dataset.............................................................................................5 6.Data structure to store the Email Id’s...........................................................................................5 7.Serializing the HashMap Data......................................................................................................7 8.Deserializing the HashMap Data..................................................................................................7 9.Graph Analysis..............................................................................................................................8 10.Degree Centrality......................................................................................................................10 11.Farness......................................................................................................................................13 12. 
Transitivity...............................................................................................................................14 13. Closeness Centrality.................................................................................................................15 14. Conclusion...............................................................................................................................16 15.References.................................................................................................................................17 P a g e | 3|
  • 4. Using Social Networking Theory to Understand Power in Organizations 1. Objective To study Power in organizations using graph algorithm design and analysis. To analyze e-mail communication between people to understand who is in a better bargaining position and has more chance of influencing others. To create a graph model and apply the graph algorithms to study power in organizations. Analyze the graph model and understand who is in a better bargaining position, more chances for making things happen, and more flexibilities. 2. Enron E-Mail Data Set In this project we use Enron email dataset to study and understand the power in organizations. The Enron email dataset is valuable because it is one of the very few collections of organizational emails that are publicly available. The emails of this period (1998.11 - 2002.6) record the dynamics of Enron, from glory to collapse. The Enron email dataset contains 517,431 messages organized into 150 folders. The folder’s name is given as the employee’s last name, followed by a dash, followed by the initial letter of the employee’s first name. For example, folder “allen-p” is named after Enron employee Phillip K. Allen. Each employee folder contains subfolders, such as “inbox”, “sent”, “_sent_mail”, “discussion_threads”, “all_documents”, “deleted_items”, and subfolders created by the employee. A large number of duplicate emails exist in those folders. An Enron email message contains the following header fields in order (the header field in parenthesis is optional): “Message-ID”, “Date”, “From”, (“To”), “Subject”, (“Cc”), “Mime- Version”, “Content-Type”, “ContentTransfer-Encoding”, (“Bcc”), “X-From”, “X-To”, “X-cc”, “X-bcc”, “X-Folder”, “X-Origin”, and “X-FileName”. The email content is separated with the headers by a blank line. 3. Parsing E-Mails from the dataset Apache OpenNLP library is used to parse and read the contents of the email. Emails from each employee’s folder are parsed using Open NLP. 4. 
Natural Language Processing (NLP) The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text. It supports the most common NLP tasks, such as tokenization, sentence P a g e | 4|
  • 5. Using Social Networking Theory to Understand Power in Organizations segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. These tasks are usually required to build more advanced text processing services. OpenNLP also includes maximum entropy and perceptron based machine learning. 5. Extracting E-Mail Id’s from Dataset Once the e-mail is parsed using Open NLP, the next task is to extract the Email Id’s from the mail. Here regular expression is used to match with email id’s to extract them. Each email contains email headers such as Message Id, Date, From, To, Subject and Email Content. Sample Email header format Message-ID: <18782981.1075855378110.JavaMail.evans@thyme> Date: Mon, 14 May 2001 16:39:00 -0700 (PDT) From: phillip.allen@enron.com To: tim.belden@enron.com Subject: Here we are extracting email id for the “From:” and “To:” keywords using the regular expression below. Regular Expression used to match Email id’s [a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+.[a-zA-Z0-9-.]+ Pattern Matcher is used to extract email id encountered from the parsed data. Matcher matcher = Pattern.compile("[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+.[a-zA-Z0-.]+").matcher(a); Once the From and To emails are extracted they are saved in the corresponding fromEmail and toEmail variables. 6. Data structure to store the Email Id’s Once the From and To emails have been extracted they need to be stored to data structure to preform analysis. The data structure used to store the email id’s is a hash map. HashMap<Key, HashMap<Key, Value>> map = new HashMap<>() • The Outer HashMap key contains the From email id. • The inner HashMap key contains the To email id. • The inner HashMap value is the count of the emails sent between the From and To email id’s. P a g e | 5|
Sample HashMap Representation:
**********************HashMap Representation******************
[phillip.allen@enron.com = {tim.belden@enron.com=2, jsmith@austintx.com=1}]
In the above sample representation:
• The outer HashMap key, phillip.allen@enron.com, is the From email id.
• The inner HashMap contains two keys, tim.belden@enron.com and jsmith@austintx.com, which are the To email ids.
• The inner HashMap values are the counts of emails sent from the From email id to each To email id.

Graph Representation
• All the email ids are added as nodes in a HashSet.
• From email ids, which are the keys of the outer HashMap, are added as the source vertices of the edges.
• An edge is created from the From email id to each To email id.
• The inner HashMap value for each To email id (the email count) is used as the weight of the corresponding edge.

Sample Graph Representation (figure): phillip.allen@enron.com has an edge of weight 2 to tim.belden@enron.com and an edge of weight 1 to jsmith@austintx.com.
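The following is a minimal sketch of the graph representation described above. It assumes Java 16+ for the record type and the nested From/To map from the data structure section. EmailGraph and Edge are hypothetical names. The sketch collects every email id into a HashSet of nodes and turns each inner-map entry into a directed edge weighted by its email count.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class EmailGraph {
    // A directed, weighted edge from a sender to a recipient; weight = number of emails.
    record Edge(String from, String to, int weight) {}

    final Set<String> nodes = new HashSet<>();
    final List<Edge> edges = new ArrayList<>();

    // Builds the node set and the edge list from the nested From -> (To -> count) map.
    static EmailGraph fromMap(Map<String, HashMap<String, Integer>> map) {
        EmailGraph graph = new EmailGraph();
        for (Map.Entry<String, HashMap<String, Integer>> outer : map.entrySet()) {
            graph.nodes.add(outer.getKey()); // the From email id
            for (Map.Entry<String, Integer> inner : outer.getValue().entrySet()) {
                graph.nodes.add(inner.getKey()); // the To email id
                graph.edges.add(new Edge(outer.getKey(), inner.getKey(), inner.getValue()));
            }
        }
        return graph;
    }
}
```

For the sample map above, this yields three nodes and two edges, with weights 2 and 1.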
7. Serializing the HashMap Data
Parsing a large amount of email data takes a long time. To improve performance, once all the emails have been parsed and added to the HashMap, we serialize the hash map to a file (hashmap.txt) that can be loaded again the next time the application runs. Despite the .txt extension, Java object serialization writes a binary format, not text. The hash map is serialized using an ObjectOutputStream wrapped around a FileOutputStream. Try-with-resources closes both streams even if writing fails, and the enclosing method must handle IOException:
try (FileOutputStream fos = new FileOutputStream("hashmap.txt");
     ObjectOutputStream oos = new ObjectOutputStream(fos)) {
    oos.writeObject(graphMap);
}

8. Deserializing the HashMap Data
Once the hash map data has been serialized, the next run of the application can simply deserialize it, saving the time needed to parse all the data again. Here the enclosing method must also handle ClassNotFoundException:
try (FileInputStream fis = new FileInputStream("hashmap.txt");
     ObjectInputStream ois = new ObjectInputStream(fis)) {
    deserializedGraphMap = (HashMap<String, HashMap<String, Integer>>) ois.readObject(); // unchecked cast
}
9. Graph Analysis
Email Id vs. No. of Outgoing Edges

E-Mail | Outgoing Edges
sally.beck@enron | 89892
vince.kaminski@enron.com | 7 719
tana.jones@enron.com | 705
jeff.dasovich@enron.com | 642
Employee Email Id vs. No. of Incoming Edges
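The outgoing and incoming edge counts behind the table and chart above could be computed from the nested map roughly as follows. This sketch assumes that an edge is a distinct From/To pair, so the counts ignore the email-count weights. EdgeCounts and its methods are hypothetical names. Java 16+ is assumed for Stream.toList().

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class EdgeCounts {
    // Outgoing edges: number of distinct recipients each sender has emailed.
    static Map<String, Integer> outgoing(Map<String, HashMap<String, Integer>> map) {
        Map<String, Integer> out = new HashMap<>();
        map.forEach((from, recipients) -> out.put(from, recipients.size()));
        return out;
    }

    // Incoming edges: number of distinct senders who have emailed each recipient.
    static Map<String, Integer> incoming(Map<String, HashMap<String, Integer>> map) {
        Map<String, Integer> in = new HashMap<>();
        for (HashMap<String, Integer> recipients : map.values()) {
            for (String to : recipients.keySet()) {
                in.merge(to, 1, Integer::sum);
            }
        }
        return in;
    }

    // The k email ids with the highest counts, largest first (e.g. for a top-N table).
    static List<Map.Entry<String, Integer>> top(Map<String, Integer> counts, int k) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .limit(k)
                .toList();
    }
}
```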
10. Degree Centrality
The degree centrality of a node is the number of edges attached to the node. To obtain a standardized score, each count is divided by n-1, where n is the number of nodes. In a directed network, we usually define two separate measures of degree centrality, indegree and outdegree:
• Indegree: the number of ties directed to the node. Indegree is often interpreted as a form of popularity.
• Outdegree: the number of ties that the node directs to others. Outdegree is often interpreted as gregariousness.

Degree Centrality Formula:
Degree Centrality = (No. of inbound edges + No. of outbound edges) / (n-1)
where n is the number of nodes.

Bar Chart Representation of the Degree Centrality Score
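The following is a sketch of the degree centrality formula above, applied to the nested From/To map. It assumes that each distinct From/To pair counts as one edge. It also assumes that n is the number of distinct email ids, counting both senders and recipients. DegreeCentrality is a hypothetical name.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class DegreeCentrality {
    // Degree centrality = (inbound edges + outbound edges) / (n - 1).
    static Map<String, Double> compute(Map<String, HashMap<String, Integer>> map) {
        Set<String> nodes = new HashSet<>(map.keySet());
        Map<String, Integer> degree = new HashMap<>();
        map.forEach((from, recipients) -> {
            for (String to : recipients.keySet()) {
                nodes.add(to);
                degree.merge(from, 1, Integer::sum); // one outbound edge for the sender
                degree.merge(to, 1, Integer::sum);   // one inbound edge for the recipient
            }
        });
        int n = nodes.size();
        Map<String, Double> scores = new HashMap<>();
        for (String node : nodes) {
            // Guard against n <= 1, where n - 1 would be zero or negative.
            scores.put(node, n > 1 ? degree.getOrDefault(node, 0) / (double) (n - 1) : 0.0);
        }
        return scores;
    }
}
```

Because the numerator adds indegree and outdegree, a node in a directed graph can score up to 2 with this normalization.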
Outdegree Graph Visualization for Max Degree Centrality Score
The figure above shows the outdegree graph visualization for the employee email id with the maximum degree centrality score, and the image below shows the indegree graph visualization for the same email id. Comparing the two graphs, the number of outbound links is considerably higher than the number of inbound links. Outbound links therefore contribute most of this employee's degree centrality score.
With directed data, however, it can be important to distinguish centrality based on in-degree from centrality based on out-degree. If an actor receives many ties, they are often said to be prominent, or to have high prestige: many other actors seek to direct ties to them, and this may indicate their importance. Actors with unusually high out-degree are able to exchange with many others, or to make many others aware of their views. Actors who display high out-degree centrality are often said to be influential.

Indegree Graph Visualization for Max Degree Centrality Score
11. Farness
The farness of a node x is defined as the sum of its shortest-path distances to all other nodes. We calculate the weighted shortest-path distance from the node to every other node and add them up.
Formula:
Farness score = sum of the weighted shortest-path distances from the node to all other nodes

Farness Chart Representation
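Farness needs weighted shortest-path distances. The sketch below uses Dijkstra's algorithm, which is one standard way to compute them; the report does not say which algorithm it used. The sketch makes three assumptions that are not from the report. First, the email count on each edge is used directly as its length, as in the Graph Representation section. Second, distances follow edge direction. Third, nodes the source cannot reach are left out of the sum. Farness is a hypothetical class name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

final class Farness {
    // Dijkstra from source over the directed graph From -> (To -> weight).
    static Map<String, Long> shortestDistances(Map<String, HashMap<String, Integer>> graph, String source) {
        Map<String, Long> dist = new HashMap<>();
        PriorityQueue<Map.Entry<String, Long>> queue =
                new PriorityQueue<>(Map.Entry.<String, Long>comparingByValue());
        dist.put(source, 0L);
        queue.add(Map.entry(source, 0L));
        while (!queue.isEmpty()) {
            Map.Entry<String, Long> current = queue.poll();
            String node = current.getKey();
            long d = current.getValue();
            if (d > dist.get(node)) {
                continue; // stale queue entry; a shorter path was already found
            }
            for (Map.Entry<String, Integer> edge : graph.getOrDefault(node, new HashMap<>()).entrySet()) {
                long candidate = d + edge.getValue();
                Long known = dist.get(edge.getKey());
                if (known == null || candidate < known) {
                    dist.put(edge.getKey(), candidate);
                    queue.add(Map.entry(edge.getKey(), candidate));
                }
            }
        }
        return dist;
    }

    // Farness = sum of the shortest-path distances from node to every node it can reach.
    static long farness(Map<String, HashMap<String, Integer>> graph, String node) {
        long total = 0;
        for (long d : shortestDistances(graph, node).values()) {
            total += d; // the source itself contributes 0
        }
        return total;
    }
}
```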
12. Transitivity
Transitivity of a relation means that when there is a tie from i to j and also a tie from j to h, there is also a tie from i to h: friends of my friends are my friends. Here the transitive closure is computed to measure how far a node can reach out to other nodes to which it is not directly connected.
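The transitive-closure reach described above could be sketched with a breadth-first search. The sketch assumes the score is the number of nodes reachable along directed ties that are not direct recipients of the node. The report does not spell out its exact counting rule, and TransitiveReach is a hypothetical name.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class TransitiveReach {
    // Counts nodes reachable from source through chains of ties (the transitive closure)
    // that are NOT direct recipients of source, excluding source itself.
    static int indirectReach(Map<String, HashMap<String, Integer>> graph, String source) {
        Set<String> direct = graph.getOrDefault(source, new HashMap<>()).keySet();
        Set<String> visited = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        visited.add(source);
        queue.add(source);
        while (!queue.isEmpty()) {
            String node = queue.poll();
            for (String next : graph.getOrDefault(node, new HashMap<>()).keySet()) {
                if (visited.add(next)) { // add() returns false if already visited
                    queue.add(next);
                }
            }
        }
        int count = 0;
        for (String node : visited) {
            if (!node.equals(source) && !direct.contains(node)) {
                count++;
            }
        }
        return count;
    }
}
```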
Employee Email ID | Transitivity Score
lavorato@enron.com | 76
sally.beck@enron.com | 43
kay.chapman@enron.com | 42
louise.kitchen@enron.com | 40

13. Closeness Centrality
Closeness centrality is considered a more global measure of centrality than degree, indegree and outdegree, because it takes the entire network of ties into account when calculating the centrality of an individual actor. It is determined by the shortest path lengths linking the actors together: actors with the shortest distances to the other actors have the highest closeness centrality.
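The following is a small sketch of closeness centrality. It uses the Floyd-Warshall all-pairs shortest-path algorithm, swapped in here for illustration since the report does not say which algorithm it used. It then applies closeness = (n-1) / farness. Floyd-Warshall is O(n³), so this suits only small subgraphs. The sketch assumes an adjacency matrix of email-count weights, with 0 meaning no edge. It skips unreachable nodes when summing distances and returns 0 for a node that reaches no one. Closeness is a hypothetical name.

```java
import java.util.Arrays;

final class Closeness {
    // "No path" marker, small enough that INF + INF does not overflow a long.
    static final long INF = Long.MAX_VALUE / 4;

    // w[i][j] = weight of the edge from node i to node j, or 0 if there is no edge.
    // Returns closeness[i] = (n - 1) / farness[i], where farness[i] sums the
    // weighted shortest-path distances from i to the nodes it can reach.
    static double[] closeness(int[][] w) {
        int n = w.length;
        long[][] d = new long[n][n];
        for (int i = 0; i < n; i++) {
            Arrays.fill(d[i], INF);
            d[i][i] = 0;
            for (int j = 0; j < n; j++) {
                if (i != j && w[i][j] > 0) {
                    d[i][j] = w[i][j];
                }
            }
        }
        // Floyd-Warshall: allow paths that pass through intermediate node k.
        for (int k = 0; k < n; k++) {
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    if (d[i][k] + d[k][j] < d[i][j]) {
                        d[i][j] = d[i][k] + d[k][j];
                    }
                }
            }
        }
        double[] result = new double[n];
        for (int i = 0; i < n; i++) {
            long farness = 0;
            for (int j = 0; j < n; j++) {
                if (d[i][j] < INF) {
                    farness += d[i][j]; // unreachable nodes are skipped
                }
            }
            result[i] = farness > 0 ? (n - 1) / (double) farness : 0.0;
        }
        return result;
    }
}
```

Because unreachable nodes are skipped rather than penalized, a node that reaches only one nearby neighbour can score higher than a node that reaches everyone. Normalizing by the number of reachable nodes is one common alternative.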
Formula:
Closeness centrality = (n-1) / (sum of the weighted shortest-path distances from the node to all other nodes) = (n-1) / farness score
where n is the number of nodes.

14. Conclusion
In this project we construct a graph from the Enron Email Dataset and analyze its graph-theoretical properties. We use various social network theory and graph analysis techniques to understand power in organizations.