Collaborative SocialNetwork Discovery fromOnline Communications             Chris Diehl    USMA-ARI Network Science       ...
The Question    Organizations today utilize a number of    communication channels     Email, Instant Messaging, Text Messa...
Data Attributes    Structured Data (Metadata)      Sender and recipient(s), datetime      Can identify patterns of communi...
Identifying Key Actors –A Motivating ExampleFrom: Jennifer FraserSubject: john arnold bid for 20,000?true? and when do you...
Representations: Data and Network        Communication                 Network (Hyper)Graph         (Hyper)Graph        HP...
Collaborative Social Network Discovery    Communication Graph                                          Incremental Machine...
Entity Resolution:InfoVis Co-Author Network Fragment        Before             After7
D-Dupe: An Interactive Toolfor Entity Resolution          http://www.cs.umd.edu/projects/linqs/ddupe8
Entity Resolution:Name and Network References                                                                          Eve...
Context Challenges     Datetime: 2000-06-19        Datetime: 2001-02-28     09:52:00                    09:32:00     Sende...
Relationship Identification -Incremental Ego Network Exploration                                                          ...
Enron Manager-SubordinateCommunications Relationships                                                                     ...
Relationship Identification -Manager-Subordinate Relations     Preference Learning                             Approach   ...
Future Directions     Incremental, Active Learning       Relationship-Level and Message-Level Annotations       Automated ...
Upcoming SlideShare
Loading in …5
×

Collaborative Social Network Discovery from Online Communities - USMA-ARI Network Science Workshop - 2008

791 views

Published on

Published in: Technology, Business
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
791
On SlideShare
0
From Embeds
0
Number of Embeds
1
Actions
Shares
0
Downloads
6
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Collaborative Social Network Discovery from Online Communities - USMA-ARI Network Science Workshop - 2008

  1. 1. Collaborative SocialNetwork Discovery fromOnline Communications Chris Diehl USMA-ARI Network Science WorkshopCollaboration with Lise Getoorand Galileo Namata, University of Maryland – College Park
  2. 2. The Question Organizations today utilize a number of communication channels Email, Instant Messaging, Text Messaging, To: j.smith@enron.com Wikis, Blogs From: j.doe@enron.com Subject: Re: trade Given access to an organization’s My friend John says …. online communications, how does one infer relationship and role types within the organization from the data?2
  3. 3. Data Attributes Structured Data (Metadata) Sender and recipient(s), datetime Can identify patterns of communication from metadata Metadata provides no relationship context Unstructured Data (Content) Message subject and body, attachments Content may provide relationship and role information Additional context may be needed to clarify the message Goal is to exploit complimentary cues offered by the metadata and content3
  4. 4. Identifying Key Actors –A Motivating ExampleFrom: Jennifer FraserSubject: john arnold bid for 20,000?true? and when do you plan on selling them?From: John Arnoldexaggerations...word travels everywhere doesnt it?howd you hear?From: Jennifer Fraserjohnny johhny johnny-- there is no secrecy whenone is the king of ng .. your brokers have thebiggest moves in the world…4
  5. 5. Representations: Data and Network Communication Network (Hyper)Graph (Hyper)Graph HP Labs Communication Graph (Adamic and Adar, 2003) Nodes: Network References Nodes: Entities Edges: Communication Events Edges: Social Relationships5
  6. 6. Collaborative Social Network Discovery Communication Graph Incremental Machine Learning from Context Entity Resolution Relationship Identification Validated Network6
  7. 7. Entity Resolution:InfoVis Co-Author Network Fragment Before After7
  8. 8. D-Dupe: An Interactive Toolfor Entity Resolution http://www.cs.umd.edu/projects/linqs/ddupe8
  9. 9. Entity Resolution:Name and Network References Every individual Datetime: 2001-01-23 09:45:00 has two classes of Network Sender: sara.shackleton@enron.com referencesReferences Recipients: tana.jones@enron.com To define an Subject: Hedge Funds individual’s identity and draw broader connections across Name Tana: Other than your email emails, we need toReferences attached, have you had other first associate discussions with Mark or credit name and network about hedge funds? Sara references Reference: C. P. Diehl, L. Getoor, G. Namata, "Name Reference Resolution in Organizational Email Archives," SIAM Data Mining 20069
  10. 10. Context Challenges Datetime: 2000-06-19 Datetime: 2001-02-28 09:52:00 09:32:00 Sender: Sender: tana.jones@enron.com liz.taylor@enron.com Recipients: Recipients: marie.heard@enron.com john.arnold@enron.com Subject: Just a tease!!! Subject: Greg s Bill Wouldn t you like to know Johnny, What does Greg owe which of the two Susan s you for the champagne? Is it gave her notice today $896.00? Liz10
  11. 11. Relationship Identification -Incremental Ego Network Exploration Evidence Discovery From: Christian Yoder [christian.yoder@enron.com] To: Elizabeth Sager [elizabeth.sager@enron.com], Genia Fitzgerald [genia.fitzgerald@enron.com] Relationship Ranking Message Ranking Subject: Happiness Rank Relationship with Ego Rank Message Subject Happiness is looking at the new legal (Christian Yoder) org chart (which Jan just now 1 Elizabeth Sager 1 Happiness dropped on my desk). I always approach these dry documents as 2 Richard Sanders 2 System Outage Risk though they were trigrams resulting 3 Steve Hall from throwing the coins and 3 Mark Taylor Visit consulting the I-Ching. At the top of 4 Mark Haedicke the trigram which I find myself listed 5 Dave Fuller 4 Question about a deal in I see a single name: Elizabeth we did Sager, and at the bottom I see the 6 Tracy Ngo name Genia FitzGerald. ... cgy Reference: C. P. Diehl, G. Namata, L. Getoor, ”Relationship Identification for Social Network Discovery," AAAI 200711
  12. 12. Enron Manager-SubordinateCommunications Relationships barbara.gray@enron.com gerald.nemec@enron.com mark.guzman@enron.com n..gray@enron.com bill.iii@enron.com carol.clair@enron.com leslie.hansen@enron.com chris.gaskill@enron.com janet.moore@enron.com harlan.murphy@enron.com jeffrey.hodge@enron.com christian.yoder@enron.com s..shively@enron.com cheryl.nelson@enron.com elizabeth.sager@enron.com brent.hendry@enron.com david.portz@enron.com vince.kaminski@enron.com sara.shackleton@enron.com mark.haedicke@enron.com susan.bailey@enron.com kay.mann@enron.com marie.heard@enron.com mark.taylor@enron.com alice.wright@enron.com .taylor@enron.com gwyn.koepke@enron.com tana.jones@enron.com sean.crandall@enron.com sheila.tweed@enron.com e..haedicke@enron.com mary.cook@enron.com stephanie.panus@enron.com mike.swerzbin@enron.com robert.badeer@enron.com mark.greenberg@enron.com pinto.leite@enron.com tim.belden@enron.com diana.scholtes@enron.com jean.mrha@enron.com f..calger@enron.com bert.meyers@enron.com louise.kitchen@enron.com tyrell.harrison@enron.com hunter.shively@enron.com m..presto@enron.com bill.williams@enron.com mara.bronstein@enron.com k..allen@enron.com rogers.herndon@enron.com mark.whitt@enron.com mike.grigsby@enron.com barry.tycholiz@enron.com ryan.slinger@enron.com phillip.allen@enron.com lloyd.will@enron.com john.lavorato@enron.com stephanie.miller@enron.com jane.tholt@enron.com t..lucci@enron.com scott.neal@enron.com phil.polsky@enron.com matthew.lenhart@enron.com john.arnold@enron.com dave.fuller@enron.com kimberly.bates@enron.com l @12
  13. 13. Relationship Identification -Manager-Subordinate Relations Preference Learning Approach Mean Supervised learning of relationship Reciprocal ranker Rank Given initial set of labeled ego networks Content- Ranking dyadic relationships Based with Attribute 0.719 Traffic-Based Approach Selection Message frequency Content- 0.660 Number of recipients Based Exchanges between relationship participants and common recipients Traffic-Based 0.518 Content-Based Approach Random 0.211 Term frequency vector for set of Selection messages corresponding to the relationship Worst Case 0.141 Exploits text from sender to recipient13
  14. 14. Future Directions Incremental, Active Learning Relationship-Level and Message-Level Annotations Automated Model Selection Automated Feature Selection Visualization Communications Graph Exploration Network Graph Construction Interaction Paradigms Unified Workflow for Entity Resolution and Relationship Identification14

×