Building Rich Social Network Data

Published in: Technology, Business
  • Sometimes goals require a simple dataset, sometimes a complex one. When the dataset is complex, asking questions is often useful. Possible extensions / further analyses.
  • Node: represents the unit of action/autonomy; this can be individuals, firms, organisations, countries or positions. Edge (sometimes called ‘network tie’, ‘tie’, ‘link’, or ‘connection’): represents some relationship between nodes.
  • Building Rich Social Network Data

    1. 1. Building Rich Social Network Data: Schema to aid designing, collecting and evaluating social network data
        Eamonn O’Loughlin (eamonn.oloughlin@gmail.com), University College Dublin
        Diane Payne, University College Dublin
    2. 2. Why Social Networks
        Social interactions and social networks are an enduring component of our everyday lives. Social networks are (among other things):
        • Basis upon which information and behaviours diffuse through a population
        • Cornerstone for trade and cooperation
        • Key component in determining the languages we speak, goals we aspire to, and values we hold
    3. 3. Background & Motivation
        Eamonn O’Loughlin: early-stage PhD researcher in the Dynamics Lab, with an interest in predictive modeling of social behavior using social position and structure. Also interested in large network visualisations and policy design that recognises / leverages network effects.
        Motivation: Social Network Analysis techniques to uncover patterns and relationships between network structure/activity and micro network outcomes (individual actions or decisions).
        [Diagram: research stages over time: Motivation + Intuition + Problem + Hypothesis; Data Strategy Design & Collection; Hypothesis Evaluation; Conclusion]
    4. 4. Background & Motivation
        Intended audience: researchers who are / will be creating a social network dataset.
        (1) Precautionary: nobody wants to realise that they didn’t consider some easy-to-collect yet suddenly vital-for-analysis feature after their data has already been collected.
        (2) Not straightforward: social network data & data design is complex. Compared to traditional multi-dimensional data there are many different assumptions that must be made and (as we will see) quite a few trade-offs.
        (Not covered today: domain of analysis & specific domain challenges.)
        [Diagram: research stages over time, with “Data Strategy Design & Collection” marked as today’s focus]
    5. 5. What is Social Network Data?
        “Social network views social relationships in terms of network theory, consisting of nodes (representing individual actors within the network) and ties (which represent relationships between the individuals)”
    6. 6. Brief (Subjective) History of Social Network Analysis
        Researchers shown: Martin Everett, Nicholas Mullins, David Krackhardt, S.D. Berkowitz, Anatol Rapoport, Ronald Burt, Barry Wellman, Stanley Wasserman, J. A. Barnes, Katherine Faust, Nan Lin, Peter Marsden, Tom A. B. Snijders, Linton Freeman, Stephen Borgatti, Mark Granovetter, Garry Robins, David Knoke, Kathleen Carley, Harrison White, Karen Cook, Douglas R. White
    7. 7. Brief (Subjective) History of Social Network Analysis
        The researchers from slide 6, annotated with contributions: Statistical Models for Social Networks; UCINet; Diffusion of Innovation; Social Capital (& structural holes); Social Networks as a Science; Network Theory of Social Capital; Multilevel Analysis & SIENA; ERGMs; Exchange & Trust; Dynamic Network Analysis; Social Constructs / Persistent Social Formations; ‘The Strength of Weak Ties’ (economic networks)
    8. 8. Brief (Subjective) History of Social Network Analysis
        The annotated researchers from slide 7, with further contributions added: Communication, co-authorship, and colleagueship; Social Network Visualisation; Comparative Social Structures; Network Methods; Social Networks & the Internet; Social Structure & Cognition; Network Realism; Inter-organisational political networks & Terrorist Networks; Formal Organisations & Social Networks; Consensus Analysis
    9. 9. Why is this a Problem?
        • Many design decisions
        • Network data is difficult to sample
        • Privacy concerns
        • Different practitioners
        • Network data collection is expensive
        • Increase in ability to analyse data
        • Reduced cost of data storage
        • Rapid sensor tech. advancement
    10. 10. Dimensional Data -vs- Network Data
        Dimensional data: Cross-Sectional, Time Series, Panel Data
        Network data: ?? No Standard Representation ??
    11. 11. What is the Solution?
        “A schema allows us to represent in a particular way the structure and features of a particular object”
        A schema is a mechanism that allows us to define the design, content, and to some extent, the semantics of a dataset.
        [Diagram: Cross-Sectional, Time Series and Panel Data shown beside an empty schema template]
    12. 12. Approach Taken
        1. Searched for publicly available social network datasets (20-30 different datasets)
        2. Accessed datasets & related publications. Reviewed structure and collection approach
        3. Created draft schema
        4. Added 110 more datasets to analysis. Refined / iterated schema design
        5. Published dataset wiki / solicited input from social network analysis community (INSNA)
        6. Completed schema design
        Dataset Wiki: http://dl.ucd.ie; TBC
    13. 13. Schema Overview: Structure
        [Diagram: empty schema template]
    14. 14. Schema Overview: Minimal Representation
        Overview:
        • What does a node represent (individuals? employees? researchers? firms? organisations? countries? political positions?)
        • What does an edge represent (friendship? communication? interaction?)
        Examples:
        • UK MPs on Twitter. Nodes: personal Twitter accounts. Edges: mentions.
        • Co-authorship in network science. Nodes: academic journal authors. Edges: co-authorship.
        • Infectious SocioPatterns. Nodes: visitors to Science Gallery. Edges: face-to-face proximity.
        Schema fields so far: Node Represents; Edge Represents
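The minimal representation can be written down as a two-field record. The sketch below is only an illustration: the class name `MinimalSchema` and its field names are assumptions, not part of the published schema, and the values are copied from the three examples on this slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MinimalSchema:
    """Hypothetical record holding the two mandatory schema answers."""
    dataset: str
    node_represents: str
    edge_represents: str


# Values taken from the slide's examples.
examples = [
    MinimalSchema("UK MPs on Twitter", "Personal Twitter accounts", "Mentions"),
    MinimalSchema("Co-authorship in network science",
                  "Academic journal authors", "Co-authorship"),
    MinimalSchema("Infectious SocioPatterns",
                  "Visitors to Science Gallery", "Face-to-face proximity"),
]

for s in examples:
    print(f"{s.dataset}: nodes = {s.node_represents}; edges = {s.edge_represents}")
```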
    15. 15. Schema Overview: Node Types
        Overview:
        • Does the network contain more than one node type?
        • Bipartite networks are a particular class of complex networks, whose nodes are divided into two sets X and Y, and only connections between two nodes in different sets are allowed.
        Examples:
        • Terrorist Network. Node types: Terrorist, Leader, Politician, Citizen.
        • Primary School Cumulative Network. Node types: Teacher, Student. Edge type: physical interaction between student and teacher.
        Schema fields so far: Node Represents; Edge Represents; Multiple Node Types?; Is bipartite?
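The bipartite question can be checked mechanically once every node carries a type label. The following is a minimal sketch under that assumption: the function name `is_bipartite_by_type` is made up for illustration, and the toy nodes and edges are hypothetical rather than taken from the Primary School dataset.

```python
from typing import Dict, Iterable, Tuple


def is_bipartite_by_type(node_type: Dict[str, str],
                         edges: Iterable[Tuple[str, str]]) -> bool:
    """True if there are exactly two node types and every edge
    joins two nodes of different types."""
    if len(set(node_type.values())) != 2:
        return False
    return all(node_type[u] != node_type[v] for u, v in edges)


# Hypothetical toy data.
node_type = {"t1": "Teacher", "s1": "Student", "s2": "Student"}
teacher_student_edges = [("t1", "s1"), ("t1", "s2")]
with_peer_edge = teacher_student_edges + [("s1", "s2")]

print(is_bipartite_by_type(node_type, teacher_student_edges))  # True
print(is_bipartite_by_type(node_type, with_peer_edge))         # False
```

This only tests bipartiteness with respect to the recorded node types; it does not search for some other two-way split of the nodes.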
    16. 16. Schema Overview: Edge Types
        Overview:
        • Does the network contain more than one edge type?
        • Are these edges:
            • directed?
            • undirected?
            • weighted (e.g. strength / frequency)
            • signed (e.g. positive / negative)
        Examples:
        • The Policy Network of Toxic Chemicals Regulation in Germany in the 1980s. Edge types: Shared Committee Membership, Information Exchange.
        • Students data sets (van de Bunt). Edge types: unknown, best friend, friend, friendly relation, neutral, troubled relation, item non-response, actor non-response.
        Schema fields so far: Node Represents; Edge Represents; Multiple Node Types?; Is bipartite?; Multiple Edge Types? (directed / undirected / weighted / signed)
    17. 17. Schema Overview: Edge Types (continued)
        Examples:
        • Enron Email Dataset. Nodes: senior Enron employees. Edge types: Email Sent, Email Received. Weight: number of emails sent.
        • Dining-table partners in a girls’ dormitory at a New York State training school. Nodes: girls in a New York state dormitory. Edge types: preferred dining partner. Weight: order of preference.
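The directed / undirected and weighted choices change how the same raw records become edges. As a rough sketch, assuming a log of (sender, recipient) pairs in the spirit of the Enron example (the names and counts below are invented, not drawn from that dataset):

```python
from collections import Counter

# Hypothetical email log as (sender, recipient) pairs.
emails = [("alice", "bob"), ("alice", "bob"), ("bob", "alice"), ("carol", "alice")]

# Directed, weighted: one edge per ordered pair, weight = number of emails sent.
directed = Counter(emails)

# Undirected, weighted: ignore direction by keying on the unordered pair.
undirected = Counter(frozenset(pair) for pair in emails)

print(directed[("alice", "bob")])               # 2
print(directed[("bob", "alice")])               # 1
print(undirected[frozenset({"alice", "bob"})])  # 3
```

Collapsing to undirected edges loses who initiated contact, which is one of the trade-offs the schema asks the designer to record explicitly.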
    18. 18. Schema Overview: Node Attributes / Communities
        Overview:
        • Do nodes have attributes?
        • Are these attributes static (e.g. gender) or dynamic (e.g. smoking preference)?
        • Do the nodes belong to some known community?
        Examples:
        • Lawyers data (Lazega). Node attributes: seniority, formal status, office in which they work, gender, law school attended, individual performance measurements (hours worked, fees brought in), attitudes concerning management policy.
        • Irish Politicians & Organisations on Twitter. Communities: political affiliation (Fine Gael, Fianna Fáil, Labour, Sinn Féin, …).
        Schema fields so far: Node Represents; Edge Represents; Multiple Node Types?; Is bipartite?; Multiple Edge Types?; Node Attributes; Communities
    19. 19. Schema Overview: Dynamic Data
        Overview:
        • Is the network dataset dynamic?
        • If dynamic, is the type of temporal data:
            • Event driven?
            • Continuous / real-time?
            • Periodic snapshots?
        Examples:
        • Kapferer Tailor Shop. Interactions recorded at two different time points seven months apart; a strike happened in between (snapshot).
        • Southern Women Network. It contains the observed attendance at 14 social events by 18 Southern women (event driven).
        Schema fields so far: Node Represents; Edge Represents; Multiple Node Types?; Is bipartite?; Multiple Edge Types?; Node Attributes; Communities; Dynamic
    20. 20. Schema Overview: Dynamic Data (continued)
        Examples:
        • Norwegian Boards (Aug ’09). Board membership evolution from 1999 to 2009 (continuous or real-time).
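The three temporal types are related: event-driven records can be binned into periodic snapshots, though not the other way round. A minimal sketch of that conversion follows; the function name `to_snapshots`, the integer timestamps and the toy events are all assumptions made for illustration.

```python
from collections import defaultdict


def to_snapshots(events, window):
    """Group (time, source, target) events into snapshots of `window` time units.

    Returns a dict mapping snapshot index -> set of (source, target) edges
    observed at least once in that window.
    """
    snapshots = defaultdict(set)
    for t, u, v in events:
        snapshots[t // window].add((u, v))
    return dict(snapshots)


# Hypothetical event-driven data with integer timestamps.
events = [(0, "a", "b"), (3, "b", "c"), (7, "a", "b"), (12, "c", "a")]
snapshots = to_snapshots(events, window=5)

for index in sorted(snapshots):
    print(index, sorted(snapshots[index]))
```

The window length is a design decision in its own right: a wider window hides the ordering of events inside it, which is why the schema asks which temporal form the data was actually collected in.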
    21. 21. Schema Overview: Parallel Data
        Overview:
        • Does the network come with parallel data?
        • Is this parallel data time-series?
        • What is the relationship of this parallel data to the network data?
        Examples:
        • Wiki-Vote. Nodes: Wikipedia editors. Edges: voting behaviour. Parallel data: vote outcome.
        • MathSciNet: Co-authorship network. Nodes: journal article authors. Edges: co-authorship. Parallel data: detailed information about MathSciNet papers (numerical IDs of papers, authors, and categories).
        Schema fields so far: Node Represents; Edge Represents; Multiple Node Types?; Is bipartite?; Multiple Edge Types?; Node Attributes; Communities; Dynamic; Parallel
    22. 22. Schema Overview: Parallel Data (continued)
        Examples:
        • Extended Epinions dataset. Nodes: consumers on trust site Epinions.com. Edges: trust / distrust. Parallel data: details of all product reviews hosted on the Epinions website.
    23. 23. Schema Overview: Metadata
        Overview:
        • What are the network boundary conditions?
        • Does the network have missing data?
            • Does this missing data have a pattern?
        • Was the data sampled / sub-selected from a larger dataset?
            • What was the process for sampling?
        Examples:
        • Newcomb Fraternity. 15 weekly sociometric preference rankings from 17 men attending the University of Michigan in the fall of 1956; data from week 9 are missing.
        • Enron Email Dataset (boundary conditions).
        Schema fields so far: Node Represents; Edge Represents; Multiple Node Types?; Is bipartite?; Multiple Edge Types?; Node Attributes; Communities; Dynamic; Parallel; Collection Metadata
    24. 24. Schema Overview: Metadata (continued)
        Examples:
        • Yahoo! Messenger User Communication Pattern. Dataset contains a small sample of the Yahoo! Messenger community’s communication (IM) log at a high level for a period of 4 weeks. Specifically, this dataset only records the first communication event from one user to another on a particular day, and generates such records for a period of 28 days.
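A collection rule like the Yahoo! Messenger one is worth recording in the metadata because it changes what the edges mean. The sketch below imitates that rule on an invented log; the function name `first_contact_per_day`, the tuple layout and the user IDs are assumptions, and nothing here reflects the real dataset's file format.

```python
def first_contact_per_day(log):
    """Keep only the first event from one user to another on each day.

    `log` holds (day, second_of_day, sender, receiver) tuples in any order.
    Returns (day, sender, receiver) tuples in chronological order.
    """
    seen = set()
    kept = []
    for day, _second, sender, receiver in sorted(log):
        key = (day, sender, receiver)
        if key not in seen:
            seen.add(key)
            kept.append(key)
    return kept


# Hypothetical raw message log.
log = [
    (1, 50, "u1", "u2"),
    (1, 10, "u1", "u2"),
    (1, 20, "u2", "u1"),
    (2, 5, "u1", "u2"),
]
print(first_contact_per_day(log))
# [(1, 'u1', 'u2'), (1, 'u2', 'u1'), (2, 'u1', 'u2')]
```

After this filter, edge weights can no longer be read as message volume, only as the number of days on which contact occurred.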
    25. 25. Social Network Data Schema (1 page overview)
        A schema is a way to define the structure, content, and to some extent, the semantics of a dataset.
        • Node Represents: What does a node represent (individuals? employees? researchers? firms? organisations? countries? political positions?)
        • Edge Represents: What does an edge represent (friendship? communication? interaction?)
        • Multiple Node Types? / Is bipartite?: Does the network contain more than one node type? Is the network bipartite, where ties can only exist between nodes of two different groups?
        • Multiple Edge Types?: Does the network contain more than one edge type? Are these edges directed or undirected? Weighted (e.g. strength / frequency) or signed (e.g. positive / negative)?
        • Node Attributes / Communities: Do nodes have attributes? Are these attributes static or dynamic? Do the nodes belong to some known community?
        • Dynamic / Parallel: Is the network dataset dynamic? If dynamic, is the temporal data event driven, continuous / real-time, or periodic snapshots?
        • Collection Metadata / Sampling: Boundary conditions? Missing data? Sampled from a larger dataset?
        Eamonn O’Loughlin, Dynamics Lab, UCD (eamonn.oloughlin@ucdconnect.ie)
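The one-page overview maps naturally onto a single record with one field per schema question. The sketch below is one possible encoding, not the authors' format: the class `NetworkDataSchema`, the enum `Temporal`, every field name and the helper `unanswered` are assumptions. Fields left as `None` or an empty list mean "not stated". The Enron values come from the earlier slide, except the edge description, which is paraphrased.

```python
from dataclasses import dataclass, field, fields
from enum import Enum
from typing import List, Optional


class Temporal(Enum):
    EVENT_DRIVEN = "event driven"
    CONTINUOUS = "continuous / real-time"
    SNAPSHOTS = "periodic snapshots"


@dataclass
class NetworkDataSchema:
    """Hypothetical encoding of the one-page schema; None / [] = not stated."""
    node_represents: str
    edge_represents: str
    node_types: List[str] = field(default_factory=list)
    is_bipartite: Optional[bool] = None
    edge_types: List[str] = field(default_factory=list)
    directed: Optional[bool] = None
    weighted: Optional[bool] = None
    signed: Optional[bool] = None
    node_attributes: List[str] = field(default_factory=list)
    communities: Optional[str] = None
    temporal: Optional[Temporal] = None
    parallel_data: Optional[str] = None
    boundary_conditions: Optional[str] = None
    missing_data: Optional[str] = None
    sampling: Optional[str] = None


def unanswered(schema: NetworkDataSchema) -> List[str]:
    """List the schema questions that still have no recorded answer."""
    names = []
    for f in fields(schema):
        value = getattr(schema, f.name)
        if value is None or value == []:
            names.append(f.name)
    return names


# Filled in only as far as the Enron slide goes; the rest stays unanswered.
enron = NetworkDataSchema(
    node_represents="Senior Enron employees",
    edge_represents="Email communication",  # paraphrase, not slide wording
    edge_types=["Email Sent", "Email Received"],
    weighted=True,  # weight = number of emails sent
)
print(unanswered(enron))
```

Listing the unanswered fields gives a quick checklist when designing a collection or assessing an existing dataset.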
    26. 26. Proposed Use of Schema: Direct Observation / Survey
    27. 27. Proposed Use of Schema: Retrieving Data (subset) from an existing system
    28. 28. Proposed Use of Schema: Identifying / Assessing publicly available data
    29. 29. Thank You. Questions? Reach me at eamonn.oloughlin@gmail.com
