The document discusses data management techniques for social network analysis. It covers how to format network data for import into analysis software, how to transform data to make it suitable for different analyses, and how to export data and results. Specific transformation techniques discussed include transposing matrices, imputing missing values, symmetrizing and dichotomizing networks, combining multiple relations, combining nodes, and extracting subgraphs. Proper data management is presented as an important first step for network analysis.
Data Management.pptx
1. Prepared and Presented by,
Dr. Nisha Soms
Department of CSE
KPR Institute of Engineering and Technology
Coimbatore
Data Management in Social
Network Analysis
03-10-2022
1 U19CSP38 SOCIAL NETWORK ANALYSIS
3. Data Management
How to format network data for import into a
network analysis software package,
How to transform network data to make it suitable
for different analyses, and
How to export network data and results for use in
other programs, such as statistical packages.
4. Data Import
Importing the data is one of the most important
steps in any network analysis.
For large datasets, a proper database such as
Microsoft Access or MySQL is useful.
For most users, Microsoft Excel is recommended
as a sort of universal translator.
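As an illustration of the import step, a common interchange format is an edge list (one tie per row) saved from a spreadsheet as CSV. The sketch below assumes a CSV with hypothetical column names source and target and turns it into a directed adjacency matrix; real packages have their own import formats.

```python
import csv
import io

# Hypothetical edge list, as might be saved from a spreadsheet as CSV.
CSV_TEXT = """source,target
Ann,Bob
Bob,Cat
Cat,Ann
"""

def read_edge_list(text):
    """Parse CSV text into a list of (source, target) pairs."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["source"].strip(), row["target"].strip()) for row in reader]

def to_adjacency(edges):
    """Build a directed 0/1 adjacency matrix with nodes in sorted order."""
    nodes = sorted({name for edge in edges for name in edge})
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for src, dst in edges:
        matrix[index[src]][index[dst]] = 1
    return nodes, matrix

nodes, adjacency = to_adjacency(read_edge_list(CSV_TEXT))
```

Here adjacency[i][j] is 1 when nodes[i] sends a tie to nodes[j].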
5. Cleaning network data
Once the data is imported, it is advisable to
examine it in some detail.
Look for repeated nodes
Look for differences in how the node’s name
was typed
Look for missing actors.
Look for isolates.
Run a quick centrality analysis early!
Etc.
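Two of the checks above can be sketched in a few lines. This is a minimal example, assuming that name variants differ only in letter case or spacing; the function names and sample data are made up for illustration.

```python
from collections import defaultdict

def normalize(name):
    """Collapse case and extra whitespace so typing variants compare equal."""
    return " ".join(name.lower().split())

def find_name_variants(names):
    """Return groups of raw names that normalize to the same key."""
    groups = defaultdict(set)
    for name in names:
        groups[normalize(name)].add(name)
    return {key: sorted(found) for key, found in groups.items() if len(found) > 1}

def find_isolates(nodes, edges):
    """Return nodes that take part in no tie at all."""
    connected = {name for edge in edges for name in edge}
    return [name for name in nodes if name not in connected]

raw_names = ["Ann Lee", "ann lee", "Bob  Ray", "Bob Ray", "Cat"]
variants = find_name_variants(raw_names)
isolates = find_isolates(["Ann", "Bob", "Cat", "Dan"],
                         [("Ann", "Bob"), ("Bob", "Cat")])
```

Whether an isolate is a genuine finding or a sign of a missing or mistyped actor still has to be judged by hand.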
6. Methodology
Step 1: Preparation. Identify the problem and what questions
should be answered; is data available to answer this question?
Step 2: Data retrieval. Retrieve data (from sources).
Step 3: Data cleaning. Clean data by unifying the format and
handling missing data/duplication, and fix errors if possible.
Step 4: Data selection. Use statistical tools to select the
significant data, create fields (attributes), keep the important
ones, and drop the others.
Step 5: Network representation. Build graph(s) from the
preprocessed data.
Step 6: Graph analysis. Process the graph(s). Compute the
(strong) components, clusters, and communities. Create new
attributes based on these, and add to the ones gained in Step
4.
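Steps 5 and 6 can be sketched as follows. This is a simplified example that treats ties as undirected and computes ordinary connected components with a breadth-first search; strong components of a directed graph would need a direction-aware algorithm, which is not shown here. All names are illustrative.

```python
from collections import defaultdict, deque

def build_graph(edges):
    """Step 5: undirected adjacency sets built from an edge list."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def component_labels(graph):
    """Step 6: label each node with the index of its connected component."""
    labels = {}
    current = 0
    for start in sorted(graph):
        if start in labels:
            continue
        labels[start] = current
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if neighbor not in labels:
                    labels[neighbor] = current
                    queue.append(neighbor)
        current += 1
    return labels

edges = [("a", "b"), ("b", "c"), ("d", "e")]
labels = component_labels(build_graph(edges))
```

The resulting labels can be stored as a new node attribute next to the ones kept in Step 4.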
7. Data Transformation
Common data transformations include transposing
matrices, symmetrizing, dichotomizing, imputing
missing values, combining relations, combining
nodes, extracting subgraphs, and many more.
8. 1. Transposing
To transpose a matrix is to interchange its rows with
its columns
This can be helpful in maintaining a consistent
interpretation of the ties in a network.
Example: A matrix and its transpose: (a) who likes
whom; (b) who is liked by whom.
Stacked datasets can be seen as three-dimensional
matrices consisting of rows, columns and layers or
slices.
In these matrices, three different transpositions can be
done: interchanging rows with columns, rows with
layers, and columns with layers.
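A small sketch of both kinds of transposition, using plain nested lists. The matrix values are made up; likes[i][j] = 1 is assumed to mean that person i likes person j.

```python
def transpose(matrix):
    """Interchange rows with columns."""
    return [list(column) for column in zip(*matrix)]

def swap_rows_and_layers(stack):
    """For a stacked dataset, turn stack[layer][row][col] into result[row][layer][col]."""
    return [list(rows) for rows in zip(*stack)]

# (a) who likes whom
likes = [
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
]
# (b) who is liked by whom
liked_by = transpose(likes)

# Two layers stacked into a three-dimensional dataset.
stack = [likes, liked_by]
by_row = swap_rows_and_layers(stack)
```

After the swap, by_row[0] collects the first row of every layer, which is one of the three transpositions available for stacked data.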
9. 2. Imputing missing data
Missing data can be a problem in full network
research designs.
The most common kind of missing data is where
a respondent has chosen not to fill out the survey.
This creates a row of missing values in the
network adjacency matrix.
Solution?
10. 2. Imputing missing data (contd)
When confronted with missing data, researchers
often want to handle the missing observations by
substituting plausible values for the missing
scores. This practice of filling in missing items is
called imputation.
Imputation makes it possible to use information
contained in the observed data to predict the
missing scores, and allows analysis using
standard techniques and software on a
complete(d) dataset that is the same for all
following analyses.
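One simple way to fill a missing row is to copy what the other actors reported about the non-respondent. The sketch below is a single, ad hoc imputation that assumes ties tend to be reciprocated; it is an illustration only, and missing cells are marked with None by assumption.

```python
def impute_from_incoming(matrix):
    """Fill each missing out-tie i->j with the observed in-tie j->i, when available."""
    n = len(matrix)
    completed = [row[:] for row in matrix]
    for i in range(n):
        for j in range(n):
            if matrix[i][j] is None and matrix[j][i] is not None:
                completed[i][j] = matrix[j][i]
    return completed

observed = [
    [0, 1, 1],
    [None, 0, None],  # Respondent 1 did not fill out the survey; self-tie fixed at 0.
    [0, 1, 0],
]
completed = impute_from_incoming(observed)
```

The observed matrix is left unchanged, so the imputed and original versions can be compared.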
11. 2. Imputing missing data (contd)
The shortcomings of imputation are related to
bias and uncertainty. Ad hoc imputations can
seriously distort data distributions and
relationships, and produce biased estimates.
Solution: Multiple imputation
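The idea of multiple imputation is to create several completed datasets, analyse each one, and pool the results, so that the spread between imputations reflects the uncertainty about the missing values. The sketch below is deliberately naive: it assumes each missing tie is present with probability equal to the observed density, and it pools only one statistic (network density). A real application would use a proper imputation model.

```python
import random

def observed_density(matrix):
    """Share of observed off-diagonal cells that are ties."""
    n = len(matrix)
    values = [matrix[i][j] for i in range(n) for j in range(n)
              if i != j and matrix[i][j] is not None]
    return sum(values) / len(values)

def impute_once(matrix, p, rng):
    """One completed copy: each missing cell becomes a tie with probability p."""
    return [[(1 if rng.random() < p else 0) if cell is None else cell
             for cell in row] for row in matrix]

def density(matrix):
    """Share of all off-diagonal cells that are ties."""
    n = len(matrix)
    ties = sum(matrix[i][j] for i in range(n) for j in range(n) if i != j)
    return ties / (n * (n - 1))

def pooled_density(matrix, m=20, seed=0):
    """Analyse m imputed datasets; return the mean estimate and between-imputation variance."""
    rng = random.Random(seed)
    p = observed_density(matrix)
    estimates = [density(impute_once(matrix, p, rng)) for _ in range(m)]
    mean = sum(estimates) / m
    between_var = sum((e - mean) ** 2 for e in estimates) / (m - 1)
    return mean, between_var

observed = [
    [0, 1, 1],
    [None, 0, None],  # missing row of a non-respondent (None is an assumed marker)
    [0, 1, 0],
]
mean_density, between_variance = pooled_density(observed)
```

The between-imputation variance is what a single imputation cannot provide: it shows how much the result depends on the filled-in values.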
12. 3. Symmetrizing
Symmetrizing refers to creating a new dataset in
which all ties are reciprocated.
The reason is that some analytical techniques, such
as multidimensional scaling, assume symmetric
data.
OR (union) rule: a tie exists between two nodes if
either one reports a tie to the other.
AND (intersection) rule: a tie exists only if both
nodes report a tie to each other.
The union rule creates networks denser than the
original, while the intersection rule makes them
sparser.
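Both rules can be sketched with one function. For binary data the union rule is the cell-wise maximum of the matrix and its transpose, and the intersection rule is the cell-wise minimum; the example matrix is made up.

```python
def symmetrize(matrix, rule="union"):
    """Symmetrize a square matrix with the union (OR) or intersection (AND) rule."""
    if rule not in ("union", "intersection"):
        raise ValueError("rule must be 'union' or 'intersection'")
    combine = max if rule == "union" else min
    n = len(matrix)
    return [[combine(matrix[i][j], matrix[j][i]) for j in range(n)]
            for i in range(n)]

def tie_count(matrix):
    """Number of non-zero cells."""
    return sum(1 for row in matrix for value in row if value)

directed = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
]
union = symmetrize(directed, "union")
intersection = symmetrize(directed, "intersection")
```

In this example the original has 3 ties, the union version 4, and the intersection version 2, which matches the density remark above.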
13. 4. Dichotomizing
Dichotomizing refers to converting valued data to
binary data.
The reason is that some methods, especially
graph-theoretic methods, are only applicable to
binary data.
Dichotomizing at a high cut-off also helps to reduce
the density of the network, which is useful in
handling large networks.
Dichotomizing at several different cut-off levels
retains the richness of the data and can reveal
insights into the network structure that would not
be easy to deduce from techniques designed to deal
with valued data directly.
It also gives you an idea of the extent to which
your findings are robust across different
definitions of ties.
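A minimal sketch of dichotomizing at several cut-off levels. It assumes that a tie is kept when its value is greater than or equal to the cut-off (some tools use strictly greater), and the valued matrix is made up.

```python
def dichotomize(matrix, cutoff):
    """Set a cell to 1 when its value is at least the cut-off, else 0."""
    return [[1 if value >= cutoff else 0 for value in row] for row in matrix]

def density(matrix):
    """Share of off-diagonal cells that are ties."""
    n = len(matrix)
    ties = sum(matrix[i][j] for i in range(n) for j in range(n) if i != j)
    return ties / (n * (n - 1))

valued = [
    [0, 3, 1],
    [2, 0, 5],
    [0, 4, 0],
]
# Density of the binary network at each cut-off level.
density_by_cutoff = {c: density(dichotomize(valued, c)) for c in (1, 3, 5)}
```

Repeating an analysis at each cut-off shows whether a finding depends on one particular definition of a tie.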
14. 5. Combining relations
Most network studies collect multiple relations on
the same set of nodes.
For some analyses, they are combined into one.
For example, we might take several relations
involving friendship, support, liking and so on and
combine them to create a category of relations that
we might call ‘expressive ties’.
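Two common ways to combine relations measured on the same nodes are to add the matrices (giving a valued relation) or to keep a tie wherever any of the relations has one. The sketch below shows both; the option names sum and any, and the data, are chosen for illustration.

```python
def combine_relations(relations, how="sum"):
    """Combine same-sized matrices cell by cell, by summing or by 'any tie present'."""
    n = len(relations[0])
    total = [[sum(rel[i][j] for rel in relations) for j in range(n)]
             for i in range(n)]
    if how == "sum":
        return total
    if how == "any":
        return [[1 if value > 0 else 0 for value in row] for row in total]
    raise ValueError("how must be 'sum' or 'any'")

friendship = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
support = [[0, 1, 1], [0, 0, 0], [0, 0, 0]]
liking = [[0, 0, 0], [1, 0, 1], [0, 0, 0]]

expressive_valued = combine_relations([friendship, support, liking], "sum")
expressive_binary = combine_relations([friendship, support, liking], "any")
```

The summed version records how many expressive relations link each pair, while the binary version only records that at least one does.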
15. 6. Combining nodes
We might want to aggregate the nodes into groups
such as departments, so that a tie between any two
nodes becomes a tie between their departments.
The inter-departmental ties could be defined as a
simple count of the individual-level ties, or we
could normalize the count to account for the
number of people in each department.
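A sketch of both definitions. The count is the number of individual-level ties between members of two departments. The normalization shown divides by the number of possible ties between the two departments, which is one reasonable choice among several and is an assumption here; department names and data are made up.

```python
def aggregate_by_group(matrix, groups):
    """Count individual-level ties between each ordered pair of groups."""
    labels = sorted(set(groups))
    index = {g: k for k, g in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if i != j:  # ignore self-ties
                counts[index[groups[i]]][index[groups[j]]] += value
    return labels, counts

def normalize_counts(labels, counts, groups):
    """Divide each count by the number of possible ties between the two groups."""
    size = {g: groups.count(g) for g in labels}
    result = []
    for a, ga in enumerate(labels):
        row = []
        for b, gb in enumerate(labels):
            possible = size[ga] * (size[gb] - 1) if ga == gb else size[ga] * size[gb]
            row.append(counts[a][b] / possible if possible else 0.0)
        result.append(row)
    return result

people_ties = [
    [0, 1, 1, 0],
    [0, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
]
departments = ["HR", "HR", "IT", "IT"]  # department of each person, in matrix order

labels, counts = aggregate_by_group(people_ties, departments)
densities = normalize_counts(labels, counts, departments)
```

The diagonal of the department-level matrix holds ties within a department, which may or may not be of interest in a given analysis.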
16. 7. Subgraphs
Finally, it may happen that we do not want to analyze
the whole network.
We may wish to delete a node or nodes from the
network. This may be because they are outliers in some
respect, or because we need to match the data to
another dataset where some but not all of the same
nodes are present.
Or we may wish to combine nodes to form one node that
is connected to the same nodes as the individuals were.
One reason for combining nodes may be that the data
was collected at too fine a level and we need to take a
coarser-grained analysis.
Combining nodes in the same departments would be an
example of moving up from the individual level to the
department level.
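Deleting nodes to match another dataset, as mentioned at the start of this slide, amounts to keeping only the matching rows and columns of the adjacency matrix. A minimal sketch with made-up names and data:

```python
def extract_subgraph(nodes, matrix, keep):
    """Keep only the rows and columns of the nodes listed in keep."""
    keep_set = set(keep)
    kept = [i for i, name in enumerate(nodes) if name in keep_set]
    sub_nodes = [nodes[i] for i in kept]
    sub_matrix = [[matrix[i][j] for j in kept] for i in kept]
    return sub_nodes, sub_matrix

nodes = ["Ann", "Bob", "Cat", "Dan"]
matrix = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 0, 0],
    [1, 0, 1, 0],
]
# Nodes present in a second, hypothetical dataset; only the overlap is kept.
other_dataset_nodes = {"Ann", "Cat", "Dan", "Eve"}

sub_nodes, sub_matrix = extract_subgraph(nodes, matrix, other_dataset_nodes)
```

Names that appear only in the other dataset (Eve here) are simply ignored, and the original node order is preserved.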
17. References
1. “Analyzing Social Networks” by Stephen
P Borgatti, Martin G Everett, Jeffrey C
Johnson, SAGE Publications Ltd.
2. “Introduction to Social Network
Methods” by Robert A Hanneman