Data Management.pptx

Prepared and Presented by,
Dr. Nisha Soms
Department of CSE
KPR Institute of Engineering and Technology
Coimbatore
Data Management in Social
Network Analysis
03-10-2022
U19CSP38 SOCIAL NETWORK ANALYSIS

Outline
1. Data Management
2. Data Transformation Techniques

Data Management
 How to format network data for import into a
network analysis software package,
 How to transform network data to make it suitable
for different analyses, and
 How to export network data and results for use in
other programs, such as statistical packages.

Data Import
 Importing the data is one of the most important steps in any
network analysis.
 For large datasets, a proper database such as
Microsoft Access or MySQL is useful.
 For most users, Microsoft Excel is
recommended as a sort of universal translator.
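As a minimal sketch of the import step, the following assumes the network was saved from a spreadsheet as a CSV edge list. The column names source and target and the sample rows are hypothetical, not taken from the course material. It reads the rows and builds a directed adjacency matrix.

```python
import csv
import io


def read_edge_list(text):
    """Parse CSV text with 'source' and 'target' columns into a list of edges."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["source"].strip(), row["target"].strip()) for row in reader]


def to_adjacency(edges):
    """Return (sorted node labels, directed 0/1 adjacency matrix)."""
    nodes = sorted({name for edge in edges for name in edge})
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for src, dst in edges:
        matrix[index[src]][index[dst]] = 1
    return nodes, matrix


# Hypothetical spreadsheet export (illustrative data only).
SAMPLE_CSV = "source,target\nAnn,Bob\nBob,Cara\nCara,Ann\nAnn,Cara\n"

edges = read_edge_list(SAMPLE_CSV)
nodes, adjacency = to_adjacency(edges)
```

An edge list is used here because it is easy to type and check in a spreadsheet; a full matrix layout would work equally well for small networks.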

Cleaning network data
 Once the data is imported, it is advisable to
examine it in some detail, for example:
   Look for repeated nodes.
   Look for differences in how a node’s name
    was typed.
   Look for missing actors.
   Look for isolates.
   Run a quick centrality analysis early!
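The checks above can be sketched as follows. This is an illustrative outline only: the normalisation rule (collapse whitespace, title-case), the roster of expected actors, and the sample edges are all assumptions made for the example.

```python
from collections import Counter


def normalize(name):
    """Collapse whitespace and case so variant spellings map to one label."""
    return " ".join(name.split()).title()


def clean_edges(raw_edges):
    """Normalize names and drop repeated edges, keeping first-seen order."""
    seen = set()
    cleaned = []
    for src, dst in raw_edges:
        edge = (normalize(src), normalize(dst))
        if edge not in seen:
            seen.add(edge)
            cleaned.append(edge)
    return cleaned


def find_isolates(roster, edges):
    """Return roster members that appear in no edge."""
    connected = {name for edge in edges for name in edge}
    return sorted(normalize(n) for n in roster if normalize(n) not in connected)


def degree(edges):
    """Quick degree count (ties sent plus ties received) per node."""
    return Counter(name for edge in edges for name in edge)


# Hypothetical raw data with inconsistent typing and a repeated tie.
raw_edges = [("ann ", "Bob"), ("Ann", "bob"), ("Bob", "  cara")]
roster = ["Ann", "Bob", "Cara", "Dev"]

cleaned = clean_edges(raw_edges)
isolates = find_isolates(roster, cleaned)
degrees = degree(cleaned)
```

Comparing the edge list against a roster is what makes isolates and missing actors visible, since neither shows up in an edge list on its own.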

Methodology
 Step 1: Preparation. Identify the problem and what questions
should be answered; is data available to answer this question?
 Step 2: Data retrieval. Retrieve data (from sources).
 Step 3: Data cleaning. Clean data by unifying the format and
handling missing data/duplication, and fix errors if possible.
 Step 4: Data selection. Use statistical tools to select the
significant data, create fields (attributes), keep the important
ones, and drop the others.
 Step 5: Network representation. Build graph(s) from the
preprocessed data.
 Step 6: Graph analysis. Process the graph(s). Compute the
(strong) components, clusters, and communities. Create new
attributes based on these, and add them to the ones gained in
Step 4.
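Steps 5 and 6 can be illustrated with a small sketch. It assumes the cleaned data is an edge list, builds an undirected graph from it, and labels each node with the connected component it belongs to so that the label can be used as a new attribute. Direction is ignored here, so these are not strong components, which would need a direction-aware algorithm. The sample edges are hypothetical.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Step 5: build an undirected adjacency structure from an edge list."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def component_labels(graph):
    """Step 6: assign each node the id of its connected component (BFS)."""
    labels = {}
    next_id = 0
    for start in sorted(graph):
        if start in labels:
            continue
        labels[start] = next_id
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in labels:
                    labels[neighbour] = next_id
                    queue.append(neighbour)
        next_id += 1
    return labels


# Hypothetical preprocessed edge list.
sample_edges = [("A", "B"), ("B", "C"), ("D", "E")]
graph = build_graph(sample_edges)
component_of = component_labels(graph)
```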

Data Transformation
 Common transformations of network data include
transposing matrices, symmetrizing, dichotomizing,
imputing missing values, combining relations,
combining nodes, extracting subgraphs, and many more.

1. Transposing
 To transpose a matrix is to interchange its rows with
its columns, so that cell (i, j) of the original becomes
cell (j, i) of the transpose.
 This can be helpful in maintaining a consistent
interpretation of the ties in a network.
 Example: A matrix and its transpose: (a) who likes
whom; (b) who is liked by whom.
 Stacked datasets can be seen as three-dimensional
matrices consisting of rows, columns and layers or
slices.
 In these matrices, three different transpositions can be
done: interchanging rows with columns, rows with
layers, and columns with layers.
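A short sketch of these transpositions, using plain nested lists. The "who likes whom" matrix and the two-layer stack are made-up examples, and the stack is assumed to be indexed as stack[layer][row][column].

```python
def transpose(matrix):
    """Interchange rows with columns of a 2D matrix."""
    return [list(column) for column in zip(*matrix)]


def rows_with_columns(stack):
    """Transpose every layer: result[l][r][c] == stack[l][c][r]."""
    return [transpose(layer) for layer in stack]


def rows_with_layers(stack):
    """Swap the layer and row axes: result[a][b][c] == stack[b][a][c]."""
    n_layers, n_rows = len(stack), len(stack[0])
    return [[list(stack[b][a]) for b in range(n_layers)] for a in range(n_rows)]


def columns_with_layers(stack):
    """Swap the layer and column axes: result[a][b][c] == stack[c][b][a]."""
    n_layers, n_rows, n_cols = len(stack), len(stack[0]), len(stack[0][0])
    return [
        [[stack[c][b][a] for c in range(n_layers)] for b in range(n_rows)]
        for a in range(n_cols)
    ]


# (a) who likes whom: likes[i][j] == 1 means i likes j (hypothetical data).
likes = [[0, 1, 0], [0, 0, 1], [1, 1, 0]]
# (b) who is liked by whom.
liked_by = transpose(likes)

# A stacked dataset with two layers of 2 x 2 matrices.
stack = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
```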

2. Imputing missing data
 Missing data can be a problem in full network
research designs.
 The most common kind of missing data is where
a respondent has chosen not to fill out the survey.
This creates a row of missing values in the
network adjacency matrix.
 Solution?

2. Imputing missing data (contd)
 When confronted with missing data, researchers
often want to handle the missing observations by
substituting plausible values for the missing
scores. This practice of filling in missing items is
called imputation.
 Imputation makes it possible to use information
contained in the observed data to predict the
missing scores, and allows analysis using
standard techniques and software on a
complete(d) dataset that is the same for all
subsequent analyses.
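As one simple illustration of substituting plausible values, the sketch below fills a non-respondent's missing row with what the other actors reported about that person, that is, missing cell (i, j) is replaced by the observed cell (j, i). This particular rule is an assumption chosen for the example, not a method prescribed in the slides, and it only makes sense for relations that are expected to be reciprocal. Missing values are represented by None, and the data is hypothetical.

```python
def impute_by_reconstruction(matrix, default=0):
    """Replace each missing cell (i, j) with the observed reverse cell (j, i).

    If the reverse cell is also missing, fall back to `default`.
    """
    n = len(matrix)
    completed = [row[:] for row in matrix]
    for i in range(n):
        for j in range(n):
            if completed[i][j] is None:
                reverse = matrix[j][i]
                completed[i][j] = default if reverse is None else reverse
    return completed


# Actor 1 did not fill out the survey, so row 1 is entirely missing.
observed = [
    [0, 1, 0],
    [None, None, None],
    [1, 1, 0],
]
completed = impute_by_reconstruction(observed)
```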

2. Imputing missing data (contd)
 The shortcomings of imputation are related to
bias and uncertainty. Ad hoc imputations can
seriously distort data distributions and
relationships, and produce biased estimates.
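 Solution: Multiple imputation. Several plausible
completed datasets are generated, each is analysed,
and the results are pooled so that the uncertainty
about the missing values is reflected.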
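The idea of multiple imputation can be sketched very roughly as follows. The imputation model here is deliberately crude and is an assumption made for illustration: each missing tie is drawn at random with probability equal to the observed density. The statistic of interest (network density) is computed on every completed dataset and the estimates are then averaged. Real applications would use a better model of the ties.

```python
import random


def observed_density(matrix):
    """Share of observed off-diagonal cells that contain a tie."""
    values = [
        v
        for i, row in enumerate(matrix)
        for j, v in enumerate(row)
        if i != j and v is not None
    ]
    return sum(values) / len(values)


def impute_once(matrix, p, rng):
    """Fill each missing off-diagonal cell with a random 0/1 draw (P(1) = p)."""
    completed = []
    for i, row in enumerate(matrix):
        new_row = []
        for j, v in enumerate(row):
            if v is not None:
                new_row.append(v)
            elif i == j:
                new_row.append(0)  # no self-ties
            else:
                new_row.append(1 if rng.random() < p else 0)
        completed.append(new_row)
    return completed


def density(matrix):
    """Density of a complete directed network without self-ties."""
    n = len(matrix)
    ties = sum(matrix[i][j] for i in range(n) for j in range(n) if i != j)
    return ties / (n * (n - 1))


def multiple_imputation(matrix, m=20, seed=0):
    """Return the mean and between-imputation variance of the density."""
    rng = random.Random(seed)
    p = observed_density(matrix)
    estimates = [density(impute_once(matrix, p, rng)) for _ in range(m)]
    mean = sum(estimates) / m
    variance = sum((e - mean) ** 2 for e in estimates) / (m - 1)
    return mean, variance


# Hypothetical data: actor 1 did not respond (None marks a missing value).
observed = [[0, 1, 0], [None, None, None], [1, 1, 0]]
pooled_mean, between_variance = multiple_imputation(observed)
```

The spread of the estimates across imputations is what a single ad hoc imputation hides; here it is reported as the between-imputation variance.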

3. Symmetrizing
 Symmetrizing refers to creating a new dataset in
which all ties are reciprocated.
 Some analytical techniques, such
as multidimensional scaling, assume symmetric
data.
 OR (union) rule: a tie exists between two nodes if
either one reports a tie to the other.
 AND (intersection) rule: a tie exists between two
nodes only if each reports a tie to the other.
 The union rule generally creates networks denser than
the original, while the intersection rule makes them
sparser.
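A minimal sketch of the two rules on a 0/1 adjacency matrix. The union rule is implemented as the element-wise maximum of a cell and its mirror cell, and the intersection rule as the minimum. The example matrix is hypothetical.

```python
def symmetrize(matrix, rule="union"):
    """Symmetrize a square matrix with the union (OR) or intersection (AND) rule."""
    if rule == "union":
        combine = max
    elif rule == "intersection":
        combine = min
    else:
        raise ValueError("rule must be 'union' or 'intersection'")
    n = len(matrix)
    return [[combine(matrix[i][j], matrix[j][i]) for j in range(n)] for i in range(n)]


def tie_count(matrix):
    """Number of non-zero cells in the matrix."""
    return sum(1 for row in matrix for v in row if v)


directed = [[0, 1, 0], [0, 0, 1], [1, 1, 0]]
union = symmetrize(directed, "union")
intersection = symmetrize(directed, "intersection")
```

In this example the original has 4 ties, the union version 6, and the intersection version 2, which matches the density remark above.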

4. Dichotomizing
 Dichotomizing refers to converting valued data to
binary data.
 Some methods, especially graph-theoretic
methods, are only applicable to binary data.
 It helps to reduce the density of the network, which
is useful in handling large networks.
 Dichotomizing at several cut-off values and analysing
each resulting network retains the richness of the data
and can reveal insights into the network structure
that would not be easy to deduce from techniques
designed to deal with valued data directly.
 It also gives you an idea of the extent to which
your findings are robust across different
definitions of ties.
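A small sketch of dichotomizing a valued matrix at several cut-offs. The convention that a tie is kept when its value is greater than or equal to the cut-off is an assumption (a strict inequality is equally common), and the valued matrix is made up for the example.

```python
def dichotomize(matrix, cutoff):
    """Return a 0/1 matrix with a tie wherever the value is >= cutoff."""
    return [[1 if v >= cutoff else 0 for v in row] for row in matrix]


def density(matrix):
    """Density of a directed binary network, ignoring the diagonal."""
    n = len(matrix)
    ties = sum(matrix[i][j] for i in range(n) for j in range(n) if i != j)
    return ties / (n * (n - 1))


# Hypothetical valued ties, e.g. number of interactions per week.
valued = [[0, 3, 1], [2, 0, 5], [4, 1, 0]]

# Density of the binary network at each cut-off value.
density_by_cutoff = {c: density(dichotomize(valued, c)) for c in (1, 2, 3, 4)}
```

Repeating an analysis across the cut-offs in density_by_cutoff is one way to see how sensitive a finding is to the definition of a tie.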

5. Combining relations
 Most network studies collect multiple relations on
the same set of nodes.
 For some analyses, these relations are combined into one.
 For example, we might take several relations involving
friendship, support, liking and so on and combine
them to create a category of relations that we
might call ‘expressive ties’.
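Two common ways of combining relations can be sketched as below: a binary "any tie" version (element-wise maximum) and a valued version that counts in how many relations a tie is present (element-wise sum). Which rule is appropriate depends on the analysis. The three matrices are hypothetical.

```python
def combine_union(relations):
    """A tie exists in the combined relation if it exists in any input relation."""
    return [[max(cells) for cells in zip(*rows)] for rows in zip(*relations)]


def combine_sum(relations):
    """Each cell counts the number of input relations in which the tie is present."""
    return [[sum(cells) for cells in zip(*rows)] for rows in zip(*relations)]


friendship = [[0, 1, 0], [0, 0, 0], [1, 0, 0]]
support = [[0, 1, 1], [0, 0, 0], [0, 0, 0]]
liking = [[0, 0, 0], [1, 0, 0], [1, 0, 0]]

expressive_any = combine_union([friendship, support, liking])
expressive_count = combine_sum([friendship, support, liking])
```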

6. Combining nodes
 We might want to aggregate the nodes into groups such
as departments, so that a tie between any two
nodes becomes a tie between their departments.
 The inter-departmental ties could be defined as a
simple count of the individual-level ties, or we
could normalize the count to account for the
number of people in each department.
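A sketch of aggregating individuals into departments. The normalisation used here, dividing each count by the number of possible ties between the two departments, is one reasonable choice and an assumption of this example. The department labels and ties are hypothetical.

```python
def aggregate(matrix, membership):
    """Count individual-level ties between (and within) groups.

    membership[i] is the group label of node i.
    """
    groups = sorted(set(membership))
    index = {g: k for k, g in enumerate(groups)}
    counts = [[0] * len(groups) for _ in groups]
    for i, row in enumerate(matrix):
        for j, v in enumerate(row):
            if i != j:
                counts[index[membership[i]]][index[membership[j]]] += v
    return groups, counts


def normalize_counts(groups, counts, membership):
    """Divide each count by the number of possible ties between the two groups."""
    size = {g: membership.count(g) for g in groups}
    result = []
    for a, ga in enumerate(groups):
        row = []
        for b, gb in enumerate(groups):
            possible = size[ga] * (size[ga] - 1) if a == b else size[ga] * size[gb]
            row.append(counts[a][b] / possible if possible else 0.0)
        result.append(row)
    return result


# Four people in two hypothetical departments.
membership = ["Sales", "Sales", "IT", "IT"]
ties = [[0, 1, 1, 0], [1, 0, 0, 1], [0, 0, 0, 1], [1, 0, 0, 0]]

departments, tie_counts = aggregate(ties, membership)
tie_shares = normalize_counts(departments, tie_counts, membership)
```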

7. Subgraphs
 Finally, it may happen that we do not want to analyze
the whole network.
 We may wish to delete a node or nodes from the
network. This may be because they are outliers in some
respect, or because we need to match the data to
another dataset where some but not all of the same
nodes are present.
 Or we may wish to combine nodes to form one node that
is connected to the same nodes as the individuals were.
One reason for combining nodes may be that the data
was collected at too fine a level and we need a
coarser-grained analysis.
 Combining nodes in the same departments would be an
example of moving up from the individual level to the
department level.
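Extracting a subgraph can be sketched as keeping only the rows and columns of the nodes that remain. The example matches the network to a second dataset by keeping the nodes the two have in common. All names and data are hypothetical.

```python
def subgraph(nodes, matrix, keep):
    """Return the labels and adjacency matrix restricted to the nodes in `keep`."""
    kept = [i for i, name in enumerate(nodes) if name in keep]
    sub_nodes = [nodes[i] for i in kept]
    sub_matrix = [[matrix[i][j] for j in kept] for i in kept]
    return sub_nodes, sub_matrix


nodes = ["Ann", "Bob", "Cara", "Dev"]
matrix = [[0, 1, 1, 0], [1, 0, 0, 1], [0, 0, 0, 1], [1, 0, 0, 0]]

# Nodes present in another (hypothetical) dataset we want to match against.
other_dataset_nodes = ["Ann", "Cara", "Dev", "Eve"]
common = set(nodes) & set(other_dataset_nodes)

sub_nodes, sub_matrix = subgraph(nodes, matrix, common)
```

Deleting an outlier works the same way: pass every node except the outlier as keep.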

References
1. “Analyzing Social Networks” by Stephen
P Borgatti, Martin G Everett, Jeffrey C
Johnson, SAGE Publications Ltd.
2. “Introduction to Social Network
Methods” by Robert A Hanneman

Thank you
