ATPI Doctoral Dissertation Defense of Laura A. Pasquini
Department of Learning Technologies, College of Information
University of North Texas
June 12, 2014
1. Organizational Identity and Community Values: Determining Meaning in Post-Secondary Education Social Media Guideline and Policy Documents
Co-Major Professors: Dr. Jeff M. Allen & Dr. Nick
Evangelopoulos; Dissertation Committee: Dr. Kim
Nimon & Dr. Mark Davis
2. Research Study
To examine and define the semantic
structure of a corpus-creating community
of practice and to establish a common
reference point for post-secondary
education (PSE) social media guideline
and policy documents.
pp. 2-3
3. How is Social Media Being
“Guided” in Higher Ed?
•Mergel et al. (2012)
Create a social media policy before using social media
or experimentation with social media within the
organization to generate and apply guidance.
•Wandel (2009) and Joosten et al. (2013)
Security and privacy are two of the primary concerns
•Rodriguez (2011)
Deal with challenges related to privacy, ownership of
intellectual property, legal use, identity management,
and literacy development
pp. 31-32
4. Background
• Social media use has increased in higher education
(Brenner & Smith, 2013); however guideline and policy
documents have rarely been examined (Joosten, 2012;
Joosten et al., 2013; Reed, 2013)
• Institutions direct & moderate how students, staff, faculty &
administrators use social media on campus (Blankenship,
2011; Moran, Seaman, & Tinti-Kane, 2011)
pp. 4-5
5. Research Questions
R1. What content related factors are
relevant to structuring the body of
textual data in retrieved electronic
social media guideline and policy
documents from the PSE sector?
R2. Does the distribution of topics
analyzed in the corpus differ by PSE
institution geographic location?
pp. 9-85
6. Theoretical Development
The cycle of Wenger’s
(1998) participation and
reification in the community
of practice is assessed
through a distributed
repository of documents,
which for the purpose of this
study is called a corpus.
pp. 8-16
9. Theory Building
Following Evangelopoulos & Polyakov
(2014) this research focused on a special
kind of community of practice, the
corpus-creating community,
where the body of social media guideline
and policy documents is a distributed
corpus. Specifically this corpus contains
meaning, values and identity.
pp. 14-16
10. Assumption 1.
The community of social
media guideline and
policy administrators in
PSE is a community of
practice.
p. 9
11. Assumption 2.
The community of practice, social media
guideline and policy administrators in PSE,
have built a semantic structure with a
shared understanding of how social media
guidelines and policies should be.
p. 10
12. Assumption 3.
Published and
accessible social
media guideline and
policy documents are
artifacts that reify the
ideas from the
community of
practice.
p. 11
13. Assumption 4.
Analysis of the collection of
social media guideline and
policy documents by an
appropriate text analytic
method uncovers the
components of the semantic
structure of meaning.
p. 13
15. Limitations
The research method LSA is:
•Dimension reduction of the dataset
•Orthogonal (Lee, Song & Kim, 2010)
•Polysemy issues (Li & Joshi, 2012)
pp. 19-20
16. Delimitations
• Published, online accessible
• Text only in artifacts
• Organizational focus: PSE sector
• English-speaking countries
• Delimitations indicate the bounds of the study;
what is not controlled might influence validity
• Follow LSA methodological
recommendations for this type of text
mining procedures
(Evangelopoulos, Zhang, & Prybutok, 2012)
pp. 20-21
17. Methodology
The research design for this study was a
semi-automatic approach to reviewing the
semantic structure and terms in the social
media guideline and policy documents.
This particular text mining procedure
required a large matrix of term-document
data to construct a semantic space in which
the closely associated terms and
documents were placed near one another.
(Deerwester, Dumais, Furnas, Landauer, & Harshman, 1990)
p. 36
18. Research Methods
Latent Semantic Analysis (LSA)
• a computational research method that
simulates human-like analysis of language
(Landauer, 2011)
• originally used for information retrieval
query optimization (Deerwester, Dumais, Furnas,
Landauer, & Harshman, 1990; Dumais, 2004)
• topic extraction using LSA (Sidorova,
Evangelopoulos, Valacich & Ramakrishnan, 2008; Li & Joshi,
2012)
• rotated LSA (Evangelopoulos & Polyakov, 2014)
pp. 41-45
25. Text Document Preparation for LSA
pp. 47-48 &
Appendix C
PassageID PassageText
SMP00001 Social Networking/Social Utilities
SMP00002 The following recommendations were discussed in the context of the social media that are most popular now, mainly Facebook, LinkedIn, and Twitter, but were drafted to be fluid enough to apply to social networks and utilities that will emerge in the future. The IMG recommends the following best practices guidelines.
SMP00003 Do:
SMP00004 Use social media to stay in touch with friends and make new ones.
SMP00005 Use social media to create your best image, since your page is likely visible to more people than just your selected friends, followers, or subscribers.
SMP00006 Type your name into a search engine (i.e., Google, Bing, Facebook, YouTube) every once in a while to check on your public image.
SMP00007 Use social media to get involved with the campus community and learn what's happening.
SMP00008 Use social media to advertise your organization's events.
SMP00009 Make sure you understand and use the privacy settings on your social media accounts to monitor who can look at your profile.
26. LSA Input Data: Term Frequency
Matrix
Input data for LSA is the term frequency matrix X.
This matrix quantifies the collection of documents by
recording the occurrence of each term in each
document.
[Figure: term frequency matrix X, with terms as rows and documents as columns]
pp. 42-44
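The construction of X described on this slide can be sketched in a few lines. This is a minimal illustration only: the three passages, the stop list, and the helper names `tokenize` and `term_frequency_matrix` are assumptions made for the example, not the study's corpus or its actual pre-processing rules.

```python
from collections import Counter
import re

# Hypothetical passages written for this sketch (not taken from the study's corpus).
passages = {
    "DOC_A": "Use social media to advertise campus events.",
    "DOC_B": "Use the privacy settings on your social media accounts.",
    "DOC_C": "Check your public image and your privacy settings.",
}

# Tiny illustrative stop list; the study's term reduction is not reproduced here.
STOP_WORDS = {"use", "to", "your", "the", "on", "and"}

def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def term_frequency_matrix(docs):
    """Return (terms, doc_ids, X) where X[i][j] counts term i in document j."""
    doc_ids = list(docs)
    counts = [Counter(tokenize(docs[d])) for d in doc_ids]
    terms = sorted(set().union(*counts))
    X = [[c[t] for c in counts] for t in terms]
    return terms, doc_ids, X

terms, doc_ids, X = term_frequency_matrix(passages)
for term, row in zip(terms, X):
    print(f"{term:<12}{row}")
```

Each row of the printed matrix is a term and each column a passage, matching the terms-by-documents layout of X.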
28. LSA Step 1:
Singular Value Decomposition
Latent Semantic Analysis (LSA) starts with the
Singular Value Decomposition (SVD) of matrix X:
$X = U \Sigma V^{T}$
where U is the term eigenvector matrix, V is the document
eigenvector matrix, and Σ is the diagonal matrix of singular values
(square roots of eigenvalues).
SVD performs a semantic decomposition of the
discourse in X.
[Figure: X (terms × documents) = U (terms × dimensions) · Σ (dimensions × dimensions) · V^T (dimensions × documents)]
pp. 42-44
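A small numerical check of this decomposition, assuming NumPy is available. The 4 × 5 count matrix is a toy example invented for the sketch; the study's matrix is far larger.

```python
import numpy as np

# Toy term-by-document count matrix (4 terms x 5 documents); values are illustrative.
X = np.array([
    [2, 1, 0, 0, 1],
    [1, 2, 0, 0, 0],
    [0, 0, 3, 1, 0],
    [0, 0, 1, 2, 1],
], dtype=float)

# Thin SVD: U is terms x r, s holds the singular values, Vt is r x documents.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
Sigma = np.diag(s)

# The three factors reproduce X up to floating-point error.
reconstruction_ok = np.allclose(X, U @ Sigma @ Vt)

# Squared singular values equal the eigenvalues of X X^T (sorted descending).
eigenvalues = np.sort(np.linalg.eigvalsh(X @ X.T))[::-1]
eigen_ok = np.allclose(s ** 2, eigenvalues)

print("shapes:", U.shape, Sigma.shape, Vt.shape)
print("X = U Sigma V^T holds:", reconstruction_ok)
print("squared singular values match eigenvalues:", eigen_ok)
```

The second check illustrates the slide's remark that the singular values are the square roots of eigenvalues.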
29. LSA Step 2:
Truncated SVD
The truncated term frequency matrix is obtained by
retaining the first k SVD dimensions:
$X_k = U_k \Sigma_k V_k^{T}$
The truncation of the SVD components corresponds
to a semantic abstraction of the discourse in X.
[Figure: Xk (terms × documents) = Uk (terms × k dimensions) · Σk (k × k dimensions) · Vk^T (k dimensions × documents), with 664 terms and 24,243 documents]
pp. 44-45
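Truncation can be sketched the same way, again assuming NumPy and a toy matrix invented for illustration. The choice k = 2 is arbitrary here; the study retained 36 dimensions of a 664-term matrix.

```python
import numpy as np

# Toy term-by-document count matrix (4 terms x 5 documents); values are illustrative.
X = np.array([
    [2, 1, 0, 0, 1],
    [1, 2, 0, 0, 0],
    [0, 0, 3, 1, 0],
    [0, 0, 1, 2, 1],
], dtype=float)

U, s, Vt = np.linalg.svd(X, full_matrices=False)

k = 2  # number of retained dimensions (arbitrary for this sketch)
Uk, Sigma_k, Vtk = U[:, :k], np.diag(s[:k]), Vt[:k, :]

# Rank-k approximation with the same shape as X.
Xk = Uk @ Sigma_k @ Vtk

# The Frobenius-norm error equals the root of the discarded squared singular values.
error = np.linalg.norm(X - Xk)
expected_error = np.sqrt(np.sum(s[k:] ** 2))

print("Xk shape:", Xk.shape, "rank:", np.linalg.matrix_rank(Xk))
print("approximation error:", round(float(error), 4))
```

Xk keeps the terms-by-documents shape of X but has rank k, which is the sense in which truncation abstracts the discourse into fewer dimensions.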
30. How Many Dimensions?
pp. 63-64
Eigenvalues were obtained by squaring the singular values in matrix Σ.
The number of dimensions to retain was then determined with iterative
methods: locating the scree plot elbow and applying the profile
likelihood test (Zhu & Ghodsi, 2006).
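The idea behind a profile likelihood test on a scree plot can be sketched as follows. This is a simplified reading of the approach, assuming two Gaussian groups with separate means and a pooled variance; it is not the macro used in the study, and the eigenvalues and the function names `profile_log_likelihood` and `select_dimension` are invented for illustration.

```python
import math

def profile_log_likelihood(ordered, q):
    """Log-likelihood of splitting `ordered` after position q into two Gaussian
    groups with separate means and one pooled variance."""
    left, right = ordered[:q], ordered[q:]
    mean_left = sum(left) / len(left)
    mean_right = sum(right) / len(right) if right else 0.0
    squared_error = (sum((v - mean_left) ** 2 for v in left)
                     + sum((v - mean_right) ** 2 for v in right))
    p = len(ordered)
    # Pooled variance estimate, floored to avoid log(0) on degenerate input.
    variance = max(squared_error / (p - 2), 1e-12)
    return (-0.5 * p * math.log(2 * math.pi * variance)
            - squared_error / (2 * variance))

def select_dimension(eigenvalues):
    """Return the split point q with the highest profile log-likelihood."""
    ordered = sorted(eigenvalues, reverse=True)
    if len(ordered) < 3:
        raise ValueError("need at least three eigenvalues")
    scores = [profile_log_likelihood(ordered, q) for q in range(1, len(ordered) + 1)]
    return scores.index(max(scores)) + 1, scores

# Hypothetical eigenvalues with a visible elbow after the third value.
toy_eigenvalues = [9.1, 8.7, 8.2, 1.1, 0.9, 0.8, 0.6, 0.5]
k, scores = select_dimension(toy_eigenvalues)
print("retain", k, "dimensions")
```

The split that best separates the "large" eigenvalues from the "small" ones marks the elbow, which replaces a purely visual reading of the scree plot.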
39. Promotional View: Canada
p. 88
F28 see SMP01268 passage example
from Brock University:
“Brock University protects your privacy and
your personal information. The personal
information requested on this form is
collected under the authority of The Brock
University Act, 1964, and in accordance
with the Freedom of Information and
Protection of Privacy Act (FIPPA) for the
administration of the University and its
programs and services. Direct any
questions about this collection to the Social
Media coordinator in University Marketing
and Communications.”
F28 = Privacy
F07 = Information Management
F07 see SMP16534 passage example from
Thompson Rivers University:
“The first, and probably most important, privacy
tool or protocol you can engage is to prepare
or provide a brief privacy seminar for students
that informs them about existing privacy
legislation in BC and Canada and highlights
the importance of fundamental privacy
principles, such as knowledge, notice and
informed consent. Most younger students have
grown up in a culture of mass information--
sharing and are not yet old enough-or simply
have been fortunate enough-to never have
suffered the serious negative consequences
for sharing too much of their or other people's
personal information.”
45. Social Media Guideline and
Policy Document Database
Appendix B (pp. 116-117)
Pasquini, Laura A. (2014). Appendix B: Social media guideline and policy document database. figshare. http://dx.doi.org/10.6084/m9.figshare.1050571
47. Summary of Research
• Compiled recommendations for developing social media
guidelines and policies from 250 PSE institutions
representing 10 countries after extracting key topics.
• Developed a common reference for social media guideline
and policy documents research to inform the PSE sector.
• Compared the distribution of the 36 topics (factors) across
two geographic regions to determine importance.
• Theorized the semantic structure created by an
organization in relation to Wenger’s (1998) Community of
Practice framework with corpus-creating community.
State title of dissertation: Organizational Identity and Community Values: Identifying Meaning in Post-Secondary Education Social Media Guideline and Policy Documents
GOAL: to define and identify the common structure to establish a common reference for social media guideline and policy documents
Research Study:
To examine and define the common semantic structure of a corpus-creating community of practice and to establish a common reference for post-secondary education (PSE) social media guideline and policy documents.
Motivation for this research: There is a need to create a standard for social media guidance, with respect to guideline and policy development in higher education.
Over 75% of the incoming 2013 class use social media for enrollment decisions (Uversity, 2013)
41% of faculty use social media for teaching (Seaman & Tinti-Kane, 2013; Pearson, 2012)
Social media guideline and policy document analysis has the potential to inform use (e.g. teaching, engagement, etc.), implementation, and policy design in higher education
On-going and current concerns: December 2013 Kansas Board of Regents; NYU policies for community May 2014; Chicago faculty blog ban October/November 2013
These are the questions to guide my research investigation.
To build theory around organizational identity, this study applied Wenger’s (1998) community of practice theory to a distributed repository of documents known as a corpus in this study.
THIS is my focal point of the dissertation – in assessing organizations and their identity development in a community of practice.
The community of practice is involved in participation on a regular basis as PSE administrators of social media are aware of these documents, read these documents, contribute and edit these documents, and share these documents between PSE institutions to deal with legal requirements within their region (state, territory or country), at professional associations, among accrediting bodies, via association involvement, and through online communities and networks – thanks to social media and electronic sharing on websites.
With REIFICATION, members of the community of practice REFLECT and REIFY social media guideline and policy documents: having reviewed them, they see this body of knowledge as both the inspiration and the authority of reference for social media use and guidance in the PSE sector. Members of the community who craft social media policy look up to those documents; they see something in them and believe they have value, both symbolic value and higher status. The content is no longer tied to the individuals or the PSE institution; those documents and points of reference (objects, texts) become associated with “the standard” for social media guidance
--others make reference and comparison to the corpus during reification
---Isn’t this interesting? Good – this is what this dissertation is about because we are going to do it
In creating this new theory related to communities of practice, I focused this research investigation on a particular group: A CORPUS-CREATING COMMUNITY, identified by Evangelopoulos & Polyakov in a recent article. Specifically, this theory identifies meaning, values, and identity from the corpus.
How does this make sense? We propose that…
Therefore…
This also means that the artifacts reify the ideas from the community of practice
And this finally means that, since the corpus has a semantic structure it will reflect the meaning, values, and identity of the community (or organizations) in the PSE sector.
As people look up to the common documents from the community of users with the particular discussion of social media guideline and policy of users.
This theoretical framework will help to uncover how the knowledge sharing in the corpus makes meaning from a corpus-creating community of practice.
Latent semantic analysis (LSA) is dimension reduction of the original dataset; determination of dimension factors is based on a subjective researcher judgment.
LSA has orthogonal characteristics, which means that a word usually does not load on multiple factors (topics): words in a given topic are highly related to other words in that topic and have limited connection to other topics.
LSA will not be able to resolve polysemy issues (coexistence of many possible word or phrase meanings).
It makes no use of word order, thus of syntactic relations or logic, or of morphology. Remarkably, it manages to extract correct reflections of passage and word meanings quite well without these aids, but it must still be suspected of incompleteness or likely error on some occasions.
I selected online published and publicly accessible text artifacts from English-speaking PSE institutions. The analysis covers textual content only, i.e., no images, screenshots, videos, photos, or URLs. The organizational focus on the PSE sector was due to the abundance of documents that could be gathered to build a sufficient database (corpus) for text analysis. Text is optimal since we are applying LSA and there is a large amount of text in the corpus.
Data mining to both predict clusters and results for a significant amount of data (Romero, Ventura, & Garcia, 2008).
Text mining: extracting interesting and non-trivial patterns or knowledge from unstructured text documents (Hearst, 1997; Feldman & Dagan, 1995; Fayyad, Piatetsky-Shapiro, & Smyth, 1996; Simoudis, 1996).
It uses fast processing to consolidate a vast amount of data, reduce coding bias, and limit researcher influence (Cronin, Stiffler, & Day, 1993; Litecky, Aken, Ahmad, & Nelson, 2010).
“text classification, text clustering, ontology and taxonomy creation, document summarization and latent corpus analysis” (Feinerer, Hornik, & Meyer, 2008)
Semi-automatic approach, i.e. the Text Document Preparation for LSA can be found in Appendix C pp. 118-119; offers coding validity and member checking for data manipulation
LSA is a text mining approach to index words and concepts. Essentially, LSA is a computational model that learns word meanings from vast amounts of text and identifies the degree to which two words or passages have the same meaning (Landauer, 2011). ORIGINALLY used for information retrieval; NOW new methods have been developed to allow for topic identification through rotated LSA (Sidorova et al., 2008) and updated rLSA (Evangelopoulos & Polyakov, 2014).
The latest version of LSA is rotated LSA (rLSA), used to extract topics; topic extraction using LSA appeared only after 2008.
10 seconds “More schematically… “ with 5 second to pause (show in segments)
-goal is to define and identify the common structure to establish a common reference for social media guideline and policy documents
This study will follow established text mining procedures as discussed in prior studies (Evangelopoulos et al., 2010; Hossain et al., 2011; Li & Joshi, 2012) and utilize the following three-step process of text mining using LSA as described in Elder, Hill, Delen, and Fast’s (2012) methodology as outlined in Figure
Step 1: Establish the Corpus - search online, website gathering, social media
Step 2: Pre-Process the Data - Word (carriage returns), to Excel docs (macros), combine all - clean URLs, videos, images, etc text only
—pre-processing and term reduction; SVD; term frequency matrix
Step 3: Extract Knowledge
Intro to step 1
STEP 1: ESTABLISH THE CORPUS
Try to get all of it. The random sample is the classic statistic; downloaded from databases and search engines
To ensure the corpus for this study would be robust for latent semantic analysis procedures, the researcher conducted a preliminary online search of social media guideline and policy documents to form the database from October 2013 until January 2014. The database currently contains at least 20,000 documents from approximately 240 post-secondary education institutions representing various geographic locations (countries), sizes of campus (by student population), and institutional types (e.g. public, private, bachelor’s and associate degrees, etc.). The researcher will continue to solicit submissions of social media guideline and policy documents that are directed at students, staff, faculty, researchers, and campus stakeholders from the post-secondary education sector via an online form (http://socialmediaguidance.wordpress.com/submit-a-social-media-guideline/) embedded into a research website.
Activity 1: show the segment
For the purpose of this study, publicly accessible social media guideline and policy documents were the target sample. Although a growing number of institutions were guiding social media use, the researcher only reviewed documents retrieved online as accessible to any visitor of a PSE institution website. To be eligible for this study, all social media guideline and policy documents had to be available electronically and accessible through PSEs’ institutional websites or a general web search. The text documents would guide social media from departmental or institutional levels within the PSE sector.
Here are the 24,243 atomic documents representing the following countries: Canada, the United States of America, Australia, New Zealand, Norway, the Netherlands, Austria, Ireland, South Africa, and Great Britain (until Scotland has their vote in September).
Intro to Step 2
STEP 2: PRE-PROCESS THE DATA
Prepare the text. See guidelines for text document preparation for LSA for this semi-automatic approach in Appendix C (pp. 117-118); discusses the segmentation of documents, validity and member checking among multiple coders
STEP 2: PRE-PROCESS THE DATA
Input data for LSA is the term frequency matrix X.
This matrix quantifies the collection of documents by recording the occurrence of each term in each document
Intro to step 3
664 terms by 24,243 documents
X = the quantification of the 24,243 atomic documents
By keeping this truncated model- it promotes different term frequencies
-This is an improved version of LSA from the “stereotypical interpretation”
-The assumption is that these documents are talking about topics (factors) and each term is mentioned at least once in ALL the documents
-Rather than looking at 664 dimensions, we examine 36 to understand the semantic structure of the documents from this truncated term frequency matrix…
But how many topics (or factors) should we keep from this matrix?
Script log reviewing eigenvalues; look for elbow points (similar to bootstrapping by Efron, 2005)
Zhu & Ghodsi, 2006 – use profile likelihood loading; I am fortunate to have the macro provided by my co-chair Dr. E to conduct this analysis
Explored a bit and found 664 terms by 24,243 documents; because of that, the number of eigenvalues that can be extracted is bounded by the matrix dimensions (at most 664)
Dimensionality detection was determined by the change-point and profile likelihood test (Zhu & Ghodsi, 2006).
Contractional = normative Transcendent topics or Converge – if the document mentions something it is there and it doesn’t matter if it is present one or many times, just as it is written in a contract (maximum)
Promotional = what should be heavily promoted through repetition; some guideline & policy documents mention a term only once and others multiple times, as captured by the count function. Here the views differ (Divergent)
Differences: Neighborhoods in the global village of the 250 PSE institutions have slightly different values depending on their region.
Here are the first 18 factors:
The 6 high-loading terms extracted using LSA were Institutional Users, Information Management, Page & Group Administration, Account Management, Support at Institution, Comments, and Content
The next 18 factors:
The lowest factors: Respect, Privacy, Responsibility, Advising Resources and Questions, Flickr and LinkedIn
Table 9 on pp. 69-71 and Appendix E outlines the high-loading terms with the term frequency – inverse document frequency (TF-IDF)
Let’s take a look at one example
Separate – zoom in to the top factors
Highlight a few key factors for the extraction
Identify the 22 topics that converge and are present in all geographic regions
From help of the chi square test… that make a difference across geo regions; the remaining 22 do not make a difference
Promotional = what should be heavily promoted through repetition; here the views differ (Divergent)
**Topics of divergent importance --
-Chi-square statistic compares distributions of documents across the 36 topics in the US and Non-US universally
-Homogeneous and heterogeneous – not evenly distributed
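The chi-square comparison mentioned in these notes can be sketched with a small contingency table. The counts below are hypothetical (three topics, two regions) and were chosen so the arithmetic is easy to follow; they are not the study's data, which covers 36 topics. The function name `chi_square_statistic` is likewise an assumption for this example.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    `table` is a list of rows (topics); each row lists document counts per region.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if topics were distributed identically across regions.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof

# Hypothetical document counts per topic: [US, non-US].
counts = [
    [120, 80],
    [60, 90],
    [45, 55],
]
statistic, dof = chi_square_statistic(counts)

# 5.991 is the standard critical value for 2 degrees of freedom at alpha = 0.05.
CRITICAL_VALUE_DF2 = 5.991
differs = statistic > CRITICAL_VALUE_DF2
print(f"chi-square = {statistic:.2f}, dof = {dof}, distributions differ: {differs}")
```

A statistic above the critical value suggests that the topic distribution is not homogeneous across the two regions, which is the sense in which some topics "make a difference" by geography.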
Identify the 14 that diverge from the topics
F12 Institutional Users – discussed more among Commonwealth countries and Ireland
F28 Privacy is closer to Canada
US (and close by the NLD) seem to cluster around similar topics including
F36.21 Audience and F36.11 Support at Institution following in very close proximity. Other factors, F36.10 Page & Group Administration, F36.1 Facebook, and F36.8 Posting
For example, passages for F28 and F07 being close to Canada are…
F07: 141 out of 735 passages originate from Canadian PSE institutions, on the topic of Information Management
F28: 98 out of 349 passages originate from Canadian PSE institutions, on the topic of Privacy – with Thompson Rivers University leading with this
Topics discussed in social media policies – Q sort: verify # of people; your task is to match the 36 topics in 9 categories
To be recorded on a matrix (confederates – no IRB application) Factor by Rater matrix developed
Contribution to research findings: Recommendations for developing social media guidelines and policies
FINAL Q-sort of classifications to group the topics (factors)
Fleiss’ kappa – macro provided by Dr. E to compute this statistic for inter-rater reliability
9 categories to cluster the 36 topics (factors) for social media guideline and policy recommendations to validate own classification
Bonus: outside the scope of the dissertation however useful for future publications
The inter-rater reliability is at 0.68. This is not bad at all! Landis and Koch (1977) – see p. 165 of the attached paper – consider this to be in the “substantial agreement” range, which is the (0.60, 0.80) range. Ran the Excel macro AgreeStat, created by Gwet (reference his 2008 paper in Psychometrika, also attached here.) I attach a document that discusses IRR, has some references, and explains how to use the AgreeStat macro.
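For reference, the standard Fleiss' kappa formula can be sketched directly. This is a generic implementation, not the AgreeStat macro mentioned above; the ratings are hypothetical (4 topics, 3 raters, 3 categories) rather than the study's 36 topics and 9 categories, and the function name `fleiss_kappa` is chosen for this example.

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa for a subjects-by-categories table of rating counts.

    ratings[i][j] is the number of raters who placed subject i in category j;
    every row must sum to the same number of raters.
    """
    n_subjects = len(ratings)
    n_raters = sum(ratings[0])
    if any(sum(row) != n_raters for row in ratings):
        raise ValueError("every subject needs the same number of ratings")

    # Observed agreement per subject, then averaged over subjects.
    per_subject = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ]
    p_observed = sum(per_subject) / n_subjects

    # Chance agreement from the overall share of ratings in each category.
    total = n_subjects * n_raters
    shares = [sum(col) / total for col in zip(*ratings)]
    p_expected = sum(p * p for p in shares)

    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical Q-sort result: 4 topics, 3 raters, 3 categories.
toy_ratings = [
    [3, 0, 0],
    [0, 3, 0],
    [0, 0, 3],
    [2, 1, 0],
]
kappa = fleiss_kappa(toy_ratings)
print(f"Fleiss' kappa = {kappa:.3f}")
```

On the Landis and Koch (1977) scale cited in the notes, values between 0.60 and 0.80 fall in the "substantial agreement" range.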
Database shared on figshare
Future research implications & applications: Review difference further between countries; linguistics; potential to see how these factors are implemented at PSE institutions for teaching, research, and service scholarship
-Industry review for other organizations
-application for policy to institutional and organizational culture and identity, with regards to common values
-Continue research to identify differences between countries, additional languages, and linguistic assessment.
Contributions from this investigation include…
Motivated the need for social media g & p comparison & research
Theorized the semantic structure of the community that builds them in relation to the cop framework
Compiled a set of 250 PSE from 10 countries to extract the topics
Compared distributions of these topics across two geographic regions & found that some are more important than others
Any Questions? Thanks
QUESTION from Nick: What’s the meaning of reification from Wenger’s Community of Practice theory? How does it apply to this study?