Production knowledge imass-olhao_24-4-2014_en
1. 11
Production of new knowledge through automated Big
Data extraction from Social Bookmarking Systems and
analyzing of the resulting network: The case of the
network of the globalization of agriculture in Delicious
1st IMASS Conference,
Methods and Analyses in Social Sciences,
23-24 April 2014, Olhão, Portugal,
http://imass.ca/imass/conference
University of Huelva, Spain
Juan D. Borrero, jdiego@uhu.es
Estrella Gualda, estrella@uhu.es
José Carpio, jose.carpio@dti.uhu.es
2. 22
Table of Contents
Introduction
Web 2.0 and Social Tagging Systems
Social tagging and folksonomy
Folksonomy and collective tag structure
Context and Topic of Study
Delicious
Tagging on Delicious
Tag structure, Delicious and social networks
Globalization of Agriculture
Objectives
Methodology
Data collection
Analysis
Results
Social network statistics from Delicious dataset
Network centralization
Top authoritative nodes
Visualization UserURL net
Cohesion and substructures
Tag clouds
Discussion
Centrality and power
Central tags
Conclusions
Further research
Possible applications
3. 33
Framework
Web 2.0 and Social Tagging Systems
Many users add metadata in the
form of TAGS
Resulting collective tag structure
Source: http://www.idonato.com/2009/05/27/fun-with-tag-clouds/
Source: http://blog.hubspot.com/blog/tabid/6307/bid/7372/9-Reasons-Why-
Your-Social-Media-Strategy-Isn-t-Working.aspx/
Source: http://bvdt.tuxic.nl/index.php/the-wisdom-of-
the-crowds-in-the-audiovisual-archive-domain/
Web 2.0 has made tagging possible for a wide range of
people to produce, share, interact with, and organize data
4. 44
Framework
Social Tagging
Social tagging is the activity, in Web 2.0, of annotating digital resources
with keywords, or tags (Golder and Huberman, 2006; Trant, 2009).
Tagging
A user enjoys a resource and, according to his or her mental
model, identifies the terms that best describe the
information conveyed by that resource.
5. 55
Source: http://scot-project.net//
Social tags produced by
users are usually regarded
as high quality descriptors
of web page topics and a
good indicator of web
users’ interests and
preferences.
This process also allows
the formation of a socially
constructed classification
schema called
folksonomy…
(Vander Wal, 2004)
Framework
Social tagging and folksonomy
6. 66
… that emerges via a bottom-up process, and…
…the tags of many different users are aggregated, and the resulting
collective tag structure (such as a tag cloud) depicts the
collective knowledge of Web users (Cress et al., 2012)
Source: http://blog.cimmyt.org/?p=6052
Source: http://scot-project.net//
Framework
Folksonomy and collective tag structure
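The aggregation step described on this slide can be sketched in a few lines of Python. This is only an illustration: the (user, tag) pairs are invented, and counting each user once per tag is an assumption, not necessarily how the authors weighted tags.

```python
from collections import Counter

# Hypothetical taggings as (user, tag) pairs, invented for illustration.
taggings = [
    ("u1", "globalization"), ("u1", "food"),
    ("u2", "globalization"), ("u2", "trade"),
    ("u3", "globalization"), ("u3", "food"),
]

def collective_tag_structure(pairs):
    """Count how many distinct users applied each tag."""
    distinct = set(pairs)  # a user counts at most once per tag
    return Counter(tag for _user, tag in distinct)

tag_weights = collective_tag_structure(taggings)
print(tag_weights.most_common())
```

The resulting counts are the kind of weights a tag cloud would display.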
7. 7
Context and Topic of Study
Context
Delicious is a free social
bookmarking web service for
storing, sharing and
discovering web bookmarks
• Content is created, annotated
and viewed by its users.
• Non-hierarchical classification
system: users can tag each of
their bookmarks on the
Delicious website, which
provides knowledge about the
bookmarked URL.
• Collective nature: users can
• view bookmarks added or
annotated by other users.
• organize existing tags into
groups (tag bundles).
Source: www.delicious.com
8. 88
Context and Topic of Study
Tagging on Delicious
People can classify the huge amount of information at their
disposal in the form of tags.
Tags are keywords, freely
chosen by users or
suggested by
Delicious, that are used to
annotate various
types of digital
content.
Source: www.delicious.com
9. 99
Context and Topic of Study
Tag structure, Delicious and social networks
We can see Delicious as a tripartite
network whose representation can be
described by two bipartite networks, for
user→tag and user→URL relations, and
where we can also see indirect links
(e.g. between users, drawn as straight lines) that
represent a unipartite network
The structure of Social tagging websites can be viewed as a network of
three different node types: the U users, the R resources (web sites –
URLs) and the T tags that the U users deploy to tag the R websites.
A Tripartite Network made of three users U=(u,u’,u’’), four
tags T=(t,t’,t’’,t’’’) and three URLs (url,url’,url’’)
In Delicious, an annotation is mainly
composed of three interconnected
components (Smith, 2008):
1. Link to the resource (website)
2. One or more tags
3. User who makes the annotation
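As a rough sketch, the three-part annotation above can be held in a small record type, from which the two bipartite relations (user→URL and user→tag) follow directly. The `Annotation` type, the `bipartite_edges` helper and the sample data are hypothetical and only echo the u, t, url notation of the slide.

```python
from typing import NamedTuple, Tuple

class Annotation(NamedTuple):
    """One Delicious-style annotation: who tagged which URL with which tags."""
    user: str
    url: str
    tags: Tuple[str, ...]

# Hypothetical annotations, invented for illustration.
annotations = [
    Annotation("u", "url", ("t", "t'")),
    Annotation("u'", "url", ("t'",)),
    Annotation("u'", "url'", ("t''",)),
    Annotation("u''", "url''", ("t'''",)),
]

def bipartite_edges(records):
    """Split annotations into user->URL and user->tag edge sets."""
    user_url = {(a.user, a.url) for a in records}
    user_tag = {(a.user, t) for a in records for t in a.tags}
    return user_url, user_tag

user_url, user_tag = bipartite_edges(annotations)
print(sorted(user_url))
print(sorted(user_tag))
```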
10. 1010
Globalization
Implies a large market as a result of the reduction in the transaction costs of international
trade
Globalization of agriculture
- trade (foods, goods)
- prices (food, goods)
- food consumption (bulk products versus processed products)
- R&D
- rules and laws (subsidies, WTO related to poverty)
implications
Asymmetries
effects
Web 2.0
Discussion/diffusion
Context and Topic of Study
Topic
11. 11
Objectives
To discover some type of structuration around the
issue of the globalization of agriculture on Delicious
By automatically extracting data from the Delicious social
bookmarking website and using Social Network Analysis
(SNA), we ask:
1. what types of URLs around our topic have been
recommended via collaborative tagging in Delicious,
2. what types of users label URLs around this topic,
3. whether there is some type of structuration and hierarchy
to be discovered in the network of the globalization of
agriculture (centrality, substructures, etc.), and
4. what types of tags are being used to specifically label (and
thus define and qualify) the URLs on the globalization of
agriculture that users recommend through Delicious.
12. 1212
Methodology
Data collection / Procedure
(A) Starting point. Identify the search attributes, using an authoritative
source as a baseline to find keywords connected to the idea of
‘globalization of agriculture’
– Wikipedia definition of “critics of globalization” (popular,
high reputation)
– Other starting points (future)
– Main concepts selected manually (researcher expertise)
from the website homepages, tag clouds or topics.
– Identified the 9 seed keywords (globalization +
agriculture, development, activism, trade, poverty, food,
organic, GMO)
– Other concepts rejected
(B) A Perl web-crawling program was written to gather the
sample of users, URLs and tags for
- globalization+agriculture; globalization+development;
globalization+activism; globalization+poverty;
globalization+food; globalization+organic;
globalization+GMO
- From 22 April 2011 to 21 May 2011
(C) Results
- 61,043 taggings that involved 3,668 users on 4,913 URLs
and 5,724 tags.
(D) A Haskell program reduced the amount of data by
cutting the URLs and using keywords, including the
identification of synonyms, the elimination of words with
capital letters and of derivatives such as plural forms.
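The data-reduction step in (D) might look roughly like the following Python sketch. It is not the authors' Haskell program: the synonym table, the plural exceptions, and the reading of "cutting the URLs" as keeping only the host name are all assumptions made for illustration.

```python
from urllib.parse import urlparse

# Hypothetical synonym table; the real mapping used in the study is not given.
SYNONYMS = {"globalisation": "globalization", "gm": "gmo"}
# Words ending in "s" that are not plurals (assumed exception list).
PLURAL_EXCEPTIONS = {"economics", "politics"}

def normalize_tag(tag):
    """Fold case, strip a naive plural "s", then map synonyms."""
    t = tag.strip().lower()  # merges capitalized variants
    if (len(t) > 3 and t.endswith("s") and not t.endswith("ss")
            and t not in PLURAL_EXCEPTIONS):
        t = t[:-1]  # naive singular form, e.g. "foods" -> "food"
    return SYNONYMS.get(t, t)

def cut_url(url):
    """Reduce a full URL to its host name (one reading of 'cutting the URLs')."""
    return urlparse(url).netloc.lower()

print(normalize_tag("Foods"), normalize_tag("Globalisation"))
print(cut_url("http://www.nytimes.com/some/article.html"))
```

A naive suffix rule like this will mishandle irregular plurals, so a real pipeline would need a proper stemmer or a curated word list.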
14. 14
Methodology
Analysis
With the help of the software Pajek, we
analyzed these social networks,
first studying their properties (quantitative),
and
second visualizing the networks (qualitative)
through force-directed graph layouts and tag
clouds.
15. 15
Network    Type        Relation    # of nodes  # of links  Density  Av. degree
User–URL   Bipartite   Directed    5,816       7,200       0.09%    2.476
User–User  Unipartite  Undirected  3,668       134,833     1.97%    73.5187
URL–URL    Unipartite  Undirected  2,148       20,558      0.84%    19.141
Tag–Tag    Unipartite  Undirected  4,776       539,105     47.06%   225.756
A bipartite network with a directed relation is a network built from two different types of nodes (in this case “users” and
“URLs”) that are directly connected by a relationship or link (in this work: users recommend URLs, or users tag URLs) (2-mode network).
A unipartite network with an undirected relation is a network created by transforming the original matrix into a user-user, tag-tag,
or URL-URL matrix. In these cases two nodes are related, without direction, through a vertex (node) that connects both (1-mode network).
For instance, a user-user matrix is built here through the URLs that connect users, because different people can tag or recommend the
same URL.
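The transformation from the 2-mode user-URL network to a 1-mode user-user network can be sketched as below. The edge list is invented, and weighting each user pair by the number of shared URLs is an assumption; the slides do not say how Pajek was configured.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical user->URL edges, invented for illustration.
user_url_edges = [
    ("u1", "a"), ("u2", "a"),
    ("u2", "b"), ("u3", "b"),
    ("u4", "c"),
]

def project_users(edges):
    """Link two users (undirected) for every URL they both bookmarked."""
    users_by_url = defaultdict(set)
    for user, url in edges:
        users_by_url[url].add(user)
    weights = defaultdict(int)
    for users in users_by_url.values():
        for pair in combinations(sorted(users), 2):
            weights[pair] += 1  # weight = number of shared URLs
    return dict(weights)

user_user = project_users(user_url_edges)
print(user_user)
```

Here u4 shares no URL with anyone, so it appears in no user-user link.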
Results
Social Network Statistics from Delicious dataset
Tag-tag network is much denser than the others: People
usually use common tags
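The average-degree column of the statistics table can be checked from the node and link counts, assuming the usual definition of average degree as 2L/n. For the bipartite user-URL network, the density is taken here as links divided by users times URLs, which is also an assumption about how the figure was computed (the 5,816 nodes are the 3,668 users plus the 2,148 URLs).

```python
def average_degree(nodes, links):
    """Average degree of a network with the given node and link counts."""
    return 2 * links / nodes

def bipartite_density(links, n_left, n_right):
    """Share of possible user-URL pairs that are actually linked."""
    return links / (n_left * n_right)

# (nodes, links) as reported in the statistics table.
networks = {
    "User-URL": (5816, 7200),
    "User-User": (3668, 134833),
    "URL-URL": (2148, 20558),
    "Tag-Tag": (4776, 539105),
}

for name, (n, l) in networks.items():
    print(f"{name}: average degree {average_degree(n, l):.3f}")

print(f"User-URL density: {bipartite_density(7200, 3668, 2148):.2%}")
```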
16. 1616
The network is highly centralized within a few nodes. The power law
is a defining characteristic of large-scale networks such as the Web (e.g.
Barabási and Albert, 1999), which implies a high degree of network
centralization
Why are a few users and websites better
connected than the majority?
2,148 URLs arranged in rank order by number of
inbound links (URL’s Indegree: Sum of total inbound
links)
3,668 users arranged in rank order by number of
outbound links (User’s Outdegree: Sum of total outbound
links)
Results
Network centralization
Hyperlink Network (userURL). The degree of variability in URL and user centrality scores according to
indegree and outdegree.
Only 10 URLs from 2,148 (0.47%) account for 17.97% of links.
1% URLs (22 URLs from 2,148) account for 26.50% of links.
Only 10 users from 3,668 (0.27%) account for 5.25% of links.
1% users (37 users from 3,668) account for 12.01% of links.
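The two top-10 concentration figures can be recomputed from numbers given in the slides: the top-ten indegree and outdegree values listed in the table of top authoritative nodes, and the 7,200 links of the user-URL network. The helper name `share_of_links` is hypothetical.

```python
TOTAL_LINKS = 7200  # links in the user-URL network

# Top-ten values from the table of top authoritative nodes.
top10_url_indegree = [259, 170, 155, 144, 124, 95, 94, 94, 87, 72]
top10_user_outdegree = [71, 51, 44, 42, 33, 30, 28, 28, 27, 24]

def share_of_links(top_degrees, total_links):
    """Fraction of all links that touch the listed top nodes."""
    return sum(top_degrees) / total_links

url_share = share_of_links(top10_url_indegree, TOTAL_LINKS)
user_share = share_of_links(top10_user_outdegree, TOTAL_LINKS)
print(f"Top 10 URLs: {url_share:.2%} of links")
print(f"Top 10 users: {user_share:.2%} of links")
```

The 1% figures cannot be checked the same way, because the slides list only the first ten values of each ranking.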
17. 17
Results
Top authoritative nodes in the Delicious “Globalization of agriculture” network
Indegree (URLs)
Rank  Value  URL                    Description
1     259    www.nytimes.com        Online newspaper
2     170    www.independent.co.uk  Online newspaper
3     155    www.naomiklein.org     Activist media site
4     144    www.news.bbc.co.uk/    Online newspaper
5     124    www.globalresearch.ca  Activist media site
6     95     www.spiegel.de/        Online newspaper
7     94     www.guardian.co.uk/    Online newspaper
8     94     www.economist.com/     Online newspaper
9     87     www.corpwatch.org      Activist media site
10    72     www.theatlantic.com    Online magazine
Outdegree (users)
Rank  Value  User                      Description
1     71     /garrygolden              Professional futurist, http://www.garrygolden.net/
2     51     /mritiunjoy               Mritiunjoy Mohanty, Professor, Economics, Indian Institute of Management Calcutta
3     44     /emmarlyb
4     42     /woldpublicopinion        Activist media site, http://www.worldpublicopinion.org/
5     33     /criticalspatialpractice  Nicholas Brown, Artist
6     30     /pagolnari                Dr. Kathy Ward pagol Nari, Professor, Carbondale, USA; feminist blogger, http://pagolnari.blogspot.com.es/
7     28     /bfunk                    Bryan Finoki, author of Subtopia (blog), http://subtopia.blogspot.com.es/; Senior Editor, Archinect; Adjunct, Woodbury University School of Architecture, San Diego
8     28     /chris.h.p
9     27     /maitreya11               Carlos Puentes
10    24     /matttbastard             Matthew Elliot, http://bastardlogic.wordpress.com/
The 10 most central websites.
Six of them were media-based
(online newspapers such as
The New York Times, The
Independent, BBC, Spiegel,
The Guardian, and The
Economist) and three were
activist (Naomi Klein, Global
Research, and Corpwatch)
Identification of users with a
greater degree of centrality.
The user Mritiunjoy plays a very
important role in the
network.
Mritiunjoy joined Delicious on
12 March 2007.
Mritiunjoy Mohanty is a professor
at the Indian Institute of
Management Calcutta, and his
research interests are the political
economy of growth and
development.
19. 19
Cluster k=1..5 (subnet)  Nodes  Frequency (%)  CumFreq (nodes)  CumFreq (%)
1                        4,445  76.43%         4,445            76.43%
2                        792    13.62%         5,237            90.04%
3                        387    6.65%          5,624            96.70%
4                        147    2.53%          5,771            99.23%
5                        45     0.77%          5,816            100.00%
Sum                      5,816  100.00%
k-core: A k-core of a graph G is a maximal connected sub-graph of G in which all vertices have a degree of
at least k.
Results
Cohesion and substructures
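A minimal sketch of how k-core membership can be computed is the standard peeling procedure: repeatedly remove the vertex of lowest remaining degree and record the highest k reached so far. The toy graph is invented. The function assigns each vertex its core number and does not split cores into connected components, so it is not a reproduction of the Pajek routine used in the study.

```python
def core_numbers(adjacency):
    """Return, for each vertex, the largest k such that it lies in a k-core."""
    degree = {v: len(nbrs) for v, nbrs in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        # Peel off the vertex with the smallest remaining degree.
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for w in adjacency[v]:
            if w in remaining:
                degree[w] -= 1
    return core

# Hypothetical undirected graph: a triangle a-b-c plus a pendant vertex d.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}

core = core_numbers(graph)
print(core)
```

Grouping vertices by core number gives a partition of the node set; whether the cluster table was tallied this way is an assumption.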
20. 20
2-core: 792 vertices, density = 0.26%
3-core: 387 vertices, density = 1.16%
4-core: 147 vertices, density = 5.16%
5-core: 45 vertices, density = 34.77%
Results
Cohesion and substructures
We found that the mass media websites belong to the 5-core subgroup, whereas
the main activist websites are included in the 4-core.
21. 21
Figure 9. Tag cloud for the globalization of agriculture
network identified in
Delicious (main tags of the network)
Main themes
Results
Tag cloud: identifying the topical themes in the unipartite tag-tag network
Size proportional to the weights: the top 50 highest-weighted tags.
Produced by Wordle
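The rule "size proportional to the weights, top 50 tags" can be sketched as follows. The weights and the 48-point maximum are invented, and this is not Wordle's actual layout or scaling algorithm.

```python
def font_sizes(weights, top_n=50, max_pt=48):
    """Give the top_n tags a font size proportional to their weight."""
    top = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)[:top_n]
    if not top:
        return {}
    heaviest = top[0][1]
    # The heaviest tag gets max_pt; every other tag scales linearly.
    return {tag: max_pt * w / heaviest for tag, w in top}

# Hypothetical tag weights, invented for illustration.
weights = {"economics": 120, "politics": 90, "food": 30}

sizes = font_sizes(weights)
print(sizes)
```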
22. 22
Discussion
• Because tagging is a bottom-up process, the constitution of a global
network in this way raises a very old sociological dilemma concerning the
constitution of society.
– Do individuals (or micro entities) come first, or are communities and societies
present from the very beginning?
– Does human agency determine social structures, or is an individual's behavior
determined by social structures?
• We found the bottom-up social tagging process is crucial, but it could not
exist without Web 2.0 technology.
• What is especially interesting for us here is whether these questions
could be transferred to understanding the society that lives around the
process of social tagging inside Web 2.0, as exemplified in this article by
the social bookmarking site Delicious.
• The approach of this study acknowledges the reciprocity and influence of
the social and semantic characteristics. However, it is the user who
ultimately decides whether a URL is included and whether he or
she is going to write new tags. Thus, the constitution of the globalization of
agriculture network is probably a mixture, as is society itself.
23. 23
Discussion
Centrality and Power
• Highly unequal distribution of power among the URLs cited by users
in the topic of the globalization of agriculture.
– Important accumulation of inbound links.
• Mass media and activists in this network of globalization of
agriculture in Delicious surpassed by far the other resources
tagged.
• Identification of key collective actors (represented here
through URLs, as well as unknown users) allows a better
comprehension of leadership, influence processes, and
power-related structures.
• For social practitioners, it is a good way to identify key
informants in a community through which to disseminate
useful and important information.
ADVANTAGES OF THIS TYPE OF KNOWLEDGE
FOR RESEARCHING AND INTERVENING
24. 24
Discussion
Central Tags: Users producing Tags
• Tags: suggested by the website or newly added by users in a
creative way
• Each user could label a URL with an unlimited number
of tags.
• Tag cloud: a visual approach to the language used by
users and a way to identify discourses.
• From a total of 4,776 tags, two words were the main
ones.
• The most frequently used tags were the words
‘economics’ and ‘politics’.
25. 25
Conclusions
Achieved goals
• A first step towards the development of empirical techniques
capable of automatically differentiating actors who occupy a
more central position.
• A first building block in the difficult process of understanding and
discovering patterns in the process that characterizes users
tagging URLs for collaborative reasons.
• Utility for discovering latent patterns = providing effective
recommendations to different actors.
• Understanding the community of more than a thousand
links.
• Retrieval and analysis of information was complex, but eased by
working in interdisciplinary teams.
26. 26
FOCUS ON Users
• Identification of key actors that disseminate and share URLs, such as the
previously cited Mritiunjoy
– Determine where the key elements that structure the network emerge from.
• Why is ‘that’ actor so important in the network of globalization of
agriculture?
– Key actors in this type of network could configure and reconfigure the
evolution of the network (TIME), and structure and even manipulate the
type of interchange of resources in Delicious or in similar bookmarking sites.
• Use of some tags in classifying URLs and the distinction among users in the
way they use some words/tags
– Distinction between scientists / other professionals or users?
– Identify users with the same tagging patterns, or URLs that were similarly
labelled: study structural equivalences
• Is it by chance? Do the most prominent actors in a type of website like
Delicious correspond to a profile of very active and participative people?
Do they usually work (or have a hobby) in this area, and is this why they
accumulate and tag so many URLs in Delicious?
• Study users in depth (if possible).
Further research
27. 27
Further research
FOCUS ON Tags
• Reasons for the prominence of the first two tags around the globalization of
agriculture.
– Influence of the first tags on the following ones.
• Role of innovation and creativity in tagging
• Are some of the 4,776 tags found used interchangeably?
– Why is the word ‘economics’ used sometimes, and ‘economy’ at other
times?
– Are they used in the same way in classifying the URLs?
– Evolution and usage of language around an issue over time.
– Ideological and terminological approaches in the national/international arena.
• Other possible studies based on retrieving the pages and carrying out content
analysis.
• Why are some labels present/absent?
• Are there “traditions”/“fashions” in tagging in Web 2.0?
OTHERS
• To compare results from Delicious with those from other social bookmarking sites.
• Longitudinal analysis.
• And other explorations, other starting points, other indicators, etc.
28. 28
Possible Applications
• Producing and “manipulating” public opinion (by
recommending and describing websites) and markets
– If we know the interests of users belonging to a network, we could also
be able to make recommendations
• Important for researchers interested in formulating strategies
for intervention and mobilization, but practitioners and
firms could also make use of this.
• The discovery of the central elements in a network (users and
URLs), together with the tags used by users, could be
key to designing future strategies for diffusion (spreading taglines,
causes, rumours, etc.).
• Implementation of Information Retrieval and Recommender
Systems techniques in social commerce and social media
contexts.
• Applications in advertising, e-commerce, mobilizing, security…
• …