Production knowledge imass-olhao_24-4-2014_en

  1. Production of new knowledge through automated Big Data extraction from Social Bookmarking Systems and analysis of the resulting network: The case of the network of the globalization of agriculture in Delicious
     1st IMASS Conference, Methods and Analyses in Social Sciences, 23-24 April 2014, Olhão, Portugal, http://imass.ca/imass/conference
     University of Huelva, Spain: Juan D. Borrero (jdiego@uhu.es), Estrella Gualda (estrella@uhu.es), José Carpio (jose.carpio@dti.uhu.es)
  2. Table of Contents
     • Introduction: Web 2.0 and Social Tagging Systems; Social tagging and folksonomy; Folksonomy and collective tag structure
     • Context and Topic of Study: Delicious; Tagging on Delicious; Tag structure, Delicious and social networks; Globalization of Agriculture
     • Objectives
     • Methodology: Data collection; Analysis
     • Results: Social network statistics from the Delicious dataset; Network centralization; Top authoritative nodes; Visualization of the User→URL network; Cohesion and substructures; Tag clouds
     • Discussion: Centrality and power; Central tags
     • Conclusions: Further research; Possible applications
  3. Framework: Web 2.0 and Social Tagging Systems
     Web 2.0 has made it possible for a wide range of people to produce, share, interact with, and organize data through tagging.
     • Many users add metadata in the form of TAGS.
     • The result is a collective tag structure.
     Sources: http://www.idonato.com/2009/05/27/fun-with-tag-clouds/; http://blog.hubspot.com/blog/tabid/6307/bid/7372/9-Reasons-Why-Your-Social-Media-Strategy-Isn-t-Working.aspx/; http://bvdt.tuxic.nl/index.php/the-wisdom-of-the-crowds-in-the-audiovisual-archive-domain/
  4. Framework: Social Tagging
     Tagging: a user enjoys a resource and, according to his or her mental model, identifies the terms that best describe the information conveyed by that resource.
     Social tagging is the activity, in Web 2.0, of annotating digital resources with keywords (tags) (Golder and Huberman, 2006; Trant, 2009).
  5. Framework: Social tagging and folksonomy
     Social tags produced by users are usually regarded as high-quality descriptors of web page topics and a good indicator of web users' interests and preferences. This process also allows the formation of a socially constructed classification schema called a folksonomy (Vander Wal, 2004)…
     Source: http://scot-project.net//
  6. Framework: Folksonomy and collective tag structure
     …that emerges via a bottom-up process, and the tags of many different users are aggregated; the resulting collective tag structure (such as a tag cloud) depicts the collective knowledge of Web users (Cress et al., 2012).
     Sources: http://blog.cimmyt.org/?p=6052; http://scot-project.net//
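The aggregation of many users' tags into a collective tag structure can be sketched as follows. This is a minimal illustration, assuming annotations are available as (user, URL, tag) triples; the function name and the sample data are hypothetical, and counting each user at most once per tag is a design choice, not something the slides specify.

```python
from collections import Counter


def collective_tags(annotations, url):
    """Aggregate the tags that all users assigned to one URL.

    annotations: iterable of (user, url, tag) triples (hypothetical format).
    Returns a Counter mapping each tag to the number of distinct users
    who applied it to `url`.
    """
    # Deduplicate (user, tag) pairs so one user cannot inflate a tag's weight.
    pairs = {(user, tag) for user, res, tag in annotations if res == url}
    return Counter(tag for _, tag in pairs)


# Hypothetical toy annotations: three users tagging the same URL.
annotations = [
    ("u1", "www.example.org", "economics"),
    ("u1", "www.example.org", "politics"),
    ("u1", "www.example.org", "economics"),  # duplicate, counted once
    ("u2", "www.example.org", "economics"),
    ("u3", "www.example.org", "economics"),
    ("u3", "www.example.org", "food"),
    ("u3", "www.other.org", "trade"),
]
cloud = collective_tags(annotations, "www.example.org")
print(cloud["economics"], cloud["politics"], cloud["trade"])  # 3 1 0
```

The resulting counts are exactly the weights a tag cloud would scale its font sizes by.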
  7. Context and Topic of Study: Context
     Delicious is a free social bookmarking web service for storing, sharing and discovering web bookmarks.
     • Content is created, annotated and viewed by its users.
     • Non-hierarchical classification system: users can tag each of their bookmarks on the Delicious website, which provides knowledge about the bookmarked URL.
     • Collective nature: users can
       – view bookmarks added or annotated by other users;
       – organize existing tags into groups (tag bundles).
     Source: www.delicious.com
  8. Context and Topic of Study: Tagging on Delicious
     People can classify the huge amount of information at their disposal in the form of tags: keywords, either freely chosen by users or suggested by Delicious, that are used to annotate various types of digital content.
     Source: www.delicious.com
  9. Context and Topic of Study: Tag structure, Delicious and social networks
     The structure of social tagging websites can be viewed as a network of three different node types: the users U, the resources R (websites, i.e. URLs) and the tags T that the users U deploy to tag the websites R.
     We can see Delicious as a tripartite network whose representation can be described by two bipartite networks, for the user→tag and user→URL relations, and where we can also see undirected links (e.g. between users, drawn as straight lines) that represent a unipartite network.
     Figure: a tripartite network made of three users U = (u, u', u''), four tags T = (t, t', t'', t''') and three URLs R = (url, url', url'').
     In Delicious, an annotation is mainly composed of three interconnected components (Smith, 2008):
       1. a link to the resource (website);
       2. one or more tags;
       3. the user who makes the annotation.
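Because every annotation is a (user, URL, tag) triple, the two bipartite views of the tripartite network fall out of a single pass over the annotations. The sketch below is illustrative only: the triple format and the function name are assumptions, and the toy data reuse the slide's labels for three users, three URLs and four tags.

```python
from collections import defaultdict


def bipartite_views(annotations):
    """Split (user, url, tag) annotation triples into the two bipartite
    relations of the tripartite model: user->URL and user->tag."""
    user_url = defaultdict(set)
    user_tag = defaultdict(set)
    for user, url, tag in annotations:
        user_url[user].add(url)
        user_tag[user].add(tag)
    return user_url, user_tag


# Hypothetical triples reusing the slide's toy labels (3 users, 3 URLs, 4 tags).
annotations = [
    ("u", "url", "t"),
    ("u", "url", "t'"),
    ("u'", "url", "t''"),
    ("u'", "url'", "t''"),
    ("u''", "url''", "t'''"),
]
user_url, user_tag = bipartite_views(annotations)
print(sorted(user_url["u'"]))  # ['url', "url'"]
```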
  10. Context and Topic of Study: Topic
      Globalization implies a large market as a result of the reduction in the transaction costs of international trade.
      Globalization of agriculture:
      • trade (food, goods)
      • prices (food, goods)
      • food consumption (bulk products versus processed products)
      • R&D
      • rules and laws (subsidies, WTO; related to poverty)
      Implications: asymmetric effects. Web 2.0: discussion/diffusion.
  11. Objectives
      To discover some type of structuration around the issue of the globalization of agriculture on Delicious, by extracting data automatically from the Delicious social bookmarking website and using Social Network Analysis (SNA):
      1. what types of URLs around our topic have been recommended via collaborative tagging in Delicious;
      2. what types of users label URLs around this topic;
      3. whether there is some type of structuration and hierarchy to be discovered in the network of the globalization of agriculture (centrality, substructures, etc.); and
      4. what types of tags are being used to specifically label (and thus define and qualify) the URLs on the globalization of agriculture that users recommend through Delicious.
  12. Methodology: Data collection / Procedure
      (A) Starting point: identify the search attributes.
      – Authoritative source as a baseline to find keywords connected to the idea of ‘globalization of agriculture’: the Wikipedia definition of “critics of globalization” (popular, high reputation).
      – Other starting points (future work).
      – Main concepts selected manually (researcher expertise) from the website homepages, tag clouds or topics.
      – 9 seed keywords identified: globalization + agriculture, development, activism, trade, poverty, food, organic, GMO.
      – Other concepts rejected.
      (B) A Perl web-crawling program was used to gather the sample of users, URLs and tags for:
      – globalization+agriculture; globalization+development; globalization+activism; globalization+poverty; globalization+food; globalization+organic; globalization+GMO;
      – from 22 April 2011 to 21 May 2011.
      (C) Results: 61,043 taggings involving 3,668 users on 4,913 URLs and 5,724 tags.
      (D) A Haskell program reduced the amount of data by truncating the URLs and normalizing keywords, including identifying synonyms and eliminating capitalized variants and derived forms such as plurals.
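Step (D) was implemented by the authors in Haskell, and its exact rules are not given on the slide. The Python sketch below only illustrates the kind of normalization described; the synonym table, the naive plural rule, and the choice to truncate URLs to their host name are all assumptions, and the function names are hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical synonym table: the study's actual synonym list is not given.
SYNONYMS = {"globalisation": "globalization", "gmos": "gmo"}

# Endings that should keep their final "s" (e.g. "economics", "business").
KEEP_S = ("ss", "us", "is", "ics")


def normalize_tag(tag):
    """Lower-case a tag, map known synonyms, and strip a naive plural 's'."""
    t = tag.strip().lower()
    t = SYNONYMS.get(t, t)
    if len(t) > 3 and t.endswith("s") and not t.endswith(KEEP_S):
        t = t[:-1]  # assumed plural rule: "prices" -> "price"
    return SYNONYMS.get(t, t)


def truncate_url(url):
    """Reduce a bookmarked URL to its lower-cased host name
    (an assumed reading of 'cutting the URLs')."""
    if "://" not in url:
        url = "http://" + url  # urlparse needs a scheme to find the host
    return urlparse(url).netloc.lower()


print(sorted({normalize_tag(t) for t in ["Food", "foods", "GMOs", "Economics"]}))
# ['economics', 'food', 'gmo']
print(truncate_url("http://www.nytimes.com/2008/04/10/world/x.html"))
# www.nytimes.com
```

Merging variants this way is what shrinks the raw 4,913 URLs and 5,724 tags towards a smaller final vocabulary; a real pipeline would need a curated synonym list rather than the toy table above.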
  13. Methodology: Data collection / Final dataset
      2,148 URLs, 4,776 tags, 3,668 users
  14. Methodology: Analysis
      Using the Pajek software, we analyzed these social networks, first studying their properties quantitatively, and then visualizing the networks qualitatively through force-directed graph layouts and tag clouds.
  15. Results: Social network statistics from the Delicious dataset

      Network     Type        Relation    # of nodes  # of links  Density  Av. degree
      User→URL    Bipartite   Directed         5,816       7,200    0.09%       2.476
      User–User   Unipartite  Undirected       3,668     134,833    1.97%     73.5187
      URL–URL     Unipartite  Undirected       2,148      20,558    0.84%      19.141
      Tag–Tag     Unipartite  Undirected       4,776     539,105   47.06%     225.756

      The tag–tag network is much denser than the others: people usually use common tags.
      A bipartite network with a directed relation (2-mode network) is built from two different types of nodes (here "users" and "URLs") that are directly connected by a relationship or link (in this work, users recommend URLs or users tag URLs).
      A unipartite network with an undirected relation (1-mode network) is built by transforming the original matrix into a user–user, tag–tag, or URL–URL matrix. In these cases two nodes are linked, without direction, through a vertex (node) of the other type that connects both. For instance, the user–user matrix is built here through the URLs that connect users, because different people can tag or recommend the same URL.
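The transformation from the two-mode User→URL network into a one-mode user–user network, and the summary statistics in the table, can be sketched in Python. This is an illustration under assumptions, not the Pajek procedure used in the study: the toy data and function names are hypothetical, and Pajek's reported densities may follow a different convention (for example, counting multiple lines), so they need not match these textbook formulas exactly.

```python
from itertools import combinations


def project_users(user_url):
    """One-mode projection of the bipartite user->URL network:
    two users are linked when they bookmarked at least one common URL."""
    url_users = {}
    for user, urls in user_url.items():
        for url in urls:
            url_users.setdefault(url, set()).add(user)
    edges = set()
    for users in url_users.values():
        # Every pair of users sharing this URL gets one undirected edge.
        edges.update(combinations(sorted(users), 2))
    return edges


def density(n, m):
    """Density of a simple undirected graph: m / (n(n-1)/2)."""
    return 2 * m / (n * (n - 1))


def bipartite_density(n_a, n_b, m):
    """Density of a bipartite graph: m / (n_a * n_b)."""
    return m / (n_a * n_b)


def average_degree(n, m):
    """Average degree of an undirected graph: 2m / n."""
    return 2 * m / n


# Hypothetical toy data: four users and three URLs.
toy = {"a": {"r1", "r2"}, "b": {"r1"}, "c": {"r2", "r3"}, "d": {"r3"}}
print(sorted(project_users(toy)))  # [('a', 'b'), ('a', 'c'), ('c', 'd')]

# Figures from the table: 3,668 users x 2,148 URLs with 7,200 links,
# and the URL-URL network with 2,148 nodes and 20,558 links.
print(f"{bipartite_density(3668, 2148, 7200):.2%}")  # 0.09%
print(round(average_degree(2148, 20558), 2))          # 19.14
```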
  16. Results: Network centralization
      The network is highly centralized around a few nodes. A power-law degree distribution is a defining characteristic of large-scale networks such as the Web (e.g. Barabási and Albert, 1999), and it implies a high degree of network centralization. Why are a few users and websites much better connected than the majority?
      Figure: Hyperlink network (User→URL); the degree of variability in URL and user centrality scores according to indegree and outdegree.
      • 2,148 URLs arranged in rank order by number of inbound links (URL indegree: sum of total inbound links). Only 10 URLs out of 2,148 (0.47%) account for 17.97% of links; 1% of URLs (22 out of 2,148) account for 26.50% of links.
      • 3,668 users arranged in rank order by number of outbound links (user outdegree: sum of total outbound links). Only 10 users out of 3,668 (0.27%) account for 5.25% of links; 1% of users (37 out of 3,668) account for 12.01% of links.
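The concentration figures above can be checked with a small helper. As a sketch, assuming the 7,200 User→URL links from the statistics table are the denominator, the indegrees and outdegrees listed in the table of top authoritative nodes reproduce the 17.97% and 5.25% shares; the helper's name is hypothetical.

```python
def top_k_share(degrees, k, total_links):
    """Fraction of all links held by the k highest-degree nodes."""
    return sum(sorted(degrees, reverse=True)[:k]) / total_links


# Indegrees of the ten most-cited URLs (table of top authoritative nodes).
top_url_indegrees = [259, 170, 155, 144, 124, 95, 94, 94, 87, 72]
# Outdegrees of the ten most active users (same table).
top_user_outdegrees = [71, 51, 44, 42, 33, 30, 28, 28, 27, 24]

print(f"{top_k_share(top_url_indegrees, 10, 7200):.2%}")    # 17.97%
print(f"{top_k_share(top_user_outdegrees, 10, 7200):.2%}")  # 5.25%
```

Passing the full degree sequence instead of the top ten, with k = 22 or k = 37, would give the corresponding 1% shares.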
  17. Results: Top authoritative nodes in the Delicious “Globalization of agriculture” network

      Top 10 URLs by indegree
      Rank  Indegree  URL                     Description
         1       259  www.nytimes.com         Online newspaper
         2       170  www.independent.co.uk   Online newspaper
         3       155  www.naomiklein.org      Activist media site
         4       144  www.news.bbc.co.uk/     Online newspaper
         5       124  www.globalresearch.ca   Activist media site
         6        95  www.spiegel.de/         Online newspaper
         7        94  www.guardian.co.uk/     Online newspaper
         8        94  www.economist.com/      Online newspaper
         9        87  www.corpwatch.org       Activist media site
        10        72  www.theatlantic.com     Online magazine

      Top 10 users by outdegree
      Rank  Outdegree  User and description
         1         71  /garrygolden: http://www.garrygolden.net/; professional futurist
         2         51  /mritiunjoy: Mritiunjoy Mohanty; Professor, Economics, Indian Institute of Management Calcutta
         3         44  /emmarlyb
         4         42  /woldpublicopinion: http://www.worldpublicopinion.org/; activist media site
         5         33  /criticalspatialpractice: Nicholas Brown; artist
         6         30  /pagolnari: Dr. Kathy Ward (pagol Nari); Professor, Carbondale, USA; feminist blogger, http://pagolnari.blogspot.com.es/
         7         28  /bfunk: Bryan Finoki, http://subtopia.blogspot.com.es/; author, Subtopia (blog); Senior Editor, Archinect; Adjunct, Woodbury University School of Architecture, San Diego
         8         28  /chris.h.p
         9         27  /maitreya11: Carlos Puentes
        10         24  /matttbastard: Matthew Elliot, http://bastardlogic.wordpress.com/

      Of the 10 most central websites, six were media-based (online newspapers such as The New York Times, The Independent, BBC, Spiegel, The Guardian, and The Economist) and three were activist sites (Naomi Klein, Global Research, and Corpwatch).
      Identification of the users with the greatest centrality: the user Mritiunjoy plays a very important role in the network. Mritiunjoy joined Delicious on 12 March 2007. Mritiunjoy Mohanty is a professor at the Indian Institute of Management Calcutta, and his research interests are the political economy of growth and development.
  18. Results: Visualization of the User→URL network (5,816 nodes)
      Fruchterman-Reingold energy layout (Pajek) map; colors indicate cores.
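Pajek's Fruchterman-Reingold layout is a force-directed method: every pair of nodes repels, linked nodes attract, and node moves shrink as a "temperature" cools. A minimal pure-Python sketch of that idea follows; the frame width, iteration count and cooling factor (0.95) are illustrative assumptions and do not reproduce Pajek's settings, and the quadratic repulsion loop is only practical for small graphs, not for all 5,816 nodes.

```python
import math
import random


def fruchterman_reingold(nodes, edges, width=1.0, iterations=50, seed=0):
    """Minimal Fruchterman-Reingold force-directed layout (illustrative sketch)."""
    rng = random.Random(seed)
    pos = {v: [rng.uniform(0, width), rng.uniform(0, width)] for v in nodes}
    k = width * math.sqrt(1.0 / len(nodes))  # ideal edge length for the frame area
    temp = width / 10                        # maximum move, cooled every iteration
    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Repulsion between every pair of nodes: force k^2 / d.
        for a in nodes:
            for b in nodes:
                if a == b:
                    continue
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[a][0] += dx / d * f
                disp[a][1] += dy / d * f
        # Attraction along edges: force d^2 / k, applied to both endpoints.
        for a, b in edges:
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = d * d / k
            disp[a][0] -= dx / d * f
            disp[a][1] -= dy / d * f
            disp[b][0] += dx / d * f
            disp[b][1] += dy / d * f
        # Move each node at most `temp` and keep it inside the frame.
        for v in nodes:
            dx, dy = disp[v]
            d = max(math.hypot(dx, dy), 1e-9)
            step = min(d, temp)
            pos[v][0] = min(width, max(0.0, pos[v][0] + dx / d * step))
            pos[v][1] = min(width, max(0.0, pos[v][1] + dy / d * step))
        temp *= 0.95
    return {v: tuple(p) for v, p in pos.items()}


layout = fruchterman_reingold(
    ["a", "b", "c", "d"], [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
)
```

For real work, the networkx function `spring_layout` implements the same Fruchterman-Reingold idea with more care.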
  19. Results: Cohesion and substructures

      Core k  Nodes  Frequency (%)  CumFreq (nodes)  CumFreq (%)
           1  4,445         76.43%            4,445       76.43%
           2    792         13.62%            5,237       90.04%
           3    387          6.65%            5,624       96.70%
           4    147          2.53%            5,771       99.23%
           5     45          0.77%            5,816      100.00%
         Sum  5,816        100.00%

      k-core: a k-core of a graph G is a maximal connected subgraph of G in which all vertices have a degree of at least k.
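The core classes in the table can be computed with a simple peeling procedure: repeatedly remove a node of minimum remaining degree, and record the highest minimum degree seen so far as its core number. The sketch below is an illustration, not Pajek's implementation; it assumes the User→URL network is treated as undirected and that each table row counts the nodes whose core number is exactly k (the rows sum to all 5,816 nodes). The toy graph is hypothetical.

```python
from collections import Counter


def core_numbers(adj):
    """Core number of every node by repeatedly removing a node of
    minimum remaining degree (the classic peeling procedure).

    adj: dict mapping node -> set of neighbours (undirected, no self-loops).
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        v = min(remaining, key=degree.get)  # node with lowest current degree
        k = max(k, degree[v])               # the core level never decreases
        core[v] = k
        remaining.remove(v)
        for w in adj[v]:
            if w in remaining:
                degree[w] -= 1              # v no longer supports its neighbours
    return core


# Hypothetical toy graph: a triangle a-b-c with a tail a-d-e.
adj = {
    "a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"},
    "d": {"a", "e"}, "e": {"d"},
}
core = core_numbers(adj)
print(sorted(Counter(core.values()).items()))  # [(1, 2), (2, 3)]
```

Counting the nodes per core number, as in the last line, produces a frequency table of the same shape as the one above.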
  20. Results: Cohesion and substructures
      • 2-core: 792 vertices, density = 0.26%
      • 3-core: 387 vertices, density = 1.16%
      • 4-core: 147 vertices, density = 5.16%
      • 5-core: 45 vertices, density = 34.77%
      We found that the mass media websites belong to the 5-core subgroup, whereas the main activist websites are included in the 4-core.
  21. Results: Tag cloud, identifying the main topical themes in the unipartite tag–tag network
      Figure 9. Tag cloud for the Globalization of Agriculture network identified in Delicious (main tags of the network). Tag size is proportional to weight; the 50 highest-weighted tags are shown. Produced with Wordle.
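The tag-cloud rule stated above (size proportional to weight, top 50 tags) can be expressed in a few lines. This sketch assumes each tag already has a numeric weight (for example, its number of uses or its weighted degree in the tag–tag network); the weights and the maximum font size are hypothetical, and Wordle's own scaling may differ.

```python
def tag_cloud_sizes(weights, top_n=50, max_size=60.0):
    """Keep the top_n highest-weighted tags and give each a font size
    proportional to its weight (the heaviest tag gets max_size)."""
    # Sort by descending weight; ties broken alphabetically for stable output.
    top = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
    if not top:
        return {}
    heaviest = top[0][1]
    return {tag: max_size * w / heaviest for tag, w in top}


weights = {"economics": 120, "politics": 100, "food": 40, "trade": 20}  # hypothetical
print(tag_cloud_sizes(weights, top_n=3))
# {'economics': 60.0, 'politics': 50.0, 'food': 20.0}
```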
  22. Discussion
      • Because tagging is a bottom-up process, the constitution of a global network in this way evokes a very old sociological dilemma concerning the constitution of society.
        – Do individuals (or micro entities) come first, or are communities and societies present from the very beginning?
        – Does human agency determine social structures, or is an individual's behavior determined by social structures?
      • We found that the bottom-up social tagging process is crucial, but it could not exist without Web 2.0 technology.
      • What is especially interesting for us here is whether these questions could be transferred to understanding the society that lives around the process of social tagging inside Web 2.0, as exemplified in this study by the social bookmarking site Delicious.
      • The approach of this study acknowledges the reciprocity and influence of the social and semantic characteristics. However, it is the user who ultimately decides whether a URL is included and whether to write new tags. Thus, the constitution of the globalization of agriculture network is probably a mixture, as is society itself.
  23. Discussion: Centrality and Power
      • Very unequal distribution of power among the URLs cited by users on the topic of globalization of agriculture.
        – Important accumulation of inbound links.
      • Mass media and activists in this network of globalization of agriculture in Delicious far surpassed the other tagged resources.
      • Identifying key collective actors (represented here through URLs as well as unknown users) allows a better understanding of leadership, influence processes, and power-related structures.
      • For social practitioners, it is a good way to identify key informants in a community through which to disseminate useful and important information.
      ADVANTAGES OF THIS TYPE OF KNOWLEDGE FOR RESEARCHING AND INTERVENING
  24. Discussion: Central Tags (users producing tags)
      • Tags are either suggested by the website or added by users in a creative way.
      • Each user could label a URL with an unlimited number of tags.
      • Tag cloud: a visual approach to the language used by users and a way to identify discourses.
      • Of a total of 4,776 tags, two words were the main ones.
      • The most frequently used tags were the words ‘economics’ and ‘politics’.
  25. Conclusions: Achieved goals
      • A first step towards the development of empirical techniques capable of automatically differentiating actors who occupy a more central position.
      • A first stone in the difficult process of understanding and discovering the patterns that characterize users tagging URLs for collaborative reasons.
      • Useful for discovering latent patterns, and thus for providing effective recommendations to different actors.
      • Understanding the community of more than a thousand links.
      • Retrieving and analyzing the information was complex, but easier when working in interdisciplinary teams.
  26. Further research: FOCUS ON Users
      • Identification of key actors that disseminate and share URLs, such as the previously cited Mritiunjoy.
        – Determine where the key elements that structure the network emerge from.
      • Why is that actor so important in the network of globalization of agriculture?
        – Key actors in this type of network could configure and reconfigure the evolution of the network (TIME), and structure and even manipulate the type of interchange of resources in Delicious or in similar bookmarking sites.
      • Use of some tags when classifying URLs, and the distinction among users in the way they use some words/tags.
        – Distinction between scientists / other professionals or users?
        – Identify users with the same tagging patterns, or URLs that were similarly labelled: study structural equivalences.
      • Is it by chance? Do the most prominent actors in a website like Delicious correspond to a profile of very active and participative people? Do they usually work in this area (or have it as a hobby), and is this why they accumulate and tag so many URLs in Delicious?
      • Go in depth about users (if possible).
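Identifying users with the same tagging patterns could start from a similarity measure over tag profiles. The sketch below uses cosine similarity, a common relaxation of strict structural equivalence (which would require identical ties); the 0.9 threshold, the function names and the profiles are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two tag-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_taggers(profiles, threshold=0.9):
    """Pairs of users whose tagging profiles are nearly equivalent."""
    users = sorted(profiles)
    pairs = []
    for i, u in enumerate(users):
        for v in users[i + 1:]:
            s = cosine(profiles[u], profiles[v])
            if s >= threshold:
                pairs.append((u, v, round(s, 3)))
    return pairs


# Hypothetical tagging profiles (tag -> times used).
profiles = {
    "u1": Counter(economics=3, politics=1),
    "u2": Counter(economics=6, politics=2),
    "u3": Counter(food=2, organic=1),
}
print(similar_taggers(profiles))  # [('u1', 'u2', 1.0)]
```

The same function applies unchanged to URLs, using each URL's tag counts as its profile, to find URLs that were similarly labelled.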
  27. Further research: FOCUS ON Tags
      • Reasons for the prominence of the top two tags around the globalization of agriculture.
        – Influence of the first tags on the following ones.
      • Role of innovation and creativity in tagging.
      • Are some of the 4,776 tags found used interchangeably?
        – Why is the word ‘economics’ used sometimes, and ‘economy’ at other times?
        – Are they used in the same way when classifying the URLs?
        – Evolution and usage of language around an issue over time.
        – Ideological and terminological approaches in the national/international arena.
      • Other possible studies based on retrieving the pages and performing content analysis.
      • Why are some labels present or absent?
      • Are there “traditions”/“fashions” in tagging on Web 2.0?
      OTHERS
      • Compare results from Delicious with those from other social bookmarking sites.
      • Longitudinal analysis.
      • And other explorations, other starting points, other indicators, etc.
  28. Possible Applications
      • Producing and “manipulating” public opinion (when recommending and describing websites) and markets.
        – If we know the interests of users belonging to a network, we could also make recommendations.
      • Important for researchers interested in formulating strategies for intervention and mobilization, but practitioners and firms could also make use of this.
      • The discovery of the central elements in a network (users and URLs), together with the tags used by users, could be key to designing future diffusion strategies (spreading taglines, causes, rumours, etc.).
      • Implementation of Information Retrieval and Recommender Systems techniques in social commerce and social media contexts.
      • Applications in advertising, e-commerce, mobilizing, security…
      • …
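As a minimal illustration of how knowing users' bookmarking interests could support recommendations, the following sketch implements simple user-based collaborative filtering over the User→URL relation: each candidate URL is scored by how many bookmarks its owners share with the target user. It is an assumption for illustration, not a method evaluated in the study; the user names and bookmark sets are hypothetical.

```python
from collections import Counter


def recommend(user_url, user, top_n=3):
    """Suggest URLs bookmarked by users who share bookmarks with `user`.

    user_url: dict mapping user -> set of bookmarked URLs.
    Each candidate URL gains one point per bookmark its owner shares
    with `user`; URLs the user already has are skipped.
    """
    mine = user_url[user]
    scores = Counter()
    for other, urls in user_url.items():
        if other == user:
            continue
        overlap = len(mine & urls)
        if overlap:
            for url in urls - mine:
                scores[url] += overlap
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [url for url, _ in ranked[:top_n]]


# Hypothetical bookmarks using domains from the top-URL table.
user_url = {
    "ana": {"www.nytimes.com", "www.corpwatch.org"},
    "ben": {"www.nytimes.com", "www.corpwatch.org", "www.guardian.co.uk"},
    "eva": {"www.nytimes.com", "www.economist.com"},
    "joe": {"www.spiegel.de"},
}
print(recommend(user_url, "ana"))  # ['www.guardian.co.uk', 'www.economist.com']
```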
