Theoretical foundations and applications: a study of normalized indicators Salton's Cosine and Jaccard Index in Author Co-citation Analysis
Bruno Henrique Alves (1) & Ely Francina Tannuri de Oliveira (2)
(1) bruninkmkt@hotmail.com
(2) etannuri@gmail.com
UNESP – Univ Estadual Paulista, 737 Hygino Muzzi Filho Avenue, 17525-900 Marília (Brazil)
1 Introduction
Studies on Author Co-citation Analysis (ACA) aim to identify influential authors and show
their interrelations based on citations (White; Griffith, 1981; White; McCain, 1998).
In comparative studies, given the specificities of each area, normalized indicators are
important because they standardize the units of measure and reveal aspects that
absolute values do not explain.
According to Luukkonen et al. (1993), absolute and normalized measures
carry different types of information: the former show the central "actors" of the networks,
while the latter show the intensity of relations and reveal aspects that are not
identifiable in the absolute frequencies. Among the relative indices, Pearson's Correlation
Coefficient, Salton's Cosine, and the Jaccard Index are cited (Leydesdorff; Vaughan, 2006).
Pearson's r was the standard measure until the study by Ahlgren, Jarneving &
Rousseau (2003), who criticized its use, showing that it does not satisfy the requirements
of a similarity or proximity measure.
This research aims to deepen the study of normalized indicators in ACA. Specifically, it
presents and analyzes two normalized indicators, Salton's Cosine (Ss) and the Jaccard Index
(JI), and compares them by identifying normalized relations in data from Information Science.
Both indices are calculated from the co-occurrence matrix of absolute data, according to
Luukkonen et al. (1993).
In the studies by Hamers et al. (1989), where co-occurrences represent co-citations, Ss is
expressed as (Equation 1):
\[ \mathrm{Ss}(a,b) = \frac{\mathrm{coc}(a,b)}{\sqrt{\mathrm{cit}(a) \cdot \mathrm{cit}(b)}} \]
Where:
coc(a,b) = total of co-occurrences of authors a and b
cit(a) = total of citations received by author a
cit(b) = total of citations received by author b
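Luukkonen et al. (1993) express JI as (Equation 2):
\[ \mathrm{JI}(a,b) = \frac{\mathrm{coc}(a,b)}{\mathrm{cit}(a) + \mathrm{cit}(b) - \mathrm{coc}(a,b)} \]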
Both Ss and JI vary between zero and one: the closer to one, the more similar the two
authors are (through theoretical-methodological proximity, similarity, complementarity, overlap,
opposed ideas, or even co-authorship); the closer to zero, the weaker the association
between the two authors.
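As a minimal sketch, the two indices can be written as small Python functions. It assumes the standard forms of Salton's Cosine and the Jaccard Index, coc(a,b) / sqrt(cit(a) * cit(b)) and coc(a,b) / (cit(a) + cit(b) - coc(a,b)). The function names and the sample counts are illustrative assumptions, not part of the original study (which used Excel/VBA).

```python
import math


def salton_cosine(coc: int, cit_a: int, cit_b: int) -> float:
    """Salton's Cosine: coc(a,b) / sqrt(cit(a) * cit(b))."""
    if cit_a <= 0 or cit_b <= 0:
        raise ValueError("both authors need at least one citation")
    return coc / math.sqrt(cit_a * cit_b)


def jaccard_index(coc: int, cit_a: int, cit_b: int) -> float:
    """Jaccard Index: coc(a,b) / (cit(a) + cit(b) - coc(a,b))."""
    union = cit_a + cit_b - coc
    if union <= 0:
        raise ValueError("citation counts are inconsistent with coc")
    return coc / union


# Hypothetical counts illustrating the two ends of the [0, 1] range.
print(salton_cosine(0, 5, 8), jaccard_index(0, 5, 8))  # never co-cited: 0.0 0.0
print(salton_cosine(6, 6, 6), jaccard_index(6, 6, 6))  # always co-cited: 1.0 1.0
```

Both functions take raw counts, so the same code applies to any pair of authors in the absolute co-citation matrix.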
2 Methodological procedures
Data were extracted from 110 articles published in the 2007-2011 period in the ENANCIB
proceedings, in Brazil. We identified 2003 references and 1242 cited researchers, from which
we selected a target group of 20 researchers cited at least 12 times.
A 20x20 square matrix of absolute co-citation frequencies was built from the most cited
authors, and Ss and JI were applied to it. We used Microsoft Excel macros built in "Visual Basic for
Applications" (VBA). We comparatively analyzed the two normalized matrices obtained with
Ss and JI, highlighting the proximities, similarities and differences between the
values and the intensities of connections in the networks.
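The normalization step described above could be sketched as follows. This is an illustration in Python rather than the VBA macros used in the study, and it assumes the standard forms of Ss and JI. The function name `normalize`, the three-author toy matrix and the choice to leave the diagonal at zero are assumptions made for the example.

```python
import math


def normalize(cocitation, citations):
    """Return (Ss matrix, JI matrix) for a symmetric co-citation matrix.

    cocitation[a][b] is the absolute co-citation count of authors a and b;
    citations[a] is the total number of citations received by author a.
    """
    n = len(citations)
    ss = [[0.0] * n for _ in range(n)]
    ji = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            if a == b:
                continue  # assumption: the diagonal is not analyzed
            coc = cocitation[a][b]
            ss[a][b] = coc / math.sqrt(citations[a] * citations[b])
            ji[a][b] = coc / (citations[a] + citations[b] - coc)
    return ss, ji


# Toy, made-up data for three hypothetical authors (not from the study).
citations = [12, 20, 15]
cocitation = [
    [0, 6, 3],
    [6, 0, 0],
    [3, 0, 0],
]
ss, ji = normalize(cocitation, citations)
for row_ss, row_ji in zip(ss, ji):
    print([round(v, 2) for v in row_ss], [round(v, 2) for v in row_ji])
```

Because the input matrix is symmetric, both output matrices are symmetric too, so only the upper triangle would need to be reported.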
3 Presentation and analysis of data
The two normalized matrices using Ss (Equation 1) and JI (Equation 2) are presented in Tables
1 and 2. In the analysis of Tables 1 and 2, we first highlight Meneghini and Packer, who have
the highest values, with Ss equal to 0.84 and JI equal to 0.71. In the absolute
matrix (not presented here) the co-citation count for these two authors is 10, with 13
citations made to Meneghini and 11 to Packer. The number of co-citations is thus relativized by
the citations made to the two authors. In Figures 1 and 2, the links between these authors are
strongly highlighted.
Research Group for Metric Studies of Information
Table 1. Ss Normalized Matrix
Meadows and Mueller present Ss equal to 0.54, JI equal to 0.37 and an absolute
number of co-citations equal to 19, with 31 citations made to Meadows and 40
citations to Mueller, which explains the intermediate normalized value for Ss and the
lower value for JI. In Figure 1, the link between these two researchers is much more
strongly highlighted than in Figure 2.
Table 2. JI Normalized Matrix
Researchers Leta and Spinak present 0.08 for Ss and 0.04 for JI, with an absolute co-citation
value equal to 1, 12 citations to Leta and 12 citations to Spinak,
which explains the low normalized values. In Figures 1 and 2, the links between these
two authors differ only slightly between Ss and JI.
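The values reported for the three highlighted pairs can be checked from the counts given in the text. The sketch below assumes the standard forms of Ss and JI; the helper name `indices` is an assumption of this example.

```python
import math

# (author a, author b, coc(a,b), cit(a), cit(b)) as reported in the text.
pairs = [
    ("Meneghini", "Packer", 10, 13, 11),
    ("Meadows", "Mueller", 19, 31, 40),
    ("Leta", "Spinak", 1, 12, 12),
]


def indices(coc, cit_a, cit_b):
    """Return (Ss, JI) for one pair of authors."""
    ss = coc / math.sqrt(cit_a * cit_b)
    ji = coc / (cit_a + cit_b - coc)
    return ss, ji


results = {}
for a, b, coc, cit_a, cit_b in pairs:
    ss, ji = indices(coc, cit_a, cit_b)
    results[(a, b)] = (round(ss, 2), round(ji, 2))
    print(f"{a}-{b}: Ss={ss:.2f} JI={ji:.2f}")
# Meneghini-Packer: Ss=0.84 JI=0.71
# Meadows-Mueller: Ss=0.54 JI=0.37
# Leta-Spinak: Ss=0.08 JI=0.04
```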
The highlighted cases ratify Hamers et al. (1989), who claim that the Ss formula often
produces a relative similarity measure that is twice the value obtained by JI.
Extending this analysis to other values, we observe that the higher the Ss, the
closer JI is to it, staying above half of Ss (as with Meneghini and Packer, and
Meadows and Mueller); the lower the Ss, the closer JI is to half of Ss, or
exactly half (as with Leta and Spinak).
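A short derivation suggests why this pattern appears. It is a sketch that assumes the standard forms of Ss and JI and, as a simplification not made in the source, that both authors receive the same number of citations n, with c co-citations.

```latex
% Simplifying assumption: cit(a) = cit(b) = n and coc(a,b) = c, with 0 <= c <= n.
\[
\mathrm{Ss} = \frac{c}{\sqrt{n \cdot n}} = \frac{c}{n},
\qquad
\mathrm{JI} = \frac{c}{2n - c} = \frac{\mathrm{Ss}}{2 - \mathrm{Ss}},
\qquad
\frac{\mathrm{JI}}{\mathrm{Ss}} = \frac{1}{2 - \mathrm{Ss}} \in \left[\tfrac{1}{2},\, 1\right].
\]
% Without the equal-citation assumption, the arithmetic-geometric mean inequality
% cit(a) + cit(b) >= 2 sqrt(cit(a) cit(b)) gives only an upper bound:
\[
\mathrm{JI} \le \frac{\mathrm{Ss}}{2 - \mathrm{Ss}}.
\]
```

Under this assumption the ratio JI/Ss tends to 1/2 as Ss approaches zero and to 1 as Ss approaches one. For the three highlighted pairs, the ratio computed from the reported counts is about 0.85 (Meneghini and Packer), 0.68 (Meadows and Mueller) and 0.52 (Leta and Spinak).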
4 Final considerations
This study has validated the analyses already made by other scholars and advanced
on existing analyses of Ss and JI, showing when the two indices tend to
converge. They exhibit similar behavior, and the choice between them does
not settle the question raised; consequently, the
appropriate methodology to establish ACA is not fully consolidated.
References
Ahlgren, P., Jarneving, B. & Rousseau, R. (2003). Requirements for a Cocitation Similarity Measure, with Special Reference to Pearson’s Correlation Coefficient. Journal of the American Society for Information Science and Technology, 54, 6, 550-560.
Hamers, L. et al. (1989). Similarity measures in scientometric research: the Jaccard index versus Salton’s cosine formula. Information Processing & Management, 25, 3, 315-318.
Leydesdorff, L. & Vaughan, L. (2006). Co-occurrence Matrices and their applications in Information Science: Extending ACA to the Web environment. Journal of the American Society for Information Science and Technology, 57, 12, 1616-1628.
Luukkonen, T. et al. (1993). The measurement of international scientific collaboration. Scientometrics, Amsterdam, 28, 1, 15-36.
White, H.D. & Griffith, B. (1981). Author cocitation: A literature measure of intellectual structure. Journal of the American Society for Information Science, 32, 163-171.
White, H.D. & McCain, K.W. (1998). Visualizing a discipline: an author co-citation analysis of Information Science, 1972-1995. Journal of the American Society for Information Science, 49, 4, 327-355.