This document discusses applying weighted PageRank algorithms to author citation networks to measure author popularity and prestige. It analyzes data on authors in the information retrieval field from 1956-2008. The results show that weighted PageRank using citations or publications as weights correlates highly with popularity rank. Prestige rank, which measures citations from highly cited papers, differs more from PageRank and correlates highest with weighted PageRank using citations as weights. Weighted PageRank may thus help distinguish popularity from prestige when analyzing author impact.
This is a high-level summary of three important ways to help people find information. The slides were presented at Vera Rhoades' information architecture class at the University of Maryland.
Is search always the right solution? There are many things you can do with a hammer, but it’s not so great if you need to turn a screw.
Text Classification is an alternative to search that may be more appropriate for social media data analysis. Text classification is the task of assigning predefined categories to free-text documents. It can provide conceptual views of document collections and has important applications in the real world. Using text classification as the foundation for analysis – i.e., teaching a machine to categorize posts the way humans do – can dramatically improve your ability to gather the right data and, ultimately, increase the chances that you’ll uncover what you need to know.
Interpreting the Semantics of Anomalies Based on Mutual Information in Link Mining (ijdms)
This paper aims to show how mutual information can help provide a semantic interpretation of anomalies in data, characterize those anomalies, and measure the information that an object X shares with another object Y. While most link mining approaches focus on link type prediction, link-based object classification, or object identification, this research focuses on using link mining to detect anomalies and to discover links and objects among them. The paper demonstrates the contribution of mutual information to interpreting anomalies through a case study.
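For a concrete sense of the measure involved, the following sketch estimates the mutual information shared by two observed attributes from a list of paired observations. The function and the sample data are illustrative, not taken from the paper.

```python
import math
from collections import Counter

def mutual_information(pairs):
    """Estimate I(X; Y) in bits from a list of (x, y) observations."""
    n = len(pairs)
    joint = Counter(pairs)                 # empirical joint distribution
    px = Counter(x for x, _ in pairs)      # marginal counts for X
    py = Counter(y for _, y in pairs)      # marginal counts for Y
    mi = 0.0
    for (x, y), c in joint.items():
        p_xy = c / n
        # p_xy * n * n / (px * py) equals p(x,y) / (p(x) * p(y))
        mi += p_xy * math.log2(p_xy * n * n / (px[x] * py[y]))
    return mi

# Perfectly linked pairs share 1 bit; independent-looking pairs share none.
print(mutual_information([("a", 1), ("a", 1), ("b", 2), ("b", 2)]))  # 1.0
```

High mutual information between two object attributes signals a strong link, which is the kind of evidence the paper uses to characterize anomalies.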
Integrated expert recommendation model for online communities (st02IJwest)
Online communities have become vital places for Web 2.0 users to share knowledge and experiences. Recently, finding expert users in a community has become an important research issue. This paper proposes a novel cascaded model for expert recommendation using aggregated knowledge extracted from enormous content and social network features. A vector space model is used to compute the relevance of published content with respect to a specific query, while the PageRank algorithm is applied to rank candidate experts. The experimental results show that the proposed model is an effective recommendation approach which can guarantee that the top candidate experts are both highly relevant to the specific queries and highly influential in the corresponding areas.
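The cascaded idea, relevance filtering with a vector space model followed by PageRank over the candidates' network, can be sketched roughly as follows. This is an illustrative toy, not the paper's implementation: the candidate names, documents, graph, and the 0.1 relevance cutoff are all invented for the example.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def pagerank(graph, d=0.85, iters=50):
    """Simple power-iteration PageRank; graph maps node -> list of out-links."""
    nodes = list(graph)
    rank = {n: 1 / len(nodes) for n in nodes}
    for _ in range(iters):
        rank = {
            n: (1 - d) / len(nodes)
               + d * sum(rank[m] / len(graph[m]) for m in nodes if n in graph[m])
            for n in nodes
        }
    return rank

# Stage 1: keep candidates whose published content is relevant to the query.
docs = {"alice": "pagerank expert ranking",
        "bob": "cooking recipes",
        "carol": "expert search ranking"}
query = Counter("expert ranking".split())
relevant = [u for u in docs if cosine(query, Counter(docs[u].split())) > 0.1]

# Stage 2: rank the relevant candidates by PageRank over their interaction graph.
graph = {"alice": ["carol"], "carol": ["alice"], "bob": ["alice"]}
scores = {u: r for u, r in pagerank(graph).items() if u in relevant}
```

Here "bob" is dropped in the first stage despite his inbound links, which is the point of cascading: relevance gates the candidate set before influence decides the order.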
An Improved Web Explorer using Explicit Semantic Similarity with ontology and... (INFOGAIN PUBLICATION)
The Improved Web Explorer aims at extracting and selecting the best possible hyperlinks and retrieving more accurate results for the entered search query. Hyperlinks are evaluated by weighting the frequencies with which the words of the search string appear in anchor texts and in the plain text of the title and body tags of the linked pages, so that the most relevant hyperlinks can be retrieved from all available links. Ontology is then used to gain insight into the words of the search string by finding their hypernyms, hyponyms, and synsets, reaching the source and context of those words. Explicit Semantic Similarity analysis, together with the Naïve Bayes method, is used to find the semantic similarity between lexically different terms, using Wikipedia and Google as explicit semantic analysis tools and calculating the probabilities of occurrence of words in anchor and body texts. The Vector Space Model is used to calculate term frequency and inverse document frequency values and then the cosine similarities between the entered query and the extracted hyperlinks, yielding search results ranked by relevance to the search string.
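As an illustration of the weighted-frequency step, the following sketch scores a link by counting query-term occurrences in its anchor, title, and body text. The field weights and the page data are hypothetical; the summary does not give the paper's actual values.

```python
# Hypothetical weights: anchor text counts more than the title, which counts
# more than the body. These numbers are assumptions for the example.
WEIGHTS = {"anchor": 3.0, "title": 2.0, "body": 1.0}

def link_score(query_terms, fields):
    """Score a hyperlink; fields maps 'anchor'/'title'/'body' to text."""
    score = 0.0
    for part, text in fields.items():
        words = text.lower().split()
        for term in query_terms:
            score += WEIGHTS.get(part, 0.0) * words.count(term.lower())
    return score

page = {"anchor": "python tutorial",
        "title": "Learn Python",
        "body": "a python guide"}
print(link_score(["python"], page))  # 3.0 + 2.0 + 1.0 = 6.0
```

Links can then be sorted by this score before the ontology and semantic-similarity stages refine the ranking.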
Towards Automatic Analysis of Online Discussions among Hong Kong Students (CITE)
HU, Xiao (University of Hong Kong)
http://citers2013.cite.hku.hk/en/paper_619.htm
Citations, often described as intellectual transactions, acknowledgments of intellectual debt, or conceptual associations, are a link between an author's current study and already published work. Citation not only lends credibility to the author's work but also helps funders evaluate the impact of the research. Citation indexes are maintained for information retrieval of both cited and citing work, facilitating the literature search process; they also help authors identify the number of citations their papers have received. Citation data is considered a legitimate measure for ranking authors, journals, and publishers. Through this webinar, we aim to provide information about citation indexing and how authors and publishers can get indexed in established citation databases.
Co-word analyses study the co-occurrence of pairs of items (for example, keywords) that are representative of a document, in order to identify relations between the ideas presented in the texts.
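The pair-counting step of a co-word analysis can be sketched as follows; the keyword lists are invented for illustration.

```python
from collections import Counter
from itertools import combinations

def coword_counts(docs_keywords):
    """Count how often each unordered keyword pair co-occurs in a document."""
    pairs = Counter()
    for kws in docs_keywords:
        # sorted(set(...)) makes each pair canonical and ignores repeats
        for a, b in combinations(sorted(set(kws)), 2):
            pairs[(a, b)] += 1
    return pairs

docs = [["pagerank", "citation", "ranking"], ["citation", "ranking"]]
print(coword_counts(docs)[("citation", "ranking")])  # 2
```

The resulting pair counts form the co-occurrence matrix on which clustering or mapping of research themes is then performed.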
TOWARDS A MULTI-FEATURE ENABLED APPROACH FOR OPTIMIZED EXPERT SEEKING (csandit)
With the enormous growth of data, retrieving information from the Web has become more desirable and even more challenging because of Big Data issues (e.g., noise, corruption, bad quality). Expert seeking, defined as returning a ranked list of expert researchers for a given topic, has been a real concern over the last 15 years. This kind of task comes in handy when building scientific committees, where scholars' experience must be identified to assign them the most suitable roles, among other factors. Because the Web holds such a wealth of data, there is an opportunity to collect different kinds of expertise evidence. In this paper, we propose an expert seeking approach that specifies the most desirable features (i.e., the criteria on which a researcher's evaluation is based) along with their estimation techniques. We utilize machine learning techniques in our system and aim to verify the effectiveness of incorporating influential features that go beyond publications.
Structured data and metadata evaluation methodology for organizations looking... (Emily Kolvitz)
The current state of findability on the web is incipient for many organizations. Search Engine Optimization (SEO) techniques change frequently and remain a mystery to many companies. The one staple in the equation of web findability is good-quality metadata under the hood of the website.
This research methodology allows for:
An assessment of findability maturity on the web from an image-centric viewpoint
Improved findability on the web, by establishing a baseline for where your organization stands in terms of structured data content and visualizing gaps or areas for improvement from a search-engine-neutral perspective
Contextual model of recommending resources on an academic networking portal (csandit)
Artificial Intelligence techniques have been instrumental in helping users handle the large amount of information on the Internet. The idea of recommendation systems, custom search engines, and intelligent software has been widely accepted among users who seek assistance in searching, sorting, classifying, filtering, and sharing this vast quantity of information. In this paper, we present a contextual model of a recommendation engine which, keeping in mind the context and activities of a user, recommends resources in an academic networking portal. The proposed method uses implicit feedback and a concept relationship hierarchy to determine the similarity between a user and the resources in the portal. The proposed algorithm has been tested on an academic networking portal, and the results are convincing.
Software Repositories for Research -- An Environmental Scan (Micah Altman)
Presented at the Software Preservation Network Forum:
"We discuss the results of an environmental scan characterizing the current landscape of software repositories, hubs, and publication venues that are used in research and scholarship. The study aims to characterize the research and scholarship use cases supported by exemplar repositories, their models for sustainability, and the related key affordances, i.e., the significant properties each repository offers and maintains. We supplement this with a scan of funder and publisher policies toward software curation and citation, and a summary of key policy resources and guidelines. Using this environmental scan, we discuss a preliminary gap analysis. It is hoped that by addressing these key questions, new insights will be provided into the types of decisions research libraries can expect to make when designing future pilot software curation services."
Infrastructure and practices for data citation have made substantial progress over the last decade. This increases the potential rewards for data publication and reproducible science; however, overall incentives remain relatively weak.
Author's note: This summarizes a presentation given at the *National Academies of Sciences* as part of the [Data Citation Workshop: Developing Policy and Practice](http://sites.nationalacademies.org/pga/brdi/index.htm).
Semantic Grounding Strategies for Tag-based Recommender Systems (dannyijwest)
Recommender systems usually operate on similarities between recommended items or users. Tag-based recommender systems rely on similarities between tags, but tags are mostly free-form, user-entered phrases. Therefore, similarities computed without semantic grounding might lead to less relevant recommendations. In this paper, we study semantic grounding used for tag similarity calculation. We present a comprehensive analysis of the semantic grounding given by 20 ontologies from different domains. Among other things, the study reveals that currently available OWL ontologies are very narrow and the percentage of similarity expansions is rather small. WordNet scores slightly better, as it is broader, but not by much, since it does not support several semantic relationships. Furthermore, the study reveals that even with such a small number of expansions, the recommendations change considerably.
Computing semantic similarity measure between words using web search engine (csandit)
Semantic similarity measures between words play an important role in information retrieval, natural language processing, and various tasks on the web. In this paper, we propose a Modified Pattern Extraction Algorithm to compute a supervised semantic similarity measure between words by combining the page count method and the web snippets method. Four association measures are used to find the semantic similarity between words in the page count method using web search engines. We use Sequential Minimal Optimization (SMO) support vector machines (SVMs) to find the optimal combination of page-count-based similarity scores and top-ranking patterns from the web snippets method. The SVM is trained to classify synonymous and non-synonymous word pairs. The proposed Modified Pattern Extraction Algorithm achieves a correlation value of 89.8 percent.
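One association measure of the page-count kind the abstract mentions is pointwise mutual information computed from search-engine hit counts. The sketch below is illustrative, not the paper's formulation; the web size N of 10^10 pages is an assumed normalization constant.

```python
import math

N = 10**10  # assumed number of indexed pages, used to normalize hit counts

def web_pmi(count_x, count_y, count_xy):
    """Pointwise mutual information between two words from page counts:
    count_x  = hits for word x, count_y = hits for word y,
    count_xy = hits for the conjunctive query "x AND y"."""
    if count_xy == 0:
        return 0.0
    # log2( p(x,y) / (p(x) * p(y)) ) with probabilities estimated as count/N
    return math.log2((count_xy / N) / ((count_x / N) * (count_y / N)))
```

Words that co-occur far more often than chance get a large positive score; scores like this, for several measures, are the features an SVM can then combine.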
Overview of Bibliometrics - IAP Course version 1.1 (Micah Altman)
Whose articles cite a body of work? Is this a high-impact journal? How might others assess my scholarly impact? Citation analysis is one of the primary methods used to answer these questions.
Evolution and state-of-the-art of Altmetric research: Insights from network analysis and altmetric analysis (Aravind Sesagiri Raamkumar)
Authors: Hiran Lathabai, Thara Prabhakaran, Manoj Changat
Workshop Website: http://www.altmetrics.ntuchess.com/AROSIM2018/
This poster presents referencing services that link bibliographic papers and citations with existing Linked Open Data. It aims to convert the bibliographic data currently held in various digital library databases into semantic bibliographic data, enabling research profiling and intelligent knowledge discovery.
Provenance and social network analysis for recommender systems: a literature... (IJECEIAES)
Recommender systems (RS) and their scientific approaches have become very important because they help scientists find suitable publications, customers find adequate items, tourists find their preferred points of interest, and so on across many domains. This work presents a literature review of approaches and of the influence that social network analysis (SNA) and data provenance have on RS. The aim is to analyze differences and similarities along several dimensions, public datasets for assessing impacts and limitations, and evaluation methods and metrics together with their challenges, identifying the most efficient approaches, the most appropriate assessment datasets, and the most appropriate assessment methods and metrics. By correlating these three fields, a system can improve its recommendations by choosing those that come from the most trusted nodes and resources within a social network. We found that content-based filtering techniques combined with term frequency-inverse document frequency (TF-IDF) features are the most feasible approaches when combined with provenance, since our focus is on recommending the most trusted items, where trust, distrust, and ignorance are calculated as weights on the relationships between nodes in a network.
Running Head DESCRIPTIVE STATISTICS COMPUTING .docx (todd271)
Computing descriptive statistics
Haukoos, J. S., & Lewis, R. J. (2005). Advanced statistics: Bootstrapping confidence intervals for statistics with "difficult" distributions. Academic Emergency Medicine, 12(4), 360-365.
This article by Haukoos and Lewis describes how to use confidence intervals in reporting research results, a practice the authors note has increased in use and become a requirement of many scientific journal editors. The article surveys resources describing methods for computing confidence intervals for descriptive statistics whose distributions are difficult to represent mathematically, which is a challenging task. It is relevant to the topic in that it describes these methods along with the resources that document them. The authors propose the bootstrap technique, which they argue allows a researcher to make inferences from data without strong assumptions about the distribution of the data or of the statistic under calculation. The article's strengths include describing the bootstrapping concept and demonstrating how to estimate confidence intervals for the median and for the Spearman rank correlation coefficient when data are not normally distributed. Its weakness is that it does not describe how to compute descriptive statistics in general but narrows its scope to computing confidence intervals for them. The article takes a qualitative approach, based on a clinical study, and uses two commonly used statistical software packages, Stata and SAS, in its discussion of the limitations of the bootstrap.
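A percentile-bootstrap confidence interval for a median, the kind of computation the article describes, can be sketched as follows; the data and parameter choices are illustrative, not drawn from the article.

```python
import random
import statistics

def bootstrap_ci_median(data, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the median:
    resample with replacement, take each resample's median, and read
    the interval off the sorted bootstrap distribution."""
    rng = random.Random(seed)
    medians = sorted(
        statistics.median(rng.choices(data, k=len(data)))
        for _ in range(n_boot)
    )
    lo = medians[int(n_boot * alpha / 2)]
    hi = medians[int(n_boot * (1 - alpha / 2)) - 1]
    return lo, hi

skewed = [1, 1, 2, 2, 3, 3, 4, 5, 8, 20]  # clearly not normally distributed
lo, hi = bootstrap_ci_median(skewed)
```

No distributional assumption is made anywhere, which is exactly the advantage the authors claim for the bootstrap over formula-based intervals.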
R Core Team. (2013). R: A language and environment for statistical computing.
The document by Andy Bunn and Mikko Korpela describes the basic features of the Dendrochronology Program Library in R (dplR), a package used by dendrochronologists to handle the processing and analysis of tree-ring data. The document follows the basic steps an analyst takes when working with a new tree-ring data set. The vignette begins by describing how to read in ring widths and how to plot them. It is relevant to this study as it describes a number of available detrending methods and shows how to extract basic descriptive statistics. It also shows how to build and plot a simple mean-value chronology from the detrended ring widths, along with the expressed population signal, as a way of carrying out more complicated computation with dplR.
CITATION SEMANTIC BASED APPROACHES TO IDENTIFY ARTICLE QUALITY (cscpconf)
Analyzing the structure of research articles and the relationships between articles has always been important in bibliometrics. One method for analyzing articles is by their citations. Normally, researchers rank an article by its citation count, but this purely quantitative evaluation is not sufficient because of citation types such as random citation and guest citation. Our paper aims to provide a new method to rate a research paper based on its citation quality. To find citation quality, three semantics-related processes are used: citation classification, citation sentiment analysis, and content relevancy. This work analyzes some of the challenges found in citing a research paper. We propose methods to identify the valid citations of a research paper and thereby determine its quality.
This is the preliminary version of the JASIST submission
Applying weighted PageRank to author
citation networks
Ying Ding
School of Library and Information Science, Indiana University,
1320 East 10th Street, Herman B Wells Library, LI025, Bloomington, IN 47405, USA
Tel: (812) 855 5388, Fax: (812) 855 6166, Email: dingying@indiana.edu
Abstract
This paper aims to identify whether different weighted PageRank algorithms can be applied to
author citation networks to measure the popularity and prestige of a scholar from a citation
perspective. Information Retrieval (IR) was selected as a test field and data from 1956-2008
were collected from Web of Science (WOS). Weighted PageRank with citation and publication
counts as weight vectors was calculated on the author citation networks. The results indicate that both
popularity rank and prestige rank were highly correlated with the weighted PageRank. Principal
Component Analysis (PCA) was conducted to detect relationships among these different
measures. For capturing prize winners within the IR field, prestige rank outperformed all the
other measures.
Introduction
The word “popularity” is derived from the Latin word popularis, and its meaning evolved
from “belonging to the people” to “being beloved by the people.” Prestige suggests “important
popularity,” which may be described as “reputation”, “esteem”, “high standing among others”,
or “dazzling influence.”
Bibliometrically, the popularity of a researcher can be measured by the
number of citations he has accumulated, and his prestige can be calibrated by the number of
citations from highly cited publications (see Figure 1, Ding & Cronin, 2010 forthcoming).
Researchers can be popular but not necessarily prestigious or vice versa. For example, an author
of an article introducing trendy topics in one field can be cited by many young researchers who
are relatively new to the field, but may not be cited by domain experts. In contrast, a researcher
of a seminal paper introducing innovative methods may be highly appreciated by domain experts,
but not laymen. Prestige, therefore, indicates important popularity.
Figure 1. Prestige (cited by highly cited papers) and Popularity (cited by normal papers).
Internet search engines need to distinguish websites linked to by ordinary websites from
those linked to by important websites (e.g., the Google, Yahoo, IBM, and Microsoft websites);
they face the same popularity-versus-prestige issue when identifying prestigious websites.
PageRank, invented by Sergey Brin and
Lawrence Page, assumes that the importance of any website can be judged by the websites that
link to it, an idea coincidentally motivated by citation analysis. Brin and Page (1998)
emphasized the important link between PageRank and citation analysis, stating that “Academic
citation literature has been applied to the web, largely by counting citations or backlinks to a
given page. This gives some approximation of a page’s importance or quality. PageRank extends
this idea by not counting links from all pages equally, and by normalizing by the number of links
on a page” (p. 109). Google uses the PageRank algorithm to rank websites by counting not only
the number of hyperlinks a website receives, but also how many of those hyperlinks come from
important websites.
In citation analysis, the number of citations reflects the impact of a scientific publication.
This measurement does not differentiate the importance of the citing papers: a citation coming
from an obscure paper has the same weight as one from a groundbreaking, highly cited work
(Maslov & Redner, 2008). Pinski and Narin (1976) were the first scholars to note the
difference between popularity and prestige in the bibliometric area. They proposed using the
eigenvector of a journal citation matrix (i.e., similar to PageRank) corresponding to the principal
eigenvalue to represent journal prestige. Bollen, Rodriguez and Van de Sompel (2006) defined
journal prestige and popularity, and developed a weighted PageRank algorithm to measure them.
They defined popular journals as those journals cited frequently by journals with little prestige,
and prestigious journals as those journals cited by highly prestigious journals. Their definitions
are recursive. Recently, Ding and Cronin (2010 forthcoming) extended this approach to authors
and applied weighted citation counting methods to calculate researcher prestige in the field of
information retrieval. They defined the popularity of a researcher as “the number of times he is
cited (endorsed) in total, and prestige as the number of times he is cited by highly cited papers.”
The main concept behind this prestige measure is to use simple citation counting but give more
weights to highly cited papers.
Since scholarly activities form complex networks in which authors, journals, and papers are
connected through citation and co-occurrence relationships, the network topology can significantly influence
the impact of an author, journal, or paper. The recent development of large-scale networks and
the success of PageRank demonstrate the influence of the network topology on scholarly data
analysis. PageRank or weighted PageRank have performed well in representing the prestige of
journals (Bollen, Rodriguez, & Van de Sompel, 2006; Falagas, Kouranos, Arencibia-Jorge, &
Karageorgopoulos, 2008), while only a few researchers have applied this concept to authors
(Radicchi, Fortunato, Markines & Vespignani, 2009; Zyczkowski, 2010). This paper is built upon
the effort of Ding and Cronin (2010 forthcoming) and clearly addresses this need by testing
whether PageRank or weighted PageRank algorithms applied in author citation networks can be
used to denote the popularity and prestige of scholars.
The information retrieval (IR) field was chosen as the test field, and 15,370 papers with 341,871
citations covering 1956-2008 from WOS were collected. Weighted PageRank algorithms were
applied in author citation networks and were compared with related measures such as citation
counts, the h-index, and popularity and prestige ranks from Ding and Cronin (2010 forthcoming).
Measures were also tested by comparing the number of award winners included in their top 5, 10,
20, and 50 lists. This paper is organized as follows. Section 2 discusses related work on
popularity and prestige in the bibliometric setting; Section 3 explains the methods used in this
study and proposes weighted PageRank algorithms; Section 4 presents and compares original
and weighted PageRanks with popularity and prestige ranks from Ding and Cronin (2010,
forthcoming), tests the correlations of different measures, and evaluates the different coverage of
IR field prize winners for these measures. Finally, Section 5 draws the conclusion and addresses
areas of future research.
Related Work
In bibliometrics, the impact factor is widely used to measure journal prestige (Garfield, 1999;
Bordons, Fernandez, & Gomez, 2002; Harter & Nisonger, 1997; Nederhof, Luwel, & Moed,
2001). This same principle has been extended to measure the impact of web spaces (Smith, 1999;
Thelwall, 2001). The h-index is used to assess the performance of researchers (Hirsch, 2005;
Cronin & Meho, 2006), and different h-index variations have been proposed and studied in
recent years (Jin, Liang, Rousseau, & Egghe, 2007; Sidiropoulos, Katsaros, & Manolopoulos,
2007). Redner (1998) measured the popularity of scientific papers based on the distribution of
citations. However, these previous measures simply use citation counts, and do not distinguish
citations coming from different authors, nor consider the network features of citing behavior.
Network features of citations can be important for differentiating the impact of authors, journals,
and papers. Bollen, Rodriguez, and Van de Sompel (2006) proposed the notion of popular journals
and prestigious journals. They used a weighted PageRank algorithm to measure journal prestige.
They argued that the ISI Impact Factor (ISI IF) is a metric of popularity rather than of prestige.
They found significant discrepancies between PageRank and ISI IF in terms of measuring the
journal status at their top ranks. Franceschet (2010) conducted a thorough bibliometric analysis
to identify the difference between popularity and prestige of journals in science and social
science. He used the five-year impact factor to represent the popularity of journals and the
eigenfactor metric for the prestige of journals. He found various diverging ranks in measuring
the popularity and prestige of science and social science journals.
PageRank provides a computationally simple and effective way to identify important nodes
in connected graphs (Maslov & Redner, 2008; Falagas, Kouranos, Arencibia-Jorge, &
Karageorgopoulos, 2008). The weighted PageRank is a further extension of PageRank that adds
weights to different parts of the PageRank formula. Radicchi, Fortunato, Markines, and
Vespignani (2009) proposed a weighted PageRank algorithm on a directed weighted author
citation network for ranking scientists by considering the diffusion of their scientific credits.
Zyczkowski (2010) proposed weighting factors to measure an author's impact by normalizing
eigenvectors with the largest absolute eigenvalue for author citation networks. Xing and
Ghorbani (2004) added weights to the links based on their reference pages, differentiating
between inbound and outbound link weights. Their simulation results showed that weighted
PageRank outperforms the original PageRank in terms of returning a larger number of relevant
pages to a given query. Aktas, Nacar, and Menczer (2006) proposed a weighted PageRank
algorithm based on user profiles by adding weights to certain Internet domains preferred by users.
Yu, Li, and Liu (2004) added a temporal dimension to the PageRank formula by weighting each
citation by date. Walker, Xie, Yan, and Maslov (2007) proposed the CiteRank algorithm, which
introduces two parameters: the inverse of the average citation depth and a time constant that
biases the ranking toward more recent publications. Liu, Bollen, Nelson, and Van de Sompel (2005) defined
AuthorRank as a modification of PageRank that considers link weights among the co-authorship
links.
Methodology
Data Collection
Information retrieval (IR) was selected as the testing field. Papers published between 1956 and
2008, along with their citations, were collected from Web of Science (WOS) (for more details, see
Ding & Cronin, 2010 forthcoming). In total, 15,370 papers with 341,871 citations were collected.
The 15,370 papers did not include book reviews, software or database reviews. The citation
records contained the first author, year, source, volume, and page number. The whole dataset
was divided into four sets based on the time span: Phase 1 (1956-1980), Phase 2 (1981-1990),
Phase 3 (1991-2000) and Phase 4 (2001-2008). Figure 2 shows the publication distribution
among these four phases.
Figure 2. IR publications
PageRank and Weighted PageRank
The PageRank algorithm can be defined by the following equation:

    PR(p_i) = (1 - d)/N + d * Σ_{p_j ∈ M(p_i)} PR(p_j)/L(p_j)        (1)

where p_1, p_2, ..., p_N are the nodes in the network and N is the total number of nodes, M(p_i) is the
set of nodes that link to p_i, L(p_j) is the sum of the weights of the out-going links of node p_j, and
PR(p_i) is the probability that the random surfer is on node p_i (and likewise for PR(p_j)). d is the
damping factor, i.e., the probability that a random surfer will follow one of the links on the present
page. In this paper, the damping factor was set to 0.15 (to stress the equal chance of being cited),
0.5 (to indicate that scientific papers usually follow a short path of 2), or 0.85 (to stress the
network topology) (Chen, Xie, Maslov, & Redner, 2007).
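The iteration behind equation (1) can be sketched in a few lines of Python on a toy, unweighted author citation graph (the study itself used a MATLAB program; this illustrative version ignores dangling nodes, and in the weighted author network L(p_j) would be the out-link weight sum rather than the out-degree):

```python
def pagerank(links, d=0.85, iters=100):
    """Power iteration for equation (1).

    links maps each node to the set of nodes it cites (its out-links);
    d is the probability of following a link, and with probability
    1 - d the surfer jumps to a uniformly chosen node.
    """
    nodes = sorted(set(links) | {v for tgts in links.values() for v in tgts})
    n = len(nodes)
    pr = {p: 1.0 / n for p in nodes}               # uniform start
    for _ in range(iters):
        pr = {p: (1 - d) / n
                 + d * sum(pr[q] / len(tgts)        # PR(p_j) / L(p_j)
                           for q, tgts in links.items() if p in tgts)
              for p in nodes}
    return pr

# Toy author citation network: A and B both cite C; C cites A.
ranks = pagerank({"A": {"C"}, "B": {"C"}, "C": {"A"}})
```

As expected, C (cited by two authors) ends up ranked above A (cited only by C), which ranks above the uncited B.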
The weighted PageRank was proposed as:

    PR_w(p_i) = (1 - d) * W(p_i)/Σ_{j=1..N} W(p_j) + d * Σ_{p_j ∈ M(p_i)} PR_w(p_j)/L(p_j)        (2)

where W(p_i) is the weight assigned to node p_i and Σ_{j=1..N} W(p_j) is the sum of the weights
assigned to all nodes in the network; PR_w(p_i) is the weighted PageRank score of p_i (and likewise
for PR_w(p_j)). In the weighted PageRank formula, therefore, nodes do not have an equal chance of
being visited by the random surfer: nodes with high weights have a higher chance. The weight of a
node can be the number of citations it has received, the number of publications on which it appears
as first author, or its h-index. In this paper, two weight vectors were considered: the citation counts
of the nodes and the publication counts of the nodes. Nodes with a large number of citations or
publications therefore have a high probability of being visited by the random surfer.
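A minimal sketch of equation (2), modifying the teleport term so that the surfer jumps to each node in proportion to its weight; the toy graph and citation counts are hypothetical:

```python
def weighted_pagerank(links, w, d=0.85, iters=100):
    """Power iteration for equation (2): the teleport term of each node
    is proportional to its weight W(p_i) -- e.g. its citation count
    (PR_c) or publication count (PR_p) -- instead of the uniform 1/N."""
    nodes = sorted(set(links) | {v for t in links.values() for v in t})
    total_w = float(sum(w.get(p, 0) for p in nodes))
    pr = {p: w.get(p, 0) / total_w for p in nodes}
    for _ in range(iters):
        pr = {p: (1 - d) * w.get(p, 0) / total_w
                 + d * sum(pr[q] / len(tgts)
                           for q, tgts in links.items() if p in tgts)
              for p in nodes}
    return pr

links = {"A": {"C"}, "B": {"C"}, "C": {"A"}}
cites = {"A": 10, "B": 1, "C": 5}              # hypothetical citation counts
wpr = weighted_pagerank(links, cites, d=0.15)  # low d stresses the weights
```

With d = 0.15 the weight vector dominates, so the heavily cited A outranks C even though C receives more in-links in the toy graph, matching the paper's use of d = 0.15 to stress the chance of being cited.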
The author citation network is a directed and weighted graph where nodes represent authors,
edges represent citing relationships from author A to author B, and edge weights represent the
number of times that author A cites author B. For each phase, papers without any reference were
ignored. Four author citation networks were built: a 5,395 × 5,395 matrix in Phase 1, a
5,978 × 5,978 matrix in Phase 2, a 33,483 × 33,483 matrix in Phase 3, and a 61,013 × 61,013 matrix
in Phase 4. The original PageRank (denoted PR) and weighted PageRank (denoted PR_c or
PR_p where citation or publication, respectively, is the weighted vector) were calculated based
on these author citation networks to see whether they can measure the popularity and prestige of
authors. The original and weighted PageRank scores were calculated using a PageRank MATLAB
program with damping factors of 0.15, 0.5, and 0.85.
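The construction of the directed, weighted author citation network described above can be sketched as follows; the author pairs are hypothetical stand-ins for the first-author citation records extracted from WOS:

```python
from collections import defaultdict

# Hypothetical (citing first author, cited first author) pairs; in the
# study these come from the WOS citation records of each phase.
records = [
    ("Croft WB", "Salton G"),
    ("Robertson SE", "Salton G"),
    ("Salton G", "Robertson SE"),
    ("Salton G", "Robertson SE"),
]

# Directed weighted author citation network:
# weight[A][B] = number of times author A cites author B.
weight = defaultdict(lambda: defaultdict(int))
for citing, cited in records:
    weight[citing][cited] += 1

# Out-strength L(p_j): sum of the weights of p_j's out-going links,
# as used in the denominators of equations (1) and (2).
out_strength = {a: sum(tgt.values()) for a, tgt in weight.items()}
```

In the study itself one such matrix is built per phase, with papers lacking references dropped beforehand.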
For each phase, the top 20 and the top 100 highly cited authors were chosen to test the
variations of their PageRanks and weighted PageRanks with their popularity ranks and prestige
ranks, and to identify the correlation variation at different ranking levels (top 20 vs. top 100).
Spearman’s rank correlation test (two-tailed) was used to calculate the correlations among the
ranks of these measures for the top 20 or top 100 highly cited authors respectively. Two
indicators (the h-index and impact factor) were added to measure the 2001-2008 dataset.
Principal Component Analysis (PCA) was conducted to detect relationships among these
different measures. Finally, the coverage of IR prize winners captured by these measures was
compared at the top 5, 10, 20, and 50 ranks.
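The Spearman test used here is available in any statistics package (e.g. scipy.stats.spearmanr); a self-contained sketch, computing rho as the Pearson correlation of average ranks, with hypothetical rankings of five authors:

```python
def average_ranks(xs):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of positions i..j
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Two hypothetical rankings of five authors (1 = top rank).
rho = spearman([1, 2, 3, 4, 5], [2, 1, 3, 5, 4])
```

For these two rankings rho is 0.8; identical rankings give 1 and fully reversed rankings give -1.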
Results
Top 20 Ranks
The ranks of the top 20 authors measured by citation counts, PR(d), PR_c(d), and PR_p(d), where d
is 0.85, 0.5, or 0.15, are shown in the Appendix. The dynamic changes of their ranks were tested in
four different periods. The top 20 prestigious authors are the top 20 authors according to prestige
rank, which was calculated based on how many times the author has been cited by highly cited
papers (for more details, see Ding & Cronin, 2010 forthcoming). Figure 3 shows how the
popularity rank, prestige rank, PageRank, and weighted PageRank changed over the four
phases. For the top 20, prestige rank differed dramatically from the popularity rank, PageRank,
and weighted PageRank.
Figure 3. Ranking of the top 20 prestigious authors based on popularity rank, PageRank, and
weighted PageRank in the four periods (PreRank represents prestige rank and PopRank
represents popularity rank).
Top 100 Ranks
The Spearman correlation test (two-tailed) was applied to the top 100 highly cited authors
to identify correlations among these measures (see Table 1). Popularity rank and prestige rank
were significantly correlated in Phase 1 (r=0.548, p<0.01), Phase 2 (r=0.744, p<0.01), Phase 3
(r=0.662, p<0.01), and Phase 4 (r=0.490, p<0.01). Table 2 summarizes the correlations for the
top 100 highly cited authors. Figure 4 displays the scatter plots of the correlations. In general,
popularity rank had a higher correlation with PR_c (mean r=0.925, p<0.01) than with PR (mean
r=0.702, p<0.01) or PR_p (mean r=0.485, p<0.01), and had the highest correlation with PR_c(.15)
Table 2. Correlations of popularity rank and prestige rank with the original and weighted
PageRank measures for the top 100 highly cited authors in the four phases.

                                       Phase 1            Phase 2            Phase 3            Phase 4
popularity vs. original PR             0.543              0.759              0.662              0.843
popularity vs. weighted PR_c           0.815              0.961              0.949              0.973
popularity vs. weighted PR_p           0.451              0.618              0.509              0.360
highest correlation with popularity    PR_c(.15) (0.866)  PR_c(.15) (0.994)  PR_c(.15) (0.994)  PR_c(.15) (0.996)
prestige vs. original PR               0.188*             0.55               0.302              0.284
prestige vs. weighted PR_c             0.472              0.790              0.691              0.512
prestige vs. weighted PR_p             0.482              0.712              0.637              0.520
highest correlation with prestige      PR_p(.15) (0.496)  PR_c(.85) (0.807)  PR_c(.85) (0.7)    PR_p(.85) (0.585)

Note: * indicates not significantly correlated at the 0.05 significance level.
Figure 4. Scatter plots of popularity rank, prestige rank, original PageRank, and weighted
PageRank. (The x-axis represents the authors, numbered according to popularity rank; the
y-axis represents the ranks.)
For different ranking scales (top 20 vs. top 100), although popularity rank and prestige rank
differed for the top 20 highly cited authors, they were significantly correlated for the top 100
highly cited authors in all four phases. This result shows that the ranking scale matters: in our
case, the discrepancy in the top 20 was larger than in the top 100. Bollen, Rodriguez, and Van de
Sompel (2006) found similar results: the top 10 journals ranked by the ISI impact factor and by
the Y-factor diverged significantly, but the two rankings became significantly correlated as the
scale increased. Chen, Xie, Maslov, and Redner (2007) found a similar phenomenon in physics:
only four papers were common to the top 10 physics papers ranked by PageRank and by citation
counts, but the rankings became significantly correlated as the scale increased.
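This scale effect can be made concrete by counting the overlap between the top k items of two rankings, which typically grows with k; the two orderings below are synthetic, built only to illustrate the point.

```python
# Illustrative sketch of the scale effect: two rankings that disagree near the
# top can still coincide almost completely at a larger scale. Synthetic data.
def top_k_overlap(rank_a, rank_b, k):
    """Number of items common to the top k of both rankings."""
    return len(set(rank_a[:k]) & set(rank_b[:k]))

# Two orderings of the same 100 authors: they agree broadly, but the first
# ten positions are permuted in the second ranking.
rank_by_citations = list(range(100))
rank_by_pagerank = [3, 7, 0, 9, 1, 5, 2, 8, 4, 6] + list(range(10, 100))

print(top_k_overlap(rank_by_citations, rank_by_pagerank, 5))    # prints 3
print(top_k_overlap(rank_by_citations, rank_by_pagerank, 100))  # prints 100
```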
Correlation of Various Indicators
Papers published in Phase 4 and their citations were selected to test the correlations among
different indicators, since 2001-2008 contained the largest number of papers and citations. Two
new indicators were added to the above 11 indicators: the h-index rank and the impact factor
rank (IF rank). The h-index rank was calculated by ranking the h-indexes of the top 100 highly
cited authors from WOS. The IF rank was calculated by first attaching to each citation the
journal impact factor of the citing article, then summing these impact factors over all citations
for each of the top 100 highly cited authors, and finally ranking the authors by the summed
impact factors. The journal impact factor of the citing article is the impact factor of the
corresponding journal in the publication year of the citing article, as journal impact factors vary
yearly. Table 3 shows that prestige rank was most highly correlated with PR_p(.85), which uses
publication counts as the weight vector and stresses the author citation graph topology. The IF
rank had a higher correlation with popularity rank than with prestige rank. The h-index rank was
not significantly correlated with popularity rank, prestige rank, PR_c, PR_p(.85), or PR_p(.5) at
the 0.05 significance level.
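A hedged sketch of the two added indicators: the h-index computation is the standard one (Hirsch, 2005), and the IF score follows the description above, weighting each citation by the citing journal's year-specific impact factor. All names, counts, and impact-factor values are illustrative.

```python
# Hedged sketch of the h-index and the IF score for one author. Synthetic data.
def h_index(citation_counts):
    """Largest h such that the author has h papers with >= h citations each."""
    counts = sorted(citation_counts, reverse=True)
    return sum(1 for i, c in enumerate(counts, start=1) if c >= i)

def if_score(citing_articles, jif):
    """Sum the citing journals' impact factors (in the citing year) over citations."""
    return sum(jif[(journal, year)] for journal, year in citing_articles)

papers = [10, 8, 5, 4, 3]                       # citations per paper (synthetic)
citing = [("JASIST", 2005), ("IPM", 2006), ("JASIST", 2006)]
jif = {("JASIST", 2005): 1.5, ("JASIST", 2006): 1.6, ("IPM", 2006): 1.2}

print(h_index(papers))          # prints 4
print(if_score(citing, jif))    # prints 4.3
```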
Table 3. Correlations of 13 different indicators for the top 100 highly cited authors in 2001-2008.

              PopRank  PreRank  PR(.85)  PR(.5)   PR(.15)  PR_c(.85) PR_c(.5) PR_c(.15) PR_p(.85) PR_p(.5) PR_p(.15) h-index Rank IF Rank
PopRank       1
PreRank       0.49     1
PR(.85)       0.855    0.299    1
PR(.5)        0.847    0.269    0.994    1
PR(.15)       0.827    0.236**  0.974    0.992    1
PR_c(.85)     0.947    0.514    0.873    0.84     0.793    1
PR_c(.5)      0.977    0.514    0.877    0.853    0.816    0.991     1
PR_c(.15)     0.996    0.509    0.865    0.852    0.826    0.965     0.989    1
PR_p(.85)     0.43     0.585    0.22**   0.186**  0.155*   0.442     0.437    0.437     1
PR_p(.5)      0.337    0.497    0.135*   0.109*   0.088*   0.33      0.329    0.338     0.982     1
PR_p(.15)     0.312    0.479    0.11*    0.086*   0.068*   0.301     0.302    0.313     0.974     0.998    1
h-index Rank  0.167*   -0.065*  0.188**  0.192**  0.188**  0.144*    0.153*   0.157*    -0.183*   -0.184*  -0.192**  1
IF Rank       0.799    0.552    0.679    0.663    0.634    0.802     0.809    0.809     0.469     0.384    0.357     0.075*       1

Note: * indicates not significantly correlated at the 0.05 significance level; ** indicates not
significantly correlated at the 0.01 significance level.
Principal component analysis (PCA) was conducted on these 13 different measures. Four
components were extracted that explained 87.84% of the total variance (rotation method:
Varimax with Kaiser normalization). In Table 4, representative variables for each component
are highlighted in bold if their loadings were more than 0.4 (Raubenheimer, 2004). Popularity
rank and prestige rank belong to different components. Component 1 can be interpreted as the
popularity dimension, highlighted by popularity rank, PR_c, and IF rank. Component 2
represents the PageRank dimension, highlighted by PR and PR_c(.85). Component 3 represents
the weighted PageRank dimension, highlighted by PR_p. Component 4 represents the prestige
dimension, highlighted by prestige rank, the h-index rank, and IF rank. The h-index rank and IF
rank thus showed a prestige component, with high absolute loadings on Component 4.
Table 4. PCA of 13 different indicators.

              Component
              1       2       3       4
PopRank       .894    .329    .169    -.083
PreRank       .389    .025    .365    .535
PR(.85)       .294    .944    -.044   -.016
PR(.5)        .305    .947    -.058   -.043
PR(.15)       .307    .932    -.064   -.063
PR_c(.85)     .848    .439    .139    .001
PR_c(.5)      .889    .394    .154    -.032
PR_c(.15)     .898    .352    .168    -.062
PR_p(.85)     .247    -.028   .887    .179
PR_p(.5)      .069    -.041   .974    .049
PR_p(.15)     .038    -.047   .972    .065
h-index Rank  .185    .091    -.119   -.849
IF Rank       .587    -.005   -.065   .420
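The extraction step can be sketched as follows; note the paper used PCA with Varimax rotation (as in SPSS), while this minimal sketch shows plain unrotated PCA on a small synthetic matrix standing in for the 13 indicator columns.

```python
# Hedged sketch of PCA component extraction (no Varimax rotation here;
# the paper rotated its components). The data matrix is synthetic: two
# latent factors drive six observed "indicator" columns plus small noise.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 2))
loadings = np.array([[1, 1, 1, 0, 0, 0],
                     [0, 0, 0, 1, 1, 1]], dtype=float)
X = latent @ loadings + 0.1 * rng.normal(size=(100, 6))

pca = PCA(n_components=2)
pca.fit((X - X.mean(0)) / X.std(0))   # standardize, as for a correlation matrix
print(pca.explained_variance_ratio_.sum())  # proportion of variance explained
```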
Evaluation
Table 5 shows the number of IR prize winners contained in the top 5, 10, 20, and 50 ranks of
these 13 different indicators, based on the 2001-2008 dataset. The IR prizes considered are the
Gerard Salton Award and the Tony Kent Strix Award. Overall, prestige rank outperformed all
the other measures at every top-rank category. Thus, when capturing prize winners is important,
prestige rank is the better choice.
Table 5. IR prize winners captured by the 13 indicators (2001-2008).
Top@5 Top@10 Top@20 Top@50
PopRank 2 2 5 8
PreRank 3 7 8 9
PR(.85) 2 2 4 7
PR(.5) 2 2 3 7
PR(.15) 1 2 3 7
PR_c(.85) 2 2 5 8
PR_c(.5) 2 2 6 8
PR_c(.15) 2 2 5 8
PR_p(.85) 1 3 5 6
PR_p(.5) 1 1 4 6
PR_p(.15) 1 1 2 6
h-index Rank 0 1 2 2
IF Rank 2 3 6 7
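The coverage computation behind Table 5 amounts to counting how many known prize winners appear in the top k of each ranking; the names and ranking below are invented for illustration.

```python
# Hedged sketch of the prize-winner coverage evaluation. Synthetic names.
def coverage(ranking, winners, k):
    """Number of prize winners found among the top k of a ranking."""
    return len(set(ranking[:k]) & set(winners))

prize_winners = {"Salton", "Robertson", "Belkin"}       # illustrative set
prestige_ranking = ["Salton", "Robertson", "Smith", "Belkin", "Jones"]

for k in (2, 5):
    print(k, coverage(prestige_ranking, prize_winners, k))
```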
Conclusion
This paper conducted a detailed analysis of popularity and prestige rankings based on data
collected from WOS in IR for the period of 1956-2008. The whole 52-year period was divided
into four phases. A weighted PageRank was proposed and calculated on author citation
networks, using one of two weight vectors: the number of citations or the number of
publications. Different damping factors were tested to stress random citation (d=0.15), network
topology (d=0.85), or the short citation paths typical of scientific papers (d=0.5). For the top 20
ranks, there were various discrepancies among prestige rank, popularity rank, PageRank, and
weighted PageRank in the different time periods. For the top 100, popularity rank and prestige
rank were significantly correlated in all four phases.
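One common way to realize such a weighted PageRank is to replace uniform teleportation with the weight vector, which NetworkX exposes as the `personalization` parameter; this is a hedged sketch of that idea, not necessarily the paper's exact formula, and the graph and weights are illustrative.

```python
# Hedged sketch of a citation-weighted PageRank on a tiny author citation
# graph, carrying the weight vector via NetworkX's personalization argument.
# The paper's exact weighted formulation may differ. Synthetic data.
import networkx as nx

G = nx.DiGraph()
G.add_edges_from([("A", "B"), ("A", "C"), ("B", "C"), ("C", "A")])

citations = {"A": 30, "B": 10, "C": 60}          # weight vector (e.g., for PR_c)
total = sum(citations.values())
teleport = {a: c / total for a, c in citations.items()}

for d in (0.85, 0.5, 0.15):                      # the damping factors tested above
    pr = nx.pagerank(G, alpha=d, personalization=teleport)
    print(d, {a: round(s, 3) for a, s in sorted(pr.items())})
```

At d=0.15 the scores stay close to the weight vector itself; at d=0.85 the graph topology dominates, which mirrors the interpretation of the damping factors in the text.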
Two measures (the h-index rank and IF rank) were added to the above indicators to test the
correlations in the 2001-2008 dataset. Four components were extracted by PCA, representing
popularity, original PageRank, weighted PageRank, and prestige. Popularity rank and prestige
rank fell into different components, indicating that they measure different aspects of scholarly
impact. The number of prize winners in the top 5, 10, 20, and 50 ranks of these 13 measures was
also tested for 2001-2008. Overall, prestige rank outperformed all other indicators at every top
rank category.
In bibliometrics, research impact is measured by different indicators using various counting
methods on different datasets. Thus there are three important methodological components to the
evaluation of scholarly impact: indicators, counting methods, and datasets. Most indicators are
one-dimensional measures of either publications or citations. The recently developed h-index
combines publications with citations. A future trend will be to generate indicators that measure
multi-dimensional units (e.g., publications, citations, and co-authorship collaborations). Counting
methods are generally limited to simple citation counting, yet it is necessary to differentiate a
citation coming from paper "The-Best" from one coming from paper "The-Worst." Adding
weights to citations is therefore important. Finally, citing and co-authoring create relationships
that link data from different datasets and form scholarly networks. Previous work focuses
mainly on homogeneous networks, in which all nodes belong to the same data type. Since
different types of data can be linked, it is meaningful to create new measures or algorithms to
evaluate heterogeneous networks.
Acknowledgement
The author would like to thank the two anonymous reviewers who helped to improve the quality
of this paper dramatically.
References
Aktas, M.S., Nacar, M.A., & Menczer, F. (2006). Using hyperlink features to personalize web search. In
Proceedings of the 6th International Workshop on Knowledge Discovery from the Web (WebKDD 2004),
Seattle, WA.
Bollen, J., Rodriguez, M. A., & Van de Sompel, H. (2006). Journal status. Scientometrics, 69(3), 669-687.
Bordons, M., Fernandez, M. T., & Gomez, I. (2002). Advantages and limitations in the use of impact
factor measures for the assessment of research performance. Scientometrics, 53(2), 195-206.
Brin, S., & Page, L. (1998). The anatomy of a large-scale hypertextual Web search engine. Proceedings
of the seventh international conference on World Wide Web, 107-117.
Chen, P., Xie, H., Maslov, S., & Redner, S. (2007). Finding scientific gems with Google. Journal of
Informetrics, 1, 8-15.
Cronin, B., & Meho, L. I. (2006). Using the h-index to rank influential information scientists. Journal of
the American Society for Information Science and Technology, 57(9), 1275-1278.
Ding, Y., & Cronin, B. (2010 forthcoming). Popular and/or Prestigious? Measures of Scholarly Esteem,
Information Processing and Management (DOI:10.1016/j.ipm.2010.01.002).
Falagas, M. E., Kouranos, V. D., Arencibia-Jorge, R., & Karageorgopoulos, D. E. (2008). Comparison of
SCImago journal rank indicator with journal impact factor. The FASEB Journal, 22, 2623-2628.
Franceschet, M. (2010). The difference between popularity and prestige in the sciences and in the social
sciences: A bibliometric analysis. Journal of Informetrics, 4, 55-63.
Garfield, E. (1999). Journal impact factor: A brief review. Canadian Medical Association Journal,
161, 979-980.
Harter, S. P., & Nisonger, T. E. (1997). ISI's impact factor as misnomer: A proposed new measure to
assess journal impact. Journal of the American Society for Information Science, 48(12), 1146-1148.
Hirsch, J. E. (2005). An index to quantify an individual's scientific research output. Proceedings of the
National Academy of Sciences, 102(46), 16569-16572.
Jin, B., Liang, L., Rousseau, R., & Egghe, L. (2007). The R- and AR-indices: Complementing the h-
index. Chinese Science Bulletin, 52(6), 855-863.
Liu, X., Bollen, J., Nelson, M. L., & Sompel, H. V. (2005). Co-authorship networks in the digital library
research community. Information Processing and Management, 41, 1462-1480.
Maslov, S., & Redner, S. (2008). Promise and pitfalls of extending Google’s PageRank algorithm to
citation networks. Journal of Neuroscience, 28(44), 11103-11105.
Nederhof, A. J., Luwel, M., & Moed, H. F. (2001). Assessing the quality of scholarly journals in
linguistics: An alternative to citation-based journal impact factors. Scientometrics, 51(1), 241-265.
Pinski, G., & Narin, F. (1976). Citation influence for journal aggregates of scientific publications: Theory,
with application to the literature of physics. Information Processing and Management, 12, 297-312.
Radicchi, F., Fortunato, S., Makines, B., & Vespignani, A. (2009). Diffusion of scientific credits and the
ranking of scientists. Phys. Rev. E., 80(5), 056103. DOI: 10.1103/PhysRevE.80.056103
Raubenheimer, J. E. (2004). An item selection procedure to maximize scale reliability and validity. South
African Journal of Industrial Psychology, 30(4), 59-64.
Redner, S. (1998). How popular is your paper? An empirical study of the citation distribution. European
Physics Journal, 4, 131-134.
Sidiropoulos, A., Katsaros, D., & Manolopoulos, Y. (2007). Generalized Hirsch h-index for disclosing
latent facts in citation networks. Scientometrics, 72(2), 253-280.
Smith, A. G. (1999). A tale of two web spaces: Comparing sites using web impact factors. Journal of
Documentation, 55(5), 577-592.
Thelwall, M. (2001). Results from a web impact factor crawler. Journal of Documentation, 57(2),
177-191.
Xie, H., Yan, K. K., & Maslov, S. (2007). Optimal ranking in networks with community structure.
Physica A, 373, 831-836.
Xing, W., & Ghorbani, A. A. (2005). Information domain modeling for adaptive web systems. Web
Intelligence, 10, 684-687.
Yu, P.S., Li, X., & Liu, B. (2004). On the temporal dimension of search. In Proceedings of the 13th
international World Wide Web conference on Alternate track papers & posters, May 17-22, New
York.
Zyczkowski, K. (2010). Citation graph, weighted impact factors and performance indices. Scientometrics,
85(1), 301-315.
Footnotes
1. http://www.etymonline.com/index.php?term=popular
2. http://www.etymonline.com/index.php?search=prestige,
   http://dictionary.reference.com/browse/prestige