UCount: A community-driven approach for measuring Scientific Reputation
Altmetrics Workshop / WebSci 2011
Cristhian Parra, University of Trento, Italy
parra@disi.unitn.it
Context
http://beta.kspaces.net/ic/
http://reseseval.org/
http://liquidjournal.org/
What is Scientific Reputation?
Scientific reputation is the social evaluation (opinion) by the scientific community of a researcher or their contributions, given a certain criterion (e.g. scientific impact).
Main Goal
To understand how and why reputation is formed within and across scientific communities.
Motivation
Science is an economy of reputation [Whitley 2000].
Reputation draws on many signals: bibliometrics, readership, affiliation.
Improve support for decision making.
Experiment #1: Liquid Reputation Surveys
Dataset: top H-index researchers (>200); 8 online surveys, 79 total replies: ICWE (18), BPM (20), VLDB (15), ...
http://reseval.org/survey
http://www.cs.ucla.edu/~palsberg/h-number.html
Correlation Results
Indicators compared with the survey-based reputation: H-index (Palsberg), # publications (DBLP), H-index (script).
[Chart: correlation of each indicator with the survey reputation scores]
Results published in ISSI 2011 and SEBD 2011.
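As a rough illustration of the kind of analysis behind this slide, the snippet below computes Pearson and Spearman correlations between bibliometric indicators and survey-based reputation scores. All values are hypothetical placeholders, not the data from the actual surveys.

```python
# Minimal sketch (not the authors' code): correlating bibliometric indicators
# with survey-based reputation scores for the same set of researchers.
from scipy.stats import pearsonr, spearmanr

h_index      = [45, 60, 32, 51, 70, 38]        # hypothetical H-index values
n_pubs       = [120, 300, 80, 150, 410, 95]    # hypothetical publication counts
survey_score = [3.2, 4.1, 2.8, 3.0, 4.5, 3.6]  # hypothetical mean survey ratings

for name, indicator in [("H-index", h_index), ("#publications", n_pubs)]:
    r, p = pearsonr(indicator, survey_score)
    rho, p_rho = spearmanr(indicator, survey_score)
    print(f"{name}: Pearson r = {r:.2f} (p = {p:.2f}), Spearman rho = {rho:.2f}")
```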
Experiment #2: Position Contests Analysis
(*) http://reclutamento.murst.it/
(**) http://intersection.dsi.cnrs.fr/intersection/resultats-cc-en.do
Results
Surveys: the correlation between bibliometric indicators and reputation always falls in the range (-0.5, 0.5).
Research position contests:
- CNRS dataset: same result as in the surveys.
- Italian dataset: around 50% prediction effectiveness for all metrics (see the sketch after this list).
Bibliometrics alone are not a good descriptor of real reputation.
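The "prediction effectiveness" figure can be read as the fraction of contests in which the candidate with the highest value of a given metric is also the actual winner. A minimal sketch under that assumption, with made-up contest data, could look like this:

```python
# Minimal sketch, assuming "prediction effectiveness" = fraction of contests
# where the candidate with the highest metric value coincides with the winner.
def prediction_effectiveness(contests, metric):
    """contests: list of (candidates, winner); candidates maps name -> metric dict."""
    hits = 0
    for candidates, winner in contests:
        predicted = max(candidates, key=lambda c: candidates[c][metric])
        hits += (predicted == winner)
    return hits / len(contests)

# Hypothetical contest data, not the Italian or CNRS datasets.
contests = [
    ({"A": {"h": 12, "pubs": 40}, "B": {"h": 9, "pubs": 55}}, "A"),
    ({"C": {"h": 20, "pubs": 90}, "D": {"h": 22, "pubs": 60}}, "C"),
]
for metric in ("h", "pubs"):
    print(metric, prediction_effectiveness(contests, metric))
```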
UCount Methodology
Outputs: UCount Scientific Excellence and UCount Reviewer Score.
Challenges
UCount: Eliciting Reputation
Been there: peer-review-based assessment (research position contests) and surveys.
UCount: community-oriented surveys and peer review feedback.
UCount Surveys
List of candidates built from:
- the DBLP coauthorship graph, with affinity computed from shortest path + Jaccard similarity (see the sketch after this list)
- ICST editorial boards
- Palsberg's list of top-H researchers
http://icst.org/UCount-Survey/
http://icst.org/icst-transactions/
http://www.cs.ucla.edu/~palsberg/h-number.html
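The slide does not spell out the exact affinity formula, so the sketch below is only an assumed combination of the two ingredients it names: shortest-path distance in the DBLP coauthorship graph and Jaccard similarity of coauthor sets (using the networkx library; the names and edges are hypothetical).

```python
# Minimal sketch (assumed formulation, not the authors' exact affinity measure).
import networkx as nx

G = nx.Graph()
G.add_edges_from([  # hypothetical coauthorship edges
    ("alice", "bob"), ("bob", "carol"), ("carol", "dave"), ("alice", "carol"),
])

def affinity(g, a, b):
    if not nx.has_path(g, a, b):
        return 0.0
    dist = nx.shortest_path_length(g, a, b)   # hops in the coauthorship graph
    na, nb = set(g[a]), set(g[b])             # coauthor sets of a and b
    jaccard = len(na & nb) / len(na | nb) if (na | nb) else 0.0
    # One possible combination: closer researchers and larger coauthor
    # overlap yield a higher affinity score.
    return jaccard + 1.0 / (1.0 + dist)

print(affinity(G, "alice", "dave"))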
UCount: Derive Reputation Functions
Inputs: survey results and peer review feedback.
Outputs: UCount Scientific Impact and UCount Reviewer Score.
Reverse Engineering of Reputation
Combine features (other features?) so that the resulting ranking is at minimum distance from the elicited reputation (a minimal sketch follows below).
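A minimal sketch of the "minimum distance" idea, assuming it means searching for feature weights whose linear combination ranks researchers as closely as possible to the community-elicited reputation (approximated here by maximizing Spearman correlation; all numbers are hypothetical):

```python
# Minimal sketch: naive random search over feature weights so that the
# weighted combination best matches the elicited reputation scores.
import numpy as np
from scipy.stats import spearmanr

features = np.array([   # rows: researchers, cols: [h-index, #pubs, readership]
    [12, 40, 300],
    [20, 90, 120],
    [ 9, 55, 500],
    [22, 60, 220],
])
elicited = np.array([2.5, 4.0, 3.0, 4.5])  # hypothetical survey-based scores

rng = np.random.default_rng(0)
best_w, best_rho = None, -1.0
for _ in range(5000):                       # naive random search over weights
    w = rng.random(features.shape[1])
    rho, _ = spearmanr(features @ w, elicited)
    if rho > best_rho:
        best_w, best_rho = w / w.sum(), rho
print("weights:", best_w.round(2), "Spearman rho:", round(best_rho, 2))
```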
UCount: Derive Reputation Functions (cont.)
Inputs: survey results and peer review feedback.
Outputs: UCount Scientific Impact, UCount Reviewer Score, and a library of community reputation functions.
Reverse Engineering Approaches
Decision trees: no tree reached more than 60% accuracy (a minimal decision-tree sketch follows below).
Unsupervised methods: genetic algorithms applied to the CNRS dataset improved correlation by 15% on average (running for only 5 minutes), with strongly improved correlation for the fields Research Management and Politics.
Next: apply further machine learning techniques, explore other techniques (e.g. neural networks), and obtain other types of features (e.g. keynotes, advisory networks).
http://code.google.com/p/revengrep/
https://github.com/cdparra/melquiades/
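For the decision-tree line of work, a sketch like the following illustrates the kind of experiment behind the "60% accuracy" figure: a tree predicting a coarse reputation class from bibliometric features, scored with cross-validation. The data is invented, and this is not the revengrep/melquiades code.

```python
# Minimal sketch (hypothetical data, not the CNRS/Italian datasets): a decision
# tree predicting a researcher's reputation class from bibliometric features,
# evaluated with cross-validated accuracy.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X = np.array([  # [h-index, #publications, readership] -- made-up values
    [12, 40, 300], [20, 90, 120], [9, 55, 500], [22, 60, 220],
    [15, 70, 150], [5, 20, 50],  [30, 120, 400], [18, 80, 260],
])
y = np.array([0, 1, 0, 1, 1, 0, 1, 0])  # hypothetical reputation class (low/high)

tree = DecisionTreeClassifier(max_depth=3, random_state=0)
scores = cross_val_score(tree, X, y, cv=4)   # accuracy per fold
print("mean accuracy:", scores.mean().round(2))
```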


Editor's Notes

  • #2 Good afternoon everyone. My name is Cristhian Parra, and today I will present the work we are pushing forward in Trento to first capture and then estimate reputation in academia.
  • #4 The most basic definition of reputation is the following: reputation (in this case, scientific) is the social evaluation by a group of entities (the scientific community) of a person, group of persons, organization, or artifact (researchers and their contributions in this case) on a certain criterion (most frequently, scientific impact). And why is this of any importance?
  • #5 With this title, we want to refer to the two main elements of the proposal. The first element is "understanding", which refers to the main goal of the proposal: to understand the way reputation is formed within and across scientific communities. Very few people will doubt the reputation of people like Einstein in physics, Turing in CS, or more recently Aho in CS (famous to us students for his Dragon Book). Their good reputation is safe, in a way. Yet few people know how to precisely explain why this happens, or what exactly makes researchers hold such a good opinion of some of their peers. This leads us to the second element of our proposal, related to the fundamental problem we need to solve in order to reach the goal: reverse engineering scientific reputation. How can we derive the main aspects that affect the reputation of researchers in the minds of people?
  • #6 Because science is basically an economy of reputation, where the reward for contributing to science is fundamentally building up your reputation. And this reputation is mainly based on your scientific impact, a multi-dimensional construct that cannot be adequately measured by any single indicator [9]. It might depend on features ranging from citation-based bibliometrics to newer web-based readership or download counts, Twitter counts, or simply the reputation of your affiliation or collaborators. These features can be both objective (e.g. bibliometrics) and subjective (e.g. affiliation) criteria, and they are highly dependent on the communities: some communities might be more or less subjective than others. Researchers will understand the criteria behind their own reputation. Researchers will also understand how this reputation varies across communities. All this understanding will help to ease the pressure of the publish-or-perish culture. In general, it will improve support for decision making in evaluation processes.
  • #8 Weak positive linear dependence w.r.t. H-index (with self-citations); medium positive linear dependence w.r.t. number of publications.
  • #11 Institute for Computer Sciences, Social Informatics and Telecommunications Engineering (ICST)
  • #12 Measure the difference in reputation across different communities. Validation of results. The challenges are basically the following. First, we need to get reputation information; that is, we need to know the opinion researchers have about other researchers. Second, we need to understand the features that characterize researchers or their work in computer science. Examples of features are indicators such as the total number of publications and other information that can give an idea of the quality of the work of a scientist (e.g. keynote talks, awards, grants, affiliation, etc.). Then, we need to find a way of representing and "collecting" these features; that is, we need to crawl the web, academic libraries, search engines, etc., looking for this information. Once we have all the data, the next step is to effectively "derive" and "represent" the reputation logic behind a particular ranking. And finally, the big challenge is to validate the work: to measure how much our derived reputation algorithms can actually help researchers make better decisions.
  • #16 Possible examples of combinations: one single feature with the highest correlation to reputation (e.g. H-index for Databases, readership for Social Informatics); a linear combination of features; a complex logic algorithm (e.g. a decision tree).
  • #20 Now, I'm sure that you are all thinking: "Why do we want to do this?" Yes, and no.
  • #26 Researchers will understand the criteria behind their own reputation, allowing them to know what really matters when it comes to research impact, i.e. which indicators contribute most to the researchers' opinion of reputation. Researchers will also understand how this reputation varies across communities, giving an important input for the always difficult problem of cross-community comparisons. This understanding will be built on data sources that include traditional but also social indicators (e.g. LiquidPub, CiteULike, Mendeley, etc.), which means that our results will naturally extend metrics beyond citations, helping to identify ways to measure scientific reputation in accurate terms (i.e. closer to the real opinion of people). All this understanding will help to ease the pressure of the publish-or-perish culture and allow scientists to better focus on what is really important.
  • #27 In our case, because we want to analyze reputation in the context of science, we need to understand research evaluation, because in order to come up with an opinion about a peer in science, what we do is evaluate them. In research evaluation, not only researchers are the subject of evaluation, but also their contributions (papers), the dissemination means such as journals and conferences, and the institutions. To do so, we have been using two main methods: committees (such as those of peer review) and quantitative analysis (such as bibliometric indicators).