A Model of the Scholarly Community
A Model of the Scholarly Community Presentation Transcript

  • 1. A Model of the Scholarly Community Marko A. Rodriguez http://www.soe.ucsc.edu/~okram March 30, 2007
  • 2. MESUR Project
    • 2-year project
    • The first half of the project focuses on ontology development, parsing, and the development of analysis algorithms (metrics).
    • The second half of the project covers the analysis of our data structure and the reporting of our findings in the literature.
  • 3. Outline
    • The Data: which and how much data?
    • The Model: how do we represent the data?
    • The Metrics: how do we quantify the entities in our model?
  • 4. Terminology
    • Groups: journals, proceedings, magazines, newspapers, edited books.
    • Units: articles, book chapters, dissertations.
    • Documents: groups and units.
    • Usage-Event: the act of interacting with an article (e.g., getFullText, getAbstract, getReferences); an expression of user interest.
  • 5. The Data
    • Two types of data:
      • Bibliographic data: metadata pertaining to groups and units.
      • Usage data: metadata pertaining to group and unit usage.
  • 6. The Data
    • Group-level Bibliographic Data
      • SFX Master List: > 300,000 groups
      • SFX: > 85,000 group classifications
      • ISI JCR: > 8,000 indexed groups
      • ISI JCR: > 50,000,000 group citations
      • ISI JCR: > 100,000 group classifications
    • Unit-level Bibliographic Data
      • ISI Tapes: > 30,000,000 unit records
      • ISI Tapes: > 500,000,000 unit citations
  • 7. The Data
    • Usage Data
      • Los Alamos: > 400,000 (1 year)
      • BioMed Central: > 24,000,000 (2 years)
      • anonymous: > 1,000,000 (5 years)
      • anonymous: > 2,500,000 (1 year)
      • anonymous: > 50,000,000 (1 week)
  • 8. The Data
    • The semantic network model is estimated to be > 10 billion triples (edges).
      • As of March 2007: 1.2 billion.
  • 9. The Model
    • In order to integrate the various data sets in their various formats, we model all information according to an ontology.
  • 10. The Model
    • RDF, RDFS, OWL [W3C Standards]
      • Resource Description Framework
      • Resource Description Framework Schema
      • Web Ontology Language
    • These standards provide a common language in which to represent our entities and their relationships to one another.
  • 11. The Model
    • In OWL, everything is an owl:Thing, both nodes and edges (analogous to java.lang.Object in Java).
    • All owl:Things are represented by a URI.
    • An instance of the ontology provides us with a URI triple list data structure:
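
A triple list of this kind can be sketched in Python as a list of (subject, predicate, object) tuples of URI strings. This is only an illustration: the `urn:example:` URIs and the `outgoing` helper are hypothetical and are not taken from the MESUR ontology.

```python
from typing import List, Tuple

# A triple is (subject, predicate, object); every element is a URI string.
Triple = Tuple[str, str, str]

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical URIs, for illustration only.
triples: List[Triple] = [
    ("urn:example:article1", RDF_TYPE, "urn:example:Unit"),
    ("urn:example:journal1", RDF_TYPE, "urn:example:Group"),
    ("urn:example:article1", "urn:example:partOf", "urn:example:journal1"),
]


def outgoing(store: List[Triple], subject: str) -> List[Triple]:
    """Return every triple (edge) whose subject is the given URI."""
    return [t for t in store if t[0] == subject]
```

Each tuple is one edge of the semantic network, so the whole network is recoverable from the flat list alone.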
  • 12. The Model
    • The instance of an OWL ontology resides in a triple store.
  • 13. The Model
    • SPARQL (like SQL, but for triple stores).
    SELECT ?c as grandparent WHERE ( ?a childOf ?b) ( ?b childOf ?c )
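
To make the pattern matching behind this query concrete, here is a minimal Python sketch of the same two-pattern join over an in-memory triple list. It is not how a triple store evaluates SPARQL; the `grandparents` function and the sample names are assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def grandparents(store: List[Triple], child_of: str = "childOf") -> Set[str]:
    """Return every ?c such that (?a childOf ?b) and (?b childOf ?c)."""
    # Index the childOf edges: child -> set of parents.
    parents: Dict[str, Set[str]] = {}
    for s, p, o in store:
        if p == child_of:
            parents.setdefault(s, set()).add(o)

    result: Set[str] = set()
    for bs in parents.values():
        for b in bs:
            # Follow a second childOf edge from ?b to reach ?c.
            result |= parents.get(b, set())
    return result


# Hypothetical sample data.
family = [("carol", "childOf", "bob"), ("bob", "childOf", "alice")]
print(grandparents(family))
```

The shared variable ?b is what joins the two patterns; in the sketch it is the dictionary lookup `parents.get(b, ...)`.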
  • 14. The Model
    Rodriguez, M.A., Bollen, J., Van de Sompel, H., “A Practical Ontology for the Large-Scale Modeling of Scholarly Artifacts and their Usage”, IEEE/ACM Joint Conference on Digital Libraries, Vancouver, 2007.
  • 15. The Model
  • 16. The Model
  • 17. The Model
    • From the Publishes contexts, generate a weighted coauthorship network.
    SELECT ?x WHERE
      ( ?x rdf:type mesur:Publishes )
      ( ?x mesur:hasAuthor lanl:marko )
      ( ?x mesur:hasAuthor lanl:herbertv )
    INSERT < _123 rdf:type mesur:Coauthor >
    INSERT < _123 mesur:hasSource lanl:marko >
    INSERT < _123 mesur:hasSink lanl:herbertv >
    INSERT < _123 mesur:hasWeight COUNT(?x) >
    INSERT < _456 rdf:type mesur:Coauthor >
    INSERT < _456 mesur:hasSource lanl:herbertv >
    INSERT < _456 mesur:hasSink lanl:marko >
    INSERT < _456 mesur:hasWeight COUNT(?x) >
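
The same derivation can be sketched in Python, assuming each Publishes context has already been reduced to its set of author URIs. The `coauthor_weights` function and the `lanl:jbollen` identifier are hypothetical; the slide only shows the pair lanl:marko and lanl:herbertv.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, Iterable, Set, Tuple


def coauthor_weights(publishes: Iterable[Set[str]]) -> Dict[Tuple[str, str], int]:
    """Map each directed (source, sink) author pair to its number of shared Publishes contexts."""
    weights: Counter = Counter()
    for authors in publishes:
        # permutations yields both (a, b) and (b, a), mirroring the two
        # Coauthor contexts (_123 and _456) inserted on the slide.
        for source, sink in permutations(sorted(authors), 2):
            weights[(source, sink)] += 1
    return dict(weights)


# Hypothetical Publishes contexts, each given as its set of authors.
contexts = [
    {"lanl:marko", "lanl:herbertv"},
    {"lanl:marko", "lanl:herbertv", "lanl:jbollen"},
    {"lanl:jbollen"},
]
print(coauthor_weights(contexts))
```

The weight on each edge plays the role of COUNT(?x): the number of Publishes contexts in which both authors appear.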
  • 18. The Model
    • Phase 1 looks only at group-level usage and bibliographic data.
  • 19. The Metrics
    • ISI Impact Factor
    • Usage Impact Factor
      • Bollen, J., Van de Sompel, H., “Usage Impact Factor: The Effects of Sample Characteristics on Usage-based Impact Metrics”, [in review], 2007.
    • H-Index
      • Hirsch, J.E., “An index to quantify an individual's scientific research output”, Proceedings of the National Academy of Sciences, 102:46, 2005.
    • Y-Factor
      • Bollen, J., Rodriguez, M.A., Van de Sompel, H., “Journal Status”, Scientometrics, 69:3, 2006.
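
Of the metrics listed above, the h-index has a compact definition: the largest h such that h of the documents have at least h citations each. A minimal Python sketch follows, assuming the per-document citation counts have already been extracted; the `h_index` function name is an illustrative choice.

```python
from typing import Iterable


def h_index(citation_counts: Iterable[int]) -> int:
    """Return the largest h such that h documents have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            # Counts are sorted in descending order, so no later rank can qualify.
            break
    return h


print(h_index([10, 8, 5, 4, 3]))
```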
  • 20. The Metrics
    • From the Publishes and Citation contexts, generate Impact Factor Rankings.
    SELECT ?x WHERE
      ( ?x rdf:type mesur:Publishes )
      ( ?x mesur:hasUnit ?a )
      ( ?x mesur:hasGroup ?b )
      ( ?b mesur:partOf urn:issn:1082-9873 )
      ( ?x mesur:hasTime ?t ) AND (?t > 2004 AND ?t < 2007)
      ( ?y rdf:type mesur:Citation )
      ( ?y mesur:hasSource ?c )
      ( ?y mesur:hasSink ?a )
      ( ?z rdf:type mesur:Publishes )
      ( ?z mesur:hasUnit ?c )
      ( ?z mesur:hasTime ?u ) AND ?u = 2007
    SELECT ?y WHERE
      ( ?y rdf:type mesur:Publishes )
      ( ?y mesur:hasGroup ?a )
      ( ?a mesur:partOf urn:issn:1082-9873 )
      ( ?y mesur:hasTime ?t ) AND (?t > 2004 AND ?t < 2007)
    INSERT < _123 rdf:type mesur:ImpactFactor >
    INSERT < _123 mesur:hasObject urn:issn:1082-9873 >
    INSERT < _123 mesur:hasStartTime 2007 >
    INSERT < _123 mesur:hasEndTime 2007 >
    INSERT < _123 mesur:hasNumbericValue (COUNT(?x) / COUNT(?y)) >
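
The arithmetic behind these two queries can be sketched in Python. The sketch assumes the Publishes and Citation contexts have been flattened into plain dictionaries and (source, sink) pairs; the `impact_factor` function, its parameters, and the sample identifiers are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def impact_factor(
    pub_year: Dict[str, int],
    journal_of: Dict[str, str],
    citations: Iterable[Tuple[str, str]],
    journal: str,
    year: int,
) -> float:
    """Citations made in `year` to the journal's units from the two prior years,
    divided by the number of those units."""
    window = {year - 2, year - 1}
    # Denominator: units the journal published inside the window.
    citable = {u for u, j in journal_of.items() if j == journal and pub_year[u] in window}
    if not citable:
        return 0.0
    # Numerator: citations whose source was published in `year` and whose sink is citable.
    cites = sum(
        1 for source, sink in citations
        if sink in citable and pub_year.get(source) == year
    )
    return cites / len(citable)


# Hypothetical sample data: units a1..a3 in journal J, citing units c1..c3 in journal K.
pub_year = {"a1": 2005, "a2": 2006, "a3": 2004, "c1": 2007, "c2": 2007, "c3": 2006}
journal_of = {"a1": "J", "a2": "J", "a3": "J", "c1": "K", "c2": "K", "c3": "K"}
citations = [("c1", "a1"), ("c2", "a1"), ("c1", "a2"), ("c3", "a2"), ("c1", "a3")]
print(impact_factor(pub_year, journal_of, citations, "J", 2007))
```

With `year=2007` the window is 2005 to 2006, matching the filter `?t > 2004 AND ?t < 2007` on the slide.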
  • 21. The Metrics
    • Eigenvector-based global-rank metrics such as PageRank, Eigenvector centrality, Y-Factor, and relative-rank ‘spreading activation’ algorithms can be calculated in a similar fashion.
    Rodriguez, M.A., “Grammar-Based Random Walkers in Semantic Networks”, [in review], 2007.
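
As one example of an eigenvector-based global-rank metric, the following Python sketch computes PageRank by power iteration on a small directed graph. It is a generic textbook formulation, not the grammar-based random walker of the cited paper; the `pagerank` function, the damping factor of 0.85, and the journal names are assumptions.

```python
from typing import Dict, List


def pagerank(
    graph: Dict[str, List[str]],
    damping: float = 0.85,
    iterations: int = 100,
) -> Dict[str, float]:
    """Approximate PageRank by power iteration over an adjacency-list graph."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not graph.get(node))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        for source, targets in graph.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical citation graph: an edge points from the citing journal to the cited one.
cites = {"journalA": ["journalC"], "journalB": ["journalC"], "journalC": []}
ranks = pagerank(cites)
print(max(ranks, key=ranks.get))
```

The ranks always sum to 1, so they can be read as the stationary distribution of a random walker over the network.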
  • 22. Conclusion
    • Thanks for your time…Good life.
    http://www.mesur.org