A Model of the  Scholarly Community Marko A. Rodriguez http://www.soe.ucsc.edu/~okram March 30, 2007

  1. A Model of the Scholarly Community. Marko A. Rodriguez, http://www.soe.ucsc.edu/~okram, March 30, 2007
  2. MESUR Project
       • 2-year project.
       • First half of the project is focused on ontology development, parsing, and the development of analysis algorithms (metrics).
       • Second half of the project is the analysis of our data structure and reporting our findings in the literature.
  3. Outline
       • The Data: which and how much data?
       • The Model: how do we represent the data?
       • The Metrics: how do we quantify the entities in our model?
  4. Terminology
       • Groups: journals, proceedings, magazines, newspapers, edited books.
       • Units: articles, book chapters, dissertations.
       • Documents: groups and units.
       • Usage-Event: the act of interacting with an article (e.g. getFullText, getAbstract, getReferences); an expression of user interest.
  5. The Data
       • Two types of data:
           • Bibliographic data: metadata pertaining to groups and units.
           • Usage data: metadata pertaining to group and unit usage.
  6. The Data
       • Group-level Bibliographic Data
           • SFX Master List: > 300,000 groups
           • SFX: > 85,000 group classifications
           • ISI JCR: > 8,000 indexed groups
           • ISI JCR: > 50,000,000 group citations
           • ISI JCR: > 100,000 group classifications
       • Unit-level Bibliographic Data
           • ISI Tapes: > 30,000,000 unit records
           • ISI Tapes: > 500,000,000 unit citations
  7. The Data
       • Usage Data
           • Los Alamos: > 400,000 1-year
           • BioMed Central: > 24,000,000 2-years
           • anonymous: > 1,000,000 5-years
           • anonymous: > 2,500,000 1-year
           • anonymous: > 50,000,000 1-week
           • …
  8. The Data
       • The semantic network model is estimated to be > 10 billion triples (edges).
           • As of March 2007: 1.2 billion.
  9. The Model
       • In order to integrate the various data sets in their various formats, we model all information according to an ontology.
  10. The Model
       • RDF, RDFS, OWL [W3C Standards]
           • Resource Description Framework
           • Resource Description Framework Schema
           • Web Ontology Language
       • Provides us with a standardized language in which to represent our entities and their relationships to one another.
  11. The Model
       • In OWL, everything is an owl:Thing, both nodes and edges (analogous to java.lang.Object in Java).
       • All owl:Things are represented by a URI.
       • An instance of the ontology provides us with a URI triple list data structure:
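A URI triple list of the kind slide 11 describes can be sketched in a few lines of Python. This is only an illustration: the `example.org` namespaces and the `context1` and `article1` identifiers are placeholders invented here, not the real MESUR or LANL URIs. Only the subject, predicate, object shape follows the slide.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One edge of the semantic network: subject --predicate--> object."""
    subject: str
    predicate: str
    obj: str


RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Placeholder namespaces (assumptions for this sketch, not the project's URIs).
MESUR = "http://example.org/mesur#"
LANL = "http://example.org/lanl#"

# Every node and every edge label is a URI string.
triples = [
    Triple(LANL + "context1", RDF_TYPE, MESUR + "Publishes"),
    Triple(LANL + "context1", MESUR + "hasAuthor", LANL + "marko"),
    Triple(LANL + "context1", MESUR + "hasUnit", LANL + "article1"),
]

# Nodes are the subjects and objects; edge labels are the predicates.
nodes = {t.subject for t in triples} | {t.obj for t in triples}
edge_labels = {t.predicate for t in triples}
```

A triple store holds the same structure, only at a much larger scale and with indexes for pattern matching.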
  12. The Model
       • The instance of an OWL ontology resides in a triple store.
  13. The Model
       • SPARQL (like SQL, but for triple stores).

         SELECT ?c as grandparent
         WHERE ( ?a childOf ?b )
               ( ?b childOf ?c )
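The grandparent query on slide 13 is a two-pattern join on the shared variable ?b. A minimal Python sketch of that join over an in-memory triple list might look as follows. The `grandparents` function and the sample `family` data are hypothetical, and the result is returned as a set, so duplicates are dropped.

```python
def grandparents(triples):
    """Return every ?c such that (?a childOf ?b) and (?b childOf ?c)."""
    # Index the childOf edges: child -> set of parents.
    parents_of = {}
    for subject, predicate, obj in triples:
        if predicate == "childOf":
            parents_of.setdefault(subject, set()).add(obj)

    # Join the two patterns on ?b: follow childOf twice.
    result = set()
    for parents in parents_of.values():
        for b in parents:
            result |= parents_of.get(b, set())
    return result


# Made-up sample data for illustration.
family = [
    ("alice", "childOf", "bob"),
    ("bob", "childOf", "carol"),
    ("dave", "childOf", "carol"),
]
print(grandparents(family))  # {'carol'}
```

A real triple store performs the same join using indexes over subjects, predicates, and objects rather than a linear scan.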
  14. The Model
       Rodriguez, M.A., Bollen, J., Van de Sompel, H., “A Practical Ontology for the Large-Scale Modeling of Scholarly Artifacts and their Usage”, IEEE/ACM Joint Conference on Digital Libraries, Vancouver, 2007.
  15. The Model
  16. The Model
  17. The Model

         SELECT ?x
         WHERE ( ?x rdf:type mesur:Publishes )
               ( ?x mesur:hasAuthor lanl:marko )
               ( ?x mesur:hasAuthor lanl:herbertv )

         INSERT < _123 rdf:type mesur:Coauthor >
         INSERT < _123 mesur:hasSource lanl:marko >
         INSERT < _123 mesur:hasSink lanl:herbertv >
         INSERT < _123 mesur:hasWeight COUNT(?x) >

         INSERT < _456 rdf:type mesur:Coauthor >
         INSERT < _456 mesur:hasSource lanl:herbertv >
         INSERT < _456 mesur:hasSink lanl:marko >
         INSERT < _456 mesur:hasWeight COUNT(?x) >

       From the Publishes contexts, generate a weighted coauthorship network.
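The same derivation can be sketched in plain Python, assuming the Publishes contexts have already been read out of the triple store into a mapping from context to author set. The `coauthor_edges` function, the context names, and the author `lanl:johan` are hypothetical. As on the slide, each coauthor pair yields two directed edges, and the weight is the number of Publishes contexts the pair shares.

```python
from collections import Counter
from itertools import permutations


def coauthor_edges(publishes):
    """Map each directed (source, sink) author pair to a shared-context count.

    `publishes` maps a Publishes context id to the set of its authors.
    """
    weights = Counter()
    for authors in publishes.values():
        # permutations(..., 2) yields both (a, b) and (b, a), matching the
        # two Coauthor contexts (_123 and _456) inserted on the slide.
        for source, sink in permutations(sorted(authors), 2):
            weights[(source, sink)] += 1
    return weights


# Made-up sample contexts for illustration.
publishes = {
    "ctx1": {"lanl:marko", "lanl:herbertv"},
    "ctx2": {"lanl:marko", "lanl:herbertv", "lanl:johan"},
    "ctx3": {"lanl:johan"},
}
edges = coauthor_edges(publishes)
print(edges[("lanl:marko", "lanl:herbertv")])  # 2
```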
  18. The Model
       Phase 1 looks only at group-level usage and bibliographic data.
  19. The Metrics
       • ISI Impact Factor
       • Usage Impact Factor
           • Bollen, J., Van de Sompel, H., “Usage Impact Factor: The Effects of Sample Characteristics on Usage-based Impact Metrics”, [in review], 2007.
       • H-Index
           • Hirsch, J.E., “An index to quantify an individual's scientific research output”, Proceedings of the National Academy of Sciences, 102:46, 2005.
       • Y-Factor
           • Bollen, J., Rodriguez, M.A., Van de Sompel, H., “Journal Status”, Scientometrics, 69:3, 2006.
       • …
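Of the metrics listed on slide 19, the H-Index is the simplest to state: an author has index h if h of their papers have at least h citations each. A minimal sketch, assuming the per-paper citation counts have already been extracted from the model (the `h_index` name and the sample counts are illustrative):

```python
def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


print(h_index([10, 8, 5, 4, 3]))  # 4
```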
  20. The Metrics

         SELECT ?x
         WHERE ( ?x rdf:type mesur:Publishes )
               ( ?x mesur:hasUnit ?a )
               ( ?x mesur:hasGroup ?b )
               ( ?b mesur:partOf urn:issn:1082-9873 )
               ( ?x mesur:hasTime ?t ) AND (?t > 2004 AND ?t < 2007)
               ( ?y rdf:type mesur:Citation )
               ( ?y mesur:hasSource ?c )
               ( ?y mesur:hasSink ?a )
               ( ?z rdf:type mesur:Publishes )
               ( ?z mesur:hasUnit ?c )
               ( ?z mesur:hasTime ?u ) AND ?u = 2007

         SELECT ?y
         WHERE ( ?y rdf:type mesur:Publishes )
               ( ?y mesur:hasGroup ?a )
               ( ?a mesur:partOf urn:issn:1082-9873 )
               ( ?y mesur:hasTime ?t ) AND (?t > 2004 AND ?t < 2007)

         INSERT < _123 rdf:type mesur:ImpactFactor >
         INSERT < _123 mesur:hasObject urn:issn:1082-9873 >
         INSERT < _123 mesur:hasStartTime 2007 >
         INSERT < _123 mesur:hasEndTime 2007 >
         INSERT < _123 mesur:hasNumbericValue (COUNT(?x) / COUNT(?y)) >

       From the Publishes and Citation contexts, generate Impact Factor Rankings.
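The two queries on slide 20 amount to a ratio: citations made in 2007 to units the journal published in 2005 and 2006, divided by the number of those units. A minimal Python sketch of that arithmetic follows, assuming the Publishes and Citation contexts have been flattened into simple tuples. The `impact_factor` function, the unit ids, and `urn:issn:0000-0000` are made up for illustration.

```python
def impact_factor(publications, citations, group, year):
    """Citations made in `year` to units `group` published in the two prior years.

    publications: list of (unit, group, publication_year)
    citations:    list of (citing_unit, cited_unit)
    """
    pub_year = {unit: y for unit, _, y in publications}
    # Units of the target group published in year-2 or year-1 (the ?y query).
    window = {
        unit for unit, g, y in publications
        if g == group and year - 2 <= y < year
    }
    if not window:
        return 0.0
    # Citations to those units from units published in `year` (the ?x query).
    cites = sum(
        1 for source, sink in citations
        if sink in window and pub_year.get(source) == year
    )
    return cites / len(window)


# Made-up sample data for illustration.
publications = [
    ("a1", "urn:issn:1082-9873", 2005),
    ("a2", "urn:issn:1082-9873", 2006),
    ("b1", "urn:issn:0000-0000", 2007),
    ("b2", "urn:issn:0000-0000", 2007),
    ("b3", "urn:issn:0000-0000", 2006),
]
citations = [("b1", "a1"), ("b2", "a1"), ("b1", "a2"), ("b3", "a1")]

print(impact_factor(publications, citations, "urn:issn:1082-9873", 2007))  # 1.5
```

The citation from b3 is ignored because b3 was published in 2006, outside the citing year.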
  21. The Metrics
       • Eigenvector-based global-rank metrics such as PageRank, Eigenvector centrality, Y-Factor, and relative-rank ‘spreading activation’ algorithms can be calculated in a similar fashion.
       Rodriguez, M.A., “Grammar-Based Random Walkers in Semantic Networks”, [in review], 2007.
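As an illustration of an eigenvector-based global-rank metric, the sketch below runs standard PageRank by power iteration over a plain list of directed edges, such as a citation or coauthorship network extracted from the model. This is the textbook unweighted algorithm, not the grammar-based random walker of the cited paper. The damping factor of 0.85, the iteration count, and the sample graph are assumptions.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Unweighted PageRank by power iteration over (source, sink) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, sink in edges:
        out_links[source].append(sink)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n
            for node in nodes
        }
        for source in nodes:
            for sink in out_links[source]:
                new_rank[sink] += damping * rank[source] / len(out_links[source])
        rank = new_rank
    return rank


# Made-up sample graph for illustration.
sample_edges = [("A", "B"), ("B", "C"), ("C", "A"), ("D", "A")]
ranks = pagerank(sample_edges)
```

The ranks always sum to 1, and a node with no incoming edges (D in the sample) receives only the baseline share of (1 - damping) / n.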
  22. Conclusion
       • Thanks for your time…Good life.
       http://www.mesur.org