A Model of the Scholarly Community

Transcript

  • 1. A Model of the Scholarly Community Marko A. Rodriguez http://www.soe.ucsc.edu/~okram March 30, 2007
  • 2. MESUR Project
      • 2-year project
      • First half of the project is focused on ontology development, parsing, and the development of analysis algorithms (metrics).
      • Second half of the project is the analysis of our data structure and reporting our findings in the literature.
  • 3. Outline
      • The Data: which and how much data?
      • The Model: how do we represent the data?
      • The Metrics: how do we quantify the entities in our model?
  • 4. Terminology
      • Groups: journals, proceedings, magazines, newspapers, edited books.
      • Units: articles, book chapters, dissertations.
      • Documents: groups and units.
      • Usage-Event: the act of interacting with an article (e.g. getFullText, getAbstract, getReferences); an expression of user interest.
  • 5. The Data
      • Two types of data:
          • Bibliographic data: metadata pertaining to groups and units.
          • Usage data: metadata pertaining to group and unit usage.
  • 6. The Data
      • Group-level Bibliographic Data
          • SFX Master List: > 300,000 groups
          • SFX: > 85,000 group classifications
          • ISI JCR: > 8,000 indexed groups
          • ISI JCR: > 50,000,000 group citations
          • ISI JCR: > 100,000 group classifications
      • Unit-level Bibliographic Data
          • ISI Tapes: > 30,000,000 unit records
          • ISI Tapes: > 500,000,000 unit citations
  • 7. The Data
      • Usage Data
          • Los Alamos: > 400,000 (1 year)
          • BioMed Central: > 24,000,000 (2 years)
          • anonymous: > 1,000,000 (5 years)
          • anonymous: > 2,500,000 (1 year)
          • anonymous: > 50,000,000 (1 week)
          • …
  • 8. The Data
      • The semantic network model is estimated to be > 10 billion triples (edges).
          • As of March 2007: 1.2 billion.
  • 9. The Model
      • In order to integrate the various data sets in their various formats, we model all information according to an ontology.
  • 10. The Model
      • RDF, RDFS, OWL [W3C Standards]
          • Resource Description Framework
          • Resource Description Framework Schema
          • Web Ontology Language
      • These provide us with a standardized language in which to represent our entities and their relationships to one another.
  • 11. The Model
      • In OWL, everything is an owl:Thing, both nodes and edges (analogous to java.lang.Object in Java).
      • All owl:Things are represented by a URI.
      • An instance of the ontology provides us with a URI triple list data structure.
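To make the "URI triple list" idea concrete, here is a minimal sketch in Python. It assumes nothing beyond the slide: the `http://example.org/` URIs and the `objects` helper are hypothetical and are not taken from the MESUR ontology. Only the `rdf:type` URI is the real W3C identifier.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One edge of the semantic network: subject --predicate--> object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical namespace used only for illustration.
EX = "http://example.org/"
# The standard RDF "type" predicate.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Nodes and edges alike are identified by URIs.
triples: List[Triple] = [
    Triple(EX + "article1", RDF_TYPE, EX + "Unit"),
    Triple(EX + "journal1", RDF_TYPE, EX + "Group"),
    Triple(EX + "article1", EX + "partOf", EX + "journal1"),
]


def objects(store: List[Triple], subject: str, predicate: str) -> List[str]:
    """Return every object reachable from `subject` over `predicate`."""
    return [t.obj for t in store
            if t.subject == subject and t.predicate == predicate]


article_groups = objects(triples, EX + "article1", EX + "partOf")
```

A real triple store indexes these tuples so that lookups do not scan the whole list; the flat list here only shows the shape of the data.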
  • 12. The Model
      • The instance of an OWL ontology resides in a triple store.
  • 13. The Model
      • SPARQL (like SQL, but for triple stores).
      SELECT ?c as grandparent
      WHERE ( ?a childOf ?b )
            ( ?b childOf ?c )
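The grandparent query is a two-step join over the triple list: bind ?b from the first pattern, then reuse it as the subject of the second. A rough Python equivalent, assuming an in-memory list of triples and made-up names (`alice`, `bob`, `carol`, `dave`), might look like this:

```python
# Hypothetical data: each tuple is (subject, predicate, object).
triples = [
    ("alice", "childOf", "bob"),
    ("bob", "childOf", "carol"),
    ("dave", "childOf", "bob"),
]


def grandparents(store):
    """Return every ?c such that (?a childOf ?b) and (?b childOf ?c)."""
    # Index the childOf edges: child -> set of parents.
    parents_of = {}
    for subject, predicate, obj in store:
        if predicate == "childOf":
            parents_of.setdefault(subject, set()).add(obj)

    result = set()
    for a, parents in parents_of.items():
        for b in parents:
            # Join step: ?b from the first pattern is the subject of the second.
            result.update(parents_of.get(b, ()))
    return result


found = grandparents(triples)
```

A triple store performs the same join, but over indexed data and for arbitrary patterns rather than one hard-coded predicate.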
  • 14. The Model
      Rodriguez, M.A., Bollen, J., Van de Sompel, H., “A Practical Ontology for the Large-Scale Modeling of Scholarly Artifacts and their Usage”, IEEE/ACM Joint Conference on Digital Libraries, Vancouver, 2007.
  • 15. The Model
  • 16. The Model
  • 17. The Model
      From the Publishes contexts, generate a weighted coauthorship network.
      SELECT ?x
      WHERE ( ?x rdf:type mesur:Publishes )
            ( ?x mesur:hasAuthor lanl:marko )
            ( ?x mesur:hasAuthor lanl:herbertv )
      INSERT < _123 rdf:type mesur:Coauthor >
      INSERT < _123 mesur:hasSource lanl:marko >
      INSERT < _123 mesur:hasSink lanl:herbertv >
      INSERT < _123 mesur:hasWeight COUNT(?x) >
      INSERT < _456 rdf:type mesur:Coauthor >
      INSERT < _456 mesur:hasSource lanl:herbertv >
      INSERT < _456 mesur:hasSink lanl:marko >
      INSERT < _456 mesur:hasWeight COUNT(?x) >
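The same idea can be sketched for all author pairs at once. The snippet below is an illustration under assumptions, not the MESUR implementation: the triples are invented, `lanl:johan` and the `_p1`/`_p2` context identifiers are hypothetical, and the weight of a directed coauthor edge is simply the number of Publishes contexts the two authors share, mirroring COUNT(?x) on the slide.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Hypothetical Publishes contexts, written as (subject, predicate, object).
triples = [
    ("_p1", "rdf:type", "mesur:Publishes"),
    ("_p1", "mesur:hasAuthor", "lanl:marko"),
    ("_p1", "mesur:hasAuthor", "lanl:herbertv"),
    ("_p2", "rdf:type", "mesur:Publishes"),
    ("_p2", "mesur:hasAuthor", "lanl:marko"),
    ("_p2", "mesur:hasAuthor", "lanl:herbertv"),
    ("_p2", "mesur:hasAuthor", "lanl:johan"),
]


def coauthor_weights(store):
    """Map each directed (source, sink) author pair to its shared-context count."""
    publishes = {s for s, p, o in store
                 if p == "rdf:type" and o == "mesur:Publishes"}

    # Collect the authors attached to each Publishes context.
    authors = defaultdict(set)
    for s, p, o in store:
        if p == "mesur:hasAuthor" and s in publishes:
            authors[s].add(o)

    # One edge per direction, as on the slide (_123 and _456).
    weights = Counter()
    for group in authors.values():
        for source, sink in permutations(sorted(group), 2):
            weights[(source, sink)] += 1
    return weights


weights = coauthor_weights(triples)
```

Each entry of `weights` corresponds to one mesur:Coauthor context with its hasSource, hasSink, and hasWeight values.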
  • 18. The Model
      Phase 1 looks only at group-level usage and bibliographic data.
  • 19. The Metrics
      • ISI Impact Factor
      • Usage Impact Factor
          • Bollen, J., Van de Sompel, H., “Usage Impact Factor: The Effects of Sample Characteristics on Usage-based Impact Metrics”, [in review], 2007.
      • H-Index
          • Hirsch, J.E., “An index to quantify an individual's scientific research output”, Proceedings of the National Academy of Sciences, 102:46, 2005.
      • Y-Factor
          • Bollen, J., Rodriguez, M.A., Van de Sompel, H., “Journal Status”, Scientometrics, 69:3, 2006.
      • …
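As one concrete case from this list, the H-Index is the largest h such that h of an author's papers each have at least h citations. A minimal sketch, assuming the input is simply a list of per-paper citation counts (the function name and the sample counts are illustrative):

```python
def h_index(citation_counts):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have >= rank citations
        else:
            break
    return h


example = h_index([10, 8, 5, 4, 3])
```

In the example, four papers have at least four citations each, but there are not five papers with at least five, so the index is 4.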
  • 20. The Metrics
      From the Publishes and Citation contexts, generate Impact Factor Rankings.
      SELECT ?x
      WHERE ( ?x rdf:type mesur:Publishes )
            ( ?x mesur:hasUnit ?a )
            ( ?x mesur:hasGroup ?b )
            ( ?b mesur:partOf urn:issn:1082-9873 )
            ( ?x mesur:hasTime ?t ) AND (?t > 2004 AND ?t < 2007)
            ( ?y rdf:type mesur:Citation )
            ( ?y mesur:hasSource ?c )
            ( ?y mesur:hasSink ?a )
            ( ?z rdf:type mesur:Publishes )
            ( ?z mesur:hasUnit ?c )
            ( ?z mesur:hasTime ?u ) AND ?u = 2007
      SELECT ?y
      WHERE ( ?y rdf:type mesur:Publishes )
            ( ?y mesur:hasGroup ?a )
            ( ?a mesur:partOf urn:issn:1082-9873 )
            ( ?y mesur:hasTime ?t ) AND (?t > 2004 AND ?t < 2007)
      INSERT < _123 rdf:type mesur:ImpactFactor >
      INSERT < _123 mesur:hasObject urn:issn:1082-9873 >
      INSERT < _123 mesur:hasStartTime 2007 >
      INSERT < _123 mesur:hasEndTime 2007 >
      INSERT < _123 mesur:hasNumericValue (COUNT(?x) / COUNT(?y)) >
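The two queries compute a ratio: citations made in 2007 to the journal's 2005 and 2006 units, divided by the number of units the journal published in 2005 and 2006. A small Python sketch of that arithmetic follows. It assumes simplified inputs (a dict of units and a list of citation pairs) rather than the MESUR triple store, and every unit identifier and the "other-journal" label are invented for illustration.

```python
def impact_factor(publications, citations, journal, year):
    """Citations in `year` to the journal's units from the two prior years,
    divided by the number of such units.

    publications: dict mapping unit -> (journal, publication_year)
    citations:    list of (citing_unit, cited_unit) pairs
    Returns None when the journal has no citable units in the window.
    """
    window = {year - 2, year - 1}
    citable = {unit for unit, (j, y) in publications.items()
               if j == journal and y in window}
    if not citable:
        return None

    # Count only citations whose citing unit was published in `year`.
    cites = sum(
        1 for source, sink in citations
        if sink in citable
        and source in publications
        and publications[source][1] == year
    )
    return cites / len(citable)


# Hypothetical data: two citable units and four citations, one outside 2007.
publications = {
    "a1": ("urn:issn:1082-9873", 2005),
    "a2": ("urn:issn:1082-9873", 2006),
    "b1": ("other-journal", 2007),
    "b2": ("other-journal", 2007),
    "b3": ("other-journal", 2006),
}
citations = [("b1", "a1"), ("b2", "a1"), ("b1", "a2"), ("b3", "a1")]

value = impact_factor(publications, citations, "urn:issn:1082-9873", 2007)
```

Three of the four citations come from 2007 units, and the journal has two citable units, which gives 3 / 2 = 1.5.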
  • 21. The Metrics
      • Eigenvector-based global-rank metrics such as PageRank, Eigenvector centrality, Y-Factor, and relative-rank ‘spreading activation’ algorithms can be calculated in a similar fashion.
      Rodriguez, M.A., “Grammar-Based Random Walkers in Semantic Networks”, [in review], 2007.
  • 22. Conclusion
      • Thanks for your time…Good life.
      http://www.mesur.org
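For readers unfamiliar with eigenvector-based ranking, the sketch below runs plain PageRank by power iteration on a small directed graph. It is not the grammar-based random walker from the cited paper: the edge list, the damping value of 0.85, and the handling of nodes without outgoing edges are all assumptions made for illustration.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Plain PageRank by power iteration over a list of (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the teleportation share first.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = out_links[node]
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank evenly over all nodes.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical citation-like graph: d cites a, and a, b, c form a cycle.
ranks = pagerank([("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")])
```

The ranks sum to one, and node d, which nothing points to, ends up with only the teleportation share.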
