Groups : journals, proceedings, magazines, newspapers, edited books.
Units : articles, book chapters, dissertations.
Documents : groups and units.
Usage-Event : the act of interacting with an article (e.g. getFullText, getAbstract, getReferences) -- an expression of user interest.
Two types of data:
Bibliographic data : metadata pertaining to groups and units.
Usage data : metadata pertaining to group and unit usage.
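The vocabulary above can be sketched as a small set of Python types. This is only an illustration of the definitions, not the authors' implementation: the class names, field names, and `urn:example:` identifiers are assumptions made for the example.

```python
from dataclasses import dataclass
from enum import Enum


class DocumentKind(Enum):
    GROUP = "group"  # journal, proceedings, magazine, newspaper, edited book
    UNIT = "unit"    # article, book chapter, dissertation


@dataclass(frozen=True)
class Document:
    """A group or a unit, carrying (minimal) bibliographic data."""
    uri: str             # hypothetical identifier
    kind: DocumentKind
    title: str


@dataclass(frozen=True)
class UsageEvent:
    """One act of interacting with a document."""
    document: Document
    action: str          # e.g. "getFullText", "getAbstract", "getReferences"


# Made-up records, for illustration only.
journal = Document("urn:example:journal-1", DocumentKind.GROUP, "Example Journal")
article = Document("urn:example:article-1", DocumentKind.UNIT, "Example Article")
events = [UsageEvent(article, "getAbstract"), UsageEvent(article, "getFullText")]

# Usage data: count events per document URI.
usage_per_document = {}
for event in events:
    uri = event.document.uri
    usage_per_document[uri] = usage_per_document.get(uri, 0) + 1
```

Keeping bibliographic fields on `Document` and usage on `UsageEvent` mirrors the split between the two types of data.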
Group-level Bibliographic Data
SFX Master List: > 300,000 groups
SFX: > 85,000 group classifications
ISI JCR: > 8,000 indexed groups
ISI JCR: > 50,000,000 group citations
ISI JCR: > 100,000 group classifications
Unit-level Bibliographic Data
ISI Tapes: > 30,000,000 unit records
ISI Tapes: > 500,000,000 unit citations
Los Alamos: > 400,000 1-year
BioMed Central: > 24,000,000 2-years
anonymous : > 1,000,000 5-years
anonymous : > 2,500,000 1-year
anonymous : > 50,000,000 1-week
The semantic network model is estimated to be > 10 billion triples (edges).
As of March 2007: 1.2 billion.
In order to integrate the various data sets in their various formats, we model all information according to an ontology.
RDF, RDFS, OWL [W3C Standards]
Resource Description Framework
Resource Description Framework Schema
Web Ontology Language
Provides a standardized language in which to represent our entities and their relationships to one another.
In OWL, everything is an owl:Thing--both nodes and edges (analogous to java.lang.Object in Java)
All owl:Things are represented by a URI.
An instance of the ontology provides us with a URI triple list data structure:
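A minimal sketch of such a triple list, assuming each triple is a (subject, predicate, object) tuple of URI strings. The `http://example.org/` namespace and the `Unit`, `Group`, and `containedIn` names are hypothetical and do not come from the ontology itself; only the `rdf:type` URI is a real W3C identifier.

```python
# Hypothetical namespace; these URIs are placeholders for illustration.
EX = "http://example.org/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Each edge of the semantic network is one (subject, predicate, object) triple.
triples = [
    (EX + "article-1", RDF_TYPE, EX + "Unit"),
    (EX + "journal-1", RDF_TYPE, EX + "Group"),
    (EX + "article-1", EX + "containedIn", EX + "journal-1"),
]


def objects(subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for (s, p, o) in triples if s == subject and p == predicate]
```

For example, `objects(EX + "article-1", EX + "containedIn")` looks up the group that the article belongs to.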
The instance of an OWL ontology resides in a triple store.
SPARQL (like SQL, but for triple stores).
PREFIX : <http://example.org/>   # placeholder namespace
SELECT (?c AS ?grandparent) WHERE { ?a :childOf ?b . ?b :childOf ?c }
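The grandparent query matches two childOf patterns and joins them on the shared variable ?b. The sketch below shows that join over a plain Python triple list rather than a real triple store; the `ex:` names and the `grandparents` helper are made up for illustration.

```python
# Toy data; the ex: identifiers are hypothetical.
triples = [
    ("ex:alice", "ex:childOf", "ex:bob"),
    ("ex:bob", "ex:childOf", "ex:carol"),
    ("ex:dave", "ex:childOf", "ex:alice"),
]


def grandparents(triples):
    """Bind ?c for every pair (?a childOf ?b), (?b childOf ?c)."""
    child_of = [(s, o) for (s, p, o) in triples if p == "ex:childOf"]
    result = []
    for a, b in child_of:          # first pattern:  ?a childOf ?b
        for b2, c in child_of:     # second pattern: ?b childOf ?c
            if b == b2:            # join on the shared variable ?b
                result.append(c)
    return result
```

A triple store would normally use indexes instead of this nested loop, but the matching logic is the same.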
The Model
Rodriguez, M.A., Bollen, J., Van de Sompel, H., “A Practical Ontology for the Large-Scale Modeling of Scholarly Artifacts and their Usage”, IEEE/ACM Joint Conference on Digital Libraries, Vancouver, 2007.