Metadata Usage Tendencies in Latin American Electronic Journals



Presented at ELPUB 2009; written with Helena Francke (University of Borås) and Saray Córdoba (University of Costa Rica).


  1. Metadata Usage Tendencies in Latin American Electronic Journals
     Rolando Coto-Solano, Helena Francke, Saray Córdoba-González
     ELPUB 2009, Milan, Italy
  2. Justification (1/2)
     • To fully exploit the advantages that OA brings, journals need to be both visible and accessible. In practice, this means:
         • Being included in journal indexes
         • Being retrievable by data harvesters (such as OAI harvesters)
         • Being retrievable by search engines (both the current vector-based search engines and the future semantic web answer engines)
     • To help with this process, the main information about an article, the metadata, can be presented according to certain standards (from full DTD marking and use of Dublin Core down to at least filling in the <description> of a web page).
  3. Justification (2/2)
     • The metadata (such as article title, keywords, abstract) represent the minimum information that we would like editors to encode so that their articles are easy to describe and more likely to be retrieved.
     • We focused on three questions:
         • How often do these data appear in Latin American journals? How often do these journals actually include information such as abstracts (or abstracts in more than one language, for example)?
         • How often are the metadata encoded in a way that would help retrievability (DTD, any form of XML, Dublin Core, HTML metadata)?
         • How often do the titles of the web pages include sufficient information about the content of the web page?
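As an illustration of what "encoding the metadata" can look like at the simplest level, the following minimal sketch renders plain HTML metatags and a few Dublin Core metatags for one article. It is not taken from the presentation: the function name `render_meta_tags`, the dictionary keys and the sample article are hypothetical, and the choice of DC elements (including `DC.Creator`) is only one plausible selection.

```python
from html import escape


def render_meta_tags(article: dict) -> str:
    """Render plain HTML and Dublin Core <meta> tags for one article.

    `article` is a hypothetical dict with the keys: title, abstract,
    keywords (list), authors (list) and language (e.g. "es").
    """
    tags = [
        ("description", article["abstract"]),
        ("keywords", ", ".join(article["keywords"])),
        ("DC.Title", article["title"]),
        ("DC.Description", article["abstract"]),
        ("DC.Language", article["language"]),
        ("DC.Type", "Text"),
    ]
    # One DC.Creator tag per author (an assumed convention for this sketch).
    tags += [("DC.Creator", author) for author in article["authors"]]
    # escape() also escapes double quotes, so values are safe inside content="...".
    return "\n".join(
        f'<meta name="{escape(name)}" content="{escape(value)}">'
        for name, value in tags
    )


# Hypothetical sample article, for illustration only.
sample = {
    "title": "Un artículo de ejemplo",
    "abstract": "Resumen breve del artículo.",
    "keywords": ["metadatos", "acceso abierto"],
    "authors": ["Autora Ejemplo"],
    "language": "es",
}
print(render_meta_tags(sample))
```

The output is meant to be pasted into the `<head>` of the article page; a content management system would normally generate such tags automatically.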
  4. Contents
     • Methodology
     • How many journals have metadata?
     • Preparedness for DTD/XML marking (how much of the data that could be marked actually exists?)
     • What about multilingualism?
     • Most used metatags
     • Actual output formats: (X)HTML vs. PDF
     • The most visible tag: <title>
     • Conclusions
  5. Methodology (1/2)
     • We randomly chose a sample of 167 journals from the LATINDEX database, belonging to 12 different countries and territories (Argentina, Brazil, Chile, Colombia, Costa Rica, Cuba, Ecuador, Mexico, Peru, Puerto Rico, Uruguay and Venezuela).
     • After eliminating the journals that did not have their own website (that is, were available only within a journal portal) or were not full peer-reviewed journals (bulletins, for example), we were left with a sample of 123 journals.
  6. Methodology (2/2)
     • We examined four “levels”:
         • Cover of the website (the main entry page)
         • Table of contents
         • Article presentation page (title and abstract of the article)
         • Article full text page
     • In each of these, we examined the available metatags and the format of the page. We also examined the contents of the <title> tag.
     • We examined the first article of the latest issue of each journal, and checked whether it included a title, an abstract, keywords and author information, and whether these were presented in more than one language.
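The per-page inspection described above (available metatags plus the contents of the <title> tag) could be automated roughly as follows. This is a sketch using only Python's standard `html.parser`, not the procedure the authors actually followed; the class name `HeadMetadataParser`, the function `inspect_page` and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class HeadMetadataParser(HTMLParser):
    """Collect <meta name=... content=...> pairs and the <title> text."""

    def __init__(self):
        super().__init__()
        self.meta = {}        # metatag name -> list of content values
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            self.meta.setdefault(attrs["name"], []).append(attrs["content"])
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def inspect_page(html_text: str) -> dict:
    """Return the title, all named metatags, and the Dublin Core subset."""
    parser = HeadMetadataParser()
    parser.feed(html_text)
    parser.close()
    # Treat any metatag whose name starts with "DC." as Dublin Core.
    dc = {k: v for k, v in parser.meta.items() if k.lower().startswith("dc.")}
    return {"title": parser.title.strip(), "meta": parser.meta, "dc": dc}


# Hypothetical page, for illustration only.
page = (
    "<html><head><title>Revista Ejemplo</title>"
    '<meta name="keywords" content="metadatos, revistas">'
    '<meta name="DC.Title" content="Un artículo de ejemplo">'
    "</head><body></body></html>"
)
print(inspect_page(page))
```

Running `inspect_page` on each of the four levels of a journal site would give the raw material for the counts reported in the following slides.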
  7. How many journals have metadata (on any level)?
     • Total journals: 123 (100%)
     • Have any metatags: 107 (87%)
     • Have any non-automatic metatags: 55 (45%)
     • Have DC metatags: 16 (13%)
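The percentages on this slide follow directly from the counts over the 123 journals in the final sample. A small sketch of the arithmetic, using the counts reported above (the helper name `share` is hypothetical):

```python
def share(count: int, total: int) -> int:
    """Percentage of `total` represented by `count`, rounded to a whole number."""
    return round(100 * count / total)


TOTAL_JOURNALS = 123
counts = {
    "Have any metatags": 107,
    "Have any non-automatic metatags": 55,
    "Have DC metatags": 16,
}

for label, count in counts.items():
    print(f"{label}: {count} ({share(count, TOTAL_JOURNALS)}%)")
```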
  8. Journals in Costa Rica and Argentina use DC tags significantly more frequently than journals in the rest of the countries in the sample: 66% and 35% respectively (p < 0.05). Brazil also shows a high usage of DC tags in its journals: 17%. This might be due to training provided to the editors by scientific institutions in the country (more on this in the Conclusions section).
  9. Presence of basic descriptors in the articles (title, abstract, keywords, author affiliation)
     • Total articles: 123 (100%)
     • Has a title: 123 (100%)
     • Has author affiliation: 105 (85%)
     • Has an abstract: 103 (84%)
     • Has keywords: 95 (77%)
     • Has title marked as metadata: 17 (14%)
     • Has author affiliation marked as metadata: 5 (4%)
     • Has abstract marked as metadata: 9 (7%)
     • Has keywords marked as metadata: 8 (7%)
  10. More engineering and medical sciences journals use keywords than do journals in other areas.
      Medical sciences journals use abstracts significantly more often than journals in other areas (96% of the medical journals use abstracts).
      Journals in Arts and Humanities use significantly fewer abstracts and keywords than journals in other areas (73% use abstracts and 64% use keywords).
      An interesting finding is that, contrary to what might be expected, journals in the Exact and Natural Sciences and in the Social Sciences are not significantly different in their use of abstracts and keywords. For example, 70% of Exact and Natural Sciences journals use keywords, and 71% of the Social Sciences journals do. This will have to be verified in further studies.
  11. Presence of basic descriptors (title, abstract, keywords, author affiliation)
      • Total articles: 123 (100%)
      • Has title in English: 54 (44%)
      • Has abstract in English: 88 (72%)
      • Has keywords in English: 87 (71%)
      • 13% of the articles were written in English
  12. Most used metatags
      • Cover (n = 55): keywords (58%), description (58%), author (27%), robots (26%)
      • Table of contents (n = 42): keywords (60%), description (57%), robots (31%), author (26%)
      • Article presentation page (n = 20): DC.Language (50%), DC.Title (50%), DC.Description (45%), DC.Type (45%)
      • Full text page in HTML (n = 16): keywords (50%), description (50%), author (25%), robots (18%)
      • Article presentation pages stand out in their use of DC tags. This might be because, if a journal uses article presentation pages at all (a practice that is not very common in Latin America), its editor may also have become aware of other “good practices” from larger publishing cultures, such as the use of DC.
  13. Actual output formats: (X)HTML vs. PDF
      • (X)HTML: 33 (27%)
      • PDF: 105 (85%)
      • Both (X)HTML and PDF: 17 (14%)
      • About 7% are specified as XHTML. However, we found only one journal (Electronic Journal of Biotechnology) that offered access to an actual XML-marked copy. (Systems such as SciELO and RedALyC do offer XML copies of their articles.)
  14. Contents of the tag <title>
  15. Contents of the tag <title> on the Cover level
      • Have a cover: 122 (100%)
      • Journal title: 93 (76%)
      • Institution name: 24 (20%)
      • Issue information: 2 (2%)
      • None of the above: 10 (8%)
      • Two or more of the above: 9 (7%)
  16. Contents of the tag <title> on the Full Text HTML page
      • Have a full text page: 33 (100%)
      • Journal title: 14 (42%)
      • Institution name: 12 (36%)
      • Issue information: 5 (15%)
      • Two or more of the above: 4 (12%)
      • Article title: 14 (42%)
      • Author name: 4 (12%)
      • None of the above: 1 (3%)
      • Both title and author’s name: 0 (0%)
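The categories used in the two <title> tables could be assigned with a simple substring check against the values already known for each journal and article. The sketch below is an assumption about how such a classification might be scripted, not the authors' coding procedure; `classify_title` and the sample values are hypothetical, and accent-insensitive matching is a design choice made here so that, for example, "Córdoba" and "Cordoba" compare as equal.

```python
import unicodedata


def _normalize(text: str) -> str:
    """Casefold, strip accents and collapse whitespace for loose matching."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_marks.casefold().split())


def classify_title(title_text: str, known: dict) -> list:
    """Return the categories whose known value appears in the <title> text.

    `known` maps a category label (e.g. "journal title") to the expected
    string for this journal or article; empty values are ignored.
    """
    haystack = _normalize(title_text)
    found = [
        category
        for category, value in known.items()
        if value and _normalize(value) in haystack
    ]
    return found or ["none of the above"]


# Hypothetical values, for illustration only.
known_values = {
    "journal title": "Revista Ejemplo",
    "institution name": "Universidad de Ejemplo",
    "article title": "Un artículo de ejemplo",
    "author name": "Autora Ejemplo",
}
print(classify_title("Revista Ejemplo - Universidad de Ejemplo", known_values))
print(classify_title("Untitled Document", known_values))
```

Exact substring matching will miss abbreviated or reworded titles, so a manual check of the unmatched pages would still be needed.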
  17. Conclusions (1/2)
      • Use of non-national languages (particularly English) was high in the metadata, but not in the text of the articles themselves. Most of the articles (84%) were written in the national languages.
          • Speakers of Spanish in particular have traditionally been credited with being defensive about their language, and editors often reflect this attitude. Moreover, many institutions in the Spanish-speaking world are promoting the use of Spanish as a language of science. This debate, however, has become ideologically charged, and it is still ongoing.
      • Multilingualism was very often not taken into account when marking metadata.
      • Relatively few journals in the sample used Content Management Systems (29% use any system; 9% use OJS or an OJS derivative).
  18. Conclusions (2/2)
      • PDF-centric publishing distracts attention from keywords and from text marking in general.
      • Of course, the best solution would be DTD/XML marking. This would:
          • Help editors think of key data (such as abstracts)
          • Provide better data for existing vector-based search engines
          • Help index the data for use by future semantic web search engines (think Wolfram Alpha)
      • Training of editors is key, as it helps implement relatively inexpensive standards (DC; titles according to good SEO practices), and could help sensitize editors to more complex standards (DTD/XML).
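To make the XML marking and multilingualism points concrete, here is a minimal sketch of an article record in which each descriptor is marked once per language with `xml:lang`. The element names (`article`, `title`, `abstract`, `keywords`) are hypothetical and do not follow any particular DTD; a real journal would use the DTD of its platform or indexing service.

```python
import xml.etree.ElementTree as ET

# Qualified name of the standard xml:lang attribute in ElementTree notation.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_article_record(fields: dict) -> ET.Element:
    """Build an <article> element from {element name: {language code: text}}."""
    root = ET.Element("article")
    for name, by_language in fields.items():
        for lang, text in by_language.items():
            element = ET.SubElement(root, name)
            element.set(XML_LANG, lang)  # serialized as xml:lang="..."
            element.text = text
    return root


# Hypothetical article data, for illustration only.
record = build_article_record({
    "title": {"es": "Un artículo de ejemplo", "en": "An example article"},
    "abstract": {"es": "Resumen breve.", "en": "Short abstract."},
    "keywords": {"es": "metadatos; revistas", "en": "metadata; journals"},
})
print(ET.tostring(record, encoding="unicode"))
```

Marking the language of each field explicitly is what lets a harvester or search engine pick the English abstract for one user and the Spanish one for another.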