TERENA OER portal, metadata extraction analysis, LAK, Leuven @9apr2013


A presentation on "Extraction and Visualization of Metadata Analytics for Multimedia Learning Object Repositories: The case of TERENA TF-media network OER portal" presented at the LACRO workshop of the LAK Conference, on April 9th, 2013

Published in: Technology, Education
  • This presentation is about extracting metadata analytics for Multimedia Learning Object Repositories. The work was conducted by members of GRNET in collaboration with Nikos Manouselis from Agro-Know Technologies and Peter Szegedi from TERENA TF-media.
  • Many previous studies have analyzed the metadata used by content providers, so our work builds on a concept and methods that have been previously defined. The work most relevant to our study is the first fully quantified study of a large number of Learning Repositories that belong to GLOBE.
  • Let's start by pointing out why a study on metadata analytics is important when we are developing a learning portal that is based on metadata aggregated from a number of providers. When developing such a learning portal, questions like these arise. Most of these questions concern portal design decisions and are therefore critical for the development of the portal.
  • The main objectives of our study are:
  • Let's first say a few words about the TERENA OER Project. The goal of this project is to set up a European level metadata aggregation portal for ….. It aims at creating a broker ….. and a bridge between ….
  • Here you can see the list of the data providers that were successfully harvested. The vast majority of the providers use Dublin Core as the format for exposing their metadata. The goal is to identify those providers that can be used for the first version of the TERENA OER portal.
  • We have 58K+ instances, and we need to select the data providers that can be used for the first version of the TERENA OER portal.
  • How did we conduct the study? We performed a repository-based analysis and studied:
  • To conduct this study we used the ARIADNE Harvester to aggregate the metadata records, and for the metadata analysis we have developed a tool that will ….
  • We have this version now, but we are working on a version that will include a GUI allowing researchers to easily estimate metadata analytics and visualize them.
  • A heat map table for the element usage. One can see that elements such as Title, Identifier, Language, Type and Creator are generally used by the studied repositories. The repositories with the richest metadata are Campus do Mar, SCAM, Malopolskie and RiuNet.
  • This chart presents the average element usage values across all the studied repositories. It is again evident that elements like Title, Identifier, Language, Type and Publisher are highly used by almost all repositories. These elements are candidates for the mandatory/core elements of the aggregator metadata schema.
  • Zero entropy means that the element has no values, or that it has the same value in every record. The Format element is used by only one repository, so its entropy is zero for the rest. High entropy values indicate heterogeneity/diversity of information. The type of learning object has high values in many repositories, which means that they hold various types of objects. This further means that if TERENA OER is to include only audiovisual material, some kind of filtering will be needed. For Campus do Mar the entropy was found to be zero for all elements, because all the records have the same values for these elements.
  • The main conclusion from the language analysis of free text elements is that, if all the repositories are aggregated, English can be the main language supported on the TERENA OER Portal.
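
The completeness and entropy metrics discussed in the notes above can be illustrated with a minimal Python sketch. It is not the project's Java tool: the record layout (one dictionary per record), the function names and the sample data are all hypothetical, and since the slides do not give a formula for "relative entropy", plain Shannon entropy in bits is assumed here.

```python
import math
from collections import Counter


def completeness(records, element):
    """Percentage of records in which `element` has a non-empty value."""
    if not records:
        return 0.0
    filled = sum(1 for record in records if record.get(element))
    return 100.0 * filled / len(records)


def entropy(records, element):
    """Shannon entropy (bits) of the values of `element` (assumed formula)."""
    values = [record[element] for record in records if record.get(element)]
    if not values:
        return 0.0  # no values at all: zero entropy
    total = len(values)
    counts = Counter(values)
    # Sum of p * log2(1/p) over the distinct values.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical sample records, not taken from any harvested repository.
records = [
    {"title": "Lecture 1", "type": "video", "language": "en"},
    {"title": "Lecture 2", "type": "audio", "language": "en"},
    {"title": "Lecture 3", "type": "video", "language": "en"},
    {"title": "", "type": "text", "language": "en"},
]

print(completeness(records, "title"))  # 75.0
print(entropy(records, "type"))        # 1.5
print(entropy(records, "language"))    # 0.0
```

In this sketch an element that is never filled in and an element that always carries the same value both get zero entropy, which matches the two cases described in the notes.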

    1. Extraction and Visualization of Metadata Analytics for Multimedia Learning Object Repositories: The case of TERENA TF-media network
       Kostas Vogias (1), Ilias Hatzakis (1), Nikos Manouselis (2), Peter Szegedi (3)
       (1) Greek Research and Technology Network, (2) Agro-Know Technologies, (3) TERENA TF-media
       Workshop on Learning Object Analytics for Collections, Repositories and Federations, 9 April, 2013
    2. Metadata analysis is not something new
       • Ochoa, Xavier, Klerkx, Joris, Vandeputte, Bram, and Duval, Erik. – On the Use of Learning Object Metadata: The GLOBE Experience.
         • Made the first fully quantitative study in a large number of Learning Repositories that belong to a large organization like GLOBE
       • Neven, Filip and Duval, Erik. – Reusable Learning Objects: a Survey of LOM-Based Repositories.
       • Zschocke, Thomas and Beniest, Jan and Paisley, Courtney and Najjar, Jehad and Duval, Erik. – The LOM application profile for agricultural learning resources of the CGIAR
         • Studied the use of LOM for the indexing of learning resources
       • Manouselis, N, Salokhe, G, Keizer, J, and Rudgard, S. – Towards a Harmonization of Metadata Application Profiles for Agricultural Learning Repositories.
         • Made an analysis of the metadata schemas used by Repositories including Agricultural Learning Resources.
    3. Why extraction of metadata analytics is important when we are developing a learning portal
       • Which metadata schema is used by our content providers?
       • How are the providers using the different elements of the schema?
       • On which metadata schema should our learning portal rely?
       • Can we provide services based on metadata elements such as Subject, Type, Format, Keywords, Title and Descriptions?
       • Which languages can our portal support?
       (Callout on the slide: Portal design decisions)
    4. Study Objectives
       • To perform a quantified study on the different metadata schemas used by the TERENA TF-media network
       • To propose a metadata schema on which the TERENA OER portal will be based
       • To verify if metadata analytics can constitute a tool that can facilitate the development of a learning portal that is based on metadata records aggregated by various content providers
    5. What the TERENA OER Project is
       • A European level metadata aggregation portal for Open Educational Resources (primarily audiovisual contents, recorded lectures) collected and maintained by institutional and national content repositories of the Research & Education Community.
       • Main objectives of the project
         – Create a broker for national learning resource organizations.
         – Bridge the gap between the national repositories and the emerging global repositories (e.g., GLOBE) by establishing a European level metadata repository (i.e. aggregation point) for the national repositories acting in the R&E community. The European level repository will be a metadata repository only; the content remains in its original content repository.
    6. Which Data Providers
       • Successfully harvested (58,000+ instances):

         Repository Name                                                             Records Harvested  Metadata Schema
         Switch Collection                                                                         619  oai_dc
         DSpace at University Of Latvia                                                           1009  oai_dc
         OBAA Repository                                                                            56  oai_dc
         RiuNet: Repositori Institucional de la Universitat Politècnica de València              21902  oai_dc
         SCAM Repository                                                                          7351  oai_lom
         Material Audiovisual ofrecido por el Campus do Mar                                        978  oai_dc
         wikiwijs                                                                                26054  oai_dc
         Małopolskie Towarzystwo Genealogiczne                                                     181  oai_dc

       • Select the ones that can be used for the first version of the TERENA OER portal
    7. How and what we used
    8. How
       • Repository Based Analysis.
       • Metrics
         – Element Completeness: The percentage of records in which an element has a value.
         – Relative Entropy: Diversity of values in an element.
         – Vocabulary values distribution: Format, Language and Type
         – Language properties: Attribute (e.g. lang=en) value usage frequency, e.g. lang in the free text metadata elements Title, Description and Keyword
       • The analysis was performed for a core set of metadata elements that is present in the studied repositories
       • Use of a standard set of mappings from DC to LOM
    9. What we have used
       • ARIADNE Harvester
       • A metadata analysis tool
         – Implemented using Java.
         – Metadata schema agnostic.
         – Metadata analysis schemes:
           1. Repository based.
           2. Federation based.
         – Element based analysis:
           • Completeness
           • Relative Entropy
           • Specific element vocabulary extraction and usage frequency
         – Attribute based analysis:
           • Lang attribute value frequency
         – Input: XML files
         – Output: CSV, TXT
    10. Current version of the tool
    11. Next version of the tool
    12. Results
    13. Elements usage
    14. Element completeness (% of records in which the element has a value)

        Element      Switch     UOL    OBAA  RiuNet    SCAM  Campus do Mar  wikiwijs  Malopolskie
        Title         52.62  100.00  100.00  100.00   99.96         100.00    100.00       100.00
        Description   47.67    0.10    0.00    0.17   99.96         100.00      0.00        99.44
        Subject       52.45  100.00    0.00   96.84   10.49         100.00      0.00       100.00
        Format         0.00    0.00    0.00    0.00   73.31         100.00      0.00        98.89
        Identifier    52.62  100.00  100.00  100.00  100.00          97.24      0.00       100.00
        Language       3.10   95.34    0.00   99.91   92.45         100.00    100.00       100.00
        Type          52.33   99.31    0.00   96.57   74.12         100.00    100.00       100.00
        Publisher     47.38   44.84    0.00   43.97   65.43         100.00    100.00        97.78
        Creator       52.45   92.76   89.09   99.67   65.43         100.00      0.00        32.78
        Rights        52.62    0.20    0.00  100.00   89.27         100.00     96.07         0.00
        Date           0.00   31.15  100.00    6.38   45.89         100.00      0.00        98.89
        Relation       0.00   15.87    0.00    3.83    0.00           0.00      0.00        47.22
        Coverage       0.00   67.66    0.00   97.94    0.00           0.00      0.00       100.00
        Source        51.85    0.00    0.00  100.00  100.00           0.00      0.00        99.44
    15. Average Element Usage: define the mandatory elements for the schema of the metadata aggregator
    16. Which values are used
    17. Heterogeneity of information
    18. What type of digital objects are of interest for TERENA OER: we need a filtering mechanism to keep only the LOs that are suitable for TERENA
    19. Which languages are used for the annotation
    20. Language distribution for title
    21. Language distribution for description
    22. Language distribution for keyword
    23. English can be the main language supported on the TERENA OER Portal
    24. Conclusions
        • Metadata Analysis helps in:
          – Defining the metadata aggregation element set.
          – Defining the type of services that can be provided by the TERENA OER portal, e.g. browse by type of LOs, elements that can be used for full text search
          – Providing recommendations back to the providers about usage of metadata elements
          – Validating the metadata records at harvesting time
        • Next steps
          – Extend to more repositories of the TERENA TF-media network
          – Combine with results of an online survey for content providers
          – Develop the web based version of the tool and provide it as an open source tool
    25. Thank you! Ilias Hatzakis, GRNET, hatzakis@admin.grnet.gr
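
As a rough illustration of how the average element usage on slide 15 relates to the completeness table on slide 14, the Python sketch below averages each element's completeness over the eight repositories and lists the elements above a cut-off. The unweighted mean, the function names and the 70% threshold are assumptions made for this example; the slides do not say how the chart was computed or which cut-off, if any, was applied.

```python
# Completeness values (%) copied from the slide 14 table, one column per repository.
REPOSITORIES = ["Switch", "UOL", "OBAA", "RiuNet", "SCAM",
                "Campus do Mar", "wikiwijs", "Malopolskie"]
COMPLETENESS = {
    "Title":       [52.62, 100.00, 100.00, 100.00, 99.96, 100.00, 100.00, 100.00],
    "Description": [47.67, 0.10, 0.00, 0.17, 99.96, 100.00, 0.00, 99.44],
    "Subject":     [52.45, 100.00, 0.00, 96.84, 10.49, 100.00, 0.00, 100.00],
    "Format":      [0.00, 0.00, 0.00, 0.00, 73.31, 100.00, 0.00, 98.89],
    "Identifier":  [52.62, 100.00, 100.00, 100.00, 100.00, 97.24, 0.00, 100.00],
    "Language":    [3.10, 95.34, 0.00, 99.91, 92.45, 100.00, 100.00, 100.00],
    "Type":        [52.33, 99.31, 0.00, 96.57, 74.12, 100.00, 100.00, 100.00],
    "Publisher":   [47.38, 44.84, 0.00, 43.97, 65.43, 100.00, 100.00, 97.78],
    "Creator":     [52.45, 92.76, 89.09, 99.67, 65.43, 100.00, 0.00, 32.78],
    "Rights":      [52.62, 0.20, 0.00, 100.00, 89.27, 100.00, 96.07, 0.00],
    "Date":        [0.00, 31.15, 100.00, 6.38, 45.89, 100.00, 0.00, 98.89],
    "Relation":    [0.00, 15.87, 0.00, 3.83, 0.00, 0.00, 0.00, 47.22],
    "Coverage":    [0.00, 67.66, 0.00, 97.94, 0.00, 0.00, 0.00, 100.00],
    "Source":      [51.85, 0.00, 0.00, 100.00, 100.00, 0.00, 0.00, 99.44],
}


def average_usage(table):
    """Unweighted mean completeness per element across repositories."""
    return {element: sum(values) / len(values) for element, values in table.items()}


def core_candidates(table, threshold=70.0):
    """Elements whose average usage meets the (hypothetical) threshold, highest first."""
    averages = average_usage(table)
    selected = [element for element, avg in averages.items() if avg >= threshold]
    return sorted(selected, key=lambda element: averages[element], reverse=True)


for element in core_candidates(COMPLETENESS):
    print(f"{element}: {average_usage(COMPLETENESS)[element]:.2f}")
```

With the assumed 70% cut-off this selects Title, Identifier, Type and Language; Creator and Publisher average between 60% and 70% and would be included with a lower threshold. Weighting each repository by its number of records would give different averages.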