
ALOA - A Web Services Driven Framework for Automatic Learning Object Annotation


Talk given at the Metadata 2.0 workshop in Leuven on Feb 7, 2008.

Published in: Technology, Education

  1. ALOA – A Web Services Driven Framework for Automatic Learning Object Annotation
     Mohamed Amine Chatti
     Informatik 5, RWTH Aachen, Germany
     PROLEARN Network of Excellence
  2. Agenda
     • Why Automatic Metadata Generation?
     • AMG v.1
     • AMG v.2 – SAmgI
     • ALOA
     • ALOA and AMG
     • Conclusion and Future Work
  3. Metadata
     • Metadata is crucial for search, access, sharing, and reuse.
     • Dealing with metadata cannot be a human task (Duval and Hodgins, 2004)
       ◦ Complex metadata standards (e.g. 9 LOM categories and 45 records of LOM level two)
       ◦ Benefit not immediately appreciated
       ◦ Employing metadata creators is too expensive
       ◦ Tools not user friendly (“electronic forms must die”)
       ◦ Need for Automatic Metadata Generation
  4. Automatic Approach
     • Use information about the LO and its context to extract or generate its metadata.
     • 4 aspects of AMG (Cardinaels et al., 2005)
       ◦ Content analysis (LO itself, e.g. keyword, language)
       ◦ Context analysis (environment the LO is stored or used in, e.g. LMS)
       ◦ Usage analysis (e.g. time spent reading a doc)
       ◦ Structure analysis (relationship amongst LOs)
  5. AMG v.1
     • AMG at KUL (Cardinaels et al., 2005; Ochoa et al., 2005)
  6. AMG v.1 Limitations
     • Implemented as a Java-based application
     • No support for different languages
     • Not possible to have a metadata subset as a result
     • Not flexible or extensible
     • Not really interoperable between platforms
  7. AMG v.2
     • Federated AMG
     • Simple AMG Interface (SAmgI) (Meire et al., 2007)
     • Main Design Goals:
       ◦ Extensibility – Pluggability
       ◦ Interoperability (Service oriented)
  8. AMG v.2 Extensibility
     • ObjectBasedGenerators based on the Factory design pattern
     • Problem: adding a generator means checking out the source code, then recompiling and rebuilding the whole application
  9. AMG v.2 Interoperability
     • Federated AMG Engine - SAmgI installations / service endpoints
     • Problem: some programming required (SAmgI WSDL specification, XML schemas, etc.)
  10. ALOA
      • A Framework for LOM-based Automatic LO Annotation
      • Service Oriented Architecture (SOA) / Web Services
      • Main focus on flexibility and extensibility
  11. ALOA Core Engine
      • Indexer performs these actions:
        ◦ read all configurations in the properties file (i.e. available extractors and generators, priority of each generator, maximum generated values)
        ◦ access the LO as an array of bytes
        ◦ detect the mime type of the LO
        ◦ look for the available extractor for this particular mime type
        ◦ extract the content and the embedded properties of the LO
        ◦ contact the available generators
        ◦ resolve conflicts
        ◦ translate the generated metadata into the required languages
        ◦ return the generation result to the Web Service stub
      • ConflictResolver
        ◦ considers priorities of the generators
      • Translator
        ◦ uses Google Translate as its translation service
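The conflict-resolution step can be illustrated with a minimal Java sketch. The slides only state that the ConflictResolver considers generator priorities and that a maximum number of generated values is configured; everything else here is an assumption, including the class names `Candidate` and `ConflictResolverSketch`, the generator names, and the rule that a larger priority number wins.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** One value proposed by a generator for a metadata field (hypothetical type). */
class Candidate {
    final String generator;
    final int priority;
    final String value;

    Candidate(String generator, int priority, String value) {
        this.generator = generator;
        this.priority = priority;
        this.value = value;
    }
}

/** Hypothetical resolver: orders candidates by generator priority and caps the result. */
class ConflictResolverSketch {

    /**
     * Assumption: a larger priority number means a more trusted generator.
     * Returns at most maxValues distinct values, most trusted first.
     */
    static List<String> resolve(List<Candidate> candidates, int maxValues) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        // List.sort is stable, so candidates with equal priority keep their input order.
        sorted.sort((a, b) -> Integer.compare(b.priority, a.priority));

        Set<String> kept = new LinkedHashSet<>();
        for (Candidate c : sorted) {
            if (kept.size() >= maxValues) {
                break;
            }
            kept.add(c.value); // a value already proposed by a higher-priority generator is ignored
        }
        return new ArrayList<>(kept);
    }

    public static void main(String[] args) {
        List<Candidate> keywords = Arrays.asList(
                new Candidate("generatorA", 1, "metadata"),
                new Candidate("generatorB", 3, "learning object"),
                new Candidate("generatorC", 2, "metadata"));
        System.out.println(resolve(keywords, 2)); // prints [learning object, metadata]
    }
}
```

Keeping the priorities and the cap as plain inputs mirrors the slide's point that both are read from the properties file rather than hard-coded.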
  12. ALOA Components
      • Extractors
        ◦ extract content information and embedded properties from LOs
        ◦ only one extractor for each LO mime type
        ◦ html extractor (Jericho library)
        ◦ pdf extractor (PDFBox library)
        ◦ word extractor (Apache POI library)
        ◦ ppt extractor (Apache POI library)
      • Generators
        ◦ use the output of the extractors to generate one or more parts of the metadata
        ◦ text/data mining libraries (e.g. Yahoo! Term Extraction, Tagthe, Topicalizer, LingPipe, Balie, Classifier4J)
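The "only one extractor for each LO mime type" rule could be enforced by a small registry, sketched below in Java. The `Extractor` interface and `ExtractorRegistry` class are hypothetical names, not taken from ALOA. Only a plain-text extractor is shown, because it needs no library; the html, pdf, word, and ppt extractors would wrap Jericho, PDFBox, and Apache POI, whose calls are deliberately left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical extractor contract: turns the raw bytes of an LO into text. */
interface Extractor {
    String extractText(byte[] content);
}

/** Hypothetical registry that allows exactly one extractor per mime type. */
class ExtractorRegistry {
    private final Map<String, Extractor> byMimeType = new HashMap<>();

    void register(String mimeType, Extractor extractor) {
        // putIfAbsent returns the existing mapping if one is already present.
        if (byMimeType.putIfAbsent(mimeType, extractor) != null) {
            throw new IllegalStateException("Extractor already registered for " + mimeType);
        }
    }

    Optional<Extractor> lookup(String mimeType) {
        return Optional.ofNullable(byMimeType.get(mimeType));
    }
}

class ExtractorDemo {
    public static void main(String[] args) {
        ExtractorRegistry registry = new ExtractorRegistry();
        registry.register("text/plain", bytes -> new String(bytes, StandardCharsets.UTF_8));

        byte[] lo = "Metadata is crucial for reuse.".getBytes(StandardCharsets.UTF_8);
        String text = registry.lookup("text/plain")
                .map(e -> e.extractText(lo))
                .orElse(""); // no extractor for this mime type: nothing to hand to the generators
        System.out.println(text); // prints Metadata is crucial for reuse.
    }
}
```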
  13. ALOA User Interface
      • Based on the ALOA Web Services API
      • Automatically generate metadata from online LOs (html, plain text, word, ppt, pdf)
      • Parameters
        ◦ URL location of the LO
        ◦ Target metadata languages (English, German, Arabic, French, Spanish, Korean)
        ◦ Subset of the generated metadata
        ◦ Output format (LOM XML, HTML, LOM Editor)
  14. ALOA Configuration Management Interface
      • Makes it easy to plug in new components (extractors and generators), for instance:
        ◦ Extractor for multimedia LOs (e.g. audio, video, image, flash)
        ◦ Generator for a specific context (e.g. LMS)
      • The components can be deployed on different machines or on different application servers
      • Once deployed, a component can be plugged into ALOA by just giving the address of the component service
      • The ALOA core engine validates the component and adds it to the component list in the properties file
      • Dynamic addition at run time; no need to recompile and rebuild the system
      • The ALOA CMI also makes it possible to manage the priorities of the generators and to define the maximum generated values (used by the ALOA core engine)
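A rough Java sketch of run-time plug-in against a properties-backed component list follows. The property keys (`extractors`, `generators`), the comma-separated value format, and the `ComponentConfig` class are assumptions; the slides do not show the file layout. The validation here is only a syntactic stand-in (a well-formed http or https address); how the core engine actually validates a component is not described in the slides.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

/** Hypothetical component list backed by java.util.Properties. */
class ComponentConfig {
    private final Properties props;

    ComponentConfig(Properties props) {
        this.props = props;
    }

    /** Returns the service addresses stored under a key such as "generators". */
    List<String> endpoints(String key) {
        String raw = props.getProperty(key, "").trim();
        if (raw.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(raw.split("\\s*,\\s*")));
    }

    /** Validates the address and appends it; returns false if rejected or already listed. */
    boolean plugIn(String key, String address) {
        if (!isValid(address)) {
            return false;
        }
        List<String> current = endpoints(key);
        if (current.contains(address)) {
            return false;
        }
        current.add(address);
        props.setProperty(key, String.join(",", current));
        return true;
    }

    /** Stand-in check only: a syntactically valid http(s) URI with a host. */
    private static boolean isValid(String address) {
        try {
            URI uri = new URI(address);
            String scheme = uri.getScheme();
            return uri.getHost() != null && ("http".equals(scheme) || "https".equals(scheme));
        } catch (URISyntaxException e) {
            return false;
        }
    }
}

class PlugInDemo {
    public static void main(String[] args) {
        ComponentConfig config = new ComponentConfig(new Properties());
        // The example.org address is a placeholder, not a real ALOA endpoint.
        System.out.println(config.plugIn("extractors", "http://example.org/services/VideoExtractor"));
        System.out.println(config.plugIn("extractors", "not a service address"));
        System.out.println(config.endpoints("extractors"));
    }
}
```

Because the list lives in configuration rather than in compiled code, adding a component changes no class files, which is the contrast the slide draws with AMG v.2.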
  15. ALOA and AMG
      • ALOA adopts a slightly modified version of the SAmgI WSDL specification
      • New methods: getLanguages, setLanguages
      • Modified method: getMetadata
      • Web Services-based interactions between ALOA and AMG are possible
      • ALOA as a new SAmgI installation used by the federated AMG engine
      • AMG as a new component (i.e. extractor or generator) of ALOA
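To make the three operations concrete, here is a hypothetical Java view of such a service. Only the method names `getLanguages`, `setLanguages`, and `getMetadata` come from the slides; the parameter and return types, the `AnnotationService` interface name, and the in-memory stub are assumptions for illustration and do not reproduce the SAmgI or ALOA WSDL.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical Java view of the service; the signatures are assumed, not taken from the WSDL. */
interface AnnotationService {
    List<String> getLanguages();

    void setLanguages(List<String> languages);

    /** Assumed shape: one metadata record (here a plain string) per target language. */
    Map<String, String> getMetadata(String loUrl);
}

/** In-memory stub that only shows the call sequence; it generates no real metadata. */
class StubAnnotationService implements AnnotationService {
    private List<String> languages = new ArrayList<>(Arrays.asList("en"));

    public List<String> getLanguages() {
        return new ArrayList<>(languages); // defensive copy
    }

    public void setLanguages(List<String> languages) {
        if (languages.isEmpty()) {
            throw new IllegalArgumentException("at least one language is required");
        }
        this.languages = new ArrayList<>(languages);
    }

    public Map<String, String> getMetadata(String loUrl) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String lang : languages) {
            result.put(lang, "placeholder metadata for " + loUrl);
        }
        return result;
    }
}

class ServiceDemo {
    public static void main(String[] args) {
        AnnotationService service = new StubAnnotationService();
        service.setLanguages(Arrays.asList("en", "de"));
        // The URL is a placeholder for the location of an online LO.
        System.out.println(service.getMetadata("http://example.org/lo/intro.html").keySet()); // prints [en, de]
    }
}
```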
  16. Conclusion
      • ALOA – A framework for LOM-based automatic metadata generation
      • ALOA already implements different components (i.e. extractors and generators)
      • ALOA already generates LOM from different types of LOs (html, plain text, pdf, ppt, word)
      • Primary focus on flexibility and extensibility of the framework
      • SOA-based architecture enabling new components to be easily plugged into the basic system
      • ALOA provides a public Web Services API for third party applications
  17. Future Work
      • Interactions between ALOA and AMG
      • Extension with more extractors and generators based on other text/data mining techniques
      • Look at model transformation techniques to support other metadata schemas (e.g. DC, MPEG)
      • Further research into the quality of automatically generated metadata
      • Combination of automatic metadata generation with a bottom-up approach (e.g. Web 2.0 social tagging)
  18. Thank You!