ALOA – A Web Services Driven Framework for Automatic Learning Object Annotation
  • Mohamed Amine Chatti
  • Informatik 5, RWTH Aachen, Germany
  • PROLEARN Network of Excellence
Agenda
  • Why Automatic Metadata Generation?
  • AMG v.1
  • AMG v.2 – SAmgI
  • ALOA
  • ALOA and AMG
  • Conclusion and Future Work
Metadata
  • Metadata is crucial for search, access, sharing, and reuse.
  • Dealing with metadata cannot be a human task (Duval and Hodgins, 2004)
      • Complex metadata standards (e.g. 9 LOM categories and 45 records of LOM level two)
      • Benefit not immediately appreciated
      • Metadata creators too expensive to be employed
      • Tools not user friendly (“electronic forms must die”)
  • Need for Automatic Metadata Generation
Automatic Approach
  • Use information about the LO and its context to extract or generate its metadata.
  • 4 aspects of AMG (Cardinaels et al., 2005)
      • Content analysis (LO itself, e.g. keyword, language)
      • Context analysis (environment the LO is stored or used in, e.g. LMS)
      • Usage analysis (e.g. time spent reading a doc)
      • Structure analysis (relationship amongst LOs)
AMG v.1
  • AMG at KUL (Cardinaels et al., 2005; Ochoa et al., 2005)
AMG v.1 Limitations
  • It was a Java-based application
  • No support for different languages
  • Not possible to obtain a metadata subset as a result
  • Neither flexible nor extensible
  • Not really interoperable between platforms
AMG v.2
  • Federated AMG
  • Simple AMG Interface (SAmgI) (Meire et al., 2007)
  • Main design goals:
      • Extensibility – Pluggability
      • Interoperability (service oriented)
AMG v.2 Extensibility
  • ObjectBasedGenerators based on the Factory design pattern
  • Problem: extending it requires checking out the source code, then recompiling and rebuilding the whole application
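The extensibility problem above can be illustrated with a minimal sketch of a hard-coded factory. Python stands in for the Java of the original system, and every name except ObjectBasedGenerator (HtmlGenerator, PdfGenerator, GeneratorFactory, the returned fields) is hypothetical, chosen only for illustration. It is not the AMG source code.

```python
from abc import ABC, abstractmethod


class ObjectBasedGenerator(ABC):
    """Produces metadata for one kind of learning object."""

    @abstractmethod
    def generate(self, content: str) -> dict:
        ...


class HtmlGenerator(ObjectBasedGenerator):
    def generate(self, content: str) -> dict:
        return {"format": "text/html", "size": len(content)}


class PdfGenerator(ObjectBasedGenerator):
    def generate(self, content: str) -> dict:
        return {"format": "application/pdf", "size": len(content)}


class GeneratorFactory:
    """Hard-coded factory (assumed shape, for illustration only)."""

    @staticmethod
    def create(mime_type: str) -> ObjectBasedGenerator:
        # Every supported type is listed here, so supporting a new one
        # means editing this method and rebuilding the application.
        if mime_type == "text/html":
            return HtmlGenerator()
        if mime_type == "application/pdf":
            return PdfGenerator()
        raise ValueError(f"no generator compiled in for {mime_type}")
```

Because the set of generators is fixed inside create, a third party cannot add one without access to the source, which is the limitation the slide names.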
AMG v.2 Interoperability
  • Federated AMG Engine - SAmgI installations / service endpoints
  • Problem: some programming required (SAmgI WSDL specification, XML schemas, etc.)
ALOA
  • A Framework for LOM-based Automatic LO Annotation
  • Service Oriented Architecture (SOA) / Web Services
  • Main focus on flexibility and extensibility
ALOA Core Engine
  • Indexer performs these actions:
      • read all configurations in the properties file (i.e. available extractors and generators, priority of each generator, maximum generated values)
      • access the LO as an array of bytes
      • detect the mime type of the LO
      • look for the available extractor for this particular mime type
      • extract the content and the embedded properties of the LO
      • contact the available generators
      • resolve conflicts
      • translate the generated metadata into the required languages
      • return the generation result to the Web Service stub
  • ConflictResolver considers the priorities of the generators
  • Translator uses Google Translate as its translation service
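The conflict-resolution step could be sketched as follows. This is one plausible reading, not the ALOA implementation: it assumes that values from higher-priority generators are preferred for each metadata field and that "maximum generated values" caps the number of values kept per field. The function name, the tuple layout, and the generator names in the usage note are all hypothetical.

```python
def resolve_conflicts(candidates, priorities, max_values):
    """Merge generator output into one value list per metadata field.

    candidates: list of (generator_name, field, value) tuples
    priorities: dict mapping generator_name to an integer priority
    max_values: assumed cap on the number of values kept per field
    """
    by_field = {}
    for generator, field, value in candidates:
        score = priorities.get(generator, 0)  # unknown generators rank last
        by_field.setdefault(field, []).append((score, value))

    resolved = {}
    for field, scored in by_field.items():
        # Stable sort: highest priority first, original order kept on ties.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        values = []
        for _, value in scored:
            if value not in values:  # drop duplicates across generators
                values.append(value)
        resolved[field] = values[:max_values]
    return resolved
```

For example, with priorities {"termExtractor": 2, "classifier": 1} and a cap of 2, two keywords from termExtractor would displace a third keyword proposed by classifier.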
ALOA Components
  • Extractors
      • extract content information and embedded properties from LOs
      • only one extractor for each LO mime type
      • html extractor (Jericho library)
      • pdf extractor (PDFBox library)
      • word extractor (Apache POI library)
      • ppt extractor (Apache POI library)
  • Generators
      • use the output of the extractors to generate one or more parts of the metadata
      • text/data mining libraries (e.g. Yahoo! Term Extraction, Tagthe, Topicalizer, LingPipe, Balie, Classifier4J)
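The rule of one extractor per mime type can be sketched as a small registry. The class and function names below (ExtractorRegistry, extract_plain_text) and the shape of the returned dictionary are assumptions for illustration; the real extractors wrap the Java libraries listed above.

```python
class ExtractorRegistry:
    """Maps each mime type to exactly one extractor callable."""

    def __init__(self):
        self._extractors = {}

    def register(self, mime_type, extractor):
        # Enforce the one-extractor-per-mime-type rule.
        if mime_type in self._extractors:
            raise ValueError(f"an extractor for {mime_type} is already registered")
        self._extractors[mime_type] = extractor

    def extractor_for(self, mime_type):
        if mime_type not in self._extractors:
            raise LookupError(f"no extractor available for {mime_type}")
        return self._extractors[mime_type]


def extract_plain_text(data: bytes) -> dict:
    """Hypothetical extractor: returns content plus embedded properties."""
    text = data.decode("utf-8", errors="replace")
    return {"content": text, "properties": {"length": len(text)}}


registry = ExtractorRegistry()
registry.register("text/plain", extract_plain_text)
extracted = registry.extractor_for("text/plain")(b"Learning objects need metadata.")
```

A generator would then take the extracted dictionary as input and return the metadata fields it is responsible for.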
ALOA User Interface
  • Based on the ALOA Web Services API
  • Automatically generate metadata from online LOs (html, plain text, word, ppt, pdf)
  • Parameters
      • URL location of the LO
      • Target metadata languages (English, German, Arabic, French, Spanish, Korean)
      • Subset of the generated metadata
      • Output format (LOM XML, HTML, LOM Editor)
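The four parameters above can be modelled as a small request object. This is only a sketch of how a client might validate its input before calling the service: the class name, field names, language codes, and format identifiers are assumptions and do not come from the ALOA API.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse

# Assumed codes for the six languages and three output formats on the slide.
SUPPORTED_LANGUAGES = {"en", "de", "ar", "fr", "es", "ko"}
OUTPUT_FORMATS = {"lom-xml", "html", "lom-editor"}


@dataclass
class AnnotationRequest:
    url: str
    languages: list = field(default_factory=lambda: ["en"])
    subset: list = field(default_factory=list)  # empty list: all metadata
    output_format: str = "lom-xml"

    def validate(self):
        parsed = urlparse(self.url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError("the LO must be reachable through an http(s) URL")
        unknown = set(self.languages) - SUPPORTED_LANGUAGES
        if unknown:
            raise ValueError(f"unsupported languages: {sorted(unknown)}")
        if self.output_format not in OUTPUT_FORMATS:
            raise ValueError(f"unsupported output format: {self.output_format}")
        return self
```

A call such as AnnotationRequest("http://example.org/lecture.pdf", languages=["en", "de"]).validate() returns the request unchanged when all parameters are acceptable.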
ALOA Configuration Management Interface
  • Makes it easy to plug in new components (extractors and generators), for instance:
      • Extractor for multimedia LOs (e.g. audio, video, image, flash)
      • Generator for a specific context (e.g. LMS)
  • The components can be deployed on different machines or on different application servers
  • Once deployed, a component can be plugged into ALOA by just giving the address of the component service
  • The ALOA core engine validates the component and adds it to the component list in the properties file
  • Dynamic addition at run time; no need to recompile and rebuild the system
  • The ALOA CMI also allows managing the priorities of the generators and defining the maximum generated values (used by the ALOA core engine)
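Plugging in a component by address can be sketched as an update to a key=value properties file. The property keys ("extractors", "generators"), the comma-separated layout, and both function names are assumptions; the actual file format is not given in the slides. Validation here only checks that the address is a well-formed http(s) URL, whereas the real core engine presumably checks the service itself.

```python
from urllib.parse import urlparse


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def plug_in(props, kind, address):
    """Return a copy of props with the component address added."""
    if kind not in ("extractors", "generators"):
        raise ValueError(f"unknown component kind: {kind}")
    parsed = urlparse(address)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not a valid service address: {address}")
    current = [a for a in props.get(kind, "").split(",") if a]
    if address not in current:  # adding the same address twice is a no-op
        current.append(address)
    updated = dict(props)
    updated[kind] = ",".join(current)
    return updated
```

Since the component list lives in configuration rather than in code, a new extractor or generator becomes available without recompiling anything, in contrast to the factory-based approach of AMG v.2.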
ALOA and AMG
  • ALOA adopts a slightly modified version of the SAmgI WSDL specification
      • New methods: getLanguages, setLanguages
      • Modified method: getMetadata
  • Web Services-based interactions between ALOA and AMG are possible
      • ALOA as a new SAmgI installation used by the federated AMG engine
      • AMG as a new component (i.e. extractor or generator) of ALOA
Conclusion
  • ALOA – A framework for LOM-based automatic metadata generation
  • ALOA already implements different components (i.e. extractors and generators)
  • ALOA already generates LOM from different types of LOs (html, plain text, pdf, ppt, word)
  • Primary focus on flexibility and extensibility of the framework
  • SOA-based architecture enabling new components to be easily plugged into the basic system
  • ALOA provides a public Web Services API for third party applications
Future Work
  • Interactions between ALOA and AMG
  • Extension with more extractors and generators based on other text/data mining techniques
  • Look at model transformation techniques to support other metadata schemas (e.g. DC, MPEG)
  • Further research into the quality of automatically generated metadata
  • Combination of automatic metadata generation with a bottom-up approach (e.g. Web 2.0 social tagging)
Thank You!
