Model-Driven Design of Audiovisual Indexing Processes for Search Apps.

As the Web becomes a platform for multimedia content consumption, audiovisual search assumes a central role in providing users with the content most adequate to their information needs. A key issue for enabling audiovisual search is extracting indexable knowledge from opaque media. Such a process is heavily constrained by scalability and performance issues and must be able to flexibly incorporate specialized components for extracting selected features from media elements. This paper shows how a model-driven approach can help designers specify multimedia indexing processes, verify properties of interest in such processes, and generate the code that orchestrates the components, so as to enable rapid prototyping of content analysis processes in the presence of evolving requirements.

  • Good morning. I’m here today to present my doctoral thesis, “Model-driven development of Search-Based Web Applications”.
  • If we consider 1) the huge increase in the availability of digital data that we have witnessed in recent years, and 2) the spread of the Internet as one of the main means of communication, from which everyday life has borrowed many interaction paradigms (e.g., Web search, which accustomed us to the ease and comfort of a single text-box search interface), we can say that search (intended as querying heterogeneous data sources by means of search engines) has become a de facto standard paradigm for information seeking in many usage domains, not only the Web. We can therefore define SBAs as the class of applications in which searching over heterogeneous data constitutes the predominant user interaction paradigm. Unlike search engines, which are canned applications with fixed behaviour, homogeneous data sources, simple content processing flows and basic query flows, SBAs are typically tailor-made solutions, where the nature of the data and of users' needs varies across business sectors and requirements. For instance, the front end of a professional SBA for searching audio content can offer advanced feature extraction (e.g., inference of music mood, genre, key, etc.) and a simple keyword-based interface. The same application could be extended and offered to the general public by adding a more captivating front end that allows query by humming, similarity search with music recorded on a mobile terminal, and so on. Likewise, on the back-end side, the content processing pipeline may vary, e.g., due to the added requirement of extracting a novel feature (e.g., music danceability) or of integrating a novel feature extraction algorithm. In SBAs, search engines are part of a complex system involving heterogeneous data source integration, articulated content analysis operations, complex querying processes, and Web-mediated social interactions.
  • Even starting from such an informal definition, it is clear that SBAs are inherently complex applications. They expose many of the functionalities typical of modern Web applications (such as adaptivity, multi-channel delivery, personalization, etc.), but they also retain a specific flavor due to the prevailing role of search. They must integrate a complex front end (devoted to query expression and result presentation) with a complex back end (specialized in content provisioning, annotation, indexing and distributed query execution). The trait d'union of these apparently dissimilar concerns is that both embody data-intensive and process-intensive tasks. The query and the result list are first-class citizens subject to a life cycle: they must be obtained, reformulated, possibly integrated with external information, and stored. The same is true for content, which must be provisioned, transcoded, analyzed, annotated, and indexed. We can identify three reference processes: + CPI process: indexation of content coming from the application data sources (involving data retrieval from external sources, transformation or aggregation of the retrieved data and, finally, their indexation) + QRP process: the operations related to query execution, orchestration and result-set composition + User Interaction process: the way users interact with the application's functionalities. The heterogeneity of SBAs has several causes: + the heterogeneous nature of the managed content + the information needs of their users (end users, professionals) + the business field + the reduced time-to-market for innovations, especially in analysis technologies + the need for fast prototyping
  • The current state of the art (both in industry and in academia) does not provide methodologies and tools that cater holistically for the needs of SBA development. Current practices usually rely either on the developers' programming skills or on simple models that address only part of SBA complexity. This situation leads to problems well known in software engineering, such as a lack of separation of concerns among the involved actors, low productivity, and difficulties in managing and maintaining the application over time. We claim that SBA development demands new methodologies and tools. These should follow a path similar to the evolution we have witnessed in fields such as Web engineering, while taking into account the features typical of the domain: a rational development process, clear separation of concerns among the involved actors, a central role for models, automatic code generation, etc. My thesis therefore proposes a modeling framework specifically addressed to SBAs. Modeling frameworks help rationalize the design and development process while reducing the overhead of common application development activities. Moreover, frameworks enable the reuse of software designs and provide tools for specifying the relevant aspects of the application.
  • Here’s an outline of my thesis’s contributions: + A modeling framework for the model-driven development of SBAs, comprising - a set of reference domain-specific design dimensions - development methodologies - modeling tools + Semi-automatic code generation from computer-aided design tools + A reference data and process model for the three identified processes (CAI, QRP and UI), which exemplifies how the identified design dimensions influence their specification + The practical implementation of the proposed approach in a tool set + The validation of the approach in the context of an EU project
  • The work lies at the intersection of three disciplines: information retrieval, business process design and model-driven development. From these we selectively derived concepts and principles as foundations of our SBA modeling framework. Search engines and SBAs derive from decades of work in the field of IR, a discipline devoted to the problem of finding content that satisfies an information need from within large collections. The discipline has proposed solutions to this problem from many points of view (the type of managed documents and their representation, analysis operations, search algorithms tailored to given document representations, query formats, user interaction including its social variants, and so on). Business process design is a method for representing processes (of different nature) in terms of related, structured activities or tasks that produce a specific service or product. Several standards and methodologies have been proposed to address BP design at a conceptual level. One of the most notable is the well-known BPMN, now supported by the OMG. MDD for Web engineering, instead, is a discipline born around 15 years ago, aimed at raising the level of abstraction for (data-intensive or process-intensive) applications on the Web. It shares (or borrows) part of its methodology from the MDA (Model Driven Architecture) initiative, and it fosters the use of models and model transformations as the key artifacts for application development. Here too, many methodologies have been proposed. WebML, the Web Modeling Language, is one such proposal, and it is the one adopted in this work.
  • A slide to give a brief introduction to the modeling languages adopted in the thesis. The BPMN notation can represent all the basic process concepts defined by the WfMC model: + ACTIVITIES: the units of work composing a process, typically performed by a single actor + FLOWS: show the order in which the activities will be performed + CONSTRAINTS: conditions, related to activity selection and/or completion, that must be met during work processing + ARTIFACTS: data objects show the reader which data is required or produced by an activity + EVENTS: denote something that happens (whereas activities are something that is done) ***** WebML is a conceptual language for the specification of data-intensive Web applications, i.e., applications whose main purpose is the manipulation or publication of data. Among the different Web engineering proposals, we chose WebML for several reasons: its general diffusion and acceptance, the simplicity of its notation, and the availability of a code generator that we could extend to validate our proposals on real applications. WebML decomposes the Web application development problem into three major aspects: + the domain model, representing the objects and items managed by the Web application (ER/UML) + the navigation model, which covers two aspects: the design of how users interact with the application, and the reaction of the system upon link navigation + service composition and orchestration models: the goal of the process model is to let designers specify the workflow enacted by a Web application. The process model is usually defined before the domain and application models, as it influences the design of both.
  • One aspect of the proposed development framework is the definition of a methodology for the design and implementation of the application to be produced. A development approach based on a formal methodology and appropriate high-level modeling languages smoothly incorporates change management into the mainstream production life cycle. It also greatly reduces the risk that changes break the software engineering process. The proposed methodology follows the MDD approach, relying on incremental, iterative design steps that foster separation of concerns among the actors involved in SBA design. The Conceptual Design macro-activity is the core of the development life cycle, since it involves the main design activities. In MDD terminology, the BPMN process model can be seen as a Computation Independent Model (CIM), which specifies SBA requirements for the CAI and QRP processes. The UI process, as we will see, is instead addressed as an interaction pattern composition activity. The WebML application model is a Platform Independent Model (PIM), which exploits SOA and Web hypertext interfaces as a technical space. Finally, the application code is a Platform Specific Model (PSM) for the Java 2 technical space. Initially, requirements are conceptualized in two models. The Domain Model formalizes the essential data objects managed by the application, and the Process Model pinpoints the workflow of the CAI, QRP and UI processes. The link between the domain and process models is established by the types of objects that flow between activities. The designed solutions do not take into account domain-specific information such as the schema of the adopted search technologies or the format of the annotations produced by the analysis components. Nonetheless, the focus on a specific class of applications allows one to include high-level concepts relative to the application domain in the business model.
For SBAs, for instance, these are the concepts of query, user, index and so on. A high-level model combined with coarse-grained domain concepts allows one to address the designed application in perspective, possibly creating designs that apply to classes of applications (e.g., audiovisual search engines) rather than one-off solutions. Abstract-level notation, though, cannot be translated into running code. It lacks the platform-specific details (e.g., the technologies adopted by the actual search engines, analysis components, deployment platform, etc.) needed to enact code generation. The Domain Model and Process Model therefore undergo a first (CIM-to-PIM) transformation, which produces the Application Model and the process metadata. Coarse-grained design is thus followed by refinements that take into account more domain-specific information, such as the structure and format of the content, annotations and indexes. To do so, a finer-grained model is adopted, enabling the definition of the domain- and application-specific details that can lead to automatic code generation. The proposed approach is generic enough to adopt alternative modeling languages, both for process and for application design. This slide discusses how to derive an application model from a high-level process model. The proposed framework employs the BPMN modeling language for process specification and the WebML modeling language for the design of hypertexts and Web service orchestrations.
  • Let’s now take a bird’s-eye view of some reference example designs for the three identified SBA processes. The CAI process can be defined as the work to be performed by the actors of an SBA to achieve the indexation of a content item. The goal of the domain model is to formalize the content- and index-related data and metadata managed by the search application. Such models build on five basic domain concepts: + Content Item: an individual information unit that is relevant for indexing purposes in a search-based Web application. + Annotation: the textual information associated with a content item for indexing and searching purposes. Such information may be of different kinds: manual annotations, provided by the content provider or by users, and automatically generated annotations, produced by the search application during the indexing process. + Usage Group: content items are published by one or more Content Providers, which are responsible for their publication. A usage group is an access profile specified by a content provider to define the set of operations a set of users is allowed to perform on a given content item. + Index: the notion of index, well known in many disciplines of computer science, denotes a data structure designed to optimize the speed and performance of finding content items relevant to a search query.
  • Here, instead, is an example of a top-down specification of an indexing process using the proposed methodology. The figure depicts a coarse model of the indexing process in BPMN, where just a single actor (the search application) is considered. The indexing process is composed of three atomic sub-processes that span the architectural layers of the back-end business logic tier: the content registration process, the content analysis process and the content indexation process. Together, the three processes create the search application’s indexes. They start from content items in their original format and go through the creation of suitable annotations, produced either automatically by the system or manually by users. Others....
  • BPMN has known problems, e.g., in converting BPMN models into executable environments. In SBAs, where processes are also data-intensive, there is the additional problem of missing support for the formal definition of process data and of data flows among activities. Moreover, being a high-level, domain-independent model, BPMN does not convey domain-specific information about the business logic enacted by the activities. This hinders even further model transformation and, ultimately, the automatic generation of running applications. For this reason we formalized extensions to the BPMN language so as to introduce the concepts of.....
  • Thanks to the implemented extensions, we inject more information into the higher-level model, thus leading to: + finer-grained application models + fewer errors + more efficiency. Transformations were implemented in ATL, a model transformation language. Here’s a graphical example of a model transformation between BPMN* activities and the WebML model, and here’s a hint of how transformations are coded.
  • BPMN extensions also allow alternative representations of a CAI process, in order to perform additional verifications of the process properties. For instance, to assess the outputs produced by a CAI process, we devised a method that defines a textual, algebraic representation of the process. Each typed activity is defined as an expression that depends on its type and on the process schema. The semantics of each activity is therefore defined extensionally, as a tabular expression of its input-output pairs. Starting from the BPMN* process definition, a decomposition method lets us solve the algebraic representation of the process to obtain the process output for the ground-truth data. With this method, it is possible to assess how variations in the activities’ business logic produce variations in the representativeness of the annotations. ***** The algebraic representation is assembled by encoding the BPMN structural operators. Build a decomposition of the process that satisfies two conditions: each activity is independent from the other activities in the same step, and each activity of step n+1 has a predecessor in step n. Start from the activities with no antecedents in the process and build their input-output expressions (their inputs are empty or literal values). Proceed to the next steps, using the previously determined expressions to denote the outputs of activities in the preceding steps. The specification of the process can be assembled from that of the activities, by properly encoding the BPMN structural operators: links, conditions, and gateways.
The general procedure can be summarized as follows: + Construct the Cartier-Foata decomposition of the process (Cartier & Foata, 1969). This form is the sequentialization of subsequent "steps" defined as above. GOAL: a decomposition into operations that can be performed simultaneously. Since BPMN loops are not treated, the normal form exists and is unique. Examples of process schema dependencies: + GUARD CONDITION: the expression of the condition is composed with that of the activity. For instance, P(X) := (X.duration <= 1200) states that the face recognition component should be called only for short videos. Then, the expression for the output of the AnalyzeFaces activity becomes: if P(TranscodeAudioVideo(RetrieveVideo()).videoOut) then AnalyzeFaces(TranscodeAudioVideo(RetrieveVideo()).videoOut) else (), where () denotes the empty expression. + MUTUAL EXCLUSION: consider an activity a(x) with multiple incoming links from two or more antecedent activities (say a1 and a2) that are associated with the same input value x of a(). The expression of the output of a() becomes if mutuallyExclusive-a1-a2() then (a1() ? a(a1()) : a(a2())) else (). The condition mutuallyExclusive-a1-a2() checks whether the schema of the process ensures that activities a1 and a2 cannot both be executed. This happens when a1 and a2 belong to distinct branches of an XOR gateway. Conversely, if both a1 and a2 can be executed, e.g., because they reside in distinct branches of an AND gateway, the resulting expression of the output of a() is the empty expression.
  • The methodology and the modeling primitives introduced have been implemented in order to test their feasibility and their effectiveness as a means to model search-based Web applications. We extended WebRatio 5 (WR5), a commercial tool for the automatic generation of Web applications, to address the new modeling primitives introduced by this work. We also developed a novel Web editor for the online specification of BPMN models (the SBA-WF Editor), a Web application external to WebRatio but fully interoperable with it. The SBA-WF Editor allows users to create and modify BPMN diagrams targeted at the specification of SBAs. The code generator of WR5 was extended in two ways. First, the overall code generation process now includes a preliminary step: the transformation of the BPMN diagrams designed in the SBA-WF Editor into WebML models. Second, all the libraries and runtime descriptors needed by the new modeling primitives were designed and implemented from scratch. The new runtime architecture adds to the previous one the support for a new source of information, i.e., the search engines. These extensions are required to enable the creation and maintenance of the search engines’ indexes modeled in the Index Model. Process metadata generation has been formalized as an ATL transformation from the BPDM metamodel to the process model.
  • The implementation of the illustrated primitives offered the opportunity to put SBA conceptual modeling to work in the development of real-world applications, in order to evaluate the strengths and weaknesses of the approach. We applied the presented methodology within the context of the PHAROS European project. The evaluation concentrated on two essential issues for MDD, ease of use and effectiveness. The notation and development methodology should provide tangible benefits in terms of learning curve, intuitiveness and communicability of the specifications, and ease of maintenance. Ease of use and effectiveness could be qualitatively evaluated by comparing model-driven development of SBAs with manual coding. This is a long-debated issue (Nussbaumer et al., 2006), which is outside the scope of this work. However, a methodology based on conceptual modeling and code generation gains momentum especially when technology is evolving and common standards are lacking. The advantage derives from generalizing the software design and implementation into an abstract modeling and development framework. This is the current situation in SBA development, where a variety of different tools and architectures are available for much the same objectives.
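The step-by-step procedure described in the notes above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation. The dictionary encoding of the process and the helper names `cartier_foata_steps` and `build_expressions` are hypothetical, and the process is assumed to be loop-free, as the notes require. An activity goes in step 0 if it has no antecedents; otherwise it goes one step after its latest antecedent. This yields steps whose activities are mutually independent, and every activity in step n+1 has an antecedent in step n. Expressions are then assembled step by step, and an optional guard is wrapped in the if/then/else form used for AnalyzeFaces.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical encoding: each activity lists its inputs as
# (source activity, output port); port None means the whole output is used.
Process = Dict[str, List[Tuple[str, Optional[str]]]]

PROCESS: Process = {
    "RetrieveVideo": [],
    "TranscodeAudioVideo": [("RetrieveVideo", None)],
    "AnalyzeAudio": [("TranscodeAudioVideo", "audioOut")],
    "AnalyzeFaces": [("TranscodeAudioVideo", "videoOut")],
    "AggregateAnnotation": [("AnalyzeAudio", None), ("AnalyzeFaces", None)],
    "indexAnnotation": [("AggregateAnnotation", None)],
}


def cartier_foata_steps(process: Process) -> List[List[str]]:
    """Step 0 holds activities with no antecedents; any other activity goes in
    step 1 + (highest step among its antecedents). Assumes no loops."""
    level: Dict[str, int] = {}

    def visit(activity: str) -> int:
        if activity not in level:
            preds = [src for src, _ in process[activity]]
            level[activity] = (1 + max(map(visit, preds))) if preds else 0
        return level[activity]

    for activity in process:
        visit(activity)
    steps: Dict[int, List[str]] = defaultdict(list)
    for activity, n in level.items():
        steps[n].append(activity)
    return [sorted(steps[n]) for n in range(len(steps))]


def build_expressions(process: Process,
                      guards: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Walk the steps in order, reusing earlier expressions to denote inputs.
    A guard P on activity a yields 'if P(x) then a(x) else ()'."""
    guards = guards or {}
    expr: Dict[str, str] = {}
    for step in cartier_foata_steps(process):
        for activity in step:
            args = ", ".join(
                expr[src] + (f".{port}" if port else "")
                for src, port in process[activity]
            )
            call = f"{activity}({args})"
            if activity in guards:
                call = f"if {guards[activity]}({args}) then {call} else ()"
            expr[activity] = call
    return expr


print(cartier_foata_steps(PROCESS))
# [['RetrieveVideo'], ['TranscodeAudioVideo'], ['AnalyzeAudio', 'AnalyzeFaces'], ['AggregateAnnotation'], ['indexAnnotation']]
print(build_expressions(PROCESS)["indexAnnotation"])
print(build_expressions(PROCESS, guards={"AnalyzeFaces": "P"})["AnalyzeFaces"])
```

On the example process, the sketch reproduces the algebraic expression listed on the verification slide of the transcript, as well as the guarded form given in the notes. Mutual exclusion between XOR branches is deliberately left out to keep the sketch short.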

Presentation Transcript

  • Model-Driven Design of Audiovisual Indexing Processes for Search-Based Applications
    • Alessandro Bozzon, Marco Brambilla, Piero Fraternali
    • http://dbgroup.como.polimi.it/brambilla/mdd-search-apps
    • CBMI 2009, June 5th, Chania, Crete
  • Agenda
    • Introduction
    • Search-based applications (SBAs)
    • The Content Provision and Annotation (CPA) process
      • Definition
      • Verification of properties
    • Conclusions
  • Introduction
    • Search has become the prominent paradigm for information seeking across both online spaces and enterprises.
    • Search-Based Applications (SBAs)
    • searching over heterogeneous data constitutes the predominant user interaction paradigm
    • Search Engine
    • canned applications
    • fixed behavior
    • homogeneous data source
    • simple content processing flows
    • basic query flows
    • SBA
    • tailor-made solutions, depending on the nature of the data and of users' needs
    • search engines are part of a complex system
      • data source integration
      • content analysis
      • querying
      • Web-mediated social interactions
  • Motivations
    • SBAs are complex applications integrating:
      • complex front-end: query expression, result presentation, personalization, adaptation
      • complex back-end: content provisioning, annotation, indexing, distributed query execution
    • SBAs embody data-intensive and process-intensive tasks
        • content items, queries, and result lists are first-class objects, subject to a life cycle
        • Content Provisioning and Annotation process (CAI process)
          • indexation of contents coming from the application data sources (data retrieval from external sources, transformation or aggregation, analysis, indexation)
        • Query and Result Presentation process (QRP process)
          • operations related to query analysis, execution, orchestration and result-set composition
        • User Interaction process (UI process)
          • the way users interact with the application's functionalities (statement of information needs, result navigation, social interactions)
  • Thesis
    • Current practices lack complete tools and methodologies
    • rely either on very simple models or on the programming skills of the developers
      • low separation of concerns among involved actors
      • low productivity in the implementation phase
      • problems in managing and maintaining applications over time
    • SBA development demands the same evolution in methods and tools that has characterized the recent progress of software engineering for Web applications
      • clear separation of concerns among the involved actors
      • central roles of models as key development artifacts
      • automatic code generation, etc.
    SBA Model Driven Development Framework
  • The contribution
    • A modeling framework
    • an intuitive Rich Process Model to specify the schema of the CPA process for a given application,
    • use of formal methods to investigate properties of interest of the design
    • code generation techniques to implement and deploy the process
  • Background
    • Information Retrieval
      • “...the process of finding contents (e.g., textual documents, multimedia items, etc.) that satisfy an information need from within large collections (usually stored on computers)...”
    • Business Process Design
      • representing processes (of heterogeneous nature) in terms of related, structured activities or tasks that produce a specific service or product
      • several proposals for visual modeling languages (e.g., UML, YAWL, BPMN)
    • Model Driven Web Engineering
      • raising the level of abstraction (separating platform-independent and platform-dependent concerns) in Web application design and development
      • use of models (and model transformations) as the key artifacts for application development
      • several proposals (e.g., UML, Hera, OOHDM, UWE, W2000, WebML)
    • Formal methods
      • Formal semantics to models
      • Automatic verification
  • Background Models: BPMN and WebML
    • BPMN (Business Process Modeling Notation)
      • Activities
      • Flows
      • Constraints (OR, XOR, AND gateways)
      • Artifacts (data objects and data associations)
      • Events
    • WebML (Web Modeling Language)
      • Domain models
      • Navigation models (content publication and manipulation, link behavioral semantics)
      • Service composition and orchestration models (Web service invocation and publication, XML management)
  • Modeling Approach
    • Process Models: BPMN (CAI, QRP), Interaction Pattern Composition (UI)
    • Domain data and process metadata: ER/UML
    • Application Models: WebML
    • Model To Model Transformation: Java / ATL
    • Model To Code Transformation: Java
  • CPA Process - Domain Model
    • Content model: the objects that relate to the Content Items indexed by a search application
    • Annotation model: structure of the annotations associated with searchable Content Items during the indexing process
    • Usage model: users, roles, permissions
    • Index model: abstraction for the actual physical implementation of search engine indexes
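As a rough illustration of how the four sub-models relate, the following Python sketch models content items, annotations, usage groups and a logical index. All class and field names are hypothetical, and usage groups are simplified to read access only. The `Index` class stands in for the abstraction over a physical search-engine index; here it is just an in-memory inverted index, not a real search engine.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ContentItem:
    item_id: str
    media_uri: str


@dataclass
class Annotation:
    item_id: str
    text: str
    automatic: bool  # True if produced by an analysis component, False if manual


@dataclass
class UsageGroup:
    name: str
    users: Set[str] = field(default_factory=set)
    readable_items: Set[str] = field(default_factory=set)  # simplification: read access only


class Index:
    """Logical index: annotation terms -> content item ids. A real SBA would
    delegate to a search engine; this inverted index is only a stand-in."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = {}

    def add(self, annotation: Annotation) -> None:
        for term in annotation.text.lower().split():
            self._postings.setdefault(term, set()).add(annotation.item_id)

    def search(self, term: str, user: str, groups: List[UsageGroup]) -> List[str]:
        hits = self._postings.get(term.lower(), set())
        # Keep only the items that some usage group of this user may read.
        visible: Set[str] = set()
        for group in groups:
            if user in group.users:
                visible |= group.readable_items
        return sorted(hits & visible)


c1 = ContentItem("c1", "clip1.mp4")
c2 = ContentItem("c2", "clip2.mp4")
index = Index()
index.add(Annotation(c1.item_id, "Weather forecast", automatic=True))
index.add(Annotation(c2.item_id, "weather map", automatic=False))
groups = [UsageGroup("public", users={"alice"}, readable_items={"c1"})]
print(index.search("weather", "alice", groups))  # ['c1']
print(index.search("weather", "bob", groups))    # []
```

Keeping the index behind a small interface mirrors the slide's point: the Index model abstracts the physical implementation, so a different search engine could replace the in-memory postings without touching the content or annotation model.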
  • CPA Process - Process and Application design
    • Coarse CPA process model
      • Content Registration
      • Content Analysis
      • Content Indexation
    • Fine-grained process model
      • Analysis of audiovisual content through face recognition and identification technologies
    • Application model
      • Invocation algorithm of annotation technologies
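    [Figure: coarse process model → Refinement → fine-grained process model → M2M Transformation → application model → M2T Transformation → running process]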
    • Running CPA process
      • Console trace of the working annotation technology
      • Process advancement control UI
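The coarse process (content registration, content analysis, content indexation) can be mimicked by a small Python pipeline with pluggable analysis components. Pluggability is what lets the process evolve when a new feature extractor is added. This is an illustrative sketch only. `CPAProcess`, `add_analyzer` and the lambda analyzers are hypothetical stand-ins for real face recognition or speech components, and no BPMN or WebML model is involved.

```python
from typing import Callable, Dict, List, Set

# A hypothetical analysis component: media reference -> annotation terms.
Analyzer = Callable[[str], List[str]]


class CPAProcess:
    def __init__(self) -> None:
        self.registry: Dict[str, str] = {}        # item id -> media reference
        self.analyzers: Dict[str, Analyzer] = {}  # pluggable analysis components
        self.index: Dict[str, Set[str]] = {}      # annotation term -> item ids

    def add_analyzer(self, name: str, analyzer: Analyzer) -> None:
        self.analyzers[name] = analyzer

    def register_content(self, item_id: str, media_ref: str) -> None:
        self.registry[item_id] = media_ref

    def analyze_content(self, item_id: str) -> List[str]:
        media = self.registry[item_id]
        annotations: List[str] = []
        for name in sorted(self.analyzers):  # deterministic order
            annotations.extend(f"{name}:{term}" for term in self.analyzers[name](media))
        return annotations

    def index_content(self, item_id: str, annotations: List[str]) -> None:
        for term in annotations:
            self.index.setdefault(term, set()).add(item_id)

    def run(self, item_id: str, media_ref: str) -> None:
        """Content registration -> content analysis -> content indexation."""
        self.register_content(item_id, media_ref)
        self.index_content(item_id, self.analyze_content(item_id))


process = CPAProcess()
process.add_analyzer("speech", lambda media: ["weather"])
process.run("c1", "clip1.mp4")
# A new requirement (e.g., face detection) becomes a new plug-in, not a new process.
process.add_analyzer("faces", lambda media: ["anchor"] if media.endswith(".mp4") else [])
process.run("c2", "clip2.mp4")
print(process.index)
```

Only the second item is annotated by the face plug-in, because the plug-in was added after the first run. Re-running the process over earlier content would be needed to bring the index up to date.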
  • CPA - Examples of complex processes
    • Analysis of audiovisual content
    • Analysis of audiovisual content with textual annotations
  • Search-specific extensions to the process model (BPMN*)
    • BPMN
      • does not provide support for:
        • process data
        • data flows and dependencies among activities
        • does not convey domain-specific information about the business logic of the modeled activities
    • Formalized extensions to the BPMN language, introducing:
      • typed attributes (content, annotation, process)
      • typed activities
      • activity properties
        • type of produced and consumed data
        • type of enforced operations (classified w.r.t. their input/output)
      • finer-grained data flows
        • mapping of links with guard conditions and parameters
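The Python sketch below makes the idea of typed activities and typed data flows concrete with one kind of check these extensions make possible. Each link must carry an attribute type that its source produces and its target consumes, and every consumed type must be fed by some link. The classes, the operation labels and the `check_flows` helper are hypothetical; they are not the BPMN* metamodel itself. The guard string is carried along but not evaluated.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class TypedActivity:
    name: str
    operation: str             # hypothetical operation class, e.g. "transcode"
    consumes: Tuple[str, ...]  # typed attributes read by the activity
    produces: Tuple[str, ...]  # typed attributes written by the activity


@dataclass(frozen=True)
class DataFlow:
    source: str
    target: str
    attribute: str               # typed attribute carried by the link
    guard: Optional[str] = None  # e.g. "duration <= 1200"; not evaluated here


def check_flows(activities: List[TypedActivity], flows: List[DataFlow]) -> List[str]:
    """Return type errors in the data flows (an empty list means none found)."""
    by_name = {a.name: a for a in activities}
    errors: List[str] = []
    for f in flows:
        if f.attribute not in by_name[f.source].produces:
            errors.append(f"{f.source} does not produce {f.attribute}")
        if f.attribute not in by_name[f.target].consumes:
            errors.append(f"{f.target} does not consume {f.attribute}")
    for a in activities:
        fed = {f.attribute for f in flows if f.target == a.name}
        for missing in sorted(set(a.consumes) - fed):
            errors.append(f"{a.name}: no incoming flow for {missing}")
    return errors


ACTIVITIES = [
    TypedActivity("RetrieveVideo", "retrieve", (), ("video",)),
    TypedActivity("TranscodeAudioVideo", "transcode", ("video",), ("audio", "frames")),
    TypedActivity("AnalyzeAudio", "analyze", ("audio",), ("annotation",)),
    TypedActivity("AnalyzeFaces", "analyze", ("frames",), ("annotation",)),
]
GOOD = [
    DataFlow("RetrieveVideo", "TranscodeAudioVideo", "video"),
    DataFlow("TranscodeAudioVideo", "AnalyzeAudio", "audio"),
    DataFlow("TranscodeAudioVideo", "AnalyzeFaces", "frames", guard="duration <= 1200"),
]
# A design error: the face analyzer is wired to the audio track.
BAD = GOOD[:2] + [DataFlow("TranscodeAudioVideo", "AnalyzeFaces", "audio")]

print(check_flows(ACTIVITIES, GOOD))  # []
print(check_flows(ACTIVITIES, BAD))
```

Plain BPMN links carry no types, so a check of this kind cannot be expressed on them. That gap is one way the "less errors" benefit listed on the next slide could be realized.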
  • BPMN* > WebML transformation
    • More detailed transformation rules
      • finer-grained Application Model, needing less (or no) refinement by the designer
        • typed activities enable reusable PIM models
        • data dependencies are specified at a higher level
      • fewer errors in Application Model design
      • faster SBA development
    [Figure: ATL transformation. Example of an ATL BPDM-WebML transformation rule for activities]
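The transformation rules themselves were written in ATL. The Python sketch below only mirrors the idea of a rule-based model-to-model mapping, to suggest why typed activities yield finer-grained application models. Each activity's operation class selects the kind of target element, and its consumed and produced types become the element's parameters. The rule table and the target element kinds (`InvokeAnalysisOp` and so on) are invented placeholders. They are not WebML's metamodel or the authors' ATL rules.

```python
from typing import Dict, List

# Hypothetical rule table: operation class of a typed activity -> target element kind.
RULES: Dict[str, str] = {
    "retrieve": "RetrieveContentOp",
    "transcode": "InvokeServiceOp",
    "analyze": "InvokeAnalysisOp",
    "aggregate": "AggregateAnnotationsOp",
    "index": "UpdateIndexOp",
}


def transform_activity(activity: Dict) -> Dict:
    """One M2M rule: a typed activity becomes one application-model element whose
    parameters come straight from the activity's consumed/produced types."""
    operation = activity["operation"]
    if operation not in RULES:
        raise ValueError(f"no transformation rule for operation {operation!r}")
    return {
        "kind": RULES[operation],
        "name": activity["name"],
        "inputs": list(activity["consumes"]),
        "outputs": list(activity["produces"]),
    }


def transform_process(activities: List[Dict], flows: List[Dict]) -> Dict:
    """Map every activity through its rule and every data flow to a parameter binding."""
    return {
        "elements": [transform_activity(a) for a in activities],
        "bindings": [
            {"from": f["source"], "to": f["target"], "parameter": f["attribute"]}
            for f in flows
        ],
    }


activities = [
    {"name": "TranscodeAudioVideo", "operation": "transcode",
     "consumes": ["video"], "produces": ["audio", "frames"]},
    {"name": "AnalyzeFaces", "operation": "analyze",
     "consumes": ["frames"], "produces": ["annotation"]},
]
flows = [{"source": "TranscodeAudioVideo", "target": "AnalyzeFaces", "attribute": "frames"}]
model = transform_process(activities, flows)
print(model["elements"][1])
# {'kind': 'InvokeAnalysisOp', 'name': 'AnalyzeFaces', 'inputs': ['frames'], 'outputs': ['annotation']}
```

An untyped activity would give the rule nothing to select on or to copy into parameters. The designer would then have to add that information by hand in the application model.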
  • Verification of properties for CPA processes
    • Assessment of CPA process outputs
      • Algebraic representation
        • The expression of an activity depends on the type of the activity (number and type of inputs/outputs) and on the schema of the process (activity pre-conditions, mutual exclusions or indetermination in setting activity inputs)
      • The expression evaluation requires assigning the semantics to each activity
        • functions from input to output values, described extensionally, as tabular expressions (or ground truth)
    • Variation of activities’ business logics (or in the process schema) produces variations in the representativeness of annotations
      • assessment of produced annotation quality, validation against expected outcomes, compare outcomes of different processes
    • indexAnnotation(AggregateAnnotation(AnalyzeAudio(TranscodeAudioVideo(RetrieveVideo()).audioOut), AnalyzeFaces(TranscodeAudioVideo(RetrieveVideo()).videoOut)))
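Each activity's semantics is given as a table of input/output pairs, so the expression above can be evaluated directly once each activity is bound to its table. The Python sketch below is illustrative only. The clip, file names and annotation values are invented ground truth, and `tabular`, `evaluate` and `recall` are hypothetical helpers. AggregateAnnotation and indexAnnotation are written as plain functions for brevity. Swapping the face-analysis table for a different one shows how a change in one activity's business logic changes the representativeness of the produced annotations. Representativeness is measured here, as an assumption, by recall against an expected annotation set.

```python
from types import SimpleNamespace
from typing import Callable, Dict, FrozenSet, List, Set, Tuple


def tabular(name: str, table: Dict[Tuple, object]) -> Callable:
    """An activity defined extensionally: a lookup from input tuples to outputs."""
    def activity(*args):
        if args not in table:
            raise KeyError(f"{name}: no tabular entry for input {args!r}")
        return table[args]
    return activity


# Invented ground truth for a single clip.
RetrieveVideo = tabular("RetrieveVideo", {(): "clip-01"})
TranscodeAudioVideo = tabular("TranscodeAudioVideo", {
    ("clip-01",): SimpleNamespace(audioOut="clip-01.wav", videoOut="clip-01.mpg"),
})
AnalyzeAudio = tabular("AnalyzeAudio", {("clip-01.wav",): frozenset({"speech:weather"})})
AnalyzeFaces = tabular("AnalyzeFaces", {("clip-01.mpg",): frozenset({"face:anchor"})})


def AggregateAnnotation(*annotation_sets: FrozenSet[str]) -> FrozenSet[str]:
    return frozenset().union(*annotation_sets)


def indexAnnotation(annotations: FrozenSet[str]) -> List[str]:
    return sorted(annotations)


def evaluate(analyze_faces: Callable = AnalyzeFaces) -> List[str]:
    # The slide's expression, with the face-analysis activity as a parameter.
    return indexAnnotation(AggregateAnnotation(
        AnalyzeAudio(TranscodeAudioVideo(RetrieveVideo()).audioOut),
        analyze_faces(TranscodeAudioVideo(RetrieveVideo()).videoOut)))


def recall(annotations: List[str], expected: Set[str]) -> float:
    return len(set(annotations) & expected) / len(expected)


EXPECTED = {"speech:weather", "face:anchor", "face:reporter"}
# Hypothetical alternative face component that also recognizes the reporter.
AnalyzeFacesV2 = tabular("AnalyzeFaces", {
    ("clip-01.mpg",): frozenset({"face:anchor", "face:reporter"}),
})

print(evaluate(), recall(evaluate(), EXPECTED))
# ['face:anchor', 'speech:weather'] 0.6666666666666666
print(evaluate(AnalyzeFacesV2), recall(evaluate(AnalyzeFacesV2), EXPECTED))
# ['face:anchor', 'face:reporter', 'speech:weather'] 1.0
```

Any other quality measure (precision, validation against expected outcomes, comparison of two processes) could replace `recall`. The point is only that tabular semantics make the process output computable before any real component runs.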
  • Implementation
    • Extensions to a commercial tool for the automatic generation of Web applications
    • Creation of a Web-based BPMN editor for the specification of the CAI and QRP process models
    • Set of model to model and model to text transformations
  • Experiences
    • PHAROS (Platform for searcH of Audiovisual Resources across Online Spaces) project
      • First-hand experience on a large-scale SBA application
        • Adoption of models in the platform design
          • Domain model
          • Query and result presentation process
          • User interaction process
          • Experiments with the Content Analysis and Indexing process
      • Developer evaluation
    • Pros: complete set of design dimensions; high-quality prototypes; quick prototyping cycles; fast development time
    • Cons: lack of reverse transformations from application to process models; availability of only simple guard condition expressions
  • Conclusions and future work
    • A modeling framework
    • an intuitive and visual design of the CPA process,
      • Helps the initial design of the process
      • Helps its evolution when requirements change
    • the extensibility of the model, based on a plugin approach
    • the amenability to automatic verification
      • for checking properties
      • for estimating the characteristics of a specific CPA configuration
    • the availability of well-established code generation technology
      • great improvement in productivity
    • Future work
    • extension of the validation capabilities
    • industrial implementation
  • Thank You! Marco Brambilla [email_address] www.dei.polimi.it http://www.pharos-audiovisual-search.eu/