Cost Estimation of Ontologies Using ONTOCOM
Elena Simperl, Tobias Bürger, Igor Popov, UIBK
Motivation: A typical business scenario

               Ontologies?
                    -   What do we need to build them?
                    -   How do I identify relevant expenditures?
                    -   How much does it cost?
                    -   What do I gain from the introduction of the system?
                    -   How do the gains materialize?



Project ACTIVE
Date: 18.06.2008,
Dubrovnik
Methods and approaches to cost estimation

               •  Expert Judgment
                    -   Bottom-up estimation: experts estimate the costs of low-level components or
                        activities
                    -   Top-down estimation: experts estimate the total costs of a product or a project
               •  Analogy Method
                    -   Bottom-up estimation: costs are calculated using analogies between low-level
                        components or activities
                    -   Top-down estimation: costs are estimated using a global similarity function for
                        products or projects
               •  Decomposition Method
                    -   Bottom-up estimation: costs are calculated as an average sum of the costs of
                        lower-level units whose development costs are known in advance
                    -   Top-down estimation: no entry
               •  Parametric Method
                    -   Bottom-up estimation: costs are calculated using a statistical model which
                        predicts the costs of lower-level units on the basis of historical data about
                        the costs of developing such units
                    -   Top-down estimation: costs are calculated using a statistical model which is
                        calibrated using historical data and predicts the current value of the total
                        development costs

ONTOCOM: Overview



               •  ONTOCOM: a cost estimation model for building ontologies

               •  ONTOCOM combines top-down, parametric and expert-based methods as the
                 basis for estimating the cost of ontology building

               •  ONTOCOM is realized using a combination of methods:
                    -   Top-down breakdown of ontology engineering processes to reduce complexity
                        (Decomposition method)
                    -   Parametric method to create an a-priori statistical prediction model
                    -   Validation and calibration of the model against existing project data and
                        expert estimations, leading to an a-posteriori model (Expert judgment)



ONTOCOM



               •  How ONTOCOM works:

               1. Define lifecycle phases (top-down methodology)
                    -   Ontology building
                    -   Ontology reuse
                    -   Ontology maintenance
               2. Specify cost drivers (parametric methodology)
                    -   Ontology building
                    -   Ontology reuse
                    -   Ontology maintenance
               3. Refine the model (parametric and expert-based methodology)
                    -   Evaluate cost drivers
                    -   Specify start values
                    -   Calibrate the model

Top down breakdown




The parametric equation



               •  PM: effort in person months
               •  A: baseline multiplicative constant (in person months)
               •  Size: expected size of the ontology (distinguishing between entity types,
                 e.g. classes, properties and axioms, and between the size of ontology
                 building, reuse and maintenance)
               •  α: accounts for non-linear behavior with respect to size
               •  EM: effort multipliers (corresponding to the cost drivers)
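
The equation itself did not survive on this slide. Judging from the symbol definitions above and from the form given in the ISWC 2006 paper listed under Sources, it is presumably the following product; the index i and the number n of cost drivers are notation assumed here:

```latex
PM = A \cdot Size^{\alpha} \cdot \prod_{i=1}^{n} EM_i
```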




Effort multipliers



               •  Each process stage is characterized by a specific set of cost drivers
               •  Each cost driver is associated with a set of rating levels
               •  The rating level (from very low to very high) expresses the impact of the cost driver
                 on the development effort
               •  Each rating level of each cost driver is associated with a weight (quantitative
                 analysis), the effort multiplier (EM)
               •  The values of the effort multipliers are subject to further calibration on the basis of
                 the statistical analysis of real-world project data.
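
A minimal sketch of how rating levels might map to effort multipliers. The driver abbreviations are taken from the example slide; every numeric weight below is a hypothetical placeholder (the calibrated ONTOCOM values are not given on the slides), and setting the nominal weight to 1.0 is an assumption borrowed from common parametric cost models.

```python
from math import prod

# Hypothetical weights for illustration only; NOT the calibrated ONTOCOM values.
# Assumption: the "nominal" rating has weight 1.0, i.e. no effect on the effort.
EFFORT_MULTIPLIERS = {
    "DCPLX": {"very low": 0.70, "low": 0.85, "nominal": 1.00, "high": 1.30, "very high": 1.65},
    "OE":    {"very low": 0.75, "low": 0.90, "nominal": 1.00, "high": 1.15, "very high": 1.40},
    "ICPLX": {"very low": 0.80, "low": 0.90, "nominal": 1.00, "high": 1.20, "very high": 1.45},
}


def effort_multiplier(driver, rating):
    """Look up the weight (EM) for one cost driver at one rating level."""
    return EFFORT_MULTIPLIERS[driver][rating]


def combined_multiplier(ratings):
    """Multiply the EMs of all rated cost drivers into a single factor."""
    return prod(effort_multiplier(driver, rating) for driver, rating in ratings.items())


all_nominal = combined_multiplier({"DCPLX": "nominal", "OE": "nominal", "ICPLX": "nominal"})
harder_domain = combined_multiplier({"DCPLX": "high", "OE": "nominal", "ICPLX": "nominal"})
```

With these placeholder weights, rating every driver as nominal leaves the estimate unchanged, while a high domain complexity scales it up.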




Cost drivers



               •  Product drivers account for the influence ontology characteristics have on
                 costs
                    -   e.g. Complexity of the Domain Analysis, Required Reusability, Documentation
                        Needs
               •  Project drivers account for the influence of project setting characteristics
                 on the overall development
                    -   e.g. Support Tools, Multi-site Development
               •  Personnel drivers emphasize the role of team experience, ability and
                 continuity with respect to the effort invested in the process
                    -   e.g. Ontologist/Domain Expert Experience, Language/Tool Experience

               •  Total number of cost drivers: 20
               •  Cost drivers were identified through a literature survey, analysis of empirical
                 data and expert interviews
               •  Overview of the cost drivers: http://ontocom.sti-innsbruck.at/ontocom.htm


ONTOCOM



               •  ONTOCOM Model Calibration

               a-priori method  ->  Calibration  ->  a-posteriori method
                    -   Inputs to the calibration: input from experts, input from gathered data
                    -   Calibration techniques: Linear Regression, Correlation Analysis,
                        Bayesian Analysis
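
The linear-regression part of the calibration can be sketched as follows, assuming the model has the multiplicative form PM = A · Size^α · ∏ EM. Taking logarithms gives ln(PM / ∏ EM) = ln A + α · ln Size, which is a straight line in ln Size. This sketch keeps the effort multipliers fixed and fits only A and α by ordinary least squares. It is not the ONTOCOM calibration procedure, which also re-estimates the multipliers and combines expert input with data through Bayesian analysis. The function name and the synthetic data points are hypothetical.

```python
from math import exp, log, prod


def calibrate_a_and_alpha(projects):
    """Fit A and alpha by least squares on the log-transformed model.

    projects: list of (size, effort_multipliers, actual_person_months).
    """
    xs = [log(size) for size, _, _ in projects]
    # Divide out the (fixed) effort multipliers before taking logs.
    ys = [log(pm / prod(ems)) for _, ems, pm in projects]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    alpha = sxy / sxx                   # slope of the fitted line
    a = exp(mean_y - alpha * mean_x)    # intercept is ln(A)
    return a, alpha


# Synthetic data generated from A = 3.0 and alpha = 0.5 (not survey data),
# used only to check that the fit recovers the generating parameters.
synthetic_projects = [
    (size, [1.0, 1.2], 3.0 * size ** 0.5 * 1.2) for size in (0.5, 1.0, 2.0, 5.0)
]
a, alpha = calibrate_a_and_alpha(synthetic_projects)
```

The fit needs at least two projects with different sizes; otherwise the slope is undefined.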




Using ONTOCOM: An example



               •  Example ontology with 600
                 concepts, 100 relations and 50
                 axioms.
               •  Cost drivers:
                    -   Domain analysis complexity (DCPLX) has a
                        high influence on the effort
                    -   Evaluation of the results (OE) has a
                        high influence on the effort
                    -   Instantiation complexity (ICPLX) has a
                        low influence on the effort
                    -   Remaining cost drivers: nominal effort
               •  Constants A and α: 2.58 and
                 0.15, as obtained from the calibration
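
The example can be turned into a small calculation, assuming the estimate has the form PM = A · Size^α · ∏ EM. A and α are the values from the slide. Two things are assumptions made for this sketch: that Size is expressed in thousands of entities, and the three effort-multiplier values, which are hypothetical placeholders because the slide gives only the rating levels. The resulting number is therefore illustrative and not the estimate from the original presentation.

```python
from math import prod


def estimate_person_months(a, alpha, size, multipliers):
    """Effort in person months: a * size**alpha * product of effort multipliers."""
    return a * size ** alpha * prod(multipliers)


A = 2.58       # from the slide
ALPHA = 0.15   # from the slide

entities = 600 + 100 + 50        # concepts + relations + axioms
size_kilo = entities / 1000      # assumption: size measured in thousands of entities

# Hypothetical multipliers for the non-nominal drivers; nominal drivers are
# assumed to contribute a factor of 1.0 and are therefore omitted.
hypothetical_em = {
    "DCPLX (high)": 1.30,
    "OE (high)": 1.15,
    "ICPLX (low)": 0.90,
}

pm = estimate_person_months(A, ALPHA, size_kilo, hypothetical_em.values())
print(f"Estimated effort: {pm:.2f} person months (illustrative multipliers)")
```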




Data collection using an online survey




                         We need your data – please visit the survey here:
              http://survey.sti2.at/public/survey.php?name=OntocomSurveyJune13




Data collection and model calibration in SALERO



               •  55 multimedia ontologies identified, 15
                 replies (30%)
               •  Survey results
                    -   Main application of multimedia
                        ontologies: annotation (47%)
                    -   Total size between 35 and 10000
                    -   Development effort between 0.5 and
                        130 PM
                    -   Many ontologies were built from
                        scratch (45%)
                    -   Most ontologies are in OWL-DL (53%)
               •  Calibration using linear regression and
                 Bayesian analysis resulted in new
                 effort multipliers
               •  Prediction quality improved!




New web site

            http://ontocom.sti-innsbruck.at




Outlook and future plans



               •  Development of a family of ONTOCOM models
                    -   ONTOCOM-Ultra Lite for the estimation of folksonomies
                    -   ONTOCOM-Lite for the estimation of lightweight ontologies
                    -   ONTOCOM (Standard) for the estimation of heavyweight ontologies
               •  Tool support for ONTOCOM
                    -   Automatic calibration and addition / removal of data points
                    -   Form-based use of ONTOCOM for cost prediction
               •  Benefit estimation of ontologies




Goal: Web 2.0 and semantic technologies’ economic
  measurements – cost estimation



               •  Produce methods to assess the costs of core Web 2.0 and semantic technology
                 solutions
               •  Demonstrate their tangible and measurable benefits within an enterprise to support
                 their adoption
               •  Predict costs for the development, maintenance and usage of Web 2.0 and semantic
                 technology components
               •  How to reach this goal:
                    -   Develop a general model of Semantic Web based applications
                    -   Develop a catalogue of cost drivers for distributed, collaborative applications based on
                        Web 2.0 and semantic technologies
                         - Using literature analysis, expert interviews and knowledge elicitation (use case
                             partners)
                    -   Collect cost-benefit related data to calibrate the model and improve prediction quality
               •  Expected outcome:
                    -   Tool suite for effort estimation, planning and controlling
                    -   Prototypical methods to integrate cost/benefit rationales into collaborative knowledge creation
                        / elicitation tasks

Subgoal: Benefit estimation methods for ontologies



     •  Central question: What are the benefits gained from the introduction of an
       ontology-based application?
     •  Typical distinction: tangible / intangible benefits
     •  Different methods have a quantitative, qualitative or financial output
     •  Requirements: the nature of the benefits of ontologies
       1. Most expected benefits from typical uses are intangible
            - For Communication: to ensure interoperability, for disambiguation (unique
                identification), or for knowledge transfer (by excluding unwanted interpretations
                through informal semantics).
            - For Computational Inference: for browsing / searching (automatic inference of implicit
                facts), for automation / code generation or to spot logical inconsistencies.
            - For Reuse and Organisation of Knowledge: for knowledge reuse or for structuring
                information and knowledge.
       2. As the main impact of the use of ontologies is to improve information communication, the
          method should not have a financial output
       3. Ontologies and the applications using them should be assessed together, as an ontology
          typically only acquires value when used in combination with an application (analogously to
          information systems)
First proposal: A multiple gap model for user information
satisfaction analysis



     •  User Information Satisfaction (UIS) is a method to measure intangible benefits
     •  UIS can be measured through a comparison of user expectations with perceived
       performance on a number of different facets
     •  Multiple gap models are useful for assessing how systems are viewed at various
       stages of their design, implementation, and use
     •  UIS = f(gap_1, ..., gap_n, influencing factors)
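
The slide leaves the function f unspecified. As a purely illustrative sketch, one gap can be computed per facet as the user's expectation minus the perceived performance, and the gaps can be summarized by their mean. The facet names, the rating scale and the choice of a plain mean are all hypothetical, and influencing factors are ignored here.

```python
def facet_gaps(expected, perceived):
    """Gap per facet: expectation minus perceived performance."""
    return {facet: expected[facet] - perceived[facet] for facet in expected}


def mean_gap(gaps):
    """One simple (hypothetical) aggregate: the arithmetic mean of all gaps."""
    return sum(gaps.values()) / len(gaps)


# Hypothetical ratings on a 1-5 scale for two made-up facets.
expected = {"accuracy": 5, "timeliness": 4}
perceived = {"accuracy": 3, "timeliness": 4}

gaps = facet_gaps(expected, perceived)
summary = mean_gap(gaps)
```

A positive gap means that expectations exceed perceived performance on that facet; a gap of zero means expectations are met.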
Sources



               •  Elena Paslaru Bontas Simperl, Christoph Tempich, Malgorzata Mochol:
                 "Cost estimation for ontology development: applying the ONTOCOM model".
                 In: W. Abramowicz and H.C. Mayr, Technologies for Business Information
                 Systems. Springer-Verlag Berlin Heidelberg, 2006.
               •  Elena Paslaru Bontas Simperl, Christoph Tempich, York Sure: "ONTOCOM:
                 A Cost Estimation Model for Ontology Engineering". In: Proceedings of the
                 International Semantic Web Conference (ISWC), 2006.
               •  Tobias Bürger: "A Benefit Estimation Model for Ontologies". In: Poster
                 Proceedings of the 5th European Semantic Web Conference (ESWC),
                 2008.
               •  Further information: see http://ontocom.sti-innsbruck.at/info.htm




Thank you for your attention




