1. Development effort estimation for large scale business ontologies
Presentation
Dr. Elena Simperl
Dr. Christoph Tempich
20.05.2008
2. Content
1. Motivation
2. Framework
3. Cost Driver
4. Case Study
5. Conclusion
080413_CT_SEMANTIC TECHNOLOGY_V04.PPT
Page 1
3. Development effort estimation for large scale business ontologies
Management Summary
Ontocom is a framework that helps you estimate the effort involved in building an
ontology. It makes accurate predictions and can be improved with data from your team.
Ontologies are the key enablers for knowledge exchange across the Web
Ontocom is a framework to estimate the effort related to ontology building
Ontocom comes with
a process for effort estimation
a formula and a tool calculating the estimations and
a methodology to adjust the estimations to a particular company.
Ontocom takes the size, the domain, the development complexity, the
expected quality and the experience of the staff as input factors
Ontocom estimates ontology building costs with a deviation of at most 30% in 80% of the
cases
We successfully applied the methodology in an ontology development project
within a large German telecommunication operator as part of a SOA project.
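The accuracy claim in this summary (a deviation of at most 30% in 80% of the cases) can be checked against a team's own project history. The following is a minimal sketch, assuming the deviation is measured relative to the actual effort; the function name and the sample figures are hypothetical and are not data from the presentation.

```python
def within_tolerance_share(estimates, actuals, tolerance=0.30):
    """Return the share of projects whose estimate lies within
    +/- tolerance of the actual effort (both in person months)."""
    if len(estimates) != len(actuals) or not estimates:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(
        1
        for est, act in zip(estimates, actuals)
        if abs(est - act) <= tolerance * act
    )
    return hits / len(estimates)


# Made-up person-month figures, for illustration only (not project data).
example_estimates = [10.0, 6.0, 20.0, 3.0, 8.0]
example_actuals = [9.0, 8.0, 30.0, 3.5, 8.5]
share = within_tolerance_share(example_estimates, example_actuals)
print(f"{share:.0%} of the estimates are within 30% of the actual effort")
```

With these illustrative numbers, four of the five estimates fall inside the tolerance, which corresponds to the 80% share quoted above.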
5. Motivation
Key challenges for telecommunication operators
For telco IT organizations, the migration from today's vertical, silo-type architecture
towards a flexible, NGN-compliant, horizontal architecture is a disruptive step.
[Figure: Traditional telco stovepipe architecture (vertical) versus future NGN IT layered architecture (horizontal). In the vertical architecture, Voice, Mobile and Data each have their own CRM/Sales, Order Management, Fulfillment, Assurance, Billing and Service Delivery stack. In the horizontal architecture, integrated, convergent services share CRM/Sales/Order Management, Fulfillment, Assurance and Billing layers on top of NGN service delivery and fixed and mobile network access.]
6. Motivation
Reference Architecture
Many operators base this architectural change on a Service Oriented Architecture (SOA)
in which an ontology enables communication between different applications.
[Figure: Reference architecture. Among the labelled components are Customer Portal, Partner Portal, Service Factory, Customer Management (Order Management, Problem Management, Billing, Invoicing), Supplier Management, ERP and Network Factory, connected through a Service Bus together with Ontology, Security, Federated ID, BPM and Master Data Mgmt.]
7. Motivation
Implementation Timeline
The implementation of the new architecture is a long-term effort, and an accurate
estimation of development efforts is the prerequisite for the creation of a good project plan.
[Figure: Elements to align (process, ontology model, data, IT application) and the aligned project plan, with columns 06 to 17 under Feb, Mar and Apr and Process, Data and IT activities for CRM, Resource and Service.]
8. Motivation
Ontocom
Ontocom is a framework to estimate the effort to develop an ontology. Ontocom consists
of a process, a formula and a methodology.
[Figure: Objective and elements of Ontocom. The three elements are Process, Formula and Methodology.]
9. Content
2. Framework
Process
Formula
Example
Methodology
10. Ontocom Framework
Cost estimation in ontology engineering processes
The project manager of an ontology development project estimates the effort for
building, documenting and evaluating the ontology during the requirements analysis.
Requirements analysis: motivating scenarios, use cases, existing solutions, effort estimation, competency questions, application requirements
Conceptualization: conceptualization of the model, integration and extension of existing solutions
Implementation: implementation of the formal model in a representation language
Accompanying activities: knowledge acquisition, documentation, evaluation
11. Ontocom Methodology
Process
Applying Ontocom is easy and follows a five-step process. The project manager defines
the different parameters based on the process guidelines which are part of the framework.
Step 1: Size estimation
Step 2: Evaluation of domain
Step 3: Evaluation of development complexity
Step 4: Evaluation of expected quality
Step 5: Evaluation of personnel
Result: Effort estimation
12. Ontocom Framework
Effort Estimation Formula
The formula uses information collected in the ontology development process and
historical information collected from previous projects to make the effort estimation.
Parametric Effort Estimation Method
$$PM = A \cdot \mathrm{Size}^{B} \cdot \prod_{i} CD_i$$
PM: person months
A: normalization factor
Size: size of the ontology
B: learning factor
CD_i: cost drivers
Legend of the original figure: result, input from project manager, input from methodology.
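The parametric formula on this slide can be written down in a few lines. The following is a minimal sketch, assuming that size is given in kilo entities and that each cost driver rating has already been translated into a numeric multiplier. The values of A, B and the multipliers below are hypothetical placeholders; they are not the calibrated Ontocom parameters, which come from the methodology and the calculation tool.

```python
from math import prod


def estimate_person_months(size_kilo_entities, cost_drivers, a, b):
    """Evaluate PM = A * Size^B * product(CD_i).

    size_kilo_entities: ontology size in thousands of entities
    cost_drivers: iterable of numeric cost driver multipliers
    a: normalization factor, b: learning factor
    """
    if size_kilo_entities <= 0:
        raise ValueError("size must be positive")
    return a * size_kilo_entities ** b * prod(cost_drivers)


# Hypothetical values for illustration only, NOT calibrated Ontocom parameters.
A_EXAMPLE = 3.0
B_EXAMPLE = 1.0
example_drivers = [1.2, 0.9]  # e.g. one driver rated above and one below nominal

pm = estimate_person_months(1.5, example_drivers, A_EXAMPLE, B_EXAMPLE)
print(f"Estimated effort: {pm:.2f} person months")
```

Keeping A and B as explicit arguments makes it easy to swap in values obtained from a company-specific calibration later on.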
13. Ontocom Methodology
Example
The parameters associated with the different cost drivers are predefined in our
calculation tool.
Effort Estimation Formula
6.9 PM = 500 entities * (quality of personnel rating) * (development complexity rating)
Quality of personnel and development complexity are each rated on the scale very high, high, average, low, very low.
14. Ontocom Methodology
Methodology to adapt Ontocom to your company
To make the model accurate, we calculated its parameters from the aggregated
experience of well over 40 ontology engineering projects, and the number is growing.
Phases: Model generation, Data collection, Data analysis, Model calibration, Model usage
Steps: Specify cost drivers, Collect data, Analyze data, Calibrate model, Evaluate model, Release model
[Figure: Effort estimations. The average estimation is plotted with a +/-30% tolerance band.]
The accuracy of the model increases if it is adapted and calibrated with data from your own business.
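One way to picture such a calibration is a regression on a company's own project history. The presentation does not state which statistical technique Ontocom uses, so the following is only a sketch under stated assumptions: it fits A and B with ordinary least squares on log-transformed data, and it assumes the historical efforts have already been divided by the product of their cost driver multipliers. The history in the example is synthetic, generated from known parameters purely to show that the fit recovers them.

```python
import math


def calibrate(sizes, efforts):
    """Fit A and B in PM = A * Size^B by least squares on
    ln(PM) = ln(A) + B * ln(Size)."""
    if len(sizes) != len(efforts) or len(sizes) < 2:
        raise ValueError("need at least two (size, effort) pairs")
    xs = [math.log(s) for s in sizes]
    ys = [math.log(e) for e in efforts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                        # slope is the learning factor B
    a = math.exp(mean_y - b * mean_x)    # intercept gives ln(A)
    return a, b


# Synthetic history generated from known parameters (illustration only).
true_a, true_b = 2.5, 0.8
history_sizes = [0.5, 1.0, 2.0, 4.0]  # kilo entities
history_efforts = [true_a * s ** true_b for s in history_sizes]

fitted_a, fitted_b = calibrate(history_sizes, history_efforts)
print(f"A = {fitted_a:.3f}, B = {fitted_b:.3f}")
```

Real project data is noisy, so a fit on a handful of projects should be read as a rough adjustment rather than a precise parameter set.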
16. Cost drivers
Model application
Step 1: Size of the ontology
Explanation
The size of the ontology includes all first class citizens of an ontology. Size is measured in kilo entities.
All class definitions
All attribute definitions
All relationship definitions
All rule definitions
Guidelines
Determining the size of a prospective ontology is a challenging task at an early stage of the ontology development process. Existing domain ontologies can help to get a rough estimate.
1. Search for existing domain ontologies.
2. Compare the coverage of existing domain ontologies with the required level of detail.
3. Calculate the expected size of the new ontology.
Examples
An ontology has
500 classes
700 attributes
300 relations
no rules
This totals 1.5 k entities.
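The size count from this example can be reproduced with a small helper. This is a minimal sketch; the function name is hypothetical, and it simply adds up the four kinds of first class citizens listed in the explanation and converts the sum to kilo entities.

```python
def ontology_size_kilo_entities(classes, attributes, relations, rules=0):
    """Sum all first class citizens and express the total in kilo entities."""
    counts = (classes, attributes, relations, rules)
    if any(count < 0 for count in counts):
        raise ValueError("counts must not be negative")
    return sum(counts) / 1000


# Figures from the example: 500 classes, 700 attributes, 300 relations, no rules.
size = ontology_size_kilo_entities(classes=500, attributes=700, relations=300)
print(f"{size} k entities")
```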
17. Cost drivers
Model application
Step 2: Evaluation of the domain
Explanation
The Domain Analysis Complexity accounts for those features of the application setting which influence the complexity of the engineering outcomes. It consists of three sub categories:
The domain complexity
The requirements complexity
The available information sources
Guidelines
DOMAIN
Very Low: narrow scope, common-sense knowledge, low connectivity
Very High: wide scope, expert knowledge, high connectivity
REQUIREMENTS
Very Low: few, simple requirements
Very High: very high number of requirements with a high degree of conflict, high number of usability requirements
INFORMATION SOURCES
Very Low: high number of sources in various forms
Very High: none
Examples
An ontology for the cooking domain, having a low number of requirements and a high number of available information sources, has a very low to low domain complexity.
An ontology for the chemistry domain, with a high number of requirements and a low number of available information sources, has a high to very high domain complexity.
18. Cost drivers
Model application
Step 3: Evaluation of the development complexity
Explanation
The Conceptualization Complexity accounts for the impact of a complex conceptual model on the overall costs.
The Implementation Complexity takes into consideration the additional effort arising from the use of a specific implementation language.
Guidelines
CONCEPTUALIZATION
Very Low: concept list
Very High: instances, no patterns, considerable number of constraints
IMPLEMENTATION
Low: The semantics of the conceptualization is compatible with that of the implementation language
High: Major differences between the two
Examples
An ontology for a search application with a thesaurus has a low development complexity.
An ontology for the chemistry domain, modeling reaction patterns, has a high development complexity.
19. Cost drivers
Model application
Step 4: Evaluation of expected quality
Explanation
The Evaluation Complexity accounts for the additional effort invested in generating test cases and evaluating test results. This includes the effort to document the ontology.
The Required Reusability captures the additional effort associated with the development of a reusable ontology.
Guidelines
ONTOLOGY EVALUATION
Very Low: small number of tests, easily generated and reviewed
Very High: extensive testing, difficult to generate and review
REUSABILITY
Very Low: Ontology is used for this application only
Very High: Ontology should be used across many applications as an upper level ontology
Examples
An ontology which is used for one application only, without extensive testing, has a low factor.
An integration ontology which should be used across an entire organization or for many web users, with high documentation requirements, has a high or very high factor.
20. Cost drivers
Model application
Step 5: Evaluation of personnel
Explanation
Ontologist/Domain Expert Capability accounts for the perceived ability and efficiency of the individual actors involved in the process (ontologist and domain expert) as well as their teamwork capabilities.
Ontologist/Domain Expert Experience measures the level of experience of the engineering team with respect to performing ontology engineering.
Guidelines
ONTOLOGIST/DOMAIN EXPERT CAPABILITY
Very Low: 15%
Very High: 95%
ONTOLOGIST/DOMAIN EXPERT EXPERIENCE
Very Low: 2 months (ontology) / 6 months (domain)
Very High: 3 years (ontology) / 7 years (domain)
Examples
A new project member who has never worked with ontologies and has no experience with the domain has a very low expert experience.
A project manager who has been working with ontologies for several years and is experienced in a certain field has a very high expert experience.
21. Content
4. Case Study
22. Case study
Challenge
We used the Ontocom process and formula to create a project plan for the development
of an integration ontology. The challenge was to estimate the size of the ontology.
Step 1: Size estimation
Step 2: Evaluation of domain
Step 3: Evaluation of development complexity
Step 4: Evaluation of expected quality
Step 5: Evaluation of personnel
Result: Effort estimation
23. Case study
Step 1: Size of the ontology
For the integration ontology, we built on the Shared Information/Data Model (SID)
developed by the TMForum.
[Figure: SID model overview]
Process
Existing domain ontologies
The TMForum is an industry organization of telecommunication operators and their vendors.
The SID is a UML model formalizing the knowledge relevant to an operator's business.
It has approx. 4.8 k UML elements.
Coverage of the domain
Initial analysis shows that the average coverage across all sub domains is around 40%.
The estimated size is 12 k entities.
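The size estimate on this slide is consistent with dividing the size of the existing model by its coverage of the domain. The slide does not spell out the calculation, so the following sketch rests on that assumption; the function name is hypothetical.

```python
def expected_size(existing_elements, coverage):
    """Extrapolate the full ontology size from an existing model that
    covers only a fraction (0 < coverage <= 1) of the required domain."""
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in the interval (0, 1]")
    return existing_elements / coverage


# Figures from the slide: approx. 4.8 k UML elements at around 40% coverage.
estimated = expected_size(4800, 0.40)
print(f"Estimated size: {estimated / 1000:.1f} k entities")
```

Under this assumption, 4.8 k elements at 40% coverage extrapolate to the 12 k entities stated above.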
24. Case study
Step 1: Size of the ontology
For a detailed estimation of the person months required to build the ontology in the
respective sub domains, we analyzed the number of classes in each sub domain.
Domain (number of classes): sub domains
Market / Sales (52): Market Strategy & Plan, Market Segment, Marketing Campaign, Competitor, Contact/Lead/Prospect, Sales Statistic, Sales Channel
Customer (74): Customer, Customer Interaction, Customer Order, Customer Statistic, Customer Problem, Customer SLA, Applied Cust. Bill. Rate, Customer Bill, Customer Bill Collection, Customer Bill Inquiry
Product (66): Product, Product Specification, Strat. Prod. Portf. Plan, Product Offering, Product Performance, Product Usage Statistic
Service (124): Service, Service Specification, Service Applications, Service Configuration, Service Performance, Service Usage, Service Strategy & Plan, Service Trouble, Service Test
Resource (241): Resource, Resource Topology, Resource Performance, Resource Strat. & Plan, Resource Specification, Resource Configuration, Resource Usage, Resource Trouble, Resource Test
Supplier / Partner (8): Supplier/Partner, S/P Plan, S/P Interaction, S/P Product, S/P Order, S/P SLA, S/P Problem, S/P Statistic, S/P Performance, S/P Bill, S/P Bill Inquiry, S/P Payment
Enterprise (25): under construction
Common Business (431): Party, Location, Business Interaction, Policy, Agreement
25. Case study
Steps 2 - 5
We rated the remaining factors in accordance with the circumstances found in the
project and following the guidelines of the framework.
Step 2: Evaluation of domain
The telecommunication domain has an above average domain complexity.
Although there are many information sources available, there are many different requirements and it is a broad domain.
Step 3: Evaluation of development complexity
The development of the ontology has an average complexity.
The ontology will be formalized as a UML diagram.
It was not planned to model rules or axioms.
Step 4: Evaluation of expected quality
The expected quality was rated very high.
Due to the cross organizational use, documentation has to be very detailed.
We needed to evaluate the ontology thoroughly.
Step 5: Evaluation of personnel
The personnel was rated slightly below average.
Experience with ontology building was limited in the team.
Modelers with experience in the telecommunications industry are difficult to find.
26. Case study
Effort estimation for the development of the data model: June 2007
The model size will increase slowly and it will take approximately 26 person months to
develop the complete ontology if engineers can continuously work on it.
Result
Estimation:
We used an Excel tool to compute the duration of the development.* We varied the number of entities in order to show the increase in entities over time.
Implications:
If modelers stay in the development team the learning rate is quite high.
For the development of the entire model with approx. 12,000 UML elements we estimated an effort of 24 months. However, as it is a prediction, variations of around 30% are still within the range of the model.
To introduce the first 600 elements, plan six months.
[Figure: Effort estimations. Average estimation with +/-30% tolerance; vertical axis: no. of entities (0 to 12,000); horizontal axis: person months (0 to 35).]
*: The model was not adapted with data from the operator.
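The tolerance mentioned on this slide can be turned into an explicit range for planning. The following is a minimal sketch with a hypothetical helper name; it applies the +/-30% tolerance to the 24 month estimate stated above, which gives a range of roughly 16.8 to 31.2.

```python
def tolerance_band(estimate, tolerance=0.30):
    """Return the (lower, upper) bounds of an estimate with a symmetric
    relative tolerance."""
    if estimate < 0 or not 0 <= tolerance < 1:
        raise ValueError("estimate must be >= 0 and tolerance in [0, 1)")
    return estimate * (1 - tolerance), estimate * (1 + tolerance)


# 24 is the estimate stated on the slide; 30% is the model's tolerance.
low, high = tolerance_band(24)
print(f"Plan for {low:.1f} to {high:.1f}")
```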
27. Case study
Effort estimation for the development of the data model: June 2007
In order to get a rough estimation for the finalization of the different domains we divided
the overall engineering effort into several sub tasks.
[Figure: Effort estimations. Average estimation with +/-30% tolerance; vertical axis: no. of entities (0 to 12,000); horizontal axis: person months (0 to 34). The curve is divided into sub tasks labelled, in ascending order, Common business, Customer / Resource, Product / Service, Market / Sales, Enterprise and Supplier / Partner.]
28. Case study
Comparison with actual numbers in February 2008
The actual effort is higher than expected. This is mainly due to frequent changes in the
modeling team and to technical problems aligning the process and ontology model.
Evaluation
Changes in the development team:
The team consisted of 4 people on average.
The team structure changed quite often due to management decisions.
This required experienced modelers to train newcomers.
Aligning the process model with the ontology:
Tool support to define the data objects required for activities in a process model is limited.
The original model does not account for the integration of an ontology with a process model.
Size:
The estimate of the size of the ontology is relatively good.
The project is ongoing.
[Figure: Actual Effort. Entities plotted against effort; vertical axis: no. of entities (0 to 12,000); horizontal axis: person months (0 to 35).]
30. Lessons learned
1 Previous experience with ontology building implies a high learning rate. This can only be achieved if the ontology engineering team is constant.
2 Identify existing taxonomies, models, classifications in order to get a rough capture of the size of the ontology.
3 Evaluation and documentation of the model for a better reusability is the most time consuming part.
4 A clear methodology is essential.
5 The target picture must be clear to increase efficiency.
6 Ontology development differs from data modeling and experts are difficult to find.
Ontocom is a straightforward methodology to estimate the effort related to ontology engineering.
The experience incorporated in the methodology includes best practices for ontology engineering.
The model can be improved with calibration data from your company.
31. Thank you.
Find out more about Ontocom at:
http://ontocom.ag-nbi.de/
32. Contact
Dr. Elena Simperl
Digital Enterprise Research Institute
University of Innsbruck
ICT Technologiepark
Technikerstr. 21a
6020 Innsbruck (Austria)
Phone: +43 512 507 96884
Fax: +43 512 507 9872
Mobile: +43 664 812 5236
e-Mail: elena.simperl@deri.at
Dr. Christoph Tempich
Detecon International GmbH
Industry/Competence Practice IT
Oberkasseler Str. 2
53227 Bonn (Germany)
Phone: +49 228 700-1942
Fax: +49 228 700-2361
Mobile: +49 (151) 12720065
e-Mail: Christoph.Tempich@detecon.com