SCC talk
- 1. © 2009 IBM Corporation
Organizing Documented Processes
Biplav Srivastava
Debdoot Mukherjee
IBM Research, India
- 2.
SCC 2009, Organizing Documented Processes
Research Theme
Establish an effective framework for organizing design-level
documentation on business processes and linked business artifacts
in order to:
– Boost information reuse across engagements
– Maintain coherence in enterprise process repositories
– Reduce costs and improve quality in business transformation exercises
Setting: Enterprise Resource Planning Projects
– Off-the-shelf software to manage common
business functions (e.g. Finance, Supply Chain)
– Businesses buy this software and then engage
service providers to tailor it
– AMR Research estimates that spending on consulting,
integration and support for packaged application services
was $103B in 2007 and is expected to reach $174B by 2012
- 3.
Motivation
Blueprinting is the crucial activity in ERP projects: it settles in detail
how the ERP functionality will be used and how any new
customizations will be implemented
Documented business processes and related artifacts are the key outputs of
blueprinting
Business Processes are captured in large numbers and in multiple
representations
– Typically over 100 business processes per engagement
– Flow Diagrams: Visio, PowerPoint
– Text Documents: Word, Excel
Effective reuse of process information from past engagements will yield
great benefits
– Conventional document management systems are not capable of providing a
process-centric view of information
– How to search for the most effective business artifacts in the current “process”
context?
- 4.
Related Literature
Work on measuring similarity (diagnosing differences) between business process
models
– e.g., Ehrig et al (APCCM ’07), Dijkman (BPM ’08), Van der Aalst et al (BPM ’06)
– Compares flow models in structured formats, viz. Petri net, EPC, YAWL
– Covers linguistic, semantic and structural dimensions of comparing process elements
Extensive literature on Process Mining from execution logs
– ProM framework
Research on choosing an appropriate granularity of process model reuse
– Holschke et al (BPM ’09), Mendling et al (BPM ’08)
Extraction and management of useful process variants (Sadiq BPM ’06)
Traditional methods in legacy text mining and organization
– But they do not specifically focus on process information
No known effort to target design-level process information with
linkage to business artifacts of interest, viz. requirements, KPIs, use-cases
- 5.
Key Information Elements
Business Process Hierarchies
Industry Specific
Cross Industry
Process Specific Artifacts
Scenario
Process
Process Step
Inputs, Outputs
Non-Process Business Artifacts
Requirement
Use-case
Gap
KPI
- 6.
Data, Data Everywhere... Nor Any Drop to Use!!
Design information on business artifacts implemented in
engagements is locked in documents
– Need to turn these documents into reusable assets
– Retrieve information into a model-based format
Enterprise asset repositories are not well organized
– Essentially, a dump of unlinked process documentation in
different formats
– No meta-data available against silos of documents
Inconsistencies in process data
– Multiple teams are responsible for various aspects of
process design
- 7.
Process Organization & Reuse
[Diagram; box labels: Enterprise repositories; Extract model-based content; Process Organization Framework; Content Reuse; Duplicate Detection]
- 8.
Process Information Extraction - Text
Utilize the semi-structured nature of the data
Extract content segments present in a document collection that can map to some process semantics
Seek an appropriate tag (preferably from a pre-defined meta model) from the user for each segment
Utilize the layout of content segments in the document to establish cardinality and relations between the flat pieces of tagged content
[Figure; step labels: Extract, Tag]
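The extract, tag and relate steps above can be sketched roughly as follows. This is a minimal illustration, not the tool described in the talk: the "Heading: text" line format, the two-space indentation rule, the `TAG_BY_HEADING` table (standing in for tags the user would supply) and the `Segment` type are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical heading-to-tag table; in the talk the tag is sought from the user.
TAG_BY_HEADING = {
    "process": "Process",
    "process step": "ProcessStep",
    "inputs": "Input",
    "outputs": "Output",
}


@dataclass
class Segment:
    tag: str
    text: str
    level: int
    children: list = field(default_factory=list)


def extract_segments(lines):
    """Turn 'Heading: text' lines into tagged segments nested by indentation."""
    roots, stack = [], []
    for raw in lines:
        if not raw.strip() or ":" not in raw:
            continue  # not a content segment under the assumed format
        level = (len(raw) - len(raw.lstrip(" "))) // 2  # layout cue: 2 spaces per level
        heading, _, text = raw.strip().partition(":")
        tag = TAG_BY_HEADING.get(heading.strip().lower(), "Untagged")
        segment = Segment(tag, text.strip(), level)
        # Pop back to the nearest segment that is shallower than this one.
        while stack and stack[-1].level >= level:
            stack.pop()
        (stack[-1].children if stack else roots).append(segment)
        stack.append(segment)
    return roots


doc = [
    "Process: Create Purchase Order",
    "  Process Step: Enter requisition",
    "    Inputs: Requisition form",
    "  Process Step: Approve requisition",
]
roots = extract_segments(doc)
```

Here the nesting depth gives the relations (a process owns its steps, a step owns its inputs) and the number of children gives the cardinality.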
- 9.
Process Information Extraction - Diagrams
General-purpose diagramming tools, viz. Visio, PowerPoint, Xfig etc., are used to capture
business processes. Reasons: ubiquitous (low cost), familiar (intuitive to use)
No formal modeling tool provides sound import capabilities from diagramming formats
Challenges in Model Discovery
– Ambiguities are commonplace in informal drawings
– Humans can understand intent from visual cues; machine interpretation is hard
– Dangling connectors, unlinked labels, over-specification, under-specification
Steps in Model Discovery: Flow Structure Extraction, Semantic Interpretation
[Figure: example drawings illustrating over-specification, under-specification and dangling connectors; shape labels include Create Order, Process Order, Ship Order and nodes A, B, C, D]
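One way to picture flow structure extraction in the presence of dangling connectors is to snap each connector end to the nearest shape within a tolerance. The sketch below assumes shapes are axis-aligned bounding boxes and connectors are pairs of end points; the `Shape` type, the function names and the tolerance value are hypothetical and are not taken from the talk.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Shape:
    label: str
    x: float
    y: float
    w: float
    h: float


def distance_to_shape(px, py, shape):
    """Distance from a point to the shape's bounding box (0 if the point is inside)."""
    dx = max(shape.x - px, 0.0, px - (shape.x + shape.w))
    dy = max(shape.y - py, 0.0, py - (shape.y + shape.h))
    return math.hypot(dx, dy)


def attach(point, shapes, tolerance):
    """Return the closest shape within tolerance, or None if the end stays dangling."""
    best = min(shapes, key=lambda s: distance_to_shape(*point, s), default=None)
    if best is not None and distance_to_shape(*point, best) <= tolerance:
        return best
    return None


def extract_flow(shapes, connectors, tolerance=10.0):
    """Resolve connectors to (source, target) label pairs; report unresolved ones."""
    edges, dangling = [], []
    for start, end in connectors:
        src = attach(start, shapes, tolerance)
        dst = attach(end, shapes, tolerance)
        if src is not None and dst is not None and src != dst:
            edges.append((src.label, dst.label))
        else:
            dangling.append((start, end))
    return edges, dangling


shapes = [Shape("Create Order", 0, 0, 100, 40), Shape("Process Order", 200, 0, 100, 40)]
connectors = [((104, 20), (197, 20)),   # both ends stop a few units short of a box
              ((300, 20), (500, 20))]   # second end is nowhere near a shape
edges, dangling = extract_flow(shapes, connectors)
```

Connectors that cannot be resolved are returned separately so that a person, or a later semantic interpretation step, can decide what was intended.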
- 10.
Problem: Organizing Process Information
Given a dump of business process documentation (both text
and diagrams) from an engagement, how should it be organized
so that the information it contains can be effectively
harvested?
Three sub-problems
– Problem 1: Link text and visual representations
– Problem 2: Normalize content in linked text and visual
forms
– Problem 3: Group normalized content into clusters of similar processes
Demonstrate the benefit of better organization
- 11.
Process Information in Text and Visual Formats
- 12.
Example
Benefits in text:
• Process information is detailed
Problems in text:
• Control-flow detail is lost
• Unintuitive, e.g., swim lanes are missing
Benefits in flow:
• Control flow is detailed
• Intuitive
Problems in flow:
• Names in the flow do not match the text (e.g., "Functional FP&A Planner" vs. "FP&A Planner")
• Limited information, e.g., the flow does not show whether an activity is system or manual; the text has these details
- 13.
Steps in Process Organization
Input: Set of work products (files) describing business processes
Step 1: Link textual and flow (visual) files
• Enrichment of information
• Consistency: single view of truth
Step 2: Normalize process step information in linked text and flow
• Structured representation: Name, Description, Role, Predecessors, Successors, Inputs, Outputs, Nature, Miscellaneous
Step 3: Cluster normalized process information
• Define suitable similarity measures to deal with atomic and composite content
• Run a clustering algorithm without a priori information on the number of clusters
Output: Clusters of business processes with linked non-process artifacts
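The normalization and clustering steps can be illustrated with a small sketch. The talk does not name its similarity measures or its clustering algorithm at this point, so everything below is an assumption chosen for illustration: `ProcessStep` keeps only a subset of the normalized fields, atomic fields are compared with `difflib.SequenceMatcher`, set-valued fields with Jaccard overlap, and clusters are the connected components of the "similar enough" graph, which needs no preset number of clusters.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import combinations


@dataclass(frozen=True)
class ProcessStep:
    # Subset of the normalized fields listed on the slide.
    name: str
    role: str = ""
    inputs: frozenset = frozenset()
    outputs: frozenset = frozenset()


def text_sim(a, b):
    """Similarity of two atomic (string) values, in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def step_sim(s, t):
    """Average similarity over the fields populated on at least one side."""
    scores = [text_sim(s.name, t.name)]
    if s.role or t.role:
        scores.append(text_sim(s.role, t.role))
    for a, b in ((s.inputs, t.inputs), (s.outputs, t.outputs)):
        if a or b:
            scores.append(len(a & b) / len(a | b))  # Jaccard overlap
    return sum(scores) / len(scores)


def process_sim(p, q):
    """Composite similarity: each step's best match in the other process, symmetrized."""
    def directed(a, b):
        return sum(max(step_sim(s, t) for t in b) for s in a) / len(a)
    return (directed(p, q) + directed(q, p)) / 2


def cluster(processes, threshold=0.8):
    """Group process names into connected components of the similarity graph."""
    parent = {name: name for name in processes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in combinations(processes, 2):
        if process_sim(processes[a], processes[b]) >= threshold:
            parent[find(a)] = find(b)
    groups = {}
    for name in processes:
        groups.setdefault(find(name), []).append(name)
    return [g for g in groups.values() if len(g) > 1]


processes = {
    "P1": [ProcessStep("Create purchase order", "Buyer"),
           ProcessStep("Approve purchase order", "Manager")],
    "P2": [ProcessStep("Create purchase order", "Buyer"),
           ProcessStep("Approve the purchase order", "Manager")],
    "P3": [ProcessStep("Post journal entry", "Accountant")],
}
clusters = cluster(processes)
```

The threshold of 0.8 is arbitrary; in practice it would be tuned against labeled pairs, since it directly trades cluster size against precision.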
- 14.
Empirical Evaluation ― Results
Input
– 240 Process Definition Documents (PDDs)
– 315 Process Flow Diagrams
Linking
Similarity Measure | Pair-wise Matches | # PDDs | Precision (%)
Jaro               | 126               | 30     | 48
Exact              | 11                | 11     | 100
Normalization
Similarity Measure | % Match (Name) | % Match (Name + Role)
Jaro               | 37             | 8
Exact              | 45.5           | 13
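For readers unfamiliar with the Jaro measure named in the tables, here is a minimal sketch of the standard Jaro similarity and of how it might be used to link document names to diagram names. The `link` helper, the lower-casing, the 0.9 threshold and the sample names are assumptions for illustration; the talk does not state how names were preprocessed or which threshold was used.

```python
def jaro(s1, s2):
    """Standard Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1, matched2 = [False] * len1, [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        # Look for an unused equal character within the match window.
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def link(text_names, flow_names, threshold=0.9):
    """Pair each text document name with every flow diagram name that is similar enough."""
    return [(t, f) for t in text_names for f in flow_names
            if jaro(t.lower(), f.lower()) >= threshold]


pairs = link(["Create Purchase Order", "Approve Invoice"],
             ["Create Purchse Order", "Ship Goods"])
```

An approximate measure tolerates small spelling differences between a document and its diagram, at the cost of false pairings. That is the trade-off visible in the linking table, where Jaro finds far more matches than exact comparison but at lower precision.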
- 15.
Empirical Evaluation ― Results (2)
Dataset: A set of 240 Process Definition Documents from an actual ERP project engagement
Number of pair-wise similar processes: 266
Number of clusters found: 23
Range of cluster sizes: 2 to 21
Number of processes similar to at least one other process: 134 (i.e., 55% of total)
Effectiveness of discovered clusters in boosting similarity of non-process
business artifacts written in the context of business processes
Artifact                  | Similarity inside clusters | Overall Similarity | Similarity Boost (%)
Requirement               | 0.209                      | 0.014              | 1430.55
Integration Consideration | 0.620                      | 0.115              | 438.54
Supplier                  | 0.844                      | 0.109              | 671.22
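The slide does not define the boost column. The reported figures are consistent with a relative increase of in-cluster similarity over overall similarity, so the sketch below assumes that definition; `similarity_boost` is a hypothetical helper name. Recomputing from the rounded table values gives numbers close to, but not identical with, the reported ones, presumably because the slide was computed from unrounded similarities.

```python
def similarity_boost(inside, overall):
    """Relative increase of in-cluster over overall similarity, in percent (assumed definition)."""
    return (inside - overall) / overall * 100.0


# Rounded values as printed in the table above.
rows = {
    "Requirement": (0.209, 0.014),
    "Integration Consideration": (0.620, 0.115),
    "Supplier": (0.844, 0.109),
}
boosts = {name: similarity_boost(inside, overall)
          for name, (inside, overall) in rows.items()}
for name, value in boosts.items():
    print(f"{name}: about {value:.0f}%")
```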
- 16.
Application to File Duplicate Detection
Scenario
– Input: 1520 files organized in a complex directory structure, 13 different asset
types, files per asset type known
– Problem: Find duplicate or near-duplicate files within an asset type
Approach
– Harvest the content of files per asset type
– Cluster based on content
– Files in the same cluster are treated as duplicates of each other
Type | # Files | # Clusters | # Files in Some Cluster | % Unique
PDD  | 866     | 116        | 786                     | 23% (196/866)
BPP  | 463     | 121        | 406                     | 38% (178/463)
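The unique-file counts in the table can be reproduced by counting one representative per cluster plus every file that fell into no cluster. That reading is inferred from the numbers rather than stated on the slide, and `unique_files` is a hypothetical helper name.

```python
def unique_files(total, clusters, clustered):
    """One representative per cluster plus every file that is in no cluster."""
    return clusters + (total - clustered)


table = [("PDD", 866, 116, 786), ("BPP", 463, 121, 406)]
for kind, total, clusters, clustered in table:
    unique = unique_files(total, clusters, clustered)
    print(f"{kind}: {unique}/{total} = {unique / total:.0%}")
# prints:
# PDD: 196/866 = 23%
# BPP: 178/463 = 38%
```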
- 17.
Scope for Future Work
Improve the precision of text similarity measures
– Use domain-specific WordNets
– Apply sound aggregation measures for robust relational
learning
Build ontologies of ERP concepts and use the relationships
therein to improve the search for similar business artifacts in the
context of a business process
Extract process documentation into standardized
representations
- 18.
Conclusions
Efficient organization of design-level process documentation,
which may not have execution semantics, can ease
information reuse
Process information can help in searching for useful non-process
business artifacts
– e.g., searching for the correct use-case or performance
indicator can be easy if these are maintained along with process
information
Enriching and normalizing process information from multiple
representations is important
– Removal of duplicate and inconsistent data is critical
- 19.
Thank You