© 2009 IBM Corporation
Organizing Documented Processes
Biplav Srivastava
Debdoot Mukherjee
IBM Research, India
SCC 2009, Organizing Documented Processes
Research Theme
• Establish an effective framework for organizing design-level documentation on business processes and linked business artifacts in order to:
– Boost information reuse across engagements
– Maintain coherence in enterprise process repositories
– Reduce costs and improve quality in business transformation exercises
• Setting: Enterprise Resource Planning (ERP) projects
– Off-the-shelf software to manage common business functions (e.g., Finance, Supply Chain)
– Businesses buy this software and then engage service providers to tailor it
– AMR Research estimates that spending on consulting, integration and support for packaged application services was $103B in 2007 and is expected to reach $174B by 2012
Motivation
• Blueprinting is the crucial activity in ERP projects: it decides in detail how the ERP functionality will be used and how any new customizations will be implemented
• Documented business processes and related artifacts are the key outputs of blueprinting
• Business processes are captured in large numbers and in multiple representations
– Typically over 100 business processes per engagement
– Flow diagrams: Visio, PowerPoint
– Text documents: Word, Excel
• Effective reuse of process information from past engagements will yield great benefits
– Conventional document management systems cannot provide a process-centric view of information
– How can the most effective business artifacts be found in the current “process” context?
Related Literature
• Work on measuring similarity (diagnosing differences) in business process models
– e.g., Ehrig et al. (APCCM ’07), Dijkman (BPM ’08), Van der Aalst et al. (BPM ’06)
– Compares flow models in structured formats, viz. Petri net, EPC, YAWL
– Linguistic, semantic and structural dimensions of comparing process elements
• Extensive literature on process mining from execution logs
– ProM framework
• Research on choosing an appropriate granularity of process model reuse
– Holschke et al. (BPM ’09), Mendling et al. (BPM ’08)
• Extraction and management of useful process variants (Sadiq, BPM ’06)
• Traditional methods in legacy text mining and organization
– But they do not specifically focus on process information
• No known effort targets design-level process information with linkage to business artifacts of interest, viz. requirements, KPIs, use-cases
Key Information Elements
• Business Process Hierarchies
– Industry Specific
– Cross Industry
• Process Specific Artifacts
– Scenario
– Process
– Process Step
– Inputs, Outputs
• Non-Process Business Artifacts
– Requirement
– Use-case
– Gap
– KPI
Data, Data Everywhere... Nor Any Drop to Use!!
• Design information on business artifacts implemented in engagements is locked in documents
– Need to turn them into reusable assets
– Retrieve information into a model-based format
• Enterprise asset repositories are not well organized
– Essentially, a dump of unlinked process documentation in different formats
– No meta-data available against silos of documents
• Inconsistencies in process data
– Multiple teams are responsible for various aspects of process design
Process Organization & Reuse
[Figure labels: Enterprise repositories; Extract model-based content; Process Organization Framework; Content Reuse; Duplicate Detection]
Process Information Extraction - Text
• Utilize the semi-structured nature of the data
• Extract content segments present in a document collection that can map to some process semantics
• Seek an appropriate tag (preferably from a pre-defined meta-model) from the user
• Utilize the layout of content segments in the document to establish cardinality and relations between the various pieces of flat tagged content
[Figure labels: Extract, Tag]
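The extract-then-tag idea above can be illustrated with a minimal sketch. It assumes the documents have already been converted to plain text in which each content segment starts with a "Heading:" line. The heading-to-tag map, the tag names, and the function `extract_segments` are hypothetical and stand in for a user-supplied meta-model; they are not taken from the slides.

```python
import re
from typing import Dict, List, Optional, Tuple

# Hypothetical mapping from section headings to meta-model tags.
# In the approach described above, the user supplies these tags.
HEADING_TO_TAG: Dict[str, str] = {
    "process name": "Process",
    "process steps": "ProcessStep",
    "inputs": "Input",
    "outputs": "Output",
}

# A heading is assumed to be a line of letters and spaces ending in a colon.
HEADING_RE = re.compile(r"^\s*([A-Za-z ]+):\s*$")


def extract_segments(text: str) -> List[Tuple[Optional[str], List[str]]]:
    """Split text into (tag, lines) segments, using heading lines as boundaries.

    A tag of None marks a segment whose heading is not in the map, i.e. one
    for which the user would still have to choose a tag.
    """
    segments: List[Tuple[Optional[str], List[str]]] = []
    current_tag: Optional[str] = None
    current_lines: List[str] = []
    started = False
    for line in text.splitlines():
        match = HEADING_RE.match(line)
        if match:
            if started:
                segments.append((current_tag, current_lines))
            heading = match.group(1).strip().lower()
            current_tag = HEADING_TO_TAG.get(heading)
            current_lines = []
            started = True
        elif started and line.strip():
            current_lines.append(line.strip())
    if started:
        segments.append((current_tag, current_lines))
    return segments
```

The number of lines collected under one heading gives a simple notion of cardinality (for example, several steps under one "Process Steps" heading).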
Process Information Extraction - Diagrams
• General-purpose diagramming tools, viz. Visio, PowerPoint, Xfig etc., are used to capture business processes. Reasons: ubiquity (low cost), familiarity (intuitive to use)
• No formal modeling tool provides sound import capabilities from diagramming formats!
• Challenges in model discovery
– Ambiguities are commonplace in informal drawings
– Humans can understand intent from visual cues; machine interpretation is hard!
– Dangling connectors, unlinked labels, over-specification, under-specification
• Steps in model discovery: flow structure extraction, semantic interpretation
[Figure: example flow fragments (Create Order, Process Order, Ship Order; nodes A, B, C, D) illustrating over-specification, under-specification, and dangling connectors]
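As an illustration of the flow structure extraction step, the sketch below separates well-formed edges from dangling connectors. It assumes the diagram has already been parsed into a set of shape names and a list of connectors whose endpoints are either a shape name or `None`. The `Connector` class and `extract_flow` function are hypothetical names, not part of any diagramming tool's API.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Connector:
    """A line drawn in a diagram; an endpoint is None if it touches no shape."""
    source: Optional[str]
    target: Optional[str]


def extract_flow(
    shapes: Set[str], connectors: List[Connector]
) -> Tuple[List[Tuple[str, str]], List[Connector]]:
    """Return (edges, dangling): edges join two known shapes, the rest dangle."""
    edges: List[Tuple[str, str]] = []
    dangling: List[Connector] = []
    for c in connectors:
        if c.source in shapes and c.target in shapes:
            edges.append((c.source, c.target))
        else:
            # At least one end is not attached to a shape; a later semantic
            # interpretation step would have to guess the intended endpoint.
            dangling.append(c)
    return edges, dangling
```

Keeping the dangling connectors in a separate list, rather than dropping them, leaves room for a later step to repair them, for instance by attaching them to the nearest shape.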
Problem: Organizing Process Information
• Given a dump of business process documentation (both text and diagrams) from an engagement, how can it be organized so that the information it contains can be effectively harvested?
• Three sub-problems
– Problem 1: Link text and visual representations
– Problem 2: Normalize content in linked text and visual forms
– Problem 3: Group normalized content into clusters of similar items
• Demonstrate the benefit of better organization
Process Information in Text and Visual Formats
Example
Benefits in text:
• Process information is detailed
Problems in text:
• Control flow details are lost
• Unintuitive, e.g., swim lanes are missing
Benefits in flow:
• Control flow is detailed
• Intuitive
Problems in flow:
• Names in flow do not match text (e.g., "Functional FP&A Planner" vs. "FP&A Planner")
• Limited information, e.g., the flow does not show whether an activity is system or manual; the text has the details
Steps in Process Organization
Input: a set of work product files describing business processes
• Link textual and flow (visual) files
– Enrichment of information
– Consistency: a single view of truth
• Normalize process step information in linked text and flow
– Structured representation: Name, Description, Role, Predecessors, Successors, Inputs, Outputs, Nature, Miscellaneous
• Cluster normalized process information
– Define suitable similarity measures to deal with atomic and composite content
– Run a clustering algorithm without a priori information on the number of clusters
Output: clusters of business processes with linked non-process artifacts
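The structured representation listed for the normalization step can be sketched as a small record type. The field names follow the slide. The merge policy in `merge_step` (descriptive fields from the text document, control flow from the diagram) is an assumption made for illustration, motivated by the observation that text carries the detail while the flow carries the ordering.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProcessStep:
    """Normalized view of one process step (field names follow the slide)."""
    name: str
    description: str = ""
    role: Optional[str] = None
    predecessors: List[str] = field(default_factory=list)
    successors: List[str] = field(default_factory=list)
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)
    nature: Optional[str] = None  # e.g. "system" or "manual"
    miscellaneous: str = ""


def merge_step(text_step: ProcessStep, flow_step: ProcessStep) -> ProcessStep:
    """Combine the text-derived and flow-derived views of the same step.

    Assumed policy: descriptive fields come from the text, predecessor and
    successor links come from the flow diagram.
    """
    return ProcessStep(
        name=text_step.name,
        description=text_step.description,
        role=text_step.role or flow_step.role,
        predecessors=list(flow_step.predecessors),
        successors=list(flow_step.successors),
        inputs=list(text_step.inputs),
        outputs=list(text_step.outputs),
        nature=text_step.nature,
        miscellaneous=text_step.miscellaneous,
    )
```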
Empirical Evaluation ― Results
• Input
– 240 Process Definition Documents (PDDs)
– 315 Process Flow Diagrams
• Linking
Similarity Measure | Pair-wise Matches | # PDDs | Precision (%)
Jaro | 126 | 30 | 48
Exact | 11 | 11 | 100
• Normalization
Similarity Measure | % Match (Name) | % Match (Name + Role)
Jaro | 37 | 8
Exact | 45.5 | 13
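For reference, the Jaro measure named in the tables can be written out as below. This is the standard textbook definition of Jaro similarity. The `link_names` helper, the lower-casing, and the default threshold are assumptions for illustration; the slides do not state how names were preprocessed or which cut-off was used.

```python
from typing import List, Tuple


def jaro(s1: str, s2: str) -> float:
    """Standard Jaro similarity in [0, 1]; 1.0 means identical strings."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and at most `window` apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


def link_names(
    text_names: List[str], flow_names: List[str], threshold: float = 0.9
) -> List[Tuple[str, str, float]]:
    """Pair text and flow names whose Jaro score reaches the (assumed) threshold."""
    pairs = []
    for t_name in text_names:
        for f_name in flow_names:
            score = jaro(t_name.lower(), f_name.lower())
            if score >= threshold:
                pairs.append((t_name, f_name, score))
    return pairs
```

Setting `threshold=1.0` reduces the helper to exact matching (after lower-casing), which mirrors the trade-off in the linking table: the fuzzy measure finds more candidate pairs at lower precision.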
Empirical Evaluation ― Results (2)
• Dataset: a set of 240 Process Definition Documents from an actual ERP project engagement
• Number of pair-wise similar processes: 266
• Number of clusters found: 23
• Range of cluster sizes: 2 to 21
• Number of processes similar to at least one other process: 134 (i.e., 55% of total)
• Effectiveness of discovered clusters in boosting similarity of non-process business artifacts written in the context of business processes
Artifact | Similarity inside clusters | Overall Similarity | Similarity Boost (%)
Requirement | 0.209 | 0.014 | 1430.55
Integration Consideration | 0.620 | 0.115 | 438.54
Supplier | 0.844 | 0.109 | 671.22
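The slides do not name the clustering algorithm used. One simple option that needs no preset number of clusters is to treat every pair-wise similar pair as an edge and take the connected components. The sketch below does this with a union-find structure; `cluster_by_similar_pairs` is a hypothetical name and this is a stand-in, not necessarily the authors' method.

```python
from typing import Dict, Iterable, List, Tuple


def cluster_by_similar_pairs(pairs: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group items into clusters given pairs already judged similar.

    Items connected through a chain of similar pairs end up in one cluster,
    so the number of clusters falls out of the data.
    """
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups: Dict[str, List[str]] = {}
    for item in list(parent):
        groups.setdefault(find(item), []).append(item)
    # Largest clusters first; ties broken by member names for a stable order.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: (-len(g), g))
```

Because only items that appear in some similar pair are clustered, every cluster has at least two members, which agrees with the reported minimum cluster size of 2.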
Application to File Duplicate Detection
• Scenario
– Input: 1520 files organized in a complex directory structure, 13 different asset types, files per asset type known
– Problem: find duplicate or near-duplicate files within an asset type
• Approach
– Harvest content of files per asset type
– Cluster based on content
– Files in each cluster are duplicates
Type | # Files | # Clusters | # Files in Some Cluster | % Unique
PDD | 866 | 116 | 786 | 23% (196/866)
BPP | 463 | 121 | 406 | 38% (178/463)
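The "% Unique" figures are consistent with counting one representative file per cluster plus every file that falls in no cluster. This reading is inferred from the numbers in the table, not stated in the slides; the function name below is hypothetical.

```python
from typing import Tuple


def unique_file_estimate(
    total_files: int, clusters: int, files_in_clusters: int
) -> Tuple[int, int]:
    """Return (unique file count, percentage of total rounded to an integer).

    Assumed reading: each cluster contributes one unique file, and every
    file outside all clusters is unique.
    """
    unclustered = total_files - files_in_clusters
    unique = unclustered + clusters
    return unique, round(100 * unique / total_files)


print(unique_file_estimate(866, 116, 786))  # PDD row: (196, 23)
print(unique_file_estimate(463, 121, 406))  # BPP row: (178, 38)
```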
Scope for Future Work
• Improve precision of text similarity measures
– Use domain-specific WordNets
– Apply sound aggregation measures for robust relational learning
• Build ontologies of ERP concepts and utilize the relationships therein to improve search for similar business artifacts in the context of a business process
• Extraction of process documentation into standardized representations
Conclusions
• Efficient organization of design-level process documentation, which may not have execution semantics, can ease information reuse
• Process information can help in searching for useful non-process business artifacts
– e.g., searching for the correct use-case or performance indicator can be easy if these are maintained along with process information
• Enriching and normalizing process information from multiple representations is important
– Removal of duplicate and inconsistent data is critical
Thank You

More Related Content

What's hot

Enterprise architectsview 2015-apr
Enterprise architectsview 2015-aprEnterprise architectsview 2015-apr
Enterprise architectsview 2015-aprMongoDB
 
BPM - The Promise And Challenges
BPM  - The Promise And ChallengesBPM  - The Promise And Challenges
BPM - The Promise And ChallengesJerald Burget
 
How the IBM Platform LSF Architecture Accelerates Technical Computing
How the IBM Platform LSF Architecture Accelerates Technical ComputingHow the IBM Platform LSF Architecture Accelerates Technical Computing
How the IBM Platform LSF Architecture Accelerates Technical ComputingIBM India Smarter Computing
 
Kevin Kinsman Resume - February 2017
Kevin Kinsman Resume - February 2017Kevin Kinsman Resume - February 2017
Kevin Kinsman Resume - February 2017Kevin Kinsman
 
Evils of Layering in Telecom Management
Evils of Layering in Telecom ManagementEvils of Layering in Telecom Management
Evils of Layering in Telecom Managementsfratini
 
Overview of Information Framework
Overview of Information FrameworkOverview of Information Framework
Overview of Information FrameworkAyub Qureshi
 

What's hot (13)

Erp4
Erp4Erp4
Erp4
 
Enterprise architectsview 2015-apr
Enterprise architectsview 2015-aprEnterprise architectsview 2015-apr
Enterprise architectsview 2015-apr
 
13721876
1372187613721876
13721876
 
BPM - The Promise And Challenges
BPM  - The Promise And ChallengesBPM  - The Promise And Challenges
BPM - The Promise And Challenges
 
How the IBM Platform LSF Architecture Accelerates Technical Computing
How the IBM Platform LSF Architecture Accelerates Technical ComputingHow the IBM Platform LSF Architecture Accelerates Technical Computing
How the IBM Platform LSF Architecture Accelerates Technical Computing
 
Kevin Kinsman Resume - February 2017
Kevin Kinsman Resume - February 2017Kevin Kinsman Resume - February 2017
Kevin Kinsman Resume - February 2017
 
SOA Design Patterns
SOA Design PatternsSOA Design Patterns
SOA Design Patterns
 
Evils of Layering in Telecom Management
Evils of Layering in Telecom ManagementEvils of Layering in Telecom Management
Evils of Layering in Telecom Management
 
Canonical data model
Canonical data modelCanonical data model
Canonical data model
 
Overview of Information Framework
Overview of Information FrameworkOverview of Information Framework
Overview of Information Framework
 
Karunakar.V
Karunakar.VKarunakar.V
Karunakar.V
 
L01 Enterprise Application Architecture
L01 Enterprise Application ArchitectureL01 Enterprise Application Architecture
L01 Enterprise Application Architecture
 
Transpromo on a Budget
Transpromo on a Budget Transpromo on a Budget
Transpromo on a Budget
 

Viewers also liked

Determining QoS of WS-BPEL Compositions
Determining QoS of WS-BPEL CompositionsDetermining QoS of WS-BPEL Compositions
Determining QoS of WS-BPEL CompositionsDebdoot Mukherjee
 
Which Work-Item Updates Need Your Response?
Which Work-Item Updates Need Your Response?Which Work-Item Updates Need Your Response?
Which Work-Item Updates Need Your Response?Debdoot Mukherjee
 
Is Text Search an Effective Approach for Fault Localization: A Practitioners ...
Is Text Search an Effective Approach for Fault Localization: A Practitioners ...Is Text Search an Effective Approach for Fault Localization: A Practitioners ...
Is Text Search an Effective Approach for Fault Localization: A Practitioners ...Debdoot Mukherjee
 
Is Text Search an Effective Approach for Fault Localization: A Practitioners ...
Is Text Search an Effective Approach for Fault Localization: A Practitioners ...Is Text Search an Effective Approach for Fault Localization: A Practitioners ...
Is Text Search an Effective Approach for Fault Localization: A Practitioners ...Debdoot Mukherjee
 
Electrocardiography,cvp,blood pressure
Electrocardiography,cvp,blood pressureElectrocardiography,cvp,blood pressure
Electrocardiography,cvp,blood pressuresugamadex
 

Viewers also liked (9)

Naelektriziranost na telata
Naelektriziranost na telataNaelektriziranost na telata
Naelektriziranost na telata
 
Determining QoS of WS-BPEL Compositions
Determining QoS of WS-BPEL CompositionsDetermining QoS of WS-BPEL Compositions
Determining QoS of WS-BPEL Compositions
 
Latihan thn 1
Latihan thn 1Latihan thn 1
Latihan thn 1
 
Which Work-Item Updates Need Your Response?
Which Work-Item Updates Need Your Response?Which Work-Item Updates Need Your Response?
Which Work-Item Updates Need Your Response?
 
Is Text Search an Effective Approach for Fault Localization: A Practitioners ...
Is Text Search an Effective Approach for Fault Localization: A Practitioners ...Is Text Search an Effective Approach for Fault Localization: A Practitioners ...
Is Text Search an Effective Approach for Fault Localization: A Practitioners ...
 
Is Text Search an Effective Approach for Fault Localization: A Practitioners ...
Is Text Search an Effective Approach for Fault Localization: A Practitioners ...Is Text Search an Effective Approach for Fault Localization: A Practitioners ...
Is Text Search an Effective Approach for Fault Localization: A Practitioners ...
 
Womm
WommWomm
Womm
 
Electrocardiography,cvp,blood pressure
Electrocardiography,cvp,blood pressureElectrocardiography,cvp,blood pressure
Electrocardiography,cvp,blood pressure
 
Mobile computing
Mobile computingMobile computing
Mobile computing
 

Similar to Scc talk

Model-Driven Design of Audiovisual Indexing Processes for Search Apps.
Model-Driven Design of Audiovisual Indexing Processes for Search Apps.Model-Driven Design of Audiovisual Indexing Processes for Search Apps.
Model-Driven Design of Audiovisual Indexing Processes for Search Apps.Marco Brambilla
 
Process-Oriented Business Requirements
Process-Oriented Business RequirementsProcess-Oriented Business Requirements
Process-Oriented Business RequirementsDafna Levy
 
Analyzing Business Requirements in a Visible Enterprise
Analyzing Business Requirements in a Visible EnterpriseAnalyzing Business Requirements in a Visible Enterprise
Analyzing Business Requirements in a Visible EnterpriseDafna Levy
 
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...Daniel Zivkovic
 
Achieving synergy between BPM, SOA and EA
Achieving synergy between BPM, SOA and EAAchieving synergy between BPM, SOA and EA
Achieving synergy between BPM, SOA and EAAlexander SAMARIN
 
Anylogic 2021 Conference Presentation: Automatic generation of simulation mod...
Anylogic 2021 Conference Presentation: Automatic generation of simulation mod...Anylogic 2021 Conference Presentation: Automatic generation of simulation mod...
Anylogic 2021 Conference Presentation: Automatic generation of simulation mod...Sudhendu Rai
 
Wei_Zhang_Linkedin
Wei_Zhang_LinkedinWei_Zhang_Linkedin
Wei_Zhang_LinkedinWei Zhang
 
Business Process Design
Business Process DesignBusiness Process Design
Business Process DesignSandy Kemsley
 
sap hana|sap hana database| Introduction to sap hana
sap hana|sap hana database| Introduction to sap hanasap hana|sap hana database| Introduction to sap hana
sap hana|sap hana database| Introduction to sap hanaJames L. Lee
 
ECM BPM Strategy With Enterprise Architecture Maturity Model
ECM BPM Strategy With Enterprise Architecture Maturity ModelECM BPM Strategy With Enterprise Architecture Maturity Model
ECM BPM Strategy With Enterprise Architecture Maturity ModelDavid Champeau
 
Service Oriented Architecture 10 0
Service Oriented Architecture 10 0Service Oriented Architecture 10 0
Service Oriented Architecture 10 0Nigel Tebbutt
 
Information Technology and Supply Chain Management.pptx
Information Technology and Supply Chain Management.pptxInformation Technology and Supply Chain Management.pptx
Information Technology and Supply Chain Management.pptxSiddharth Kumar Rai
 
Choosing the right IDP Solution
Choosing the right IDP SolutionChoosing the right IDP Solution
Choosing the right IDP SolutionProvectus
 
Week7 Submit Analysis And Gain Agreement
Week7 Submit Analysis And Gain AgreementWeek7 Submit Analysis And Gain Agreement
Week7 Submit Analysis And Gain Agreementhapy
 
BI Masterclass slides (Reference Architecture v3)
BI Masterclass slides (Reference Architecture v3)BI Masterclass slides (Reference Architecture v3)
BI Masterclass slides (Reference Architecture v3)Syaifuddin Ismail
 

Similar to Scc talk (20)

Model-Driven Design of Audiovisual Indexing Processes for Search Apps.
Model-Driven Design of Audiovisual Indexing Processes for Search Apps.Model-Driven Design of Audiovisual Indexing Processes for Search Apps.
Model-Driven Design of Audiovisual Indexing Processes for Search Apps.
 
Process-Oriented Business Requirements
Process-Oriented Business RequirementsProcess-Oriented Business Requirements
Process-Oriented Business Requirements
 
Analyzing Business Requirements in a Visible Enterprise
Analyzing Business Requirements in a Visible EnterpriseAnalyzing Business Requirements in a Visible Enterprise
Analyzing Business Requirements in a Visible Enterprise
 
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
Canadian Experts Discuss Modern Data Stacks and Cloud Computing for 5 Years o...
 
Achieving synergy between BPM, SOA and EA
Achieving synergy between BPM, SOA and EAAchieving synergy between BPM, SOA and EA
Achieving synergy between BPM, SOA and EA
 
Anylogic 2021 Conference Presentation: Automatic generation of simulation mod...
Anylogic 2021 Conference Presentation: Automatic generation of simulation mod...Anylogic 2021 Conference Presentation: Automatic generation of simulation mod...
Anylogic 2021 Conference Presentation: Automatic generation of simulation mod...
 
Training Agenda
Training AgendaTraining Agenda
Training Agenda
 
James hall ch 14
James hall ch 14James hall ch 14
James hall ch 14
 
TowardsCognitive BPMas a Platform for Smart Process Support over Unstructured...
TowardsCognitive BPMas a Platform for Smart Process Support over Unstructured...TowardsCognitive BPMas a Platform for Smart Process Support over Unstructured...
TowardsCognitive BPMas a Platform for Smart Process Support over Unstructured...
 
Wei_Zhang_Linkedin
Wei_Zhang_LinkedinWei_Zhang_Linkedin
Wei_Zhang_Linkedin
 
Business Process Design
Business Process DesignBusiness Process Design
Business Process Design
 
Session 4 & 5
Session 4 & 5Session 4 & 5
Session 4 & 5
 
sap hana|sap hana database| Introduction to sap hana
sap hana|sap hana database| Introduction to sap hanasap hana|sap hana database| Introduction to sap hana
sap hana|sap hana database| Introduction to sap hana
 
ECM BPM Strategy With Enterprise Architecture Maturity Model
ECM BPM Strategy With Enterprise Architecture Maturity ModelECM BPM Strategy With Enterprise Architecture Maturity Model
ECM BPM Strategy With Enterprise Architecture Maturity Model
 
Service Oriented Architecture 10 0
Service Oriented Architecture 10 0Service Oriented Architecture 10 0
Service Oriented Architecture 10 0
 
Information Technology and Supply Chain Management.pptx
Information Technology and Supply Chain Management.pptxInformation Technology and Supply Chain Management.pptx
Information Technology and Supply Chain Management.pptx
 
ERP
ERPERP
ERP
 
Choosing the right IDP Solution
Choosing the right IDP SolutionChoosing the right IDP Solution
Choosing the right IDP Solution
 
Week7 Submit Analysis And Gain Agreement
Week7 Submit Analysis And Gain AgreementWeek7 Submit Analysis And Gain Agreement
Week7 Submit Analysis And Gain Agreement
 
BI Masterclass slides (Reference Architecture v3)
BI Masterclass slides (Reference Architecture v3)BI Masterclass slides (Reference Architecture v3)
BI Masterclass slides (Reference Architecture v3)
 

Recently uploaded

What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESmohitsingh558521
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 

Recently uploaded (20)

What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICESSALESFORCE EDUCATION CLOUD | FEXLE SERVICES
SALESFORCE EDUCATION CLOUD | FEXLE SERVICES
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 

Scc talk

  • 1. © 2009 IBM Corporation Organizing Documented Processes Biplav Srivastava Debdoot Mukherjee IBM Research, India
  • 2. © 2009 IBM Corporation SCC 2009, Organizing Documented Processes 2 23 Research Theme  Establish an effective framework for organizing design-level documentation on business processes and linked business artifacts in order to: – Boost information reuse across engagements – Maintain coherence in enterprise process repositories – Reduce costs and improve quality in business transformation exercises  Setting: Enterprise Resource Planning Projects – Off-the-shelf software to manage common business functions (e.g. Finance, Supply Chain) – Businesses buy these software and then engage service providers to tailor them – AMR Research estimates that spending on consulting, integration and support for packaged application services was $103B in 2007, and expected to reach $174B by 2012
  • 3. Motivation
       • Blueprinting is the crucial activity in ERP projects: it decides how the ERP functionality will be used and how any new customizations will be implemented
       • Documented business processes and related artifacts are the key outputs of blueprinting
       • Business processes are captured in large numbers and in multiple representations
           – Typically over 100 business processes per engagement
           – Flow diagrams: Visio, PowerPoint
           – Text documents: Word, Excel
       • Effective reuse of process information from past engagements will yield great benefits
           – Conventional document management systems cannot provide a process-centric view of information
           – How to search for the most effective business artifacts in the current “process” context?
  • 4. Related Literature
       • Work in measuring similarity (diagnosing differences) in business process models
           – e.g., Ehrig et al (APCCM ’07), Dijkman (BPM ’08), Van der Aalst et al (BPM ’06)
           – Compares flow models in structured formats, viz. Petri net, EPC, YAWL
           – Linguistic, semantic and structural dimensions of comparing process elements
       • Extensive literature in Process Mining from execution logs
           – ProM framework
       • Research on choosing an appropriate granularity of process model reuse
           – Holschke et al (BPM ’09), Mendling et al (BPM ’08)
       • Extraction and management of useful process variants (Sadiq, BPM ’06)
       • Traditional methods in legacy text mining and organization
           – But they do not specifically focus on process information
       • No known effort to target design-level process information with linkage to business artifacts of interest, viz. requirements, KPIs, use-cases
  • 5. Key Information Elements
       • Business Process Hierarchies: Industry Specific, Cross Industry
       • Process Specific Artifacts: Scenario, Process, Process Step, Inputs, Outputs
       • Non-Process Business Artifacts: Requirement, Use-case, Gap, KPI
  • 6. Data, Data Everywhere... Nor Any Drop to Use!!
       • Design information on business artifacts implemented in engagements is locked in documents
           – Need to turn them into reusable assets
           – Retrieve information into a model-based format
       • Enterprise asset repositories are not well organized
           – Essentially, a dump of unlinked process documentation in different formats
           – No meta-data available against silos of documents
       • Inconsistencies in process data
           – Multiple teams are responsible for various aspects of process design
  • 7. Process Organization & Reuse (diagram labels: Extract model-based content; Enterprise repositories; Process Organization Framework; Content Reuse; Duplicate Detection)
  • 8. Process Information Extraction - Text
       • Utilize the semi-structured nature of the data
       • Extract content segments present in a document collection that can map to some process semantics
       • Seek an appropriate tag (preferably from a pre-defined meta model) from the user
       • Utilize the layout of content segments in the document to establish cardinality and relations between the various pieces of flat tagged content
       (figure labels: Extract, Tag)
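The extraction idea on this slide can be illustrated with a minimal sketch. It assumes a hypothetical plain-text export in which each content segment is a `Label: value` line and indentation reflects nesting; the `META_MODEL_TAGS` mapping, the labels and the function names are all illustrative and not taken from the talk. Where the slide asks the user for a tag, the sketch simply skips unknown labels.

```python
# Hypothetical meta model: maps document labels to process-model tags.
META_MODEL_TAGS = {
    "process": "Process",
    "step": "ProcessStep",
    "input": "Input",
    "output": "Output",
}


def extract_segments(text):
    """Yield (indent, label, value) for each 'Label: value' line."""
    for line in text.splitlines():
        if ":" not in line:
            continue
        label, value = line.split(":", 1)
        indent = len(label) - len(label.lstrip())
        yield indent, label.strip().lower(), value.strip()


def build_tree(text):
    """Tag segments and use their indentation (layout) to nest them."""
    root = {"tag": "Document", "value": "", "children": []}
    stack = [(-1, root)]  # (indent, node); the root is never popped
    for indent, label, value in extract_segments(text):
        tag = META_MODEL_TAGS.get(label)
        if tag is None:
            continue  # the slide would ask the user for a tag here
        node = {"tag": tag, "value": value, "children": []}
        # Close every open segment that is not a parent of this one.
        while stack[-1][0] >= indent:
            stack.pop()
        stack[-1][1]["children"].append(node)
        stack.append((indent, node))
    return root
```

The resulting tree records which inputs and outputs belong to which step, which is one simple way to recover relations from otherwise flat tagged content.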
  • 9. Process Information Extraction - Diagrams
       • General-purpose diagramming tools, viz. Visio, PowerPoint, Xfig etc., are used to capture business processes. Reasons: ubiquitous (low cost), familiar (intuitive to use)
       • No formal modeling tool provides sound import capabilities from diagramming formats!!
       • Challenges in Model Discovery
           – Ambiguities are commonplace in informal drawings
           – Humans can understand intent from visual cues; machine interpretation is hard!
           – Dangling connectors, unlinked labels, over-specification, under-specification
       • Steps in Model Discovery: Flow Structure Extraction, Semantic Interpretation
       (figure: examples of over-specification, under-specification and dangling connectors, drawn with steps such as Create Order, Process Order, Ship Order and nodes A to D)
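One of the challenges named above, dangling connectors, can be sketched in code. The sketch assumes a simplified, hypothetical drawing model (axis-aligned shape bounding boxes and connectors given by two endpoints) and an assumed snapping tolerance; it does not reflect the actual Visio or PowerPoint file formats, nor the authors' extraction method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shape:
    name: str
    x: float
    y: float
    width: float
    height: float

    def contains(self, px, py, tol):
        """True if the point lies inside the box, grown by tol on each side."""
        return (self.x - tol <= px <= self.x + self.width + tol
                and self.y - tol <= py <= self.y + self.height + tol)


@dataclass(frozen=True)
class Connector:
    start: tuple
    end: tuple


def attach(point, shapes, tol):
    """Return the name of the first shape the point touches, else None."""
    for shape in shapes:
        if shape.contains(point[0], point[1], tol):
            return shape.name
    return None


def extract_flow(shapes, connectors, tol=5.0):
    """Split connectors into resolved edges and dangling connectors."""
    edges, dangling = [], []
    for connector in connectors:
        src = attach(connector.start, shapes, tol)
        dst = attach(connector.end, shapes, tol)
        if src is not None and dst is not None:
            edges.append((src, dst))
        else:
            dangling.append(connector)  # at least one end touches no shape
    return edges, dangling
```

A connector that was drawn near a shape but never glued to it is recovered here by the tolerance; a larger tolerance resolves more connectors at the risk of attaching them to the wrong shape.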
  • 10. Problem: Organizing Process Information
       • Given a dump of business process documentation (both text and diagrams) from an engagement, how should it be organized so that the information it contains can be effectively harvested?
       • Three sub-problems
           – Problem 1: Link text and visual representation
           – Problem 2: Normalize content in linked text and visual forms
           – Problem 3: Group normalized content in similar clusters
       • Demonstrate benefit of better organization
  • 11. Process Information in Text and Visual Formats
  • 12. Example
       Benefits in text:
           • Process information is detailed
       Problems in text:
           • Control flow detail is lost
           • Unintuitive, e.g., swim lanes are missing
       Benefits in flow:
           • Control flow is detailed
           • Intuitive
       Problems in flow:
           • Names in flow do not match text (Functional FP&A Planner vs. FP&A Planner)
           • Limited information, e.g., whether an activity is system or manual? The text has the details
  • 13. Steps in Process Organization
       Input: set of work products (files) describing business processes
           1. Link textual and flow (visual) files
           2. Normalize process step information in linked text and flow
           3. Cluster normalized process information
       Output: clusters of business processes with linked non-process artifacts
       Notes on the steps:
           • Enrichment of information; consistency (single view of truth)
           • Structured representation: Name, Description, Role, Predecessors, Successors, Inputs, Outputs, Nature, Miscellaneous
           • Define suitable similarity measures to deal with atomic and composite content
           • Run a clustering algorithm without a priori information on the number of clusters
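The normalize-and-cluster steps can be sketched as follows. The slides do not name the similarity measures or the clustering algorithm, so everything here is an assumption chosen for illustration: `difflib.SequenceMatcher` for atomic (string) fields, Jaccard overlap for composite (set) fields, an unweighted average across fields, and connected components over a similarity threshold as a clustering method that needs no cluster count up front. Only a subset of the normalized fields is modeled.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class ProcessStep:
    """A few of the normalized fields listed on the slide."""
    name: str
    role: str = ""
    inputs: frozenset = frozenset()
    outputs: frozenset = frozenset()


def atomic_similarity(a, b):
    """String similarity in [0, 1] (assumed measure, case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def composite_similarity(a, b):
    """Jaccard overlap of two sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def step_similarity(s, t):
    """Unweighted mean over the modeled fields (an assumed aggregation)."""
    scores = [
        atomic_similarity(s.name, t.name),
        atomic_similarity(s.role, t.role),
        composite_similarity(s.inputs, t.inputs),
        composite_similarity(s.outputs, t.outputs),
    ]
    return sum(scores) / len(scores)


def cluster(steps, threshold=0.7):
    """Group indices of steps linked by similarity >= threshold (union-find)."""
    parent = list(range(len(steps)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(steps)):
        for j in range(i + 1, len(steps)):
            if step_similarity(steps[i], steps[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(steps)):
        groups.setdefault(find(i), []).append(i)
    # Singletons are not reported as clusters.
    return [members for members in groups.values() if len(members) > 1]
```

The number of clusters falls out of the threshold rather than being supplied, which is the property the slide asks for; the threshold value itself is a tuning assumption.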
  • 14. Empirical Evaluation: Results
       • Input
           – 240 Process Definition Documents (PDDs)
           – 315 Process Flow Diagrams
       • Linking
           Similarity Measure   Pair-wise Matches   # PDDs   Precision (%)
           Jaro                 126                 30       48
           Exact                11                  11       100
       • Normalization
           Similarity Measure   % Match (Name)   % Match (Name + Role)
           Jaro                 37               8
           Exact                45.5             13
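The slide compares Jaro and exact matching for linking. For reference, a minimal sketch of the standard Jaro similarity is given below, together with a hypothetical `link` helper that pairs document names with diagram names. The threshold value, the lower-casing and the helper itself are assumptions, not details from the talk.

```python
def jaro(s1, s2):
    """Standard Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if they are equal and within this distance.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def link(document_names, diagram_names, threshold=0.9):
    """Pair names whose case-insensitive Jaro similarity meets the threshold."""
    return [
        (doc, diagram)
        for doc in document_names
        for diagram in diagram_names
        if jaro(doc.lower(), diagram.lower()) >= threshold
    ]
```

Setting the threshold to 1.0 reduces this to case-insensitive exact matching, which mirrors the trade-off in the table: approximate matching finds many more candidate pairs at lower precision.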
  • 15. Empirical Evaluation: Results (2)
       • Dataset: a set of 240 Process Definition Documents from an actual ERP project engagement
       • Number of pair-wise similar processes: 266
       • Number of clusters found: 23
       • Range of cluster sizes: (2, 21)
       • Number of processes similar to at least one other process: 134 (i.e., 55% of total)
       • Effectiveness of discovered clusters in boosting similarity of non-process business artifacts written in the context of business processes
           Artifact                    Similarity inside clusters   Overall Similarity   Similarity Boost (%)
           Requirement                 0.209                        0.014                1430.55
           Integration Consideration   0.620                        0.115                438.54
           Supplier                    0.844                        0.109                671.22
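The slide does not define the "Similarity Boost" column. A plausible reading, assumed here, is the relative increase of in-cluster similarity over overall similarity. Recomputing it from the rounded values in the table gives figures close to, but not identical with, the reported ones, presumably because the slide's percentages were derived from unrounded similarities.

```python
def similarity_boost(inside, overall):
    """Relative increase of in-cluster over overall similarity, in percent.

    This formula is an assumption; the slide does not state how the
    boost column was computed.
    """
    if overall <= 0:
        raise ValueError("overall similarity must be positive")
    return (inside - overall) / overall * 100.0


# Rounded similarities as printed on the slide.
rows = {
    "Requirement": (0.209, 0.014),
    "Integration Consideration": (0.620, 0.115),
    "Supplier": (0.844, 0.109),
}

for artifact, (inside, overall) in rows.items():
    print(f"{artifact}: {similarity_boost(inside, overall):.1f}%")
```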
  • 16. Application to File Duplicate Detection
       • Scenario
           – Input: 1520 files organized in a complex directory structure, 13 different asset types, files per asset type known
           – Problem: find duplicate or near-similar files within an asset type
       • Approach
           – Harvest content of files per asset type
           – Cluster based on content
           – Files in each cluster are duplicates
           Type   # Files   # Clusters   # Files in Some Cluster   % Unique
           PDD    866       116          786                       23% (196/866)
           BPP    463       121          406                       38% (178/463)
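The "% Unique" column can be reproduced with a simple accounting, sketched below: every file that falls in no cluster is unique, and each cluster of duplicates contributes one unique file. This reading is an assumption; it matches both rows of the table, which suggests but does not confirm that the column was computed this way.

```python
def unique_files(total_files, clusters, files_in_clusters):
    """Unclustered files plus one representative per duplicate cluster."""
    unclustered = total_files - files_in_clusters
    return unclustered + clusters


def percent_unique(total_files, clusters, files_in_clusters):
    """Share of unique files, rounded to a whole percent."""
    unique = unique_files(total_files, clusters, files_in_clusters)
    return round(100 * unique / total_files)


# Rows from the slide: (type, files, clusters, files in some cluster).
for asset_type, files, clusters, clustered in [
    ("PDD", 866, 116, 786),
    ("BPP", 463, 121, 406),
]:
    unique = unique_files(files, clusters, clustered)
    print(f"{asset_type}: {unique}/{files} unique "
          f"({percent_unique(files, clusters, clustered)}%)")
```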
  • 17. Scope for Future Work
       • Improve precision of text similarity measures
           – Use domain-specific WordNets
           – Apply sound aggregation measures for robust relational learning
       • Build ontologies of ERP concepts and utilize the relationships therein to improve search for similar business artifacts in the context of a business process
       • Extraction of process documentation into standardized representations
  • 18. Conclusions
       • Efficient organization of design-level process documentation, which may not have execution semantics, can ease information reuse
       • Process information can help in searching for useful non-process business artifacts
           – e.g., searching for the correct use-case or performance indicator can be easy if these are maintained along with process information
       • Enriching and normalizing process information from multiple representations is important
           – Removal of duplicate and inconsistent data is critical
  • 19. Thank You (diagram labels: Extract model-based content; Enterprise repositories; Process Organization Framework; Content Reuse; Duplicate Detection)