Scc talk


Published in: Technology, Business

  1. Organizing Documented Processes
     Biplav Srivastava, Debdoot Mukherjee
     IBM Research, India
     © 2009 IBM Corporation
  2. Research Theme (SCC 2009, Organizing Documented Processes)
     • Establish an effective framework for organizing design-level documentation on business processes and linked business artifacts in order to:
       – Boost information reuse across engagements
       – Maintain coherence in enterprise process repositories
       – Reduce costs and improve quality in business transformation exercises
     • Setting: Enterprise Resource Planning (ERP) projects
       – Off-the-shelf software to manage common business functions (e.g., Finance, Supply Chain)
       – Businesses buy this software and then engage service providers to tailor it
       – AMR Research estimates that spending on consulting, integration and support for packaged application services was $103B in 2007 and is expected to reach $174B by 2012
  3. Motivation
     • Blueprinting is the crucial activity in ERP projects: it decides how the ERP functionality will be used and how any new customizations will be implemented
     • Documented business processes and related artifacts are the key outputs of blueprinting
     • Business processes are captured in large numbers and in multiple representations
       – Typically over 100 business processes per engagement
       – Flow diagrams: Visio, PowerPoint
       – Text documents: Word, Excel
     • Effective reuse of process information from past engagements will yield great benefits
       – Conventional document management systems are not capable of providing a process-centric view of information
       – How to search for the most effective business artifacts in the current “process” context?
  4. Related Literature
     • Work on measuring similarity (diagnosing differences) in business process models
       – e.g., Ehrig et al. (APCCM ’07), Dijkman (BPM ’08), Van der Aalst et al. (BPM ’06)
       – Compares flow models in structured formats, viz. Petri net, EPC, YAWL
       – Linguistic, semantic and structural dimensions of comparing process elements
     • Extensive literature on process mining from execution logs
       – ProM framework
     • Research on choosing an appropriate granularity of process model reuse
       – Holschke et al. (BPM ’09), Mendling et al. (BPM ’08)
     • Extraction and management of useful process variants (Sadiq, BPM ’06)
     • Traditional methods in legacy text mining and organization
       – But they do not specifically focus on process information
     • No known effort to target design-level process information with linkage to business artifacts of interest, viz. requirements, KPIs, use-cases
  5. Key Information Elements
     • Business process hierarchies
       – Industry specific
       – Cross industry
     • Process-specific artifacts
       – Scenario
       – Process
       – Process step
       – Inputs, outputs
     • Non-process business artifacts
       – Requirement
       – Use-case
       – Gap
       – KPI
  6. Data, Data Everywhere... Nor Any Drop to Use!!
     • Design information on business artifacts implemented in engagements is locked in documents
       – Need to turn them into reusable assets
       – Retrieve information into a model-based format
     • Enterprise asset repositories are not well organized
       – Essentially, a dump of unlinked process documentation in different formats
       – No metadata available against silos of documents
     • Inconsistencies in process data
       – Multiple teams are responsible for various aspects of process design
  7. Process Organization & Reuse
     [Figure labels: Enterprise repositories; Extract model-based content; Process Organization Framework; Content Reuse; Duplicate Detection]
  8. Process Information Extraction - Text
     • Utilize the semi-structured nature of the data
     • Extract content segments present in a document collection that can map to some process semantics
     • Seek an appropriate tag (preferably from a pre-defined meta-model) from the user
     • Utilize the layout of content segments in the document to establish cardinality and relations between the various pieces of flat tagged content
     [Figure labels: Extract; Tag]
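A minimal sketch of the extract-and-tag idea on slide 8, assuming documents whose content segments are introduced by "Heading:" lines. The heading names, the `META_MODEL_TAGS` lookup and the function name are hypothetical; the talk does not specify the meta-model or the document layout, and in the described workflow the tag is sought from the user rather than looked up.

```python
import re

# Hypothetical mapping from section headings to meta-model tags.
# The talk asks the user for the tag; a fixed lookup stands in for that here.
META_MODEL_TAGS = {
    "process name": "Name",
    "description": "Description",
    "role": "Role",
    "inputs": "Inputs",
    "outputs": "Outputs",
}

# A heading line looks like "Some Heading: content".
HEADING = re.compile(r"^\s*([A-Za-z ]+?)\s*:\s*(.*)$")


def extract_segments(text):
    """Split semi-structured text into (tag, content) pairs."""
    segments = []
    for line in text.splitlines():
        match = HEADING.match(line)
        if match and match.group(1).lower() in META_MODEL_TAGS:
            tag = META_MODEL_TAGS[match.group(1).lower()]
            segments.append([tag, match.group(2).strip()])
        elif segments and line.strip():
            # Continuation line: append it to the most recent segment.
            segments[-1][1] = (segments[-1][1] + " " + line.strip()).strip()
    return [(tag, content) for tag, content in segments]
```

Lines that follow a recognized heading are treated as continuations of that segment, which is one simple way to use layout; establishing cardinality and relations between segments would need more than this sketch covers.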
  9. Process Information Extraction - Diagrams
     • General-purpose diagramming tools, viz. Visio, PowerPoint, Xfig etc., are used to capture business processes. Reasons: ubiquitous (low cost), familiar (intuitive to use)
     • No formal modeling tool provides sound import capabilities from diagramming formats!!
     • Challenges in model discovery
       – Ambiguities are commonplace in informal drawings
       – Humans can understand intent from visual cues; machine interpretation is hard!
       – Dangling connectors, unlinked labels, over-specification, under-specification
     • Steps in model discovery: flow structure extraction, semantic interpretation
     [Figure: examples of over-specification and under-specification (shapes labeled Create Order, Process Order, Ship Order) and of dangling connectors (shapes A, B, C, D)]
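One of the challenges listed on slide 9, dangling connectors, can be illustrated with a small check. This is a sketch under stated assumptions: the `Connector` type, the shape ids and the `split_connectors` function are hypothetical, and the talk does not describe how its flow structure extraction represents a drawing.

```python
from typing import NamedTuple, Optional


class Connector(NamedTuple):
    source: Optional[str]  # shape id the connector starts at, or None if unattached
    target: Optional[str]  # shape id the connector ends at, or None if unattached


def split_connectors(shapes, connectors):
    """Separate well-formed flow edges from dangling connectors.

    A connector is dangling when either endpoint is not attached
    to a known shape.
    """
    edges, dangling = [], []
    for conn in connectors:
        if conn.source in shapes and conn.target in shapes:
            edges.append((conn.source, conn.target))
        else:
            dangling.append(conn)
    return edges, dangling
```

A real tool would then try to repair each dangling connector, for example by snapping it to the nearest shape; that step is outside this sketch.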
  10. Problem: Organizing Process Information
      • Given a dump of business process documentation (both text and diagrams) from an engagement, how should it be organized so that the information it contains can be effectively harvested?
      • Three sub-problems
        – Problem 1: Link text and visual representation
        – Problem 2: Normalize content in linked text and visual forms
        – Problem 3: Group normalized content in similar clusters
      • Demonstrate benefit of better organization
  11. Process Information in Text and Visual Formats
  12. Example
      • Benefits in text:
        – Process information is detailed
      • Problems in text:
        – Control flow detail is lost
        – Unintuitive, e.g., swim lanes are missing
      • Benefits in flow:
        – Control flow is detailed
        – Intuitive
      • Problems in flow:
        – Names in flow do not match text (“Functional FP&A Planner” vs. “FP&A Planner”)
        – Limited information, e.g., whether an activity is system or manual
      • Text has the details
  13. Steps in Process Organization
      • Input: set of work products (files) describing business processes
      • Link textual and flow (visual) files
      • Normalize process step information in linked text and flow
        – Enrichment of information
        – Consistency: single view of truth
        – Structured representation: Name, Description, Role, Predecessors, Successors, Inputs, Outputs, Nature, Miscellaneous
      • Cluster normalized process information
        – Define suitable similarity measures to deal with atomic and composite content
        – Run a clustering algorithm without a priori information on the number of clusters
      • Output: clusters of business processes with linked non-process artifacts
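The structured representation listed on slide 13 can be sketched as a record type, together with one possible normalization rule. The field names follow the slide; the field types, the `normalize` function and its merge policy (prefer the text for descriptive fields and the flow diagram for control flow) are assumptions suggested by the example slide, not the authors' stated method.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ProcessStep:
    name: str
    description: str = ""
    role: Optional[str] = None
    predecessors: List[str] = field(default_factory=list)
    successors: List[str] = field(default_factory=list)
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)
    nature: Optional[str] = None  # e.g. "system" or "manual"
    miscellaneous: str = ""


def normalize(text_step, flow_step):
    """Merge linked text and flow records into a single normalized step.

    Assumed policy: descriptive fields come from the text when present,
    control flow (predecessors, successors) from the diagram when present.
    """
    return ProcessStep(
        name=text_step.name,
        description=text_step.description or flow_step.description,
        role=text_step.role or flow_step.role,
        predecessors=list(flow_step.predecessors or text_step.predecessors),
        successors=list(flow_step.successors or text_step.successors),
        inputs=list(text_step.inputs or flow_step.inputs),
        outputs=list(text_step.outputs or flow_step.outputs),
        nature=text_step.nature or flow_step.nature,
        miscellaneous=text_step.miscellaneous or flow_step.miscellaneous,
    )
```

The merged record is what gives the "single view of truth": downstream similarity measures see one step with both its details and its position in the flow.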
  14. Empirical Evaluation: Results
      • Input
        – 240 Process Definition Documents (PDDs)
        – 315 Process Flow Diagrams
      • Linking
          Similarity Measure   Pair-wise Matches   # PDDs   Precision (%)
          Jaro                 126                 30       48
          Exact                11                  11       100
      • Normalization
          Similarity Measure   % Match (Name)   % Match (Name + Role)
          Jaro                 37               8
          Exact                45.5             13
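For reference, the Jaro measure named in the results can be sketched as follows. This is the standard textbook definition of Jaro similarity, written out in plain Python; it is not taken from the talk, and the slides do not say which implementation or match threshold was used.

```python
def jaro(s1, s2):
    """Standard Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters match if equal and no further apart than this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2

    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


print(round(jaro("MARTHA", "MARHTA"), 4))  # 0.9444
```

An approximate measure like this can pair names such as "Functional FP&A Planner" and "FP&A Planner" that exact matching misses, which fits the table's pattern of many more Jaro matches at lower precision.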
  15. Empirical Evaluation: Results (2)
      • Dataset: a set of 240 Process Definition Documents from an actual ERP project engagement
      • Number of pair-wise similar processes: 266
      • Number of clusters found: 23
      • Range of cluster sizes = (2, 21)
      • Number of processes similar to at least one other process = 134 (i.e., 55% of total)
      • Effectiveness of discovered clusters in boosting similarity of non-process business artifacts written in the context of business processes
          Artifact                    Similarity inside clusters   Overall Similarity   Similarity Boost (%)
          Requirement                 0.209                        0.014                1430.55
          Integration Consideration   0.620                        0.115                438.54
          Supplier                    0.844                        0.109                671.22
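The boost column appears consistent with the relative increase of in-cluster similarity over overall similarity. That reading is an assumption, since the slide gives no formula. Recomputing from the rounded similarities in the table gives roughly 1393, 439 and 674, close to but not identical to the reported 1430.55, 438.54 and 671.22, presumably because the slide shows rounded inputs. The function name below is illustrative.

```python
def similarity_boost(inside, overall):
    """Relative increase of in-cluster over overall similarity, in percent."""
    return (inside - overall) / overall * 100.0


# Rounded similarities as shown in the table on the slide.
table = {
    "Requirement": (0.209, 0.014),
    "Integration Consideration": (0.620, 0.115),
    "Supplier": (0.844, 0.109),
}

for artifact, (inside, overall) in table.items():
    print(f"{artifact}: {similarity_boost(inside, overall):.2f}%")
```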
  16. Application to File Duplicate Detection
      • Scenario
        – Input: 1520 files organized in a complex directory structure, 13 different asset types, files per asset type known
        – Problem: find duplicates or near-similar files in an asset type
      • Approach
        – Harvest content of files per asset type
        – Cluster based on content
        – Files in each cluster are duplicates
          Type   # Files   # Clusters   # Files in Some Cluster   % Unique
          PDD    866       116          786                       23% (196/866)
          BPP    463       121          406                       38% (178/463)
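The "% Unique" figures are consistent with counting every file outside a cluster plus one representative per cluster: for PDD, (866 - 786) + 116 = 196, and for BPP, (463 - 406) + 121 = 178. The sketch below pairs that count with a simple grouping step. Using connected components over similar pairs is an assumption, as the talk does not name its clustering algorithm, and the function names are illustrative.

```python
def cluster(items, similar_pairs):
    """Group items into connected components of the similarity graph.

    Only components with at least two members are returned, so the
    number of clusters does not need to be known in advance.
    """
    parent = {item: item for item in items}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in similar_pairs:
        parent[find(a)] = find(b)

    groups = {}
    for item in items:
        groups.setdefault(find(item), []).append(item)
    return [members for members in groups.values() if len(members) > 1]


def unique_count(total_files, clusters, files_in_clusters):
    """Files outside every cluster, plus one representative per cluster."""
    return (total_files - files_in_clusters) + clusters


print(unique_count(866, 116, 786))  # 196
print(unique_count(463, 121, 406))  # 178
```

Connected components treat similarity as transitive, so a chain of near-duplicates ends up in one cluster even when its two ends are not directly similar; a stricter grouping would need a different algorithm.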
  17. Scope for Future Work
      • Improve precision of text similarity measures
        – Use domain-specific WordNets
        – Apply sound aggregation measures for robust relational learning
      • Build ontologies of ERP concepts and utilize the relationships therein to improve search for similar business artifacts in the context of a business process
      • Extraction of process documentation into standardized representations
  18. Conclusions
      • Efficient organization of design-level process documentation, which may not have execution semantics, can ease information reuse
      • Process information can help in searching for useful non-process business artifacts
        – e.g., searching for the correct use-case or performance indicator can be easy if these are maintained along with process information
      • Enriching and normalizing process information from multiple representations is important
        – Removal of duplicate and inconsistent data is critical
  19. Thank You