Automated Correlation Discovery for Semi-Structured Business Processes
DEBS 2011
Szabolcs Rozsnyai, Aleksander Slominski, Geetika T. Lakshmanan
Agenda
Motivation
We present a novel algorithm that automatically determines correlation rules for monitoring, discovery, and other applications.
Solution Overview
Big Picture and Context
Related Work
Overview
Data Pre-Processing
Headers on the slide: Raw Event, Event Attributes, EventType, Common Alias
Key | Timestamp | Type | Raw | Event attributes
32123… | 2011-01-01T09:35:52.50 | OrderReceived | <OrderReceived… | DateTime: 2011-01-01T09:35:52.50, OrderId: 166635, Product: ProductA, …
213131… | 2011-01-01T09:40:54.50 | Shipment Created | <Shipment Created… | DateTime: 2011-01-01T09:31:52.50, ShipmentId: 253355, OrderId: 166635, …
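A minimal sketch of this pre-processing step, assuming the Raw payload is XML whose root tag names the event type and whose direct child elements carry the attributes. `Event`, `preprocess`, and the key `k-1` are hypothetical names for illustration, not the authors' implementation.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical pre-processed event: raw-event columns plus a flat attribute map."""
    key: str
    timestamp: str
    event_type: str
    raw: str
    attributes: dict = field(default_factory=dict)


def preprocess(key: str, timestamp: str, raw_xml: str) -> Event:
    """Flatten an XML payload into attribute-name -> value pairs.

    Assumption: the root tag is the event type and every direct child
    element is one attribute; nested structures are not handled.
    """
    root = ET.fromstring(raw_xml)
    attributes = {child.tag: (child.text or "").strip() for child in root}
    return Event(key, timestamp, root.tag, raw_xml, attributes)


# Hypothetical key and payload modelled on the OrderReceived row above.
raw = ("<OrderReceived><DateTime>2011-01-01T09:35:52.50</DateTime>"
       "<OrderId>166635</OrderId><Product>ProductA</Product></OrderReceived>")
event = preprocess("k-1", "2011-01-01T09:35:52.50", raw)
print(event.event_type, event.attributes["OrderId"])  # prints: OrderReceived 166635
```

Flattening every event into a name-to-value map is what lets the later statistics work without a schema.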
Statistics Calculation 1/2
Statistics Calculation 2/2
Attribute Cardinality: a map of each value to the number of times it occurs.
Card: the number of distinct values of the attribute.
Cnt: the total number of instances in which the attribute occurs. Because the data does not follow a defined schema, an attribute may not occur in every instance.
AvgAttributeLength: the average length of the attribute's values. It indicates potential uniqueness: a long value may be a sign that the attribute is a unique identifier. Unique identifiers such as OrderId are potential attributes that occur in other types and thus form a correlation. The indicator can also be misleading, since a textual description may be very long and in fact unique, yet it is never used for correlating artefacts.
InferencedType: the inferred data type of the attribute.
NoOfNumeric: depending on the InferencedType, the number of values that are numeric.
NoOfAlphaNum: depending on the InferencedType, the number of values that are alphanumeric.
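The per-attribute statistics above can be sketched as follows. This is an illustration under assumptions: values are strings, a missing attribute is passed as `None`, a value counts as numeric when it matches a simple signed integer or decimal pattern, and every non-numeric value is counted as alphanumeric. `AttributeStats` and `attribute_stats` are hypothetical names.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Assumption: "numeric" means an optionally signed integer or decimal literal.
_NUMERIC = re.compile(r"[+-]?\d+(?:\.\d+)?")


@dataclass
class AttributeStats:
    index: Counter               # Attribute Cardinality: value -> occurrence count
    card: int                    # Card: number of distinct values
    cnt: int                     # Cnt: instances in which the attribute occurs
    avg_attribute_length: float  # AvgAttributeLength
    no_of_numeric: int           # NoOfNumeric
    no_of_alpha_num: int         # NoOfAlphaNum (here: every non-numeric value)


def attribute_stats(values):
    """values: the attribute's value in each instance, or None where it is missing."""
    present = [v for v in values if v is not None]
    index = Counter(present)
    numeric = sum(1 for v in present if _NUMERIC.fullmatch(v))
    avg_len = sum(len(v) for v in present) / len(present) if present else 0.0
    return AttributeStats(index, len(index), len(present), avg_len,
                          numeric, len(present) - numeric)


# Hypothetical OrderId values: one missing, one repeated.
stats = attribute_stats(["166635", "166636", "166635", None, "166637", "166638"])
print(stats.card, stats.cnt, stats.avg_attribute_length)  # prints: 4 5 6.0
```

The hypothetical input is chosen so that Card = 4 and Cnt = 5, matching the figures used in the examples that follow.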
Example
Example - Index: The attribute cardinality (i.e., the Index) contains a map of each value and how often that value occurs.
Example - Card: Card determines the number of distinct values of the attribute. In the example, the attribute has 4 unique values.
Example - Cnt: Cnt represents the total number of instances in which the attribute occurs. Because the data does not follow a defined schema, an attribute may not occur in every instance. In the example, Cnt = 5; for certain attributes the count can be smaller because they may be null or missing.
Example - AvgAttributeLength: AvgAttributeLength is calculated per attribute and represents the average length of its values. It is an indicator of potential uniqueness: a long value may be a sign that the attribute is a unique identifier. Unique identifiers such as OrderId are potential attributes that occur in other types and thus form a correlation. The indicator can also be misleading, since a textual description may be very long and in fact unique, yet it is never used for correlating artefacts.
Example - InferencedType: InferencedType determines the data type of an attribute. The type is an important characteristic for correlation discovery because it reduces the problem space of correlation candidates: the chance that an attribute correlates with another attribute is very low when its values are mostly alphanumeric. The type is determined with a fault tolerance of 0.9 (e.g., at least 90% of the values must be numeric); we refer to this as the parameter Phi.
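A sketch of the Phi-tolerant type decision, assuming only two outcomes (numeric and alphanumeric) and the same simple numeric pattern as before. The labels and `inferenced_type` are hypothetical; detecting DateTime, boolean, or description-text types is not covered here.

```python
import re

_NUMERIC = re.compile(r"[+-]?\d+(?:\.\d+)?")


def inferenced_type(values, phi=0.9):
    """Label an attribute NUMERIC if at least a share `phi` of its present values
    is numeric, else ALPHANUMERIC. Only these two labels are sketched here."""
    present = [v for v in values if v is not None]
    if not present:
        return "UNKNOWN"
    numeric_share = sum(1 for v in present if _NUMERIC.fullmatch(v)) / len(present)
    return "NUMERIC" if numeric_share >= phi else "ALPHANUMERIC"


# 9 of 10 values are numeric, which meets Phi = 0.9.
print(inferenced_type(["1"] * 9 + ["n/a"]))  # prints: NUMERIC
```

The tolerance keeps a few malformed or placeholder values from flipping an otherwise numeric attribute to alphanumeric.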
Example – The rest of the types…
Determining Correlation Candidates
Difference Set 1/2
[1] I. Ilyas, V. Markl, P. Haas, P. Brown. CORDS: Automatic discovery of correlations and soft functional dependencies. 2004.
[2] A. Rostin, O. Albrecht, F. Naumann, J. Bauckmann, U. Leser. A Machine Learning Approach to Foreign Key Discovery. WebDB, 2009.
Example – Determining Highly Indexables: calculate Card/Cnt for each attribute. Resulting ratios: 1, 1, 0.8, 0.8, 0.2, 1.
Example – Determining Highly Indexables: with the ratios Card/Cnt = 1, 1, 0.8, 0.8, 0.2, 1, an attribute is highly indexable if Card / Cnt > Alpha, where Alpha = 0.9, and AvgAttributeLength > Epsilon, where Epsilon = 5.
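The two thresholds can be applied as a simple filter. This sketch assumes an attribute must satisfy both criteria at once; `is_highly_indexable` is a hypothetical name, and Alpha and Epsilon default to the slide's values.

```python
def is_highly_indexable(card, cnt, avg_attribute_length, alpha=0.9, epsilon=5):
    """Assumption: an attribute must satisfy both slide criteria,
    Card/Cnt > Alpha and AvgAttributeLength > Epsilon."""
    if cnt == 0:
        return False  # attribute never occurs, nothing to index
    return card / cnt > alpha and avg_attribute_length > epsilon


# Card/Cnt = 5/5 = 1 passes; 4/5 = 0.8 does not.
print(is_highly_indexable(5, 5, 6.0), is_highly_indexable(4, 5, 6.0))  # prints: True False
```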
Example – Determining Mappables: A Mappable Attribute can be seen as a means to reduce the search space of potentially correlating attributes of a type. One approach is to set an upper threshold on how often a value of an attribute can occur; the assumption is that if a value occurs more than x times, the attribute is unlikely to be a correlation candidate. With x the cardinality of a value and i an attribute of a type: { x_i | x < Gamma }, i.e., Card < Gamma, where Gamma = 10. For instance, in this domain a shipment is unlikely to have more than 10 orders. However, this threshold might cause problems in other domains or for certain relationships (one customer definitely has more than 10 orders).
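A sketch of the Mappable check, following the per-value reading of the slide ("how often a value of an attribute can occur"): the attribute qualifies only if every value occurs fewer than Gamma times. The slide's "Card < Gamma" shorthand could be read differently, so treat this as an assumption; `is_mappable` and the example indexes are hypothetical.

```python
from collections import Counter


def is_mappable(index, gamma=10):
    """index: value -> occurrence count. Reading assumed here: the attribute is
    mappable if every one of its values occurs fewer than `gamma` times."""
    return all(count < gamma for count in index.values())


shipment_id_index = Counter({"253355": 3, "253356": 1})  # hypothetical
customer_id_index = Counter({"C-1": 42})                 # hypothetical
print(is_mappable(shipment_id_index), is_mappable(customer_id_index))  # prints: True False
```

The second case mirrors the slide's caveat: a customer with many orders would be filtered out even though it is a genuine relationship.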
Difference Set 2/2
Example: DateTime attributes are excluded because timestamps are not suitable as correlation pairs. The same applies to booleans and description texts.
Example – DifferenceSet for all Permutations
Candidate pair | SetDiff
OrderReceived.OrderId = ShipmentCreated.ShipmentId | 100%
OrderReceived.OrderId = ShipmentCreated.OrderId | 0%
OrderReceived.OrderId = TransportStarted.TransportId | 100%
OrderReceived.OrderId = TransportStarted.ShipmentId | 100%
… | …
A \ B = { x | x ∈ A ∧ x ∉ B }, and a pair is kept if |A \ B| <= DiffThreshold, where DiffThreshold = 0.95.
Resulting candidate correlation pairs with 100% overlap:
Candidate pair | SetDiff
OrderReceived.OrderId = ShipmentCreated.OrderId | 0%
ShipmentCreated.ShipmentId = TransportStarted.ShipmentId | 0%
ShipmentCreated.ShipmentId = TransportEnded.ShipmentId | 0%
TransportStarted.TransportId = TransportEnded.TransportId | 0%
TransportEnded.TransportId = TransportStarted.TransportId | 0%
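A sketch of the difference-set filter over all ordered attribute pairs. Two assumptions are made here and are not taken from the slides: SetDiff is normalised as |A \ B| / |A| over distinct values so that it can be compared with DiffThreshold = 0.95, and only attributes of different event types are paired. `set_diff_ratio`, `correlation_candidates`, and the sample values are hypothetical.

```python
from itertools import permutations


def set_diff_ratio(a, b):
    """|A minus B| / |A| over distinct values; 1.0 if A is empty (no evidence)."""
    a, b = set(a), set(b)
    return len(a - b) / len(a) if a else 1.0


def correlation_candidates(columns, diff_threshold=0.95):
    """columns: 'EventType.Attribute' -> list of observed values.
    Returns (A, B, SetDiff) for every ordered pair with SetDiff <= diff_threshold."""
    candidates = []
    for (name_a, vals_a), (name_b, vals_b) in permutations(columns.items(), 2):
        # Assumption: only attributes of different event types are paired.
        if name_a.split(".")[0] == name_b.split(".")[0]:
            continue
        ratio = set_diff_ratio(vals_a, vals_b)
        if ratio <= diff_threshold:
            candidates.append((name_a, name_b, ratio))
    return candidates


columns = {  # hypothetical values
    "OrderReceived.OrderId": ["166635", "166636"],
    "ShipmentCreated.OrderId": ["166635", "166636"],
    "ShipmentCreated.ShipmentId": ["253355", "253356"],
}
# Prints both directions of the OrderId pair, each with SetDiff 0%.
for a, b, ratio in correlation_candidates(columns):
    print(f"{a} = {b}: SetDiff {ratio:.0%}")
```

Using ordered permutations rather than combinations matters because set difference is asymmetric: A \ B can be empty while B \ A is not.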
Difference between AvgAttributeLength
Example – AvgAttributeLength
Candidate pair | SetDiff | AvgAttrLength
OrderReceived.OrderId = ShipmentCreated.OrderId | 0% | 0
ShipmentCreated.ShipmentId = TransportStarted.ShipmentId | 0% | 0
ShipmentCreated.ShipmentId = TransportEnded.ShipmentId | 0% | 0
TransportStarted.TransportId = TransportEnded.TransportId | 0% | 0
TransportEnded.TransportId = TransportStarted.TransportId | 0% | 0
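A sketch of the AvgAttributeLength comparison for a candidate pair, assuming the measure is the absolute difference of the two attributes' average value lengths (0 for every pair in the example). The function names are hypothetical.

```python
def avg_attribute_length(values):
    present = [v for v in values if v is not None]
    return sum(len(v) for v in present) / len(present) if present else 0.0


def avg_length_difference(values_a, values_b):
    """Absolute difference of the two attributes' AvgAttributeLength values."""
    return abs(avg_attribute_length(values_a) - avg_attribute_length(values_b))


print(avg_length_difference(["166635", "166636"], ["166635", None]))  # prints: 0.0
```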
LevenshteinDistance
Example – LevenshteinDistance
Candidate pair | SetDiff | AvgAttrLength | LevenshteinDistance
OrderReceived.OrderId = ShipmentCreated.OrderId | 0% | 0 | 0
ShipmentCreated.ShipmentId = TransportStarted.ShipmentId | 0% | 0 | 0
ShipmentCreated.ShipmentId = TransportEnded.ShipmentId | 0% | 0 | 0
TransportStarted.TransportId = TransportEnded.TransportId | 0% | 0 | 0
TransportEnded.TransportId = TransportStarted.TransportId | 0% | 0 | 0
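The example's zero distances are consistent with comparing attribute names (OrderId with OrderId, and so on); this sketch assumes that reading. It is the standard two-row dynamic-programming Levenshtein distance, not necessarily the authors' implementation.

```python
def levenshtein(s: str, t: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions that turn s into t (two-row dynamic programming)."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        cur = [i]  # distance from s[:i] to ""
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,                # delete cs
                           cur[j - 1] + 1,             # insert ct
                           prev[j - 1] + (cs != ct)))  # substitute (or match)
        prev = cur
    return prev[-1]


print(levenshtein("OrderId", "OrderId"))  # prints: 0
```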
Example – Weight Calculation
Candidate pair | SetDiff | AvgAttrLength | LevenshteinDistance | Confidence
OrderReceived.OrderId = ShipmentCreated.OrderId | 0% | 0 | 0 | 100%
ShipmentCreated.ShipmentId = TransportStarted.ShipmentId | 0% | 0 | 0 | 100%
ShipmentCreated.ShipmentId = TransportEnded.ShipmentId | 0% | 0 | 0 | 100%
TransportStarted.TransportId = TransportEnded.TransportId | 0% | 0 | 0 | 100%
TransportEnded.TransportId = TransportStarted.TransportId | 0% | 0 | 0 | 100%
Weights: SetDiff 60%, AvgAttrLength 20%, LevenshteinDistance 20%. The weights are adjustable.
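One way to combine the three measures into a confidence score, using the slide's 60/20/20 weights. The slides do not say how the distances are normalised, so this sketch assumes SetDiff is already a fraction in [0, 1], the length difference is divided by the larger average length, and the Levenshtein distance by the longer attribute name, each capped at 1. `normalized` and `confidence` are hypothetical names.

```python
def normalized(distance, scale):
    """Map a non-negative distance to [0, 1]; assumption: divide by a scale and cap at 1."""
    return min(distance / scale, 1.0) if scale > 0 else 0.0


def confidence(set_diff, avg_len_diff, max_avg_len, lev_distance, max_name_len,
               weights=(0.6, 0.2, 0.2)):
    """Weighted similarity score in [0, 1]; weights follow the slide's 60/20/20 split."""
    w_set, w_len, w_lev = weights
    return (w_set * (1.0 - set_diff)
            + w_len * (1.0 - normalized(avg_len_diff, max_avg_len))
            + w_lev * (1.0 - normalized(lev_distance, max_name_len)))


# OrderReceived.OrderId = ShipmentCreated.OrderId: every distance is 0.
print(f"{confidence(0.0, 0.0, 6.0, 0, len('OrderId')):.0%}")  # prints: 100%
```

With all distances at zero, every term contributes its full weight, which reproduces the 100% confidence shown in the table.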
Correlation Discovery
Correlation Discovery Refinement
Evaluation, Conclusion and Future Work
Precision: 99.56%, where Precision = NoOfRelevantCorrelationRules / (NoOfRelevantCorrelationRules + FalsePositives) × 100.
False-positive example: a correlation by "orderVolume", whose values always have a similar size and a minimum length.
THANK YOU! Questions?

Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 

Automated Correlation Discovery for Semi-Structured Business Processes

  • 1. Automated Correlation Discovery for Semi-Structured Business Processes DEBS 2011 Szabolcs Rozsnyai, Aleksander Slominski, Geetika T. Lakshmanan
  • 6. Big Picture and Context
  • 15. Example – Index: The attribute cardinality (i.e., the Index) is a map from each value of the attribute to the number of times that value occurred.
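The Index can be sketched in Python as a `collections.Counter` over one attribute's values. This is a minimal illustration, assuming each event instance is a dictionary mapping attribute names to values. The `build_index` helper and the sample instances are hypothetical; only OrderId 166635 and ProductA come from the pre-processing example.

```python
from collections import Counter
from typing import Any, Dict, List

# Hypothetical OrderReceived instances; there is no fixed schema, so attributes may be absent.
order_received: List[Dict[str, Any]] = [
    {"OrderId": "166635", "Product": "ProductA"},
    {"OrderId": "166636", "Product": "ProductB"},
    {"OrderId": "166637", "Product": "ProductA"},
    {"OrderId": "166638"},  # Product is missing in this instance
]


def build_index(instances: List[Dict[str, Any]], attribute: str) -> Counter:
    """Map each non-null value of `attribute` to the number of instances carrying it."""
    return Counter(
        inst[attribute] for inst in instances if inst.get(attribute) is not None
    )


product_index = build_index(order_received, "Product")
print(dict(product_index))  # {'ProductA': 2, 'ProductB': 1}
```

Instances without the attribute are skipped rather than counted as a null value. This choice keeps the Index consistent with the Cnt statistic described below.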
  • 16. Example – Card: Card is the number of distinct values of the attribute (4 unique values in the example).
  • 17. Example – Cnt: Cnt is the total number of instances in which the attribute occurs (Cnt = 5 in the example). Because the data structure does not follow a defined schema, an attribute need not occur in every instance. For certain attributes the number may therefore be smaller, since their values can be null or missing.
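Card and Cnt differ only in whether repeated values are counted once or every time. Here is a minimal Python sketch that treats both a missing key and an explicit `None` as "does not occur". That treatment, the helper names `cnt` and `card`, and the sample ShipmentCreated instances are assumptions for illustration.

```python
from typing import Any, Dict, List

# Hypothetical ShipmentCreated instances: one lacks OrderId, one carries OrderId = None.
shipment_created: List[Dict[str, Any]] = [
    {"ShipmentId": "253355", "OrderId": "166635"},
    {"ShipmentId": "253356", "OrderId": "166636"},
    {"ShipmentId": "253357", "OrderId": "166636"},
    {"ShipmentId": "253358"},
    {"ShipmentId": "253359", "OrderId": None},
]


def cnt(instances: List[Dict[str, Any]], attribute: str) -> int:
    """Cnt: number of instances in which the attribute occurs with a non-null value."""
    return sum(1 for inst in instances if inst.get(attribute) is not None)


def card(instances: List[Dict[str, Any]], attribute: str) -> int:
    """Card: number of distinct non-null values of the attribute."""
    return len({inst[attribute] for inst in instances if inst.get(attribute) is not None})


print(cnt(shipment_created, "ShipmentId"), card(shipment_created, "ShipmentId"))  # 5 5
print(cnt(shipment_created, "OrderId"), card(shipment_created, "OrderId"))  # 3 2
```

An attribute whose Card equals its Cnt never repeats a value. ShipmentId behaves this way here, which fits the identifier-like behavior the later steps look for.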
  • 18. Example – AvgAttributeLength: AvgAttributeLength is the average length of the values of the current attribute. It indicates how likely the values are to be unique: a long value may be a sign that the attribute is a unique identifier. Unique identifiers such as OrderId are potential attributes that also occur in other types and thus form a correlation. The indicator can also be misleading, since a textual description may be very long and unique yet never be used to correlate artefacts.
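A minimal Python sketch of AvgAttributeLength follows. It assumes values are measured as strings and that null or missing values are ignored. The helper name and the hypothetical `Status` attribute are illustrative only.

```python
from typing import Any, Dict, List


def avg_attribute_length(instances: List[Dict[str, Any]], attribute: str) -> float:
    """Average string length of the attribute's non-null values (0.0 if it never occurs)."""
    lengths = [
        len(str(inst[attribute])) for inst in instances if inst.get(attribute) is not None
    ]
    return sum(lengths) / len(lengths) if lengths else 0.0


# Hypothetical instances: an identifier-like attribute and a short free-text status.
events = [
    {"OrderId": "166635", "Status": "ok"},
    {"OrderId": "166636", "Status": "delayed"},
]
print(avg_attribute_length(events, "OrderId"))  # 6.0
print(avg_attribute_length(events, "Status"))  # 4.5
```

As the slide cautions, this number is only a hint. A long free-text description would score higher than OrderId without being a correlation key.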
  • 19. Example – InferencedType: InferencedType determines the data type of an attribute. The type is an important characteristic for correlation discovery because it reduces the problem space of correlation candidates: for example, a numeric attribute is very unlikely to correlate with an attribute whose values are mostly alphanumeric. The type is determined with a fault tolerance of 0.9 (e.g., at least 90% of the values must be numeric for the attribute to be typed as numeric); we refer to this parameter as $\Phi$.
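Type inference with the tolerance $\Phi = 0.9$ can be sketched as a share-of-values test. The slides do not list the full set of types, so this simplified sketch distinguishes only `numeric` and `string`. The helper names and type labels are assumptions, and "numeric" is approximated as "parses with Python's `float()`".

```python
from typing import Any, List

PHI = 0.9  # fault tolerance from the slide: at least 90% of values must match the type


def _is_numeric(value: str) -> bool:
    """Approximate 'numeric' as 'parses as a Python float'."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def infer_type(values: List[Any], phi: float = PHI) -> str:
    """Return 'numeric' if at least phi of the non-null values are numeric, else 'string'."""
    present = [str(v) for v in values if v is not None]
    if not present:
        return "unknown"
    numeric_share = sum(_is_numeric(v) for v in present) / len(present)
    return "numeric" if numeric_share >= phi else "string"


ids = ["166635"] * 9 + ["N/A"]  # 9 of 10 values are numeric, exactly the tolerance
print(infer_type(ids))  # numeric
print(infer_type(["ProductA", "ProductB", "42"]))  # string
```

The tolerance lets a few dirty values, such as a placeholder "N/A", through without changing the inferred type.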
  • 20. Example – The rest of the types…
  • 21. Example – The rest of the types…
  • 26. Example – Determining Mappables: A Mappable attribute can be seen as a means to reduce the search space of potentially correlating attributes of a type. One approach is to set an upper threshold on how often a value of an attribute may occur. The assumption is that a value occurring more than that many times is unlikely to be a correlation candidate. With $x_i$ denoting the cardinality of a value of attribute $i$ of a type, the mappable values are $\{\, x_i \mid x_i < \Gamma \,\}$, i.e., $\text{Card} < \Gamma$, where $\Gamma = 10$. For instance, in this domain a shipment is unlikely to have more than 10 orders. However, this threshold may cause problems in other domains or for certain relationships (a single customer can certainly have more than 10 orders).
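The $\Gamma$ filter can be applied to the Index maps built earlier. This Python sketch reads the rule as "an attribute is mappable if none of its values occurs $\Gamma$ or more times". That reading, the helper names, and the hypothetical `Carrier` attribute are assumptions rather than the authors' implementation.

```python
from collections import Counter
from typing import Dict, List

GAMMA = 10  # threshold from the slide example


def is_mappable(index: Counter, gamma: int = GAMMA) -> bool:
    """Treat an attribute as mappable if none of its values occurs gamma or more times."""
    return all(count < gamma for count in index.values())


def mappable_attributes(indexes: Dict[str, Counter], gamma: int = GAMMA) -> List[str]:
    """Keep only the attributes that pass the gamma threshold."""
    return [attr for attr, index in indexes.items() if is_mappable(index, gamma)]


# Hypothetical per-attribute Index maps (value -> occurrence count).
indexes = {
    "ShipmentCreated.ShipmentId": Counter({"253355": 1, "253356": 1}),
    "ShipmentCreated.OrderId": Counter({"166635": 3, "166636": 1}),
    "ShipmentCreated.Carrier": Counter({"CarrierX": 42}),
}
print(mappable_attributes(indexes))
# ['ShipmentCreated.ShipmentId', 'ShipmentCreated.OrderId']
```

Raising `gamma` is the knob the slide's caveat points to: a customer-to-order relationship would need a much larger value.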
  • 29. Example – DifferenceSet for all Permutations: the set difference is computed for every permutation of attribute pairs,
    $A \setminus B = \{\, x \mid x \in A \land x \notin B \,\}$, and a pair is kept when $|A \setminus B| \le \text{DiffThreshold}$, with DiffThreshold = 0.95 (SetDiff is shown as a percentage).
    Pair | SetDiff
    OrderReceived.OrderId = ShipmentCreated.ShipmentId | 100%
    OrderReceived.OrderId = ShipmentCreated.OrderId | 0%
    OrderReceived.OrderId = TransportStarted.TransportId | 100%
    OrderReceived.OrderId = TransportStarted.ShipmentId | 100%
    … | …
    Resulting candidate correlation pairs, whose values overlap 100%:
    Pair | SetDiff
    OrderReceived.OrderId = ShipmentCreated.OrderId | 0%
    ShipmentCreated.ShipmentId = TransportStarted.ShipmentId | 0%
    ShipmentCreated.ShipmentId = TransportEnded.ShipmentId | 0%
    TransportStarted.TransportId = TransportEnded.TransportId | 0%
    TransportEnded.TransportId = TransportStarted.TransportId | 0%
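The DifferenceSet step maps naturally onto Python sets and `itertools.permutations`. This is a sketch under stated assumptions. SetDiff is normalized as $|A \setminus B| / |A|$ so that it matches the percentages above. Attributes of the same event type are not compared with each other. Keys follow a hypothetical `EventType.Attribute` naming.

```python
from itertools import permutations
from typing import Dict, List, Set, Tuple

DIFF_THRESHOLD = 0.95  # value used in the slide example


def set_diff(a: Set[str], b: Set[str]) -> float:
    """Share of A's values that do not occur in B (0.0 means A is contained in B)."""
    if not a:
        return 1.0  # no values, so no evidence of overlap
    return len(a - b) / len(a)


def candidate_pairs(
    values: Dict[str, Set[str]], threshold: float = DIFF_THRESHOLD
) -> List[Tuple[str, str, float]]:
    """Compare every ordered pair of attributes that belong to different event types."""
    pairs = []
    for left, right in permutations(values, 2):
        if left.split(".")[0] == right.split(".")[0]:
            continue  # assumption: attributes of the same type are not compared
        diff = set_diff(values[left], values[right])
        if diff <= threshold:
            pairs.append((left, right, diff))
    return pairs


# Hypothetical value sets keyed by "EventType.Attribute".
values = {
    "OrderReceived.OrderId": {"166635", "166636"},
    "ShipmentCreated.ShipmentId": {"253355", "253356"},
    "ShipmentCreated.OrderId": {"166635", "166636"},
}
for pair in candidate_pairs(values):
    print(pair)
```

Using ordered permutations rather than combinations matters because the set difference is asymmetric. It is also why both TransportStarted.TransportId = TransportEnded.TransportId and its reverse appear in the slide's candidate list.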
  • 32. Example – AvgAttributeLength: the AvgAttrLength measure for each candidate pair.
    Pair | SetDiff | AvgAttrLength
    OrderReceived.OrderId = ShipmentCreated.OrderId | 0% | 0
    ShipmentCreated.ShipmentId = TransportStarted.ShipmentId | 0% | 0
    ShipmentCreated.ShipmentId = TransportEnded.ShipmentId | 0% | 0
    TransportStarted.TransportId = TransportEnded.TransportId | 0% | 0
    TransportEnded.TransportId = TransportStarted.TransportId | 0% | 0
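The slide does not define how AvgAttrLength is computed for a pair. Because every candidate pair scores 0, this Python sketch assumes it is the absolute difference between the two attributes' average value lengths. That interpretation, the helper names, and the sample value sets are hypothetical.

```python
from typing import Iterable


def avg_length(values: Iterable[str]) -> float:
    """Average length of a collection of string values (0.0 when empty)."""
    items = list(values)
    return sum(len(v) for v in items) / len(items) if items else 0.0


def avg_length_difference(a: Iterable[str], b: Iterable[str]) -> float:
    """Absolute difference of the two attributes' average value lengths."""
    return abs(avg_length(a) - avg_length(b))


# Hypothetical value sets for the two sides of a candidate pair.
order_ids_received = {"166635", "166636"}
order_ids_shipped = {"166635", "166636"}
print(avg_length_difference(order_ids_received, order_ids_shipped))  # 0.0
print(avg_length_difference(order_ids_received, {"A1", "B2"}))  # 4.0
```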
  • 34. Example – LevenshteinDistance: the LevenshteinDistance measure for each candidate pair.
    Pair | SetDiff | AvgAttrLength | LevenshteinDistance
    OrderReceived.OrderId = ShipmentCreated.OrderId | 0% | 0 | 0
    ShipmentCreated.ShipmentId = TransportStarted.ShipmentId | 0% | 0 | 0
    ShipmentCreated.ShipmentId = TransportEnded.ShipmentId | 0% | 0 | 0
    TransportStarted.TransportId = TransportEnded.TransportId | 0% | 0 | 0
    TransportEnded.TransportId = TransportStarted.TransportId | 0% | 0 | 0
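The slide does not say which strings are compared. Because every listed pair shares the same attribute name and scores 0, this sketch assumes the distance is computed between attribute names. It is the standard dynamic-programming Levenshtein distance, keeping only the previous row of the table.

```python
def levenshtein(s: str, t: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


print(levenshtein("OrderId", "OrderId"))  # 0
print(levenshtein("OrderId", "ShipmentId"))  # 7
```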
  • 35. Example – Weight Calculation: the measures are weighted (SetDiff 60%, AvgAttrLength 20%, LevenshteinDistance 20%) and combined into a confidence value. The weights are adjustable.
    Pair | SetDiff | AvgAttrLength | LevenshteinDistance | Confidence
    OrderReceived.OrderId = ShipmentCreated.OrderId | 0% | 0 | 0 | 100%
    ShipmentCreated.ShipmentId = TransportStarted.ShipmentId | 0% | 0 | 0 | 100%
    ShipmentCreated.ShipmentId = TransportEnded.ShipmentId | 0% | 0 | 0 | 100%
    TransportStarted.TransportId = TransportEnded.TransportId | 0% | 0 | 0 | 100%
    TransportEnded.TransportId = TransportStarted.TransportId | 0% | 0 | 0 | 100%
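The slides give the weights but not the combining formula. This Python sketch assumes confidence is 1 minus the weighted sum of the three measures, each already normalized to [0, 1]. Under that assumption, a pair scoring 0 on every measure gets the 100% confidence shown in the table. The `confidence` function and the normalization are hypothetical.

```python
import math
from typing import Dict

# Weights from the slide; the slide notes that they are adjustable.
WEIGHTS: Dict[str, float] = {"set_diff": 0.6, "avg_attr_length": 0.2, "levenshtein": 0.2}


def confidence(set_diff: float, avg_len_diff: float, lev_dist: float,
               weights: Dict[str, float] = WEIGHTS) -> float:
    """Hypothetical score: 1 minus the weighted sum of measures normalized to [0, 1]."""
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights should sum to 1")
    penalty = (
        weights["set_diff"] * set_diff
        + weights["avg_attr_length"] * avg_len_diff
        + weights["levenshtein"] * lev_dist
    )
    return 1.0 - penalty


print(confidence(0.0, 0.0, 0.0))  # 1.0, matching the 100% confidences on the slide
print(round(confidence(0.0, 0.0, 0.5), 2))  # 0.9
```

Passing a different `weights` dictionary is how the adjustable weighting would be exercised. For example, you could lower the name-similarity weight for domains where correlated attributes are named differently.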