Microsoft Sequence Clustering and Association Rules
OVERVIEW
- Introduction
- DMX Queries
- Interpreting the sequence clustering model
- Microsoft Sequence Clustering Algorithm Principles and Parameters
- Markov chain model
- Introduction to Microsoft Association Rules
- Association Algorithm Principles and Parameters
Microsoft Sequence Clustering and Association Rules
The Microsoft Sequence Clustering algorithm is a sequence analysis algorithm provided by Microsoft SQL Server Analysis Services. The algorithm finds the most common sequences by grouping, or clustering, sequences that are identical. Examples:
- Data that describes the click paths that are created when users navigate or browse a Web site.
- Data that describes the order in which a customer adds items to a shopping cart at an online retailer.
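To make the idea of "finding the most common sequences" concrete, the following minimal Python sketch counts identical click paths. It is not the Analysis Services algorithm, only an illustration of grouping identical sequences; the page names and the helper `most_common_sequences` are hypothetical.

```python
from collections import Counter

# Hypothetical click paths: each tuple is one user's ordered page visits.
click_paths = [
    ("home", "search", "product", "cart"),
    ("home", "product", "cart"),
    ("home", "search", "product", "cart"),
    ("home", "about"),
    ("home", "product", "cart"),
    ("home", "search", "product", "cart"),
]


def most_common_sequences(paths, top_n=2):
    """Group identical sequences and return the top_n most frequent ones."""
    counts = Counter(tuple(path) for path in paths)
    return counts.most_common(top_n)


top_sequences = most_common_sequences(click_paths)
for sequence, count in top_sequences:
    print(" -> ".join(sequence), count)
```

Order matters here: two paths that visit the same pages in a different order are counted as different sequences.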
DMX Queries
By querying the data mining schema rowset, you can find various kinds of information about the model, such as:
- Basic metadata
- The date and time that the model was created and last processed
- The name of the mining structure that the model is based on
- The column used as the predictable attribute
DMX Queries
The following query returns the parameters that were used to build and train the sample model:
    SELECT MINING_PARAMETERS
    FROM $system.DMSCHEMA_MINING_MODELS
    WHERE MODEL_NAME = 'Sequence Clustering'
DMX Queries: Getting a List of Sequences for a State
The following query returns the complete list of first states in the model, before the sequences are grouped into clusters. It returns the list of sequences (NODE_TYPE = 13) that have the model root node as parent (PARENT_UNIQUE_NAME = 0). The FLATTENED keyword makes the results easier to read.
    SELECT FLATTENED NODE_UNIQUE_NAME,
        (SELECT ATTRIBUTE_VALUE AS [Product 1],
                [Support] AS [Sequence Support],
                [Probability] AS [Sequence Probability]
         FROM NODE_DISTRIBUTION) AS t
    FROM [Sequence Clustering].CONTENT
    WHERE NODE_TYPE = 13
    AND [PARENT_UNIQUE_NAME] = 0
A sample result of this query is shown in the next figure.
DMX Queries
Reference the value returned for NODE_UNIQUE_NAME to get the ID of the node that contains all sequences for the model. Pass this value to the query as the ID of the parent node to get only the transitions included in this node, which happens to contain a list of all sequences for the model.
Interpreting the sequence clustering model
A sequence clustering model has a single parent node that represents the model and its metadata. The parent node, which is labeled (All), has a related sequence node that lists all the transitions that were detected in the training data. The algorithm also creates a number of clusters, based on the transitions that were found in the data and any other input attributes included when creating the model. Each cluster contains its own sequence node that lists only the transitions that were used in generating that specific cluster.
Microsoft Sequence Clustering Algorithm Principles
The Microsoft Sequence Clustering algorithm is a hybrid algorithm that combines clustering techniques with Markov chain analysis to identify clusters and their sequences. It works on sequence data, which typically represents a series of events or transitions between states in a dataset. The algorithm examines all transition probabilities and measures the differences, or distances, between all the possible sequences in the dataset to determine which sequences are the best to use as inputs for clustering. After the algorithm has created the list of candidate sequences, it uses the sequence information as an input for the EM (Expectation Maximization) method of clustering.
Markov chain model
A Markov chain also contains a matrix of transition probabilities. The transitions emanating from a given state define a distribution over the possible next states. The equation $P(x_i = G \mid x_{i-1} = A) = 0.15$ means that, given the current state A, the probability of the next state being G is 0.15.
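As an illustration of how such a transition matrix could be estimated, the sketch below counts state-to-state transitions in a few training sequences and normalizes each row so that it sums to 1. The sequences and the helper `transition_probabilities` are hypothetical, and this plain frequency estimate is an assumption, not a description of how Analysis Services computes its probabilities.

```python
from collections import defaultdict


def transition_probabilities(sequences):
    """Estimate P(next state | current state) from observed transitions."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        # Pair every state with its successor: (x_{i-1}, x_i).
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1

    probs = {}
    for prev, next_counts in counts.items():
        total = sum(next_counts.values())
        # Each row is a distribution over the possible next states.
        probs[prev] = {nxt: n / total for nxt, n in next_counts.items()}
    return probs


# Hypothetical training sequences over the states A, C, G, T.
sequences = ["ACGT", "AGGT", "ACCA", "AGTA"]
probs = transition_probabilities(sequences)
for state in sorted(probs):
    print(state, probs[state])
```

Each entry `probs[a][g]` plays the role of $P(x_i = g \mid x_{i-1} = a)$ in the slide's notation.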
Markov chain model
Based on the Markov chain, for any sequence $x = \{x_1, x_2, x_3, \ldots, x_L\}$ of length $L$, you can calculate the probability of the sequence as follows:
$$P(x) = P(x_L, x_{L-1}, \ldots, x_1) = P(x_L \mid x_{L-1}, \ldots, x_1)\, P(x_{L-1} \mid x_{L-2}, \ldots, x_1) \cdots P(x_1)$$
In a first-order Markov chain, the probability of each state $x_i$ depends only on the previous state $x_{i-1}$:
$$P(x) = P(x_L, x_{L-1}, \ldots, x_1) = P(x_L \mid x_{L-1})\, P(x_{L-1} \mid x_{L-2}) \cdots P(x_2 \mid x_1)\, P(x_1)$$
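The first-order formula can be evaluated directly by multiplying the probability of the first state by one transition probability per step. The sketch below does this for a two-state chain. Only the value 0.15 for the transition from A to G comes from the slides; the other probabilities, the initial distribution, and the function name `sequence_probability` are assumptions made for illustration.

```python
def sequence_probability(seq, initial, transitions):
    """Return P(x) = P(x1) * P(x2|x1) * ... * P(xL|xL-1) for a first-order chain."""
    if not seq:
        raise ValueError("sequence must contain at least one state")
    prob = initial.get(seq[0], 0.0)
    for prev, nxt in zip(seq, seq[1:]):
        # A transition that was never observed contributes probability 0.
        prob *= transitions.get(prev, {}).get(nxt, 0.0)
    return prob


# Assumed initial distribution P(x1) and transition matrix.
initial = {"A": 0.6, "G": 0.4}
transitions = {
    "A": {"A": 0.85, "G": 0.15},  # P(G | A) = 0.15, as in the slide
    "G": {"A": 0.30, "G": 0.70},
}

p = sequence_probability("AGG", initial, transitions)
print(p)  # 0.6 * 0.15 * 0.70
```

For long sequences, summing log-probabilities instead of multiplying raw probabilities avoids numeric underflow.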
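Microsoft Sequence Clustering Parameters
- CLUSTER_COUNT specifies the approximate number of clusters to be built by the algorithm. Setting the CLUSTER_COUNT parameter to 0 causes the algorithm to use heuristics to best determine the number of clusters to build. The default is 10.
- MAXIMUM_STATES specifies the maximum number of states for a non-sequence attribute that the algorithm supports. The default is 100.
- MINIMUM_SUPPORT specifies the minimum number of cases required in support of an attribute to create a cluster. The default is 10.
- MAXIMUM_SEQUENCE_STATES specifies the maximum number of states that a sequence can have. The default is 64.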
Introduction to Microsoft Association Rules
The Microsoft Association Rules Viewer in Microsoft SQL Server Analysis Services displays mining models that are built with the Microsoft Association algorithm. The Microsoft Association algorithm is an association algorithm provided by Analysis Services that is useful for recommendation engines. A recommendation engine recommends products to customers based on items they have already bought, or in which they have indicated an interest. The Microsoft Association algorithm is also useful for market basket analysis.
Structure of an Association Model
The top level has a single node (Model Root) that represents the model. The second level contains nodes that represent qualified item sets and rules.
Association Algorithm Principles
The Microsoft Association Rules algorithm belongs to the Apriori association family. The two steps in the Microsoft Association Rules algorithm are:
1. Find frequent item sets.
2. Generate association rules based on frequent item sets.
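The two Apriori steps can be sketched in a few lines of Python. This is a textbook-style Apriori written for illustration, not Microsoft's implementation. The baskets, the function names, and the use of an absolute count for the support threshold are assumptions. The `min_probability` argument mirrors the role of the MINIMUM_PROBABILITY parameter.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Step 1: return {itemset: count} for itemsets seen in >= min_support baskets."""
    baskets = [frozenset(t) for t in transactions]
    current = {frozenset([item]) for basket in baskets for item in basket}
    frequent = {}
    k = 1
    while current:
        counts = {c: sum(1 for b in baskets if c <= b) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join frequent (k-1)-itemsets into k-item candidates, then prune any
        # candidate that has an infrequent (k-1)-subset (the Apriori property).
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def association_rules(frequent, min_probability):
    """Step 2: derive rules lhs -> rhs with P(rhs | lhs) >= min_probability."""
    rules = []
    for itemset, count in frequent.items():
        for size in range(1, len(itemset)):
            for lhs_items in combinations(sorted(itemset), size):
                lhs = frozenset(lhs_items)
                probability = count / frequent[lhs]
                if probability >= min_probability:
                    rules.append((lhs, itemset - lhs, probability))
    return rules


# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
frequent = frequent_itemsets(baskets, min_support=3)
rules = association_rules(frequent, min_probability=0.4)
for lhs, rhs, probability in rules:
    print(sorted(lhs), "->", sorted(rhs), probability)
```

With these baskets only one two-item set, bread and milk, reaches the support threshold, so it is the only source of rules.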
Association Algorithm Parameters
- MINIMUM_PROBABILITY is a threshold parameter. It defines the minimum probability for an association rule. Its value is within the range of 0 to 1. The default value is 0.4.
- MINIMUM_IMPORTANCE is a threshold parameter for association rules. Rules with importance less than MINIMUM_IMPORTANCE are filtered out.
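To show what these two thresholds are compared against, the sketch below scores a single rule. The probability is the conditional probability P(rhs | lhs). For importance, the sketch assumes a log-ratio definition, log10(P(rhs | lhs) / P(rhs | not lhs)). The slides do not define importance, so treat that formula, the base-10 logarithm, the baskets, and the function name `rule_scores` as assumptions.

```python
import math


def rule_scores(transactions, lhs, rhs):
    """Return (probability, importance) for the rule lhs -> rhs."""
    lhs, rhs = set(lhs), set(rhs)
    with_lhs = [t for t in transactions if lhs <= set(t)]
    without_lhs = [t for t in transactions if not lhs <= set(t)]
    if not with_lhs or not without_lhs:
        raise ValueError("need baskets both with and without the left-hand side")

    p_given_lhs = sum(1 for t in with_lhs if rhs <= set(t)) / len(with_lhs)
    p_given_not_lhs = sum(1 for t in without_lhs if rhs <= set(t)) / len(without_lhs)
    if p_given_lhs == 0 or p_given_not_lhs == 0:
        raise ValueError("log-ratio is undefined when a probability is zero")

    # Assumed definition: positive when lhs makes rhs more likely.
    importance = math.log10(p_given_lhs / p_given_not_lhs)
    return p_given_lhs, importance


# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]
probability, importance = rule_scores(baskets, {"beer"}, {"diapers"})
print(probability, importance)
```

A rule would be kept only if its probability is at least MINIMUM_PROBABILITY and its importance is at least MINIMUM_IMPORTANCE.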
Association Algorithm Parameters
- MAXIMUM_ITEMSET_SIZE specifies the maximum size of an itemset. The default value is 0, which means that there is no size limit on the itemset.
- MINIMUM_ITEMSET_SIZE specifies the minimum size of the itemset. The default value is 0.
- MAXIMUM_ITEMSET_COUNT defines the maximum number of item sets.
Association Algorithm Parameters
- OPTIMIZED_PREDICTION_COUNT defines the number of items to be cached to optimize predictions.
- AUTODETECT_MINIMUM_SUPPORT represents the sensitivity of the algorithm used to autodetect minimum support. To automatically detect the smallest appropriate value of minimum support, set this value to 1.0. To turn off autodetection, set this value to 0.
Summary
- Introduction to sequence clustering
- DMX Queries
- The sequence clustering model
- Microsoft Sequence Clustering Algorithm Principles and Parameters
- Markov chain model
- Introduction to Microsoft Association Rules
- Association Algorithm Principles and Parameters
Visit more self help tutorials Pick a tutorial of your choice and browse through it at your own pace. The tutorials section is free, self-guiding and will not involve any additional support. Visit us at www.dataminingtools.net

MS SQL SERVER: Microsoft sequence clustering and association rules

  • 2. OVERVIEW Introduction DMX Queries Interpreting the sequence clustering model Microsoft Sequence Clustering Algorithm Principles and Parameters Markov chain model Introduction to Microsoft Association Rules Association Algorithm Principles and Parameters
  • 3. Microsoft Sequence ClusteringAnd Association Rules The Microsoft Sequence Clustering algorithm is a sequence analysis algorithm provided by Microsoft SQL Server Analysis Services. The algorithm finds the most common sequences by grouping, or clustering, sequences that are identical. Ex : Data that describes the click paths that are created when users navigate or browse a Web site. Data that describes the order in which a customer adds items to a shopping cart at an online retailer.
  • 4. DMX Queries By querying the data mining schema rowset, you can find various kinds of information about the model, such as: basic metadata; the date and time that the model was created and last processed; the name of the mining structure that the model is based on; and the column used as the predictable attribute.
  • 5. DMX Queries SELECT MINING_PARAMETERS FROM $system.DMSCHEMA_MINING_MODELS WHERE MODEL_NAME = 'Sequence Clustering' This query returns the parameters that were used to build and train the sample model.
  • 6. DMX Queries: Getting a List of Sequences for a State SELECT FLATTENED NODE_UNIQUE_NAME, (SELECT ATTRIBUTE_VALUE AS [Product 1], [Support] AS [Sequence Support], [Probability] AS [Sequence Probability] FROM NODE_DISTRIBUTION) AS t FROM [Sequence Clustering].CONTENT WHERE NODE_TYPE = 13 AND [PARENT_UNIQUE_NAME] = 0 This query returns the complete list of first states in the model, before the sequences are grouped into clusters. It does so by returning the list of sequences (NODE_TYPE = 13) that have the model root node as parent (PARENT_UNIQUE_NAME = 0). The FLATTENED keyword makes the results easier to read. A sample result of this query is shown in the next figure.
  • 7. DMX Queries You reference the value returned for NODE_UNIQUE_NAME to get the ID of the node that contains all sequences for the model. You pass this value to the query as the ID of the parent node to get only the transitions included in this node, which happens to contain a list of all sequences for the model.
  • 8. Interpreting the sequence clustering model A sequence clustering model has a single parent node that represents the model and its metadata. The parent node, which is labeled (All), has a related sequence node that lists all the transitions that were detected in the training data. The algorithm also creates a number of clusters, based on the transitions that were found in the data and any other input attributes included when creating the model. Each cluster contains its own sequence node that lists only the transitions that were used in generating that specific cluster.
  • 9. Interpreting the sequence clustering model
  • 10. Microsoft Sequence Clustering Algorithm Principles The Microsoft Sequence Clustering algorithm is a hybrid algorithm that combines clustering techniques with Markov chain analysis to identify clusters and their sequences. Its input is sequence data, which typically represents a series of events or transitions between states in a dataset. The algorithm examines all transition probabilities and measures the differences, or distances, between all the possible sequences in the dataset to determine which sequences are the best to use as inputs for clustering. After the algorithm has created the list of candidate sequences, it uses the sequence information as an input for the Expectation Maximization (EM) method of clustering.
  • 11. Markov chain model A Markov chain contains a set of states and a matrix of transition probabilities. The transitions emanating from a given state define a distribution over the possible next states. The equation $P(x_i = G \mid x_{i-1} = A) = 0.15$ means that, given the current state A, the probability of the next state being G is 0.15.
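As a rough illustration of where such a transition matrix could come from, the sketch below estimates first-order transition probabilities by counting adjacent pairs of states. The function name `estimate_transitions` and the toy sequences are hypothetical and are not taken from the slides or from Analysis Services; this is a minimal counting sketch, not the algorithm's actual implementation.

```python
from collections import Counter, defaultdict


def estimate_transitions(sequences):
    """Estimate first-order transition probabilities from observed sequences.

    Returns a nested dict: matrix[prev][nxt] = P(next = nxt | current = prev).
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        # Count every adjacent (current state, next state) pair.
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1

    matrix = {}
    for prev, nxt_counts in counts.items():
        total = sum(nxt_counts.values())
        # Normalize each row so it forms a distribution over next states.
        matrix[prev] = {nxt: n / total for nxt, n in nxt_counts.items()}
    return matrix


# Hypothetical toy data: each string is one sequence of single-letter states.
sequences = ["ACGA", "AGCA", "ACCA", "AACG"]
transitions = estimate_transitions(sequences)
print(transitions["A"])
```

Each row of the resulting matrix sums to 1, which is what "a distribution over the possible next states" means in practice.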
  • 12. Markov chain model Based on the Markov chain, for any sequence $x = \{x_1, x_2, x_3, \ldots, x_L\}$ of length $L$, you can calculate the probability of the sequence as follows: $P(x) = P(x_L, x_{L-1}, \ldots, x_1) = P(x_L \mid x_{L-1}, \ldots, x_1)\, P(x_{L-1} \mid x_{L-2}, \ldots, x_1) \cdots P(x_1)$. In a first-order Markov chain, the probability of each state $x_i$ depends only on the previous state $x_{i-1}$: $P(x) = P(x_L, x_{L-1}, \ldots, x_1) = P(x_L \mid x_{L-1})\, P(x_{L-1} \mid x_{L-2}) \cdots P(x_2 \mid x_1)\, P(x_1)$
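The first-order formula can be sketched in a few lines of Python. Only the value P(G|A) = 0.15 comes from the slides; the initial distribution, the remaining transition probabilities, and the function name `sequence_probability` are assumptions made up for illustration.

```python
def sequence_probability(seq, initial, transitions):
    """Probability of a sequence under a first-order Markov chain.

    Computes P(x1) * P(x2|x1) * ... * P(xL|xL-1).
    Unseen states or transitions are treated as probability 0.
    """
    if not seq:
        raise ValueError("sequence must contain at least one state")
    prob = initial.get(seq[0], 0.0)  # P(x1)
    for prev, nxt in zip(seq, seq[1:]):
        prob *= transitions.get(prev, {}).get(nxt, 0.0)  # P(x_i | x_{i-1})
    return prob


# Assumed initial distribution and transition matrix (hypothetical values,
# except P(G|A) = 0.15, which is the example used on the slide).
initial = {"A": 0.5, "G": 0.5}
transitions = {
    "A": {"A": 0.85, "G": 0.15},
    "G": {"A": 0.40, "G": 0.60},
}

p = sequence_probability("AGG", initial, transitions)
print(p)  # P(A) * P(G|A) * P(G|G)
```

For the sequence A, G, G this multiplies P(A) by P(G|A) and then by P(G|G), matching the product form of the equation term by term.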
  • 13.
  • 14.
  • 15. Introduction to Microsoft Association Rules The Microsoft Association Rules Viewer in Microsoft SQL Server Analysis Services displays mining models that are built with the Microsoft Association algorithm. The Microsoft Association algorithm is an association algorithm provided by Analysis Services that is useful for recommendation engines. A recommendation engine recommends products to customers based on items they have already bought, or in which they have indicated an interest. The Microsoft Association algorithm is also useful for market basket analysis.
  • 16. Structure of an Association Model The top level has a single node (Model Root) that represents the model. The second level contains nodes that represent qualified item sets and rules.
  • 17.
  • 18.
  • 19. Association Algorithm Parameters MINIMUM_PROBABILITY is a threshold parameter. It defines the minimum probability for an association rule. Its value is within the range of 0 to 1. The default value is 0.4. MINIMUM_IMPORTANCE is a threshold parameter for association rules. Rules with importance less than MINIMUM_IMPORTANCE are filtered out.
  • 20. Association Algorithm Parameters MAXIMUM_ITEMSET_SIZE specifies the maximum size of an itemset. The default value is 0, which means that there is no size limit on the itemset. MINIMUM_ITEMSET_SIZE specifies the minimum size of the itemset. The default value is 0. MAXIMUM_ITEMSET_COUNT defines the maximum number of itemsets.
  • 21. Association Algorithm Parameters OPTIMIZED_PREDICTION_COUNT defines the number of items to be cached to optimize predictions. AUTODETECT_MINIMUM_SUPPORT represents the sensitivity of the algorithm used to autodetect minimum support. To automatically detect the smallest appropriate value of minimum support, set this value to 1.0. To turn off autodetection, set this value to 0.
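To make the threshold parameters more concrete, here is a minimal sketch of how the support and probability of a single rule could be computed from a handful of transactions and then compared against a MINIMUM_PROBABILITY-style threshold. The baskets, the item names, and the helper `rule_stats` are hypothetical, and the sketch does not reproduce how Analysis Services actually generates or scores rules.

```python
def rule_stats(transactions, antecedent, consequent):
    """Return (support, probability) for the rule antecedent -> consequent.

    support     = number of transactions containing both sides of the rule
    probability = support / number of transactions containing the antecedent
    """
    antecedent = frozenset(antecedent)
    both = antecedent | frozenset(consequent)
    n_antecedent = sum(1 for t in transactions if antecedent <= t)
    n_both = sum(1 for t in transactions if both <= t)
    probability = n_both / n_antecedent if n_antecedent else 0.0
    return n_both, probability


# Hypothetical market baskets, one set of items per transaction.
baskets = [
    frozenset({"bike", "helmet"}),
    frozenset({"bike", "helmet", "gloves"}),
    frozenset({"bike"}),
    frozenset({"helmet", "gloves"}),
    frozenset({"bike", "gloves"}),
]

MINIMUM_PROBABILITY = 0.4  # default value quoted on the slide

support, probability = rule_stats(baskets, {"bike"}, {"helmet"})
keep = probability >= MINIMUM_PROBABILITY
print(support, probability, keep)
```

In this toy data, four baskets contain a bike and two of those also contain a helmet, so the rule bike -> helmet would pass a 0.4 probability threshold.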
  • 22. Summary: Introduction to sequence clustering; DMX Queries; The sequence clustering model; Microsoft Sequence Clustering Algorithm Principles and Parameters; Markov chain model; Introduction to Microsoft Association Rules; Association Algorithm Principles and Parameters
  • 23. Visit more self help tutorials Pick a tutorial of your choice and browse through it at your own pace. The tutorials section is free, self-guiding and will not involve any additional support. Visit us at www.dataminingtools.net