Data Mining in SQL Server 2008 Ing Eduardo Castro GrupoAsesor en Informática ecastro@grupoasesor.net
Eduardo Castro ecastro@grupoasesor.net MCITP Server Administrator MCTS Windows Server 2008 Active Directory MCTS Windows Server 2008 Network Infrastructure MCTS Windows Server 2008 Applications Infrastructure MCITP Enterprise Support MCTS Windows Vista MCITP Database Developer MCITP Database Administrator MCTS SQL Server MCITP Exchange Server 2007 MCTS Office PerformancePoint Server MCTS Team Foundation Server MCPD Enterprise Application Developer MCTS .Net Framework 2.0: Distributed Applications MCT 2008 International Association of Software Architects Chapter Leader IEEE Communications Society Board of Directors European Datawarehouse Research
Disclaimer: The information contained in this slide deck represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication. This slide deck is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT. Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this slide deck may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation. Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this slide deck. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this slide deck does not give you any license to these patents, trademarks, copyrights, or other intellectual property. Unless otherwise noted, the example companies, organizations, products, domain names, e-mail addresses, logos, people, places and events depicted herein are fictitious, and no association with any real company, organization, product, domain name, email address, logo, person, place or event is intended or should be inferred. © 2008 Microsoft Corporation. All rights reserved. Microsoft, SQL Server, Office System, Visual Studio, SharePoint Server, Office PerformancePoint Server, .NET Framework, ProClarity Desktop Professional are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.
The names of actual companies and products mentioned herein may be the trademarks of their respective owners.
Overview: Introducing Data Mining Office Add-Ins; Understanding Data Mining Structure Improvements; Using the New Time Series Algorithm
Introducing Data Mining Office Add-Ins: Data Preparation Tasks; Tools for Exploration; Tools for Prediction; Model Testing and Validation
Data Preparation Tasks
Tools for Exploration - Table Analysis Tools
Tools for Exploration - Data Modeling Tools
Tools for Exploration – Model Viewers
Cluster Diagram: strength of similarities between clusters
Other viewers: neural network, association rules, time series
Cluster Profiles: drill through to details
Cluster Characteristics: probability of an attribute appearing in a cluster
Cluster Discrimination
Tools for Prediction - Data Modeling Tools
Model Testing and Validation
Accuracy Chart: lift chart comparing actual results to a random guess and to a perfect prediction
Classification Matrix: displays percentages and counts
Profit Chart: input is population, fixed cost, individual cost, and revenue per individual; output is maximum profit and probability threshold
Cross-Validation: more on this later
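The profit chart inputs and outputs listed above can be illustrated with a small Python sketch. It is only one plausible way to get from those inputs to those outputs, not the calculation the add-in performs. The function name `best_profit`, the `(probability, responded)` tuple layout, and the sample numbers are all assumptions made for this example.

```python
def best_profit(scored_cases, population, fixed_cost,
                individual_cost, revenue_per_individual):
    """Return (maximum profit, probability threshold) for a scored test set.

    scored_cases: list of (predicted probability, actually responded) tuples.
    Hypothetical helper: contacts cases in descending probability order and
    scales the test-set counts up to the full population.
    """
    ranked = sorted(scored_cases, key=lambda case: case[0], reverse=True)
    scale = population / len(ranked)
    best = None
    responders = 0
    for contacted, (probability, responded) in enumerate(ranked, start=1):
        if responded:
            responders += 1
        profit = ((revenue_per_individual * responders
                   - individual_cost * contacted) * scale
                  - fixed_cost)
        # Keep the cut-off that gives the highest profit seen so far.
        if best is None or profit > best[0]:
            best = (profit, probability)
    return best


# Made-up scored cases, purely for illustration.
scored = [(0.9, True), (0.8, True), (0.6, False), (0.4, True), (0.2, False)]
max_profit, threshold = best_profit(
    scored, population=5000, fixed_cost=1000,
    individual_cost=5, revenue_per_individual=30)
print(max_profit, threshold)
```

Contacting everyone whose predicted probability is at or above the returned threshold yields the returned profit under these assumptions.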
Demo: Using the Data Mining Excel Add-In
Understanding Data Mining Structure Improvements: Data Partitioning for Training and Testing; Mining Model Column Aliases; Data Mining Filters; Drill Through to Mining Structure Data; Cross-Validation of a Mining Model
Data Partitioning for Training and Testing: specify the holdout as a percentage or a maximum number of cases; the smaller value is used if both parameters are specified; data is divided randomly between training and testing; the HoldoutSeed property enables consistent partitions across structures
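The partitioning rules above can be sketched in plain Python. This is a conceptual illustration, not the Analysis Services implementation. The function `holdout_split` and its parameters `holdout_percent`, `holdout_max_cases`, and `seed` are hypothetical names chosen for this example.

```python
import random


def holdout_split(cases, holdout_percent=None, holdout_max_cases=None, seed=None):
    """Randomly split cases into (training, testing) lists.

    If both a percentage and a maximum case count are given, the smaller
    resulting test-set size is used. A fixed seed gives a repeatable split.
    """
    shuffled = list(cases)
    limits = []
    if holdout_percent is not None:
        limits.append(len(shuffled) * holdout_percent // 100)
    if holdout_max_cases is not None:
        limits.append(holdout_max_cases)
    test_size = min(limits) if limits else 0
    test_size = min(test_size, len(shuffled))
    random.Random(seed).shuffle(shuffled)
    return shuffled[test_size:], shuffled[:test_size]


# 30 percent of 1000 cases is 300, but the 200-case cap is smaller, so it wins.
train, test = holdout_split(range(1000), holdout_percent=30,
                            holdout_max_cases=200, seed=42)
print(len(train), len(test))  # 800 200
```

Calling the function again with the same seed returns the same partitions, which is the role the slide assigns to the HoldoutSeed property.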
Data Partitioning with DMX: create a structure with partitioning by using the HOLDOUT keyword; query the structure to review the partitions
Mining Model Column Aliases: assign a column alias to reuse a column in a structure; column content can be clarified; the column can be more easily referenced in DMX; continuous and discretized versions of the same column can be used in separate models in the same structure
Data Mining Filters: specify a condition to apply to mining structure columns; the filter creates subsets of the training and testing data for a model; multiple conditions can be linked with the AND and OR operators; conditions on continuous values use the >, >=, <, and <= operators; conditions on discrete values use the =, !=, or IS NULL operators; conditions on nested tables can use the EXISTS keyword and a subquery
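To make the filter rules above concrete, here is a small Python sketch that applies the same kinds of conditions to in-memory cases. It mimics the logic only and is not DMX. The column names (`age`, `gender`, `purchases`, `product`) and the sample rows are invented for this example.

```python
# Each case is a dict; "purchases" plays the role of a nested table.
cases = [
    {"id": 1, "age": 34, "gender": "F", "purchases": [{"product": "Helmet"}]},
    {"id": 2, "age": 27, "gender": "F", "purchases": []},
    {"id": 3, "age": 45, "gender": "M", "purchases": [{"product": "Bike"}]},
    {"id": 4, "age": 52, "gender": None, "purchases": [{"product": "Helmet"}]},
]


def model_filter(case):
    """Hypothetical filter combining the three kinds of condition."""
    # Continuous column: comparison operator.
    over_30 = case["age"] > 30
    # Discrete column: equality test OR a null test.
    female_or_unknown = case["gender"] == "F" or case["gender"] is None
    # Nested table: EXISTS-style check over the nested rows.
    bought_helmet = any(row["product"] == "Helmet" for row in case["purchases"])
    # Conditions linked with AND.
    return over_30 and female_or_unknown and bought_helmet


filtered_ids = [case["id"] for case in cases if model_filter(case)]
print(filtered_ids)  # [1, 4]
```

Only the cases that pass the filter would be used to train and test the filtered model; the other cases stay in the structure for other models.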
Data Mining Filters with DMX: add a filtered mining model to a structure
Drill Through to Mining Structure Data: add columns to the mining structure, but not to models; eliminates unnecessary data from the model and improves processing time; supports drill through from the mining model viewer or DMX for visibility into results
Cross-Validation of a Mining Model
Purpose: validate the accuracy of a single model; compare models within the same mining structure
Process: (1) split the mining structure into partitions of equal size; (2) iteratively build models on all partitions excluding one partition, such that every partition is excluded once; (3) measure the accuracy of each model using the excluded partition; (4) analyze the results
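The cross-validation process described above can be sketched in Python as follows. This is a generic k-fold illustration, not the procedure Analysis Services runs. The "model" is a deliberately trivial majority-class predictor, and the names `make_folds`, `train_majority_model`, and `cross_validate` are hypothetical.

```python
import random
from collections import Counter


def make_folds(cases, fold_count, seed=0):
    """Shuffle the cases and deal them into folds of (nearly) equal size."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::fold_count] for i in range(fold_count)]


def train_majority_model(training_cases):
    """Stand-in for a real model: always predict the most common label."""
    counts = Counter(label for _, label in training_cases)
    return counts.most_common(1)[0][0]


def cross_validate(cases, fold_count, seed=0):
    """Return one accuracy value per fold, each measured on the excluded fold."""
    folds = make_folds(cases, fold_count, seed)
    accuracies = []
    for i, held_out in enumerate(folds):
        # Train on every fold except the one being held out.
        training = [c for j, fold in enumerate(folds) if j != i for c in fold]
        predicted = train_majority_model(training)
        correct = sum(1 for _, label in held_out if label == predicted)
        accuracies.append(correct / len(held_out))
    return accuracies


# Made-up data: 70 "yes" cases and 30 "no" cases.
data = [(i, "yes") for i in range(70)] + [(i, "no") for i in range(70, 100)]
accuracies = cross_validate(data, fold_count=5)
print(accuracies)
```

Per-fold accuracies that vary widely suggest the model is sensitive to which cases it was trained on, which is the kind of result the final analysis step looks for.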
Cross-Validation Parameters
Fold Count: number of partitions to use; minimum 2, maximum 256; maximum 10 for a session mining structure
Max Cases: total number of cases to include in cross-validation; cases are divided across folds; a value of 0 specifies all cases
Target Attribute: the predictable column
Target State: target value for the target attribute; a value of null specifies that all states are to be tested
Target Threshold: value between 0 and 1 for the prediction probability above which a predicted state is considered correct; a value of null specifies that the most probable prediction is considered correct
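The parameter rules above can be expressed as a few small Python checks. This sketch reflects one reading of the slide rather than documented server behavior. The helper names `check_fold_count`, `limit_cases`, and `prediction_is_correct` are hypothetical, and taking the first cases when capping is an assumption made for simplicity.

```python
def check_fold_count(fold_count, session_structure=False):
    """Enforce the fold-count limits: 2 to 256, or 2 to 10 for a session structure."""
    upper = 10 if session_structure else 256
    if not 2 <= fold_count <= upper:
        raise ValueError(f"fold count must be between 2 and {upper}")
    return fold_count


def limit_cases(cases, max_cases):
    """A Max Cases value of 0 keeps every case; otherwise cap the total.

    Assumption: the cap simply keeps the first max_cases cases.
    """
    cases = list(cases)
    return cases if max_cases == 0 else cases[:max_cases]


def prediction_is_correct(probabilities, actual_state, target_threshold=None):
    """probabilities maps each state to its predicted probability for one case."""
    if target_threshold is None:
        # No threshold: the most probable state is the prediction.
        most_probable = max(probabilities, key=probabilities.get)
        return most_probable == actual_state
    if not 0 <= target_threshold <= 1:
        raise ValueError("target threshold must be between 0 and 1")
    # With a threshold: the actual state must be predicted above it.
    return probabilities.get(actual_state, 0.0) > target_threshold


probs = {"yes": 0.55, "no": 0.45}
print(prediction_is_correct(probs, "yes"))        # True
print(prediction_is_correct(probs, "yes", 0.6))   # False
print(prediction_is_correct(probs, "yes", 0.5))   # True
```

Raising the threshold makes the accuracy measure stricter, because a prediction that is merely the most probable state may no longer count as correct.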

More Related Content

What's hot

A developer's introduction to big data processing with Azure Databricks
A developer's introduction to big data processing with Azure DatabricksA developer's introduction to big data processing with Azure Databricks
A developer's introduction to big data processing with Azure DatabricksMicrosoft Tech Community
 
Building an Enterprise Data Platform with Azure Databricks to Enable Machine ...
Building an Enterprise Data Platform with Azure Databricks to Enable Machine ...Building an Enterprise Data Platform with Azure Databricks to Enable Machine ...
Building an Enterprise Data Platform with Azure Databricks to Enable Machine ...Databricks
 
Lightning-Fast Analytics for Workday Transactional Data with Pavel Hardak and...
Lightning-Fast Analytics for Workday Transactional Data with Pavel Hardak and...Lightning-Fast Analytics for Workday Transactional Data with Pavel Hardak and...
Lightning-Fast Analytics for Workday Transactional Data with Pavel Hardak and...Databricks
 
Using Redash for SQL Analytics on Databricks
Using Redash for SQL Analytics on DatabricksUsing Redash for SQL Analytics on Databricks
Using Redash for SQL Analytics on DatabricksDatabricks
 
Azure Databricks—Apache Spark as a Service with Sascha Dittmann
Azure Databricks—Apache Spark as a Service with Sascha DittmannAzure Databricks—Apache Spark as a Service with Sascha Dittmann
Azure Databricks—Apache Spark as a Service with Sascha DittmannDatabricks
 
Azure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar PresentationAzure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar PresentationMatthew W. Bowers
 
Bridging the Gap Between Data Scientists and Software Engineers – Deploying L...
Bridging the Gap Between Data Scientists and Software Engineers – Deploying L...Bridging the Gap Between Data Scientists and Software Engineers – Deploying L...
Bridging the Gap Between Data Scientists and Software Engineers – Deploying L...Databricks
 
Lambda Architecture in the Cloud with Azure Databricks with Andrei Varanovich
Lambda Architecture in the Cloud with Azure Databricks with Andrei VaranovichLambda Architecture in the Cloud with Azure Databricks with Andrei Varanovich
Lambda Architecture in the Cloud with Azure Databricks with Andrei VaranovichDatabricks
 
Data Lakes with Azure Databricks
Data Lakes with Azure DatabricksData Lakes with Azure Databricks
Data Lakes with Azure DatabricksData Con LA
 
Scalable AutoML for Time Series Forecasting using Ray
Scalable AutoML for Time Series Forecasting using RayScalable AutoML for Time Series Forecasting using Ray
Scalable AutoML for Time Series Forecasting using RayDatabricks
 
Real-Time Analytics and Actions Across Large Data Sets with Apache Spark
Real-Time Analytics and Actions Across Large Data Sets with Apache SparkReal-Time Analytics and Actions Across Large Data Sets with Apache Spark
Real-Time Analytics and Actions Across Large Data Sets with Apache SparkDatabricks
 
Dataminds - ML in Production
Dataminds - ML in ProductionDataminds - ML in Production
Dataminds - ML in ProductionNathan Bijnens
 
Automated Production Ready ML at Scale
Automated Production Ready ML at ScaleAutomated Production Ready ML at Scale
Automated Production Ready ML at ScaleDatabricks
 
Unifying Streaming and Historical Telemetry Data For Real-time Performance Re...
Unifying Streaming and Historical Telemetry Data For Real-time Performance Re...Unifying Streaming and Historical Telemetry Data For Real-time Performance Re...
Unifying Streaming and Historical Telemetry Data For Real-time Performance Re...Databricks
 
The Developer Data Scientist – Creating New Analytics Driven Applications usi...
The Developer Data Scientist – Creating New Analytics Driven Applications usi...The Developer Data Scientist – Creating New Analytics Driven Applications usi...
The Developer Data Scientist – Creating New Analytics Driven Applications usi...Microsoft Tech Community
 
Data pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous drivingData pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous drivingYu Huang
 
Spark Summit EU talk by Pat Patterson
Spark Summit EU talk by Pat PattersonSpark Summit EU talk by Pat Patterson
Spark Summit EU talk by Pat PattersonSpark Summit
 
Big Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft AzureBig Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft AzureMark Tabladillo
 
Ebooks - Accelerating Time to Value of Big Data of Apache Spark | Qubole
Ebooks - Accelerating Time to Value of Big Data of Apache Spark | QuboleEbooks - Accelerating Time to Value of Big Data of Apache Spark | Qubole
Ebooks - Accelerating Time to Value of Big Data of Apache Spark | QuboleVasu S
 

What's hot (20)

Sparkflows.io
Sparkflows.ioSparkflows.io
Sparkflows.io
 
A developer's introduction to big data processing with Azure Databricks
A developer's introduction to big data processing with Azure DatabricksA developer's introduction to big data processing with Azure Databricks
A developer's introduction to big data processing with Azure Databricks
 
Building an Enterprise Data Platform with Azure Databricks to Enable Machine ...
Building an Enterprise Data Platform with Azure Databricks to Enable Machine ...Building an Enterprise Data Platform with Azure Databricks to Enable Machine ...
Building an Enterprise Data Platform with Azure Databricks to Enable Machine ...
 
Lightning-Fast Analytics for Workday Transactional Data with Pavel Hardak and...
Lightning-Fast Analytics for Workday Transactional Data with Pavel Hardak and...Lightning-Fast Analytics for Workday Transactional Data with Pavel Hardak and...
Lightning-Fast Analytics for Workday Transactional Data with Pavel Hardak and...
 
Using Redash for SQL Analytics on Databricks
Using Redash for SQL Analytics on DatabricksUsing Redash for SQL Analytics on Databricks
Using Redash for SQL Analytics on Databricks
 
Azure Databricks—Apache Spark as a Service with Sascha Dittmann
Azure Databricks—Apache Spark as a Service with Sascha DittmannAzure Databricks—Apache Spark as a Service with Sascha Dittmann
Azure Databricks—Apache Spark as a Service with Sascha Dittmann
 
Azure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar PresentationAzure Synapse 101 Webinar Presentation
Azure Synapse 101 Webinar Presentation
 
Bridging the Gap Between Data Scientists and Software Engineers – Deploying L...
Bridging the Gap Between Data Scientists and Software Engineers – Deploying L...Bridging the Gap Between Data Scientists and Software Engineers – Deploying L...
Bridging the Gap Between Data Scientists and Software Engineers – Deploying L...
 
Lambda Architecture in the Cloud with Azure Databricks with Andrei Varanovich
Lambda Architecture in the Cloud with Azure Databricks with Andrei VaranovichLambda Architecture in the Cloud with Azure Databricks with Andrei Varanovich
Lambda Architecture in the Cloud with Azure Databricks with Andrei Varanovich
 
Data Lakes with Azure Databricks
Data Lakes with Azure DatabricksData Lakes with Azure Databricks
Data Lakes with Azure Databricks
 
Scalable AutoML for Time Series Forecasting using Ray
Scalable AutoML for Time Series Forecasting using RayScalable AutoML for Time Series Forecasting using Ray
Scalable AutoML for Time Series Forecasting using Ray
 
Real-Time Analytics and Actions Across Large Data Sets with Apache Spark
Real-Time Analytics and Actions Across Large Data Sets with Apache SparkReal-Time Analytics and Actions Across Large Data Sets with Apache Spark
Real-Time Analytics and Actions Across Large Data Sets with Apache Spark
 
Dataminds - ML in Production
Dataminds - ML in ProductionDataminds - ML in Production
Dataminds - ML in Production
 
Automated Production Ready ML at Scale
Automated Production Ready ML at ScaleAutomated Production Ready ML at Scale
Automated Production Ready ML at Scale
 
Unifying Streaming and Historical Telemetry Data For Real-time Performance Re...
Unifying Streaming and Historical Telemetry Data For Real-time Performance Re...Unifying Streaming and Historical Telemetry Data For Real-time Performance Re...
Unifying Streaming and Historical Telemetry Data For Real-time Performance Re...
 
The Developer Data Scientist – Creating New Analytics Driven Applications usi...
The Developer Data Scientist – Creating New Analytics Driven Applications usi...The Developer Data Scientist – Creating New Analytics Driven Applications usi...
The Developer Data Scientist – Creating New Analytics Driven Applications usi...
 
Data pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous drivingData pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous driving
 
Spark Summit EU talk by Pat Patterson
Spark Summit EU talk by Pat PattersonSpark Summit EU talk by Pat Patterson
Spark Summit EU talk by Pat Patterson
 
Big Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft AzureBig Data Adavnced Analytics on Microsoft Azure
Big Data Adavnced Analytics on Microsoft Azure
 
Ebooks - Accelerating Time to Value of Big Data of Apache Spark | Qubole
Ebooks - Accelerating Time to Value of Big Data of Apache Spark | QuboleEbooks - Accelerating Time to Value of Big Data of Apache Spark | Qubole
Ebooks - Accelerating Time to Value of Big Data of Apache Spark | Qubole
 

Viewers also liked

Introducción al análisis predictivo con SQL Server
Introducción al análisis predictivo con SQL ServerIntroducción al análisis predictivo con SQL Server
Introducción al análisis predictivo con SQL ServerEduardo Castro
 
Palestra sobre Microsoft Business Intelligence para estudantes de Mogi-Guaçu ...
Palestra sobre Microsoft Business Intelligence para estudantes de Mogi-Guaçu ...Palestra sobre Microsoft Business Intelligence para estudantes de Mogi-Guaçu ...
Palestra sobre Microsoft Business Intelligence para estudantes de Mogi-Guaçu ...Heber Lopes
 
Charla sql server 2012 cibertec BI
Charla sql server 2012 cibertec BICharla sql server 2012 cibertec BI
Charla sql server 2012 cibertec BIdbLearner
 
SQL Denali Microsoft BI Raona
SQL Denali Microsoft BI RaonaSQL Denali Microsoft BI Raona
SQL Denali Microsoft BI RaonaRaona
 
SQL Server 2014 - Power BI
SQL Server 2014 - Power BISQL Server 2014 - Power BI
SQL Server 2014 - Power BIBILATAM
 
Sql Server Business Intelligence Spanish
Sql Server Business Intelligence SpanishSql Server Business Intelligence Spanish
Sql Server Business Intelligence SpanishEduardo Castro
 
Inteligencia de Negocios con PowerView
Inteligencia de Negocios con PowerViewInteligencia de Negocios con PowerView
Inteligencia de Negocios con PowerViewEduardo Castro
 
Welcome to PowerBI and Tableau
Welcome to PowerBI and TableauWelcome to PowerBI and Tableau
Welcome to PowerBI and TableauAshwin Dinoriya
 
Paweł Ciepły: PowerBI part1
Paweł Ciepły: PowerBI part1Paweł Ciepły: PowerBI part1
Paweł Ciepły: PowerBI part1AnalyticsConf
 
Microsoft Power BI Overview
Microsoft Power BI OverviewMicrosoft Power BI Overview
Microsoft Power BI OverviewNetwoven Inc.
 
Power BI Made Simple
Power BI Made SimplePower BI Made Simple
Power BI Made SimpleJames Serra
 
Introduction to Microsoft Power BI
Introduction to Microsoft Power BIIntroduction to Microsoft Power BI
Introduction to Microsoft Power BIExilesoft
 
Business Analysis Fundamentals
Business Analysis FundamentalsBusiness Analysis Fundamentals
Business Analysis Fundamentalswaelsaid75
 
Business Analysis Techniques
Business Analysis TechniquesBusiness Analysis Techniques
Business Analysis TechniquesIIBA UK Chapter
 

Viewers also liked (20)

Introducción al análisis predictivo con SQL Server
Introducción al análisis predictivo con SQL ServerIntroducción al análisis predictivo con SQL Server
Introducción al análisis predictivo con SQL Server
 
Palestra sobre Microsoft Business Intelligence para estudantes de Mogi-Guaçu ...
Palestra sobre Microsoft Business Intelligence para estudantes de Mogi-Guaçu ...Palestra sobre Microsoft Business Intelligence para estudantes de Mogi-Guaçu ...
Palestra sobre Microsoft Business Intelligence para estudantes de Mogi-Guaçu ...
 
Charla sql server 2012 cibertec BI
Charla sql server 2012 cibertec BICharla sql server 2012 cibertec BI
Charla sql server 2012 cibertec BI
 
SQL Denali Microsoft BI Raona
SQL Denali Microsoft BI RaonaSQL Denali Microsoft BI Raona
SQL Denali Microsoft BI Raona
 
SPS-Power BI Introduction
SPS-Power BI IntroductionSPS-Power BI Introduction
SPS-Power BI Introduction
 
SQL Server 2014 - Power BI
SQL Server 2014 - Power BISQL Server 2014 - Power BI
SQL Server 2014 - Power BI
 
Sql Server Business Intelligence Spanish
Sql Server Business Intelligence SpanishSql Server Business Intelligence Spanish
Sql Server Business Intelligence Spanish
 
MCSE Productivity
MCSE ProductivityMCSE Productivity
MCSE Productivity
 
Inteligencia de Negocios con PowerView
Inteligencia de Negocios con PowerViewInteligencia de Negocios con PowerView
Inteligencia de Negocios con PowerView
 
Welcome to PowerBI and Tableau
Welcome to PowerBI and TableauWelcome to PowerBI and Tableau
Welcome to PowerBI and Tableau
 
Paweł Ciepły: PowerBI part1
Paweł Ciepły: PowerBI part1Paweł Ciepły: PowerBI part1
Paweł Ciepły: PowerBI part1
 
MCSA: Windows Server 2016
MCSA: Windows Server 2016MCSA: Windows Server 2016
MCSA: Windows Server 2016
 
Power bi desktop et Power BI Service
Power bi desktop et Power BI ServicePower bi desktop et Power BI Service
Power bi desktop et Power BI Service
 
Business analyst ppt
Business analyst pptBusiness analyst ppt
Business analyst ppt
 
Power BI
Power BIPower BI
Power BI
 
Microsoft Power BI Overview
Microsoft Power BI OverviewMicrosoft Power BI Overview
Microsoft Power BI Overview
 
Power BI Made Simple
Power BI Made SimplePower BI Made Simple
Power BI Made Simple
 
Introduction to Microsoft Power BI
Introduction to Microsoft Power BIIntroduction to Microsoft Power BI
Introduction to Microsoft Power BI
 
Business Analysis Fundamentals
Business Analysis FundamentalsBusiness Analysis Fundamentals
Business Analysis Fundamentals
 
Business Analysis Techniques
Business Analysis TechniquesBusiness Analysis Techniques
Business Analysis Techniques
 

Similar to Minería de Datos en Sql Server 2008

Analysis Services en SQL Server 2008
Analysis Services en SQL Server 2008Analysis Services en SQL Server 2008
Analysis Services en SQL Server 2008Eduardo Castro
 
MS SQL SERVER: Microsoft time series algorithm
MS SQL SERVER: Microsoft time series algorithmMS SQL SERVER: Microsoft time series algorithm
MS SQL SERVER: Microsoft time series algorithmsqlserver content
 
MS SQL SERVER: Time series algorithm
MS SQL SERVER: Time series algorithmMS SQL SERVER: Time series algorithm
MS SQL SERVER: Time series algorithmDataminingTools Inc
 
Data Mining 2008
Data Mining 2008Data Mining 2008
Data Mining 2008llangit
 
The Magic Of Application Lifecycle Management In Vs Public
The Magic Of Application Lifecycle Management In Vs PublicThe Magic Of Application Lifecycle Management In Vs Public
The Magic Of Application Lifecycle Management In Vs PublicDavid Solivan
 
SQL Server 2008 Data Mining
SQL Server 2008 Data MiningSQL Server 2008 Data Mining
SQL Server 2008 Data Miningllangit
 
SQL Server 2008 Data Mining
SQL Server 2008 Data MiningSQL Server 2008 Data Mining
SQL Server 2008 Data Miningllangit
 
SQL Server 2008 Data Mining
SQL Server 2008 Data MiningSQL Server 2008 Data Mining
SQL Server 2008 Data Miningllangit
 
SQL Azure Overview - ericnel
SQL Azure Overview - ericnelSQL Azure Overview - ericnel
SQL Azure Overview - ericnelukdpe
 
BizDataX White paper Test Data Management
BizDataX White paper Test Data ManagementBizDataX White paper Test Data Management
BizDataX White paper Test Data ManagementDragan Kinkela
 
Machine Learning as service
Machine Learning as serviceMachine Learning as service
Machine Learning as serviceNihal Mehdi
 
How to Use a Semantic Layer on Big Data to Drive AI & BI Impact
How to Use a Semantic Layer on Big Data to Drive AI & BI ImpactHow to Use a Semantic Layer on Big Data to Drive AI & BI Impact
How to Use a Semantic Layer on Big Data to Drive AI & BI ImpactDATAVERSITY
 
.Net development with Azure Machine Learning (AzureML) Nov 2014
.Net development with Azure Machine Learning (AzureML) Nov 2014.Net development with Azure Machine Learning (AzureML) Nov 2014
.Net development with Azure Machine Learning (AzureML) Nov 2014Mark Tabladillo
 
SQL Server 2008 for Developers
SQL Server 2008 for DevelopersSQL Server 2008 for Developers
SQL Server 2008 for Developersukdpe
 
Whats New In 2010 (Msdn & Visual Studio)
Whats New In 2010 (Msdn & Visual Studio)Whats New In 2010 (Msdn & Visual Studio)
Whats New In 2010 (Msdn & Visual Studio)Steve Lange
 
Microsoft SQL Server - SQL Server Migrations Presentation
Microsoft SQL Server - SQL Server Migrations PresentationMicrosoft SQL Server - SQL Server Migrations Presentation
Microsoft SQL Server - SQL Server Migrations PresentationMicrosoft Private Cloud
 
Machine Learning Classifiers
Machine Learning ClassifiersMachine Learning Classifiers
Machine Learning ClassifiersMostafa
 
“Lights Out”Configuration using Tivoli Netcool AutoDiscovery Tools
“Lights Out”Configuration using Tivoli Netcool AutoDiscovery Tools“Lights Out”Configuration using Tivoli Netcool AutoDiscovery Tools
“Lights Out”Configuration using Tivoli Netcool AutoDiscovery ToolsAntonio Rolle
 

Similar to Minería de Datos en Sql Server 2008 (20)

Analysis Services en SQL Server 2008
Analysis Services en SQL Server 2008Analysis Services en SQL Server 2008
Analysis Services en SQL Server 2008
 
MS SQL SERVER: Microsoft time series algorithm
MS SQL SERVER: Microsoft time series algorithmMS SQL SERVER: Microsoft time series algorithm
MS SQL SERVER: Microsoft time series algorithm
 
MS SQL SERVER: Time series algorithm
MS SQL SERVER: Time series algorithmMS SQL SERVER: Time series algorithm
MS SQL SERVER: Time series algorithm
 
Data Mining 2008
Data Mining 2008Data Mining 2008
Data Mining 2008
 
The Magic Of Application Lifecycle Management In Vs Public
The Magic Of Application Lifecycle Management In Vs PublicThe Magic Of Application Lifecycle Management In Vs Public
The Magic Of Application Lifecycle Management In Vs Public
 
SQL Server 2008 Data Mining
SQL Server 2008 Data MiningSQL Server 2008 Data Mining
SQL Server 2008 Data Mining
 
SQL Server 2008 Data Mining
SQL Server 2008 Data MiningSQL Server 2008 Data Mining
SQL Server 2008 Data Mining
 
SQL Server 2008 Data Mining
SQL Server 2008 Data MiningSQL Server 2008 Data Mining
SQL Server 2008 Data Mining
 
SQL Azure Overview - ericnel
SQL Azure Overview - ericnelSQL Azure Overview - ericnel
SQL Azure Overview - ericnel
 
BizDataX White paper Test Data Management
BizDataX White paper Test Data ManagementBizDataX White paper Test Data Management
BizDataX White paper Test Data Management
 
Machine Learning as service
Machine Learning as serviceMachine Learning as service
Machine Learning as service
 
How to Use a Semantic Layer on Big Data to Drive AI & BI Impact
How to Use a Semantic Layer on Big Data to Drive AI & BI ImpactHow to Use a Semantic Layer on Big Data to Drive AI & BI Impact
How to Use a Semantic Layer on Big Data to Drive AI & BI Impact
 
.Net development with Azure Machine Learning (AzureML) Nov 2014
.Net development with Azure Machine Learning (AzureML) Nov 2014.Net development with Azure Machine Learning (AzureML) Nov 2014
.Net development with Azure Machine Learning (AzureML) Nov 2014
 
Hot sos em12c_metric_extensions
Hot sos em12c_metric_extensionsHot sos em12c_metric_extensions
Hot sos em12c_metric_extensions
 
SQL Server 2008 for Developers
SQL Server 2008 for DevelopersSQL Server 2008 for Developers
SQL Server 2008 for Developers
 
Whats New In 2010 (Msdn & Visual Studio)
Whats New In 2010 (Msdn & Visual Studio)Whats New In 2010 (Msdn & Visual Studio)
Whats New In 2010 (Msdn & Visual Studio)
 
Microsoft SQL Server - SQL Server Migrations Presentation
Microsoft SQL Server - SQL Server Migrations PresentationMicrosoft SQL Server - SQL Server Migrations Presentation
Microsoft SQL Server - SQL Server Migrations Presentation
 
Machine Learning Classifiers
Machine Learning ClassifiersMachine Learning Classifiers
Machine Learning Classifiers
 
Ax
AxAx
Ax
 
“Lights Out”Configuration using Tivoli Netcool AutoDiscovery Tools
“Lights Out”Configuration using Tivoli Netcool AutoDiscovery Tools“Lights Out”Configuration using Tivoli Netcool AutoDiscovery Tools
“Lights Out”Configuration using Tivoli Netcool AutoDiscovery Tools
 

More from Eduardo Castro

Introducción a polybase en SQL Server
Introducción a polybase en SQL ServerIntroducción a polybase en SQL Server
Introducción a polybase en SQL ServerEduardo Castro
 
Creando tu primer ambiente de AI en Azure ML y SQL Server
Creando tu primer ambiente de AI en Azure ML y SQL ServerCreando tu primer ambiente de AI en Azure ML y SQL Server
Creando tu primer ambiente de AI en Azure ML y SQL ServerEduardo Castro
 
Seguridad en SQL Azure
Seguridad en SQL AzureSeguridad en SQL Azure
Seguridad en SQL AzureEduardo Castro
 
Azure Synapse Analytics MLflow
Azure Synapse Analytics MLflowAzure Synapse Analytics MLflow
Azure Synapse Analytics MLflowEduardo Castro
 
SQL Server 2019 con Windows Server 2022
SQL Server 2019 con Windows Server 2022SQL Server 2019 con Windows Server 2022
SQL Server 2019 con Windows Server 2022Eduardo Castro
 
Novedades en SQL Server 2022
Novedades en SQL Server 2022Novedades en SQL Server 2022
Novedades en SQL Server 2022Eduardo Castro
 
Introduccion a SQL Server 2022
Introduccion a SQL Server 2022Introduccion a SQL Server 2022
Introduccion a SQL Server 2022Eduardo Castro
 
Machine Learning con Azure Managed Instance
Machine Learning con Azure Managed InstanceMachine Learning con Azure Managed Instance
Machine Learning con Azure Managed InstanceEduardo Castro
 
Novedades en sql server 2022
Novedades en sql server 2022Novedades en sql server 2022
Novedades en sql server 2022Eduardo Castro
 
Sql server 2019 con windows server 2022
Sql server 2019 con windows server 2022Sql server 2019 con windows server 2022
Sql server 2019 con windows server 2022Eduardo Castro
 
Introduccion a databricks
Introduccion a databricksIntroduccion a databricks
Introduccion a databricksEduardo Castro
 
Pronosticos con sql server
Pronosticos con sql serverPronosticos con sql server
Pronosticos con sql serverEduardo Castro
 
Data warehouse con azure synapse analytics
Data warehouse con azure synapse analyticsData warehouse con azure synapse analytics
Data warehouse con azure synapse analyticsEduardo Castro
 
Que hay de nuevo en el Azure Data Lake Storage Gen2
Que hay de nuevo en el Azure Data Lake Storage Gen2Que hay de nuevo en el Azure Data Lake Storage Gen2
Que hay de nuevo en el Azure Data Lake Storage Gen2Eduardo Castro
 
Introduccion a Azure Synapse Analytics
Introduccion a Azure Synapse AnalyticsIntroduccion a Azure Synapse Analytics
Introduccion a Azure Synapse AnalyticsEduardo Castro
 
Seguridad de SQL Database en Azure
Seguridad de SQL Database en AzureSeguridad de SQL Database en Azure
Seguridad de SQL Database en AzureEduardo Castro
 
Python dentro de SQL Server
Python dentro de SQL ServerPython dentro de SQL Server
Python dentro de SQL ServerEduardo Castro
 
Servicios Cognitivos de de Microsoft
Servicios Cognitivos de de Microsoft Servicios Cognitivos de de Microsoft
Servicios Cognitivos de de Microsoft Eduardo Castro
 
Script de paso a paso de configuración de Secure Enclaves
Script de paso a paso de configuración de Secure EnclavesScript de paso a paso de configuración de Secure Enclaves
Script de paso a paso de configuración de Secure EnclavesEduardo Castro
 
Introducción a conceptos de SQL Server Secure Enclaves
Introducción a conceptos de SQL Server Secure EnclavesIntroducción a conceptos de SQL Server Secure Enclaves
Introducción a conceptos de SQL Server Secure EnclavesEduardo Castro
 


Data Mining in SQL Server 2008

  • 1. Data Mining in SQL Server 2008 Ing Eduardo Castro GrupoAsesor en Informática ecastro@grupoasesor.net
  • 2. Eduardo Castro ecastro@grupoasesor.net MCITP Server Administrator MCTS Windows Server 2008 ActiveDirectory MCTS Windows Server 2008 Network Infrastructure MCTS Windows Server 2008 Applications Infrastructure MCITP Enterprise Support MCSTS Windows Vista MCITP Database Developer MCITP Database Administrator MCTS SQL Server MCITP Exchange Server 2007 MCTS Office PerformancePoint Server MCTS Team Foundation Server MCPD Enterprise Application Developer MCTS .Net Framework 2.0: Distributed Applications MCT 2008 International Association of Software Architects Chapter Leader IEEE Communications Society Board of Directors European Datawarehouse Research
  • 3. Disclaimer: The information contained in this slide deck represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication. This slide deck is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT. Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this slide deck may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation. Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this slide deck. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this slide deck does not give you any license to these patents, trademarks, copyrights, or other intellectual property. Unless otherwise noted, the example companies, organizations, products, domain names, e-mail addresses, logos, people, places and events depicted herein are fictitious, and no association with any real company, organization, product, domain name, email address, logo, person, place or event is intended or should be inferred. © 2008 Microsoft Corporation. All rights reserved. Microsoft, SQL Server, Office System, Visual Studio, SharePoint Server, Office PerformancePoint Server, .NET Framework, ProClarity Desktop Professional are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. The names of actual companies and products mentioned herein may be the trademarks of their respective owners.
  • 4. Overview: Introducing Data Mining Office Add-Ins; Understanding Data Mining Structure Improvements; Using the New Time Series Algorithm
  • 5. Introducing Data Mining Office Add-Ins: Data Preparation Tasks; Tools for Exploration; Tools for Prediction; Model Testing and Validation
  • 7. Tools for Exploration - Table Analysis Tools
  • 8. Tools for Exploration - Data Modeling Tools
  • 9.
  • 10.
  • 11. Neural network
  • 13.
  • 14.
  • 15.
  • 16. Tools for Prediction - Data Modeling Tools
  • 17.
  • 18.
  • 19.
  • 20. Input: population, fixed cost, individual cost, revenue per individual
  • 21. Output: maximum profit, probability threshold. Cross Validation – more on this later
  • 22. Demo 1: Using the Data Mining Excel Add-In
  • 23. Understanding Data Mining Structure Improvements: Data Partitioning for Training and Testing; Mining Model Column Aliases; Data Mining Filters; Drill Through to Mining Structure Data; Cross-Validation of a Mining Model
  • 24. Data Partitioning for Training and Testing: Specify as percentage or maximum number of cases. Smaller value is used if both parameters are specified. Data is divided randomly between training and testing. HoldoutSeed property enables consistent partitions across structures.
  • 25. Data Partitioning with DMX: Create a structure with partitioning with the HOLDOUT keyword. Query the structure to review partitions.
  • 26. Mining Model Column Aliases: Assign a column alias to reuse a column in a structure. Column content can be clarified. Column can be more easily referenced in DMX. Continuous and discretized versions of the same column can be used in separate models in the same structure.
  • 27. Data Mining Filters: Specify a condition to apply to mining structure columns. Filter creates subsets of training and testing data for a model. Multiple conditions can be linked with AND/OR operators. Conditions for continuous values use >, >=, <, <= operators. Conditions for discrete values use =, !=, or is null operators. Conditions on nested tables can use the EXISTS keyword and a subquery.
  • 28. Data Mining Filters with DMX: Add a filtered mining model to a structure.
  • 29. Drill Through to Mining Structure Data: Add columns to the mining structure, but not to models. Eliminates unnecessary data from the model and improves processing time. Supports drill through from mining model viewer or DMX for visibility into results.
  • 30. Cross-Validation of a Mining Model. Purpose: validate the accuracy of a single model; compare models within the same mining structure. Process: split mining structure into partitions of equal size; iteratively build models on all partitions excluding one partition such that all partitions are excluded once; measure accuracy of each model using the excluded partition; analyze results.
  • 31. Cross-Validation Parameters. Fold Count: number of partitions to use (minimum 2, maximum 256; maximum 10 for a session mining structure). Max Cases: total number of cases to include in cross-validation; cases are divided across folds; a value of 0 specifies all cases. Target Attribute: predictable column. Target State: target value for the target attribute; a value of null specifies that all states are to be tested. Target Threshold: value between 0 and 1 for prediction probability above which a predicted state is considered correct; a value of null specifies that the most probable prediction is considered correct.
  • 34. Demo 2: Creating a Clustering Model
  • 35. Using the New Time Series Algorithm: Better Time Series Support; Time Series Algorithm Parameters
  • 36. Better Time Series Support. ARTxp algorithm: still included in the Microsoft Time Series algorithm; best for prediction of the next likely value in a series. ARIMA algorithm: added to the Microsoft Time Series algorithm; best for long-term predictions. The new Microsoft Time Series algorithm trains one model using ARTxp and a second model using ARIMA, then blends the results to return the best prediction.
  • 37. Time Series Algorithm Parameters
  • 38. Resources: Model Filter Syntax and Examples, technet.microsoft.com/en-us/library/bb895186(SQL.100).aspx; Cross-Validation, msdn2.microsoft.com/en-us/library/bb895174(SQL.100).aspx; SQL Server Data Mining, www.sqlserverdatamining.com; Jamie MacLennan’s blog, blogs.msdn.com/jamiemac/default.aspx
  • 39.
  • 40. © 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
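
The partitioning rules on the Data Partitioning slide (percentage or maximum case count, the smaller one wins, a seed gives repeatable partitions) can be sketched in a few lines of Python. This is only an illustration of the rule, not how Analysis Services implements it: the function names `holdout_size` and `partition` and their parameters are hypothetical, and using `None` for "not specified" is an assumption of the sketch.

```python
import random


def holdout_size(total_cases, holdout_percent=None, holdout_max_cases=None):
    """Number of cases to reserve for testing (hypothetical helper).

    If both limits are given, the smaller resulting count is used.
    """
    candidates = []
    if holdout_percent is not None:
        candidates.append(total_cases * holdout_percent // 100)
    if holdout_max_cases is not None:
        candidates.append(min(holdout_max_cases, total_cases))
    return min(candidates) if candidates else 0


def partition(case_ids, holdout_percent=None, holdout_max_cases=None, seed=None):
    """Randomly split cases into (training, testing) lists.

    Passing the same seed yields the same split, which mirrors the idea
    of reusing one seed value across structures.
    """
    rng = random.Random(seed)
    shuffled = list(case_ids)
    rng.shuffle(shuffled)
    n_test = holdout_size(len(shuffled), holdout_percent, holdout_max_cases)
    return shuffled[n_test:], shuffled[:n_test]


# 30% of 10,000 cases would be 3,000, but the cap of 1,000 is smaller.
training, testing = partition(range(10_000), holdout_percent=30,
                              holdout_max_cases=1_000, seed=42)
print(len(training), len(testing))
```

With a 30% holdout capped at 1,000 cases, the cap only takes effect once the source holds more than about 3,333 cases; below that, the percentage is the smaller limit.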

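The cross-validation process outlined on the Cross-Validation slides (split into folds of roughly equal size, train on everything except one fold, score on the excluded fold) can be sketched as follows. This is a generic k-fold illustration under stated assumptions, not the Analysis Services stored procedure: `make_folds`, `cross_validate`, `train_majority`, and `accuracy` are hypothetical names, and the majority-class "model" is a stand-in for a real mining model.

```python
import random
from collections import Counter


def make_folds(cases, fold_count, seed=0):
    """Shuffle cases and deal them into fold_count folds of near-equal size."""
    shuffled = list(cases)
    # The 2..256 range follows the Fold Count limits given on the slide.
    if not 2 <= fold_count <= 256:
        raise ValueError("fold_count must be between 2 and 256")
    if fold_count > len(shuffled):
        raise ValueError("fold_count cannot exceed the number of cases")
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::fold_count] for i in range(fold_count)]


def cross_validate(cases, fold_count, train, score, seed=0):
    """Train on all folds but one, score on the excluded fold, repeat per fold."""
    folds = make_folds(cases, fold_count, seed)
    results = []
    for i, held_out in enumerate(folds):
        training = [c for j, fold in enumerate(folds) if j != i for c in fold]
        results.append(score(train(training), held_out))
    return results


def train_majority(training):
    """Toy model: always predict the most common label in the training data."""
    return Counter(label for _, label in training).most_common(1)[0][0]


def accuracy(model, held_out):
    """Fraction of held-out cases whose label matches the toy model's prediction."""
    return sum(1 for _, label in held_out if label == model) / len(held_out)


cases = [(i, "yes" if i % 4 else "no") for i in range(100)]
scores = cross_validate(cases, fold_count=5, train=train_majority, score=accuracy)
print(scores)
```

If the per-fold scores are close to one another, the model generalizes well; widely varying scores suggest it does not.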
Editor's Notes

  1. Data Mining Office Add-ins were introduced with SQL Server 2005, and a new version is available for SQL Server 2008 to take advantage of the improvements made to Analysis Services data mining. In this module, we’ll review how to use the Data Mining Add-ins, and then examine the changes made to mining structures as well as the new Time Series algorithm.
  2. Data Mining Add-ins for Office allow you to perform a variety of data mining tasks. You can prepare data by applying data cleansing, and you can partition the data into training and test sets. Some of the add-in tools are focused on exploring your data, while other tools are built specifically for prediction purposes. The add-ins also include functionality for testing and validating each model. Point out that the add-ins are also useful as a client viewer for data mining models developed on the server.
  3. This slide shows the data preparation tasks: Explore Data (to find anomalies), Clean Data (to handle outliers or erroneous data), and Partition Data (to separate it into training and test data). In the background is a view used to consolidate information from several tables. Transformations have been applied to enforce business rules. This logical table is then used as the source for data mining activities, whether using the add-ins or using BI Development Studio.
  4. This slide identifies the table analysis tools that are exploration-based data mining tools and identifies the data mining algorithm associated with the tool.
  5. This slide identifies the data modeling tools that are exploration-based data mining tools and identifies the data mining algorithm associated with the tool.
  6. Model viewers are available not only for mining data models created by using the add-in, but also for mining models created on the server.
  7. This slide shows the predictive tools and shows the related algorithm.
  8. Prediction tools are also available in the Data Modeling ribbon of Excel. Here you see the algorithm associated with these predictive tools.
  9. The Data Mining add-in also includes model testing and validation tools, such as an accuracy chart, a classification matrix, and a profit chart. Cross Validation is also new to Analysis Services data mining and will be discussed in more detail later in this module.
  10. In this section, we’ll review the improvements for mining structures in SSAS 2008. Specifically, we’ll look at setting up data partitions for training and testing data, how to use aliases with mining model columns, how to apply filters to data associated with a mining model, how to drill through to details when studying data mining results, and how to use the cross-validation report to assess the accuracy of a model or to compare multiple models to find the best model.
  11. To create training and testing sets using random data for SSAS 2005, best practice was to use the Random Sample transformation in SSIS 2005. However, the package design was particularly cumbersome for structures with nested tables. In SSAS 2008, the process to generate random data sets for training and testing is built in. You can specify parameters for partitioning data into training and testing sets in the Data Mining Wizard or in the Properties pane of the mining structure. Analysis Services uses a random sampling algorithm to assign data to either the training or the testing data set. If you provide both a percentage and a maximum number of rows, the smaller number prevails. For example, you can specify a percentage of 30% of the entire data set, which is not to exceed 1,000 rows if the data source continues to grow. When using the same data source view for multiple mining structures, you might want to keep the same partitioning strategy for each mining structure. Set the HoldoutSeed property to the same value in each structure to yield comparable results in the training and testing data sets. You can also define partitioning using DMX, AMO, or XML DDL. Point out that partitioning is not available for a model using the Time Series algorithm.
  12. For those who prefer to use DMX to create mining structures instead of the user interface, DMX now supports partitioning when the mining structure is created. Point out that HOLDOUT cannot be used with ALTER MINING STRUCTURE. The process to train the model, using INSERT INTO MINING STRUCTURE, is unchanged. The query executes and the data is randomly sampled. A holdout store is created for each partition of the mining structure. In SSAS 2008, you can now query the structure to view the contents of the training and testing data sets.
  13. In SSAS 2005, you could change the name of a mining model column in Business Intelligence Development Studio, but not in DMX. One reason you might want to alias a column is when you want to use the same column with different algorithms, but one algorithm supports continuous columns and the other does not. You can add a column to the mining structure more than once and set the Content property to a different value for each version of the column. Ignore the column in the model where the content type is unsupported, and include it as an input column in models supporting that content type. By enabling the use of an alias, you can use the same NATURAL PREDICTION JOIN for the models in the same mining structure because input columns are bound by name to the model column.
  14. Instead of creating separate data source views for your mining structure, you can create separate filtered models. Each model contains the same training and testing data, which allows you to compare model results. Why create filtered models? Achieve better overall accuracy by eliminating strong patterns of one attribute value (e.g. North America versus Pacific). Compare patterns in isolated subsets of data. You can create filters in the Model Filter dialog box or in the Properties pane of the mining model. In the case of discretized values, the bucket containing the specified value is selected. Example: Age = 23 returns the bucket containing ages 20-25. An example of a filter expression for a case table and a nested table: Gender = 'M' and EXISTS(select * from Products where Model = 'Water Bottle'). Point out that NOT EXISTS is also valid. Mention the URL on the Resources slide for more information about filter syntax. You must process the mining structure to see the filter applied to the model.
  15. Mention that using drillthrough in a filtered model returns all cases matching the filter, whether used for training or testing.
  16. As in SSAS 2005, the following algorithms do not support drill through: Naïve Bayes, Neural Network, and Logistic Regression. The Time Series algorithm supports drill through in a DMX query only; drill through is not supported in Business Intelligence Development Studio.
  17. Using parameters you specify, cross-validation automatically creates partitions of the data set of approximately equal size. For each partition, a mining model is created for the entire data set with one of the partitions removed, and then tested for accuracy using the partition that was excluded. If the variations are subtle, then the model generalizes well. If there is too much variation, then the model is not useful. Point out that cross-validation cannot be used with models built using the Time Series or Sequence Clustering algorithms. You can use the Cross Validation Report in the Mining Accuracy Chart of Business Intelligence Development Studio, or use Analysis Services stored procedures to create an ad hoc cross-validation in SQL Server Management Studio.
  18. More folds result in longer processing time.
  19. This slide and the next outline the types of tests and their respective measures that are found on the cross-validation report. Different models will use different test types for this report. Point out that the report can be generated in Business Intelligence Development Studio, which will be shown in the demonstration, or by calling an Analysis Services stored procedure.
  20. Data mining in SSAS 2008 was also improved by modifying the Time Series algorithm. In this section, we’ll review why the mining structure is improved and we’ll review the algorithm parameters for the Time Series algorithm.
  21. In SSAS 2005, the ARTxp Time Series prediction algorithm (autoregressive tree model for multiple prior unknown states), built by Microsoft Research, was introduced. The purpose of this algorithm was to tackle a difficult business problem: how to accurately predict the next step in a series. It was less reliable for predicting 10 steps or further out. ARIMA (autoregressive integrated moving average) is a very common time series algorithm that is well understood by seasoned data miners. It provides good predictions when projecting beyond the next 10 steps. In SSAS 2008, the Microsoft Time Series algorithm blends the results of the two algorithms to leverage short- and long-term capabilities. In Standard Edition, you can configure your model to use one or the other algorithm, or both (which is the default). In Enterprise Edition, you can do custom weighting to get the best prediction over a variable time span.
  22. The FORECAST_METHOD default value is MIXED. You can change this to ARIMA or ARTXP to use a single algorithm exclusively. The PREDICTION_SMOOTHING parameter affects the weighting of the ARTxp and ARIMA algorithms when MIXED mode is used. A value closer to 0 weights in favor of ARTxp, while a value closer to 1 weights in favor of ARIMA. For example, a value of 0.8 is weighted towards ARIMA and the value of 0.2 is used for ARTxp.
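
To make the effect of FORECAST_METHOD and PREDICTION_SMOOTHING in the last note more concrete, here is a deliberately simplified Python sketch. It assumes a plain linear blend of two already-computed forecasts; the actual weighting used by the Microsoft Time Series algorithm is not described in these notes and may differ. The function name `forecast`, its arguments, and the 0.5 default for the smoothing value are assumptions of the sketch.

```python
def forecast(artxp_value, arima_value, forecast_method="MIXED",
             prediction_smoothing=0.5):
    """Combine two forecasts in the spirit of the notes above (illustrative only).

    prediction_smoothing near 0 favors ARTxp; near 1 favors ARIMA.
    """
    method = forecast_method.upper()
    if method == "ARTXP":
        return artxp_value
    if method == "ARIMA":
        return arima_value
    if method != "MIXED":
        raise ValueError("forecast_method must be MIXED, ARTXP, or ARIMA")
    if not 0.0 <= prediction_smoothing <= 1.0:
        raise ValueError("prediction_smoothing must be between 0 and 1")
    # Assumed linear blend: weight s goes to ARIMA, (1 - s) to ARTxp.
    s = prediction_smoothing
    return (1.0 - s) * artxp_value + s * arima_value


# With smoothing 0.8, ARIMA carries weight 0.8 and ARTxp carries 0.2.
print(forecast(100.0, 200.0, "MIXED", 0.8))
print(forecast(100.0, 200.0, "ARTXP"))
```

Under this assumed blend, moving the smoothing value from 0.2 to 0.8 shifts the combined forecast from near the ARTxp value to near the ARIMA value.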