Data Mining in SQL Server 2008

Ing. Eduardo Castro
Grupo Asesor en Informática
ecastro@grupoasesor.net

Eduardo Castro
ecastro@grupoasesor.net
- MCITP Server Administrator
- MCTS Windows Server 2008 Active Directory
- MCTS Windows Server 2008 Network Infrastructure
- MCTS Windows Server 2008 Applications Infrastructure
- MCITP Enterprise Support
- MCTS Windows Vista
- MCITP Database Developer
- MCITP Database Administrator
- MCTS SQL Server
- MCITP Exchange Server 2007
- MCTS Office PerformancePoint Server
- MCTS Team Foundation Server
- MCPD Enterprise Application Developer
- MCTS .Net Framework 2.0: Distributed Applications
- MCT 2008
- International Association of Software Architects Chapter Leader
- IEEE Communications Society Board of Directors
- European Datawarehouse Research

Disclaimer

The information contained in this slide deck represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication.

This slide deck is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT.

Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this slide deck may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation.

Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this slide deck. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this slide deck does not give you any license to these patents, trademarks, copyrights, or other intellectual property.

Unless otherwise noted, the example companies, organizations, products, domain names, e-mail addresses, logos, people, places and events depicted herein are fictitious, and no association with any real company, organization, product, domain name, email address, logo, person, place or event is intended or should be inferred.

© 2008 Microsoft Corporation. All rights reserved.

Microsoft, SQL Server, Office System, Visual Studio, SharePoint Server, Office PerformancePoint Server, .NET Framework, ProClarity Desktop Professional are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.

The names of actual companies and products mentioned herein may be the trademarks of their respective owners.

Overview
- Introducing Data Mining Office Add-Ins
- Understanding Data Mining Structure Improvements
- Using the New Time Series Algorithm

Introducing Data Mining Office Add-Ins
- Data Preparation Tasks
- Tools for Exploration
- Tools for Prediction
- Model Testing and Validation

Tools for Exploration - Table Analysis Tools

Tools for Exploration - Data Modeling Tools

Tools for Exploration - Model Viewers
- Cluster Diagram
  - Distribution of population
  - Strength of similarities between clusters
- Cluster Profiles
  - Distribution of values for each attribute
  - Drill through to details
- Cluster Characteristics
  - Attributes ordered by importance to cluster
  - Probability of an attribute appearing in the cluster
- Cluster Discrimination
  - Comparison of attributes between two clusters
- Other viewers: Decision tree, Time series

Tools for Prediction - Table Analysis Tools

Tools for Prediction - Data Modeling Tools

Model Testing and Validation
- Accuracy Chart
  - Measurement of model accuracy
  - Lift chart comparing actual results to random guess and to perfect prediction
- Classification Matrix
  - Shows correct and incorrect predictions
  - Displays percentage and counts
- Profit Chart
  - Estimation of profit by percentage of population contacted
  - Input: population, fixed cost, individual cost, revenue per individual
  - Output: maximum profit, probability threshold
- Cross-Validation (covered later in this deck)
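The profit chart's inputs and outputs can be illustrated with a minimal Python sketch. This is not how the Excel add-in computes its chart; it assumes a simple model in which test cases are ranked by predicted probability, each test case stands for an equal share of the population, and profit is revenue from responders minus contact costs and the fixed cost. The function name `profit_curve` and the sample data are hypothetical.

```python
def profit_curve(scored, population, fixed_cost, individual_cost, revenue_per_individual):
    """Return (maximum profit, probability threshold, fraction contacted).

    scored: list of (predicted_probability, responded) pairs from test data.
    Assumption: every test case represents population / len(scored) people.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    scale = population / len(ranked)
    best = None
    responders = 0
    for k, (probability, responded) in enumerate(ranked, start=1):
        responders += 1 if responded else 0
        # Profit if the k most likely cases (scaled to the population) are contacted.
        profit = (responders * scale * revenue_per_individual
                  - k * scale * individual_cost
                  - fixed_cost)
        if best is None or profit > best[0]:
            best = (profit, probability, k / len(ranked))
    return best


# Hypothetical scored test cases: (probability of responding, actually responded).
sample = [(0.9, True), (0.8, True), (0.4, False), (0.2, False)]
result = profit_curve(sample, population=1000, fixed_cost=100,
                      individual_cost=1, revenue_per_individual=10)
```

In this sample, contacting the top half of the ranked population gives the highest profit, so the probability threshold is the score of the last case included.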

Demo 1: Using the Data Mining Excel Add-In

Understanding Data Mining Structure Improvements
- Data Partitioning for Training and Testing
- Mining Model Column Aliases
- Data Mining Filters
- Drill Through to Mining Structure Data
- Cross-Validation of a Mining Model

Data Partitioning for Training and Testing
- Specify the test set as a percentage or as a maximum number of cases
- The smaller value is used if both parameters are specified
- Data is divided randomly between training and testing
- The HoldoutSeed property enables consistent partitions across structures
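The partitioning rules above can be sketched in a few lines of Python. This is an illustration of the rules only, not the Analysis Services implementation: the function name `holdout_split` and its parameters are hypothetical, and the seed here simply initializes Python's random number generator.

```python
import random


def holdout_split(cases, holdout_percent=None, holdout_max_cases=None, holdout_seed=0):
    """Randomly divide cases into (training, testing) lists."""
    total = len(cases)
    limits = []
    if holdout_percent is not None:
        limits.append(total * holdout_percent // 100)
    if holdout_max_cases is not None:
        limits.append(holdout_max_cases)
    # When both limits are given, the smaller one determines the test set size.
    test_size = min(limits + [total]) if limits else 0
    # The same seed always selects the same test cases.
    rng = random.Random(holdout_seed)
    test_indexes = set(rng.sample(range(total), test_size))
    training = [c for i, c in enumerate(cases) if i not in test_indexes]
    testing = [c for i, c in enumerate(cases) if i in test_indexes]
    return training, testing


cases = list(range(100))
training, testing = holdout_split(cases, holdout_percent=30,
                                  holdout_max_cases=10, holdout_seed=42)
```

With 100 cases, 30 percent would reserve 30 cases, but the 10-case maximum is smaller, so 10 cases go to testing and 90 to training.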

Data Partitioning with DMX
- Create a structure with partitioning with the HOLDOUT keyword
- Query the structure to review partitions

Mining Model Column Aliases
- Assign a column alias to reuse a column in a structure
- Column content can be clarified
- Column can be more easily referenced in DMX
- Continuous and discretized versions of the same column can be used in separate models in the same structure

Data Mining Filters
- Specify a condition to apply to mining structure columns
- A filter creates subsets of training and testing data for a model
- Multiple conditions can be linked with AND/OR operators
- Conditions for continuous values use the >, >=, <, <= operators
- Conditions for discrete values use the =, !=, or IS NULL operators
- Conditions on nested tables can use the EXISTS keyword and a subquery

Data Mining Filters with DMX
- Add a filtered mining model to a structure
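To show how the kinds of filter conditions combine, here is a small Python sketch that applies a filter to in-memory cases. It is an analogy, not DMX: the column names (`Age`, `Gender`, `Purchases`, `Product`) and both functions are hypothetical, and a nested table is modeled as a list of rows.

```python
def passes_filter(case):
    """Hypothetical filter: (Age > 30 AND Gender = 'F') OR EXISTS(a 'Milk' purchase)."""
    age = case.get("Age")
    # Continuous column: comparison operator; a missing (null) value never matches.
    continuous_ok = age is not None and age > 30
    # Discrete column: equality test.
    discrete_ok = case.get("Gender") == "F"
    # Nested table: true if at least one nested row satisfies the condition.
    nested_ok = any(row["Product"] == "Milk" for row in case.get("Purchases", []))
    return (continuous_ok and discrete_ok) or nested_ok


def apply_filter(cases, predicate):
    """Return the subset of cases that a filtered model would see."""
    return [case for case in cases if predicate(case)]


cases = [
    {"Id": 1, "Age": 45, "Gender": "F", "Purchases": []},
    {"Id": 2, "Age": 25, "Gender": "F", "Purchases": [{"Product": "Milk"}]},
    {"Id": 3, "Age": 50, "Gender": "M", "Purchases": [{"Product": "Bread"}]},
    {"Id": 4, "Age": None, "Gender": "F", "Purchases": []},
]
selected_ids = [case["Id"] for case in apply_filter(cases, passes_filter)]
```

Case 1 matches the AND branch, case 2 matches only through the nested-table condition, and cases 3 and 4 are excluded.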

Drill Through to Mining Structure Data
- Add columns to the mining structure, but not to models
- Eliminates unnecessary data from model and improves processing time
- Supports drill through from mining model viewer or DMX for visibility into results

Cross-Validation of a Mining Model
- Purpose
  - Validate the accuracy of a single model
  - Compare models within the same mining structure
- Process
  - Split mining structure into partitions of equal size
  - Iteratively build models on all partitions excluding one partition such that all partitions are excluded once
  - Measure accuracy of each model using the excluded partition
  - Analyze results
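The process above is standard k-fold cross-validation, which can be sketched generically in Python. This is not the Analysis Services implementation; `cross_validate`, `majority_label`, and `accuracy` are hypothetical names, and the "model" is a trivial majority-class predictor chosen only to keep the example self-contained.

```python
import random
from collections import Counter


def cross_validate(cases, fold_count, train_fn, accuracy_fn, seed=0):
    """Return one accuracy score per fold."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    # Split into fold_count partitions of (near) equal size.
    folds = [shuffled[i::fold_count] for i in range(fold_count)]
    scores = []
    for excluded_index, excluded in enumerate(folds):
        # Train on every partition except the excluded one.
        training = [case for i, fold in enumerate(folds)
                    if i != excluded_index for case in fold]
        model = train_fn(training)
        # Measure accuracy on the excluded partition only.
        scores.append(accuracy_fn(model, excluded))
    return scores


def majority_label(training):
    """Toy model: always predict the most common label seen in training."""
    return Counter(training).most_common(1)[0][0]


def accuracy(model, test_cases):
    return sum(1 for label in test_cases if label == model) / len(test_cases)


labels = ["yes"] * 8 + ["no"] * 2
scores = cross_validate(labels, fold_count=5, train_fn=majority_label,
                        accuracy_fn=accuracy)
```

Comparing the per-fold scores is the "analyze results" step: scores that vary widely between folds suggest the model is sensitive to which cases it was trained on.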

Cross-Validation Parameters
- Fold Count
  - Number of partitions to use
  - Minimum 2, Maximum 256
  - Maximum 10 for session mining structure
- Max Cases
  - Total number of cases to include in cross-validation
  - Cases divided across folds
  - Value of 0 specifies all cases
- Target Attribute
  - Predictable column
- Target State
  - Target value for target attribute
  - Value of null specifies all states are to be tested
- Target Threshold
  - Value between 0 and 1 for prediction probability above which a predicted state is considered correct
  - Value of null specifies most probable prediction is considered correct
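The limits listed above can be expressed as a small validation routine. The Python sketch below only encodes the rules from this slide; the function names are hypothetical, and treating 0 and 1 as allowed threshold values is an assumption, since the slide says only "between 0 and 1".

```python
def check_cv_params(fold_count, max_cases=0, target_threshold=None,
                    session_structure=False):
    """Return a list of problems; an empty list means the parameters are valid."""
    problems = []
    # Session mining structures allow at most 10 folds, others up to 256.
    max_folds = 10 if session_structure else 256
    if not 2 <= fold_count <= max_folds:
        problems.append(f"fold count must be between 2 and {max_folds}")
    if max_cases < 0:
        problems.append("max cases must be 0 (all cases) or a positive number")
    # None stands for null: the most probable prediction is considered correct.
    if target_threshold is not None and not 0 <= target_threshold <= 1:
        problems.append("target threshold must be between 0 and 1")
    return problems


def cases_per_fold(total_cases, fold_count, max_cases=0):
    """Approximate number of cases in each fold; 0 for max_cases means all cases."""
    used = total_cases if max_cases == 0 else min(max_cases, total_cases)
    return used // fold_count
```

For example, `check_cv_params(11, session_structure=True)` reports a fold count problem, while the same call without `session_structure` returns an empty list.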
