Data Mining in SQL Server 2008

Data Mining in SQL Server 2008
Ing. Eduardo Castro
Grupo Asesor en Informática
ecastro@grupoasesor.net

Eduardo Castro
ecastro@grupoasesor.net
MCITP Server Administrator
MCTS Windows Server 2008 Active Directory
MCTS Windows Server 2008 Network Infrastructure
MCTS Windows Server 2008 Applications Infrastructure
MCITP Enterprise Support
MCTS Windows Vista
MCITP Database Developer
MCITP Database Administrator
MCTS SQL Server
MCITP Exchange Server 2007
MCTS Office PerformancePoint Server
MCTS Team Foundation Server
MCPD Enterprise Application Developer
MCTS .Net Framework 2.0: Distributed Applications
MCT 2008
International Association of Software Architects Chapter Leader IEEE Communications Society Board of Directors European Datawarehouse Research

Disclaimer
The information contained in this slide deck represents the current view of Microsoft Corporation on the issues discussed as of the date of publication. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information presented after the date of publication. This slide deck is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT. Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this slide deck may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation. Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this slide deck. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this slide deck does not give you any license to these patents, trademarks, copyrights, or other intellectual property. Unless otherwise noted, the example companies, organizations, products, domain names, e-mail addresses, logos, people, places and events depicted herein are fictitious, and no association with any real company, organization, product, domain name, email address, logo, person, place or event is intended or should be inferred. © 2008 Microsoft Corporation. All rights reserved. Microsoft, SQL Server, Office System, Visual Studio, SharePoint Server, Office PerformancePoint Server, .NET Framework, ProClarity Desktop Professional are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.
The names of actual companies and products mentioned herein may be the trademarks of their respective owners.

Overview
  Introducing Data Mining Office Add-Ins
  Understanding Data Mining Structure Improvements
  Using the New Time Series Algorithm

Introducing Data Mining Office Add-Ins
  Data Preparation Tasks
  Tools for Exploration
  Tools for Prediction
  Model Testing and Validation

Tools for Exploration - Table Analysis Tools

Tools for Exploration - Data Modeling Tools

Tools for Exploration – Model Viewers
  Cluster Diagram
    Strength of similarities between clusters
  Other viewers:
    Time series
  Cluster Profiles
    Drill through to details
  Cluster Characteristics
    Probability attribute appearing in cluster
  Cluster Discrimination

Tools for Prediction - Data Modeling Tools

Model Testing and Validation
  Accuracy Chart
    Lift chart comparing actual results to random guess and to perfect prediction
  Classification Matrix
    Displays percentage and counts
  Profit Chart
    Input: population, fixed cost, individual cost, revenue per individual
    Output: maximum profit, probability threshold
  Cross Validation – more on this later
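
The Profit Chart inputs and outputs listed above can be illustrated with a small Python sketch. It is not the add-in's implementation: it assumes the scored cases stand for the whole population, that cases are targeted in descending order of predicted probability, and that profit equals revenue from responders minus the fixed cost and the per-individual cost. The function name `profit_curve` and the sample data are hypothetical.

```python
def profit_curve(scored_cases, fixed_cost, individual_cost, revenue_per_individual):
    """Find the targeting cutoff with the highest profit.

    scored_cases: list of (predicted_probability, responded) pairs.
    Assumption: the list represents the entire population.
    """
    # Target the most likely responders first.
    ranked = sorted(scored_cases, key=lambda case: case[0], reverse=True)

    # Baseline: target nobody, spend nothing, earn nothing.
    best = {"profit": 0, "targeted": 0, "threshold": None}
    responders = 0
    for targeted, (probability, responded) in enumerate(ranked, start=1):
        if responded:
            responders += 1
        profit = (responders * revenue_per_individual
                  - fixed_cost
                  - targeted * individual_cost)
        if profit > best["profit"]:
            # The probability of the last targeted case acts as the threshold.
            best = {"profit": profit, "targeted": targeted, "threshold": probability}
    return best


sample = [(0.9, True), (0.8, True), (0.7, False),
          (0.6, True), (0.3, False), (0.1, False)]
best = profit_curve(sample, fixed_cost=10, individual_cost=5,
                    revenue_per_individual=30)
print(best)
```

In this sample, targeting the four highest-scored cases gives the largest profit, so the probability of the fourth case becomes the threshold.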

Demo 1: Using the Data Mining Excel Add-In

Understanding Data Mining Structure Improvements
  Data Partitioning for Training and Testing
  Mining Model Column Aliases
  Data Mining Filters
  Drill Through to Mining Structure Data
  Cross-Validation of a Mining Model

Data Partitioning for Training and Testing
  Specify as percentage or maximum number of cases
  Smaller value is used if both parameters specified
  Data is divided randomly between training and testing
  HoldoutSeed property enables consistent partitions across structures
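
The partitioning rules above can be sketched in plain Python. This only illustrates the behavior described on the slide (percentage or maximum cases, the smaller limit wins, a seed makes the split repeatable); it is not how Analysis Services implements it, and the function and parameter names are hypothetical.

```python
import random


def holdout_split(cases, max_percent=0, max_cases=0, seed=None):
    """Split cases into (training, testing) lists.

    max_percent: share of cases to hold out for testing (0 = not set).
    max_cases:   cap on the number of test cases (0 = not set).
    seed:        fixed seed so the same partition can be reproduced.
    """
    n = len(cases)
    limits = []
    if max_percent > 0:
        limits.append(n * max_percent // 100)
    if max_cases > 0:
        limits.append(min(max_cases, n))
    # When both limits are given, the smaller one is used.
    test_size = min(limits) if limits else 0

    rng = random.Random(seed)
    test_indexes = set(rng.sample(range(n), test_size))
    training = [c for i, c in enumerate(cases) if i not in test_indexes]
    testing = [c for i, c in enumerate(cases) if i in test_indexes]
    return training, testing


cases = list(range(1000))
training, testing = holdout_split(cases, max_percent=30, max_cases=200, seed=42)
print(len(training), len(testing))
```

Here 30 percent would allow 300 test cases, but the cap of 200 is smaller, so 200 cases are held out. Calling the function again with the same seed returns the same partition.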

Data Partitioning with DMX
  Create a structure with partitioning with the HOLDOUT keyword
  Query the structure to review partitions

Mining Model Column Aliases
  Assign a column alias to reuse a column in a structure
  Column content can be clarified
  Column can be more easily referenced in DMX
  Continuous and discretized versions of the same column can be used in separate models in the same structure

Data Mining Filters
  Specify a condition to apply to mining structure columns
  Filter creates subsets of training and testing data for a model
  Multiple conditions can be linked with AND/OR operators
  Conditions for continuous values use >, >=, <, <= operators
  Conditions for discrete values use =, !=, or IS NULL operators
  Conditions on nested tables can use EXISTS keyword and subquery
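
The way these conditions combine can be mimicked in Python. This is an analogy only, not DMX filter syntax: the column names (`Age`, `Gender`, `Region`, `Purchases`, `Product`) and the helper `exists` are hypothetical, and a nested table is modeled as a list of dictionaries.

```python
def exists(nested_rows, predicate):
    """Rough analogue of EXISTS over a nested table: true if any row matches."""
    return any(predicate(row) for row in nested_rows)


def case_filter(case):
    """Continuous comparison AND (discrete equality OR null test) AND nested EXISTS."""
    return (
        case["Age"] > 30
        and (case["Gender"] == "F" or case["Region"] is None)
        and exists(case["Purchases"], lambda row: row["Product"] == "Helmet")
    )


cases = [
    {"Age": 45, "Gender": "F", "Region": "North",
     "Purchases": [{"Product": "Helmet"}, {"Product": "Gloves"}]},
    {"Age": 25, "Gender": "F", "Region": "North",
     "Purchases": [{"Product": "Helmet"}]},
    {"Age": 50, "Gender": "M", "Region": None,
     "Purchases": [{"Product": "Gloves"}]},
]

# Only cases that pass the filter would feed the filtered model.
filtered = [case for case in cases if case_filter(case)]
print(len(filtered))
```

Only the first case passes: the second fails the age condition and the third has no matching nested row.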

Data Mining Filters with DMX
  Add a filtered mining model to a structure

Drill Through to Mining Structure Data
  Add columns to the mining structure, but not to models
  Eliminates unnecessary data from model and improves processing time
  Supports drill through from mining model viewer or DMX for visibility into results

Cross-Validation of a Mining Model
  Purpose
    Validate the accuracy of a single model
    Compare models within the same mining structure
  Process
    Split mining structure into partitions of equal size
    Iteratively build models on all partitions excluding one partition such that all partitions are excluded once
    Measure accuracy of each model using the excluded partition
    Analyze results
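
The process above is standard k-fold cross-validation, and a minimal Python sketch can make the loop concrete. It does not reproduce the Analysis Services procedure: the "model" here is a stand-in that always predicts the majority class of its training data, and all function names are hypothetical.

```python
import random
from collections import Counter


def make_folds(cases, fold_count, seed=0):
    """Shuffle the cases and deal them into fold_count partitions of near-equal size."""
    shuffled = cases[:]
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::fold_count] for i in range(fold_count)]


def train_majority_model(training_cases):
    """Stand-in for model training: always predict the most common label."""
    counts = Counter(label for _, label in training_cases)
    majority = counts.most_common(1)[0][0]
    return lambda features: majority


def cross_validate(cases, fold_count, seed=0):
    """Return one accuracy value per fold, each measured on the excluded partition."""
    folds = make_folds(cases, fold_count, seed)
    accuracies = []
    for held_out, test_fold in enumerate(folds):
        # Train on every partition except the one held out for this round.
        training = [case for i, fold in enumerate(folds) if i != held_out
                    for case in fold]
        model = train_majority_model(training)
        correct = sum(1 for features, label in test_fold
                      if model(features) == label)
        accuracies.append(correct / len(test_fold))
    return accuracies


cases = [({"x": i}, "yes" if i % 4 else "no") for i in range(100)]
accuracies = cross_validate(cases, fold_count=5)
print(accuracies)
```

Comparing the per-fold accuracies is the "analyze results" step: values that vary widely between folds suggest the model is sensitive to which cases it was trained on.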

Cross-Validation Parameters
  Fold Count
    Number of partitions to use
    Minimum 2, maximum 256
    Maximum 10 for session mining structure
  Max Cases
    Total number of cases to include in cross-validation
    Cases divided across folds
    Value of 0 specifies all cases
  Target Attribute
    Predictable column
  Target State
    Target value for target attribute
    Value of null specifies all states are to be tested
  Target Threshold
    Value between 0 and 1 for prediction probability above which a predicted state is considered correct
    Value of null specifies most probable prediction is considered correct
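
The effect of Target State and Target Threshold on what counts as a correct prediction can be sketched as follows. This is one plausible reading of the parameter descriptions above, not the product's code: it covers only the case where a target state is given, and the function names and sample probabilities are hypothetical.

```python
def predicts_target(state_probabilities, target_state, target_threshold=None):
    """Decide whether a case counts as predicted to be in the target state.

    target_threshold=None: the most probable state is taken as the prediction.
    Otherwise: the target state's probability must exceed the threshold.
    """
    if target_threshold is None:
        most_probable = max(state_probabilities, key=state_probabilities.get)
        return most_probable == target_state
    return state_probabilities.get(target_state, 0.0) > target_threshold


def confusion_counts(predictions, target_state, target_threshold=None):
    """Count true/false positives and negatives for the target state."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for state_probabilities, actual_state in predictions:
        predicted = predicts_target(state_probabilities, target_state,
                                    target_threshold)
        actual = actual_state == target_state
        if predicted:
            counts["TP" if actual else "FP"] += 1
        else:
            counts["FN" if actual else "TN"] += 1
    return counts


predictions = [
    ({"yes": 0.80, "no": 0.20}, "yes"),
    ({"yes": 0.55, "no": 0.45}, "no"),
    ({"yes": 0.40, "no": 0.60}, "yes"),
    ({"yes": 0.10, "no": 0.90}, "no"),
]
print(confusion_counts(predictions, "yes"))
print(confusion_counts(predictions, "yes", target_threshold=0.7))
```

Raising the threshold trades false positives for false negatives, which is why the same model can report different accuracy measures under different threshold settings.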
