Data Mining with Excel 2010 and PowerPivot

Mark Tabladillo Ph.D.
http://marktab.net
September 18, 2010
SQL Saturday 46, Raleigh NC
#sqlsat46




                                © 2010 Mark Tabladillo Ph.D.
MarkTab & Data Mining
Outline
• What is Data Mining
• What is PowerPivot
• Demos
Data Mining as a Service
Outline
• What is Data Mining
• What is PowerPivot
• Demos
Data Mining Definitions
• Data mining
• Machine Learning
• Data mining algorithms: these typically use estimation or
  optimization to achieve results, rather than relying on
  direct calculation alone.
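To make the difference between a direct calculation and an optimization-based estimate concrete, here is a minimal Python sketch. It is only an illustration, not how Excel or Analysis Services computes anything; the toy data and the function names `closed_form_slope` and `gradient_descent_slope` are hypothetical. It fits the slope of a line through the origin twice: once with a closed-form formula, and once by repeatedly stepping downhill on the squared error (gradient descent).

```python
def closed_form_slope(xs, ys):
    """Direct calculation: slope b of y = b*x that minimizes squared error."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def gradient_descent_slope(xs, ys, learning_rate=0.01, steps=2000):
    """Optimization: start from a guess and repeatedly move the slope
    against the gradient of the mean squared error."""
    b = 0.0
    n = len(xs)
    for _ in range(steps):
        # Derivative of mean((y - b*x)^2) with respect to b
        grad = -2.0 / n * sum(x * (y - b * x) for x, y in zip(xs, ys))
        b -= learning_rate * grad
    return b


xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

print(closed_form_slope(xs, ys))
print(gradient_descent_slope(xs, ys))
```

Both approaches arrive at the same slope (about 2.0036 for this toy data); the difference is that the second one gets there by iterative estimation, which is the style of computation most data mining algorithms rely on when no closed-form answer exists.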
Data Mining Tasks
• Supervised
  • Answer known, what is correlated?
• Unsupervised
  • Answer unknown (unspecified), what are the groups?
• Forecasting
  • Given a trend, what is next?
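The three task types above can be illustrated with deliberately simple stand-ins, sketched here in Python under clearly stated assumptions: Pearson correlation for the supervised question, a two-group k-means loop for the unsupervised question, and a straight-line trend for forecasting. These are not the Microsoft algorithms behind the Excel add-in, and the function names `correlation`, `two_means`, and `next_in_trend` are hypothetical.

```python
from statistics import mean


def correlation(xs, ys):
    """Supervised flavor: the answer (ys) is known; how strongly is an
    input (xs) correlated with it? Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def two_means(values, iterations=10):
    """Unsupervised flavor: no answer is given; split 1-D values into
    two groups with a tiny k-means loop (k = 2)."""
    centers = [min(values), max(values)]
    groups = ([], [])
    for _ in range(iterations):
        groups = ([], [])
        for v in values:
            # Assign each value to the nearer of the two centers
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Move each center to the mean of its group (keep it if the group is empty)
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return groups


def next_in_trend(series):
    """Forecasting flavor: fit a straight-line trend to an evenly spaced
    series and extrapolate one step ahead."""
    ts = list(range(len(series)))
    mt, ms = mean(ts), mean(series)
    slope = sum((t - mt) * (s - ms) for t, s in zip(ts, series)) / sum(
        (t - mt) ** 2 for t in ts
    )
    intercept = ms - slope * mt
    return intercept + slope * len(series)


print(correlation([1, 2, 3, 4], [2, 4, 6, 8]))
print(two_means([1.0, 1.2, 0.9, 8.0, 8.3, 7.9]))
print(next_in_trend([10, 12, 14, 16]))
```

The point of the sketch is the shape of each question: supervised work starts from a known outcome, unsupervised work discovers groups without one, and forecasting projects an existing trend forward.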
Data Mining Add-In for Excel
• Requires an Analysis Services instance
• Version 10.00.2531.00 (April 2009)
• 32-bit add-in
• Microsoft .NET Framework 2.0 (32-bit)
• Office 2007 (Professional, Professional Plus, Ultimate,
  Enterprise)
• SQL Server 2008 or higher: Enterprise, Standard, or
  Developer edition
The Analyze Tab
The Analyze Tab

  Menu Option                     Data Mining Algorithm
  Analyze Key Influencers         Naïve Bayes
  Detect Categories               Clustering
  Fill from Example               Logistic Regression
  Forecast                        Time Series
  Highlight Exceptions            Clustering
  Scenario Analysis (Goal Seek)   Logistic Regression
  Scenario Analysis (What If)     Logistic Regression
  Prediction Calculator           Logistic Regression
  Shopping Basket Analysis        Association Rules
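Shopping Basket Analysis is built on association rules, whose two core measures are support and confidence. The Python sketch below computes them for item pairs only. It is a simplified illustration with hypothetical basket data, a hypothetical `pair_rules` function, and arbitrary thresholds; the Microsoft Association Rules algorithm handles larger itemsets and has its own parameters.

```python
from itertools import combinations


def pair_rules(baskets, min_support=0.4, min_confidence=0.6):
    """Return sorted rules (lhs, rhs, support, confidence) for item pairs.

    support(A, B)    = share of baskets containing both A and B
    confidence(A->B) = baskets with both A and B / baskets with A
    """
    n = len(baskets)
    item_counts = {}
    pair_counts = {}
    for basket in baskets:
        items = sorted(set(basket))
        for item in items:
            item_counts[item] = item_counts.get(item, 0) + 1
        for a, b in combinations(items, 2):
            pair_counts[(a, b)] = pair_counts.get((a, b), 0) + 1

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Each unordered pair yields two candidate rule directions.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["beer", "bread"],
    ["butter", "milk"],
    ["bread", "butter", "milk"],
]
for lhs, rhs, support, confidence in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data every basket with butter also contains milk, so "butter -> milk" has confidence 1.00, while "bread -> butter" (confidence 0.50) falls below the threshold and is dropped.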
Data Mining Tab
Data Mining Tab
Data Mining Capacities
SQL Server 2008 R2 Analysis Services

  Object                                                          Maximum sizes/numbers
  Maximum data mining models per structure                        2^31-1 = 2,147,483,647
  Maximum data mining structures per solution                     2^31-1 = 2,147,483,647
  Maximum data mining structures per Analysis Services database   2^31-1 = 2,147,483,647
  Maximum data mining attributes (variables) per structure        2^31-1 = 2,147,483,647

     Reference:
     http://www.marktab.net/datamining/index.php/2010/08/01/sql-server-data-mining-capacities-2008-r2/
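Every limit in the table is 2^31-1, which is exactly the largest value a signed 32-bit integer can hold. That the limits come from 32-bit counters or identifiers is an assumption suggested by the number, not something the reference states. A quick Python check of the arithmetic, using the standard `struct` module's signed 32-bit format:

```python
import struct

# Largest value representable by a signed 32-bit integer.
limit = 2**31 - 1
print(f"{limit:,}")  # prints 2,147,483,647

# struct's "<i" format is a little-endian signed 32-bit int:
# the limit packs fine, but one more raises struct.error.
struct.pack("<i", limit)
try:
    struct.pack("<i", limit + 1)
except struct.error:
    print("limit + 1 does not fit in a signed 32-bit integer")
```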
Data Mining Tab
Outline
• What is Data Mining
• What is PowerPivot
• Demos
PowerPivot for Excel
• Take advantage of familiar Excel tools and
  features
• Process massive amounts of data in seconds
• Load even the largest data sets from virtually any
  source
• Use powerful new analytical capabilities, such as
  Data Analysis Expressions (DAX)
• Make the most of multi-core processors and
  gigabytes of memory
PowerPivot for Excel Sources
• SQL Server
• SQL Azure
• Oracle, Teradata, Sybase, Informix, IBM DB2
• OLE DB/ODBC
• Analysis Services (SSAS)
• Reporting Services (SSRS)
• Excel files, text files
PowerPivot Reference
• http://www.powerpivot.com (Product Site)
• http://www.powerpivotpro.com (Blog Site)
Outline
• What is Data Mining
• What is PowerPivot
• Demos
Resources
• MarkTab.NET
  Blog, links, video resources and information for
  data mining
• Blog: http://marktab.net/datamining
• Twitter: @MarkTabNet
Regroup and Conclusion
• Main Points from this Presentation
Contact Information
• Mark Tabladillo
  http://marktab.net

• Also on:
  Twitter @marktabnet
  LinkedIn