Secrets of Enterprise Data Mining
Mark Tabladillo, Ph.D. (MVP, MCAD .NET, MCITP, MCT)
PASS SQL Saturday #177 Mountain View, CA
February 23, 2013
Networking
Interactive
About MarkTab
Training and Consulting with http://marktab.com
Data Mining Resources and Blog at http://marktab.net
Ph.D., Industrial Engineering, Georgia Tech
Training and consulting internationally across many industries (SAS and Microsoft)
Contributed to peer-reviewed research and legislation
Mentoring doctoral dissertations at the accredited University of Phoenix
Presenter
Interactive
Name (up to) three things you want from enterprise data mining
Secret: Excel data mining
Excel add-in for SQL Server data mining
Secret: More than just SQL Server
Microsoft continues to add machine learning technology
Microsoft Offers
Bing
  Maps
Xbox Kinect
  Hacker Magnet
SQL Server 2012
  Analysis Services (Multidimensional and Data Mining)
  Integration Services
  Semantic Search
  Hadoop Partnership
Excel Projects from Microsoft Research
Definitions
What is data mining?
Definition
Data mining is the automated or semi-automated process of
discovering patterns in data
Machine learning is the development and optimization of
algorithms for automated or semi-automated pattern discovery
Purposes
    Phrase               Goal
    “Data Mining”        Inform actionable decisions
    “Machine Learning”   Determine best performing algorithm
Secret: Give artists art
Data mining is part of a complete decision cycle
MarkTab Decision Cycle
[Diagram: a cycle with GO at the top, Synthesis (art) on one side, and Analysis (science) on the other]
Science needs science fiction -- MarkTab
XKCD: Shopping Teams
Secret: Microsoft is an analytics competitor
Industry Comparisons 2012-2013
Gartner 2013 Magic Quadrant for Business Intelligence and Analytics Platforms
  Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DZLPEH&ct=130207&st=sb, February 5, 2013
Gartner 2013 Magic Quadrant for Data Warehouse Database Management Systems
  Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DU2VD4&ct=130131&st=sb, January 31, 2013
KDNuggets 2012
http://marktab.net/datamining/2012/06/15/excel-number-commercial-tool-analytics-data-mining-big-data/
SQL Server 2012
Business Intelligence and Business Analytics
New Platform options: managed services
Options: Platform (Self Managed), Infrastructure (as a Service), Platform (as a Service), Software (as a Service)
Stack layers in every option, top to bottom: Applications, Data, Runtime, Middleware, Database, O/S, Virtualization, Servers, Storage, Networking
Managed Services cover part of the stack in the three "as a Service" options; the bracket sits lowest under Infrastructure, higher under Platform, and highest under Software.
SQL Release timelines
  1989  SQL Server 1.0 (OS/2)
  1991  SQL Server 1.1 (OS/2)
  1993  SQL Server 4.21 (NT)
  1995  SQL Server 6.0
  1996  SQL Server 6.5
  1998  SQL Server 7.0: Dynamic Locking, Auto-Tuning, Full-text search, Replication, Analysis Services
  2000  SQL Server 2000: Reporting Services
  2005  SQL Server 2005: Unicode Support, Native XML, SQLCLR, Service Broker, Integration Services
  2008  SQL Server 2008: Sparse Columns, Spatial Types, FILESTREAM
  2010  SQL Server 2008 R2: Data-tier Apps, StreamInsight, PowerPivot, Master Data Services
  2012  SQL Server 2012: AlwaysOn, Columnstore, FileTable, Semantic Search, Power View
SQL Azure release timeline (Feb 2010 to Oct 2011)
  Feb 10  SQL Azure RTW
  Feb 10  SQL Azure SU1 RTW: Alter Edition
  Apr 10  SQL Azure SU2 RTW: MARS
  Jun 10  SQL Azure SU3 RTW: 50 GB Db, Spatial Type, HierarchyId Type
  Jul 10  DataSync CTP1
  Aug 10  SQL Azure SU4 RTW: Database Copy, Web Admin
  Nov 10  DataMarket RTW, SQL Azure Reporting CTP1
  Dec 10  SQL Azure SU6 RTW: DataSync CTP2
  Feb 11  SQL Azure Reporting CTP2, DataSync CTP2 Update
  Apr 11  SQL Azure SU V.Next: Multiple Servers, Server Mgmt API, JDBC, DAC Upgrade
  Aug 11  New Portal Experience, Sparse Columns, SQL Azure Reporting CTP3, SQL Azure DataSync CTP3, DAC Import/Export Service, Denali TSQL
Secret: Many already have Microsoft analytics
Business Intelligence and Business Analytics are included with most SQL Server licenses
Data platform: SQL Server 2012
  Database Services: SQL Server*, SQL Azure*, Replication, SQL Azure Data Sync*, Full Text & Semantic Search*
  Data Integration Services: Integration Services*, Master Data Services*, Data Quality Services*, StreamInsight*, Project “Austin”*
  Analytical Services: Analysis Services*, Data Mining, PowerPivot*
  Reporting Services: Reporting Services*, SQL Azure Reporting*, Report Builder, Power View*
* New / improved in SQL Server 2012
SQL Server 2012 Editions
    Retrieved from http://www.microsoft.com/en-us/sqlserver/editions.aspx -- February 2013
Secret: Microsoft offers three enterprise tools
All three tools support scaled solutions
What Enterprise Tools support Microsoft Data Mining?
[Diagram: Data Mining supported by SSMS, SSIS, and PowerShell]
[Figure, shown across four slides: a variable with values 0 through 7, illustrated as Discrete, Continuous, and Discretized (two variants)]
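The difference between a continuous and a discretized variable can be sketched in a few lines. This is a minimal illustration that assumes simple equal-width bucketing; the function name `discretize` and the choice of two and four buckets are hypothetical, and the bucketing methods available in Analysis Services may differ.

```python
def discretize(value: float, low: float, high: float, buckets: int) -> int:
    """Map a continuous value in [low, high] to a bucket index 0..buckets-1."""
    if not low <= value <= high:
        raise ValueError("value outside the declared range")
    width = (high - low) / buckets
    # The top edge of the range belongs to the last bucket.
    return min(int((value - low) / width), buckets - 1)


# The slide's variable takes the values 0 through 7.
values = [0, 1, 2, 3, 4, 5, 6, 7]

# Two assumed bucketings, standing in for the two "Discretized" rows.
two_buckets = [discretize(v, 0, 7, 2) for v in values]
four_buckets = [discretize(v, 0, 7, 4) for v in values]

print(two_buckets)
print(four_buckets)
```

Fewer buckets give a coarser view of the same values, which is the trade-off to weigh when a mining algorithm requires discrete inputs.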
Data Mining Capacities
   SQL Server 2008 R2 Analysis Services Object                      Maximum sizes/numbers
   Maximum data mining models per structure                         2^31-1 = 2,147,483,647
   Maximum data mining structures per solution                      2^31-1 = 2,147,483,647
   Maximum data mining structures per Analysis Services database    2^31-1 = 2,147,483,647
   Maximum data mining attributes (variables) per structure         2^31-1 = 2,147,483,647

Reference:
http://www.marktab.net/datamining/index.php/2010/08/01/sql-server-data-mining-capacities-2008-r2/
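Every limit in the capacities table is the same number: the largest value a signed 32-bit integer can hold. A short check of the arithmetic, where `fits_capacity` is a hypothetical helper for illustration and not part of any Microsoft API:

```python
# 2^31 - 1 is the maximum value of a signed 32-bit integer.
INT32_MAX = 2**31 - 1


def fits_capacity(count: int) -> bool:
    """Return True if a planned object count stays within the documented limit."""
    return 0 <= count <= INT32_MAX


print(f"{INT32_MAX:,}")  # 2,147,483,647
```

In practice, memory and processing time are likely to constrain a solution long before these counts are reached.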
Semantic Search
Text Mining
Future: Most data is Text
Two Research Types
• Quantitative research = data mining
• Qualitative research = text mining
The future is combining both
Full-Text Search Enhancements
Property search: search on tagged properties (such as author or title)
Customizable NEAR: find words or phrases close to one another
New Word Breakers and Stemmers (for many languages)
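The idea behind a customizable NEAR search can be sketched as a proximity test over word positions. This is an illustration only: the `near` function and its `max_distance` parameter are hypothetical, distance is assumed to be the difference in token positions, and SQL Server's full-text engine applies its own word breakers and distance rules.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def near(text: str, first: str, second: str, max_distance: int) -> bool:
    """True if the two words occur within max_distance token positions."""
    tokens = tokenize(text)
    first_positions = [i for i, t in enumerate(tokens) if t == first.lower()]
    second_positions = [i for i, t in enumerate(tokens) if t == second.lower()]
    return any(
        abs(i - j) <= max_distance
        for i in first_positions
        for j in second_positions
    )


sentence = "Data mining finds patterns in enterprise data"
print(near(sentence, "mining", "patterns", 2))
print(near(sentence, "mining", "enterprise", 2))
```

Making the distance a parameter is what "customizable" refers to here: the caller decides how close two terms must be to count as a match.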
[Diagram: Documents, iFilters (iFilter required), Full-Text Keyword Index “FTI”, Semantic Database, Semantic Key Phrase Index – Tag Index “TI”, Semantic Document Similarity Index “DSI”]
Languages Currently Supported
Traditional Chinese   Simplified Chinese
German                British English
English               Portuguese
French                Chinese (Hong Kong SAR, PRC)
Italian               Spanish
Brazilian             Chinese (Singapore)
Russian               Chinese (Macau SAR)
Swedish
Phases of Semantic Indexing
[Diagram: Full Text Keyword Index “FTI”, Semantic Key Phrase Index – Tag Index “TI”, Semantic Document Similarity Index “DSI”]
     http://msdn.microsoft.com/en-us/library/gg492085.aspx#SemanticIndexing
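The three indexes can be pictured with a toy pipeline. This sketch makes several assumptions: term counts stand in for the keyword index, the most frequent terms stand in for key phrases, and cosine similarity stands in for document similarity. The function names are hypothetical, and SQL Server's actual statistical models are not reproduced here.

```python
import math
import re
from collections import Counter


def keyword_index(docs: dict[str, str]) -> dict[str, Counter]:
    """Stand-in for the FTI: term counts per document."""
    return {
        doc_id: Counter(re.findall(r"[a-z]+", text.lower()))
        for doc_id, text in docs.items()
    }


def tag_index(fti: dict[str, Counter], top_n: int = 1) -> dict[str, list[str]]:
    """Stand-in for the TI: the top_n most frequent terms per document."""
    return {
        doc_id: [term for term, _ in counts.most_common(top_n)]
        for doc_id, counts in fti.items()
    }


def similarity(fti: dict[str, Counter], a: str, b: str) -> float:
    """Stand-in for the DSI: cosine similarity of two term-count vectors."""
    dot = sum(fti[a][term] * fti[b][term] for term in fti[a])
    norm_a = math.sqrt(sum(c * c for c in fti[a].values()))
    norm_b = math.sqrt(sum(c * c for c in fti[b].values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Made-up documents, for illustration only.
docs = {
    "a": "data mining data",
    "b": "data mining",
    "c": "semantic search",
}
fti = keyword_index(docs)
tags = tag_index(fti)
print(tags["a"], similarity(fti, "a", "b"), similarity(fti, "a", "c"))
```

The similarity step reads from the keyword counts, which is why the sketch builds that index first.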
Secret: Semantic Search scales linearly
Performance
Integrated Full Text Search (iFTS)
Improved Performance and Scale:
  Scale-up to 350M documents for storage and search
  iFTS query performance 7-10 times faster than in SQL Server 2008
  Worst-case iFTS query response times less than 3 sec for corpus
  Similar or better than main database search competitors
(2012, Michael Rys, Microsoft)
Linear Scale of FTI/TI/DSI
First known linearly scaling end-to-end Search and Semantic product in the industry
[Chart: Time in Seconds vs. Number of Documents]
(2011, K. Mukerjee, T. Porter, S. Gherman, Microsoft)
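One way to test a linear-scaling claim on your own corpus is to time indexing at several document counts and fit a straight line. The sketch below uses synthetic timings generated from formulas, not the measurements behind the chart; `fit_line` is a hypothetical helper that implements ordinary least squares.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float, float]:
    """Least-squares fit y = slope * x + intercept; also returns R squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Synthetic inputs (assumptions for illustration, not real benchmarks).
doc_counts = [1000.0, 2000.0, 4000.0, 8000.0]
linear_seconds = [0.5 * d / 1000 for d in doc_counts]
quadratic_seconds = [(d / 1000) ** 2 for d in doc_counts]

_, _, linear_r2 = fit_line(doc_counts, linear_seconds)
_, _, quadratic_r2 = fit_line(doc_counts, quadratic_seconds)
print(linear_r2, quadratic_r2)
```

An R squared close to 1 is consistent with linear scaling, while a noticeably lower value suggests the time grows faster than the document count.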
Text Mining References
Video
  http://channel9.msdn.com/Shows/DataBound/DataBound-Episode-2-Semantic-Search
  http://www.microsoftpdc.com/2009/SVR32
Semantic Search (Books Online) – explains the demo
  http://msdn.microsoft.com/en-us/library/gg492075.aspx
Paper
  http://users.cis.fiu.edu/~lzhen001/activities/KDD2011Program/docs/p213.pdf
Microsoft Resources
Links
Software
SQL Server 2012 Enterprise
(includes database engine, Analysis Services, SSMS and SSDT)
 http://www.microsoft.com/sqlserver/en/us/get-sql-server/try-it.aspx
Microsoft Office Professional 2013
 http://office.microsoft.com/en-us/try
Organizations
 Professional Association for SQL Server http://www.sqlpass.org
   Atlanta MDF http://www.atlantamdf.com/
   Atlanta Microsoft BI Users Group http://www.meetup.com/Atlanta-Microsoft-Business-Intelligence-Users/
PASS Business Analytics Conference http://www.passbaconference.com
Microsoft TechEd North America http://northamerica.msteched.com/
Interactive
Takeaways
Conclusion: Seven Secrets
Excel data mining
More than just SQL Server
Success involves everyone
Microsoft is an analytics competitor
Many already have Microsoft analytics
Microsoft offers three enterprise tools
Semantic search scales linearly
Connect
Data Mining Resources and blog http://marktab.net
Data Mining Training and Consulting (especially Microsoft and SAS)
http://marktab.com
Abstract
If you have a SQL Server license (Standard or higher) then you already have the ability
to start data mining. In this new presentation, you will see how to scale up data
mining from the free Excel 2013 add-in to production use. Aimed at beginning to
intermediate data miners, this presentation will show how mining models move from
development to production. We will use SQL Server 2012 tools including SSMS, SSIS,
and SSDT.
