Secrets of Enterprise Data Mining 
Mark Tabladillo, Ph.D. (MVP, SAS Expert) 
Consultant, SolidQ 
SQL Saturday Oregon 
November 1, 2014
Networking 
Say hello
Mark Tab 
SQL Server MVP; SAS Expert 
Consulting 
Training 
Teaching 
Presenting 
LinkedIn
@MarkTabNet
Interactive 
Name (up to) three things you want from enterprise data mining
Definitions 
What is data mining?
Definition 
Data mining is the automated or semi-automated process of discovering patterns in data 
Machine learning is the development and optimization of algorithms for automated or semi-automated pattern discovery
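As a rough illustration of "discovering patterns in data," the sketch below counts which items appear together across a handful of transactions. It is a toy example only: the function name `frequent_pairs`, the `min_support` parameter, and the sample data are all hypothetical and are not taken from SQL Server or from the presentation.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs that occur together in at least min_support transactions."""
    counts = Counter()
    for basket in transactions:
        # De-duplicate and sort so each pair is counted once per transaction
        # and always appears in the same order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical toy data, invented for illustration only.
transactions = [
    ["bike", "helmet", "lock"],
    ["bike", "helmet"],
    ["bike", "lock"],
    ["helmet", "gloves"],
]

pairs = frequent_pairs(transactions, min_support=2)
print(pairs)
```

In the terms of the definitions above, the counting step is the pattern discovery; choosing and tuning the algorithm that does the counting is the machine learning concern.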
Purposes
Phrase               Goal
“Data Mining”        Inform actionable decisions
“Machine Learning”   Determine best performing algorithm
How could data mining apply? 
Let’s look at three companies
Telecommunications
Oil and Gas
Volkswagen Group
What: Relational Data Warehouse
Why: Familiar way to store, fast retrieval, consistency, scalable
How: Database, relational constructs, indexes
What: Hadoop & HDInsight
Why: Large amounts, divide and conquer, analyzing unstructured data, flexible schema
How: Distributed computing
What: Tabular
Why: Fast calculations
How: In-memory, columns over rows
What: Multidimensional OLAP
Why: Slice and dice, ad hoc querying
How: Expands star schema into cube, preaggregated calculations
What: Data Mining & Machine Learning
Why: Patterns, predictions, high volume
How: Algorithms, estimations
Secret: Excel data mining 
Excel add-in for SQL Server data mining
Data mining add-in for business analysts 
•Ease of use 
•Rich data mining 
•Scalable
Split Personality of SSAS 
SS 
SQL 
AS 
NoSQL
Excel Data Mining Add-In 
For Office 2007: The 32-bit data mining add-in works with SQL Server 2008 or 2008 R2: 
http://www.microsoft.com/en-us/download/details.aspx?id=7294 
For Office 2010: The 32- or 64-bit data mining add-in works with SQL Server 2012 or earlier: 
http://www.microsoft.com/en-us/download/details.aspx?id=35578 
For Office 2013: The 32- or 64-bit data mining add-in works with SQL Server 2012 or earlier: 
http://www.microsoft.com/en-us/download/details.aspx?id=35578
Secret: Data Science provides an Epistemology 
Data mining is part of a complete data science cycle
MarkTab Decision Cycle 
Analysis 
(science) 
Synthesis 
(art) 
GO 
Science needs science fiction --MarkTab
Currency of Science 
Notes
Secret: Microsoft is an analytics competitor
Gartner 2013 
Magic Quadrant for Business Intelligence and Analytics Platforms 
Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DZLPEH&ct=130207&st=sb (February 5, 2013)
Gartner 2013 
Magic Quadrant for Data Warehouse Database Management Systems 
Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DU2VD4&ct=130131&st=sb (January 31, 2013)
KDNuggets 2014: What Analytics, Big Data, Data mining, Data Science software you used in the past 12 months for a real project? 
http://www.kdnuggets.com/2014/06/analytics-data-mining-data-science-software-poll-analyzed.html
KDNuggets 2014 
http://www.kdnuggets.com/2014/08/four-main-languages-analytics-data-mining-data-science.html
SQL Server 2014 
Business Intelligence and Business Analytics
Secret: Many already have Microsoft analytics 
Business Intelligence and Business Analytics are included with most SQL Server licenses
Self-service BI 
Corporate BI 
Evolution of BI
Evolution of BI 
Niche Startups 
Self-service BI 
Corporate BI
Data platform: SQL Server 2014 
Database Services 
SQL Server* SQL Azure* 
Replication SQL Azure Data Sync* 
Full Text & Semantic Search* 
Data Integration Services 
Integration Services* 
Master Data Services* 
Data Quality Services* 
StreamInsight* Project “Austin”* 
Analytical Services 
Analysis Services* 
Data Mining 
PowerPivot* 
Reporting Services 
Reporting Services* SQL Azure Reporting* 
Report Builder 
Power View*
Secret: Microsoft offers two choices 
SQL Server Analysis Services = SQL Server Data Mining 
Microsoft Azure Machine Learning
Advanced analytic tools for data scientists 
•Advanced descriptive analytics (e.g. clustering algorithm in SQL Server Analysis Services) 
•Predictive analytics (Neural Nets, Regression, Decision Tree, Time Series, Naïve Bayes algorithms in SQL Server Analysis Services) 
•Further advanced analytics (Semantic Search and Geospatial Data and functions in SQL Server 2012) 
•Big Data analytics (Hadoop integration)
What Enterprise Tools support SSAS? 
Data Mining 
SSMS 
SSIS 
PowerShell
SSAS Data Mining Capacities 
SQL Server 2014 Analysis Services Object: Maximum sizes/numbers 
Maximum data mining models per structure: 2^31-1 = 2,147,483,647 
Maximum data mining structures per solution: 2^31-1 = 2,147,483,647 
Maximum data mining structures per Analysis Services database: 2^31-1 = 2,147,483,647 
Maximum data mining attributes (variables) per structure: 64K 
Reference: http://www.marktab.net/datamining/index.php/2010/08/01/sql-server-data-mining-capacities-2008-r2/
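The 2^31-1 figure in the capacities above is the largest value a signed 32-bit integer can hold. The short sketch below checks that arithmetic and shows a hypothetical helper, `fits_capacity`, for comparing a planned object count against a limit. Reading "64K" as 64 * 1024 is an assumption; the slide does not spell it out.

```python
# Largest value of a signed 32-bit integer, matching the 2^31-1 limits above.
INT32_MAX = 2**31 - 1

# Assumption: "64K" is read as 64 * 1024; the slide does not say so explicitly.
ATTRIBUTE_LIMIT_ASSUMED = 64 * 1024


def fits_capacity(planned_count, limit):
    """Return True when a planned object count is within the stated limit."""
    return 0 <= planned_count <= limit


formatted_max = f"{INT32_MAX:,}"
print(formatted_max)
print(fits_capacity(500, INT32_MAX))
print(fits_capacity(70_000, ATTRIBUTE_LIMIT_ASSUMED))
```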
Microsoft Azure Machine Learning 
Brings engineering best practices to data science… 
Archive for predictive models, ensuring models are not lost, deleted, or corrupted. 
Search, discover, and reuse existing models to build on the work of others. 
Deploy predictive models into operation from Data Lab to minimize time to insight. 
Frequently update the predictive model to adapt to changing business conditions. 
Every new algorithm added as a module and every new predictive model deployed will build up the knowledge base and make the software more valuable.
Semantic Search 
Text Mining
Future: Most data is Text 
•Quantitative research = data mining 
•Qualitative research = text mining 
Two Research Types 
The future is combining both
(iFilter Required) 
Documents 
Full-Text Keyword Index 
“FTI” 
iFilters 
Semantic Document Similarity Index “DSI” 
Semantic Database 
Semantic Key Phrase Index – 
Tag Index “TI”
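To give a feel for what a key phrase (tag) index and a document similarity index hold, here is a minimal sketch that uses plain word counts and cosine similarity. This is a stand-in technique chosen for illustration, not SQL Server's actual semantic indexing method; the function names and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase word counts: a crude stand-in for a keyword index entry."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    shared = set(a) & set(b)
    dot = sum(a[t] * b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical documents, invented for illustration only.
docs = {
    "a": "data mining finds patterns in data",
    "b": "text mining finds patterns in documents",
    "c": "the bicycle has two wheels",
}

# Keyword-style index: one word-count vector per document.
index = {name: term_counts(text) for name, text in docs.items()}

# Tag-style lookup: the most frequent term in document "a".
top_term_a = index["a"].most_common(1)[0][0]

# Similarity-style lookup: compare document "a" with the other two.
sim_ab = cosine_similarity(index["a"], index["b"])
sim_ac = cosine_similarity(index["a"], index["c"])
print(top_term_a, round(sim_ab, 3), sim_ac)
```

Documents "a" and "b" share several words and score well above "a" and "c", which share none. A production semantic index would use statistical language models rather than raw counts.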
Languages Currently Supported 
Traditional Chinese 
German 
English 
French 
Italian 
Brazilian 
Russian 
Swedish 
Simplified Chinese 
British English 
Portuguese 
Chinese (Hong Kong SAR, PRC) 
Spanish 
Chinese (Singapore) 
Chinese (Macau SAR)
Secret: Semantic Search scales linearly 
Performance
Integrated Full Text Search (iFTS) 
Improved Performance and Scale: 
Scale-up to 350M documents for storage and search 
iFTS query performance 7-10 times faster than in SQL Server 2008 
Worst-case iFTS query response times less than 3 sec for corpus 
Similar or better than main database search competitors 
(2012, Michael Rys, Microsoft)
Linear Scale of FTI/TI/DSI 
First known linearly scaling end-to-end Search and Semantic product in the industry 
Time in Seconds vs. Number of Documents 
(2011, K. Mukerjee, T. Porter, S. Gherman, Microsoft)
Text Mining References 
Video 
http://channel9.msdn.com/Shows/DataBound/DataBound-Episode-2-Semantic-Search 
http://www.microsoftpdc.com/2009/SVR32 
Semantic Search (Books Online), explains the demo 
http://msdn.microsoft.com/en-us/library/gg492075.aspx 
Paper 
http://users.cis.fiu.edu/~lzhen001/activities/KDD2011Program/docs/p213.pdf
Microsoft Resources 
Links
Major Websites 
SQL Server Data Mining 
http://technet.microsoft.com/en-us/sqlserver/cc510301.aspx 
http://www.sqlserverdatamining.com/ 
Microsoft Azure Machine Learning (currently in preview) http://azure.microsoft.com/en-us/services/machine-learning/
Software 
DreamSpark (students); BizSpark (businesses) 
SQL Server 2014 Enterprise (includes database engine, Analysis Services, SSMS and SSDT) 
http://www.microsoft.com/en-us/server-cloud/products/sql-server/default.aspx 
Microsoft Office 
http://office.microsoft.com/en-us/ 
Primer on Power BI --MarkTab 
http://blogs.msdn.com/b/mvpawardprogram/archive/2014/08/04/primer-on-power-bi-business-intelligence.aspx
Organizations 
Professional Association for SQL Server http://www.sqlpass.org 
PASS Business Analytics Conference http://www.passbaconference.com
Interactive 
Takeaways
Conclusion 
Excel data mining 
Data Science provides an epistemology 
Microsoft is an analytics competitor 
Many already have Microsoft analytics 
Microsoft offers two enterprise solutions 
Semantic search scales linearly
Abstract 
If you have a SQL Server license (Standard or higher) then you already have the ability to start data mining. In this new presentation, you will see how to scale up data mining from the free Excel 2013 add-in to production use. Aimed at beginning to intermediate data miners, this presentation will show how mining models move from development to production. We will use SQL Server 2014 tools including SSMS, SSIS, and SSDT.
