Secrets of Enterprise Data Mining: SQL Saturday 328 Birmingham AL - Mark Tabladillo
If you have a SQL Server license (Standard or higher) then you already have the ability to start data mining. In this new presentation, you will see how to scale up data mining from the free Excel 2013 add-in to production use. Aimed at beginning to intermediate data miners, this presentation will show how mining models move from development to production. We will use SQL Server 2014 tools including SSMS, SSIS, and SSDT.
This is a presentation by Peter Coppola, VP of Product and Marketing at Basho Technologies, and Matthew Aslett, Research Director at 451 Research. Join them as they discuss whether multi-model databases and polyglot persistence have increased operational complexity. They'll discuss the benefits and importance of NoSQL databases and how the Basho Data Platform helps enterprises leverage Big Data applications.
My other computer is a datacentre - 2012 edition - Steve Loughran
An updated version of the "my other computer is a datacentre" talk, presented at a Bristol University HPC seminar.
Because it is targeted at universities, it emphasises some of the interesting problems: the classic CS ones of scheduling, new ones of availability and failure handling within what is now a single computer, and emergent problems of power and heterogeneity. It also includes references, all of which are worth reading; being mostly Google and Microsoft papers, they are free to download without needing ACM or IEEE library access.
Comments welcome.
Databricks CEO Ali Ghodsi introduces Databricks Delta, a new data management system that combines the scale and cost-efficiency of a data lake, the performance and reliability of a data warehouse, and the low latency of streaming.
Building an Enterprise-Scale Dashboarding/Analytics Platform Powered by the C... - Imply
Target is one of the largest retailers in the United States, with brick-and-mortar stores in all 50 states and one of the most-visited ecommerce sites in the country. In addition to typical merchandising functions like assortment planning, pricing and inventory management, Target also operates a large supply chain, financial/banking operations and property management organizations. As a data-driven organization, we need a data analytics platform that can address the unique needs of each of these various business units, while scaling to hundreds of thousands of users and accommodating an ever-increasing amount of data.
In this talk we’ll cover why Target chose to create our own analytics platform and specifically how Druid makes this platform successful. We’ll cover how we utilize key features in Druid, such as union datasources, arbitrary granularities, real-time ingestion, complex aggregation expressions and lightning-fast query response to provide analytics to users at all levels of the organization. We’ll also cover how Druid’s speed and flexibility allow us to provide interactive analytics to front-line, edge-of-business consumers to address hundreds of unique use-cases across several business units.
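To make the Druid features named above concrete, here is a minimal sketch of a native topN query combining a union datasource, a period granularity, and a sum aggregation. The datasource and column names are hypothetical, not Target's actual schema.

```python
import json

# A Druid native topN query illustrating three features mentioned in the talk:
# a union datasource, an arbitrary (period) granularity, and an aggregation.
query = {
    "queryType": "topN",
    "dataSource": {  # union datasource: query several tables as one
        "type": "union",
        "dataSources": ["store_sales", "web_sales"],  # hypothetical names
    },
    "dimension": "business_unit",
    "threshold": 10,
    "metric": "revenue",
    "granularity": {"type": "period", "period": "P1W"},  # weekly buckets
    "aggregations": [
        {"type": "doubleSum", "name": "revenue", "fieldName": "sale_amount"}
    ],
    "intervals": ["2019-01-01/2019-02-01"],
}

# In production this JSON would be POSTed to the Druid broker's /druid/v2 endpoint.
payload = json.dumps(query)
```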
Optimizing the Data Supply Chain for Data Science - Vital.AI
As we move from the Data Warehouse to the Data Supply Chain, we open our perspective to include the full life cycle of data, from raw material to data product.
To produce data products with the most value, efficiently and cost-effectively, quality control processes must be put into place at each link in the chain, driven by the requirements of data scientists. With such quality control processes in place, the burden on data scientists to cleanse data – typically 80% of their effort – can be greatly reduced.
Data Models – including schema, metadata, rules, and provenance – play a crucial role in ensuring an effective Data Supply Chain.
Each Data Supply Chain link must be defined with firm boundaries with clear lines of team responsibility – with Data Models providing the natural borders.
In this talk we will discuss the processes that must be put into place at each link in the Data Supply Chain including perspectives on:
* The definition of Data Supply Chain vs. Data Warehouse
* Tools to create, manage, utilize, and share Data Models
* Tracking Data Provenance
* ETL processes, driven by Data Models
* Collaborative processes across Data Science teams
* Visualization of Data and Data Flow across the Data Supply Chain
* Apache Hadoop and Apache Spark as enabling technologies
* Data Science
* Cross-Organizational Collaboration
* Security
Predictive Analysis for Airbnb Listing Rating using Scalable Big Data Platform - Savita Yadav
KMIS International Conference 2021.
This talk provides insights into the performance of predictive models for Airbnb ratings using Big Data and distributed parallel computing systems. We used two-class classification models to predict whether a property has a high or a low rating based on the features of its listing. This helps hosts know whether their property is suitable and how their listing compares to other similar listings. We compare the rating prediction models on accuracy and computing-time metrics.
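The two-class classification framing can be sketched in miniature. The talk's actual models run on Spark ML and Azure ML over real listing data; the toy logistic regression below, with made-up features and labels, only illustrates the high/low-rating setup.

```python
import math

# Illustrative only: a tiny two-class classifier over invented listing features.
# Features: [reviews_per_month, superhost (0/1)]; label: 1 = high rating.
data = [([0.2, 0], 0), ([0.5, 0], 0), ([2.5, 1], 1),
        ([3.0, 1], 1), ([0.4, 0], 0), ([2.8, 1], 1)]

w = [0.0, 0.0]
b = 0.0
lr = 0.5

def predict(x):
    """Sigmoid probability that a listing gets a high rating."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 / (1 + math.exp(-z))

# Plain stochastic gradient descent on the log loss.
for _ in range(200):
    for x, y in data:
        err = predict(x) - y
        for i in range(len(w)):
            w[i] -= lr * err * x[i]
        b -= lr * err

accuracy = sum((predict(x) > 0.5) == bool(y) for x, y in data) / len(data)
```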
Enabling Fast Data Strategy: What’s new in Denodo Platform 6.0 - Denodo
In this presentation, you will see the new functionality of Denodo Platform 6.0, including the dynamic query optimization engine, enterprise deployment management, and self-service information discovery and search.
This presentation is part of the Fast Data Strategy Conference, and you can watch the video here goo.gl/DzRtkg.
Creating a Modern Data Architecture for Digital Transformation - MongoDB
By managing Data in Motion, Data at Rest, and Data in Use differently, modern Information Management Solutions are enabling a whole range of architecture and design patterns that allow enterprises to fully harness the value in data flowing through their systems. In this session we explored some of the patterns (e.g. operational data lakes, CQRS, microservices and containerisation) that enable CIOs, CDOs and senior architects to tame the data challenge, and start to use data as a cross-enterprise asset.
Webinar: Enterprise Data Management in the Era of MongoDB and Data Lakes - MongoDB
With so much talk of how Big Data is revolutionizing the world and how a data lake with Hadoop and/or Spark will solve all your data problems, it is hard to tell what is hype, reality, or somewhere in-between.
In working with dozens of enterprises in varying stages of their enterprise data management (EDM) strategy, MongoDB enterprise architect Matt Kalan sees the same challenges and misunderstandings arise again and again.
In this session, he will explain common challenges in data management, what capabilities are necessary, and what the future state of architecture looks like. MongoDB is uniquely capable of filling common gaps in the data lake strategy.
This session also includes a live Q&A portion during which you are encouraged to ask questions of our team.
As Twitch grew, both the amount of data we received and the number of employees interested in that data grew rapidly. To continue empowering decision making as we scaled, we turned to Druid and Imply to provide self-service analytics to both technical and non-technical staff, allowing them to drill into high-level metrics instead of reading generated reports.
In this talk, learn how Twitch implemented a common analytics platform for the needs of many different teams supporting hundreds of users, thousands of queries, and ~5 billion events each day. This session will explain our Druid architecture in detail, including:
-The end-to-end architecture deployed on Amazon that includes Kinesis, RDS, S3, Druid, Pivot and Tableau
-How the data is brought together to deliver a unified view of live customer engagement and historical trends
-Operational best practices we learnt scaling Druid
-An example walkthrough using the platform
LendingClub RealTime BigData Platform with Oracle GoldenGate - Rajit Saha
LendingClub real-time Big Data platform with the Oracle GoldenGate Big Data Adapter. This was presented at Oracle OpenWorld 2017 in San Francisco.
Speakers:
Rajit Saha
Vengata Guruswami
Predictive Analysis of Financial Fraud Detection using Azure and Spark ML - Jongwook Woo
This talk provides insights, performance figures, and architecture for financial fraud detection on mobile money transaction activity in Azure ML and Spark. We predicted and classified transactions as normal or fraudulent with a small sample and a massive data set using Azure ML (a traditional system) and Spark ML (Big Data). I will present predictive analysis with several classification models, experimenting in Azure ML and Spark ML. In addition, the scalability of Spark ML will be presented for the models with different numbers of nodes in Spark clusters on Amazon AWS.
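Fraud data sets are heavily imbalanced, which is why accuracy alone can mislead: a model that labels every transaction "normal" still scores high accuracy. A small sketch, with made-up labels rather than the talk's data, of the precision/recall computation that complements the accuracy metric mentioned above:

```python
# Hypothetical labels: 5% fraud rate, and an imagined model's predictions.
y_true = [0] * 95 + [1] * 5                   # 1 = fraud
y_pred = [0] * 93 + [1] * 2 + [1] * 4 + [0]   # flags 6 transactions, misses 1 fraud

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

precision = tp / (tp + fp)  # of flagged transactions, the fraction that were fraud
recall = tp / (tp + fn)     # of actual fraud, the fraction we caught
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

Here accuracy is 0.97 even though a third of the flags are false alarms, which is why precision and recall are reported alongside it.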
When it comes to dealing with large, complex, and disparate data sets, traditional database technologies are unable to keep pace with the rich analytics necessary to power today’s data-driven applications. Graph analytics databases are becoming the underlying infrastructure for AI and machine learning. These databases allow users to ask complex questions across complex data, which is not always practical or even possible at scale using other approaches. They also enable faster insights against massive data sets when combined with pattern recognition, statistical analysis, and AI/machine learning. And in the case of standards-based graph databases, they connect with popular visualization tools like Graphileon, allowing users to easily explore their data stores and quickly build compelling graph-based applications.
Entity Resolution Service - Bringing Petabytes of Data Online for Instant Access - DataWorks Summit
2.5B+ ids, 2ms latency, 15K+ TPS, and petabytes of data. These numbers outline the challenges behind eBay’s Entity Resolution Service (ERS). ERS provides a temporal map between anyid-anyid. The ERS technology stack has Hadoop as the batch layer, Couchbase as the cache layer, Spring Batch to load data into Couchbase, and a REST API at the service layer. In our presentation we will take you through the journey from concept to production release. It’s a great story and we would like to share it with you!
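The "temporal map between anyid-anyid" idea can be sketched as a time-sorted list of mappings per source id, resolved with a binary search. The real ERS serves this from Couchbase at roughly 2ms latency; the names and data shapes below are hypothetical.

```python
import bisect

# Per source id: (effective_from_timestamp, target_id) pairs, sorted by time.
# Hypothetical ids, illustrating an identity re-linked at t=500.
id_map = {
    "cookie:abc": [(100, "user:1"), (500, "user:2")],
}

def resolve(source_id, ts):
    """Return the target id that was in effect at timestamp ts, or None."""
    entries = id_map.get(source_id, [])
    # Index of the last mapping whose effective_from <= ts.
    i = bisect.bisect_right(entries, (ts, chr(0x10FFFF)))
    return entries[i - 1][1] if i else None
```

A batch layer (Hadoop in ERS's case) would rebuild these entry lists, while the serving layer answers point lookups like `resolve("cookie:abc", 300)`.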
Visualize the Knowledge Graph and Unleash Your Data - Linkurious
Slides from the webinar "Visualize the Knowledge Graph and Unleash Your Data" with Michael Grove, Vice President of Engineering and co-founder of Stardog, and Jean Villedieu, co-founder of Linkurious.
The webinar covers the topic of enterprise Knowledge Graphs and lets you experience how to visualize and analyze this data to discover actionable insights for your organization.
SQL Saturday 79 Enterprise Data Mining for SQL Server 2008 R2 - Mark Tabladillo
This presentation introduces SQL Server Data Mining (SSDM) for SQL Server Professionals based on the speaker's past presentation for Microsoft TechEd. Starting with SQL Server Management Studio (SSMS), the demo includes the interfaces important for professional development, including Business Intelligence Development Studio (BIDS), highlighting Integration Services, and PowerShell. The interactive demos are based on Microsoft's Contoso Retail sample data. Finally we will evaluate where Microsoft data mining can help you in a practical business environment, which may include Oracle and SAS.
This presentation includes a complex business analytics solution using elements across the Microsoft Business Intelligence technology. This talk will not have all the steps spelled out. Therefore you should prepare -- for example, by looking at the SQL Server 2012 Developer Kit -- since attendees will be expected to have working knowledge of SQL Server relational database, SSIS, SSAS, SSRS, SSMS and SSDT. Key elements of this solution will highlight features new to SQL Server 2012. At the end of the demonstration, you will be expected to participate in groups to promote one idea which builds on the demo, and everyone will be voting for their favorite groups. Come ready to create and celebrate your peers.
Are you looking to create more stability in your professional future in uncertain economic times? Developing a social media platform is a challenge for high-end professionals and consultants. The presenter has established a blog (http://marktab.net), co-founded an online journal, become a paid video presenter, secured a spot at Microsoft TechEd, and earned credit toward his first Microsoft MVP. This presentation introduces the basic elements of a successful web strategy for 2013. The presenter shares his own statistics and first-hand experiences with website development, WordPress blog hosting, leveraging social media services (including Twitter, LinkedIn, YouTube and Facebook), and working with Microsoft.
SQL Saturday 108 -- Enterprise Data Mining with SQL Server - Mark Tabladillo
Presented at SQL Saturday 108 Redmond, WA -- This presentation introduces SQL Server Data Mining (SSDM) for SQL Server Professionals based on the speaker's past presentation for Microsoft TechEd. Starting with SQL Server Management Studio (SSMS), the demo includes the interfaces important for professional development, including Business Intelligence Development Studio (BIDS), highlighting Integration Services, and PowerShell. The interactive demos are based on Microsoft's Contoso Retail sample data. Finally we will evaluate where Microsoft data mining can help you in a practical business environment, which may include Oracle and SAS.
Presented at SQL Saturday 220, Atlanta, GA, 201305. If you have a SQL Server license (Standard or higher) then you already have the ability to start data mining. In this new presentation, you will see how to scale up data mining from the free Excel 2013 add-in to production use. Aimed at beginning to intermediate data miners, this presentation will show how mining models move from development to production. We will use SQL Server 2012 tools including SSMS, SSIS, and SSDT.
Applied Enterprise Semantic Mining -- Charlotte 201410 - Mark Tabladillo
Text mining is projected to dominate data mining, and the reasons are evident: we have more text available than numeric data. Microsoft introduced a new technology to SQL Server 2014 called Semantic Search. This session's detailed description and demos give you important information for the enterprise implementation of Tag Index and Document Similarity Index, and will also provide a comparison between what semantic search is and what Delve does. The demos include a web-based Silverlight application, and content documents from Wikipedia. We'll also look at strategy tips for how to best leverage the new semantic technology with existing Microsoft data mining.
Data-driven companies need to make their data easily accessible to those who analyze it. Many organizations have adopted Looker, using LookML on AWS with a centralized analytical database and a user-friendly interface that allows employees to ask and answer their own questions to make informed business decisions.
Join our webinar to learn how our customer, Casper, an online mattress retailer, made the switch from a transactional database to Looker analytics on Amazon Redshift. Looker on Amazon Redshift can help you greatly shorten your analytics lifecycle with a simplified infrastructure and rapid cloud scaling.
Join us to learn:
• How to utilize LookML to build reusable definitions and logic for your data
• Best practices for architecting a centralized analytical database
• How Casper leveraged Looker and Amazon Redshift to provide all their employees access to their data and metrics
Who should attend: Heads of Analytics, Heads of BI, Analytics Managers, BI Teams, Senior Analysts
Presented at SQL Saturday Atlanta, May 18, 2013.
Text mining is projected to dominate data mining, and the reasons are evident: we have more text available than numeric data. Microsoft introduced a new technology to SQL Server 2012 called Semantic Search. This session's detailed description and demos give you important information for the enterprise implementation of Tag Index and Document Similarity Index. The demos include a web-based Silverlight application, and content documents from Wikipedia. We'll also look at strategy tips for how to best leverage the new semantic technology with existing Microsoft data mining.
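Conceptually, a document similarity index scores pairs of documents by the overlap of their key terms. SQL Server computes this server-side (exposed through functions such as SEMANTICSIMILARITYTABLE); the sketch below only illustrates the underlying idea with a simple cosine similarity over term counts, using made-up documents.

```python
import math
import re
from collections import Counter

def terms(text):
    """Bag-of-words term counts for a document (a crude stand-in for key phrases)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

docs = {
    "doc1": "data mining with sql server",
    "doc2": "sql server data mining models",
    "doc3": "gardening tips for spring",
}

# Rank the other documents by similarity to doc1, most similar first.
scores = sorted(((cosine(terms(docs["doc1"]), terms(body)), name)
                 for name, body in docs.items() if name != "doc1"),
                reverse=True)
```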
How Data Virtualization Puts Enterprise Machine Learning Programs into Produc... - Denodo
Watch full webinar here: https://bit.ly/3offv7G
Presented at AI Live APAC
Advanced data science techniques, like machine learning, have proven an extremely useful tool for deriving valuable insights from existing data. Platforms like Spark, and rich libraries for R, Python, and Scala, put advanced techniques at the fingertips of data scientists. However, these data scientists spend most of their time looking for the right data and massaging it into a usable format. Data virtualization offers a new alternative to address these issues in a more efficient and agile way.
Watch this on-demand session to learn how companies can use data virtualization to:
- Create a logical architecture to make all enterprise data available for advanced analytics exercises
- Accelerate data acquisition and massaging, providing the data scientist with a powerful tool to complement their practice
- Integrate popular tools from the data science ecosystem: Spark, Python, Zeppelin, Jupyter, etc.
Applied Semantic Search with Microsoft SQL Server - Mark Tabladillo
Text mining is projected to dominate data mining, and the reasons are evident: we have more text available than numeric data. Microsoft introduced a new technology to SQL Server 2012 called Semantic Search. This session's detailed description and demos give you important information for the enterprise implementation of Tag Index and Document Similarity Index. The demos include a web-based Silverlight application, and content documents from Wikipedia. We'll also look at strategy tips for how to best leverage the new semantic technology with existing Microsoft data mining.
Data Warehouse Design and Best Practices - Ivo Andreev
A data warehouse is a database designed for query and analysis rather than for transaction processing. An appropriate design leads to scalable, balanced and flexible architecture that is capable to meet both present and long-term future needs. This session covers a comparison of the main data warehouse architectures together with best practices for the logical and physical design that support staging, load and querying.
Microsoft Technologies for Data Science 201612Mark Tabladillo
Delivered to SQL Saturday BI Edition -- Atlanta, GA
Microsoft provides several technologies in and around Azure which can be used for casual to serious data science. This presentation provides an overview of the major Microsoft options for on-premises, cloud-based, and hybrid data science. These technologies have been used by the presenter in various companies and industries, both as a Microsoft consultant and previously as an independent consultant. The speaker also provides insights into data science careers, which helps indicate where the business is likely to be for consultants and partners.
Denodo DataFest 2016: Data Science: Operationalizing Analytical Models in Rea...Denodo
Watch the full session: Denodo DataFest 2016 sessions: https://goo.gl/yVJnti
Data virtualization starts with democratizing data access for business users, but goes well beyond that to enable the entire analytics life cycle. This session will discuss the critical role of data virtualization in the four key phases of big data analytics: Discovery of raw and enriched data, Analytic Exploration, Real-time Operationalization, and Predictive Intervention.
In this session, you will learn:
• Design of advanced analytics with a view toward business goal realization
• The role of data virtualization in enabling analytics through four key phases
• How to exploit product capabilities relevant to each stage
• Creating a system of governed self-service and collaborative analytics
This session is part of the Denodo DataFest 2016 event. You can also watch more Denodo DataFest sessions on demand here: https://goo.gl/VXb6M6
Data Virtualization: Challenges, Uses & BenefitsDenodo
Watch full webinar here: https://bit.ly/3oah4ng
Gartner recently described data virtualization as a cornerstone of data integration architectures.
Discover:
- The benefits of a data virtualization platform
- Its growing range of uses: Lakehouse, Data Science, Big Data, Data Services & IoT
- How to create a unified view of your data assets without compromising on performance
- How to build an agile data integration architecture: on-premises, in the cloud, or hybrid
Similar to Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411 (20)
How to find low-cost or free data science resources 202006Mark Tabladillo
There are many free or low-cost resources to become better trained in data science. None of these options equals a formal degree, but short of that, these resources are helpful at least for keeping up with technology. This presentation will provide specific recommendations on free or low-cost resources based on the Team Data Science Process framework (business understanding, data engineering, modeling, deployment).
This presentation covers some of the major data science and AI announcements from the May 2020 Microsoft Build conference. Included in this talk are 1) Azure Synapse Link, 2) Responsible AI, 3) Project Bonsai & Project Moab, and 4) AI Models at Scale (deep learning with billions of parameters).
Microsoft has released Automated ML technologies for developers through ML.NET, Azure ML Service, and Azure Databricks. The presenter is a data scientist and Microsoft architect, and will give a comprehensive overview of the utility and use cases of this automated technology for production solutions. The presentation includes code you can try now.
Automated machine learning (automated ML) automates feature engineering, algorithm and hyperparameter selection to find the best model for your data. The mission: Enable automated building of machine learning with the goal of accelerating, democratizing and scaling AI. This presentation covers some recent announcements of technologies related to Automated ML, and especially for Azure. The demonstrations focus on Python with Azure ML Service and Azure Databricks.
ML.NET 1.0 release is the first major milestone of a great journey that started in May 2018 when we released ML.NET 0.1 as open source. ML.NET is an open-source and cross-platform machine learning framework for .NET developers. Using ML.NET, developers can leverage their existing tools and skillsets to develop and infuse custom AI into their applications by creating custom machine learning models for common scenarios like Sentiment Analysis, Recommendation, Image Classification and more.
This presentation provides an overview of the technology with demos run in a Deep Learning Virtual Machine running Windows Server 2016. Code examples are in C# and F# and run in Visual Studio Community 2019. This technology is ready for production implementation and runs on .NET Core.
This presentation is the first of four related to ML.NET and Automated ML. The presentation will be recorded with video posted to this YouTube Channel: http://bit.ly/2ZybKwI
Automated machine learning (automated ML) automates feature engineering, algorithm and hyperparameter selection to find the best model for your data. The mission: Enable automated building of machine learning with the goal of accelerating, democratizing and scaling AI.
This presentation covers some recent announcements of technologies related to Automated ML, and especially for Azure. The demonstrations focus on Python with Azure ML Service and Azure Databricks.
This presentation is the fourth of four related to ML.NET and Automated ML. The presentation will be recorded with video posted to this YouTube Channel: http://bit.ly/2ZybKwI
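The hyperparameter-selection half of automated ML can be sketched in a few lines of plain Python. This is an illustration of the general idea only, not the Azure ML or ML.NET API; the one-dimensional ridge model, the data, and the lambda grid are all invented for the example.

```python
# Illustrative sketch: automated hyperparameter selection by exhaustive
# search over a grid, scored on a held-out validation set. Real Automated
# ML services search far larger spaces and also select algorithms.

def fit_ridge_1d(xs, ys, lam):
    """Closed-form 1-D ridge regression: minimize sum((y - w*x)^2) + lam*w^2."""
    num = sum(x * y for x, y in zip(xs, ys))
    den = sum(x * x for x in xs) + lam
    return num / den

def mse(w, xs, ys):
    """Mean squared error of the fitted slope on a data set."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Noisy samples of y = 2x, split into train and validation sets.
train_x, train_y = [1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]
val_x, val_y = [5, 6], [10.1, 11.9]

# Grid search: try each candidate lambda, keep the lowest validation error.
best = min(
    ((lam, mse(fit_ridge_1d(train_x, train_y, lam), val_x, val_y))
     for lam in [0.0, 0.1, 1.0, 10.0]),
    key=lambda t: t[1],
)
print("best lambda:", best[0])
```

The same pattern scales up to what the automated services do: enumerate candidate configurations, train each, and keep the one that generalizes best on held-out data.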
NimbusML enables data scientists to use ML.NET to train models in Azure Machine Learning or anywhere else they use Python. NimbusML provides state-of-the-art ML algorithms, transforms and components, aiming to make them useful for all developers, data scientists, and information workers and helpful in all products, services and devices. The components are authored by the team members, as well as numerous contributors from MSR, CISL, Bing and other teams at Microsoft. NimbusML is interoperable with scikit-learn estimators and transforms, while adding a suite of highly optimized algorithms written in C++ and C# for speed and performance.
The trained machine learning model can be used in a .NET application with ML.NET. This presentation will outline the features of NimbusML and provide a notebook-based demonstration using Azure Notebooks.
This presentation is the third of four related to ML.NET and Automated ML. The presentation will be recorded with video posted to this YouTube Channel: http://bit.ly/2ZybKwI
201906 02 Introduction to AutoML with ML.NET 1.0Mark Tabladillo
ML.NET 1.0 release is the first major milestone of a great journey that started in May 2018 when we released ML.NET 0.1 as open source. ML.NET is an open-source and cross-platform machine learning framework for .NET developers. Using ML.NET, developers can leverage their existing tools and skillsets to develop and infuse custom AI into their applications by creating custom machine learning models for common scenarios like Sentiment Analysis, Recommendation, Image Classification and more.
“Automated ML” is a collection of new technologies from Microsoft to enhance the data science development process. Still in preview, Auto ML for ML.NET 1.0 will be demonstrated in a Deep Learning Virtual Machine running Windows Server 2016. Code examples are in C# and run in Visual Studio Community 2019.
This presentation is the second of four related to ML.NET and Automated ML. The presentation will be recorded with video posted to this YouTube Channel: http://bit.ly/2ZybKwI
This presentation focuses on the value proposition for Azure Databricks for Data Science. First, the talk includes an overview of the merits of Azure Databricks and Spark. Second, the talk includes demos of data science on Azure Databricks. Finally, the presentation includes some ideas for data science production.
201905 Azure Certification DP-100: Designing and Implementing a Data Science ...Mark Tabladillo
Microsoft has several Azure certifications including DP-100 (Designing and Implementing a Data Science Solution on Azure). Until this month, the exam had been in beta; however, the presenter has just passed the exam (first try). The purpose of this event is to share a viewpoint on how to study for the exam. Today, there are multiple ways to develop, deliver, and deploy R, Python, Spark, or deep learning models on Azure. The differences are important for this exam.
Big Data Advanced Analytics on Microsoft Azure 201904Mark Tabladillo
This talk summarizes key points for big data advanced analytics on Microsoft Azure. First, there is a review of the major technologies. Second, there is a series of technology demos (focusing on VMs, Databricks and Azure ML Service). Third, there is some advice on using the Team Data Science Process to help plan projects. The deck has web resources recommended. This presentation was delivered at the Global Azure Bootcamp 2019, Atlanta GA location (Alpharetta Avalon).
This presentation anchors best practices for Enterprise Data Science based on Microsoft's "Team Data Science Process". The talk introduces the concepts, describes some real-world advice for project planning, and discusses typical titles of professionals who make enterprise data science successful. These techniques also apply to AI (artificial intelligence), deep learning, machine learning, and advanced analytics.
Training of Python scikit-learn models on AzureMark Tabladillo
This intermediate-level presentation covers the latest Azure technology for deploying Python scikit-learn models on Azure. The presentation is a demo using a Microsoft Data Science Virtual Machine (DSVM), Visual Studio Code, Azure Machine Learning Service, Azure Machine Learning Compute, Azure Storage Blobs, and Azure Container Registry to train a model from a Python 3 Anaconda environment.
The presentation will include an architectural diagram and downloadable code from Github.
YouTube recording at https://www.youtube.com/watch?v=HyzbxHBpAbg&feature=youtu.be
Big Data Advanced Analytics on Microsoft AzureMark Tabladillo
This presentation provides a survey of the advanced analytics strengths of Microsoft Azure from an enterprise perspective (with these organizations being the bulk of big data users) based on the Team Data Science Process. The talk also covers the range of analytics and advanced analytics solutions available for developers using data science and artificial intelligence from Microsoft Azure.
Power BI has become an increasingly important data analytics tool. This presentation focuses on the advanced analytics options currently available in Power BI. Attendees to this talk will see:
· Microsoft’s perspective on advanced analytics development: the Team Data Science Process
· What the general options are for advanced analytics on Azure
· What the specific native advanced analytics capabilities are in Power BI
· Some ideas on pairing Power BI with other technologies in advanced analytics architectures
Microsoft Cognitive Toolkit (Atlanta Code Camp 2017)Mark Tabladillo
The Microsoft Cognitive Toolkit (CNTK) is a unified deep-learning toolkit that describes neural networks as a series of computational steps via a directed graph. In this directed graph, leaf nodes represent input values or network parameters, while other nodes represent matrix operations upon their inputs.
The objectives of this presentation are to 1) describe what CNTK is, 2) present a comparative evaluation with similar technologies, 3) outline potential applications, and 4) demonstrate the technology with Jupyter Python examples.
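The directed-graph idea can be sketched in a few lines of plain Python. This is not the CNTK API, just a hypothetical miniature showing how leaf nodes (inputs and parameters) and operation nodes compose into an evaluable graph.

```python
# Minimal sketch of the computational-graph idea CNTK is built on (not the
# CNTK API): leaf nodes hold input values or network parameters, interior
# nodes apply operations to their children, and evaluation walks the DAG.

class Node:
    def __init__(self, op=None, children=(), value=None):
        self.op, self.children, self.value = op, children, value

    def eval(self):
        if self.op is None:          # leaf: input value or network parameter
            return self.value
        return self.op(*[c.eval() for c in self.children])

# y = w * x + b, a one-neuron "network" expressed as a directed graph.
x = Node(value=3.0)                  # input leaf
w = Node(value=2.0)                  # parameter leaf
b = Node(value=1.0)                  # parameter leaf
mul = Node(op=lambda a, c: a * c, children=(w, x))
y = Node(op=lambda a, c: a + c, children=(mul, b))
print(y.eval())                      # 2*3 + 1 = 7.0
```

Real toolkits add automatic differentiation over this same graph structure so the parameter leaves can be trained.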
Machine learning services with SQL Server 2017Mark Tabladillo
SQL Server 2017 introduces Machine Learning Services with two independent technologies: R and Python. The purpose of this presentation is 1) to describe major features of this technology for technology managers; 2) to outline use cases for architects; and 3) to provide demos for developers and data scientists.
How Big Companies plan to use Our Big Data 201610Mark Tabladillo
Underneath the shiny popular apps on tablets, smartphones, and entertainment channels are typically large cloud-based data centers. App developers leverage the cloud to provide advertisers with targeted sales opportunities, which has driven an ongoing shift from paper to online media. This presentation will provide updated trends and statistics for 2016 on big data usage (based on consumer use), statistical concerns with big data, and the Microsoft big data story.
Delivered to SQL Saturday Columbus, GA
Microsoft provides several technologies which can be used for casual to serious data science. This presentation provides an authoritative overview of two major categories: products and services. The products include: SQL Server Analysis Services, Excel Add-in for SSAS, Semantic Search, SQL Server R Services, Microsoft R Technologies, and F#. The services include Cortana Intelligence and Bing Predicts. These technologies have been used by the presenter in various companies and industries, and he will be speaking toward how Microsoft uses these technologies today for its largest Azure customers.
Techniques to optimize the PageRank algorithm usually fall into two categories: reducing the work per iteration, and reducing the number of iterations. These goals are often at odds with one another. Skipping computation on vertices which have already converged can save iteration time. Skipping in-identical vertices (those with the same in-links) avoids duplicate computations and can also reduce iteration time. Road networks often have chains which can be short-circuited before PageRank computation, since the final ranks of chain nodes are easily calculated; this can reduce both the iteration time and the number of iterations. If a graph has no dangling nodes, the PageRank of each strongly connected component can be computed in topological order, which can reduce the iteration time and the number of iterations, and also enables multi-iteration concurrency in the computation. The combination of all of the above methods is the STICD algorithm [sticd]. For dynamic graphs, unchanged components whose ranks are unaffected can be skipped altogether.
Empowering the Data Analytics Ecosystem: A Laser Focus on Value
The data analytics ecosystem thrives when every component functions at its peak, unlocking the true potential of data. Here's a laser focus on key areas for an empowered ecosystem:
1. Democratize Access, Not Data:
Granular Access Controls: Provide users with self-service tools tailored to their specific needs, preventing data overload and misuse.
Data Catalogs: Implement robust data catalogs for easy discovery and understanding of available data sources.
2. Foster Collaboration with Clear Roles:
Data Mesh Architecture: Break down data silos by creating a distributed data ownership model with clear ownership and responsibilities.
Collaborative Workspaces: Utilize interactive platforms where data scientists, analysts, and domain experts can work seamlessly together.
3. Leverage Advanced Analytics Strategically:
AI-powered Automation: Automate repetitive tasks like data cleaning and feature engineering, freeing up data talent for higher-level analysis.
Right-Tool Selection: Strategically choose the most effective advanced analytics techniques (e.g., AI, ML) based on specific business problems.
4. Prioritize Data Quality with Automation:
Automated Data Validation: Implement automated data quality checks to identify and rectify errors at the source, minimizing downstream issues.
Data Lineage Tracking: Track the flow of data throughout the ecosystem, ensuring transparency and facilitating root cause analysis for errors.
5. Cultivate a Data-Driven Mindset:
Metrics-Driven Performance Management: Align KPIs and performance metrics with data-driven insights to ensure actionable decision making.
Data Storytelling Workshops: Equip stakeholders with the skills to translate complex data findings into compelling narratives that drive action.
Benefits of a Precise Ecosystem:
Sharpened Focus: Precise access and clear roles ensure everyone works with the most relevant data, maximizing efficiency.
Actionable Insights: Strategic analytics and automated quality checks lead to more reliable and actionable data insights.
Continuous Improvement: Data-driven performance management fosters a culture of learning and continuous improvement.
Sustainable Growth: Empowered by data, organizations can make informed decisions to drive sustainable growth and innovation.
By focusing on these precise actions, organizations can create an empowered data analytics ecosystem that delivers real value by driving data-driven decisions and maximizing the return on their data investment.
Explore our comprehensive data analysis project presentation on predicting product ad campaign performance. Learn how data-driven insights can optimize your marketing strategies and enhance campaign effectiveness. Perfect for professionals and students looking to understand the power of data analysis in advertising. for more details visit: https://bostoninstituteofanalytics.org/data-science-and-artificial-intelligence/
As Europe's leading economic powerhouse and the fourth-largest economy globally, Germany stands at the forefront of innovation and industrial might. Renowned for its precision engineering and high-tech sectors, Germany's economic structure is heavily supported by a robust service industry, accounting for approximately 68% of its GDP. This economic clout and strategic geopolitical stance position Germany as a focal point in the global cyber threat landscape.
In the face of escalating global tensions, particularly those emanating from geopolitical disputes with nations like Russia and China, Germany has witnessed a significant uptick in targeted cyber operations. Our analysis indicates a marked increase in cyberattack sophistication aimed at critical infrastructure and key industrial sectors. These attacks range from ransomware campaigns to Advanced Persistent Threats (APTs), threatening national security and business integrity.
🔑 Key findings include:
🔍 Increased frequency and complexity of cyber threats.
🔍 Escalation of state-sponsored and criminally motivated cyber operations.
🔍 Active dark web exchanges of malicious tools and tactics.
Our comprehensive report delves into these challenges, using a blend of open-source and proprietary data collection techniques. By monitoring activity on critical networks and analyzing attack patterns, our team provides a detailed overview of the threats facing German entities.
This report aims to equip stakeholders across public and private sectors with the knowledge to enhance their defensive strategies, reduce exposure to cyber risks, and reinforce Germany's resilience against cyber threats.
StarCompliance is a leading firm specializing in the recovery of stolen cryptocurrency. Our comprehensive services are designed to assist individuals and organizations in navigating the complex process of fraud reporting, investigation, and fund recovery. We combine cutting-edge technology with expert legal support to provide a robust solution for victims of crypto theft.
Our Services Include:
Reporting to Tracking Authorities:
We immediately notify all relevant centralized exchanges (CEX), decentralized exchanges (DEX), and wallet providers about the stolen cryptocurrency. This ensures that the stolen assets are flagged as scam transactions, making it impossible for the thief to use them.
Assistance with Filing Police Reports:
We guide you through the process of filing a valid police report. Our support team provides detailed instructions on which police department to contact and helps you complete the necessary paperwork within the critical 72-hour window.
Launching the Refund Process:
Our team of experienced lawyers can initiate lawsuits on your behalf and represent you in various jurisdictions around the world. They work diligently to recover your stolen funds and ensure that justice is served.
At StarCompliance, we understand the urgency and stress involved in dealing with cryptocurrency theft. Our dedicated team works quickly and efficiently to provide you with the support and expertise needed to recover your assets. Trust us to be your partner in navigating the complexities of the crypto world and safeguarding your investments.
Adjusting primitives for graph : SHORT REPORT / NOTESSubhajit Sahu
Graph algorithms like PageRank commonly operate on Compressed Sparse Row (CSR), an adjacency-list based graph representation.
Multiply with different modes (map)
1. Performance of sequential vs OpenMP-based vector multiply.
2. Comparing various launch configs for CUDA-based vector multiply.
Sum with different storage types (reduce)
1. Performance of vector element sum using float vs bfloat16 as the storage type.
Sum with different modes (reduce)
1. Performance of sequential vs OpenMP-based vector element sum.
2. Performance of memcpy-based vs in-place CUDA vector element sum.
3. Comparing various launch configs for CUDA-based vector element sum (memcpy).
4. Comparing various launch configs for CUDA-based vector element sum (in-place).
Sum with in-place strategies of CUDA mode (reduce)
1. Comparing various launch configs for CUDA-based vector element sum (in-place).
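The float-vs-bfloat16 storage comparison above is ultimately a question of accumulation precision. As a quick stdlib illustration of how the accumulation strategy alone changes a reduction's result (this is not the benchmark code itself):

```python
# Naive left-to-right summation lets rounding error accumulate, while
# math.fsum uses compensated (Shewchuk) summation and returns the
# correctly rounded sum of the same values.
import math

xs = [0.1] * 10
naive = sum(xs)       # left-to-right accumulation: rounding error survives
exact = math.fsum(xs) # compensated summation: exact here
print(naive == 1.0, exact == 1.0)  # False True
```

Lower-precision storage types like bfloat16 amplify exactly this effect, which is why the choice of storage and reduction strategy is benchmarked together.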
6. Definition
Data mining is the automated or semi-automated process of discovering patterns in data
Machine learning is the development and optimization of algorithms for automated or semi-automated pattern discovery
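As a minimal illustration of automated pattern discovery, here is a tiny 1-D k-means clusterer in plain Python. SQL Server's clustering algorithm is far more sophisticated; this invented example only shows structure being learned from data rather than hand-coded.

```python
# 1-D k-means with two clusters: alternately assign each point to its
# nearest center, then move each center to the mean of its points.

def kmeans_1d(points, c1, c2, iters=10):
    for _ in range(iters):
        a = [p for p in points if abs(p - c1) <= abs(p - c2)]
        b = [p for p in points if abs(p - c1) > abs(p - c2)]
        c1 = sum(a) / len(a)
        c2 = sum(b) / len(b)
    return c1, c2

# Two obvious groups; the algorithm discovers their centers on its own.
data = [1.0, 1.2, 0.8, 9.9, 10.1, 10.0]
centers = kmeans_1d(data, c1=0.0, c2=5.0)
print(centers)
```

No rule about where the groups lie was written anywhere in the code; the pattern emerges from the data, which is the definition of data mining given above.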
13. What / Why / How
Relational Data Warehouse
Why: Familiar way to store, fast retrieval, consistency, scalable
How: Database, relational constructs, indexes
Hadoop & HDInsight
Why: Large amounts, divide and conquer, analyzing unstructured data, flexible schema
How: Distributed computing
Tabular
Why: Fast calculations
How: In-memory, columns over rows
Multidimensional OLAP
Why: Slice and dice, ad hoc querying
How: Expands star schema into cube, preaggregated calculations
Data Mining & Machine Learning
Why: Patterns, predictions, high volume
How: Algorithms, estimations
18. Excel Data Mining Add-In
For Office 2007: The 32-bit data mining add-in works with SQL Server 2008 or 2008 R2:
http://www.microsoft.com/en-us/download/details.aspx?id=7294
For Office 2010: The 32- or 64-bit data mining add-in works with SQL Server 2012 or earlier:
http://www.microsoft.com/en-us/download/details.aspx?id=35578
For Office 2013: The 32- or 64-bit data mining add-in works with SQL Server 2012 or earlier:
http://www.microsoft.com/en-us/download/details.aspx?id=35578
19. Secret: Data Science provides an Epistemology
Data mining is part of a complete data science cycle
24. Gartner 2013
Magic Quadrant for Business Intelligence and Analytics Platforms
Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DZLPEH&ct=130207&st=sb – February 5, 2013
25. Gartner 2013
Magic Quadrant for Data Warehouse Database Management Systems
Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DU2VD4&ct=130131&st=sb – January 31, 2013
26. KDnuggets 2014: What Analytics, Big Data, Data Mining, Data Science software have you used in the past 12 months for a real project?
http://www.kdnuggets.com/2014/06/analytics-data-mining-data-science-software-poll-analyzed.html
27. KDnuggets 2014: What Analytics, Big Data, Data Mining, Data Science software have you used in the past 12 months for a real project? (continued)
http://www.kdnuggets.com/2014/06/analytics-data-mining-data-science-software-poll-analyzed.html
34. Data platform: SQL Server 2014
Database Services: SQL Server*, SQL Azure*, Replication, SQL Azure Data Sync*, Full Text & Semantic Search*
Data Integration Services: Integration Services*, Master Data Services*, Data Quality Services*, StreamInsight*, Project "Austin"*
Analytical Services: Analysis Services*, Data Mining, PowerPivot*
Reporting Services: Reporting Services*, SQL Azure Reporting*, Report Builder, Power View*
35. Secret: Microsoft offers two choices
SQL Server Analysis Services = SQL Server Data Mining
Microsoft Azure Machine Learning
36. Advanced analytic tools for data scientists
• Advanced descriptive analytics (e.g. clustering algorithm in SQL Server Analysis Services)
• Predictive analytics (Neural Nets, Regression, Decision Tree, Time Series, Naïve Bayes algorithms in SQL Server Analysis Services)
• Further advanced analytics (Semantic Search and geospatial data and functions in SQL Server 2012)
• Big Data analytics (Hadoop integration)
38. SSAS Data Mining Capacities
SQL Server 2014 Analysis Services object — maximum sizes/numbers:
Maximum data mining models per structure: 2^31-1 = 2,147,483,647
Maximum data mining structures per solution: 2^31-1 = 2,147,483,647
Maximum data mining structures per Analysis Services database: 2^31-1 = 2,147,483,647
Maximum data mining attributes (variables) per structure: 64K
Reference: http://www.marktab.net/datamining/index.php/2010/08/01/sql-server-data-mining-capacities-2008-r2/
41. Future: Most data is Text
• Quantitative research = data mining
• Qualitative research = text mining
Two Research Types
The future is combining both
42. Semantic Search indexing pipeline (iFilter required): documents flow through iFilters into the Full-Text Keyword Index ("FTI"); the semantic database then builds the Semantic Key Phrase Index / Tag Index ("TI") and the Semantic Document Similarity Index ("DSI").
43. Languages Currently Supported
Traditional Chinese
German
English
French
Italian
Brazilian Portuguese
Russian
Swedish
Simplified Chinese
British English
Portuguese
Chinese (Hong Kong SAR, PRC)
Spanish
Chinese (Singapore)
Chinese (Macau SAR)
45. Integrated Full Text Search (iFTS)
Improved Performance and Scale:
Scale up to 350M documents for storage and search
iFTS query performance 7-10 times faster than in SQL Server 2008
Worst-case iFTS query response times less than 3 sec for the corpus
Similar to or better than main database search competitors
(2012, Michael Rys, Microsoft)
46. Linear Scale of FTI/TI/DSI
First known linearly scaling end-to-end Search and Semantic product in the industry
Time in Seconds vs. Number of Documents
(2011 – K. Mukerjee, T. Porter, S. Gherman – Microsoft)
47. Text Mining References
Video
http://channel9.msdn.com/Shows/DataBound/DataBound-Episode-2-Semantic-Search
http://www.microsoftpdc.com/2009/SVR32
Semantic Search (Books Online) – explains the demo
http://msdn.microsoft.com/en-us/library/gg492075.aspx
Paper
http://users.cis.fiu.edu/~lzhen001/activities/KDD2011Program/docs/p213.pdf
49. Major Websites
SQL Server Data Mining
http://technet.microsoft.com/en-us/sqlserver/cc510301.aspx
http://www.sqlserverdatamining.com/
Microsoft Azure Machine Learning (currently in preview) http://azure.microsoft.com/en-us/services/machine-learning/
50. Software
DreamSpark (students); BizSpark (businesses)
SQL Server 2014 Enterprise (includes database engine, Analysis Services, SSMS and SSDT)
http://www.microsoft.com/en-us/server-cloud/products/sql-server/default.aspx
Microsoft Office
http://office.microsoft.com/en-us/
Primer on Power BI -- MarkTab
http://blogs.msdn.com/b/mvpawardprogram/archive/2014/08/04/primer-on-power-bi-business-intelligence.aspx
53. Conclusion
Excel data mining
Data Science provides an epistemology
Microsoft is an analytics competitor
Many already have Microsoft analytics
Microsoft offers two enterprise solutions
Semantic search scales linearly
54. Abstract
If you have a SQL Server license (Standard or higher) then you already have the ability to start data mining. In this new presentation, you will see how to scale up data mining from the free Excel 2013 add-in to production use. Aimed at beginning to intermediate data miners, this presentation will show how mining models move from development to production. We will use SQL Server 2014 tools including SSMS, SSIS, and SSDT.