Secrets of Enterprise Data Mining: SQL Saturday Oregon 201411

If you have a SQL Server license (Standard or higher), then you already have the ability to start data mining. In this new presentation, you will see how to scale up data mining from the free Excel 2013 add-in to production use. Aimed at beginning to intermediate data miners, this presentation will show how mining models move from development to production. We will use SQL Server 2014 tools including SSMS, SSIS, and SSDT.

  1. Secrets of Enterprise Data Mining. Mark Tabladillo, Ph.D. (MVP, SAS Expert), Consultant, SolidQ. SQL Saturday Oregon, November 1, 2014
  2. Networking: Say hello
  3. Mark Tab: SQL Server MVP; SAS Expert. Consulting, training, teaching, presenting. LinkedIn; @MarkTabNet
  4. Interactive: Name (up to) three things you want from enterprise data mining
  5. Definitions: What is data mining?
  6. Definition: Data mining is the automated or semi-automated process of discovering patterns in data. Machine learning is the development and optimization of algorithms for automated or semi-automated pattern discovery.
  7. Purposes (phrase: goal)
       • “Data Mining”: inform actionable decisions
       • “Machine Learning”: determine the best-performing algorithm
  8. How could data mining apply? Let’s look at three companies.
  9. Telecommunications
  10. Oil and Gas
  11. Volkswagen Group
  12. What / Why / How
       • Relational Data Warehouse. Why: familiar way to store, fast retrieval, consistency, scalable. How: database, relational constructs, indexes.
       • Hadoop & HDInsight. Why: large amounts, divide and conquer, analyzing unstructured data, flexible schema. How: distributed computing.
       • Tabular. Why: fast calculations. How: in-memory, columns over rows.
       • Multidimensional OLAP. Why: slice and dice, ad hoc querying. How: expands star schema into cube, preaggregated calculations.
       • Data Mining & Machine Learning. Why: patterns, predictions, high volume. How: algorithms, estimations.
  13. Secret: Excel data mining. Excel add-in for SQL Server data mining
  14. Data mining add-in for business analysts
       • Ease of use
       • Rich data mining
       • Scalable
  15. Split Personality of SSAS: SS (SQL), AS (NoSQL)
  16. Excel Data Mining Add-In
       • For Office 2007: the 32-bit data mining add-in works with SQL Server 2008 or 2008 R2: http://www.microsoft.com/en-us/download/details.aspx?id=7294
       • For Office 2010: the 32- or 64-bit data mining add-in works with SQL Server 2012 or earlier: http://www.microsoft.com/en-us/download/details.aspx?id=35578
       • For Office 2013: the 32- or 64-bit data mining add-in works with SQL Server 2012 or earlier: http://www.microsoft.com/en-us/download/details.aspx?id=35578
  17. Secret: Data science provides an epistemology. Data mining is part of a complete data science cycle.
  18. MarkTab Decision Cycle: Analysis (science), Synthesis (art), GO. “Science needs science fiction.” (MarkTab)
  19. MarkTab Decision Cycle: Analysis (science), Synthesis (art), GO
  20. Currency of Science: Notes
  21. Secret: Microsoft is an analytics competitor
  22. Gartner 2013 Magic Quadrant for Business Intelligence and Analytics Platforms (February 5, 2013). Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DZLPEH&ct=130207&st=sb
  23. Gartner 2013 Magic Quadrant for Data Warehouse Database Management Systems (January 31, 2013). Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DU2VD4&ct=130131&st=sb
  24. KDnuggets 2014: “What Analytics, Big Data, Data mining, Data Science software you used in the past 12 months for a real project?” http://www.kdnuggets.com/2014/06/analytics-data-mining-data-science-software-poll-analyzed.html
  25. KDnuggets 2014: “What Analytics, Big Data, Data mining, Data Science software you used in the past 12 months for a real project?” http://www.kdnuggets.com/2014/06/analytics-data-mining-data-science-software-poll-analyzed.html
  26. KDnuggets 2014: http://www.kdnuggets.com/2014/08/four-main-languages-analytics-data-mining-data-science.html
  27. KDnuggets 2014: http://www.kdnuggets.com/2014/08/four-main-languages-analytics-data-mining-data-science.html
  28. SQL Server 2014 Business Intelligence and Business Analytics
  29. Secret: Many already have Microsoft analytics. Business intelligence and business analytics are included with most SQL Server licenses.
  30. Evolution of BI: Self-service BI, Corporate BI
  31. Evolution of BI: Niche Startups, Self-service BI, Corporate BI
  32. Data platform: SQL Server 2014
       • Database Services: SQL Server*, SQL Azure*, Replication, SQL Azure Data Sync*, Full Text & Semantic Search*
       • Data Integration Services: Integration Services*, Master Data Services*, Data Quality Services*, StreamInsight*, Project “Austin”*
       • Analytical Services: Analysis Services*, Data Mining, PowerPivot*
       • Reporting Services: Reporting Services*, SQL Azure Reporting*, Report Builder, Power View*
  33. Secret: Microsoft offers two choices: SQL Server Analysis Services (= SQL Server Data Mining) and Microsoft Azure Machine Learning
  34. Advanced analytic tools for data scientists
       • Advanced descriptive analytics (e.g., clustering algorithm in SQL Server Analysis Services)
       • Predictive analytics (Neural Nets, Regression, Decision Tree, Time Series, Naïve Bayes algorithms in SQL Server Analysis Services)
       • Further advanced analytics (Semantic Search and Geospatial Data and functions in SQL Server 2012)
       • Big Data analytics (Hadoop integration)
  35. What enterprise tools support SSAS Data Mining? SSMS, SSIS, PowerShell
  36. SSAS Data Mining Capacities, SQL Server 2014 Analysis Services (object: maximum size/number)
       • Maximum data mining models per structure: 2^31-1 = 2,147,483,647
       • Maximum data mining structures per solution: 2^31-1 = 2,147,483,647
       • Maximum data mining structures per Analysis Services database: 2^31-1 = 2,147,483,647
       • Maximum data mining attributes (variables) per structure: 64K
       Reference: http://www.marktab.net/datamining/index.php/2010/08/01/sql-server-data-mining-capacities-2008-r2/
  37. Microsoft Azure Machine Learning brings engineering best practices to data science…
       • Archive for predictive models, ensuring models are not lost, deleted, or corrupted
       • Search, discovery, and reuse of existing models to build on the work of others
       • Deploy predictive models into operation from DataLab to minimize time to insight
       • Frequently update the predictive model to adapt to changing business conditions
       Every new algorithm added as a module and every new predictive model deployed will flow to build up the knowledge base and make the software more valuable.
  38. Semantic Search: Text Mining
  39. Future: Most data is text. Two research types:
       • Quantitative research = data mining
       • Qualitative research = text mining
       The future is combining both.
  40. Documents (iFilter required), iFilters, Full-Text Keyword Index (“FTI”), Semantic Document Similarity Index (“DSI”), Semantic Database, Semantic Key Phrase Index, also called Tag Index (“TI”)
  41. Languages currently supported: Traditional Chinese, German, English, French, Italian, Brazilian, Russian, Swedish, Simplified Chinese, British English, Portuguese, Chinese (Hong Kong SAR, PRC), Spanish, Chinese (Singapore), Chinese (Macau SAR)
  42. Secret: Semantic search scales linearly. Performance
  43. Integrated Full Text Search (iFTS): improved performance and scale
       • Scale up to 350M documents for storage and search
       • iFTS query performance 7-10 times faster than in SQL Server 2008
       • Worst-case iFTS query response times less than 3 sec for corpus
       • Similar or better than main database search competitors
       (2012, Michael Rys, Microsoft)
  44. Linear Scale of FTI/TI/DSI: first known linearly scaling end-to-end search and semantic product in the industry. [Chart: time in seconds vs. number of documents (2011, K. Mukerjee, T. Porter, S. Gherman, Microsoft)]
  45. Text Mining References
       • Video: http://channel9.msdn.com/Shows/DataBound/DataBound-Episode-2-Semantic-Search and http://www.microsoftpdc.com/2009/SVR32
       • Semantic Search (Books Online), explains the demo: http://msdn.microsoft.com/en-us/library/gg492075.aspx
       • Paper: http://users.cis.fiu.edu/~lzhen001/activities/KDD2011Program/docs/p213.pdf
  46. Microsoft Resources: Links
  47. Major Websites
       • SQL Server Data Mining: http://technet.microsoft.com/en-us/sqlserver/cc510301.aspx and http://www.sqlserverdatamining.com/
       • Microsoft Azure Machine Learning (currently in preview): http://azure.microsoft.com/en-us/services/machine-learning/
  48. Software
       • DreamSpark (students); BizSpark (businesses)
       • SQL Server 2014 Enterprise (includes database engine, Analysis Services, SSMS, and SSDT): http://www.microsoft.com/en-us/server-cloud/products/sql-server/default.aspx
       • Microsoft Office: http://office.microsoft.com/en-us/
       • Primer on Power BI (MarkTab): http://blogs.msdn.com/b/mvpawardprogram/archive/2014/08/04/primer-on-power-bi-business-intelligence.aspx
  49. Organizations
       • Professional Association for SQL Server: http://www.sqlpass.org
       • PASS Business Analytics Conference: http://www.passbaconference.com
  50. Interactive: Takeaways
  51. Conclusion
       • Excel data mining
       • Data science provides an epistemology
       • Microsoft is an analytics competitor
       • Many already have Microsoft analytics
       • Microsoft offers two enterprise solutions
       • Semantic search scales linearly
  52. Abstract: If you have a SQL Server license (Standard or higher), then you already have the ability to start data mining. In this new presentation, you will see how to scale up data mining from the free Excel 2013 add-in to production use. Aimed at beginning to intermediate data miners, this presentation will show how mining models move from development to production. We will use SQL Server 2014 tools including SSMS, SSIS, and SSDT.
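
Slide 7 separates the goal of data mining (inform actionable decisions) from the goal of machine learning (determine the best-performing algorithm). The Python sketch below illustrates the second goal under simple, hypothetical assumptions: each candidate model is scored on the same labeled hold-out set, accuracy is the only metric, and the churn rows, feature names (`tenure`, `calls`), and candidate rules are invented for illustration. It is not a SQL Server tool or API.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]
Model = Callable[[Row], int]

def accuracy(model: Model, test_set: List[Tuple[Row, int]]) -> float:
    """Fraction of hold-out rows whose predicted label matches the true label."""
    correct = sum(1 for row, label in test_set if model(row) == label)
    return correct / len(test_set)

def best_model(candidates: Dict[str, Model],
               test_set: List[Tuple[Row, int]]) -> Tuple[str, float]:
    """Score every candidate on the same hold-out set and return the winner."""
    scores = {name: accuracy(model, test_set) for name, model in candidates.items()}
    winner = max(scores, key=scores.get)
    return winner, scores[winner]

# Hypothetical hold-out data: (features, churned?) pairs.
test_set = [
    ({"tenure": 2, "calls": 9}, 1),
    ({"tenure": 30, "calls": 1}, 0),
    ({"tenure": 5, "calls": 7}, 1),
    ({"tenure": 24, "calls": 3}, 0),
]

# Hypothetical candidates: a constant baseline and a one-feature rule.
candidates: Dict[str, Model] = {
    "always_stay": lambda r: 0,                       # always predicts "no churn"
    "short_tenure": lambda r: int(r["tenure"] < 12),  # churn if tenure under 12 months
}

print(best_model(candidates, test_set))  # ('short_tenure', 1.0)
```

Scoring every candidate against the same hold-out rows is what makes the comparison fair; a production comparison would normally use more data and more than one metric.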
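
Slide 12 describes multidimensional OLAP as expanding a star schema into a cube with preaggregated calculations. The sketch below shows the idea on a hypothetical fact table: it precomputes `SUM(sales)` for every subset of the dimensions, so slice-and-dice queries become lookups. This is an assumption-laden toy: real OLAP engines such as SSAS typically precompute only a chosen subset of aggregations rather than every combination.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Tuple

# Hypothetical fact rows: dimension keys plus one measure (sales).
facts = [
    {"year": 2013, "region": "West", "product": "Phone", "sales": 100},
    {"year": 2013, "region": "East", "product": "Phone", "sales": 80},
    {"year": 2014, "region": "West", "product": "Tablet", "sales": 50},
    {"year": 2014, "region": "West", "product": "Phone", "sales": 120},
]
DIMENSIONS = ("year", "region", "product")

def build_cube(rows, dims) -> Dict[Tuple[str, ...], Dict[tuple, float]]:
    """Precompute SUM(sales) for every subset of dimensions, including the grand total."""
    cube: Dict[Tuple[str, ...], Dict[tuple, float]] = {}
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            totals: Dict[tuple, float] = defaultdict(float)
            for row in rows:
                key = tuple(row[d] for d in group)  # () for the grand-total group
                totals[key] += row["sales"]
            cube[group] = dict(totals)
    return cube

cube = build_cube(facts, DIMENSIONS)

# Slice-and-dice queries are now dictionary lookups instead of scans.
print(cube[("region",)][("West",)])                # 270.0
print(cube[("year", "product")][(2014, "Phone")])  # 120.0
print(cube[()][()])                                # 350.0
```

With three dimensions there are 2^3 = 8 groupings; that exponential growth is why engines choose which aggregations to store.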
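
Slide 34 lists clustering as the advanced descriptive analytics available in SQL Server Analysis Services. As a rough illustration of what a clustering algorithm does, here is plain k-means in Python. This is an assumption-based sketch, not the SSAS Microsoft Clustering implementation (which can be configured for EM or K-means); the customer points and starting centroids are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def kmeans(points: List[Point], centroids: List[Point],
           iterations: int = 10) -> Tuple[List[Point], List[int]]:
    """Plain k-means: alternate an assignment step and a centroid-update step."""
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid (Euclidean distance).
        labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of the points assigned to it.
        updated: List[Point] = []
        for c, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                updated.append((sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members)))
            else:
                updated.append(old)  # leave an empty cluster's centroid where it was
        centroids = updated
    return centroids, labels

# Hypothetical customers: (monthly minutes in hundreds, support calls).
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
centroids, labels = kmeans(points, centroids=[(0, 0), (10, 10)])
print(labels)  # [0, 0, 0, 1, 1, 1]
print(centroids)
```

The result is descriptive rather than predictive: it groups similar rows, and an analyst then decides what each group means.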
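
Slide 36 quotes 2^31-1 = 2,147,483,647 as the limit for several data mining objects. That value is the largest signed 32-bit integer; the sketch below checks the arithmetic and shows where a signed 32-bit value overflows. Reading the limit as a 32-bit counter is an assumption made for illustration, not something the slide states, and `fits_in_int32` is a hypothetical helper.

```python
import struct

MAX_OBJECTS = 2**31 - 1  # limit quoted on the slide

def fits_in_int32(n: int) -> bool:
    """Hypothetical helper: True if n can be packed as a signed 32-bit integer."""
    try:
        struct.pack("<i", n)
        return True
    except struct.error:
        return False

print(f"{MAX_OBJECTS:,}")              # 2,147,483,647
print(fits_in_int32(MAX_OBJECTS))      # True
print(fits_in_int32(MAX_OBJECTS + 1))  # False
```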
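
Slide 40 names three indexes: a full-text keyword index (FTI), a key phrase or tag index (TI), and a document similarity index (DSI). The toy sketch below builds simplified stand-ins for each over three hypothetical documents. Assumptions: tokenization is a lowercase regex, "key phrases" are just the most frequent terms, and similarity is cosine similarity of raw term counts. SQL Server's semantic search relies on the Semantic Language Statistics database and statistically significant key phrases, so this is not its algorithm.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Set

# Hypothetical documents.
docs = {
    1: "Data mining discovers patterns in data",
    2: "Text mining discovers patterns in documents",
    3: "Cubes preaggregate sales data",
}

def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())

# Keyword index (cf. "FTI"): term -> ids of documents containing it.
keyword_index: Dict[str, Set[int]] = defaultdict(set)
term_counts: Dict[int, Counter] = {}
for doc_id, text in docs.items():
    tokens = tokenize(text)
    term_counts[doc_id] = Counter(tokens)
    for token in tokens:
        keyword_index[token].add(doc_id)

# Key phrase index (cf. "TI"): here simply the most frequent terms per document.
def top_terms(doc_id: int, n: int = 1) -> List[str]:
    return [term for term, _ in term_counts[doc_id].most_common(n)]

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Document similarity (cf. "DSI"): the other document with the most similar term vector.
def most_similar(doc_id: int) -> int:
    others = [d for d in docs if d != doc_id]
    return max(others, key=lambda d: cosine(term_counts[doc_id], term_counts[d]))

print(sorted(keyword_index["mining"]))  # [1, 2]
print(top_terms(1))                     # ['data']
print(most_similar(1))                  # 2
```

The keyword index answers "which documents contain this word?", while the similarity step answers "which documents are about the same thing?", which is the distinction the slide draws between full-text and semantic search.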

×