Secrets of Enterprise Data Mining 201310

If you have a SQL Server license (Standard or higher), then you already have the ability to start data mining. In this new presentation, you will see how to scale up data mining from the free Excel 2013 add-in to production use. Aimed at beginning to intermediate data miners, this presentation will show how mining models move from development to production. We will use SQL Server 2012 tools, including SSMS, SSIS, and SSDT.

  1. Secrets of Enterprise Data Mining
     • Mark Tabladillo, Ph.D. (MVP, MCAD .NET, MCITP, MCT)
     • Silicon Valley Code Camp, October 6, 2013
  2. Networking
     • Interactive
  3. About MarkTab
     • Training and consulting: http://marktab.com
     • Data mining resources and blog: http://marktab.net
     • Ph.D., Industrial Engineering, Georgia Tech
     • Training and consulting internationally across many industries (SAS and Microsoft)
     • Contributed to peer-reviewed research and legislation
     • Mentoring doctoral dissertations at the accredited University of Phoenix
  4. Interactive
     • Name up to three things you want from enterprise data mining
  5. Definitions
     • What is data mining?
  6. Definition
     • Data mining is the automated or semi-automated process of discovering patterns in data.
     • Machine learning is the development and optimization of algorithms for automated or semi-automated pattern discovery.
  7. Purposes
     Phrase               Goal
     “Data Mining”        Inform actionable decisions
     “Machine Learning”   Determine the best-performing algorithm
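To make the slide 6 definition of data mining concrete, here is a minimal Python sketch of one kind of automated pattern discovery: finding item pairs that co-occur in at least a minimum number of transactions, a simplified version of the first step of association-rule mining. The baskets, the `frequent_pairs` helper, and the support threshold are hypothetical and chosen only for illustration; this is not the algorithm used by the Excel add-in or Analysis Services.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

def frequent_pairs(transactions: List[Set[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Hypothetical helper: return every item pair that appears together
    in at least `min_support` transactions, with its count."""
    counts: Counter = Counter()
    for basket in transactions:
        # sorted() makes pair generation deterministic within each basket
        for pair in combinations(sorted(basket), 2):
            counts[frozenset(pair)] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

# Hypothetical shopping baskets
baskets = [
    {"bike", "helmet", "lock"},
    {"bike", "helmet"},
    {"helmet", "gloves"},
    {"bike", "helmet", "gloves"},
]
patterns = frequent_pairs(baskets, min_support=2)
for pair, n in sorted(patterns.items(), key=lambda kv: -kv[1]):
    print(sorted(pair), n)
# prints:
# ['bike', 'helmet'] 3
# ['gloves', 'helmet'] 2
```

The "semi-automated" part of the definition shows up in the threshold: a person still chooses `min_support`, and a person still decides whether "bike with helmet" is an actionable pattern.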
  8. Secret: Excel data mining
     • Excel add-in for SQL Server data mining
  9. Split Personality of SSAS
     • SS: SQL
     • AS: NoSQL
  10. Excel Data Mining Add-In
     • For Office 2007: the 32-bit data mining add-in works with SQL Server 2008 or 2008 R2: http://www.microsoft.com/en-us/download/details.aspx?id=7294
     • For Office 2010: the 32- or 64-bit data mining add-in works with SQL Server 2012 or earlier: http://www.microsoft.com/en-us/download/details.aspx?id=35578
     • For Office 2013: the 32- or 64-bit data mining add-in works with SQL Server 2012 or earlier: http://www.microsoft.com/en-us/download/details.aspx?id=35578
  11. Secret: Give artists content
     • Data mining is part of a complete decision cycle
  12. MarkTab Decision Cycle
     • Analysis (science), Synthesis (art), GO
     • “Science needs science fiction.” (MarkTab)
  13. MarkTab Decision Cycle
     • Analysis (science), Synthesis (art), GO
  14. Currency of Science
     • Notes
  15. Secret: Microsoft is an analytics competitor
     • Industry comparisons, 2012-2013
  16. Gartner 2013 Magic Quadrant for Business Intelligence and Analytics Platforms
     • Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DZLPEH&ct=130207&st=sb (February 5, 2013)
  17. Gartner 2013 Magic Quadrant for Data Warehouse Database Management Systems
     • Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DU2VD4&ct=130131&st=sb (January 31, 2013)
  18. KDnuggets 2013
     • http://www.kdnuggets.com/2013/06/kdnuggets-annual-software-poll-rapidminer-r-vie-for-first-place.html
  19. SQL Server 2012 Business Intelligence and Business Analytics
  20. New Platform Options: Managed Services
     • The same stack (Applications, Data, Runtime, Middleware, Database, O/S, Virtualization, Servers, Storage, Networking) is shown under four models:
       • Platform (self-managed)
       • Infrastructure (as a Service)
       • Platform (as a Service)
       • Software (as a Service)
     • In the three “as a Service” models, part of the stack is delivered as managed services.
  21. SQL Release Timelines
     • SQL Server releases:
       • 1989: SQL Server 1.0 (OS/2)
       • 1991: SQL Server 1.1 (OS/2)
       • 1993: SQL Server 4.21 (NT)
       • 1995: SQL Server 6.0
       • 1996: SQL Server 6.5
       • 1998: SQL Server 7.0 (Dynamic Locking, Auto-Tuning, Full-text search, Replication, Analysis Services)
       • 2000: SQL Server 2000 (Reporting Services)
       • 2005: SQL Server 2005 (Unicode Support, Native XML, SQLCLR, Service Broker, Integration Services)
       • 2008: SQL Server 2008 (Sparse Columns, Spatial Types, FILESTREAM)
       • 2010: SQL Server 2008 R2 (Data-tier Apps, StreamInsight, PowerPivot, Master Data Services)
       • 2012: SQL Server 2012 (AlwaysOn, Columnstore, FileTable, Semantic Search, Power View)
     • SQL Azure releases:
       • Feb 2010: SQL Azure RTW
       • Feb 2010: SQL Azure SU1 RTW (Alter Edition)
       • Apr 2010: SQL Azure SU2 RTW (MARS)
       • Jun 2010: SQL Azure SU3 RTW (50 GB Db, Spatial Type, HierarchyId Type)
       • Jul 2010: DataSync CTP1
       • Aug 2010: SQL Azure SU4 RTW (Database Copy, Web Admin)
       • Nov 2010: DataMarket RTW; SQL Azure Reporting CTP1
       • Dec 2010: SQL Azure SU6 RTW; DataSync CTP2
       • Feb 2011: SQL Azure Reporting CTP2; DataSync CTP2 Update
       • Apr 2011: SQL Azure SU V.Next (Multiple Servers, Server Mgmt API, JDBC, DAC Upgrade)
       • Aug 2011: New Portal Experience; Sparse Columns; SQL Azure Reporting CTP3; SQL Azure DataSync CTP3; DAC Import/Export Service; Denali TSQL
  22. Secret: Many already have Microsoft analytics
     • Business Intelligence and Business Analytics are included with most SQL Server licenses
  23. Data Platform: SQL Server 2012
     • Database Services: SQL Server*, SQL Azure*, Replication, SQL Azure Data Sync*, Full Text & Semantic Search*
     • Data Integration Services: Integration Services*, Master Data Services*, Data Quality Services*, StreamInsight*, Project “Austin”*
     • Analytical Services: Analysis Services*, Data Mining, PowerPivot*
     • Reporting Services: Reporting Services*, SQL Azure Reporting*, Report Builder, Power View*
     • * New or improved in SQL Server 2012
  24. SQL Server 2012 Editions
     • Retrieved from http://www.microsoft.com/en-us/sqlserver/editions.aspx (February 2013)
  25. Secret: Microsoft offers three enterprise tools
     • All three tools support scaled data mining solutions
  26. What Enterprise Tools Support Microsoft Data Mining?
     • SSMS
     • SSIS
     • PowerShell
  27. Data Mining Capacities: SQL Server 2008 R2 Analysis Services
     • Maximum data mining models per structure: 2^31-1 = 2,147,483,647
     • Maximum data mining structures per solution: 2^31-1 = 2,147,483,647
     • Maximum data mining structures per Analysis Services database: 2^31-1 = 2,147,483,647
     • Maximum data mining attributes (variables) per structure: 64K
     • Reference: http://www.marktab.net/datamining/index.php/2010/08/01/sql-server-data-mining-capacities-2008-r2/
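The value 2^31-1 on slide 27 is the largest number a signed 32-bit integer can represent. That Analysis Services stores these counts as 32-bit integers is an assumption, not something the slide states; the Python check below only confirms the arithmetic, using `ctypes.c_int32` to show that the limit fits while one more does not.

```python
import ctypes

# The capacity figure from the slide: 2^31 - 1
limit = 2**31 - 1
print(f"{limit:,}")  # prints 2,147,483,647

# 2^31 - 1 is the largest value a signed 32-bit integer holds.
# ctypes does no overflow checking, so limit + 1 wraps to the minimum value.
assert ctypes.c_int32(limit).value == limit
assert ctypes.c_int32(limit + 1).value == -(2**31)
```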
  28. Semantic Search
     • Text Mining
  29. Future: Most Data Is Text
     • Two research types:
       • Quantitative research = data mining
       • Qualitative research = text mining
     • The future is combining both
  30. Full-Text Search Enhancements
     • Property search: search on tagged properties (such as author or title)
     • Customizable NEAR: find words or phrases close to one another
     • New word breakers and stemmers (for many languages)
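Customizable NEAR lets a query set how close two terms must be. In SQL Server 2012 T-SQL it is written inside `CONTAINS`, for example `CONTAINS(Body, 'NEAR((data, mining), 3)')`, where the number is a maximum distance between the terms and an optional third argument requires them to appear in order (the column name `Body` is hypothetical). The Python function below is a simplified, hypothetical model of that rule for two terms with a naive word breaker; it is not how SQL Server evaluates proximity internally.

```python
import re
from typing import List

def tokenize(text: str) -> List[str]:
    # Naive word breaker: lowercased runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())

def near(text: str, first: str, second: str, max_distance: int, match_order: bool = False) -> bool:
    """Hypothetical two-term proximity test: True if `first` and `second`
    occur with at most `max_distance` other tokens between them.
    When `match_order` is True, `first` must appear before `second`."""
    tokens = tokenize(text)
    first, second = first.lower(), second.lower()
    first_positions = [i for i, t in enumerate(tokens) if t == first]
    second_positions = [j for j, t in enumerate(tokens) if t == second]
    for i in first_positions:
        for j in second_positions:
            if i == j or (match_order and j < i):
                continue
            # Count the tokens strictly between the two matches
            if abs(j - i) - 1 <= max_distance:
                return True
    return False

sentence = "Data mining scales from Excel to SQL Server"
print(near(sentence, "data", "server", 6))  # True: six words lie between them
print(near(sentence, "data", "server", 5))  # False
print(near(sentence, "server", "data", 6, match_order=True))  # False: order reversed
```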
  31. Semantic indexing components (iFilter required)
     • Documents
     • iFilters
     • Full-Text Keyword Index (“FTI”)
     • Semantic Database
     • Semantic Key Phrase Index, or Tag Index (“TI”)
     • Semantic Document Similarity Index (“DSI”)
  32. Languages Currently Supported
     • Traditional Chinese; German; English; French; Italian; Brazilian; Russian; Swedish; Simplified Chinese; British English; Portuguese; Chinese (Hong Kong SAR, PRC); Spanish; Chinese (Singapore); Chinese (Macau SAR)
  33. Phases of Semantic Indexing
     • Full-Text Keyword Index (“FTI”)
     • Semantic Key Phrase Index, or Tag Index (“TI”)
     • Semantic Document Similarity Index (“DSI”)
     • http://msdn.microsoft.com/en-us/library/gg492085.aspx#SemanticIndexing
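The three indexes on slides 31 to 33 can be pictured with a small Python sketch. It is a conceptual analogue only: SQL Server builds its key phrase and similarity indexes with its own statistical model and the Semantic Database, and the TF-IDF weighting and cosine similarity used here are stand-ins chosen for illustration. All function names and the three-document corpus are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Set, Tuple

Vectors = Dict[str, Dict[str, float]]

def tokenize(text: str) -> List[str]:
    # Naive word breaker; SQL Server uses language-specific word breakers.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_keyword_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """FTI analogue: inverted index from each word to the documents containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in tokenize(text):
            index[word].add(doc_id)
    return dict(index)

def tfidf_vectors(docs: Dict[str, str]) -> Vectors:
    """Weight each word by term frequency times log(N / document frequency)."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n_docs = len(docs)
    doc_freq = Counter(word for c in counts.values() for word in c)
    return {
        doc_id: {w: tf * math.log(n_docs / doc_freq[w]) for w, tf in c.items()}
        for doc_id, c in counts.items()
    }

def build_tag_index(vectors: Vectors, top_k: int = 2) -> Dict[str, List[str]]:
    """TI analogue: the top_k highest-weighted words stand in for key phrases
    (ties broken alphabetically)."""
    return {
        doc_id: [w for w, _ in sorted(vec.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]]
        for doc_id, vec in vectors.items()
    }

def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_similarity_index(vectors: Vectors) -> Dict[Tuple[str, str], float]:
    """DSI analogue: cosine similarity for every pair of documents."""
    ids = sorted(vectors)
    return {(a, b): cosine(vectors[a], vectors[b]) for i, a in enumerate(ids) for b in ids[i + 1:]}

# Hypothetical three-document corpus
docs = {
    "d1": "data mining models in analysis services",
    "d2": "mining models deployed with integration services",
    "d3": "full text search over documents",
}
fti = build_keyword_index(docs)
vectors = tfidf_vectors(docs)
ti = build_tag_index(vectors)
dsi = build_similarity_index(vectors)
print(sorted(fti["mining"]))  # ['d1', 'd2']
print(ti["d1"])               # ['analysis', 'data']
print(dsi)
```

In this toy corpus, d1 and d2 share weighted words (mining, models, services) and so get a positive similarity, while d3 shares no words with either and scores 0. A production semantic index would also handle stopwords and multi-word phrases.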
  34. Secret: Semantic search scales linearly
     • Performance
  35. Integrated Full-Text Search (iFTS)
     • Improved performance and scale:
       • Scale up to 350M documents for storage and search
       • iFTS query performance 7-10 times faster than in SQL Server 2008
       • Worst-case iFTS query response times under 3 seconds for the corpus
       • Similar to or better than the main database search competitors
     • Source: Michael Rys, Microsoft, 2012
  36. Linear Scale of FTI/TI/DSI
     • First known linearly scaling end-to-end search and semantic product in the industry
     • Chart: time in seconds vs. number of documents
     • Source: K. Mukerjee, T. Porter, S. Gherman, Microsoft, 2011
  37. Text Mining References
     • Video: http://channel9.msdn.com/Shows/DataBound/DataBound-Episode-2-Semantic-Search
     • Video: http://www.microsoftpdc.com/2009/SVR32
     • Semantic Search (Books Online), which explains the demo: http://msdn.microsoft.com/en-us/library/gg492075.aspx
     • Paper: http://users.cis.fiu.edu/~lzhen001/activities/KDD2011Program/docs/p213.pdf
  38. Microsoft Resources
     • Links
  39. Software
     • SQL Server 2012 Enterprise (includes the database engine, Analysis Services, SSMS, and SSDT): http://www.microsoft.com/sqlserver/en/us/get-sql-server/try-it.aspx
     • Microsoft Office 2013 Professional: http://office.microsoft.com/en-us/try
  40. Organizations
     • Professional Association for SQL Server: http://www.sqlpass.org
     • PASS Business Analytics Conference: http://www.passbaconference.com
     • Microsoft TechEd North America: http://northamerica.msteched.com/
  41. Secret: More than just SQL Server
     • Microsoft continues to add machine learning technology
  42. Microsoft Offers
     • Bing Maps
     • Xbox Kinect (“hacker magnet”)
     • SQL Server 2012: Analysis Services (Multidimensional and Data Mining), Integration Services, Semantic Search
     • Hadoop partnership (therefore Mahout)
     • Excel
     • Projects from Microsoft Research
  43. Interactive
     • Takeaways
  44. Conclusion: Seven Secrets
     • Excel data mining
     • Give artists content
     • Microsoft is an analytics competitor
     • Many already have Microsoft analytics
     • Microsoft offers three enterprise tools
     • Semantic search scales linearly
     • More than just SQL Server
  45. Connect
     • Newsletter: http://eepurl.com/ELqS9
     • Data mining resources and blog: http://marktab.net
     • Data mining training and consulting (especially Microsoft and SAS): http://marktab.com