Secrets of Enterprise Data Mining

If you have a SQL Server license (Standard or higher) then you already have the ability to start data mining. In this new presentation, you will see how to scale up data mining from the free Excel 2013 add-in to production use. Aimed at beginning to intermediate data miners, this presentation will show how mining models move from development to production. We will use SQL Server 2012 tools including SSMS, SSIS, and SSDT. The technology includes SQL Server 2012 SP1, Office 2013, and Windows 8 Professional.


  1. Secrets of Enterprise Data Mining. Mark Tabladillo, Ph.D. (MVP, MCAD .NET, MCITP, MCT). PASS SQL Saturday #177, Mountain View, CA. February 23, 2013
  2. Networking: Interactive
  3. About MarkTab. Training and Consulting with http://marktab.com; Data Mining Resources and Blog at http://marktab.net; Ph.D. – Industrial Engineering, Georgia Tech; Training and consulting internationally across many industries – SAS and Microsoft; Contributed to peer-reviewed research and legislation; Mentoring doctoral dissertations at the accredited University of Phoenix; Presenter
  4. Interactive: Name (up to) three things you want from enterprise data mining
  5. Secret: Excel data mining. Excel add-in for SQL Server data mining
  6. Secret: More than just SQL Server. Microsoft continues to add machine learning technology
  7. Microsoft Offers: Bing Maps; Xbox Kinect Hacker Magnet; SQL Server 2012 Analysis Services (Multidimensional and Data Mining); Integration Services; Semantic Search; Hadoop Partnership; Excel; Projects from Microsoft Research
  8. Definitions: What is data mining?
  9. Definition. Data mining is the automated or semi-automated process of discovering patterns in data. Machine learning is the development and optimization of algorithms for automated or semi-automated pattern discovery.
  10. Purposes (Phrase: Goal). “Data Mining”: Inform actionable decisions. “Machine Learning”: Determine best performing algorithm.
  11. Secret: Give artists art. Data mining is part of a complete decision cycle
  12. MarkTab Decision Cycle: GO; Synthesis (art); Analysis (science). "Science needs science fiction" -- MarkTab
  13. MarkTab Decision Cycle: GO; Synthesis (art); Analysis (science)
  14. XKCD: Shopping Teams
  15. XKCD: Shopping Teams
  16. XKCD: Shopping Teams
  17. Secret: Microsoft is an analytics competitor. Industry Comparisons 2012-2013
  18. Gartner 2013 Magic Quadrant for Business Intelligence and Analytics Platforms. Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DZLPEH&ct=130207&st=sb – February 5, 2013
  19. Gartner 2013 Magic Quadrant for Data Warehouse Database Management Systems. Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DU2VD4&ct=130131&st=sb – January 31, 2013
  20. KDNuggets 2012: http://marktab.net/datamining/2012/06/15/excel-number-commercial-tool-analytics-data-mining-big-data/
  21. SQL Server 2012: Business Intelligence and Business Analytics
  22. New Platform options: managed services. [Diagram: four columns, Platform (Self Managed), Infrastructure (as a Service), Platform (as a Service), and Software (as a Service). Each column stacks the same layers: Applications, Data, Runtime, Middleware, Database, O/S, Virtualization, Servers, Storage, Networking. Three "Managed Services" labels appear alongside the stacks.]
  23. SQL Release timelines. [Timeline figure, 1989 to 2012: SQL Server 1.0 (OS/2), 1989; SQL Server 1.1 (OS/2), 1991; SQL Server 4.21 (NT), 1993; SQL Server 6.0, 1995; SQL Server 6.5, 1996; SQL Server 7.0, 1998; SQL Server 2000, 2000; SQL Server 2005, 2005; SQL Server 2008, 2008; SQL Server 2008 R2, 2010; SQL Server 2012, 2012. Feature callouts on the figure include AlwaysOn, Columnstore, FileTable, Semantic Search, Power View, Sparse Columns, Spatial Types, FILESTREAM, Reporting Services, Dynamic Locking, Auto-Tuning, Full-text search, Replication, Analysis Services, Unicode Support, Native XML, SQLCLR, Service Broker, Integration Services, Data-tier Apps, StreamInsight, PowerPivot, and Master Data Services. A second timeline covers SQL Azure releases from 2010 to 2011, including SQL Azure RTW, service updates SU1, SU2, SU3, SU4, and SU6, SQL Azure Reporting CTP1 to CTP3, DataSync CTP1 to CTP3, and DataMarket RTW.]
  24. Secret: Many already have Microsoft analytics. Business Intelligence and Business Analytics are included with most SQL Server licenses
  25. Data platform: SQL Server 2012. Columns: Database Services; Data Integration Services; Analytical Services; Reporting Services. Items shown: SQL Server*, Integration Services*, Reporting Services*, Analysis Services*, SQL Azure*, SQL Azure Reporting*, Master Data Services*, Replication, Data Mining, Report Builder, SQL Azure Data Sync*, Data Quality Services*, Full Text & Semantic Search*, StreamInsight*, PowerPivot*, Power View*, Project “Austin”*. (* New / improved in SQL Server 2012)
  26. SQL Server 2012 Editions. Retrieved from http://www.microsoft.com/en-us/sqlserver/editions.aspx -- February 2013
  27. Secret: Microsoft offers three enterprise tools. All three tools support scaled solutions
  28. What Enterprise Tools support Microsoft Data Mining? [Diagram labels: Data Mining, SSMS, SSIS, PowerShell]
  29. [Figure: rows labeled Discretized, Discretized, Continuous, and Discrete against Variable values 0 to 7]
  30. [Figure: rows labeled Discretized, Discretized, Continuous, and Discrete against Variable values 0 to 7]
  31. [Figure: rows labeled Discretized, Discretized, Continuous, and Discrete against Variable values 0 to 7]
  32. [Figure: rows labeled Discretized, Discretized, Continuous, and Discrete against Variable values 0 to 7]
  33. Data Mining Capacities, SQL Server 2008 R2 Analysis Services (Object: Maximum sizes/numbers)
      Maximum data mining models per structure: 2^31-1 = 2,147,483,647
      Maximum data mining structures per solution: 2^31-1 = 2,147,483,647
      Maximum data mining structures per Analysis Services database: 2^31-1 = 2,147,483,647
      Maximum data mining attributes (variables) per structure: 2^31-1 = 2,147,483,647
      Reference: http://www.marktab.net/datamining/index.php/2010/08/01/sql-server-data-mining-capacities-2008-r2/
  34. Semantic Search: Text Mining
  35. Future: Most data is Text. Two Research Types: quantitative research = data mining; qualitative research = text mining. The future is combining both.
  36. Full-Text Search Enhancements. Property search: search on tagged properties (such as author or title). Customizable NEAR: find words or phrases close to one another. New Word Breakers and Stemmers (for many languages).
  37. [Diagram labels: Documents; iFilters (iFilter Required); Full-Text Keyword Index “FTI”; Semantic Key Phrase Index – Tag Index “TI”; Semantic Document Similarity Index “DSI”; Semantic Database]
  38. Languages Currently Supported: Traditional Chinese, Simplified Chinese, German, British English, English, Portuguese, French, Chinese (Hong Kong SAR, PRC), Italian, Spanish, Brazilian, Chinese (Singapore), Russian, Chinese (Macau SAR), Swedish
  39. Phases of Semantic Indexing: Full Text Keyword Index “FTI”; Semantic Document Similarity Index “DSI”; Semantic Key Phrase Index – Tag Index “TI”. http://msdn.microsoft.com/en-us/library/gg492085.aspx#SemanticIndexing
  40. Secret: Semantic Search scales linearly. Performance
  41. Integrated Full Text Search (iFTS). Improved Performance and Scale: scale-up to 350M documents for storage and search; iFTS query performance 7-10 times faster than in SQL Server 2008; worst-case iFTS query response times less than 3 sec for corpus; similar or better than main database search competitors. (2012, Michael Rys, Microsoft)
  42. Linear Scale of FTI/TI/DSI. First known linearly scaling end-to-end Search and Semantic product in the industry. [Chart: Time in Seconds vs. Number of Documents] (2011 – K. Mukerjee, T. Porter, S. Gherman – Microsoft)
  43. Text Mining References. Video: http://channel9.msdn.com/Shows/DataBound/DataBound-Episode-2-Semantic-Search and http://www.microsoftpdc.com/2009/SVR32. Semantic Search (Books Online), explains the demo: http://msdn.microsoft.com/en-us/library/gg492075.aspx. Paper: http://users.cis.fiu.edu/~lzhen001/activities/KDD2011Program/docs/p213.pdf
  44. Microsoft Resources: Links
  45. Software. SQL Server 2012 Enterprise (includes database engine, Analysis Services, SSMS and SSDT): http://www.microsoft.com/sqlserver/en/us/get-sql-server/try-it.aspx. Microsoft Office 2013 Professional: http://office.microsoft.com/en-us/try
  46. Organizations. Professional Association for SQL Server: http://www.sqlpass.org. Atlanta MDF: http://www.atlantamdf.com/. Atlanta Microsoft BI Users Group: http://www.meetup.com/Atlanta-Microsoft-Business-Intelligence-Users/. PASS Business Analytics Conference: http://www.passbaconference.com. Microsoft TechEd North America: http://northamerica.msteched.com/
  47. Interactive: Takeaways
  48. Conclusion: Seven Secrets. Excel data mining; More than just SQL Server; Success involves everyone; Microsoft is an analytics competitor; Many already have Microsoft analytics; Microsoft offers three enterprise tools; Semantic search scales linearly
  49. Connect. Data Mining Resources and blog: http://marktab.net. Data Mining Training and Consulting (especially Microsoft and SAS): http://marktab.com
  50. Abstract. If you have a SQL Server license (Standard or higher) then you already have the ability to start data mining. In this new presentation, you will see how to scale up data mining from the free Excel 2013 add-in to production use. Aimed at beginning to intermediate data miners, this presentation will show how mining models move from development to production. We will use SQL Server 2012 tools including SSMS, SSIS, and SSDT.
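
The capacity limits on slide 33 all share one value, 2^31-1, which is also the largest signed 32-bit integer (the slide itself does not say why the limit takes this value). As a minimal sketch of that arithmetic, written in Python purely for illustration, the figure can be checked as follows. The names INT32_MAX, CAPACITIES, and format_limit are hypothetical and are not part of any SQL Server or Analysis Services API.

```python
# Illustrative check of the capacity figure quoted on slide 33.
# All names here are hypothetical; nothing below talks to SQL Server.

INT32_MAX = 2**31 - 1  # largest signed 32-bit integer

# Object names copied from the slide; every limit is the same value.
CAPACITIES = {
    "data mining models per structure": INT32_MAX,
    "data mining structures per solution": INT32_MAX,
    "data mining structures per Analysis Services database": INT32_MAX,
    "data mining attributes (variables) per structure": INT32_MAX,
}


def format_limit(value: int) -> str:
    """Render a limit with thousands separators, as on the slide."""
    return f"{value:,}"


if __name__ == "__main__":
    for name, limit in CAPACITIES.items():
        print(f"Maximum {name}: 2^31-1 = {format_limit(limit)}")
```

Running the script prints the four rows of the slide's table, which makes it easy to see that every listed limit is the same number.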