Secrets of Enterprise Data Mining 201305

836 views

Published on

Presented at SQL Saturday 220, Atlanta, GA, 201305. If you have a SQL Server license (Standard or higher) then you already have the ability to start data mining. In this new presentation, you will see how to scale up data mining from the free Excel 2013 add-in to production use. Aimed at beginning to intermediate data miners, this presentation will show how mining models move from development to production. We will use SQL Server 2012 tools including SSMS, SSIS, and SSDT.

Published in: Business
1 Comment
1 Like
Statistics
Notes
No Downloads
Views
Total views
836
On SlideShare
0
From Embeds
0
Number of Embeds
165
Actions
Shares
0
Downloads
14
Comments
1
Likes
1
Embeds 0
No embeds

No notes for slide

Secrets of Enterprise Data Mining 201305

  1. 1. Secrets of EnterpriseData MiningMark Tabladillo, Ph.D. (MVP, MCAD .NET, MCITP, MCT)PASS SQL Saturday #220 Atlanta GAMay 18, 2013
  2. 2. NetworkingInteractive
  3. 3. About MarkTabTraining and Consulting withhttp://marktab.comData Mining Resources and Blog athttp://marktab.netTwitter @marktabnet
  4. 4. InteractiveName (up to) three things you want from enterprisedata mining
  5. 5. Secret: Excel dataminingExcel add-in for SQL Server data mining
  6. 6. Secret: More than justSQL ServerMicrosoft continues to add machine learningtechnology
  7. 7. Microsoft OffersBingMapsXbox KinectHacker MagnetSQL Server 2012Analysis Services (Multidimensional and Data Mining)Integration ServicesSemantic SearchHadoop PartnershipExcel Projects from Microsoft ResearchMicrosoft Data Lab: http://passfiles.sqlpass.org/vc/ba/PASSBAVC042513/PASSBAVC042513.pdf
  8. 8. DefinitionsWhat is data mining?
  9. 9. DefinitionData mining is the automated or semi-automated process ofdiscovering patterns in dataMachine learning is the development and optimization ofalgorithms for automated or semi-automated pattern discovery
  10. 10. PurposesPhrase Goal“Data Mining” Inform actionable decisions“MachineLearning”Determine best performingalgorithm
  11. 11. Secret: Give artists artData mining is part of a complete decision cycle
  12. 12. MarkTab Decision CycleAnalysis(science)Synthesis(art)GOScience needs science fiction -- MarkTab
  13. 13. MarkTab Decision CycleAnalysis(science)Synthesis(art)GO
  14. 14. XKCD: Shopping Teams
  15. 15. XKCD: Shopping Teams
  16. 16. XKCD: Shopping Teams
  17. 17. Secret: Microsoft is ananalytics competitorIndustry Comparisons 2012-2013
  18. 18. Gartner 2013Magic Quadrant forBusiness Intelligenceand AnalyticsPlatformsRetrieved from http://www.gartner.com/technology/reprints.do?id=1-1DZLPEH&ct=130207&st=sb– February 5, 2013
  19. 19. Gartner 2013Magic Quadrant forData WarehouseDatabaseManagementSystemsRetrieved from http://www.gartner.com/technology/reprints.do?id=1-1DU2VD4&ct=130131&st=sb– January 31, 2013
  20. 20. KDNuggets 2012http://marktab.net/datamining/2012/06/15/excel-number-commercial-tool-analytics-data-mining-big-data/
  21. 21. SQL Server 2012Business Intelligence and Business Analytics
  22. 22. New Platform options: managed servicesApplicationsDataRuntimeMiddlewareDatabaseO/SVirtualizationServersStorageNetworkingPlatform(Self Managed)ApplicationsDataRuntimeMiddlewareDatabaseO/SVirtualizationServersStorageNetworkingInfrastructure(as a Service)ApplicationsDataRuntimeMiddlewareDatabaseO/SVirtualizationServersStorageNetworkingPlatform(as a Service)ApplicationsDataRuntimeMiddlewareDatabaseO/SVirtualizationServersStorageNetworkingSoftware(as a Service)ManagedServicesManagedServicesManagedServices
  23. 23. SQL Release timelines1996SQL Server 6.51990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 20122005SQL Server 2005Unicode SupportNative XMLSQLCLRService BrokerIntegration Services1993SQL Server 4.21(NT)1995SQL Server 6.01989SQL Server 1.0(OS/2)2000SQL Server 2000Reporting Services2010SQL Server 2008 R2Data-tier AppsStreamInsightPowerPivotMaster Data Services2008SQL Server 2008Sparse ColumnsSpatial TypesFILESTREAM1998SQL Server 7.0Dynamic LockingAuto-TuningFull-text searchReplicationAnalysis Services1991SQL Server 1.1(OS/2)2012SQL Server 2012AlwaysOnColumnstoreFileTableSemantic SearchPower ViewApr 10 Jul 10 Oct 10 Jan 11 Apr 11 Jul 11 Oct 11Aug 10SQL Azure SU4 RTWDatabase CopyWeb AdminFeb 10SQL Azure RTWFeb 10SQL Azure SU1 RTWAlter EditionApr 10SQL Azure SU2 RTWMARSJun 10SQL Azure SU3 RTW50 GB DbSpatial TypeHierarchyId TypeDec 10SQL Azure SU6 RTWDataSync CTP2Apr 11SQL Azure SU V.NextMultiple ServersServer Mgmt APIJDBCDAC UpgradeNov 10DataMarket RTWSQL Azure Reporting CTP1Feb 11SQL Azure Reporting CTP2DataSync CTP2 UpdateJul 10DataSync CTP1Aug 11New Portal ExperienceSparse ColumnsSQL Azure Reporting CTP3SQL Azure DataSync CTP3DAC Import/Export ServiceDenali TSQL
  24. 24. Secret: Many alreadyhave Microsoft analyticsBusiness Intelligence and Business Analytics areincluded with most SQL Server licenses
  25. 25. Data platform: SQL Server 2012Database ServicesSQL Server*SQL Azure*ReplicationSQL Azure Data Sync*Full Text & SemanticSearch*Data IntegrationServicesIntegration Services*Master Data Services*Data Quality Services*StreamInsight*Project “Austin”*Analytical ServicesAnalysis Services*Data MiningPowerPivot*Reporting ServicesReporting Services*SQL Azure Reporting*Report BuilderPower View** New / improved in SQL Server 2012
  26. 26. SQL Server 2012 EditionsRetrieved from http://www.microsoft.com/en-us/sqlserver/editions.aspx -- February 2013
  27. 27. Secret: Microsoft offersthree enterprise toolsAll three tools support scaled solutions
  28. 28. What Enterprise Tools support MicrosoftData Mining?DataMiningSSMS SSIS PowerShell
  29. 29. Variable 0 1 2 3 4 5 6 7DiscretizedDiscretizedContinuousDiscrete
  30. 30. Variable 0 1 2 3 4 5 6 7DiscretizedDiscretizedContinuousDiscrete
  31. 31. Variable 0 1 2 3 4 5 6 7DiscretizedDiscretizedContinuousDiscrete
  32. 32. Variable 0 1 2 3 4 5 6 7DiscretizedDiscretizedContinuousDiscrete
  33. 33. Data Mining CapacitiesSQL Server 2008 R2 Analysis Services Object Maximum sizes/numbersMaximum data mining models per structure 2^31-1 = 2,147,483,647Maximum data mining structures per solution 2^31-1 = 2,147,483,647Maximum data mining structures per AnalysisServices database2^31-1 = 2,147,483,647Maximum data mining attributes (variables) perstructure2^31-1 = 2,147,483,647Reference:http://www.marktab.net/datamining/index.php/2010/08/01/sql-server-data-mining-capacities-2008-r2/
  34. 34. Semantic SearchText Mining
  35. 35. Future: Most data is TextTwo Research Types• Quantitative research = data mining• Qualitative research = text miningThe future is combining both
  36. 36. Full-Text Search EnhancementsProperty search: search on tagged properties (such as author or title)Customizable NEAR: find words or phrases close to one anotherNew Word Breakers and Stemmers (for many languages)
  37. 37. (iFilter Required)DocumentsFull-TextKeywordIndex“FTI”iFiltersSemantic DocumentSimilarity Index “DSI”SemanticDatabaseSemanticKey PhraseIndex –Tag Index“TI”
  38. 38. Languages Currently SupportedTraditional ChineseGermanEnglishFrenchItalianBrazilianRussianSwedishSimplified ChineseBritish EnglishPortugueseChinese (Hong Kong SAR, PRC)SpanishChinese (Singapore)Chinese (Macau SAR)
  39. 39. Phases of Semantic IndexingFull Text Keyword Index “FTI”Semantic Key Phrase Index –Tag Index “TI”Semantic Document SimilarityIndex “DSI”http://msdn.microsoft.com/en-us/library/gg492085.aspx#SemanticIndexing
  40. 40. Secret: Semantic Searchscales linearlyPerformance
  41. 41. Integrated Full Text Search (iFTS)Improved Performance and Scale:Scale-up to 350M documents for storage and searchiFTS query performance 7-10 times faster than in SQL Server 2008Worst-case iFTS query response times less than 3 sec for corpusSimilar or better than main database search competitors(2012, Michael Rys, Microsoft)
  42. 42. Linear Scale of FTI/TI/DSIFirst known linearly scaling end-to-end Search and Semantic product in the industryTime in Seconds vs. Number of Documents(2011 – K. Mukerjee, T. Porter, S. Gherman – Microsoft)
  43. 43. Text Mining ReferencesVideohttp://channel9.msdn.com/Shows/DataBound/DataBound-Episode-2-Semantic-Searchhttp://www.microsoftpdc.com/2009/SVR32Semantic Search (Books Online) – explains the demohttp://msdn.microsoft.com/en-us/library/gg492075.aspxPaperhttp://users.cis.fiu.edu/~lzhen001/activities/KDD2011Program/docs/p213.pdf
  44. 44. Microsoft ResourcesLinks
  45. 45. SoftwareSQL Server 2012 Enterprise(includes database engine, Analysis Services, SSMS and SSDT)http://www.microsoft.com/sqlserver/en/us/get-sql-server/try-it.aspxMicrosoft Office 2012 Professionalhttp://office.microsoft.com/en-us/try
  46. 46. OrganizationsProfessional Association for SQL Server http://www.sqlpass.orgAtlanta MDF http://www.atlantamdf.com/Atlanta Microsoft BI Users Group http://www.meetup.com/Atlanta-Microsoft-Business-Intelligence-Users/PASS Business Analytics Conference http://www.passbaconference.comMicrosoft TechEd North America http://northamerica.msteched.com/
  47. 47. InteractiveTakeaways
  48. 48. Conclusion: Seven SecretsExcel data miningMore than just SQL ServerSuccess involves everyoneMicrosoft is an analytics competitorMany already have Microsoft analyticsMicrosoft offers three enterprise toolsSemantic search scales linearly
  49. 49. ConnectData Mining Resources and blog http://marktab.netData Mining Training and Consulting (especially Microsoft and SAS)http://marktab.com
  50. 50. AbstractIf you have a SQL Server license (Standard or higher) then you already have the abilityto start data mining. In this new presentation, you will see how to scale up datamining from the free Excel 2013 add-in to production use. Aimed at beginning tointermediate data miners, this presentation will show how mining models move fromdevelopment to production. We will use SQL Server 2012 tools including SSMS, SSIS,and SSDT.

×