Secrets of Enterprise Data Mining 201305
Presented at SQL Saturday 220, Atlanta, GA, May 2013. If you have a SQL Server license (Standard or higher), then you already have the ability to start data mining. In this new presentation, you will see how to scale up data mining from the free Excel 2013 add-in to production use. Aimed at beginning to intermediate data miners, this presentation will show how mining models move from development to production. We will use SQL Server 2012 tools including SSMS, SSIS, and SSDT.

Transcript

  • 1. Secrets of Enterprise Data Mining. Mark Tabladillo, Ph.D. (MVP, MCAD .NET, MCITP, MCT). PASS SQL Saturday #220, Atlanta, GA. May 18, 2013
  • 2. Networking: Interactive
  • 3. About MarkTab. Training and Consulting with http://marktab.com ; Data Mining Resources and Blog at http://marktab.net ; Twitter @marktabnet
  • 4. Interactive: Name (up to) three things you want from enterprise data mining
  • 5. Secret: Excel data mining. Excel add-in for SQL Server data mining
  • 6. Secret: More than just SQL Server. Microsoft continues to add machine learning technology
  • 7. Microsoft Offers: Bing; Maps; Xbox Kinect; Hacker Magnet; SQL Server 2012; Analysis Services (Multidimensional and Data Mining); Integration Services; Semantic Search; Hadoop Partnership; Excel Projects from Microsoft Research; Microsoft Data Lab: http://passfiles.sqlpass.org/vc/ba/PASSBAVC042513/PASSBAVC042513.pdf
  • 8. Definitions: What is data mining?
  • 9. Definition. Data mining is the automated or semi-automated process of discovering patterns in data. Machine learning is the development and optimization of algorithms for automated or semi-automated pattern discovery.
  • 10. Purposes (Phrase: Goal). “Data Mining”: Inform actionable decisions. “Machine Learning”: Determine best performing algorithm.
  • 11. Secret: Give artists art. Data mining is part of a complete decision cycle
  • 12. MarkTab Decision Cycle: Analysis (science), Synthesis (art), GO. Science needs science fiction -- MarkTab
  • 13. MarkTab Decision Cycle: Analysis (science), Synthesis (art), GO
  • 14. XKCD: Shopping Teams
  • 15. XKCD: Shopping Teams
  • 16. XKCD: Shopping Teams
  • 17. Secret: Microsoft is an analytics competitor. Industry Comparisons 2012-2013
  • 18. Gartner 2013 Magic Quadrant for Business Intelligence and Analytics Platforms. Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DZLPEH&ct=130207&st=sb – February 5, 2013
  • 19. Gartner 2013 Magic Quadrant for Data Warehouse Database Management Systems. Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DU2VD4&ct=130131&st=sb – January 31, 2013
  • 20. KDNuggets 2012: http://marktab.net/datamining/2012/06/15/excel-number-commercial-tool-analytics-data-mining-big-data/
  • 21. SQL Server 2012: Business Intelligence and Business Analytics
  • 22. New Platform options: managed services. Four stacks, each listing the layers Applications, Data, Runtime, Middleware, Database, O/S, Virtualization, Servers, Storage, Networking: Platform (Self Managed); Infrastructure (as a Service); Platform (as a Service); Software (as a Service). The label "Managed Services" appears three times on the slide.
  • 23. SQL Release timelines. SQL Server (1989 to 2012): 1989 SQL Server 1.0 (OS/2); 1991 SQL Server 1.1 (OS/2); 1993 SQL Server 4.21 (NT); 1995 SQL Server 6.0; 1996 SQL Server 6.5; 1998 SQL Server 7.0 (Dynamic Locking, Auto-Tuning, Full-text search, Replication, Analysis Services); 2000 SQL Server 2000 (Reporting Services); 2005 SQL Server 2005 (Unicode Support, Native XML, SQLCLR, Service Broker, Integration Services); 2008 SQL Server 2008 (Sparse Columns, Spatial Types, FILESTREAM); 2010 SQL Server 2008 R2 (Data-tier Apps, StreamInsight, PowerPivot, Master Data Services); 2012 SQL Server 2012 (AlwaysOn, Columnstore, FileTable, Semantic Search, Power View). SQL Azure (Feb 10 to Aug 11): Feb 10 SQL Azure RTW; Feb 10 SQL Azure SU1 RTW (Alter Edition); Apr 10 SQL Azure SU2 RTW (MARS); Jun 10 SQL Azure SU3 RTW (50 GB Db, Spatial Type, HierarchyId Type); Jul 10 DataSync CTP1; Aug 10 SQL Azure SU4 RTW (Database Copy, Web Admin); Nov 10 DataMarket RTW, SQL Azure Reporting CTP1; Dec 10 SQL Azure SU6 RTW (DataSync CTP2); Feb 11 SQL Azure Reporting CTP2, DataSync CTP2 Update; Apr 11 SQL Azure SU V.Next (Multiple Servers, Server Mgmt API, JDBC, DAC Upgrade); Aug 11 New Portal Experience, Sparse Columns, SQL Azure Reporting CTP3, SQL Azure DataSync CTP3, DAC Import/Export Service, Denali TSQL
  • 24. Secret: Many already have Microsoft analytics. Business Intelligence and Business Analytics are included with most SQL Server licenses
  • 25. Data platform: SQL Server 2012. Database Services: SQL Server*, SQL Azure*, Replication, SQL Azure Data Sync*, Full Text & Semantic Search*. Data Integration Services: Integration Services*, Master Data Services*, Data Quality Services*, StreamInsight*, Project “Austin”*. Analytical Services: Analysis Services*, Data Mining, PowerPivot*. Reporting Services: Reporting Services*, SQL Azure Reporting*, Report Builder, Power View*. (* New / improved in SQL Server 2012)
  • 26. SQL Server 2012 Editions. Retrieved from http://www.microsoft.com/en-us/sqlserver/editions.aspx -- February 2013
  • 27. Secret: Microsoft offers three enterprise tools. All three tools support scaled solutions
  • 28. What Enterprise Tools support Microsoft Data Mining? Data Mining: SSMS, SSIS, PowerShell
  • 29. Variable: 0 1 2 3 4 5 6 7. Discretized, Discretized, Continuous, Discrete
  • 30. Variable: 0 1 2 3 4 5 6 7. Discretized, Discretized, Continuous, Discrete
  • 31. Variable: 0 1 2 3 4 5 6 7. Discretized, Discretized, Continuous, Discrete
  • 32. Variable: 0 1 2 3 4 5 6 7. Discretized, Discretized, Continuous, Discrete
  • 33. Data Mining Capacities. SQL Server 2008 R2 Analysis Services (Object: Maximum sizes/numbers). Maximum data mining models per structure: 2^31-1 = 2,147,483,647. Maximum data mining structures per solution: 2^31-1 = 2,147,483,647. Maximum data mining structures per Analysis Services database: 2^31-1 = 2,147,483,647. Maximum data mining attributes (variables) per structure: 2^31-1 = 2,147,483,647. Reference: http://www.marktab.net/datamining/index.php/2010/08/01/sql-server-data-mining-capacities-2008-r2/
  • 34. Semantic Search: Text Mining
  • 35. Future: Most data is Text. Two Research Types: Quantitative research = data mining; Qualitative research = text mining. The future is combining both
  • 36. Full-Text Search Enhancements. Property search: search on tagged properties (such as author or title). Customizable NEAR: find words or phrases close to one another. New Word Breakers and Stemmers (for many languages)
  • 37. (iFilter Required) Documents; Full-Text Keyword Index “FTI”; iFilters; Semantic Document Similarity Index “DSI”; Semantic Database; Semantic Key Phrase Index – Tag Index “TI”
  • 38. Languages Currently Supported: Traditional Chinese, German, English, French, Italian, Brazilian, Russian, Swedish, Simplified Chinese, British English, Portuguese, Chinese (Hong Kong SAR, PRC), Spanish, Chinese (Singapore), Chinese (Macau SAR)
  • 39. Phases of Semantic Indexing: Full Text Keyword Index “FTI”; Semantic Key Phrase Index – Tag Index “TI”; Semantic Document Similarity Index “DSI”. http://msdn.microsoft.com/en-us/library/gg492085.aspx#SemanticIndexing
  • 40. Secret: Semantic Search scales linearly. Performance
  • 41. Integrated Full Text Search (iFTS). Improved Performance and Scale: Scale-up to 350M documents for storage and search; iFTS query performance 7-10 times faster than in SQL Server 2008; Worst-case iFTS query response times less than 3 sec for corpus; Similar or better than main database search competitors (2012, Michael Rys, Microsoft)
  • 42. Linear Scale of FTI/TI/DSI. First known linearly scaling end-to-end Search and Semantic product in the industry. Time in Seconds vs. Number of Documents (2011 – K. Mukerjee, T. Porter, S. Gherman – Microsoft)
  • 43. Text Mining References. Video: http://channel9.msdn.com/Shows/DataBound/DataBound-Episode-2-Semantic-Search ; http://www.microsoftpdc.com/2009/SVR32 ; Semantic Search (Books Online) – explains the demo: http://msdn.microsoft.com/en-us/library/gg492075.aspx ; Paper: http://users.cis.fiu.edu/~lzhen001/activities/KDD2011Program/docs/p213.pdf
  • 44. Microsoft Resources: Links
  • 45. Software. SQL Server 2012 Enterprise (includes database engine, Analysis Services, SSMS and SSDT): http://www.microsoft.com/sqlserver/en/us/get-sql-server/try-it.aspx ; Microsoft Office 2013 Professional: http://office.microsoft.com/en-us/try
  • 46. Organizations. Professional Association for SQL Server: http://www.sqlpass.org ; Atlanta MDF: http://www.atlantamdf.com/ ; Atlanta Microsoft BI Users Group: http://www.meetup.com/Atlanta-Microsoft-Business-Intelligence-Users/ ; PASS Business Analytics Conference: http://www.passbaconference.com ; Microsoft TechEd North America: http://northamerica.msteched.com/
  • 47. Interactive: Takeaways
  • 48. Conclusion: Seven Secrets. Excel data mining; More than just SQL Server; Success involves everyone; Microsoft is an analytics competitor; Many already have Microsoft analytics; Microsoft offers three enterprise tools; Semantic search scales linearly
  • 49. Connect. Data Mining Resources and blog: http://marktab.net ; Data Mining Training and Consulting (especially Microsoft and SAS): http://marktab.com
  • 50. Abstract. If you have a SQL Server license (Standard or higher) then you already have the ability to start data mining. In this new presentation, you will see how to scale up data mining from the free Excel 2013 add-in to production use. Aimed at beginning to intermediate data miners, this presentation will show how mining models move from development to production. We will use SQL Server 2012 tools including SSMS, SSIS, and SSDT.
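
Slides 29 to 32 label variables as Discretized, Continuous, or Discrete. As a minimal sketch of what discretizing a continuous variable can mean, the Python below uses equal-width binning. The function name, the bucket count, and the sample ages are assumptions made for illustration; equal-width binning is only one possible method and is not necessarily what Analysis Services applies.

```python
from typing import List


def discretize_equal_width(values: List[float], buckets: int) -> List[int]:
    """Map each continuous value to a bucket index in [0, buckets - 1]."""
    if buckets < 1:
        raise ValueError("buckets must be at least 1")
    if not values:
        return []
    low, high = min(values), max(values)
    if high == low:
        # All values are identical, so everything falls in the first bucket.
        return [0 for _ in values]
    width = (high - low) / buckets
    # The maximum value would land at index `buckets`, so clamp it.
    return [min(int((v - low) / width), buckets - 1) for v in values]


# Hypothetical continuous column (not taken from the slides).
ages = [18.0, 22.5, 31.0, 40.0, 58.0]
print(discretize_equal_width(ages, 4))
```

After this step, a mining algorithm that only accepts categorical inputs could treat the bucket index as a discrete state.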
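
Slide 36 describes a customizable NEAR search that finds words or phrases close to one another. The Python below is a rough sketch of that idea for single words, assuming that distance means the difference between token positions. The `tokenize` and `near` helpers are hypothetical names and do not reproduce SQL Server's full-text engine.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def near(text: str, first: str, second: str, max_distance: int) -> bool:
    """Return True if the two words occur at most max_distance tokens apart."""
    tokens = tokenize(text)
    first_positions = [i for i, t in enumerate(tokens) if t == first.lower()]
    second_positions = [i for i, t in enumerate(tokens) if t == second.lower()]
    return any(
        abs(i - j) <= max_distance
        for i in first_positions
        for j in second_positions
    )


# Sample sentence taken from the definition on slide 9.
doc = ("Data mining is the automated or semi-automated process "
       "of discovering patterns in data")
print(near(doc, "mining", "patterns", 10))
print(near(doc, "mining", "patterns", 3))
```

Making `max_distance` a parameter mirrors the "customizable" part of the slide: the caller decides how close is close enough.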
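
Slide 39 lists three phases of semantic indexing: a keyword index (FTI), a key phrase or tag index (TI), and a document similarity index (DSI). The toy Python pipeline below sketches how three such phases can build on each other. It substitutes an inverted index, tf-idf term weights, and cosine similarity as stand-ins; these techniques, the function names, and the three sample documents are all assumptions and are not Microsoft's implementation.

```python
import math
import re
from collections import Counter, defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())


def build_keyword_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Phase 1 (FTI stand-in): map each term to the documents containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def build_tag_index(docs: Dict[str, str],
                    keyword_index: Dict[str, Set[str]],
                    top_n: int = 5) -> Dict[str, Dict[str, float]]:
    """Phase 2 (TI stand-in): keep the top_n tf-idf terms of each document."""
    n_docs = len(docs)
    tags: Dict[str, Dict[str, float]] = {}
    for doc_id, text in docs.items():
        counts = Counter(tokenize(text))
        weights = {
            term: count * math.log(n_docs / len(keyword_index[term]))
            for term, count in counts.items()
        }
        # Sort by descending weight, then alphabetically for stable output.
        top = sorted(weights.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]
        tags[doc_id] = dict(top)
    return tags


def similarity(tags: Dict[str, Dict[str, float]], a: str, b: str) -> float:
    """Phase 3 (DSI stand-in): cosine similarity of two documents' tags."""
    shared = set(tags[a]) & set(tags[b])
    dot = sum(tags[a][t] * tags[b][t] for t in shared)
    norm_a = math.sqrt(sum(w * w for w in tags[a].values()))
    norm_b = math.sqrt(sum(w * w for w in tags[b].values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical corpus, invented for this sketch.
docs = {
    "a": "semantic search finds key phrases in documents",
    "b": "semantic search ranks similar documents",
    "c": "integration services loads data",
}
keyword_index = build_keyword_index(docs)
tags = build_tag_index(docs, keyword_index)
print(similarity(tags, "a", "b"), similarity(tags, "a", "c"))
```

Each phase consumes the output of the previous one, which is the dependency the slide's ordering suggests.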