
An overview of Microsoft data mining technology


Presented for 30 at the Atlanta .NET User Group

Published in: Business

  1. An Overview of Microsoft Data Mining Technology
     • Mark Tabladillo, Ph.D. (MVP, MCAD .NET, MCITP, MCT)
     • March 25, 2013
     • Atlanta .NET User Group
  2. Networking: Interactive
  3. About MarkTab
     • Training and Consulting with
     • Ph.D. – Industrial Engineering, Georgia Tech
     • Data Mining Resources and Blog at
     • Training and consulting internationally across many industries – SAS and Microsoft
     • Contributed to peer-reviewed research and legislation
     • Mentoring doctoral dissertations at the accredited University of Phoenix
     • Presenter
  4. Interactive: Name three things you want from enterprise data mining
  5. Microsoft Offers
     • Bing Maps
     • Xbox Kinect: Hacker Magnet
     • SQL Server 2012: Analysis Services (Multidimensional and Data Mining), Integration Services, Semantic Search, Hadoop Partnership
     • Excel
     • Projects from Microsoft Research
  6. Outline
  7. Definitions: What is data mining?
  8. Definition
     • Data mining is the automated or semi-automated process of discovering patterns in data
     • Machine learning is the development and optimization of algorithms for automated or semi-automated pattern discovery
  9. Purposes (Phrase: Goal)
     • “Data Mining”: Inform actionable decisions
     • “Machine Learning”: Determine best performing algorithm
  10. MarkTab Decision Cycle: GO, Synthesis (art), Analysis (science). “Science needs science fiction” -- MarkTab
  11. MarkTab Decision Cycle: GO, Synthesis (art), Analysis (science)
  12. Industry Comparisons 2012-2013
  13. Gartner 2013 Magic Quadrant for Business Intelligence and Analytics Platforms. Retrieved from – February 5, 2013
  14. Microsoft Response
      • Focus on familiar, intuitive user experiences delivered via high quality, industry-leading products that businesses already know and use today is key to making BI truly accessible to all users.
      • By providing Business Intelligence capabilities in familiar tools such as Excel and SharePoint, we empower an entirely new segment of business users to build and consume rich BI solutions as part of their everyday work.
      • Delivering the server-side capabilities to enable self-service BI via SharePoint and SQL Server provides a common, scalable data platform to handle any data, any size, from anywhere, and tackle all of your Big Data needs.
      • Retrieved from -- Feb 2013
  15. Gartner 2013 Magic Quadrant for Data Warehouse Database Management Systems. Retrieved from – January 31, 2013
  16. KDNuggets 2012
  17. SQL Server 2012: Business Intelligence and Business Analytics
  18. New Platform options: managed services
      • Four options compared: Platform (Self Managed), Infrastructure (as a Service), Platform (as a Service), Software (as a Service)
      • Each option shows the same stack: Applications, Data, Runtime, Middleware, Database, O/S, Virtualization, Servers, Storage, Networking
      • The diagram carries three “Managed Services” labels
  19. SQL Release timelines
      SQL Server releases, 1989 to 2012:
      • 1989: SQL Server 1.0 (OS/2)
      • 1991: SQL Server 1.1 (OS/2)
      • 1993: SQL Server 4.21 (NT)
      • 1995: SQL Server 6.0
      • 1996: SQL Server 6.5
      • 1998: SQL Server 7.0 (Dynamic Locking, Auto-Tuning, Full-text search, Analysis Services)
      • 2000: SQL Server 2000 (Reporting Services)
      • 2005: SQL Server 2005 (Native XML, SQLCLR, Service Broker, Integration Services)
      • 2008: SQL Server 2008 (Sparse Columns, Spatial Types, FILESTREAM)
      • 2010: SQL Server 2008 R2 (Data-tier Apps, StreamInsight, PowerPivot, Master Data Services)
      • 2012: SQL Server 2012 (AlwaysOn, Columnstore, FileTable, Semantic Search, Power View)
      • Other callouts on the timeline: Unicode Support, Replication
      SQL Azure milestones, Feb 10 to Oct 11:
      • Releases: SQL Azure RTW; SQL Azure SU1, SU2, SU3, SU4 and SU6 RTW; SQL Azure SU V.Next; a Dec 10 update; DataMarket RTW
      • Previews: DataSync CTP1, CTP2 and CTP3; SQL Azure Reporting CTP1, CTP2 and CTP3
      • Feature callouts: New Portal Experience, Sparse Columns, Database Copy, Web Admin, DAC Import/Export Service, MARS, Denali TSQL, Alter Edition, 50 GB Db, Multiple Servers, Spatial Type, Server Mgmt API, HierarchyId Type, JDBC, DAC Upgrade
  20. Data platform: SQL Server 2012
      • Database Services: SQL Server*, SQL Azure*, Full Text & Semantic Search*
      • Data Integration Services: Integration Services*, Master Data Services*, Data Quality Services*, Replication, SQL Azure Data Sync*
      • Analytical Services: Analysis Services*, Data Mining, PowerPivot*, StreamInsight*, Project “Austin”*
      • Reporting Services: Reporting Services*, SQL Azure Reporting*, Report Builder, Power View*
      • * New / improved in SQL Server 2012
  21. SQL Server 2012 Editions. Retrieved from -- February 2013
  22. What Enterprise Tools support Microsoft Data Mining?
      • Data Mining: SSMS, SSIS, PowerShell
  23. Variable (axis 0 to 7): Discretized, Discretized, Continuous, Discrete
  24. Variable (axis 0 to 7): Discretized, Discretized, Continuous, Discrete
  25. Variable (axis 0 to 7): Discretized, Discretized, Continuous, Discrete
  26. Variable (axis 0 to 7): Discretized, Discretized, Continuous, Discrete
  27. Variable (axis 0 to 7): Discretized, Discretized, Continuous, Discrete
  28.
  29.
  30. Data Mining Capacities: SQL Server 2008 R2 Analysis Services (Object: Maximum sizes/numbers)
      • Maximum data mining models per structure: 2^31-1 = 2,147,483,647
      • Maximum data mining structures per solution: 2^31-1 = 2,147,483,647
      • Maximum data mining structures per Analysis Services database: 2^31-1 = 2,147,483,647
      • Maximum data mining attributes (variables) per structure: 2^31-1 = 2,147,483,647
      • Reference:
  31. Third-Party: Predixion Software
  32. Semantic Search: Text Mining
  33. Future: Most data is Text
      Two Research Types
      • Quantitative research = data mining
      • Qualitative research = text mining
      The future is combining both
  34. Statistical Semantic Search
      • Comprises some aspects of text mining
      • Identifies statistically relevant key phrases
      • Based on these phrases, can identify (by score) similar documents
  35. FileTables
      • Built on existing SQL Server FILESTREAM technology
      • Files and documents are stored in special tables in SQL Server
      • Files and documents are accessed as if they were stored in the file system
  36. Full-Text Search Enhancements
      • Property search: search on tagged properties (such as author or title)
      • Customizable NEAR: find words or phrases close to one another
      • New Word Breakers and Stemmers (for many languages)
  37. From Documents to Output
      • Inputs: Office, PDF, Varchar, NVarchar
      • Output: Rowset Output with Scores
  38. Documents; iFilters (iFilter Required); Full-Text Keyword Index “FTI”; Semantic Database; Semantic Key Phrase Index – Tag Index “TI”; Semantic Document Similarity Index “DSI”
  39. Languages Currently Supported
      • Traditional Chinese
      • Simplified Chinese
      • German
      • British English
      • English
      • Portuguese
      • French
      • Chinese (Hong Kong SAR, PRC)
      • Italian
      • Spanish
      • Brazilian
      • Chinese (Singapore)
      • Russian
      • Chinese (Macau SAR)
      • Swedish
  40. Phases of Semantic Indexing
      • Full Text Keyword Index “FTI”
      • Semantic Document Similarity Index “DSI”
      • Semantic Key Phrase Index – Tag Index “TI”
  41. Integrated Full Text Search (iFTS)
      Improved Performance and Scale:
      • Scale-up to 350M documents for storage and search
      • iFTS query performance 7-10 times faster than in SQL Server 2008
      • Worst-case iFTS query response times less than 3 sec for corpus
      • Similar or better than main database search competitors
      (2012, Michael Rys, Microsoft)
  42. Linear Scale of FTI/TI/DSI
      • First known linearly scaling end-to-end Search and Semantic product in the industry
      • Chart: Time in Seconds vs. Number of Documents
      (2011 – K. Mukerjee, T. Porter, S. Gherman – Microsoft)
  43. Text Mining References: Video Search Search (Books Online) – explains the demo
  44. Microsoft Resources: Links
  45. Software
      • SQL Server 2012 Enterprise (includes database engine, Analysis Services, SSMS and SSDT)
      • Office 2012 Professional
  46. Organizations
      • Professional Association for SQL Server
      • Atlanta MDF
      • Atlanta Microsoft BI Users Group Business-Intelligence-Users/
      • PASS Business Analytics Conference http://www.passbaconference.com
      • Microsoft TechEd North America
  47. Interactive: Takeaways
  48. Conclusion
      • Microsoft competes well with other vendors: Business Intelligence and Analytics, Data Warehouse, Excel
      • SQL Server Data Mining 2012 provides data mining and semantic search
  49. Connect
      • Data Mining Resources and blog http://marktab.net
      • Data Mining Training and Consulting (especially Microsoft and SAS)
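
Slides 23 to 27 contrast discretized, continuous, and discrete variables on an axis running from 0 to 7. As a rough illustration of what discretizing a continuous variable can mean, the sketch below uses equal-width binning. The function name `discretize_equal_width`, the bin count, and the sample values are hypothetical, and equal-width binning is only one possible method; the slides do not say which method Analysis Services applies.

```python
def discretize_equal_width(values, n_bins, lo, hi):
    """Map each continuous value in [lo, hi] to a bin index 0..n_bins-1.

    Hypothetical helper for illustration only; not an Analysis Services API.
    """
    if n_bins < 1 or hi <= lo:
        raise ValueError("need n_bins >= 1 and hi > lo")
    width = (hi - lo) / n_bins
    bins = []
    for v in values:
        if not lo <= v <= hi:
            raise ValueError(f"value {v} is outside [{lo}, {hi}]")
        # The upper edge hi would land in bin n_bins, so clamp it into the last bin.
        bins.append(min(int((v - lo) / width), n_bins - 1))
    return bins


# Made-up continuous readings on the 0-to-7 axis used in the slides.
sample = [0.0, 1.2, 3.5, 3.6, 6.9, 7.0]
print(discretize_equal_width(sample, n_bins=4, lo=0.0, hi=7.0))
```

With four bins each bin is 1.75 wide, so nearby readings such as 3.5 and 3.6 collapse into the same category, which is the information a model gives up in exchange for a discrete input.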
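
Slide 30 lists every capacity as 2^31-1 = 2,147,483,647. That figure is the largest value a signed 32-bit integer can hold, which the short check below confirms. The slide itself does not state why the limits take this value, so the link to a 32-bit counter is an assumption, and `fits_in_int32` is a hypothetical helper.

```python
import struct

# Largest signed 32-bit integer, the value quoted on slide 30.
INT32_MAX = 2**31 - 1
print(f"{INT32_MAX:,}")


def fits_in_int32(n):
    """Return True if n can be packed as a signed 32-bit integer."""
    try:
        struct.pack("<i", n)
        return True
    except struct.error:
        return False


print(fits_in_int32(INT32_MAX))      # the quoted maximum fits
print(fits_in_int32(INT32_MAX + 1))  # one more does not
```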
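
Slide 34 describes Statistical Semantic Search as identifying statistically relevant key phrases and then scoring similar documents. The sketch below shows that general idea with a swapped-in technique: TF-IDF weights on single words and cosine similarity. It is not how SQL Server computes its key phrases or scores, and the function names and sample documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document using plain TF-IDF."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


def top_terms(vector, k=3):
    """Highest-weighted terms first; ties are broken alphabetically."""
    return sorted(vector, key=lambda t: (-vector[t], t))[:k]


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Made-up documents for illustration.
docs = [
    "data mining finds patterns in data",
    "text mining finds patterns in documents",
    "the xbox kinect is a hacker magnet",
]
vectors = tfidf_vectors(docs)
print(top_terms(vectors[0], k=1))
print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[2]))
```

The first two documents share several weighted terms and so score above zero, while the third shares none with the first and scores zero. Ranking documents by such a score is the same kind of "similar documents by score" output the slide mentions.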
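
Slide 36 mentions a customizable NEAR that finds words or phrases close to one another. The sketch below illustrates the general notion of a proximity match by counting the words that sit between two terms. The function `near` and its `max_gap` parameter are hypothetical, and the sketch does not reproduce the exact matching rules of SQL Server full-text search.

```python
def near(text, first, second, max_gap):
    """True if `first` and `second` occur with at most `max_gap` words between them.

    Hypothetical helper: whitespace tokenization, case-insensitive,
    surrounding punctuation stripped.
    """
    words = [w.strip(".,;:!?()\"'").lower() for w in text.split()]
    first_pos = [i for i, w in enumerate(words) if w == first.lower()]
    second_pos = [i for i, w in enumerate(words) if w == second.lower()]
    return any(
        abs(i - j) - 1 <= max_gap
        for i in first_pos
        for j in second_pos
        if i != j
    )


sentence = "Semantic search identifies statistically relevant key phrases."
print(near(sentence, "semantic", "phrases", max_gap=5))
print(near(sentence, "semantic", "phrases", max_gap=4))
```

Making the allowed gap a parameter is what "customizable" amounts to in this sketch: the same pair of terms matches or fails depending on how far apart the caller lets them be.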