An overview of Microsoft data mining technology

Presented for 30 at the Atlanta .NET User Group


  1. An Overview of Microsoft Data Mining Technology
     Mark Tabladillo, Ph.D. (MVP, MCAD .NET, MCITP, MCT)
     March 25, 2013
     Atlanta .NET User Group
  2. Networking
     Interactive
  3. About MarkTab
     Training and Consulting with http://marktab.com
     Data Mining Resources and Blog at http://marktab.net
     Ph.D. – Industrial Engineering, Georgia Tech
     Training and consulting internationally across many industries – SAS and Microsoft
     Contributed to peer-reviewed research and legislation
     Mentoring doctoral dissertations at the accredited University of Phoenix
     Presenter
  4. Interactive
     Name three things you want from enterprise data mining
  5. Microsoft Offers
     Bing
       Maps
     Xbox Kinect
       Hacker Magnet
     SQL Server 2012
       Analysis Services (Multidimensional and Data Mining)
       Integration Services
       Semantic Search
       Hadoop Partnership
     Excel
       Projects from Microsoft Research
  6. Outline
  7. Definitions
     What is data mining?
  8. Definition
     Data mining is the automated or semi-automated process of discovering patterns in data.
     Machine learning is the development and optimization of algorithms for automated or semi-automated pattern discovery.
  9. Purposes
     Phrase               Goal
     “Data Mining”        Inform actionable decisions
     “Machine Learning”   Determine best performing algorithm
  10. MarkTab Decision Cycle
      GO
      Synthesis (art)
      Analysis (science)
      Science needs science fiction -- MarkTab
  11. MarkTab Decision Cycle
      GO
      Synthesis (art)
      Analysis (science)
  12. Industry Comparisons
      2012-2013
  13. Gartner 2013 Magic Quadrant for Business Intelligence and Analytics Platforms
      Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DZLPEH&ct=130207&st=sb – February 5, 2013
  14. Microsoft Response
      Focus on familiar, intuitive user experiences delivered via high quality, industry-leading products that businesses already know and use today is key to making BI truly accessible to all users.
      By providing Business Intelligence capabilities in familiar tools such as Excel and SharePoint, we empower an entirely new segment of business users to build and consume rich BI solutions as part of their everyday work.
      Delivering the server-side capabilities to enable self-service BI via SharePoint and SQL Server provides a common, scalable data platform to handle any data, any size, from anywhere, and tackle all of your Big Data needs.
      Retrieved from http://blogs.msdn.com/b/microsoft_business_intelligence1/archive/2013/02/07/microsoft-in-leaders-quadrant-of-gartner-magic-quadrant-for-business-intelligence-and-analytics-platforms.aspx -- Feb 2013
  15. Gartner 2013 Magic Quadrant for Data Warehouse Database Management Systems
      Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DU2VD4&ct=130131&st=sb – January 31, 2013
  16. KDNuggets 2012
      http://marktab.net/datamining/2012/06/15/excel-number-commercial-tool-analytics-data-mining-big-data/
  17. SQL Server 2012
      Business Intelligence and Business Analytics
  18. New Platform options: managed services
      Columns: Platform (Self Managed), Infrastructure (as a Service), Platform (as a Service), Software (as a Service)
      Stack shown in every column, top to bottom: Applications, Data, Runtime, Middleware, Database, O/S, Virtualization, Servers, Storage, Networking
      “Managed Services” labels mark parts of the stack in the diagram (the label appears three times)
  19. SQL Release timelines
      SQL Server releases, 1989-2012:
        1989  SQL Server 1.0 (OS/2)
        1991  SQL Server 1.1 (OS/2)
        1993  SQL Server 4.21 (NT)
        1995  SQL Server 6.0
        1996  SQL Server 6.5
        1998  SQL Server 7.0
        2000  SQL Server 2000: Reporting Services
        2005  SQL Server 2005: Native XML, SQLCLR, Service Broker, Integration Services
        2008  SQL Server 2008: Sparse Columns, Spatial Types, FILESTREAM
        2010  SQL Server 2008 R2: Data-tier Apps, StreamInsight, PowerPivot, Master Data Services
        2012  SQL Server 2012: AlwaysOn, Columnstore, FileTable, Semantic Search, Power View
      Other features shown for the 1995-2005 releases: Dynamic Locking, Unicode Support, Auto-Tuning, Full-text search, Replication, Analysis Services
      SQL Azure timeline, Feb 10 to Oct 11:
        Milestones: SQL Azure RTW, SQL Azure SU1 RTW, SU2 RTW, SU3 RTW, SU4 RTW, SU6 RTW, SQL Azure SU V.Next; SQL Azure Reporting CTP1, CTP2, CTP3; DataSync CTP1, CTP2, CTP2 Update, CTP3; DataMarket RTW; Denali TSQL
        Features named: New Portal Experience, Sparse Columns, Database Copy, Web Admin, DAC Import/Export Service, MARS, Alter Edition, 50 GB Db, Multiple Servers, Spatial Type, Server Mgmt API, HierarchyId Type, JDBC, DAC Upgrade
  20. Data platform: SQL Server 2012
      Service groups: Database Services, Data Integration Services, Analytical Services, Reporting Services
      Components: SQL Server*, Integration Services*, Reporting Services*, Analysis Services*, SQL Azure*, SQL Azure Reporting*, Master Data Services*, Replication, Data Mining, Report Builder, SQL Azure Data Sync*, Data Quality Services*, Full Text & Semantic Search*, StreamInsight*, PowerPivot*, Power View*, Project “Austin”*
      * New / improved in SQL Server 2012
  21. SQL Server 2012 Editions
      Retrieved from http://www.microsoft.com/en-us/sqlserver/editions.aspx -- February 2013
  22. What Enterprise Tools support Microsoft Data Mining?
      Data Mining
      SSMS
      SSIS
      PowerShell
  23-27. Variable types (figure, repeated across five slides)
      Axis: Variable, 0 to 7
      Rows: Discretized, Discretized, Continuous, Discrete
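The variable-type slides contrast continuous, discretized, and discrete variables. As a rough illustration of what discretizing a continuous variable means, the sketch below bins values on the slide's 0 to 7 scale into equal-width buckets. It is plain Python written for this transcript: the function name `discretize` and the choice of four buckets are assumptions, and this is not necessarily the bucketing method Analysis Services applies.

```python
def discretize(value, low=0.0, high=7.0, buckets=4):
    """Map a continuous value in [low, high] to a bucket index 0..buckets-1.

    Hypothetical helper for illustration: equal-width bucketing only.
    """
    if not low <= value <= high:
        raise ValueError("value outside the expected range")
    width = (high - low) / buckets
    # The top edge (value == high) belongs to the last bucket, so clamp.
    return min(int((value - low) / width), buckets - 1)


# A continuous variable measured on the 0-7 scale.
continuous = [0.3, 1.9, 3.5, 5.2, 7.0]

# The same variable after discretization into four buckets.
discretized = [discretize(v) for v in continuous]
print(discretized)
```

Equal-width buckets are the simplest choice; a real model might prefer buckets with roughly equal counts, which this sketch does not attempt.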
  28. http://msdn.microsoft.com/en-us/library/ms174776.aspx
  29. http://msdn.microsoft.com/en-us/library/ms174587.aspx
  30. Data Mining Capacities
      SQL Server 2008 R2 Analysis Services Object                        Maximum sizes/numbers
      Maximum data mining models per structure                           2^31-1 = 2,147,483,647
      Maximum data mining structures per solution                        2^31-1 = 2,147,483,647
      Maximum data mining structures per Analysis Services database      2^31-1 = 2,147,483,647
      Maximum data mining attributes (variables) per structure           2^31-1 = 2,147,483,647
      Reference: http://www.marktab.net/datamining/index.php/2010/08/01/sql-server-data-mining-capacities-2008-r2/
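Every limit on the capacities slide is 2^31-1, which is the largest value a signed 32-bit integer can hold. The short Python check below confirms the arithmetic and shows that one more than that value no longer fits in 32 signed bits. That the limits come from a 32-bit counter is an inference from the number itself, not something the slide states, and `fits_int32` is a hypothetical helper name.

```python
import struct

# 2^31 - 1, the figure quoted for every capacity on the slide.
MAX_INT32 = 2**31 - 1


def fits_int32(n):
    """Return True if n can be packed as a signed 32-bit integer."""
    try:
        struct.pack("<i", n)  # "<i" = little-endian signed 32-bit int
        return True
    except struct.error:
        return False


print(f"{MAX_INT32:,}")
print(fits_int32(MAX_INT32), fits_int32(MAX_INT32 + 1))
```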
  31. Third-Party
      Predixion Software
  32. Semantic Search
      Text Mining
  33. Future: Most data is Text
      Two Research Types
      • Quantitative research = data mining
      • Qualitative research = text mining
      The future is combining both
  34. Statistical Semantic Search
      Comprises some aspects of text mining
      Identifies statistically relevant key phrases
      Based on these phrases, can identify (by score) similar documents
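To make the idea of scoring similar documents from their key phrases concrete, here is a minimal Python sketch. It counts the non-stopword terms of each document and ranks other documents by cosine similarity of those counts. This is a generic textbook technique swapped in for illustration, not SQL Server's semantic search algorithm; the stopword list, the sample documents, and the names `key_terms`, `similarity`, and `rank_similar` are all assumptions.

```python
import math
from collections import Counter

# Tiny illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"the", "a", "of", "and", "in", "is", "to", "for"}


def key_terms(text):
    """Count the non-stopword terms in a document."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    return Counter(w for w in words if w and w not in STOPWORDS)


def similarity(a, b):
    """Cosine similarity between two term-count vectors, from 0.0 to 1.0."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_similar(name, docs):
    """Return (other document, score) pairs, most similar first."""
    target = key_terms(docs[name])
    scores = [(other, similarity(target, key_terms(text)))
              for other, text in docs.items() if other != name]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Made-up sample documents.
docs = {
    "mining": "Data mining discovers patterns in data",
    "text": "Text mining discovers patterns in documents",
    "maps": "Bing offers maps and directions",
}

ranked = rank_similar("mining", docs)
print(ranked)
```

The two mining documents share three terms and score well above zero, while the maps document shares none and scores 0.0.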
  35. FileTables
      Built on existing SQL Server FILESTREAM technology
      Files and documents
        Stored in special tables in SQL Server
        Accessed as if they were stored in the file system
  36. Full-Text Search Enhancements
      Property search: search on tagged properties (such as author or title)
      Customizable NEAR: find words or phrases close to one another
      New Word Breakers and Stemmers (for many languages)
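The NEAR enhancement is about finding terms that occur close to one another. The Python sketch below illustrates the concept only: it reports whether two words appear with at most a given number of other words between them. The function name `near`, the distance definition, and the sample sentence are assumptions made for this example; it does not reproduce SQL Server's full-text query syntax or its word breakers.

```python
def near(text, first, second, max_distance):
    """Return True if `first` and `second` occur with at most
    `max_distance` other words between them (in either order).

    Hypothetical helper: whitespace tokenization, case-insensitive.
    """
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    return any(
        i != j and abs(i - j) - 1 <= max_distance
        for i in first_positions
        for j in second_positions
    )


sentence = "Semantic search in SQL Server identifies key phrases"

# "SQL" and "key" have two words between them ("Server", "identifies").
print(near(sentence, "SQL", "key", 2))
print(near(sentence, "SQL", "key", 1))
```

Making the maximum distance a parameter mirrors the "customizable" part of the slide: the caller decides how close is close enough.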
  37. From Documents to Output
      Document types: Office, PDF
      Column types: Varchar, NVarchar
      Rowset Output with Scores
  38. (iFilter Required)
      Documents
      iFilters
      Full-Text Keyword Index “FTI”
      Semantic Database
      Semantic Key Phrase Index – Tag Index “TI”
      Semantic Document Similarity Index “DSI”
  39. Languages Currently Supported
      Traditional Chinese, Simplified Chinese, German, British English, English, Portuguese, French, Chinese (Hong Kong SAR, PRC), Italian, Spanish, Brazilian, Chinese (Singapore), Russian, Chinese (Macau SAR), Swedish
  40. Phases of Semantic Indexing
      Full Text Keyword Index “FTI”
      Semantic Document Similarity Index “DSI”
      Semantic Key Phrase Index – Tag Index “TI”
      http://msdn.microsoft.com/en-us/library/gg492085.aspx#SemanticIndexing
  41. Integrated Full Text Search (iFTS)
      Improved Performance and Scale:
        Scale-up to 350M documents for storage and search
        iFTS query performance 7-10 times faster than in SQL Server 2008
        Worst-case iFTS query response times less than 3 sec for corpus
        Similar or better than main database search competitors
      (2012, Michael Rys, Microsoft)
  42. Linear Scale of FTI/TI/DSI
      First known linearly scaling end-to-end Search and Semantic product in the industry
      Chart: Time in Seconds vs. Number of Documents
      (2011 – K. Mukerjee, T. Porter, S. Gherman – Microsoft)
  43. Text Mining References
      Video
        http://channel9.msdn.com/Shows/DataBound/DataBound-Episode-2-Semantic-Search
        http://www.microsoftpdc.com/2009/SVR32
      Semantic Search (Books Online) – explains the demo
        http://msdn.microsoft.com/en-us/library/gg492075.aspx
      Paper
        http://users.cis.fiu.edu/~lzhen001/activities/KDD2011Program/docs/p213.pdf
  44. Microsoft Resources
      Links
  45. Software
      SQL Server 2012 Enterprise (includes database engine, Analysis Services, SSMS and SSDT)
        http://www.microsoft.com/sqlserver/en/us/get-sql-server/try-it.aspx
      Microsoft Office 2012 Professional
        http://office.microsoft.com/en-us/try
  46. Organizations
      Professional Association for SQL Server http://www.sqlpass.org
      Atlanta MDF http://www.atlantamdf.com/
      Atlanta Microsoft BI Users Group http://www.meetup.com/Atlanta-Microsoft-Business-Intelligence-Users/
      PASS Business Analytics Conference http://www.passbaconference.com
      Microsoft TechEd North America http://northamerica.msteched.com/
  47. Interactive
      Takeaways
  48. Conclusion
      Microsoft competes well with other vendors
        Business Intelligence and Analytics
        Data Warehouse
        Excel
      SQL Server Data Mining 2012 provides data mining and semantic search
  49. Connect
      Data Mining Resources and blog http://marktab.net
      Data Mining Training and Consulting (especially Microsoft and SAS) http://marktab.com
