An overview of Microsoft data mining technology

Presented for 30 at the Atlanta .NET User Group

Transcript

  • 1. An Overview of Microsoft Data Mining Technology
    Mark Tabladillo, Ph.D. (MVP, MCAD .NET, MCITP, MCT)
    March 25, 2013
    Atlanta .NET User Group
  • 2. Networking: Interactive
  • 3. About MarkTab
    Training and Consulting with http://marktab.com
    Data Mining Resources and Blog at http://marktab.net
    Ph.D. in Industrial Engineering, Georgia Tech
    Training and consulting internationally across many industries (SAS and Microsoft)
    Contributed to peer-reviewed research and legislation
    Mentoring doctoral dissertations at the accredited University of Phoenix
    Presenter
  • 4. Interactive: Name three things you want from enterprise data mining
  • 5. Microsoft Offers
    Bing Maps
    Xbox Kinect Hacker Magnet
    SQL Server 2012: Analysis Services (Multidimensional and Data Mining), Integration Services, Semantic Search, Hadoop Partnership
    Excel
    Projects from Microsoft Research
  • 6. Outline
  • 7. Definitions: What is data mining?
  • 8. Definition
    Data mining is the automated or semi-automated process of discovering patterns in data.
    Machine learning is the development and optimization of algorithms for automated or semi-automated pattern discovery.
  • 9. Purposes
    Phrase             | Goal
    “Data Mining”      | Inform actionable decisions
    “Machine Learning” | Determine best performing algorithm
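
The two goals on slide 9 can be illustrated with a minimal sketch. Everything in it is hypothetical (the toy data, the two candidate models, and the names `build_majority` and `build_threshold`); it is not taken from the presentation or from any Microsoft tool. The machine learning step picks the best performing candidate on holdout data, and the data mining step uses that model to inform a decision.

```python
from statistics import mean

# Hypothetical toy data: (store visits per month, bought a bike: 1 = yes, 0 = no)
train = [(1, 0), (2, 0), (3, 0), (4, 0), (7, 1), (8, 1), (9, 1)]
holdout = [(2, 0), (5, 0), (6, 1), (10, 1)]

def build_majority(rows):
    """Baseline model: always predict the most common training label."""
    labels = [label for _, label in rows]
    most_common = max(set(labels), key=labels.count)
    return lambda x: most_common

def build_threshold(rows):
    """Predict 1 when x is above the midpoint between the two class means."""
    mean_0 = mean(x for x, label in rows if label == 0)
    mean_1 = mean(x for x, label in rows if label == 1)
    cut = (mean_0 + mean_1) / 2
    return lambda x: int(x > cut)

def accuracy(model, rows):
    """Fraction of rows whose label the model predicts correctly."""
    return sum(model(x) == label for x, label in rows) / len(rows)

# "Machine learning" goal: determine the best performing algorithm.
builders = {"majority": build_majority, "threshold": build_threshold}
models = {name: build(train) for name, build in builders.items()}
scores = {name: accuracy(model, holdout) for name, model in models.items()}
best_name = max(scores, key=scores.get)

# "Data mining" goal: use the chosen model to inform an actionable decision.
send_offer = bool(models[best_name](8))
print(scores, best_name, send_offer)
```

With real data the candidates would be full mining algorithms and the score would come from a proper validation scheme; the shape of the comparison stays the same.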
  • 10. MarkTab Decision Cycle: GO, Synthesis (art), Analysis (science)
    “Science needs science fiction” (MarkTab)
  • 11. MarkTab Decision Cycle: GO, Synthesis (art), Analysis (science)
  • 12. Industry Comparisons: 2012-2013
  • 13. Gartner 2013 Magic Quadrant for Business Intelligence and Analytics Platforms
    Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DZLPEH&ct=130207&st=sb (February 5, 2013)
  • 14. Microsoft Response
    Focus on familiar, intuitive user experiences delivered via high quality, industry-leading products that businesses already know and use today is key to making BI truly accessible to all users.
    By providing Business Intelligence capabilities in familiar tools such as Excel and SharePoint, we empower an entirely new segment of business users to build and consume rich BI solutions as part of their everyday work.
    Delivering the server-side capabilities to enable self-service BI via SharePoint and SQL Server provides a common, scalable data platform to handle any data, any size, from anywhere, and tackle all of your Big Data needs.
    Retrieved from http://blogs.msdn.com/b/microsoft_business_intelligence1/archive/2013/02/07/microsoft-in-leaders-quadrant-of-gartner-magic-quadrant-for-business-intelligence-and-analytics-platforms.aspx (Feb 2013)
  • 15. Gartner 2013 Magic Quadrant for Data Warehouse Database Management Systems
    Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DU2VD4&ct=130131&st=sb (January 31, 2013)
  • 16. KDNuggets 2012: http://marktab.net/datamining/2012/06/15/excel-number-commercial-tool-analytics-data-mining-big-data/
  • 17. SQL Server 2012: Business Intelligence and Business Analytics
  • 18. New Platform options: managed services
    Columns: Platform (Self Managed), Infrastructure (as a Service), Platform (as a Service), Software (as a Service)
    Stack layers listed in every column: Applications, Data, Runtime, Middleware, Database, O/S, Virtualization, Servers, Storage, Networking
    “Managed Services” labels overlay the three service columns
  • 19. SQL Release timelines
    SQL Server releases, 1989 to 2012:
      1989 SQL Server 1.0 (OS/2)
      1991 SQL Server 1.1 (OS/2)
      1993 SQL Server 4.21 (NT)
      1995 SQL Server 6.0
      1996 SQL Server 6.5
      1998 SQL Server 7.0
      2000 SQL Server 2000
      2005 SQL Server 2005
      2008 SQL Server 2008
      2010 SQL Server 2008 R2
      2012 SQL Server 2012
    SQL Server feature callouts: Replication, Dynamic Locking, Auto-Tuning, Full-text search, Analysis Services, Reporting Services, Unicode Support, Native XML, SQLCLR, Service Broker, Integration Services, Sparse Columns, Spatial Types, FILESTREAM, Data-tier Apps, StreamInsight, PowerPivot, Master Data Services, AlwaysOn, Columnstore, FileTable, Semantic Search, Power View
    SQL Azure releases, 2010 to 2011: SQL Azure RTW; service updates SU1, SU2, SU3, SU4, and SU6 RTW; SU V.Next; DataMarket RTW; SQL Azure Reporting CTP1 to CTP3; DataSync CTP1 to CTP3
    SQL Azure feature callouts: New Portal Experience, Sparse Columns, Database Copy, Web Admin, DAC Import/Export Service, MARS, Denali TSQL, Alter Edition, 50 GB Db, Multiple Servers, Spatial Type, Server Mgmt API, HierarchyId Type, JDBC, DAC Upgrade
  • 20. Data platform: SQL Server 2012
    Database Services: SQL Server*, SQL Azure*, Full Text & Semantic Search*
    Data Integration Services: Integration Services*, Master Data Services*, Replication, SQL Azure Data Sync*, Data Quality Services*, StreamInsight*, Project “Austin”*
    Analytical Services: Analysis Services*, Data Mining, PowerPivot*
    Reporting Services: Reporting Services*, SQL Azure Reporting*, Report Builder, Power View*
    * New / improved in SQL Server 2012
  • 21. SQL Server 2012 Editions
    Retrieved from http://www.microsoft.com/en-us/sqlserver/editions.aspx (February 2013)
  • 22. What Enterprise Tools support Microsoft Data Mining?
    Data Mining: SSMS, SSIS, PowerShell
  • 23-27. Variable
    [Figure, repeated across five slides: a variable on a scale from 0 to 7, shown in four rows labeled Discretized, Discretized, Continuous, and Discrete]
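
The difference between a continuous and a discretized variable can be sketched in a few lines. This is generic equal-width bucketing, chosen for illustration only; the function name `equal_width_buckets` and the sample values are hypothetical, and the slides do not say which discretization method the figure used.

```python
def equal_width_buckets(values, bucket_count):
    """Map each value to a 0-based bucket index using equal-width ranges."""
    low, high = min(values), max(values)
    if high == low:
        # All values are identical, so they share a single bucket.
        return [0] * len(values)
    width = (high - low) / bucket_count
    buckets = []
    for value in values:
        index = int((value - low) / width)
        # The maximum value would land one past the last bucket, so clamp it.
        buckets.append(min(index, bucket_count - 1))
    return buckets

# Hypothetical continuous values on the 0 to 7 scale shown on the slides.
continuous = [0.0, 1.5, 2.2, 3.9, 4.1, 5.5, 6.8, 7.0]
discretized = equal_width_buckets(continuous, 4)
print(discretized)
```

A discrete variable, by contrast, already holds a small set of distinct values (for example a category code) and needs no bucketing.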
  • 28. http://msdn.microsoft.com/en-us/library/ms174776.aspx
  • 29. http://msdn.microsoft.com/en-us/library/ms174587.aspx
  • 30. Data Mining Capacities: SQL Server 2008 R2 Analysis Services
    Object                                                            | Maximum sizes/numbers
    Maximum data mining models per structure                          | 2^31-1 = 2,147,483,647
    Maximum data mining structures per solution                       | 2^31-1 = 2,147,483,647
    Maximum data mining structures per Analysis Services database     | 2^31-1 = 2,147,483,647
    Maximum data mining attributes (variables) per structure          | 2^31-1 = 2,147,483,647
    Reference: http://www.marktab.net/datamining/index.php/2010/08/01/sql-server-data-mining-capacities-2008-r2/
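
The limit 2^31-1 is the largest value a signed 32-bit integer can hold. That these counts are stored as signed 32-bit integers is an assumption here, since the slide gives only the number. A quick check of the arithmetic, using the hypothetical helper `fits_in_signed_32`:

```python
import struct

MAX_SIGNED_32 = 2**31 - 1

def fits_in_signed_32(number):
    """Return True if number can be packed as a signed 32-bit integer."""
    try:
        struct.pack("<i", number)
        return True
    except struct.error:
        return False

print(f"{MAX_SIGNED_32:,}")
print(fits_in_signed_32(MAX_SIGNED_32), fits_in_signed_32(MAX_SIGNED_32 + 1))
```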
  • 31. Third-Party: Predixion Software
  • 32. Semantic Search: Text Mining
  • 33. Future: Most data is Text
    Two Research Types
    • Quantitative research = data mining
    • Qualitative research = text mining
    The future is combining both
  • 34. Statistical Semantic Search
    Comprises some aspects of text mining
    Identifies statistically relevant key phrases
    Based on these phrases, can identify (by score) similar documents
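
The idea on slide 34 (score key phrases statistically, then score document similarity from them) can be sketched with TF-IDF weights and cosine similarity. This is a stand-in technique chosen for illustration, not the algorithm SQL Server uses. The documents, the function names, and the use of single words in place of multi-word key phrases are all assumptions of the sketch.

```python
import math
from collections import Counter

# Hypothetical three-document corpus.
documents = {
    "doc1": "data mining finds patterns in sales data",
    "doc2": "text mining finds key phrases in documents",
    "doc3": "the bike shop sells bikes and helmets",
}

def tf_idf_vectors(docs):
    """Weight each term by its frequency in a document and its rarity overall."""
    tokens = {name: text.lower().split() for name, text in docs.items()}
    doc_count = len(docs)
    doc_frequency = Counter(
        term for words in tokens.values() for term in set(words)
    )
    vectors = {}
    for name, words in tokens.items():
        counts = Counter(words)
        vectors[name] = {
            term: (count / len(words)) * math.log(doc_count / doc_frequency[term])
            for term, count in counts.items()
        }
    return vectors

def key_phrases(vector, top_n=3):
    """Return the top_n highest-weighted terms of one document."""
    return sorted(vector, key=vector.get, reverse=True)[:top_n]

def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def similar_documents(name, vectors):
    """Rank every other document by similarity score, highest first."""
    scored = [
        (other, cosine(vectors[name], vector))
        for other, vector in vectors.items()
        if other != name
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

vectors = tf_idf_vectors(documents)
print(key_phrases(vectors["doc1"], top_n=1))
print(similar_documents("doc1", vectors))
```

Here doc2 ranks above doc3 for doc1 because the two mining documents share weighted terms, while the bike shop document shares none.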
  • 35. FileTables
    Built on existing SQL Server FILESTREAM technology
    Files and documents:
      Stored in special tables in SQL Server
      Accessed as if they were stored in the file system
  • 36. Full-Text Search Enhancements
    Property search: search on tagged properties (such as author or title)
    Customizable NEAR: find words or phrases close to one another
    New Word Breakers and Stemmers (for many languages)
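
The proximity idea behind NEAR can be sketched as follows. This illustrates the concept only and is not SQL Server's implementation. The function name `near`, its parameters, and the choice to measure distance as the number of words between the two terms are assumptions of the sketch.

```python
def near(text, first, second, max_distance):
    """Return True if first and second occur with at most max_distance words between them."""
    words = [word.strip(".,;:!?\"'()").lower() for word in text.split()]
    first_positions = [i for i, word in enumerate(words) if word == first.lower()]
    second_positions = [i for i, word in enumerate(words) if word == second.lower()]
    return any(
        abs(i - j) - 1 <= max_distance
        for i in first_positions
        for j in second_positions
        if i != j
    )

sentence = "SQL Server supports full-text search and semantic search"
# "server" and the first "search" have two words between them.
print(near(sentence, "server", "search", 2))
print(near(sentence, "server", "search", 1))
```

Making the distance a parameter is what "customizable" refers to here: the caller decides how close is close enough.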
  • 37. From Documents to Output
    [Diagram: Office and PDF documents, Varchar and NVarchar columns; rowset output with scores]
  • 38. [Diagram: Documents; iFilters (iFilter Required); Full-Text Keyword Index “FTI”; Semantic Key Phrase Index, Tag Index “TI”; Semantic Document Similarity Index “DSI”; Semantic Database]
  • 39. Languages Currently Supported
    Traditional Chinese; Simplified Chinese; German; British English; English; Portuguese; French; Chinese (Hong Kong SAR, PRC); Italian; Spanish; Brazilian; Chinese (Singapore); Russian; Chinese (Macau SAR); Swedish
  • 40. Phases of Semantic Indexing
    Full Text Keyword Index “FTI”
    Semantic Key Phrase Index, Tag Index “TI”
    Semantic Document Similarity Index “DSI”
    http://msdn.microsoft.com/en-us/library/gg492085.aspx#SemanticIndexing
  • 41. Integrated Full Text Search (iFTS)
    Improved Performance and Scale:
      Scale-up to 350M documents for storage and search
      iFTS query performance 7-10 times faster than in SQL Server 2008
      Worst-case iFTS query response times less than 3 sec for corpus
      Similar or better than main database search competitors
    (2012, Michael Rys, Microsoft)
  • 42. Linear Scale of FTI/TI/DSI
    First known linearly scaling end-to-end Search and Semantic product in the industry
    [Chart: Time in Seconds vs. Number of Documents]
    (2011, K. Mukerjee, T. Porter, S. Gherman, Microsoft)
  • 43. Text Mining References
    Video
      http://channel9.msdn.com/Shows/DataBound/DataBound-Episode-2-Semantic-Search
      http://www.microsoftpdc.com/2009/SVR32
    Semantic Search (Books Online), explains the demo
      http://msdn.microsoft.com/en-us/library/gg492075.aspx
    Paper
      http://users.cis.fiu.edu/~lzhen001/activities/KDD2011Program/docs/p213.pdf
  • 44. Microsoft Resources: Links
  • 45. Software
    SQL Server 2012 Enterprise (includes database engine, Analysis Services, SSMS and SSDT): http://www.microsoft.com/sqlserver/en/us/get-sql-server/try-it.aspx
    Microsoft Office 2012 Professional: http://office.microsoft.com/en-us/try
  • 46. Organizations
    Professional Association for SQL Server: http://www.sqlpass.org
    Atlanta MDF: http://www.atlantamdf.com/
    Atlanta Microsoft BI Users Group: http://www.meetup.com/Atlanta-Microsoft-Business-Intelligence-Users/
    PASS Business Analytics Conference: http://www.passbaconference.com
    Microsoft TechEd North America: http://northamerica.msteched.com/
  • 47. Interactive: Takeaways
  • 48. Conclusion
    Microsoft competes well with other vendors: Business Intelligence and Analytics, Data Warehouse, Excel
    SQL Server Data Mining 2012 provides data mining and semantic search
  • 49. Connect
    Data Mining Resources and blog: http://marktab.net
    Data Mining Training and Consulting (especially Microsoft and SAS): http://marktab.com