Successfully reported this slideshow.
We use your LinkedIn profile and activity data to personalize ads and to show you more relevant ads. You can change your ad preferences anytime.

Applied Enterprise Semantic Mining -- Charlotte 201410


Published on

Text mining is projected to dominate data mining, and the reasons are evident: we have more text available than numeric data. Microsoft introduced a new technology to SQL Server 2014 called Semantic Search. This session's detailed description and demos give you important information for the enterprise implementation of Tag Index and Document Similarity Index, and will also provide a comparison between what semantic search is and what Delve does. The demos include a web-based Silverlight application, and content documents from Wikipedia. We'll also look at strategy tips for how to best leverage the new semantic technology with existing Microsoft data mining.

Published in: Business
  • Be the first to comment

  • Be the first to like this

Applied Enterprise Semantic Mining -- Charlotte 201410

  1. 1. Applied Enterprise Semantic Mining Mark Tabladillo, Ph.D. Microsoft MVP | Consultant (SolidQ) Charlotte, NC –SQL Saturday 330 October 4, 2014
  2. 2. Networking Interactive
  3. 3. Mark Tab SQL Server MVP; SAS Expert Consulting Training Teaching Presenting Linked In @MarkTabNet
  4. 4. Quick Look My Semantic Search
  5. 5. Interactive Name three things you want from enterprise text mining
  6. 6. Introduction SQL Server has new Programmability Enhancements Statistical Semantic Search File Tables Full-Text Search Improvements These combined technologies make SQL Server 2014 a strong contender in text mining
  7. 7. Outline Why Microsoft is competitive for data mining Definitions: what is text mining? History: how Microsoft’s semantic search was born What is inside semantic search Logical model Demos Performance Microsoft Resources
  8. 8. Why Microsoft is Competitive for Data Mining Based on 2012 and 2013 Surveys
  9. 9. Gartner 2013 Magic Quadrant for Business Intelligence and Analytics Platforms Retrieved from–February 5, 2013
  10. 10. Gartner 2013 Magic Quadrant for Data Warehouse Database Management Systems Retrieved from–January 31, 2013
  11. 11. KDNuggets2013 mining-data-science-software.html
  12. 12. Definitions What is text mining?
  13. 13. Definition Data mining is the automated or semi-automated process of discovering patterns in data Text mining is the automated or semi-automated process of discovering patterns from textual data Machine learning is the development and optimization of algorithms for automated or semi-automated pattern discovery
  14. 14. Purposes Phrase Goal “Data Mining” “Text Mining” Inform actionabledecisions “Machine Learning” Determine best performingalgorithm
  15. 15. MarkTab Decision Cycle Analysis (science) Synthesis (art) GO Science needs science fiction --MarkTab
  16. 16. MarkTab Decision Cycle Analysis (science) Synthesis (art) GO
  17. 17. History How Microsoft’s semantic search came to be
  18. 18. History July 2008 Microsoft purchases Powerset for US$100 Million Google Dismisses Semantic Search powerset-for-100m-plus/ cx_ag_0701powerset.html
  19. 19. History March 2009 Google announces “snippets” as relevant to search The media picks this story up as “semantic search” results.html#!/2009/03/two-new-improvements-to-google-results.html
  20. 20. History February 2012 Google announces Knowledge Graph, an explicit application of semantic search
  21. 21. History April 2012 Microsoft purchases 800+ patents from AOL for US$1 Billion Among the patents are semantic search and metadata querying –older than Google
  22. 22. What is inside Semantic Search Text Mining introduced for SQL Server 2012
  23. 23. Future: Most data is Text Two Research Types •Quantitative research = data mining •Qualitative research = text mining The future is combining both
  24. 24. Statistical Semantic Search Comprises some aspects of text mining Identifies statistically relevant key phrases Based on these phrases, can identify (by score) similar documents
  25. 25. FileTables Built on existing SQL Server FILESTREAM technology Files and documents Stored in special tables in SQL Server Accessed if they were stored in the file system
  26. 26. Full-Text Search Enhancements Property search: search on tagged properties (such as author or title) Customizable NEAR: find words or phrases close to one another New Word Breakers and Stemmers (for many languages)
  27. 27. Logical Model How semantic search works
  28. 28. RowsetOutput with Scores Varchar NVarchar Office PDF From Documents to Output
  29. 29. (iFilterRequired) Documents Full-Text Keyword Index “FTI” iFilters Semantic Document Similarity Index “DSI” Semantic Database Semantic Key Phrase Index – Tag Index “TI”
  30. 30. Languages Currently Supported Traditional Chinese German English French Italian Brazilian Russian Swedish Simplified Chinese British English Portuguese Chinese (Hong Kong SAR, PRC) Spanish Chinese (Singapore) Chinese (Macau SAR)
  31. 31. Phases of Semantic Indexing Full Text Keyword Index “FTI” Semantic Key Phrase Index – Tag Index “TI” Semantic Document Similarity Index “DSI”
  32. 32. Interactive Demo SQL Server Management Studio
  33. 33. Semantic Search and SQL Server Data Mining SQL Server Data Tools: data mining plus text mining
  34. 34. Performance The Million-Dollar Edge
  35. 35. Integrated Full Text Search (iFTS) Improved Performance and Scale: Scale-up to 350M documents for storage and search iFTSquery performance 7-10 times faster than in SQL Server 2008 Worst-case iFTSquery response times less than 3 sec for corpus Similar or better than main database search competitors (2012, Michael Rys, Microsoft)
  36. 36. Linear Scale of FTI/TI/DSI First known linearly scaling end-to-end Search and Semantic product in the industry Time in Seconds vs. Number of Documents (2011 –K. Mukerjee, T. Porter, S. Gherman–Microsoft)
  37. 37. Office Delve
  38. 38. Microsoft Resources Links
  39. 39. Text Mining References Video Search Semantic Search (Books Online) –explains the demo Paper
  40. 40. Software SQL Server Enterprise (includes database engine, Analysis Services, SSMS and SSDT) Microsoft Office 365 Home
  41. 41. Organizations Professional Association for SQL Server PASS Business Analytics Conference
  42. 42. Interactive Takeaways
  43. 43. Connect Data Mining Resources and blog Linked In Twitter @marktabnet
  44. 44. Conclusion SQL Server provides data mining and semantic search The core technology allows document similarity matching The results can be combined with SQL Server Data Mining (such as Association Analysis)