Applied Semantic Search with Microsoft SQL Server

3,050 views

Published on

Text mining is projected to dominate data mining, and the reasons are evident: we have more text available than numeric data. Microsoft introduced a new technology to SQL Server 2012 called Semantic Search. This session's detailed description and demos give you important information for the enterprise implementation of Tag Index and Document Similarity Index. The demos include a web-based Silverlight application, and content documents from Wikipedia. We'll also look at strategy tips for how to best leverage the new semantic technology with existing Microsoft data mining.

Published in: Business
  • Be the first to comment

Applied Semantic Search with Microsoft SQL Server

  1. 1. Applied EnterpriseSemantic MiningMark Tabladillo, Ph.D. (MVP, MCAD .NET, MCITP, MCT)PASS SQL Saturday #198 Vancouver BCFebruary 16, 2013
  2. 2. Photos © 2013 Mark Tabladillo, All Rights Reserved
  3. 3. Photos © 2013 Mark Tabladillo, All Rights Reserved
  4. 4. NetworkingInteractive
  5. 5. About MarkTabTraining and Consulting with Ph.D. – Industrial Engineering,http://marktab.com Georgia TechData Mining Resources and Blog at Training and consultinghttp://marktab.net internationally across many industries – SAS and Microsoft Contributed to peer-reviewed research and legislation Mentoring doctoral dissertations at the accredited University of Phoenix Presenter
  6. 6. Quick LookMy Semantic Search
  7. 7. InteractiveName three things you want from enterprise textmining
  8. 8. IntroductionSQL Server 2012 has new Programmability Enhancements Statistical Semantic Search File Tables Full-Text Search ImprovementsThese combined technologies make SQL Server 2012 a strong contender in textmining
  9. 9. OutlineWhy Microsoft is competitive for data miningDefinitions: what is text mining?History: how Microsoft’s semantic search was bornWhat is inside semantic search Logical model Demos PerformanceMicrosoft Resources
  10. 10. Why Microsoft isCompetitive for DataMiningBased on 2012 and 2013 Surveys
  11. 11. Gartner 2013 Magic Quadrant for Business Intelligence and Analytics Platforms Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DZLPEH&ct=130207&st=sb – February 5, 2013
  12. 12. Gartner 2013 Magic Quadrant for Data Warehouse Database Management Systems Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DU2VD4&ct=130131&st=sb – January 31, 2013
  13. 13. KDNuggets 2012http://marktab.net/datamining/2012/06/15/excel-number-commercial-tool-analytics-data-mining-big-data/
  14. 14. DefinitionsWhat is text mining?
  15. 15. DefinitionData mining is the automated or semi-automated process ofdiscovering patterns in data Text mining is the automated or semi-automated process of discovering patterns from textual dataMachine learning is the development and optimization ofalgorithms for automated or semi-automated pattern discovery
  16. 16. Purposes Phrase Goal “Data Mining” Inform actionable decisions “Text Mining” “Machine Determine best performing Learning” algorithm
  17. 17. MarkTab Decision Cycle GO Synthesis Analysis (art) (science) Science needs science fiction -- MarkTab
  18. 18. MarkTab Decision Cycle GO Synthesis Analysis (art) (science)
  19. 19. HistoryHow Microsoft’s semantic search came to be
  20. 20. HistoryJuly 2008 Microsoft purchases Powerset for US$100 Million Google Dismisses Semantic Search http://venturebeat.com/2008/06/26/microsoft-to-buy-semantic-search-engine- powerset-for-100m-plus/ http://www.forbes.com/2008/07/01/powerset-msft-search-tech-intel- cx_ag_0701powerset.html
  21. 21. HistoryMarch 2009 Google announces “snippets” as relevant to search The media picks this story up as “semantic search” http://googleblog.blogspot.com/2009/03/two-new-improvements-to-google- results.html#!/2009/03/two-new-improvements-to-google-results.html
  22. 22. HistoryFebruary 2012 Google announces Knowledge Graph, an explicit application of semantic search http://mashable.com/2012/02/13/google-knowledge-graph-change-search/
  23. 23. HistoryApril 2012 Microsoft purchases 800+ patents from AOL for US$1 Billion Among the patents are semantic search and metadata querying – older than Google http://www.theregister.co.uk/2012/04/09/aol_microsoft_patent_deal/
  24. 24. What is inside SemanticSearchText Mining introduced for SQL Server 2012
  25. 25. Future: Most data is TextTwo Research Types• Quantitative research = data mining• Qualitative research = text miningThe future is combining both
  26. 26. Statistical Semantic SearchComprises some aspects of text miningIdentifies statistically relevant key phrasesBased on these phrases, can identify (by score) similar documents
  27. 27. FileTablesBuilt on existing SQL Server FILESTREAM technologyFiles and documents Stored in special tables in SQL Server Accessed if they were stored in the file system
  28. 28. Full-Text Search EnhancementsProperty search: search on tagged properties (such as author or title)Customizable NEAR: find words or phrases close to one anotherNew Word Breakers and Stemmers (for many languages)
  29. 29. Logical ModelHow semantic search works
  30. 30. From Documents to Output Office Varchar PDF NVarchar Rowset Output with Scores
  31. 31. (iFilter Required) iFilters Full-Text Documents Keyword Index “FTI” Semantic Key Phrase Semantic Index – Semantic Document Database Tag Index Similarity Index “DSI” “TI”
  32. 32. Languages Currently SupportedTraditional Chinese Simplified ChineseGerman British EnglishEnglish PortugueseFrench Chinese (Hong Kong SAR, PRC)Italian SpanishBrazilian Chinese (Singapore)Russian Chinese (Macau SAR)Swedish
  33. 33. Phases of Semantic Indexing Full Text Keyword Index “FTI” Semantic Document Similarity Index “DSI” Semantic Key Phrase Index – Tag Index “TI” http://msdn.microsoft.com/en-us/library/gg492085.aspx#SemanticIndexing
  34. 34. Interactive DemoSQL Server Management Studio
  35. 35. Semantic Search andSQL Server Data MiningSQL Server Data Tools: data mining plus text mining
  36. 36. PerformanceThe Million-Dollar Edge
  37. 37. Integrated Full Text Search (iFTS)Improved Performance and Scale: Scale-up to 350M documents for storage and search iFTS query performance 7-10 times faster than in SQL Server 2008 Worst-case iFTS query response times less than 3 sec for corpus Similar or better than main database search competitors(2012, Michael Rys, Microsoft)
  38. 38. Linear Scale of FTI/TI/DSIFirst known linearly scaling end-to-end Search and Semantic product in the industry Time in Seconds vs. Number of Documents (2011 – K. Mukerjee, T. Porter, S. Gherman – Microsoft)
  39. 39. Text Mining ReferencesVideo http://channel9.msdn.com/Shows/DataBound/DataBound-Episode-2-Semantic- Search http://www.microsoftpdc.com/2009/SVR32Semantic Search (Books Online) – explains the demo http://msdn.microsoft.com/en-us/library/gg492075.aspxPaper http://users.cis.fiu.edu/~lzhen001/activities/KDD2011Program/docs/p213.pdf
  40. 40. Microsoft ResourcesLinks
  41. 41. SoftwareSQL Server 2012 Enterprise(includes database engine, Analysis Services, SSMS and SSDT) http://www.microsoft.com/sqlserver/en/us/get-sql-server/try-it.aspxMicrosoft Office 2012 Professional http://office.microsoft.com/en-us/try
  42. 42. Organizations Professional Association for SQL Server http://www.sqlpass.org Atlanta MDF http://www.atlantamdf.com/ Atlanta Microsoft BI Users Group http://www.meetup.com/Atlanta-Microsoft- Business-Intelligence-Users/PASS Business Analytics Conference http://www.passbaconference.comMicrosoft TechEd North America http://northamerica.msteched.com/
  43. 43. InteractiveTakeaways
  44. 44. ConclusionSQL Server Data Mining 2012 provides data mining and semantic searchThe core technology allows document similarity matchingThe results can be combined with SQL Server Data Mining (such asAssociation Analysis)
  45. 45. ConnectData Mining Resources and blog http://marktab.netData Mining Training and Consulting (especially Microsoft and SAS)http://marktab.com

×