Applied Enterprise Semantic Search 201305

1,145 views

Published on

Presented at SQL Saturday Atlanta May 18, 2013
Text mining is projected to dominate data mining, and the reasons are evident: we have more text available than numeric data. Microsoft introduced a new technology to SQL Server 2012 called Semantic Search. This session's detailed description and demos give you important information for the enterprise implementation of Tag Index and Document Similarity Index. The demos include a web-based Silverlight application, and content documents from Wikipedia. We'll also look at strategy tips for how to best leverage the new semantic technology with existing Microsoft data mining.

Published in: Business
  • Be the first to comment

  • Be the first to like this

Applied Enterprise Semantic Search 201305

  1. 1. Applied EnterpriseSemantic MiningMark Tabladillo, Ph.D. (MVP, MCAD .NET, MCITP, MCT)PASS SQL Saturday #220 Atlanta GAMay 18, 2013
  2. 2. NetworkingInteractive
  3. 3. About MarkTabTraining and Consulting withhttp://marktab.comData Mining Resources and Blog athttp://marktab.netTwitter @marktabnet
  4. 4. Quick LookMy Semantic Search
  5. 5. InteractiveName three things you want from enterprise textmining
  6. 6. IntroductionSQL Server 2012 has new Programmability EnhancementsStatistical Semantic SearchFile TablesFull-Text Search ImprovementsThese combined technologies make SQL Server 2012 a strong contender in textmining
  7. 7. OutlineWhy Microsoft is competitive for data miningDefinitions: what is text mining?History: how Microsoft’s semantic search was bornWhat is inside semantic searchLogical modelDemosPerformanceMicrosoft Resources
  8. 8. Why Microsoft isCompetitive for DataMiningBased on 2012 and 2013 Surveys
  9. 9. Gartner 2013Magic Quadrant forBusiness Intelligenceand AnalyticsPlatformsRetrieved from http://www.gartner.com/technology/reprints.do?id=1-1DZLPEH&ct=130207&st=sb– February 5, 2013
  10. 10. Gartner 2013Magic Quadrant forData WarehouseDatabaseManagementSystemsRetrieved from http://www.gartner.com/technology/reprints.do?id=1-1DU2VD4&ct=130131&st=sb– January 31, 2013
  11. 11. KDNuggets 2012http://marktab.net/datamining/2012/06/15/excel-number-commercial-tool-analytics-data-mining-big-data/
  12. 12. DefinitionsWhat is text mining?
  13. 13. DefinitionData mining is the automated or semi-automated process ofdiscovering patterns in dataText mining is the automated or semi-automated process ofdiscovering patterns from textual dataMachine learning is the development and optimization ofalgorithms for automated or semi-automated pattern discovery
  14. 14. PurposesPhrase Goal“Data Mining”“Text Mining”Inform actionable decisions“MachineLearning”Determine best performingalgorithm
  15. 15. MarkTab Decision CycleAnalysis(science)Synthesis(art)GOScience needs science fiction -- MarkTab
  16. 16. MarkTab Decision CycleAnalysis(science)Synthesis(art)GO
  17. 17. HistoryHow Microsoft’s semantic search came to be
  18. 18. HistoryJuly 2008Microsoft purchases Powerset for US$100 MillionGoogle Dismisses Semantic Searchhttp://venturebeat.com/2008/06/26/microsoft-to-buy-semantic-search-engine-powerset-for-100m-plus/http://www.forbes.com/2008/07/01/powerset-msft-search-tech-intel-cx_ag_0701powerset.html
  19. 19. HistoryMarch 2009Google announces “snippets” as relevant to searchThe media picks this story up as “semantic search”http://googleblog.blogspot.com/2009/03/two-new-improvements-to-google-results.html#!/2009/03/two-new-improvements-to-google-results.html
  20. 20. HistoryFebruary 2012Google announces Knowledge Graph, an explicit application of semantic searchhttp://mashable.com/2012/02/13/google-knowledge-graph-change-search/
  21. 21. HistoryApril 2012Microsoft purchases 800+ patents from AOL for US$1 BillionAmong the patents are semantic search and metadata querying – older thanGooglehttp://www.theregister.co.uk/2012/04/09/aol_microsoft_patent_deal/
  22. 22. What is inside SemanticSearchText Mining introduced for SQL Server 2012
  23. 23. Future: Most data is TextTwo Research Types• Quantitative research = data mining• Qualitative research = text miningThe future is combining both
  24. 24. Statistical Semantic SearchComprises some aspects of text miningIdentifies statistically relevant key phrasesBased on these phrases, can identify (by score) similar documents
  25. 25. FileTablesBuilt on existing SQL Server FILESTREAM technologyFiles and documentsStored in special tables in SQL ServerAccessed if they were stored in the file system
  26. 26. Full-Text Search EnhancementsProperty search: search on tagged properties (such as author or title)Customizable NEAR: find words or phrases close to one anotherNew Word Breakers and Stemmers (for many languages)
  27. 27. Logical ModelHow semantic search works
  28. 28. RowsetOutputwith ScoresVarcharNVarcharOfficePDFFrom Documents to Output
  29. 29. (iFilter Required)DocumentsFull-TextKeywordIndex“FTI”iFiltersSemantic DocumentSimilarity Index “DSI”SemanticDatabaseSemanticKey PhraseIndex –Tag Index“TI”
  30. 30. Languages Currently SupportedTraditional ChineseGermanEnglishFrenchItalianBrazilianRussianSwedishSimplified ChineseBritish EnglishPortugueseChinese (Hong Kong SAR, PRC)SpanishChinese (Singapore)Chinese (Macau SAR)
  31. 31. Phases of Semantic IndexingFull Text Keyword Index “FTI”Semantic Key Phrase Index –Tag Index “TI”Semantic Document SimilarityIndex “DSI”http://msdn.microsoft.com/en-us/library/gg492085.aspx#SemanticIndexing
  32. 32. Interactive DemoSQL Server Management Studio
  33. 33. Semantic Search andSQL Server Data MiningSQL Server Data Tools: data mining plus text mining
  34. 34. PerformanceThe Million-Dollar Edge
  35. 35. Integrated Full Text Search (iFTS)Improved Performance and Scale:Scale-up to 350M documents for storage and searchiFTS query performance 7-10 times faster than in SQL Server 2008Worst-case iFTS query response times less than 3 sec for corpusSimilar or better than main database search competitors(2012, Michael Rys, Microsoft)
  36. 36. Linear Scale of FTI/TI/DSIFirst known linearly scaling end-to-end Search and Semantic product in the industryTime in Seconds vs. Number of Documents(2011 – K. Mukerjee, T. Porter, S. Gherman – Microsoft)
  37. 37. Text Mining ReferencesVideohttp://channel9.msdn.com/Shows/DataBound/DataBound-Episode-2-Semantic-Searchhttp://www.microsoftpdc.com/2009/SVR32Semantic Search (Books Online) – explains the demohttp://msdn.microsoft.com/en-us/library/gg492075.aspxPaperhttp://users.cis.fiu.edu/~lzhen001/activities/KDD2011Program/docs/p213.pdf
  38. 38. Microsoft ResourcesLinks
  39. 39. SoftwareSQL Server 2012 Enterprise(includes database engine, Analysis Services, SSMS and SSDT)http://www.microsoft.com/sqlserver/en/us/get-sql-server/try-it.aspxMicrosoft Office 2012 Professionalhttp://office.microsoft.com/en-us/try
  40. 40. OrganizationsProfessional Association for SQL Server http://www.sqlpass.orgAtlanta MDF http://www.atlantamdf.com/Atlanta Microsoft BI Users Group http://www.meetup.com/Atlanta-Microsoft-Business-Intelligence-Users/PASS Business Analytics Conference http://www.passbaconference.comMicrosoft TechEd North America http://northamerica.msteched.com/
  41. 41. InteractiveTakeaways
  42. 42. ConnectData Mining Resources and blog http://marktab.netData Mining Training and Consulting (especially Microsoft and SAS)http://marktab.com
  43. 43. ConclusionSQL Server Data Mining 2012 provides data mining and semantic searchThe core technology allows document similarity matchingThe results can be combined with SQL Server Data Mining (such asAssociation Analysis)

×