Applied Semantic Search 201306

620 views

Published on

Text mining is projected to dominate data mining, and the reasons are evident: we have more text available than numeric data. Microsoft introduced a new technology to SQL Server 2012 called Semantic Search. This session's detailed description and demos give you important information for the enterprise implementation of Tag Index and Document Similarity Index. The demos include a web-based Silverlight application, and content documents from Wikipedia. We'll also look at strategy tips for how to best leverage the new semantic technology with existing Microsoft data mining.

Published in: Technology
0 Comments
0 Likes
Statistics
Notes
  • Be the first to comment

  • Be the first to like this

No Downloads
Views
Total views
620
On SlideShare
0
From Embeds
0
Number of Embeds
53
Actions
Shares
0
Downloads
0
Comments
0
Likes
0
Embeds 0
No embeds

No notes for slide

Applied Semantic Search 201306

  1. 1. Applied Enterprise Semantic Mining Mark Tabladillo, Ph.D. (MVP, MCAD .NET, MCITP, MCT) PASS SQL Saturday #229 South Florida June 29, 2013
  2. 2. Networking Interactive
  3. 3. About MarkTab Training and Consulting with http://marktab.com Data Mining Resources and Blog at http://marktab.net Twitter @marktabnet
  4. 4. Quick Look My Semantic Search
  5. 5. Interactive Name three things you want from enterprise text mining
  6. 6. Introduction SQL Server 2012 has new Programmability Enhancements Statistical Semantic Search File Tables Full-Text Search Improvements These combined technologies make SQL Server 2012 a strong contender in text mining
  7. 7. Outline Why Microsoft is competitive for data mining Definitions: what is text mining? History: how Microsoft’s semantic search was born What is inside semantic search Logical model Demos Performance Microsoft Resources
  8. 8. Why Microsoft is Competitive for Data Mining Based on 2012 and 2013 Surveys
  9. 9. Gartner 2013 Magic Quadrant for Business Intelligence and Analytics Platforms Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DZLPEH&ct=130207&st=sb – February 5, 2013
  10. 10. Gartner 2013 Magic Quadrant for Data Warehouse Database Management Systems Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DU2VD4&ct=130131&st=sb – January 31, 2013
  11. 11. KDNuggets 2013 http://www.kdnuggets.com/polls/2013/analytics-big-data- mining-data-science-software.html
  12. 12. Definitions What is text mining?
  13. 13. Definition Data mining is the automated or semi-automated process of discovering patterns in data Text mining is the automated or semi-automated process of discovering patterns from textual data Machine learning is the development and optimization of algorithms for automated or semi-automated pattern discovery
  14. 14. Purposes Phrase Goal “Data Mining” “Text Mining” Inform actionable decisions “Machine Learning” Determine best performing algorithm
  15. 15. MarkTab Decision Cycle Analysis (science) Synthesis (art) GO Science needs science fiction -- MarkTab
  16. 16. MarkTab Decision Cycle Analysis (science) Synthesis (art) GO
  17. 17. History How Microsoft’s semantic search came to be
  18. 18. History July 2008 Microsoft purchases Powerset for US$100 Million Google Dismisses Semantic Search http://venturebeat.com/2008/06/26/microsoft-to-buy-semantic-search-engine- powerset-for-100m-plus/ http://www.forbes.com/2008/07/01/powerset-msft-search-tech-intel- cx_ag_0701powerset.html
  19. 19. History March 2009 Google announces “snippets” as relevant to search The media picks this story up as “semantic search” http://googleblog.blogspot.com/2009/03/two-new-improvements-to-google- results.html#!/2009/03/two-new-improvements-to-google-results.html
  20. 20. History February 2012 Google announces Knowledge Graph, an explicit application of semantic search http://mashable.com/2012/02/13/google-knowledge-graph-change-search/
  21. 21. History April 2012 Microsoft purchases 800+ patents from AOL for US$1 Billion Among the patents are semantic search and metadata querying – older than Google http://www.theregister.co.uk/2012/04/09/aol_microsoft_patent_deal/
  22. 22. What is inside Semantic Search Text Mining introduced for SQL Server 2012
  23. 23. Future: Most data is Text Two Research Types • Quantitative research = data mining • Qualitative research = text mining The future is combining both
  24. 24. Statistical Semantic Search Comprises some aspects of text mining Identifies statistically relevant key phrases Based on these phrases, can identify (by score) similar documents
  25. 25. FileTables Built on existing SQL Server FILESTREAM technology Files and documents Stored in special tables in SQL Server Accessed if they were stored in the file system
  26. 26. Full-Text Search Enhancements Property search: search on tagged properties (such as author or title) Customizable NEAR: find words or phrases close to one another New Word Breakers and Stemmers (for many languages)
  27. 27. Logical Model How semantic search works
  28. 28. Rowset Output with Scores Varchar NVarchar Office PDF From Documents to Output
  29. 29. (iFilter Required) Documents Full-Text Keyword Index “FTI” iFilters Semantic Document Similarity Index “DSI” Semantic Database Semantic Key Phrase Index – Tag Index “TI”
  30. 30. Languages Currently Supported Traditional Chinese German English French Italian Brazilian Russian Swedish Simplified Chinese British English Portuguese Chinese (Hong Kong SAR, PRC) Spanish Chinese (Singapore) Chinese (Macau SAR)
  31. 31. Phases of Semantic Indexing Full Text Keyword Index “FTI” Semantic Key Phrase Index – Tag Index “TI” Semantic Document Similarity Index “DSI” http://msdn.microsoft.com/en-us/library/gg492085.aspx#SemanticIndexing
  32. 32. Interactive Demo SQL Server Management Studio
  33. 33. Semantic Search and SQL Server Data Mining SQL Server Data Tools: data mining plus text mining
  34. 34. Performance The Million-Dollar Edge
  35. 35. Integrated Full Text Search (iFTS) Improved Performance and Scale: Scale-up to 350M documents for storage and search iFTS query performance 7-10 times faster than in SQL Server 2008 Worst-case iFTS query response times less than 3 sec for corpus Similar or better than main database search competitors (2012, Michael Rys, Microsoft)
  36. 36. Linear Scale of FTI/TI/DSI First known linearly scaling end-to-end Search and Semantic product in the industry Time in Seconds vs. Number of Documents (2011 – K. Mukerjee, T. Porter, S. Gherman – Microsoft)
  37. 37. Text Mining References Video http://channel9.msdn.com/Shows/DataBound/DataBound-Episode-2-Semantic- Search http://www.microsoftpdc.com/2009/SVR32 Semantic Search (Books Online) – explains the demo http://msdn.microsoft.com/en-us/library/gg492075.aspx Paper http://users.cis.fiu.edu/~lzhen001/activities/KDD2011Program/docs/p213.pdf
  38. 38. Microsoft Resources Links
  39. 39. Software SQL Server 2012 Enterprise (includes database engine, Analysis Services, SSMS and SSDT) http://www.microsoft.com/sqlserver/en/us/get-sql-server/try-it.aspx Microsoft Office 2012 Professional http://office.microsoft.com/en-us/try
  40. 40. Organizations Professional Association for SQL Server http://www.sqlpass.org Atlanta MDF http://www.atlantamdf.com/ Atlanta Microsoft BI Users Group http://www.meetup.com/Atlanta-Microsoft- Business-Intelligence-Users/ PASS Business Analytics Conference http://www.passbaconference.com Microsoft TechEd North America http://northamerica.msteched.com/
  41. 41. Interactive Takeaways
  42. 42. Connect Data Mining Resources and blog http://marktab.net Data Mining Training and Consulting (especially Microsoft and SAS) http://marktab.com
  43. 43. Conclusion SQL Server Data Mining 2012 provides data mining and semantic search The core technology allows document similarity matching The results can be combined with SQL Server Data Mining (such as Association Analysis)

×