
Applied Semantic Search with Microsoft SQL Server

Text mining is projected to dominate data mining, and the reasons are evident: we have more text available than numeric data. Microsoft introduced a new technology to SQL Server 2012 called Semantic Search. This session's detailed description and demos give you important information for the enterprise implementation of Tag Index and Document Similarity Index. The demos include a web-based Silverlight application, and content documents from Wikipedia. We'll also look at strategy tips for how to best leverage the new semantic technology with existing Microsoft data mining.

Published in: Business

Transcript

  • 1. Applied Enterprise Semantic Mining. Mark Tabladillo, Ph.D. (MVP, MCAD .NET, MCITP, MCT). PASS SQL Saturday #198, Vancouver BC, February 16, 2013
  • 2. Photos © 2013 Mark Tabladillo, All Rights Reserved
  • 3. Photos © 2013 Mark Tabladillo, All Rights Reserved
  • 4. Networking: Interactive
  • 5. About MarkTab: Training and Consulting with http://marktab.com. Data Mining Resources and Blog at http://marktab.net. Ph.D. in Industrial Engineering, Georgia Tech. Training and consulting internationally across many industries (SAS and Microsoft). Contributed to peer-reviewed research and legislation. Mentoring doctoral dissertations at the accredited University of Phoenix. Presenter
  • 6. Quick Look: My Semantic Search
  • 7. Interactive: Name three things you want from enterprise text mining
  • 8. Introduction: SQL Server 2012 has new Programmability Enhancements: Statistical Semantic Search; FileTables; Full-Text Search Improvements. These combined technologies make SQL Server 2012 a strong contender in text mining.
  • 9. Outline: Why Microsoft is competitive for data mining. Definitions: what is text mining? History: how Microsoft’s semantic search was born. What is inside semantic search: Logical model; Demos; Performance. Microsoft Resources.
  • 10. Why Microsoft is Competitive for Data Mining: Based on 2012 and 2013 Surveys
  • 11. Gartner 2013 Magic Quadrant for Business Intelligence and Analytics Platforms Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DZLPEH&ct=130207&st=sb – February 5, 2013
  • 12. Gartner 2013 Magic Quadrant for Data Warehouse Database Management Systems Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DU2VD4&ct=130131&st=sb – January 31, 2013
  • 13. KDNuggets 2012: http://marktab.net/datamining/2012/06/15/excel-number-commercial-tool-analytics-data-mining-big-data/
  • 14. Definitions: What is text mining?
  • 15. Definition: Data mining is the automated or semi-automated process of discovering patterns in data. Text mining is the automated or semi-automated process of discovering patterns from textual data. Machine learning is the development and optimization of algorithms for automated or semi-automated pattern discovery.
  • 16. Purposes (Phrase: Goal). “Data Mining” and “Text Mining”: Inform actionable decisions. “Machine Learning”: Determine best performing algorithm.
  • 17. MarkTab Decision Cycle: GO; Synthesis (art); Analysis (science). “Science needs science fiction” -- MarkTab
  • 18. MarkTab Decision Cycle: GO; Synthesis (art); Analysis (science)
  • 19. History: How Microsoft’s semantic search came to be
  • 20. History, July 2008: Microsoft purchases Powerset for US$100 Million. Google Dismisses Semantic Search. http://venturebeat.com/2008/06/26/microsoft-to-buy-semantic-search-engine-powerset-for-100m-plus/ http://www.forbes.com/2008/07/01/powerset-msft-search-tech-intel-cx_ag_0701powerset.html
  • 21. History, March 2009: Google announces “snippets” as relevant to search. The media picks this story up as “semantic search”. http://googleblog.blogspot.com/2009/03/two-new-improvements-to-google-results.html#!/2009/03/two-new-improvements-to-google-results.html
  • 22. History, February 2012: Google announces Knowledge Graph, an explicit application of semantic search. http://mashable.com/2012/02/13/google-knowledge-graph-change-search/
  • 23. History, April 2012: Microsoft purchases 800+ patents from AOL for US$1 Billion. Among the patents are semantic search and metadata querying, older than Google. http://www.theregister.co.uk/2012/04/09/aol_microsoft_patent_deal/
  • 24. What is inside Semantic Search: Text Mining introduced for SQL Server 2012
  • 25. Future: Most data is Text. Two Research Types: Quantitative research = data mining; Qualitative research = text mining. The future is combining both.
  • 26. Statistical Semantic Search: Comprises some aspects of text mining. Identifies statistically relevant key phrases. Based on these phrases, can identify (by score) similar documents.
  • 27. FileTables: Built on existing SQL Server FILESTREAM technology. Files and documents are stored in special tables in SQL Server and accessed as if they were stored in the file system.
  • 28. Full-Text Search Enhancements: Property search: search on tagged properties (such as author or title). Customizable NEAR: find words or phrases close to one another. New Word Breakers and Stemmers (for many languages).
  • 29. Logical Model: How semantic search works
  • 30. From Documents to Output: Office, PDF, Varchar, NVarchar; Rowset Output with Scores
  • 31. Documents (iFilter Required); iFilters; Full-Text Keyword Index “FTI”; Semantic Database; Semantic Key Phrase Index – Tag Index “TI”; Semantic Document Similarity Index “DSI”
  • 32. Languages Currently Supported: Traditional Chinese; German; English; French; Italian; Brazilian; Russian; Swedish; Simplified Chinese; British English; Portuguese; Chinese (Hong Kong SAR, PRC); Spanish; Chinese (Singapore); Chinese (Macau SAR)
  • 33. Phases of Semantic Indexing: Full Text Keyword Index “FTI”; Semantic Document Similarity Index “DSI”; Semantic Key Phrase Index – Tag Index “TI”. http://msdn.microsoft.com/en-us/library/gg492085.aspx#SemanticIndexing
  • 34. Interactive Demo: SQL Server Management Studio
  • 35. Semantic Search and SQL Server Data Mining: SQL Server Data Tools: data mining plus text mining
  • 36. Performance: The Million-Dollar Edge
  • 37. Integrated Full Text Search (iFTS): Improved Performance and Scale: Scale-up to 350M documents for storage and search; iFTS query performance 7-10 times faster than in SQL Server 2008; Worst-case iFTS query response times less than 3 sec for corpus; Similar or better than main database search competitors (2012, Michael Rys, Microsoft)
  • 38. Linear Scale of FTI/TI/DSI: First known linearly scaling end-to-end Search and Semantic product in the industry. Chart: Time in Seconds vs. Number of Documents (2011 – K. Mukerjee, T. Porter, S. Gherman – Microsoft)
  • 39. Text Mining References: Video (http://channel9.msdn.com/Shows/DataBound/DataBound-Episode-2-Semantic-Search and http://www.microsoftpdc.com/2009/SVR32); Semantic Search (Books Online), explains the demo (http://msdn.microsoft.com/en-us/library/gg492075.aspx); Paper (http://users.cis.fiu.edu/~lzhen001/activities/KDD2011Program/docs/p213.pdf)
  • 40. Microsoft Resources: Links
  • 41. Software: SQL Server 2012 Enterprise, includes database engine, Analysis Services, SSMS and SSDT (http://www.microsoft.com/sqlserver/en/us/get-sql-server/try-it.aspx); Microsoft Office 2013 Professional (http://office.microsoft.com/en-us/try)
  • 42. Organizations: Professional Association for SQL Server (http://www.sqlpass.org); Atlanta MDF (http://www.atlantamdf.com/); Atlanta Microsoft BI Users Group (http://www.meetup.com/Atlanta-Microsoft-Business-Intelligence-Users/); PASS Business Analytics Conference (http://www.passbaconference.com); Microsoft TechEd North America (http://northamerica.msteched.com/)
  • 43. Interactive: Takeaways
  • 44. Conclusion: SQL Server Data Mining 2012 provides data mining and semantic search. The core technology allows document similarity matching. The results can be combined with SQL Server Data Mining (such as Association Analysis).
  • 45. Connect: Data Mining Resources and blog (http://marktab.net); Data Mining Training and Consulting, especially Microsoft and SAS (http://marktab.com)
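
Slide 26 says that Statistical Semantic Search identifies statistically relevant key phrases and then scores similar documents based on them. The slides do not describe the statistical model SQL Server uses, so the sketch below is only a generic stand-in for the idea: it weights terms with plain TF-IDF and compares documents with cosine similarity. The function names, the sample documents, and the choice of TF-IDF are all assumptions made for illustration, not Microsoft's implementation.

```python
import math
from collections import Counter

PUNCTUATION = ".,;:!?()\"'"


def tokenize(text):
    """Lowercase, split on whitespace, and trim surrounding punctuation."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]


def tfidf_vectors(docs):
    """Map each document id to a {term: TF-IDF weight} dictionary."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    doc_freq = Counter()
    for words in tokens.values():
        doc_freq.update(set(words))  # count each term once per document
    n_docs = len(docs)
    vectors = {}
    for doc_id, words in tokens.items():
        counts = Counter(words)
        vectors[doc_id] = {
            # term frequency times inverse document frequency
            term: (count / len(words)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return vectors


def key_phrases(vector, top_n=3):
    """Return the top_n (term, weight) pairs, highest weight first."""
    return sorted(vector.items(), key=lambda item: (-item[1], item[0]))[:top_n]


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_documents(vectors, source_id):
    """Rank every other document by similarity to source_id."""
    scores = [
        (doc_id, cosine(vectors[source_id], vec))
        for doc_id, vec in vectors.items()
        if doc_id != source_id
    ]
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Hypothetical sample documents, invented for this sketch.
docs = {
    "sql": "SQL Server full-text search indexes documents stored in tables",
    "semantic": "Semantic search in SQL Server scores key phrases and similar documents",
    "cooking": "Slow roasted tomatoes with garlic and olive oil",
}
vectors = tfidf_vectors(docs)
print(key_phrases(vectors["cooking"]))
print(similar_documents(vectors, "sql"))
```

The two outputs loosely mirror the two semantic indexes named on slides 31 and 33: the per-document key phrase list plays the role of the Tag Index, and the ranked list of other documents plays the role of the Document Similarity Index. In SQL Server itself these results come from the semantic indexes rather than from application code like this.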
