Applied Enterprise Semantic Search 201305

Presented at SQL Saturday Atlanta May 18, 2013
Text mining is projected to dominate data mining because more text is available than numeric data. SQL Server 2012 introduced a new technology called Semantic Search. This session's description and demos cover the enterprise implementation of the Tag Index and the Document Similarity Index. The demos include a web-based Silverlight application and content documents from Wikipedia. We'll also look at strategy tips for combining the new semantic technology with existing Microsoft data mining.

Published in: Business


Transcript

  • 1. Applied Enterprise Semantic Mining. Mark Tabladillo, Ph.D. (MVP, MCAD .NET, MCITP, MCT). PASS SQL Saturday #220, Atlanta, GA, May 18, 2013
  • 2. Networking: Interactive
  • 3. About MarkTab: Training and Consulting with http://marktab.com; Data Mining Resources and Blog at http://marktab.net; Twitter @marktabnet
  • 4. Quick Look: My Semantic Search
  • 5. Interactive: Name three things you want from enterprise text mining
  • 6. Introduction: SQL Server 2012 has new Programmability Enhancements: Statistical Semantic Search; FileTables; Full-Text Search Improvements. These combined technologies make SQL Server 2012 a strong contender in text mining
  • 7. Outline: Why Microsoft is competitive for data mining; Definitions: what is text mining?; History: how Microsoft’s semantic search was born; What is inside semantic search; Logical model; Demos; Performance; Microsoft Resources
  • 8. Why Microsoft is Competitive for Data Mining: Based on 2012 and 2013 Surveys
  • 9. Gartner 2013 Magic Quadrant for Business Intelligence and Analytics Platforms. Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DZLPEH&ct=130207&st=sb, February 5, 2013
  • 10. Gartner 2013 Magic Quadrant for Data Warehouse Database Management Systems. Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DU2VD4&ct=130131&st=sb, January 31, 2013
  • 11. KDNuggets 2012: http://marktab.net/datamining/2012/06/15/excel-number-commercial-tool-analytics-data-mining-big-data/
  • 12. Definitions: What is text mining?
  • 13. Definition: Data mining is the automated or semi-automated process of discovering patterns in data. Text mining is the automated or semi-automated process of discovering patterns from textual data. Machine learning is the development and optimization of algorithms for automated or semi-automated pattern discovery.
  • 14. Purposes (phrase: goal): “Data Mining” and “Text Mining”: inform actionable decisions; “Machine Learning”: determine best performing algorithm
  • 15. MarkTab Decision Cycle: Analysis (science); Synthesis (art); GO. Science needs science fiction -- MarkTab
  • 16. MarkTab Decision Cycle: Analysis (science); Synthesis (art); GO
  • 17. History: How Microsoft’s semantic search came to be
  • 18. History, July 2008: Microsoft purchases Powerset for US$100 Million; Google Dismisses Semantic Search. http://venturebeat.com/2008/06/26/microsoft-to-buy-semantic-search-engine-powerset-for-100m-plus/ http://www.forbes.com/2008/07/01/powerset-msft-search-tech-intel-cx_ag_0701powerset.html
  • 19. History, March 2009: Google announces “snippets” as relevant to search; the media picks this story up as “semantic search”. http://googleblog.blogspot.com/2009/03/two-new-improvements-to-google-results.html#!/2009/03/two-new-improvements-to-google-results.html
  • 20. History, February 2012: Google announces Knowledge Graph, an explicit application of semantic search. http://mashable.com/2012/02/13/google-knowledge-graph-change-search/
  • 21. History, April 2012: Microsoft purchases 800+ patents from AOL for US$1 Billion; among the patents are semantic search and metadata querying (older than Google). http://www.theregister.co.uk/2012/04/09/aol_microsoft_patent_deal/
  • 22. What is inside Semantic Search: Text Mining introduced for SQL Server 2012
  • 23. Future: Most data is Text. Two Research Types: Quantitative research = data mining; Qualitative research = text mining. The future is combining both
  • 24. Statistical Semantic Search: Comprises some aspects of text mining; identifies statistically relevant key phrases; based on these phrases, can identify (by score) similar documents
  • 25. FileTables: Built on existing SQL Server FILESTREAM technology; files and documents are stored in special tables in SQL Server; accessed as if they were stored in the file system
  • 26. Full-Text Search Enhancements: Property search: search on tagged properties (such as author or title); Customizable NEAR: find words or phrases close to one another; New Word Breakers and Stemmers (for many languages)
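
The customizable NEAR idea from slide 26 can be illustrated outside SQL Server. The sketch below is a plain-Python approximation, not the full-text engine's implementation: the function names `tokenize` and `near`, the regex tokenizer, and the `max_distance` parameter are all assumptions made for illustration. It reports whether two terms occur with at most a given number of other tokens between them.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens.

    This is a naive stand-in for a real word breaker (an assumption).
    """
    return re.findall(r"[a-z0-9']+", text.lower())


def near(text, first, second, max_distance=5):
    """Return True if `first` and `second` occur with at most
    `max_distance` other tokens between them, in either order."""
    tokens = tokenize(text)
    first_positions = [i for i, t in enumerate(tokens) if t == first.lower()]
    second_positions = [i for i, t in enumerate(tokens) if t == second.lower()]
    return any(
        abs(i - j) - 1 <= max_distance
        for i in first_positions
        for j in second_positions
        if i != j
    )


doc = "Semantic search in SQL Server 2012 builds on the full-text search index."
print(near(doc, "semantic", "server", max_distance=3))  # True: 3 tokens in between
print(near(doc, "semantic", "index", max_distance=3))   # False: 11 tokens in between
```

Making the distance a parameter mirrors the "customizable" part of the slide: the caller decides how close is close enough.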
  • 27. Logical Model: How semantic search works
  • 28. From Documents to Output: inputs (Varchar, NVarchar, Office, PDF); output (Rowset Output with Scores)
  • 29. Diagram: Documents (iFilter Required); iFilters; Full-Text Keyword Index “FTI”; Semantic Database; Semantic Key Phrase Index (Tag Index, “TI”); Semantic Document Similarity Index “DSI”
  • 30. Languages Currently Supported: Traditional Chinese; German; English; French; Italian; Brazilian; Russian; Swedish; Simplified Chinese; British English; Portuguese; Chinese (Hong Kong SAR, PRC); Spanish; Chinese (Singapore); Chinese (Macau SAR)
  • 31. Phases of Semantic Indexing: Full Text Keyword Index “FTI”; Semantic Key Phrase Index (Tag Index, “TI”); Semantic Document Similarity Index “DSI”. http://msdn.microsoft.com/en-us/library/gg492085.aspx#SemanticIndexing
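
The three indexing phases on slide 31 can be mimicked in miniature. The sketch below is a toy illustration in plain Python, assuming TF-IDF weights for key phrases and cosine similarity for document matching; the slides do not state which statistics SQL Server uses internally, and every function name and sample document here is hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Naive word breaker (an assumption): lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_keyword_index(docs):
    """Phase 1, a stand-in for the FTI: term counts per document."""
    return {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}


def build_tag_index(keyword_index, top_n=5):
    """Phase 2, a stand-in for the TI: keep the top_n TF-IDF terms per document."""
    n_docs = len(keyword_index)
    doc_freq = Counter()
    for counts in keyword_index.values():
        doc_freq.update(counts.keys())
    tags = {}
    for doc_id, counts in keyword_index.items():
        scores = {t: c * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}
        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        tags[doc_id] = dict(ranked[:top_n])
    return tags


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_similarity_index(tag_index):
    """Phase 3, a stand-in for the DSI: a score for every document pair."""
    ids = sorted(tag_index)
    return {
        (x, y): cosine(tag_index[x], tag_index[y])
        for i, x in enumerate(ids)
        for y in ids[i + 1:]
    }


# Hypothetical sample documents.
docs = {
    "a": "semantic search finds similar documents by key phrases",
    "b": "key phrases make semantic search rank similar documents",
    "c": "filestream stores files in the database engine",
}

keyword_index = build_keyword_index(docs)
tag_index = build_tag_index(keyword_index)
similarity = build_similarity_index(tag_index)

for pair, score in sorted(similarity.items()):
    print(pair, round(score, 3))
```

Each phase consumes only the output of the one before it, which is the dependency the slide's ordering suggests: similarity is computed from key phrases, and key phrases from the keyword index.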
  • 32. Interactive Demo: SQL Server Management Studio
  • 33. Semantic Search and SQL Server Data Mining: SQL Server Data Tools: data mining plus text mining
  • 34. Performance: The Million-Dollar Edge
  • 35. Integrated Full Text Search (iFTS): Improved Performance and Scale: scale-up to 350M documents for storage and search; iFTS query performance 7-10 times faster than in SQL Server 2008; worst-case iFTS query response times less than 3 sec for corpus; similar or better than main database search competitors (2012, Michael Rys, Microsoft)
  • 36. Linear Scale of FTI/TI/DSI: First known linearly scaling end-to-end Search and Semantic product in the industry. Chart: Time in Seconds vs. Number of Documents (2011, K. Mukerjee, T. Porter, S. Gherman, Microsoft)
  • 37. Text Mining References: Video: http://channel9.msdn.com/Shows/DataBound/DataBound-Episode-2-Semantic-Search and http://www.microsoftpdc.com/2009/SVR32; Semantic Search (Books Online), explains the demo: http://msdn.microsoft.com/en-us/library/gg492075.aspx; Paper: http://users.cis.fiu.edu/~lzhen001/activities/KDD2011Program/docs/p213.pdf
  • 38. Microsoft Resources: Links
  • 39. Software: SQL Server 2012 Enterprise (includes database engine, Analysis Services, SSMS and SSDT) http://www.microsoft.com/sqlserver/en/us/get-sql-server/try-it.aspx; Microsoft Office 2012 Professional http://office.microsoft.com/en-us/try
  • 40. Organizations: Professional Association for SQL Server http://www.sqlpass.org; Atlanta MDF http://www.atlantamdf.com/; Atlanta Microsoft BI Users Group http://www.meetup.com/Atlanta-Microsoft-Business-Intelligence-Users/; PASS Business Analytics Conference http://www.passbaconference.com; Microsoft TechEd North America http://northamerica.msteched.com/
  • 41. Interactive: Takeaways
  • 42. Connect: Data Mining Resources and blog http://marktab.net; Data Mining Training and Consulting (especially Microsoft and SAS) http://marktab.com
  • 43. Conclusion: SQL Server Data Mining 2012 provides data mining and semantic search; the core technology allows document similarity matching; the results can be combined with SQL Server Data Mining (such as Association Analysis)