
Into the Wild...Taming Unstructured Data with Semantic Search


Many organizations today face runaway growth in data volumes. The bad news is that much of this data is unstructured, which means a traditional RDBMS just isn't capable of helping you deal with it. As a result, significant emphasis has been put on technologies like Hadoop, NoSQL and other distributed databases, which are better suited to handling unstructured data. With the SQL Server 2012 release, however, Microsoft has provided new features that help tame some of this unstructured data. This session dives into the new FileTable and Statistical Semantic Search features. We will show you how they work and highlight real-world examples for integrating these new features into your organization.

Published in: Technology


  1. MAKING BUSINESS INTELLIGENT
     Into the Wild...Taming Unstructured Data with Semantic Search
     Chris Price, Senior BI Consultant, @BluewaterSQL
  2. Intro
     Chris Price, Senior BI Consultant with Pragmatic Works, @BluewaterSQL
  3. Outline
       • Data Gone Wild
       • FileStream -> FileTable
       • Full-Text
       • FileTable/Full-Text Integration
       • SQL Server 2012 Enhancements
       • Semantic Search
       • Search Scenarios
  4. Data Gone Wild!
       • Data by any other name…
           • Structured: Tabular, CSV & Fixed Width
           • Semi-Structured: HTML, XML & JSON
           • Unstructured: Images, Videos, PDF & Email
       • 80% of this stuff is not found in a DB
       • Difficult to integrate
       • Hard to manage
  5. Key Objective
     SQL Server 2012 is a great choice for integrating and managing structured, semi-structured & unstructured data
  6. FileStream
       • Introduced in SQL Server 2008
       • Integrates the DB Engine with the NTFS File System
       • VARBINARY(MAX) columns stored on the File System
       • Dual Programming Model:
           • Transact-SQL (No write)
           • Win32 Streaming (ODBC or OLE DB/ADO.NET)
       • Non-Trivial (Requires a Transactional Context)
  7. FileTable
       • Introduced in SQL Server 2012
       • Built on top of FileStream
       • Win32 API Access
       • Implemented as a fixed-format table:
           • FileStream Storage/Container
           • File System Properties (Columns)
           • Hierarchy ID (synthesized hierarchical file system share)
  8. FileTable
       • Accessed through File System Share or Table
       • SMB Protocol for Remote Access
       • Open docs in MS Word, Excel, etc.
       • Share Allows Non-Transactional Access
       • No Memory-Mapped Files (Notepad/Paint)
       • File Name/Properties Preserved
       • Supports directory structures
  10. FileTable Set-Up
       • Enable FileStream
       • DATABASE
       • TABLE
  11. FileTable Access
       • Share:
           • \\<server>\<instance>\<database>\<table>
       • T-SQL:
           • Insert/Update/Delete
           • Can update a stream without affecting timestamp
           • Cannot delete directories that have files
       • Functions:
           • GetFileNamespacePath()
           • FileTableRootPath()
           • GetPathLocator()
  13. Full-Text
       • Enhanced in 2012
           • 7-10x faster than the prior version
           • Scales up to >350M documents
       • NEW Property Search
           • Filter for document properties (e.g. Author, Title)
           • iFilter must support it
       • Customizable NEAR
           • CONTAINS(*, 'NEAR((SQL, SATURDAY), 5, FALSE)')
           • CONTAINS(*, 'NEAR((SQL, SATURDAY), 5, TRUE)')
  15. Semantic Search
       • Built on top of Full-Text
       • What is a semantic search?
           • Full-Text finds words; Semantic Search finds meaning
       • Extract & index statistically significant keywords
           • Tag clouds, etc.
       • Identify related/similar docs
           • (Based on keywords)
       • Explain how/why two docs are related
  16. Semantic Set-Up
       • Install Office Filter Pack & Filter Pack SP1
       • Install, Attach & Register the Semantic DB
  18. Semantic Results
       • SemanticKeyPhraseTable
           • Extracts key phrases for entire corpus or single document
       • SemanticSimilarityTable
           • Finds similar documents
       • SemanticSimilarityDetailsTable
           • Displays similarity details for two matched documents
  19. Semantic Search Demo
  20. Thank you!
       • Don't forget to fill out your evaluations!
       • www.pragmaticworks.com
       • @BluewaterSQL
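
The Semantic Search and Semantic Results slides describe three ideas: extracting statistically significant keywords, finding similar documents, and explaining why two documents match. The sketch below is a rough illustration of those ideas in plain Python, not SQL Server's actual algorithm, which the slides do not describe. It assumes TF-IDF weighting and cosine similarity as stand-ins, and the function names (`keyword_scores`, `similarity`, `shared_keywords`) and sample file names are hypothetical.

```python
import math
from collections import Counter


def keyword_scores(docs):
    """Return {doc_id: {term: TF-IDF score}} for a dict of doc_id -> text.

    Loosely analogous to key phrase extraction; TF-IDF is an assumed stand-in.
    """
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    doc_freq = Counter()
    for words in tokenized.values():
        doc_freq.update(set(words))  # count each term once per document
    scores = {}
    for doc_id, words in tokenized.items():
        term_freq = Counter(words)
        scores[doc_id] = {
            term: (count / len(words)) * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
            if doc_freq[term] < n_docs  # terms in every document carry no signal
        }
    return scores


def similarity(a, b):
    """Cosine similarity between two term -> score dicts."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def shared_keywords(a, b):
    """Terms present in both documents, strongest joint contribution first."""
    return sorted(a.keys() & b.keys(), key=lambda t: a[t] * b[t], reverse=True)


# Hypothetical sample corpus, keyed by made-up file names.
docs = {
    "a.docx": "filetable stores documents on the file share",
    "b.docx": "filetable exposes documents through a windows share",
    "c.docx": "full-text search indexes words",
}
scores = keyword_scores(docs)
print(sorted(scores["a.docx"], key=scores["a.docx"].get, reverse=True))
print(similarity(scores["a.docx"], scores["b.docx"]))
print(shared_keywords(scores["a.docx"], scores["b.docx"]))
```

In this toy corpus the first two documents share several weighted terms while the third shares none, which mirrors the deck's distinction between finding similar documents and then listing the keywords that explain the match.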