Into the Wild...Taming Unstructured Data with Semantic Search


There is runaway growth in the data volumes many organizations face today. The bad news is that much of this data is unstructured, which means your traditional RDBMS just isn't capable of helping you deal with it. As a result, significant emphasis has been put on technologies like Hadoop, NoSQL and other distributed databases, which are better suited to handling unstructured data. With its latest release, SQL Server 2012, however, Microsoft has provided new features that will help tame some of this unstructured data. This session will dive into the new FileTable and Statistical Semantic Search features. We will show you how they work and highlight real-world examples for integrating these exciting new features into your organization.

  • Typically used when: file size is > 1 MB; fast read access is needed. SQL Server 2012 enhancements: multiple containers per filegroup (performance improvement, 5x read improvement); full AlwaysOn support; MAXSIZE specified at the container level.
  • Functions are provided for portable programming
  • NEAR(<Search Terms>, <Distance>, <Order Matters>)
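To make the three NEAR arguments concrete, here is a minimal Python sketch of the matching rule they describe. It is an illustration only, not SQL Server's implementation: the function name `near_match`, the simple word tokenizer, and the reading of distance as "number of non-search words between the first and last search term" are assumptions made for this example.

```python
import itertools
import re


def near_match(text, terms, max_distance, order_matters=False):
    """Return True if every term occurs in text within max_distance of each other.

    Hypothetical helper: distance is counted as the number of non-search
    words between the first and the last matched search term.
    """
    tokens = re.findall(r"\w+", text.lower())
    # One list of token positions per search term.
    positions = [
        [i for i, tok in enumerate(tokens) if tok == term.lower()]
        for term in terms
    ]
    if any(not found for found in positions):
        return False  # at least one term is missing entirely
    for combo in itertools.product(*positions):
        if len(set(combo)) != len(combo):
            continue  # the same token cannot satisfy two terms
        if order_matters and list(combo) != sorted(combo):
            continue  # terms must appear in the order they were given
        gap = max(combo) - min(combo) + 1 - len(combo)
        if gap <= max_distance:
            return True
    return False
```

With this sketch, `near_match("Saturday sessions cover SQL", ["SQL", "SATURDAY"], 5)` matches, while the same call with `order_matters=True` does not, because SATURDAY precedes SQL in the text.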

Transcript

  • 1. MAKING BUSINESS INTELLIGENT www.pragmaticworks.com. Into the Wild: Taming Unstructured Data with Semantic Search. Chris Price, Senior BI Consultant, @BluewaterSQL
  • 2. Intro
      • Chris Price, Senior BI Consultant with Pragmatic Works
      • @BluewaterSQL
      • http://bluewatersql.wordpress.com/
      • cprice@pragmaticworks.com
  • 3. Outline
      • Data gone Wild
      • FileStream -> FileTable
      • Full-Text
      • FileTable/Full-Text Integration
      • SQL Server 2012 Enhancements
      • Semantic Search
      • Search Scenarios
  • 4. Data Gone Wild!
      • Data by any other name...
          • Structured: Tabular, CSV & Fixed Width
          • Semi-Structured: HTML, XML & JSON
          • Unstructured: Images, Videos, PDF & Email
      • 80% of this stuff is not found in a DB
      • Difficult to integrate
      • Hard to manage
  • 5. Key Objective: SQL Server 2012 is a great choice for integrating and managing structured, semi-structured & unstructured data
  • 6. FileStream
      • Introduced in SQL Server 2008
      • Integrates the DB Engine with the NTFS File System
      • VARBINARY(MAX) columns stored on File System
      • Dual Programming Model:
          • Transact-SQL (No write)
          • Win32 Streaming (ODBC or OLE DB/ADO.NET)
          • Non-Trivial (Requires a Transactional Context)
  • 7. FileTable
      • Introduced in SQL Server 2012
      • Built on top of FileStream
      • Win32 API Access
      • Implemented as a fixed-format table:
          • FileStream Storage/Container
          • File System Properties (Columns)
          • Hierarchy ID (synthesized hierarchical file system share)
  • 8. FileTable
      • Accessed through File System Share or Table
      • SMB Protocol for Remote Access
      • Open docs in MS Word, Excel, etc.
      • Share Allows Non-Transactional Access
      • No Memory-Mapped Files (Notepad/Paint)
      • File Name/Properties Preserved
      • Supports directory structures
  • 9. FileTable Format
  • 10. FileTable Set-Up
      • Enable FileStream
      • DATABASE
      • TABLE
  • 11. FileTable Access
      • Share: \\<server>\<instance>\<database>\<table>
      • T-SQL:
          • Insert/Update/Delete
          • Can update a stream without affecting timestamp
          • Cannot delete directories that have files
      • Functions:
          • GetFileNamespacePath()
          • FileTableRootPath()
          • GetPathLocator()
  • 12. FileTable Demo
  • 13. Full-Text
      • Enhanced in 2012
          • 7-10x faster than prior version
          • Scales up to >350M documents
      • NEW Property Search
          • Filter for document properties (e.g. Author, Title)
          • IFilter must support it
      • Customizable NEAR
          • CONTAINS(*, 'NEAR((SQL, SATURDAY), 5, FALSE)')
          • CONTAINS(*, 'NEAR((SQL, SATURDAY), 5, TRUE)')
  • 14. Full-Text Demo
  • 15. Semantic Search
      • Built on top of Full-Text
      • What is a semantic search?
          • Full-Text finds words... Semantic Search finds meaning
      • Extract & index statistically significant keywords
          • Tag clouds, etc.
      • Identify related/similar docs (based on keywords)
      • Explain how/why two docs are related
  • 16. Semantic Set-Up
      • Install Office Filter Pack & Filter Pack SP 1
      • Install, Attach & Register the Semantic DB
  • 17. Verify Filters
  • 18. Semantic Results
      • SemanticKeyPhraseTable
          • Extracts key phrases for entire corpus or single document
      • SemanticSimilarityTable
          • Finds similar documents
      • SemanticSimilarityDetailsTable
          • Displays similarity details for two matched documents
  • 19. Semantic Search Demo
  • 20. Thank you! Don't forget to fill out your evaluations! @BluewaterSQL http://bluewatersql.wordpress.com/ cprice@pragmaticworks.com
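The FileTable Access slide lists the share path as four components: server, instance, database and table. As a rough sketch of how a client might assemble that path before opening a file over the share, the Python helper below joins the components into a UNC path. The function name `filetable_unc_path` and all the example names are hypothetical. On a real server the instance, database and table segments are the configured FILESTREAM share and directory names, which may differ from the object names.

```python
def filetable_unc_path(server, instance_share, database_dir, table_dir, *relative_parts):
    """Build a UNC path of the form \\\\server\\instance\\database\\table[\\...].

    Hypothetical helper for illustration; every argument is a plain
    path segment and must not contain a path separator itself.
    """
    parts = [server, instance_share, database_dir, table_dir, *relative_parts]
    for part in parts:
        if not part or "\\" in part or "/" in part:
            raise ValueError(f"invalid path segment: {part!r}")
    # A UNC path starts with two backslashes, then backslash-separated segments.
    return "\\\\" + "\\".join(parts)


if __name__ == "__main__":
    # All names below are made up for the example.
    print(filetable_unc_path("SQLBOX", "MSSQLSERVER", "DocsDb", "Documents", "report.docx"))
```

The resulting string can be passed to ordinary file APIs, which is the non-transactional share access described on the FileTable slides.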