The document discusses new capabilities for managing and analyzing unstructured data in SQL Server 2012. Key points include:
- SQL Server 2012 introduces FileTable, which allows storing and accessing files and folders through standard file system APIs while keeping the file data and metadata in SQL Server tables.
- Full-text search is improved with better performance and the ability to scale to hundreds of millions of documents. New capabilities such as property search and a customizable NEAR operator are also introduced.
- Semantic search extracts keywords and identifies related content based on statistical analysis, without requiring ontologies. It provides insight into unstructured text content.
2. MY FAVORITE BEYOND RELATIONAL APPLICATION
Structured and unstructured search
Related/"semantic" search
3. BEYOND RELATIONAL DATA
Building and maintaining applications with relational and non-relational data is hard
Pain Points
Complex integration
Duplicated functionality
Compensation for unavailable services
Goals
Reduce the cost of managing all data
Simplify the development of applications over all data
Provide management and programming services for all data
4. RICH UNSTRUCTURED DATA IN SQL SERVER 2012
• 80% of all data is not stored in databases! Most of it is "unstructured"
• Make SQL Server the preferred choice for managing unstructured data and allow building rich application experiences on top
• Address important customer requests for capabilities and rich services for Rich Unstructured Data (RUDS)
o Scale up storage and search to 100–500 million documents
o Easy use of and access to unstructured data from all applications
o Rich insight into unstructured data to make better decisions
8. FILETABLE OVERVIEW
• FileTable: a table of files/directories
• User-created table with a fixed schema that contains FILESTREAM data and file attributes
• Each row represents a file or a directory
• System-defined constraints maintain the integrity of the directory tree
• File/directory hierarchy is exposed through a Windows share
• Supports Win32 APIs for file/directory management
• DB storage is transparent to Win32 applications
• SMB level of application compatibility
• Virtual network name (VNN) path support for transparent Win32 application failover
[Diagram: FILESTREAM share on my_machine\MSSQLSERVER exposing per-database directories (Database1, Database2), each containing FileTable directories such as Documents, Media, and LogFiles, with the user-defined directory structure beneath them]
9. CREATING A FILETABLE
Pre-requisites
Enable FILESTREAM
Create a FILESTREAM share and filegroup
Enable non-transactional access at the DB level
ALTER DATABASE Contoso SET FILESTREAM (non_transacted_access = FULL,
    directory_name = N'Contoso')
Create the FileTable
CREATE TABLE Contoso..Documents AS FILETABLE
    WITH (filetable_directory = N'Document Library')
Access at \\<machine name>\<FILESTREAM share>\Contoso\Document Library
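The instance-level "Enable FILESTREAM" prerequisite above can be sketched in T-SQL; a minimal sketch (note that the Windows share name itself is set through SQL Server Configuration Manager, not T-SQL):

```sql
-- Access level 2 enables both T-SQL and Win32 streaming access;
-- 1 would allow T-SQL access only, 0 disables FILESTREAM
EXEC sp_configure 'filestream access level', 2;
RECONFIGURE;
```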
10. MODIFYING A FILETABLE
FileTable has a fixed schema
Columns and system-defined constraints cannot be altered/dropped
User-defined indexes/constraints/triggers are allowed
Disabling/enabling the FileTable namespace
ALTER TABLE Documents DISABLE FILETABLE_NAMESPACE
Disables all system-defined constraints and Win32 access to the FileTable
Useful for bulk loading/reorganization of data
A FileTable can be dropped like any other table
Catalog views can be used to obtain metadata
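The disable/enable pair described above can be sketched end to end, using the Documents FileTable from the earlier example:

```sql
-- Disable Win32 access and the system-defined namespace constraints
-- so rows can be bulk-loaded without per-row hierarchy validation
ALTER TABLE Documents DISABLE FILETABLE_NAMESPACE;

-- ... perform bulk load / reorganization here ...

-- Re-enable: the namespace constraints are validated against existing data
ALTER TABLE Documents ENABLE FILETABLE_NAMESPACE;
```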
11. DATA ACCESS – FILE SYSTEM ACCESS
The FileTable hierarchy is visible through the FILESTREAM share
\\<machine>\<FILESTREAM share>\<Database_directory>\<FileTable_directory>\...
Provides transparent Win32 API and file/directory management capabilities
e.g. MS Word can create/open/save files; xcopy can copy directory trees into the database
Win32 API operations are non-transactional
Operations cannot be part of any user transactions
Win32 operations are intercepted by SQL Server at the file system level
e.g. file/directory creation/deletion => insert/delete into the FileTable
Full locking/concurrency semantics with other accesses
Allows in-place update of file stream data and file attributes
Transactional FILESTREAM APIs can also be used
12. DATA ACCESS – T-SQL ACCESS
Normal insert/update/delete is allowed for FileTable manipulation
FileTable namespace integrity constraints are enforced
Set-based operations on the file attributes are a value-add
Built-in functions
GetFileNamespacePath() – UNC path for a file/directory
FileTableRootPath() – UNC path to the FileTable root
GetPathLocator() – path_locator value for a file/directory
DDL/DML triggers are supported
DML triggers on a FileTable cannot update any FileTables
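A minimal sketch combining the built-in functions above (table and database names follow the earlier Contoso example; the UNC path in the second query is a hypothetical placeholder):

```sql
-- Full UNC path for every file (not directory) in the Documents FileTable
SELECT name,
       FileTableRootPath(N'Documents') + file_stream.GetFileNamespacePath()
           AS full_unc_path
FROM Documents
WHERE is_directory = 0;

-- Resolve a known UNC path back to its row via its path_locator
DECLARE @loc hierarchyid =
    GetPathLocator(N'\\my_machine\MSSQLSERVER\Contoso\Document Library\report.docx');
SELECT * FROM Documents WHERE path_locator = @loc;
```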
13. MANAGING FILETABLE
DB backup/restore operations include FileTable data
A point-in-time restore may contain more recent FILESTREAM data due to non-transactional updates during backup
FileTables are secured like any other user tables
The same security is enforced for Win32 access as well
Data loading
Windows tools such as xcopy/robocopy or drag-and-drop through Windows Explorer can be used
BCP operations are supported for direct T-SQL data inserts
SSMS supports FileTable creation/exploration
14. MANAGING FILETABLE – HIGH AVAILABILITY
SQL Server 2012 AlwaysOn is fully supported
Transparent data failover
FileTables can be configured with multiple secondary nodes
Both sync and async data replication are supported
File data and metadata are available on the secondary in case of failover
Transparent application failover
Virtual network name (VNN) path support for transparent Win32 application failover
Applications use \\VNN\Share\db... paths
Applications are automatically redirected to the secondary in case of failover
Restrictions
FileTables cannot participate in read-only replicas
15. FILETABLE RESTRICTIONS
FileTables cannot be partitioned
Merge/transactional replication is not supported
Under RCSI/snapshot isolation modes, applications cannot modify file stream data in FileTables
Win32 application compatibility: memory-mapped files, directory notifications, and links are not supported
16. UNSTRUCTURED DATA SCALE-UP
MULTIPLE CONTAINERS FOR FILESTREAM DATA
SQL 2008 R2
Only one storage container per FILESTREAM filegroup
Limits storage capacity scaling and I/O scaling
SQL Server 2012
Support for multiple storage containers per filegroup
DDL changes to CREATE/ALTER DATABASE statements
Ability to set max_size for the containers
DBCC SHRINKFILE EMPTYFILE support
Scaling flexibility
Storage scaling by adding additional storage drives
I/O scaling with multiple spindles
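The DDL changes above can be sketched as follows (database, file, and filegroup names as well as the drive paths are illustrative):

```sql
-- Add a second FILESTREAM container (a directory path) to an existing
-- FILESTREAM filegroup, capping its size
ALTER DATABASE Contoso
ADD FILE (NAME = N'ContosoFS2',
          FILENAME = N'E:\FSData\ContosoFS2',
          MAXSIZE = 100GB)
TO FILEGROUP ContosoFSGroup;

-- Drain a container with DBCC SHRINKFILE EMPTYFILE, then remove it
DBCC SHRINKFILE (N'ContosoFS1', EMPTYFILE);
ALTER DATABASE Contoso REMOVE FILE ContosoFS1;
```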
17. UNSTRUCTURED DATA : MULTIPLE CONTAINERS
Use of multiple spindles for achieving better I/O Scalability
18. RUDS SCALE-UP: FILESTREAM PERF/SCALE
Improved performance of T-SQL and file I/O access
Various enhancements to improve read/write throughput
5x increase in read throughput
Linear scaling with a large number of concurrent threads
19. SUMMARY: FILETABLE
Application compatibility for Windows applications
Windows applications run on top of files stored in FileTables with no modifications
Relational value proposition
Provides integrated administration and services
Backup, log shipping, HA-DR, full-text and semantic search, …
T-SQL orthogonality
File/folder attributes surfaced through relational columns
Power of set-based operations, policy management, reporting, etc.
File namespace hierarchy management
20. FULL TEXT SEARCH IMPROVEMENTS IN SQL SERVER 2012
Improved performance and scale:
Scale-up to 350M documents
iFTS query performance 7–10 times faster than in SQL Server 2008
Worst-case iFTS query response times < 3 sec for the corpus
On par with or better than the main database search competitors
New functionality:
Property search
Customizable NEAR
New word breakers: updated existing word breakers; added Czech and Greek
Innovation in search:
Semantic Similarity Search
21. FULLTEXT SEARCH PERFORMANCE & SCALE IMPROVEMENTS
Architectural improvements
Improved internal implementation
Queries no longer block index updates
Improved query plans:
Better plans for common queries
Full-text predicate folding
Parallel plan execution
Index and queries tested at scale up to 350 million documents with ~2 sec or better response times
~3x better throughput without DML and ~9x better with DML
Scales easily with an increasing number of connections
22. SCALE-UP: FULL-TEXT SEARCH (2005/8 VS 2012)
Queries over a 350M-document database with random DMLs running in the background.
Beats SQL Server 2005 with a scale factor of more than 2x and, on average, 60x better throughput.
[Chart: throughput, SQL Server 2005/2008 vs 2012]
23. SCALE-UP: FULL-TEXT SEARCH (2005/8 VS 2012)
Average query execution time (ms) under varying numbers of connections (50–2000 users) for a customer playback benchmark.
[Chart: avg execution time, SQL Server 2005/2008 vs 2012]
24. FULLTEXT PROPERTY SCOPED SEARCH
New Search Filter for Document Properties
CONTAINS (PROPERTY ( { column_name }, 'property_name' ), „contains_search_condition‟ )
• Set up once per SQL Server instance to load the Office filters
exec sp_fulltext_service 'load_os_resources',1
go
exec sp_fulltext_service 'restart_all_fdhosts'
go
• Create a property list
CREATE SEARCH PROPERTY LIST p1;
• Add properties to be extracted
ALTER SEARCH PROPERTY LIST [p1] ADD N'System.Author' WITH
(PROPERTY_SET_GUID = 'f29f85e0-4ff9-1068-ab91-08002b27b3d9',
PROPERTY_INT_ID = 4, PROPERTY_DESCRIPTION = N'System.Author');
• Create/Alter Fulltext index to specify property list to be extracted
ALTER FULLTEXT INDEX ON fttable... SET SEARCH PROPERTY LIST = [p1];
• Query for properties
SELECT * FROM fttable WHERE CONTAINS(PROPERTY(ftcol, 'System.Author'), 'fernlope');
25. FULL-TEXT CUSTOMIZABLE NEAR
OLD NEAR SYNTAX
select * from fttable where contains(*, 'test near Space')
NEW NEAR USAGES
• SPECIFY DISTANCE
select * from fttable
where contains(*, 'near((test, Space), 5,false)')
• REDUCE DISTANCE
select * from fttable
where contains(*, 'near((test, Space), 2,false)')
• ORDER OF WORDS IS SPECIFIED AS IMPORTANT
select * from fttable
where contains(*, 'near((test, Space), 5,true)')
26. STATISTICAL SEMANTIC SEARCH
Semantic Insight into textual content
Uses language models to find most important keywords in document
No need to build brittle ontologies!
Statistically Prominent Keywords
Autogenerated tag clouds
Potentially Related Content based on extracted Keywords, such as
Similar Products (based on description)
Similar Jobs or Applicants
Similar Support Incidents (based on call logs)
Potential Solutions (based on similar incidents)
First class usage experience
Efficient linear algorithms
Integrated with FTS and SQL
New rowset functions expose all results through standard SQL queries
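The rowset functions mentioned above can be queried like any other table source. A hedged sketch, assuming a table `Documents` whose full-text index was created WITH STATISTICAL_SEMANTICS; the table name, column name, and document key value are illustrative:

```sql
DECLARE @DocId int = 1;  -- hypothetical document key

-- Statistically prominent keywords for one document:
SELECT TOP 10 keyphrase, score
FROM SEMANTICKEYPHRASETABLE(Documents, file_stream, @DocId)
ORDER BY score DESC;

-- Potentially related content: documents similar to @DocId:
SELECT TOP 10 matched_document_key, score
FROM SEMANTICSIMILARITYTABLE(Documents, file_stream, @DocId)
ORDER BY score DESC;
```

Because the results are ordinary rowsets, they can be joined back to the base table to drive "similar products" or "similar incidents" views directly in SQL.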
29. SEMANTIC EXTRACTION: END-2-END EXPERIENCE
• Downloadable Language Statistical Database with registration stored
procedure
• Setup along with Full-Text
• Metadata / Catalog views
• System level DMVs for progress state and usage
• Manageability through SSMS and SMO
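The setup steps above amount to a one-time registration plus ongoing monitoring. A sketch assuming the downloadable semantic language statistics database has been attached under the name `semanticsdb` (the commonly documented name; yours may differ):

```sql
-- Register the language statistics database once per instance:
EXEC sp_fulltext_semantic_register_language_statistics_db
     @dbname = N'semanticsdb';

-- Catalog view: confirm the registration.
SELECT * FROM sys.fulltext_semantic_language_statistics_database;

-- DMV: monitor semantic index population progress.
SELECT * FROM sys.dm_fts_semantic_similarity_population;
```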
30. KEY TAKEAWAYS
SQL Server's unstructured data support is a key strategy to
enable you to build complex data applications that go
beyond relational data!
Content and Collaboration, eDiscovery, Healthcare, Document
management etc.
31. RELATED CONTENT
SQL Server 2012 Whitepapers and information:
http://www.sqlserverlaunch.com
Channel 9 DataBound Episode 2: http://channel9.msdn.com
MySemanticsSearch Demo: http://mysemanticsearch.codeplex.com
More demo data sets and demo scripts:
http://blogs.msdn.com/b/sqlfts/archive/2011/07/21/introducing-fulltext-statistical-semantic-search-in-sql-server-codename-denali-release.aspx
Microsoft Virtual Academy Recording: Coming Soon!
Editor's Notes
Let's take a look at a BR application. What services does it provide? What about having these services supported in the database instead of each application building its own?
Examples: an application that manages images in the file system and additional information in the database; building a spatial database application before SQL Server 2008. Example services: backup/restore, search over relational and non-relational data.
SQL Server 2008 introduced FILESTREAM as a way to add large blobs/unstructured data streams into SQL Server while still being able to open a Win32 handle (via a SQL API) with high streaming performance. Win32 namespace support in SQL Server 2012 has the following goals: Reduce the barrier to entry for customers who have data on file servers and Win32 applications that work on it today. With the Win32 namespace enabled, SQL Server exposes a Windows share that existing Win32 applications and mid-tier servers (such as IIS) can use like any file-server share, without having to understand database/transaction semantics. A single integrated set of admin tools: SQL backup/restore, replication, HA solutions, etc. Scale-up: add multiple disks on a machine for storing FILESTREAM data. Use SQL services such as full-text search over both FILESTREAM data and relational metadata, plus property promotion infrastructure for extracting interesting properties from SQL blobs/FILESTREAM data and surfacing them as relational columns for query.
Optimized hot paths; removed unnecessary serialization, expensive file-system operations, etc.