Applied Semantic Search with Microsoft SQL Server

Applied Enterprise
Semantic Mining
Mark Tabladillo, Ph.D. (MVP, MCAD .NET, MCITP, MCT)
PASS SQL Saturday #198 Vancouver BC
February 16, 2013

About MarkTab
Training and Consulting with Ph.D. – Industrial Engineering,
http://marktab.com Georgia Tech
Data Mining Resources and Blog at Training and consulting
http://marktab.net internationally across many
industries – SAS and Microsoft
Contributed to peer-reviewed
research and legislation
Mentoring doctoral dissertations at the
accredited University of Phoenix
Presenter

Interactive
Name three things you want from enterprise text
mining

Introduction
SQL Server 2012 has new Programmability Enhancements
Statistical Semantic Search
File Tables
Full-Text Search Improvements
These combined technologies make SQL Server 2012 a strong contender in text
mining

Outline
Why Microsoft is competitive for data mining
Definitions: what is text mining?
History: how Microsoft’s semantic search was born
What is inside semantic search
Logical model
Demos
Performance
Microsoft Resources

Why Microsoft is
Competitive for Data
Mining
Based on 2012 and 2013 Surveys

Gartner 2013
Magic Quadrant for
Business Intelligence
and Analytics
Platforms

Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DZLPEH&ct=130207&st=sb
– February 5, 2013

Gartner 2013
Magic Quadrant for
Data Warehouse
Database
Management
Systems

Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DU2VD4&ct=130131&st=sb
– January 31, 2013

KDNuggets 2012
http://marktab.net/datamining/2012/06/15/excel-number-
commercial-tool-analytics-data-mining-big-data/

Definitions
What is text mining?

Definition
Data mining is the automated or semi-automated process of
discovering patterns in data
Text mining is the automated or semi-automated process of
discovering patterns from textual data
Machine learning is the development and optimization of
algorithms for automated or semi-automated pattern discovery

Purposes
Phrase Goal

“Data Mining” Inform actionable decisions
“Text Mining”

“Machine Determine best performing
Learning” algorithm

MarkTab Decision Cycle
GO

Synthesis Analysis
(art) (science)

Science needs science fiction -- MarkTab

MarkTab Decision Cycle
GO

Synthesis Analysis
(art) (science)

History
How Microsoft’s semantic search came to be

History
July 2008
Microsoft purchases Powerset for US$100 Million
Google Dismisses Semantic Search
http://venturebeat.com/2008/06/26/microsoft-to-buy-semantic-search-engine-
powerset-for-100m-plus/
http://www.forbes.com/2008/07/01/powerset-msft-search-tech-intel-
cx_ag_0701powerset.html

History
March 2009
Google announces “snippets” as relevant to search
The media picks this story up as “semantic search”
http://googleblog.blogspot.com/2009/03/two-new-improvements-to-google-
results.html#!/2009/03/two-new-improvements-to-google-results.html

History
February 2012
Google announces Knowledge Graph, an explicit application of semantic search
http://mashable.com/2012/02/13/google-knowledge-graph-change-search/

History
April 2012
Microsoft purchases 800+ patents from AOL for US$1 Billion
Among the patents are semantic search and metadata querying – older than
Google
http://www.theregister.co.uk/2012/04/09/aol_microsoft_patent_deal/

What is inside Semantic
Search
Text Mining introduced for SQL Server 2012

Future: Most data is Text
Two Research Types
• Quantitative research = data mining
• Qualitative research = text mining
The future is combining both

Statistical Semantic Search
Comprises some aspects of text mining
Identifies statistically relevant key phrases
Based on these phrases, can identify (by score) similar documents

FileTables
Built on existing SQL Server FILESTREAM technology
Files and documents
Stored in special tables in SQL Server
Accessed if they were stored in the file system

Full-Text Search Enhancements
Property search: search on tagged properties (such as author or title)
Customizable NEAR: find words or phrases close to one another
New Word Breakers and Stemmers (for many languages)

Logical Model
How semantic search works

From Documents to Output
Office
Varchar
PDF
NVarchar
Rowset
Output
with Scores

(iFilter Required)
iFilters Full-Text
Documents Keyword
Index
“FTI”

Semantic
Key Phrase
Semantic Index –
Semantic Document Database Tag Index
Similarity Index “DSI” “TI”

Languages Currently Supported
Traditional Chinese Simplified Chinese
German British English
English Portuguese
French Chinese (Hong Kong SAR, PRC)
Italian Spanish
Brazilian Chinese (Singapore)
Russian Chinese (Macau SAR)
Swedish

Phases of Semantic Indexing
Full Text Keyword Index “FTI”

Semantic Document Similarity
Index “DSI”
Semantic Key Phrase Index –
Tag Index “TI”

http://msdn.microsoft.com/en-us/library/gg492085.aspx#SemanticIndexing

Interactive Demo
SQL Server Management Studio

Semantic Search and
SQL Server Data Mining
SQL Server Data Tools: data mining plus text mining

Performance
The Million-Dollar Edge

Integrated Full Text Search (iFTS)
Improved Performance and Scale:
Scale-up to 350M documents for storage and search
iFTS query performance 7-10 times faster than in SQL Server 2008
Worst-case iFTS query response times less than 3 sec for corpus
Similar or better than main database search competitors
(2012, Michael Rys, Microsoft)

Linear Scale of FTI/TI/DSI
First known linearly scaling end-to-end Search and Semantic product in the industry

Time in Seconds vs. Number of Documents
(2011 – K. Mukerjee, T. Porter, S. Gherman – Microsoft)

Text Mining References
Video
http://channel9.msdn.com/Shows/DataBound/DataBound-Episode-2-Semantic-
Search
http://www.microsoftpdc.com/2009/SVR32
Semantic Search (Books Online) – explains the demo
http://msdn.microsoft.com/en-us/library/gg492075.aspx
Paper
http://users.cis.fiu.edu/~lzhen001/activities/KDD2011Program/docs/p213.pdf

Software
SQL Server 2012 Enterprise
(includes database engine, Analysis Services, SSMS and SSDT)
http://www.microsoft.com/sqlserver/en/us/get-sql-server/try-it.aspx
Microsoft Office 2012 Professional
http://office.microsoft.com/en-us/try

Organizations
Professional Association for SQL Server http://www.sqlpass.org
Atlanta MDF http://www.atlantamdf.com/
Atlanta Microsoft BI Users Group http://www.meetup.com/Atlanta-Microsoft-
Business-Intelligence-Users/
PASS Business Analytics Conference http://www.passbaconference.com
Microsoft TechEd North America http://northamerica.msteched.com/

Conclusion
SQL Server Data Mining 2012 provides data mining and semantic search
The core technology allows document similarity matching
The results can be combined with SQL Server Data Mining (such as
Association Analysis)

Connect
Data Mining Resources and blog http://marktab.net
Data Mining Training and Consulting (especially Microsoft and SAS)
http://marktab.com

Applied Semantic Search with Microsoft SQL Server

More Related Content

What's hot

Viewers also liked

Similar to Applied Semantic Search with Microsoft SQL Server

More from Mark Tabladillo

Recently uploaded

Applied Semantic Search with Microsoft SQL Server