Applied Enterprise Semantic Mining 
Mark Tabladillo, Ph.D. 
Microsoft MVP | Consultant (SolidQ) 
Charlotte, NC –SQL Saturday 330 
October 4, 2014
Networking 
Interactive
Mark Tab 
SQL Server MVP; SAS Expert 
Consulting 
Training 
Teaching 
Presenting 
Linked In 
@MarkTabNet
Quick Look 
My Semantic Search
Interactive 
Name three things you want from enterprise text mining
Introduction 
SQL Server has new Programmability Enhancements 
Statistical Semantic Search 
File Tables 
Full-Text Search Improvements 
These combined technologies make SQL Server 2014 a strong contender in text mining
Outline 
Why Microsoft is competitive for data mining 
Definitions: what is text mining? 
History: how Microsoft’s semantic search was born 
What is inside semantic search 
Logical model 
Demos 
Performance 
Microsoft Resources
Why Microsoft is Competitive for Data Mining 
Based on 2012 and 2013 Surveys
Gartner 2013 
Magic Quadrant for Business Intelligence and Analytics Platforms 
Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DZLPEH&ct=130207&st=sb–February 5, 2013
Gartner 2013 
Magic Quadrant for Data Warehouse Database Management Systems 
Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DU2VD4&ct=130131&st=sb–January 31, 2013
KDNuggets2013 
http://www.kdnuggets.com/polls/2013/analytics-big-data- mining-data-science-software.html
Definitions 
What is text mining?
Definition 
Data mining is the automated or semi-automated process of discovering patterns in data 
Text mining is the automated or semi-automated process of discovering patterns from textual data 
Machine learning is the development and optimization of algorithms for automated or semi-automated pattern discovery
Purposes 
Phrase 
Goal 
“Data Mining” 
“Text Mining” 
Inform actionabledecisions 
“Machine Learning” 
Determine best performingalgorithm
MarkTab Decision Cycle 
Analysis 
(science) 
Synthesis 
(art) 
GO 
Science needs science fiction --MarkTab
MarkTab Decision Cycle 
Analysis 
(science) 
Synthesis 
(art) 
GO
History 
How Microsoft’s semantic search came to be
History 
July 2008 
Microsoft purchases Powerset for US$100 Million 
Google Dismisses Semantic Searchhttp://venturebeat.com/2008/06/26/microsoft-to-buy-semantic-search-engine- powerset-for-100m-plus/ 
http://www.forbes.com/2008/07/01/powerset-msft-search-tech-intel- cx_ag_0701powerset.html
History 
March 2009 
Google announces “snippets” as relevant to search 
The media picks this story up as “semantic search” http://googleblog.blogspot.com/2009/03/two-new-improvements-to-google- results.html#!/2009/03/two-new-improvements-to-google-results.html
History 
February 2012 
Google announces Knowledge Graph, an explicit application of semantic searchhttp://mashable.com/2012/02/13/google-knowledge-graph-change-search/
History 
April 2012 
Microsoft purchases 800+ patents from AOL for US$1 Billion 
Among the patents are semantic search and metadata querying –older than Google 
http://www.theregister.co.uk/2012/04/09/aol_microsoft_patent_deal/
What is inside Semantic Search 
Text Mining introduced for SQL Server 2012
Future: Most data is Text 
Two Research Types 
•Quantitative research = data mining 
•Qualitative research = text mining 
The future is combining both
Statistical Semantic Search 
Comprises some aspects of text mining 
Identifies statistically relevant key phrases 
Based on these phrases, can identify (by score) similar documents
FileTables 
Built on existing SQL Server FILESTREAM technology 
Files and documents 
Stored in special tables in SQL Server 
Accessed if they were stored in the file system
Full-Text Search Enhancements 
Property search: search on tagged properties (such as author or title) 
Customizable NEAR: find words or phrases close to one another 
New Word Breakers and Stemmers (for many languages)
Logical Model 
How semantic search works
RowsetOutput 
with Scores 
Varchar 
NVarchar 
Office 
PDF 
From Documents to Output
(iFilterRequired) 
Documents 
Full-Text Keyword Index 
“FTI” 
iFilters 
Semantic Document Similarity Index “DSI” 
Semantic Database 
Semantic Key Phrase Index – 
Tag Index “TI”
Languages Currently Supported 
Traditional Chinese 
German 
English 
French 
Italian 
Brazilian 
Russian 
Swedish 
Simplified Chinese 
British English 
Portuguese 
Chinese (Hong Kong SAR, PRC) 
Spanish 
Chinese (Singapore) 
Chinese (Macau SAR)
Phases of Semantic Indexing 
Full Text Keyword Index “FTI” 
Semantic Key Phrase Index – 
Tag Index “TI” 
Semantic Document Similarity Index “DSI” http://msdn.microsoft.com/en-us/library/gg492085.aspx#SemanticIndexing
Interactive Demo 
SQL Server Management Studio
Semantic Search and SQL Server Data Mining 
SQL Server Data Tools: data mining plus text mining
Performance 
The Million-Dollar Edge
Integrated Full Text Search (iFTS) 
Improved Performance and Scale: 
Scale-up to 350M documents for storage and search 
iFTSquery performance 7-10 times faster than in SQL Server 2008 
Worst-case iFTSquery response times less than 3 sec for corpus 
Similar or better than main database search competitors 
(2012, Michael Rys, Microsoft)
Linear Scale of FTI/TI/DSI 
First known linearly scaling end-to-end Search and Semantic product in the industry 
Time in Seconds vs. Number of Documents 
(2011 –K. Mukerjee, T. Porter, S. Gherman–Microsoft)
Office Delve 
http://youtu.be/cCbyer0Xupg
Microsoft Resources 
Links
Text Mining References 
Video 
http://channel9.msdn.com/Shows/DataBound/DataBound-Episode-2-Semantic- Search 
http://www.microsoftpdc.com/2009/SVR32 
Semantic Search (Books Online) –explains the demo 
http://msdn.microsoft.com/en-us/library/gg492075.aspx 
Paper 
http://users.cis.fiu.edu/~lzhen001/activities/KDD2011Program/docs/p213.pdf
Software 
SQL Server Enterprise (includes database engine, Analysis Services, SSMS and SSDT) 
http://www.microsoft.com/sqlserver/en/us/get-sql-server/try-it.aspx 
Microsoft Office 365 Homehttp://office.microsoft.com/en-us/try
Organizations 
Professional Association for SQL Server http://www.sqlpass.org 
PASS Business Analytics Conference http://www.passbaconference.com
Interactive 
Takeaways
Connect 
Data Mining Resources and blog http://marktab.net 
Linked In 
Twitter @marktabnet
Conclusion 
SQL Server provides data mining and semantic search 
The core technology allows document similarity matching 
The results can be combined with SQL Server Data Mining (such as Association Analysis)

Applied Enterprise Semantic Mining -- Charlotte 201410

  • 1.
    Applied Enterprise SemanticMining Mark Tabladillo, Ph.D. Microsoft MVP | Consultant (SolidQ) Charlotte, NC –SQL Saturday 330 October 4, 2014
  • 2.
  • 3.
    Mark Tab SQLServer MVP; SAS Expert Consulting Training Teaching Presenting Linked In @MarkTabNet
  • 4.
    Quick Look MySemantic Search
  • 5.
    Interactive Name threethings you want from enterprise text mining
  • 6.
    Introduction SQL Serverhas new Programmability Enhancements Statistical Semantic Search File Tables Full-Text Search Improvements These combined technologies make SQL Server 2014 a strong contender in text mining
  • 7.
    Outline Why Microsoftis competitive for data mining Definitions: what is text mining? History: how Microsoft’s semantic search was born What is inside semantic search Logical model Demos Performance Microsoft Resources
  • 8.
    Why Microsoft isCompetitive for Data Mining Based on 2012 and 2013 Surveys
  • 9.
    Gartner 2013 MagicQuadrant for Business Intelligence and Analytics Platforms Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DZLPEH&ct=130207&st=sb–February 5, 2013
  • 10.
    Gartner 2013 MagicQuadrant for Data Warehouse Database Management Systems Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DU2VD4&ct=130131&st=sb–January 31, 2013
  • 11.
  • 12.
    Definitions What istext mining?
  • 13.
    Definition Data miningis the automated or semi-automated process of discovering patterns in data Text mining is the automated or semi-automated process of discovering patterns from textual data Machine learning is the development and optimization of algorithms for automated or semi-automated pattern discovery
  • 14.
    Purposes Phrase Goal “Data Mining” “Text Mining” Inform actionabledecisions “Machine Learning” Determine best performingalgorithm
  • 15.
    MarkTab Decision Cycle Analysis (science) Synthesis (art) GO Science needs science fiction --MarkTab
  • 16.
    MarkTab Decision Cycle Analysis (science) Synthesis (art) GO
  • 17.
    History How Microsoft’ssemantic search came to be
  • 18.
    History July 2008 Microsoft purchases Powerset for US$100 Million Google Dismisses Semantic Searchhttp://venturebeat.com/2008/06/26/microsoft-to-buy-semantic-search-engine- powerset-for-100m-plus/ http://www.forbes.com/2008/07/01/powerset-msft-search-tech-intel- cx_ag_0701powerset.html
  • 19.
    History March 2009 Google announces “snippets” as relevant to search The media picks this story up as “semantic search” http://googleblog.blogspot.com/2009/03/two-new-improvements-to-google- results.html#!/2009/03/two-new-improvements-to-google-results.html
  • 20.
    History February 2012 Google announces Knowledge Graph, an explicit application of semantic searchhttp://mashable.com/2012/02/13/google-knowledge-graph-change-search/
  • 21.
    History April 2012 Microsoft purchases 800+ patents from AOL for US$1 Billion Among the patents are semantic search and metadata querying –older than Google http://www.theregister.co.uk/2012/04/09/aol_microsoft_patent_deal/
  • 22.
    What is insideSemantic Search Text Mining introduced for SQL Server 2012
  • 23.
    Future: Most datais Text Two Research Types •Quantitative research = data mining •Qualitative research = text mining The future is combining both
  • 24.
    Statistical Semantic Search Comprises some aspects of text mining Identifies statistically relevant key phrases Based on these phrases, can identify (by score) similar documents
  • 25.
    FileTables Built onexisting SQL Server FILESTREAM technology Files and documents Stored in special tables in SQL Server Accessed if they were stored in the file system
  • 26.
    Full-Text Search Enhancements Property search: search on tagged properties (such as author or title) Customizable NEAR: find words or phrases close to one another New Word Breakers and Stemmers (for many languages)
  • 27.
    Logical Model Howsemantic search works
  • 28.
    RowsetOutput with Scores Varchar NVarchar Office PDF From Documents to Output
  • 29.
    (iFilterRequired) Documents Full-TextKeyword Index “FTI” iFilters Semantic Document Similarity Index “DSI” Semantic Database Semantic Key Phrase Index – Tag Index “TI”
  • 30.
    Languages Currently Supported Traditional Chinese German English French Italian Brazilian Russian Swedish Simplified Chinese British English Portuguese Chinese (Hong Kong SAR, PRC) Spanish Chinese (Singapore) Chinese (Macau SAR)
  • 31.
    Phases of SemanticIndexing Full Text Keyword Index “FTI” Semantic Key Phrase Index – Tag Index “TI” Semantic Document Similarity Index “DSI” http://msdn.microsoft.com/en-us/library/gg492085.aspx#SemanticIndexing
  • 32.
    Interactive Demo SQLServer Management Studio
  • 33.
    Semantic Search andSQL Server Data Mining SQL Server Data Tools: data mining plus text mining
  • 34.
  • 35.
    Integrated Full TextSearch (iFTS) Improved Performance and Scale: Scale-up to 350M documents for storage and search iFTSquery performance 7-10 times faster than in SQL Server 2008 Worst-case iFTSquery response times less than 3 sec for corpus Similar or better than main database search competitors (2012, Michael Rys, Microsoft)
  • 36.
    Linear Scale ofFTI/TI/DSI First known linearly scaling end-to-end Search and Semantic product in the industry Time in Seconds vs. Number of Documents (2011 –K. Mukerjee, T. Porter, S. Gherman–Microsoft)
  • 37.
  • 38.
  • 39.
    Text Mining References Video http://channel9.msdn.com/Shows/DataBound/DataBound-Episode-2-Semantic- Search http://www.microsoftpdc.com/2009/SVR32 Semantic Search (Books Online) –explains the demo http://msdn.microsoft.com/en-us/library/gg492075.aspx Paper http://users.cis.fiu.edu/~lzhen001/activities/KDD2011Program/docs/p213.pdf
  • 40.
    Software SQL ServerEnterprise (includes database engine, Analysis Services, SSMS and SSDT) http://www.microsoft.com/sqlserver/en/us/get-sql-server/try-it.aspx Microsoft Office 365 Homehttp://office.microsoft.com/en-us/try
  • 41.
    Organizations Professional Associationfor SQL Server http://www.sqlpass.org PASS Business Analytics Conference http://www.passbaconference.com
  • 42.
  • 43.
    Connect Data MiningResources and blog http://marktab.net Linked In Twitter @marktabnet
  • 44.
    Conclusion SQL Serverprovides data mining and semantic search The core technology allows document similarity matching The results can be combined with SQL Server Data Mining (such as Association Analysis)