Microsoft has provided data mining technology in SQL Server since SQL Server 2000, making it the first major database vendor to put analytics inside the database. SQL Server 2012 continues that leadership and adds semantic search for text mining. This demonstration talk outlines this SQL Server technology, including the Excel 2013 add-in, SQL Server Integration Services for production scoring and data cleaning, and semantic search for text mining. The talk is appropriate for people new to data mining.
4. About MarkTab
Presenter
Training and Consulting: http://marktab.com
Data Mining Resources and Blog: http://marktab.net
Ph.D. – Industrial Engineering, Georgia Tech
Training and consulting internationally across many industries – SAS and Microsoft
Contributed to peer-reviewed research and legislation
Mentoring doctoral dissertations at the accredited University of Phoenix
6. Microsoft Offers
Bing
Maps
Xbox Kinect
Hacker Magnet
SQL Server 2012
Analysis Services (Multidimensional and Data Mining)
Integration Services
Semantic Search
Hadoop Partnership
Excel Projects from Microsoft Research
9. Definition
Data mining is the automated or semi-automated process of discovering patterns in data.
Machine learning is the development and optimization of algorithms for automated or semi-automated pattern discovery.
14. Gartner 2013
Magic Quadrant for Business Intelligence and Analytics Platforms
Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DZLPEH&ct=130207&st=sb – February 5, 2013
15. Microsoft Response
Focus on familiar, intuitive user experiences delivered via high quality, industry-leading products that businesses already know and use today is key to making BI truly accessible to all users.
By providing Business Intelligence capabilities in familiar tools such as Excel and SharePoint, we empower an entirely new segment of business users to build and consume rich BI solutions as part of their everyday work.
Delivering the server-side capabilities to enable self-service BI via SharePoint and SQL Server provides a common, scalable data platform to handle any data, any size, from anywhere, and tackle all of your Big Data needs.
Retrieved from http://blogs.msdn.com/b/microsoft_business_intelligence1/archive/2013/02/07/microsoft-in-leaders-quadrant-of-gartner-magic-quadrant-for-business-intelligence-and-analytics-platforms.aspx – February 2013
16. Gartner 2013
Magic Quadrant for Data Warehouse Database Management Systems
Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DU2VD4&ct=130131&st=sb – January 31, 2013
19. New Platform options: managed services
Four models, each spanning the same stack:
Platform (Self Managed) | Infrastructure (as a Service) | Platform (as a Service) | Software (as a Service)
Stack, top to bottom: Applications, Data, Runtime, Middleware, Database, O/S, Virtualization, Servers, Storage, Networking
In the Self Managed model the customer runs every layer. In each of the three "as a Service" models part of the stack is provided as Managed Services, and that share grows from Infrastructure to Platform to Software as a Service.
20. SQL Release timelines
SQL Server releases:
1989: SQL Server 1.0 (OS/2)
1991: SQL Server 1.1 (OS/2)
1993: SQL Server 4.21 (NT)
1995: SQL Server 6.0
1996: SQL Server 6.5
1998: SQL Server 7.0
2000: SQL Server 2000
2005: SQL Server 2005
2008: SQL Server 2008
2010: SQL Server 2008 R2
2012: SQL Server 2012
Features marked on the timeline:
SQL Server 2000: Reporting Services
SQL Server 2005: Native XML, SQLCLR, Service Broker, Integration Services
SQL Server 2008: Sparse Columns, Spatial Types, FILESTREAM
SQL Server 2008 R2: Data-tier Apps, StreamInsight, PowerPivot, Master Data Services
SQL Server 2012: AlwaysOn, Columnstore, FileTable, Semantic Search, Power View
Earlier features: Dynamic Locking, Unicode Support, Auto-Tuning, Full-text search, Replication, Analysis Services
SQL Azure service updates, February 2010 to October 2011: SQL Azure RTW; service updates SU1, SU2, SU3, SU4, SU6, and SU V.Next; Database Copy; Alter Edition; 50 GB Db; Spatial Type; HierarchyId Type; MARS; Sparse Columns; Web Admin; New Portal Experience; Multiple Servers; Server Mgmt API; JDBC; DAC Upgrade; DAC Import/Export Service; Denali TSQL; DataMarket RTW; SQL Azure Reporting CTP1, CTP2, and CTP3; DataSync CTP1, CTP2, CTP2 Update, and CTP3
21. Data platform: SQL Server 2012
Database Services: SQL Server*, SQL Azure*, Replication, Full Text & Semantic Search*
Data Integration Services: Integration Services*, Master Data Services*, SQL Azure Data Sync*, Data Quality Services*, StreamInsight*, Project “Austin”*
Analytical Services: Analysis Services*, Data Mining, PowerPivot*
Reporting Services: Reporting Services*, SQL Azure Reporting*, Report Builder, Power View*
* New / improved in SQL Server 2012
22. SQL Server 2012 Editions
Retrieved from http://www.microsoft.com/en-us/sqlserver/editions.aspx -- February 2013
23. What Enterprise Tools support Microsoft Data Mining?
Data Mining: SSMS, SSIS, PowerShell
31. Data Mining Capacities
SQL Server 2008 R2 Analysis Services object: maximum size/number
Maximum data mining models per structure: 2^31-1 = 2,147,483,647
Maximum data mining structures per solution: 2^31-1 = 2,147,483,647
Maximum data mining structures per Analysis Services database: 2^31-1 = 2,147,483,647
Maximum data mining attributes (variables) per structure: 2^31-1 = 2,147,483,647
Reference: http://www.marktab.net/datamining/index.php/2010/08/01/sql-server-data-mining-capacities-2008-r2/
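Every limit in the table equals 2^31-1, the largest value a signed 32-bit integer can hold. The short Python check below confirms that arithmetic. That Analysis Services stores these counts as signed 32-bit integers is an inference from the value (an assumption), not something the referenced post states.

```python
import ctypes

# Largest value a signed 32-bit integer can hold: 2^31 - 1
limit = 2**31 - 1
print(f"{limit:,}")  # 2,147,483,647

# ctypes.c_int32 does no overflow checking, so one past the limit wraps to -2^31
wrapped = ctypes.c_int32(limit + 1).value
print(wrapped)  # -2147483648
```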
34. Future: Most data is Text
Two Research Types
• Quantitative research = data mining
• Qualitative research = text mining
The future is combining both
35. Statistical Semantic Search
Covers some aspects of text mining
Identifies statistically relevant key phrases
Based on these phrases, identifies similar documents and ranks them by score
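In SQL Server 2012 these two results are exposed through the SEMANTICKEYPHRASETABLE and SEMANTICSIMILARITYTABLE rowset functions, which rely on Microsoft's own statistical language model. The Python sketch below swaps in plain TF-IDF weighting and cosine similarity (a substitute technique, not SQL Server's algorithm) to illustrate the two steps on the slide: rank a document's statistically relevant terms, then score other documents by how much of that weighted vocabulary they share. The corpus, stoplist, and function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical mini-corpus standing in for documents stored in SQL Server
docs = {
    "doc1": "data mining finds patterns in sales data",
    "doc2": "text mining finds key phrases in documents",
    "doc3": "sales forecasting uses data mining models",
}
STOPWORDS = {"in"}  # illustrative stoplist

def tokenize(text):
    # Single lowercase words; real key phrases may span several words
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

tf = {name: Counter(tokenize(text)) for name, text in docs.items()}
df = Counter(term for counts in tf.values() for term in counts)
n_docs = len(docs)

def tfidf(counts):
    # weight = term frequency * log(N / document frequency)
    return {t: c * math.log(n_docs / df[t]) for t, c in counts.items()}

weights = {name: tfidf(counts) for name, counts in tf.items()}

def key_phrases(name, k=3):
    # Highest-weighted terms play the role of "statistically relevant key phrases"
    return sorted(weights[name].items(), key=lambda kv: (-kv[1], kv[0]))[:k]

def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def similar(name):
    # Score every other document against `name`, most similar first
    scores = [(other, cosine(weights[name], weights[other]))
              for other in docs if other != name]
    return sorted(scores, key=lambda kv: -kv[1])

print(key_phrases("doc1"))
print(similar("doc1"))
```

Here "mining" appears in every document, so its weight drops to zero; "patterns" and "data" rise to the top for doc1, and doc3 scores as doc1's closest match because they share "data" and "sales".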
36. FileTables
Built on existing SQL Server FILESTREAM technology
Files and documents
Stored in special tables in SQL Server
Accessed as if they were stored in the file system
37. Full-Text Search Enhancements
Property search: search on tagged properties (such as author or title)
Customizable NEAR: find words or phrases close to one another
New Word Breakers and Stemmers (for many languages)
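In T-SQL, SQL Server 2012 writes a customizable proximity term inside CONTAINS as NEAR((term1, term2), maximum_distance, match_order). The Python sketch below is a simplified two-term model of those semantics, not SQL Server's implementation. It assumes maximum_distance counts the non-search words between the two terms and match_order requires term1 to come first; the near() function and sample sentence are hypothetical.

```python
import re

def near(text, term1, term2, max_distance, match_order=False):
    """Simplified two-term model of a custom NEAR match (an assumption).

    max_distance: most non-search words allowed between the two terms.
    match_order: if True, term1 must appear before term2.
    """
    words = re.findall(r"\w+", text.lower())
    pos1 = [i for i, w in enumerate(words) if w == term1.lower()]
    pos2 = [i for i, w in enumerate(words) if w == term2.lower()]
    for i in pos1:
        for j in pos2:
            if i == j or (match_order and j < i):
                continue
            gap = abs(j - i) - 1  # words strictly between the two matches
            if gap <= max_distance:
                return True
    return False

text = "Semantic search finds similar documents by key phrase"
print(near(text, "search", "documents", 2))        # True  (2 words between)
print(near(text, "search", "documents", 1))        # False
print(near(text, "documents", "search", 2, True))  # False (wrong order)
```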
38. From Documents to Output
Inputs: Office and PDF documents, varchar and nvarchar columns
Output: rowset with scores
39. (iFilter Required)
Documents pass through iFilters into the database, which maintains three indexes:
Full-Text Keyword Index “FTI”
Semantic Key Phrase Index – Tag Index “TI”
Semantic Document Similarity Index “DSI”
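At its core the full-text keyword index is an inverted index: it maps each word to the documents that contain it, so a query can return a ranked rowset without scanning every document. The Python sketch below is a hypothetical toy version. Plain strings stand in for the text an iFilter would extract from Office or PDF files, and the score is a simple occurrence count, not SQL Server's ranking formula.

```python
import re
from collections import defaultdict

def build_keyword_index(docs):
    """Map each word to {doc_id: occurrence count}: a toy inverted index."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word][doc_id] = index[word].get(doc_id, 0) + 1
    return index

def search(index, word):
    """Return (doc_id, score) rows, highest score first, like a ranked rowset."""
    hits = index.get(word.lower(), {})
    return sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))

# Plain strings stand in for text an iFilter would extract from documents
docs = {
    1: "Quarterly report: data mining results and data quality",
    2: "Meeting notes on semantic search",
    3: "Data warehouse design notes",
}
index = build_keyword_index(docs)
print(search(index, "data"))  # [(1, 2), (3, 1)]
```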
40. Languages Currently Supported
Traditional Chinese
German
English
French
Italian
Brazilian Portuguese
Russian
Swedish
Simplified Chinese
British English
Portuguese
Chinese (Hong Kong SAR, PRC)
Spanish
Chinese (Singapore)
Chinese (Macau SAR)
41. Phases of Semantic Indexing
Full-Text Keyword Index “FTI”
Semantic Document Similarity Index “DSI”
Semantic Key Phrase Index – Tag Index “TI”
http://msdn.microsoft.com/en-us/library/gg492085.aspx#SemanticIndexing
42. Integrated Full Text Search (iFTS)
Improved Performance and Scale:
Scale up to 350M documents for storage and search
iFTS query performance 7-10 times faster than in SQL Server 2008
Worst-case iFTS query response times under 3 seconds for the corpus
Performance similar to or better than the main database search competitors
(2012, Michael Rys, Microsoft)
43. Linear Scale of FTI/TI/DSI
First known linearly scaling end-to-end Search and Semantic product in the industry
[Chart: time in seconds vs. number of documents]
(2011 – K. Mukerjee, T. Porter, S. Gherman – Microsoft)
44. Text Mining References
Video
http://channel9.msdn.com/Shows/DataBound/DataBound-Episode-2-Semantic-Search
http://www.microsoftpdc.com/2009/SVR32
Semantic Search (Books Online) – explains the demo
http://msdn.microsoft.com/en-us/library/gg492075.aspx
Paper
http://users.cis.fiu.edu/~lzhen001/activities/KDD2011Program/docs/p213.pdf
46. Software
SQL Server 2012 Enterprise
(includes database engine, Analysis Services, SSMS and SSDT)
http://www.microsoft.com/sqlserver/en/us/get-sql-server/try-it.aspx
Microsoft Office 2013 Professional
http://office.microsoft.com/en-us/try
47. Organizations
Professional Association for SQL Server http://www.sqlpass.org
Atlanta MDF http://www.atlantamdf.com/
Atlanta Microsoft BI Users Group http://www.meetup.com/Atlanta-Microsoft-Business-Intelligence-Users/
PASS Business Analytics Conference http://www.passbaconference.com
Microsoft TechEd North America http://northamerica.msteched.com/
49. Conclusion
Microsoft competes well with other vendors
Business Intelligence and Analytics
Data Warehouse
Excel
SQL Server 2012 provides both data mining and semantic search
50. Connect
Data Mining Resources and blog http://marktab.net
Data Mining Training and Consulting (especially Microsoft and SAS)
http://marktab.com