Document Classification using DMX in SQL Server Analysis Services

Presentation for SQL Saturday Raleigh NC, September 18, 2010
Overview of using DMX (Data Mining Extensions) in Excel, SSMS (SQL Server Management Studio), BIDS (Business Intelligence Development Studio), and PowerShell


  1. Document Classification using DMX in Analysis Services. Mark Tabladillo Ph.D., http://marktab.net, September 18, 2010
  2. SQL Saturday 46, Raleigh NC. #sqlsat46 #MarkTabNet. © 2010 Mark Tabladillo Ph.D.
  3. MarkTab & Text Mining
  5. Outline: Text Mining; Tools for Demos
  6. Data Mining as a Service
  7. Text Mining Product Comparison from 2008. Source: Feinerer, I., Hornik, K., & Meyer, D. (2008). Text Mining Infrastructure in R. Journal of Statistical Software, 25(5).
  8. SQL Server Data Mining (Activity: How)
     • Preprocess: T-SQL; Integration Services; Data Mining Add-In for Excel; .NET programming
     • Associate: Microsoft Association Rules (algorithm)
     • Cluster: Microsoft Clustering (algorithm)
     • Summarize: Integration Services (Term Extraction, Term Lookup)
     • Categorize: Integration Services
     • API: Includes DMX, XMLA, AMO, ADOMD.NET
  9. APIs for Data Mining (Acronym: Term; Definition)
     • DMX: Data Mining Extensions; SQL-like queries (OLE DB for Data Mining)
     • XMLA: Extensible Markup Language for Analysis; client communication protocol
     • AMO: Analysis Management Objects; .NET library to manage Analysis Services
     • ADOMD.NET: ActiveX Data Objects (Multidimensional) for .NET; .NET Framework data provider
  10. DMX Tasks
     • Data Definition
         • Create, Alter, Drop – Mining Structure
         • Create, Drop – Mining Model
         • Export and Import Models
     • Data Manipulation
         • Query Models, Content, Cases, Sample Cases, Dimension Content
  11. SQL Server Data Mining Applications (User Interface: Activity)
     • Excel (and PowerPivot for Excel): DMX
     • BIDS (Business Intelligence Development Studio): Analysis Services Project; Integration Services Project (T-SQL; DMX; XMLA)
     • SSMS (SQL Server Management Studio): T-SQL; DMX; XMLA
     • PowerShell version 2.0: T-SQL; DMX; XMLA; AMO; ADOMD.NET
     • SharePoint: (Requires Setup or Customization)
     • Your Name Here (Develop Your Own): ?
  12. Outline: Text Mining; Tools for Demos
  13. Data: Presidential Addresses. http://www.wiley.com/WileyCDA/WileyTitle/productCd-0470277742,descCd-DOWNLOAD.html
  14. Excel
     • Use the 32-bit Excel add-in for Data Mining
         • Written for SQL Server 2008, OK for 2008 R2
         • Written for Office 2007, OK for 2010
     • (Optional) Add the free PowerPivot add-in (http://powerpivot.com)
  15. Datasets & Models (diagram; ©2010 Predixion Software)
     • Public Cloud or On-Premise Private Cloud
     • SQL Server Analysis Services
     • PowerPivot
     • Data Sources: SQL Server, Access, Oracle, Teradata, Sybase, Informix, DB2, Data Feeds, Text Files
  16. BIDS
     • The preferred application for production data mining
     • Analysis Services Projects
         • Make Mining Structures and Models
         • Data Mining for OLAP Cubes
         • Excellent for Experimentation
     • Integration Services Projects
         • Term Extraction and Term Lookup Text Mining
         • Excellent for Production
     • Reporting Services Projects
         • Similar to Crystal Reports
  17. SSMS
     • Production management and maintenance
     • Scripts can become stored procedures
     • T-SQL, DMX, MDX, XMLA
  18. PowerShell
     • Object-oriented command prompt, now in version 2
     • Provides complete access to AMO, ADOMD.NET and DMX
  19. Excel in Production
     • Can create and manage permanent data mining models
     • Can document data mining models
     • Can do some preprocessing (ETL)
  20. BIDS in Production
     • Can create a production workflow with Integration Services projects
     • Can create production data mining models with Analysis Services projects
  21. SSMS in Production
     • The standard production user interface for SQL Server
     • Also the standard production user interface for Analysis Services Databases
     • Built for
         • Scripting (T-SQL, MDX, DMX, XMLA)
         • Security
         • Assembly Registration (Analysis Services)
         • Stored Procedures (SQL Server)
  22. PowerShell in Production
     • Features
         • Object-oriented
         • Command window or ISE (Integrated Scripting Environment)
         • Accesses .NET libraries and WMI (Windows Management Instrumentation)
         • Version two adds event and exception handling
  23. Resources
     • MarkTab.NET Blog, links, video resources and information for data mining
     • Blog: http://marktab.net/datamining
     • Twitter: @MarkTabNet
  24. Regroup and Conclusion
     • Main Points from this Presentation
  25. Contact Information
     • Mark Tabladillo http://marktab.net
     • Also on: Twitter @marktabnet, LinkedIn
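
The DMX tasks on slide 10 (create and drop mining structures and models, query model content and cases) correspond to a small set of statement forms. The following is a minimal sketch in Python that only assembles the statement text. It rests on several assumptions: the structure, model, and column names (Speeches, SpeechClassifier, DocumentId, Era, Category) are hypothetical, the helper functions are illustrative and not part of any Microsoft library, and nothing here connects to Analysis Services. The flat column layout is a simplification chosen for brevity, not the layout used in the presentation.

```python
from typing import List, Tuple

# Query targets named on the "DMX Tasks" slide, in DMX keyword form.
QUERY_TARGETS = ("CONTENT", "CASES", "SAMPLE_CASES", "DIMENSION_CONTENT")


def bracket(name: str) -> str:
    """Wrap an object name in DMX square-bracket delimiters."""
    if "[" in name or "]" in name:
        # Keep the sketch simple: refuse names that would need escaping.
        raise ValueError(f"unsupported character in name: {name!r}")
    return f"[{name}]"


def create_structure(structure: str, columns: List[Tuple[str, str]]) -> str:
    """Data definition: CREATE MINING STRUCTURE from (name, type and content) pairs."""
    body = ",\n".join(f"    {bracket(col)} {spec}" for col, spec in columns)
    return f"CREATE MINING STRUCTURE {bracket(structure)}\n(\n{body}\n)"


def add_model(structure: str, model: str,
              columns: List[Tuple[str, bool]], algorithm: str) -> str:
    """Data definition: add a model to a structure; True marks a PREDICT column."""
    lines = [f"    {bracket(col)}{' PREDICT' if predict else ''}"
             for col, predict in columns]
    body = ",\n".join(lines)
    return (
        f"ALTER MINING STRUCTURE {bracket(structure)}\n"
        f"ADD MINING MODEL {bracket(model)}\n(\n{body}\n)\n"
        f"USING {algorithm}"
    )


def query(model: str, target: str = "CONTENT") -> str:
    """Data manipulation: select everything from one of the model's rowsets."""
    if target not in QUERY_TARGETS:
        raise ValueError(f"unknown query target: {target!r}")
    return f"SELECT * FROM {bracket(model)}.{target}"


def drop(kind: str, name: str) -> str:
    """Data definition: DROP MINING STRUCTURE or DROP MINING MODEL."""
    if kind not in ("STRUCTURE", "MODEL"):
        raise ValueError(f"kind must be STRUCTURE or MODEL, got {kind!r}")
    return f"DROP MINING {kind} {bracket(name)}"


if __name__ == "__main__":
    # Hypothetical document-classification layout: one row per speech.
    print(create_structure("Speeches", [
        ("DocumentId", "LONG KEY"),
        ("Era", "TEXT DISCRETE"),
        ("Category", "TEXT DISCRETE"),
    ]))
    print(add_model("Speeches", "SpeechClassifier", [
        ("DocumentId", False),
        ("Era", False),
        ("Category", True),
    ], "Microsoft_Naive_Bayes"))
    print(query("SpeechClassifier", "CONTENT"))
    print(drop("MODEL", "SpeechClassifier"))
```

The generated text could then be pasted into a DMX query window in SSMS or sent through ADOMD.NET, the two routes the slides describe. Before running it against a real server, check the column types and the algorithm name against that server's DMX documentation.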
