Secrets of Enterprise Data Mining

If you have a SQL Server license (Standard or higher), then you already have the ability to start data mining. In this new presentation, you will see how to scale up data mining from the free Excel 2013 add-in to production use. Aimed at beginning to intermediate data miners, this presentation will show how mining models move from development to production. We will use SQL Server 2012 tools including SSMS, SSIS, and SSDT. Technology includes SQL Server 2012 SP1, Office 2013, and Windows 8 Professional.

Presentation Transcript

  • Secrets of Enterprise Data Mining. Mark Tabladillo, Ph.D. (MVP, MCAD .NET, MCITP, MCT). PASS SQL Saturday #177, Mountain View, CA, February 23, 2013
  • Networking: Interactive
  • About MarkTab: training and consulting with http://marktab.com; data mining resources and blog at http://marktab.net. Ph.D. in Industrial Engineering, Georgia Tech. Training and consulting internationally across many industries (SAS and Microsoft). Contributed to peer-reviewed research and legislation. Mentoring doctoral dissertations at the accredited University of Phoenix.
  • Interactive: Name up to three things you want from enterprise data mining.
  • Secret: Excel data mining. Excel add-in for SQL Server data mining.
  • Secret: More than just SQL Server. Microsoft continues to add machine learning technology.
  • Microsoft Offers: Bing Maps; Xbox Kinect (hacker magnet); SQL Server 2012 Analysis Services (Multidimensional and Data Mining), Integration Services, and Semantic Search; Hadoop partnership; Excel; projects from Microsoft Research.
  • Definitions: What is data mining?
  • Definition: Data mining is the automated or semi-automated process of discovering patterns in data. Machine learning is the development and optimization of algorithms for automated or semi-automated pattern discovery.
  • Purposes (phrase: goal). “Data Mining”: inform actionable decisions. “Machine Learning”: determine the best-performing algorithm.
  • Secret: Give artists art. Data mining is part of a complete decision cycle.
  • MarkTab Decision Cycle (figure): GO, Synthesis (art), Analysis (science). “Science needs science fiction.” (MarkTab)
  • XKCD: Shopping Teams
  • Secret: Microsoft is an analytics competitor. Industry comparisons, 2012-2013.
  • Gartner 2013 Magic Quadrant for Business Intelligence and Analytics Platforms Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DZLPEH&ct=130207&st=sb – February 5, 2013
  • Gartner 2013 Magic Quadrant for Data Warehouse Database Management Systems Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DU2VD4&ct=130131&st=sb – January 31, 2013
  • KDnuggets 2012: http://marktab.net/datamining/2012/06/15/excel-number-commercial-tool-analytics-data-mining-big-data/
  • SQL Server 2012: Business Intelligence and Business Analytics
  • New platform options: managed services. The slide compares four stacks: Platform (Self Managed), Infrastructure (as a Service), Platform (as a Service), and Software (as a Service). Each stack has the same layers (Applications, Data, Runtime, Middleware, Database, O/S, Virtualization, Servers, Storage, Networking); moving from self-managed toward Software as a Service, more of those layers become managed services.
  • SQL Release timelines (figure): on-premises releases SQL Server 1.0 (OS/2, 1989), 1.1 (OS/2, 1991), 4.21 (NT, 1993), 6.0 (1995), 6.5 (1996), 7.0 (1998), 2000, 2005, 2008, 2008 R2 (2010), and 2012, each annotated with headline features (for SQL Server 2012: AlwaysOn, Columnstore, FileTable, Semantic Search, Power View), alongside SQL Azure service updates, CTPs, and features released from 2010 through 2011.
  • Secret: Many already have Microsoft analytics. Business Intelligence and Business Analytics are included with most SQL Server licenses.
  • Data platform: SQL Server 2012. Service areas: Data Integration Services, Database Services, Analytical Services, Reporting Services. Components: SQL Server*, Integration Services*, Reporting Services*, Analysis Services*, SQL Azure*, SQL Azure Reporting*, Master Data Services*, Replication, Data Mining, Report Builder, SQL Azure Data Sync*, Data Quality Services*, Full Text & Semantic Search*, StreamInsight*, PowerPivot*, Power View*, Project “Austin”** (* new or improved in SQL Server 2012)
  • SQL Server 2012 Editions Retrieved from http://www.microsoft.com/en-us/sqlserver/editions.aspx -- February 2013
  • Secret: Microsoft offers three enterprise tools. All three tools support scaled solutions.
  • What enterprise tools support Microsoft Data Mining? SSMS, SSIS, and PowerShell.
  • Variable (figure): a variable plotted on an axis from 0 to 7, labeled Discrete, Continuous, and Discretized.
  • Data Mining Capacities, SQL Server 2008 R2 Analysis Services (object: maximum size/number). Maximum data mining models per structure: 2^31-1 = 2,147,483,647. Maximum data mining structures per solution: 2^31-1 = 2,147,483,647. Maximum data mining structures per Analysis Services database: 2^31-1 = 2,147,483,647. Maximum data mining attributes (variables) per structure: 2^31-1 = 2,147,483,647. Reference: http://www.marktab.net/datamining/index.php/2010/08/01/sql-server-data-mining-capacities-2008-r2/
  • Semantic Search: Text Mining
  • Future: Most data is text. Two research types: quantitative research = data mining; qualitative research = text mining. The future is combining both.
  • Full-Text Search Enhancements. Property search: search on tagged properties (such as author or title). Customizable NEAR: find words or phrases close to one another. New word breakers and stemmers (for many languages).
  • Semantic indexing (figure): documents stored in the database pass through iFilters (iFilter required) to build the Full-Text Keyword Index (“FTI”), the Semantic Key Phrase Index, also called the Tag Index (“TI”), and the Semantic Document Similarity Index (“DSI”).
  • Languages currently supported: Traditional Chinese, Simplified Chinese, German, British English, English, Portuguese, French, Chinese (Hong Kong SAR, PRC), Italian, Spanish, Brazilian, Chinese (Singapore), Russian, Chinese (Macau SAR), Swedish.
  • Phases of Semantic Indexing: Full-Text Keyword Index (“FTI”); Semantic Document Similarity Index (“DSI”); Semantic Key Phrase Index, also called the Tag Index (“TI”). http://msdn.microsoft.com/en-us/library/gg492085.aspx#SemanticIndexing
  • Secret: Semantic Search scales linearly. Performance.
  • Integrated Full-Text Search (iFTS): improved performance and scale. Scale up to 350M documents for storage and search. iFTS query performance 7 to 10 times faster than in SQL Server 2008. Worst-case iFTS query response times of less than 3 seconds for the corpus. Similar to or better than the main database search competitors (2012, Michael Rys, Microsoft).
  • Linear Scale of FTI/TI/DSI: first known linearly scaling end-to-end search and semantic product in the industry. Chart: time in seconds vs. number of documents (2011, K. Mukerjee, T. Porter, S. Gherman, Microsoft).
  • Text Mining References. Video: http://channel9.msdn.com/Shows/DataBound/DataBound-Episode-2-Semantic-Search and http://www.microsoftpdc.com/2009/SVR32. Semantic Search (Books Online), which explains the demo: http://msdn.microsoft.com/en-us/library/gg492075.aspx. Paper: http://users.cis.fiu.edu/~lzhen001/activities/KDD2011Program/docs/p213.pdf
  • Microsoft Resources: Links
  • Software. SQL Server 2012 Enterprise (includes database engine, Analysis Services, SSMS, and SSDT): http://www.microsoft.com/sqlserver/en/us/get-sql-server/try-it.aspx. Microsoft Office 2013 Professional: http://office.microsoft.com/en-us/try
  • Organizations. Professional Association for SQL Server: http://www.sqlpass.org. Atlanta MDF: http://www.atlantamdf.com/. Atlanta Microsoft BI Users Group: http://www.meetup.com/Atlanta-Microsoft-Business-Intelligence-Users/. PASS Business Analytics Conference: http://www.passbaconference.com. Microsoft TechEd North America: http://northamerica.msteched.com/
  • Interactive: Takeaways
  • Conclusion: Seven Secrets. (1) Excel data mining; (2) More than just SQL Server; (3) Success involves everyone; (4) Microsoft is an analytics competitor; (5) Many already have Microsoft analytics; (6) Microsoft offers three enterprise tools; (7) Semantic search scales linearly.
  • Connect. Data mining resources and blog: http://marktab.net. Data mining training and consulting (especially Microsoft and SAS): http://marktab.com
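The Purposes slide gives machine learning the goal of determining the best-performing algorithm. A minimal Python sketch of that idea follows. It assumes hypothetical toy data and two hypothetical candidate models (a majority-class baseline and a threshold rule); it fits each candidate on training rows, scores it on held-out rows, and keeps the most accurate one.

```python
# Minimal sketch: pick the best-performing algorithm by held-out accuracy.
# The data and both candidate models are hypothetical illustrations.
from typing import Callable, Dict, List, Tuple

Row = Tuple[float, int]          # (feature value, class label 0 or 1)
Model = Callable[[float], int]   # a fitted model maps a feature to a label


def majority_class(train: List[Row]) -> Model:
    """Baseline: always predict the most common training label."""
    labels = [y for _, y in train]
    most_common = max(set(labels), key=labels.count)
    return lambda x: most_common


def threshold_rule(train: List[Row]) -> Model:
    """Predict 1 above the midpoint of the two class means.

    Assumes both classes occur in the training rows and that class 1
    has the larger mean.
    """
    ones = [x for x, y in train if y == 1]
    zeros = [x for x, y in train if y == 0]
    cut = (sum(ones) / len(ones) + sum(zeros) / len(zeros)) / 2
    return lambda x: 1 if x > cut else 0


def accuracy(model: Model, rows: List[Row]) -> float:
    """Fraction of rows whose label the model predicts correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)


def best_algorithm(candidates: Dict[str, Callable[[List[Row]], Model]],
                   train: List[Row], holdout: List[Row]) -> Tuple[str, Dict[str, float]]:
    """Fit every candidate on train, score it on holdout, return the winner."""
    scores = {name: accuracy(fit(train), holdout) for name, fit in candidates.items()}
    return max(scores, key=scores.get), scores


train = [(1.0, 0), (2.0, 0), (3.0, 0), (6.0, 1), (7.0, 1)]
holdout = [(1.5, 0), (2.5, 0), (6.5, 1), (8.0, 1)]
candidates = {"majority class": majority_class, "threshold rule": threshold_rule}
best, scores = best_algorithm(candidates, train, holdout)
print(best, scores)
```

Scoring on rows the model never saw during fitting is what keeps the comparison honest; a model scored on its own training data would look better than it is.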
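The Variable slide contrasts Discrete, Continuous, and Discretized treatments of a variable with values from 0 to 7. Discretizing turns a continuous column into a small number of ordered buckets. The Python sketch below assumes equal-frequency binning, which is loosely similar in spirit to the EqualAreas discretization method in Analysis Services but is not its implementation; the helper names are hypothetical.

```python
# Minimal sketch: discretize a continuous variable into buckets that each
# hold roughly the same number of values (equal-frequency binning).
# Illustration only; not the Analysis Services implementation.
from bisect import bisect_right
from typing import List


def equal_frequency_cutpoints(values: List[float], buckets: int) -> List[float]:
    """Return buckets-1 cut points taken at evenly spaced ranks of the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // buckets] for i in range(1, buckets)]


def discretize(value: float, cutpoints: List[float]) -> int:
    """Map a value to a bucket index from 0 to len(cutpoints)."""
    # bisect_right places a value equal to a cut point in the higher bucket.
    return bisect_right(cutpoints, value)


values = [0, 1, 2, 3, 4, 5, 6, 7]
cuts = equal_frequency_cutpoints(values, 4)
print(cuts)                                # [2, 4, 6]
print([discretize(v, cuts) for v in values])  # [0, 0, 1, 1, 2, 2, 3, 3]
```

With eight evenly spread values and four buckets, each bucket receives two values; on skewed data, equal-frequency cut points bunch up where the data is dense.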
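The Data Mining Capacities slide lists 2^31-1 = 2,147,483,647 for every limit. That value is the largest signed 32-bit integer. A short Python check of the arithmetic follows; whether Analysis Services stores these counts as 32-bit integers is an inference, not something the slide states.

```python
# Check the capacity figure from the slide: 2^31 - 1 is the largest value
# a signed 32-bit integer can hold (inference about storage, not from the slide).
import ctypes

limit = 2**31 - 1
print(f"{limit:,}")                     # 2,147,483,647

# ctypes does no overflow checking, so one past the limit wraps to the minimum.
print(ctypes.c_int32(limit).value)      # 2147483647
print(ctypes.c_int32(limit + 1).value)  # -2147483648
```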
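The Full-Text Search Enhancements slide mentions a customizable NEAR that finds words or phrases close to one another. In SQL Server 2012 the custom proximity term lets a query set a maximum distance and whether the terms must appear in order. The simplified Python sketch below shows the matching idea only. It assumes whitespace tokenization, case-insensitive comparison, two terms, and distance counted as the number of words between them; SQL Server's own rules may differ, and `near` is a hypothetical helper.

```python
# Simplified sketch of a two-term proximity ("NEAR") match.
# Assumptions (not SQL Server's implementation): whitespace tokenization,
# case-insensitive matching, distance = number of words between the terms.
from typing import List


def positions(tokens: List[str], term: str) -> List[int]:
    """Indexes at which a term occurs in the token list."""
    return [i for i, tok in enumerate(tokens) if tok == term]


def near(text: str, first: str, second: str,
         max_distance: int, in_order: bool = False) -> bool:
    """True if the terms occur within max_distance words of each other."""
    tokens = text.lower().split()
    for i in positions(tokens, first.lower()):
        for j in positions(tokens, second.lower()):
            if in_order and j < i:
                continue  # second term must follow the first term
            gap = abs(j - i) - 1  # words strictly between the two terms
            if i != j and gap <= max_distance:
                return True
    return False


doc = "the bike has excellent brake control on steep hills"
print(near(doc, "bike", "control", 3))                 # True: 3 words between
print(near(doc, "bike", "control", 2))                 # False
print(near(doc, "control", "bike", 3, in_order=True))  # False: wrong order
```

A real full-text engine reads term positions from its keyword index instead of rescanning each document, which is what makes proximity queries practical at scale.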
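The semantic indexing slides name three indexes: the Full-Text Keyword Index (FTI), the Semantic Key Phrase or Tag Index (TI), and the Semantic Document Similarity Index (DSI). The toy Python sketch below shows how such indexes can relate. It assumes key phrases are each document's most frequent non-stopword terms and that similarity is the Jaccard overlap of those sets. This is a swapped-in simplification, not the statistical model SQL Server uses, and the function names and stopword list are hypothetical.

```python
# Toy sketch of three index types built in phases.
# Assumptions: key phrases = each document's most frequent non-stopword terms;
# similarity = Jaccard overlap of key-phrase sets. Not SQL Server's model.
from collections import Counter, defaultdict
from itertools import combinations
from typing import Dict, Set, Tuple

STOPWORDS = {"the", "a", "and", "of", "to", "in", "is", "for"}


def keyword_index(docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Full-text style inverted index: term -> ids of documents containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            if term not in STOPWORDS:
                index[term].add(doc_id)
    return dict(index)


def key_phrases(docs: Dict[str, str], top_k: int = 3) -> Dict[str, Set[str]]:
    """Tag index: document id -> its top_k most frequent non-stopword terms."""
    tags = {}
    for doc_id, text in docs.items():
        counts = Counter(t for t in text.lower().split() if t not in STOPWORDS)
        tags[doc_id] = {term for term, _ in counts.most_common(top_k)}
    return tags


def similarity_index(tags: Dict[str, Set[str]]) -> Dict[Tuple[str, str], float]:
    """Document similarity for every pair, computed from the tag index."""
    scores = {}
    for a, b in combinations(sorted(tags), 2):
        union = tags[a] | tags[b]
        scores[(a, b)] = len(tags[a] & tags[b]) / len(union) if union else 0.0
    return scores


docs = {
    "d1": "data mining models mining data mining",
    "d2": "mining models predict models mining models",
    "d3": "semantic search for text and semantic search",
}
fti = keyword_index(docs)
ti = key_phrases(docs)
dsi = similarity_index(ti)
print(sorted(fti["mining"]))  # ['d1', 'd2']
print(dsi[("d1", "d2")])      # 0.5
print(dsi[("d1", "d3")])      # 0.0
```

In this sketch the similarity step reads only the key phrases, so it can run only after the tag index exists, which is one reason to build the indexes in phases.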