Data Mining With Excel 2007 And SQL Server 2008

    Data Mining With Excel 2007 And SQL Server 2008 - Presentation Transcript

    1. Data Mining with Excel 2007 and SQL Server 2008 Mark Tabladillo Ph.D. MTabladillo <(at)> solidq.com November 10, 2008
    2. Approach of this Presentation • Emphasize – Conceptual value of data mining – Relationship of data mining to the real world • Reserve – Specific procedures and mechanics – Specific mathematics – Production implementation © 2008 Mark Tabladillo Ph.D.
    3. Introduction • Microsoft Data Mining (MDM) is a major branch of SQL Server Analysis Services (SSAS) • The technology is supported by a new language within SSAS called DMX (Data Mining Extensions) • Currently, the two promoted interfaces are BIDS (Business Intelligence Development Studio) and Excel 2007
    4. Introduction • SQL Server 2008 has some improvements over 2005, but the main technology is similar • A major improvement for 2008 is the documentation (Books Online) • Microsoft's team releases technology information at http://www.sqlserverdatamining.com
    5. Outline • Main Conclusions on Data Mining • Data Mining Definition • Microsoft Data Mining Fundamentals • Overview of Microsoft Data Mining Algorithms • Conclusion
    6. Four Interactive Demos • Card Sorting • Demographic Profiles • Sports (College Football) • Money (American Economy)
    7. Data Mining Definitions • Data mining is the automatic or semi-automatic process of exploring data for meaningful or useful patterns. • Data mining algorithms typically use estimation or optimization to achieve results (as opposed to only calculations).
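The slide distinguishes estimation or optimization from plain calculation. As a minimal Python illustration (not code from the presentation, and not how any Microsoft algorithm is implemented), `average` below is a calculation: one fixed formula. `fit_slope` instead searches for the slope `b` of the hypothetical model `y ≈ b * x` by repeatedly stepping against the gradient of the mean squared error. The function names, sample data, learning rate, and step count are all illustrative assumptions.

```python
def average(values):
    """A calculation: one fixed formula applied to the data."""
    return sum(values) / len(values)


def fit_slope(xs, ys, learning_rate=0.01, steps=1000):
    """An estimation: search for the slope b that minimizes mean squared
    error for the model y ≈ b * x, using plain gradient descent."""
    b = 0.0
    n = len(xs)
    for _ in range(steps):
        # Derivative of mean((y - b*x)**2) with respect to b.
        grad = sum(-2 * x * (y - b * x) for x, y in zip(xs, ys)) / n
        b -= learning_rate * grad
    return b


xs = [1.0, 2.0, 3.0, 4.0]   # invented sample inputs
ys = [2.1, 3.9, 6.2, 7.8]   # invented sample outputs
print(average(ys))
print(round(fit_slope(xs, ys), 3))
```

For this one-parameter model a closed-form answer exists (the sum of x*y divided by the sum of x*x), so the iterative search is only there to show the idea of optimizing toward a best fit; many richer models have no such shortcut.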
    8. Data Mining Provides Insight • Business – What reasons contribute to stock price changes? – Why do longer-term jobless benefits hit a 25-year high? • Entertainment – Who is more likely to lose a civil lawsuit? – How well will new DVD sales do in the next few months?
    9. Data Mining Provides Insight • Sports – How much should a sports team offer for a proven free agent? – What factors lead to winning a tennis championship? • Technology – How does Cisco know there are warning signals in the tech sector? – What is the net loss from losing corporate secrets?
    10. Data Mining Provides Insight • Politics – What priorities do American voters have for the new President? – Why did a certain candidate win or lose a race? • Science – What factors contribute to ozone holes over the Antarctic? – Why do we believe that Tyrannosaurus Rex had a good sense of smell?
    11. Functions in Technology • Job Titles = Rationalized System to Pay People Less or Give Them More Responsibility • "Engineer"? • "Scientist"?
    12. The Scientific Method • (Suppose you are a computer scientist) • Define the question • Gather information and resources (observe) • Form a hypothesis • Perform experiment and collect data
    13. The Scientific Method • Analyze data – data mining is an option • Interpret data and draw conclusions that serve as a starting point for a new hypothesis • Publish results • Retest (frequently done by other scientists)
    14. Microsoft Data Mining • Microsoft Data Mining refers to Microsoft's specific implementation of certain common data mining algorithms for the DMX (Data Mining Extensions) language. • Also called SQL Server Data Mining, the technology is implemented through tools rather than through a single, finished application interface.
    15. Data Mining Input and Results • Data mining input can include continuous numeric, categorical (ordinal or nominal), and text data. • Data mining results consist of a lower-dimensional model, describing either the empirical data (unsupervised) or the relationship between named input and output attributes (supervised).
    16. Data Explosion
    17. Donald Farmer – May 2008 "[We don't] have all the functionality of something like a SAS or an SPSS, because that's just not our market," he conceded. It comes down to a difference of scale, according to Farmer. SAS and SPSS typically target larger, more expensive deployments, typically with users well-versed in the usage of their tools. Microsoft is targeting a different kind of data mining consumer: the Excel analyst, for example, who might not have much (if any) experience with data mining, predictive analytics or, for that matter, statistical analysis.
    18. Donald Farmer – May 2008 "By the way, I don't mean to say we can't hit the high-end. Within Microsoft, we have our own database marketing team. We're one of the largest companies in the world. We have a huge database marketing team who do classic customer analysis. These guys were all SAS users, but when they joined Microsoft, they started using our tools. The entire process runs on our database, they actually use the Excel [data mining] add-ins to do it. It's not that there's nothing they don't miss, [it's that] they are able to achieve the same business results using our tools." Redmond Magazine – May 7, 2008 http://redmondmag.com/news/article.asp?EditorialsID=9836
    19. Obtaining the Add-in
    20. Obtaining the Add-in (Nov 2008) http://www.microsoft.com/sqlserver/2008/en/us/data-mining-addins.aspx
    21. System Requirements • Supported Operating Systems: Windows Server 2003 Service Pack 2; Windows Server 2008; Windows Vista Service Pack 1; Windows XP Service Pack 3 • Microsoft .NET Framework 2.0 • If installing the Table Analysis Tools or Data Mining Client for Excel, Microsoft Office 2007 with .NET Programmability Support. Supported editions of Office 2007 include: – Professional – Professional Plus – Ultimate – Enterprise • If installing the Data Mining Templates for Visio, Microsoft Visio Professional 2007 with .NET Programmability Support. • 40 MB of available hard disk space. • Note: The Data Mining Add-ins require a connection to one of the following versions of SQL Server 2008 Analysis Services: – Enterprise – Standard
    22. Delivering Predictive Analysis to Every User • Comprehensive – Extend the benefits of predictive analysis to all users, delivering a full data mining development life cycle through the familiar environment of the 2007 Microsoft Office system. • Intuitive – Empower users to harness advanced data mining technologies, hiding complexity behind automated tasks that deliver actionable insight throughout the organization. • Collaborative – Share data mining models through interactive graphical visualizations, and deliver recommendations and insight with simple and prompt publishing capabilities.
    23. Top New Features • Score new cases to seek the most profitable customers with the new Prediction Calculator. • Discover cross-sell/up-sell opportunities to optimize offerings with the new Shopping Basket Analysis. • Validate accuracy and stability of models simultaneously with the new, richly formatted Cross Validation. • Generate summary reports to enhance referencing and collaboration with the new Document Model feature.
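The Cross Validation feature mentioned above reports both accuracy and stability. The general idea of k-fold cross-validation can be sketched in Python as follows; this is an assumption-laden illustration, not the add-in's implementation, and `cross_validate`, `fit_majority`, `accuracy`, and the example labels are hypothetical. Each fold is held out once for testing while the model is trained on the rest; the mean of the fold scores estimates accuracy, and their spread hints at stability.

```python
import statistics


def k_fold_indices(n, k):
    """Split row positions 0..n-1 into k contiguous folds of nearly equal size.

    Real tools usually shuffle or stratify first; this sketch does not.
    """
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(rows, labels, fit, score, k=5):
    """Train on k-1 folds, score on the held-out fold, repeat for every fold.

    Returns the per-fold scores, their mean (accuracy), and their
    population standard deviation (a rough measure of stability).
    """
    scores = []
    for test_idx in k_fold_indices(len(rows), k):
        held_out = set(test_idx)
        train_rows = [r for i, r in enumerate(rows) if i not in held_out]
        train_labels = [y for i, y in enumerate(labels) if i not in held_out]
        model = fit(train_rows, train_labels)
        test_rows = [rows[i] for i in test_idx]
        test_labels = [labels[i] for i in test_idx]
        scores.append(score(model, test_rows, test_labels))
    return scores, statistics.mean(scores), statistics.pstdev(scores)


# Hypothetical baseline "model": always predict the most common training label.
def fit_majority(rows, labels):
    return max(set(labels), key=labels.count)


def accuracy(model, rows, labels):
    return sum(1 for y in labels if y == model) / len(labels)


rows = list(range(10))              # placeholder inputs, ignored by the baseline
labels = ["YES"] * 7 + ["NO"] * 3   # invented example labels
scores, mean_score, spread = cross_validate(rows, labels, fit_majority, accuracy, k=5)
print(scores, mean_score, spread)
```

Because the invented labels are not shuffled, the last folds hold all the NO rows and score poorly, which shows up as a large spread even though the mean looks acceptable; that gap is the kind of instability a cross-validation report is meant to expose.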
    24. SQL Server 2008 Menu Items
    25. Asking Permission
    26. Asking Permission Text DBA Person, I have downloaded and installed Microsoft SQL Server 2008 Data Mining Add-ins for Office 2007 on my machine ARCHITECT. These add-ins let me analyze my spreadsheet data in powerful ways by utilizing Microsoft SQL Server 2008 Analysis Services. In order to use these add-ins, I will need to be connected to an instance of Microsoft SQL Server 2008 Analysis Services that has been configured to support the add-ins. This configuration needs to be carried out by an administrator by following these steps: 1. Download the add-ins package from http://www.microsoft.com/sqlserver/2008/en/us/trial-software.aspx. 2. Launch the Setup, select the Server Configuration Tool and install it. 3. Run the Server Configuration Tool and follow the wizard steps. I would appreciate it if you could let me know whether it is possible for you to configure an instance of SQL Server 2008 Analysis Services as described above and give me access to it. Thank you, Data Miner
    27. What is a model?
    28. List the Data Mining Algorithms • Ten Answers • Each one is a field of academic focus
    29. The Data Mining Algorithms • Microsoft Decision Trees • Microsoft Clustering • Microsoft Time Series • Microsoft Association Rules • Microsoft Sequence Clustering • Microsoft Naive Bayes • Microsoft Neural Network • Microsoft Linear Regression • Microsoft Logistic Regression • Text Mining
    30. What is a calculation? • Business intelligence relies on many common calculations.
    31. A Parable of Unity and Diversity • One day a parabola met a line. They each wondered aloud how much they had in common. They moved around to find out. [Figure: a parabola and a line]
    32. The Analyze Tab (Menu Option – Data Mining Algorithm) • Analyze Key Influencers – Naïve Bayes • Detect Categories – Clustering • Fill from Example – Logistic Regression • Forecast – Time Series • Highlight Exceptions – Clustering • Scenario Analysis (Goal Seek) – Logistic Regression • Scenario Analysis (What If) – Logistic Regression • Prediction Calculator – Logistic Regression • Shopping Basket Analysis – Association Rules
    33. Why Different Button Names? • (Repeats the Menu Option – Data Mining Algorithm table from slide 32.)
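Shopping Basket Analysis is listed above as using Association Rules. The two basic measures behind association rules, support and confidence, can be sketched in Python for one-item-to-one-item rules. This brute-force counting is only an illustration under stated assumptions (tiny invented baskets, hypothetical thresholds `min_support=0.3` and `min_confidence=0.6`, a hypothetical `pair_rules` helper) and is not meant to reflect how the Microsoft Association Rules algorithm searches larger item sets.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.3, min_confidence=0.6):
    """Find one-item-to-one-item rules "lhs -> rhs" in shopping baskets.

    support    = share of baskets containing both items
    confidence = share of baskets with lhs that also contain rhs
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))           # ignore duplicates within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), together in pair_counts.items():
        support = together / n
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):     # a frequent pair can yield a rule each way
            confidence = together / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


baskets = [                                   # invented example baskets
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["beer", "chips"],
    ["bread", "butter"],
    ["milk", "chips"],
]
for lhs, rhs, support, confidence in pair_rules(baskets):
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

In these sample baskets, bread and butter appear together in 2 of 5 baskets (support 0.4), and every basket with butter also contains bread (confidence 1.0), so "butter -> bread" is the strongest cross-sell rule found.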
    34. The Data Mining Tab • The ribbon has different regions: – Data Preparation – Data Modeling – Accuracy and Validation – Model Usage – Management – Connection
    35. Demo 1: Card Sorting • Take the sample of cards you have and put them into one or more groups. Write in the area below what your groups are.
    36. Demo 2: Demographic Profiles • Exercise 1: We will assume that each of the 10 listed people uses SQL Server technology as some part of their job. For the column marked "UserGroup", write in YES (and NO otherwise) for people you believe would be interested in future SQL Server user group meetings.
    37. Demo 2: Demographic Profiles • Exercise 2: Assume an average house in your neighborhood or area is for sale. For the column marked "NewNeighbors", write in YES (and NO otherwise) for people you believe might be a potential buyer for that average home.
    38. What is unsupervised? • Model of the empirical data.
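An unsupervised model describes the data itself, much like the card-sorting demo, where groups emerge from the cards rather than from a given answer. A minimal k-means sketch in Python shows one common way to form such groups automatically; k-means is used here only as a familiar example of clustering and is not claimed to be how Microsoft Clustering works. The seeding rule (first `k` points), the iteration cap, and the sample points are assumptions chosen to keep the sketch short.

```python
import math


def kmeans(points, k, iterations=20):
    """Group 2-D points with plain k-means, seeding with the first k points."""
    centroids = [points[i] for i in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c, members in enumerate(clusters):
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:              # stop once nothing moves
            break
        centroids = new_centroids
    labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
    return centroids, labels


# Invented sample points forming two loose groups.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (2.0, 1.0), (8.5, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(points, k=2)
print(centroids)
print(labels)
```

No label was supplied for any point; the two groups come only from distances between the points, which is what makes the model unsupervised.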
    39. What is supervised? • Model of the process between input and output attributes.
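A supervised model learns the relationship from inputs to a named output. The menu table earlier in the deck maps several Excel options to Logistic Regression, so a one-input logistic regression fitted by gradient descent makes a compact Python sketch of the idea. The data are invented (hypothetical years of SQL Server experience against a YES/NO interest in user group meetings, loosely echoing Demo 2), the helper names are hypothetical, and the learning rate and step count are assumptions; this is not the Microsoft Logistic Regression implementation.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, steps=5000):
    """Fit P(y = 1 | x) = sigmoid(w * x + b) by gradient descent on mean log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(w * x + b) - y   # predicted probability minus actual label
            grad_w += error * x
            grad_b += error
        w -= learning_rate * grad_w / n
        b -= learning_rate * grad_b / n
    return w, b


def predict(w, b, x):
    return sigmoid(w * x + b)


# Invented training data: input = years of experience, output = interest (1 = YES, 0 = NO).
years = [0, 1, 2, 3, 4, 5, 6, 7]
interested = [0, 0, 0, 1, 0, 1, 1, 1]
w, b = fit_logistic(years, interested)
for x in (0, 3.5, 7):
    print(x, round(predict(w, b, x), 3))
```

Unlike the clustering case, every training row carries a known output, and the fitted weights describe how the input drives that output; scoring a new person is then a single call to `predict`.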
    40. Scientific Progress • Why might two scientists come to slightly or widely different conclusions?
    41. Demo 3: Sports • Look at page 8C with the USA Today Coaches Poll. Based on this list (and other information on college football on this page), do you completely agree with the rankings? Why or why not?
    42. Demo 4: Money • Look at page 6B with the USA Today Market Trends. Choose three specific pieces of information on this chart which, to you, illustrate the current state of the American economy.
    43. Wittgenstein’s Duck-Rabbit
    44. Data Mining Examples Tour
    45. Data Mining • “Data” precedes “Mining” • “Data” – when is it easier? • “Data” – when is it harder? • “Mining” – when is it easier? • “Mining” – when is it harder?
    46. Regroup and Conclusion • Main Points from this Presentation
    47. Resources • Microsoft SQL Server 2008 http://www.microsoft.com/sqlserver/2008/en/us/data-mining.aspx • SQL Server Data Mining http://www.sqlserverdatamining.com/ssdm/default.aspx • Adventure Works Tutorial – “SQL Server 2005 Data Mining Tutorial” http://www.sqlserverdatamining.com/ssdm/Home/Tutorials/tabid/57/Default.aspx • MSDN Forums (“Katmai” = 2008, “SQL Server” = 2005 and before) http://forums.microsoft.com/MSDN/default.aspx?SiteID=1 • Data Mining with Microsoft SQL Server 2008 (Coming November 17, 2008) by Jamie MacLennan, ZhaoHui Tang, and Bogdan Crivat • Smart Business Intelligence Solutions with Microsoft® SQL Server® 2008 (PRO-Developer) (Coming February 4, 2009) by Lynn Langit and Matthew Roche • KD Nuggets (Data Mining and Knowledge Discovery Portal) http://www.kdnuggets.com/ • Association for Computing Machinery http://www.acm.org/
    48. Contact Information • Mark Tabladillo mtabladillo <{at}> solidq.com • Also on: LinkedIn, Facebook