Data Mining Beyond Adventure Works

    Data Mining Beyond Adventure Works - Presentation Transcript

    1. Data Mining beyond Adventure Works • Mark Tabladillo, Ph.D. • MTabladillo <(at)> solidq.com • April 25, 2009
    2. Outline • Data Mining Fundamentals • Interactive Demos • Conclusion
    3. Approach of this Presentation • Emphasize – Conceptual value of data mining – Relationship of data mining to the real world • Reserve – Specific procedures and mechanics – Specific mathematics – Production implementation © 2008 Mark Tabladillo Ph.D.
    4. Interactive Demos • Sports • Government Forecasting
    5. Data Mining Definitions • Data mining is the automatic or semi-automatic process of exploring data for meaningful or useful patterns. • Data mining algorithms typically use estimation or optimization to achieve results (as opposed to only calculations).
    6. Microsoft Data Mining • Microsoft Data Mining refers to Microsoft’s specific implementation of certain common data mining algorithms for the DMX (Data Mining Extensions) language. • Also called SQL Server Data Mining, the technology is implemented through tools rather than through a single, finished application interface.
    7. Data Mining Tasks • Supervised – Answer known, what is correlated? • Unsupervised – Answer unknown (unspecified), what are the groups? • Forecasting – Given a trend, what is next? Value Slide
    8. List the Data Mining Algorithms • Ten Answers • Each one is a field of academic focus
    9. The Data Mining Algorithms • Microsoft Naive Bayes • Microsoft Linear Regression • Microsoft Decision Trees • Microsoft Time Series • Microsoft Clustering • Microsoft Sequence Clustering • Microsoft Association Rules • Microsoft Neural Networks • Microsoft Logistic Regression • Text Mining
    10. The Analyze Tab
        Menu Option                     Data Mining Algorithm
        Analyze Key Influencers         Naïve Bayes
        Detect Categories               Clustering
        Fill from Example               Logistic Regression
        Forecast                        Time Series
        Highlight Exceptions            Clustering
        Scenario Analysis (Goal Seek)   Logistic Regression
        Scenario Analysis (What If)     Logistic Regression
        Prediction Calculator           Logistic Regression
        Shopping Basket Analysis        Association Rules
    11. Demo One: National League Baseball • Directions: You are on the management team for the Atlanta Braves. To better serve the team, you have been instructed by the owner to group the players by considering both their position and their salary. © 2009 Mark Tabladillo Ph.D.
    12. Demo One: National League Baseball • The following rules apply: – Players of different position may be in the same group – You must make more than one group – Each group must have at least two players
    13. Demo One: National League Baseball • Individual attributes can be used to make groups • Historical statistics can be used to group new players • Both supervised and unsupervised grouping can be performed on the same data
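The Demo One exercise can be tried in code. What follows is a minimal sketch of unsupervised grouping, not the Microsoft Clustering algorithm: it runs a plain one-dimensional k-means over the base-10 logarithm of each salary from the Demonstration One handout. The choice of three groups, the log scale, and the function name `kmeans_1d` are assumptions made for illustration. Position is only reported for each group rather than used as a clustering feature, which is a simplification of the exercise.

```python
import math
from collections import Counter

# (name, salary, position) copied from the Demonstration One handout.
PLAYERS = [
    ("Bernero, Adam", 450_000, "Pitcher"), ("Betemit, Wilson", 316_000, "Shortstop"),
    ("Colon, Roman", 318_500, "Pitcher"), ("Estrada, Johnny", 460_000, "Catcher"),
    ("Franco, Julio", 1_000_000, "First Baseman"), ("Furcal, Rafael", 5_600_000, "Shortstop"),
    ("Giles, Marcus", 2_350_000, "Second Baseman"), ("Gryboski, Kevin", 877_500, "Pitcher"),
    ("Hampton, Mike", 15_125_000, "Pitcher"), ("Hudson, Tim", 6_500_000, "Pitcher"),
    ("Jones, Andruw", 13_000_000, "Outfielder"), ("Jones, Chipper", 16_061_802, "Outfielder"),
    ("Jordan, Brian", 600_000, "Outfielder"), ("Kolb, Dan", 3_400_000, "Pitcher"),
    ("Langerhans, Ryan", 316_000, "Outfielder"), ("LaRoche, Adam", 337_500, "First Baseman"),
    ("Martin, Tom", 1_900_000, "Pitcher"), ("Mondesi, Raul", 1_000_000, "Outfielder"),
    ("Orr, Pete", 300_000, "Second Baseman"), ("Perez, Eddie", 625_000, "Catcher"),
    ("Ramirez, Horacio", 370_000, "Pitcher"), ("Reitsma, Chris", 1_650_000, "Pitcher"),
    ("Smoltz, John", 9_000_000, "Pitcher"), ("Sosa, Jorge", 650_000, "Pitcher"),
    ("Thomson, John", 4_250_000, "Pitcher"),
]


def kmeans_1d(values, k, iterations=100):
    """Plain 1-D k-means for k >= 2; returns (labels, centroids)."""
    ordered = sorted(values)
    # Deterministic start: evenly spaced picks from the sorted values.
    centroids = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assign every value to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:  # no centroid moved, so the grouping is stable
            break
        centroids = updated
    return labels, centroids


log_salaries = [math.log10(salary) for _, salary, _ in PLAYERS]
labels, centroids = kmeans_1d(log_salaries, k=3)

groups = {}
for player, label in zip(PLAYERS, labels):
    groups.setdefault(label, []).append(player)

for label in sorted(groups):
    members = groups[label]
    positions = Counter(position for _, _, position in members)
    print(f"Group {label + 1}: {len(members)} players, positions {dict(positions)}")
```

Because the grouping uses salary alone, players of different positions land in the same group, which the demo rules allow. The other two rules (more than one group, at least two players per group) can be checked on the resulting `groups` dictionary.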
    14. Demo Two: Government Forecasting • Directions: The President is asking your opinion on how the following numbers will increase over the next few months. Because this project is sensitive, you do not know what these numbers measure. However, based on the available history, make your best projection for the next five periods.
    15. Demo Two: Unemployment Projections [Chart: monthly values, Jan 2006 to Apr 2008; vertical axis from 4 to 5.2]
    16. Demo Two: Unemployment Projections [Chart: monthly values, Jan 2007 to Mar 2009; vertical axis from 0 to 9]
    17. Demo Two: Unemployment Projections • Rapid response is as useful as prediction • Seek intelligent correlations among related metrics • Projections depend on time frame – modeling is continual
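The point that projections depend on the time frame can be illustrated with the two Demonstration Two handouts. The sketch below is only a rough stand-in: it fits an ordinary least-squares straight line, not the Microsoft Time Series algorithm, and the function name `linear_trend_forecast` is an assumption made for this example. It projects five periods from the history ending April 2008, then refits on the longer history ending March 2009.

```python
def linear_trend_forecast(history, periods=5):
    """Fit y = intercept + slope * t by least squares and extend the line."""
    n = len(history)
    t_mean = (n - 1) / 2
    y_mean = sum(history) / n
    covariance = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(history))
    variance = sum((t - t_mean) ** 2 for t in range(n))
    slope = covariance / variance
    intercept = y_mean - slope * t_mean
    return [intercept + slope * (n + step) for step in range(periods)]


# First handout: Jan 2006 to Apr 2008.
HISTORY_TO_APR_2008 = [
    4.7, 4.8, 4.7, 4.7, 4.7, 4.6, 4.7, 4.7, 4.5, 4.4, 4.5, 4.4,  # 2006
    4.6, 4.5, 4.4, 4.5, 4.5, 4.6, 4.7, 4.7, 4.7, 4.8, 4.7, 4.9,  # 2007
    4.9, 4.8, 5.1, 5.0,                                          # 2008
]

# Values the second handout later reports for May to Sep 2008.
ACTUAL_MAY_TO_SEP_2008 = [5.5, 5.6, 5.8, 6.2, 6.2]

# Second handout: Jan 2007 to Mar 2009.
HISTORY_TO_MAR_2009 = [
    4.6, 4.5, 4.4, 4.5, 4.5, 4.6, 4.7, 4.7, 4.7, 4.8, 4.7, 4.9,  # 2007
    4.9, 4.8, 5.1, 5.0, 5.5, 5.6, 5.8, 6.2, 6.2, 6.6, 6.8, 7.2,  # 2008
    7.6, 8.1, 8.5,                                               # 2009
]

early = linear_trend_forecast(HISTORY_TO_APR_2008)
late = linear_trend_forecast(HISTORY_TO_MAR_2009)

print("Projection from Apr 2008:", [round(v, 2) for v in early])
print("Reported May to Sep 2008:", ACTUAL_MAY_TO_SEP_2008)
print("Projection from Mar 2009:", [round(v, 2) for v in late])
```

The straight line fitted to the first handout stays below every value later reported for those five months, and the refit on the longer history projects much higher values. A single fitted model goes stale quickly when the trend changes, which is one reading of "modeling is continual".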
    18. Forecasting Algorithms • Microsoft Time Series Value Slide
    19. Supervised Algorithms • Microsoft Naive Bayes • Microsoft Linear Regression • Microsoft Decision Trees • Microsoft Neural Networks • Microsoft Logistic Regression Value Slide
    20. Unsupervised Algorithms • Microsoft Clustering • Microsoft Sequence Clustering • Microsoft Association Rules • Text Mining Value Slide
    21. Resources • MarkTab.NET: links, video resources and information for data mining • Data Mining with Microsoft SQL Server 2008 by Jamie MacLennan, ZhaoHui Tang and Bogdan Crivat • Smart Business Intelligence Solutions with Microsoft® SQL Server® 2008 (PRO-Developer) by Lynn Langit and Matthew Roche • Solid Quality Mentors
    22. Regroup and Conclusion • Main Points from this Presentation
    23. Contact Information • Mark Tabladillo • mtabladillo <{at}> solidq.com • Also on: LinkedIn, Facebook
    24. Bonus: Sequence Clustering Ideas • Trading players in professional sports • Assigning players to certain positions • Moving from city to city • Store path at the mall • Cancer treatment path • Taking up a musical instrument • Taking up sports
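Each of these ideas involves ordered paths, and sequence clustering combines clustering with Markov-chain style transition analysis. The sketch below covers only the transition-counting ingredient, using the "store path at the mall" idea. The paths are hypothetical and invented for illustration, and the function name `transition_probabilities` is an assumption. It is not the Microsoft Sequence Clustering algorithm.

```python
from collections import Counter, defaultdict

# Hypothetical store-visit paths, invented for illustration only.
PATHS = [
    ["entrance", "bookstore", "cafe", "exit"],
    ["entrance", "cafe", "bookstore", "exit"],
    ["entrance", "bookstore", "cafe", "exit"],
    ["entrance", "shoes", "cafe", "exit"],
]


def transition_probabilities(paths):
    """Estimate first-order transition probabilities from observed paths."""
    counts = defaultdict(Counter)
    for path in paths:
        # Count each consecutive (current stop, next stop) pair.
        for current, following in zip(path, path[1:]):
            counts[current][following] += 1
    return {
        state: {nxt: n / sum(nexts.values()) for nxt, n in nexts.items()}
        for state, nexts in counts.items()
    }


probabilities = transition_probabilities(PATHS)
for state, nexts in probabilities.items():
    print(state, "->", {nxt: round(p, 2) for nxt, p in nexts.items()})
```

Grouping paths with similar transition patterns would be the clustering half of the idea, which this sketch leaves out.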
    25. Data Mining beyond Adventure Works @ Atlanta SQL Saturday • Mark Tabladillo, Ph.D. • April 25, 2009
        Demonstration One: Baseball Management
        Directions: You are on the management team for the Atlanta Braves. To better serve the team, you have been instructed by the owner to group the players by considering both their position and their salary. The following rules apply:
        1) Players of different position may be in the same group
        2) You must make more than one group
        3) Each group must have at least two players
        Team            Name                  Salary  Position        Group
        Atlanta Braves  Bernero, Adam        450,000  Pitcher
        Atlanta Braves  Betemit, Wilson      316,000  Shortstop
        Atlanta Braves  Colon, Roman         318,500  Pitcher
        Atlanta Braves  Estrada, Johnny      460,000  Catcher
        Atlanta Braves  Franco, Julio      1,000,000  First Baseman
        Atlanta Braves  Furcal, Rafael     5,600,000  Shortstop
        Atlanta Braves  Giles, Marcus      2,350,000  Second Baseman
        Atlanta Braves  Gryboski, Kevin      877,500  Pitcher
        Atlanta Braves  Hampton, Mike     15,125,000  Pitcher
        Atlanta Braves  Hudson, Tim        6,500,000  Pitcher
        Atlanta Braves  Jones, Andruw     13,000,000  Outfielder
        Atlanta Braves  Jones, Chipper    16,061,802  Outfielder
        Atlanta Braves  Jordan, Brian        600,000  Outfielder
        Atlanta Braves  Kolb, Dan          3,400,000  Pitcher
        Atlanta Braves  Langerhans, Ryan     316,000  Outfielder
        Atlanta Braves  LaRoche, Adam        337,500  First Baseman
        Atlanta Braves  Martin, Tom        1,900,000  Pitcher
        Atlanta Braves  Mondesi, Raul      1,000,000  Outfielder
        Atlanta Braves  Orr, Pete            300,000  Second Baseman
        Atlanta Braves  Perez, Eddie         625,000  Catcher
        Atlanta Braves  Ramirez, Horacio     370,000  Pitcher
        Atlanta Braves  Reitsma, Chris     1,650,000  Pitcher
        Atlanta Braves  Smoltz, John       9,000,000  Pitcher
        Atlanta Braves  Sosa, Jorge          650,000  Pitcher
        Atlanta Braves  Thomson, John      4,250,000  Pitcher
    26. Demonstration Two: Government Statistics
        Directions: The President is asking your opinion on how the following numbers will increase over the next few months. Because this project is sensitive, you do not know what these numbers measure. However, based on the available history, make your best projection for the next five periods.
        Year  Period  Value
        2006  Jan     4.7
        2006  Feb     4.8
        2006  Mar     4.7
        2006  Apr     4.7
        2006  May     4.7
        2006  Jun     4.6
        2006  Jul     4.7
        2006  Aug     4.7
        2006  Sep     4.5
        2006  Oct     4.4
        2006  Nov     4.5
        2006  Dec     4.4
        2007  Jan     4.6
        2007  Feb     4.5
        2007  Mar     4.4
        2007  Apr     4.5
        2007  May     4.5
        2007  Jun     4.6
        2007  Jul     4.7
        2007  Aug     4.7
        2007  Sep     4.7
        2007  Oct     4.8
        2007  Nov     4.7
        2007  Dec     4.9
        2008  Jan     4.9
        2008  Feb     4.8
        2008  Mar     5.1
        2008  Apr     5.0
        2008  May
        2008  Jun
        2008  Jul
        2008  Aug
        2008  Sep
    27. Demonstration Two: Government Statistics
        Directions: The President is asking your opinion on how the following numbers will increase over the next few months. Because this project is sensitive, you do not know what these numbers measure. However, based on the available history, make your best projection for the next five periods.
        Year  Period  Value
        2007  Jan     4.6
        2007  Feb     4.5
        2007  Mar     4.4
        2007  Apr     4.5
        2007  May     4.5
        2007  Jun     4.6
        2007  Jul     4.7
        2007  Aug     4.7
        2007  Sep     4.7
        2007  Oct     4.8
        2007  Nov     4.7
        2007  Dec     4.9
        2008  Jan     4.9
        2008  Feb     4.8
        2008  Mar     5.1
        2008  Apr     5.0
        2008  May     5.5
        2008  Jun     5.6
        2008  Jul     5.8
        2008  Aug     6.2
        2008  Sep     6.2
        2008  Oct     6.6
        2008  Nov     6.8
        2008  Dec     7.2
        2009  Jan     7.6
        2009  Feb     8.1
        2009  Mar     8.5
        2009  Apr
        2009  May
        2009  Jun
        2009  Jul
        2009  Aug