
03 The Art of Data Mining

The Art of Data Mining
Researched & Presented by Harvey Nash Vietnam

Published in: Data & Analytics

  1. 1. HARVEY NASH. THE ART OF DATA MINING. Vinh Phan
  2. 2. [Screenshot: a "Customers Who Bought This Item Also Bought" panel listing Predictive Analytics: The Power to Predict... (Eric Siegel), Big Data: A Revolution That Will Transform... (Viktor Mayer-Schönberger), The Signal and the Noise in 30 Minutes (The 30 Minute Expert Series), and Thinking, Fast and Slow (Daniel Kahneman). Image source: www.iphoneincanada.ca]
  3. 3. [Image-only slide; no text recoverable]
  4. 4. Agenda
        1. Why is KDD necessary?
        2. Overview of KDD
        3. Data mining concepts
        4. Data mining techniques
        5. Data mining algorithms in SQL Server
  5. 5. The necessity of KDD
        • The amount of raw data stored in corporate databases is exploding (logos: Walmart, AT&T and a third, illegible one; figures: 20M transactions/day, 300M calls/day, 100 TB of data)
        • Raw data by itself does not provide much information
        • Competitive pressure is strong
        Image sources: cnsincere.en.alibaba.com, www.childnet.com, www.ebay.com
  6. 6. Overview of KDD (knowledge discovery in databases)
        • The overall process of discovering useful knowledge from data
        • Automatically extracting non-obvious, hidden knowledge from large volumes of data
  7. 7. KDD Process
        [Diagram, read bottom to top: Data sources -> Cleaning & Integration -> Data warehouse -> Selection & Transformation -> Task-relevant data -> Data mining -> Evaluation & Presentation]
        Source image: Internet
  8. 8. Data mining concepts
        • Answering questions that can't be resolved with traditional methods
        • Extracting interesting patterns from huge volumes of data
        • Finding hidden information that experts may miss
        • Predicting behaviour and future trends
        Image source: www.datatudetechnologies.com
  9. 9. Applications of Data mining
        • Customer service
        [Other application labels on this slide are illegible in the transcript]
        Image sources: www.dokasoft.com, yusuftravel.blogspot.com, exquisiteeliquids.com, cherryberrybear.wordpress.com
  10. 10. Data mining techniques
        • Classification
        • Sequential Pattern
        • Clustering
        • Association Rule
        • Regression
  11. 11. MS Data mining algorithms
        • Microsoft Decision Trees
        • Microsoft Linear Regression
        • Microsoft Time Series
        • Microsoft Clustering
        • Microsoft Naive Bayes
        • Microsoft Association Rules
        Image source: www.daaxgroup.ca
  12. 12. Association rule algorithm
        [Image-only slide. Image sources: bstigmafree.org, www.srgerrott.com, and a grocery coupon page (baby coupons)]
  13. 13. Association rule algorithm
        • An important data mining model, studied extensively by the database and data mining community
        • Finds frequent associations and correlations among sets of items in transaction databases
        • Widely applied to basket data analysis and cross-marketing
  14. 14. Association rule algorithm
        Transactions (ID: items):
          1: Bread, Coke, Milk
          2: Beer, Coke, Diaper
          3: Bread, Beer, Coke, Diaper
          4: Beer, Diaper
        Frequent itemsets (support column illegible in the transcript): {Bread, Coke}, {Beer, Coke}, {Beer, Diaper}, {Beer, Coke, Diaper}
        Rules: Bread -> Coke; Coke -> Bread; Beer, Coke -> Diaper; Beer, Diaper -> Coke
        • MinSup = 2
        • Support(X = {Beer, Coke}) = 2
        • MinConf = 0.6
        • Confidence(Beer -> Coke) = 2/3
  15. 15. Decision trees algorithm
        • The decision tree is probably the most popular data mining ... [text cut off in the transcript]
        • The most common data mining task for a decision tree ... [text cut off in the transcript]
        • Fast training, high accuracy, easily understandable patterns
        • Uses a recursive technique to split data into subsets
  16. 16. Decision trees algorithm
        Sample data (five columns; the headers are illegible in the transcript, the last column is the class to predict):
          <=30      High     No    Fair        No
          <=30      High     No    Excellent   No
          31...40   High     No    Fair        Yes
          >40       Medium   No    Fair        Yes
          >40       Low      Yes   Fair        Yes
          >40       Low      Yes   Excellent   No
          31...40   Low      Yes   Excellent   Yes
          <=30      Medium   No    Fair        No
          <=30      Low      Yes   Fair        Yes
          >40       Medium   Yes   Fair        Yes
          <=30      Medium   Yes   Excellent   Yes
          31...40   Medium   No    Excellent   Yes
          31...40   High     Yes   Fair        Yes
          >40       Medium   No    Excellent   No
        Rows to classify:
          >40       Medium   No    Fair        ?
          <=30      Low      No    Fair        ?
        [Decision tree diagram: branches labelled Yes/No and Excellent/Fair, leaves labelled Yes/No]
  17. 17. Data mining challenges
        • Poor-quality data
        • Data variety
        • Data security
        • Dealing with huge datasets
        • Lack of understanding of data mining techniques
        Image source: https://community.qlik.com/blogs/theqlikviewblog/2012/10/10/overcome-challenges-with-the-qlikview-governance-dashboard
  18. 18. Demo
        Attributes shown: Area Segment, Gender, Housing Segment/Status, Is Employee, Marital Status, Age Segment
        Image source: http://www.clker.com/clipart-teamstijl-person-icon-blue.html
  19. 19. Question & Answer
        Image source: www.fourerr.com
  20. 20. THANK YOU
  21. 21. References to Study More
        • Data Mining Tutorials in MSDN: http://technet.microsoft.com/en-us/library/bb677206.aspx
        • Data Mining Algorithms in MSDN: http://technet.microsoft.com/en-us/library/ms175595.aspx
        • Data Mining with SQL Server 2008 Book: http://www.amazon.com/Data-Mining-Microsoft-Server-2008/dp/0470277742
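
The support and confidence figures on slide 14 can be checked with a short sketch. It assumes the four baskets as they can be read from the transcript of that slide (the extraction is garbled, so the exact baskets are an assumption), and it enumerates itemsets by brute force rather than with Apriori-style pruning or the Microsoft Association Rules implementation. All function names are illustrative.

```python
from itertools import combinations

# Baskets as read from slide 14; the transcript is garbled, so treat the
# exact contents as an assumption.
TRANSACTIONS = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Coke", "Diaper"},
    {"Bread", "Beer", "Coke", "Diaper"},
    {"Beer", "Diaper"},
]
MIN_SUP = 2     # minimum support as an absolute count (slide 14)
MIN_CONF = 0.6  # minimum confidence (slide 14)


def support(itemset, transactions=TRANSACTIONS):
    """Count the transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for basket in transactions if items <= basket)


def confidence(antecedent, consequent, transactions=TRANSACTIONS):
    """support(antecedent and consequent together) / support(antecedent)."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0  # antecedent never occurs, so the rule has no evidence
    both = support(set(antecedent) | set(consequent), transactions)
    return both / base


def frequent_itemsets(transactions=TRANSACTIONS, min_sup=MIN_SUP):
    """Brute-force enumeration; acceptable only for a toy data set."""
    items = sorted(set().union(*transactions))
    found = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            count = support(candidate, transactions)
            if count >= min_sup:
                found[frozenset(candidate)] = count
    return found


def rules(transactions=TRANSACTIONS, min_sup=MIN_SUP, min_conf=MIN_CONF):
    """Rules A -> B where A and B are non-empty, disjoint, and A+B is frequent."""
    found = []
    for itemset in frequent_itemsets(transactions, min_sup):
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                consequent = itemset - set(antecedent)
                conf = confidence(antecedent, consequent, transactions)
                if conf >= min_conf:
                    found.append((frozenset(antecedent), frozenset(consequent), conf))
    return found


if __name__ == "__main__":
    print(support({"Beer", "Coke"}))       # 2, as on the slide
    print(confidence({"Beer"}, {"Coke"}))  # 2/3, as on the slide
    for antecedent, consequent, conf in rules():
        print(sorted(antecedent), "->", sorted(consequent), round(conf, 2))
```

Brute force keeps the example short; on real basket data the number of candidate itemsets grows exponentially, which is why practical algorithms prune candidates whose subsets are already infrequent.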

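The recursive splitting mentioned on slide 15 can be illustrated on the 14 labelled rows of slide 16. This is a minimal sketch that assumes an ID3-style criterion (choose the column with the highest information gain); the slides do not say which criterion is used, and this is not the Microsoft Decision Trees algorithm. Because the column headers are illegible in the transcript, columns are addressed by position, and all function names are illustrative.

```python
from collections import Counter
from math import log2

# The 14 labelled rows of slide 16. Columns are addressed by position because
# the slide's headers are illegible; the last value in each row is the class.
ROWS = [
    ("<=30", "High", "No", "Fair", "No"),
    ("<=30", "High", "No", "Excellent", "No"),
    ("31...40", "High", "No", "Fair", "Yes"),
    (">40", "Medium", "No", "Fair", "Yes"),
    (">40", "Low", "Yes", "Fair", "Yes"),
    (">40", "Low", "Yes", "Excellent", "No"),
    ("31...40", "Low", "Yes", "Excellent", "Yes"),
    ("<=30", "Medium", "No", "Fair", "No"),
    ("<=30", "Low", "Yes", "Fair", "Yes"),
    (">40", "Medium", "Yes", "Fair", "Yes"),
    ("<=30", "Medium", "Yes", "Excellent", "Yes"),
    ("31...40", "Medium", "No", "Excellent", "Yes"),
    ("31...40", "High", "Yes", "Fair", "Yes"),
    (">40", "Medium", "No", "Excellent", "No"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return sum(-(n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, column):
    """Entropy of the class minus the weighted entropy after splitting on `column`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[column], []).append(row[-1])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[-1] for row in rows]) - remainder


def build_tree(rows, columns):
    """Recursively split on the best column; a leaf is a plain class label."""
    labels = [row[-1] for row in rows]
    if len(set(labels)) == 1 or not columns:
        return Counter(labels).most_common(1)[0][0]
    best = max(columns, key=lambda c: information_gain(rows, c))
    remaining = [c for c in columns if c != best]
    branches = {}
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        branches[value] = build_tree(subset, remaining)
    return (best, branches)


def classify(tree, row):
    """Follow the branches for `row`; raises KeyError for an unseen value."""
    while isinstance(tree, tuple):
        column, branches = tree
        tree = branches[row[column]]
    return tree


if __name__ == "__main__":
    tree = build_tree(ROWS, [0, 1, 2, 3])
    print(tree)
    # The two rows marked "?" on slide 16:
    print(classify(tree, (">40", "Medium", "No", "Fair")))
    print(classify(tree, ("<=30", "Low", "No", "Fair")))
```

Under this criterion the first column gives the largest gain and becomes the root, and each branch is then split again on whichever remaining column separates its rows best, which is the recursion the slide describes.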