Content
• Introduction
  –   Data Mining
  –   Association Rules
  –   Fuzzy logic
  –   Applications
• Procedure
  – Support and confidence
  – Steps
• Example: Risk analysis
  – Fundamentals: Conditional probability
  – Example analysis
Data Mining
      • KDD (Knowledge Discovery in Databases)
      • Extraction of
        knowledge from huge
        amounts of data
      • This knowledge is
        – implicit,
        – previously unknown
        – and potentially useful
Association rules
• Item sets (Z:C)

• Antecedent (X:A)

• Consequent (Y:B)

• Ex: If X is A, then Y is B
Application
• Strategic Decision
  Making
• Marketing Strategy
  formulation
• Predictive analytics:
   – CRM
   – Machine Maintenance
   – Employee Relations
• Artificial Intelligence:
  Video games, Robots
• Machines: Air conditioners,
  Washing machines, ABS
Procedure
• Two factors: Support and Confidence

    Trans_ID   Bread   Butter   Biscuit   Milk
       1         1       1        0        1
       2         0       1        1        0
       3         1       0        1        0
       4         1       1        0        1
       5         1       1        1        0
       6         1       1        1        1
Procedure
• Hypothesis : Customers who buy bread
  and butter, also buy milk.
      Trans_ID   Bread   Butter   Biscuit   Milk
         1         1       1        0        1
         2         0       1        1        0
         3         1       0        1        0
         4         1       1        0        1
         5         1       1        1        0
         6         1       1        1        1

• Support = Desired Outcome/ Total Opportunities
• Support = 3/6 = 0.5
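The support figure can be checked with a minimal Python sketch. The transactions are copied from the table on this slide; the function name `support` and the dictionary layout are illustrative assumptions, not part of the original deck.

```python
# Transactions from the slide's table: 1 = item bought, 0 = not bought.
transactions = [
    {"Bread": 1, "Butter": 1, "Biscuit": 0, "Milk": 1},
    {"Bread": 0, "Butter": 1, "Biscuit": 1, "Milk": 0},
    {"Bread": 1, "Butter": 0, "Biscuit": 1, "Milk": 0},
    {"Bread": 1, "Butter": 1, "Biscuit": 0, "Milk": 1},
    {"Bread": 1, "Butter": 1, "Biscuit": 1, "Milk": 0},
    {"Bread": 1, "Butter": 1, "Biscuit": 1, "Milk": 1},
]


def support(itemset, rows):
    """Fraction of all rows that contain every item in `itemset`."""
    hits = sum(1 for row in rows if all(row[item] == 1 for item in itemset))
    return hits / len(rows)


# Rows 1, 4 and 6 contain bread, butter and milk together: 3 of 6.
rule_support = support(["Bread", "Butter", "Milk"], transactions)
print(rule_support)  # 0.5
```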
Procedure
• Customers who buy bread and butter, also
  buy milk.
      Trans_ID   Bread   Butter   Biscuit   Milk
         1         1       1        0        1
         2         0       1        1        0
         3         1       0        1        0
         4         1       1        0        1
         5         1       1        1        0
         6         1       1        1        1

• Confidence = Desired Outcome / Desired Opportunities
• Confidence = 3/4 = 0.75
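A short sketch of the same calculation, assuming confidence is the number of rows matching the whole rule divided by the number of rows matching only the antecedent. The helper names `count` and `confidence` and the `COLUMNS` mapping are hypothetical; the rows are the slide's table.

```python
# Columns: Bread, Butter, Biscuit, Milk (rows copied from the slide).
rows = [
    (1, 1, 0, 1),
    (0, 1, 1, 0),
    (1, 0, 1, 0),
    (1, 1, 0, 1),
    (1, 1, 1, 0),
    (1, 1, 1, 1),
]
COLUMNS = {"Bread": 0, "Butter": 1, "Biscuit": 2, "Milk": 3}


def count(items, table):
    """Number of rows in which every listed item was bought."""
    return sum(1 for row in table if all(row[COLUMNS[i]] == 1 for i in items))


def confidence(antecedent, consequent, table):
    """Rows matching the full rule divided by rows matching the antecedent."""
    return count(antecedent + consequent, table) / count(antecedent, table)


# Bread and butter appear in 4 rows; 3 of those also contain milk.
rule_confidence = confidence(["Bread", "Butter"], ["Milk"], rows)
print(rule_confidence)  # 0.75
```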
Inference
• Hypothesis becomes Rule :
     Customers who buy bread and butter,
  also buy milk.
• With 75% confidence and 50% support
  from past transaction records
Procedure
• Fuzzification
  – Continuous to discrete data
• Analysis
  – Threshold
• Defuzzification
  – State rule with confidence and support
Bank customer Data set
Case      Age    Income    Risk     Credit   Result
 1         20    52,623   –38,954    red       0
 2         26    23,047   –23,636   green      1
 3         46    56,810   45,669    green      1
 4         31    38,388   –7,968    amber      1
 5         28    80,019   –35,125   green      1
 6         21    74,561   –47,592   green      1
 7         46    65,341   58,119    green      1
 8         25    46,504   –30,022   green      1
 9         38    65,735   30,571    green      1
 10        27    26,047     –6       red       1

        RISK = ASSETS – DEBT – WANTS
Bank’s weight for each attribute and condition for analysis
Attribute   Weight
 Credit     0.800
  Risk      0.700
Income      0.550
  Age       0.450
 Result     0.691

Function             Percentage
Minimum Support          25
Minimum Confidence       90

Objective: Provide a confidence/risk factor for the bank to use
when issuing loans to customers
Membership Function
Fuzzification
Attribute   Level     Representation   Weight   Membership value   Support (Rjk)
  Age       Young          R11         0.450         0.580             0.261
  Age       Middle         R12         0.450         0.300             0.135
  Age        Old           R13         0.450         0.131             0.059
Income       High          R21         0.550         0.000             0.000
Income      Middle         R22         0.550         0.890             0.490
Income       Low           R23         0.550         0.109             0.060
  Risk       High          R31         0.700         0.457             0.320
  Risk      Middle         R32         0.700         0.208             0.146
  Risk       Low           R33         0.700         0.332             0.233
 Credit     Good           R41         0.800         0.720             0.576
 Credit      Bad           R42         0.800         0.280             0.224
 Result     On Time        R51         0.691         0.930             0.643
 Result     Default        R52         0.691         0.069             0.048
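The Support column appears to be the attribute weight multiplied by the membership value. The slide does not state this formula, so treat it as an assumption inferred from the numbers. The sketch below recomputes each support under that assumption and reports the largest gap from the listed value; the variable names are illustrative.

```python
# Attribute weights from the bank's weight table.
weights = {"Age": 0.450, "Income": 0.550, "Risk": 0.700,
           "Credit": 0.800, "Result": 0.691}

# code: (attribute, membership value, support listed on the slide)
levels = {
    "R11": ("Age", 0.580, 0.261), "R12": ("Age", 0.300, 0.135),
    "R13": ("Age", 0.131, 0.059), "R21": ("Income", 0.000, 0.000),
    "R22": ("Income", 0.890, 0.490), "R23": ("Income", 0.109, 0.060),
    "R31": ("Risk", 0.457, 0.320), "R32": ("Risk", 0.208, 0.146),
    "R33": ("Risk", 0.332, 0.233), "R41": ("Credit", 0.720, 0.576),
    "R42": ("Credit", 0.280, 0.224), "R51": ("Result", 0.930, 0.643),
    "R52": ("Result", 0.069, 0.048),
}

# Assumed formula (not stated in the deck): support = weight * membership.
computed = {code: weights[attr] * mu for code, (attr, mu, _) in levels.items()}

for code, (attr, mu, listed) in levels.items():
    print(f"{code}: computed {computed[code]:.4f}, listed {listed:.3f}")

# Largest absolute difference between computed and listed supports.
max_gap = max(abs(computed[c] - levels[c][2]) for c in levels)
print(f"largest gap: {max_gap:.4f}")
```

Every row agrees with the slide to within 0.001. The largest difference is at R33, which may simply reflect rounding of the membership value shown on the slide.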
Item set
• C = complete sets, individual items
• L = Set of items above minimum support,
     grouped items
• minsupp = 0.25
• Conditional probability = support
Apriori Algorithm
• START
• Compute conditional probability of each element in SET C
• Eliminate items < minsupp to form SET L
• Is L = 0?
  – YES: STOP
  – NO: nCr to form new SET C, then repeat
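The loop in this flowchart can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it runs on the bread/butter transactions from the earlier Procedure slides, uses a hypothetical `minsupp` of 0.5, and builds each new SET C as all r-combinations of the surviving items (the "nCr" step), which is simpler than the join-and-prune step of the textbook Apriori algorithm.

```python
from itertools import combinations

# Market-basket transactions from the Procedure slides, as sets of items.
transactions = [
    {"Bread", "Butter", "Milk"},
    {"Butter", "Biscuit"},
    {"Bread", "Biscuit"},
    {"Bread", "Butter", "Milk"},
    {"Bread", "Butter", "Biscuit"},
    {"Bread", "Butter", "Biscuit", "Milk"},
]


def support(itemset, rows):
    """Fraction of rows that contain every item in `itemset`."""
    return sum(1 for row in rows if itemset <= row) / len(rows)


def apriori(rows, minsupp):
    """Return {itemset: support} for every itemset that reaches minsupp."""
    items = sorted(set().union(*rows))
    candidates = [frozenset([item]) for item in items]  # SET C, size 1
    frequent = {}
    size = 1
    while candidates:
        # Compute support of each candidate, drop those below minsupp (SET L).
        level = {c: support(c, rows) for c in candidates}
        level = {c: s for c, s in level.items() if s >= minsupp}
        if not level:  # "Is L = 0?" -> YES: stop
            break
        frequent.update(level)
        # NO: combine the surviving items (nCr) into the next, larger SET C.
        size += 1
        survivors = sorted(set().union(*level))
        candidates = [frozenset(c) for c in combinations(survivors, size)]
    return frequent


frequent = apriori(transactions, minsupp=0.5)  # minsupp here is an assumption
for itemset, s in frequent.items():
    print(sorted(itemset), round(s, 3))
```

With these inputs the only three-item set that survives is bread, butter and milk, at support 0.5, matching the hand calculation on the earlier slide.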
C1 -> L1 -> C2
C1    Support   L1
R11   0.261     R11
R12   0.135
R13   0.059
R21   0.000
R22   0.490     R22
R23   0.060
R31   0.320     R31
R32   0.146
R33   0.233
R41   0.576     R41
R42   0.224
R51   0.643     R51
R52   0.048

C2
(R11 , R22)
(R11 , R31)
(R11 , R41)
(R11 , R51)
(R22 , R31)
(R22 , R41)
(R22 , R51)
(R31 , R41)
(R31 , R51)
(R41 , R51)
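This step can be reproduced with a few lines of Python: keep the C1 items whose support reaches minsupp = 0.25, then pair them up. The supports are the slide's values; the variable names `c1`, `l1` and `c2` are illustrative.

```python
from itertools import combinations

# C1 supports as listed on the slide.
c1 = {
    "R11": 0.261, "R12": 0.135, "R13": 0.059,
    "R21": 0.000, "R22": 0.490, "R23": 0.060,
    "R31": 0.320, "R32": 0.146, "R33": 0.233,
    "R41": 0.576, "R42": 0.224,
    "R51": 0.643, "R52": 0.048,
}
minsupp = 0.25

# L1: items whose support is not below minsupp.
l1 = [item for item, s in c1.items() if s >= minsupp]

# C2: every pair of L1 items (5 choose 2 = 10 pairs).
c2 = list(combinations(l1, 2))

print(l1)
print(c2)
```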
C2 -> L2 -> C3
    C2        Support       L2
(R11 , R22)   0.235
(R11 , R31)   0.207
(R11 , R41)   0.212
(R11 , R51)   0.230
(R22 , R31)   0.237
(R22 , R41)   0.419     (R22 , R41)
(R22 , R51)   0.449     (R22 , R51)
(R31 , R41)   0.266     (R31 , R41)
(R31 , R51)   0.264     (R31 , R51)
(R41 , R51)   0.560     (R41 , R51)

C3
(R22, R41, R51)
(R22, R31, R41)
(R22, R51, R31)
(R31, R41, R51)
C3 -> L3 -> C4


      C3          Support         L3                  C4
(R22, R41, R51)   0.417     (R22, R41, R51)   (R22, R31, R41, R51)
(R22, R31, R41)   0.198
(R22, R51, R31)   0.196
(R31, R41, R51)   0.264     (R31, R41, R51)
C4 -> L4

        C4              Support      L4
(R22, R31, R41, R51)    0.1957      STOP
Possible Associations

Items from L2   Associations
 (R22, R41)     R22->R41
                R41->R22
 (R22, R51)     R22->R51
                R51->R22
 (R31, R41)     R31->R41
                R41->R31
 (R31, R51)     R31->R51
                R51->R31
 (R41, R51)     R41->R51
                R51->R41

Items from L3       Associations
 (R22, R41, R51)    R22, R41->R51
                    R22, R51->R41
                    R51, R41->R22
 (R31, R41, R51)    R31, R41->R51
                    R31, R51->R41
                    R51, R41->R31
Confidence of each association

Items from L2   Associations     Confidence
 (R22, R41)     R22->R41           0.855
                R41->R22           0.727
 (R22, R51)     R22->R51           0.916
                R51->R22           0.697
 (R31, R41)     R31->R41           0.831
                R41->R31           0.462
 (R31, R51)     R31->R51           0.825
                R51->R31           0.410
 (R41, R51)     R41->R51           0.972
                R51->R41           0.870

Items from L3       Associations     Confidence
 (R22, R41, R51)    R22, R41->R51      0.995
                    R22, R51->R41      0.928
                    R51, R41->R22      0.744
 (R31, R41, R51)    R31, R41->R51      0.993
                    R31, R51->R41      1.000
                    R51, R41->R31      0.472
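The confidence values are consistent with dividing the support of the whole item set by the support of the antecedent. The sketch below assumes that formula and uses the supports from the C1, C2 and C3 slides; the function name `confidence` and the `frozenset` keys are illustrative choices.

```python
# Supports of the item sets that passed minsupp (from the C1, C2, C3 slides).
support = {
    frozenset(["R22"]): 0.490,
    frozenset(["R31"]): 0.320,
    frozenset(["R41"]): 0.576,
    frozenset(["R51"]): 0.643,
    frozenset(["R22", "R41"]): 0.419,
    frozenset(["R22", "R51"]): 0.449,
    frozenset(["R31", "R41"]): 0.266,
    frozenset(["R31", "R51"]): 0.264,
    frozenset(["R41", "R51"]): 0.560,
    frozenset(["R22", "R41", "R51"]): 0.417,
    frozenset(["R31", "R41", "R51"]): 0.264,
}


def confidence(antecedent, consequent):
    """support(antecedent and consequent together) / support(antecedent)."""
    antecedent = frozenset(antecedent)
    return support[antecedent | frozenset(consequent)] / support[antecedent]


for lhs, rhs in [(["R22"], ["R41"]),
                 (["R41"], ["R51"]),
                 (["R22", "R41"], ["R51"])]:
    print(", ".join(lhs), "->", rhs[0], round(confidence(lhs, rhs), 3))
```

A few rules computed this way differ from the slide in the third decimal place (for example R51->R41), which is what one would expect if the slide's supports were rounded before display.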
Associations meeting minconf

• minconf = 0.90

Items from L2   Associations     Confidence
 (R22, R51)     R22->R51           0.916
 (R41, R51)     R41->R51           0.972

Items from L3       Associations     Confidence
 (R22, R41, R51)    R22, R41->R51      0.995
                    R22, R51->R41      0.928
 (R31, R41, R51)    R31, R41->R51      0.993
                    R31, R51->R41      1.000
Confident Associations that meet
      the objective of the analysis

Items from L2   Associations     Confidence
 (R22, R51)     R22->R51           0.916
 (R41, R51)     R41->R51           0.972

Items from L3       Associations     Confidence
 (R22, R41, R51)    R22, R41->R51      0.995
 (R31, R41, R51)    R31, R41->R51      0.993
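The two filtering steps can be sketched as below, assuming the minimum confidence of 90% from the bank's conditions and taking "the objective of the analysis" to mean rules whose consequent is R51 (Result: On Time), as the defuzzified rules suggest. The tuple layout and variable names are illustrative.

```python
# (antecedent, consequent, confidence) for every candidate association.
rules = [
    (("R22",), "R41", 0.855), (("R41",), "R22", 0.727),
    (("R22",), "R51", 0.916), (("R51",), "R22", 0.697),
    (("R31",), "R41", 0.831), (("R41",), "R31", 0.462),
    (("R31",), "R51", 0.825), (("R51",), "R31", 0.410),
    (("R41",), "R51", 0.972), (("R51",), "R41", 0.870),
    (("R22", "R41"), "R51", 0.995), (("R22", "R51"), "R41", 0.928),
    (("R51", "R41"), "R22", 0.744), (("R31", "R41"), "R51", 0.993),
    (("R31", "R51"), "R41", 1.000), (("R51", "R41"), "R31", 0.472),
]
minconf = 0.90

# Step 1: keep associations whose confidence reaches minconf.
confident = [rule for rule in rules if rule[2] >= minconf]

# Step 2 (assumed objective): keep rules that predict on-time payment, R51.
on_time = [rule for rule in confident if rule[1] == "R51"]

for antecedent, consequent, conf in on_time:
    print(", ".join(antecedent), "->", consequent, conf)
```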
Defuzzification
• If Income is middle, then payment will be
  received on time
       R22->R51;(91.6%)
• If Credit is good, then payment will be
  received on time
       R41->R51;(97.2%)
• If Income is middle and Credit is good, then
  payment will be received on time
       R22, R41->R51; (99.5%)
• If Risk is high and Credit is good, then
  payment will be received on time
       R31, R41->R51; (99.25%)
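Turning rule codes back into sentences can be sketched as a simple lookup. The `labels` wording is taken from the rules above; the helper name `to_sentence` and the one-decimal percentage format are assumptions made for illustration.

```python
# Plain-language meaning of each code used in the final rules.
labels = {
    "R22": "Income is middle",
    "R31": "Risk is high",
    "R41": "Credit is good",
    "R51": "payment will be received on time",
}


def to_sentence(antecedent, consequent, conf):
    """Render a rule such as R22, R41 -> R51 as an if/then sentence."""
    condition = " and ".join(labels[code] for code in antecedent)
    return f"If {condition}, then {labels[consequent]} ({conf:.1%})"


print(to_sentence(("R22",), "R51", 0.916))
print(to_sentence(("R22", "R41"), "R51", 0.995))
```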
Conclusion
References
Data Mining Techniques for Risk Analysis
