  2nd Workshop on Domain Driven Data Mining, Session I : S2208
                                                           Dec. 15, 2008
                                           Palazzo dei Congressi, Pisa, Italy



     Identification of Causal Variables
    for Building Energy Fault Detection
          by Semi-supervised LDA
                                &
          Decision Boundary Analysis


Keigo Yoshida, Minoru Inui, Takehisa Yairi, Kazuo Machida
     (Dept. of Aeronautics & Astronautics, the Univ. of Tokyo)

          Masaki Shioya, and Yoshio Masukawa
                          (Kajima Corp.)
                                                                    1
2



 Main Point of the Presentation
We propose …

A Supportive Method for Anomaly Cause Identification
                         by
        Combining Traditional Data Analysis
             and Domain Knowledge

Applied to Real Building Energy Management System (BEMS)
   Root cause of energy wastes was found successfully
3



Outline

   Introduction
   Theories
   Experiments for Real Data
   Conclusions
4



    Introduction: What is BEMS ?
   Building Energy Management Systems
        Collect/Monitor Sensor Data in BLDG
                            (temperature, heat consumption etc…)
        Energy-efficient Control
        Discover Energy Faults (wastes)




                I/F

        BEMS
5



    Introduction: Problem of BEMS
   Hard to identify root causes of Energy Faults (EF)
       Complex Relations between Equipment
       Data Deluge from Numerous Sensors
              (approx. 2000 sensors for 20-story)

   Current EF Detection:
Heuristics Based on Expert’s Empirical Knowledge,
           usually fuzzy “IF-THEN” rules.
          “Heuristic Diagnostics is Incomplete”
       Fuzziness → False Negative Error
       Detection-Only → Cannot Improve Systems
6



     Early Fault Diagnosis Methods
   [Figure: methods plotted by Performance along an axis from Knowledge-Based / Modeling-Based to Data-Driven]
   Knowledge-Based / Modeling-Based: FTA/FMEA, Bayesian Filtering, FDA…
   Data-Driven: Feature Extraction, Neural Networks…
   Methods plotted in the diagram: Expert System, Fuzzy Logic, Supervised Learning,
                                   Unsupervised Learning / Data Mining

                     Knowledge-Based                     Data-Driven
   Source            Experts                             Data
   Interpretation    Easy                                Hard
   Modeling Cost     Expensive                           Low
   Versatility       Poor                                High

   Knowledge-Based: Knowledge Acquisition Bottleneck
   Data-Driven: Neglecting Useful Knowledge
7



  Proposed Method
   Proposal: Domain Knowledge + Data Analysis
   [Figure: the proposal placed between Knowledge-Based / Modeling-Based and Data-Driven methods on the Performance diagram]

                        - Characteristics -
   Interpretation:   exploit domain knowledge
   Cost:             not so high, empirical knowledge only
   Versatility:      easy to apply to various domains & problems
   Performance:      better than heuristics
8



  Conceptual Diagram
   [Figure: processing loop]
   Experts' Detection Rule (e.g. …)
     → Acquire Reliable Labels with Given Rule
     → Semi-supervised LDA: Learning Boundary from Data Distribution
     → DBA: Variable Identification (Contribution to EF per Variable #)
     → Feedback

        * Assumption *
   Incomplete heuristics surely represent abnormal phenomena
9



Outline

   Introduction
   Theories
       Semi-Supervised Linear Discriminant Analysis
       Decision Boundary Analysis
   Experiments for Real Data
   Conclusions
10



Semi-supervised LDA
                      Learning Boundary




                      Data Distribution

                 Acquire Reliable Labels
                    with Given Rule
11






 Manifold Regularization [M. Belkin et al. 05]
                                                     Labeled data only
    Regularized Least Square



                Squared loss        Penalty Term
               for labeled data   (usually squared
                                   function norm)
    Laplacian RLS:


            Squared loss   Penalty Term           Additional term
                                               for intrinsic geometry
Use labeled & unlabeled data
Assumption:
Geometrically close
⇒ similar label                                      : graph Laplacian
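
For reference, the two objectives named on this slide have the following standard form in Belkin et al. The notation is taken from that paper rather than from the slide, so treat the symbols as assumptions: l labeled and u unlabeled samples, RKHS norm ‖f‖_K, weights γ_A and γ_I, graph Laplacian L, and f = [f(x_1), …, f(x_{l+u})]ᵀ.

```latex
% Regularized Least Squares: labeled data only
f^{*} = \arg\min_{f \in \mathcal{H}_K}
        \frac{1}{l} \sum_{i=1}^{l} \bigl(y_i - f(x_i)\bigr)^2
        + \gamma_A \lVert f \rVert_K^2

% Laplacian RLS: labeled and unlabeled data
f^{*} = \arg\min_{f \in \mathcal{H}_K}
        \frac{1}{l} \sum_{i=1}^{l} \bigl(y_i - f(x_i)\bigr)^2
        + \gamma_A \lVert f \rVert_K^2
        + \frac{\gamma_I}{(u+l)^2}\, \mathbf{f}^{\top} L\, \mathbf{f}
```

The last term is small when geometrically close points receive similar values of f, which is the assumption stated above.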
13



Semi-Supervised Linear Discriminant Analysis (SS-LDA)

    LDA seeks projection for small within-cov. & large between-cov.

                                 Between-class

                                  Within-class

    Regularized Discriminant Analysis:
                                [Friedman 89]




                                 Regularizer
    Semi-Supervised Discriminant Analysis (SS-LDA):
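
A two-class version of such a regularized criterion can be sketched in NumPy. This is a minimal sketch under stated assumptions, not the authors' exact formulation: it assumes the regularizer is a ridge term plus a graph-Laplacian smoothness term built from a k-nearest-neighbour graph over labeled and unlabeled samples. The names `knn_graph_laplacian`, `ss_lda_direction`, `alpha`, and `beta` are hypothetical.

```python
import numpy as np

def knn_graph_laplacian(X, k=5):
    """Unnormalized Laplacian L = D - W of a symmetrized k-nearest-neighbour graph."""
    n = X.shape[0]
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)            # a point is not its own neighbour
    nearest = np.argsort(d2, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nearest.ravel()] = 1.0
    W = np.maximum(W, W.T)                  # make the graph undirected
    return np.diag(W.sum(axis=1)) - W

def ss_lda_direction(X, labeled_idx, y, k=5, alpha=1e-3, beta=1.0):
    """Two-class projection w proportional to (S_w + alpha*I + beta*X^T L X / n)^-1 (m1 - m0).

    X: all samples (n x d), labeled and unlabeled.
    y: 0/1 labels for the rows X[labeled_idx].
    """
    Xl, y = X[labeled_idx], np.asarray(y)
    m0, m1 = Xl[y == 0].mean(axis=0), Xl[y == 1].mean(axis=0)
    centered = np.vstack([Xl[y == 0] - m0, Xl[y == 1] - m1])
    S_w = centered.T @ centered / len(Xl)   # within-class scatter, labeled data only
    Xc = X - X.mean(axis=0)
    L = knn_graph_laplacian(Xc, k)
    reg = alpha * np.eye(X.shape[1]) + beta * (Xc.T @ L @ Xc) / X.shape[0]
    w = np.linalg.solve(S_w + reg, m1 - m0)
    return w / np.linalg.norm(w)
```

With `beta = 0` this reduces to a ridge-regularized Fisher direction that uses labeled data only; the Laplacian term is what lets the unlabeled samples shape the boundary.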
14



Decision Boundary Analysis
                     Learning Boundary




                     Data Distribution

                  Acquire Reliable Labels
                     with Given Rule


                  Semi-supervised LDA
15

    Decision Boundary Analysis
   Feature Extraction method proposed by Lee & Landgrebe
       C. Lee & D. A. Landgrebe. Feature Extraction Based on Decision Boundaries, IEEE
                                 Trans. Pattern Anal. Mach. Intell. 15(4): 388-400, 1993



   [Figure: learned boundary between Class 1 and Class 2, shown in top view and cross-section view,
    with normal vectors on the boundary; legend: discriminantly informative / discriminantly redundant]


   Extract informative features from
              normal vectors on the boundary
16



Decision Boundary Feature Matrix



Linear:

Nonlinear:




   Define responsibility of each variable for discrimination
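
One way to turn a decision boundary feature matrix into per-variable scores is sketched below. The matrix is estimated as the average of N Nᵀ over unit normal vectors N on the boundary, following Lee & Landgrebe. The scoring rule is an assumption, since the slide's own formula is not reproduced here: each variable gets the eigenvalue-weighted sum of its squared eigenvector components, normalized to 100%. Function names are hypothetical.

```python
import numpy as np

def dbfm_from_normals(normals):
    """Estimate the DBFM as the mean of N N^T over unit normals (rows of `normals`)."""
    N = np.asarray(normals, dtype=float)
    N = N / np.linalg.norm(N, axis=1, keepdims=True)
    return N.T @ N / len(N)

def contribution_scores(dbfm):
    """Per-variable share [%] of sum_i lambda_i * v_ij^2 (an assumed scoring rule)."""
    eigvals, eigvecs = np.linalg.eigh(dbfm)
    eigvals = np.clip(eigvals, 0.0, None)   # guard against tiny negative round-off
    weighted = (eigvecs ** 2) @ eigvals     # entry j sums over all eigenpairs
    return 100.0 * weighted / weighted.sum()

# Linear boundary w^T x + b = 0: the normal vector is w everywhere.
w = np.array([3.0, 0.0, 4.0])
scores = contribution_scores(dbfm_from_normals([w]))  # approximately [36, 0, 64]
```

In the linear case the score of variable j is simply w_j² / ‖w‖², so a variable that does not enter the boundary gets zero contribution.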
17



Outline

   Introduction
   Theories
   Experiments
       Application to Energy Fault Analysis
   Conclusions
18




  Energy Fault Diagnosis Problem
   EF: Inverter overloaded
   Detection Rule: 6h M.A. of Inverter output = 100 → EF
      … but I don’t know the cause

   [Figure: Air Handling Unit; labels: cold, hot, coil, humidity, Inverter]

   DATA & RULE → Find out root cause of inverter overload
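
The detection rule above can be sketched as follows. This is a minimal illustration, assuming hourly samples (so a 6-hour moving average spans six readings) and assuming the rule fires when that average reaches 100 [%]. The function name `detect_energy_fault` and the sample readings are hypothetical.

```python
from collections import deque

def detect_energy_fault(inverter_output, window=6, threshold=100.0):
    """Flag samples whose trailing moving average reaches the threshold.

    Assumes one reading per hour, so window=6 is a 6-hour moving average.
    Returns a list of booleans; entries are False until a full window exists.
    """
    recent = deque(maxlen=window)
    flags = []
    for value in inverter_output:
        recent.append(value)
        full = len(recent) == window
        flags.append(full and sum(recent) / window >= threshold)
    return flags

readings = [80, 100, 100, 100, 100, 100, 100, 90]  # hypothetical inverter output [%]
flags = detect_energy_fault(readings)
# Only the seventh sample is flagged: it is the first with six saturated readings in a row.
```

A rule of this kind only reports that the inverter is saturated; it carries no information about which variable drives the overload.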
20

     Energy Fault Diagnosis - Settings

     Air-conditioning time-series sensor data for 1 unit
     instances: 744
     Labeled samples: 10 for each class (3% of all)
      (based on probability proportional to distance from boundary)
     Hyper-parameters:       NN = 5,
     13 attributes, all continuous

 1. Supply Air (SA) Temp.             8. Humidifier Valve Opening
 2. Room Temp.                        9. Return Air Temperature
 3. Supply Air Temp. Setting         10. Pressure Diff. between In-Outside
 4. Room Humidity                    11. Moving Ave. of Pressure Difference
 5. Inverter Output                  12. Outside Air Temperature
 6. Cooling Water Valve Opening      13. Outside Humidity
 7. Hot Water Valve Opening
Experimental Results


                       21
22

      Results (100 times ave.)
   [Figure: bar chart of Contribution Score [%] (0 to 100) per variable for LDA, SSLDA, KDA, SSKDA.
    Variables: SA Temp. [1], Room Temp., SA Setting [2], Room Humidity, Inverter, Cooling Water [3],
    Hot Water, Humidifier, Return Air Temp., Pressure Diff., MA. Pressure, Outside Temp., Outside Humidity]

   Top contributing variables:
     <LDA>     Inverter (96%): Trivial
     <SSLDA>   Cool water (75%), SA temp. (12%)
     <KDA>     Cool water (19%), MA. Pressure (15%), Inverter (15%), …: Not Distinctive !
     <SSKDA>   Inverter (33%), SA temp (19%), Cool Water (17%), SA setting (13%)
26


    Energy Fault Diagnosis: Examine Raw Data
   Cooling water valve Opening [3]
     valve opens completely, but this is result of EF, not cause
   SSLDA/SSKDA show SA temp. [1] & setting [2] responsible
     deviation of SA temp.
     To reduce this deviation…
     • Operate inverter at peak power
     • Open cooling water valve
28


Evaluation




Root Cause   LDA   SSLDA   KDA   SSKDA

SA Temp.

SA Setting
29



Outline

   Introduction
   Theories
   Experiments for Real Data
   Conclusions
30



    Conclusions
   Introduce identification method of causal variables
                  by combining semi-supervised LDA & DBA
   Labels are acquired from imperfect domain-specific rule
   SS-LDA/SS-KDA: reflect domain knowledge & avoid over-fitting
   DBA: extract informative features from normal direction of boundary



   Apply to energy fault cause diagnosis
   Succeeded in extracting some responsible features
    beginning with fuzzy heuristics based on domain knowledge
31



Room for improvements


   Consider temporal continuity
       Time-series is not i.i.d.

   Find True Cause from Correlating Variables
32




Thank you for your kind attention
33




Discussions
34



Extension to Multiple Energy Faults

   In real systems, various faults take place
   Fault cause varies among phenomena
   Need to separate phenomena and diagnose respectively

<Our Approach>
1. Extract points detected by existing heuristics
2. Reduce dimensionality and visualize data in low-dim. space
3. Cluster the data and assign labels to the clusters
4. Identify variables discriminating that cluster from normal data
35



    Experimental Condition & Results
   Air-conditioning sensor data, 13 attributes, same heuristics
   748 instances, operating time only (hourly data for 2 months)
   137 points are detected by heuristics
   Reduce dimensionality by Isomap [J.B. Tenenbaum 00] (kNN = 5)
   Contribution score is given by SS-KDA (kNN = 5,                     )
    <2D representation>
   2 major clusters, 4 anomalies

   Contribution score for red points: Room Air Temp. (chart annotation: superficial)
   Deviation of Room Air Temp. around detected points
   Detected, this is EF
37


Data Distribution
                    Properly Controlled


                    System Deviation
38



                          Data Distribution
                                                  Linearly Separable
                                              for Cooling Water Valve [3]
Cooling Water Valve [%]
39



    Probabilistic Labeling
            Rule       Points distant from boundary are
                        reliable as class labels
                       Keep robustness against outliers

outlier
                       Points are stochastically given labels
      Unreliable                  based on reliability




                        : Distance from boundary of point
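
The stochastic labeling step can be sketched as weighted sampling without replacement. This is a minimal sketch, assuming a one-variable threshold rule and a selection probability proportional to the absolute distance from the threshold. The function name and the sample values are hypothetical.

```python
import random

def sample_reliable_labels(values, threshold, n_per_class, seed=0):
    """Choose which points receive a label from the rule `value > threshold`.

    Within each side of the boundary, points are drawn without replacement
    with probability proportional to their distance from the threshold.
    Returns (above_indices, below_indices); every other point stays unlabeled.
    """
    rng = random.Random(seed)
    chosen = {True: [], False: []}
    for above in (True, False):
        pool = [(i, abs(v - threshold)) for i, v in enumerate(values)
                if (v > threshold) == above]
        for _ in range(min(n_per_class, len(pool))):
            weights = [w for _, w in pool]
            if sum(weights) == 0:
                break  # only points on the boundary remain; none is reliable
            pick = rng.choices(range(len(pool)), weights=weights)[0]
            chosen[above].append(pool.pop(pick)[0])
    return chosen[True], chosen[False]

# Hypothetical rule variable with threshold 4.
values = [1.5, 3.8, 3.9, 4.0, 4.2, 9.0]
above, below = sample_reliable_labels(values, threshold=4.0, n_per_class=2)
```

Points near the threshold can still be selected, only rarely, which keeps the labeled set from being dominated by a single extreme outlier while favouring reliable points.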
40



 Estimate DBFM
    Linear Case:
    Nonlinear Case
     Difficult to acquire points on boundary & calculate gradient vector
                Discriminant function is linear in feature space
           Input space                            Feature space




Kernelized SSLDA
    (SS-KDA)
41



  DBFM for Nonlinear Distribution (1)
                                                     Feature space
1. Generate points on boundary in feature space




2. Gradient vector at corresponding point


  for Gaussian kernel
                                                      Input space



  But to find pre-image    is generally difficult…

By kernel trick, pre-image problem is avoidable
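
For a kernel discriminant of the form f(x) = Σ_i α_i k(x_i, x) + b, the input-space gradient has a closed form under a Gaussian kernel. The sketch below assumes the parametrization k(x_i, x) = exp(−‖x − x_i‖² / (2σ²)); the names `centers`, `alpha`, `b`, and `sigma` are hypothetical. It shows only the gradient computation and does not reproduce the boundary-point generation described on the slide.

```python
import numpy as np

def gaussian_kernel(centers, x, sigma):
    """k(x_i, x) = exp(-||x - x_i||^2 / (2 sigma^2)) for every row x_i of `centers`."""
    sq_dist = ((centers - x) ** 2).sum(axis=1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def discriminant(centers, alpha, b, x, sigma):
    """f(x) = sum_i alpha_i k(x_i, x) + b, linear in the kernel feature space."""
    return alpha @ gaussian_kernel(centers, x, sigma) + b

def discriminant_gradient(centers, alpha, x, sigma):
    """Input-space gradient: sum_i alpha_i k(x_i, x) (x_i - x) / sigma^2."""
    k = gaussian_kernel(centers, x, sigma)
    return ((alpha * k)[:, None] * (centers - x)).sum(axis=0) / sigma ** 2
```

Evaluated at a point on the boundary f(x) = 0 and normalized to unit length, this gradient is the normal vector that enters the estimated DBFM.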
42



    DBFM for Nonlinear Distribution (2)
Finally we have gradient vectors on boundary for each point



3. Construct estimated DBFM




    Define responsibility of each variable for discrimination

                                   Max. eigenvalue
43



Verification by Benchmark Data – wine discrimination -

           UCI Machine Learning Repository: Wine Dataset
    Consider 2-class problem (Original data contain 3)
    Number of Instances:   wine A: 59, wine B: 71
    13 attributes, all continuous
    Ad hoc Rule: Color intensity > 4 → wine A, otherwise → wine B
   1. Alcohol
   2. Malic acid
   3. Ash
   4. Alkalinity of Ash
   5. Magnesium
   6. Phenols
   7. Flavonoids
   8. Nonflavonoid phenols
   9. Proanthocyanins
  10. Color intensity
  11. Hue
  12. OD280/OD315 of diluted wines
  13. Proline
    [Figure: histogram of Color intensity (y-axis: Frequency)]
44



Result on Benchmark Data
   Acquire only 3 labels for each class based on probability
    proportional to distance from boundary (color intensity = 4)
   Hyper-parameters: Nearest neighbors = 3,
         100 times average

                                           Top 3 responsible attributes
                                           <LDA>
                                           1. Flavonoids (7): 18.0%
                                           2. Color intensity (10): 13.2%
                                           3. Phenols (6): 11.6 %
                                                                  [42.8%]
                                           <SS-LDA>
                                           1. Proline (13): 26.5%
                                           2. Color intensity (10): 22.1%
                                           3. Alcohol (1): 14.2%
                                                                  [62.8%]
45



  Comparison of SSLDA with LDA
   Plot data in space spanned by the 3 most responsible features
           LDA                             SSLDA




Apparently SSLDA gives effective features for discrimination

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 

Identification of Causal Variables for Building Energy Fault Detection by Semisupervised LDA

  • 1. 2nd Workshop on Domain Driven Data Mining, Session I: S2208, Dec. 15, 2008, Palazzo dei Congressi, Pisa, Italy. Identification of Causal Variables for Building Energy Fault Detection by Semi-supervised LDA & Decision Boundary Analysis. Keigo Yoshida, Minoru Inui, Takehisa Yairi, Kazuo Machida (Dept. of Aeronautics & Astronautics, the Univ. of Tokyo); Masaki Shioya and Yoshio Masukawa (Kajima Corp.)
  • 2. Main Point of the Presentation. We propose a supportive method for anomaly cause identification that combines traditional data analysis with domain knowledge. Applied to a real Building Energy Management System (BEMS), it successfully found the root cause of energy waste.
  • 3. Outline: Introduction; Theories; Experiments for Real Data; Conclusions
  • 4. Introduction: What is BEMS? Building Energy Management Systems: collect and monitor sensor data in the building (temperature, heat consumption, etc.); energy-efficient control; discover energy faults (wastes). [Figure labels: I/F, BEMS.]
  • 5. Introduction: Problem of BEMS
      - Hard to identify root causes of Energy Faults (EF): complex relations between pieces of equipment; data deluge from numerous sensors (approx. 2000 sensors for a 20-story building)
      - Current EF detection: heuristics based on experts' empirical knowledge, usually fuzzy "IF-THEN" rules
      - "Heuristic diagnostics is incomplete": fuzziness leads to false negative errors; detection-only diagnosis cannot improve systems
  • 6. Early Fault Diagnosis Methods. [Figure: methods compared by performance. Knowledge-based: FTA/FMEA, expert systems, fuzzy logic. Modeling-based: Bayesian filtering. Data-driven: feature extraction, neural networks, FDA, unsupervised and supervised learning, data mining. Axes from knowledge-based to data-driven: source (experts to data), interpretation (easy to hard), modeling cost (expensive to low), versatility (poor to high). Weaknesses noted: knowledge acquisition bottleneck; neglecting useful knowledge.]
  • 7. Proposed Method. [Figure: the chart from the previous slide with the proposal added.] Proposal: domain knowledge + data analysis. Characteristics:
      - Interpretation: easy (exploits domain knowledge)
      - Cost: not so high (empirical knowledge only)
      - Versatility: easy to apply to various domains and problems
      - Performance: better than heuristics
  • 8. Conceptual Diagram. [Figure: experts supply a detection rule; reliable labels are acquired from the data distribution with the given rule; semi-supervised LDA learns the boundary; DBA identifies variables, shown as contribution to EF against variable #; the result is fed back to the experts.] Assumption: incomplete heuristics surely represent abnormal phenomena.
  • 9. Outline: Introduction; Theories (Semi-Supervised Linear Discriminant Analysis, Decision Boundary Analysis); Experiments for Real Data; Conclusions
  • 10. Semi-supervised LDA. [Figure labels: Learning Boundary; Data Distribution; Acquire Reliable Labels with Given Rule.]
  • 11. Manifold Regularization [M. Belkin et al. 05]. Regularized Least Squares (labeled data only): squared loss for labeled data plus a penalty term (usually the squared function norm).
  • 12. Manifold Regularization [M. Belkin et al. 05]. Regularized Least Squares (labeled data only): squared loss for labeled data plus a penalty term (usually the squared function norm). Laplacian RLS (uses labeled and unlabeled data): squared loss, penalty term, and an additional term for the intrinsic geometry, built from the graph Laplacian. Assumption: geometrically close points have similar labels.
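The two objectives named on this slide can be written as follows. This is a sketch in the notation commonly used in the manifold regularization literature, which is an assumption here: the slide's own symbols may differ. There are $l$ labeled and $u$ unlabeled points, $\gamma_A$ and $\gamma_I$ are regularization weights, $\mathcal{H}_K$ is the kernel function space, and $L = D - W$ is the graph Laplacian of a neighborhood graph with weight matrix $W$ and degree matrix $D$.

```latex
% Regularized Least Squares: labeled data only
f^{*} = \arg\min_{f \in \mathcal{H}_K}
  \frac{1}{l} \sum_{i=1}^{l} \bigl( y_i - f(x_i) \bigr)^2
  + \gamma_A \, \lVert f \rVert_K^2

% Laplacian RLS: labeled and unlabeled data
f^{*} = \arg\min_{f \in \mathcal{H}_K}
  \frac{1}{l} \sum_{i=1}^{l} \bigl( y_i - f(x_i) \bigr)^2
  + \gamma_A \, \lVert f \rVert_K^2
  + \frac{\gamma_I}{(u + l)^2} \, \mathbf{f}^{\top} L \, \mathbf{f},
\qquad
\mathbf{f} = \bigl[ f(x_1), \dots, f(x_{l+u}) \bigr]^{\top}
```

The last term is small when points joined by a graph edge receive similar values of $f$, which is how the "geometrically close, similar label" assumption enters the objective.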
  • 13. Semi-Supervised Linear Discriminant Analysis (SS-LDA). LDA seeks a projection that gives small within-class covariance and large between-class covariance: [equation]. Regularized Discriminant Analysis [Friedman 89] adds a regularizer: [equation]. Semi-Supervised Discriminant Analysis (SS-LDA): [equation].
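A minimal two-class sketch of the idea, assuming a graph-regularized discriminant objective of the form max_w (wᵀ S_b w) / (wᵀ (S_w + α Xᵀ L X + β I) w). The slide's own formula is not reproduced here, so this form, the function names, the parameters `alpha` and `beta`, and the binary kNN graph are all illustrative assumptions. For two classes the maximizer is proportional to M⁻¹(m₁ − m₀), with M the bracketed matrix, which is what the code computes.

```python
import numpy as np

def knn_laplacian(X, k):
    """Graph Laplacian L = D - W of a symmetric, binary k-nearest-neighbour graph."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(sq, np.inf)            # a point is not its own neighbour
    nearest = np.argsort(sq, axis=1)[:, :k]
    W = np.zeros_like(sq)
    rows = np.repeat(np.arange(len(X)), k)
    W[rows, nearest.ravel()] = 1.0
    W = np.maximum(W, W.T)                  # make the graph undirected
    return np.diag(W.sum(axis=1)) - W

def ss_lda_direction(X_lab, y_lab, X_unlab, k=5, alpha=1.0, beta=1e-3):
    """Unit projection vector for a two-class problem with labels 0 and 1."""
    X_all = np.vstack([X_lab, X_unlab])
    dim = X_all.shape[1]
    m0 = X_lab[y_lab == 0].mean(axis=0)
    m1 = X_lab[y_lab == 1].mean(axis=0)
    # Within-class scatter from the labeled points only.
    S_w = np.zeros((dim, dim))
    for label, mean in ((0, m0), (1, m1)):
        centred = X_lab[y_lab == label] - mean
        S_w += centred.T @ centred
    # Geometry term from labeled and unlabeled points together.
    L = knn_laplacian(X_all, k)
    M = S_w + alpha * (X_all.T @ L @ X_all) + beta * np.eye(dim)
    w = np.linalg.solve(M, m1 - m0)
    return w / np.linalg.norm(w)

# Synthetic example (not the building data): classes differ along axis 0 only.
rng = np.random.default_rng(0)
X0 = rng.normal(size=(50, 3)) + np.array([-4.0, 0.0, 0.0])
X1 = rng.normal(size=(50, 3)) + np.array([4.0, 0.0, 0.0])
X_lab = np.vstack([X0[:5], X1[:5]])
y_lab = np.array([0] * 5 + [1] * 5)
X_unlab = np.vstack([X0[5:], X1[5:]])
w = ss_lda_direction(X_lab, y_lab, X_unlab)
```

With only five labels per class the scatter estimate alone is noisy; the Laplacian term lets the unlabeled points stabilize the direction.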
  • 14. Decision Boundary Analysis. [Figure labels: Learning Boundary; Data Distribution; Acquire Reliable Labels with Given Rule; Semi-supervised LDA.]
  • 15. Decision Boundary Analysis. Feature extraction method proposed by Lee & Landgrebe: C. Lee & D. A. Landgrebe. Feature Extraction Based on Decision Boundaries, IEEE Trans. Pattern Anal. Mach. Intell. 15(4): 388-400, 1993. [Figure: learned boundary between Class 1 and Class 2 in top view and cross-section view, with normal vectors; the legend marks discriminantly informative and discriminantly redundant directions.] Extract informative features from normal vectors on the boundary.
  • 16. Decision Boundary Feature Matrix (DBFM). Linear: [equation]. Nonlinear: [equation]. Define the responsibility of each variable for discrimination.
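A small sketch of how a per-variable responsibility can be read off a DBFM, assuming the matrix is estimated as the average outer product of unit normal vectors on the boundary (as in Lee & Landgrebe's definition). The score used here, each diagonal entry divided by the trace, is an assumption: the slides do not show their exact formula. It equals the eigenvalue-weighted sum of squared eigenvector components for that variable. Function names are hypothetical.

```python
def estimate_dbfm(normals):
    """Average outer product of unit-normalised boundary normal vectors."""
    dim = len(normals[0])
    dbfm = [[0.0] * dim for _ in range(dim)]
    for n in normals:
        length = sum(c * c for c in n) ** 0.5
        unit = [c / length for c in n]
        for i in range(dim):
            for j in range(dim):
                dbfm[i][j] += unit[i] * unit[j] / len(normals)
    return dbfm

def contribution_scores(dbfm):
    """Per-variable score in percent: diagonal entry over the trace (assumed scoring)."""
    trace = sum(dbfm[i][i] for i in range(len(dbfm)))
    return [100.0 * dbfm[i][i] / trace for i in range(len(dbfm))]

# Linear case: the boundary has a single normal direction, the weight vector.
scores = contribution_scores(estimate_dbfm([[3.0, 4.0, 0.0]]))
```

For the weight vector (3, 4, 0) the scores are approximately 36, 64 and 0 percent: the third variable does not influence the boundary at all.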
  • 17. Outline: Introduction; Theories; Experiments (Application to Energy Fault Analysis); Conclusions
  • 18. Energy Fault Diagnosis Problem. EF: inverter overloaded. Detection rule: 6-hour moving average (M.A.) of inverter output = 100 ⇒ EF … but I don't know the cause. [Figure: air handling unit with inverter, cold and hot coils, and humidity.]
  • 19. Energy Fault Diagnosis Problem. EF: inverter overloaded. Detection rule: 6-hour moving average (M.A.) of inverter output = 100 ⇒ EF … but I don't know the cause. From DATA & RULE: find out the root cause of the inverter overload. [Figure: air handling unit with inverter, cold and hot coils, and humidity.]
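The detection rule can be sketched as below. Two things are assumed: the samples are hourly (so a 6-hour window is six samples), and "= 100" is tested as "at least 100" within a small tolerance. The function and parameter names are hypothetical.

```python
from collections import deque

def detect_energy_fault(inverter_output, window=6, threshold=100.0, tol=1e-9):
    """Flag each sample whose trailing moving average reaches the threshold.

    Samples before the first full window are never flagged.
    """
    recent = deque(maxlen=window)   # holds the last `window` samples
    flags = []
    for value in inverter_output:
        recent.append(value)
        full = len(recent) == window
        mean = sum(recent) / len(recent)
        flags.append(full and mean >= threshold - tol)
    return flags

# A single dip to 90 delays the alarm until it has left the 6-sample window.
flags = detect_energy_fault([100.0] * 5 + [90.0] + [100.0] * 6)
```

The rule only says that the inverter is saturated; it carries no information about which of the other variables drove it there, which is the gap the rest of the deck addresses.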
  • 20. Energy Fault Diagnosis - Settings
      - Air-conditioning time-series sensor data for 1 unit
      - Instances: 744
      - Labeled samples: 10 for each class (3% of all), chosen with probability proportional to distance from the boundary
      - Hyper-parameters: NN = 5, [second value missing]
      - 13 attributes, all continuous:
          1. Supply Air (SA) Temp.
          2. Room Temp.
          3. Supply Air Temp. Setting
          4. Room Humidity
          5. Inverter Output
          6. Cooling Water Valve Opening
          7. Hot Water Valve Opening
          8. Humidifier Valve Opening
          9. Return Air Temperature
          10. Pressure Diff. between In-Outside
          11. Moving Ave. of Pressure Difference
          12. Outside Air Temperature
          13. Outside Humidity
  • 22. Results (average of 100 runs). [Figure: bar chart of contribution score [%], 0 to 100, for the 13 variables; series: LDA.] LDA: Inverter (96%), a trivial result.
  • 23. Results (average of 100 runs). [Same chart; series: LDA, SSLDA; SA Temp. and Cooling Water highlighted.] LDA: Inverter (96%). SSLDA: Cool water (75%), SA temp. (12%).
  • 24. Results (average of 100 runs). [Same chart; series: LDA, SSLDA, KDA.] LDA: Inverter (96%). SSLDA: Cool water (75%), SA temp. (12%). KDA: Cool water (19%), MA. Pressure (15%), Inverter (15%), …; not distinctive!
  • 25. Results (average of 100 runs). [Same chart; series: LDA, SSLDA, KDA, SSKDA; highlighted: [1] SA Temp., [2] SA Setting, [3] Cooling Water.] LDA: Inverter (96%). SSLDA: Cool water (75%), SA temp. (12%). KDA: Cool water (19%), MA. Pressure (15%), Inverter (15%). SSKDA: Inverter (33%), SA temp. (19%), Cool water (17%), SA setting (13%), …
  • 26. Energy Fault Diagnosis: Examine Raw Data. Cooling water valve opening [3]: the valve opens completely, but this is a result of the EF, not its cause.
  • 27. Energy Fault Diagnosis: Examine Raw Data. Cooling water valve opening: the valve opens completely, but this is a result of the EF, not its cause. SSLDA/SSKDA show that SA temp. [1] and SA setting [2] are responsible: the SA temp. deviates. To reduce this deviation, the system operates the inverter at peak power and opens the cooling water valve. [Figure: deviation of SA temp.]
  • 28. Evaluation. [Table: root causes (SA Temp., SA Setting) against methods (LDA, SSLDA, KDA, SSKDA); the cell marks are not preserved.]
  • 29. Outline: Introduction; Theories; Experiments for Real Data; Conclusions
  • 30. Conclusions
      - Introduced an identification method for causal variables that combines semi-supervised LDA and DBA
          - Labels are acquired from an imperfect domain-specific rule
          - SS-LDA/SS-KDA: reflect domain knowledge and avoid over-fitting
          - DBA: extracts informative features from the normal direction of the boundary
      - Applied the method to energy fault cause diagnosis
          - Succeeded in extracting some responsible features, starting from fuzzy heuristics based on domain knowledge
  • 31. Room for improvements. Consider temporal continuity (a time series is not i.i.d.). Find the true cause among correlated variables.
  • 32. Thank you for your kind attention
  • 34. Extension to Multiple Energy Faults
      - In real systems, various faults take place
      - The fault cause varies among phenomena
      - Need to separate the phenomena and diagnose each one
      - Our approach:
          1. Extract points detected by existing heuristics
          2. Reduce dimensionality and visualize the data in a low-dimensional space
          3. Cluster the data and give the clusters labels
          4. Identify the variables that discriminate each cluster from normal data
  • 35. Experimental Condition & Results
      - Air-conditioning sensor data, 13 attributes, same heuristics
      - 748 instances, operating time only (hourly data for 2 months)
      - 137 points are detected by the heuristics
      - Dimensionality reduced by Isomap [J.B. Tenenbaum 00] (kNN = 5)
      - Contribution score is given by SS-KDA (kNN = 5, [second value missing])
      - 2D representation: 2 major clusters, 4 anomalies
  • 36. Experimental Condition & Results, same conditions as slide 35. [Figure: 2D representation with 2 major clusters and 4 anomalies; contribution scores for the red points; annotations: Room Air Temp., superficial.] Deviation of Room Air Temp. around the detected points: detected, this is EF.
  • 37. Data Distribution. [Figure labels: Properly Controlled System; Deviation.]
  • 38. Data Distribution: linearly separable for Cooling Water Valve [3]. [Figure axis: Cooling Water Valve [%].]
  • 39. Probabilistic Labeling Rule. Points distant from the boundary are reliable as class labels. To keep robustness against outliers, points are stochastically given labels based on reliability, which is defined from the distance of each point from the boundary: [equation]. [Figure labels: outlier; Unreliable.]
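A minimal sketch of such a labeling step, assuming the rule is a threshold on one variable and that a point's selection probability is simply proportional to its distance from that threshold (the settings slides say "probability proportional to distance from boundary"; the exact reliability formula is not shown). Sampling is without replacement, and the function and argument names are hypothetical.

```python
import random

def sample_reliable_labels(values, boundary, per_class, rng):
    """Pick up to `per_class` points on each side of the rule boundary.

    Returns {index: label}, with label 1 for values above the boundary and
    label 0 otherwise. Points far from the boundary are more likely chosen.
    """
    labeled = {}
    for label in (1, 0):
        pool = [i for i, v in enumerate(values) if (v > boundary) == (label == 1)]
        weights = [abs(values[i] - boundary) for i in pool]
        for _ in range(min(per_class, len(pool))):
            if sum(weights) <= 0.0:
                break                      # only points exactly on the boundary remain
            pick = rng.choices(range(len(pool)), weights=weights, k=1)[0]
            labeled[pool.pop(pick)] = label
            weights.pop(pick)
    return labeled

# Toy values around a boundary at 4.0; two labels are drawn per side.
toy_values = [0.5, 1.0, 3.9, 4.1, 6.0, 9.0]
labels = sample_reliable_labels(toy_values, boundary=4.0, per_class=2,
                                rng=random.Random(0))
```

Because the draw is random rather than a hard cut on distance, a single extreme outlier is not guaranteed to be labeled in every run, which is the robustness the slide refers to. Averaging results over many runs, as the experiments do, smooths out the sampling noise.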
  • 40. Estimate DBFM. Linear case: [equation]. Nonlinear case: it is difficult to acquire points on the boundary and to calculate the gradient vector in the input space, but the discriminant function is linear in the feature space of kernelized SSLDA (SS-KDA). [Figure: input space and feature space.]
  • 41. DBFM for Nonlinear Distribution (1). 1. Generate points on the boundary in the feature space. 2. Compute the gradient vector at the corresponding point in the input space, for a Gaussian kernel: [equation]. Finding the pre-image is generally difficult, but with the kernel trick the pre-image problem is avoidable.
  • 42. DBFM for Nonlinear Distribution (2). Finally we have gradient vectors on the boundary for each point. 3. Construct the estimated DBFM: [equation]. Define the responsibility of each variable for discrimination: [equation; annotation: Max. eigenvalue].
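A hedged sketch of steps 2 and 3 for a Gaussian-kernel discriminant of the assumed form f(x) = Σᵢ αᵢ k(xᵢ, x) + b. Boundary points are located here by bisection in the input space between a point with f > 0 and a point with f < 0. That is a simpler stand-in, not the feature-space procedure on the slides. The coefficients `alpha`, the `centers`, and all function names are hypothetical.

```python
import numpy as np

def kernel_values(centers, x, sigma):
    """Gaussian kernel k(c, x) = exp(-|c - x|^2 / (2 sigma^2)) for every center c."""
    return np.exp(-np.sum((centers - x) ** 2, axis=1) / (2.0 * sigma ** 2))

def discriminant(x, centers, alpha, bias, sigma):
    return float(alpha @ kernel_values(centers, x, sigma) + bias)

def discriminant_gradient(x, centers, alpha, sigma):
    """Gradient of f at x: sum_i alpha_i k(c_i, x) (c_i - x) / sigma^2."""
    weights = alpha * kernel_values(centers, x, sigma)
    return (weights[:, None] * (centers - x)).sum(axis=0) / sigma ** 2

def boundary_point(pos, neg, f, steps=60):
    """Bisect the segment from pos (f > 0) to neg (f < 0) towards f = 0."""
    lo, hi = np.asarray(pos, float), np.asarray(neg, float)
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def estimated_dbfm(pairs, centers, alpha, bias, sigma):
    """Average outer product of unit gradient vectors at the boundary points."""
    f = lambda x: discriminant(x, centers, alpha, bias, sigma)
    dbfm = np.zeros((centers.shape[1], centers.shape[1]))
    for pos, neg in pairs:
        g = discriminant_gradient(boundary_point(pos, neg, f), centers, alpha, sigma)
        n = g / np.linalg.norm(g)
        dbfm += np.outer(n, n) / len(pairs)
    return dbfm

# Toy discriminant whose boundary is the line x1 = 0, so only variable 1 matters.
centers = np.array([[-1.0, 0.0], [1.0, 0.0]])
alpha = np.array([-1.0, 1.0])
pairs = [((1.0, 0.5), (-1.0, 0.3)), ((2.0, -1.0), (-0.5, 1.0))]
dbfm = estimated_dbfm(pairs, centers, alpha, bias=0.0, sigma=1.0)
```

In this toy case every boundary normal points along the first axis, so the estimated DBFM concentrates all of its mass on the first diagonal entry.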
  • 43. Verification by Benchmark Data: wine discrimination
      - UCI Machine Learning Repository: Wine Dataset
      - Consider a 2-class problem (the original data contain 3 classes)
      - Number of instances: wine A: 59, wine B: 71
      - Ad hoc rule: Color intensity > 4 ⇒ wine A, otherwise wine B. [Figure: histogram of color intensity (frequency).]
      - 13 attributes, all continuous:
          1. Alcohol
          2. Malic acid
          3. Ash
          4. Alkalinity of Ash
          5. Magnesium
          6. Phenols
          7. Flavonoids
          8. Nonflavonoid phenols
          9. Proanthocyanins
          10. Color intensity
          11. Hue
          12. OD280/OD315 of diluted wines
          13. Proline
  • 44. Result on Benchmark Data
      - Only 3 labels are acquired for each class, based on probability proportional to distance from the boundary (color intensity = 4)
      - Hyper-parameters: nearest neighbors = 3; average of 100 runs
      - Three most responsible attributes, LDA: 1. Flavonoids (7): 18.0%; 2. Color intensity (10): 13.2%; 3. Phenols (6): 11.6%; total 42.8%
      - Three most responsible attributes, SS-LDA: 1. Proline (13): 26.5%; 2. Color intensity (10): 22.1%; 3. Alcohol (1): 14.2%; total 62.8%
  • 45. Comparison of SSLDA with LDA. The data are plotted in the space spanned by the three most responsible features for LDA and for SSLDA. [Figure: two 3D scatter plots, LDA and SSLDA.] Apparently SSLDA gives effective features for discrimination.