Random Artificial Incorporation of Noise in a Learning Classifier System Environment
Ryan J. Urbanowicz, Nicholas A. Sinnott-Armstrong, and Jason H. Moore
Dartmouth Medical School
GECCO, Dublin, Ireland, 2011
Genetic Epidemiology
• Association Study (Case/Control)
• Single Nucleotide Polymorphism (SNP)
• Allele & Genotype
[Example: three subjects, two sequence copies each, one SNP position]
  Subject #1: -- AGGTCA -- / -- AGGTCA --
  Subject #2: -- AGGTCA -- / -- AGCTCA --
  Subject #3: -- AGCTCA -- / -- AGCTCA --
Two alleles (G and C) yield three genotypes (GG, GC, CC), encoded as (0, 1, 2)
“One SNP at a time” approach vs. “complex systems” approach
Main Effects vs. Epistasis
[Figure: main effects – SNP1, SNP2, SNP3 each individually associated with disease]
Epistasis penetrance table (genotype frequencies in parentheses):

                     SNP 1
              AA (.25)  Aa (.5)  aa (.25)  Marginal
SNP 2 BB (.25)    0         1        0       0.5
      Bb (.5)     1         0        1       0.5
      bb (.25)    0         1        0       0.5
Marginal         0.5       0.5      0.5

Marginal penetrance(s) are constant: no main effects.
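The penetrance table above can be checked numerically: weighting each cell by the genotype frequencies shows every marginal penetrance equals 0.5, i.e. a purely epistatic model with no main effects. This is a small illustrative sketch; variable names are ours, the numbers come from the slide.

```python
# Genotype frequencies for each SNP (from the slide: .25 / .5 / .25)
freq = [0.25, 0.5, 0.25]          # AA/BB, Aa/Bb, aa/bb

# Penetrance table: rows index SNP1 genotype, columns index SNP2 genotype
pen = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]

# Marginal penetrance of each SNP1 genotype: average over SNP2 genotypes,
# weighted by their population frequencies (and vice versa for SNP2)
marg_snp1 = [sum(f2 * pen[i][j] for j, f2 in enumerate(freq)) for i in range(3)]
marg_snp2 = [sum(f1 * pen[i][j] for i, f1 in enumerate(freq)) for j in range(3)]

print(marg_snp1)  # [0.5, 0.5, 0.5] -> no marginal (main) effect
print(marg_snp2)  # [0.5, 0.5, 0.5]
```

Because every marginal is identical, neither SNP carries information on its own; the disease signal only appears when both loci are considered jointly.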
Genetic Heterogeneity
[Figure: a sample population partitioned into genetically heterogeneous subgroups G1–G8]
• Evidence of GH in…
  – Autism
  – Tuberous sclerosis
  – Schizophrenia
  – Cystic fibrosis
  – Breast cancer
  – Asthma
  – Alzheimer disease
  – And many, many others…
Learning Classifier Systems
[Figure: the LCS learning cycle – Environment → Detectors → Match Set [M] (with Covering) → Prediction Array → Action Set [A] → Effectors → Action Performed → Reward → Credit Assignment over the previous Action Set [A]t-1; a Genetic Algorithm acts on the Population [P] of classifiers, each of the form Condition : Action :: Parameter(s)]
Components: Discovery Component • Performance Component • Reinforcement Component
Applications: Autonomous Robotics • Complex Adaptive Systems • Function Approximation • Classification • Data Mining
Urbanowicz & Moore 2009, "Learning Classifier Systems: A Complete Introduction, Review, and Roadmap." Journal of Artificial Evolution and Applications
Effective Generalization
• Effective generalization – maximizing rule generality while preserving accuracy (Testing Acc. = Training Acc.)
• Example LCS rules with generalized (#) attributes:
  10###00### : 1
  02##1##### : 0
• LCS generalization mechanisms:
  – Generalization Hypothesis (Wilson 1995)
  – Action Set GA & Subsumption (Wilson 1998)
  – Hierarchical Selection operator (Bacardit & Garrell 2002)
  – Windowing (Bacardit et al. 2004)
  – Minimum Description Length (Bacardit & Garrell 2007)
  – Ensemble LCS (Gao et al. 2005)
• Noisy problem domains:
  – Over-fitting becomes a particularly important problem
  – Classification noise – less than 100% testing accuracy is attainable
  – Attribute noise – attributes that contribute nothing to testing accuracy
Hypothesis
• Given: a noisy problem, LCSs with accuracy-based fitness will tend to over-fit (learn structure idiosyncratic to the training dataset).
• Consider: datasets with a small sample size are particularly susceptible to this (online learning repeatedly considers the same samples).
• If: we probabilistically incorporate variable noise into the incoming training instances, then in every epoch of learning the Michigan LCS is exposed to a randomly permuted version of the original dataset. This leads to an artificially inflated sample size.
• Hypothesis: The incorporation of low levels of random classification noise will discourage over-fitting and promote effective generalization.
RAIN: Random Artificial Incorporation of Noise
[Diagram: an incoming training instance from the environment, e.g. 0210110220 – 1, is randomly permuted to 0210120220 – 1 before entering the learning cycle: Population [P] → Match Set [M] → Correct Set [C]]
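The permutation step illustrated above (one attribute of 0210110220 changed before the instance reaches the match set) can be sketched as follows. This is a minimal illustrative reading of uniform RAIN, assuming each genotype-encoded attribute is independently replaced with probability p; the function name and per-attribute scheme are ours, not necessarily the paper's exact implementation.

```python
import random

def rain_permute(instance, p, rng=random):
    """Return a noisy copy of a genotype-encoded instance (class label untouched)."""
    noisy = list(instance)
    for i in range(len(noisy)):
        # With probability p, replace this SNP with a random genotype code
        if rng.random() < p:
            noisy[i] = rng.choice([0, 1, 2])
    return noisy

# The instance shown in the slide, "0210110220"
original = [0, 2, 1, 0, 1, 1, 0, 2, 2, 0]
noisy = rain_permute(original, p=0.05)
```

Because the permutation is redrawn each time an instance is presented, every epoch effectively shows the LCS a slightly different version of the dataset.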
Temporal Models
• Pm = maximum permutation probability
• Pc = current permutation probability
• Im = maximum iteration
• Ic = current iteration
• Schedules for Pc over the run: Uniform, Linear, Inverse Linear, Gaussian
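The slide names four temporal schedules for Pc as a function of Ic, Im, and Pm but does not give their formulas. The shapes below are one plausible reading (assumptions, not the paper's exact definitions): uniform holds Pc at Pm, linear decays it to zero by Im, inverse linear ramps it up, and Gaussian peaks mid-run.

```python
import math

def pc_uniform(pm, ic, im):
    # Constant noise level throughout the run
    return pm

def pc_linear(pm, ic, im):
    # Decays from Pm at iteration 0 to 0 at iteration Im (assumed shape)
    return pm * (1.0 - ic / im)

def pc_inverse_linear(pm, ic, im):
    # Ramps from 0 up to Pm by iteration Im (assumed shape)
    return pm * (ic / im)

def pc_gaussian(pm, ic, im, width=6.0):
    # Peaks at Pm at the run midpoint; 'width' is an assumed shape parameter
    return pm * math.exp(-((ic - im / 2) ** 2) / (2 * (im / width) ** 2))
```

Any of these can be dropped in where a fixed Pc would otherwise be used, recomputing Pc from the current iteration before each instance is permuted.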
Targeted RAIN
• Idea: strategically and automatically avoid destructively adding noise to attributes more likely to be important to classification.
• Probabilistically targets attributes that are more frequently generalized (rather than specified).
• Pc = Pm
• Two implementations (weight lists generated differently):
  – Targeted Generality (TG)
  – Targeted Fitness-Weighted Generality (TFWG)
• Noise generation (same for both implementations):
  – First epoch – no noise
  – Weight list recalculated at the end of each epoch
  – Subtract the minimum weight in the list from all values in the list
  – Determine the number of attributes to be permuted (Random < Pm)
  – Choose each attribute by roulette-wheel selection
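The noise-generation steps above can be sketched as a single selection routine. This is a hedged illustration: the weight list would come from per-attribute generality counts (TG) or fitness-weighted generality (TFWG) measured over the rule population, and the interpretation of "Random < Pm" as one Bernoulli(Pm) trial per attribute is our assumption.

```python
import random

def select_noisy_attributes(weights, pm, rng=random):
    """Pick attribute indices to permute via roulette-wheel selection on shifted weights."""
    # Subtract the minimum weight so the least-generalized attribute gets weight 0
    lo = min(weights)
    shifted = [w - lo for w in weights]
    total = sum(shifted)
    if total == 0:
        return []  # all attributes equally generalized: nothing to target
    # Determine how many attributes to permute: one Bernoulli(Pm) trial each (assumed)
    n_permute = sum(1 for _ in weights if rng.random() < pm)
    chosen = []
    for _ in range(n_permute):
        # Roulette wheel: more frequently generalized attributes are more
        # likely to receive noise, sparing likely-predictive attributes
        r = rng.random() * total
        acc = 0.0
        for idx, w in enumerate(shifted):
            acc += w
            if r <= acc:
                chosen.append(idx)
                break
    return chosen
```

With weights [1, 5, 1], for example, only the middle attribute has positive shifted weight, so it absorbs all of the injected noise while the other two are left untouched.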
Experimental Evaluation
• UCS:
  – Iterations = [50,000, 100,000, 200,000, 500,000]
  – Micro pop. size = 1600
  – Other parameters at default
  – Tracked: training acc., testing acc., generality, macro pop. size, run time, power to find both or a single underlying model
  – Pm = 0.001, 0.01, 0.05, 0.1
• Each dataset:
  – Main-effect free
  – 2× two-locus epistatic interactions (underlying models G1–G4)
  – 20 attributes
  – Balanced
  – Minor allele frequencies = 0.2
  – Heritability = 0.2
  – Mix ratio = 50:50
  – Sample sizes [200, 400, 800, 1600]
  – 20 replicates
• 80 simulated datasets + 10-fold CV → 800 runs of UCS
Conclusions & Future Work
• Incorporation of RAIN with equal attribute probability is ineffective.
• Targeted RAIN was able to reduce over-fitting (significant decrease in training accuracy without reducing testing accuracy).
• Improvements in power (not significant) suggest that RAIN may improve UCS's ability to identify predictive attributes.
• Future work:
  – Try RAIN on datasets with much larger numbers of attributes
  – Consider combining targeted RAIN with temporal models
  – Explore a larger range of Pm values
  – Implement RAIN with an adaptive Pm
Acknowledgements
Jason Moore & Nicholas A. Sinnott-Armstrong
Funding support – NIH: AI59694, LM009012, LM010098
William H. Neukom 1964 Institute for Computational Science at Dartmouth College