Financial Data Mining with Genetic Programming:
a Survey and Look Forward

Nicolas NAVET – INRIA
France
nnavet@loria.fr

Shu-Heng CHEN – AIECON/NCCU
Taiwan
chchen@nccu.edu.tw

ISI 2007 - 08/23/2007




Genetic programming

GP is the process of evolving a population of computer programs, which are
candidate solutions, according to evolutionary principles.

Generate a population of random programs → Evaluate their quality (“fitness”) → Solution

Create better programs by applying genetic operators, e.g.
  - mutation
  - combination (“crossover”)




In GP, programs are represented by trees

Trading system: buy if abs(Close(t)/0.7748) < Close(t − 218)

[Figure: tree representation of the trading rule; internal nodes are functions, leaves are terminals]
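As an illustration, the rule on this slide could be stored and evaluated as a tree along the following lines. This is only a sketch: the nested-tuple encoding, the `FUNCTIONS` table, the `evaluate` helper and the price list are assumptions made for the example, not the representation used by the authors.

```python
import operator

# Function set: name -> (arity, implementation). Illustrative only.
FUNCTIONS = {
    "abs": (1, abs),
    "/": (2, operator.truediv),
    "<": (2, operator.lt),
}

# The rule from the slide as a nested tuple: (function, child, child, ...).
# Terminals are numeric constants or ("Close", lag), i.e. Close(t - lag).
RULE = ("<",
        ("abs", ("/", ("Close", 0), 0.7748)),
        ("Close", 218))

def evaluate(node, closes, t):
    """Recursively evaluate a tree at time index t on a list of closing prices."""
    if isinstance(node, (int, float)):
        return node                       # constant terminal
    if node[0] == "Close":
        return closes[t - node[1]]        # price terminal, lagged by node[1] days
    arity, func = FUNCTIONS[node[0]]
    args = [evaluate(child, closes, t) for child in node[1:1 + arity]]
    return func(*args)

# Made-up prices: flat at 10 for 218 days, then a drop to 7.
closes = [10.0] * 218 + [7.0]
buy = evaluate(RULE, closes, t=218)       # abs(7 / 0.7748) < 10, so the rule says buy
```

Keeping functions in a lookup table means that enlarging the function set only requires adding entries to `FUNCTIONS`.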




Typical genetic operator: standard crossover

Standard crossover: exchange two randomly chosen sub-trees between the parents

[Figure: two parent trees combined by sub-tree exchange]
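A minimal sketch of sub-tree crossover, assuming trees are stored as nested lists whose first element is the function name. The helper names (`subtree_paths`, `get_subtree`, `set_subtree`, `crossover`) and the second parent are hypothetical, and crossover points are picked uniformly over all nodes, which is one simple choice among several.

```python
import copy
import random

def subtree_paths(tree, path=()):
    """Return the index path to every node of a tree stored as nested lists."""
    paths = [path]
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):   # tree[0] is the function name
            paths.extend(subtree_paths(child, path + (i,)))
    return paths

def get_subtree(tree, path):
    """Follow an index path down to a node."""
    for i in path:
        tree = tree[i]
    return tree

def set_subtree(tree, path, new):
    """Replace the node at `path` by `new` and return the resulting tree."""
    if not path:
        return new                        # replacing the root
    parent = get_subtree(tree, path[:-1])
    parent[path[-1]] = new
    return tree

def crossover(parent_a, parent_b, rng=random):
    """Swap one randomly chosen sub-tree of each parent; the parents are left untouched."""
    a, b = copy.deepcopy(parent_a), copy.deepcopy(parent_b)
    path_a = rng.choice(subtree_paths(a))
    path_b = rng.choice(subtree_paths(b))
    sub_a = copy.deepcopy(get_subtree(a, path_a))
    sub_b = copy.deepcopy(get_subtree(b, path_b))
    return set_subtree(a, path_a, sub_b), set_subtree(b, path_b, sub_a)

p1 = ["<", ["abs", ["/", "Close(t)", 0.7748]], "Close(t-218)"]
p2 = ["+", "Close(t-1)", ["*", 2.0, "Close(t-5)"]]      # made-up second parent
child1, child2 = crossover(p1, p2, random.Random(0))
```

Because only sub-trees are exchanged, the two children together contain exactly as many nodes as the two parents.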




Strong points of GP

- Solutions are produced in a symbolic form that can be analyzed by humans
- GP does not assume a predefined size and shape: it creates both the
  functional form and the parameters’ values
- “Ability to produce a large number of different, yet meaningful hypotheses
  .. that are non-intuitive and sometimes provocative” [Kei02]




GP in the financial domain

1. Knowledge discovery: results are scarce
    - Agent-based modeling: study the evolution of a population of decision rules
    - Testing the EMH in real and artificial markets

2. Financial trading:
    - Composing portfolios
    - Evolving the structure of NNs used for prediction
    - Predicting price evolution
    - Discovering trading rules




Discovering trading rules: the big picture

Training interval            Validation interval            Out-of-sample interval
1) Creation of the trading   Further selection on unseen    Performance evaluation
   rules using GP            data: one strategy is chosen
2) Selection of the best     for out-of-sample
   resulting strategies
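The three-interval scheme can be sketched as a chronological split of the price series. The function name `split_intervals` and the 50/25/25 proportions are assumptions chosen for illustration; the slides do not state the interval lengths.

```python
def split_intervals(series, train_frac=0.5, valid_frac=0.25):
    """Split a chronologically ordered series into three contiguous intervals.

    The fractions are illustrative assumptions, not values from the slides.
    """
    n = len(series)
    train_end = int(n * train_frac)
    valid_end = train_end + int(n * valid_frac)
    return series[:train_end], series[train_end:valid_end], series[valid_end:]

prices = list(range(100))                 # placeholder for daily closing prices
train, valid, test = split_intervals(prices)
```

The split preserves time order, so rules are always selected and evaluated on data that comes after the data they were evolved on.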




Improvements ahead of us (1/2)

1. Rigorous assessment of the GP outcomes: controlling the data-mining bias!
2. Selecting the right time series: markets can be efficient
3. Reducing the variability of the results from GP run to GP run
4. Re-thinking the data-division scheme for training, validation and
   testing periods




Improvements ahead of us (2/2)

5. Pre-processing the data ?!?
6. Re-thinking fitness functions: GP-friendly, sensitivity and risk adjusted, …
7. Embedding more domain-specific knowledge: the GP function set is still very
   primitive ..




1. Rigorous assessment of the GP outcomes




GP’s outcomes on the training interval (1/2)

Assume an “inefficient” solution leads to a profitable trade with
probability 0.5.

Probability that an inefficient system achieves at least a given success
rate for a given number of trades:

                     Number of trades
Success rate      10        50           100
60%              0.38      0.1          0.03
70%              0.17    3 · 10^-3    4 · 10^-5

Guideline: penalize or discard systems with few trades
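The probabilities in this table are consistent with a binomial tail: the chance that at least a given fraction of n trades is profitable when each trade wins independently with probability 0.5. The sketch below reproduces that calculation; the function name `prob_success_rate` and the independence assumption are introduced for the example.

```python
from math import ceil, comb

def prob_success_rate(n_trades, rate, p=0.5):
    """P(at least `rate` of `n_trades` are profitable) when each trade wins with probability p."""
    # Smallest number of winning trades that reaches the rate (epsilon guards float rounding).
    k_min = ceil(rate * n_trades - 1e-9)
    return sum(comb(n_trades, k) * p**k * (1 - p)**(n_trades - k)
               for k in range(k_min, n_trades + 1))

for rate in (0.6, 0.7):
    print(rate, [f"{prob_success_rate(n, rate):.2g}" for n in (10, 50, 100)])
```

With 10 trades, for instance, a 60% success rate means at least 6 wins, which happens with probability 386/1024, or about 0.38.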




GP’s outcomes on the training interval (2/2)

Probability that at least one inefficient system achieves a success rate
≥ 70% for a given number of trades and number of solutions tested:

                                Number of trades
Number of solutions tested     10       50       100
100                             1      0.28     0.004
1000                            1      0.96     0.038
50000                           1       1       0.85

NB: in a typical GP run, 50000 solutions are tested and the average number
of trades is usually small …
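Under the additional assumption that the tested solutions are independent, the probability that at least one of N inefficient systems reaches the target is 1 − (1 − p)^N, where p is the single-system binomial tail probability. The sketch below computes it; the function names are hypothetical.

```python
from math import comb

def tail_prob(n_trades, k_min, p=0.5):
    """P(at least k_min winning trades out of n_trades)."""
    return sum(comb(n_trades, k) * p**k * (1 - p)**(n_trades - k)
               for k in range(k_min, n_trades + 1))

def prob_at_least_one(n_solutions, n_trades, k_min):
    """P(at least one of n_solutions independent inefficient systems reaches k_min wins)."""
    return 1.0 - (1.0 - tail_prob(n_trades, k_min)) ** n_solutions

# A 70% success rate means 7, 35 and 70 winning trades out of 10, 50 and 100.
for n_solutions in (100, 1000, 50000):
    row = [prob_at_least_one(n_solutions, n, k)
           for n, k in ((10, 7), (50, 35), (100, 70))]
    print(n_solutions, [f"{x:.3f}" for x in row])
```

The calculation makes the data-mining bias concrete: even a small per-system probability becomes close to certainty once tens of thousands of candidates are tried.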




GP’s outcomes on the testing period [ChNa07]

- Compare GP with several variants of:
    - Random search algorithms: “Zero-Intelligence Strategies” (ZIS)
    - Random trading behaviors: “Lottery trading” (LT)
- Issue: how to best constrain randomness?
- Statistical hypothesis testing:
    - Null: GP does not outperform ZIS
    - Null: GP does not outperform LT




2. Selecting the Right Time Series

Experiments [CIEF2007]: does low entropy imply better profitability of
GP-induced trading rules?

NYSE US 100 stocks, daily data from 2000 to 2006




Experimental setup

- Entropy rate estimator: Kontoyiannis et al. 1998
- Log returns: $r_t = \ln\left(\frac{p_t}{p_{t-1}}\right)$
- Discretization: $\{r_t\} \in \mathbb{R} \rightarrow \{A_t\} \in \mathbb{N}$, e.g. 3,4,1,0,2,6,2,…
- Alphabet of size 8, with an equal number of values in each bin: max. theoretical entropy = log2(8) = 3
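The pre-processing steps can be sketched as follows: compute log returns, then map them to 8 symbols with equal-frequency bins. Note that the entropy function here is a plain plug-in Shannon entropy of the symbol frequencies, used only to show where the maximum of 3 comes from; it is not the Kontoyiannis et al. entropy-rate estimator used in the experiments. The function names and the price series are made up.

```python
from collections import Counter
from math import log, log2

def log_returns(prices):
    """r_t = ln(p_t / p_{t-1}) for consecutive prices."""
    return [log(b / a) for a, b in zip(prices, prices[1:])]

def discretize(returns, n_bins=8):
    """Map each return to a symbol in 0..n_bins-1 using equal-frequency bins."""
    order = sorted(range(len(returns)), key=lambda i: returns[i])
    symbols = [0] * len(returns)
    for rank, i in enumerate(order):
        symbols[i] = rank * n_bins // len(returns)
    return symbols

def plugin_entropy(symbols):
    """Shannon entropy in bits of the symbol frequencies (not an entropy-rate estimator)."""
    n = len(symbols)
    return -sum(c / n * log2(c / n) for c in Counter(symbols).values())

# Made-up price series with 801 points, i.e. 800 returns.
prices = [100.0 + 0.37 * ((t * 7919) % 101) for t in range(801)]
symbols = discretize(log_returns(prices))
print(plugin_entropy(symbols))   # equal-frequency bins put this at the 3-bit maximum
```

Because every bin receives the same number of values, the symbol frequencies alone carry no information; any entropy below 3 must come from the order of the symbols, which is what an entropy-rate estimator measures.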




Entropy of NYSE US 100 stocks – period 2000-2006

[Figure: histogram of the entropy of the 100 stocks; x-axis: entropy (2.66 to 2.80), y-axis: density]

Mean = Median = 2.75
Max = 2.79
Min = 2.68
Rand() boost = 2.96
Rand() C lib = 2.77 !

NB: a normal distribution of same mean and standard deviation is
plotted for comparison.




Entropy is high but price time series are not random!

[Figure: two density histograms of entropy, x-axis from 2.65 to 2.85. Left panel: Entropy (original data), original time series. Right panel: Entropy (shuffled data), randomly shuffled time series]




Stocks in the distribution’s tails

Highest entropy time series
Symbol    Entropy
OXY       2.789
VLO       2.787
MRO       2.785
BAX       2.78
WAG       2.776

Lowest entropy time series
Symbol    Entropy
TWX       2.677
EMC       2.694
C         2.712
JPM       2.716
GE        2.723




Autocorrelation analysis

[Figure: autocorrelation plots. Left panel: high complexity stock (OXY). Right panel: low complexity stock (C)]

Up to lag 100, there are 2.7 times more autocorrelations outside the 99%
confidence bands for the lowest entropy stocks than for the highest entropy
stocks
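Counting autocorrelations outside the confidence bands can be sketched as below. It assumes the usual white-noise approximation for the 99% band, ±2.576/√n; the slides do not say how the bands were computed, and the function names and synthetic series are made up for the example.

```python
import random
from math import sqrt

def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    cov = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return cov / var

def count_outside_bands(x, max_lag=100, z=2.576):
    """Number of lags in 1..max_lag whose autocorrelation lies outside +/- z / sqrt(n)."""
    band = z / sqrt(len(x))
    return sum(abs(autocorrelation(x, lag)) > band for lag in range(1, max_lag + 1))

rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(1500)]    # i.i.d. series
ar1 = [0.0]
for e in noise[1:]:
    ar1.append(0.5 * ar1[-1] + e)                     # serially dependent series
print(count_outside_bands(noise), count_outside_bands(ar1))
```

Applied to the return series of each stock, the ratio of the two counts gives a figure comparable to the 2.7 quoted above.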




BDS tests: are daily log price changes i.i.d.?

Lowest entropy time series
m    δ    TWX      EMC      C        JPM      GE
2    1    18.06    14.21    13.9     11.82    11.67
3    1    22.67    19.54    18.76    16.46    16.34
5    1    34.18    29.17    28.12    26.80    24.21

Highest entropy time series
m    δ    OXY      VLO      MRO      BAX      WAG
2    1    5.66     4.17     6.69     8.13     7.45
3    1    6.61     5.35     9.40     11.11    8.89
5    1    9.04     6.88     13.08    15.31    11.17

The null that log price changes are i.i.d. is always rejected at the 1% level
but, whatever the BDS parameters, rejection is much stronger for
low-entropy stocks




Results: surprisingly ..

On high-entropy stocks:
- GP is always profitable
- LT is never better than GP (95% confidence level)
- GP outperforms LT 2 times out of 5 (95% C.L.)

On low-entropy stocks:
- GP is never better than LT (95% C.L.)
- LT outperforms GP 2 times out of 5 (95% C.L.)




Explanations (1/2)

GP is not good when the training period is very different from the
out-of-sample period, e.g.

[Figure: price series from 2000 to 2006. Left panel: typical low complexity stock (EMC). Right panel: typical high complexity stock (MRO)]




Explanations (2/2)

The 2 cases where GP outperforms LT: the training period is quite similar
to the out-of-sample period

[Figure: price series from 2000 to 2006. Left panel: BAX. Right panel: WAG]




4. Re-thinking data division scheme




Data division scheme

There is ample evidence that GP performs poorly when the training interval
differs from the out-of-sample interval …

What is needed: a characterization of the market condition, i.e. a
similarity measure

Re-learning is triggered when similarity or performance falls below a
threshold




6. Re-thinking fitness functions




Rethinking fitness functions

[Figure: from [LaPo02]]

Issue 1: some fitness functions induce a “difficult” landscape for GP → GP-friendly fitness

Issue 2: a few lucky trades alone may lead to an outstanding return → risk-adjusted fitness

Issue 3: solutions located on peaks of the fitness landscape are not robust out-of-sample → sensitivity-adjusted fitness
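To illustrate Issue 2, the sketch below compares a raw total-return fitness with one possible risk-adjusted fitness, a Sharpe-like ratio of mean to standard deviation of per-trade returns. The slides do not specify which risk adjustment is meant, so this choice, the function names and the two return series are assumptions.

```python
from statistics import mean, stdev

def total_return_fitness(trade_returns):
    """Raw fitness: sum of per-trade returns."""
    return sum(trade_returns)

def risk_adjusted_fitness(trade_returns):
    """Mean per-trade return divided by its standard deviation (Sharpe-like ratio)."""
    if len(trade_returns) < 2:
        return 0.0                        # too few trades to estimate dispersion
    sd = stdev(trade_returns)
    return mean(trade_returns) / sd if sd > 0 else 0.0   # guard against division by zero

# Two made-up strategies with the same total return.
lucky = [-0.01] * 9 + [0.30]              # one outsized win among small losses
steady = [0.02, 0.03, 0.01, 0.02, 0.03, 0.02, 0.01, 0.03, 0.02, 0.02]

print(total_return_fitness(lucky), total_return_fitness(steady))
print(risk_adjusted_fitness(lucky), risk_adjusted_fitness(steady))
```

Both strategies earn the same total return, but the ratio ranks the steady one far higher, so a single lucky trade no longer dominates selection.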




7. Embedding more domain-specific knowledge




Embedding more domain-specific knowledge

- Choice of the function/terminal sets is crucial, but there are no
  guidelines. 2 risks:
    - Extraneous functions
    - Required functions not available
- As yet, GP uses a very primitive language:
    - Enrich the primitive set with volume, indexes, bid/ask spread, …
    - Enrich the function set with cross-correlation, predictability measure, …




References (1/2)
[ChKuHo06] S.-H. Chen, T.-W. Kuo and K.-M. Hoi, “Genetic Programming and
Financial Trading: How Much about "What we Know"”, 4th NTU International
Conference on Economics, Finance and Accounting, April 2006.
[ChNa06] S.-H. Chen and N. Navet, “Pretests for genetic-programming evolved
trading programs: “zero-intelligence” strategies and lottery trading”,
Proc. ICONIP’2006, Hong-Kong, October 2006.
[ChNa07] S.-H. Chen, N. Navet, “Failure of Genetic-Programming Induced
Trading Strategies: Distinguishing between Efficient Markets and
Inefficient Algorithms”, Chapter 8, Evolutionary Computation in Economics
and Finance: Volume 2, Springer, ISBN 3540728201, 2007.
[NaCh07] N. Navet, S.-H. Chen, “Entropy rate and profitability of technical
analysis: experiments on the NYSE US 100 stocks”, 6th International
Conference on Computational Intelligence in Economics & Finance
(CIEF2007), Salt-Lake City, USA, July 2007.
[Kab02] M. Kaboudan, “GP Forecasts of Stock Prices for Profitable Trading”,
Evolutionary computation in economics and finance, Kluwers, 2002.




References (2/2)
[SaTe02] M. Santini, A. Tettamanzi, “Genetic Programming for Financial
Series Prediction”, Proceedings of EuroGP'2001, 2001.
[BhPiZu02] S. Bhattacharyya, O. V. Pictet, G. Zumbach, “Knowledge-Intensive
Genetic Discovery in Foreign Exchange Markets”, IEEE Transactions on
Evolutionary Computation, vol. 6, n° 2, April 2002.
[LaPo02] W.B. Langdon, R. Poli, “Foundations of Genetic Programming”,
Springer Verlag, 2002.
[Kab00] M. Kaboudan, “Genetic Programming Prediction of Stock Prices”,
Computational Economics, vol. 16, 2000.
[Wag03] L. Wagman, “Stock Portfolio Evaluation: An Application of
Genetic-Programming-Based Technical Analysis”, Genetic Algorithms and
Genetic Programming at Stanford 2003, 2003.
[Dem05] I. Dempsey, “Constant Generation for the Financial Domain using
Grammatical Evolution”, Proceedings of the 2005 workshops on Genetic and
evolutionary computation 2005, pp 350 – 353, Washington, June 25 - 26, 2005.
[Kei02] M. Keijzer, “Scientific discovery using Genetic Programming”,
PhD Thesis, DTU, Lyngby, Denmark, 2002.




08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
Kalyanpur ) Call Girls in Lucknow Finest Escorts Service 🍸 8923113531 🎰 Avail...
 
Google AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAGGoogle AI Hackathon: LLM based Evaluator for RAG
Google AI Hackathon: LLM based Evaluator for RAG
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 

Isi2007 nn shc_2007

  • 1. Financial Data Mining with Genetic Programming: a Survey and Look Forward
    Nicolas NAVET – INRIA, France, nnavet@loria.fr
    Shu-Heng CHEN – AIECON/NCCU, Taiwan, chchen@nccu.edu.tw
    ISI 2007 - 08/23/2007

    Genetic programming
    GP is the process of evolving a population of computer programs, that are candidate solutions, according to the evolutionary principles.
    1) Generate a population of random programs
    2) Evaluate their quality (“fitness”); stop when a solution is found
    3) Create better programs by applying genetic operators, e.g.
       - mutation
       - combination (“crossover”)
       then evaluate again
  • 2. In GP, programs are represented by trees
    Trading system: buy if abs(Close(t)/0.7748) < Close(t − 218)
    [Figure: tree of the trading rule; internal nodes are functions, leaves are terminals]

    Typical genetic operator: standard crossover
    Standard crossover: exchange two randomly chosen sub-trees among the parents
    [Figure: two parent trees exchanging sub-trees]
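The tree representation and standard crossover can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the nested-tuple encoding, the names `lag`, `lt`, `div`, `evaluate` and `crossover` are all assumptions made for the example, and `("lag", k)` is treated as a terminal standing for Close(t − k).

```python
import random

# A tree is a number, or a tuple (name, child, ...).
# ("lag", k) is treated as a terminal meaning Close(t - k)  (hypothetical encoding).
RULE = ("lt", ("abs", ("div", ("lag", 0), 0.7748)), ("lag", 218))

def evaluate(node, close, t):
    """Evaluate a tree on the price list `close` at time t (t >= largest lag)."""
    if not isinstance(node, tuple):
        return node
    name, *args = node
    if name == "lag":
        return close[t - args[0]]
    vals = [evaluate(a, close, t) for a in args]
    if name == "abs":
        return abs(vals[0])
    if name == "div":
        return vals[0] / vals[1]
    if name == "lt":
        return vals[0] < vals[1]
    raise ValueError(f"unknown function: {name}")

def subtree_paths(node, path=()):
    """Yield the path (tuple of child indices) to every sub-tree."""
    yield path
    if isinstance(node, tuple) and node[0] != "lag":
        for i, child in enumerate(node[1:], start=1):
            yield from subtree_paths(child, path + (i,))

def get(node, path):
    """Return the sub-tree found by following `path` from `node`."""
    for i in path:
        node = node[i]
    return node

def replace(node, path, new):
    """Return a copy of `node` with the sub-tree at `path` replaced by `new`."""
    if not path:
        return new
    i = path[0]
    return node[:i] + (replace(node[i], path[1:], new),) + node[i + 1:]

def crossover(a, b, rng):
    """Standard crossover: swap one randomly chosen sub-tree of each parent."""
    pa = rng.choice(list(subtree_paths(a)))
    pb = rng.choice(list(subtree_paths(b)))
    return replace(a, pa, get(b, pb)), replace(b, pb, get(a, pa))
```

Calling `evaluate(RULE, close, t)` returns the buy signal at time `t`. The sketch does not enforce types, so crossover may place a comparison below an arithmetic function; a real system would need typed crossover or a closed function set.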
  • 3. Strong points of GP
    - Solutions are produced under a symbolic form that can be analyzed by humans
    - GP does not assume a predefined size and shape: it creates both the functional form and the parameters’ values
    - “Ability to produce a large number of different, yet meaningful hypotheses .. that are non-intuitive and sometimes provocative” [Kei02]

    G.P. in the financial domain
    1. Knowledge discovery: results are scarce
       - Agent based modeling: study the evolution of a population of decision rules
       - Testing the EMH in real and artificial markets
    2. Financial trading:
       - Composing portfolios
       - Evolving structure of NN used for prediction
       - Predicting price evolution
       - Discovering trading rules
  • 4. Discovering trading rules: the big picture
    Training interval:
      1) Creation of the trading rules using GP
      2) Selection of the best resulting strategies
    Validation interval:
      Further selection on unseen data; one strategy is chosen for out-of-sample
    Out-of-sample interval:
      Performance evaluation

    Improvements ahead of us (1/2)
    1. Rigorous assessment of the GP outcomes: controlling the data-mining bias!
    2. Selecting the right time series: market can be efficient
    3. Reducing variability of the results from GP run to GP run
    4. Re-thinking the data-division scheme for training, validation and testing periods
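As a rough sketch of the three-interval scheme above, the series can be cut chronologically and the validation interval used to pick a single strategy. The split fractions and the names `chronological_split`, `pick_strategy` and `fitness` are hypothetical; the slides do not specify them.

```python
def chronological_split(series, train_frac=0.5, valid_frac=0.25):
    """Cut a time-ordered series into training, validation and out-of-sample parts.

    The fractions are illustrative assumptions, not values from the slides.
    """
    n = len(series)
    a = int(n * train_frac)
    b = a + int(n * valid_frac)
    return series[:a], series[a:b], series[b:]

def pick_strategy(candidates, fitness, validation):
    """Choose the one candidate that scores best on the unseen validation data.

    `candidates` are the best strategies from the training interval and
    `fitness(strategy, data)` is any user-supplied scoring function.
    """
    return max(candidates, key=lambda s: fitness(s, validation))
```

The chosen strategy is then evaluated once on the out-of-sample part, which must play no role in either selection step.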
  • 5. Improvements ahead of us (2/2)
    5. Pre-processing the data ?!?
    6. Re-thinking fitness functions: GP-friendly, sensitivity and risk adjusted, …
    7. Embedding more domain specific knowledge: GP function set is still very primitive ..

    1. Rigorous assessment of the GP outcomes
  • 6. GP’s outcomes on the training interval (1/2)
    Assume an “inefficient” solution leads to a profitable trade with probability 0.5.
    Probability that an inefficient system achieves a given success rate for a given number of trades:

      Success rate    Number of trades
                      10      50           100
      60%             0.38    0.1          0.03
      70%             0.17    3 · 10^−3    4 · 10^−5

    Guideline: penalize or discard systems with few trades

    GP’s outcomes on the training interval (2/2)
    Probability that at least one inefficient system achieves a success rate = 70% for a given number of solutions:

      Number of solutions tested    Number of trades
                                    10    50      100
      100                           1     0.28    0.004
      1000                          1     0.96    0.38
      50000                         1     1       0.85

    NB: in a typical GP run, 50000 solutions are tested and the average number of trades is usually small …
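The figures above can be explored with a binomial model. The sketch below assumes that each trade is an independent coin flip with success probability 0.5, that "achieves a success rate" means "at least that rate", and that the tested solutions are independent of one another; the function names are hypothetical. Under these assumptions it reproduces the first table to rounding, and the second formula is one way to arrive at figures of the kind shown in the second table.

```python
from math import comb

def p_single(n_trades: int, rate_pct: int) -> float:
    """P(a coin-flip system wins at least rate_pct % of n_trades trades)."""
    # Smallest number of winning trades that reaches the rate (integer ceiling).
    min_wins = -((-n_trades * rate_pct) // 100)
    favourable = sum(comb(n_trades, k) for k in range(min_wins, n_trades + 1))
    return favourable / 2 ** n_trades

def p_at_least_one(n_trades: int, rate_pct: int, n_solutions: int) -> float:
    """P(at least one of n_solutions independent coin-flip systems reaches the rate)."""
    return 1.0 - (1.0 - p_single(n_trades, rate_pct)) ** n_solutions

if __name__ == "__main__":
    for rate in (60, 70):
        print(rate, [p_single(n, rate) for n in (10, 50, 100)])
    for n_solutions in (100, 1000, 50000):
        print(n_solutions, [p_at_least_one(n, 70, n_solutions) for n in (10, 50, 100)])
```

The independence assumption between solutions is a simplification: programs in a GP population share sub-trees, so the effective number of independent trials is likely smaller than the number of solutions tested.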
  • 7. GP’s outcomes on the testing period [ChNa07]
    Compare GP with several variants of:
    - Random search algorithms: “Zero-Intelligence Strategies” (ZIS)
    - Random trading behaviors: “Lottery trading” (LT)
    Issue: how to best constrain randomness?
    Statistical hypothesis testing:
    - Null: GP does not outperform ZIS
    - Null: GP does not outperform LT

    2. Selecting the Right Time Series
    Experiments [CIEF2007]: does low entropy imply better profitability of GP-induced trading rules?
    NYSE US 100 stocks, daily data from 2000 to 2006
  • 8. Experimental setup
    Entropy rate estimator: Kontoyiannis et al. 1998
    Log returns: $r_t = \ln(p_t / p_{t-1})$
    Discretization: $\{r_t\} \in \mathbb{R} \rightarrow \{A_t\} \in \mathbb{N}$, e.g. 3,4,1,0,2,6,2,…
    Alphabet of size 8, with an equal number of values in each bin; max. theoretical entropy = 3

    Entropy of NYSE US 100 stocks – period 2000-2006
    [Figure: histogram of entropy (x-axis, 2.66 to 2.80) against density (y-axis, 0 to 25)]
    Mean = Median = 2.75, Max = 2.79, Min = 2.68
    For comparison: Rand() boost = 2.96, Rand() C lib = 2.77 !
    NB: a normal distribution of same mean and standard deviation is plotted for comparison.
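The pre-processing step (log returns, then an 8-symbol alphabet with an equal number of values in each bin) can be sketched as follows. The function names are hypothetical. The entropy function here is only the plain Shannon entropy of the symbol frequencies, included to show why the maximum is log2(8) = 3 bits; it is not the Kontoyiannis et al. entropy rate estimator used in the experiments, which also accounts for dependence between successive symbols.

```python
import math
from collections import Counter

def log_returns(prices):
    """r_t = ln(p_t / p_{t-1}) for consecutive prices."""
    return [math.log(p / q) for q, p in zip(prices, prices[1:])]

def discretize(returns, n_bins=8):
    """Equal-frequency binning: map each return to a symbol in 0..n_bins-1 by rank."""
    order = sorted(range(len(returns)), key=returns.__getitem__)
    symbols = [0] * len(returns)
    for rank, idx in enumerate(order):
        symbols[idx] = rank * n_bins // len(returns)
    return symbols

def symbol_entropy(symbols):
    """Shannon entropy (bits) of the symbol frequencies; NOT an entropy rate."""
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in Counter(symbols).values())
```

Because the bins are equally populated by construction, `symbol_entropy` is close to 3 for any series; values below 3, such as those reported on the slides, can only come from an estimator that looks at sequences of symbols rather than at single symbols.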
  • 9. Entropy is high but price time series are not random!
    [Figure: two histograms of entropy (x-axis, 2.65 to 2.85) against density (y-axis, 0 to 50). Left: original time series. Right: randomly shuffled time series.]

    Stocks in the distribution’s tails
    Highest entropy time series:
      Symbol    Entropy
      OXY       2.789
      VLO       2.787
      MRO       2.785
      BAX       2.78
      WAG       2.776
    Lowest entropy time series:
      Symbol    Entropy
      TWX       2.677
      EMC       2.694
      C         2.712
      JPM       2.716
      GE        2.723
  • 10. Autocorrelation analysis
    [Figure: autocorrelation plots for a high complexity stock (OXY) and a low complexity stock (C)]
    Up to a lag of 100, there are 2.7 times more autocorrelations outside the 99% confidence bands for the lowest entropy stocks than for the highest entropy stocks.

    BDS tests: are daily log price changes i.i.d.?
    Lowest entropy time series:
      m   δ   TWX     EMC     C       JPM     GE
      2   1   18.06   14.21   13.9    11.82   11.67
      3   1   22.67   19.54   18.76   16.46   16.34
      5   1   34.18   29.17   28.12   26.80   24.21
    Highest entropy time series:
      m   δ   OXY     VLO     MRO     BAX     WAG
      2   1   5.66    4.17    6.69    8.13    7.45
      3   1   6.61    5.35    9.40    11.11   8.89
      5   1   9.04    6.88    13.08   15.31   11.17
    The null that log price changes are i.i.d. is always rejected at the 1% level but, whatever the BDS parameters, the rejection is much stronger for the low-entropy stocks, whose test statistics are larger.
  • 11. Results: surprisingly ..
    On high-entropy stocks:
    - GP is always profitable
    - LT is never better than GP (95% confidence level)
    - GP outperforms LT 2 times out of 5 (95% C.L.)
    On low-entropy stocks:
    - GP is never better than LT (95% C.L.)
    - LT outperforms GP 2 times out of 5 (95% C.L.)

    Explanations (1/2)
    GP is not good when the training period is very different from out-of-sample, e.g.:
    [Figure: price series 2000-2006 for a typical low complexity stock (EMC) and a typical high complexity stock (MRO)]
  • 12. Explanations (2/2)
    The 2 cases where GP outperforms LT: training quite similar to out-of-sample
    [Figure: price series 2000-2006 for BAX and WAG]

    4. Re-thinking data division scheme
  • 13. Data division scheme
    There is multiple evidence that GP performs poorly when the training interval differs from the out-of-sample interval …
    What is needed:
    - a characterization of the market condition, i.e. a similarity measure
    - re-learning triggered when similarity or performance falls below a threshold

    5. Re-thinking fitness functions
  • 14. Rethinking fitness functions from [LaPo02]
    Issue 1: some fitness functions induce a “difficult" landscape for GP → GP-friendly fitness
    Issue 2: a few lucky trades alone may lead to an outstanding return → risk-adjusted fitness
    Issue 3: solutions located on peaks of the fitness landscape are not robust out-of-sample → sensitivity-adjusted fitness

    7. Embedding more domain specific knowledge
  • 15. Embedding more domain specific knowledge
    Choice of the function/terminal sets is crucial, and there are no guidelines. 2 risks:
    - Extraneous functions
    - Required functions not available
    As yet, GP uses a very primitive language:
    - Enrich primitive set with volume, indexes, bid/ask spread, …
    - Enrich function set with cross-correlation, predictability measure, …

    References (1/2)
    [ChKuHo06] S.-H. Chen, T.-W. Kuo and K.-M. Hoi, “Genetic Programming and Financial Trading: How Much about "What we Know"”, in 4th NTU International Conference on Economics, Finance and Accounting, April 2006.
    [ChNa06] S.-H. Chen and N. Navet, “Pretests for genetic-programming evolved trading programs: “zero-intelligence” strategies and lottery trading”, Proc. ICONIP’2006, Hong-Kong, October 2006.
    [ChNa07] S.-H. Chen, N. Navet, "Failure of Genetic-Programming Induced Trading Strategies: Distinguishing between Efficient Markets and Inefficient Algorithms", Chapter 8, Evolutionary Computation in Economics and Finance: Volume 2, Springer, ISBN 3540728201, 2007.
    [NaCh07] N. Navet, S.-H. Chen, "Entropy rate and profitability of technical analysis: experiments on the NYSE US 100 stocks", 6th International Conference on Computational Intelligence in Economics & Finance (CIEF2007), Salt-Lake City, USA, July 2007.
    [Kab02] M. Kaboudan, “GP Forecasts of Stock Prices for Profitable Trading”, Evolutionary computation in economics and finance, Kluwer, 2002.
  • 16. References (2/2)
    [SaTe02] M. Santini, A. Tettamanzi, “Genetic Programming for Financial Series Prediction”, Proceedings of EuroGP'2001, 2001.
    [BhPiZu02] S. Bhattacharyya, O. V. Pictet, G. Zumbach, “Knowledge-Intensive Genetic Discovery in Foreign Exchange Markets”, IEEE Transactions on Evolutionary Computation, vol. 6, n° 2, April 2002.
    [LaPo02] W.B. Langdon, R. Poli, “Foundations of Genetic Programming”, Springer Verlag, 2002.
    [Kab00] M. Kaboudan, “Genetic Programming Prediction of Stock Prices”, Computational Economics, vol. 16, 2000.
    [Wag03] L. Wagman, “Stock Portfolio Evaluation: An Application of Genetic-Programming-Based Technical Analysis”, Genetic Algorithms and Genetic Programming at Stanford 2003, 2003.
    [Dem05] I. Dempsey, “Constant Generation for the Financial Domain using Grammatical Evolution”, Proceedings of the 2005 workshops on Genetic and evolutionary computation 2005, pp. 350-353, Washington, June 25-26, 2005.
    [Kei02] M. Keijzer, “Scientific discovery using Genetic Programming”, PhD Thesis, DTU, Lyngby, Denmark, 2002.

    ?