2009. 12. 02



2010   1   8
•   Dynamo

               •   more challenging situation

               •   -O binary under Dynamo vs. native -O4 binary

               •   HP PA-8000, HP-UX 10.20 OS



•   ’00

                   •
               •
                   •   Shrink-wrapped software (as a collection of DLLs)

               •                          (Java JIT/                 )

               •
                   •   CISC→RISC→VLIW


Dynamo
               •   1996, HP Labs

               •   dynamic optimization system

               •             Software

               •   transparent

                   •                           (preparatory)

                   •   programmer assistance

                   •   legacy

                   •                    /JIT
details available in...?




•   JIT (Self, Smalltalk-80, Java)

               •   selective dynamic compilation

                   •   user annotation

                   •   language extensions

               •   non-native system emulation

               •   offline binary translation

               •   superscalar microprocessor

               •   trace cache


Cited by 189

               [Figure: number of citations per year, 2000 to 2009]


2009
               •   Efe Yardimci et al., Mostly static program partitioning of binary executables
               •   Ryan W. Moore et al., Addressing the challenges of DBT for the ARM architecture
               •   Borys J. Bradel et al., A study of potential parallelism among traces in Java programs
               •   Tobias Werth et al., Dynamic code footprint optimization for the IBM Cell Broadband Engine
               •   Seung Woo Son et al., A compiler-directed data prefetching scheme for chip multiprocessors
               •   Kim Hazelwood et al., Scalable support for multithreaded applications on dynamic binary
                   instrumentation systems
               •   Mason Chang et al., Tracing for web 3.0: trace compilation for the next generation web applications
               •   Andreas Gal et al., Trace-based just-in-time type specialization for dynamic languages
               •   Florian Brandner, Precise simulation of interrupts using a rollback mechanism
               •   Jason Mars et al., Scenario Based Optimization: A Framework for Statically Enabling Online
                   Optimizations
               •   Jianjun Li et al., An Evaluation of Misaligned Data Access Handling Mechanisms in Dynamic Binary
                   Translation Systems
               •   Naveen Kumar et al., Transparent Debugging of Dynamically Optimized Code
               •   Alessandro Pellegrini et al., Di-DyMeLoR: Logging only Dirty Chunks for Efficient Management of
                   Dynamic Memory Based Optimistic Simulation Objects
               •   Daniel Williams et al., Using program metadata to support SDT in object-oriented applications
               •   Carl Friedrich Bolz et al., Tracing the meta-level: PyPy's tracing JIT compiler


•   Overview of how Dynamo works

               •   Dynamo’s startup mechanism

Overview

                          Dynamo interprets the instruction
                          stream only until a “hot” instruction
                          sequence (trace) is identified.

                          Dynamo then generates an optimized
                          version of the trace (a fragment) into
                          a software code cache (the fragment
                          cache).
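
The two sentences above can be sketched as a small dispatch loop. This is only an illustration under stated assumptions, not Dynamo's implementation: the toy program (a dict mapping an address to its successor), the `HOT_THRESHOLD` value, the per-address counting, and the function name `run` are all made up for the example.

```python
HOT_THRESHOLD = 3  # assumed value for this sketch, not taken from the slides


def run(program, start, steps):
    """Walk `program` (address -> next address) for `steps` instructions.

    Returns a list of (address, mode) pairs, where mode is "interp" while
    the address is still interpreted and "cache" once a fragment exists.
    """
    counters = {}            # how often each address has been interpreted
    fragment_cache = set()   # addresses that already have a fragment
    log = []
    pc = start
    for _ in range(steps):
        if pc in fragment_cache:
            # A fragment exists: run from the code cache, no interpretation.
            log.append((pc, "cache"))
        else:
            log.append((pc, "interp"))
            counters[pc] = counters.get(pc, 0) + 1
            if counters[pc] >= HOT_THRESHOLD:
                # Stand-in for "optimize the trace and emit a fragment".
                fragment_cache.add(pc)
        pc = program[pc]
    return log
```

For a three-instruction loop such as `run({0: 1, 1: 2, 2: 0}, 0, 12)`, the first nine steps are interpreted and the remaining three come from the cache, which is the behavior the overview describes: interpretation cost is paid only until the code becomes hot.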




Startup and Initialization




Fragment Formation

               •   Trace Selection

                   •   not accuracy, but predictability

                   •   the amount of counter updates and counter storage

                   •   interpretation

               •   Trace Optimization

                   •   fall-through direction remains on the trace

               •   Fragment code generation

                   •   emitting the fragment body

                   •   emitting the fragment exit stubs
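
The last two steps (keeping the on-trace direction as the fall-through, then emitting a body and exit stubs) can be sketched as follows. This is a hypothetical illustration: the pseudo-instruction strings (`branch_if_off_trace`, `exit_to_interpreter`, `jump`), the trace representation, and the function name `emit_fragment` are assumptions for the example and do not reflect Dynamo's actual code generator.

```python
def emit_fragment(trace, end_target):
    """Lay out a trace as one straight-line body plus exit stubs.

    trace: list of (block_name, off_trace_target or None); the on-trace
           successor of each block is simply the next list element.
    end_target: where control goes when the end of the trace is reached.
    Returns (body, stubs), two lists of pseudo-instruction strings.
    """
    body, stubs = [], []

    def new_stub(target):
        # Each stub hands control back to the interpreter for `target`.
        name = f"stub{len(stubs)}"
        stubs.append(f"{name}: exit_to_interpreter {target}")
        return name

    for block, off_trace in trace:
        body.append(f"{block}:")
        if off_trace is not None:
            # The branch is oriented so that staying on the trace is the
            # fall-through; only the off-trace direction leaves the body.
            body.append(f"  branch_if_off_trace {new_stub(off_trace)}")
    # The end of the trace also leaves through a stub.
    body.append(f"  jump {new_stub(end_target)}")
    return body, stubs
```

Calling `emit_fragment([("A", "X"), ("B", None), ("C", "Y")], "A")` yields a body in which A, B and C follow each other without taken branches, plus three stubs for X, Y and the end of the trace.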

Trace Selection




               •   interpretation preferred over statistical PC sampling

               •   MRET (most recently executed tail) to pick hot traces
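
A minimal sketch of the MRET idea, under simplifying assumptions: counters are kept only at targets of backward taken branches, and once a counter reaches the threshold the blocks executed next are recorded as the trace until another backward taken branch or a size limit. The input format (a list of block addresses in execution order), the parameters `threshold` and `max_len`, and the function name `select_traces` are hypothetical; other start-of-trace and end-of-trace conditions are left out.

```python
def select_traces(pc_stream, threshold=2, max_len=8):
    """Pick hot traces from a stream of executed block addresses.

    A transfer to an address <= the previous one is treated as a backward
    taken branch, and its target as a start-of-trace candidate.
    Returns {head_address: [block addresses on the recorded trace]}.
    """
    counters = {}     # counts only at start-of-trace candidates
    traces = {}
    recording = None  # (head, blocks) while a tail is being recorded
    prev = None
    for pc in pc_stream:
        backward = prev is not None and pc <= prev
        if recording is not None:
            head, blocks = recording
            if backward or len(blocks) >= max_len:
                # End-of-trace condition: keep what was recorded so far.
                traces[head] = blocks
                recording = None
            else:
                blocks.append(pc)
        if recording is None and backward and pc not in traces:
            counters[pc] = counters.get(pc, 0) + 1
            if counters[pc] >= threshold:
                # Hot head: the tail executed from here becomes the trace.
                recording = (pc, [pc])
        prev = pc
    return traces
```

With the stream `[0, 1, 2, 4, 1, 3, 4, 1, 2, 4, 1, 2, 4]`, the loop head 1 becomes hot on its second backward entry and the path taken right after that, `[1, 2, 4]`, is selected. No per-branch profile is kept, which matches the slide's emphasis on limiting counter updates and counter storage.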




Trace Optimization




Fragment Linking




Fragment Cache Management


               •                                  Cold Traces



               •   Flush when Dynamo's fragment creation rate rises sharply



               •   fragment                Free    Object       GC
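
One way to picture a flush heuristic driven by the fragment creation rate is sketched below. It assumes that a sharp rise after a quiet period is taken as a sign that the working set has changed and that the whole cache is flushed at that point. The interval-based input, the `low` and `high` thresholds, and the function name `flush_points` are hypothetical choices for the example.

```python
def flush_points(creations_per_interval, low=2, high=6):
    """Return the interval indices at which the cache would be flushed.

    creations_per_interval: number of fragments created in each interval.
    A flush is triggered when the rate jumps to `high` or more after at
    least one quiet interval (rate of `low` or less).
    """
    flushes = []
    was_quiet = False
    for i, created in enumerate(creations_per_interval):
        if was_quiet and created >= high:
            # Sharp rise after a stable period: drop all fragments at once.
            flushes.append(i)
            was_quiet = False
        elif created <= low:
            was_quiet = True
    return flushes
```

For the series `[8, 5, 1, 0, 1, 7, 3, 1, 9]` the sketch flushes at intervals 5 and 8. The initial burst at interval 0 is not a flush because no quiet period precedes it.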




Fragment Cache Management




Signal Handling


               •            fragment            Signal

               •   signal                   /

               •   fragment



               •   Prototype

                   •   Conservative/Aggressive



Speedup ratio

               •   150 kB fragment cache

               •   startup overhead

               •   the lack of a stable working set
Bailing-out
               bail-out       0



