SlideShare a Scribd company logo
Threaded-Execution and CPS Provide Smooth Switching Between Execution
Modes
Dave Mason
Toronto Metropolitan University
©2023 Dave Mason
Execution Models
Source Interpretation
Bytecode Interpretation
Threaded Execution
Hardware Interpretation
Timing of Fibonacci
Thread CPS Object NativePharo JIT
169
278
347
1,094
559
186
304
391
1,918
591
136
332
564
2,010
695
Apple M1
Apple M2 Pro
Intel i7 2.8GHz
Threaded is 2.3-4.7 times faster than bytecode.
Zag Smalltalk
supports 2 models: threaded and native CPS
seamless transition
threaded is smaller and fully supports step-debugging
native is 3-5 times faster and can fallback to full debugging after a send
Threaded Execution
sequence of addresses of native code
like an extensible bytecode
each word passes control to the next
associated with Forth, originally in PDP-11 FORTRAN compiler, was used in BrouHaHa
fibonacci
1 fibonacci
2 self <= 2 ifTrue: [ ↑ 1 ].
3 ↑ (self - 1) fibonacci + (self - 2) fibonacci
Threaded fibonacci
1 verifySelector,
2 ":recurse",
3 dup, // self
4 pushLiteral, Object.from(2),
5 p5, // <=
6 ifFalse,"label3",
7 drop, // self
8 pushLiteral1,
9 returnNoContext,
10 ":label3",
11 pushContext,"^",
12 pushLocal0, // self
13 pushLiteral1,
14 p2, // -
15 callRecursive, "recurse",
16 pushLocal0, //self
17 pushLiteral2,
18 p2, // -
19 callRecursive, "recurse",
20 p1, // +
21 returnTop,
Some control words
1 pub fn drop(pc:PC, sp:Stack, process:*Process, context:ContextPtr, selector:Object) Stack {
2 tailcall pc[0].prim(pc+1, sp+1, process, context, selector, cache);
3 }
4 pub fn dup(pc:PC, sp:Stack, process:*Process, context:ContextPtr, selector:Object) Stack {
5 const newSp = sp-1;
6 newSp[0] = newSp[1];
7 tailcall pc[0].prim(pc+1, newSp, process, context, selector, cache);
8 }
9 pub fn ifFalse(pc:PC, sp:Stack, process:*Process, context:ContextPtr, selector:Object) Stack {
10 const v = sp[0];
11 if (False.equals(v)) tailcall branch(pc, sp+1, process, context, selector, cache );
12 if (True.equals(v)) tailcall pc[1].prim(pc+2, sp+1, process, context, selector, cache );
13 @panic("non boolean");
14 }
15 pub fn p1(pc:PC, sp:Stack, process:*Process, context:ContextPtr, selector:Object) Stack {
16 if (!Sym.@"+".selectorEquals(selector)) tailcall dnu(pc, sp, process, context, selector);
17 sp[1] = inlines.p1(sp[1], sp[0])
18 catch tailcall pc[0].prim(pc+1, sp, process, context, selector, cache);
19 tailcall context.npc(context.tpc, sp+1, process, context, selector, cache);
20 }
Continuation Passing Style
continuation is the rest of the program
comes from optimization of functional languages (continuation was a closure)
no implicit stack frames - passed explicitly
like the original Smalltalk passing Context (maybe not obvious that Context is a special kind of closure)
Normal Style
1 pub fn fibNative(self: i64) i64 {
2 if (self <= 2) return 1;
3 return fibNative(self - 1) + fibNative(self - 2);
4 }
5 const one = Object.from(1);
6 const two = Object.from(2);
7 pub fn fibObject(self: Object) Object {
8 if (i.p5N(self,two)) return one;
9 const m1 = i.p2L(self, 1) catch @panic("int subtract failed in fibObject");
10 const fm1 = fibObject(m1);
11 const m2 = i.p2L(self, 2) catch @panic("int subtract failed in fibObject");
12 const fm2 = fibObject(m2);
13 return i.p1(fm1, fm2) catch @panic("int add failed in fibObject");
14 }
Continuation Passing Style
1 pub fn fibCPS(pc:PC, sp:Stack, process:*Process, context:ContextPtr, selector:Object) Stack {
2 if (!fibSym.equals(selector)) tailcall dnu(pc,sp,process,context,selector);
3 if (inlined.p5N(sp[0],Object.from(2))) {
4 sp[0] = Object.from(1);
5 tailcall context.npc(context.tpc,sp,process,context,selector);
6 }
7 const newContext = context.push(sp,process,fibThread.asCompiledMethodPtr(),0,2,0);
8 const newSp = newContext.sp();
9 newSp[0]=inlined.p2L(sp[0],1)
10 catch tailcall pc[10].prim(pc+11,newSp+1,process,context,fibSym);
11 newContext.setReturnBoth(fibCPS1, pc+13); // after first callRecursive (line 15 above)
12 tailcall fibCPS(fibCPST+1,newSp,process,newContext,fibSym);
13 }
Continuation Passing Style
1 fn fibCPS1(pc:PC, sp:Stack, process:*Process, context:ContextPtr, _:Object) Stack {
2 const newSp = sp.push();
3 newSp[0] = inlined.p2L(context.getTemp(0),2)
4 catch tailcall pc[0].prim(pc+1,newSp,process,context,fibSym));
5 context.setReturnBoth(fibCPS2, pc+3); // after 2nd callRecursive (line 19 above)
6 tailcall fibCPS(fibCPST+1,newSp,process,context,fibSym);
7 }
Continuation Passing Style
1 fn fibCPS2(pc:PC, sp:Stack, process:*Process, context:ContextPtr, selector:Object) Stack {
2 const sum = inlined.p1(sp[1],sp[0])
3 catch tailcall pc[0].prim(pc+1,sp,process,context,fibSym);
4 const result = context.pop(process);
5 const newSp = result.sp;
6 newSp.put0(sum);
7 const callerContext = result.ctxt;
8 tailcall callerContext.npc(callerContext.tpc,newSp,process,callerContext,selector);
9 }
Implementation Decisions
Context must contain not only native return points, but also threaded return points;
CompiledMethods must facilitate seamless switching between execution modes;
the stack cannot reasonably be woven into the hardware stack with function calls, so no native stack;
as well as parameters, locals, and working space, stack is used to allocate Context and BlockClosure as
usually released in LIFO pattern
Conclusions
with proper structures, can easily switch between threaded and native code
threaded code is “good enough” for many purposes
this is preliminary work, so some open questions
many experiments to run to validate my intuitions
many more details in the paper
Questions?
@DrDaveMason dmason@torontomu.ca
https://github.com/dvmason/Zag-Smalltalk
Timing of Fibonacci
Thread CPS P-JIT P-Stack Z-Byte
347
1,094
559
5,033
4,730
391
1,918
591
3,527
4,820
564
2,010
695
3,568
4,609
Apple M1
Apple M2 Pro
Intel i7 2.8GHz
M2 P-Stack is presumed to be mis-configured

More Related Content

Similar to Threaded-Execution and CPS Provide Smooth Switching Between Execution Modes

MPI
MPIMPI
Exploitation of counter overflows in the Linux kernel
Exploitation of counter overflows in the Linux kernelExploitation of counter overflows in the Linux kernel
Exploitation of counter overflows in the Linux kernel
Vitaly Nikolenko
 
Capturing NIC and Kernel TX and RX Timestamps for Packets in Go
Capturing NIC and Kernel TX and RX Timestamps for Packets in GoCapturing NIC and Kernel TX and RX Timestamps for Packets in Go
Capturing NIC and Kernel TX and RX Timestamps for Packets in Go
ScyllaDB
 
What's new with Apache Spark's Structured Streaming?
What's new with Apache Spark's Structured Streaming?What's new with Apache Spark's Structured Streaming?
What's new with Apache Spark's Structured Streaming?
Miklos Christine
 
SFSCON23 - Emily Bourne Yaman Güçlü - Pyccel write Python code, get Fortran ...
SFSCON23 - Emily Bourne Yaman Güçlü - Pyccel  write Python code, get Fortran ...SFSCON23 - Emily Bourne Yaman Güçlü - Pyccel  write Python code, get Fortran ...
SFSCON23 - Emily Bourne Yaman Güçlü - Pyccel write Python code, get Fortran ...
South Tyrol Free Software Conference
 
Process Address Space: The way to create virtual address (page table) of user...
Process Address Space: The way to create virtual address (page table) of user...Process Address Space: The way to create virtual address (page table) of user...
Process Address Space: The way to create virtual address (page table) of user...
Adrian Huang
 
To Infinity & Beyond: Protocols & sequences in Node - Part 2
To Infinity & Beyond: Protocols & sequences in Node - Part 2To Infinity & Beyond: Protocols & sequences in Node - Part 2
To Infinity & Beyond: Protocols & sequences in Node - Part 2Bahul Neel Upadhyaya
 
Linux Process & CF scheduling
Linux Process & CF schedulingLinux Process & CF scheduling
Linux Process & CF schedulingSangJung Woo
 
Cpu高效编程技术
Cpu高效编程技术Cpu高效编程技术
Cpu高效编程技术Feng Yu
 
Столпы функционального программирования для адептов ООП, Николай Мозговой
Столпы функционального программирования для адептов ООП, Николай МозговойСтолпы функционального программирования для адептов ООП, Николай Мозговой
Столпы функционального программирования для адептов ООП, Николай Мозговой
Sigma Software
 
Asynchronous programming with java script and node.js
Asynchronous programming with java script and node.jsAsynchronous programming with java script and node.js
Asynchronous programming with java script and node.js
Timur Shemsedinov
 
CorePy High-Productivity CellB.E. Programming
CorePy High-Productivity CellB.E. ProgrammingCorePy High-Productivity CellB.E. Programming
CorePy High-Productivity CellB.E. Programming
Slide_N
 
Process management
Process managementProcess management
Process management
Utkarsh Kulshrestha
 
Server side JavaScript: going all the way
Server side JavaScript: going all the wayServer side JavaScript: going all the way
Server side JavaScript: going all the way
Oleg Podsechin
 
Reverse Engineering Dojo: Enhancing Assembly Reading Skills
Reverse Engineering Dojo: Enhancing Assembly Reading SkillsReverse Engineering Dojo: Enhancing Assembly Reading Skills
Reverse Engineering Dojo: Enhancing Assembly Reading Skills
Asuka Nakajima
 
Advanced procedures in assembly language Full chapter ppt
Advanced procedures in assembly language Full chapter pptAdvanced procedures in assembly language Full chapter ppt
Advanced procedures in assembly language Full chapter ppt
Muhammad Sikandar Mustafa
 
Using R on Netezza
Using R on NetezzaUsing R on Netezza
Using R on NetezzaAjay Ohri
 
Handling Large State on BEAM
Handling Large State on BEAMHandling Large State on BEAM
Handling Large State on BEAM
Yoshihiro TANAKA
 
Direct Code Execution @ CoNEXT 2013
Direct Code Execution @ CoNEXT 2013Direct Code Execution @ CoNEXT 2013
Direct Code Execution @ CoNEXT 2013
Hajime Tazaki
 
0.5mln packets per second with Erlang
0.5mln packets per second with Erlang0.5mln packets per second with Erlang
0.5mln packets per second with Erlang
Maxim Kharchenko
 

Similar to Threaded-Execution and CPS Provide Smooth Switching Between Execution Modes (20)

MPI
MPIMPI
MPI
 
Exploitation of counter overflows in the Linux kernel
Exploitation of counter overflows in the Linux kernelExploitation of counter overflows in the Linux kernel
Exploitation of counter overflows in the Linux kernel
 
Capturing NIC and Kernel TX and RX Timestamps for Packets in Go
Capturing NIC and Kernel TX and RX Timestamps for Packets in GoCapturing NIC and Kernel TX and RX Timestamps for Packets in Go
Capturing NIC and Kernel TX and RX Timestamps for Packets in Go
 
What's new with Apache Spark's Structured Streaming?
What's new with Apache Spark's Structured Streaming?What's new with Apache Spark's Structured Streaming?
What's new with Apache Spark's Structured Streaming?
 
SFSCON23 - Emily Bourne Yaman Güçlü - Pyccel write Python code, get Fortran ...
SFSCON23 - Emily Bourne Yaman Güçlü - Pyccel  write Python code, get Fortran ...SFSCON23 - Emily Bourne Yaman Güçlü - Pyccel  write Python code, get Fortran ...
SFSCON23 - Emily Bourne Yaman Güçlü - Pyccel write Python code, get Fortran ...
 
Process Address Space: The way to create virtual address (page table) of user...
Process Address Space: The way to create virtual address (page table) of user...Process Address Space: The way to create virtual address (page table) of user...
Process Address Space: The way to create virtual address (page table) of user...
 
To Infinity & Beyond: Protocols & sequences in Node - Part 2
To Infinity & Beyond: Protocols & sequences in Node - Part 2To Infinity & Beyond: Protocols & sequences in Node - Part 2
To Infinity & Beyond: Protocols & sequences in Node - Part 2
 
Linux Process & CF scheduling
Linux Process & CF schedulingLinux Process & CF scheduling
Linux Process & CF scheduling
 
Cpu高效编程技术
Cpu高效编程技术Cpu高效编程技术
Cpu高效编程技术
 
Столпы функционального программирования для адептов ООП, Николай Мозговой
Столпы функционального программирования для адептов ООП, Николай МозговойСтолпы функционального программирования для адептов ООП, Николай Мозговой
Столпы функционального программирования для адептов ООП, Николай Мозговой
 
Asynchronous programming with java script and node.js
Asynchronous programming with java script and node.jsAsynchronous programming with java script and node.js
Asynchronous programming with java script and node.js
 
CorePy High-Productivity CellB.E. Programming
CorePy High-Productivity CellB.E. ProgrammingCorePy High-Productivity CellB.E. Programming
CorePy High-Productivity CellB.E. Programming
 
Process management
Process managementProcess management
Process management
 
Server side JavaScript: going all the way
Server side JavaScript: going all the wayServer side JavaScript: going all the way
Server side JavaScript: going all the way
 
Reverse Engineering Dojo: Enhancing Assembly Reading Skills
Reverse Engineering Dojo: Enhancing Assembly Reading SkillsReverse Engineering Dojo: Enhancing Assembly Reading Skills
Reverse Engineering Dojo: Enhancing Assembly Reading Skills
 
Advanced procedures in assembly language Full chapter ppt
Advanced procedures in assembly language Full chapter pptAdvanced procedures in assembly language Full chapter ppt
Advanced procedures in assembly language Full chapter ppt
 
Using R on Netezza
Using R on NetezzaUsing R on Netezza
Using R on Netezza
 
Handling Large State on BEAM
Handling Large State on BEAMHandling Large State on BEAM
Handling Large State on BEAM
 
Direct Code Execution @ CoNEXT 2013
Direct Code Execution @ CoNEXT 2013Direct Code Execution @ CoNEXT 2013
Direct Code Execution @ CoNEXT 2013
 
0.5mln packets per second with Erlang
0.5mln packets per second with Erlang0.5mln packets per second with Erlang
0.5mln packets per second with Erlang
 

More from ESUG

Workshop: Identifying concept inventories in agile programming
Workshop: Identifying concept inventories in agile programmingWorkshop: Identifying concept inventories in agile programming
Workshop: Identifying concept inventories in agile programming
ESUG
 
Technical documentation support in Pharo
Technical documentation support in PharoTechnical documentation support in Pharo
Technical documentation support in Pharo
ESUG
 
The Pharo Debugger and Debugging tools: Advances and Roadmap
The Pharo Debugger and Debugging tools: Advances and RoadmapThe Pharo Debugger and Debugging tools: Advances and Roadmap
The Pharo Debugger and Debugging tools: Advances and Roadmap
ESUG
 
Sequence: Pipeline modelling in Pharo
Sequence: Pipeline modelling in PharoSequence: Pipeline modelling in Pharo
Sequence: Pipeline modelling in Pharo
ESUG
 
Migration process from monolithic to micro frontend architecture in mobile ap...
Migration process from monolithic to micro frontend architecture in mobile ap...Migration process from monolithic to micro frontend architecture in mobile ap...
Migration process from monolithic to micro frontend architecture in mobile ap...
ESUG
 
Analyzing Dart Language with Pharo: Report and early results
Analyzing Dart Language with Pharo: Report and early resultsAnalyzing Dart Language with Pharo: Report and early results
Analyzing Dart Language with Pharo: Report and early results
ESUG
 
Transpiling Pharo Classes to JS ECMAScript 5 versus ECMAScript 6
Transpiling Pharo Classes to JS ECMAScript 5 versus ECMAScript 6Transpiling Pharo Classes to JS ECMAScript 5 versus ECMAScript 6
Transpiling Pharo Classes to JS ECMAScript 5 versus ECMAScript 6
ESUG
 
A Unit Test Metamodel for Test Generation
A Unit Test Metamodel for Test GenerationA Unit Test Metamodel for Test Generation
A Unit Test Metamodel for Test Generation
ESUG
 
Creating Unit Tests Using Genetic Programming
Creating Unit Tests Using Genetic ProgrammingCreating Unit Tests Using Genetic Programming
Creating Unit Tests Using Genetic Programming
ESUG
 
Exploring GitHub Actions through EGAD: An Experience Report
Exploring GitHub Actions through EGAD: An Experience ReportExploring GitHub Actions through EGAD: An Experience Report
Exploring GitHub Actions through EGAD: An Experience Report
ESUG
 
Pharo: a reflective language A first systematic analysis of reflective APIs
Pharo: a reflective language A first systematic analysis of reflective APIsPharo: a reflective language A first systematic analysis of reflective APIs
Pharo: a reflective language A first systematic analysis of reflective APIs
ESUG
 
Garbage Collector Tuning
Garbage Collector TuningGarbage Collector Tuning
Garbage Collector Tuning
ESUG
 
Improving Performance Through Object Lifetime Profiling: the DataFrame Case
Improving Performance Through Object Lifetime Profiling: the DataFrame CaseImproving Performance Through Object Lifetime Profiling: the DataFrame Case
Improving Performance Through Object Lifetime Profiling: the DataFrame Case
ESUG
 
Pharo DataFrame: Past, Present, and Future
Pharo DataFrame: Past, Present, and FuturePharo DataFrame: Past, Present, and Future
Pharo DataFrame: Past, Present, and Future
ESUG
 
thisContext in the Debugger
thisContext in the DebuggerthisContext in the Debugger
thisContext in the Debugger
ESUG
 
Websockets for Fencing Score
Websockets for Fencing ScoreWebsockets for Fencing Score
Websockets for Fencing Score
ESUG
 
ShowUs: PharoJS.org Develop in Pharo, Run on JavaScript
ShowUs: PharoJS.org Develop in Pharo, Run on JavaScriptShowUs: PharoJS.org Develop in Pharo, Run on JavaScript
ShowUs: PharoJS.org Develop in Pharo, Run on JavaScript
ESUG
 
Advanced Object- Oriented Design Mooc
Advanced Object- Oriented Design MoocAdvanced Object- Oriented Design Mooc
Advanced Object- Oriented Design Mooc
ESUG
 
A New Architecture Reconciling Refactorings and Transformations
A New Architecture Reconciling Refactorings and TransformationsA New Architecture Reconciling Refactorings and Transformations
A New Architecture Reconciling Refactorings and Transformations
ESUG
 
BioSmalltalk
BioSmalltalkBioSmalltalk
BioSmalltalk
ESUG
 

More from ESUG (20)

Workshop: Identifying concept inventories in agile programming
Workshop: Identifying concept inventories in agile programmingWorkshop: Identifying concept inventories in agile programming
Workshop: Identifying concept inventories in agile programming
 
Technical documentation support in Pharo
Technical documentation support in PharoTechnical documentation support in Pharo
Technical documentation support in Pharo
 
The Pharo Debugger and Debugging tools: Advances and Roadmap
The Pharo Debugger and Debugging tools: Advances and RoadmapThe Pharo Debugger and Debugging tools: Advances and Roadmap
The Pharo Debugger and Debugging tools: Advances and Roadmap
 
Sequence: Pipeline modelling in Pharo
Sequence: Pipeline modelling in PharoSequence: Pipeline modelling in Pharo
Sequence: Pipeline modelling in Pharo
 
Migration process from monolithic to micro frontend architecture in mobile ap...
Migration process from monolithic to micro frontend architecture in mobile ap...Migration process from monolithic to micro frontend architecture in mobile ap...
Migration process from monolithic to micro frontend architecture in mobile ap...
 
Analyzing Dart Language with Pharo: Report and early results
Analyzing Dart Language with Pharo: Report and early resultsAnalyzing Dart Language with Pharo: Report and early results
Analyzing Dart Language with Pharo: Report and early results
 
Transpiling Pharo Classes to JS ECMAScript 5 versus ECMAScript 6
Transpiling Pharo Classes to JS ECMAScript 5 versus ECMAScript 6Transpiling Pharo Classes to JS ECMAScript 5 versus ECMAScript 6
Transpiling Pharo Classes to JS ECMAScript 5 versus ECMAScript 6
 
A Unit Test Metamodel for Test Generation
A Unit Test Metamodel for Test GenerationA Unit Test Metamodel for Test Generation
A Unit Test Metamodel for Test Generation
 
Creating Unit Tests Using Genetic Programming
Creating Unit Tests Using Genetic ProgrammingCreating Unit Tests Using Genetic Programming
Creating Unit Tests Using Genetic Programming
 
Exploring GitHub Actions through EGAD: An Experience Report
Exploring GitHub Actions through EGAD: An Experience ReportExploring GitHub Actions through EGAD: An Experience Report
Exploring GitHub Actions through EGAD: An Experience Report
 
Pharo: a reflective language A first systematic analysis of reflective APIs
Pharo: a reflective language A first systematic analysis of reflective APIsPharo: a reflective language A first systematic analysis of reflective APIs
Pharo: a reflective language A first systematic analysis of reflective APIs
 
Garbage Collector Tuning
Garbage Collector TuningGarbage Collector Tuning
Garbage Collector Tuning
 
Improving Performance Through Object Lifetime Profiling: the DataFrame Case
Improving Performance Through Object Lifetime Profiling: the DataFrame CaseImproving Performance Through Object Lifetime Profiling: the DataFrame Case
Improving Performance Through Object Lifetime Profiling: the DataFrame Case
 
Pharo DataFrame: Past, Present, and Future
Pharo DataFrame: Past, Present, and FuturePharo DataFrame: Past, Present, and Future
Pharo DataFrame: Past, Present, and Future
 
thisContext in the Debugger
thisContext in the DebuggerthisContext in the Debugger
thisContext in the Debugger
 
Websockets for Fencing Score
Websockets for Fencing ScoreWebsockets for Fencing Score
Websockets for Fencing Score
 
ShowUs: PharoJS.org Develop in Pharo, Run on JavaScript
ShowUs: PharoJS.org Develop in Pharo, Run on JavaScriptShowUs: PharoJS.org Develop in Pharo, Run on JavaScript
ShowUs: PharoJS.org Develop in Pharo, Run on JavaScript
 
Advanced Object- Oriented Design Mooc
Advanced Object- Oriented Design MoocAdvanced Object- Oriented Design Mooc
Advanced Object- Oriented Design Mooc
 
A New Architecture Reconciling Refactorings and Transformations
A New Architecture Reconciling Refactorings and TransformationsA New Architecture Reconciling Refactorings and Transformations
A New Architecture Reconciling Refactorings and Transformations
 
BioSmalltalk
BioSmalltalkBioSmalltalk
BioSmalltalk
 

Recently uploaded

Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
Globus
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Globus
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Globus
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
Tier1 app
 
top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownload
vrstrong314
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
Globus
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
Globus
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
AMB-Review
 
Using IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New ZealandUsing IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New Zealand
IES VE
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Globus
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
Paco van Beckhoven
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
Juraj Vysvader
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
WSO2
 
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
Fermin Galan
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
XfilesPro
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
NYGGS Automation Suite
 
Into the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdfInto the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdf
Ortus Solutions, Corp
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
rickgrimesss22
 

Recently uploaded (20)

Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
 
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
Exploring Innovations in Data Repository Solutions - Insights from the U.S. G...
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
 
top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownload
 
Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024Globus Compute Introduction - GlobusWorld 2024
Globus Compute Introduction - GlobusWorld 2024
 
Prosigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology SolutionsProsigns: Transforming Business with Tailored Technology Solutions
Prosigns: Transforming Business with Tailored Technology Solutions
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
 
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdfDominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
Dominate Social Media with TubeTrivia AI’s Addictive Quiz Videos.pdf
 
Using IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New ZealandUsing IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New Zealand
 
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
 
Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024Cracking the code review at SpringIO 2024
Cracking the code review at SpringIO 2024
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
 
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBroker
 
Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604Orion Context Broker introduction 20240604
Orion Context Broker introduction 20240604
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
 
Enterprise Resource Planning System in Telangana
Enterprise Resource Planning System in TelanganaEnterprise Resource Planning System in Telangana
Enterprise Resource Planning System in Telangana
 
Into the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdfInto the Box 2024 - Keynote Day 2 Slides.pdf
Into the Box 2024 - Keynote Day 2 Slides.pdf
 
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptxTop Features to Include in Your Winzo Clone App for Business Growth (4).pptx
Top Features to Include in Your Winzo Clone App for Business Growth (4).pptx
 

Threaded-Execution and CPS Provide Smooth Switching Between Execution Modes

  • 1. Threaded-Execution and CPS Provide Smooth Switching Between Execution Modes Dave Mason Toronto Metropolitan University ©2023 Dave Mason
  • 2. Execution Models Source Interpretation Bytecode Interpretation Threaded Execution Hardware Interpretation
  • 3. Timing of Fibonacci Thread CPS Object NativePharo JIT 169 278 347 1,094 559 186 304 391 1,918 591 136 332 564 2,010 695 Apple M1 Apple M2 Pro Intel i7 2.8GHz Threaded is 2.3-4.7 times faster than bytecode.
  • 4. Zag Smalltalk supports 2 models: threaded and native CPS seamless transition threaded is smaller and fully supports step-debugging native is 3-5 times faster and can fallback to full debugging after a send
  • 5. Threaded Execution sequence of addresses of native code like an extensible bytecode each word passes control to the next associated with Forth, originally in PDP-11 FORTRAN compiler, was used in BrouHaHa
  • 6. fibonacci 1 fibonacci 2 self <= 2 ifTrue: [ ↑ 1 ]. 3 ↑ (self - 1) fibonacci + (self - 2) fibonacci
  • 7. Threaded fibonacci 1 verifySelector, 2 ":recurse", 3 dup, // self 4 pushLiteral, Object.from(2), 5 p5, // <= 6 ifFalse,"label3", 7 drop, // self 8 pushLiteral1, 9 returnNoContext, 10 ":label3", 11 pushContext,"^", 12 pushLocal0, // self 13 pushLiteral1, 14 p2, // - 15 callRecursive, "recurse", 16 pushLocal0, //self 17 pushLiteral2, 18 p2, // - 19 callRecursive, "recurse", 20 p1, // + 21 returnTop,
  • 8. Some control words 1 pub fn drop(pc:PC, sp:Stack, process:*Process, context:ContextPtr, selector:Object) Stack { 2 tailcall pc[0].prim(pc+1, sp+1, process, context, selector, cache); 3 } 4 pub fn dup(pc:PC, sp:Stack, process:*Process, context:ContextPtr, selector:Object) Stack { 5 const newSp = sp-1; 6 newSp[0] = newSp[1]; 7 tailcall pc[0].prim(pc+1, newSp, process, context, selector, cache); 8 } 9 pub fn ifFalse(pc:PC, sp:Stack, process:*Process, context:ContextPtr, selector:Object) Stack { 10 const v = sp[0]; 11 if (False.equals(v)) tailcall branch(pc, sp+1, process, context, selector, cache ); 12 if (True.equals(v)) tailcall pc[1].prim(pc+2, sp+1, process, context, selector, cache ); 13 @panic("non boolean"); 14 } 15 pub fn p1(pc:PC, sp:Stack, process:*Process, context:ContextPtr, selector:Object) Stack { 16 if (!Sym.@"+".selectorEquals(selector)) tailcall dnu(pc, sp, process, context, selector); 17 sp[1] = inlines.p1(sp[1], sp[0]) 18 catch tailcall pc[0].prim(pc+1, sp, process, context, selector, cache); 19 tailcall context.npc(context.tpc, sp+1, process, context, selector, cache); 20 }
  • 9. Continuation Passing Style continuation is the rest of the program comes from optimization of functional languages (continuation was a closure) no implicit stack frames - passed explicitly like the original Smalltalk passing Context (maybe not obvious that Context is a special kind of closure)
  • 10. Normal Style 1 pub fn fibNative(self: i64) i64 { 2 if (self <= 2) return 1; 3 return fibNative(self - 1) + fibNative(self - 2); 4 } 5 const one = Object.from(1); 6 const two = Object.from(2); 7 pub fn fibObject(self: Object) Object { 8 if (i.p5N(self,two)) return one; 9 const m1 = i.p2L(self, 1) catch @panic("int subtract failed in fibObject"); 10 const fm1 = fibObject(m1); 11 const m2 = i.p2L(self, 2) catch @panic("int subtract failed in fibObject"); 12 const fm2 = fibObject(m2); 13 return i.p1(fm1, fm2) catch @panic("int add failed in fibObject"); 14 }
  • 11. Continuation Passing Style 1 pub fn fibCPS(pc:PC, sp:Stack, process:*Process, context:ContextPtr, selector:Object) Stack { 2 if (!fibSym.equals(selector)) tailcall dnu(pc,sp,process,context,selector); 3 if (inlined.p5N(sp[0],Object.from(2))) { 4 sp[0] = Object.from(1); 5 tailcall context.npc(context.tpc,sp,process,context,selector); 6 } 7 const newContext = context.push(sp,process,fibThread.asCompiledMethodPtr(),0,2,0); 8 const newSp = newContext.sp(); 9 newSp[0]=inlined.p2L(sp[0],1) 10 catch tailcall pc[10].prim(pc+11,newSp+1,process,context,fibSym); 11 newContext.setReturnBoth(fibCPS1, pc+13); // after first callRecursive (line 15 above) 12 tailcall fibCPS(fibCPST+1,newSp,process,newContext,fibSym); 13 }
  • 12. Continuation Passing Style 1 fn fibCPS1(pc:PC, sp:Stack, process:*Process, context:ContextPtr, _:Object) Stack { 2 const newSp = sp.push(); 3 newSp[0] = inlined.p2L(context.getTemp(0),2) 4 catch tailcall pc[0].prim(pc+1,newSp,process,context,fibSym)); 5 context.setReturnBoth(fibCPS2, pc+3); // after 2nd callRecursive (line 19 above) 6 tailcall fibCPS(fibCPST+1,newSp,process,context,fibSym); 7 }
  • 13. Continuation Passing Style 1 fn fibCPS2(pc:PC, sp:Stack, process:*Process, context:ContextPtr, selector:Object) Stack { 2 const sum = inlined.p1(sp[1],sp[0]) 3 catch tailcall pc[0].prim(pc+1,sp,process,context,fibSym); 4 const result = context.pop(process); 5 const newSp = result.sp; 6 newSp.put0(sum); 7 const callerContext = result.ctxt; 8 tailcall callerContext.npc(callerContext.tpc,newSp,process,callerContext,selector); 9 }
  • 14. Implementation Decisions Context must contain not only native return points, but also threaded return points; CompiledMethods must facilitate seamless switching between execution modes; the stack cannot reasonably be woven into the hardware stack with function calls, so no native stack; as well as parameters, locals, and working space, stack is used to allocate Context and BlockClosure as usually released in LIFO pattern
  • 15. Conclusions with proper structures, can easily switch between threaded and native code threaded code is “good enough” for many purposes this is preliminary work, so some open questions many experiments to run to validate my intuitions many more details in the paper
  • 17. Timing of Fibonacci Thread CPS P-JIT P-Stack Z-Byte 347 1,094 559 5,033 4,730 391 1,918 591 3,527 4,820 564 2,010 695 3,568 4,609 Apple M1 Apple M2 Pro Intel i7 2.8GHz M2 P-Stack is presumed to be mis-configured