SlideShare a Scribd company logo
From Superscalar OO to Multicore SST Checkpoint and Transactional memory support for SST © dave+stratusdesign@gmail.com stratusdesign.squarespace.com
The OO Superscalar legacy ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],Year Inflight Instructions Clock Speed 1998 90 600Mhz 2008 200 3200Mhz
Speculative execution evolution ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Evolution to SST ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Hazards ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
OO & SST Differences ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
Data hazards ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
SST handling of Data Hazards ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
SST handling of Control Hazards ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
SST Memory Consistency Protocol ,[object Object],[object Object],[object Object]
Checkpoints ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
SST new circuit structures ,[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object],[object Object]
SST logic Wakeup Behind Thread DQ Full? DQ Empty for current & spec ckpt? L1 Miss Set  ‘ S ’  bit  in Cache Start Behind  thread in  wait mode  to handle  Defers Start Executing  Main thread Speculatively  ahead Behind Thread Runs Thru DQ for Active Checkpoint Done Ahead Thread  • Normal Mode Behind Thread  • Pause L1  Resolved Ahead Thread • Scout Mode Behind Thread • Pause High Level SW  initiates a Memory Transaction Restore Checkpoint Tx Fail  ‘ S ’ bit Detect Mem Order Violation Br Mispredict Exception WAIT Begin SST Episode Arch Checkpoint Active • Architectural Inactive • Speculative   Instr has Data Dependencies? Execute Instr and Retire OO Enqueue DQ  with Instr & All Resolved Opr Instr has no Data Dependencies? WAIT more data expected Speculation Successful Program Execution resumes were speculation finished
SST scheduling Program Order LDX addr1, %r1 ADD %r1, 0x04, %r2 STX %r2, addr2 SETHI 0x01, %r2 STX %r2, addr3 etc..  ;  Ahead-Thread   1 LDX addr1, %r1  ; Load Miss on addr1, Defer and set R1 [ NT ])  To Defer Q ; Checkpoint Start Ahead-Thread, Behind-Thread Waits for data read 2 ADD %r1, 0x04, %r2 ; Source Operand has NT bit set Defer and set R2 [NT]  To Defer Q 3 STX %r2, addr2 ; Source Operand has NT bit set Defer) To Defer Q 4  SETHI 0x01, %r2  ; Ahead Thread Executes Independently) 5  STX %r2, addr3 ; Ahead Thread Executes Independently & continues speculative execution of more program instructions ;  Load Miss resolves start   Behind-Thread 6  ADD %r1, 0x04, %r2 [NT=0,SNT=1]   ;  NT was reset at 4, set waw bit 7  STX %r2, addr3 SST Order LDX addr1, %r1 ADD %r1, 0x04, %r2 STX %r2, addr2 SETHI 0x01, %r2 STX %r2, addr3 etc..  Deferring data-dependent instructions prevents RAW  –   here %r2 was read at 3 but written before at 2 Saving operands in DQ prevents WAR as any valid data in register at that time is captured and saved for Behind-Thread to use later regardless of future writes by Ahead-Thread Registers with WAW bit not committed to Architectural state  –  here %r2 was written at 4 & 6 ;Deferred Queue LDX  addr1, %r1 [ NT ] ADD  %r1 [ NT ],  0x04, %r2 [ NT ] STX  %r2 [ NT ] , addr2 WAW WAR RAW

More Related Content

What's hot

Process Synchronization And Deadlocks
Process Synchronization And DeadlocksProcess Synchronization And Deadlocks
Process Synchronization And Deadlocks
tech2click
 
OSCh7
OSCh7OSCh7
Operating Systems - Process Synchronization and Deadlocks
Operating Systems - Process Synchronization and DeadlocksOperating Systems - Process Synchronization and Deadlocks
Operating Systems - Process Synchronization and Deadlocks
Mukesh Chinta
 
Timing Analysis
Timing AnalysisTiming Analysis
Timing Analysis
rchovatiya
 
CNWeek4 lec2-bscs1
CNWeek4 lec2-bscs1CNWeek4 lec2-bscs1
CNWeek4 lec2-bscs1
syedhaiderraza
 
Jack_Knutson_SNUG2003_ Copy
Jack_Knutson_SNUG2003_ CopyJack_Knutson_SNUG2003_ Copy
Jack_Knutson_SNUG2003_ Copy
Jack Knutson
 
Timing analysis
Timing analysisTiming analysis
Timing analysis
Kunal Doshi
 
Process synchronization
Process synchronizationProcess synchronization
Process synchronization
Syed Hassan Ali
 
6.Process Synchronization
6.Process Synchronization6.Process Synchronization
6.Process Synchronization
Senthil Kanth
 
Major project iii 3
Major project  iii  3Major project  iii  3
Major project iii 3
Gopal Prasad Bansal
 
Chapter 6 - Process Synchronization
Chapter 6 - Process SynchronizationChapter 6 - Process Synchronization
Chapter 6 - Process Synchronization
Wayne Jones Jnr
 
Operating System-Ch6 process synchronization
Operating System-Ch6 process synchronizationOperating System-Ch6 process synchronization
Operating System-Ch6 process synchronization
Syaiful Ahdan
 
Process synchronization in Operating Systems
Process synchronization in Operating SystemsProcess synchronization in Operating Systems
Process synchronization in Operating Systems
Ritu Ranjan Shrivastwa
 
Ch7 OS
Ch7 OSCh7 OS
Ch7 OS
C.U
 
A Robust UART Architecture Based on Recursive Running Sum Filter for Better N...
A Robust UART Architecture Based on Recursive Running Sum Filter for Better N...A Robust UART Architecture Based on Recursive Running Sum Filter for Better N...
A Robust UART Architecture Based on Recursive Running Sum Filter for Better N...
Kevin Mathew
 
Synchronization linux
Synchronization linuxSynchronization linux
Synchronization linux
Susant Sahani
 
Operating systems question bank
Operating systems question bankOperating systems question bank
Operating systems question bank
anuradha raheja
 
Operating Systems Chapter 6 silberschatz
Operating Systems Chapter 6 silberschatzOperating Systems Chapter 6 silberschatz
Operating Systems Chapter 6 silberschatz
GiulianoRanauro
 
Burst clock controller
Burst clock controllerBurst clock controller
Burst clock controller
kumar gavanurmath
 
Operating Systems - "Chapter 5 Process Synchronization"
Operating Systems - "Chapter 5 Process Synchronization"Operating Systems - "Chapter 5 Process Synchronization"
Operating Systems - "Chapter 5 Process Synchronization"
Ra'Fat Al-Msie'deen
 

What's hot (20)

Process Synchronization And Deadlocks
Process Synchronization And DeadlocksProcess Synchronization And Deadlocks
Process Synchronization And Deadlocks
 
OSCh7
OSCh7OSCh7
OSCh7
 
Operating Systems - Process Synchronization and Deadlocks
Operating Systems - Process Synchronization and DeadlocksOperating Systems - Process Synchronization and Deadlocks
Operating Systems - Process Synchronization and Deadlocks
 
Timing Analysis
Timing AnalysisTiming Analysis
Timing Analysis
 
CNWeek4 lec2-bscs1
CNWeek4 lec2-bscs1CNWeek4 lec2-bscs1
CNWeek4 lec2-bscs1
 
Jack_Knutson_SNUG2003_ Copy
Jack_Knutson_SNUG2003_ CopyJack_Knutson_SNUG2003_ Copy
Jack_Knutson_SNUG2003_ Copy
 
Timing analysis
Timing analysisTiming analysis
Timing analysis
 
Process synchronization
Process synchronizationProcess synchronization
Process synchronization
 
6.Process Synchronization
6.Process Synchronization6.Process Synchronization
6.Process Synchronization
 
Major project iii 3
Major project  iii  3Major project  iii  3
Major project iii 3
 
Chapter 6 - Process Synchronization
Chapter 6 - Process SynchronizationChapter 6 - Process Synchronization
Chapter 6 - Process Synchronization
 
Operating System-Ch6 process synchronization
Operating System-Ch6 process synchronizationOperating System-Ch6 process synchronization
Operating System-Ch6 process synchronization
 
Process synchronization in Operating Systems
Process synchronization in Operating SystemsProcess synchronization in Operating Systems
Process synchronization in Operating Systems
 
Ch7 OS
Ch7 OSCh7 OS
Ch7 OS
 
A Robust UART Architecture Based on Recursive Running Sum Filter for Better N...
A Robust UART Architecture Based on Recursive Running Sum Filter for Better N...A Robust UART Architecture Based on Recursive Running Sum Filter for Better N...
A Robust UART Architecture Based on Recursive Running Sum Filter for Better N...
 
Synchronization linux
Synchronization linuxSynchronization linux
Synchronization linux
 
Operating systems question bank
Operating systems question bankOperating systems question bank
Operating systems question bank
 
Operating Systems Chapter 6 silberschatz
Operating Systems Chapter 6 silberschatzOperating Systems Chapter 6 silberschatz
Operating Systems Chapter 6 silberschatz
 
Burst clock controller
Burst clock controllerBurst clock controller
Burst clock controller
 
Operating Systems - "Chapter 5 Process Synchronization"
Operating Systems - "Chapter 5 Process Synchronization"Operating Systems - "Chapter 5 Process Synchronization"
Operating Systems - "Chapter 5 Process Synchronization"
 

Similar to from OO to Multicore SST

OS Process Synchronization, semaphore and Monitors
OS Process Synchronization, semaphore and MonitorsOS Process Synchronization, semaphore and Monitors
OS Process Synchronization, semaphore and Monitors
sgpraju
 
Final report
Final reportFinal report
Final report
Vineel Reddy G
 
Dpdk applications
Dpdk applicationsDpdk applications
Dpdk applications
Vipin Varghese
 
Analyzing and Interpreting AWR
Analyzing and Interpreting AWRAnalyzing and Interpreting AWR
Analyzing and Interpreting AWR
pasalapudi
 
Troubleshooting Complex Oracle Performance Problems with Tanel Poder
Troubleshooting Complex Oracle Performance Problems with Tanel PoderTroubleshooting Complex Oracle Performance Problems with Tanel Poder
Troubleshooting Complex Oracle Performance Problems with Tanel Poder
Tanel Poder
 
Pipeline and data hazard
Pipeline and data hazardPipeline and data hazard
Pipeline and data hazard
Waed Shagareen
 
Performance and predictability
Performance and predictabilityPerformance and predictability
Performance and predictability
RichardWarburton
 
Building real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark StreamingBuilding real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark Streaming
datamantra
 
Control hazards MIPS pipeline.pptx
Control hazards MIPS pipeline.pptxControl hazards MIPS pipeline.pptx
Control hazards MIPS pipeline.pptx
Irfan Anjum
 
bluespec talk
bluespec talkbluespec talk
bluespec talk
Suman Karumuri
 
Operating System Engineering
Operating System EngineeringOperating System Engineering
Operating System Engineering
Programming Homework Help
 
Performance and predictability (1)
Performance and predictability (1)Performance and predictability (1)
Performance and predictability (1)
RichardWarburton
 
Performance and Predictability - Richard Warburton
Performance and Predictability - Richard WarburtonPerformance and Predictability - Richard Warburton
Performance and Predictability - Richard Warburton
JAXLondon2014
 
CH05.pdf
CH05.pdfCH05.pdf
CH05.pdf
ImranKhan880955
 
Coding style for good synthesis
Coding style for good synthesisCoding style for good synthesis
Coding style for good synthesis
Vinchipsytm Vlsitraining
 
11thingsabout11g 12659705398222 Phpapp01
11thingsabout11g 12659705398222 Phpapp0111thingsabout11g 12659705398222 Phpapp01
11thingsabout11g 12659705398222 Phpapp01
Karam Abuataya
 
11 Things About11g
11 Things About11g11 Things About11g
11 Things About11g
fcamachob
 
Lecture18-19 (1).ppt
Lecture18-19 (1).pptLecture18-19 (1).ppt
Lecture18-19 (1).ppt
ssuserf67e3a
 
676.v3
676.v3676.v3
676.v3
Rajesh M
 
AMC Minor Technical Issues
AMC Minor Technical IssuesAMC Minor Technical Issues
AMC Minor Technical Issues
Apache Traffic Server
 

Similar to from OO to Multicore SST (20)

OS Process Synchronization, semaphore and Monitors
OS Process Synchronization, semaphore and MonitorsOS Process Synchronization, semaphore and Monitors
OS Process Synchronization, semaphore and Monitors
 
Final report
Final reportFinal report
Final report
 
Dpdk applications
Dpdk applicationsDpdk applications
Dpdk applications
 
Analyzing and Interpreting AWR
Analyzing and Interpreting AWRAnalyzing and Interpreting AWR
Analyzing and Interpreting AWR
 
Troubleshooting Complex Oracle Performance Problems with Tanel Poder
Troubleshooting Complex Oracle Performance Problems with Tanel PoderTroubleshooting Complex Oracle Performance Problems with Tanel Poder
Troubleshooting Complex Oracle Performance Problems with Tanel Poder
 
Pipeline and data hazard
Pipeline and data hazardPipeline and data hazard
Pipeline and data hazard
 
Performance and predictability
Performance and predictabilityPerformance and predictability
Performance and predictability
 
Building real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark StreamingBuilding real time Data Pipeline using Spark Streaming
Building real time Data Pipeline using Spark Streaming
 
Control hazards MIPS pipeline.pptx
Control hazards MIPS pipeline.pptxControl hazards MIPS pipeline.pptx
Control hazards MIPS pipeline.pptx
 
bluespec talk
bluespec talkbluespec talk
bluespec talk
 
Operating System Engineering
Operating System EngineeringOperating System Engineering
Operating System Engineering
 
Performance and predictability (1)
Performance and predictability (1)Performance and predictability (1)
Performance and predictability (1)
 
Performance and Predictability - Richard Warburton
Performance and Predictability - Richard WarburtonPerformance and Predictability - Richard Warburton
Performance and Predictability - Richard Warburton
 
CH05.pdf
CH05.pdfCH05.pdf
CH05.pdf
 
Coding style for good synthesis
Coding style for good synthesisCoding style for good synthesis
Coding style for good synthesis
 
11thingsabout11g 12659705398222 Phpapp01
11thingsabout11g 12659705398222 Phpapp0111thingsabout11g 12659705398222 Phpapp01
11thingsabout11g 12659705398222 Phpapp01
 
11 Things About11g
11 Things About11g11 Things About11g
11 Things About11g
 
Lecture18-19 (1).ppt
Lecture18-19 (1).pptLecture18-19 (1).ppt
Lecture18-19 (1).ppt
 
676.v3
676.v3676.v3
676.v3
 
AMC Minor Technical Issues
AMC Minor Technical IssuesAMC Minor Technical Issues
AMC Minor Technical Issues
 

Recently uploaded

Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
akankshawande
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
SAP S/4 HANA sourcing and procurement to Public cloud
SAP S/4 HANA sourcing and procurement to Public cloudSAP S/4 HANA sourcing and procurement to Public cloud
SAP S/4 HANA sourcing and procurement to Public cloud
maazsz111
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
saastr
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
MichaelKnudsen27
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
Miro Wengner
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Tatiana Kojar
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
Hiike
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
Antonios Katsarakis
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Alpen-Adria-Universität
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
Edge AI and Vision Alliance
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 

Recently uploaded (20)

Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development ProvidersYour One-Stop Shop for Python Success: Top 10 US Python Development Providers
Your One-Stop Shop for Python Success: Top 10 US Python Development Providers
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
SAP S/4 HANA sourcing and procurement to Public cloud
SAP S/4 HANA sourcing and procurement to Public cloudSAP S/4 HANA sourcing and procurement to Public cloud
SAP S/4 HANA sourcing and procurement to Public cloud
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
 
Nordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptxNordic Marketo Engage User Group_June 13_ 2024.pptx
Nordic Marketo Engage User Group_June 13_ 2024.pptx
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
 
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - HiikeSystem Design Case Study: Building a Scalable E-Commerce Platform - Hiike
System Design Case Study: Building a Scalable E-Commerce Platform - Hiike
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
 
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 

from OO to Multicore SST

  • 1. From Superscalar OO to Multicore SST Checkpoint and Transactional memory support for SST © dave+stratusdesign@gmail.com stratusdesign.squarespace.com
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11.
  • 12.
  • 13. SST logic Wakeup Behind Thread DQ Full? DQ Empty for current & spec ckpt? L1 Miss Set ‘ S ’ bit in Cache Start Behind thread in wait mode to handle Defers Start Executing Main thread Speculatively ahead Behind Thread Runs Thru DQ for Active Checkpoint Done Ahead Thread • Normal Mode Behind Thread • Pause L1 Resolved Ahead Thread • Scout Mode Behind Thread • Pause High Level SW initiates a Memory Transaction Restore Checkpoint Tx Fail ‘ S ’ bit Detect Mem Order Violation Br Mispredict Exception WAIT Begin SST Episode Arch Checkpoint Active • Architectural Inactive • Speculative Instr has Data Dependencies? Execute Instr and Retire OO Enqueue DQ with Instr & All Resolved Opr Instr has no Data Dependencies? WAIT more data expected Speculation Successful Program Execution resumes were speculation finished
  • 14. SST scheduling Program Order LDX addr1, %r1 ADD %r1, 0x04, %r2 STX %r2, addr2 SETHI 0x01, %r2 STX %r2, addr3 etc.. ; Ahead-Thread 1 LDX addr1, %r1 ; Load Miss on addr1, Defer and set R1 [ NT ]) To Defer Q ; Checkpoint Start Ahead-Thread, Behind-Thread Waits for data read 2 ADD %r1, 0x04, %r2 ; Source Operand has NT bit set Defer and set R2 [NT] To Defer Q 3 STX %r2, addr2 ; Source Operand has NT bit set Defer) To Defer Q 4 SETHI 0x01, %r2 ; Ahead Thread Executes Independently) 5 STX %r2, addr3 ; Ahead Thread Executes Independently & continues speculative execution of more program instructions ; Load Miss resolves start Behind-Thread 6 ADD %r1, 0x04, %r2 [NT=0,SNT=1] ; NT was reset at 4, set waw bit 7 STX %r2, addr3 SST Order LDX addr1, %r1 ADD %r1, 0x04, %r2 STX %r2, addr2 SETHI 0x01, %r2 STX %r2, addr3 etc.. Deferring data-dependent instructions prevents RAW – here %r2 was read at 3 but written before at 2 Saving operands in DQ prevents WAR as any valid data in register at that time is captured and saved for Behind-Thread to use later regardless of future writes by Ahead-Thread Registers with WAW bit not committed to Architectural state – here %r2 was written at 4 & 6 ;Deferred Queue LDX addr1, %r1 [ NT ] ADD %r1 [ NT ], 0x04, %r2 [ NT ] STX %r2 [ NT ] , addr2 WAW WAR RAW