ETI SCC Baremetal Framework: Bandwidth and Power Findings
Rishi Khan, 3/30/11
Outline
  • SCC Framework Overview
  • Bandwidth Findings
  • Power Findings
  • Software Access
SCC Framework Overview
Messaging Goals
  • Asynchronous communications
  • Single threaded
  • Possibly long latency until data is received
  • Maximize bandwidth
  • Handle big and small messages
  • Extensible layer that supports MPI, BSD sockets, etc.
Design Choices
  • One channel per core-pair per direction
  • Large window size (up to 1MB/channel)
  • Fast polling of incoming data (use MPB)
  • Circular buffer with 16 slots and read/write pointers
  • Poll local pointers, signal remote pointers
  • Use separate cache lines to avoid locking
      • 2 cache lines * 48 channels = 3K per core
  • Double map read and write pages
      • Read: L2 cache enabled
      • Write: L2 cache disabled (write back)
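The "3K per core" figure can be checked with a few lines of C. This is only a sketch of the arithmetic: the 32-byte cache line size is an assumption not stated on the slide (it matches the P54C-class cores used on the SCC), and the names below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

enum {
    CACHE_LINE_BYTES  = 32,  /* assumption: 32-byte cache line, not on the slide */
    LINES_PER_CHANNEL = 2,   /* from the slide: 2 cache lines per channel */
    CHANNELS_PER_CORE = 48   /* from the slide: 48 channels */
};

/* Bytes of pointer storage each core sets aside for its channels. */
size_t pointer_bytes_per_core(void)
{
    size_t total = (size_t)LINES_PER_CHANNEL * CACHE_LINE_BYTES * CHANNELS_PER_CORE;
    assert(total == 3 * 1024);  /* 2 * 32 * 48 = 3072 bytes = 3K */
    return total;
}
```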
Circular Buffer Example
  • Core 1 (writer): Is there space? Write the data (with length as first 2 bytes). Update write pointer.
  • Core 0 (reader): Poll local write pointer. Read data. Update read pointer.
  • [Diagram labels: Cache, Cache, DRAM, MPB, MPB; Channel->body[], Channel->local_write, Channel->local_read, Channel->mpb_read, Channel->mpb_write]
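The writer and reader steps above can be sketched as a single-producer, single-consumer ring in plain C. This is a minimal illustration, not the framework's code: the `channel` struct, `channel_write`, `channel_read`, and the 64-byte slot size are all hypothetical, and the MPB signalling, cache control, and memory barriers a real SCC implementation needs are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SLOTS      16   /* slot count from the slides */
#define SLOT_BYTES 64   /* hypothetical slot size, chosen for this sketch */

/* Hypothetical layout: one channel, one writer, one reader. */
typedef struct {
    volatile uint32_t write_idx;        /* advanced only by the writer */
    volatile uint32_t read_idx;         /* advanced only by the reader */
    uint8_t body[SLOTS][SLOT_BYTES];    /* message slots */
} channel;

/* Writer: returns 0 on success, -1 if the ring is full or msg is too big. */
int channel_write(channel *ch, const void *msg, uint16_t len)
{
    if (len > SLOT_BYTES - sizeof(uint16_t))
        return -1;
    uint32_t w = ch->write_idx;
    if (w - ch->read_idx == SLOTS)          /* is there space? */
        return -1;
    uint8_t *slot = ch->body[w % SLOTS];
    memcpy(slot, &len, sizeof len);         /* length as the first 2 bytes */
    memcpy(slot + sizeof len, msg, len);    /* then the payload */
    ch->write_idx = w + 1;                  /* update write pointer */
    return 0;
}

/* Reader: returns payload length, or -1 if nothing is pending
   or the output buffer is too small. */
int channel_read(channel *ch, void *out, size_t cap)
{
    uint32_t r = ch->read_idx;
    if (r == ch->write_idx)                 /* poll the write pointer */
        return -1;
    const uint8_t *slot = ch->body[r % SLOTS];
    uint16_t len;
    memcpy(&len, slot, sizeof len);
    assert(len <= SLOT_BYTES - sizeof len); /* writer never stores more */
    if (len > cap)
        return -1;
    memcpy(out, slot + sizeof len, len);    /* read data */
    ch->read_idx = r + 1;                   /* update read pointer */
    return (int)len;
}
```

Because each index is written by exactly one side, neither side needs a lock, which is the point of keeping the two pointers on separate cache lines.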
Socket API
    int stream_recv(int nid, void *buf, size_t len, int nb);
    int stream_send(int nid, const void *buf, size_t len);
  [Plot labels: L1, L2]
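To show how the two calls might be used together, here is a loopback mock with the same signatures. Everything about the behaviour is an assumption made for illustration: the slides do not say what `nid` and `nb` mean or what the return values are. The sketch treats `nid` as a node id, `nb` as a non-blocking flag, and the return value as a byte count, and `recv_all` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mock transport: one in-memory byte queue standing in for the framework. */
static unsigned char fifo[256];
static size_t fifo_len;

/* Assumed semantics: returns bytes queued, or -1 if the mock queue is full. */
int stream_send(int nid, const void *buf, size_t len)
{
    (void)nid;                              /* the mock ignores the node id */
    if (len > sizeof fifo - fifo_len)
        return -1;
    memcpy(fifo + fifo_len, buf, len);
    fifo_len += len;
    return (int)len;
}

/* Assumed semantics: nb != 0 means non-blocking; returns bytes copied,
   or 0 when nothing is pending. */
int stream_recv(int nid, void *buf, size_t len, int nb)
{
    (void)nid;
    if (fifo_len == 0) {
        assert(nb);                         /* the mock cannot block */
        return 0;
    }
    size_t n = len < fifo_len ? len : fifo_len;
    memcpy(buf, fifo, n);
    memmove(fifo, fifo + n, fifo_len - n);  /* shift the remaining bytes down */
    fifo_len -= n;
    return (int)n;
}

/* Hypothetical helper: gather up to len bytes with non-blocking reads. */
int recv_all(int nid, void *buf, size_t len)
{
    size_t got = 0;
    while (got < len) {
        int n = stream_recv(nid, (char *)buf + got, len - got, 1);
        if (n < 0)
            return -1;
        if (n == 0)
            break;                          /* mock: stop instead of spinning */
        got += (size_t)n;
    }
    return (int)got;
}
```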
MPI
  [Plot labels: L2, L1]
Power Goals
  • External monitoring of voltage and current
  • Backend Power API
  • Update time functions with frequency changes
  • Keep chip under safe conditions!!
  • Internal synchronization of clocks
  • External synchronization of host and SCC
External Monitoring
  • Read /opt/sccKit/systemSettings.ini
  • Telnet BMC 5010
  • Request Status / Parse Data
  • Store timestamps
Backend Power API
    power_session scc_open_power(heap h);
    void scc_close_power(power_session ps);
    int scc_set_freq(power_session ps, u32 requested_frequency);
    int scc_set_voltage(power_session ps, u32 requested_millivolts);
    char* scc_error_string(status_code code);
  [Figure labels: Allowable, Frequency, Voltage]
Internal Synchronization
  • Cores come out of sccReset in 20ms intervals
  • Each core’s clock starts at cycle 0 at reset
  • Each core’s frequency may be different
  • Solution:
      • Set all cores to 400MHz
      • Barrier
      • After Barrier, set internal integrator to 0
Formulas for Time
  Use this formula for time:
    count = scc_cycle_count() - _integral_cycle;
    ns = _integral_time_ns + count * _current_ns_in_cycles;
  Use this for frequency change (the cycle counter is read once into current_time so both updates see the same value):
    current_time = scc_cycle_count();
    _integral_time_ns += (current_time - _integral_cycle) * _current_ns_in_cycles;
    _integral_cycle = current_time;
    _current_ns_in_cycles = 1.0e9 / ((double)_global_clock / (double)freq_divider);
  [Figure labels: Freq, Integral Time, scc_cycle_count(), _integral_cycle]
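The same bookkeeping can be written as a small self-contained C module. This is a sketch under assumptions: the `scc_clock` struct and function names are hypothetical, the slide's file-scope variables are gathered into struct fields, `_global_clock` is taken to be in Hz, and the cycle count is passed in as an argument so the code runs without SCC hardware.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the slide's _integral_* variables. */
typedef struct {
    double   global_clock_hz;   /* _global_clock; units assumed to be Hz */
    double   integral_time_ns;  /* ns accumulated up to the last change */
    uint64_t integral_cycle;    /* cycle count at the last change */
    double   ns_per_cycle;      /* _current_ns_in_cycles */
} scc_clock;

/* Start integrating from now_cycles at the given divider. */
void clock_init(scc_clock *c, double global_clock_hz,
                unsigned freq_divider, uint64_t now_cycles)
{
    assert(freq_divider != 0);
    c->global_clock_hz  = global_clock_hz;
    c->integral_time_ns = 0.0;
    c->integral_cycle   = now_cycles;
    c->ns_per_cycle     = 1.0e9 / (global_clock_hz / (double)freq_divider);
}

/* Time formula: accumulated ns plus cycles since the last change. */
double clock_now_ns(const scc_clock *c, uint64_t now_cycles)
{
    uint64_t count = now_cycles - c->integral_cycle;
    return c->integral_time_ns + (double)count * c->ns_per_cycle;
}

/* Frequency change: fold elapsed cycles in at the old rate, then switch. */
void clock_set_divider(scc_clock *c, unsigned freq_divider, uint64_t now_cycles)
{
    assert(freq_divider != 0);
    c->integral_time_ns += (double)(now_cycles - c->integral_cycle) * c->ns_per_cycle;
    c->integral_cycle    = now_cycles;
    c->ns_per_cycle      = 1.0e9 / (c->global_clock_hz / (double)freq_divider);
}
```

As an illustration with made-up inputs: a 1.6 GHz global clock and divider 4 give 2.5 ns per cycle, so 400 cycles is 1000 ns. Switching to divider 2 at that point and running another 800 cycles adds 800 * 1.25 ns, for 2000 ns in total.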
Syncing Front/Back
  • Change voltage from 0.7 to 1.1 every 1 second
  • Measure changes on frontend
  • Cannot get better than 0.5 seconds
Bug in BMC Voltage Readings
  • 3 power islands
  • Drop voltage from 1.2 to 0.7 immediately
  • Raise Voltage after 20 seconds
  [Figure annotations: 20.5 Seconds, 0.6 Seconds]
Other SCC issues
  • If more than 24 cores pound on one MPB, contention overtakes system.
      • Sleep required between polling
  • Allowable Voltage/freq are chip specific
  • BMC telnet response is > 100ms
Future Work
  • DARPA UHPC: Study how voltage/freq affect power dissipation
  • Allan Snavely (UCSD)
  • Systematically study loops over a number of parameters to find the best voltage/freq.
  • Create formulas to approximate good power settings for unknown loops
Access to Software
  • Email scc-support@etinternational.com
  • Beta available
  • Considering open sourcing SCC-specific portions of our work for others to test/learn/improve
Acknowledgements
  • Mark Deazley (ETI)
  • Eric Hoffman (ETI)
  • Allan Snavely (UCSD)
  • Intel: Tim Mattson, Ted Kubaska, Rob Noradki, Wilf Pinfold, Shekhar Borkar (UHPC)



