4. Messaging Goals
- Asynchronous communications
- Single threaded
- Possibly long latency until data is received
- Maximize bandwidth
- Handle big and small messages
- Extensible layer that supports MPI, BSD sockets, etc.
5. Design Choices
- One channel per core pair per direction
- Large window size (up to 1 MB per channel)
- Fast polling of incoming data (use the MPB)
- Circular buffer with 16 slots and read/write pointers
- Poll local pointers, signal remote pointers
- Use separate cache lines to avoid locking: 2 cache lines * 48 channels = 3K per core
- Double-map read and write pages
  - Read: L2 cache enabled
  - Write: L2 cache disabled (write back)
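The "2 cache lines * 48 channels = 3K per core" figure can be checked with a small layout sketch. This is only an illustration: the 32-byte cache line is inferred from the slide's arithmetic, and the type and function names (`padded_index`, `channel_mpb`, `mpb_bytes_per_core`, `next_slot`) are hypothetical, not taken from the actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE   32u  /* assumed: 3K / (2 lines * 48 channels) = 32 bytes */
#define NUM_SLOTS    16u  /* slots per circular buffer, from the slide */
#define NUM_CHANNELS 48u  /* channels per core per direction, from the slide */

/* One index padded to a full cache line, so the reader's and the writer's
 * indices never share a line and no lock is needed. */
typedef struct {
    volatile uint32_t value;
    uint8_t pad[CACHE_LINE - sizeof(uint32_t)];
} padded_index;

/* Per-channel polling state: two cache lines (hypothetical layout). */
typedef struct {
    padded_index mpb_write;
    padded_index mpb_read;
} channel_mpb;

/* Bytes of polling state one core needs for all of its channels. */
static size_t mpb_bytes_per_core(void) {
    return sizeof(channel_mpb) * NUM_CHANNELS;
}

/* Advance a slot index around the 16-slot ring. */
static uint32_t next_slot(uint32_t slot) {
    assert(slot < NUM_SLOTS);
    return (slot + 1u) % NUM_SLOTS;
}
```

Under these assumptions `mpb_bytes_per_core()` evaluates to 3072 bytes, matching the 3K on the slide.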
6. Circular Buffer Example
Core 1 (writer):
- Is there space?
- Write the data (with the length as the first 2 bytes)
- Update write pointer
Core 0 (reader):
- Poll local write pointer
- Read data
- Update read pointer
[Diagram labels: cache, DRAM, and MPB for each core; Channel->body[], Channel->local_write, Channel->local_read, Channel->mpb_read, Channel->mpb_write]
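The writer and reader steps can be sketched as a single-process ring buffer in C. This is a simplified illustration, not the actual implementation: it collapses the local and MPB copies of each pointer into one `write` and one `read` index, uses a hypothetical `SLOT_SIZE`, omits the cache flushes and memory ordering that real cross-core use would need, and the names `channel_send` and `channel_recv` are made up for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NUM_SLOTS   16u               /* slots per channel, from the slides */
#define SLOT_SIZE   256u              /* hypothetical; kept small for the sketch */
#define MAX_PAYLOAD (SLOT_SIZE - 2u)  /* 2 bytes are reserved for the length */

typedef struct {
    uint8_t body[NUM_SLOTS][SLOT_SIZE]; /* message slots */
    volatile uint32_t write;            /* next slot the writer fills */
    volatile uint32_t read;             /* next slot the reader drains */
} channel;

/* Writer: returns 0 on success, -1 if the ring is full or len is too large. */
static int channel_send(channel *ch, const void *data, uint16_t len) {
    assert(ch->write < NUM_SLOTS);
    uint32_t next = (ch->write + 1u) % NUM_SLOTS;
    if (len > MAX_PAYLOAD) return -1;
    if (next == ch->read) return -1;          /* is there space? */
    uint8_t *slot = ch->body[ch->write];
    slot[0] = (uint8_t)(len & 0xffu);         /* length as the first 2 bytes */
    slot[1] = (uint8_t)(len >> 8);
    memcpy(slot + 2, data, len);              /* write the data */
    ch->write = next;                         /* update write pointer */
    return 0;
}

/* Reader: returns the payload length, or -1 if nothing is pending
 * or the caller's buffer is too small. */
static int channel_recv(channel *ch, void *out, uint16_t cap) {
    if (ch->read == ch->write) return -1;     /* poll write pointer */
    const uint8_t *slot = ch->body[ch->read];
    uint16_t len = (uint16_t)(slot[0] | (slot[1] << 8));
    if (len > cap) return -1;
    memcpy(out, slot + 2, len);               /* read data */
    ch->read = (ch->read + 1u) % NUM_SLOTS;   /* update read pointer */
    return (int)len;
}
```

In this sketch one slot always stays empty so that "full" and "empty" can be told apart, which means at most 15 messages are in flight per channel.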
9. Power Goals
- External monitoring of voltage and current
- Backend power API
- Update time functions with frequency changes
- Keep the chip under safe conditions!
- Internal synchronization of clocks
- External synchronization of host and SCC
10. External Monitoring
- Read /opt/sccKit/systemSettings.ini
- Telnet to the BMC on port 5010
- Request status and parse the data
- Store timestamps
11. Backend Power API
    power_session scc_open_power(heap h);
    void scc_close_power(power_session ps);
    int scc_set_freq(power_session ps, u32 requested_frequency);
    int scc_set_voltage(power_session ps, u32 requested_millivolts);
    char* scc_error_string(status_code code);
[Figure: allowable frequency vs. voltage]
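A caller might use these five functions roughly as follows. Everything beyond the signatures is an assumption made so the sketch compiles off-chip: the type definitions, the mock function bodies, the convention that 0 means success, the frequency unit, and the helper `apply_power_setting` are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in types: the slide names them but does not define them. */
typedef uint32_t u32;
typedef int status_code;               /* assumed: 0 means success */
typedef void *heap;                    /* opaque allocator handle */
struct mock_session { u32 freq; u32 millivolts; int open; };
typedef struct mock_session *power_session;

/* Mock backend (hypothetical): records requests instead of touching hardware. */
static struct mock_session mock;

power_session scc_open_power(heap h) { (void)h; mock.open = 1; return &mock; }
void scc_close_power(power_session ps) { ps->open = 0; }
int scc_set_freq(power_session ps, u32 requested_frequency) {
    assert(ps->open);
    ps->freq = requested_frequency;
    return 0;
}
int scc_set_voltage(power_session ps, u32 requested_millivolts) {
    assert(ps->open);
    ps->millivolts = requested_millivolts;
    return 0;
}
char *scc_error_string(status_code code) {
    return code == 0 ? "ok" : "error (mock)";
}

/* Caller side: open a session, request a voltage and then a frequency,
 * report any failure, and always close the session. */
int apply_power_setting(u32 millivolts, u32 frequency) {
    power_session ps = scc_open_power(NULL);
    if (ps == NULL) return -1;
    int rc = scc_set_voltage(ps, millivolts);
    if (rc == 0) rc = scc_set_freq(ps, frequency);
    if (rc != 0)
        fprintf(stderr, "power request failed: %s\n", scc_error_string(rc));
    scc_close_power(ps);
    return rc;
}
```

For example, `apply_power_setting(1100, 400)` would request 1100 mV and a frequency value of 400. The slide does not say what unit `requested_frequency` takes.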
12. Internal Synchronization
- Cores come out of sccReset at 20 ms intervals
- Each core's clock starts at cycle 0 at reset
- Each core's frequency may be different
- Solution: set all cores to 400 MHz, then barrier
- After the barrier, set the internal integrator to 0
13. Formulas for Time
Use this formula for time:
    count = scc_cycle_count() - _integral_cycle;
    ns = _integral_time_ns + count * _current_ns_in_cycles;
Use this for a frequency change:
    _integral_time_ns += (scc_cycle_count() - _integral_cycle) * _current_ns_in_cycles;
    _integral_cycle = current_time;
    _current_ns_in_cycles = 1.0e9 / ((double)_global_clock / (double)freq_divider);
[Figure: timeline of frequency changes (Freq …) showing integral time, scc_cycle_count(), and _integral_cycle]
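The two formulas can be exercised off-chip with a simulated cycle counter. In this sketch the counter (`fake_cycles`), the 1.6 GHz value for the global clock, and the function names `clock_reset`, `clock_now_ns`, and `clock_set_divider` are assumptions for illustration. The leading underscores are dropped from the variable names because such file-scope identifiers are reserved in C.

```c
#include <assert.h>
#include <stdint.h>

/* Simulated cycle counter; on the SCC this would read the core's counter. */
static uint64_t fake_cycles;
static uint64_t scc_cycle_count(void) { return fake_cycles; }

#define GLOBAL_CLOCK_HZ 1.6e9  /* assumed value for _global_clock */

static uint64_t integral_cycle;        /* cycle count at the last frequency change */
static double   integral_time_ns;      /* time accumulated up to that change */
static double   current_ns_in_cycles;  /* nanoseconds per cycle at the current frequency */

static double ns_per_cycle(unsigned freq_divider) {
    assert(freq_divider > 0);
    return 1.0e9 / (GLOBAL_CLOCK_HZ / (double)freq_divider);
}

/* After the barrier: set the integrator to 0 and record the frequency. */
static void clock_reset(unsigned freq_divider) {
    integral_cycle = scc_cycle_count();
    integral_time_ns = 0.0;
    current_ns_in_cycles = ns_per_cycle(freq_divider);
}

/* Time formula: accumulated time plus cycles since the last change. */
static double clock_now_ns(void) {
    uint64_t count = scc_cycle_count() - integral_cycle;
    return integral_time_ns + (double)count * current_ns_in_cycles;
}

/* Frequency change: fold elapsed cycles into the integral, then switch rate. */
static void clock_set_divider(unsigned freq_divider) {
    uint64_t current_time = scc_cycle_count();  /* read once so no cycles are lost */
    integral_time_ns += (double)(current_time - integral_cycle) * current_ns_in_cycles;
    integral_cycle = current_time;
    current_ns_in_cycles = ns_per_cycle(freq_divider);
}
```

With a divider of 4 (400 MHz under the assumed global clock), 400 cycles account for 1000 ns. Switching to a divider of 2 and running another 800 cycles adds 1000 ns more, so the reported time keeps increasing smoothly across the frequency change.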
14. Syncing Front/Back
- Change voltage from 0.7 V to 1.1 V every 1 second
- Measure changes on the frontend
- Cannot get better than 0.5 seconds
15. Bug in BMC Voltage Readings
- 3 power islands
- Drop voltage from 1.2 V to 0.7 V immediately
- Raise voltage after 20 seconds
[Figure annotations: 20.5 seconds, 0.6 seconds]
16. Other SCC Issues
- If more than 24 cores pound on one MPB, contention overtakes the system
- Sleep required between polling
- Allowable voltage/frequency settings are chip specific
- BMC telnet response is > 100 ms
17. Future Work
- DARPA UHPC: study how voltage/frequency affect power dissipation
  - Allan Snavely (UCSD)
- Systematically study loops over a number of parameters to find the best voltage/frequency
- Create formulas to approximate good power settings for unknown loops
18. Access to Software
- Email scc-support@etinternational.com
- Beta available
- Considering open sourcing SCC-specific portions of our work for others to test/learn/improve
19. Acknowledgements
- Mark Deazley (ETI)
- Eric Hoffman (ETI)
- Allan Snavely (UCSD)
- Intel: Tim Mattson, Ted Kubaska, Rob Noradki, Wilf Pinfold, Shekhar Borkar (UHPC)