Staggering SPI Performance
Or how I learned to stop bit-banging and love SPDR
                       --Dg
Project background

                               • Multi-LED project
                               • Shift-register-based controllers
                               • Software PWM
                               • 40-60% CPU use target

Wi nøt tei a høliday in Sweden this yer?
Software PWM
                       • 32 RGB LEDs, 96 individual LEDs
                       • 12 8-bit shift registers to turn LEDs on/off
                       • 12 bytes output per round
                       • 128 levels of color desired
                       • ~100 Hz refresh rate wanted
                       • 128 levels × 100 Hz = 12,800 rounds/second, 12 bytes each
See the løveli lakes
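A minimal sketch of how one PWM round could be built, assuming each LED has a brightness level from 0 to 127 and that bit n of the 96-bit frame drives LED n. The slides do not give the real mapping, so `pwm_build_round`, `NUM_LEDS`, `NUM_REGS` and `PWM_LEVELS` are hypothetical names. An LED is lit in a round while its level exceeds the current step, so stepping through 128 levels 100 times a second yields the 12,800 rounds/second above.

```c
#include <assert.h>
#include <stdint.h>

/* Sizes taken from the slide; the names are illustrative only. */
enum { NUM_LEDS = 96, NUM_REGS = 12, PWM_LEVELS = 128 };

/* Build the 12 bytes for one round of software PWM.
 * level[n] is the brightness of LED n (0 = off, 127 = brightest).
 * step is the current position in the PWM cycle (0..127).
 * Assumption: LED n maps to bit (n % 8) of byte (n / 8). */
static void pwm_build_round(const uint8_t level[NUM_LEDS], uint8_t step,
                            uint8_t out[NUM_REGS])
{
    assert(step < PWM_LEVELS);
    for (uint8_t reg = 0; reg < NUM_REGS; reg++) {
        uint8_t bits = 0;
        for (uint8_t bit = 0; bit < 8; bit++) {
            /* The LED stays on for the first level[n] steps of the cycle. */
            if (level[reg * 8 + bit] > step)
                bits |= (uint8_t)(1u << bit);
        }
        out[reg] = bits;
    }
}
```

Calling this once per step and shifting `out` to the registers gives each LED an on-time proportional to its level.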
NO BITBANGING


                            • Bit twiddling may be fun, but it is slow
                            • Order of magnitude slower than hardware SPI - numbers later




The wonderful telephøne system
Arduino Hardware SPI
                              • Usage commonly suggested online:
                                     while(morestuff) {
                                       data = morestuff;
                                       SPDR = data;
                                       while(!(SPSR & (1<<SPIF)));
                                     }



And mani interesting furry animals
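The loop above is pseudocode. A compilable version might look like the sketch below, which assumes the SPI peripheral has already been configured as a master elsewhere. `spi_send_blocking` is a hypothetical name, and the `#else` branch holds host-side stand-ins for the registers so the file also builds off-target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>                 /* real SPDR, SPSR and SPIF */
#else
/* Host stand-ins (assumption): the transfer always looks complete. */
static volatile uint8_t SPDR;
static volatile uint8_t SPSR = 0x80;
#define SPIF 7
#endif

/* Send len bytes, spinning after every write until the byte has left. */
static void spi_send_blocking(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        SPDR = buf[i];                      /* start shifting the byte out */
        while (!(SPSR & (1 << SPIF))) { }   /* busy-wait for completion */
    }
}
```

All of the time spent in the inner `while` is time the CPU cannot use for anything else.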
Waiting for ?

                               • 63.2 µs per 12-byte round of output
                               • 12,500 rounds/second - 80% CPU used
                               • 28% of time spent waiting for SPI to be ready
                               • waiting sucks

Including the majestic møøse
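The CPU figure follows from the per-round cost: 63.2 µs × 12,500 rounds/second is about 0.79 s of work in every second, which the slide rounds to 80%. A small sketch of that arithmetic in integer math, with `cpu_load_permille` as a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>

/* CPU load in tenths of a percent (per mille).
 * round_tenths_us is the cost of one round in tenths of a microsecond,
 * so 63.2 µs is passed as 632. */
static uint32_t cpu_load_permille(uint32_t round_tenths_us,
                                  uint32_t rounds_per_second)
{
    /* Busy time per second, in tenths of a microsecond. */
    uint64_t busy = (uint64_t)round_tenths_us * rounds_per_second;
    assert(busy <= 10000000u);      /* one second holds 10,000,000 tenths of a µs */
    return (uint32_t)(busy / 10000u);
}
```

With the slide's numbers, `cpu_load_permille(632, 12500)` evaluates to 790, i.e. 79.0%.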
Staggered SPI output
                                 • Do work while waiting:
                           data = stuff;
                           SPDR=data;
                           while(morestuff) {
                             data = morestuff;
                             while(!(SPSR & (1<<SPIF)));
                             SPDR=data;
                           }

A Møøse once bit my sister ...
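A compilable sketch of the staggered pattern, again assuming the SPI master is set up elsewhere. `spi_send_staggered` is a hypothetical name, and the final wait is an addition not shown on the slide: it is there so a caller can safely latch the shift registers once the function returns. The `#else` branch holds host-side stand-ins for the registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>                 /* real SPDR, SPSR and SPIF */
#else
/* Host stand-ins (assumption): the transfer always looks complete. */
static volatile uint8_t SPDR;
static volatile uint8_t SPSR = 0x80;
#define SPIF 7
#endif

/* Send len bytes, preparing the next byte while the previous one shifts. */
static void spi_send_staggered(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len == 0)
        return;
    SPDR = buf[0];                          /* first byte goes out at once */
    for (size_t i = 1; i < len; i++) {
        uint8_t data = buf[i];              /* work done during the transfer */
        while (!(SPSR & (1 << SPIF))) { }   /* usually a much shorter spin */
        SPDR = data;
    }
    while (!(SPSR & (1 << SPIF))) { }       /* added: let the last byte finish */
}
```

The more work that fits between the write and the wait (here only a load, in the real project the PWM computation), the less time is left to spin.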
Staggered numbers

                              • 49.1 µs/round - 61% CPU spent doing PWM
                              • 7% of that still spent spinning on SPSR
                              • More to peel out?

No realli! She was Karving her initials on the møøse with the sharpened end of an interspace tøøthbrush given her by Svenge - her brother-in-law - an Oslo dentist and the star of many Norwegian møvies: “The Høt Hands of an Oslo Dentist”, “Fillings of Passion”, “The Huge Mølars of Horst Nordfink”
Squeezing more
                              • Loop unrolling buys more time
                              • 42.4 µs/round - 53% CPU usage
                                SPDR = data; data=morestuff; SPSRWait;
                                SPDR = data; data=morestuff; SPSRWait;
                                SPDR = data; data=morestuff; SPSRWait;
                                SPDR = data; data=morestuff; SPSRWait;
                                ....

Mynd you, møøse bites Kan be pretty nasti...
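One way the unrolled sequence could be written out for a fixed 12-byte round is sketched below. `SPSR_WAIT`, `SEND_AND_FETCH` and `spi_send_round_unrolled` are hypothetical names (the slide only shows `SPSRWait`), the SPI master is assumed to be configured elsewhere, and the `#else` branch holds host-side stand-ins for the registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>                 /* real SPDR, SPSR and SPIF */
#else
/* Host stand-ins (assumption): the transfer always looks complete. */
static volatile uint8_t SPDR;
static volatile uint8_t SPSR = 0x80;
#define SPIF 7
#endif

/* Spin until the SPI transfer-complete flag is set. */
#define SPSR_WAIT() do { while (!(SPSR & (1 << SPIF))) { } } while (0)

/* Write the current byte, fetch the next one during the transfer, then wait. */
#define SEND_AND_FETCH(data, p) \
    do { SPDR = (data); (data) = *(p)++; SPSR_WAIT(); } while (0)

/* Send exactly 12 bytes with no loop counter, compare or branch per byte. */
static void spi_send_round_unrolled(const uint8_t buf[12])
{
    assert(buf != NULL);
    const uint8_t *p = buf;
    uint8_t data = *p++;            /* buf[0] */
    SEND_AND_FETCH(data, p);        /* sends buf[0], fetches buf[1] */
    SEND_AND_FETCH(data, p);
    SEND_AND_FETCH(data, p);
    SEND_AND_FETCH(data, p);
    SEND_AND_FETCH(data, p);
    SEND_AND_FETCH(data, p);
    SEND_AND_FETCH(data, p);
    SEND_AND_FETCH(data, p);
    SEND_AND_FETCH(data, p);
    SEND_AND_FETCH(data, p);
    SEND_AND_FETCH(data, p);        /* sends buf[10], fetches buf[11] */
    SPDR = data;                    /* buf[11] */
    SPSR_WAIT();
}
```

Unrolling trades flash space for the cycles the loop bookkeeping would otherwise cost on every byte.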
Bitbanging?

                              • 586.6 µs/round
                              • 73% CPU usage
                              • only 1,000 rounds/second - 10 levels of color at 100 Hz, or 20 at 50 Hz
                              • any higher flickers

Did you kno that Møøse like bit banging?
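For comparison, a bit-banged sender might look like the sketch below. The pin choice (bits 3 and 5 of `PORTB`) and the names `SR_PORT`, `SR_DATA`, `SR_CLK` and `bitbang_send` are assumptions for illustration, and the pins are assumed to be configured as outputs elsewhere. Every bit needs a test, a data write and two clock writes, where hardware SPI needs one register write per byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>
#define SR_PORT PORTB               /* hypothetical wiring */
#else
/* Host stand-in (assumption) so the sketch builds off-target. */
static volatile uint8_t sr_port_sim;
#define SR_PORT sr_port_sim
#endif
#define SR_DATA 3                   /* hypothetical data pin */
#define SR_CLK  5                   /* hypothetical clock pin */

/* Shift len bytes out, most significant bit first, one pin write at a time. */
static void bitbang_send(const uint8_t *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        for (uint8_t mask = 0x80; mask != 0; mask >>= 1) {
            if (buf[i] & mask)
                SR_PORT |= (uint8_t)(1u << SR_DATA);
            else
                SR_PORT &= (uint8_t)~(1u << SR_DATA);
            SR_PORT |= (uint8_t)(1u << SR_CLK);     /* rising edge clocks the bit */
            SR_PORT &= (uint8_t)~(1u << SR_CLK);    /* return the clock low */
        }
    }
}
```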
More techniques
                            • Smoother PWM by spreading out work
                            • Hiding SPSR waiting in interrupts
                            • Clock counting, hand assembler
                            • Take advantage of AVR opcodes: *(--pData) becomes a single opcode (decrement and dereference)


They dø! Dirty, dirty bit banging møøses...
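The last bullet refers to the AVR load instruction with pre-decrement addressing (`LD Rd, -X`), which decrements the pointer and reads through it in one opcode. A sketch of a staggered send that walks its buffer backwards so the compiler has the chance to use it; whether it actually does depends on the compiler and flags. `spi_send_backwards` is a hypothetical name, the SPI master is assumed to be configured elsewhere, and the `#else` branch holds host-side stand-ins for the registers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#ifdef __AVR__
#include <avr/io.h>                 /* real SPDR, SPSR and SPIF */
#else
/* Host stand-ins (assumption): the transfer always looks complete. */
static volatile uint8_t SPDR;
static volatile uint8_t SPSR = 0x80;
#define SPIF 7
#endif

/* Send len bytes from the end of buf to its start (len must be > 0). */
static void spi_send_backwards(const uint8_t *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    const uint8_t *pData = buf + len;       /* one past the last byte */
    SPDR = *(--pData);                      /* last byte goes out first */
    while (pData != buf) {
        uint8_t data = *(--pData);          /* decrement, then dereference */
        while (!(SPSR & (1 << SPIF))) { }
        SPDR = data;
    }
    while (!(SPSR & (1 << SPIF))) { }       /* let the final byte finish */
}
```

Sending in reverse only works if the data is laid out so the byte for the farthest shift register sits at the end of the buffer.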
Fin

                                                    Daniel Garcia
                                                 dgarcia@dgarcia.net
                                                        --Dg




Møøse trained by TUTTE HERMSGERVORDENBROTBORDA
