Michael Vistine Katy Rodriguez Ralph Walker II
1/38
For our senior design project, we are testing high-performance computing using the Raspberry Pi 2. The Raspberry
Pi 2 offers a powerful 900 MHz quad-core ARM CPU that will be pushed to its limit through tests such as
wired vs. wireless networking, number of cores vs. execution time, and temperature vs. clock speed. The wired design is set up
with one master Pi communicating with three slave nodes through a router that we use as a switch. The master Pi runs
the test program over SSH on the slave Pis, which provide the main horsepower, with the work distributed
through Open MPI.
Michael Vistine
Software Engineer
2/38
Katy Rodriguez
Integration Engineer
Ralph Walker
Hardware Engineer
 Motivation
 Hardware/Design
Description
 Software
 Data
 Timeline/Current Status
 Conclusion & Questions
3/38
4/38
Design
• Cluster Computing
• Compact
• Active Cooling
Raspberry Pi
• Low Cost multicore processor
• Open Source Code
Characterization of the Design
• Nodes vs. Performance
• Wireless vs. Wired Performance
• Passive vs. Active cooling
Photo courtesy of azchipka.thechipkahouse.com
5/38
Photo courtesy of pcworld.com
             Pi 1 B+         Pi 2 B          BeagleBone                Pi 3
Processor    700 MHz         900-1000 MHz    1 GHz                     1.2 GHz
Cores        1               4               1                         4
RAM          512 MB          1 GB            512 MB                    1 GB
Peripherals  4 USB ports     4 USB ports     2 USB ports               4 USB ports
Power Draw   0.31 A          0.42 A          0.46 A                    0.58 A
Memory       Micro SD slot   Micro SD slot   2 GB on board & Micro SD  Micro SD slot
Price        ~$30            ~$35            ~$55                      ~$35
Photo courtesy of ti.com
Photo courtesy of adafruit.com
Photo courtesy of hifiberry.com
6/38
Photo courtesy of Amazon
• 2.4 amps per port
• Multi-device charging
• Surge protection
Anker 60W 6-Port USB Charger PowerPort
Photo courtesy of Amazon
Wireless Router TP-Link TL-WR841N
• 300 Mbps wireless connection
• Adjustable DHCP settings
• Wireless on/off switch
• 4 LAN ports
7/38
Cluster diagram: the master node RPI0 runs Test.c through Open MPI and connects via the router to the slave nodes RPI1, RPI2, and RPI3; all four boards share a common power source.
8/38
Final Design
• Custom 3D-printed enclosure designed in PTC Creo Elements
• Laser-cut plexiglass
• Wired/wireless router
• Heat sinks and PC fan
• Power hub
Photo courtesy Katy Rodriguez
 OPERATING SYSTEM –
RASPBIAN JESSIE
◦ Based on Debian Linux
◦ Lightweight OS
◦ Open source
◦ Bash terminal interface
◦ Kernel version 4.1
◦ Pre-installed with educational
programming languages
9/38
Photo courtesy raspberrypi.org
 Bash terminal – used to:
◦ Edit and create configuration files
 Style of syntax used to operate in the terminal
◦ $ sudo apt-get install <package> – used to install packages (see the example below)
 OpenMPI:
◦ Message Passing Interface implementation used for parallel computing
◦ Breaks the work into smaller chunks and distributes them to the nodes to run simultaneously
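For example, Open MPI and its compiler wrapper can be installed with apt-get; the package names below are assumed from the standard Debian/Raspbian repositories:
$ sudo apt-get update
$ sudo apt-get install openmpi-bin openmpi-common libopenmpi-dev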
10/38
 First, all packages were updated (example commands below)
 The configuration was finalized using sudo raspi-config
 Settings for the master were the same as for the slave nodes:
◦ Set the hostname to rpi0
◦ Enable SSH
◦ Set the memory split to 16
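A minimal sketch of this step, assuming the stock Raspbian tools:
$ sudo apt-get update && sudo apt-get upgrade
$ sudo raspi-config    # hostname, SSH, and memory split are set from its menus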
12/38
 Install all the same packages as on the master node
 Run sudo raspi-config to set the same system preferences as on the master node
13/38
Photo courtesy of www.raspberrypi.org
14/38
#include <stdio.h>   // Standard input/output library
#include <mpi.h>     // Open MPI header

int main(int argc, char** argv)
{
    // MPI variables
    int num_processes;
    int curr_rank;
    char proc_name[MPI_MAX_PROCESSOR_NAME];
    int proc_name_len;

    // Initialize MPI
    MPI_Init(&argc, &argv);

    // Get the number of processes
    MPI_Comm_size(MPI_COMM_WORLD, &num_processes);

    // Get the rank of the current process
    MPI_Comm_rank(MPI_COMM_WORLD, &curr_rank);

    // Get the processor name for the current process
    MPI_Get_processor_name(proc_name, &proc_name_len);

    // Check that we're running this process
    printf("Calling process %d out of %d on %s\n", curr_rank, num_processes, proc_name);

    // Wait for all processes to finish and clean up MPI
    MPI_Finalize();
    return 0;
}
• Creates user-specified dummy processes of equal size
• Allocates the processes dynamically to each node
• Displays the process number upon completion (example compile/run commands below)
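A hedged example of compiling and launching this program with the Open MPI wrappers; the source filename and host list are assumptions based on later slides:
$ mpicc call_procs.c -o call_procs
$ mpiexec -n 4 -H rpi0,rpi1,rpi2,rpi3 ./call_procs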
#include <stdio.h>
#include <math.h>
#include <mpi.h>

#define TOTAL_ITERATIONS 10000

int main(int argc, char *argv[])
{
    // MPI variables
    …

    sum = 0.0;

    // Determine step size
    h = 1.0 / (double) total_iter;

    // The current process performs the iterations matching its rank,
    // advancing by multiples of the total number of processes
    for (step_iter = curr_rank + 1; step_iter <= total_iter; step_iter += num_processes)
    {
        …   // accumulate this step's contribution into sum
    }

    // Resolve the sum into this process's calculated value of pi
    curr_pi = h * sum;

    // Reduce all processes' pi values to one value
    MPI_Reduce(&curr_pi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    // Print out the final value and error
    printf("Calculated Pi = %.16f\n", pi);
    printf("Relative Error = %.16f\n", fabs(pi - M_PI));

    // Wrap up MPI
    MPI_Finalize();
    return 0;
}
15/38
This program calculates the value of pi using 10,000 iterations split across the threads
 SSH keys were generated; a passphrase is recommended
◦ The generator also prints the key's randomart image, a block of random-looking characters
 Next, the key is copied to the slave nodes (example commands below)
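A minimal sketch of the key setup, assuming RSA keys and the common mpiu user described later:
$ ssh-keygen -t rsa
$ ssh-copy-id mpiu@rpi1    # repeat for rpi2 and rpi3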
16/38
Photo courtesy visualgdb.com
 Set all node IP addresses as static in:
◦ sudo nano /etc/network/interfaces (edit on all nodes)
 Point all hostnames to the now-static IPs:
◦ sudo nano /etc/hosts (edit on all nodes)
 We were only able to set up either wired or wireless static IPs at one time, to prevent conflicts with the mounts (a wired example is sketched below)
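A wired static-address stanza in /etc/network/interfaces might look like the following; the exact addresses are assumptions, as the deck only confirms the 192.168.0.0/24 wired subnet:
auto eth0
iface eth0 inet static
    address 192.168.0.10
    netmask 255.255.255.0
    gateway 192.168.0.1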
17/38
 Setting up the wireless connection was essentially the same as setting up the wired connection
 /etc/hosts was edited and the new wireless IP addresses and hostnames were added (example entries below)
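Hypothetical /etc/hosts entries for the wireless subnet (addresses assumed; the deck only confirms 192.168.1.0/24 for wireless):
192.168.1.10    rpi0
192.168.1.11    rpi1
192.168.1.12    rpi2
192.168.1.13    rpi3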
18/38
Photo courtesy of Mike Vistine
19/38
 This figure shows the wireless setup of /etc/network/interfaces
Photo courtesy of Mike Vistine
20/38
 Next, a common user (mpiu) was created on all nodes to allow the nodes to communicate without the need for repeated password entry
 Next, the nodes were mounted onto the master node (sketch below)
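A minimal sketch of these two steps, assuming the master node exports the shared /mirror directory over NFS and each slave mounts it:
$ sudo adduser mpiu                        # run on every node
$ sudo mount -t nfs rpi0:/mirror /mirror   # run on each slave node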
21/38
 sudo nano /etc/exports
◦ Line added at the bottom of the file:
◦ /mirror 192.168.0.0/24(rw,sync) [for wired]
◦ /mirror 192.168.1.0/24(rw,sync) [for wireless]
 These steps were repeated for all slave nodes (the export can then be refreshed as shown below)
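After /etc/exports is edited, the export table is typically refreshed with the following commands (assuming the nfs-kernel-server package provides the NFS service):
$ sudo exportfs -ra
$ sudo service nfs-kernel-server restart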
22/38
• For each node, /etc/rc.local was edited
• A few lines were added at the end of the file to print “mounting network drives” and mount the shared directory (a sketch is shown below)
• This script was supposed to automatically mount the drives on boot
• The automount function was incredibly slow
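The rc.local addition might look roughly like this; the share path and hostname are assumptions carried over from the NFS setup, since the actual lines appear only in the screenshot:
echo "mounting network drives"
mount -t nfs rpi0:/mirror /mirror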
Photo courtesy of Mike Vistine
23/38
 Log in as mpiu on the master node using
 su - mpiu
 Switch to the /mirror/code/ directory
 mpicc calc_pi.c -o calc_pi
 time mpiexec -n 4 -H RPI0-3 calc_pi (RPI0-3 denotes the four hostnames)
24/38
 The .c files and the executables in the directory, shown in the screenshot
 The execution of the call_procs program with mpiexec
25/38
Photo courtesy of Mike Vistine
 Here you can see an example of the output format while running the calc_pi test
 Each core and the number of threads are designated in the MPI command (example below)
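For instance, the number of processes is varied with the -n flag while the host list stays fixed (hostnames assumed):
$ time mpiexec -n 8 -H rpi0,rpi1,rpi2,rpi3 calc_pi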
Photo courtesy of Mike Vistine
26/38
 In order for wireless MPI to work, the mounts had to be set manually (see the commands below)
 The NFS kernel server had to be restarted each time the Pis were powered off or rebooted
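A sketch of the manual recovery steps after a reboot, assuming the same /mirror share: restart the NFS server on the master, then remount on each slave:
$ sudo service nfs-kernel-server restart
$ sudo mount -t nfs rpi0:/mirror /mirror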
27/38
 Wired vs. wireless performance
◦ Test the processing performance of the cluster when:
 Hard-wired to the router
 Using wireless dongles for each node to communicate wirelessly
 Computational benchmark tests
◦ Using benchmark software to observe total processing power across all Pis
◦ Using a complicated program as test material for the cluster to solve
 Graphical performance info
 Implementation of practical applications
 Active cooling of the Pis
◦ Fans implemented in the final case design
28/38
29/38
 Wired performance did prove to be more efficient
 The wireless values were inconsistent
 Each recorded value per core was an average of three runs
30/38
 Passive-cooling temperatures proved to be higher both before and after running the wireless data test.
 Active cooling significantly improved the temperature regulation of each Pi.
31/38
 Passive cooling results were very erratic.
 Active cooling results were consistent and had better
test times.
32/38
Project timeline: Aug. 28 – Sept. 27; Sept. 23 – Dec. 10; Oct. 11 – March 23; Jan. 4 – April 5; Feb. 9 – April 15
33/38
Budget charts: Total Project Budget; Budget from Scratch
 All project tests are complete
 Data has been collected for analysis
 The case is 98% complete
34/38
 Add finishing details to documentation and case
design
 Make senior design day poster
 Prepare for senior design day
35/38
 Experiment completed
 Wired proved to be faster and more reliable than wireless
 Active cooling made a significant difference in performance and temperature regulation
36/38
37/38
38/38