For our experiments, we used the Stampede Computer Cluster at the Texas Advanced Computing Center.
Stampede consists of 6,400 compute nodes providing 522,080 processing cores. For our tests, we used
up to 256 of these nodes, each containing two 8-core Intel Xeon E5-2680 processors and 32 gigabytes
of memory. At the time of this research, Stampede was ranked the 10th fastest computer in the
world [5].
Results
The graphs below show the results obtained with the proposed solution.
The Strong Scaling graphs show how the processing time varies for a fixed-size input when the work
is divided among up to 16 parallel threads on a single CPU. The Weak Scaling graphs, on the other
hand, keep the amount of data per thread fixed, so the total input grows with the number of threads.
The OpenMP results show that the multithreaded implementation takes advantage of the multi-core
architecture up to the point where the number of running threads matches the number of available
processing cores.
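For reference, the usual definitions behind Strong and Weak Scaling time and efficiency graphs are given below. The poster does not state its exact formulas, so treat these as the standard convention rather than the authors' own definition. Here T_p denotes the measured time with p workers (threads for OpenMP, processes for OpenMP + MPI).

```latex
% Strong scaling: the total input is fixed while the number of workers p grows
S_p = \frac{T_1}{T_p}, \qquad E_p^{\text{strong}} = \frac{S_p}{p} = \frac{T_1}{p\,T_p}

% Weak scaling: the input per worker is fixed, so the total input grows with p
E_p^{\text{weak}} = \frac{T_1}{T_p}
```

In both cases an efficiency close to 1 corresponds to ideal scaling, and a falling efficiency signals that overhead is growing faster than the useful work assigned to each worker.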
On top of the multithreaded OpenMP implementation, MPI is used to further improve the scalability of
our solution by adding distributed computing capabilities. The same Strong Scaling and Weak Scaling
measurements were taken, and the graphs below show the results of running the proposed solution on
up to 512 CPUs (256 nodes with two CPUs each), totaling 4096 parallel threads.
SCALABLE AND DISTRIBUTED APPROACH FOR RSA DECRYPTION ALGORITHM
Alysson Almeida1, Tiago de Almeida1
Prof. Dr. Christopher Stone1
1. Department of Computer Science,
Loyola University of Chicago, Chicago, IL
Introduction
The RSA algorithm was first described in 1977 by R.L. Rivest, A. Shamir, and L. Adleman as an
implementation of the concept of “public-key cryptosystems”. This concept had been introduced by
Diffie and Hellman, although their work did not include a practical implementation [1][2]. RSA is one
of the most widely deployed public-key cryptosystems and is used for both encryption and digital
signatures. The algorithm is based on modular arithmetic: decrypting a block c requires computing
m = c^d mod n, where the private exponent d is typically about as large as the modulus n. Because of
the magnitude of these numbers, decryption is computationally intensive. In this research, we discuss
the design, implementation, and results of a scalable, multithreaded, and distributed RSA decryption
algorithm that uses parallel and high-performance computing techniques to drastically reduce the
processing time. OpenMP distributes the work to multiple parallel threads running on the same CPU,
while MPI creates a distributed computing scenario in which hundreds of computers decrypt data in
parallel. The presented algorithm scales almost ideally, maintaining a nearly constant efficiency and
benefiting greatly from the parallel implementation. The experiments were performed on the Stampede
Computer Cluster at the Texas Advanced Computing Center.
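Decryption of a single block can be illustrated with textbook RSA and binary (square-and-multiply) modular exponentiation, which needs about log2(d) squarings and at most as many multiplications instead of d - 1 multiplications. This is a minimal Python sketch with a toy key that is far too small for real use. Production code, such as the GNU MP-based implementation referenced in [3], works on numbers hundreds of digits long with more elaborate arithmetic. The function names `mod_pow` and `rsa_decrypt_block` are hypothetical.

```python
def mod_pow(base, exponent, modulus):
    """Right-to-left binary (square-and-multiply) modular exponentiation."""
    result = 1 % modulus
    base %= modulus
    while exponent > 0:
        if exponent & 1:                      # current exponent bit is 1: multiply it in
            result = (result * base) % modulus
        base = (base * base) % modulus        # square the base for the next bit
        exponent >>= 1
    return result


def rsa_decrypt_block(c, d, n):
    """Textbook RSA decryption of one ciphertext block: m = c^d mod n."""
    return mod_pow(c, d, n)


# Toy key (insecure, for illustration only): p = 61, q = 53, n = 3233, e = 17, d = 2753.
n, e, d = 3233, 17, 2753
c = mod_pow(65, e, n)            # encrypt m = 65, giving c = 2790
m = rsa_decrypt_block(c, d, n)   # decrypt c, recovering m = 65
```

Python's built-in `pow(c, d, n)` computes the same value; the explicit loop is shown only to make the cost structure visible.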
Keywords — Multithreading, Distributed Computing, OpenMP, MPI, RSA
Research Objectives
The aim of this research is to develop software that implements a parallel version of the RSA
decryption algorithm. Multithreading and distributed computing methods are used to reach this
objective. This effort includes the development of a hybrid OpenMP/MPI program that maximizes the
use of computational resources and, consequently, decreases the time needed to decrypt large
ciphertexts.
Methods
The proposed method works as follows. The input ciphertext is divided into N blocks, which are
distributed to N processes using the Message Passing Interface (MPI). Each process runs on an
independent CPU that provides 8 cores for running parallel threads. To take full advantage of these
cores, each block is divided again into 8 smaller blocks, which are distributed to 8 threads running in
parallel on that CPU; OpenMP is used to create and manage the threads. As a result, the input
ciphertext is divided into N*8 blocks, which are decrypted by N processes running a total of N*8
parallel threads. Finally, all decrypted blocks are assembled to produce the final decrypted data. The
figure below shows the proposed parallel and distributed approach in contrast with the original serial
algorithm.
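The two-level decomposition described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: the ciphertext is modeled as a list of integer blocks, the MPI level is simulated by a plain loop over hypothetical ranks (a real program would scatter the blocks with MPI and gather the results), and Python threads stand in for OpenMP threads. Because of CPython's global interpreter lock, these threads show the structure of the work split but will not deliver OpenMP-style speedups. The names `split`, `decrypt_chunk`, `process_rank`, and `hybrid_decrypt` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

THREADS_PER_PROCESS = 8  # one thread per core of an 8-core CPU, as in the Methods section


def split(items, parts):
    """Split items into `parts` contiguous chunks whose sizes differ by at most one."""
    q, r = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + q + (1 if i < r else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def decrypt_chunk(chunk, d, n):
    """Work of one thread: decrypt its contiguous run of blocks, m = c^d mod n."""
    return [pow(c, d, n) for c in chunk]


def process_rank(block, d, n, threads=THREADS_PER_PROCESS):
    """Work of one simulated MPI rank: split its block among `threads` threads."""
    sub_blocks = split(block, threads)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() yields results in submission order, so block order is preserved.
        results = pool.map(decrypt_chunk, sub_blocks, [d] * threads, [n] * threads)
        return [m for part in results for m in part]


def hybrid_decrypt(ciphertext, d, n, processes):
    """Divide into `processes` blocks, each into 8 sub-blocks, decrypt, then reassemble."""
    blocks = split(ciphertext, processes)                 # stands in for the MPI scatter
    decrypted = [process_rank(b, d, n) for b in blocks]   # serial stand-in for parallel ranks
    return [m for part in decrypted for m in part]        # stands in for the MPI gather
```

With the toy key from the Introduction sketch, `hybrid_decrypt(ciphertext, 2753, 3233, processes=4)` splits the input into 4 x 8 = 32 chunks. Because both levels use contiguous chunks and the results are concatenated in order, the reassembled plaintext matches a serial decryption block for block.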
The Efficiency graphs for the OpenMP implementation confirm the behavior observed above: the
parallel implementation scales until it reaches the hardware limit, that is, the number of available
cores. The MPI Efficiency graph shows that using more than 256 CPUs to process a 1 MB input yields
diminishing returns because of the overhead of distributed computing.
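Simple arithmetic makes these diminishing returns concrete. This is a sketch under two assumptions that the poster does not state: 1 MB means 2^20 bytes, and the data is split evenly into N*8 blocks as described in the Methods section. At 256 and 512 CPUs each thread receives only a few hundred bytes. If, for example, a 2048-bit key were used (256-byte ciphertext blocks), that is about one or two RSA blocks per thread. Fixed costs such as message passing and thread management are therefore paid for very little decryption work. The helper `bytes_per_thread` is hypothetical.

```python
def bytes_per_thread(total_bytes, processes, threads_per_process=8):
    """Average input share of each thread when data is split into processes * threads blocks."""
    return total_bytes / (processes * threads_per_process)


ONE_MB = 2 ** 20  # assumption: 1 MB = 1,048,576 bytes

for p in (1, 16, 256, 512):
    # 1 -> 131072.0, 16 -> 8192.0, 256 -> 512.0, 512 -> 256.0 bytes per thread
    print(p, bytes_per_thread(ONE_MB, p))
```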
Conclusions
The proposed multithreaded and distributed implementation of the RSA
decryption algorithm scales almost ideally. The first approach, using only OpenMP, takes full
advantage of the multiple cores available in the CPU and saturates only when threads must compete
for processing time. The second approach fixes the number of threads at 8 per process and adds MPI
distributed computing capabilities on top. This hybrid version also scales nearly ideally for all
numbers of processors we tested, and its performance decreased only when the input data was too
small. In that case, the overhead of managing the distributed work across hundreds of processors
becomes apparent, and increasing the number of parallel processes above 256 brings diminishing
returns. For sufficiently large inputs, however, the proposed solution showed no scalability limit
within the tested range of up to 512 CPUs.
Acknowledgment
The authors would like to thank the research funding agency CAPES Foundation, Ministry of
Education of Brazil, for the scholarships granted to the graduate students participating in the
study. They would also like to thank Rajorshi Biswas, Shibdas Bandyopadhyay, and Anirban Banerjee
for distributing their code under the GNU General Public License. Funding to attend and present at
the BRASCON conference (Cambridge, MA) in March 2016 was provided by the Loyola University
Chicago Graduate School.
References
[1] R. Rivest, A. Shamir, and L. Adleman, "A method for obtaining digital signatures and public-key
cryptosystems," Communications of the ACM, vol. 21, no. 2, pp. 120-126, 1978.
[2] W. Diffie and M. Hellman, "New directions in cryptography," IEEE Transactions on Information
Theory, vol. IT-22, no. 6, pp. 644-654, Nov. 1976.
[3] R. Biswas, S. Bandyopadhyay, and A. Banerjee, "A fast implementation of the RSA algorithm
using the GNU MP library."
[4] A. Abusharekh and K. Gaj, "Comparative analysis of software libraries for public key
cryptography." http://www.hyperelliptic.org/SPEED/slides/Abusharekh_Gaj_SPEED.pdf
[5] TOP500 Supercomputer Sites, November 2015 list. http://www.top500.org/lists/2015/11/
(accessed February 8, 2016).
[Figure: result graphs. OpenMP: Strong Scaling Time and Strong Scaling Efficiency (inputs 100KB, 1MB,
10MB) and Weak Scaling Time and Weak Scaling Efficiency (inputs 100KB, 1MB), plotted against the
number of threads (up to 16). OpenMP + MPI: Strong Scaling Time and Strong Scaling Efficiency (inputs
1MB, 10MB, 100MB; up to 512 processes) and Weak Scaling Time and Weak Scaling Efficiency (inputs
100 KB, 1 MB; up to 256 processes). Time is in seconds.]