Parallel Numerical Verification of the σ_odd problem

Université Libre de Bruxelles
Computer Science Department
INFO-Y100 (4004940ENR) Parallel systems
Project
Parallel Numerical Verification of the σodd problem
Presentation
Olivier Pirson — olivier.pirson.opi@gmail.com
orcid.org/0000-0001-6296-9659
December 15, 2017
(Last modifications: September 11, 2019)
https://speakerdeck.com/opimedia/parallel-numerical-verification-of-the-s-odd-problem

Parallel
Numerical
Verification of
the σodd
problem
The problem
Computation
Simple algo.
Better algorithm
Parallel
implementations
Multi-threads
Message-passing
GPU (OpenCL)
Results
Speedup
Efficiency
Overhead
Benchmarks
tables
1 The problem
2 Computation
    Simple algorithm
    Better algorithm
3 Parallel implementations
    Multi-threads
    Message-passing (Open MPI)
    GPU (OpenCL)
4 Results
    Speedup
    Efficiency
    Overhead
    Benchmarks tables

The σodd and ςodd functions
σ(n) = sum of all divisors of n (sigma)
σodd(n) = sum of odd divisors of n (sigma odd)
All divisors of 18: {1, 2, 3, 6, 9, 18}
Only odd divisors: {1, 3, 9} so σodd(18) = 13
All divisors of 19: {1, 19}
Only odd divisors: {1, 19} so σodd(19) = 20
ςodd(n) = σodd(n) divided by 2 repeatedly until the result is odd (varsigma odd)
ςodd(18) = 13
ςodd(19) = 5
n        1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 …
σ(n)     1  3  4  7  6 12  8 15 13 18 12 28 14 24 24 31 18 39 20 42 …
σodd(n)  1  1  4  1  6  4  8  1 13  6 12  4 14  8 24  1 18 13 20  6 …
ςodd(n)  1  1  1  1  3  1  1  1 13  3  3  1  7  1  3  1  9 13  5  3 …
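The two definitions translate directly into code. The following is a minimal Python sketch, not the implementation used for this work: the function names `sigma_odd` and `varsigma_odd` are chosen here for illustration, and the divisors are enumerated naively.

```python
def sigma_odd(n: int) -> int:
    """Return the sum of the odd divisors of n (n >= 1), by naive enumeration."""
    return sum(d for d in range(1, n + 1, 2) if n % d == 0)


def varsigma_odd(n: int) -> int:
    """Return sigma_odd(n) divided by 2 until the result is odd."""
    s = sigma_odd(n)
    while s % 2 == 0:
        s //= 2
    return s


if __name__ == "__main__":
    # The two examples of the slide: n = 18 and n = 19.
    for n in (18, 19):
        print(n, sigma_odd(n), varsigma_odd(n))
```

Run as a script, this prints `18 13 13` and `19 20 5`, matching the values given for 18 and 19.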

The σodd problem: an iteration problem
We iterate the ςodd (or equivalently σodd) function
and we observe that we always reach 1.
Numbers in orange are square numbers.
For every odd square number n (≠ 1):
ςodd(n) = σodd(n) > n
But we observe that for almost all other odd numbers n:
ςodd(n) < n
Note that even numbers are not interesting
for this problem, because
σodd(2n) = σodd(n)
and ςodd(2n) = ςodd(n).
[Figure: graph of the ςodd iteration on the odd numbers from 1 to 85.]
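The two observations on this slide (even numbers add nothing, odd squares other than 1 always increase) can be checked numerically on a small range. This is only an illustrative Python sketch; the helper names and the ranges are assumptions of this example, and σodd is computed naively from its definition so that the check is not circular.

```python
from math import isqrt


def sigma_odd(n: int) -> int:
    """Sum of the odd divisors of n (n >= 1), straight from the definition."""
    return sum(d for d in range(1, n + 1, 2) if n % d == 0)


def varsigma_odd(n: int) -> int:
    """sigma_odd(n) divided by 2 until the result is odd."""
    s = sigma_odd(n)
    while s % 2 == 0:
        s //= 2
    return s


def even_numbers_add_nothing(limit: int) -> bool:
    """True if sigma_odd(2n) == sigma_odd(n) and likewise for varsigma_odd, for 1 <= n <= limit."""
    return all(
        sigma_odd(2 * n) == sigma_odd(n) and varsigma_odd(2 * n) == varsigma_odd(n)
        for n in range(1, limit + 1)
    )


def odd_squares_increase(limit: int) -> bool:
    """True if varsigma_odd(n) == sigma_odd(n) > n for every odd square 1 < n <= limit."""
    return all(
        varsigma_odd(r * r) == sigma_odd(r * r) > r * r
        for r in range(3, isqrt(limit) + 1, 2)
    )


if __name__ == "__main__":
    print(even_numbers_add_nothing(1000), odd_squares_increase(10_000))
```

Both checks hold on these ranges, which is consistent with the slide but of course proves nothing beyond them.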

The σodd problem: an iteration problem
The point in the middle of this picture is the number 1.
[Figure: graph of the ςodd iteration on the odd numbers from 1 to 1001.]

The σodd problem is a conjecture
Does the iteration always reach 1?
The σodd problem is the conjecture that it always does,
whatever the starting number (integer ≥ 1).
Successfully checked for each n up to 1.1 × 10^11 ≃ 1.6 × 2^36
with programs developed for this work.
The previously known result was 2^30.
Moreover, n ≤ 10^11 ⟹ ςodd^15(n) = 1,
that is, at most 15 iterations of ςodd are needed to reach 1.
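To make the iteration concrete, here is a small Python sketch that follows the ςodd path of a number down to 1. It is an illustration under assumptions of this example (the names `path_to_one` and `max_steps` are hypothetical, and divisors are found by a simple loop up to the square root), not the program used for the verification.

```python
from math import isqrt


def sigma_odd(n: int) -> int:
    """Sum of the odd divisors of n (n >= 1)."""
    while n % 2 == 0:  # sigma_odd(2n) == sigma_odd(n)
        n //= 2
    total = 0
    for d in range(1, isqrt(n) + 1, 2):  # n is odd here, so all divisors are odd
        if n % d == 0:
            total += d
            if d != n // d:
                total += n // d
    return total


def varsigma_odd(n: int) -> int:
    """sigma_odd(n) divided by 2 until the result is odd."""
    s = sigma_odd(n)
    while s % 2 == 0:
        s //= 2
    return s


def path_to_one(n: int, max_steps: int = 1000) -> list[int]:
    """Return [n, varsigma_odd(n), ...] up to and including the first 1."""
    path = [n]
    while path[-1] != 1:
        if len(path) > max_steps:
            raise RuntimeError(f"1 not reached from {n} within {max_steps} steps")
        path.append(varsigma_odd(path[-1]))
    return path


if __name__ == "__main__":
    print(path_to_one(81))  # [81, 121, 133, 5, 3, 1]
```

The number of iterations is `len(path_to_one(n)) - 1`; for 81 it is 5.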


Numerical verification by the simple direct algorithm
For each odd number:
Algorithm 1 first_check_varsigma_odd(first_n, last_n)
function first_check_varsigma_odd(first_n, last_n):
    for n = first_n to last_n step 2
        lower_n, length = first_iterate_varsigma_odd_until_lower(n)
        if length > 1 then
            print n, lower_n, length

Numerical verification by the simple direct algorithm
Simply iterate ςodd until reaching a smaller number:
Algorithm 2 first_iterate_varsigma_odd_until_lower(n)
function first_iterate_varsigma_odd_until_lower(start_n):
    n = start_n
    length = 0
    do
        length = length + 1
        n = ςodd(n)
        if n > MAX_POSSIBLE_N then
            print "! Impossible to check", start_n, length, n
            exit
    while n > start_n

    if n = start_n then
        print "! Found not trivial cycle", start_n, length
        exit

    return n, length
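Algorithms 1 and 2 can be sketched together in Python as follows. This is a hedged transcription, not the original program: the value of `MAX_POSSIBLE_N` is an assumption of this example (Python integers do not overflow, so the test is kept only to mirror the pseudocode), exceptions replace `exit`, results are returned as a list instead of being printed, and ςodd is computed by a simple divisor loop.

```python
from math import isqrt

# Assumption for this sketch: the largest value a 64-bit unsigned integer can hold.
MAX_POSSIBLE_N = 2**64 - 1


def varsigma_odd(n: int) -> int:
    """Sum of the odd divisors of n, divided by 2 until the result is odd."""
    while n % 2 == 0:
        n //= 2
    total = 0
    for d in range(1, isqrt(n) + 1, 2):
        if n % d == 0:
            total += d
            if d != n // d:
                total += n // d
    while total % 2 == 0:
        total //= 2
    return total


def first_iterate_varsigma_odd_until_lower(start_n: int) -> tuple[int, int]:
    """Iterate varsigma_odd from start_n (odd, >= 3) until a smaller value is reached.

    Return (that smaller value, number of iterations).
    """
    n = start_n
    length = 0
    while True:  # do ... while n > start_n
        length += 1
        n = varsigma_odd(n)
        if n > MAX_POSSIBLE_N:
            raise OverflowError(f"impossible to check {start_n} ({length}, {n})")
        if n <= start_n:
            break
    if n == start_n:
        raise RuntimeError(f"found non-trivial cycle at {start_n} ({length})")
    return n, length


def first_check_varsigma_odd(first_n: int, last_n: int) -> list[tuple[int, int, int]]:
    """Return (n, lower_n, length) for each odd n needing more than one step."""
    results = []
    for n in range(first_n, last_n + 1, 2):
        lower_n, length = first_iterate_varsigma_odd_until_lower(n)
        if length > 1:
            results.append((n, lower_n, length))
    return results


if __name__ == "__main__":
    for row in first_check_varsigma_odd(3, 99):
        print(*row)
```

On the range 3 to 99, only the odd squares 9, 25, 49 and 81 need more than one step.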

Computation of σodd(n)
Assume n odd:
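$$n = p_1^{\alpha_1} \times p_2^{\alpha_2} \times p_3^{\alpha_3} \times \cdots \times p_k^{\alpha_k}$$
with $p_i$ distinct prime numbers
$$\sigma_{\mathrm{odd}}(n) = \frac{p_1^{\alpha_1+1}-1}{p_1-1} \times \frac{p_2^{\alpha_2+1}-1}{p_2-1} \times \frac{p_3^{\alpha_3+1}-1}{p_3-1} \times \cdots \times \frac{p_k^{\alpha_k+1}-1}{p_k-1}$$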
Thus, to verify the conjecture we must factorize
(other ways are less efficient).
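The product formula can be illustrated with a short Python sketch based on trial division. It is only a minimal example (the names `factorize` and `sigma_odd` are chosen here, and trial division is far too slow for the ranges of this work); even inputs are accepted by ignoring the prime 2, which contributes no odd divisor other than 1.

```python
def factorize(n: int) -> dict[int, int]:
    """Prime factorization of n >= 1 by trial division, as {prime: exponent}."""
    factors: dict[int, int] = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1 if p == 2 else 2  # after 2, try odd candidates only
    if n > 1:  # the remaining cofactor is prime
        factors[n] = factors.get(n, 0) + 1
    return factors


def sigma_odd(n: int) -> int:
    """Sum of the odd divisors of n, from the factorization of n."""
    result = 1
    for p, alpha in factorize(n).items():
        if p != 2:
            # (p^(alpha+1) - 1) / (p - 1) = 1 + p + p^2 + ... + p^alpha
            result *= (p ** (alpha + 1) - 1) // (p - 1)
    return result


if __name__ == "__main__":
    print(factorize(2205), sigma_odd(2205))  # {3: 2, 5: 1, 7: 2} 4446
```

For 2205 = 3² × 5 × 7², the formula gives 13 × 6 × 57 = 4446.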

Use properties to avoid a lot of computations
For each n, we want to check that there exists k such that σodd^k(n) = 1.
Provided all smaller numbers have already been checked,
it is equivalent to check that there exists k such that ςodd^k(n) < n.
This shortens the path that must be computed.
Only odd numbers must be checked (50%).
Other numbers can also be skipped (≃ 33% remain).
Almost all numbers reach a smaller number in only one step!
Exceptions identified before computation: square numbers.
The other exceptions (called bad numbers) are very rare.
So instead of iterating, we compute only one step
and keep the exceptions, which are checked separately (very fast).
ςodd(ab) ≤ ςodd(a) ςodd(b) when a and b are coprime
(equality then holds, because σodd is multiplicative)
⟶ shortcut in the factorization (the heaviest work),
using previously known bad numbers
or a general upper bound.
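The claim that almost all odd numbers drop below themselves in a single step is easy to observe on a small range. The Python sketch below is illustrative only: the function names and the range are assumptions of this example, and ςodd is computed by a plain divisor loop rather than by the shortcut described above. It also checks the product property on coprime pairs.

```python
from math import gcd, isqrt


def varsigma_odd(n: int) -> int:
    """Sum of the odd divisors of n, divided by 2 until the result is odd."""
    while n % 2 == 0:
        n //= 2
    total = 0
    for d in range(1, isqrt(n) + 1, 2):
        if n % d == 0:
            total += d
            if d != n // d:
                total += n // d
    while total % 2 == 0:
        total //= 2
    return total


def is_square(n: int) -> bool:
    """True if n is a perfect square."""
    r = isqrt(n)
    return r * r == n


def one_step_exceptions(first_n: int, last_n: int) -> list[int]:
    """Odd n in [first_n, last_n] that do NOT reach a smaller number in one step."""
    return [n for n in range(first_n | 1, last_n + 1, 2) if varsigma_odd(n) >= n]


def multiplicative_on_coprime(limit: int) -> bool:
    """True if varsigma_odd(ab) == varsigma_odd(a) * varsigma_odd(b) for coprime odd a, b <= limit."""
    return all(
        varsigma_odd(a * b) == varsigma_odd(a) * varsigma_odd(b)
        for a in range(1, limit + 1, 2)
        for b in range(1, limit + 1, 2)
        if gcd(a, b) == 1
    )


if __name__ == "__main__":
    exceptions = one_step_exceptions(3, 2999)
    bad_numbers = [n for n in exceptions if not is_square(n)]
    print(bad_numbers)  # [2205]
```

Among the odd numbers from 3 to 2999, the only exception that is not a square is 2205, for which ςodd(2205) = 2223.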

Transformed problem
With these properties, we replace the need to compute the
complete iteration of σodd
(and thus the complete factorization)
of each number
by an algorithm that is both faster and simpler (relative to other possible
optimizations):
compute only one
(possibly partial) iteration of ςodd,
for only some numbers.
“The cheapest, fastest and most reliable components of a computer system
are those that aren’t there.”
— Gordon Bell

Transformed problem
(progs/src/sequential/sequential/sequential.hpp)
Algorithm 3 sequential_check_gentle_varsigma_odd(first_n, last_n)
// Preconditions: 3 ≤ first_n odd ≤ last_n ≤ MAX_POSSIBLE_N
function sequential_check_gentle_varsigma_odd(first_n, last_n):
1   bad_table = ∅
2   for n = first_n to last_n step 2
3     if not (3, 7, 31 or 127 ∥ n) then
4       if not (n is square number) then
5         if not sequential_is_varsigma_odd_lower(n,
6                                                 bad_table, first_n) then
7           bad_table = bad_table ∪ {n}
8           print n
    return bad_table
// Postcondition:
// If all numbers < first_n respect the conjecture
// and all square numbers ≤ last_n respect the conjecture
// and all odd bad numbers ≤ last_n respect the conjecture
// then all numbers ≤ last_n respect the conjecture.
// Print all odd bad numbers between first_n and last_n (included)
// and return the set.
d | n means that d is a divisor of n.
d ∥ n means that d is a divisor of n, but d² is not.
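The filters of lines 3 and 4 of Algorithm 3 could be sketched in C++ as follows. This is only an illustration under assumptions: the helper names (exactly_divides, is_square, is_candidate) are hypothetical and are not taken from sequential.hpp, and line 3 is read as an exact-divisibility test (d ∥ n).

```cpp
#include <cassert>
#include <cstdint>

// d ∥ n: d divides n, but d*d does not (hypothetical helper name).
bool exactly_divides(std::uint64_t d, std::uint64_t n) {
  assert(d > 1);
  return (n % d == 0) && ((n / d) % d != 0);
}

// True if n is a perfect square. Uses an integer binary search,
// so there is no floating-point rounding issue.
bool is_square(std::uint64_t n) {
  std::uint64_t low = 0;
  std::uint64_t high = UINT64_C(1) << 32;  // sqrt(n) < 2^32 for any 64-bit n
  while (low < high) {  // search the largest r such that r*r <= n
    const std::uint64_t middle = low + (high - low + 1) / 2;
    if (middle <= n / middle) {
      low = middle;
    } else {
      high = middle - 1;
    }
  }
  return low * low == n;
}

// True if the odd number n passes the filters of lines 3 and 4,
// that is, if n still has to be checked by the costly test of line 5.
bool is_candidate(std::uint64_t n) {
  assert(n % 2 == 1);
  static const std::uint64_t mersenne_primes[] = {3, 7, 31, 127};
  for (const std::uint64_t p : mersenne_primes) {
    if (exactly_divides(p, n)) return false;
  }
  return !is_square(n);
}
```

For example, 15 and 21 are filtered out (3 ∥ 15 and 3 ∥ 21), 9 is filtered out as a square, and 45 remains a candidate because 3² divides 45.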

Transformed problem
Computes (possibly only partially) ςodd(n) by the factorization of n
and returns True if and only if ςodd(n) < n.
Algorithm 4 sequential_is_varsigma_odd_lower(n, bad_table, bad_first_n)
// Preconditions: 3 ≤ n odd ≤ MAX_POSSIBLE_N
//   bad_table contains all odd bad numbers
//   between bad_first_n (included) and n (excluded)
function sequential_is_varsigma_odd_lower(n, bad_table, bad_first_n):
1   n_divided = n
2   varsigma_odd = 1
3   for p odd prime ≤ ⌊√n_divided⌋
4     α = 0
5     while p | n_divided
6       n_divided = n_divided / p
7       α = α + 1
8
9     if α > 0 then  // p^α is a factor of n
10      varsigma_odd = varsigma_odd ∗ Odd((p^α − 1)/(p − 1) + p^α)
11      if (varsigma_odd
12          ∗ sequential_sigma_odd_upper_bound(n_divided,
13                                             bad_table, bad_first_n)) < n then
14        return True
15
16  if n_divided > 1 then  // n_divided is prime
17    varsigma_odd = varsigma_odd ∗ Odd(n_divided + 1)
18
19  return (varsigma_odd < n)
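For comparison, here is a C++ sketch of the same computation without the shortcut of lines 11 to 14, so it always completes the factorization. It rests on several assumptions: 64-bit unsigned integers with n small enough that the product does not overflow, trial division by every odd number instead of a table of primes (a composite trial divisor never divides the remaining cofactor), and hypothetical function names that are not those of sequential.hpp.

```cpp
#include <cassert>
#include <cstdint>

// Odd part of x: x divided by its largest power of 2.
std::uint64_t odd_part(std::uint64_t x) {
  assert(x != 0);
  while (x % 2 == 0) x /= 2;
  return x;
}

// Product of Odd(1 + p + ... + p^alpha) over the prime powers p^alpha of odd n,
// as in lines 10 and 17 of the pseudocode (complete factorization, no shortcut).
std::uint64_t varsigma_odd(std::uint64_t n) {
  assert(n % 2 == 1);
  std::uint64_t n_divided = n;
  std::uint64_t result = 1;
  for (std::uint64_t p = 3; p <= n_divided / p; p += 2) {  // p*p <= n_divided
    if (n_divided % p == 0) {
      std::uint64_t power = 1;  // p^alpha
      std::uint64_t sum = 1;    // 1 + p + ... + p^alpha
      while (n_divided % p == 0) {
        n_divided /= p;
        power *= p;
        sum += power;
      }
      result *= odd_part(sum);
    }
  }
  if (n_divided > 1) {  // the remaining cofactor is prime
    result *= odd_part(n_divided + 1);
  }
  return result;
}

// Same answer as the pseudocode, but always computed completely.
bool is_varsigma_odd_lower(std::uint64_t n) {
  return varsigma_odd(n) < n;
}
```

For instance, varsigma_odd(15) is Odd(4) × Odd(6) = 1 × 3 = 3, which is lower than 15, whereas varsigma_odd(9) is 13, which is not lower than 9.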

Factorization shortcut
When a prime factor is found,
it may be possible to shortcut the complete factorization.
For example, with a first prime factor p1 of n:
$n = p_1^{\alpha_1} n'$
$\sigma_{\mathrm{odd}}(n) = \frac{p_1^{\alpha_1+1} - 1}{p_1 - 1} \times \sigma_{\mathrm{odd}}(n')$
$\sigma_{\mathrm{odd}}(n) \le \frac{p_1^{\alpha_1+1} - 1}{p_1 - 1} \times (\text{upper bound of } \sigma_{\mathrm{odd}}(n')) < n$? If yes, then stop.
Upper bound that always holds:
$\sigma_{\mathrm{odd}}(n') \le 2 n' \sqrt[8]{n'}$
It is the same for the ςodd function, with some additional division(s) by 2.
And if n′ is gentle (odd, but neither square nor bad):
$\varsigma_{\mathrm{odd}}(n') < n'$
(so the shortcut can be taken “often”).

Stop and restart
Note that the program can be stopped (or executed up to some last n)
and restarted from the last value checked.
In fact, different ranges of numbers can be computed separately (at
the same time or not).
If all required numbers up to N have been checked (including the odd square
numbers and the bad numbers, checked for example in the naive way, which is
fast for these rare numbers), then the conclusion is: for all n ≤ N,
the iteration of σodd (and ςodd) from n reaches 1 (which is what we wanted to
achieve).
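Since ranges can be checked separately, the odd numbers to verify can be cut into independent sub-ranges. The following C++ sketch shows one possible way to do it; the function name split_odd_range and its parameters are hypothetical, not taken from the project sources.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Splits the odd numbers of [first_n, last_n] into consecutive sub-ranges
// of at most odd_count odd numbers each.
// Each pair is (first odd number, last odd number) of one sub-range.
std::vector<std::pair<std::uint64_t, std::uint64_t>> split_odd_range(
    std::uint64_t first_n, std::uint64_t last_n, std::uint64_t odd_count) {
  assert(first_n % 2 == 1);
  assert(first_n <= last_n);
  assert(odd_count > 0);
  if (last_n % 2 == 0) --last_n;  // last odd number of the whole range

  std::vector<std::pair<std::uint64_t, std::uint64_t>> ranges;
  std::uint64_t first = first_n;
  while (true) {
    const std::uint64_t remaining = (last_n - first) / 2 + 1;  // odd numbers left
    if (remaining <= odd_count) {
      ranges.emplace_back(first, last_n);  // last (possibly shorter) sub-range
      break;
    }
    ranges.emplace_back(first, first + 2 * (odd_count - 1));
    first += 2 * odd_count;
  }
  return ranges;
}
```

Each sub-range can then be given to a separate run, thread or process; stopping and restarting amounts to remembering which sub-ranges are already done.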

1 The problem
2 Computation
  Simple algorithm
  Better algorithm
3 Parallel implementations
  Multi-threads
  Message-passing
  GPU (OpenCL)
4 Results
  Speedup
  Efficiency
  Overhead
  Benchmarks tables

Performance with one thread/process
First, a comparison of the sequential implementation with the three
multi-threading and the two message-passing implementations (each with only
one thread/process),
checking the numbers between 1 and 20,000,001
on a personal computer with 4 cores and 2 threads per core.
[Bar chart: running time in seconds (vertical axis, from 6 to 7) for each implementation (horizontal axis, 0 to 5).]
0: sequential,
one thread (1: one by one, 2: by range, 3: dynamic),
one MPI process (4: one by one, 5: dynamic)

Multi-threads (C++11 threads)
3 different implementations: (progs/src/threads/threads/threads.hpp)
One by one
Each slave independently computes one number and sends a boolean to the
master. The master also computes one number and waits for everybody. And
so forth with the next numbers.
Silly implementation; just to try. Very inefficient. The barrier is a big
limitation because each number has a different factorization time.
By range
Like one by one, but each slave receives a range of numbers (given by its
extremities), computes it and returns the (very small) set of bad numbers
found. The master computes a smaller range and waits for everybody. And so
forth with the next ranges.
Much better, because the computation is better balanced: the factorization
times average out over a range.
“Dynamic”
Like by range, but the master does not wait: it gives a new range as soon as
a slave is free, and also computes the rest of the time.
Very good occupation of each thread (see graph in following slides).
All threads share the same prime number tables.
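The “dynamic” scheme can be approximated in C++11 as in the sketch below. It is not the code of threads.hpp: instead of a master that hands out ranges, it swaps in a shared atomic counter from which every thread takes its next range. The predicate is_bad is a hypothetical stand-in for the real check, and the sharing of already found bad numbers between ranges is ignored.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Checks the odd numbers of [first_n, last_n] with nb_threads threads.
// Each thread repeatedly takes the next free range of range_size numbers.
// Returns the sorted list of numbers n for which is_bad(n) is true.
std::vector<std::uint64_t> dynamic_check(
    std::uint64_t first_n, std::uint64_t last_n, std::uint64_t range_size,
    unsigned int nb_threads,
    const std::function<bool(std::uint64_t)>& is_bad) {
  assert(first_n % 2 == 1 && first_n <= last_n);
  assert(range_size > 0 && range_size % 2 == 0);  // keeps every range start odd
  assert(nb_threads > 0);

  std::atomic<std::uint64_t> next_first(first_n);  // start of the next free range
  std::mutex bad_mutex;                            // protects bad_numbers
  std::vector<std::uint64_t> bad_numbers;

  auto worker = [&]() {
    while (true) {
      const std::uint64_t first = next_first.fetch_add(range_size);
      if (first > last_n) break;  // nothing left to do
      const std::uint64_t last = std::min(last_n, first + range_size - 1);
      std::vector<std::uint64_t> local_bad;  // results of this range only
      for (std::uint64_t n = first; n <= last; n += 2) {
        if (is_bad(n)) local_bad.push_back(n);
      }
      if (!local_bad.empty()) {  // bad numbers are rare: the lock is seldom taken
        std::lock_guard<std::mutex> lock(bad_mutex);
        bad_numbers.insert(bad_numbers.end(), local_bad.begin(), local_bad.end());
      }
    }
  };

  std::vector<std::thread> threads;
  for (unsigned int i = 1; i < nb_threads; ++i) threads.emplace_back(worker);
  worker();  // the calling thread also computes
  for (std::thread& thread : threads) thread.join();

  std::sort(bad_numbers.begin(), bad_numbers.end());
  return bad_numbers;
}
```

No thread ever waits at a barrier here: a thread that finishes a cheap range immediately takes another one, which is the property that makes the “dynamic” variant well balanced.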

Multi-threads — one by one
[Plot: running time in seconds (vertical axis, from 0 to 80) against the number of threads (horizontal axis, 1 to 8).]

Multi-threads — by range
[Plot: running time in seconds (vertical axis, from 0 to 7) against the number of threads (horizontal axis, 1 to 8).]

Multi-threads — “dynamic”
[Plot: running time in seconds (vertical axis, from 0 to 7) against the number of threads (horizontal axis, 1 to 8).]

Message-passing
2 implementations: (progs/src/mpi/mpi/mpi.hpp)
One by one
One element, barrier. Very inefficient; just to try.
“Dynamic”
By range and does not wait.
Same algorithms as for multi-threading,
but information is exchanged by messages. (That could be between different
machines, but these results were computed on only one computer.) The impact
is small if the size of the range is large compared to the small quantity of
information exchanged.
Messages from the master to each slave:
The single number or the extremities of the range, and the new (rare) bad
numbers found by the other processes.
Messages from each slave to the master:
A boolean or an array of the new (rare) bad numbers found.
Main differences with multi-threading: exchanges between processes,
and each process has its own prime numbers table.

Message-passing — “dynamic”
[Plot: running time in seconds (vertical axis, from 0 to 7) against the number of processes (horizontal axis, 1 to 6).]

GPU (OpenCL)
Only one implementation (progs/src/opencl/opencl/opencl.hpp):
By list of numbers
The CPU selects a list of numbers to be checked
and sends them to the GPU.
The GPU computes ς(n) completely for each n received (without using
a list of bad numbers and without shortcutting the factorization).
Then the GPU returns the corresponding list of booleans to the CPU.
And so forth.
Instead of computing ς(n) directly during the factorization,
this implementation first collects all prime factors of n.
That makes the parallel work easier.
The important improvement of the algorithm (the shortcut of the
factorization) was also removed, because it did not give better results,
due to the more complex branching.
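
The two-step scheme described above (first collect all prime factors, then compute ς(n) from them) can be sketched on the CPU side in C++. This is only an illustration, not the code of opencl.hpp. It assumes that ς(n) is σ(n) with every factor 2 removed, and that the boolean returned for each n is "ς(n) < n". All function names (collect_prime_factors, sigma_from_factors, varsigma_odd, check_list) are hypothetical, and plain trial division stands in for the loop over a table of primes.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// (prime, exponent) pairs of a factorization.
using Factors = std::vector<std::pair<std::uint64_t, unsigned>>;

// Step 1: collect all prime factors of n (hypothetical helper,
// plain trial division instead of a precomputed table of primes).
Factors collect_prime_factors(std::uint64_t n) {
    assert(n >= 1);
    Factors factors;
    for (std::uint64_t p = 2; p <= n / p; p += (p == 2 ? 1 : 2)) {
        unsigned exponent = 0;
        while (n % p == 0) {
            n /= p;
            ++exponent;
        }
        if (exponent > 0) factors.emplace_back(p, exponent);
    }
    if (n > 1) factors.emplace_back(n, 1);  // remaining prime cofactor
    return factors;
}

// Step 2: sigma(n) is the product of (1 + p + ... + p^e) over the factors.
// The result is assumed to fit in 64 bits.
std::uint64_t sigma_from_factors(const Factors& factors) {
    std::uint64_t sigma = 1;
    for (const auto& factor : factors) {
        std::uint64_t sum = 1;
        std::uint64_t power = 1;
        for (unsigned i = 0; i < factor.second; ++i) {
            power *= factor.first;
            sum += power;
        }
        sigma *= sum;
    }
    return sigma;
}

// Assumed definition: sigma(n) with every factor 2 removed.
std::uint64_t varsigma_odd(std::uint64_t n) {
    std::uint64_t s = sigma_from_factors(collect_prime_factors(n));
    while (s % 2 == 0) s /= 2;
    return s;
}

// One boolean per number received. The criterion "varsigma_odd(n) < n"
// is an assumption made for this sketch.
std::vector<bool> check_list(const std::vector<std::uint64_t>& numbers) {
    std::vector<bool> results;
    results.reserve(numbers.size());
    for (const std::uint64_t n : numbers) results.push_back(varsigma_odd(n) < n);
    return results;
}
```

Separating the two steps keeps each loop simple, which is the property the slide relies on for the parallel version.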

GPU (OpenCL): explanations of bad results
The computation is massively parallel (provided the list of numbers is big).
But the efficiency is limited by how much the factorization process differs
from one number to another. Because the problem is computed by factorization,
the algorithm is more or less a random succession of conditional
branches, and GPUs lose a lot of their parallel power on such divergent
branching.
The bigger the list of numbers, the closer the computation is to ideally
parallel. But the bigger the list, the more the computation of each
number disturbs the progress of the others.
Moreover, all the numbers that are factorized quickly must wait for the others to finish.
Also, GPUs deliver their best performance on floating-point computations,
whereas this is an integer problem.
A completely different approach might work better.
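
The waiting effect described above can be illustrated with a toy CPU-side model in C++. It is a rough sketch under simplifying assumptions, not a measurement of the real kernel. Numbers are processed in groups that advance in lockstep, the cost of a number is its count of trial-division iterations, and a group stays occupied until its slowest number is done. The names trial_division_steps and lockstep_utilization are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Number of loop iterations needed to factorize n by trial division
// (at least 1, so that every number has a nonzero cost).
std::uint64_t trial_division_steps(std::uint64_t n) {
    assert(n >= 1);
    std::uint64_t steps = 1;
    for (std::uint64_t p = 2; p <= n / p; p += (p == 2 ? 1 : 2)) {
        ++steps;
        while (n % p == 0) n /= p;
    }
    return steps;
}

// Fraction of useful work when the numbers are processed in lockstep
// groups of group_size: each group is occupied until its slowest
// number is completely factorized.
double lockstep_utilization(const std::vector<std::uint64_t>& numbers,
                            std::size_t group_size) {
    assert(group_size >= 1 && !numbers.empty());
    std::uint64_t useful = 0;
    std::uint64_t occupied = 0;
    for (std::size_t begin = 0; begin < numbers.size(); begin += group_size) {
        const std::size_t end = std::min(begin + group_size, numbers.size());
        std::uint64_t slowest = 0;
        for (std::size_t i = begin; i < end; ++i) {
            const std::uint64_t steps = trial_division_steps(numbers[i]);
            useful += steps;
            slowest = std::max(slowest, steps);
        }
        occupied += slowest * (end - begin);
    }
    return static_cast<double>(useful) / static_cast<double>(occupied);
}
```

In this model a prime such as 65537 grouped with a smooth number such as 6561 = 3^8 gives a utilization of about one half, because the smooth number finishes after a few iterations and then idles.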

GPU (OpenCL): old GPU used during tests
The poor performance of the OpenCL implementation
is also due to the old GPU used:
an NVIDIA Quadro FX 1800 graphics card with 768 MiB of memory.
This GPU has no cache for the global memory,
and the main loop iterates over prime numbers stored in this global memory.
A more modern GPU could use the built-in OpenCL function ctz (instead of a
loop).
Nevertheless, with the largest list of numbers possible for this GPU, the
OpenCL implementation shows a small (disappointing) performance gain
over the sequential implementation.
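
As an illustration of the ctz remark, here is a host-side C++ sketch of removing all factors 2 from a number, once with a loop and once with a count-trailing-zeros instruction. It assumes that the loop in question is the one that strips the powers of 2, which the slide does not state explicitly. The function names are hypothetical, and __builtin_ctzll is the GCC/Clang builtin used here as a stand-in for the OpenCL ctz function.

```cpp
#include <cassert>
#include <cstdint>

// Remove all factors 2 with a loop: one iteration per trailing zero bit.
std::uint64_t odd_part_loop(std::uint64_t n) {
    assert(n != 0);
    while ((n & 1u) == 0) n >>= 1;
    return n;
}

// Same result with a single count-trailing-zeros and one shift.
// __builtin_ctzll is a GCC/Clang builtin; its result is undefined for 0.
std::uint64_t odd_part_ctz(std::uint64_t n) {
    assert(n != 0);
    return n >> __builtin_ctzll(n);
}
```

The second version has no data-dependent loop, so it avoids one source of divergent branching.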

GPU (OpenCL) — by list of numbers
[Plot: time in seconds (0 to 100) for x from 100 to 100000, presumably the size of the list of numbers; the x-axis label is unreadable.]

1 The problem
2 Computation
Simple algorithm
Better algorithm
3 Parallel implementations
Multi-threads
Message-passing
GPU (OpenCL)
4 Results
Speedup
Efficiency
Overhead
Benchmarks tables

Results
Results were produced on a computer with only 4 cores, which explains the
decrease in gains beginning at 5 threads or processes.
Results with Open MPI are a little strange: for some parameters
they are better than the sequential implementation, as if running the
sequential program through mpirun made it faster.
Theoretically the overhead of the MPI implementation should be bigger
than that of the multi-thread implementation, due to the communication between
processes (but the tests were run on a single computer).
The implementation is almost identical to the multi-thread version and all
computation results are identical, so it should be correct.
Maybe the GCC compiler required by Open MPI optimizes this
code better than the clang compiler used for the sequential and multi-thread versions.
Or maybe it is due to a slight imprecision in the measurements.
The two best implementations (“dynamic” algorithm with threads and
with Open MPI) are both pretty close to the ideal.
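
For reference, the three measures plotted on the following slides can be computed as in this small C++ sketch. It uses the common textbook definitions, which are an assumption here because the slides do not state the exact formulas used: speedup is the sequential time divided by the parallel time, efficiency is the speedup divided by the number of threads or processes, and overhead is the total time spent by all workers minus the sequential time. The function names are hypothetical.

```cpp
#include <cassert>

// Speedup: sequential time divided by parallel time.
double speedup(double t_sequential, double t_parallel) {
    assert(t_parallel > 0.0);
    return t_sequential / t_parallel;
}

// Efficiency: speedup divided by the number of threads or processes.
// The ideal value is 1.
double efficiency(double t_sequential, double t_parallel, unsigned workers) {
    assert(workers >= 1);
    return speedup(t_sequential, t_parallel) / workers;
}

// Overhead: total time spent by all workers minus the sequential time.
// The ideal value is 0.
double overhead(double t_sequential, double t_parallel, unsigned workers) {
    assert(workers >= 1);
    return workers * t_parallel - t_sequential;
}
```

With these definitions, a parallel run that beats the ideal (as the Open MPI runs seem to do for some parameters) gives an efficiency above 1 and a negative overhead.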

Speedup
[Plot: speedup (0 to 5) against the number of threads/processes (0 to 8). Curves: identity, sequential, threads/one by one, threads/by range, threads/dynamic, MPI/one by one, MPI/dynamic.]

Speedup with OpenCL
[Plot: speedup (0 to 5) against the number of threads/processes or the size of the list of numbers (logarithmic scale, 1 to 1x10^6). Curves: identity, sequential, threads/one by one, threads/by range, threads/dynamic, MPI/one by one, MPI/dynamic, OpenCL.]

Efficiency
[Plot: efficiency (0 to 1.2) against the number of threads/processes (0 to 8). Curves: sequential, threads/one by one, threads/by range, threads/dynamic, MPI/one by one, MPI/dynamic.]

Efficiency with OpenCL
[Plot: efficiency (0 to 1) against the number of threads/processes or the size of the list of numbers (logarithmic scale, 1 to 1x10^6). Curves: sequential, threads/one by one, threads/by range, threads/dynamic, MPI/one by one, MPI/dynamic, OpenCL.]

Overhead
[Plot: overhead (0 to 5000) against the number of threads/processes (0 to 8). Curves: sequential, threads/one by one, threads/by range, threads/dynamic, MPI/one by one, MPI/dynamic.]

Overhead only until 4 cores
[Plot: overhead (-400 to 1200) against the number of threads/processes (0 to 4). Curves: sequential, threads/one by one, threads/by range, threads/dynamic, MPI/one by one, MPI/dynamic.]

Overhead with OpenCL
[Plot: overhead (logarithmic scale, 1 to 1x10^10) against the number of threads/processes or the size of the list of numbers (logarithmic scale, 1 to 1x10^6). Curves: sequential, threads/one by one, threads/by range, threads/dynamic, MPI/one by one, MPI/dynamic, OpenCL.]
