Université Libre de Bruxelles
Computer Science Department
INFO-Y100 (4004940ENR) Parallel systems
Project
Parallel Numerical Verification of the σodd problem
Presentation
Olivier Pirson — olivier.pirson.opi@gmail.com
orcid.org/0000-0001-6296-9659
December 15, 2017
(Last modifications: September 11, 2019)
https://speakerdeck.com/opimedia/parallel-numerical-verification-of-the-s-odd-problem
1 The problem
2 Computation
Simple algorithm
Better algorithm
3 Parallel implementations
Multi-threads
Message-passing (Open MPI)
GPU (OpenCL)
4 Results
Speedup
Efficiency
Overhead
Benchmarks tables
The σodd and ςodd functions
σ(n) = sum of all divisors of n (sigma)
σodd(n) = sum of odd divisors of n (sigma odd)
All divisors of 18: {1, 2, 3, 6, 9, 18}
Only odd divisors: {1, 3, 9} so σodd(18) = 13
All divisors of 19: {1, 19}
Only odd divisors: {1, 19} so σodd(19) = 20
ςodd(n) = σodd(n) divided by 2 repeatedly until the result is odd (varsigma odd)
ςodd(18) = 13
ςodd(19) = 5
n        1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 …
σ(n)     1  3  4  7  6 12  8 15 13 18 12 28 14 24 24 31 18 39 20 42 …
σodd(n)  1  1  4  1  6  4  8  1 13  6 12  4 14  8 24  1 18 13 20  6 …
ςodd(n)  1  1  1  1  3  1  1  1 13  3  3  1  7  1  3  1  9 13  5  3 …
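As a minimal sketch of these definitions, the three functions can be computed by brute-force enumeration of the divisors, which is enough to reproduce the small values listed above. The function names `divisors`, `sigma`, `sigma_odd` and `varsigma_odd` are illustrative and are not taken from the project's sources.

```python
def divisors(n):
    """Return the sorted list of all positive divisors of n (brute force)."""
    return [d for d in range(1, n + 1) if n % d == 0]


def sigma(n):
    """Sum of all divisors of n."""
    return sum(divisors(n))


def sigma_odd(n):
    """Sum of the odd divisors of n."""
    return sum(d for d in divisors(n) if d % 2 == 1)


def varsigma_odd(n):
    """sigma_odd(n) divided by 2 until the result is odd."""
    value = sigma_odd(n)
    while value % 2 == 0:
        value //= 2
    return value


print(divisors(18), sigma_odd(18), varsigma_odd(18))  # [1, 2, 3, 6, 9, 18] 13 13
print(divisors(19), sigma_odd(19), varsigma_odd(19))  # [1, 19] 20 5
```

Enumerating all divisors costs a number of operations proportional to n, so this form is only meant for checking small values by hand.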
The σodd problem: an iteration problem
We iterate the ςodd function (or, equivalently, σodd)
and observe that we always reach 1.
Numbers in orange are square numbers.
For every odd square number n ≠ 1:
ςodd(n) = σodd(n) > n
But we observe that for almost all other odd numbers n:
ςodd(n) < n
Note that even numbers are not interesting
for this problem, because
σodd(2n) = σodd(n)
and ςodd(2n) = ςodd(n).
[Figure: graph of the ςodd iteration on the odd numbers from 1 to 85, together with 121 and 133, which are reached from 81.]
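To illustrate the iteration, the following sketch follows ςodd from a starting number until 1 is reached, and checks on a small range that even numbers add nothing new. The helper names and the `max_steps` safety limit are assumptions made for this example, and a brute-force divisor sum is used rather than the project's method.

```python
def sigma_odd(n):
    """Sum of the odd divisors of n (brute force, fine for small n)."""
    return sum(d for d in range(1, n + 1, 2) if n % d == 0)


def varsigma_odd(n):
    """sigma_odd(n) with all factors 2 removed."""
    value = sigma_odd(n)
    while value % 2 == 0:
        value //= 2
    return value


def trajectory(start_n, max_steps=100):
    """List start_n, varsigma_odd(start_n), ... until 1 is reached.

    max_steps is a safety limit added for this sketch, since reaching 1
    is exactly what the conjecture claims.
    """
    path = [start_n]
    while path[-1] != 1:
        if len(path) > max_steps:
            raise RuntimeError(f"1 not reached from {start_n}")
        path.append(varsigma_odd(path[-1]))
    return path


print(trajectory(81))  # [81, 121, 133, 5, 3, 1]

# Even numbers behave like their odd part.
for n in range(1, 200):
    assert sigma_odd(2 * n) == sigma_odd(n)
    assert varsigma_odd(2 * n) == varsigma_odd(n)
```

The path from 81 first grows (81 and 121 are squares) before dropping below the starting number, which is the behaviour described for odd squares.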
The point in the middle of this picture is the number 1.
[Figure: graph of the ςodd iteration on the odd numbers from 1 to 1001.]
The σodd problem is a conjecture
Does the iteration always reach 1?
The σodd problem is the conjecture that it always does,
whatever the starting number (integer ≥ 1).
Successfully checked for every n up to $1.1 \times 10^{11} \simeq 1.6 \times 2^{36}$
with the programs developed for this work.
The previously known result was $2^{30}$.
Moreover, $n \le 10^{11} \implies \varsigma_{\mathrm{odd}}^{15}(n) = 1$
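The bound on the number of iterations can be observed on a much smaller range. The sketch below is only an illustration under stated assumptions: the limit `LIMIT = 2000` and the helper names are chosen here, and a brute-force divisor sum stands in for the programs developed for this work.

```python
def varsigma_odd(n):
    """Sum of the odd divisors of n, divided by 2 until odd (brute force)."""
    value = sum(d for d in range(1, n + 1, 2) if n % d == 0)
    while value % 2 == 0:
        value //= 2
    return value


def steps_to_one(n, max_steps=15):
    """Number of iterations of varsigma_odd needed to reach 1 from n.

    Returns None if 1 is not reached within max_steps iterations.
    """
    steps = 0
    while n != 1:
        if steps == max_steps:
            return None
        n = varsigma_odd(n)
        steps += 1
    return steps


LIMIT = 2000  # illustrative bound, far below the 1.1e11 reached in this work
lengths = {n: steps_to_one(n) for n in range(1, LIMIT + 1)}
assert all(length is not None for length in lengths.values())
print("longest path up to", LIMIT, ":", max(lengths.values()), "steps")
```

Since 1 is a fixed point of ςodd, reaching 1 in at most 15 steps is the same as having the 15th iterate equal to 1.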
Numerical verification by the simple direct algorithm
For each odd number:
Algorithm 1 first_check_varsigma_odd(first_n, last_n)
first_check_varsigma_odd(first_n, last_n):
  for n = first_n to last_n step 2
    lower_n, length = first_iterate_varsigma_odd_until_lower(n)
    if length > 1 then
      print n, lower_n, length
Simply iterate ςodd until a smaller number is reached:
Algorithm 2 first_iterate_varsigma_odd_until_lower(n)
first_iterate_varsigma_odd_until_lower(start_n):
  n = start_n
  length = 0
  do
    length = length + 1
    n = ςodd(n)
    if n > MAX_POSSIBLE_N then
      print "! Impossible to check", start_n, length, n
      exit
  while n > start_n

  if n = start_n then
    print "! Found not trivial cycle", start_n, length
    exit

  return n, length
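Algorithms 1 and 2 can be transcribed into runnable Python as follows. This is a sketch, not the project's implementation: `varsigma_odd` uses a brute-force divisor sum, the value of `MAX_POSSIBLE_N` is an arbitrary choice made here, exceptions stand in for the print-and-exit steps, and the results are returned as a list instead of being printed one by one.

```python
MAX_POSSIBLE_N = 10**9  # arbitrary limit chosen for this sketch


def varsigma_odd(n):
    """Sum of the odd divisors of n, divided by 2 until odd (brute force)."""
    value = sum(d for d in range(1, n + 1, 2) if n % d == 0)
    while value % 2 == 0:
        value //= 2
    return value


def first_iterate_varsigma_odd_until_lower(start_n):
    """Iterate varsigma_odd from start_n (odd, >= 3) until a lower value.

    Returns (lower_n, length), where length is the number of iterations.
    """
    n = start_n
    length = 0
    while True:  # emulates the do ... while loop of Algorithm 2
        length += 1
        n = varsigma_odd(n)
        if n > MAX_POSSIBLE_N:
            raise OverflowError(f"Impossible to check {start_n} {length} {n}")
        if n <= start_n:
            break
    if n == start_n:
        raise RuntimeError(f"Found not trivial cycle {start_n} {length}")
    return n, length


def first_check_varsigma_odd(first_n, last_n):
    """Return (n, lower_n, length) for each odd n needing more than one step."""
    found = []
    for n in range(first_n, last_n + 1, 2):
        lower_n, length = first_iterate_varsigma_odd_until_lower(n)
        if length > 1:
            found.append((n, lower_n, length))
    return found


print(first_check_varsigma_odd(3, 85))
# [(9, 7, 2), (25, 1, 2), (49, 5, 2), (81, 5, 3)]
```

On this small range, only the odd squares need more than one step, in line with the observation that ςodd(n) > n for odd squares.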
Computation of σodd(n)
Assume n odd:
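$$n = p_1^{\alpha_1} \times p_2^{\alpha_2} \times p_3^{\alpha_3} \times \cdots \times p_k^{\alpha_k} \quad \text{with the } p_i \text{ distinct prime numbers}$$

$$\sigma_{\mathrm{odd}}(n) = \frac{p_1^{\alpha_1+1}-1}{p_1-1} \times \frac{p_2^{\alpha_2+1}-1}{p_2-1} \times \frac{p_3^{\alpha_3+1}-1}{p_3-1} \times \cdots \times \frac{p_k^{\alpha_k+1}-1}{p_k-1}$$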
Thus, to verify the conjecture we must factorize each number
(other approaches are less efficient).
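The product formula translates directly into code. The sketch below assumes plain trial division by odd numbers, which is only one possible way to factorize (nothing here states which method the project uses), and the function name `sigma_odd_by_factorization` is illustrative.

```python
def sigma_odd_by_factorization(n):
    """Sum of the odd divisors of n, from the prime factorization of its odd part."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    while n % 2 == 0:  # sigma_odd(2n) = sigma_odd(n): drop the factors 2
        n //= 2
    result = 1
    p = 3
    while p * p <= n:
        if n % p == 0:
            # Accumulate 1 + p + ... + p^alpha = (p^(alpha+1) - 1) / (p - 1)
            term = 1
            power = 1
            while n % p == 0:
                n //= p
                power *= p
                term += power
            result *= term
        p += 2
    if n > 1:  # the remaining cofactor is a prime with exponent 1
        result *= n + 1
    return result


print(sigma_odd_by_factorization(18))    # 13
print(sigma_odd_by_factorization(2205))  # 4446
```

Each factor is accumulated as the sum 1 + p + ... + p^α instead of evaluating the quotient, which avoids a division and gives the same value.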
Use properties to avoid a lot of computations
For each n, we want to check that there exists k such that $\sigma_{\mathrm{odd}}^{k}(n) = 1$.
Over all n > 1, it is equivalent to check that there exists k such that $\varsigma_{\mathrm{odd}}^{k}(n) < n$
(by induction, the smaller number reached itself reaches 1).
That shortens the path that has to be computed.
Only odd numbers must be checked (50%).
Other numbers can also be skipped (≃ 33% remain).
Almost all numbers reach a smaller number in only one step!
Exceptions identified before computation: square numbers.
The other exceptions (called bad numbers) are very rare.
So instead of iterating, we compute only one step
and keep the exceptions, which are checked separately (very fast).
ςodd(ab) ≤ ςodd(a) ςodd(b)
−→ shortcut in the factorization (the heaviest part of the work)
(using previously known bad numbers
or a general upper bound).
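The one-step strategy can be sketched as follows. The reading of "bad number" used here (an odd non-square n whose single step gives ςodd(n) > n) is inferred from the text, and the helper names and the brute-force divisor sum are illustrative, not the project's code. The final loop checks the product relation only for coprime a and b, a restriction added here: in that case it holds with equality, whereas without it a = b = 3 gives ςodd(9) = 13 while ςodd(3) ςodd(3) = 1.

```python
from math import gcd, isqrt


def varsigma_odd(n):
    """Sum of the odd divisors of n, divided by 2 until odd (brute force)."""
    value = sum(d for d in range(1, n + 1, 2) if n % d == 0)
    while value % 2 == 0:
        value //= 2
    return value


def is_square(n):
    """True if n is a perfect square."""
    return isqrt(n) ** 2 == n


def one_step_exceptions(first_n, last_n):
    """Collect the odd n of [first_n, last_n] set aside by the one-step check.

    first_n is assumed odd. Returns (squares, bad_numbers): squares are
    known in advance, bad numbers are the remaining exceptions.
    """
    squares, bad_numbers = [], []
    for n in range(first_n, last_n + 1, 2):
        if is_square(n):
            squares.append(n)       # identified without computing varsigma_odd
        elif varsigma_odd(n) > n:
            bad_numbers.append(n)   # rare: to be checked separately
    return squares, bad_numbers


squares, bad_numbers = one_step_exceptions(3, 3001)
print(len(squares), "squares,", len(bad_numbers), "bad numbers:", bad_numbers)

# For coprime odd a and b the relation holds with equality.
for a in range(1, 60, 2):
    for b in range(1, 60, 2):
        if gcd(a, b) == 1:
            assert varsigma_odd(a * b) == varsigma_odd(a) * varsigma_odd(b)
```

For instance, 2205 = 3² × 5 × 7² is set aside as a bad number, since σodd(2205) = 4446 and therefore ςodd(2205) = 2223 > 2205.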
Transformed problem
With these properties, we no longer need to compute the
complete iteration of σodd
(and thus the complete factorization)
of each number.
The resulting algorithm is both improved and simpler (relative to other possible
optimizations):
compute only one
(possibly partial) iteration of ςodd
for only some numbers.
“The cheapest, fastest and most reliable components of a computer system
are those that aren’t there.”
— Gordon Bell
(progs/src/sequential/sequential/sequential.hpp)
Algorithm 3: sequential_check_gentle_varsigma_odd(first_n, last_n)
    // Preconditions: 3 ≤ first_n odd ≤ last_n ≤ MAX_POSSIBLE_N
    sequential_check_gentle_varsigma_odd(first_n, last_n):
        bad_table = ∅
        for n = first_n to last_n step 2
            if not (3, 7, 31 or 127 ∥ n) then
                if not (n is square number) then
                    if not sequential_is_varsigma_odd_lower(n, bad_table, first_n) then
                        bad_table = bad_table ∪ {n}
                        print n
        return bad_table
    // Postcondition:
    //   If all numbers < first_n respect the conjecture
    //   and all square numbers ≤ last_n respect the conjecture
    //   and all odd bad numbers ≤ last_n respect the conjecture
    //   then all numbers ≤ last_n respect the conjecture.
    //   Prints all odd bad numbers between first_n and last_n (included)
    //   and returns the set.
d | n means that d is a divisor of n.
d ∥ n means that d is a divisor of n, but d² is not.
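The two cheap filters applied by Algorithm 3 before the factorization test (the divisibility test by 3, 7, 31 and 127, then the square test) can be sketched in C++ as follows. This is only an illustrative sketch: it assumes that the divisibility test means "d divides n but d² does not", and the names `divides_exactly`, `is_square` and `needs_factorization_test` are hypothetical, not taken from sequential.hpp.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// True if d divides n but d*d does not (assumed meaning of the filter).
bool divides_exactly(std::uint64_t d, std::uint64_t n) {
    assert(d > 1);
    return n % d == 0 && n % (d * d) != 0;
}

// True if n is a perfect square.
bool is_square(std::uint64_t n) {
    assert(n < (1ULL << 62));  // keeps (r + 1) * (r + 1) free of overflow
    std::uint64_t r =
        static_cast<std::uint64_t>(std::sqrt(static_cast<long double>(n)));
    while (r * r > n) --r;               // correct a possible rounding up
    while ((r + 1) * (r + 1) <= n) ++r;  // correct a possible rounding down
    return r * r == n;
}

// True if the odd number n passes both filters, so that the (more costly)
// factorization test still has to be run on it.
bool needs_factorization_test(std::uint64_t n) {
    assert(n >= 3 && n % 2 == 1);
    static const std::uint64_t mersenne_primes[] = {3, 7, 31, 127};
    for (std::uint64_t m : mersenne_primes) {
        if (divides_exactly(m, n)) return false;
    }
    return !is_square(n);
}
```

For instance, 21 is skipped because 3 divides it exactly, 25 is skipped because it is a square, and 27 is kept because 9 also divides it.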
Transformed problem
Computes ςodd(n) (possibly only partially) by the factorization of n
and returns True if and only if ςodd(n) < n.
Algorithm 4: sequential_is_varsigma_odd_lower(n, bad_table, bad_first_n)
    // Preconditions: 3 ≤ n odd ≤ MAX_POSSIBLE_N
    //   bad_table contains all odd bad numbers
    //   between bad_first_n (included) and n (excluded)
    sequential_is_varsigma_odd_lower(n, bad_table, bad_first_n):
        n_divided = n
        varsigma_odd = 1
        for p odd prime ≤ ⌊√n_divided⌋
            α = 0
            while p | n_divided
                n_divided = n_divided / p
                α = α + 1

            if α > 0 then  // p^α is a factor of n
                varsigma_odd = varsigma_odd ∗ Odd((p^α − 1)/(p − 1) + p^α)
                if (varsigma_odd
                    ∗ sequential_sigma_odd_upper_bound(n_divided,
                                                       bad_table, bad_first_n)) < n then
                    return True

        if n_divided > 1 then  // n_divided is prime
            varsigma_odd = varsigma_odd ∗ Odd(n_divided + 1)

        return (varsigma_odd < n)
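Leaving aside the shortcut and the prime table, the core of Algorithm 4 is a trial division that accumulates the odd part of each σ(p^α). The sketch below is a simplified illustration under those assumptions, not the code of sequential.hpp: it divides by every odd number instead of by tabulated primes, never stops early, and assumes values small enough to avoid overflow. The names `odd_part`, `varsigma_odd` and `is_varsigma_odd_lower` are chosen here for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Odd part of x: x divided by 2 until it becomes odd.
std::uint64_t odd_part(std::uint64_t x) {
    assert(x > 0);
    while (x % 2 == 0) x /= 2;
    return x;
}

// varsigma_odd(n) for odd n, by complete factorization (no shortcut).
std::uint64_t varsigma_odd(std::uint64_t n) {
    assert(n % 2 == 1);
    std::uint64_t n_divided = n;
    std::uint64_t result = 1;
    // Composite values of p never divide n_divided, because their prime
    // factors have already been removed.
    for (std::uint64_t p = 3; p * p <= n_divided; p += 2) {
        std::uint64_t power = 1;    // p^alpha
        std::uint64_t sigma_p = 1;  // 1 + p + ... + p^alpha
        while (n_divided % p == 0) {
            n_divided /= p;
            power *= p;
            sigma_p += power;
        }
        if (power > 1) result *= odd_part(sigma_p);
    }
    if (n_divided > 1) {  // the remaining n_divided is prime
        result *= odd_part(n_divided + 1);
    }
    return result;
}

// True if and only if varsigma_odd(n) < n.
bool is_varsigma_odd_lower(std::uint64_t n) {
    return varsigma_odd(n) < n;
}
```

With this sketch, 45 = 3² × 5 gives Odd(13) × Odd(6) = 13 × 3 = 39, which is lower than 45, whereas the square 9 gives 13, which is not lower than 9.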
Factorization shortcut
When we find a prime factor,
it may be possible to shortcut the complete factorization.
For example, with a first prime factor p1 of n:
$n = p_1^{\alpha_1} n'$
$\sigma_{\mathrm{odd}}(n) = \frac{p_1^{\alpha_1+1} - 1}{p_1 - 1} \times \sigma_{\mathrm{odd}}(n')$
Is $\frac{p_1^{\alpha_1+1} - 1}{p_1 - 1} \times (\text{upper bound of } \sigma_{\mathrm{odd}}(n')) < n$? If yes, then stop, because $\sigma_{\mathrm{odd}}(n)$ is at most the left-hand side.
Upper bound that always holds:
$\sigma_{\mathrm{odd}}(n') \leq 2 n' \sqrt[8]{n'}$
It is the same for the ςodd function, with some additional division(s) by 2.
And if n′ is gentle (odd, but neither square nor bad):
$\varsigma_{\mathrm{odd}}(n') < n'$
(so the shortcut can be taken "often").
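A small worked example may make the shortcut concrete. It assumes that ςodd(n) is the odd part of σodd(n), written Odd(...) as in Algorithm 4, and that n′ = 143 is gentle (it is odd, not a square, and ςodd(143) = 21 < 143). The number 715 is chosen here only for illustration.

```latex
\begin{align*}
n &= 715 = 5^1 \times 143, \qquad p_1 = 5,\quad \alpha_1 = 1,\quad n' = 143 = 11 \times 13 \\
\sigma_{\mathrm{odd}}(715) &= \frac{5^2 - 1}{5 - 1} \times \sigma_{\mathrm{odd}}(143) = 6 \times 168 = 1008 \\
\varsigma_{\mathrm{odd}}(715) &= \mathrm{Odd}(6) \times \varsigma_{\mathrm{odd}}(143) = 3 \times 21 = 63 \\
\text{shortcut:}\quad \mathrm{Odd}(6) \times n' &= 3 \times 143 = 429 < 715
\end{align*}
```

The last line needs only the first prime factor 5 and the bound ςodd(n′) < n′: it already shows that ςodd(715) < 715, so 143 never has to be factorized.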
Stop and restart
Note that the program can be stopped (or run only up to some last n)
and later restarted from the last value checked.
In fact, different ranges of numbers can be computed separately (at the
same time or not).
If all required numbers have been checked up to some number N (including the odd
square numbers and the bad numbers, checked for example in the naive way, which is
fast for these rare numbers), then the conclusion is that, for all n ≤ N,
the iteration of σodd (and of ςodd) from n reaches 1 (which is what we wanted to
establish).
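Computing ranges separately presupposes some way of cutting the interval into pieces. The sketch below shows one possible way, under the assumption that each piece must start on an odd number because only odd numbers are tested; the name `split_odd_ranges` and the `Range` alias are hypothetical and do not come from the project sources. It does not handle the sharing of bad numbers between pieces.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Closed interval [first, last] with first odd.
using Range = std::pair<std::uint64_t, std::uint64_t>;

// Cuts [first_n, last_n] into consecutive ranges of at most `size` numbers.
// `size` must be even so that every range starts on an odd number.
std::vector<Range> split_odd_ranges(std::uint64_t first_n,
                                    std::uint64_t last_n,
                                    std::uint64_t size) {
    assert(first_n % 2 == 1 && first_n <= last_n);
    assert(size >= 2 && size % 2 == 0);
    std::vector<Range> ranges;
    for (std::uint64_t first = first_n; first <= last_n; first += size) {
        const std::uint64_t last = std::min(first + size - 1, last_n);
        ranges.emplace_back(first, last);
    }
    return ranges;
}
```

For example, cutting the interval from 3 to 20 with a size of 8 yields the ranges [3, 10], [11, 18] and [19, 20], which can then be checked in any order.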
1 The problem
2 Computation
Simple algorithm
Better algorithm
3 Parallel implementations
Multi-threads
Message-passing (Open MPI)
GPU (OpenCL)
4 Results
Speedup
Efficiency
Overhead
Benchmarks tables
Performance with one thread/process
First, a comparison between the sequential implementation, the three multi-threading
implementations and the two message-passing implementations (each with only one
thread/process),
checking the numbers between 1 and 20,000,001
on a personal computer with 4 cores and 2 threads per core.
[Chart: running time in seconds (axis from 6 to 7) for each implementation:
0: sequential;
one thread (1: one by one, 2: by range, 3: dynamic);
one MPI process (4: one by one, 5: dynamic).]
Multi-threads (C++11 threads)
3 different implementations: (progs/src/threads/threads/threads.hpp)
One by one
Each slave independently computes one number and sends a boolean to the
master. The master also computes one number, then waits for everybody. And
so forth with the next numbers.
A silly implementation, just to try; very inefficient. The barrier is a big
limitation because each number has a different factorization time.
By range
Like one by one, but each slave receives a range of numbers (given by its
endpoints), computes it, and returns the (very small) set of bad numbers
found. The master computes a smaller range, then waits for everybody. And so
forth with the next ranges.
Much better, because the computation is better balanced: the factorization
time averages out over a range.
“Dynamic”
Like by range, but the master does not wait: it gives a new range as soon as a
slave is free, and also computes the rest of the time.
Very good occupation of each thread (see the graphs in the following slides).
All threads share the same prime number tables.
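The "dynamic" idea, where a free thread immediately takes the next range, can be sketched with C++11 threads as below. This is a simplified illustration and not the code of threads.hpp: instead of a master handing out ranges, the threads take them from a shared atomic counter (a swapped-in technique), the per-number test is passed in as a hypothetical predicate, and the sharing of bad numbers between threads is left out. The names `dynamic_check` and `Predicate` are assumptions made for this example.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the per-number test: true if n must be reported.
using Predicate = std::function<bool(std::uint64_t)>;

// Tests every odd n in [first_n, last_n]; returns the reported numbers, sorted.
std::vector<std::uint64_t> dynamic_check(std::uint64_t first_n,
                                         std::uint64_t last_n,
                                         std::uint64_t range_size,
                                         unsigned nb_thread,
                                         const Predicate& is_reported) {
    assert(first_n % 2 == 1 && first_n <= last_n);
    assert(range_size >= 2 && range_size % 2 == 0 && nb_thread >= 1);

    std::atomic<std::uint64_t> next_first(first_n);  // start of next free range
    std::vector<std::uint64_t> found;
    std::mutex found_mutex;

    auto worker = [&]() {
        for (;;) {
            // Take the next range; range_size is even, so first stays odd.
            const std::uint64_t first = next_first.fetch_add(range_size);
            if (first > last_n) return;
            const std::uint64_t last = std::min(first + range_size - 1, last_n);
            std::vector<std::uint64_t> local;
            for (std::uint64_t n = first; n <= last; n += 2) {
                if (is_reported(n)) local.push_back(n);
            }
            if (!local.empty()) {  // rare, so the lock is seldom taken
                std::lock_guard<std::mutex> lock(found_mutex);
                found.insert(found.end(), local.begin(), local.end());
            }
        }
    };

    std::vector<std::thread> threads;
    for (unsigned i = 1; i < nb_thread; ++i) threads.emplace_back(worker);
    worker();  // the calling thread also computes
    for (auto& t : threads) t.join();

    std::sort(found.begin(), found.end());
    return found;
}
```

No thread waits at a barrier here: a thread that finishes a cheap range simply takes another one, which is the load-balancing effect the slide attributes to the "dynamic" version.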
Multi-threads — one by one
[Plot: running time in seconds (axis from 0 to 80) against the number of threads (1 to 8).]
Multi-threads — by range
[Plot: running time in seconds (axis from 0 to 7) against the number of threads (1 to 8).]
Multi-threads — “dynamic”
[Plot: running time in seconds (axis from 0 to 7) against the number of threads (1 to 8).]
Message-passing (Open MPI)
2 implementations: (progs/src/mpi/mpi/mpi.hpp)
One by one
One element, then a barrier. Very inefficient; just to try.
“Dynamic”
By range, and the master does not wait.
Same algorithms as for multi-threading,
but information is exchanged by messages. (That could be between different
machines, but these results were computed on only one computer.) The impact is
small if the size of a range is large compared to the small quantity of
information exchanged.
Messages from the master to each slave:
The single number or the endpoints of the range, and the new (rare) bad
numbers found by the other processes.
Messages from each slave to the master:
A boolean or an array of the new (rare) bad numbers found.
Main differences from multi-threading: exchanges between processes,
and each process has its own prime number table.
Message-passing — “dynamic”
[Plot: running time in seconds (0 to 7) against the number of processes (1 to 6).]
GPU (OpenCL)
Only one implementation (progs/src/opencl/opencl/opencl.hpp):
By list of numbers
The CPU selects a list of numbers to be checked
and sends them to the GPU.
The GPU computes ς(n) completely for each n received (without using
a list of bad numbers and without shortcutting the factorization).
Then the GPU returns the corresponding list of booleans to the CPU.
And so forth.
Instead of computing ς(n) directly during the factorization,
this implementation first collects all the prime factors of n.
That makes the parallel work easier.
The important improvement of the algorithm (the shortcut of the
factorization) was also removed, because it did not give better results,
due to the more complex branching.
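The "collect the prime factors first, then compute ς(n)" scheme described above can be sketched as follows. This is only an illustration in Python, not the OpenCL kernel of the project. It assumes that ς(n) is the sum of the odd divisors of n divided by 2 until it is odd, that the table of primes reaches the square root of every n, and that the returned boolean is the test ς(n) < n. All three are assumptions, and the function names are hypothetical.

```python
def prime_factors(n, primes):
    """Collect the (prime, exponent) pairs of n by trial division.

    Assumption: `primes` is an increasing table reaching at least sqrt(n).
    """
    factors = []
    for p in primes:
        if p * p > n:
            break
        exponent = 0
        while n % p == 0:
            n //= p
            exponent += 1
        if exponent:
            factors.append((p, exponent))
    if n > 1:  # the remaining cofactor is a prime
        factors.append((n, 1))
    return factors


def varsigma_odd(n, primes):
    """Assumed definition: sum of the odd divisors of n, divided by 2 until odd."""
    total = 1
    for p, e in prime_factors(n, primes):
        if p != 2:
            total *= (p ** (e + 1) - 1) // (p - 1)  # 1 + p + ... + p^e
    while total % 2 == 0:
        total //= 2
    return total


def check_list(numbers, primes):
    """One boolean per number; the test itself is a stand-in (assumption)."""
    return [varsigma_odd(n, primes) < n for n in numbers]
```

Because every number of the list is handled by the same function without any shared state, each call maps naturally onto one GPU work-item.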
GPU (OpenCL): explanations of bad results
The computation is massively parallel (when the list of numbers is big).
But the efficiency is limited because the factorization process differs
for each number. By the nature of computing the problem by factorization,
the algorithm is more or less a random succession of conditional
branches, and parallel computation on GPUs loses a lot of power on that.
The bigger the list of numbers, the closer the computation is to ideally
parallel. But the bigger the list, the more the computation of each
number disturbs the progress of the others.
Moreover, all the numbers that are factorized quickly must wait for the others to finish.
Also, GPUs give the best of their power on floating-point computations,
and this problem is an integer problem.
A completely different approach could be better.
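The cost of numbers waiting for each other can be illustrated with a very simplified model. This is a sketch under stated assumptions, not a measurement of a real GPU: work-items are assumed to run in groups that last as long as their slowest member, the cost of a number is the iteration count of a naive trial division, and the group size is a hypothetical parameter.

```python
def trial_divisions(n):
    """Count the loop iterations of a naive trial-division factorization of n."""
    steps = 0
    d = 2
    while d * d <= n:
        while n % d == 0:
            n //= d
            steps += 1
        d += 1
        steps += 1
    return steps


def lockstep_utilization(numbers, group_size):
    """Fraction of useful work when every group lasts as long as its slowest member."""
    useful = 0
    paid = 0
    for start in range(0, len(numbers), group_size):
        group = [trial_divisions(n) for n in numbers[start:start + group_size]]
        useful += sum(group)             # iterations really needed
        paid += max(group) * len(group)  # iterations spent, waiting included
    return useful / paid if paid else 1.0
```

With a group size of 1 nothing waits and the utilization is 1. With larger groups, numbers that factorize quickly idle until the slowest one ends, and the utilization drops below 1.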
GPU (OpenCL): old GPU used during tests
The poor performance of the OpenCL implementation
is also due to the old GPU used:
an NVIDIA Quadro FX 1800 graphics card with 768 MiB.
This GPU has no cache for the global memory,
and the main loop iterates over the prime numbers stored in this global memory.
More modern GPUs could use the native OpenCL function ctz (instead of a
loop).
Nevertheless, with the largest list of numbers possible for this GPU, the
OpenCL implementation shows a small (disappointing) performance gain
compared to the sequential implementation.
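To show what ctz (count trailing zeros) would replace, here is a minimal Python sketch. It assumes that the loop in question strips the factors 2 from a positive integer, which the slide does not state explicitly. The function names are hypothetical and the bit trick only mimics what the OpenCL built-in returns for a non-zero value.

```python
def odd_part_loop(n):
    """Remove every factor 2 of n > 0 with a loop (what is needed without ctz)."""
    while n % 2 == 0:
        n >>= 1
    return n


def count_trailing_zeros(n):
    """Number of trailing zero bits of n > 0, as a ctz instruction would return."""
    return (n & -n).bit_length() - 1  # n & -n isolates the lowest set bit


def odd_part_ctz(n):
    """Same result as odd_part_loop, with one shift and no data-dependent loop."""
    return n >> count_trailing_zeros(n)
```

The second version has no branch whose length depends on the data, which is the kind of code that suits a GPU better.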
GPU (OpenCL) — by list of numbers
[Plot: running time in seconds (0 to 100) against the size of the list of numbers (100 to 100000, logarithmic scale).]
1 The problem
2 Computation
Simple algorithm
Better algorithm
3 Parallel implementations
Multi-threads
Message-passing (Open MPI)
GPU (OpenCL)
4 Results
Speedup
Efficiency
Overhead
Benchmarks tables
Results
Results were produced on a computer with only 4 cores, which explains why
the gains drop from 5 threads or processes onward.
Results with Open MPI are a little strange: for some parameters
they are better than the sequential implementation, as if running the
sequential program through mpirun made it faster.
In theory, the overhead of the MPI implementation should be bigger
than that of the multi-thread implementation, due to the communication between
processes (but the tests were run on a single computer).
The implementation is almost identical to the multi-thread version and all
computation results are identical, thus it must be correct.
Maybe the GCC compiler required with Open MPI optimizes this
code better than the clang compiler used for the sequential and multi-thread versions.
Or maybe it is due to a slight imprecision in the measurements.
The two best implementations (“dynamic” algorithm with threads and
with Open MPI) are both pretty close to the ideal.
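The three measures used in the following plots can be sketched as below, assuming the usual textbook definitions (the slides do not restate them): speedup is the sequential time divided by the parallel time, efficiency is the speedup divided by the number of threads or processes, and overhead is the total time spent by all workers minus the sequential time. The function names are hypothetical.

```python
def speedup(t_seq, t_par):
    """Sequential time divided by parallel time; the ideal value is the worker count."""
    return t_seq / t_par


def efficiency(t_seq, t_par, workers):
    """Speedup per worker; the ideal value is 1."""
    return speedup(t_seq, t_par) / workers


def overhead(t_seq, t_par, workers):
    """Total time spent by all workers minus the sequential time; the ideal value is 0."""
    return workers * t_par - t_seq
```

With these definitions, a parallel run that beats the ideal has an efficiency above 1 and, equivalently, a negative overhead.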
Speedup
[Plot: speedup (0 to 5) against the number of threads/processes (0 to 8).
Curves: identity, sequential, threads/one by one, threads/by range,
threads/dynamic, MPI/one by one, MPI/dynamic.]
Speedup with OpenCL
[Plot: speedup (0 to 5) against the number of threads/processes or the size
of the list of numbers (1 to 1x10^6, logarithmic scale).
Curves: identity, sequential, threads/one by one, threads/by range,
threads/dynamic, MPI/one by one, MPI/dynamic, OpenCL.]
Efficiency
[Plot: efficiency (0 to 1.2) against the number of threads/processes (0 to 8).
Curves: sequential, threads/one by one, threads/by range,
threads/dynamic, MPI/one by one, MPI/dynamic.]
Efficiency with OpenCL
[Plot: efficiency (0 to 1) against the number of threads/processes or the size
of the list of numbers (1 to 1x10^6, logarithmic scale).
Curves: sequential, threads/one by one, threads/by range,
threads/dynamic, MPI/one by one, MPI/dynamic, OpenCL.]
Overhead
[Plot: overhead (0 to 5000) against the number of threads/processes (0 to 8).
Curves: sequential, threads/one by one, threads/by range,
threads/dynamic, MPI/one by one, MPI/dynamic.]
Overhead only until 4 cores
[Plot: overhead (-400 to 1200) against the number of threads/processes (0 to 4).
Curves: sequential, threads/one by one, threads/by range,
threads/dynamic, MPI/one by one, MPI/dynamic.]
Parallel Numerical Verification of the σodd problem 36 / 41
Overhead with OpenCL
[Figure: overhead (logarithmic scale, from 1 to 1×10^10) versus number of threads/processes or size of the list of numbers (logarithmic scale, from 1 to 1×10^6). Series: sequential, threads/one by one, threads/by range, threads/dynamic, MPI/one by one, MPI/dynamic, OpenCL.]
Parallel Numerical Verification of the σ_odd problem

  • 1. Université Libre de Bruxelles, Computer Science Department. INFO-Y100 (4004940ENR) Parallel systems, project presentation: Parallel Numerical Verification of the σodd problem. Olivier Pirson, olivier.pirson.opi@gmail.com, orcid.org/0000-0001-6296-9659. December 15, 2017 (last modifications: September 11, 2019). https://speakerdeck.com/opimedia/parallel-numerical-verification-of-the-s-odd-problem
  • 2. Outline
     1 The problem
     2 Computation: simple algorithm, better algorithm
     3 Parallel implementations: multi-threads, message-passing (Open MPI), GPU (OpenCL)
     4 Results: speedup, efficiency, overhead, benchmarks tables
  • 3. The σodd and ςodd functions
     σ(n) = sum of all divisors of n (sigma)
     σodd(n) = sum of the odd divisors of n (sigma odd)
     All divisors of 18: {1, 2, 3, 6, 9, 18}. Only odd divisors: {1, 3, 9}, so σodd(18) = 13.
     All divisors of 19: {1, 19}. Only odd divisors: {1, 19}, so σodd(19) = 20.
     ςodd(n) = σodd(n) divided by 2 until it is odd (varsigma odd)
     ςodd(18) = 13 and ςodd(19) = 5
     n        1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
     σ(n)     1  3  4  7  6 12  8 15 13 18 12 28 14 24 24 31 18 39 20 42
     σodd(n)  1  1  4  1  6  4  8  1 13  6 12  4 14  8 24  1 18 13 20  6
     ςodd(n)  1  1  1  1  3  1  1  1 13  3  3  1  7  1  3  1  9 13  5  3
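As a minimal sketch of the two definitions on slide 3, the following Python functions compute σodd and ςodd by enumerating divisors. The names `sigma_odd` and `varsigma_odd` are illustrative assumptions, not names taken from the author's code, and this naive enumeration is only meant for small n.

```python
def sigma_odd(n: int) -> int:
    """Sum of the odd divisors of n (naive enumeration, small n only)."""
    return sum(d for d in range(1, n + 1, 2) if n % d == 0)


def varsigma_odd(n: int) -> int:
    """sigma_odd(n) divided by 2 until the result is odd."""
    value = sigma_odd(n)
    while value % 2 == 0:
        value //= 2
    return value


if __name__ == "__main__":
    # Reproduce the two worked examples of the slide.
    for n in (18, 19):
        print(n, sigma_odd(n), varsigma_odd(n))
```

Running the two functions over n = 1 to 20 should reproduce the σodd(n) and ςodd(n) rows of the table.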
  • 4. The σodd problem: an iteration problem
     We iterate the ςodd (or equivalently σodd) function and observe that we always reach 1.
     Numbers in orange are square numbers. For every odd square number n ≠ 1: ςodd(n) = σodd(n) > n.
     But we observe that for almost all other odd numbers n: ςodd(n) < n.
     Note that even numbers are not interesting for this problem, because σodd(2n) = σodd(n) and ςodd(2n) = ςodd(n).
     [Figure: graph of the iteration for the odd numbers from 1 to 85.]
  • 5. The σodd problem: an iteration problem
     The point in the middle of this picture is the number 1.
     [Figure: graph of the iteration for the odd numbers from 1 to 1001.]
  • 6. The σodd problem is a conjecture
     Does the iteration always reach 1? The σodd problem is the conjecture that it always does, whatever the starting number (integer ≥ 1).
     Successfully checked for each n up to 1.1 × 10^11 ≃ 1.6 × 2^36 with programs developed for this work. The previously known result was 2^30.
     Moreover, n ≤ 10^11 ⟹ ςodd^15(n) = 1.
  • 7. Outline (same as slide 2), before the Computation part.
  • 8. Numerical verification by the simple direct algorithm
     For each odd number:
     Algorithm 1: first_check_varsigma_odd(first_n, last_n)
       first_check_varsigma_odd(first_n, last_n):
         for n = first_n to last_n step 2
           lower_n, length = first_iterate_varsigma_odd_until_lower(n)
           if length > 1 then
             print n, lower_n, length
  • 9. Numerical verification by the simple direct algorithm
     Simply iterate ςodd until a smaller number is reached:
     Algorithm 2: first_iterate_varsigma_odd_until_lower(n)
       first_iterate_varsigma_odd_until_lower(start_n):
         n = start_n
         length = 0
         do
           length = length + 1
           n = ςodd(n)
           if n > MAX_POSSIBLE_N then
             print "! Impossible to check", start_n, length, n
             exit
         while n > start_n

         if n = start_n then
           print "! Found not trivial cycle", start_n, length
           exit

         return n, length
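Algorithms 1 and 2 can be sketched in Python as follows. This is an illustration under assumptions, not the author's implementation: the value of `MAX_POSSIBLE_N` is not given on the slides and is chosen arbitrarily here, the two `exit` cases are turned into exceptions, results are returned as a list instead of printed, and ςodd is computed by naive divisor enumeration (small n only).

```python
MAX_POSSIBLE_N = 2**64 - 1  # assumed limit; the slides do not give its value


def varsigma_odd(n):
    """Odd part of the sum of the odd divisors of n (naive, small n only)."""
    value = sum(d for d in range(1, n + 1, 2) if n % d == 0)
    while value % 2 == 0:
        value //= 2
    return value


def iterate_until_lower(start_n):
    """Iterate varsigma_odd from start_n until a value below start_n is reached.

    Returns (value reached, number of steps). Intended for odd start_n >= 3.
    """
    n = start_n
    length = 0
    while True:  # emulates the do ... while n > start_n loop
        length += 1
        n = varsigma_odd(n)
        if n > MAX_POSSIBLE_N:
            raise OverflowError(f"impossible to check {start_n}")
        if n <= start_n:
            break
    if n == start_n:
        raise RuntimeError(f"found non-trivial cycle from {start_n}")
    return n, length


def first_check(first_n, last_n):
    """Return (n, lower_n, length) for each odd n that needs more than one step."""
    results = []
    for n in range(first_n, last_n + 1, 2):
        lower_n, length = iterate_until_lower(n)
        if length > 1:
            results.append((n, lower_n, length))
    return results
```

For example, `iterate_until_lower(9)` follows 9 → 13 → 7 and stops after two steps, because 7 < 9.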
  • 10. Computation of σodd(n)
     Assume n odd: $n = p_1^{\alpha_1} \times p_2^{\alpha_2} \times p_3^{\alpha_3} \times \cdots \times p_k^{\alpha_k}$ with $p_i$ distinct prime numbers.
     $\sigma_{\mathrm{odd}}(n) = \frac{p_1^{\alpha_1+1}-1}{p_1-1} \times \frac{p_2^{\alpha_2+1}-1}{p_2-1} \times \frac{p_3^{\alpha_3+1}-1}{p_3-1} \times \cdots \times \frac{p_k^{\alpha_k+1}-1}{p_k-1}$
     Thus, to verify the conjecture we must factorize (other ways are less efficient).
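The product formula of slide 10 can be illustrated with a short Python sketch. It uses plain trial division by odd numbers rather than a table of primes, which is an assumption made for brevity; the function name is illustrative.

```python
def sigma_odd(n: int) -> int:
    """Sum of the odd divisors of n, computed from its prime factorization."""
    while n % 2 == 0:  # sigma_odd(2n) = sigma_odd(n), so drop the factors 2
        n //= 2
    result = 1
    p = 3
    while p * p <= n:
        if n % p == 0:
            power = p  # will become p**(alpha + 1)
            while n % p == 0:
                n //= p
                power *= p
            result *= (power - 1) // (p - 1)
        p += 2  # composite p never divide n: their prime factors are already removed
    if n > 1:  # one prime factor left, with exponent 1
        result *= n + 1
    return result
```

For n = 45 = 3² × 5 this gives (27 − 1)/(3 − 1) × (25 − 1)/(5 − 1) = 13 × 6 = 78.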
  • 11. Use properties to avoid a lot of computations
     For each n, we want to check that there exists k such that σodd^k(n) = 1.
     It is equivalent to check that there exists k such that ςodd^k(n) < n. That shortens the path that has to be computed.
     Only odd numbers must be checked (50%). Other numbers can be skipped (≃ 33% remain).
     Almost all numbers reach a smaller number in only one step! Exceptions identified before computation: square numbers. The other exceptions (called bad numbers) are very rare.
     So instead of iterating, we compute only one step and keep the exceptions, which are checked separately (very fast).
     ςodd(ab) ≤ ςodd(a) ςodd(b), which gives a shortcut in the factorization (the heaviest work), using previously known bad numbers or a general upper bound.
  • 12. Transformed problem
     With these properties, we no longer need to compute the complete iteration of σodd (and thus the complete factorization) for each number. The resulting algorithm is both improved and simpler (relative to other possible optimizations): compute only one (possibly partial) iteration of ςodd, for only some numbers.
     “The cheapest, fastest and most reliable components of a computer system are those that aren’t there.” (Gordon Bell)
  • 13. Transformed problem (progs/src/sequential/sequential/sequential.hpp)
     Algorithm 3: sequential_check_gentle_varsigma_odd(first_n, last_n)
       // Preconditions: 3 ≤ first_n odd ≤ last_n ≤ MAX_POSSIBLE_N
       sequential_check_gentle_varsigma_odd(first_n, last_n):
         bad_table = ∅
         for n = first_n to last_n step 2
           if not (3, 7, 31 or 127 ∥ n) then
             if not (n is square number) then
               if not sequential_is_varsigma_odd_lower(n, bad_table, first_n) then
                 bad_table = bad_table ∪ {n}
                 print n
         return bad_table
       // Postcondition:
       // If all numbers < first_n respect the conjecture
       // and all square numbers ≤ last_n respect the conjecture
       // and all odd bad numbers ≤ last_n respect the conjecture
       // then all numbers ≤ last_n respect the conjecture.
       // Print all odd bad numbers between first_n and last_n (included)
       // and return the set.
     d | n means that d is a divisor of n. d ∥ n means that d is a divisor of n, but d² is not.
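The two filters at the top of Algorithm 3 can be sketched in Python to check the "≃ 33% remain" figure from slide 11. This sketch assumes that the test on 3, 7, 31 and 127 is exact division (p divides n but p² does not), as the slide's note on notation suggests. These four primes are the ones for which p + 1 is a power of two (4, 8, 32, 128), so the factor σodd(p) = p + 1 contributes nothing to ςodd(n); presumably that is why such n can be skipped. All function names are illustrative.

```python
from math import isqrt

MERSENNE_PRIMES = (3, 7, 31, 127)  # each p + 1 is a power of two


def exactly_divides(p, n):
    """True if p divides n but p*p does not."""
    return n % p == 0 and n % (p * p) != 0


def is_square(n):
    root = isqrt(n)
    return root * root == n


def needs_factorization(n):
    """True if the odd number n passes both filters and must really be checked."""
    if any(exactly_divides(p, n) for p in MERSENNE_PRIMES):
        return False
    return not is_square(n)


def remaining_fraction(last_n):
    """Fraction of all integers 1..last_n that still need a factorization."""
    kept = sum(1 for n in range(3, last_n + 1, 2) if needs_factorization(n))
    return kept / last_n
```

Calling `remaining_fraction(200001)` gives a value close to one third, consistent with the slide.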
  • 14. Transformed problem
     Computes (possibly partially) ςodd(n) by the factorization of n and returns True if and only if ςodd(n) < n.
     Algorithm 4: sequential_is_varsigma_odd_lower(n, bad_table, bad_first_n)
       // Preconditions: 3 ≤ n odd ≤ MAX_POSSIBLE_N
       //                bad_table contains all odd bad numbers
       //                between bad_first_n (included) and n (excluded)
       sequential_is_varsigma_odd_lower(n, bad_table, bad_first_n):
         n_divided = n
         varsigma_odd = 1
         for p odd prime ≤ ⌊√n_divided⌋
           α = 0
           while p | n_divided
             n_divided = n_divided / p
             α = α + 1

           if α > 0 then  // p^α is a factor of n
             varsigma_odd = varsigma_odd ∗ Odd((p^α − 1)/(p − 1) + p^α)
             if (varsigma_odd ∗ sequential_sigma_odd_upper_bound(n_divided, bad_table, bad_first_n)) < n then
               return True

         if n_divided > 1 then  // n_divided is prime
           varsigma_odd = varsigma_odd ∗ Odd(n_divided + 1)

         return (varsigma_odd < n)
  • 15. Factorization shortcut
     When we find a prime factor, it may be possible to shortcut the complete factorization. For example, with a first prime factor $p_1$ of n:
     $n = p_1^{\alpha_1} n'$
     $\sigma_{\mathrm{odd}}(n) = \frac{p_1^{\alpha_1+1}-1}{p_1-1} \times \sigma_{\mathrm{odd}}(n')$
     $\sigma_{\mathrm{odd}}(n) \le \frac{p_1^{\alpha_1+1}-1}{p_1-1} \times (\text{upper bound of } \sigma_{\mathrm{odd}}(n')) < n$? If yes, then stop.
     Upper bound always true: $\sigma_{\mathrm{odd}}(n') \le 2 n' \sqrt[8]{n'}$
     It is the same for the ςodd function, with some additional division(s) by 2.
     And if n′ is gentle (odd, but neither square nor bad): ςodd(n′) < n′ (so it is “often” possible to shortcut).
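Algorithm 4 together with the factorization shortcut of slide 15 can be sketched in Python as below. This is a simplified illustration, not the author's code: it ignores the table of bad numbers and uses only a general upper bound, read here as 2 n′ times the eighth root of n′ and rounded up with integer arithmetic; it also uses trial division by odd numbers instead of a prime table. Names are illustrative.

```python
from math import isqrt


def odd_part(n):
    """Divide n by 2 until it is odd."""
    while n % 2 == 0:
        n //= 2
    return n


def sigma_odd_upper_bound(n):
    """Integer upper bound of 2 * n * n**(1/8) (general bound, no bad table)."""
    eighth_root_ceiling = isqrt(isqrt(isqrt(n))) + 1  # strictly above n**(1/8)
    return 2 * n * eighth_root_ceiling


def is_varsigma_odd_lower(n):
    """Return True if and only if varsigma_odd(n) < n, for odd n >= 3."""
    n_divided = n
    varsigma_odd = 1
    p = 3
    while p * p <= n_divided:
        alpha = 0
        power = 1  # p**alpha
        while n_divided % p == 0:
            n_divided //= p
            power *= p
            alpha += 1
        if alpha > 0:  # p**alpha is a factor of n
            varsigma_odd *= odd_part((power - 1) // (p - 1) + power)
            # Shortcut: even the largest possible remaining factor keeps us below n.
            if varsigma_odd * sigma_odd_upper_bound(n_divided) < n:
                return True
        p += 2
    if n_divided > 1:  # n_divided is prime
        varsigma_odd *= odd_part(n_divided + 1)
    return varsigma_odd < n
```

The shortcut only ever answers True early; when it does not trigger, the factorization is completed and the exact value of ςodd(n) is compared with n.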
  • 16. Stop and restart
     Note that the program can be stopped (or run up to some last_n) and restarted from the last value checked. In fact, different ranges of numbers can be computed separately (at the same time or not).
     If all required numbers up to N are checked (with odd square numbers and bad numbers checked too, for example in the naive way, which is fast for these rare numbers), then the conclusion is: for all n ≤ N, the iteration of σodd (and ςodd) from n reaches 1 (what we wanted to achieve).
  • 17. Outline (same as slide 2), before the Parallel implementations part.
  • 18. Performance with one thread/process
     First, the comparison between the sequential, the three multi-threading and the two message-passing implementations (with only one thread/process), checking the numbers between 1 and 20,000,001, on a personal computer with 4 cores and 2 threads per core.
     [Figure: running time in seconds (axis from 6 to 7) for 0: sequential; one thread (1: one by one, 2: by range, 3: dynamic); one MPI process (4: one by one, 5: dynamic).]
  • 19. Multi-threads (thread of C++11)
     3 different implementations: (progs/src/threads/threads/threads.hpp)
     One by one: Each slave independently computes one number and sends a boolean to the master. The master also computes one number, and waits for everybody. And so forth with the next numbers. Silly implementation; just to try. Very inefficient. The barrier is a big limitation because each number has a different factorization time.
     By range: Like one by one, but each slave receives a range of numbers (given by its two endpoints), computes it, and returns the (very small) set of bad numbers found. The master computes a smaller range, and waits for everybody. And so forth with the next numbers. Much better, because the computation is better balanced: factorization times average out over a range.
     “Dynamic”: Like by range, but the master does not wait: it gives a new range as soon as a slave is free, and also computes the rest of the time. Very good occupation of each thread (see graph in following slides).
     All threads share the same prime number tables.
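The "by range" idea with dynamic hand-out of ranges can be sketched with Python's standard `concurrent.futures` thread pool. This is only an analogy to the C++11 thread implementation described above, under several assumptions: the pool gives the next range to whichever worker is free, the main thread only collects results instead of computing too, ςodd is computed naively, and squares are not filtered out beforehand. In CPython the global interpreter lock prevents a real speedup for this CPU-bound work, so the sketch shows the structure, not the performance. All names are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor


def varsigma_odd(n):
    """Odd part of the sum of the odd divisors of n (naive, small n only)."""
    value = sum(d for d in range(1, n + 1, 2) if n % d == 0)
    while value % 2 == 0:
        value //= 2
    return value


def check_range(first_n, last_n):
    """Return the odd n in [first_n, last_n] that do not decrease in one step."""
    return [n for n in range(first_n | 1, last_n + 1, 2) if varsigma_odd(n) >= n]


def split_ranges(first_n, last_n, range_size):
    """Yield the (first, last) endpoints of consecutive ranges covering the interval."""
    for first in range(first_n, last_n + 1, range_size):
        yield first, min(first + range_size - 1, last_n)


def parallel_check(first_n, last_n, range_size=100, nb_workers=4):
    """Check all ranges with a pool of workers and merge the results in order."""
    with ThreadPoolExecutor(max_workers=nb_workers) as pool:
        futures = [pool.submit(check_range, first, last)
                   for first, last in split_ranges(first_n, last_n, range_size)]
        found = []
        for future in futures:  # submission order keeps the result sorted
            found.extend(future.result())
    return found
```

Each worker receives only the two endpoints of its range and returns a short list, which mirrors the small amount of data exchanged in the by-range variants.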
  • 20. Multi-threads, one by one
     [Figure: running time in seconds (axis from 0 to 80) versus number of threads (1 to 8).]
  • 21. Multi-threads, by range
     [Figure: running time in seconds (axis from 0 to 7) versus number of threads (1 to 8).]
  • 22. Multi-threads, “dynamic”
     [Figure: running time in seconds (axis from 0 to 7) versus number of threads (1 to 8).]
  • 23. Message-passing (Open MPI)
     2 implementations: (progs/src/mpi/mpi/mpi.hpp)
     One by one: One element, barrier. Very inefficient; just to try.
     “Dynamic”: By range, and does not wait.
     Same algorithms as for multi-threading, but information is exchanged by messages. (That could be between different machines, but these results were computed on only one computer.) The impact is small if the size of the range is large compared to the small quantity of information exchanged.
     Messages from the master to each slave: the single number or the endpoints of the range, and the new (rare) bad numbers found by the others.
     Messages from each slave to the master: a boolean or an array of the new (rare) bad numbers found.
     Main differences with multi-threading: exchanges between processes, and each process has its own prime numbers table.
  • 24. Message-passing, “dynamic”
     [Figure: running time in seconds (axis from 0 to 7) versus number of processes.]
  • 25. GPU (OpenCL). Only one implementation (progs/src/opencl/opencl/opencl.hpp): by list of numbers. The CPU selects a list of numbers to be checked and sends them to the GPU. The GPU completely computes ς(n) for each n received (without using a list of bad numbers and without shortcutting the factorization). Then the GPU returns the corresponding list of booleans to the CPU. And so forth. Instead of computing ς(n) directly during the factorization, this implementation first collects all the prime factors of n. That makes the parallel work easier. The important improvement of the algorithm (the shortcut of the factorization) was also removed, because it did not give better results, due to the more complex branching.
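The two-step idea (first collect the prime factors, then compute ς(n) from them) can be sketched on the CPU side in C++. This is a minimal sketch, not the OpenCL kernel. It assumes that ς(n) is the odd part of the sum of the divisors of an odd n, which matches the chains visible in the tree of numbers (9 → 13, 81 → 121 → 133). It uses trial division by odd numbers rather than a table of primes. All names are hypothetical.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

using nat_type = std::uint64_t;                          // hypothetical name
using factor_type = std::pair<nat_type, unsigned int>;   // (prime, exponent)

// Step 1: collect all the prime factors of an odd n >= 1.
// Simplification: trial division by every odd number, not by a primes table.
std::vector<factor_type> collect_odd_factors(nat_type n) {
  std::vector<factor_type> factors;
  for (nat_type p = 3; p * p <= n; p += 2) {
    unsigned int exponent = 0;
    while (n % p == 0) {
      n /= p;
      ++exponent;
    }
    if (exponent > 0) {
      factors.emplace_back(p, exponent);
    }
  }
  if (n > 1) {
    factors.emplace_back(n, 1);  // remaining prime factor
  }
  return factors;
}

// Step 2: compute varsigma from the collected factors, as the product of
// the odd parts of (1 + p + ... + p^e). Assumed definition, see lead-in.
nat_type varsigma_from_factors(const std::vector<factor_type>& factors) {
  nat_type result = 1;
  for (const factor_type& factor : factors) {
    nat_type sum = 1;
    nat_type power = 1;
    for (unsigned int i = 0; i < factor.second; ++i) {
      power *= factor.first;
      sum += power;
    }
    while (sum % 2 == 0) {
      sum /= 2;  // keep only the odd part
    }
    result *= sum;
  }
  return result;
}

nat_type varsigma(nat_type n) {
  return varsigma_from_factors(collect_odd_factors(n));
}
```

Separating the two steps gives each work item the same overall shape (one factorization loop, then one product loop), at the price of storing the factor list.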
  • 26. GPU (OpenCL): explanations of bad results. The computation is massively parallel (if the list of numbers is big). But the efficiency is limited by the fact that the factorization process differs for each number. The algorithm, by the nature of the computation by factorization, is more or less a random succession of conditional branches, and GPU parallel computation loses a lot of power on that. The bigger the list of numbers, the closer the computation is to ideally parallel. But the bigger the list, the more the computation of each number disturbs the progress of the others. Moreover, all the numbers that are factorized quickly wait for the end of the others. Also, GPUs give the best of their power on floating-point computations, and this problem is an integer problem. A completely different approach could be better.
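The remark that quickly factorized numbers wait for the others can be made concrete with a toy model. The sketch below is an assumption-laden simplification, not a measurement of any GPU. It supposes that a group of work items advances in lockstep, so that every item occupies its lane until the slowest one finishes. The name `lockstep_utilization` and the step counts are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Fraction of useful work in a group of work items running in lockstep:
// useful steps = sum of the steps of each item,
// occupied steps = (steps of the slowest item) * (number of items).
double lockstep_utilization(const std::vector<unsigned int>& steps) {
  if (steps.empty()) {
    return 1.0;
  }
  const unsigned int slowest = *std::max_element(steps.begin(), steps.end());
  if (slowest == 0) {
    return 1.0;
  }
  const unsigned long useful = std::accumulate(steps.begin(), steps.end(), 0ul);
  return static_cast<double>(useful)
         / (static_cast<double>(slowest) * static_cast<double>(steps.size()));
}
```

In this model, four numbers needing 1, 1, 2 and 4 steps use only half of the occupied lanes, while four numbers needing 2 steps each use all of them.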
  • 27. GPU (OpenCL): old GPU used during tests. The poor performance of the OpenCL implementation is also due to the old GPU used: an NVIDIA Quadro FX 1800 graphics card with 768 MiB. This GPU has no cache for the global memory, and the main loop iterates on prime numbers stored in this global memory. A more modern GPU could use the native OpenCL function ctz (instead of a loop). Nevertheless, with the maximum list of numbers possible for this GPU, the OpenCL implementation has a small (disappointing) gain of performance compared to the sequential implementation.
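The difference between a loop and a count-trailing-zeros instruction can be sketched on the host side in C++. This is an analogue, not OpenCL kernel code: it uses `__builtin_ctzll`, a builtin of GCC and clang, in place of the OpenCL `ctz` function, and the function names are hypothetical. Both versions remove all factors 2 from a nonzero integer.

```cpp
#include <cstdint>

using nat_type = std::uint64_t;  // hypothetical name

// Loop version: shift right while the number is even. Requires n != 0.
nat_type odd_part_loop(nat_type n) {
  while ((n & 1u) == 0) {
    n >>= 1;
  }
  return n;
}

// Count-trailing-zeros version: a single shift, no data-dependent loop.
// __builtin_ctzll is a GCC/clang builtin; its result is undefined for n == 0.
nat_type odd_part_ctz(nat_type n) {
  return n >> __builtin_ctzll(n);
}
```

The second version has no branch whose iteration count depends on the data, which is the kind of code that suits lockstep execution better.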
  • 28. GPU (OpenCL), by list of numbers. [Plot: time in seconds (0 to 100) against a horizontal axis graduated from 100 to 100000; its label is unreadable in the source, presumably the size of the list of numbers.]
  • 29. Outline. 1 The problem. 2 Computation: Simple algorithm, Better algorithm. 3 Parallel implementations: Multi-threads, Message-passing (Open MPI), GPU (OpenCL). 4 Results: Speedup, Efficiency, Overhead, Benchmarks tables.
  • 30. Results. Results were produced on a computer with only 4 cores, which explains the decrease in gains beginning at 5 threads or processes. Results with Open MPI are a little strange, because for some parameters they are better than the sequential implementation. It is as if running the sequential program through mpirun made it faster. Theoretically the overhead of the MPI implementation should be bigger than that of the multi-thread implementation, due to the communication between processes (but tests were made on a single computer). The implementation is almost identical to the multi-thread version and all computation results are identical, so it must be correct. Maybe the GCC compiler required with Open MPI optimizes this code better than the clang compiler used for the sequential and multi-thread versions. Maybe it is due to a small imprecision in the measurements. The two best implementations (“dynamic” algorithm with threads and with Open MPI) are both pretty close to the ideal.
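The three quantities plotted in the Results part can be written down as small C++ functions. These are the usual textbook definitions, assumed here because the presentation's own formulas do not appear in this part: with T1 the sequential time and Tp the time with p threads or processes, speedup is T1 / Tp, efficiency is speedup / p, and overhead is p × Tp − T1. The function names are hypothetical.

```cpp
// t1: time of the sequential implementation
// tp: time of the parallel implementation with p threads or processes

// Speedup: how many times faster than the sequential run (ideal value: p).
double speedup(double t1, double tp) {
  return t1 / tp;
}

// Efficiency: speedup per thread or process (ideal value: 1).
double efficiency(double t1, double tp, unsigned int p) {
  return speedup(t1, tp) / static_cast<double>(p);
}

// Overhead: total time spent by all the workers beyond the sequential time
// (ideal value: 0).
double overhead(double t1, double tp, unsigned int p) {
  return static_cast<double>(p) * tp - t1;
}
```

With made-up timings (not benchmark data), a run of 8 s that drops to 2.5 s on 4 workers has a speedup of 3.2, an efficiency of 0.8 and an overhead of 2 s. Under these definitions, an efficiency above 1 or a negative overhead corresponds to a parallel run that beats the sequential one by more than the ideal, as in the surprising Open MPI measurements.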
  • 31. Speedup. [Plot: speedup (0 to 5) against the number of threads or processes (0 to 8). Curves: identity, sequential, threads/one by one, threads/by range, threads/dynamic, MPI/one by one, MPI/dynamic.]
  • 32. Speedup with OpenCL. [Plot: speedup (0 to 5) against the number of threads or processes, or the size of the list of numbers (1 to 1×10^6, logarithmic scale). Curves: identity, sequential, threads/one by one, threads/by range, threads/dynamic, MPI/one by one, MPI/dynamic, OpenCL.]
  • 33. Efficiency. [Plot: efficiency (0 to 1.2) against the number of threads or processes (0 to 8). Curves: sequential, threads/one by one, threads/by range, threads/dynamic, MPI/one by one, MPI/dynamic.]
  • 34. Efficiency with OpenCL. [Plot: efficiency (0 to 1) against the number of threads or processes, or the size of the list of numbers (1 to 1×10^6, logarithmic scale). Curves: sequential, threads/one by one, threads/by range, threads/dynamic, MPI/one by one, MPI/dynamic, OpenCL.]
  • 35. Overhead. [Plot: overhead (0 to 5000) against the number of threads or processes (0 to 8). Curves: sequential, threads/one by one, threads/by range, threads/dynamic, MPI/one by one, MPI/dynamic.]
  • 36. Overhead only until 4 cores. [Plot: overhead against the number of threads or processes (0 to 4). Curves: sequential, threads/one by one, threads/by range, threads/dynamic, MPI/one by one, MPI/dynamic.]
  • 37. Overhead with OpenCL. [Plot: overhead (1 to 1×10^10, logarithmic scale) against the number of threads or processes, or the size of the list of numbers (1 to 1×10^6, logarithmic scale). Curves: sequential, threads/one by one, threads/by range, threads/dynamic, MPI/one by one, MPI/dynamic, OpenCL.]