GPU - Double Precision?

Author	Message
SoNic1967 Joined: 8 Sep 18 Posts: 13 Credit: 23,954,022 RAC: 0	Message 896 - Posted: 10 Sep 2018, 8:17:13 UTC My RX580 computed the same work unit as a Tesla V100 SXM2. The RX580 finished it in 1140 seconds; the V100 finished it in 82 seconds, 13.9x faster: https://sech.me/boinc/Amicable/workunit.php?wuid=7557871 My card is "rated" at 6175 single / 386 double (GFlops) and the Tesla at 15667 single / 7830 double, so the Tesla is 2.5x faster in single precision and 20x faster in double precision. From that I assume this app uses a lot of double-precision calculations. Am I correct in this assumption?
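The ratios in the post above can be checked directly from the figures it quotes. As a quick sanity check (nothing here goes beyond the numbers already stated), the observed speedup falls between the single-precision and double-precision ratios, and sits much closer to the latter:

```latex
\frac{1140\ \text{s}}{82\ \text{s}} \approx 13.9, \qquad
\frac{15667}{6175} \approx 2.5, \qquad
\frac{7830}{386} \approx 20.3
```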

Sergei Chernykh Project administrator Project developer Joined: 5 Jan 17 Posts: 548 Credit: 72,451,573 RAC: 0	Message 897 - Posted: 10 Sep 2018, 8:21:10 UTC - in response to Message 896. Last modified: 10 Sep 2018, 8:21:24 UTC You can look at the GPU code here: https://github.com/SChernykh/Amicable/blob/boinc-opencl-version-128-bit/Amicable/kernel.cl It uses 64-bit integer arithmetic, which is roughly equivalent to FP double arithmetic when it comes to performance. So yes, GPUs with high double-precision GFlops will be much faster.

SoNic1967 Joined: 8 Sep 18 Posts: 13 Credit: 23,954,022 RAC: 0	Message 898 - Posted: 10 Sep 2018, 8:32:26 UTC So if one converted those 64-bit integer operations to 32-bit operations (on a vector array?), would it gain a lot of speed on current GPUs?

Sergei Chernykh Project administrator Project developer Joined: 5 Jan 17 Posts: 548 Credit: 72,451,573 RAC: 0	Message 899 - Posted: 10 Sep 2018, 8:40:43 UTC - in response to Message 898. They're not vectorizable - the algorithm requires 64x64-bit multiplications. But if you can optimize this GPU code, do a pull request.

SoNic1967 Joined: 8 Sep 18 Posts: 13 Credit: 23,954,022 RAC: 0	Message 900 - Posted: 10 Sep 2018, 8:50:56 UTC How about this discussion, is it applicable? https://stackoverflow.com/questions/28886467/multiplication-of-32-bits-numbers-in-c

Sergei Chernykh Project administrator Project developer Joined: 5 Jan 17 Posts: 548 Credit: 72,451,573 RAC: 0	Message 901 - Posted: 10 Sep 2018, 10:31:37 UTC - in response to Message 900. It doesn't make sense to rewrite a 64-bit multiplication as four 32-bit multiplications, because that is what the OpenCL compiler already does for GPUs with weak FP double performance.

SoNic1967 Joined: 8 Sep 18 Posts: 13 Credit: 23,954,022 RAC: 0	Message 902 - Posted: 10 Sep 2018, 14:15:20 UTC - in response to Message 901. Hmm. I don't think the compiler does that. The results from my first post suggest to me that it is pushing 64-bit operations to all GPUs. Otherwise the discrepancy would be smaller than it is now; I was expecting something more like 3-4x. But I have no experience in this area.

Sergei Chernykh Project administrator Project developer Joined: 5 Jan 17 Posts: 548 Credit: 72,451,573 RAC: 0	Message 903 - Posted: 10 Sep 2018, 15:24:35 UTC - in response to Message 902. It's more than 3-4x because you need to: 1) do 4 multiplications; 2) add the 4 results together, which takes 3 64-bit additions. In total, this translates to ~12 low-level 32-bit operations. I tried it before, and it doesn't improve speed.
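The two steps listed above can be sketched in plain C. This is only an illustration of the general technique, not the project's kernel code and not what any particular OpenCL compiler emits: the type `u128_parts` and the function `mul64_via_32` are hypothetical names, and the sketch assumes the full 128-bit product is wanted. It combines the four partial products with explicit carry handling, so its operations do not map one-to-one onto the count given in the post.

```c
#include <stdint.h>

/* Hypothetical helper type: a 128-bit product held as two 64-bit halves. */
typedef struct {
    uint64_t hi;
    uint64_t lo;
} u128_parts;

/* Full 64x64 -> 128-bit multiply built only from 32x32 -> 64-bit multiplies.
   Illustrative sketch; the name and signature are assumptions. */
static u128_parts mul64_via_32(uint64_t a, uint64_t b)
{
    /* Split each operand into 32-bit halves (kept in 64-bit variables). */
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    /* Step 1: four partial products, each at most 64 bits wide. */
    uint64_t p00 = a_lo * b_lo;
    uint64_t p01 = a_lo * b_hi;
    uint64_t p10 = a_hi * b_lo;
    uint64_t p11 = a_hi * b_hi;

    /* Step 2: add the partial products at their bit offsets.
       mid collects everything that lands in bits 32..63; it cannot
       overflow because it sums three values below 2^32. */
    uint64_t mid = (p00 >> 32) + (uint32_t)p01 + (uint32_t)p10;

    u128_parts r;
    r.lo = (mid << 32) | (uint32_t)p00;
    r.hi = p11 + (p01 >> 32) + (p10 >> 32) + (mid >> 32);
    return r;
}
```

For example, `mul64_via_32(3, 5)` yields `hi == 0` and `lo == 15`. On hardware with 32-bit integer units, each 64-bit addition in step 2 is itself split into an add and an add-with-carry, which is why the total operation count grows well beyond the four multiplications.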

SoNic1967 Joined: 8 Sep 18 Posts: 13 Credit: 23,954,022 RAC: 0	Message 904 - Posted: 10 Sep 2018, 17:07:50 UTC - in response to Message 903. Last modified: 10 Sep 2018, 17:10:39 UTC Interesting. Yes, 12 is a little excessive. Can't MULT_ADD be used?

[AF>Amis des Lapins] Phil1966 Joined: 24 Jan 17 Posts: 6 Credit: 247,909,594 RAC: 0	Message 917 - Posted: 16 Sep 2018, 7:22:40 UTC I don't understand the technical details, but I see that GPUs with high DP GFlops performance, such as the old HD7950/7970 or R9 280/X, are much slower than the GTX 7XX, 9XX, 10X0, ... If you look at the MilkyWay@Home stats, you will see that these old AMD GPUs are much faster than Nvidia's (except the Titan, Titan Black, and some Teslas).

SoNic1967 Joined: 8 Sep 18 Posts: 13 Credit: 23,954,022 RAC: 0	Message 921 - Posted: 18 Sep 2018, 17:47:39 UTC - in response to Message 917. Quoting Message 917: "Don't understand the tech details, but I see that GPU's with high DP Gflops perf. such as the old HD7950/7970 or R9 280/X are much slower than GTX's 7XX, 9XX, 10X0, ... If you look at MilkyWay@Home stats, you will see these old AMD GPU's are much faster than nVidia's (except Titan , Titan Black, and some Tesla)" Can you give links with those examples? That's what I did in my first post.