Message boards : Number crunching : GPU - Double Precision?
Author | Message |
---|---|
SoNic1967 Send message Joined: 8 Sep 18 Posts: 13 Credit: 23,954,022 RAC: 0 |
My RX580 computed the same work unit as a Tesla V100 SXM2. The RX580 finished the work unit in 1140 seconds. The V100 finished it in 82 seconds, 13.9x faster: https://sech.me/boinc/Amicable/workunit.php?wuid=7557871 Since my card is "rated" 6175 Single / 386 Double (GFlops) and the Tesla is rated 15667 Single / 7830 Double (2.5x faster in Single Precision, 20x faster in Double Precision), I assume that this app uses a lot of Double Precision calculations. Am I correct in this assumption? |
Sergei Chernykh Project administrator Project developer Send message Joined: 5 Jan 17 Posts: 534 Credit: 72,451,573 RAC: 0 |
You can look at the GPU code here: https://github.com/SChernykh/Amicable/blob/boinc-opencl-version-128-bit/Amicable/kernel.cl It uses 64-bit integer arithmetic which is kind of equivalent to FP double arithmetic when it comes to performance. Yes, GPUs with a lot of double GFlops capability will be much faster. |
SoNic1967 Send message Joined: 8 Sep 18 Posts: 13 Credit: 23,954,022 RAC: 0 |
So if one converted those 64-bit integer operations to 32-bit operations (vector array?), would it gain a lot of speed on current GPUs? |
Sergei Chernykh Project administrator Project developer Send message Joined: 5 Jan 17 Posts: 534 Credit: 72,451,573 RAC: 0 |
They're not vectorizable - the algorithm requires 64x64 bit multiplications. But if you can optimize this GPU code, do a pull request. |
SoNic1967 Send message Joined: 8 Sep 18 Posts: 13 Credit: 23,954,022 RAC: 0 |
How about this discussion, is it applicable? https://stackoverflow.com/questions/28886467/multiplication-of-32-bits-numbers-in-c |
Sergei Chernykh Project administrator Project developer Send message Joined: 5 Jan 17 Posts: 534 Credit: 72,451,573 RAC: 0 |
It doesn't make sense to rewrite a 64-bit multiplication as 4 32-bit multiplications, because this is what the OpenCL compiler already does for GPUs with weak FP double performance. |
SoNic1967 Send message Joined: 8 Sep 18 Posts: 13 Credit: 23,954,022 RAC: 0 |
Hmm. I don't think that the compiler does that. The results from the first post suggested to me that it is pushing 64-bit operations to all of the GPUs. Otherwise the discrepancy would be less than it is now; I was expecting more like 3-4x. But I have no experience in this area. |
Sergei Chernykh Project administrator Project developer Send message Joined: 5 Jan 17 Posts: 534 Credit: 72,451,573 RAC: 0 |
It's more than 3-4x because you need to: 1) do 4 multiplications; 2) add the 4 results together, doing 3 64-bit additions. In total, it translates to ~12 low-level 32-bit operations. I tried it before, it doesn't improve speed. |
SoNic1967 Send message Joined: 8 Sep 18 Posts: 13 Credit: 23,954,022 RAC: 0 |
Interesting. Yes, 12 is a little excessive. The MULT_ADD cannot be used? |
[AF>Amis des Lapins] Phil1966 Send message Joined: 24 Jan 17 Posts: 6 Credit: 247,909,594 RAC: 40 |
Don't understand the tech details, but I see that GPU's with high DP Gflops perf. such as the old HD7950/7970 or R9 280/X are much slower than GTX's 7XX, 9XX, 10X0, ... If you look at MilkyWay@Home stats, you will see these old AMD GPU's are much faster than nVidia's (except Titan, Titan Black, and some Tesla) |
SoNic1967 Send message Joined: 8 Sep 18 Posts: 13 Credit: 23,954,022 RAC: 0 |
"Don't understand the tech details, but I see that GPU's with high DP Gflops perf. such as the old HD7950/7970 or R9 280/X are much slower than GTX's 7XX, 9XX, 10X0, ..." Can you give links with those examples? That's what I did in my first post. |
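
For readers following the discussion of splitting a 64x64-bit multiplication into four 32-bit multiplications plus carry-propagating additions, here is a minimal C sketch of that decomposition. It is not taken from kernel.cl: the names `u128_parts`, `mul64_via_32` and `mul64_self_check` are hypothetical, and it assumes the full 128-bit product is wanted. On a GPU without fast 64-bit integer hardware, each 32x32 multiply and each 64-bit addition below would itself expand into several 32-bit instructions, which is roughly where the "~12 low-level operations" estimate in the thread comes from.

```c
#include <assert.h>
#include <stdint.h>

/* 128-bit product held as two 64-bit halves (illustrative type, not from kernel.cl). */
typedef struct {
    uint64_t hi;
    uint64_t lo;
} u128_parts;

/* Full 64x64 -> 128-bit multiply built only from 32x32 -> 64-bit multiplies. */
static u128_parts mul64_via_32(uint64_t a, uint64_t b)
{
    /* Split each operand into 32-bit halves, widened to 64 bits. */
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    /* Step 1: four partial products. */
    uint64_t p0 = a_lo * b_lo; /* weight 2^0  */
    uint64_t p1 = a_lo * b_hi; /* weight 2^32 */
    uint64_t p2 = a_hi * b_lo; /* weight 2^32 */
    uint64_t p3 = a_hi * b_hi; /* weight 2^64 */

    /* Step 2: sum the pieces that land in bits 32..63; the sum of three
       32-bit values cannot overflow 64 bits, and its upper half is the
       carry into the high word. */
    uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;

    u128_parts r;
    r.lo = (mid << 32) | (uint32_t)p0;
    r.hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
    return r;
}

/* Quick sanity check: the low half must match the native wrapping multiply. */
static void mul64_self_check(void)
{
    uint64_t a = 0x123456789ABCDEF0ULL, b = 0x0FEDCBA987654321ULL;
    assert(mul64_via_32(a, b).lo == a * b);
}
```

Calling `mul64_self_check()` once at startup is enough to catch a broken carry path; whether a hand-written version like this beats what the compiler already emits has to be measured per GPU.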
©2024 Sergei Chernykh