GPU - Double Precision?

Message boards : Number crunching : GPU - Double Precision?


SoNic1967

Joined: 8 Sep 18
Posts: 13
Credit: 23,954,022
RAC: 0
  
Message 896 - Posted: 10 Sep 2018, 8:17:13 UTC

My RX580 and a Tesla V100 SXM2 both computed the same work unit.
The RX580 finished it in 1140 seconds; the V100 finished it in 82 seconds, 13.9x faster:
https://sech.me/boinc/Amicable/workunit.php?wuid=7557871

My card is rated at 6175 single / 386 double precision GFLOPS, and the Tesla at 15667 single / 7830 double, which makes the Tesla about 2.5x faster in single precision and about 20x faster in double precision. Since the observed 13.9x gap is much closer to the double-precision ratio, I assume that this app uses a lot of double-precision calculations.

Am I correct in this assumption?
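
A quick sketch of the arithmetic behind those ratios, using only the figures quoted in this post (the dictionary layout and variable names are just for illustration):

```python
# Run times (seconds) and rated throughput (GFLOPS) as quoted in this post.
rx580 = {"seconds": 1140, "single": 6175, "double": 386}
v100 = {"seconds": 82, "single": 15667, "double": 7830}

# How many times faster the V100 is, observed and on paper.
observed = rx580["seconds"] / v100["seconds"]
single_ratio = v100["single"] / rx580["single"]
double_ratio = v100["double"] / rx580["double"]

print(f"observed speedup:       {observed:.1f}x")
print(f"single-precision ratio: {single_ratio:.1f}x")
print(f"double-precision ratio: {double_ratio:.1f}x")
```

The observed speedup falls between the two rated ratios, well above the single-precision one.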
Sergei Chernykh
Project administrator
Project developer

Joined: 5 Jan 17
Posts: 534
Credit: 72,451,573
RAC: 0
   
Message 897 - Posted: 10 Sep 2018, 8:21:10 UTC - in response to Message 896.  
Last modified: 10 Sep 2018, 8:21:24 UTC

You can look at the GPU code here: https://github.com/SChernykh/Amicable/blob/boinc-opencl-version-128-bit/Amicable/kernel.cl

It uses 64-bit integer arithmetic, which is roughly equivalent to FP double arithmetic when it comes to performance. Yes, GPUs with high double-precision GFLOPS will be much faster.
SoNic1967

Joined: 8 Sep 18
Posts: 13
Credit: 23,954,022
RAC: 0
  
Message 898 - Posted: 10 Sep 2018, 8:32:26 UTC

So if one converted those 64-bit integer operations to 32-bit operations (a vector array?), would it gain a lot of speed on current GPUs?
Sergei Chernykh
Project administrator
Project developer

Joined: 5 Jan 17
Posts: 534
Credit: 72,451,573
RAC: 0
   
Message 899 - Posted: 10 Sep 2018, 8:40:43 UTC - in response to Message 898.  

They're not vectorizable: the algorithm requires 64x64-bit multiplications. But if you can optimize this GPU code, submit a pull request.
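
To illustrate what a 64x64-bit multiplication produces, here is a minimal Python sketch. It is not code from the kernel, and the helper names `mul_lo64` and `mul_hi64` are hypothetical. It assumes the full 128-bit result is of interest and models its two 64-bit halves, analogous to what the `*` operator and the OpenCL built-in `mul_hi` return for 64-bit unsigned operands.

```python
MASK64 = (1 << 64) - 1

def mul_lo64(a, b):
    """Low 64 bits of the product of two unsigned 64-bit values."""
    return (a * b) & MASK64

def mul_hi64(a, b):
    """High 64 bits of the 128-bit product of two unsigned 64-bit values."""
    return (a * b) >> 64

# Largest 64-bit operands: the product does not fit in 64 bits.
a = 0xFFFFFFFFFFFFFFFF
b = 0xFFFFFFFFFFFFFFFF
lo, hi = mul_lo64(a, b), mul_hi64(a, b)
print(hex(hi), hex(lo))
```

Every bit of both operands can influence the high half, which is why the product cannot simply be computed as independent narrower lanes.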
SoNic1967

Joined: 8 Sep 18
Posts: 13
Credit: 23,954,022
RAC: 0
  
Message 900 - Posted: 10 Sep 2018, 8:50:56 UTC

How about this discussion? Is it applicable?
https://stackoverflow.com/questions/28886467/multiplication-of-32-bits-numbers-in-c
Sergei Chernykh
Project administrator
Project developer

Joined: 5 Jan 17
Posts: 534
Credit: 72,451,573
RAC: 0
   
Message 901 - Posted: 10 Sep 2018, 10:31:37 UTC - in response to Message 900.  

It doesn't make sense to rewrite a 64-bit multiplication as four 32-bit multiplications, because this is what the OpenCL compiler already does for GPUs with weak FP double performance.
SoNic1967

Joined: 8 Sep 18
Posts: 13
Credit: 23,954,022
RAC: 0
  
Message 902 - Posted: 10 Sep 2018, 14:15:20 UTC - in response to Message 901.  

Hmm. I don't think the compiler does that. The results from my first post suggest to me that it is pushing 64-bit operations to all of the GPUs. Otherwise the discrepancy would be smaller than it is now; I was expecting something more like 3-4x.
But I have no experience in this area.
Sergei Chernykh
Project administrator
Project developer

Joined: 5 Jan 17
Posts: 534
Credit: 72,451,573
RAC: 0
   
Message 903 - Posted: 10 Sep 2018, 15:24:35 UTC - in response to Message 902.  

It's more than 3-4x because you need to:
1) Do 4 multiplications
2) Add 4 results together, doing 3 64-bit additions
In total, it translates to ~12 low-level 32-bit operations. I tried it before, and it doesn't improve speed.
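
The two steps above can be sketched in Python as follows. This is an illustration only, not the kernel code or the compiler's actual output, and the name `mul64_via_32` is hypothetical. Python integers never overflow, so the carries are implicit here; on 32-bit hardware each wide multiplication and addition would itself expand into several instructions.

```python
MASK32 = (1 << 32) - 1

def mul64_via_32(a, b):
    """Full product of two unsigned 64-bit values built from 32-bit halves."""
    a_lo, a_hi = a & MASK32, a >> 32
    b_lo, b_hi = b & MASK32, b >> 32

    # 1) Four 32x32 -> 64-bit multiplications (the partial products).
    p00 = a_lo * b_lo
    p01 = a_lo * b_hi
    p10 = a_hi * b_lo
    p11 = a_hi * b_hi

    # 2) Three additions combine the four partial products,
    #    each shifted to its place in the 128-bit result.
    return p00 + ((p01 + p10) << 32) + (p11 << 64)

a = 0x123456789ABCDEF0
b = 0x0FEDCBA987654321
print(mul64_via_32(a, b) == a * b)
```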
SoNic1967

Joined: 8 Sep 18
Posts: 13
Credit: 23,954,022
RAC: 0
  
Message 904 - Posted: 10 Sep 2018, 17:07:50 UTC - in response to Message 903.  
Last modified: 10 Sep 2018, 17:10:39 UTC

Interesting. Yes, 12 is a little excessive. Can't MULT_ADD be used?
[AF>Amis des Lapins] Phil1966

Joined: 24 Jan 17
Posts: 6
Credit: 247,909,594
RAC: 49
   
Message 917 - Posted: 16 Sep 2018, 7:22:40 UTC

I don't understand the technical details, but I see that GPUs with high DP GFLOPS performance, such as the old HD7950/7970 or R9 280/X, are much slower in this project than GTX 7XX, 9XX, 10X0, ... cards.
If you look at the MilkyWay@Home stats, you will see that these old AMD GPUs are much faster than NVIDIA's (except the Titan, Titan Black, and some Tesla cards).
SoNic1967

Joined: 8 Sep 18
Posts: 13
Credit: 23,954,022
RAC: 0
  
Message 921 - Posted: 18 Sep 2018, 17:47:39 UTC - in response to Message 917.  

I don't understand the technical details, but I see that GPUs with high DP GFLOPS performance, such as the old HD7950/7970 or R9 280/X, are much slower in this project than GTX 7XX, 9XX, 10X0, ... cards.
If you look at the MilkyWay@Home stats, you will see that these old AMD GPUs are much faster than NVIDIA's (except the Titan, Titan Black, and some Tesla cards).

Can you give links to those examples? That's what I did in my first post.



©2024 Sergei Chernykh