Work unit errors

Author	Message
AT Hiker Joined: 21 Sep 18 Posts: 20 Credit: 66,803,284 RAC: 0	Message 940 - Posted: 3 Oct 2018, 21:43:12 UTC Every day I have to suspend my BOINC work units in order to switch users. Sometimes, but not always, when the work is started again the work unit(s) immediately leave BOINC and new work is downloaded. It makes me wonder whether there is a time limit within which a work unit must make progress and finish before it goes into an error state. Does anyone have an idea what causes this and how I can stop it? Thanks.

Sergei Chernykh Project administrator Project developer Joined: 5 Jan 17 Posts: 598 Credit: 72,451,573 RAC: 0	Message 941 - Posted: 4 Oct 2018, 7:13:35 UTC I see only 3 errors out of 925 work units for your PC. The time limit for a work unit is 3 days from the time it was sent to the client.
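The 3-day limit can be illustrated with a little date arithmetic. This is only a sketch: the send time below is made up, and the helper names `deadline` and `is_expired` are hypothetical, not part of BOINC. It suggests that a pause of a few hours should not, on its own, push a task past its deadline.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the 3-day limit quoted above, counted from the send time.
TIME_LIMIT = timedelta(days=3)


def deadline(sent_at):
    """Return the latest time by which the result must be reported."""
    return sent_at + TIME_LIMIT


def is_expired(sent_at, now):
    """True if 'now' is past the deadline for a task sent at 'sent_at'."""
    return now > deadline(sent_at)


# Hypothetical send time, for illustration only.
sent = datetime(2018, 10, 3, 9, 0, tzinfo=timezone.utc)

print(deadline(sent))
print(is_expired(sent, sent + timedelta(hours=12)))            # suspended for half a day
print(is_expired(sent, sent + timedelta(days=3, seconds=1)))   # just past the limit
```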

AT Hiker Joined: 21 Sep 18 Posts: 20 Credit: 66,803,284 RAC: 0	Message 942 - Posted: 4 Oct 2018, 12:10:35 UTC - in response to Message 941. That means that it is not a timing issue.

Sergei Chernykh Project administrator Project developer Joined: 5 Jan 17 Posts: 598 Credit: 72,451,573 RAC: 0	Message 943 - Posted: 4 Oct 2018, 19:44:40 UTC Last modified: 4 Oct 2018, 19:46:41 UTC It can happen if:
- this is an old work unit which expired for someone else
- then it was sent to you
- then this "someone else" finally sends it to the server
- the server validates it and cancels all remaining "in progress" tasks (including yours)
P.S. But your 3 errors are just computing errors; they didn't happen immediately after start.
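The sequence described above can be sketched as a toy model. This is not BOINC server code: the `WorkUnit` class, its method names, and the state strings are all hypothetical, and the sketch assumes the late result is accepted as soon as it arrives.

```python
from dataclasses import dataclass, field


@dataclass
class WorkUnit:
    """Toy model of one work unit and the copies (tasks) sent to hosts."""
    name: str
    tasks: dict = field(default_factory=dict)  # host name -> task state

    def send_to(self, host):
        self.tasks[host] = "in progress"

    def expire(self, host):
        self.tasks[host] = "expired"

    def report_result(self, host):
        # Assumption: the reported result validates immediately.
        self.tasks[host] = "validated"
        # Every other copy that is still running is no longer needed.
        for other, state in list(self.tasks.items()):
            if other != host and state == "in progress":
                self.tasks[other] = "cancelled by server"


wu = WorkUnit("example_wu")
wu.send_to("someone_else")        # original recipient
wu.expire("someone_else")         # deadline passes without a result
wu.send_to("you")                 # work unit is reissued
wu.report_result("someone_else")  # late result finally arrives
print(wu.tasks)
```

In this model the task on "you" ends up cancelled even though nothing went wrong on that host, which matches the scenario in the list above.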

AT Hiker Joined: 21 Sep 18 Posts: 20 Credit: 66,803,284 RAC: 0	Message 944 - Posted: 5 Oct 2018, 0:22:03 UTC - in response to Message 943. Yes, they are listed as computation errors, but it is a little suspicious because:
- I have literally run thousands of PrimeGrid work units without a computation error, and those work units tend to stress the GPUs more than the ones that failed here. If I am wrong about the stress part, please correct me.
- All of the errors occurred immediately after restarting the work.
Something happened which might never be explained. Thanks for the reply.

vseven Joined: 15 Mar 18 Posts: 12 Credit: 587,338,410 RAC: 0	Message 947 - Posted: 10 Oct 2018, 12:15:36 UTC I've seen the same issue: a WU failing upon startup, not just in this project but in others too. But with 3 failed out of 900+, I don't know if it's worth trying to figure out.

BobMALCS Joined: 27 May 18 Posts: 2 Credit: 18,232,128 RAC: 0	Message 951 - Posted: 24 Oct 2018, 21:44:49 UTC Running Windows 10. I have now had the same problem. A work unit failed immediately upon being restarted, at 00:00:00. Obviously it is a rare occurrence, but it is still a waste of time. Looking at the stderr output, I noticed one thing.
=======
Stderr output
<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -1 (0xffffffff)</message>
<stderr_txt>
Initializing prime tables...done
c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 294: Preferences:
<project_preferences>
<max_jobs>0</max_jobs>
<max_cpus>3</max_cpus>
<kernel_size_amd>21</kernel_size_amd>
<kernel_size_nvidia>23</kernel_size_nvidia>
</project_preferences>
c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 307: Kernel size for NVIDIA GPU has been set to 23
Initializing prime tables...done
c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 294: Preferences:
<project_preferences>
<max_jobs>0</max_jobs>
<max_cpus>3</max_cpus>
<kernel_size_amd>21</kernel_size_amd>
<kernel_size_nvidia>23</kernel_size_nvidia>
</project_preferences>
c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 307: Kernel size for NVIDIA GPU has been set to 23
c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 1130: clGetEventInfo returned error -58
00:00:00 (5288): called boinc_finish(-1)
</stderr_txt>
]]>
=======
I assume that "c:\temp\" is the real name of the folder and not some indirect reference to somewhere else. If it is an indirect reference, then my following statements are not relevant. I do not have a folder named "c:\temp\". My temp folders are on another disk. If you are going to use the system temp folder, then go and look for it; do not assume where you think it should be.
It's not a good idea to use the temp folder when the task is inactive for a long period of time (12 hours in my case). You have no idea what may happen to the folder or its data in that time. Part of my system maintenance is to delete unused or not recently active files in the temp folder. In any case, should you not be using the "..\BOINC\Data\slots\" folder for a task's temporary work files? At a guess, it doesn't seem likely from the error message that this causes the error. However, BOINC is set to leave tasks in memory while they are idle, and I likely restarted the PC while the task was suspended. This could well have caused some corruption at the end of the file.
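The two numbers in the stderr output above can be decoded with a short sketch. The error names are a small subset copied from the standard OpenCL header (`CL/cl.h`); the helper names `opencl_error_name` and `as_unsigned32` are made up for illustration. It shows that -58 corresponds to `CL_INVALID_EVENT`, and that an exit code of -1 prints as 0xffffffff when read as an unsigned 32-bit value. Separately, the `opencl.cpp, line 1130` prefix looks like a source-file location recorded when the application was built, which would mean it is not a folder used at run time; that reading is an assumption and is not confirmed in the thread.

```python
# Subset of OpenCL error codes as defined in the Khronos CL/cl.h header.
OPENCL_ERRORS = {
    0: "CL_SUCCESS",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -36: "CL_INVALID_COMMAND_QUEUE",
    -57: "CL_INVALID_EVENT_WAIT_LIST",
    -58: "CL_INVALID_EVENT",
}


def opencl_error_name(code):
    """Map a numeric OpenCL status to its symbolic name, if known here."""
    return OPENCL_ERRORS.get(code, f"unknown OpenCL error {code}")


def as_unsigned32(code):
    """Reinterpret a signed exit code as an unsigned 32-bit value."""
    return code & 0xFFFFFFFF


print(opencl_error_name(-58))   # code reported by clGetEventInfo in the log
print(hex(as_unsigned32(-1)))   # how exit code -1 appears in hexadecimal
```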