Message boards : Number crunching : Work unit errors
Author | Message |
---|---|
AT Hiker Joined: 21 Sep 18 Posts: 20 Credit: 66,803,284 RAC: 0 |
Daily I have to suspend operation of BOINC work units to change users. Sometimes, but not always, when the work is started again the work unit(s) will immediately leave BOINC and new work is downloaded. It makes me wonder if there is some time limit within which a work unit must make actual progress and finish before it goes into an error state. Does anyone have an idea what causes this and how I can stop it? Thanks. |
Sergei Chernykh Project administrator Project developer Joined: 5 Jan 17 Posts: 534 Credit: 72,451,573 RAC: 0 |
I see only 3 errors out of 925 work units for your PC. The time limit for a work unit is 3 days from the time it was sent to the client. |
AT Hiker Joined: 21 Sep 18 Posts: 20 Credit: 66,803,284 RAC: 0 |
That means that it is not a timing issue. |
Sergei Chernykh Project administrator Project developer Joined: 5 Jan 17 Posts: 534 Credit: 72,451,573 RAC: 0 |
It can happen if: (1) this is an old work unit which expired for someone else; (2) then it was sent to you; (3) then this "someone else" finally sends it to the server; (4) the server validates it and cancels all remaining "in progress" tasks (including yours). P.S. But your 3 errors are just computing errors; they didn't happen immediately after start. |
AT Hiker Joined: 21 Sep 18 Posts: 20 Credit: 66,803,284 RAC: 0 |
Yes, they are listed as computation errors, but it is a little suspicious because I have literally run thousands of PrimeGrid work units without a computational error, and those work units tend to stress the GPUs more than the ones that failed here. If I am wrong about the stress part, please correct me. All of the errors occurred immediately after restarting the work. Something happened which might never be explained. Thanks for the reply. |
vseven Joined: 15 Mar 18 Posts: 12 Credit: 587,338,410 RAC: 0 |
I've seen the same issue: a WU failing upon startup. Not just in this project but in others also. But with 3 failed out of 900+ I don't know if it's worth trying to figure out. |
BobMALCS Joined: 27 May 18 Posts: 2 Credit: 18,232,128 RAC: 0 |
Running Windows 10. I have now had the same problem. A work unit failed immediately upon being restarted, at 00:00:00. Obviously it is a rare occurrence but still a waste of time. Looking at the stderr output I noticed one thing. ======= Stderr output <core_client_version>7.14.2</core_client_version> <![CDATA[ <message> (unknown error) - exit code -1 (0xffffffff)</message> <stderr_txt> Initializing prime tables...done c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 294: Preferences: <project_preferences> <max_jobs>0</max_jobs> <max_cpus>3</max_cpus> <kernel_size_amd>21</kernel_size_amd> <kernel_size_nvidia>23</kernel_size_nvidia> </project_preferences> c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 307: Kernel size for NVIDIA GPU has been set to 23 Initializing prime tables...done c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 294: Preferences: <project_preferences> <max_jobs>0</max_jobs> <max_cpus>3</max_cpus> <kernel_size_amd>21</kernel_size_amd> <kernel_size_nvidia>23</kernel_size_nvidia> </project_preferences> c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 307: Kernel size for NVIDIA GPU has been set to 23 c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 1130: clGetEventInfo returned error -58 00:00:00 (5288): called boinc_finish(-1) </stderr_txt> ]]> ======= I assume that "c:\temp\" is the real name of the folder and not some indirect reference to somewhere else. If it is an indirect reference then my following statements are not relevant. I do not have a folder named "c:\temp\". My temp folders are on another disk. If you are going to use the system temp folder then go look for it and do not assume where you think it should be. It's not a good idea to use the temp folder when the task is inactive for a long period of time; 12 hours in my case. You have no idea what may happen to the folder or data in that time period. 
Part of my system maintenance is to delete unused or not recently active files in the temp folder. In any case, should you not be using the "..\BOINC\Data\slots\" folder for a task's temporary work files? At a guess, it doesn't seem likely from the error message that this causes an error. However, BOINC is set to leave the tasks in core while they are idle. I likely restarted the PC while the task was suspended. This could well have caused some corruption at the end of the file. |
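
For anyone reading the stderr output quoted above, the two numeric codes can be decoded with a small lookup. This is only a sketch: the table is a partial subset of the status codes defined in the Khronos `cl.h` header, and the helper names `opencl_error_name` and `as_unsigned_32` are made up for illustration. They are not part of BOINC or the Amicable Numbers application. It also does not explain why the event became invalid after the restart.

```python
# Partial subset of OpenCL status codes as defined in the Khronos cl.h header.
# Only a few entries are listed; anything else falls through to "unknown".
OPENCL_ERRORS = {
    0: "CL_SUCCESS",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -36: "CL_INVALID_COMMAND_QUEUE",
    -57: "CL_INVALID_EVENT_WAIT_LIST",
    -58: "CL_INVALID_EVENT",
    -59: "CL_INVALID_OPERATION",
}


def opencl_error_name(code: int) -> str:
    """Hypothetical helper: map a numeric OpenCL status to its symbolic name."""
    return OPENCL_ERRORS.get(code, f"unknown OpenCL status {code}")


def as_unsigned_32(code: int) -> str:
    """Hypothetical helper: show a signed exit code as a 32-bit hex value."""
    return f"0x{code & 0xFFFFFFFF:08x}"


if __name__ == "__main__":
    # Status reported by clGetEventInfo in the quoted log.
    print(opencl_error_name(-58))
    # Exit status passed to boinc_finish in the quoted log.
    print(as_unsigned_32(-1))
```

Read this way, the log says that `clGetEventInfo` was handed an event object the OpenCL runtime no longer considered valid, and the application then exited with status -1, which BOINC displays as `0xffffffff`.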
©2024 Sergei Chernykh