Message boards : Number crunching : Computation error
Author | Message |
---|---|
flallnatural Send message Joined: 29 Sep 18 Posts: 2 Credit: 67,507,412 RAC: 0 |
I follow a schedule on BOINC so my computer only computes when I'm not using it. The problem I'm running into, though, is that when BOINC suspends projects because of the schedule or computer use and then returns to crunching, the last Amicable Numbers task it was working on is lost, regardless of the percentage it had reached. I have other projects running on GPU and CPU and they are able to resume just fine. What's going on with Amicable Numbers? I would really appreciate some help. Thanks. |
Kellen Send message Joined: 14 Nov 17 Posts: 70 Credit: 1,000,005,236 RAC: 0 |
Hi flallnatural, When we started the new large prime search this also started happening to my computers. I have not found any way around it other than to make sure that any given task completes before I suspend BOINC work. When I want to use my computer I select "No New Tasks" for Amicable Numbers on the Projects tab in BOINC and just wait for the ones I have downloaded to finish. I run BOINC with zero buffer, so this is, at most, two tasks. As your buffer is somewhat larger, you can do the same thing, selecting No New Tasks on the Projects tab, then suspend all of the Amicable Numbers tasks that are not currently running, and wait out the one that is running. Your computer seems to be taking approximately 900 seconds to complete each task, so the most you would have to wait is 15 minutes. The perfect amount of time to make a nice cup of tea and a slice or two of toast :) I know this isn't the solution you are looking for, but I hope it helps anyway. Regards, Kellen |
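For anyone who prefers the command line, the workaround above can be sketched with `boinccmd`, the command-line tool that ships with the BOINC client. This is only a rough sketch: the project URL below is an assumption taken from the links in this thread (check the exact URL on your Projects tab), the task names are hypothetical placeholders, and the script only builds and prints the commands rather than running them.

```python
from typing import List

# Assumption: the project's master URL, guessed from links in this thread.
PROJECT_URL = "https://sech.me/boinc/Amicable/"


def drain_commands(project_url: str, queued_tasks: List[str]) -> List[List[str]]:
    """Build boinccmd invocations: stop fetching work, then suspend queued tasks."""
    # Equivalent of clicking "No New Tasks" on the Projects tab.
    commands = [["boinccmd", "--project", project_url, "nomorework"]]
    # Suspend every task that is downloaded but not currently running.
    for name in queued_tasks:
        commands.append(["boinccmd", "--task", project_url, name, "suspend"])
    return commands


def worst_case_wait_minutes(seconds_per_task: float) -> float:
    """Longest time to wait for the single running task to finish."""
    return seconds_per_task / 60.0


if __name__ == "__main__":
    # Hypothetical task names, for illustration only.
    for cmd in drain_commands(PROJECT_URL, ["task_a", "task_b"]):
        print(" ".join(cmd))
    print(worst_case_wait_minutes(900))
```

Once the running task has finished, the same tool can undo this with `allowmorework` for the project and `resume` for each suspended task.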
flallnatural Send message Joined: 29 Sep 18 Posts: 2 Credit: 67,507,412 RAC: 0 |
Thank you for your response. I'll give that a shot! |
AT Hiker Send message Joined: 21 Sep 18 Posts: 20 Credit: 66,803,284 RAC: 0 |
Obviously there is a problem in the coding of Amicable Numbers. The "fix" suggested works but you waste computing time if you run more than 1 work unit at a time, which is what I do. |
Quantum Mechanic Send message Joined: 23 Feb 19 Posts: 1 Credit: 27,345 RAC: 0 |
|
Sergei Chernykh Project administrator Project developer Send message Joined: 5 Jan 17 Posts: 518 Credit: 72,451,573 RAC: 0 |
"clEnqueueWriteBuffer returned error -5": this is the CL_OUT_OF_RESOURCES error. Try reducing the kernel size in the computing preferences: https://sech.me/boinc/Amicable/prefs.php?subset=project |
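As a quick reference for the numeric codes that show up in task logs, here is a small lookup sketch. The values are a subset of the error constants defined in the standard OpenCL header (`CL/cl.h`); the helper name `error_name` is only for illustration and is not part of the Amicable Numbers application.

```python
# Subset of the error codes defined in the OpenCL headers (CL/cl.h).
OPENCL_ERRORS = {
    0: "CL_SUCCESS",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -30: "CL_INVALID_VALUE",
    -58: "CL_INVALID_EVENT",
}


def error_name(code: int) -> str:
    """Translate a numeric OpenCL status into its symbolic name."""
    return OPENCL_ERRORS.get(code, f"unknown OpenCL error {code}")


if __name__ == "__main__":
    for code in (-5, -58):
        print(code, error_name(code))
```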
candido Send message Joined: 13 Feb 17 Posts: 1 Credit: 2,520,331 RAC: 0 |
Hi! I have a few errors (https://sech.me/boinc/Amicable/results.php?userid=2313&offset=0&show_names=0&state=6&appid=) with this one: clGetEventInfo returned error -58. Any idea what is causing the errors? Thanks! E.g.: <core_client_version>7.8.3</core_client_version> <![CDATA[ <message> (unknown error) - exit code -1 (0xffffffff)</message> <stderr_txt> Initializing prime tables...done c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 294: Preferences: <project_preferences> <allow_non_selected_apps>1</allow_non_selected_apps> <max_jobs>0</max_jobs> <max_cpus>0</max_cpus> <kernel_size_amd>21</kernel_size_amd> <kernel_size_nvidia>21</kernel_size_nvidia> </project_preferences> c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 307: Kernel size for NVIDIA GPU has been set to 21 Initializing prime tables...done c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 294: Preferences: <project_preferences> <allow_non_selected_apps>1</allow_non_selected_apps> <max_jobs>0</max_jobs> <max_cpus>0</max_cpus> <kernel_size_amd>21</kernel_size_amd> <kernel_size_nvidia>21</kernel_size_nvidia> </project_preferences> c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 307: Kernel size for NVIDIA GPU has been set to 21 c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 1130: clGetEventInfo returned error -58 21:40:53 (13408): called boinc_finish(-1) </stderr_txt> ]]> |
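When comparing several failed tasks, it can help to pull the failing OpenCL call and the kernel size settings out of the stderr text automatically. The sketch below is one rough way to do that with regular expressions; the patterns are assumptions based only on the log format pasted above, and the function names are made up for illustration.

```python
import re
from typing import Dict, Optional, Tuple

# Patterns assumed from the log format shown in this thread.
CL_ERROR_RE = re.compile(r"line (\d+): (cl\w+) returned error (-?\d+)")
KERNEL_SIZE_RE = re.compile(r"<kernel_size_(amd|nvidia)>(\d+)</kernel_size_\1>")

# Shortened sample built from lines of the log above.
SAMPLE = (
    "<kernel_size_amd>21</kernel_size_amd> "
    "<kernel_size_nvidia>21</kernel_size_nvidia> "
    r"c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, "
    "line 1130: clGetEventInfo returned error -58"
)


def find_cl_error(stderr_text: str) -> Optional[Tuple[int, str, int]]:
    """Return (source line, OpenCL call, error code) for the first failure, if any."""
    match = CL_ERROR_RE.search(stderr_text)
    if match is None:
        return None
    return int(match.group(1)), match.group(2), int(match.group(3))


def kernel_sizes(stderr_text: str) -> Dict[str, int]:
    """Return the kernel size preference per GPU vendor found in the log."""
    return {vendor: int(size) for vendor, size in KERNEL_SIZE_RE.findall(stderr_text)}


if __name__ == "__main__":
    print(find_cl_error(SAMPLE))
    print(kernel_sizes(SAMPLE))
```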
JohnMD Send message Joined: 8 Jan 18 Posts: 11 Credit: 25,123,011 RAC: 0 |
I get the same "-58" in 2 situations with an Nvidia 930M (I also have an Intel 520 for display): 1. When I close BOINC and restart. The GPU app CAN'T restart, even though it has created a checkpoint file. 2. When I switch to another user, the GPU app gets suspended. When I switch back it CAN'T resume, even though suspended tasks are 'kept in storage'. Sounds to me like something's been forgotten and the program was not properly tested. |
BobMALCS Send message Joined: 27 May 18 Posts: 2 Credit: 18,232,128 RAC: 0 |
This error is irritating. Especially as there seems to be no attempt to fix it. I'm not going to waste my time and money on it. Bye. |
marmot Send message Joined: 14 Mar 19 Posts: 9 Credit: 26,298,837 RAC: 0 |
I'm receiving error -58 on my 1060 3gb. Sometimes work units complete, other times they get errors. Is it related to how much available CPU is open to the WU? If CPU projects take up too much CPU does this error occur? Have run CUDA or OpenCL WUs for Einstein, GPUGrid, Moo!, Milkyway and Asteroids on this 1060 3GB successfully over the last week as part of a baseline testing at default values (no overclocking, just an aggressive cooling profile to assure the GPU's are under 60C and best BIOS based clocking decisions are made). Logfile: <core_client_version>7.14.2</core_client_version> <![CDATA[ <message> (unknown error) - exit code -1 (0xffffffff)</message> <stderr_txt> Initializing prime tables...done c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 294: Preferences: <project_preferences> <max_jobs>0</max_jobs> <max_cpus>2</max_cpus> <kernel_size_amd>21</kernel_size_amd> <kernel_size_nvidia>21</kernel_size_nvidia> </project_preferences> c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 307: Kernel size for NVIDIA GPU has been set to 21 Initializing prime tables...done c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 294: Preferences: <project_preferences> <max_jobs>0</max_jobs> <max_cpus>2</max_cpus> <kernel_size_amd>21</kernel_size_amd> <kernel_size_nvidia>21</kernel_size_nvidia> </project_preferences> c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 307: Kernel size for NVIDIA GPU has been set to 21 c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 1130: clGetEventInfo returned error -58 20:33:57 (1356): called boinc_finish(-1) </stderr_txt> ]]> |
marmot Send message Joined: 14 Mar 19 Posts: 9 Credit: 26,298,837 RAC: 0 |
It's almost every WU now. Kernel size is a meager 21 for a 3GB GTX, that should be easy (https://sech.me/boinc/Amicable/forum_thread.php?id=128&postid=795#795). 2 CPUs selected. Peak working set on every failed WU doesn't exceed 586 MB, or peak swap 985 MB. The valid WUs show maximum: Peak working set size 608 MB, Peak swap size 1,005 MB. Not attempting multiple WUs in app_config, and these are just default, non-overclocked WU attempts. GPU is running cool at 41C. Maybe it's the driver version 19.3.2? (Took hours to get the dual ATI/nVidia setup to work... not wanting to change drivers.) |
Sergei Chernykh Project administrator Project developer Send message Joined: 5 Jan 17 Posts: 518 Credit: 72,451,573 RAC: 0 |
I don't really know what causes error -58 (CL_INVALID_EVENT). It's triggered at this line: https://github.com/SChernykh/Amicable/blob/boinc-opencl-version-128-bit/Amicable/OpenCL.cpp#L1017 - but it's always set properly in the preceding call to clEnqueueNDRangeKernel on the last iteration of "for" loop. My guess is that OpenCL driver runs out of resources occasionally. |
marmot Send message Joined: 14 Mar 19 Posts: 9 Credit: 26,298,837 RAC: 0 |
Sergei Chernykh wrote: "I don't really know what causes error -58 (CL_INVALID_EVENT). It's triggered at this line: https://github.com/SChernykh/Amicable/blob/boinc-opencl-version-128-bit/Amicable/OpenCL.cpp#L1017 - but it's always set properly in the preceding call to clEnqueueNDRangeKernel on the last iteration of 'for' loop." It started happening on the RX 550 GPU (machine has 1 RX 550, 1 GTX 1060), so that should eliminate video drivers or hardware and point to the OS or main computing. Guessing it's OS components or non-GPU hardware configuration. The machine has 8GB RAM and 4GB swap file space. At one point, there was reported 2.1GB free RAM but the swapfile was nearly full with commits. If I decrease swap file space and get 100% error -58 failures, then increase swap space and the errors are gone, that would point to not enough real/virtual memory. Sergei Chernykh wrote: "My guess is that OpenCL driver runs out of resources occasionally." Won't get to test it till later in the week. The machine has moved onto other data gathering. |
marmot Send message Joined: 14 Mar 19 Posts: 9 Credit: 26,298,837 RAC: 0 |
The machine is ready to test Amicable again. New job load of BOINC and other running apps on barebones (shut down most all possible services) Windows 10 (Oct 2018) configuration: Total memory commits are 8.07GB 4.3GB free of 8GB RAM. Each WU has 1 free CPU available. 1 WU of sieve 23 on the 1060 3GB 1 WU of sieve 23 on the RX 550 4GB Will let you know the results. |
marmot Send message Joined: 14 Mar 19 Posts: 9 Credit: 26,298,837 RAC: 0 |
There are two WUs pending validation and four WUs with error -58: task 23580804, task 23580777, task 23580802, task 23588243. I restarted the computer in order to shut down more Windows 10 services and get it to a bare bones system (every Windows store app support service). Gained another 350MB, so the system uses ~750MB on clean boot with no user apps running. Each of the -58 computation errors occurred after the restart, when BOINC attempted to resume the Amicable Numbers WU from its save point. There wasn't a shortage of swap space or free RAM, so the original hypothesis is likely disproved. Will leave the computer alone for 2 days and see if any more error -58s occur spontaneously. |
Dingo Send message Joined: 30 Jan 17 Posts: 11 Credit: 71,461,714 RAC: 0 |
My tasks are all ending in error today since the last lot of new work. Each one processes up to the last second, then aborts with an error. This is an example: https://sech.me/boinc/Amicable/result.php?resultid=23592821 I have aborted all my tasks till this is fixed. <core_client_version>7.14.2</core_client_version> <![CDATA[ <message> (unknown error) - exit code -1 (0xffffffff)</message> <stderr_txt> Initializing prime tables...done c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 294: Preferences: <project_preferences> <max_jobs>0</max_jobs> <max_cpus>0</max_cpus> <kernel_size_amd>21</kernel_size_amd> <kernel_size_nvidia>21</kernel_size_nvidia> </project_preferences> c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 307: Kernel size for NVIDIA GPU has been set to 21 Initializing prime tables...done c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 294: Preferences: <project_preferences> <max_jobs>0</max_jobs> <max_cpus>0</max_cpus> <kernel_size_amd>21</kernel_size_amd> <kernel_size_nvidia>21</kernel_size_nvidia> </project_preferences> c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 307: Kernel size for NVIDIA GPU has been set to 21 c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 1130: clGetEventInfo returned error -58 01:52:13 (1232): called boinc_finish(-1) </stderr_txt> ]]> |
marmot Send message Joined: 14 Mar 19 Posts: 9 Credit: 26,298,837 RAC: 0 |
So the barebones Windows 10 has no screensaver, no windows defender/firewall, no antivirus, no defrag, no back ground tasks infrastructure, no tasks host, no windows update, no Windows store services and no users apps but MSI Afterburner and BOINC Manager running. Just a basic OS running BOINC and GPU fan cooling. No error -58's in 2 days. Going to stress test Amicable Number WU's by suspending in rapid succession, shutting down and restarting BOINC client multiple times, starting up 3 VM's to fill up RAM and see if I can cause some -58's. It seems to be a resume error so all the dedicated BOINC machines that never suspend the WU's are not seeing an issue. |
marmot Send message Joined: 14 Mar 19 Posts: 9 Credit: 26,298,837 RAC: 0 |
Suspending WUs repeatedly (removed from RAM) caused no issues on resume as long as boinc.exe remained in RAM. (Note: this machine just finished 2 days of WUs on both GPUs without errors.) Error -58 comes from a failed WU restart after boinc.exe shuts down and restarts. I let a WU run on each GPU, with plenty of free RAM and swap space, and a free CPU for each. Both WUs ran successfully for 30 minutes, then I performed a graceful BOINC shutdown. Upon restart, both WUs showed they were restarting from 29:xx minutes, and within 10 seconds both failed with computation error -58, independent of driver or GPU model. 1st attempt: 23667006 23666961 2nd attempt: 23661437 23666900 The rest of the visible errors were from testing the maximum number of WUs per GPU possible (nCores=Max WU, but ran out of virtual memory at 6; running a 2/GPU test overnight). |
Sergei Chernykh Project administrator Project developer Send message Joined: 5 Jan 17 Posts: 518 Credit: 72,451,573 RAC: 0 |
It looks like I fixed this error: https://github.com/SChernykh/Amicable/commit/806085804dc51e48aef2527cd18861ad3a986bc0 I'll test it some more and update GPU versions today. |
©2024 Sergei Chernykh