Computation error

Author	Message
flallnatural Send message Joined: 29 Sep 18 Posts: 2 Credit: 67,507,412 RAC: 0	Message 989 - Posted: 22 Nov 2018, 3:22:42 UTC I follow a schedule on BOINC so my computer only computes when I'm not using it. The problem I'm running into, though, is that when BOINC suspends projects because of the schedule or computer use and then returns to crunching, the last Amicable Numbers task it was working on is lost, regardless of the percentage it had reached. I have other projects running on GPU and CPU and they are able to resume just fine. What's going on with Amicable Numbers? I would really appreciate some help. Thanks. ID: 989 · Rating: 0 · rate: / Reply Quote

Kellen Send message Joined: 14 Nov 17 Posts: 70 Credit: 1,000,005,236 RAC: 0	Message 990 - Posted: 23 Nov 2018, 3:34:25 UTC - in response to Message 989. Hi flallnatural, When we started the new large prime search this also started happening to my computers. I have not found any way around it other than to make sure that any given task completes before I suspend BOINC work. When I want to use my computer I select "No New Tasks" for Amicable Numbers on the Projects tab in BOINC and just wait for the ones I have downloaded to finish. I run BOINC with zero buffer, so this is, at most, two tasks. As your buffer is somewhat larger, you can do the same thing, selecting No New Tasks on the Projects tab, then suspend all of the Amicable Numbers tasks that are not currently running, and wait out the one that is running. Your computer seems to be taking approximately 900 seconds to complete each task, so the most you would have to wait is 15 minutes. The perfect amount of time to make a nice cup of tea and a slice or two of toast :) I know this isn't the solution you are looking for, but I hope it helps anyway. Regards, Kellen ID: 990 · Rating: 0 · rate: / Reply Quote
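The "No New Tasks" workaround described above can also be scripted rather than clicked through in BOINC Manager. The following is a minimal sketch, assuming the `boinccmd` command-line tool is installed and on the PATH, and assuming the project's master URL is `https://sech.me/boinc/Amicable/` (the thread only shows URLs underneath that path, so check the Projects tab for the exact value). The helper names are hypothetical.

```python
import subprocess

# Assumed master URL of the project; verify it in BOINC Manager before use.
PROJECT_URL = "https://sech.me/boinc/Amicable/"


def no_new_tasks_cmd(project_url=PROJECT_URL):
    """Build the boinccmd call that stops the client fetching new work."""
    return ["boinccmd", "--project", project_url, "nomorework"]


def allow_new_tasks_cmd(project_url=PROJECT_URL):
    """Build the boinccmd call that re-enables work fetch afterwards."""
    return ["boinccmd", "--project", project_url, "allowmorework"]


def worst_case_wait_minutes(task_seconds):
    """Longest wait for the running task to finish, in minutes."""
    return task_seconds / 60


def run(cmd):
    """Execute a prepared command; raises if boinccmd reports a failure."""
    subprocess.run(cmd, check=True)
```

Calling `run(no_new_tasks_cmd())` before sitting down at the computer, and `run(allow_new_tasks_cmd())` afterwards, would mirror the manual steps. With the roughly 900-second tasks mentioned above, `worst_case_wait_minutes(900)` gives the 15-minute upper bound.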

flallnatural Send message Joined: 29 Sep 18 Posts: 2 Credit: 67,507,412 RAC: 0	Message 991 - Posted: 23 Nov 2018, 4:02:50 UTC - in response to Message 990. Thank you for your response. I'll give that a shot! ID: 991 · Rating: 0 · rate: / Reply Quote

AT Hiker Send message Joined: 21 Sep 18 Posts: 20 Credit: 66,803,284 RAC: 0	Message 992 - Posted: 23 Nov 2018, 14:26:38 UTC Obviously there is a problem in the coding of Amicable Numbers. The "fix" suggested works but you waste computing time if you run more than 1 work unit at a time, which is what I do. ID: 992 · Rating: 0 · rate: / Reply Quote

Quantum Mechanic Send message Joined: 23 Feb 19 Posts: 1 Credit: 27,345 RAC: 0	Message 1098 - Posted: 26 Feb 2019, 0:02:18 UTC https://sech.me/boinc/Amicable/result.php?resultid=22880929 (2 in a row) OK wtf is going on? ID: 1098 · Rating: 0 · rate: / Reply Quote

Sergei Chernykh Project administrator Project developer Send message Joined: 5 Jan 17 Posts: 598 Credit: 72,451,573 RAC: 0	Message 1099 - Posted: 26 Feb 2019, 8:02:29 UTC - in response to Message 1098. Last modified: 26 Feb 2019, 8:03:00 UTC clEnqueueWriteBuffer returned error -5. This is the CL_OUT_OF_RESOURCES error. Try reducing the kernel size in the computing preferences: https://sech.me/boinc/Amicable/prefs.php?subset=project ID: 1099 · Rating: 0 · rate: / Reply Quote
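For readers decoding the numeric status in a task's stderr output, the small lookup below may help. It is only a sketch: the table is a deliberately partial subset of the status codes defined in the Khronos `CL/cl.h` header, and the `describe` helper is a hypothetical name, not part of the Amicable Numbers application.

```python
# Partial subset of OpenCL status codes as defined in the Khronos CL/cl.h header.
OPENCL_STATUS = {
    0: "CL_SUCCESS",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -36: "CL_INVALID_COMMAND_QUEUE",
    -58: "CL_INVALID_EVENT",
}


def describe(code):
    """Translate a numeric OpenCL status into its symbolic name."""
    return OPENCL_STATUS.get(code, f"unknown OpenCL status {code}")


print(describe(-5))
```

Codes missing from the table fall through to the "unknown" message rather than raising, so the helper can be pointed at any log line.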

candido Send message Joined: 13 Feb 17 Posts: 1 Credit: 2,520,331 RAC: 0	Message 1100 - Posted: 28 Feb 2019, 16:13:09 UTC - in response to Message 1099. Hi! I have a few errors (https://sech.me/boinc/Amicable/results.php?userid=2313&offset=0&show_names=0&state=6&appid=) with this one: clGetEventInfo returned error -58. Any idea what is causing the errors? Thanks! E.g.:
<core_client_version>7.8.3</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -1 (0xffffffff)</message>
<stderr_txt>
Initializing prime tables...done
c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 294: Preferences:
<project_preferences>
<allow_non_selected_apps>1</allow_non_selected_apps>
<max_jobs>0</max_jobs>
<max_cpus>0</max_cpus>
<kernel_size_amd>21</kernel_size_amd>
<kernel_size_nvidia>21</kernel_size_nvidia>
</project_preferences>
c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 307: Kernel size for NVIDIA GPU has been set to 21
Initializing prime tables...done
c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 294: Preferences:
<project_preferences>
<allow_non_selected_apps>1</allow_non_selected_apps>
<max_jobs>0</max_jobs>
<max_cpus>0</max_cpus>
<kernel_size_amd>21</kernel_size_amd>
<kernel_size_nvidia>21</kernel_size_nvidia>
</project_preferences>
c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 307: Kernel size for NVIDIA GPU has been set to 21
c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 1130: clGetEventInfo returned error -58
21:40:53 (13408): called boinc_finish(-1)
</stderr_txt>
]]>
ID: 1100 · Rating: 0 · rate: / Reply Quote
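When comparing several failed results, it can help to pull the key facts out of the stderr text automatically. The sketch below assumes the line formats seen in the log above ("line N: clXxx returned error -NN" and "Kernel size for VENDOR GPU has been set to N"). The function name and the returned dictionary layout are hypothetical. Counting the "Initializing prime tables...done" banner as one application start is an inference from the logs in this thread, not something the project documents.

```python
import re

# Matches e.g. "opencl.cpp, line 1130: clGetEventInfo returned error -58"
ERROR_RE = re.compile(r"line (\d+): (cl\w+) returned error (-?\d+)")
# Matches e.g. "Kernel size for NVIDIA GPU has been set to 21"
KERNEL_RE = re.compile(r"Kernel size for (\w+) GPU has been set to (\d+)")
# Banner that appears to be printed each time the application starts (assumption).
BANNER = "Initializing prime tables...done"


def summarize_stderr(text):
    """Extract failing OpenCL calls, kernel sizes and start count from stderr."""
    errors = [(int(line), call, int(code))
              for line, call, code in ERROR_RE.findall(text)]
    kernel_sizes = {vendor: int(size)
                    for vendor, size in KERNEL_RE.findall(text)}
    return {"errors": errors,
            "kernel_sizes": kernel_sizes,
            "starts": text.count(BANNER)}


# Excerpt of the log posted above (preference dump omitted for brevity).
sample = r"""
Initializing prime tables...done
c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 307: Kernel size for NVIDIA GPU has been set to 21
Initializing prime tables...done
c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 307: Kernel size for NVIDIA GPU has been set to 21
c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 1130: clGetEventInfo returned error -58
"""

print(summarize_stderr(sample))
```

A start count above 1 together with a `clGetEventInfo` failure would suggest the task failed after being started again, not during its first run.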

JohnMD Send message Joined: 8 Jan 18 Posts: 11 Credit: 25,123,011 RAC: 0	Message 1102 - Posted: 13 Mar 2019, 1:56:16 UTC - in response to Message 1100. Last modified: 13 Mar 2019, 2:00:10 UTC I get the same "-58" in 2 situations with Nvidia 930M (I also have Intel 520 for display) 1. When I close BOINC and restart. The GPU app CAN'T restart, even though it has created a checkpoint file. 2. When I switch to another user, the GPU app gets suspended. When I switch back it CAN'T resume, even though suspended tasks are 'kept in storage'. Sounds to me like something's been forgotten and the program not properly tested. ID: 1102 · Rating: 0 · rate: / Reply Quote

BobMALCS Send message Joined: 27 May 18 Posts: 2 Credit: 18,232,128 RAC: 0	Message 1105 - Posted: 30 Mar 2019, 12:17:54 UTC This error is irritating. Especially as there seems to be no attempt to fix it. I'm not going to waste my time and money on it. Bye. ID: 1105 · Rating: 0 · rate: / Reply Quote

marmot Send message Joined: 14 Mar 19 Posts: 9 Credit: 26,298,837 RAC: 0	Message 1106 - Posted: 30 Mar 2019, 16:41:06 UTC Last modified: 30 Mar 2019, 16:51:15 UTC I'm receiving error -58 on my 1060 3gb. Sometimes work units complete, other times they get errors. Is it related to how much available CPU is open to the WU? If CPU projects take up too much CPU does this error occur? Have run CUDA or OpenCL WUs for Einstein, GPUGrid, Moo!, Milkyway and Asteroids on this 1060 3GB successfully over the last week as part of a baseline testing at default values (no overclocking, just an aggressive cooling profile to assure the GPU's are under 60C and best BIOS based clocking decisions are made). Logfile:
<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -1 (0xffffffff)</message>
<stderr_txt>
Initializing prime tables...done
c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 294: Preferences:
<project_preferences>
<max_jobs>0</max_jobs>
<max_cpus>2</max_cpus>
<kernel_size_amd>21</kernel_size_amd>
<kernel_size_nvidia>21</kernel_size_nvidia>
</project_preferences>
c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 307: Kernel size for NVIDIA GPU has been set to 21
Initializing prime tables...done
c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 294: Preferences:
<project_preferences>
<max_jobs>0</max_jobs>
<max_cpus>2</max_cpus>
<kernel_size_amd>21</kernel_size_amd>
<kernel_size_nvidia>21</kernel_size_nvidia>
</project_preferences>
c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 307: Kernel size for NVIDIA GPU has been set to 21
c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 1130: clGetEventInfo returned error -58
20:33:57 (1356): called boinc_finish(-1)
</stderr_txt>
]]>
ID: 1106 · Rating: 0 · rate: / Reply Quote

marmot Send message Joined: 14 Mar 19 Posts: 9 Credit: 26,298,837 RAC: 0	Message 1107 - Posted: 31 Mar 2019, 20:07:56 UTC Last modified: 31 Mar 2019, 20:14:37 UTC It's almost every WU now. Kernel size is a meager 21 for a 3GB GTX, which should be easy (https://sech.me/boinc/Amicable/forum_thread.php?id=128&postid=795#795). 2 CPUs selected. Peak working set on every failed WU doesn't exceed 586 MB, and peak swap doesn't exceed 985 MB. The valid WUs show maximum:
Peak working set size 608 MB
Peak swap size 1,005 MB
Not attempting multiple WUs in app_config, and these are just default, non-overclocked WU attempts. GPU is running cool at 41C. Maybe it's the driver version 19.3.2? (Took hours to get the dual ATI/nVidia setup to work... not wanting to change drivers.) ID: 1107 · Rating: 0 · rate: / Reply Quote

Sergei Chernykh Project administrator Project developer Send message Joined: 5 Jan 17 Posts: 598 Credit: 72,451,573 RAC: 0	Message 1110 - Posted: 1 Apr 2019, 11:10:34 UTC - in response to Message 1107. I don't really know what causes error -58 (CL_INVALID_EVENT). It's triggered at this line: https://github.com/SChernykh/Amicable/blob/boinc-opencl-version-128-bit/Amicable/OpenCL.cpp#L1017 - but it's always set properly in the preceding call to clEnqueueNDRangeKernel on the last iteration of "for" loop. My guess is that OpenCL driver runs out of resources occasionally. ID: 1110 · Rating: 0 · rate: / Reply Quote

marmot Send message Joined: 14 Mar 19 Posts: 9 Credit: 26,298,837 RAC: 0	Message 1111 - Posted: 1 Apr 2019, 21:00:15 UTC - in response to Message 1110. Last modified: 1 Apr 2019, 21:09:11 UTC
Sergei Chernykh wrote: I don't really know what causes error -58 (CL_INVALID_EVENT). It's triggered at this line: https://github.com/SChernykh/Amicable/blob/boinc-opencl-version-128-bit/Amicable/OpenCL.cpp#L1017 - but it's always set properly in the preceding call to clEnqueueNDRangeKernel on the last iteration of "for" loop.
It started happening on the RX 550 GPU (machine has 1 RX 550, 1 GTX 1060), so that should eliminate video drivers or hardware and point to the OS or main computing.
Guessing it's OS components or non-GPU hardware configuration. The machine has 8GB RAM and 4GB swap file space. At one point, there was reported 2.1GB free RAM but the swapfile was nearly full with commits. If I decrease swap file space and get 100% error -58 failures and increase swapspace and the errors are gone, then it would be not enough real/virtual memory.
Sergei Chernykh wrote: My guess is that OpenCL driver runs out of resources occasionally.
Won't get to test it till later in the week. The machine has moved onto other data gathering. ID: 1111 · Rating: 0 · rate: / Reply Quote

marmot Send message Joined: 14 Mar 19 Posts: 9 Credit: 26,298,837 RAC: 0	Message 1113 - Posted: 3 Apr 2019, 22:59:55 UTC - in response to Message 1111. The machine is ready to test Amicable again. New job load of BOINC and other running apps on barebones (shut down most all possible services) Windows 10 (Oct 2018) configuration:
Total memory commits are 8.07GB
4.3GB free of 8GB RAM.
Each WU has 1 free CPU available.
1 WU of sieve 23 on the 1060 3GB
1 WU of sieve 23 on the RX 550 4GB
Will let you know the results. ID: 1113 · Rating: 0 · rate: / Reply Quote

marmot Send message Joined: 14 Mar 19 Posts: 9 Credit: 26,298,837 RAC: 0	Message 1114 - Posted: 4 Apr 2019, 10:06:35 UTC - in response to Message 1113. Last modified: 4 Apr 2019, 10:09:42 UTC There are two pending validation and 4x error -58 WU's.
task 23580804
task 23580777
task 23580802
task 23588243
I restarted the computer in order to shut down more Windows 10 services and get it to bare bones system (every Windows store app support service). Gained another 350MB and so the system uses ~750MB on clean boot with no user apps running. Each of the -58 computation errors occurred after the restart and BOINC attempting to resume Amicable Numbers WU from save point. There wasn't a shortage of swap space or free RAM, so original hypothesis likely denied. Will leave the computer alone for 2 days and see if any more error -58 occur spontaneously. ID: 1114 · Rating: 0 · rate: / Reply Quote

Dingo Send message Joined: 30 Jan 17 Posts: 12 Credit: 168,439,956 RAC: 1,465	Message 1115 - Posted: 4 Apr 2019, 14:57:37 UTC My tasks have all been ending in error today since the last lot of new work. Each one processes up to the last second, then aborts with an error. This is an example: https://sech.me/boinc/Amicable/result.php?resultid=23592821 I have aborted all my tasks till this is fixed.
<core_client_version>7.14.2</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -1 (0xffffffff)</message>
<stderr_txt>
Initializing prime tables...done
c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 294: Preferences:
<project_preferences>
<max_jobs>0</max_jobs>
<max_cpus>0</max_cpus>
<kernel_size_amd>21</kernel_size_amd>
<kernel_size_nvidia>21</kernel_size_nvidia>
</project_preferences>
c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 307: Kernel size for NVIDIA GPU has been set to 21
Initializing prime tables...done
c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 294: Preferences:
<project_preferences>
<max_jobs>0</max_jobs>
<max_cpus>0</max_cpus>
<kernel_size_amd>21</kernel_size_amd>
<kernel_size_nvidia>21</kernel_size_nvidia>
</project_preferences>
c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 307: Kernel size for NVIDIA GPU has been set to 21
c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 1130: clGetEventInfo returned error -58
01:52:13 (1232): called boinc_finish(-1)
</stderr_txt>
]]>
ID: 1115 · Rating: 0 · rate: / Reply Quote

marmot Send message Joined: 14 Mar 19 Posts: 9 Credit: 26,298,837 RAC: 0	Message 1116 - Posted: 5 Apr 2019, 20:35:35 UTC - in response to Message 1114.
marmot wrote: I restarted the computer in order to shut down more Windows 10 services and get it to bare bones system (every Windows store app support service). Gained another 350MB and so the system uses ~750MB on clean boot with no user apps running. Will leave the computer alone for 2 days and see if any more error -58 occur spontaneously.
So the barebones Windows 10 has no screensaver, no windows defender/firewall, no antivirus, no defrag, no back ground tasks infrastructure, no tasks host, no windows update, no Windows store services and no users apps but MSI Afterburner and BOINC Manager running. Just a basic OS running BOINC and GPU fan cooling. No error -58's in 2 days. Going to stress test Amicable Number WU's by suspending in rapid succession, shutting down and restarting BOINC client multiple times, starting up 3 VM's to fill up RAM and see if I can cause some -58's. It seems to be a resume error so all the dedicated BOINC machines that never suspend the WU's are not seeing an issue. ID: 1116 · Rating: 0 · rate: / Reply Quote

marmot Send message Joined: 14 Mar 19 Posts: 9 Credit: 26,298,837 RAC: 0	Message 1118 - Posted: 8 Apr 2019, 3:25:59 UTC - in response to Message 1116. Last modified: 8 Apr 2019, 3:26:35 UTC
marmot wrote: It seems to be a resume error so all the dedicated BOINC machines that never suspend the WU's are not seeing an issue.
Suspending WU's repeatedly (removed from RAM) caused no issues on resume as long as boinc.exe remained in RAM. (Note: this machine just finished 2 days of WU's on both GPU's without errors.) Error -58 comes from a failed WU restart after boinc.exe shuts down and restarts. To test this, I let a WU run on each GPU with plenty of free RAM and swap space and a free CPU for each. Both WUs ran successfully for 30 minutes, then I performed a graceful BOINC shutdown. Upon restart, both WUs showed they were restarting from 29:xx minutes, and within 10 seconds both failed with computation error -58, independent of driver or GPU model.
1st attempt: 23667006 23666961
2nd attempt: 23661437 23666900
The rest of the visible errors were from testing the maximum number of WUs possible per GPU. (nCores=Max WU, but ran out of virtual memory at 6; running a 2/GPU test overnight.) ID: 1118 · Rating: 0 · rate: / Reply Quote

Sergei Chernykh Project administrator Project developer Send message Joined: 5 Jan 17 Posts: 598 Credit: 72,451,573 RAC: 0	Message 1119 - Posted: 8 Apr 2019, 8:20:32 UTC Last modified: 8 Apr 2019, 8:21:01 UTC It looks like I fixed this error: https://github.com/SChernykh/Amicable/commit/806085804dc51e48aef2527cd18861ad3a986bc0 I'll test it some more and update GPU versions today. ID: 1119 · Rating: 0 · rate: / Reply Quote