All work is ending in Error

Message boards : Number crunching : All work is ending in Error


Dingo
Joined: 30 Jan 17
Posts: 11
Credit: 71,598,438
RAC: 1,146
   
Message 905 - Posted: 12 Sep 2018, 15:26:55 UTC
Last modified: 12 Sep 2018, 15:28:36 UTC

Almost all my tasks on both Windows and Linux are ending in error:

Here is an example: https://sech.me/boinc/Amicable/result.php?resultid=16886548

This is the output:

Task 16886548
Name	amicable_10_20_27977_1536444902.738474_524_1
Workunit	7600724
Created	9 Sep 2018, 2:57:59 UTC
Sent	12 Sep 2018, 15:19:24 UTC
Report deadline	15 Sep 2018, 15:19:24 UTC
Received	12 Sep 2018, 15:21:04 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	-1073741819 (0xC0000005) STATUS_ACCESS_VIOLATION
Computer ID	45990
Run time	
CPU time	
Validate state	Invalid
Credit	0.00
Device peak FLOPS	31.47 GFLOPS
Application version	Amicable Numbers up to 10^20 v2.16 (mt)
windows_x86_64
Stderr output
<core_client_version>7.12.1</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -1073741819 (0xc0000005)</message>
<stderr_txt>


Unhandled Exception Detected...

- Unhandled Exception Record -


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00007FF737059BC7 read attempt to address 0x00003190

Engaging BOINC Windows Runtime Debugger...



Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00007FF737059BC7 read attempt to address 0x00003190

Engaging BOINC Windows Runtime Debugger...



Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00007FF737059BC7 read attempt to address 0x00003190

Engaging BOINC Windows Runtime Debugger...



Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00007FF737059BC7 read attempt to address 0x00003190

Engaging BOINC Windows Runtime Debugger...



Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00007FF737059BC7 read attempt to address 0x00003190

Engaging BOINC Windows Runtime Debugger...



********************


BOINC Windows Runtime Debugger Version 7.9.0


Dump Timestamp    : 09/13/18 01:19:24
Install Directory : C:\Program Files\BOINC\
Data Directory    : C:\ProgramData\BOINC
Project Symstore  : 
Reason: Access Violation (0xc0000005) at address 0x00007FF737059BC7 read attempt to address 0x00003190

Engaging BOINC Windows Runtime Debugger...

LoadLibraryA( C:\ProgramData\BOINC\dbghelp.dll ): GetLastError = 126


Unhandled Exception Detected...

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00007FF737059BC7 read attempt to address 0x00003190

Engaging BOINC Windows Runtime Debugger...

Loaded Library    : dbghelp.dll
LoadLibraryA( C:\ProgramData\BOINC\symsrv.dll ): GetLastError = 126
LoadLibraryA( symsrv.dll ): GetLastError = 126
LoadLibraryA( C:\ProgramData\BOINC\srcsrv.dll ): GetLastError = 126
LoadLibraryA( srcsrv.dll ): GetLastError = 126
LoadLibraryA( C:\ProgramData\BOINC\version.dll ): GetLastError = 126
Loaded Library    : version.dll
Debugger Engine   : 4.0.5.0
Symbol Search Path: C:\ProgramData\BOINC\slots\3;C:\ProgramData\BOINC\projects\sech.me_boinc_Amicable


ModLoad: 0000000037050000 0000000000d67000 C:\ProgramData\BOINC\projects\sech.me_boinc_Amicable\Amicable_v_2_16.exe (-nosymbols- Symbols Loaded)
    Linked PDB Filename   : C:\Temp\Amicable-boinc-version-128-bit\x64\Release\Amicable.pdb

ModLoad: 0000000022330000 00000000001e1000 C:\WINDOWS\SYSTEM32\ntdll.dll (6.2.17134.254) (-exported- Symbols Loaded)
    Linked PDB Filename   : ntdll.pdb
    File Version          : 10.0.17134.228 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.17134.228

ModLoad: 0000000021130000 00000000000b2000 C:\WINDOWS\System32\KERNEL32.DLL (6.2.17134.1) (-exported- Symbols Loaded)
    Linked PDB Filename   : kernel32.pdb
    File Version          : 10.0.17134.228 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.17134.228

ModLoad: 000000001eb50000 0000000000273000 C:\WINDOWS\System32\KERNELBASE.dll (6.2.17134.165) (-exported- Symbols Loaded)
    Linked PDB Filename   : kernelbase.pdb
    File Version          : 10.0.17134.228 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.17134.228

ModLoad: 0000000020fa0000 0000000000190000 C:\WINDOWS\System32\USER32.dll (6.2.17134.1) (-exported- Symbols Loaded)
    Linked PDB Filename   : user32.pdb
    File Version          : 10.0.17134.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.17134.1

ModLoad: 000000001f5b0000 0000000000020000 C:\WINDOWS\System32\win32u.dll (6.2.17134.1) (-exported- Symbols Loaded)
    Linked PDB Filename   : win32u.pdb
    File Version          : 10.0.17134.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.17134.1

ModLoad: 0000000021400000 0000000000028000 C:\WINDOWS\System32\GDI32.dll (6.2.17134.1) (-exported- Symbols Loaded)
    Linked PDB Filename   : gdi32.pdb
    File Version          : 10.0.17134.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.17134.1

ModLoad: 000000001e9b0000 0000000000192000 C:\WINDOWS\System32\gdi32full.dll (6.2.17134.112) (-exported- Symbols Loaded)
    Linked PDB Filename   : gdi32full.pdb
    File Version          : 10.0.17134.112 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.17134.112

ModLoad: 000000001e8b0000 000000000009f000 C:\WINDOWS\System32\msvcp_win.dll (6.2.17134.137) (-exported- Symbols Loaded)
    Linked PDB Filename   : msvcp_win.pdb
    File Version          : 10.0.17134.137 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.17134.137

ModLoad: 000000001e7b0000 00000000000fa000 C:\WINDOWS\System32\ucrtbase.dll (6.2.17134.254) (-exported- Symbols Loaded)
    Linked PDB Filename   : ucrtbase.pdb
    File Version          : 10.0.17134.254 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.17134.254

ModLoad: 0000000021820000 00000000000a1000 C:\WINDOWS\System32\ADVAPI32.dll (6.2.17134.1) (-exported- Symbols Loaded)
    Linked PDB Filename   : advapi32.pdb
    File Version          : 10.0.17134.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.17134.1

ModLoad: 0000000022250000 000000000009e000 C:\WINDOWS\System32\msvcrt.dll (7.0.17134.1) (-exported- Symbols Loaded)
    Linked PDB Filename   : msvcrt.pdb
    File Version          : 7.0.17134.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 7.0.17134.1

ModLoad: 0000000021c00000 000000000005b000 C:\WINDOWS\System32\sechost.dll (6.2.17134.1) (-exported- Symbols Loaded)
    Linked PDB Filename   : sechost.pdb
    File Version          : 10.0.17134.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.17134.1

ModLoad: 0000000020e70000 0000000000124000 C:\WINDOWS\System32\RPCRT4.dll (6.2.17134.112) (-exported- Symbols Loaded)
    Linked PDB Filename   : rpcrt4.pdb
    File Version          : 10.0.17134.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.17134.1

ModLoad: 0000000021dc0000 000000000002d000 C:\WINDOWS\System32\IMM32.DLL (6.2.17134.1) (-exported- Symbols Loaded)
    Linked PDB Filename   : imm32.pdb
    File Version          : 10.0.17134.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.17134.1

ModLoad: 000000001e6e0000 0000000000011000 C:\WINDOWS\System32\kernel.appcore.dll (6.2.17134.112) (-exported- Symbols Loaded)
    Linked PDB Filename   : Kernel.Appcore.pdb
    File Version          : 10.0.17134.112 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.17134.112

ModLoad: 0000000019f30000 00000000001c9000 C:\WINDOWS\SYSTEM32\dbghelp.dll (6.2.17134.1) (-exported- Symbols Loaded)
    Linked PDB Filename   : dbghelp.pdb
    File Version          : 10.0.17134.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.17134.1

ModLoad: 000000001ab60000 000000000000a000 C:\WINDOWS\SYSTEM32\version.dll (6.2.17134.1) (-exported- Symbols Loaded)
    Linked PDB Filename   : version.pdb
    File Version          : 10.0.17134.1 (WinBuild.160101.0800)
    Company Name          : Microsoft Corporation
    Product Name          : Microsoft® Windows® Operating System
    Product Version       : 10.0.17134.1



*** Dump of the Process Statistics: ***

- I/O Operations Counters -
Read: 7, Write: 577, Other 69

- I/O Transfers Counters -
Read: 17519, Write: 604, Other 6394

- Paged Pool Usage -
QuotaPagedPoolUsage: 113648, QuotaPeakPagedPoolUsage: 113648
QuotaNonPagedPoolUsage: 9104, QuotaPeakNonPagedPoolUsage: 9232

- Virtual Memory Usage -
VirtualSize: 529408000, PeakVirtualSize: 625639424

- Pagefile Usage -
PagefileUsage: 529408000, PeakPagefileUsage: 529416192

- Working Set Size -
WorkingSetSize: 98172928, PeakWorkingSetSize: 98172928, PageFaultCount: 24901

*** Dump of thread ID 18012 (state: Initialized): ***

- Information -
Status: Base Priority: Normal, Priority: Normal, , Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 0.000000

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00007FF737059BC7 read attempt to address 0x00003190


*** Dump of thread ID 11556 (state: Initialized): ***

- Information -
Status: Base Priority: Normal, Priority: Normal, , Kernel Time: 0.000000, User Time: 0.000000, Wait Time: 0.000000

- Unhandled Exception Record -
Reason: Access Violation (0xc0000005) at address 0x00007FF737059BC7 read attempt to address 0x00003190


*** Dump of thread ID 32761 (state: Initialized): ***

- Information -
Status: Base Priority: Normal, Priority: Unknown, , Kernel Time: 37.000000, User Time: 0.000000, Wait Time: 4242696448.000000


*** Dump of thread ID 30689963 (state: Unknown): ***

- Information -
Status: Base Priority: Normal, Priority: Unknown, , Kernel Time: 4294967296.000000, User Time: 21474836480.000000, Wait Time: 156250.000000


*** Dump of thread ID 5 (state: Initialized): ***

- Information -
Status: Base Priority: Normal, Priority: Unknown, , Kernel Time: 51584956.000000, User Time: 140707994009600.000000, Wait Time: 9468.000000


*** Dump of thread ID 1 (state: Initialized): ***

- Information -
Status: Base Priority: Unknown, Priority: Unknown, , Kernel Time: 131812391695417344.000000, User Time: 51584956.000000, Wait Time: 5604.000000



*** Debug Message Dump ****


*** Foreground Window Data ***
    Window Name      : 
    Window Class     : 
    Window Process ID: 0
    Window Thread ID : 0

Exiting...

</stderr_txt>
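For anyone reading the Windows dump above: the faulting address can be checked against the `ModLoad` line for `Amicable_v_2_16.exe`. The sketch below rests on an assumption that the log does not state: that the debugger printed module base addresses with their upper 32 bits dropped (which would explain why every base looks like a 32-bit value on a 64-bit system). Under that assumption the crash lands inside the Amicable executable itself, at offset 0x9BC7, rather than in a Windows DLL. The read target, 0x00003190, is a very small address, which usually suggests a null pointer dereferenced with a field or array offset added; that is an inference, not something the log says. The helper `module_offset` is hypothetical.

```python
# Hypothetical helper: locate a fault address inside a module listed by ModLoad.
# Assumption: the debugger printed module bases with their upper 32 bits
# dropped, so only the low 32 bits of the fault address are compared.

FAULT_ADDR = 0x00007FF737059BC7   # "Access Violation ... at address"
READ_ADDR = 0x00003190            # "read attempt to address"
EXE_BASE = 0x0000000037050000     # ModLoad base of Amicable_v_2_16.exe
EXE_SIZE = 0x0000000000D67000     # ModLoad image size


def module_offset(fault, base, size):
    """Return fault's offset from base if it lies in [base, base + size), else None."""
    low = fault & 0xFFFFFFFF  # keep only the low 32 bits, matching the printed base
    if base <= low < base + size:
        return low - base
    return None


if __name__ == "__main__":
    print(hex(module_offset(FAULT_ADDR, EXE_BASE, EXE_SIZE)))  # 0x9bc7
    # A read from such a small address usually means null pointer + offset.
    print(READ_ADDR)  # 12688
```

With symbols for that build (the log shows `-nosymbols-`, so the linked `Amicable.pdb` was not available to the debugger), a developer could map such an offset back to a source line.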



The Linux output is different:

<core_client_version>7.4.23</core_client_version>
<![CDATA[
<message>
process exited with code 193 (0xc1, -63)
</message>
<stderr_txt>
SIGSEGV: segmentation violation
Stack trace (10 frames):
[0x4384d0]
[0x458bb0]
[0x416f5f]
[0x414e7e]
[0x41bee3]
[0x4141dc]
[0x405ba3]
[0x49bbcf]
[0x44f095]
[0x59fecb]

Exiting...

</stderr_txt>
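Both exit codes in this post are single numbers printed two ways. On Windows, BOINC shows `-1073741819` next to `0xC0000005` (`STATUS_ACCESS_VIOLATION`): the same 32 bits read as signed and as unsigned. On Linux, `193 (0xc1, -63)` is one byte read as unsigned and as two's-complement signed. A small sketch of that conversion, with hypothetical helper names:

```python
def as_unsigned(value, bits):
    """Reinterpret an integer as an unsigned value of the given width."""
    return value & ((1 << bits) - 1)


def as_signed(value, bits):
    """Reinterpret the low `bits` bits of value as a two's-complement signed integer."""
    value = as_unsigned(value, bits)
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


if __name__ == "__main__":
    # Windows: exit status -1073741819 is NTSTATUS 0xC0000005 (access violation).
    print(hex(as_unsigned(-1073741819, 32)))  # 0xc0000005
    print(as_signed(0xC0000005, 32))          # -1073741819
    # Linux: exit code 193 is 0xc1, or -63 when read as a signed byte.
    print(hex(193), as_signed(193, 8))        # 0xc1 -63
```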

Sergei Chernykh
Project administrator
Project developer
Joined: 5 Jan 17
Posts: 534
Credit: 72,451,573
RAC: 0
   
Message 906 - Posted: 12 Sep 2018, 15:35:35 UTC

I can confirm it started crashing on the latest work units, but I don't know why yet. I'll try to fix it today.
Sergei Chernykh
Project administrator
Project developer
Joined: 5 Jan 17
Posts: 534
Credit: 72,451,573
RAC: 0
   
Message 907 - Posted: 12 Sep 2018, 15:54:25 UTC

I've found the bug and will update the CPU version later today.
Sergei Chernykh
Project administrator
Project developer
Joined: 5 Jan 17
Posts: 534
Credit: 72,451,573
RAC: 0
   
Message 908 - Posted: 12 Sep 2018, 18:26:58 UTC

It's fixed now.
Dingo
Joined: 30 Jan 17
Posts: 11
Credit: 71,598,438
RAC: 1,146
   
Message 909 - Posted: 13 Sep 2018, 4:23:17 UTC - in response to Message 908.  

I'll give it another try today.

SoNic1967
Joined: 8 Sep 18
Posts: 13
Credit: 23,954,022
RAC: 0
  
Message 910 - Posted: 13 Sep 2018, 9:15:57 UTC

Errors stopped on my PC. Good job!
XAVER
Joined: 17 Jul 18
Posts: 1
Credit: 17,999,698
RAC: 0
  
Message 911 - Posted: 13 Sep 2018, 20:56:09 UTC

GPU WUs are still ending in error.
corris
Joined: 23 Apr 17
Posts: 4
Credit: 186,151,574
RAC: 0
  
Message 912 - Posted: 13 Sep 2018, 21:26:48 UTC - in response to Message 911.  

Yep, GPU tasks on Linux have suddenly started erroring.

They were running OK, but not anymore.
Sergei Chernykh
Project administrator
Project developer
Joined: 5 Jan 17
Posts: 534
Credit: 72,451,573
RAC: 0
   
Message 913 - Posted: 13 Sep 2018, 21:55:40 UTC

Sorry, will fix it ASAP.
corris
Joined: 23 Apr 17
Posts: 4
Credit: 186,151,574
RAC: 0
  
Message 914 - Posted: 13 Sep 2018, 21:59:56 UTC - in response to Message 913.  

Not all of them, Sergei.

Maybe more than 50% over the past hour or so. They error out (NVIDIA 1060, Linux) at 18 seconds.

If you are on the case, then no worries.
Sergei Chernykh
Project administrator
Project developer
Joined: 5 Jan 17
Posts: 534
Credit: 72,451,573
RAC: 0
   
Message 915 - Posted: 13 Sep 2018, 22:06:51 UTC
Last modified: 13 Sep 2018, 22:19:30 UTC

I've updated the Windows and Linux OpenCL versions. Can you check that they run fine? The macOS version will follow soon.

P.S. The macOS OpenCL version is now updated.
Sergei Chernykh
Project administrator
Project developer
Joined: 5 Jan 17
Posts: 534
Credit: 72,451,573
RAC: 0
   
Message 916 - Posted: 13 Sep 2018, 22:39:54 UTC

It looks like it's fixed now. The new GPU versions haven't given any errors so far, and almost 100 WUs have already finished.
Dingo
Joined: 30 Jan 17
Posts: 11
Credit: 71,598,438
RAC: 1,146
   
Message 918 - Posted: 17 Sep 2018, 15:24:58 UTC

I am now getting a different error on most of my tasks:



Task 17126533
Name	amicable_10_20_8598_1537187402.097611_151_0
Workunit	7699398
Created	17 Sep 2018, 12:30:41 UTC
Sent	17 Sep 2018, 13:47:25 UTC
Report deadline	20 Sep 2018, 13:47:25 UTC
Received	17 Sep 2018, 13:51:59 UTC
Server state	Over
Outcome	Computation error
Client state	Compute error
Exit status	-1 (0xFFFFFFFF) Unknown error code
Computer ID	45990
Run time	1 min 1 sec
CPU time	1 sec
Validate state	Invalid
Credit	0.00
Device peak FLOPS	11,792.29 GFLOPS
Application version	Amicable Numbers up to 10^20 v2.17 (opencl_nvidia)
windows_x86_64
Peak working set size	151.53 MB
Peak swap size	1,413.85 MB
Peak disk usage	0.01 MB
Stderr output
<core_client_version>7.12.1</core_client_version>
<![CDATA[
<message>
(unknown error) - exit code -1 (0xffffffff)</message>
<stderr_txt>
Initializing prime tables...done
c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 294: Preferences:
<project_preferences>


<max_jobs>0</max_jobs>
<max_cpus>0</max_cpus>
<kernel_size_amd>21</kernel_size_amd>
<kernel_size_nvidia>23</kernel_size_nvidia>
</project_preferences>

c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 307: Kernel size for NVIDIA GPU has been set to 23
Initializing prime tables...done
c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 294: Preferences:
<project_preferences>


<max_jobs>0</max_jobs>
<max_cpus>0</max_cpus>
<kernel_size_amd>21</kernel_size_amd>
<kernel_size_nvidia>23</kernel_size_nvidia>
</project_preferences>

c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 307: Kernel size for NVIDIA GPU has been set to 23
c:\temp\amicable-boinc-opencl-version-128-bit\amicable\opencl.cpp, line 1130: clGetEventInfo returned error -58
23:50:43 (17116): called boinc_finish(-1)

</stderr_txt>
]]>
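The call that fails in this log is `clGetEventInfo`. OpenCL status codes are plain negative integers defined in the Khronos `CL/cl.h` header, and `-58` is `CL_INVALID_EVENT`, which `clGetEventInfo` returns when the event it was given is not a valid event object. The lookup below is a sketch covering only a handful of codes near that value (the helper name is hypothetical); it identifies the error but does not explain why the event became invalid.

```python
# Subset of OpenCL error codes, with values as defined in CL/cl.h.
OPENCL_ERRORS = {
    0: "CL_SUCCESS",
    -4: "CL_MEM_OBJECT_ALLOCATION_FAILURE",
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -36: "CL_INVALID_COMMAND_QUEUE",
    -54: "CL_INVALID_WORK_GROUP_SIZE",
    -57: "CL_INVALID_EVENT_WAIT_LIST",
    -58: "CL_INVALID_EVENT",
    -59: "CL_INVALID_OPERATION",
}


def opencl_error_name(code):
    """Map an OpenCL status code to its symbolic name (subset only)."""
    return OPENCL_ERRORS.get(code, f"unknown OpenCL error {code}")


if __name__ == "__main__":
    # From the log: "clGetEventInfo returned error -58"
    print(opencl_error_name(-58))  # CL_INVALID_EVENT
```

In a C or C++ host program the same symbolic names are available directly from `CL/cl.h`.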


Sergei Chernykh
Project administrator
Project developer
Joined: 5 Jan 17
Posts: 534
Credit: 72,451,573
RAC: 0
   
Message 919 - Posted: 17 Sep 2018, 21:46:12 UTC - in response to Message 918.  

Try reducing "Kernel size for NVIDIA GPU"; 23 might be too high.



©2024 Sergei Chernykh