Auto-ban faulty hosts

Message boards : Number crunching : Auto-ban faulty hosts


Tex1954

Joined: 4 Feb 17
Posts: 4
Credit: 24,049,346
RAC: 0
   
Message 501 - Posted: 26 Jun 2017, 16:07:07 UTC
Last modified: 26 Jun 2017, 16:16:25 UTC

Ummm, I am just getting an older system going. It crapped out a couple of times, and I think it may already be banned or may get banned.

No tasks are showing in BOINC on my end. I did a BOINC update, and your system shows 16 errors and 16 in progress... I think it got glitched somehow...

Anyway, the host ID is 4878

Can you reset it, correct the "In Progress" data, and unban it if it is banned? It is set up with 2 GTX 580 cards (yes, old, I know) and an i7-950 on an X58 mobo.

Right now both GTX 580s are running PrimeGrid tasks flawlessly... it seems the hardware and drivers are correct now.

Thanks!

8-)
Sergei Chernykh
Project administrator
Project developer

Joined: 5 Jan 17
Posts: 447
Credit: 72,451,573
RAC: 0
   
Message 502 - Posted: 26 Jun 2017, 18:10:13 UTC
Last modified: 26 Jun 2017, 18:10:23 UTC

Host 4878 is not banned. The "Error" status is just a client error; it doesn't count. Only the "Invalid" status counts.
Tex1954

Joined: 4 Feb 17
Posts: 4
Credit: 24,049,346
RAC: 0
   
Message 503 - Posted: 26 Jun 2017, 20:49:37 UTC - in response to Message 502.  

Ohh, okay! Thanks!

8-)
Salt

Joined: 12 Dec 17
Posts: 2
Credit: 222,545,447
RAC: 0
   
Message 690 - Posted: 23 Dec 2017, 11:32:15 UTC

I would be grateful if you could remove my MacBook Pro, host #20220... It has insufficient GPU RAM to run Amicable Numbers, so until the base memory requirement goes down it is just wasting space in your statistics.
Sergei Chernykh
Project administrator
Project developer

Joined: 5 Jan 17
Posts: 447
Credit: 72,451,573
RAC: 0
   
Message 691 - Posted: 23 Dec 2017, 13:07:03 UTC - in response to Message 690.  

It can still run CPU tasks. But if you only use GPUs for Amicable Numbers, this host will be auto-removed after 3 months of inactivity.
Salt

Joined: 12 Dec 17
Posts: 2
Credit: 222,545,447
RAC: 0
   
Message 708 - Posted: 19 Jan 2018, 14:32:29 UTC - in response to Message 691.  

Thanks, Sergei - I found out how to remove it. I only do GPU Amicable tasks; Yoyo@home takes up all my CPUs.
steverocky

Joined: 14 Mar 17
Posts: 1
Credit: 21,111,379
RAC: 0
   
Message 773 - Posted: 24 Mar 2018, 17:27:23 UTC

I did 1300 good work units and 4 invalid and you banned my computer!!!!!!! BYE!!!
Sergei Chernykh
Project administrator
Project developer

Joined: 5 Jan 17
Posts: 447
Credit: 72,451,573
RAC: 0
   
Message 774 - Posted: 25 Mar 2018, 7:18:14 UTC - in response to Message 773.  

I did 1300 good work units and 4 invalid and you banned my computer!!!!!!! BYE!!!

Do you realise this ban was automatic? Unless you fix this PC, it will stay banned.
mikey

Joined: 20 Feb 17
Posts: 15
Credit: 349,985,444
RAC: 3,363
   
Message 775 - Posted: 26 Mar 2018, 23:44:24 UTC - in response to Message 774.  

I did 1300 good work units and 4 invalid and you banned my computer!!!!!!! BYE!!!


Do you realise this ban was automatic? Unless you fix this PC, it will stay banned.


You guys do realize that he's using the latest driver and the problem was:

1 error detected in the compilation of "C:\Users\steve\AppData\Local\Temp\OCL2252T1.cl".
Frontend phase failed compilation.

OpenCL.cpp, line 397: Trying to disable 'goto' and build again
01:16:06 (2252): called boinc_finish(0)

please tell me how this is a USER CAUSED ERROR? Because I too could be banned if this is a problem caused by the users!!
Sergei Chernykh
Project administrator
Project developer

Joined: 5 Jan 17
Posts: 447
Credit: 72,451,573
RAC: 0
   
Message 776 - Posted: 27 Mar 2018, 6:39:55 UTC - in response to Message 775.  
Last modified: 27 Mar 2018, 7:06:24 UTC

1 error detected in the compilation of "C:\Users\steve\AppData\Local\Temp\OCL2252T1.cl".
Frontend phase failed compilation.

OpenCL.cpp, line 397: Trying to disable 'goto' and build again
01:16:06 (2252): called boinc_finish(0)

please tell me how this is a USER CAUSED ERROR? Because I too could be banned if this is a problem caused by the users!!

That PC has this message in all tasks' logs, including a few hundred valid ones. It's normal: in this case it failed to compile because of a known issue with AMD drivers, and a workaround was applied, so it compiled and ran fine after that.

But an incorrect result was returned for a few tasks out of a few hundred valid tasks. This is a direct indication of unstable hardware. Many similar configurations run with the same message in their logs, but without problems (validation failures) on this project.

Only validation failures are counted: that is when a client didn't report any errors on exit but returned an incorrect result. This can jeopardize the whole project's target and can't be tolerated.

Only "Completed, marked as invalid" status is counted for banning.

Because I too could be banned if this is a problem caused by the users!!

No, your computers have 0 "Completed, marked as invalid" tasks.

P.S. There are over 5000 hosts that run Amicable Numbers every day. Only 3 hosts are currently banned, so please don't exaggerate this problem.
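
To make the rule above concrete, here is a minimal sketch in Python of how results could be tallied for banning. It is an illustration only, not the project's server code: the function names, the `max_invalid` threshold, and the "Valid" label are assumptions (the thread does not state the actual threshold). Only the "Completed, marked as invalid" and "Error" statuses come from the posts above.

```python
from collections import Counter

# Status labels. INVALID and ERROR are quoted from this thread;
# VALID is an assumed label used only for illustration.
INVALID = "Completed, marked as invalid"
ERROR = "Error"
VALID = "Valid"


def count_for_ban(statuses):
    """Return how many results count toward a ban.

    Client errors are ignored; only validation failures are counted.
    """
    return Counter(statuses)[INVALID]


def is_banned(statuses, max_invalid):
    """Hypothetical rule: ban once invalid results exceed max_invalid.

    The real threshold is not given in the thread, so it is a parameter.
    """
    return count_for_ban(statuses) > max_invalid
```

With this rule, a host such as 4878 with 16 "Error" results and no invalid ones would not be banned for any non-negative `max_invalid`.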
mikey

Joined: 20 Feb 17
Posts: 15
Credit: 349,985,444
RAC: 3,363
   
Message 777 - Posted: 27 Mar 2018, 13:16:39 UTC - in response to Message 776.  

1 error detected in the compilation of "C:\Users\steve\AppData\Local\Temp\OCL2252T1.cl".
Frontend phase failed compilation.

OpenCL.cpp, line 397: Trying to disable 'goto' and build again
01:16:06 (2252): called boinc_finish(0)

please tell me how this is a USER CAUSED ERROR? Because I too could be banned if this is a problem caused by the users!!

That PC has this message in all tasks' logs, including a few hundred valid ones. It's normal: in this case it failed to compile because of a known issue with AMD drivers, and a workaround was applied, so it compiled and ran fine after that.

But an incorrect result was returned for a few tasks out of a few hundred valid tasks. This is a direct indication of unstable hardware. Many similar configurations run with the same message in their logs, but without problems (validation failures) on this project.

Only validation failures are counted: that is when a client didn't report any errors on exit but returned an incorrect result. This can jeopardize the whole project's target and can't be tolerated.

Only "Completed, marked as invalid" status is counted for banning.

Because I too could be banned if this is a problem caused by the users!!


No, your computers have 0 "Completed, marked as invalid" tasks.

P.S. There are over 5000 hosts that run Amicable Numbers every day. Only 3 hosts are currently banned, so please don't exaggerate this problem.


I'm not trying to... the point is that the host did Valid (240) workunits and ONLY Invalid (2) ones, and yet it was banned. My point is that perhaps your criteria for banning are a bit too sensitive, and then to say 'well, it's automatic' isn't being genuine, because SOMEONE had to set the criteria for the 'automatic' procedure to be implemented. It's not built into the BOINC software to do that, or several other projects would have long ago banned hundreds of graphics cards, but they haven't. This is like the accounting department telling me 'the computer made a mistake'. NO, computers don't do that; they only do what they are told to do. That's why we all love them: they can do the same thing over and over and over again until either a part fails or the programming fails, but they do not randomly ban PCs or make mistakes.

I don't want to get into an argument here; it's your project and you can run it the way you like, but to start banning PCs with just 2 errors out of 242 workunits seems a bit harsh. What's the procedure for the guy to bring his PC back? In the banning process he had more workunits on his PC that he finished crunching but was NOT allowed to return them to even figure out if the problem was fixed or not; he had to abort them all and move to another project, where the graphics card is working just fine.
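
For scale, the figures quoted in this thread work out as follows. This is only arithmetic on the numbers as posted, and the helper name `invalid_rate` is made up for the example. Note that the two reports of the banned host's counts (1300 good / 4 invalid versus 240 valid / 2 invalid) disagree, and "over 5000 hosts" makes the last figure an upper bound.

```python
def invalid_rate(valid, invalid):
    """Fraction of completed results that failed validation."""
    return invalid / (valid + invalid)


# Figures as quoted in the posts above
print(f"{invalid_rate(240, 2):.2%}")   # 2 invalid out of 242   -> 0.83%
print(f"{invalid_rate(1300, 4):.2%}")  # 4 invalid out of 1304  -> 0.31%
print(f"{3 / 5000:.2%}")               # 3 banned of ~5000 hosts -> 0.06%
```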
Sergei Chernykh
Project administrator
Project developer

Joined: 5 Jan 17
Posts: 447
Credit: 72,451,573
RAC: 0
   
Message 778 - Posted: 27 Mar 2018, 14:24:20 UTC - in response to Message 777.  

to start banning PCs with just 2 errors out of 242 workunits seems a bit harsh. What's the procedure for the guy to bring his PC back? In the banning process he had more workunits on his PC that he finished crunching but was NOT allowed to return them to even figure out if the problem was fixed or not; he had to abort them all and move to another project, where the graphics card is working just fine.

It's not harsh, because extremely few computers actually get banned here. He needs to check his GPU, run stress tests, and make sure it doesn't crash, give errors, or overheat; I'll unban it then. "Working just fine" depends on the type of workload. It may be just on the edge for some projects and run fine, but cross the edge and fail sometimes on other projects.
mikey

Joined: 20 Feb 17
Posts: 15
Credit: 349,985,444
RAC: 3,363
   
Message 783 - Posted: 28 Mar 2018, 3:31:35 UTC - in response to Message 778.  

to start banning PCs with just 2 errors out of 242 workunits seems a bit harsh. What's the procedure for the guy to bring his PC back? In the banning process he had more workunits on his PC that he finished crunching but was NOT allowed to return them to even figure out if the problem was fixed or not; he had to abort them all and move to another project, where the graphics card is working just fine.


It's not harsh, because extremely few computers actually get banned here. He needs to check his GPU, run stress tests, and make sure it doesn't crash, give errors, or overheat; I'll unban it then. "Working just fine" depends on the type of workload. It may be just on the edge for some projects and run fine, but cross the edge and fail sometimes on other projects.


He's not a rookie cruncher and has been crunching for a long time, so he understands all that. But it's okay: 'lots of fish in the sea', as they say, and he's now happily crunching just fine at another project.
SoNic1967

Joined: 8 Sep 18
Posts: 13
Credit: 23,954,022
RAC: 0
  
Message 920 - Posted: 18 Sep 2018, 17:42:42 UTC
Last modified: 18 Sep 2018, 18:35:43 UTC

I am not receiving any units for: https://sech.me/boinc/Amicable/show_host_detail.php?hostid=52502
I had three nVidia errors because I was testing the kernel size, and at 23 it crashed.
Later edit: All is good now.
SoNic1967

Joined: 8 Sep 18
Posts: 13
Credit: 23,954,022
RAC: 0
  
Message 922 - Posted: 19 Sep 2018, 19:15:05 UTC

Canceled 7 nVidia GPU tasks because I removed that GPU from the system. Too slow.
M0CZY

Joined: 28 Mar 17
Posts: 1
Credit: 560,689
RAC: 0
   
Message 996 - Posted: 27 Nov 2018, 8:29:16 UTC - in response to Message 922.  

Canceled 7 nVidia GPU tasks because I removed that GPU from the system. Too slow.

I am using an Nvidia GT 610, which is one of the slowest GPUs that will work on this project.
My work units take over 5 hours each, but I don't think that is too slow. It earns about 1,100 credits per hour, which is good considering that the computer has a 12-year-old Pentium 4 processor.
Beyond

Joined: 12 Apr 17
Posts: 13
Credit: 2,227,466,572
RAC: 0
   
Message 1076 - Posted: 29 Jan 2019, 20:00:02 UTC - in response to Message 778.  
Last modified: 29 Jan 2019, 20:02:03 UTC

It's not harsh, because extremely few computers actually get banned here.

And that's a problem for the project and the rest of us who are still processing WUs. There's a mechanism in BOINC that will lower the number of WUs a bad host gets per day until it only receives 1/day. Please institute this mechanism. If the host starts producing valid WUs, its allocation will increase. Right now the situation is ridiculous, with some hosts spewing out hundreds of errors. This causes valid users to receive erroneous "Completed, can't validate" messages, all because good WUs get flagged as "Too many errors (may have bug)" when they are fine.

Example 1:

https://sech.me/boinc/Amicable/workunit.php?wuid=9905159

My machine is listed as "Completed, can't validate" because of "Too many errors (may have bug)". In reality, EVERY other machine on this workunit is one that returns almost nothing but errors.


Example 2:

https://sech.me/boinc/Amicable/workunit.php?wuid=9880120

Once again, my machine is listed as "Completed, can't validate" because of "Too many errors (may have bug)". In reality, the 6 other machines are ones that return almost nothing but errors.

This is just crazy. Please get these bad machines under control.
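
The quota mechanism requested earlier in this post could look roughly like the following Python sketch. It is an assumption-laden illustration, not BOINC's actual scheduler code: the `HostQuota` class, the decrement-by-one and doubling steps, and the ceiling value used in the example are all hypothetical. It only shows the behavior described above: a bad host drops toward 1 task per day and recovers once it returns valid work.

```python
from dataclasses import dataclass


@dataclass
class HostQuota:
    """Per-host daily task allowance (hypothetical model)."""

    max_per_day: int  # project-wide ceiling, an assumed setting
    current: int      # tasks this host may receive today

    def on_bad_result(self):
        # Shrink the allowance by one, but never below one task per day.
        self.current = max(1, self.current - 1)

    def on_valid_result(self):
        # Grow the allowance back by doubling, capped at the ceiling.
        self.current = min(self.max_per_day, self.current * 2)


# Example: a host that keeps failing bottoms out at 1 task per day,
# then climbs back once it starts returning valid results.
quota = HostQuota(max_per_day=8, current=8)
for _ in range(20):
    quota.on_bad_result()
print(quota.current)  # 1
quota.on_valid_result()
print(quota.current)  # 2
```

The appeal of this kind of scheme is that a misbehaving host can do little damage per day, yet it is never locked out and needs no manual unban.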


©2020 Sergei Chernykh