Computation Errors!

STE\/E

Joined: 27 Mar 17
Posts: 1
Credit: 1,028,433
RAC: 0
Message 37 - Posted: 23 Apr 2017, 12:33:25 UTC

I have one box that refuses to run the WUs; it keeps giving the error below. The rest of my boxes seem to run okay, with an occasional computation error ...

<core_client_version>7.4.22</core_client_version>
<![CDATA[
<stderr_txt>
07:08:35 (13360): called boinc_finish(0)

</stderr_txt>
<message>
upload failure: <file_xfer_error>
<file_name>44_hydro_r_weighted260.txt-wu151678_2_r674292241_0</file_name>
<error_code>-161 (not found)</error_code>
</file_xfer_error>

</message>
]]>
Oleg Zaikin
Project administrator
Project developer
Project tester
Project scientist

Joined: 28 Mar 17
Posts: 107
Credit: 1,601,711
RAC: 0
Message 43 - Posted: 21 Aug 2017, 15:50:53 UTC

I have released a new application version. Please try it.
Paul

Joined: 23 Jan 17
Posts: 2
Credit: 1,238,879
RAC: 0
Message 44 - Posted: 21 Aug 2017, 16:28:15 UTC - in response to Message 43.  

Good to see some new work. I have a couple of machines that grabbed some work; we'll see how they do. Are new WUs going to be coming soon?
Oleg Zaikin
Project administrator
Project developer
Project tester
Project scientist

Joined: 28 Mar 17
Posts: 107
Credit: 1,601,711
RAC: 0
Message 46 - Posted: 21 Aug 2017, 17:54:20 UTC - in response to Message 44.  

Yes, I will add some new WUs tomorrow. I also plan to launch a large experiment with a huge number of WUs soon.
Paul

Joined: 23 Jan 17
Posts: 2
Credit: 1,238,879
RAC: 0
Message 47 - Posted: 22 Aug 2017, 20:42:00 UTC

Thanks for the quick reply, looking forward to a new batch!
adrianxw

Joined: 24 Apr 17
Posts: 23
Credit: 750,247
RAC: 4,198
Message 48 - Posted: 29 Aug 2017, 7:10:52 UTC

I also have a number of failures. Most fail in less than a second with...

-185 (0xFFFFFF47) ERR_RESULT_START

... But I also have two which crunched for a couple of seconds before crashing out...

-1073741790 (0xC0000022) Unknown error code.
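
For anyone comparing the codes above with other listings: the decimal and hexadecimal forms are the same 32-bit value, printed once as a signed integer and once as unsigned hex. A minimal Python sketch of the conversion follows; the helper names are made up for illustration and are not part of BOINC. Note that 0xC0000022 is the Windows NTSTATUS value STATUS_ACCESS_DENIED, which would be consistent with something on the host blocking the application, although nothing in this thread confirms the cause.

```python
def to_unsigned32(code: int) -> int:
    """Reinterpret a signed 32-bit exit code as an unsigned value."""
    return code & 0xFFFFFFFF


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed exit code."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def format_exit_code(code: int) -> str:
    """Format a code the way the task pages show it: decimal (hex)."""
    return f"{code} (0x{to_unsigned32(code):08X})"


# The two codes reported in this post.
for code in (-185, -1073741790):
    print(format_exit_code(code))
```

Going the other way, `to_signed32(0xC0000022)` gives back the decimal form shown on the task page.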
Oleg Zaikin
Project administrator
Project developer
Project tester
Project scientist

Joined: 28 Mar 17
Posts: 107
Credit: 1,601,711
RAC: 0
Message 49 - Posted: 29 Aug 2017, 7:50:06 UTC - in response to Message 48.  

Please give me the id of one such task.
adrianxw

Joined: 24 Apr 17
Posts: 23
Credit: 750,247
RAC: 4,198
Message 50 - Posted: 29 Aug 2017, 8:27:53 UTC
Last modified: 29 Aug 2017, 8:28:48 UTC

Sure.

This is a typical example of the first type...

Name 505_ssp_full_uniform_boinc-wu2961_0
Workunit 385641

... and this of the second type...

Name 505_ssp_full_uniform_boinc-wu2498_0
Workunit 385178

As I said, they fail almost immediately, so it is no great loss to me, but your system is wasting resources if the same error is occurring on all processor types and/or OS conditions. If you need anything else to solve the problem(s), just ask; happy to help.

Good luck.
Oleg Zaikin
Project administrator
Project developer
Project tester
Project scientist

Joined: 28 Mar 17
Posts: 107
Credit: 1,601,711
RAC: 0
Message 52 - Posted: 30 Aug 2017, 7:36:16 UTC - in response to Message 50.  
Last modified: 30 Aug 2017, 7:37:01 UTC


Thank you for this information, I will try to fix it.
adrianxw

Joined: 24 Apr 17
Posts: 23
Credit: 750,247
RAC: 4,198
Message 58 - Posted: 10 Sep 2017, 7:05:29 UTC

I notice that the work unit I mentioned of the "second type" which failed quickly here has been crunched normally by two others. The two others have 32 bit systems, mine is 64 bit. Probably insignificant but a difference certainly.
Dr Who Fan

Joined: 8 Apr 17
Posts: 5
Credit: 51,052
RAC: 189
Message 114 - Posted: 15 Apr 2018, 3:00:00 UTC

It seems I am NOT the only one who had problems downloading a task.

Error while computing: 9 of 10 users had the same error.

See Workunit 1789037

name 100_uniform_dynamic_d_2n_boinc-wu757

Oleg Zaikin
Project administrator
Project developer
Project tester
Project scientist

Joined: 28 Mar 17
Posts: 107
Credit: 1,601,711
RAC: 0
Message 115 - Posted: 15 Apr 2018, 7:09:12 UTC - in response to Message 114.  

Thank you, I'm going to check it.
adrianxw

Joined: 24 Apr 17
Posts: 23
Credit: 750,247
RAC: 4,198
Message 118 - Posted: 15 Apr 2018, 14:36:26 UTC
Last modified: 15 Apr 2018, 14:57:29 UTC

I've also had a couple of download errors.
Thyme Lawn

Joined: 14 Apr 18
Posts: 6
Credit: 109,704
RAC: 408
Message 120 - Posted: 15 Apr 2018, 16:45:08 UTC - in response to Message 118.  

I've had 15 download errors, all of them from a single work request at 02:17:15 UTC this morning which returned 43 tasks. There were no problems with the _0 and _1 tasks, but the repair tasks all failed to download. The following BOINC event log messages show this for tasks 100_uniform_dynamic_d_2n_boinc-wu1581_7 (download failed, exit status -200 and error code -200 (wrong size)) and 100_uniform_dynamic_d_2n_boinc-wu23373_1 (pending validation):

15-Apr-2018 03:17:15 [Acoustics@home] Sending scheduler request: To fetch work.
15-Apr-2018 03:17:15 [Acoustics@home] Requesting new tasks for CPU
15-Apr-2018 03:17:16 [Acoustics@home] Scheduler request completed: got 43 new tasks
15-Apr-2018 03:17:16 [Acoustics@home] [task] result state=NEW for 100_uniform_dynamic_d_2n_boinc-wu1581_7 from handle_scheduler_reply
15-Apr-2018 03:17:16 [Acoustics@home] [task] result state=NEW for 100_uniform_dynamic_d_2n_boinc-wu23373_1 from handle_scheduler_reply
15-Apr-2018 03:17:18 [Acoustics@home] Started download of input_100_uniform_dynamic_d_2n_boinc-wu1581 (467 bytes)
15-Apr-2018 03:17:19 [Acoustics@home] Incomplete read of 470.000000 < 5KB for input_100_uniform_dynamic_d_2n_boinc-wu1581 - truncating
15-Apr-2018 03:17:19 [Acoustics@home] Finished download of input_100_uniform_dynamic_d_2n_boinc-wu1581
15-Apr-2018 03:17:19 [Acoustics@home] Started download of input_100_uniform_dynamic_d_2n_boinc-wu23373 (470 bytes)
15-Apr-2018 03:17:19 [Acoustics@home] File input_100_uniform_dynamic_d_2n_boinc-wu1581 has wrong size: expected 467, got 0
15-Apr-2018 03:17:19 [Acoustics@home] Checksum or signature error for input_100_uniform_dynamic_d_2n_boinc-wu1581
15-Apr-2018 03:17:20 [Acoustics@home] Finished download of input_100_uniform_dynamic_d_2n_boinc-wu23373
15-Apr-2018 03:17:20 [Acoustics@home] [task] result state=FILES_DOWNLOADED for 100_uniform_dynamic_d_2n_boinc-wu23373_1 from CS::update_results

For all 15 of the download failures one of the initial replication tasks failed and the 8 repair tasks failed to download (the other initial task is either "Completed, can't validate", "In progress" or has also failed).
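
The failing downloads can be picked out of a longer event log by searching for the "has wrong size" message. A minimal Python sketch follows, assuming the message format is exactly as in the lines quoted above; the function name and the regular expression are illustrative and are not part of BOINC.

```python
import re

# Three event-log lines copied from the excerpt above.
LOG = """\
15-Apr-2018 03:17:18 [Acoustics@home] Started download of input_100_uniform_dynamic_d_2n_boinc-wu1581 (467 bytes)
15-Apr-2018 03:17:19 [Acoustics@home] File input_100_uniform_dynamic_d_2n_boinc-wu1581 has wrong size: expected 467, got 0
15-Apr-2018 03:17:20 [Acoustics@home] Finished download of input_100_uniform_dynamic_d_2n_boinc-wu23373
"""

# Assumed pattern, based only on the wording of the message quoted above.
WRONG_SIZE = re.compile(
    r"File (?P<name>\S+) has wrong size: "
    r"expected (?P<expected>\d+), got (?P<got>\d+)"
)


def find_wrong_size(log_text: str) -> list[tuple[str, int, int]]:
    """Return (file name, expected bytes, received bytes) per failed file."""
    failures = []
    for line in log_text.splitlines():
        match = WRONG_SIZE.search(line)
        if match:
            failures.append(
                (match["name"], int(match["expected"]), int(match["got"]))
            )
    return failures


for name, expected, got in find_wrong_size(LOG):
    print(f"{name}: expected {expected} bytes, got {got}")
```

Run against a full event log, this should list every input file that arrived with the wrong size, which makes it easier to check whether only repair tasks are affected.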

In subsequent scheduler requests I have had a number of repair tasks where download was successful. Six of them have been successfully completed, with the workunits being:

Oleg Zaikin
Project administrator
Project developer
Project tester
Project scientist

Joined: 28 Mar 17
Posts: 107
Credit: 1,601,711
RAC: 0
Message 125 - Posted: 16 Apr 2018, 4:54:08 UTC - in response to Message 120.  

I will try to find out the reasons for these bugs.
adrianxw

Joined: 24 Apr 17
Posts: 23
Credit: 750,247
RAC: 4,198
Message 126 - Posted: 16 Apr 2018, 6:55:34 UTC
Last modified: 16 Apr 2018, 7:54:09 UTC

I've seen another issue with the new units. Normally running tasks run for less than an hour, but I have an increasing number, currently 20, which run for around 7 hours before failing at 99.9xx% complete and showing 10-11 seconds remaining. Both of my machines have some of these, both are 4GHz i7's, running Windows 8.1.

>>> 197 (0x000000C5) EXIT_TIME_LIMIT_EXCEEDED

No new tasks set. I have several of these suspended at the moment.
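
Some background on this exit code: EXIT_TIME_LIMIT_EXCEEDED means the client stopped the task because its elapsed time passed a per-task limit. To the best of my understanding of the BOINC client, that limit is the workunit's rsc_fpops_bound divided by the speed (in FLOPS) that the client assumes for the application on the host. A minimal Python sketch of that arithmetic follows; the two input values are made up for illustration and are NOT this project's real settings.

```python
def max_elapsed_seconds(rsc_fpops_bound: float, flops: float) -> float:
    """Elapsed-time limit in seconds, assuming limit = fpops bound / FLOPS."""
    if flops <= 0:
        raise ValueError("flops must be positive")
    return rsc_fpops_bound / flops


# Hypothetical numbers, chosen only to show the scale of the result.
hypothetical_bound = 1.0e14   # floating-point operations allowed per task
hypothetical_flops = 4.0e9    # speed the client assumes for the application

limit = max_elapsed_seconds(hypothetical_bound, hypothetical_flops)
print(f"limit: {limit:.0f} s ({limit / 3600:.2f} h)")
```

If this reading is right, a task that stalls without finishing will always be cut off after roughly the same wall-clock time on a given host, whatever its progress bar shows.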
Oleg Zaikin
Project administrator
Project developer
Project tester
Project scientist

Joined: 28 Mar 17
Posts: 107
Credit: 1,601,711
RAC: 0
Message 130 - Posted: 16 Apr 2018, 9:57:55 UTC - in response to Message 126.  

Could you please give me the IDs of these WUs?
Profile Michael H.W. Weber

Joined: 19 Sep 17
Posts: 4
Credit: 510,678
RAC: 4,553
Message 131 - Posted: 16 Apr 2018, 11:31:58 UTC - in response to Message 126.  
Last modified: 16 Apr 2018, 11:32:50 UTC


Similar case here, but these tasks do not appear to end with a computation error. They just run for up to 9 hours on my Broadwell system without consuming any CPU. The run time shown does not correspond to their real run time. Could it be that only the time during which the CPU was actually used is counted?
I am not sure, but possibly these WUs end up in the queue to be validated?
This is the machine: http://www.acousticsathome.ru/boinc/results.php?hostid=2859&offset=0&show_names=0&state=2&appid=

Michael.
adrianxw

Joined: 24 Apr 17
Posts: 23
Credit: 750,247
RAC: 4,198
Message 132 - Posted: 16 Apr 2018, 12:53:21 UTC

Sure...
>>>
Task | Work unit | Computer | Sent | Time reported | Status | Run time (sec) | CPU time (sec) | Credit | Application
4155473 | 1884083 | 1339 | 16 Apr 2018, 4:51:42 UTC | 16 Apr 2018, 12:02:49 UTC | Error while computing | 25,758.60 | 0.00 | --- | SSPEMDD v0.17 windows_x86_64
4127382 | 1870320 | 1339 | 16 Apr 2018, 0:00:12 UTC | 16 Apr 2018, 11:20:37 UTC | Error while computing | 25,781.77 | 0.00 | --- | SSPEMDD v0.17 windows_x86_64
4070686 | 1842168 | 1339 | 15 Apr 2018, 19:57:23 UTC | 16 Apr 2018, 5:32:41 UTC | Error while computing | 25,778.84 | 0.00 | --- | SSPEMDD v0.17 windows_x86_64
4050403 | 1832265 | 1339 | 15 Apr 2018, 18:43:14 UTC | 16 Apr 2018, 4:36:40 UTC | Error while computing | 25,758.59 | 0.00 | --- | SSPEMDD v0.17 windows_x86_64
4047343 | 1823108 | 1078 | 15 Apr 2018, 17:56:05 UTC | 16 Apr 2018, 1:57:38 UTC | Error while computing | 26,173.51 | 0.00 | --- | SSPEMDD v0.17 windows_x86_64
4047007 | 1830965 | 1078 | 15 Apr 2018, 17:48:19 UTC | 16 Apr 2018, 1:57:38 UTC | Error while computing | 26,170.27 | 0.00 | --- | SSPEMDD v0.17 windows_x86_64
4046822 | 1830872 | 1078 | 15 Apr 2018, 17:42:28 UTC | 16 Apr 2018, 1:47:01 UTC | Error while computing | 26,147.49 | 0.00 | --- | SSPEMDD v0.17 windows_x86_64
4046649 | 1830786 | 1078 | 15 Apr 2018, 17:39:12 UTC | 16 Apr 2018, 1:47:01 UTC | Error while computing | 26,191.06 | 0.00 | --- | SSPEMDD v0.17 windows_x86_64
4046517 | 1830720 | 1078 | 15 Apr 2018, 17:31:30 UTC | 16 Apr 2018, 1:47:01 UTC | Error while computing | 26,170.64 | 0.00 | --- | SSPEMDD v0.17 windows_x86_64
4046120 | 1830521 | 1078 | 15 Apr 2018, 17:20:37 UTC | 16 Apr 2018, 1:47:01 UTC | Error while computing | 26,140.12 | 0.00 | --- | SSPEMDD v0.17 windows_x86_64
4046080 | 1830501 | 1078 | 15 Apr 2018, 17:18:54 UTC | 16 Apr 2018, 1:47:01 UTC | Error while computing | 26,145.89 | 3.00 | --- | SSPEMDD v0.17 windows_x86_64
4043823 | 1829391 | 1339 | 15 Apr 2018, 16:20:50 UTC | 16 Apr 2018, 4:39:48 UTC | Error while computing | 25,794.69 | 0.00 | --- | SSPEMDD v0.17 windows_x86_64
4042677 | 1828820 | 1339 | 15 Apr 2018, 15:54:56 UTC | 16 Apr 2018, 4:13:29 UTC | Error while computing | 25,751.98 | 0.00 | --- | SSPEMDD v0.17 windows_x86_64
4041574 | 1828304 | 1339 | 15 Apr 2018, 15:30:53 UTC | 16 Apr 2018, 4:10:32 UTC | Error while computing | 25,749.25 | 0.00 | --- | SSPEMDD v0.17 windows_x86_64
4036501 | 1825910 | 1339 | 15 Apr 2018, 14:10:57 UTC | 16 Apr 2018, 4:10:43 UTC | Aborted | 1,236.92 | 0.00 | --- | SSPEMDD v0.17 windows_x86_64
4036039 | 1825686 | 1339 | 15 Apr 2018, 14:00:33 UTC | 16 Apr 2018, 4:10:32 UTC | Error while computing | 25,797.48 | 0.00 | --- | SSPEMDD v0.17 windows_x86_64
4036041 | 1825687 | 1339 | 15 Apr 2018, 13:59:15 UTC | 15 Apr 2018, 21:29:26 UTC | Error while computing | 25,775.70 | 0.00 | --- | SSPEMDD v0.17 windows_x86_64
4036012 | 1825672 | 1339 | 15 Apr 2018, 13:58:03 UTC | 15 Apr 2018, 21:29:26 UTC | Error while computing | 25,804.83 | 0.00 | --- | SSPEMDD v0.17 windows_x86_64
4035967 | 1825650 | 1339 | 15 Apr 2018, 13:55:44 UTC | 15 Apr 2018, 21:08:46 UTC | Error while computing | 25,771.42 | 0.00 | --- | SSPEMDD v0.17 windows_x86_64
4035888 | 1825610 | 1339 | 15 Apr 2018, 13:53:40 UTC | 15 Apr 2018, 21:04:34 UTC | Error while computing | 25,775.67 | 0.00 | --- | SSPEMDD v0.17 windows_x86_64
4035563 | 1825448 | 1339 | 15 Apr 2018, 13:48:47 UTC | 15 Apr 2018, 21:01:56 UTC | Error while computing | 25,767.50 | 0.00 | --- | SSPEMDD v0.17 windows_x86_64
<<<

I have more if you need them.
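
The run times listed above cluster tightly per computer, which looks more like a fixed cut-off than a random crash. A minimal Python sketch that groups the failed tasks by computer ID and reports the spread follows. The numbers are copied from the list above (the one aborted task is left out); the function and variable names are illustrative only.

```python
from collections import defaultdict

# (computer ID, run time in seconds) for each "Error while computing" task.
FAILED_RUNS = [
    (1339, 25758.60), (1339, 25781.77), (1339, 25778.84), (1339, 25758.59),
    (1078, 26173.51), (1078, 26170.27), (1078, 26147.49), (1078, 26191.06),
    (1078, 26170.64), (1078, 26140.12), (1078, 26145.89),
    (1339, 25794.69), (1339, 25751.98), (1339, 25749.25), (1339, 25797.48),
    (1339, 25775.70), (1339, 25804.83), (1339, 25771.42), (1339, 25775.67),
    (1339, 25767.50),
]


def run_time_range_by_host(rows):
    """Map each computer ID to (shortest, longest) failed run time."""
    grouped = defaultdict(list)
    for host, seconds in rows:
        grouped[host].append(seconds)
    return {host: (min(times), max(times)) for host, times in grouped.items()}


for host, (shortest, longest) in sorted(run_time_range_by_host(FAILED_RUNS).items()):
    print(
        f"computer {host}: {shortest:.2f} s to {longest:.2f} s "
        f"(spread {longest - shortest:.2f} s, about {longest / 3600:.1f} h)"
    )
```

On each computer the failures fall within about a minute of each other after a little over seven hours, so each host seems to have its own slightly different limit.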
adrianxw

Joined: 24 Apr 17
Posts: 23
Credit: 750,247
RAC: 4,198
Message 133 - Posted: 16 Apr 2018, 17:17:32 UTC

A further observation. I have 9 work units on this machine which are at the 99% level. Seven of them show a run time of approximately 5:20:00 and 10-11 seconds remaining. The other two show a run time of approximately 4:40:00 and 24-25 seconds remaining. They are all suspended right now. I can release them and see if they all go to the time limit; it will waste CPU time, but then, they have already done a lot of that.