Aggregator

VM Heartbeat file specified, but missing.

3 months ago
I assume most people would never see this, but I thought I would report it anyhow.

I lowered the power limit on my processor to save some energy. At some point, if you lower it too much, the VMs take so long to boot that the tasks fail with a missing heartbeat.

How long may Native-Theory-Tasks run

3 months ago

On which level?

I understand that this is a complicated environment and a lot of things are outside my understanding. As a user I do everything I can to make it run well, and for me a simple form of error handling is to abort zombie jobs automatically. It does not matter if it is a user error or an app error: if someone like me can see in the log files that it has gone wrong, it's easy to believe that the system knows it has gone wrong and should terminate. When terminated, a job has error status and it's easy to check the stderr and see what has gone wrong, most likely user error :)

Your answers are very much appreciated. The sudoers issue has now been solved; I found https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=6075&postid=48978 and I hope it's the correct one.

I can't do anything about the cgroups v1 issue, as all modern Linux distributions are using v2. It is possible to revert to v1, but that causes a lot of other issues in the system.

I don't know why BOINC has tried to pause the jobs. I don't allow any extra jobs to be downloaded, so it should not try: <work_buf_min_days>0</work_buf_min_days> \ <work_buf_additional_days>0</work_buf_additional_days>. I never pause any jobs manually; these systems run 24/7 with only LHC and no other BOINC projects.

These are the three jobs I had to cancel, in case you get a minute to check whether there is anything else I should fix:
https://lhcathome.cern.ch/lhcathome/result.php?resultid=419879823
https://lhcathome.cern.ch/lhcathome/result.php?resultid=419848816
https://lhcathome.cern.ch/lhcathome/result.php?resultid=419817476

I worked as a sysadmin and software developer 30-35 years ago and have done a lot since. I'm now retired and have this as one of my hobbies, so I'm not totally lost with computers :)

Extreme event processing times

3 months 1 week ago
I don't know if an automatic end is expected to happen after a certain period (?)
I thought that 807,403.00 seconds (more than 9 days and 8 hours) was enough time to give it a chance...

For ATLAS no automatic end is set, but using only 54 min 33 sec of CPU time in over 9 days is a clear enough sign that the task is not doing well.
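To put the numbers from this post side by side, here is a minimal Python sketch that breaks the elapsed time into days and hours and compares it with the CPU time. The 5% cut-off is purely a hypothetical value chosen for illustration, not a BOINC or LHC@home setting, and the ratio treats the task as if it ran on a single core.

```python
def split_duration(seconds):
    """Break a duration in seconds into (days, hours, minutes, seconds)."""
    total = int(seconds)
    days, rem = divmod(total, 86_400)
    hours, rem = divmod(rem, 3_600)
    minutes, secs = divmod(rem, 60)
    return days, hours, minutes, secs


def cpu_ratio(cpu_seconds, elapsed_seconds):
    """Fraction of the elapsed time during which the task used the CPU."""
    return cpu_seconds / elapsed_seconds


elapsed = 807_403.00      # run time quoted in the post, in seconds
cpu = 54 * 60 + 33        # 54 min 33 sec of CPU time, in seconds

# Hypothetical cut-off for illustration only; not a BOINC or project setting.
STALL_RATIO = 0.05

print(split_duration(elapsed))               # (9, 8, 16, 43)
print(f"{cpu_ratio(cpu, elapsed):.2%}")      # 0.41%
print(cpu_ratio(cpu, elapsed) < STALL_RATIO) # True
```

So the task was on the CPU for well under 1% of its run time, which matches the impression that it was not making progress.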

Why does server give me atlas tasks if I enable theory tasks and If no work for selected applications is available, accept work from other applications?

3 months 1 week ago
Then check the settings for the default value.

It looks like it works as expected for other computers (including mine).
As long as you are the only one experiencing/reporting that issue it's most likely not a server issue.

It is also not related to your local BOINC client since you configure it on the server rather than on the client.