Aggregator

Lost in Atlas......

3 days 10 hours ago
CMS mostly seem to be working ok.
That's wrong.
Your CMS VMs are running empty tasks without any scientific value.
As said, this is because of an error in CERN's backend queue, which is not sending out any scientific jobs.
You can't do anything about it; it has to be fixed by CERN staff after their holidays.

Indicators are:
1. short runtimes
2. CMS Grafana pages:
https://lhcathome.cern.ch/lhcathome/cms_job.php

https://monit-grafana.cern.ch/d/o3dI49GMz/cms-job-monitoring-12m?viewPanel=49&orgId=11&var-group_by=CMS_JobType&var-Tier=All&var-CMS_WMTool=All&var-CMS_SubmissionTool=All&var-CMS_CampaignType=All&var-Site=T3_CH_Volunteer&var-Site=T3_CH_CMSAtHome&var-Type=All&var-CMS_JobType=All&var-CMSPrimaryDataTier=All&var-adhoc=data.RecordTime%7C%3E%7Cnow-7d&var-ScheddName=All&from=now-7d&to=now


If you want to deliver work with scientific value, switch to Theory.

This gonna be long

1 week 2 days ago
A long Herwig one.




===> [runRivet] Mon Dec 15 08:37:16 UTC 2025 [boinc pp z1j 13000 280 - herwig7 7.2.1 nlo-pw-dipole 48000 421]

After 24 hours of runtime, 217 of 760 integrations are done, and after that there are 48000 events to process. This is going to take very loooooong.

The integration part was done after about 80 hours, and the first 480 events (1%) took 4 hours, so another 17 days to go...
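
The "17 days" figure is a straight-line extrapolation from the first 1% of events. A minimal sketch of that arithmetic, assuming the per-event time stays constant (the function name is made up for illustration):

```python
def remaining_days(total_events, events_done, hours_for_done):
    """Linearly extrapolate the remaining event-loop time, in days."""
    hours_per_event = hours_for_done / events_done
    remaining_hours = (total_events - events_done) * hours_per_event
    return remaining_hours / 24.0


# Numbers from the post: 48000 events, the first 480 took 4 hours.
estimate = remaining_days(48000, 480, 4.0)
print(f"{estimate:.1f} days to go")
```

This gives about 16.5 days, which matches the "another 17 days" in the post. Herwig event times can vary, so treat it as a rough guide only.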

New 1000 event tasks

2 weeks 1 day ago
Same here. I got more than a dozen tasks cancelled after they had been running for hours (some were >50% done). Some got cancelled before they started running, and I'm fine with that.

In addition, I got tasks with validation errors, but only after a few minutes of running, so that's not as bad compared to those that had already been running for hours and then got cancelled.
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=237892957
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=237896161
https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=237891554

Xtrack (Xboinc)

2 weeks 3 days ago
Meanwhile, work on Xtrack continues: version 0.98.5 released

Does this mean that eventually there will be new work for us as well at some point? I mean it wouldn't make sense to further develop Xtrack if the project is dead.

Who knows?
Xtrack is the codebase of Xboinc, so the fact that development is not dead is good news.

Hoping for some Xboinc work, obviously.

no new WUs available

1 month ago
Yes, they seem to be running here again, Ivan... not over at -dev though.
Hi, the vccs-dev machine had been removed from the firewall. Laurence reinstated it, and I now have a job successfully running, so try again.

Events count less easily monitored: eventLoopHeartBeat.txt stays stuck.

1 month ago
Hi!
I've been away for a while. Now I see that the file eventLoopHeartBeat.txt in the [...]/boinc-client/slots/?*/PanDA_Pilot-* directory is no longer constantly updated, so it always reports "1 event read so far". It's possible to find multiple updated eventLoopHeartBeat.txt files, one for each worker, in the [...]/boinc-client/slots/?*/PanDA_Pilot-*/athenaMP-workers-EVNTtoHITS-sim/worker_?* subdirs. However, you have to sum up the event counts to get the total...

I don't think this has been done on purpose, am I wrong?
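
Summing the per-worker files could be scripted roughly as below. This is only a sketch. It assumes each worker's eventLoopHeartBeat.txt contains a line of the form "N event(s) read so far", as in the message quoted above, and that the last such line is the current count. The function names are hypothetical.

```python
import glob
import os
import re

# Assumed line format, based on the "1 event read so far" message quoted above.
EVENTS_RE = re.compile(r"(\d+)\s+events?\s+read\s+so\s+far")


def events_in_file(path):
    """Return the last event count found in one heartbeat file, or 0."""
    with open(path, "r", errors="replace") as fh:
        matches = EVENTS_RE.findall(fh.read())
    return int(matches[-1]) if matches else 0


def events_per_slot(slots_dir):
    """Sum the per-worker event counts for each BOINC slot below slots_dir."""
    pattern = os.path.join(
        slots_dir, "*", "PanDA_Pilot-*",
        "athenaMP-workers-EVNTtoHITS-sim", "worker_*",
        "eventLoopHeartBeat.txt",
    )
    totals = {}
    for path in glob.glob(pattern):
        # The first path component below slots_dir is the slot number.
        slot = os.path.relpath(path, slots_dir).split(os.sep)[0]
        totals[slot] = totals.get(slot, 0) + events_in_file(path)
    return totals
```

Call events_per_slot() with the path to your own boinc-client/slots directory. It returns one total per slot, so several ATLAS tasks running at once are kept apart.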
--
Bye, Lem

Hung Theory task?

1 month 1 week ago
There's no 'obvious error' reported back to the project.
In cases like that there is no log file from the scientific app sent back to the project.
Hence, there is nothing to analyse and the task is either marked as 'failed' or 'lost' after the due date.

Even the log snippets you posted do not clearly explain if/why the tasks got stuck.

So, how should the project decide what caused the failure?
It could be either (may be incomplete):
- hardware
- the OS
- VirtualBox
- BOINC
- vboxwrapper
- data from CVMFS
- scientific app

From the project's perspective there's only the overall task failure rate for the computer itself.
As already mentioned, for this computer it is less than 1%, covering all possible reasons.

6+ day task?

1 month 1 week ago
However, for Theory tasks, you can click on the "Graphics" button in the left-hand part of the BOINC Manager. A browser window then opens, where you click on "logs" and then on "running log". This shows the progress of the task.
This tip has been very useful for me.
Previously, I blindly aborted Theory tasks when they lasted more than 5-6 days.
Then I read your comment (thank you very much) and explored the "Show graphics" command and beyond in BOINC Manager.

Congratulations on returning this overdue valid task, and many thanks for your extended comments and images!

It was also the longest task reported on November 7th: 344.21 hours
That is: 14 days, 8 hours, 12 minutes, 33 seconds

Unfortunately someone broke your record during the last 100 tasks:

Theory Simulation 4930 6784 3.52 (0.03 - 489.41)
20 days - 9 hours - 25 minutes
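
For anyone who wants to redo these conversions, here is a small sketch that splits a decimal hour value into days, hours, minutes and seconds (the helper name is invented for illustration):

```python
def split_hours(hours):
    """Convert decimal hours into (days, hours, minutes, seconds)."""
    total_seconds = round(hours * 3600)
    days, rest = divmod(total_seconds, 86400)
    hrs, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return days, hrs, minutes, seconds


print(split_hours(344.21))  # (14, 8, 12, 36)
print(split_hours(489.41))  # (20, 9, 24, 36)
```

Both results agree with the durations quoted above to within the rounding of the two-decimal hour values.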

Theory CPU Scheduling oddness

1 month 3 weeks ago
This is a bug in VirtualBox 7.2.4.

On a computer with an AMD CPU, there's no known workaround so far.
...
After more testing...
Looks like the downgrade left the 7.2.4 kernel module on the system.
It now works after a cleanup and a fresh 7.2.2 installation (package from VirtualBox).

The kvm_amd module must remain blacklisted.
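
To check that the blacklist is in effect after a reboot, you can see whether kvm_amd shows up among the loaded kernel modules. A minimal sketch, assuming a Linux host where /proc/modules lists one loaded module per line with the module name in the first column (the helper names are hypothetical):

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names from /proc/modules-style text."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def kvm_amd_loaded(path="/proc/modules"):
    """True if kvm_amd is currently loaded (Linux only)."""
    with open(path) as fh:
        return "kvm_amd" in loaded_modules(fh.read())
```

If kvm_amd_loaded() returns True, the blacklist is not taking effect. The same helper can also show whether a VirtualBox module such as vboxdrv is loaded.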

atlas error

2 months ago
Anyone using Windows 11 should consider virtualization.
https://learn.microsoft.com/en-us/answers/questions/2186657/help-with-hyper-v-disabling-and-vt-x-enabling-for?forum=windowsclient-all&referrer=answers

Generally, we should all check our computers to see if they're still delivering anything, regardless of which operating system is installed. For example, this Linux computer has already failed almost 2,000 WUs.
https://lhcathome.cern.ch/lhcathome/show_host_detail.php?hostid=10858958
The user is TRIUMF-LCG2. If I see it correctly, they have around 70 computers, and I'm afraid they all have the same problem.
https://lhcathome.cern.ch/lhcathome/hosts_user.php?userid=567711

Hi, I'm receiving Atlas tasks that are supposed to run on 8 cores, but each task is actually using much less than 1 core.

2 months ago
I am running tests with Windows 11, WSL2, Docker and BOINC without adjusting anything except a few settings under "enable or disable Windows features". It seems to work with Xtrack tasks, but only with those, and it struggles to receive enough tasks to fill 32 threads.

With everything enabled in Windows 11 except for certain Windows features settings, it runs really fast. Have you thought about converting the software to use Docker?

I spent months trying to get VirtualBox and BOINC to work on my Windows 11 machine and it was never possible. Then I came across Docker and, surprise, following the guide was very easy.

https://github.com/BOINC/boinc/wiki/Installing-Docker

windows10 Full