Aggregator

queue is empty

2 days 3 hours ago
The ATLAS queue has been empty for a while now. A few resend tasks are still sent out every now and then. If you get one, the server seems to give you only 24 hours to finish it before it is cancelled, even if the deadline is still 7 days away. So keep your cache low and prioritise any ATLAS tasks you receive so they get crunched as soon as possible.

Failed tasks not cleaning up and exiting in reasonable time

1 week 4 days ago
Today I found two VirtualBox ATLAS tasks in a sort of zombie state with stderr containing:

2025-05-26 22:19:39 (7140): Guest Log: [INFO] Probing /cvmfs/atlas.cern.ch... OK
2025-05-26 22:19:39 (7140): Guest Log: [INFO] Detected branch: prod
2025-05-26 22:21:58 (7140): Guest Log: [DEBUG] Failed to copy ATLASJobWrapper-prod.sh
2025-05-26 22:21:58 (7140): Guest Log: [DEBUG] VM early shutdown initiated due to previous errors.
2025-05-26 22:21:58 (7140): Guest Log: [DEBUG] Cleanup will take a few minutes...
2025-05-26 23:46:54 (7140): Status Report: Elapsed Time: '6000.000000'
2025-05-26 23:46:54 (7140): Status Report: CPU Time: '31.187500'
[...]
2025-05-28 07:44:48 (7140): Status Report: Elapsed Time: '114000.000000'
2025-05-28 07:44:48 (7140): Status Report: CPU Time: '343.546875'

The other log is similar.

Cleanup seems to have failed so I will abort both.
https://lhcathome.cern.ch/lhcathome/result.php?resultid=422888662
https://lhcathome.cern.ch/lhcathome/result.php?resultid=422887835

Other ATLAS tasks have been completing successfully on the same system. Does anyone have an explanation for this behavior?
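
One way to spot this kind of zombie task without reading each log by hand is to compare the last reported elapsed time with the last reported CPU time. The following is a minimal sketch, assuming the stderr text has been saved locally and that the "Status Report" lines keep the format shown above. The function names and both thresholds (`min_elapsed`, `max_cpu_ratio`) are hypothetical choices for illustration, not values used by BOINC or vboxwrapper.

```python
import re

# Matches lines such as:
# 2025-05-26 23:46:54 (7140): Status Report: Elapsed Time: '6000.000000'
STATUS_RE = re.compile(r"Status Report: (Elapsed Time|CPU Time): '([0-9.]+)'")


def last_status(stderr_text):
    """Return the last reported (elapsed, cpu) in seconds, or None if either is missing."""
    elapsed = cpu = None
    for line in stderr_text.splitlines():
        m = STATUS_RE.search(line)
        if not m:
            continue
        if m.group(1) == "Elapsed Time":
            elapsed = float(m.group(2))
        else:
            cpu = float(m.group(2))
    if elapsed is None or cpu is None:
        return None
    return elapsed, cpu


def looks_stalled(stderr_text, min_elapsed=3600.0, max_cpu_ratio=0.01):
    """Flag a task that has run for a long time while using almost no CPU.

    Both thresholds are illustrative assumptions, not project settings.
    """
    status = last_status(stderr_text)
    if status is None:
        return False
    elapsed, cpu = status
    return elapsed >= min_elapsed and cpu / elapsed <= max_cpu_ratio


# The last two status lines from the log quoted above.
SAMPLE = """\
2025-05-28 07:44:48 (7140): Status Report: Elapsed Time: '114000.000000'
2025-05-28 07:44:48 (7140): Status Report: CPU Time: '343.546875'
"""

if __name__ == "__main__":
    print(last_status(SAMPLE), looks_stalled(SAMPLE))
```

A healthy multi-core ATLAS task would normally show CPU time close to or above the elapsed time, so a ratio well below 1% after more than a day points at a hung cleanup rather than slow progress.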

Apptainer error

2 weeks 2 days ago
You are correct that I experienced the same problem on a previous Ubuntu setup; I just didn't remember it.
I have implemented the workaround, and my Atlas_native workunit is now running correctly.

New tasks all failing?

1 month 1 week ago
Most Theory tasks take a very short time to finish (less than a minute), and the Jobs chart shows a high failure rate (~60-70%). How can I tell whether a task has actually completed successfully?

2025-05-02 11:47:07 (10908): Guest Log: Environment HTTP proxy: not set
2025-05-02 11:47:08 (10908): Guest Log: job: htmld=/var/www/lighttpd
2025-05-02 11:47:08 (10908): Guest Log: job: unpack exitcode=0
2025-05-02 11:48:40 (10908): Guest Log: job: run exitcode=1
2025-05-02 11:48:40 (10908): Guest Log: job: diskusage=5704
2025-05-02 11:48:40 (10908): Guest Log: job: logsize=4 k
2025-05-02 11:48:40 (10908): Guest Log: job: times=
2025-05-02 11:48:40 (10908): Guest Log: 0m0.002s 0m0.004s
2025-05-02 11:48:40 (10908): Guest Log: 0m0.424s 0m0.248s
2025-05-02 11:48:40 (10908): Guest Log: job: cpuusage=1
2025-05-02 11:48:40 (10908): Guest Log: Job Finished
2025-05-02 11:48:40 (10908): Guest Log: boinc_shutdown called with exit code 0
2025-05-02 11:48:40 (10908): Guest Log: sd_delay: 845
2025-05-02 11:48:40 (10908): Guest Log: ETA: 2025-05-02 10:02:44 UTC
2025-05-02 12:02:45 (10908): VM Completion File Detected.
2025-05-02 12:02:45 (10908): Powering off VM.
2025-05-02 12:02:45 (10908): Successfully stopped VM.
2025-05-02 12:02:45 (10908): Deregistering VM. (boinc_1114d9ba70cb8796, slot#0)
2025-05-02 12:02:46 (10908): Removing network bandwidth throttle group from VM.
2025-05-02 12:02:46 (10908): Removing VM from VirtualBox.
2025-05-02 12:02:51 (10908): called boinc_finish(0)
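
The log above ends with `boinc_finish(0)` even though the guest reported `job: run exitcode=1`, so the two values are worth reading separately. Below is a minimal sketch that pulls both out of a saved stderr text. It assumes the line formats shown in this log, and it assumes (this is an interpretation, not something the log states) that a non-zero `run exitcode` means the job inside the VM did not produce a valid result. The function names are hypothetical.

```python
import re

# Guest-side exit codes, e.g. "Guest Log: job: run exitcode=1"
JOB_EXIT_RE = re.compile(r"Guest Log: job: (unpack|run) exitcode=(-?\d+)")
# Wrapper-side exit code, e.g. "called boinc_finish(0)"
FINISH_RE = re.compile(r"called boinc_finish\((-?\d+)\)")


def summarize(stderr_text):
    """Collect the unpack, run and boinc_finish exit codes found in the log."""
    summary = {"unpack": None, "run": None, "boinc_finish": None}
    for line in stderr_text.splitlines():
        m = JOB_EXIT_RE.search(line)
        if m:
            summary[m.group(1)] = int(m.group(2))
            continue
        m = FINISH_RE.search(line)
        if m:
            summary["boinc_finish"] = int(m.group(1))
    return summary


def job_succeeded(summary):
    """Assumption: success needs both the guest job and the wrapper to report 0."""
    return summary["run"] == 0 and summary["boinc_finish"] == 0


# Three lines taken from the log quoted above.
SAMPLE = """\
2025-05-02 11:47:08 (10908): Guest Log: job: unpack exitcode=0
2025-05-02 11:48:40 (10908): Guest Log: job: run exitcode=1
2025-05-02 12:02:51 (10908): called boinc_finish(0)
"""

if __name__ == "__main__":
    result = summarize(SAMPLE)
    print(result, job_succeeded(result))
```

Under that assumption, the task quoted here finished cleanly from BOINC's point of view while the job inside the VM failed, which would be consistent with the high failure rate on the Jobs chart.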

ATLAS Task Failure – mv: source and destination are the same file (WU 231716588)

1 month 1 week ago
I encountered a compute error on an ATLAS native task that ran on my system for nearly two weeks before failing. Here's a summary of the failure:

Workunit: 231716588

Application version: ATLAS Simulation v3.01 (native_mt)

Client state: Compute error

Exit status: 195 (0x000000C3) EXIT_CHILD_FAILED

Host: ID 10873401

System: Linux Mint 22.1 (based on Ubuntu 24.04), CVMFS and Apptainer functional

Threads assigned: 5

Stderr shows this likely culprit:

mv: ‘ATLAS.root_0’ and ‘EVNT.44075481._002898.pool.root.1’ are the same file

This appears to crash the job early in its actual processing step, causing the wrapper to return an exit status of 195.

All CVMFS probes passed and the Apptainer container was loaded successfully from CVMFS. This seems to be an issue with the job script logic, likely in start_atlas.sh or its handling of the mv command near the start of execution.

Please let me know if you'd like additional logs or diagnostic output.

Thanks for all your hard work!
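
For context, `mv` prints this message when the source and destination names already refer to the same underlying file, for example because one is a hard link to the other. The sketch below illustrates that condition and one possible guard in Python. It is not the logic of start_atlas.sh, which is a shell script not shown in this thread, and `safe_move` is a hypothetical helper name.

```python
import os
import shutil


def safe_move(src, dst):
    """Move src to dst unless both names already point at the same file.

    Returns True if a move happened, False if it was skipped.
    os.path.samefile compares device and inode numbers, which is the
    same condition that makes mv report "are the same file".
    """
    if os.path.exists(dst) and os.path.samefile(src, dst):
        return False
    shutil.move(src, dst)
    return True
```

Calling `safe_move("ATLAS.root_0", "EVNT.44075481._002898.pool.root.1")` in a directory where the second name is a hard link to the first would return False instead of raising an error.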

ERROR: failed to run pythia8 8.313

1 month 2 weeks ago
I will note that not all pythia8 tasks are failing. I rarely get to sit down and look at what the VM is actually doing, but I've found that some tasks give that error and some don't.

CP5-CR2 is generating events
default-CD is generating events
default-noRap is generating events

qcdcr0 failed (by "failed" I mean it didn't generate any events)
tune-A2 failed (X2)
tune-AU2 failed
tune-AU2lox failed
vincia-default failed (X2)
ropes failed

I haven't gotten any pythia6, sherpa or herwig tasks so I don't know about those.

Windows 10 Theory task stalled or ...?

1 month 3 weeks ago
2025-04-16 20:55:35 (9456): Guest Log: [INFO] Excerpt from "cvmfs_config stat": VERSION HOST PROXY
2025-04-16 20:55:35 (9456): Guest Log: [INFO] 2.7.2.0 http://s1ihep-cvmfs.openhtc.io:8080 http://192.168.1.125:3128
These lines confirm:
- that openhtc.io is used for CVMFS (good!)
- that your local proxy 192.168.1.125 is used (even better!)
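
The same check can be done automatically on a saved log. This is a minimal sketch, assuming the two-line "cvmfs_config stat" excerpt keeps the format shown above (a header line with column names, followed by one `[INFO]` line with the values). The function names are hypothetical, and "local proxy" is approximated here as "the proxy host is a private IP address", which is an assumption made for illustration.

```python
import ipaddress
from urllib.parse import urlparse

HEADER_MARK = 'Excerpt from "cvmfs_config stat":'


def parse_cvmfs_excerpt(lines):
    """Map the column names (VERSION, HOST, PROXY) to the values on the next line."""
    for i, line in enumerate(lines[:-1]):
        if HEADER_MARK not in line:
            continue
        keys = line.split(HEADER_MARK, 1)[1].split()
        parts = lines[i + 1].split("[INFO]", 1)
        if len(parts) != 2:
            return None
        values = parts[1].split()
        if len(keys) != len(values):
            return None
        return dict(zip(keys, values))
    return None


def uses_openhtc(stat):
    """True if the CVMFS server host is under openhtc.io."""
    host = urlparse(stat["HOST"]).hostname or ""
    return host == "openhtc.io" or host.endswith(".openhtc.io")


def uses_private_proxy(stat):
    """True if the proxy is given as a private IP address.

    A proxy given by hostname returns False here, since this sketch
    does not resolve names.
    """
    host = urlparse(stat["PROXY"]).hostname
    if host is None:
        return False
    try:
        return ipaddress.ip_address(host).is_private
    except ValueError:
        return False


# The two log lines quoted above.
SAMPLE = [
    '2025-04-16 20:55:35 (9456): Guest Log: [INFO] Excerpt from "cvmfs_config stat": VERSION HOST PROXY',
    "2025-04-16 20:55:35 (9456): Guest Log: [INFO] 2.7.2.0 http://s1ihep-cvmfs.openhtc.io:8080 http://192.168.1.125:3128",
]

if __name__ == "__main__":
    stat = parse_cvmfs_excerpt(SAMPLE)
    print(stat, uses_openhtc(stat), uses_private_proxy(stat))
```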

Missing "sixtrack test" column on the leaderboards

2 months ago
Thanks Laurence; however, as of posting they have not been reinstated.

This will just cause confusion when 'B' appears again.

As it stands, there is already confusion, as the credits now shown on the leaderboard do not add up to one's total credit (if the user ran sixtrack test and/or ATLAS long).
And if sixtrack test does appear again, would users' credit start from 0 or from the previous score (assuming it is still in the database)? Transparency about where credit is given is needed to keep users' faith in the credit scoring.

New native version v300.08

2 months ago
From reading the page you mention, I don't understand whether this BUDA thingy is "only for the server side of things" or also for the BOINC client. Would we no longer need VirtualBox, with BOINC running a packaged BOINC application via Docker on the participant's machine?