Aggregator

Theory 300.93 (vbox64) for MacOS

1 month 1 week ago
In reply to computezrmle's message of 20 Apr 2026:
Toby Broom already mentioned that the VirtualBox app will not run on your Mx Mac (since the guest is x86_64 based).

What may run is the Theory docker app, but it reports an error because it requires /cvmfs to test whether a local CVMFS client is available.
Although running a local CVMFS client would be the preferred solution, I suggest first testing the following method.

Creating a subfolder directly under '/' seems to be a pain on MacOS.
See here for a possible solution and report if it works:
https://lhcathome.cern.ch/lhcathome//forum_thread.php?id=6481&postid=53450

Thanks computezrmle,

I'll see what I can do.

ATLAS work unit sizes (taille unités atlas)

1 month 1 week ago
We usually get smaller ATLAS tasks. It's been a while since I have seen tasks this big, with EVNT files larger than 500 MB. And no, it is not related to VirtualBox. ATLAS native on Linux has 1.76 GB and 2.08 GB input files right now.

Looking at the forum https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=6188
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=6214
https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=6263

The last time this seems to have happened was at the end of 2024.

Failed Theory tasks cause container to hang forever

1 month 1 week ago
Name: Theory_2922-4881498-693
Application: Theory Simulation
Batch: 19854517
Created: 6 Mar 2026, 12:21:50 UTC
Minimum quorum: 1
Initial replication: 1
Max # of error/total/success tasks: 3, 3, 1
Errors: Too many total results
Command line:
Priority: 0

Job instances

| Task | Computer | Sent | Time reported or deadline | Status | Run time (sec) | CPU time (sec) | Credit | Application |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 433463131 | 10870847 | 7 Mar 2026, 1:17:10 UTC | 14 Mar 2026, 9:37:54 UTC | Aborted | 627,659.94 | 595,418.10 | --- | Theory Simulation v302.10 (docker) windows_x86_64 |
| 433857762 | 10915803 | 14 Mar 2026, 12:17:41 UTC | 25 Mar 2026, 7:45:25 UTC | Aborted | 137,481.22 | 112,932.80 | --- | Theory Simulation v302.10 (docker) windows_x86_64 |
| 434120529 | 10797673 | 25 Mar 2026, 10:45:38 UTC | 8 Apr 2026, 7:52:12 UTC | Aborted | 1,197,623.06 | 1,139,443.00 | --- | Theory Simulation v302.10 (docker) windows_x86_64 |

Input files

Error while computing

1 month 2 weeks ago
If you run BOINC as the boinc user, you might need to make the config global, e.g.

/etc/containers/containers.conf

or put it in the boinc user's folder, wherever that is for your install.

New version v302.10 for docker

1 month 2 weeks ago
As an alternative, I also faked a VirtualBox installation:

New-Item "C:\Program Files\Oracle\VirtualBox\virtualbox.exe"
New-Item -Path "HKLM:\SOFTWARE\Oracle\VirtualBox"
New-ItemProperty -Path "HKLM:\SOFTWARE\Oracle\VirtualBox" -Name "Version" -Value "7.2.2" -PropertyType "String"
New-ItemProperty -Path "HKLM:\SOFTWARE\Oracle\VirtualBox" -Name "VersionExt" -Value "7.2.2" -PropertyType "String"
New-ItemProperty -Path "HKLM:\SOFTWARE\Oracle\VirtualBox" -Name "InstallDir" -Value "C:\Program Files\Oracle\VirtualBox\" -PropertyType "String"

It seems the "don't use VirtualBox" setting would stop the server from sending tasks.

Some Theory tasks on VirtualBox hang Probing /cvmfs/alice.cern.ch...

1 month 3 weeks ago
MadGraph5

These types of tasks are rare, but I see the same scenario on one of my PCs.
1. For about half an hour, there is some activity in the VM: Guest shows some CPU load, it downloads around 200 MB, and writes about 100 MB to the disk.
2. After that, Guest activity drops to zero, all performance graphs are flat, and the task log file becomes unreachable.

Here is an excerpt from the log:
===> [runRivet] Wed Apr 8 07:03:39 AM UTC 2026 [boinc pp zinclusive 13000 -,-,200 - madgraph5amc 2.7.2.atlas3...]
...
ValueError: unsupported hash type md5
AttributeError : 'module' object has no attribute 'md5'
...
ERROR: missing LHE output file: /scratch/tmp/tmp.UjcDJkl2wh/MG5RUN/Events/run_01/unweighted_events.lhe
...
[2]+ 2535 Running ( $rivetExecString; exit $? ) &
ERROR: fail to run madgraph5amc 2.7.2.atlas3 or Rivet (error exit code)
My preliminary, but not yet firm conclusion: The simulation never started because the angry Python strangled everything, maybe even itself. :)
My action: Brick therapy for that VM.

P.S. For those who are experienced or have real knowledge – please comment, is this "Brick therapy" a correct solution?

posted on wrong board sorry..

1 month 3 weeks ago
Theory Simulation tasks stuck at 100% — docker_wrapper_18 infinite loop on container exit - I'm posting this over on Theory Simulation now. Link to https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=6480&postid=53372

ATLAS native tasks failing with computation error after Docker upgrade — tmpfs FIX (solved locally)

2 months ago
Hi all,
I'm running LHC@home on Kubuntu 24.04 with BOINC 8.2.9 and recently resolved a complete breakdown of ATLAS task completion following an upgrade from docker.io to docker-ce 29.x. Posting here because I see others hitting the same wall.
The symptom
ATLAS tasks fail with "computation error." Tasks may die immediately or run to 100% then fail. No credits awarded. In the task's stderr output you will find:
mount: /var/www/lighttpd: cannot mount tmpfs read-only
The cause
Docker 29.x tightened its default security profiles. Containers now launch without CAP_SYS_ADMIN unless explicitly requested. The ATLAS wrapper runs a lighttpd instance inside the container to serve simulation results back to BOINC. Lighttpd needs to mount a tmpfs filesystem internally — which requires CAP_SYS_ADMIN. Without it, lighttpd cannot start, results cannot be delivered, and the task fails.
This is not AppArmor. Not Landlock. Not seccomp. It is a missing container capability introduced by the docker-ce 29.x default profile change.
Why cc_config.xml doesn't help
You may find references to a docker_container_options directive in cc_config.xml. This field does not exist in the BOINC codebase — it is not parsed and has no effect. A feature request to implement it has been filed as BOINC GitHub issue #6931 and is currently assigned to the BOINC client developer and the LHC@home project contact at CERN.
The fix
Deploy a wrapper script at /usr/bin/docker that intercepts docker run and docker create calls and injects --privileged automatically. The --user flag sets the container to run as boinc:boinc so output files are created with correct ownership — without this they come out as root:root and BOINC cannot move them from the slot directory.
Protect the wrapper from being overwritten by future docker-ce package upgrades using dpkg-divert:
sudo dpkg-divert --divert /usr/bin/docker.real --rename /usr/bin/docker
Then place your wrapper script at /usr/bin/docker. After any docker-ce upgrade, the new binary installs to /usr/bin/docker.real and the wrapper is untouched.
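For readers who want to see the shape of the argument rewriting before opening the repository, here is a minimal sketch. It is not the author's wrapper script: the function name inject_flags is hypothetical, the choice of Python is mine, and treating --user boinc:boinc as something the wrapper injects alongside --privileged is my reading of the description above.

```python
def inject_flags(argv, extra=("--privileged", "--user", "boinc:boinc")):
    """Return a copy of argv with `extra` inserted right after the first
    `run` or `create` subcommand; other invocations pass through unchanged.

    Assumption: the first argument equal to "run" or "create" is the
    subcommand. A value of a global option that happens to be spelled the
    same way would be misread by this simple scan.
    """
    for i, arg in enumerate(argv):
        if arg in ("run", "create"):
            return list(argv[:i + 1]) + list(extra) + list(argv[i + 1:])
    return list(argv)


print(inject_flags(["run", "--rm", "some-image"]))
print(inject_flags(["ps", "-a"]))
```

A real wrapper would then hand the rewritten arguments to the diverted binary at /usr/bin/docker.real, for example with os.execv. That step is left out here so the sketch stays side-effect free.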
Verify
Confirm the first line of /usr/bin/docker is a bash shebang, then re-enable LHC@home tasks and confirm the next ATLAS task completes and uploads successfully.
Tested on: Kubuntu 24.04, BOINC 8.2.8 and 8.2.9, docker-ce 29.2.1 and 29.3.0.
Full documentation including wrapper script source: github.com/black-vajra/boinc-devel

ATLAS failed

2 months ago
I have over 40 invalid Atlas tasks on Windows 10 showing a "validation error".

Furthermore, the LHC website is slow and doesn't allow me to see the server status.

It's been 24 hours since any Atlas tasks have downloaded.

docker-ce 29.x breaks ATLAS tasks: tmpfs mount failure and the fix

2 months 2 weeks ago
# CPU Affinity Management and Core Rotation

*Applies to: BOINC 8.x, Linux, multi-core systems with sustained compute workloads*

---

## The Problem

BOINC does not manage CPU affinity for its worker processes. On a multi-core system running
multiple simultaneous tasks, the Linux scheduler will distribute work across cores as it sees
fit — which in practice often means a small number of cores carry most of the load while
others sit idle. Over time this produces thermal hotspots, uneven hardware wear, and on
systems with SMT (hyperthreading), interference between threads sharing physical cores.

On pots, before any affinity management, 2-4 cores were regularly hitting 68-71°C while
adjacent cores remained near idle. After the affinity script was deployed, sustained
temperatures dropped to the 26-53°C range with load distributed visibly across all cores.

A secondary problem: even with affinity pinning, always assigning the same heavy process to
the same cores concentrates electromigration and thermal stress on those physical cores over
months of continuous operation. Core rotation addresses this.

---

## Design Decisions

### Why not equal distribution?

The naive approach is to divide cores evenly — with 14 cores and 2 workers, give each 7.
This ignores the reality that BOINC workers vary enormously in their CPU consumption:

- An ATLAS GEANT4 simulation may run at 800-900% CPU (8-9 cores worth)
- A MilkyWay N-body task may run at 700% CPU
- An Einstein gravitational wave search may run at 90% CPU (nearly single-threaded, GPU-assisted)

Giving a 90% CPU process 7 cores wastes those cores. Giving an 800% process only 7 cores
artificially throttles it. Proportional allocation gives each process a share of available
cores that matches its actual demand.

### Why not per-thread pinning?

An early iteration of the script attempted to pin individual threads within a process to
individual cores. This produced excessive overhead — the script spent most of its time
enumerating threads and issuing taskset calls — and the benefit over process-level pinning
was marginal. Linux's scheduler handles intra-process thread distribution well once the
process is confined to a set of cores. Process-level pinning was retained.

### The 20% CPU threshold

Processes below 20% CPU are not considered compute workers and are not pinned. This filters
out BOINC's own utility processes (gawk, sleep, the boinccmd wrapper chain) which briefly
appear as children of boinc-client but consume negligible CPU and should not be constrained
to a subset of cores.

---

## How the Script Works

### Worker detection

Each cycle, the script builds a list of candidate PIDs by combining two sources:

- All descendants of the boinc-client process (recursive pgrep walk)
- Any processes matching known ATLAS/CVMFS name patterns (see ATLAS_ORPHAN_PROBLEM.md)

The combined deduplicated list is then filtered by the 20% CPU threshold to produce the
final worker list.
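
The detection step can be sketched roughly as follows. This is an illustration in Python, not the script itself: the function name select_workers and the cpu_by_pid mapping are hypothetical, and the sketch assumes the PID lists and per-PID %CPU values have already been collected (for example from pgrep and ps).

```python
def select_workers(descendant_pids, pattern_pids, cpu_by_pid, cpu_threshold=20.0):
    """Combine both candidate sources, drop duplicates, keep PIDs at or above the threshold."""
    # dict.fromkeys removes duplicates while preserving first-seen order.
    candidates = dict.fromkeys(list(descendant_pids) + list(pattern_pids))
    # A PID with no CPU reading (it may have exited) is treated as 0% and skipped.
    return [pid for pid in candidates if cpu_by_pid.get(pid, 0.0) >= cpu_threshold]


cpu = {101: 850.0, 102: 0.3, 103: 95.0, 200: 19.9}
print(select_workers([101, 102, 103], [103, 200], cpu))
```

PID 103 appears in both sources but is listed once, and the two low-CPU processes are filtered out.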

### Proportional core allocation

For each worker, the script calculates its share of total cores proportionally to its share
of total CPU consumption across all workers. A minimum of 2 cores is guaranteed to any
worker regardless of its relative share, to avoid assigning a single core to a multi-threaded
process.

The allocation is calculated as:

allocated_cores = max( int( (worker_cpu / total_cpu) * total_cores ), MIN_CORES )

The last worker in the list receives all remaining cores in the window — this ensures no
cores are left unassigned due to integer rounding.
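
A small worked sketch of this rule, in Python for illustration only. The function name allocate_cores is hypothetical, and capping each share at the number of cores still free is my assumption; the text above does not say how the script handles that case.

```python
def allocate_cores(worker_cpu, total_cores, min_cores=2):
    """Return one core count per worker, in input order."""
    total_cpu = sum(worker_cpu)
    counts = []
    remaining = total_cores
    for i, cpu in enumerate(worker_cpu):
        if i == len(worker_cpu) - 1:
            # The last worker takes whatever is left, so rounding never strands a core.
            share = remaining
        else:
            share = max(int(cpu / total_cpu * total_cores), min_cores)
            # Assumption: never hand out more cores than are still free.
            share = min(share, remaining)
        counts.append(share)
        remaining -= share
    return counts


# An 850% worker and a 90% worker on 14 cores.
print(allocate_cores([850.0, 90.0], 14))
```

With these inputs the heavy worker's proportional share is int(850 / 940 * 14) = 12 cores, and the light worker receives the remaining 2.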

### Core rotation

Without rotation, the proportional allocator always starts at core 0. A heavy process
consuming 800% CPU would always land on cores 0-7. Over months of continuous operation, those
cores accumulate disproportionate thermal and electromigration stress.

The rotation mechanism maintains a counter that advances by ROTATION_STEP cores each cycle.
The starting position for core assignment each cycle is:

core_cursor = ROTATION_COUNTER % TOTAL_CORES

On a 14-core system with ROTATION_STEP=2 and POLL_INTERVAL=10 seconds, the assignment window
completes a full rotation across all cores every 70 seconds. Over a day of continuous
operation, every core receives roughly equal aggregate load from heavy processes.
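
The arithmetic can be checked with a short Python sketch. The function names are hypothetical, and wrapping the assignment window around from the last core back to core 0 is an assumption about how the script treats the end of the core range.

```python
from math import gcd


def core_cursor(rotation_counter, total_cores):
    """Starting core for this cycle."""
    return rotation_counter % total_cores


def core_window(rotation_counter, total_cores, width):
    """Cores covered by a window of `width` cores, wrapping past the last core."""
    start = core_cursor(rotation_counter, total_cores)
    return [(start + i) % total_cores for i in range(width)]


def rotation_period_seconds(total_cores, rotation_step, poll_interval):
    """Seconds until the cursor returns to its starting core."""
    cycles = total_cores // gcd(total_cores, rotation_step)
    return cycles * poll_interval


print([core_cursor(n * 2, 14) for n in range(8)])
print(core_window(12, 14, 4))
print(rotation_period_seconds(14, 2, 10))
```

The cursor visits cores 0, 2, 4, ... 12 and is back at 0 on the eighth cycle, which gives the 7 cycles and 70 seconds quoted above.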

The script also issues renice -n 19 alongside every taskset call. This ensures that even
an escaped ATLAS orphan running at 900% CPU remains at the lowest scheduler priority,
yielding to any interactive or system process that needs CPU time.

### Change detection

The script tracks which PIDs have been pinned. It reassigns cores every cycle regardless
(to enforce rotation), but logs a "change detected" message when the worker set changes —
a new PID appearing, an existing PID dying, or a PID dropping below the CPU threshold.
This makes the journal useful for tracking task lifecycle.
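
One simple way to detect such changes is a set comparison between the previous and current worker lists. The sketch below is illustrative only: describe_change and the exact message format are hypothetical, not taken from the script.

```python
def describe_change(previous_pids, current_pids):
    """Return a log line when the worker set changed, or None when it did not."""
    previous, current = set(previous_pids), set(current_pids)
    appeared = sorted(current - previous)   # new workers this cycle
    vanished = sorted(previous - current)   # exited or dropped below the threshold
    if not appeared and not vanished:
        return None
    return f"change detected: +{appeared} -{vanished}"


print(describe_change([101, 103], [103, 250]))
```

Cores are still reassigned every cycle for rotation; a comparison like this would only decide whether a message is written to the journal.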

---

## Configuration Parameters

All tunable parameters are at the top of the script:

POLL_INTERVAL — seconds between cycles. Default 10. Lower values make the script more
responsive to new workers but increase CPU overhead from the monitoring itself. On a system
where tasks run for hours, 10-30 seconds is appropriate.

CPU_THRESHOLD — minimum %CPU to be considered a compute worker. Default 20.0. Raise this
if utility processes are being incorrectly identified as workers. Lower it if lightweight
tasks are being ignored.

MIN_CORES — minimum cores to assign any single worker. Default 2. Prevents a low-CPU worker
from being squeezed onto a single core when a high-CPU worker dominates the proportional
calculation.

ROTATION_STEP — cores to advance the window each cycle. Default 2. Higher values produce
faster rotation but less time on each core window per cycle. On a 14-core system, step 2
completes a full rotation in 7 cycles (70 seconds at POLL_INTERVAL=10).

---

## Systemd Integration

The script runs as a systemd service under `boinc-affinity.service`. Key unit settings:

AmbientCapabilities=CAP_SYS_NICE — required for taskset and renice to operate on processes
owned by other users (including root-owned ATLAS orphans) without running the entire script
as root.

CapabilityBoundingSet=CAP_SYS_NICE — limits the service to only this capability, following
the principle of least privilege.

Restart=always — the script is designed to run continuously. If it exits for any reason
(including a clean exit triggered by SIGTERM when boinc-client restarts for a scheduler
cycle), systemd restarts it after RestartSec=15 seconds.

After=boinc-client.service — start ordering only, not a hard dependency. The script handles
boinc-client not being present gracefully (it waits and retries). The BindsTo dependency
that was present in an earlier version was removed because it caused the affinity service
to die whenever boinc-client restarted for scheduler cycles.

ExecStartPre=/bin/sleep 5 — gives boinc-client time to spawn its initial workers before
the affinity script starts scanning for them.

---

## Known Limitations

Binary name logging — the script reads /proc/PID/exe to get the worker's binary name for
log output. For processes owned by root (ATLAS orphans), this requires root access. The
service runs with CAP_SYS_NICE but not as root, so binary names for root-owned processes
appear as empty string in the journal. Pinning and renicing still work correctly — this
is cosmetic only.

Single-parent multi-threaded workers — taskset on a process PID pins all threads of that
process to the specified core set only when it is invoked with -a (--all-tasks); plain
taskset -p changes just the one thread whose ID is given. Threads created afterwards inherit
the affinity mask of the thread that creates them. Separately, the CPU usage reported by ps
for the parent PID is the sum of all thread activity, so a 14-thread process showing 900% CPU
will be treated as a single worker consuming 900% and allocated cores accordingly.

Workers appearing below threshold mid-task — some tasks throttle themselves during I/O
phases or checkpointing and may temporarily drop below the 20% threshold. The script will
unpin them during that cycle. They will be re-detected and re-pinned on the next cycle when
CPU usage rises again. This is normal and harmless.

---

## Tuning for Different Hardware

On systems with more cores, consider increasing ROTATION_STEP proportionally to maintain
a similar rotation period. On a 28-core system, ROTATION_STEP=4 would give a similar 70-second
full rotation.

On systems with fewer cores (8 or fewer), MIN_CORES=2 may be too high if you regularly run
3 or more simultaneous workers. Reduce to 1 if allocation arithmetic is being distorted by
the minimum floor.

If your workload is dominated by a single heavy multi-threaded task, rotation provides the
most benefit. If your workload is many lightweight single-threaded tasks, proportional
allocation provides the most benefit. The script handles both.

LHC@home silent task failures: boinc user missing docker and vboxusers group membership

2 months 2 weeks ago

Hi all,
Posting this as a reference for anyone who finds themselves attached to LHC@home, receiving work units, but never actually running either CMS or ATLAS tasks — with no obvious error in the BOINC event log to explain why.
The cause is almost certainly group membership. LHC@home on Linux requires the boinc system user to be a member of two groups that are not added automatically during installation and are not documented clearly in one place.

The Problem
LHC@home runs two entirely separate execution runtimes:

ATLAS tasks — run via Docker containers
CMS tasks — run via VirtualBox VMs (VBoxHeadless)

Each runtime has its own permission requirement. If the boinc system user lacks membership in the appropriate group, tasks for that runtime will silently fail to launch. BOINC receives the work unit, acknowledges it, and then does nothing with it. No computation error is logged. No failure is reported. The task simply sits until it times out and is returned to the server.
From the user's perspective this looks like a task supply problem — "LHC@home just isn't sending me work." It isn't. The work is arriving and being silently dropped.

The Diagnostic
Check what groups the boinc user currently has:
    id boinc
Then test each runtime directly:
    # Test Docker access
    sudo -u boinc docker version 2>&1

    # Test VirtualBox access
    sudo -u boinc VBoxManage list vms 2>&1
A permission error on either command confirms the missing group. A clean response (even an empty list for VBoxManage) confirms access is working.

Required Group Memberships
The full set of groups the boinc system user needs for complete LHC@home functionality on a typical Linux installation:
| Group | Purpose |
| --- | --- |
| boinc | Primary group: BOINC data directory access |
| video | GPU access |
| render | AMD/Intel GPU render access |
| docker | ATLAS task container execution |
| vboxusers | CMS task VM execution |
The docker group is typically added when Docker is installed if you follow the post-installation steps in the official Docker documentation. The vboxusers group is created by the VirtualBox installer. Neither is added to the boinc system user automatically.
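
If you would rather script the check than read the id output by eye, something along these lines should work. It is only a sketch: the helper names groups_of and missing_groups are made up for this example, Python is my choice rather than anything the project ships, and the lookup covers groups listed in the local account database only.

```python
import grp
import pwd

REQUIRED = ("docker", "vboxusers")


def groups_of(user):
    """Group names for `user`: supplementary groups plus the primary group."""
    primary_gid = pwd.getpwnam(user).pw_gid
    names = {g.gr_name for g in grp.getgrall() if user in g.gr_mem}
    names.add(grp.getgrgid(primary_gid).gr_name)
    return names


def missing_groups(current, required=REQUIRED):
    """Required groups that are not in `current`, in the order they are required."""
    return [g for g in required if g not in current]


# Example with a hand-written group set. On a real host, pass groups_of("boinc") instead.
print(missing_groups({"boinc", "video", "render", "docker"}))
```

An empty list means both runtimes have the group membership they need. Anything listed still has to be added to the boinc user.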

The Fix
    sudo usermod -aG docker,vboxusers boinc
Verify:
    id boinc
Expected output should include both docker and vboxusers in the groups list.
Important: The group membership change does not take effect for the running boinc daemon until it is restarted. A full stop and start is required:
    # Stop BOINC completely
    sudo pkill boincmgr
    sudo pkill boinc

    # Verify everything is stopped
    pgrep -a boinc

    # Restart
    sudo boinc --redirectio &
    sleep 3
    boincmgr &
After restart, re-run the diagnostic commands above to confirm both runtimes are accessible.

A Note on Running BOINC as Root
On some setups BOINC is started via sudo boinc rather than as the boinc system user. In that case the daemon runs as root and group membership is irrelevant — root bypasses group checks entirely. The group membership fix described here applies specifically when BOINC is running as the boinc system user, which is the default when using the systemd service (systemctl start boinc-client).
To confirm which user your boinc daemon is running as:
    ps aux | grep boinc | grep -v grep
The USER column will show either boinc or root.

Current Status
With both groups added and BOINC restarted, both ATLAS Docker tasks and CMS VirtualBox tasks run correctly. CMS tasks in particular went from never appearing to downloading and executing normally once the vboxusers membership was in place.
Tested on:

Kubuntu 24.04 LTS
BOINC 8.2.8
VirtualBox 7.1.16
docker-ce 29.3.0


Full documentation including scripts, systemd units, and troubleshooting history:
github.com/black-vajra/sable-boinc_admin

Theory CPU Scheduling oddness

2 months 3 weeks ago
Well, it turns out I DID have to blacklist KVM.
Like this:

If it doesn't exist, create this file:
    sudo touch /etc/modprobe.d/kvm.conf
Edit kvm.conf, paste these lines, and save:
    blacklist kvm
    blacklist kvm_intel
    blacklist kvm_amd

Update the GRUB configuration (if necessary):
    sudo grub2-mkconfig -o /boot/grub2/grub.cfg
Reboot.

It works now. Thanks!

optimisation

3 months 1 week ago
Hello, do you plan to optimize the Theory 302.1 application further, or is this the final version? It is slow with Docker.
It runs much faster on VirtualBox.