Test4Theory

Long running task, how to proceed?

1 week 2 days ago
Thanks for having a look into that. In the meantime I compared this task's VM output with that of the other tasks I have running (they log 30,000-50,000 events after a few hours), so, sadly, I have aborted this task.

No Tasks

2 weeks 2 days ago
I am not getting any for Windows now. I have the project selected and VBox installed, and the site says there are loads, but nothing is downloading?
Do you have 'native' selected in your project preferences?

How long may Native-Theory-Tasks run

3 weeks 4 days ago
As I said, it's been aborted now.
I had only left it alone because I've had others that ran long but were otherwise working normally.
I don't make a habit of digging through workunit logs without cause.

Native Theory 300.08 configuration issue

1 month ago
Solved. I ended up installing a new up-to-date OS (Fedora 39). Then followed this guide to install BOINC on Fedora and also this guide to install CVMFS on Fedora.

I installed CVMFS:

dnf install https://ecsft.cern.ch/dist/cvmfs/cvmfs-2.11.0/cvmfs-2.11.0-1.fc34.x86_64.rpm https://ecsft.cern.ch/dist/cvmfs/cvmfs-config/cvmfs-config-default-latest.noarch.rpm http://ecsft.cern.ch/dist/cvmfs/cvmfs-2.11.0/cvmfs-libs-2.11.0-1.fc34.x86_64.rpm

cvmfs_config setup

Then added CVMFS configuration to /etc/cvmfs/default.local:

CVMFS_REPOSITORIES="atlas,atlas-condb,grid,cernvm-prod,sft,alice"
CVMFS_HTTP_PROXY="auto;DIRECT"
CVMFS_USE_CDN=yes
CVMFS_CLIENT_PROFILE=single

And ran the prepare_theory_native_environment script from this board:

sudo /bin/bash -c "export script=\"prepare_theory_native_environment\" && wget https://lhcathome.cern.ch/lhcathome/download/\$script -O /tmp/\$script && chmod u+x /tmp/\$script && /tmp/\$script && rm /tmp/\$script"
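
Before letting BOINC fetch native tasks, the CVMFS settings quoted above can be sanity-checked. The following is only a minimal sketch: the helper name check_cvmfs_local is made up for illustration, and the key list simply mirrors the four settings from this post rather than any official list of required parameters.

```shell
#!/bin/sh
# Report which of the expected settings are absent from a CVMFS client config.
# check_cvmfs_local is a hypothetical helper; the keys are the four quoted
# in the post, not an authoritative list.
check_cvmfs_local() {
    file="$1"
    missing=0
    for key in CVMFS_REPOSITORIES CVMFS_HTTP_PROXY CVMFS_USE_CDN CVMFS_CLIENT_PROFILE; do
        # Accept the key at the start of a line or after whitespace.
        if ! grep -Eq "(^|[[:space:]])${key}=" "$file"; then
            echo "missing: $key"
            missing=1
        fi
    done
    return "$missing"
}

# Typical use on the real file, followed by CVMFS's own mount test:
#   check_cvmfs_local /etc/cvmfs/default.local && cvmfs_config probe
```

If the helper reports nothing, cvmfs_config probe then tries to mount each configured repository, which is the more meaningful check.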

New native version v300.08

1 month ago
It's all working now. I'm not too sure why; maybe a reboot did something, lol.

Here is a work unit from LHC-Dev (I had it running a few of the Theory tasks over there):

https://lhcathomedev.cern.ch/lhcathome-dev/result.php?resultid=3310417

I am getting the pause and resume now, with no errors.

Also, this is the stderr.txt output from one of my Theory tasks from here (not dev), where I tried it as well. You can see it is working there too.

05:49:29 AWST +08:00 2024-03-24: cranky-0.1.4: [INFO] Starting runc container.
05:49:29 AWST +08:00 2024-03-24: cranky-0.1.4: [INFO] To get some details on systemd level run
05:49:29 AWST +08:00 2024-03-24: cranky-0.1.4: [INFO] systemctl status Theory_2743-2722097-13_1.scope
05:49:29 AWST +08:00 2024-03-24: cranky-0.1.4: [INFO] mcplots runspec: boinc pp jets 13000 160 - pythia8 8.308 tune-A2 100000 13
05:49:29 AWST +08:00 2024-03-24: cranky-0.1.4: [INFO] ----,^^^^,<<<~_____---,^^^,<<~____--,^^,<~__;_
07:40:32 AWST +08:00 2024-03-24: cranky-0.1.4: [INFO] Pausing systemd unit Theory_2743-2722097-13_1.scope
07:43:01 AWST +08:00 2024-03-24: cranky-0.1.4: [INFO] Resuming systemd unit Theory_2743-2722097-13_1.scope
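
To confirm pause and resume on other tasks without reading the whole file, the relevant lines can be pulled out of stderr.txt. This is a rough sketch that assumes the cranky log layout shown above (time, zone, offset, date, cranky tag, level, message); pause_events is a name invented here.

```shell
#!/bin/sh
# Print "time action unit" for every pause/resume line in a cranky stderr.txt.
# Assumes the whitespace-separated layout of the log excerpt above.
pause_events() {
    awk '$6 == "[INFO]" && ($7 == "Pausing" || $7 == "Resuming") && $8 == "systemd" {
        print $1, $7, $10
    }' "$1"
}

# Example: pause_events stderr.txt
```

Run against the excerpt above, this prints one line for the pause at 07:40:32 and one for the resume at 07:43:01.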

Regards

file_xfer_error

1 month 3 weeks ago
Cheers. I think I might just stop doing Theory altogether. The project seems a mess at the moment; even the native apps are failing almost instantly.

cranky: [ERROR] No output found - SOLVED

1 month 3 weeks ago
Native Theory tasks are still failing for me. I allowed the machine to grab some and they all failed instantly. Any ideas?

https://lhcathome.cern.ch/lhcathome/result.php?resultid=407019270

Is there any way to force the box to get VBox Theory tasks whilst still getting native ATLAS tasks? The native tasks seem to complete so much faster.

Latest errors on tasks

2 months ago
In your tasks:
Command: VBoxManage -q closemedium "/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/Theory_2023_12_13.vdi"
Output: VBoxManage: error: Cannot close medium '/var/lib/boinc/projects/lhcathome.cern.ch_lhcathome/Theory_2023_12_13.vdi' because it has 2 child media
When it cannot close the medium, it will also not open it. The problem is not with BOINC, but with VirtualBox.
You have to use VirtualBox Manager to solve your problem. Follow the steps in https://lhcathome.cern.ch/lhcathome/forum_thread.php?id=6112&postid=49574 but, instead of the CMS .vdi, remove Theory_2023_12_13.vdi. Don't delete the file itself.

That seems to have done it. When I looked in Tools/Media I found there were 2 orphaned Theory tasks listed under the .vdi file -- I assume those were the 2 child media the log referred to.
The Theory .vdi couldn't be removed from the VB manager until those were gone.
Thanks for the detailed reply. I would never have found this without it.
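
For hosts without the GUI, roughly the same cleanup might be done with VBoxManage. This is an untested sketch: it assumes `VBoxManage list hdds` prints one "UUID:" and one "Parent UUID:" line per medium, and child_media is a helper name made up here. Child media still attached to a registered VM would need that VM removed first.

```shell
#!/bin/sh
# Print the UUIDs of all media whose "Parent UUID" equals $1.
# Reads `VBoxManage list hdds` style output on stdin (format assumed).
child_media() {
    awk -v parent="$1" '
        $1 == "UUID:"                   { uuid = $2 }
        $1 == "Parent" && $2 == "UUID:" { if ($3 == parent) print uuid }
    '
}

# Possible use (replace PARENT with the UUID listed for Theory_2023_12_13.vdi):
#   VBoxManage list hdds | child_media PARENT
#   VBoxManage closemedium disk CHILD_UUID   # once per orphaned child
#   VBoxManage closemedium disk PARENT       # no --delete, so the file is kept
```

Leaving out --delete on the last command matches the advice above to unregister the .vdi but keep the file.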

PYTHIA Warning in Pythia::check: not quite matched particle energy/momentum/mass

2 months 1 week ago
After more than 5 hours at 100% CPU, the last lines of running.log were:
0 events processed
PYTHIA Warning in DireSpace::branch_IF: used up beam momentum; discard splitting.
PYTHIA Info in DireTimes::pT2nextQCD_FF: Found large acceptance weight for Dire_fsr_qed_11->11&22_notPartial
PYTHIA Info in DireTimes::pT2nextQCD_FF: Found large acceptance weight for Dire_fsr_qcd_21->1&1a
PYTHIA Warning in DireTimes::branch_FI: used up beam momentum; discard splitting.
PYTHIA Warning in Pythia::check: not quite matched particle energy/momentum/mass
I aborted this job. Description: pp wy 13000 - - pythia8 8.303 dire-default
This was the second resend, and the initial wingman let it run the entire 10 days without success.
Workunit: https://lhcathome.cern.ch/lhcathome/workunit.php?wuid=219321083

VM Usage question

2 months 1 week ago
The Theory task ran to completion, but the output file was not generated.
I don't know what the reason could be.

jobs stats page

2 months 2 weeks ago
Thanks.

So, the link on the project website also needs to be changed back from mcplots.cern.ch to mcplots-dev.cern.ch

Menu: Jobs -> Theory Jobs

Website showing list with "bad" sherpa tasks

2 months 2 weeks ago
Although the task in question might indeed have got stuck, the "failed" list does not help to decide whether any task should be killed or not.

That's because 1 important fact has not been mentioned:
"Therefore no task of them was successful so far..."

Especially when a new mcplots revision starts, there are always a couple of runspecs that fail or get lost before they report their first success.
To get removed from the "failed" list a runspec needs to report at least 1 successful result.
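
The rule can be illustrated with a toy example. Everything here is hypothetical: the input format (one "runspec-id status" pair per line), the ids and the helper name still_failed are invented for illustration and are not how mcplots stores its results.

```shell
#!/bin/sh
# List runspecs that have never reported a success.
# Input: lines of "<runspec-id> <status>" (made-up format for illustration).
still_failed() {
    awk '
        { seen[$1] = 1 }
        $2 == "success" { ok[$1] = 1 }
        END { for (r in seen) if (!(r in ok)) print r }
    ' "$1" | sort
}

# Example: still_failed results.txt
```

A runspec with several failures and a single success drops off the list, while one that has only just started and lost its first result stays on it, which is why the list alone says little about an individual task.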

Buggy workunit

3 months 2 weeks ago
I have had many long ones over the years but this one was here a couple months ago

Native Theory Application Setup issue

3 months 2 weeks ago
... I suggest being a bit stricter and modifying the original sudoers pattern as follows:
1. As root edit "/etc/sudoers.d/50-lhcathome_boinc_theory_native"
2. Locate the alias "LHCATHOMEBOINC_03"
3. Replace "...runc --root state..." with "...(runc|runc\.new|runc\.old) --root state..."
4. Save the file
As of today the updated script creating the sudoers file is available on the project servers (-dev and -prod).
The modified script now creates the correct command alias.

Volunteers who already modified the sudoers file do not need to run the script again.

Volunteers who run the script again will find a backup of the old sudoers file in /etc/sudoers.d beside the new file.
Feel free to leave the backup there or delete it.

Sudo will automatically ignore the backup file and use the new file as soon as it is available.
Sudo version >= 1.9.10 still remains a requirement.
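
To see what the stricter alternation from step 3 accepts, it can be tried outside sudo with grep -E. This is only a sketch of the alternation itself: the anchors are added for the demo, the rest of the sudoers rule is not reproduced, and matches_runc_pattern is a made-up helper name. As far as I know, regular expressions in sudoers arrived with sudo 1.9.10, which would explain the version requirement.

```shell
#!/bin/sh
# Succeed if $1 is one of the three binary names allowed by the alternation.
matches_runc_pattern() {
    printf '%s\n' "$1" | grep -Eq '^(runc|runc\.new|runc\.old)$'
}

for name in runc runc.new runc.old runc.bak runcXnew; do
    if matches_runc_pattern "$name"; then
        echo "$name: allowed"
    else
        echo "$name: rejected"
    fi
done

# After editing the real file, its syntax can be checked with:
#   visudo -cf /etc/sudoers.d/50-lhcathome_boinc_theory_native
# and the installed version with:
#   sudo -V | head -n 1
```

The escaped dots matter: without the backslashes, a name such as runcXnew would also match.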
LHC@home: Theory Application