Test4Theory

Made a small script to keep an eye on Theory jobs with correct % done

3 weeks ago

@seanr22a Thank you again for this vbox script.

Thousands of workunits ready to send, but "No tasks are available for Theory Simulation".

1 month ago

Is it known whether Theory_native is going to return at some point in the future, or has it been permanently retired?

Most Theory tasks take a very short time to complete (less than a minute), and the Jobs chart shows a high failure rate (~60-70%). How can I tell whether a task has been duly completed?

2025-05-02 11:47:07 (10908): Guest Log: Environment HTTP proxy: not set
2025-05-02 11:47:08 (10908): Guest Log: job: htmld=/var/www/lighttpd
2025-05-02 11:47:08 (10908): Guest Log: job: unpack exitcode=0
2025-05-02 11:48:40 (10908): Guest Log: job: run exitcode=1
2025-05-02 11:48:40 (10908): Guest Log: job: diskusage=5704
2025-05-02 11:48:40 (10908): Guest Log: job: logsize=4 k
2025-05-02 11:48:40 (10908): Guest Log: job: times=
2025-05-02 11:48:40 (10908): Guest Log: 0m0.002s 0m0.004s
2025-05-02 11:48:40 (10908): Guest Log: 0m0.424s 0m0.248s
2025-05-02 11:48:40 (10908): Guest Log: job: cpuusage=1
2025-05-02 11:48:40 (10908): Guest Log: Job Finished
2025-05-02 11:48:40 (10908): Guest Log: boinc_shutdown called with exit code 0
2025-05-02 11:48:40 (10908): Guest Log: sd_delay: 845
2025-05-02 11:48:40 (10908): Guest Log: ETA: 2025-05-02 10:02:44 UTC
2025-05-02 12:02:45 (10908): VM Completion File Detected.
2025-05-02 12:02:45 (10908): Powering off VM.
2025-05-02 12:02:45 (10908): Successfully stopped VM.
2025-05-02 12:02:45 (10908): Deregistering VM. (boinc_1114d9ba70cb8796, slot#0)
2025-05-02 12:02:46 (10908): Removing network bandwidth throttle group from VM.
2025-05-02 12:02:46 (10908): Removing VM from VirtualBox.
2025-05-02 12:02:51 (10908): called boinc_finish(0)
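One way to check a task like the one above is to compare the two exit codes in its stderr output: the "job: run exitcode" reported by the guest and the value passed to boinc_finish. The sketch below pulls both out of a log excerpt. Treating a nonzero "run exitcode" as "the generator did not finish cleanly" is an assumption drawn from this log, not a documented rule, and the function name summarize is made up for the example.

```python
import re

# Patterns for the two lines of interest in a Theory task's stderr output.
RUN_EXIT = re.compile(r"Guest Log: job: run exitcode=(-?\d+)")
FINISH = re.compile(r"called boinc_finish\((-?\d+)\)")


def summarize(stderr_text):
    """Return (job_exitcode, boinc_exitcode); each is None if not found."""
    job = RUN_EXIT.search(stderr_text)
    boinc = FINISH.search(stderr_text)
    return (
        int(job.group(1)) if job else None,
        int(boinc.group(1)) if boinc else None,
    )


# Excerpt copied from the log above.
sample = """\
2025-05-02 11:47:08 (10908): Guest Log: job: unpack exitcode=0
2025-05-02 11:48:40 (10908): Guest Log: job: run exitcode=1
2025-05-02 11:48:40 (10908): Guest Log: boinc_shutdown called with exit code 0
2025-05-02 12:02:51 (10908): called boinc_finish(0)
"""

job_code, boinc_code = summarize(sample)
print("job run exitcode:", job_code)
print("boinc_finish code:", boinc_code)
if job_code not in (None, 0) and boinc_code == 0:
    # Assumption: the wrapper exited cleanly although the job inside did not.
    print("BOINC reports success, but the job inside the VM exited nonzero")
```

In this excerpt the wrapper finishes with 0 even though the job itself exited with 1, so the BOINC result status alone would not reveal the problem.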

ERROR: failed to run pythia8 8.313

Test4Theory

2 months 1 week ago

I will note that not all pythia8 tasks are failing. Very rarely do I get to sit down and look at what the VM is actually doing. But I've found some tasks give that error and some that don't.

CP5-CR2 is generating events
default-CD is generating events
default-noRap is generating events

qcdcr0 failed (by "failed" I mean it didn't generate any events)
tune-A2 failed (X2)
tune-AU2 failed
tune-AU2lox failed
vincia-default failed (X2)
ropes failed

I haven't gotten any pythia6, sherpa or herwig tasks so I don't know about those.

Windows 10 Theory task stalled or ...?

Test4Theory

2 months 2 weeks ago

2025-04-16 20:55:35 (9456): Guest Log: [INFO] Excerpt from "cvmfs_config stat": VERSION HOST PROXY
2025-04-16 20:55:35 (9456): Guest Log: [INFO] 2.7.2.0 http://s1ihep-cvmfs.openhtc.io:8080 http://192.168.1.125:3128
These lines confirm:
- that openhtc.io is used for CVMFS (good!)
- that your local proxy 192.168.1.125 is used (even better!)
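The same check can be done by script. This is a minimal sketch that splits the "VERSION HOST PROXY" data line into fields and tests the two points listed above. The helper names parse_cvmfs_stat and is_private_proxy are invented for the example, and using a private IP address as the sign of a "local" proxy is an assumption.

```python
import ipaddress
from urllib.parse import urlsplit


def parse_cvmfs_stat(line):
    """Split a 'VERSION HOST PROXY' data line into a dict."""
    # Keep only the text after the "[INFO]" tag, then split on whitespace.
    payload = line.split("[INFO]", 1)[-1].split()
    if len(payload) != 3:
        raise ValueError("expected three fields: VERSION HOST PROXY")
    version, host, proxy = payload
    return {"version": version, "host": host, "proxy": proxy}


def is_private_proxy(proxy_url):
    """True if the proxy URL points at a private (LAN) IP address."""
    host = urlsplit(proxy_url).hostname
    if host is None:  # e.g. a value without a URL scheme
        return False
    try:
        return ipaddress.ip_address(host).is_private
    except ValueError:  # a DNS name rather than a literal IP address
        return False


# Data line copied from the log above.
line = ("2025-04-16 20:55:35 (9456): Guest Log: [INFO] 2.7.2.0 "
        "http://s1ihep-cvmfs.openhtc.io:8080 http://192.168.1.125:3128")

info = parse_cvmfs_stat(line)
uses_openhtc = urlsplit(info["host"]).hostname.endswith("openhtc.io")
uses_local_proxy = is_private_proxy(info["proxy"])
print("openhtc.io used:", uses_openhtc)
print("local proxy used:", uses_local_proxy)
```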

New version v300.95

Test4Theory

2 months 2 weeks ago

This new version provides improved handling for configuring CVMFS proxies.

Some Theory tasks on VirtualBox hang Probing /cvmfs/alice.cern.ch...

Test4Theory

2 months 2 weeks ago

This happened to me as well but a project reset appears to have fixed it.

New native version v300.08

Test4Theory

2 months 3 weeks ago

From reading the page you mention, I don't understand whether this BUDA thingy is "only for the server side of things" or also for the BOINC client. Would we no longer need VirtualBox, with BOINC running a packaged BOINC application via Docker on the participant's machine?

New version v300.94

Test4Theory

2 months 3 weeks ago

It's a bit more complex.

ATLAS is still using an outdated vboxwrapper (I don't know when they will replace it).

VirtualBox uses the profile set by the first vbox component starting up.
This can be the GUI, vboxmanage or vboxheadless.
The profile is in use until the background process vboxsvc times out, usually a couple of seconds after the components just mentioned are finished.

This means:
When BOINC starts a vbox app using an old vboxwrapper first, the wrong profile location will be used.
When BOINC starts a vbox app using a new vboxwrapper (>=26210) first, the correct profile location will be used.

This is a per-user limitation.
So, this affects all vbox tasks from various projects running under the same user account.
Simple rule: first vboxwrapper wins
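To see which side of the 26210 threshold a given task falls on, the wrapper version can be read from the first lines of the task's stderr output. This is a minimal sketch; the function names are made up for the example, and the sample line is in the format vboxwrapper prints at startup.

```python
import re

# Threshold quoted above: wrappers at or above this use the correct profile.
MIN_FIXED_WRAPPER = 26210


def wrapper_version(stderr_text):
    """Return the vboxwrapper version as an int, or None if not found."""
    match = re.search(r"vboxwrapper version (\d+)", stderr_text)
    return int(match.group(1)) if match else None


def uses_correct_profile(stderr_text):
    """True only if the log shows a wrapper version >= MIN_FIXED_WRAPPER."""
    version = wrapper_version(stderr_text)
    return version is not None and version >= MIN_FIXED_WRAPPER


sample = "2025-03-30 16:55:27 (1462): vboxwrapper version 26208"
print(wrapper_version(sample))
print(uses_correct_profile(sample))
```

Note that this only says which wrapper a single task used. Because the first wrapper to start wins, every vbox project running under the same user account has to be checked.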

A temporary workaround could be:
Replace the old vboxwrapper file with the recent one but keep the old name.
In the <options> section of cc_config.xml set <dont_check_file_sizes>1</dont_check_file_sizes>.
Then reload config files.

This shouldn't be used long term as you won't get automatic app updates for all projects any more.
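For the cc_config.xml part of the workaround, here is a sketch that sets the flag with Python's standard XML library instead of editing the file by hand. It works on a string so it can be tried without touching a real BOINC data directory; reading and writing the actual file, and reloading the config afterwards, are left out. The function name is made up for the example.

```python
import xml.etree.ElementTree as ET


def enable_dont_check_file_sizes(xml_text):
    """Return cc_config XML with <dont_check_file_sizes>1 set in <options>."""
    root = ET.fromstring(xml_text)
    if root.tag != "cc_config":
        raise ValueError("root element must be <cc_config>")
    # Create the <options> section if the file does not have one yet.
    options = root.find("options")
    if options is None:
        options = ET.SubElement(root, "options")
    # Reuse an existing flag element rather than adding a duplicate.
    flag = options.find("dont_check_file_sizes")
    if flag is None:
        flag = ET.SubElement(options, "dont_check_file_sizes")
    flag.text = "1"
    return ET.tostring(root, encoding="unicode")


example = "<cc_config><options></options></cc_config>"
updated = enable_dont_check_file_sizes(example)
print(updated)
```

Setting the element text back to 0 (or removing the element) undoes the change once the project ships a current vboxwrapper.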

Theory 300.93 (vbox64) for MacOS

Test4Theory

3 months ago

This version uses vboxwrapper 26210.
Please report issues here.

Theory task fail "finished with status code 1"

Test4Theory

3 months ago

It seems I have the same issue on an Intel iMac:

<core_client_version>8.0.2</core_client_version>
<![CDATA[
<message>
process exited with code 1 (0x1, -255)</message>
<stderr_txt>
2025-03-30 16:55:27 (1462): vboxwrapper version 26208
2025-03-30 16:55:27 (1462): BOINC client version: 8.0.2
2025-03-30 16:55:27 (1462): Detected: VirtualBox VboxManage Interface (Version: 7.1.6)
2025-03-30 16:55:27 (1462): Detected: Sandbox Configuration Enabled
2025-03-30 16:55:27 (1462): WARNING: Communication with VM Hypervisor failed.
2025-03-30 16:55:27 (1462): ERROR: VBoxManage list hostinfo failed
2025-03-30 16:55:27 (1462): called boinc_finish(1)

</stderr_txt>
]]>

https://lhcathome.cern.ch/lhcathome/result.php?resultid=420829321

New Version v300.90

Test4Theory

3 months ago

Thanks. Now I need to figure out the difference ... good luck :-)

New Version v300.80

Test4Theory

3 months 2 weeks ago

[edit] The 10-day estimate is dropping quite fast. Let's hope it reaches realistic values before the task finishes.

Most tasks from the newest version seem to complete in 2 hours or less, though I do have one right now that's been running for 5-1/2 hours.
The runtime estimate stopped dropping quite soon and stabilized at run time + time left = 240 hours.

Docker tasks crash on lhcathomedev

Test4Theory

3 months 2 weeks ago

It's not working yet; you can see the developers discussing things like this on GitHub.

New Version v300.70

Test4Theory

3 months 2 weeks ago

The latest version seems to work now, but what surprises me is what the log shows after testing the local proxy:

[INFO] /sbin/bootstrap: line 41: nc: command not found
[INFO] 127
[INFO] Local proxy can't be contacted and and will be ignored

How come?
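One possible reading of the log above: in POSIX shells, 127 is the exit status returned when a command cannot be found. If the bare "127" is the exit status of the failed nc call (an assumption; the bootstrap script itself is not shown here), then the proxy test never ran, and the script would report the proxy as unreachable regardless of its real state. The sketch below only demonstrates the shell convention. The command name in it is deliberately nonexistent, and the helper exit_status is made up for the example.

```python
import subprocess


def exit_status(command):
    """Run a command line through sh and return its exit status."""
    result = subprocess.run(
        ["sh", "-c", command],
        capture_output=True,  # keep the shell's error message off the console
        text=True,
    )
    return result.returncode


# Hypothetical command name, chosen so that it does not exist on the system.
status = exit_status("definitely_not_a_real_command_xyz")
print(status)
```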