GNU Parallel - Bugs: bug #66063, Flag to change jobslots number...

 
 

bug #66063: Flag to change jobslots number upon job failure

Submitter:  LGTR <lgtr>
Submitted:  Mon 05 Aug 2024 02:08:57 PM UTC
   
 
Category: None
Severity: 3 - Normal
Item Group: None
Status: None
Privacy: Public
Assigned to: None
Open/Closed: Open
   

Fri 30 Aug 2024 12:26:05 PM UTC, comment #4: 

--jobs 10auto would record the current --jobs value when each job starts. If the job fails and the current --jobs value is already lower than the recorded one, then do not lower --jobs again. This is to avoid lowering too fast if --jobs starts way too high:

If the optimal value is 12 and we start at 100, then 6 failures will lower --jobs to about 11.8 (100*0.7*0.7*0.7*0.7*0.7*0.7). But if the first 10 jobs fail, then we will hit 2.8, and it will take much longer to ramp back up to 12.

So instead when a job fails, only lower --jobs if the current --jobs is the same as when the job started.
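
A minimal Python sketch of the guard described above, not GNU parallel's actual code. The function name `on_job_failure`, the float-valued slot count, and the floor at 1 are assumptions made for illustration; the 0.7 factor is the 30% reduction from the proposal.

```python
def on_job_failure(current_jobs, jobs_at_start, factor=0.7):
    """Return the new --jobs value after a job fails (hypothetical helper).

    The value is only lowered if --jobs has not changed since the
    failing job was started.
    """
    if current_jobs == jobs_at_start:
        return max(1.0, current_jobs * factor)
    return current_jobs  # already lowered by an earlier failure


# Ten jobs were all started while --jobs was 100, and all ten fail.
guarded = 100.0
for _ in range(10):
    guarded = on_job_failure(guarded, jobs_at_start=100.0)

# Same ten failures without the guard: every failure lowers --jobs.
unguarded = 100.0
for _ in range(10):
    unguarded = max(1.0, unguarded * 0.7)

print(round(guarded, 1))    # 70.0
print(round(unguarded, 1))  # 2.8
```

With the guard, the burst of ten failures counts as one step down; later failures of jobs started at the new value would lower it further.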


Anonymous
Fri 30 Aug 2024 09:51:17 AM UTC, comment #3: 

Should it rather be like --delay 10auto?

> If you append ’auto’ to duration (e.g. 13m3sauto) GNU parallel will automatically try to find the optimal value: If a job fails, duration is increased by 30%. If a job succeeds, duration is decreased by 10%.


So:

> --jobs 10auto

If a job dies, lower -j by 30%. If a job succeeds, raise -j by 10%. If you hit 1, you stay at 1 forever.
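
The rule above could be sketched roughly like this in Python. This only simulates the proposal; `adjust_jobs` is a hypothetical name, and keeping the slot count as a float (to be rounded when jobs are actually started) is an assumption.

```python
def adjust_jobs(jobs, succeeded):
    """Apply the proposed 'auto' rule to a jobslot count (hypothetical helper)."""
    if jobs <= 1:
        return 1.0  # once at 1, stay at 1 forever
    if succeeded:
        return jobs * 1.1  # raise by 10%
    return max(1.0, jobs * 0.7)  # lower by 30%, never below 1


jobs = 10.0
history = []
for succeeded in [False, True, True]:
    jobs = adjust_jobs(jobs, succeeded)
    history.append(round(jobs, 2))

print(history)  # [7.0, 7.7, 8.47]
```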

Anonymous
Mon 05 Aug 2024 05:21:41 PM UTC, comment #2: 

Maybe:

    --jobs 8:3:1

8 jobslots, until first error. Then retry with 3, until next error. Then retry with 1.

    --jobs 100%:50%:30%:1

100% jobslots, until first error. Then retry with 50%, until next error. Then retry with 30%, until next error. Then retry with 1.
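
A rough Python sketch of how such a step list might be interpreted. The colon-separated syntax is only the suggestion from this comment; `parse_steps`, `step_down`, the `cores` parameter, and rounding percentages down are assumptions for illustration.

```python
def parse_steps(spec, cores):
    """Turn a spec like '100%:50%:30%:1' into a list of jobslot counts."""
    steps = []
    for part in spec.split(":"):
        if part.endswith("%"):
            # Percentage of the available cores, rounded down (assumption).
            slots = cores * int(part[:-1]) // 100
        else:
            slots = int(part)
        steps.append(max(1, slots))  # never go below one jobslot
    return steps


def step_down(steps, index):
    """Return the step index to use after an error; the last step is final."""
    return min(index + 1, len(steps) - 1)


steps = parse_steps("100%:50%:30%:1", cores=8)
print(steps)  # [8, 4, 2, 1]

index = 0
index = step_down(steps, index)  # first error: move to the second step
print(steps[index])  # 4
```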

What do we do about already running jobs? If these fail, they will trigger the step down, too. Probably easiest to kill all running jobs when the failure happens (similar to --memfree).

Restart of a killed job should not affect --retries (similar to --memfree's new behaviour).


Anonymous
Mon 05 Aug 2024 05:03:43 PM UTC, comment #1: 

This makes sense if you have a bunch of jobs and do not know whether they can be run in parallel.

So first try running them in parallel. If that fails, revert to running them serially.

Sort of like trying `make -j` and, if that fails (because the dependency graph was not tested with a parallel build), trying `make` instead.

Would it make sense to try running the jobs in "less" parallel first? E.g. set --jobs = --jobs/2 and see if that works, repeating until you hit --jobs = 1.

Can it be a natural extension of an existing option (maybe --halt or --retries)?
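
As a small illustration of the halving idea, here is a Python sketch that lists the jobslot values that would be tried. `halving_schedule` is a hypothetical name, and rounding down on odd values is an assumption.

```python
def halving_schedule(jobs):
    """List the jobslot counts tried when halving after each failure."""
    schedule = [jobs]
    while jobs > 1:
        jobs = max(1, jobs // 2)  # integer halving, floor at 1
        schedule.append(jobs)
    return schedule


print(halving_schedule(8))  # [8, 4, 2, 1]
print(halving_schedule(5))  # [5, 2, 1]
```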



Anonymous
Mon 05 Aug 2024 02:08:57 PM UTC, original submission:  

If -j > 1, add an option to auto-switch to -j1 when any of the jobs fails, either before or after exhausting the retries.

LGTR <lgtr>

 


 

Depends on the following items: None found

Items that depend on this one: None found

 
