Poor man's parallel in Bash

Original post on my blog, happy to include feedback!
Cover: Paris bibliothèques, via Clawmarks

Topics:

  1. Running scripts in parallel
  2. Tools to limit concurrent jobs
  3. Shell process handling cheatsheet
  4. Limit concurrent jobs with Bash
  5. Bonus: One-liners

1) Running scripts in parallel

does not take much effort: I've been speeding up builds by running commands simultaneously with an appended ampersand (&):

# stuff can happen concurrently
# use `&` to run each command as a background job
cmd1 &
cmd2 &
cmd3 &
# wait on the background jobs
wait
# these need to happen sequentially
cmd4
cmd5

echo Done!

Job control is a shell feature: each command is put into a background process, and they all run at the same time.

Now assuming you want to loop over more than a few commands, e.g. converting files:

for file in *.jpg; do
    # start optimizing every file at once
    jpegoptim -m 45 "${file}" &
done
# finish queue
wait

Running a lot of processes this way is still faster than a regular loop. But beyond a handful of concurrent jobs there are no further speed gains – possibly even slowdowns once the processes contend for disk I/O [citation needed].

So you'll want to use

2) Tools to limit concurrent jobs

by either 1) installing dedicated tools like parallel or xjobs, or 2) relying on xargs, which is feature-rich but more complicated.

Transforming wait code to xargs is described here: an example for parallel batch jobs. The article notes small differences between POSIX flavours – e.g. different handling of separators on BSD/macOS.
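
For comparison, here is roughly what the limited-concurrency loop looks like with those tools – a sketch assuming GNU parallel and an xargs that supports the widespread (but non-POSIX) -P flag:

# GNU parallel: one job per file, defaults to one job per CPU core
parallel jpegoptim -m 45 ::: *.jpg

# xargs: NUL-separated file list, at most 3 concurrent jobs (-P 3)
printf '%s\0' *.jpg | xargs -0 -n 1 -P 3 jpegoptim -m 45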

We'll be choosing option 3) – digging into features of wait and jobs to manage processes.

Quoting this great summary, here are some example commands for

3) Shell process handling

# run child process, save process id via `$!`
cmd3 & pid=$!
# get job list
jobs
# get job ids only
# note: not available on zsh
jobs -p
# wait only on the job at position `n`
# note: slot `n` may already be empty while
#       newer jobs sit at the tail of the queue
wait %n
# wait on last job in list
wait %%
# wait on next finishing process
# note: needs Bash 4.3
wait -n
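
To see these in action, here's a quick demo using sleep as a stand-in for real work (wait -n needs Bash 4.3+):

# three jobs of different lengths
sleep 3 & sleep 1 & sleep 2 &

wait -n   # returns after ~1s, as soon as the fastest job exits
jobs      # the two longer jobs are still listed as Running
wait      # blocks until everything is done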

Taking our example from before, we make sure to

4) Limit concurrent jobs with Bash

each time a process finishes, using wait -n:

for file in *.jpg; do

    jpegoptim -m 45 "${file}" &

    # fewer than 3 jobs in the list? continue the loop
    if [[ $(jobs|wc -l) -lt 3 ]]; then continue; fi

    # at 3 jobs: wait for the next one to finish, then loop
    wait -n

done
# finish queue
wait
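
The hard-coded limit of 3 is arbitrary; a variant that scales it to the number of CPU cores might look like this (nproc comes with GNU coreutils, sysctl -n hw.ncpu is the macOS/BSD counterpart):

# match the job limit to the core count, falling back to 3
max_jobs=$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 3)

for file in *.jpg; do
    jpegoptim -m 45 "${file}" &
    if [[ $(jobs|wc -l) -lt $max_jobs ]]; then continue; fi
    wait -n
done
# finish queue
wait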

Sadly, this won't work on macOS, whose bundled Bash is frozen at the old version 3.2 for licensing reasons. We replace the wait -n command with wait %%, waiting on the last job in the queue – an acceptable compromise for small groups (with 3 jobs, a 1/3 chance each that we wait on the fastest, slowest, or medium job):

for file in *.jpg; do

    jpegoptim -m 45 "${file}" &

    # fewer than 3 jobs in the list? continue the loop
    if [[ $(jobs|wc -l) -lt 3 ]]; then continue; fi

    # at 3 jobs: wait for the last in line, then loop
    wait %%

done
# finish queue
wait

To further develop the code, one could check the Bash version – or detect alternative shells like zsh on macOS – and switch strategies depending on context.
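
A minimal sketch of that idea, using a hypothetical wait_one helper (wait -n appeared in Bash 4.3):

# pick a wait strategy for the running Bash version
if (( BASH_VERSINFO[0] > 4 || (BASH_VERSINFO[0] == 4 && BASH_VERSINFO[1] >= 3) )); then
    wait_one() { wait -n; }   # Bash 4.3+: wait for whichever job exits first
else
    wait_one() { wait %%; }   # older Bash: wait for the last job in the list
fi

for file in *.jpg; do
    jpegoptim -m 45 "${file}" &
    if [[ $(jobs|wc -l) -lt 3 ]]; then continue; fi
    wait_one
done
wait

I keep using these: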

5) Bonus: One-liners

# sequential, slow
time ( for file in *.jpg; do jpegoptim -m 45 "${file}" ; done )

# concurrent, messy
time ( for file in *.jpg; do jpegoptim -m 45 "${file}" & done; wait )

# concurrent, fast/compatible
time ( for file in *.jpg; do jpegoptim -m 45 "${file}" & if [[ $(jobs|wc -l) -lt 3 ]]; then continue; fi; wait %%; done; wait )

# concurrent, fastest
time ( for file in *.jpg; do jpegoptim -m 45 "${file}" & if [[ $(jobs|wc -l) -lt 3 ]]; then continue; fi; wait -n; done; wait )

Fun Fact

As the 20th birthday post by parallel author Ole Tange explains, the original version leveraged make, which can run jobs asynchronously as well (via its -j flag).