Poor man's parallel in Bash
Original post on my blog, happy to include feedback!
Cover: Paris bibliothèques, via Clawmarks
Topics:
- Running scripts in parallel
- Tools to limit concurrent jobs
- Shell process handling cheatsheet
- Limit concurrent jobs with Bash
- Bonus: One-liners
1) Running scripts in parallel
does not take much effort: I've been speeding up builds by running commands simultaneously with an added `&` ampersand:
# stuff can happen concurrently
# use `&` to run in a sub-shell
cmd1 &
cmd2 &
cmd3 &
# wait on sub-processes
wait
# these need to happen sequentially
cmd4
cmd5
echo Done!
Job control is a shell feature: each command is put into its own background process, so they all run at the same time.
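A quick way to see the effect for yourself – two 2-second sleeps finish in roughly two seconds of wall time, not four (a toy sketch):
# both sleeps run at once: `real` shows ~2s, not ~4s
time { sleep 2 & sleep 2 & wait; }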
Now assuming you want to loop over more than a few commands, e.g. converting files:
for file in *.jpg; do
# start optimizing every file at once
jpegoptim -m 45 "${file}" &
done
# finish queue
wait
Running a lot of processes this way is still faster than a regular sequential loop. But beyond a few concurrent jobs there are no further speed gains – and possibly even slowdowns once disk I/O becomes the bottleneck [citation needed].
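Before reaching for a limit, it helps to know what the limit should be: it usually tracks the number of CPU cores. A hedged sketch for deriving it dynamically instead of hard-coding it – `nproc` comes with GNU coreutils, the `sysctl` fallback covers BSD/macOS, and `max_jobs` is just an illustrative name:
# derive a job limit from the core count, default to 3
max_jobs=$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 3)
echo "running up to ${max_jobs} jobs"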
So you'll want to use
2) Tools to limit concurrent jobs
by either 1) installing dedicated tools like `parallel` or `xjobs`, or 2) relying on `xargs`, which is feature-rich but more complicated.
Transforming `wait` code to `xargs` is described here: an example for parallel batch jobs. The article notes small differences between POSIX flavours – e.g. different handling of separators on BSD/macOS.
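For reference, here is a hedged sketch of the same batch job throttled with those tools (3 is an arbitrary limit; both GNU and BSD `xargs` support `-P` and `-0`):
# GNU parallel: at most 3 jobs at a time
parallel -j 3 jpegoptim -m 45 ::: *.jpg
# xargs: NUL-separated input, one file per call, 3 processes
printf '%s\0' *.jpg | xargs -0 -n 1 -P 3 jpegoptim -m 45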
We'll be choosing option 3) – digging into the features of `wait` and `jobs` to manage processes.
Quoting this great summary, here are some example commands for
3) Shell process handling
# run child process, save process id via `$!`
cmd3 & pid=$!
# get job list
jobs
# get job ids only
# note: not available on zsh
jobs -p
# only wait on job at position `n`
# note: slots may turn up empty while
# newer jobs rest in the queue's tail
wait %n
# wait on last job in list
wait %%
# wait on next finishing process
# note: needs Bash 4.3
wait -n
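One more detail worth knowing: `wait` on a saved PID also returns that child's exit status, which makes error handling possible. A minimal sketch (`cmd1` is a placeholder):
# run a child, remember its PID
cmd1 & pid=$!
# `wait` passes the child's exit code through
if wait "${pid}"; then
  echo "cmd1 succeeded"
else
  echo "cmd1 failed"
fi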
Taking our example from before, we make sure to
4) Limit concurrent jobs with Bash
each time a process finishes, using `wait -n`:
for file in *.jpg; do
jpegoptim -m 45 "${file}" &
# still under the limit of 3 jobs? continue the loop
if [[ $(jobs|wc -l) -lt 3 ]]; then continue; fi
# at 3 jobs, wait for the next to finish, then loop
wait -n
done
# finish queue
wait
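The pattern generalizes into a small helper – a sketch under the same Bash 4.3+ assumption, where `throttle` and `MAX_JOBS` are made-up names:
# hypothetical helper: block while MAX_JOBS children are running
MAX_JOBS=3
throttle() {
  while [[ $(jobs | wc -l) -ge ${MAX_JOBS} ]]; do
    wait -n  # needs Bash 4.3+
  done
}
for file in *.jpg; do
  jpegoptim -m 45 "${file}" &
  throttle
done
wait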
Sadly, this won't work on macOS, which ships a Bash frozen at an old version (3.2, predating the `wait -n` from 4.3). We replace the `wait -n` command with `wait %%` to wait on the last job in the queue – an ok compromise for small groups, since the last-started job has roughly equal chances of being the fastest, slowest, or medium one (1/3 each):
for file in *.jpg; do
jpegoptim -m 45 "${file}" &
# still under the limit of 3 jobs? continue the loop
if [[ $(jobs|wc -l) -lt 3 ]]; then continue; fi
# at 3 jobs, wait for the last in line, then loop
wait %%
done
# finish queue
wait
To further develop the code, one could check the Bash version or for alternative shells (zsh is the default on modern macOS) and switch strategies depending on context.
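A sketch of such a switch, using Bash's built-in `BASH_VERSINFO` array (the surrounding loop is assumed):
# does this Bash know `wait -n` (introduced in 4.3)?
if (( BASH_VERSINFO[0] > 4 || (BASH_VERSINFO[0] == 4 && BASH_VERSINFO[1] >= 3) )); then
  wait -n   # wait on the next job to finish
else
  wait %%   # fallback: wait on the last job in the list
fi
I keep using these: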
5) Bonus: One-liners
# sequential, slow
time ( for file in *.jpg; do jpegoptim -m 45 "${file}" ; done )
# concurrent, messy
time ( for file in *.jpg; do jpegoptim -m 45 "${file}" & done; wait )
# concurrent, fast/compatible
time ( for file in *.jpg; do jpegoptim -m 45 "${file}" & if [[ $(jobs|wc -l) -lt 3 ]]; then continue; fi; wait %%; done; wait )
# concurrent, fastest
time ( for file in *.jpg; do jpegoptim -m 45 "${file}" & if [[ $(jobs|wc -l) -lt 3 ]]; then continue; fi; wait -n; done; wait )
Fun Fact
As the 20th birthday post by `parallel` author Ole Tange explains, the original version leveraged `make`, because it allows running processes asynchronously as well.