Chen Binbin's Tech Blog

Stay foolish, stay hungry

GNU Parallel


GNU parallel is a shell tool for executing jobs in parallel using one or more computers. A job can be a single command or a small script that has to be run for each of the lines in the input. The typical input is a list of files, a list of hosts, a list of users, a list of URLs, or a list of tables. A job can also be a command that reads from a pipe; GNU parallel can then split the input and pipe the pieces into commands in parallel.

If you use xargs and tee today, you will find GNU parallel very easy to use, as GNU parallel is written to have the same options as xargs. If you write loops in the shell, GNU parallel may be able to replace most of them and make them run faster by running several jobs at once.
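
The pipe mode described above (the --pipe option in GNU parallel) can be pictured with a small Python sketch. This is only an illustration of the idea, not how GNU parallel is implemented. The helper names chunk_lines, run_on_block and pipe_parallel and the lines_per_block parameter are hypothetical. GNU parallel itself cuts the input by block size in bytes (--block) at line boundaries, not by a fixed line count. In the sketch, each block of whole lines is written to the stdin of its own process, and at most jobs processes run at the same time.

```python
"""Sketch of pipe mode: split input into blocks and pipe each block into its own process.

Hypothetical helpers, loosely mimicking `cat file | parallel --pipe wc -l`.
"""
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def chunk_lines(lines, lines_per_block):
    """Group input lines into blocks of at most `lines_per_block` whole lines."""
    for start in range(0, len(lines), lines_per_block):
        yield lines[start:start + lines_per_block]


def run_on_block(cmd, block):
    """Feed one block to `cmd` on stdin and return everything it printed."""
    result = subprocess.run(cmd, input="".join(block), capture_output=True,
                            text=True, check=True)
    return result.stdout


def pipe_parallel(cmd, lines, lines_per_block=2, jobs=4):
    """Run `cmd` once per block, at most `jobs` at a time; results keep block order."""
    blocks = list(chunk_lines(lines, lines_per_block))
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(lambda block: run_on_block(cmd, block), blocks))


if __name__ == "__main__":
    # Stand-in for `wc -l`: count the lines received on stdin.
    count_lines = [sys.executable, "-c",
                   "import sys; print(sum(1 for _ in sys.stdin))"]
    data = [f"line {i}\n" for i in range(5)]
    for out in pipe_parallel(count_lines, data):
        print(out, end="")
```

Run as a script, it prints 2, 2 and 1, which are the line counts of the three blocks. They appear in block order because ThreadPoolExecutor.map returns results in submission order, even when a later block finishes first.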

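GNU parallel makes sure the output from the commands is the same output you would get had you run the commands sequentially. By default, each job's output is printed as one unit when that job finishes, so output from different jobs is never mixed. The -k (--keep-order) option additionally prints the outputs in the order of the input. This makes it possible to use output from GNU parallel as input for other programs.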
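
The Python sketch below makes this ordering guarantee concrete. The helpers job and run_grouped are hypothetical and are not GNU parallel's code. Each job's complete output is captured before it is emitted, so the lines of two jobs can never interleave. With keep_order=False, the captured outputs come back in completion order, which corresponds to GNU parallel's default grouping. With keep_order=True, they come back in input order, as with -k.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor, as_completed


def job(delay, name):
    """A job that prints two lines with a pause in between.

    If outputs were not grouped, another job's lines could land between these two.
    """
    script = (
        "import time\n"
        f"print('{name}: start', flush=True)\n"
        f"time.sleep({delay})\n"
        f"print('{name}: done', flush=True)\n"
    )
    return subprocess.run([sys.executable, "-c", script],
                          capture_output=True, text=True, check=True).stdout


def run_grouped(specs, jobs=3, keep_order=False):
    """Run every (delay, name) job in parallel, capturing each job's whole stdout.

    keep_order=False: outputs in completion order (like the default grouping).
    keep_order=True: outputs in input order (like -k / --keep-order).
    """
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        futures = [pool.submit(job, delay, name) for delay, name in specs]
        if keep_order:
            return [f.result() for f in futures]
        return [f.result() for f in as_completed(futures)]


if __name__ == "__main__":
    specs = [(0.3, "a"), (0.1, "b"), (0.2, "c")]
    # "a" finishes last, but with keep_order its output is still emitted first.
    for chunk in run_grouped(specs, keep_order=True):
        print(chunk, end="")
```

Because every output is an intact unit, the only difference between the two modes is the order in which those units are emitted.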

For each line of input, GNU parallel executes the command with that line as its argument; the whole line is passed as a single argument, so spaces in it are preserved. If no command is given, the line of input itself is executed as a command. Several lines are run in parallel. GNU parallel can often be used as a substitute for xargs or cat | bash.
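
The two modes in this paragraph can be sketched in Python as follows. The names run_line and parallel_lines are hypothetical. The no-command case assumes a POSIX shell (/bin/sh), with subprocess and shell=True playing the role of bash in cat | bash. This is an illustration of the behaviour, not GNU parallel's implementation.

```python
import shlex
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_line(line, command=None):
    """Run one job for one input line and return its stdout.

    With a command: the whole line is appended as ONE argument (spaces kept).
    Without a command: the line itself is run as a shell command line (POSIX sh assumed).
    """
    line = line.rstrip("\n")
    if command:
        proc = subprocess.run(command + [line], capture_output=True, text=True)
    else:
        proc = subprocess.run(line, shell=True, capture_output=True, text=True)
    return proc.stdout


def parallel_lines(lines, command=None, jobs=4):
    """Run one job per input line, at most `jobs` at once; outputs in input order."""
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        return list(pool.map(lambda ln: run_line(ln, command), lines))


if __name__ == "__main__":
    # With a command: print the single argument each job received.
    show_arg = [sys.executable, "-c", "import sys; print(repr(sys.argv[1]))"]
    print(parallel_lines(["my file.txt\n", "other.txt\n"], show_arg))

    # Without a command: each input line is itself a command, as with `cat jobs | bash`.
    py = shlex.quote(sys.executable)
    commands = [f"{py} -c 'print(6 * 7)'\n", "echo hi\n"]
    print(parallel_lines(commands))
```

Note that "my file.txt" reaches the command as one argument. A plain xargs would split it at the space by default.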

Resource Reference