The awesome linux bash − Geekingfrog's blog

I'm a big fan of Unix and bash: the idea of having tons of small tools that can be combined to achieve complex results is really neat. Recently I had to download a big list of files from Amazon S3. I ended up doing:

cat <(cat list_of_file.txt) <(ls downloaded_files) | sort | uniq -u | xargs -I {} -P6 aws s3 cp s3://bucket_name/sub/directory/full/of/files/{} ./downloaded_files
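One way to try this pipeline without touching S3 is sketched below. It assumes a throwaway directory with made-up file names, and bucket_name and the S3 path are just the placeholders from the command above. The cat/sort/uniq -u stage is the same, but echo is placed in front of aws, so xargs prints each command instead of running it.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Throwaway directory with made-up names: three files listed, one already downloaded.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
printf '%s\n' a.txt b.txt c.txt > list_of_file.txt
mkdir downloaded_files
touch downloaded_files/a.txt

# Same pipeline, but "echo" makes xargs print each aws command instead of running it.
# Parallel jobs can finish in any order, so the printed commands are sorted.
planned=$(cat <(cat list_of_file.txt) <(ls downloaded_files) | sort | uniq -u |
  xargs -I {} -P6 echo aws s3 cp s3://bucket_name/sub/directory/full/of/files/{} ./downloaded_files |
  sort)
printf '%s\n' "$planned"
```

Only b.txt and c.txt should show up, because a.txt is already in downloaded_files. Removing the echo turns it back into the real download.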

Some explanations (for my future self):

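<(...) is bash syntax for process substitution: bash runs the command inside the parentheses and replaces <(...) with a file name (something like /dev/fd/63) from which that command's output can be read. The outer command therefore reads the inner command's output as if it were a regular file.
cat <(...) <(...) | sort | uniq -u outputs the files from the list which are not already downloaded: after sorting, a name that is both in the list and in downloaded_files appears twice in a row, and uniq -u keeps only lines that occur exactly once. Note that a file present in downloaded_files but missing from the list would be printed too.
xargs -I {} -P6 [cmd] runs the given command once per input line, replacing {} with that line, and keeps up to 6 of these commands running in parallel.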
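Below is a small sketch of the first two points. It uses made-up names instead of real S3 keys. It shows that <(...) hands the outer command a file name, and that sort | uniq -u keeps names appearing on exactly one side. comm -23 is not part of the original command; it is included only as one way to keep just the names missing from the downloaded side.

```bash
#!/usr/bin/env bash
set -euo pipefail

# <(...) expands to a file name; the exact path (e.g. /dev/fd/63) varies by system.
echo "the outer command receives:" <(true)

# The outer command just opens those names, so cat reads both outputs as if they were files.
combined=$(cat <(printf 'a\nb\n') <(printf 'c\n'))

# Made-up data: the list has a, b, c; "downloaded" has a plus extra (not in the list).
list=$'a.txt\nb.txt\nc.txt'
downloaded=$'a.txt\nextra.txt'

# sort | uniq -u keeps lines occurring exactly once across both inputs,
# so it also reports extra.txt, which exists only on the downloaded side.
via_uniq=$(cat <(echo "$list") <(echo "$downloaded") | sort | uniq -u)

# comm -23 on two sorted inputs keeps only lines unique to the first one.
via_comm=$(comm -23 <(echo "$list" | sort) <(echo "$downloaded" | sort))

printf 'uniq -u:\n%s\ncomm -23:\n%s\n' "$via_uniq" "$via_comm"
```

This only makes a difference when downloaded_files contains something that is not in the list. Otherwise both forms print the same names.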

For more resources:

Command-line tools can be faster than a Hadoop cluster
The Art of Command Line: there is so much in this document that it's well worth multiple reads.
The Mighty Named Pipe explains named pipes and process substitution. I rarely use them, but they're super powerful.