## Shell Scripting ############################################################

> ACM 2018-11-07

==== Topics ===================================================================

------ Introductory -----------------------------------------------------------

* return codes
* if
* while
* for
* command substitution
* re-direction
* useful tools
    * convert
    * pandoc
    * ps
    * lsof
    * sed basics (s///g)
    * awk basics (field extraction)
    * grep
    * find

------ Intermediate -----------------------------------------------------------

* check if a command is in $PATH
* lockdir
* background jobs
* doing math in shell scripts
* xargs
* set -u, set -e, set -x

------ Advanced ---------------------------------------------------------------

* abusing /proc for fun and profit
* path expansions and YOU
* portable path normalization: why we can't have nice things

==== Emperor SH and the Traveler ==============================================

A young traveller came to visit Emperor Sh, and found him sitting in his
sparsely furnished temple.

"Emperor Sh," he said, "I am told you are the greatest scholar of shell that
the world has known."

Emperor Sh made no reply.

The traveller continued. "I have come to ask your advice. I am thinking of
developing a character-based graphing tool. It will interactively change the
plot based on key presses. What shell commands should I use?"

"Don't do it in shell," said Emperor Sh, curtly.

The young traveller was confused. He tried again. "Well, I am also working on
a database audit script. It needs to validate that certain characters do not
appear in any fields in several tables."

"Don't do it in shell," repeated Emperor Sh.

The traveller began to despair. "I have journeyed one thousand miles, tried
thirteen distributions of my operating system, and waded through hundreds of
badly-written manual pages," he cried, "and now I am finally come to Emperor
Sh, the greatest shell programmer in the world, and am told to use no shell
at all!
I may as well do what my friends told me, and just string together some small
Python scripts in virtualenv environments!"

"Good idea," said Emperor Sh. "Do it in shell."

Enlightenment crushed down on the young traveller and he bowed to the
emperor, sobbing with reverent joy.

> Source: https://sanctum.geek.nz/etc/emperor-sh-and-the-traveller.txt

==== Summary ==================================================================

------ Return Codes -----------------------------------------------------------

* the return code of the most recent command is stored in $?
* 0 means success, anything nonzero means failure

------ Flow Control -----------------------------------------------------------

    if expression ; then
        command
    fi

Common test expressions:

    [ -f "$something" ]       # true if $something is a regular file
    [ -z "$somevar" ]         # true if $somevar is empty
    [ "thing" = "thing2" ]    # true if the two strings are equal

    while expression ; do
        command
    done

    for varname in list ; do
        command
    done

------ Command Substitution ---------------------------------------------------

* Substitute the output of a command into another command or string:

    echo "the current date is $(date)"

------ Re-Direction -----------------------------------------------------------

Re-direct standard in:

    command < stdin.txt

Standard out:

    command > stdout.txt

or, equivalently:

    command 1> stdout.txt

Standard error:

    command 2> stderr.txt

All at once:

    command < input.txt > stdout.txt 2> stderr.txt

Merge stdout and stderr (the order matters: 2>&1 must come after the
re-direction of stdout):

    command > output.txt 2>&1

stdout of one command to stdin of another:

    command1 | command2

stdout of one command passed to another as a file name (process
substitution; supported by bash, zsh and ksh, but not by POSIX sh):

    command1 <(command2)

------ Useful Tools -----------------------------------------------------------

~~~~~~~~ ImageMagick (convert) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Convert allows you to perform a wide variety of transformations and
conversions on various image formats. For example:

    convert foo.jpg foo.png

This converts a JPEG image to a PNG.
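Tying this back to the Return Codes and Flow Control notes: convert, like any other command, reports success or failure through its exit status, so a script can test it directly with if. Below is a minimal sketch, assuming ImageMagick's convert is on $PATH; convert_or_warn is a hypothetical helper name, not part of ImageMagick.

```shell
# Hypothetical helper (not part of ImageMagick): convert $1 into $2, report a
# failure on stderr, and hand convert's exit status back to the caller.
convert_or_warn() {
    if convert "$1" "$2" 2> /dev/null ; then
        return 0
    else
        # At the start of the else branch, $? still holds the exit status of
        # the convert command that the if just tested.
        status=$?
        echo "convert failed on $1 (exit status $status)" >&2
        return "$status"
    fi
}

# Usage: a failed conversion does not have to end the script.
convert_or_warn foo.jpg foo.png || echo "continuing without foo.png"
```

Because the wrapper passes convert's status through unchanged, callers can keep chaining it with && and || like any other command. Note that status is an ordinary global variable here, since POSIX sh has no local.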
>>> More here: http://www.imagemagick.org/Usage/

Combining this with the loops we discussed already:

    for f in *.jpg ; do
        [ -e "$f" ] || continue    # no matches: skip the literal '*.jpg'
        convert "$f" "$(basename "$f" .jpg).png"
    done

This converts every JPEG (*.jpg) in the current directory to a PNG with the
same name but a changed extension. *.jpg is a path expansion, which we will
discuss later. "$(basename "$f" .jpg)" is a command substitution, as covered
above; basename strips the .jpg suffix from the file name.

~~~~~~~~ pandoc ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Pandoc allows one to convert nearly any type of document to nearly any other
type. Not always very well, though (jack of all trades, master of none). For
example:

    pandoc README.md --output README.pdf --from markdown --to latex

This converts README.md to a PDF via LaTeX.

>>> More examples here: https://pandoc.org/demos.html

~~~~~~~~ ps ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

ps lists running processes. It is available across POSIX systems, although
its options and output format vary in some cases. Usually, you want
"ps aux", which lists every process.

Hint: you can get a list of just PIDs and process names with:

    ps aux | awk '{print($2, $11);}'

Note that this is not robust against commands with spaces in the name, and
$11 holds only the first word of each command line.

~~~~~~~~ lsof ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

lsof lists all open file handles and sockets. It is available on Linux and
many other Unix-like systems (e.g. macOS and the BSDs), but it is not part of
POSIX and is not always installed. Useful for figuring out which process has
a file open (e.g. lsof | grep "somefile").

~~~~~~~~ sed ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

sed exposes most of the functionality of the ed text editor in an automated,
stream-oriented interface. The capabilities of sed are expansive[1]; we'll
just cover the basics today.
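sed inherits ed's addressed commands, so substitution is only one of its tricks. As a small taste, here is a sketch of two other POSIX sed commands, p (print) and d (delete); print_line_range and strip_comment_lines are hypothetical helper names used only for illustration.

```shell
# Hypothetical helper: print only lines $1 through $2 of standard input.
# -n turns off sed's automatic printing, and the "p" command then prints
# just the addressed range.
print_line_range() {
    sed -n "${1},${2}p"
}

# Hypothetical helper: drop every line that starts with '#'.
# "/^#/" addresses the lines matching that regex, and "d" deletes them.
strip_comment_lines() {
    sed '/^#/d'
}

printf 'one\ntwo\nthree\nfour\n' | print_line_range 2 3
printf '# a comment\necho hi\n' | strip_comment_lines
```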
The most common use for sed is re-writing patterns in streams, for example:

    printf 'line 1\nline 2\nline 3\n' | sed 's/line/number/g'

This example outputs:

    number 1
    number 2
    number 3

(The g flag replaces every match on a line rather than only the first.)

>>> See also the famous sed1line.txt: http://sed.sourceforge.net/sed1line.txt

>>> Danger: GNU sed and BSD sed are different and often incompatible, be
    careful out there!

1: https://aurelio.net/projects/sedarkanoid/

~~~~~~~~ awk ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Awk is an extremely powerful record-processing oriented DSL. The full depth of
awk is beyond the scope of this talk. For our purposes, it is most useful for
extracting fields from columnar data separated by irregular runs of
whitespace (e.g. the output of ps aux).

    awk '{print($1, $2, $3);}'

This extracts columns 1, 2 and 3 (add more $N terms for more columns). You can
change the order, separator, or selection to your liking.

>>> I highly suggest "The Awk Programming Language":
    https://www.amazon.com/Programming-Language-Kernighan-Weinberger-Paperback/dp/B00LLOFNOW
    (hint: PDFs are relatively easy to find on the net)

~~~~~~~~ grep ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Grep is used to search for patterns based on regular expressions.

Select every line containing foo from a file:

    grep "foo" < file.txt

Find every file recursively under . containing the string foo (add -l to
print only the file names):

    grep -R "foo" ./

Note that some versions of grep support PCREs via the -P flag; this is not
portable. Instead, just invoke perl directly:

    grep -P "[a-z]{4}"

is equivalent to:

    perl -ne 'print if /[a-z]{4}/'

(This particular pattern is also a valid POSIX extended regular expression,
so grep -E "[a-z]{4}" works as well.)

~~~~~~~~ find ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Find can be used to search for files by their name or path.

Find all files containing the string foo in the file name:

    find . -name "*foo*"

Use -iname for case-insensitive matching, and -path/-ipath to match against
the whole path rather than just the file name.

If you want to use find as the input to a loop, prefer -print0 where it is
supported (e.g. GNU and BSD find): it separates names with NUL bytes, which
cannot appear in file names, so names containing spaces or newlines survive
intact. Common idioms are:

    find . -iname "pattern" -print | while IFS= read -r line ; do something ; done

or:

    find . -iname "pattern" -print0 | xargs -0 somecommand

You can also use find -exec, but this is generally not preferred as it is less
flexible (although find . -iname "pattern" -exec somecommand {} + is POSIX and
handles any file name safely).

------ Check if a Command is in $PATH -----------------------------------------

Often, we want to check if a specific command is available for use; this is
especially useful for sanity checking in scripts. A common idiom is:

    if [ ! -x "$(which coolcommand 2> /dev/null)" ] ; then
        echo "oops, looks like coolcommand isn't installed!" >&2
        exit 1
    fi

Note that which itself is not specified by POSIX, and its behavior varies
between systems; command -v is specified by POSIX and is usually the more
portable choice (e.g. if ! command -v coolcommand > /dev/null 2>&1 ; then ...).

$(which coolcommand 2> /dev/null) gets the path to the coolcommand executable
found in $PATH. The 2> /dev/null discards any error message. If no such
command is found, the test simplifies to [ ! -x "" ], which is always true.

------ lockdirs ---------------------------------------------------------------

Often, we want to guarantee that only one instance of our program is running
at any given moment. There are many ways to accomplish this, but the simplest
is a lockdir.

A lockdir is the convention of creating a specific directory when the program
starts, and deleting it when it exits. This is preferred over creating a file
(e.g. with touch), since mkdir fails if the directory already exists, so
checking for the lock and taking it happen in a single atomic step. Here is an
example:

    mkdir -p "/var/lockdir"    # ensure /var/lockdir exists
    # Test and create in one step; a separate [ -d ... ] check followed by
    # mkdir would leave a window for two instances to start at once.
    if ! mkdir "/var/lockdir/myprog" 2> /dev/null ; then
        echo "oops, myprog already running!" >&2
        exit 1
    fi
    trap 'rmdir "/var/lockdir/myprog"' EXIT    # release the lock on exit

Note that this is still not robust against the program being killed (e.g.
with SIGKILL) before it removes the lockdir; a stale lockdir then has to be
removed by hand.

------ Background Jobs --------------------------------------------------------

Any command you suffix with & will be run in the background. You can get a
list of jobs you have backgrounded with the "jobs" command.

A common idiom is:
A common idiom is: somecommand > /tmp/output.txt & # do other things wait # block until all background jobs exit # do things with output.txt You can background a running program (i.e. vim) via . You can later foreground the job via the "fg" command. This is useful for the workflow of edit->run->edit. You can also detach a background job from your shell session with disown (i.e. exiting your terminal will not halt the program). Example: atril something.pdf & disown exit Would result in atril continuing to run, despite the terminal session being closed. ------ Math in Shell Scripts -------------------------------------------------- The best way to get accurate math results in shell scripts is via the bc command. There are also various ways of performing math operations natively, but these are cumbersome, and most shells do not support floating point math. Add two numbers: A=7 B=3 echo "$A + $B" | bc Floating point divide: A=7 B=3 echo "scale=2 ; (1.0 * $A) / $B" | bc Hint: scale sets the number of digits of precision. By default, scale is 0. If you are unsure if one of your terms will be expressed as an int or a float, multiplying by 1.0 will coerce it's type to be a float. You can use command substitution to catch the result of your calculations. Note that this method (bc) can be very slow. If this is too slow for your application, think long and hard about if it really needs to be implemented as a shell script. ------ xargs ------------------------------------------------------------------ xargs allows you to reads the arguments to a program from standard input. 
Let's look at an example:

    printf 'line 1\nline 2\nline 3\nline 4\n' | xargs echo "my arguments are:"

Outputs:

    my arguments are: line 1 line 2 line 3 line 4

Note that xargs splits its input on blanks as well as newlines, so "line 1"
becomes the two arguments "line" and "1". We can force xargs to only provide
one argument per call:

    printf 'line 1\nline 2\nline 3\nline 4\n' | xargs -n1 echo "my arguments are:"

Outputs:

    my arguments are: line
    my arguments are: 1
    my arguments are: line
    my arguments are: 2
    my arguments are: line
    my arguments are: 3
    my arguments are: line
    my arguments are: 4

Or an arbitrary number:

    printf 'line 1\nline 2\nline 3\nline 4\n' | xargs -n4 echo "my arguments are:"

Outputs:

    my arguments are: line 1 line 2
    my arguments are: line 3 line 4

------ Officially Unofficial sh Strict Mode -----------------------------------

sh has an unofficial "strict mode", which you should usually use unless you
have a reason not to. Executing

    set -e
    set -u

at the beginning of your program will cause it to exit with an error if any
command fails (returns nonzero) outside of a condition such as an if test or
the left side of && / || (set -e), or if a variable you attempt to use is
undefined (set -u). This can save you from cases like this:

    # ... some code ...
    rm -rf "$HOME/$SOMEPATH"

Without set -u, if SOMEPATH is undefined, this will simplify to:

    rm -rf "$HOME/"

which is almost certainly not what you intend.

------ Debugging Shell Scripts ------------------------------------------------

A good way to debug your shell scripts is via "set -x". This causes every
command to be printed to standard error, after expansion and prefixed with
"+", just before it is executed.

------ Abusing /proc for Fun and Profit ---------------------------------------

On Linux only (or FreeBSD with procfs emulation enabled), /proc allows you to
enumerate the file handles and sockets of all running processes (subject to
permissions).

In particular, /proc/$PID/fd/ contains all open file descriptors for the
process with ID $PID. Remember that FD 0 is standard in, FD 1 standard out,
and FD 2 standard error.

Prank your friends by writing to the standard out of their shell sessions!

You can also use this method to send input into the standard in of a
detached process.
Consider a case like this:

    apt install somepackage & disown

Then apt prompts for input and hangs until you provide it. Just use ps aux to
find the PID of your wayward apt instance and write into its standard input:

    printf 'y\n' > /proc/$PID/fd/0

Caveat: this only delivers input when the process's standard input is a pipe
or FIFO. If standard input is still your terminal (as it would be in the apt
example above), /proc/$PID/fd/0 is just a link to the terminal device, and
writing to it displays the text on that terminal instead of feeding it to the
process.

------ Path Expansions and YOU ------------------------------------------------

When writing shell scripts, it is important to understand path expansions.
Generally, in any word that is not quoted (with either single or double
quotes), characters like '*' and '~' will be expanded by the shell before
being passed to any child processes. Consider this shell session:

    $ ls
    a b c d.txt e.txt
    $ echo *
    a b c d.txt e.txt
    $ echo *.txt
    d.txt e.txt

Notice that echo itself does not expand the argument '*'; the shell does this
before echo even starts running. This can cause some interesting behaviors.
For example, consider this command:

    rm -rf ./.*

What's wrong here?

.
.
.
.
.
.
.
.
.

The issue with this command is that, in many shells, ./.* expands to every
child of ./ that begins with '.', including './..'. rm -rf would thus descend
into '..' and delete the parent directory of ./ along with everything in it,
siblings included. POSIX requires rm to refuse operands whose last component
is '.' or '..', and modern versions of rm (e.g. GNU and BSD) do so for
safety; some shells (e.g. zsh and recent bash) also no longer match '.' and
'..' with '.*', but older systems may do neither.

------ portable path normalization: why we can't have nice things -------------

Often, when we deal with paths provided to us by the user, we want to
normalize the path to be "nice". For example, converting
"../../foo/../bar/baz/././" into its more comprehensible equivalent:
"../../bar/baz/".

Unfortunately, there isn't a single standardized method of accomplishing
this. The `realpath` command was written long ago to handle this specific
case, but it has yet to become standard even across Linux distributions, let
alone across other UNIX variants. This can cause all kinds of havoc for
unsuspecting shell scripts, and there isn't really a clean or non-kludgey
solution.
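To see why it is a kludge, consider the usual fallback: let the shell resolve the path by changing into the directory and asking pwd -P where it ended up. Below is a minimal sketch with a hypothetical function name, normalize_dir. It only works for directories that already exist and are searchable, it always returns an absolute path, and because pwd -P resolves symlinks, its answer can differ from a purely textual cleanup like the "../../bar/baz/" example.

```shell
# Hypothetical helper: print the absolute, symlink-free form of an existing
# directory path using only POSIX cd and pwd. Prints nothing and returns
# nonzero if the directory does not exist or cannot be entered.
normalize_dir() {
    # The subshell keeps the caller's working directory unchanged.
    # CDPATH= stops cd from searching CDPATH (which can make it print the
    # new directory and pollute the output).
    ( CDPATH= cd -- "$1" 2> /dev/null && pwd -P )
}

normalize_dir /tmp/../tmp/./
```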
The only real, portable way of handling the path normalization issue is to
just bundle your own realpath implementation that will run in POSIX sh.
Fortunately, Mr. Michael Kropat has made available a high-quality,
MIT-licensed implementation.

>>> sh-realpath: https://github.com/mkropat/sh-realpath