I have many things I'd like to/should write about, but this shit blogger is a huge deterrent. I haven't yet found a (1) nice markdown-based framework that would allow for automated indexing of posts and (2) that would offer a smooth way to migrate the things I already have here...
So today I will "just" (still painful formatting work to do) post a document I wrote some four years ago when working at IMS.
I see the following guidelines as applying almost always when *automating* a task. They aim at perfecting the accountability of the actions taken when running the script. You log actions for postmortem analysis; nobody gives a shit when things do go as expected!
Notice I said automating *a task*. This thus includes any program, irrespective of the language! However, I will focus here on the context of DevOps work to help motivate my points.
I distilled them from experience, and you may say they are tuned to compensate for my own ADHD: the ultimate goal is to understand everything the script did when looking at the log it writes.
And by everything I really mean everything: starting with who ran it, which script file did they call, in which location was the file, how exactly did they call it (all arguments supplied in the command line), to timestamps of every action the script is performing, which part of the script is logging that action (name of module+function+line number), all that as well in cases when the script ends in failure, etc. A screenshot of the output should be telling us everything!
Bash scripting guidelines
This tutorial is intended as a crash course to help fix the minimal set of ideas needed to allow one to modify the scripts built in house. In particular we will focus on understanding the automated-upgrade.sh DS5 upgrade script.
Generality and precision have consequently been sacrificed in order to keep this tutorial concise and practical.
(Still a work in progress)
Concepts to brush up for this tutorial
1. Key "actions" we do in any program
Print out some "stuff"
Read in some "stuff"
Store in memory some "stuff"
Make a choice
Repeat some statements in an automated way (so called "loops")
That's all any program, in any language, does.
The first two are part of what's called I/O (input/output) procedures.
2. Variable assignment
This is how we store things in memory
x=43
yz="Hello World"
b=true
Remark
A statement like x=43 means the following:
The computer reads the equal sign and then knows that
43 is a value (see section on Types below) that the computer stores in memory.
x is a label (string) that the computer uses to point to the memory location where the previous value was stored.
After a variable assignment, whenever we use x the computer looks up the memory location pointed to by x and uses the value stored there, i.e., 43
These facts are summarized by saying that the variable x is a reference for the value 43.
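A minimal sketch of such assignments, including one syntactic pitfall worth remembering: bash does not allow spaces around the equal sign.

```shell
x=43              # assigns the value 43 to the label x
yz="Hello World"  # quotes let the value contain spaces
# x = 43          # WRONG: bash would try to run a command named `x`
echo "$x"         # 43
echo "$yz"        # Hello World
```
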
2.1 Accessing the value referred to by a variable
This is similar to what's called dereferencing, i.e., accessing the value from the memory location that a variable label points to.
In bash we do this by writing a dollar sign in front of the variable name.
Example:
echo $x # 43
In some cases, you may have to wrap the variable label within braces
Example
echo ${x} # 43
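The braces matter whenever the variable name is immediately followed by characters that could be part of a name. A small sketch (the variable name file is just for illustration):

```shell
file="report"
# Without braces, bash parses the longest possible name and looks up
# a variable called `file_backup`, which does not exist:
echo "$file_backup"     # prints an empty line
# The braces delimit the variable name explicitly:
echo "${file}_backup"   # report_backup
```
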
3. Bash Types
There are NO Types when it comes to variables' values.
Other languages make distinctions like 'x is a number' and 'yz is a string'.
In bash, basically everything is a string.
The above statements mean that even variable x has a value that is a string, despite its looks.
The only 2 exceptions that we will encounter are:
Boolean-like values. Case in point, the above variable b has been assigned the value true. (Strictly speaking, bash still stores the string "true"; it only behaves as a Boolean when used where a command or test is expected.)
Array/List
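A small sketch illustrating the "everything is a string" rule and the array exception:

```shell
x=43
# The value of x is the two-character STRING "43":
echo "${#x}"         # 2  (length of the string)
# Arithmetic happens only where we explicitly ask for it:
echo "$(( x + 1 ))"  # 44
# An array, one of the exceptions:
arr=( one two three )
echo "${arr[1]}"     # two
```
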
4. Conditionals
These are the if-then-else type of statements.
We have the computer crunch different groups of statements depending on some conditions being true or false.
This is how we make choices in a programming language.
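A minimal sketch of such a choice in bash:

```shell
x=43
if [ "$x" -gt 0 ] ; then
    echo "x is positive"
elif [ "$x" -eq 0 ] ; then
    echo "x is zero"
else
    echo "x is negative"
fi
# prints: x is positive
```
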
5. Loops
We have a list of 1000 files and we want to change their names and add the suffix -test.
How can we do that in the most efficient, i.e., fastest, way?
For each file we have to issue the same command, namely a mv: mv oldname oldname-test.
This means that the action we need to perform has a well defined pattern that is the same for all files.
This is precisely the scenario where we should use a loop.
Loops come in different "forms":
while loop
for loop
until loop (we won't discuss this)
Example of loop in plain English
let x be 3
while x is greater than 0 ; do the following
print the value of x
decrease x by 1
done
In bash we can do it the following way
x=3
while [ $x -gt 0 ] ; do
echo $x
x=$(( x-1 ))
done
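The renaming scenario above maps naturally onto a for loop. A sketch, using a throw-away directory with made-up file names (the loop body is the pattern that matters):

```shell
# Set up a disposable directory with three files to rename
rm -rf /tmp/rename-demo && mkdir -p /tmp/rename-demo && cd /tmp/rename-demo
touch file1 file2 file3
# Rename every matching file by appending the suffix -test
for oldname in file* ; do
    mv "$oldname" "$oldname-test"
done
ls   # file1-test  file2-test  file3-test
```
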
6. I/O
6.1 Printing
It's enough to use the command echo. It's not the only one: there is also printf,
but we will only comment on it in passing.
6.2 Reading in "stuff"
We will be using 5 different ways:
Prompting (interactively) the user
Command line parameters
Redirection
Pipes
Sourcing
We will discuss all these cases later in the tutorial.
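A quick sketch of these input channels (paths under /tmp are made up for the demo; prompting is shown as a comment since it needs a live user):

```shell
# 1) Prompting the user interactively (needs a terminal):
#      read -p "Your name: " name
# 2) Command-line parameters: inside a script, $1, $2, ... and "$@"
# 3) Redirection: feed a file into a command's standard input
printf 'banana\napple\n' > /tmp/fruits.txt
sort < /tmp/fruits.txt          # apple, then banana
# 4) Pipes: feed one command's output into another's input
printf 'banana\napple\n' | sort
# 5) Sourcing: run another file inside the current shell, typically
#    to load variable definitions (a poor man's config file)
echo 'greeting="hello"' > /tmp/config.sh
. /tmp/config.sh
echo "$greeting"                # hello
```
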
7. Functions
A function is a "machine" that, as all machines, takes in some input, and spits out some output, whatever those may be.
In that sense, any Linux command is a function. All take in some input (this could be either provided by the user or read directly from the OS environment) and do something. The latter can be just printing things out on the screen or changing something at the OS or filesystem level or all of the above.
You have seen functions in high school: f(x) = 2x. This is a function that doubles any number that we feed to it.
It is vital to make a linguistic distinction:
The name of the function is f
In that example above, x is called the argument of f. Here, we may also call it the parameter of f.
The expression above f(x) is called a function call
Many programming languages use the same syntax. Alas, Bash does not quite do so.
Still, every time we see a label followed by ( we know that the label refers to a function.
In bash, a function call is done as f x.
Yet, when defining a function we must write
f(){
...statements that ...
...the function will do...
}
That is, the label f followed by () will tell the computer that f is a function!
Remark
Any script is for all practical purposes the same as a function
Any script can itself define new functions and those can be called inside that script
Hence, inside a function definition we grab the arguments passed to the function the same way
we read the command-line parameters provided to a script: $1, $2, etc.
Example
f(){
x=$1
echo $((2*x))
}
# Now let's call function f with argument 3
f 3 # This will print 6 on screen
Design Patterns we follow
The first key idea to keep always in mind is modularity.
We want our code to be split into different chunks. Each chunk should be as independent as possible from any other chunk.
In Bash we have two ways to achieve this modularity:
Split your code into functions all defined in a single file.
Split your code into functions each defined in a separate file.
In both cases there will be one of those functions that triggers the execution of all the others we may need to run.
For such a reason we may call such function the "main" function.
Opposite to such a modular design is a monolithic, sequential, imperative code like the one we started with at the beginning of our project. Such code is always harder to maintain and will never scale well: the more features one adds, the more complex adding the next one becomes. It systematically leads to code that violates both the DRY and KISS rules.
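A sketch of the second modularity variant, with each function living in its own file that the main script sources (all file names here are invented for the illustration):

```shell
# A helper function living in its own file
mkdir -p /tmp/lib
cat > /tmp/lib/greet.sh <<'EOF'
greet(){ echo "Hello, $1" ; }
EOF

# The "main" script loads the helper file into the current shell,
# then defines and triggers its main function
. /tmp/lib/greet.sh
main(){ greet "World" ; }
main   # Hello, World
```
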
Basic structure of our code
Minimal design pattern
All scripts should follow the following MINIMAL skeleton:
#!/bin/bash
trap trapFailurerMsgLogFile ERR
set -Eeuo pipefail
usage(){
}
main(){
}
#### HELPER FUNCTIONS
printInfo(){
}
trapFailurerMsgLogFile(){
}
errorExit(){
}
main "$@"
The order of those statements is an essential part of the design of our code. Below we will see why.
Explanation
Line by line:
1 ) The shebang line. Needed in order to be able to execute the file from the command line by directly calling its name:
either source it with . <file>.sh, or, if made executable (chmod +x <file>.sh), call it with ./<file>.sh.
2 ) Defines a trapping mechanism for the signal ERR. This signal is triggered at runtime whenever a command returns a value greater than 0. This line then says that in such a case the function trapFailurerMsgLogFile() should be called. This is where you can make the script bail out in a nice and controlled way. See below.
3 ) Enables bash's strict mode: -e makes the script exit whenever a command fails, -u makes referencing an unset variable an error, -o pipefail makes a pipeline fail if any command in it fails, and -E makes functions and subshells inherit the ERR trap set on line 2.
4-5 ) Defines the function usage(). This function has only one job: to print out the instructions on how to use the present script.
7-8 ) Defines the function main() which will play the role of our main function, that is, the one that will trigger the execution of all the other functions.
10 ) A comment line identifying that the rest of the script contains helper functions. These are functions that aren't key for the task at hand, but just come in handy as they capture some recurrent design patterns.
11-12 ) Defines a function whose only task will be to print out logging information in a well-defined format.
14-15 ) Defines the function that the bash trapping mechanism will call whenever a command returns an error status value, that is, roughly, whenever a statement ends in an error.
17-18 ) Defines the function that exits the script in a well-controlled way whenever there is an error. A call to this function may well be the last statement of function trapFailurerMsgLogFile()
20 ) This constitutes the call to function main() which triggers the execution of the whole script. We are passing to this function the argument $@. Even though it looks like one single argument, the shell unfolds it into all arguments we passed to the script when we called it from the command line.
Why such order
Setting up the trapping mechanism must be done as soon as possible, lest we fail to react to some errors.
The very first two functions shall be the usage() and the main function. The reason is simple: we write code not for the computer, but for others to read it and understand as quickly and easily as possible what our code does.
By writing the usage first, even other developers can quickly get an idea of how this program is expected to work. If they are experienced enough, it may even give them a hint on how it is designed and the underlying logic.
By writing the main() function next, we offer a chance to the interested reader to quickly get a grasp of the logic of the program.
Furthermore, we will see that often we will have main() handle the command line arguments. If so, we will deal with them almost at the very top of main()'s body. Thus, by main() being the second function of the script, the interested developer will quickly and easily have access to the command line behavior of the script. This facilitates modifying such behavior; and, when doing so, modifying the usage help accordingly, as it sits right there above the argument-handling code.
The only exception to main() being the second function is if we define an extra function getArguments whose task would be precisely to deal with the command line arguments.
In such a case, getArguments shall come as the second function of the script, immediately followed by main().
To summarize our leading idea on the order of things: First, instructions and goal of the script, then logic, and finally the details.
Stack Tracing your Bash scripts
To get meaningful information when an error is caught, we need to add information on the stack of relevant callers at the time the exception was thrown.
Let's modify the minimal script to see how to include that.
We will achieve this by using two Bash built-ins: the array variable FUNCNAME and the function caller.
#!/bin/bash
trap trapFailurerMsgLogFile ERR
set -Eeuo pipefail

## BASIC GLOBAL VARS
# This script's full path name, no matter how we call it.
THISEXE="$(cd "$(dirname "$0")" && pwd)/$(basename "$0")"
# This script's basename. Used only once at the start for cleaning up of working directory
THISEXEBASENAME="$(basename "$0")"
# Keep working directory tidy: use date/time stamp and each upgrade within own folder
RUN_DATE="$(date "+%Y%m%d-%H%M")"
# Define the working directory. Currently (Oct 15th 2021) it's where the config.dat is expected and the log file is being written.
WORKING_DIRECTORY_BASENAME="/usr/local/upgrade"
WORKING_DIRECTORY="$WORKING_DIRECTORY_BASENAME/$RUN_DATE"
# LOGFILE
LOGFILE="$WORKING_DIRECTORY/$RUN_DATE.log"
## END BASIC GLOBAL VARS

usage() {
    cat <<EOU
Usage: $THISEXEBASENAME [OPTIONS]

This script does ...bla bla bla ...

OPTIONS:
    -h, --help    Print this help.
EOU
}

main() {
    if [ $# -lt 1 ]; then
        printInfo "Use option -h for help"
        exit
    fi
    while [ $# -gt 0 ]; do
        case $1 in
            -h | -help | --help)
                shift
                usage
                exit
                ;;
            *)
                errorExit "Unknown option $1"
                ;;
        esac
    done
}

#### HELPER FUNCTIONS
printInfo() {
    local caller=${FUNCNAME[1]}
    [ $# -gt 0 ] || errorExit "$caller ($(caller)) : Missing log message. Exiting."
    logThisMessage INFO "$caller" "$@"
}

# logThisMessage signature: msgType MessageLoggingFunction ActualMessageString(s)
logThisMessage() {
    local caller=${FUNCNAME[1]}
    [ $# -gt 2 ] || errorExit "$caller ($(caller)) : Missing $((3 - $#)) logging arguments out of type/caller/message. Exiting."
    local msgType=$1 ; shift
    local logger=$1 ; shift
    echo -e "[$THISEXEBASENAME :: $logger] $msgType : $@"
}

printError() {
    local caller=${FUNCNAME[1]}
    [ $# -gt 0 ] || errorExit "$caller ($(caller)) : Missing log message. Exiting."
    logThisMessage ERROR "$caller" "$@"
}

trapFailurerMsgLogFile() {
    printError "($(caller))" "See log file under $LOGFILE"
}

errorExit() {
    local caller=${FUNCNAME[1]}
    echo -e "[$THISEXEBASENAME :: $caller] ERROR ($(caller)): $@"
    exit 1
}

main "$@" 2>&1 | tee -a "$LOGFILE"
What does this achieve:
The script clones its output to both the standard output (screen) and a logfile (see the LOGFILE variable defined near the top of the script)
All the output has a well-defined structure that makes it easier to parse by other automated tools
Helper functions include a "contract": This is similar to what an assert does in some programming languages, namely enforcing some constraints. In this case, we are making sure at runtime that those helper functions are being called with the expected number of arguments. This is the best we can do to detect programming errors, as Bash, being an interpreted language, lacks the equivalent of a compile-time code check.
All helper functions record the caller function, i.e., the function that called the present helper function, in a local variable called caller.
In addition, all helper functions call the Bash built-in function caller. Its output is collected as a string on-the-fly by using the back-tick notation `caller`. An equivalent form would be writing $(caller) instead. In any case, this call to caller returns the name of the script and the line number in the code where it was called from. This feature allows us to mimic the stack tracing of our Bash script in a similar, but more limited, way to the stack tracing available in richer programming languages, e.g., Python.
We could simplify the code by removing the local variables. However, using them makes the code more readable -given we name those variables smartly enough. There is always a trade-off like this when programming, between shortness/simplicity and readability.
In addition this code shows how to use
a conditional if-then,
a while-loop,
querying the number of arguments passed to a function $#,
a multiway branch statement case ... esac. In other languages this is available as a switch statement, e.g., in C++. In Bash, each case we want to catch starts with one or more guards, e.g. -h|-help|--help). These specify the string patterns that trigger each case. Also, each case must end with double semi-colon ;;. We could as well have used a regex for it, namely, -h*|--h*). The pattern defined by the asterisk * means any string, hence, it's the way we define the default behavior.
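A stripped-down sketch of such a multiway branch, outside of any script:

```shell
arg="--help"
case $arg in
    -h|-help|--help)
        echo "help requested"
        ;;
    -v|--verbose)
        echo "verbose mode"
        ;;
    *)
        echo "unknown option: $arg"
        ;;
esac
# prints: help requested
```
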
Once we enable the trapping mechanism for regular error return values ("trap errfunc ERR" as in line 2 above), we **must write all in-line conditionals in normal disjunctive form**, in case we intend to recover gracefully from an error.
Example:
With trapping enabled the following line 2 will never be executed if line 1 ends in error.
The correct way is shown in the second code box.
systemctl start jboss
[ $? -gt 0 ] && errorExit "There was an error when starting JBOSS" #NEVER EXECUTED
systemctl start jboss || errorExit "There was an error when starting JBOSS" # OK
Arrays
The following example shows how to define, use, and loop through the elements of an array.
$ myarr=( a1 a2 a3 )
$ echo "$myarr - ${myarr[0]} - ${myarr[1]} - ${myarr[2]} - ${myarr[3]}"
a1 - a1 - a2 - a3 -
$ echo ">>$myarr - ${myarr[0]} - ${myarr[1]} - ${myarr[2]} - ${myarr[3]}<<"
>>a1 - a1 - a2 - a3 - <<
$ echo "${myarr[@]}"
a1 a2 a3
$ for arrelem in ${myarr[@]} ; do echo $arrelem ; done
a1
a2
a3
$ for arrelem in ${myarr[*]} ; do echo $arrelem ; done
a1
a2
a3
$ for arrelem in "${myarr[@]}" ; do echo $arrelem ; done
a1
a2
a3
$ for arrelem in "${myarr[*]}" ; do echo $arrelem ; done
a1 a2 a3
Remarks:
When referencing beyond the last element of an array, like in ${myarr[3]}, Bash simply prints the empty string but doesn't throw an error.
Indexes of array elements start at 0
Referencing $myarr alone is the same as referencing ${myarr[0]}.
There is an important but very subtle difference when referencing ALL the contents of an array at once using the at index @ or the asterisk index *. This difference is only noticeable when we quote such a reference: the former still shows up as individual array entries, while the latter shows up as one single string!
The structure of the for loop is for [variable-label] in [list-of-items-to-loop-through] ; do [code-to-execute-for-each-item] ; done
Use of Bash built-in FUNCNAME array and caller function
See the following example:
$ foo(){ echo "${FUNCNAME[0]} - ${FUNCNAME[1]} - `caller`" ;}
$ goo(){ foo ;}
$ goo
foo - goo - 1 main
First we define two functions foo and goo. The latter just calls the first.
When we execute goo, it calls foo, which in turn prints out the first 2 elements of the array FUNCNAME as well as the output of the Bash built-in function caller.
As this example is run directly on the command line, there is no script name, and the execution entails only one line, namely the one where we call function goo. Hence, the function caller returns line number 1 and, as "script name", the string "main". If we ran these three lines from inside a script, the name printed out would be that of the script file.
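To see the difference, we can drop the same two functions into a script file (the path /tmp/tracedemo.sh is made up for the demo) and run it:

```shell
cat > /tmp/tracedemo.sh <<'EOF'
#!/bin/bash
foo(){ echo "${FUNCNAME[0]} - ${FUNCNAME[1]} - $(caller)" ;}
goo(){ foo ;}
goo
EOF
bash /tmp/tracedemo.sh
# Now `caller` reports the script file name instead of "main",
# together with the line number where foo was called from:
#   foo - goo - 3 /tmp/tracedemo.sh
```
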