Saturday, April 19, 2025

On the foolishness of NL programming

 The title of this post is but that of an article by the late E.W. Dijkstra that is still fully valid in these days of frantic work on LLMs:

The virtue of formal texts is that their manipulations, in order to be legitimate, need to satisfy only a few simple rules; they are, when you come to think of it, an amazingly effective tool for ruling out all sorts of nonsense that, when we use our native tongues, are almost impossible to avoid.

Instead of regarding the obligation to use formal symbols as a burden, we should regard the convenience of using them as a privilege: thanks to them, school children can learn to do what in earlier days only genius could achieve. When all is said and told, the "naturalness" with which we use our native tongues boils down to the ease with which we can use them for making statements the nonsense of which is not obvious.

You can access it here.


Sunday, April 6, 2025

A Scoop On The Current Research Frontiers

 BREAKTHROUGH PRIZES 2025

GLP-1 Diabetes and Obesity Discovery | Multiple Sclerosis Causes and Treatments | DNA Editing

Exploration of Nature at Shortest Distances

Proof of Geometric Langlands Conjecture

Special Prize Awarded to Giant of Theoretical Physics

Breakthrough Prize in Life Sciences Awarded to Daniel J. Drucker, Joel Habener, Jens Juul Holst, Lotte Bjerre Knudsen and Svetlana Mojsov; Alberto Ascherio and Stephen L. Hauser; and David R. Liu

Breakthrough Prize in Fundamental Physics Awarded to More than 13,000 Researchers from ATLAS, CMS, ALICE and LHCb Experiments at CERN

Breakthrough Prize in Mathematics Awarded to Dennis Gaitsgory

Special Breakthrough Prize in Fundamental Physics Awarded to Gerardus 't Hooft

Six New Horizons Prizes Awarded for Early-Career Achievements in Physics and Mathematics

Three Maryam Mirzakhani New Frontiers Prizes Awarded to Women Mathematicians for Early-Career Work


The official announcement has descriptions of each contribution: https://breakthroughprize.org/News/91 

Wednesday, February 5, 2025

DevOps Basics I: The Structure of your Scripts v0

I have many things I'd like to (and should) write about, but this shit blogger is a huge deterrent. I haven't yet found a nice markdown-based framework that would (1) allow for automated indexing of posts and (2) offer a smooth way to move over the things I already have here...

So today I will "just" (still painful formatting work to do) post a document I wrote some four years ago when working at IMS.

I see the following guidelines applying almost always when *automating* a task. They aim at perfecting the accountability of the actions taken when running the script. You log actions for a postmortem analysis; nobody gives a shit when things do go as expected!

Notice I said automating *a task*. This thus includes any program, irrespective of the language! However, I will focus here on the context of DevOps work to help motivate my points.

I distilled them from experience and you may say they are tuned to compensate for my own ADHD: The ultimate goal is to understand everything the script did when looking at the log it writes. 

And by everything I really mean everything: starting with who ran it, which script file they called, where that file was located, how exactly they called it (all arguments supplied in the command line), up to the timestamp of every action the script performs and which part of the script logs that action (name of module+function+line number). All of that must also be there when the script ends in failure. A screenshot of the output should be telling us everything!
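A minimal sketch of what the first line of such a log could look like. The helper name logHeader and the field layout are illustrative assumptions, not taken from the original scripts:

```shell
# Hypothetical helper: one line recording when, who, which script, from where, and how.
logHeader() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] user=$(id -un) script=$0 cwd=$(pwd) args=$*"
}

# In a real script we would call: logHeader "$@"
# Here we simulate two command line arguments.
header=$(logHeader --dry-run target1)
echo "$header"
```

Everything in that line comes for free from the shell and the OS, so there is little excuse for not logging it.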

Bash scripting guidelines

This tutorial is intended as a crash course to help fix the minimal set of ideas needed to modify the scripts built in house. In particular, we will focus on understanding the automated-upgrade.sh DS5 upgrade script.

Generality and precision have consequently been sacrificed in order to keep this tutorial concise and practical.

(Still a work in progress)

Concepts to brush up for this tutorial


1. Key "actions" we do in any program


Print out some "stuff"

Read in some "stuff"

Store in memory some "stuff"

Make a choice

Repeat some statements in an automated way (so called "loops")


That's all that any program, in any language, does.


The first two are part of what's called I/O (input/output) procedures.


2. Variable assignment

This is how we store things in memory



x=43

yz="Hello World"

b=true

Remark


A statement like x=43 means the following:


The computer reads the equal sign and then knows that


43 is a value (see section on Types below) that the computer stores in memory.


x is a label (string) that the computer uses to point to the memory location where the previous value was stored.


After a variable assignment, whenever we use x the computer looks up the memory location pointed to by x and uses the value stored there, i.e., 43


These facts are summarized by saying that the variable x is a reference for the value 43.


2.1 Accessing the value referred to by a variable


This is similar to what's called dereferencing, i.e., accessing the value at the memory location that a variable label points to.


In bash we do this by writing a dollar sign in front of the variable name.

Example:


echo $x    # 43


In some cases, you may have to wrap the variable label within braces

Example


echo ${x}  # 43
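Braces become necessary when the variable name is immediately followed by characters that could themselves be part of a name. A small sketch (the variable names are only illustrative):

```shell
x=43
with_braces="${x}px"     # expands x, then appends "px"
without_braces="$xpx"    # looks up a variable named xpx, which is unset, so this is empty
echo "with braces: '$with_braces'  without braces: '$without_braces'"
```

Without the braces the shell cannot tell where the name ends, so it silently picks the longest possible one.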


3. Bash Types


There are NO Types when it comes to variables' values.


Other languages make distinctions like 'x is a number' and 'yz is a string'.


In bash, basically everything is a string.

The above statements mean that even variable x has a value that is a string, despite its looks.


The only 2 exceptions that we will encounter are that of a


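Boolean Type. Case in point, the above variable b has been assigned a Boolean-like value. Strictly speaking, b holds the string true; it works as a Boolean because true and false are also commands that succeed and fail, respectively.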


Array/List
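A short sketch of what "everything is a string" means in practice. The variable names y, z and answer are illustrative:

```shell
x=43
y=$x+1            # plain assignment: y holds the string "43+1"
z=$(( x + 1 ))    # arithmetic expansion: z holds "44"

b=true
# "if $b" runs the command named by the value of b; the command true always succeeds
if $b; then answer="yes"; else answer="no"; fi

echo "y=$y z=$z answer=$answer"
```

Numbers only behave like numbers inside an arithmetic context such as $(( ... )) or a numeric test like -gt.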


4. Conditionals


These are the if-then-else type of statements.


We have the computer crunch different groups of statements depending on some conditions being true or false.


This is how we make choices in a programming language.
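A minimal sketch of an if-then-else in Bash, using a hypothetical function classify that takes an integer as its first argument:

```shell
# Hypothetical helper: says whether an integer is positive, negative or zero
classify() {
    if [ "$1" -gt 0 ]; then
        echo "positive"
    elif [ "$1" -lt 0 ]; then
        echo "negative"
    else
        echo "zero"
    fi
}

classify 5      # prints positive
classify -2     # prints negative
classify 0      # prints zero
```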


5. Loops


We have a list of 1000 files and we want to change their names and add the suffix -test.


How can we do that in the most efficient -i.e., fastest- way?


For each file we have to issue the same command, namely a mv: mv oldname oldname-test.

This means that the action we need to perform has a well defined pattern that is the same for all files.

This is precisely the scenario where we should use a loop.


Loops come in different "forms":


while loop


for loop


until loop (we won't discuss this)


Example of loop in plain English


let x be 3

while x is greater than 0 ; do the following

 print the value of x

 decrease x by 1

done


In bash we can do it the following way


x=3

while [ $x -gt 0 ] ; do

    echo $x

    x=$(( x-1 ))

done
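The renaming task from the start of this section can be sketched with a for loop. To keep it harmless, this version works on three files inside a scratch directory (an assumption made for the example) instead of on 1000 real ones:

```shell
workdir=$(mktemp -d)        # scratch directory so nothing real is touched
touch "$workdir/a.txt" "$workdir/b.txt" "$workdir/c.txt"

for f in "$workdir"/*; do
    mv "$f" "$f-test"       # same command, different file on each iteration
done

ls "$workdir"
```

The list of files is expanded once, before the loop starts, so the freshly renamed files are not picked up again.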


6. I/O


6.1 Printing


It's enough to use the command echo. It's not the only one: there is also printf, but we will only mention it in passing.
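For the record, a tiny side-by-side sketch of both commands (the variable names are illustrative). The main practical difference is that printf takes a format string and does not add a newline on its own:

```shell
item="cart"
count=3

echo "$item has $count items"
printf '%s has %d items\n' "$item" "$count"

# Format specifiers can also pad fields to a fixed width
line=$(printf '%-6s|%3d|' "$item" "$count")
echo "$line"
```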


6.2 Reading in "stuff"


We will be using 5 different ways:


Prompting (interactively) the user


Command line parameters


Redirection


Pipes


Sourcing


We will discuss all these cases in the tutorial.
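As a quick preview, here is a sketch with one line per way of reading input. All file names, variable names and the helper first_param are made up for the example, and the interactive prompt is fed from a string so that the sketch runs unattended:

```shell
# 1. Prompting: read waits for a line on standard input
read -r -p "Your name: " name <<< "Alice"

# 2. Command line parameters: $1, $2, ... (they work the same inside a function)
first_param() { echo "$1"; }
param=$(first_param alpha beta)

# 3. Redirection: take standard input from a file
datafile=$(mktemp)
echo "line from file" > "$datafile"
read -r from_file < "$datafile"

# 4. Pipes: the output of one command becomes the input of the next
from_pipe=$(echo "line from pipe" | tr 'a-z' 'A-Z')

# 5. Sourcing: run another file in the current shell, keeping its variables
conffile=$(mktemp)
echo 'greeting="hello from config"' > "$conffile"
. "$conffile"

echo "$name | $param | $from_file | $from_pipe | $greeting"
```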


7. Functions


A function is a "machine" that, as all machines, takes in some input, and spits out some output, whatever those may be.


In that sense, any Linux command is a function. All take in some input (this could be either provided by the user or read directly from the OS environment) and do something. The latter can be just printing things out on the screen or changing something at the OS or filesystem level or all of the above.


You have seen functions in high school: f(x) = 2x. This is a function that doubles any number that we feed to it.


It is vital to make a linguistic distinction:


The name of the function is f


In the example above, x is called the argument of f. Here, we may also call it the parameter of f.


The expression above f(x) is called a function call


Many programming languages use the same syntax. Alas, Bash does not quite do so.


Still, every time we see a label followed by ( we know that the label refers to a function.


In bash, a function call is done as f x.


Yet, when defining a function we must write


f(){

 ...statements that ...

 ...the function will do...

}


That is, the label f followed by () will tell the computer that f is a function!


Remark


Any script is for all purposes the same as a function


Any script can itself define new functions and those can be called inside that script


Hence, the same way we read the command-line parameters that we provide to a script is the way

we may grab the arguments we pass to a function when defining that function.


Example


f(){

 x=$1

 echo $((2*x))

}


# Now let's call function f with argument 3

f 3 # This will print 6 on screen
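A Bash function "spits out" its result in two ways: whatever it prints, which the caller can capture with $( ), and its exit status, which the caller can test directly. A small sketch with two hypothetical functions, double and isPositive:

```shell
double() {
    local n=$1              # local keeps n from leaking into the rest of the script
    echo $(( 2 * n ))
}

isPositive() {
    [ "$1" -gt 0 ]          # the status of the last command is the status of the function
}

result=$(double 21)         # capture what the function printed
if isPositive "$result"; then verdict="positive"; else verdict="not positive"; fi
echo "$result is $verdict"
```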


Design Patterns we follow


The first key idea to keep always in mind is modularity.


We want our code to be split into different chunks. Each chunk should be as independent as possible from any other chunk.


In Bash we have two ways to achieve this modularity:


Split your code into functions all defined in a single file.


Split your code into functions each defined in a separate file.


In both cases there will be one of those functions that triggers the execution of all the others we may need to run.

For such a reason we may call such function the "main" function.
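A sketch of the second option, with one function living in its own file and a main function that pulls it in. The file name greet.sh and the function greet are hypothetical, and the file is created on the fly only to keep the example self-contained:

```shell
libdir=$(mktemp -d)

# A "module": one file holding one function
cat > "$libdir/greet.sh" <<'EOF'
greet() {
    echo "Hello, $1"
}
EOF

. "$libdir/greet.sh"        # load the module into the current shell

main() {
    greet "World"
}

main_output=$(main)
echo "$main_output"
```

With the first option the only difference is that greet would be defined in the same file as main, and the sourcing line would go away.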


Opposite to such a modular design is monolithic, sequential and imperative code like the one we started with at the beginning of our project. Such code is always more complicated to maintain and will never scale well: the more features one adds to it, the harder it becomes to add the next one. It systematically leads to code that violates both the DRY (Don't Repeat Yourself) and KISS (Keep It Simple, Stupid) rules.


Basic structure of our code


Minimal design pattern


All scripts should follow this MINIMAL skeleton:


#!/bin/bash

trap trapFailurerMsgLogFile ERR

set -Eeuo pipefail

usage(){

}


main(){

}


#### HELPER FUNCTIONS

printInfo(){

}


trapFailurerMsgLogFile(){

}


errorExit(){

}


main "$@"


The order of those statements is an essential part of the design of our code. Below we will see why.


Explanation


Line by line:


1 ) The shebang line. It tells the system to run the file with Bash when we execute it from the command line by directly calling its name: once made executable (chmod +x <file>.sh), call it with ./<file>.sh. Alternatively, source it into the current shell with . <file>.sh.


2 ) Defines a trapping mechanism for the signal ERR. This signal is triggered at runtime whenever a command returns a value greater than 0. This line then says that in such a case the function trapFailurerMsgLogFile() should be called. This is where you can make the script bail out in a nice and controlled way. See below.


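3 ) Makes the script stop at the first error and ensures errors are raised even within functions and pipes: -e exits on any failing command, -E lets functions and subshells inherit the ERR trap, -u treats the use of an unset variable as an error, and -o pipefail makes a pipeline fail if any of its commands fails.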


4-5 ) Defines the function usage(). This function has only one job: to print out the instructions on how to use the present script.


7-8 ) Defines the function main() which will play the role of our main function, that is, the one that will trigger the execution of all the other functions.


10 ) A comment line identifying that the rest of the script contains helper functions. These are functions that aren't key for the task at hand, but come in handy as they capture some recurrent design patterns.


11-12 ) Defines a function whose only task will be to print out logging information in a well-defined format.


14-15 ) Defines the function that the bash trapping mechanism will call whenever a command returns an error status value, that is, roughly, whenever a statement ends in an error.


17-18 ) Defines the function that exits the script in a well-controlled way whenever there is an error. A call to this function may well be the last statement of function trapFailurerMsgLogFile()


20 ) This constitutes the call to function main() which triggers the execution of the whole script. We are passing to this function the argument $@. Even though it looks like one single argument, the shell unfolds it into all arguments we passed to the script when we called it from the command line.
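The quotes around $@ in that last line matter. A small sketch, with three hypothetical helper functions, of what happens to an argument containing a space when it is forwarded with and without them:

```shell
countArgs() { echo $#; }

forwardQuoted()   { countArgs "$@"; }   # each original argument stays intact
forwardUnquoted() { countArgs $@; }     # arguments are split again on whitespace

forwardQuoted "a b" c       # prints 2
forwardUnquoted "a b" c     # prints 3
```

This is why main is called with "$@" and not with a bare $@.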


Why such order


Setting up the trapping mechanism must be done as soon as possible, lest we fail to react to some errors.


The very first two functions shall be the usage() and the main function. The reason is simple: we write code not for the computer, but for others to read it and understand as quickly and easily as possible what our code does.


By writing the usage first, even other developers can quickly get an idea of how this program is expected to work. If they are experienced enough, it may even give them a hint on how it is designed and the underlying logic.


By writing the main() function next, we offer a chance to the interested reader to quickly get a grasp of the logic of the program.


Furthermore, we will see that often we will have main() handle the command line arguments. If so, we will deal with them almost at the very top of the function main()'s body. Thus, by being the second function of the script, the interested developer will quickly and easily have access to the command line behavior of the script. This facilitates modifying such behavior and, when doing so, modifying the usage help accordingly as well, as it is right there above the argument-handling code.


The only exception to main() being the second function is if we define an extra function getArguments whose task would be precisely to deal with the command line arguments.


In such a case, getArguments shall come as the second function of the script, immediately followed by main().


To summarize our leading idea on the order of things: First, instructions and goal of the script, then logic, and finally the details.


Stack Tracing your Bash scripts


To get meaningful information when an error is caught we need to add information on the stack of relevant callers at the time the exception was thrown.


Let's modify the minimal script to see how to include that.


We will achieve this by using two built-in Bash features: the (script's ENV) array FUNCNAME and the function caller.


#!/bin/bash

trap trapFailurerMsgLogFile ERR

set -Eeuo pipefail


## BASIC GLOBAL VARS

# This script full path name, no matter how we call it.

THISEXE="$(if [ "$(dirname $0)" == "." ]; then echo $(pwd)/${0##*/}; else echo $0; fi)"

# This script basename. Used only once at the start for cleaning up of working directory

THISEXEBASENAME=$(basename $0)


# Keep working directory tidy: use a date/time stamp and keep each upgrade within its own folder

RUN_DATE="$(date "+%Y%m%d-%H%M")"

# Define the working directory. Currently (Oct 15th 2021) it's where the config.dat is expected and the log file is being written.

WORKING_DIRECTORY_BASENAME="/usr/local/upgrade"

WORKING_DIRECTORY=$WORKING_DIRECTORY_BASENAME/Drivesync5upgrade$RUN_DATE


# LOGFILE

LOGFILE="$WORKING_DIRECTORY_BASENAME/Drivesync5upgrade$RUN_DATE.log"

## END BASIC GLOBAL VARS


usage() {

    cat <<EOU

Usage: $THISEXEBASENAME [OPTIONS]


This script does ...bla bla bla ...


OPTIONS:

-h, --help      Print this help.


EOU

}


main() {

    if [ $# -lt 1 ]; then

        printInfo "Use option -h for help"

        exit

    fi


    while [ $# -gt 0 ]; do

        case $1 in

        -h | -help | --help)

            shift

            usage

            exit

            ;;

        *)

            errorExit "Unknown option $1"

            ;;

        esac

    done


}


#### HELPER FUNCTIONS

printInfo() {

    local caller=${FUNCNAME[1]}

    [ $# -ge 1 ] || errorExit "$caller ($(caller)) : Missing log message. Exiting."

    logThisMessage INFO $caller "$@"

}


# logThisMessage signature: msgType MessageLoggingFunction ActualMessageString(s)

logThisMessage() {

    local caller=${FUNCNAME[1]}

    [ $# -ge 3 ] || errorExit "$caller ($(caller)) : Missing $((3 - $#)) logging arguments out of type/caller/message. Exiting."

    local msgType=$1 ; shift

    local logger=$1 ; shift

    echo -e "[$(date)] $THISEXEBASENAME :: $msgType: $logger : $@"

}


printError() {

    local caller=${FUNCNAME[1]}

    [ $# -ge 1 ] || errorExit "$caller ($(caller)) : Missing log message. Exiting."

    logThisMessage ERROR $caller "$@"

}


trapFailurerMsgLogFile() {

    printError "${FUNCNAME[1]} ($(caller))" "See log file under $LOGFILE"

}


errorExit() {

    local caller=${FUNCNAME[1]}

    echo -e "[$(date)] $THISEXEBASENAME :: ERROR: $caller ($(caller)): $@"

    exit 1

}


main "$@" 2>&1 | tee $LOGFILE


What does this achieve:


The script clones its output to both the standard output (screen) and a logfile, whose path is defined in the global variable LOGFILE


All the output has a well-defined structure that makes it easier to parse by other automated tools


Helper functions include a "contract": This is similar to what an assert in some programming languages does, namely enforcing some constraints. In this case, we are making sure at runtime that those helper functions are being called with the expected number of arguments. This is the best we can do to detect programming errors, as Bash, being an interpreted language, lacks the equivalent of a compile-time code check.


All helper functions record the caller function in a local variable called caller. This is the function that called the present helper function


In addition, all helper functions call the Bash built-in function caller. Its output is collected as a string on the fly by using the backtick notation `caller`. An equivalent way to do the same is writing $(caller) instead. In any case, this call to function caller returns the line number in the code where the present function was called from, and the name of the script. This feature allows us to mimic the stack tracing of our Bash script in a similar, but more limited, way to the stack tracing available in richer programming languages, e.g., Python.


We could simplify the code by removing the local variables. However, using them makes the code more readable -given we name those variables smartly enough. There is always a trade-off like this when programming, between shortness/simplicity and readability.
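A common way to clone output to both the screen and a file is the command tee. A minimal sketch, with a temporary file standing in for the real log and a dummy main function:

```shell
logfile=$(mktemp)           # stands in for the real log file

main() {
    echo "step 1 done"
    echo "warning: something odd" >&2
}

# 2>&1 merges the error stream into standard output;
# tee then copies that single stream to the screen and to the file
main 2>&1 | tee "$logfile"
```

Without the 2>&1 the error messages would still show on screen but would never make it into the log, which defeats the purpose of a postmortem.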


In addition this code shows how to use


a conditional if-then,


a while-loop,


querying the number of arguments passed to a function $#,


a multiway branch statement case ... esac. In other languages this is available as a switch statement, e.g., in C++. In Bash, each case we want to catch starts with one or more guards, e.g. -h|-help|--help). These specify the string patterns that trigger each case. Also, each case must end with a double semicolon ;;. We could as well have used a glob pattern (not a regex) for it, namely, -h*|--h*). The asterisk * on its own matches any string; hence, it's the way we define the default behavior.
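A stripped-down sketch of such a case statement, wrapped in a hypothetical function parseOption so that it can be tried out on its own:

```shell
parseOption() {
    case $1 in
        -h | --help)  echo "help" ;;
        -v*)          echo "verbose" ;;     # glob: anything starting with -v
        *)            echo "unknown" ;;     # default branch
    esac
}

parseOption --help          # prints help
parseOption -vvv            # prints verbose
parseOption --frobnicate    # prints unknown
```

The guards are tried from top to bottom and the first match wins, so the catch-all * must come last.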


Once we enable the trapping mechanism for regular error return values (“trap errfunc ERR” as in line 2 above), we **must write all in-line conditionals in normal disjunctive form**, i.e., chained with ||, if we intend to recover gracefully from an error.


Example:


With trapping enabled, the second line of the first code box below will never be executed if its first line ends in error: by then the script has already bailed out.


The correct way is shown in the second code box.


systemctl start jboss

[ $? -gt 0 ] && errorExit "There was an error when starting JBOSS"   #NEVER EXECUTED




systemctl start jboss || errorExit "There was an error when starting JBOSS"  # OK
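The same behavior can be reproduced without JBOSS. In this sketch the command false stands in for any failing command, and the whole experiment runs in a child bash so that its exit does not take the current shell down with it:

```shell
output=$(bash -c '
    trap "echo trap fired" ERR
    set -Eeuo pipefail
    false || echo "handled inline"          # disjunctive form: failure handled on the spot
    false                                   # bare failure: trap fires and the script ends here
    [ $? -gt 0 ] && echo "never reached"
' 2>&1) || true

echo "$output"
```

The output shows the inline handler and then the trap, but never the last line.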


Arrays


The following example shows how to define, use, and loop through the elements of an array.


$ myarr=( a1 a2 a3 )

$ echo "$myarr - ${myarr[0]} - ${myarr[1]} - ${myarr[2]} - ${myarr[3]}"

a1 - a1 - a2 - a3 -

$ echo ">>$myarr - ${myarr[0]} - ${myarr[1]} - ${myarr[2]} - ${myarr[3]}<<"

>>a1 - a1 - a2 - a3 - <<

$ echo "${myarr[@]}"

a1 a2 a3

$ for arrelem in ${myarr[@]} ; do echo $arrelem ; done

a1

a2

a3

$ for arrelem in ${myarr[*]} ; do echo $arrelem ; done

a1

a2

a3

$ for arrelem in "${myarr[@]}" ; do echo $arrelem ; done

a1

a2

a3

$ for arrelem in "${myarr[*]}" ; do echo $arrelem ; done

a1 a2 a3


Remarks:


When referencing beyond the last element of an array, like in ${myarr[3]}, Bash simply prints the empty string, but doesn't throw an error.


Indexes of array elements start at 0


Referencing $myarr is equivalent to referencing the first element of the array, i.e., ${myarr[0]}.


There is an important, but very subtle difference when referencing ALL the contents of an array at once using the at index @ or the asterisk index *. This difference is only noticeable when we quote such a reference: The former still shows up as individual array entries, while the latter shows up as one single string!


The structure of the for loop is for [variable-label] in [list-of-items-to-loop-through] ; do [code-to-execute-for-each-item] ; done
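Quoting "${myarr[@]}" starts to matter as soon as an element contains a space. A sketch with a made-up array of file names, which also shows how to append an element and how to get the number of elements:

```shell
files=( "my report.txt" notes.md )
files+=( "last one" )                   # append an element

echo "number of elements: ${#files[@]}"

quoted=0
for f in "${files[@]}"; do quoted=$(( quoted + 1 )); done

unquoted=0
for f in ${files[@]}; do unquoted=$(( unquoted + 1 )); done

echo "quoted loop ran $quoted times, unquoted loop ran $unquoted times"
```

The unquoted loop runs more often than there are elements because each name with a space is split in two.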


Use of Bash built-in FUNCNAME array and caller function


See the following example:


$ foo(){ echo "${FUNCNAME[0]} - ${FUNCNAME[1]} - `caller`" ;}

$ goo(){ foo ;}

$ goo

foo - goo - 1 main


First we define two functions foo and goo. The latter just calls the first.


When we execute goo, it calls foo, which in turn prints out the first 2 elements of the array FUNCNAME as well as the output of the Bash built-in function caller.


As this example is run directly on the command-line, there is no script name, and also the execution entails only one line, namely the one where we call function goo. Hence, the function caller returns line number 1 and as "script-name", the string "main". If we ran these three lines from inside a script, the name printed out would be that of the script.
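When given a number N, caller reports on the N-th frame of the call stack and fails once there are no more frames. That is enough to print a complete, if rudimentary, stack trace. A sketch with hypothetical functions stackTrace, inner and outer:

```shell
stackTrace() {
    local i=0
    # "caller N" prints line number, function name and file of the N-th frame;
    # it returns a non-zero status once N runs past the last frame
    while caller $i; do
        i=$(( i + 1 ))
    done
}

inner() { stackTrace; }
outer() { inner; }

trace=$(outer)
echo "$trace"
```

Each line of the trace names one caller, innermost first, so calling stackTrace from an error handler tells us how we got there.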


Tuesday, January 7, 2025

RTB aka Mass Surveillance on the loose

Real-Time Bidding (RTB)

In our society, nowadays, it would really be challenging to live without our smartphones, tablets or computers. If you are curious about privacy issues  or how your personal information may be handled when using those devices, you may want to read this article from the Electronic Frontier Foundation.

My iPad Safari is giving me this summary: 

"RTB is a process used to select targeted online ads, exposes personal information to thousands of companies daily. This data, called 'bidstream data', is easily linked to real people and can be exploited by data brokers for invasive purposes like tracking union organizers and compiling sensitive demographic information. To protect yourself, disable your mobile advertising ID, audit app permissions, and install Privacy Badger, a browser extension that blocks online trackers".

And if that read left you really concerned, you might be interested in what Janet Vertesi does to keep all that privacy-siphoning at bay. She offers suggestions galore for tightening your privacy when online and using your phone or even when shopping.