Scripting: Difference between revisions
Revision as of 11:41, 25 November 2020
Commands can be placed in a text file, called a script, to be interpreted and executed by an interpreter.
The interpreter is specified in the first line of the script, e.g. by:
#! /bin/sh
#! /bin/bash
#! /bin/tcsh
#! /usr/bin/awk -f
#! /usr/bin/env python
...
(Note that while # starts a comment in all the languages above, the sequence #! at the very beginning of the first line is used to identify the interpreter.)
Bash scripting
Among the many scripting languages, bash is particularly relevant to us (it is also the interpreter of the command-line shell we have been using so far).
Unix commands (enriched by bash built-in functions & structures) can be used in bash scripts:
$> cat ./get_users.sh
#! /bin/bash -x
filein=/etc/passwd
#
# extract user names
cat $filein | awk -v FS=":" '{print $1}'
Note that in order to execute get_users.sh, we first need to make it executable by changing its permissions:
$> chmod a+x ./get_users.sh
When the script is executed, its output can also be redirected to a file:
$> ./get_users.sh > users.dat
Within the script, $0 corresponds to the invocation name (./get_users.sh, in the example above), and $1, $2, ..., $n to the first, second, ..., n-th command-line argument, if present. $# is the number of command-line arguments passed to the script.
$> cat ./get_users2.sh
#! /bin/bash
if [ $# -eq 0 ] ; then echo "Usage: ./get_users2.sh <filename>" ; exit 1 ; fi
filein=$1
#
# extract user names
cat "$filein" | awk -v FS=":" '{print $1}'
Now, this second version of the script needs to be run as:
$> ./get_users2.sh /etc/passwd
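As a further illustration of $0, $# and the positional arguments, here is a minimal sketch. The script name show_args.sh, the temporary directory, and the use of "$@" (which expands to all the arguments, each kept as a separate word) are assumptions made for this example and are not part of the scripts above.

```shell
# Write a small script (hypothetical name: show_args.sh) into a temporary directory
tmpdir=$(mktemp -d)
cat > "$tmpdir/show_args.sh" << 'EOF'
#! /bin/bash
echo "invoked as: $0"
echo "number of arguments: $#"
i=1
# "$@" expands to all the arguments, one word per argument
for arg in "$@" ; do
    echo "argument $i: $arg"
    i=$((i+1))
done
EOF
chmod a+x "$tmpdir/show_args.sh"

# Run it with two arguments; the second one contains a space, so it is quoted
output=$("$tmpdir/show_args.sh" /etc/passwd "two words")
echo "$output"
```

Quoting the second argument keeps it as a single $2; without the quotes the script would see three arguments.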
Sed & Awk
These two commands, available almost everywhere, are very widely used in bash scripting.
sed
substitutes text matched by regular expressions in files or strings. Examples follow:
$> echo "Ciao Ciao" | sed 's/C/M/'     -> "Miao Ciao"
$> echo "Ciao Ciao" | sed 's/C/M/g'    -> "Miao Miao"    # g stands for "global substitution"
Regular expressions can also be used in the search.
- "." in a regular expression matches any single character (wild card); it needs to be protected as \. to be treated as a literal dot
- \n means newline
- \t tab
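The difference between "." and "\." can be checked directly. The following is a small sketch; the input strings and the variable names r1, r2, r3 are made up for illustration.

```shell
# "." matches any single character: both "cat" and "cut" match c.t
r1=$(echo "cat cut" | sed 's/c.t/X/g')
echo "$r1"     # X X

# "\." matches only a literal dot
r2=$(echo "file.v1.txt" | sed 's/\./-/g')
echo "$r2"     # file-v1-txt

# left unprotected, "." matches (and replaces) every character
r3=$(echo "a.b" | sed 's/./-/g')
echo "$r3"     # ---
```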
awk
performs line-by-line operations on its input (on numbers and strings, with a syntax similar to C)
$> echo 10 4.0 | awk '{print $1 * sqrt($2)}'
$> echo "LabQSM 2020" | awk '{print $1; print "Year", $2}'
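To see the line-by-line behaviour on more than one line, here is a minimal sketch with made-up input: the block in braces runs once per input line (NR is the built-in line counter), while the END block runs once after the last line.

```shell
# three input lines with two whitespace-separated fields each (made-up data)
result=$(printf '1 2.5\n2 4.0\n3 1.5\n' |
    awk '{ sum += $2; print NR ": " ($1 * $2) } END { print "total", sum }')
echo "$result"
```

The variable sum does not need to be declared: awk initializes it to zero on first use.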
Awk also has its own scripting language, useful e.g. for parsing or data post-processing (the same operation or search is applied to each line in turn).
Take e.g. the file apt.txt, with the list of tennis players that we have used in previous examples:
9850, Nadal, Rafael
6630, Federer, Roger
3075, Berrettini, Matteo
12030, Djokovic, Novak
Suppose we want to print the first and last name of the player listed on a given line of the file. The problem can be solved by awk as follows:
#! /usr/bin/awk -f
BEGIN{ i=ind; nlines=0; FS="," }
{
   if (NF != 3) next
   nlines++
   if (nlines == i) {printf "%s, %s\n", $3,$2}
}
END{
   # place here any operation to be done at the end
}
Run as:
$> ./solution.awk -v ind=2 apt.txt
Note that comma-separated columns are not required: if the columns are separated by whitespace instead, simply drop the redefinition of the field separator FS=",", since awk splits fields on whitespace by default.
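As a sketch of the whitespace-separated variant, the same selection can be written as a one-liner without any FS redefinition. The data below is the content of apt.txt with the commas removed, fed through printf only to keep the example self-contained; the output format "%s %s" is likewise a choice made for this example.

```shell
# same players as in apt.txt, but with whitespace-separated columns
result=$(printf '9850 Nadal Rafael\n6630 Federer Roger\n3075 Berrettini Matteo\n12030 Djokovic Novak\n' |
    awk -v ind=2 'NF == 3 { nlines++; if (nlines == ind) printf "%s %s\n", $3, $2 }')
echo "$result"     # Roger Federer
```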