Scripting

From Wiki Max
 
Latest revision as of 14:22, 23 November 2021



Commands to be interpreted and executed can be placed in a text file, called a script, which is then executed by means of an interpreter.

The interpreter is specified in the first line of the script, e.g. by:

 #! /bin/sh
 #! /bin/bash
 #! /bin/tcsh
 #! /usr/bin/awk -f
 #! /usr/bin/env python
 ...

(Note that, while # starts a comment in all the above languages, the #! pair on the first line, known as "shebang", is what the system uses to identify the interpreter.)
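The sketch below walks through the whole cycle: it writes a script with a shebang line, makes it executable and runs it. It assumes bash is installed as /bin/bash. The name hello.sh is hypothetical and used only for this example, and the here-document that writes the file is described further down this page.

```shell
#! /bin/bash
# Write a tiny script (hello.sh is a hypothetical name) whose first line
# selects bash as interpreter; quoting 'EOF' keeps $0 unexpanded here.
cat > hello.sh << 'EOF'
#! /bin/bash
echo "Hello from $0"
EOF

# Make it executable, then run it: the system reads the #! line and
# starts /bin/bash with the script path as argument.
chmod a+x ./hello.sh
./hello.sh
```

Running it prints the invocation name stored in $0, i.e. "Hello from ./hello.sh".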

Bash scripting

Among the many options, bash scripting is particularly relevant to us, since bash is also the command-line shell we have been using so far.

Unix commands (enriched by bash built-in functions & structures) can be used in bash scripts:

$> vi ./get_users.sh

#! /bin/bash  -x
# (-x makes bash print each command before executing it)
filein=/etc/passwd
#
# extract user names (first ":"-separated field of each line)
awk -v FS=":" '{print $1}' "$filein"

Note that, in order to execute get_users.sh directly, we first need to make it executable by changing its permissions:

 $> chmod a+x ./get_users.sh

As with any other command, the output of the script can be redirected to a file:

 $> ./get_users.sh > users.dat

Within the script, $0 holds the name used to invoke it (./get_users.sh in the example above), while $1, $2, ..., $n hold the 1st, 2nd, ..., n-th command-line arguments, if present. $# is the number of command-line arguments passed to the script.

$> cat ./get_users2.sh

#! /bin/bash
# stop with a usage message if no argument is given
if [ $# -eq 0 ] ; then echo "Usage:  ./get_users2.sh  <filename>" ; exit 1 ; fi
filein=$1
#
# extract user names
awk -v FS=":" '{print $1}' "$filein"

Now, this second version of the script needs to be run as:

$> ./get_users2.sh /etc/passwd

Sed & Awk

These two commands, available on virtually every Unix system, are used extensively in bash scripting.

sed

sed (stream editor) can, among other things, substitute text matching a regular expression in files or strings. Examples follow:

$> echo "Ciao Ciao" | sed 's/C/M/'
     ->  Miao Ciao
$> echo "Ciao Ciao" | sed 's/C/M/g'
     ->  Miao Miao                  # g stands for "global substitution"

The search pattern is a regular expression, so some characters have a special meaning:

  • "." in a regular expression matches any single character (wildcard); it needs to be escaped as \. to match a literal dot
  • \n means newline
  • \t means tab
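You can check the effect of escaping the dot directly. The input string below is just an example.

```shell
#! /bin/bash
# "." matches any single character, so both "a.b" and "axb" are replaced
echo "a.b axb" | sed 's/a.b/X/g'     # prints: X X
# "\." matches only a literal dot, so "axb" is left untouched
echo "a.b axb" | sed 's/a\.b/X/g'    # prints: X axb
```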


awk

processes its input line by line, with operations on numbers and strings and a syntax similar to C.

$> echo 10 4.0 | awk '{print $1 * sqrt($2)}'
     ->  20
$> echo "LabQSM 2020" | awk '{print $1; print "Year", $2}'
     ->  LabQSM
         Year 2020

Awk is also a scripting language in its own right, useful e.g. for parsing files or post-processing data: the same operations (or searches) are applied automatically to each input line. Its main ingredients are:

  • NR: number of the current line (record)
  • NF: number of fields in the current line, as split according to FS
  • FS: field separator; blanks by default, it can be changed e.g. to "," or ":" or a tab
  • $1, $2, ... $NF: the individual fields of the current line
  • arithmetic, string (and many more) operations
  • next: skips the remaining actions and moves on to the next line
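The following sketch feeds two made-up ":"-separated lines to awk to show NR, NF, FS, $NF and next in action.

```shell
#! /bin/bash
# For each line print its number (NR), its number of fields (NF)
# and its last field ($NF), using ":" as field separator.
printf 'a:b:c\nd:e\n' | awk -v FS=":" '{print NR, NF, $NF}'
# prints:
# 1 3 c
# 2 2 e

# "next" skips the remaining actions for the current line:
# here lines with fewer than 3 fields are not printed.
printf 'a:b:c\nd:e\n' | awk -v FS=":" 'NF < 3 {next} {print "kept:", $0}'
# prints: kept: a:b:c
```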

Take e.g. the file apt.txt, containing the list of tennis players used in previous examples:

 9850, Nadal,  Rafael 
 6630, Federer,  Roger
 3075, Berrettini,  Matteo 
12030, Djokovic,  Novak

For instance, printing the name (first name, then surname) of the player at a given position in the list can be done in awk as follows:

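#! /usr/bin/awk -f
# ind (position of the player to print) is passed on the command line with -v
BEGIN{ i=ind; nlines=0; FS="," }
{
  if (NF != 3) next          # skip lines without exactly 3 fields (e.g. empty ones)
  nlines++
  if (nlines == i) {printf "%s, %s\n", $3,$2}
}
END{
# place here any operation to be done at the end
}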

Run as:

$> ./solution.awk -v ind=2  apt.txt

Note that the commas in apt.txt are not essential: if the columns were separated by blanks only, one could simply drop the redefinition of the field separator (FS=",") and rely on awk's default splitting.
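This sketch illustrates the blank-separated case. It uses a hypothetical blank-separated copy of the list (apt_blank.txt). The awk program is written inline in a shell function (player_name, also a hypothetical name) rather than in a separate solution.awk file.

```shell
#! /bin/bash
# Blank-separated copy of the player list (apt_blank.txt is a hypothetical name)
cat > apt_blank.txt << EOF
 9850 Nadal Rafael
 6630 Federer Roger
 3075 Berrettini Matteo
12030 Djokovic Novak
EOF

# player_name N : print "first name, surname" of the N-th player.
# No FS redefinition: awk's default splitting on blanks is enough.
player_name() {
    awk -v ind="$1" '
        NF != 3       { next }                        # skip malformed lines
                      { nlines++ }
        nlines == ind { printf "%s, %s\n", $3, $2 }
    ' apt_blank.txt
}

player_name 2      # prints: Roger, Federer
```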

Bash control statements

Conditionals

if [ "$var1" = "$var2" ] ; then
   <some-statements>
else
   <some-statements>
fi

if [ -n "$var" ] ; then echo "var is non-empty" ; fi 
if [ -z "$var" ] ; then echo "var is empty" ; fi
if [ -e "$file" ] ; then echo "File exists" ; fi
if [ ! -e "$file" ] ; then echo "File does not exist" ; fi
if [ -d "$dir" ] ;  then echo "Dir exists" ; fi
if [ -x "$file" ] ; then echo "File exists and is executable" ; fi
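The tests above can be chained with elif. The sketch below wraps them in a shell function; describe_path is a hypothetical name. Directories are checked first because -x is also true for a searchable directory.

```shell
#! /bin/bash
# describe_path P : report what kind of path P is
describe_path() {
    local path="$1"
    if [ -z "$path" ] ; then
        echo "empty argument"
    elif [ -d "$path" ] ; then
        echo "directory"
    elif [ -x "$path" ] ; then
        echo "executable file"
    elif [ -e "$path" ] ; then
        echo "file"
    else
        echo "does not exist"
    fi
}

describe_path .                # prints: directory
describe_path /no/such/path    # prints: does not exist
```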

Loops

list="item1 item2 item3"
for item in $list
do
   echo "$item"
done
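Besides an explicit list, the items can come from the output of a command or from bash's own integer arithmetic. This sketch assumes the seq utility is installed, as it is on most Linux and macOS systems. The function names are hypothetical.

```shell
#! /bin/bash
# Loop over the output of a command: seq 1 3 prints 1, 2 and 3
loop_seq() {
    local i
    for i in $(seq 1 3) ; do
        echo "item$i"
    done
}

# Same result with bash's C-like arithmetic loop (no external command)
loop_c_style() {
    local i
    for (( i = 1; i <= 3; i++ )) ; do
        echo "item$i"
    done
}

loop_seq
loop_c_style
```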

Input from command line

#! /bin/bash
echo "number of arguments : $#"
echo "            command : $0"
echo "            1st arg : $1"
echo "            2nd arg : $2"
echo "                ... "
echo "           all args : $*"

Execute as:
$> ./example.sh  p1 p2 p3
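To process the arguments one by one (e.g. several input files), one can loop over "$@". The sketch below is written as a shell function (print_args, a hypothetical name) so that it can be tried without creating a separate script file. Inside a function, "$@" refers to the function's own arguments; inside a script, it refers to the script's arguments.

```shell
#! /bin/bash
# "$@" expands to all arguments, each kept as a separate word,
# so an argument containing blanks is not split.
print_args() {
    local n=0 arg
    for arg in "$@" ; do
        n=$((n + 1))
        echo "arg $n : $arg"
    done
}

print_args p1 "p2 with blanks" p3
```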

Dealing with longer text (useful e.g. to write input files)

For a few lines, repeated echo commands are enough (> creates or overwrites the file, >> appends to it):

echo line1 >  file.txt 
echo line2 >> file.txt
echo line3 >> file.txt

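When the text gets longer, a here-document is more convenient: everything between the cat line and the closing EOF marker is written to the file.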

cat >file.txt << EOF
   line1
   line2
   line3
EOF
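Inside a here-document, shell variables are expanded unless the delimiter is quoted. This makes here-documents handy for generating input files with different parameters. In the minimal sketch below, the file names and the value 6.74 are only illustrative.

```shell
#! /bin/bash
# Example value only: a lattice parameter stored in a shell variable
alat=6.74

# Unquoted EOF: $alat is expanded when the file is written
cat > expanded.txt << EOF
   celldm(1) = $alat
EOF

# Quoted 'EOF': the text is copied literally, $alat included
cat > literal.txt << 'EOF'
   celldm(1) = $alat
EOF

cat expanded.txt    # prints:    celldm(1) = 6.74
cat literal.txt     # prints:    celldm(1) = $alat
```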

Exercises

Exercise 1

Job script: write a script (run.sh) that runs pw.x on a given input file.

Solution 1

Solution 2


Exercise 2

  • Modify the run.sh script of Exercise 1 to loop over different lattice parameters.
  • Consider the grid: -3%, -2%, -1%, 0, +1%, +2%, +3%
  • Keep track of the parameters used by including them in the file names
  • Hint1: create a template input file, where the celldm1 field is assigned @alat@, which you will substitute in the script
  • Hint2: save both input and output files
  • Hint3: $ basename <name>.dat .dat  ->  <name>
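The sketch below covers Hint1 and Hint3 for a single point of the grid; it is not a full solution. The base value 6.74, the one-line template and the file names are illustrative assumptions, and the pw.x run itself is left out. Since bash arithmetic is integer-only, awk computes the scaled value.

```shell
#! /bin/bash
# scaled_alat A P : print A changed by P percent (e.g. P=-3 for -3%),
# using awk for the floating-point product
scaled_alat() {
    awk -v a="$1" -v p="$2" 'BEGIN { printf "%.4f\n", a * (1 + p / 100) }'
}

# Hint1: a hypothetical one-line template with the @alat@ placeholder
echo "celldm(1) = @alat@" > template.example

alat=$(scaled_alat 6.74 -3)                        # 6.5378
sed "s/@alat@/$alat/" template.example > "scf_${alat}.in"

# Hint3: strip the suffix to reuse the name, e.g. for the output file
name=$(basename "scf_${alat}.in" .in)
echo "$name"                                       # prints: scf_6.5378
```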

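Solution: have a look at

  • LAB_1/test_diamond/run_lattice.sh (https://github.com/max-centre/LabQSM/blob/main/LAB_1/test_diamond/run_lattice.sh) and
  • LAB_1/test_diamond/scf.tmpl (https://github.com/max-centre/LabQSM/blob/main/LAB_1/test_diamond/scf.tmpl)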


Exercise 3

Lattice parameter, revisited

  • Same as Exercise 2
  • Include the input file in the script itself, so that the parameters are substituted via shell variables and loops rather than in a separate template input file

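Solution: have a look at

  • LAB_1/test_diamond/run_lattice_with_input.sh (https://github.com/max-centre/LabQSM/blob/main/LAB_1/test_diamond/run_lattice_with_input.sh)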