Computational Methods in Physics ASU Physics PHY 494

01' Intermediate Shell Hacks (optional)

The real power of the shell lies in the fact that one can combine multiple shell commands and in this way automate pretty much any task that operates on files and directories. The two major ways to combine shell commands are

  • pipes, filters, and redirection
  • shell scripts

"Pipes" feed the output of one command into the input of another command and in this way combine multiple tools for filtering data.

"Scripts" are text files that contain multiple commands (including pipes). Executing the file will then execute all the commands, without your having to type them repeatedly. The bash shell is actually a complete programming language and one can accomplish rather complicated tasks with it.

Additionally, command completion ("TAB-completion") and wildcards enable fast interactive work.

The following is very useful but not essential right now, so it is optional. You can work through the tutorial in your own time.

If you want to dive much deeper into bash then start reading the Advanced Bash-Scripting Guide, an "in-depth exploration of the art of shell scripting".

Pipes and Filters

Commands

  • cat
  • head
  • tail
  • less
  • wc
  • sort
  • uniq
  • cut and paste
  • grep

and special shell characters

  • | ("pipe", joins commands together)
  • > (redirects output to a file)
  • < (redirects file to input)
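
As a minimal sketch of how the three special characters behave, the following uses a throwaway directory and a made-up file words.txt (both are assumptions for illustration, not part of the course data):

```shell
# Work in a temporary directory so that no course files are touched.
workdir="$(mktemp -d)"
cd "${workdir}"

# ">" redirects the output of printf into a file.
# (words.txt and its three lines are hypothetical sample data.)
printf 'gamma\nalpha\nbeta\n' > words.txt

# "<" feeds the file to the standard input of wc.
line_count="$(wc -l < words.txt)"

# "|" sends the output of sort to the input of head.
first_word="$(sort words.txt | head -1)"

echo "number of lines: ${line_count}"
echo "first word after sorting: ${first_word}"
```

Each command only reads its input and writes its output; the shell does the plumbing between them.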

Working with redirection and command pipelines

Download planets.dat (see footnote 1) and put it in the directory data/. You can do this with your web browser or, if you have the curl program installed, try

curl https://asu-compmethodsphysics-phy494.github.io/ASU-PHY494/public/data/planets.dat -O

(all on one line).

Output the whole file to the screen:

cat planets.dat

Look at the first three lines of the file:

head -3 planets.dat

should give

Alderaan             12500  grasslands/mountains
Yavin_IV             10200  jungle/rainforests
Hoth                  7200  tundra/icecaves/mountainranges

Count the number of planets (wc prints the number of lines, words, and bytes, in that order; each line is one planet):

wc planets.dat
      60     180    2944 planets.dat

Make a file in which all planets are duplicated:

cat planets.dat planets.dat > planets_2.dat

(cat concatenates the files and > redirects the combined output to a new file.)

Activity

  1. Test that wc -l gives just the number of lines ("60").
  2. What does wc planets.dat planets_2.dat do?
  3. What happens when you do wc < planets.dat?
  4. Run wc then type any number of lines of text (pressing enter to terminate each line) and when you get bored, press ^D (control+D). What happened?

Pipes

Sort by name and look at the first five using pipes and filters:

sort planets.dat | head -5

Sort by diameter (-k2,2 selects column 2 as the sort key and -n sorts numerically), biggest first (-r reverses the sort), and write the top 3 to a file biggest_planets:

sort -k2,2 -n -r planets.dat | head -3 > biggest_planets
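
To see why -n matters, here is a small sketch with a made-up two-column file sizes.dat (the names and numbers are hypothetical, not taken from planets.dat). Without -n the second column is compared as text, so "900" ranks above "12000":

```shell
# Work in a temporary directory so that no course files are touched.
workdir="$(mktemp -d)"
cd "${workdir}"

# Hypothetical sample data: a name and a size per line.
printf 'Small_World 900\nBig_World 12000\nMid_World 4500\n' > sizes.dat

# Text comparison of column 2: "9" sorts after "4" and "1".
text_top="$(sort -k2,2 -r sizes.dat | head -1)"

# Numeric comparison of column 2: 12000 is the largest number.
numeric_top="$(sort -k2,2 -n -r sizes.dat | head -1)"

echo "top entry without -n: ${text_top}"
echo "top entry with -n:    ${numeric_top}"
```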

Count the number of planets with unknown diameter:

grep "unknown" planets.dat | wc -l
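
The same filter-then-count pattern can be tried on a small made-up file (worlds.dat and its contents are assumptions for illustration only):

```shell
# Work in a temporary directory so that no course files are touched.
workdir="$(mktemp -d)"
cd "${workdir}"

# Hypothetical sample data: two of the four entries have an unknown size.
printf 'Aurelia 9000\nBorealis unknown\nCindra 4100\nDrift unknown\n' > worlds.dat

# grep prints only the lines that contain the pattern ...
grep "unknown" worlds.dat

# ... and wc -l counts the lines that pass through the filter.
unknown_count="$(grep "unknown" worlds.dat | wc -l)"
echo "entries with unknown size: ${unknown_count}"
```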

Get the first letter of each planet name and sort alphabetically:

cut -b 1,1 planets.dat | sort

Get the terrain types (everything from byte 29 to the end of each line):

cut -b 29-  planets.dat
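
cut -b selects bytes by position, which works here because the file has fixed-width columns. A minimal sketch with a made-up fixed-width file fruit.dat (hypothetical data; the name is padded to 10 bytes and the colour starts at byte 11):

```shell
# Work in a temporary directory so that no course files are touched.
workdir="$(mktemp -d)"
cd "${workdir}"

# Hypothetical fixed-width table: printf pads each name to 10 bytes.
printf '%-10s%s\n' apple red banana yellow cherry red > fruit.dat

# A single position: the first byte of every line (same as -b 1,1).
first_letters="$(cut -b 1 fruit.dat)"

# An open-ended range: from byte 11 to the end of the line.
colours="$(cut -b 11- fruit.dat)"

echo "${first_letters}"
echo "${colours}"
```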

Activity

  1. Count the number of planets in planets_2.dat.
  2. Find planets where the rebels might have a base (hint: you know it's cold there… use grep). How many planets will you investigate more closely? Write the list to the file bases.
  3. How many unique terrain types are there? (Hint: uniq needs a sorted list as input)
  4. What is the most frequent and the least frequent first letter amongst these planets? (Hint: uniq -c)
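
The hint about sorted input can be illustrated on unrelated data. In this sketch, colours.txt is a made-up file (an assumption for illustration); uniq only merges duplicate lines that are adjacent, so the unsorted list is not reduced at all:

```shell
# Work in a temporary directory so that no course files are touched.
workdir="$(mktemp -d)"
cd "${workdir}"

# Hypothetical sample data: three distinct values, no two equal lines adjacent.
printf 'red\nblue\nred\ngreen\nblue\nred\n' > colours.txt

# Unsorted input: no adjacent duplicates, so all six lines remain.
unsorted_count="$(uniq colours.txt | wc -l)"

# Sorted input: equal lines are now adjacent and get merged.
sorted_count="$(sort colours.txt | uniq | wc -l)"

echo "lines after uniq alone:     ${unsorted_count}"
echo "lines after sort then uniq: ${sorted_count}"

# uniq -c prefixes each distinct line with the number of times it occurred.
sort colours.txt | uniq -c
```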

Using git to get data for the class

git is version control software and we will come back to explain its main functionality later. Right now we use it as a convenient tool to get additional data.

There is a "repository" at https://github.com/ASU-CompMethodsPhysics-PHY494/PHY494-resources that contains data and code to be used during the class. It will be updated as we go along.

Get the data for today by cloning the repository:

cd ~
git clone https://github.com/ASU-CompMethodsPhysics-PHY494/PHY494-resources.git

(You only need to do this once.)

At any later time, pull in the latest updates from inside the repository:

cd PHY494-resources
git pull

(This can be done as often as you like.)

Go into the PHY494-resources/01_shell/data directory.

Activity

  1. Read the README file, e.g. using atom, cat, or less (in less, press h to get help and q to quit).

  2. Count the number of entries in all files ending in "csv". (Hint: use a glob pattern)

  3. Use cut -f 1 -d ',' people.csv to extract each name to a file names and a similar command to extract weight to a file weights.

  4. Use the paste command to generate a new list that contains "weight name" (reordered; by default paste separates the columns with a tab):

     paste weights names

     Use this approach to sort the people in order of decreasing weight.
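
The cut and paste steps can be tried out on a small made-up file first. In this sketch, items.csv and its two columns (item name, count) are hypothetical and unrelated to people.csv; -d ' ' is used to get a single space instead of the default tab:

```shell
# Work in a temporary directory so that no course files are touched.
workdir="$(mktemp -d)"
cd "${workdir}"

# Hypothetical comma-separated data: item name, count.
printf 'widget,12\ngadget,7\ngizmo,30\n' > items.csv

# cut -d sets the field delimiter, -f picks the field number.
cut -f 1 -d ',' items.csv > names
cut -f 2 -d ',' items.csv > counts

# paste glues the files together line by line, here in swapped order.
paste -d ' ' counts names > counts_names

cat counts_names
```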

Shell scripts

You can save commands in a file. This is called a script. A script allows you to reuse commands (laziness is a programmer's virtue!) without having to retype them over and over again. It also allows you to solve a task once and then forget about how you did it in detail because it is written in the script.

Make a directory ~/bin for your scripts in your home directory:

mkdir ~/bin

(I strongly suggest that you really do this in your home directory because I will assume it in the following; if you changed the path to e.g. ~/classes/2016/PHY494/bin then you will need to use that path in all the following examples.)

Using atom, create the following script ~/bin/update_resources.sh:

# PHY 494 script to update the resources repository

GIT_REPOSITORY="${HOME}/PHY494-resources"

cd "${GIT_REPOSITORY}"
git pull

echo "Updated resources in ${GIT_REPOSITORY}"

(You create the script by (1) atom ~/bin/update_resources.sh (opens empty file if it does not exist), (2) type all the lines into the editor (or copy & paste), (3) save the file and exit the editor.)

Notes:

  • All the lines above should be in your file (first line will start with # PHY 494 and the last line will begin with echo).

  • The line starting with # is a comment: it is not a shell command and is ignored by the shell. However, adding comments to scripts is a really, really good idea!

  • The shell has variables: Some like HOME are pre-defined, others you can define yourself (GIT_REPOSITORY=...). Using all-caps is a convention that you should follow.

    The value of a variable is accessed by putting the dollar sign $ in front of the variable name; the name can be enclosed in braces, as in ${HOME}.

  • echo prints to the standard output (typically, the screen)
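
The notes on comments, variables, and echo can be tried in isolation. In this sketch COURSE_NAME and NOTES_DIR are hypothetical variables chosen for illustration; only HOME is pre-defined by the shell. Note that there must be no spaces around the = sign in an assignment:

```shell
# Lines starting with "#" are comments and are ignored by the shell.

# Define your own variables (hypothetical names, all-caps by convention).
COURSE_NAME="PHY494"
NOTES_DIR="${HOME}/${COURSE_NAME}-notes"

# Access the values with "$"; echo prints them to the standard output.
echo "Course: ${COURSE_NAME}"
echo "Notes would be kept in ${NOTES_DIR}"
```

Nothing is created on disk here; the script only builds a string from the two variables and prints it.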

Execute the script with

bash ~/bin/update_resources.sh

It should show output similar to

Already up-to-date.
Updated resources in /Users/oliver/PHY494-resources

However, during the course of the year more data will be added to the repository and then you can just run your update command to get the data and you might see output like the following:

remote: Counting objects: 15, done.
remote: Compressing objects: 100% (10/10), done.
remote: Total 15 (delta 2), reused 15 (delta 2), pack-reused 0
Unpacking objects: 100% (15/15), done.
From https://github.com/ASU-CompMethodsPhysics-PHY494/PHY494-resources
   c3b5c04..23a4083  master     -> origin/master
Updating c3b5c04..23a4083
Fast-forward
 01_shell/bin/update_resources.sh | 8 ++++++++
 02_python/gutentag.py            | 4 ++++
 2 files changed, 12 insertions(+)
 create mode 100644 01_shell/bin/update_resources.sh
 create mode 100644 02_python/gutentag.py
Updated resources in /Users/oliver/PHY494-resources
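
The script above assumes that the repository was cloned into ${HOME}/PHY494-resources. If the cd fails, git pull would run in whatever directory you happen to be in. A slightly more defensive variant could look like the following sketch; the function name update_resources and its optional path argument are assumptions for illustration and not part of the course material:

```shell
# Hypothetical helper: wraps the update steps in a bash function.
update_resources() {
    # Use the first argument as the repository path, or the course default.
    local repository="${1:-${HOME}/PHY494-resources}"

    # Stop early if the directory does not exist.
    if [ ! -d "${repository}" ]; then
        echo "No repository found at ${repository}" >&2
        return 1
    fi

    # Run in a subshell so that the caller's working directory is unchanged;
    # git pull only runs if the cd succeeded.
    (cd "${repository}" && git pull) || return 1

    echo "Updated resources in ${repository}"
}
```

After defining the function, calling update_resources without arguments would update the default location; calling it with a path that does not exist prints a message and returns a non-zero status instead of running git pull.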

Footnotes
  1. Star Wars data courtesy of SWAPI. See PHY494-auxilliary/star_wars for Python code to pull the data from SWAPI.