
Chapter 5: Writing pure Bash or ZSH command-line tools

Alfredo Deza

I’ve been horrified before while trying to figure out a piece of production code that mixed shell scripting and Python. Why would anyone do something like this? It went a step further when a large (and custom) Python test framework made a system call to a shell that then itself executed Python on a remote system. Can you imagine fixing bugs in that codebase? Where do you start? On the remote server running some odd version of Python, or in an older version of BASH with a built-in that behaves differently? Or perhaps in Python, which can change some subtle things (like dictionary ordering) from one version to the next?

    writes = run(
        args=[
            'sudo', 'mkdir', '-p', '/etc/app', escaped('&&'),
            'sudo', 'chmod', '0755', '/etc/app', escaped('&&'),
            'sudo', 'python',
            '-c',
            'import shutil, sys; shutil.copyfileobj(sys.stdin, file(sys.argv[1], "wb"))',
            conf_path,
            escaped('&&'),
            'sudo', 'chmod', '0644', conf_path,
            ],
        stdin=run.PIPE,
        wait=False,
        )
    feed_many_stdins_and_close(conf_fp, writes)
    run.wait(writes)

I’ve changed some paths and names; the idea here is not to point fingers or blame anyone. I’m guilty of writing horrendous code before too! This piece of code is like using real lunar dust to create a representation of the Moon for your 4th grade science class. There is absolutely no need whatsoever to do this, but you certainly can. The example has a few red flags. First, it compounds multiple shell statements into one using an escaped double ampersand (&&), which is problematic if one of these pieces fails: it removes the nicety of fine error control and introduces all the roughness of shell scripting. Next, it calls out to Python (remember this is originating from Python) as a shell command to do something that could very well be another shell command: copying a file from stdin. Finally, it changes the permissions on the path and returns to Python.

This chapter does not intend to encourage this type of programming. I start with a bad example because it demonstrates how easily one can abuse the flexibility of Python and the shell. Instead, this chapter showcases some good uses, where mixing shell scripts with Python, and Python within shell scripts, is perfectly valid and useful.

Understanding environment variables

Environment variables can be magical, and to some extent they can be useful. One of the problems with environment variables is that it isn’t always possible to tell where they are coming from, because they can be overridden. Environment variables are variables that are defined in the system and are most commonly used in shell scripts. These variables are also available through Python, so it is possible to inspect them from there. Try a quick test on your computer by opening a terminal and running the env command:

$ env
TERM_SESSION_ID=w1t10p0:54756B1E-79CF-46D6-B6C9-98DBE779ABA2
LC_TERMINAL_VERSION=3.3.9
COLORFGBG=15;0
ITERM_PROFILE=Alfredo
XPC_FLAGS=0x0
LANG=en_US.UTF-8
PWD=/Users/alfredo/python/python-command-line-tools-book
SHELL=/bin/zsh
TERM_PROGRAM_VERSION=3.3.9
TERM_PROGRAM=iTerm.app
PATH=/Users/alfredo/go/bin:/usr/local/go/bin:/usr/local/bin:\
/Library/Frameworks/Python.framework/Versions/3.5/bin:\
/Library/Frameworks/Python.framework/Versions/3.7/bin:\
/Library/Frameworks/Python.framework/Versions/3.8/bin:\
/Library/Frameworks/Python.framework/Versions/3.6/bin:\
/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/sbin:\
/usr/local/mysql/bin:/Users/alfredo/bin:/usr/texbin:/usr/local/go/bin
LC_TERMINAL=iTerm2
COLORTERM=truecolor
TERM=xterm-256color
HOME=/Users/alfredo
TMPDIR=/var/folders/pz/vqrg684d10n8jmv6fz60kxjw0000gn/T/
USER=alfredo
LOGNAME=alfredo
PIP_DOWNLOAD_CACHE=/Users/alfredo/.pip_cache
GOROOT=/usr/local/go
GOPATH=/Users/alfredo/go
KEYTIMEOUT=1
ARCHFLAGS=-arch i386 -arch x86_64
MAKEOPTS=-j17
LESS=FRSXQ
PYTHONSTARTUP=/Users/alfredo/dotfiles/pythonstartup.py
EDITOR=/usr/local/bin/vim
_=/usr/bin/env

A lot shows up on my system. It gives you some insight into what I’m using and how. The text editor I prefer (Vim) is set there, as well as the shell (ZSH). Other tools like Git can access these values and use the preferred text editor when crafting a commit message, for example. Environment variables allow programs (or environments) to set some named values to use them throughout the program or interchangeably with other applications. Imagine you had to write a program in Python that retrieves some values from a different program or service, like the Nginx web server. It would be challenging if it weren’t for environment variables.

As I’ve mentioned, the environment variables already set for my user are accessible via Python with the os module:

>>> import os
>>> os.environ['LOGNAME']
'alfredo'

As you can see, os.environ is a mapping that behaves almost exactly like a plain Python dictionary. Accessing keys and values is the same, and the os module offers a few extra functions for manipulating the environment, like putenv() and unsetenv(). These two helpers depend on support from the system where Python is running, so you may not always see them. Assigning to os.environ is the safer route, since calling putenv() directly does not update the mapping. It is safe to treat os.environ as a plain dictionary, where you set keys and values and use them elsewhere.
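Because os.environ behaves like a dictionary, the usual dictionary idioms apply. This is a minimal sketch; APP_MODE is a made-up variable name used only for illustration:

```python
import os

# APP_MODE is a hypothetical variable name, used only for illustration.
os.environ.pop("APP_MODE", None)  # make sure it starts out unset

# .get() returns a default instead of raising KeyError for a missing key.
mode = os.environ.get("APP_MODE", "development")
print(mode)

# Membership tests and assignment work as they do on a plain dictionary.
if "APP_MODE" not in os.environ:
    os.environ["APP_MODE"] = "testing"
print(os.environ["APP_MODE"])
```

Reaching for .get() with a default is usually more forgiving than indexing directly, which raises KeyError when the variable is missing.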

One thing to be careful about, which has created issues for me before, is that you are not allowed to set any value that isn’t a string. This constraint comes from environment variables themselves, which are always strings assigned to keys. If you are unaware of this detail, you get an error similar to this:

>>> import os
>>> os.environ['my_value'] = 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/os.py", line 678, in __setitem__
    value = self.encodevalue(value)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/os.py", line 748, in encode
    raise TypeError("str expected, not %s" % type(value).__name__)
TypeError: str expected, not int
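One way around the error is to convert values explicitly on the way in and parse them on the way out. The two helpers below, set_env() and get_env_int(), are hypothetical names for this sketch and are not part of the os module:

```python
import os


def set_env(name, value):
    """Store value in the environment, converting it to a string first."""
    os.environ[name] = str(value)


def get_env_int(name, default=0):
    """Read a variable back as an integer, falling back to default."""
    try:
        return int(os.environ[name])
    except (KeyError, ValueError):
        # Missing variable, or a value that is not a valid integer.
        return default


set_env("my_value", 1)
print(repr(os.environ["my_value"]))
print(get_env_int("my_value"))
```

The value is stored as the string '1', and it only becomes an integer again when the program parses it.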

Another important thing to understand is that environment variables manipulated in Python are not long-lived. They persist only for as long as the Python program is running, and this is true both for setting new environment variables and for removing or altering existing ones:

>>> os.environ['my_value'] = '1'
>>> os.environ['my_value']
'1'
>>> ^D

Then on the terminal, the my_value variable is just not there:

$ env | grep my_value || echo "variable not found"
variable not found

Manipulating an existing variable yields similar behavior:

>>> os.environ['SHELL']
'/bin/zsh'
>>> os.environ['SHELL'] = ''
>>> os.environ['SHELL']
''

After setting SHELL to an empty string and exiting the program, the variable hasn’t changed at all:

$ env | grep SHELL
SHELL=/bin/zsh
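The change never reaches the shell that launched Python, because a process cannot modify the environment of its parent. Child processes are different: they start with a copy of the current environment. Here is a small sketch of that behavior, using subprocess to start a second Python interpreter:

```python
import os
import subprocess
import sys

os.environ["my_value"] = "1"

# A child process started from this program inherits the modified environment.
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ.get('my_value'))"],
    capture_output=True,
    text=True,
    check=True,
)
print(child.stdout.strip())
```

The child prints 1, while the terminal that started the program still has no my_value variable after the program exits.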

Understand shell profiles

Profiles are different files that can override the system-wide configuration of your shell environment. I prefer using ZSH, but these notions apply to the vast majority of shells out there, including the omnipresent BASH shell. Depending on the system, these files have different locations. On OSX, for example, there is a system-wide BASH configuration file:

$ cat /etc/bashrc
# System-wide .bashrc file for interactive bash(1) shells.
if [ -z "$PS1" ]; then
   return
fi

PS1='\h:\W \u\$ '
# Make bash check its window size after a process completes
shopt -s checkwinsize

[ -r "/etc/bashrc_$TERM_PROGRAM" ] && . "/etc/bashrc_$TERM_PROGRAM"

For the most part, you add configuration for your environment in the specific configuration file for your user. A system can have many different users, but (ideally) each user account belongs to one person, who can keep modifications in one file that gets loaded every time a new session starts. A new session could be a new terminal window (or tab), or logging into a machine. There are still subtle differences between logging into a system and opening a new terminal, and just for informational purposes, this is the order in which BASH looks for these files:

/etc/profile : Read first by login shells, for system-wide configurations.
~/.bash_profile : The file that determines configuration for login shells, that is, logging into a new shell (unlike opening a new terminal window). After /etc/profile, BASH reads only the first of ~/.bash_profile, ~/.bash_login, and ~/.profile that it finds.
~/.bash_login : Analogous to ~/.bash_profile, and only read when ~/.bash_profile does not exist. It is confusing but safe to ignore; you should use ~/.bash_profile instead.
~/.profile : The legacy fallback, read only when neither of the previous two files exists, and kept primarily for backwards compatibility with administrators that still use this file.
~/.bashrc : Finally, the user configuration file that BASH reads for interactive shells that are not login shells, such as a new terminal window. Login shells do not read it on their own, so it is common for ~/.bash_profile to source it. It is primarily the place where users customize their shell environment, like adding environment variables or aliases.
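According to the BASH manual, a login shell reads /etc/profile and then only the first of ~/.bash_profile, ~/.bash_login, and ~/.profile that exists. That lookup can be sketched in a few lines of Python. The user_login_file() helper is a hypothetical name, and the sketch only checks that a file exists, whereas BASH also requires it to be readable:

```python
import tempfile
from pathlib import Path

# Candidates BASH checks, in order, for a login shell after /etc/profile.
LOGIN_FILES = (".bash_profile", ".bash_login", ".profile")


def user_login_file(home):
    """Return the first login file that exists in home, or None."""
    for name in LOGIN_FILES:
        candidate = Path(home) / name
        if candidate.is_file():
            return candidate
    return None


# Try it against a throwaway directory instead of a real home directory.
with tempfile.TemporaryDirectory() as home:
    (Path(home) / ".profile").touch()
    (Path(home) / ".bash_login").touch()
    chosen = user_login_file(home).name
print(chosen)
```

With both ~/.bash_login and ~/.profile present, only .bash_login is picked. This is why a leftover ~/.bash_login can silently keep ~/.profile from ever being read.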

Customizing your shell

Having some helpers and neat aliases is helpful. Almost always, these customizations go into the shell configuration file (~/.bashrc for BASH and ~/.zshrc for ZSH). Although I like ZSH and it is what I’ve used for the past few years, the examples below should work fine with BASH as well. I advise you to keep your customizations in version control, both to ensure they aren’t lost and to keep track of changes over time. Once they are in version control, I create symbolic links from the repository where my configurations are, back to where the shell expects them. For example, most of my configuration files are in my dotfiles repository, so I clone it in my home directory and then link them. For the .zshrc file that would be something like this:

$ git clone https://github.com/alfredodeza/dotfiles.git
[...]
$ ln -s dotfiles/.zshrc ~/.zshrc

If the .zshrc file didn’t exist before, the linking succeeds, and my full customization is ready to go. There are several things I use for my shell, and I demonstrate them in the next examples.
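The linking step can also be scripted so that it never clobbers an existing file. This is a minimal sketch, assuming a hypothetical helper named link_dotfile() (it is not something that ships with the dotfiles repository above). It runs against temporary directories so nothing in a real home directory is touched:

```python
import os
import tempfile
from pathlib import Path


def link_dotfile(repo, name, home):
    """Link home/name to repo/name; return False if home/name already exists."""
    source = Path(repo) / name
    target = Path(home) / name
    if target.exists() or target.is_symlink():
        return False  # never overwrite an existing file or link
    os.symlink(source, target)
    return True


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / "dotfiles"
    home = Path(tmp) / "home"
    repo.mkdir()
    home.mkdir()
    (repo / ".zshrc").write_text("# my zsh config\n")
    first = link_dotfile(repo, ".zshrc", home)   # creates the link
    second = link_dotfile(repo, ".zshrc", home)  # link is already there
    linked_text = (home / ".zshrc").read_text()
print(first, second)
```

The first call creates the link and the second one backs off, which mirrors how ln -s refuses to replace an existing ~/.zshrc.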

Having a long history (or almost unlimited history) is great because you can forget a useful command, or set of commands, from three months ago, and history is there to save you. For BASH it looks like this:

export HISTFILESIZE=
export HISTSIZE=
export HISTTIMEFORMAT="[%F %T] "
export HISTFILE=~/.bash_eternal_history

And for ZSH, I do this:

HISTFILE=$HOME/.zsh_history
HISTSIZE=10000000
SAVEHIST=10000000
setopt SHARE_HISTORY
setopt APPEND_HISTORY

Instead of using the history built-in to search for what you need, you can use Ctrl-r to do a reverse search that looks through the full history as you type a command, or even part of one. Try it out by pressing Ctrl-r and then typing the command you are looking for. In ZSH it looks something like this:

$ sudo rm -rf sha256
bck-i-search: sudo _

The bck-i-search line is where I am typing sudo, and the prompt above it keeps updating with what matches. BASH looks a bit similar:

(reverse-i-search)`ls': ls

The typed command appears right after the reverse-i-search, and the result is after the colon.

Many of my customizations are for ZSH only, and since almost everyone uses BASH primarily, I’ll concentrate on the examples that work interchangeably. Aliases are one of them. Here is a set that allows me to quickly move up directories with dots instead of writing cd ../../:

# cd aliases
alias ..="cd .."
alias ...="cd ../.."
alias ....="cd ../../.."
alias .....="cd ../../../.."
alias ......="cd ../../../../.."

As a Vim user, I like to exit a shell without being required to press Ctrl-D, so an alias for :q is great. Bonus points for the upper case variant, because sometimes I hold the Shift key too long:

alias \:q='exit'
alias \:Q='exit'

If your shell doesn’t display colors to differentiate between directories, files, and executables, this handy alias works well, regardless of Linux or OSX:

ls --color -d . >/dev/null 2>&1 && alias ls='ls --color=if-tty' || alias ls='ls -G'

Write Shell functions

Aside from small customizations and aliases that go in your .profile or shell config file, a neat trick to explore is writing shell functions. Any shell function defined in those files as the shell starts up is available in the terminal as a command, just like an executable:

my_function() {
    echo "This is actually a function!"
}

After saving that function in your shell config file, start a new session, so the file is re-read. The my_function appears as an available command that can get called:

$ my_function
This is actually a function!

Arguments and options can work with these functions and expand their usage. The main problem with these is that you have to make sure they don’t get big. Anything larger than ten lines of shell is asking to be an actual command-line tool. As long as you are aware of this, feel free to keep adding them when you need to solve a particular problem with just a couple of lines in a shell function. One particular problem I encounter every now and then is that I’m not sure where a Python module is coming from. Since Python keeps having issues figuring out proper packaging solutions, users need to rely on virtual environments and solve problems with libraries installed in different places. One way to quickly check where a module exists is by importing it and printing it in an interactive Python session:

>>> import click
>>> print(click)
<module 'click' from '/Users/alfredo/.virtualenvs/cli/lib/python3.8/site-packages/click/__init__.py'>

I don’t want to start an interactive Python shell to check on module locations every time, so a quick function makes sense here. First, I need to make sure a module can be imported, handle the exception if it cannot, and finally print the path. I call this helper try():

try() {
    python3 -c "
exec('''
try:
    import ${1} as _
    print(_.__file__)
except Exception as e:
    print(e)
''')"
}

Its usage on the command line is straightforward; it accepts a single argument:

$ try os
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/os.py
$ try click
/Users/alfredo/cli/lib/python3.8/site-packages/click/__init__.py
$ try foo
No module named 'foo'
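The same lookup can be done from Python without importing the module at all, by asking the import system where it would load the module from. This is a sketch of an alternative, not what try() does internally. The module_path() helper is a hypothetical name, and it is meant for top-level module names:

```python
import importlib.util


def module_path(name):
    """Return where a top-level module would be loaded from, or None."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    if spec is None:
        return None
    # For built-in or frozen modules, origin is a label such as 'built-in'
    # rather than a filesystem path.
    return spec.origin


print(module_path("json"))
print(module_path("foo_module_that_does_not_exist"))
```

Skipping the import means that any code that would run when the module is imported is not executed, which is handy when inspecting an unfamiliar package.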

To help with other interesting package metadata, I use the pkg_resources module, which knows how to retrieve that information as well as the location of a module. I call this helper function welp(), and it depends on the try() helper from before:

welp() {
    P_VERSION=`python3 -c "
exec('''
try:
    import pkg_resources
    print(pkg_resources.get_distribution(\'${1}\').version)
except Exception:
    print(\'Not found\')
''')"`
    echo "Path: $(try ${1})"
    echo "Version: ${P_VERSION}"
}

Very useful on the command line:

$ welp foo
Path: No module named 'foo'
Version: Not found
$ welp click
Path: /Users/alfredo/cli/lib/python3.8/site-packages/click/__init__.py
Version: 7.1.1
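On Python 3.8 and newer, the standard library can answer the version question without pkg_resources, through importlib.metadata. This is a minimal sketch of that alternative, with package_version() as a hypothetical helper name. Note that it expects the distribution name used at install time, which is not always the same as the import name:

```python
from importlib.metadata import PackageNotFoundError, version


def package_version(name):
    """Return the installed version of a distribution, or 'Not found'."""
    try:
        return version(name)
    except PackageNotFoundError:
        return "Not found"


print(package_version("foo-package-that-does-not-exist"))
```

Swapping this into the Python snippet inside welp() would keep the same "Not found" fallback while removing the dependency on setuptools being installed.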