
Chapter 9: Case Studies #

This chapter contains a collection of recipes that have come up through the years in building command-line tools.

Distribute a containerized click application to DockerHub #

Getting started with Docker #

There are two primary components of Docker: Docker Desktop and Docker Hub.

(Figure: the Docker ecosystem)

Docker Desktop Overview #

The desktop application contains the container runtime, which allows containers to execute. Docker Desktop itself orchestrates the local development workflow, including the ability to use Kubernetes, an open-source system for managing containerized applications that came out of Google.

Docker Hub Overview #

So what is Docker Hub, and what problem does it solve? Just as the git source code ecosystem has local developer tools like Vim, Emacs, Visual Studio Code, or Xcode that work with it, Docker Desktop works with Docker containers and allows for local use and development.

When collaborating with git outside of the local environment, developers often use platforms like GitHub or GitLab to communicate with other parties and share code. Docker Hub works similarly. Docker Hub allows developers to share Docker containers that can serve as a base image for building new solutions.

These base images can be built by experts and certified to be high quality: for example, the official Python developers maintain a base image. This allows a developer to leverage the expertise of the true experts on a particular software component and improve their container’s overall quality. This is a similar concept to using a library developed by another developer versus writing it yourself.
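
For example, pulling the official Python base image from Docker Hub is a single command:

$ docker pull python:3.7.3-stretch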

Why Docker Containers vs. Virtual Machines? #

What is the difference between a container and a virtual machine? Here is a breakdown:

  • Size: Containers are much smaller than Virtual Machines (VMs) and run as isolated processes versus virtualized hardware. VMs can be GBs, while containers can be MBs.
  • Speed: Virtual Machines can be slow to boot and take minutes to launch. A container can spawn much more quickly, typically in seconds.
  • Composability: Containers are built programmatically and are defined as source code in an Infrastructure as Code (IaC) project. Virtual Machines are often replicas of a manually created system. Containers make IaC workflows possible because they are defined as a file and checked into source control alongside the project’s source code.

Real-World Examples of Containers #

What problem do Docker format containers solve? In a nutshell, the operating system runtime can be packaged along with the code, which solves a particularly complicated problem with a long history. There is a famous meme that goes, “It works on my machine!” While this is often told as a joke to illustrate the complexity of deploying software, it is also true. Containers solve this exact problem. If the code works in a container, then the container configuration can be checked in as code. Another way to describe this concept is that the actual infrastructure runs as “code.” This process is called IaC (Infrastructure as Code).

Here are a few specific examples:

Developer Shares Local Project #

A developer can work on a web application that uses flask (a popular Python web framework). The Dockerfile handles the installation and configuration of the underlying operating system. Another team member can check out the code and use docker run to run the project, as sketched below. This process eliminates the multi-day problem of configuring a laptop correctly to run a software project.
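
A minimal sketch of that workflow, assuming a hypothetical flask-app repository with a Dockerfile at its root:

$ git clone https://github.com/example/flask-app.git
$ cd flask-app
$ docker build --tag=flask-app .
$ docker run -p 127.0.0.1:8080:5000 flask-app

Flask’s development server listens on port 5000 by default, so the -p flag maps it to port 8080 on the host.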

Data Scientist shares Jupyter Notebook with a Researcher at another University #

A data scientist working with Jupyter-style notebooks wants to share a complex data science project with multiple dependencies on C, Fortran, R, and Python code. They package up the runtime as a Docker container, eliminating weeks of back and forth that occur when sharing a project like this.
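
One way to do this is to start from a community-maintained scientific Python image; the sketch below assumes the jupyter/scipy-notebook image from Docker Hub:

$ docker run -p 8888:8888 jupyter/scipy-notebook

The researcher only needs Docker installed; every other dependency ships inside the image.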

A Machine Learning Engineer Load Tests a Production Machine Learning Model #

A machine learning engineer needs to take a new model and deploy it to production. Previously, there was no good way to test a new model’s accuracy before committing to it. The model recommends products to paying customers and, if it is inaccurate, costs the company a lot of money. By using containers, it is possible to deploy the model to only a fraction of the customers, say 10%, and if there are problems, it can be quickly reverted. If the model performs well, it can quickly replace the existing model.

Running Docker Containers #

Using “base” images #

One of the advantages of the Docker workflow for developers is the ability to use certified containers from the “official” development teams. In this diagram, a developer uses the official Python base image developed by the core Python developers. This step is accomplished by the FROM statement, which loads in a previously created container image.

(Figure: using a Docker base image)

As the developer makes changes to the Dockerfile, they test locally and then push the changes to a private Docker Hub repo. After this, the changes can be used by a deployment process to a cloud or by another developer.
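
A minimal sketch of a Dockerfile that builds on the official Python base image (the full example appears later in this chapter):

FROM python:3.7.3-stretch
WORKDIR /app
COPY . /app/

The FROM line pulls the base image from Docker Hub; every instruction after it layers changes on top of the image built by the Python core developers.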

Common Issues Running a Docker Container #

There are a few common issues that crop up when starting a container or building one for the first time. Let’s walk through each problem and then present a solution for them.

  • What goes in a Dockerfile if you need to write to the host filesystem? In the following example, the docker volume command creates a volume, which is then mounted into the container.
>  /tmp docker volume create docker-data
docker-data
>  /tmp docker volume ls
DRIVER              VOLUME NAME
local               docker-data
>  /tmp docker run -d \
  --name devtest \
  --mount source=docker-data,target=/app \
  ubuntu:latest
6cef681d9d3b06788d0f461665919b3bf2d32e6c6cc62e2dbab02b05e77769f4

You can configure logging for a Docker container by selecting the type of log driver, in this example, json-file, and whether it is blocking or non-blocking. This example shows a configuration that uses json-file and mode=non-blocking for an ubuntu container. The non-blocking mode ensures that logging cannot block the application, at the cost of potentially dropping messages if the buffer fills up. Make sure to read the Docker logging guide on the different logging options.

>  /tmp docker run -it --log-driver json-file --log-opt mode=non-blocking ubuntu
root@551f89012f30:/#
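
You can verify which log driver a running container uses with docker inspect; the format string below follows the pattern in the Docker logging documentation:

$ docker inspect -f '{{.HostConfig.LogConfig.Type}}' <container name>
json-file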
  • How do you map ports to the external host?

The Docker container has an internal set of ports that must be exposed to the host and mapped. One of the easiest ways to see which ports are exposed to the host is by running the docker port <container name> command. Here is an example of what that looks like against a container named foo.

$ docker port foo
7000/tcp -> 0.0.0.0:2000
9000/tcp -> 0.0.0.0:3000

What about actually mapping the ports? You can do that using the -p flag, as shown. You can read more about Docker run flags here.

$ docker run -p 127.0.0.1:80:9999/tcp ubuntu bash
  • What about configuring Memory, CPU, and GPU?

You can configure docker run to accept flags for setting Memory, CPU, and GPU. You can read more about it here in the official documentation. Here is a brief example of setting the CPU.

docker run -it --cpus=".25" ubuntu /bin/bash

This tells the container to use, at most, 25% of a single CPU.
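
Memory limits follow the same pattern with the --memory flag; for example, to cap a container at 256 MB:

docker run -it --memory="256m" ubuntu /bin/bash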

Build a containerized application from zero on AWS Cloud9 #

Screencast #

[Docker Python from Zero in Cloud9!](https://youtu.be/WVifwRIwSmo "Docker Python from Zero in Cloud9!")

  1. Launch AWS Cloud9
  2. Create a GitHub repo
  3. Create ssh keys and upload them to GitHub
  4. Git clone the repo
  5. Create a project structure
  6. Create a local Python virtual environment and source it (a must have!):
$ python3 -m venv ~/.dockerproj && source ~/.dockerproj/bin/activate
  • Dockerfile
FROM python:3.7.3-stretch

# Working Directory
WORKDIR /app

# Copy source code to working directory
COPY . /app/

# Install packages from requirements.txt
# hadolint ignore=DL3013
RUN pip install --upgrade pip &&\
    pip install --trusted-host pypi.python.org -r requirements.txt
  • requirements.txt
  • Makefile
setup:
    python3 -m venv ~/.dockerproj

install:
    pip install --upgrade pip &&\
        pip install -r requirements.txt

test:
    #python -m pytest -vv --cov=myrepolib tests/*.py
    #python -m pytest --nbval notebook.ipynb

validate-circleci:
    # See https://circleci.com/docs/2.0/local-cli/#processing-a-config
    circleci config process .circleci/config.yml

run-circleci-local:
    # See https://circleci.com/docs/2.0/local-cli/#running-a-job
    circleci local execute


lint:
    hadolint Dockerfile
    pylint --disable=R,C,W1203 app.py

all: install lint test
  • app.py
  7. Install hadolint (you may want to become root: i.e., sudo su -; run this command, then exit by typing exit):
wget -O /bin/hadolint \
  https://github.com/hadolint/hadolint/releases/download/v1.17.5/hadolint-Linux-x86_64 &&\
  chmod +x /bin/hadolint
  8. Create a CircleCI config
# Python CircleCI 2.0 configuration file
#
# Check https://circleci.com/docs/2.0/language-python/ for more details
#
version: 2
jobs:
  build:
    docker:
    # Use the same Docker base as the project
      - image: python:3.7.3-stretch

    working_directory: ~/repo

    steps:
      - checkout

      # Download and cache dependencies
      - restore_cache:
          keys:
            - v1-dependencies-{{ checksum "requirements.txt" }}
            # fallback to using the latest cache if no exact match is found
            - v1-dependencies-

      - run:
          name: install dependencies
          command: |
            python3 -m venv venv
            . venv/bin/activate
            make install
            # Install hadolint
            wget -O /bin/hadolint \
              https://github.com/hadolint/hadolint/releases/download/v1.17.5/hadolint-Linux-x86_64 &&\
              chmod +x /bin/hadolint

      - save_cache:
          paths:
            - ./venv
          key: v1-dependencies-{{ checksum "requirements.txt" }}

      # run lint!
      - run:
          name: run lint
          command: |
            . venv/bin/activate
            make lint            
  9. Install the local CircleCI CLI (optional)
  10. Set up requirements.txt:
pylint
click
  11. Create app.py:
#!/usr/bin/env python
import click

@click.command()
def hello():
    click.echo('Hello World!')

if __name__ == '__main__':
    hello()
  12. Run in container:
$ docker build --tag=app .
$ docker run -it app bash
  13. Test the app in a shell

REMEMBER Virtualenv:

$ python3 -m venv ~/.dockerproj && source ~/.dockerproj/bin/activate

And then run python app.py or chmod +x app.py && ./app.py.
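
You should see output like this:

$ python app.py
Hello World!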

  14. Test local CircleCI and local make lint, and then configure CircleCI:
ec2-user:~/environment $ sudo su -
[root@ip-172-31-65-112 ~]# curl -fLSs https://circle.ci/cli | bash
Starting installation.
Installing CircleCI CLI v0.1.5879
Installing to /usr/local/bin
/usr/local/bin/circleci
  15. Set up a Docker Hub account and deploy it!
  16. To deploy, you will need something like this (a bash script):
#!/usr/bin/env bash
# This tags and uploads an image to Docker Hub

#Assumes this is built
#docker build --tag=app .


dockerpath="noahgift/app"

# Authenticate & Tag
echo "Docker ID and Image: $dockerpath"
docker login &&\
    docker image tag app $dockerpath

# Push Image
docker image push $dockerpath

Now anyone can pull the image:

$ docker pull noahgift/app
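
and run it interactively (the Python base image includes bash):

$ docker run -it noahgift/app bash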


Exercise #

  • Topic: Create Hello World Container in AWS Cloud9 and Publish to Docker Hub
  • Estimated time: 20-30 minutes
  • People: Individual or Final Project Team
  • Slack Channel: #noisy-exercise-chatter
  • Directions:
    • Part A: Build a hello world Docker container in AWS Cloud9 that uses the official Python base image. You can use the sample command-line tools in this repository for ideas.
    • Part B: Create an account on Docker Hub and publish the container there
    • Part C: Share your Docker Hub container in Slack
    • Part D: Pull down another student’s container and run it
    • (Optional for the ambitious): Containerize a flask application and publish it

Converting a Command-line tool to a Web Service #

One item to point out is the feasibility of building command-line tools that can be converted to web applications. You can see an older idea I toyed with here: Adapt Project

Newer frameworks like chalice may be a good fit for dual CLI + web projects, as sketched below.
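
A minimal sketch of the idea, assuming a shared core function exposed both as a click command and as a chalice route (file names and layout are hypothetical):

# cli.py - the command-line entry point
import click

def greet(name):
    """Shared core logic used by both the CLI and the web service."""
    return f"Hello {name}!"

@click.command()
@click.argument("name")
def hello(name):
    click.echo(greet(name))

if __name__ == "__main__":
    hello()

# app.py - the chalice web entry point
from chalice import Chalice
from cli import greet  # reuse the same core function

app = Chalice(app_name="hello")

@app.route("/{name}")
def hello_route(name):
    return {"message": greet(name)}

The same greet function backs both interfaces, so the business logic is written and tested once.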

Documenting your Project with Sphinx #

Sphinx is a tool that allows developers to write documentation in plain text and generate output in formats that meet varying needs. This becomes helpful when using a version control system to track changes. Plain-text documentation is also useful for collaborators across different systems. Plain text is one of the most portable formats currently available.

Although Sphinx is written in Python and initially created for the Python language documentation, it is not necessarily language-centric and, in some cases, not even programmer-specific. There are many uses for Sphinx, such as writing entire books and even websites.

Think of Sphinx as a documentation framework that abstracts away the tedious parts. It offers automatic functionality to solve common problems like title indexing and syntax highlighting for code examples.

Sphinx uses reStructuredText markup syntax (with some additions) to provide document control. You probably already know quite a bit about the language required to be proficient in Sphinx if you have ever written plain-text files.

The markup allows the definition and structure of text for proper output.

This is a Title
===============
That has a paragraph about the main subject. The title is set when the '='
is at least the same length as the title itself.

Subject Subtitle
----------------
Subtitles are set with '-' and are required to have the same length
as the subtitle itself, just like titles.

Lists can be unnumbered like:

 * Item Foo
 * Item Bar

Or automatically numbered:

 #. Item 1
 #. Item 2

Inline Markup
-------------
Words can have *emphasis in italics* or be **bold**, and you can define
code samples with backquotes, like when you talk about a command:
``sudo`` gives you superuser powers!

As you can see, that syntax looks very readable in plain text. When the time comes to create a specific format (like HTML), the title converts to a significant heading, bigger fonts than the subtitle (as it should), and the lists automatically get numbered. Already you have something quite powerful. Adding more items or changing the order in the numbered list doesn’t affect the numbering, and titles can change importance by replacing the underline used.

As always, throughout this book, create a virtual environment, activate it, and install the dependencies. Sphinx, in this case:

$  pip install Sphinx
Collecting Sphinx
  Downloading https://files.pythonhosted.org/Sphinx-3.0.3-py3-none-any.whl (2.8MB)
  2.8MB 703kB/s
...
Installing collected packages...
Successfully installed sphinx-3.0.3

The framework uses a directory structure to have some separation between the source (the plain-text files) and the build (which refers to the output generated). For example, if making a PDF from a documentation source, the data would be placed in the build directory. This behavior can be changed, but for consistency, I use the default format.

To get started, use the sphinx-quickstart tool (which should be available after installing Sphinx) to start a new documentation project. The process prompts you with a few questions. Accept all the default values by pressing Enter.

$ sphinx-quickstart
Welcome to the Sphinx 3.0.3 quickstart utility.

Please enter values for the following settings (just press Enter to
accept a default value if given in brackets).
...

I chose “My Project” as the project name; it gets referenced in several places. Feel free to choose a different name.

After running the sphinx-quickstart command, there should be files in the working directory resembling these:

.
|__ Makefile
|__ _build
|__ _static
|__ _templates
|__ conf.py
|__ index.rst
|__ make.bat

These are some of the important files that you interact with:

  • Makefile: Developers who have compiled code should be familiar with this file. If not, think of it as a file containing instructions to build documentation output when using the make command. This file is the one you interact with the most to generate output.
  • _build: This is the directory where generated files go after a specific output is triggered.
  • _static: Any files that are not part of the source code (like images) go here and are later linked together in the build directory.
  • conf.py: This is a Python file holding configuration values for Sphinx, including those pre-selected initially with the sphinx-quickstart command.
  • index.rst: The root of the documentation project. This file connects to others if the documentation splits into other files.

Be cautious of jumping right into writing documentation. A lack of knowledge about layout and outputs can be confusing and could significantly slow your entire process.

Take a look inside index.rst. There is a significant amount of information and some additional complex syntax. Right after the main title in the index.rst file, there is a content listing with a toctree declaration. The toctree is the central element that gathers all documents into the documentation. If other files are present but not listed under this directive, those files would not get generated with the documentation at build time.
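
For example, here is a minimal sketch of an index.rst whose toctree pulls in two hypothetical documents, installation.rst and usage.rst:

My Project documentation
========================

.. toctree::
   :maxdepth: 2

   installation
   usage

We are now ready to generate output.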

Run the make command and specify HTML as output. This output can be used directly as a website as it has everything generated, including JavaScript and CSS files.

$ make html
sphinx-build -b html -d _build/doctrees   . _build/html
Making output directory...
Running Sphinx v3.0.3
making output directory... done
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 1 source files that are out of date
updating environment: [new config] 1 added, 0 changed, 0 removed
reading sources... [100%] index
looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
writing output... [100%] index
generating indices... genindex done
writing additional pages... search done
copying static files... ... done
copying extra files... done
dumping search index in English (code: en)... done
dumping object inventory... done
build succeeded.

The HTML pages are in _build/html.

With our first pass at generating HTML from the two files, we have a fully functional (static) website.

Inside the _build directory, you should now have two new directories: doctrees and html. We are interested in the html directory, which holds all the files needed for the documentation site.

With so little information, Sphinx was able to create a lot. We have a basic layout with some information about the project’s documentation, a search section, a table of contents, copyright notices with name and date, and pagination. The search part is interesting because Sphinx has indexed all the files, and with some JavaScript magic, it has created a static site that is searchable. When you are ready to make more modifications, run the make html command again to regenerate the files.

If the look and feel of the generated output is not to your liking, Sphinx includes many themes that can be applied to change how the HTML files ultimately render the documentation. Some major open-source projects, such as SQLAlchemy and Ceph, heavily modify the HTML look by changing the CSS and extending the templates.
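
Switching themes is a one-line change in conf.py; for example, to use the built-in classic theme that ships with Sphinx:

# conf.py
html_theme = 'classic'

Run make html again, and the site regenerates with the new look.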

Sphinx changed the way I thought about writing documentation. I was excited to easily document almost all of my open-source projects and a few internal ones.