Konrad Scherer
FRIDAY, 3 JUNE 2016

Python Packaging with make and pex

As often happens in the life of a professional programmer, a small Python script had grown into a large one and needed to be split apart and properly packaged. Most of my experience with Python had been with small scripts. I had tried before to understand the Python packaging ecosystem, but I always got confused by the combinations of tools and formats:

  1. Python development tools like virtualenv and pip
  2. Code distributed in eggs and/or wheels
  3. Packages installed using easy_install and/or pip
  4. Python packaging tools like setuptools and distutils

There always seemed to be at least two tools that did almost the same thing, neither with good documentation. I did find some decent blog posts, like Open Sourcing a Python Project the Right Way, but there were still workflow steps I needed to figure out. In the past I had been able to avoid dealing with packaging, but this time my “small” script had grown to over 1000 lines of Python and there was no way around it.

I had an informal set of requirements:

  1. No root access should be required; Python supports local installation and virtualenv
  2. Bootstrap a development environment quickly
  3. The development setup should be self-contained and not affect any other part of the machine

My research took me all over the web, but one of the most important pieces of inspiration was this small post on Virtualenv and Makefiles. I was also inspired by Pex, which provides a way to bundle all the Python pieces into a single self-extracting package.

It took a few days, but I was able to combine make, mkvirtualenv, pip and pex into a nice workflow (a simplified sketch follows the list below). The Makefile will:

  1. Install pip as $HOME/.local/bin/pip
  2. Use the local pip to install virtualenv and virtualenvwrapper into $HOME/.local/bin
  3. Create a per-project virtualenv and install all the development dependencies like pylint, flake8 and pex
  4. Check that the required system development packages are installed. Some Python packages have C extensions and need a compiler and header files.
  5. Run python setup.py develop, which installs the package dependencies like yaml and redis. This step also adds the package to the virtualenv, which is useful when development is spread across multiple git repositories.
  6. Run python setup.py bdist_pex to build the pex file
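
A stripped-down sketch of the bootstrap targets (the project name, file layout and dependency lists are illustrative, not copied from my Makefile, and it calls virtualenv directly instead of mkvirtualenv for brevity):

# Sketch only: adjust paths and package lists for the real project
VENV := $(HOME)/.virtualenvs/myproject
LOCAL_PIP := $(HOME)/.local/bin/pip
SRC := setup.py $(wildcard myproject/*.py)

$(LOCAL_PIP):
	wget -q https://bootstrap.pypa.io/get-pip.py
	python get-pip.py --user

$(VENV): $(LOCAL_PIP)
	$(LOCAL_PIP) install --user virtualenv virtualenvwrapper
	$(HOME)/.local/bin/virtualenv $(VENV)
	$(VENV)/bin/pip install pylint flake8 pex

.check:
	# fail early if the packages needed to build C extensions are missing
	dpkg -s build-essential python-dev > /dev/null
	touch .check

develop: $(VENV) .check ## Install the package and its dependencies into the virtualenv
	$(VENV)/bin/python setup.py develop

dist/myproject.pex: $(VENV) .check $(SRC) ## Build the self-contained pex file
	$(VENV)/bin/python setup.py bdist_pex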

Other nice touches:

  1. The source .py files are dependencies of the pex target, so editing a file causes the pex file to be rebuilt. Make's wildcard and pattern-substitution functions simplify this step.
  2. Has make help, which reads the comments embedded in the Makefile to generate nice help output (sketched below)
  3. Has make clean for easy cleanup
  4. Each make step loads the proper virtualenv, so the developer never has to activate the virtualenv manually.
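
The help target is the usual comment-scraping trick; a minimal version, fed by the "## " comments on the target lines in the sketch above, could look like this:

help: ## Show this help text
	@grep -E '^[a-zA-Z0-9_./-]+:.*## ' $(MAKEFILE_LIST) | \
		awk 'BEGIN {FS = ":.*## "} {printf "%-24s %s\n", $$1, $$2}'

clean: ## Remove the virtualenv, build output and sentinel files
	rm -rf build dist *.egg-info .check
	rm -rf $(VENV)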

Some annoyances:

  1. Pex does not pick up local python file changes unless I delete the egg file in the pex build directory.
  2. To keep timestamps in order, it is sometimes necessary to touch certain files.
  3. I had to create a .check sentinel file to keep the system package check from running on every build
  4. Depends on PyPI being available, though pip does cache downloads locally

The last step was to integrate the pex file into a docker image. If the package has no dependencies on system libraries, the Alpine Linux Python docker images can be used as a base. Unfortunately the python mesos.native packages I am using depend on shared libraries like libsasl, so I could not use Alpine Linux. I was still able to use the base Ubuntu image and only needed to install a few libraries, which made the image much smaller than before.

I noticed that the pex file is unpacked into PEX_ROOT, which defaults to a directory under $HOME. The last tweak I made was to ensure that PEX_ROOT is a docker volume, to avoid the overhead of writing to the union filesystem. This isn't strictly necessary, but I try to treat the docker image as effectively read-only.
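
A sketch of what such a Dockerfile could look like (the base image tag, pex file name and library list are placeholders, not the exact contents of my image):

FROM ubuntu:14.04

# Only the runtime libraries the pex contents actually need (placeholder list)
RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get -qy install --no-install-recommends \
    python libcurl3 libsasl2-2 libsvn1 && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

COPY myapp.pex /usr/local/bin/myapp.pex

# Unpack the pex onto a volume instead of the union filesystem
ENV PEX_ROOT /pex
VOLUME ["/pex"]

CMD ["/usr/local/bin/myapp.pex"]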

I have already reused this Makefile structure for other python projects. I was pleasantly surprised when a colleague of mine was able to clone the project and rebuild the docker image without any intervention.

I am now able to focus on refactoring and developing the project. The packaging part is solved in a clean way that can easily be shared with others.




MONDAY, 29 FEBRUARY 2016

Docker Daemon and Systemd

I recently read an article on LWN about Systemd vs Docker and I was disappointed. As far as I am concerned, this standoff is preventing one of the worst design flaws in Docker from being addressed. Docker CEO Solomon Hykes also thinks it should be resolved, yet Issue #2658 has remained open since November 2013.

The current Docker design makes all containers children of the Docker daemon process. The consequence is that upgrading the daemon requires stopping or killing all the containers. Other operations, like changing the daemon command line, also require stopping all the containers. I have to be extra careful with my Puppet configuration, because any change to the config files will restart the docker daemon. To prevent inadvertent restarts I had to remove the usual configuration-to-service dependency that restarts the daemon when its configuration changes.

From an operational perspective this is a pain. It is another in a long line of software that requires significant operational resources to deploy properly. If the operator is lucky, the containerized application can be managed with load balancers or DNS rotation. If the service cannot work this way, or the Ops team cannot build the required infrastructure, then upgrades mean downtime. With VMs it is possible to live-migrate the application to another machine; for containers, CRIU isn't ready yet. These “solutions” require large amounts of operational effort. I built a rolling upgrade system around Ansible to handle docker upgrades.

My experience with Mesos has been very different. The Mesos team has a supported upgrade path with lots of testing. I have upgraded at least 5 releases of Mesos without issues or any downtime.

What does this have to do with systemd? In order to support seamless upgrades of the docker daemon, the ownership of the container processes will have to be shared with some other process. This could be another daemon, but the init system is an obvious choice. If the docker daemon co-operated with another daemon or systemd by sharing ownership of the processes, then a nice upgrade path could be developed.

The Docker team is working on containerd and has stated that runC will be integrated; this may be where better integration with an init system becomes possible. I realize this is selfish, but for me all these squabbles just distract developers from addressing one of my major pain points with Docker.




THURSDAY, 9 JULY 2015

Benchmarking docker storage backends

I am using docker to simulate building Wind River Linux (which is based on OE-Core and Poky) on different hosts. The actual build is done on a bind mount outside of the container, so I did not expect the storage backend to affect performance, but it did.

See Docker Issue #2891 for the full history.

Setup

  • docker 1.7
  • Ubuntu 14.04.2
  • Vivid kernel 3.19.0-21-generic
  • Dual 6C Xeon with 64GB RAM and 100GB root SSD and dual 3TB RAID0

Using the following Dockerfile:

FROM ubuntu:14.04.2

MAINTAINER Konrad Scherer <Konrad.Scherer@windriver.com>

RUN useradd --home-dir /home/wrlbuild --uid 1000 --gid 100 --shell /bin/bash wrlbuild && \
    echo "wrlbuild ALL=(ALL) NOPASSWD: ALL" >> /etc/sudoers

RUN dpkg --add-architecture i386 && \
    apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get -qy install --no-install-recommends \
    libc6:i386 libc6-dev-i386 libncurses5:i386 texi2html chrpath \
    diffstat subversion libgl1-mesa-dev libglu1-mesa-dev libsdl1.2-dev \
    texinfo gawk gcc gcc-multilib help2man g++ git-core python-gtk2 bash \
    diffutils xz-utils make file screen sudo wget time patch && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/* && \
    rm -rf /usr/share/man && \
    rm -rf /usr/share/doc && \
    rm -rf /usr/share/grub2 && \
    rm -rf /usr/share/texmf/fonts && \
    rm -rf /usr/share/texmf/doc

USER wrlbuild

CMD ["/bin/bash"]

Building core-image-minimal from the Poky fido release on an ext4 bind mount, using the docker image above with different storage backends:

cd <buildarea>
mkdir downloads
git clone --branch fido git://git.yoctoproject.org/poky
source poky/oe-init-build-env mybuild
ln -s ../downloads .
sed -i 's/#MACHINE ?= "qemux86-64"/MACHINE ?= "qemux86-64"/' conf/local.conf
bitbake -c fetchall core-image-minimal
time bitbake core-image-minimal

Results

Bare-metal:

real    33m5.260s
user    289m41.356s
sys     27m23.488s

Aufs:

real    40m24.416s
user    258m48.932s
sys     56m29.284s

Devicemapper with the official binary in loopback mode (requires --storage-opt dm.override_udev_sync_check=true):

real    35m24.415s
user    289m10.660s
sys     34m21.168s

Devicemapper with my own dynamically linked binary (still requires --storage-opt dm.override_udev_sync_check=true, even though docker info states udev sync is supported):

real    34m18.387s
user    294m1.720s
sys     31m43.764s

Overlayfs:

real    33m46.890s
user    293m40.084s
sys     35m31.480s

Conclusion

Aufs still has measurable performance overhead, even when the IO is done on a bind mount outside of the aufs filesystem. Devicemapper and overlayfs add no overhead in this specific scenario. I did have problems with devicemapper on Ubuntu 14.04 with the 3.13 kernel, but since upgrading to the 3.16 kernel I have not seen any devicemapper errors. The only problems I have had were related to the udev sync detection, a new requirement in Docker 1.7.

My options are:

  • Ignore the udev sync requirement with a flag
  • Compile and distribute my own dynamically linked version of docker and hope that Docker eventually provides an official dynamically linked build for Ubuntu
  • Switch to Overlayfs

There are reports of problems with Overlayfs when running rpm inside a container. I will do some more testing with Overlayfs, but it seems my best option right now is to move all my Ubuntu builders to Overlayfs.
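
For reference, on Ubuntu 14.04 the storage backend is selected through DOCKER_OPTS in /etc/default/docker. A sketch of the two configurations discussed above (the docker service needs a restart after editing):

# /etc/default/docker
# Overlayfs (needs a 3.18+ kernel, e.g. the vivid 3.19 kernel used here)
DOCKER_OPTS="--storage-driver=overlay"

# Or devicemapper with the udev sync check overridden:
# DOCKER_OPTS="--storage-driver=devicemapper --storage-opt dm.override_udev_sync_check=true"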
