Introduction
Developing an application to be distributed as a K8s service is a
complicated undertaking. Besides learning the application language and
solving the application problem, there are all the K8s workflows that
need to be automated. This is my attempt to navigate the insane K8s
ecosystem of tools as I try to make a decent development and
production workflow.
Development workflow
The local development workflow needs a fast feedback
loop. For a K8s application, that means at minimum a container build
and a deployment.
Ubuntu setup of Go 1.15
The latest Go at this time is 1.16.3, but for a sample app the
distro-supplied 1.15 is fine.
sudo apt install golang-1.15
cd $HOME/bin && ln -s /usr/lib/go-1.15/bin/go
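A quick check that the symlinked toolchain is the one picked up from
$HOME/bin:
which go
go version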
I have $HOME/bin in my $PATH, which makes it easy to manage the
installation of single-binary tools. Technically, with buildpacks I
don’t even need to install the Go toolchain, but I want to explore
things like debugging a running Go application.
Buildpacks
I am not a big fan of Dockerfiles, especially multi-stage
Dockerfiles, even though they are the right way to separate the build
and runtime containers. Because Go applications build to a single
binary, they can have a tiny runtime image. So I decided to
investigate buildpacks4, which look like a much better alternative for
application development. They even support new features like
reproducible builds and image rebasing.
Install the pack tool.
cd $HOME/bin
curl -LO https://github.com/buildpacks/pack/releases/download/v0.18.1/pack-v0.18.1-linux.tgz
tar xzf pack-v0.18.1-linux.tgz
chmod +x pack
rm -f pack-v0.18.1-linux.tgz
Start with the Go buildpacks sample app1.
Since this is a Go app, the default builder can be the tiny variant:
pack config default-builder paketobuildpacks/builder:tiny
cd $APP
pack build mod-sample --buildpack gcr.io/paketo-buildpacks/go
With buildpacks running locally, the workflow is: edit and save, run
pack to build, run docker, and test. It takes a few seconds and
multiple steps.
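Concretely, one iteration looks roughly like this (assuming the sample
serves HTTP on port 8080, as the K8s service below does):
pack build mod-sample --buildpack gcr.io/paketo-buildpacks/go
docker run --rm -d -p 8080:8080 --name mod-sample mod-sample
curl -s http://localhost:8080
docker rm -f mod-sample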
Skaffold and Minikube
This app will depend on K8s features, so it will need to run inside
K8s even during development. Enter Minikube2 for a local K8s setup and
Skaffold3 to orchestrate the development workflow.
cd $HOME/bin
curl -Lo minikube https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
chmod +x minikube
minikube start
This starts up a full K8s instance locally using the Docker
driver. The initial download was ~1GB, so it takes a while.
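Once it is up, a quick sanity check that the cluster is running and
kubectl points at it:
minikube status
kubectl get nodes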
cd $HOME/bin
curl -Lo skaffold https://storage.googleapis.com/skaffold/releases/latest/skaffold-linux-amd64
chmod +x skaffold
Now switch to the Go buildpacks sample from skaffold5.
<term 1> skaffold dev
<term 2> minikube tunnel
<term 3> kubectl get svc # to get IP
<term 3> curl -s http://<external IP>:8080
Minikube has its own Docker daemon, so the buildpacks used and the
images built live inside Minikube and not in the host Docker6.
<term 4> eval $(minikube docker-env)
<term 4> docker images
This makes deploying the image very fast because it isn’t copied.
Development workflow
The skaffold sample is set up to use gcr.io/buildpacks/builder:v1, and
it also works with the builder paketobuildpacks/builder:tiny. The
Google buildpacks7 support “file sync”, which copies changed files
directly into the image. This means changes are available in seconds,
which is great for development.
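For reference, a minimal skaffold.yaml for this kind of setup looks
roughly like the following; the apiVersion, image name, and manifest
path are illustrative, and the sync stanza enables the file sync
described above (it may already be the default for buildpacks
artifacts):
apiVersion: skaffold/v2beta13
kind: Config
build:
  artifacts:
  - image: mod-sample
    buildpacks:
      builder: gcr.io/buildpacks/builder:v1
    sync:
      auto: true
deploy:
  kubectl:
    manifests:
    - k8s/*.yaml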
Next steps
My application will be a multi-cluster app that exchanges K8s resource
data. The first step is to query the resource utilization of the K8s
cluster using client-go.
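As a starting point, here is a minimal sketch (untested) that uses
client-go to list the nodes and their allocatable resources; actual
usage numbers would come from the metrics API, and the kubeconfig
handling here is the simplest possible:
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Load the same kubeconfig that kubectl uses (e.g. the one minikube wrote).
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	nodes, err := clientset.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, n := range nodes.Items {
		// Allocatable is what the scheduler can hand out, not current usage.
		fmt.Printf("%s: cpu=%s memory=%s\n", n.Name,
			n.Status.Allocatable.Cpu().String(),
			n.Status.Allocatable.Memory().String())
	}
}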
Introduction
Many years ago (2014) I set up my git servers with the git option
core.bigFileThreshold=100k. This reduced memory usage dramatically
because git stopped trying to delta-compress already compressed
files. I had used this option for many years without apparent problems
until one of my colleagues alerted me that cloning an internal mirror
of the Linux kernel from my git server was transferring over 9GB of
data! Cloning the same repo from kernel.org transferred only about
1.5GB.
So many repack options
When I looked at the bare repo everything seemed normal. The repo had
been repacked properly less than a month ago thanks to
grokmirror. There was a single pack file with a bitmap, but that pack
file was 9.1GB! I tried all the standard repack commands, and when
those didn’t help, the most aggressive variant:
> git repack -A -d -l -b -F -f
But nothing changed. Then my colleague reported that rebuilding with
the above options did work on his machine and reduced the git repo
size. This meant that there must be a local setting on the server
causing the problem. I looked at the local ~/.gitconfig and saw the
bigFileThreshold option I had set so long ago. So I did a quick
experiment with:
> git -c core.bigFileThreshold=512m repack -A -d -F -f
and it did indeed reduce the bare git repo from 9.1GB to 1.9GB! It
seems that there are ~200K files over 100KB across the history of the
Linux kernel repo, and when they are not delta-compressed the size of
the repository grows a lot!
Curious how large the files in the kernel repo can get, I did a
checkout of the mainline kernel and looked for files over 100KB.
> find . \( -path '*/.git/*' \) -prune -o \( -type f -size +100k \) -print | wc -l
914
Four of these files are even over 10MB!
Solution
Once the problem has been clearly identified, the solution is usually
simple. In this case the gitolite config for all the kernel repos sets
core.bigFileThreshold back to its default value of 512m. This way all
the other repos can still use the smaller bigFileThreshold setting.
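In gitolite that is a per-repo config line, roughly like the following
(a sketch: the repo pattern is illustrative, and the config key must
be whitelisted via GIT_CONFIG_KEYS in the gitolite rc):
repo kernel/.*
    config core.bigFileThreshold = 512m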
There is also a way to tell git not to delta compress files with
certain extensions. I created a global git attributes file
/etc/gitattributes with the following content:
*.bz2 binary -delta
*.gz binary -delta
*.xz binary -delta
*.tgz binary -delta
*.zip binary -delta
*.lz binary -delta
This covers all the compressed files in our repos and had the same
effect, so I reverted the bigFileThreshold option to its default of
512m.
Introduction
During a recent job interview I was asked, “Do you think you are a 10x
developer?” The concept of a “10x” developer and developer
productivity is something I have thought a lot about. Fundamentally
the hard part is figuring out what to measure. I don’t have any good
answers but here is how I think about it today.
How to measure productivity?
Since programming is a fairly creative activity it will always be
difficult to find a measure that cannot be gamed.
A simple but flawed measure is something like “lines of code” or
“features completed” or “bugs fixed”. These measurements are flawed
because they are only loosely linked to the things users of the code
actually care about. In university I met someone who allegedly
completed a 5-hour coding interview in 1.5 hours with code that passed
all the unit tests. If true, this is impressive and a testament to
that particular developer’s skills. I doubt I would ever be able to
match such a feat.
Just as a person has many personality facets, a developer can work on
different facets of productivity. I like the word facets because each
is unique while still contributing to the whole.
Cost of programming errors
An important skill is the ability to produce “error-free” code. I
think computer programming is unique in that a single bug can cost
millions of dollars to fix. Even perfectly correct code can require
rewriting when the requirements or execution environment
changes. Examples of insanely expensive bugs include OpenSSL
Heartbleed, Intel Meltdown, and more. These bugs damage users and also
generate rework for the entire industry.
Programming is a continuous tradeoff between getting the code working
for a specific use-case and making it robust enough to handle multiple
use-cases. Figuring out how much it will cost to develop a feature is
hard enough, and the risk of an expensive bug is rarely factored
in. There isn’t an easy way to measure the cost of expensive bugs. The
cost of fixing bugs is also hard to measure and is rarely accounted
for as an engineering cost.
Developing the skill of writing code that doesn’t result in expensive
bugs often requires:
- Using tools like static analyzers, linters, enabling all compiler
warnings, fuzzers, code quality scanners, etc. Catching errors early
is often the best return on investment. However, each tool takes
time to learn and integrate. Each run takes time and a high rate of
false positives can result in lost productivity.
- Developing and maintaining a set of runtime tests. Code developed at
the same time as tests tends to be better designed because it works
best when dependencies are minimized. Code with a good test suite
can be refactored more easily. On the other hand, runtime testing of
a large software base requires significant infrastructure in order
to minimize false positives and maintain a good feedback loop.
- Careful software reuse. Sometimes using an existing code base is the
right thing to do. For example, almost none of the developers that
thought they could write an encryption library have succeeded. Each
dependency on a third party becomes a liability and has to be
managed carefully. Ideally, it is an open source library and you can
become part of its community and keep up with the upgrades and
security fixes. In the worst case scenario, you end up maintaining a
fork of the software or have to apply horrible workarounds.
- Creating operationally simple software. Even bug free software can
be a pain to upgrade or keep operational in a high availability
configuration. Software has many different user interfaces and one
is how the software is installed, configured, upgraded and
maintained. I wasn’t exposed to this facet of software until I had
to maintain a cluster of 100+ machines. I have found that whether a
service can reload its configuration without a restart is a good
indication of whether the operator interface has been taken
seriously. Reloading configuration at runtime requires a good
software design and test suite. When there are bugs it is too easy
for the developers to just deprecate the feature and force
restarts. But being able to reload a configuration without impact on
running sessions is an operationally valuable feature.
In the wrong environment an inexperienced developer can introduce
programming errors that will cost more than their
contributions. Everyone likes to talk about “10x” programmers, but I
think we should also talk about “negative productivity” programmers
and what can be done to reduce the cost of these errors by catching
and preventing them earlier.
Cost of fixing bugs
Debugging is a specific developer skill. It is difficult to teach and
hard to explain the instincts of a good debugger to an inexperienced
developer. Being able to make an intermittent bug easily reproducible
or use gdb to track down some memory corruption are critical skills at
the right time. I also saw a talk by a Google engineer who was
investigating a 99th-percentile latency outlier and found a Linux
kernel scheduler bug; the fix saved Google millions of dollars a year. As
systems become more complex, the bugs also become harder to fix. I
wish there were better ways to capture and train debugging expertise.
Individual productivity versus team productivity
One of the amazing properties of software is leverage: a single tool
can make a large group of developers more productive. The goal of
every manager should also be to make their team more productive. The
goal of almost every software product is to make their customers more
productive. Being able to find and address productivity bottlenecks in
a team is another developer skill. Developing this skill often
requires:
- Understanding of the workflow of the different members of the team
- Use of automation tools to transition manual work to the computer
- Creating tools with a compelling user interface for the team
- Talking with upstream and downstream teams to find ways to make
interactions smoother and more automated
This assumes that the team works well together. A toxic team member
can reduce the productivity of an entire team. Language, timezone and
cultural differences can also hinder productivity.
Choosing the “right” work
Even the most perfect code is useless if it doesn’t solve the right
problems. Keeping development aligned with business needs can
contribute to team productivity by eliminating rework. Some of the
skills required to do this well are:
- Interacting with customers directly and understanding what their
problems are and why they are looking to you to solve them
- Communicating technical concepts to non-technical people in an
effective way
- Communicating non-technical requirements to technical people in an
effective way
- Potentially developing expertise in the customer domain to
understand their domain specific language and problem context
Conclusion
It is impossible to be excellent at all of these skills. The most
important thing is to constantly find ways to improve individual and
team productivity. I suspect this isn’t the answer that an interviewer is
expecting. I need to come up with a shorter answer.