Introduction
I manage the git infrastructure for the Linux group at Wind River: the
main git server and 5 regional mirrors that are kept in sync with
grokmirror. I plan to do a post about our grokmirror setup. The main
git server holds over 500GB of bare git repos, and over 600 of them
are mirrored; many are not. Some repos are internal, some are mirrors
of external upstream repos, and some are mirrors of upstream repos
with internal branches. The git server runs CentOS 5.10 and git 1.8.2
from EPEL.
One of the largest repos contains the source for the toolchain and
all the binaries. Since the toolchain takes a long time to build, it
was decided that Wind River Linux should ship pre-compiled binaries
for the toolchain. There is also an option which allows our customers
to rebuild the toolchain if they have a reason to.
The bare toolchain repo varies between 1 and 3GB in size, depending on
the supported architectures. Many of the files in the repo were
tarballs around 250MB in size.
Why is the git server down again?
When a new toolchain is ready for integration, it is uploaded to the
main git server and mirrored. Then the main tree is switched to enable
the new version of the toolchain and all the coverage builders start
to download the new version. Suddenly the git servers would become
unresponsive and thrash under memory pressure until they inevitably
had to be rebooted. Sometimes I would have to disable the coverage
builders and stage their reactivation to prevent a thundering herd
from knocking the git server over again.
Why does cloning a repo require so much memory?
I finally decided to investigate this and quickly found a reproducer:
cloning a 2.9GB bare repo would consume over 7GB of RAM before the
clone was complete. The graph of used memory was spectacular. I
started reading the git config man page and asking Google various
questions.
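For anyone curious, a rough reproducer sketch (the host and repo names
are placeholders) is to clone over the git protocol and watch the
server-side pack-objects process, which is where the memory goes:
# On a client, clone over the git protocol so the server has to repack:
git clone git://gitserver.example.com/toolchain.git /tmp/toolchain-test
# Meanwhile on the server, watch the memory (RSS in KB) of pack-objects:
watch -n 2 'ps -eo pid,rss,cmd | grep [p]ack-objects'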
I tried setting the binary attribute on various file types, but
nothing changed. See man gitattributes for more information; the
default set seems to be fine.
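For reference, setting the binary attribute looked roughly like this
(the patterns are illustrative for our tarballs). Note that binary
only covers diff, merge and text conversion, not delta compression,
which is consistent with it not changing the memory behaviour.
# Mark the tarballs as binary (equivalent to -diff -merge -text):
echo '*.tar.gz  binary' >> .gitattributes
echo '*.tar.bz2 binary' >> .gitattributes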
I tried various git config options like core.packedGitWindowSize,
core.packedGitLimit and core.compression, as recommended in many blog
posts, but the memory spike stayed the same.
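For reference, those knobs are set the same way; the values below are
purely illustrative, not a recommendation:
# Tuning options suggested in various blog posts; none helped here.
git config --global core.packedGitWindowSize 32m
git config --global core.packedGitLimit 256m
git config --global core.compression 1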
core.bigFileThreshold
From the git config man page:
Files larger than this size are stored deflated, without attempting delta compression.
Storing large files without delta compression avoids excessive memory usage, at the slight
expense of increased disk usage.
Default is 512 MiB on all platforms. This should be reasonable for most projects as source
code and other text files can still be delta compressed, but larger binary media files
won’t be.
The 512MB number is key. The reason the git server was using so much
memory is that it was doing delta compression on the binary
tarballs. This didn't make the files any smaller, because they were
already compressed, but it did consume a lot of memory. I tried one
command:
git config --global --add core.bigFileThreshold 1
And suddenly (no git daemon restart necessary) the clone took a
fraction of the time and the memory spike was gone. The only downside
was that the repo required more disk space: about 4.5GB. I then tried:
git config --global --add core.bigFileThreshold 100k
This resulted in approximately 10% more disk space (3.3GB) and no
memory spike when cloning.
This setting seems very reasonable to me. The chance of having a text
file larger than 100KB is very low, and the only downside is slightly
higher disk usage; git is already very efficient at storing source
text.
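A quick sanity check in a working tree shows how few files a 100k
threshold would actually catch (plain find, nothing git specific):
# Count files larger than 100KB, ignoring the .git directory:
find . -path ./.git -prune -o -type f -size +100k -print | wc -l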
UPDATE: This setting can cause disk space issues on Linux kernel
repos. See the update here.
Hardware setup
I am managing an R710 Dell server with 6 2TB disks. The RAID
controller does not support JBOD mode, so I had to create 6 RAID0
virtual disks with one disk per group. The disks are then passed
through to Linux as /dev/sda to /dev/sdf. I am running 6 Xen vms and
each vm gets a dedicated disk. The vms are coverage builders and not
mission critical, so there is no point in adding redundancy. I have a
nice Cobbler/Foreman setup that makes provisioning very quick.
OpenManage and check_openmanage
I am running the Dell OpenManage software on this system. In fact, I
am running it on all my hardware, using the puppet/dell module
graciously shared on GitHub. The OpenManage package does many things,
including CLI query access to all the hardware.
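For example, querying the hardware from the shell looks like this
(output omitted):
# System overview, then all physical disks on the first RAID controller:
omreport system summary
omreport storage pdisk controller=0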
Then I stumbled across check_openmanage, a Nagios check which queries
all the hardware and notifies Nagios if there are any problems. I had
already used the Puppet integration with Nagios to set up a bunch of
checks for ntp, disk and some other services. To make things even
easier, check_openmanage is packaged in EPEL and Debian, so it did
not take much time to add it to the existing checks.
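Running the plugin by hand is a good smoke test before wiring it into
Nagios; the path below is where the EPEL package installs it on a
64-bit CentOS box (adjust as needed):
# Query all monitored components and print a Nagios-style status line:
/usr/lib64/nagios/plugins/check_openmanage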
Predicted Failure
Once everything was set up, I started getting warned about many
things I was not aware of, like out-of-date firmware and hard drives
with predicted failures. The output of check_openmanage looks like
this:
WARNING: Physical Disk 1:0:4 [Seagate ST32000444SS, 2.0TB] on ctrl 0 is Online, Failure Predicted
A reasonably painless call to Dell and a replacement disk is shipped.
Disk replacement
When a disk fails it has a really nice blinking yellow light. To keep
things clean, I wanted to shut down and delete the correct vm before
changing the disk. But how do I figure out which vm that is?
> omreport storage pdisk controller=0 pdisk=1:0:4
Physical Disk 1:0:4 on Controller PERC 6/i Integrated (Embedded)
Controller PERC 6/i Integrated (Embedded)
ID : 1:0:4
Status : Non-Critical
Name : Physical Disk 1:0:4
State : Online
Failure Predicted : Yes
> omreport storage pdisk controller=0 vdisk=5
List of Physical Disks belonging to Virtual Disk 5
Controller PERC 6/i Integrated (Embedded)
ID : 1:0:4
Status : Non-Critical
Name : Physical Disk 1:0:4
Okay, that identifies the correct physical disk and the associated
virtual disk.
> omreport storage vdisk controller=0 vdisk=5
Virtual Disk 5 on Controller PERC 6/i Integrated (Embedded)
ID : 5
Status : Ok
Name : Virtual Disk 5
State : Ready
Device Name : /dev/sdf
Now I know that this physical disk maps to the device /dev/sdf, so I
initiated a shutdown of the vm that uses that disk.
The disk with predicted failure has a flashing amber light which makes
it easy to figure out which one to swap.
Once the swap is complete, run the following command to recreate the
vdisk:
omconfig storage controller controller=0 action=createvdisk raid=r0 size=max pdisk=1:0:4
And /dev/sdf is available once again.
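To double check that everything came back, I can ask the controller
and the kernel (nothing here is specific to this setup):
# The controller should list the recreated virtual disk...
omreport storage vdisk controller=0
# ...and Linux should see the block device again:
ls -l /dev/sdf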
OpenStack Grizzly 3-node cluster installation
There is a lot of infrastructure that I leveraged to do this
installation:
- Local Ubuntu mirror
- Debian Preseed files to automate installation
- Dell iDRAC and faking netboot using virtual CDROM
- Puppet master with git branch to environment mapping
- Git subtrees to integrate OpenStack puppet modules
- An example hiera data file to handle configuration
Local Ubuntu mirror
Having a local mirror makes installations much simpler because
packages download very quickly. The ideal setup uses netboot, because
the mirror already contains the kernel, initrd and packages needed to
do the installation. I used:
ubuntu/dists/precise/main/installer-amd64/current/images/netboot/ubuntu-installer/amd64/linux
ubuntu/dists/precise/main/installer-amd64/current/images/netboot/ubuntu-installer/amd64/initrd.gz
To create the mirror I used the ubumirror scripts provided by
Canonical.
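Pointing the installer at the local mirror is then just a couple of
preseed lines (the hostname and path are placeholders for my internal
mirror):
d-i mirror/country string manual
d-i mirror/http/hostname string mirror.example.com
d-i mirror/http/directory string /ubuntu
d-i mirror/http/proxy string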
Debian Preseed
I already have some experience using Debian preseed files to automate
installation of Ubuntu and Debian; the documentation is spread out
all over the Internet. Most of the preseed just sets the local mirror
and the network setup. The OpenStack-related options were the disk
layout and adding the Ubuntu Cloud Archive.
OpenStack compute node disk layout
The machines I am using were purchased before I even knew OpenStack
existed. They were used for Wind River Linux coverage builds, and the
simplest configuration uses 2 900GB SAS drives in RAID0. The builds
require a lot of disk space, and building on SSD or in memory
provided only a small speedup relative to the increase in cost.
My idea was to use LVM and allow cinder to use the remaining space to
create volumes for the vms. Here are the relevant preseed options to
handle the disk layout:
d-i partman-auto/method string lvm
d-i partman-auto/purge_lvm_from_device boolean true
d-i partman-auto-lvm/new_vg_name string cinder-volumes
d-i partman-auto-lvm/guided_size string 500GB
d-i partman-auto/choose_recipe select atomic
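After the install finishes, a quick check with the standard LVM tools
confirms the volume group exists and has free space left for cinder:
# The guided_size above leaves the rest of the VG unallocated:
vgs cinder-volumes
lvs cinder-volumes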
There are 3 kinds of storage in OpenStack: instance/ephemeral, block
and object.
- Object storage is handled by swift and is not part of this
  installation.
- Block storage is done by default using iscsi and LVM logical
  volumes. Cinder looks for an LVM volume group called cinder-volumes
  and creates logical volumes there.
- Instance/ephemeral storage by default goes into /var on the root
  filesystem. This is why I made the root filesystem 500GB. But this
  does not allow live migration because the root filesystem is not
  shared. If the vm was booted using block storage then the iscsi
  driver can handle the migration of vms. Another option is to mount
  /var on a shared nfs drive.
Ubuntu Cloud Archive
I added the Ubuntu Cloud Archive and Puppetlabs apt repos in the
preseed to prevent older versions of packages from being installed.
d-i apt-setup/local0/repository string \
http://apt.puppetlabs.com/ precise main dependencies
d-i apt-setup/local0/comment string Puppetlabs
d-i apt-setup/local0/key string http://apt.puppetlabs.com/pubkey.gpg
d-i apt-setup/local1/repository string \
http://ubuntu-cloud.archive.canonical.com/ubuntu precise-updates/grizzly main
d-i apt-setup/local1/comment string Ubuntu Cloud Archive
d-i apt-setup/local1/key string \
http://ubuntu-cloud.archive.canonical.com/ubuntu/dists/precise-updates/grizzly/Release.gpg
tasksel tasksel/first multiselect ubuntu-server
d-i pkgsel/include string openssh-server ntp ruby libopenssl-ruby \
vim-nox mcollective rubygems git puppet mcollective facter \
ruby-stomp puppetlabs-release ubuntu-cloud-keyring
Dell iDRAC and faking netboot using virtual CDROM
Unfortunately I do not have DHCP, PXE and TFTP in this subnet to do
netboot provisioning. I am working on this with our IT department. So
for now I have to fake it.
I grab the mini.iso from the Ubuntu mirror:
ubuntu/dists/precise/main/installer-amd64/current/images/netboot/mini.iso
This contains the netboot kernel and initrd. I can then log into the
Dell iDRAC and start the remote console for the server. Using Virtual
Media redirection, I connect the mini.iso and boot the server. Press
F11 to get the boot menu and select Virtual CDROM.
But using this directly means I have to type everything into a tiny
console window, so I modified the isolinux.cfg to change the kernel
params and load the preseed automatically.
Mount mini.iso locally and copy the contents to the hard drive:
sudo mkdir -p /mnt/ubuntu
sudo mount -o loop mini.iso /mnt/ubuntu/
cp -r /mnt/ubuntu/ .
chmod -R +w ubuntu
Here are the contents of the isolinux.cfg after editing:
default preseed
prompt 0
timeout 0
label preseed
kernel linux
append vga=788 initrd=initrd.gz locale=en_US auto \
url=<server>/my.preseed priority=critical interface=eth0 \
console-setup/ask_detect=false console-setup/layout=us --
Then make a new iso:
mkisofs -o ubuntu-precise.iso -b isolinux.bin -c boot.cat \
-no-emul-boot -boot-load-size 4 -boot-info-table -R -J -v -T ubuntu/
Then the process is almost completely automated, except that the
server cannot download the preseed until the networking is
configured. This info can be added to the kernel params (as shown
below), but then I would have to edit a separate iso for each
server. With Red Hat kickstarts I was able to add a script that
mapped MAC addresses to IPs and completely automate this, but with
preseeds I need to manually enter the network info. The proper
solution is a provisioner like Cobbler or Foreman.
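For reference, the static network settings mentioned above can be
passed on the kernel command line with the debian-installer netcfg
keys, roughly like this (addresses and hostname are placeholders, and
this is exactly what forces a separate iso per server):
append vga=788 initrd=initrd.gz locale=en_US auto \
url=<server>/my.preseed priority=critical interface=eth0 \
netcfg/disable_dhcp=true netcfg/get_ipaddress=10.0.0.21 \
netcfg/get_netmask=255.255.255.0 netcfg/get_gateway=10.0.0.1 \
netcfg/get_nameservers=10.0.0.1 netcfg/get_hostname=compute01 \
netcfg/get_domain=example.com \
console-setup/ask_detect=false console-setup/layout=us --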
Puppet master with git branch to environment mapping
I have set up my puppet masters based on the post by Puppetlabs.
I like this setup a lot: all development happens on my desktop and I
have a consistent, version controlled collection of all modules
available to my systems. I am also using it to give some colleagues
who are learning puppet a nice environment that won’t mess up my
systems.
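The core of that setup, as I understand the Puppetlabs post, is a
puppet.conf on the master that maps $environment (which comes from
the git branch name) to a per-environment module path, roughly:
[master]
    # Each git branch becomes a directory under /etc/puppet/environments
    # and $environment selects which one a node sees.
    modulepath = /etc/puppet/environments/$environment/modules
    manifest   = /etc/puppet/environments/$environment/manifests/site.pp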
But I have some custom in-house modules and I want to put the
OpenStack puppet modules in the same git branch beside them. The
existing tools like puppet module and puppet librarian do not work
for this use case. I want to be able to use git for these external
repos and to easily share any patches I make with upstream. Enter git
subtree.
Git subtrees to integrate OpenStack puppet modules
Git subtree is part of the git package contrib files. Enabling it on
my system was simple:
cd ~/bin
cp /usr/share/doc/git/contrib/subtree/git-subtree.sh .
chmod +x git-subtree.sh
mv git-subtree.sh git-subtree
Now I can go to my modules directory and add the OpenStack puppet
modules:
for arg in cinder glance horizon keystone nova; do \
git subtree add --prefix=modules/$arg \
--squash https://github.com/stackforge/puppet-$arg stable/grizzly;\
done
There are some more supporting modules like inifile, rabbitmq, apt,
vcs, etc. Look in openstack/Puppetfile for the full list.
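Keeping the modules up to date and sharing patches back upstream is
just more subtree operations; a sketch, with the fork URL and branch
name as placeholders:
# Pull upstream stable/grizzly changes for one module into the subtree:
git subtree pull --prefix=modules/nova --squash \
https://github.com/stackforge/puppet-nova stable/grizzly
# Push the commits that touch modules/nova to a fork to send upstream:
git subtree push --prefix=modules/nova \
git@github.com:myfork/puppet-nova.git nova-fixes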
Next I needed to enable the modules on my machines. First, the hiera
data needs to be added for the network config. I was inspired by
Chris Hodge’s video and hiera data. The gist has some minor issues,
so I posted a revised version.
The last piece is to enable the modules on the nodes:
node 'controller' {
  include openstack::repo
  include openstack::controller
  include openstack::auth_file
  class { 'rabbitmq::repo::apt':
    before => Class['rabbitmq::server']
  }
}
node 'compute' {
  include openstack::repo
  include openstack::compute
}
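With the node definitions in place, a couple of puppet runs on each
box pull everything in (standard agent invocation; the environment
depends on which git branch the node is pinned to):
puppet agent --test --environment production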
Conclusion
Most of this infrastructure already existed or I had already built it
in the past. I was able to reimage 3 machines and have a working
Grizzly installation in about 3 hours.
Many thanks to all the people who have contributed to Debian, Ubuntu,
Puppet and the OpenStack puppet modules.