Konrad Scherer
MONDAY, 22 APRIL 2013

Starting Openstack deployment

Starting with Openstack

I have some experience with Xen, but no experience with any software that controls a hypervisor. Wind River has several customers interested in using oVirt and Openstack with Wind River Linux. Another team is looking at oVirt, but no one had taken up the Openstack investigation. I have experience with Puppet, and Puppet Labs has some official Openstack modules, so that seemed like a good place to start.

Fedora 18

I re-purposed a coverage builder and installed Fedora 18. I had read about Openstack on Fedora and thought that would be a good place to start. Then Red Hat announced Packstack and the RDO project, and I decided to give it a try.

The initial install failed because SELinux was disabled and because of conflicts with NIS (our NIS deployment contains users with UIDs that conflict with the ones in the RPMs). When I finally got the packstack installer to complete after a clean install, Openstack refused to recognize the admin user. So I did a reinstall using CentOS 6.4 and everything worked without issue.
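For reference, the RDO quickstart of that era boiled down to roughly the following (package and repository names varied by release, so treat this as a sketch rather than the exact commands I ran):

sudo yum install -y openstack-packstack
packstack --allinone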

Openstack and images

My experience with virtual machines has always been to boot and install onto some empty, usually virtual, disk. Openstack was my first interaction with images. The docs recommend a base F18 image. The first attempt to download it through the Horizon interface seemed to hang; 30 minutes after that download had started, I was able to fetch the image with wget on my local machine.
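Once the image is on disk it can be registered with Glance from the command line. Roughly (the file name here is just an example):

glance image-create --name "Fedora 18" --disk-format qcow2 --container-format bare --file f18-cloud.qcow2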

Openstack and LVM

My initial install of the host OS created a large LVM volume group called cinder-volumes for the Openstack Block Storage service. Unfortunately, the packstack installer renamed the volume group. I had to (roughly, as sketched after the list):

  • Stop the cinder-volume service
  • Delete the packstack created volume group and physical volume
  • Rename the local LVM volume group
  • Restart the cinder-volume service
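On CentOS the commands look something like this (the volume group names are placeholders for my setup, not anything packstack guarantees):

service openstack-cinder-volume stop
vgremove <packstack-vg>
pvremove <packstack-pv>
vgrename <original-vg> cinder-volumes
service openstack-cinder-volume start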

Openstack and volumes

I went into the Volumes section of the Horizon Web UI and created a volume. Running lvs on the host showed that a logical volume was created in the correct place. I launched an instance with the tiny flavor and noticed that it did not require a volume, which I found strange. I had set up a security group to allow SSH and had my public key injected. I then associated a floating IP and was able to log into the VM! This was a happy moment.
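The same workflow can be driven from the nova client of that era. A sketch (names and the security group rule are examples, and flag spellings varied a bit between client releases):

nova secgroup-add-rule default tcp 22 22 0.0.0.0/0
nova keypair-add mykey > mykey.pem && chmod 600 mykey.pem
nova boot --flavor m1.tiny --image <image-id> --key-name mykey test-vm
nova floating-ip-create
nova add-floating-ip test-vm <floating-ip>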

After some poking around, a disk space check revealed that the VM had 10 GB of disk space. This confused me because I had not associated it with a volume. So I repeated the process but set up the VM to boot off the volume I created earlier. This time the boot failed due to a missing boot image.

Some EC2 history

I did more research and found this article. It explains some of the history of virtual machine infrastructure. When EC2 was first launched, the VMs had no persistent storage. Customers had to use some sort of web service like S3 to persist information. This kind of image is called instance-store; Openstack refers to it as Ephemeral storage.

Then Amazon introduced EBS to provide persistent storage. It could be attached to an instance-store image as another block device. In Openstack this is handled by Cinder as block level storage.

Then came the ability to boot from EBS volumes. This matches my internal model of a virtual machine as persistent, like a physical machine. By default the volumes are empty, so the next step is populating the volume with the proper bits. I have experience using Cobbler with kickstart (and similar tools) to install new systems, but I was curious whether the image could be “transferred” to the volume.

The Horizon Web UI was not helpful. Some more research revealed the following:

cinder create --image-id <image-id> --display-name mybootable-vol 10

This runs qemu-img convert and writes the raw image to the new cinder volume. This volume can be booted directly, but the Web UI still requires an image name, which is then ignored.
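From the command line, booting from the volume looks roughly like this (a sketch; the exact block-device-mapping flag spelling and mapping format varied between novaclient releases):

nova boot --flavor m1.small --block-device-mapping vda=<volume-id>:::0 --key-name mykey boot-from-volume-vm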

Summary

Types of Openstack VMs:

  1. Ephemeral storage only. A default size of 0 means the image disk size is used.
  2. Ephemeral + block storage. The VM must format the volume (if blank) and mount it. A volume can only be attached to one VM at a time.
  3. Block storage only. The Web UI does not support image-to-volume conversion, but cinder does.

Next on my list are the NetApp cinder driver and an installation on Ubuntu 12.04 Server using the official Puppet modules.




TUESDAY, 19 MARCH 2013

Lessons learned running e2croncheck

Filesystems (ext4, xfs, zfs, etc.) are one of those things whose failure nobody really wants to think about. The difference between a hard disk failure and complete filesystem corruption is largely academic. However, a filesystem has many failure modes, and the scariest is silent corruption that goes undetected for a long time. The worst case scenario is that backups are rendered useless as well.

The long-standing solution for detecting and correcting minor filesystem issues is fsck. The tool has several limitations:

  1. The check can only be run while the filesystem is offline.
  2. The check is serial per filesystem. It can be parallelized across multiple filesystems.
  3. As the amount of data on the filesystem grows, the time to complete the check grows as well.

What seems to be standard practice is the following:

  1. Install and configure system with defaults
  2. Leave system running as long as possible
  3. When the machine hangs at a critical moment, reboot the machine
  4. Wait for hours until an admin logs into the console and the fsck check is manually killed

This has several obvious drawbacks:

  1. fsck almost never gets a full run, especially if the system uses hibernation and/or S3 sleep
  2. The downtime always happens at the worst possible time
  3. No one knows how long an fsck is actually going to take
  4. The fsck may not be necessary, but the disk/machine needs to be offline anyway

Online fsck seems to be impossible, because the state of the filesystem can change in ways that make the check wrong.

Databases have a similar problem: how to do a backup while the system is in operation. The solution there is to use filesystem snapshots. This is how I stumbled upon e2croncheck. The original from Theodore Ts’o is here. I found a revised version on GitHub by Ion.

The script creates a read-write snapshot of the filesystem. LVM uses a copy-on-write snapshot volume to track changes to the original filesystem. The script then runs e2fsck on the snapshot, which will report whether there is actual corruption on the filesystem that needs to be repaired offline.
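The core of the approach can be sketched in a few commands (volume names here are examples, and the real script adds logging and error handling):

lvcreate -s -L 500G -n fscheck-snap /dev/vg_data/lv_data
e2fsck -fy /dev/vg_data/fscheck-snap
lvremove -f /dev/vg_data/fscheck-snap

If e2fsck finds and fixes problems on the snapshot, the original filesystem needs a real offline repair; the snapshot itself is simply thrown away.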

This seems like a better solution than the standard practice of ignoring the problem, so I set up my next servers in the following way (a rough sketch of the LVM layout follows the list):

  • Six physical disks in hardware RAID 5
  • Two virtual disks: 500GB system and 8.6TB data
  • System uses ext4
  • Data uses LVM with one physical volume and one volume group
  • Single logical volume at 8TB with 500GB unused space for snapshot
  • Cronjob to run e2croncheck weekly
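A sketch of that data-disk layout (device and volume names are examples from my own setup):

pvcreate /dev/sdb
vgcreate vg_data /dev/sdb
lvcreate -l 94%VG -n lv_data vg_data
mkfs.ext4 /dev/vg_data/lv_data

The 94%VG figure is just an approximation that leaves roughly 500GB of free space in the volume group for the snapshot.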

LVM snapshots are not without their problems. The big one is performance. There is overhead from the copy-on-write snapshot, but thanks to the Internet I found some benchmarks comparing performance across chunk sizes. The default chunksize is 4kB, and increasing the chunksize to 64kB increases performance by 10x!
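The chunk size is set when the snapshot is created, so in the sketch above the lvcreate line becomes something like:

lvcreate -s -L 500G -c 64k -n fscheck-snap /dev/vg_data/lv_data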

I also run e2fsck under ionice at idle priority. So far these changes mean that the background check does not interfere with programs that are running.
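That is just a matter of wrapping the e2fsck invocation, for example:

ionice -c 3 e2fsck -fy /dev/vg_data/fscheck-snap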

The final version of the script is located here, inside a Puppet class that installs the file and the cron job.




SATURDAY, 20 OCTOBER 2012

When root cannot delete a file

Operation not permitted

It started when dpkg could not upgrade the util-linux package because the file /usr/bin/delpart could not be symlinked. So I tried to delete the file.

sudo rm /usr/bin/delpart
rm: cannot remove `/usr/bin/delpart': Operation not permitted

All operations on the file failed. I tried mv, fsck, rebooting into rescue mode, etc.

So I googled “linux ext4 ‘Operation not permitted’”. This did not help much, but I noticed a link about ext2 extended attributes. I have never used extended attributes, so I did a quick read of the man pages for lsattr and chattr.

cd /usr/bin
sudo lsattr delpart
---D-a-----tT-- delpart

That was a strange set of attributes, so I compared it to another random file.

sudo lsattr zip
-------------e- zip

Once the problem was found, the solution was straightforward:

sudo chattr +e -DatT delpart
sudo lsattr delpart
-------------e- delpart

The question now is how the file got into this state. I can only speculate that an fsck run “repaired” this corrupted file into this strange but consistent state. I wonder if there are other surprises waiting for me on this disk.
