September 30, 2014

Getting my hands dirty on an OpenStack lab

Like you all may know, OpenStack is currently one of the coolest open source projects, so I was thrilled when I was asked to manage the deployment of an OpenStack lab for internal Percona use. Starting from basically zero, I created tasks in our Jira and assigned them to a pool of volunteer consultants. As usual in a service company, billing is the priority so I ended up losing the 2 senior guys but fortunately most of my time was with a customer that wasn’t very demanding and I could easily multitask with the project and fill the gap. So, here it goes…

Hardware

To deploy the OpenStack lab we were given 8 similar servers in our Durham, N.C. offices. The specs are:

  • CPU: 12 physical cores (24 with HT)
  • Disks: 1 sata 4 TB drive and one 480GB SSD drive
  • Nics: 2x GbE
  • OS: Centos 6

The hardware is recent and decent, a good start.

Deployment choices

Given the hardware we had, I picked the first to be the controller and jumphost, the second to be the network node (a bit overkill) and the remaining 6 nodes would become the compute nodes. I also wanted to use Ceph and RBD with 2 types of volumes, the default using SATA and a SSD type using the SSD drives. The servers only have a single GbE interface to use with Ceph, that’s not ideal but sufficient.

So, we basically followed: OpenStack doc for Centos and had our share of learning and fun. Overall, it went relatively well with only a few hiccups with Neutron and Ceph.

Neutron

Neutron is probably the most difficult part to understand. You need to be somewhat familiar with all the networking tools and protocols to find your way around. Neutron relies on network namespaces, virtual network switches, GRE tunnels, iptables, dnsmasq, etc. For my part, I discovered network namespaces and virtual network switches.

The tasks of providing isolated networks to different tenants with their own set of IPs and firewalling rules, on the same infrastructure, is a challenging tasks. I enjoyed a lot reading Networking in too much detail, from there, things just started to make sense.

A first issue we encountered was that the iproute package on Centos is old and it does not support the network namespaces. It just needs to be replaced by a newer version. It took me some time to understand how things are connected, each tenant has its own Gre tunnel id and vlan. I can only recommend you read the above document and look at your setup. Don’t forget, you have one set of iptables rules per network namespace… The network node is mainly dealing with Natting, SNAT and DNAT while the security group rules are set on the compute nodes. The “ip netns exec …” and the “ovs-ofctl dump-flows …” have been my best friends for debugging.

Once I got things working, I realized the network performance was, to say the least, pretty bad. I switch “gro off” on the Nics but it made very little change. Finally, I found how the MTU of the VMs were too large, an easy fix by adding a configuration file for dnsmasq with “dhcp-option-force=26,1400″. With an MTU of 1400, less than the NIC MTU + the GRE header, packets were no longer split and performance went back to normal.

More on Ceph

The integration of Ceph happened to be more challenging than I first thought. First, let’s say the documentation is not as polished as the Openstack one, there are some rough edges but nothing unmanageable. The latest version, at the time, had no rpms for Centos but that was easy to work around if you know rpmbuild. Same for the Centos RPM for qemu-kvm and nova required a patch to support rbd devices. I succeeded deploying Ceph over the SATA drives, configured Glance and Cinder to use it and that was all good. The patch for nova allows to launch instances on clones of an image. While normally the image has to be copied to the local drive, an operation that can take some time if the image is large, with a clone, barely a few second after you started a VM, it is actually starting. Very impressive, I haven’t tested but I suppose the same can be accomplished with btrfs or ZFS, shared over iscsi.

Since the servers all have a SSD drive, I also tackled the task of setting up a SSD volume type. That has been a bit tricky, you need to setup a rule so that a given storage pool uses the SSD drives. The SSD drives must be placed all in their own branch in the ceph osd tree and then, you need to decompile the current rule set (crush map), modify it by adding a rule that use the ssd drives, recompile and then define a storage pool that uses the ssd rule. Having done this, I modified the cinder configuration for the “volume-ssd” type and finally, I could mount a ssd backed volumes, replicated, to a VM, quite cool.

The only drawback I found using Ceph is when you want to create a snapshot. Ideally, Ceph should handle the snapshot and it should be kept there as is. The way it works is less interesting, a snapshot is created but it is then copied to /tmp of the compute node, uncompressed…, and then copied back to a destination volume. I don’t know why it is done like that, maybe some workaround for limitations of other Cinder backends. The compute nodes have only 30GB available for /tmp and with Ceph, you must use raw images so that’s quite limiting. I already started to look at the code, maybe this could be my first contribution to the Openstack project.

Overall impressions

My first impression is that OpenStack is a big project with many moving, many moving parts. Installing OpenStack is not a beginners project, my experience saved me quite a few times. Logging, when you activate verbose and even debug in the configuration files is abundant and if you take your time, you can figure out what is going on and why something is failing.

Regarding the distribution, I am not sure Centos was the right fit, I think we had a rougher ride than we could have had using Ubuntu. My feeling is that the packages, not necessarily the OpenStack ones, were not as up to date as they should have compared to Ubuntu.

Ceph rbd backend is certainly a good point in my experience, I always wanted to touch Ceph and this project has been a very nice opportunity. The rbd backend works very well and the ability to launch instances almost instantaneously is just awesome. Another great plus is the storage redundancy (2 replica), like EBS, and the storage saving of working only on a clone of the original image. Also, the integration of the SSD backed pool adds a lot of flexibility. There’s only the instance snapshot issue that will need to be worked on.

So, that’s the current status of our OpenStack lab. I hope to be able to add a few more servers that are currently idle, allowing me to replace the over powerful network node and recycle it as a compute node Another thing I would really like to do is to mix the hypervisor types, I’d like to be able to use a lightweight container like LXC or docker. Also, although this is just a lab for the consultants, I’d like to see how to improve its available which is currently quite low. So, more fun to come!

About Yves Trudeau

Yves is a Principal Consultant at Percona, specializing in technologies such as MySQL Cluster, Pacemaker and DRBD. He was previously a senior consultant for MySQL and Sun Microsystems. He holds a Ph.D. in Experimental Physics.

Comments

  1. MrKitai says:

    Installing is a nightmare. Once you’ve got it you don’t want anyone to touch it.

    There are tools like Fuel and other “installers” that simplify that problem.

    OpenStack it’s good for learning, personal use, and private clouds or labs. There are better tools for production (RHEV, opennebula, etc…)

    MrKitai

  2. nate says:

    Installing is a nightmare ? Kind of funny because some Openstack experts recently told me “The easiest thing about OpenStack is setting it up – organizations spend the majority of the time simply keeping it running after it is set up.”

    My former boss currently works with openstack and said it was spot on. Not a stable platform. I suppose if installing it is difficult (I haven’t felt the need to try yet) then you probably shouldn’t even be trying to use it.

  3. Robert van Leeuwen says:

    Hi,

    Nice to see you are running with Ceph with SSD pool only.
    Since you guys are a database firm could you tell something about the performance?
    We tried to run Databases on ceph with SSD journals with spinning disks but performance was not even close to acceptable for databases.
    Wondering how it runs with SSD only pools.

    Thx,
    Robert

  4. @Robert: I am planning to do some benchmarks, the only concern I have is the relative weakness of the network supporting Ceph, it is a single GbE network. That will likely play a big role.

  5. Robert van Leeuwen says:

    @Yves,
    Cool, let me know.
    In my experience 1Gbit was not a bottleneck.
    We hit IOPS issues (doing anything concerning writes) way before that.

Speak Your Mind

*