When we began using Chef three years ago, we focused on ensuring that the dependencies of a particular cookbook could be upgraded without impacting existing instances. metadata.rb provides a nice mechanism for this, but we still needed the "test" and "prod" environments to run different versions from each other and from the next version under development, with a smooth upgrade and rollout path between them. Additionally, we often need a clustered service, say RabbitMQ, where more than one cluster exists.

This led us to adopt the "Gangnam style" of cookbook authoring (http://devopsanywhere.blogspot.com/2012/11/how-to-write-reusable-chef-cookbooks.html) to great success. However, our structure involved three levels: the "community" cookbook, wrapped by the "revinate" cookbook, wrapped by the "gangnam" cookbook, which in some cases is simply a single attribute identifying "this" as a separate cluster. If other attributes need tuning, such as total RAM or max open files, this is still what we’d do, but frequently the number of nodes is the only difference between clusters.

We were using Chef Roles so that the query mechanism in Chef could find the other nodes, which we needed in order to configure properties files and join the cluster. Since Roles couldn’t be version pinned, we used Chef Environments to declare which version of the cookbook corresponding to the role would run in a particular environment. This became clumsy because so many files had to be configured to deploy and promote cookbooks.

So, here are two simple tricks we’ve started using to get clustering and versioning without all the extra artifacts.

1) Put the recipe, with its @version suffix, directly in the run list of the bootstrap command (names have been changed to protect the innocent):

knife bootstrap -E test -N rabbitmq-forest-01.test.dc1 -x rootusername --sudo -r "recipe[revinate_rabbitmq@0.1.0]"

Referring to the recipe directly with @version eliminates the need for the Role and for version pinning in the Environment (at the expense of no longer being able to do Role queries). The cookbook for a particular node can then be upgraded with:

knife node run_list set rabbitmq-forest-01.test.dc1 'recipe[revinate_rabbitmq@0.1.1]'

This allows machines within a cluster / environment pair to be upgraded one at a time, rather than an entire environment at once.
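For a cluster with several members, the same command can be generated per node and run one machine at a time. The following is only a minimal sketch: `upgrade_commands` is a hypothetical helper (not part of knife or Chef), and `rabbitmq-forest-02.test.dc1` is an assumed second node name used for illustration.

```ruby
# Hypothetical helper: builds one `knife node run_list set` command per node
# so that a cluster can be upgraded one machine at a time.
def upgrade_commands(node_names, cookbook, version)
  node_names.map do |name|
    "knife node run_list set #{name} 'recipe[#{cookbook}@#{version}]'"
  end
end

# Assumed node names; replace with the real members of the cluster.
nodes = %w[rabbitmq-forest-01.test.dc1 rabbitmq-forest-02.test.dc1]

# Print the commands rather than running them, so each step can be reviewed
# and the node verified before moving on to the next one.
upgrade_commands(nodes, 'revinate_rabbitmq', '0.1.1').each { |cmd| puts cmd }
```

Printing instead of executing keeps the rollout manual and deliberate; whether to automate the loop is a judgment call for each cluster.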

2) Use part of the node name as the cluster identifier:

# my_cookbook/attributes/default.rb
# machines should be named: <service>-<cluster>-<nn>.<environment>.<datacenter>

cluster = node['machinename'].split('-')[1]

default['rabbitmq']['erlang_cookie'] = "#{cluster}-#{node.chef_environment}"

Always include the environment to prevent machines from being able to join across environments.
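One caveat with `split('-')[1]` is that a machine whose name does not follow the convention yields `nil`, which would silently produce a cookie such as "-test". A possible guard, sketched in plain Ruby so it can be tried outside of Chef, is shown here; `cluster_from` and `erlang_cookie` are hypothetical helper names, not Chef APIs.

```ruby
# Hypothetical helper, not part of Chef: extracts the cluster segment from a
# machine name of the form <service>-<cluster>-<nn>.<environment>.<datacenter>.
def cluster_from(machinename)
  # Drop any ".<environment>.<datacenter>" suffix, then split the short name.
  parts = machinename.split('.').first.to_s.split('-')
  unless parts.length == 3 && parts.none?(&:empty?)
    raise ArgumentError,
          "machine name #{machinename.inspect} does not match <service>-<cluster>-<nn>"
  end
  parts[1]
end

# Same cookie format as the attributes file above, written as a plain function.
def erlang_cookie(machinename, environment)
  "#{cluster_from(machinename)}-#{environment}"
end

puts erlang_cookie('rabbitmq-forest-01.test.dc1', 'test') # prints "forest-test"
```

Failing loudly on a misnamed machine seems preferable to letting it converge with a malformed cookie, though this sketch also rejects service names that themselves contain a hyphen.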

I prefer cookbooks and server technologies that use a "seed" and then gossip to communicate cluster state (e.g. Cassandra) rather than querying all members by Role. The cluster naming mechanism works there too:

# my_other_cookbook/attributes/default.rb

cluster = node['machinename'].split('-')[1]

default['cassandra']['cluster_name'] = "#{node.chef_environment}_#{cluster}"
default['cassandra']['seeds'] = "cassandra-#{cluster}-01.#{node.chef_environment}.dc1"
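
To check what these attributes resolve to for a given node without running Chef, the same string building can be sketched as a plain Ruby function. `cassandra_settings` is a hypothetical name, "ring" is an assumed cluster name, and the `dc1` default simply mirrors the hard-coded datacenter above.

```ruby
# Hypothetical helper: mirrors the attribute logic above outside of Chef.
# The seed is always node 01 of the same cluster and environment.
def cassandra_settings(machinename, environment, datacenter = 'dc1')
  cluster = machinename.split('-')[1]
  {
    'cluster_name' => "#{environment}_#{cluster}",
    'seeds' => "cassandra-#{cluster}-01.#{environment}.#{datacenter}"
  }
end

# Node 03 of an assumed "ring" cluster in the test environment.
settings = cassandra_settings('cassandra-ring-03.test.dc1', 'test')
puts settings['cluster_name'] # prints "test_ring"
puts settings['seeds']        # prints "cassandra-ring-01.test.dc1"
```

Because the seed name is derived from the naming convention alone, no Role query is needed to find it.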

These simple techniques have reduced our administration overhead and the total number of cookbooks!