Chef runs from your laptop via ssh using chef-provisioning-ssh

19 May 2015

When comparing configuration management systems, one of the biggest selling points of ansible is how easy it is to get started in a small environment - you type up a recipe/playbook on your laptop, run ansible, and it connects via SSH to your machines to configure them. This blog post shows you how to do the same thing with chef using chef local mode, chef-provisioning and chef-provisioning-ssh. This setup can be very useful if you have just a couple of machines, don’t want to set up a chef server until you grow your infrastructure, and want to orchestrate everything from your laptop.


If you haven’t already, install the latest ChefDK, which will install chef, chef-provisioning and lots of other goodies. It doesn’t however install chef-provisioning-ssh, so you’ll have to do that separately:

chef gem install chef-provisioning-ssh

Next, create your chef repository as normal. I use the chef generate commands here to speed things up a little:

chef generate repo chef-repo
cd chef-repo/cookbooks
chef generate cookbook my-cookbook
echo 'log "Hello world"' >> cookbooks/my-cookbook/recipes/default.rb
cd ..

Next, create a file called site.rb in the root of the chef repository. This is the script you will run to converge your infrastructure. In this example, we have just one machine:

#!/usr/bin/env chef-client -z
require 'chef/provisioning/ssh_driver'

with_chef_local_server :chef_repo_path => File.dirname(__FILE__),
  :cookbook_path => ["#{File.dirname(__FILE__)}/cookbooks",
                     "#{File.dirname(__FILE__)}/berks-cookbooks"]

with_driver 'ssh'

machine 'mymachine' do
  converge true
  recipe 'my-cookbook'
  #role 'base'
  machine_options :transport_options => {
    :host => '',
    :username => 'mark',
    :ssh_options => {
      :use_agent => true
    }
  }
end

Let’s break this down:

#!/usr/bin/env chef-client -z

The shebang line here simply lets you run the script directly using ./site.rb, and it will run chef-client in local mode. This works because you can pass a recipe file directly to chef-client -z or chef-apply and it will be run.

require 'chef/provisioning/ssh_driver'

This is just a ruby require to load chef-provisioning-ssh.

with_chef_local_server :chef_repo_path => File.dirname(__FILE__),
  :cookbook_path => ["#{File.dirname(__FILE__)}/cookbooks",
                     "#{File.dirname(__FILE__)}/berks-cookbooks"]

This tells chef-provisioning to use chef local mode when provisioning the server, with the chef repository in the current directory. This is what lets you provision the remote machine with just the configuration repository located on your laptop.

The cookbook_path parameter is optional, but I added an additional berks-cookbooks path for all cookbooks installed via berkshelf. I run berks vendor berks-cookbooks to install any community cookbooks or cookbooks with their own repository to a separate directory.
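Berkshelf resolves those external cookbooks from a Berksfile at the root of the repository. As a rough sketch (the cookbook names and git URL here are illustrative, not from this repository):

```ruby
# Berksfile - declare community cookbooks and cookbooks that live in
# their own repositories
source 'https://supermarket.chef.io'

cookbook 'ntp'
cookbook 'my-private-cookbook',
  git: 'https://example.com/my-private-cookbook.git'
```

Running berks vendor berks-cookbooks then downloads everything listed into the berks-cookbooks/ directory, where the extra cookbook_path entry picks it up.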

with_driver 'ssh'

Use the ssh driver for all subsequent connections.

machine 'mymachine' do
  converge true

Now we are into the machine definition itself. You have one of these for each machine you want to provision, and if desired you can use regular ruby to provision many machines at once using loops.
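For example, a sketch of provisioning several machines with one loop (the hostnames here are made up for illustration):

```ruby
%w(web1 web2 db1).each do |name|
  machine name do
    converge true
    recipe 'my-cookbook'
    machine_options :transport_options => {
      :host => "#{name}.example.com",
      :username => 'mark',
      :ssh_options => { :use_agent => true }
    }
  end
end
```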

The converge true line says to run chef every time we run the site.rb script. Chef-provisioning is intended for the initial bootstrapping of a machine, after which the machine would take over running chef itself as a daemon or via cron. Without this option, chef-provisioning would only run chef once, when the machine is first provisioned.

recipe 'my-cookbook'
#role 'base'

Specify the run list for your machine here. I specified the simple recipe created earlier for this example, but you would probably use roles for a real installation as in the commented out line. Use multiple role or recipe lines as needed.

  machine_options :transport_options => {
    :host => '',
    :username => 'mark',
    :ssh_options => {
      :use_agent => true
    }
  }

Next are the various ssh options for the machine. There are lots of options you can use, and they are documented in the chef-provisioning docs, but what I've specified here is probably the minimum you will want to set - the machine's hostname, the username to connect as, and an instruction to use the ssh agent. You can use password based authentication here too, but that would mean hard coding the password into the recipe, which is probably a bad idea.

Once you have your site.rb created, chmod +x site.rb to make it executable, and then run it:

$ ./site.rb
Starting Chef Client, version 12.0.3
resolving cookbooks for run list: []
Synchronizing Cookbooks:
Compiling Cookbooks...
[2015-05-09T00:03:35-04:00] WARN: Node laptop has an empty run list.
Converging 1 resources
Recipe: @recipe_files::/Users/mark/git/chef-repo/site.rb
  * machine[mymachine] action converge
    [mymachine] Starting Chef Client, version 12.2.0
           resolving cookbooks for run list: ["mh-base::test"]
           Synchronizing Cookbooks:
             - my-cookbook
           Compiling Cookbooks...
           Converging 1 resources
           Recipe: my-cookbook::default
             * log[Hello world] action write

           Running handlers:
           Running handlers complete
           Chef Client finished, 1/1 resources updated in 5.050801491 seconds
    - run 'chef-client -l auto' on mymachine

Running handlers:
Running handlers complete
Chef Client finished, 1/1 resources updated in 20.512658 seconds

And there you have it. Chef runs orchestrated entirely from your laptop. Chef-provisioning will automatically use sudo, so as long as you have key based ssh access, and your user on the remote machine has sudo access, things will just work.


There are a few things to be aware of with this setup, mostly surrounding the fact that there is a chef-zero server running under the hood, and the fact that it keeps some saved state.

First, if you change how you connect to the server, chef-provisioning will continue to try to use the old cached credentials to connect. This type of situation is common if you initially bootstrap a machine using the pi or ubuntu users, and then delete them and connect using a regular user instead.

To fix this, delete ~/.chef/provisioning/ssh/mymachine.json, and chef-provisioning will use the new connection parameters on the next run.

Similarly, any modifications to the run list in site.rb are additive. Chef-provisioning doesn’t wipe away any existing run list on the node when it runs. This means that if you want to remove a run list entry, you should do it with knife from inside the chef repository once you have fixed site.rb:

knife node run_list remove mymachine 'recipe[my-cookbook]' -z

This is exactly the same as regular knife commands, but you add the -z option. I find it easiest to just put it on the end of the command, as you get an error if the option appears first because knife is expecting to see a command there.

What’s going on here?

Chef-provisioning is intended to let you quickly get machines up and running entirely using chef. It essentially does four things:

1. Creates the machine itself (e.g. spinning up a VM or cloud instance)
2. Installs chef on the machine
3. Registers the node with the chef server
4. Runs chef with the machine's run list

Chef-provisioning-ssh is a driver for chef-provisioning that lets you configure existing machines over ssh, which means it skips step one above - the machine is already created, so nothing needs to be done. Conveniently, the other three steps are exactly what's needed to complete a chef run from scratch, and that's exactly what we abuse chef-provisioning to do.

The next part of the magic is chef-zero, or chef local mode. This lets you start a chef server on your laptop pointing to a local repository on disk, without any prior setup, and the server goes away once the chef run is complete. Chef-provisioning also sets up port forwarding over ssh so that the remote server can talk to the chef server on your laptop even if there is no direct connection back.

The beauty of this approach is, if your infrastructure grows large enough to require a chef server, you just upload the contents of the repository to the chef server, update the client config on the server (using chef of course), and you’re up and running on a real chef server with no other changes to your clients.

While there are a few issues that need to be worked out (and which I’m working on), this approach is serving me well for my home setup, and it’s incredibly simple to get started with.

Python module/plugin loading pattern

29 Dec 2013

When writing a new program in python, I often find myself using the same pattern over and over, and wanted to document it here. The idea is that you have some core functionality of the program, and loadable modules to expand the functionality, with only some of the modules being loaded at runtime. These modules are single files inside a modules/ directory, contain a single Module() class (and optionally others depending on what the module needs), and some other functionality that the core of your program will use. This pattern lets you specify which modules to load in a configuration file, and load them up on demand, passing in some configuration to the module.

First, the code to load the modules:

import ConfigParser
import logging
import os
import time

# Read the config file(s)
c = ConfigParser.SafeConfigParser()
c.read(['/etc/myapprc', os.path.expanduser('~/.myapprc')])

# Load the modules
modules = []  # List of module objects
for s in c.sections():
    # We define which modules to load by the fact that there is a section
    # in the file for that module, starting with "mod_". This lets you
    # have other configuration in the file if needed. You can also get the
    # list of modules to load in some other way if desired.
    if s.startswith("mod_"):
        modulename = s[4:]  # Strip off the mod_
        try:
            # Actually import the module
            module = __import__('modules.%s' % modulename, globals(),
                locals(), [modulename])
        except ImportError, e:
            # Print an error if we failed, and carry on; you could exit
            # here if desired. I also tend to use the logging module, but
            # this could just be a print statement or something else if
            # needed.
            logging.error("Unable to load module %s: %s" %
                          (modulename, e))
            continue

        # Instantiate an instance of the module, and pass in the config
        # from the file. I convert all items to a dict for ease of access
        # by the module. You could also pass the raw config parser object
        # if desired.
        module_object = module.Module(dict(c.items(s)))
        # Add the module to the list. This can be more complex than just
        # storing the module object (e.g. adding a name and some config in
        # a dict) if you need to store more information.
        modules.append(module_object)

# Do something with the modules here. Simple example here to call all module
# run() methods over and over, sleeping 10 seconds in between.
while True:
    for m in modules:
        m.run()
    time.sleep(10)
This gives you a list of module objects in modules, and you can then call methods on those modules (I tend to have a run() method or have the __init__ method set up callbacks for various events depending on the program’s functionality).

A module itself would be a single python file inside the modules/ directory, and would look something like this:

class Module(object):

    def __init__(self, config):
        # config is a dictionary containing the relevant section of the
        # configuration file
        self.foo = config.get('foo', 'some_default')
        # more initialization code here

    # ... other module code goes here

You also need to make sure your modules directory has an __init__.py file (it can be blank) so that python knows the directory contains modules.

This pattern can be expanded on in various ways depending on the needs of your program, but for anywhere where you have pluggable functionality, it works well.

Time Management for Systems Administrators and todo.txt

11 Jun 2013

I’ve been reading Tom Limoncelli’s Time Management for Systems Administrators once again, and am trying out (once again, for real this time) the time management system described in the book, called “the cycle”. I also use the todo.txt system as one of my (several) task list tools, and decided to try to make it work with the cycle.

The advantages of the todo.txt system are its simplicity and the ease of adding new items (with an alias, it’s simply a matter of typing t add Do something in any terminal window). The built in tools did need some tweaks to get things working correctly however.

One very nice feature of the cycle is that you have a different todo list for each day. Do you have something that needs doing Wednesday next week? Add it to that day’s todo list and you don’t have to think about it again until Wednesday comes around, at which point it will be on your todo list ready for you to look at. Items that you don’t get done today, either because you have decided you’re not doing them today or because you run out of time at the end of the day, get bumped to the next day’s todo list and are marked as ‘managed’ on today’s todo list. This system makes it very easy to see what you have left to do today, and lets you immediately get items that aren’t going to be done today out of the way, to somewhere you’ll come across them at a more appropriate time.

Out of the box, the todo.txt cli script is primarily made to work with a single file, namely todo.txt, plus another file to archive completed tasks so they don’t clutter up the main one. Because of this, I thought I’d have to write another tool to effectively deal with daily todo lists. However, the config file is actually sourced as a shell script, and so it was a simple matter to switch to using a todo.txt file based on today’s date:

export TODO_FILE="$TODO_DIR/$(date +%Y-%m-%d).txt"
export DONE_FILE="$TODO_DIR/$(date +%Y-%m-%d)-done.txt"

So far, so good. I run todo.sh and it instantly picks up today’s todo list.

$ t add Throw things at Brian once I get in the office
1 Throw things at Brian once I get in the office
2012-05-25: 1 added.

The book recommends 3 priorities for tasks: A - needs to be done today, B - needs to be done soon, C - everything else. Todo.sh also uses alphabetical priorities, and they’re simple to add/change:

$ t pri 1 A
1 (A) Throw things at Brian once I get in the office
TODO: 1 prioritized (A).

Todo.sh even has built in support for looking at other dated files:

$ t listfile 2013-05-24.txt # you could use 't lf' here
1 Revoke VPN keys for bob
2 Purchase new servers

The missing piece was the ability to bump tasks, both individually and with a single command to bump any incomplete tasks left at the end of the day. This functionality is implemented as an addon to todo.sh, which is available on github.
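As a rough illustration of the idea (this is a simplified sketch, not the actual addon code - the function name and marker format are my own), a bump action only needs to append the task to the next day's file and mark it as handled in today's:

```shell
# Simplified sketch of a "bump" action: move task number $1 from today's
# list to tomorrow's, leaving a marker in today's file. Assumes TODO_DIR
# is set as in the config above. GNU date/sed syntax.
bump() {
    local n=$1
    local src="$TODO_DIR/$(date +%Y-%m-%d).txt"
    local dst="$TODO_DIR/$(date -d tomorrow +%Y-%m-%d).txt"
    local task
    task=$(sed -n "${n}p" "$src")
    [ -n "$task" ] || return 1             # no such task number
    echo "$task" >> "$dst"                 # add to tomorrow's list
    sed -i "${n}s/.*/x bumped: &/" "$src"  # mark as handled today
}
```

The real addon does more bookkeeping, but the core of it is just moving lines between dated files.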

The addon provides commands to bump individual tasks to the next day’s list, plus a single command to bump everything left over at the end of the day.

I also have a few other helpers in my setup, which are included as separate addon commands in the git repository.

One of the biggest downsides of this system is that it’s tied to the computer. Tom mentions in the book that your system should always be with you to be able to write something down as someone tells it to you in the hallway. While I agree that you should have something with you at all times, it doesn’t have to be the entire todo list system. Borrowing from GTD, I just keep a small notebook or stack of index cards to capture new items on, and review it regularly, adding items to the computerized list as I get back to my desk. The cycle already has a step where you collect new items from phone calls, emails and so on. And having a small notebook or stack of index cards and a pen handy ensures that I don’t lose track of anything… as long as I write it down.

I’ve been using this system for a few weeks now, and it’s working pretty well for me so far. If you’re interested, download the todo.txt cli scripts, and the tmsa todo.txt addon and give it a try!

Bash prompt snippets for git, rvm, virtualenv, ssh

10 Jun 2013

The following are a couple of snippets I have in my bash prompt to identify the various environments I’m in - rvm, git and so on. While you can usually find some way of getting this information into your prompt on the sites of the individual programs, it’s nice to have it all together in one place. I’ve also made an effort to avoid slow versions of various commands, ideally just parsing environment variables where possible. Each section of the prompt has the relevant technology prefixed (e.g. git:branchname or rvm:1.9.3@gemset)

Note: My real prompt has several things not mentioned here. For example, I’ve stripped out all colors added to the prompt. The aim of this is to give you a guide to how to quickly get prompt information for git, rvm, virtualenv and so on.

prompt_command() {
    # Runs every time a prompt is displayed
    RETVAL=$?

    # Command status - display as red if non-zero
    local BRIGHT=$(tput bold)
    local RED=$(tput setaf 1)
    local RESET=$(tput sgr0)
    if [[ $RETVAL != 0 ]]; then
        RETPROMPT="$BRIGHT$RED$RETVAL$RESET"
    else
        RETPROMPT=$RETVAL
    fi

    # Git
    if declare -f __git_ps1 >/dev/null; then
        # Only run if we have the git prompt stuff loaded
        GITPROMPT=`__git_ps1 "%s"`
    fi
    [[ $GITPROMPT == "(unknown)" ]] && GITPROMPT=

    # Virtualenv wrapper
    # Note: add 'PS1=$_OLD_VIRTUAL_PS1' to ~/.virtualenvs/postactivate to
    # stop virtualenvwrapper's normal prompt behavior
    VENVPROMPT=${VIRTUAL_ENV##*/}

    # RVM
    local GEMSET=${GEM_HOME#*@}
    if [[ -n $GEMSET && $GEMSET != $GEM_HOME ]]; then
        RVMPROMPT=${GEM_HOME##*/}
    else
        RVMPROMPT=
    fi
}

setprompt() {
    # Customize these as appropriate

    # SSH - shows you what IP you're connecting from
    local SSHPROMPT=${SSH_CLIENT%% *}

    # Set the prompt variable here, customize as appropriate. The
    # important parts are the GITPROMPT, VENVPROMPT and so on. Note that
    # anything set via PROMPT_COMMAND needs to have a backslash before it
    # in order for it to be evaluated every time
    PS1="\$RETPROMPT [$SSHPROMPT] git:\$GITPROMPT venv:\$VENVPROMPT rvm:\$RVMPROMPT \w\$ "
}

PROMPT_COMMAND=prompt_command
setprompt
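The parameter expansions doing the heavy lifting here can be checked in isolation (the example values below are made up for illustration):

```shell
# ${VAR#pattern} strips the shortest match of pattern from the front;
# ${VAR%%pattern} strips the longest match of pattern from the back.
GEM_HOME=/home/mark/.rvm/gems/ruby-1.9.3-p429@mygemset
GEMSET=${GEM_HOME#*@}        # everything after the first @
SSH_CLIENT="192.168.1.5 51234 22"
SSHPROMPT=${SSH_CLIENT%% *}  # everything before the first space
echo "$GEMSET $SSHPROMPT"    # prints: mygemset 192.168.1.5
```

These expansions are built into bash, so they add no measurable time to the prompt, unlike spawning external commands.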


Vimcram - making testing vim scripts suck less

14 Apr 2012

A while ago I was reading a blog post with tips on writing vim plugins. There’s a lot of good information there, and if you find yourself writing any vim scripts or plugins, it’s well worth a read. I was surprised by one point though: the section on testing. My vimtodo plugin has a large number of regression tests, and they’re my safety net to make sure that I’ve not horribly broken something with my latest change. I don’t subscribe to the test driven development philosophy of writing a test and coding only until it passes, but I do find it useful to have a few tests to guard against breaking something.

So when I read that testing sucks so much in Vim that you should avoid it, I was a little surprised, and certainly disagreed with the sentiment. My tests work fine, and I don’t recall it being particularly hard to implement them. So I took a look at the shell script testing tool mentioned in the blog post, cram, to see what was so special about it. I looked back at my tests in vimtodo, then at the example on the cram page, and back at my tests. The cram example showed a test that basically looked like a transcript of a shell session, and definitely not like the mess of code that comprised my tests. I thought that it couldn’t be too hard to implement something like that for vim, and the guy who wrote the blog post offered to buy a nice bottle of scotch for anyone who did so. One can never drink too much scotch (this may or may not be true), and so I got cracking.

The result is vimcram, which is now up on github. There are still a few features I’d like to add, but it’s pretty usable, and I’m currently converting all of my tests on vimtodo over to using it.

Tests in vimcram look like this:

Test substituting text:

    > Add some text
    > Add some more text
    :%s/some //
    Add text
    Add more text

Test normal mode commands

    more text

Which, while not exactly a transcript of a vim session (it’s kind of hard to do with a visual editor and multiple modes), is pretty straightforward.

If you have a vim plugin and don’t have any tests for it, give vimcram a try. It might make test writing easy enough that you actually write them!

This site is now hosted on github with jekyll

27 Jan 2012

Ever since I saw Jekyll, I liked the idea of having a statically generated site for something as simple as a blog where you really don’t need dynamic content (except for comments where, as you can see below, I cheated and used intense debate). But I hadn’t touched anything in ruby before (at the time it probably wasn’t even installed on my web server), was suffering from not-invented-here syndrome, and decided to write my own version in Python. It was clunky, but it worked, for the most part. Two years later however, it’s seen no love, was in dire need of some maintenance/improvement, and I finally realized that pretty much everything I wanted to do was already done in Jekyll.

The site was simple to convert (the original program being an attempted clone), and it now means that I can punt on hosting and let github deal with things. And if github turns out not to be a good choice, then it’s simple to host anywhere. It is after all, a static site.

Backups with bup

27 Jan 2012

I’m thinking about backups once more, and thought I would take a look at bup. Bup’s claim to fame (and the reason I first heard about it) is that it’s a git based backup system, or rather it uses the git repository format, with its own tools to make it deal with large files effectively. The more I looked at it, the more I realized that bup being git-based isn’t the main feature. Bup has a rolling checksum algorithm similar to rsync and splits up files so that only changes are backed up, even in the case of large files. This also has a nice side effect: you get deduplication of data for free. This includes space efficient backups of VM images, and files across multiple computers (the OS files are almost identical). I have two laptops with the same data (git repositories, photos, other work) on both of them, and multiple VM images used for testing, so the ability to have block level deduplication in backups sounded ideal.

Bup can also generate par2 files for error recovery, and has commands to verify/recover corrupted backups. This is a useful feature given that bup goes to great lengths to ensure that each piece of data is only stored once.

My old backups were with rsnapshot, and as it happened, bup has a conversion tool for this, so the first step was to move them over to using bup. The command to do this is bup import-rsnapshot, but this didn’t quite work for me and gave an error when running bup save. Thankfully there is a dry-run option which prints out the commands that bup uses, and because rsnapshot backups are direct copies of the files, what bup does is basically back up the backup. So I ended up running:

export BUP_DIR=/bup
/usr/bin/bup index -ux -f bupindex.rigel.tmp manual.0/rigel/
/usr/bin/bup save --strip --date=1314714851 -f bupindex.rigel.tmp \
    -n rigel manual.0/rigel/

The two bup commands were taken directly from the output of the import-rsnapshot command, and I ran them once for each backup I had.

Next was to take the initial backup from my laptop. This was actually a different laptop from the one I took the rsnapshot backups with, but I’d copied over a lot of the data and wanted to see how well the dedup feature worked. As can be seen with the rsnapshot import, taking a backup is actually two steps, bup index followed by bup save. The index command generates a list of files to back up, while the save command actually does it. The documentation gives a couple of reasons for splitting this into two steps, mainly that it allows you to use a different method (such as inotify) to generate and update the index, and it also allows you to only generate the list of files once if you are backing up to multiple locations. This separation of duties appeals to the tinkerer in me, but it would still have been nice to have a shortcut ‘just back it up’ command, similar to how git pull is a combination of git fetch and git merge.
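Rolling your own shortcut is trivial though; here is a sketch of a 'just back it up' function (the function name and the DRYRUN switch are my own inventions - DRYRUN lets you print the commands instead of running them):

```shell
# Hypothetical one-step backup wrapper: index then save, since bup itself
# provides no combined command. Set DRYRUN=1 to echo the commands instead
# of executing them.
bup_backup() {
    local name=${1:-$(hostname -s)}
    local dir=${BUP_DIR:-/bup}
    local run=
    [ -n "$DRYRUN" ] && run=echo
    $run bup index -ux --exclude="$dir" / &&
    $run bup save -n "$name" /
}
```

With that in your shell profile, `bup_backup` takes a full backup named after the machine, matching the two-command sequence below.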

The commands to take a backup are:

export BUP_DIR=/bup
bup index -ux --exclude=/bup /
bup save -n procyon /

First, I set the backup directory to /bup. What I’m doing here is backing up locally (and copying to an external hard drive later), but you can also pass the -r option to back up directly to a remote server via ssh.

I pass the -x option to bup index to limit it to one filesystem, and the --exclude option to exclude the backup directory itself from the backup.

Next, the bup save command actually performs that backup. I passed in the hostname of my laptop (procyon) as the name of the backup set. Multiple backups can have the same name, and they show up as multiple git commits, so a hostname is a good choice for the name of the backup set.

As I mentioned above, bup can make use of par2 to generate parity files. This is a separate step, and is done using the bup fsck command:

bup fsck -g -j4

The -g option generates the par2 files, and the -j4 option means run up to 4 par2 jobs at the same time. Generating parity files is CPU intensive, so I set it to twice the number of CPUs in my system - I have hyperthreading turned on, and it saturated all 4 ‘virtual’ CPUs. Once this was done, I ended up with several .par2 files in the /bup/objects/pack directory (this is a git repository, and all data is stored in the objects/ dir).

And the results? Bup used 30GB for the 2 original backups from rsnapshot (rsnapshot used 26GB and 37GB for the first and second backups, even with its hard-linking of identical files). Then, when I backed up my 2nd laptop (with approx 40GB used at the time) the size of the bup backup increased by only 4GB. This backup included a 5GB ubuntu VM image that didn’t exist in the previous snapshots, so bup must have been able to dedup the data shared between the image and the live OS.

All of this sounds amazing, but of course there are a few downsides, all of which are spelled out pretty plainly in the bup README.

That said, if you can live with the above limitations, and want incredible space savings for your backups (especially across multiple computers), then I would suggest giving bup a try.

From Solaris to FreeBSD

31 Mar 2010

Less than one week after I switched my hosting over to Solaris 10, with all its ZFS/dtrace goodness, Oracle quietly made a license change that everybody dealing with Solaris is likely now familiar with, and Solaris 10 is no longer free to use. Emails to Sun/Oracle’s licensing department resulted only in form letters repeating instructions on the website, and then nothing.

I can’t really blame Oracle for this; Sun didn’t make enough money to survive, and Oracle has this radical idea that you need to actually charge people in order to make money. I can blame them for not providing more clarity regarding the issue (so far they haven’t announced anything), and for leaving customers unsure about what’s going to happen next. However, this is mostly beside the point. I now needed to look into a good alternative.

OpenSolaris is the obvious candidate, and I’ve played around with it a little previously, but I can’t make myself like some of the changes made to it. The biggest annoyances relate to the new packaging system and some of the poor choices made in its design (e.g. no --nodeps option). That is an entire post (or rather, rant) in itself however. In addition, I can’t help but believe that Oracle is going to make some change to OpenSolaris that makes it not a realistic option.

This is where FreeBSD comes in. With release 8.0, ZFS has become a fully supported filesystem. It has dtrace support, jails (just like zones), even virtual networking so you can have a full network stack inside the jail.

For my personal server, the main feature I was interested in was ZFS, specifically ZFS root/boot. With ZFS it is trivial to set up mirrored drives, and I wanted to avoid doing software raid with UFS as well as ZFS. Thankfully there is extensive documentation on how to do this. It isn’t in the standard install, but if you need a repeatable procedure for many servers, it’s a (relatively) simple matter to script the installation, and you would probably want to do this anyway for an automated install.

There were a few gotchas, as with any new system you’re not familiar with, but so far it looks quite nice. I’ll be looking further into jails (especially the vimage jails) and other nice features. Hopefully, FreeBSD will turn out to be a good replacement for Solaris.

Gitosis - manage git repositories sanely

05 Mar 2010

I’ve finally made all my projects available publicly via git, thanks to gitosis. Before that, I had kind of just thrown everything in a git directory under my home directory and accessed it over ssh, which worked fine for private repositories, but fell flat whenever I wanted to make something available to somebody else.

Gitosis promised to make it easy to add new repositories and set up access for new people as needed, and once everything is set up, it is really easy - everything is contained in a config file inside a git repository, so you can make changes locally and push. You also have the benefit that your changes themselves are under version control. However, there were a few hiccups along the way, so I’m going to describe what I did in case others try and hit the same problems I did.

Gitosis uses python and setuptools, which I already had available. I’m running Ubuntu, so installing any requirements is as simple as running:

aptitude install python python-setuptools

Of course, git itself is a requirement. For now we’ll use the Ubuntu package, but it’s a good idea to build from source if you want the latest version:

aptitude install git-core

Next, get the gitosis source:

git clone git://

and install:

cd gitosis
sudo python ./setup.py install

So far, everything is pretty straightforward. Next we need to add a user that everyone will connect as in order to access repositories. The main method gitosis uses for accessing repositories is to have a single user that everyone connects to over ssh. Logins are only allowed via ssh keys, and anyone who connects is restricted to running gitosis commands, preventing them from accessing anything they shouldn’t.

sudo adduser --system --shell /bin/sh --gecos 'git user' \
    --group --disabled-password --home /srv/git git

Here I’ve set the home directory to /srv/git. This directory will hold all repositories and gitosis files. Next we need to initialize this directory with all of the gitosis configuration files:

sudo -H -u git gitosis-init < id_rsa.pub

(the -H option to sudo sets the HOME variable to that of the user you are running commands as - in this case, /srv/git).

The id_rsa.pub file should be your ssh public key for the computer you are working on now. You will use this key to access the administration repository and any other repositories you create later. If you don’t have an ssh key set up already, make one now and copy the resulting ~/.ssh/id_rsa.pub file to the server before running the above command:

ssh-keygen -t rsa -b 4096

For more information on ssh-keys, see the ssh-keygen man page.

Note: by default, gitosis takes the comment field of your ssh key to be your username. In my case, it was mark@laptop, and I would have had to use mark@laptop as my username whenever editing permissions. If you want something nicer, edit the copy of your public key before running the gitosis-init command and change the comment field to something a little nicer.

Now you have the basic server set up. To edit the configuration, clone the administration repository:

git clone git@yourserver:gitosis-admin.git

Then you can edit the gitosis.conf file, commit it, and push back to the server.

At this point I hit my first snag. Any changes pushed back to the server didn’t take effect. The magic updating of settings wasn’t working. After some hunting around (read: typing stuff into Google and clicking frantically), I found that all of the magic is done via a hook on the gitosis-admin repository. For some reason, the hook script wasn’t executable, and so never ran. Before committing any configuration changes, make sure to fix the permissions on the repository hook:

sudo chmod +x /srv/git/repositories/gitosis-admin.git/hooks/post-update

The basic gitosis setup at this point is complete. Aside from adding repositories and new users, the other steps are optional. However, we are talking about making repositories publicly accessible, and the other two steps - setting up git:// access via git-daemon and setting up gitweb will do this.

First though, here’s a quick overview on adding users/repositories.

To add a new user, get a copy of their ssh public key (ssh keys are what make the whole thing work), and copy it into the keydir directory of your gitosis-admin checkout. Name the file username.pub, replacing username with the name of the user you wish to add - this is the username you will use when setting permissions. For example, if you add joe.pub, then you will use joe as the username in the configuration below.

To add a repository, you just give somebody permission to access it and then push. This involves editing the gitosis.conf and adding a few lines:

[group foo]
writable = myrepository
members = joe

This allows user joe to write to myrepository.git. You then add this as a remote in your local repository and push to create the repository on the server:

cd path/to/my-repository
git remote add origin
git push

This assumes you actually have something to push. In practice this isn’t an issue - you start with a blank local repository (using git init), commit your first changes, and push. The first person to push actually creates the repository.
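To make the first push concrete, here is a sketch of the whole sequence. A local bare repository stands in for the gitosis server so the steps can be tried anywhere; with gitosis the remote URL would instead point at your server, and all paths and names below are made up for the example.

```shell
# Stand-in "server": a bare repository. With gitosis, the first push
# creates this on the server side for you.
git init --bare /tmp/gitosis-demo-server/myrepository.git

# A fresh local repository with a first commit.
mkdir -p /tmp/gitosis-demo/myrepository
cd /tmp/gitosis-demo/myrepository
git init
echo "hello" > README
git add README
git -c user.name=Example -c user.email=example@example.com commit -m "Initial commit"

# Point origin at the "server" and push; the first push populates it.
git remote add origin /tmp/gitosis-demo-server/myrepository.git
git push origin HEAD
```

With a real gitosis setup, the only differences are the remote URL and that gitosis checks your ssh key against gitosis.conf before accepting the push.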

Setting up git:// access

This part allows people to clone a repository without needing to authenticate, and without having to generate ssh keys. You can’t push to repositories in this way however - you have to use ssh if you want to push back to a repository. Chances are, you don’t want all repositories to be public, and gitosis allows you to pick and choose which you make public and which you make private using (wait for it…) the gitosis.conf file.

Setting up git:// access is as simple as running the git-daemon command:

sudo -u git git-daemon --base-path=/srv/git/repositories/

If you’re running Ubuntu however, gitosis comes with a nice script that you can just drop into /etc/event.d, edit to change the path, and it will start git-daemon automatically on boot:

sudo cp gitosis/etc-event.d-local-git-daemon /etc/event.d/local-git-daemon
sudo sed -i s+/srv/ /etc/event.d/local-git-daemon
sudo initctl start local-git-daemon

The initctl script starts the daemon without rebooting, which is usually a good thing.

By default, no repositories are made public. To make them public, you need to add a daemon = yes option to your gitosis.conf:

[repo myrepository]
daemon = yes

Here we have made a new repo section for myrepository. Save the gitosis.conf file, commit, push, and you should be able to clone myrepository over the git:// protocol.

Gitweb - making everything look pretty

The final step is getting gitweb working. For this you need a copy of gitweb.cgi and associated files. I built git from source, and gitweb.cgi was built as part of this, but if you didn’t do this, there is an Ubuntu package available called gitweb. I also use lighttpd on my server, with pages stored under /srv/www/ so I’ll be describing a configuration for that server and layout.

First, copy gitweb.cgi, gitweb.css, and all of the images to /srv/www/ I put the css files and images inside a pages subdirectory (the document root), and put gitweb.cgi inside a separate cgi-bin directory outside of the document root.

Next, configure lighttpd. I have simple-vhost set up which sets the document root based on the domain name requested, so we only need to do special set up for the git/cgi parts:

$HTTP["host"] =~ "^git\.your-server\.example\.com$" {
    url.redirect = (
        "^/$" => "/gitweb/",
        "^/gitweb$" => "/gitweb/"
    )
    alias.url = (
        "/gitweb/" => "/srv/www/"
    )
    setenv.add-environment = (
        "GITWEB_CONFIG" => "/srv/www/"
    )
    $HTTP["url"] =~ "^/gitweb/" { cgi.assign = ("" => "") }
}

Gitosis does provide a config file for lighttpd, but it wasn’t appropriate for my setup. Note that the above needs the following modules loaded: mod_alias, mod_cgi, mod_redirect, mod_setenv.

Gitweb.cgi needs a slight edit for the above configuration: by default it looks for the css file in the same location as the gitweb.cgi file (i.e. in a /gitweb/ dir), but here it is stored at the root of the site. Open up gitweb.cgi, search for gitweb.css, add a slash before the filename, and save the file.

Next is creating a gitweb.conf file. Again, gitosis helps out here, providing a gitweb.conf file that just needs some tweaking with the right paths. Copy the gitweb.conf file from the gitosis source distribution to /srv/www/, and open it up for editing.

Edit the $projects_list, $projectroot, and the @git_base_url_list lines and save:

$projects_list = '/srv/git/gitosis/projects.list';
$projectroot = "/srv/git/repositories";
@git_base_url_list = ('git://');

By default, gitosis creates repositories that are only accessible by the git user and users in the git group, so we need to give the web server permissions to access repositories. If this isn’t done, gitweb will say that there are no repositories available even when you configure web access in gitosis for the repository.

sudo usermod -a -G git www-data

You will need to restart the web server after this for the group change to take effect. If your web server runs as someone other than www-data, change the above command appropriately.

Finally, to give access to a repository via gitweb, the process is similar to setting up git:// access. Edit gitosis.conf, and add a gitweb = yes line next to the daemon = yes line for the repository. Commit, push, and the repository should now show up in gitweb. In the default configuration, you need to have both daemon = yes and gitweb = yes for a repository to be made available via gitweb. See the gitweb.conf file if you want to change this.
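Putting the gitosis.conf pieces together, a minimal configuration covering private ssh access plus public git:// and gitweb access might look like this (the group, member, and repository names are examples; description sets the text shown in gitweb):

```ini
[gitosis]

[group gitosis-admin]
writable = gitosis-admin
members = mark@laptop

[group foo]
writable = myrepository
members = joe

[repo myrepository]
daemon = yes
gitweb = yes
description = An example repository
```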

Bash function renaming and overriding

20 Sep 2009

One annoyance I found when writing bash scripts is the lack of function references. This became apparent when I wanted to override a function, but only to change its behavior slightly. I had a library of functions, and wanted to add some commands before the start of one function, and some cleanup code immediately after it finished.

This being a library function that was called elsewhere, I couldn’t edit the function in the library itself. Nor could I edit the calling code and add the steps before and after - the calling code was itself another library function. This left the option of copying and pasting the entire function, and adding my extra code to the beginning and end.

In python (and many other languages), I would have done something like the following:

old_foo = foo

def foo():
    # ... extra steps before ...
    old_foo()
    # ... cleanup steps after ...

but bash doesn’t seem to support function references in that manner. After much searching however, I finally found a way to save a function under a new name, which gives the same kind of functionality using bash’s declare builtin.

The declare command prints out the values of declared variables, and more importantly, declared functions - declare -f foo will print out the code for function foo. So all you need to do is execute the output of the declare -f command, after substituting the name of the function. The following bash function does just this:

save_function() {
    local ORIG_FUNC=$(declare -f "$1")
    local NEWNAME_FUNC="$2${ORIG_FUNC#$1}"
    eval "$NEWNAME_FUNC"
}
Add that to your scripts, and you have a simple way to copy/rename a function, and a simple way to add a step before/after an existing function. To copy the python example above:

save_function foo old_foo
foo() {
    # ... extra steps before ...
    old_foo
    # ... cleanup steps after ...
}

Now any code calling foo in the script will get the new behavior.
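Putting the pieces together, here is a self-contained sketch; foo here is a hypothetical stand-in for the real library function:

```shell
# A stand-in for the library function we can't edit.
foo() {
    echo "doing the real work"
}

# Copy a function to a new name via declare -f, as described above.
save_function() {
    local ORIG_FUNC=$(declare -f "$1")
    local NEWNAME_FUNC="$2${ORIG_FUNC#$1}"
    eval "$NEWNAME_FUNC"
}

# Save the original under a new name, then redefine foo to wrap it.
save_function foo old_foo
foo() {
    echo "before"
    old_foo
    echo "after"
}

foo   # prints: before / doing the real work / after
```

This works because bash resolves the old_foo call each time foo runs, so the wrapped copy only needs to exist before the first call, not before foo is redefined.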

Bash quoting and whitespace

18 Apr 2009

A common thing when writing shell scripts is to allow the user to specify options to commands in a variable. Something like the following:

$ OPTS="--some-option --some-other-option"
$ my_command $OPTS

We can set my_command to the following script to see exactly what gets passed:

for t; do
    echo "'$t'"
done

Running the above prints the following output:

'--some-option'
'--some-other-option'

This works fine, until you want to include options with whitespace in them:

$ OPTS="--libs='-L/usr/lib -L/usr/local/lib'"
$ my_command $OPTS
'--libs='-L/usr/lib'
'-L/usr/local/lib''

This output clearly isn’t what we want. We want a single parameter passed with the entire content of $OPTS. The culprit here is Word Splitting. Bash will split the value of $OPTS into individual parameters based on whitespace. One way to get around this is to put $OPTS in double quotes:

$ OPTS="--libs='-L/usr/lib -L/usr/local/lib'"
$ my_command "$OPTS"
'--libs='-L/usr/lib -L/usr/local/lib''

$ OPTS="--libs=-L/usr/lib -L/usr/local/lib"
$ my_command "$OPTS"
'--libs=-L/usr/lib -L/usr/local/lib'

Putting $OPTS in double quotes suppresses word splitting. In the first example the embedded single quotes are passed literally to the command, which isn’t what we wanted; dropping them, as in the second example, gives exactly the parameter we intended. So far, so good. The problem, as you may have spotted, comes when we want to pass more than one parameter in $OPTS:

$ OPTS="--cflags=O3 --libs=-L/usr/lib -L/usr/local/lib"
$ my_command "$OPTS"
'--cflags=O3 --libs=-L/usr/lib -L/usr/local/lib'

Here, the entire $OPTS variable gets passed as a single parameter, which isn’t what we want. We want --cflags to be passed as one parameter, and --libs (and everything that comes with it) to be passed as another parameter. Adding more quotes, backslash escaped or not, does nothing to help.

The solution? Use bash arrays:

$ OPTS=("--cflags=O3" "--libs=-L/usr/lib -L/usr/local/lib")
$ my_command "${OPTS[@]}"
'--cflags=O3'
'--libs=-L/usr/lib -L/usr/local/lib'

Perfect. But what about backward compatibility? If you have hundreds of scripts that use a string for $OPTS, how does it work if you change to using arrays? Let’s try it out:

$ OPTS="--some-option --some-other-option"
$ my_command "${OPTS[@]}"
'--some-option --some-other-option'

So old scripts that only put a single option in $OPTS keep working, but scripts that pass multiple options will need to be changed to use arrays. Arrays, however, seem to be the best option for passing multiple arguments that contain whitespace.
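A hypothetical helper that just counts its arguments makes the difference between the three forms easy to see:

```shell
# Print how many arguments were received.
count_args() { echo "$#"; }

OPTS_STR="--cflags=O3 --libs=-L/usr/lib -L/usr/local/lib"
OPTS_ARR=("--cflags=O3" "--libs=-L/usr/lib -L/usr/local/lib")

count_args $OPTS_STR          # unquoted string: split on every space -> 3
count_args "$OPTS_STR"        # quoted string: one big argument -> 1
count_args "${OPTS_ARR[@]}"   # quoted array: one argument per element -> 2
```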