Numbers rule the world

One thing that has become clear in my work as a Firefox coding steward is that we have no idea how healthy our community is. When people ask me how many unpaid contributors we have, I don’t have an answer I can give them. The corollary to this is that any attempts to change our contributor engagement processes will be complete shots in the dark, with us fumbling to discover what kind of effect our changes cause. David Eaves underlined this message in the community building workshop yesterday, so I decided to do something about it.

Measuring the size of a whole community is hard, so I’m not tackling that problem yet. Instead, I’ve focused on making first-time contributions more visible and on measuring the rate of incoming new contributors in a methodical way. I decided to extract the commit data of mozilla-central into a database that I can query in useful ways; in particular, I store a table of committers that includes a name, email address, number of commits, date of first commit, and date of last commit. You can see the raw results of my work at my GitHub repository, where I’m banging away at multiple tools that use this data in different ways.
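As a rough sketch of the kind of table described above (not the actual script from the repository), the snippet below aggregates commit records into a `committers` table in an in-memory SQLite database. The schema, the `build_committers_db` helper, and the sample records are all assumptions made for illustration; real input could come from something like `git log --date=short --pretty=format:'%an|%ae|%ad'`, and identifying a contributor by email address alone is a simplification.

```python
import sqlite3

# Hypothetical sample data: (author name, author email, ISO commit date).
SAMPLE_COMMITS = [
    ("Alice Example", "alice@example.org", "2012-09-03"),
    ("Bob Example", "bob@example.org", "2012-09-17"),
    ("Alice Example", "alice@example.org", "2012-11-20"),
]


def build_committers_db(commits):
    """Aggregate commit records into a committers table (assumed schema)."""
    stats = {}  # email -> [name, commit count, first date, last date]
    for name, email, date in commits:
        entry = stats.setdefault(email, [name, 0, date, date])
        entry[1] += 1
        # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
        entry[2] = min(entry[2], date)
        entry[3] = max(entry[3], date)

    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE committers ("
        "name TEXT, email TEXT PRIMARY KEY, commits INTEGER, "
        "first_commit TEXT, last_commit TEXT)"
    )
    db.executemany(
        "INSERT INTO committers VALUES (?, ?, ?, ?, ?)",
        [
            (name, email, count, first, last)
            for email, (name, count, first, last) in stats.items()
        ],
    )
    db.commit()
    return db


db = build_committers_db(SAMPLE_COMMITS)
for row in db.execute("SELECT * FROM committers ORDER BY first_commit"):
    print(row)
```

Storing dates as ISO strings keeps the sketch simple, since they can be compared and grouped by month without any date parsing.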

Tool the first – the Firefox coding community health report:

This tool graphs the number of first patches that are committed in any given month. The graph allows us to see how effective we are at shepherding new contributors through the process of shipping their first change to the code.
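The underlying calculation can be sketched in a few lines, assuming each contributor's first commit date is available as an ISO `YYYY-MM-DD` string. The `first_patches_per_month` function and the sample dates are hypothetical, not taken from the actual tool.

```python
from collections import Counter


def first_patches_per_month(first_commit_dates):
    """Count first-time contributions per calendar month.

    Takes ISO date strings and returns (YYYY-MM, count) pairs sorted
    by month. Months with no first patches are simply absent.
    """
    counts = Counter(date[:7] for date in first_commit_dates)
    return sorted(counts.items())


# Hypothetical first-commit dates for three contributors.
for month, count in first_patches_per_month(
    ["2012-09-03", "2012-11-02", "2012-09-17"]
):
    print(month, count)
```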

Tool the second – the first-time contributor awareness raiser:

This tool lists the contributors whose first patch was committed in a given period of time. With this, we can create reports that let us celebrate this accomplishment (blog posts, swag, announcements in meetings, submissions for about:credits, etc.). This takes the burden of these tasks off the engineers who can’t see beyond the patch, and places it in the hands of people who have a broader vision of building community effectively.
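A minimal sketch of that lookup, assuming committer records are available as `(name, email, first_commit)` tuples with ISO dates. The `new_contributors` function, the half-open date range, and the sample records are illustrative assumptions rather than the tool's real interface.

```python
def new_contributors(committers, start, end):
    """Return committers whose first commit falls in [start, end).

    Dates are ISO strings (YYYY-MM-DD); start is inclusive and end is
    exclusive. Results are ordered by first commit date.
    """
    found = [c for c in committers if start <= c[2] < end]
    return sorted(found, key=lambda c: c[2])


# Hypothetical committer records.
committers = [
    ("Alice Example", "alice@example.org", "2012-09-03"),
    ("Bob Example", "bob@example.org", "2012-11-02"),
    ("Carol Example", "carol@example.org", "2012-11-15"),
]

for name, email, first in new_contributors(committers, "2012-11-01", "2012-12-01"):
    print(first, name, email)
```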

I’m not finished yet! I’m also really interested in measuring how many contributors drop off in a given month – by combining this with the incoming contributor data, we can get rough data about churn in the community, and compare how many people are leaving vs. how many newcomers we have. There are many more interesting measurements to be taken here, and I’m excited to be digging into this data. Feel free to join me! The GitHub repo contains the script to create the database from the git mozilla-central repository, and you’re welcome to explore the data with me.
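One way the churn comparison might look, as a sketch under a strong assumption: a contributor counts as having "dropped off" in the month of their last recorded commit. That overstates departures for recent months, since active contributors also have a latest commit. The `monthly_churn` function and the sample data are hypothetical.

```python
from collections import Counter


def monthly_churn(committers):
    """Compare arrivals and departures per month.

    committers: iterable of (first_commit, last_commit) ISO date pairs.
    Returns (YYYY-MM, joined, left, net) tuples sorted by month, where
    "left" means the contributor's last recorded commit was that month
    (an assumption, not a confirmed departure).
    """
    records = list(committers)
    joined = Counter(first[:7] for first, _ in records)
    left = Counter(last[:7] for _, last in records)
    months = sorted(set(joined) | set(left))
    return [(m, joined[m], left[m], joined[m] - left[m]) for m in months]


# Hypothetical (first_commit, last_commit) pairs.
sample = [
    ("2012-09-03", "2012-11-20"),
    ("2012-09-17", "2012-09-17"),
    ("2012-11-02", "2012-11-30"),
]

for month, arrived, departed, net in monthly_churn(sample):
    print(month, arrived, departed, net)
```

A more careful version would probably only treat someone as gone after some period of inactivity, but choosing that threshold is a judgment call.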

5 Responses to “Numbers rule the world”

  1. You might want to do some filtering; on that second thing, I did a search starting at 2012-11-01 – and the first result was a @mozilla.com address. While that’s quite likely a new contributor and should be celebrated, I don’t think that’s a good measure of community health ;)

    Of course, this is a Hard Problem – that list also includes Bas, a gfx peer using a non-mozilla.com address; also, Reed who previously interned at Mozilla, using not-an-address :)

  2. Yes, I’m well aware of those issues. I also don’t know of a good way to filter out employees of partners like Telefonica, who don’t appear to be using common corporate emails. So it goes.

  3. [...] Josh Matthews posted a very interesting blog about building up data about Mozilla’s community. It caught my [...]

  4. I always liked the “where kernel X.X came from” contribution reports on LWN, e.g. http://lwn.net/Articles/517564/ and wonder every time I read one if stuff like that could be done for Mozilla code as well…

  5. [...] Numbers rule the world [...]
