Mercurial

Trends in Mozilla’s central codebase

As part of my recent duties I’ve been looking at trends in Mozilla’s monolithic source code repository, mozilla-central. Because we’re investigating growth patterns and scalability, I thought it would be useful to gather metrics on the size of the repository over time and on the ways in which it changes.

Note that the sizes are for all of mozilla-central, which contains Firefox and several other Mozilla products. I chose Firefox versions as sample points because they are useful historical markers. As of this posting (2015-02-06), version 36 is Beta, 37 is Aurora, and 38 is the tip of mozilla-central.

September ’14 Mercurial Code Sprint

Joining me at the sprint were two of my colleagues, Gregory Szorc (gps) and Mike Hommey (glandium). They took part in some of the serious discussions about core bugfixes and features that will help Mozilla scale its use of Mercurial. Impressively, glandium had been working on the project for only a few weeks, but was able to make serious contributions to the bundle2 format (an upcoming feature of Mercurial). Specifically, we talked to Mercurial developers about some of the difficulties and bugs we’ve encountered with Mozilla’s “try” repository due to the “tens of thousands of heads” and the events that cause a serving request to spin forever.

The ‘Try’ repository and its evolution

Recently (over the past few years, actually) we’ve found that Mercurial has problems scaling to the level of activity on the try repository. Here are some statistics, for example:

  • 24550 Mercurial heads (this is reset every few months)
  • Head count correlated with the degraded performance
  • 4.3 GB in size, 203509 files without a working copy
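A head count like the one above can be reproduced on any local clone. The following is a minimal sketch, assuming the `hg` executable is on the PATH; the helper names `parse_heads` and `count_heads` are hypothetical and not part of Mercurial. It asks `hg heads` to print one changeset hash per line and counts the lines, so it reflects open branch heads as reported by that command.

```python
import subprocess


def parse_heads(output):
    """Return the changeset hashes found in `output`, one per non-empty line."""
    return [line.strip() for line in output.splitlines() if line.strip()]


def count_heads(repo_path):
    """Count heads in the repository at `repo_path` (hypothetical helper).

    Runs `hg heads` with a template that prints only the node hash,
    then counts the resulting lines.
    """
    result = subprocess.run(
        ["hg", "--repository", repo_path, "heads", "--template", "{node}\n"],
        check=True,
        capture_output=True,
        text=True,
    )
    return len(parse_heads(result.stdout))
```

On a repository with tens of thousands of heads, the command itself may take a while, which is part of the scaling problem described here.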

One of the methods we’re attempting is to modify try so that each push is not a head, but is instead a bundle that can be applied cleanly to any [mozilla-central](https://hg.mozilla.org/mozilla-central) tree.

Mozilla’s “try” repository

We have quite a bit of infrastructure around this, including Tinderbox Pushlog (TBPL) and more. This post deals with the infrastructure and the problems we face while trying to scale the “try” repository.

A few statistics:

  • The try repository currently has 17943 heads. These heads are never removed.
  • The try repository is about 3.6 GB in size.
  • Due to Mercurial’s on-wire HTTP protocol, this number of heads causes HTTP cloning to fail.
  • There are roughly 81000 HTTP requests to try per day.
  • To fix problems (mentioned below), the try repository is deleted and re-cloned from mozilla-central every few months.

There are a number of problems associated with such a repository. One particularly nasty one has been present through several years of Mercurial development, and has been tricky in that it is seemingly unreproducible. The scenario is something like:

Measuring the performance improvement of Mercurial (NFS vs local disk)

  1. The Mercurial developers were concerned about race conditions and concurrent writes and reads causing service inconsistency between hosts. This became evident when stale file handles started appearing in our Apache logs.
  2. An extension we wrote (pushlog) was also being served off of NFS. This is a problem not because we have multiple hosts writing at once, but because the file is kept in memory for the lifetime of the hgweb-serving WSGI process, and we’ve found that requests to the pushlog are sometimes served stale information.
  3. During times of peak activity there was non-trivial IOWait, which caused clone times to increase.
  4. NetApp licenses aren’t cheap. 😉
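The staleness issue in item 2 comes from a long-lived process holding file contents in memory after the file has changed on disk. As an illustration only (this is not how pushlog or hgweb is implemented, and not the fix described in this post), a long-lived process can guard against it by checking the file’s modification time and size before reusing a cached value. The class name `ReloadingCache` and its `loader` parameter are hypothetical. Note that on NFS, client-side attribute caching can delay the visibility of a new modification time, so this check narrows the window rather than closing it.

```python
import os


class ReloadingCache:
    """Cache a value derived from a file; reload when the file changes."""

    def __init__(self, path, loader):
        self._path = path
        self._loader = loader  # callable that takes a path and returns a value
        self._stamp = None     # (mtime in ns, size) seen at the last load
        self._value = None

    def get(self):
        """Return the cached value, reloading if mtime or size changed."""
        st = os.stat(self._path)
        stamp = (st.st_mtime_ns, st.st_size)
        if stamp != self._stamp:
            self._value = self._loader(self._path)
            self._stamp = stamp
        return self._value
```

The trade-off is one extra `stat` call per request, which over NFS is itself a network round trip unless the attributes are cached.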

This took a lot of effort and coordination with the release engineering team to ensure that downtime was kept to a minimum and that there were no feature or performance regressions along the way.