Measuring the performance improvement of Mercurial (NFS vs local disk)
- The Mercurial developers were concerned about race conditions and concurrent reads and writes causing inconsistent service between hosts. This became evident when stale file handle errors started appearing in our Apache logs.
- An extension we wrote (pushlog) was also being served from NFS. This is a problem not because we have multiple hosts writing at once, but because the file is kept in memory for the lifetime of the hgweb-serving WSGI process, and we have seen requests to the pushlog occasionally return stale information.
- During times of peak activity there was non-trivial I/O wait, which caused clone times to increase.
- NetApp licenses aren’t cheap. 😉
This took a lot of effort and coordination with the release engineering team to ensure that downtime was kept to a minimum and that there were no feature or performance regressions along the way.
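To make the clone-time concern from the list above concrete, here is a minimal sketch of how one might time repeated clones of the same repository from an NFS-backed path and from a local-disk path. It is not the tooling we used. The helper names (`time_runs`, `hg_clone_cmd`, `summarize`) and the repository paths in the comment are hypothetical, and it assumes `hg` is on the `PATH`.

```python
import os
import statistics
import subprocess
import tempfile
import time


def time_runs(make_cmd, runs=3):
    """Run a command `runs` times, each with a fresh destination path,
    and return the wall-clock duration of each run in seconds."""
    durations = []
    for _ in range(runs):
        # A new scratch directory per run, so the destination never exists yet.
        with tempfile.TemporaryDirectory() as scratch:
            dest = os.path.join(scratch, "clone")
            start = time.perf_counter()
            subprocess.run(
                make_cmd(dest),
                check=True,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )
            durations.append(time.perf_counter() - start)
    return durations


def hg_clone_cmd(source):
    """Build a function that returns an `hg clone` command for a destination.
    --noupdate skips the working-copy checkout, so mostly repository data
    transfer is measured."""
    return lambda dest: ["hg", "clone", "--noupdate", source, dest]


def summarize(durations):
    """Reduce a list of durations to min / median / max."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


# Hypothetical usage; both paths are placeholders:
#   nfs = summarize(time_runs(hg_clone_cmd("/mnt/nfs/repos/example")))
#   local = summarize(time_runs(hg_clone_cmd("/srv/local/repos/example")))
```

Taking the median over several runs reduces the effect of a single slow run, though the filesystem cache can still flatter whichever side is measured second, so results from a sketch like this should be read with caution.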