Size of mozilla-central compared

As part of my ongoing work I've been measuring the size and depth of mozilla-central to extrapolate future repository size for scaling purposes. Part of this was figuring out some details such as average file size, distribution of types of files, and on-disk working copy size versus repository size.

When I posted a graph comparing the size of the mozilla-central repository by Firefox version, my colleague gszorc was quick to point out that the 4k blocksize of the filesystem meant that the on-disk size of a working copy might not accurately reflect the true size of the repository. I considered this and compared the working copy size measured with blocksize=1 to the size measured with the typical 4k blocksize. This is the result.

Mozilla-central blocksize comparison


As you can see, the size measured with blocksize=1 is much smaller: about 72% of the size measured with a 4k blocksize. As of Firefox 5 the ratio was about 73%. It then followed a general downward trend to about 71% as of Firefox 38.

What this could mean is that 27-29% of files in the mozilla-central repository are below 4 kilobytes in size. Most likely what it means is that 27-29% of the space used in a working copy of mozilla-central is padding that rounds smaller files up to 4k in size, which roughly matches what I've found by calculating average file size in the repository.
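The padding estimate can be sketched roughly as follows. This is a minimal illustration, not the script used for the graph: the function names `padded_size` and `tree_sizes` are made up here, and it assumes every file occupies whole 4096-byte blocks and that the `.hg` directory should be skipped. Real filesystems may store very small files differently, so treat the result as an approximation.

```python
import os

def padded_size(size, block=4096):
    """Round a byte count up to the next multiple of `block`."""
    return -(-size // block) * block  # ceiling division, then scale back up

def tree_sizes(root, block=4096):
    """Return (apparent_bytes, block_rounded_bytes) for regular files under root.

    Assumption: Mercurial metadata in .hg is not part of the working copy
    measurement, so that directory is pruned from the walk.
    """
    apparent = 0
    allocated = 0
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames if d != ".hg"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # do not follow symlinks to their targets
            size = os.path.getsize(path)
            apparent += size
            allocated += padded_size(size, block)
    return apparent, allocated
```

Dividing the first value returned by `tree_sizes` by the second gives the kind of ratio discussed above, and one minus that ratio is the share of space spent on padding.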

Excluding some large binary files that are in the repository, the mean file size is 6306 bytes. This is skewed upward by some very large source code files:

  • 4.7M ./security/nss/lib/sqlite/sqlite3.c
  • 4.8M ./js/src/octane/mandreel.js
  • 5.3M ./db/sqlite3/src/sqlite3.c
  • 8.6M ./js/src/jit-test/lib/mandelbrot-results.js

However, if we look at the median file size we arrive at something much more plausible: 1173 bytes.
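To see why a handful of multi-megabyte files pulls the mean so far from the median, here is a small sketch. The sizes in the list are invented for illustration and are not measurements from mozilla-central; `summarize` is a hypothetical helper name.

```python
from statistics import mean, median

# Made-up file sizes in bytes: four small files and one very large one.
sizes = [800, 1000, 1200, 1500, 5_000_000]

def summarize(sizes):
    """Return (mean, median) of a list of file sizes in bytes."""
    return mean(sizes), median(sizes)

# The single 5 MB file drags the mean to about 1 MB,
# while the median stays at 1200 bytes.
avg, mid = summarize(sizes)
```

The median ignores how large the outliers are, which is why it is the more representative number for a tree dominated by small files.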

Here is the new working copy size in comparison with the source lines of code count from the original chart:

Working copy size (bs=1) vs SLOC

From this we can see a general upward trend in the amount of space used versus source line count. This can mean one of two things: more binary assets are being added compared to the amount of code added, or that more files below 4k in size are being added to the repository.
