10.21.14

Missoula visit, day 1

NOTE: This is a personal post, so if this sort of thing does not interest you, please feel free to disregard it.

Yesterday I drove from Seattle to Missoula to visit my mother and help her sort out her health issues. I left later than usual, but the drive is an hour shorter than the one to Portland, and I-90 is relatively straight and boring, so I made it to Missoula with energy to spare (which might also be why I stayed up far past my arrival time).

The purpose of the trip is to see my mother, show that I still care about my family, and help her sort out her medical issues (she’s currently going through her third round of cancer). When I arrived last night around 2300 I didn’t get a very good look at her. She had stayed up past her normal 2000 bedtime to await my arrival and greet me. Last night was relatively uneventful, aside from a restful sleep. She showed me to the apartment’s single bedroom. Every surface had been meticulously cleaned, although none of the multitude of tchotchkes or personal accessories had been organized or put away. While using the bathroom I noticed a small mop and bucket. Normally I wouldn’t think much of it, but my sister had told me that in previous weeks my mother had given herself a panic attack making sure the apartment was spotless for my arrival. Unfortunately my trip over was delayed by a few weeks, so I hope she hasn’t been in such a state the whole time.


10.9.14

September ’14 Mercurial Code Sprint

A week ago I was fortunate enough to attend the latest code sprint of the Mercurial project. This was my second sprint with the project, and I took away quite a bit from the meeting. Around 20 people attended, mostly working as one large group, with smaller groups splitting off intermittently to discuss particular topics. I recognized a few of the attendees from the previous sprint I attended.

Joining me at the sprint were two of my colleagues, Gregory Szorc (gps) and Mike Hommey (glandium). They took part in some of the serious discussions about core bugfixes and features that will help Mozilla scale its use of Mercurial. Impressively, glandium had been working on the project for mere weeks, but was already able to make serious contributions to the bundle2 format (an upcoming feature of Mercurial). Specifically, we talked to the Mercurial developers about some of the difficulties and bugs we’ve encountered with Mozilla’s “try” repository due to its tens of thousands of heads, and about the events that cause a serving request to spin forever.

By trade I’m a sysadmin/DevOps person, but I do have a coder hat that I don from time to time. Still, the sprint was full of serious coders, some of whom seemingly work on Mercurial full-time. There were attendees with big-name employers, some of whom would probably prefer that I didn’t reveal their identities here.

Unfortunately, due to my lack of familiarity with a lot of the deep internals, I was unable to contribute to some of the discussions. It was primarily a learning experience for me, both about the process by which direction-driving decisions are made for the project (mpm’s BDFL status) and about all of the considerations that go into choosing a particular way to implement an idea.

That’s not to say I was entirely useless. My knowledge of systems and package management meant I was able to collaborate with another developer (kiilerix) to improve the Docker package building support, including preliminary work for building (un)official Debian packages for the first time.

I also learned about some infrequently used features of and tips about Mercurial. For example, folks who come from a git background often complain about Mercurial’s lack of interactive rebase functionality. The “histedit” extension provides this feature. Like many other features of Mercurial, it is technically “in core” but not enabled by default; adding a line such as “histedit =” to the “[extensions]” section of your “hgrc” file enables it. It supports all the expected operations: picking, folding, dropping, and editing commits, as well as modifying commit messages.
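As a minimal sketch of what that looks like (the paths and the revision below are illustrative, not from an actual setup):

$ cat >> ~/.hgrc <<'EOF'
[extensions]
histedit =
EOF

$ hg histedit 1234abcd    # interactively edit history back to this (hypothetical) ancestor revision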

Changeset evolution is another feature that’s been coming for a long time. It enables developers to safely modify history and propagate those changes to any down/upstream clones. It’s still disabled by default, but is available as an extension. My colleague Gregory Szorc has written about it before; if you’re curious, you can read more about it here.

One of the features I’m most looking forward to is sparse checkouts. Imagine, a la Perforce, being able to check out only a subtree or subtrees of a repository using ‘--include subdir1/’ and ‘--exclude subdir2/’ arguments during cloning/updating. This is what sparse checkouts will allow. Additionally, functionality is being planned to enable saved ‘profiles’ of subdirectories for different uses. For instance, specifying the ‘--enable-profile mobile’ argument will pull in a saved list of included and excluded items. This seems like a really powerful way of building lightweight build profiles for each different type of build we do. Unfortunately, to be properly implemented it is waiting on some other code to be developed, such as sharded manifests.
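Purely as an illustration of that planned interface (none of these flags exist yet, and the final names may differ), usage might look something like this:

$ hg clone --include subdir1/ --exclude subdir2/ https://hg.mozilla.org/mozilla-central    # hypothetical sparse clone of only subdir1/
$ hg clone --enable-profile mobile https://hg.mozilla.org/mozilla-central                  # hypothetical clone using a saved include/exclude profile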

One last thing I’d like to tell you about is an upcoming free software project for Mercurial hosting named Kallithea. It was born from the liberated code of the RhodeCode project. It is still in its infancy (version 0.1 as of this writing), but it already has some attractive features for viewing repositories: visualizations of changelog graphs, diffs, code reviews, a built-in editor, LDAP support, and even a JSON-RPC API for issue tracker integration.

All in all, I feel attending was a valuable experience that benefited both the Mercurial project and me. I was able to lend some of my knowledge about building packages and my familiarity with operating large-scale hgweb serving, and I learned a lot about the internals of Mercurial, coming away with the understanding that even the deep core code of the project isn’t very scary.

I’m very thankful I was able to attend, and I look forward to next year’s sprint.

08.5.14

The ‘Try’ repository and its evolution

As the primary maintainers of the back end of the Try repository (in addition to the rest of the Mercurial infrastructure), we are responsible for its care and feeding: making sure it is available and that it remains a safe place to put your code before it is integrated into other trees.

Recently (the past few years, actually) we’ve found that Mercurial has problems scaling to this repository’s level of activity. Here are some statistics, for example:

  • 24550 Mercurial heads (this is reset every few months)
  • Head count is correlated with the degraded performance
  • 4.3 GB in size, 203509 files without a working copy

One of the methods we’re attempting is to modify try so that each push is not a head, but is instead a bundle that can be applied cleanly to any mozilla-central tree.

By default, when issuing an ‘hg clone’ from and to local disk, Mercurial will create a hardlinked clone:


$ hg clone --time --debug --noupdate mozilla-central/ mozilla-central2/
linked 164701 files
listing keys for "bookmarks"
time: real 3.030 secs (user 1.080+0.000 sys 1.470+0.000)

$ du --apparent-size -hsxc mozilla-central*
1.3G   mozilla-central
49M    mozilla-central2
1.3G   total

These lightweight clones are the perfect environment to apply try heads to, because they will all be based off existing revs on mozilla-central anyway. To do that we can apply a try head bundle on top:


$ cd mozilla-central2/
$ hg unbundle $HOME/fffe1fc3a4eea40b47b45480b5c683fea737b00f.bundle
adding changesets
adding manifests
adding file changes
added 14 changesets with 55 changes to 52 files (+1 heads)
(run 'hg heads .' to see heads, 'hg merge' to merge)
$ hg heads|grep fffe1fc
changeset:   213681:fffe1fc3a4ee

This is great. Imagine if, instead of ‘mozilla-central2’, this clone were named fffe1*, stored in portable bundle format, and used to create a repository whenever it was needed. There is one problem, though:


$ du -hsx --apparent-size mozilla-central*
1.3G	mozilla-central
536M	mozilla-central2

Our lightweight 49 MB copy has turned into 536 MB. This would be fine for just a few repositories, but we have tens of thousands. That means we’ll need to keep them in bundle format and turn them into repositories on demand. Thankfully this operation only takes about three seconds.
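Putting the two steps together, materializing a try push on demand would look roughly like the following sketch (the directory layout and bundle location are illustrative):

$ hg clone --noupdate mozilla-central/ fffe1fc3a4ee/    # cheap hardlinked clone (~3 seconds)
$ hg -R fffe1fc3a4ee/ unbundle bundles/fffe1fc3a4eea40b47b45480b5c683fea737b00f.bundle
$ hg -R fffe1fc3a4ee/ update fffe1fc3a4ee               # check out the try head itself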

I’ve written a little bash script to go through the backlog of 24,553 try heads and generate bundles for each of them. Here is the script and some stats for the bundles:


$ ls *bundle|wc -l
24553

$ du -hsx
39G    .

$ cat makebundles.sh
#!/bin/bash

# Generate a bundle for each head of the try repository, 16 at a time,
# skipping any head that has already been bundled.
/usr/bin/parallel --gnu --jobs 16 \
    "test ! -f {}.bundle && \
    /usr/bin/hg -R /repo/hg/mozilla/try bundle --rev {} {}.bundle" ::: \
    $(/usr/bin/hg -R /repo/hg/mozilla/try heads --template '{node} ')

If this tooling works well, I’d like to start using it as the future method of submitting requests to try. Additionally, if developers wanted, I could create a Mercurial extension to automate the bundling process and build a bundle submission engine for try.

06.20.14

Mozilla’s “try” repository

At Mozilla we use Mercurial for Firefox development. We have several repositories/trees that are used depending on where the code should be. If a developer wishes to test the code they have been developing, they can submit it to a Mercurial repository called ‘try‘, since running our entire test suite is not feasible on developer machines.

We have quite a bit of infrastructure around this, including Tinderbox Pushlog (TBPL) and more. This post deals with the infrastructure and the problems we face while trying to scale the ‘try’ repository.

A few statistics:

  • The try repository currently has 17943 heads. These heads are never removed.
  • The try repository is about 3.6 GB in size.
  • Due to Mercurial’s on-wire HTTP protocol, this number of heads causes HTTP cloning to fail.
  • There are roughly 81000 HTTP requests to try per day.
  • To fix problems (mentioned below), the try repository is deleted and re-cloned from mozilla-central every few months.

There are a number of problems associated with such a repository. One particularly nasty one has been present through several years of Mercurial development, and has been tricky in that it is seemingly unreproducible. The scenario is something like:

  • A user ‘hg push’es some changes as a new head onto try
  • The push process takes a long time (sometimes between 10 minutes and several hours)
  • The developer issues an interrupt signal (Ctrl+C), which causes the client to gracefully hang up and exit (this typically has no effect on the server)
  • Subsequent pushes will hang with something similar to: remote: waiting for lock on repository /repo/hg/mozilla/try/ held by ‘hgssh1.dmz.scl3.mozilla.com:23974’
  • When this happens, an hg process running on the server has the following characteristics (see the sketch after this list):
    • An ‘hg serve’ process runs single-threaded, using 100% CPU
    • strace-ing and ltrace-ing reveal that the process is not making any system calls or external library calls
    • perf reveals that the process is spending all of its time inside some ambiguous Python function
    • pdb shows that the process is spending all of its time in a function that (at some point in the stack trace) is doing ancestor calculations
    • The process will eventually exit cleanly
  • As operators, there is nothing we can do to alleviate the situation once the repository gets into this state. We simply inform developers and monitor the situation.
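For reference, here is a rough sketch of the kind of inspection we do on a stuck process; the PID is illustrative, and these are standard tools rather than anything Mercurial-specific:

$ pgrep -f 'hg serve'      # find the spinning process; assume it returns 23974
$ strace -p 23974          # shows no system calls being made
$ ltrace -p 23974          # shows no external library calls either
$ perf top -p 23974        # shows all time spent inside the Python interpreter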

There have been several ideas on ways to alleviate the problem:

  • Periodically reset ‘try’. This is considered bad because 1) it loses history, and 2) it is disruptive to developers, who might have to re-submit try jobs.
  • Reset try on the SSH servers, but keep the old try repositories on the HTTP servers. This risks unforeseen problems from growing those repositories even further on the HTTP servers. Resetting them as well (staggered from the SSH server resets) would reduce that risk, but history would still be lost.
  • Create bundle files out of pushes to ‘try’, then host these in an accessible location (S3, an HTTP webroot, etc.). I will detail this method in a future blog post.

As of now though, try will periodically need to be reset as a countermeasure to the hangs mentioned in this post. Getting a reproducible test case might allow us to track down a bug or inefficiency in Mercurial to fix this problem after all. If you’d like to help us with this, please ping fubar or me (bkero) on irc.mozilla.org.

03.24.14

Measuring the performance improvement of Mercurial (NFS vs local disk)

Earlier this year I was able to move mozilla.org’s Mercurial infrastructure from NFS-mounted storage to local disk. This was done for several reasons.

01.25.14

Solving connectivity problems

This post deals with the technical challenges we’ve encountered in establishing reliable communications while staying in rural Kenya. Some background information is necessary to understand the efforts we’ve gone through to remain connected.

This year the prestigious Hacker Beach event is taking place on the island of Lamu off the eastern coast of Kenya. The island is serviced by a single UMTS tower located above the hospital in the main town of Lamu City. However, our accommodation is on the other side of the island.

The situation

Our accommodation had a previously installed directional antenna on the roof to provide internet access. Unfortunately the access was very slow, with only 14% signal strength. This was complicated by strong winds blowing against the antenna, causing it to point in the wrong direction, which further reduced the cellular reception and sometimes made it disconnect completely.

 

The antenna solution. Photo by Sebastian Kippe (CC BY 2.0)

An additional problem was that WiFi was served by a single Cradlepoint MBR1000 router in a corner of the fort, making it inaccessible through the impenetrably thick fort walls. This meant we were limited to camping in the upstairs dining hall, which worked well enough due to all the seating, but there was some desire to branch out to work from other areas of the fort, such as the knights-of-the-round-table-esque meeting room.

Our conspiratorium

For a group of 18 hackers, this level of connectivity was unacceptable. Many of us were making excursions into town to work at cafes with better reception. This was a problem because it threatened to undermine the spontaneous, collaborative nature of Hacker Beach. The way we saw it, there were two problems to fix:

  • Reception of the antenna was abysmal. Was this an inherent problem with the location?
  • WiFi reception was limited to only one corner of the house. Ideally the house should have WiFi everywhere.

We attempted to fix this by purchasing local SIM cards and installing them in portable WiFi hotspot devices. Oddly enough, we were able to get some 3G reception if the devices were placed in rather random areas of the fort. Unfortunately the connectivity of these devices wasn’t reliable enough for full-time hacking, so we began efforts in earnest to fix the connectivity problem.

We determined that the most appropriate solution to the WiFi problem was to employ PowerLine Ethernet adapters throughout the fort to distribute connectivity. Simply repeating the wireless signal was not a good option because of the lack of strategic locations to place wireless repeaters; the thick walls meant that the signal would be stopped between floors as well. We took a gamble and assumed that most outlets would be on the same power phase (if circuits are on different phases, PowerLine throughput will be severely limited, or likely not work at all). Since we had some new hackers arriving in a few days, shipping was out of the question. Thankfully we were able to source some units in Athens, which (after some begging) our gracious friends were kind enough to pick up and bring for us.

The pairing part was easy, with the WiFi SSID and password being copied using WPS. After pairing, the devices could be moved anywhere in the fort to increase coverage. We installed two devices, which were able to blanket the whole fort with connectivity. Problem solved.

PowerLine Ethernet adapter

Next was a trickier bit that required more calibration and special equipment. While inspecting the old antenna we found that the connectors had been tortured by the elements for several years, so the antenna pigtail connectors were rusted, which was likely causing reception issues. Another problem was that the pigtail was run through a window, which was then closed on it. We feared this was crushing the cable, which could easily have rendered the antenna useless.

3G Modem and antenna cable run through window

There were several more hackers arriving from Nairobi in a few days, so we asked them to bring some antenna gear to hopefully help improve our connectivity. In total, a questionably-EDGE amplifier, a directional antenna, and some cabling were delivered when the hackers arrived early yesterday morning. It didn’t take long for us to tear it all open and start installing it.

Equipped with a laptop, an antenna, and a downstairs accomplice, we disconnected the old antenna and threw a new line down to connect to the 3G modem. Next, I opened the router’s modem status page to measure signal strength while another hacker determined the direction the antenna should face to get the best reception. Our best direction was pressed right up against the old antenna; the people who installed the last one must have known what they were doing.

Unfortunately we were armed with only my multitool, which meant that proper mounting was going to be impossible. We tried wrenching the existing nuts that held the antenna in place, but they proved to be well stuck after a decade of rust and generally brutal African elements. Not even cooking oil (our improvised WD-40) would help loosen the offending nuts. Ultimately we ended up doing a bodge job to keep the antenna in place: one of the hackers had brought string with him, which we used to tie the new antenna’s base plate to the old antenna. This worked surprisingly well, although it is a horrifyingly temporary solution; the string will not stand up to more than a few days outside here. Next we plan to source some tools locally and perform a permanent installation of the antenna.

With the old antenna the signal strength was consistently about 14%, which resulted in throughput of about 200 kbit/s. After our new antenna was installed and calibrated, we saw signal strength of up to 80%, which gave us upwards of 1800 kbit/s of throughput with consistent pings of about 250 ms. Hooray!

After applying liberal traffic shaping on the router we are now able to comfortably surf the internet, download packages, and use IRC.

11.21.13

Day 51

I’ve been as terrible about updating this as I expected to be. Nevertheless, I am struck with inspiration (or maybe it’s just energy from coffee), so another post must be written!

For the next week (and the previous week) I’m spending time in Paris. This turned out to be largely a convenient set of circumstances, since I had an excellent experience when I was here two weeks ago, and I wished I could spend more time here.

11.21.13

Concerning Hackers and Beaches

I’m excited to hear that Hackerbeach will be happening again this year. Last year was an amazing and unique experience, and I can’t wait to go again. This year the village hosting us will be Lamu, Kenya.

For the uninitiated, Hackerbeach involves a group of hackers (historically 15-20) gathering in a tropical location for a month to hack on various open source projects. It can be thought of as a month-long hackathon or code sprint for nomadic open source developers. All of the code so far has been focused on the open web ecosystem.


09.28.13

Day 1 (2013/09/26-2013/09/27)

I’m writing this log reluctantly at the request of a coworker. The last time I tried this while travelling, it resulted in three large rambling posts (only two of which were published). This time I’ll try to write smaller posts of a more personal nature.

Woke up with a sore throat: a bad omen for a long travel stint. I hadn’t packed the night before, so it was all done day-of, which surprisingly didn’t result in me forgetting any important items (that I can think of yet). Maybe I’m getting used to this, or maybe I’m just being more reserved about what I consider necessary.


05.22.13

Flying in India

Flying in India is a bit different than in other countries I’ve flown in. All processes are stricter than in at least Southeast Asia, Europe, and the US. One of the differences is the amount of documentation required.
