When we started automating tests for Firefox OS, we knew that we could do a lot with automated testing on phone emulators: we could run in an environment very similar to the phone, using the same low-level instruction set, and even do some basic operations like SMS between two emulator processes. Best of all, we could run those in the cloud, at massive scale.
But we also knew that emulator-based automation was never going to be as complete as testing on real phones. For instance, you can’t simulate many basic smartphone operations: calling a number, going to voicemail, toggling airplane mode, taking a picture, etc. So we started trying to get phones running in automation very early with Firefox OS, almost two years ago now.
We had some of our very early Unagi phones up and running on a desk in our office. That eventually grew to a second generation of Hamachi-based phones. There were four core scalability problems with both of these solutions:
1. There was no reliable way to power-cycle a phone without a human walking up to it, pulling out the battery, and putting it back in.
2. At the time these were pre-production phones (hence the code names), and they were hard to get in bulk from partners. So we did what we could with about 10 phones that ran smoketests, correctness tests, and performance tests.
3. All of the automation jobs and results had to be tracked by hand, and status had to be emailed to developers; there was no way to get these jobs reporting to our main automation dashboard, TBPL.
4. Because we couldn’t report status to TBPL, maintaining the system and filing bugs when tests failed had to be done entirely by a dedicated set of four QA folk, which was not a scalable option, to say the least.
Because of points 1 and 2, we were unable to truly scale the number of devices. We only had one person in Mountain View, and what we had thought of as a part-time job of pulling phone batteries soon became his full-time job. We needed a better way to increase the number of devices while we worked in parallel on a better dashboard for our automation, one that a system like this could easily plug into and report its results.
The Flame reference device solved the first two problems. Now we had a phone whose hardware we could depend on, and Jon Hylands was able to create custom battery harnesses for it so that our scripts could automatically detect dead phones and remotely power-cycle them (and, in the future, monitor power consumption). Because we (Mozilla) commissioned the Flame phone ourselves, there were no partner-related issues with obtaining pre-production devices; we could easily get as many as we needed. After doing some math to understand our capacity needs, we got 40 phones to seed our prototype lab to support per-push automation.
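To make the "detect dead phones and power-cycle them" idea a bit more concrete, here is a minimal sketch in Python. It is not the lab's actual script: `PhoneWatchdog`, `is_responsive`, `power_cycle`, and the failure threshold are all hypothetical names and choices made up for illustration, and the real check and the real battery-harness control are deliberately left as callables you would have to supply.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class PhoneWatchdog:
    """Hypothetical watchdog: power-cycles phones that stop responding.

    is_responsive and power_cycle are placeholders supplied by the caller;
    they stand in for whatever device check and battery-harness control a
    real lab would use.
    """

    is_responsive: Callable[[str], bool]
    power_cycle: Callable[[str], None]
    max_failures: int = 3  # assumed threshold, chosen only for illustration

    def __post_init__(self) -> None:
        # Consecutive failed checks, keyed by device serial.
        self._failures: Dict[str, int] = {}

    def check(self, serials: Iterable[str]) -> List[str]:
        """Run one pass over the phones and return the serials that were cycled."""
        cycled: List[str] = []
        for serial in serials:
            if self.is_responsive(serial):
                self._failures[serial] = 0
                continue
            count = self._failures.get(serial, 0) + 1
            if count >= self.max_failures:
                # Phone looks dead: cut and restore power, then start counting again.
                self.power_cycle(serial)
                cycled.append(serial)
                count = 0
            self._failures[serial] = count
        return cycled
```

A scheduler would call `check()` periodically with the list of device serials. Requiring several consecutive failures before cutting power is one way to avoid rebooting a phone that is merely busy, though the right threshold would depend on the lab.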
As I mentioned, we were solving the dashboard problem in parallel, and that has now been deployed in the form of Treeherder, which will be the replacement for TBPL. That solves point 3. All that now remains is point 4. We have been hard at work crafting a unified harness to run the Gaia JavaScript tests on device, which will also allow us to run the older, existing Python tests until they can be converted. This gives us the most flexibility and allows us to take advantage of all the automation goodies in the existing Python harness, like crash detection and JSON structured logging. Once it is complete, we will be able to run a smaller set of the same tests the developers run locally on each push to b2g-inbound on these Flame devices in our lab. This means that when something breaks, it will break tests that are well known, in a well-understood environment, and we can work alongside the developers to understand what broke and why. By enabling the developers and QA to work alongside one another, we eliminate the scaling problem in point 4.
It’s been a very long road to get from zero to where we are today. You can see early pictures of the “phones on a desk” rack and pictures of the first 20 Flames in the presentation Stephen gave earlier this month.
A number of teams helped get us to this point, and it could not have been done without the cooperation among them: the A*Team, the Firefox OS Performance team, the QA team, and the Gaia team. You can see the per-push tests showing up on the Treeherder Staging site as we ensure we can meet the stability and load requirements necessary for running in production.
Last week, James Lal and his new team inherited this project. They are working hard to push the last pieces to completion and to expand it even further. And so, even though Firefox OS has had real-phone automation for years, that system is now coming into its own. The real-phone automation will finally be extremely visible and easily actionable for all developers, which is a huge win for everyone involved.