Ah, the world of computers. Thanks to the wonderful world of bits and bytes, we can experiment with any application, file, driver, or even the core operating system. Rip them apart, change things, put them together, and if it doesn’t work, just try again. At worst, you’ll have to wipe your hard drive and start over. If you somehow manage to destroy a computer purely through bad software, that’s considered a design problem and a true feat to pull off. Just think about it: what other profession or hobby lets you experiment as much as you want and make as many mistakes as you want without having to spend a cent if you do something wrong?

Unfortunately, things have changed. Ever since the advent of embedded devices with upgradable firmware, people have been trying to modify and hack them. These devices are usually a lot less resilient than their bigger, older siblings. Many of the new shiny gadgets that we use every day are internally fragile and a slight software mishap can render them non-functional, a “brick”.

This is a guide for developers and hackers who work on system firmware for embedded devices.

Care About Your Users

The first step towards safe hacking is to develop a deep appreciation for your users and, especially, their hardware. Most users are clueless and entirely dependent on you to guide them towards a safe result. While everyone releases hacks with no warranty, express or implied, that’s just to cover their ass. Remember, users are usually completely lost if they make their devices inoperable, and, unlike you, probably won’t have a backup plan.

One great way to start developing an appreciation for each individual piece of hardware out there is to deeply care for your own. If you’re a hacker with a very low budget, you probably have this already, as you’ll want to keep your device functional for as long as possible to avoid having to spend your hard-earned cash on a new one. If you have to perform emergency repairs on it or reflash it, take notice every time you do so. You might have had to spend a few hours wiring up a flasher to your device. An average user probably wouldn’t be able to do that in a week. And if you have the resources to purchase a few devices for testing purposes, keep in mind that most users don’t have that luxury. It might be tempting to run wild and experiment once you have a recovery plan in place, but remember, every mistake that you make is a mistake that might slip through and end up affecting your users instead. If you take great pains to avoid bricking your own hardware, you’ll greatly decrease the chances of a critical mistake making its way into the release version.

If you still decide not to care for your users, make it plainly clear in the product documentation that they are entirely on their own, and that you don’t care about what happens when they run your tool, nor have you made any attempt to make it safe for everyone. They deserve to know.

Understand the System

Before you start working on software that makes permanent changes to a device, you should have a deep enough understanding of its operation. Reverse engineer the boot process. Understand what parts of the firmware depend on what. Know what components are vital for boot, and what recovery modes are available, if any. If you’re the hacker responsible for performing most of the reverse engineering work on the device, you probably already know a good deal about it. If you aren’t, read documentation, try to understand everything, and talk to the person who is. Explain your idea. They will probably have many useful safety tips for you. Work on less intrusive hacks that will deepen your understanding of the system before moving on to riskier hacks that might end up in a brick. Above all, work with other people who also work on that device. Every extra knowledgeable person working on a firmware hack multiplies its chances of being safe.

Program Defensively

Usually, when a program crashes, at worst users get annoyed or lose some data. However, when unstable firmware hacks can mean that devices are irreparably destroyed, entirely different standards apply. Check all error codes. Handle out of memory errors. Make sure there’s enough free disk space. Make sure headers are sane. If you’ve never written a stable app before, one that can gracefully handle most exceptional conditions without crashing or doing the wrong thing, you should seriously reconsider working on critical device firmware hacks until you do so. Learn about what kinds of problems to expect on safe ground first, before you move on to shakier terrain.
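As a rough illustration of what "check everything" can look like, here is a minimal Python sketch of two pre-flight checks: header sanity and free disk space. The image layout (a 4-byte magic followed by a 32-bit length) and all of the names are made up for this example; a real device will have its own format.

```python
import shutil
import struct

# Hypothetical image layout, for illustration only: a 4-byte magic, then a
# little-endian 32-bit payload length, then the payload itself.
MAGIC = b"FWIM"
HEADER = struct.Struct("<4sI")


class PreflightError(Exception):
    """Raised when it is not safe to continue. Nothing has been written yet."""


def check_header(image: bytes) -> int:
    """Return the payload length if the header is sane, otherwise raise."""
    if len(image) < HEADER.size:
        raise PreflightError("image is too short to contain a header")
    magic, length = HEADER.unpack_from(image)
    if magic != MAGIC:
        raise PreflightError(f"bad magic {magic!r}")
    if length != len(image) - HEADER.size:
        raise PreflightError("declared length does not match the image size")
    return length


def check_free_space(path: str, needed: int) -> None:
    """Raise unless the filesystem holding `path` has `needed` bytes free."""
    free = shutil.disk_usage(path).free
    if free < needed:
        raise PreflightError(f"need {needed} bytes free, only {free} available")
```

Both functions only read. If either one raises, the tool can stop before it has touched the device.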

One great technique to use is to do as much as possible in advance. Gather all required information about the system, read any required data files, prepare any modifications, and only at the very end actually commit the changes to the device. If anything goes wrong during the preparations, you can just abort the entire operation and be certain that the device is still safe. If you don’t want (or can’t) architect your program like this, you can still tack it on as an underlying layer. Make the low-level functions that perform the actual changes (e.g. write to Flash) actually write to a temporary buffer instead, and bulk write everything at the very end. This also gives you a chance to check the result of the operation virtually, before it’s actually committed. It might even speed up your program as a side effect (bulk writes are faster than scattered ones).
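The "underlying layer" variant might look something like the following Python sketch. It assumes the flash contents fit in memory and that the real low-level write is available as a callable; `write_device` and the class name are placeholders for this example, not part of any real tool.

```python
class StagedFlash:
    """Collects writes in RAM and touches the device only in commit()."""

    def __init__(self, current: bytes, write_device):
        self._original = bytes(current)     # what is on the device right now
        self._image = bytearray(current)    # working copy that absorbs writes
        self._write_device = write_device   # hypothetical callable(offset, data)

    def write(self, offset: int, data: bytes) -> None:
        """Same interface as a real flash write, but only edits the buffer."""
        if offset < 0 or offset + len(data) > len(self._image):
            raise ValueError("write falls outside the flash image")
        self._image[offset:offset + len(data)] = data

    def preview(self) -> bytes:
        """The image as it would look after commit(), for checking."""
        return bytes(self._image)

    def commit(self) -> None:
        """Perform one bulk write, and only if something actually changed."""
        if self._image != self._original:
            self._write_device(0, bytes(self._image))
```

The rest of the program keeps calling `write()` as before. Anything that fails before `commit()` leaves the device untouched, and `preview()` gives you the final image to verify first.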

Fail Intelligently

If you’ve followed the prior advice, you’ll have already minimized the amount of code that can fail and cause catastrophic damage to firmware. However, there’s almost always something that might go wrong at just the wrong time. If a critical operation fails, the worst possible thing you can do is panic the application or otherwise halt! Then you’re guaranteed to brick the device. Instead, drop the user into some kind of failsafe mode, shell, or launcher, and direct them to keep the device powered on and seek immediate attention (e.g. on an IRC channel). If there’s a chance of saving the device, even if you have to work together with the user to develop an improvised fix, take it. They will be eternally grateful to you.
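In code, the idea is roughly this (a Python sketch; `step` and `failsafe` are hypothetical callables standing in for your critical operation and whatever rescue mode your environment can offer):

```python
import traceback


def run_critical(step, failsafe, log=print):
    """Run step(); on any failure, report it and enter failsafe() instead of exiting."""
    try:
        step()
        return True
    except Exception:
        # Tell the user what matters most first, then give the details that
        # someone helping them will ask for.
        log("A critical operation failed. DO NOT power off the device.")
        log(traceback.format_exc())
        failsafe()
        return False
```

The important design choice is that the error path ends in `failsafe()`, not in an exit or a hang, so the user still has a running device to work with.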

Sanity Check

Don’t assume anything about the user’s environment. Manufacturers often release dozens of firmware updates, and the number balloons to hundreds or even thousands if you start to consider the possible combinations of hacks that users might have already applied. Profile the system and ensure that everything is sane before you start. If you need to read any data from a user-supplied file or from the network, make sure it is exactly what you expect it to be. You can’t possibly have too many sanity checks.

Cryptographic hash algorithms (such as SHA-1) are a great tool here. Build a database of known-good firmware hashes. Include the hash of the expected result after running your program, so you can check against it before actually writing it out. If you miss an existing firmware that would’ve just worked, that only means you have to add it in and release a new version. If you don’t perform the check and that firmware turns out to be incompatible, you’ve just created a whole class of users that will be bricked by your tool. Blind patching is a recipe for disaster.
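Here is a small Python sketch of that check. It uses SHA-256 from the standard `hashlib` module rather than SHA-1, but the approach is the same. The `known` table and the `patch` callable are placeholders; in a real tool the table would hold hashes you collected from firmware you have actually tested.

```python
import hashlib


class UnknownFirmware(Exception):
    """The input is not in the database of tested firmware."""


class BadPatchResult(Exception):
    """The patched image is not what we expected to produce."""


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def patch_checked(firmware: bytes, known: dict, patch) -> bytes:
    """Patch `firmware` only if both its hash and the result's hash are known.

    `known` maps the hash of each supported input image to the hash of the
    output we expect after patching it.
    """
    digest = sha256_hex(firmware)
    if digest not in known:
        raise UnknownFirmware(digest)
    result = patch(firmware)
    if sha256_hex(result) != known[digest]:
        raise BadPatchResult(digest)
    return result  # only now is it reasonable to write this out
```

An unknown firmware costs you a new release with one more table entry. A blind patch on that same firmware could cost someone their device.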

You should also make your application check itself, to make sure it hasn’t been corrupted (due to a bad download, bad media, or even bad memory), including any auxiliary files that it needs. Hashes also work great here. You can make this as simple or as complex as you want. You can have the executable check its readonly sections against built-in hashes in memory. Or you can just have a .txt file with hashes of all your files (including the main executable), and check them at runtime before anything else. Sometimes just packing your executable with an executable packer will give you this feature for free (but make sure the packer does, in fact, offer integrity protection).
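The .txt file approach can be as simple as the Python sketch below. It assumes a manifest with one "hash, whitespace, filename" pair per line, with filenames relative to the manifest's directory; that format and the function names are choices made for this example.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in chunks so large files are fine."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_manifest(manifest: Path) -> list:
    """Return the names of files that are missing or do not match the manifest."""
    bad = []
    for line in manifest.read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            raise ValueError(f"malformed manifest line: {line!r}")
        expected, name = parts
        target = manifest.parent / name
        if not target.is_file() or file_digest(target) != expected.lower():
            bad.append(name)
    return bad
```

Run it before anything else and refuse to continue unless the returned list is empty. Note that a text manifest can't vouch for itself, so this catches corruption, not tampering.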

Protect Users From Themselves

Users will do completely stupid things. It’s not just that they will click on things without understanding what the outcome will be; if you include a big red button that says “Brick Me!”, someone will click it too. That’s why you should at least make it hard for users to destroy their system. Sure, you can just blame them for their own incompetence, but it’s worth covering for the obvious cases. If there’s an option in your hack that will undoubtedly brick a user’s system under a conceivable set of circumstances, check for them and disable the option in that case. There’s no excuse for having a button that deletes critical system firmware, even if it’s marked “delete critical system firmware”. If such a feature makes sense as part of a longer process that will again result in a functional device, automate the process. If there’s a reason why a power user or developer might have a use for such a dangerous option, hide it behind a warning or two and make getting to it more annoying. Your software should pass the cat test: if a cat walks all over the keyboard (or touch panel, or game pad, or Kinect), it shouldn’t be able to cause permanent harm to the system.

This doesn’t mean that you have to try to envision every single possible situation. Users are extremely good at creatively breaking programs. But, at least, make sure they can’t accidentally destroy their systems without putting a moderate amount of effort into it.

Back Up

You should strongly consider offering a backup option to your users, or even automatically backing up critical information. Sometimes, having a backup can mean the difference between a device that can be fixed with a reasonable amount of effort (say, a hardware flasher), and a device that is forever toast and not even the manufacturer could hope to repair. If the amount of critical information is small, it’s worth putting the effort in and making sure it is automatically backed up whenever any dangerous operation is about to happen. Test it, to make sure the correct information is saved. And don’t forget to tell your users that they should keep the backup file in a safe place!

Even if your device is generally “brick-proof”, because it has a ROM bootloader that allows flashing, backing up can still be very important. Many devices store unique per-device data alongside the firmware, and the loss of that information can cause a messy repair process involving lots of manual guesswork, or worse, a device that, though technically alive, will never work again as intended (e.g. if critical calibration data or device private keys are lost). This information is usually very small. Back it up! You never know when a silly mistake will end up scribbling all over it.
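A backup is only useful if it actually contains the right bytes, so it pays to verify it on the spot. The Python sketch below is one way to do that; `read_region` is a hypothetical callable that returns the per-device data (keys, calibration, and so on) from wherever your device keeps it.

```python
import hashlib
from pathlib import Path


def backup_region(read_region, dest: Path) -> str:
    """Save per-device data to `dest`, verify the copy, and return its SHA-256."""
    data = read_region()
    # Read twice: if the two reads disagree, the source itself is unreliable
    # and the backup would not be trustworthy.
    if read_region() != data:
        raise OSError("two reads of the same region returned different data")
    # Mode "xb" fails if the file already exists, so an earlier (possibly
    # better) backup is never silently overwritten.
    with dest.open("xb") as f:
        f.write(data)
    if dest.read_bytes() != data:
        raise OSError("backup file does not match what was read")
    return hashlib.sha256(data).hexdigest()
```

Showing the returned hash to the user, or logging it, makes it possible to confirm later that the file they kept is still the one you wrote.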

Test

Ideally, you’ve put enough effort into making sure your application is safe. However, the unexpected can and does happen, and sometimes you will not have the resources to perform a comprehensive enough test. So gather up a few people that you can trust and who are willing to risk it, and perform a closed test. Do not release a public beta! People are way too impatient, and public betas are essentially synonymous with a release; people will ignore any warnings attached. You’ll want trustworthy people, preferably with the technical knowledge and skill to spot a problem before it is fatal and to have a chance of being able to fix it, if it is. Look for people with hardware experience who can put together some kind of flasher (JTAG, NOR, whatever) if things go terribly wrong.

If your application errors out on some devices, but does not cause any harm, give yourself a pat on the back: congratulations, you’ve saved your tester from a brick. If it does cause harm, and you brick your tester’s device, give yourself a pat on the back anyway: you’ve saved dozens, hundreds, or thousands of potential end-users from a brick.

You should also make sure you’ve covered all bases with your testing. If your device has gone through multiple hardware revisions, especially if those changes are at all related to the firmware of the device (e.g. different flash chip vendors, or an entirely different firmware storage device even), you should test on all of them. If you don’t know, look around and ask. There are plenty of people out there willing to provide you with PCB pictures and chip part numbers that will help you identify any important changes. If you didn’t consider an entire hardware revision and your hack doesn’t work as expected on it, you’re guaranteeing that a huge percentage of your users, potentially thousands or tens of thousands, will brick their devices.

Epilogue

I hope this article convinced you to be careful when you write firmware hacks for embedded devices. If you follow the guidelines in it, you’ll save money, save your users money, and build a reputation for robust and dependable hacks. These are the previously unwritten principles that Team Twiizers followed when we developed the HackMii Installer, and we haven’t heard of a single brick out of 1.2 million installs to date. Ultimately, though, whether you follow them is entirely up to you. Is it worth it? You decide.

2011-01-19 02:56