Firmware updates are great – until one fails halfway and turns a working product into a brick. Dual-slot and rollback strategies on STM32 make sure your devices always have a safe image to fall back to, even if power is lost mid-update or a new build has a hidden bug.
When you should care about dual-slot and rollback
- You update STM32 firmware in the field (service tool, mobile app, over a bus or remotely).
- Devices are hard or expensive to access once deployed (industrial, mobility, tools, drives).
- A bricked device means unhappy users, truck rolls or warranty costs.
If you are still designing your basic update path (USB, UART, BLE, etc.), it can be useful to read this together with your update transport concept and your overall firmware architecture.
Step 1 – Clarify your update and failure scenarios
Before designing memory layouts or bootloaders, write down the scenarios you actually need to survive. Typical ones:
- Power loss during download of a new image.
- Power loss while switching from old to new image.
- New image boots but crashes early (boot loop, watchdog reset).
- New image runs but has a functional bug – you want to roll back in the field.
For each scenario, answer two questions:
- What must never happen? (e.g. permanent brick, corrupted bootloader.)
- What is acceptable? (e.g. device returns to old version, ends in safe mode.)
This list becomes your acceptance criteria later when you test your dual-slot design.
Step 2 – Choose a memory layout that matches your constraints
Dual-slot designs always trade flash space against safety. On STM32 you usually have three main options:
- Single-slot + “update buffer”: one active application + one smaller buffer.
- Symmetric dual-slot: two equally sized app slots (A/B) + small bootloader.
- Asymmetric dual-slot: one full-featured slot and one smaller “rescue” slot.
Consider:
- Is flash big enough for two complete images?
- Do you need the second slot to be as capable as the first, or is a “recovery image” enough?
- Where should the slot metadata live: EEPROM, emulated EEPROM or external flash?
For many industrial products, a symmetric dual-slot layout offers the best balance between simplicity and safety.
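As an illustration, a symmetric layout can be pinned down in one header with compile-time checks. If the numbers stop adding up, the build fails instead of regions silently overlapping. The device (512 KiB of flash at 0x08000000, 2 KiB erase pages) and all `LAYOUT_*`, `BOOT_*`, `META_*` and `SLOT_*` names are hypothetical. Take the real values from your part's reference manual and keep them consistent with your linker scripts.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical device: 512 KiB flash at 0x08000000, erased in 2 KiB pages.
 * Replace with the values from your part's reference manual. */
#define LAYOUT_FLASH_BASE  0x08000000u
#define LAYOUT_FLASH_SIZE  (512u * 1024u)
#define LAYOUT_PAGE_SIZE   (2u * 1024u)

/* Symmetric A/B layout: bootloader, two metadata pages, two equal application slots. */
#define BOOT_ADDR    LAYOUT_FLASH_BASE
#define BOOT_SIZE    (32u * 1024u)
#define META_ADDR    (BOOT_ADDR + BOOT_SIZE)
#define META_SIZE    (2u * LAYOUT_PAGE_SIZE)   /* two copies of the slot metadata */
#define SLOT_SIZE    ((LAYOUT_FLASH_SIZE - BOOT_SIZE - META_SIZE) / 2u)
#define SLOT_A_ADDR  (META_ADDR + META_SIZE)
#define SLOT_B_ADDR  (SLOT_A_ADDR + SLOT_SIZE)

/* Fail the build, not the field, if the layout is inconsistent. */
static_assert(BOOT_SIZE % LAYOUT_PAGE_SIZE == 0, "bootloader must end on a page boundary");
static_assert(SLOT_SIZE % LAYOUT_PAGE_SIZE == 0, "each slot must be a whole number of pages");
static_assert(SLOT_B_ADDR + SLOT_SIZE <= LAYOUT_FLASH_BASE + LAYOUT_FLASH_SIZE,
              "slot B runs past the end of flash");

/* Helper: does an address fall inside the slot starting at slot_addr? */
static inline int in_slot(uint32_t slot_addr, uint32_t addr)
{
    return addr >= slot_addr && addr < slot_addr + SLOT_SIZE;
}
```

With these example values each application slot comes out at 238 KiB.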
Step 3 – Design a simple, explicit bootloader state machine
At the heart of a robust dual-slot strategy is a small, deterministic state machine in the bootloader. Its job is to decide which image to run and what to do after failures.
A typical pattern:
- On reset, bootloader reads slot metadata (valid flag, image version, “confirmed” flag, health info).
- If there is a pending update in slot B, it boots that image in a temporary "trial" state.
- The new image must confirm itself after successful start (e.g. by setting a flag in flash).
- If confirmation never happens (watchdog reset, crash), bootloader falls back to the last confirmed slot.
Keep the bootloader state machine as small and testable as possible. Complexity belongs in the application and update logic, not in the one part that must always work.
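The pattern above can be sketched as a pure decision function. It works on metadata already read into RAM, which keeps it unit-testable on a PC. The following are all assumptions for this sketch:

- the slot states and type names;
- the budget of three trial attempts;
- the rule "fall back to the highest confirmed version".

The caller is expected to persist the modified counters and states before jumping to the chosen slot.

```c
#include <stdint.h>

/* Hypothetical per-slot state, as read from the metadata region into RAM. */
typedef enum { SLOT_EMPTY, SLOT_TRIAL, SLOT_CONFIRMED, SLOT_INVALID } slot_state_t;

typedef struct {
    slot_state_t state;
    uint32_t     version;
    uint8_t      trial_boots;   /* boot attempts made while still in TRIAL */
} slot_info_t;

#define MAX_TRIAL_BOOTS 3u      /* assumption: three attempts before giving up */

typedef enum { BOOT_NONE = -1, BOOT_SLOT_A = 0, BOOT_SLOT_B = 1 } boot_choice_t;

/* Decide which slot to start. No flash access here: the caller reads the
 * metadata, calls this, persists the (possibly modified) slots, then jumps. */
boot_choice_t select_boot_slot(slot_info_t slots[2])
{
    /* 1. A trial image gets a limited number of attempts to confirm itself. */
    for (int i = 0; i < 2; i++) {
        if (slots[i].state != SLOT_TRIAL) continue;
        if (slots[i].trial_boots < MAX_TRIAL_BOOTS) {
            slots[i].trial_boots++;
            return (boot_choice_t)i;
        }
        slots[i].state = SLOT_INVALID;   /* never confirmed: stop trying it */
    }

    /* 2. Fall back to the confirmed image with the highest version. */
    int best = -1;
    for (int i = 0; i < 2; i++) {
        if (slots[i].state == SLOT_CONFIRMED &&
            (best < 0 || slots[i].version > slots[best].version)) {
            best = i;
        }
    }
    return (best < 0) ? BOOT_NONE : (boot_choice_t)best;
}
```

Keeping flash access out of this function means every branch, including the "nothing bootable" case, can be exercised in host-side unit tests.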
Step 4 – Define your image format, integrity and versioning
Dual-slot only helps if you can trust the images you switch between. At minimum you need:
- A clear header structure (magic, length, version, CRC, maybe signature).
- An integrity check (CRC or stronger) before you ever mark an image as valid.
- A versioning scheme so the bootloader and tools know what is “newer”.
In more demanding applications, add:
- Digital signatures to verify authenticity of the image.
- Optional encryption, depending on IP and threat model.
- Compatibility fields (e.g. hardware variant, config region version).
Good image metadata also makes it easier to analyse issues in the field when a customer sends logs or a complete flash dump.
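A minimal header and integrity check could look like the sketch below. The field set, the `IMAGE_MAGIC` value and the function names are hypothetical. The checksum is the standard CRC-32 (reflected polynomial 0xEDB88320), computed bit by bit to keep code size small. Some STM32 families have a CRC peripheral that can be configured to produce the same result; check the reference manual before swapping it in. A CRC only catches accidental corruption, so authenticity still needs a signature check on top.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define IMAGE_MAGIC 0x494D4731u   /* hypothetical marker ("IMG1") */

/* Hypothetical header stored at the start of each slot, followed by the payload. */
typedef struct {
    uint32_t magic;
    uint32_t header_version;  /* layout revision of this struct */
    uint32_t image_size;      /* payload length in bytes */
    uint32_t fw_version;      /* e.g. (major << 16) | (minor << 8) | patch */
    uint32_t hw_variant;      /* compatibility field checked before booting */
    uint32_t payload_crc32;   /* CRC-32 over the payload only */
} image_header_t;

/* Standard CRC-32 (reflected polynomial 0xEDB88320), bitwise: small, not fast. */
uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

/* True only if the header is plausible for this device and the payload CRC matches.
 * Call this before ever marking a slot valid. */
bool image_is_intact(const image_header_t *hdr, const uint8_t *payload,
                     uint32_t slot_capacity, uint32_t expected_hw)
{
    if (hdr->magic != IMAGE_MAGIC) return false;
    if (hdr->image_size == 0u || hdr->image_size > slot_capacity) return false;
    if (hdr->hw_variant != expected_hw) return false;
    return crc32_ieee(payload, hdr->image_size) == hdr->payload_crc32;
}
```

The cheap header checks run first, so an empty or foreign slot is rejected before the bootloader spends time on a full CRC pass.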
Step 5 – Decide when and how rollback should trigger
Not every failure should immediately cause rollback, but some failures must. It is worth being explicit about your rollback triggers:
- Hard triggers: repeated boot failures, watchdog resets early in startup, failed integrity checks.
- Soft triggers: application-level “revert” command, triggered by diagnostics or remote command.
- Rate limits: prevent endless flip-flop between two half-working versions.
The key is to capture a small set of metrics (boot counters, last reset reason, flags) that the bootloader can evaluate quickly without complex logic.
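As a sketch, the triggers above can collapse into a handful of comparisons. On STM32 the reset cause comes from the RCC reset flags, for example `__HAL_RCC_GET_FLAG(RCC_FLAG_IWDGRST)` in the STM32Cube HAL. How those flags and counters get into the record is left out here. The struct, the action names and both thresholds are hypothetical. `ACTION_SAFE_MODE` stands for the "ends in safe mode" outcome from Step 1.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical metrics kept in a small persistent record and read at every boot. */
typedef struct {
    uint8_t early_wdg_resets;  /* watchdog resets before the app confirmed itself */
    uint8_t rollbacks;         /* rollbacks since the last confirmed update */
    bool    revert_requested;  /* soft trigger, set by the application */
    bool    integrity_ok;      /* result of the CRC/signature check */
} boot_metrics_t;

typedef enum { ACTION_BOOT_CURRENT, ACTION_ROLLBACK, ACTION_SAFE_MODE } boot_action_t;

#define MAX_EARLY_WDG_RESETS 3u  /* assumption: hard trigger threshold */
#define MAX_ROLLBACKS        2u  /* assumption: rate limit against A/B flip-flop */

/* A few comparisons, no parsing: quick to run and easy to test exhaustively. */
boot_action_t evaluate_rollback(const boot_metrics_t *m)
{
    bool hard = !m->integrity_ok || m->early_wdg_resets >= MAX_EARLY_WDG_RESETS;
    bool soft = m->revert_requested;

    if (!hard && !soft) return ACTION_BOOT_CURRENT;
    if (m->rollbacks >= MAX_ROLLBACKS) return ACTION_SAFE_MODE;  /* stop flapping */
    return ACTION_ROLLBACK;
}
```

Checking the rollback counter after the trigger check means a device that keeps failing ends up in a defined safe mode instead of alternating between two half-working images.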
Step 6 – Handle partial updates and power loss explicitly
A dual-slot layout on its own does not automatically protect you from partial updates. You must define how the system moves from “no update” to “new image active” using atomic steps:
- Only write the “valid” flag after the full image and its metadata are in place and verified.
- Use a separate “pending” or “trial” flag for images that are being tested but not yet confirmed.
- Make the switch between slots a small, robust operation (e.g. a single word write in metadata).
Think in terms of “what if power is lost right after this word is written?” and design the metadata transitions so every intermediate state is either clearly “old image” or clearly “trial image” – never an undefined mess.
Step 7 – Test your strategy like a hardware feature
Dual-slot and rollback are not just “helpers” – they are core safety features. Test them like you would test a protection circuit:
- Script power-loss tests at defined points in the update sequence.
- Deliberately inject corrupted images and verify that they are never marked valid.
- Trigger rollback conditions (boot loops, watchdogs, explicit revert requests) and verify behaviour.
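The sweep below shows the idea of scripted power-loss tests on the host. It replays an update sequence, cuts power after every step and checks an invariant derived from the acceptance criteria. On the bench, the same loop would drive a relay or a programmable supply. Here the flash state is a hypothetical four-flag model, and all step and function names are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model of the persistent state an update touches. */
typedef struct {
    bool a_confirmed;  /* old image, known good */
    bool b_complete;   /* every byte of the new image written */
    bool b_verified;   /* integrity check passed and recorded */
    bool b_trial;      /* metadata flag: "try slot B on next boot" */
} sim_flash_t;

typedef void (*update_step_fn)(sim_flash_t *);

static void step_erase_b(sim_flash_t *f)    { f->b_complete = f->b_verified = f->b_trial = false; }
static void step_write_b(sim_flash_t *f)    { f->b_complete = true; }
static void step_verify_b(sim_flash_t *f)   { f->b_verified = f->b_complete; }
static void step_mark_trial(sim_flash_t *f) { f->b_trial = true; }

/* Acceptance criteria from Step 1 expressed as an invariant: after any power cut
 * something must still be bootable, and the metadata must never point at an
 * unverified image. */
static bool state_is_safe(const sim_flash_t *f)
{
    bool bootable = (f->b_trial && f->b_verified) || f->a_confirmed;
    bool trial_unverified = f->b_trial && !f->b_verified;
    return bootable && !trial_unverified;
}

/* Cut power after 0, 1, ..., n steps of the sequence and count unsafe outcomes. */
size_t power_cut_sweep(const update_step_fn *steps, size_t n)
{
    size_t failures = 0;
    for (size_t cut = 0; cut <= n; cut++) {
        sim_flash_t f = { .a_confirmed = true };   /* device starts on a good image */
        for (size_t i = 0; i < cut; i++) steps[i](&f);
        if (!state_is_safe(&f)) failures++;
    }
    return failures;
}
```

With the steps in the order erase, write, verify, mark trial, the sweep reports no failures. If `step_mark_trial` moves in front of `step_write_b`, the two cut points before verification become unsafe, because the metadata then points at an unverified image.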
If your products are part of a larger drive or machine, align these tests with your system-level validation and with any existing protection, fault-handling or drive-system tests.
Dual-slot & rollback checklist
| Topic | Key question | OK? |
|---|---|---|
| Scenarios | Are update and failure scenarios written down and prioritised? | |
| Layout | Does your flash layout clearly separate bootloader and app slots? | |
| Boot logic | Is there a simple state machine deciding which slot to boot and when to fall back? | |
| Integrity | Do you verify every image (CRC/signature) before marking it valid? | |
| Rollback | Are rollback triggers defined and rate-limited to avoid flapping? | |
| Power loss | Is every metadata transition safe against mid-write power loss? | |
| Testing | Do you have automated tests that simulate failures and confirm behaviour? | |
FAQ: Dual-slot & rollback on STM32
Do I always need a full symmetric dual-slot layout?
No. If flash is tight, an asymmetric layout with a smaller “rescue” image can still give you most of the safety benefits. The important part is that there is at least one known good image that the bootloader can always return to.
Is rollback only relevant for over-the-air updates?
Not at all. Even wired updates (service tool, USB, UART) can fail due to power loss or cabling issues. If the device is hard to reach or critical in operation, rollback is worth considering regardless of the transport.
How does this interact with real-time motor control?
On STM32-based drive systems, the motor firmware is often the “main” application you want to protect. Dual-slot and rollback do not change your FOC or timing code directly, but they strongly influence how you structure your firmware, configuration and field updates. It’s best to think about them together with your drive system and firmware architecture.
Can you help design or review a dual-slot bootloader?
Yes. As part of our engineering and firmware consulting services, we regularly design or review bootloaders, image formats and update flows for STM32-based products, often together with real-time and motor-control work.
How this fits into your overall product development
Dual-slot and rollback are one piece of a larger reliability story: safe update paths, robust control firmware, protection concepts and mechanical design that can tolerate failures without damaging hardware.