Why your ESP32 firmware needs a state machine (and how to build one)
Here is the firmware you have probably written. Maybe not exactly this, but something that evolved into this shape over six months of adding features:
bool wifiConnected = false;
bool mqttConnected = false;
bool otaInProgress = false;
bool sensorActive = false;
bool reconnecting = false;
bool firstBoot = true;
unsigned long lastOtaCheck = 0; // millis() timestamp of the last OTA check
void loop() {
if (firstBoot) {
initSensors();
firstBoot = false;
}
if (!wifiConnected) {
if (WiFi.status() == WL_CONNECTED) {
wifiConnected = true;
reconnecting = false;
} else if (!reconnecting) {
WiFi.begin(SSID, PASSWORD);
reconnecting = true;
}
} else if (!mqttConnected && !otaInProgress) {
if (mqtt.connect(CLIENT_ID)) {
mqttConnected = true;
}
} else if (mqttConnected && !otaInProgress) {
mqtt.loop();
if (sensorActive) {
readAndPublishSensor();
}
if (millis() - lastOtaCheck > 3600000) {
checkForOtaUpdate();
lastOtaCheck = millis();
}
}
// WiFi dropped mid-session
if (wifiConnected && WiFi.status() != WL_CONNECTED) {
wifiConnected = false;
mqttConnected = false;
reconnecting = false;
sensorActive = false;
}
// OTA started
if (otaInProgress) {
ArduinoOTA.handle();
// Don't read sensors during OTA
// Don't publish MQTT during OTA
// But we need MQTT connected to know OTA is available...
// Actually we need WiFi but maybe not MQTT...
// This is getting complicated
}
}
Now your manager asks you to add an OTA update state. You touch eight different if statements. You introduce a bug where sensorActive stays true during OTA. You fix that, introduce another bug where reconnecting never gets cleared after a successful OTA. Three weeks later nobody — including you — can confidently answer the question “what happens if WiFi drops during an OTA update?”
The boolean flag approach has a deeper problem than just being messy. It allows impossible states. If wifiConnected is false, then mqttConnected should also be false — MQTT runs over WiFi. But nothing enforces this. A bug can set mqttConnected = true and wifiConnected = false and the code will try to publish to MQTT via a disconnected WiFi connection.
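To put a rough number on this: six independent flags can represent 2^6 = 64 combinations, far more than the firmware ever means to be in. The sketch below is a host-side illustration only. The `Flags` struct mirrors the six flags above, while `isConsistent`, `fromBits`, and `countInconsistent` are hypothetical helpers. It counts how many combinations break just the single rule "MQTT requires WiFi".

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the six flags from the loop() above.
struct Flags {
    bool wifiConnected;
    bool mqttConnected;
    bool otaInProgress;
    bool sensorActive;
    bool reconnecting;
    bool firstBoot;
};

// One invariant only: MQTT cannot be up unless WiFi is up.
bool isConsistent(const Flags &f) {
    return !f.mqttConnected || f.wifiConnected;
}

// Decode a 6-bit pattern into a Flags value (bit 0 = wifiConnected, ...).
Flags fromBits(uint8_t bits) {
    return Flags{
        (bits & 0x01) != 0, (bits & 0x02) != 0, (bits & 0x04) != 0,
        (bits & 0x08) != 0, (bits & 0x10) != 0, (bits & 0x20) != 0};
}

// Count how many of the 64 representable combinations break the invariant.
int countInconsistent() {
    int bad = 0;
    for (int bits = 0; bits < 64; ++bits) {
        if (!isConsistent(fromBits(static_cast<uint8_t>(bits)))) ++bad;
    }
    return bad;
}
```

`countInconsistent()` returns 16: a quarter of everything the flags can express is already invalid under one rule, before considering OTA or reconnect logic.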
There is a better pattern. It is called a finite state machine.
What a finite state machine actually is
Skip the formal automata theory. For firmware purposes, a finite state machine (FSM) is:
- A fixed set of states — mutually exclusive. The system is in exactly one state at any given time.
- Events — things that happen: WiFi connects, MQTT disconnects, a button is pressed.
- Transitions — rules of the form “when in state S, if event E occurs, move to state T”.
- Actions — code that runs on transition, or when entering/exiting a state.
The key constraint is mutual exclusivity. You cannot be in CONNECTING_WIFI and CONNECTED simultaneously, the same way you cannot be in both Toronto and London at the same time. This constraint is what eliminates impossible states.
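In its smallest form, the list above boils down to one function from (state, event) to the next state. The following is a minimal sketch with a hypothetical three-state link model; the names `State`, `Event`, and `next` are chosen for illustration, and actions are left out to keep the transition rule visible.

```cpp
#include <cassert>

enum class State { DISCONNECTED, CONNECTING, CONNECTED };
enum class Event { START, LINK_UP, LINK_DOWN };

// Transition rule: given the current state and an event, return the next state.
// Unhandled (state, event) pairs leave the state unchanged.
State next(State s, Event e) {
    switch (s) {
        case State::DISCONNECTED:
            if (e == Event::START) return State::CONNECTING;
            break;
        case State::CONNECTING:
            if (e == Event::LINK_UP)   return State::CONNECTED;
            if (e == Event::LINK_DOWN) return State::DISCONNECTED;
            break;
        case State::CONNECTED:
            if (e == Event::LINK_DOWN) return State::DISCONNECTED;
            break;
    }
    return s;
}
```

Because the state is a single enum value, there is no way to express "CONNECTING and CONNECTED at once"; the type itself enforces mutual exclusivity.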
States vs modes: A common mistake is using “modes” that can be combined, such as nightMode = true and lowPowerMode = true. Modes are not mutually exclusive. If you find your firmware has boolean “mode” flags, you probably want states instead. Modes are fine for preferences; states are what you want for control flow.
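One way to keep the two apart is to give each its own type. This is a sketch under assumed names (`Preferences`, `sampleIntervalMs`, and the three states are hypothetical): the enum decides what the firmware is doing, and the mode flags only tune how it does it.

```cpp
#include <cassert>

enum class State { IDLE, SAMPLING, SLEEPING };  // exactly one at a time

struct Preferences {                             // any combination is valid
    bool nightMode = false;
    bool lowPowerMode = false;
};

// Modes tune behaviour inside a state; they never decide which state we are in.
unsigned sampleIntervalMs(State s, const Preferences &p) {
    if (s != State::SAMPLING) return 0;          // no sampling outside SAMPLING
    unsigned interval = 1000;
    if (p.lowPowerMode) interval *= 10;
    if (p.nightMode)    interval *= 2;
    return interval;
}
```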
The wrong way: a giant switch/case
The first attempt at an FSM usually looks like this:
enum class State {
DISCONNECTED,
CONNECTING_WIFI,
CONNECTED_WIFI,
CONNECTING_MQTT,
CONNECTED,
OTA_UPDATING
};
State currentState = State::DISCONNECTED;
void loop() {
switch (currentState) {
case State::DISCONNECTED:
WiFi.begin(SSID, PASSWORD);
currentState = State::CONNECTING_WIFI;
break;
case State::CONNECTING_WIFI:
if (WiFi.status() == WL_CONNECTED) {
currentState = State::CONNECTED_WIFI;
}
break;
case State::CONNECTED_WIFI:
// Start (or retry) the MQTT connection; CONNECTING_MQTT checks the result
mqtt.connect(CLIENT_ID);
currentState = State::CONNECTING_MQTT;
break;
case State::CONNECTING_MQTT:
if (mqtt.connected()) {
currentState = State::CONNECTED;
mqtt.subscribe("device/ota");
} else if (mqttConnectFailed) {
currentState = State::CONNECTED_WIFI; // Fall back, try again
}
break;
case State::CONNECTED:
mqtt.loop();
readAndPublishSensor();
if (otaStarted) {
mqtt.unsubscribe("device/ota");
currentState = State::OTA_UPDATING;
}
break;
case State::OTA_UPDATING:
ArduinoOTA.handle();
if (otaDone) {
currentState = State::CONNECTED;
}
break;
}
}
This is better than boolean flags. But it will not stay clean. As the state machine grows, the switch cases grow. Transition logic is scattered across case blocks. Entry actions (things you do when entering a state) get duplicated — you might transition to CONNECTED from two different states, and you have to remember to put the subscription call in both paths. Adding a new event (say, WiFi drops while MQTT-connecting) means finding every case that needs to handle it.
The transition table pattern
Instead of scattering logic across switch cases, make the entire FSM visible as a data structure:
struct Transition {
State from;
Event event;
State to;
std::function<void()> action; // Runs on this specific transition
};
Every handled state/event combination is a row in a table; combinations with no row are ignored. You add a new state by adding a new enum value and new rows. You can read the entire FSM by reading the table.
Here is what that looks like in practice:
#include <Arduino.h>
#include <WiFi.h>
#include <PubSubClient.h>
#include <ArduinoOTA.h>
#include <functional>
#include <vector>
// ── State and Event definitions ───────────────────────────────────────────────
enum class State {
DISCONNECTED,
CONNECTING_WIFI,
CONNECTED_WIFI,
CONNECTING_MQTT,
CONNECTED,
RECONNECTING,
OTA_UPDATING
};
enum class Event {
TICK, // Periodic heartbeat from loop
WIFI_CONNECTED,
WIFI_DISCONNECTED,
MQTT_CONNECTED,
MQTT_DISCONNECTED,
MQTT_CONNECT_FAILED,
OTA_START,
OTA_COMPLETE,
OTA_ERROR
};
// ── Forward declarations ───────────────────────────────────────────────────────
static void onEnterDisconnected();
static void onEnterConnectingWifi();
static void onEnterConnectedWifi();
static void onEnterConnectingMqtt();
static void onEnterConnected();
static void onEnterReconnecting();
static void onEnterOtaUpdating();
static void onExitConnected();
// ── FSM engine ────────────────────────────────────────────────────────────────
struct Transition {
State from;
Event event;
State to;
std::function<void()> action; // Optional, runs on this transition
};
// State entry actions — called whenever we enter a state, regardless of
// which transition brought us here. This is where the real power is.
using StateAction = std::function<void()>;
struct StateDescriptor {
State state;
StateAction onEnter;
StateAction onExit;
};
class FSM {
public:
FSM(State initial,
std::vector<StateDescriptor> stateDescriptors,
std::vector<Transition> transitions)
: _current(initial),
_descriptors(std::move(stateDescriptors)),
_transitions(std::move(transitions))
{}
void start() {
runEntry(_current);
}
void dispatch(Event event) {
for (const auto &t : _transitions) {
if (t.from == _current && t.event == event) {
Serial.printf("[FSM] %s + %s → %s\n",
stateName(_current), eventName(event), stateName(t.to));
runExit(_current);
if (t.action) t.action();
_current = t.to;
runEntry(_current);
return;
}
}
// No transition found — event is silently ignored in this state
// Uncomment for debugging:
// Serial.printf("[FSM] No transition: %s + %s\n",
// stateName(_current), eventName(event));
}
State current() const { return _current; }
private:
State _current;
std::vector<StateDescriptor> _descriptors;
std::vector<Transition> _transitions;
void runEntry(State s) {
for (const auto &d : _descriptors) {
if (d.state == s && d.onEnter) { d.onEnter(); return; }
}
}
void runExit(State s) {
for (const auto &d : _descriptors) {
if (d.state == s && d.onExit) { d.onExit(); return; }
}
}
static const char* stateName(State s) {
switch (s) {
case State::DISCONNECTED: return "DISCONNECTED";
case State::CONNECTING_WIFI: return "CONNECTING_WIFI";
case State::CONNECTED_WIFI: return "CONNECTED_WIFI";
case State::CONNECTING_MQTT: return "CONNECTING_MQTT";
case State::CONNECTED: return "CONNECTED";
case State::RECONNECTING: return "RECONNECTING";
case State::OTA_UPDATING: return "OTA_UPDATING";
}
return "UNKNOWN";
}
static const char* eventName(Event e) {
switch (e) {
case Event::TICK: return "TICK";
case Event::WIFI_CONNECTED: return "WIFI_CONNECTED";
case Event::WIFI_DISCONNECTED: return "WIFI_DISCONNECTED";
case Event::MQTT_CONNECTED: return "MQTT_CONNECTED";
case Event::MQTT_DISCONNECTED: return "MQTT_DISCONNECTED";
case Event::MQTT_CONNECT_FAILED:return "MQTT_CONNECT_FAILED";
case Event::OTA_START: return "OTA_START";
case Event::OTA_COMPLETE: return "OTA_COMPLETE";
case Event::OTA_ERROR: return "OTA_ERROR";
}
return "UNKNOWN";
}
};
// ── Application globals ───────────────────────────────────────────────────────
static const char* WIFI_SSID = "YourSSID";
static const char* WIFI_PASSWORD = "YourPassword";
static const char* MQTT_BROKER = "192.168.1.10";
static const char* MQTT_CLIENT = "esp32-fsm-demo";
WiFiClient wifiClient;
PubSubClient mqtt(wifiClient);
FSM* g_fsm = nullptr;
static QueueHandle_t g_eventQueue;
// ── Entry/exit actions ────────────────────────────────────────────────────────
static void onEnterDisconnected() {
Serial.println("[State] DISCONNECTED — stopping WiFi");
WiFi.disconnect(true);
}
static void onEnterConnectingWifi() {
Serial.printf("[State] CONNECTING_WIFI — WiFi.begin(%s)\n", WIFI_SSID);
WiFi.begin(WIFI_SSID, WIFI_PASSWORD);
}
static void onEnterConnectedWifi() {
Serial.printf("[State] CONNECTED_WIFI — IP: %s\n",
WiFi.localIP().toString().c_str());
// Immediately try MQTT — post event to queue
Event e = Event::TICK; // TICK in CONNECTED_WIFI state triggers MQTT connect
xQueueSend(g_eventQueue, &e, 0);
}
static void onEnterConnectingMqtt() {
Serial.printf("[State] CONNECTING_MQTT — connecting to %s\n", MQTT_BROKER);
mqtt.setServer(MQTT_BROKER, 1883);
if (mqtt.connect(MQTT_CLIENT)) {
Event e = Event::MQTT_CONNECTED;
xQueueSend(g_eventQueue, &e, 0);
} else {
Event e = Event::MQTT_CONNECT_FAILED;
xQueueSend(g_eventQueue, &e, 0);
}
}
static void onEnterConnected() {
Serial.println("[State] CONNECTED — subscribing to topics");
mqtt.subscribe("device/cmd");
mqtt.subscribe("device/ota/trigger");
}
static void onExitConnected() {
Serial.println("[State] Leaving CONNECTED — publishing offline status");
mqtt.publish("device/status", "offline", true); // Retained message
mqtt.unsubscribe("device/cmd");
mqtt.unsubscribe("device/ota/trigger");
}
static void onEnterReconnecting() {
Serial.println("[State] RECONNECTING — will retry WiFi in 5s");
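// Nothing in this listing posts that TICK yet: add a timer (for example a
// FreeRTOS software timer) that sends Event::TICK after the delay, otherwise
// the FSM stays in RECONNECTING.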
}
static void onEnterOtaUpdating() {
Serial.println("[State] OTA_UPDATING — suspending sensor tasks");
// Pause or delete sensor tasks that shouldn't run during OTA
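// Report the OTA outcome back to the FSM. ArduinoOTA normally reboots after a
// successful update, so OTA_COMPLETE mainly matters if that reboot is disabled.
ArduinoOTA.onEnd([]() {
Event e = Event::OTA_COMPLETE;
xQueueSend(g_eventQueue, &e, 0);
});
ArduinoOTA.onError([](ota_error_t) {
Event e = Event::OTA_ERROR;
xQueueSend(g_eventQueue, &e, 0);
});
ArduinoOTA.begin();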
}
// ── FSM construction ──────────────────────────────────────────────────────────
// This table IS the FSM. Every handled (state, event) → new state is a row;
// pairs that are not listed are ignored by dispatch().
// Adding OTA required: 4 new enum values (1 state, 3 events) + 4 new rows
// + 1 entry action. Nothing else.
static std::vector<StateDescriptor> buildStateDescriptors() {
return {
{ State::DISCONNECTED, onEnterDisconnected, nullptr },
{ State::CONNECTING_WIFI, onEnterConnectingWifi, nullptr },
{ State::CONNECTED_WIFI, onEnterConnectedWifi, nullptr },
{ State::CONNECTING_MQTT, onEnterConnectingMqtt, nullptr },
{ State::CONNECTED, onEnterConnected, onExitConnected },
{ State::RECONNECTING, onEnterReconnecting, nullptr },
{ State::OTA_UPDATING, onEnterOtaUpdating, nullptr },
};
}
static std::vector<Transition> buildTransitionTable() {
return {
// From state Event To state Action
{ State::DISCONNECTED, Event::TICK, State::CONNECTING_WIFI, nullptr },
{ State::CONNECTING_WIFI, Event::WIFI_CONNECTED, State::CONNECTED_WIFI, nullptr },
{ State::CONNECTING_WIFI, Event::TICK, State::RECONNECTING, nullptr }, // Timeout
{ State::CONNECTED_WIFI, Event::TICK, State::CONNECTING_MQTT, nullptr },
{ State::CONNECTING_MQTT, Event::MQTT_CONNECTED, State::CONNECTED, nullptr },
{ State::CONNECTING_MQTT, Event::MQTT_CONNECT_FAILED, State::CONNECTED_WIFI, nullptr },
{ State::CONNECTED, Event::MQTT_DISCONNECTED,State::CONNECTING_MQTT, nullptr },
{ State::CONNECTED, Event::WIFI_DISCONNECTED,State::RECONNECTING, nullptr },
{ State::CONNECTED, Event::OTA_START, State::OTA_UPDATING, nullptr },
{ State::RECONNECTING, Event::TICK, State::CONNECTING_WIFI, nullptr },
{ State::OTA_UPDATING, Event::OTA_COMPLETE, State::CONNECTED, nullptr },
{ State::OTA_UPDATING, Event::OTA_ERROR, State::RECONNECTING, nullptr },
{ State::OTA_UPDATING, Event::WIFI_DISCONNECTED,State::RECONNECTING, nullptr },
};
}
// ── FreeRTOS tasks ────────────────────────────────────────────────────────────
// FSM task: consumes events from the queue and dispatches them
void fsmTask(void *parameter) {
Event event;
for (;;) {
if (xQueueReceive(g_eventQueue, &event, pdMS_TO_TICKS(100)) == pdTRUE) {
g_fsm->dispatch(event);
}
}
}
// Sensor task: reads sensors and publishes. Only runs meaningful work
// when FSM is in CONNECTED state. Doesn't need to know about WiFi or MQTT.
void sensorTask(void *parameter) {
for (;;) {
if (g_fsm && g_fsm->current() == State::CONNECTED && mqtt.connected()) {
float temperature = 23.5f + (float)(random(-20, 20)) / 10.0f;
char payload[64];
snprintf(payload, sizeof(payload), "{\"temp\":%.1f}", temperature);
mqtt.publish("device/telemetry", payload);
Serial.printf("[Sensor] Published: %s\n", payload);
}
vTaskDelay(pdMS_TO_TICKS(10000));
}
}
// WiFi event handler: pushes events into the queue from WiFi callbacks
void onWiFiEvent(WiFiEvent_t event) {
Event fsmEvent;
bool sendEvent = true;
switch (event) {
case ARDUINO_EVENT_WIFI_STA_GOT_IP: // fires once an IP is assigned, so localIP() is valid
fsmEvent = Event::WIFI_CONNECTED;
break;
case ARDUINO_EVENT_WIFI_STA_DISCONNECTED:
fsmEvent = Event::WIFI_DISCONNECTED;
break;
default:
sendEvent = false;
break;
}
if (sendEvent) {
// xQueueSendFromISR if needed, but WiFi events run in their own task
xQueueSend(g_eventQueue, &fsmEvent, 0);
}
}
// MQTT callback: receives messages, can trigger FSM events
void mqttCallback(char* topic, byte* payload, unsigned int length) {
String msg;
for (unsigned int i = 0; i < length; i++) msg += (char)payload[i];
Serial.printf("[MQTT] Message on %s: %s\n", topic, msg.c_str());
if (String(topic) == "device/ota/trigger" && msg == "start") {
Event e = Event::OTA_START;
xQueueSend(g_eventQueue, &e, 0);
}
}
// ── Setup ─────────────────────────────────────────────────────────────────────
void setup() {
Serial.begin(115200);
Serial.println("[Boot] ESP32 FSM connection manager");
// Event queue: holds up to 16 events
g_eventQueue = xQueueCreate(16, sizeof(Event));
// Build and start the FSM
g_fsm = new FSM(
State::DISCONNECTED,
buildStateDescriptors(),
buildTransitionTable()
);
WiFi.onEvent(onWiFiEvent);
mqtt.setCallback(mqttCallback);
g_fsm->start();
// Kick the FSM into motion
Event e = Event::TICK;
xQueueSend(g_eventQueue, &e, 0);
xTaskCreatePinnedToCore(fsmTask, "fsm", 4096, NULL, 3, NULL, 0);
xTaskCreatePinnedToCore(sensorTask, "sensor", 4096, NULL, 1, NULL, 1);
}
void loop() {
// ArduinoOTA.handle() needs to run in main loop during OTA
if (g_fsm && g_fsm->current() == State::OTA_UPDATING) {
ArduinoOTA.handle();
}
mqtt.loop();
vTaskDelay(pdMS_TO_TICKS(10));
}
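The listing above builds its tables from std::vector and std::function, which is convenient but relies on heap allocation at startup. On a tightly constrained target, one alternative is a const array of rows with plain function pointers. The following is a minimal host-side sketch of that variant, using a hypothetical three-state machine; `Action`, `kTable`, `step`, and `onStart` are illustrative names, not part of the listing.

```cpp
#include <cassert>
#include <cstddef>

enum class State { IDLE, RUNNING, DONE };
enum class Event { START, FINISH, RESET };

using Action = void (*)();  // plain function pointer, no heap allocation

struct Transition {
    State from;
    Event event;
    State to;
    Action action;          // may be nullptr
};

static int g_startCount = 0;
static void onStart() { ++g_startCount; }

// The whole machine as a read-only table.
static const Transition kTable[] = {
    { State::IDLE,    Event::START,  State::RUNNING, onStart },
    { State::RUNNING, Event::FINISH, State::DONE,    nullptr },
    { State::DONE,    Event::RESET,  State::IDLE,    nullptr },
};

// Look up the row for (current, event), run its action, return the new state.
// Returns `current` unchanged when no row matches.
State step(State current, Event event) {
    for (const Transition &t : kTable) {
        if (t.from == current && t.event == event) {
            if (t.action) t.action();
            return t.to;
        }
    }
    return current;
}
```

The trade-off is that function pointers cannot capture state the way lambdas stored in std::function can, so actions have to reach their data through globals or an explicit context pointer.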
Entry and exit actions: where the real power is
Most FSM tutorials show only transition actions — code that runs on a specific arc from state A to state B. Entry and exit actions are more powerful.
An entry action runs every time you enter a state, regardless of which transition brought you there. This means WiFi.begin() is written once in the CONNECTING_WIFI entry action, not in every transition that targets CONNECTING_WIFI.
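An exit action runs every time you leave a state, regardless of where you’re going. In the example above, onExitConnected() publishes an offline status and unsubscribes from topics. This happens whether we’re leaving CONNECTED due to WiFi drop, MQTT disconnect, or OTA starting. One function, not three copies. Note that after a WiFi drop or MQTT disconnect those calls cannot reach the broker, so treat them as best effort; an MQTT Last Will message is the usual way to signal an unexpected disconnect.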
The duplication trap: without entry/exit actions, every transition into CONNECTED must call mqtt.subscribe(). Every transition out of CONNECTED must call mqtt.unsubscribe(). In the table above, with two transitions in and three out, that is five places to remember. Entry/exit reduces this to one each.
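The order in which the hooks fire matters as much as their existence. The sketch below is a host-side illustration with two hypothetical states and a `g_trace` log; `Hooks` and `transition` are assumed names. It follows the same ordering as dispatch() in the listing: exit the old state, run the transition action, then enter the new state.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

enum class State { A, B };

std::vector<std::string> g_trace;  // records the order in which hooks run

struct Hooks {
    std::function<void()> onEnter;
    std::function<void()> onExit;
};

// Hypothetical per-state hooks, indexed by the enum's underlying value.
Hooks g_hooks[] = {
    /* A */ { [] { g_trace.push_back("enter A"); }, [] { g_trace.push_back("exit A"); } },
    /* B */ { [] { g_trace.push_back("enter B"); }, [] { g_trace.push_back("exit B"); } },
};

// Exit the old state, run the transition action, then enter the new state.
State transition(State from, State to, const std::function<void()> &action) {
    Hooks &oldHooks = g_hooks[static_cast<int>(from)];
    Hooks &newHooks = g_hooks[static_cast<int>(to)];
    if (oldHooks.onExit) oldHooks.onExit();
    if (action) action();
    if (newHooks.onEnter) newHooks.onEnter();
    return to;
}
```

Calling `transition(State::A, State::B, ...)` with an action that logs "action" leaves the trace as "exit A", "action", "enter B", whichever caller requested the move.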
Illegal states become impossible
The boolean flag approach can represent wifiConnected=false && mqttConnected=true. There is no MQTT without WiFi, so this state is impossible in reality but representable in code. It leads to bugs.
With the FSM transition table above, the state CONNECTED_MQTT_WITHOUT_WIFI does not exist in the enum. It cannot be reached. To get to CONNECTING_MQTT, you must pass through CONNECTED_WIFI. To get to CONNECTED, you must pass through CONNECTING_MQTT. The preconditions are embedded in the structure of the transition graph.
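Because the table is data, claims like these can be checked mechanically rather than by reading code. The following host-side sketch copies the (from, to) pairs of the transition table, with events dropped and duplicates merged, and asks whether a target state can be reached while a given state is forbidden. `Edge`, `kEdges`, `reachableAvoiding`, and the extra `COUNT` enum value are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class State {
    DISCONNECTED, CONNECTING_WIFI, CONNECTED_WIFI, CONNECTING_MQTT,
    CONNECTED, RECONNECTING, OTA_UPDATING, COUNT
};

struct Edge { State from; State to; };

// The (from, to) pairs of the transition table, events omitted.
static const std::vector<Edge> kEdges = {
    { State::DISCONNECTED,    State::CONNECTING_WIFI },
    { State::CONNECTING_WIFI, State::CONNECTED_WIFI  },
    { State::CONNECTING_WIFI, State::RECONNECTING    },
    { State::CONNECTED_WIFI,  State::CONNECTING_MQTT },
    { State::CONNECTING_MQTT, State::CONNECTED       },
    { State::CONNECTING_MQTT, State::CONNECTED_WIFI  },
    { State::CONNECTED,       State::CONNECTING_MQTT },
    { State::CONNECTED,       State::RECONNECTING    },
    { State::CONNECTED,       State::OTA_UPDATING    },
    { State::RECONNECTING,    State::CONNECTING_WIFI },
    { State::OTA_UPDATING,    State::CONNECTED       },
    { State::OTA_UPDATING,    State::RECONNECTING    },
};

// True if `target` can be reached from `start` without ever entering `blocked`.
// Pass State::COUNT as `blocked` for a plain reachability check.
bool reachableAvoiding(State start, State target, State blocked) {
    const std::size_t n = static_cast<std::size_t>(State::COUNT);
    std::vector<bool> seen(n, false);
    std::vector<State> frontier{ start };
    seen[static_cast<std::size_t>(start)] = true;
    while (!frontier.empty()) {
        State s = frontier.back();
        frontier.pop_back();
        if (s == target) return true;
        for (const Edge &e : kEdges) {
            if (e.from != s || e.to == blocked) continue;
            std::size_t i = static_cast<std::size_t>(e.to);
            if (!seen[i]) { seen[i] = true; frontier.push_back(e.to); }
        }
    }
    return false;
}
```

With CONNECTED_WIFI blocked, CONNECTING_MQTT is unreachable from DISCONNECTED, and with CONNECTING_MQTT blocked, CONNECTED is unreachable. That is the "must pass through" property stated above, verified against the table itself.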
Adding OTA: the comparison
Boolean flag approach: to add OTA, you add bool otaInProgress. You find every place that reads sensors and add && !otaInProgress. You find every place that publishes MQTT and add && !otaInProgress. You find every WiFi reconnect path and decide whether OTA should be cancelled. You touch the code in a dozen places and miss two.
FSM table approach: you add OTA_UPDATING to the state enum and OTA_START/OTA_COMPLETE/OTA_ERROR to the event enum. You add four rows to the transition table. The entry action for OTA_UPDATING is where sensor tasks would be paused, and the CONNECTED entry action where they would be resumed; in the listing above the sensor task simply checks g_fsm->current() == State::CONNECTED and does nothing in any other state. Total changes: two enums, four table rows, one entry action. Nothing else.
FreeRTOS integration
The pattern shown above puts the FSM in its own task and uses a FreeRTOS queue as the event conduit. Other tasks (WiFi event handler, MQTT callback, button ISR) push events into the queue. The FSM task consumes them.
This is clean because:
- Events from interrupt context use xQueueSendFromISR(), so the handler makes no ISR-unsafe calls.
- Event ordering is preserved by the FIFO queue.
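- The FSM always runs in its own task context where it is safe to call library functions. Objects shared with other tasks still need a single owner or a mutex: in the listing, mqtt is also called from sensorTask and loop(), and PubSubClient does no locking of its own.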
- Tasks that produce events do not need to know about each other or about the FSM’s internal state.
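Queue size: size the event queue generously. A queue of 16 events is cheap (64 bytes of storage for 4-byte events on a 32-bit system, plus a small fixed queue header). The listing calls xQueueSend() with a zero timeout and ignores the return value, so when the queue is full the event is dropped silently, which causes hard-to-reproduce bugs. Check the return value and at least log a failed send. If your FSM has more than 16 events queued at once, something else is wrong.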
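The drop behaviour is easy to reproduce off-target. The sketch below is a host-side stand-in for a FreeRTOS queue, not the FreeRTOS API: `EventQueue`, `trySend`, `tryReceive`, and `dropped` are hypothetical names, and the class is not thread-safe. It refuses a send when full and counts the loss instead of hiding it.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

enum class Event { TICK, WIFI_CONNECTED, WIFI_DISCONNECTED };

// Fixed-capacity FIFO, a single-threaded stand-in for a FreeRTOS queue.
template <std::size_t N>
class EventQueue {
public:
    // Returns false (and counts a drop) when the queue is full,
    // mirroring a zero-timeout send that fails.
    bool trySend(Event e) {
        if (_count == N) { ++_dropped; return false; }
        _buf[(_head + _count) % N] = e;
        ++_count;
        return true;
    }

    // Pops the oldest event into `out`; returns false when empty.
    bool tryReceive(Event &out) {
        if (_count == 0) return false;
        out = _buf[_head];
        _head = (_head + 1) % N;
        --_count;
        return true;
    }

    std::size_t dropped() const { return _dropped; }

private:
    std::array<Event, N> _buf{};
    std::size_t _head = 0, _count = 0, _dropped = 0;
};
```

On the device, the equivalent habit is to compare the result of xQueueSend() against pdTRUE and increment a counter or log a line when it fails, so that a dropped event shows up in diagnostics.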
Going further
The pattern above handles a flat set of states well. If you start adding more states and notice that several states share common behavior — for example, all CONNECTED sub-states should respond to WIFI_DISCONNECTED the same way — you are hitting the limits of flat FSMs. Hierarchical state machines let parent states define default transitions that child states inherit.
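The core of the hierarchical idea fits in a few lines: when the current state has no row for an event, look at its parent. This is a rough sketch with a hypothetical state layout; `parentOf`, `kTable`, and `dispatch` are illustrative names and do not reflect PulseHSM's API.

```cpp
#include <cassert>
#include <vector>

enum class State { NONE, OFFLINE, ONLINE, ONLINE_IDLE, ONLINE_PUBLISHING };
enum class Event { WIFI_DISCONNECTED, PUBLISH, DONE };

struct Transition { State from; Event event; State to; };

// Hypothetical hierarchy: both ONLINE_* states are children of ONLINE.
State parentOf(State s) {
    switch (s) {
        case State::ONLINE_IDLE:
        case State::ONLINE_PUBLISHING: return State::ONLINE;
        default:                       return State::NONE;
    }
}

static const std::vector<Transition> kTable = {
    // Defined once on the parent, inherited by every child.
    { State::ONLINE,            Event::WIFI_DISCONNECTED, State::OFFLINE           },
    { State::ONLINE_IDLE,       Event::PUBLISH,           State::ONLINE_PUBLISHING },
    { State::ONLINE_PUBLISHING, Event::DONE,              State::ONLINE_IDLE       },
};

// Try the current state first, then walk up through its ancestors.
State dispatch(State current, Event event) {
    for (State s = current; s != State::NONE; s = parentOf(s)) {
        for (const Transition &t : kTable) {
            if (t.from == s && t.event == event) return t.to;
        }
    }
    return current;  // no state in the chain handles this event
}
```

A full hierarchical machine also has to run exit and entry actions for every level crossed, which this sketch leaves out.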
If you find yourself adding more states, or needing hierarchical states like this, that is exactly what PulseHSM formalises — but the pattern above will take you far on real projects.
What’s next
Now that the FSM manages system state cleanly, there’s a new question: how do you know if the firmware is actually working correctly? A WiFi connection might stay up but never receive data. The MQTT client might be connected but the broker might be dropping messages silently. The next post covers the ESP32 Task Watchdog Timer — and specifically how to wire the FSM’s current state into a health monitor that only feeds the watchdog when the firmware is genuinely healthy, not just alive.
The transition table pattern is easy to unit test: build the FSM, dispatch events, assert current state. No hardware required.
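A host-side test can be as small as a table and a loop. The sketch below copies the connection rows of the transition table, leaving out the OTA and MQTT_DISCONNECTED rows for brevity, and uses a hypothetical `run` helper that feeds a sequence of events and returns the final state. The FSM class from the listing would need Serial stubbed out before it compiles on a desktop, so this example works on the bare table instead.

```cpp
#include <cassert>
#include <initializer_list>
#include <vector>

enum class State {
    DISCONNECTED, CONNECTING_WIFI, CONNECTED_WIFI,
    CONNECTING_MQTT, CONNECTED, RECONNECTING
};
enum class Event {
    TICK, WIFI_CONNECTED, WIFI_DISCONNECTED,
    MQTT_CONNECTED, MQTT_CONNECT_FAILED
};

struct Transition { State from; Event event; State to; };

// Subset of the transition table: connection path only.
static const std::vector<Transition> kTable = {
    { State::DISCONNECTED,    Event::TICK,                State::CONNECTING_WIFI },
    { State::CONNECTING_WIFI, Event::WIFI_CONNECTED,      State::CONNECTED_WIFI  },
    { State::CONNECTING_WIFI, Event::TICK,                State::RECONNECTING    },
    { State::CONNECTED_WIFI,  Event::TICK,                State::CONNECTING_MQTT },
    { State::CONNECTING_MQTT, Event::MQTT_CONNECTED,      State::CONNECTED       },
    { State::CONNECTING_MQTT, Event::MQTT_CONNECT_FAILED, State::CONNECTED_WIFI  },
    { State::CONNECTED,       Event::WIFI_DISCONNECTED,   State::RECONNECTING    },
    { State::RECONNECTING,    Event::TICK,                State::CONNECTING_WIFI },
};

// Feed a sequence of events and return the state the machine ends in.
// Events with no matching row are ignored, as in dispatch().
State run(State start, std::initializer_list<Event> events) {
    State s = start;
    for (Event e : events) {
        for (const Transition &t : kTable) {
            if (t.from == s && t.event == e) { s = t.to; break; }
        }
    }
    return s;
}
```

A test case then reads like the scenario it checks, for example `run(State::DISCONNECTED, {Event::TICK, Event::WIFI_CONNECTED, Event::TICK, Event::MQTT_CONNECTED}) == State::CONNECTED`.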