Heartbeat & Initiative · diagnostic note
Heartbeat Failure: The Gateway Zombie
A detailed note on why the April 13 evening parable heartbeat failed to run, the trail we followed to prove it, and the infrastructure lockup we found at the bottom.
On the evening of April 13, 2026, we deployed a new heartbeat preset designed to trigger only between 8:30 PM and 9:30 PM. Its job was to generate a "parable" artifact (image, voice, text), publish it to GitHub Pages, and send a Telegram notification. The window passed, but no artifact was created and no notification arrived.
1. Following the Evidence Trail
The next morning, we set out to understand why it failed. Our earlier diagnostics (from April 9) proved that successful heartbeat runs do not leave traces in the main chat transcript. Instead, they run in isolated background sessions and leave dedicated .jsonl logs in ~/.openclaw/agents/main/sessions/.
We searched the local session directory for any isolated sessions that fired during the 8:00 PM to 10:00 PM window on April 13.
2. The Missing Signal
We found absolutely zero heartbeat runs. The only background sessions firing during that window were from the internal Memory Dreaming Promotion cron job. There was no trace of the heartbeat runner attempting to read HEARTBEAT.md or execute its logic. The instructions were correct, but the runner was simply absent.
3. Discovering the Zombie Daemon
Because the heartbeat relies on the OpenClaw Gateway scheduler to trigger periodic ticks, we moved our search from the application logs to the host OS processes. Running a simple ps auxww | grep openclaw revealed the true culprit:
The openclaw-gateway background daemon was in a zombie/hung state. It had been running since April 11th, had chewed through ~61% of available RAM (24GB), and had completely frozen. Because the scheduler was locked up, the heartbeat tick never fired at 8:30 PM.
4. Resolution
The issue was not a configuration error, a prompt problem, or a silent model fallback. It was a straight infrastructure failure. We ran openclaw gateway restart to kill the hung process and bring the scheduler back online.
Lesson: When expected autonomy fails to trigger entirely, check the daemon. Even the best logic cannot execute if the clock itself has stopped.