MongoDB kept segfaulting and Gemini saved me
My local MongoDB instance started crashing today, seemingly at random, so I spent an hour debugging it with Gemini. I’m writing it up here for posterity.
What I saw
systemctl status mongodb.service showed this:
× mongodb.service - MongoDB Database Server
Active: failed (Result: core-dump) since Tue 2026-03-10 09:32:44 CET
Process: 1064 ExecStart=/usr/bin/mongod --config /etc/mongodb.conf (code=dumped, signal=SEGV)
Main PID: 1064 (code=dumped, signal=SEGV)
The C++ stack trace pointed to JournalFlusher::run(), which is WiredTiger’s
thread for flushing write-ahead logs to disk. While crashing, MongoDB tried to
log what went wrong through boost::log, hit another memory error in the
process, and that’s what generated the dump.
I restarted the service and it came back up. Connected with mongosh, ran some
commands, everything seemed fine. Then about a minute later, mid-session:
MongoServerSelectionError: connect ECONNREFUSED 127.0.0.1:27017
Dead again: same SEGV, same thread.
First theory: ABI mismatch
The MongoDB log had no FATAL message before the crash; it just stopped. But
buried in the startup output was this:
"buildInfo": { "environment": { "distmod": "ubuntu2404" } }
I’m on Arch Linux, so I installed MongoDB via the mongodb80-bin AUR package,
which is a pre-compiled Ubuntu 24.04 binary. Arch is rolling release, so core
C++ libraries like boost get updated regularly, and the crash was consistently
happening inside boost::log, which seemed to point towards an ABI mismatch.
The binary was built against older Ubuntu versions of those libraries, so a
recent system update could have broken compatibility.
Gemini suggested switching to Docker to sidestep the library issue entirely,
since the official mongo:8.0 image bundles the Ubuntu environment the binary
was built for.
Docker didn’t help
I stopped and disabled the systemd service, then ran:
docker run --name mongodb -d -p 27017:27017 -v /var/lib/mongodb:/data/db mongo:8.0
Container came up, startup looked clean at first, connected with mongosh…
then ECONNREFUSED again. docker ps -a showed Exited (139).
The container was crashing with the exact same segfault.
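For context on that exit status: shells and Docker report a process killed by a signal as 128 plus the signal number, so 139 corresponds to signal 11, which is SIGSEGV. Here is a small sketch of that arithmetic; the function name exit_signal is made up for illustration.

```shell
# Hypothetical helper: map an exit status above 128 to the number of the
# signal that killed the process (status = 128 + signal number).
exit_signal() {
    code=$1
    if [ "$code" -gt 128 ]; then
        echo $((code - 128))
    else
        echo "status $code is not a signal exit" >&2
        return 1
    fi
}

exit_signal 139   # prints 11
```

Running kill -l 11 then prints the signal name, SEGV.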
I changed the Docker command a bit to fix a few issues: my config had bindIp: 127.0.0.1 which inside a container means the container’s own localhost (not the
host machine), and the default data path /data/db didn’t match what my config
file specified. Also needed --user 966:966 to match the mongodb user’s UID
on Arch so the container could write to the host files. Final command:
docker run --name mongodb -d \
-p 27017:27017 \
--user 966:966 \
-v /etc/mongodb.conf:/etc/mongodb.conf:ro \
-v /var/lib/mongodb:/var/lib/mongodb \
-v /var/log/mongodb:/var/log/mongodb \
mongo:8.0 \
mongod --config /etc/mongodb.conf --bind_ip_all
Same result: it crashed within seconds of a client connecting.
Maybe the data files are corrupted?
The database had previously been initialized as a replica set and then abandoned without a clean shutdown. Gemini suggested this could leave WiredTiger in a state that causes a hard crash in 8.0 when it tries to flush journal entries. I could easily rule it out by giving MongoDB a fresh start.
I moved the old data directory out of the way:
sudo mv /var/lib/mongodb /var/lib/mongodb_old
sudo mkdir /var/lib/mongodb
sudo chown mongodb:mongodb /var/lib/mongodb
I relaunched the container with the same command, but then it crashed again with the exact same exit code 139. An empty database inside a clean Docker container was segfaulting, so it wasn’t the data, wasn’t the Arch libraries, wasn’t the config…
What was actually killing it
This is something I would never have thought of if Gemini hadn’t pointed it out.
Going back to the dmesg output I’d checked earlier: I have a modern AMD CPU, and
looking at the very first crash stack trace again, one of the frames was
_ZN5mongo9transport15SessionWorkflow. This is MongoDB’s session handling code,
which, according to Gemini, uses C++ coroutines that work by rapidly swapping memory stacks.
Gemini flagged something I had no idea about: on recent AMD CPUs, newer Linux kernels can enable user-space Shadow Stacks (SHSTK), one half of what Intel calls Control-flow Enforcement Technology (CET). It’s a hardware security feature that keeps a protected copy of return addresses and checks every return against it, which blocks ROP attacks built from gadget chains. When MongoDB’s session handler swaps its coroutine stack, the CPU sees returns that no longer match the shadow stack and treats it as an attack, so the process is killed with SIGSEGV before MongoDB can write a single log entry.
This would explain everything:
- Crashes right after a client connects (session creation triggers the coroutine stack swap)
- No FATAL log message (hardware kill, no time to log anything)
- Docker making no difference (containers share the host kernel and CPU)
- A fresh empty database also crashing (nothing to do with data state)
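One way to check whether this can apply to a given machine, sketched under some assumptions: Linux 6.6 and later list a user_shstk flag in /proc/cpuinfo when the kernel supports user-space shadow stacks on that CPU. The helper name has_cpu_flag and its optional file argument are hypothetical, added only so the check can be tried against a sample file.

```shell
# Hypothetical helper: succeed if FLAG appears as a whole word on a
# "flags" line of a cpuinfo-style file (default: /proc/cpuinfo).
has_cpu_flag() {
    flag=$1
    file=${2:-/proc/cpuinfo}
    grep -E '^flags[[:space:]]*:' "$file" 2>/dev/null | grep -qw -- "$flag"
}

if has_cpu_flag user_shstk; then
    echo "kernel reports user-space shadow stack support"
else
    echo "no user_shstk flag found"
fi
```

The flag only says the feature is available. For a running process, the x86_Thread_features line in /proc/<pid>/status lists shstk when shadow stacks are actually active, and readelf -n /usr/bin/mongod shows whether the binary itself is marked SHSTK.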
The fix is to tell glibc to disable Shadow Stacks for the process:
GLIBC_TUNABLES=glibc.cpu.hwcaps=-SHSTK
For Docker:
docker run --name mongodb -d \
-p 27017:27017 \
-e GLIBC_TUNABLES="glibc.cpu.hwcaps=-SHSTK" \
--user 966:966 \
-v /etc/mongodb.conf:/etc/mongodb.conf:ro \
-v /var/lib/mongodb:/var/lib/mongodb \
-v /var/log/mongodb:/var/log/mongodb \
mongo:8.0 \
mongod --config /etc/mongodb.conf --bind_ip_all
For systemd, via an override so it survives package updates:
sudo systemctl edit mongodb.service
[Service]
Environment="GLIBC_TUNABLES=glibc.cpu.hwcaps=-SHSTK:glibc.pthread.rseq=0"
The second tunable glibc.pthread.rseq=0 was already being set by something
else in my systemd config. It disables Restartable Sequences, apparently needed
in some virtualization setups. Since GLIBC_TUNABLES is a single
colon-separated string, I had to combine them or one would override the other.
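Combining the tunables by hand is easy to get wrong, so here is a minimal sketch of the merge. The function name add_tunable is made up; it only does string joining and does not check that a tunable name is valid.

```shell
# Hypothetical helper: append one tunable to an existing GLIBC_TUNABLES
# string, joining with ":" and skipping an entry that is already present.
add_tunable() {
    current=$1
    new=$2
    case ":$current:" in
        *":$new:"*) echo "$current" ;;        # already present, keep as-is
        "::")       echo "$new" ;;            # nothing was set before
        *)          echo "$current:$new" ;;   # append with a colon
    esac
}

add_tunable "glibc.pthread.rseq=0" "glibc.cpu.hwcaps=-SHSTK"
```

The order of entries in the string doesn’t matter. To see what systemd will actually pass to the service, systemctl show -p Environment mongodb.service prints the merged environment.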
After restarting, MongoDB stayed up, and it no longer crashed when I ran commands.
I’m not 100% certain the SHSTK explanation is correct. I didn’t go deep enough into the crash to be sure that’s what’s happening, but adding that tunable stopped the segfaults, so that’s where I’m leaving it.
Some more links which might be relevant: