Hi all, I’ve got a cheap Celeron box running OPNSense and it’s been pretty good so far, but I found twice that the device turned off at some point while I was at work, and I have been unable to figure out what’s causing it.
The only change was that I enabled Monit to see if I could figure out what was causing crowdsec to stop sometimes but never ended up configuring anything. I’ve only been running it for a couple months though, so it’s possible that that is not related.
I know that on a Mac (based on freebsd, right?) you can determine whether the shutdown reason was a hard shutdown, regular shutdown, or the power cable being unplugged. Is it possible to do that with OPNSense? I’d like to narrow it down to software or hardware ideally.
If you have a Pi or Linux box, try setting it up as a syslog server, then tell OPNsense to forward its logs there. It doesn’t guarantee you’ll see what went wrong, but it might help.
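If you just want something quick to catch the forwarded logs, here is a minimal sketch of a UDP syslog receiver in Python. It assumes the remote logging target on the firewall is set to UDP; the port (5514, chosen so it doesn't need root) and the output file name are placeholders I made up, not anything OPNsense requires. A proper syslog daemon is the better long-term answer.

```python
import socketserver
from datetime import datetime, timezone

LOG_PATH = "opnsense-remote.log"  # hypothetical output file, change as needed


def format_record(sender_ip, payload, received_at=None):
    """Turn one UDP datagram into a single line stamped with the receive time."""
    received_at = received_at or datetime.now(timezone.utc)
    text = payload.decode("utf-8", errors="replace").rstrip("\r\n")
    return f"{received_at.isoformat()} {sender_ip} {text}"


class SyslogHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # For UDP servers, self.request is a (data, socket) pair.
        data, _sock = self.request
        line = format_record(self.client_address[0], data)
        # Append and close on every message so the file survives a power cut
        # on the receiving side as well as possible.
        with open(LOG_PATH, "a", encoding="utf-8") as fh:
            fh.write(line + "\n")


def serve(host="0.0.0.0", port=5514):
    """Listen for syslog datagrams until interrupted (port is an assumption)."""
    with socketserver.UDPServer((host, port), SyslogHandler) as server:
        server.serve_forever()
```

Call `serve()` on the Pi and point the firewall at that IP and port. Because each line carries the receiver's own timestamp, the last entry before a silent gap tells you roughly when the box went down, even if the firewall's disk never got the chance to write it.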
I’m not sure opnsense has journalctl or something similar, but that would be a good place to look for some history, too.
I would SSH into the OPNsense box, press 8 to open a shell, then run dmesg and scroll back to the time the server rebooted. There you can see the events leading up to the shutdown.
Dmesg doesn’t go back very far, does it? I only see the current boot and the one before that, which was a normal shutdown.
I believe I was able to see the last logs before the system turned off last time, and the last messages were syncing discs and all buffers synced, which I would have expected to be part of a normal shutdown.
If it happens again I’ll be sure to grab the logs from before the crash or shutdown and save them to a file.
UPDATE: It crashed again today, and I was able to pull some logs and check the temperature at the time of the crash (91 degrees, which dropped to 71 degrees right before crashing?).
From system log
<13>1 2024-03-13T18:30:44-04:00 OPNsense.my.home opnsense 44846 - [meta sequenceId="1192"] /usr/local/etc/rc.newwanipv6: No IP change detected (current: IPV6ADDRESSREDACTED, interface: wan)
<13>1 2024-03-13T18:30:53-04:00 OPNsense.my.home opnsense 60522 - [meta sequenceId="1193"] /usr/local/etc/rc.newwanipv6: No IP change detected (current: IPV6ADDRESSREDACTED, interface: wan)
<45>1 2024-03-13T22:12:44-04:00 OPNsense.my.home syslog-ng 10182 - [meta sequenceId="1"] syslog-ng starting up; version='4.6.0'
<13>1 2024-03-13T22:12:45-04:00 OPNsense.my.home kernel - - [meta sequenceId="2"] ---<<BOOT>>---
<13>1 2024-03-13T22:12:45-04:00 OPNsense.my.home kernel - - [meta sequenceId="138"] WARNING: / was not properly dismounted
From dmesg
arp: 192.168.1.61 moved from someMAC to anotherMAC on igc1
arp: 192.168.1.61 moved from anotherMAC to someMAC on igc1
WARNING: / was not properly dismounted
WARNING: /: mount pending error: blocks 40 files 4
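For anyone wanting to check their own system log the same way, here is a rough Python sketch that looks for long silent gaps and notes whether the log resumes with a boot marker and the "was not properly dismounted" warning. It assumes records in the same format as the system log excerpt above; the function names, the 5-minute gap threshold, and the 10-minute look-ahead window are arbitrary choices of mine.

```python
import re
from datetime import datetime, timedelta

# Start of each record as it appears in the excerpt: "<PRI>1 TIMESTAMP ..."
RECORD_START = re.compile(r"<\d+>1 (\S+) ")
BOOT_MARKER = "---<<BOOT>>---"
UNCLEAN_MARKER = "was not properly dismounted"


def split_records(text):
    """Return (timestamp, raw_record) pairs, even if records share one line."""
    starts = list(RECORD_START.finditer(text))
    records = []
    for i, match in enumerate(starts):
        end = starts[i + 1].start() if i + 1 < len(starts) else len(text)
        stamp = datetime.fromisoformat(match.group(1))
        records.append((stamp, text[match.start():end].strip()))
    return records


def find_outages(text, min_gap=timedelta(minutes=5),
                 window=timedelta(minutes=10)):
    """List silent gaps of at least min_gap and what the log says right after."""
    records = split_records(text)
    outages = []
    for (prev_ts, _), (next_ts, _) in zip(records, records[1:]):
        if next_ts - prev_ts < min_gap:
            continue
        # Records logged shortly after the system started talking again.
        after = [raw for ts, raw in records
                 if next_ts <= ts <= next_ts + window]
        outages.append({
            "last_seen": prev_ts,
            "first_seen": next_ts,
            "silent_for": next_ts - prev_ts,
            "rebooted": any(BOOT_MARKER in raw for raw in after),
            "unclean": any(UNCLEAN_MARKER in raw for raw in after),
        })
    return outages
```

A gap followed by both a boot marker and the unclean-dismount warning, with no shutdown messages before it, points to a hard crash or power loss rather than an orderly shutdown. A quiet network can also produce long gaps with no reboot at all, which is why the `rebooted` flag is reported separately.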
I mean, I’m not saying that errors on the drive are the CAUSE of the problem, more likely a symptom, but it does look like it just straight up crashed, right?
Final Update: it’s the hardware. I think it was overheating in general, but the SSD also seems to have been dying and the RAM wasn’t particularly reliable, possibly due to the heat.
Good lesson not to buy the cheapest thing from AliExpress! My new box is working great.