My /etc Directory Was Corrupted by My Water Heater
I know, right? WTF?
Our water tank has an electrical heating element at its base that was overloading the circuit it shares with our boiler. That tripped half of the household electrical switches, including the one my pfSense firewall is on.
After calling in an emergency plumber, on the 24th of December of all nights, we were all back up and running. Except the firewall.
I brought up the pfSense web configurator and received a 503 error for my troubles. Something was wrong. Because ssh -l admin 192.168.1.1 wasn’t working, I lugged a monitor, keyboard and mouse to the closet I keep the headless machine in and I used my mobile phone to Google an answer.
I saw several errors during boot and in the log files, but the one I centered my Googling on was:
fcgicli: Could not connect to server(/var/run/php-fpm.socket)
Additionally, I found permission errors related to a missing group, wheel. Later I would find that over three-quarters of my /etc/group file was gone, including _dhcp, which explains why only my NAS and the one machine with a hard-coded IP address were working, while none of my DHCP-based WiFi devices were.
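For anyone who wants to check for the same kind of damage, here is a minimal sketch in Python of a check for missing groups. It assumes the file follows the usual name:password:gid:members layout, and it only looks for the two groups I found missing (wheel and _dhcp). The function names are my own invention for illustration, not part of pfSense, and you would need to run it somewhere Python is available against a copy of the file.

```python
from pathlib import Path

# The two groups this post found missing. Extend the tuple at your own
# discretion; anything beyond these two is an assumption on your part.
REQUIRED_GROUPS = ("wheel", "_dhcp")


def parse_group_names(text):
    """Return the set of group names found in group-file formatted text."""
    names = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # The group name is everything before the first colon.
        names.add(line.split(":", 1)[0])
    return names


def missing_groups(text, required=REQUIRED_GROUPS):
    """Return the required group names that do not appear in the text."""
    present = parse_group_names(text)
    return [name for name in required if name not in present]


if __name__ == "__main__":
    # Hypothetical default path; point this at a copy of the file if needed.
    group_file = Path("/etc/group")
    missing = missing_groups(group_file.read_text())
    if missing:
        print("Missing groups:", ", ".join(missing))
    else:
        print("All required groups are present.")
```

This only reports what is absent. It does not repair anything, and it cannot tell you whether the entries that are present are themselves intact.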
I manually inserted a wheel group, which at least got me back online long enough to download a new copy of pfSense, version 2.2.6, and to make a backup copy of my configuration. Later I added a _dhcp group as well, and my internet limped along until the next afternoon, when I performed a complete rebuild. The manual additions wouldn’t have worked as a permanent fix, because rebooting the machine removed them for some reason.
I had been running pfSense 2.2.2, which allegedly contributed to my problem. According to pfSense forum posts, the aggressive panic mode of FreeBSD, triggered by a sudden power loss, causes some files in the /etc directory to become corrupted, and this had been “fixed” in 2.2.4. Never mind the posters who said they experienced the same problem on that version.