
Portauditing jails

| January 7th, 2010

I run most of the services in separate jails (that’s all easy to set up and maintain with ezjail) and quite often end up having 10+ jails even on simple mail/web server installations.

Installing portaudit, updating its database and running it every day in each jail seems to be a waste of space and resources. Instead I decided to check all jails from the host system during nightly security checks.

This should be saved as /usr/local/etc/periodic/security/420.jailportaudit (with chmod 555):

#!/bin/sh

RET_VAL=""

# look up the host name of a jail by its JID (third column of jls output)
get_jail_name()
{
    jid=$1
    RET_VAL=`jls | egrep "^ +$jid " | awk '{print $3}'`
}

# run the host's portaudit against the packages installed in one jail
check_jail()
{
    jid=$1

    get_jail_name $jid
    echo "==== checking jail :: " $RET_VAL " :: ===="
    /usr/sbin/jexec $jid pkg_info | /usr/bin/awk '{print $1}' | /usr/bin/xargs /usr/local/sbin/portaudit
    echo
}

main()
{
    # skip the jls header line, then check every running jail
    for i in `jls | tail +2 | awk '{print $1}'`
    do
        check_jail $i
    done
}

main "$@"
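
For reference, the jls parsing that the script does with egrep and awk can be sketched in Python. This is only an illustration: the sample output is made up, parse_jls is a name I picked for the example, and it assumes the four-column jls layout (JID, IP address, hostname, path) that the script itself relies on.

```python
# Hypothetical sample of `jls` output; real host names and paths will differ.
SAMPLE_JLS = """\
   JID  IP Address      Hostname                      Path
     1  10.0.0.11       mail.example.org              /usr/jails/mail
     2  10.0.0.12       www.example.org               /usr/jails/www
"""


def parse_jls(output):
    """Map each jail ID to its host name (third column), skipping the header."""
    jails = {}
    for line in output.splitlines()[1:]:
        fields = line.split()
        if len(fields) >= 4:
            jails[fields[0]] = fields[2]
    return jails


if __name__ == "__main__":
    for jid, name in sorted(parse_jls(SAMPLE_JLS).items()):
        print("jail %s is %s" % (jid, name))
```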

Fixing corrupted Ironport Queue

| January 5th, 2010

Looks like there’s a hidden command you can use to fix queue problems similar to these:

Critical: Queue: Your queue has been corrupted; UNABLE TO REPAIR: unable to
mount queue: '(\'qstore/gcq.py get_time_sorted_gens|919\', " \'exceptions.OSError\'>", "[Errno 2] No such file or directory:
\'/var/db/godspeed/gen/gen063.chk\'", \'[qstore/gcq.py mount|1387]
[qstore/gcq.py load|996] [qstore/gcq.py get_time_sorted_gens|919]\')'

Critical: Error while sending alert: Unable to send System/Critical alert to xxx@xxx.com with subject “Critical ironport: Queue: Your queue has been corrupted; UNABLE TO REPAIR: unab…”.

Basically, this means that the work queue is corrupted and the Ironport is unable to accept or deliver email. Rebooting doesn’t help (it doesn’t really change anything). There is, however, a way of recovering your Ironport from this problem. The hidden command is:

resetqueue

It deletes the broken queue, creates a new one, removes all messages in the system quarantines and reboots the Ironport. After this clean-up operation your Ironport should be as good as new.

What’s interesting is that by looking at the various error messages thrown by Ironports from time to time (especially when something breaks more seriously, which doesn’t happen too often as Ironports are quite solid) you can actually see what’s running under the hood. Besides running on something derived from FreeBSD (it can’t be that far off, as they actually contribute some code back to the OS), it looks like it’s mostly run by Python scripts. That’s interesting from the performance perspective, as even the queue management seems to be written in Python. Also, the database used internally seems to be some version of PostgreSQL. A very nice choice of software…

I haven’t seen anything yet that would suggest what MTA is used there (if it’s not something created by Ironport). Sure, the fact that it can write qmail-compatible log files doesn’t mean anything :P

ssh port knocking with pf

| October 24th, 2009

The idea of port knocking is simple: a service that is normally firewalled accepts connections from a given source IP address if that address has connected to certain ports in some special sequence. This is a simplified implementation of the idea, using pf to protect the ssh service.

In pf.conf file:


### pf tables
table <ssh_accept> persist

### pf rules
block in log all

pass in quick on $if proto tcp from <ssh_accept> to $me port 22 flags S/SA keep state

# there's no service listening on 31337 so we need synproxy state to complete the handshake
pass in quick on $if proto tcp from any to $me port {31337} synproxy state (max-src-conn-rate 3/5, overload <ssh_accept>)

This will open port 22 on the $me host if there are 3 attempts to connect to port 31337 within 5 seconds.

From this moment ssh access to $me is granted. This probably shouldn’t be allowed forever, so this crontab entry will clear all entries in the ssh_accept table that have not been used within the last 5 minutes:

*/5 * * * * root /sbin/pfctl -q -t ssh_accept -T expire 300
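
The client side of the knock can be sketched in Python. This is a minimal sketch, not part of the original setup: the knock function and its parameters are names made up for the example, and the default of 3 attempts simply follows the rule above. If the address does not show up in the ssh_accept table, it may be worth trying one more attempt, as pf may only trigger the overload once the limit is exceeded.

```python
import socket


def knock(host, port=31337, attempts=3, timeout=1.0, connect=None):
    """Open `attempts` short TCP connections to host:port in quick succession.

    `connect` defaults to socket.create_connection; it is a parameter only so
    the function can be exercised without touching the network.
    Returns the number of connections that completed the handshake.
    """
    if connect is None:
        connect = socket.create_connection
    completed = 0
    for _ in range(attempts):
        try:
            conn = connect((host, port), timeout)
        except OSError:
            # A failed attempt is skipped; the caller can check the return value.
            continue
        conn.close()
        completed += 1
    return completed
```

Calling knock("192.0.2.10") (a placeholder address) and then running ssh against the same host should be all that is needed from the client.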

A template for nagios plugins I use:

#!/usr/bin/env python

import sys, getopt

nagios_codes = {'OK': 0,
                'WARNING': 1,
                'CRITICAL': 2,
                'UNKNOWN': 3,
                'DEPENDENT': 4}

def usage():
    """ returns nagios status UNKNOWN with
        a one line usage description
        usage() calls nagios_return()
    """
    nagios_return('UNKNOWN',
            "usage: {0} -h host".format(sys.argv[0]))

def nagios_return(code, response):
    """ prints the response message
        and exits the script with one
        of the defined exit codes
        DOES NOT RETURN
    """
    print(code + ": " + response)
    sys.exit(nagios_codes[code])

def check_condition(host):
    """ a dummy check
        doesn't really check anything
    """
    return {"code": "OK", "message": host + " ok"}

def main():
    """ example options processing
        here we're expecting 1 option "-h"
        with a parameter
    """
    if len(sys.argv) < 2:
        usage()

    try:
        opts, args = getopt.getopt(sys.argv[1:], "h:")
    except getopt.GetoptError:
        usage()

    # stays None if -h was not given on the command line
    host = None
    for o, value in opts:
        if o == "-h":
            host = value
        else:
            usage()

    if host is None:
        usage()

    result = check_condition(host)
    nagios_return(result['code'], result['message'])

if __name__ == "__main__":
    main()
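
As an illustration of what a real check could return, here is a small sketch of a threshold check that produces a result in the same format as check_condition(). The function name check_threshold and its warn/crit parameters are made up for this example; they are not part of the template.

```python
def check_threshold(value, warn, crit):
    """Map a numeric reading to a result dict in the format the template expects."""
    if value >= crit:
        code = "CRITICAL"
    elif value >= warn:
        code = "WARNING"
    else:
        code = "OK"
    return {"code": code, "message": "value is %s" % value}
```

The returned dictionary can be handed to nagios_return() the same way main() does with the result of check_condition().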

our cisco router is at 10.0.0.1 and our freebsd box is at 10.0.0.20.

first cisco configuration:

!adds router’s local time to messages

service timestamps log datetime localtime

!this works on ios 12.4, other versions might use different syntax

logging trap debugging

!our syslog server

logging 10.0.0.20

logging on

now on the freebsd box. first enable syslog to accept messages from external sources, in /etc/rc.conf:

syslogd_flags="-a 10.0.0.1/32:*"

the “:*” at the end is quite important as it tells syslogd to accept messages sent from 10.0.0.1 from any source port. Without it, syslogd only accepts messages sent from source port 514 (syslog).
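
To make the effect of “:*” concrete, here is a simplified Python model of how such a rule is matched. It is only a sketch of the behaviour described above, not syslogd’s actual parser: peer_allowed is a made-up name, and it only handles numeric IPv4 rules with an explicit mask length.

```python
import ipaddress

SYSLOG_PORT = 514  # the default "syslog" service


def peer_allowed(rule, src_ip, src_port):
    """Return True if a datagram from src_ip:src_port matches an -a rule.

    Handles only the numeric IPv4 form addr/masklen[:port|:*].
    """
    network, _, service = rule.partition(":")
    if service == "*":
        port_ok = True                      # any source port
    elif service == "":
        port_ok = src_port == SYSLOG_PORT   # default: source port 514 only
    else:
        port_ok = src_port == int(service)
    net = ipaddress.ip_network(network, strict=False)
    return port_ok and ipaddress.ip_address(src_ip) in net


print(peer_allowed("10.0.0.1/32:*", "10.0.0.1", 51234))  # True
print(peer_allowed("10.0.0.1/32", "10.0.0.1", 51234))    # False
```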

next create your log file: touch /var/log/router.log and add something similar to the top of your /etc/syslog.conf:

#enter your router’s host name here:

+10.0.0.1

#in fact local7.* should be enough here, as it’s cisco’s default facility

*.*  /var/log/router.log

#this resets the previous +host definition

+*

now restart syslogd:

# /etc/rc.d/syslogd restart

if you can’t see anything in /var/log/router.log (and it’s not because your router has nothing to report), start your syslog in the debugging mode:

# /etc/rc.d/syslogd stop

# syslogd -d -v -a '10.0.0.1/32:*'

Just a simple script to monitor temperature on a soekris net5501 box running OpenBSD and OpenBSD’s snmpd.

Should be used as any other snmp__* munin plugin: snmp__soekris_temp

The script requires snmpwalk to be installed on the monitoring system.

Web applications can be traced or debugged on many layers. The highest layer would be what some frameworks like symfony provide in their development consoles. Another layer would be to look at output from php-xdebug in kcachegrind to do profiling. But the lowest possible level is to look at actual system calls used by a running application.

FreeBSD has a base system utility called ktrace. It allows an administrator to attach to a running process and log all system calls used by the process. How can it be used to trace a web application running under apache?

First, apache has to be started in debug mode, with only one worker running. This makes it easier to find the apache process running our application.

This is how to start apache in debug mode:

httpd -X &
[1] 34702

Running this in the background will return the ID of the process: 34702. Now, all we have to do is to attach ktrace to this process:

ktrace -dip 34702

The “-d” option means that all descendants (current child processes) of the process will also be traced, and “-i” means that all processes later spawned by our process will be traced too. The “-p” option gives the PID of the process to trace.

At this point we can run our application from the browser and then take a look at the system calls used. ktrace saves all system calls to a “ktrace.out” file in the current directory. The kdump utility is used to display the contents of this file:

kdump -R

“-R” option will display time taken between entries so we can estimate how long a syscall took to finish.
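
Adding up those relative times by hand gets tedious, so here is a rough Python sketch that sums the time between each CALL and its RET per syscall. Treat it as an illustration only: the sample lines are made up, the regular expression assumes the usual pid, command, relative time, record type layout, and the logic assumes a single traced process (which is what httpd -X gives us).

```python
import re
from collections import defaultdict

# Made-up lines in the general shape of `kdump -R` output:
# pid, command, seconds since the previous entry, record type, details.
SAMPLE_KDUMP = """\
 34702 httpd    0.000000 CALL  open(0x28512340,0,0)
 34702 httpd    0.000004 NAMI  "/usr/local/www/index.php"
 34702 httpd    0.000120 RET   open 5
 34702 httpd    0.000010 CALL  read(0x5,0x28600000,0x1000)
 34702 httpd    0.002500 RET   read 4096/0x1000
"""

LINE_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+(\d+)\.(\d{6})\s+(\w+)\s+(.*)$")


def syscall_times(text):
    """Sum, per syscall name, the microseconds between each CALL and its RET."""
    totals = defaultdict(int)
    in_call = {}  # pid -> microseconds accumulated since that pid's last CALL
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if not m:
            continue
        pid, _cmd, secs, usecs, kind, rest = m.groups()
        delta = int(secs) * 1000000 + int(usecs)
        if kind == "CALL":
            in_call[pid] = 0
        elif pid in in_call:
            in_call[pid] += delta
            if kind == "RET":
                totals[rest.split()[0]] += in_call.pop(pid)
    return dict(totals)


if __name__ == "__main__":
    for name, usec in sorted(syscall_times(SAMPLE_KDUMP).items()):
        print("%s: %d usec" % (name, usec))
```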

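We can detach ktrace and stop tracing with the command below. Note that “-C” disables tracing on all of your processes (or on every process in the system when run as root), not just this one: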

ktrace -C

This is a follow-up to the previous post. Instead of having your disks mounted read-write and have them broken by an unexpected reboot why not run them in read-only mode instead? Especially if your file systems are stored on a CompactFlash card and shouldn’t be written to too often anyway. If mounted read-only your partitions are never marked dirty and don’t even require fsck to be run during boot (so having fastboot enabled is fine in this case).

First of all, let’s just assume that there is only one partition on the system (the root partition, /). It’s not impossible to change its mount options to ro and boot your system this way. OpenBSD will actually boot fine, but it will complain a bit. First of all, it will complain about some parts of the /var subtree not being writeable. Syslog certainly won’t like not being able to write to /var/log/*, and other daemons might not like read-only /var/run and /tmp directories. Additionally, some daemons really need write access to some devices in /dev. So this is not an ideal solution.

A slightly better one is to have your / partition mounted read-only and have /var /tmp and /dev mounted read-write from memory based file systems. A sample /etc/fstab doing this would look like this:

/dev/wd0a / ffs ro,noatime 1 1
swap /tmp mfs rw,noatime,nodev,nosuid,-s=20000 0 0
swap /var mfs rw,noatime,nodev,nosuid,-s=40000 0 0
swap /dev mfs rw,noatime,nosuid,noexec,-s=20000 0 0

This creates 10MB /tmp and /dev file systems and 20MB /var. The only problem now is that /var and /dev partitions are empty and we need to recreate their structure in order to make OpenBSD happy.
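
The sizes come from the -s values, which are given in sectors. A quick sanity check in Python, assuming 512-byte sectors (that is what the figures above imply; mfs_size_mb is just a helper name for this example):

```python
SECTOR_BYTES = 512  # assumed sector size


def mfs_size_mb(sectors, sector_bytes=SECTOR_BYTES):
    """Convert an mfs -s value (in sectors) to megabytes (10**6 bytes)."""
    return sectors * sector_bytes / 1e6


for mount_point, sectors in (("/tmp", 20000), ("/var", 40000), ("/dev", 20000)):
    print("%s: %.2f MB" % (mount_point, mfs_size_mb(sectors)))
```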

First, the /var partition needs all of its subdirectories created. This is easily done with mtree:

mtree -qdef /etc/mtree/4.4BSD.dist -p / -u

The /dev partition needs all devices created. This is usually done by running the MAKEDEV script located in /dev. Since we’re creating a new, blank /dev directory we need to make a copy of the MAKEDEV script:

cp /dev/MAKEDEV /root/

now, after mounting our memory based /dev we can do:

cd /dev; sh MAKEDEV all

and all devices should be created for us.

All of this should be done during the boot process. This patch to /etc/rc does that:

--- rc.orig    2008-10-06 17:01:58.000000000 +0100
+++ rc    2008-10-06 17:03:18.000000000 +0100
@@ -212,6 +212,8 @@
 mount -a -t nonfs,vnd
 mount -uw /        # root on nfs requires this, others aren't hurt
 rm -f /fastboot        # XXX (root now writeable)
+mtree -qdef /etc/mtree/4.4BSD.dist -p / -u
+cp /root/MAKEDEV /dev; cd /dev; sh MAKEDEV all
 
 random_seed

After applying the patch (don’t forget to copy the MAKEDEV file!!!) and rebooting we should have a working OpenBSD server which can survive random power losses without damage to the file systems on its disk. What if you need to upgrade your system or change some config files? Easy, just remount your root file system:

mount -u -o rw /

When OpenBSD boots it checks all file systems with fsck in preen mode. In this mode fsck not only checks file systems but can also repair minor problems such as:

  • Unreferenced inodes
  • Link counts in inodes too large
  • Missing blocks in the free map
  • Blocks in the free map also in files
  • Counts in the super-block wrong

All these problems are correctable without any data loss. However, when something more serious happens to your file system, fsck -p won’t even try to repair it. It will exit with an error and wait for the administrator to decide what should be repaired and how. This means that the boot procedure will be stopped and the system will drop to a shell and wait for manual intervention.

What an admin usually does to fix it at this time is to run fsck or fsck -y by hand. This requires console access to manually type these commands and respond to fsck’s prompts.

FreeBSD has an rc option called fsck_y_enable to automate this process so it doesn’t require manual intervention.

This simple patch adds something similar to OpenBSD’s rc script.

This is something new I’ve just learned that only exists on OpenBSD. Up until today I thought that the only way to manually failover a carp setup was to down the carp interface on the master.

It looks like there is an easier way of doing it on OpenBSD. In fact OpenBSD uses this feature itself during the boot process. Just before setting up all interfaces it “demotes” all carp interfaces so they won’t become master interfaces for their ip addresses until all enabled system daemons, pf, ipsec etc have been configured and started. After that the whole carp group of interfaces is put back to the neutral state and they can become master interfaces (if there is no advskew set on them).

How is it done?

OpenBSD has this concept of groups of interfaces. It’s easy to spot it when you do ifconfig:

# ifconfig
lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> mtu 33208
groups: lo
inet 127.0.0.1 netmask 0xff000000
inet6 ::1 prefixlen 128
inet6 fe80::1%lo0 prefixlen 64 scopeid 0x3
vic0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
lladdr 00:0c:29:9c:5e:57
groups: egress
media: Ethernet autoselect
status: active
inet 172.21.33.5 netmask 0xffffff00 broadcast 172.21.33.255
inet6 fe80::20c:29ff:fe9c:5e57%vic0 prefixlen 64 scopeid 0x1
enc0: flags=0<> mtu 1536
carp0: flags=8803<UP,BROADCAST,SIMPLEX,MULTICAST> mtu 1500
lladdr 00:00:00:00:00:00
groups: carp

Each interface has its own default group (or groups). The default group for all carp interfaces is… the carp group! You can create your own groups and add interfaces to them. An interface can belong to multiple groups. Here’s how to create a new group and add carp0 to it:

# ifconfig carp0 group mygroup
# ifconfig carp0
carp0: flags=8803<UP,BROADCAST,SIMPLEX,MULTICAST> mtu 1500
lladdr 00:00:00:00:00:00
groups: carp mygroup

and here is how to remove it :)

# ifconfig carp0 -group mygroup
# ifconfig carp0
carp0: flags=8803<UP,BROADCAST,SIMPLEX,MULTICAST> mtu 1500
lladdr 00:00:00:00:00:00
groups: carp

All groups have this additional property called the demote count which is used by carp during the master election process. Using this property you can demote a group of interfaces:

# ifconfig -g carp carpdemote 128

and promote it back:

# ifconfig -g carp -carpdemote 128

and you can see the current value:

# ifconfig -g carp
carp: carp demote count 0

So how is this better than downing all your carp interfaces by doing something like this:

for i in `ls /etc/hostname.carp*`; do echo $i | awk -F. '{print $2}' | xargs -I% ifconfig % down; done

When you down your carp interfaces they no longer take part in the whole “carp process”. Basically, since they are down they no longer advertise their presence and cannot be elected as masters. So if your backup server dies and all carp interfaces on your master are down, you lose your connectivity.

The carp demote counter acts in a similar way to advskew but takes precedence over it. So a carp interface with advskew set to 0 and demote counter set to 10 will be ranked lower (and become slave) than another carp interface with advskew 100 and demote counter set to 0.
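
One way to picture this precedence is as a comparison of (demote counter, advskew) pairs. The Python below is only a toy model of the ranking described above, not the actual carp election; preferred_master and the host names fw1 and fw2 are made up for the example.

```python
def preferred_master(candidates):
    """Pick the candidate with the lowest (demote, advskew) pair.

    Comparing tuples makes the demote counter dominate and uses advskew
    only to break ties between equal demote counters.
    """
    return min(candidates, key=lambda c: (c["demote"], c["advskew"]))


hosts = [
    {"name": "fw1", "advskew": 0, "demote": 10},
    {"name": "fw2", "advskew": 100, "demote": 0},
]
print(preferred_master(hosts)["name"])  # fw2
```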

Plus, by logically grouping carp interfaces you can fail over only one group at a time, and when you have a lot of interfaces this is certainly easier than using ifconfig down.