DNS traceroute

| July 22nd, 2009

This helps track down bottlenecks in DNS responses to recursive queries.

First you need a DNS query payload (the contents of a valid UDP DNS query for an A record).

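$hostname = "www.google.com";
# DNS header: ID 6666, flags with RD (recursion desired) set, 1 question
$header = pack("n C2 n4", 6666, 1, 0, 1, 0, 0, 0);

# encode each label as a length byte followed by the label itself
for (split(/\./, $hostname)) {
    $lformat .= "C a* ";
    $labels[$count++] = length;
    $labels[$count++] = $_;
}

# terminating root label (0), QTYPE A (1), QCLASS IN (1)
$question = pack($lformat."C n2", @labels, 0, 1, 1);

# '>' truncates, so the file always holds exactly one query
open(PACKET, '>', 'dns_packet.txt') or die "dns_packet.txt: $!";
binmode(PACKET);
print PACKET $header.$question;
close(PACKET);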

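and now use this payload with hping2 to send UDP packets in traceroute mode (-T). The -d value is the payload size in bytes, which is 32 for the www.google.com query above:

hping -2 -p 53 -E dns_packet.txt -d 32 -T xxx.yyy.zzz.xyz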
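
For comparison, the same payload can be built in Python. This is only a minimal sketch, not part of the original script: the function name build_dns_query is made up for illustration, and it assumes a plain A/IN query with only the RD flag set, mirroring the Perl pack() calls above.

```python
import struct

def build_dns_query(hostname, query_id=6666):
    """Build a UDP DNS query payload for an A record (hypothetical helper)."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1,
    # ANCOUNT=0, NSCOUNT=0, ARCOUNT=0 as six 16-bit big-endian fields.
    header = struct.pack("!6H", query_id, 0x0100, 1, 0, 0, 0)
    # Question name: each label is a length byte followed by the label.
    qname = b""
    for label in hostname.split("."):
        encoded = label.encode("ascii")
        qname += struct.pack("!B", len(encoded)) + encoded
    qname += b"\x00"  # the root label terminates the name
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + qname + struct.pack("!2H", 1, 1)

payload = build_dns_query("www.google.com")
print(len(payload))
```

The printed length is the number of payload bytes, which is the value hping's -d option should be given for that hostname.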

A template for nagios plugins I use:

#!/usr/bin/env python

import sys, getopt

nagios_codes = {'OK': 0,
                'WARNING': 1,
                'CRITICAL': 2,
                'UNKNOWN': 3,
                'DEPENDENT': 4}

def usage():
    """ returns nagios status UNKNOWN with
        a one line usage description
        usage() calls nagios_return()
    """
    nagios_return('UNKNOWN',
            "usage: {0} -h host".format(sys.argv[0]))

def nagios_return(code, response):
    """ prints the response message
        and exits the script with one
        of the defined exit codes
        DOES NOT RETURN
    """
    print(code + ": " + response)
    sys.exit(nagios_codes[code])

def check_condition(host):
    """ a dummy check
        doesn't really check anything
    """
    return {"code": "OK", "message": host + " ok"}

def main():
    """ example options processing
        here we're expecting 1 option "-h"
        with a parameter
    """
    if len(sys.argv) < 2:
        usage()

    try:
        opts, args = getopt.getopt(sys.argv[1:], "h:")
    except getopt.GetoptError:
        usage()

    host = None
    for o, value in opts:
        if o == "-h":
            host = value
        else:
            usage()

    # -h is mandatory; without it there is nothing to check
    if host is None:
        usage()

    result = check_condition(host)
    nagios_return(result['code'], result['message'])

if __name__ == "__main__":
    main()
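
The dummy check_condition() above always returns OK. As a sketch of what a real check could return, the hypothetical helper below maps a measured value to a status using warning and critical thresholds. The function name, the thresholds and the message format are assumptions for illustration, not part of the template.

```python
def threshold_status(value, warning, critical):
    """Map a measurement to a Nagios status name (hypothetical helper).

    Assumes higher values are worse and warning <= critical.
    """
    if value >= critical:
        code = "CRITICAL"
    elif value >= warning:
        code = "WARNING"
    else:
        code = "OK"
    return {"code": code, "message": "value={0}".format(value)}

result = threshold_status(85, warning=80, critical=90)
print(result["code"] + ": " + result["message"])
```

The returned dictionary has the same shape as the one check_condition() returns, so it could be passed to nagios_return() unchanged.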

1. First get the latest version of MegaCLI from the LSI website (choose MegaCLI – Linux from the list) and unzip the downloaded file. The package inside the .zip file is an .rpm, so on ubuntu it needs to be converted into something more useful.

2. install alien, convert the .rpm into a .tgz and unpack it:

$ sudo apt-get install alien

$ sudo alien --to-tgz MegaCli-1.01.39-0.i386.rpm

$ tar xvfz MegaCli-1.01.39.tgz

3. now in ./opt/MegaRAID/MegaCli you should have MegaCli and MegaCli64. Depending on your ubuntu installation type (i386 or amd64), copy one of them as MegaCli to /usr/sbin (so for 64bit: cp MegaCli64 /usr/sbin/MegaCli)

4. Download nagios check script from Nagios Exchange and place it in /usr/lib/nagios/plugins

5. MegaCli has to be run as root, but nagios nrpe usually runs as user nagios, and it's better to keep it this way. The nagios check script is aware of that and uses sudo to call MegaCli, so nagios needs to be allowed to run it as root. Use visudo and add this line:

nagios ALL=(ALL) NOPASSWD: /usr/sbin/MegaCli

6. Add a new command definition to nrpe config (/etc/nagios/nrpe_local.cfg):

command[check_raid]=/usr/lib/nagios/plugins/check_megaraid_sas

our cisco router is at 10.0.0.1 and our freebsd box is at 10.0.0.20.

first cisco configuration:

!adds router’s local time to messages

service timestamps log datetime localtime

!this works on ios 12.4, other versions might use different syntax

logging trap debugging

!our syslog server

logging 10.0.0.20

logging on

now on the freebsd box. first enable syslog to accept messages from external sources, in /etc/rc.conf:

syslogd_flags="-a 10.0.0.1/32:*"

the ":*" at the end is quite important: it tells syslogd to accept messages sent from 10.0.0.1 from any source port. Without it, syslogd only accepts messages sent from source port 514 (syslog).

next create your log file: touch /var/log/router.log and add something similar to the top of your /etc/syslog.conf:

#enter your router’s host name here:

+10.0.0.1

#in fact local7.* should be enough here, as it’s cisco’s default facility

*.*  /var/log/router.log

#this resets the previous +host definition

+*
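
To see why local7.* would be enough, recall that every syslog message starts with a priority value equal to facility * 8 + severity. The sketch below decodes such a value. The function name is made up for illustration, and the facility and severity tables follow the numbering in RFC 3164.

```python
FACILITIES = [
    "kern", "user", "mail", "daemon", "auth", "syslog", "lpr", "news",
    "uucp", "cron", "authpriv", "ftp", "ntp", "audit", "alert", "clock",
    "local0", "local1", "local2", "local3",
    "local4", "local5", "local6", "local7",
]
SEVERITIES = ["emerg", "alert", "crit", "err",
              "warning", "notice", "info", "debug"]

def decode_pri(pri):
    """Split a syslog priority value into (facility, severity) names."""
    facility, severity = divmod(pri, 8)
    return FACILITIES[facility], SEVERITIES[severity]

# 189 = 23 * 8 + 5, i.e. facility local7 with severity notice
print(decode_pri(189))
```

A selector such as local7.* matches every severity of that one facility, while *.* matches everything the router sends.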

now restart syslogd:

# /etc/rc.d/syslogd restart

if you can’t see anything in /var/log/router.log (and it’s not because your router has nothing to report), start your syslog in the debugging mode:

# /etc/rc.d/syslogd stop

# syslogd -d -v -a '10.0.0.1/32:*'

First check what version of the sound card you have in your dell:

# grep 'Codec' /proc/asound/card0/codec#*

the command should return "Codec: SigmaTel STAC9227". If that's the case, add this line to your /etc/modprobe.d/alsa-base file:

options snd-hda-intel model=dell-3stack

and reboot.

live backup under vmware ESXi

| January 16th, 2009

How to back up a virtual machine while it is running on an ESXi host.

Let’s assume we’re backing up a virtual machine called ubuntu:

1. first let’s find its VM id:

vmid=`vim-cmd vmsvc/getallvms | grep ubuntu | awk '{print $1}'`

2. now we have to snapshot this machine so VMware can release the lock on the virtual HDD file:

vim-cmd vmsvc/snapshot.create $vmid snapshot1 backupsnapshot

3. now we can copy the contents of the virtual machine's directory. It lives somewhere under /vmfs/volumes/; in my case I created it in datastore1, so the full path is /vmfs/volumes/datastore1/ubuntu. The whole directory can be copied as it is. The files that are still locked won't be copied, but those are only snapshot-related files and are not needed anyway.

4. now we can delete the snapshot:

vim-cmd vmsvc/snapshot.removeall $vmid

Depending on how long it took to copy the data and how busy the virtual machine was during that time, it can take a while for the data from the snapshot to be merged back into the main disk and memory files.
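
Picking the VM id with grep matches any line that contains the name, so a second machine called ubuntu2 would match as well. The sketch below shows an exact match on the Name column instead. It assumes the getallvms output has been captured as text and that Vmid and Name are the first two whitespace-separated columns; the function name and the sample output are illustrative, not taken from a real host.

```python
def find_vmid(getallvms_output, vm_name):
    """Return the Vmid whose Name column equals vm_name, else None."""
    for line in getallvms_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Vmid is assumed to be column 1 and Name column 2; names that
        # contain spaces are not handled by this simple split.
        if len(fields) >= 2 and fields[0].isdigit() and fields[1] == vm_name:
            return int(fields[0])
    return None

# Illustrative sample only, not captured from a real ESXi host.
SAMPLE = """\
Vmid   Name      File                              Guest OS      Version
16     ubuntu    [datastore1] ubuntu/ubuntu.vmx    ubuntuGuest   vmx-07
32     ubuntu2   [datastore1] ubuntu2/ubuntu2.vmx  ubuntuGuest   vmx-07
"""
print(find_vmid(SAMPLE, "ubuntu"))
```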

1. on your ESXi host console press ALT+F1.

2. type unsupported (this won’t be displayed on the screen) and press Enter

3. enter your root password

4. edit /etc/inetd.conf (vi /etc/inetd.conf), find the line with ssh, uncomment it and save the file

5. do ps auxw | grep inetd to find inetd’s PID and send it a HUP signal

This problem only happens when one uses udev to manage device node files and names of the network interfaces present in the system.

What can happen is that the eth0 interface present when running the server under one vmware instance can become missing after moving to another vmware host. Instead, the system will create a new eth interface – eth1.

This is because udev caches the mapping from network card attributes to interface names to keep the names consistent between reboots. The problem is that network card parameters (such as the MAC address) can change between vmware instances. If the cached entry doesn't match the actual hardware, udev creates a new device -> name assignment.

To fix this on debian and debian-like distributions, just delete this file: /etc/udev/rules.d/z25_persistent-net.rules

The file will be generated again after rebooting (or restarting udev).
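
Before deleting the file it can be useful to see which cached entries are actually stale. The sketch below is one way to do that, under the assumption that each rule carries the MAC in an ATTR{address} or ATTRS{address} match and the interface name in NAME=. The function names and the sample rules are illustrative, and the exact rule layout may differ between udev versions.

```python
import re

# Assumed rule layout: ...ATTRS{address}=="aa:bb:cc:dd:ee:ff", ... NAME="eth0"
RULE_RE = re.compile(r'ATTRS?\{address\}=="([0-9a-fA-F:]{17})".*NAME="([^"]+)"')

def cached_assignments(rules_text):
    """Return {mac: interface_name} parsed from persistent-net rules text."""
    result = {}
    for line in rules_text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # skip comment lines
        match = RULE_RE.search(line)
        if match:
            result[match.group(1).lower()] = match.group(2)
    return result

def stale_entries(rules_text, current_macs):
    """Return cached entries whose MAC is not present on this host."""
    current = {mac.lower() for mac in current_macs}
    cached = cached_assignments(rules_text)
    return {mac: name for mac, name in cached.items() if mac not in current}

# Illustrative rules, not copied from a real system.
SAMPLE = '''# PCI device (illustrative)
SUBSYSTEM=="net", DRIVERS=="?*", ATTRS{address}=="00:0c:29:aa:bb:cc", NAME="eth0"
SUBSYSTEM=="net", DRIVERS=="?*", ATTRS{address}=="00:0c:29:11:22:33", NAME="eth1"
'''
print(stale_entries(SAMPLE, ["00:0c:29:11:22:33"]))
```

On a real system the list of current MAC addresses could be read from /sys/class/net/*/address.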

Just a simple script to monitor temperature on a soekris net5501 box running OpenBSD and OpenBSD’s snmpd.

Should be used as any other snmp__* munin plugin: snmp__soekris_temp

The script requires snmpwalk to be installed on the monitoring system.

To my surprise this is actually possible. Moreover, it’s much easier than restoring files from ext2 partitions, where you have tools like foremost and photorec. With these tools you can restore contents of your files by looking for certain patterns in raw disk data. Restoring directory structure and file names isn’t that easy. The situation is a bit different with ext3 – all thanks to this great tool – ext3grep.

All one needs to do to restore deleted data is unmount the hdd as soon as possible (or remount it read-only) and take a copy of it using dd.

Download ext3grep from http://code.google.com/p/ext3grep/ , unpack it and compile. The easiest way to use it that I've found is to give it a date and let it undelete all files removed after that date.

ext3grep /dev/sdb1 --restore-all --after=1226937993

It will create a RESTORED_FILES directory and put all recovered files and directories there. It does take some time, but after all it's a rather complicated process ;) A very interesting and detailed document about the internals of ext3, file recovery and ext3grep (with more examples) is here.
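
The value passed to --after is a Unix timestamp (seconds since 1970-01-01 UTC). A small sketch for computing one from a calendar date follows. The helper name is made up for illustration, and it treats the given date and time as UTC.

```python
import calendar
from datetime import datetime

def after_timestamp(year, month, day, hour=0, minute=0, second=0):
    """Unix timestamp for a UTC date and time (hypothetical helper)."""
    moment = datetime(year, month, day, hour, minute, second)
    # timegm() interprets the time tuple as UTC, unlike time.mktime()
    return calendar.timegm(moment.timetuple())

# 2008-11-17 16:06:33 UTC is the timestamp used in the command above
print(after_timestamp(2008, 11, 17, 16, 6, 33))
```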