About JeredSutton

I consider myself a general technologist with a strong focus on systems administration, web application development, and networking. I have a warm place in my heart for service providers and massive amounts of data. I will wrap this up with a personal fact: I like Red Bull.

When strace isn’t enough Part 1

An important tool in any Linux admin’s toolkit is the venerable strace command. It gives us insight into what a program is actually doing. As awesome as strace can be, it doesn’t tell us everything. This series of articles will familiarize you with some of the other commands and approaches for gaining insight into program execution.

Quick Tip: View Linux process limits

On several occasions I have needed to troubleshoot issues that turned out to be Linux limiting the number of open files for a given process. This can be annoying to troubleshoot, since many programs do not handle the condition gracefully and Linux does not log it by default. The same applies to all of the Linux process limits, not just open files.

So what do we do about it?

When you google the problem, you will typically find references to running ulimit as the service user to determine the existing limits. You will quickly discover that this doesn’t work. For one thing, most service users don’t have shells. For another, ulimit reports the limits of the shell you run it in, not those of the service that is already running, and, as you will see in my next post, many services have configuration that attempts to set their own limits at startup.

Enter the proc filesystem…

cat /proc/805/limits 
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            8388608              unlimited            bytes     
Max core file size        0                    unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             7807                 7807                 processes 
Max open files            1024                 4096                 files     
Max locked memory         65536                65536                bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       7807                 7807                 signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us

Here 805 is the PID of the process you want to check; substitute your own.
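
If you only care about one limit, the same file is easy to filter. The following is a minimal sketch, assuming a Linux /proc filesystem; the function name open_files_limit is hypothetical and not part of any standard tool.

```shell
# Hypothetical helper: print the soft and hard "Max open files" limits
# for the process whose PID is passed as $1 (Linux /proc only).
open_files_limit() {
    # Fields 4 and 5 of the "Max open files" row are the soft and hard limits.
    awk '/^Max open files/ { print $4, $5 }' "/proc/$1/limits"
}

# Check the current shell here; substitute the PID of your service.
open_files_limit "$$"
```

Reading another user's process may require root, since /proc/PID/limits is only readable by the process owner and root.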

Have fun…

Bash Nagios plugin

Today let’s have a look at one way to construct a Nagios plugin in bash. I would usually write these in Perl, but sometimes that is not possible. This plugin is written to be executed through NRPE.

#!/bin/bash
# bash nagios plugin

###
# Variables
###
OK=0
WARNING=1
CRITICAL=2
UNKNOWN=3   # Nagios expects 3 for UNKNOWN; "exit -1" would become status 255
TO_RETURN=${OK}
TO_OUTPUT=''

# Print usage information and exit
print_usage(){
    echo -e "\n" \
    "usage: ./check_uptime -w 20 -c 30 \n" \
    "\n" \
    "-w <days>    warning value\n" \
    "-c <days>    critical value\n" \
    "-h           this help\n" \
    "\n" && exit 1
}

###
# Options
###

# Loop through $@ to find flags
while getopts ":hw:c:" FLAG; do
    case "${FLAG}" in
        w) # Warning value
            WARNING_VALUE="${OPTARG}" ;;
        c) # Critical value
            CRITICAL_VALUE="${OPTARG}" ;;
        h) # Print usage information
            HELP=1;;
        [:?]) # Print usage information
            print_usage;;
    esac
done

###
# Functions
###

log_date(){
    echo $(date +"%b %e %T")
}

error() {
    NOW=$(log_date)
    echo "${NOW}: ERROR: $1"
    exit 1
}

warning() {
    NOW=$(log_date)
    echo "${NOW}: WARNING: $1"
}

info() {
    NOW=$(log_date)
    echo "${NOW}: INFO: $1"
}

# Do something
get_cmd_output(){
    # print only the day count; sed -n with p emits nothing if uptime is under a day
    uptime | sed -n 's/.*up \([0-9]*\) day.*/\1/p' || error "failed to run command"
}

###
# Program execution
###
[ "${HELP}" ] && print_usage

if [ ${WARNING_VALUE} ] && [ ${CRITICAL_VALUE} ]
then
    CMD_OUTPUT=$(get_cmd_output)
else
    print_usage
fi

if [ "${CMD_OUTPUT}" ] && [ ${CMD_OUTPUT} -gt ${CRITICAL_VALUE} ]
then
    TO_RETURN=${CRITICAL}
elif [ "${CMD_OUTPUT}" ] && [ ${CMD_OUTPUT} -gt ${WARNING_VALUE} ]
then
    TO_RETURN=${WARNING}
elif [ "${CMD_OUTPUT}" ] && [ ${CMD_OUTPUT} -gt 0 ]
then
    TO_RETURN=${OK}
else
    TO_RETURN=${UNKNOWN}
fi

if [ $TO_RETURN == ${CRITICAL} ]
then
    TO_OUTPUT="CRITICAL "
elif [ $TO_RETURN == ${WARNING} ]
then
    TO_OUTPUT="WARNING "
elif [ ${TO_RETURN} == ${OK} ]
then
    TO_OUTPUT="OK "
else
    TO_OUTPUT="UNKNOWN "
fi

TO_OUTPUT="${TO_OUTPUT}| uptime=${CMD_OUTPUT};$WARNING_VALUE;$CRITICAL_VALUE"

echo "$TO_OUTPUT";
exit $TO_RETURN;

Let’s break it down…

OK=0
WARNING=1
CRITICAL=2
UNKNOWN=3

We define some readable names for the return codes. These are the exit codes Nagios expects: 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN.

TO_RETURN=${OK}

Set the initial return value to OK.

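# Do something
get_cmd_output(){
    # print only the day count; sed -n with p emits nothing if uptime is under a day
    uptime | sed -n 's/.*up \([0-9]*\) day.*/\1/p' || error "failed to run command"
}

Function to obtain the value we want to check, in this case the number of days of uptime. The -n flag and the trailing p make sed print only when the pattern matches, so a host that has been up for less than a day yields empty output instead of the whole uptime line.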
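
When a host has been up for less than a day, uptime prints no "day" field for sed to extract. As an alternative (a different technique from the one in the plugin), the day count can be computed from /proc/uptime. This is a minimal sketch, assuming Linux; seconds_to_days and the variable names are hypothetical.

```shell
# Hypothetical helper: convert a seconds value such as "172800.55"
# (the format of the first field of /proc/uptime) into whole days.
seconds_to_days() {
    # Strip the fractional part, then use integer division.
    echo $(( ${1%%.*} / 86400 ))
}

# Assumes Linux: the first field of /proc/uptime is seconds since boot.
read -r UPTIME_SECONDS IDLE_SECONDS < /proc/uptime
seconds_to_days "${UPTIME_SECONDS}"
```

Note that this returns 0 during the first day, and the plugin's final "-gt 0" test would report that value as UNKNOWN, so the comparison would need adjusting if you wanted 0 to count as OK.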

if [ "${CMD_OUTPUT}" ] && [ ${CMD_OUTPUT} -gt ${CRITICAL_VALUE} ]
then
    TO_RETURN=${CRITICAL}
elif [ "${CMD_OUTPUT}" ] && [ ${CMD_OUTPUT} -gt ${WARNING_VALUE} ]
then
    TO_RETURN=${WARNING}
elif [ "${CMD_OUTPUT}" ] && [ ${CMD_OUTPUT} -gt 0 ]
then
    TO_RETURN=${OK}
else
    TO_RETURN=${UNKNOWN}
fi

Check the value of uptime against our warning and critical values.
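
The order of the tests matters: the critical threshold is checked first, so a value above both thresholds is reported as CRITICAL rather than WARNING. A condensed sketch of the same idea, where classify is a hypothetical helper that is not part of the plugin above:

```shell
# Hypothetical helper: echo the Nagios exit code for a value ($1),
# given a warning threshold ($2) and a critical threshold ($3).
classify() {
    if [ "$1" -gt "$3" ]; then
        echo 2    # CRITICAL: checked first so it wins over WARNING
    elif [ "$1" -gt "$2" ]; then
        echo 1    # WARNING
    else
        echo 0    # OK
    fi
}

classify 25 20 30
```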

if [ $TO_RETURN == ${CRITICAL} ]
then
    TO_OUTPUT="CRITICAL "
elif [ $TO_RETURN == ${WARNING} ]
then
    TO_OUTPUT="WARNING "
elif [ ${TO_RETURN} == ${OK} ]
then
    TO_OUTPUT="OK "
else
    TO_OUTPUT="UNKNOWN "
fi

Set the human-readable status text of the plugin. Nagios displays this text, but it determines the service state from the exit code, not from the text.

TO_OUTPUT="${TO_OUTPUT}| uptime=${CMD_OUTPUT};$WARNING_VALUE;$CRITICAL_VALUE"

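Construct the output string according to the Nagios plugin developer guidelines: human-readable status text, a pipe character, then performance data in the form label=value;warn;crit.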
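
To make the layout of that line concrete, here is a small sketch that builds it from its parts. The format_output function is hypothetical and only illustrates the layout; the values are made up.

```shell
# Hypothetical helper: build one line of plugin output.
# $1 status text, $2 perfdata label, $3 value, $4 warning, $5 critical
format_output() {
    echo "$1 | $2=$3;$4;$5"
}

format_output "WARNING" uptime 25 20 30
# prints: WARNING | uptime=25;20;30
```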

Stay tuned. The Perl version will be out soon.

For more information see:
http://nagiosplug.sourceforge.net/developer-guidelines.html#AEN201

ssh-copy-id missing in OS X

Not sure if anyone else has noticed, but OS X is missing ssh-copy-id. This utility ships with the ssh client in most major Linux distros. As it turns out, it is just a shell script.

#!/bin/sh

# Shell script to install your public key on a remote machine
# Takes the remote machine name as an argument.
# Obviously, the remote machine must accept password authentication,
# or one of the other keys in your ssh-agent, for this to work.

ID_FILE="${HOME}/.ssh/id_rsa.pub"

if [ "-i" = "$1" ]; then
  shift
  # check if we have 2 parameters left, if so the first is the new ID file
  if [ -n "$2" ]; then
    if expr "$1" : ".*\.pub" > /dev/null ; then
      ID_FILE="$1"
    else
      ID_FILE="$1.pub"
    fi
    shift         # and this should leave $1 as the target name
  fi
else
  if [ x$SSH_AUTH_SOCK != x ] && ssh-add -L >/dev/null 2>&1; then
    GET_ID="$GET_ID ssh-add -L"
  fi
fi

if [ -z "`eval $GET_ID`" ] && [ -r "${ID_FILE}" ] ; then
  GET_ID="cat ${ID_FILE}"
fi

if [ -z "`eval $GET_ID`" ]; then
  echo "$0: ERROR: No identities found" >&2
  exit 1
fi

if [ "$#" -lt 1 ] || [ "$1" = "-h" ] || [ "$1" = "--help" ]; then
  echo "Usage: $0 [-i [identity_file]] [user@]machine" >&2
  exit 1
fi

{ eval "$GET_ID" ; } | ssh ${1%:} "umask 077; test -d .ssh || mkdir .ssh ; cat >> .ssh/authorized_keys" || exit 1

cat <<EOF
Now try logging into the machine, with "ssh '${1%:}'", and check in:

  .ssh/authorized_keys

to make sure we haven't added extra keys that you weren't expecting.

EOF

Go forth, copy, paste, chmod and happily deploy your ssh keys with ease.

P.S. For those who don’t know what I mean by chmod, see the following.

chmod +x ./ssh-copy-id
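
The heart of the script is the command it runs on the remote side: create .ssh with private permissions and append the key to authorized_keys. The sketch below does the same thing locally against a scratch directory so you can see the effect without touching a real account. The append_key function and the sample key are hypothetical, and the duplicate check is an addition that the original script does not perform.

```shell
# Hypothetical helper: append a public key line ($2) to
# $1/.ssh/authorized_keys, creating files with private permissions.
append_key() {
    (
        umask 077                      # new files and directories are owner-only
        mkdir -p "$1/.ssh"
        # Assumption beyond the original script: skip keys already present.
        grep -qxF -e "$2" "$1/.ssh/authorized_keys" 2>/dev/null ||
            echo "$2" >> "$1/.ssh/authorized_keys"
    )
}

# Demonstrate against a scratch directory instead of a real home.
DEMO_HOME=$(mktemp -d)
append_key "${DEMO_HOME}" "ssh-rsa AAAAexample user@host"
append_key "${DEMO_HOME}" "ssh-rsa AAAAexample user@host"
cat "${DEMO_HOME}/.ssh/authorized_keys"
```

The subshell keeps the umask change from leaking into the calling shell.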

Bash Parallel Execution

If you have ever wanted an easy way to execute multiple jobs in parallel in bash, then this is the snippet for you. It was originally posted on Stack Overflow and has been modified a bit.

#!/bin/bash

#how many jobs to run at one time
JOBS_AT_ONCE=20

# The bgxupdate and bgxlimit functions below allow for
# running X jobs in parallel in bash.  They are taken from:
# http://stackoverflow.com/questions/1537956/bash-limit-the-number-of-concurrent-jobs/1685440#1685440

# bgxupdate - update active processes in a group.
#   Works by transferring each process to new group
#   if it is still active.
# in:  bgxgrp - current group of processes.
# out: bgxgrp - new group of processes.
# out: bgxcount - number of processes in new group.

bgxupdate() {
    bgxoldgrp=${bgxgrp}
    bgxgrp=""
    ((bgxcount = 0))
    bgxjobs=" $(jobs -pr | tr '\n' ' ')"
    for bgxpid in ${bgxoldgrp} ; do
        echo "${bgxjobs}" | grep " ${bgxpid} " >/dev/null 2>&1
        if [[ $? -eq 0 ]] ; then
            bgxgrp="${bgxgrp} ${bgxpid}"
            ((bgxcount = bgxcount + 1))
        fi
    done
}

# bgxlimit - start a sub-process with a limit.

#   Loops, calling bgxupdate until there is a free
#   slot to run another sub-process. Then runs it
#   and updates the process group.
# in:  $1     - the limit on processes.
# in:  $2+    - the command to run for new process.
# in:  bgxgrp - the current group of processes.
# out: bgxgrp - new group of processes

bgxlimit() {
    bgxmax=$1 ; shift
    bgxupdate
    while [[ ${bgxcount} -ge ${bgxmax} ]] ; do
        sleep 1
        bgxupdate
    done
    if [[ "$1" != "-" ]] ; then
        "$@" &
        bgxgrp="${bgxgrp} $!"
    fi
}

bgxgrp="process_group_1"
for LINE in `cat hosts`
do
    CHECK_SCRIPT='echo $(hostname),$(cat /etc/debian_version)'
    bgxlimit $JOBS_AT_ONCE ssh ${LINE} "${CHECK_SCRIPT}"
done
# Wait until all queued processes are done.

bgxupdate
while [[ ${bgxcount} -ne 0 ]] ; do
    oldcount=${bgxcount}
    while [[ ${oldcount} -eq ${bgxcount} ]] ; do
        sleep 1
        bgxupdate
    done
done

In this script, the primary changes are defining the maximum number of simultaneous jobs and doing somewhat useful work by returning the hostname and the Debian version of each host.
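
If you do not need the bookkeeping in pure bash, a shorter route to the same effect is xargs with its -P option. This is a different technique from the bgxlimit functions above, sketched under the assumption that your xargs supports -P (GNU and BSD xargs both do). The run_limited name is hypothetical.

```shell
# Hypothetical helper: read whitespace-separated items on stdin and run
# the given command once per item, with at most $1 commands at a time.
run_limited() {
    max_jobs=$1
    shift
    # -n 1 passes one item per invocation; -P caps the parallel processes.
    xargs -n 1 -P "${max_jobs}" "$@"
}

# Output order is not guaranteed, since the jobs run concurrently.
printf 'alpha\nbeta\ngamma\n' | run_limited 2 echo "done:"
```

Each item is appended as the last argument of the command, so for the ssh case you would point run_limited at a small wrapper script that takes the hostname as its argument and feed it the hosts file on stdin.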