Monitor backend services with Monit // Minh Danh

Most modern web applications rely on backend services (database, cache, scheduler, and so on) to function properly, so keeping these services up is very important, especially in a production environment. Besides restarting a service when it crashes unexpectedly, a monitoring program should also notify you about the incident right away. Monit is an open source utility that handles both tasks quite well.

1. Installation

The installation process is straightforward on both Ubuntu/Debian and RHEL/CentOS systems:

# For RHEL/CentOS
yum install monit

# For Ubuntu/Debian
apt-get install monit

Start monit:

/etc/init.d/monit start

2. Configuration

For Ubuntu/Debian, the main configuration file is `/etc/monit/monitrc`. For RHEL/CentOS, it is `/etc/monit.conf`. Here are some default settings you may want to change (uncomment the lines to enable them):

set daemon 120

This tells monit to check the services every 120 seconds (2 minutes). I often set this to 30 or 45 seconds.

set mailserver mail.bar.baz,               # primary mailserver
               backup.bar.baz port 10025,  # backup mailserver on port 10025
               localhost                   # fallback relay

Configure a mail server so that monit can send you alert emails when something happens. If you use `localhost`, make sure a mail daemon (`sendmail`, `postfix`, …) is installed on your server.

set alert [email protected]

Change this to the email address that monit should send alerts to.

set httpd port 2812 and
    use address localhost  # only accept connection from localhost
    allow localhost        # allow localhost to connect to the server and
    allow admin:monit      # require user 'admin' with password 'monit'
    allow @monit           # allow users of group 'monit' to connect (rw)
    allow @users readonly  # allow users of group 'users' to connect readonly

Monit comes with a built-in web server that provides a dashboard where you can check the status of your services and stop or start them. To reach it from another machine, replace `localhost` in the line `use address localhost` with your server’s IP address, and replace `localhost` in the line `allow localhost` with the IP address of the machine you connect from. Change the default username and password for better security.

When you’ve finished configuring, remember to reload the monit configuration:

monit reload

After this step, you can access monit’s web interface at the URL http://yourserverip:2812. But we’re not done yet: the services themselves still have to be configured for monitoring.

3. Services monitoring

At the end of the main configuration file, there’s a line that looks like this: `include /etc/monit.d/*.conf` (on RHEL/CentOS) or `include /etc/monit/conf.d/*` (on Ubuntu/Debian). This tells monit to load additional configuration from the config folder (`/etc/monit.d/` or `/etc/monit/conf.d/`). We’ll put our service configuration files in this folder. Here are the examples I use for the common services of a Rails app:

#nginx.conf
check process nginx with pidfile /var/run/nginx.pid
  start program = "/etc/init.d/nginx start" with timeout 30 seconds
  stop program  = "/etc/init.d/nginx stop"

#mysql.conf
check process mysqld with pidfile /var/run/mysqld/mysqld.pid
  start program = "/etc/init.d/mysqld start" with timeout 40 seconds
  stop program  = "/etc/init.d/mysqld stop"

#postgresql.conf
check process postgresql with pidfile /var/lib/pgsql/9.3/data/postmaster.pid
  start program = "/etc/init.d/postgresql-9.3 start"
  stop program = "/etc/init.d/postgresql-9.3 stop"

#delayed_job.conf
check process delayed_job with pidfile /var/www/rails-webapp/tmp/pids/delayed_job.pid
  start program = "/bin/su - webapp -c 'cd /var/www/rails-webapp/; RAILS_ENV=production bin/delayed_job start'"
  stop program = "/bin/su - webapp -c 'cd /var/www/rails-webapp/; RAILS_ENV=production bin/delayed_job stop'"

#sphinx.conf
check process searchd with pidfile /var/www/rails-webapp/log/production.sphinx.pid
  start program = "/bin/su - webapp -c 'cd /var/www/rails-webapp/; RAILS_ENV=production rake ts:start'"
  stop program = "/bin/su - webapp -c 'cd /var/www/rails-webapp/; RAILS_ENV=production rake ts:stop'"

#puma.conf
check process puma with pidfile /var/www/rails-webapp/tmp/pids/puma.pid
  start program = "/bin/su - root -c '/etc/init.d/puma start'" with timeout 50 seconds
  stop program = "/etc/init.d/puma stop"

The init script for puma looks like this:

#!/bin/bash
# chkconfig: 2345 95 20
# description: Control rails-app web socket
# Start and stop rails-app web daemon
# processname: rails-app

RACK_ENV="production"
APP_ROOT="/var/www/rails-webapp"
APP_USER="webapp"
DAEMON_OPTS="-C $APP_ROOT/config/puma.rb"
PID_PATH="$APP_ROOT/tmp/pids"
SOCKET_PATH="$APP_ROOT/tmp/sockets"
WEB_SERVER_PID="$PID_PATH/puma.pid"
NAME="rails-app"
DESC="rails-app service"

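check_pid(){
  if [ -f "$WEB_SERVER_PID" ]; then
    PID=`cat "$WEB_SERVER_PID"`
    # Ask ps about this exact PID; grepping "ps aux" for the number would
    # also match unrelated lines that merely contain the same digits.
    STATUS=`ps -p "$PID" -o pid= 2>/dev/null | wc -l`
  else
    STATUS=0
    PID=0
  fi
}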

execute() {
  sudo -u $APP_USER -H bash -l -c "$1"
}

start() {
  cd $APP_ROOT
  check_pid
  if [ "$PID" -ne 0 -a "$STATUS" -ne 0 ]; then
    # Program is running, exit with error code 1.
    echo "Error! $DESC $NAME is currently running!"
    exit 1
  else
    if [ `whoami` = root ]; then
      execute "rm -f $SOCKET_PATH/rails-app_ror.sock"
      execute "RAILS_ENV=production bundle exec puma $DAEMON_OPTS"
      echo "$DESC started"
    fi
  fi
}

stop() {
  cd $APP_ROOT
  check_pid
  if [ "$PID" -ne 0 -a "$STATUS" -ne 0 ]; then
    ## Program is running, stop it.
    kill -KILL `cat $WEB_SERVER_PID`
    rm "$WEB_SERVER_PID" >> /dev/null
    echo "$DESC stopped"
  else
    ## Program is not running, exit with error.
    echo "Error! $DESC not started!"
    exit 1
  fi
}

restart() {
  cd $APP_ROOT
  check_pid
  if [ "$PID" -ne 0 -a "$STATUS" -ne 0 ]; then
    echo "Restarting $DESC..."
    kill -USR2 `cat $WEB_SERVER_PID`
    echo "$DESC restarted."
  else
    echo "Error, $NAME not running!"
    exit 1
  fi
}

status() {
  cd $APP_ROOT
  check_pid
  if [ "$PID" -ne 0 -a "$STATUS" -ne 0 ]; then
    echo "$DESC / Puma with PID $PID is running."
  else
    echo "$DESC is not running."
    exit 1
  fi
}

## Check to see if we are running as root first.
## Found at http://www.cyberciti.biz/tips/shell-root-user-check-script.html
if [ "$(id -u)" != "0" ]; then
    echo "This script must be run as root"
    exit 1
fi

case "$1" in
  start)
        start
        ;;
  stop)
        stop
        ;;
  restart)
        restart
        ;;
  reload|force-reload)
        echo -n "Reloading $NAME configuration: "
        # $PID is not set in this branch; read the PID from the pidfile instead.
        kill -HUP `cat $WEB_SERVER_PID`
        echo "done."
        ;;
  status)
        status
        ;;
  *)
        echo "Usage: sudo service rails-webapp {start|stop|restart|reload|status}" >&2
        exit 1
        ;;
esac

exit 0
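
The `check_pid` function in the script above decides whether puma is running by looking up the PID stored in the pidfile. As a minimal sketch of the same idea, the check can also be done with `kill -0`, which sends no signal and only reports whether the process exists. `pid_alive` is a hypothetical helper name, not something the init script defines:

```shell
# pid_alive PIDFILE
# Hypothetical helper: returns 0 if the PID stored in PIDFILE belongs to a
# live process, non-zero otherwise.
pid_alive() {
  pidfile="$1"
  # No pidfile: the service was never started or was stopped cleanly.
  [ -f "$pidfile" ] || return 1
  pid=$(cat "$pidfile")
  # Reject empty or non-numeric content before handing it to kill.
  case "$pid" in
    ''|*[!0-9]*) return 1 ;;
  esac
  # Signal 0 delivers nothing; it only checks that the process exists
  # and that we are allowed to signal it (the init script runs as root).
  kill -0 "$pid" 2>/dev/null
}
```

With this helper, a test such as `if pid_alive "$WEB_SERVER_PID"; then ...` could stand in for the combined `$PID`/`$STATUS` comparison, and a stale pidfile left behind by a crash is treated as "not running".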

#sidekiq.conf
check process sidekiq with pidfile "/var/www/rails-webapp/tmp/pids/sidekiq.pid"
start program = "/etc/init.d/sidekiq start"
stop program = "/etc/init.d/sidekiq stop"

The init script for sidekiq looks like this:

#!/bin/bash
# sidekiq    Init script for Sidekiq
# chkconfig: 345 100 75
#
# Description: Starts and Stops Sidekiq message processor.
#
# User-specified exit parameters used in this script:
#
# Exit Code 5 - Incorrect User ID
# Exit Code 6 - Directory not found

# You will need to modify these
APP="sidekiq"
AS_USER="webapp"
APP_DIR="/var/www/rails-webapp"

LOG_FILE="$APP_DIR/log/sidekiq.log"
LOCK_FILE="$APP_DIR/tmp/pids/${APP}-lock"
PID_FILE="$APP_DIR/tmp/pids/${APP}.pid"
SIDEKIQ="sidekiq"
APP_ENV="production"
BUNDLE="bundle"

START_CMD="$BUNDLE exec $SIDEKIQ -e $APP_ENV -P $PID_FILE"
CMD="cd ${APP_DIR}; ${START_CMD} >> ${LOG_FILE} 2>&1 &"

RETVAL=0


start() {

  status
  if [ $? -eq 1 ]; then

    # Braces, not parentheses: an exit inside ( ) would only leave a subshell.
    [ `id -u` = '0' ] || { echo "$SIDEKIQ must be started as root .."; exit 5; }
    [ -d "$APP_DIR" ] || { echo "$APP_DIR not found!.. Exiting"; exit 6; }
    cd $APP_DIR
    echo "Starting $SIDEKIQ message processor .. "

    su -c "$CMD" - $AS_USER

    RETVAL=$?
    #Sleeping for 8 seconds for process to be precisely visible in process table - See status ()
    sleep 8
    [ $RETVAL -eq 0 ] && touch $LOCK_FILE
    return $RETVAL
  else
    echo "$SIDEKIQ message processor is already running .. "
  fi


}

stop() {

    echo "Stopping $SIDEKIQ message processor .."
    SIG="INT"
    kill -$SIG `cat  $PID_FILE`
    RETVAL=$?
    [ $RETVAL -eq 0 ] && rm -f $LOCK_FILE && rm -f $PID_FILE
    sleep 8
    return $RETVAL
}

status() {

  ps -ef | grep 'sidekiq [0-9].[0-9].[0-9]' | grep -v grep
  return $?
}


case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    status)
        status

        if [ $? -eq 0 ]; then
             echo "$SIDEKIQ message processor is running .."
             RETVAL=0
         else
             echo "$SIDEKIQ message processor is stopped .."
             RETVAL=1
         fi
        ;;
    *)
        echo "Usage: $0 {start|stop|status}"
        exit 1
        ;;
esac
exit $RETVAL
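
The guard checks at the top of `start()` are meant to abort with exit codes 5 and 6, and that only works when the commands after `||` are grouped with braces rather than parentheses. The following sketch contrasts the two forms; `require_dir_subshell` and `require_dir_braces` are hypothetical names used only for this illustration:

```shell
# Parentheses run their commands in a subshell: "exit 6" ends the subshell
# only, so the function carries on with the next line.
require_dir_subshell() {
  [ -d "$1" ] || (echo "$1 not found"; exit 6)
  echo "continuing"
}

# Braces run their commands in the current shell: "return 6" (or "exit 6"
# in the body of a script) really stops execution at this point.
require_dir_braces() {
  [ -d "$1" ] || { echo "$1 not found"; return 6; }
  echo "continuing"
}
```

Calling `require_dir_subshell` with a missing directory prints the error and then still prints `continuing`, while `require_dir_braces` prints the error and returns 6.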

#elasticsearch.conf
check process elasticsearch with pidfile "/var/run/elasticsearch.pid"
  group elasticsearch
  start program = "/etc/init.d/elasticsearch start"
  stop program = "/etc/init.d/elasticsearch stop"

check host elasticsearch_connection with address 0.0.0.0
  if failed url http://0.0.0.0:9200/ with timeout 15 seconds then alert

#redis.conf
check process redis-server with pidfile "/var/run/redis_9229.pid"
  start program = "/etc/init.d/redis_9229 start"
  stop program = "/etc/init.d/redis_9229 stop"
  if failed host 127.0.0.1 port 9229 then restart
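
Most of the service files above share the same shape: a `check process` line followed by a start and a stop command. As a sketch, a small shell function can print that skeleton for a new service. `monit_stanza` is a hypothetical helper, not part of monit, and it assumes the service has an init script that accepts `start` and `stop`:

```shell
# monit_stanza NAME PIDFILE INITSCRIPT
# Hypothetical helper: prints a basic "check process" stanza in the same
# shape as the nginx and mysql examples.
monit_stanza() {
  name="$1"; pidfile="$2"; initscript="$3"
  cat <<EOF
check process $name with pidfile $pidfile
  start program = "$initscript start"
  stop program  = "$initscript stop"
EOF
}

# Example (not executed here):
#   monit_stanza nginx /var/run/nginx.pid /etc/init.d/nginx > /etc/monit.d/nginx.conf
```

The output is only a starting point; timeouts and extra tests such as `if failed host ... then restart` still have to be added by hand.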

Once you’ve finished the configuration, check that the syntax is valid:

monit -t
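
Since a syntax error should stop you from going any further, the check and the reload can be chained. A minimal sketch, where `monit_safe_reload` is a hypothetical wrapper name and only the `monit -t` and `monit reload` commands shown in this article are used:

```shell
# monit_safe_reload
# Hypothetical wrapper: reloads monit only when the syntax check passes,
# so a reload is never attempted with a configuration that fails the check.
monit_safe_reload() {
  if monit -t; then
    monit reload
  else
    echo "monit configuration has errors; not reloading" >&2
    return 1
  fi
}
```

Run it as root after editing any file in the config folder.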

If everything is all right, start monitoring all the services:

monit start all

Or you can stop or start a service individually:

monit stop servicename
monit start servicename

where `servicename` is the name that follows `check process` in the service configuration file.
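
If you’re not sure which names are available, a small sketch like the following pulls them out of the service configuration files. `list_monit_services` is a hypothetical helper, and it assumes each statement sits on one line with `check process` as its first two words, as in the examples in this article:

```shell
# list_monit_services DIR
# Hypothetical helper: prints the service name from every "check process"
# line found in the files under DIR.
list_monit_services() {
  # In "check process <name> with pidfile ...", field 3 is the service name.
  # Comment lines such as "#nginx.conf" do not match and are skipped.
  awk '$1 == "check" && $2 == "process" { print $3 }' "$1"/*
}
```

For example, `list_monit_services /etc/monit.d` (or `/etc/monit/conf.d` on Ubuntu/Debian) prints one name per line, and each name can be passed to `monit start` or `monit stop`.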