Check BGP Neighbors

Background

BGP is a fairly hairy, medium-sized beast that can be hard for new admins to handle. This script was developed to catch some of the problems that can burn an admin who runs BGP and monitors it with Nagios. It checks several things to determine the health and status of the BGP connections with neighbors. BGP problems often become obvious only when other services start to fail, so this script is meant to catch them before they cause larger problems in the network. The script is written for BGP4 on a Cisco router, but it could be adapted to other vendors once the correct OIDs are identified.

 

Download

Nagios Exchange

Versioned Code

 

Architecture

The check_bgp_neighbors script checks 4 different aspects of BGP.

  • BGP Connection Status – This is a basic check which relies on Cisco's internal implementation of BGP to determine the BGP connection status. The value is read from a specific MIB on the Cisco router.
  • Number of prefixes in memory – This value is polled for each BGP neighbor from a specific Cisco MIB. In older books this value is usually polled through remote commands, but now it can be retrieved over SNMP. It gives the total number of prefixes in memory which have been received from each neighbor.
  • BGP Messages received during the Nagios polling period – This is a counter of the total number of BGP specific messages received from each neighbor. The value for the polling period is calculated by saving the previous counter reading and subtracting it from the current one. This provides a differential number which can be averaged.
  • BGP Messages sent during the Nagios polling period – This is a counter of the total number of BGP specific messages sent to each neighbor. The value for the polling period is calculated the same way, by saving the previous counter reading and subtracting it from the current one. This provides a differential number which can be averaged.
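The save-and-subtract step used for the two message counters can be sketched as follows. This is a minimal illustration, not the script's actual code: the function names, the JSON state file, and the assumption that the counters are 32-bit SNMP counters which may wrap between polls are all hypothetical.

```python
import json
import os

COUNTER32_MODULUS = 2 ** 32  # assumption: the polled counters are SNMP Counter32 values


def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Return the messages seen since the last poll, allowing for one counter wrap."""
    if previous is None:
        return None  # first run: nothing saved yet, so no difference can be computed
    if current >= previous:
        return current - previous
    # The counter went backwards, which is treated here as a single wrap-around.
    return current + modulus - previous


def update_state(state_path, key, current):
    """Load the saved reading for `key`, store `current`, and return the difference."""
    state = {}
    if os.path.exists(state_path):
        with open(state_path) as handle:
            state = json.load(handle)
    delta = counter_delta(state.get(key), current)
    state[key] = current
    with open(state_path, "w") as handle:
        json.dump(state, handle)
    return delta
```

Note that a BGP session reset may also restart the counters from zero. In that case the wrap assumption above would report a very large difference, so a real check may want to treat a backwards counter as "no data" instead.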

 

Routine Operations

Run from Command Line

Basic usage from the command line, where 10.0.0.1 is the first neighbor and 10.0.0.2 is the second neighbor:

/usr/local/nagios/libexec/check_bgp_neighbors  -H router.example.com -C public -n 10.0.0.1 -n 10.0.0.2
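For readers curious how a repeatable -n option like the one above can be handled, here is a small sketch using Python's argparse. Only the -H, -C and -n flags come from the usage line. The destination names and help texts are assumptions, and the real script may parse its arguments differently.

```python
import argparse


def parse_args(argv):
    """Parse the flags shown in the usage line; -n may be given more than once."""
    parser = argparse.ArgumentParser(prog="check_bgp_neighbors")
    parser.add_argument("-H", dest="host", required=True,
                        help="router hostname or address")
    parser.add_argument("-C", dest="community", required=True,
                        help="SNMP community string")
    parser.add_argument("-n", dest="neighbors", action="append", required=True,
                        help="BGP neighbor address; repeat for each neighbor")
    return parser.parse_args(argv)


args = parse_args(["-H", "router.example.com", "-C", "public",
                   "-n", "10.0.0.1", "-n", "10.0.0.2"])
print(args.host, args.neighbors)
```

With action="append", each -n adds one entry to the list, so the order of the neighbors on the command line is preserved.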

 

Installation (Bash)

The following is an example installation for people who may not be completely comfortable creating new Nagios commands/checks. Note that the snmpget command must be available on the Nagios server.

Download the script and copy it to your Nagios server, e.g.

/usr/local/nagios/libexec/

Add a Nagios command definition like the one below:

# check_bgp_neighbors command definition
define command{
       command_name check_bgp_neighbors
       command_line $USER1$/check_bgp_neighbors -H $HOSTADDRESS$ -C $USER3$ -n $ARG1$ -n $ARG2$
       }

Optional: Add Nagios hostgroups like the examples below:

# Associated in svc-bgp.cfg
define hostgroup{
       hostgroup_name   svc-bgp1
       alias        BGP Check 1
       }

# Associated in svc-bgp.cfg
define hostgroup{
       hostgroup_name   svc-bgp2
       alias        BGP Check 2
       }

Optional: Add a specific file with the service checks. The first IP address, 10.0.0.1, is the eBGP neighbor; the second, 172.16.0.2, is the iBGP neighbor. The 172.16.0.0/12 network is what connects the two local iBGP peers together.

define service{
       use              server-service
       hostgroup_name           svc-bgp1
       service_description      BGP Check 1
       check_command            check_bgp_neighbors!10.0.0.1!172.16.0.2
       }

define service{
       use              server-service
       hostgroup_name           svc-bgp2
       service_description      BGP Check 2
       check_command            check_bgp_neighbors!192.168.0.1!172.16.0.1
       }

Optional: Finally, add the service check to the host definitions for your routers.

define host{
       use          network-host
       host_name        router1
       hostgroups       svc-bgp1
       }

define host{
       use          network-host
       host_name        router2
       hostgroups       svc-bgp2
       }

 

Installation (Python)

The following is an example installation for people who may not be completely comfortable creating new Nagios commands/checks. Note that the snmpget command must be available on the Nagios server.

Download the script and copy it to your Nagios server, e.g.

/usr/local/nagios/libexec/

Add a Nagios command definition like the one below:

# check_bgp_neighbors command definition
define command{
       command_name check_bgp_neighbors
       command_line $USER1$/check_bgp_neighbors -H $HOSTADDRESS$ -C $USER3$ -n $ARG1$ -n $ARG2$
       }

Optional: Add Nagios hostgroups like the examples below:

# Associated in svc-bgp.cfg
define hostgroup{
       hostgroup_name   svc-bgp1
       alias        BGP Check 1
       }

# Associated in svc-bgp.cfg
define hostgroup{
       hostgroup_name   svc-bgp2
       alias        BGP Check 2
       }

Optional: Add a specific file with the service checks. The first IP address, 10.0.0.1, is the eBGP neighbor; the second, 172.16.0.2, is the iBGP neighbor. The 172.16.0.0/12 network is what connects the two local iBGP peers together. The Python version of this script allows the administrator to specify a threshold for each neighbor. This matters for better checking because iBGP and eBGP will not always have the same number of prefixes in memory.

define service{
       use              server-service
       hostgroup_name           svc-bgp1
       service_description      BGP Check 1
       check_command            check_bgp_neighbors!10.0.0.1!172.16.0.2:125000
       }

define service{
       use              server-service
       hostgroup_name           svc-bgp2
       service_description      BGP Check 2
       check_command            check_bgp_neighbors!192.168.0.1!172.16.0.1:296000
       }

Optional: Finally, add the service check to the host definitions for your routers.

define host{
       use          network-host
       host_name        router1
       hostgroups       svc-bgp1
       }

define host{
       use          network-host
       host_name        router2
       hostgroups       svc-bgp2
       }
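The per-neighbor threshold syntax used in the service definitions above (for example 172.16.0.2:125000) could be split apart along these lines. This is only a sketch: the function name, the default of 0 when no threshold is given, and the reading of the number as a minimum prefix count are assumptions rather than the script's documented behavior. It also handles IPv4 addresses only, since the colon separator would clash with IPv6 notation.

```python
import ipaddress


def parse_neighbor(spec, default_min_prefixes=0):
    """Split 'ADDRESS[:MIN_PREFIXES]' into an (address, min_prefixes) tuple."""
    address, sep, threshold = spec.partition(":")
    ipaddress.IPv4Address(address)  # raises ValueError if the address is malformed
    min_prefixes = int(threshold) if sep else default_min_prefixes
    return address, min_prefixes


print(parse_neighbor("172.16.0.2:125000"))
print(parse_neighbor("10.0.0.1"))
```

A neighbor given without a threshold, like 10.0.0.1 in the example, simply falls back to the default.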

 

Tuning & Tips

We have found that a 2 minute Nagios polling period works fairly well. As of this writing, eBGP with full tables should put about 230K-250K prefixes in memory for each peer, so the total with one eBGP and one iBGP partner will be about 500K. Set the thresholds for prefixes, rx messages, and tx messages low enough that you are not paged during low update times. Having a threshold, even a low one, will at least tell you whether you are talking to your neighbors. This has burned us before: we had prefixes in memory, but they slowly dwindled because we were not receiving messages, even though the BGP connection to our eBGP neighbor was not flagged as down. Graphing these OIDs in Cacti or MRTG can provide you with good baselines, which can be used to set lower bounds for tx messages, rx messages, and the number of prefixes in memory.
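The lower-bound idea described above can be sketched as a small evaluation function. The metric names, the example numbers, and the choice to report every breach as CRITICAL are assumptions made for illustration. Only the exit code values (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) are the standard Nagios plugin convention.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # standard Nagios plugin exit codes


def evaluate(neighbor, metrics, minimums):
    """Return (status, message) for one neighbor, given observed values and lower bounds."""
    problems = []
    for name, minimum in minimums.items():
        value = metrics.get(name)
        if value is None:
            return UNKNOWN, f"{neighbor}: no data for {name}"
        if value < minimum:
            problems.append(f"{name}={value} below {minimum}")
    if problems:
        return CRITICAL, f"{neighbor}: " + ", ".join(problems)
    return OK, f"{neighbor}: all values at or above their lower bounds"


# Hypothetical reading: prefixes still look healthy, but no messages arrived this period.
status, message = evaluate(
    "10.0.0.1",
    {"prefixes": 240000, "rx_messages": 0, "tx_messages": 4},
    {"prefixes": 125000, "rx_messages": 1, "tx_messages": 1},
)
print(status, message)
```

The example mirrors the failure described above: the prefix count alone would pass, but the lower bound on received messages catches the silent neighbor.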