Check BGP Neighbors

Background

BGP is a fairly hairy, medium-sized beast that can be hard for new admins to handle. This script was developed to catch some of the problems that can burn an admin when using and monitoring BGP with Nagios. It checks several different things to determine the health and status of the BGP connections with neighbors. These problems often become obvious only when other services start to fail because of BGP; this script is meant to catch BGP problems before they cause larger problems in the network. The script is written for BGP4 on a Cisco router, but it could easily be adapted to other vendors once the correct OIDs are identified.

 

Download

Nagios Exchange

Versioned Code

 

Architecture

The check_bgp_neighbors script checks 4 different aspects of BGP.

  • BGP Connection Status – A basic check that relies on Cisco's internal BGP implementation to report the connection status of each neighbor. The status is read from a specific MIB on the Cisco router.
  • Number of prefixes in memory – This value is polled for each BGP neighbor from a specific Cisco MIB. Older books usually describe retrieving it with remote commands, but it can now be retrieved over SNMP. It gives the total number of prefixes in memory that have been received from each neighbor.
  • BGP messages received during the Nagios polling period – A counter of the total number of BGP messages received from each neighbor. The per-period value is calculated by saving the previous reading and subtracting it from the current reading. This provides a differential number that can be averaged.
  • BGP messages sent during the Nagios polling period – A counter of the total number of BGP messages sent to each neighbor. The per-period value is calculated the same way, by subtracting the previously saved reading from the current one, and can likewise be averaged.
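The differential calculation described in the list can be sketched in Python as follows. This is an illustration, not the script's actual code: the OIDs are taken from the standard BGP4-MIB (RFC 4273) as an assumption, since the page only says "a specific MIB", and the JSON state file, the helper names (peer_oid, counter_delta, poll_delta) and the reset handling are all hypothetical choices made for the sketch.

```python
import json
import os
from typing import Optional

# Standard BGP4-MIB (RFC 4273) bgpPeerTable columns. These are an assumption
# for illustration; check_bgp_neighbors may poll Cisco-specific OIDs instead.
BGP_PEER_TABLE = "1.3.6.1.2.1.15.3.1"
PEER_STATE = BGP_PEER_TABLE + ".2"                # bgpPeerState
PEER_IN_TOTAL_MESSAGES = BGP_PEER_TABLE + ".12"   # bgpPeerInTotalMessages
PEER_OUT_TOTAL_MESSAGES = BGP_PEER_TABLE + ".13"  # bgpPeerOutTotalMessages

# bgpPeerState enumeration values defined by BGP4-MIB.
PEER_STATES = {1: "idle", 2: "connect", 3: "active",
               4: "opensent", 5: "openconfirm", 6: "established"}


def peer_oid(column: str, neighbor: str) -> str:
    """bgpPeerTable rows are indexed by the neighbor's IPv4 address."""
    return f"{column}.{neighbor}"


def counter_delta(previous: Optional[int], current: int) -> Optional[int]:
    """Messages seen since the previous poll, or None on the first run."""
    if previous is None:
        return None  # nothing saved yet, so no differential is available
    if current < previous:
        # The counter went backwards: treat it as reset (e.g. the session
        # re-established) and count from zero rather than guess at a wrap.
        return current
    return current - previous


def poll_delta(state_file: str, key: str, current: int) -> Optional[int]:
    """Load the saved reading for `key`, save `current`, return the delta."""
    state = {}
    if os.path.exists(state_file):
        with open(state_file) as fh:
            state = json.load(fh)
    delta = counter_delta(state.get(key), current)
    state[key] = current  # becomes the "last value" for the next poll
    with open(state_file, "w") as fh:
        json.dump(state, fh)
    return delta
```

With a 2-minute polling period, dividing the returned delta by 2 gives an average messages-per-minute figure for that neighbor.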

 

Routine Operations

Run from Command Line

Basic usage from the command line, where 10.0.0.1 is the first neighbor and 10.0.0.2 is the second neighbor:

/usr/local/nagios/libexec/check_bgp_neighbors -H router.example.com -C public -n 10.0.0.1 -n 10.0.0.2
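The options in the command above can be modeled with Python's argparse, which makes it clear that -n is given once per neighbor. This is a hypothetical re-creation for illustration only; the dest names and help strings are assumptions, and the real script's option handling may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the -H, -C and repeated -n options."""
    parser = argparse.ArgumentParser(prog="check_bgp_neighbors")
    parser.add_argument("-H", dest="host", required=True,
                        help="router hostname or address to poll over SNMP")
    parser.add_argument("-C", dest="community", required=True,
                        help="SNMP community string")
    # action="append" collects every -n into a list, one entry per neighbor.
    parser.add_argument("-n", dest="neighbors", action="append", required=True,
                        help="BGP neighbor address; repeat for each neighbor")
    return parser
```

Parsing the example arguments yields neighbors == ["10.0.0.1", "10.0.0.2"], in the order they were given.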

 

Installation (Bash)

The following is an example installation for people who may not be completely comfortable creating new Nagios commands/checks. Note that the snmpget command must also be available on the Nagios server.

Download this script and copy it to your Nagios server, e.g. to

/usr/local/nagios/libexec/

Add a Nagios command definition like the one below:

# check_bgp_neighbors command definition
define command{
command_name check_bgp_neighbors
command_line $USER1$/check_bgp_neighbors -H $HOSTADDRESS$ -C $USER3$ -n $ARG1$ -n $ARG2$
}
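Because the plugin depends on snmpget being installed on the Nagios server, it can help to confirm that it is on the PATH and to see the kind of query involved. The Python sketch below assumes SNMP v2c and uses the Net-SNMP -Oqv output options to print only the value; the function names are hypothetical and the exact snmpget invocation used by check_bgp_neighbors is not documented here.

```python
import shutil
import subprocess
from typing import List


def snmpget_argv(host: str, community: str, oid: str) -> List[str]:
    """Build an snmpget command line (assumes SNMP v2c)."""
    # -Oqv asks Net-SNMP to print only the value, without the OID or type.
    return ["snmpget", "-v", "2c", "-c", community, "-Oqv", host, oid]


def snmp_value(host: str, community: str, oid: str) -> str:
    """Run snmpget and return its output, failing clearly if it is missing."""
    if shutil.which("snmpget") is None:
        raise RuntimeError("snmpget not found on PATH; install the Net-SNMP tools")
    result = subprocess.run(snmpget_argv(host, community, oid),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

For example, snmp_value("router.example.com", "public", "1.3.6.1.2.1.15.3.1.2.10.0.0.1") would query the BGP4-MIB bgpPeerState for neighbor 10.0.0.1, where a value of 6 means established.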

Optional: Add Nagios hostgroups like the examples below:

# Associated in svc-bgp.cfg
define hostgroup{
hostgroup_name svc-bgp1
alias BGP Check 1
}

# Associated in svc-bgp.cfg
define hostgroup{
hostgroup_name svc-bgp2
alias BGP Check 2
}

Optional: Add a separate file with the service checks. In the first check, 10.0.0.1 is the eBGP neighbor and 172.16.0.2 is the iBGP neighbor. The 172.16.0.0/12 network is what connects the two local iBGP peers together.

define service{
use server-service
hostgroup_name svc-bgp1
service_description BGP Check 1
check_command check_bgp_neighbors!10.0.0.1!172.16.0.2
}

define service{
use server-service
hostgroup_name svc-bgp2
service_description BGP Check 2
check_command check_bgp_neighbors!192.168.0.1!172.16.0.1
}
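Nagios splits a service's check_command on "!": the first field names the command definition, and the remaining fields fill $ARG1$, $ARG2$, and so on in its command_line. The Python sketch below is a simplified model of that substitution for illustration only (it is not how Nagios implements macro expansion), showing how the two neighbor addresses end up as the two -n options.

```python
from typing import Dict


def expand_check(command_line: str, check_command: str,
                 macros: Dict[str, str]) -> str:
    """Simplified model of Nagios $MACRO$ and $ARGn$ substitution."""
    # Field 0 is the command name; the rest become ARG1, ARG2, ...
    _command_name, *args = check_command.split("!")
    values = dict(macros)
    for i, arg in enumerate(args, start=1):
        values[f"ARG{i}"] = arg
    expanded = command_line
    for key, value in values.items():
        # Macros are delimited by "$", so "$ARG1$" never matches "$ARG10$".
        expanded = expanded.replace(f"${key}$", value)
    return expanded
```

With command_line set to the definition above, check_command "check_bgp_neighbors!10.0.0.1!172.16.0.2", $USER1$ pointing at /usr/local/nagios/libexec, $HOSTADDRESS$ as router1 and $USER3$ as public, the result is "/usr/local/nagios/libexec/check_bgp_neighbors -H router1 -C public -n 10.0.0.1 -n 172.16.0.2".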

Optional: Finally, add the hostgroups to the host definitions for your routers so that each router picks up its BGP service check.

define host{
use network-host
host_name router1
hostgroups svc-bgp1
}

define host{
use network-host
host_name router2
hostgroups svc-bgp2
}

Installation (Python)

The following is an example installation for people who may not be completely comfortable creating new Nagios commands/checks. Note that the snmpget command must also be available on the Nagios server.

Download this script and copy it to your Nagios server, e.g. to

/usr/local/nagios/libexec/

Add a Nagios command definition like the one below:

# check_bgp_neighbors command definition
define command{
command_name check_bgp_neighbors
command_line $USER1$/check_bgp_neighbors -H $HOSTADDRESS$ -C $USER3$ -n $ARG1$ -n $ARG2$
}

Optional: Add Nagios hostgroups like the examples below:

# Associated in svc-bgp.cfg
define hostgroup{
hostgroup_name svc-bgp1
alias BGP Check 1
}

# Associated in svc-bgp.cfg
define hostgroup{
hostgroup_name svc-bgp2
alias BGP Check 2
}

Optional: Add a separate file with the service checks. In the first check, 10.0.0.1 is the eBGP neighbor and 172.16.0.2 is the iBGP neighbor. The 172.16.0.0/12 network is what connects the two local iBGP peers together. The Python version of this script lets the administrator specify a threshold for each neighbor by appending it to the address after a colon (e.g. 172.16.0.2:125000). This matters for accurate checking because iBGP and eBGP peers will not always have the same number of prefixes in memory.

define service{
use server-service
hostgroup_name svc-bgp1
service_description BGP Check 1
check_command check_bgp_neighbors!10.0.0.1!172.16.0.2:125000
}

define service{
use server-service
hostgroup_name svc-bgp2
service_description BGP Check 2
check_command check_bgp_neighbors!192.168.0.1!172.16.0.1:296000
}
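The address:threshold form in these service definitions can be split with a small helper like the Python sketch below. The format is inferred from the examples (10.0.0.1 with no threshold, 172.16.0.2:125000 with one), and treating the number as a minimum prefix count is an assumption; parse_neighbor is a hypothetical name, not part of the script.

```python
from typing import Optional, Tuple


def parse_neighbor(spec: str) -> Tuple[str, Optional[int]]:
    """Split 'address[:threshold]' into the address and an optional threshold."""
    # Assumed format, inferred from examples like "172.16.0.2:125000".
    # IPv4 only: IPv6 addresses contain ":" and would need another separator.
    address, sep, threshold = spec.partition(":")
    if not sep:
        return address, None  # no per-neighbor threshold was given
    return address, int(threshold)
```

A neighbor given without a threshold comes back with None, so the caller can fall back to a default.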
Optional: Finally, add the hostgroups to the host definitions for your routers so that each router picks up its BGP service check.

define host{
use network-host
host_name router1
hostgroups svc-bgp1
}

define host{
use network-host
host_name router2
hostgroups svc-bgp2
}

Tuning & Tips

We have found that a 2-minute Nagios polling period works fairly well. As of this writing, an eBGP session carrying full tables should put about 230K-250K prefixes in memory for each peer, so the total with one eBGP and one iBGP partner will be about 500K. Set the thresholds for prefixes, rx messages, and tx messages low enough that you are not paged during periods with few updates. Having a threshold, even a low one, will at least tell you whether you are still talking to your neighbors. This has burned us before: we had prefixes in memory, but they slowly dwindled because we were not receiving messages, even though the BGP connection to our eBGP neighbor was not flagged as down. Graphing these OIDs in Cacti or MRTG can give you good baselines, which can be used to set lower bounds for tx messages, rx messages, and the number of prefixes in memory.
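The lower-bound advice above can be expressed as a small evaluation step that maps the readings to standard Nagios plugin exit codes (0 for OK, 2 for CRITICAL). This Python sketch is illustrative: the function name, metric keys and threshold values are hypothetical, and a real check might also use WARNING (1). It shows how a low but nonzero floor on rx messages catches a peer that stays established while silently going quiet.

```python
from typing import Dict, Tuple

OK, CRITICAL = 0, 2  # standard Nagios plugin exit codes


def evaluate(neighbor: str, state: str, metrics: Dict[str, int],
             minimums: Dict[str, int]) -> Tuple[int, str]:
    """Return a Nagios exit code and status line for one BGP neighbor."""
    if state != "established":
        return CRITICAL, f"CRITICAL - {neighbor} BGP state is {state}"
    low = []
    for name, floor in minimums.items():
        value = metrics.get(name, 0)  # a missing reading counts as zero
        if value < floor:
            low.append(f"{name}={value}<{floor}")
    if low:
        return CRITICAL, f"CRITICAL - {neighbor} " + ", ".join(low)
    summary = ", ".join(f"{k}={v}" for k, v in sorted(metrics.items()))
    return OK, f"OK - {neighbor} {summary}"
```

For example, evaluate("10.0.0.1", "established", {"prefixes": 240000, "rx": 0, "tx": 40}, {"prefixes": 200000, "rx": 10, "tx": 5}) returns CRITICAL even though the session is up and prefixes are still present, which is exactly the dwindling-prefixes case described above.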
