How do you configure root cause analysis (RCA) in a network where the ITNM server is out of scope? This article explains how to set up RCA correctly in a network where the ITNM server is not within the discovery scope.
What is RCA?
RCA is the process of determining the root cause of one or more device alerts. ITNM utilises an RCA plugin that receives a subset of enriched events from the Event Gateway (ncp_g_event in ITNM 3.9) and determines whether each event is a root cause or a symptom.
RCA relies on a concept called the poller entity (or polling station). The poller entity is the server from which ITNM polls devices, and it is the reference point for the whole RCA process.
If the poller entity is not within the scope of the discovery, then the IP address or DNS name of the ingress interface must be specified to enable RCA to perform isolated suppression. Where the ITNM server is outside the discovery scope, the ingress interface is the interface through which ITNM connects into the discovered network. This is demonstrated in Figure 1.
In Figure 1 the discovery scope is configured only for devices in the Class C network 192.168.10.0/24, so the ITNM server is out of scope.
For RCA to perform isolated suppression correctly, the IP address or DNS name of the ingress interface must therefore be used. In Figure 1 this would be 172.16.10.242, as this is the interface within the discovery scope through which network packets travel to and from the poller entity.
To use an ingress interface as the poller entity, a value is required in the NcpServerEntity field of the config.defaults table. This value is entered in the $NCHOME/etc/precision/EventGatewaySchema.cfg file.
Definition of downstream
Downstream specifies a location on the network topologically more distant from the poller entity but on the same physical path as a second location.
A failure of a single entity on the network can generate numerous alerts. In particular, the failure of a link on a network path renders the devices downstream of that link inaccessible (see Figure 2).
In Figure 2 the link between routers B and C has failed, and it was the sole connection over which ITNM (as the poller entity) could communicate with routers C and D. The ITNM poller continues to poll routers C and D, but the polls time out because the connection between routers B and C is down, producing an avalanche of alerts. The RCA process, however, suppresses the alerts downstream of Router B and identifies the failed connection between Router B and Router C as the root cause.
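The isolation described above can be illustrated with a small reachability check. This is only a sketch of the idea, not how the ITNM RCA plugin is implemented: the chain of routers is modelled loosely on Figure 2, and the node names `poller` and `A` are assumptions added for illustration.

```python
from collections import deque

def reachable(links, start):
    """Return the set of nodes reachable from start over undirected links."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen

# Hypothetical chain modelled on Figure 2: poller - A - B - C - D
links = [("poller", "A"), ("A", "B"), ("B", "C"), ("C", "D")]
devices = {"A", "B", "C", "D"}

# Remove the failed link and see which devices the poller can no longer reach.
failed_link = ("B", "C")
remaining = [link for link in links if link != failed_link]
isolated = devices - reachable(remaining, "poller")

print(sorted(isolated))  # ['C', 'D']
```

The devices in `isolated` are the ones whose poll failures are symptoms rather than causes, which is why their alerts are candidates for suppression.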
Definition of upstream
Upstream specifies a location on the network topologically closer to the poller entity but on the same physical path as a second location.
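The two definitions can be made concrete with a short sketch that classifies one device relative to another by comparing their paths from the poller entity. The topology, the node names and the helper functions are all hypothetical, and the sketch assumes a single path from the poller to each device; a meshed network with redundant paths would need a more careful treatment.

```python
from collections import deque

def bfs_parents(adjacency, poller):
    """Map each node to its predecessor on a shortest path from the poller."""
    parents = {poller: None}
    queue = deque([poller])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in parents:
                parents[neighbour] = node
                queue.append(neighbour)
    return parents

def path_from_poller(parents, node):
    """Return the list of nodes from the poller to the given node."""
    path = []
    while node is not None:
        path.append(node)
        node = parents[node]
    return path[::-1]

def relation(parents, first, second):
    """Describe where first sits relative to second, as seen from the poller."""
    if first == second:
        return "same"
    if second in path_from_poller(parents, first):
        return "downstream"   # first is further from the poller, on the same path
    if first in path_from_poller(parents, second):
        return "upstream"     # first is closer to the poller, on the same path
    return "unrelated"        # the two nodes are not on the same path

# Hypothetical topology: poller - A - B - C - D, with a side branch A - E
adjacency = {
    "poller": ["A"],
    "A": ["poller", "B", "E"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C"],
    "E": ["A"],
}
parents = bfs_parents(adjacency, "poller")

print(relation(parents, "D", "B"))  # downstream
print(relation(parents, "B", "D"))  # upstream
print(relation(parents, "E", "C"))  # unrelated
```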
The network infrastructure used throughout the remainder of this article is displayed in Figure 3.
The following points are important to understand the network infrastructure.
The ITNM server has an IP address of 192.168.9.20
The default gateway used by the ITNM server is 192.168.9.30, which is a Fast Ethernet port on the Berwick router.
The only scopes configured for the discovery will be:
The configured scopes will discover the Newcastle, Manchester, Birmingham and Southampton network devices.
The ingress interface is 188.8.131.52
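Whether the poller entity falls inside the discovery scope can be checked with a few lines of Python. The scope list below is an assumption: the three /24 networks are inferred from the network view filters used later in this article, not taken from the Scope Configuration page itself. The helper `in_scope` is hypothetical and is not part of ITNM.

```python
import ipaddress

# Assumed scopes: /24 networks inferred from the network view filters
# (172.16.10.%, 180.24.10.%, 190.32.10.%) used later in the article.
assumed_scopes = [
    ipaddress.ip_network(network)
    for network in ("172.16.10.0/24", "180.24.10.0/24", "190.32.10.0/24")
]

def in_scope(address, scopes):
    """Return True if the address belongs to any of the scope networks."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in scopes)

itnm_server = "192.168.9.20"  # ITNM server address from the list above

print(in_scope(itnm_server, assumed_scopes))      # False
print(in_scope("172.16.10.242", assumed_scopes))  # True
```

A result of `False` for the ITNM server is the situation this article addresses: the poller entity is out of scope, so an ingress interface has to be named instead.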
To configure RCA, log in to the ITNM web console and open the Network Discovery Configuration pages.
The following three scopes were required:-
Figure 4 displays a completed Scope Configuration.
The appropriate SNMP community strings were entered. In the network infrastructure for this article, a community string of ‘public’ and SNMP version 2 were used. Figure 5 displays an example of a completed SNMP Community Strings page.
The defaults were accepted for the remainder of the scope configuration.
Configure Poller Entity
To allow RCA to perform isolated suppression when the Network Manager server is not within the discovery scope, the IP address or DNS name of the ingress interface must be specified as the poller entity. The following section outlines the method for adding the ingress interface as the poller entity.
- The EventGatewaySchema.cfg configuration file has to be edited. This file is located at: $NCHOME/etc/precision/EventGatewaySchema.cfg. The poller entity value is stored in the config.defaults table, in the NcpServerEntity field.
- Open the EventGatewaySchema.cfg configuration file and identify the insert statement into the config.defaults table. By default this insert statement has the following form:
insert into config.defaults
- By default the NcpServerEntity field is empty. In this case, the Event Gateway searches the topology using the IP address or addresses of the local host it is running on. The statement has to be modified to set the NcpServerEntity field to the IP address of the ingress interface.
Figure 6 shows the EventGatewaySchema.cfg file with a populated NcpServerEntity.
On entering a value for NcpServerEntity, ITNM was restarted via the itnm_stop and itnm_start commands.
Create Network View
After ITNM was restarted, a full discovery of the network was performed.
A network view titled EstateLayer2, of type Filtered, was then created, as illustrated in Figure 7. This view displays only the routers within the discovery scope.
The actual filters utilised the ipEndPoint table with the following filter criteria (See Figure 8):-
- Filter ipEndPoint subnet like ‘172.16.10.%’
- Filter ipEndPoint subnet like ‘180.24.10.%’
- Filter ipEndPoint subnet like ‘190.32.10.%’
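The effect of these `like` filters can be tried out on a toy table. The sketch below uses an in-memory SQLite database purely as a stand-in: the simplified two-column `ipEndPoint` table, the sample rows, and the assumption that the three filters are combined with OR are all illustrative, and the real ITNM topology database has a different schema.

```python
import sqlite3

# In-memory stand-in for the topology database (hypothetical, simplified schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ipEndPoint (address TEXT, subnet TEXT)")
conn.executemany(
    "INSERT INTO ipEndPoint VALUES (?, ?)",
    [
        ("172.16.10.242", "172.16.10.0"),
        ("180.24.10.1", "180.24.10.0"),
        ("190.32.10.1", "190.32.10.0"),
        ("192.168.9.20", "192.168.9.0"),  # out of scope, should not match
    ],
)

# Assumption: the three filters are ORed together.
query = """
    SELECT address FROM ipEndPoint
    WHERE subnet LIKE '172.16.10.%'
       OR subnet LIKE '180.24.10.%'
       OR subnet LIKE '190.32.10.%'
    ORDER BY address
"""
matched = [row[0] for row in conn.execute(query)]

print(matched)  # ['172.16.10.242', '180.24.10.1', '190.32.10.1']
```

The `%` wildcard matches any trailing characters, so each filter selects every endpoint whose subnet starts with the given three octets.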
Enable ITNM Polling
In a fresh install of ITNM only the Default Chassis polling is enabled. The following polls were also enabled:-
- SNMP Link State
Test ITNM Root Cause Analysis
With NcpServerEntity populated with the IP address of the ingress interface, the network view created and ITNM monitoring enabled, a quick test was performed on the ITNM RCA.
Figure 9 displays the state of the network via the EstateLayer2 network view prior to the test.
The four routers from the discovery scope are displayed:-
- r-eu-uk-new-001 (Newcastle)
- r-eu-uk-man-001 (Manchester)
- r-eu-uk-bir-001 (Birmingham)
- r-eu-uk-sou-001 (Southampton)
The Manchester router is then shut down. This produces numerous alerts, as the ITNM poller is unable to poll the routers downstream of this device. If RCA is working correctly, the alerts on the Birmingham and Southampton routers will be suppressed, and the suppressed devices will be shown in purple on the map.
Figure 10 displays the aftermath of shutting down the Manchester router in the EstateLayer2 network view.
The alerts on the Manchester, Birmingham and Southampton routers are suppressed (indicated by the purple colour). The root cause is displayed on the Newcastle router (indicated by the red colour).
Right-clicking the Newcastle router and selecting ‘Show Events’ displays the alerts for that network device (Figure 11).
Selecting one of the purple events, for instance ‘Interface 2 has gone down….’, then right-clicking it and selecting ‘Show Root Cause’ displays the root-cause event (Figure 12).
The example used in this article is deliberately simple, and shows how straightforward it is to configure RCA correctly in an environment where the ITNM server (poller entity) is not part of the discovery scope. It demonstrates that RCA has correctly identified the root cause and suppressed all events downstream of the Newcastle router, as the only path of communication has been taken down.