Using nco_postmsg to monitor the TEMS server

Monitoring of the Tivoli infrastructure has improved greatly over the last few years with IBM’s MoSWoS programme. However, monitoring the Tivoli Enterprise Monitoring Server (TEMS) itself has always been an issue: if this server stops working, all the other monitors fail with it.

IBM wrote a document in 2006 in which they suggested using a script on the TEMS server to monitor processes, and this is probably still the best solution. This tip updates that example to use the new OMNIbus command nco_postmsg, which allows you to send an alert directly to an ObjectServer. The utility accepts name-value pairs for the alert data and constructs an SQL INSERT statement, which inserts a new row of data into a specified database table in the ObjectServer.

You can run nco_postmsg from the command line, or you can develop scripts or automations that use the nco_postmsg command to send alerts to the ObjectServer. Multiple instances of the nco_postmsg utility can also run simultaneously.
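To make the mapping from name-value pairs to an INSERT statement concrete, here is a minimal sketch in shell. It is an illustration of the idea only: pairs_to_insert is a hypothetical helper, not part of OMNIbus, and the exact SQL that nco_postmsg generates internally may differ.

```shell
#!/bin/sh
# Illustration only: show how name=value pairs could map onto an INSERT.
# pairs_to_insert is a hypothetical helper, not an OMNIbus command.
pairs_to_insert() {
    table=$1
    shift
    cols=""
    vals=""
    for pair in "$@"; do
        name=${pair%%=*}     # text before the first '='
        value=${pair#*=}     # text after the first '='
        cols="${cols:+$cols, }$name"
        vals="${vals:+$vals, }$value"
    done
    printf 'insert into %s (%s) values (%s);\n' "$table" "$cols" "$vals"
}

pairs_to_insert alerts.status "Identifier='xyz123'" "Node='test'" "Severity=5"
```

Each argument contributes one column name and one value, which is why string values carry their own single quotes while integers such as Severity do not.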

Installation

The nco_postmsg utility is installed with the Probe Support feature of Tivoli Netcool/OMNIbus, and can therefore be deployed separately from the other Tivoli Netcool/OMNIbus features, on one or more hosts. To do this, run a custom install and deselect every option except Probe Support, as shown below.

[Screenshot: omnibus install2]

Setup

Once this is done, edit the $OMNIHOME/etc/nco_postmsg.props file and add the following lines, changing the values as appropriate:

MessageLevel: 'warn'          # Message reporting level
Name: 'nco_postmsg'           # Name of client
UserName: 'root'              # User to connect as
Password: 'ZZ'                # Password for user
Server: 'ORBD_DEMO'           # Server to connect to
Table: 'alerts.status'        # Table to insert event
Version: FALSE                # Display version information

Edit $NCHOME/etc/omni.dat and change the values as appropriate to reflect your ObjectServer’s name, hostname and port.

For example:

#
# omni.dat file as prototype for interfaces file
#
# Ident: $Id: omni.dat 1.5 1999/07/13 09:34:20 chris Development $
#
[ORBD_DEMO]
{
	Primary: mgmtserver3 4100
}
[NCO_GATE]
{
	Primary: mgmtserver3 4300
}
[NCO_PA]
{
	Primary: mgmtserver3 4200
}
[NCO_PROXY]
{
	Primary: mgmtserver3 4400
}

You will then need to create an interfaces file by running $NCHOME/bin/nco_igen.
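Before generating the interfaces file, it can help to confirm that the entry you edited says what you expect. The sketch below reads the Primary host and port for a named server out of omni.dat. omni_primary is a hypothetical helper written for this tip, and it assumes the simple section layout shown in the example above.

```shell
#!/bin/sh
# omni_primary is a hypothetical helper: print "host port" from the
# Primary: line of the named server section in an omni.dat-style file.
omni_primary() {
    awk -v want="[$1]" '
        $1 == want                { found = 1; next }  # our section header
        /^\[/                     { found = 0 }        # some other section
        found && $1 == "Primary:" { print $2, $3; exit }
    ' "$2"
}

# Only run the lookup if the file is actually present on this host.
if [ -r "${NCHOME:-}/etc/omni.dat" ]; then
    omni_primary ORBD_DEMO "$NCHOME/etc/omni.dat"
fi
```

With the example file above, the lookup for ORBD_DEMO should print mgmtserver3 4100. An empty result means the section name passed in does not match any entry.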

Test

Once this is done you should be able to test the command and see the event arrive on the OMNIbus console.

$OMNIHOME/bin/nco_postmsg -user root -password "" "Identifier='xyz123'" "Node='test'" "Severity=5" "Manager='nco_postmsg'" "Summary='An event occurred'"
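If you plan to call the command from several scripts, it may be worth keeping the quoting in one place. The following is a minimal sketch, not a definitive implementation. post_event and the NCO_POSTMSG override are hypothetical names introduced here, and the sketch assumes the connection settings come from the properties file edited earlier rather than from -user and -password.

```shell
#!/bin/sh
# post_event is a hypothetical wrapper around the nco_postmsg call above.
# NCO_POSTMSG is an assumed override so the wrapper can be dry-run with echo.
post_event() {
    identifier=$1 node=$2 severity=$3 summary=$4
    "${NCO_POSTMSG:-$OMNIHOME/bin/nco_postmsg}" \
        "Identifier='$identifier'" \
        "Node='$node'" \
        "Severity=$severity" \
        "Manager='nco_postmsg'" \
        "Summary='$summary'"
}

# Dry run in a subshell: print the arguments instead of sending an event.
( NCO_POSTMSG=echo; post_event xyz123 test 5 "An event occurred" )
```

The dry run prints the name-value pairs exactly as they would be passed, which makes quoting mistakes easy to spot before an ObjectServer is involved.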

 

The Monitoring Script

For this tip I have used a slightly modified version of the script from the original IBM document. It could be improved, but for now it works quite well. The script below is a ksh shell script. For a batch version that works on Windows, see the end of this tip.

#!/bin/ksh
# Send a copy of all output to the log file as well as the original stdout.
exec 3>&1
tee /opt/yourtivoli/logs/`hostname`-MonitorITM6.log >&3 |&
exec >&p 2>&1

i=0                          # events sent during the current outage
j=1                          # maximum events to send per outage
Identifier=$(date +"%s")     # set once at start-up and reused for every event
time=300                     # seconds between checks
ProcessName="kdsmain"
node=`hostname`
OMNIHOME=/opt/IBM/tivoli/netcool/omnibus
Summary="The TEMS Process $ProcessName is not running on $node"
export OMNIHOME node Summary ProcessName

while true
do
    checkavail=`ps -ef | grep -i "$ProcessName" | grep -v grep`
    if [ -n "$checkavail" ]
    then
        echo "#########################################"
        date
        echo "Found Process"
        i=0                  # process is back, so re-arm the alert
        echo "#########################################"
    fi

    if [ $i -lt $j ] && [ -z "$checkavail" ]
    then
        echo "#########################################"
        date
        echo "No Process ($ProcessName) Found, Sending Event To Object Server"
        $OMNIHOME/bin/nco_postmsg -user root -password "" "Identifier='$Identifier'" "Node='$node'" "Severity=5" "Manager='nco_postmsg'" "Summary='$Summary'" "AlertGroup='YourTivoli'"
        echo "#########################################"

        i=`expr $i + 1`
    fi

    sleep $time
done
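The i and j counters in the script exist so that only one event is sent per outage, with the alert re-armed as soon as the process is seen again. That decision can be sketched on its own, without ps or nco_postmsg. next_state is a hypothetical helper used purely to illustrate the logic, with the limit of one event per outage hard-coded.

```shell
#!/bin/sh
# next_state is a hypothetical helper. Arguments: the current alert count
# and "up" or "down". Prints "<action> <new count>", action = alert | none.
next_state() {
    count=$1
    status=$2
    if [ "$status" = "up" ]; then
        echo "none 0"                  # process found: re-arm the alert
    elif [ "$count" -lt 1 ]; then
        echo "alert $((count + 1))"    # first failed check: send one event
    else
        echo "none $count"             # still down: event already sent
    fi
}

# Walk through two outages, the first lasting two polling intervals.
count=0
for status in up down down up down; do
    set -- $(next_state "$count" "$status")
    echo "$status -> $1"
    count=$2
done
```

In the walk-through, the second consecutive "down" produces no action, while the "down" that follows a recovery alerts again.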

Running the Script

On UNIX, a good way to run the script is from inittab, as init will respawn the script if the monitor itself falls over.

To do this, edit /etc/inittab and add the following line:

yt:2345:respawn:/opt/yourtivoli/bin/tems_monitor.sh #TEMS Monitor

Then signal the init process to re-read /etc/inittab:

kill -HUP 1

This starts the script and the monitoring process, which you can confirm with:

ps -ef | grep tems

If the TEMS ever falls over, an error should now be posted to the OMNIbus server.

[Screenshot: omnibus events2]

ITM installed on Windows

For those with ITM installed on Windows, below is a version of the monitoring script as a batch file. Run from a custom scheduled task, it serves the same purpose as the shell script above.

@echo off

REM Jump to :stop if the TEMS1 service is reported as Stopped.
FOR /F "tokens=2 delims= " %%i IN ('"wmic.exe SERVICE GET Name, State" ^| findstr /I Stopped ^| findstr /I TEMS1') DO if %%i==Stopped goto :stop

REM Jump to :end (no alert) if the TEMS1 service is reported as Running.
FOR /F "tokens=2 delims= " %%i IN ('"wmic.exe SERVICE GET Name, State" ^| findstr /I Running ^| findstr /I TEMS1') DO if %%i==Running goto :end

:stop
echo %Date%, %Time%, TEMS service has stopped. Sending alert to OMNIbus >> E:\IBM\ITM_Hub_TEMS_Status.log
%OMNIHOME%\bin\nco_postmsg "Identifier='nco_postmsg@%COMPUTERNAME%>'" "Node='%COMPUTERNAME%'" "Severity=5" "Manager='nco_postmsg'" "Summary='The service TEMS has stopped. All Tivoli Monitoring components may be affected'" "AlertKey='YourTivoli_TEMS_Failure'" "AlertGroup='nco_postmsg'" "ActionFlag=0"

:end
