It is 3am on a Sunday morning in the Operations Bridge when suddenly an alert appears on an operator’s event console. The alert message is cryptic; there’s no clue as to what has generated it or what it means. Has the critical batch process with the SLA completion time of 6am failed, or is it something innocuous that can wait until Monday morning?
What to do: ignore it and hope it’s not important? Page a random support team and hope they know what to do? There might be some documentation around here somewhere, but nobody knows where to look. If only John was on tonight; he’s probably seen it before and would know what to do. For many organisations this is an all too familiar scenario, with the Operations Bridge having to rely on a combination of previous experience and disparate documentation to determine the action they need to take. As a result, operators’ actions are inconsistent, incident recovery is delayed and the effort and money invested in the underlying monitoring tools are undermined.
These problems can be alleviated through the implementation of a Message Catalogue.
The primary function of the Message Catalogue is to document the alerts that a system or application is capable of generating during its operation. The aim is to give the Operations Bridge and other support teams the information they need to respond to incoming alerts in a timely, appropriate and consistent fashion. Although that response may simply be the escalation process that should be followed to ensure that the right support team is notified of the problem, the Message Catalogue really starts to add value when it is used to document the first line support actions that operators should carry out, either to resolve the problem or to gather additional data before escalating to the relevant second line support team. This empowerment of the Operations Bridge, moving them away from a role as glorified telephone operators, has many benefits for the organisation, not least in reducing the mean time to recovery of incidents and off-loading some of the burden from second line teams.
Orb Data Message Catalogue
As you might expect, the comprehensive Orb Data Managed Service solution includes a Message Catalogue. However, we have decided to also offer the Message Catalogue as a standalone service for companies that do not want a managed service but would like to improve their message management processes.
The content of a Message Catalogue is generally collaborative in nature, and with this in mind the Orb Data Message Catalogue is built on wiki technology, specifically the open source DokuWiki wiki engine (www.dokuwiki.org).
DokuWiki is a standards-compliant, simple-to-use wiki. It has a simple but powerful syntax which makes sure the data files remain readable outside the wiki and eases the creation of structured texts. All data is stored in plain text files, meaning no database is required.
This type of technology offers the following general benefits:
- Centralisation of data which removes the risk of obsolete documents and the need to distribute document revisions across the organisation
- Pages can be quickly and easily created and updated
- Simple mark-up syntax
- Page templates can be used to speed up page creation and promote standards
- Attachment support for images, PDFs, Word documents etc to supplement page content
- Search capability to quickly find specific and related pages
- Page revisions tracked and previous versions stored
In addition, the wiki technology employed by the Orb Data Message Catalogue also provides:
- User Accounts and Access Control Lists to ensure that whilst content can be made generally available, only authorised users can make modifications
- LDAP/Active Directory integration for user authentication
- Customisable style templates for modifying the visual appearance and functionality of pages
- Customisable plugins for extending the wiki syntax and implementing bespoke functionality
Whilst the primary focus is on delivering the Message Catalogue content, the underlying wiki can of course be adopted for wider use within the organisation.
Orb Data Message Catalogue Entry Format
For a Message Catalogue to be successful it is imperative that a well defined and structured format is applied consistently across all entries. Although capable of being tailored for individual customer requirements, a typical format for entries in the Orb Data Message Catalogue is as follows:
- Alert Text – the alert as it will appear to the operator
- Severity – the severity of the alert
- Description – more detail about the alert including what has generated it, why it has been generated and what the implications are
- Business Impact – what impact the problem represented by the alert is likely to have on the immediate service and the business as a whole. This information is of vital importance as it will help the operator to understand the significance of the alert and therefore the priority that should be attributed to it
- Required Action – how the operator should respond to this alert. Typically includes what first line support actions should be taken and any cross references to the Known Error Database
- Escalation – how the operator should escalate this alert to the relevant support area e.g. raise an incident, page on-call support
- Owner – each entry in the Message Catalogue has a designated owner who is responsible for the content. This is a key field as it provides a direct contact, should details regarding the message need to be clarified or updates suggested
- Revision History – how this entry in the Message Catalogue has been changed over time
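As an illustration only, the entry format above can be modelled as a simple record with a check for fields that have been left empty. This is a minimal sketch in Python, not part of the Orb Data product: the class name, attribute names and the missing_fields helper are hypothetical, and the catalogue itself stores entries as wiki pages rather than as Python objects.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class CatalogueEntry:
    """One Message Catalogue entry. Attribute names are illustrative."""
    alert_text: str        # the alert as it will appear to the operator
    severity: str          # the severity of the alert
    description: str       # what generated the alert, why, and the implications
    business_impact: str   # likely impact on the service and the business
    required_action: str   # first line support actions, Known Error references
    escalation: str        # e.g. raise an incident, page on-call support
    owner: str             # designated owner responsible for the content
    revision_history: List[str] = field(default_factory=list)


def missing_fields(entry: CatalogueEntry) -> List[str]:
    """Return the names of any fields left empty, in declaration order."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name)]
```

A check of this kind could be run before an entry is published, so that gaps such as a missing owner are caught when the page is written rather than at 3am when an operator needs it.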
The Orb Data Message Catalogue offers several integration points with the wider IBM Tivoli product suite.
IBM Tivoli Monitoring
For Tivoli Enterprise Portal users, the Message Catalogue entry that an ITM situation relates to can be directly accessed using the situation Expert Advice function.
The URLs to the pages in the wiki are constructed using a combination of the namespace and page name. For example, to access the page with a designated index of 000001 in the catalogue provided by the UNIX support team, the required URL would be http://wikiserver/unix:000001.
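The URL scheme described above can be sketched in a few lines of Python. The function name and parameters are hypothetical, and the six-digit zero padding is an assumption inferred from the 000001 example. The sketch also assumes the wiki is configured so that the page identifier appears directly in the URL path, as it does in the example.

```python
def catalogue_url(base_url: str, namespace: str, index: int, width: int = 6) -> str:
    """Build a Message Catalogue page URL of the form <base>/<namespace>:<index>.

    The index is zero-padded to `width` digits (assumed from the 000001 example).
    """
    page = str(index).zfill(width)
    return f"{base_url.rstrip('/')}/{namespace}:{page}"
```

Called as catalogue_url("http://wikiserver", "unix", 1), this yields the example URL given above.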
When this URL is added to the situation’s Expert Advice, the Message Catalogue entry is displayed in the situation workspace when it is accessed from either the situation notification icon on the Navigator tree or by selecting “Situation Event Results...” from the Situation Event Console. As a result, all the information required by the operator is immediately available.
IBM Tivoli Netcool/OMNIbus and Netcool/Webtop
As with the ITM TEPS integration, Netcool/OMNIbus and Netcool/Webtop can be configured to provide direct operator access to the Message Catalogue entry that relates to an incoming alert.
The first step in this process is to identify a field in the Netcool/OMNIbus alerts.status table that can be used to store the Message Catalogue URL details. A default field called ‘URL’ exists which could be used, or a new custom field could be added if ‘URL’ is already in use. As the alerts are processed by the probes, the field should be populated with the relevant Message Catalogue information.
As an example, consider the EIF events generated by IBM Tivoli Monitoring. Using the situation EIF Slot Customisation tool, we can use the base msg_catalog and msg_index slots to hold information about the Message Catalogue entry that applies to the situation.
A rule can then be added to the Netcool/OMNIbus EIF probe which is used to construct the complete URL from the msg_catalog and msg_index tokens.
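The logic of such a rule is shown here as a Python sketch rather than in probe rules file syntax, so it illustrates the behaviour and is not something that can be pasted into the probe. It assumes the event arrives as a set of name/value pairs, that the wiki host is the wikiserver of the earlier example, and that the default URL field is the target. The function name is hypothetical.

```python
from typing import Dict

WIKI_BASE = "http://wikiserver"  # assumption: host name from the example URL


def add_catalogue_url(event: Dict[str, str], field: str = "URL") -> Dict[str, str]:
    """Return a copy of the event with the catalogue URL set when both slots exist."""
    enriched = dict(event)
    catalog = event.get("msg_catalog", "").strip()
    index = event.get("msg_index", "").strip()
    # Only populate the field when both slots are present and non-empty,
    # so that events without a catalogue entry are left unchanged.
    if catalog and index:
        enriched[field] = f"{WIKI_BASE}/{catalog}:{index}"
    return enriched
```

Leaving the field empty when either slot is missing means a right-click tool can be disabled for, or simply skip, alerts that have no catalogue entry, rather than opening a malformed URL.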
With the Message Catalogue information now captured as part of the alert record, it is a straightforward task to create a tool that provides the operators with a right-click option to access the Message Catalogue entry for that alert.
IBM Tivoli Netcool/Impact
One of the common functions implemented using Netcool/Impact is the automated escalation of alerts. By handling routine and mundane tasks such as the generation of emails and the creation of incident records, Impact enables operators to focus on service recovery activities.
The Orb Data Message Catalogue wiki implementation includes a custom plugin that extends the DokuWiki syntax to provide a method of encapsulating escalation information such as email addresses and trouble ticket queues in the wiki page. This information is rendered in the resulting HTML in a format that can be extracted by an Impact SocketDSA script and made available to Impact policies.
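To give a feel for the extraction step, the sketch below parses escalation details out of a rendered page using Python's standard html.parser module. The markup is entirely an assumption: the actual format produced by the plugin is not described here, so the "escalation" class and the data-email and data-queue attributes are hypothetical, as are the class and function names. In a real deployment this work would be done by the Impact policy, not by Python.

```python
from html.parser import HTMLParser
from typing import Dict, List


class EscalationParser(HTMLParser):
    """Collect data-* attributes from elements carrying the class "escalation".

    The class name and attribute scheme are hypothetical placeholders.
    """

    def __init__(self) -> None:
        super().__init__()
        self.records: List[Dict[str, str]] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if "escalation" in classes:
            # Strip the "data-" prefix so keys read as "email", "queue", etc.
            self.records.append({
                name[len("data-"):]: value or ""
                for name, value in attr_map.items()
                if name.startswith("data-")
            })


def extract_escalation(html: str) -> List[Dict[str, str]]:
    """Return one dictionary per escalation element found in the page."""
    parser = EscalationParser()
    parser.feed(html)
    parser.close()
    return parser.records
```

Keeping the escalation details in the wiki page means the page owner can update an email address or ticket queue in one place, and the automation picks up the change the next time it reads the page.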
The Orb Data Message Catalogue can either be supplied as a VMware appliance or as a services engagement.
The VMware appliance runs on a Linux operating system and can simply be plugged into your VMware estate and configured.
Configuration entails changing the IP address, running the integrations (depending on the products you have) and then populating the wiki with details of your events. Population can be done either by your team or by Orb Data. Either way, remember that any event without a resolution action is probably not worth receiving in the first place.
Appliances are not for everybody, so we also offer the choice of installing the Message Catalogue on your own hardware. Once this is done we will integrate it with the tools of your choice and then either leave you to populate the details of the events you want or help with the process to get you started.