Advanced Event Management: Making full use of the tools you have

Over the last few years I have spent a great deal of time working within the event management arena for numerous and varied customers. This experience has enabled me to look across differing environments and compare the hows and whys of solution design and creation.

Food for thought

Businesses naturally spend time developing, and even perfecting, their event management solutions, ensuring that the event flow between their monitoring systems and their event management system is as good as it can be. It takes time and effort to develop this level of integration, so the next step (or sometimes leap!) can remain just out of reach.

Most systems management environments suffer from a ‘grey area’ which occupies the space between the event management solution and the real-time alert workflow and operational requirements. Experience has shown me that, while this grey area can be fuelled by interdepartmental politics, it generally exists because of a disconnect: a gap between the technology on offer and the technology that is understood by, and made available to, the end users (the operations and support teams).

Bridge that gap

OK, so we have a gap between technology and user. What we need now is a bridge: a way to encourage users to buy in to the solution.

The idea behind an Advanced Event Management solution is to close the gap by engaging the support and operations personnel. Through the deployment of tools designed by the user, for the user, that gap becomes a dim and distant memory.
I use the following points to help define what makes an Advanced Event Management solution; perhaps they can help you too:

  • Creation of tools which provide a simple, well-formatted view of all information relating to event workflow, as determined by the operations personnel
  • Automations designed to remove those repetitive operational tasks
  • Providing tools which give greater visibility and control over the event management environment, thereby enabling operational and support personnel to take ownership of their own solution

Hopefully you can see that the solution is not rocket science, but it does require a clear understanding of your customer and their requirements.

The customer is an integral part of every solution you build

The solution is not that far off

So, what is required?

First and foremost, in many cases the tools required to effectively implement an Advanced Event Management solution are probably already at your fingertips. There is no need for new tools when the existing ones are more than capable!

Advanced Event Management revolves around event handling, visualisation and automation, all of which are key features of the Tivoli Netcool suite.

Anyone running a Tivoli systems management environment will know of Netcool, a product suite I’ve come to know and love over the years. Included in the suite is Netcool/IMPACT, a product which has become my best friend over those years. Anyone taking their event management solution to the next level needs to be looking closely at it.
Luckily, those of you running Netcool/OMNIbus 7.3.1 or later probably already have Netcool/IMPACT at your disposal.

Advanced Event Management solutions – the easy way

I have spent many hours researching event management tools of all kinds and have used that knowledge to create all manner of solutions across the event management arena. To give some insight into the possibilities I’d like to walk you through one of my solutions:

The basic scope:

  1. Create an event management solution which enables operations staff to take control of their environment and frees them to perform more first-line support duties
  2. Where possible improve operational response through automated tooling
  3. Create the tools to empower both operations and support personnel, allowing systems management staff to concentrate on maintaining the systems management environment

Before starting any design and development, I refer back to previous experience, which tells me that every advanced event management solution should at least consider the following areas, as they address the most common operational concerns:

  • Alert enrichment, providing operations with much-needed additional information and enabling automated handling of alerts
  • Where data is available to drive them, automation routines for those repetitive operational tasks
  • A Maintenance solution
  • A Message Catalogue solution

The solution takes shape

As in all things, any event management solution is only as good as the data it’s based on and this solution is no exception.

The first step in our solution must be to identify data which actively helps the operational support process. Generally this comes in the form of inventory, application and support data held within a CMDB.

Immediately we have a job for Netcool/IMPACT. A well documented function of Netcool/IMPACT is to enrich incoming alerts with data retrieved from a remote datasource. IMPACT’s interfaces to external devices and applications are wide and varied, the most commonly used being JDBC database connections and API calls via SOAP.
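To make the enrichment pattern concrete, here is a minimal Python sketch. It is not IMPACT Policy Language and uses no IMPACT API: an in-memory SQLite table stands in for a CMDB reached over JDBC, and the table, its columns and the `SupportTeam`/`Environment` alert fields are all hypothetical. Only `Node` and `Summary` correspond to standard ObjectServer alert columns. Note that it copies just two fields into the alert rather than the whole CMDB record.

```python
import sqlite3


def build_cmdb() -> sqlite3.Connection:
    """Hypothetical CMDB stand-in: an in-memory SQLite table with an invented schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE servers (hostname TEXT PRIMARY KEY, support_team TEXT, environment TEXT)"
    )
    conn.executemany(
        "INSERT INTO servers VALUES (?, ?, ?)",
        [
            ("web01", "Web Support", "Production"),
            ("db01", "DBA Team", "Production"),
        ],
    )
    return conn


def enrich(alert: dict, conn: sqlite3.Connection) -> dict:
    """Return a copy of the alert with a small set of CMDB fields added."""
    row = conn.execute(
        "SELECT support_team, environment FROM servers WHERE hostname = ?",
        (alert["Node"],),
    ).fetchone()
    enriched = dict(alert)  # leave the original alert untouched
    if row is None:
        # Unknown host: flag it explicitly rather than guessing an owner.
        enriched["SupportTeam"] = "UNKNOWN"
        enriched["Environment"] = "UNKNOWN"
    else:
        enriched["SupportTeam"], enriched["Environment"] = row
    return enriched


if __name__ == "__main__":
    cmdb = build_cmdb()
    print(enrich({"Node": "db01", "Summary": "Tablespace 95% full"}, cmdb))
```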

But this raises some questions:

  • Do you really want to store all of this useful information in each alert? Remember that doing so affects ObjectServer performance.
  • Realistically, is the Active Event List the best way to view our enriched business/application data?

Simply put, limited enrichment should be implemented to enable automated processing of incoming alerts. In point of fact, categorising alerts and routing them to the correct operational support team is very simple when the correct data is available for comparison, BUT maintaining all application and business-related data within an alert is inefficient and can be problematic.
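Once a few enriched fields are present, routing becomes a straightforward comparison. The Python sketch below illustrates that idea with hypothetical queue names and rules (not the author's configuration): each alert goes to the queue of the first matching rule, and the Development rule is listed first so that non-production alerts are never sent to an on-call queue.

```python
from dataclasses import dataclass
from typing import Mapping, Sequence


@dataclass(frozen=True)
class RoutingRule:
    """Hypothetical rule: if alert[field] == value, the alert goes to queue."""
    field: str
    value: str
    queue: str


# Order matters: the first matching rule wins.
RULES = (
    RoutingRule("Environment", "Development", "Dev-NoPage"),
    RoutingRule("SupportTeam", "DBA Team", "DBA-OnCall"),
    RoutingRule("SupportTeam", "Web Support", "Web-L2"),
)

DEFAULT_QUEUE = "Operations-Triage"


def route(alert: Mapping[str, str], rules: Sequence[RoutingRule] = RULES) -> str:
    """Return the queue of the first rule that matches, else the default queue."""
    for rule in rules:
        if alert.get(rule.field) == rule.value:
            return rule.queue
    return DEFAULT_QUEUE
```

Keeping the rules as data rather than nested conditionals means operations staff could, in principle, review and change them without touching the routing logic.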

How do we make best use of this wealth of information?

Simple: we use a feature of Netcool/IMPACT, the Operator View.
We create Operator Views that carry out the same data mining and logical decision making as we would when enriching the alert, BUT we do it on demand. The retrieved data is not stored within the alert; instead, we simply format it and display it to the user in customised web pages.
Using IMPACT Operator Views, we can take minimal input from any event (say, the Node name) and, using the power of the Netcool/IMPACT policy language, dynamically identify all associated business and application information, such as inventory, application, support responsibility, support rota, change management records and outstanding Incident records, then display it all within a single web page of our design.
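The on-demand idea can be sketched in Python as follows. This illustrates the pattern rather than an actual Operator View: the three lookup functions are hypothetical stubs standing in for the CMDB, change and incident queries an IMPACT policy would make, and the page is rendered as bare HTML. The point is that data is fetched only when the page is requested and is never written back to the alert.

```python
from html import escape
from typing import Callable, Dict, List

# A "source" takes a node name and returns rows of key/value data.
Source = Callable[[str], List[Dict[str, str]]]


# Hypothetical stubs; in a real deployment these would query live data sources.
def inventory(node: str) -> List[Dict[str, str]]:
    return [{"hostname": node, "location": "DC-East"}] if node == "web01" else []


def open_changes(node: str) -> List[Dict[str, str]]:
    return [{"id": "CHG0001", "summary": "Patch OS"}] if node == "web01" else []


def open_incidents(node: str) -> List[Dict[str, str]]:
    return []


SOURCES: Dict[str, Source] = {
    "Inventory": inventory,
    "Open changes": open_changes,
    "Open incidents": open_incidents,
}


def build_view(node: str, sources: Dict[str, Source] = SOURCES) -> str:
    """Query every source on demand and render one HTML page; nothing is stored."""
    parts = [f"<h1>{escape(node)}</h1>"]
    for title, lookup in sources.items():
        rows = lookup(node)
        parts.append(f"<h2>{escape(title)}</h2>")
        if not rows:
            parts.append("<p>None found</p>")
            continue
        parts.append("<ul>")
        for row in rows:
            text = ", ".join(f"{key}: {value}" for key, value in row.items())
            parts.append(f"<li>{escape(text)}</li>")
        parts.append("</ul>")
    return "\n".join(parts)
```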

But these pages offer much more than just a view of retrieved data.

Remember the requirements? After a short discussion with the operations staff, I now have single-click action buttons included within the page. These buttons are created by configuring Netcool/IMPACT action smart tags which, when formatted as buttons, enable users to trigger automation actions by:

  • directly executing IMPACT policies via JavaScript
  • opening a new URL, which could point to any site, including another Operator View (for example, one designed to carry out a specific support action and report the results back to the user)

So, the first stage of my solution is complete.
We have a view which can be opened directly via a URL, or by selecting an alert within OMNIbus WebGUI and executing the URL tool support_info. The tool passes only the value of hostname to the page, which then displays all data retrieved from the various business data sources.
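Passing only the hostname keeps the tool trivial: it builds a URL with a single query parameter, and the page derives everything else. A minimal Python sketch follows; the base address `https://impact.example.com/support_info` and the `hostname` parameter name are placeholders, not the real Operator View URL format, which depends on your Impact installation.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Placeholder address: the real host, port and path of an Operator View
# depend on the Impact installation and are not given in the article.
SUPPORT_INFO_BASE = "https://impact.example.com/support_info"


def support_info_url(alert: dict) -> str:
    """Build the tool URL, passing only the alert's Node value as 'hostname'."""
    return f"{SUPPORT_INFO_BASE}?{urlencode({'hostname': alert['Node']})}"


def hostname_from_url(url: str) -> str:
    """Receiving side: recover the single hostname parameter from the URL."""
    return parse_qs(urlparse(url).query)["hostname"][0]


if __name__ == "__main__":
    url = support_info_url({"Node": "web01", "Summary": "HTTP 500 errors"})
    print(url)
    print(hostname_from_url(url))
```

Because `urlencode` escapes the value, hostnames containing spaces or other unusual characters survive the round trip intact.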

The associated IMPACT policy triggered behind the scenes interacts with the customer’s four CMDB databases, retrieving information such as server details, data centre location, application data and the responsible support contact, along with their valid contact details.

The page contains dynamically constructed links to the relevant Message Catalogue page, a tab containing details of any open Change Records associated with the server OR application and a tab showing any open Incidents associated with the server component.

Following discussions with the operations staff, these action buttons were included in the page:

  • trigger a policy to email the support contact, using the address retrieved from the CCMDB
  • trigger the creation of a trouble ticket via Impact’s ServiceNow interface
  • display related, correlated and root cause alerts linked to this alert
  • quickly acknowledge the alert, associating it with the operational user or responsible support contact

Hang on, I’ve obviously skipped a development stage.

Our view contains a lot more than information retrieved from the CMDB; there are various embedded support tools and information sources, all designed to aid the support process. Arguably the most important of these is the Message Catalogue.

The Message Catalogue

A vitally important step in maintaining control of any systems management environment is to ensure that each and every alert passing through that environment is fully documented and has a clear operational workflow. The solution?

The Message Catalogue.

There are many ways to implement a Message Catalogue, most of which are based on one of the many document control products available.

Orb Data supply a Message Catalogue product which uses the DokuWiki engine to serve support pages to users. It provides a browser interface that gives authorised users access to administration, page creation and page viewing.
Because it uses a browser interface, the Message Catalogue can easily be installed into virtually any environment, with assured interoperability with the existing infrastructure and straightforward integration with Advanced Event Management pages and views.

The final stage – Maintenance

With the operational view in place, we can now move on to the final piece of the puzzle: maintenance management.

In many event management environments, event suppression is handled within the event management systems, with very little visibility or control given to the local support community.
With the introduction of Netcool/IMPACT and Operator Views, it is now not only possible but desirable to enable the operations and support staff to manage and control their own maintenance data via browser-based views.

The Orb Data Maintenance Manager was designed with this in mind and uses Netcool/IMPACT to power a clean, open approach to maintenance window management.

As the example images above show, Maintenance Manager uses Netcool/IMPACT automation to control maintenance, and Netcool/IMPACT Operator Views to provide simple browser-based access to the maintenance window entries, which by default are held in the HSQL database on the Impact Server.
Maintenance Manager is simply installed alongside our existing solutions built into Netcool/IMPACT. Operations and support personnel are then given access, allowing them a clear view of all maintenance and suppression configuration across the environment.
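At the heart of any maintenance solution is one check: does an alert's node fall inside an active window? The Python sketch below shows just that check. The `MaintenanceWindow` record and the `InMaintenance` alert field are hypothetical and do not reflect Maintenance Manager's actual schema or the HSQL tables it uses. Windows are treated as start-inclusive and end-exclusive, so back-to-back windows do not overlap.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable


@dataclass(frozen=True)
class MaintenanceWindow:
    """Hypothetical window record: one node with a start and end time."""
    node: str
    start: datetime
    end: datetime


def in_maintenance(node: str, when: datetime, windows: Iterable[MaintenanceWindow]) -> bool:
    """True if any window for this node covers `when` (start inclusive, end exclusive)."""
    return any(w.node == node and w.start <= when < w.end for w in windows)


def apply_maintenance(alert: dict, windows: Iterable[MaintenanceWindow], now: datetime) -> dict:
    """Return a copy of the alert with a hypothetical InMaintenance flag set."""
    flagged = dict(alert)
    flagged["InMaintenance"] = in_maintenance(alert["Node"], now, windows)
    return flagged
```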

Supplying this open interface to the maintenance solution gives support and operations staff greater control, and should naturally lead to operations taking ownership of the day-to-day management of maintenance data.

To sum up

The result? Well, the gain in operational efficiency hopefully speaks for itself, but the other benefits are much more exciting:

  1. A better quality of life for operations, support and systems management staff
  2. Staff buy-in to the solution, leading to greater input and involvement in the development and running of the environment

I hope that the concepts and the solution I have discussed in this blog help you towards realising your own advanced event management solutions.
If you are interested in discussing the subject further please feel free to leave comments on this post or contact me directly at: richard.newbold@orb-data.com

And please, watch this space for further posts on this subject.
