Published On: 6 September 2024 | Last Updated: 6 September 2024

Are you consistently paying attention to your IT Service Management (ITSM) event alerts? More specifically, what aspects are you focusing on when an event occurs and an alert is triggered? What types of alerts should concern you the most?

To begin, consider what an “event” means in the context of IT Service Management. According to ITIL, an event is defined as “any change of state that has significance for the management of a configuration item (CI) or IT service.” In typical server environments, and in services provided to users or customers, various log files capture these changes of state in the CIs that make up the IT service. Common examples of such log files include:

  1. System
  2. Application
  3. Security
  4. Setup
  5. Database
  6. Boot/System Start

An alert, on the other hand, is a “notification that a particular event (or a series of events) has occurred, with the respective status (information, warning, or exception).” Alerts are therefore categorized into three types, one per status: information, warning, and exception.
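The relationship between an event and an alert, and the three statuses named in the definition above (information, warning, exception), can be modelled in a few lines. The following is a minimal sketch, not a reference implementation: the `Event` and `Alert` classes, the `to_alert` function, and the numeric `level` field are all hypothetical names introduced for illustration, and the thresholds are assumptions rather than anything ITIL prescribes.

```python
from dataclasses import dataclass
from enum import Enum


class Status(Enum):
    """The three alert statuses named in the definition."""
    INFORMATION = "information"
    WARNING = "warning"
    EXCEPTION = "exception"


@dataclass(frozen=True)
class Event:
    source: str   # log the event came from, e.g. "System" or "Security"
    ci: str       # configuration item whose state changed
    message: str
    level: int    # assumed numeric level: 0 = info, 1 = warning, 2 or more = exception


@dataclass(frozen=True)
class Alert:
    event: Event
    status: Status


def to_alert(event: Event) -> Alert:
    """Wrap an event in an alert that carries one of the three statuses."""
    if event.level >= 2:
        status = Status.EXCEPTION
    elif event.level == 1:
        status = Status.WARNING
    else:
        status = Status.INFORMATION
    return Alert(event=event, status=status)
```

In practice the status would come from whatever severity field the log source already provides; the numeric `level` here only stands in for that.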

With the large volume of logs generated by servers and services, manually reviewing each alert is impractical. In modern ICT environments, it is neither feasible to comb through every server or service by hand nor efficient to manage that much data in Excel alone.

Given the sheer volume of log files and alerts, a log analysis management tool or service is essential. Tools powered by artificial intelligence (AI) and machine learning (ML) can pick out the events that require action from the surrounding noise, streamlining the process of identifying critical alerts.
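A real AI- or ML-driven tool is well beyond a short example, but the underlying idea of surfacing only the unusual activity can be illustrated with a plain statistical stand-in. The sketch below flags hours whose event count sits far above the average. It is a simple z-score check, not a machine learning model, and the function name `unusual_hours` and the default threshold of 2.0 are assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def unusual_hours(counts: list[int], threshold: float = 2.0) -> list[int]:
    """Return the indexes of hours whose event count is more than
    `threshold` standard deviations above the mean count."""
    if len(counts) < 2:
        return []
    mu = mean(counts)
    sigma = pstdev(counts)
    if sigma == 0:
        # Every hour looks the same, so nothing stands out.
        return []
    return [i for i, c in enumerate(counts) if (c - mu) / sigma > threshold]


# Hourly event counts for one server (made-up numbers for illustration).
hourly_counts = [10, 12, 11, 9, 10, 11, 10, 95]
flagged = unusual_hours(hourly_counts)
```

With these made-up counts only the final hour is flagged. A production tool would also account for seasonality, event content, and correlations between CIs, which this sketch ignores.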

One best practice for keeping log files in a “live” state is to retain six to 18 months of data on a rolling basis. Organizations with sufficient resources may extend this retention period to 24 months or more, depending on business needs. A larger dataset generally supports more effective analysis, because trends can be compared across different periods, such as quarterly, half-yearly, or annually.

It is important to consider the size of the log files when storing them for extended periods. To balance this with the needs of AI or ML-driven log analysis tools, a recommended approach is to start with a six-month retention period and gradually increase it in three-month increments, assessing the usefulness of the extended data over time.
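The rolling retention window and the stepwise increase described above can be sketched as follows. This is an illustration under stated assumptions: the function names (`months_before`, `prune`, `retention_schedule`) are hypothetical, log files are represented only by their dates, and the 24-month ceiling is taken from the upper figure mentioned earlier rather than being a fixed rule.

```python
import calendar
from datetime import date


def months_before(day: date, months: int) -> date:
    """Return the date `months` calendar months before `day`,
    clamped to the last valid day of the target month."""
    total = day.year * 12 + (day.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(day.day, last_day))


def prune(log_dates: list[date], today: date, retention_months: int) -> list[date]:
    """Keep only the log dates that fall inside the rolling retention window."""
    cutoff = months_before(today, retention_months)
    return [d for d in log_dates if d >= cutoff]


def retention_schedule(start: int = 6, step: int = 3, maximum: int = 24) -> list[int]:
    """Candidate retention periods in months: start at six, grow in three-month steps."""
    return list(range(start, maximum + 1, step))
```

Each step in the schedule is a point at which to ask whether the extra three months of data actually improved the analysis before committing more storage to it.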

During initial configurations, human intervention is required to verify the accuracy of alerts and to define business rules for subsequent recovery actions. False positives need to be identified, and recovery processes fine-tuned. For handling alert-triggered actions, it is advisable to implement scripts or Robotic Process Automation (RPA) to standardize these processes across the organization.
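One way to keep alert-triggered actions consistent is to express the business rules as data and route every alert through the same lookup. The sketch below assumes alerts are plain dictionaries with `source`, `status`, and `message` keys. The rule conditions and the action names (`restart_database_service`, `rotate_logs`, `manual_review`) are invented for illustration and would be replaced by whatever scripts or RPA jobs an organization actually runs.

```python
from typing import Callable

# A rule pairs a condition on the alert with the name of a recovery action.
Rule = tuple[Callable[[dict], bool], str]

# Hypothetical business rules, checked in order; the first match wins.
RULES: list[Rule] = [
    (lambda a: a["status"] == "exception" and a["source"] == "Database",
     "restart_database_service"),
    (lambda a: a["status"] == "warning" and "disk" in a["message"].lower(),
     "rotate_logs"),
]


def choose_action(alert: dict, rules: list[Rule] = RULES) -> str:
    """Return the recovery action for an alert, or hand it to a person
    when no rule matches."""
    for matches, action in rules:
        if matches(alert):
            return action
    return "manual_review"
```

Falling back to `manual_review` mirrors the point above: until the rules have been verified against real alerts and false positives, a person still decides what happens to anything the rules do not cover.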

Finally, setting up a live dashboard for key personnel to monitor alert statuses is crucial. Regular reviews of this dashboard help safeguard the IT environment, ensuring timely responses to potential issues.
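The figures behind such a dashboard can start out very simple, for example a count of open alerts per status. The following sketch assumes the same dictionary-style alerts with a `status` key; the function name `status_summary` is hypothetical, and a real dashboard would read from the log analysis tool rather than from an in-memory list.

```python
from collections import Counter

STATUSES = ("information", "warning", "exception")


def status_summary(alerts: list[dict]) -> dict[str, int]:
    """Count alerts per status, always reporting all three statuses."""
    counts = Counter(alert["status"] for alert in alerts)
    return {status: counts.get(status, 0) for status in STATUSES}
```

Reporting a zero for an empty status, rather than omitting it, keeps the dashboard layout stable between reviews.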

By following these practices, organizations can maintain a robust and efficient ITSM alert management process that mitigates risks and enhances operational stability.


For more information on monitoring event alerts, reach out to Cybiant’s consultants by e-mail at info@cybiant.com.

Visit our Cybiant Knowledge Centre to find out more about the latest insights.
