Problem management is a key process of IT service management. Its purpose is to prevent problems and the incidents that result from them. Ideally, a good problem management strategy resolves problems before incidents occur.
There are two approaches to problem management:
- Reactive problem management, through the logging of calls and requests
- Proactive problem management, through:
  - trend analysis of call and request data (for example, by performing simple searches)
  - the integration of event management tools that enable you to identify events (defined as any deviation from the normal or expected operation of a piece of infrastructure) before incidents are logged
  - the automated logging of calls and requests based on a set of user-defined criteria. This is supported by the AI Ops functionality, as described below.
AI Ops enhances your Problem Management process by allowing you to set up rules to automatically log calls or requests based on events in your call and request activity. In this way, problems may be identified before Incidents occur.
You can schedule “AI Ops rules”, which run and analyze your call and request activity. Each rule has a “threshold”, that is, a particular number of events within a running period, and a set of conditions. When the threshold and conditions are met, the rule automatically triggers a new call or request in ASM.
A Problem Manager suspects instability in the network environment. She can configure an automated AI Ops rule to log a Problem call whenever more than 5 high priority outage calls are logged against critical servers. She can configure the rule to exclude any servers that have a Physical Status of “In Test” or “Training Dedicated”. Finally, she can link a Call Template to the rule to direct the call to the Problem Management team when it is logged by the system. When this AI Ops rule runs and finds more than 5 high priority outage calls against the critical servers, a new call is automatically logged by the system and forwarded to the Problem Management team.
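The logic of the rule in this example can be sketched in Python. This is only an illustration of the threshold-and-conditions idea, not how ASM implements AI Ops rules: the `Call` record, its field names, the `rule_triggers` function and the 7-day running period used in the usage note are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical record shape; the real ASM call fields are not described here.
@dataclass
class Call:
    logged_at: datetime
    priority: str
    call_type: str
    server_critical: bool
    physical_status: str

# Servers with these Physical Status values are excluded by the rule.
EXCLUDED_STATUSES = {"In Test", "Training Dedicated"}

def rule_triggers(calls, now, period, threshold=5):
    """Return True when more than `threshold` matching calls fall in the running period."""
    window_start = now - period
    matching = [
        c for c in calls
        if window_start <= c.logged_at <= now          # inside the running period
        and c.priority == "High"
        and c.call_type == "Outage"
        and c.server_critical
        and c.physical_status not in EXCLUDED_STATUSES  # rule exclusions
    ]
    return len(matching) > threshold
```

Under these assumptions, calling `rule_triggers(calls, datetime.now(), timedelta(days=7))` would return `True` once a sixth qualifying call is logged within the last 7 days, which is the point at which the system would log the Problem call from the linked Call Template.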
The Problem Manager is concerned that a high level of redundancy in the network is making it difficult to identify unreliable hardware. She is most concerned when outages occur in multiple redundant items supporting a parent hardware item.
She configures an automated AI Ops rule to log a call whenever there are more than 5 occurrences in a 3-month period where more than 60% of the redundant items linked to the parent CMDB item are out at the same time.
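The redundancy rule can be sketched in the same way. Again, this is a hedged illustration rather than the product's implementation: the `redundancy_rule_triggers` function and the tuple layout of each occurrence are hypothetical, and the 3-month period is approximated as 90 days.

```python
from datetime import datetime, timedelta

def redundancy_rule_triggers(occurrences, now, period=timedelta(days=90),
                             outage_percent=60, threshold=5):
    """Check one parent CMDB item.

    occurrences: iterable of (timestamp, items_out, items_total) tuples, where
    items_out is the number of linked redundant items out at that time.
    """
    window_start = now - period
    count = 0
    for timestamp, items_out, items_total in occurrences:
        if not (window_start <= timestamp <= now):
            continue  # outside the running period
        # Integer comparison avoids floating-point rounding at exactly 60%.
        if items_total > 0 and items_out * 100 > outage_percent * items_total:
            count += 1
    # "More than 5 occurrences" means the count must exceed the threshold.
    return count > threshold
```

With these defaults, six occurrences in the last 90 days where 4 of 5 redundant items were out would trigger the rule, while occurrences where exactly 3 of 5 items were out (60%, not more than 60%) would not count.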
You can use the IPK Workflow Rules Builder to automate the routing of calls and call notifications through IPK Rules.
An AI Ops rule considers open calls or requests for analysis, including a call that has been created but not yet forwarded to anyone.