Working with Rules

The following is a summary of rules available out-of-the-box with the DB2 for LUW cartridge. Default threshold values can be changed or scoped to specific values, generally through registry variables. These rules can be copied, modified, disabled, or customized in a wide variety of ways.

This section describes the following rules:

Availability
HA/DR
Storage
Infrastructure
Operational
Instance Performance

Availability

Rules related to instance and database availability, connections, and monitoring agents.

Instance

DB2 - Database Availability

This alarm fires when the database is not running (down). In this case, users will not be able to connect or retrieve any information from the database.

DB2 - Database Connection Time

This alarm fires when the average connection time to the database exceeds a predefined threshold (Fatal: 20ms, Warning: 10ms).

DB2 - Database Response Time

This alarm fires when the database’s average response time exceeds a predefined threshold (Fatal: 20ms, Warning: 10ms).
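The threshold comparison behind rules such as the two above can be sketched as follows. This is only an illustration of the "exceeds a predefined threshold" logic, not the cartridge's actual rule code; the function name `classify_upper` and the severity strings are hypothetical, and a strict "greater than" comparison is assumed.

```python
from typing import Optional


def classify_upper(value: float, warning: float, fatal: float) -> Optional[str]:
    """Return a severity for a metric that alarms when it exceeds a threshold.

    Hypothetical helper: a strict '>' comparison is assumed here.
    """
    if value > fatal:
        return "Fatal"
    if value > warning:
        return "Warning"
    return None  # below both thresholds, no alarm


# Thresholds from the text: Fatal 20 ms, Warning 10 ms.
for sample_ms in (5.0, 15.0, 25.0):
    print(sample_ms, classify_upper(sample_ms, warning=10.0, fatal=20.0))
```

The higher severity is checked first so that a value above both thresholds is reported once, at the Fatal level.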

DB2 - Instance Availability

This alarm fires when the instance is not running (down). In this case, users will not be able to connect or retrieve any information from the instance.

DB2 - Instance Version Changed

This alarm is invoked when a change in major DB2 release version is detected. It is designed to remind Foglight administrators to verify that the Foglight user permissions are applicable to the new version, to ensure continuous monitoring.

Monitoring Agents

DB2 - Collection Status

This alarm fires when a collection fails to retrieve its data.

DB2 - Database Monitor Parameters

This alarm is invoked when at least one of the monitoring configuration parameters (MON_REQ_METRICS, MON_ACT_METRICS) is set to NONE. With this setting, the collection retrieves only partial data, and the relevant dashboards display partial data as a result.

DB2 - Monitored Switches

This alarm fires when at least one of the required monitor switches is off. For versions below 9.7.0.1, the required switches are UOW_SW_STATE, STATEMENT_SW_STATE, LOCK_SW_STATE, SORT_SW_STATE, TABLE_SW_STATE, BUFFERPOOL_SW_STATE, and TIMESTAMP_SW_STATE. For version 9.7.0.1 and above, monitor switches are not required. If one of the required switches is off, part of the data will be missing from the collections and from the relevant dashboards.
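The version-dependent check described above can be sketched as follows. This is a minimal illustration, assuming that versions are dotted numeric strings and that a switch missing from the input counts as off; the function names and the dictionary input format are hypothetical and not part of the cartridge.

```python
REQUIRED_SWITCHES = (
    "UOW_SW_STATE", "STATEMENT_SW_STATE", "LOCK_SW_STATE", "SORT_SW_STATE",
    "TABLE_SW_STATE", "BUFFERPOOL_SW_STATE", "TIMESTAMP_SW_STATE",
)

# Monitor switches are not required from this version onward (per the text).
SWITCHES_NOT_REQUIRED_FROM = (9, 7, 0, 1)


def parse_version(text):
    """Turn a dotted numeric version such as '9.5.0.3' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def required_switches_off(version, switch_states):
    """List required switches that are off; empty for 9.7.0.1 and above.

    Assumption: a switch absent from switch_states is treated as off.
    """
    if parse_version(version) >= SWITCHES_NOT_REQUIRED_FROM:
        return []
    return [name for name in REQUIRED_SWITCHES if not switch_states.get(name, False)]


print(required_switches_off("9.5.0.0", {"LOCK_SW_STATE": True}))
print(required_switches_off("9.7.0.1", {}))
```

Comparing tuples of integers avoids the pitfalls of comparing version strings lexically, where "10.1" would sort before "9.7".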

HA/DR

Rules related to High Availability and Disaster Recovery configurations.

HADR

DB2 - Database HADR Disconnected

This alarm is invoked when the status of the HADR connection to the database is disconnected. Ensuring that the database is not disconnected is critical in order to prevent a crisis situation.

DB2 - Database HADR Failover

This alarm fires when a database failover has occurred and the database role has changed from Primary to Standby or vice versa.

DB2 - Database HADR Log Gap

This alarm is invoked when the HADR log gap exceeds a predefined threshold (Fatal: 16384KB, Warning: 4096KB). Keeping the gap between the standby and the primary database as small as possible is important for a fast switchover and for preventing large data loss in a crisis situation.

DB2 - Database HADR State

This alarm is invoked when the HADR state is not PEER. Ensuring that the database is in PEER state is important in order to prevent data loss in a crisis situation.

Cluster

DB2 - Cluster Backup CF Is Not In PEER State

This alarm is invoked when a cluster caching facility (CF) server that currently functions as backup CF server is not in PEER state, and therefore cannot serve as a primary CF in case the current primary fails.

DB2 - Cluster Member Not Started

This alarm is invoked when a pureScale member is either in error state or not running on its designated host.

DB2 - Instance FCM Connection Availability

This alarm is invoked when the status of the connection to an FCM member is Congested or Inactive. FCM availability issues affect the networking traffic between the instance members, and result in performance issues and instance unavailability.

Storage

Rules related to storage utilization including tablespaces, file systems, and log space.

Tablespaces

DB2 - Partition FS Storage Utilization

This alarm fires when the file system is running out of space, that is, when its utilization exceeds a predefined threshold (Fatal: 90%, Warning: 80%). When a file system reaches its full capacity, writing data to it is no longer possible. This inability to add data is critical if the database’s containers reside on this file system, in which case the database’s functionality will be affected.

DB2 - Partition Tablespace Resize Failure

This alarm fires when the most recent resize attempt has failed.

DB2 - Partition Tablespace Utilization

This alarm fires when the percentage of used tablespace exceeds a predefined threshold (Fatal: 90%, Warning: 80%). If the tablespace is configured to extend automatically (Autostorage), this property is taken into account, and the alarm is not invoked. A tablespace is a set of containers that contain data. A tablespace whose storage space becomes full can no longer store additional data; as a result, the application’s functionality may be adversely affected.
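The way the Autostorage property suppresses this alarm can be sketched as follows. This is an illustration only; the function name, the page-count inputs, and the `auto_extend` flag are hypothetical stand-ins for whatever the collection actually reports.

```python
from typing import Optional


def tablespace_alarm(used_pages: int, total_pages: int, auto_extend: bool,
                     warning: float = 80.0, fatal: float = 90.0) -> Optional[str]:
    """Return a severity for tablespace utilization, or None for no alarm.

    Assumption: utilization is used/total as a percentage, and tablespaces
    that extend automatically never raise this alarm.
    """
    if auto_extend or total_pages <= 0:
        return None
    used_pct = 100.0 * used_pages / total_pages
    if used_pct > fatal:
        return "Fatal"
    if used_pct > warning:
        return "Warning"
    return None


print(tablespace_alarm(850, 1000, auto_extend=False))  # 85% used, fixed size
print(tablespace_alarm(950, 1000, auto_extend=True))   # 95% used, auto-extending
```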

DB2 - Real Tablespace Utilization

This alarm fires when the percentage of used space within a tablespace exceeds a predefined threshold.

Log Space

DB2 - Database Log Utilization

This alarm fires when the percentage of used log space exceeds a predefined threshold (Fatal: 90%, Warning: 80%).

DB2 - Member Log Utilization

This alarm fires when the percentage of used log space exceeds a predefined threshold (Fatal: 90%, Warning: 80%).

Infrastructure

Rules related to CPU utilization and memory consumption.

CPU

DB2 - Database CPU Utilization Baseline

This alarm is invoked when the CPU utilization of the DB2 agents exceeds the baseline.

DB2 - Instance CPU Utilization Baseline

This alarm is invoked when the CPU utilization of the DB2 agents exceeds the baseline.

DB2 - Member CPU Utilization Baseline

This alarm is invoked when the CPU utilization of the DB2 agents exceeds the baseline.

Memory

DB2 - Database Memory Pool Utilization

This alarm is invoked when the utilization of the database’s memory pool exceeds a predefined threshold (Fatal: 95%, Warning: 90%).

DB2 - Database Memory Usage Baseline

This alarm is invoked when the memory consumption of the DB2 agents exceeds the baseline.

DB2 - Instance Memory Pool Utilization

This alarm is invoked when the utilization of the instance’s memory pool exceeds a predefined threshold (Fatal: 95%, Warning: 90%).

DB2 - Instance Memory Usage Baseline

This alarm is invoked when the memory consumption of the DB2 agents exceeds the baseline.

DB2 - Member Memory Usage Baseline

This alarm is invoked when the memory consumption of the DB2 agents exceeds the baseline.

Operational

Rules related to backup, recovery, and diagnostic operations.

Backup and Recovery

DB2 - Database Backup Failed

This alarm is invoked when the last backup for the database partition has failed. If the database has never been backed up, you risk losing all data in the event of storage device (hardware) failure or mistaken deletion. Lack of backup also prevents recovering data to the requested restore point, before any unwanted changes took place.

DB2 - Partition Backup Failed

DB2 - Partition Days Since Last Backup

This alarm is invoked when the number of days that have passed since the last valid full database backup exceeds a predefined registry variable value (Fatal: 31 days, Warning: 7 days). If no valid full backup of the database has been carried out for several days, you risk losing data in the event of storage device (hardware) failure or mistaken deletion.
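The backup-age check can be sketched as follows. This is a minimal illustration, assuming that age is counted in whole days and compared strictly against the thresholds; the function name and parameters are hypothetical. The case where no backup exists at all is covered by a separate rule and is not handled here.

```python
from datetime import datetime
from typing import Optional


def backup_age_alarm(last_full_backup: datetime, now: datetime,
                     warning_days: int = 7, fatal_days: int = 31) -> Optional[str]:
    """Return a severity based on whole days since the last valid full backup."""
    age_days = (now - last_full_backup).days
    if age_days > fatal_days:
        return "Fatal"
    if age_days > warning_days:
        return "Warning"
    return None


last_backup = datetime(2024, 1, 1)
print(backup_age_alarm(last_backup, datetime(2024, 1, 10)))  # 9 days old
print(backup_age_alarm(last_backup, datetime(2024, 2, 5)))   # 35 days old
```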

DB2 - Partition Days Since Last Backup - No Backup

This alarm is invoked when no valid full backup date is found. If the database has never been backed up, you risk losing all data in the event of storage device (hardware) failure or mistaken deletion. Lack of backup also prevents recovering data to the requested restore point, before any unwanted changes took place.

DB2 - Database Utility Failed

This alarm is invoked when a utility operation fails. Review the message to find the cause of the problem.

Diagnostics

DB2 - Database Log Message

This alarm fires when a message is read from the DB2 diaglog and its severity exceeds the minimum severity threshold (warning / event / error / critical / severe). This message includes important information about the database issues from the DB2 diagnostic log file.

DB2 - Database Log Message Summary

This alarm fires when at least one message is found in the diagnostic log with a severity level that matches or exceeds the minimum alarm severity. The message count reports the number of messages found at each severity.
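The per-severity counting described above can be sketched as follows. This is an illustration only: the severity ordering used here is an assumption, taken from the order in which the levels are listed for the DB2 - Database Log Message rule, and may not match the ordering the cartridge applies. The function name and the list-of-strings input are hypothetical.

```python
from collections import Counter

# ASSUMPTION: lowest to highest, following the order listed in the text.
ASSUMED_SEVERITY_ORDER = ("warning", "event", "error", "critical", "severe")


def summarize_messages(severities, minimum, order=ASSUMED_SEVERITY_ORDER):
    """Count messages whose severity matches or exceeds the minimum severity."""
    floor = order.index(minimum)
    counts = Counter(
        sev for sev in severities
        if sev in order and order.index(sev) >= floor
    )
    return dict(counts)


summary = summarize_messages(["warning", "error", "error", "severe"], minimum="error")
print(summary)        # counts per qualifying severity
print(bool(summary))  # the alarm would fire when at least one message qualifies
```

Passing the ordering as a parameter keeps the counting logic independent of the assumed ranking, so a different ordering can be supplied without changing the function.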

DB2 - Instance Log File Size

This alarm fires when the size of the diagnostic log file exceeds a predefined threshold (Fatal: 1000MB).

Instance Performance

Rules related to cache performance, locks, and sort operations.

Cache Performance

DB2 - Database Index Cache Hit Ratio

This alarm fires when the index hit ratio of the database’s I/O activity falls below a predefined threshold (Fatal: 0%, Warning: 70%).

DB2 - Database Overall Cache Hit Ratio

This alarm fires when the overall cache hit ratio of the database’s I/O activity falls below a predefined threshold (Fatal: 0%, Warning: 70%).

DB2 - Member Database Bufferpool

This alarm is invoked when the total hit ratio of a specific database buffer pool falls below a predefined threshold (Fatal: 0%, Warning: 70%). Buffer pool operations occur when data is being read from or written to the database memory. A high percentage of buffer pool-related I/O activity can indicate that the buffer cache is either too small or inefficiently used. Such a situation can result in performance issues and excessive disk activity.

DB2 - Member Index Cache Hit Ratio

This alarm fires when the index hit ratio of the database’s I/O activity falls below a predefined threshold (Fatal: 0%, Warning: 70%).

DB2 - Member Overall Cache Hit Ratio

This alarm fires when the overall cache hit ratio of the database’s I/O activity falls below a predefined threshold (Fatal: 0%, Warning: 70%).
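Unlike most rules in this section, the cache hit ratio rules fire when a value falls below its threshold. A minimal sketch of that comparison follows. The hit ratio formula used here (logical reads served without a physical read, as a percentage of logical reads) is the conventional buffer pool definition and is an assumption; the text does not state how the cartridge computes it. Function names are hypothetical.

```python
from typing import Optional


def hit_ratio(logical_reads: int, physical_reads: int) -> Optional[float]:
    """Conventional hit ratio in percent; None when there was no read activity."""
    if logical_reads <= 0:
        return None
    return 100.0 * (logical_reads - physical_reads) / logical_reads


def classify_lower(value: float, warning: float = 70.0, fatal: float = 0.0) -> Optional[str]:
    """Return a severity for a metric that alarms when it falls below a threshold."""
    if value < fatal:
        return "Fatal"
    if value < warning:
        return "Warning"
    return None


ratio = hit_ratio(logical_reads=1000, physical_reads=400)
print(ratio, classify_lower(ratio))
```

Under this strict reading, a ratio between 0% and 100% can never fall below the default Fatal threshold of 0%, so only the Warning level would be raised unless the Fatal threshold is changed.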

DB2 - Database Sort Overflow

This alarm is invoked when the percentage of sort overflows experienced by DB2 agents exceeds a predefined threshold (Fatal: 80%, Warning: 60%). Sort overflows can result from various causes, such as a small buffer pool cache, excessive buffer pool throughput, a large number of cache-based sorts, and a DB2 process that does not keep up with the workload. A sort overflow causes the sort to be performed on disk, thereby resulting in performance issues.
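The sort overflow percentage can be illustrated with a short sketch. It assumes the percentage is the number of overflowed sorts divided by the total number of sorts, which is a common definition but is not confirmed by the text; the function names and inputs are hypothetical.

```python
from typing import Optional


def sort_overflow_pct(sort_overflows: int, total_sorts: int) -> float:
    """Share of sorts that overflowed, in percent (0.0 when no sorts ran)."""
    if total_sorts <= 0:
        return 0.0
    return 100.0 * sort_overflows / total_sorts


def sort_overflow_alarm(sort_overflows: int, total_sorts: int,
                        warning: float = 60.0, fatal: float = 80.0) -> Optional[str]:
    """Return a severity using the thresholds quoted in the text."""
    pct = sort_overflow_pct(sort_overflows, total_sorts)
    if pct > fatal:
        return "Fatal"
    if pct > warning:
        return "Warning"
    return None


print(sort_overflow_alarm(70, 100))  # 70% of sorts overflowed
```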

DB2 - Member Sort Overflow

Locks

DB2 - Member Deadlocks

This alarm fires when deadlocks are encountered for the database (Threshold: 1). A deadlock should be investigated by the DBA, as it can result in a rollback of uncommitted data, thereby putting application data at risk.

DB2 - Member Lock Timeout

This alarm fires when lock timeouts were encountered for the database (Threshold: 0 lock timeouts).

DB2 - Member Long Lock Running

This alarm is invoked when, during the last lock tree snapshot, the lock was detected as exceeding a predefined number of seconds (Threshold: 90 seconds). Frequent blocking locks can cause waits when data modifications take place, and possibly result in performance issues.

DB2 - Member Long Lock Running Expanded

This alarm is invoked when, during the last lock tree snapshot, the lock was detected as exceeding a predefined number of seconds (Threshold: 90 seconds). The alarm contains the lock details.