Severity | Expression | Message |
---|---|---|
Fatal | def connUtil = (#total_backends#/scope.monitoringAgent.max_connections) * 100; if(connUtil>registry("connUtilFatal")){return true;} | The connection utilization for this server is at @connUtil%. Your max_connections setting is @maxConnections. You may wish to increase this setting in order to ensure there are enough connections available. |
Critical | def connUtil = (#total_backends#/scope.monitoringAgent.max_connections) * 100; if(connUtil>registry("connUtilCrit")){return true;} | The connection utilization for this server is at @connUtil%. Your max_connections setting is @maxConnections. You may wish to increase this setting in order to ensure there are enough connections available. |
Warning | def connUtil = (#total_backends#/scope.monitoringAgent.max_connections) * 100; if(connUtil>registry("connUtilWarn")){return true;} | The connection utilization for this server is at @connUtil%. Your max_connections setting is @maxConnections. You may wish to increase this setting in order to ensure there are enough connections available. |
Default Registry Variables: connUtilWarn = 85%, connUtilCrit = 95%, connUtilFatal = 99%
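
As a rough illustration of the arithmetic in the expressions above, the following standalone Groovy sketch computes the same utilization figure from made-up values; in the rule itself the inputs come from the #total_backends# metric and the agent's max_connections property.

```groovy
// Hypothetical values: in the rule these come from the #total_backends#
// metric and scope.monitoringAgent.max_connections.
def totalBackends = 92
def maxConnections = 100

// Same arithmetic as the rule expressions above.
def connUtil = (totalBackends / maxConnections) * 100

// Default thresholds from the registry variables listed above.
def thresholds = [Warning: 85, Critical: 95, Fatal: 99]
def triggered = thresholds.findAll { name, limit -> connUtil > limit }.keySet()
println "Connection utilization: ${connUtil}% -> ${triggered ?: 'no alarm'}"
```
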
Severity | Expression | Message |
---|---|---|
Fatal | #backend_write_pct#>registry("backendWriteFatal") | A high backend write percentage means that backends are writing their own WAL buffers to disk rather than waiting for a checkpoint to do this. This may happen normally if your application performs many bulk insert statements. Otherwise, you may need to increase your background writer activity or increase your WAL Buffer allocation size. |
Critical | #backend_write_pct#>registry("backendWriteCrit") | A high backend write percentage means that backends are writing their own WAL buffers to disk rather than waiting for a checkpoint to do this. This may happen normally if your application performs many bulk insert statements. Otherwise, you may need to increase your background writer activity or increase your WAL Buffer allocation size. |
Warning | #backend_write_pct#>registry("backendWriteWarn") | A high backend write percentage means that backends are writing their own WAL buffers to disk rather than waiting for a checkpoint to do this. This may happen normally if your application performs many bulk insert statements. Otherwise, you may need to increase your background writer activity or increase your WAL Buffer allocation size. |
Default Registry Variables: backendWriteWarn = 60%, backendWriteCrit = 75%, backendWriteFatal = 90%
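
The exact derivation of #backend_write_pct# is not shown in this document, but one plausible reading, based on the write counters PostgreSQL exposes in pg_stat_bgwriter, is the share of buffer writes performed by the backends themselves. The Groovy sketch below illustrates that ratio with made-up counter values.

```groovy
// Hypothetical counter values of the kind found in pg_stat_bgwriter
// (buffers_checkpoint, buffers_clean, buffers_backend). The cartridge's
// exact formula for #backend_write_pct# may differ.
def buffersCheckpoint = 12000
def buffersClean      = 3000
def buffersBackend    = 9000

def totalWrites = buffersCheckpoint + buffersClean + buffersBackend
def backendWritePct = 100 * buffersBackend / totalWrites

println "Backend write percentage: ${backendWritePct}%"
// 37.5% here, which is below the default warning threshold of 60%.
```
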
Severity | Expression | Message |
---|---|---|
Fatal | def ds = server.DataService; def free; try{ free = ds.retrieveLatestValue(scope.monitoredHost.storage,"spaceAvailable").value.avg; } catch(Exception e){return;} if(free==null){return;} /* Must be collected */ def endTime = new java.sql.Timestamp(System.currentTimeMillis()); def startTime = endTime.minus(7).toTimestamp(); def metricStart = ds.retrieveEarliestTime(scope,"tablespace_growth_rate"); if(metricStart.after(startTime)){return;} /* Metric must be collected for longer than the start time for a representative average */ def growthRate = ds.retrieveAggregate(scope,"tablespace_growth_rate",startTime,endTime).value.avg; if(growthRate<=0){return;} /* Convert to expected daily growth in MB */ def dayGrowth = (growthRate*60*60*24)/(1024*1024); /* Divide free space by day growth */ def daysToFull = free / dayGrowth; if(daysToFull < registry("dbCapacityFatal")){return true;} | If the average growth of the tablespaces of this database server over the last week continues at its current rate, it is predicted that the DB host will run out of disk space in @daysToFull day(s). |
Critical | def ds = server.DataService; def free; try{ free = ds.retrieveLatestValue(scope.monitoredHost.storage,"spaceAvailable").value.avg; } catch(Exception e){return;} if(free==null){return;} /* Must be collected */ def endTime = new java.sql.Timestamp(System.currentTimeMillis()); def startTime = endTime.minus(7).toTimestamp(); def metricStart = ds.retrieveEarliestTime(scope,"tablespace_growth_rate"); if(metricStart.after(startTime)){return;} /* Metric must be collected for longer than the start time for a representative average */ def growthRate = ds.retrieveAggregate(scope,"tablespace_growth_rate",startTime,endTime).value.avg; if(growthRate<=0){return;} /* Convert to expected daily growth in MB */ def dayGrowth = (growthRate*60*60*24)/(1024*1024); /* Divide free space by day growth */ def daysToFull = free / dayGrowth; if(daysToFull < registry("dbCapacityCrit")){return true;} | If the average growth of the tablespaces of this database server over the last week continues at its current rate, it is predicted that the DB host will run out of disk space in @daysToFull day(s). |
Warning | def ds = server.DataService; def free; try{ free = ds.retrieveLatestValue(scope.monitoredHost.storage,"spaceAvailable").value.avg; } catch(Exception e){return;} if(free==null){return;} /* Must be collected */ def endTime = new java.sql.Timestamp(System.currentTimeMillis()); def startTime = endTime.minus(7).toTimestamp(); def metricStart = ds.retrieveEarliestTime(scope,"tablespace_growth_rate"); if(metricStart.after(startTime)){return;} /* Metric must be collected for longer than the start time for a representative average */ def growthRate = ds.retrieveAggregate(scope,"tablespace_growth_rate",startTime,endTime).value.avg; if(growthRate<=0){return;} /* Convert to expected daily growth in MB */ def dayGrowth = (growthRate*60*60*24)/(1024*1024); /* Divide free space by day growth */ def daysToFull = free / dayGrowth; if(daysToFull < registry("dbCapacityWarn")){return true;} | If the average growth of the tablespaces of this database server over the last week continues at its current rate, it is predicted that the DB host will run out of disk space in @daysToFull day(s). |
Default Registry Variables: dbCapacityWarn = 60 days, dbCapacityCrit = 30 days, dbCapacityFatal = 15 days
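
The following standalone Groovy sketch works through the same projection with made-up numbers, assuming (as the rule's unit conversions imply) that the growth-rate metric is in bytes per second and free space is in MB.

```groovy
// Worked example of the capacity projection above, with made-up numbers.
// Assumes tablespace_growth_rate is in bytes/second and spaceAvailable is
// in MB, as implied by the conversions in the rule expression.
def freeMb = 50000           // free disk space on the DB host, in MB
def growthRate = 8000        // average tablespace growth, in bytes/second

def dayGrowth = (growthRate * 60 * 60 * 24) / (1024 * 1024)   // ~659 MB/day
def daysToFull = freeMb / dayGrowth                           // ~75.9 days

println "Projected days until full: ${daysToFull.setScale(1, java.math.RoundingMode.HALF_UP)}"
// 75.9 days is above every default threshold (60/30/15 days), so no alarm fires.
```
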
Severity | Expression | Message |
---|---|---|
Fatal | def conList = #Current_Backends#; int total = #Current_Backends#.size(); if(total==0){return false;} int waitNum = 0; for(conItem in conList) { if(conItem.waiting=="t"){waitNum++;} } def waitPct = 100*(waitNum/total); if(waitPct>registry("queryWaitingFatal")){return true;} | In the last 3 out of 5 evaluations, the percentage of queries in a waiting state has exceeded this severity threshold. The percentage of queries in a waiting state when this alarm was triggered is @waitPct%. |
Critical | def conList = #Current_Backends#; int total = #Current_Backends#.size(); if(total==0){return false;} int waitNum = 0; for(conItem in conList) { if(conItem.waiting=="t"){waitNum++;} } def waitPct = 100*(waitNum/total); if(waitPct>registry("queryWaitingCrit")){return true;} | In the last 3 out of 5 evaluations, the percentage of queries in a waiting state has exceeded this severity threshold. The percentage of queries in a waiting state when this alarm was triggered is @waitPct%. |
Warning | def conList = #Current_Backends#; int total = #Current_Backends#.size(); if(total==0){return false;} int waitNum = 0; for(conItem in conList) { if(conItem.waiting=="t"){waitNum++;} } def waitPct = 100*(waitNum/total); if(waitPct>registry("queryWaitingWarn")){return true;} | In the last 3 out of 5 evaluations, the percentage of queries in a waiting state has exceeded this severity threshold. The percentage of queries in a waiting state when this alarm was triggered is @waitPct%. |
Default Registry Variables: queryWaitingWarn = 10%, queryWaitingCrit = 20%, queryWaitingFatal = 30%
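
The sketch below mirrors the loop in the expressions above against a made-up backend list; in the rule the rows come from the #Current_Backends# collection, whose waiting flag presumably carries the text value ("t"/"f") of the pg_stat_activity waiting column.

```groovy
// Stand-in data for #Current_Backends#: each entry mimics a backend row with
// a "waiting" text flag. The pids and flags are made up.
def conList = [
    [pid: 101, waiting: "f"],
    [pid: 102, waiting: "t"],
    [pid: 103, waiting: "f"],
    [pid: 104, waiting: "t"],
    [pid: 105, waiting: "f"],
]

int total = conList.size()
int waitNum = 0
for (conItem in conList) {
    if (conItem.waiting == "t") { waitNum++ }
}
def waitPct = 100 * (waitNum / total)
println "Waiting queries: ${waitPct}%"
// 40% here, which exceeds every default threshold (10/20/30%).
```
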
Severity | Expression | Message |
---|---|---|
Fatal | #idx_size_pct#>registry("indexBloatFatal") | The index size for this table is currently @idxPct% of the total table size. You may want to ensure the index table is being vacuumed in order to compact its size. |
Critical | #idx_size_pct#>registry("indexBloatCrit") | The index size for this table is currently @idxPct% of the total table size. You may want to ensure the index table is being vacuumed in order to compact its size. |
Warning | #idx_size_pct#>registry("indexBloatWarn") | The index size for this table is currently @idxPct% of the total table size. You may want to ensure the index table is being vacuumed in order to compact its size. |
Default Registry Variables: indexBloatWarn = 20%, indexBloatCrit = 25%, indexBloatFatal = 40%
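
How #idx_size_pct# is computed is not shown here, but the underlying idea is the ratio of index size to total relation size, which in PostgreSQL could be obtained from functions such as pg_indexes_size() and pg_total_relation_size(). The Groovy sketch below shows that ratio with made-up sizes.

```groovy
// Hypothetical sizes in MB, e.g. from pg_indexes_size('my_table') and
// pg_total_relation_size('my_table'). The cartridge's exact derivation of
// #idx_size_pct# is not shown in this document.
def indexMb = 350      // combined size of the table's indexes
def totalMb = 1200     // total relation size (heap + indexes + TOAST)

def idxSizePct = 100 * (indexMb / totalMb)
println "Indexes are ${idxSizePct.setScale(1, java.math.RoundingMode.HALF_UP)}% of the total relation size"
// ~29.2%, which exceeds the default critical threshold of 25%.
```
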
Severity | Expression | Message |
---|---|---|
Fatal | if(#blks_hit_pct#<registry("bufferHitFatal")){return true;} | Buffer hit percentage for this object has been low for the last three consecutive evaluations. A higher hit percentage means that tuples are being returned from pages in memory rather than being read from disk, which consumes more system resources. The current buffer hit percentage is @hitPct%. |
Critical | if(#blks_hit_pct#<registry("bufferHitCrit")){return true;} | Buffer hit percentage for this object has been low for the last three consecutive evaluations. A higher hit percentage means that tuples are being returned from pages in memory rather than being read from disk, which consumes more system resources. The current buffer hit percentage is @hitPct%. |
Warning | if(#blks_hit_pct#<registry("bufferHitWarn")){return true;} | Buffer hit percentage for this object has been low for the last three consecutive evaluations. A higher hit percentage means that tuples are being returned from pages in memory rather than being read from disk, which consumes more system resources. The current buffer hit percentage is @hitPct%. |
Default Registry Variables: bufferHitWarn = 75%, bufferHitCrit = 50%, bufferHitFatal = 25%
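
A buffer hit percentage is the share of block requests served from shared buffers rather than read from disk. The sketch below shows that ratio with made-up counters of the kind PostgreSQL exposes as blks_hit/blks_read (pg_stat_database) or heap_blks_hit/heap_blks_read (pg_statio_user_tables); how the cartridge derives #blks_hit_pct# itself is not shown here.

```groovy
// Hypothetical block counters, e.g. blks_hit / blks_read from pg_stat_database,
// or heap_blks_hit / heap_blks_read from pg_statio_user_tables for a table.
def blksHit = 180000     // block requests satisfied from shared buffers
def blksRead = 60000     // block requests that had to be read from disk or the OS cache

def hitPct = 100 * blksHit / (blksHit + blksRead)
println "Buffer hit percentage: ${hitPct}%"
// 75% here, which sits exactly at the default warning threshold (the rule fires only below it).
```
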
Severity | Expression | Message |
---|---|---|
Fatal | if(#total_blks_hit_pct#<registry("bufferHitFatal")){return true;} | Buffer hit percentage for this object has been low for the last three consecutive evaluations. A higher hit percentage means that tuples are being returned from pages in memory rather than being read from disk, which consumes more system resources. The current buffer hit percentage is @hitPct%. |
Critical | if(#total_blks_hit_pct#<registry("bufferHitCrit")){return true;} | Buffer hit percentage for this object has been low for the last three consecutive evaluations. A higher hit percentage means that tuples are being returned from pages in memory rather than being read from disk, which consumes more system resources. The current buffer hit percentage is @hitPct%. |
Warning | if(#total_blks_hit_pct#<registry("bufferHitWarn")){return true;} | Buffer hit percentage for this object has been low for the last three consecutive evaluations. A higher hit percentage means that tuples are being returned from pages in memory rather than being read from disk, which consumes more system resources. The current buffer hit percentage is @hitPct%. |
Default Registry Variables: bufferHitWarn = 75%, bufferHitCrit = 50%, bufferHitFatal = 25%
Severity | Expression | Message |
---|---|---|
Critical | #checkpoints_req_pct#<registry("checkReqCrit") | A low percentage of the checkpoints being performed are required checkpoints; that is, the checkpoint_timeout limit is being reached before the checkpoint_segments limit. This may indicate that your settings should be optimized for the level of activity normally seen on this DB server. A lower segment limit can also mean shorter crash recovery time if that is a concern. |
Warning | #checkpoints_req_pct#<registry("checkReqWarn") | A low percentage of the checkpoints being performed are required checkpoints; that is, the checkpoint_timeout limit is being reached before the checkpoint_segments limit. This may indicate that your settings should be optimized for the level of activity normally seen on this DB server. A lower segment limit can also mean shorter crash recovery time if that is a concern. |
Default Registry Variables: checkReqWarn = 35%, checkReqCrit = 15%
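
The exact formula for #checkpoints_req_pct# is not shown in this document; a plausible derivation uses the checkpoints_req and checkpoints_timed counters from pg_stat_bgwriter, as in the Groovy sketch below (note that checkpoint_segments was replaced by max_wal_size in PostgreSQL 9.5).

```groovy
// Hypothetical counters from pg_stat_bgwriter: checkpoints_req (triggered by
// WAL volume) and checkpoints_timed (triggered by checkpoint_timeout). The
// cartridge's exact formula for #checkpoints_req_pct# may differ.
def checkpointsReq   = 4
def checkpointsTimed = 36

def checkReqPct = 100 * checkpointsReq / (checkpointsReq + checkpointsTimed)
println "Required checkpoints: ${checkReqPct}%"
// 10% here, which is below the default critical threshold of 15%.
```
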
Severity | Expression | Message |
---|---|---|
Fatal | if(#availability#.equals("Not Available")){return true;} | This PostgreSQL server is currently not available. |
Severity | Expression | Message |
---|---|---|
Critical | def locksWaiting = #locks_waiting#; def timeout = scope.deadlock_timeout/1000; if(locksWaiting==0){return false;} for(def lockRow in scope.Current_Locks.current.value) { if(lockRow.granted=="f"){ if(lockRow.lock_age>timeout){return true;} } } | You may have a potential deadlock issue. Currently, there are @lockNum locks that have not been granted and are older than the deadlock timeout setting of @timeout seconds. The oldest waiting lock age is @maxAge seconds and the average age is @avgAge seconds. |
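
The sketch below runs the same check against a made-up lock list and also computes the oldest and average waiting-lock ages that the alarm message reports; in the rule the rows come from scope.Current_Locks.current.value.

```groovy
// Stand-in data for scope.Current_Locks.current.value: each entry mimics a
// pg_locks row with a "granted" text flag plus a lock_age (in seconds) added
// by the collector. The values are made up.
def lockRows = [
    [granted: "t", lock_age: 0.2],
    [granted: "f", lock_age: 4.0],
    [granted: "f", lock_age: 9.5],
]
def timeout = 1000 / 1000   // a deadlock_timeout of 1000 ms, converted to seconds as in the rule

def waiting = lockRows.findAll { it.granted == "f" && it.lock_age > timeout }
if (waiting) {
    def ages = waiting.collect { it.lock_age }
    println "${waiting.size()} waiting lock(s) older than ${timeout}s " +
            "(oldest ${ages.max()}s, average ${ages.sum() / ages.size()}s)"
}
```
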
Severity | Expression | Message |
---|---|---|
Critical | #standby_log_diff#>=registry("repLagCrit") | The difference in WAL segments between your primary’s sent_location and standby server’s replay_location has exceeded this threshold. If the standby server continues to fall behind, this may cause problems, especially if the primary server goes down in a high-availability situation. |
Warning | #standby_log_diff#>=registry("repLagWarn") | The difference in WAL segments between your primary’s sent_location and standby server’s replay_location has exceeded this threshold. If the standby server continues to fall behind, this may cause problems, especially if the primary server goes down in a high-availability situation. |
Default Registry Variables: repLagWarn = 5, repLagCrit = 10
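
How #standby_log_diff# is derived is not shown here. One way to express a lag "in WAL segments" is to convert the two LSNs (sent_location on the primary and replay_location on the standby, as reported by pg_stat_replication; renamed sent_lsn/replay_lsn in PostgreSQL 10) into byte positions and divide the difference by the default 16 MB segment size, as in the hedged Groovy sketch below.

```groovy
// Hedged sketch: convert PostgreSQL LSNs of the form "XLOGID/XRECOFF" (hex)
// into byte positions, then into 16 MB WAL segments. The cartridge's actual
// derivation of #standby_log_diff# is not shown in this document.
def lsnToBytes = { String lsn ->
    def parts = lsn.split('/')
    (Long.parseLong(parts[0], 16) << 32) + Long.parseLong(parts[1], 16)
}
def segmentBytes = 16L * 1024 * 1024          // default WAL segment size

def sentLocation = "2/3C000000"               // made-up primary sent_location
def replayLocation = "2/34000000"             // made-up standby replay_location

def lagSegments = (lsnToBytes(sentLocation) - lsnToBytes(replayLocation)) / segmentBytes
println "Standby is ${lagSegments} WAL segment(s) behind the primary"
// 8 segments here, which exceeds the default warning threshold of 5.
```
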
Severity | Expression | Message |
---|---|---|
Fatal | #connection_time#>registry("connTimeFatal") | The agent’s last two connections to this PostgreSQL server have exceeded this rule’s threshold for connection time. The latest connection time was @connTime seconds. |
Critical | #connection_time#>registry("connTimeCrit") | The agent’s last two connections to this PostgreSQL server have exceeded this rule’s threshold for connection time. The latest connection time was @connTime seconds. |
Warning | #connection_time#>registry("connTimeWarn") | The agent’s last two connections to this PostgreSQL server have exceeded this rule’s threshold for connection time. The latest connection time was @connTime seconds. |
Default Registry Variables: connTimeWarn = 1.5, connTimeCrit = 3, connTimeFatal = 5
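
For context on what a connection-time measurement looks like, the Groovy sketch below times a JDBC connection to a PostgreSQL server. The host, database, and credentials are placeholders, and the agent collects #connection_time# internally; this is only an illustration.

```groovy
import java.sql.DriverManager

// Placeholder host, database, and credentials; requires the PostgreSQL JDBC
// driver on the classpath. The agent measures #connection_time# itself; this
// only shows what such a measurement looks like.
def url = "jdbc:postgresql://db-host.example.com:5432/postgres"
def user = "monitor"
def password = "secret"

def start = System.nanoTime()
def conn = DriverManager.getConnection(url, user, password)
def connTime = (System.nanoTime() - start) / 1e9   // seconds
conn.close()

println "Connection established in ${connTime} s"
// With the defaults above, two consecutive connections slower than 1.5 s raise a warning.
```
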