HDFS HA Alerts
| Alert | Alert Type | Description | Potential Causes | Possible Remedies | 
|---|---|---|---|---|
| JournalNode Web UI | WEB | This host-level alert is triggered if the individual JournalNode process cannot be confirmed to be up and listening on the network within the configured critical threshold, given in seconds. | The JournalNode process is down or not responding. The JournalNode is not down but is not listening on the correct network port/address. | Check if the JournalNode process is dead. | 
| NameNode High Availability Health | SCRIPT | This service-level alert is triggered if either the Active NameNode or the Standby NameNode is not running. | The Active NameNode, the Standby NameNode, or both NameNode processes are down. | On each host running NameNode, check for any errors in the logs (/var/log/hadoop/hdfs/) and restart the NameNode host/process using Ambari Web. On each host running NameNode, run the netstat -tuplpn command to check if the NameNode process is bound to the correct network port. | 
| Percent JournalNodes Available | AGGREGATE | This service-level alert is triggered if the percentage of down JournalNodes in the cluster is greater than the configured threshold (33% warning, 50% critical). It aggregates the results of JournalNode process checks. | JournalNodes are down. JournalNodes are not down but are not listening on the correct network port/address. | Check for dead JournalNodes in Ambari Web. | 
| ZooKeeper Failover Controller Process | PORT | This alert is triggered if the ZooKeeper Failover Controller process cannot be confirmed to be up and listening on the network. | The ZKFC process is down or not responding. | Check if the ZKFC process is running. | 
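
The PORT and AGGREGATE alert types in the table can be approximated by hand when diagnosing a cluster. The following is a minimal sketch, not Ambari's own implementation: the function names `is_port_listening` and `aggregate_state` are hypothetical, the strict "greater than" comparison is an assumption based on the description above, and the 33%/50% defaults are taken from the Percent JournalNodes Available row.

```python
import socket


def is_port_listening(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    This mimics the idea of a PORT-type check; it is an illustrative helper,
    not the check that Ambari itself runs.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def aggregate_state(results, warn_pct=33.0, crit_pct=50.0):
    """Classify a list of per-host check results (True = up, False = down).

    Assumption: a state is raised only when the percentage of failed checks
    is strictly greater than the corresponding threshold.
    """
    if not results:
        return "UNKNOWN"
    down = sum(1 for ok in results if not ok)
    down_pct = 100.0 * down / len(results)
    if down_pct > crit_pct:
        return "CRITICAL"
    if down_pct > warn_pct:
        return "WARNING"
    return "OK"
```

For example, with three JournalNodes, `aggregate_state([True, True, False])` yields `"WARNING"` (one of three down is about 33.3%), and `aggregate_state([True, False, False])` yields `"CRITICAL"`. To feed it real results, call `is_port_listening` once per JournalNode host with the port your cluster actually uses; check `dfs.journalnode.http-address` and `dfs.ha.zkfc.port` in your HDFS configuration rather than assuming default values.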

