Chapter 25. Installing Nagios (Deprecated)
This section describes installing and testing Nagios, a system that monitors Hadoop cluster components and issues alerts on warning and critical conditions.
Install the Nagios RPMs
On the host you have chosen for the Nagios server, install the RPMs:
- For RHEL and CentOS:
  yum -y install net-snmp net-snmp-utils php-pecl-json
  yum -y install wget httpd php net-snmp-perl perl-Net-SNMP fping nagios nagios-plugins nagios-www
- For SLES:
  zypper -n --no-gpg-checks install net-snmp
  zypper -n --no-gpg-checks install wget apache2 php php-curl perl-SNMP perl-Net-SNMP fping nagios nagios-plugins nagios-www
Install the Configuration Files
There are several configuration files that must be set up for Nagios.
Extract the Nagios Configuration Files
From the HDP companion files, open the configuration_files folder and copy the files in the nagios folder to a temporary directory. The nagios folder contains two sub-folders, objects and plugins.
Create the Nagios Directories
- Create the following Nagios directories (the -p option also creates any missing parent directories, such as /var/log/nagios/spool): - mkdir -p /var/nagios /var/nagios/rw /var/log/nagios /var/log/nagios/spool/checkresults /var/run/nagios
- Change ownership on those directories to the Nagios user: - chown -R nagios:nagios /var/nagios /var/nagios/rw /var/log/nagios /var/log/nagios/spool/checkresults /var/run/nagios
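As a rough sketch of the directory step, the function below creates the same directories with mkdir -p, so reruns do not fail on directories that already exist. The function name make_nagios_dirs and its optional root-prefix argument are illustrative assumptions, not part of HDP; the prefix exists only so the layout can be tried in a scratch directory first.

```shell
#!/bin/sh
# Hypothetical helper: create the Nagios directories listed above.
# $1 is an optional root prefix (an assumption for dry runs); leave it
# empty to create the real paths.
make_nagios_dirs() {
    root="${1:-}"
    for dir in /var/nagios \
               /var/nagios/rw \
               /var/log/nagios \
               /var/log/nagios/spool/checkresults \
               /var/run/nagios; do
        # -p creates missing parents and succeeds if the directory exists
        mkdir -p "${root}${dir}" || return 1
    done
}
```

On a real Nagios host, calling make_nagios_dirs with no argument as root would be followed by the chown command from the step above.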
Copy the Configuration Files
- Copy the contents of the objects folder into place: - cp <tmp-directory>/nagios/objects/*.* /etc/nagios/objects/
- Copy the contents of the plugins folder into place: - cp <tmp-directory>/nagios/plugins/*.* /usr/lib64/nagios/plugins/
Set the Nagios Admin Password
- Choose a Nagios administrator password, for example, “admin”. 
- Set the password. Use the following command: - htpasswd -c -b /etc/nagios/htpasswd.users nagiosadmin admin
Set the Nagios Admin Email Contact Address
- Open /etc/nagios/objects/contacts.cfg with a text editor. 
- Change the nagios@localhost value to the admin email address so it can receive alerts. 
Register the Hadoop Configuration Files
- Open /etc/nagios/nagios.cfg with a text editor. 
- In the section OBJECT CONFIGURATION FILE(S), add the following:
  # Definitions for hadoop servers
  cfg_file=/etc/nagios/objects/hadoop-commands.cfg
  cfg_file=/etc/nagios/objects/hadoop-hosts.cfg
  cfg_file=/etc/nagios/objects/hadoop-hostgroups.cfg
  cfg_file=/etc/nagios/objects/hadoop-services.cfg
  cfg_file=/etc/nagios/objects/hadoop-servicegroups.cfg
- Change the command_file directive to /var/nagios/rw/nagios.cmd:
  command_file=/var/nagios/rw/nagios.cmd
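The registration step can also be scripted. The following is a minimal sketch, assuming it is acceptable to append the cfg_file lines at the end of the main configuration file instead of placing them by hand in the OBJECT CONFIGURATION FILE(S) section. The function name register_hadoop_cfg is hypothetical. It skips any line that is already present, so it can be run more than once.

```shell
#!/bin/sh
# Hypothetical helper: append the five Hadoop cfg_file lines to the
# Nagios main configuration file given as $1, skipping duplicates.
register_hadoop_cfg() {
    cfg="$1"
    for name in commands hosts hostgroups services servicegroups; do
        line="cfg_file=/etc/nagios/objects/hadoop-${name}.cfg"
        # -x: match the whole line, -F: fixed string, -q: quiet
        grep -qxF "$line" "$cfg" || echo "$line" >> "$cfg"
    done
}
```

A typical call would be register_hadoop_cfg /etc/nagios/nagios.cfg. The command_file directive still has to be edited separately.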
Set Hosts
- Open /etc/nagios/objects/hadoop-hosts.cfg with a text editor. 
- Create a "define host { … }" entry for each host in your cluster using the following format:
  define host {
          alias                     @HOST@
          host_name                 @HOST@
          use                       linux-server
          address                   @HOST@
          check_interval            0.25
          retry_interval            0.25
          max_check_attempts        4
          notifications_enabled     1
          # Send notification soon after change in the hard state
          first_notification_delay  0
          # Send the notification once
          notification_interval     0
          notification_options      d,u,r
  }
- Replace "@HOST@" with the hostname. 
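For clusters with many hosts, the entries can be generated instead of typed by hand. The sketch below prints one host definition per hostname, using the values from the format above. The function name print_host_entry and the hosts.txt file mentioned afterwards are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical helper: print a "define host" block for the hostname in $1,
# substituting it for every @HOST@ placeholder in the template.
print_host_entry() {
    host="$1"
    cat <<EOF
define host {
        alias                     ${host}
        host_name                 ${host}
        use                       linux-server
        address                   ${host}
        check_interval            0.25
        retry_interval            0.25
        max_check_attempts        4
        notifications_enabled     1
        # Send notification soon after change in the hard state
        first_notification_delay  0
        # Send the notification once
        notification_interval     0
        notification_options      d,u,r
}
EOF
}
```

Assuming a file hosts.txt with one hostname per line, a loop such as "while read -r h; do print_host_entry "$h"; done < hosts.txt" would produce the entries, which can then be appended to hadoop-hosts.cfg.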
Set Host Groups
- Open /etc/nagios/objects/hadoop-hostgroups.cfg with a text editor. 
- Create host groups based on all the hosts and services you have installed in your cluster. Each host group entry should follow this format:
  define hostgroup {
          hostgroup_name  @NAME@
          alias           @ALIAS@
          members         @MEMBERS@
  }
  The parameters (such as @NAME@) are defined in the following table.

  Table 25.1. Host Group Parameters

  Parameter   Description
  @NAME@      The host group name
  @ALIAS@     The host group alias
  @MEMBERS@   A comma-separated list of hosts in the group

  The following table lists the core and monitoring host groups:

  Table 25.2. Core and Monitoring Host Groups

  Service                     Component          Name            Alias           Members
  All servers in the cluster  n/a                all-servers     All Servers     List all servers in the cluster
  HDFS                        NameNode           namenode        namenode        The NameNode host
  HDFS                        SecondaryNameNode  snamenode       snamenode       The Secondary NameNode host
  MapReduce                   JobTracker         jobtracker      jobtracker      The Job Tracker host
  HDFS, MapReduce             Slaves             slaves          slaves          List all hosts running DataNode and TaskTrackers
  Nagios                      n/a                nagios-server   nagios-server   The Nagios server host
  Ganglia                     n/a                ganglia-server  ganglia-server  The Ganglia server host

  The following table lists the ecosystem project host groups:

  Table 25.3. Ecosystem Project Host Groups

  Service    Component  Name               Alias              Members
  HBase      Master     hbasemaster        hbasemaster        List the master server
  HBase      Region     region-servers     region-servers     List all region servers
  ZooKeeper  n/a        zookeeper-servers  zookeeper-servers  List all ZooKeeper servers
  Oozie      n/a        oozie-server       oozie-server       The Oozie server
  Hive       n/a        hiveserver         hiveserver         The Hive metastore server
  WebHCat    n/a        webhcat-server     webhcat-server     The WebHCat server
  Templeton  n/a        templeton-server   templeton-server   The Templeton server
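A host group entry can be produced the same way. The following sketch takes the group name, the alias, and then any number of member hostnames, and joins the members with commas as @MEMBERS@ requires. The function name print_hostgroup and the example hostnames are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print a "define hostgroup" block.
# $1 = group name, $2 = alias, remaining arguments = member hostnames.
print_hostgroup() {
    name="$1"
    alias_text="$2"
    shift 2
    # Inside the subshell, "$*" joins the remaining arguments with the
    # first character of IFS, giving a comma-separated member list.
    members=$(IFS=,; echo "$*")
    printf 'define hostgroup {\n'
    printf '        hostgroup_name  %s\n' "$name"
    printf '        alias           %s\n' "$alias_text"
    printf '        members         %s\n' "$members"
    printf '}\n'
}
```

For example, print_hostgroup slaves slaves dn1.example.com dn2.example.com would emit a slaves group whose members line reads dn1.example.com,dn2.example.com.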
Set Services
- Open /etc/nagios/objects/hadoop-services.cfg with a text editor. This file contains service definitions for the following services: Ganglia, HBase (Master and Region), ZooKeeper, Hive, Templeton, and Oozie. 
- Remove any service definitions for services you have not installed. 
- Replace the parameters @NAGIOS_BIN@ and @STATUS_DAT@ based on the operating system.
  - For RHEL and CentOS:
    @STATUS_DAT@ = /var/nagios/status.dat
    @NAGIOS_BIN@ = /usr/bin/nagios
  - For SLES:
    @STATUS_DAT@ = /var/lib/nagios/status.dat
    @NAGIOS_BIN@ = /usr/sbin/nagios
- If you have installed Hive or Oozie services, replace the parameter @JAVA_HOME@ with the path to the Java home. For example, /usr/java/default. 
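The placeholder replacement can be done with sed instead of a text editor. The sketch below is one possible approach, not the documented procedure. The function name fill_nagios_params is hypothetical, and the values passed in are assumed not to contain the characters | or &, which sed would treat specially here.

```shell
#!/bin/sh
# Hypothetical helper: replace @STATUS_DAT@ and @NAGIOS_BIN@ in a file.
# $1 = file to edit, $2 = status file path, $3 = nagios binary path.
fill_nagios_params() {
    file="$1"
    status_dat="$2"
    nagios_bin="$3"
    tmp="${file}.tmp.$$"
    # "|" is used as the sed delimiter because the values contain "/"
    sed -e "s|@STATUS_DAT@|${status_dat}|g" \
        -e "s|@NAGIOS_BIN@|${nagios_bin}|g" \
        "$file" > "$tmp" || { rm -f "$tmp"; return 1; }
    # Write back through the original file to keep its owner and mode
    cat "$tmp" > "$file"
    rm -f "$tmp"
}
```

On RHEL or CentOS the call would be fill_nagios_params /etc/nagios/objects/hadoop-services.cfg /var/nagios/status.dat /usr/bin/nagios; on SLES the last two arguments change to the SLES paths listed above. The same function can be pointed at hadoop-commands.cfg for the Set Status step, where only @STATUS_DAT@ is present.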
Set Status
- Open /etc/nagios/objects/hadoop-commands.cfg with a text editor. 
- Replace the @STATUS_DAT@ parameter with the location of the Nagios status file. The file location depends on your operating system.
  - For RHEL and CentOS:
    /var/nagios/status.dat
  - For SLES:
    /var/lib/nagios/status.dat
Add Templeton Status and Check TCP Wrapper Commands
- Open /etc/nagios/objects/hadoop-commands.cfg with a text editor.
- Add the following commands:
  define command{
          command_name    check_templeton_status
          command_line    $USER1$/check_wrapper.sh $USER1$/check_templeton_status.sh $HOSTADDRESS$ $ARG1$ $ARG2$ $ARG3$ $ARG4$ $ARG5$ $ARG6$ $ARG7$
  }

  define command{
          command_name    check_tcp_wrapper
          command_line    $USER1$/check_wrapper.sh $USER1$/check_tcp -H $HOSTADDRESS$ -p $ARG1$ $ARG2$
  }
Validate the Installation
Follow these steps to validate your installation.
- Validate the Nagios installation: - nagios -v /etc/nagios/nagios.cfg
- Start the Nagios server and httpd:
  /etc/init.d/nagios start
  /etc/init.d/httpd start
- Confirm that the Nagios server is running:
  /etc/init.d/nagios status
  This should return:
  nagios (pid #) is running...
- To test Nagios services, run the following command:
  /usr/lib64/nagios/plugins/check_hdfs_capacity.php -h namenode_hostname -p 50070 -w 80% -c 90%
  This should return:
  OK: DFSUsedGB:<some#>, DFSTotalGB:<some#>
- To test Nagios access, browse to the Nagios server:
  http://<nagios.server>/nagios
  Log in using the Nagios admin username (nagiosadmin) and password (see Set the Nagios Admin Password). Click on hosts to check that all hosts in the cluster are listed. Click on services to check that all of the Hadoop services are listed for each host.
- Test Nagios alerts.
  - Log in to one of your cluster DataNodes.
  - Stop the TaskTracker service:
    su -l mapred -c "/usr/hdp/current/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf stop tasktracker"
  - Validate that you received an alert at the admin email address, and that the critical state is showing on the console.
  - Start the TaskTracker service:
    su -l mapred -c "/usr/hdp/current/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf start tasktracker"
  - Validate that you received an alert at the admin email address, and that the critical state is cleared on the console.
 

