To enable security on HDP 2, you must add security-related information to various configuration files.
Before you begin, set JSVC_HOME in hadoop-env.sh.
For RHEL/CentOS/Oracle Linux:
export JSVC_HOME=/usr/libexec/bigtop-utils
For SLES and Ubuntu:
export JSVC_HOME=/usr/lib/bigtop-utils
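In both cases, JSVC_HOME should point to the directory containing the jsvc wrapper used to start secure DataNodes. As an optional sanity check (assuming the bigtop-utils package is installed on the host), list the directory and confirm jsvc is present:
ls $JSVC_HOME
If jsvc does not appear in the listing, adjust JSVC_HOME to the directory where jsvc is actually installed.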
To the core-site.xml file on every host in your cluster, you must add the following
information:
Table 18.3. core-site.xml
| Property Name | Property Value | Description |
|---|---|---|
| hadoop.security.authentication | kerberos | Set the authentication type for the cluster. Valid values are: simple or kerberos. |
| hadoop.rpc.protection | authentication; integrity; privacy | This is an [OPTIONAL] setting. If not set, defaults to authentication. authentication = authentication only; integrity = authentication and integrity; privacy = authentication, integrity, and confidentiality. |
| hadoop.security.authorization | true | Enable authorization for different protocols. |
| hadoop.security.auth_to_local | The mapping rules. For example: RULE:[2:$1@$0]([jt]t@.*EXAMPLE.COM)s/.*/mapred/ RULE:[2:$1@$0]([nd]n@.*EXAMPLE.COM)s/.*/hdfs/ RULE:[2:$1@$0](hm@.*EXAMPLE.COM)s/.*/hbase/ RULE:[2:$1@$0](rs@.*EXAMPLE.COM)s/.*/hbase/ DEFAULT | The mapping from Kerberos principal names to local OS user names. See Creating Mappings Between Principals and UNIX Usernames for more information. |
The XML for these entries:
<property>
<name>hadoop.security.authentication</name>
<value>kerberos</value>
<description> Set the
authentication for the cluster. Valid values are: simple or
kerberos.
</description>
</property>
<property>
<name>hadoop.security.authorization</name>
<value>true</value>
<description> Enable
authorization for different protocols.
</description>
</property>
<property>
<name>hadoop.security.auth_to_local</name>
<value>
RULE:[2:$1@$0]([jt]t@.*EXAMPLE.COM)s/.*/mapred/
RULE:[2:$1@$0]([nd]n@.*EXAMPLE.COM)s/.*/hdfs/
RULE:[2:$1@$0](hm@.*EXAMPLE.COM)s/.*/hbase/
RULE:[2:$1@$0](rs@.*EXAMPLE.COM)s/.*/hbase/
DEFAULT</value>
<description>The mapping from Kerberos principal names
to local OS user names.
</description>
</property>
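Based on the rules above, a NameNode principal such as nn/host1.example.com@EXAMPLE.COM should resolve to the local hdfs user, and a JobTracker principal to mapred. One way to verify the rules before restarting services (assuming the Hadoop client scripts are on the PATH and this core-site.xml is in the client configuration) is to run the HadoopKerberosName helper class:
hadoop org.apache.hadoop.security.HadoopKerberosName nn/host1.example.com@EXAMPLE.COM
If the rules are correct, the command reports that the principal maps to hdfs.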
To the hdfs-site.xml file on every host in your cluster, you must add the following
information:
Table 18.4. hdfs-site.xml
| Property Name | Property Value | Description |
|---|---|---|
| dfs.permissions.enabled | true | If true, permission checking in HDFS is enabled. If false, permission checking is turned off, but all other behavior is unchanged. Switching from one parameter value to the other does not change the mode, owner or group of files or directories. |
| dfs.permissions.supergroup | hdfs | The name of the group of super-users. |
| dfs.block.access.token.enable | true | If true, access tokens are used as capabilities for accessing DataNodes. If false, no access tokens are checked on accessing DataNodes. |
| dfs.namenode.kerberos.principal | nn/_HOST@EXAMPLE.COM | Kerberos principal name for the NameNode. |
| dfs.secondary.namenode.kerberos.principal | nn/_HOST@EXAMPLE.COM | Kerberos principal name for the secondary NameNode. |
| dfs.web.authentication.kerberos.principal | HTTP/_HOST@EXAMPLE.COM | The HTTP Kerberos principal used by Hadoop-Auth in the HTTP endpoint. The HTTP Kerberos principal MUST start with 'HTTP/' per the Kerberos HTTP SPNEGO specification. |
| dfs.web.authentication.kerberos.keytab | /etc/security/keytabs/spnego.service.keytab | The Kerberos keytab file with the credentials for the HTTP Kerberos principal used by Hadoop-Auth in the HTTP endpoint. |
| dfs.datanode.kerberos.principal | dn/_HOST@EXAMPLE.COM | The Kerberos principal that the DataNode runs as. "_HOST" is replaced by the real host name. |
| dfs.namenode.keytab.file | /etc/security/keytabs/nn.service.keytab | Combined keytab file containing the NameNode service and host principals. |
| dfs.secondary.namenode.keytab.file | /etc/security/keytabs/nn.service.keytab | Combined keytab file containing the NameNode service and host principals. |
| dfs.datanode.keytab.file | /etc/security/keytabs/dn.service.keytab | The filename of the keytab file for the DataNode. |
| dfs.https.port | 50470 | The HTTPS port to which the NameNode binds. |
| dfs.namenode.https-address | Example: ip-10-111-59-170.ec2.internal:50470 | The HTTPS address to which the NameNode binds. |
| dfs.datanode.data.dir.perm | 750 | The permissions that must be set on the dfs.data.dir directories. The DataNode will not come up if all existing dfs.data.dir directories do not have this setting. If the directories do not exist, they will be created with this permission. |
| dfs.cluster.administrators | hdfs | ACL for who can view the default servlets in HDFS. |
| dfs.namenode.kerberos.internal.spnego.principal | ${dfs.web.authentication.kerberos.principal} | |
| dfs.secondary.namenode.kerberos.internal.spnego.principal | ${dfs.web.authentication.kerberos.principal} | |
The XML for these entries:
<property>
<name>dfs.permissions</name>
<value>true</value>
<description> If "true", enable permission checking in
HDFS. If "false", permission checking is turned
off, but all other behavior is
unchanged. Switching from one parameter value to the other does
not change the mode, owner or group of files or
directories. </description>
</property>
<property>
<name>dfs.permissions.supergroup</name>
<value>hdfs</value>
<description>The name of the group of
super-users.</description>
</property>
<property>
<name>dfs.namenode.handler.count</name>
<value>100</value>
<description>Added to grow the queue size so that more
client connections are allowed</description>
</property>
<property>
<name>ipc.server.max.response.size</name>
<value>5242880</value>
</property>
<property>
<name>dfs.block.access.token.enable</name>
<value>true</value>
<description> If "true", access tokens are used as capabilities
for accessing datanodes. If "false", no access tokens are checked on
accessing datanodes. </description>
</property>
<property>
<name>dfs.namenode.kerberos.principal</name>
<value>nn/_HOST@EXAMPLE.COM</value>
<description> Kerberos principal name for the
NameNode </description>
</property>
<property>
<name>dfs.secondary.namenode.kerberos.principal</name>
<value>nn/_HOST@EXAMPLE.COM</value>
<description>Kerberos principal name for the secondary NameNode.
</description>
</property>
<property>
<!--cluster variant -->
<name>dfs.secondary.http.address</name>
<value>ip-10-72-235-178.ec2.internal:50090</value>
<description>Address of secondary namenode web server</description>
</property>
<property>
<name>dfs.secondary.https.port</name>
<value>50490</value>
<description>The https port where secondary-namenode
binds</description>
</property>
<property>
<name>dfs.web.authentication.kerberos.principal</name>
<value>HTTP/_HOST@EXAMPLE.COM</value>
<description> The HTTP Kerberos principal used by Hadoop-Auth in the HTTP endpoint.
The HTTP Kerberos principal MUST start with 'HTTP/' per Kerberos HTTP
SPNEGO specification.
</description>
</property>
<property>
<name>dfs.web.authentication.kerberos.keytab</name>
<value>/etc/security/keytabs/spnego.service.keytab</value>
<description>The Kerberos keytab file with the credentials for the HTTP
Kerberos principal used by Hadoop-Auth in the HTTP endpoint.
</description>
</property>
<property>
<name>dfs.datanode.kerberos.principal</name>
<value>dn/_HOST@EXAMPLE.COM</value>
<description>
The Kerberos principal that the DataNode runs as. "_HOST" is replaced by the real
host name.
</description>
</property>
<property>
<name>dfs.namenode.keytab.file</name>
<value>/etc/security/keytabs/nn.service.keytab</value>
<description>
Combined keytab file containing the namenode service and host
principals.
</description>
</property>
<property>
<name>dfs.secondary.namenode.keytab.file</name>
<value>/etc/security/keytabs/nn.service.keytab</value>
<description>
Combined keytab file containing the namenode service and host
principals.
</description>
</property>
<property>
<name>dfs.datanode.keytab.file</name>
<value>/etc/security/keytabs/dn.service.keytab</value>
<description>
The filename of the keytab file for the DataNode.
</description>
</property>
<property>
<name>dfs.https.port</name>
<value>50470</value>
<description>The https port where namenode
binds</description>
</property>
<property>
<name>dfs.https.address</name>
<value>ip-10-111-59-170.ec2.internal:50470</value>
<description>The https address where namenode binds</description>
</property>
<property>
<name>dfs.datanode.data.dir.perm</name>
<value>750</value>
<description>The permissions that should be there on
dfs.data.dir directories. The datanode will not come up if the
permissions are different on existing dfs.data.dir directories. If
the directories don't exist, they will be created with this
permission.</description>
</property>
<property>
<name>dfs.access.time.precision</name>
<value>0</value>
<description>The access time for an HDFS file is precise up to this
value. The default value is 1 hour. Setting a value of 0
disables access times for HDFS.
</description>
</property>
<property>
<name>dfs.cluster.administrators</name>
<value> hdfs</value>
<description>ACL for who all can view the default
servlets in the HDFS</description>
</property>
<property>
<name>ipc.server.read.threadpool.size</name>
<value>5</value>
<description></description>
</property>
<property>
<name>dfs.namenode.kerberos.internal.spnego.principal</name>
<value>${dfs.web.authentication.kerberos.principal}</value>
</property>
<property>
<name>dfs.secondary.namenode.kerberos.internal.spnego.principal</name>
<value>${dfs.web.authentication.kerberos.principal}</value>
</property>
In addition, you must set the user on all secure DataNodes:
export HADOOP_SECURE_DN_USER=hdfs
export HADOOP_SECURE_DN_PID_DIR=/grid/0/var/run/hadoop/$HADOOP_SECURE_DN_USER
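Before restarting HDFS, it can help to confirm that the keytab files referenced above exist on each host and contain the expected service principals. A quick check with the standard Kerberos tools (paths as configured above):
klist -kt /etc/security/keytabs/nn.service.keytab
klist -kt /etc/security/keytabs/dn.service.keytab
klist -kt /etc/security/keytabs/spnego.service.keytab
Each listing should show the corresponding principal for the local host, for example nn/host1.example.com@EXAMPLE.COM in the NameNode keytab.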
To the mapred-site.xml file on every host in your cluster, you must add the following
information:
Table 18.5. mapred-site.xml
| Property Name | Property Value | Description | Final |
|---|---|---|---|
| mapreduce.jobtracker.kerberos.principal | jt/_HOST@EXAMPLE.COM | Kerberos principal name for the JobTracker. | |
| mapreduce.tasktracker.kerberos.principal | tt/_HOST@EXAMPLE.COM | Kerberos principal name for the TaskTracker. "_HOST" is replaced by the host name of the TaskTracker. | |
| hadoop.job.history.user.location | none | | true |
| mapreduce.jobtracker.keytab.file | /etc/security/keytabs/jt.service.keytab | The keytab for the JobTracker principal. | |
| mapreduce.tasktracker.keytab.file | /etc/security/keytabs/tt.service.keytab | The keytab for the TaskTracker principal. | |
| mapreduce.jobtracker.staging.root.dir | /user | The path prefix for the location of the staging directories. The next level is always the user's name. It is a path in the default file system. | |
| mapreduce.tasktracker.group | hadoop | The group that the task controller uses for accessing the task controller. The mapred user must be a member and other users should not be members. | |
| mapreduce.jobtracker.split.metainfo.maxsize | 50000000 | If the size of the split metainfo file is larger than this value, the JobTracker will fail the job during initialization. | true |
| mapreduce.history.server.embedded | false | Should the Job History server be embedded within the JobTracker process. | true |
| mapreduce.history.server.http.address (Note: cluster variant) | Example: ip-10-111-59-170.ec2.internal:51111 | HTTP address of the history server. | true |
| mapreduce.jobhistory.kerberos.principal (Note: cluster variant) | jt/_HOST@EXAMPLE.COM | Kerberos principal name for JobHistory. This must map to the same user as the JT user. | true |
| mapreduce.jobhistory.keytab.file (Note: cluster variant) | /etc/security/keytabs/jt.service.keytab | The keytab for the JobHistory principal. | |
| mapred.jobtracker.blacklist.fault-timeout-window | Example: 180 | 3-hour sliding window; the value is specified in minutes. | |
| mapred.jobtracker.blacklist.fault-bucket-width | Example: 15 | 15-minute bucket size; the value is specified in minutes. | |
| mapred.queue.names | default | Comma-separated list of queues configured for this JobTracker. | |
The XML for these entries:
<property>
<name>mapreduce.jobtracker.kerberos.principal</name>
<value>jt/_HOST@EXAMPLE.COM</value>
<description> JT
user name key. </description>
</property>
<property>
<name>mapreduce.tasktracker.kerberos.principal</name>
<value>tt/_HOST@EXAMPLE.COM</value>
<description>tt
user name key. "_HOST" is replaced by the host name of the task tracker.
</description>
</property>
<property>
<name>hadoop.job.history.user.location</name>
<value>none</value>
<final>true</final>
</property>
<property>
<name>mapreduce.jobtracker.keytab.file</name>
<value>/etc/security/keytabs/jt.service.keytab</value>
<description>
The keytab for the jobtracker principal.
</description>
</property>
<property>
<name>mapreduce.tasktracker.keytab.file</name>
<value>/etc/security/keytabs/tt.service.keytab</value>
<description>The filename of the keytab for the task
tracker</description>
</property>
<property>
<name>mapreduce.jobtracker.staging.root.dir</name>
<value>/user</value>
<description>The Path prefix for where the staging
directories should be placed. The next level is always the user's name. It
is a path in the default file system.</description>
</property>
<property>
<name>mapreduce.tasktracker.group</name>
<value>hadoop</value>
<description>The group that the task controller uses for accessing the task controller.
The mapred user must be a member and users should *not* be
members.</description>
</property>
<property>
<name>mapreduce.jobtracker.split.metainfo.maxsize</name>
<value>50000000</value>
<final>true</final>
<description>If the size of the split metainfo file is larger than this, the JobTracker
will fail the job during
initialization.
</description>
</property>
<property>
<name>mapreduce.history.server.embedded</name>
<value>false</value>
<description>Should job history server be embedded within Job tracker process</description>
<final>true</final>
</property>
<property>
<name>mapreduce.history.server.http.address</name>
<!--cluster variant -->
<value>ip-10-111-59-170.ec2.internal:51111</value>
<description>Http address of the history server</description>
<final>true</final>
</property>
<property>
<name>mapreduce.jobhistory.kerberos.principal</name>
<!--cluster variant -->
<value>jt/_HOST@EXAMPLE.COM</value>
<description>Job history user name key. (must map to same user as JT user)</description>
</property>
<property>
<name>mapreduce.jobhistory.keytab.file</name>
<!--cluster variant -->
<value>/etc/security/keytabs/jt.service.keytab</value>
<description>The keytab for the job history server
principal.</description>
</property>
<property>
<name>mapred.jobtracker.blacklist.fault-timeout-window</name>
<value>180</value>
<description> 3-hour
sliding window (value is in minutes)
</description>
</property>
<property>
<name>mapred.jobtracker.blacklist.fault-bucket-width</name>
<value>15</value>
<description>
15-minute bucket size (value is in minutes)
</description>
</property>
<property>
<name>mapred.queue.names</name>
<value>default</value>
<description>
Comma separated list of queues configured for this jobtracker.</description>
</property>
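Because mapreduce.tasktracker.group determines which group the task controller trusts, it is worth confirming on each TaskTracker host that the mapred user is a member of the hadoop group (and that ordinary users are not). Assuming the standard HDP service accounts, a simple check is:
id mapred
The output should list hadoop among the user's groups.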
For HBase to run on a secured cluster, HBase must be able to authenticate
itself to HDFS. To the hbase-site.xml file on your HBase
server, you must add the following information. There are no default values; the
following are all only examples:
Table 18.6. hbase-site.xml
| Property Name | Property Value | Description |
|---|---|---|
| hbase.master.keytab.file | /etc/security/keytabs/hm.service.keytab | The keytab for the HMaster service principal. |
| hbase.master.kerberos.principal | hm/_HOST@EXAMPLE.COM | The Kerberos principal name that should be used to run the HMaster process. If _HOST is used as the hostname portion, it will be replaced with the actual hostname of the running instance. |
| hbase.regionserver.keytab.file | /etc/security/keytabs/rs.service.keytab | The keytab for the HRegionServer service principal. |
| hbase.regionserver.kerberos.principal | rs/_HOST@EXAMPLE.COM | The Kerberos principal name that should be used to run the HRegionServer process. If _HOST is used as the hostname portion, it will be replaced with the actual hostname of the running instance. |
| hbase.superuser | hbase | Comma-separated list of users or groups that are allowed full privileges, regardless of stored ACLs, across the cluster. Only used when HBase security is enabled. |
| hbase.coprocessor.region.classes | | Comma-separated list of coprocessors that are loaded by default on all tables. For any override coprocessor method, these classes will be called in order. After implementing your own coprocessor, just put it in HBase's classpath and add the fully qualified class name here. A coprocessor can also be loaded on demand by setting HTableDescriptor. |
| hbase.coprocessor.master.classes | | Comma-separated list of org.apache.hadoop.hbase.coprocessor.MasterObserver coprocessors that are loaded by default on the active HMaster process. For any implemented coprocessor methods, the listed classes will be called in order. After implementing your own MasterObserver, just put it in HBase's classpath and add the fully qualified class name here. |
The XML for these entries:
<property>
<name>hbase.master.keytab.file</name>
<value>/etc/security/keytabs/hm.service.keytab</value>
<description>Full path to the kerberos keytab file to use for logging
in the configured HMaster server principal.
</description>
</property>
<property>
<name>hbase.master.kerberos.principal</name>
<value>hm/_HOST@EXAMPLE.COM</value>
<description>Ex. "hbase/_HOST@EXAMPLE.COM".
The kerberos principal name that
should be used to run the HMaster process. The
principal name should be in
the form: user/hostname@DOMAIN. If "_HOST" is used
as the hostname portion, it will be replaced with the actual hostname of the running
instance.
</description>
</property>
<property>
<name>hbase.regionserver.keytab.file</name>
<value>/etc/security/keytabs/rs.service.keytab</value>
<description>Full path to the kerberos keytab file to use for logging
in the configured HRegionServer server principal.
</description>
</property>
<property>
<name>hbase.regionserver.kerberos.principal</name>
<value>rs/_HOST@EXAMPLE.COM</value>
<description>Ex. "hbase/_HOST@EXAMPLE.COM".
The kerberos principal name that
should be used to run the HRegionServer process. The
principal name should be in the form:
user/hostname@DOMAIN. If _HOST
is used as the hostname portion, it will be replaced
with the actual hostname of the running
instance. An entry for this principal must exist
in the file specified in hbase.regionserver.keytab.file
</description>
</property>
<!--Additional configuration specific to HBase security -->
<property>
<name>hbase.superuser</name>
<value>hbase</value>
<description>List of users or groups (comma-separated), who are
allowed full privileges, regardless of stored ACLs, across the cluster. Only
used when HBase security is enabled.
</description>
</property>
<property>
<name>hbase.coprocessor.region.classes</name>
<value></value>
<description>A comma-separated list of Coprocessors that are loaded
by default on all tables. For any override coprocessor method, these classes
will be called in order. After implementing your own Coprocessor,
just put it in HBase's classpath and add the fully qualified class name here. A
coprocessor can also be loaded on demand by setting HTableDescriptor.
</description>
</property>
<property>
<name>hbase.coprocessor.master.classes</name>
<value></value>
<description>A comma-separated list of
org.apache.hadoop.hbase.coprocessor.MasterObserver coprocessors that
are loaded by default on the active HMaster process. For any implemented
coprocessor methods, the listed classes will be called in order.
After implementing your own MasterObserver, just put it in HBase's
classpath and add the fully qualified class name here.
</description>
</property>
Hive Metastore supports Kerberos authentication for Thrift clients only. HiveServer does not support Kerberos authentication for any clients:
Table 18.7. hive-site.xml
| Property Name | Property Value | Description |
|---|---|---|
| hive.metastore.sasl.enabled | true | If true, the Metastore Thrift interface will be secured with SASL and clients must authenticate with Kerberos. |
| hive.metastore.kerberos.keytab.file | /etc/security/keytabs/hive.service.keytab | The keytab for the Metastore Thrift service principal. |
| hive.metastore.kerberos.principal | hive/_HOST@EXAMPLE.COM | The service principal for the Metastore Thrift server. If _HOST is used as the hostname portion, it will be replaced with the actual hostname of the running instance. |
| hive.metastore.cache.pinobjtypes | Table,Database,Type,FieldSchema,Order | Comma-separated list of Metastore object types that should be pinned in the cache. |
The XML for these entries:
<property>
<name>hive.metastore.sasl.enabled</name>
<value>true</value>
<description>If true, the metastore thrift interface will be secured with
SASL.
Clients must authenticate with Kerberos.</description>
</property>
<property>
<name>hive.metastore.kerberos.keytab.file</name>
<value>/etc/security/keytabs/hive.service.keytab</value>
<description>The path to the Kerberos Keytab file containing the
metastore thrift server's service principal.</description>
</property>
<property>
<name>hive.metastore.kerberos.principal</name>
<value>hive/_HOST@EXAMPLE.COM</value>
<description>The service principal for the metastore thrift server. The
special string _HOST will be replaced automatically with the correct
hostname.</description>
</property>
<property>
<name>hive.metastore.cache.pinobjtypes</name>
<value>Table,Database,Type,FieldSchema,Order</value>
<description>List of comma separated metastore object types that should be pinned in
the cache</description>
</property>
To the oozie-site.xml file, you must add the following
information:
Table 18.8. oozie-site.xml
| Property Name | Property Value | Description |
|---|---|---|
| oozie.service.AuthorizationService.security.enabled | true | Specifies whether security (user name/admin role) is enabled or not. If it is disabled, any user can manage the Oozie system and manage any job. |
| oozie.service.HadoopAccessorService.kerberos.enabled | true | Indicates if Oozie is configured to use Kerberos. |
| local.realm | EXAMPLE.COM | Kerberos Realm used by Oozie and Hadoop. Using 'local.realm' to be aligned with Hadoop configuration. |
| oozie.service.HadoopAccessorService.keytab.file | /etc/security/keytabs/oozie.service.keytab | The keytab for the Oozie service principal. |
| oozie.service.HadoopAccessorService.kerberos.principal | oozie/_HOST@EXAMPLE.COM | Kerberos principal for the Oozie service. |
| oozie.authentication.type | kerberos | |
| oozie.authentication.kerberos.principal | HTTP/_HOST@EXAMPLE.COM | The Kerberos principal used by Oozie for SPNEGO HTTP authentication. |
| oozie.authentication.kerberos.keytab | /etc/security/keytabs/spnego.service.keytab | Location of the Oozie user keytab file. |
| oozie.service.HadoopAccessorService.nameNode.whitelist | | |
| oozie.authentication.kerberos.name.rules | RULE:[2:$1@$0]([jt]t@.*EXAMPLE.COM)s/.*/mapred/ RULE:[2:$1@$0]([nd]n@.*EXAMPLE.COM)s/.*/hdfs/ RULE:[2:$1@$0](hm@.*EXAMPLE.COM)s/.*/hbase/ RULE:[2:$1@$0](rs@.*EXAMPLE.COM)s/.*/hbase/ DEFAULT | The mapping from Kerberos principal names to local OS user names. See Creating Mappings Between Principals and UNIX Usernames for more information. |
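Unlike the earlier sections, no XML listing is given here; the oozie-site.xml entries follow the same pattern as the other configuration files. A partial sketch covering the first few properties in the table (values as shown above; adjust the realm and keytab paths for your environment):
<property>
<name>oozie.service.AuthorizationService.security.enabled</name>
<value>true</value>
<description>Specifies whether security (user name/admin role) is enabled or not.
If it is disabled any user can manage the Oozie system and manage any job.</description>
</property>
<property>
<name>oozie.service.HadoopAccessorService.kerberos.enabled</name>
<value>true</value>
<description>Indicates if Oozie is configured to use Kerberos.</description>
</property>
<property>
<name>local.realm</name>
<value>EXAMPLE.COM</value>
<description>Kerberos Realm used by Oozie and Hadoop. Using 'local.realm' to be
aligned with Hadoop configuration.</description>
</property>
<property>
<name>oozie.authentication.type</name>
<value>kerberos</value>
</property>
<property>
<name>oozie.authentication.kerberos.principal</name>
<value>HTTP/_HOST@EXAMPLE.COM</value>
</property>
<property>
<name>oozie.authentication.kerberos.keytab</name>
<value>/etc/security/keytabs/spnego.service.keytab</value>
</property>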

