Configuring Apache Sqoop Hook
Apache Sqoop provides a SqoopDataPublisher class that publishes data
to Atlas after import jobs complete. Currently, the Sqoop hook supports only
hiveImport jobs. The hook adds entities in Atlas using the model defined in
org.apache.atlas.sqoop.model.SqoopDataModelGenerator. To add the Sqoop hook
for Atlas, complete the following steps in your Sqoop setup, beginning with
the <sqoop-conf>/sqoop-site.xml file:
1. Add the Sqoop job publisher class. Currently, only one publishing class is supported.
   <property>
     <name>sqoop.job.data.publish.class</name>
     <value>org.apache.atlas.sqoop.hook.SqoopHook</value>
   </property>
2. Add the Atlas cluster name:
   <property>
     <name>atlas.cluster.name</name>
     <value><clustername></value>
   </property>
3. Copy the application and client properties from the Atlas configuration directory.
4. Define the atlas.cluster.name and atlas.rest.address properties in the Sqoop configuration file, sqoop-site.xml.
5. Add ATLAS_HOME to /usr/hdp/<version>/sqoop/bin:
   export ATLAS_HOME=${ATLAS_HOME:-/usr/hdp/2.5.6.0-1245/atlas}
6. Add the following lines to the $SQOOP_HOME/bin/configure-sqoop file, after the line ZOOCFGDIR=${ZOOCFGDIR:-/etc/zookeeper}:
   if [ -e "$ATLAS_HOME/hook/sqoop" -a -e "$ATLAS_HOME/hook/hive" ]; then
     for f in $ATLAS_HOME/hook/sqoop/*.jar; do
       SQOOP_CLASSPATH=${SQOOP_CLASSPATH}:$f
     done
     for f in $ATLAS_HOME/hook/hive/*.jar; do
       SQOOP_CLASSPATH=${SQOOP_CLASSPATH}:$f
     done
   fi
7. Copy the Atlas <atlas-conf>/application.properties file and the <atlas-conf>/client.properties file to the <sqoop-conf>/ directory.
8. Link <atlas-home>/hook/sqoop/*.jar in the Sqoop lib directory.
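The last two steps, copying the Atlas property files and linking the hook jars, can be sketched as a pair of shell functions. This is a minimal sketch rather than a supported installer: the function names copy_atlas_conf and link_sqoop_hook_jars are hypothetical, and every directory is passed as an argument because the actual locations depend on your installation. It also assumes that "link" means creating symbolic links and that the paths you pass are absolute.

```shell
#!/bin/sh
# Sketch only: both function names are illustrative, not part of Atlas or Sqoop.

# Copy the two Atlas property files into the Sqoop configuration directory.
# $1 = Atlas conf directory, $2 = Sqoop conf directory
copy_atlas_conf() {
    atlas_conf=$1
    sqoop_conf=$2
    for name in application.properties client.properties; do
        # Stop early if an expected file is missing from the Atlas conf directory.
        if [ ! -f "$atlas_conf/$name" ]; then
            echo "missing: $atlas_conf/$name" >&2
            return 1
        fi
        cp "$atlas_conf/$name" "$sqoop_conf/" || return 1
    done
}

# Symlink every Atlas Sqoop hook jar into the Sqoop lib directory.
# $1 = Atlas home directory (absolute path), $2 = Sqoop lib directory
link_sqoop_hook_jars() {
    atlas_home=$1
    sqoop_lib=$2
    for jar in "$atlas_home"/hook/sqoop/*.jar; do
        # If the glob matched nothing, the literal pattern is left behind; skip it.
        [ -e "$jar" ] || continue
        ln -sf "$jar" "$sqoop_lib/$(basename "$jar")" || return 1
    done
}
```

For example, the functions might be called as copy_atlas_conf <atlas-conf> <sqoop-conf> followed by link_sqoop_hook_jars <atlas-home> and the path of your Sqoop lib directory, substituting the directories used on your cluster.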
Limitations
Currently, only hiveImport jobs are published to Atlas by the Sqoop hook.
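As a rough way to tell in advance whether a job falls under this limitation, a wrapper script could check the Sqoop arguments for the --hive-import option before launching the job. The sketch below assumes that "hiveImport jobs" are those started with --hive-import; the function name has_hive_import is hypothetical and is not part of Sqoop or Atlas.

```shell
#!/bin/sh
# Sketch only: has_hive_import is an illustrative helper name.

# Return 0 if the given Sqoop arguments include --hive-import, 1 otherwise.
has_hive_import() {
    for arg in "$@"; do
        if [ "$arg" = "--hive-import" ]; then
            return 0
        fi
    done
    return 1
}
```

A wrapper could call has_hive_import "$@" and print a warning when it returns non-zero, since such a job would not be published to Atlas.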

