Skip to content
This repository has been archived by the owner on Oct 24, 2022. It is now read-only.

Latest commit

 

History

History
104 lines (85 loc) · 5.29 KB

kerberos.md

File metadata and controls

104 lines (85 loc) · 5.29 KB

Kerberos Authentication

These instructions only apply if your Hadoop installation is secured by Kerberos.

Prerequisites

First, obtain the following information:

  • Kerberos user principal to be used for Hdfs and HCatalog/Hive authentication (typically a technical Hadoop user)
  • Kerberos configuration file (e.g., krb5.conf)
  • Kerberos keytab which contains keys for the user principal
  • Kerberos service principal for the Hadoop NameNode (value of dfs.namenode.kerberos.principal in hdfs-site.xml)
  • Kerberos service principal for HCatalog. Since HCatalog is accessed through Hive, use the Hive service principal. (value of hive.metastore.kerberos.principal)

Create a Connection

In order for the UDFs to have access to the necessary Kerberos information, a CONNECTION object must be created in EXASOL. Storing the Kerberos information in CONNECTION objects provides the ability to set the accessibility of the Kerberos authentication data (especially the keytab) for users. The TO field is left empty, the Kerberos principal is stored in the USER field, and the Kerberos configuration and keytab are stored in the IDENTIFIED BY field (base64 format) along with an internal key to identify the CONNECTION as a Kerberos CONNECTION.

In order to simplify the creation of Kerberos CONNECTION objects, the create_kerberos_conn.py Python script has been provided. The script requires 4 arguments:

  • CONNECTION name
  • Kerberos user principal
  • Kerberos configuration file path
  • Kerberos keytab path

Example command:

python tools/create_kerberos_conn.py krb_conn krbuser@EXAMPLE.COM /etc/krb5.conf ./krbuser.keytab

Output:

CREATE CONNECTION krb_conn TO '' USER 'krbuser@EXAMPLE.COM' IDENTIFIED BY 'ExaAuthType=Kerberos;enp6Cg==;YWFhCg=='

The output is a CREATE CONNECTION statement, which can be executed directly in EXASOL to create the Kerberos CONNECTION object. For more detailed information about the script, use the help option:

python tools/create_kerberos_conn.py -h

JDBC Kerberos Connection for EXPORT

In order to use the REPLACE, TRUNCATE, or CREATED BY options with the EXPORT UDF, you must also setup a CONNECTION for the JDBC driver. This CONNECTION has the same format as the above-created CONNECTION, except that the JDBC connection string must be provided in the TO field.

Assuming the principal information for the JDBC connection is the same as for connection created above, you can just copy the CREATE CONNECTION statement previously created, specify a new name, and provide the JDBC connection string in TO field. Otherwise, you will need to execute the create_kerberos_conn.py Python script again with the correct information for the JDBC user.

The EXPORT UDF uses the Apache Hive JDBC driver to execute the necessary statements. For most cases, the connection string should have a format similar to the following. See JDBC Client Setup for a Secure Cluster for details.

'jdbc:hive2://hive-host:10000/;principal=hive/hive-host@EXAMPLE'

Then you can create the JDBC CONNECTION using the statement below.

CREATE CONNECTION jdbc_krb_conn TO 'jdbc:hive2://hive-host:10000/;principal=hive/hive-host@EXAMPLE' USER 'krbuser@EXAMPLE.COM' IDENTIFIED BY 'ExaAuthType=Kerberos;enp6Cg==;YWFhCg=='

Grant Access to the Connection

You can then grant access to the CONNECTION to UDFs and users:

GRANT ACCESS ON CONNECTION krb_conn FOR ETL.HCAT_TABLE_FILES TO exauser;
GRANT ACCESS ON CONNECTION krb_conn FOR ETL.IMPORT_HIVE_TABLE_FILES TO exauser;
GRANT ACCESS ON CONNECTION krb_conn FOR ETL.EXPORT_HCAT_TABLE TO exauser;
GRANT ACCESS ON CONNECTION krb_conn FOR ETL.EXPORT_INTO_HIVE_TABLE TO exauser;

For optional an optional JDBC connection for EXPORT:

GRANT ACCESS ON CONNECTION jdbc_krb_conn FOR ETL.EXPORT_HCAT_TABLE TO exauser;

Or, if you want to grant the user access to the CONNECTION in any UDF (which means that the user can access all the information in the CONNECTION--most importantly the keytab):

GRANT CONNECTION krb_conn TO exauser;

Connection Usage

Then, you can access the created CONNECTION from a UDF by passing the CONNECTION name as a UDF parameter.

IMPORT Example:

IMPORT INTO (code VARCHAR(1000), description VARCHAR (1000), total_emp INT, salary INT)
FROM SCRIPT ETL.IMPORT_HCAT_TABLE WITH
 HCAT_DB         = 'default'
 HCAT_TABLE      = 'sample_07'
 HCAT_ADDRESS    = 'thrift://hive-metastore-host:9083'
 AUTH_TYPE       = 'kerberos'
 KERBEROS_CONNECTION = 'krb_conn'
 KERBEROS_HDFS_SERVICE_PRINCIPAL = 'hdfs/_HOST@EXAMPLE.COM'
 KERBEROS_HCAT_SERVICE_PRINCIPAL = 'hive/_HOST@EXAMPLE.COM';

EXPORT Example:

EXPORT
TABLE1
INTO SCRIPT ETL.EXPORT_HCAT_TABLE WITH
 HCAT_DB         = 'default'
 HCAT_TABLE      = 'test_table'
 HCAT_ADDRESS    = 'thrift://hive-metastore-host:9083'
 AUTH_TYPE       = 'kerberos'
 KERBEROS_CONNECTION = 'krb_conn'
 KERBEROS_HDFS_SERVICE_PRINCIPAL = 'hdfs/_HOST@EXAMPLE.COM'
 KERBEROS_HCAT_SERVICE_PRINCIPAL = 'hive/_HOST@EXAMPLE.COM'
 JDBC_AUTH_TYPE  = 'kerberos'
 JDBC_CONNECTION = 'jdbc_krb_conn'
CREATED BY 'CREATE TABLE default.test_table(col1 VARCHAR(200)) STORED AS PARQUET';