Merge pull request #19 from chris9692/master
Add avro converter format properties and job commit policy check
chris9692 authored Nov 9, 2021
2 parents 0b63052 + 8efbb01 commit f663689
Showing 8 changed files with 158 additions and 1 deletion.
@@ -292,6 +292,9 @@ protected Integer getValidNonblankWithDefault(State state) {
BooleanProperties MSTAGE_WORK_UNIT_PARTIAL_PARTITION =
new BooleanProperties("ms.work.unit.partial.partition", Boolean.TRUE);
StringProperties MSTAGE_WORK_UNIT_PARTITION = new StringProperties("ms.work.unit.partition", "none");
StringProperties CONVERTER_AVRO_DATE_FORMAT = new StringProperties("converter.avro.date.format");
StringProperties CONVERTER_AVRO_TIME_FORMAT = new StringProperties("converter.avro.time.format");
StringProperties CONVERTER_AVRO_TIMESTAMP_FORMAT = new StringProperties("converter.avro.timestamp.format");
StringProperties CONVERTER_CLASSES = new StringProperties("converter.classes");
StringProperties DATA_PUBLISHER_FINAL_DIR = new StringProperties("data.publisher.final.dir");
StringProperties DATASET_URN = new StringProperties("dataset.urn");
@@ -321,6 +324,7 @@ protected String getValidNonblankWithDefault(State state) {
}
};

StringProperties JOB_COMMIT_POLICY = new StringProperties("job.commit.policy");
StringProperties JOB_DIR = new StringProperties("job.dir");
StringProperties JOB_NAME = new StringProperties("job.name");
StringProperties SOURCE_CLASS = new StringProperties("source.class");
@@ -405,6 +409,9 @@ protected String getValidNonblankWithDefault(State state) {
MSTAGE_WORK_UNIT_PARALLELISM_MAX,
MSTAGE_WORK_UNIT_PARTIAL_PARTITION,
MSTAGE_WORK_UNIT_PARTITION,
CONVERTER_AVRO_DATE_FORMAT,
CONVERTER_AVRO_TIME_FORMAT,
CONVERTER_AVRO_TIMESTAMP_FORMAT,
CONVERTER_CLASSES,
DATA_PUBLISHER_FINAL_DIR,
DATASET_URN,
@@ -414,6 +421,7 @@ protected String getValidNonblankWithDefault(State state) {
EXTRACT_NAMESPACE,
EXTRACT_TABLE_NAME,
EXTRACT_TABLE_TYPE,
JOB_COMMIT_POLICY,
JOB_DIR,
JOB_NAME,
SOURCE_CLASS,
@@ -433,6 +441,7 @@ protected String getValidNonblankWithDefault(State state) {
);
Map<String, MultistageProperties<?>> deprecatedProperties =
new ImmutableMap.Builder<String, MultistageProperties<?>>()
.put("dataset.name", EXTRACT_TABLE_NAME)
.put("ms.csv.column.header", MSTAGE_CSV)
.put("ms.csv.column.header.index", MSTAGE_CSV)
.put("ms.csv.column.projection", MSTAGE_CSV)
42 changes: 41 additions & 1 deletion docs/how-to/source-authentication.md
@@ -20,12 +20,33 @@ See [Variable](../concepts/variables.md) for more details about variables.
Generally, when the source is dynamic (as above), it is recommended to start with a static value. After testing with the
static URI, variables can be devised to make the URI dynamic.

See [Authentication Methods](../concepts/authentication-method.md) for authentication configuration details.

## HTTP Syntax

`ms.source.uri=https://host-name/path?url-parameters`

For HTTP connections, `ms.source.uri` accepts a domain or host name, optional path segments, and optional URL
parameters. All of them can be dynamic, i.e., they can contain DIL variables enclosed in double braces `{{` and `}}`.

For basic authentication, use the following:
- `source.conn.username`
- `source.conn.password`
- `ms.authentication`

For token-based authentication:
- `ms.authentication`

**Note**: Basic authentication can also be configured as token authentication by concatenating the username and password, separated
by a colon.

For OAuth 2.0 authentication:
- `ms.authentication`
- `ms.secondary.input`

For form-based authentication:
- `ms.parameters`
- `ms.http.request.headers={"Content-Type": "application/x-www-form-urlencoded"}`
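
The note above about expressing basic authentication as a token can be illustrated with a minimal Java sketch. It follows the standard HTTP Basic scheme, in which `username:password` is Base64-encoded and sent in the `Authorization` header. The class and method names are hypothetical, and whether DIL expects the raw concatenation or the encoded form in `ms.authentication` is not stated here, so check [Authentication Methods](../concepts/authentication-method.md) before relying on it.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of DIL: shows how a username and password
// map to a Basic authentication token under the standard HTTP scheme.
final class BasicAuthTokenSketch {

  // Concatenate username and password, separated by a colon.
  static String credentials(String username, String password) {
    return username + ":" + password;
  }

  // Base64-encode the credentials and prefix them with the scheme name,
  // which is the value an HTTP client would send in the Authorization header.
  static String headerValue(String username, String password) {
    byte[] raw = credentials(username, password).getBytes(StandardCharsets.UTF_8);
    return "Basic " + Base64.getEncoder().encodeToString(raw);
  }

  public static void main(String[] args) {
    // prints: Basic dXNlcjpwYXNz
    System.out.println(headerValue("user", "pass"));
  }
}
```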

## S3 Syntax

@@ -34,6 +55,10 @@ parameters. All of them can be dynamic, i.e., they can contain DIL variables enc
S3 syntax is similar to HTTP syntax, except that `ms.source.uri` has the bucket name as part of the host name and,
instead of a URL path, optionally has a prefix string.

For authentication, use the following:
- `source.conn.username=access-key`
- `source.conn.password=secret-id`

## JDBC Syntax

`ms.source.uri=jdbc:database-type://host-name:port/database-name?configurations`
@@ -42,12 +67,27 @@ The database type can be `mysql` or `sqlserver`.

Configurations are name-value pairs separated by `&`, such as `useSSL=true&enabledTLSProtocols=TLSv1.2`.
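
As an illustration of the syntax above, the following Java sketch assembles a JDBC source URI from its parts. The class name, host, port, and database name are made up for the example; only the overall shape `jdbc:database-type://host-name:port/database-name?configurations` comes from this page.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of DIL: builds a value for ms.source.uri.
final class JdbcUriSketch {

  static String build(String dbType, String host, int port, String database,
      Map<String, String> configs) {
    String base = "jdbc:" + dbType + "://" + host + ":" + port + "/" + database;
    if (configs.isEmpty()) {
      return base;
    }
    // Join name=value pairs with '&', in the map's iteration order.
    String query = configs.entrySet().stream()
        .map(e -> e.getKey() + "=" + e.getValue())
        .collect(Collectors.joining("&"));
    return base + "?" + query;
  }

  public static void main(String[] args) {
    // LinkedHashMap keeps the pairs in insertion order.
    Map<String, String> configs = new LinkedHashMap<>();
    configs.put("useSSL", "true");
    configs.put("enabledTLSProtocols", "TLSv1.2");
    // prints: jdbc:mysql://db.example.com:3306/sales?useSSL=true&enabledTLSProtocols=TLSv1.2
    System.out.println(build("mysql", "db.example.com", 3306, "sales", configs));
  }
}
```

The sketch does no URL escaping, so it assumes configuration names and values contain no reserved characters.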

For authentication, use the following:
- `source.conn.username`
- `source.conn.password`

## SFTP Syntax

`ms.source.uri=path`
`source.conn.host=host-name`

For SFTP, the host name is specified in `source.conn.host`, and the root path is specified in `ms.source.uri`.

For authentication, use the following:
- `source.conn.username`
- `source.conn.password`

or use the following if private key authentication is required:
- `source.conn.private.key`

## Variables

To make any part of the source URI dynamic, add [variables](../concepts/variables.md) as needed. At runtime,
variables are replaced with actual values, so the source URI can take different values in different work units.
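
The substitution described above can be pictured with a small Java sketch. It is an assumption-laden illustration rather than DIL's actual implementation: the class name, the regular expression, and the choice to leave unknown variables untouched are all hypothetical.

```java
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of DIL: replaces {{name}} placeholders
// in a URI template with values supplied for one work unit.
final class UriVariableSketch {

  // Matches a variable name enclosed in double braces, e.g. {{page}}.
  private static final Pattern VARIABLE = Pattern.compile("\\{\\{([^{}]+)\\}\\}");

  static String resolve(String template, Map<String, String> values) {
    Matcher matcher = VARIABLE.matcher(template);
    StringBuffer out = new StringBuffer();
    while (matcher.find()) {
      String name = matcher.group(1).trim();
      // Assumption: a variable without a value is left as-is.
      String value = values.getOrDefault(name, matcher.group());
      matcher.appendReplacement(out, Matcher.quoteReplacement(value));
    }
    matcher.appendTail(out);
    return out.toString();
  }

  public static void main(String[] args) {
    Map<String, String> workUnitValues = Collections.singletonMap("page", "2");
    // prints: https://host-name/path?page=2
    System.out.println(resolve("https://host-name/path?page={{page}}", workUnitValues));
  }
}
```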

[Back to Summary](summary.md#config-source-and-authentication)
4 changes: 4 additions & 0 deletions docs/parameters/categories.md
@@ -131,6 +131,10 @@ The following are related to watermarks and work units:

The following properties are inherited from Gobblin and enhanced with explicit validation rules.

- [converter.avro.date.format](converter.avro.date.format.md)
- [converter.avro.time.format](converter.avro.time.format.md)
- [converter.avro.timestamp.format](converter.avro.timestamp.format.md)
- [extract.table.name](extract.table.name.md)
- [job.commit.policy](job.commit.policy.md)
- [source.class](source.class.md)
- [converter.class](converter.class.md)
22 changes: 22 additions & 0 deletions docs/parameters/converter.avro.date.format.md
@@ -0,0 +1,22 @@
# converter.avro.date.format

**Tags**:
[gobblin](categories.md#gobblin-properties)

**Type**: string

**Default value**: none

**Related**:

## Description

`converter.avro.date.format` indicates how date values are formatted in the user data. This property
is used by the JSON to AVRO converter in converting fields of type "date".

This property accepts multiple formats, separated by commas (,), for cases where date values arrive in several forms.

For example:
- `converter.avro.date.format=MM/dd/yyyy HH:mm,dd-MMM-yyyy HH:mm:ss`
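
One way to picture a comma-separated format list is "try each pattern in order and keep the first that parses". The Java sketch below does exactly that with `java.time`. It is only an illustration: the class name is hypothetical, and the date library and matching order the Gobblin converter actually uses are not described on this page.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper, not the Gobblin converter: parses a value against
// a comma-separated list of patterns, returning the first successful parse.
final class MultiFormatParseSketch {

  static Optional<LocalDateTime> parse(String value, String formats) {
    // Assumption: patterns themselves contain no commas.
    for (String pattern : formats.split(",")) {
      DateTimeFormatter formatter =
          DateTimeFormatter.ofPattern(pattern.trim(), Locale.ENGLISH);
      try {
        return Optional.of(LocalDateTime.parse(value, formatter));
      } catch (DateTimeParseException e) {
        // This pattern does not match; fall through to the next one.
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    String formats = "MM/dd/yyyy HH:mm,dd-MMM-yyyy HH:mm:ss";
    // Matched by the first pattern.
    System.out.println(parse("11/09/2021 14:30", formats));
    // Matched by the second pattern.
    System.out.println(parse("09-Nov-2021 14:30:05", formats));
    // Matched by neither pattern, so the result is empty.
    System.out.println(parse("2021-11-09", formats));
  }
}
```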

[back to summary](summary.md#essential-gobblin-core-properties)
22 changes: 22 additions & 0 deletions docs/parameters/converter.avro.time.format.md
@@ -0,0 +1,22 @@
# converter.avro.time.format

**Tags**:
[gobblin](categories.md#gobblin-properties)

**Type**: string

**Default value**: none

**Related**:

## Description

`converter.avro.time.format` indicates how time values are formatted in the user data. This property
is used by the JSON to AVRO converter in converting fields of type "time".

This property accepts multiple formats, separated by commas (,), for cases where time values arrive in several forms.

For example:
- `converter.avro.time.format=HH:mm:ss,HH:mm:ss.000'Z'`

[back to summary](summary.md#essential-gobblin-core-properties)
22 changes: 22 additions & 0 deletions docs/parameters/converter.avro.timestamp.format.md
@@ -0,0 +1,22 @@
# converter.avro.timestamp.format

**Tags**:
[gobblin](categories.md#gobblin-properties)

**Type**: string

**Default value**: none

**Related**:

## Description

`converter.avro.timestamp.format` indicates how timestamp values are formatted in the user data. This property
is used by the JSON to AVRO converter in converting fields of type "timestamp".

This property accepts multiple formats, separated by commas (,), for cases where timestamp values arrive in several forms.

For example:
- `converter.avro.timestamp.format=MM/dd/yyyy HH:mm,dd-MMM-yyyy HH:mm:ss`

[back to summary](summary.md#essential-gobblin-core-properties)
19 changes: 19 additions & 0 deletions docs/parameters/job.commit.policy.md
@@ -0,0 +1,19 @@
# job.commit.policy

**Tags**:
[gobblin](categories.md#gobblin-properties)

**Type**: string

**Default value**: full

**Related**:

## Description

`job.commit.policy` specifies how the job state is committed when some of the job's tasks fail. Valid values are:
- full: Commit output data of a job if and only if all of its tasks complete successfully.
- successful: Commit output data of the tasks that complete successfully.
- partial: Deprecated; the replacement is "successful".
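
The difference between the values can be restated as a small decision function. The Java sketch below is a hypothetical illustration of the semantics listed above, not Gobblin's implementation; the class, enum, and method names are invented for the example, and mapping the deprecated "partial" to "successful" is an assumption based on the stated replacement.

```java
import java.util.Locale;

// Hypothetical illustration of the policy semantics; not Gobblin code.
final class CommitPolicySketch {

  enum Policy { FULL, SUCCESSFUL }

  // Map a configured value to a policy.
  static Policy parse(String value) {
    switch (value.trim().toLowerCase(Locale.ROOT)) {
      case "full":
        return Policy.FULL;
      case "successful":
      case "partial": // assumption: the deprecated value behaves like "successful"
        return Policy.SUCCESSFUL;
      default:
        throw new IllegalArgumentException("Unknown job.commit.policy: " + value);
    }
  }

  // Decide whether one task's output is committed.
  static boolean commitsTask(Policy policy, boolean taskSucceeded, boolean allTasksSucceeded) {
    // FULL commits only when every task succeeded;
    // SUCCESSFUL commits each task that succeeded on its own.
    return policy == Policy.FULL ? allTasksSucceeded : taskSucceeded;
  }

  public static void main(String[] args) {
    // One task succeeded while another task in the same job failed.
    System.out.println(commitsTask(parse("full"), true, false));       // prints: false
    System.out.println(commitsTask(parse("successful"), true, false)); // prints: true
  }
}
```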

[back to summary](summary.md#essential-gobblin-core-properties)
19 changes: 19 additions & 0 deletions docs/parameters/summary.md
@@ -461,12 +461,31 @@ a work unit. Partitioning, therefore, allows parallel processing.
The following are Gobblin core properties that are essential to job configuration. This is only a short list;
for a complete list of Gobblin core properties, please refer to the Gobblin documentation.

## [converter.avro.date.format](converter.avro.date.format.md)

`converter.avro.date.format` indicates how date values are formatted in the user data. This property
is used by the JSON to AVRO converter in converting fields of type "date".

## [converter.avro.time.format](converter.avro.time.format.md)

`converter.avro.time.format` indicates how time values are formatted in the user data. This property
is used by the JSON to AVRO converter in converting fields of type "time".

## [converter.avro.timestamp.format](converter.avro.timestamp.format.md)

`converter.avro.timestamp.format` indicates how timestamp values are formatted in the user data. This property
is used by the JSON to AVRO converter in converting fields of type "timestamp".

## [extract.table.name](extract.table.name.md)

`extract.table.name` specifies the target table name, not the source table name. This
is a required parameter if the extractor is anything other than the FileDumpExtractor.
Writers and some converters don't work without it.

## [job.commit.policy](job.commit.policy.md)

`job.commit.policy` specifies how the job state is committed when some of the job's tasks fail. Valid values are
"full" or "successful".

## [source.class](source.class.md)
## [converter.class](converter.class.md)
