# dbx-poetry

A Databricks dbx project template using Poetry, cruft, and cookiecutter. For now, this template is only suitable for ETL jobs, not ML jobs.

This cookiecutter uses a post-generation hook to install Python pre-commit hooks. It assumes the project already has a git repository initialized; if not, initialize one manually with `git init` or by cloning a remote repository. After that, run `poetry run pre-commit install`. This ensures that when you run `git commit`, the checks (hooks) run before your code is actually committed.
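The one-time setup described above can be sketched as two commands (this assumes pre-commit is installed as a dev dependency of the generated project):

```shell
# Only needed if the project folder is not a git repository yet
git init

# Install the hooks from .pre-commit-config.yaml into .git/hooks,
# so they run automatically on every `git commit`
poetry run pre-commit install
```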

## Quickstart

1. Install Poetry:

   ```shell
   apt-get install python3-venv
   curl -sSL https://install.python-poetry.org | python3 -
   export PATH="$HOME/.poetry/bin:$PATH"
   ```

2. If you don't have a JDK installed on your local machine, install one:

   ```shell
   sudo apt-get install openjdk-11-jdk
   ```

3. Install cruft:

   ```shell
   pip install cruft
   ```

4. `cd` into the folder where you want to create your dbx project.

5. Create the project skeleton from this template:

   ```shell
   cruft create https://github.com/JensvandeZande/dbx-poetry.git
   ```

6. `cd` into your newly created project.

7. Create a virtual environment with Poetry and install the dependencies:

   ```shell
   poetry install
   ```

8. Create a new Databricks API token, configure the CLI, and verify that it works. This creates a hidden `.dbx` folder within your dbx project's root folder, containing `lock.json` and `project.json` files:

   ```shell
   databricks configure --profile default --token
   databricks --profile default workspace ls /
   ```

This will also generate a `.databrickscfg` file that looks something like this:

```
[DEFAULT]
host = <your-host>
token = <your-token>
jobs-api-version = 2.1
```

The `project.json` file defines an environment named `default` along with a reference to the `DEFAULT` profile in your Databricks CLI `.databrickscfg` file. If you want dbx to use a different profile, replace `--profile DEFAULT` in the `dbx configure` command with `--profile` followed by your target profile's name.

For example, if you have a profile named `DEV` in your Databricks CLI `.databrickscfg` file and you want dbx to use it instead of the `DEFAULT` profile, your `project.json` file might look like this instead, in which case you would also replace `--environment default` with `--environment dev` in the `dbx configure` command:

```json
{
  "environments": {
    "default": {
      "profile": "DEFAULT",
      "storage_type": "mlflow",
      "properties": {
        "workspace_directory": "/Shared/dbx/projects/<current-folder-name>",
        "artifact_location": "dbfs:/dbx/<current-folder-name>"
      }
    },
    "dev": {
      "profile": "DEV",
      "storage_type": "mlflow",
      "properties": {
        "workspace_directory": "/Shared/dbx/projects/<some-other-folder-name>",
        "artifact_location": "dbfs:/dbx/<some-other-folder-name>"
      }
    }
  }
}
```
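To sanity-check which CLI profile each dbx environment resolves to, you can read the environments block with a few lines of stdlib Python. The snippet below builds a throwaway sample file with placeholder values purely for illustration; in a real project you would point it at `.dbx/project.json` instead:

```shell
# Build a throwaway sample project.json (placeholder values) and print the
# environment -> profile mapping the way dbx would resolve it.
tmpdir="$(mktemp -d)"
cat > "$tmpdir/project.json" <<'EOF'
{
  "environments": {
    "default": {"profile": "DEFAULT"},
    "dev": {"profile": "DEV"}
  }
}
EOF

python3 - "$tmpdir/project.json" <<'EOF'
import json
import sys

with open(sys.argv[1]) as f:
    config = json.load(f)

for name, env in sorted(config["environments"].items()):
    print(f"environment {name!r} -> profile {env['profile']!r}")
EOF
```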

If you want dbx to use the `DATABRICKS_HOST` and `DATABRICKS_TOKEN` environment variables instead of a profile in your Databricks CLI `.databrickscfg` file, leave the `--profile` option out of the `dbx configure` command altogether.
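As a sketch, the environment-variable route looks like this. The host and token values below are placeholders, and the final command is shown commented out because it needs a real project and workspace to run against:

```shell
# Placeholder credentials; dbx falls back to these when no --profile is given
export DATABRICKS_HOST="https://example.cloud.databricks.com"
export DATABRICKS_TOKEN="dapi-0123456789abcdef"

# dbx configure --environment default   # run this inside a real dbx project
```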

Now that the local setup is ready, you can start filling in the template skeleton.

## References

See the dbx documentation for more information on dbx.
