positive_class in recipes.yaml not found during transform step when running in Databricks #23

timmullins159 · 2023-02-03T15:16:10Z

positive_class is defined and set in recipe.yaml, but it is not found when running the classification template or the classification example (https://github.com/mlflow/recipes-examples/tree/main/classification) in Databricks.

Databricks Runtime: 11.3 ML LTS
mlflow==2.1.1

Full Stacktrace:

---------------------------------------------------------------------------
MlflowException                           Traceback (most recent call last)
File <command-4167555846036620>:1
----> 1 r.run("transform")

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-25249b5f-a44c-4b1b-a8b5-94ddb9e63a4d/lib/python3.9/site-packages/mlflow/recipes/classification/v1/recipe.py:267, in ClassificationRecipe.run(self, step)
    195 def run(self, step: str = None) -> None:
    196     """
    197     Runs the full recipe or a particular recipe step, producing outputs and displaying a
    198     summary of results upon completion. Step outputs are cached from previous executions, and
   (...)
    265         classification_recipe.run()
    266     """
--> 267     return super().run(step=step)

File /local_disk0/.ephemeral_nfs/envs/pythonEnv-25249b5f-a44c-4b1b-a8b5-94ddb9e63a4d/lib/python3.9/site-packages/mlflow/recipes/recipe.py:104, in _BaseRecipe.run(self, step)
    102 if last_executed_step_state.status != StepStatus.SUCCEEDED:
    103     if step is not None:
--> 104         raise MlflowException(
    105             f"Failed to run step '{step}' of recipe '{self.name}'."
    106             f" An error was encountered while running step '{last_executed_step.name}':"
    107             f" {last_executed_step_state.stack_trace}",
    108             error_code=BAD_REQUEST,
    109         )
    110     else:
    111         raise MlflowException(
    112             f"Failed to run recipe '{self.name}'."
    113             f" An error was encountered while running step '{last_executed_step.name}':"
    114             f" {last_executed_step_state.stack_trace}",
    115             error_code=BAD_REQUEST,
    116         )

MlflowException: Failed to run step 'transform' of recipe 'recipes-classification-template'. An error was encountered while running step 'transform': Traceback (most recent call last):
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-25249b5f-a44c-4b1b-a8b5-94ddb9e63a4d/lib/python3.9/site-packages/mlflow/recipes/step.py", line 139, in run
    self.step_card = self._run(output_directory=output_directory)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-25249b5f-a44c-4b1b-a8b5-94ddb9e63a4d/lib/python3.9/site-packages/mlflow/recipes/steps/transform.py", line 105, in _run
    validate_classification_config(self.task, self.positive_class, train_df, self.target_col)
  File "/local_disk0/.ephemeral_nfs/envs/pythonEnv-25249b5f-a44c-4b1b-a8b5-94ddb9e63a4d/lib/python3.9/site-packages/mlflow/recipes/utils/step.py", line 182, in validate_classification_config
    raise MlflowException(
mlflow.exceptions.MlflowException: `positive_class` must be specified for classification/v1 recipes.

recipe.yaml

# `recipe.yaml` is the main configuration file for an MLflow Recipe.
# Required recipe parameters should be defined in this file with either concrete values or
# variables such as {{ INGEST_DATA_LOCATION }}.
#
# Variables must be dereferenced in a profile YAML file, located under `profiles/`.
# See `profiles/local.yaml` for example usage. One may switch among profiles quickly by
# providing a profile name such as `local` in the Recipe object constructor:
# `r = Recipe(profile="local")`
#
# NOTE: All "FIXME::REQUIRED" fields in recipe.yaml and profiles/*.yaml must be set correctly
#       to adapt this template to a specific classification problem. To find all required fields,
#       under the root directory of this recipe, type on a unix-like command line:
#       $> grep "# FIXME::REQUIRED:" recipe.yaml profiles/*.yaml
#
# NOTE: YAML does not support tabs for indentation. Please use spaces and ensure that all YAML
#       files are properly formatted.

recipe: "classification/v1"
# FIXME::REQUIRED: Specifies the target column name for model training and evaluation.
target_col: "bool__did_ctp"
# FIXME::REQUIRED: Specifies the value of `target_col` that is considered the positive class.
positive_class: 1
# FIXME::REQUIRED: Sets the primary metric to use to evaluate model performance. This primary
#                  metric is used to select best performing models in MLflow UI as well as in
#                  train and evaluation step.
#                  Built-in primary metrics are: recall_score, precision_score, f1_score, accuracy_score.
primary_metric: "f1_score"
steps:
  # Specifies the dataset to use for model development
  ingest: {{INGEST_CONFIG}}
  split:
    #
    # FIXME::OPTIONAL: Adjust the train/validation/test split ratios below.
    #
    split_ratios: [0.75, 0.125, 0.125]
    #
    #  FIXME::OPTIONAL: Specifies the method to use to "post-process" the split datasets. Note that
    #                   arbitrary transformations should go into the transform step.
    post_split_filter_method: create_dataset_filter
  transform:
    using: "custom"
    #
    #  FIXME::OPTIONAL: Specifies the method that defines an sklearn-compatible transformer, which
    #                   applies input feature transformation during model training and inference.
    transformer_method: transformer_fn
  train:
    #
    # FIXME::REQUIRED: Specifies the method to use for training. Options are "automl/flaml" for
    #                  AutoML training or "custom" for user-defined estimators.
    using: "automl"
  evaluate:
    #
    # FIXME::OPTIONAL: Sets performance thresholds that a trained model must meet in order to be
    #                  eligible for registration to the MLflow Model Registry.
    #
    # validation_criteria:
    #   - metric: f1_score
    #     threshold: 0.9
  register:
    # Indicates whether or not a model that fails to meet performance thresholds should still
    # be registered to the MLflow Model Registry
    allow_non_validated_model: false
  # FIXME::OPTIONAL: Specify the dataset to use for batch scoring. All params serve the same function
  #                  as in `data`
  # ingest_scoring: {{INGEST_SCORING_CONFIG}}
  # predict:
  #   output: {{PREDICT_OUTPUT_CONFIG}}
  #   model_uri: "models/model.pkl"
  #   result_type: "double"
  #   save_mode: "default"
# custom_metrics:
#   FIXME::OPTIONAL: Defines custom performance metrics to compute during model development.
#     - name: ""
#       function: get_custom_metrics
#       greater_is_better: False


Avneet1710 · 2023-02-10T11:33:57Z

Having the same issue.

PatrickBrayPersonal · 2023-02-21T04:01:54Z

I was able to get around this error by adding positive_class to the transform step:

  transform:
    using: "custom"
    positive_class: 1
    transformer_method: transformer_fn

I ran into a different issue further down the line, however.
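
To see where positive_class is actually set once the config has been parsed, a small check along these lines may help. This is only a sketch: `locate_key` is a hypothetical helper and not part of MLflow, and the two dictionaries are hand-written stand-ins for the parsed recipe.yaml (the templated ingest entry and the other steps are left out). It says nothing about how MLflow itself resolves the key; it only shows the difference between the original config and the workaround above.

```python
from typing import Any, Dict


def locate_key(config: Dict[str, Any], key: str) -> Dict[str, Any]:
    """Return every place `key` is set: the top level and each step's section."""
    found: Dict[str, Any] = {}
    if key in config:
        found["<top level>"] = config[key]
    # `steps` maps step names to their config sections (a section may be empty).
    for step_name, step_cfg in (config.get("steps") or {}).items():
        if isinstance(step_cfg, dict) and key in step_cfg:
            found[f"steps.{step_name}"] = step_cfg[key]
    return found


# Hand-written stand-in for the recipe.yaml in the original report (assumption:
# only the keys relevant to this issue are reproduced here).
original = {
    "recipe": "classification/v1",
    "target_col": "bool__did_ctp",
    "positive_class": 1,
    "primary_metric": "f1_score",
    "steps": {
        "transform": {"using": "custom", "transformer_method": "transformer_fn"},
        "train": {"using": "automl"},
    },
}

# Same config with the workaround applied: positive_class repeated under transform.
patched = {
    **original,
    "steps": {
        **original["steps"],
        "transform": {**original["steps"]["transform"], "positive_class": 1},
    },
}

print(locate_key(original, "positive_class"))
print(locate_key(patched, "positive_class"))
```

In the original config the key shows up only at the top level; with the workaround it also appears under steps.transform, which is the section the failing step belongs to.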
