Merge pull request #5 from WallarooLabs/1.12-elise
Updated model insights.
johnhansarickWallaroo authored Jul 13, 2022
2 parents e2c7358 + 1a9dd5e commit 4a269ca
Showing 2 changed files with 48 additions and 19 deletions.
5 changes: 5 additions & 0 deletions .vscode/settings.json
@@ -0,0 +1,5 @@
{
"cSpell.words": [
"quantile"
]
}
62 changes: 43 additions & 19 deletions model_insights/model-insights.ipynb
@@ -16,11 +16,13 @@
"source": [
"## Introduction\n",
"\n",
"The Model Insights feature lets you monitor how the environment that your model operates within may be changing in ways that affect it's predictions so that you can intervene (retrain) in an efficient and timely manner. Changes in the inputs, **data drift**, can occur due to errors in the data processing pipeline or due to changes in the environment such as user preference or behavior.\n",
"The Model Insights feature lets you monitor how the environment that your model operates within changes in ways that affect the model's predictions. This allows you to intervene (aka retrain a model) in an efficient and timely manner. Changes in the inputs, **data drift**, can occur due to errors in the data processing pipeline or due to changes in the environment such as user behavior.\n",
"\n",
"The validation framework performs per inference range checks with count frequency based thresholds for alerts and is ideal for catching many errors in input and output data.\n",
"\n",
"In complement to the validation framework, model insights focuses on the differences in the distributions of data in a time based window measured against a baseline for a given pipeline and can detect situations where values are still within the expected range but the distribution has shifted. For example, if your model predicts housing prices you might expect the predictions to be between \\\\$200,000 and \\\\$1,000,000 with a distribution centered around \\\\$400,000. If your model suddenly starts predicting prices centered around \\\\$250,000 or \\\\$750,000 the predictions may still be within the expected range but the shift may signal something has changed that should be investigated.\n",
"In complement to the validation framework, model insights examines the distribution of data within a specified window of time, and compares it to a baseline for a given pipeline. It can detect situations where values are still within the expected range, but the distribution has shifted. \n",
"\n",
"For example, for a model that predicts housing prices you might expect the predictions to be between \\\\$200,000 and \\\\$1,000,000 with a distribution centered around \\\\$400,000. Then your model suddenly starts predicting prices centered around \\\\$250,000 or \\\\$750,000. The predictions may still be within the expected range but the shift may signal something has changed that should be investigated.\n",
"\n",
"Ideally we'd also monitor the _quality_ of the predictions, aka **concept drift**. However this can be difficult as true labels are often not available or are severely delayed in practice. That is there may be a significant lag between the time the prediction is made and the true (sale price) value is observed.\n",
"\n",
@@ -34,7 +36,7 @@
"\n",
"Once you've set up a monitoring task, called an assay, comparisons against your baseline are then run automatically on a scheduled basis. You can be notified if the system notices any abnormally different behavior. The framework also allows you to quickly investigate the cause of any unexpected drifts in your predictions.\n",
"\n",
"The rest of this notebook will shows how to create assays to monitor your pipelines."
"The rest of this tutorial shows how to create assays to monitor your pipelines."
]
},
{
@@ -76,7 +78,7 @@
"source": [
"### Set Configuration\n",
"\n",
"The following configuration is used to connect to the pipeline used, and display the graphs. The `pipeline_name` and `model_name` area set from the [Model Insights Canned Data Loader](model-insights-load_canned_data.ipynb), so adjust them based on your own needs."
"The following configuration is used to connect to the pipeline used, and display the graphs. The `pipeline_name` and `model_name` shown here are from the [Model Insights Canned Data Loader](model-insights-load_canned_data.ipynb), so adjust them based on your own needs."
]
},
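For reference, a minimal configuration cell might look like the sketch below. The names are illustrative placeholders assumed to match the canned-data loader's defaults, so substitute whatever your own loader created.

```python
# Illustrative configuration only -- these names are assumptions, not requirements.
# Use the pipeline and model names created by your own canned-data loader run.
pipeline_name = "housepricepipe"
model_name = "housepricemodel"
```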
{
@@ -161,7 +163,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We assume the pipeline has been running for a while and there is a period of time that is free of errors that we'd like to use as the _baseline_. Lets note the start and end times. For this example we have 30 days of data from Jan 2022 and well use Jan 1 data as our baseline."
"We assume the pipeline has been running for a while and there is a period of time that is free of errors that we'd like to use as the _baseline_. Let's note the start and end times. For this example we have 30 days of data from Jan 2022 and will use Jan 1 data as our baseline."
]
},
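As a small sketch (the variable names here are assumptions, not the notebook's actual cell), the baseline window for Jan 1 can be captured with ordinary `datetime` objects:

```python
from datetime import datetime, timezone

# Assumed variable names; adjust to match the notebook's own cells.
baseline_start = datetime(2022, 1, 1, tzinfo=timezone.utc)
baseline_end = datetime(2022, 1, 2, tzinfo=timezone.utc)    # exclusive end of Jan 1
last_day = datetime(2022, 2, 1, tzinfo=timezone.utc)        # end of the available data
```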
{
@@ -180,7 +182,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Lets create an assay using that pipeline and the model in the pipeline. We also specify the baseline start and end."
"Let's create an assay using that pipeline and the model in the pipeline. We also specify the baseline start and end."
]
},
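A hedged sketch of that step, assuming the SDK exposes a `build_assay` helper that takes the assay name, pipeline, model name, and the baseline start/end; verify the exact signature against your installed Wallaroo SDK version.

```python
# Assumed API -- check the signature in your SDK version.
assay_name = "example assay"
assay_builder = client.build_assay(
    assay_name, pipeline, model_name, baseline_start, baseline_end
)
```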
{
@@ -197,7 +199,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"We don't know much about our baseline data yet so lets examine the data and create a couple of visual representations. First lets get some basic stats on the baseline data."
"We don't know much about our baseline data yet so let's examine the data and create a couple of visual representations. First let's get some basic stats on the baseline data."
]
},
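One way to get those stats, sketched here under the assumption that the assay builder supports an interactive baseline run with a `baseline_stats()` summary; the method names may differ between SDK versions.

```python
# Assumed API -- method names may vary by SDK version.
baseline_run = assay_builder.build().interactive_baseline_run()
baseline_run.baseline_stats()   # typically count, min, max, mean, median, std, start, end
```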
{
@@ -292,7 +294,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Now lets look at a histogram, kernel density estimate (KDE), and Empirical Cumulative Distribution (ecdf) charts of the baseline data. These will give us insights into the distributions of the predictions and features that the assay is configured for."
"Now let's look at a histogram, kernel density estimate (KDE), and Empirical Cumulative Distribution (ecdf) charts of the baseline data. These will give us insights into the distributions of the predictions and features that the assay is configured for."
]
},
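Purely as an illustration of what each of those three views conveys (this is not the notebook's own code, which is elided from this diff), the same plots can be drawn from a stand-in array with matplotlib and scipy:

```python
# Illustration only: synthetic stand-in for the baseline predictions.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

preds = np.random.default_rng(0).normal(loc=13.0, scale=0.5, size=1000)

fig, axes = plt.subplots(1, 3, figsize=(12, 3))

axes[0].hist(preds, bins=30)                          # histogram: counts per bin
axes[0].set_title("histogram")

grid = np.linspace(preds.min(), preds.max(), 200)
axes[1].plot(grid, stats.gaussian_kde(preds)(grid))   # KDE: smoothed density estimate
axes[1].set_title("kde")

xs = np.sort(preds)                                   # ECDF: fraction of values <= x
axes[2].plot(xs, np.arange(1, len(xs) + 1) / len(xs))
axes[2].set_title("ecdf")

plt.show()
```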
{
@@ -521,7 +523,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The previous assay used quintiles so all of the bins had the same percentage/count of samples. To get bins that are divided equaly along the range of values we can use `BinMode.EQUAL`."
"The previous assay used quintiles so all of the bins had the same percentage/count of samples. To get bins that are divided equally along the range of values we can use `BinMode.EQUAL`."
]
},
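To make the difference between the two binning modes concrete, here is a small numpy illustration on a stand-in array; the SDK computes these edges for you, so this is not its internal code.

```python
import numpy as np

preds = np.random.default_rng(1).normal(loc=13.0, scale=0.5, size=1000)  # stand-in baseline

# Quantile (quintile) bins: every bin holds the same share of baseline values.
quantile_edges = np.quantile(preds, [0.2, 0.4, 0.6, 0.8])

# Equal-width bins (BinMode.EQUAL): edges evenly spaced across the value range.
equal_edges = np.linspace(preds.min(), preds.max(), num=6)[1:-1]

print("quantile edges:   ", np.round(quantile_edges, 3))
print("equal-width edges:", np.round(equal_edges, 3))
```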
{
@@ -686,13 +688,30 @@
"metadata": {},
"source": [
"### Interactive Assay Runs\n",
"By default the assay builder creates an assay with some good starting parameters. In particular the assay is configured to run a new analysis for **every 24 hours starting at the end of the baseline period**. Additionally, it sets the **number of bins to 5** so creates quintiles, and sets the target `iopath` to `\"outputs 0 0\"` which means we want to monitor the first column of the first output/prediction.\n",
"\n",
"By default the assay builder creates an assay with some good starting parameters. In particular the assay is configured to run a new analysis for **every 24 hours starting at the end of the baseline period**. Additionally, it sets the **number of bins to 5** to create quintiles, and sets the target `iopath` to `\"outputs 0 0\"` which means we want to monitor the first column of the first output/prediction.\n",
"\n",
"We can do an interactive run of just the baseline part to see how the baseline data will be put into bins. This assay uses quintiles so all 5 bins (not counting the outlier bins) have 20% of the predictions. We can see the bin boundaries along the x-axis.\n",
"\n",
"We then run it with `interactive_run` and convert it to a dataframe for easy analysis with `to_dataframe`.\n",
"\n",
"Now lets do an interactive run of the first assay as it is configured. Interactive runs don't save the assay to the database (so they won't be scheduled in the future) nor do they save the assay results. Instead the results are returned after a short while for further analysis."
"Now let's do an interactive run of the first assay as it is configured. Interactive runs don't save the assay to the database (so they won't be scheduled in the future) nor do they save the assay results. Instead the results are returned after a short while for further analysis.\n",
"\n",
"#### Configuration Notes\n",
"\n",
"By default the distance measure used is a modified version of the *Population Stability Index*, a measure that's widely used in banking and finance, and is also known as the *Jeffreys divergence*, or the *Symmetrised Kullback-Leibler divergence*.\n",
"\n",
"There is a handy rule of thumb for determining whether the PSI score is \"large\":\n",
"\n",
"* PSI < 0.1: The distance is small; the distributions are about the same\n",
"* 0.1 <= PSI < 0.2: The distance is moderately large; the distributions are somewhat different, and there may be some data drift\n",
"* PSI >= 0.2: The distance is large; the distributions are different. A prolonged range of PSI > 0.2 can indicate the model is no longer in operating bounds and should be retrained.\n",
"\n",
"Of course, this is only a rule of thumb; different thresholds may work better for a specific application, and this exploration can help you properly tune the threshold (or other parameters, like the binning scheme or difference metric) as needed.\n",
"\n",
"The red dots in the above graph indicate distance scores larger than our threshold of 0.1. We see that the difference scores are low for a while and then jump up to indicate there is an issue. We can examine that particular window to help us decide if that threshold is set correctly or not.\n",
"\n",
"We can also retrieve the above results as a data frame, for further analysis, if desired."
]
},
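Since the alert thresholds above are stated in terms of the PSI score, a small worked example may help. This uses the standard symmetric PSI (Jeffreys divergence) formula on made-up bin proportions; the SDK's default is described as a modified version of PSI, so its scores will not match these numbers exactly.

```python
import numpy as np

def psi(baseline_props, window_props, eps=1e-6):
    """Population Stability Index between two binned distributions.

    Both inputs are bin proportions summing to 1; eps guards against log(0)
    when a bin is empty.
    """
    b = np.clip(np.asarray(baseline_props, dtype=float), eps, None)
    w = np.clip(np.asarray(window_props, dtype=float), eps, None)
    return float(np.sum((w - b) * np.log(w / b)))

baseline = [0.20, 0.20, 0.20, 0.20, 0.20]   # quintiles by construction
similar  = [0.18, 0.21, 0.20, 0.22, 0.19]   # roughly the same distribution
shifted  = [0.05, 0.10, 0.20, 0.30, 0.35]   # skewing toward the upper bins

print(psi(baseline, similar))   # ~0.005 -> well under 0.1, "about the same"
print(psi(baseline, shifted))   # ~0.4   -> above 0.2, "distributions are different"
```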
{
@@ -1335,7 +1354,7 @@
"\n",
"We can generate a quick chart of the results. This chart shows the 5 quantile bins (quintiles) derived from the baseline data plus one for left outliers and one for right outliers. We also see that the data from the window falls within the baseline quintiles but in a different proportion and is skewing higher. Whether this is an issue or not is specific to your use case.\n",
"\n",
"First lets examine a day that is only slightly different than the baseline. We see that we do see some values that fall outside of the range from the baseline values, the left and right outliers, and that the bin values are different but similar."
"First let's examine a day that is only slightly different than the baseline. We see that we do see some values that fall outside of the range from the baseline values, the left and right outliers, and that the bin values are different but similar."
]
},
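As a rough picture of what that chart conveys (the proportions below are made up for illustration, not taken from the tutorial's window), plotting the baseline and window bin densities side by side makes the upward skew easy to see:

```python
import numpy as np
import matplotlib.pyplot as plt

labels   = ["left_outlier", "q_20", "q_40", "q_60", "q_80", "q_100", "right_outlier"]
baseline = [0.00, 0.20, 0.20, 0.20, 0.20, 0.20, 0.00]   # quintiles by construction
window   = [0.01, 0.15, 0.17, 0.20, 0.22, 0.24, 0.01]   # illustrative: skewing higher

x = np.arange(len(labels))
plt.bar(x - 0.2, baseline, width=0.4, label="baseline")
plt.bar(x + 0.2, window, width=0.4, label="window")
plt.xticks(x, labels, rotation=45)
plt.ylabel("density")
plt.legend()
plt.tight_layout()
plt.show()
```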
{
@@ -1726,11 +1745,11 @@
"* calculate the score using the sum of differences, maximum difference or population stability index\n",
"* change the value aggregation for the bins to density, cumulative or edges\n",
"\n",
"Lets take a look at these in turn.\n",
"Let's take a look at these in turn.\n",
"\n",
"### Default configuration\n",
"\n",
"First lets look at the default configuration. This is a lot of information but much of it is useful to know where it is available.\n",
"First let's look at the default configuration. This is a lot of information but much of it is useful to know where it is available.\n",
"\n",
"We see that the assay is broken up into 4 sections: \n",
" \n",
@@ -2382,7 +2401,7 @@
"source": [
"## User Provided Bin Edges\n",
"\n",
"The values in this dataset run from ~11.6 to ~15.81. And lets say we had a business reason to use specific bin edges. We can specify them with the BinMode.PROVIDED and specifying a list of floats with the right hand / upper edge of each bin and optionally the lower edge of the smallest bin. If the lowest edge is not specified the threshold for left outliers is taken from the smallest value in the baseline dataset."
"The values in this dataset run from ~11.6 to ~15.81. And let's say we had a business reason to use specific bin edges. We can specify them with the BinMode.PROVIDED and specifying a list of floats with the right hand / upper edge of each bin and optionally the lower edge of the smallest bin. If the lowest edge is not specified the threshold for left outliers is taken from the smallest value in the baseline dataset."
]
},
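A hedged sketch, assuming `add_bin_mode` accepts `BinMode.PROVIDED` together with the edge list; the edge values below are illustrative, and the exact signature should be checked against your SDK version.

```python
# Assumed API -- verify against your installed SDK.
from wallaroo.assay_config import BinMode

# Optional lower edge first, then the upper edge of each of the 5 bins.
edges = [11.0, 12.0, 13.0, 14.0, 15.0, 16.0]
assay_builder.summarizer_builder.add_bin_mode(BinMode.PROVIDED, edges)

assay_results = assay_builder.build().interactive_run()
```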
{
@@ -2604,7 +2623,7 @@
"source": [
"## Number of Bins\n",
"\n",
"We could also choose to a different number of bins, lets say 10, which can be evenly spaced or based on the quantiles (deciles)."
"We could also choose to a different number of bins, let's say 10, which can be evenly spaced or based on the quantiles (deciles)."
]
},
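A hedged sketch, assuming an `add_num_bins` helper on the summarizer builder (verify against your SDK version); with the default quantile mode 10 bins become deciles, while `BinMode.EQUAL` spaces them evenly.

```python
# Assumed API -- verify against your installed SDK.
from wallaroo.assay_config import BinMode

assay_builder.summarizer_builder.add_num_bins(10)               # deciles with the default quantile mode
# assay_builder.summarizer_builder.add_bin_mode(BinMode.EQUAL)  # or 10 evenly spaced bins

assay_results = assay_builder.build().interactive_run()
```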
{
@@ -2913,7 +2932,7 @@
"source": [
"## Bin Weights\n",
"\n",
"Now lets say we only care about differences at the higher end of the range. We can use weights to specify that difference in the lower bins should not be counted in the score. \n",
"Now let's say we only care about differences at the higher end of the range. We can use weights to specify that difference in the lower bins should not be counted in the score. \n",
"\n",
"If we stick with 10 bins we can provide 10 a vector of 12 weights. One weight each for the original bins plus one at the front for the left outlier bin and one at the end for the right outlier bin.\n",
"\n",
@@ -3453,7 +3472,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"display_name": "Python 3.9.13 64-bit",
"language": "python",
"name": "python3"
},
@@ -3467,7 +3486,12 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.6"
"version": "3.9.13"
},
"vscode": {
"interpreter": {
"hash": "b0fa6594d8f4cbf19f97940f81e996739fb7646882a419484c72d19e05852a7e"
}
}
},
"nbformat": 4,