From 6c51624a57de198c802305b13b5091a2b1009263 Mon Sep 17 00:00:00 2001 From: Richard Rogers Date: Thu, 26 Sep 2024 00:23:17 +0100 Subject: [PATCH] Updates to reference profile example notebook --- ...riting_Reference_Profiles_to_WhyLabs.ipynb | 1002 +++++++++++------ 1 file changed, 642 insertions(+), 360 deletions(-) diff --git a/python/examples/integrations/writers/Writing_Reference_Profiles_to_WhyLabs.ipynb b/python/examples/integrations/writers/Writing_Reference_Profiles_to_WhyLabs.ipynb index 506179eee..c4cfaaa2d 100644 --- a/python/examples/integrations/writers/Writing_Reference_Profiles_to_WhyLabs.ipynb +++ b/python/examples/integrations/writers/Writing_Reference_Profiles_to_WhyLabs.ipynb @@ -1,366 +1,648 @@ { - "cells": [ - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - ">### 🚩 *Create a free WhyLabs account to get more value out of whylogs!*
\n", - ">*Did you know you can store, visualize, and monitor whylogs profiles with the [WhyLabs Observability Platform](https://whylabs.ai/whylogs-free-signup?utm_source=whylogs-Github&utm_medium=whylogs-example&utm_campaign=Writing_Reference_Profiles_to_WhyLabs)? Sign up for a [free WhyLabs account](https://whylabs.ai/whylogs-free-signup?utm_source=whylogs-Github&utm_medium=whylogs-example&utm_campaign=Writing_Reference_Profiles_to_WhyLabs) to leverage the power of whylogs and WhyLabs together!*" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Writing Reference Profiles to WhyLabs" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/whylabs/whylogs/blob/mainline/python/examples/integrations/writers/Writing_Reference_Profiles_to_WhyLabs.ipynb)" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "\n", - "\n", - "When monitoring your data, in many cases you'll be interested in comparing data from your production pipeline with a reference, or baseline, profile. This is helpful when inspecting for data drift, or assessing the quality of your data in general.\n", - "\n", - "In this example, we'll show how to send a profile logged with whylogs to your monitoring dashboard at WhyLabs Platform as a Reference Profile. 
When uploading a Reference Profile, you'll be able to use it for visualization and comparison purposes on your monitoring dashboard.\n", - "\n", - "> If you want to log your profiles as regular profiles (_Batch Profiles_), as opposed to _Reference Profiles_, please check the [Writing to WhyLabs](https://whylogs.readthedocs.io/en/stable/examples/integrations/writers/Writing_to_WhyLabs.html) example.\n", - "\n", - "We will:\n", - "\n", - "- Define environment variables with the appropriate Credentials and IDs\n", - "- Log data into a profile\n", - "- Use the WhyLabs Writer to send the profile as a Reference Profile to your Project at WhyLabs" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Installing whylogs\n", - "\n", - "First, let's install __whylogs__. Since we want to write to WhyLabs, we'll also install the __whylabs__ extra.\n", - "\n", - "If you don't have it installed already, uncomment the line below:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "# Note: you may need to restart the kernel to use updated packages.\n", - "%pip install whylogs==1.3.32" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## ✔️ Setting the Environment Variables" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In order to send our profile to WhyLabs, let's first set up an account. You can skip this if you already have an account and a model set up.\n", - "\n", - "We will need three pieces of information:\n", - "\n", - "- API token\n", - "- Organization ID\n", - "- Dataset ID (or model-id)\n", - "\n", - "Go to https://whylabs.ai/free and grab a free account. 
You can follow along with the examples if you wish, but if you’re interested in only following this demonstration, you can go ahead and skip the quick start instructions.\n", - "\n", - "After that, you’ll be prompted to create an API token. Once you create it, copy and store it locally. The second important information here is your org ID. Take note of it as well. After you get your API Token and Org ID, you can go to https://hub.whylabsapp.com/models to see your projects dashboard. You can create a new project and take note of it's ID (if it's a model project it will look like `model-xxxx`)." - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import whylogs as why\n", - "\n", - "why.init()" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## Fetching the Data\n", - "\n", - "For demonstration, let's use data for transactions from a small retail business:" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "Jtg2MMPnKQFl" + }, + "source": [ + ">### 🚩 *Create a free WhyLabs account to get more value out of whylogs!*
\n", + ">*Did you know you can store, visualize, and monitor whylogs profiles with the [WhyLabs Observability Platform](https://whylabs.ai/whylogs-free-signup?utm_source=whylogs-Github&utm_medium=whylogs-example&utm_campaign=Writing_Reference_Profiles_to_WhyLabs)? Sign up for a [free WhyLabs account](https://whylabs.ai/whylogs-free-signup?utm_source=whylogs-Github&utm_medium=whylogs-example&utm_campaign=Writing_Reference_Profiles_to_WhyLabs) to leverage the power of whylogs and WhyLabs together!*" + ] + }, { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
Transaction IDCustomer IDQuantityItem PriceTotal TaxTotal AmountStore TypeProduct CategoryProduct SubcategoryGenderTransaction TypeAge
0T14259136777C2744771148.915.6345164.5345TeleShopElectronicsAudio and videoFPurchase37.0
1T7313351894C267568448.120.2020212.6020Flagship storeHome and kitchenFurnishingMPurchase25.0
2T37745642681C267098110.91.144512.0445Flagship storeFootwearMensFPurchase42.0
3T13861409908C2716082135.228.3920298.7920MBRFootwearMensFPurchase43.0
4T58956348529C2724844144.360.6060637.8060TeleShopClothingMensFPurchase39.0
\n", - "
" + "cell_type": "markdown", + "metadata": { + "id": "oEdIvQ_7KQFm" + }, + "source": [ + "# Writing Reference Profiles to WhyLabs" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vEGk6S7bKQFn" + }, + "source": [ + "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/whylabs/whylogs/blob/mainline/python/examples/integrations/writers/Writing_Reference_Profiles_to_WhyLabs.ipynb)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4vVUvdAZKQFn" + }, + "source": [ + "\n", + "\n", + "When monitoring your data, in many cases you'll be interested in comparing data from your production pipeline with a reference, or baseline, profile. This is helpful when inspecting for data drift, or assessing the quality of your data in general.\n", + "\n", + "In this example, we'll show how to send a profile logged with whylogs to your monitoring dashboard at WhyLabs Platform as a Reference Profile. When uploading a Reference Profile, you'll be able to use it for visualization and comparison purposes on your monitoring dashboard.\n", + "\n", + "> If you want to log your profiles as regular profiles (_Batch Profiles_), as opposed to _Reference Profiles_, please check the [Writing to WhyLabs](https://whylogs.readthedocs.io/en/stable/examples/integrations/writers/Writing_to_WhyLabs.html) example.\n", + "\n", + "We will:\n", + "\n", + "- Define environment variables with the appropriate Credentials and IDs\n", + "- Log data into a profile\n", + "- Use the WhyLabs Writer to send the profile as a Reference Profile to your Project at WhyLabs" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "LJ_edm1_KQFo" + }, + "source": [ + "## Installing whylogs\n", + "\n", + "First, let's install __whylogs__. 
Since we want to write to WhyLabs, we'll also install the __whylabs__ extra.\n", + "\n", + "If you don't have it installed already, run the cell below:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "LS3g1ytfKQFo", + "collapsed": true + }, + "outputs": [], + "source": [ + "# Note: you may need to restart the kernel to use updated packages.\n", + "%pip install whylogs" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CIqGTOZuKQFp" + }, + "source": [ + "## ✔️ Setting the Environment Variables" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FaQLcGORKQFq" + }, + "source": [ + "In order to send your profile to WhyLabs, let's first set up an account. You can skip this if you already have an account and a model set up.\n", + "\n", + "We will need three pieces of information:\n", + "\n", + "- API token\n", + "- Organization ID\n", + "- Dataset ID (or model-id)\n", + "\n", + "Go to https://whylabs.ai/free and grab a free account. You can follow along with the examples if you wish, but if you’re interested in only following this demonstration, you can go ahead and skip the quick start instructions.\n", + "\n", + "After that, you’ll be prompted to create an API token. Once you create it, copy and store it locally. The second important piece of information is your org ID. Take note of it as well. After you get your API Token and Org ID, you can go to https://hub.whylabsapp.com/models to see your projects dashboard. You can create a new project and take note of its ID (if it's a model project it will look like `model-xxxx`).\n", + "\n", + "Let's enter the information in environment variables so whylogs can send the profile to WhyLabs." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "E5xhjvNCKQFq" + }, + "outputs": [], + "source": [ + "from getpass import getpass\n", + "import os\n", + "\n", + "print(\"Enter your WhyLabs API token\")\n", + "os.environ[\"WHYLABS_API_KEY\"] = getpass()\n", + "print(f\"Using API key ID: {os.environ['WHYLABS_API_KEY'][0:10]}\")\n", + "print(\"Enter your organization ID\")\n", + "os.environ[\"WHYLABS_DEFAULT_ORG_ID\"] = input()\n", + "print(\"Enter your WhyLabs Dataset ID\")\n", + "os.environ[\"WHYLABS_DEFAULT_DATASET_ID\"] = input()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "VdUQfnk-KQFr" + }, + "source": [ + "## Fetching the Data\n", + "\n", + "For demonstration, let's use data for transactions from a small retail business:" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "id": "PpVa5irKKQFr", + "outputId": "d513c5de-8dac-47b6-dc3a-7ca8b7653d5e", + "colab": { + "base_uri": "https://localhost:8080/", + "height": 223 + } + }, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " Transaction ID Customer ID Quantity Item Price Total Tax Total Amount \\\n", + "0 T14259136777 C274477 1 148.9 15.6345 164.5345 \n", + "1 T7313351894 C267568 4 48.1 20.2020 212.6020 \n", + "2 T37745642681 C267098 1 10.9 1.1445 12.0445 \n", + "3 T13861409908 C271608 2 135.2 28.3920 298.7920 \n", + "4 T58956348529 C272484 4 144.3 60.6060 637.8060 \n", + "\n", + " Store Type Product Category Product Subcategory Gender \\\n", + "0 TeleShop Electronics Audio and video F \n", + "1 Flagship store Home and kitchen Furnishing M \n", + "2 Flagship store Footwear Mens F \n", + "3 MBR Footwear Mens F \n", + "4 TeleShop Clothing Mens F \n", + "\n", + " Transaction Type Age \n", + "0 Purchase 37.0 \n", + "1 Purchase 25.0 \n", + "2 Purchase 42.0 \n", + "3 Purchase 43.0 \n", + "4 Purchase 39.0 " + ], + "text/html": [ + "\n", + "
\n", + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
Transaction IDCustomer IDQuantityItem PriceTotal TaxTotal AmountStore TypeProduct CategoryProduct SubcategoryGenderTransaction TypeAge
0T14259136777C2744771148.915.6345164.5345TeleShopElectronicsAudio and videoFPurchase37.0
1T7313351894C267568448.120.2020212.6020Flagship storeHome and kitchenFurnishingMPurchase25.0
2T37745642681C267098110.91.144512.0445Flagship storeFootwearMensFPurchase42.0
3T13861409908C2716082135.228.3920298.7920MBRFootwearMensFPurchase43.0
4T58956348529C2724844144.360.6060637.8060TeleShopClothingMensFPurchase39.0
\n", + "
\n", + "
\n", + "\n", + "
\n", + " \n", + "\n", + " \n", + "\n", + " \n", + "
\n", + "\n", + "\n", + "
\n", + " \n", + "\n", + "\n", + "\n", + " \n", + "
\n", + "\n", + "
\n", + "
\n" + ], + "application/vnd.google.colaboratory.intrinsic+json": { + "type": "dataframe", + "variable_name": "df", + "summary": "{\n \"name\": \"df\",\n \"rows\": 945,\n \"fields\": [\n {\n \"column\": \"Transaction ID\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 940,\n \"samples\": [\n \"T68386297559\",\n \"T61822316053\",\n \"T80406993092\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Customer ID\",\n \"properties\": {\n \"dtype\": \"string\",\n \"num_unique_values\": 875,\n \"samples\": [\n \"C273603\",\n \"C271858\",\n \"C274579\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Quantity\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 2,\n \"min\": -5,\n \"max\": 5,\n \"num_unique_values\": 10,\n \"samples\": [\n -5,\n 4,\n 3\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Item Price\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 41.921715774197565,\n \"min\": 7.1,\n \"max\": 150.0,\n \"num_unique_values\": 702,\n \"samples\": [\n 123.6,\n 131.9,\n 68.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Total Tax\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 19.314519356551735,\n \"min\": 0.7455,\n \"max\": 78.6975,\n \"num_unique_values\": 830,\n \"samples\": [\n 17.514,\n 26.313,\n 2.247\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Total Amount\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 261.2151743548891,\n \"min\": -811.0699999999999,\n \"max\": 828.1975,\n \"num_unique_values\": 849,\n \"samples\": [\n 77.23949999999999,\n 601.6725,\n 68.068\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Store Type\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 4,\n \"samples\": [\n \"Flagship store\",\n \"e-Shop\",\n \"TeleShop\"\n ],\n 
\"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Product Category\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 6,\n \"samples\": [\n \"Electronics\",\n \"Home and kitchen\",\n \"Bags\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Product Subcategory\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 18,\n \"samples\": [\n \"Audio and video\",\n \"Furnishing\",\n \"Tools\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Gender\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 2,\n \"samples\": [\n \"M\",\n \"F\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Transaction Type\",\n \"properties\": {\n \"dtype\": \"category\",\n \"num_unique_values\": 2,\n \"samples\": [\n \"Cancellation\",\n \"Purchase\"\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n },\n {\n \"column\": \"Age\",\n \"properties\": {\n \"dtype\": \"number\",\n \"std\": 6.747795801891116,\n \"min\": 20.0,\n \"max\": 44.0,\n \"num_unique_values\": 25,\n \"samples\": [\n 36.0,\n 23.0\n ],\n \"semantic_type\": \"\",\n \"description\": \"\"\n }\n }\n ]\n}" + } + }, + "metadata": {}, + "execution_count": 3 + } + ], + "source": [ + "import pandas as pd\n", + "\n", + "csv_url = \"https://whylabs-public.s3.us-west-2.amazonaws.com/datasets/tour/current.csv\"\n", + "df = pd.read_csv(csv_url)\n", + "\n", + "df.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0XKyKxZkKQFs" + }, + "source": [ + "## 📊 Profiling the Data\n", + "\n", + "Let's profile the data with whylogs:" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "id": "rLs4Sdx6KQFs" + }, + "outputs": [], + "source": [ + "import whylogs as why\n", + "from datetime import datetime, timezone\n", + "current_date = datetime.now(timezone.utc)\n", + "profile = why.log(df, 
dataset_timestamp=current_date)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "96XLnORdKQFs" + }, + "source": [ + "We're setting the profile's dataset timestamp as the current datetime. If this is not set, the Writer would simply assign the current datetime automatically to the profile." + ] + }, + { + "cell_type": "markdown", + "source": [ + "## Uploading Reference Profile to WhyLabs\n", + "\n", + "Now we'll use the `WhyLabsWriter` to send the profile to WhyLabs. By default, the profile will be sent to the model specified by the environment variables we set earlier. Setting the `reference_profile_name` tells WhyLabs it's a reference profile; if the name is not specified, it will be sent as a batch profile." ], - "text/plain": [ - " Transaction ID Customer ID Quantity Item Price Total Tax Total Amount \\\n", - "0 T14259136777 C274477 1 148.9 15.6345 164.5345 \n", - "1 T7313351894 C267568 4 48.1 20.2020 212.6020 \n", - "2 T37745642681 C267098 1 10.9 1.1445 12.0445 \n", - "3 T13861409908 C271608 2 135.2 28.3920 298.7920 \n", - "4 T58956348529 C272484 4 144.3 60.6060 637.8060 \n", - "\n", - " Store Type Product Category Product Subcategory Gender \\\n", - "0 TeleShop Electronics Audio and video F \n", - "1 Flagship store Home and kitchen Furnishing M \n", - "2 Flagship store Footwear Mens F \n", - "3 MBR Footwear Mens F \n", - "4 TeleShop Clothing Mens F \n", - "\n", - " Transaction Type Age \n", - "0 Purchase 37.0 \n", - "1 Purchase 25.0 \n", - "2 Purchase 42.0 \n", - "3 Purchase 43.0 \n", - "4 Purchase 39.0 " + "metadata": { + "id": "eu3Z1N_tP3hw" + } + }, + { + "cell_type": "code", + "source": [ + "from whylogs.api.writer.whylabs import WhyLabsWriter\n", + "\n", + "writer = WhyLabsWriter()\n", + "writer.option(reference_profile_name=\"tour\")\n", + "writer.write(profile)" + ], + "metadata": { + "id": "AxJ8PWusZ1aK" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "sEF-NfDCKQFs" + }, + 
"source": [ + "In this case, we named the reference profile `tour`, which means we can find it on the profile page under that name. We could also name it `\"\"` (empty string), which tells WhyLabs to generate a name for it based on the timestamp, like `ref-2022-08-16T17:53:49.041`." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rfWgYesWKQFt" + }, + "source": [ + "## 🔍 A Look on the Other Side\n", + "\n", + "Now, check your dashboard to verify everything went OK. At the __Profile__ tab, you should see something like this:" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "If512YGLKQFt" + }, + "source": [ + "![alt text](https://github.com/whylabs/whylogs/blob/mainline/python/examples/integrations/writers/images/reference_profile.png?raw=1)\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "c6LKRopoKQFt" + }, + "source": [ + "In the image above, we're comparing both reference profiles sent previously. Usually, we'd be interested in comparing a reference profile with a batch profile obtained in the production pipeline, which is, of course, also possible." 
] - }, - "execution_count": 3, - "metadata": {}, - "output_type": "execute_result" } - ], - "source": [ - "import pandas as pd\n", - "\n", - "csv_url = \"https://whylabs-public.s3.us-west-2.amazonaws.com/datasets/tour/current.csv\"\n", - "df = pd.read_csv(csv_url)\n", - "\n", - "df.head()" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 📊 Profiling the Data\n", - "\n", - "Let's profile the data with whylogs:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import whylogs as why\n", - "from datetime import datetime, timezone\n", - "current_date = datetime.now(timezone.utc)\n", - "profile = why.log(df, dataset_timestamp=current_date, name=\"tour\")" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "We're setting the profile's dataset timestamp as the current datetime. If this is not set, the Writer would simply assign the current datetime automatically to the profile." - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In this case, We named the refrence profile `tour`, which means we can find it in the profile page under that name. We could also name it `\"\"` (empty string) which tells WhyLabs to generate a name for it based on the timestamp, like `ref-2022-08-16T17:53:49.041`" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## 🔍 A Look on the Other Side\n", - "\n", - "Now, check your dashboard to verify everything went ok. At the __Profile__ tab, you should see something like this:" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "![alt text](images/reference_profile.png)\n" - ] - }, - { - "attachments": {}, - "cell_type": "markdown", - "metadata": {}, - "source": [ - "In the image above, we're comparing both reference profiles sent previously. 
Usually, we'd be interested in comparing a reference profile with a batch profile obtained in the production pipeline, which is, of course, also possible." - ] - } - ], - "metadata": { - "kernelspec": { - "display_name": "Python 3.9.13 ('.venv': poetry)", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.9.16" + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.9.13 ('.venv': poetry)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.16" + }, + "orig_nbformat": 4, + "vscode": { + "interpreter": { + "hash": "0f484380554f045e8316d9ef136659363ef199c84a7347221e49b73e46486d36" + } + }, + "colab": { + "provenance": [] + } }, - "orig_nbformat": 4, - "vscode": { - "interpreter": { - "hash": "0f484380554f045e8316d9ef136659363ef199c84a7347221e49b73e46486d36" - } - } - }, - "nbformat": 4, - "nbformat_minor": 2 -} + "nbformat": 4, + "nbformat_minor": 0 +} \ No newline at end of file
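
Note on the patch: the updated notebook reads its credentials from three environment variables (`WHYLABS_API_KEY`, `WHYLABS_DEFAULT_ORG_ID`, `WHYLABS_DEFAULT_DATASET_ID`) before `WhyLabsWriter` is used. A minimal sketch of a pre-flight check that could run before the upload cell is shown below. The helper name `missing_whylabs_settings` is hypothetical and not part of whylogs; only the variable names come from the notebook, and the check uses the standard library alone.

```python
import os

# Variable names taken from the notebook cell added in this patch.
REQUIRED_VARS = (
    "WHYLABS_API_KEY",
    "WHYLABS_DEFAULT_ORG_ID",
    "WHYLABS_DEFAULT_DATASET_ID",
)


def missing_whylabs_settings(env=None):
    """Return the required variable names that are unset or blank.

    Hypothetical helper for illustration; `env` defaults to os.environ
    and can be any mapping, which makes the check easy to test.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_whylabs_settings()
    if missing:
        print("Missing settings:", ", ".join(missing))
    else:
        print("All WhyLabs settings are present.")
```

Running this before `writer.write(profile)` would surface a blank `input()` or `getpass()` response early, rather than at upload time.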