ganesh_kota_Customer_Churn_Prediction_Phase2.html

<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" lang="" xml:lang="">
<head>
  <meta charset="utf-8" />
  <meta name="generator" content="pandoc" />
  <meta name="viewport" content="width=device-width, initial-scale=1.0, user-scalable=yes" />
  <title>57dedfc391534c5dbfab25d7b8d7231a</title>
  <style>
    html {
      line-height: 1.5;
      font-family: Georgia, serif;
      font-size: 20px;
      color: #1a1a1a;
      background-color: #fdfdfd;
    }
    body {
      margin: 0 auto;
      max-width: 36em;
      padding-left: 50px;
      padding-right: 50px;
      padding-top: 50px;
      padding-bottom: 50px;
      hyphens: auto;
      overflow-wrap: break-word;
      text-rendering: optimizeLegibility;
      font-kerning: normal;
    }
    @media (max-width: 600px) {
      body {
        font-size: 0.9em;
        padding: 1em;
      }
      h1 {
        font-size: 1.8em;
      }
    }
    @media print {
      body {
        background-color: transparent;
        color: black;
        font-size: 12pt;
      }
      p, h2, h3 {
        orphans: 3;
        widows: 3;
      }
      h2, h3, h4 {
        page-break-after: avoid;
      }
    }
    p {
      margin: 1em 0;
    }
    a {
      color: #1a1a1a;
    }
    a:visited {
      color: #1a1a1a;
    }
    img {
      max-width: 100%;
    }
    h1, h2, h3, h4, h5, h6 {
      margin-top: 1.4em;
    }
    h5, h6 {
      font-size: 1em;
      font-style: italic;
    }
    h6 {
      font-weight: normal;
    }
    ol, ul {
      padding-left: 1.7em;
      margin-top: 1em;
    }
    li > ol, li > ul {
      margin-top: 0;
    }
    blockquote {
      margin: 1em 0 1em 1.7em;
      padding-left: 1em;
      border-left: 2px solid #e6e6e6;
      color: #606060;
    }
    code {
      font-family: Menlo, Monaco, 'Lucida Console', Consolas, monospace;
      font-size: 85%;
      margin: 0;
    }
    pre {
      margin: 1em 0;
      overflow: auto;
    }
    pre code {
      padding: 0;
      overflow: visible;
      overflow-wrap: normal;
    }
    .sourceCode {
     background-color: transparent;
     overflow: visible;
    }
    hr {
      background-color: #1a1a1a;
      border: none;
      height: 1px;
      margin: 1em 0;
    }
    table {
      margin: 1em 0;
      border-collapse: collapse;
      width: 100%;
      overflow-x: auto;
      display: block;
      font-variant-numeric: lining-nums tabular-nums;
    }
    table caption {
      margin-bottom: 0.75em;
    }
    tbody {
      margin-top: 0.5em;
      border-top: 1px solid #1a1a1a;
      border-bottom: 1px solid #1a1a1a;
    }
    th {
      border-top: 1px solid #1a1a1a;
      padding: 0.25em 0.5em 0.25em 0.5em;
    }
    td {
      padding: 0.125em 0.5em 0.25em 0.5em;
    }
    header {
      margin-bottom: 4em;
      text-align: center;
    }
    #TOC li {
      list-style: none;
    }
    #TOC ul {
      padding-left: 1.3em;
    }
    #TOC > ul {
      padding-left: 0;
    }
    #TOC a:not(:hover) {
      text-decoration: none;
    }
    code{white-space: pre-wrap;}
    span.smallcaps{font-variant: small-caps;}
    div.columns{display: flex; gap: min(4vw, 1.5em);}
    div.column{flex: auto; overflow-x: auto;}
    div.hanging-indent{margin-left: 1.5em; text-indent: -1.5em;}
    ul.task-list{list-style: none;}
    ul.task-list li input[type="checkbox"] {
      width: 0.8em;
      margin: 0 0.8em 0.2em -1.6em;
      vertical-align: middle;
    }
    pre > code.sourceCode { white-space: pre; position: relative; }
    pre > code.sourceCode > span { display: inline-block; line-height: 1.25; }
    pre > code.sourceCode > span:empty { height: 1.2em; }
    .sourceCode { overflow: visible; }
    code.sourceCode > span { color: inherit; text-decoration: inherit; }
    div.sourceCode { margin: 1em 0; }
    pre.sourceCode { margin: 0; }
    @media screen {
    div.sourceCode { overflow: auto; }
    }
    @media print {
    pre > code.sourceCode { white-space: pre-wrap; }
    pre > code.sourceCode > span { text-indent: -5em; padding-left: 5em; }
    }
    pre.numberSource code
      { counter-reset: source-line 0; }
    pre.numberSource code > span
      { position: relative; left: -4em; counter-increment: source-line; }
    pre.numberSource code > span > a:first-child::before
      { content: counter(source-line);
        position: relative; left: -1em; text-align: right; vertical-align: baseline;
        border: none; display: inline-block;
        -webkit-touch-callout: none; -webkit-user-select: none;
        -khtml-user-select: none; -moz-user-select: none;
        -ms-user-select: none; user-select: none;
        padding: 0 4px; width: 4em;
        color: #aaaaaa;
      }
    pre.numberSource { margin-left: 3em; border-left: 1px solid #aaaaaa;  padding-left: 4px; }
    div.sourceCode
      {   }
    @media screen {
    pre > code.sourceCode > span > a:first-child::before { text-decoration: underline; }
    }
    code span.al { color: #ff0000; font-weight: bold; } /* Alert */
    code span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
    code span.at { color: #7d9029; } /* Attribute */
    code span.bn { color: #40a070; } /* BaseN */
    code span.bu { color: #008000; } /* BuiltIn */
    code span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
    code span.ch { color: #4070a0; } /* Char */
    code span.cn { color: #880000; } /* Constant */
    code span.co { color: #60a0b0; font-style: italic; } /* Comment */
    code span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
    code span.do { color: #ba2121; font-style: italic; } /* Documentation */
    code span.dt { color: #902000; } /* DataType */
    code span.dv { color: #40a070; } /* DecVal */
    code span.er { color: #ff0000; font-weight: bold; } /* Error */
    code span.ex { } /* Extension */
    code span.fl { color: #40a070; } /* Float */
    code span.fu { color: #06287e; } /* Function */
    code span.im { color: #008000; font-weight: bold; } /* Import */
    code span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
    code span.kw { color: #007020; font-weight: bold; } /* Keyword */
    code span.op { color: #666666; } /* Operator */
    code span.ot { color: #007020; } /* Other */
    code span.pp { color: #bc7a00; } /* Preprocessor */
    code span.sc { color: #4070a0; } /* SpecialChar */
    code span.ss { color: #bb6688; } /* SpecialString */
    code span.st { color: #4070a0; } /* String */
    code span.va { color: #19177c; } /* Variable */
    code span.vs { color: #4070a0; } /* VerbatimString */
    code span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
    .display.math{display: block; text-align: center; margin: 0.5rem auto;}
  </style>
  <!--[if lt IE 9]>
    <script src="//cdnjs.cloudflare.com/ajax/libs/html5shiv/3.7.3/html5shiv-printshiv.min.js"></script>
  <![endif]-->
</head>
<body>
<section
id="customer-churn-prediction-a-comparative-analysis-of-models-with-and-without-sentiment-analysis-phase-2"
class="cell markdown" id="lI-TXOPmWBDS">
<h1><strong>Customer Churn Prediction: A Comparative Analysis of Models
with and without Sentiment Analysis (PHASE 2).</strong></h1>
<blockquote>
<p><strong>Overview of the Study:</strong></p>
</blockquote>
<ol>
<li><p>The core purpose of this study is to find the impact of Sentiment
Analysis in predicting customer churn for the e-commerce industry by
employing different predictive models.</p></li>
<li><p>Furthermore, the study is also focused on observing which model
is best in a more accurate prediction for determining the churn rate of
customers.</p></li>
</ol>
<blockquote>
<p><strong>Process involved in the Study:</strong></p>
</blockquote>
<ol>
<li><p>The whole project is divided into two phases. In the first phase,
all the relevant variables that are expected to be causing the customer
churn are selected and then the predictive models are developed. In
this, there will be no feedback from the customer is utilized.</p></li>
<li><p>In the second phase, in addition to all the relevant variables
obtained from EDA, the feedback provided by the customers is also
included in this phase to extract the sentiment scores which are now
added to the data frame. Again, the churn predictive models are
developed with this data.</p></li>
<li><p>Finally, the metrics from both these phases will be reviewed and
interpreted to understand if the inclusion of the sentiment analysis
will be helpful for the organization in better understanding why their
customers are parting away without making any future transactions with
the organization.</p></li>
</ol>
</section>
<section
id="research-question-how-does-sentiment-analysis-impact-in-predicting-the-customer-churn-of-an-organization"
class="cell markdown" id="F24qzdzteBm-">
<h2><strong>Research Question:</strong> "How does sentiment analysis
impact in predicting the customer churn of an organization?"</h2>
</section>
<section id="introduction" class="cell markdown" id="UoHa8cvPcbRl">
<h1><strong>Introduction:</strong></h1>
<ol>
<li><p>In the previous deliverables, both the study phases are
implemented. Nonetheless, before proceeding to the comparison of model
metrics from each phase and among both phases (will be done in next step
of study), a few other additional steps have been implemented in this
file of study.</p></li>
<li><p>In addition to the four models implemented, a new model called
<strong>Naive Bayes</strong> is also implemented in this study in this
file.</p></li>
<li><p>Furthermore, the following additional enhancements are also
performed:</p>
<ul>
<li><p>Cross-Validation is changed to <strong>10 folds</strong> instead
of the previous <strong>5-fold</strong> cross-validation.</p></li>
<li><p><strong>Time required</strong> to train the models and also
<strong>memory consumed</strong> by the models are also
addressed.</p></li>
<li><p>Additional hyperparameters related to kernel are added to the
<strong>SVM</strong> tuning process.</p></li>
<li><p><strong>Sensitivity, Specificity, ROC_AUC</strong> scores are
also calculated for models.</p></li>
<li><p>Plotted the <strong>AUC-ROC Curve</strong> for all the
models.</p></li>
<li><p>Finally, <strong>pickle files</strong> have been created for all
the models.</p></li>
</ul></li>
<li><p>In the next step (later in process), the interpretation and
comparison of results will be done.</p></li>
</ol>
</section>
<section
id="this-file-works-on-the-second-phase-of-the-prediction-study-where-the-churn-prediction-is-done-with-the-inclusion-of-the-customer-feedback-the-below-steps-will-dive-into-the-whole-scenario"
class="cell markdown" id="yEe2RtuExDy2">
<h1><strong>This file works on the "second phase" of the prediction
study where the Churn Prediction is done with the inclusion of the
"customer feedback". The below steps will dive into the whole
scenario.</strong></h1>
</section>
<div class="cell code" id="bIV4USnKlcDT">
<div class="sourceCode" id="cb1"><pre
class="sourceCode python"><code class="sourceCode python"></code></pre></div>
</div>
<section id="beginning-of-the-sentiment-analysis" class="cell markdown"
id="2ZTrHng42gyo">
<h1><strong>BEGINNING OF THE SENTIMENT ANALYSIS.</strong></h1>
</section>
<section
id="logging-in-to-the-hugging-face-account-to-connect-for-my-api"
class="cell markdown" id="GpIcFMapMPuY">
<h1><strong>Logging in to the Hugging Face account to connect for my
API</strong></h1>
<ul>
<li><strong>Need to provide the API Key of HuggingFace
Account.</strong></li>
</ul>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;,&quot;height&quot;:145,&quot;referenced_widgets&quot;:[&quot;4f56ce71a80d4fb59c599d2f30e5b730&quot;,&quot;94dda3f9ffa74718a6e9af91f3c3aa13&quot;,&quot;6e46956d53f14d09b84ce21444e690cf&quot;,&quot;e5f1f794539840419cd09ac368da71ad&quot;,&quot;fa522f5b3253413f80bf5c0ba8268ed4&quot;,&quot;96c7893bd6a548bc8a72073446c2e104&quot;,&quot;81f1e6a7503a4e0382d412c2b63c7b1f&quot;,&quot;9222c776350e4575bea0c7911a3cf452&quot;,&quot;d8bb0fb7e98040dead2b70322b4cdf0d&quot;,&quot;6af18a2c231f488d89b1fb69ff29e853&quot;,&quot;4be6e73fc6584f2490cb1476e5466672&quot;,&quot;66b1be10073743e6ae78fb974abd326c&quot;,&quot;12924014266d4eceb5eb7b5404f0a2f4&quot;,&quot;138189ddf1b3496bab0768bf7b1c2ac4&quot;,&quot;1700c9725fe3402c888958a7fdb42be8&quot;,&quot;e5701c1e24714fdd805d0caf1effeeea&quot;,&quot;cd93fe7356d141a3a1e2ba5cc5fe181f&quot;,&quot;bbf9a2d6cfde47da80513dba136c2cae&quot;,&quot;48c979fb1e95404cb170cb7d76dece79&quot;,&quot;4b053f2410ed4760b5958d4745b1dd38&quot;,&quot;f20e00dd871c49b6842a32741867b046&quot;,&quot;3360f104c9794a999136d8816d6106a8&quot;,&quot;365ff354431f47d5aaf01f1e8b56d766&quot;,&quot;50ee11fe952b442283f2752fe0abb523&quot;,&quot;b1a3b806cf0540089a058295fdab33af&quot;,&quot;d0f9bb9b9d364eeda8f301bdf9500a4a&quot;,&quot;8dc544bde1f04c5eb542c4ec8563b483&quot;,&quot;102e7969f1d44666b0d2a9f74ff20c8a&quot;,&quot;e3b4049091584eb59383cd833841cab1&quot;,&quot;0fcc2ea39dfc49ba9596ed1989b2c2a8&quot;,&quot;cebe9295f97f4329a19b5a904022d098&quot;,&quot;3773b68024d9408695eeda265303f71d&quot;]}"
id="DJLyyTvnZZtN" data-outputId="62d183e2-b5d4-4acd-9ca1-d328a051fc77">
<div class="sourceCode" id="cb2"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> huggingface_hub <span class="im">import</span> notebook_login</span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a>notebook_login()</span></code></pre></div>
<div class="output display_data">
<div class="sourceCode" id="cb3"><pre
class="sourceCode json"><code class="sourceCode json"><span id="cb3-1"><a href="#cb3-1" aria-hidden="true" tabindex="-1"></a><span class="fu">{</span><span class="dt">&quot;model_id&quot;</span><span class="fu">:</span><span class="st">&quot;4f56ce71a80d4fb59c599d2f30e5b730&quot;</span><span class="fu">,</span><span class="dt">&quot;version_major&quot;</span><span class="fu">:</span><span class="dv">2</span><span class="fu">,</span><span class="dt">&quot;version_minor&quot;</span><span class="fu">:</span><span class="dv">0</span><span class="fu">}</span></span></code></pre></div>
</div>
</div>
<section id="installing-required-packages" class="cell markdown"
id="arSCmFyDNIDY">
<h1><strong>Installing required packages</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="vo9nlKdScYL6" data-outputId="828aec9e-bac1-48a1-ccb7-fd113b8fc045">
<div class="sourceCode" id="cb4"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a><span class="op">!</span>pip install transformers datasets evaluate accelerate</span></code></pre></div>
<div class="output stream stdout">
<pre><code>Requirement already satisfied: transformers in /usr/local/lib/python3.10/dist-packages (4.38.2)
Collecting datasets
  Downloading datasets-2.18.0-py3-none-any.whl (510 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 510.5/510.5 kB 6.8 MB/s eta 0:00:00
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 84.1/84.1 kB 9.4 MB/s eta 0:00:00
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 297.3/297.3 kB 11.5 MB/s eta 0:00:00
ent already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from transformers) (3.13.3)
Requirement already satisfied: huggingface-hub&lt;1.0,&gt;=0.19.3 in /usr/local/lib/python3.10/dist-packages (from transformers) (0.20.3)
Requirement already satisfied: numpy&gt;=1.17 in /usr/local/lib/python3.10/dist-packages (from transformers) (1.25.2)
Requirement already satisfied: packaging&gt;=20.0 in /usr/local/lib/python3.10/dist-packages (from transformers) (24.0)
Requirement already satisfied: pyyaml&gt;=5.1 in /usr/local/lib/python3.10/dist-packages (from transformers) (6.0.1)
Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/dist-packages (from transformers) (2023.12.25)
Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from transformers) (2.31.0)
Requirement already satisfied: tokenizers&lt;0.19,&gt;=0.14 in /usr/local/lib/python3.10/dist-packages (from transformers) (0.15.2)
Requirement already satisfied: safetensors&gt;=0.4.1 in /usr/local/lib/python3.10/dist-packages (from transformers) (0.4.2)
Requirement already satisfied: tqdm&gt;=4.27 in /usr/local/lib/python3.10/dist-packages (from transformers) (4.66.2)
Requirement already satisfied: pyarrow&gt;=12.0.0 in /usr/local/lib/python3.10/dist-packages (from datasets) (14.0.2)
Requirement already satisfied: pyarrow-hotfix in /usr/local/lib/python3.10/dist-packages (from datasets) (0.6)
Collecting dill&lt;0.3.9,&gt;=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl (116 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 116.3/116.3 kB 14.1 MB/s eta 0:00:00
ent already satisfied: pandas in /usr/local/lib/python3.10/dist-packages (from datasets) (2.0.3)
Collecting xxhash (from datasets)
  Downloading xxhash-3.4.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (194 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 194.1/194.1 kB 19.8 MB/s eta 0:00:00
ultiprocess (from datasets)
  Downloading multiprocess-0.70.16-py310-none-any.whl (134 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 134.8/134.8 kB 15.4 MB/s eta 0:00:00
ent already satisfied: fsspec[http]&lt;=2024.2.0,&gt;=2023.1.0 in /usr/local/lib/python3.10/dist-packages (from datasets) (2023.6.0)
Requirement already satisfied: aiohttp in /usr/local/lib/python3.10/dist-packages (from datasets) (3.9.3)
Collecting responses&lt;0.19 (from evaluate)
  Downloading responses-0.18.0-py3-none-any.whl (38 kB)
Requirement already satisfied: psutil in /usr/local/lib/python3.10/dist-packages (from accelerate) (5.9.5)
Requirement already satisfied: torch&gt;=1.10.0 in /usr/local/lib/python3.10/dist-packages (from accelerate) (2.2.1+cu121)
Requirement already satisfied: aiosignal&gt;=1.1.2 in /usr/local/lib/python3.10/dist-packages (from aiohttp-&gt;datasets) (1.3.1)
Requirement already satisfied: attrs&gt;=17.3.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp-&gt;datasets) (23.2.0)
Requirement already satisfied: frozenlist&gt;=1.1.1 in /usr/local/lib/python3.10/dist-packages (from aiohttp-&gt;datasets) (1.4.1)
Requirement already satisfied: multidict&lt;7.0,&gt;=4.5 in /usr/local/lib/python3.10/dist-packages (from aiohttp-&gt;datasets) (6.0.5)
Requirement already satisfied: yarl&lt;2.0,&gt;=1.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp-&gt;datasets) (1.9.4)
Requirement already satisfied: async-timeout&lt;5.0,&gt;=4.0 in /usr/local/lib/python3.10/dist-packages (from aiohttp-&gt;datasets) (4.0.3)
Requirement already satisfied: typing-extensions&gt;=3.7.4.3 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub&lt;1.0,&gt;=0.19.3-&gt;transformers) (4.10.0)
Requirement already satisfied: charset-normalizer&lt;4,&gt;=2 in /usr/local/lib/python3.10/dist-packages (from requests-&gt;transformers) (3.3.2)
Requirement already satisfied: idna&lt;4,&gt;=2.5 in /usr/local/lib/python3.10/dist-packages (from requests-&gt;transformers) (3.6)
Requirement already satisfied: urllib3&lt;3,&gt;=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests-&gt;transformers) (2.0.7)
Requirement already satisfied: certifi&gt;=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests-&gt;transformers) (2024.2.2)
Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch&gt;=1.10.0-&gt;accelerate) (1.12)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch&gt;=1.10.0-&gt;accelerate) (3.2.1)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch&gt;=1.10.0-&gt;accelerate) (3.1.3)
Collecting nvidia-cuda-nvrtc-cu12==12.1.105 (from torch&gt;=1.10.0-&gt;accelerate)
  Downloading nvidia_cuda_nvrtc_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (23.7 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 23.7/23.7 MB 29.1 MB/s eta 0:00:00
e-cu12==12.1.105 (from torch&gt;=1.10.0-&gt;accelerate)
  Downloading nvidia_cuda_runtime_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (823 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 823.6/823.6 kB 34.8 MB/s eta 0:00:00
 torch&gt;=1.10.0-&gt;accelerate)
  Downloading nvidia_cuda_cupti_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (14.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 14.1/14.1 MB 42.9 MB/s eta 0:00:00
 torch&gt;=1.10.0-&gt;accelerate)
  Downloading nvidia_cudnn_cu12-8.9.2.26-py3-none-manylinux1_x86_64.whl (731.7 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 731.7/731.7 MB 1.3 MB/s eta 0:00:00
 torch&gt;=1.10.0-&gt;accelerate)
  Downloading nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl (410.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 410.6/410.6 MB 3.7 MB/s eta 0:00:00
 torch&gt;=1.10.0-&gt;accelerate)
  Downloading nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl (121.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 121.6/121.6 MB 8.7 MB/s eta 0:00:00
 torch&gt;=1.10.0-&gt;accelerate)
  Downloading nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl (56.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.5/56.5 MB 14.6 MB/s eta 0:00:00
 torch&gt;=1.10.0-&gt;accelerate)
  Downloading nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl (124.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 124.2/124.2 MB 6.5 MB/s eta 0:00:00
 torch&gt;=1.10.0-&gt;accelerate)
  Downloading nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl (196.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 196.0/196.0 MB 6.8 MB/s eta 0:00:00
 torch&gt;=1.10.0-&gt;accelerate)
  Downloading nvidia_nccl_cu12-2.19.3-py3-none-manylinux1_x86_64.whl (166.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 166.0/166.0 MB 7.9 MB/s eta 0:00:00
 torch&gt;=1.10.0-&gt;accelerate)
  Downloading nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (99 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 99.1/99.1 kB 14.9 MB/s eta 0:00:00
ent already satisfied: triton==2.2.0 in /usr/local/lib/python3.10/dist-packages (from torch&gt;=1.10.0-&gt;accelerate) (2.2.0)
Collecting nvidia-nvjitlink-cu12 (from nvidia-cusolver-cu12==11.4.5.107-&gt;torch&gt;=1.10.0-&gt;accelerate)
  Downloading nvidia_nvjitlink_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl (21.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 21.1/21.1 MB 72.0 MB/s eta 0:00:00
ent already satisfied: python-dateutil&gt;=2.8.2 in /usr/local/lib/python3.10/dist-packages (from pandas-&gt;datasets) (2.8.2)
Requirement already satisfied: pytz&gt;=2020.1 in /usr/local/lib/python3.10/dist-packages (from pandas-&gt;datasets) (2023.4)
Requirement already satisfied: tzdata&gt;=2022.1 in /usr/local/lib/python3.10/dist-packages (from pandas-&gt;datasets) (2024.1)
Requirement already satisfied: six&gt;=1.5 in /usr/local/lib/python3.10/dist-packages (from python-dateutil&gt;=2.8.2-&gt;pandas-&gt;datasets) (1.16.0)
Requirement already satisfied: MarkupSafe&gt;=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2-&gt;torch&gt;=1.10.0-&gt;accelerate) (2.1.5)
Requirement already satisfied: mpmath&gt;=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy-&gt;torch&gt;=1.10.0-&gt;accelerate) (1.3.0)
Installing collected packages: xxhash, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, dill, responses, nvidia-cusparse-cu12, nvidia-cudnn-cu12, multiprocess, nvidia-cusolver-cu12, datasets, evaluate, accelerate
Successfully installed accelerate-0.29.1 datasets-2.18.0 dill-0.3.8 evaluate-0.4.1 multiprocess-0.70.16 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-8.9.2.26 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.19.3 nvidia-nvjitlink-cu12-12.4.127 nvidia-nvtx-cu12-12.1.105 responses-0.18.0 xxhash-3.4.1
</code></pre>
</div>
</div>
<div class="cell code" id="bWBmnW1kIby7">
<div class="sourceCode" id="cb6"><pre
class="sourceCode python"><code class="sourceCode python"></code></pre></div>
</div>
<div class="cell markdown" id="n_4vaaqCH7Y8">
<p><strong>In order to build the sentiment analysis model, initially
this was trained on the large corpus of textual data. This helps in
training the model on robust textual inputs so that the trained model
can perform well on any of the future sentiment extraction
tasks.</strong></p>
<ul>
<li>To pre-train the model, the data from IMDB has been used. The IMDB
dataset contains large amound of data on the movie reviews. Also, the
reviews are lengthy enough. Subsequently, the model will now be
pre-trained with IMDB dataset.</li>
</ul>
</div>
<section id="importing-the-imdb-dataset-from-the-datasets-package"
class="cell markdown" id="4HMMcobXNReV">
<h1><strong>Importing the IMDB dataset from the datasets
package</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;,&quot;height&quot;:365,&quot;referenced_widgets&quot;:[&quot;53ebe2e0f6914f5b8f948eb4caa74150&quot;,&quot;d4abdbd16a5f43a2bc22176e17723fbc&quot;,&quot;b4ce1c31c6fa42ef9710401fbf491d5a&quot;,&quot;462f789451ad48f68669e1db62e71277&quot;,&quot;806543975c204b27a49e2d45eaee9f94&quot;,&quot;d0bd9fec4d1848aabfdf1e12faebecb4&quot;,&quot;0474b6363b5e465ca9e42a9830db01f6&quot;,&quot;f42bf794457f460fb4ac8b29b3b34e98&quot;,&quot;b5ad04f046544238982a7345a97a092a&quot;,&quot;c4950f7c91694338b203b5fc65fc94f7&quot;,&quot;4956f16eb17a4309b5dfdddf02d2dcde&quot;,&quot;e1836f150ea04a9ba0f38d292c22cb61&quot;,&quot;2538b487919449af9f8679d9dd0bd309&quot;,&quot;ddcb16dfc76344e1b77ce2c3fb6b182e&quot;,&quot;819f58bbe75d473199103d185c589bf0&quot;,&quot;d1e602edd9ef4854a017ca89c9fa027c&quot;,&quot;dd705030a62449a189b44eda1927d46f&quot;,&quot;f019841ec14441dab8e1e26ee8648a99&quot;,&quot;e65c8e2a47c3454cbfcb4e1b0bb65714&quot;,&quot;dcf33694c1064fd9b623dcc16e52ebb9&quot;,&quot;2d7ff45b2e0145b999101fdf196b221f&quot;,&quot;a2753a39a9854307bb76c7b44e90e787&quot;,&quot;535a09ee4a924202a202a51055ce1469&quot;,&quot;dc2cc065091d4783a59d9f5efaace794&quot;,&quot;85ca4436d58f4954a29659cec07e4820&quot;,&quot;2331366fff9d4a17ac02dcd76a445335&quot;,&quot;be2d8024b0804d87af5008b463c885d3&quot;,&quot;ffaceebfb35144e58d32a4bd10690f39&quot;,&quot;7165648d10fc4c08ac0cf619698e0d67&quot;,&quot;1bd189b2ba8c4669a5d470ccbc04178f&quot;,&quot;acb0bafa0ba04bf5af40b7e2a2da0047&quot;,&quot;03d4d8e60e7c4b70b1fb0394ca28bd28&quot;,&quot;9ed4c3ac897243dbbde70fe6e303ac37&quot;,&quot;9062851d1cbe403096076dff1f32cdfa&quot;,&quot;efb4ff6ecbf244628326daaa5ef93344&quot;,&quot;a773f2079b8140989535b4cf129bd7af&quot;,&quot;7db2c706f52c45ab9a6d7441e5d8a619&quot;,&quot;4bd9ee93b1184861a9740188e27ffcf8&quot;,&quot;97089f1a710d40d096eb7a7fdc18b273&quot;,&quot;94017d0946544a56a06ac640ba5819d5&quot;,&quot;25520f15f6db44ecb7f040b4b8982c79&quot;,&quot;0602382ab64c4e7d805c8e57e1d2c954&quot;,&quot;6fae8475aea34381a36a3847f565f9d5&quot;,&quot;e1c1c196100748a2906033a3b47410a0&quot;,&quot;d9eaa548cb754c11ad5f4476cc77fdca&quot;,&quot;6df08e00402244328e4027f391393d8b&quot;,&quot;12344c7c4dbb4a7d9c721681d5167da1&quot;,&quot;56efe7e201d94dc498c6b6a1967e195b&quot;,&quot;08f7df0215854b8289897382dce8e61b&quot;,&quot;76613b547ee7440eaaf6a395028c49a3&quot;,&quot;2aefaf9544424d33a4aa00b842f101ad&quot;,&quot;366aaa57507f454c95303bbf3f5c1bd9&quot;,&quot;2cde5214ca204455967b0138dd2874cb&quot;,&quot;371f695e2f5c41bdb7dee29e23218e77&quot;,&quot;562468d2c3e34aaea5e222186ca7ec93&quot;,&quot;87c3f0cc448843ee993abf28b742fb36&quot;,&quot;e94c15847c6a4c7fbd372bc7b1297b9b&quot;,&quot;2b423ede469344afbf8e8ab289905420&quot;,&quot;c0189fcf2a1f4e0f8b40e4dd94c2d904&quot;,&quot;9ca1db906ddd49249add673fa3a86842&quot;,&quot;b7e6496be44545a9a1fd1ad73cbff440&quot;,&quot;3c2beabf315a4885a61d82449be68a58&quot;,&quot;c62ef662b34541a6bfabc872bc9374ce&quot;,&quot;39c84df545a047239ce4976dba86e85a&quot;,&quot;666d0a84368d4b5ca43e6f35d468d34f&quot;,&quot;8002d1209823476cb878365b309a5e05&quot;,&quot;4ef1ce5eb12441c0b3ee46fc8132f756&quot;,&quot;3b63f6d105e749f8b96ea07a6fdebe24&quot;,&quot;9ac6287455254ad0aa2a58823de43dfe&quot;,&quot;8e203f272e674044afe0137687ff2616&quot;,&quot;749b5eec7ce448a78a35b336cca83326&quot;,&quot;740fb401623a409187ff470752533ee1&quot;,&quot;78b132707d304f1eb1d2eadb0743d4b7&quot;,&quot;c15421964a58402db0632b2488b7f0ad&quot;,&quot;8548e8dc13dd4899b7e877bf577a05c1&quot;,&quot;1f504547512e4190bcbc11c41b63592e&quot;,&quot;b3369b4e41874c95b4d827d17694b8a9&quot;]}"
id="geM7P7DScVtI" data-outputId="e89eef97-3808-42f0-abc1-467656055355">
<div class="sourceCode" id="cb7"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> datasets <span class="im">import</span> load_dataset</span>
<span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb7-3"><a href="#cb7-3" aria-hidden="true" tabindex="-1"></a>imdb <span class="op">=</span> load_dataset(<span class="st">&quot;imdb&quot;</span>)</span></code></pre></div>
<div class="output stream stderr">
<pre><code>/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:88: UserWarning: 
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
  warnings.warn(
</code></pre>
</div>
<div class="output display_data">
<div class="sourceCode" id="cb9"><pre
class="sourceCode json"><code class="sourceCode json"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="fu">{</span><span class="dt">&quot;model_id&quot;</span><span class="fu">:</span><span class="st">&quot;53ebe2e0f6914f5b8f948eb4caa74150&quot;</span><span class="fu">,</span><span class="dt">&quot;version_major&quot;</span><span class="fu">:</span><span class="dv">2</span><span class="fu">,</span><span class="dt">&quot;version_minor&quot;</span><span class="fu">:</span><span class="dv">0</span><span class="fu">}</span></span></code></pre></div>
</div>
<div class="output display_data">
<div class="sourceCode" id="cb10"><pre
class="sourceCode json"><code class="sourceCode json"><span id="cb10-1"><a href="#cb10-1" aria-hidden="true" tabindex="-1"></a><span class="fu">{</span><span class="dt">&quot;model_id&quot;</span><span class="fu">:</span><span class="st">&quot;e1836f150ea04a9ba0f38d292c22cb61&quot;</span><span class="fu">,</span><span class="dt">&quot;version_major&quot;</span><span class="fu">:</span><span class="dv">2</span><span class="fu">,</span><span class="dt">&quot;version_minor&quot;</span><span class="fu">:</span><span class="dv">0</span><span class="fu">}</span></span></code></pre></div>
</div>
<div class="output display_data">
<div class="sourceCode" id="cb11"><pre
class="sourceCode json"><code class="sourceCode json"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="fu">{</span><span class="dt">&quot;model_id&quot;</span><span class="fu">:</span><span class="st">&quot;535a09ee4a924202a202a51055ce1469&quot;</span><span class="fu">,</span><span class="dt">&quot;version_major&quot;</span><span class="fu">:</span><span class="dv">2</span><span class="fu">,</span><span class="dt">&quot;version_minor&quot;</span><span class="fu">:</span><span class="dv">0</span><span class="fu">}</span></span></code></pre></div>
</div>
<div class="output display_data">
<div class="sourceCode" id="cb12"><pre
class="sourceCode json"><code class="sourceCode json"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a><span class="fu">{</span><span class="dt">&quot;model_id&quot;</span><span class="fu">:</span><span class="st">&quot;9062851d1cbe403096076dff1f32cdfa&quot;</span><span class="fu">,</span><span class="dt">&quot;version_major&quot;</span><span class="fu">:</span><span class="dv">2</span><span class="fu">,</span><span class="dt">&quot;version_minor&quot;</span><span class="fu">:</span><span class="dv">0</span><span class="fu">}</span></span></code></pre></div>
</div>
<div class="output display_data">
<div class="sourceCode" id="cb13"><pre
class="sourceCode json"><code class="sourceCode json"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a><span class="fu">{</span><span class="dt">&quot;model_id&quot;</span><span class="fu">:</span><span class="st">&quot;d9eaa548cb754c11ad5f4476cc77fdca&quot;</span><span class="fu">,</span><span class="dt">&quot;version_major&quot;</span><span class="fu">:</span><span class="dv">2</span><span class="fu">,</span><span class="dt">&quot;version_minor&quot;</span><span class="fu">:</span><span class="dv">0</span><span class="fu">}</span></span></code></pre></div>
</div>
<div class="output display_data">
<div class="sourceCode" id="cb14"><pre
class="sourceCode json"><code class="sourceCode json"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a><span class="fu">{</span><span class="dt">&quot;model_id&quot;</span><span class="fu">:</span><span class="st">&quot;87c3f0cc448843ee993abf28b742fb36&quot;</span><span class="fu">,</span><span class="dt">&quot;version_major&quot;</span><span class="fu">:</span><span class="dv">2</span><span class="fu">,</span><span class="dt">&quot;version_minor&quot;</span><span class="fu">:</span><span class="dv">0</span><span class="fu">}</span></span></code></pre></div>
</div>
<div class="output display_data">
<div class="sourceCode" id="cb15"><pre
class="sourceCode json"><code class="sourceCode json"><span id="cb15-1"><a href="#cb15-1" aria-hidden="true" tabindex="-1"></a><span class="fu">{</span><span class="dt">&quot;model_id&quot;</span><span class="fu">:</span><span class="st">&quot;4ef1ce5eb12441c0b3ee46fc8132f756&quot;</span><span class="fu">,</span><span class="dt">&quot;version_major&quot;</span><span class="fu">:</span><span class="dv">2</span><span class="fu">,</span><span class="dt">&quot;version_minor&quot;</span><span class="fu">:</span><span class="dv">0</span><span class="fu">}</span></span></code></pre></div>
</div>
</div>
<div class="cell code" id="TKRlI-XJcdY-">
<div class="sourceCode" id="cb16"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a><span class="co"># from datasets import load_dataset</span></span>
<span id="cb16-2"><a href="#cb16-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> torch.utils.data <span class="im">import</span> DataLoader</span></code></pre></div>
</div>
<section id="process-of-tokenization" class="cell markdown"
id="S-lEwLS5OtE8">
<h1><strong>Process of Tokenization</strong></h1>
</section>
<section
id="importing-the-autotokenizer-from-transformers-to-perform-tokenization"
class="cell markdown" id="wCemJHGbNZlD">
<h1><strong>Importing the AutoTokenizer from Transformers to Perform
Tokenization</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;,&quot;height&quot;:145,&quot;referenced_widgets&quot;:[&quot;aee795bcd87b41a8a98a543aab4c549e&quot;,&quot;75022eb9a8d74f638c4a2deb5dd09de4&quot;,&quot;ba49ef0aedf541c4b02c3f64d59c480c&quot;,&quot;65dc131f5f6949b9b318178353013fa6&quot;,&quot;178331ceeac94e5f97e7ec6bf5b62582&quot;,&quot;4c40cc2cd5a647f5b08e682fcf97a891&quot;,&quot;789a070a3c4949d19b5a8bdf636b10ec&quot;,&quot;6edece18e7454968854474cdd092e210&quot;,&quot;650e86739e2148f393f50eb0c2dd84ab&quot;,&quot;f7fb7f05059e47578c8038d93e3e03f7&quot;,&quot;254fae362e63499e8f4c99c6f708116b&quot;,&quot;7217129048c745a999bd554ce7602c6c&quot;,&quot;180c10211b2c49e7b4e36aeb7919799d&quot;,&quot;e65e3ad0e4594d84874d1fe796916a53&quot;,&quot;223d386be1984c0da210289bbfe6dca1&quot;,&quot;7f2f3f93888849768f1076706527407f&quot;,&quot;ea3a82adbf5c445cabee59522b0bb1a3&quot;,&quot;e1c37ffbaa41403db425371a8766048a&quot;,&quot;4d5b0b2fae684946a505b761a60c9635&quot;,&quot;f4c4b00008304f0797e70a8349421e68&quot;,&quot;479fff8828be4340aa86974bc996ffcc&quot;,&quot;3fec3487acf3479a8c095f8571dadb02&quot;,&quot;876b66bd8a0a412f92dd92530307c551&quot;,&quot;dff3a4fc9768431d8556593935851cf7&quot;,&quot;7f33cd02eba744afaf7d845068a4552e&quot;,&quot;83169973762f43079c9f0d6f83813b5c&quot;,&quot;6579d534837a41319ecbdea953b194a3&quot;,&quot;b67f259246064061879dc56c4f464705&quot;,&quot;51106708d4b448539a5a423c2469f824&quot;,&quot;81004573772b486bb00b89132cbe226f&quot;,&quot;9f70aa25242e4f1fbcdcb4f7c7b3978b&quot;,&quot;ecfccf86258045da8242ae9fa152048f&quot;,&quot;9e19b71e4c9f4c818cf3cf85b72402bf&quot;,&quot;32b9ff4f943a45118f3abdf26db82d0f&quot;,&quot;d7e4fb7131e54cccb819cd516597fdab&quot;,&quot;c48464d82d71465ca28057a0c2795568&quot;,&quot;0a663de4e3ab4bd5ba40a18e5c60a677&quot;,&quot;af1da15603f5484caaede8b13217b22e&quot;,&quot;521239ec3b2a42d6accd9b1ee76b1282&quot;,&quot;20003fd692e6456b84b2de51cce8ee41&quot;,&quot;ba1d3ca23b614a5e99fb809eeefe702a&quot;,&quot;fdd9342e53e8442db1c3d594a307b114&quot;,&quot;6f3b63d55fa341398f1ab00bc648e369&quot;,&quot;1c7ff51ed319430984a34108187a5e23&quot;]}"
id="3P0VO9-Idl-e" data-outputId="f97b9ba1-9671-4d3a-9aa1-ef6ff84d22f0">
<div class="sourceCode" id="cb17"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb17-1"><a href="#cb17-1" aria-hidden="true" tabindex="-1"></a><span class="co">## Initializing the Tokenizer</span></span>
<span id="cb17-2"><a href="#cb17-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb17-3"><a href="#cb17-3" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> transformers <span class="im">import</span> AutoTokenizer</span>
<span id="cb17-4"><a href="#cb17-4" aria-hidden="true" tabindex="-1"></a>tokenizer <span class="op">=</span> AutoTokenizer.from_pretrained(<span class="st">&quot;distilbert-base-uncased&quot;</span>)</span>
<span id="cb17-5"><a href="#cb17-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb17-6"><a href="#cb17-6" aria-hidden="true" tabindex="-1"></a>max_length <span class="op">=</span> <span class="dv">512</span> <span class="co"># Set max length</span></span>
<span id="cb17-7"><a href="#cb17-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb17-8"><a href="#cb17-8" aria-hidden="true" tabindex="-1"></a><span class="kw">def</span> preprocess_function(examples):</span>
<span id="cb17-9"><a href="#cb17-9" aria-hidden="true" tabindex="-1"></a>    <span class="cf">return</span> tokenizer(examples[<span class="st">&quot;text&quot;</span>], truncation<span class="op">=</span><span class="va">True</span>, max_length<span class="op">=</span>max_length)</span></code></pre></div>
<div class="output display_data">
<div class="sourceCode" id="cb18"><pre
class="sourceCode json"><code class="sourceCode json"><span id="cb18-1"><a href="#cb18-1" aria-hidden="true" tabindex="-1"></a><span class="fu">{</span><span class="dt">&quot;model_id&quot;</span><span class="fu">:</span><span class="st">&quot;aee795bcd87b41a8a98a543aab4c549e&quot;</span><span class="fu">,</span><span class="dt">&quot;version_major&quot;</span><span class="fu">:</span><span class="dv">2</span><span class="fu">,</span><span class="dt">&quot;version_minor&quot;</span><span class="fu">:</span><span class="dv">0</span><span class="fu">}</span></span></code></pre></div>
</div>
<div class="output display_data">
<div class="sourceCode" id="cb19"><pre
class="sourceCode json"><code class="sourceCode json"><span id="cb19-1"><a href="#cb19-1" aria-hidden="true" tabindex="-1"></a><span class="fu">{</span><span class="dt">&quot;model_id&quot;</span><span class="fu">:</span><span class="st">&quot;7217129048c745a999bd554ce7602c6c&quot;</span><span class="fu">,</span><span class="dt">&quot;version_major&quot;</span><span class="fu">:</span><span class="dv">2</span><span class="fu">,</span><span class="dt">&quot;version_minor&quot;</span><span class="fu">:</span><span class="dv">0</span><span class="fu">}</span></span></code></pre></div>
</div>
<div class="output display_data">
<div class="sourceCode" id="cb20"><pre
class="sourceCode json"><code class="sourceCode json"><span id="cb20-1"><a href="#cb20-1" aria-hidden="true" tabindex="-1"></a><span class="fu">{</span><span class="dt">&quot;model_id&quot;</span><span class="fu">:</span><span class="st">&quot;876b66bd8a0a412f92dd92530307c551&quot;</span><span class="fu">,</span><span class="dt">&quot;version_major&quot;</span><span class="fu">:</span><span class="dv">2</span><span class="fu">,</span><span class="dt">&quot;version_minor&quot;</span><span class="fu">:</span><span class="dv">0</span><span class="fu">}</span></span></code></pre></div>
</div>
<div class="output display_data">
<div class="sourceCode" id="cb21"><pre
class="sourceCode json"><code class="sourceCode json"><span id="cb21-1"><a href="#cb21-1" aria-hidden="true" tabindex="-1"></a><span class="fu">{</span><span class="dt">&quot;model_id&quot;</span><span class="fu">:</span><span class="st">&quot;32b9ff4f943a45118f3abdf26db82d0f&quot;</span><span class="fu">,</span><span class="dt">&quot;version_major&quot;</span><span class="fu">:</span><span class="dv">2</span><span class="fu">,</span><span class="dt">&quot;version_minor&quot;</span><span class="fu">:</span><span class="dv">0</span><span class="fu">}</span></span></code></pre></div>
</div>
</div>
<section id="tokenizing-the-data" class="cell markdown"
id="vvn50kkgUitS">
<h1><strong>Tokenizing the Data</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;,&quot;height&quot;:49,&quot;referenced_widgets&quot;:[&quot;f9611561996745c2b4f2057f17d7bca6&quot;,&quot;1221dc3338b04116a5c00c4c0ea67559&quot;,&quot;c74c1e9546b74d708fb121ce7ab596fe&quot;,&quot;41e0145f5a3145dfa2548eb56939985d&quot;,&quot;2dc794393a7f4f5b8c387aa760068c6b&quot;,&quot;208b866b004e4d5c927dff8a600fd719&quot;,&quot;5e7f94599d1647dfaf1429c52921d711&quot;,&quot;635be5960da64f8c8f3b95b07b0a71cf&quot;,&quot;7a7384df262f40b9aa15d2e02d8ece90&quot;,&quot;919420ae933c4ee38b19101f98fc0c2f&quot;,&quot;7da32491e1a148dba578c7d92293c9db&quot;]}"
id="lZJ_Ph5ilTm5" data-outputId="374b0559-8f6a-40c6-b480-5ba8243ecff3">
<div class="sourceCode" id="cb22"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb22-1"><a href="#cb22-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> transformers <span class="im">import</span> DataCollatorWithPadding</span>
<span id="cb22-2"><a href="#cb22-2" aria-hidden="true" tabindex="-1"></a><span class="co"># Tokenize datasets</span></span>
<span id="cb22-3"><a href="#cb22-3" aria-hidden="true" tabindex="-1"></a>tokenized_ds <span class="op">=</span> imdb.<span class="bu">map</span>(preprocess_function, batched<span class="op">=</span><span class="va">True</span>)</span>
<span id="cb22-4"><a href="#cb22-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb22-5"><a href="#cb22-5" aria-hidden="true" tabindex="-1"></a><span class="co"># Collate function to handle padding</span></span>
<span id="cb22-6"><a href="#cb22-6" aria-hidden="true" tabindex="-1"></a>data_collator <span class="op">=</span> DataCollatorWithPadding(tokenizer, return_tensors<span class="op">=</span><span class="st">&#39;pt&#39;</span>)</span></code></pre></div>
<div class="output display_data">
<div class="sourceCode" id="cb23"><pre
class="sourceCode json"><code class="sourceCode json"><span id="cb23-1"><a href="#cb23-1" aria-hidden="true" tabindex="-1"></a><span class="fu">{</span><span class="dt">&quot;model_id&quot;</span><span class="fu">:</span><span class="st">&quot;f9611561996745c2b4f2057f17d7bca6&quot;</span><span class="fu">,</span><span class="dt">&quot;version_major&quot;</span><span class="fu">:</span><span class="dv">2</span><span class="fu">,</span><span class="dt">&quot;version_minor&quot;</span><span class="fu">:</span><span class="dv">0</span><span class="fu">}</span></span></code></pre></div>
</div>
</div>
<section id="creating-the-data-loaders" class="cell markdown"
id="fxklFMU8VPbz">
<h1><strong>Creating the Data Loaders</strong></h1>
</section>
<div class="cell code" id="VRxAzdPeeIV5">
<div class="sourceCode" id="cb24"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb24-1"><a href="#cb24-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Create data loaders</span></span>
<span id="cb24-2"><a href="#cb24-2" aria-hidden="true" tabindex="-1"></a>train_dataloader <span class="op">=</span> DataLoader(tokenized_ds[<span class="st">&#39;train&#39;</span>], collate_fn<span class="op">=</span>data_collator)</span>
<span id="cb24-3"><a href="#cb24-3" aria-hidden="true" tabindex="-1"></a>val_dataloader <span class="op">=</span> DataLoader(tokenized_ds[<span class="st">&#39;test&#39;</span>], collate_fn<span class="op">=</span>data_collator)</span></code></pre></div>
</div>
<div class="cell code" id="dsmcuqkqgUhs">
<div class="sourceCode" id="cb25"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb25-1"><a href="#cb25-1" aria-hidden="true" tabindex="-1"></a><span class="co"># ## Evaluating the accuracy</span></span>
<span id="cb25-2"><a href="#cb25-2" aria-hidden="true" tabindex="-1"></a><span class="co"># import evaluate</span></span>
<span id="cb25-3"><a href="#cb25-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb25-4"><a href="#cb25-4" aria-hidden="true" tabindex="-1"></a><span class="co"># accuracy = evaluate.load(&quot;accuracy&quot;)</span></span></code></pre></div>
</div>
<div class="cell code" id="0SGHsMMdgW8P">
<div class="sourceCode" id="cb26"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb26-1"><a href="#cb26-1" aria-hidden="true" tabindex="-1"></a><span class="co"># ## Importing the Evaluation Metrics</span></span>
<span id="cb26-2"><a href="#cb26-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb26-3"><a href="#cb26-3" aria-hidden="true" tabindex="-1"></a><span class="co"># import numpy as np</span></span>
<span id="cb26-4"><a href="#cb26-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb26-5"><a href="#cb26-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb26-6"><a href="#cb26-6" aria-hidden="true" tabindex="-1"></a><span class="co"># def compute_metrics(eval_pred):</span></span>
<span id="cb26-7"><a href="#cb26-7" aria-hidden="true" tabindex="-1"></a><span class="co">#     predictions, labels = eval_pred</span></span>
<span id="cb26-8"><a href="#cb26-8" aria-hidden="true" tabindex="-1"></a><span class="co">#     predictions = np.argmax(predictions, axis=1)</span></span>
<span id="cb26-9"><a href="#cb26-9" aria-hidden="true" tabindex="-1"></a><span class="co">#     return accuracy.compute(predictions=predictions, references=labels)</span></span></code></pre></div>
</div>
<section
id="providing-the-numerical-values-to-the-sentiment-labels-initially"
class="cell markdown" id="3Hh67dGsJUhi">
<h1><strong>Providing the Numerical Values to the Sentiment Labels
initially.</strong></h1>
</section>
<div class="cell code" id="7t5e__s5nJ47">
<div class="sourceCode" id="cb27"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb27-1"><a href="#cb27-1" aria-hidden="true" tabindex="-1"></a><span class="co">## Providing labels to the Data</span></span>
<span id="cb27-2"><a href="#cb27-2" aria-hidden="true" tabindex="-1"></a>id2label <span class="op">=</span> {<span class="dv">0</span>: <span class="st">&quot;NEGATIVE&quot;</span>, <span class="dv">1</span>: <span class="st">&quot;POSITIVE&quot;</span>}</span>
<span id="cb27-3"><a href="#cb27-3" aria-hidden="true" tabindex="-1"></a>label2id <span class="op">=</span> {<span class="st">&quot;NEGATIVE&quot;</span>: <span class="dv">0</span>, <span class="st">&quot;POSITIVE&quot;</span>: <span class="dv">1</span>}</span></code></pre></div>
</div>
<section id="loading-the-distilbert-base-uncased-model"
class="cell markdown" id="EvWmKT2MVsyW">
<h1><strong>Loading the DistilBERT-Base-Uncased Model</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="YUguf4aQnKW_" data-outputId="9bf8ef65-135b-4c08-aa13-4145402439a0">
<div class="sourceCode" id="cb28"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb28-1"><a href="#cb28-1" aria-hidden="true" tabindex="-1"></a>  <span class="im">from</span> transformers <span class="im">import</span> AutoModelForSequenceClassification, TrainingArguments, Trainer</span>
<span id="cb28-2"><a href="#cb28-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb28-3"><a href="#cb28-3" aria-hidden="true" tabindex="-1"></a>  model <span class="op">=</span> AutoModelForSequenceClassification.from_pretrained(</span>
<span id="cb28-4"><a href="#cb28-4" aria-hidden="true" tabindex="-1"></a>      <span class="st">&quot;distilbert/distilbert-base-uncased&quot;</span>, num_labels<span class="op">=</span><span class="dv">2</span>, id2label<span class="op">=</span>id2label, label2id<span class="op">=</span>label2id</span>
<span id="cb28-5"><a href="#cb28-5" aria-hidden="true" tabindex="-1"></a>  )</span></code></pre></div>
<div class="output stream stderr">
<pre><code>Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert/distilbert-base-uncased and are newly initialized: [&#39;classifier.bias&#39;, &#39;classifier.weight&#39;, &#39;pre_classifier.bias&#39;, &#39;pre_classifier.weight&#39;]
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
</code></pre>
</div>
</div>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="zHmlhg8uCP8K" data-outputId="0e9b4ea2-666f-418f-d27b-ad6d39fb0ea5">
<div class="sourceCode" id="cb30"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb30-1"><a href="#cb30-1" aria-hidden="true" tabindex="-1"></a><span class="op">!</span>pip install accelerate <span class="op">-</span>U</span></code></pre></div>
<div class="output stream stdout">
<pre><code>Requirement already satisfied: accelerate in /usr/local/lib/python3.10/dist-packages (0.28.0)
Requirement already satisfied: numpy&gt;=1.17 in /usr/local/lib/python3.10/dist-packages (from accelerate) (1.25.2)
Requirement already satisfied: packaging&gt;=20.0 in /usr/local/lib/python3.10/dist-packages (from accelerate) (24.0)
Requirement already satisfied: psutil in /usr/local/lib/python3.10/dist-packages (from accelerate) (5.9.5)
Requirement already satisfied: pyyaml in /usr/local/lib/python3.10/dist-packages (from accelerate) (6.0.1)
Requirement already satisfied: torch&gt;=1.10.0 in /usr/local/lib/python3.10/dist-packages (from accelerate) (2.2.1+cu121)
Requirement already satisfied: huggingface-hub in /usr/local/lib/python3.10/dist-packages (from accelerate) (0.20.3)
Requirement already satisfied: safetensors&gt;=0.3.1 in /usr/local/lib/python3.10/dist-packages (from accelerate) (0.4.2)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from torch&gt;=1.10.0-&gt;accelerate) (3.13.3)
Requirement already satisfied: typing-extensions&gt;=4.8.0 in /usr/local/lib/python3.10/dist-packages (from torch&gt;=1.10.0-&gt;accelerate) (4.10.0)
Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch&gt;=1.10.0-&gt;accelerate) (1.12)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch&gt;=1.10.0-&gt;accelerate) (3.2.1)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch&gt;=1.10.0-&gt;accelerate) (3.1.3)
Requirement already satisfied: fsspec in /usr/local/lib/python3.10/dist-packages (from torch&gt;=1.10.0-&gt;accelerate) (2023.6.0)
Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch&gt;=1.10.0-&gt;accelerate) (12.1.105)
Requirement already satisfied: nvidia-cuda-runtime-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch&gt;=1.10.0-&gt;accelerate) (12.1.105)
Requirement already satisfied: nvidia-cuda-cupti-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch&gt;=1.10.0-&gt;accelerate) (12.1.105)
Requirement already satisfied: nvidia-cudnn-cu12==8.9.2.26 in /usr/local/lib/python3.10/dist-packages (from torch&gt;=1.10.0-&gt;accelerate) (8.9.2.26)
Requirement already satisfied: nvidia-cublas-cu12==12.1.3.1 in /usr/local/lib/python3.10/dist-packages (from torch&gt;=1.10.0-&gt;accelerate) (12.1.3.1)
Requirement already satisfied: nvidia-cufft-cu12==11.0.2.54 in /usr/local/lib/python3.10/dist-packages (from torch&gt;=1.10.0-&gt;accelerate) (11.0.2.54)
Requirement already satisfied: nvidia-curand-cu12==10.3.2.106 in /usr/local/lib/python3.10/dist-packages (from torch&gt;=1.10.0-&gt;accelerate) (10.3.2.106)
Requirement already satisfied: nvidia-cusolver-cu12==11.4.5.107 in /usr/local/lib/python3.10/dist-packages (from torch&gt;=1.10.0-&gt;accelerate) (11.4.5.107)
Requirement already satisfied: nvidia-cusparse-cu12==12.1.0.106 in /usr/local/lib/python3.10/dist-packages (from torch&gt;=1.10.0-&gt;accelerate) (12.1.0.106)
Requirement already satisfied: nvidia-nccl-cu12==2.19.3 in /usr/local/lib/python3.10/dist-packages (from torch&gt;=1.10.0-&gt;accelerate) (2.19.3)
Requirement already satisfied: nvidia-nvtx-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch&gt;=1.10.0-&gt;accelerate) (12.1.105)
Requirement already satisfied: triton==2.2.0 in /usr/local/lib/python3.10/dist-packages (from torch&gt;=1.10.0-&gt;accelerate) (2.2.0)
Requirement already satisfied: nvidia-nvjitlink-cu12 in /usr/local/lib/python3.10/dist-packages (from nvidia-cusolver-cu12==11.4.5.107-&gt;torch&gt;=1.10.0-&gt;accelerate) (12.4.99)
Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from huggingface-hub-&gt;accelerate) (2.31.0)
Requirement already satisfied: tqdm&gt;=4.42.1 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub-&gt;accelerate) (4.66.2)
Requirement already satisfied: MarkupSafe&gt;=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2-&gt;torch&gt;=1.10.0-&gt;accelerate) (2.1.5)
Requirement already satisfied: charset-normalizer&lt;4,&gt;=2 in /usr/local/lib/python3.10/dist-packages (from requests-&gt;huggingface-hub-&gt;accelerate) (3.3.2)
Requirement already satisfied: idna&lt;4,&gt;=2.5 in /usr/local/lib/python3.10/dist-packages (from requests-&gt;huggingface-hub-&gt;accelerate) (3.6)
Requirement already satisfied: urllib3&lt;3,&gt;=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests-&gt;huggingface-hub-&gt;accelerate) (2.0.7)
Requirement already satisfied: certifi&gt;=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests-&gt;huggingface-hub-&gt;accelerate) (2024.2.2)
Requirement already satisfied: mpmath&gt;=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy-&gt;torch&gt;=1.10.0-&gt;accelerate) (1.3.0)
</code></pre>
</div>
</div>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="uEdFDfnBCTtH" data-outputId="c7d364dd-857f-41bb-e128-92274fb3e12d">
<div class="sourceCode" id="cb32"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb32-1"><a href="#cb32-1" aria-hidden="true" tabindex="-1"></a><span class="op">!</span>pip install transformers[torch]</span></code></pre></div>
<div class="output stream stdout">
<pre><code>Requirement already satisfied: transformers[torch] in /usr/local/lib/python3.10/dist-packages (4.38.2)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from transformers[torch]) (3.13.3)
Requirement already satisfied: huggingface-hub&lt;1.0,&gt;=0.19.3 in /usr/local/lib/python3.10/dist-packages (from transformers[torch]) (0.20.3)
Requirement already satisfied: numpy&gt;=1.17 in /usr/local/lib/python3.10/dist-packages (from transformers[torch]) (1.25.2)
Requirement already satisfied: packaging&gt;=20.0 in /usr/local/lib/python3.10/dist-packages (from transformers[torch]) (24.0)
Requirement already satisfied: pyyaml&gt;=5.1 in /usr/local/lib/python3.10/dist-packages (from transformers[torch]) (6.0.1)
Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/dist-packages (from transformers[torch]) (2023.12.25)
Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from transformers[torch]) (2.31.0)
Requirement already satisfied: tokenizers&lt;0.19,&gt;=0.14 in /usr/local/lib/python3.10/dist-packages (from transformers[torch]) (0.15.2)
Requirement already satisfied: safetensors&gt;=0.4.1 in /usr/local/lib/python3.10/dist-packages (from transformers[torch]) (0.4.2)
Requirement already satisfied: tqdm&gt;=4.27 in /usr/local/lib/python3.10/dist-packages (from transformers[torch]) (4.66.2)
Requirement already satisfied: torch in /usr/local/lib/python3.10/dist-packages (from transformers[torch]) (2.2.1+cu121)
Requirement already satisfied: accelerate&gt;=0.21.0 in /usr/local/lib/python3.10/dist-packages (from transformers[torch]) (0.28.0)
Requirement already satisfied: psutil in /usr/local/lib/python3.10/dist-packages (from accelerate&gt;=0.21.0-&gt;transformers[torch]) (5.9.5)
Requirement already satisfied: fsspec&gt;=2023.5.0 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub&lt;1.0,&gt;=0.19.3-&gt;transformers[torch]) (2023.6.0)
Requirement already satisfied: typing-extensions&gt;=3.7.4.3 in /usr/local/lib/python3.10/dist-packages (from huggingface-hub&lt;1.0,&gt;=0.19.3-&gt;transformers[torch]) (4.10.0)
Requirement already satisfied: sympy in /usr/local/lib/python3.10/dist-packages (from torch-&gt;transformers[torch]) (1.12)
Requirement already satisfied: networkx in /usr/local/lib/python3.10/dist-packages (from torch-&gt;transformers[torch]) (3.2.1)
Requirement already satisfied: jinja2 in /usr/local/lib/python3.10/dist-packages (from torch-&gt;transformers[torch]) (3.1.3)
Requirement already satisfied: nvidia-cuda-nvrtc-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch-&gt;transformers[torch]) (12.1.105)
Requirement already satisfied: nvidia-cuda-runtime-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch-&gt;transformers[torch]) (12.1.105)
Requirement already satisfied: nvidia-cuda-cupti-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch-&gt;transformers[torch]) (12.1.105)
Requirement already satisfied: nvidia-cudnn-cu12==8.9.2.26 in /usr/local/lib/python3.10/dist-packages (from torch-&gt;transformers[torch]) (8.9.2.26)
Requirement already satisfied: nvidia-cublas-cu12==12.1.3.1 in /usr/local/lib/python3.10/dist-packages (from torch-&gt;transformers[torch]) (12.1.3.1)
Requirement already satisfied: nvidia-cufft-cu12==11.0.2.54 in /usr/local/lib/python3.10/dist-packages (from torch-&gt;transformers[torch]) (11.0.2.54)
Requirement already satisfied: nvidia-curand-cu12==10.3.2.106 in /usr/local/lib/python3.10/dist-packages (from torch-&gt;transformers[torch]) (10.3.2.106)
Requirement already satisfied: nvidia-cusolver-cu12==11.4.5.107 in /usr/local/lib/python3.10/dist-packages (from torch-&gt;transformers[torch]) (11.4.5.107)
Requirement already satisfied: nvidia-cusparse-cu12==12.1.0.106 in /usr/local/lib/python3.10/dist-packages (from torch-&gt;transformers[torch]) (12.1.0.106)
Requirement already satisfied: nvidia-nccl-cu12==2.19.3 in /usr/local/lib/python3.10/dist-packages (from torch-&gt;transformers[torch]) (2.19.3)
Requirement already satisfied: nvidia-nvtx-cu12==12.1.105 in /usr/local/lib/python3.10/dist-packages (from torch-&gt;transformers[torch]) (12.1.105)
Requirement already satisfied: triton==2.2.0 in /usr/local/lib/python3.10/dist-packages (from torch-&gt;transformers[torch]) (2.2.0)
Requirement already satisfied: nvidia-nvjitlink-cu12 in /usr/local/lib/python3.10/dist-packages (from nvidia-cusolver-cu12==11.4.5.107-&gt;torch-&gt;transformers[torch]) (12.4.99)
Requirement already satisfied: charset-normalizer&lt;4,&gt;=2 in /usr/local/lib/python3.10/dist-packages (from requests-&gt;transformers[torch]) (3.3.2)
Requirement already satisfied: idna&lt;4,&gt;=2.5 in /usr/local/lib/python3.10/dist-packages (from requests-&gt;transformers[torch]) (3.6)
Requirement already satisfied: urllib3&lt;3,&gt;=1.21.1 in /usr/local/lib/python3.10/dist-packages (from requests-&gt;transformers[torch]) (2.0.7)
Requirement already satisfied: certifi&gt;=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from requests-&gt;transformers[torch]) (2024.2.2)
Requirement already satisfied: MarkupSafe&gt;=2.0 in /usr/local/lib/python3.10/dist-packages (from jinja2-&gt;torch-&gt;transformers[torch]) (2.1.5)
Requirement already satisfied: mpmath&gt;=0.19 in /usr/local/lib/python3.10/dist-packages (from sympy-&gt;torch-&gt;transformers[torch]) (1.3.0)
</code></pre>
</div>
</div>
<section
id="setting-the-training-hyperparamters-and-the-trainer-function"
class="cell markdown" id="rrF7gDJNAat3">
<h1><strong>Setting the training hyperparamters and the trainer
function</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="tG9uTTubnxSp" data-outputId="a837fca7-d8cd-4e71-ea73-c9b973363515">
<div class="sourceCode" id="cb34"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb34-1"><a href="#cb34-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Set max length for trainer</span></span>
<span id="cb34-2"><a href="#cb34-2" aria-hidden="true" tabindex="-1"></a>training_args <span class="op">=</span> TrainingArguments(</span>
<span id="cb34-3"><a href="#cb34-3" aria-hidden="true" tabindex="-1"></a>    output_dir<span class="op">=</span><span class="st">&quot;my_awesome_model&quot;</span>,</span>
<span id="cb34-4"><a href="#cb34-4" aria-hidden="true" tabindex="-1"></a>    learning_rate<span class="op">=</span><span class="fl">2e-5</span>,</span>
<span id="cb34-5"><a href="#cb34-5" aria-hidden="true" tabindex="-1"></a>    per_device_train_batch_size<span class="op">=</span><span class="dv">16</span>,</span>
<span id="cb34-6"><a href="#cb34-6" aria-hidden="true" tabindex="-1"></a>    per_device_eval_batch_size<span class="op">=</span><span class="dv">16</span>,</span>
<span id="cb34-7"><a href="#cb34-7" aria-hidden="true" tabindex="-1"></a>    num_train_epochs<span class="op">=</span><span class="dv">2</span>,</span>
<span id="cb34-8"><a href="#cb34-8" aria-hidden="true" tabindex="-1"></a>    weight_decay<span class="op">=</span><span class="fl">0.01</span>,</span>
<span id="cb34-9"><a href="#cb34-9" aria-hidden="true" tabindex="-1"></a>    evaluation_strategy<span class="op">=</span><span class="st">&quot;epoch&quot;</span>,</span>
<span id="cb34-10"><a href="#cb34-10" aria-hidden="true" tabindex="-1"></a>    save_strategy<span class="op">=</span><span class="st">&quot;epoch&quot;</span>,</span>
<span id="cb34-11"><a href="#cb34-11" aria-hidden="true" tabindex="-1"></a>    load_best_model_at_end<span class="op">=</span><span class="va">True</span>,</span>
<span id="cb34-12"><a href="#cb34-12" aria-hidden="true" tabindex="-1"></a>    push_to_hub<span class="op">=</span><span class="va">True</span>,</span>
<span id="cb34-13"><a href="#cb34-13" aria-hidden="true" tabindex="-1"></a>    max_steps<span class="op">=</span>max_length</span>
<span id="cb34-14"><a href="#cb34-14" aria-hidden="true" tabindex="-1"></a>)</span>
<span id="cb34-15"><a href="#cb34-15" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb34-16"><a href="#cb34-16" aria-hidden="true" tabindex="-1"></a>trainer <span class="op">=</span> Trainer(</span>
<span id="cb34-17"><a href="#cb34-17" aria-hidden="true" tabindex="-1"></a>    model<span class="op">=</span>model,</span>
<span id="cb34-18"><a href="#cb34-18" aria-hidden="true" tabindex="-1"></a>    args<span class="op">=</span>training_args,</span>
<span id="cb34-19"><a href="#cb34-19" aria-hidden="true" tabindex="-1"></a>    train_dataset<span class="op">=</span>tokenized_ds[<span class="st">&#39;train&#39;</span>], <span class="co">#tokenized_imdb[&quot;train&quot;],</span></span>
<span id="cb34-20"><a href="#cb34-20" aria-hidden="true" tabindex="-1"></a>    eval_dataset<span class="op">=</span>tokenized_ds[<span class="st">&#39;test&#39;</span>], <span class="co">#tokenized_imdb[&quot;test&quot;],</span></span>
<span id="cb34-21"><a href="#cb34-21" aria-hidden="true" tabindex="-1"></a>    tokenizer<span class="op">=</span>tokenizer,</span>
<span id="cb34-22"><a href="#cb34-22" aria-hidden="true" tabindex="-1"></a>    data_collator<span class="op">=</span>data_collator,</span>
<span id="cb34-23"><a href="#cb34-23" aria-hidden="true" tabindex="-1"></a>    compute_metrics<span class="op">=</span>compute_metrics,</span>
<span id="cb34-24"><a href="#cb34-24" aria-hidden="true" tabindex="-1"></a>)</span></code></pre></div>
<div class="output stream stderr">
<pre><code>/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py:432: FutureWarning: Passing the following arguments to `Accelerator` is deprecated and will be removed in version 1.0 of Accelerate: dict_keys([&#39;dispatch_batches&#39;, &#39;split_batches&#39;, &#39;even_batches&#39;, &#39;use_seedable_sampler&#39;]). Please pass an `accelerate.DataLoaderConfiguration` instead: 
dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False, even_batches=True, use_seedable_sampler=True)
  warnings.warn(
</code></pre>
</div>
</div>
<section id="training-the-model" class="cell markdown"
id="_HlXFoDyAi91">
<h1><strong>Training the model</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;,&quot;height&quot;:142}"
id="8OnGXg2xouSj" data-outputId="c223b66b-30db-4424-8ef0-3d7a37405293">
<div class="sourceCode" id="cb36"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb36-1"><a href="#cb36-1" aria-hidden="true" tabindex="-1"></a>trainer.train()</span></code></pre></div>
<div class="output display_data">

    <div>
      
      <progress value='512' max='512' style='width:300px; height:20px; vertical-align: middle;'></progress>
      [512/512 13:03, Epoch 0/1]
    </div>
    <table border="1" class="dataframe">
  <thead>
 <tr style="text-align: left;">
      <th>Epoch</th>
      <th>Training Loss</th>
      <th>Validation Loss</th>
      <th>Accuracy</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>0</td>
      <td>0.308100</td>
      <td>0.226634</td>
      <td>0.912840</td>
    </tr>
  </tbody>
</table><p>
</div>
<div class="output execute_result" data-execution_count="12">
<pre><code>TrainOutput(global_step=512, training_loss=0.3065403923392296, metrics={&#39;train_runtime&#39;: 787.4252, &#39;train_samples_per_second&#39;: 10.404, &#39;train_steps_per_second&#39;: 0.65, &#39;total_flos&#39;: 1075357923470784.0, &#39;train_loss&#39;: 0.3065403923392296, &#39;epoch&#39;: 0.33})</code></pre>
</div>
</div>
<section id="pushing-the-trained-model-into-the-huggingface-account"
class="cell markdown" id="8JuLMdDJV2ll">
<h2><strong>Pushing the trained model into the huggingface
account.</strong></h2>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;,&quot;height&quot;:85,&quot;referenced_widgets&quot;:[&quot;69c132bef5d74e3aacc988c67c719d5e&quot;,&quot;30b50a9e4c7f4974a7e289868c194c64&quot;,&quot;e80009bcacfd4f79af9ec3c147a8047e&quot;,&quot;c2fefe95ff9c4dda8c95cab4e5fd8df0&quot;,&quot;8e128a3106974f04bdec391571d08ebf&quot;,&quot;e68032b004164d98a21911b3d2b9ba82&quot;,&quot;417de7c7222342f091fb04ffa81d3a32&quot;,&quot;864ea33b8667485caa2ba8b827939dc5&quot;,&quot;4b3cd790d68040b7944407e58cafa4aa&quot;,&quot;8b0d13aecf084e328a4f6ba87857f875&quot;,&quot;01b7b925a00d4c178f0c152ef6ecb91c&quot;]}"
id="qNdb7g6-udRS" data-outputId="d8015f43-8351-45a6-f814-e301e2108bc4">
<div class="sourceCode" id="cb38"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb38-1"><a href="#cb38-1" aria-hidden="true" tabindex="-1"></a>trainer.push_to_hub()</span></code></pre></div>
<div class="output display_data">
<div class="sourceCode" id="cb39"><pre
class="sourceCode json"><code class="sourceCode json"><span id="cb39-1"><a href="#cb39-1" aria-hidden="true" tabindex="-1"></a><span class="fu">{</span><span class="dt">&quot;model_id&quot;</span><span class="fu">:</span><span class="st">&quot;69c132bef5d74e3aacc988c67c719d5e&quot;</span><span class="fu">,</span><span class="dt">&quot;version_major&quot;</span><span class="fu">:</span><span class="dv">2</span><span class="fu">,</span><span class="dt">&quot;version_minor&quot;</span><span class="fu">:</span><span class="dv">0</span><span class="fu">}</span></span></code></pre></div>
</div>
<div class="output execute_result" data-execution_count="13">
<div class="sourceCode" id="cb40"><pre
class="sourceCode json"><code class="sourceCode json"><span id="cb40-1"><a href="#cb40-1" aria-hidden="true" tabindex="-1"></a><span class="fu">{</span><span class="dt">&quot;type&quot;</span><span class="fu">:</span><span class="st">&quot;string&quot;</span><span class="fu">}</span></span></code></pre></div>
</div>
</div>
<section
id="the-development-of-the-sentiment-analysis-model-is-completed-with-the-above-step"
class="cell markdown" id="Vk_fzgzX4ict">
<h1><strong>The development of the sentiment analysis model is completed
with the above step.</strong></h1>
<ul>
<li><p>A model is built which can extract the sentiments and the
respective scores.</p></li>
<li><p>This model is pushed into the <strong>huggingface</strong>
profile and can be used by anytime and anyone to extract the sentiments
and the sentiment scores.</p></li>
<li><p>The pre-trained model is named
<strong>"my_awesome_model"</strong> in the huggingface models'
space.</p></li>
<li><p>Now, this pre-trained model will be used on the <strong>customer
churn data</strong> which can directly perform the activities of
retrieving the sentiment labels and score.</p></li>
<li><p>This trained model can be used to any of the tasks that involves
the sentiment extraction studies in the future.</p></li>
</ul>
</section>
<div class="cell code" id="uvayOzlgtItm">
<div class="sourceCode" id="cb41"><pre
class="sourceCode python"><code class="sourceCode python"></code></pre></div>
</div>
<div class="cell code" id="nDlezHfjtIwS">
<div class="sourceCode" id="cb42"><pre
class="sourceCode python"><code class="sourceCode python"></code></pre></div>
</div>
<div class="cell code" id="0eqBQg5DvCQH">
<div class="sourceCode" id="cb43"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb43-1"><a href="#cb43-1" aria-hidden="true" tabindex="-1"></a><span class="co">## Giving some example text to check the working of the model</span></span>
<span id="cb43-2"><a href="#cb43-2" aria-hidden="true" tabindex="-1"></a>text <span class="op">=</span> <span class="st">&quot;This was a masterpiece. Not completely faithful to the books, but enthralling from beginning to end. Might be my favorite of the three.&quot;</span></span></code></pre></div>
</div>
<section
id="using-the-trained-model-from-the-huggingface-which-was-pushed-in-the-previous-step"
class="cell markdown" id="Ebc3uMiCW82d">
<h1><strong>Using the trained model from the huggingface which was
pushed in the previous step</strong>.</h1>
<pre><code> &quot;Pipeline&quot; function is used to import the pre-trained model.</code></pre>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;,&quot;height&quot;:333,&quot;referenced_widgets&quot;:[&quot;3d44d69621b94faa812b8afe564a4a9b&quot;,&quot;4d1666cba02543d484d3e68f8666fa0c&quot;,&quot;adcac645f2c7451798fa8a6fcc642f2e&quot;,&quot;c3848b5747bf447a9a28a1ac8e76eb83&quot;,&quot;6133c22f9f534ab8884fa57feb041a9d&quot;,&quot;f244e6363a5c4689b46f9678011fedbe&quot;,&quot;88ed14d2ff814194977eb018e54fda3b&quot;,&quot;7ee05cb140964aefac3867d3f42cbee9&quot;,&quot;34f9c260e91640ae8344681dd77d5de0&quot;,&quot;2665b3ffc59e4f7da56bea827c8c8124&quot;,&quot;5a0b9c3e986d4d84aacf1c95410be5c3&quot;,&quot;7c056a778233462883c8791f75afeda4&quot;,&quot;131267adb72a4ac2acea04c5b66c9242&quot;,&quot;26693dce805c45e3bc0351f99bf461ef&quot;,&quot;81ad4e01f3fa4cdc8d8f55d2e5e69bac&quot;,&quot;23027e68a89246d59e16e0cdfa086744&quot;,&quot;4f219f41762f408ebf120f82263d2964&quot;,&quot;e2ba7ea0bbd8469aaa8012cc12f75f45&quot;,&quot;39ae007578fe4e9d970180ca4ed2b3ac&quot;,&quot;a002f0968b354d35aee28b5fcf0650d2&quot;,&quot;6fcdfb1267f444fdac63a656f6c2b23d&quot;,&quot;9edf2acbcfcd4d75afc0348c7cbef266&quot;,&quot;08379b4c461940a495beed0ad37d097b&quot;,&quot;ce13771b33cc4425a66a076b107ea711&quot;,&quot;cb19844223364fad9c829c24946eefc5&quot;,&quot;209cdc2526054049ae7b252957cffdad&quot;,&quot;a596956c011b4cb9aa8b7af2883381ab&quot;,&quot;5f0232a279324a5f9484597f4b87068d&quot;,&quot;f56b49fee3b841d387c15adab937daee&quot;,&quot;1560559f3b09456fabfc17bc45d14a5a&quot;,&quot;df61c0f86d4b43d2b82aa03df4c031e1&quot;,&quot;cba7c7787d3b4b248db5d7d4a36a0015&quot;,&quot;a6e8ce3b8edc4e708ba8675287a1a54b&quot;,&quot;68250a8b4c074f59ae1d0584d3444f24&quot;,&quot;bc10e11527584267bc20b316f572dde9&quot;,&quot;8ad519c9a6c54c14a3591143d36a7686&quot;,&quot;bad3bac055c5432c91e6fb417ef9b881&quot;,&quot;a604049c65294feda27bf6e340c4ec96&quot;,&quot;56683e74d642456daa8300b7210e06be&quot;,&quot;b88d4a418f2443a8a4df56b941188b17&quot;,&quot;8695a8812e864bae9f7f2e0602e00047&quot;,&quot;795430768f354f7fa5c0602e3a70599e&quot;,&quot;6dabf69ae70f4b8c80b68746f58e1f69&quot;,&quot;406e0c1078374f74acd1f8a400a17f0a&quot;,&quot;f19f7f5a7294420cae6868e946924436&quot;,&quot;46655c6400264a36b328b18081591d45&quot;,&quot;a5793c6d0e654151b00df3c9d651f7a4&quot;,&quot;992c8fa1f5944e848441a052b4805ae4&quot;,&quot;09687d2599c148d696851a4876c434a8&quot;,&quot;025e8fddc1fb47ffb287d798e016f156&quot;,&quot;e96f5a012dda4f2792096fb5db89e97c&quot;,&quot;fada9658e8a7465fa1f56043b9c40071&quot;,&quot;a0c2ade4469146c6b1fe8893d5dcf99c&quot;,&quot;9df529b45ba9434fa4d3836c34d6fe44&quot;,&quot;73218fbd160444499f54ef132d44d709&quot;,&quot;27552e17004f46e093b3a66624469f0e&quot;,&quot;6fcaa05d4a6c4f4c8d442318da275ce7&quot;,&quot;b4b6049ca3ed428ea26a0642f2693e66&quot;,&quot;24ebc81d8738490aafd9fb2956c859fa&quot;,&quot;378b021b2f5d47d988911b1fd5efbb05&quot;,&quot;7c43c55f722a45bb9b1cc0e652aed136&quot;,&quot;60e252f4543c425db46b07706852cdf6&quot;,&quot;218783b3c79b490c9d24d469870356de&quot;,&quot;91990ad1ea524359a7dd97fd1fd6f854&quot;,&quot;ddc96fe599994e5b9d0a2ac1e4be1aa2&quot;,&quot;683cb868a9d7473ca4ecdf4c3b4496cf&quot;]}"
id="YWoRZVDTvF_5" data-outputId="679b1ad1-6b45-4225-c487-f0a99bebe6e2">
<div class="sourceCode" id="cb45"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb45-1"><a href="#cb45-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> transformers <span class="im">import</span> pipeline</span>
<span id="cb45-2"><a href="#cb45-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb45-3"><a href="#cb45-3" aria-hidden="true" tabindex="-1"></a>classifier <span class="op">=</span> pipeline(<span class="st">&quot;sentiment-analysis&quot;</span>, model<span class="op">=</span><span class="st">&quot;ganeshkota/my_awesome_model&quot;</span>)</span>
<span id="cb45-4"><a href="#cb45-4" aria-hidden="true" tabindex="-1"></a><span class="co"># classifier(text)</span></span></code></pre></div>
<div class="output stream stderr">
<pre><code>/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:88: UserWarning: 
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
  warnings.warn(
</code></pre>
</div>
<div class="output display_data">
<div class="sourceCode" id="cb47"><pre
class="sourceCode json"><code class="sourceCode json"><span id="cb47-1"><a href="#cb47-1" aria-hidden="true" tabindex="-1"></a><span class="fu">{</span><span class="dt">&quot;model_id&quot;</span><span class="fu">:</span><span class="st">&quot;3d44d69621b94faa812b8afe564a4a9b&quot;</span><span class="fu">,</span><span class="dt">&quot;version_major&quot;</span><span class="fu">:</span><span class="dv">2</span><span class="fu">,</span><span class="dt">&quot;version_minor&quot;</span><span class="fu">:</span><span class="dv">0</span><span class="fu">}</span></span></code></pre></div>
</div>
<div class="output display_data">
<div class="sourceCode" id="cb48"><pre
class="sourceCode json"><code class="sourceCode json"><span id="cb48-1"><a href="#cb48-1" aria-hidden="true" tabindex="-1"></a><span class="fu">{</span><span class="dt">&quot;model_id&quot;</span><span class="fu">:</span><span class="st">&quot;7c056a778233462883c8791f75afeda4&quot;</span><span class="fu">,</span><span class="dt">&quot;version_major&quot;</span><span class="fu">:</span><span class="dv">2</span><span class="fu">,</span><span class="dt">&quot;version_minor&quot;</span><span class="fu">:</span><span class="dv">0</span><span class="fu">}</span></span></code></pre></div>
</div>
<div class="output display_data">
<div class="sourceCode" id="cb49"><pre
class="sourceCode json"><code class="sourceCode json"><span id="cb49-1"><a href="#cb49-1" aria-hidden="true" tabindex="-1"></a><span class="fu">{</span><span class="dt">&quot;model_id&quot;</span><span class="fu">:</span><span class="st">&quot;08379b4c461940a495beed0ad37d097b&quot;</span><span class="fu">,</span><span class="dt">&quot;version_major&quot;</span><span class="fu">:</span><span class="dv">2</span><span class="fu">,</span><span class="dt">&quot;version_minor&quot;</span><span class="fu">:</span><span class="dv">0</span><span class="fu">}</span></span></code></pre></div>
</div>
<div class="output display_data">
<div class="sourceCode" id="cb50"><pre
class="sourceCode json"><code class="sourceCode json"><span id="cb50-1"><a href="#cb50-1" aria-hidden="true" tabindex="-1"></a><span class="fu">{</span><span class="dt">&quot;model_id&quot;</span><span class="fu">:</span><span class="st">&quot;68250a8b4c074f59ae1d0584d3444f24&quot;</span><span class="fu">,</span><span class="dt">&quot;version_major&quot;</span><span class="fu">:</span><span class="dv">2</span><span class="fu">,</span><span class="dt">&quot;version_minor&quot;</span><span class="fu">:</span><span class="dv">0</span><span class="fu">}</span></span></code></pre></div>
</div>
<div class="output display_data">
<div class="sourceCode" id="cb51"><pre
class="sourceCode json"><code class="sourceCode json"><span id="cb51-1"><a href="#cb51-1" aria-hidden="true" tabindex="-1"></a><span class="fu">{</span><span class="dt">&quot;model_id&quot;</span><span class="fu">:</span><span class="st">&quot;f19f7f5a7294420cae6868e946924436&quot;</span><span class="fu">,</span><span class="dt">&quot;version_major&quot;</span><span class="fu">:</span><span class="dv">2</span><span class="fu">,</span><span class="dt">&quot;version_minor&quot;</span><span class="fu">:</span><span class="dv">0</span><span class="fu">}</span></span></code></pre></div>
</div>
<div class="output display_data">
<div class="sourceCode" id="cb52"><pre
class="sourceCode json"><code class="sourceCode json"><span id="cb52-1"><a href="#cb52-1" aria-hidden="true" tabindex="-1"></a><span class="fu">{</span><span class="dt">&quot;model_id&quot;</span><span class="fu">:</span><span class="st">&quot;27552e17004f46e093b3a66624469f0e&quot;</span><span class="fu">,</span><span class="dt">&quot;version_major&quot;</span><span class="fu">:</span><span class="dv">2</span><span class="fu">,</span><span class="dt">&quot;version_minor&quot;</span><span class="fu">:</span><span class="dv">0</span><span class="fu">}</span></span></code></pre></div>
</div>
</div>
<section id="creating-empty-matrices-to-store-the-labels-and-scores"
class="cell markdown" id="v9LoZNucA63C">
<h1><strong>Creating empty matrices to store the labels and
scores</strong></h1>
</section>
<div class="cell code" id="GPDE69ay1p2i">
<div class="sourceCode" id="cb53"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb53-1"><a href="#cb53-1" aria-hidden="true" tabindex="-1"></a>labels <span class="op">=</span> []</span>
<span id="cb53-2"><a href="#cb53-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb53-3"><a href="#cb53-3" aria-hidden="true" tabindex="-1"></a>scores <span class="op">=</span> []</span>
<span id="cb53-4"><a href="#cb53-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb53-5"><a href="#cb53-5" aria-hidden="true" tabindex="-1"></a></span></code></pre></div>
</div>
<section
id="attaching-the-colab-notebook-to-local-system-to-import-the-dataset"
class="cell markdown" id="Q-Pb6FCySLVn">
<h1><strong>Attaching the Colab Notebook to local system to import the
dataset.</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;,&quot;height&quot;:73}"
id="6ien1M00qGCl" data-outputId="27dfaee5-5221-440a-8180-069b957ae431">
<div class="sourceCode" id="cb54"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb54-1"><a href="#cb54-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> google.colab <span class="im">import</span> files</span>
<span id="cb54-2"><a href="#cb54-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> google.colab <span class="im">import</span> drive</span>
<span id="cb54-3"><a href="#cb54-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb54-4"><a href="#cb54-4" aria-hidden="true" tabindex="-1"></a><span class="co"># drive.mount(&#39;/content/drive&#39;)</span></span>
<span id="cb54-5"><a href="#cb54-5" aria-hidden="true" tabindex="-1"></a>uploaded <span class="op">=</span> files.upload()</span></code></pre></div>
<div class="output display_data">

     <input type="file" id="files-0d15bdf4-5aca-400b-ac95-d6c0a527bacc" name="files[]" multiple disabled
        style="border:none" />
     <output id="result-0d15bdf4-5aca-400b-ac95-d6c0a527bacc">
      Upload widget is only available when the cell has been executed in the
      current browser session. Please rerun this cell to enable.
      </output>
      <script>// Copyright 2017 Google LLC
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
//      http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

/**
 * @fileoverview Helpers for google.colab Python module.
 */
(function(scope) {
function span(text, styleAttributes = {}) {
  const element = document.createElement('span');
  element.textContent = text;
  for (const key of Object.keys(styleAttributes)) {
    element.style[key] = styleAttributes[key];
  }
  return element;
}

// Max number of bytes which will be uploaded at a time.
const MAX_PAYLOAD_SIZE = 100 * 1024;

function _uploadFiles(inputId, outputId) {
  const steps = uploadFilesStep(inputId, outputId);
  const outputElement = document.getElementById(outputId);
  // Cache steps on the outputElement to make it available for the next call
  // to uploadFilesContinue from Python.
  outputElement.steps = steps;

  return _uploadFilesContinue(outputId);
}

// This is roughly an async generator (not supported in the browser yet),
// where there are multiple asynchronous steps and the Python side is going
// to poll for completion of each step.
// This uses a Promise to block the python side on completion of each step,
// then passes the result of the previous step as the input to the next step.
function _uploadFilesContinue(outputId) {
  const outputElement = document.getElementById(outputId);
  const steps = outputElement.steps;

  const next = steps.next(outputElement.lastPromiseValue);
  return Promise.resolve(next.value.promise).then((value) => {
    // Cache the last promise value to make it available to the next
    // step of the generator.
    outputElement.lastPromiseValue = value;
    return next.value.response;
  });
}

/**
 * Generator function which is called between each async step of the upload
 * process.
 * @param {string} inputId Element ID of the input file picker element.
 * @param {string} outputId Element ID of the output display.
 * @return {!Iterable<!Object>} Iterable of next steps.
 */
function* uploadFilesStep(inputId, outputId) {
  const inputElement = document.getElementById(inputId);
  inputElement.disabled = false;

  const outputElement = document.getElementById(outputId);
  outputElement.innerHTML = '';

  const pickedPromise = new Promise((resolve) => {
    inputElement.addEventListener('change', (e) => {
      resolve(e.target.files);
    });
  });

  const cancel = document.createElement('button');
  inputElement.parentElement.appendChild(cancel);
  cancel.textContent = 'Cancel upload';
  const cancelPromise = new Promise((resolve) => {
    cancel.onclick = () => {
      resolve(null);
    };
  });

  // Wait for the user to pick the files.
  const files = yield {
    promise: Promise.race([pickedPromise, cancelPromise]),
    response: {
      action: 'starting',
    }
  };

  cancel.remove();

  // Disable the input element since further picks are not allowed.
  inputElement.disabled = true;

  if (!files) {
    return {
      response: {
        action: 'complete',
      }
    };
  }

  for (const file of files) {
    const li = document.createElement('li');
    li.append(span(file.name, {fontWeight: 'bold'}));
    li.append(span(
        `(${file.type || 'n/a'}) - ${file.size} bytes, ` +
        `last modified: ${
            file.lastModifiedDate ? file.lastModifiedDate.toLocaleDateString() :
                                    'n/a'} - `));
    const percent = span('0% done');
    li.appendChild(percent);

    outputElement.appendChild(li);

    const fileDataPromise = new Promise((resolve) => {
      const reader = new FileReader();
      reader.onload = (e) => {
        resolve(e.target.result);
      };
      reader.readAsArrayBuffer(file);
    });
    // Wait for the data to be ready.
    let fileData = yield {
      promise: fileDataPromise,
      response: {
        action: 'continue',
      }
    };

    // Use a chunked sending to avoid message size limits. See b/62115660.
    let position = 0;
    do {
      const length = Math.min(fileData.byteLength - position, MAX_PAYLOAD_SIZE);
      const chunk = new Uint8Array(fileData, position, length);
      position += length;

      const base64 = btoa(String.fromCharCode.apply(null, chunk));
      yield {
        response: {
          action: 'append',
          file: file.name,
          data: base64,
        },
      };

      let percentDone = fileData.byteLength === 0 ?
          100 :
          Math.round((position / fileData.byteLength) * 100);
      percent.textContent = `${percentDone}% done`;

    } while (position < fileData.byteLength);
  }

  // All done.
  yield {
    response: {
      action: 'complete',
    }
  };
}

scope.google = scope.google || {};
scope.google.colab = scope.google.colab || {};
scope.google.colab._files = {
  _uploadFiles,
  _uploadFilesContinue,
};
})(self);
</script> 
</div>
<div class="output stream stdout">
<pre><code>Saving churn.csv to churn.csv
</code></pre>
</div>
</div>
<section id="importing-the-customer-churn-data" class="cell markdown"
id="WNa5-qXUSVzT">
<h1><strong>Importing the Customer Churn Data</strong></h1>
</section>
<div class="cell code" id="XE10_qExrcWF">
<div class="sourceCode" id="cb56"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb56-1"><a href="#cb56-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> pandas <span class="im">as</span> pd</span>
<span id="cb56-2"><a href="#cb56-2" aria-hidden="true" tabindex="-1"></a>df <span class="op">=</span> pd.read_csv(<span class="st">&quot;churn.csv&quot;</span>)</span></code></pre></div>
</div>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="sh8txm36tFa8" data-outputId="9f9d7011-6e6d-4205-e94d-fa0abf047602">
<div class="sourceCode" id="cb57"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb57-1"><a href="#cb57-1" aria-hidden="true" tabindex="-1"></a><span class="co">## Checking the rows and columns count</span></span>
<span id="cb57-2"><a href="#cb57-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb57-3"><a href="#cb57-3" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;The numer of rows are: &quot;</span>,df.shape[<span class="dv">0</span>], <span class="st">&quot;</span><span class="ch">\n</span><span class="st">The number of columns are: &quot;</span>,df.shape[<span class="dv">1</span>])</span>
<span id="cb57-4"><a href="#cb57-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb57-5"><a href="#cb57-5" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;</span><span class="ch">\n</span><span class="st">&quot;</span>,df.columns) <span class="co">## All the variables/columns in the data</span></span></code></pre></div>
<div class="output stream stdout">
<pre><code>The numer of rows are:  36992 
The number of columns are:  24

 Index([&#39;Unnamed: 0&#39;, &#39;age&#39;, &#39;gender&#39;, &#39;security_no&#39;, &#39;region_category&#39;,
       &#39;membership_category&#39;, &#39;joining_date&#39;, &#39;joined_through_referral&#39;,
       &#39;referral_id&#39;, &#39;preferred_offer_types&#39;, &#39;medium_of_operation&#39;,
       &#39;internet_option&#39;, &#39;last_visit_time&#39;, &#39;days_since_last_login&#39;,
       &#39;avg_time_spent&#39;, &#39;avg_transaction_value&#39;, &#39;avg_frequency_login_days&#39;,
       &#39;points_in_wallet&#39;, &#39;used_special_discount&#39;,
       &#39;offer_application_preference&#39;, &#39;past_complaint&#39;, &#39;complaint_status&#39;,
       &#39;feedback&#39;, &#39;churn_risk_score&#39;],
      dtype=&#39;object&#39;)
</code></pre>
</div>
</div>
<section
id="extracting-the-sentiment-scores-and-labels-for-the-customer-churn-data"
class="cell markdown" id="Jjr0snZxBBUm">
<h1><strong>Extracting the sentiment scores and labels for the customer
churn data</strong></h1>
</section>
<div class="cell code" id="GKqIhKsV2ntZ">
<div class="sourceCode" id="cb59"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb59-1"><a href="#cb59-1" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> fine_tune <span class="kw">in</span> <span class="bu">range</span>(<span class="bu">len</span>(df)):</span>
<span id="cb59-2"><a href="#cb59-2" aria-hidden="true" tabindex="-1"></a>  result <span class="op">=</span> classifier(df[<span class="st">&#39;feedback&#39;</span>][fine_tune])</span>
<span id="cb59-3"><a href="#cb59-3" aria-hidden="true" tabindex="-1"></a>  label <span class="op">=</span> result[<span class="dv">0</span>][<span class="st">&#39;label&#39;</span>]</span>
<span id="cb59-4"><a href="#cb59-4" aria-hidden="true" tabindex="-1"></a>  score <span class="op">=</span> result[<span class="dv">0</span>][<span class="st">&#39;score&#39;</span>]</span>
<span id="cb59-5"><a href="#cb59-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb59-6"><a href="#cb59-6" aria-hidden="true" tabindex="-1"></a>  labels.append(label)</span>
<span id="cb59-7"><a href="#cb59-7" aria-hidden="true" tabindex="-1"></a>  scores.append(score)</span>
<span id="cb59-8"><a href="#cb59-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb59-9"><a href="#cb59-9" aria-hidden="true" tabindex="-1"></a>df[<span class="st">&#39;label&#39;</span>] <span class="op">=</span> labels</span>
<span id="cb59-10"><a href="#cb59-10" aria-hidden="true" tabindex="-1"></a>df[<span class="st">&#39;score&#39;</span>] <span class="op">=</span> scores</span></code></pre></div>
</div>
<div class="cell markdown" id="CdrXxZYcssO0">
<p><strong>After performing the above step, the sentiment labels and the
corresponding sentiment scores are stored in the "label" and "score"
variables.</strong></p>
</div>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="7ycjzBhQtazA" data-outputId="63758b9a-9d9c-4b35-8f2e-5b6e0e3e8377">
<div class="sourceCode" id="cb60"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb60-1"><a href="#cb60-1" aria-hidden="true" tabindex="-1"></a>df.columns</span></code></pre></div>
<div class="output execute_result" data-execution_count="10">
<pre><code>Index([&#39;Unnamed: 0&#39;, &#39;age&#39;, &#39;gender&#39;, &#39;security_no&#39;, &#39;region_category&#39;,
       &#39;membership_category&#39;, &#39;joining_date&#39;, &#39;joined_through_referral&#39;,
       &#39;referral_id&#39;, &#39;preferred_offer_types&#39;, &#39;medium_of_operation&#39;,
       &#39;internet_option&#39;, &#39;last_visit_time&#39;, &#39;days_since_last_login&#39;,
       &#39;avg_time_spent&#39;, &#39;avg_transaction_value&#39;, &#39;avg_frequency_login_days&#39;,
       &#39;points_in_wallet&#39;, &#39;used_special_discount&#39;,
       &#39;offer_application_preference&#39;, &#39;past_complaint&#39;, &#39;complaint_status&#39;,
       &#39;feedback&#39;, &#39;churn_risk_score&#39;, &#39;label&#39;, &#39;score&#39;],
      dtype=&#39;object&#39;)</code></pre>
</div>
</div>
<section
id="now-the-sentiment-scores-and-sentiments-of-the-customers-are-extracted-from-their-feedback-and-are-added-to-the-data"
class="cell markdown" id="DMRNjjIt0X-D">
<h1><strong>Now, the sentiment scores and sentiments of the customers
are extracted from their feedback and are added to the
data.</strong></h1>
</section>
<div class="cell markdown" id="-yVUwJh9h5b4">
<h1 id="1-this-new-data-will-now-be-used-to-train-the-models"><strong>1.
This new data will now be used to train the models.</strong></h1>
<h1
id="2-the-rest-of-the-process-from-here-remains-same-which-was-implemented-in-phase-1"><strong>2.
The rest of the process from here remains same which was implemented in
Phase 1.</strong></h1>
<h1
id="3-first-the-eda-and-the-other-preprocessing-steps-will-be-performed-from-this-stage"><strong>3.
First, the EDA and the other preprocessing steps will be performed from
this stage.</strong></h1>
</div>
<div class="cell code" id="pw0L3JaOtOOu">
<div class="sourceCode" id="cb62"><pre
class="sourceCode python"><code class="sourceCode python"></code></pre></div>
</div>
<div class="cell code" id="11-aD7SVtORD">
<div class="sourceCode" id="cb63"><pre
class="sourceCode python"><code class="sourceCode python"></code></pre></div>
</div>
<section id="exploratory-data-analysis-starts" class="cell markdown"
id="StzlzrM2S8Mw">
<h1><strong>Exploratory Data Analysis Starts</strong></h1>
</section>
<section id="checking-for-class-imbalance" class="cell markdown"
id="AzbAUefkwUE1">
<h2><strong>Checking for Class Imbalance</strong></h2>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;,&quot;height&quot;:639}"
id="r5xXIPjyuk1D" data-outputId="0e989aaa-2554-4f01-98f4-5e78b6efeb41">
<div class="sourceCode" id="cb64"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb64-1"><a href="#cb64-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Exploratory Data Analysis starts from here</span></span>
<span id="cb64-2"><a href="#cb64-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb64-3"><a href="#cb64-3" aria-hidden="true" tabindex="-1"></a><span class="co">## Beginning of Data Cleaning Process</span></span>
<span id="cb64-4"><a href="#cb64-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb64-5"><a href="#cb64-5" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span>
<span id="cb64-6"><a href="#cb64-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb64-7"><a href="#cb64-7" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;The counts of observations in each class are: &quot;</span>)</span>
<span id="cb64-8"><a href="#cb64-8" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(df[<span class="st">&#39;churn_risk_score&#39;</span>].value_counts())</span>
<span id="cb64-9"><a href="#cb64-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb64-10"><a href="#cb64-10" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;</span><span class="ch">\n</span><span class="st">The percentage of classes are:&quot;</span>)</span>
<span id="cb64-11"><a href="#cb64-11" aria-hidden="true" tabindex="-1"></a><span class="co"># percentage of labels within the each class</span></span>
<span id="cb64-12"><a href="#cb64-12" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>((df[<span class="st">&#39;churn_risk_score&#39;</span>].value_counts()<span class="op">/</span><span class="bu">float</span>(<span class="bu">len</span>(df))) <span class="op">*</span> <span class="dv">100</span>)</span>
<span id="cb64-13"><a href="#cb64-13" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb64-14"><a href="#cb64-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb64-15"><a href="#cb64-15" aria-hidden="true" tabindex="-1"></a><span class="co"># Percentage Distribution among each class</span></span>
<span id="cb64-16"><a href="#cb64-16" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb64-17"><a href="#cb64-17" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;</span><span class="ch">\n</span><span class="st">The Bar plot distribution for Percentages of each class are: &quot;</span>)</span>
<span id="cb64-18"><a href="#cb64-18" aria-hidden="true" tabindex="-1"></a>(df[<span class="st">&#39;churn_risk_score&#39;</span>].value_counts()<span class="op">/</span><span class="bu">float</span>(<span class="bu">len</span>(df))).plot.bar()</span></code></pre></div>
<div class="output stream stdout">
<pre><code>The counts of observations in each class are: 
1    20012
0    16980
Name: churn_risk_score, dtype: int64

The percentage of classes are:
1    54.098183
0    45.901817
Name: churn_risk_score, dtype: float64

The Bar plot distribution for Percentages of each class are: 
</code></pre>
</div>
<div class="output execute_result" data-execution_count="28">
<pre><code>&lt;Axes: &gt;</code></pre>
</div>
<div class="output display_data">
<p><img
src="vertopal_11c7d057c72c4c018277cd184609f64f/929deee7f4f339142bb039d4023a6e5e2501c08c.png" /></p>
</div>
</div>
<div class="cell markdown" id="mWc9wJenwZOZ">
<p><strong>From the above output it can be seen that there is no too
much imbalance in Classes. This can be observed from the fact that Class
1 has 20012 observations which is 54.09% of all the observations and
Class 0 has 16980 observations which is about 45.90% of all the
observations. As it can be seen that, there is not much difference in
classes' counts, there is no class imbalance and hence we can proceed
for the next analytical steps. Precisely, there is no need to apply for
ADASYN or SMOTE</strong></p>
</div>
<section id="checking-for-null-values-across-the-variables"
class="cell markdown" id="eEO0oBK9T5VI">
<h1><strong>Checking for Null Values across the variables</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="kTlIRkI2-uYC" data-outputId="5e2963ef-3f3d-4590-e8a5-f787a47c55f1">
<div class="sourceCode" id="cb67"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb67-1"><a href="#cb67-1" aria-hidden="true" tabindex="-1"></a><span class="co">## Checking if there are any Null Values in the data</span></span>
<span id="cb67-2"><a href="#cb67-2" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;The Null values for each column are: </span><span class="ch">\n</span><span class="st">&quot;</span>,df.isna().<span class="bu">sum</span>())</span></code></pre></div>
<div class="output stream stdout">
<pre><code>The Null values for each column are: 
 Unnamed: 0                         0
age                                0
gender                             0
security_no                        0
region_category                 5428
membership_category                0
joining_date                       0
joined_through_referral            0
referral_id                        0
preferred_offer_types            288
medium_of_operation                0
internet_option                    0
last_visit_time                    0
days_since_last_login              0
avg_time_spent                     0
avg_transaction_value              0
avg_frequency_login_days           0
points_in_wallet                3443
used_special_discount              0
offer_application_preference       0
past_complaint                     0
complaint_status                   0
feedback                           0
churn_risk_score                   0
label                              0
score                              0
dtype: int64
</code></pre>
</div>
</div>
<section
id="from-the-above-output-it-can-be-seen-that-there-are-few-null-values-associated-with-some-variables-these-null-values-will-be-removed-in-the-data-cleaning-and-preprocessing-step"
class="cell markdown" id="rIx3Evwqjp50">
<h3><strong>From the above output, it can be seen that there are few
null values associated with some variables. These null values will be
removed in the DATA CLEANING and PREPROCESSING step.</strong></h3>
</section>
<section id="checking-for-duplicates-in-the-data" class="cell markdown"
id="OONLZ8tpUB9H">
<h1><strong>Checking for Duplicates in the data</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="pXRRJoY4V2VK" data-outputId="7714b41f-0d8b-4be0-86b7-dbf7b0e4fec8">
<div class="sourceCode" id="cb69"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb69-1"><a href="#cb69-1" aria-hidden="true" tabindex="-1"></a>df.duplicated().<span class="bu">sum</span>()</span></code></pre></div>
<div class="output execute_result" data-execution_count="10">
<pre><code>0</code></pre>
</div>
</div>
<div class="cell markdown" id="nmFwXoR8UFTh">
<p><strong>From the above output, it can be seen that there are no
duplicates present in the data.</strong></p>
</div>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="WCgRpoAwyKLf" data-outputId="11937eba-c32c-4f1a-c277-70a14c9c815a">
<div class="sourceCode" id="cb71"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb71-1"><a href="#cb71-1" aria-hidden="true" tabindex="-1"></a><span class="co">## Checking the Datatypes for each variable.</span></span>
<span id="cb71-2"><a href="#cb71-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb71-3"><a href="#cb71-3" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;The data types for each variable are:</span><span class="ch">\n</span><span class="st">&quot;</span>,df.dtypes)</span>
<span id="cb71-4"><a href="#cb71-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb71-5"><a href="#cb71-5" aria-hidden="true" tabindex="-1"></a></span></code></pre></div>
<div class="output stream stdout">
<pre><code>The data types for each variable are:
 Unnamed: 0                        int64
age                               int64
gender                           object
security_no                      object
region_category                  object
membership_category              object
joining_date                     object
joined_through_referral          object
referral_id                      object
preferred_offer_types            object
medium_of_operation              object
internet_option                  object
last_visit_time                  object
days_since_last_login             int64
avg_time_spent                  float64
avg_transaction_value           float64
avg_frequency_login_days         object
points_in_wallet                float64
used_special_discount            object
offer_application_preference     object
past_complaint                   object
complaint_status                 object
feedback                         object
churn_risk_score                  int64
label                            object
score                           float64
dtype: object
</code></pre>
</div>
</div>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="jOxsvcGH6KTQ" data-outputId="24296e56-6bf6-4ae3-ff97-329118ef3cd3">
<div class="sourceCode" id="cb73"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb73-1"><a href="#cb73-1" aria-hidden="true" tabindex="-1"></a>df[<span class="st">&#39;label&#39;</span>].unique()</span></code></pre></div>
<div class="output execute_result" data-execution_count="32">
<pre><code>array([&#39;POSITIVE&#39;, &#39;NEGATIVE&#39;], dtype=object)</code></pre>
</div>
</div>
<div class="cell markdown" id="oZVBQ-EiCQHS">
<p><strong>The data types of the variables appear to be inappropriate.
Nonetheless, some of the variables which are in the "object" type will
be convereted into "category" format. This is due to values in the
respective variables like "gender" having factorial values like "Male"
and "Female".</strong></p>
</div>
<section
id="data-cleaning-and-preprocessing-beginning-of-the-data-cleaning-process"
class="cell markdown" id="fsmleTsUfQWh">
<h1><strong>Data Cleaning and Preprocessing: Beginning of the Data
Cleaning Process</strong></h1>
</section>
<section id="assigning-numeric-labels-to-categories"
class="cell markdown" id="OgCrnMwfksB1">
<h1><strong>Assigning numeric labels to Categories</strong></h1>
<p><strong>Few of the important variables that are requried for the
anlaysis further are mapped with numerical values based on their
categories. This is done for further feasibility of analysis and
building of the models.</strong></p>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="DZCll0uZ0pWS" data-outputId="12b0c88b-dafc-4cfe-dfde-f8fd1c355306">
<div class="sourceCode" id="cb75"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb75-1"><a href="#cb75-1" aria-hidden="true" tabindex="-1"></a><span class="co">## Converting the &#39;gender&#39; variable into numeric</span></span>
<span id="cb75-2"><a href="#cb75-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb75-3"><a href="#cb75-3" aria-hidden="true" tabindex="-1"></a>df <span class="op">=</span> df[df[<span class="st">&#39;gender&#39;</span>].isin([<span class="st">&#39;F&#39;</span>,<span class="st">&#39;M&#39;</span>])]</span>
<span id="cb75-4"><a href="#cb75-4" aria-hidden="true" tabindex="-1"></a>df[<span class="st">&#39;gender&#39;</span>] <span class="op">=</span> df[<span class="st">&#39;gender&#39;</span>].<span class="bu">map</span>({<span class="st">&#39;F&#39;</span>:<span class="dv">0</span>,<span class="st">&#39;M&#39;</span>:<span class="dv">1</span>})</span>
<span id="cb75-5"><a href="#cb75-5" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;The Unique values in the GENDER Variable are: </span><span class="ch">\n</span><span class="st">&quot;</span>)</span>
<span id="cb75-6"><a href="#cb75-6" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(df[<span class="st">&#39;gender&#39;</span>].unique())</span>
<span id="cb75-7"><a href="#cb75-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb75-8"><a href="#cb75-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb75-9"><a href="#cb75-9" aria-hidden="true" tabindex="-1"></a><span class="co">## Converting the &#39;region_category&#39; variable into numeric</span></span>
<span id="cb75-10"><a href="#cb75-10" aria-hidden="true" tabindex="-1"></a>df[<span class="st">&#39;region_category&#39;</span>] <span class="op">=</span> df[<span class="st">&#39;region_category&#39;</span>].<span class="bu">map</span>({<span class="st">&#39;Village&#39;</span>:<span class="dv">0</span>, <span class="st">&#39;Town&#39;</span>:<span class="dv">1</span>, <span class="st">&#39;City&#39;</span>:<span class="dv">2</span>})</span>
<span id="cb75-11"><a href="#cb75-11" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;</span><span class="ch">\n</span><span class="st">The Unique values in the REGION_CATEGORY Variable are: </span><span class="ch">\n</span><span class="st">&quot;</span>,df[<span class="st">&#39;region_category&#39;</span>].unique())</span>
<span id="cb75-12"><a href="#cb75-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb75-13"><a href="#cb75-13" aria-hidden="true" tabindex="-1"></a><span class="co">## Converting the &#39;membership_category&#39; variable into numeric</span></span>
<span id="cb75-14"><a href="#cb75-14" aria-hidden="true" tabindex="-1"></a>df[<span class="st">&#39;membership_category&#39;</span>] <span class="op">=</span> df[<span class="st">&#39;membership_category&#39;</span>].<span class="bu">map</span>({<span class="st">&#39;No Membership&#39;</span>:<span class="dv">0</span>, <span class="st">&#39;Basic Membership&#39;</span>:<span class="dv">1</span>, <span class="st">&#39;Silver Membership&#39;</span>:<span class="dv">2</span>,</span>
<span id="cb75-15"><a href="#cb75-15" aria-hidden="true" tabindex="-1"></a>                                                                  <span class="st">&#39;Gold Membership&#39;</span>:<span class="dv">3</span>, <span class="st">&#39;Platinum Membership&#39;</span>:<span class="dv">4</span>, <span class="st">&#39;Premium Membership&#39;</span>:<span class="dv">5</span>})</span>
<span id="cb75-16"><a href="#cb75-16" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;</span><span class="ch">\n</span><span class="st">The Unique values in the membership_category Variable are: </span><span class="ch">\n</span><span class="st">&quot;</span>,df[<span class="st">&#39;membership_category&#39;</span>].unique())</span>
<span id="cb75-17"><a href="#cb75-17" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb75-18"><a href="#cb75-18" aria-hidden="true" tabindex="-1"></a><span class="co">## Converting the &#39;medium_of_operation&#39; variable into numeric</span></span>
<span id="cb75-19"><a href="#cb75-19" aria-hidden="true" tabindex="-1"></a>df[<span class="st">&#39;medium_of_operation&#39;</span>] <span class="op">=</span> df[<span class="st">&#39;medium_of_operation&#39;</span>].<span class="bu">map</span>({<span class="st">&#39;Smartphone&#39;</span>:<span class="dv">0</span>, <span class="st">&#39;Desktop&#39;</span>:<span class="dv">1</span>, <span class="st">&#39;Both&#39;</span>:<span class="dv">2</span>})</span>
<span id="cb75-20"><a href="#cb75-20" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;</span><span class="ch">\n</span><span class="st">The Unique values in the medium_of_operation Variable are: </span><span class="ch">\n</span><span class="st">&quot;</span>,df[<span class="st">&#39;medium_of_operation&#39;</span>].unique())</span>
<span id="cb75-21"><a href="#cb75-21" aria-hidden="true" tabindex="-1"></a>df[<span class="st">&#39;medium_of_operation&#39;</span>].unique()</span>
<span id="cb75-22"><a href="#cb75-22" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb75-23"><a href="#cb75-23" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb75-24"><a href="#cb75-24" aria-hidden="true" tabindex="-1"></a><span class="co">## Converting the &#39;internet_option&#39; variable into numeric</span></span>
<span id="cb75-25"><a href="#cb75-25" aria-hidden="true" tabindex="-1"></a>df[<span class="st">&#39;internet_option&#39;</span>] <span class="op">=</span> df[<span class="st">&#39;internet_option&#39;</span>].<span class="bu">map</span>({<span class="st">&#39;Mobile_Data&#39;</span>:<span class="dv">0</span>, <span class="st">&#39;Wi-Fi&#39;</span>:<span class="dv">1</span>, <span class="st">&#39;Fiber_Optic&#39;</span>:<span class="dv">2</span>})</span>
<span id="cb75-26"><a href="#cb75-26" aria-hidden="true" tabindex="-1"></a>df[<span class="st">&#39;internet_option&#39;</span>].unique()</span>
<span id="cb75-27"><a href="#cb75-27" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;</span><span class="ch">\n</span><span class="st">The Unique values in the internet_option Variable are: </span><span class="ch">\n</span><span class="st">&quot;</span>,df[<span class="st">&#39;internet_option&#39;</span>].unique())</span>
<span id="cb75-28"><a href="#cb75-28" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb75-29"><a href="#cb75-29" aria-hidden="true" tabindex="-1"></a><span class="co">## Converting the &#39;past_complaint&#39; variable into numeric</span></span>
<span id="cb75-30"><a href="#cb75-30" aria-hidden="true" tabindex="-1"></a>df[<span class="st">&#39;past_complaint&#39;</span>] <span class="op">=</span> df[<span class="st">&#39;past_complaint&#39;</span>].<span class="bu">map</span>({<span class="st">&#39;No&#39;</span>:<span class="dv">0</span>, <span class="st">&#39;Yes&#39;</span>:<span class="dv">1</span>})</span>
<span id="cb75-31"><a href="#cb75-31" aria-hidden="true" tabindex="-1"></a>df[<span class="st">&#39;past_complaint&#39;</span>].unique()</span>
<span id="cb75-32"><a href="#cb75-32" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;</span><span class="ch">\n</span><span class="st">The Unique values in the past_complaint Variable are: </span><span class="ch">\n</span><span class="st">&quot;</span>,df[<span class="st">&#39;past_complaint&#39;</span>].unique())</span>
<span id="cb75-33"><a href="#cb75-33" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb75-34"><a href="#cb75-34" aria-hidden="true" tabindex="-1"></a><span class="co">## Converting the &#39;complaint_status&#39; variable into numeric</span></span>
<span id="cb75-35"><a href="#cb75-35" aria-hidden="true" tabindex="-1"></a>df[<span class="st">&#39;complaint_status&#39;</span>] <span class="op">=</span> df[<span class="st">&#39;complaint_status&#39;</span>].<span class="bu">map</span>({<span class="st">&#39;Not Applicable&#39;</span>:<span class="dv">0</span>, <span class="st">&#39;No Information Available&#39;</span>:<span class="dv">1</span>,</span>
<span id="cb75-36"><a href="#cb75-36" aria-hidden="true" tabindex="-1"></a>                                                            <span class="st">&#39;Unsolved&#39;</span>:<span class="dv">3</span>,<span class="st">&#39;Solved in Follow-up&#39;</span>:<span class="dv">4</span>,</span>
<span id="cb75-37"><a href="#cb75-37" aria-hidden="true" tabindex="-1"></a>                                                            <span class="st">&#39;Solved&#39;</span>:<span class="dv">5</span>})</span>
<span id="cb75-38"><a href="#cb75-38" aria-hidden="true" tabindex="-1"></a>df[<span class="st">&#39;complaint_status&#39;</span>].unique()</span>
<span id="cb75-39"><a href="#cb75-39" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb75-40"><a href="#cb75-40" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;</span><span class="ch">\n</span><span class="st">The Unique values in the complaint_status Variable are: </span><span class="ch">\n</span><span class="st">&quot;</span>,df[<span class="st">&#39;complaint_status&#39;</span>].unique())</span>
<span id="cb75-41"><a href="#cb75-41" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb75-42"><a href="#cb75-42" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb75-43"><a href="#cb75-43" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb75-44"><a href="#cb75-44" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb75-45"><a href="#cb75-45" aria-hidden="true" tabindex="-1"></a><span class="co">## Converting the &#39;label&#39; variable into numeric</span></span>
<span id="cb75-46"><a href="#cb75-46" aria-hidden="true" tabindex="-1"></a>df[<span class="st">&#39;label&#39;</span>] <span class="op">=</span> df[<span class="st">&#39;label&#39;</span>].<span class="bu">map</span>({<span class="st">&#39;POSITIVE&#39;</span>:<span class="dv">1</span>, <span class="st">&#39;NEGATIVE&#39;</span>:<span class="dv">0</span>})</span>
<span id="cb75-47"><a href="#cb75-47" aria-hidden="true" tabindex="-1"></a>df[<span class="st">&#39;label&#39;</span>].unique()</span>
<span id="cb75-48"><a href="#cb75-48" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb75-49"><a href="#cb75-49" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb75-50"><a href="#cb75-50" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(df.dtypes)</span></code></pre></div>
<div class="output stream stdout">
<pre><code>The Unique values in the GENDER Variable are: 

[0 1]

The Unique values in the REGION_CATEGORY Variable are: 
 [ 0.  2.  1. nan]

The Unique values in the membership_category Variable are: 
 [4 5 0 3 2 1]

The Unique values in the medium_of_operation Variable are: 
 [nan  1.  0.  2.]

The Unique values in the internet_option Variable are: 
 [1 0 2]

The Unique values in the past_complaint Variable are: 
 [0 1]

The Unique values in the complaint_status Variable are: 
 [0 5 4 3 1]
Unnamed: 0                        int64
age                               int64
gender                            int64
security_no                      object
region_category                 float64
membership_category               int64
joining_date                     object
joined_through_referral          object
referral_id                      object
preferred_offer_types            object
medium_of_operation             float64
internet_option                   int64
last_visit_time                  object
days_since_last_login             int64
avg_time_spent                  float64
avg_transaction_value           float64
avg_frequency_login_days         object
points_in_wallet                float64
used_special_discount            object
offer_application_preference     object
past_complaint                    int64
complaint_status                  int64
feedback                         object
churn_risk_score                  int64
label                             int64
score                           float64
dtype: object
</code></pre>
</div>
<div class="output stream stderr">
<pre><code>&lt;ipython-input-9-a83be7fc7985&gt;:4: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[&#39;gender&#39;] = df[&#39;gender&#39;].map({&#39;F&#39;:0,&#39;M&#39;:1})
&lt;ipython-input-9-a83be7fc7985&gt;:10: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[&#39;region_category&#39;] = df[&#39;region_category&#39;].map({&#39;Village&#39;:0, &#39;Town&#39;:1, &#39;City&#39;:2})
&lt;ipython-input-9-a83be7fc7985&gt;:14: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[&#39;membership_category&#39;] = df[&#39;membership_category&#39;].map({&#39;No Membership&#39;:0, &#39;Basic Membership&#39;:1, &#39;Silver Membership&#39;:2,
&lt;ipython-input-9-a83be7fc7985&gt;:19: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[&#39;medium_of_operation&#39;] = df[&#39;medium_of_operation&#39;].map({&#39;Smartphone&#39;:0, &#39;Desktop&#39;:1, &#39;Both&#39;:2})
&lt;ipython-input-9-a83be7fc7985&gt;:25: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  df[&#39;internet_option&#39;] = df[&#39;internet_option&#39;].map({&#39;Mobile_Data&#39;:0, &#39;Wi-Fi&#39;:1, &#39;Fiber_Optic&#39;:2})
</code></pre>
</div>
</div>
<div class="cell markdown" id="c_e95IAE4iQ3">
<p><strong>The NaN values from the above output are nothing but there
are few Null or Missing Values in the data which are cleaned in the Data
Cleaning Process below.</strong></p>
</div>
<section id="removing-the-null-values" class="cell markdown"
id="SX4oM0PHT-qI">
<h1><strong>Removing the Null Values</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="XfYXPkE9-0IS" data-outputId="44f41a3b-ead9-4706-8268-3ed7239f8c53">
<div class="sourceCode" id="cb78"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb78-1"><a href="#cb78-1" aria-hidden="true" tabindex="-1"></a><span class="co">## Removing the Null Values</span></span>
<span id="cb78-2"><a href="#cb78-2" aria-hidden="true" tabindex="-1"></a>df <span class="op">=</span> df.dropna()</span>
<span id="cb78-3"><a href="#cb78-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb78-4"><a href="#cb78-4" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(df.isna().<span class="bu">sum</span>())</span></code></pre></div>
<div class="output stream stdout">
<pre><code>Unnamed: 0                      0
age                             0
gender                          0
security_no                     0
region_category                 0
membership_category             0
joining_date                    0
joined_through_referral         0
referral_id                     0
preferred_offer_types           0
medium_of_operation             0
internet_option                 0
last_visit_time                 0
days_since_last_login           0
avg_time_spent                  0
avg_transaction_value           0
avg_frequency_login_days        0
points_in_wallet                0
used_special_discount           0
offer_application_preference    0
past_complaint                  0
complaint_status                0
feedback                        0
churn_risk_score                0
label                           0
score                           0
dtype: int64
</code></pre>
</div>
</div>
<section id="correlation-analysis" class="cell markdown"
id="W5MvvzmVlxtr">
<h1><strong>Correlation Analysis</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;,&quot;height&quot;:879}"
id="rAc5PPcDnMRa" data-outputId="bd386f3b-64dd-4627-a4b0-05b0e28968a8">
<div class="sourceCode" id="cb80"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb80-1"><a href="#cb80-1" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb80-2"><a href="#cb80-2" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> seaborn <span class="im">as</span> sns</span>
<span id="cb80-3"><a href="#cb80-3" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> matplotlib.pyplot <span class="im">as</span> plt</span>
<span id="cb80-4"><a href="#cb80-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb80-5"><a href="#cb80-5" aria-hidden="true" tabindex="-1"></a><span class="co"># # Selecting numerical variables</span></span>
<span id="cb80-6"><a href="#cb80-6" aria-hidden="true" tabindex="-1"></a>numerical_vars <span class="op">=</span> df.select_dtypes(include<span class="op">=</span>[<span class="st">&#39;int64&#39;</span>, <span class="st">&#39;float64&#39;</span>]).columns.tolist()</span>
<span id="cb80-7"><a href="#cb80-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb80-8"><a href="#cb80-8" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f&quot;The numerical variables are: </span><span class="sc">{</span>numerical_vars<span class="sc">}</span><span class="ss">&quot;</span>)</span>
<span id="cb80-9"><a href="#cb80-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb80-10"><a href="#cb80-10" aria-hidden="true" tabindex="-1"></a><span class="co"># # Selecting categorical variables</span></span>
<span id="cb80-11"><a href="#cb80-11" aria-hidden="true" tabindex="-1"></a><span class="co"># categorical_vars = df_cat.select_dtypes(include=[&#39;object&#39;]).columns.tolist()</span></span>
<span id="cb80-12"><a href="#cb80-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb80-13"><a href="#cb80-13" aria-hidden="true" tabindex="-1"></a><span class="co"># Correlation Analysis for Numerical Variables</span></span>
<span id="cb80-14"><a href="#cb80-14" aria-hidden="true" tabindex="-1"></a>numerical_data <span class="op">=</span> df[numerical_vars]</span>
<span id="cb80-15"><a href="#cb80-15" aria-hidden="true" tabindex="-1"></a>correlation_matrix <span class="op">=</span> numerical_data.corr()</span>
<span id="cb80-16"><a href="#cb80-16" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb80-17"><a href="#cb80-17" aria-hidden="true" tabindex="-1"></a>plt.figure(figsize<span class="op">=</span>(<span class="dv">10</span>, <span class="dv">8</span>))</span>
<span id="cb80-18"><a href="#cb80-18" aria-hidden="true" tabindex="-1"></a>sns.heatmap(correlation_matrix, annot<span class="op">=</span><span class="va">True</span>, cmap<span class="op">=</span><span class="st">&#39;coolwarm&#39;</span>, fmt<span class="op">=</span><span class="st">&quot;.2f&quot;</span>, linewidths<span class="op">=</span><span class="fl">.5</span>)</span>
<span id="cb80-19"><a href="#cb80-19" aria-hidden="true" tabindex="-1"></a>plt.title(<span class="st">&#39;Correlation Matrix of Numerical Variables&#39;</span>)</span>
<span id="cb80-20"><a href="#cb80-20" aria-hidden="true" tabindex="-1"></a>plt.show()</span></code></pre></div>
<div class="output stream stdout">
<pre><code>The numerical variables are: [&#39;Unnamed: 0&#39;, &#39;age&#39;, &#39;gender&#39;, &#39;region_category&#39;, &#39;membership_category&#39;, &#39;medium_of_operation&#39;, &#39;internet_option&#39;, &#39;days_since_last_login&#39;, &#39;avg_time_spent&#39;, &#39;avg_transaction_value&#39;, &#39;points_in_wallet&#39;, &#39;past_complaint&#39;, &#39;complaint_status&#39;, &#39;churn_risk_score&#39;, &#39;label&#39;, &#39;score&#39;]
</code></pre>
</div>
<div class="output display_data">
<p><img
src="vertopal_11c7d057c72c4c018277cd184609f64f/10c624d2f4e7c1e59216e5d8e5b2ee2d7f0adcd0.png" /></p>
</div>
</div>
<div class="cell markdown" id="0azT4O1Ba8TZ">
<p><strong>From the above correlational matrix, it can be observed that
there is no multicollinearity.</strong></p>
</div>
<section
id="variable-selection-selecting-the-required-variables-from-the-dataset"
class="cell markdown" id="9vt3KsusSior">
<h1><strong>Variable Selection: Selecting the required variables from
the Dataset</strong></h1>
</section>
<div class="cell markdown" id="NbdXKBjonuW7">
<p><strong>From all the above analysis performed, the variables which
are expected to be contributing to customer churn are selected.
Furthermore, in addition to the variables which are identified to be
more important, some other variables from the literature review which
are found out to be more impactful for the customer churn are also
selected.</strong></p>
</div>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="mQXWziRDX1zd" data-outputId="8230fcb2-c901-4183-bb4d-1ac45e01117f">
<div class="sourceCode" id="cb82"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb82-1"><a href="#cb82-1" aria-hidden="true" tabindex="-1"></a><span class="co"># df.shape</span></span>
<span id="cb82-2"><a href="#cb82-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb82-3"><a href="#cb82-3" aria-hidden="true" tabindex="-1"></a>df.columns</span></code></pre></div>
<div class="output execute_result" data-execution_count="11">
<pre><code>Index([&#39;Unnamed: 0&#39;, &#39;age&#39;, &#39;gender&#39;, &#39;security_no&#39;, &#39;region_category&#39;,
       &#39;membership_category&#39;, &#39;joining_date&#39;, &#39;joined_through_referral&#39;,
       &#39;referral_id&#39;, &#39;preferred_offer_types&#39;, &#39;medium_of_operation&#39;,
       &#39;internet_option&#39;, &#39;last_visit_time&#39;, &#39;days_since_last_login&#39;,
       &#39;avg_time_spent&#39;, &#39;avg_transaction_value&#39;, &#39;avg_frequency_login_days&#39;,
       &#39;points_in_wallet&#39;, &#39;used_special_discount&#39;,
       &#39;offer_application_preference&#39;, &#39;past_complaint&#39;, &#39;complaint_status&#39;,
       &#39;feedback&#39;, &#39;churn_risk_score&#39;, &#39;label&#39;, &#39;score&#39;],
      dtype=&#39;object&#39;)</code></pre>
</div>
</div>
<div class="cell markdown" id="WIUvChFs18b1">
<ul>
<li><p><strong>In this current phase of the study, the sentiment label
and sentiment score are also considered this time as part of the model
building and analysis which was not done in phase 1.</strong></p></li>
<li><p><strong>The other variables remain the same.</strong></p></li>
</ul>
</div>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="sRw8jOI6uW-q" data-outputId="b620679f-73f9-465a-af05-fdf19b015ea4">
<div class="sourceCode" id="cb84"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb84-1"><a href="#cb84-1" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb84-2"><a href="#cb84-2" aria-hidden="true" tabindex="-1"></a><span class="co">## Selecting required columns from the dataframe</span></span>
<span id="cb84-3"><a href="#cb84-3" aria-hidden="true" tabindex="-1"></a>df <span class="op">=</span> df.iloc[:,[<span class="dv">1</span>,<span class="dv">2</span>,<span class="dv">4</span>,<span class="dv">5</span>,<span class="dv">10</span>, <span class="dv">11</span>, <span class="dv">13</span>, <span class="dv">14</span>, <span class="dv">15</span>, <span class="dv">17</span>, <span class="dv">21</span>, <span class="dv">22</span>, <span class="dv">23</span>,<span class="dv">24</span>,<span class="dv">25</span>]]</span>
<span id="cb84-4"><a href="#cb84-4" aria-hidden="true" tabindex="-1"></a>df.columns</span></code></pre></div>
<div class="output execute_result" data-execution_count="11">
<pre><code>Index([&#39;age&#39;, &#39;gender&#39;, &#39;region_category&#39;, &#39;membership_category&#39;,
       &#39;medium_of_operation&#39;, &#39;internet_option&#39;, &#39;days_since_last_login&#39;,
       &#39;avg_time_spent&#39;, &#39;avg_transaction_value&#39;, &#39;points_in_wallet&#39;,
       &#39;complaint_status&#39;, &#39;feedback&#39;, &#39;churn_risk_score&#39;, &#39;label&#39;, &#39;score&#39;],
      dtype=&#39;object&#39;)</code></pre>
</div>
</div>
<section id="data-type-conversion-for-required-variables"
class="cell markdown" id="DpC6MJysUNsh">
<h1><strong>Data type conversion for required variables</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="qPaXMOje_bQ2" data-outputId="1ccb8994-84e8-425e-fca7-afdddda4e014">
<div class="sourceCode" id="cb86"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb86-1"><a href="#cb86-1" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb86-2"><a href="#cb86-2" aria-hidden="true" tabindex="-1"></a>columns <span class="op">=</span> [<span class="st">&#39;gender&#39;</span>,<span class="st">&#39;region_category&#39;</span>,<span class="st">&#39;medium_of_operation&#39;</span>,<span class="st">&#39;membership_category&#39;</span>,<span class="st">&#39;internet_option&#39;</span>,<span class="st">&#39;complaint_status&#39;</span>,<span class="st">&#39;churn_risk_score&#39;</span>,<span class="st">&#39;label&#39;</span>]</span>
<span id="cb86-3"><a href="#cb86-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb86-4"><a href="#cb86-4" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> i <span class="kw">in</span> columns:</span>
<span id="cb86-5"><a href="#cb86-5" aria-hidden="true" tabindex="-1"></a>  df[i] <span class="op">=</span> df[i].astype(<span class="st">&#39;category&#39;</span>)</span>
<span id="cb86-6"><a href="#cb86-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb86-7"><a href="#cb86-7" aria-hidden="true" tabindex="-1"></a><span class="co"># print(df.dtypes)</span></span>
<span id="cb86-8"><a href="#cb86-8" aria-hidden="true" tabindex="-1"></a>df.dtypes</span></code></pre></div>
<div class="output execute_result" data-execution_count="12">
<pre><code>age                         int64
gender                   category
region_category          category
membership_category      category
medium_of_operation      category
internet_option          category
days_since_last_login       int64
avg_time_spent            float64
avg_transaction_value     float64
points_in_wallet          float64
complaint_status         category
feedback                   object
churn_risk_score         category
label                    category
score                     float64
dtype: object</code></pre>
</div>
</div>
<section id="summary-of-the-data" class="cell markdown"
id="CLUaLpaQUV5Q">
<h1><strong>Summary of the Data</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;,&quot;height&quot;:300}"
id="oDXGLf8H_tZy" data-outputId="b5216dfe-9239-40dd-f6b1-9c0c4cdee34f">
<div class="sourceCode" id="cb88"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb88-1"><a href="#cb88-1" aria-hidden="true" tabindex="-1"></a>df.describe()</span></code></pre></div>
<div class="output execute_result" data-execution_count="16">

  <div id="df-26ede30a-077d-4023-906f-3b2d13f73b00" class="colab-df-container">
    <div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>age</th>
      <th>days_since_last_login</th>
      <th>avg_time_spent</th>
      <th>avg_transaction_value</th>
      <th>points_in_wallet</th>
      <th>score</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>count</th>
      <td>24244.000000</td>
      <td>24244.000000</td>
      <td>24244.000000</td>
      <td>24244.000000</td>
      <td>24244.000000</td>
      <td>24244.000000</td>
    </tr>
    <tr>
      <th>mean</th>
      <td>36.975087</td>
      <td>-43.230820</td>
      <td>242.990085</td>
      <td>29274.984008</td>
      <td>688.217444</td>
      <td>0.845824</td>
    </tr>
    <tr>
      <th>std</th>
      <td>15.905782</td>
      <td>231.430631</td>
      <td>403.316697</td>
      <td>19482.111262</td>
      <td>193.942554</td>
      <td>0.073093</td>
    </tr>
    <tr>
      <th>min</th>
      <td>10.000000</td>
      <td>-999.000000</td>
      <td>-2814.109110</td>
      <td>800.460000</td>
      <td>-760.661236</td>
      <td>0.576339</td>
    </tr>
    <tr>
      <th>25%</th>
      <td>23.000000</td>
      <td>8.000000</td>
      <td>59.705000</td>
      <td>14239.570000</td>
      <td>616.827500</td>
      <td>0.811424</td>
    </tr>
    <tr>
      <th>50%</th>
      <td>37.000000</td>
      <td>12.000000</td>
      <td>161.780000</td>
      <td>27472.805000</td>
      <td>698.600000</td>
      <td>0.885798</td>
    </tr>
    <tr>
      <th>75%</th>
      <td>51.000000</td>
      <td>16.000000</td>
      <td>355.222500</td>
      <td>40829.692500</td>
      <td>764.612500</td>
      <td>0.892140</td>
    </tr>
    <tr>
      <th>max</th>
      <td>64.000000</td>
      <td>26.000000</td>
      <td>3040.410000</td>
      <td>99914.050000</td>
      <td>2069.069761</td>
      <td>0.895806</td>
    </tr>
  </tbody>
</table>
</div>
    <div class="colab-df-buttons">

  <div class="colab-df-container">
    <button class="colab-df-convert" onclick="convertToInteractive('df-26ede30a-077d-4023-906f-3b2d13f73b00')"
            title="Convert this dataframe to an interactive table."
            style="display:none;">

  <svg xmlns="http://www.w3.org/2000/svg" height="24px" viewBox="0 -960 960 960">
    <path d="M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z"/>
  </svg>
    </button>

  <style>
    .colab-df-container {
      display:flex;
      gap: 12px;
    }

    .colab-df-convert {
      background-color: #E8F0FE;
      border: none;
      border-radius: 50%;
      cursor: pointer;
      display: none;
      fill: #1967D2;
      height: 32px;
      padding: 0 0 0 0;
      width: 32px;
    }

    .colab-df-convert:hover {
      background-color: #E2EBFA;
      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);
      fill: #174EA6;
    }

    .colab-df-buttons div {
      margin-bottom: 4px;
    }

    [theme=dark] .colab-df-convert {
      background-color: #3B4455;
      fill: #D2E3FC;
    }

    [theme=dark] .colab-df-convert:hover {
      background-color: #434B5C;
      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);
      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));
      fill: #FFFFFF;
    }
  </style>

    <script>
      const buttonEl =
        document.querySelector('#df-26ede30a-077d-4023-906f-3b2d13f73b00 button.colab-df-convert');
      buttonEl.style.display =
        google.colab.kernel.accessAllowed ? 'block' : 'none';

      async function convertToInteractive(key) {
        const element = document.querySelector('#df-26ede30a-077d-4023-906f-3b2d13f73b00');
        const dataTable =
          await google.colab.kernel.invokeFunction('convertToInteractive',
                                                    [key], {});
        if (!dataTable) return;

        const docLinkHtml = 'Like what you see? Visit the ' +
          '<a target="_blank" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'
          + ' to learn more about interactive tables.';
        element.innerHTML = '';
        dataTable['output_type'] = 'display_data';
        await google.colab.output.renderOutput(dataTable, element);
        const docLink = document.createElement('div');
        docLink.innerHTML = docLinkHtml;
        element.appendChild(docLink);
      }
    </script>
  </div>


<div id="df-703e4b70-9a5f-4fc3-9aaa-064f37c6b3c2">
  <button class="colab-df-quickchart" onclick="quickchart('df-703e4b70-9a5f-4fc3-9aaa-064f37c6b3c2')"
            title="Suggest charts"
            style="display:none;">

<svg xmlns="http://www.w3.org/2000/svg" height="24px"viewBox="0 0 24 24"
     width="24px">
    <g>
        <path d="M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z"/>
    </g>
</svg>
  </button>

<style>
  .colab-df-quickchart {
      --bg-color: #E8F0FE;
      --fill-color: #1967D2;
      --hover-bg-color: #E2EBFA;
      --hover-fill-color: #174EA6;
      --disabled-fill-color: #AAA;
      --disabled-bg-color: #DDD;
  }

  [theme=dark] .colab-df-quickchart {
      --bg-color: #3B4455;
      --fill-color: #D2E3FC;
      --hover-bg-color: #434B5C;
      --hover-fill-color: #FFFFFF;
      --disabled-bg-color: #3B4455;
      --disabled-fill-color: #666;
  }

  .colab-df-quickchart {
    background-color: var(--bg-color);
    border: none;
    border-radius: 50%;
    cursor: pointer;
    display: none;
    fill: var(--fill-color);
    height: 32px;
    padding: 0;
    width: 32px;
  }

  .colab-df-quickchart:hover {
    background-color: var(--hover-bg-color);
    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);
    fill: var(--button-hover-fill-color);
  }

  .colab-df-quickchart-complete:disabled,
  .colab-df-quickchart-complete:disabled:hover {
    background-color: var(--disabled-bg-color);
    fill: var(--disabled-fill-color);
    box-shadow: none;
  }

  .colab-df-spinner {
    border: 2px solid var(--fill-color);
    border-color: transparent;
    border-bottom-color: var(--fill-color);
    animation:
      spin 1s steps(1) infinite;
  }

  @keyframes spin {
    0% {
      border-color: transparent;
      border-bottom-color: var(--fill-color);
      border-left-color: var(--fill-color);
    }
    20% {
      border-color: transparent;
      border-left-color: var(--fill-color);
      border-top-color: var(--fill-color);
    }
    30% {
      border-color: transparent;
      border-left-color: var(--fill-color);
      border-top-color: var(--fill-color);
      border-right-color: var(--fill-color);
    }
    40% {
      border-color: transparent;
      border-right-color: var(--fill-color);
      border-top-color: var(--fill-color);
    }
    60% {
      border-color: transparent;
      border-right-color: var(--fill-color);
    }
    80% {
      border-color: transparent;
      border-right-color: var(--fill-color);
      border-bottom-color: var(--fill-color);
    }
    90% {
      border-color: transparent;
      border-bottom-color: var(--fill-color);
    }
  }
</style>

  <script>
    async function quickchart(key) {
      const quickchartButtonEl =
        document.querySelector('#' + key + ' button');
      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.
      quickchartButtonEl.classList.add('colab-df-spinner');
      try {
        const charts = await google.colab.kernel.invokeFunction(
            'suggestCharts', [key], {});
      } catch (error) {
        console.error('Error during call to suggestCharts:', error);
      }
      quickchartButtonEl.classList.remove('colab-df-spinner');
      quickchartButtonEl.classList.add('colab-df-quickchart-complete');
    }
    (() => {
      let quickchartButtonEl =
        document.querySelector('#df-703e4b70-9a5f-4fc3-9aaa-064f37c6b3c2 button');
      quickchartButtonEl.style.display =
        google.colab.kernel.accessAllowed ? 'block' : 'none';
    })();
  </script>
</div>
    </div>
  </div>

</div>
</div>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="cVVZg5qt_-n4" data-outputId="ae9b3fc8-19e9-4094-a91a-043b2f9102bb">
<div class="sourceCode" id="cb89"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb89-1"><a href="#cb89-1" aria-hidden="true" tabindex="-1"></a>df.columns</span></code></pre></div>
<div class="output execute_result" data-execution_count="17">
<pre><code>Index([&#39;age&#39;, &#39;gender&#39;, &#39;region_category&#39;, &#39;membership_category&#39;,
       &#39;medium_of_operation&#39;, &#39;internet_option&#39;, &#39;days_since_last_login&#39;,
       &#39;avg_time_spent&#39;, &#39;avg_transaction_value&#39;, &#39;points_in_wallet&#39;,
       &#39;complaint_status&#39;, &#39;feedback&#39;, &#39;churn_risk_score&#39;, &#39;label&#39;, &#39;score&#39;],
      dtype=&#39;object&#39;)</code></pre>
</div>
</div>
<div class="cell markdown" id="2v8RuuC8Ubuy">
<p><strong>Box Plots for the Numerical Variables to Check for
Outliers</strong></p>
</div>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;,&quot;height&quot;:1000}"
id="XnlrsUIo_-wC" data-outputId="e9224570-6724-454f-e23a-ee3a9fdbad09">
<div class="sourceCode" id="cb91"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb91-1"><a href="#cb91-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> matplotlib.pyplot <span class="im">as</span> plt</span>
<span id="cb91-2"><a href="#cb91-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb91-3"><a href="#cb91-3" aria-hidden="true" tabindex="-1"></a>numerical_columns <span class="op">=</span> [<span class="st">&#39;age&#39;</span>, <span class="st">&#39;days_since_last_login&#39;</span>, <span class="st">&#39;avg_time_spent&#39;</span>, <span class="st">&#39;avg_transaction_value&#39;</span>, <span class="st">&#39;points_in_wallet&#39;</span>,<span class="st">&#39;score&#39;</span>]</span>
<span id="cb91-4"><a href="#cb91-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb91-5"><a href="#cb91-5" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> column <span class="kw">in</span> numerical_columns:</span>
<span id="cb91-6"><a href="#cb91-6" aria-hidden="true" tabindex="-1"></a>    plt.figure(figsize<span class="op">=</span>(<span class="dv">8</span>, <span class="dv">6</span>))</span>
<span id="cb91-7"><a href="#cb91-7" aria-hidden="true" tabindex="-1"></a>    df.boxplot(column<span class="op">=</span>[column])</span>
<span id="cb91-8"><a href="#cb91-8" aria-hidden="true" tabindex="-1"></a>    plt.title(<span class="ss">f&#39;Box plot for </span><span class="sc">{</span>column<span class="sc">}</span><span class="ss">&#39;</span>)</span>
<span id="cb91-9"><a href="#cb91-9" aria-hidden="true" tabindex="-1"></a>    plt.show()</span></code></pre></div>
<div class="output display_data">
<p><img
src="vertopal_11c7d057c72c4c018277cd184609f64f/a325ca31f0f583f2e260f9920eed07f9d17ec10b.png" /></p>
</div>
<div class="output display_data">
<p><img
src="vertopal_11c7d057c72c4c018277cd184609f64f/d2bc40bfa214032dcdac9e701a36c477abf1695d.png" /></p>
</div>
<div class="output display_data">
<p><img
src="vertopal_11c7d057c72c4c018277cd184609f64f/df6af788561e2d28c42c492e658c16a62e3da9a4.png" /></p>
</div>
<div class="output display_data">
<p><img
src="vertopal_11c7d057c72c4c018277cd184609f64f/8b8a8b79c85062c7b2e2eaf8ddb70cd78113e746.png" /></p>
</div>
<div class="output display_data">
<p><img
src="vertopal_11c7d057c72c4c018277cd184609f64f/943b90499aed1543ceebc519a63f4d5558fd5191.png" /></p>
</div>
<div class="output display_data">
<p><img
src="vertopal_11c7d057c72c4c018277cd184609f64f/348e3986f032f2ee83727b4e166344f45c5b3a8b.png" /></p>
</div>
</div>
<div class="cell markdown" id="BGimCt-JYWqL">
<p><strong>Even though there are few outliers in some variables, I am
not performing any action to remove the outliers. This is because,
removing outliers may effect in having loss of valuable information from
the data. Those outliers may represent some important insights in the
data.</strong></p>
</div>
<section id="bivariate-plots" class="cell markdown" id="QLfl1Sj7UjoZ">
<h1><strong>Bivariate Plots</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;,&quot;height&quot;:1000}"
id="rgRVMj72Qs0J" data-outputId="7949c0c1-e4ac-4bd7-d010-d5faffc7bd50">
<div class="sourceCode" id="cb92"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb92-1"><a href="#cb92-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> seaborn <span class="im">as</span> sns</span>
<span id="cb92-2"><a href="#cb92-2" aria-hidden="true" tabindex="-1"></a>sns.pairplot(df)</span></code></pre></div>
<div class="output execute_result" data-execution_count="18">
<pre><code>&lt;seaborn.axisgrid.PairGrid at 0x7e0f576dd450&gt;</code></pre>
</div>
<div class="output display_data">
<p><img
src="vertopal_11c7d057c72c4c018277cd184609f64f/1f879400f94b07a7aad06a3f1411dadd851bc35f.png" /></p>
</div>
</div>
<section id="scaling-the-data" class="cell markdown" id="XD9xi1Axtv6d">
<h1><strong>Scaling the Data</strong></h1>
</section>
<div class="cell code" id="rU0HcpcX_-yW">
<div class="sourceCode" id="cb94"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb94-1"><a href="#cb94-1" aria-hidden="true" tabindex="-1"></a><span class="co">## Scaling the Data</span></span>
<span id="cb94-2"><a href="#cb94-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb94-3"><a href="#cb94-3" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> sklearn.preprocessing <span class="im">import</span> StandardScaler</span>
<span id="cb94-4"><a href="#cb94-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb94-5"><a href="#cb94-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb94-6"><a href="#cb94-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb94-7"><a href="#cb94-7" aria-hidden="true" tabindex="-1"></a>features_to_standardize <span class="op">=</span> df.select_dtypes(include<span class="op">=</span>[<span class="st">&#39;float64&#39;</span>, <span class="st">&#39;int64&#39;</span>]).columns</span>
<span id="cb94-8"><a href="#cb94-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb94-9"><a href="#cb94-9" aria-hidden="true" tabindex="-1"></a><span class="co"># Creating a StandardScaler object</span></span>
<span id="cb94-10"><a href="#cb94-10" aria-hidden="true" tabindex="-1"></a>scaler <span class="op">=</span> StandardScaler()</span>
<span id="cb94-11"><a href="#cb94-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb94-12"><a href="#cb94-12" aria-hidden="true" tabindex="-1"></a><span class="co"># Fitting the scaler on the selected features and transform the data</span></span>
<span id="cb94-13"><a href="#cb94-13" aria-hidden="true" tabindex="-1"></a>df[features_to_standardize] <span class="op">=</span> scaler.fit_transform(df[features_to_standardize])</span></code></pre></div>
</div>
<section id="splitting-the-data-into-train-and-testing-sets"
class="cell markdown" id="mSdalaqhWcPT">
<h2><strong>Splitting the Data into Train and Testing Sets</strong></h2>
</section>
<div class="cell code" id="eJeLCIdhD-oy">
<div class="sourceCode" id="cb95"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb95-1"><a href="#cb95-1" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb95-2"><a href="#cb95-2" aria-hidden="true" tabindex="-1"></a><span class="co"># Creating the set of independent variables and the dependent variable</span></span>
<span id="cb95-3"><a href="#cb95-3" aria-hidden="true" tabindex="-1"></a>X <span class="op">=</span> df.drop([<span class="st">&quot;churn_risk_score&quot;</span>,<span class="st">&quot;feedback&quot;</span>], axis<span class="op">=</span><span class="dv">1</span>)</span>
<span id="cb95-4"><a href="#cb95-4" aria-hidden="true" tabindex="-1"></a>y <span class="op">=</span> df[<span class="st">&quot;churn_risk_score&quot;</span>]</span></code></pre></div>
</div>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="abj9vyZfWyTQ" data-outputId="23281ef0-331a-4677-835d-f4188fbe4a01">
<div class="sourceCode" id="cb96"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb96-1"><a href="#cb96-1" aria-hidden="true" tabindex="-1"></a>y.unique()</span></code></pre></div>
<div class="output execute_result" data-execution_count="15">
<pre><code>[1, 0]
Categories (2, int64): [0, 1]</code></pre>
</div>
</div>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;,&quot;height&quot;:226}"
id="VtOGk98cW64k" data-outputId="59e68be6-cad8-4e94-f068-bef8283155a0">
<div class="sourceCode" id="cb98"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb98-1"><a href="#cb98-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> sklearn.model_selection <span class="im">import</span> train_test_split</span>
<span id="cb98-2"><a href="#cb98-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb98-3"><a href="#cb98-3" aria-hidden="true" tabindex="-1"></a><span class="co">## Splitting the data into training and testing sets</span></span>
<span id="cb98-4"><a href="#cb98-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb98-5"><a href="#cb98-5" aria-hidden="true" tabindex="-1"></a>x_train, x_test, y_train, y_test <span class="op">=</span> train_test_split(X, y, test_size<span class="op">=</span><span class="fl">0.3</span>, random_state<span class="op">=</span><span class="dv">1</span>)</span>
<span id="cb98-6"><a href="#cb98-6" aria-hidden="true" tabindex="-1"></a>X.head()</span></code></pre></div>
<div class="output execute_result" data-execution_count="16">

  <div id="df-189cbe67-f8ad-4f90-9e9e-efcdb4c697e6" class="colab-df-container">
    <div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>age</th>
      <th>gender</th>
      <th>region_category</th>
      <th>membership_category</th>
      <th>medium_of_operation</th>
      <th>internet_option</th>
      <th>days_since_last_login</th>
      <th>avg_time_spent</th>
      <th>avg_transaction_value</th>
      <th>points_in_wallet</th>
      <th>complaint_status</th>
      <th>label</th>
      <th>score</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>2</th>
      <td>0.441667</td>
      <td>0</td>
      <td>1.0</td>
      <td>0</td>
      <td>1.0</td>
      <td>1</td>
      <td>0.247297</td>
      <td>0.677323</td>
      <td>-0.423371</td>
      <td>-0.966943</td>
      <td>4</td>
      <td>0</td>
      <td>-0.470634</td>
    </tr>
    <tr>
      <th>3</th>
      <td>0.001566</td>
      <td>1</td>
      <td>2.0</td>
      <td>0</td>
      <td>1.0</td>
      <td>0</td>
      <td>0.234333</td>
      <td>-0.470409</td>
      <td>-0.207139</td>
      <td>-0.621627</td>
      <td>3</td>
      <td>0</td>
      <td>-0.470634</td>
    </tr>
    <tr>
      <th>4</th>
      <td>-0.375663</td>
      <td>0</td>
      <td>2.0</td>
      <td>0</td>
      <td>0.0</td>
      <td>0</td>
      <td>0.273223</td>
      <td>-0.321987</td>
      <td>-0.245940</td>
      <td>-0.129719</td>
      <td>5</td>
      <td>0</td>
      <td>-0.470634</td>
    </tr>
    <tr>
      <th>6</th>
      <td>-1.004378</td>
      <td>1</td>
      <td>1.0</td>
      <td>3</td>
      <td>1.0</td>
      <td>0</td>
      <td>0.230012</td>
      <td>-0.465178</td>
      <td>-1.041617</td>
      <td>0.350588</td>
      <td>4</td>
      <td>0</td>
      <td>0.333609</td>
    </tr>
    <tr>
      <th>10</th>
      <td>-0.752892</td>
      <td>0</td>
      <td>0.0</td>
      <td>1</td>
      <td>2.0</td>
      <td>1</td>
      <td>0.247297</td>
      <td>0.811389</td>
      <td>0.300022</td>
      <td>0.059207</td>
      <td>0</td>
      <td>0</td>
      <td>0.633673</td>
    </tr>
  </tbody>
</table>
</div>
    <div class="colab-df-buttons">

  <div class="colab-df-container">
    <button class="colab-df-convert" onclick="convertToInteractive('df-189cbe67-f8ad-4f90-9e9e-efcdb4c697e6')"
            title="Convert this dataframe to an interactive table."
            style="display:none;">

  <svg xmlns="http://www.w3.org/2000/svg" height="24px" viewBox="0 -960 960 960">
    <path d="M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z"/>
  </svg>
    </button>

  <style>
    .colab-df-container {
      display:flex;
      gap: 12px;
    }

    .colab-df-convert {
      background-color: #E8F0FE;
      border: none;
      border-radius: 50%;
      cursor: pointer;
      display: none;
      fill: #1967D2;
      height: 32px;
      padding: 0 0 0 0;
      width: 32px;
    }

    .colab-df-convert:hover {
      background-color: #E2EBFA;
      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);
      fill: #174EA6;
    }

    .colab-df-buttons div {
      margin-bottom: 4px;
    }

    [theme=dark] .colab-df-convert {
      background-color: #3B4455;
      fill: #D2E3FC;
    }

    [theme=dark] .colab-df-convert:hover {
      background-color: #434B5C;
      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);
      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));
      fill: #FFFFFF;
    }
  </style>

    <script>
      const buttonEl =
        document.querySelector('#df-189cbe67-f8ad-4f90-9e9e-efcdb4c697e6 button.colab-df-convert');
      buttonEl.style.display =
        google.colab.kernel.accessAllowed ? 'block' : 'none';

      async function convertToInteractive(key) {
        const element = document.querySelector('#df-189cbe67-f8ad-4f90-9e9e-efcdb4c697e6');
        const dataTable =
          await google.colab.kernel.invokeFunction('convertToInteractive',
                                                    [key], {});
        if (!dataTable) return;

        const docLinkHtml = 'Like what you see? Visit the ' +
          '<a target="_blank" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'
          + ' to learn more about interactive tables.';
        element.innerHTML = '';
        dataTable['output_type'] = 'display_data';
        await google.colab.output.renderOutput(dataTable, element);
        const docLink = document.createElement('div');
        docLink.innerHTML = docLinkHtml;
        element.appendChild(docLink);
      }
    </script>
  </div>


<div id="df-e61097c2-3610-4448-9fbc-f661fedc149d">
  <button class="colab-df-quickchart" onclick="quickchart('df-e61097c2-3610-4448-9fbc-f661fedc149d')"
            title="Suggest charts"
            style="display:none;">

<svg xmlns="http://www.w3.org/2000/svg" height="24px"viewBox="0 0 24 24"
     width="24px">
    <g>
        <path d="M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z"/>
    </g>
</svg>
  </button>

<style>
  .colab-df-quickchart {
      --bg-color: #E8F0FE;
      --fill-color: #1967D2;
      --hover-bg-color: #E2EBFA;
      --hover-fill-color: #174EA6;
      --disabled-fill-color: #AAA;
      --disabled-bg-color: #DDD;
  }

  [theme=dark] .colab-df-quickchart {
      --bg-color: #3B4455;
      --fill-color: #D2E3FC;
      --hover-bg-color: #434B5C;
      --hover-fill-color: #FFFFFF;
      --disabled-bg-color: #3B4455;
      --disabled-fill-color: #666;
  }

  .colab-df-quickchart {
    background-color: var(--bg-color);
    border: none;
    border-radius: 50%;
    cursor: pointer;
    display: none;
    fill: var(--fill-color);
    height: 32px;
    padding: 0;
    width: 32px;
  }

  .colab-df-quickchart:hover {
    background-color: var(--hover-bg-color);
    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);
    fill: var(--button-hover-fill-color);
  }

  .colab-df-quickchart-complete:disabled,
  .colab-df-quickchart-complete:disabled:hover {
    background-color: var(--disabled-bg-color);
    fill: var(--disabled-fill-color);
    box-shadow: none;
  }

  .colab-df-spinner {
    border: 2px solid var(--fill-color);
    border-color: transparent;
    border-bottom-color: var(--fill-color);
    animation:
      spin 1s steps(1) infinite;
  }

  @keyframes spin {
    0% {
      border-color: transparent;
      border-bottom-color: var(--fill-color);
      border-left-color: var(--fill-color);
    }
    20% {
      border-color: transparent;
      border-left-color: var(--fill-color);
      border-top-color: var(--fill-color);
    }
    30% {
      border-color: transparent;
      border-left-color: var(--fill-color);
      border-top-color: var(--fill-color);
      border-right-color: var(--fill-color);
    }
    40% {
      border-color: transparent;
      border-right-color: var(--fill-color);
      border-top-color: var(--fill-color);
    }
    60% {
      border-color: transparent;
      border-right-color: var(--fill-color);
    }
    80% {
      border-color: transparent;
      border-right-color: var(--fill-color);
      border-bottom-color: var(--fill-color);
    }
    90% {
      border-color: transparent;
      border-bottom-color: var(--fill-color);
    }
  }
</style>

  <script>
    async function quickchart(key) {
      const quickchartButtonEl =
        document.querySelector('#' + key + ' button');
      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.
      quickchartButtonEl.classList.add('colab-df-spinner');
      try {
        const charts = await google.colab.kernel.invokeFunction(
            'suggestCharts', [key], {});
      } catch (error) {
        console.error('Error during call to suggestCharts:', error);
      }
      quickchartButtonEl.classList.remove('colab-df-spinner');
      quickchartButtonEl.classList.add('colab-df-quickchart-complete');
    }
    (() => {
      let quickchartButtonEl =
        document.querySelector('#df-e61097c2-3610-4448-9fbc-f661fedc149d button');
      quickchartButtonEl.style.display =
        google.colab.kernel.accessAllowed ? 'block' : 'none';
    })();
  </script>
</div>

    </div>
  </div>

</div>
</div>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="oJRcyzbmW9Or" data-outputId="655885a0-3fe9-4c08-902a-4e7f249d62d0">
<div class="sourceCode" id="cb99"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb99-1"><a href="#cb99-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Let&#39;s check the split of the data</span></span>
<span id="cb99-2"><a href="#cb99-2" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;</span><span class="sc">{0:0.2f}% d</span><span class="st">ata is in training set&quot;</span>.<span class="bu">format</span>((<span class="bu">len</span>(x_train)<span class="op">/</span><span class="bu">len</span>(df.index)) <span class="op">*</span> <span class="dv">100</span>))</span>
<span id="cb99-3"><a href="#cb99-3" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;</span><span class="sc">{0:0.2f}% d</span><span class="st">ata is in test set&quot;</span>.<span class="bu">format</span>((<span class="bu">len</span>(x_test)<span class="op">/</span><span class="bu">len</span>(df.index)) <span class="op">*</span> <span class="dv">100</span>))</span></code></pre></div>
<div class="output stream stdout">
<pre><code>70.00% data is in training set
30.00% data is in test set
</code></pre>
</div>
</div>
<div class="cell code" id="TQ9sEVEyXAuF">
<div class="sourceCode" id="cb101"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb101-1"><a href="#cb101-1" aria-hidden="true" tabindex="-1"></a>y_train <span class="op">=</span> y_train.astype(<span class="st">&#39;category&#39;</span>)</span>
<span id="cb101-2"><a href="#cb101-2" aria-hidden="true" tabindex="-1"></a>y_test <span class="op">=</span> y_test.astype(<span class="st">&#39;category&#39;</span>)</span></code></pre></div>
</div>
<div class="cell code" id="uHxJl4jvXC6c">
<div class="sourceCode" id="cb102"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb102-1"><a href="#cb102-1" aria-hidden="true" tabindex="-1"></a><span class="co">## Importing the required metrics for Model Evaluation</span></span>
<span id="cb102-2"><a href="#cb102-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb102-3"><a href="#cb102-3" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> sklearn.metrics <span class="im">import</span> (</span>
<span id="cb102-4"><a href="#cb102-4" aria-hidden="true" tabindex="-1"></a>    f1_score,</span>
<span id="cb102-5"><a href="#cb102-5" aria-hidden="true" tabindex="-1"></a>    accuracy_score,</span>
<span id="cb102-6"><a href="#cb102-6" aria-hidden="true" tabindex="-1"></a>    recall_score,</span>
<span id="cb102-7"><a href="#cb102-7" aria-hidden="true" tabindex="-1"></a>    precision_score,</span>
<span id="cb102-8"><a href="#cb102-8" aria-hidden="true" tabindex="-1"></a>    confusion_matrix,</span>
<span id="cb102-9"><a href="#cb102-9" aria-hidden="true" tabindex="-1"></a>    roc_auc_score,</span>
<span id="cb102-10"><a href="#cb102-10" aria-hidden="true" tabindex="-1"></a>    precision_recall_curve,</span>
<span id="cb102-11"><a href="#cb102-11" aria-hidden="true" tabindex="-1"></a>    confusion_matrix,</span>
<span id="cb102-12"><a href="#cb102-12" aria-hidden="true" tabindex="-1"></a>    roc_curve,)</span></code></pre></div>
</div>
<div class="cell markdown" id="21dcVhMEXTWz">

</div>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="E9Zw1I0RNNok" data-outputId="c866eab3-8a66-4879-904f-8e4c5f295c25">
<div class="sourceCode" id="cb103"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb103-1"><a href="#cb103-1" aria-hidden="true" tabindex="-1"></a><span class="co"># !pip install memory_profiler</span></span></code></pre></div>
<div class="output stream stdout">
<pre><code>Collecting memory_profiler
  Downloading memory_profiler-0.61.0-py3-none-any.whl (31 kB)
Requirement already satisfied: psutil in /usr/local/lib/python3.10/dist-packages (from memory_profiler) (5.9.5)
Installing collected packages: memory_profiler
Successfully installed memory_profiler-0.61.0
</code></pre>
</div>
</div>
<div class="cell markdown" id="Vj1hyu4qJBA8">
<p><strong>Setting the seed.</strong></p>
</div>
<div class="cell code" id="wwfHiQEXJDV5">
<div class="sourceCode" id="cb105"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb105-1"><a href="#cb105-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span>
<span id="cb105-2"><a href="#cb105-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb105-3"><a href="#cb105-3" aria-hidden="true" tabindex="-1"></a>np.random.seed(<span class="dv">1</span>)</span></code></pre></div>
</div>
<section id="building-logistic-regression-model" class="cell markdown"
id="GKvnYKh1XTfQ">
<h1><strong>Building Logistic Regression Model</strong></h1>
</section>
<section id="logisticregression-using-cross-validation"
class="cell markdown" id="UzCFSOOj8Vs6">
<h2><strong>LogisticRegression using Cross Validation</strong></h2>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="PxmjHcf37_eq" data-outputId="3e4fd10e-4caa-4ea1-c059-77790109d7b5">
<div class="sourceCode" id="cb106"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb106-1"><a href="#cb106-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> sklearn.linear_model <span class="im">import</span> LogisticRegression</span>
<span id="cb106-2"><a href="#cb106-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> sklearn.model_selection <span class="im">import</span> RandomizedSearchCV</span>
<span id="cb106-3"><a href="#cb106-3" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> time</span>
<span id="cb106-4"><a href="#cb106-4" aria-hidden="true" tabindex="-1"></a><span class="co"># from memory_profiler import memory_usage</span></span>
<span id="cb106-5"><a href="#cb106-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb106-6"><a href="#cb106-6" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> time</span>
<span id="cb106-7"><a href="#cb106-7" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> psutil</span>
<span id="cb106-8"><a href="#cb106-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb106-9"><a href="#cb106-9" aria-hidden="true" tabindex="-1"></a><span class="co"># Measuring start time</span></span>
<span id="cb106-10"><a href="#cb106-10" aria-hidden="true" tabindex="-1"></a>start_time <span class="op">=</span> time.time()</span>
<span id="cb106-11"><a href="#cb106-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb106-12"><a href="#cb106-12" aria-hidden="true" tabindex="-1"></a><span class="co"># Measuring memory usage before and after model fitting</span></span>
<span id="cb106-13"><a href="#cb106-13" aria-hidden="true" tabindex="-1"></a><span class="co"># mem_usage_before = memory_usage(-1, interval=0.1, timeout=1)</span></span>
<span id="cb106-14"><a href="#cb106-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb106-15"><a href="#cb106-15" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb106-16"><a href="#cb106-16" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb106-17"><a href="#cb106-17" aria-hidden="true" tabindex="-1"></a><span class="co"># Defining hyperparameter grid</span></span>
<span id="cb106-18"><a href="#cb106-18" aria-hidden="true" tabindex="-1"></a>param_grid <span class="op">=</span> {</span>
<span id="cb106-19"><a href="#cb106-19" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;penalty&#39;</span>: [<span class="st">&#39;l1&#39;</span>, <span class="st">&#39;l2&#39;</span>],                    <span class="co"># Regularization penalty (&#39;l1&#39; for Lasso, &#39;l2&#39; for Ridge)</span></span>
<span id="cb106-20"><a href="#cb106-20" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;C&#39;</span>: [<span class="fl">0.001</span>, <span class="fl">0.01</span>, <span class="fl">0.1</span>, <span class="dv">1</span>, <span class="dv">10</span>, <span class="dv">100</span>],        <span class="co"># Inverse of regularization strength</span></span>
<span id="cb106-21"><a href="#cb106-21" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;solver&#39;</span>: [<span class="st">&#39;liblinear&#39;</span>, <span class="st">&#39;saga&#39;</span>]             <span class="co"># Algorithm to use in the optimization problem</span></span>
<span id="cb106-22"><a href="#cb106-22" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb106-23"><a href="#cb106-23" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb106-24"><a href="#cb106-24" aria-hidden="true" tabindex="-1"></a><span class="co"># Creating Logistic Regression model</span></span>
<span id="cb106-25"><a href="#cb106-25" aria-hidden="true" tabindex="-1"></a>logistic_model <span class="op">=</span> LogisticRegression(random_state<span class="op">=</span><span class="dv">1</span>)</span>
<span id="cb106-26"><a href="#cb106-26" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb106-27"><a href="#cb106-27" aria-hidden="true" tabindex="-1"></a><span class="co"># Performing RandomizedSearchCV</span></span>
<span id="cb106-28"><a href="#cb106-28" aria-hidden="true" tabindex="-1"></a>random_search <span class="op">=</span> RandomizedSearchCV(estimator<span class="op">=</span>logistic_model, param_distributions<span class="op">=</span>param_grid,</span>
<span id="cb106-29"><a href="#cb106-29" aria-hidden="true" tabindex="-1"></a>                                   n_iter<span class="op">=</span><span class="dv">10</span>, cv<span class="op">=</span><span class="dv">10</span>, scoring<span class="op">=</span>[<span class="st">&#39;accuracy&#39;</span>,<span class="st">&#39;recall&#39;</span>,<span class="st">&#39;f1&#39;</span>,<span class="st">&#39;roc_auc&#39;</span>,<span class="st">&#39;balanced_accuracy&#39;</span>], refit<span class="op">=</span><span class="st">&quot;accuracy&quot;</span>,random_state<span class="op">=</span><span class="dv">1</span>)</span>
<span id="cb106-30"><a href="#cb106-30" aria-hidden="true" tabindex="-1"></a>random_search.fit(x_train, y_train)</span>
<span id="cb106-31"><a href="#cb106-31" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb106-32"><a href="#cb106-32" aria-hidden="true" tabindex="-1"></a><span class="co"># Getting the best parameters and best score</span></span>
<span id="cb106-33"><a href="#cb106-33" aria-hidden="true" tabindex="-1"></a>best_params_logistic <span class="op">=</span> random_search.best_params_</span>
<span id="cb106-34"><a href="#cb106-34" aria-hidden="true" tabindex="-1"></a>best_score_logistic <span class="op">=</span> random_search.best_score_</span></code></pre></div>
<div class="output stream stderr">
<pre><code>/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_sag.py:350: ConvergenceWarning: The max_iter was reached which means the coef_ did not converge
  warnings.warn(
</code></pre>
</div>
</div>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="VV6UJN9YZNnK" data-outputId="7e8253be-b301-4a79-e376-9836bb06815f">
<div class="sourceCode" id="cb108"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb108-1"><a href="#cb108-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Getting the best parameters and best score</span></span>
<span id="cb108-2"><a href="#cb108-2" aria-hidden="true" tabindex="-1"></a><span class="co"># best_params = random_search.best_params_</span></span>
<span id="cb108-3"><a href="#cb108-3" aria-hidden="true" tabindex="-1"></a><span class="co"># best_score = random_search.best_score_</span></span>
<span id="cb108-4"><a href="#cb108-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb108-5"><a href="#cb108-5" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f&#39;The best parameters after hyperparamter tuning are: </span><span class="sc">{</span>best_params_logistic<span class="sc">}</span><span class="ss">&#39;</span>)</span>
<span id="cb108-6"><a href="#cb108-6" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f&quot;The best score after the hypeparamter tuning is: </span><span class="sc">{</span>best_score_logistic<span class="sc">}</span><span class="ss">&quot;</span>)</span></code></pre></div>
<div class="output stream stdout">
<pre><code>The best parameters after hyperparamter tuning are: {&#39;solver&#39;: &#39;liblinear&#39;, &#39;penalty&#39;: &#39;l1&#39;, &#39;C&#39;: 0.01}
The best score after the hypeparamter tuning is: 0.8565704183853861
</code></pre>
</div>
</div>
<div class="cell markdown" id="wsPqxpRqT993">
<p><strong>From the above hyperparamter tuning process, the best
paramters are observed to be "{'solver': 'liblinear', 'penalty': 'l1',
'C': 0.01}" with the best score of "85.65%".</strong></p>
</div>
<section id="evaluation-of-logistic-regression-using-the-training-data"
class="cell markdown" id="esGHy95Y8jPq">
<h1><strong>Evaluation of Logistic Regression using the Training
Data</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="BkqNjGgJ8TcT" data-outputId="54882060-6064-4b36-8bcf-df150df30736">
<div class="sourceCode" id="cb110"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb110-1"><a href="#cb110-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> sklearn.model_selection <span class="im">import</span> cross_val_predict</span>
<span id="cb110-2"><a href="#cb110-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> sklearn <span class="im">import</span> metrics</span>
<span id="cb110-3"><a href="#cb110-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb110-4"><a href="#cb110-4" aria-hidden="true" tabindex="-1"></a><span class="co"># Training the model with the best parameters</span></span>
<span id="cb110-5"><a href="#cb110-5" aria-hidden="true" tabindex="-1"></a>best_logistic_model <span class="op">=</span> LogisticRegression(<span class="op">**</span>best_params_logistic, random_state<span class="op">=</span><span class="dv">1</span>)</span>
<span id="cb110-6"><a href="#cb110-6" aria-hidden="true" tabindex="-1"></a>best_logistic_model.fit(x_train, y_train)</span>
<span id="cb110-7"><a href="#cb110-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb110-8"><a href="#cb110-8" aria-hidden="true" tabindex="-1"></a><span class="co"># Getting cross-validated predictions</span></span>
<span id="cb110-9"><a href="#cb110-9" aria-hidden="true" tabindex="-1"></a><span class="co"># cv_predictions_logistic_train = cross_val_predict(best_logistic_model, x_train, y_train, cv=10)</span></span>
<span id="cb110-10"><a href="#cb110-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb110-11"><a href="#cb110-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb110-12"><a href="#cb110-12" aria-hidden="true" tabindex="-1"></a>cv_predictions_logistic_train <span class="op">=</span> best_logistic_model.predict(x_train)</span>
<span id="cb110-13"><a href="#cb110-13" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb110-14"><a href="#cb110-14" aria-hidden="true" tabindex="-1"></a><span class="co"># Getting the positive class probabilities (probability of being class 1)</span></span>
<span id="cb110-15"><a href="#cb110-15" aria-hidden="true" tabindex="-1"></a><span class="co"># positive_class_probabilities = cv_predictions_logistic_train[:, 1]</span></span>
<span id="cb110-16"><a href="#cb110-16" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb110-17"><a href="#cb110-17" aria-hidden="true" tabindex="-1"></a><span class="co"># Calculating ROC AUC</span></span>
<span id="cb110-18"><a href="#cb110-18" aria-hidden="true" tabindex="-1"></a>roc_auc <span class="op">=</span> roc_auc_score(y_train, cv_predictions_logistic_train)</span>
<span id="cb110-19"><a href="#cb110-19" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb110-20"><a href="#cb110-20" aria-hidden="true" tabindex="-1"></a><span class="co"># Calculating performance metrics based on cross-validated predictions</span></span>
<span id="cb110-21"><a href="#cb110-21" aria-hidden="true" tabindex="-1"></a>train_cv_acc_logistic <span class="op">=</span> metrics.accuracy_score(y_train, cv_predictions_logistic_train)</span>
<span id="cb110-22"><a href="#cb110-22" aria-hidden="true" tabindex="-1"></a>train_cv_recall_logistic <span class="op">=</span> metrics.recall_score(y_train, cv_predictions_logistic_train)</span>
<span id="cb110-23"><a href="#cb110-23" aria-hidden="true" tabindex="-1"></a>train_cv_precision_logistic <span class="op">=</span> metrics.precision_score(y_train, cv_predictions_logistic_train)</span>
<span id="cb110-24"><a href="#cb110-24" aria-hidden="true" tabindex="-1"></a>train_cv_f1_logistic <span class="op">=</span> metrics.f1_score(y_train, cv_predictions_logistic_train)</span>
<span id="cb110-25"><a href="#cb110-25" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb110-26"><a href="#cb110-26" aria-hidden="true" tabindex="-1"></a><span class="co"># Printing the cross-validated performance metrics</span></span>
<span id="cb110-27"><a href="#cb110-27" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&#39;The metrics for the training set using the cross-validation are: &#39;</span>)</span>
<span id="cb110-28"><a href="#cb110-28" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Cross-Validated Accuracy:&quot;</span>, train_cv_acc_logistic)</span>
<span id="cb110-29"><a href="#cb110-29" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Cross-Validated Recall:&quot;</span>, train_cv_recall_logistic)</span>
<span id="cb110-30"><a href="#cb110-30" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Cross-Validated Precision:&quot;</span>, train_cv_precision_logistic)</span>
<span id="cb110-31"><a href="#cb110-31" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Cross-Validated F1 Score:&quot;</span>, train_cv_f1_logistic)</span>
<span id="cb110-32"><a href="#cb110-32" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;ROC AUC:&quot;</span>, roc_auc)</span></code></pre></div>
<div class="output stream stdout">
<pre><code>The metrics for the training set using the cross-validation are: 
Cross-Validated Accuracy: 0.8563347083087802
Cross-Validated Recall: 0.8729221347331584
Cross-Validated Precision: 0.8621732555627565
Cross-Validated F1 Score: 0.8675144006086294
ROC AUC: 0.8549379393318233
</code></pre>
</div>
</div>
<section
id="time-consumption-and-memory-occupancy-of-the-logistic-regression-model-to-train-the-data"
class="cell markdown" id="6D_LTTELIr2y">
<h1><strong>Time Consumption and Memory Occupancy of the Logistic
Regression model to train the data.</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="UI1bVJnmINvT" data-outputId="0e4a4125-05b7-4f13-87c3-e37009dab810">
<div class="sourceCode" id="cb112"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb112-1"><a href="#cb112-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> time</span>
<span id="cb112-2"><a href="#cb112-2" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> psutil</span>
<span id="cb112-3"><a href="#cb112-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Record end time</span></span>
<span id="cb112-4"><a href="#cb112-4" aria-hidden="true" tabindex="-1"></a>end_time <span class="op">=</span> time.time()</span>
<span id="cb112-5"><a href="#cb112-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb112-6"><a href="#cb112-6" aria-hidden="true" tabindex="-1"></a><span class="co"># Calculate execution time</span></span>
<span id="cb112-7"><a href="#cb112-7" aria-hidden="true" tabindex="-1"></a>execution_time_logistic <span class="op">=</span> end_time <span class="op">-</span> start_time</span>
<span id="cb112-8"><a href="#cb112-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb112-9"><a href="#cb112-9" aria-hidden="true" tabindex="-1"></a><span class="co"># Calculate memory usage</span></span>
<span id="cb112-10"><a href="#cb112-10" aria-hidden="true" tabindex="-1"></a>process <span class="op">=</span> psutil.Process()</span>
<span id="cb112-11"><a href="#cb112-11" aria-hidden="true" tabindex="-1"></a>memory_used_logistic <span class="op">=</span> process.memory_info().rss <span class="op">/</span> (<span class="dv">1024</span> <span class="op">*</span> <span class="dv">1024</span>)  <span class="co"># Convert to MiB</span></span>
<span id="cb112-12"><a href="#cb112-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb112-13"><a href="#cb112-13" aria-hidden="true" tabindex="-1"></a><span class="co"># Print results</span></span>
<span id="cb112-14"><a href="#cb112-14" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&#39;Execution Time:&#39;</span>, execution_time_logistic, <span class="st">&#39;seconds&#39;</span>)</span>
<span id="cb112-15"><a href="#cb112-15" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&#39;Memory Used:&#39;</span>, memory_used_logistic, <span class="st">&#39;MiB&#39;</span>)</span></code></pre></div>
<div class="output stream stdout">
<pre><code>Execution Time: 134.49309134483337 seconds
Memory Used: 1380.18359375 MiB
</code></pre>
</div>
</div>
<div class="cell code" id="HxrwxuRn8pvd">
<div class="sourceCode" id="cb114"><pre
class="sourceCode python"><code class="sourceCode python"></code></pre></div>
</div>
<section id="evaluation-of-logistic-regression-using-the-testing-data"
class="cell markdown" id="eta2kWDv8qj2">
<h1><strong>Evaluation of Logistic Regression using the Testing
Data</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="N4S2R3tj8hG7" data-outputId="e083aaf6-6d0e-4902-c3d6-161e876ffff8">
<div class="sourceCode" id="cb115"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb115-1"><a href="#cb115-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> sklearn.model_selection <span class="im">import</span> cross_val_predict</span>
<span id="cb115-2"><a href="#cb115-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> sklearn <span class="im">import</span> metrics</span>
<span id="cb115-3"><a href="#cb115-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb115-4"><a href="#cb115-4" aria-hidden="true" tabindex="-1"></a><span class="co"># Specifying the logistic regression model</span></span>
<span id="cb115-5"><a href="#cb115-5" aria-hidden="true" tabindex="-1"></a><span class="co"># model = LogisticRegression(solver=&quot;liblinear&quot;, random_state=1)</span></span>
<span id="cb115-6"><a href="#cb115-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb115-7"><a href="#cb115-7" aria-hidden="true" tabindex="-1"></a>best_logistic_model.fit(x_test,y_test)</span>
<span id="cb115-8"><a href="#cb115-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb115-9"><a href="#cb115-9" aria-hidden="true" tabindex="-1"></a><span class="co"># Getting cross-validated predictions</span></span>
<span id="cb115-10"><a href="#cb115-10" aria-hidden="true" tabindex="-1"></a><span class="co"># cv_predictions_logistic_test = cross_val_predict(best_logistic_model, x_test, y_test, cv=10)</span></span>
<span id="cb115-11"><a href="#cb115-11" aria-hidden="true" tabindex="-1"></a><span class="co"># best_logistic_model.fit(x_test,y_test)</span></span>
<span id="cb115-12"><a href="#cb115-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb115-13"><a href="#cb115-13" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb115-14"><a href="#cb115-14" aria-hidden="true" tabindex="-1"></a>cv_predictions_logistic_test <span class="op">=</span> best_logistic_model.predict(x_test)</span>
<span id="cb115-15"><a href="#cb115-15" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb115-16"><a href="#cb115-16" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb115-17"><a href="#cb115-17" aria-hidden="true" tabindex="-1"></a><span class="co"># Calculating performance metrics based on cross-validated predictions</span></span>
<span id="cb115-18"><a href="#cb115-18" aria-hidden="true" tabindex="-1"></a>test_cv_acc_logistic <span class="op">=</span> metrics.accuracy_score(y_test, cv_predictions_logistic_test)</span>
<span id="cb115-19"><a href="#cb115-19" aria-hidden="true" tabindex="-1"></a>test_cv_recall_logistic <span class="op">=</span> metrics.recall_score(y_test, cv_predictions_logistic_test)</span>
<span id="cb115-20"><a href="#cb115-20" aria-hidden="true" tabindex="-1"></a>test_cv_precision_logistic <span class="op">=</span> metrics.precision_score(y_test, cv_predictions_logistic_test)</span>
<span id="cb115-21"><a href="#cb115-21" aria-hidden="true" tabindex="-1"></a>test_cv_f1_logistic <span class="op">=</span> metrics.f1_score(y_test, cv_predictions_logistic_test)</span>
<span id="cb115-22"><a href="#cb115-22" aria-hidden="true" tabindex="-1"></a><span class="co"># Calculating ROC AUC</span></span>
<span id="cb115-23"><a href="#cb115-23" aria-hidden="true" tabindex="-1"></a>roc_auc_logistic_test <span class="op">=</span> roc_auc_score(y_test, cv_predictions_logistic_test)</span>
<span id="cb115-24"><a href="#cb115-24" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb115-25"><a href="#cb115-25" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb115-26"><a href="#cb115-26" aria-hidden="true" tabindex="-1"></a><span class="co"># # mem_usage_after = memory_usage(-1, interval=0.1, timeout=1)</span></span>
<span id="cb115-27"><a href="#cb115-27" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb115-28"><a href="#cb115-28" aria-hidden="true" tabindex="-1"></a><span class="co"># # Measuring end time</span></span>
<span id="cb115-29"><a href="#cb115-29" aria-hidden="true" tabindex="-1"></a><span class="co"># # end_time = time.time()</span></span>
<span id="cb115-30"><a href="#cb115-30" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb115-31"><a href="#cb115-31" aria-hidden="true" tabindex="-1"></a><span class="co"># # Calculating the differences</span></span>
<span id="cb115-32"><a href="#cb115-32" aria-hidden="true" tabindex="-1"></a><span class="co"># execution_time_logistic = end_time - start_time</span></span>
<span id="cb115-33"><a href="#cb115-33" aria-hidden="true" tabindex="-1"></a><span class="co"># memory_used_logistic = max(mem_usage_after) - max(mem_usage_before)</span></span>
<span id="cb115-34"><a href="#cb115-34" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb115-35"><a href="#cb115-35" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb115-36"><a href="#cb115-36" aria-hidden="true" tabindex="-1"></a><span class="co"># Printing the cross-validated performance metrics</span></span>
<span id="cb115-37"><a href="#cb115-37" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&#39;The metrics for the testing set using the cross-validation are: &#39;</span>)</span>
<span id="cb115-38"><a href="#cb115-38" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Cross-Validated Accuracy:&quot;</span>, test_cv_acc_logistic)</span>
<span id="cb115-39"><a href="#cb115-39" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Cross-Validated Recall:&quot;</span>, test_cv_recall_logistic)</span>
<span id="cb115-40"><a href="#cb115-40" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Cross-Validated Precision:&quot;</span>, test_cv_precision_logistic)</span>
<span id="cb115-41"><a href="#cb115-41" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Cross-Validated F1 Score:&quot;</span>, test_cv_f1_logistic)</span>
<span id="cb115-42"><a href="#cb115-42" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;ROC AUC:&quot;</span>, roc_auc_logistic_test)</span></code></pre></div>
<div class="output stream stdout">
<pre><code>The metrics for the testing set using the cross-validation are: 
Cross-Validated Accuracy: 0.8435523783337916
Cross-Validated Recall: 0.8539584934665642
Cross-Validated Precision: 0.8543963086388106
Cross-Validated F1 Score: 0.8541773449513069
ROC AUC: 0.8427312491064651
</code></pre>
</div>
</div>
<section id="roc-curve---logistic-regression" class="cell markdown"
id="4_jXNqsbU4vm">
<h1><strong>ROC Curve - Logistic Regression</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;,&quot;height&quot;:472}"
id="-n6bk3XPqr5f" data-outputId="63f05aaa-ce9f-4a8b-c826-38621087f2ca">
<div class="sourceCode" id="cb117"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb117-1"><a href="#cb117-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> matplotlib.pyplot <span class="im">as</span> plt</span>
<span id="cb117-2"><a href="#cb117-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> sklearn.metrics <span class="im">import</span> roc_curve, auc</span>
<span id="cb117-3"><a href="#cb117-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb117-4"><a href="#cb117-4" aria-hidden="true" tabindex="-1"></a><span class="co"># Get probabilities for positive class</span></span>
<span id="cb117-5"><a href="#cb117-5" aria-hidden="true" tabindex="-1"></a>y_probs_logistic <span class="op">=</span> best_logistic_model.predict_proba(x_test)[:, <span class="dv">1</span>]</span>
<span id="cb117-6"><a href="#cb117-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb117-7"><a href="#cb117-7" aria-hidden="true" tabindex="-1"></a><span class="co"># Compute ROC curve</span></span>
<span id="cb117-8"><a href="#cb117-8" aria-hidden="true" tabindex="-1"></a>fpr, tpr, thresholds <span class="op">=</span> roc_curve(y_test, y_probs_logistic)</span>
<span id="cb117-9"><a href="#cb117-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb117-10"><a href="#cb117-10" aria-hidden="true" tabindex="-1"></a><span class="co"># Compute ROC area under the curve</span></span>
<span id="cb117-11"><a href="#cb117-11" aria-hidden="true" tabindex="-1"></a>roc_auc <span class="op">=</span> auc(fpr, tpr)</span>
<span id="cb117-12"><a href="#cb117-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb117-13"><a href="#cb117-13" aria-hidden="true" tabindex="-1"></a><span class="co"># Plot ROC curve</span></span>
<span id="cb117-14"><a href="#cb117-14" aria-hidden="true" tabindex="-1"></a>plt.figure()</span>
<span id="cb117-15"><a href="#cb117-15" aria-hidden="true" tabindex="-1"></a>plt.plot(fpr, tpr, color<span class="op">=</span><span class="st">&#39;darkorange&#39;</span>, lw<span class="op">=</span><span class="dv">2</span>, label<span class="op">=</span><span class="st">&#39;ROC curve (area = </span><span class="sc">%0.2f</span><span class="st">)&#39;</span> <span class="op">%</span> roc_auc)</span>
<span id="cb117-16"><a href="#cb117-16" aria-hidden="true" tabindex="-1"></a>plt.plot([<span class="dv">0</span>, <span class="dv">1</span>], [<span class="dv">0</span>, <span class="dv">1</span>], color<span class="op">=</span><span class="st">&#39;navy&#39;</span>, lw<span class="op">=</span><span class="dv">2</span>, linestyle<span class="op">=</span><span class="st">&#39;--&#39;</span>)</span>
<span id="cb117-17"><a href="#cb117-17" aria-hidden="true" tabindex="-1"></a>plt.xlim([<span class="fl">0.0</span>, <span class="fl">1.0</span>])</span>
<span id="cb117-18"><a href="#cb117-18" aria-hidden="true" tabindex="-1"></a>plt.ylim([<span class="fl">0.0</span>, <span class="fl">1.05</span>])</span>
<span id="cb117-19"><a href="#cb117-19" aria-hidden="true" tabindex="-1"></a>plt.xlabel(<span class="st">&#39;False Positive Rate&#39;</span>)</span>
<span id="cb117-20"><a href="#cb117-20" aria-hidden="true" tabindex="-1"></a>plt.ylabel(<span class="st">&#39;True Positive Rate&#39;</span>)</span>
<span id="cb117-21"><a href="#cb117-21" aria-hidden="true" tabindex="-1"></a>plt.title(<span class="st">&#39;Receiver Operating Characteristic (ROC) Curve&#39;</span>)</span>
<span id="cb117-22"><a href="#cb117-22" aria-hidden="true" tabindex="-1"></a>plt.legend(loc<span class="op">=</span><span class="st">&quot;lower right&quot;</span>)</span>
<span id="cb117-23"><a href="#cb117-23" aria-hidden="true" tabindex="-1"></a>plt.show()</span></code></pre></div>
<div class="output display_data">
<p><img
src="vertopal_11c7d057c72c4c018277cd184609f64f/e5a617f9903f559971cca4dbf1ae7e335393f571.png" /></p>
</div>
</div>
<div class="cell markdown" id="4KmUavSt8vRE">
<p><strong>From the above ROC curve, it can be seen that the logistic
regression is distinguishing better between both the classes - positive
and negative. It can be gleaned from the fact that the area under the
cruve is more than 0.5.</strong></p>
</div>
<section id="training-and-testing-metrics-for-logistic-regression"
class="cell markdown" id="3GXpWRn7GR_v">
<h1><strong>Training and Testing metrics for Logistic
Regression</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;,&quot;height&quot;:223}"
id="6eGLX4zYvGzG" data-outputId="6bfe644b-81f7-4ca9-86e9-211ba7001935">
<div class="sourceCode" id="cb118"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb118-1"><a href="#cb118-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> pandas <span class="im">as</span> pd</span>
<span id="cb118-2"><a href="#cb118-2" aria-hidden="true" tabindex="-1"></a><span class="co"># Creating DataFrame</span></span>
<span id="cb118-3"><a href="#cb118-3" aria-hidden="true" tabindex="-1"></a>logreg_output <span class="op">=</span> pd.DataFrame({</span>
<span id="cb118-4"><a href="#cb118-4" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;Training&#39;</span>: [train_cv_acc_logistic, train_cv_recall_logistic, train_cv_precision_logistic, train_cv_f1_logistic,roc_auc],</span>
<span id="cb118-5"><a href="#cb118-5" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;Testing&#39;</span>: [test_cv_acc_logistic, test_cv_recall_logistic, test_cv_precision_logistic, test_cv_f1_logistic,roc_auc_logistic_test]},</span>
<span id="cb118-6"><a href="#cb118-6" aria-hidden="true" tabindex="-1"></a>    <span class="co"># &#39;Model&#39;: [&#39;Logistic Regression&#39;,&#39;Logistic Regression&#39;,&#39;Logistic Regression&#39;,&#39;Logistic Regression&#39;]},</span></span>
<span id="cb118-7"><a href="#cb118-7" aria-hidden="true" tabindex="-1"></a>                              index <span class="op">=</span> [<span class="st">&#39;Accuracy&#39;</span>,<span class="st">&#39;Recall&#39;</span>,<span class="st">&#39;Precision&#39;</span>,<span class="st">&#39;F1&#39;</span>,<span class="st">&#39;ROC_AUC&#39;</span>])</span>
<span id="cb118-8"><a href="#cb118-8" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;The metrics for Logistic Regression are: &quot;</span>)</span>
<span id="cb118-9"><a href="#cb118-9" aria-hidden="true" tabindex="-1"></a>logreg_output</span></code></pre></div>
<div class="output stream stdout">
<pre><code>The metrics for Logistic Regression are: 
</code></pre>
</div>
<div class="output execute_result" data-execution_count="23">

  <div id="df-f93a16a5-915e-4a5b-81b3-c51903d1ddbe" class="colab-df-container">
    <div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Training</th>
      <th>Testing</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>Accuracy</th>
      <td>0.856335</td>
      <td>0.843552</td>
    </tr>
    <tr>
      <th>Recall</th>
      <td>0.872922</td>
      <td>0.853958</td>
    </tr>
    <tr>
      <th>Precision</th>
      <td>0.862173</td>
      <td>0.854396</td>
    </tr>
    <tr>
      <th>F1</th>
      <td>0.867514</td>
      <td>0.854177</td>
    </tr>
    <tr>
      <th>ROC_AUC</th>
      <td>0.940937</td>
      <td>0.842731</td>
    </tr>
  </tbody>
</table>
</div>
    <div class="colab-df-buttons">

  <div class="colab-df-container">
    <button class="colab-df-convert" onclick="convertToInteractive('df-f93a16a5-915e-4a5b-81b3-c51903d1ddbe')"
            title="Convert this dataframe to an interactive table."
            style="display:none;">

  <svg xmlns="http://www.w3.org/2000/svg" height="24px" viewBox="0 -960 960 960">
    <path d="M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z"/>
  </svg>
    </button>

  <style>
    .colab-df-container {
      display:flex;
      gap: 12px;
    }

    .colab-df-convert {
      background-color: #E8F0FE;
      border: none;
      border-radius: 50%;
      cursor: pointer;
      display: none;
      fill: #1967D2;
      height: 32px;
      padding: 0 0 0 0;
      width: 32px;
    }

    .colab-df-convert:hover {
      background-color: #E2EBFA;
      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);
      fill: #174EA6;
    }

    .colab-df-buttons div {
      margin-bottom: 4px;
    }

    [theme=dark] .colab-df-convert {
      background-color: #3B4455;
      fill: #D2E3FC;
    }

    [theme=dark] .colab-df-convert:hover {
      background-color: #434B5C;
      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);
      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));
      fill: #FFFFFF;
    }
  </style>

    <script>
      const buttonEl =
        document.querySelector('#df-f93a16a5-915e-4a5b-81b3-c51903d1ddbe button.colab-df-convert');
      buttonEl.style.display =
        google.colab.kernel.accessAllowed ? 'block' : 'none';

      async function convertToInteractive(key) {
        const element = document.querySelector('#df-f93a16a5-915e-4a5b-81b3-c51903d1ddbe');
        const dataTable =
          await google.colab.kernel.invokeFunction('convertToInteractive',
                                                    [key], {});
        if (!dataTable) return;

        const docLinkHtml = 'Like what you see? Visit the ' +
          '<a target="_blank" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'
          + ' to learn more about interactive tables.';
        element.innerHTML = '';
        dataTable['output_type'] = 'display_data';
        await google.colab.output.renderOutput(dataTable, element);
        const docLink = document.createElement('div');
        docLink.innerHTML = docLinkHtml;
        element.appendChild(docLink);
      }
    </script>
  </div>


<div id="df-1bcab0b0-057a-40c0-84c8-ad1252cf3b76">
  <button class="colab-df-quickchart" onclick="quickchart('df-1bcab0b0-057a-40c0-84c8-ad1252cf3b76')"
            title="Suggest charts"
            style="display:none;">

<svg xmlns="http://www.w3.org/2000/svg" height="24px"viewBox="0 0 24 24"
     width="24px">
    <g>
        <path d="M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z"/>
    </g>
</svg>
  </button>

<style>
  .colab-df-quickchart {
      --bg-color: #E8F0FE;
      --fill-color: #1967D2;
      --hover-bg-color: #E2EBFA;
      --hover-fill-color: #174EA6;
      --disabled-fill-color: #AAA;
      --disabled-bg-color: #DDD;
  }

  [theme=dark] .colab-df-quickchart {
      --bg-color: #3B4455;
      --fill-color: #D2E3FC;
      --hover-bg-color: #434B5C;
      --hover-fill-color: #FFFFFF;
      --disabled-bg-color: #3B4455;
      --disabled-fill-color: #666;
  }

  .colab-df-quickchart {
    background-color: var(--bg-color);
    border: none;
    border-radius: 50%;
    cursor: pointer;
    display: none;
    fill: var(--fill-color);
    height: 32px;
    padding: 0;
    width: 32px;
  }

  .colab-df-quickchart:hover {
    background-color: var(--hover-bg-color);
    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);
    fill: var(--button-hover-fill-color);
  }

  .colab-df-quickchart-complete:disabled,
  .colab-df-quickchart-complete:disabled:hover {
    background-color: var(--disabled-bg-color);
    fill: var(--disabled-fill-color);
    box-shadow: none;
  }

  .colab-df-spinner {
    border: 2px solid var(--fill-color);
    border-color: transparent;
    border-bottom-color: var(--fill-color);
    animation:
      spin 1s steps(1) infinite;
  }

  @keyframes spin {
    0% {
      border-color: transparent;
      border-bottom-color: var(--fill-color);
      border-left-color: var(--fill-color);
    }
    20% {
      border-color: transparent;
      border-left-color: var(--fill-color);
      border-top-color: var(--fill-color);
    }
    30% {
      border-color: transparent;
      border-left-color: var(--fill-color);
      border-top-color: var(--fill-color);
      border-right-color: var(--fill-color);
    }
    40% {
      border-color: transparent;
      border-right-color: var(--fill-color);
      border-top-color: var(--fill-color);
    }
    60% {
      border-color: transparent;
      border-right-color: var(--fill-color);
    }
    80% {
      border-color: transparent;
      border-right-color: var(--fill-color);
      border-bottom-color: var(--fill-color);
    }
    90% {
      border-color: transparent;
      border-bottom-color: var(--fill-color);
    }
  }
</style>

  <script>
    async function quickchart(key) {
      const quickchartButtonEl =
        document.querySelector('#' + key + ' button');
      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.
      quickchartButtonEl.classList.add('colab-df-spinner');
      try {
        const charts = await google.colab.kernel.invokeFunction(
            'suggestCharts', [key], {});
      } catch (error) {
        console.error('Error during call to suggestCharts:', error);
      }
      quickchartButtonEl.classList.remove('colab-df-spinner');
      quickchartButtonEl.classList.add('colab-df-quickchart-complete');
    }
    (() => {
      let quickchartButtonEl =
        document.querySelector('#df-1bcab0b0-057a-40c0-84c8-ad1252cf3b76 button');
      quickchartButtonEl.style.display =
        google.colab.kernel.accessAllowed ? 'block' : 'none';
    })();
  </script>
</div>

  <div id="id_87da58b7-820a-43d6-b294-efe7bdf3f4ed">
    <style>
      .colab-df-generate {
        background-color: #E8F0FE;
        border: none;
        border-radius: 50%;
        cursor: pointer;
        display: none;
        fill: #1967D2;
        height: 32px;
        padding: 0 0 0 0;
        width: 32px;
      }

      .colab-df-generate:hover {
        background-color: #E2EBFA;
        box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);
        fill: #174EA6;
      }

      [theme=dark] .colab-df-generate {
        background-color: #3B4455;
        fill: #D2E3FC;
      }

      [theme=dark] .colab-df-generate:hover {
        background-color: #434B5C;
        box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);
        filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));
        fill: #FFFFFF;
      }
    </style>
    <button class="colab-df-generate" onclick="generateWithVariable('logreg_output')"
            title="Generate code using this dataframe."
            style="display:none;">

  <svg xmlns="http://www.w3.org/2000/svg" height="24px"viewBox="0 0 24 24"
       width="24px">
    <path d="M7,19H8.4L18.45,9,17,7.55,7,17.6ZM5,21V16.75L18.45,3.32a2,2,0,0,1,2.83,0l1.4,1.43a1.91,1.91,0,0,1,.58,1.4,1.91,1.91,0,0,1-.58,1.4L9.25,21ZM18.45,9,17,7.55Zm-12,3A5.31,5.31,0,0,0,4.9,8.1,5.31,5.31,0,0,0,1,6.5,5.31,5.31,0,0,0,4.9,4.9,5.31,5.31,0,0,0,6.5,1,5.31,5.31,0,0,0,8.1,4.9,5.31,5.31,0,0,0,12,6.5,5.46,5.46,0,0,0,6.5,12Z"/>
  </svg>
    </button>
    <script>
      (() => {
      const buttonEl =
        document.querySelector('#id_87da58b7-820a-43d6-b294-efe7bdf3f4ed button.colab-df-generate');
      buttonEl.style.display =
        google.colab.kernel.accessAllowed ? 'block' : 'none';

      buttonEl.onclick = () => {
        google.colab.notebook.generateWithVariable('logreg_output');
      }
      })();
    </script>
  </div>

    </div>
  </div>

</div>
</div>
<div class="cell markdown" id="0JfmWuiHjZb_">
<p><strong>From the above metrics, it can be observed that there is no
possible chance of overfitting in the Logistic Regresison Model.
Additionally, the test set accuracy of 84.35% explains that Logistic
Regression is performing better on unseen data to predict the customer
churn. Furthermore, it seems to perform better in phase 2 than in phase
1 with a slight increase in the accuracy by about 0.32%.</strong></p>
<p><strong>The recall score is also good with 85.39%.</strong></p>
</div>
<div class="cell code" id="tb5GB8LAAcV8">
<div class="sourceCode" id="cb120"><pre
class="sourceCode python"><code class="sourceCode python"></code></pre></div>
</div>
<div class="cell code" id="3bCNnLf7AcY3">
<div class="sourceCode" id="cb121"><pre
class="sourceCode python"><code class="sourceCode python"></code></pre></div>
</div>
<section id="variable-importance-from-logistic-regression"
class="cell markdown" id="_AUNqZ66kS6m">
<h1><strong>Variable Importance from Logistic Regression</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;,&quot;height&quot;:564}"
id="Tb26DdWRvmJd" data-outputId="cae99d46-0b10-46b2-e5b8-e936612b537f">
<div class="sourceCode" id="cb122"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb122-1"><a href="#cb122-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span>
<span id="cb122-2"><a href="#cb122-2" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> matplotlib.pyplot <span class="im">as</span> plt</span>
<span id="cb122-3"><a href="#cb122-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb122-4"><a href="#cb122-4" aria-hidden="true" tabindex="-1"></a><span class="co"># Getting the coefficients (weights) of the features</span></span>
<span id="cb122-5"><a href="#cb122-5" aria-hidden="true" tabindex="-1"></a>coefficients <span class="op">=</span> best_logistic_model.coef_[<span class="dv">0</span>]</span>
<span id="cb122-6"><a href="#cb122-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb122-7"><a href="#cb122-7" aria-hidden="true" tabindex="-1"></a><span class="co"># Getting the absolute values of the coefficients to represent importance</span></span>
<span id="cb122-8"><a href="#cb122-8" aria-hidden="true" tabindex="-1"></a>importance <span class="op">=</span> np.<span class="bu">abs</span>(coefficients)</span>
<span id="cb122-9"><a href="#cb122-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb122-10"><a href="#cb122-10" aria-hidden="true" tabindex="-1"></a><span class="co"># Getting the names of the features</span></span>
<span id="cb122-11"><a href="#cb122-11" aria-hidden="true" tabindex="-1"></a>feature_names <span class="op">=</span> x_test.columns</span>
<span id="cb122-12"><a href="#cb122-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb122-13"><a href="#cb122-13" aria-hidden="true" tabindex="-1"></a><span class="co"># Sorting the features based on their importance</span></span>
<span id="cb122-14"><a href="#cb122-14" aria-hidden="true" tabindex="-1"></a>sorted_indices <span class="op">=</span> np.argsort(importance)[::<span class="op">-</span><span class="dv">1</span>]</span>
<span id="cb122-15"><a href="#cb122-15" aria-hidden="true" tabindex="-1"></a>sorted_importance <span class="op">=</span> importance[sorted_indices]</span>
<span id="cb122-16"><a href="#cb122-16" aria-hidden="true" tabindex="-1"></a>sorted_feature_names <span class="op">=</span> feature_names[sorted_indices]</span>
<span id="cb122-17"><a href="#cb122-17" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb122-18"><a href="#cb122-18" aria-hidden="true" tabindex="-1"></a><span class="co"># Plotting</span></span>
<span id="cb122-19"><a href="#cb122-19" aria-hidden="true" tabindex="-1"></a>plt.figure(figsize<span class="op">=</span>(<span class="dv">10</span>, <span class="dv">6</span>))</span>
<span id="cb122-20"><a href="#cb122-20" aria-hidden="true" tabindex="-1"></a>plt.barh(<span class="bu">range</span>(<span class="bu">len</span>(sorted_importance)), sorted_importance, align<span class="op">=</span><span class="st">&#39;center&#39;</span>)</span>
<span id="cb122-21"><a href="#cb122-21" aria-hidden="true" tabindex="-1"></a>plt.yticks(<span class="bu">range</span>(<span class="bu">len</span>(sorted_importance)), sorted_feature_names)</span>
<span id="cb122-22"><a href="#cb122-22" aria-hidden="true" tabindex="-1"></a>plt.xlabel(<span class="st">&#39;Importance&#39;</span>)</span>
<span id="cb122-23"><a href="#cb122-23" aria-hidden="true" tabindex="-1"></a>plt.title(<span class="st">&#39;Variable Importance for Logistic Regression Model (Test Set)&#39;</span>)</span>
<span id="cb122-24"><a href="#cb122-24" aria-hidden="true" tabindex="-1"></a>plt.gca().invert_yaxis()</span>
<span id="cb122-25"><a href="#cb122-25" aria-hidden="true" tabindex="-1"></a>plt.show()</span></code></pre></div>
<div class="output display_data">
<p><img
src="vertopal_11c7d057c72c4c018277cd184609f64f/e56649be6878265ce214378c99117f64f3085d1c.png" /></p>
</div>
</div>
<div class="cell markdown" id="ebvpb7cJxmL5">
<p><strong>From the variable importance graph it can be seen that
"membership_category "is appeared to be the most important for the churn
prediction of this organization. The importance value of the
"membership" is very high compared to the other important variables in
the data. Furthermore, it can be seen that the newly added variable of
"score" also stands in the second place in the order of hierarchy. This
states that the sentiment score is one of the profound features to
anticipate the customer churn while using the logistic gression model.
In addition to membership_category and score; points_in_wallet,
region_categiry and avg_transaction_value also found to be relatively
important.</strong></p>
</div>
<section
id="confusion-matrix-for-training-and-testing-data-of-logisticregression"
class="cell markdown" id="wAX-S4IFClzv">
<h1><strong>Confusion Matrix for Training and Testing Data of
LogisticRegression</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;,&quot;height&quot;:1000}"
id="wv16JAUZ93j5" data-outputId="1917083e-9c0d-40a8-c26b-1e51d5157a86">
<div class="sourceCode" id="cb123"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb123-1"><a href="#cb123-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> sklearn.metrics <span class="im">import</span> confusion_matrix</span>
<span id="cb123-2"><a href="#cb123-2" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> seaborn <span class="im">as</span> sns</span>
<span id="cb123-3"><a href="#cb123-3" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> matplotlib.pyplot <span class="im">as</span> plt</span>
<span id="cb123-4"><a href="#cb123-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb123-5"><a href="#cb123-5" aria-hidden="true" tabindex="-1"></a><span class="co"># Generate confusion matrix for training set</span></span>
<span id="cb123-6"><a href="#cb123-6" aria-hidden="true" tabindex="-1"></a><span class="co"># Get cross-validated predictions</span></span>
<span id="cb123-7"><a href="#cb123-7" aria-hidden="true" tabindex="-1"></a><span class="co"># cv_predictions = cross_val_predict(best_logistic_model, x_train, y_train, cv=10)</span></span>
<span id="cb123-8"><a href="#cb123-8" aria-hidden="true" tabindex="-1"></a>conf_matrix_train <span class="op">=</span> confusion_matrix(y_train, cv_predictions_logistic_train)</span>
<span id="cb123-9"><a href="#cb123-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb123-10"><a href="#cb123-10" aria-hidden="true" tabindex="-1"></a><span class="co"># Plotting confusion matrix for training set</span></span>
<span id="cb123-11"><a href="#cb123-11" aria-hidden="true" tabindex="-1"></a>plt.figure(figsize<span class="op">=</span>(<span class="dv">8</span>, <span class="dv">6</span>))</span>
<span id="cb123-12"><a href="#cb123-12" aria-hidden="true" tabindex="-1"></a>sns.heatmap(conf_matrix_train, annot<span class="op">=</span><span class="va">True</span>, fmt<span class="op">=</span><span class="st">&#39;d&#39;</span>, cmap<span class="op">=</span><span class="st">&#39;Blues&#39;</span>,</span>
<span id="cb123-13"><a href="#cb123-13" aria-hidden="true" tabindex="-1"></a>            xticklabels<span class="op">=</span>[<span class="st">&#39;Predicted Negative&#39;</span>, <span class="st">&#39;Predicted Positive&#39;</span>],</span>
<span id="cb123-14"><a href="#cb123-14" aria-hidden="true" tabindex="-1"></a>            yticklabels<span class="op">=</span>[<span class="st">&#39;Actual Negative&#39;</span>, <span class="st">&#39;Actual Positive&#39;</span>])</span>
<span id="cb123-15"><a href="#cb123-15" aria-hidden="true" tabindex="-1"></a>plt.title(<span class="st">&#39;Confusion Matrix - Training Set&#39;</span>)</span>
<span id="cb123-16"><a href="#cb123-16" aria-hidden="true" tabindex="-1"></a>plt.xlabel(<span class="st">&#39;Predicted labels&#39;</span>)</span>
<span id="cb123-17"><a href="#cb123-17" aria-hidden="true" tabindex="-1"></a>plt.ylabel(<span class="st">&#39;True labels&#39;</span>)</span>
<span id="cb123-18"><a href="#cb123-18" aria-hidden="true" tabindex="-1"></a>plt.show()</span>
<span id="cb123-19"><a href="#cb123-19" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb123-20"><a href="#cb123-20" aria-hidden="true" tabindex="-1"></a><span class="co"># Generate confusion matrix for testing set</span></span>
<span id="cb123-21"><a href="#cb123-21" aria-hidden="true" tabindex="-1"></a><span class="co"># cv_predictions = cross_val_predict(best_logistic_model, x_test, y_test, cv=10)</span></span>
<span id="cb123-22"><a href="#cb123-22" aria-hidden="true" tabindex="-1"></a>conf_matrix_test <span class="op">=</span> confusion_matrix(y_test, cv_predictions_logistic_test)</span>
<span id="cb123-23"><a href="#cb123-23" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb123-24"><a href="#cb123-24" aria-hidden="true" tabindex="-1"></a><span class="co"># Plotting confusion matrix for testing set</span></span>
<span id="cb123-25"><a href="#cb123-25" aria-hidden="true" tabindex="-1"></a>plt.figure(figsize<span class="op">=</span>(<span class="dv">8</span>, <span class="dv">6</span>))</span>
<span id="cb123-26"><a href="#cb123-26" aria-hidden="true" tabindex="-1"></a>sns.heatmap(conf_matrix_test, annot<span class="op">=</span><span class="va">True</span>, fmt<span class="op">=</span><span class="st">&#39;d&#39;</span>, cmap<span class="op">=</span><span class="st">&#39;Blues&#39;</span>,</span>
<span id="cb123-27"><a href="#cb123-27" aria-hidden="true" tabindex="-1"></a>            xticklabels<span class="op">=</span>[<span class="st">&#39;Predicted Negative&#39;</span>, <span class="st">&#39;Predicted Positive&#39;</span>],</span>
<span id="cb123-28"><a href="#cb123-28" aria-hidden="true" tabindex="-1"></a>            yticklabels<span class="op">=</span>[<span class="st">&#39;Actual Negative&#39;</span>, <span class="st">&#39;Actual Positive&#39;</span>])</span>
<span id="cb123-29"><a href="#cb123-29" aria-hidden="true" tabindex="-1"></a>plt.title(<span class="st">&#39;Confusion Matrix - Testing Set&#39;</span>)</span>
<span id="cb123-30"><a href="#cb123-30" aria-hidden="true" tabindex="-1"></a>plt.xlabel(<span class="st">&#39;Predicted labels&#39;</span>)</span>
<span id="cb123-31"><a href="#cb123-31" aria-hidden="true" tabindex="-1"></a>plt.ylabel(<span class="st">&#39;True labels&#39;</span>)</span>
<span id="cb123-32"><a href="#cb123-32" aria-hidden="true" tabindex="-1"></a>plt.show()</span></code></pre></div>
<div class="output display_data">
<p><img
src="vertopal_11c7d057c72c4c018277cd184609f64f/3c1a49912a629fa1600c2bc9af16ac279dd3ffca.png" /></p>
</div>
<div class="output display_data">
<p><img
src="vertopal_11c7d057c72c4c018277cd184609f64f/2c631f9456bc800f13323ad9b7259bc39886ee3f.png" /></p>
</div>
</div>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="RyjrY7QDarbM" data-outputId="84629009-81f4-4ad1-c353-601819490c47">
<div class="sourceCode" id="cb124"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb124-1"><a href="#cb124-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Extracting true negatives, false positives, false negatives, and true positives from confusion matrix</span></span>
<span id="cb124-2"><a href="#cb124-2" aria-hidden="true" tabindex="-1"></a>tn_train, fp_train, fn_train, tp_train <span class="op">=</span> conf_matrix_train.ravel()</span>
<span id="cb124-3"><a href="#cb124-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb124-4"><a href="#cb124-4" aria-hidden="true" tabindex="-1"></a><span class="co"># Calculating specificity and sensitivity</span></span>
<span id="cb124-5"><a href="#cb124-5" aria-hidden="true" tabindex="-1"></a>specificity_logistic_train <span class="op">=</span> tn_train <span class="op">/</span> (tn_train <span class="op">+</span> fp_train)</span>
<span id="cb124-6"><a href="#cb124-6" aria-hidden="true" tabindex="-1"></a>sensitivity_logistic_train <span class="op">=</span> tp_train <span class="op">/</span> (tp_train <span class="op">+</span> fn_train)</span>
<span id="cb124-7"><a href="#cb124-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb124-8"><a href="#cb124-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb124-9"><a href="#cb124-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb124-10"><a href="#cb124-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb124-11"><a href="#cb124-11" aria-hidden="true" tabindex="-1"></a><span class="co"># Extracting true negatives, false positives, false negatives, and true positives from confusion matrix</span></span>
<span id="cb124-12"><a href="#cb124-12" aria-hidden="true" tabindex="-1"></a>tn_test, fp_test, fn_test, tp_test <span class="op">=</span> conf_matrix_test.ravel()</span>
<span id="cb124-13"><a href="#cb124-13" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb124-14"><a href="#cb124-14" aria-hidden="true" tabindex="-1"></a><span class="co"># Calculating specificity and sensitivity</span></span>
<span id="cb124-15"><a href="#cb124-15" aria-hidden="true" tabindex="-1"></a>specificity_logistic_test <span class="op">=</span> tn_test <span class="op">/</span> (tn_test <span class="op">+</span> fp_test)</span>
<span id="cb124-16"><a href="#cb124-16" aria-hidden="true" tabindex="-1"></a>sensitivity_logistic_test <span class="op">=</span> tp_test <span class="op">/</span> (tp_test <span class="op">+</span> fn_test)</span>
<span id="cb124-17"><a href="#cb124-17" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb124-18"><a href="#cb124-18" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb124-19"><a href="#cb124-19" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f&quot;The Sensitivity and Specificity of the Logistic Regression on the training data:</span><span class="ch">\n</span><span class="ss">Specificity:</span><span class="sc">{</span>specificity_logistic_train<span class="sc">}</span><span class="ch">\n</span><span class="ss">Sensitivity:</span><span class="sc">{</span>sensitivity_logistic_train<span class="sc">}</span><span class="ch">\n\n</span><span class="ss">&quot;</span>)</span>
<span id="cb124-20"><a href="#cb124-20" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f&quot;The Sensitivity and Specificity of the Logistic Regression on the testing data:</span><span class="ch">\n</span><span class="ss">Specificity:</span><span class="sc">{</span>specificity_logistic_test<span class="sc">}</span><span class="ch">\n</span><span class="ss">Sensitivity:</span><span class="sc">{</span>sensitivity_logistic_test<span class="sc">}</span><span class="ss">&quot;</span>)</span>
<span id="cb124-21"><a href="#cb124-21" aria-hidden="true" tabindex="-1"></a></span></code></pre></div>
<div class="output stream stdout">
<pre><code>The Sensitivity and Specificity of the Logistic Regression on the training data:
Specificity:0.8369537439304882
Sensitivity:0.8729221347331584


The Sensitivity and Specificity of the Logistic Regression on the testing data:
Specificity:0.8315040047463661
Sensitivity:0.8539584934665642
</code></pre>
</div>
</div>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;,&quot;height&quot;:286}"
id="s49VQLtMApK5" data-outputId="f63720eb-05c9-412c-b49c-a5dc3eca6ef7">
<div class="sourceCode" id="cb126"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb126-1"><a href="#cb126-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> pandas <span class="im">as</span> pd</span>
<span id="cb126-2"><a href="#cb126-2" aria-hidden="true" tabindex="-1"></a><span class="co"># Creating DataFrame</span></span>
<span id="cb126-3"><a href="#cb126-3" aria-hidden="true" tabindex="-1"></a>logreg_output <span class="op">=</span> pd.DataFrame({</span>
<span id="cb126-4"><a href="#cb126-4" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;Training&#39;</span>: [train_cv_acc_logistic, train_cv_recall_logistic, train_cv_precision_logistic, train_cv_f1_logistic,roc_auc,</span>
<span id="cb126-5"><a href="#cb126-5" aria-hidden="true" tabindex="-1"></a>                 specificity_logistic_train,sensitivity_logistic_train],</span>
<span id="cb126-6"><a href="#cb126-6" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;Testing&#39;</span>: [test_cv_acc_logistic, test_cv_recall_logistic, test_cv_precision_logistic, test_cv_f1_logistic,roc_auc_logistic_test,</span>
<span id="cb126-7"><a href="#cb126-7" aria-hidden="true" tabindex="-1"></a>                specificity_logistic_test, sensitivity_logistic_test]},</span>
<span id="cb126-8"><a href="#cb126-8" aria-hidden="true" tabindex="-1"></a>    <span class="co"># &#39;Model&#39;: [&#39;Logistic Regression&#39;,&#39;Logistic Regression&#39;,&#39;Logistic Regression&#39;,&#39;Logistic Regression&#39;]},</span></span>
<span id="cb126-9"><a href="#cb126-9" aria-hidden="true" tabindex="-1"></a>                              index <span class="op">=</span> [<span class="st">&#39;Accuracy&#39;</span>,<span class="st">&#39;Recall&#39;</span>,<span class="st">&#39;Precision&#39;</span>,<span class="st">&#39;F1&#39;</span>,<span class="st">&#39;ROC_AUC&#39;</span>,<span class="st">&#39;Specificity&#39;</span>,<span class="st">&#39;Sensitivity&#39;</span>])</span>
<span id="cb126-10"><a href="#cb126-10" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;The metrics for Logistic Regression are: &quot;</span>)</span>
<span id="cb126-11"><a href="#cb126-11" aria-hidden="true" tabindex="-1"></a>logreg_output</span></code></pre></div>
<div class="output stream stdout">
<pre><code>The metrics for Logistic Regression are: 
</code></pre>
</div>
<div class="output execute_result" data-execution_count="27">

  <div id="df-92a7fd63-2b1b-4ed8-98a5-74ab8a5103df" class="colab-df-container">
    <div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Training</th>
      <th>Testing</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>Accuracy</th>
      <td>0.856335</td>
      <td>0.843552</td>
    </tr>
    <tr>
      <th>Recall</th>
      <td>0.872922</td>
      <td>0.853958</td>
    </tr>
    <tr>
      <th>Precision</th>
      <td>0.862173</td>
      <td>0.854396</td>
    </tr>
    <tr>
      <th>F1</th>
      <td>0.867514</td>
      <td>0.854177</td>
    </tr>
    <tr>
      <th>ROC_AUC</th>
      <td>0.940937</td>
      <td>0.842731</td>
    </tr>
    <tr>
      <th>Specificity</th>
      <td>0.836954</td>
      <td>0.831504</td>
    </tr>
    <tr>
      <th>Sensitivity</th>
      <td>0.872922</td>
      <td>0.853958</td>
    </tr>
  </tbody>
</table>
</div>
    <div class="colab-df-buttons">

  <div class="colab-df-container">
    <button class="colab-df-convert" onclick="convertToInteractive('df-92a7fd63-2b1b-4ed8-98a5-74ab8a5103df')"
            title="Convert this dataframe to an interactive table."
            style="display:none;">

  <svg xmlns="http://www.w3.org/2000/svg" height="24px" viewBox="0 -960 960 960">
    <path d="M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z"/>
  </svg>
    </button>

  <style>
    .colab-df-container {
      display:flex;
      gap: 12px;
    }

    .colab-df-convert {
      background-color: #E8F0FE;
      border: none;
      border-radius: 50%;
      cursor: pointer;
      display: none;
      fill: #1967D2;
      height: 32px;
      padding: 0 0 0 0;
      width: 32px;
    }

    .colab-df-convert:hover {
      background-color: #E2EBFA;
      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);
      fill: #174EA6;
    }

    .colab-df-buttons div {
      margin-bottom: 4px;
    }

    [theme=dark] .colab-df-convert {
      background-color: #3B4455;
      fill: #D2E3FC;
    }

    [theme=dark] .colab-df-convert:hover {
      background-color: #434B5C;
      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);
      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));
      fill: #FFFFFF;
    }
  </style>

    <script>
      const buttonEl =
        document.querySelector('#df-92a7fd63-2b1b-4ed8-98a5-74ab8a5103df button.colab-df-convert');
      buttonEl.style.display =
        google.colab.kernel.accessAllowed ? 'block' : 'none';

      async function convertToInteractive(key) {
        const element = document.querySelector('#df-92a7fd63-2b1b-4ed8-98a5-74ab8a5103df');
        const dataTable =
          await google.colab.kernel.invokeFunction('convertToInteractive',
                                                    [key], {});
        if (!dataTable) return;

        const docLinkHtml = 'Like what you see? Visit the ' +
          '<a target="_blank" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'
          + ' to learn more about interactive tables.';
        element.innerHTML = '';
        dataTable['output_type'] = 'display_data';
        await google.colab.output.renderOutput(dataTable, element);
        const docLink = document.createElement('div');
        docLink.innerHTML = docLinkHtml;
        element.appendChild(docLink);
      }
    </script>
  </div>


<div id="df-aed69960-bdde-4aaa-9b0d-bb1029e8a045">
  <button class="colab-df-quickchart" onclick="quickchart('df-aed69960-bdde-4aaa-9b0d-bb1029e8a045')"
            title="Suggest charts"
            style="display:none;">

<svg xmlns="http://www.w3.org/2000/svg" height="24px"viewBox="0 0 24 24"
     width="24px">
    <g>
        <path d="M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z"/>
    </g>
</svg>
  </button>

<style>
  .colab-df-quickchart {
      --bg-color: #E8F0FE;
      --fill-color: #1967D2;
      --hover-bg-color: #E2EBFA;
      --hover-fill-color: #174EA6;
      --disabled-fill-color: #AAA;
      --disabled-bg-color: #DDD;
  }

  [theme=dark] .colab-df-quickchart {
      --bg-color: #3B4455;
      --fill-color: #D2E3FC;
      --hover-bg-color: #434B5C;
      --hover-fill-color: #FFFFFF;
      --disabled-bg-color: #3B4455;
      --disabled-fill-color: #666;
  }

  .colab-df-quickchart {
    background-color: var(--bg-color);
    border: none;
    border-radius: 50%;
    cursor: pointer;
    display: none;
    fill: var(--fill-color);
    height: 32px;
    padding: 0;
    width: 32px;
  }

  .colab-df-quickchart:hover {
    background-color: var(--hover-bg-color);
    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);
    fill: var(--button-hover-fill-color);
  }

  .colab-df-quickchart-complete:disabled,
  .colab-df-quickchart-complete:disabled:hover {
    background-color: var(--disabled-bg-color);
    fill: var(--disabled-fill-color);
    box-shadow: none;
  }

  .colab-df-spinner {
    border: 2px solid var(--fill-color);
    border-color: transparent;
    border-bottom-color: var(--fill-color);
    animation:
      spin 1s steps(1) infinite;
  }

  @keyframes spin {
    0% {
      border-color: transparent;
      border-bottom-color: var(--fill-color);
      border-left-color: var(--fill-color);
    }
    20% {
      border-color: transparent;
      border-left-color: var(--fill-color);
      border-top-color: var(--fill-color);
    }
    30% {
      border-color: transparent;
      border-left-color: var(--fill-color);
      border-top-color: var(--fill-color);
      border-right-color: var(--fill-color);
    }
    40% {
      border-color: transparent;
      border-right-color: var(--fill-color);
      border-top-color: var(--fill-color);
    }
    60% {
      border-color: transparent;
      border-right-color: var(--fill-color);
    }
    80% {
      border-color: transparent;
      border-right-color: var(--fill-color);
      border-bottom-color: var(--fill-color);
    }
    90% {
      border-color: transparent;
      border-bottom-color: var(--fill-color);
    }
  }
</style>

  <script>
    async function quickchart(key) {
      const quickchartButtonEl =
        document.querySelector('#' + key + ' button');
      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.
      quickchartButtonEl.classList.add('colab-df-spinner');
      try {
        const charts = await google.colab.kernel.invokeFunction(
            'suggestCharts', [key], {});
      } catch (error) {
        console.error('Error during call to suggestCharts:', error);
      }
      quickchartButtonEl.classList.remove('colab-df-spinner');
      quickchartButtonEl.classList.add('colab-df-quickchart-complete');
    }
    (() => {
      let quickchartButtonEl =
        document.querySelector('#df-aed69960-bdde-4aaa-9b0d-bb1029e8a045 button');
      quickchartButtonEl.style.display =
        google.colab.kernel.accessAllowed ? 'block' : 'none';
    })();
  </script>
</div>

  <div id="id_3ae6dede-120c-48dc-8bb6-b8ee0ba9508d">
    <style>
      .colab-df-generate {
        background-color: #E8F0FE;
        border: none;
        border-radius: 50%;
        cursor: pointer;
        display: none;
        fill: #1967D2;
        height: 32px;
        padding: 0 0 0 0;
        width: 32px;
      }

      .colab-df-generate:hover {
        background-color: #E2EBFA;
        box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);
        fill: #174EA6;
      }

      [theme=dark] .colab-df-generate {
        background-color: #3B4455;
        fill: #D2E3FC;
      }

      [theme=dark] .colab-df-generate:hover {
        background-color: #434B5C;
        box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);
        filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));
        fill: #FFFFFF;
      }
    </style>
    <button class="colab-df-generate" onclick="generateWithVariable('logreg_output')"
            title="Generate code using this dataframe."
            style="display:none;">

  <svg xmlns="http://www.w3.org/2000/svg" height="24px"viewBox="0 0 24 24"
       width="24px">
    <path d="M7,19H8.4L18.45,9,17,7.55,7,17.6ZM5,21V16.75L18.45,3.32a2,2,0,0,1,2.83,0l1.4,1.43a1.91,1.91,0,0,1,.58,1.4,1.91,1.91,0,0,1-.58,1.4L9.25,21ZM18.45,9,17,7.55Zm-12,3A5.31,5.31,0,0,0,4.9,8.1,5.31,5.31,0,0,0,1,6.5,5.31,5.31,0,0,0,4.9,4.9,5.31,5.31,0,0,0,6.5,1,5.31,5.31,0,0,0,8.1,4.9,5.31,5.31,0,0,0,12,6.5,5.46,5.46,0,0,0,6.5,12Z"/>
  </svg>
    </button>
    <script>
      (() => {
      const buttonEl =
        document.querySelector('#id_3ae6dede-120c-48dc-8bb6-b8ee0ba9508d button.colab-df-generate');
      buttonEl.style.display =
        google.colab.kernel.accessAllowed ? 'block' : 'none';

      buttonEl.onclick = () => {
        google.colab.notebook.generateWithVariable('logreg_output');
      }
      })();
    </script>
  </div>

    </div>
  </div>

</div>
</div>
<div class="cell markdown" id="0KUeNC7VDKk6">

</div>
<section
id="support-vector-machine-using-cross-validation---training-data"
class="cell markdown" id="zqeKjVOzDKny">
<h1><strong>Support Vector Machine Using Cross Validation - Training
Data</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="i-C3mb6fe_ON" data-outputId="b122a678-9722-4c31-d3e4-a3f338577e6e">
<div class="sourceCode" id="cb128"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb128-1"><a href="#cb128-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> sklearn.svm <span class="im">import</span> SVC</span>
<span id="cb128-2"><a href="#cb128-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> sklearn.model_selection <span class="im">import</span> RandomizedSearchCV</span>
<span id="cb128-3"><a href="#cb128-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb128-4"><a href="#cb128-4" aria-hidden="true" tabindex="-1"></a><span class="co"># Measuring start time</span></span>
<span id="cb128-5"><a href="#cb128-5" aria-hidden="true" tabindex="-1"></a>start_time <span class="op">=</span> time.time()</span>
<span id="cb128-6"><a href="#cb128-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb128-7"><a href="#cb128-7" aria-hidden="true" tabindex="-1"></a><span class="co"># Measuring memory usage before and after model fitting</span></span>
<span id="cb128-8"><a href="#cb128-8" aria-hidden="true" tabindex="-1"></a><span class="co"># mem_usage_before = memory_usage(-1, interval=0.1, timeout=1)</span></span>
<span id="cb128-9"><a href="#cb128-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb128-10"><a href="#cb128-10" aria-hidden="true" tabindex="-1"></a><span class="co"># Defining hyperparameter grid</span></span>
<span id="cb128-11"><a href="#cb128-11" aria-hidden="true" tabindex="-1"></a>param_grid <span class="op">=</span> {</span>
<span id="cb128-12"><a href="#cb128-12" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;C&#39;</span>: [<span class="fl">0.001</span>, <span class="fl">0.01</span>, <span class="fl">0.1</span>, <span class="dv">1</span>, <span class="dv">10</span>, <span class="dv">100</span>],        <span class="co"># Regularization parameter</span></span>
<span id="cb128-13"><a href="#cb128-13" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;kernel&#39;</span>: [<span class="st">&#39;linear&#39;</span>,<span class="st">&#39;rbf&#39;</span>,<span class="st">&#39;sigmoid&#39;</span>,<span class="st">&#39;poly&#39;</span>],                       <span class="co"># &#39;linear&#39; for linear kernel</span></span>
<span id="cb128-14"><a href="#cb128-14" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb128-15"><a href="#cb128-15" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb128-16"><a href="#cb128-16" aria-hidden="true" tabindex="-1"></a><span class="co"># Creating SVM model with linear kernel</span></span>
<span id="cb128-17"><a href="#cb128-17" aria-hidden="true" tabindex="-1"></a>svm_linear_model <span class="op">=</span> SVC(random_state<span class="op">=</span><span class="dv">1</span>)</span>
<span id="cb128-18"><a href="#cb128-18" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb128-19"><a href="#cb128-19" aria-hidden="true" tabindex="-1"></a><span class="co"># Performing RandomizedSearchCV</span></span>
<span id="cb128-20"><a href="#cb128-20" aria-hidden="true" tabindex="-1"></a>random_search <span class="op">=</span> RandomizedSearchCV(estimator<span class="op">=</span>svm_linear_model, param_distributions<span class="op">=</span>param_grid,</span>
<span id="cb128-21"><a href="#cb128-21" aria-hidden="true" tabindex="-1"></a>                                   n_iter<span class="op">=</span><span class="dv">10</span>, cv<span class="op">=</span><span class="dv">10</span>, scoring<span class="op">=</span>[<span class="st">&#39;accuracy&#39;</span>, <span class="st">&#39;recall&#39;</span>,<span class="st">&#39;f1&#39;</span>,<span class="st">&#39;roc_auc&#39;</span>,<span class="st">&#39;balanced_accuracy&#39;</span>], refit<span class="op">=</span><span class="st">&quot;accuracy&quot;</span>, random_state<span class="op">=</span><span class="dv">1</span>)</span>
<span id="cb128-22"><a href="#cb128-22" aria-hidden="true" tabindex="-1"></a>random_search.fit(x_train, y_train)</span>
<span id="cb128-23"><a href="#cb128-23" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb128-24"><a href="#cb128-24" aria-hidden="true" tabindex="-1"></a><span class="co"># Getting the best parameters and best score</span></span>
<span id="cb128-25"><a href="#cb128-25" aria-hidden="true" tabindex="-1"></a>best_params_svm <span class="op">=</span> random_search.best_params_</span>
<span id="cb128-26"><a href="#cb128-26" aria-hidden="true" tabindex="-1"></a>best_score_svm <span class="op">=</span> random_search.best_score_</span>
<span id="cb128-27"><a href="#cb128-27" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb128-28"><a href="#cb128-28" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb128-29"><a href="#cb128-29" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f&quot;</span><span class="ch">\n</span><span class="ss">The best hyperparameters for SVM are:</span><span class="ch">\n</span><span class="sc">{</span>best_params_svm<span class="sc">}</span><span class="ch">\n</span><span class="ss">&quot;</span>)</span>
<span id="cb128-30"><a href="#cb128-30" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f&quot;Thes best score for SVM is:</span><span class="ch">\n</span><span class="sc">{</span>best_score_svm<span class="sc">}</span><span class="ss">&quot;</span>)</span></code></pre></div>
<div class="output stream stdout">
<pre><code>
The best hyperparameters for SVM are:
{&#39;kernel&#39;: &#39;rbf&#39;, &#39;C&#39;: 10}

Thes best score for SVM is:
0.9133765468473778
</code></pre>
</div>
</div>
<div class="cell code" id="VQxVacMzOgEv">
<div class="sourceCode" id="cb130"><pre
class="sourceCode python"><code class="sourceCode python"></code></pre></div>
</div>
<div class="cell code" id="Lzqofa1dOgHZ">
<div class="sourceCode" id="cb131"><pre
class="sourceCode python"><code class="sourceCode python"></code></pre></div>
</div>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="nJbjPg4VDQQ1" data-outputId="8b8357d2-c898-48aa-8d35-945d077fa45a">
<div class="sourceCode" id="cb132"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb132-1"><a href="#cb132-1" aria-hidden="true" tabindex="-1"></a><span class="co">## Fitting the best parameters model to the training data</span></span>
<span id="cb132-2"><a href="#cb132-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb132-3"><a href="#cb132-3" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> sklearn.model_selection <span class="im">import</span> cross_val_predict</span>
<span id="cb132-4"><a href="#cb132-4" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> sklearn <span class="im">import</span> metrics</span>
<span id="cb132-5"><a href="#cb132-5" aria-hidden="true" tabindex="-1"></a><span class="co"># Training the model with the best parameters</span></span>
<span id="cb132-6"><a href="#cb132-6" aria-hidden="true" tabindex="-1"></a>best_svm_linear_model <span class="op">=</span> SVC(<span class="op">**</span>best_params_svm, random_state<span class="op">=</span><span class="dv">1</span>)</span>
<span id="cb132-7"><a href="#cb132-7" aria-hidden="true" tabindex="-1"></a>best_svm_linear_model <span class="op">=</span> best_svm_linear_model.fit(x_train, y_train)</span>
<span id="cb132-8"><a href="#cb132-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb132-9"><a href="#cb132-9" aria-hidden="true" tabindex="-1"></a><span class="co"># Getting cross-validated predictions</span></span>
<span id="cb132-10"><a href="#cb132-10" aria-hidden="true" tabindex="-1"></a><span class="co"># cv_predictions_svm_linear_train = cross_val_predict(best_svm_linear_model, x_train, y_train, cv=10)</span></span>
<span id="cb132-11"><a href="#cb132-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb132-12"><a href="#cb132-12" aria-hidden="true" tabindex="-1"></a><span class="co"># Getting cross-validated predictions</span></span>
<span id="cb132-13"><a href="#cb132-13" aria-hidden="true" tabindex="-1"></a>cv_predictions_svm_linear_train <span class="op">=</span> best_svm_linear_model.predict(x_train)<span class="co">#, y_train, cv=10)</span></span>
<span id="cb132-14"><a href="#cb132-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb132-15"><a href="#cb132-15" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb132-16"><a href="#cb132-16" aria-hidden="true" tabindex="-1"></a><span class="co"># Calculating performance metrics based on cross-validated predictions</span></span>
<span id="cb132-17"><a href="#cb132-17" aria-hidden="true" tabindex="-1"></a>train_cv_acc_svm_linear <span class="op">=</span> metrics.accuracy_score(y_train, cv_predictions_svm_linear_train)</span>
<span id="cb132-18"><a href="#cb132-18" aria-hidden="true" tabindex="-1"></a>train_cv_recall_svm_linear <span class="op">=</span> metrics.recall_score(y_train, cv_predictions_svm_linear_train)</span>
<span id="cb132-19"><a href="#cb132-19" aria-hidden="true" tabindex="-1"></a>train_cv_precision_svm_linear <span class="op">=</span> metrics.precision_score(y_train, cv_predictions_svm_linear_train)</span>
<span id="cb132-20"><a href="#cb132-20" aria-hidden="true" tabindex="-1"></a>train_cv_f1_svm_linear <span class="op">=</span> metrics.f1_score(y_train, cv_predictions_svm_linear_train)</span>
<span id="cb132-21"><a href="#cb132-21" aria-hidden="true" tabindex="-1"></a><span class="co"># Calculating ROC AUC</span></span>
<span id="cb132-22"><a href="#cb132-22" aria-hidden="true" tabindex="-1"></a>roc_auc_train_svm <span class="op">=</span> roc_auc_score(y_train, cv_predictions_logistic_train)</span>
<span id="cb132-23"><a href="#cb132-23" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb132-24"><a href="#cb132-24" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb132-25"><a href="#cb132-25" aria-hidden="true" tabindex="-1"></a><span class="co"># Printing the cross-validated performance metrics</span></span>
<span id="cb132-26"><a href="#cb132-26" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&#39;The metrics for the training set using the cross-validation are: &#39;</span>)</span>
<span id="cb132-27"><a href="#cb132-27" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Cross-Validated Accuracy:&quot;</span>, train_cv_acc_svm_linear)</span>
<span id="cb132-28"><a href="#cb132-28" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Cross-Validated Recall:&quot;</span>, train_cv_recall_svm_linear)</span>
<span id="cb132-29"><a href="#cb132-29" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Cross-Validated Precision:&quot;</span>, train_cv_precision_svm_linear)</span>
<span id="cb132-30"><a href="#cb132-30" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Cross-Validated F1 Score:&quot;</span>, train_cv_f1_svm_linear)</span>
<span id="cb132-31"><a href="#cb132-31" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Cross-Validated ROC_AUC Score:&quot;</span>, roc_auc_train_svm)</span></code></pre></div>
<div class="output stream stdout">
<pre><code>The metrics for the training set using the cross-validation are: 
Cross-Validated Accuracy: 0.9297583971714791
Cross-Validated Recall: 0.9458661417322834
Cross-Validated Precision: 0.9254226407019046
Cross-Validated F1 Score: 0.9355327203893997
Cross-Validated ROC_AUC Score: 0.8549379393318233
</code></pre>
</div>
</div>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="vLcsFTjV6T8Y" data-outputId="6653aafc-c360-40bd-eb69-95675dc761af">
<div class="sourceCode" id="cb134"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb134-1"><a href="#cb134-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Record end time</span></span>
<span id="cb134-2"><a href="#cb134-2" aria-hidden="true" tabindex="-1"></a>end_time <span class="op">=</span> time.time()</span>
<span id="cb134-3"><a href="#cb134-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb134-4"><a href="#cb134-4" aria-hidden="true" tabindex="-1"></a><span class="co"># Calculate execution time</span></span>
<span id="cb134-5"><a href="#cb134-5" aria-hidden="true" tabindex="-1"></a>execution_time_svm <span class="op">=</span> end_time <span class="op">-</span> start_time</span>
<span id="cb134-6"><a href="#cb134-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb134-7"><a href="#cb134-7" aria-hidden="true" tabindex="-1"></a><span class="co"># Calculate memory usage</span></span>
<span id="cb134-8"><a href="#cb134-8" aria-hidden="true" tabindex="-1"></a>process <span class="op">=</span> psutil.Process()</span>
<span id="cb134-9"><a href="#cb134-9" aria-hidden="true" tabindex="-1"></a>memory_used_svm <span class="op">=</span> process.memory_info().rss <span class="op">/</span> (<span class="dv">1024</span> <span class="op">*</span> <span class="dv">1024</span>)  <span class="co"># Convert to MiB</span></span>
<span id="cb134-10"><a href="#cb134-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb134-11"><a href="#cb134-11" aria-hidden="true" tabindex="-1"></a><span class="co"># Print results</span></span>
<span id="cb134-12"><a href="#cb134-12" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&#39;Execution Time:&#39;</span>, execution_time_svm, <span class="st">&#39;seconds&#39;</span>)</span>
<span id="cb134-13"><a href="#cb134-13" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&#39;Memory Used:&#39;</span>, memory_used_svm, <span class="st">&#39;MiB&#39;</span>)</span></code></pre></div>
<div class="output stream stdout">
<pre><code>Execution Time: 2822.5223500728607 seconds
Memory Used: 1628.64453125 MiB
</code></pre>
</div>
</div>
<section
id="support-vector-machine-using-cross-validation---testing-data"
class="cell markdown" id="6qzzx_9UGfDu">
<h1><strong>Support Vector Machine Using Cross Validation - Testing
Data</strong></h1>
</section>
<div class="cell code" id="xcyz63u3PGnd">
<div class="sourceCode" id="cb136"><pre
class="sourceCode python"><code class="sourceCode python"></code></pre></div>
</div>
<div class="cell code" id="S2ptXV1-PGp_">
<div class="sourceCode" id="cb137"><pre
class="sourceCode python"><code class="sourceCode python"></code></pre></div>
</div>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="0g0YhvWfPGsn" data-outputId="0e63cb62-a778-4e7a-9bc6-91016aa4482f">
<div class="sourceCode" id="cb138"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb138-1"><a href="#cb138-1" aria-hidden="true" tabindex="-1"></a><span class="co">## Fitting the model on the unseen data (test data)</span></span>
<span id="cb138-2"><a href="#cb138-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> sklearn.model_selection <span class="im">import</span> cross_val_predict</span>
<span id="cb138-3"><a href="#cb138-3" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> sklearn <span class="im">import</span> metrics</span>
<span id="cb138-4"><a href="#cb138-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb138-5"><a href="#cb138-5" aria-hidden="true" tabindex="-1"></a><span class="co"># Specify the logistic regression model</span></span>
<span id="cb138-6"><a href="#cb138-6" aria-hidden="true" tabindex="-1"></a><span class="co"># model = LogisticRegression(solver=&quot;liblinear&quot;, random_state=1)</span></span>
<span id="cb138-7"><a href="#cb138-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb138-8"><a href="#cb138-8" aria-hidden="true" tabindex="-1"></a><span class="co"># Getting cross-validated predictions</span></span>
<span id="cb138-9"><a href="#cb138-9" aria-hidden="true" tabindex="-1"></a><span class="co"># cv_predictions_SVM_test = cross_val_predict(best_svm_linear_model, x_test, y_test, cv=10)</span></span>
<span id="cb138-10"><a href="#cb138-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb138-11"><a href="#cb138-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb138-12"><a href="#cb138-12" aria-hidden="true" tabindex="-1"></a>cv_predictions_SVM_test <span class="op">=</span> best_svm_linear_model.predict(x_test)<span class="co">#, y_test, cv=10)</span></span>
<span id="cb138-13"><a href="#cb138-13" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb138-14"><a href="#cb138-14" aria-hidden="true" tabindex="-1"></a><span class="co"># Calculating performance metrics based on cross-validated predictions</span></span>
<span id="cb138-15"><a href="#cb138-15" aria-hidden="true" tabindex="-1"></a>test_cv_acc_SVM <span class="op">=</span> metrics.accuracy_score(y_test, cv_predictions_SVM_test)</span>
<span id="cb138-16"><a href="#cb138-16" aria-hidden="true" tabindex="-1"></a>test_cv_recall_SVM <span class="op">=</span> metrics.recall_score(y_test, cv_predictions_SVM_test)</span>
<span id="cb138-17"><a href="#cb138-17" aria-hidden="true" tabindex="-1"></a>test_cv_precision_SVM <span class="op">=</span> metrics.precision_score(y_test, cv_predictions_SVM_test)</span>
<span id="cb138-18"><a href="#cb138-18" aria-hidden="true" tabindex="-1"></a>test_cv_f1_SVM <span class="op">=</span> metrics.f1_score(y_test, cv_predictions_SVM_test)</span>
<span id="cb138-19"><a href="#cb138-19" aria-hidden="true" tabindex="-1"></a>roc_auc_test_svm <span class="op">=</span> roc_auc_score(y_test, cv_predictions_SVM_test)</span>
<span id="cb138-20"><a href="#cb138-20" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb138-21"><a href="#cb138-21" aria-hidden="true" tabindex="-1"></a><span class="co"># Printing the cross-validated performance metrics</span></span>
<span id="cb138-22"><a href="#cb138-22" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&#39;The metrics for the testing set using the cross-validation are: &#39;</span>)</span>
<span id="cb138-23"><a href="#cb138-23" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Cross-Validated Accuracy:&quot;</span>, test_cv_acc_SVM)</span>
<span id="cb138-24"><a href="#cb138-24" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Cross-Validated Recall:&quot;</span>, test_cv_recall_SVM)</span>
<span id="cb138-25"><a href="#cb138-25" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Cross-Validated Precision:&quot;</span>, test_cv_precision_SVM)</span>
<span id="cb138-26"><a href="#cb138-26" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Cross-Validated F1 Score:&quot;</span>, test_cv_f1_SVM)</span>
<span id="cb138-27"><a href="#cb138-27" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Cross-Validated ROC_AUC Score:&quot;</span>, roc_auc_test_svm)</span></code></pre></div>
<div class="output stream stdout">
<pre><code>The metrics for the testing set using the cross-validation are: 
Cross-Validated Accuracy: 0.910228210063239
Cross-Validated Recall: 0.9323597232897771
Cross-Validated Precision: 0.9034260178748759
Cross-Validated F1 Score: 0.9176648594124321
Cross-Validated ROC_AUC Score: 0.9084818491856776
</code></pre>
</div>
</div>
<div class="cell code" id="DX10Idc1PHXM">
<div class="sourceCode" id="cb140"><pre
class="sourceCode python"><code class="sourceCode python"></code></pre></div>
</div>
<div class="cell code" id="jk8qxtZDPHZp">
<div class="sourceCode" id="cb141"><pre
class="sourceCode python"><code class="sourceCode python"></code></pre></div>
</div>
<div class="cell code" id="QYdlrVFYPYmw">
<div class="sourceCode" id="cb142"><pre
class="sourceCode python"><code class="sourceCode python"></code></pre></div>
</div>
<section id="metrics-for-the-svm---train-and-test" class="cell markdown"
id="952BzlX4uN2G">
<h1><strong>Metrics for the SVM - Train and Test</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;,&quot;height&quot;:223}"
id="h6s9w6W-mdLw" data-outputId="d5cf76e4-c8d0-4d02-9f9f-34e3c6da7a82">
<div class="sourceCode" id="cb143"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb143-1"><a href="#cb143-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> pandas <span class="im">as</span> pd</span>
<span id="cb143-2"><a href="#cb143-2" aria-hidden="true" tabindex="-1"></a><span class="co"># Creating DataFrame</span></span>
<span id="cb143-3"><a href="#cb143-3" aria-hidden="true" tabindex="-1"></a>svm_output <span class="op">=</span> pd.DataFrame({</span>
<span id="cb143-4"><a href="#cb143-4" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;Training&#39;</span>: [train_cv_acc_svm_linear, train_cv_recall_svm_linear, train_cv_precision_svm_linear, train_cv_f1_svm_linear,roc_auc_train_svm],</span>
<span id="cb143-5"><a href="#cb143-5" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;Testing&#39;</span>: [test_cv_acc_SVM, test_cv_recall_SVM, test_cv_precision_SVM, test_cv_f1_SVM,roc_auc_test_svm]},</span>
<span id="cb143-6"><a href="#cb143-6" aria-hidden="true" tabindex="-1"></a>    <span class="co"># &#39;Model&#39;: [&#39;Logistic Regression&#39;,&#39;Logistic Regression&#39;,&#39;Logistic Regression&#39;,&#39;Logistic Regression&#39;]},</span></span>
<span id="cb143-7"><a href="#cb143-7" aria-hidden="true" tabindex="-1"></a>                              index <span class="op">=</span> [<span class="st">&#39;Accuracy&#39;</span>,<span class="st">&#39;Recall&#39;</span>,<span class="st">&#39;Precision&#39;</span>,<span class="st">&#39;F1&#39;</span>,<span class="st">&#39;ROC_AUC&#39;</span>])</span>
<span id="cb143-8"><a href="#cb143-8" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;The metrics for SVM are: &quot;</span>)</span>
<span id="cb143-9"><a href="#cb143-9" aria-hidden="true" tabindex="-1"></a>svm_output</span></code></pre></div>
<div class="output stream stdout">
<pre><code>The metrics for SVM are: 
</code></pre>
</div>
<div class="output execute_result" data-execution_count="36">

  <div id="df-386ac6bb-146c-45e3-8724-2ca7282ac1c0" class="colab-df-container">
    <div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Training</th>
      <th>Testing</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>Accuracy</th>
      <td>0.929758</td>
      <td>0.910228</td>
    </tr>
    <tr>
      <th>Recall</th>
      <td>0.945866</td>
      <td>0.932360</td>
    </tr>
    <tr>
      <th>Precision</th>
      <td>0.925423</td>
      <td>0.903426</td>
    </tr>
    <tr>
      <th>F1</th>
      <td>0.935533</td>
      <td>0.917665</td>
    </tr>
    <tr>
      <th>ROC_AUC</th>
      <td>0.854938</td>
      <td>0.908482</td>
    </tr>
  </tbody>
</table>
</div>
    <div class="colab-df-buttons">

  <div class="colab-df-container">
    <button class="colab-df-convert" onclick="convertToInteractive('df-386ac6bb-146c-45e3-8724-2ca7282ac1c0')"
            title="Convert this dataframe to an interactive table."
            style="display:none;">

  <svg xmlns="http://www.w3.org/2000/svg" height="24px" viewBox="0 -960 960 960">
    <path d="M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z"/>
  </svg>
    </button>

  <style>
    .colab-df-container {
      display:flex;
      gap: 12px;
    }

    .colab-df-convert {
      background-color: #E8F0FE;
      border: none;
      border-radius: 50%;
      cursor: pointer;
      display: none;
      fill: #1967D2;
      height: 32px;
      padding: 0 0 0 0;
      width: 32px;
    }

    .colab-df-convert:hover {
      background-color: #E2EBFA;
      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);
      fill: #174EA6;
    }

    .colab-df-buttons div {
      margin-bottom: 4px;
    }

    [theme=dark] .colab-df-convert {
      background-color: #3B4455;
      fill: #D2E3FC;
    }

    [theme=dark] .colab-df-convert:hover {
      background-color: #434B5C;
      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);
      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));
      fill: #FFFFFF;
    }
  </style>

    <script>
      const buttonEl =
        document.querySelector('#df-386ac6bb-146c-45e3-8724-2ca7282ac1c0 button.colab-df-convert');
      buttonEl.style.display =
        google.colab.kernel.accessAllowed ? 'block' : 'none';

      async function convertToInteractive(key) {
        const element = document.querySelector('#df-386ac6bb-146c-45e3-8724-2ca7282ac1c0');
        const dataTable =
          await google.colab.kernel.invokeFunction('convertToInteractive',
                                                    [key], {});
        if (!dataTable) return;

        const docLinkHtml = 'Like what you see? Visit the ' +
          '<a target="_blank" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'
          + ' to learn more about interactive tables.';
        element.innerHTML = '';
        dataTable['output_type'] = 'display_data';
        await google.colab.output.renderOutput(dataTable, element);
        const docLink = document.createElement('div');
        docLink.innerHTML = docLinkHtml;
        element.appendChild(docLink);
      }
    </script>
  </div>


<div id="df-d3f2dfa8-7cb6-4b7c-b283-84729bd721c5">
  <button class="colab-df-quickchart" onclick="quickchart('df-d3f2dfa8-7cb6-4b7c-b283-84729bd721c5')"
            title="Suggest charts"
            style="display:none;">

<svg xmlns="http://www.w3.org/2000/svg" height="24px"viewBox="0 0 24 24"
     width="24px">
    <g>
        <path d="M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z"/>
    </g>
</svg>
  </button>

<style>
  .colab-df-quickchart {
      --bg-color: #E8F0FE;
      --fill-color: #1967D2;
      --hover-bg-color: #E2EBFA;
      --hover-fill-color: #174EA6;
      --disabled-fill-color: #AAA;
      --disabled-bg-color: #DDD;
  }

  [theme=dark] .colab-df-quickchart {
      --bg-color: #3B4455;
      --fill-color: #D2E3FC;
      --hover-bg-color: #434B5C;
      --hover-fill-color: #FFFFFF;
      --disabled-bg-color: #3B4455;
      --disabled-fill-color: #666;
  }

  .colab-df-quickchart {
    background-color: var(--bg-color);
    border: none;
    border-radius: 50%;
    cursor: pointer;
    display: none;
    fill: var(--fill-color);
    height: 32px;
    padding: 0;
    width: 32px;
  }

  .colab-df-quickchart:hover {
    background-color: var(--hover-bg-color);
    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);
    fill: var(--button-hover-fill-color);
  }

  .colab-df-quickchart-complete:disabled,
  .colab-df-quickchart-complete:disabled:hover {
    background-color: var(--disabled-bg-color);
    fill: var(--disabled-fill-color);
    box-shadow: none;
  }

  .colab-df-spinner {
    border: 2px solid var(--fill-color);
    border-color: transparent;
    border-bottom-color: var(--fill-color);
    animation:
      spin 1s steps(1) infinite;
  }

  @keyframes spin {
    0% {
      border-color: transparent;
      border-bottom-color: var(--fill-color);
      border-left-color: var(--fill-color);
    }
    20% {
      border-color: transparent;
      border-left-color: var(--fill-color);
      border-top-color: var(--fill-color);
    }
    30% {
      border-color: transparent;
      border-left-color: var(--fill-color);
      border-top-color: var(--fill-color);
      border-right-color: var(--fill-color);
    }
    40% {
      border-color: transparent;
      border-right-color: var(--fill-color);
      border-top-color: var(--fill-color);
    }
    60% {
      border-color: transparent;
      border-right-color: var(--fill-color);
    }
    80% {
      border-color: transparent;
      border-right-color: var(--fill-color);
      border-bottom-color: var(--fill-color);
    }
    90% {
      border-color: transparent;
      border-bottom-color: var(--fill-color);
    }
  }
</style>

  <script>
    async function quickchart(key) {
      const quickchartButtonEl =
        document.querySelector('#' + key + ' button');
      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.
      quickchartButtonEl.classList.add('colab-df-spinner');
      try {
        const charts = await google.colab.kernel.invokeFunction(
            'suggestCharts', [key], {});
      } catch (error) {
        console.error('Error during call to suggestCharts:', error);
      }
      quickchartButtonEl.classList.remove('colab-df-spinner');
      quickchartButtonEl.classList.add('colab-df-quickchart-complete');
    }
    (() => {
      let quickchartButtonEl =
        document.querySelector('#df-d3f2dfa8-7cb6-4b7c-b283-84729bd721c5 button');
      quickchartButtonEl.style.display =
        google.colab.kernel.accessAllowed ? 'block' : 'none';
    })();
  </script>
</div>

  <div id="id_19eb5b62-8c92-4d23-9958-ac9ff15f4eec">
    <style>
      .colab-df-generate {
        background-color: #E8F0FE;
        border: none;
        border-radius: 50%;
        cursor: pointer;
        display: none;
        fill: #1967D2;
        height: 32px;
        padding: 0 0 0 0;
        width: 32px;
      }

      .colab-df-generate:hover {
        background-color: #E2EBFA;
        box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);
        fill: #174EA6;
      }

      [theme=dark] .colab-df-generate {
        background-color: #3B4455;
        fill: #D2E3FC;
      }

      [theme=dark] .colab-df-generate:hover {
        background-color: #434B5C;
        box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);
        filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));
        fill: #FFFFFF;
      }
    </style>
    <button class="colab-df-generate" onclick="generateWithVariable('svm_output')"
            title="Generate code using this dataframe."
            style="display:none;">

  <svg xmlns="http://www.w3.org/2000/svg" height="24px"viewBox="0 0 24 24"
       width="24px">
    <path d="M7,19H8.4L18.45,9,17,7.55,7,17.6ZM5,21V16.75L18.45,3.32a2,2,0,0,1,2.83,0l1.4,1.43a1.91,1.91,0,0,1,.58,1.4,1.91,1.91,0,0,1-.58,1.4L9.25,21ZM18.45,9,17,7.55Zm-12,3A5.31,5.31,0,0,0,4.9,8.1,5.31,5.31,0,0,0,1,6.5,5.31,5.31,0,0,0,4.9,4.9,5.31,5.31,0,0,0,6.5,1,5.31,5.31,0,0,0,8.1,4.9,5.31,5.31,0,0,0,12,6.5,5.46,5.46,0,0,0,6.5,12Z"/>
  </svg>
    </button>
    <script>
      (() => {
      const buttonEl =
        document.querySelector('#id_19eb5b62-8c92-4d23-9958-ac9ff15f4eec button.colab-df-generate');
      buttonEl.style.display =
        google.colab.kernel.accessAllowed ? 'block' : 'none';

      buttonEl.onclick = () => {
        google.colab.notebook.generateWithVariable('svm_output');
      }
      })();
    </script>
  </div>

    </div>
  </div>

</div>
</div>
<div class="cell markdown" id="5zvcJxE7nhPG">
<ul>
<li><p><strong>The SVM is performing better than the logistic regression
in this phase. Furthermore, the accuracy of 91.02% on the test data
informs us that SVM is performing a lot better than the LR model whose
accuracy is about 84.35%.</strong></p></li>
<li><p><strong>The recall score is so high till now with
93.23%.</strong></p></li>
<li><p><strong>Let's see by building other models.</strong></p></li>
</ul>
</div>
<section id="receiver-operating-curve-roc---svm" class="cell markdown"
id="xhS8i5qP7sYX">
<h1><strong>Receiver Operating Curve (ROC) - SVM</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;,&quot;height&quot;:472}"
id="Bkqnh0ivTKrg" data-outputId="0ba122a2-b1f0-4768-d8c8-31581f4dcaa7">
<div class="sourceCode" id="cb145"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb145-1"><a href="#cb145-1" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb145-2"><a href="#cb145-2" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> matplotlib.pyplot <span class="im">as</span> plt</span>
<span id="cb145-3"><a href="#cb145-3" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> sklearn.metrics <span class="im">import</span> roc_curve, auc</span>
<span id="cb145-4"><a href="#cb145-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb145-5"><a href="#cb145-5" aria-hidden="true" tabindex="-1"></a><span class="co"># Get decision values</span></span>
<span id="cb145-6"><a href="#cb145-6" aria-hidden="true" tabindex="-1"></a>decision_values <span class="op">=</span> best_svm_linear_model.decision_function(x_test)</span>
<span id="cb145-7"><a href="#cb145-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb145-8"><a href="#cb145-8" aria-hidden="true" tabindex="-1"></a><span class="co"># Compute ROC curve</span></span>
<span id="cb145-9"><a href="#cb145-9" aria-hidden="true" tabindex="-1"></a>fpr, tpr, thresholds <span class="op">=</span> roc_curve(y_test, decision_values)</span>
<span id="cb145-10"><a href="#cb145-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb145-11"><a href="#cb145-11" aria-hidden="true" tabindex="-1"></a><span class="co"># Compute ROC area under the curve</span></span>
<span id="cb145-12"><a href="#cb145-12" aria-hidden="true" tabindex="-1"></a>roc_auc <span class="op">=</span> auc(fpr, tpr)</span>
<span id="cb145-13"><a href="#cb145-13" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb145-14"><a href="#cb145-14" aria-hidden="true" tabindex="-1"></a><span class="co"># Plot ROC curve</span></span>
<span id="cb145-15"><a href="#cb145-15" aria-hidden="true" tabindex="-1"></a>plt.figure()</span>
<span id="cb145-16"><a href="#cb145-16" aria-hidden="true" tabindex="-1"></a>plt.plot(fpr, tpr, color<span class="op">=</span><span class="st">&#39;darkorange&#39;</span>, lw<span class="op">=</span><span class="dv">2</span>, label<span class="op">=</span><span class="st">&#39;ROC curve (area = </span><span class="sc">%0.2f</span><span class="st">)&#39;</span> <span class="op">%</span> roc_auc)</span>
<span id="cb145-17"><a href="#cb145-17" aria-hidden="true" tabindex="-1"></a>plt.plot([<span class="dv">0</span>, <span class="dv">1</span>], [<span class="dv">0</span>, <span class="dv">1</span>], color<span class="op">=</span><span class="st">&#39;navy&#39;</span>, lw<span class="op">=</span><span class="dv">2</span>, linestyle<span class="op">=</span><span class="st">&#39;--&#39;</span>)</span>
<span id="cb145-18"><a href="#cb145-18" aria-hidden="true" tabindex="-1"></a>plt.xlim([<span class="fl">0.0</span>, <span class="fl">1.0</span>])</span>
<span id="cb145-19"><a href="#cb145-19" aria-hidden="true" tabindex="-1"></a>plt.ylim([<span class="fl">0.0</span>, <span class="fl">1.05</span>])</span>
<span id="cb145-20"><a href="#cb145-20" aria-hidden="true" tabindex="-1"></a>plt.xlabel(<span class="st">&#39;False Positive Rate&#39;</span>)</span>
<span id="cb145-21"><a href="#cb145-21" aria-hidden="true" tabindex="-1"></a>plt.ylabel(<span class="st">&#39;True Positive Rate&#39;</span>)</span>
<span id="cb145-22"><a href="#cb145-22" aria-hidden="true" tabindex="-1"></a>plt.title(<span class="st">&#39;Receiver Operating Characteristic (ROC) Curve&#39;</span>)</span>
<span id="cb145-23"><a href="#cb145-23" aria-hidden="true" tabindex="-1"></a>plt.legend(loc<span class="op">=</span><span class="st">&quot;lower right&quot;</span>)</span>
<span id="cb145-24"><a href="#cb145-24" aria-hidden="true" tabindex="-1"></a>plt.show()</span>
<span id="cb145-25"><a href="#cb145-25" aria-hidden="true" tabindex="-1"></a></span></code></pre></div>
<div class="output display_data">
<p><img
src="vertopal_11c7d057c72c4c018277cd184609f64f/88f5043abf4b4f4890075bce363789a6b76ef9f3.png" /></p>
</div>
</div>
<div class="cell markdown" id="k65ZL_ee8Tlm">
<p><strong>From the above ROC curve, it can be seen that the SVM is
distinguishing better between both the classes - positive and negative.
It can be gleaned from the fact that the area under the cruve is more
than 0.5 and is almost near to 1. Also, the ROC curve is even more
better than that of the logistic regression.</strong></p>
</div>
<section id="variable-importance-for-svm" class="cell markdown"
id="jlXPJrJbwADJ">
<h1><strong>Variable Importance for SVM</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="hyaNWDuwv_XE" data-outputId="b67709da-5003-457f-c753-32608b924086">
<div class="sourceCode" id="cb146"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb146-1"><a href="#cb146-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> sklearn.inspection <span class="im">import</span> permutation_importance</span>
<span id="cb146-2"><a href="#cb146-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb146-3"><a href="#cb146-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Calculate permutation importance</span></span>
<span id="cb146-4"><a href="#cb146-4" aria-hidden="true" tabindex="-1"></a>perm_importance <span class="op">=</span> permutation_importance(best_svm_linear_model, x_test, y_test, n_repeats<span class="op">=</span><span class="dv">5</span>, random_state<span class="op">=</span><span class="dv">1</span>)</span>
<span id="cb146-5"><a href="#cb146-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb146-6"><a href="#cb146-6" aria-hidden="true" tabindex="-1"></a><span class="co"># Get the sorted indices</span></span>
<span id="cb146-7"><a href="#cb146-7" aria-hidden="true" tabindex="-1"></a>sorted_indices <span class="op">=</span> perm_importance.importances_mean.argsort()[::<span class="op">-</span><span class="dv">1</span>]</span>
<span id="cb146-8"><a href="#cb146-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb146-9"><a href="#cb146-9" aria-hidden="true" tabindex="-1"></a><span class="co"># Print feature ranking</span></span>
<span id="cb146-10"><a href="#cb146-10" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Feature ranking:&quot;</span>)</span>
<span id="cb146-11"><a href="#cb146-11" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> f <span class="kw">in</span> sorted_indices:</span>
<span id="cb146-12"><a href="#cb146-12" aria-hidden="true" tabindex="-1"></a>    <span class="bu">print</span>(<span class="st">&quot;</span><span class="sc">%d</span><span class="st">. </span><span class="sc">%s</span><span class="st"> (</span><span class="sc">%f</span><span class="st">)&quot;</span> <span class="op">%</span> (f <span class="op">+</span> <span class="dv">1</span>, feature_names[f], perm_importance.importances_mean[f]))</span></code></pre></div>
<div class="output stream stdout">
<pre><code>Feature ranking:
4. membership_category (0.247182)
10. points_in_wallet (0.077949)
12. label (0.045834)
13. score (0.017954)
2. gender (0.001155)
11. complaint_status (0.001100)
9. avg_transaction_value (0.000907)
3. region_category (0.000165)
1. age (-0.000110)
5. medium_of_operation (-0.000275)
8. avg_time_spent (-0.000495)
7. days_since_last_login (-0.000495)
6. internet_option (-0.000550)
</code></pre>
</div>
</div>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;,&quot;height&quot;:607}"
id="wT-uXTXoTEWe" data-outputId="89729d94-ba0e-48c5-df17-07cffb887f02">
<div class="sourceCode" id="cb148"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb148-1"><a href="#cb148-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> matplotlib.pyplot <span class="im">as</span> plt</span>
<span id="cb148-2"><a href="#cb148-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb148-3"><a href="#cb148-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Plot feature importance</span></span>
<span id="cb148-4"><a href="#cb148-4" aria-hidden="true" tabindex="-1"></a>plt.figure(figsize<span class="op">=</span>(<span class="dv">10</span>, <span class="dv">6</span>))</span>
<span id="cb148-5"><a href="#cb148-5" aria-hidden="true" tabindex="-1"></a>plt.bar(<span class="bu">range</span>(<span class="bu">len</span>(sorted_indices)), perm_importance.importances_mean[sorted_indices], align<span class="op">=</span><span class="st">&#39;center&#39;</span>)</span>
<span id="cb148-6"><a href="#cb148-6" aria-hidden="true" tabindex="-1"></a>plt.xticks(<span class="bu">range</span>(<span class="bu">len</span>(sorted_indices)), [feature_names[i] <span class="cf">for</span> i <span class="kw">in</span> sorted_indices], rotation<span class="op">=</span><span class="dv">90</span>)</span>
<span id="cb148-7"><a href="#cb148-7" aria-hidden="true" tabindex="-1"></a>plt.xlabel(<span class="st">&#39;Feature&#39;</span>)</span>
<span id="cb148-8"><a href="#cb148-8" aria-hidden="true" tabindex="-1"></a>plt.ylabel(<span class="st">&#39;Permutation Importance&#39;</span>)</span>
<span id="cb148-9"><a href="#cb148-9" aria-hidden="true" tabindex="-1"></a>plt.title(<span class="st">&#39;Permutation Importance of Features&#39;</span>)</span>
<span id="cb148-10"><a href="#cb148-10" aria-hidden="true" tabindex="-1"></a>plt.tight_layout()</span>
<span id="cb148-11"><a href="#cb148-11" aria-hidden="true" tabindex="-1"></a>plt.show()</span></code></pre></div>
<div class="output display_data">
<p><img
src="vertopal_11c7d057c72c4c018277cd184609f64f/15cc8d465a415bd29915c14103e4d756df459f56.png" /></p>
</div>
</div>
<div class="cell markdown" id="TjIs0TxPytLc">
<p><strong>From the variable importance graph it can be seen that
"membership_category" is appeared to be the most important again for the
churn prediction of this organization. In addition to
"membership_category", "points_in_wallet"; "sentiment label" and
"sentiment score" also seem to impactful for the churn prediction with
"label" having the importance value a bit higher than "score". which is
also having good importance for the prediction. These are also found to
be relatively important which is seen in the logistic regression as
well.</strong></p>
</div>
<section id="confusion-matrix-for-support-vector-machine"
class="cell markdown" id="EB4MEDC5HsMt">
<h1><strong>Confusion Matrix for Support Vector Machine</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;,&quot;height&quot;:1000}"
id="X3qpqgOsGqTQ" data-outputId="cf880821-027e-437c-fae5-5b0370eed88b">
<div class="sourceCode" id="cb149"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb149-1"><a href="#cb149-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> sklearn.metrics <span class="im">import</span> confusion_matrix</span>
<span id="cb149-2"><a href="#cb149-2" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> seaborn <span class="im">as</span> sns</span>
<span id="cb149-3"><a href="#cb149-3" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> matplotlib.pyplot <span class="im">as</span> plt</span>
<span id="cb149-4"><a href="#cb149-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb149-5"><a href="#cb149-5" aria-hidden="true" tabindex="-1"></a><span class="co"># Generating confusion matrix for the training set</span></span>
<span id="cb149-6"><a href="#cb149-6" aria-hidden="true" tabindex="-1"></a><span class="co"># Getting cross-validated predictions</span></span>
<span id="cb149-7"><a href="#cb149-7" aria-hidden="true" tabindex="-1"></a><span class="co"># cv_predictions_train = cross_val_predict(best_svm_linear_model, x_train, y_train, cv=5)</span></span>
<span id="cb149-8"><a href="#cb149-8" aria-hidden="true" tabindex="-1"></a>conf_matrix_train <span class="op">=</span> confusion_matrix(y_train, cv_predictions_svm_linear_train)</span>
<span id="cb149-9"><a href="#cb149-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb149-10"><a href="#cb149-10" aria-hidden="true" tabindex="-1"></a><span class="co"># Plotting confusion matrix for training set</span></span>
<span id="cb149-11"><a href="#cb149-11" aria-hidden="true" tabindex="-1"></a>plt.figure(figsize<span class="op">=</span>(<span class="dv">8</span>, <span class="dv">6</span>))</span>
<span id="cb149-12"><a href="#cb149-12" aria-hidden="true" tabindex="-1"></a>sns.heatmap(conf_matrix_train, annot<span class="op">=</span><span class="va">True</span>, fmt<span class="op">=</span><span class="st">&#39;d&#39;</span>, cmap<span class="op">=</span><span class="st">&#39;Blues&#39;</span>,</span>
<span id="cb149-13"><a href="#cb149-13" aria-hidden="true" tabindex="-1"></a>            xticklabels<span class="op">=</span>[<span class="st">&#39;Predicted Negative&#39;</span>, <span class="st">&#39;Predicted Positive&#39;</span>],</span>
<span id="cb149-14"><a href="#cb149-14" aria-hidden="true" tabindex="-1"></a>            yticklabels<span class="op">=</span>[<span class="st">&#39;Actual Negative&#39;</span>, <span class="st">&#39;Actual Positive&#39;</span>])</span>
<span id="cb149-15"><a href="#cb149-15" aria-hidden="true" tabindex="-1"></a>plt.title(<span class="st">&#39;Confusion Matrix - Training Set&#39;</span>)</span>
<span id="cb149-16"><a href="#cb149-16" aria-hidden="true" tabindex="-1"></a>plt.xlabel(<span class="st">&#39;Predicted labels&#39;</span>)</span>
<span id="cb149-17"><a href="#cb149-17" aria-hidden="true" tabindex="-1"></a>plt.ylabel(<span class="st">&#39;True labels&#39;</span>)</span>
<span id="cb149-18"><a href="#cb149-18" aria-hidden="true" tabindex="-1"></a>plt.show()</span>
<span id="cb149-19"><a href="#cb149-19" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb149-20"><a href="#cb149-20" aria-hidden="true" tabindex="-1"></a><span class="co"># Generating confusion matrix for testing set</span></span>
<span id="cb149-21"><a href="#cb149-21" aria-hidden="true" tabindex="-1"></a><span class="co"># cv_predictions_test = cross_val_predict(best_svm_linear_model, x_test, y_test, cv=5)</span></span>
<span id="cb149-22"><a href="#cb149-22" aria-hidden="true" tabindex="-1"></a>conf_matrix_test <span class="op">=</span> confusion_matrix(y_test, cv_predictions_SVM_test)</span>
<span id="cb149-23"><a href="#cb149-23" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb149-24"><a href="#cb149-24" aria-hidden="true" tabindex="-1"></a><span class="co"># Plotting confusion matrix for testing set</span></span>
<span id="cb149-25"><a href="#cb149-25" aria-hidden="true" tabindex="-1"></a>plt.figure(figsize<span class="op">=</span>(<span class="dv">8</span>, <span class="dv">6</span>))</span>
<span id="cb149-26"><a href="#cb149-26" aria-hidden="true" tabindex="-1"></a>sns.heatmap(conf_matrix_test, annot<span class="op">=</span><span class="va">True</span>, fmt<span class="op">=</span><span class="st">&#39;d&#39;</span>, cmap<span class="op">=</span><span class="st">&#39;Blues&#39;</span>,</span>
<span id="cb149-27"><a href="#cb149-27" aria-hidden="true" tabindex="-1"></a>            xticklabels<span class="op">=</span>[<span class="st">&#39;Predicted Negative&#39;</span>, <span class="st">&#39;Predicted Positive&#39;</span>],</span>
<span id="cb149-28"><a href="#cb149-28" aria-hidden="true" tabindex="-1"></a>            yticklabels<span class="op">=</span>[<span class="st">&#39;Actual Negative&#39;</span>, <span class="st">&#39;Actual Positive&#39;</span>])</span>
<span id="cb149-29"><a href="#cb149-29" aria-hidden="true" tabindex="-1"></a>plt.title(<span class="st">&#39;Confusion Matrix - Testing Set&#39;</span>)</span>
<span id="cb149-30"><a href="#cb149-30" aria-hidden="true" tabindex="-1"></a>plt.xlabel(<span class="st">&#39;Predicted labels&#39;</span>)</span>
<span id="cb149-31"><a href="#cb149-31" aria-hidden="true" tabindex="-1"></a>plt.ylabel(<span class="st">&#39;True labels&#39;</span>)</span>
<span id="cb149-32"><a href="#cb149-32" aria-hidden="true" tabindex="-1"></a>plt.show()</span></code></pre></div>
<div class="output display_data">
<p><img
src="vertopal_11c7d057c72c4c018277cd184609f64f/d08aade6453a7a2dae13b46c432532fd8c932133.png" /></p>
</div>
<div class="output display_data">
<p><img
src="vertopal_11c7d057c72c4c018277cd184609f64f/d0c5c6fc9af7b27f4b43cc8cc3a793ca55cd6d6e.png" /></p>
</div>
</div>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="fg-t5sL87Qbj" data-outputId="be0beb96-94e1-4bec-db8c-41cd757eca56">
<div class="sourceCode" id="cb150"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb150-1"><a href="#cb150-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Extracting true negatives, false positives, false negatives, and true positives from confusion matrix</span></span>
<span id="cb150-2"><a href="#cb150-2" aria-hidden="true" tabindex="-1"></a>tn_train, fp_train, fn_train, tp_train <span class="op">=</span> conf_matrix_train.ravel()</span>
<span id="cb150-3"><a href="#cb150-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb150-4"><a href="#cb150-4" aria-hidden="true" tabindex="-1"></a><span class="co"># Calculating specificity and sensitivity</span></span>
<span id="cb150-5"><a href="#cb150-5" aria-hidden="true" tabindex="-1"></a>specificity_svm_train <span class="op">=</span> tn_train <span class="op">/</span> (tn_train <span class="op">+</span> fp_train)</span>
<span id="cb150-6"><a href="#cb150-6" aria-hidden="true" tabindex="-1"></a>sensitivity_svm_train <span class="op">=</span> tp_train <span class="op">/</span> (tp_train <span class="op">+</span> fn_train)</span>
<span id="cb150-7"><a href="#cb150-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb150-8"><a href="#cb150-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb150-9"><a href="#cb150-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb150-10"><a href="#cb150-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb150-11"><a href="#cb150-11" aria-hidden="true" tabindex="-1"></a><span class="co"># Extracting true negatives, false positives, false negatives, and true positives from confusion matrix</span></span>
<span id="cb150-12"><a href="#cb150-12" aria-hidden="true" tabindex="-1"></a>tn_test, fp_test, fn_test, tp_test <span class="op">=</span> conf_matrix_test.ravel()</span>
<span id="cb150-13"><a href="#cb150-13" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb150-14"><a href="#cb150-14" aria-hidden="true" tabindex="-1"></a><span class="co"># Calculating specificity and sensitivity</span></span>
<span id="cb150-15"><a href="#cb150-15" aria-hidden="true" tabindex="-1"></a>specificity_svm_test <span class="op">=</span> tn_test <span class="op">/</span> (tn_test <span class="op">+</span> fp_test)</span>
<span id="cb150-16"><a href="#cb150-16" aria-hidden="true" tabindex="-1"></a>sensitivity_svm_test <span class="op">=</span> tp_test <span class="op">/</span> (tp_test <span class="op">+</span> fn_test)</span>
<span id="cb150-17"><a href="#cb150-17" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb150-18"><a href="#cb150-18" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb150-19"><a href="#cb150-19" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f&quot;The Sensitivity and Specificity of the Logistic Regression on the training data:</span><span class="ch">\n</span><span class="ss">Specificity:</span><span class="sc">{</span>specificity_svm_train<span class="sc">}</span><span class="ch">\n</span><span class="ss">Sensitivity:</span><span class="sc">{</span>sensitivity_svm_train<span class="sc">}</span><span class="ch">\n\n</span><span class="ss">&quot;</span>)</span>
<span id="cb150-20"><a href="#cb150-20" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f&quot;The Sensitivity and Specificity of the Logistic Regression on the testing data:</span><span class="ch">\n</span><span class="ss">Specificity:</span><span class="sc">{</span>specificity_svm_test<span class="sc">}</span><span class="ch">\n</span><span class="ss">Sensitivity:</span><span class="sc">{</span>sensitivity_svm_test<span class="sc">}</span><span class="ss">&quot;</span>)</span>
<span id="cb150-21"><a href="#cb150-21" aria-hidden="true" tabindex="-1"></a></span></code></pre></div>
<div class="output stream stdout">
<pre><code>The Sensitivity and Specificity of the Logistic Regression on the training data:
Specificity:0.9109378993099924
Sensitivity:0.9458661417322834


The Sensitivity and Specificity of the Logistic Regression on the testing data:
Specificity:0.8846039750815782
Sensitivity:0.9323597232897771
</code></pre>
</div>
</div>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;,&quot;height&quot;:286}"
id="23obhybOO70P" data-outputId="18cac714-bc16-4141-ccec-7b730381eebe">
<div class="sourceCode" id="cb152"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb152-1"><a href="#cb152-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> pandas <span class="im">as</span> pd</span>
<span id="cb152-2"><a href="#cb152-2" aria-hidden="true" tabindex="-1"></a><span class="co"># Creating DataFrame</span></span>
<span id="cb152-3"><a href="#cb152-3" aria-hidden="true" tabindex="-1"></a>svm_output <span class="op">=</span> pd.DataFrame({</span>
<span id="cb152-4"><a href="#cb152-4" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;Training&#39;</span>: [train_cv_acc_svm_linear, train_cv_recall_svm_linear, train_cv_precision_svm_linear, train_cv_f1_svm_linear,roc_auc_train_svm,specificity_svm_train,</span>
<span id="cb152-5"><a href="#cb152-5" aria-hidden="true" tabindex="-1"></a>                 sensitivity_svm_train],</span>
<span id="cb152-6"><a href="#cb152-6" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;Testing&#39;</span>: [test_cv_acc_SVM, test_cv_recall_SVM, test_cv_precision_SVM, test_cv_f1_SVM,roc_auc_test_svm,specificity_svm_test,sensitivity_svm_test]},</span>
<span id="cb152-7"><a href="#cb152-7" aria-hidden="true" tabindex="-1"></a>    <span class="co"># &#39;Model&#39;: [&#39;Logistic Regression&#39;,&#39;Logistic Regression&#39;,&#39;Logistic Regression&#39;,&#39;Logistic Regression&#39;]},</span></span>
<span id="cb152-8"><a href="#cb152-8" aria-hidden="true" tabindex="-1"></a>                              index <span class="op">=</span> [<span class="st">&#39;Accuracy&#39;</span>,<span class="st">&#39;Recall&#39;</span>,<span class="st">&#39;Precision&#39;</span>,<span class="st">&#39;F1&#39;</span>,<span class="st">&#39;ROC_AUC&#39;</span>,<span class="st">&#39;Specificity&#39;</span>,<span class="st">&#39;Sensitivity&#39;</span>])</span>
<span id="cb152-9"><a href="#cb152-9" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;The metrics for SVM are: &quot;</span>)</span>
<span id="cb152-10"><a href="#cb152-10" aria-hidden="true" tabindex="-1"></a>svm_output</span></code></pre></div>
<div class="output stream stdout">
<pre><code>The metrics for SVM are: 
</code></pre>
</div>
<div class="output execute_result" data-execution_count="48">

  <div id="df-f0aa9cd5-bec4-4e6f-b5d7-a649dc251e87" class="colab-df-container">
    <div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Training</th>
      <th>Testing</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>Accuracy</th>
      <td>0.929758</td>
      <td>0.910228</td>
    </tr>
    <tr>
      <th>Recall</th>
      <td>0.945866</td>
      <td>0.932360</td>
    </tr>
    <tr>
      <th>Precision</th>
      <td>0.925423</td>
      <td>0.903426</td>
    </tr>
    <tr>
      <th>F1</th>
      <td>0.935533</td>
      <td>0.917665</td>
    </tr>
    <tr>
      <th>ROC_AUC</th>
      <td>0.854938</td>
      <td>0.908482</td>
    </tr>
    <tr>
      <th>Specificity</th>
      <td>0.910938</td>
      <td>0.884604</td>
    </tr>
    <tr>
      <th>Sensitivity</th>
      <td>0.945866</td>
      <td>0.932360</td>
    </tr>
  </tbody>
</table>
</div>
    <div class="colab-df-buttons">

  <div class="colab-df-container">
    <button class="colab-df-convert" onclick="convertToInteractive('df-f0aa9cd5-bec4-4e6f-b5d7-a649dc251e87')"
            title="Convert this dataframe to an interactive table."
            style="display:none;">

  <svg xmlns="http://www.w3.org/2000/svg" height="24px" viewBox="0 -960 960 960">
    <path d="M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z"/>
  </svg>
    </button>

  <style>
    .colab-df-container {
      display:flex;
      gap: 12px;
    }

    .colab-df-convert {
      background-color: #E8F0FE;
      border: none;
      border-radius: 50%;
      cursor: pointer;
      display: none;
      fill: #1967D2;
      height: 32px;
      padding: 0 0 0 0;
      width: 32px;
    }

    .colab-df-convert:hover {
      background-color: #E2EBFA;
      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);
      fill: #174EA6;
    }

    .colab-df-buttons div {
      margin-bottom: 4px;
    }

    [theme=dark] .colab-df-convert {
      background-color: #3B4455;
      fill: #D2E3FC;
    }

    [theme=dark] .colab-df-convert:hover {
      background-color: #434B5C;
      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);
      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));
      fill: #FFFFFF;
    }
  </style>

    <script>
      const buttonEl =
        document.querySelector('#df-f0aa9cd5-bec4-4e6f-b5d7-a649dc251e87 button.colab-df-convert');
      buttonEl.style.display =
        google.colab.kernel.accessAllowed ? 'block' : 'none';

      async function convertToInteractive(key) {
        const element = document.querySelector('#df-f0aa9cd5-bec4-4e6f-b5d7-a649dc251e87');
        const dataTable =
          await google.colab.kernel.invokeFunction('convertToInteractive',
                                                    [key], {});
        if (!dataTable) return;

        const docLinkHtml = 'Like what you see? Visit the ' +
          '<a target="_blank" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'
          + ' to learn more about interactive tables.';
        element.innerHTML = '';
        dataTable['output_type'] = 'display_data';
        await google.colab.output.renderOutput(dataTable, element);
        const docLink = document.createElement('div');
        docLink.innerHTML = docLinkHtml;
        element.appendChild(docLink);
      }
    </script>
  </div>


<div id="df-ca36e150-fe1e-4b98-92cd-3dbcf060cd00">
  <button class="colab-df-quickchart" onclick="quickchart('df-ca36e150-fe1e-4b98-92cd-3dbcf060cd00')"
            title="Suggest charts"
            style="display:none;">

<svg xmlns="http://www.w3.org/2000/svg" height="24px"viewBox="0 0 24 24"
     width="24px">
    <g>
        <path d="M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z"/>
    </g>
</svg>
  </button>

<style>
  .colab-df-quickchart {
      --bg-color: #E8F0FE;
      --fill-color: #1967D2;
      --hover-bg-color: #E2EBFA;
      --hover-fill-color: #174EA6;
      --disabled-fill-color: #AAA;
      --disabled-bg-color: #DDD;
  }

  [theme=dark] .colab-df-quickchart {
      --bg-color: #3B4455;
      --fill-color: #D2E3FC;
      --hover-bg-color: #434B5C;
      --hover-fill-color: #FFFFFF;
      --disabled-bg-color: #3B4455;
      --disabled-fill-color: #666;
  }

  .colab-df-quickchart {
    background-color: var(--bg-color);
    border: none;
    border-radius: 50%;
    cursor: pointer;
    display: none;
    fill: var(--fill-color);
    height: 32px;
    padding: 0;
    width: 32px;
  }

  .colab-df-quickchart:hover {
    background-color: var(--hover-bg-color);
    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);
    fill: var(--button-hover-fill-color);
  }

  .colab-df-quickchart-complete:disabled,
  .colab-df-quickchart-complete:disabled:hover {
    background-color: var(--disabled-bg-color);
    fill: var(--disabled-fill-color);
    box-shadow: none;
  }

  .colab-df-spinner {
    border: 2px solid var(--fill-color);
    border-color: transparent;
    border-bottom-color: var(--fill-color);
    animation:
      spin 1s steps(1) infinite;
  }

  @keyframes spin {
    0% {
      border-color: transparent;
      border-bottom-color: var(--fill-color);
      border-left-color: var(--fill-color);
    }
    20% {
      border-color: transparent;
      border-left-color: var(--fill-color);
      border-top-color: var(--fill-color);
    }
    30% {
      border-color: transparent;
      border-left-color: var(--fill-color);
      border-top-color: var(--fill-color);
      border-right-color: var(--fill-color);
    }
    40% {
      border-color: transparent;
      border-right-color: var(--fill-color);
      border-top-color: var(--fill-color);
    }
    60% {
      border-color: transparent;
      border-right-color: var(--fill-color);
    }
    80% {
      border-color: transparent;
      border-right-color: var(--fill-color);
      border-bottom-color: var(--fill-color);
    }
    90% {
      border-color: transparent;
      border-bottom-color: var(--fill-color);
    }
  }
</style>

  <script>
    async function quickchart(key) {
      const quickchartButtonEl =
        document.querySelector('#' + key + ' button');
      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.
      quickchartButtonEl.classList.add('colab-df-spinner');
      try {
        const charts = await google.colab.kernel.invokeFunction(
            'suggestCharts', [key], {});
      } catch (error) {
        console.error('Error during call to suggestCharts:', error);
      }
      quickchartButtonEl.classList.remove('colab-df-spinner');
      quickchartButtonEl.classList.add('colab-df-quickchart-complete');
    }
    (() => {
      let quickchartButtonEl =
        document.querySelector('#df-ca36e150-fe1e-4b98-92cd-3dbcf060cd00 button');
      quickchartButtonEl.style.display =
        google.colab.kernel.accessAllowed ? 'block' : 'none';
    })();
  </script>
</div>

  <div id="id_e2392255-f55d-49a7-bc9e-4320a464307f">
    <style>
      .colab-df-generate {
        background-color: #E8F0FE;
        border: none;
        border-radius: 50%;
        cursor: pointer;
        display: none;
        fill: #1967D2;
        height: 32px;
        padding: 0 0 0 0;
        width: 32px;
      }

      .colab-df-generate:hover {
        background-color: #E2EBFA;
        box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);
        fill: #174EA6;
      }

      [theme=dark] .colab-df-generate {
        background-color: #3B4455;
        fill: #D2E3FC;
      }

      [theme=dark] .colab-df-generate:hover {
        background-color: #434B5C;
        box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);
        filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));
        fill: #FFFFFF;
      }
    </style>
    <button class="colab-df-generate" onclick="generateWithVariable('svm_output')"
            title="Generate code using this dataframe."
            style="display:none;">

  <svg xmlns="http://www.w3.org/2000/svg" height="24px"viewBox="0 0 24 24"
       width="24px">
    <path d="M7,19H8.4L18.45,9,17,7.55,7,17.6ZM5,21V16.75L18.45,3.32a2,2,0,0,1,2.83,0l1.4,1.43a1.91,1.91,0,0,1,.58,1.4,1.91,1.91,0,0,1-.58,1.4L9.25,21ZM18.45,9,17,7.55Zm-12,3A5.31,5.31,0,0,0,4.9,8.1,5.31,5.31,0,0,0,1,6.5,5.31,5.31,0,0,0,4.9,4.9,5.31,5.31,0,0,0,6.5,1,5.31,5.31,0,0,0,8.1,4.9,5.31,5.31,0,0,0,12,6.5,5.46,5.46,0,0,0,6.5,12Z"/>
  </svg>
    </button>
    <script>
      (() => {
      const buttonEl =
        document.querySelector('#id_e2392255-f55d-49a7-bc9e-4320a464307f button.colab-df-generate');
      buttonEl.style.display =
        google.colab.kernel.accessAllowed ? 'block' : 'none';

      buttonEl.onclick = () => {
        google.colab.notebook.generateWithVariable('svm_output');
      }
      })();
    </script>
  </div>

    </div>
  </div>

</div>
</div>
<div class="cell code" id="QvU_btzkO9U2">
<div class="sourceCode" id="cb154"><pre
class="sourceCode python"><code class="sourceCode python"></code></pre></div>
</div>
<section id="random-forest-model-with-the-training-data"
class="cell markdown" id="dIWTBPD8IlXF">
<h1><strong>Random Forest Model with the Training Data</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="7yDwcVoPH0PD" data-outputId="6245a1ea-f676-40d7-fbe0-0f5122a55a72">
<div class="sourceCode" id="cb155"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb155-1"><a href="#cb155-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> sklearn <span class="im">import</span> metrics</span>
<span id="cb155-2"><a href="#cb155-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb155-3"><a href="#cb155-3" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> sklearn.ensemble <span class="im">import</span> RandomForestClassifier</span>
<span id="cb155-4"><a href="#cb155-4" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span>
<span id="cb155-5"><a href="#cb155-5" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> sklearn.model_selection <span class="im">import</span> RandomizedSearchCV</span>
<span id="cb155-6"><a href="#cb155-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb155-7"><a href="#cb155-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb155-8"><a href="#cb155-8" aria-hidden="true" tabindex="-1"></a><span class="co"># Measure start time</span></span>
<span id="cb155-9"><a href="#cb155-9" aria-hidden="true" tabindex="-1"></a>start_time <span class="op">=</span> time.time()</span>
<span id="cb155-10"><a href="#cb155-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb155-11"><a href="#cb155-11" aria-hidden="true" tabindex="-1"></a><span class="co"># Measure memory usage before and after model fitting</span></span>
<span id="cb155-12"><a href="#cb155-12" aria-hidden="true" tabindex="-1"></a><span class="co"># mem_usage_before = memory_usage(-1, interval=0.1, timeout=1)</span></span>
<span id="cb155-13"><a href="#cb155-13" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb155-14"><a href="#cb155-14" aria-hidden="true" tabindex="-1"></a><span class="co"># Defining hyperparameter grid</span></span>
<span id="cb155-15"><a href="#cb155-15" aria-hidden="true" tabindex="-1"></a>param_grid <span class="op">=</span> {</span>
<span id="cb155-16"><a href="#cb155-16" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;n_estimators&#39;</span>: [<span class="dv">100</span>, <span class="dv">200</span>, <span class="dv">300</span>],  <span class="co"># Number of trees in the forest</span></span>
<span id="cb155-17"><a href="#cb155-17" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;max_depth&#39;</span>: [<span class="va">None</span>, <span class="dv">10</span>, <span class="dv">20</span>, <span class="dv">30</span>],   <span class="co"># Maximum depth of the tree</span></span>
<span id="cb155-18"><a href="#cb155-18" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;min_samples_split&#39;</span>: [<span class="dv">2</span>, <span class="dv">5</span>, <span class="dv">10</span>],   <span class="co"># Minimum number of samples required to split an internal node</span></span>
<span id="cb155-19"><a href="#cb155-19" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;min_samples_leaf&#39;</span>: [<span class="dv">1</span>, <span class="dv">2</span>, <span class="dv">4</span>]      <span class="co"># Minimum number of samples required to be at a leaf node</span></span>
<span id="cb155-20"><a href="#cb155-20" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb155-21"><a href="#cb155-21" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb155-22"><a href="#cb155-22" aria-hidden="true" tabindex="-1"></a><span class="co"># Creating Random Forest model</span></span>
<span id="cb155-23"><a href="#cb155-23" aria-hidden="true" tabindex="-1"></a>rf_model <span class="op">=</span> RandomForestClassifier(random_state<span class="op">=</span><span class="dv">1</span>)</span>
<span id="cb155-24"><a href="#cb155-24" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb155-25"><a href="#cb155-25" aria-hidden="true" tabindex="-1"></a><span class="co"># Performing RandomizedSearchCV</span></span>
<span id="cb155-26"><a href="#cb155-26" aria-hidden="true" tabindex="-1"></a>random_search <span class="op">=</span> RandomizedSearchCV(estimator<span class="op">=</span>rf_model, param_distributions<span class="op">=</span>param_grid,</span>
<span id="cb155-27"><a href="#cb155-27" aria-hidden="true" tabindex="-1"></a>                                   n_iter<span class="op">=</span><span class="dv">10</span>, cv<span class="op">=</span><span class="dv">10</span>, scoring<span class="op">=</span>[<span class="st">&#39;accuracy&#39;</span>,<span class="st">&#39;recall&#39;</span>,<span class="st">&#39;f1&#39;</span>,<span class="st">&#39;roc_auc&#39;</span>,<span class="st">&#39;balanced_accuracy&#39;</span>], refit<span class="op">=</span><span class="st">&quot;accuracy&quot;</span>, random_state<span class="op">=</span><span class="dv">1</span>)</span>
<span id="cb155-28"><a href="#cb155-28" aria-hidden="true" tabindex="-1"></a>random_search.fit(x_train, y_train)</span>
<span id="cb155-29"><a href="#cb155-29" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb155-30"><a href="#cb155-30" aria-hidden="true" tabindex="-1"></a><span class="co"># Getting the best parameters and best score</span></span>
<span id="cb155-31"><a href="#cb155-31" aria-hidden="true" tabindex="-1"></a>best_params_rf <span class="op">=</span> random_search.best_params_</span>
<span id="cb155-32"><a href="#cb155-32" aria-hidden="true" tabindex="-1"></a>best_score_rf <span class="op">=</span> random_search.best_score_</span>
<span id="cb155-33"><a href="#cb155-33" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb155-34"><a href="#cb155-34" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f&quot;</span><span class="ch">\n</span><span class="ss">The best hyperparameters for SVM are:</span><span class="ch">\n</span><span class="sc">{</span>best_params_rf<span class="sc">}</span><span class="ch">\n</span><span class="ss">&quot;</span>)</span>
<span id="cb155-35"><a href="#cb155-35" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f&quot;Thes best score for SVM is:</span><span class="ch">\n</span><span class="sc">{</span>best_score_rf<span class="sc">}</span><span class="ss">&quot;</span>)</span></code></pre></div>
<div class="output stream stdout">
<pre><code>
The best hyperparameters for SVM are:
{&#39;n_estimators&#39;: 100, &#39;min_samples_split&#39;: 2, &#39;min_samples_leaf&#39;: 2, &#39;max_depth&#39;: 30}

Thes best score for SVM is:
0.9434885091337655
</code></pre>
</div>
</div>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="r1GNtg7iH0Sn" data-outputId="5a8dedf2-3846-4402-9f99-d0bd4d5dd641">
<div class="sourceCode" id="cb157"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb157-1"><a href="#cb157-1" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb157-2"><a href="#cb157-2" aria-hidden="true" tabindex="-1"></a><span class="co"># Training the model with the best parameters</span></span>
<span id="cb157-3"><a href="#cb157-3" aria-hidden="true" tabindex="-1"></a>best_rf_model <span class="op">=</span> RandomForestClassifier(<span class="op">**</span>best_params_rf, random_state<span class="op">=</span><span class="dv">1</span>)</span>
<span id="cb157-4"><a href="#cb157-4" aria-hidden="true" tabindex="-1"></a>best_rf_model <span class="op">=</span> best_rf_model.fit(x_train, y_train)</span>
<span id="cb157-5"><a href="#cb157-5" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb157-6"><a href="#cb157-6" aria-hidden="true" tabindex="-1"></a><span class="co"># Getting cross-validated predictions</span></span>
<span id="cb157-7"><a href="#cb157-7" aria-hidden="true" tabindex="-1"></a><span class="co"># cv_predictions_rf_train = cross_val_predict(best_rf_model, x_train, y_train, cv=10)</span></span>
<span id="cb157-8"><a href="#cb157-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb157-9"><a href="#cb157-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb157-10"><a href="#cb157-10" aria-hidden="true" tabindex="-1"></a><span class="co"># Getting cross-validated predictions</span></span>
<span id="cb157-11"><a href="#cb157-11" aria-hidden="true" tabindex="-1"></a>cv_predictions_rf_train <span class="op">=</span> best_rf_model.predict(x_train) <span class="co">#, y_train, cv=10)</span></span>
<span id="cb157-12"><a href="#cb157-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb157-13"><a href="#cb157-13" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb157-14"><a href="#cb157-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb157-15"><a href="#cb157-15" aria-hidden="true" tabindex="-1"></a><span class="co"># Calculating performance metrics based on cross-validated predictions</span></span>
<span id="cb157-16"><a href="#cb157-16" aria-hidden="true" tabindex="-1"></a>train_cv_acc_rf <span class="op">=</span> metrics.accuracy_score(y_train, cv_predictions_rf_train)</span>
<span id="cb157-17"><a href="#cb157-17" aria-hidden="true" tabindex="-1"></a>train_cv_recall_rf <span class="op">=</span> metrics.recall_score(y_train, cv_predictions_rf_train)</span>
<span id="cb157-18"><a href="#cb157-18" aria-hidden="true" tabindex="-1"></a>train_cv_precision_rf <span class="op">=</span> metrics.precision_score(y_train, cv_predictions_rf_train)</span>
<span id="cb157-19"><a href="#cb157-19" aria-hidden="true" tabindex="-1"></a>train_cv_f1_rf <span class="op">=</span> metrics.f1_score(y_train, cv_predictions_rf_train)</span>
<span id="cb157-20"><a href="#cb157-20" aria-hidden="true" tabindex="-1"></a><span class="co"># Calculating ROC AUC</span></span>
<span id="cb157-21"><a href="#cb157-21" aria-hidden="true" tabindex="-1"></a>roc_auc_train_rf <span class="op">=</span> roc_auc_score(y_train, cv_predictions_rf_train)</span>
<span id="cb157-22"><a href="#cb157-22" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb157-23"><a href="#cb157-23" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb157-24"><a href="#cb157-24" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb157-25"><a href="#cb157-25" aria-hidden="true" tabindex="-1"></a><span class="co"># Printing the cross-validated performance metrics</span></span>
<span id="cb157-26"><a href="#cb157-26" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&#39;The metrics for the training set using the cross-validation are: &#39;</span>)</span>
<span id="cb157-27"><a href="#cb157-27" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Cross-Validated Accuracy:&quot;</span>, train_cv_acc_rf)</span>
<span id="cb157-28"><a href="#cb157-28" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Cross-Validated Recall:&quot;</span>, train_cv_recall_rf)</span>
<span id="cb157-29"><a href="#cb157-29" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Cross-Validated Precision:&quot;</span>, train_cv_precision_rf)</span>
<span id="cb157-30"><a href="#cb157-30" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Cross-Validated F1 Score:&quot;</span>, train_cv_f1_rf)</span>
<span id="cb157-31"><a href="#cb157-31" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Cross-Validated ROC_AUC Score:&quot;</span>, roc_auc_train_rf)</span></code></pre></div>
<div class="output stream stdout">
<pre><code>The metrics for the training set using the cross-validation are: 
Cross-Validated Accuracy: 0.9829110194460813
Cross-Validated Recall: 0.9992344706911636
Cross-Validated Precision: 0.9699575371549893
Cross-Validated F1 Score: 0.984378366731308
Cross-Validated ROC_AUC Score: 0.9815364788927324
</code></pre>
</div>
</div>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="tmc1dknA-fRQ" data-outputId="6ec487b0-ae1e-4426-e372-c8c93d02138d">
<div class="sourceCode" id="cb159"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb159-1"><a href="#cb159-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Record end time</span></span>
<span id="cb159-2"><a href="#cb159-2" aria-hidden="true" tabindex="-1"></a>end_time <span class="op">=</span> time.time()</span>
<span id="cb159-3"><a href="#cb159-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb159-4"><a href="#cb159-4" aria-hidden="true" tabindex="-1"></a><span class="co"># Calculate execution time</span></span>
<span id="cb159-5"><a href="#cb159-5" aria-hidden="true" tabindex="-1"></a>execution_time_rf <span class="op">=</span> end_time <span class="op">-</span> start_time</span>
<span id="cb159-6"><a href="#cb159-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb159-7"><a href="#cb159-7" aria-hidden="true" tabindex="-1"></a><span class="co"># Calculate memory usage</span></span>
<span id="cb159-8"><a href="#cb159-8" aria-hidden="true" tabindex="-1"></a>process <span class="op">=</span> psutil.Process()</span>
<span id="cb159-9"><a href="#cb159-9" aria-hidden="true" tabindex="-1"></a>memory_used_rf <span class="op">=</span> process.memory_info().rss <span class="op">/</span> (<span class="dv">1024</span> <span class="op">*</span> <span class="dv">1024</span>)  <span class="co"># Convert to MiB</span></span>
<span id="cb159-10"><a href="#cb159-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb159-11"><a href="#cb159-11" aria-hidden="true" tabindex="-1"></a><span class="co"># Print results</span></span>
<span id="cb159-12"><a href="#cb159-12" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&#39;Execution Time for Random Forest:&#39;</span>, execution_time_rf, <span class="st">&#39;seconds&#39;</span>)</span>
<span id="cb159-13"><a href="#cb159-13" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&#39;Memory Used by Random Forest:&#39;</span>, memory_used_rf, <span class="st">&#39;MiB&#39;</span>)</span></code></pre></div>
<div class="output stream stdout">
<pre><code>Execution Time for Random Forest: 9481.615648031235 seconds
Memory Used by Random Forest: 1664.6953125 MiB
</code></pre>
</div>
</div>
<section id="random-forest-model-with-the-testing-data"
class="cell markdown" id="k5EBUlwhIp7B">
<h1><strong>Random Forest Model with the Testing Data</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="gdc2apbZISma" data-outputId="1771fa82-7aad-476a-8b99-05f10dedf21e">
<div class="sourceCode" id="cb161"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb161-1"><a href="#cb161-1" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb161-2"><a href="#cb161-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> sklearn <span class="im">import</span> metrics</span>
<span id="cb161-3"><a href="#cb161-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb161-4"><a href="#cb161-4" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> sklearn.ensemble <span class="im">import</span> RandomForestClassifier</span>
<span id="cb161-5"><a href="#cb161-5" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span>
<span id="cb161-6"><a href="#cb161-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb161-7"><a href="#cb161-7" aria-hidden="true" tabindex="-1"></a><span class="co"># Specifying the logistic regression model</span></span>
<span id="cb161-8"><a href="#cb161-8" aria-hidden="true" tabindex="-1"></a><span class="co"># model = LogisticRegression(solver=&quot;liblinear&quot;, random_state=1)</span></span>
<span id="cb161-9"><a href="#cb161-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb161-10"><a href="#cb161-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb161-11"><a href="#cb161-11" aria-hidden="true" tabindex="-1"></a><span class="co">#Specify the Random Forest model</span></span>
<span id="cb161-12"><a href="#cb161-12" aria-hidden="true" tabindex="-1"></a><span class="co"># rf_model = RandomForestClassifier(n_estimators=100, random_state=1)</span></span>
<span id="cb161-13"><a href="#cb161-13" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb161-14"><a href="#cb161-14" aria-hidden="true" tabindex="-1"></a><span class="co"># best_rf_model.fit(x_test,y_test)</span></span>
<span id="cb161-15"><a href="#cb161-15" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb161-16"><a href="#cb161-16" aria-hidden="true" tabindex="-1"></a><span class="co"># Getting cross-validated predictions</span></span>
<span id="cb161-17"><a href="#cb161-17" aria-hidden="true" tabindex="-1"></a><span class="co"># cv_predictions_rf_test = cross_val_predict(best_rf_model, x_test, y_test, cv=10)</span></span>
<span id="cb161-18"><a href="#cb161-18" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb161-19"><a href="#cb161-19" aria-hidden="true" tabindex="-1"></a><span class="co"># Getting cross-validated predictions</span></span>
<span id="cb161-20"><a href="#cb161-20" aria-hidden="true" tabindex="-1"></a>cv_predictions_rf_test <span class="op">=</span> best_rf_model.predict(x_test)</span>
<span id="cb161-21"><a href="#cb161-21" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb161-22"><a href="#cb161-22" aria-hidden="true" tabindex="-1"></a><span class="co"># Calculating performance metrics based on cross-validated predictions</span></span>
<span id="cb161-23"><a href="#cb161-23" aria-hidden="true" tabindex="-1"></a>test_cv_acc_rf <span class="op">=</span> metrics.accuracy_score(y_test, cv_predictions_rf_test)</span>
<span id="cb161-24"><a href="#cb161-24" aria-hidden="true" tabindex="-1"></a>test_cv_recall_rf <span class="op">=</span> metrics.recall_score(y_test, cv_predictions_rf_test)</span>
<span id="cb161-25"><a href="#cb161-25" aria-hidden="true" tabindex="-1"></a>test_cv_precision_rf <span class="op">=</span> metrics.precision_score(y_test, cv_predictions_rf_test)</span>
<span id="cb161-26"><a href="#cb161-26" aria-hidden="true" tabindex="-1"></a>test_cv_f1_rf <span class="op">=</span> metrics.f1_score(y_test, cv_predictions_rf_test)</span>
<span id="cb161-27"><a href="#cb161-27" aria-hidden="true" tabindex="-1"></a><span class="co"># Calculating ROC AUC</span></span>
<span id="cb161-28"><a href="#cb161-28" aria-hidden="true" tabindex="-1"></a>roc_auc_test_rf <span class="op">=</span> roc_auc_score(y_test, cv_predictions_rf_test)</span>
<span id="cb161-29"><a href="#cb161-29" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb161-30"><a href="#cb161-30" aria-hidden="true" tabindex="-1"></a><span class="co"># Printing the cross-validated performance metrics</span></span>
<span id="cb161-31"><a href="#cb161-31" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&#39;The metrics for the testing set using the cross-validation are: &#39;</span>)</span>
<span id="cb161-32"><a href="#cb161-32" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Cross-Validated Accuracy:&quot;</span>, test_cv_acc_rf)</span>
<span id="cb161-33"><a href="#cb161-33" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Cross-Validated Recall:&quot;</span>, test_cv_recall_rf)</span>
<span id="cb161-34"><a href="#cb161-34" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Cross-Validated Precision:&quot;</span>, test_cv_precision_rf)</span>
<span id="cb161-35"><a href="#cb161-35" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Cross-Validated F1 Score:&quot;</span>, test_cv_f1_rf)</span>
<span id="cb161-36"><a href="#cb161-36" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Cross-Validated ROC_AUC Score:&quot;</span>, roc_auc_test_rf)</span></code></pre></div>
<div class="output stream stdout">
<pre><code>The metrics for the testing set using the cross-validation are: 
Cross-Validated Accuracy: 0.9407478691229035
Cross-Validated Recall: 0.9451703817576224
Cross-Validated Precision: 0.9444444444444444
Cross-Validated F1 Score: 0.9448072736585991
Cross-Validated ROC_AUC Score: 0.9403988960108196
</code></pre>
</div>
</div>
<div class="cell code" id="YGiVyoNHY2qM">
<div class="sourceCode" id="cb163"><pre
class="sourceCode python"><code class="sourceCode python"></code></pre></div>
</div>
<section id="metrics-for-the-random-forest---train-and-test"
class="cell markdown" id="v2zIn2u9t73B">
<h1><strong>Metrics for the Random Forest - Train and Test</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;,&quot;height&quot;:223}"
id="cJV-4DHmd7Ib" data-outputId="80d9bc8a-8b97-4ecc-8d4e-e86013541af2">
<div class="sourceCode" id="cb164"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb164-1"><a href="#cb164-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> pandas <span class="im">as</span> pd</span>
<span id="cb164-2"><a href="#cb164-2" aria-hidden="true" tabindex="-1"></a><span class="co"># Creating DataFrame</span></span>
<span id="cb164-3"><a href="#cb164-3" aria-hidden="true" tabindex="-1"></a>rf_output <span class="op">=</span> pd.DataFrame({</span>
<span id="cb164-4"><a href="#cb164-4" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;Training&#39;</span>: [train_cv_acc_rf, train_cv_recall_rf, train_cv_precision_rf, train_cv_f1_rf,roc_auc_train_rf],</span>
<span id="cb164-5"><a href="#cb164-5" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;Testing&#39;</span>: [test_cv_acc_rf, test_cv_recall_rf, test_cv_precision_rf, test_cv_f1_rf,roc_auc_test_rf]},</span>
<span id="cb164-6"><a href="#cb164-6" aria-hidden="true" tabindex="-1"></a>    <span class="co"># &#39;Model&#39;: [&#39;Logistic Regression&#39;,&#39;Logistic Regression&#39;,&#39;Logistic Regression&#39;,&#39;Logistic Regression&#39;]},</span></span>
<span id="cb164-7"><a href="#cb164-7" aria-hidden="true" tabindex="-1"></a>                              index <span class="op">=</span> [<span class="st">&#39;Accuracy&#39;</span>,<span class="st">&#39;Recall&#39;</span>,<span class="st">&#39;Precision&#39;</span>,<span class="st">&#39;F1&#39;</span>,<span class="st">&#39;ROC_AUC&#39;</span>])</span>
<span id="cb164-8"><a href="#cb164-8" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;The metrics for the Random Forest are: &quot;</span>)</span>
<span id="cb164-9"><a href="#cb164-9" aria-hidden="true" tabindex="-1"></a>rf_output</span></code></pre></div>
<div class="output stream stdout">
<pre><code>The metrics for the Random Forest are: 
</code></pre>
</div>
<div class="output execute_result" data-execution_count="54">

  <div id="df-1f3413fa-78c9-4af2-a108-ec4c509302ac" class="colab-df-container">
    <div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Training</th>
      <th>Testing</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>Accuracy</th>
      <td>0.982911</td>
      <td>0.940748</td>
    </tr>
    <tr>
      <th>Recall</th>
      <td>0.999234</td>
      <td>0.945170</td>
    </tr>
    <tr>
      <th>Precision</th>
      <td>0.969958</td>
      <td>0.944444</td>
    </tr>
    <tr>
      <th>F1</th>
      <td>0.984378</td>
      <td>0.944807</td>
    </tr>
    <tr>
      <th>ROC_AUC</th>
      <td>0.981536</td>
      <td>0.940399</td>
    </tr>
  </tbody>
</table>
</div>
    <div class="colab-df-buttons">

  <div class="colab-df-container">
    <button class="colab-df-convert" onclick="convertToInteractive('df-1f3413fa-78c9-4af2-a108-ec4c509302ac')"
            title="Convert this dataframe to an interactive table."
            style="display:none;">

  <svg xmlns="http://www.w3.org/2000/svg" height="24px" viewBox="0 -960 960 960">
    <path d="M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z"/>
  </svg>
    </button>

  <style>
    .colab-df-container {
      display:flex;
      gap: 12px;
    }

    .colab-df-convert {
      background-color: #E8F0FE;
      border: none;
      border-radius: 50%;
      cursor: pointer;
      display: none;
      fill: #1967D2;
      height: 32px;
      padding: 0 0 0 0;
      width: 32px;
    }

    .colab-df-convert:hover {
      background-color: #E2EBFA;
      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);
      fill: #174EA6;
    }

    .colab-df-buttons div {
      margin-bottom: 4px;
    }

    [theme=dark] .colab-df-convert {
      background-color: #3B4455;
      fill: #D2E3FC;
    }

    [theme=dark] .colab-df-convert:hover {
      background-color: #434B5C;
      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);
      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));
      fill: #FFFFFF;
    }
  </style>

    <script>
      const buttonEl =
        document.querySelector('#df-1f3413fa-78c9-4af2-a108-ec4c509302ac button.colab-df-convert');
      buttonEl.style.display =
        google.colab.kernel.accessAllowed ? 'block' : 'none';

      async function convertToInteractive(key) {
        const element = document.querySelector('#df-1f3413fa-78c9-4af2-a108-ec4c509302ac');
        const dataTable =
          await google.colab.kernel.invokeFunction('convertToInteractive',
                                                    [key], {});
        if (!dataTable) return;

        const docLinkHtml = 'Like what you see? Visit the ' +
          '<a target="_blank" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'
          + ' to learn more about interactive tables.';
        element.innerHTML = '';
        dataTable['output_type'] = 'display_data';
        await google.colab.output.renderOutput(dataTable, element);
        const docLink = document.createElement('div');
        docLink.innerHTML = docLinkHtml;
        element.appendChild(docLink);
      }
    </script>
  </div>


<div id="df-c114a6e2-4815-44dc-95e6-a64cc327efb7">
  <button class="colab-df-quickchart" onclick="quickchart('df-c114a6e2-4815-44dc-95e6-a64cc327efb7')"
            title="Suggest charts"
            style="display:none;">

<svg xmlns="http://www.w3.org/2000/svg" height="24px"viewBox="0 0 24 24"
     width="24px">
    <g>
        <path d="M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z"/>
    </g>
</svg>
  </button>

<style>
  .colab-df-quickchart {
      --bg-color: #E8F0FE;
      --fill-color: #1967D2;
      --hover-bg-color: #E2EBFA;
      --hover-fill-color: #174EA6;
      --disabled-fill-color: #AAA;
      --disabled-bg-color: #DDD;
  }

  [theme=dark] .colab-df-quickchart {
      --bg-color: #3B4455;
      --fill-color: #D2E3FC;
      --hover-bg-color: #434B5C;
      --hover-fill-color: #FFFFFF;
      --disabled-bg-color: #3B4455;
      --disabled-fill-color: #666;
  }

  .colab-df-quickchart {
    background-color: var(--bg-color);
    border: none;
    border-radius: 50%;
    cursor: pointer;
    display: none;
    fill: var(--fill-color);
    height: 32px;
    padding: 0;
    width: 32px;
  }

  .colab-df-quickchart:hover {
    background-color: var(--hover-bg-color);
    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);
    fill: var(--button-hover-fill-color);
  }

  .colab-df-quickchart-complete:disabled,
  .colab-df-quickchart-complete:disabled:hover {
    background-color: var(--disabled-bg-color);
    fill: var(--disabled-fill-color);
    box-shadow: none;
  }

  .colab-df-spinner {
    border: 2px solid var(--fill-color);
    border-color: transparent;
    border-bottom-color: var(--fill-color);
    animation:
      spin 1s steps(1) infinite;
  }

  @keyframes spin {
    0% {
      border-color: transparent;
      border-bottom-color: var(--fill-color);
      border-left-color: var(--fill-color);
    }
    20% {
      border-color: transparent;
      border-left-color: var(--fill-color);
      border-top-color: var(--fill-color);
    }
    30% {
      border-color: transparent;
      border-left-color: var(--fill-color);
      border-top-color: var(--fill-color);
      border-right-color: var(--fill-color);
    }
    40% {
      border-color: transparent;
      border-right-color: var(--fill-color);
      border-top-color: var(--fill-color);
    }
    60% {
      border-color: transparent;
      border-right-color: var(--fill-color);
    }
    80% {
      border-color: transparent;
      border-right-color: var(--fill-color);
      border-bottom-color: var(--fill-color);
    }
    90% {
      border-color: transparent;
      border-bottom-color: var(--fill-color);
    }
  }
</style>

  <script>
    async function quickchart(key) {
      const quickchartButtonEl =
        document.querySelector('#' + key + ' button');
      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.
      quickchartButtonEl.classList.add('colab-df-spinner');
      try {
        const charts = await google.colab.kernel.invokeFunction(
            'suggestCharts', [key], {});
      } catch (error) {
        console.error('Error during call to suggestCharts:', error);
      }
      quickchartButtonEl.classList.remove('colab-df-spinner');
      quickchartButtonEl.classList.add('colab-df-quickchart-complete');
    }
    (() => {
      let quickchartButtonEl =
        document.querySelector('#df-c114a6e2-4815-44dc-95e6-a64cc327efb7 button');
      quickchartButtonEl.style.display =
        google.colab.kernel.accessAllowed ? 'block' : 'none';
    })();
  </script>
</div>

  <div id="id_b2a565be-c3e9-428d-90a3-a1f32b99eab0">
    <style>
      .colab-df-generate {
        background-color: #E8F0FE;
        border: none;
        border-radius: 50%;
        cursor: pointer;
        display: none;
        fill: #1967D2;
        height: 32px;
        padding: 0 0 0 0;
        width: 32px;
      }

      .colab-df-generate:hover {
        background-color: #E2EBFA;
        box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);
        fill: #174EA6;
      }

      [theme=dark] .colab-df-generate {
        background-color: #3B4455;
        fill: #D2E3FC;
      }

      [theme=dark] .colab-df-generate:hover {
        background-color: #434B5C;
        box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);
        filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));
        fill: #FFFFFF;
      }
    </style>
    <button class="colab-df-generate" onclick="generateWithVariable('rf_output')"
            title="Generate code using this dataframe."
            style="display:none;">

  <svg xmlns="http://www.w3.org/2000/svg" height="24px"viewBox="0 0 24 24"
       width="24px">
    <path d="M7,19H8.4L18.45,9,17,7.55,7,17.6ZM5,21V16.75L18.45,3.32a2,2,0,0,1,2.83,0l1.4,1.43a1.91,1.91,0,0,1,.58,1.4,1.91,1.91,0,0,1-.58,1.4L9.25,21ZM18.45,9,17,7.55Zm-12,3A5.31,5.31,0,0,0,4.9,8.1,5.31,5.31,0,0,0,1,6.5,5.31,5.31,0,0,0,4.9,4.9,5.31,5.31,0,0,0,6.5,1,5.31,5.31,0,0,0,8.1,4.9,5.31,5.31,0,0,0,12,6.5,5.46,5.46,0,0,0,6.5,12Z"/>
  </svg>
    </button>
    <script>
      (() => {
      const buttonEl =
        document.querySelector('#id_b2a565be-c3e9-428d-90a3-a1f32b99eab0 button.colab-df-generate');
      buttonEl.style.display =
        google.colab.kernel.accessAllowed ? 'block' : 'none';

      buttonEl.onclick = () => {
        google.colab.notebook.generateWithVariable('rf_output');
      }
      })();
    </script>
  </div>

    </div>
  </div>

</div>
</div>
<div class="cell markdown" id="bLDWeukcj4yo">
<ul>
<li><p><strong>The above are the evaluation metrics of the Random Forest
Model on the training and testing set. There is no overfitting of the
model. The Random Forest is performing well in identifying and
predicting the customer churn on the new and unseen data. Also, the
predictive accuracy of 94.077% on the testing set is better than the
Logistic Regression and SVM.</strong></p></li>
<li><p><strong>Furthermore, though the accuracy is better than the other
two models in this phase, when compared to that of the accuracy without
performing the sentiment analysis it is relatively low to some extent.
The accuracy of RF in this phase is 94.07% which is 0.04% more than the
one in the initial phase.</strong></p></li>
</ul>
</div>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;,&quot;height&quot;:472}"
id="6ONywy33Upj0" data-outputId="3f026b79-2772-42c1-e32f-ee098b5f48c3">
<div class="sourceCode" id="cb166"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb166-1"><a href="#cb166-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> matplotlib.pyplot <span class="im">as</span> plt</span>
<span id="cb166-2"><a href="#cb166-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> sklearn.metrics <span class="im">import</span> roc_curve, auc</span>
<span id="cb166-3"><a href="#cb166-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb166-4"><a href="#cb166-4" aria-hidden="true" tabindex="-1"></a><span class="co"># Get probabilities for positive class</span></span>
<span id="cb166-5"><a href="#cb166-5" aria-hidden="true" tabindex="-1"></a>y_probs_rf <span class="op">=</span> best_rf_model.predict_proba(x_test)[:, <span class="dv">1</span>]</span>
<span id="cb166-6"><a href="#cb166-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb166-7"><a href="#cb166-7" aria-hidden="true" tabindex="-1"></a><span class="co"># Compute ROC curve</span></span>
<span id="cb166-8"><a href="#cb166-8" aria-hidden="true" tabindex="-1"></a>fpr, tpr, thresholds <span class="op">=</span> roc_curve(y_test, y_probs_rf)</span>
<span id="cb166-9"><a href="#cb166-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb166-10"><a href="#cb166-10" aria-hidden="true" tabindex="-1"></a><span class="co"># Compute ROC area under the curve</span></span>
<span id="cb166-11"><a href="#cb166-11" aria-hidden="true" tabindex="-1"></a>roc_auc <span class="op">=</span> auc(fpr, tpr)</span>
<span id="cb166-12"><a href="#cb166-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb166-13"><a href="#cb166-13" aria-hidden="true" tabindex="-1"></a><span class="co"># Plot ROC curve</span></span>
<span id="cb166-14"><a href="#cb166-14" aria-hidden="true" tabindex="-1"></a>plt.figure()</span>
<span id="cb166-15"><a href="#cb166-15" aria-hidden="true" tabindex="-1"></a>plt.plot(fpr, tpr, color<span class="op">=</span><span class="st">&#39;darkorange&#39;</span>, lw<span class="op">=</span><span class="dv">2</span>, label<span class="op">=</span><span class="st">&#39;ROC curve (area = </span><span class="sc">%0.2f</span><span class="st">)&#39;</span> <span class="op">%</span> roc_auc)</span>
<span id="cb166-16"><a href="#cb166-16" aria-hidden="true" tabindex="-1"></a>plt.plot([<span class="dv">0</span>, <span class="dv">1</span>], [<span class="dv">0</span>, <span class="dv">1</span>], color<span class="op">=</span><span class="st">&#39;navy&#39;</span>, lw<span class="op">=</span><span class="dv">2</span>, linestyle<span class="op">=</span><span class="st">&#39;--&#39;</span>)</span>
<span id="cb166-17"><a href="#cb166-17" aria-hidden="true" tabindex="-1"></a>plt.xlim([<span class="fl">0.0</span>, <span class="fl">1.0</span>])</span>
<span id="cb166-18"><a href="#cb166-18" aria-hidden="true" tabindex="-1"></a>plt.ylim([<span class="fl">0.0</span>, <span class="fl">1.05</span>])</span>
<span id="cb166-19"><a href="#cb166-19" aria-hidden="true" tabindex="-1"></a>plt.xlabel(<span class="st">&#39;False Positive Rate&#39;</span>)</span>
<span id="cb166-20"><a href="#cb166-20" aria-hidden="true" tabindex="-1"></a>plt.ylabel(<span class="st">&#39;True Positive Rate&#39;</span>)</span>
<span id="cb166-21"><a href="#cb166-21" aria-hidden="true" tabindex="-1"></a>plt.title(<span class="st">&#39;Receiver Operating Characteristic (ROC) Curve&#39;</span>)</span>
<span id="cb166-22"><a href="#cb166-22" aria-hidden="true" tabindex="-1"></a>plt.legend(loc<span class="op">=</span><span class="st">&quot;lower right&quot;</span>)</span>
<span id="cb166-23"><a href="#cb166-23" aria-hidden="true" tabindex="-1"></a>plt.show()</span></code></pre></div>
<div class="output display_data">
<p><img
src="vertopal_11c7d057c72c4c018277cd184609f64f/448048f1a0c6303fd3e739f55f807305bf794f69.png" /></p>
</div>
</div>
<div class="cell markdown" id="SVFU0nTb90N6">
<p><strong>From the above ROC curve, it can be seen that the Random
Forest is distinguishing so great between both the classes - positive
and negative. It can be gleaned from the fact that the area under the
cruve is more than 0.5 and is approximately equal to 1. Furthermore, RF
is observed to be distinguishing better in both the instances till now
compared to the previous models of Logistic Regression and
SVM.</strong></p>
</div>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="LoenazmDviUh" data-outputId="0761e7a1-a4f3-4862-bbb3-0fb8c24e4f29">
<div class="sourceCode" id="cb167"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb167-1"><a href="#cb167-1" aria-hidden="true" tabindex="-1"></a>df.columns</span></code></pre></div>
<div class="output execute_result" data-execution_count="74">
<pre><code>Index([&#39;age&#39;, &#39;gender&#39;, &#39;region_category&#39;, &#39;membership_category&#39;,
       &#39;medium_of_operation&#39;, &#39;internet_option&#39;, &#39;days_since_last_login&#39;,
       &#39;avg_time_spent&#39;, &#39;avg_transaction_value&#39;, &#39;points_in_wallet&#39;,
       &#39;past_complaint&#39;, &#39;complaint_status&#39;, &#39;feedback&#39;, &#39;churn_risk_score&#39;,
       &#39;score&#39;],
      dtype=&#39;object&#39;)</code></pre>
</div>
</div>
<section id="variable-importances-for-the-random-forest"
class="cell markdown" id="Jo45CtY2wdsq">
<h1><strong>Variable Importances for the Random Forest</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;,&quot;height&quot;:564}"
id="7Z6oikgXwjoX" data-outputId="869ad001-599e-4fc7-87fd-0e68793e4ad4">
<div class="sourceCode" id="cb169"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb169-1"><a href="#cb169-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> matplotlib.pyplot <span class="im">as</span> plt</span>
<span id="cb169-2"><a href="#cb169-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb169-3"><a href="#cb169-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Assuming df is your DataFrame containing the data</span></span>
<span id="cb169-4"><a href="#cb169-4" aria-hidden="true" tabindex="-1"></a><span class="co"># Exclude the target variable</span></span>
<span id="cb169-5"><a href="#cb169-5" aria-hidden="true" tabindex="-1"></a>feature_names <span class="op">=</span> df.columns.drop([<span class="st">&#39;churn_risk_score&#39;</span>, <span class="st">&#39;feedback&#39;</span>])</span>
<span id="cb169-6"><a href="#cb169-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb169-7"><a href="#cb169-7" aria-hidden="true" tabindex="-1"></a><span class="co"># Getting feature importances</span></span>
<span id="cb169-8"><a href="#cb169-8" aria-hidden="true" tabindex="-1"></a>feature_importances <span class="op">=</span> best_rf_model.feature_importances_</span>
<span id="cb169-9"><a href="#cb169-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb169-10"><a href="#cb169-10" aria-hidden="true" tabindex="-1"></a><span class="co"># Sorting feature importances and feature names by importance</span></span>
<span id="cb169-11"><a href="#cb169-11" aria-hidden="true" tabindex="-1"></a>sorted_indices <span class="op">=</span> np.argsort(feature_importances)[::<span class="op">-</span><span class="dv">1</span>]</span>
<span id="cb169-12"><a href="#cb169-12" aria-hidden="true" tabindex="-1"></a>sorted_feature_importances <span class="op">=</span> feature_importances[sorted_indices]</span>
<span id="cb169-13"><a href="#cb169-13" aria-hidden="true" tabindex="-1"></a>sorted_feature_names <span class="op">=</span> feature_names[sorted_indices]</span>
<span id="cb169-14"><a href="#cb169-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb169-15"><a href="#cb169-15" aria-hidden="true" tabindex="-1"></a><span class="co"># Plotting the feature importances</span></span>
<span id="cb169-16"><a href="#cb169-16" aria-hidden="true" tabindex="-1"></a>plt.figure(figsize<span class="op">=</span>(<span class="dv">10</span>, <span class="dv">6</span>))</span>
<span id="cb169-17"><a href="#cb169-17" aria-hidden="true" tabindex="-1"></a>plt.barh(sorted_feature_names, sorted_feature_importances)</span>
<span id="cb169-18"><a href="#cb169-18" aria-hidden="true" tabindex="-1"></a>plt.xlabel(<span class="st">&#39;Feature Importance&#39;</span>)</span>
<span id="cb169-19"><a href="#cb169-19" aria-hidden="true" tabindex="-1"></a>plt.title(<span class="st">&#39;Variable Importance for Random Forest&#39;</span>)</span>
<span id="cb169-20"><a href="#cb169-20" aria-hidden="true" tabindex="-1"></a>plt.show()</span></code></pre></div>
<div class="output display_data">
<p><img
src="vertopal_11c7d057c72c4c018277cd184609f64f/6539c83e9c61f009ade021773ec496477499764c.png" /></p>
</div>
</div>
<div class="cell code" id="D_gWn0i8vZ1y">
<div class="sourceCode" id="cb170"><pre
class="sourceCode python"><code class="sourceCode python"></code></pre></div>
</div>
<div class="cell markdown" id="QWnRnlEsy6XM">
<ul>
<li><p><strong>From the variable importance graph it can be seen that
"points_in_wallet" appeared to be most impotant with Random Forest along
with "membership_category" in the second place for the churn prediction
of this organization. In addition to "membership_categor", "score",
"label" and "avg_transaction_value" also found to be relatively
important which is seen in the logistic regression and SVM as
well.</strong></p></li>
<li><p><strong>It can be observed that among the newly added variables
"score" and "label", the "score" variable is appeared to be having more
importance for the churn prediction in the RF. Nonetheless, the
importance value is relatively low compared to "membership_category" and
"points_in_wallet".</strong></p></li>
</ul>
</div>
<section id="confusion-matrix-for-the-random-forest"
class="cell markdown" id="hcco4BhCI1cS">
<h1><strong>Confusion Matrix for the Random Forest</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;,&quot;height&quot;:1000}"
id="SzYjf5hrIbrq" data-outputId="db502740-96ee-4e17-8a1c-acc6bfa3c858">
<div class="sourceCode" id="cb171"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb171-1"><a href="#cb171-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> sklearn.metrics <span class="im">import</span> confusion_matrix</span>
<span id="cb171-2"><a href="#cb171-2" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> seaborn <span class="im">as</span> sns</span>
<span id="cb171-3"><a href="#cb171-3" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> matplotlib.pyplot <span class="im">as</span> plt</span>
<span id="cb171-4"><a href="#cb171-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb171-5"><a href="#cb171-5" aria-hidden="true" tabindex="-1"></a><span class="co"># Generating confusion matrix for training set</span></span>
<span id="cb171-6"><a href="#cb171-6" aria-hidden="true" tabindex="-1"></a><span class="co"># Getting cross-validated predictions</span></span>
<span id="cb171-7"><a href="#cb171-7" aria-hidden="true" tabindex="-1"></a><span class="co"># cv_predictions = cross_val_predict(best_rf_model, x_train, y_train, cv=5)</span></span>
<span id="cb171-8"><a href="#cb171-8" aria-hidden="true" tabindex="-1"></a>conf_matrix_train <span class="op">=</span> confusion_matrix(y_train, cv_predictions_rf_train)</span>
<span id="cb171-9"><a href="#cb171-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb171-10"><a href="#cb171-10" aria-hidden="true" tabindex="-1"></a><span class="co"># Plotting confusion matrix for training set</span></span>
<span id="cb171-11"><a href="#cb171-11" aria-hidden="true" tabindex="-1"></a>plt.figure(figsize<span class="op">=</span>(<span class="dv">8</span>, <span class="dv">6</span>))</span>
<span id="cb171-12"><a href="#cb171-12" aria-hidden="true" tabindex="-1"></a>sns.heatmap(conf_matrix_train, annot<span class="op">=</span><span class="va">True</span>, fmt<span class="op">=</span><span class="st">&#39;d&#39;</span>, cmap<span class="op">=</span><span class="st">&#39;Blues&#39;</span>,</span>
<span id="cb171-13"><a href="#cb171-13" aria-hidden="true" tabindex="-1"></a>            xticklabels<span class="op">=</span>[<span class="st">&#39;Predicted Negative&#39;</span>, <span class="st">&#39;Predicted Positive&#39;</span>],</span>
<span id="cb171-14"><a href="#cb171-14" aria-hidden="true" tabindex="-1"></a>            yticklabels<span class="op">=</span>[<span class="st">&#39;Actual Negative&#39;</span>, <span class="st">&#39;Actual Positive&#39;</span>])</span>
<span id="cb171-15"><a href="#cb171-15" aria-hidden="true" tabindex="-1"></a>plt.title(<span class="st">&#39;Confusion Matrix - Training Set&#39;</span>)</span>
<span id="cb171-16"><a href="#cb171-16" aria-hidden="true" tabindex="-1"></a>plt.xlabel(<span class="st">&#39;Predicted labels&#39;</span>)</span>
<span id="cb171-17"><a href="#cb171-17" aria-hidden="true" tabindex="-1"></a>plt.ylabel(<span class="st">&#39;True labels&#39;</span>)</span>
<span id="cb171-18"><a href="#cb171-18" aria-hidden="true" tabindex="-1"></a>plt.show()</span>
<span id="cb171-19"><a href="#cb171-19" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb171-20"><a href="#cb171-20" aria-hidden="true" tabindex="-1"></a><span class="co"># Generating confusion matrix for testing set</span></span>
<span id="cb171-21"><a href="#cb171-21" aria-hidden="true" tabindex="-1"></a><span class="co"># cv_predictions = cross_val_predict(best_rf_model, x_test, y_test, cv=5)</span></span>
<span id="cb171-22"><a href="#cb171-22" aria-hidden="true" tabindex="-1"></a>conf_matrix_test <span class="op">=</span> confusion_matrix(y_test, cv_predictions_rf_test)</span>
<span id="cb171-23"><a href="#cb171-23" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb171-24"><a href="#cb171-24" aria-hidden="true" tabindex="-1"></a><span class="co"># Plotting confusion matrix for testing set</span></span>
<span id="cb171-25"><a href="#cb171-25" aria-hidden="true" tabindex="-1"></a>plt.figure(figsize<span class="op">=</span>(<span class="dv">8</span>, <span class="dv">6</span>))</span>
<span id="cb171-26"><a href="#cb171-26" aria-hidden="true" tabindex="-1"></a>sns.heatmap(conf_matrix_test, annot<span class="op">=</span><span class="va">True</span>, fmt<span class="op">=</span><span class="st">&#39;d&#39;</span>, cmap<span class="op">=</span><span class="st">&#39;Blues&#39;</span>,</span>
<span id="cb171-27"><a href="#cb171-27" aria-hidden="true" tabindex="-1"></a>            xticklabels<span class="op">=</span>[<span class="st">&#39;Predicted Negative&#39;</span>, <span class="st">&#39;Predicted Positive&#39;</span>],</span>
<span id="cb171-28"><a href="#cb171-28" aria-hidden="true" tabindex="-1"></a>            yticklabels<span class="op">=</span>[<span class="st">&#39;Actual Negative&#39;</span>, <span class="st">&#39;Actual Positive&#39;</span>])</span>
<span id="cb171-29"><a href="#cb171-29" aria-hidden="true" tabindex="-1"></a>plt.title(<span class="st">&#39;Confusion Matrix - Testing Set&#39;</span>)</span>
<span id="cb171-30"><a href="#cb171-30" aria-hidden="true" tabindex="-1"></a>plt.xlabel(<span class="st">&#39;Predicted labels&#39;</span>)</span>
<span id="cb171-31"><a href="#cb171-31" aria-hidden="true" tabindex="-1"></a>plt.ylabel(<span class="st">&#39;True labels&#39;</span>)</span>
<span id="cb171-32"><a href="#cb171-32" aria-hidden="true" tabindex="-1"></a>plt.show()</span></code></pre></div>
<div class="output display_data">
<p><img
src="vertopal_11c7d057c72c4c018277cd184609f64f/b50835fcdc41b0c8ff6157b557673c44357c4c88.png" /></p>
</div>
<div class="output display_data">
<p><img
src="vertopal_11c7d057c72c4c018277cd184609f64f/248782ce51084495afbc2fe2d2489ea775ef62c8.png" /></p>
</div>
</div>
<section id="calculating-the-sensitivity-and-specificity"
class="cell markdown" id="l8RhtSVu-nSw">
<h1><strong>Calculating the Sensitivity and Specificity</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="ne2ksHUZ_aL3" data-outputId="5c319193-de77-4f23-fcb1-5de896deaa77">
<div class="sourceCode" id="cb172"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb172-1"><a href="#cb172-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Extracting true negatives, false positives, false negatives, and true positives from confusion matrix</span></span>
<span id="cb172-2"><a href="#cb172-2" aria-hidden="true" tabindex="-1"></a>tn_train, fp_train, fn_train, tp_train <span class="op">=</span> conf_matrix_train.ravel()</span>
<span id="cb172-3"><a href="#cb172-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb172-4"><a href="#cb172-4" aria-hidden="true" tabindex="-1"></a><span class="co"># Calculating specificity and sensitivity</span></span>
<span id="cb172-5"><a href="#cb172-5" aria-hidden="true" tabindex="-1"></a>specificity_rf_train <span class="op">=</span> tn_train <span class="op">/</span> (tn_train <span class="op">+</span> fp_train)</span>
<span id="cb172-6"><a href="#cb172-6" aria-hidden="true" tabindex="-1"></a>sensitivity_rf_train <span class="op">=</span> tp_train <span class="op">/</span> (tp_train <span class="op">+</span> fn_train)</span>
<span id="cb172-7"><a href="#cb172-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb172-8"><a href="#cb172-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb172-9"><a href="#cb172-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb172-10"><a href="#cb172-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb172-11"><a href="#cb172-11" aria-hidden="true" tabindex="-1"></a><span class="co"># Extracting true negatives, false positives, false negatives, and true positives from confusion matrix</span></span>
<span id="cb172-12"><a href="#cb172-12" aria-hidden="true" tabindex="-1"></a>tn_test, fp_test, fn_test, tp_test <span class="op">=</span> conf_matrix_test.ravel()</span>
<span id="cb172-13"><a href="#cb172-13" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb172-14"><a href="#cb172-14" aria-hidden="true" tabindex="-1"></a><span class="co"># Calculating specificity and sensitivity</span></span>
<span id="cb172-15"><a href="#cb172-15" aria-hidden="true" tabindex="-1"></a>specificity_rf_test <span class="op">=</span> tn_test <span class="op">/</span> (tn_test <span class="op">+</span> fp_test)</span>
<span id="cb172-16"><a href="#cb172-16" aria-hidden="true" tabindex="-1"></a>sensitivity_rf_test <span class="op">=</span> tp_test <span class="op">/</span> (tp_test <span class="op">+</span> fn_test)</span>
<span id="cb172-17"><a href="#cb172-17" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb172-18"><a href="#cb172-18" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb172-19"><a href="#cb172-19" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f&quot;The Sensitivity and Specificity of the Logistic Regression on the training data:</span><span class="ch">\n</span><span class="ss">Specificity:</span><span class="sc">{</span>specificity_rf_train<span class="sc">}</span><span class="ch">\n</span><span class="ss">Sensitivity:</span><span class="sc">{</span>sensitivity_rf_train<span class="sc">}</span><span class="ch">\n\n</span><span class="ss">&quot;</span>)</span>
<span id="cb172-20"><a href="#cb172-20" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f&quot;The Sensitivity and Specificity of the Logistic Regression on the testing data:</span><span class="ch">\n</span><span class="ss">Specificity:</span><span class="sc">{</span>specificity_rf_test<span class="sc">}</span><span class="ch">\n</span><span class="ss">Sensitivity:</span><span class="sc">{</span>sensitivity_rf_test<span class="sc">}</span><span class="ss">&quot;</span>)</span>
<span id="cb172-21"><a href="#cb172-21" aria-hidden="true" tabindex="-1"></a></span></code></pre></div>
<div class="output stream stdout">
<pre><code>The Sensitivity and Specificity of the Logistic Regression on the training data:
Specificity:0.9638384870943011
Sensitivity:0.9992344706911636


The Sensitivity and Specificity of the Logistic Regression on the testing data:
Specificity:0.9356274102640166
Sensitivity:0.9451703817576224
</code></pre>
</div>
</div>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;,&quot;height&quot;:286}"
id="SzZ7AHEZRTBL" data-outputId="5e00ad4c-e70a-4ebd-9cd3-9530933d65b9">
<div class="sourceCode" id="cb174"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb174-1"><a href="#cb174-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> pandas <span class="im">as</span> pd</span>
<span id="cb174-2"><a href="#cb174-2" aria-hidden="true" tabindex="-1"></a><span class="co"># Creating DataFrame</span></span>
<span id="cb174-3"><a href="#cb174-3" aria-hidden="true" tabindex="-1"></a>rf_output <span class="op">=</span> pd.DataFrame({</span>
<span id="cb174-4"><a href="#cb174-4" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;Training&#39;</span>: [train_cv_acc_rf, train_cv_recall_rf, train_cv_precision_rf, train_cv_f1_rf,roc_auc_train_rf,</span>
<span id="cb174-5"><a href="#cb174-5" aria-hidden="true" tabindex="-1"></a>                 specificity_rf_train,sensitivity_rf_train],</span>
<span id="cb174-6"><a href="#cb174-6" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;Testing&#39;</span>: [test_cv_acc_rf, test_cv_recall_rf, test_cv_precision_rf, test_cv_f1_rf,roc_auc_test_rf,</span>
<span id="cb174-7"><a href="#cb174-7" aria-hidden="true" tabindex="-1"></a>                specificity_rf_test,sensitivity_rf_test]},</span>
<span id="cb174-8"><a href="#cb174-8" aria-hidden="true" tabindex="-1"></a>    <span class="co"># &#39;Model&#39;: [&#39;Logistic Regression&#39;,&#39;Logistic Regression&#39;,&#39;Logistic Regression&#39;,&#39;Logistic Regression&#39;]},</span></span>
<span id="cb174-9"><a href="#cb174-9" aria-hidden="true" tabindex="-1"></a>                              index <span class="op">=</span> [<span class="st">&#39;Accuracy&#39;</span>,<span class="st">&#39;Recall&#39;</span>,<span class="st">&#39;Precision&#39;</span>,<span class="st">&#39;F1&#39;</span>,<span class="st">&#39;ROC_AUC&#39;</span>,<span class="st">&#39;Specificity&#39;</span>,<span class="st">&#39;Sensitivity&#39;</span>])</span>
<span id="cb174-10"><a href="#cb174-10" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;The metrics for the Random Forest are: &quot;</span>)</span>
<span id="cb174-11"><a href="#cb174-11" aria-hidden="true" tabindex="-1"></a>rf_output</span></code></pre></div>
<div class="output stream stdout">
<pre><code>The metrics for the Random Forest are: 
</code></pre>
</div>
<div class="output execute_result" data-execution_count="59">

  <div id="df-fe24deb1-e6d8-4983-952e-476401c11775" class="colab-df-container">
    <div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Training</th>
      <th>Testing</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>Accuracy</th>
      <td>0.982911</td>
      <td>0.940748</td>
    </tr>
    <tr>
      <th>Recall</th>
      <td>0.999234</td>
      <td>0.945170</td>
    </tr>
    <tr>
      <th>Precision</th>
      <td>0.969958</td>
      <td>0.944444</td>
    </tr>
    <tr>
      <th>F1</th>
      <td>0.984378</td>
      <td>0.944807</td>
    </tr>
    <tr>
      <th>ROC_AUC</th>
      <td>0.981536</td>
      <td>0.940399</td>
    </tr>
    <tr>
      <th>Specificity</th>
      <td>0.963838</td>
      <td>0.935627</td>
    </tr>
    <tr>
      <th>Sensitivity</th>
      <td>0.999234</td>
      <td>0.945170</td>
    </tr>
  </tbody>
</table>
</div>
    <div class="colab-df-buttons">

  <div class="colab-df-container">
    <button class="colab-df-convert" onclick="convertToInteractive('df-fe24deb1-e6d8-4983-952e-476401c11775')"
            title="Convert this dataframe to an interactive table."
            style="display:none;">

  <svg xmlns="http://www.w3.org/2000/svg" height="24px" viewBox="0 -960 960 960">
    <path d="M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z"/>
  </svg>
    </button>

  <style>
    .colab-df-container {
      display:flex;
      gap: 12px;
    }

    .colab-df-convert {
      background-color: #E8F0FE;
      border: none;
      border-radius: 50%;
      cursor: pointer;
      display: none;
      fill: #1967D2;
      height: 32px;
      padding: 0 0 0 0;
      width: 32px;
    }

    .colab-df-convert:hover {
      background-color: #E2EBFA;
      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);
      fill: #174EA6;
    }

    .colab-df-buttons div {
      margin-bottom: 4px;
    }

    [theme=dark] .colab-df-convert {
      background-color: #3B4455;
      fill: #D2E3FC;
    }

    [theme=dark] .colab-df-convert:hover {
      background-color: #434B5C;
      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);
      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));
      fill: #FFFFFF;
    }
  </style>

    <script>
      const buttonEl =
        document.querySelector('#df-fe24deb1-e6d8-4983-952e-476401c11775 button.colab-df-convert');
      buttonEl.style.display =
        google.colab.kernel.accessAllowed ? 'block' : 'none';

      async function convertToInteractive(key) {
        const element = document.querySelector('#df-fe24deb1-e6d8-4983-952e-476401c11775');
        const dataTable =
          await google.colab.kernel.invokeFunction('convertToInteractive',
                                                    [key], {});
        if (!dataTable) return;

        const docLinkHtml = 'Like what you see? Visit the ' +
          '<a target="_blank" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'
          + ' to learn more about interactive tables.';
        element.innerHTML = '';
        dataTable['output_type'] = 'display_data';
        await google.colab.output.renderOutput(dataTable, element);
        const docLink = document.createElement('div');
        docLink.innerHTML = docLinkHtml;
        element.appendChild(docLink);
      }
    </script>
  </div>


<div id="df-eb9d600a-9ff9-427c-b6bf-2aa977e63acd">
  <button class="colab-df-quickchart" onclick="quickchart('df-eb9d600a-9ff9-427c-b6bf-2aa977e63acd')"
            title="Suggest charts"
            style="display:none;">

<svg xmlns="http://www.w3.org/2000/svg" height="24px"viewBox="0 0 24 24"
     width="24px">
    <g>
        <path d="M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z"/>
    </g>
</svg>
  </button>

<style>
  .colab-df-quickchart {
      --bg-color: #E8F0FE;
      --fill-color: #1967D2;
      --hover-bg-color: #E2EBFA;
      --hover-fill-color: #174EA6;
      --disabled-fill-color: #AAA;
      --disabled-bg-color: #DDD;
  }

  [theme=dark] .colab-df-quickchart {
      --bg-color: #3B4455;
      --fill-color: #D2E3FC;
      --hover-bg-color: #434B5C;
      --hover-fill-color: #FFFFFF;
      --disabled-bg-color: #3B4455;
      --disabled-fill-color: #666;
  }

  .colab-df-quickchart {
    background-color: var(--bg-color);
    border: none;
    border-radius: 50%;
    cursor: pointer;
    display: none;
    fill: var(--fill-color);
    height: 32px;
    padding: 0;
    width: 32px;
  }

  .colab-df-quickchart:hover {
    background-color: var(--hover-bg-color);
    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);
    fill: var(--button-hover-fill-color);
  }

  .colab-df-quickchart-complete:disabled,
  .colab-df-quickchart-complete:disabled:hover {
    background-color: var(--disabled-bg-color);
    fill: var(--disabled-fill-color);
    box-shadow: none;
  }

  .colab-df-spinner {
    border: 2px solid var(--fill-color);
    border-color: transparent;
    border-bottom-color: var(--fill-color);
    animation:
      spin 1s steps(1) infinite;
  }

  @keyframes spin {
    0% {
      border-color: transparent;
      border-bottom-color: var(--fill-color);
      border-left-color: var(--fill-color);
    }
    20% {
      border-color: transparent;
      border-left-color: var(--fill-color);
      border-top-color: var(--fill-color);
    }
    30% {
      border-color: transparent;
      border-left-color: var(--fill-color);
      border-top-color: var(--fill-color);
      border-right-color: var(--fill-color);
    }
    40% {
      border-color: transparent;
      border-right-color: var(--fill-color);
      border-top-color: var(--fill-color);
    }
    60% {
      border-color: transparent;
      border-right-color: var(--fill-color);
    }
    80% {
      border-color: transparent;
      border-right-color: var(--fill-color);
      border-bottom-color: var(--fill-color);
    }
    90% {
      border-color: transparent;
      border-bottom-color: var(--fill-color);
    }
  }
</style>

  <script>
    async function quickchart(key) {
      const quickchartButtonEl =
        document.querySelector('#' + key + ' button');
      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.
      quickchartButtonEl.classList.add('colab-df-spinner');
      try {
        const charts = await google.colab.kernel.invokeFunction(
            'suggestCharts', [key], {});
      } catch (error) {
        console.error('Error during call to suggestCharts:', error);
      }
      quickchartButtonEl.classList.remove('colab-df-spinner');
      quickchartButtonEl.classList.add('colab-df-quickchart-complete');
    }
    (() => {
      let quickchartButtonEl =
        document.querySelector('#df-eb9d600a-9ff9-427c-b6bf-2aa977e63acd button');
      quickchartButtonEl.style.display =
        google.colab.kernel.accessAllowed ? 'block' : 'none';
    })();
  </script>
</div>

  <div id="id_ffea4d34-2f1a-4541-a1c9-8f9283f6c40d">
    <style>
      .colab-df-generate {
        background-color: #E8F0FE;
        border: none;
        border-radius: 50%;
        cursor: pointer;
        display: none;
        fill: #1967D2;
        height: 32px;
        padding: 0 0 0 0;
        width: 32px;
      }

      .colab-df-generate:hover {
        background-color: #E2EBFA;
        box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);
        fill: #174EA6;
      }

      [theme=dark] .colab-df-generate {
        background-color: #3B4455;
        fill: #D2E3FC;
      }

      [theme=dark] .colab-df-generate:hover {
        background-color: #434B5C;
        box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);
        filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));
        fill: #FFFFFF;
      }
    </style>
    <button class="colab-df-generate" onclick="generateWithVariable('rf_output')"
            title="Generate code using this dataframe."
            style="display:none;">

  <svg xmlns="http://www.w3.org/2000/svg" height="24px"viewBox="0 0 24 24"
       width="24px">
    <path d="M7,19H8.4L18.45,9,17,7.55,7,17.6ZM5,21V16.75L18.45,3.32a2,2,0,0,1,2.83,0l1.4,1.43a1.91,1.91,0,0,1,.58,1.4,1.91,1.91,0,0,1-.58,1.4L9.25,21ZM18.45,9,17,7.55Zm-12,3A5.31,5.31,0,0,0,4.9,8.1,5.31,5.31,0,0,0,1,6.5,5.31,5.31,0,0,0,4.9,4.9,5.31,5.31,0,0,0,6.5,1,5.31,5.31,0,0,0,8.1,4.9,5.31,5.31,0,0,0,12,6.5,5.46,5.46,0,0,0,6.5,12Z"/>
  </svg>
    </button>
    <script>
      (() => {
      const buttonEl =
        document.querySelector('#id_ffea4d34-2f1a-4541-a1c9-8f9283f6c40d button.colab-df-generate');
      buttonEl.style.display =
        google.colab.kernel.accessAllowed ? 'block' : 'none';

      buttonEl.onclick = () => {
        google.colab.notebook.generateWithVariable('rf_output');
      }
      })();
    </script>
  </div>

    </div>
  </div>

</div>
</div>
<div class="cell code" id="2U2CJrrxiG1B">
<div class="sourceCode" id="cb176"><pre
class="sourceCode python"><code class="sourceCode python"></code></pre></div>
</div>
<div class="cell code" id="StZipKuxiy8n">
<div class="sourceCode" id="cb177"><pre
class="sourceCode python"><code class="sourceCode python"></code></pre></div>
</div>
<div class="cell code" id="HY-69eehiy_L">
<div class="sourceCode" id="cb178"><pre
class="sourceCode python"><code class="sourceCode python"></code></pre></div>
</div>
<div class="cell code" id="fBfvgahti0P7">
<div class="sourceCode" id="cb179"><pre
class="sourceCode python"><code class="sourceCode python"></code></pre></div>
</div>
<section id="working-with-lighgbm-model" class="cell markdown"
id="-9gZ8jU4OXTz">
<h1><strong>Working with LighGBM Model</strong></h1>
</section>
<div class="cell code" id="GXJi5AHxMnsn">
<div class="sourceCode" id="cb180"><pre
class="sourceCode python"><code class="sourceCode python"></code></pre></div>
</div>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="nY_QVqThMnvS" data-outputId="7eb5ed56-25e7-45ba-9286-b06ddc4d9948">
<div class="sourceCode" id="cb181"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb181-1"><a href="#cb181-1" aria-hidden="true" tabindex="-1"></a><span class="op">!</span>pip install LightGBM</span></code></pre></div>
<div class="output stream stdout">
<pre><code>Requirement already satisfied: LightGBM in /usr/local/lib/python3.10/dist-packages (4.1.0)
Requirement already satisfied: numpy in /usr/local/lib/python3.10/dist-packages (from LightGBM) (1.25.2)
Requirement already satisfied: scipy in /usr/local/lib/python3.10/dist-packages (from LightGBM) (1.11.4)
</code></pre>
</div>
</div>
<div class="cell code" id="enap1suTMMwy">
<div class="sourceCode" id="cb183"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb183-1"><a href="#cb183-1" aria-hidden="true" tabindex="-1"></a><span class="co"># import lightgbm as lgb</span></span></code></pre></div>
</div>
<div class="cell code" id="BMD66vnSMySt">
<div class="sourceCode" id="cb184"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb184-1"><a href="#cb184-1" aria-hidden="true" tabindex="-1"></a><span class="co"># # Specify the LightGBM model</span></span>
<span id="cb184-2"><a href="#cb184-2" aria-hidden="true" tabindex="-1"></a><span class="co"># lgb_model = lgb.LGBMClassifier(random_state=1)</span></span></code></pre></div>
</div>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;,&quot;height&quot;:218}"
id="uJYUpkqoM2Ls" data-outputId="f75f8624-938a-4b59-e545-e7a63d1bfd41">
<div class="sourceCode" id="cb185"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb185-1"><a href="#cb185-1" aria-hidden="true" tabindex="-1"></a><span class="co"># # Fit the model on the training set</span></span>
<span id="cb185-2"><a href="#cb185-2" aria-hidden="true" tabindex="-1"></a><span class="co"># lgb_model.fit(x_train, y_train)</span></span></code></pre></div>
<div class="output stream stdout">
<pre><code>[LightGBM] [Info] Number of positive: 9144, number of negative: 7826
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000937 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 880
[LightGBM] [Info] Number of data points in the train set: 16970, number of used features: 12
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538833 -&gt; initscore=0.155646
[LightGBM] [Info] Start training from score 0.155646
</code></pre>
</div>
<div class="output execute_result" data-execution_count="48">
<style>#sk-container-id-1 {color: black;background-color: white;}#sk-container-id-1 pre{padding: 0;}#sk-container-id-1 div.sk-toggleable {background-color: white;}#sk-container-id-1 label.sk-toggleable__label {cursor: pointer;display: block;width: 100%;margin-bottom: 0;padding: 0.3em;box-sizing: border-box;text-align: center;}#sk-container-id-1 label.sk-toggleable__label-arrow:before {content: "▸";float: left;margin-right: 0.25em;color: #696969;}#sk-container-id-1 label.sk-toggleable__label-arrow:hover:before {color: black;}#sk-container-id-1 div.sk-estimator:hover label.sk-toggleable__label-arrow:before {color: black;}#sk-container-id-1 div.sk-toggleable__content {max-height: 0;max-width: 0;overflow: hidden;text-align: left;background-color: #f0f8ff;}#sk-container-id-1 div.sk-toggleable__content pre {margin: 0.2em;color: black;border-radius: 0.25em;background-color: #f0f8ff;}#sk-container-id-1 input.sk-toggleable__control:checked~div.sk-toggleable__content {max-height: 200px;max-width: 100%;overflow: auto;}#sk-container-id-1 input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {content: "▾";}#sk-container-id-1 div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-1 div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-1 input.sk-hidden--visually {border: 0;clip: rect(1px 1px 1px 1px);clip: rect(1px, 1px, 1px, 1px);height: 1px;margin: -1px;overflow: hidden;padding: 0;position: absolute;width: 1px;}#sk-container-id-1 div.sk-estimator {font-family: monospace;background-color: #f0f8ff;border: 1px dotted black;border-radius: 0.25em;box-sizing: border-box;margin-bottom: 0.5em;}#sk-container-id-1 div.sk-estimator:hover {background-color: #d4ebff;}#sk-container-id-1 div.sk-parallel-item::after {content: "";width: 100%;border-bottom: 1px solid gray;flex-grow: 1;}#sk-container-id-1 div.sk-label:hover label.sk-toggleable__label {background-color: #d4ebff;}#sk-container-id-1 div.sk-serial::before {content: "";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: 0;}#sk-container-id-1 div.sk-serial {display: flex;flex-direction: column;align-items: center;background-color: white;padding-right: 0.2em;padding-left: 0.2em;position: relative;}#sk-container-id-1 div.sk-item {position: relative;z-index: 1;}#sk-container-id-1 div.sk-parallel {display: flex;align-items: stretch;justify-content: center;background-color: white;position: relative;}#sk-container-id-1 div.sk-item::before, #sk-container-id-1 div.sk-parallel-item::before {content: "";position: absolute;border-left: 1px solid gray;box-sizing: border-box;top: 0;bottom: 0;left: 50%;z-index: -1;}#sk-container-id-1 div.sk-parallel-item {display: flex;flex-direction: column;z-index: 1;position: relative;background-color: white;}#sk-container-id-1 div.sk-parallel-item:first-child::after {align-self: flex-end;width: 50%;}#sk-container-id-1 div.sk-parallel-item:last-child::after {align-self: flex-start;width: 50%;}#sk-container-id-1 div.sk-parallel-item:only-child::after {width: 0;}#sk-container-id-1 div.sk-dashed-wrapped {border: 1px dashed gray;margin: 0 0.4em 0.5em 0.4em;box-sizing: border-box;padding-bottom: 0.4em;background-color: white;}#sk-container-id-1 div.sk-label label {font-family: monospace;font-weight: bold;display: inline-block;line-height: 1.2em;}#sk-container-id-1 div.sk-label-container {text-align: center;}#sk-container-id-1 div.sk-container {/* jupyter's `normalize.less` sets `[hidden] { display: none; }` but bootstrap.min.css set `[hidden] { display: none !important; }` so we also need the `!important` here to be able to override the default hidden behavior on the sphinx rendered scikit-learn.org. See: https://github.com/scikit-learn/scikit-learn/issues/21755 */display: inline-block !important;position: relative;}#sk-container-id-1 div.sk-text-repr-fallback {display: none;}</style><div id="sk-container-id-1" class="sk-top-container"><div class="sk-text-repr-fallback"><pre>LGBMClassifier(random_state=1)</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class="sk-container" hidden><div class="sk-item"><div class="sk-estimator sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually" id="sk-estimator-id-1" type="checkbox" checked><label for="sk-estimator-id-1" class="sk-toggleable__label sk-toggleable__label-arrow">LGBMClassifier</label><div class="sk-toggleable__content"><pre>LGBMClassifier(random_state=1)</pre></div></div></div></div></div>
</div>
</div>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="VQEmRWvho0qW" data-outputId="5d7cd3ad-d616-4a24-a6fd-0fbdfcdd92ff">
<div class="sourceCode" id="cb187"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb187-1"><a href="#cb187-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> lightgbm <span class="im">as</span> lgb</span>
<span id="cb187-2"><a href="#cb187-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> sklearn.model_selection <span class="im">import</span> RandomizedSearchCV</span>
<span id="cb187-3"><a href="#cb187-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb187-4"><a href="#cb187-4" aria-hidden="true" tabindex="-1"></a><span class="co"># Measure start time</span></span>
<span id="cb187-5"><a href="#cb187-5" aria-hidden="true" tabindex="-1"></a>start_time <span class="op">=</span> time.time()</span>
<span id="cb187-6"><a href="#cb187-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb187-7"><a href="#cb187-7" aria-hidden="true" tabindex="-1"></a><span class="co"># Measure memory usage before and after model fitting</span></span>
<span id="cb187-8"><a href="#cb187-8" aria-hidden="true" tabindex="-1"></a><span class="co"># mem_usage_before = memory_usage(-1, interval=0.1, timeout=1)</span></span>
<span id="cb187-9"><a href="#cb187-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb187-10"><a href="#cb187-10" aria-hidden="true" tabindex="-1"></a><span class="co"># Defining hyperparameter grid</span></span>
<span id="cb187-11"><a href="#cb187-11" aria-hidden="true" tabindex="-1"></a>param_grid <span class="op">=</span> {</span>
<span id="cb187-12"><a href="#cb187-12" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;num_leaves&#39;</span>: [<span class="dv">20</span>, <span class="dv">30</span>, <span class="dv">40</span>],                <span class="co"># Maximum number of leaves in one tree</span></span>
<span id="cb187-13"><a href="#cb187-13" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;max_depth&#39;</span>: [<span class="op">-</span><span class="dv">1</span>, <span class="dv">10</span>, <span class="dv">20</span>, <span class="dv">30</span>],             <span class="co"># Maximum depth of the tree</span></span>
<span id="cb187-14"><a href="#cb187-14" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;learning_rate&#39;</span>: [<span class="fl">0.01</span>, <span class="fl">0.05</span>, <span class="fl">0.1</span>],        <span class="co"># Learning rate</span></span>
<span id="cb187-15"><a href="#cb187-15" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;n_estimators&#39;</span>: [<span class="dv">100</span>, <span class="dv">200</span>, <span class="dv">300</span>],           <span class="co"># Number of boosting iterations</span></span>
<span id="cb187-16"><a href="#cb187-16" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;reg_alpha&#39;</span>: [<span class="fl">0.0</span>, <span class="fl">0.1</span>, <span class="fl">0.5</span>],              <span class="co"># L1 regularization term on weights</span></span>
<span id="cb187-17"><a href="#cb187-17" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;reg_lambda&#39;</span>: [<span class="fl">0.0</span>, <span class="fl">0.1</span>, <span class="fl">0.5</span>]               <span class="co"># L2 regularization term on weights</span></span>
<span id="cb187-18"><a href="#cb187-18" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb187-19"><a href="#cb187-19" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb187-20"><a href="#cb187-20" aria-hidden="true" tabindex="-1"></a><span class="co"># Building the LightGBM model</span></span>
<span id="cb187-21"><a href="#cb187-21" aria-hidden="true" tabindex="-1"></a>lgb_model <span class="op">=</span> lgb.LGBMClassifier(random_state<span class="op">=</span><span class="dv">1</span>)</span>
<span id="cb187-22"><a href="#cb187-22" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb187-23"><a href="#cb187-23" aria-hidden="true" tabindex="-1"></a><span class="co"># Performing RandomizedSearchCV</span></span>
<span id="cb187-24"><a href="#cb187-24" aria-hidden="true" tabindex="-1"></a>random_search <span class="op">=</span> RandomizedSearchCV(estimator<span class="op">=</span>lgb_model, param_distributions<span class="op">=</span>param_grid,</span>
<span id="cb187-25"><a href="#cb187-25" aria-hidden="true" tabindex="-1"></a>                                   n_iter<span class="op">=</span><span class="dv">10</span>, cv<span class="op">=</span><span class="dv">10</span>, scoring<span class="op">=</span>[<span class="st">&#39;accuracy&#39;</span>,<span class="st">&#39;recall&#39;</span>,<span class="st">&#39;f1&#39;</span>,<span class="st">&#39;roc_auc&#39;</span>,<span class="st">&#39;balanced_accuracy&#39;</span>], refit<span class="op">=</span><span class="st">&quot;accuracy&quot;</span>, random_state<span class="op">=</span><span class="dv">1</span>)</span>
<span id="cb187-26"><a href="#cb187-26" aria-hidden="true" tabindex="-1"></a>random_search.fit(x_train, y_train)</span>
<span id="cb187-27"><a href="#cb187-27" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb187-28"><a href="#cb187-28" aria-hidden="true" tabindex="-1"></a><span class="co"># Getting the best parameters and best score</span></span>
<span id="cb187-29"><a href="#cb187-29" aria-hidden="true" tabindex="-1"></a>best_params_lightgbm <span class="op">=</span> random_search.best_params_</span>
<span id="cb187-30"><a href="#cb187-30" aria-hidden="true" tabindex="-1"></a>best_score_lightgbm <span class="op">=</span> random_search.best_score_</span>
<span id="cb187-31"><a href="#cb187-31" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb187-32"><a href="#cb187-32" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb187-33"><a href="#cb187-33" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb187-34"><a href="#cb187-34" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f&quot;</span><span class="ch">\n</span><span class="ss">The best hyperparameters for LightGBM are:</span><span class="ch">\n</span><span class="sc">{</span>best_params_lightgbm<span class="sc">}</span><span class="ch">\n</span><span class="ss">&quot;</span>)</span>
<span id="cb187-35"><a href="#cb187-35" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f&quot;Thes best score for LightGBM is:</span><span class="ch">\n</span><span class="sc">{</span>best_score_lightgbm<span class="sc">}</span><span class="ss">&quot;</span>)</span></code></pre></div>
<div class="output stream stdout">
<pre><code>[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000893 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000681 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000681 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000710 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000673 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000675 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8229, number of negative: 7044
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000665 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538794 -&gt; initscore=0.155488
[LightGBM] [Info] Start training from score 0.155488
[LightGBM] [Info] Number of positive: 8229, number of negative: 7044
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000677 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538794 -&gt; initscore=0.155488
[LightGBM] [Info] Start training from score 0.155488
[LightGBM] [Info] Number of positive: 8229, number of negative: 7044
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.001235 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538794 -&gt; initscore=0.155488
[LightGBM] [Info] Start training from score 0.155488
[LightGBM] [Info] Number of positive: 8229, number of negative: 7044
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.001223 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538794 -&gt; initscore=0.155488
[LightGBM] [Info] Start training from score 0.155488
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.001181 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000703 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000677 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000682 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000667 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000671 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8229, number of negative: 7044
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.001244 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538794 -&gt; initscore=0.155488
[LightGBM] [Info] Start training from score 0.155488
[LightGBM] [Info] Number of positive: 8229, number of negative: 7044
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.001134 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538794 -&gt; initscore=0.155488
[LightGBM] [Info] Start training from score 0.155488
[LightGBM] [Info] Number of positive: 8229, number of negative: 7044
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000683 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538794 -&gt; initscore=0.155488
[LightGBM] [Info] Start training from score 0.155488
[LightGBM] [Info] Number of positive: 8229, number of negative: 7044
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.001026 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538794 -&gt; initscore=0.155488
[LightGBM] [Info] Start training from score 0.155488
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.001754 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000676 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000669 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.001229 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.001259 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.001237 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8229, number of negative: 7044
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.001183 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538794 -&gt; initscore=0.155488
[LightGBM] [Info] Start training from score 0.155488
[LightGBM] [Info] Number of positive: 8229, number of negative: 7044
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.001158 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538794 -&gt; initscore=0.155488
[LightGBM] [Info] Start training from score 0.155488
[LightGBM] [Info] Number of positive: 8229, number of negative: 7044
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000670 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538794 -&gt; initscore=0.155488
[LightGBM] [Info] Start training from score 0.155488
[LightGBM] [Info] Number of positive: 8229, number of negative: 7044
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000674 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538794 -&gt; initscore=0.155488
[LightGBM] [Info] Start training from score 0.155488
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000677 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.001154 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000673 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000671 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000702 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000674 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8229, number of negative: 7044
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000667 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538794 -&gt; initscore=0.155488
[LightGBM] [Info] Start training from score 0.155488
[LightGBM] [Info] Number of positive: 8229, number of negative: 7044
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000672 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538794 -&gt; initscore=0.155488
[LightGBM] [Info] Start training from score 0.155488
[LightGBM] [Info] Number of positive: 8229, number of negative: 7044
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000750 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538794 -&gt; initscore=0.155488
[LightGBM] [Info] Start training from score 0.155488
[LightGBM] [Info] Number of positive: 8229, number of negative: 7044
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000672 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538794 -&gt; initscore=0.155488
[LightGBM] [Info] Start training from score 0.155488
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000670 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000670 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000678 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.001234 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.001279 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.001167 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8229, number of negative: 7044
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000678 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538794 -&gt; initscore=0.155488
[LightGBM] [Info] Start training from score 0.155488
[LightGBM] [Info] Number of positive: 8229, number of negative: 7044
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000721 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538794 -&gt; initscore=0.155488
[LightGBM] [Info] Start training from score 0.155488
[LightGBM] [Info] Number of positive: 8229, number of negative: 7044
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000689 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538794 -&gt; initscore=0.155488
[LightGBM] [Info] Start training from score 0.155488
[LightGBM] [Info] Number of positive: 8229, number of negative: 7044
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000664 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538794 -&gt; initscore=0.155488
[LightGBM] [Info] Start training from score 0.155488
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000668 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.001063 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000717 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000669 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000670 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.001190 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8229, number of negative: 7044
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000669 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538794 -&gt; initscore=0.155488
[LightGBM] [Info] Start training from score 0.155488
[LightGBM] [Info] Number of positive: 8229, number of negative: 7044
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000679 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538794 -&gt; initscore=0.155488
[LightGBM] [Info] Start training from score 0.155488
[LightGBM] [Info] Number of positive: 8229, number of negative: 7044
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000679 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538794 -&gt; initscore=0.155488
[LightGBM] [Info] Start training from score 0.155488
[LightGBM] [Info] Number of positive: 8229, number of negative: 7044
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.001194 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538794 -&gt; initscore=0.155488
[LightGBM] [Info] Start training from score 0.155488
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.001270 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.001240 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000686 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000686 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000730 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.001154 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8229, number of negative: 7044
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.001224 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538794 -&gt; initscore=0.155488
[LightGBM] [Info] Start training from score 0.155488
[LightGBM] [Info] Number of positive: 8229, number of negative: 7044
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.001195 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538794 -&gt; initscore=0.155488
[LightGBM] [Info] Start training from score 0.155488
[LightGBM] [Info] Number of positive: 8229, number of negative: 7044
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000687 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538794 -&gt; initscore=0.155488
[LightGBM] [Info] Start training from score 0.155488
[LightGBM] [Info] Number of positive: 8229, number of negative: 7044
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000667 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538794 -&gt; initscore=0.155488
[LightGBM] [Info] Start training from score 0.155488
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000692 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000670 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.001157 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000675 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.004400 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000668 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8229, number of negative: 7044
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000674 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538794 -&gt; initscore=0.155488
[LightGBM] [Info] Start training from score 0.155488
[LightGBM] [Info] Number of positive: 8229, number of negative: 7044
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000669 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538794 -&gt; initscore=0.155488
[LightGBM] [Info] Start training from score 0.155488
[LightGBM] [Info] Number of positive: 8229, number of negative: 7044
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.001063 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538794 -&gt; initscore=0.155488
[LightGBM] [Info] Start training from score 0.155488
[LightGBM] [Info] Number of positive: 8229, number of negative: 7044
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000668 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538794 -&gt; initscore=0.155488
[LightGBM] [Info] Start training from score 0.155488
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000676 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000681 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.001240 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.001207 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.001203 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000704 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8229, number of negative: 7044
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000682 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538794 -&gt; initscore=0.155488
[LightGBM] [Info] Start training from score 0.155488
[LightGBM] [Info] Number of positive: 8229, number of negative: 7044
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000704 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538794 -&gt; initscore=0.155488
[LightGBM] [Info] Start training from score 0.155488
[LightGBM] [Info] Number of positive: 8229, number of negative: 7044
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000686 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538794 -&gt; initscore=0.155488
[LightGBM] [Info] Start training from score 0.155488
[LightGBM] [Info] Number of positive: 8229, number of negative: 7044
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.001109 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538794 -&gt; initscore=0.155488
[LightGBM] [Info] Start training from score 0.155488
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000718 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000665 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000664 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000670 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000665 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8230, number of negative: 7043
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000680 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538859 -&gt; initscore=0.155752
[LightGBM] [Info] Start training from score 0.155752
[LightGBM] [Info] Number of positive: 8229, number of negative: 7044
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000661 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538794 -&gt; initscore=0.155488
[LightGBM] [Info] Start training from score 0.155488
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Number of positive: 8229, number of negative: 7044
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000684 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538794 -&gt; initscore=0.155488
[LightGBM] [Info] Start training from score 0.155488
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Number of positive: 8229, number of negative: 7044
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000674 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538794 -&gt; initscore=0.155488
[LightGBM] [Info] Start training from score 0.155488
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Info] Number of positive: 8229, number of negative: 7044
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000684 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 15273, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538794 -&gt; initscore=0.155488
[LightGBM] [Info] Start training from score 0.155488
[LightGBM] [Info] Number of positive: 9144, number of negative: 7826
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000736 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 16970, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538833 -&gt; initscore=0.155646
[LightGBM] [Info] Start training from score 0.155646

The best hyperparameters for LightGBM are:
{&#39;reg_lambda&#39;: 0.1, &#39;reg_alpha&#39;: 0.5, &#39;num_leaves&#39;: 20, &#39;n_estimators&#39;: 200, &#39;max_depth&#39;: -1, &#39;learning_rate&#39;: 0.01}

Thes best score for LightGBM is:
0.9451974071891573
</code></pre>
</div>
</div>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="Bgb3fS8EZ8_R" data-outputId="96d1bc73-c8ac-4ff5-a8f7-cb0e0fee1abd">
<div class="sourceCode" id="cb189"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb189-1"><a href="#cb189-1" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f&quot;</span><span class="ch">\n</span><span class="ss">The best hyperparameters for LightGBM are:</span><span class="sc">{</span>best_params_lightgbm<span class="sc">}</span><span class="ch">\n</span><span class="ss">&quot;</span>)</span>
<span id="cb189-2"><a href="#cb189-2" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f&quot;Thes best score for LightGBM is:</span><span class="sc">{</span>best_score_lightgbm<span class="sc">}</span><span class="ss">&quot;</span>)</span></code></pre></div>
<div class="output stream stdout">
<pre><code>
The best hyperparameters for LightGBM are:{&#39;reg_lambda&#39;: 0.1, &#39;reg_alpha&#39;: 0.5, &#39;num_leaves&#39;: 20, &#39;n_estimators&#39;: 200, &#39;max_depth&#39;: -1, &#39;learning_rate&#39;: 0.01}

Thes best score for LightGBM is:0.9451974071891573
</code></pre>
</div>
</div>
<div class="cell markdown" id="youFr3B_aCM8">
<p><strong>From the hyperparameter tuning process, it can be seen that
the following are the important parameters: "{'reg_lambda': 0.1,
'reg_alpha': 0.5, 'num_leaves': 20, 'n_estimators': 200, 'max_depth':
-1, 'learning_rate': 0.01}" and the best score is "94.51%".</strong></p>
</div>
<section id="building-lightgbm-model-on-the-training-data"
class="cell markdown" id="RIzJKqloOck6">
<h1><strong>Building LightGBM Model on the Training Data</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="3rjVj61cpsNe" data-outputId="b3d12cb1-4ed3-4363-daa3-801bd0613033">
<div class="sourceCode" id="cb191"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb191-1"><a href="#cb191-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Training the model with the best parameters</span></span>
<span id="cb191-2"><a href="#cb191-2" aria-hidden="true" tabindex="-1"></a>best_lgb_model <span class="op">=</span> lgb.LGBMClassifier(<span class="op">**</span>best_params_lightgbm, random_state<span class="op">=</span><span class="dv">1</span>)</span>
<span id="cb191-3"><a href="#cb191-3" aria-hidden="true" tabindex="-1"></a>best_lgb_model <span class="op">=</span> best_lgb_model.fit(x_train, y_train)</span>
<span id="cb191-4"><a href="#cb191-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb191-5"><a href="#cb191-5" aria-hidden="true" tabindex="-1"></a><span class="co"># Getting cross-validated predictions</span></span>
<span id="cb191-6"><a href="#cb191-6" aria-hidden="true" tabindex="-1"></a><span class="co"># cv_predictions_lgb_train = cross_val_predict(best_lgb_model, x_train, y_train, cv=10)</span></span>
<span id="cb191-7"><a href="#cb191-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb191-8"><a href="#cb191-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb191-9"><a href="#cb191-9" aria-hidden="true" tabindex="-1"></a><span class="co"># Getting cross-validated predictions</span></span>
<span id="cb191-10"><a href="#cb191-10" aria-hidden="true" tabindex="-1"></a>cv_predictions_lgb_train <span class="op">=</span> best_lgb_model.predict(x_train) <span class="co">#, y_train, cv=10)</span></span>
<span id="cb191-11"><a href="#cb191-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb191-12"><a href="#cb191-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb191-13"><a href="#cb191-13" aria-hidden="true" tabindex="-1"></a><span class="co"># Calculating performance metrics based on cross-validated predictions</span></span>
<span id="cb191-14"><a href="#cb191-14" aria-hidden="true" tabindex="-1"></a>train_cv_acc_lgb <span class="op">=</span> accuracy_score(y_train, cv_predictions_lgb_train)</span>
<span id="cb191-15"><a href="#cb191-15" aria-hidden="true" tabindex="-1"></a>train_cv_recall_lgb <span class="op">=</span> recall_score(y_train, cv_predictions_lgb_train)</span>
<span id="cb191-16"><a href="#cb191-16" aria-hidden="true" tabindex="-1"></a>train_cv_precision_lgb <span class="op">=</span> precision_score(y_train, cv_predictions_lgb_train)</span>
<span id="cb191-17"><a href="#cb191-17" aria-hidden="true" tabindex="-1"></a>train_cv_f1_lgb <span class="op">=</span> f1_score(y_train, cv_predictions_lgb_train)</span>
<span id="cb191-18"><a href="#cb191-18" aria-hidden="true" tabindex="-1"></a><span class="co"># Calculating ROC AUC</span></span>
<span id="cb191-19"><a href="#cb191-19" aria-hidden="true" tabindex="-1"></a>roc_auc_train_lightgbm <span class="op">=</span> roc_auc_score(y_train, cv_predictions_rf_train)</span>
<span id="cb191-20"><a href="#cb191-20" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb191-21"><a href="#cb191-21" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb191-22"><a href="#cb191-22" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb191-23"><a href="#cb191-23" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb191-24"><a href="#cb191-24" aria-hidden="true" tabindex="-1"></a><span class="co"># Printing the cross-validated performance metrics</span></span>
<span id="cb191-25"><a href="#cb191-25" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&#39;</span><span class="ch">\n\n\n\n</span><span class="st">The metrics for the training set using the cross-validation are: &#39;</span>)</span>
<span id="cb191-26"><a href="#cb191-26" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Cross-Validated Accuracy:&quot;</span>, train_cv_acc_lgb)</span>
<span id="cb191-27"><a href="#cb191-27" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Cross-Validated Recall:&quot;</span>, train_cv_recall_lgb)</span>
<span id="cb191-28"><a href="#cb191-28" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Cross-Validated Precision:&quot;</span>, train_cv_precision_lgb)</span>
<span id="cb191-29"><a href="#cb191-29" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Cross-Validated F1 Score:&quot;</span>, train_cv_f1_lgb)</span>
<span id="cb191-30"><a href="#cb191-30" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Cross-Validated ROC_AUC Score:&quot;</span>, roc_auc_train_lightgbm)</span></code></pre></div>
<div class="output stream stdout">
<pre><code>[LightGBM] [Info] Number of positive: 9144, number of negative: 7826
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.001346 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 16970, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.538833 -&gt; initscore=0.155646
[LightGBM] [Info] Start training from score 0.155646


The metrics for the training set using the cross-validation are: 
Cross-Validated Accuracy: 0.9504419563936358
Cross-Validated Recall: 0.9554899387576553
Cross-Validated Precision: 0.9526769163668084
Cross-Validated F1 Score: 0.9540813540813541
Cross-Validated ROC_AUC Score: 0.9815364788927324
</code></pre>
</div>
</div>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="MACsHasLBg4G" data-outputId="b5da26de-e1af-49af-94ed-f8dbb2ae1db2">
<div class="sourceCode" id="cb193"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb193-1"><a href="#cb193-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Record end time</span></span>
<span id="cb193-2"><a href="#cb193-2" aria-hidden="true" tabindex="-1"></a>end_time <span class="op">=</span> time.time()</span>
<span id="cb193-3"><a href="#cb193-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb193-4"><a href="#cb193-4" aria-hidden="true" tabindex="-1"></a><span class="co"># Calculate execution time</span></span>
<span id="cb193-5"><a href="#cb193-5" aria-hidden="true" tabindex="-1"></a>execution_time_lightgbm <span class="op">=</span> end_time <span class="op">-</span> start_time</span>
<span id="cb193-6"><a href="#cb193-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb193-7"><a href="#cb193-7" aria-hidden="true" tabindex="-1"></a><span class="co"># Calculate memory usage</span></span>
<span id="cb193-8"><a href="#cb193-8" aria-hidden="true" tabindex="-1"></a>process <span class="op">=</span> psutil.Process()</span>
<span id="cb193-9"><a href="#cb193-9" aria-hidden="true" tabindex="-1"></a>memory_used_lightgbm <span class="op">=</span> process.memory_info().rss <span class="op">/</span> (<span class="dv">1024</span> <span class="op">*</span> <span class="dv">1024</span>)  <span class="co"># Convert to MiB</span></span>
<span id="cb193-10"><a href="#cb193-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb193-11"><a href="#cb193-11" aria-hidden="true" tabindex="-1"></a><span class="co"># Print results</span></span>
<span id="cb193-12"><a href="#cb193-12" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&#39;Execution Time:&#39;</span>, execution_time_lightgbm, <span class="st">&#39;seconds&#39;</span>)</span>
<span id="cb193-13"><a href="#cb193-13" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&#39;Memory Used:&#39;</span>, memory_used_lightgbm, <span class="st">&#39;MiB&#39;</span>)</span></code></pre></div>
<div class="output stream stdout">
<pre><code>Execution Time: 253.99081993103027 seconds
Memory Used: 1658.44140625 MiB
</code></pre>
</div>
</div>
<section id="building-lightgbm-model-on-the-testing-data"
class="cell markdown" id="4lrkuvzlOkFk">
<h1><strong>Building LightGBM Model on the Testing Data</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="RZl4AoRmpz5C" data-outputId="90186da1-e748-453f-d6b1-293d4d7b6f34">
<div class="sourceCode" id="cb195"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb195-1"><a href="#cb195-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Testing the model with the best parameters</span></span>
<span id="cb195-2"><a href="#cb195-2" aria-hidden="true" tabindex="-1"></a><span class="co"># best_lgb_model = lgb.LGBMClassifier(**best_params, random_state=1)</span></span>
<span id="cb195-3"><a href="#cb195-3" aria-hidden="true" tabindex="-1"></a><span class="co"># best_lgb_model.fit(x_test, y_test)</span></span>
<span id="cb195-4"><a href="#cb195-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb195-5"><a href="#cb195-5" aria-hidden="true" tabindex="-1"></a><span class="co"># Getting cross-validated predictions</span></span>
<span id="cb195-6"><a href="#cb195-6" aria-hidden="true" tabindex="-1"></a><span class="co"># cv_predictions_lgb_test = cross_val_predict(best_lgb_model, x_test, y_test, cv=5)</span></span>
<span id="cb195-7"><a href="#cb195-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb195-8"><a href="#cb195-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb195-9"><a href="#cb195-9" aria-hidden="true" tabindex="-1"></a><span class="co"># Getting cross-validated predictions</span></span>
<span id="cb195-10"><a href="#cb195-10" aria-hidden="true" tabindex="-1"></a>cv_predictions_lgb_test <span class="op">=</span> best_lgb_model.predict(x_test) <span class="co">#, y_test, cv=5)</span></span>
<span id="cb195-11"><a href="#cb195-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb195-12"><a href="#cb195-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb195-13"><a href="#cb195-13" aria-hidden="true" tabindex="-1"></a><span class="co"># Calculating performance metrics based on cross-validated predictions</span></span>
<span id="cb195-14"><a href="#cb195-14" aria-hidden="true" tabindex="-1"></a>test_cv_acc_lgb <span class="op">=</span> accuracy_score(y_test, cv_predictions_lgb_test)</span>
<span id="cb195-15"><a href="#cb195-15" aria-hidden="true" tabindex="-1"></a>test_cv_recall_lgb <span class="op">=</span> recall_score(y_test, cv_predictions_lgb_test)</span>
<span id="cb195-16"><a href="#cb195-16" aria-hidden="true" tabindex="-1"></a>test_cv_precision_lgb <span class="op">=</span> precision_score(y_test, cv_predictions_lgb_test)</span>
<span id="cb195-17"><a href="#cb195-17" aria-hidden="true" tabindex="-1"></a>test_cv_f1_lgb <span class="op">=</span> f1_score(y_test, cv_predictions_lgb_test)</span>
<span id="cb195-18"><a href="#cb195-18" aria-hidden="true" tabindex="-1"></a><span class="co"># Calculating ROC AUC</span></span>
<span id="cb195-19"><a href="#cb195-19" aria-hidden="true" tabindex="-1"></a>roc_auc_test_lightgbm <span class="op">=</span> roc_auc_score(y_train, cv_predictions_rf_train)</span>
<span id="cb195-20"><a href="#cb195-20" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb195-21"><a href="#cb195-21" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb195-22"><a href="#cb195-22" aria-hidden="true" tabindex="-1"></a><span class="co"># Printing the cross-validated performance metrics</span></span>
<span id="cb195-23"><a href="#cb195-23" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&#39;</span><span class="ch">\n\n\n\n</span><span class="st">The metrics for the training set using the cross-validation are: &#39;</span>)</span>
<span id="cb195-24"><a href="#cb195-24" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Cross-Validated Accuracy:&quot;</span>, test_cv_acc_lgb)</span>
<span id="cb195-25"><a href="#cb195-25" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Cross-Validated Recall:&quot;</span>, test_cv_recall_lgb)</span>
<span id="cb195-26"><a href="#cb195-26" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Cross-Validated Precision:&quot;</span>, test_cv_precision_lgb)</span>
<span id="cb195-27"><a href="#cb195-27" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Cross-Validated F1 Score:&quot;</span>, test_cv_f1_lgb)</span>
<span id="cb195-28"><a href="#cb195-28" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Cross-Validated ROC_AUC Score:&quot;</span>, roc_auc_test_rf)</span></code></pre></div>
<div class="output stream stdout">
<pre><code>


The metrics for the training set using the cross-validation are: 
Cross-Validated Accuracy: 0.943222436073687
Cross-Validated Recall: 0.9472200871124776
Cross-Validated Precision: 0.9469774590163934
Cross-Validated F1 Score: 0.9470987575252978
Cross-Validated ROC_AUC Score: 0.9403988960108196
</code></pre>
</div>
</div>
<div class="cell code" id="Fy47Ja4acK5u">
<div class="sourceCode" id="cb197"><pre
class="sourceCode python"><code class="sourceCode python"></code></pre></div>
</div>
<section id="metrics-of-lightgbm---train-and-test" class="cell markdown"
id="VuHQcx4OtrbN">
<h1><strong>Metrics of LightGBM - Train and Test</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;,&quot;height&quot;:223}"
id="o9fYCk5grWUq" data-outputId="bd428ee9-0db6-4321-85f1-942f1a5a1466">
<div class="sourceCode" id="cb198"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb198-1"><a href="#cb198-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> pandas <span class="im">as</span> pd</span>
<span id="cb198-2"><a href="#cb198-2" aria-hidden="true" tabindex="-1"></a><span class="co"># Creating DataFrame</span></span>
<span id="cb198-3"><a href="#cb198-3" aria-hidden="true" tabindex="-1"></a>LightGBM_output <span class="op">=</span> pd.DataFrame({</span>
<span id="cb198-4"><a href="#cb198-4" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;Training&#39;</span>: [train_cv_acc_lgb, train_cv_recall_lgb, train_cv_precision_lgb, train_cv_f1_lgb,roc_auc_train_lightgbm],</span>
<span id="cb198-5"><a href="#cb198-5" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;Testing&#39;</span>: [test_cv_acc_lgb, test_cv_recall_lgb, test_cv_precision_lgb, test_cv_f1_lgb,roc_auc_test_lightgbm]},</span>
<span id="cb198-6"><a href="#cb198-6" aria-hidden="true" tabindex="-1"></a>    <span class="co"># &#39;Model&#39;: [&#39;Logistic Regression&#39;,&#39;Logistic Regression&#39;,&#39;Logistic Regression&#39;,&#39;Logistic Regression&#39;]},</span></span>
<span id="cb198-7"><a href="#cb198-7" aria-hidden="true" tabindex="-1"></a>                              index <span class="op">=</span> [<span class="st">&#39;Accuracy&#39;</span>,<span class="st">&#39;Recall&#39;</span>,<span class="st">&#39;Precision&#39;</span>,<span class="st">&#39;F1&#39;</span>,<span class="st">&#39;ROC_AUC&#39;</span>])</span>
<span id="cb198-8"><a href="#cb198-8" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;The metrics for the LightGBM  are: &quot;</span>)</span>
<span id="cb198-9"><a href="#cb198-9" aria-hidden="true" tabindex="-1"></a>LightGBM_output</span></code></pre></div>
<div class="output stream stdout">
<pre><code>The metrics for the LightGBM  are: 
</code></pre>
</div>
<div class="output execute_result" data-execution_count="67">

  <div id="df-04842ca5-6953-485c-9aef-1e4413d73493" class="colab-df-container">
    <div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Training</th>
      <th>Testing</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>Accuracy</th>
      <td>0.950442</td>
      <td>0.943222</td>
    </tr>
    <tr>
      <th>Recall</th>
      <td>0.955490</td>
      <td>0.947220</td>
    </tr>
    <tr>
      <th>Precision</th>
      <td>0.952677</td>
      <td>0.946977</td>
    </tr>
    <tr>
      <th>F1</th>
      <td>0.954081</td>
      <td>0.947099</td>
    </tr>
    <tr>
      <th>ROC_AUC</th>
      <td>0.981536</td>
      <td>0.981536</td>
    </tr>
  </tbody>
</table>
</div>
    <div class="colab-df-buttons">

  <div class="colab-df-container">
    <button class="colab-df-convert" onclick="convertToInteractive('df-04842ca5-6953-485c-9aef-1e4413d73493')"
            title="Convert this dataframe to an interactive table."
            style="display:none;">

  <svg xmlns="http://www.w3.org/2000/svg" height="24px" viewBox="0 -960 960 960">
    <path d="M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z"/>
  </svg>
    </button>

  <style>
    .colab-df-container {
      display:flex;
      gap: 12px;
    }

    .colab-df-convert {
      background-color: #E8F0FE;
      border: none;
      border-radius: 50%;
      cursor: pointer;
      display: none;
      fill: #1967D2;
      height: 32px;
      padding: 0 0 0 0;
      width: 32px;
    }

    .colab-df-convert:hover {
      background-color: #E2EBFA;
      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);
      fill: #174EA6;
    }

    .colab-df-buttons div {
      margin-bottom: 4px;
    }

    [theme=dark] .colab-df-convert {
      background-color: #3B4455;
      fill: #D2E3FC;
    }

    [theme=dark] .colab-df-convert:hover {
      background-color: #434B5C;
      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);
      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));
      fill: #FFFFFF;
    }
  </style>

    <script>
      const buttonEl =
        document.querySelector('#df-04842ca5-6953-485c-9aef-1e4413d73493 button.colab-df-convert');
      buttonEl.style.display =
        google.colab.kernel.accessAllowed ? 'block' : 'none';

      async function convertToInteractive(key) {
        const element = document.querySelector('#df-04842ca5-6953-485c-9aef-1e4413d73493');
        const dataTable =
          await google.colab.kernel.invokeFunction('convertToInteractive',
                                                    [key], {});
        if (!dataTable) return;

        const docLinkHtml = 'Like what you see? Visit the ' +
          '<a target="_blank" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'
          + ' to learn more about interactive tables.';
        element.innerHTML = '';
        dataTable['output_type'] = 'display_data';
        await google.colab.output.renderOutput(dataTable, element);
        const docLink = document.createElement('div');
        docLink.innerHTML = docLinkHtml;
        element.appendChild(docLink);
      }
    </script>
  </div>


<div id="df-4b578eed-0cc3-4a28-a06e-4ce4a9cacdfd">
  <button class="colab-df-quickchart" onclick="quickchart('df-4b578eed-0cc3-4a28-a06e-4ce4a9cacdfd')"
            title="Suggest charts"
            style="display:none;">

<svg xmlns="http://www.w3.org/2000/svg" height="24px"viewBox="0 0 24 24"
     width="24px">
    <g>
        <path d="M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z"/>
    </g>
</svg>
  </button>

<style>
  .colab-df-quickchart {
      --bg-color: #E8F0FE;
      --fill-color: #1967D2;
      --hover-bg-color: #E2EBFA;
      --hover-fill-color: #174EA6;
      --disabled-fill-color: #AAA;
      --disabled-bg-color: #DDD;
  }

  [theme=dark] .colab-df-quickchart {
      --bg-color: #3B4455;
      --fill-color: #D2E3FC;
      --hover-bg-color: #434B5C;
      --hover-fill-color: #FFFFFF;
      --disabled-bg-color: #3B4455;
      --disabled-fill-color: #666;
  }

  .colab-df-quickchart {
    background-color: var(--bg-color);
    border: none;
    border-radius: 50%;
    cursor: pointer;
    display: none;
    fill: var(--fill-color);
    height: 32px;
    padding: 0;
    width: 32px;
  }

  .colab-df-quickchart:hover {
    background-color: var(--hover-bg-color);
    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);
    fill: var(--button-hover-fill-color);
  }

  .colab-df-quickchart-complete:disabled,
  .colab-df-quickchart-complete:disabled:hover {
    background-color: var(--disabled-bg-color);
    fill: var(--disabled-fill-color);
    box-shadow: none;
  }

  .colab-df-spinner {
    border: 2px solid var(--fill-color);
    border-color: transparent;
    border-bottom-color: var(--fill-color);
    animation:
      spin 1s steps(1) infinite;
  }

  @keyframes spin {
    0% {
      border-color: transparent;
      border-bottom-color: var(--fill-color);
      border-left-color: var(--fill-color);
    }
    20% {
      border-color: transparent;
      border-left-color: var(--fill-color);
      border-top-color: var(--fill-color);
    }
    30% {
      border-color: transparent;
      border-left-color: var(--fill-color);
      border-top-color: var(--fill-color);
      border-right-color: var(--fill-color);
    }
    40% {
      border-color: transparent;
      border-right-color: var(--fill-color);
      border-top-color: var(--fill-color);
    }
    60% {
      border-color: transparent;
      border-right-color: var(--fill-color);
    }
    80% {
      border-color: transparent;
      border-right-color: var(--fill-color);
      border-bottom-color: var(--fill-color);
    }
    90% {
      border-color: transparent;
      border-bottom-color: var(--fill-color);
    }
  }
</style>

  <script>
    async function quickchart(key) {
      const quickchartButtonEl =
        document.querySelector('#' + key + ' button');
      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.
      quickchartButtonEl.classList.add('colab-df-spinner');
      try {
        const charts = await google.colab.kernel.invokeFunction(
            'suggestCharts', [key], {});
      } catch (error) {
        console.error('Error during call to suggestCharts:', error);
      }
      quickchartButtonEl.classList.remove('colab-df-spinner');
      quickchartButtonEl.classList.add('colab-df-quickchart-complete');
    }
    (() => {
      let quickchartButtonEl =
        document.querySelector('#df-4b578eed-0cc3-4a28-a06e-4ce4a9cacdfd button');
      quickchartButtonEl.style.display =
        google.colab.kernel.accessAllowed ? 'block' : 'none';
    })();
  </script>
</div>

  <div id="id_de3be9b2-b2de-4343-977e-42f70d097557">
    <style>
      .colab-df-generate {
        background-color: #E8F0FE;
        border: none;
        border-radius: 50%;
        cursor: pointer;
        display: none;
        fill: #1967D2;
        height: 32px;
        padding: 0 0 0 0;
        width: 32px;
      }

      .colab-df-generate:hover {
        background-color: #E2EBFA;
        box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);
        fill: #174EA6;
      }

      [theme=dark] .colab-df-generate {
        background-color: #3B4455;
        fill: #D2E3FC;
      }

      [theme=dark] .colab-df-generate:hover {
        background-color: #434B5C;
        box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);
        filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));
        fill: #FFFFFF;
      }
    </style>
    <button class="colab-df-generate" onclick="generateWithVariable('LightGBM_output')"
            title="Generate code using this dataframe."
            style="display:none;">

  <svg xmlns="http://www.w3.org/2000/svg" height="24px"viewBox="0 0 24 24"
       width="24px">
    <path d="M7,19H8.4L18.45,9,17,7.55,7,17.6ZM5,21V16.75L18.45,3.32a2,2,0,0,1,2.83,0l1.4,1.43a1.91,1.91,0,0,1,.58,1.4,1.91,1.91,0,0,1-.58,1.4L9.25,21ZM18.45,9,17,7.55Zm-12,3A5.31,5.31,0,0,0,4.9,8.1,5.31,5.31,0,0,0,1,6.5,5.31,5.31,0,0,0,4.9,4.9,5.31,5.31,0,0,0,6.5,1,5.31,5.31,0,0,0,8.1,4.9,5.31,5.31,0,0,0,12,6.5,5.46,5.46,0,0,0,6.5,12Z"/>
  </svg>
    </button>
    <script>
      (() => {
      const buttonEl =
        document.querySelector('#id_de3be9b2-b2de-4343-977e-42f70d097557 button.colab-df-generate');
      buttonEl.style.display =
        google.colab.kernel.accessAllowed ? 'block' : 'none';

      buttonEl.onclick = () => {
        google.colab.notebook.generateWithVariable('LightGBM_output');
      }
      })();
    </script>
  </div>

    </div>
  </div>

</div>
</div>
<div class="cell markdown" id="oMPY-jnszvJg">
<ul>
<li><p><strong>From the metrics, it can be seen that the LightGBM is
performing so well compared to all the the models for churn predicition
with the highest testing accuracy of 94.32%.</strong></p></li>
<li><p><strong>Additionally, the LightGBM model is also observed to be
working well after the inclusion of the customer sentimetns and the
respective sentiment scores. This can be observed from the fact that the
the accuracy of LightGBM in phase 1 is 94.03% and in the second phase is
94.19%.</strong></p></li>
</ul>
</div>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;,&quot;height&quot;:472}"
id="bbLFp_e2WQxU" data-outputId="ee3f1304-b367-44eb-aa4f-f3a6654e9ee6">
<div class="sourceCode" id="cb200"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb200-1"><a href="#cb200-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> matplotlib.pyplot <span class="im">as</span> plt</span>
<span id="cb200-2"><a href="#cb200-2" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> sklearn.metrics <span class="im">import</span> roc_curve, auc</span>
<span id="cb200-3"><a href="#cb200-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb200-4"><a href="#cb200-4" aria-hidden="true" tabindex="-1"></a><span class="co"># Get probabilities for positive class</span></span>
<span id="cb200-5"><a href="#cb200-5" aria-hidden="true" tabindex="-1"></a>y_probs_lightgbm <span class="op">=</span> best_lgb_model.predict_proba(x_test)[:, <span class="dv">1</span>]</span>
<span id="cb200-6"><a href="#cb200-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb200-7"><a href="#cb200-7" aria-hidden="true" tabindex="-1"></a><span class="co"># Compute ROC curve</span></span>
<span id="cb200-8"><a href="#cb200-8" aria-hidden="true" tabindex="-1"></a>fpr, tpr, thresholds <span class="op">=</span> roc_curve(y_test, y_probs_lightgbm)</span>
<span id="cb200-9"><a href="#cb200-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb200-10"><a href="#cb200-10" aria-hidden="true" tabindex="-1"></a><span class="co"># Compute ROC area under the curve</span></span>
<span id="cb200-11"><a href="#cb200-11" aria-hidden="true" tabindex="-1"></a>roc_auc <span class="op">=</span> auc(fpr, tpr)</span>
<span id="cb200-12"><a href="#cb200-12" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb200-13"><a href="#cb200-13" aria-hidden="true" tabindex="-1"></a><span class="co"># Plot ROC curve</span></span>
<span id="cb200-14"><a href="#cb200-14" aria-hidden="true" tabindex="-1"></a>plt.figure()</span>
<span id="cb200-15"><a href="#cb200-15" aria-hidden="true" tabindex="-1"></a>plt.plot(fpr, tpr, color<span class="op">=</span><span class="st">&#39;darkorange&#39;</span>, lw<span class="op">=</span><span class="dv">2</span>, label<span class="op">=</span><span class="st">&#39;ROC curve (area = </span><span class="sc">%0.2f</span><span class="st">)&#39;</span> <span class="op">%</span> roc_auc)</span>
<span id="cb200-16"><a href="#cb200-16" aria-hidden="true" tabindex="-1"></a>plt.plot([<span class="dv">0</span>, <span class="dv">1</span>], [<span class="dv">0</span>, <span class="dv">1</span>], color<span class="op">=</span><span class="st">&#39;navy&#39;</span>, lw<span class="op">=</span><span class="dv">2</span>, linestyle<span class="op">=</span><span class="st">&#39;--&#39;</span>)</span>
<span id="cb200-17"><a href="#cb200-17" aria-hidden="true" tabindex="-1"></a>plt.xlim([<span class="fl">0.0</span>, <span class="fl">1.0</span>])</span>
<span id="cb200-18"><a href="#cb200-18" aria-hidden="true" tabindex="-1"></a>plt.ylim([<span class="fl">0.0</span>, <span class="fl">1.05</span>])</span>
<span id="cb200-19"><a href="#cb200-19" aria-hidden="true" tabindex="-1"></a>plt.xlabel(<span class="st">&#39;False Positive Rate&#39;</span>)</span>
<span id="cb200-20"><a href="#cb200-20" aria-hidden="true" tabindex="-1"></a>plt.ylabel(<span class="st">&#39;True Positive Rate&#39;</span>)</span>
<span id="cb200-21"><a href="#cb200-21" aria-hidden="true" tabindex="-1"></a>plt.title(<span class="st">&#39;Receiver Operating Characteristic (ROC) Curve&#39;</span>)</span>
<span id="cb200-22"><a href="#cb200-22" aria-hidden="true" tabindex="-1"></a>plt.legend(loc<span class="op">=</span><span class="st">&quot;lower right&quot;</span>)</span>
<span id="cb200-23"><a href="#cb200-23" aria-hidden="true" tabindex="-1"></a>plt.show()</span></code></pre></div>
<div class="output display_data">
<p><img
src="vertopal_11c7d057c72c4c018277cd184609f64f/0c185101490119d524439eec2e614b3d24c24399.png" /></p>
</div>
</div>
<div class="cell markdown" id="w_SkTYOhYJac">
<p><strong>LightGBM is performing much more better than the other models
in distinguishing between the positive and negative classes. The area
under the curve for lightgbm is more than the other models
comparatively.</strong></p>
</div>
<section id="variable-importance-for-the-lightgbm" class="cell markdown"
id="k233W2oew8Fc">
<h1><strong>Variable Importance for the LightGBM</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;,&quot;height&quot;:564}"
id="I3Lzyze8w_N6" data-outputId="cc9e9ef8-09a1-4f92-ab61-dc429ca9b878">
<div class="sourceCode" id="cb201"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb201-1"><a href="#cb201-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> matplotlib.pyplot <span class="im">as</span> plt</span>
<span id="cb201-2"><a href="#cb201-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb201-3"><a href="#cb201-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Assuming df is your DataFrame containing the data</span></span>
<span id="cb201-4"><a href="#cb201-4" aria-hidden="true" tabindex="-1"></a><span class="co"># Exclude the target variable</span></span>
<span id="cb201-5"><a href="#cb201-5" aria-hidden="true" tabindex="-1"></a>feature_names <span class="op">=</span> df.columns.drop([<span class="st">&#39;churn_risk_score&#39;</span>, <span class="st">&#39;feedback&#39;</span>])</span>
<span id="cb201-6"><a href="#cb201-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb201-7"><a href="#cb201-7" aria-hidden="true" tabindex="-1"></a><span class="co"># Getting feature importances</span></span>
<span id="cb201-8"><a href="#cb201-8" aria-hidden="true" tabindex="-1"></a>feature_importances <span class="op">=</span> best_lgb_model.feature_importances_</span>
<span id="cb201-9"><a href="#cb201-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb201-10"><a href="#cb201-10" aria-hidden="true" tabindex="-1"></a><span class="co"># Sorting feature importances and feature names by importance</span></span>
<span id="cb201-11"><a href="#cb201-11" aria-hidden="true" tabindex="-1"></a>sorted_indices <span class="op">=</span> np.argsort(feature_importances)[::<span class="op">-</span><span class="dv">1</span>]</span>
<span id="cb201-12"><a href="#cb201-12" aria-hidden="true" tabindex="-1"></a>sorted_feature_importances <span class="op">=</span> feature_importances[sorted_indices]</span>
<span id="cb201-13"><a href="#cb201-13" aria-hidden="true" tabindex="-1"></a>sorted_feature_names <span class="op">=</span> feature_names[sorted_indices]</span>
<span id="cb201-14"><a href="#cb201-14" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb201-15"><a href="#cb201-15" aria-hidden="true" tabindex="-1"></a><span class="co"># Plotting the feature importances</span></span>
<span id="cb201-16"><a href="#cb201-16" aria-hidden="true" tabindex="-1"></a>plt.figure(figsize<span class="op">=</span>(<span class="dv">10</span>, <span class="dv">6</span>))</span>
<span id="cb201-17"><a href="#cb201-17" aria-hidden="true" tabindex="-1"></a>plt.barh(sorted_feature_names, sorted_feature_importances)</span>
<span id="cb201-18"><a href="#cb201-18" aria-hidden="true" tabindex="-1"></a>plt.xlabel(<span class="st">&#39;Feature Importance&#39;</span>)</span>
<span id="cb201-19"><a href="#cb201-19" aria-hidden="true" tabindex="-1"></a>plt.title(<span class="st">&#39;Variable Importance for Random Forest&#39;</span>)</span>
<span id="cb201-20"><a href="#cb201-20" aria-hidden="true" tabindex="-1"></a>plt.show()</span></code></pre></div>
<div class="output display_data">
<p><img
src="vertopal_11c7d057c72c4c018277cd184609f64f/45c3e97d77acc0188e9aeb89d0b599cd2b980575.png" /></p>
</div>
</div>
<div class="cell markdown" id="PF2hKQWkzPh5">
<ul>
<li><p><strong>From the variable importance graph it can be seen that
"points_in_wallet" appeared to be most impotant variable again with
LightGBM. Also, the newly added variable of "score" is appeared to be in
the second place for the churn prediction of this organization which is
different from all the other models. In addition to "score",
"membership_category", also found to be relatively important which is
seen in the logistic regression, Linear SVM, and Random Forest as
well.</strong></p></li>
<li><p><strong>The sentiment score variable "score" is also found to be
impactful and is one of the most important variables in
LightGBM.</strong></p></li>
</ul>
</div>
<div class="cell code" id="qTRr1ISptawT">
<div class="sourceCode" id="cb202"><pre
class="sourceCode python"><code class="sourceCode python"></code></pre></div>
</div>
<section id="confusion-matrix-for-the-lightgbm-model"
class="cell markdown" id="W7QER9tROnSy">
<h1><strong>Confusion Matrix for the LightGBM Model</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;,&quot;height&quot;:1000}"
id="JYVBbRJ3OAPY" data-outputId="34343030-95f2-4113-ef74-94a5eb4a1af6">
<div class="sourceCode" id="cb203"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb203-1"><a href="#cb203-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> sklearn.metrics <span class="im">import</span> confusion_matrix</span>
<span id="cb203-2"><a href="#cb203-2" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> seaborn <span class="im">as</span> sns</span>
<span id="cb203-3"><a href="#cb203-3" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> matplotlib.pyplot <span class="im">as</span> plt</span>
<span id="cb203-4"><a href="#cb203-4" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb203-5"><a href="#cb203-5" aria-hidden="true" tabindex="-1"></a><span class="co"># Generating confusion matrix for training set</span></span>
<span id="cb203-6"><a href="#cb203-6" aria-hidden="true" tabindex="-1"></a><span class="co"># Get cross-validated predictions</span></span>
<span id="cb203-7"><a href="#cb203-7" aria-hidden="true" tabindex="-1"></a><span class="co"># cv_predictions = cross_val_predict(best_lgb_model, x_train, y_train, cv=5)</span></span>
<span id="cb203-8"><a href="#cb203-8" aria-hidden="true" tabindex="-1"></a>conf_matrix_train <span class="op">=</span> confusion_matrix(y_train, cv_predictions_lgb_train)</span>
<span id="cb203-9"><a href="#cb203-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb203-10"><a href="#cb203-10" aria-hidden="true" tabindex="-1"></a><span class="co"># Plotting confusion matrix for training set</span></span>
<span id="cb203-11"><a href="#cb203-11" aria-hidden="true" tabindex="-1"></a>plt.figure(figsize<span class="op">=</span>(<span class="dv">8</span>, <span class="dv">6</span>))</span>
<span id="cb203-12"><a href="#cb203-12" aria-hidden="true" tabindex="-1"></a>sns.heatmap(conf_matrix_train, annot<span class="op">=</span><span class="va">True</span>, fmt<span class="op">=</span><span class="st">&#39;d&#39;</span>, cmap<span class="op">=</span><span class="st">&#39;Blues&#39;</span>,</span>
<span id="cb203-13"><a href="#cb203-13" aria-hidden="true" tabindex="-1"></a>            xticklabels<span class="op">=</span>[<span class="st">&#39;Predicted Negative&#39;</span>, <span class="st">&#39;Predicted Positive&#39;</span>],</span>
<span id="cb203-14"><a href="#cb203-14" aria-hidden="true" tabindex="-1"></a>            yticklabels<span class="op">=</span>[<span class="st">&#39;Actual Negative&#39;</span>, <span class="st">&#39;Actual Positive&#39;</span>])</span>
<span id="cb203-15"><a href="#cb203-15" aria-hidden="true" tabindex="-1"></a>plt.title(<span class="st">&#39;Confusion Matrix - Training Set&#39;</span>)</span>
<span id="cb203-16"><a href="#cb203-16" aria-hidden="true" tabindex="-1"></a>plt.xlabel(<span class="st">&#39;Predicted labels&#39;</span>)</span>
<span id="cb203-17"><a href="#cb203-17" aria-hidden="true" tabindex="-1"></a>plt.ylabel(<span class="st">&#39;True labels&#39;</span>)</span>
<span id="cb203-18"><a href="#cb203-18" aria-hidden="true" tabindex="-1"></a>plt.show()</span>
<span id="cb203-19"><a href="#cb203-19" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb203-20"><a href="#cb203-20" aria-hidden="true" tabindex="-1"></a><span class="co"># Generating confusion matrix for testing set</span></span>
<span id="cb203-21"><a href="#cb203-21" aria-hidden="true" tabindex="-1"></a>cv_predictions <span class="op">=</span> cross_val_predict(lgb_model, x_test, y_test, cv<span class="op">=</span><span class="dv">5</span>)</span>
<span id="cb203-22"><a href="#cb203-22" aria-hidden="true" tabindex="-1"></a>conf_matrix_test <span class="op">=</span> confusion_matrix(y_test, cv_predictions_lgb_test)</span>
<span id="cb203-23"><a href="#cb203-23" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb203-24"><a href="#cb203-24" aria-hidden="true" tabindex="-1"></a><span class="co"># Plotting confusion matrix for testing set</span></span>
<span id="cb203-25"><a href="#cb203-25" aria-hidden="true" tabindex="-1"></a>plt.figure(figsize<span class="op">=</span>(<span class="dv">8</span>, <span class="dv">6</span>))</span>
<span id="cb203-26"><a href="#cb203-26" aria-hidden="true" tabindex="-1"></a>sns.heatmap(conf_matrix_test, annot<span class="op">=</span><span class="va">True</span>, fmt<span class="op">=</span><span class="st">&#39;d&#39;</span>, cmap<span class="op">=</span><span class="st">&#39;Blues&#39;</span>,</span>
<span id="cb203-27"><a href="#cb203-27" aria-hidden="true" tabindex="-1"></a>            xticklabels<span class="op">=</span>[<span class="st">&#39;Predicted Negative&#39;</span>, <span class="st">&#39;Predicted Positive&#39;</span>],</span>
<span id="cb203-28"><a href="#cb203-28" aria-hidden="true" tabindex="-1"></a>            yticklabels<span class="op">=</span>[<span class="st">&#39;Actual Negative&#39;</span>, <span class="st">&#39;Actual Positive&#39;</span>])</span>
<span id="cb203-29"><a href="#cb203-29" aria-hidden="true" tabindex="-1"></a>plt.title(<span class="st">&#39;Confusion Matrix - Testing Set&#39;</span>)</span>
<span id="cb203-30"><a href="#cb203-30" aria-hidden="true" tabindex="-1"></a>plt.xlabel(<span class="st">&#39;Predicted labels&#39;</span>)</span>
<span id="cb203-31"><a href="#cb203-31" aria-hidden="true" tabindex="-1"></a>plt.ylabel(<span class="st">&#39;True labels&#39;</span>)</span>
<span id="cb203-32"><a href="#cb203-32" aria-hidden="true" tabindex="-1"></a>plt.show()</span></code></pre></div>
<div class="output display_data">
<p><img
src="vertopal_11c7d057c72c4c018277cd184609f64f/6a270a92a44a97b6b13265bbdb87e17a5047af47.png" /></p>
</div>
<div class="output stream stdout">
<pre><code>[LightGBM] [Info] Number of positive: 3123, number of negative: 2696
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000477 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 5819, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.536690 -&gt; initscore=0.147025
[LightGBM] [Info] Start training from score 0.147025
[LightGBM] [Info] Number of positive: 3122, number of negative: 2697
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000470 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 5819, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.536518 -&gt; initscore=0.146334
[LightGBM] [Info] Start training from score 0.146334
[LightGBM] [Info] Number of positive: 3122, number of negative: 2697
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000457 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 5819, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.536518 -&gt; initscore=0.146334
[LightGBM] [Info] Start training from score 0.146334
[LightGBM] [Info] Number of positive: 3122, number of negative: 2697
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000471 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 5819, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.536518 -&gt; initscore=0.146334
[LightGBM] [Info] Start training from score 0.146334
[LightGBM] [Info] Number of positive: 3123, number of negative: 2697
[LightGBM] [Info] Auto-choosing row-wise multi-threading, the overhead of testing was 0.000477 seconds.
You can set `force_row_wise=true` to remove the overhead.
And if memory is not enough, you can set `force_col_wise=true`.
[LightGBM] [Info] Total Bins 890
[LightGBM] [Info] Number of data points in the train set: 5820, number of used features: 13
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.536598 -&gt; initscore=0.146654
[LightGBM] [Info] Start training from score 0.146654
</code></pre>
</div>
<div class="output display_data">
<p><img
src="vertopal_11c7d057c72c4c018277cd184609f64f/fd769365031da3f996d107e3888c1bd666c2509e.png" /></p>
</div>
</div>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="uaovaz01CATB" data-outputId="c48278b0-e3da-412d-aa52-d3bb6af3e821">
<div class="sourceCode" id="cb205"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb205-1"><a href="#cb205-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Extracting true negatives, false positives, false negatives, and true positives from confusion matrix</span></span>
<span id="cb205-2"><a href="#cb205-2" aria-hidden="true" tabindex="-1"></a>tn_train, fp_train, fn_train, tp_train <span class="op">=</span> conf_matrix_train.ravel()</span>
<span id="cb205-3"><a href="#cb205-3" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb205-4"><a href="#cb205-4" aria-hidden="true" tabindex="-1"></a><span class="co"># Calculating specificity and sensitivity</span></span>
<span id="cb205-5"><a href="#cb205-5" aria-hidden="true" tabindex="-1"></a>specificity_lightgbm_train <span class="op">=</span> tn_train <span class="op">/</span> (tn_train <span class="op">+</span> fp_train)</span>
<span id="cb205-6"><a href="#cb205-6" aria-hidden="true" tabindex="-1"></a>sensitivity_lightgbm_train <span class="op">=</span> tp_train <span class="op">/</span> (tp_train <span class="op">+</span> fn_train)</span>
<span id="cb205-7"><a href="#cb205-7" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb205-8"><a href="#cb205-8" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb205-9"><a href="#cb205-9" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb205-10"><a href="#cb205-10" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb205-11"><a href="#cb205-11" aria-hidden="true" tabindex="-1"></a><span class="co"># Extracting true negatives, false positives, false negatives, and true positives from confusion matrix</span></span>
<span id="cb205-12"><a href="#cb205-12" aria-hidden="true" tabindex="-1"></a>tn_test, fp_test, fn_test, tp_test <span class="op">=</span> conf_matrix_test.ravel()</span>
<span id="cb205-13"><a href="#cb205-13" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb205-14"><a href="#cb205-14" aria-hidden="true" tabindex="-1"></a><span class="co"># Calculating specificity and sensitivity</span></span>
<span id="cb205-15"><a href="#cb205-15" aria-hidden="true" tabindex="-1"></a>specificity_lightgbm_test <span class="op">=</span> tn_test <span class="op">/</span> (tn_test <span class="op">+</span> fp_test)</span>
<span id="cb205-16"><a href="#cb205-16" aria-hidden="true" tabindex="-1"></a>sensitivity_lightgbm_test <span class="op">=</span> tp_test <span class="op">/</span> (tp_test <span class="op">+</span> fn_test)</span>
<span id="cb205-17"><a href="#cb205-17" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb205-18"><a href="#cb205-18" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb205-19"><a href="#cb205-19" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f&quot;The Sensitivity and Specificity of the Logistic Regression on the training data:</span><span class="ch">\n</span><span class="ss">Specificity:</span><span class="sc">{</span>specificity_lightgbm_train<span class="sc">}</span><span class="ch">\n</span><span class="ss">Sensitivity:</span><span class="sc">{</span>sensitivity_lightgbm_train<span class="sc">}</span><span class="ch">\n\n</span><span class="ss">&quot;</span>)</span>
<span id="cb205-20"><a href="#cb205-20" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="ss">f&quot;The Sensitivity and Specificity of the Logistic Regression on the testing data:</span><span class="ch">\n</span><span class="ss">Specificity:</span><span class="sc">{</span>specificity_lightgbm_test<span class="sc">}</span><span class="ch">\n</span><span class="ss">Sensitivity:</span><span class="sc">{</span>sensitivity_lightgbm_test<span class="sc">}</span><span class="ss">&quot;</span>)</span>
<span id="cb205-21"><a href="#cb205-21" aria-hidden="true" tabindex="-1"></a></span></code></pre></div>
<div class="output stream stdout">
<pre><code>The Sensitivity and Specificity of the Logistic Regression on the training data:
Specificity:0.9445438282647585
Sensitivity:0.9554899387576553


The Sensitivity and Specificity of the Logistic Regression on the testing data:
Specificity:0.9385938890536932
Sensitivity:0.9472200871124776
</code></pre>
</div>
</div>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;,&quot;height&quot;:286}"
id="4xmwscfKXCvA" data-outputId="900b8d69-f84a-400d-e9fc-9a99cb0a46f0">
<div class="sourceCode" id="cb207"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb207-1"><a href="#cb207-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> pandas <span class="im">as</span> pd</span>
<span id="cb207-2"><a href="#cb207-2" aria-hidden="true" tabindex="-1"></a><span class="co"># Creating DataFrame</span></span>
<span id="cb207-3"><a href="#cb207-3" aria-hidden="true" tabindex="-1"></a>LightGBM_output <span class="op">=</span> pd.DataFrame({</span>
<span id="cb207-4"><a href="#cb207-4" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;Training&#39;</span>: [train_cv_acc_lgb, train_cv_recall_lgb, train_cv_precision_lgb, train_cv_f1_lgb,roc_auc_train_lightgbm,</span>
<span id="cb207-5"><a href="#cb207-5" aria-hidden="true" tabindex="-1"></a>                 specificity_lightgbm_train,sensitivity_lightgbm_train],</span>
<span id="cb207-6"><a href="#cb207-6" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;Testing&#39;</span>: [test_cv_acc_lgb, test_cv_recall_lgb, test_cv_precision_lgb, test_cv_f1_lgb,roc_auc_test_lightgbm,</span>
<span id="cb207-7"><a href="#cb207-7" aria-hidden="true" tabindex="-1"></a>                specificity_lightgbm_test, sensitivity_lightgbm_test]},</span>
<span id="cb207-8"><a href="#cb207-8" aria-hidden="true" tabindex="-1"></a>    <span class="co"># &#39;Model&#39;: [&#39;Logistic Regression&#39;,&#39;Logistic Regression&#39;,&#39;Logistic Regression&#39;,&#39;Logistic Regression&#39;]},</span></span>
<span id="cb207-9"><a href="#cb207-9" aria-hidden="true" tabindex="-1"></a>                              index <span class="op">=</span> [<span class="st">&#39;Accuracy&#39;</span>,<span class="st">&#39;Recall&#39;</span>,<span class="st">&#39;Precision&#39;</span>,<span class="st">&#39;F1&#39;</span>,<span class="st">&#39;ROC_AUC&#39;</span>,<span class="st">&#39;Specificity&#39;</span>,<span class="st">&#39;Sensitivity&#39;</span>])</span>
<span id="cb207-10"><a href="#cb207-10" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;The metrics for the LightGBM  are: &quot;</span>)</span>
<span id="cb207-11"><a href="#cb207-11" aria-hidden="true" tabindex="-1"></a>LightGBM_output</span></code></pre></div>
<div class="output stream stdout">
<pre><code>The metrics for the LightGBM  are: 
</code></pre>
</div>
<div class="output execute_result" data-execution_count="72">

  <div id="df-54fc45d6-09e8-4186-8777-de96139beaeb" class="colab-df-container">
    <div>
<style scoped>
    .dataframe tbody tr th:only-of-type {
        vertical-align: middle;
    }

    .dataframe tbody tr th {
        vertical-align: top;
    }

    .dataframe thead th {
        text-align: right;
    }
</style>
<table border="1" class="dataframe">
  <thead>
    <tr style="text-align: right;">
      <th></th>
      <th>Training</th>
      <th>Testing</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <th>Accuracy</th>
      <td>0.950442</td>
      <td>0.943222</td>
    </tr>
    <tr>
      <th>Recall</th>
      <td>0.955490</td>
      <td>0.947220</td>
    </tr>
    <tr>
      <th>Precision</th>
      <td>0.952677</td>
      <td>0.946977</td>
    </tr>
    <tr>
      <th>F1</th>
      <td>0.954081</td>
      <td>0.947099</td>
    </tr>
    <tr>
      <th>ROC_AUC</th>
      <td>0.981536</td>
      <td>0.981536</td>
    </tr>
    <tr>
      <th>Specificity</th>
      <td>0.944544</td>
      <td>0.938594</td>
    </tr>
    <tr>
      <th>Sensitivity</th>
      <td>0.955490</td>
      <td>0.947220</td>
    </tr>
  </tbody>
</table>
</div>
    <div class="colab-df-buttons">

  <div class="colab-df-container">
    <button class="colab-df-convert" onclick="convertToInteractive('df-54fc45d6-09e8-4186-8777-de96139beaeb')"
            title="Convert this dataframe to an interactive table."
            style="display:none;">

  <svg xmlns="http://www.w3.org/2000/svg" height="24px" viewBox="0 -960 960 960">
    <path d="M120-120v-720h720v720H120Zm60-500h600v-160H180v160Zm220 220h160v-160H400v160Zm0 220h160v-160H400v160ZM180-400h160v-160H180v160Zm440 0h160v-160H620v160ZM180-180h160v-160H180v160Zm440 0h160v-160H620v160Z"/>
  </svg>
    </button>

  <style>
    .colab-df-container {
      display:flex;
      gap: 12px;
    }

    .colab-df-convert {
      background-color: #E8F0FE;
      border: none;
      border-radius: 50%;
      cursor: pointer;
      display: none;
      fill: #1967D2;
      height: 32px;
      padding: 0 0 0 0;
      width: 32px;
    }

    .colab-df-convert:hover {
      background-color: #E2EBFA;
      box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);
      fill: #174EA6;
    }

    .colab-df-buttons div {
      margin-bottom: 4px;
    }

    [theme=dark] .colab-df-convert {
      background-color: #3B4455;
      fill: #D2E3FC;
    }

    [theme=dark] .colab-df-convert:hover {
      background-color: #434B5C;
      box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);
      filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));
      fill: #FFFFFF;
    }
  </style>

    <script>
      const buttonEl =
        document.querySelector('#df-54fc45d6-09e8-4186-8777-de96139beaeb button.colab-df-convert');
      buttonEl.style.display =
        google.colab.kernel.accessAllowed ? 'block' : 'none';

      async function convertToInteractive(key) {
        const element = document.querySelector('#df-54fc45d6-09e8-4186-8777-de96139beaeb');
        const dataTable =
          await google.colab.kernel.invokeFunction('convertToInteractive',
                                                    [key], {});
        if (!dataTable) return;

        const docLinkHtml = 'Like what you see? Visit the ' +
          '<a target="_blank" href=https://colab.research.google.com/notebooks/data_table.ipynb>data table notebook</a>'
          + ' to learn more about interactive tables.';
        element.innerHTML = '';
        dataTable['output_type'] = 'display_data';
        await google.colab.output.renderOutput(dataTable, element);
        const docLink = document.createElement('div');
        docLink.innerHTML = docLinkHtml;
        element.appendChild(docLink);
      }
    </script>
  </div>


<div id="df-71ef866b-054d-44ab-8fd0-7c20864056d5">
  <button class="colab-df-quickchart" onclick="quickchart('df-71ef866b-054d-44ab-8fd0-7c20864056d5')"
            title="Suggest charts"
            style="display:none;">

<svg xmlns="http://www.w3.org/2000/svg" height="24px"viewBox="0 0 24 24"
     width="24px">
    <g>
        <path d="M19 3H5c-1.1 0-2 .9-2 2v14c0 1.1.9 2 2 2h14c1.1 0 2-.9 2-2V5c0-1.1-.9-2-2-2zM9 17H7v-7h2v7zm4 0h-2V7h2v10zm4 0h-2v-4h2v4z"/>
    </g>
</svg>
  </button>

<style>
  .colab-df-quickchart {
      --bg-color: #E8F0FE;
      --fill-color: #1967D2;
      --hover-bg-color: #E2EBFA;
      --hover-fill-color: #174EA6;
      --disabled-fill-color: #AAA;
      --disabled-bg-color: #DDD;
  }

  [theme=dark] .colab-df-quickchart {
      --bg-color: #3B4455;
      --fill-color: #D2E3FC;
      --hover-bg-color: #434B5C;
      --hover-fill-color: #FFFFFF;
      --disabled-bg-color: #3B4455;
      --disabled-fill-color: #666;
  }

  .colab-df-quickchart {
    background-color: var(--bg-color);
    border: none;
    border-radius: 50%;
    cursor: pointer;
    display: none;
    fill: var(--fill-color);
    height: 32px;
    padding: 0;
    width: 32px;
  }

  .colab-df-quickchart:hover {
    background-color: var(--hover-bg-color);
    box-shadow: 0 1px 2px rgba(60, 64, 67, 0.3), 0 1px 3px 1px rgba(60, 64, 67, 0.15);
    fill: var(--button-hover-fill-color);
  }

  .colab-df-quickchart-complete:disabled,
  .colab-df-quickchart-complete:disabled:hover {
    background-color: var(--disabled-bg-color);
    fill: var(--disabled-fill-color);
    box-shadow: none;
  }

  .colab-df-spinner {
    border: 2px solid var(--fill-color);
    border-color: transparent;
    border-bottom-color: var(--fill-color);
    animation:
      spin 1s steps(1) infinite;
  }

  @keyframes spin {
    0% {
      border-color: transparent;
      border-bottom-color: var(--fill-color);
      border-left-color: var(--fill-color);
    }
    20% {
      border-color: transparent;
      border-left-color: var(--fill-color);
      border-top-color: var(--fill-color);
    }
    30% {
      border-color: transparent;
      border-left-color: var(--fill-color);
      border-top-color: var(--fill-color);
      border-right-color: var(--fill-color);
    }
    40% {
      border-color: transparent;
      border-right-color: var(--fill-color);
      border-top-color: var(--fill-color);
    }
    60% {
      border-color: transparent;
      border-right-color: var(--fill-color);
    }
    80% {
      border-color: transparent;
      border-right-color: var(--fill-color);
      border-bottom-color: var(--fill-color);
    }
    90% {
      border-color: transparent;
      border-bottom-color: var(--fill-color);
    }
  }
</style>

  <script>
    async function quickchart(key) {
      const quickchartButtonEl =
        document.querySelector('#' + key + ' button');
      quickchartButtonEl.disabled = true;  // To prevent multiple clicks.
      quickchartButtonEl.classList.add('colab-df-spinner');
      try {
        const charts = await google.colab.kernel.invokeFunction(
            'suggestCharts', [key], {});
      } catch (error) {
        console.error('Error during call to suggestCharts:', error);
      }
      quickchartButtonEl.classList.remove('colab-df-spinner');
      quickchartButtonEl.classList.add('colab-df-quickchart-complete');
    }
    (() => {
      let quickchartButtonEl =
        document.querySelector('#df-71ef866b-054d-44ab-8fd0-7c20864056d5 button');
      quickchartButtonEl.style.display =
        google.colab.kernel.accessAllowed ? 'block' : 'none';
    })();
  </script>
</div>

  <div id="id_58289e60-5378-4a8f-91b0-3b100bed2ab0">
    <style>
      .colab-df-generate {
        background-color: #E8F0FE;
        border: none;
        border-radius: 50%;
        cursor: pointer;
        display: none;
        fill: #1967D2;
        height: 32px;
        padding: 0 0 0 0;
        width: 32px;
      }

      .colab-df-generate:hover {
        background-color: #E2EBFA;
        box-shadow: 0px 1px 2px rgba(60, 64, 67, 0.3), 0px 1px 3px 1px rgba(60, 64, 67, 0.15);
        fill: #174EA6;
      }

      [theme=dark] .colab-df-generate {
        background-color: #3B4455;
        fill: #D2E3FC;
      }

      [theme=dark] .colab-df-generate:hover {
        background-color: #434B5C;
        box-shadow: 0px 1px 3px 1px rgba(0, 0, 0, 0.15);
        filter: drop-shadow(0px 1px 2px rgba(0, 0, 0, 0.3));
        fill: #FFFFFF;
      }
    </style>
    <button class="colab-df-generate" onclick="generateWithVariable('LightGBM_output')"
            title="Generate code using this dataframe."
            style="display:none;">

  <svg xmlns="http://www.w3.org/2000/svg" height="24px"viewBox="0 0 24 24"
       width="24px">
    <path d="M7,19H8.4L18.45,9,17,7.55,7,17.6ZM5,21V16.75L18.45,3.32a2,2,0,0,1,2.83,0l1.4,1.43a1.91,1.91,0,0,1,.58,1.4,1.91,1.91,0,0,1-.58,1.4L9.25,21ZM18.45,9,17,7.55Zm-12,3A5.31,5.31,0,0,0,4.9,8.1,5.31,5.31,0,0,0,1,6.5,5.31,5.31,0,0,0,4.9,4.9,5.31,5.31,0,0,0,6.5,1,5.31,5.31,0,0,0,8.1,4.9,5.31,5.31,0,0,0,12,6.5,5.46,5.46,0,0,0,6.5,12Z"/>
  </svg>
    </button>
    <script>
      (() => {
      const buttonEl =
        document.querySelector('#id_58289e60-5378-4a8f-91b0-3b100bed2ab0 button.colab-df-generate');
      buttonEl.style.display =
        google.colab.kernel.accessAllowed ? 'block' : 'none';

      buttonEl.onclick = () => {
        google.colab.notebook.generateWithVariable('LightGBM_output');
      }
      })();
    </script>
  </div>

    </div>
  </div>

</div>
</div>
<div class="cell code" id="a8ATAVXKhczi">
<div class="sourceCode" id="cb209"><pre
class="sourceCode python"><code class="sourceCode python"></code></pre></div>
</div>
<div class="cell code" id="Irf_qy_Xhc18">
<div class="sourceCode" id="cb210"><pre
class="sourceCode python"><code class="sourceCode python"></code></pre></div>
</div>
<section id="pickle-files" class="cell markdown" id="7tDMI41HeDqr">
<h1><strong>Pickle Files</strong></h1>
</section>
<div class="cell markdown" id="BbTWFSzXxlAY">

</div>
<section
id="using-pickle-files-to-store-the-output-of-the-trained-and-tested-models-for-lightgbm"
class="cell markdown" id="9HXCGdFrC6A3">
<h1><strong>Using Pickle files to store the output of the trained and
tested models for LightGBM</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="voZ7lb2B4V-f" data-outputId="ff6864f5-e067-452c-879f-ebbf81fef307">
<div class="sourceCode" id="cb211"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb211-1"><a href="#cb211-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> pickle</span>
<span id="cb211-2"><a href="#cb211-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb211-3"><a href="#cb211-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Assuming you have already obtained the following variables:</span></span>
<span id="cb211-4"><a href="#cb211-4" aria-hidden="true" tabindex="-1"></a><span class="co"># best_params, train_cv_acc_lgb, train_cv_recall_lgb, train_cv_precision_lgb, train_cv_f1_lgb</span></span>
<span id="cb211-5"><a href="#cb211-5" aria-hidden="true" tabindex="-1"></a><span class="co"># test_cv_acc_lgb, test_cv_recall_lgb, test_cv_precision_lgb, test_cv_f1_lgb</span></span>
<span id="cb211-6"><a href="#cb211-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb211-7"><a href="#cb211-7" aria-hidden="true" tabindex="-1"></a><span class="co"># Create a dictionary to store the results</span></span>
<span id="cb211-8"><a href="#cb211-8" aria-hidden="true" tabindex="-1"></a>lgb_results <span class="op">=</span> {</span>
<span id="cb211-9"><a href="#cb211-9" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;best_params&#39;</span>: best_params_lightgbm,</span>
<span id="cb211-10"><a href="#cb211-10" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;train_cv_acc&#39;</span>: train_cv_acc_lgb,</span>
<span id="cb211-11"><a href="#cb211-11" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;train_cv_recall&#39;</span>: train_cv_recall_lgb,</span>
<span id="cb211-12"><a href="#cb211-12" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;train_cv_precision&#39;</span>: train_cv_precision_lgb,</span>
<span id="cb211-13"><a href="#cb211-13" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;train_cv_f1&#39;</span>: train_cv_f1_lgb,</span>
<span id="cb211-14"><a href="#cb211-14" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;train_cv_roc_auc&#39;</span>:roc_auc_train_lightgbm,</span>
<span id="cb211-15"><a href="#cb211-15" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;train_cv_specificity&#39;</span>:specificity_lightgbm_train,</span>
<span id="cb211-16"><a href="#cb211-16" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;train_cv_sensitivity&#39;</span>:sensitivity_lightgbm_train,</span>
<span id="cb211-17"><a href="#cb211-17" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;test_cv_acc&#39;</span>: test_cv_acc_lgb,</span>
<span id="cb211-18"><a href="#cb211-18" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;test_cv_recall&#39;</span>: test_cv_recall_lgb,</span>
<span id="cb211-19"><a href="#cb211-19" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;test_cv_precision&#39;</span>: test_cv_precision_lgb,</span>
<span id="cb211-20"><a href="#cb211-20" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;test_cv_f1&#39;</span>: test_cv_f1_lgb,</span>
<span id="cb211-21"><a href="#cb211-21" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;test_cv_roc_auc&#39;</span>:roc_auc_test_lightgbm,</span>
<span id="cb211-22"><a href="#cb211-22" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;test_cv_specificity&#39;</span>:specificity_lightgbm_test,</span>
<span id="cb211-23"><a href="#cb211-23" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;test_cv_sensitivity&#39;</span>:sensitivity_lightgbm_test</span>
<span id="cb211-24"><a href="#cb211-24" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb211-25"><a href="#cb211-25" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb211-26"><a href="#cb211-26" aria-hidden="true" tabindex="-1"></a><span class="co"># Save the results to a pickle file</span></span>
<span id="cb211-27"><a href="#cb211-27" aria-hidden="true" tabindex="-1"></a><span class="cf">with</span> <span class="bu">open</span>(<span class="st">&#39;lgb_results_phase2.pkl&#39;</span>, <span class="st">&#39;wb&#39;</span>) <span class="im">as</span> f:</span>
<span id="cb211-28"><a href="#cb211-28" aria-hidden="true" tabindex="-1"></a>    pickle.dump(lgb_results, f)</span>
<span id="cb211-29"><a href="#cb211-29" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb211-30"><a href="#cb211-30" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;LightGBM results saved to &#39;lgb_results_SA.pkl&#39;.&quot;</span>)</span>
<span id="cb211-31"><a href="#cb211-31" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb211-32"><a href="#cb211-32" aria-hidden="true" tabindex="-1"></a><span class="co"># Later, when you want to load the results</span></span>
<span id="cb211-33"><a href="#cb211-33" aria-hidden="true" tabindex="-1"></a><span class="co"># Load the results from the pickle file</span></span>
<span id="cb211-34"><a href="#cb211-34" aria-hidden="true" tabindex="-1"></a><span class="cf">with</span> <span class="bu">open</span>(<span class="st">&#39;lgb_results_phase2.pkl&#39;</span>, <span class="st">&#39;rb&#39;</span>) <span class="im">as</span> f:</span>
<span id="cb211-35"><a href="#cb211-35" aria-hidden="true" tabindex="-1"></a>    loaded_results <span class="op">=</span> pickle.load(f)</span>
<span id="cb211-36"><a href="#cb211-36" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb211-37"><a href="#cb211-37" aria-hidden="true" tabindex="-1"></a><span class="co"># Access the loaded results</span></span>
<span id="cb211-38"><a href="#cb211-38" aria-hidden="true" tabindex="-1"></a>best_params <span class="op">=</span> loaded_results[<span class="st">&#39;best_params&#39;</span>]</span>
<span id="cb211-39"><a href="#cb211-39" aria-hidden="true" tabindex="-1"></a>train_cv_acc_lgb <span class="op">=</span> loaded_results[<span class="st">&#39;train_cv_acc&#39;</span>]</span>
<span id="cb211-40"><a href="#cb211-40" aria-hidden="true" tabindex="-1"></a>train_cv_recall_lgb <span class="op">=</span> loaded_results[<span class="st">&#39;train_cv_recall&#39;</span>]</span>
<span id="cb211-41"><a href="#cb211-41" aria-hidden="true" tabindex="-1"></a>train_cv_precision_lgb <span class="op">=</span> loaded_results[<span class="st">&#39;train_cv_precision&#39;</span>]</span>
<span id="cb211-42"><a href="#cb211-42" aria-hidden="true" tabindex="-1"></a>train_cv_f1_lgb <span class="op">=</span> loaded_results[<span class="st">&#39;train_cv_f1&#39;</span>]</span>
<span id="cb211-43"><a href="#cb211-43" aria-hidden="true" tabindex="-1"></a>train_cv_roc_auc_lgb<span class="op">=</span>loaded_results[<span class="st">&#39;train_cv_roc_auc&#39;</span>]</span>
<span id="cb211-44"><a href="#cb211-44" aria-hidden="true" tabindex="-1"></a>train_cv_specificity<span class="op">=</span>loaded_results[<span class="st">&#39;train_cv_specificity&#39;</span>]</span>
<span id="cb211-45"><a href="#cb211-45" aria-hidden="true" tabindex="-1"></a>train_cv_sensitivity<span class="op">=</span>loaded_results[<span class="st">&#39;train_cv_sensitivity&#39;</span>]</span>
<span id="cb211-46"><a href="#cb211-46" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb211-47"><a href="#cb211-47" aria-hidden="true" tabindex="-1"></a>test_cv_acc_lgb <span class="op">=</span> loaded_results[<span class="st">&#39;test_cv_acc&#39;</span>]</span>
<span id="cb211-48"><a href="#cb211-48" aria-hidden="true" tabindex="-1"></a>test_cv_recall_lgb <span class="op">=</span> loaded_results[<span class="st">&#39;test_cv_recall&#39;</span>]</span>
<span id="cb211-49"><a href="#cb211-49" aria-hidden="true" tabindex="-1"></a>test_cv_precision_lgb <span class="op">=</span> loaded_results[<span class="st">&#39;test_cv_precision&#39;</span>]</span>
<span id="cb211-50"><a href="#cb211-50" aria-hidden="true" tabindex="-1"></a>test_cv_f1_lgb <span class="op">=</span> loaded_results[<span class="st">&#39;test_cv_f1&#39;</span>]</span>
<span id="cb211-51"><a href="#cb211-51" aria-hidden="true" tabindex="-1"></a>test_cv_roc_auc_lgb<span class="op">=</span>loaded_results[<span class="st">&#39;test_cv_roc_auc&#39;</span>]</span>
<span id="cb211-52"><a href="#cb211-52" aria-hidden="true" tabindex="-1"></a>test_cv_specificity<span class="op">=</span>loaded_results[<span class="st">&#39;test_cv_specificity&#39;</span>]</span>
<span id="cb211-53"><a href="#cb211-53" aria-hidden="true" tabindex="-1"></a>test_cv_sensitivity<span class="op">=</span>loaded_results[<span class="st">&#39;test_cv_sensitivity&#39;</span>]</span>
<span id="cb211-54"><a href="#cb211-54" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb211-55"><a href="#cb211-55" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb211-56"><a href="#cb211-56" aria-hidden="true" tabindex="-1"></a><span class="co"># Now you can directly access and use the loaded results without re-running the code</span></span></code></pre></div>
<div class="output stream stdout">
<pre><code>LightGBM results saved to &#39;lgb_results_SA.pkl&#39;.
</code></pre>
</div>
</div>
<section id="reading-results-of-lightgbm-from-the-picklefile"
class="cell markdown" id="B8P9ABEd4YCo">
<h1><strong>Reading Results of LightGBM from the
PickleFile.</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="jSx9tlxDpKXk" data-outputId="802b8635-aaa7-44ca-a56a-a17152d05429">
<div class="sourceCode" id="cb213"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb213-1"><a href="#cb213-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> pickle</span>
<span id="cb213-2"><a href="#cb213-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb213-3"><a href="#cb213-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Load the results from the pickle file</span></span>
<span id="cb213-4"><a href="#cb213-4" aria-hidden="true" tabindex="-1"></a><span class="cf">with</span> <span class="bu">open</span>(<span class="st">&#39;lgb_results_phase2.pkl&#39;</span>, <span class="st">&#39;rb&#39;</span>) <span class="im">as</span> f:</span>
<span id="cb213-5"><a href="#cb213-5" aria-hidden="true" tabindex="-1"></a>    loaded_results <span class="op">=</span> pickle.load(f)</span>
<span id="cb213-6"><a href="#cb213-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb213-7"><a href="#cb213-7" aria-hidden="true" tabindex="-1"></a>best_params <span class="op">=</span> loaded_results[<span class="st">&#39;best_params&#39;</span>]</span>
<span id="cb213-8"><a href="#cb213-8" aria-hidden="true" tabindex="-1"></a>train_cv_acc_lgb <span class="op">=</span> loaded_results[<span class="st">&#39;train_cv_acc&#39;</span>]</span>
<span id="cb213-9"><a href="#cb213-9" aria-hidden="true" tabindex="-1"></a>train_cv_recall_lgb <span class="op">=</span> loaded_results[<span class="st">&#39;train_cv_recall&#39;</span>]</span>
<span id="cb213-10"><a href="#cb213-10" aria-hidden="true" tabindex="-1"></a>train_cv_precision_lgb <span class="op">=</span> loaded_results[<span class="st">&#39;train_cv_precision&#39;</span>]</span>
<span id="cb213-11"><a href="#cb213-11" aria-hidden="true" tabindex="-1"></a>train_cv_f1_lgb <span class="op">=</span> loaded_results[<span class="st">&#39;train_cv_f1&#39;</span>]</span>
<span id="cb213-12"><a href="#cb213-12" aria-hidden="true" tabindex="-1"></a>train_cv_roc_auc_lgb<span class="op">=</span>loaded_results[<span class="st">&#39;train_cv_roc_auc&#39;</span>]</span>
<span id="cb213-13"><a href="#cb213-13" aria-hidden="true" tabindex="-1"></a>train_cv_specificity<span class="op">=</span>loaded_results[<span class="st">&#39;train_cv_specificity&#39;</span>]</span>
<span id="cb213-14"><a href="#cb213-14" aria-hidden="true" tabindex="-1"></a>train_cv_sensitivity<span class="op">=</span>loaded_results[<span class="st">&#39;train_cv_sensitivity&#39;</span>]</span>
<span id="cb213-15"><a href="#cb213-15" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb213-16"><a href="#cb213-16" aria-hidden="true" tabindex="-1"></a>test_cv_acc_lgb <span class="op">=</span> loaded_results[<span class="st">&#39;test_cv_acc&#39;</span>]</span>
<span id="cb213-17"><a href="#cb213-17" aria-hidden="true" tabindex="-1"></a>test_cv_recall_lgb <span class="op">=</span> loaded_results[<span class="st">&#39;test_cv_recall&#39;</span>]</span>
<span id="cb213-18"><a href="#cb213-18" aria-hidden="true" tabindex="-1"></a>test_cv_precision_lgb <span class="op">=</span> loaded_results[<span class="st">&#39;test_cv_precision&#39;</span>]</span>
<span id="cb213-19"><a href="#cb213-19" aria-hidden="true" tabindex="-1"></a>test_cv_f1_lgb <span class="op">=</span> loaded_results[<span class="st">&#39;test_cv_f1&#39;</span>]</span>
<span id="cb213-20"><a href="#cb213-20" aria-hidden="true" tabindex="-1"></a>test_cv_roc_auc_lgb<span class="op">=</span>loaded_results[<span class="st">&#39;test_cv_roc_auc&#39;</span>]</span>
<span id="cb213-21"><a href="#cb213-21" aria-hidden="true" tabindex="-1"></a>test_cv_specificity<span class="op">=</span>loaded_results[<span class="st">&#39;test_cv_specificity&#39;</span>]</span>
<span id="cb213-22"><a href="#cb213-22" aria-hidden="true" tabindex="-1"></a>test_cv_sensitivity<span class="op">=</span>loaded_results[<span class="st">&#39;test_cv_sensitivity&#39;</span>]</span>
<span id="cb213-23"><a href="#cb213-23" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb213-24"><a href="#cb213-24" aria-hidden="true" tabindex="-1"></a><span class="co"># Now you can use the loaded results as needed</span></span>
<span id="cb213-25"><a href="#cb213-25" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Best parameters:&quot;</span>, best_params)</span>
<span id="cb213-26"><a href="#cb213-26" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Training Set Metrics:&quot;</span>)</span>
<span id="cb213-27"><a href="#cb213-27" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Accuracy:&quot;</span>, train_cv_acc_lgb)</span>
<span id="cb213-28"><a href="#cb213-28" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Recall:&quot;</span>, train_cv_recall_lgb)</span>
<span id="cb213-29"><a href="#cb213-29" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Precision:&quot;</span>, train_cv_precision_lgb)</span>
<span id="cb213-30"><a href="#cb213-30" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;F1 Score:&quot;</span>, train_cv_f1_lgb)</span>
<span id="cb213-31"><a href="#cb213-31" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;ROC AUC:&quot;</span>, train_cv_roc_auc_lgb)</span>
<span id="cb213-32"><a href="#cb213-32" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Specificity:&quot;</span>, train_cv_specificity)</span>
<span id="cb213-33"><a href="#cb213-33" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Sensitivity:&quot;</span>, train_cv_sensitivity)</span>
<span id="cb213-34"><a href="#cb213-34" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb213-35"><a href="#cb213-35" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb213-36"><a href="#cb213-36" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb213-37"><a href="#cb213-37" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;</span><span class="ch">\n\n</span><span class="st">Testing Set Metrics:&quot;</span>)</span>
<span id="cb213-38"><a href="#cb213-38" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Accuracy:&quot;</span>, test_cv_acc_lgb)</span>
<span id="cb213-39"><a href="#cb213-39" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Recall:&quot;</span>, test_cv_recall_lgb)</span>
<span id="cb213-40"><a href="#cb213-40" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Precision:&quot;</span>, test_cv_precision_lgb)</span>
<span id="cb213-41"><a href="#cb213-41" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;F1 Score:&quot;</span>, test_cv_f1_lgb)</span>
<span id="cb213-42"><a href="#cb213-42" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;ROC AUC:&quot;</span>, test_cv_roc_auc_lgb)</span>
<span id="cb213-43"><a href="#cb213-43" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Specificity:&quot;</span>, test_cv_specificity)</span>
<span id="cb213-44"><a href="#cb213-44" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Sensitivity:&quot;</span>, test_cv_sensitivity)</span></code></pre></div>
<div class="output stream stdout">
<pre><code>Best parameters: {&#39;reg_lambda&#39;: 0.1, &#39;reg_alpha&#39;: 0.5, &#39;num_leaves&#39;: 20, &#39;n_estimators&#39;: 200, &#39;max_depth&#39;: -1, &#39;learning_rate&#39;: 0.01}
Training Set Metrics:
Accuracy: 0.9504419563936358
Recall: 0.9554899387576553
Precision: 0.9526769163668084
F1 Score: 0.9540813540813541
ROC AUC: 0.9815364788927324
Specificity: 0.9445438282647585
Sensitivity: 0.9554899387576553


Testing Set Metrics:
Accuracy: 0.943222436073687
Recall: 0.9472200871124776
Precision: 0.9469774590163934
F1 Score: 0.9470987575252978
ROC AUC: 0.9815364788927324
Specificity: 0.9385938890536932
Sensitivity: 0.9472200871124776
</code></pre>
</div>
</div>
<div class="cell code" id="uU2SGQl1fM1R">
<div class="sourceCode" id="cb215"><pre
class="sourceCode python"><code class="sourceCode python"></code></pre></div>
</div>
<div class="cell code" id="9l3C5OUjfM4e">
<div class="sourceCode" id="cb216"><pre
class="sourceCode python"><code class="sourceCode python"></code></pre></div>
</div>
<section id="saving-the-results-of-svm-into-pickle-files"
class="cell markdown" id="SW5HWiLwpucv">
<h1><strong>Saving the results of SVM into pickle files.</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="v5RW5xXgpUhk" data-outputId="6af5c13b-81a1-4ae1-a825-82aa8e16da1a">
<div class="sourceCode" id="cb217"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb217-1"><a href="#cb217-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> pickle</span>
<span id="cb217-2"><a href="#cb217-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb217-3"><a href="#cb217-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Assuming you have already obtained the following variables:</span></span>
<span id="cb217-4"><a href="#cb217-4" aria-hidden="true" tabindex="-1"></a><span class="co"># best_params, train_cv_acc_lgb, train_cv_recall_lgb, train_cv_precision_lgb, train_cv_f1_lgb</span></span>
<span id="cb217-5"><a href="#cb217-5" aria-hidden="true" tabindex="-1"></a><span class="co"># test_cv_acc_lgb, test_cv_recall_lgb, test_cv_precision_lgb, test_cv_f1_lgb</span></span>
<span id="cb217-6"><a href="#cb217-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb217-7"><a href="#cb217-7" aria-hidden="true" tabindex="-1"></a><span class="co"># Create a dictionary to store the results</span></span>
<span id="cb217-8"><a href="#cb217-8" aria-hidden="true" tabindex="-1"></a>svm_results <span class="op">=</span> {</span>
<span id="cb217-9"><a href="#cb217-9" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;best_params_svm&#39;</span>: best_params,</span>
<span id="cb217-10"><a href="#cb217-10" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;train_cv_acc_svm&#39;</span>: train_cv_acc_svm_linear,</span>
<span id="cb217-11"><a href="#cb217-11" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;train_cv_recall_svm&#39;</span>: train_cv_recall_svm_linear,</span>
<span id="cb217-12"><a href="#cb217-12" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;train_cv_precision_svm&#39;</span>: train_cv_precision_svm_linear,</span>
<span id="cb217-13"><a href="#cb217-13" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;train_cv_f1_svm&#39;</span>: train_cv_f1_svm_linear,</span>
<span id="cb217-14"><a href="#cb217-14" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;train_cv_roc_auc_svm&#39;</span>:roc_auc_train_svm,</span>
<span id="cb217-15"><a href="#cb217-15" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;train_cv_specificity_svm&#39;</span>:specificity_svm_train,</span>
<span id="cb217-16"><a href="#cb217-16" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;train_cv_sensitivity_svm&#39;</span>:sensitivity_svm_train,</span>
<span id="cb217-17"><a href="#cb217-17" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;test_cv_acc_svm&#39;</span>: test_cv_acc_SVM,</span>
<span id="cb217-18"><a href="#cb217-18" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;test_cv_recall_svm&#39;</span>: test_cv_recall_SVM,</span>
<span id="cb217-19"><a href="#cb217-19" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;test_cv_precision_svm&#39;</span>: test_cv_precision_SVM,</span>
<span id="cb217-20"><a href="#cb217-20" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;test_cv_f1_svm&#39;</span>: test_cv_f1_SVM,</span>
<span id="cb217-21"><a href="#cb217-21" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;test_cv_roc_auc_svm&#39;</span>:roc_auc_test_svm,</span>
<span id="cb217-22"><a href="#cb217-22" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;test_cv_specificity_svm&#39;</span>:specificity_svm_test,</span>
<span id="cb217-23"><a href="#cb217-23" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;test_cv_sensitivity_svm&#39;</span>:sensitivity_svm_test</span>
<span id="cb217-24"><a href="#cb217-24" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb217-25"><a href="#cb217-25" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb217-26"><a href="#cb217-26" aria-hidden="true" tabindex="-1"></a><span class="co"># Save the results to a pickle file</span></span>
<span id="cb217-27"><a href="#cb217-27" aria-hidden="true" tabindex="-1"></a><span class="cf">with</span> <span class="bu">open</span>(<span class="st">&#39;svm_results_phase2.pkl&#39;</span>, <span class="st">&#39;wb&#39;</span>) <span class="im">as</span> f:</span>
<span id="cb217-28"><a href="#cb217-28" aria-hidden="true" tabindex="-1"></a>    pickle.dump(svm_results, f)</span>
<span id="cb217-29"><a href="#cb217-29" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb217-30"><a href="#cb217-30" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;SVM results saved to &#39;svm_results_phase1.pkl&#39;.&quot;</span>)</span>
<span id="cb217-31"><a href="#cb217-31" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb217-32"><a href="#cb217-32" aria-hidden="true" tabindex="-1"></a><span class="co"># Later, when you want to load the results</span></span>
<span id="cb217-33"><a href="#cb217-33" aria-hidden="true" tabindex="-1"></a><span class="co"># Load the results from the pickle file</span></span>
<span id="cb217-34"><a href="#cb217-34" aria-hidden="true" tabindex="-1"></a><span class="cf">with</span> <span class="bu">open</span>(<span class="st">&#39;svm_results_phase1.pkl&#39;</span>, <span class="st">&#39;rb&#39;</span>) <span class="im">as</span> f:</span>
<span id="cb217-35"><a href="#cb217-35" aria-hidden="true" tabindex="-1"></a>    loaded_results <span class="op">=</span> pickle.load(f)</span>
<span id="cb217-36"><a href="#cb217-36" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb217-37"><a href="#cb217-37" aria-hidden="true" tabindex="-1"></a><span class="co"># Access the loaded results</span></span>
<span id="cb217-38"><a href="#cb217-38" aria-hidden="true" tabindex="-1"></a>best_params <span class="op">=</span> loaded_results[<span class="st">&#39;best_params_svm&#39;</span>]</span>
<span id="cb217-39"><a href="#cb217-39" aria-hidden="true" tabindex="-1"></a>train_cv_acc_svm <span class="op">=</span> loaded_results[<span class="st">&#39;train_cv_acc_svm&#39;</span>]</span>
<span id="cb217-40"><a href="#cb217-40" aria-hidden="true" tabindex="-1"></a>train_cv_recall_svm <span class="op">=</span> loaded_results[<span class="st">&#39;train_cv_recall_svm&#39;</span>]</span>
<span id="cb217-41"><a href="#cb217-41" aria-hidden="true" tabindex="-1"></a>train_cv_precision_svm <span class="op">=</span> loaded_results[<span class="st">&#39;train_cv_precision_svm&#39;</span>]</span>
<span id="cb217-42"><a href="#cb217-42" aria-hidden="true" tabindex="-1"></a>train_cv_f1_svm <span class="op">=</span> loaded_results[<span class="st">&#39;train_cv_f1_svm&#39;</span>]</span>
<span id="cb217-43"><a href="#cb217-43" aria-hidden="true" tabindex="-1"></a>train_cv_roc_auc_svm<span class="op">=</span>loaded_results[<span class="st">&#39;train_cv_roc_auc_svm&#39;</span>]</span>
<span id="cb217-44"><a href="#cb217-44" aria-hidden="true" tabindex="-1"></a>train_cv_specificity_svm<span class="op">=</span>loaded_results[<span class="st">&#39;train_cv_specificity_svm&#39;</span>]</span>
<span id="cb217-45"><a href="#cb217-45" aria-hidden="true" tabindex="-1"></a>train_cv_sensitivity_svm<span class="op">=</span>loaded_results[<span class="st">&#39;train_cv_sensitivity_svm&#39;</span>]</span>
<span id="cb217-46"><a href="#cb217-46" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb217-47"><a href="#cb217-47" aria-hidden="true" tabindex="-1"></a>test_cv_acc_svm <span class="op">=</span> loaded_results[<span class="st">&#39;test_cv_acc_svm&#39;</span>]</span>
<span id="cb217-48"><a href="#cb217-48" aria-hidden="true" tabindex="-1"></a>test_cv_recall_svm <span class="op">=</span> loaded_results[<span class="st">&#39;test_cv_recall_svm&#39;</span>]</span>
<span id="cb217-49"><a href="#cb217-49" aria-hidden="true" tabindex="-1"></a>test_cv_precision_svm <span class="op">=</span> loaded_results[<span class="st">&#39;test_cv_precision_svm&#39;</span>]</span>
<span id="cb217-50"><a href="#cb217-50" aria-hidden="true" tabindex="-1"></a>test_cv_f1_svm <span class="op">=</span> loaded_results[<span class="st">&#39;test_cv_f1_svm&#39;</span>]</span>
<span id="cb217-51"><a href="#cb217-51" aria-hidden="true" tabindex="-1"></a>test_cv_roc_auc_svm<span class="op">=</span>loaded_results[<span class="st">&#39;test_cv_roc_auc_svm&#39;</span>]</span>
<span id="cb217-52"><a href="#cb217-52" aria-hidden="true" tabindex="-1"></a>test_cv_specificity_svm<span class="op">=</span>loaded_results[<span class="st">&#39;test_cv_specificity_svm&#39;</span>]</span>
<span id="cb217-53"><a href="#cb217-53" aria-hidden="true" tabindex="-1"></a>test_cv_sensitivity_svm<span class="op">=</span>loaded_results[<span class="st">&#39;test_cv_sensitivity_svm&#39;</span>]</span>
<span id="cb217-54"><a href="#cb217-54" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb217-55"><a href="#cb217-55" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb217-56"><a href="#cb217-56" aria-hidden="true" tabindex="-1"></a><span class="co"># Now you can directly access and use the loaded results without re-running the code</span></span></code></pre></div>
<div class="output stream stdout">
<pre><code>SVM results saved to &#39;svm_results_phase1.pkl&#39;.
</code></pre>
</div>
</div>
<section id="reading-the-results-of-svm-from-the-picklefiles"
class="cell markdown" id="K3vBD5XzpzNY">
<h1><strong>Reading the results of SVM from the
picklefiles.</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="ljYFr1KkpYpM" data-outputId="63b1ecec-1f43-40ff-e0e9-c72c830f4afa">
<div class="sourceCode" id="cb219"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb219-1"><a href="#cb219-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> pickle</span>
<span id="cb219-2"><a href="#cb219-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb219-3"><a href="#cb219-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Load the results from the pickle file</span></span>
<span id="cb219-4"><a href="#cb219-4" aria-hidden="true" tabindex="-1"></a><span class="cf">with</span> <span class="bu">open</span>(<span class="st">&#39;svm_results_phase2.pkl&#39;</span>, <span class="st">&#39;rb&#39;</span>) <span class="im">as</span> f:</span>
<span id="cb219-5"><a href="#cb219-5" aria-hidden="true" tabindex="-1"></a>    loaded_results <span class="op">=</span> pickle.load(f)</span>
<span id="cb219-6"><a href="#cb219-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb219-7"><a href="#cb219-7" aria-hidden="true" tabindex="-1"></a>best_params_svm <span class="op">=</span> loaded_results[<span class="st">&#39;best_params_svm&#39;</span>]</span>
<span id="cb219-8"><a href="#cb219-8" aria-hidden="true" tabindex="-1"></a>train_cv_acc_svm <span class="op">=</span> loaded_results[<span class="st">&#39;train_cv_acc_svm&#39;</span>]</span>
<span id="cb219-9"><a href="#cb219-9" aria-hidden="true" tabindex="-1"></a>train_cv_recall_svm <span class="op">=</span> loaded_results[<span class="st">&#39;train_cv_recall_svm&#39;</span>]</span>
<span id="cb219-10"><a href="#cb219-10" aria-hidden="true" tabindex="-1"></a>train_cv_precision_svm <span class="op">=</span> loaded_results[<span class="st">&#39;train_cv_precision_svm&#39;</span>]</span>
<span id="cb219-11"><a href="#cb219-11" aria-hidden="true" tabindex="-1"></a>train_cv_f1_svm <span class="op">=</span> loaded_results[<span class="st">&#39;train_cv_f1_svm&#39;</span>]</span>
<span id="cb219-12"><a href="#cb219-12" aria-hidden="true" tabindex="-1"></a>train_cv_roc_auc_svm<span class="op">=</span>loaded_results[<span class="st">&#39;train_cv_roc_auc_svm&#39;</span>]</span>
<span id="cb219-13"><a href="#cb219-13" aria-hidden="true" tabindex="-1"></a>train_cv_specificity_svm<span class="op">=</span>loaded_results[<span class="st">&#39;train_cv_specificity_svm&#39;</span>]</span>
<span id="cb219-14"><a href="#cb219-14" aria-hidden="true" tabindex="-1"></a>train_cv_sensitivity_svm<span class="op">=</span>loaded_results[<span class="st">&#39;train_cv_sensitivity_svm&#39;</span>]</span>
<span id="cb219-15"><a href="#cb219-15" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb219-16"><a href="#cb219-16" aria-hidden="true" tabindex="-1"></a>test_cv_acc_svm <span class="op">=</span> loaded_results[<span class="st">&#39;test_cv_acc_svm&#39;</span>]</span>
<span id="cb219-17"><a href="#cb219-17" aria-hidden="true" tabindex="-1"></a>test_cv_recall_lgb_svm <span class="op">=</span> loaded_results[<span class="st">&#39;test_cv_recall_svm&#39;</span>]</span>
<span id="cb219-18"><a href="#cb219-18" aria-hidden="true" tabindex="-1"></a>test_cv_precision_svm <span class="op">=</span> loaded_results[<span class="st">&#39;test_cv_precision_svm&#39;</span>]</span>
<span id="cb219-19"><a href="#cb219-19" aria-hidden="true" tabindex="-1"></a>test_cv_f1_svm <span class="op">=</span> loaded_results[<span class="st">&#39;test_cv_f1_svm&#39;</span>]</span>
<span id="cb219-20"><a href="#cb219-20" aria-hidden="true" tabindex="-1"></a>test_cv_roc_auc_svm<span class="op">=</span>loaded_results[<span class="st">&#39;test_cv_roc_auc_svm&#39;</span>]</span>
<span id="cb219-21"><a href="#cb219-21" aria-hidden="true" tabindex="-1"></a>test_cv_specificity_svm<span class="op">=</span>loaded_results[<span class="st">&#39;test_cv_specificity_svm&#39;</span>]</span>
<span id="cb219-22"><a href="#cb219-22" aria-hidden="true" tabindex="-1"></a>test_cv_sensitivity_svm<span class="op">=</span>loaded_results[<span class="st">&#39;test_cv_sensitivity_svm&#39;</span>]</span>
<span id="cb219-23"><a href="#cb219-23" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb219-24"><a href="#cb219-24" aria-hidden="true" tabindex="-1"></a><span class="co"># Now you can use the loaded results as needed</span></span>
<span id="cb219-25"><a href="#cb219-25" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Best parameters:&quot;</span>, best_params_svm)</span>
<span id="cb219-26"><a href="#cb219-26" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Training Set Metrics:&quot;</span>)</span>
<span id="cb219-27"><a href="#cb219-27" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Accuracy:&quot;</span>, train_cv_acc_svm)</span>
<span id="cb219-28"><a href="#cb219-28" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Recall:&quot;</span>, train_cv_recall_svm)</span>
<span id="cb219-29"><a href="#cb219-29" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Precision:&quot;</span>, train_cv_precision_svm)</span>
<span id="cb219-30"><a href="#cb219-30" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;F1 Score:&quot;</span>, train_cv_f1_svm)</span>
<span id="cb219-31"><a href="#cb219-31" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;ROC AUC:&quot;</span>, train_cv_roc_auc_svm)</span>
<span id="cb219-32"><a href="#cb219-32" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Specificity:&quot;</span>, train_cv_specificity_svm)</span>
<span id="cb219-33"><a href="#cb219-33" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Sensitivity:&quot;</span>, train_cv_sensitivity_svm)</span>
<span id="cb219-34"><a href="#cb219-34" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb219-35"><a href="#cb219-35" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb219-36"><a href="#cb219-36" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb219-37"><a href="#cb219-37" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;</span><span class="ch">\n\n</span><span class="st">Testing Set Metrics:&quot;</span>)</span>
<span id="cb219-38"><a href="#cb219-38" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Accuracy:&quot;</span>, test_cv_acc_svm)</span>
<span id="cb219-39"><a href="#cb219-39" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Recall:&quot;</span>, test_cv_recall_svm)</span>
<span id="cb219-40"><a href="#cb219-40" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Precision:&quot;</span>, test_cv_precision_svm)</span>
<span id="cb219-41"><a href="#cb219-41" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;F1 Score:&quot;</span>, test_cv_f1_svm)</span>
<span id="cb219-42"><a href="#cb219-42" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;ROC AUC:&quot;</span>, test_cv_roc_auc_svm)</span>
<span id="cb219-43"><a href="#cb219-43" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Specificity:&quot;</span>, test_cv_specificity_svm)</span>
<span id="cb219-44"><a href="#cb219-44" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Sensitivity:&quot;</span>, test_cv_sensitivity_svm)</span></code></pre></div>
<div class="output stream stdout">
<pre><code>Best parameters: {&#39;reg_lambda&#39;: 0.1, &#39;reg_alpha&#39;: 0.5, &#39;num_leaves&#39;: 20, &#39;n_estimators&#39;: 200, &#39;max_depth&#39;: -1, &#39;learning_rate&#39;: 0.01}
Training Set Metrics:
Accuracy: 0.9297583971714791
Recall: 0.9458661417322834
Precision: 0.9254226407019046
F1 Score: 0.9355327203893997
ROC AUC: 0.8549379393318233
Specificity: 0.9109378993099924
Sensitivity: 0.9458661417322834


Testing Set Metrics:
Accuracy: 0.910228210063239
Recall: 0.9323597232897771
Precision: 0.9034260178748759
F1 Score: 0.9176648594124321
ROC AUC: 0.9084818491856776
Specificity: 0.8846039750815782
Sensitivity: 0.9323597232897771
</code></pre>
</div>
</div>
<div class="cell code" id="ojiPXvrlpUkZ">
<div class="sourceCode" id="cb221"><pre
class="sourceCode python"><code class="sourceCode python"></code></pre></div>
</div>
<div class="cell code" id="I8eja25spUnZ">
<div class="sourceCode" id="cb222"><pre
class="sourceCode python"><code class="sourceCode python"></code></pre></div>
</div>
<section
id="saving-the-results-of-logistic-regression-into-pickle-files"
class="cell markdown" id="AmOhXVvSp5Si">
<h1><strong>Saving the results of Logistic Regression into Pickle
Files.</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="PE9lKuI0pUum" data-outputId="36ab2831-84bc-4f32-ba60-1878aeb9ea08">
<div class="sourceCode" id="cb223"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb223-1"><a href="#cb223-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Save the logistic regression results to a pickle file</span></span>
<span id="cb223-2"><a href="#cb223-2" aria-hidden="true" tabindex="-1"></a>logistic_results <span class="op">=</span> {</span>
<span id="cb223-3"><a href="#cb223-3" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;best_params_logistic&#39;</span>: best_params_logistic,</span>
<span id="cb223-4"><a href="#cb223-4" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;train_cv_acc_logistic&#39;</span>: train_cv_acc_logistic,</span>
<span id="cb223-5"><a href="#cb223-5" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;train_cv_recall_logistic&#39;</span>: train_cv_recall_logistic,</span>
<span id="cb223-6"><a href="#cb223-6" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;train_cv_precision_logistic&#39;</span>: train_cv_precision_logistic,</span>
<span id="cb223-7"><a href="#cb223-7" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;train_cv_f1_logistic&#39;</span>: train_cv_f1_logistic,</span>
<span id="cb223-8"><a href="#cb223-8" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;train_cv_roc_auc_logistic&#39;</span>: roc_auc,</span>
<span id="cb223-9"><a href="#cb223-9" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;train_cv_specificity_logistic&#39;</span>: specificity_logistic_train,</span>
<span id="cb223-10"><a href="#cb223-10" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;train_cv_sensitivity_logistic&#39;</span>: sensitivity_logistic_train,</span>
<span id="cb223-11"><a href="#cb223-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb223-12"><a href="#cb223-12" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;test_cv_acc_logistic&#39;</span>: test_cv_acc_logistic,</span>
<span id="cb223-13"><a href="#cb223-13" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;test_cv_recall_logistic&#39;</span>: test_cv_recall_logistic,</span>
<span id="cb223-14"><a href="#cb223-14" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;test_cv_precision_logistic&#39;</span>: test_cv_precision_logistic,</span>
<span id="cb223-15"><a href="#cb223-15" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;test_cv_f1_logistic&#39;</span>: test_cv_f1_logistic,</span>
<span id="cb223-16"><a href="#cb223-16" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;test_cv_roc_auc_logistic&#39;</span>: roc_auc_logistic_test,</span>
<span id="cb223-17"><a href="#cb223-17" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;test_cv_specificity_logistic&#39;</span>: specificity_svm_test,</span>
<span id="cb223-18"><a href="#cb223-18" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;test_cv_sensitivity_logistic&#39;</span>: sensitivity_svm_test</span>
<span id="cb223-19"><a href="#cb223-19" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb223-20"><a href="#cb223-20" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb223-21"><a href="#cb223-21" aria-hidden="true" tabindex="-1"></a><span class="co"># Save the results to a pickle file</span></span>
<span id="cb223-22"><a href="#cb223-22" aria-hidden="true" tabindex="-1"></a><span class="cf">with</span> <span class="bu">open</span>(<span class="st">&#39;logistic_results_phase2.pkl&#39;</span>, <span class="st">&#39;wb&#39;</span>) <span class="im">as</span> f:</span>
<span id="cb223-23"><a href="#cb223-23" aria-hidden="true" tabindex="-1"></a>    pickle.dump(logistic_results, f)</span>
<span id="cb223-24"><a href="#cb223-24" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb223-25"><a href="#cb223-25" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Logistic results saved to &#39;logistic_results_phase1.pkl&#39;.&quot;</span>)</span>
<span id="cb223-26"><a href="#cb223-26" aria-hidden="true" tabindex="-1"></a></span></code></pre></div>
<div class="output stream stdout">
<pre><code>Logistic results saved to &#39;logistic_results_phase1.pkl&#39;.
</code></pre>
</div>
</div>
<section
id="reading-the-results-of-logistic-regression-form-pickle-files"
class="cell markdown" id="yWvXeu1-p_dq">
<h1><strong>Reading the results of Logistic Regression form Pickle
Files.</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="l6DgyBBcpUxa" data-outputId="817cd592-26f3-4742-8b10-9ef7a236fff1">
<div class="sourceCode" id="cb225"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb225-1"><a href="#cb225-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> pickle</span>
<span id="cb225-2"><a href="#cb225-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb225-3"><a href="#cb225-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Load the results from the pickle file</span></span>
<span id="cb225-4"><a href="#cb225-4" aria-hidden="true" tabindex="-1"></a><span class="cf">with</span> <span class="bu">open</span>(<span class="st">&#39;logistic_results_phase2.pkl&#39;</span>, <span class="st">&#39;rb&#39;</span>) <span class="im">as</span> f:</span>
<span id="cb225-5"><a href="#cb225-5" aria-hidden="true" tabindex="-1"></a>    loaded_results <span class="op">=</span> pickle.load(f)</span>
<span id="cb225-6"><a href="#cb225-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb225-7"><a href="#cb225-7" aria-hidden="true" tabindex="-1"></a><span class="co"># Access the loaded results</span></span>
<span id="cb225-8"><a href="#cb225-8" aria-hidden="true" tabindex="-1"></a>best_params_logistic <span class="op">=</span> loaded_results[<span class="st">&#39;best_params_logistic&#39;</span>]</span>
<span id="cb225-9"><a href="#cb225-9" aria-hidden="true" tabindex="-1"></a>train_cv_acc_logistic <span class="op">=</span> loaded_results[<span class="st">&#39;train_cv_acc_logistic&#39;</span>]</span>
<span id="cb225-10"><a href="#cb225-10" aria-hidden="true" tabindex="-1"></a>train_cv_recall_logistic <span class="op">=</span> loaded_results[<span class="st">&#39;train_cv_recall_logistic&#39;</span>]</span>
<span id="cb225-11"><a href="#cb225-11" aria-hidden="true" tabindex="-1"></a>train_cv_precision_logistic <span class="op">=</span> loaded_results[<span class="st">&#39;train_cv_precision_logistic&#39;</span>]</span>
<span id="cb225-12"><a href="#cb225-12" aria-hidden="true" tabindex="-1"></a>train_cv_f1_logistic <span class="op">=</span> loaded_results[<span class="st">&#39;train_cv_f1_logistic&#39;</span>]</span>
<span id="cb225-13"><a href="#cb225-13" aria-hidden="true" tabindex="-1"></a>train_cv_roc_auc_logistic <span class="op">=</span> loaded_results[<span class="st">&#39;train_cv_roc_auc_logistic&#39;</span>]</span>
<span id="cb225-14"><a href="#cb225-14" aria-hidden="true" tabindex="-1"></a>train_cv_specificity_logistic <span class="op">=</span> loaded_results[<span class="st">&#39;train_cv_specificity_logistic&#39;</span>]</span>
<span id="cb225-15"><a href="#cb225-15" aria-hidden="true" tabindex="-1"></a>train_cv_sensitivity_logistic <span class="op">=</span> loaded_results[<span class="st">&#39;train_cv_sensitivity_logistic&#39;</span>]</span>
<span id="cb225-16"><a href="#cb225-16" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb225-17"><a href="#cb225-17" aria-hidden="true" tabindex="-1"></a>test_cv_acc_logistic <span class="op">=</span> loaded_results[<span class="st">&#39;test_cv_acc_logistic&#39;</span>]</span>
<span id="cb225-18"><a href="#cb225-18" aria-hidden="true" tabindex="-1"></a>test_cv_recall_logistic <span class="op">=</span> loaded_results[<span class="st">&#39;test_cv_recall_logistic&#39;</span>]</span>
<span id="cb225-19"><a href="#cb225-19" aria-hidden="true" tabindex="-1"></a>test_cv_precision_logistic <span class="op">=</span> loaded_results[<span class="st">&#39;test_cv_precision_logistic&#39;</span>]</span>
<span id="cb225-20"><a href="#cb225-20" aria-hidden="true" tabindex="-1"></a>test_cv_f1_logistic <span class="op">=</span> loaded_results[<span class="st">&#39;test_cv_f1_logistic&#39;</span>]</span>
<span id="cb225-21"><a href="#cb225-21" aria-hidden="true" tabindex="-1"></a>test_cv_roc_auc_logistic <span class="op">=</span> loaded_results[<span class="st">&#39;test_cv_roc_auc_logistic&#39;</span>]</span>
<span id="cb225-22"><a href="#cb225-22" aria-hidden="true" tabindex="-1"></a>test_cv_specificity_logistic <span class="op">=</span> loaded_results[<span class="st">&#39;test_cv_specificity_logistic&#39;</span>]</span>
<span id="cb225-23"><a href="#cb225-23" aria-hidden="true" tabindex="-1"></a>test_cv_sensitivity_logistic <span class="op">=</span> loaded_results[<span class="st">&#39;test_cv_sensitivity_logistic&#39;</span>]</span>
<span id="cb225-24"><a href="#cb225-24" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb225-25"><a href="#cb225-25" aria-hidden="true" tabindex="-1"></a><span class="co"># Now you can use the loaded results as needed</span></span>
<span id="cb225-26"><a href="#cb225-26" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Best parameters:&quot;</span>, best_params_logistic)</span>
<span id="cb225-27"><a href="#cb225-27" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Training Set Metrics:&quot;</span>)</span>
<span id="cb225-28"><a href="#cb225-28" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Accuracy:&quot;</span>, train_cv_acc_logistic)</span>
<span id="cb225-29"><a href="#cb225-29" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Recall:&quot;</span>, train_cv_recall_logistic)</span>
<span id="cb225-30"><a href="#cb225-30" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Precision:&quot;</span>, train_cv_precision_logistic)</span>
<span id="cb225-31"><a href="#cb225-31" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;F1 Score:&quot;</span>, train_cv_f1_logistic)</span>
<span id="cb225-32"><a href="#cb225-32" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;ROC AUC:&quot;</span>, train_cv_roc_auc_logistic)</span>
<span id="cb225-33"><a href="#cb225-33" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Specificity:&quot;</span>, train_cv_specificity_logistic)</span>
<span id="cb225-34"><a href="#cb225-34" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Sensitivity:&quot;</span>, train_cv_sensitivity_logistic)</span>
<span id="cb225-35"><a href="#cb225-35" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb225-36"><a href="#cb225-36" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;</span><span class="ch">\n\n</span><span class="st">Testing Set Metrics:&quot;</span>)</span>
<span id="cb225-37"><a href="#cb225-37" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Accuracy:&quot;</span>, test_cv_acc_logistic)</span>
<span id="cb225-38"><a href="#cb225-38" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Recall:&quot;</span>, test_cv_recall_logistic)</span>
<span id="cb225-39"><a href="#cb225-39" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Precision:&quot;</span>, test_cv_precision_logistic)</span>
<span id="cb225-40"><a href="#cb225-40" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;F1 Score:&quot;</span>, test_cv_f1_logistic)</span>
<span id="cb225-41"><a href="#cb225-41" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;ROC AUC:&quot;</span>, test_cv_roc_auc_logistic)</span>
<span id="cb225-42"><a href="#cb225-42" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Specificity:&quot;</span>, test_cv_specificity_logistic)</span>
<span id="cb225-43"><a href="#cb225-43" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Sensitivity:&quot;</span>, test_cv_sensitivity_logistic)</span></code></pre></div>
<div class="output stream stdout">
<pre><code>Best parameters: {&#39;solver&#39;: &#39;liblinear&#39;, &#39;penalty&#39;: &#39;l1&#39;, &#39;C&#39;: 0.01}
Training Set Metrics:
Accuracy: 0.8563347083087802
Recall: 0.8729221347331584
Precision: 0.8621732555627565
F1 Score: 0.8675144006086294
ROC AUC: 0.976379707156936
Specificity: 0.8369537439304882
Sensitivity: 0.8729221347331584


Testing Set Metrics:
Accuracy: 0.8435523783337916
Recall: 0.8539584934665642
Precision: 0.8543963086388106
F1 Score: 0.8541773449513069
ROC AUC: 0.8427312491064651
Specificity: 0.8846039750815782
Sensitivity: 0.9323597232897771
</code></pre>
</div>
</div>
<div class="cell code" id="GeE1xY8-pUz7">
<div class="sourceCode" id="cb227"><pre
class="sourceCode python"><code class="sourceCode python"></code></pre></div>
</div>
<div class="cell code" id="HRRI84iMpU2v">
<div class="sourceCode" id="cb228"><pre
class="sourceCode python"><code class="sourceCode python"></code></pre></div>
</div>
<section id="saving-the-results-of-rf-into-pickle-files"
class="cell markdown" id="n1bgq8TpqFqI">
<h1><strong>Saving the results of RF into pickle files.</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="LEmxGFI4pU5p" data-outputId="48aaec70-f8b9-4e7a-db33-2687a6e5ad7a">
<div class="sourceCode" id="cb229"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb229-1"><a href="#cb229-1" aria-hidden="true" tabindex="-1"></a><span class="co"># Save the random forest results to a pickle file</span></span>
<span id="cb229-2"><a href="#cb229-2" aria-hidden="true" tabindex="-1"></a>rf_results <span class="op">=</span> {</span>
<span id="cb229-3"><a href="#cb229-3" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;best_params_rf&#39;</span>: best_params_rf,</span>
<span id="cb229-4"><a href="#cb229-4" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;train_cv_acc_rf&#39;</span>: train_cv_acc_rf,</span>
<span id="cb229-5"><a href="#cb229-5" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;train_cv_recall_rf&#39;</span>: train_cv_recall_rf,</span>
<span id="cb229-6"><a href="#cb229-6" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;train_cv_precision_rf&#39;</span>: train_cv_precision_rf,</span>
<span id="cb229-7"><a href="#cb229-7" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;train_cv_f1_rf&#39;</span>: train_cv_f1_rf,</span>
<span id="cb229-8"><a href="#cb229-8" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;train_cv_roc_auc_rf&#39;</span>: roc_auc_train_rf,</span>
<span id="cb229-9"><a href="#cb229-9" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;train_cv_specificity_rf&#39;</span>: specificity_rf_train,</span>
<span id="cb229-10"><a href="#cb229-10" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;train_cv_sensitivity_rf&#39;</span>: sensitivity_rf_train,</span>
<span id="cb229-11"><a href="#cb229-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb229-12"><a href="#cb229-12" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;test_cv_acc_rf&#39;</span>: test_cv_acc_rf,</span>
<span id="cb229-13"><a href="#cb229-13" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;test_cv_recall_rf&#39;</span>: test_cv_recall_rf,</span>
<span id="cb229-14"><a href="#cb229-14" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;test_cv_precision_rf&#39;</span>: test_cv_precision_rf,</span>
<span id="cb229-15"><a href="#cb229-15" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;test_cv_f1_rf&#39;</span>: test_cv_f1_rf,</span>
<span id="cb229-16"><a href="#cb229-16" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;test_cv_roc_auc_rf&#39;</span>: roc_auc_test_rf,</span>
<span id="cb229-17"><a href="#cb229-17" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;test_cv_specificity_rf&#39;</span>: specificity_rf_test,</span>
<span id="cb229-18"><a href="#cb229-18" aria-hidden="true" tabindex="-1"></a>    <span class="st">&#39;test_cv_sensitivity_rf&#39;</span>: sensitivity_rf_test</span>
<span id="cb229-19"><a href="#cb229-19" aria-hidden="true" tabindex="-1"></a>}</span>
<span id="cb229-20"><a href="#cb229-20" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb229-21"><a href="#cb229-21" aria-hidden="true" tabindex="-1"></a><span class="co"># Save the results to a pickle file</span></span>
<span id="cb229-22"><a href="#cb229-22" aria-hidden="true" tabindex="-1"></a><span class="cf">with</span> <span class="bu">open</span>(<span class="st">&#39;rf_results_phase2.pkl&#39;</span>, <span class="st">&#39;wb&#39;</span>) <span class="im">as</span> f:</span>
<span id="cb229-23"><a href="#cb229-23" aria-hidden="true" tabindex="-1"></a>    pickle.dump(rf_results, f)</span>
<span id="cb229-24"><a href="#cb229-24" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb229-25"><a href="#cb229-25" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Random Forest results saved to &#39;rf_results_phase1.pkl&#39;.&quot;</span>)</span></code></pre></div>
<div class="output stream stdout">
<pre><code>Random Forest results saved to &#39;rf_results_phase1.pkl&#39;.
</code></pre>
</div>
</div>
<section id="reading-the-results-of-rf-from-pickle-files"
class="cell markdown" id="V0LLnrP-qMhp">
<h1><strong>Reading the results of RF from pickle files.</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="UqI9_Ol7pk0j" data-outputId="c7d3966b-f6b4-41de-8583-26f619336da6">
<div class="sourceCode" id="cb231"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb231-1"><a href="#cb231-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> pickle</span>
<span id="cb231-2"><a href="#cb231-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb231-3"><a href="#cb231-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Load the results from the pickle file</span></span>
<span id="cb231-4"><a href="#cb231-4" aria-hidden="true" tabindex="-1"></a><span class="cf">with</span> <span class="bu">open</span>(<span class="st">&#39;rf_results_phase2.pkl&#39;</span>, <span class="st">&#39;rb&#39;</span>) <span class="im">as</span> f:</span>
<span id="cb231-5"><a href="#cb231-5" aria-hidden="true" tabindex="-1"></a>    loaded_results <span class="op">=</span> pickle.load(f)</span>
<span id="cb231-6"><a href="#cb231-6" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb231-7"><a href="#cb231-7" aria-hidden="true" tabindex="-1"></a><span class="co"># Access the loaded results</span></span>
<span id="cb231-8"><a href="#cb231-8" aria-hidden="true" tabindex="-1"></a>best_params_rf <span class="op">=</span> loaded_results[<span class="st">&#39;best_params_rf&#39;</span>]</span>
<span id="cb231-9"><a href="#cb231-9" aria-hidden="true" tabindex="-1"></a>train_cv_acc_rf <span class="op">=</span> loaded_results[<span class="st">&#39;train_cv_acc_rf&#39;</span>]</span>
<span id="cb231-10"><a href="#cb231-10" aria-hidden="true" tabindex="-1"></a>train_cv_recall_rf <span class="op">=</span> loaded_results[<span class="st">&#39;train_cv_recall_rf&#39;</span>]</span>
<span id="cb231-11"><a href="#cb231-11" aria-hidden="true" tabindex="-1"></a>train_cv_precision_rf <span class="op">=</span> loaded_results[<span class="st">&#39;train_cv_precision_rf&#39;</span>]</span>
<span id="cb231-12"><a href="#cb231-12" aria-hidden="true" tabindex="-1"></a>train_cv_f1_rf <span class="op">=</span> loaded_results[<span class="st">&#39;train_cv_f1_rf&#39;</span>]</span>
<span id="cb231-13"><a href="#cb231-13" aria-hidden="true" tabindex="-1"></a>train_cv_roc_auc_rf <span class="op">=</span> loaded_results[<span class="st">&#39;train_cv_roc_auc_rf&#39;</span>]</span>
<span id="cb231-14"><a href="#cb231-14" aria-hidden="true" tabindex="-1"></a>train_cv_specificity_rf <span class="op">=</span> loaded_results[<span class="st">&#39;train_cv_specificity_rf&#39;</span>]</span>
<span id="cb231-15"><a href="#cb231-15" aria-hidden="true" tabindex="-1"></a>train_cv_sensitivity_rf <span class="op">=</span> loaded_results[<span class="st">&#39;train_cv_sensitivity_rf&#39;</span>]</span>
<span id="cb231-16"><a href="#cb231-16" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb231-17"><a href="#cb231-17" aria-hidden="true" tabindex="-1"></a>test_cv_acc_rf <span class="op">=</span> loaded_results[<span class="st">&#39;test_cv_acc_rf&#39;</span>]</span>
<span id="cb231-18"><a href="#cb231-18" aria-hidden="true" tabindex="-1"></a>test_cv_recall_rf <span class="op">=</span> loaded_results[<span class="st">&#39;test_cv_recall_rf&#39;</span>]</span>
<span id="cb231-19"><a href="#cb231-19" aria-hidden="true" tabindex="-1"></a>test_cv_precision_rf <span class="op">=</span> loaded_results[<span class="st">&#39;test_cv_precision_rf&#39;</span>]</span>
<span id="cb231-20"><a href="#cb231-20" aria-hidden="true" tabindex="-1"></a>test_cv_f1_rf <span class="op">=</span> loaded_results[<span class="st">&#39;test_cv_f1_rf&#39;</span>]</span>
<span id="cb231-21"><a href="#cb231-21" aria-hidden="true" tabindex="-1"></a>test_cv_roc_auc_rf <span class="op">=</span> loaded_results[<span class="st">&#39;test_cv_roc_auc_rf&#39;</span>]</span>
<span id="cb231-22"><a href="#cb231-22" aria-hidden="true" tabindex="-1"></a>test_cv_specificity_rf <span class="op">=</span> loaded_results[<span class="st">&#39;test_cv_specificity_rf&#39;</span>]</span>
<span id="cb231-23"><a href="#cb231-23" aria-hidden="true" tabindex="-1"></a>test_cv_sensitivity_rf <span class="op">=</span> loaded_results[<span class="st">&#39;test_cv_sensitivity_rf&#39;</span>]</span>
<span id="cb231-24"><a href="#cb231-24" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb231-25"><a href="#cb231-25" aria-hidden="true" tabindex="-1"></a><span class="co"># Now you can use the loaded results as needed</span></span>
<span id="cb231-26"><a href="#cb231-26" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Best parameters:&quot;</span>, best_params_rf)</span>
<span id="cb231-27"><a href="#cb231-27" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Training Set Metrics:&quot;</span>)</span>
<span id="cb231-28"><a href="#cb231-28" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Accuracy:&quot;</span>, train_cv_acc_rf)</span>
<span id="cb231-29"><a href="#cb231-29" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Recall:&quot;</span>, train_cv_recall_rf)</span>
<span id="cb231-30"><a href="#cb231-30" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Precision:&quot;</span>, train_cv_precision_rf)</span>
<span id="cb231-31"><a href="#cb231-31" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;F1 Score:&quot;</span>, train_cv_f1_rf)</span>
<span id="cb231-32"><a href="#cb231-32" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;ROC AUC:&quot;</span>, train_cv_roc_auc_rf)</span>
<span id="cb231-33"><a href="#cb231-33" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Specificity:&quot;</span>, train_cv_specificity_rf)</span>
<span id="cb231-34"><a href="#cb231-34" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Sensitivity:&quot;</span>, train_cv_sensitivity_rf)</span>
<span id="cb231-35"><a href="#cb231-35" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb231-36"><a href="#cb231-36" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;</span><span class="ch">\n\n</span><span class="st">Testing Set Metrics:&quot;</span>)</span>
<span id="cb231-37"><a href="#cb231-37" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Accuracy:&quot;</span>, test_cv_acc_rf)</span>
<span id="cb231-38"><a href="#cb231-38" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Recall:&quot;</span>, test_cv_recall_rf)</span>
<span id="cb231-39"><a href="#cb231-39" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Precision:&quot;</span>, test_cv_precision_rf)</span>
<span id="cb231-40"><a href="#cb231-40" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;F1 Score:&quot;</span>, test_cv_f1_rf)</span>
<span id="cb231-41"><a href="#cb231-41" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;ROC AUC:&quot;</span>, test_cv_roc_auc_rf)</span>
<span id="cb231-42"><a href="#cb231-42" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Specificity:&quot;</span>, test_cv_specificity_rf)</span>
<span id="cb231-43"><a href="#cb231-43" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">&quot;Sensitivity:&quot;</span>, test_cv_sensitivity_rf)</span></code></pre></div>
<div class="output stream stdout">
<pre><code>Best parameters: {&#39;n_estimators&#39;: 100, &#39;min_samples_split&#39;: 2, &#39;min_samples_leaf&#39;: 2, &#39;max_depth&#39;: 30}
Training Set Metrics:
Accuracy: 0.9829110194460813
Recall: 0.9992344706911636
Precision: 0.9699575371549893
F1 Score: 0.984378366731308
ROC AUC: 0.9815364788927324
Specificity: 0.9638384870943011
Sensitivity: 0.9992344706911636


Testing Set Metrics:
Accuracy: 0.9407478691229035
Recall: 0.9451703817576224
Precision: 0.9444444444444444
F1 Score: 0.9448072736585991
ROC AUC: 0.9403988960108196
Specificity: 0.9356274102640166
Sensitivity: 0.9451703817576224
</code></pre>
</div>
</div>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="6arDW2cpfM9r" data-outputId="a5f69d67-4e1a-45e5-fc35-7e219273ae1d">
<div class="sourceCode" id="cb233"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb233-1"><a href="#cb233-1" aria-hidden="true" tabindex="-1"></a><span class="op">!</span>pip install tabulate</span></code></pre></div>
<div class="output stream stdout">
<pre><code>Requirement already satisfied: tabulate in /usr/local/lib/python3.10/dist-packages (0.9.0)
</code></pre>
</div>
</div>
<section id="time-consumption-and-memory-occupancy-of-the-models"
class="cell markdown" id="jw7DzdC9eTEA">
<h1><strong>Time Consumption and Memory Occupancy of the
Models</strong></h1>
</section>
<div class="cell code"
data-colab="{&quot;base_uri&quot;:&quot;https://localhost:8080/&quot;}"
id="Ze_HFHK8fRzg" data-outputId="7486de97-51dd-4006-c2a5-718f08ad5f30">
<div class="sourceCode" id="cb235"><pre
class="sourceCode python"><code class="sourceCode python"><span id="cb235-1"><a href="#cb235-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> tabulate <span class="im">import</span> tabulate</span>
<span id="cb235-2"><a href="#cb235-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb235-3"><a href="#cb235-3" aria-hidden="true" tabindex="-1"></a><span class="co"># Organize results into a list of lists</span></span>
<span id="cb235-4"><a href="#cb235-4" aria-hidden="true" tabindex="-1"></a>results <span class="op">=</span> [</span>
<span id="cb235-5"><a href="#cb235-5" aria-hidden="true" tabindex="-1"></a>    [<span class="st">&quot;LightGBM&quot;</span>, execution_time_lightgbm, memory_used_lightgbm],</span>
<span id="cb235-6"><a href="#cb235-6" aria-hidden="true" tabindex="-1"></a>    [<span class="st">&quot;Logistic Regression&quot;</span>, execution_time_logistic, memory_used_logistic],</span>
<span id="cb235-7"><a href="#cb235-7" aria-hidden="true" tabindex="-1"></a>    [<span class="st">&quot;Support Vector Machine&quot;</span>, execution_time_svm, memory_used_svm],</span>
<span id="cb235-8"><a href="#cb235-8" aria-hidden="true" tabindex="-1"></a>    [<span class="st">&quot;Random Forest&quot;</span>, execution_time_rf, memory_used_rf],</span>
<span id="cb235-9"><a href="#cb235-9" aria-hidden="true" tabindex="-1"></a>    <span class="co"># [&quot;Naive Bayes&quot;, execution_time_nb, memory_used_nb]</span></span>
<span id="cb235-10"><a href="#cb235-10" aria-hidden="true" tabindex="-1"></a>]</span>
<span id="cb235-11"><a href="#cb235-11" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb235-12"><a href="#cb235-12" aria-hidden="true" tabindex="-1"></a><span class="co"># Print results in a table</span></span>
<span id="cb235-13"><a href="#cb235-13" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(tabulate(results, headers<span class="op">=</span>[<span class="st">&quot;Model&quot;</span>, <span class="st">&quot;Execution Time (seconds)&quot;</span>, <span class="st">&quot;Memory Used (MiB)&quot;</span>], tablefmt<span class="op">=</span><span class="st">&quot;grid&quot;</span>))</span></code></pre></div>
<div class="output stream stdout">
<pre><code>+------------------------+----------------------------+---------------------+
| Model                  |   Execution Time (seconds) |   Memory Used (MiB) |
+========================+============================+=====================+
| LightGBM               |                  253.991   |             1658.44 |
+------------------------+----------------------------+---------------------+
| Logistic Regression    |                  134.493   |             1380.18 |
+------------------------+----------------------------+---------------------+
| Support Vector Machine |                 2822.52    |             1628.64 |
+------------------------+----------------------------+---------------------+
| Random Forest          |                 9481.62    |             1664.7  |
+------------------------+----------------------------+---------------------+
| Naive Bayes            |                    4.73384 |             1665.21 |
+------------------------+----------------------------+---------------------+
</code></pre>
</div>
</div>
<section id="brief-conclusion" class="cell markdown" id="Za9JveI_jB4W">
<h1><strong>Brief Conclusion:</strong></h1>
</section>
<div class="cell markdown" id="NUoSMm9ziBYL">
<p><strong>On a overall basis, the sentiment analysis has seen to be
helpful with the customer churn prediction for models like Logistic
Regression and LightGBM. On the other hand, it did not create much
impact with the use of Support Vector Machine and Random Forest. Though
the variation of improvement and decrement of the models' performance,
the inclusion of the Sentiment Analysis provides some clear insights for
the organization based on the model selection, its interpretability, and
the model complexity.</strong></p>
<p><strong>Furthermore, The AUC is one of the important metrics to
understand whether model is differentiating well in between both the
binary classes. All the models are performing well with respect to this.
Also, it can be concluded that the newly added model "naive bayes" is
performing so bad than all the other previous models. Consequently, in
terms of better customer churn prediction, naive bayes is not as good as
the other models.</strong></p>
</div>
</body>
</html>