Merge pull request #20 from MolecularAI/misc_fixes

Init 3.1.1 docs
MolecularAI · Jul 9, 2024 · 1fc1420 · 1fc1420
2 parents 6deb700 + e886ee3
commit 1fc1420
Showing 8 changed files with 312 additions and 219 deletions.
diff --git a/docs/sphinx-builddir/doctrees/README.doctree b/docs/sphinx-builddir/doctrees/README.doctree
diff --git a/docs/sphinx-builddir/doctrees/environment.pickle b/docs/sphinx-builddir/doctrees/environment.pickle
diff --git a/docs/sphinx-builddir/doctrees/notebooks/QSARtuna_Tutorial.doctree b/docs/sphinx-builddir/doctrees/notebooks/QSARtuna_Tutorial.doctree
diff --git a/docs/sphinx-builddir/html/README.html b/docs/sphinx-builddir/html/README.html
diff --git a/docs/sphinx-builddir/html/_sources/README.md.txt b/docs/sphinx-builddir/html/_sources/README.md.txt
@@ -135,26 +135,129 @@ and optimization is free to pair any specified descriptor with any of the algori
 
 When we have our data and our configuration, it is time to start the optimization.
 
-### Running via singulartity
 
-QSARtuna can be deployed using [Singularity](https://sylabs.io/guides/3.7/user-guide/index.html) container.
+## Run from Python/Jupyter Notebook
 
-To run commands inside the container, Singularity uses the following syntax:
+Create a conda environment with Jupyter and install QSARtuna there:
 ```shell
-singularity exec <container.sif> <command>
+module purge
+module load Miniconda3
+conda create --name my_env_with_qsartuna python=3.10.10 jupyter pip
+conda activate my_env_with_qsartuna
+module purge  # Just in case.
+which python  # Check. Should output path that contains "my_env_with_qsartuna".
+python -m pip install https://github.com/MolecularAI/QSARtuna/releases/download/3.1.1/qsartuna-3.1.1.tar.gz
+```
+
+Then you can use QSARtuna inside your Notebook:
+```python
+from qsartuna.three_step_opt_build_merge import (
+    optimize,
+    buildconfig_best,
+    build_best,
+    build_merged,
+)
+from qsartuna.config import ModelMode, OptimizationDirection
+from qsartuna.config.optconfig import (
+    OptimizationConfig,
+    SVR,
+    RandomForest,
+    Ridge,
+    Lasso,
+    PLS,
+    XGBregressor,
+)
+from qsartuna.datareader import Dataset
+from qsartuna.descriptors import ECFP, MACCS_keys, ECFP_counts
+
+##
+# Prepare hyperparameter optimization configuration.
+config = OptimizationConfig(
+    data=Dataset(
+        input_column="canonical",
+        response_column="molwt",
+        training_dataset_file="tests/data/DRD2/subset-50/train.csv",
+    ),
+    descriptors=[ECFP.new(), ECFP_counts.new(), MACCS_keys.new()],
+    algorithms=[
+        SVR.new(),
+        RandomForest.new(),
+        Ridge.new(),
+        Lasso.new(),
+        PLS.new(),
+        XGBregressor.new(),
+    ],
+    settings=OptimizationConfig.Settings(
+        mode=ModelMode.REGRESSION,
+        cross_validation=3,
+        n_trials=100,
+        direction=OptimizationDirection.MAXIMIZATION,
+    ),
+)
+
+##
+# Run Optuna Study.
+study = optimize(config, study_name="my_study")
+
+##
+# Get the best Trial from the Study and make a Build (Training) configuration for it.
+buildconfig = buildconfig_best(study)
+# Optional: write out JSON of the best configuration.
+import json
+print(json.dumps(buildconfig.json(), indent=2))
+
+##
+# Build (re-Train) and save the best model.
+build_best(buildconfig, "target/best.pkl")
+
+##
+# Build (Train) and save the model on the merged train+test data.
+build_merged(buildconfig, "target/merged.pkl")
+```
+
+## Running via CLI
+
+QSARtuna can be run directly from the CLI.
+
+To run commands, QSARtuna uses the following syntax:
+```shell
+qsartuna-<optimize|build|predict|schemagen> <command>
 ```
 
 We can run the three-step process from the command line with the following command:
 
 ```shell
-singularity exec /projects/cc/mai/containers/QSARtuna_latest.sif \
-  /opt/qsartuna/.venv/bin/qsartuna-optimize \
+  qsartuna-optimize \
   --config examples/optimization/regression_drd2_50.json \
   --best-buildconfig-outpath ~/qsartuna-target/best.json \
   --best-model-outpath ~/qsartuna-target/best.pkl \
   --merged-model-outpath ~/qsartuna-target/merged.pkl
 ```
 
+Optimization accepts the following command line arguments:
+
+```shell
+qsartuna-optimize -h 
+usage: qsartuna-optimize [-h] --config CONFIG [--best-buildconfig-outpath BEST_BUILDCONFIG_OUTPATH] [--best-model-outpath BEST_MODEL_OUTPATH] [--merged-model-outpath MERGED_MODEL_OUTPATH] [--no-cache]
+
+optbuild: Optimize hyper-parameters and build (train) the best model.
+
+options:
+  -h, --help            show this help message and exit
+  --best-buildconfig-outpath BEST_BUILDCONFIG_OUTPATH
+                        Path where to write Json of the best build configuration.
+  --best-model-outpath BEST_MODEL_OUTPATH
+                        Path where to write (persist) the best model.
+  --merged-model-outpath MERGED_MODEL_OUTPATH
+                        Path where to write (persist) the model trained on merged train+test data.
+  --no-cache            Turn off descriptor generation caching
+
+required named arguments:
+  --config CONFIG       Path to input configuration file (JSON): either Optimization configuration, or Build (training) configuration.
+
+```
+
 Since optimization can be a long process,
 we should avoid running it on the login node, 
 and we should submit it to the SLURM queue instead. 
@@ -176,13 +279,14 @@ We can submit our script to the queue by giving `sbatch` the following script:
 # This script illustrates how to run one configuration from QSARtuna examples.
 # The example we use is in examples/optimization/regression_drd2_50.json.
 
+module load Miniconda3
+conda activate my_env_with_qsartuna
+
 # The example we chose uses relative paths to data files, change directory.
-cd /{project_folder}/OptunaAZ-versions/OptunaAZ_latest
+cd /{project_folder}/
 
-singularity exec \
-  /{project_folder}/containers/QSARtuna_latest.sif \
-  /opt/qsartuna/.venv/bin/qsartuna-optimize \
-  --config{project_folder}/examples/optimization/regression_drd2_50.json \
+  /<your-project-dir>/qsartuna-optimize \
+  --config {project_folder}/examples/optimization/regression_drd2_50.json \
   --best-buildconfig-outpath ~/qsartuna-target/best.json \
   --best-model-outpath ~/qsartuna-target/best.pkl \
   --merged-model-outpath ~/qsartuna-target/merged.pkl
@@ -195,33 +299,54 @@ When the script is complete, it will create pickled model files inside your home
 
 When the model is built, run inference:
 ```shell
-singularity exec /{project_folder}/containers/QSARtuna_latest.sif \
-  /opt/qsartuna/.venv/bin/qsartuna-predict \
+  qsartuna-predict \
   --model-file target/merged.pkl \
   --input-smiles-csv-file tests/data/DRD2/subset-50/test.csv \
   --input-smiles-csv-column "canonical" \
   --output-prediction-csv-file target/prediction.csv
 ```
 
-Note that QSARtuna_latest.sif points to the most recent version of QSARtuna.
-
-Legacy models require the inference with the same QSARtuna version used to train the model.
-This can be specified by modifying the above command and supplying 
-`/projects/cc/mai/containers/QSARtuna_<version>.sif` (replace <version> with the version of QSARtuna).
-
-E.g:
+Note that prediction accepts a variety of command line arguments:
 ```shell
-singularity exec /{project_folder}/containers/QSARtuna_2.5.1.sif \
-  /opt/qsartuna/.venv/bin/qsartuna-predict \
-  --model-file 2.5.1_model.pkl \
-  --input-smiles-csv-file tests/data/DRD2/subset-50/test.csv \
-  --input-smiles-csv-column "canonical" \
-  --output-prediction-csv-file target/prediction.csv
+qsartuna-predict -h
+usage: qsartuna-predict [-h] --model-file MODEL_FILE [--input-smiles-csv-file INPUT_SMILES_CSV_FILE] [--input-smiles-csv-column INPUT_SMILES_CSV_COLUMN] [--input-aux-column INPUT_AUX_COLUMN]
+                        [--input-precomputed-file INPUT_PRECOMPUTED_FILE] [--input-precomputed-input-column INPUT_PRECOMPUTED_INPUT_COLUMN]
+                        [--input-precomputed-response-column INPUT_PRECOMPUTED_RESPONSE_COLUMN] [--output-prediction-csv-column OUTPUT_PREDICTION_CSV_COLUMN]
+                        [--output-prediction-csv-file OUTPUT_PREDICTION_CSV_FILE] [--predict-uncertainty] [--predict-explain] [--uncertainty_quantile UNCERTAINTY_QUANTILE]
+
+Predict responses for a given OptunaAZ model
+
+options:
+  -h, --help            show this help message and exit
+  --input-smiles-csv-file INPUT_SMILES_CSV_FILE
+                        Name of input CSV file with Input SMILES
+  --input-smiles-csv-column INPUT_SMILES_CSV_COLUMN
+                        Column name of SMILES column in input CSV file
+  --input-aux-column INPUT_AUX_COLUMN
+                        Column name of auxiliary descriptors in input CSV file
+  --input-precomputed-file INPUT_PRECOMPUTED_FILE
+                        Filename of precomputed descriptors input CSV file
+  --input-precomputed-input-column INPUT_PRECOMPUTED_INPUT_COLUMN
+                        Column name of precomputed descriptors identifier
+  --input-precomputed-response-column INPUT_PRECOMPUTED_RESPONSE_COLUMN
+                        Column name of precomputed descriptors response column
+  --output-prediction-csv-column OUTPUT_PREDICTION_CSV_COLUMN
+                        Column name of prediction column in output CSV file
+  --output-prediction-csv-file OUTPUT_PREDICTION_CSV_FILE
+                        Name of output CSV file
+  --predict-uncertainty
+                        Predict with uncertainties (model must provide this functionality)
+  --predict-explain     Predict with SHAP or ChemProp explainability
+  --uncertainty_quantile UNCERTAINTY_QUANTILE
+                        Apply uncertainty threshold to predictions
+
+required named arguments:
+  --model-file MODEL_FILE
+                        Model file name
 ```
 
-would generate predictions for a model trained with QSARtuna 2.5.1.
 
-### Optional: inspect
+## Optional: inspect
 To inspect performance of different models tried during optimization,
 use [MLFlow Tracking UI](https://www.mlflow.org/docs/latest/tracking.html):
 ```bash
@@ -258,86 +383,6 @@ You can get more details by clicking individual runs.
 There you can access run/trial build (training) configuration.
 
 
-## Run from Python/Jupyter Notebook
-
-Create conda environment with Jupyter and Install QSARtuna there:
-```shell
-module purge
-module load Miniconda3
-conda create --name my_env_with_qsartuna python=3.10.10 jupyter pip
-conda activate my_env_with_qsartuna
-module purge  # Just in case.
-which python  # Check. Should output path that contains "my_env_with_qsartuna".
-python -m pip install https://github.com/MolecularAI/QSARtuna/releases/download/3.1.0/qsartuna-3.1.0.tar.gz
-```
-
-Then you can use QSARtuna inside your Notebook:
-```python
-from qsartuna.three_step_opt_build_merge import (
-    optimize,
-    buildconfig_best,
-    build_best,
-    build_merged,
-)
-from qsartuna.config import ModelMode, OptimizationDirection
-from qsartuna.config.optconfig import (
-    OptimizationConfig,
-    SVR,
-    RandomForest,
-    Ridge,
-    Lasso,
-    PLS,
-    XGBregressor,
-)
-from qsartuna.datareader import Dataset
-from qsartuna.descriptors import ECFP, MACCS_keys, ECFP_counts
-
-##
-# Prepare hyperparameter optimization configuration.
-config = OptimizationConfig(
-    data=Dataset(
-        input_column="canonical",
-        response_column="molwt",
-        training_dataset_file="tests/data/DRD2/subset-50/train.csv",
-    ),
-    descriptors=[ECFP.new(), ECFP_counts.new(), MACCS_keys.new()],
-    algorithms=[
-        SVR.new(),
-        RandomForest.new(),
-        Ridge.new(),
-        Lasso.new(),
-        PLS.new(),
-        XGBregressor.new(),
-    ],
-    settings=OptimizationConfig.Settings(
-        mode=ModelMode.REGRESSION,
-        cross_validation=3,
-        n_trials=100,
-        direction=OptimizationDirection.MAXIMIZATION,
-    ),
-)
-
-##
-# Run Optuna Study.
-study = optimize(config, study_name="my_study")
-
-##
-# Get the best Trial from the Study and make a Build (Training) configuration for it.
-buildconfig = buildconfig_best(study)
-# Optional: write out JSON of the best configuration.
-import json
-print(json.dumps(buildconfig.json(), indent=2))
-
-##
-# Build (re-Train) and save the best model.
-build_best(buildconfig, "target/best.pkl")
-
-##
-# Build (Train) and save the model on the merged train+test data.
-build_merged(buildconfig, "target/merged.pkl")
-```
-
-
 ## Adding descriptors to QSARtuna
 
 
@@ -407,7 +452,7 @@ CompositeCompatibleDescriptor = Union[
 
 Then you can use YourNewDescriptor inside your Notebook:
 ```python
-from qptuna.descriptors import YourNewDescriptor
+from qsartuna.descriptors import YourNewDescriptor
 
 config = OptimizationConfig(
     data=Dataset(
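
Note on the CLI sections in this README diff: the `qsartuna-optimize` and `qsartuna-predict` invocations shown above can also be assembled from Python, which may be convenient when scripting several runs. The following is a minimal sketch, not part of QSARtuna itself. The helper names `optimize_command` and `predict_command` are hypothetical, and the only flags used are those listed in the `-h` output quoted in the diff. The sketch only builds the argument lists and assumes the `qsartuna-*` entry points are on `PATH` if you choose to execute them.

```python
import shlex
from typing import List, Optional


def optimize_command(
    config: str,
    best_buildconfig_outpath: Optional[str] = None,
    best_model_outpath: Optional[str] = None,
    merged_model_outpath: Optional[str] = None,
    no_cache: bool = False,
) -> List[str]:
    """Build an argv list for qsartuna-optimize (hypothetical helper)."""
    # --config is the only required argument according to the help text.
    cmd = ["qsartuna-optimize", "--config", config]
    optional = {
        "--best-buildconfig-outpath": best_buildconfig_outpath,
        "--best-model-outpath": best_model_outpath,
        "--merged-model-outpath": merged_model_outpath,
    }
    # Append only the output paths the caller actually supplied.
    for flag, value in optional.items():
        if value is not None:
            cmd += [flag, value]
    if no_cache:
        cmd.append("--no-cache")
    return cmd


def predict_command(
    model_file: str,
    input_smiles_csv_file: str,
    input_smiles_csv_column: str,
    output_prediction_csv_file: str,
) -> List[str]:
    """Build an argv list for qsartuna-predict (hypothetical helper)."""
    return [
        "qsartuna-predict",
        "--model-file", model_file,
        "--input-smiles-csv-file", input_smiles_csv_file,
        "--input-smiles-csv-column", input_smiles_csv_column,
        "--output-prediction-csv-file", output_prediction_csv_file,
    ]


if __name__ == "__main__":
    cmd = optimize_command(
        "examples/optimization/regression_drd2_50.json",
        best_model_outpath="target/best.pkl",
        merged_model_outpath="target/merged.pkl",
    )
    # Print a shell-quoted form for inspection; nothing is executed here.
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Passing the list to `subprocess.run` without `shell=True` avoids quoting problems with paths that contain spaces. On a cluster, the printed command can instead be pasted into the `sbatch` script shown in the diff.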

diff --git a/docs/sphinx-builddir/html/index.html b/docs/sphinx-builddir/html/index.html
@@ -114,6 +114,8 @@ <h1>Welcome to QSARtuna Documentation!<a class="headerlink" href="#welcome-to-qs
 <li class="toctree-l2"><a class="reference internal" href="README.html#background">Background</a></li>
 <li class="toctree-l2"><a class="reference internal" href="README.html#json-based-command-line-interface">JSON-based Command-line interface</a></li>
 <li class="toctree-l2"><a class="reference internal" href="README.html#run-from-python-jupyter-notebook">Run from Python/Jupyter Notebook</a></li>
+<li class="toctree-l2"><a class="reference internal" href="README.html#running-via-cli">Running via CLI</a></li>
+<li class="toctree-l2"><a class="reference internal" href="README.html#optional-inspect">Optional: inspect</a></li>
 <li class="toctree-l2"><a class="reference internal" href="README.html#adding-descriptors-to-qsartuna">Adding descriptors to QSARtuna</a></li>
 </ul>
 </li>

diff --git a/docs/sphinx-builddir/html/objects.inv b/docs/sphinx-builddir/html/objects.inv
diff --git a/docs/sphinx-builddir/html/searchindex.js b/docs/sphinx-builddir/html/searchindex.js