Structurize, update and extend documentation
tnozicka committed Nov 22, 2024
1 parent 58104f2 commit 9ba7fc1
Showing 82 changed files with 8,587 additions and 7,757 deletions.
6 changes: 3 additions & 3 deletions Makefile
@@ -548,9 +548,9 @@ verify-examples:

$(call update-scylla-helm-versions,$(tmp_dir)/helm/values.cluster.yaml)
$(call update-scylla-manager-helm-versions,$(tmp_dir)/helm/values.manager.yaml)
$(call replace-scyllacluster-versions,$(tmp_dir)/eks/cluster.yaml,1)
$(call replace-scyllacluster-versions,$(tmp_dir)/generic/cluster.yaml,1)
$(call replace-scyllacluster-versions,$(tmp_dir)/gke/cluster.yaml,1)
$(call replace-scyllacluster-versions,$(tmp_dir)/eks/scyllacluster.yaml,1)
$(call replace-scyllacluster-versions,$(tmp_dir)/generic/scyllacluster.yaml,1)
$(call replace-scyllacluster-versions,$(tmp_dir)/gke/scyllacluster.yaml,1)
$(call replace-scyllacluster-versions,$(tmp_dir)/scylladb/scylla.scyllacluster.yaml,0)

$(call concat-manifests,$(sort $(wildcard ./examples/third-party/haproxy-ingress/*.yaml)),$(tmp_dir)/third-party/haproxy-ingress.yaml)
4 changes: 2 additions & 2 deletions docs/README-dev.md
@@ -14,13 +14,13 @@ Here is an example how you can start quickly using containers, similarly to how
(This assumes you are located at the repository root.)

```bash
podman run -it --pull=Always --rm -v="$( pwd )/docs:/go/$( go list -m )/docs:Z" --workdir="/go/$( go list -m )/docs" -p 5500:5500 quay.io/scylladb/scylla-operator-images:poetry-1.8 bash -euExo pipefail -O inherit_errexit -c 'poetry install && make preview'
podman run -it --pull=Always --rm -v="$( pwd )/:/go/$( go list -m )/:Z" --workdir="/go/$( go list -m )/docs" -p 5500:5500 quay.io/scylladb/scylla-operator-images:poetry-1.8 bash -euExo pipefail -O inherit_errexit -c 'poetry install && make preview'
```

Docs will be available at http://localhost:5500/

## Update dependencies

```bash
podman run -it --pull=Always --rm -v="$( pwd )/docs:/go/$( go list -m )/docs:Z" --workdir="/go/$( go list -m )/docs" quay.io/scylladb/scylla-operator-images:poetry-1.8 bash -euExo pipefail -O inherit_errexit -c 'poetry update'
podman run -it --pull=Always --rm -v="$( pwd )/:/go/$( go list -m )/:Z" --workdir="/go/$( go list -m )/docs" -p 5500:5500 quay.io/scylladb/scylla-operator-images:poetry-1.8 bash -euExo pipefail -O inherit_errexit -c 'poetry update'
```
295 changes: 149 additions & 146 deletions docs/poetry.lock

Large diffs are not rendered by default.

8 changes: 5 additions & 3 deletions docs/pyproject.toml
@@ -10,13 +10,15 @@ package-mode = false
python = "^3.10"
pygments = "^2.18.0"
sphinx-scylladb-theme = "^1.8.1"
#sphinx-substitution-extensions = "=2024.10.17"
sphinx-sitemap = "^2.6.0"
beartype = ">0.0.0"
sphinx-autobuild = "^2024.4.19"
Sphinx = "^7.3.7"
Sphinx = "^8.1.3"
sphinx-multiversion-scylla = "^0.3.1"
redirects_cli ="^0.1.3"
myst-parser = "^3.0.1"
sphinx-design = "^0.5.0"
myst-parser = "^4.0.0"
sphinx-design = "^0.6.1"

[build-system]
requires = ["poetry>=1.8.0"]
5 changes: 5 additions & 0 deletions docs/source/.internal/helm-crd-warning.md
@@ -0,0 +1,5 @@
:::{warning}
Helm doesn't support managing CustomResourceDefinition resources ([#5871](https://github.com/helm/helm/issues/5871), [#7735](https://github.com/helm/helm/issues/7735)).
Helm only creates CRDs on the first install and never updates them afterwards, yet keeping the CRDs up to date with every update is absolutely essential.
Users therefore have to update them manually on every upgrade.
:::
5 changes: 5 additions & 0 deletions docs/source/.internal/manager-license-note.md
@@ -0,0 +1,5 @@
:::{note}
ScyllaDB Manager is available for ScyllaDB Enterprise customers and ScyllaDB Open Source users.
With ScyllaDB Open Source, ScyllaDB Manager is limited to 5 nodes.
See the ScyllaDB Manager [Proprietary Software License Agreement](https://www.scylladb.com/scylla-manager-software-license-agreement/) for details.
:::
7 changes: 7 additions & 0 deletions docs/source/.internal/tuning-qos-caution.md
@@ -0,0 +1,7 @@
:::{caution}
Only Pods with the [`Guaranteed` QoS class](https://kubernetes.io/docs/concepts/workloads/pods/pod-qos/#guaranteed) are eligible to be tuned; otherwise, they would not have pinned CPUs.

Always verify that your [ScyllaCluster](/resources/scyllaclusters/basics.md) resource specification meets [all the criteria](https://kubernetes.io/docs/concepts/workloads/pods/pod-qos/#criteria).

Don't forget to specify limits for both [resources](api-scylla.scylladb.com-scyllaclusters-v1-.spec.datacenter.racks[].resources) (ScyllaDB) and [agentResources](api-scylla.scylladb.com-scyllaclusters-v1-.spec.datacenter.racks[].agentResources) (ScyllaDB Manager Agent), as both containers run in the same Pod.
:::
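For illustration, a ScyllaCluster rack that satisfies the `Guaranteed` QoS class could be sketched as follows. The names and values are hypothetical; the key point is that requests and limits are set and equal for both containers:

```yaml
# Hypothetical ScyllaCluster fragment: requests == limits for both the
# ScyllaDB container and the ScyllaDB Manager Agent container, which is
# required for the Guaranteed QoS class.
apiVersion: scylla.scylladb.com/v1
kind: ScyllaCluster
metadata:
  name: example
spec:
  datacenter:
    name: us-east-1
    racks:
    - name: a
      members: 3
      storage:
        capacity: 500Gi
      resources:
        requests:
          cpu: 4
          memory: 16Gi
        limits:
          cpu: 4
          memory: 16Gi
      agentResources:
        requests:
          cpu: 100m
          memory: 250Mi
        limits:
          cpu: 100m
          memory: 250Mi
```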
4 changes: 4 additions & 0 deletions docs/source/.internal/tuning-warning.md
@@ -0,0 +1,4 @@
:::{warning}
We recommend that you first try out the performance tuning on a pre-production instance.
Given the nature of the underlying tuning script, undoing the changes requires rebooting the Kubernetes node(s).
:::
424 changes: 424 additions & 0 deletions docs/source/architecture/components-cluster_scoped.svg
248 changes: 248 additions & 0 deletions docs/source/architecture/components-namespaced.svg
Binary file added docs/source/architecture/components.odg
Binary file added docs/source/architecture/deploy.odg
10 changes: 10 additions & 0 deletions docs/source/architecture/index.md
@@ -0,0 +1,10 @@
# Architecture

:::{toctree}
:maxdepth: 1

overview
storage/index
tuning
manager
:::
37 changes: 37 additions & 0 deletions docs/source/architecture/manager.md
@@ -0,0 +1,37 @@
# ScyllaDB Manager

{{productName}} has a basic integration with ScyllaDB Manager. At this point, there is one global ScyllaDB Manager instance that manages all [ScyllaClusters](../resources/scyllaclusters/basics.md), and a corresponding controller that automatically configures ScyllaDB Manager to monitor the ScyllaDB instances and to sync [repair](#api-scylla.scylladb.com-scyllaclusters-v1-.spec.repairs[]) and [backup](#api-scylla.scylladb.com-scyllaclusters-v1-.spec.backups[]) tasks based on the [ScyllaCluster](../api-reference/groups/scylla.scylladb.com/scyllaclusters.rst) definition. Unfortunately, the rest of the functionality is not yet implemented in the ScyllaCluster APIs; for example, restoring a cluster from a backup has to be performed by an administrator who executes into the shared ScyllaDB Manager deployment and uses `sctool` directly.

:::{caution}
Because the ScyllaDB Manager instance is shared by all users and their ScyllaClusters, only administrators should have privileges to access the `scylla-manager` namespace.
:::

ScyllaDB Manager is a global deployment that is responsible for operating all [ScyllaClusters](../api-reference/groups/scylla.scylladb.com/scyllaclusters.rst) and runs inside the `scylla-manager` namespace.
There is a corresponding controller running in {{productName}} that syncs the [ScyllaCluster](../api-reference/groups/scylla.scylladb.com/scyllaclusters.rst) metadata and its [backup](#api-scylla.scylladb.com-scyllaclusters-v1-.spec.backups[]) and [repair](#api-scylla.scylladb.com-scyllaclusters-v1-.spec.repairs[]) tasks into the manager (and vice versa), so that users don't have to access the shared instance directly. Unfortunately, at this point, other tasks, like restoring from a backup, require executing into the shared ScyllaDB Manager deployment, which effectively needs administrator privileges.

ScyllaDB Manager uses a small ScyllaCluster instance internally and thus depends on the {{productName}} deployment and the CRD it provides.

:::{include} ../.internal/manager-license-note.md
:::

## Accessing ScyllaDB Manager

For the operations that are not yet supported on ScyllaClusters, you can access the ScyllaDB Manager manually.

To find the ScyllaDB Manager ID for your cluster, run:

:::{code-block} bash
kubectl get scyllacluster/basic --template='{{ .status.managerId }}'
:::

:::{note}
Note that some of the operations use the *ScyllaDB Manager Agent*, which runs within the ScyllaCluster and has to have access to, for example, the buckets being used.
:::

## Configuring backup and repair tasks for a ScyllaCluster

[Backup](#api-scylla.scylladb.com-scyllaclusters-v1-.spec.backups[]) and [repair](#api-scylla.scylladb.com-scyllaclusters-v1-.spec.repairs[]) tasks are configured for each [ScyllaCluster](../api-reference/groups/scylla.scylladb.com/scyllaclusters.rst) in its resource definition.

## Manual restore procedure

You can find more detail on how to perform the manual restore procedure in [this dedicated page](../resources/scyllaclusters/nodeoperations/restore.md).
31 changes: 31 additions & 0 deletions docs/source/architecture/overview.md
@@ -0,0 +1,31 @@
# Overview

## Foreword

{{productName}} is a set of controllers and API extensions that need to be installed in your cluster.
The Kubernetes API is extended using [CustomResourceDefinitions (CRDs)](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) and [dynamic admission webhooks](https://kubernetes.io/docs/reference/access-authn-authz/extensible-admission-controllers/) to provide new resources ([API reference](../api-reference/index.rst)).
These resources are reconciled by controllers embedded within the {{productName}} deployment.

ScyllaDB is a stateful application and {{productName}} requires you to have a storage provisioner installed in your cluster.
To achieve the best performance, we recommend using a storage provisioner based on local NVMe disks.
You can learn more about different setups in [a dedicated storage section](./storage/overview.md).

## Components

{{productName}} deployment consists of several components that need to be installed or already present in your Kubernetes cluster.
By design, some of the components need elevated permissions, but they are only accessible to administrators.


### Cluster scoped
```{image} ./components-cluster_scoped.svg
:name: components-cluster-scoped
:align: center
:scale: 75%
```

### Namespaced
```{image} ./components-namespaced.svg
:name: components-namespaced
:align: center
:scale: 75%
```
8 changes: 8 additions & 0 deletions docs/source/architecture/storage/index.md
@@ -0,0 +1,8 @@
# Storage

:::{toctree}
:maxdepth: 1

overview
local-csi-driver
:::
9 changes: 9 additions & 0 deletions docs/source/architecture/storage/local-csi-driver.md
@@ -0,0 +1,9 @@
# Local CSI Driver

## About

The Local CSI Driver implements the [Container Storage Interface (CSI)](https://kubernetes.io/blog/2019/01/15/container-storage-interface-ga/), a specification for container orchestrators to manage the lifecycle of volumes.

It supports dynamic provisioning on local disks, so storage volumes can be created on demand through managed directories on the local disk.

You can find more details and the source code at <https://github.com/scylladb/local-csi-driver/>.
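As a sketch, a StorageClass backed by the driver might look like the following; the provisioner name and parameters are best-effort assumptions and should be checked against the driver's own documentation:

```yaml
# Hypothetical StorageClass for the Local CSI Driver.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: scylladb-local-xfs
provisioner: local.csi.scylladb.com
# Delay binding until the Pod is scheduled, so the volume is provisioned
# on the node the Pod actually lands on.
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
```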
23 changes: 23 additions & 0 deletions docs/source/architecture/storage/overview.md
@@ -0,0 +1,23 @@
# Overview

ScyllaDB works with both local and network-attached storage provisioners, but our primary focus is on local storage, which provides the best performance. We support two local provisioners: the [ScyllaDB Local CSI Driver](https://github.com/scylladb/local-csi-driver) and the [Kubernetes SIG Storage Local Persistence Volume Static Provisioner](https://github.com/kubernetes-sigs/sig-storage-local-static-provisioner).

## Setting up local disks

When a Kubernetes node with local disk(s) is created, the storage is usually uninitialized. The details depend heavily on your platform and its options, but even when nodes are provisioned with mounted disks, they usually don't have *RAID* set up, nor offer a choice of file system type. (ScyllaDB needs the storage to be formatted with `xfs`.)

Setting up RAID arrays, formatting the file system, or mounting it in a declarative manner is challenging, and that's one of the reasons we created the [NodeConfig](../../resources/nodeconfigs.md) custom resource.
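As a sketch of that declarative setup, a [NodeConfig](../../resources/nodeconfigs.md) can describe the RAID array, file system, and mount in one place. Field names below follow the NodeConfig API on a best-effort basis; the device regex, mount point, and node label are illustrative:

```yaml
# Hypothetical NodeConfig fragment: assemble a RAID0 array over local
# NVMe disks, format it with xfs, and mount it for ScyllaDB volumes.
apiVersion: scylla.scylladb.com/v1alpha1
kind: NodeConfig
metadata:
  name: scylladb-pool
spec:
  placement:
    nodeSelector:
      scylla.scylladb.com/node-type: scylla
  localDiskSetup:
    raids:
    - name: nvmes
      type: RAID0
      RAID0:
        devices:
          nameRegex: ^/dev/nvme\d+n\d+$
    filesystems:
    - device: /dev/md/nvmes
      type: xfs
    mounts:
    - device: /dev/md/nvmes
      mountPoint: /var/lib/persistent-volumes
      unsupportedOptions:
      - prjquota
```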

## Supported local provisioners

### ScyllaDB Local CSI driver

ScyllaDB Local CSI Driver supports dynamic provisioning on local disks and sharing the storage capacity.
It is based on dynamic directories and **xfs prjquota**.
It allows [PersistentVolumes](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) to be created dynamically for a corresponding [PersistentVolumeClaim](https://kubernetes.io/docs/concepts/storage/volumes/#persistentvolumeclaim) by automatically provisioning directories on the disks attached to the instances. On supported filesystems, directories have quota limitations to ensure volume size limits.

At this point, the Local CSI Driver doesn't support provisioning block devices.
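For example, a PersistentVolumeClaim against such a dynamically provisioned class could be sketched as follows; the storage class name is an assumption, and `volumeMode` is left at the filesystem default since block devices aren't supported:

```yaml
# Hypothetical PVC for a dynamically provisioned local volume.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-data
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: scylladb-local-xfs
  resources:
    requests:
      storage: 100Gi
```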

### Kubernetes SIG Storage Static Local Volume Provisioner

The local volume static provisioner is a Kubernetes SIG Storage project that can turn your disks into dedicated and isolated PersistentVolumes, but all of them have to be created manually.
34 changes: 34 additions & 0 deletions docs/source/architecture/tuning.md
@@ -0,0 +1,34 @@
# Tuning

ScyllaDB works best when it's pinned to CPUs and not interrupted.
To get the best performance and latency, we recommend setting up ScyllaDB with [CPU pinning using the static CPU policy](../installation/kubernetes/generic.md#static-cpu-policy).

One of the most common causes of context switching is network interrupts.
Packets coming to a Kubernetes node need to be processed, which requires CPU shares.

On a Kubernetes node, there are always a couple of other processes running, like the kubelet, Kubernetes provider applications, daemons, and others.
These processes require CPU shares, so we cannot dedicate the entire node's processing power to ScyllaDB; we need to leave room for the others.
We take advantage of this and pin IRQs to the CPUs that are not exclusively used by any ScyllaDB Pod.

Performance tuning is enabled by default **when you create a corresponding [NodeConfig](../resources/nodeconfigs.md) for your nodes**.

Because some of the operations it needs to perform are not multitenant or require elevated privileges, the tuning scripts are run in a dedicated system namespace called `scylla-operator-node-tuning`.
This namespace is created and entirely managed by {{productName}} and only administrators can access it.

The tuning is based around the `perftune` script that comes from [scyllaDBUtilsImage](#api-scylla.scylladb.com-scyllaoperatorconfigs-v1alpha1-.status). `perftune` executes performance optimizations like tuning the kernel, network, and disk devices, spreading IRQs across CPUs, and more. Conceptually, this runs in two parts: tuning the [Kubernetes nodes](#kubernetes-nodes) and tuning for [ScyllaDB Pods](#scylladb-pods).

:::{include} ../.internal/tuning-warning.md
:::

## Kubernetes nodes

The `perftune` script is executed on the targeted Kubernetes nodes, tuning the kernel, network, disk devices, and more.
It is executed right after the tuning is enabled using a [NodeConfig](../resources/nodeconfigs.md).
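A minimal sketch of such a NodeConfig, selecting the nodes to tune, could look like this; the node label is an assumption, so use whatever labels your nodes actually carry:

```yaml
# Hypothetical minimal NodeConfig; creating it for a set of nodes
# enables performance tuning on them.
apiVersion: scylla.scylladb.com/v1alpha1
kind: NodeConfig
metadata:
  name: cluster
spec:
  placement:
    nodeSelector:
      scylla.scylladb.com/node-type: scylla
```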

## ScyllaDB Pods

When a [ScyllaCluster](../resources/scyllaclusters/basics.md) Pod is created (and performance tuning is enabled), the Pod initializes but waits until {{productName}} runs an on-demand Job that configures the host and the ScyllaDB process accordingly (e.g., spreading IRQs across other CPUs).
Only after that does it actually start running ScyllaDB.

:::{include} ../.internal/tuning-qos-caution.md
:::
12 changes: 0 additions & 12 deletions docs/source/clients/index.rst

This file was deleted.

12 changes: 11 additions & 1 deletion docs/source/conf.py
@@ -50,8 +50,18 @@

# -- Options for myst parser

myst_enable_extensions = ["colon_fence"]
myst_enable_extensions = ["colon_fence", "attrs_inline", "substitution"]
myst_heading_anchors = 6
myst_substitutions = {
"productName": "Scylla Operator",
"repository": "scylladb/scylla-operator",
"revision": "master",
"imageRepository": "docker.io/scylladb/scylla",
"imageTag": "6.2.0",
"enterpriseImageRepository": "docker.io/scylladb/scylla-enterprise",
"enterpriseImageTag": "2024.1.12",
"agentVersion": "3.4.0",
}

# -- Options for not found extension
