Improve k8s infra chart components description and explain presets (#938)
srikanthccv authored Nov 4, 2024
1 parent 18e0697 commit 43e53fe
Showing 2 changed files with 251 additions and 6 deletions.
2 changes: 1 addition & 1 deletion constants/docsSideNav.ts
@@ -772,7 +772,7 @@ const docsSideNav = [
},
{
type: 'doc',
-route: '/docs/metrics-management/k8s-deployment-override',
+route: '/docs/metrics-management/k8s-infra-otel-config',
label: 'Configure k8s-infra otelDeployment to collect metrics from receivers',
},
],
255 changes: 250 additions & 5 deletions data/docs/metrics-management/k8s-deployment-override.mdx
@@ -1,15 +1,17 @@
---
date: 2024-10-28
-title: Configure k8s-infra otelDeployment to collect metrics from receivers
-id: k8s-deployment-override
+title: k8s-infra otelDeployment and otelAgent configuration
+id: k8s-infra-otel-config
---

The [k8s-infra chart](https://signoz.io/docs/tutorial/kubernetes-infra-metrics/) runs two kinds of otel collectors:

- otelAgent: This collector is deployed as a daemonset and collects metrics/logs from the pods on the node. One agent is deployed per node.
- otelDeployment: This collector is deployed as a deployment and collects metrics/logs of the cluster. One deployment is deployed per cluster.

-The `otelDeployment` collector is configured to collect cluster metrics and events by default. There are some use cases where you might want to collect metrics from components other than the ones that are being collected by default. For example, you might want to collect metrics from a redis server or a postgres database. This page describes how to override the default configuration of the `otelDeployment` collector to collect metrics from additional components. The override-values.yaml file contains the configurations that you can override.
+## otelDeployment
+
+The `otelDeployment` collector collects cluster metrics and events by default. You might also want metrics from components that are not covered by default, for example a redis server or a postgres database. This page describes how to override the default configuration of the `otelDeployment` collector to collect metrics from such additional components. The overrides go in the `override-values.yaml` file.

### Example 1: Configuring a redis receiver running inside a namespace "redis-ns" with the name "redis-server" and port 6379

@@ -64,6 +66,249 @@ otelDeployment:
The above examples are simple and show how to configure the `otelDeployment` collector to collect metrics from a single component. However, in a real-world scenario, you might have to configure the receiver with a username and password and other configurations.
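
As a hedged sketch of that real-world case, the fragment below adds a password to the redis receiver from Example 1. It assumes that `redis-server` is a Service reachable at the default cluster DNS name, that the collector configuration sits under `otelDeployment.config`, and that a hypothetical `REDIS_PASSWORD` environment variable is injected into the collector pod. None of these details are confirmed by this page, so check them against your chart version.

```yaml
otelDeployment:
  config:
    receivers:
      redis:
        # Assumed Service DNS name for "redis-server" in namespace "redis-ns"
        endpoint: redis-server.redis-ns.svc.cluster.local:6379
        collection_interval: 10s
        # REDIS_PASSWORD is a hypothetical variable name; supply it from a Secret
        password: ${env:REDIS_PASSWORD}
```

Reading the password from the environment keeps it out of `override-values.yaml`. The receiver still has to be listed in a metrics pipeline, as in the examples above.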

-## Why `otelDeployment`?
+## otelAgent

The `otelAgent` collector is deployed as a daemonset, so it produces duplicate metrics when the same component runs on multiple nodes and its metrics are collected on every node. The `otelDeployment` collector avoids this problem by collecting metrics once for the cluster rather than from each node.

## Essential Presets

This guide explains how to configure the OpenTelemetry Collector presets in SigNoz. These presets control what telemetry data you collect and how you collect it from your Kubernetes cluster.

### 1. OTLP Exporter
```yaml
presets:
  otlpExporter:
    enabled: true # Must be enabled for data to reach SigNoz
```
**What it does**: Sends telemetry data to the SigNoz backend. SigNoz receives no data unless this preset is enabled.

### 2. Host Metrics Collection
```yaml
presets:
  hostMetrics:
    enabled: true
    collectionInterval: 30s # Adjust based on your monitoring needs
    scrapers:
      cpu: {} # Collects CPU usage, time, and utilization metrics
      load: {} # Collects system load averages
      memory: {} # Collects memory usage metrics
      disk: # Collects disk I/O metrics
        exclude:
          devices: # Regex patterns to exclude specific devices
            - ^ram\d+$
            - ^loop\d+$
            # Add custom device exclusions here
      filesystem: # Collects filesystem metrics
        exclude_fs_types:
          fs_types:
            - tmpfs
            - squashfs
            # Add filesystem types to exclude
      network: # Collects network metrics
        exclude:
          interfaces:
            - ^veth.*$ # Excludes virtual ethernet devices
            - ^docker.*$ # Excludes docker interfaces
            # Add network interfaces to exclude
```
**Key points**:
- Adjust `collectionInterval` based on your monitoring granularity needs
- Customize device exclusions to reduce noise from irrelevant storage devices
- Configure network interface exclusions to focus on relevant network metrics
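
A possible override for the exclusion lists might look like the sketch below. The `^dm-\d+$` device pattern and the `^cali.*$` interface pattern are hypothetical additions chosen only for illustration. Helm replaces list values instead of merging them, so the sketch repeats the entries from the block above; keep whichever default entries you still need.

```yaml
presets:
  hostMetrics:
    scrapers:
      disk:
        exclude:
          devices:
            - ^ram\d+$
            - ^loop\d+$
            - ^dm-\d+$ # Hypothetical: device-mapper volumes
      network:
        exclude:
          interfaces:
            - ^veth.*$
            - ^docker.*$
            - ^cali.*$ # Hypothetical: CNI-created interfaces
```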

### 3. Container Log Collection
```yaml
presets:
  logsCollection:
    enabled: true
    startAt: beginning # Options: beginning or end
    includeFilePath: true # Adds file path as metadata
    includeFileName: false # Adds file name as metadata
    # Exclusion configuration
    blacklist:
      enabled: true
      signozLogs: true # Excludes SigNoz's own logs
      namespaces:
        - kube-system # Add namespaces to exclude
      pods:
        - hotrod # Add pod names to exclude
        - locust
      containers: [] # Add container names to exclude
    # Inclusion configuration (overrides blacklist if enabled)
    whitelist:
      enabled: false
      signozLogs: true
      namespaces: [] # Only collect logs from these namespaces
      pods: [] # Only collect logs from these pods
      containers: [] # Only collect logs from these containers
```
**Best practices**:
- Use blacklist for excluding noisy system pods
- Enable whitelist when you need to monitor specific applications only
- Consider disk usage implications when enabling file path/name inclusion

### 4. Kubernetes Metrics Collection
```yaml
presets:
  kubeletMetrics:
    enabled: true
    collectionInterval: 30s
    authType: serviceAccount # Authentication method
    endpoint: ${env:K8S_HOST_IP}:10250
    insecureSkipVerify: true # Set to false in production
    # Metrics configuration
    metrics:
      k8s.pod.cpu_limit_utilization:
        enabled: true
      k8s.pod.memory_limit_utilization:
        enabled: true
      # Enable other metrics as needed
```
**Important settings**:
- Set appropriate `collectionInterval` based on cluster size
- Configure `insecureSkipVerify: false` in production environments
- Enable specific metrics based on monitoring requirements

### 5. Kubernetes Metadata Enrichment
```yaml
presets:
  kubernetesAttributes:
    enabled: true
    passthrough: false # Set true to disable k8s API calls
    # Control which metadata to collect
    extractMetadatas:
      - k8s.namespace.name
      - k8s.deployment.name
      - k8s.pod.name
      - k8s.node.name
      # Add or remove metadata fields
    # Pod association configuration
    podAssociation:
      - sources:
          - from: resource_attribute
            name: k8s.pod.ip
```
**Configuration tips**:
- Enable only required metadata fields to optimize performance
- Use `passthrough: true` in very large clusters to reduce API load
- Configure pod association based on your networking setup

### 6. Cluster-level Metrics
```yaml
presets:
  clusterMetrics:
    enabled: true
    collectionInterval: 30s
    # Node conditions to monitor
    nodeConditionsToReport:
      - Ready
      - MemoryPressure
      - DiskPressure
      # Add conditions based on monitoring needs
    # Resource types to monitor
    allocatableTypesToReport:
      - cpu
      - memory
      # - storage # Uncomment if needed
```
**When to use**:
- Enable for cluster health monitoring
- Useful for capacity planning and resource optimization
- Essential for multi-node cluster monitoring

### 7. Kubernetes Events
```yaml
presets:
  k8sEvents:
    enabled: true
    authType: serviceAccount
    namespaces: [] # Empty for all namespaces, or specify list
```
**Usage scenarios**:
- Enable for debugging and audit trails
- Monitor specific namespaces by listing them
- Useful for compliance and security monitoring
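
To restrict event collection to specific namespaces, the list form of `namespaces` could be used as in this minimal sketch. The namespace names `production` and `staging` are hypothetical placeholders, not values from the chart.

```yaml
presets:
  k8sEvents:
    enabled: true
    authType: serviceAccount
    namespaces:
      - production # Hypothetical namespace name
      - staging # Hypothetical namespace name
```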

## Common Deployment Scenarios

### Production Cluster with Full Monitoring
```yaml
presets:
  otlpExporter:
    enabled: true
  hostMetrics:
    enabled: true
    collectionInterval: 30s
  logsCollection:
    enabled: true
    blacklist:
      enabled: true
      namespaces:
        - kube-system
  kubeletMetrics:
    enabled: true
    insecureSkipVerify: false
  kubernetesAttributes:
    enabled: true
  clusterMetrics:
    enabled: true
  k8sEvents:
    enabled: true
```

### Resource-Constrained Environment
```yaml
presets:
  otlpExporter:
    enabled: true
  hostMetrics:
    enabled: true
    collectionInterval: 60s
  logsCollection:
    enabled: true
    whitelist:
      enabled: true
      namespaces:
        - production
  kubeletMetrics:
    enabled: true
    collectionInterval: 60s
  kubernetesAttributes:
    enabled: true
    passthrough: true
```

### Development Environment
```yaml
presets:
  otlpExporter:
    enabled: true
  hostMetrics:
    enabled: true
    collectionInterval: 30s
  logsCollection:
    enabled: true
  kubeletMetrics:
    enabled: true
  kubernetesAttributes:
    enabled: true
```

## Resource Requirements

Recommended resource allocations based on preset combinations:

| Configuration | CPU Request | Memory Request | CPU Limit | Memory Limit |
|--------------|-------------|----------------|-----------|--------------|
| Minimal | 100m | 256Mi | 200m | 512Mi |
| Standard | 200m | 512Mi | 500m | 1Gi |
| Full | 500m | 1Gi | 1000m | 2Gi |

-The `otelAgent` collector is deployed as a daemonset, it results in duplicate metrics if the same component is running on multiple nodes. The `otelDeployment` collector solves this problem by collecting metrics from the cluster and not from the individual nodes.
+Adjust these values based on your cluster size and monitoring requirements.
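
As an illustration, the "Standard" row could be applied to the agent as sketched below. This assumes the chart exposes a conventional Kubernetes `resources` block under `otelAgent` (and similarly under `otelDeployment`); confirm the key names against the chart's values file before relying on it.

```yaml
otelAgent:
  resources:
    requests:
      cpu: 200m
      memory: 512Mi
    limits:
      cpu: 500m
      memory: 1Gi
```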
