This document describes how to use Coil features.
Coil is a Kubernetes-native application and can be controlled with kubectl.
For installation, read setup.md.
By default, no one but the cluster administrator can edit the custom resources of Coil. Everyone can view the custom resources.
If you want to allow someone or some group to edit them, define the roles and role bindings you need.
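As a minimal sketch of such a role and binding, the following grants edit access to a group. The names coil-editor and coil-editors are hypothetical, and the plural resource names addresspools and egresses are assumed from the resource kinds described in this document.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: coil-editor                # hypothetical role name
rules:
- apiGroups: ["coil.cybozu.com"]
  # Assumed plural names of the AddressPool and Egress resources.
  resources: ["addresspools", "egresses"]
  verbs: ["create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: coil-editor
subjects:
- kind: Group
  name: coil-editors               # hypothetical group name
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: coil-editor
  apiGroup: rbac.authorization.k8s.io
```

To limit editing of namespace-scoped resources to a single namespace, a RoleBinding in that namespace can reference the same ClusterRole instead.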
AddressPool is a cluster-scoped custom resource of Coil.
Pods in a namespace are assigned their IP addresses from an address pool.
Here is an example address pool:
apiVersion: coil.cybozu.com/v2
kind: AddressPool
metadata:
  name: default
spec:
  blockSizeBits: 5
  subnets:
  - ipv4: 10.2.0.0/16
    ipv6: fd01:0203:0405:0607::/112
blockSizeBits specifies the size of an address block that is carved from this pool.
The value n is interpreted as 2^n addresses, so if the value is 5, each address block of this pool will have 32 IP addresses.
The minimum allowed value of blockSizeBits is 0, which gives blocks of a single address.
The default value of blockSizeBits is 5.
An address pool can be one of IPv4-only, IPv6-only, or dual stack type.
The type is determined by which address families are included in subnets.
For an IPv4-only pool, subnets can contain only IPv4 subnets.
For an IPv6-only pool, subnets can contain only IPv6 subnets.
For a dual stack pool, subnets must contain both IPv4 and IPv6 subnets.
IPv4 and IPv6 subnets must be the same size for dual stack pools. In the above example, both subnets are 16 bits wide.
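To make the single-family pool types concrete, here is a sketch of an IPv4-only pool and an IPv6-only pool. The pool names and subnets are hypothetical and chosen only for illustration.

```yaml
# IPv4-only pool: every entry in subnets has only an ipv4 field.
apiVersion: coil.cybozu.com/v2
kind: AddressPool
metadata:
  name: v4only              # hypothetical name
spec:
  blockSizeBits: 5          # 2^5 = 32 addresses per block
  subnets:
  # Hypothetical subnet: 2^16 addresses, enough for at most 2048 blocks of 32.
  - ipv4: 10.10.0.0/16
---
# IPv6-only pool: every entry in subnets has only an ipv6 field.
apiVersion: coil.cybozu.com/v2
kind: AddressPool
metadata:
  name: v6only              # hypothetical name
spec:
  blockSizeBits: 5
  subnets:
  - ipv6: fd02::/112        # hypothetical subnet, 16 bits wide
```

A dual stack pool differs only in that each entry pairs an ipv4 and an ipv6 subnet of equal width.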
The address pool whose name is default becomes the default pool.
The default pool is used in all namespaces that do not specify which pool to use.
You may define other address pools.
Non-default pools are used only if the namespace has the coil.cybozu.com/pool annotation.
You can use kubectl to give the annotation to a namespace.
The following example makes Pods in namespace foo be assigned addresses from pool bar.
$ kubectl annotate namespaces foo coil.cybozu.com/pool=bar
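The same association can also be kept declaratively in the Namespace manifest. This is a minimal sketch that reuses the names foo and bar from the command above and relies only on the standard metadata.annotations field.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: foo
  annotations:
    # Pods in this namespace get addresses from the pool named bar.
    coil.cybozu.com/pool: bar
```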
If a pool is running out of IP addresses, you can add more subnets.
apiVersion: coil.cybozu.com/v2
kind: AddressPool
metadata:
  name: default
spec:
  blockSizeBits: 5
  subnets:
  - ipv4: 10.2.0.0/16
    ipv6: fd01:0203:0405:0607::/112
  - ipv4: 10.3.0.0/16
    ipv6: fd01:0203:0405:0608::/112
You cannot remove or edit subnets in the existing pools.
As described, each node is assigned address blocks from address pools.
The assignment of address blocks can be checked by getting AddressBlock custom resources.
$ kubectl get addressblocks
NAME        NODE                 POOL      IPV4             IPV6
default-0   coil-worker3         default   10.224.0.0/30
default-1   coil-control-plane   default   10.224.0.4/30
default-3   coil-worker3         default   10.224.0.12/30
Address blocks are assigned and returned automatically, so you usually do not need to manage them.
Address blocks represent routes or subnets to be routed to their assigned nodes.
To help integrate Coil with router software such as BIRD, Coil exports address block information to a kernel routing table on each node.
The routing table ID is usually 119.
Run ip route show table 119 or ip -6 route show table 119 on a node to check it.
# ip route show table 119
10.224.0.0/30 dev lo proto 30
10.224.0.12/30 dev lo proto 30
Coil can run some Pods as egress NAT servers and selectively allow other Pods to become clients of those servers. This feature is called on-demand NAT for egress traffic, or Egress NAT for short.
It is common in data centers that only a specific subset of IP addresses can be routed to external networks, such as the Internet.
If a pod has such a specific IP address and can accept packets from other pods, the pod can work as a SNAT (source network address translation) server.
Coil implements this with the following setup:
1. Prepare an address pool for the specific subset of IP addresses
2. Prepare a namespace associated with the address pool
3. Create a Deployment to run SNAT pods in the namespace
4. Create a Service in the same namespace to make NAT servers redundant
5. Establish IP-over-IP tunnels between NAT clients and servers
6. Set up the routing table of client pods to route packets to SNAT pods over the tunnel
Steps 1 and 2 should be done by users.
Steps 3 and 4 can be done by creating an Egress custom resource in the namespace.
Steps 5 and 6 are done automatically by Coil.
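The two user-side steps can be sketched as follows. The pool name internet and the subnet are hypothetical; 203.0.113.0/28 is a documentation range that stands in for whatever externally routable addresses your data center provides.

```yaml
# Step 1: an address pool holding only the externally routable addresses.
apiVersion: coil.cybozu.com/v2
kind: AddressPool
metadata:
  name: internet                 # hypothetical pool name
spec:
  blockSizeBits: 0               # 2^0 = 1 address per block, to avoid wasting scarce addresses
  subnets:
  - ipv4: 203.0.113.0/28         # placeholder subnet
---
# Step 2: a namespace whose Pods take their addresses from that pool.
apiVersion: v1
kind: Namespace
metadata:
  name: internet
  annotations:
    coil.cybozu.com/pool: internet
```

Choosing blockSizeBits: 0 here is a design choice for this sketch, not a requirement.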
Egress is a namespace-scoped custom resource of Coil.
It defines an egress portal of the cluster for some destinations.
Coil creates a Deployment and Service for each Egress.
It also creates a PodDisruptionBudget when spec.podDisruptionBudget is specified.
Here is an example Egress resource for the Internet:
apiVersion: coil.cybozu.com/v2
kind: Egress
metadata:
  namespace: internet
  name: egress
spec:
  destinations:
  - 0.0.0.0/0
  replicas: 2
The next example is for a private, external network and sets many of the optional fields:
apiVersion: coil.cybozu.com/v2
kind: Egress
metadata:
  namespace: other-network
  name: egress
spec:
  destinations:
  - 172.20.0.0/16
  - fd04::/64
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 2
      maxSurge: 0
  template:
    metadata:
      annotations:
        ann1: foo
      labels:
        label1: bar
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app.kubernetes.io/name: coil
                app.kubernetes.io/component: egress
                app.kubernetes.io/instance: egress
            topologyKey: topology.kubernetes.io/zone
      containers:
      - name: egress
        resources:
          limits:
            memory: 400Mi
  sessionAffinity: ClientIP
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 43200
  podDisruptionBudget:
    maxUnavailable: 1
Only destinations is mandatory. Other fields in spec are optional.
You may customize the container of egress Pods as shown in the above example.
Field | Type | Description
---|---|---
destinations | []string | IP subnets where the packets are SNATed and sent.
replicas | int | Copied to Deployment's spec.replicas. Default is 1.
strategy | DeploymentStrategy | Copied to Deployment's spec.strategy.
template | PodTemplateSpec | Copied to Deployment's spec.template.
sessionAffinity | ClientIP or None | Copied to Service's spec.sessionAffinity. Default is ClientIP.
sessionAffinityConfig | SessionAffinityConfig | Copied to Service's spec.sessionAffinityConfig.
podDisruptionBudget | EgressPDBSpec | minAvailable and maxUnavailable are copied to PDB's spec.
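As a sketch of the podDisruptionBudget field with minAvailable instead of maxUnavailable, the following Egress asks for a budget that keeps at least two of three NAT server Pods running. The replica and budget values are illustrative assumptions.

```yaml
apiVersion: coil.cybozu.com/v2
kind: Egress
metadata:
  namespace: internet
  name: egress
spec:
  destinations:
  - 0.0.0.0/0
  replicas: 3
  podDisruptionBudget:
    # Copied to the generated PodDisruptionBudget's spec.
    minAvailable: 2
```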
In order to send packets from a Pod through Egresses, annotate the Pod like this:
apiVersion: v1
kind: Pod
metadata:
  name: nat-client
  namespace: default
  annotations:
    egress.coil.cybozu.com/internet: egress
    egress.coil.cybozu.com/other-network: egress
spec:
  # ...
As you can see, egress.coil.cybozu.com/NAMESPACE is the annotation key and the value is the Egress resource name.
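Because the annotation goes on the Pod itself, Pods created by a workload controller need it in the Pod template rather than on the controller object. The following Deployment is a minimal sketch of that placement. The Deployment name, labels, and container image are placeholders.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nat-client                 # hypothetical name
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nat-client
  template:
    metadata:
      labels:
        app: nat-client
      annotations:
        # Key is egress.coil.cybozu.com/NAMESPACE, value is the Egress name.
        egress.coil.cybozu.com/internet: egress
    spec:
      containers:
      - name: client
        image: example.com/nat-client:latest   # placeholder image
```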
To prohibit Pods from accessing Egress pods, use the standard NetworkPolicy.
To prohibit Pods in the default namespace from accessing the Egress in the internet namespace, add:
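apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: prohibit-internet
  namespace: default
spec:
  # An empty podSelector applies the policy to every Pod in the namespace.
  podSelector: {}
  policyTypes: ["Egress"]
  egress:
  - to:
    - namespaceSelector:
        # Takes effect only if the internet namespace carries the label name=internet.
        matchExpressions:
        - key: name
          operator: NotIn
          values: ["internet"]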
If you set replicas to more than 1, you should normally not set sessionAffinity to None.
This is because session affinity is required to keep stateful TCP connections on the same NAT server.
You may need to extend the timeout setting for idle connections with spec.sessionAffinityConfig as follows:
apiVersion: coil.cybozu.com/v2
kind: Egress
metadata:
  namespace: other-network
  name: egress
spec:
  # snip
  sessionAffinityConfig:
    clientIP:
      timeoutSeconds: 43200
The default timeout is 10800 seconds (= 3 hours).
Coil exposes two types of Prometheus metrics.
- Address pool metrics
  Metrics about address pools managed by Coil. For description, read cmd-coil-controller.md.
- Program metrics
  Metrics about the internals of Coil components, such as memory usage and the number of requests to the API server. They are exposed by controller-runtime.
If using Prometheus, the following scrape configuration can be used.
scrape_configs:
- job_name: "coil"
  kubernetes_sd_configs:
  - role: pod
    namespaces:
      names: ["kube-system"]
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_name]
    action: keep
    regex: coil
  - source_labels: [__address__, __meta_kubernetes_pod_label_app_kubernetes_io_component]
    action: replace
    regex: ([^:]+)(?::\d+)?;coild
    replacement: ${1}:9384
    target_label: __address__
  - source_labels: [__address__, __meta_kubernetes_pod_label_app_kubernetes_io_component]
    action: replace
    regex: ([^:]+)(?::\d+)?;coil-controller
    replacement: ${1}:9386
    target_label: __address__
  - source_labels: [__address__, __meta_kubernetes_pod_label_app_kubernetes_io_component]
    action: replace
    regex: ([^:]+)(?::\d+)?;egress
    replacement: ${1}:8080
    target_label: __address__
  - source_labels: [__address__]
    action: replace
    regex: ([^:]+)(?::\d+)?
    replacement: ${1}
    target_label: instance
  - source_labels: [__meta_kubernetes_pod_label_app_kubernetes_io_component]
    action: replace
    regex: (.*)
    replacement: ${1}
    target_label: component
An example Grafana dashboard is available here.