As part of the Digital Ocean Kubernetes Challenge, I deployed the Elasticsearch, Fluentd and Kibana stack for log analytics. It was my first time deploying a StatefulSet and a DaemonSet, and I encountered several challenges along the way, which gave me the opportunity to practice debugging Kubernetes issues.
I started off developing on a local cluster, but in this guide I will demonstrate how to deploy on Digital Ocean's managed Kubernetes cluster.
This is not a production-ready deployment. Rather, it is a simplified guide for learning how to set up Elasticsearch, Fluentd and Kibana.
The completed repo can be found below. Feel free to clone it to follow along.
$ git clone https://github.com/joshchoo/digital-ocean-kubernetes-challenge.git
Let's begin by provisioning three nodes with at least 4GB of RAM each. I found that Digital Ocean's lower-RAM nodes caused the pods to crash continuously 🥲.
Provisioning three nodes allows the Elasticsearch service to tolerate one node becoming unavailable.
$ doctl kubernetes cluster create k8s-challenge \
--size=s-2vcpu-4gb \
--count=3 \
--region=sgp1 \
--surge-upgrade=false \
--wait=false
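Since we passed --wait=false, the command returns immediately. We can also poll the provisioning status from the terminal:
$ doctl kubernetes cluster list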
It takes a while for Digital Ocean to provision the nodes. We can check the progress on the Digital Ocean site:
Once the nodes have been provisioned, we should see the following details:
We can click on the "Kubernetes Dashboard" button to open the dashboard!
On the dashboard, take note that Digital Ocean has provided the do-block-storage Storage Class. We will use it to conveniently provision Elasticsearch with storage backed by Digital Ocean's Block Storage.
After creating the cluster, the context should have changed to the newly provisioned cluster. This means that future kubectl commands will execute against the Kubernetes cluster on Digital Ocean instead of local clusters, such as one created by minikube.
$ kubectl config get-contexts
CURRENT NAME CLUSTER AUTHINFO NAMESPACE
* do-sgp1-k8s-challenge do-sgp1-k8s-challenge do-sgp1-k8s-challenge-admin
minikube minikube minikube default
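If the current context (marked with an asterisk) ever points elsewhere, we can switch back explicitly:
$ kubectl config use-context do-sgp1-k8s-challenge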
By default, Kubernetes deploys resources to the default namespace. A namespace is like a logical sub-cluster within the Kubernetes cluster. We shall deploy our log analytics stack to a new logging namespace.
# infra/logging-ns.yaml
apiVersion: v1
kind: Namespace
metadata:
name: logging
$ kubectl apply -f infra/logging-ns.yaml
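Applying a manifest keeps the namespace under version control, but the same result can be achieved imperatively with a one-liner:
$ kubectl create namespace logging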
Elasticsearch is a search engine that is commonly used to search for logs across many backend applications.
We shall deploy the Elasticsearch cluster in the newly-created logging namespace. The Elasticsearch resource will be a StatefulSet instead of a Deployment so that each Elasticsearch pod keeps a stable, persistent identifier. Stable identifiers ensure that each pod always reattaches to its own storage volume, and never to another pod's.
As mentioned previously, we will create three replicas (each deployed on a different node) so that Elasticsearch can tolerate one node becoming unavailable. If one node fails, the remaining two can still form a quorum and elect a new leader.
As previously mentioned, we will use the do-block-storage Storage Class to automatically provision storage on Digital Ocean.
The tedious alternative would have been to manually provision Digital Ocean Block Storage and deploy PersistentVolume and PersistentVolumeClaim resources ourselves. Thankfully we don't have to, but a sketch of what that might look like is shown below.
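For illustration only, a hand-written PersistentVolume for a pre-created Block Storage volume might look roughly like this. The volumeHandle value is a hypothetical placeholder, and we will not apply this file:
# Sketch only -- do NOT apply. A manually provisioned volume for Elasticsearch.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: elasticsearch-data-manual
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  storageClassName: do-block-storage
  csi:
    driver: dobs.csi.digitalocean.com # Digital Ocean's CSI driver
    fsType: ext4
    volumeHandle: <block-storage-volume-id> # hypothetical: the ID of a volume created by hand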
# infra/elasticsearch.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: elasticsearch-cluster
namespace: logging
spec:
selector:
matchLabels:
app: elasticsearch
serviceName: elasticsearch
# Create at least 3 primary-eligible nodes so that if one fails, the others can still safely form a quorum.
# Source: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-discovery-voting.html
replicas: 3
template:
metadata:
labels:
app: elasticsearch
spec:
containers:
- name: elasticsearch
image: docker.elastic.co/elasticsearch/elasticsearch:7.15.2
resources:
limits:
cpu: 1000m
requests:
cpu: 100m
ports:
- containerPort: 9200
name: http-api
protocol: TCP
- containerPort: 9300
name: inter-node
protocol: TCP
volumeMounts:
- name: elasticsearch-data # should match volumeClaimTemplates.metadata.name
mountPath: /usr/share/elasticsearch/data
# See required envvars: https://www.elastic.co/guide/en/elasticsearch/reference/7.15/docker.html#docker-compose-file
env:
- name: node.name
valueFrom:
fieldRef:
fieldPath: metadata.name # resolves to elasticsearch-{ordinal}
- name: cluster.name
value: elasticsearch-cluster
- name: discovery.seed_hosts
# DNS name for each Pod: <StatefulSet metadata.name-{ordinal}>.<serviceName>.<namespace>.svc.cluster.local
# Truncated DNS: <StatefulSet metadata.name-{ordinal}>.<serviceName>
value: "elasticsearch-cluster-0.elasticsearch,elasticsearch-cluster-1.elasticsearch,elasticsearch-cluster-2.elasticsearch"
# The initial master nodes should be identified by their node.name, which we defined above.
# See: https://www.elastic.co/guide/en/elasticsearch/reference/7.9/discovery-settings.html#CO15-1
- name: cluster.initial_master_nodes
value: "elasticsearch-cluster-0,elasticsearch-cluster-1,elasticsearch-cluster-2"
# Disabled bootstrap.memory_lock because of error: "memory locking requested for elasticsearch process but memory is not locked"
# - name: bootstrap.memory_lock
# value: "true"
- name: ES_JAVA_OPTS
value: "-Xms512m -Xmx512m"
- name: "node.max_local_storage_nodes"
value: "3"
initContainers:
# https://www.digitalocean.com/community/tutorials/how-to-set-up-an-elasticsearch-fluentd-and-kibana-efk-logging-stack-on-kubernetes
- name: fix-permissions
image: busybox
command: ["sh", "-c", "chown -R 1000:1000 /usr/share/elasticsearch/data"]
securityContext:
privileged: true
volumeMounts:
- name: elasticsearch-data
mountPath: /usr/share/elasticsearch/data
# Elasticsearch demands at least vm.max_map_count 262144 compared to the default 65530
# Related: https://stackoverflow.com/questions/51445846/elasticsearch-max-virtual-memory-areas-vm-max-map-count-65530-is-too-low-inc
- name: increase-vm-max-map-count
image: busybox
command: ["sysctl", "-w", "vm.max_map_count=262144"]
securityContext:
privileged: true
volumeClaimTemplates:
- metadata:
name: elasticsearch-data
spec:
storageClassName: do-block-storage
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
---
apiVersion: v1
kind: Service
metadata:
name: elasticsearch
namespace: logging
spec:
selector:
app: elasticsearch
# Set up a headless service
clusterIP: None
ports:
# The HTTP API interface for client requests
- name: http-api
port: 9200
targetPort: 9200
# The transport interface for inter-node communication
- name: inter-node
port: 9300
targetPort: 9300
$ kubectl apply -f infra/elasticsearch.yaml
We might see some "pod has unbound immediate PersistentVolumeClaims" warnings in the Pods' events (see image), but that just means that there isn't yet an underlying PersistentVolume backing the PersistentVolumeClaims. Don't worry about it! Digital Ocean is provisioning the Block Storage for the PersistentVolumes behind the scenes. Just wait a bit and the warning should disappear.
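We can watch the PersistentVolumeClaims transition from Pending to Bound as the Block Storage volumes come online:
$ kubectl get pvc -n logging -w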
We should see the following once the Elasticsearch pods are ready. Notice in the dashboard's top menu-bar that we have narrowed the view to the logging namespace:
$ kubectl get pods -n logging
NAME READY STATUS RESTARTS AGE
elasticsearch-cluster-0 1/1 Running 0 11m
elasticsearch-cluster-1 1/1 Running 0 10m
elasticsearch-cluster-2 1/1 Running 0 10m
We can test that the service is running by sending a request to it. First, we need to establish a port-forward so that we can send requests via localhost (127.0.0.1) to the Elasticsearch HTTP API port at :9200.
$ kubectl port-forward elasticsearch-cluster-0 9200:9200 -n logging
Forwarding from 127.0.0.1:9200 -> 9200
Forwarding from [::1]:9200 -> 9200
We should see the following response if Elasticsearch is running successfully.
$ curl 127.0.0.1:9200
{
"name" : "elasticsearch-cluster-0",
"cluster_name" : "elasticsearch-cluster",
"cluster_uuid" : "84XJWldhT3qX0WNG8_JqEQ",
"version" : {
"number" : "7.15.2",
"build_flavor" : "default",
"build_type" : "docker",
"build_hash" : "93d5a7f6192e8a1a12e154a2b81bf6fa7309da0c",
"build_date" : "2021-11-04T14:04:42.515624022Z",
"build_snapshot" : false,
"lucene_version" : "8.9.0",
"minimum_wire_compatibility_version" : "6.8.0",
"minimum_index_compatibility_version" : "6.0.0-beta1"
},
"tagline" : "You Know, for Search"
}
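We can also ask Elasticsearch for its overall cluster health. With all three pods up and connected, the status should be "green" and number_of_nodes should be 3:
$ curl "127.0.0.1:9200/_cluster/health?pretty"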
We've just deployed Elasticsearch, but we need a way to collect logs from the other applications running in our Kubernetes cluster. Applications typically write their logs to stdout or stderr, and Kubernetes stores the pod container logs under /var/log on each node.
fluentd is a data collector that allows us to push these logs to Elasticsearch so that we can search them there. We shall deploy it as a DaemonSet because we want fluentd to run on every Kubernetes node.
We will also create a ServiceAccount resource and bind it to a ClusterRole that grants fluentd permission to get/list/watch pods and namespaces.
Important: We need to ensure that fluentd does not read its own logs from /var/log. Otherwise this could cause an error that prevents fluentd from sending logs to Elasticsearch. See the FLUENT_CONTAINER_TAIL_EXCLUDE_PATH environment variable below for the fix.
# infra/fluentd.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: fluentd
namespace: logging
labels:
app: fluentd
spec:
selector:
matchLabels:
app: fluentd
template:
metadata:
labels:
app: fluentd
spec:
# Use the fluentd ServiceAccount to run this pod
serviceAccountName: fluentd
tolerations:
- key: node-role.kubernetes.io/master
effect: NoSchedule
containers:
- name: fluentd
image: fluent/fluentd-kubernetes-daemonset:v1.14.3-debian-elasticsearch7-1.0
env:
- name: FLUENT_ELASTICSEARCH_HOST
value: "elasticsearch.logging.svc.cluster.local"
- name: FLUENT_ELASTICSEARCH_PORT
value: "9200"
- name: FLUENT_ELASTICSEARCH_SCHEME
value: "http"
- name: FLUENTD_SYSTEMD_CONF
value: disable
# Fixes logs not being sent to Elasticsearch/Kibana: https://github.com/fluent/fluentd/issues/2545#issuecomment-747488212
# Prevent Fluentd from reading its own logs recursively...
- name: FLUENT_CONTAINER_TAIL_EXCLUDE_PATH
value: /var/log/containers/fluent*
- name: FLUENT_CONTAINER_TAIL_PARSER_TYPE
value: /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/
resources:
limits:
memory: 512Mi
requests:
cpu: 100m
memory: 200Mi
# Map the Node's folders onto the Pod's
volumeMounts:
- name: varlog
# Kubernetes captures stdout/stderr logs from each pod to the Node's /var/log path.
mountPath: /var/log
- name: varlibdockercontainers
mountPath: /var/lib/docker/containers
readOnly: true
terminationGracePeriodSeconds: 30
volumes:
- name: varlog
hostPath:
path: /var/log
- name: varlibdockercontainers
hostPath:
path: /var/lib/docker/containers
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: fluentd
namespace: logging
labels:
app: fluentd
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: fluentd
labels:
app: fluentd
rules:
- apiGroups:
- ""
resources:
- pods
- namespaces
verbs:
- get
- list
- watch
---
# Bind the above ClusterRole permissions to the ServiceAccount
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: fluentd
roleRef:
kind: ClusterRole
name: fluentd
apiGroup: rbac.authorization.k8s.io
subjects:
- kind: ServiceAccount
name: fluentd
namespace: logging
Let's deploy the fluentd resources.
$ kubectl apply -f infra/fluentd.yaml
We should see three fluentd pods, one on each node.
$ kubectl get pods -n logging
NAME READY STATUS RESTARTS AGE
elasticsearch-cluster-0 1/1 Running 0 19m
elasticsearch-cluster-1 1/1 Running 0 18m
elasticsearch-cluster-2 1/1 Running 0 18m
fluentd-5c79q 1/1 Running 0 3m13s
fluentd-j8szn 1/1 Running 0 3m13s
fluentd-wrkkd 1/1 Running 0 3m13s
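Assuming our user has impersonation rights (the default cluster-admin credentials do), we can verify that the ClusterRoleBinding took effect, and then tail fluentd's own logs to confirm it started cleanly:
$ kubectl auth can-i list pods --as=system:serviceaccount:logging:fluentd
yes
$ kubectl logs -l app=fluentd -n logging --tail=5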
We've deployed Elasticsearch and fluentd, which means that we can now capture logs from the pods running in our Kubernetes cluster and search them by querying Elasticsearch's REST API.
Searching the logs via the REST API is not a great user experience, though. Instead, we can use Kibana, which provides a web UI for searching the logs in a browser!
Because Kibana is a stateless application, we will deploy it as a Deployment resource instead of a StatefulSet. Additionally, we will set the Kibana server base path to the /kibana endpoint.
# infra/kibana.yaml
apiVersion: v1
kind: Service
metadata:
name: kibana
namespace: logging
labels:
app: kibana
spec:
ports:
- port: 5601
selector:
app: kibana
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: kibana
namespace: logging
labels:
app: kibana
spec:
replicas: 1
selector:
matchLabels:
app: kibana
template:
metadata:
labels:
app: kibana
spec:
containers:
- name: kibana
image: docker.elastic.co/kibana/kibana:7.15.2
resources:
limits:
cpu: 1000m
requests:
cpu: 100m
env:
# This should match the path specified in the Ingress file
- name: SERVER_BASEPATH
value: "/kibana"
- name: SERVER_REWRITEBASEPATH
value: "true"
- name: ELASTICSEARCH_HOSTS
value: http://elasticsearch:9200
- name: ELASTICSEARCH_URL
value: http://elasticsearch:9200
ports:
- containerPort: 5601
$ kubectl apply -f infra/kibana.yaml
We should see one Kibana pod running.
$ kubectl get pods -n logging
NAME READY STATUS RESTARTS AGE
elasticsearch-cluster-0 1/1 Running 0 24m
elasticsearch-cluster-1 1/1 Running 0 23m
elasticsearch-cluster-2 1/1 Running 0 23m
fluentd-5c79q 1/1 Running 0 8m19s
fluentd-j8szn 1/1 Running 0 8m19s
fluentd-wrkkd 1/1 Running 0 8m19s
kibana-64dd44844b-cnls2 1/1 Running 0 4m24s
Let's set up a port-forward to the Kibana pod so that we can access it from our web browser locally.
$ kubectl port-forward kibana-64dd44844b-cnls2 5601:5601 -n logging
Forwarding from 127.0.0.1:5601 -> 5601
Forwarding from [::1]:5601 -> 5601
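Tip: instead of targeting the pod by its generated name, we can port-forward to the Deployment, so the command keeps working across pod restarts:
$ kubectl port-forward deployment/kibana 5601:5601 -n logging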
In a browser, visit http://localhost:5601/kibana. We should be greeted with the Kibana home page.
Before we can start seeing logs, we need to create an "index pattern". Open the sidebar -> Analytics -> Discover -> "Create index pattern".
Now enter logstash-* in the "Name" field and @timestamp in the "Timestamp" field, and click on "Create index pattern" to continue.
Head back to Discover in the sidebar, and we should see all the logs now! Sweet!
Let's check that logging works properly by deploying an application in a namespace other than logging. Our application is a counter that logs the date and time to stdout every second.
# infra/counter.yaml
apiVersion: v1
kind: Pod
metadata:
name: counter
spec:
containers:
- name: count
image: busybox
args: [/bin/sh, -c, 'i=0; while true; do echo "$i: $(date)"; i=$((i+1)); sleep 1; done']
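Let's deploy the counter pod. Since the manifest doesn't specify a namespace, it lands in the default namespace:
$ kubectl apply -f infra/counter.yaml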
We can inspect the logs from the counter pod with the following kubectl command:
$ kubectl logs counter
0: Mon Dec 6 01:20:21 UTC 2021
1: Mon Dec 6 01:20:22 UTC 2021
2: Mon Dec 6 01:20:23 UTC 2021
3: Mon Dec 6 01:20:24 UTC 2021
4: Mon Dec 6 01:20:25 UTC 2021
5: Mon Dec 6 01:20:26 UTC 2021
6: Mon Dec 6 01:20:27 UTC 2021
7: Mon Dec 6 01:20:28 UTC 2021
8: Mon Dec 6 01:20:29 UTC 2021
9: Mon Dec 6 01:20:30 UTC 2021
10: Mon Dec 6 01:20:31 UTC 2021
Now, let's search Kibana for kubernetes.pod_name: counter, and we should see the same counter pod's logs!