The project uses Argo Workflows as the workflow engine to orchestrate tasks related to deploying a hyperconverged cluster, and the whole environment runs on a single-node K3s. The main reason for choosing Argo Workflows + K3s is to consume as few system resources as possible, since this environment will eventually run on all kinds of laptops with different hardware configurations 😂. After researching the common K8s deployment tools, we settled on K3s because it has the smallest resource footprint.

One of the requirements of the project is to archive the workflow logs after a cluster deployment completes or fails. While it is possible to set archiveLogs: true in the workflow spec so that Argo archives the logs for us automatically, this feature relies on an S3-compatible artifact repository, which means deploying yet another component such as MinIO.
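
For reference, the built-in approach is just a flag on the workflow spec; a minimal sketch, assuming the controller already has an artifact repository such as MinIO configured (which is exactly the dependency we want to avoid):

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: archive-log-test-
spec:
  # with archiveLogs enabled, Argo uploads each main container's log
  # to the configured artifact repository (S3/MinIO) when the step finishes
  archiveLogs: true
  entrypoint: archive-log-test
  templates:
    - name: archive-log-test
      container:
        image: alpine:3.15
        command: [sh, -c, "echo hello"]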

In fact, the requirement is dead simple: I just want to save a log file, and for that you want me to install MinIO? That’s too much! The system’s resources are already very limited, so we need to avoid installing unnecessary dependencies and keep resource usage to a minimum. Going to all the trouble of installing MinIO just to archive a log file is really not worth it. It’s like deploying a 3-node Kubernetes cluster just to run a static blog 😂

Deployed my blog on Kubernetes pic.twitter.com/XHXWLrmYO4

— For DevOps Eyes Only (@dexhorthy) April 24, 2017

For poor kids like us who can’t afford S3 object storage, it’s better to find another way; after all, if you want something done right, do it yourself.

kubectl

This is relatively easy to implement, and for YAML engineers like us kubectl is second nature. To get the workflow logs, we just need to run kubectl logs against the pods created by the workflow, no S3 object storage required 😖.

Filtering pods

For the same workflow, the names of the pods created at each step follow a fixed pattern. When defining a workflow, the generateName parameter is usually in the ${name}- format. Splitting the pod name on -, the last field is a randomly generated numeric node ID, the second-to-last field is the random suffix Argo generates for the workflow name, and everything before that is the generateName we defined. In other words, stripping the last field from a pod name gives back the workflow name.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: archive-log-test-
archive-log-test-jzt8n-3498199655                          0/2     Completed   0               4m18s
archive-log-test-jzt8n-3618624526                          0/2     Completed   0               4m8s
archive-log-test-jzt8n-2123203324                          0/2     Completed   0               3m58s
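
Since the numeric node ID is always the last field, the workflow name can be recovered from any of its pod names with a simple shell parameter expansion (the same trick the exit-handler below relies on); a quick sketch using the example pod above:

# strip the trailing "-<node id>" to recover the workflow name
POD_NAME=archive-log-test-jzt8n-3498199655
echo "${POD_NAME%-*}"    # prints: archive-log-test-jzt8n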

The pod’s labels also contain the workflow name, so we can filter out the pods created by a given workflow based on these labels.

apiVersion: v1
kind: Pod
metadata:
  annotations:
    workflows.argoproj.io/node-id: archive-log-test-jzt8n-3498199655
    workflows.argoproj.io/node-name: archive-log-test-jzt8n[0].list-default-running-pods
  creationTimestamp: "2022-02-28T12:53:32Z"
  labels:
    workflows.argoproj.io/completed: "true"
    workflows.argoproj.io/workflow: archive-log-test-jzt8n
  name: archive-log-test-jzt8n-3498199655
  namespace: default
  ownerReferences:
  - apiVersion: argoproj.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: Workflow
    name: archive-log-test-jzt8n
    uid: e91df2cb-b567-4cf0-9be5-3dd6c72854cd
  resourceVersion: "1251330"
  uid: ce37a709-8236-445b-8d00-a7926fa18ed0

Filter the pods created by a workflow with the -l label selector, sort them by creation time with --sort-by, and output only the pod names with -o name.

$ kubectl get pods -l workflows.argoproj.io/workflow=archive-log-test-jzt8n --sort-by='.metadata.creationTimestamp' -o name
pod/archive-log-test-jzt8n-3498199655
pod/archive-log-test-jzt8n-3618624526
pod/archive-log-test-jzt8n-2123203324

Get the log

We can get the list of pods created by a workflow by following the steps above, then use the kubectl logs command to fetch the logs of the main container in each pod. To tell which workflow the logs belong to, we use the workflow name as the name of the log file.

$ kubectl logs archive-log-test-jzt8n-3618624526 -c main
# where to store the archived log and which workflow to collect
LOG_PATH=/var/log
NAME=archive-log-test-jzt8n
# list the workflow's pods in creation order and append each main
# container's log to a single file named after the workflow
kubectl get pods -l workflows.argoproj.io/workflow=${NAME} \
--sort-by='.metadata.creationTimestamp' -o name \
| xargs -I {} -t kubectl logs {} -c main >> ${LOG_PATH}/${NAME}.log
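
As a quick sanity check, everything should now be in a single file named after the workflow (paths taken from the snippet above):

ls ${LOG_PATH}/${NAME}.log
# /var/log/archive-log-test-jzt8n.log
tail ${LOG_PATH}/${NAME}.log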

workflow

Following the official exit-handlers.yaml example provided by Argo Workflows, we add a step that automatically collects the workflow logs with kubectl after the workflow exits. The exit-handler pod’s own name follows the same ${workflow-name}-${node-id} pattern, so it can derive the workflow name from its own pod name with ${POD_NAME%-*}, and it excludes its own logs with grep -v ${POD_NAME}. The exit-handler is defined as follows.

  - name: exit-handler
    container:
      name: "kubectl"
      image: lachlanevenson/k8s-kubectl:v1.23.2
      command:
        - sh
        - -c
        - |
          kubectl get pods -l workflows.argoproj.io/workflow=${POD_NAME%-*} \
          --sort-by=".metadata.creationTimestamp" -o name | grep -v ${POD_NAME} \
          | xargs -I {} -t kubectl logs {} -c main >> ${LOG_PATH}/${POD_NAME%-*}.log          
      env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: LOG_PATH
          value: /var/log/workflow
      resources: {}
      volumeMounts:
        - name: nfs-datastore
          mountPath: /var/log/workflow
    retryStrategy:
      limit: "5"
      retryPolicy: OnFailure
entrypoint: archive-log-test
serviceAccountName: default
volumes:
  - name: nfs-datastore
    nfs:
      server: NFS_SERVER
      path: /data/workflow/log
onExit: exit-handler

Just copy and paste the exit-handler defined above into your own workflow spec. Since the logs need to be stored persistently, I am using NFS storage here, but you can switch to any other storage you like by changing the volumes configuration.
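
For example, on a single-node K3s setup a plain hostPath volume is enough; a minimal sketch (keeping the same volume name so the volumeMounts in the exit-handler stay unchanged, with a made-up directory on the node):

volumes:
  - name: nfs-datastore
    hostPath:
      # any directory on the K3s node will do; this path is just an example
      path: /data/workflow/log
      type: DirectoryOrCreate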

The complete workflow example is as follows.

apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: archive-log-test-
  namespace: default
spec:
  templates:
    - name: archive-log-test
      steps:
        - - name: list-default-running-pods
            template: kubectl
            arguments:
              parameters:
                - name: namespace
                  value: default
        - - name: list-kube-system-running-pods
            template: kubectl
            arguments:
              parameters:
                - name: namespace
                  value: kube-system

    - name: kubectl
      inputs:
        parameters:
          - name: namespace
      container:
        name: "kubectl"
        image: lachlanevenson/k8s-kubectl:v1.23.2
        command:
          - sh
          - -c
          - |
            kubectl get pods --field-selector=status.phase=Running -n {{inputs.parameters.namespace}}

    - name: exit-handler
      container:
        name: "kubectl"
        image: lachlanevenson/k8s-kubectl:v1.23.2
        command:
          - sh
          - -c
          - |
            kubectl get pods -l workflows.argoproj.io/workflow=${POD_NAME%-*} \
            --sort-by=".metadata.creationTimestamp" -o name | grep -v ${POD_NAME} \
            | xargs -I {} -t kubectl logs {} -c main >> ${LOG_PATH}/${POD_NAME%-*}.log            
        env:
          - name: POD_NAME
            valueFrom:
              fieldRef:
                fieldPath: metadata.name
          - name: LOG_PATH
            value: /var/log/workflow
        resources: {}
        volumeMounts:
          - name: nfs-datastore
            mountPath: /var/log/workflow
      retryStrategy:
        limit: "5"
        retryPolicy: OnFailure
  entrypoint: archive-log-test
  serviceAccountName: default
  volumes:
    - name: nfs-datastore
      nfs:
        server: NFS_SERVER
        path: /data/workflow/log
  onExit: exit-handler
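
To try it out, submit the workflow and then check the archived log on the NFS export; a quick sketch, assuming the manifest is saved as archive-log-test.yaml (a made-up file name) and NFS_SERVER has been replaced with a real address:

# submit the workflow; kubectl create -f works too, since this is a plain Workflow manifest
argo submit -n default --watch archive-log-test.yaml

# once the exit handler has run, the combined log is named after the
# generated workflow name under the NFS export, e.g.
ls /data/workflow/log/
# archive-log-test-xxxxx.log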
