Vault is a secrets management, encryption-as-a-service, and privilege management tool from HashiCorp. Its features, briefly:

  1. Secrets management: stores arbitrary custom data and auto-generates many kinds of credentials; keys generated by Vault can also be rotated automatically.
  2. Authentication: integrates with major cloud vendors' account systems (such as the Aliyun RAM sub-account system), LDAP, etc., so no separate account system needs to be created.
  3. Permissions management: very fine-grained ACL permissions can be defined via policies.
  4. Secret engines: can also take over the account systems of major cloud vendors (such as the AliCloud RAM sub-account system) to rotate API keys automatically.
  5. Integrates with the Kubernetes RBAC permission system; permissions can be configured per Pod through ServiceAccount + role.
  • Secrets can be injected into Pods via a sidecar/init container, or synchronized into k8s Secrets via a k8s operator.

Before Vault, we were using Apollo, an open-source distributed configuration center from Ctrip, for our microservices.

Apollo is very popular in China. It is powerful, supports configuration inheritance, and provides an HTTP API for automation. Its weaknesses are permission management and secrets management: it does not support encryption, so it is not suitable for storing sensitive information directly. That is why we switched to Vault.

At present, both our local CI/CD pipeline and our microservice system in the cloud use Vault for secrets management.

I. Vault Basic Concepts

Let’s start with a diagram of the Vault architecture.

image

As you can see, almost all of Vault's components are collectively referred to as the "Barrier". Vault can be simply divided into three parts: the Storage Backend, the Barrier, and the HTTP(S) API.

In analogy to a bank vault, the Barrier is the “steel” and “concrete” surrounding the Vault, through which all data flows between the Storage Backend and the client.

The Barrier ensures that only encrypted data is written to the Storage Backend, and that encrypted data is verified and decrypted as it is read back out through the Barrier.

Much like the door to a bank vault, the Barrier must be unsealed before the data in the Storage Backend can be decrypted.

1. Data Storage, Encryption, and Decryption

Storage Backend: Vault does not store data itself, so it must be configured with a "Storage Backend". The storage backend is untrusted and is only used to store encrypted data.

Initialization: Vault must be initialized when it is first started; this step generates the Encryption Key used to encrypt data.

Unseal: After the Vault is started, it will enter the “Sealed” state because it does not know the “Encryption Key”.

The “encryption key” is protected by the “master key” and we must provide the “master key” to complete the Unseal operation.

By default, Vault uses the Shamir key sharing algorithm to split the “master key” into five “Key Shares”, and any three of these “Key Shares” must be provided to reconstruct the “master key” for Unseal.

image

The number of "Key Shares" and the threshold of shares needed to rebuild the "master key" are both configurable. The Shamir algorithm can also be disabled entirely, in which case the master key is used directly for unsealing.
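The split-and-reconstruct behavior can be illustrated with a toy implementation of Shamir's scheme over a prime field. This is a sketch for intuition only, not Vault's actual code (which operates on the raw key bytes):

```python
# Toy Shamir secret sharing: split a secret into 5 shares with threshold 3,
# then recover it from any 3 shares via Lagrange interpolation at x=0.
import random

P = 2**127 - 1  # a Mersenne prime; all arithmetic is mod P

def split(secret: int, shares: int = 5, threshold: int = 3):
    # random polynomial of degree threshold-1 whose constant term is the secret
    coeffs = [secret] + [random.randrange(P) for _ in range(threshold - 1)]
    def f(x):
        acc = 0
        for c in reversed(coeffs):  # Horner evaluation
            acc = (acc * x + c) % P
        return acc
    return [(x, f(x)) for x in range(1, shares + 1)]

def combine(points):
    # Lagrange interpolation at x=0 recovers the constant term (the secret)
    secret = 0
    for xi, yi in points:
        num = den = 1
        for xj, _ in points:
            if xj != xi:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

key_shares = split(123456789, shares=5, threshold=3)
assert combine(key_shares[:3]) == 123456789   # any 3 shares reconstruct the key
assert combine(key_shares[2:]) == 123456789
```

Fewer than `threshold` shares reveal nothing about the secret, which is why distributing the shares to different operators is safe.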

2. Authentication system and permission system

After unsealing is complete, Vault is ready to start processing requests.

All HTTP request processing is managed by the vault core, which enforces ACL checks and ensures audit logging is completed.

When a client first connects to vault, it needs to complete authentication. Vault's "auth methods" module offers a number of authentication methods to choose from.

  • User-friendly methods, suitable for administrators: username/password, cloud providers, LDAP
    • When creating a user, you need to bind policies to the user to grant appropriate permissions.
  • Application-friendly methods, suitable for applications: public/private keys, tokens, Kubernetes, JWT

Authentication requests flow through Core and into auth methods, which determine whether the request is valid and return a list of “policies”.

ACL Policies are managed and stored by the policy store, and ACL checks are performed by the core. The default behavior of ACLs is deny, which means that an action will be denied unless the Policy is explicitly configured to allow it.
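Default-deny can be illustrated with a toy policy matcher. This is a hypothetical sketch, not Vault's matcher; real Vault policies also support `+` segment wildcards, `deny` capabilities, and `sudo`:

```python
# Toy illustration of default-deny ACLs: a request passes only if an attached
# policy explicitly grants the capability on a matching path.
from fnmatch import fnmatchcase

POLICIES = {
    "my-app-policy": {
        "my-app/data/*": {"read", "list"},
    },
}

def is_allowed(attached_policies, path, capability):
    for name in attached_policies:
        for pattern, caps in POLICIES.get(name, {}).items():
            if fnmatchcase(path, pattern) and capability in caps:
                return True
    return False  # no explicit grant -> denied by default

assert is_allowed(["my-app-policy"], "my-app/data/config", "read")
assert not is_allowed(["my-app-policy"], "my-app/data/config", "delete")
assert not is_allowed([], "my-app/data/config", "read")   # no policy, no access
```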

Once auth methods complete the authentication and the returned associated policies check out, the "token store" generates and manages a new token, which is returned to the client for use in subsequent requests.

Similar to cookies on web sites, tokens have a lease or expiration date, which enhances security.

The token is associated with policies, which are used to verify the permissions of each request.

If a secret engine returns a secret (one automatically generated by vault), Core registers it with the expiration manager and attaches a lease ID to it. The client uses the lease ID to renew or revoke the secret it has received.

If the client allows the lease to expire, the expiration manager will automatically revoke the secret.

Core logs requests and responses through the audit broker, which fans them out to all configured audit devices.

3. Secret Engine

Secret Engines are components that store, generate or encrypt data and are very flexible.

Some Secret Engines simply store and read data, such as kv, which can be thought of as an encrypted Redis. Others connect to other services and generate dynamic credentials on demand.

There are also Secret Engines that provide “encryption as a service” capabilities, such as transit, certificate management, etc.

Examples of commonly used engines.

  • AliCloud Secrets Engine: dynamically generates AliCloud Access Tokens based on RAM policies, or AliCloud STS credentials based on RAM roles
    • Access Tokens are renewed automatically, while STS credentials are temporary and simply expire.
  • kv: a key-value store that can hold static configuration. To some extent it can replace the Apollo configuration center from Ctrip.
  • Transit Secrets Engine: provides encryption as a service. It only encrypts and decrypts, without storing anything; the main scenario is helping apps encrypt and decrypt data that is still stored in MySQL or other databases.

II. Deploying Vault

The official recommendation is to deploy vault via Helm.

  1. Deploy and run vault using helm/docker.
  2. Initialize and unseal vault: as a vault security measure, it must be unsealed after every restart (auto-unseal can be configured).

0. How do I choose a storage backend?

First, we definitely need HA, or at least retain the ability to upgrade to HA, so it is not recommended to choose a backend that does not support HA.

The exact choice varies depending on the team’s experience, and people tend to prefer to use a backend they are familiar with and know well, or go with a cloud service.

For example, we are familiar with MySQL/PostgreSQL, using a cloud-hosted database means little maintenance to worry about, and MySQL is a common protocol with no cloud-vendor lock-in, so we tend toward MySQL/PostgreSQL.

And if you self-host, you may prefer Etcd/Consul/Raft for backend storage.

1. docker-compose deployment (non-HA)

Recommended for local development test environments, or other environments that do not require high availability.

The docker-compose.yml example is as follows.

version: '3.3'
services:
  vault:
    # Docs: https://hub.docker.com/_/vault
    image: vault:1.6.0
    container_name: vault
    ports:
      # rootless container; the standard port 443 cannot be used inside it
      - "443:8200"
    restart: always
    volumes:
      # Audit log directory. No audit logs are written by default; when the
      # `file` audit backend is enabled, a path under this folder must be given.
      - ./logs:/vault/logs
      # Data is stored here when the `file` data storage plugin is used.
      # Nothing is written here by default.
      - ./file:/vault/file
      # Config directory: vault loads every .hcl/.json file in `/vault/config/`
      # For the contents of config.hcl, see below
      - ./config.hcl:/vault/config/config.hcl
      # TLS certificates
      - ./certs:/certs
    # vault needs to lock memory to prevent sensitive values from being
    # swapped to disk; this requires the capability below
    cap_add:
      - IPC_LOCK
    # The entrypoint must be set manually, otherwise vault runs in development mode
    entrypoint: vault server -config /vault/config/config.hcl

config.hcl reads as follows.

ui = true

// Use the filesystem for data storage (single node)
storage "file" {
  path    = "/vault/file"
}

listener "tcp" {
  address = "[::]:8200"

  tls_disable = false
  tls_cert_file = "/certs/server.crt"
  tls_key_file  = "/certs/server.key"
}

Save the two files above in the same folder, and provide the TLS certificate server.crt and private key server.key in ./certs.

Then docker-compose up -d will start running a vault instance.

2. Deploy a highly available vault via helm

Recommended for production environments

Deploy via helm.

# Add the vault helm repo
helm repo add hashicorp https://helm.releases.hashicorp.com
# List available vault chart versions
helm search repo hashicorp/vault -l | head
# Download a specific version of the vault chart
helm pull hashicorp/vault --version 0.11.0 --untar

Referring to the downloaded ./vault/values.yaml, write a custom-values.yaml and deploy an HA vault with MySQL/PostgreSQL as the backend storage:

The configuration is long, but it is mostly copied verbatim from ./vault/values.yaml with very few changes. Most of the items can be ignored while testing vault.

global:
  # enabled is the master enabled switch. Setting this to true or false
  # will enable or disable all the components within this chart by default.
  enabled: true
  # TLS for end-to-end encrypted transport
  tlsDisable: false

injector:
  # True if you want to enable vault agent injection.
  enabled: true

  replicas: 1

  # If true, will enable a node exporter metrics endpoint at /metrics.
  metrics:
    enabled: false

  # Mount Path of the Vault Kubernetes Auth Method.
  authPath: "auth/kubernetes"

  certs:
    # secretName is the name of the secret that has the TLS certificate and
    # private key to serve the injector webhook. If this is null, then the
    # injector will default to its automatic management mode that will assign
    # a service account to the injector to generate its own certificates.
    secretName: null

    # caBundle is a base64-encoded PEM-encoded certificate bundle for the
    # CA that signed the TLS certificate that the webhook serves. This must
    # be set if secretName is non-null.
    caBundle: ""

    # certName and keyName are the names of the files within the secret for
    # the TLS cert and private key, respectively. These have reasonable
    # defaults but can be customized if necessary.
    certName: tls.crt
    keyName: tls.key

server:
  # Resource requests, limits, etc. for the server cluster placement. This
  # should map directly to the value of the resources field for a PodSpec.
  # By default no direct resource request is made.

  # Enables a headless service to be used by the Vault Statefulset
  service:
    enabled: true
    # Port on which Vault server is listening
    port: 8200
    # Target port to which the service should be mapped to
    targetPort: 8200


  # This configures the Vault Statefulset to create a PVC for audit
  # logs.  Once Vault is deployed, initialized and unseal, Vault must
  # be configured to use this for audit logs.  This will be mounted to
  # /vault/audit
  # See https://www.vaultproject.io/docs/audit/index.html to know more
  auditStorage:
    enabled: false

  # Run Vault in "HA" mode. There are no storage requirements unless audit log
  # persistence is required.  In HA mode Vault will configure itself to use Consul
  # for its storage backend.  The default configuration provided will work the Consul
  # Helm project by default.  It is possible to manually configure Vault to use a
  # different HA backend.
  ha:
    enabled: true
    replicas: 3

    # Set the api_addr configuration for Vault HA
    # See https://www.vaultproject.io/docs/configuration#api_addr
    # If set to null, this will be set to the Pod IP Address
    apiAddr: null

    # config is a raw string of default configuration when using a Stateful
    # deployment. Default is to use a Consul for its HA storage backend.
    # This should be HCL.
    
    # Note: Configuration files are stored in ConfigMaps so sensitive data 
    # such as passwords should be either mounted through extraSecretEnvironmentVars
    # or through a Kube secret.  For more information see: 
    # https://www.vaultproject.io/docs/platform/k8s/helm/run#protecting-sensitive-vault-configurations
    config: |
      ui = true

      listener "tcp" {
        address = "[::]:8200"
        cluster_address = "[::]:8201"

        # Note: this must match the helm parameter global.tlsDisable
        tls_disable = false
        tls_cert_file = "/etc/certs/vault.crt"
        tls_key_file  = "/etc/certs/vault.key"
      }

      # storage "postgresql" {
      #   connection_url = "postgres://username:password@<host>:5432/vault?sslmode=disable"
      #   ha_enabled = true
      # }

      service_registration "kubernetes" {}

      # Example configuration for auto-unseal using AWS KMS.
      # The cluster must have a service account authorized to access AWS KMS through an IAM Role.
      # seal "awskms" {
      #   region     = "us-east-1"
      #   kms_key_id = "<some-key-id>"
      #   By default the plugin uses the public AWS KMS endpoint, but the following
      #   parameter switches to a self-created VPC-internal endpoint instead.
      #   endpoint   = "https://<vpc-endpoint-id>.kms.us-east-1.vpce.amazonaws.com"
      # }

  # Definition of the serviceAccount used to run Vault.
  # These options are also used when using an external Vault server to validate
  # Kubernetes tokens.
  serviceAccount:
    create: true
    name: "vault"
    annotations:
      # To use auto unseal, set this to an AWS IAM Role ARN with AWS KMS permissions
      eks.amazonaws.com/role-arn: <role-arn>

# Vault UI
ui:
  enabled: true
  publishNotReadyAddresses: true
  serviceType: ClusterIP
  activeVaultPodOnly: true
  externalPort: 8200

Now deploy vault using the custom custom-values.yaml:

kubectl create namespace vault
# Install/upgrade vault
helm upgrade --install vault ./vault --namespace vault -f custom-values.yaml

3. Initialize and unseal vault

Official documentation: Initialize and unseal Vault - Vault on Kubernetes Deployment Guide

Deploying vault via helm creates a three-replica StatefulSet by default, but all three replicas will be in a NotReady state (the same applies to the docker deployment). Next, the vault must be manually initialized and unsealed before the replicas become Ready:

  • Step 1: Pick any one of the three replicas and run the vault initialization command: kubectl exec -ti vault-0 -- vault operator init
    • Initialization returns 5 unseal keys and an Initial Root Token. These are extremely sensitive and must be saved somewhere safe!
  • Step 2: On each replica, run the unseal operation with any three of the unseal keys.
    • With three replicas, that means running the unseal command 3×3 = 9 times to fully unseal the vault!
# Every instance must be unsealed three times!
## Unseal the first vault server until it reaches the key threshold
$ kubectl exec -ti vault-0 -- vault operator unseal # ... Unseal Key 1
$ kubectl exec -ti vault-0 -- vault operator unseal # ... Unseal Key 2
$ kubectl exec -ti vault-0 -- vault operator unseal # ... Unseal Key 3

This completes the deployment, but note that **every vault instance must be unsealed again after each restart, i.e. step 2 must be repeated!**
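The per-restart ritual can be scripted. The sketch below shells out to kubectl; pod names and keys are placeholders, and the command runner is injectable so the loop can be exercised without a cluster. Note that keys passed on argv are visible in process listings, so treat this as a convenience hack:

```python
import subprocess

def unseal_all(pods, unseal_keys, threshold=3, run=subprocess.run):
    # every replica needs `threshold` distinct unseal keys
    for pod in pods:
        for key in unseal_keys[:threshold]:
            run(["kubectl", "exec", "-ti", pod, "--",
                 "vault", "operator", "unseal", key], check=True)

if __name__ == "__main__":
    # placeholders: real keys come from `vault operator init`
    unseal_all(["vault-0", "vault-1", "vault-2"],
               ["key-1", "key-2", "key-3"])
```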

4. Initialize and set auto unseal

Without auto unseal, every vault instance has to be unsealed manually after every restart, which is a real hassle.

To simplify this, you can configure auto unseal so that vault unseals itself automatically.

There are currently several ways to do this:

  • Use the key management service of a cloud provider such as AliCloud/AWS/Azure to protect the encryption key
    • AWS: awskms seal
      • In a k8s cluster, the ServiceAccount used by vault needs permission to use AWS KMS; this replaces the access_key/secret_key attributes in config.hcl
    • AliCloud: alicloudkms seal
  • If you don't want to use a cloud service, consider autounseal-transit, which uses the transit engine of another vault instance to implement auto-unseal.
  • Quick and dirty: a crontab entry or a scheduled CI job that runs the unseal command. Not very secure, though.

Take awskms as an example. First create an AWS IAM policy with the following content:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VaultKMSUnseal",
            "Effect": "Allow",
            "Action": [
                "kms:Decrypt",
                "kms:Encrypt",
                "kms:DescribeKey"
            ],
            "Resource": "*"
        }
    ]
}

Then create an IAM Role bound to the above policy, and bind that IAM Role to vault's k8s serviceaccount.

This way the serviceaccount used by vault itself has access to AWS KMS, with no need for an access_key/secret_key.

For more information on how to bind IAM roles to k8s serviceaccounts, see the official documentation: IAM roles for EKS service accounts

When you are done, modify the helm configuration provided earlier, deploy it, and finally initialize it with the following command.

# The initialization command is no different from normal mode
kubectl exec -ti vault-0 -- vault operator init
# It prints a root token and five Recovery Keys (instead of Unseal Keys)
# Recovery Keys are no longer used for unsealing, but operations such as regenerating the root token still require them.

Then you're done. Try deleting a vault pod; the new pod should unseal automatically.

III. Vault’s own configuration management

Vault itself is a complex secrets tool that provides Web UI and CLI for manually managing and viewing the contents of Vault.

But as DevOps engineers we naturally prefer an automated approach, for which there are two options: the official SDKs, or terraform-provider-vault.

While the Web UI is suitable for manual work, the SDKs and terraform-provider-vault are suited to automated management of vault.

Our test environment uses pulumi-vault to automate the configuration of the vault policy and kubernetes role, and then automate the injection of all secrets for testing.

1. Automated vault configuration using pulumi

Using pulumi to manage the vault configuration works very well for us, because the sensitive information of our cloud resources (database account passwords, resource IDs, RAM sub-accounts) is created by pulumi in the first place.

Combined with pulumi_vault, sensitive information is generated and immediately saved into vault, fully automated.

Subsequent microservices can then read sensitive information directly from the vault through kubernetes authentication.

Or it can be written to a local vault for backup, so that administrators can log in and view the sensitive information when needed.

1.1 Token generation

pulumi_vault itself is quite simple: it is declarative configuration, so just use it directly.

However, it requires a VAULT_TOKEN as the authentication credential (in practice userpass/approle cannot be used directly; they fail with "no vault token found"), and pulumi will generate a temporary child token from it for subsequent operations.

The root token should be locked away and not used except in emergencies.

So how do we generate a token with limited privileges for pulumi to use? My approach is to create a userpass account and grant it limited privileges via a policy, then log in (manually or automatically) to obtain a token, and provide that token to pulumi_vault.

The catch is that the userpass account must be given permission to create child tokens.

path "local/*" {
  capabilities = ["read", "list"]
}

// Allow creating child tokens
path "auth/token/create" {
  capabilities = ["create", "read", "update", "delete", "list"]
}

Without this permission, pulumi_vault will keep reporting errors.
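With the account and policy in place, fetching the token can be scripted against the userpass login endpoint (`POST /v1/auth/userpass/login/<username>`). The address and account name below are placeholders, and the request only fires under the main guard:

```python
import json
import os
import urllib.request

def login_request(vault_addr: str, username: str, password: str) -> urllib.request.Request:
    # the userpass login endpoint returns auth.client_token on success
    return urllib.request.Request(
        f"{vault_addr}/v1/auth/userpass/login/{username}",
        data=json.dumps({"password": password}).encode(),
        method="POST",
    )

def fetch_token(vault_addr: str, username: str, password: str) -> str:
    with urllib.request.urlopen(login_request(vault_addr, username, password)) as r:
        return json.load(r)["auth"]["client_token"]

if __name__ == "__main__":
    # PULUMI_VAULT_PASSWORD is a placeholder env var for the userpass password
    token = fetch_token("https://vault.example.com:8200", "pulumi",
                        os.environ["PULUMI_VAULT_PASSWORD"])
    print(token)  # export as VAULT_TOKEN before running pulumi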

Then it also needs the permissions required for the "automated configuration" itself, such as automatically creating/updating policies, secrets, and kubernetes roles, as in the following example:

# To list policies - Step 3
path "sys/policy"
{
  capabilities = ["read"]
}

# Create and manage ACL policies broadly across Vault
path "sys/policy/*"
{
  capabilities = ["create", "read", "update", "delete", "list", "sudo"]
}

# List, create, update, and delete key/value secrets
path "secret/*"
{
  capabilities = ["create", "read", "update", "delete", "list", "sudo"]
}

path "auth/kubernetes/role/*"
{
  capabilities = ["create", "read", "update", "list"]
}

IV. Injecting secrets in Kubernetes using vault

image

As mentioned earlier, vault supports assigning permissions to each Pod individually through Kubernetes’ ServiceAccount.

There are two ways for applications to read the configuration in vault.

  • With the Vault sidecar, secrets are automatically injected into the Pod as files, such as /vault/secrets/config.json.
    • In resident mode the sidecar refreshes the configuration every 15 seconds, and applications can watch the secrets file (e.g. with watchdog) to pick up changes in real time.
  • The application itself uses the SDK to access the vault api directly to get secrets

Both of these methods can be used to authenticate and assign permissions with the Kubernetes ServiceAccount.
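For the sidecar approach, the application must notice when the injected file is rewritten. A stdlib-only sketch that polls the file's mtime (a watchdog/inotify observer would serve the same purpose; the path is illustrative):

```python
import json
import os
import time

def maybe_reload(path, last_mtime, on_change):
    """Invoke on_change with the parsed config when the file's mtime changes."""
    mtime = os.stat(path).st_mtime
    if mtime != last_mtime:
        with open(path) as f:
            on_change(json.load(f))
    return mtime

if __name__ == "__main__":
    seen = None
    while True:  # the sidecar rewrites the file roughly every 15 seconds
        seen = maybe_reload("/vault/secrets/config.json", seen, print)
        time.sleep(15)
```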

The following is an example of how to inject secrets into a Pod as a file in Sidecar mode.

1. Deploy and configure the vault agent

First enable Kubernetes authentication for Vault:

# Authentication must be configured inside the vault pod; start an interactive session on vault-0
kubectl exec -n vault -it vault-0 -- /bin/sh
export VAULT_TOKEN='<your-root-token>'
export VAULT_ADDR='http://localhost:8200'
 
# Enable Kubernetes authentication
vault auth enable kubernetes

# kube-apiserver API configuration; vault authenticates serviceAccounts through the kube-apiserver
vault write auth/kubernetes/config \
    token_reviewer_jwt="$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" \
    kubernetes_host="https://$KUBERNETES_PORT_443_TCP_ADDR:443" \
    kubernetes_ca_cert=@/var/run/secrets/kubernetes.io/serviceaccount/ca.crt

1.1 Using vault instances external to the cluster

If you do not have this requirement, please skip this section.

See [Install the Vault Helm chart configured to address an external Vault](https://learn.hashicorp.com/tutorials/vault/kubernetes-external-vault?in=vault/kubernetes#install-the-vault-helm-chart-configured-to-address-an-external-vault)

kubernetes can also be integrated with external vault instances, with only the vault-agent deployed in the cluster.

This is suitable for multiple kubernetes clusters and other APPs that share a single vault instance, such as our local development and testing clusters, which all share the same vault instance to facilitate unified management of application secrets.

First, use the helm chart to deploy only the vault-agent, pointed at the external vault instance. An example custom-values.yaml:

global:
  # enabled is the master enabled switch. Setting this to true or false
  # will enable or disable all the components within this chart by default.
  enabled: true
  # TLS for end-to-end encrypted transport
  tlsDisable: false

injector:
  # True if you want to enable vault agent injection.
  enabled: true

  replicas: 1

  # If multiple replicas are specified, by default a leader-elector side-car
  # will be created so that only one injector attempts to create TLS certificates.
  leaderElector:
    enabled: true
    image:
      repository: "gcr.io/google_containers/leader-elector"
      tag: "0.4"
    ttl: 60s

  # If true, will enable a node exporter metrics endpoint at /metrics.
  metrics:
    enabled: false

  # External vault server address for the injector to use. Setting this will
  # disable deployment of a  vault server along with the injector.
  # TODO: how should the https ca.crt be set here? And how should mTLS be configured?
  externalVaultAddr: "https://<external-vault-url>"

  # Mount Path of the Vault Kubernetes Auth Method.
  authPath: "auth/kubernetes"

  certs:
    # secretName is the name of the secret that has the TLS certificate and
    # private key to serve the injector webhook. If this is null, then the
    # injector will default to its automatic management mode that will assign
    # a service account to the injector to generate its own certificates.
    secretName: null

    # caBundle is a base64-encoded PEM-encoded certificate bundle for the
    # CA that signed the TLS certificate that the webhook serves. This must
    # be set if secretName is non-null.
    caBundle: ""

    # certName and keyName are the names of the files within the secret for
    # the TLS cert and private key, respectively. These have reasonable
    # defaults but can be customized if necessary.
    certName: tls.crt
    keyName: tls.key

Now, on the vault instance side, enable kubernetes authentication by executing the following commands.

Note that there is no kubectl or kubeconfig inside the vault instance, so run the kubectl parts from a machine with cluster access; for simplicity, the vault commands can also be executed via the Web UI.

export VAULT_TOKEN='<your-root-token>'
export VAULT_ADDR='http://localhost:8200'
 
# Enable Kubernetes authentication
vault auth enable kubernetes
 
# kube-apiserver API configuration; vault authenticates serviceAccounts through the kube-apiserver
# TOKEN_REVIEW_JWT: the token of the secret `vault-auth` we created earlier
TOKEN_REVIEW_JWT=$(kubectl -n vault get secret vault-auth -o go-template='{{ .data.token }}' | base64 --decode)
# The kube-apiserver CA certificate
KUBE_CA_CERT=$(kubectl -n vault config view --raw --minify --flatten -o jsonpath='{.clusters[].cluster.certificate-authority-data}' | base64 --decode)
# The kube-apiserver URL
KUBE_HOST=$(kubectl config view --raw --minify --flatten -o jsonpath='{.clusters[].cluster.server}')

vault write auth/kubernetes/config \
        token_reviewer_jwt="$TOKEN_REVIEW_JWT" \
        kubernetes_host="$KUBE_HOST" \
        kubernetes_ca_cert="$KUBE_CA_CERT"

This completes the integration of kubernetes with the external vault!

2. Associate the k8s rbac permissions system with the vault

The next steps are:

  • Use vault policies to define which resources each role (microservice) can access.
  • Generate a role for each microservice, and bind the corresponding vault policy and kubernetes serviceaccount to that role.
    • The role here is a concept of vault's kubernetes plugin itself; it has nothing to do with kubernetes roles.
  • Create the ServiceAccount and deploy the microservice with it.

The first and second steps can be automated through the vault api. The third step can be done during deployment via kubectl.

For convenience, it is recommended to use the same name as the microservice for all three configurations, vault policy / role / k8s serviceaccount.

In the above configuration, the role acts as the link that associates the k8s serviceaccount with the vault policy.

For example, create a vault policy named my-app-policy with the content:

# Allow reading data
path "my-app/data/*" {
   capabilities = ["read", "list"]
}
// Allow listing all data in my-app (kv v2)
path "my-app/metadata/*" {
    capabilities = ["read", "list"]
}

Then, in the vault kubernetes plugin configuration, create a role my-app-role with the following configuration:

  • Associate the serviceaccount my-app-account in the k8s default namespace (the serviceaccount itself must also be created).
  • Associate the vault token policy, i.e. the my-app-policy created earlier.
  • Set the token period (validity).

After this, the microservice will be able to read all the information under my-app from vault via its serviceaccount.
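As mentioned earlier, these steps can be automated with pulumi_vault. A hedged sketch: the resource names and arguments follow the pulumi-vault provider, the bindings mirror this section's my-app example, and the code only does anything when run inside a pulumi program with VAULT_ADDR/VAULT_TOKEN set:

```python
import pulumi_vault as vault

# the ACL policy from this section
policy = vault.Policy(
    "my-app-policy",
    name="my-app-policy",
    policy="""
path "my-app/data/*" {
  capabilities = ["read", "list"]
}
path "my-app/metadata/*" {
  capabilities = ["read", "list"]
}
""",
)

# the kubernetes-auth role binding serviceaccount <-> policy
role = vault.kubernetes.AuthBackendRole(
    "my-app-role",
    backend="kubernetes",            # mount path of the kubernetes auth method
    role_name="my-app-role",
    bound_service_account_names=["my-app-account"],
    bound_service_account_namespaces=["default"],
    token_policies=[policy.name],
    token_ttl=3600,                  # token validity in seconds
)
```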

3. Deploy Pod

Reference: https://www.vaultproject.io/docs/platform/k8s/injector

The next step is to inject the configuration into the microservice container, which requires the Agent Sidecar Injector. Vault automates the injection and dynamic updating of the configuration via this sidecar.

Specifically, a set of Agent Sidecar Injector annotations is added to the Pod; if the configuration is large, it can be stored in a configmap and referenced from the annotations.

Note that the injected vault agent has two modes of operation:

  • init mode: runs only once before the Pod starts, then exits when done (Completed)
  • resident mode: the container does not exit; it continuously watches vault for configuration updates and keeps the Pod's configuration in sync with vault.

Example.

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: my-app
  name: my-app
  namespace: default
spec:
  minReadySeconds: 3
  progressDeadlineSeconds: 60
  revisionHistoryLimit: 3
  selector:
    matchLabels:
      app: my-app
  strategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      annotations:
        vault.hashicorp.com/agent-init-first: 'true'  # render the config file in an initContainer before the app starts
        vault.hashicorp.com/agent-inject: 'true'
        vault.hashicorp.com/secret-volume-path: vault
        vault.hashicorp.com/role: "my-app-role"  # role name in vault's kubernetes plugin
        vault.hashicorp.com/agent-inject-template-config.json: |
                                        # the template syntax is described later
        vault.hashicorp.com/agent-limits-cpu: 250m
        vault.hashicorp.com/agent-requests-cpu: 100m
        # a configmap containing the vault agent config allows finer-grained control
        # vault.hashicorp.com/agent-configmap: my-app-vault-config
      labels:
        app: my-app
    spec:
      containers:
      - image: registry.svc.local/xx/my-app:latest
        imagePullPolicy: IfNotPresent
        # 此处省略若干配置...
      serviceAccountName: my-app-account

Common errors:

  • vault-agent (sidecar) error: namespace not authorized
    • The role is not bound to the Pod's namespace.
  • vault-agent (sidecar) error: permission denied
    • Check the vault instance's logs; there should be a corresponding error. Most likely auth/kubernetes/config is misconfigured: vault cannot verify the kube-apiserver TLS certificate, or the kubernetes token used has no permission.
  • vault-agent (sidecar) error: service account not authorized
    • The role is not bound to the serviceAccount used by the Pod.

4. vault agent configuration

For the vault-agent configuration, the following should be noted.

  • If you use configmap to provide the full config.hcl configuration, note that agent-init

Notes on vault-agent templates:

The most popular configuration file formats are json/yaml. Taking json as an example: for each microservice's kv data, consider saving all of its per-service configuration under <engine-name>/<service-name>/, and then injecting the configuration with the following template:

{
    {{ range secrets "<engine-name>/metadata/<service-name>/" }}
        "{{ printf "%s" . }}": 
        {{ with secret (printf "<engine-name>/<service-name>/%s" .) }}
        {{ .Data.data | toJSONPretty }},
        {{ end }}
    {{ end }}
}

See: https://github.com/hashicorp/consul-template#secret for template syntax details

Note: the list interface changed in v2 kv secrets, so when iterating over v2 kv secrets you must write range secrets "<engine-name>/metadata/<service-name>/", i.e. insert metadata in the middle, and the policy must grant read/list on <engine-name>/metadata/<service-name>/! The official documentation doesn't mention this at all; I figured it out by capturing packets with wireshark and comparing them against the official KV Secrets Engine - Version 2 (API) docs.

The generated content is in json format, but with one incompatibility: the rendered output has a trailing comma , after the last secret.

{
    "secret-a": {
  "a": "b",
  "c": "d"
},
    "secret-b": {
  "v": "g",
  "r": "c"
},
}

Because of the trailing comma, parsing it directly with the json standard library raises an error. So how to parse it? I found the answer on the almighty stackoverflow: yaml is compatible with json syntax and tolerates the trailing comma!

Taking python as an example, yaml.safe_load() parses the json generated by vault without complaint.
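If pulling in a PyYAML dependency is undesirable, a stdlib-only fallback is to strip the trailing comma before handing the text to json. A naive sketch (the regex would also mangle string values that happen to contain ",}" or ",]"):

```python
import json
import re

def parse_vault_json(text: str) -> dict:
    # drop a comma that is followed only by whitespace and a closing brace/bracket
    return json.loads(re.sub(r",(\s*[}\]])", r"\1", text))

# shape of the vault-agent rendered output, including the trailing comma
rendered = """{
    "secret-a": {"a": "b", "c": "d"},
    "secret-b": {"v": "g", "r": "c"},
}"""
assert parse_vault_json(rendered) == {
    "secret-a": {"a": "b", "c": "d"},
    "secret-b": {"v": "g", "r": "c"},
}
```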

5. Expanding: Other ways to use vault in kubernetes

In addition to the official sidecar pattern for secrets injection, the community provides some other options:

  • hashicorp/vault-csi-provider: an official beta project that mounts vault secrets into Pods as data volumes via the Secrets Store CSI driver
  • kubernetes-external-secrets: provides CRDs to synchronize secrets from vault into kubernetes secrets

The official sidecar/init-container model is still the most recommended.

V. Automatic rotation of AWS IAM Credentials using vault

To be continued.