Open Hackathon uses MongoDB as its database, and when the database runs in containers, keeping the data safe is the top priority.

Storage issues

Stateful applications deployed on Kubernetes need a PersistentVolume (PV), but if the storage backing the PV is unreliable, the data is still not safe even with a PV.

In a typical scenario, the application defines a PersistentVolumeClaim (PVC) that describes the storage it requires and references that PersistentVolumeClaim in the Pod. The cluster then creates or finds a PersistentVolume that matches the description in the PersistentVolumeClaim, so when the Pod reads and writes to the Volume in the container, the data is persisted to the PersistentVolume.

There are two ways to create a PersistentVolume. In the first, the cluster administrator creates several PersistentVolumes manually. When a PersistentVolumeClaim is created, the cluster looks for a PersistentVolume that meets its requirements and binds the two together. When the PersistentVolumeClaim is deleted, the binding is released and the PersistentVolume is handled according to its reclaim policy.
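As a rough sketch of this first approach, an administrator might pre-create a PersistentVolume like the one below. The name `mongo-pv-manual`, the NFS server address and the export path are placeholders invented for illustration; they are not taken from the Open Hackathon deployment.

```yaml
# Hypothetical, manually created PersistentVolume (static provisioning).
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mongo-pv-manual            # assumed name
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteOnce
  # Retain keeps the volume and its data after the bound claim is deleted;
  # an administrator must then clean it up or release it by hand.
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: 10.0.0.10              # placeholder NFS server
    path: /exports/mongo           # placeholder export path
```

A claim binds to this volume only if its requested size and access modes can be satisfied by the values declared here.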

The second way is to create a default StorageClass backed by storage that supports dynamic provisioning. When a PersistentVolumeClaim is created, the cluster automatically creates a PersistentVolume in the underlying storage and binds it to the claim. When the PersistentVolumeClaim is deleted, the PersistentVolume is automatically cleaned up as well (this is the behavior of the default Delete reclaim policy).
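A default StorageClass for this second approach could look roughly like the sketch below. The class name and the provisioner string are hypothetical; the real provisioner depends on the storage backend installed in the cluster. The `is-default-class` annotation is the standard way to mark a class as the cluster default.

```yaml
# Hypothetical default StorageClass (dynamic provisioning).
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard-dynamic           # assumed name
  annotations:
    # Claims that name no storage class will use this class.
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: example.com/placeholder-provisioner   # placeholder, backend specific
# Delete removes the dynamically created volume when its claim is deleted.
reclaimPolicy: Delete
```

Setting `reclaimPolicy: Retain` instead would keep the volume after the claim is deleted, which may be the safer choice for database data.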

Whichever of the two approaches you choose, you need to consider what the underlying storage is, i.e. where exactly the PersistentVolume data is stored. If you are building your own cluster, you can consider Ceph or GlusterFS.

On a public cloud, you can use the storage service offered by the provider. For example, in Huawei Cloud CCE (a hosted Kubernetes cluster), you can use cloud disks as the underlying storage: https://support.huaweicloud.com/usermanual-cce/cce_01_0044.html

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mongo-pv-claim
  namespace: ohp
  annotations:
    volume.beta.kubernetes.io/storage-class: sata
    volume.beta.kubernetes.io/storage-provisioner: flexvolume-huawei.com/fuxivol
  labels:
    failure-domain.beta.kubernetes.io/region: cn-north-1
    failure-domain.beta.kubernetes.io/zone: cn-north-1a
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 10Gi

Single Instance Deployment

For scenarios without high availability requirements, a single-instance deployment is enough: you only need to run one instance of MongoDB and mount persistent storage for that instance.

apiVersion: v1
kind: Secret
metadata:
  name: mongo-auth
  namespace: ohp
type: Opaque
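data:
  # Base64 of "root" and "password" without a trailing newline
  # (encode with `echo -n`, otherwise the newline becomes part of the value).
  username: cm9vdA==
  password: cGFzc3dvcmQ=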
---
apiVersion: v1
kind: Service
metadata:
  name: mongo
  namespace: ohp
spec:
  ports:
  - port: 27017
  selector:
    app: mongo
  clusterIP: None
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mongo
  namespace: ohp
spec:
  selector:
    matchLabels:
      app: mongo
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: mongo
    spec:
      containers:
      - image: mongo
        name: mongo
        env:
        - name: MONGO_INITDB_ROOT_USERNAME
          valueFrom:
            secretKeyRef:
              name: mongo-auth
              key: username
        - name: MONGO_INITDB_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mongo-auth
              key: password
        ports:
        - containerPort: 27017
          name: mongo
        volumeMounts:
        - name: db-persistent-storage
          mountPath: /data/db
      volumes:
      - name: db-persistent-storage
        persistentVolumeClaim:
          claimName: mongo-pv-claim

There are three points to note.

  1. Mongo’s authentication username and password are configured in a Secret.
  2. The Service sets ClusterIP to None (a headless Service), which means that the service name resolves directly to the Pod IP.
  3. The Deployment’s update strategy is Recreate. Because the PV can be mounted by only one Pod, you should avoid running multiple Pods at the same time (so the Deployment cannot be scaled out).
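Hand-encoding base64 values for a Secret is error-prone, since a stray trailing newline is easy to introduce. As an alternative sketch, the same Secret could be written with the `stringData` field, which accepts plain strings that Kubernetes encodes when the object is stored. The credentials shown are the example values `root` and `password`, not something to use in production.

```yaml
# Alternative form of the mongo-auth Secret using plain-text stringData.
apiVersion: v1
kind: Secret
metadata:
  name: mongo-auth
  namespace: ohp
type: Opaque
stringData:
  username: root        # example value only
  password: password    # example value only, replace with a strong password
```

The Deployment can reference this Secret through `secretKeyRef` exactly as before, because the stored object still exposes the `username` and `password` keys.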

High Availability Cluster

There are many options for making Mongo highly available. A post on the Kubernetes Blog describes how to build a highly available MongoDB on GCE using the replica set approach. The replica set approach is probably the simplest one on Kubernetes, since you only need to define a StatefulSet.

The StatefulSet maintains multiple Pods, each of which has a PV for persistent storage.
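A minimal sketch of such a StatefulSet is shown below, assuming a headless Service named `mongo` already exists in the `ohp` namespace. The replica set name `rs0`, the replica count and the storage size are assumptions made for illustration. The sketch omits authentication between members, and the replica set still has to be initiated (for example with `rs.initiate()`) after the Pods start.

```yaml
# Hypothetical StatefulSet: three MongoDB Pods, each with its own PVC.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongo
  namespace: ohp
spec:
  serviceName: mongo               # headless Service that gives each Pod a stable DNS name
  replicas: 3                      # assumed member count
  selector:
    matchLabels:
      app: mongo
  template:
    metadata:
      labels:
        app: mongo
    spec:
      containers:
      - name: mongo
        image: mongo
        # Start mongod as a member of the (assumed) replica set "rs0".
        command: ["mongod", "--replSet", "rs0", "--bind_ip_all"]
        ports:
        - containerPort: 27017
          name: mongo
        volumeMounts:
        - name: db-persistent-storage
          mountPath: /data/db
  # One PersistentVolumeClaim is created per Pod from this template.
  volumeClaimTemplates:
  - metadata:
      name: db-persistent-storage
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi
```

With this layout the Pods are named `mongo-0`, `mongo-1` and `mongo-2`, and each keeps its own claim across restarts and rescheduling.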

For details, please refer to: Running MongoDB on Kubernetes with StatefulSets