A simple analysis of how the ElasticSearch Operator works

ElasticSearch Operator

The core features of the current ElasticSearch Operator are:

Elasticsearch, Kibana and APM Server deployments
TLS Certificates management
Safe Elasticsearch cluster configuration & topology changes
Persistent volumes usage
Custom node configuration and attributes
Secure settings keystore updates

Installation

Installing the ElasticSearch Operator is simple: a single ‘all in one’ YAML manifest brings up all of the Operator’s components and registers the CRDs.

kubectl apply -f https://download.elastic.co/downloads/eck/1.1.2/all-in-one.yaml

CRD

The Operator registers three main CRDs: APM Server, ElasticSearch, and Kibana.

k get crd | grep elastic
apmservers.apm.k8s.elastic.co                  2020-05-10T08:02:15Z
elasticsearches.elasticsearch.k8s.elastic.co   2020-05-10T08:02:15Z
kibanas.kibana.k8s.elastic.co                  2020-05-10T08:02:15Z

ElasticSearch Cluster Demo

The following is a complete ElasticSearch cluster YAML that creates the ES cluster, the local PVs, and Kibana.

apiVersion: v1
kind: Namespace
metadata:
  creationTimestamp: null
  name: elastic-stack
---
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch
  namespace: elastic-stack
spec:
  version: 7.6.2
  nodeSets:
  - name: default
    count: 1
    config:
      node.master: true
      node.data: true
      node.ingest: true
      node.store.allow_mmap: false
---
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: kibana
  namespace: elastic-stack
spec:
  version: 7.6.2
  count: 1
  elasticsearchRef:
    name: elasticsearch
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: es-db-0
  labels:
    type: local
spec:
  volumeMode: Filesystem
  persistentVolumeReclaimPolicy: Retain
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/data/elastic/db-0"
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: es-db-1
  labels:
    type: local
spec:
  volumeMode: Filesystem
  persistentVolumeReclaimPolicy: Retain
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/data/elastic/db-1"

Operator management of ElasticSearch

Like many Operators built on a declarative API, the Elastic Operator revolves around the Reconcile function.

The Reconcile function handles the entire lifecycle of the ES cluster. The parts that interest me most, and whose implementation I briefly explain here, are:

configuration initialization and management
scale up and scale down of cluster nodes
lifecycle management of stateful applications

Cluster creation

After receiving an ElasticSearch CR, the Reconcile function first performs a number of validity checks on it, starting with whether the Operator is allowed to manage the CR at all: whether the CR carries a pause flag and whether it satisfies the Operator’s version restrictions. Once these pass, it calls internalReconcile for further processing.

The internalReconcile function begins by checking the business validity of the ElasticSearch CR. It defines a number of validations that check the CR’s parameters before any subsequent operation is performed on it.

type validation func(*Elasticsearch) field.ErrorList

// validations are the validation funcs that apply to creates or updates
var validations = []validation{
	noUnknownFields,
	validName,
	hasMaster,
	supportedVersion,
	validSanIP,
}

type updateValidation func(*Elasticsearch, *Elasticsearch) field.ErrorList

// updateValidations are the validation funcs that only apply to updates
var updateValidations = []updateValidation{
	noDowngrades,
	validUpgradePath,
	pvcModification,
}
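
How such a list of validation funcs gets applied can be sketched roughly as follows. This is a simplified illustration, not the Operator’s code: the cluster type, the string-based error list (standing in for field.ErrorList), the name-length limit, and the validate helper are all assumptions made to keep the example stdlib-only.

```go
package main

import (
	"fmt"
	"strings"
)

// cluster is a hypothetical, heavily simplified stand-in for the Elasticsearch CR.
type cluster struct {
	Name        string
	MasterNodes int
}

// validation mirrors the shape of the validation funcs above, but returns
// plain error strings instead of field.ErrorList.
type validation func(cluster) []string

// validName rejects empty or overly long names (the limit is illustrative).
func validName(c cluster) []string {
	if c.Name == "" || len(c.Name) > 36 {
		return []string{"name must be 1-36 characters"}
	}
	return nil
}

// hasMaster requires at least one master-eligible node.
func hasMaster(c cluster) []string {
	if c.MasterNodes < 1 {
		return []string{"at least one master node is required"}
	}
	return nil
}

// validate runs every check and collects all errors instead of stopping at the first.
func validate(c cluster, checks []validation) []string {
	var errs []string
	for _, check := range checks {
		errs = append(errs, check(c)...)
	}
	return errs
}

func main() {
	checks := []validation{validName, hasMaster}
	errs := validate(cluster{Name: "elasticsearch", MasterNodes: 0}, checks)
	fmt.Println(strings.Join(errs, "; "))
}
```

Collecting every error in one pass lets the user fix all problems in the CR at once rather than discovering them one reconciliation at a time.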

Once the ES CR passes these validity checks, the real Reconcile logic begins.

I have divided the subsequent Driver operations into three parts.

Reconcile Kubernetes Resource
Reconcile ElasticSearch Cluster Business Config & Resource
Reconcile Node Spec

The first part cleans up mismatched Kubernetes resources, then checks and creates the Script ConfigMap and the two Services.

ElasticSearch will use two services, which are created and corrected in this step.

TransportService: headless service, used by the es cluster zen discovery
ExternalService: L4 load balancing for es data nodes

$ k -n elastic-stack get svc
NAME                         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
elasticsearch-es-http        ClusterIP   10.96.42.27     <none>        9200/TCP   103d
elasticsearch-es-transport   ClusterIP   None            <none>        9300/TCP   103d

The Script ConfigMap surprised me. Because an ES cluster is stateful, part of the work happens at startup (initialization) and at shutdown (wrap-up). The Operator generates the relevant scripts, mounts them into the Pod via a ConfigMap, and executes them in the Pod’s lifecycle hooks. It renders three scripts, whose names are self-explanatory:

readiness-probe-script.sh
pre-stop-hook-script.sh
prepare-fs.sh

After the K8s resources are created, the other dependencies the ES cluster needs in order to run, such as CAs and certificates, user and permission profiles, and the seed host configuration, are created as the appropriate ConfigMaps or Secrets, ready to be injected into the Pod at startup.

The Operator also initializes the Observer here. This component periodically polls the ES state and caches the latest state of the cluster, which in effect implements a Watch on cluster state, as explained later.

Once these startup dependencies are ready, all that remains is to create the specific resources to try to pull the Pod up.

The actual creation and correction of ES resources is done in two phases, the dividing line being the readiness of the ES cluster (whether it is accessible via its Service).

The first phase starts with safety checks before anything is built:

whether the local cache of resource objects meets expectations
whether the StatefulSets and Pods are in order (Generations and number of Pods)

The expected StatefulSet and Service resources are then built from the CR, and subsequent operations try to converge on the end state built here.

For the resources described in the end state, the Operator creates them in a throttled fashion. The details are somewhat complicated, but the basic process is to adjust the replica count of each StatefulSet step by step until it reaches the expected value.
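
The step-by-step adjustment can be pictured with a small sketch. The nextUpscale helper and its maxStep parameter are hypothetical names chosen for illustration; the Operator derives its actual limits differently, so treat this only as the general shape of a bounded scale-up.

```go
package main

import "fmt"

// nextUpscale returns the replica count to apply in one reconciliation pass,
// moving from actual towards expected by at most maxStep replicas.
// It never lowers the count: scale-down is handled by a separate code path.
func nextUpscale(actual, expected, maxStep int) int {
	if expected <= actual {
		return actual
	}
	if expected-actual > maxStep {
		return actual + maxStep
	}
	return expected
}

func main() {
	replicas, expected := 1, 5
	for replicas < expected {
		replicas = nextUpscale(replicas, expected, 2)
		fmt.Println("set replicas to", replicas)
	}
}
```

Each pass makes bounded progress and relies on the next reconciliation to continue, which is why a single change to the CR may take several Reconcile rounds to apply fully.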

If an old Pod needs to be updated, it is simply deleted (a plain but effective delete po) to force the update. This ends the first phase, by which point the associated K8s resources have basically been created.

However, the creation of the ES cluster is not yet complete. Once the Operator can access the ES cluster through the http client, the second phase of creation is performed.

The first step is to adjust the Zen Discovery configuration based on the current Master count and the Voting-related configuration.
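
For clusters that still use Zen1 discovery (ES 6.x), the relevant setting is discovery.zen.minimum_master_nodes, which is conventionally set to a majority of the master-eligible nodes. A minimal sketch of that arithmetic follows; the function name is hypothetical and this is not the Operator’s code.

```go
package main

import "fmt"

// quorum returns the smallest majority of masterCount master-eligible nodes.
func quorum(masterCount int) int {
	return masterCount/2 + 1
}

func main() {
	for _, masters := range []int{1, 3, 5} {
		fmt.Printf("%d masters -> minimum_master_nodes = %d\n", masters, quorum(masters))
	}
}
```

Because the value depends on the current number of masters, it has to be recomputed whenever master nodes are added or removed, which is why this adjustment sits in the phase where the cluster is reachable.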

Scale-down and rolling upgrades are handled afterwards, but at this point the creation of the cluster is complete.

Rolling Upgrades

Since ElasticSearch is a stateful application, much like a database, I am interested in how ES cluster upgrades and subsequent lifecycle maintenance are handled. In Reconcile Node Specs, scaling up is relatively simple thanks to ES’s domain-based self-discovery via Zen: new Pods join the cluster automatically once they are added to the Endpoints.

However, since each node maintains part of the shard, node offline or node upgrade will involve the handling of shard data.

func HandleDownscale(
	downscaleCtx downscaleContext,
	expectedStatefulSets sset.StatefulSetList,
	actualStatefulSets sset.StatefulSetList,
) *reconciler.Results {
	results := &reconciler.Results{}

	// make sure we only downscale nodes we're allowed to
	downscaleState, err := newDownscaleState(downscaleCtx.k8sClient, downscaleCtx.es)
	if err != nil {
		return results.WithError(err)
	}

	// compute the list of StatefulSet downscales and deletions to perform
	downscales, deletions := calculateDownscales(*downscaleState, expectedStatefulSets, actualStatefulSets)

	// remove actual StatefulSets that should not exist anymore (already downscaled to 0 in the past)
	// this is safe thanks to expectations: we're sure 0 actual replicas means 0 corresponding pods exist
	if err := deleteStatefulSets(deletions, downscaleCtx.k8sClient, downscaleCtx.es); err != nil {
		return results.WithError(err)
	}

	// migrate data away from nodes that should be removed
	// if leavingNodes is empty, it clears any existing settings
	leavingNodes := leavingNodeNames(downscales)
	if err := migration.MigrateData(downscaleCtx.parentCtx, downscaleCtx.es, downscaleCtx.esClient, leavingNodes); err != nil {
		return results.WithError(err)
	}

	for _, downscale := range downscales {
		// attempt the StatefulSet downscale (may or may not remove nodes)
		requeue, err := attemptDownscale(downscaleCtx, downscale, actualStatefulSets)
		if err != nil {
			return results.WithError(err)
		}
		if requeue {
			// retry downscaling this statefulset later
			results.WithResult(defaultRequeue)
		}
	}

	return results
}

The logic of scaling down, i.e. taking nodes offline, is not complicated. It again starts by computing the difference between the expected and current state to determine what replica count each StatefulSet should be adjusted to.

If the target replica count is zero, the StatefulSet is deleted directly; otherwise, the node removal process begins.

The first step is to work out which nodes need to be taken offline, and then to trigger shard reallocation through the cluster settings API so that the leaving nodes are excluded.
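
The exclusion step can be sketched as building a cluster settings request body. cluster.routing.allocation.exclude._name is a real Elasticsearch setting that takes a comma-separated list of node names. The helper name, the choice of the transient scope, and resetting the setting with null are assumptions made for illustration, not the Operator’s actual code.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// excludeSettings builds a body for PUT /_cluster/settings that asks
// Elasticsearch to move shards away from the given nodes.
// With no leaving nodes the value is null, which resets the setting.
func excludeSettings(leavingNodes []string) ([]byte, error) {
	var value interface{} // nil marshals to JSON null
	if len(leavingNodes) > 0 {
		value = strings.Join(leavingNodes, ",")
	}
	body := map[string]interface{}{
		"transient": map[string]interface{}{
			"cluster.routing.allocation.exclude._name": value,
		},
	}
	return json.Marshal(body)
}

func main() {
	body, err := excludeSettings([]string{"elasticsearch-es-default-2"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

Clearing the exclusion when nothing is leaving matters as much as setting it: a stale exclusion would keep shards off a node name that may later be reused.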

Finally, it checks whether the shards on each leaving node have been cleared. If not, it requeues for a later pass; if so, it performs the actual replica update.
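
The "are the shards gone yet?" decision can be sketched as below. The shard type and the nodesStillHoldingShards helper are hypothetical; in practice the shard-to-node assignment would come from the ES client, which is omitted here.

```go
package main

import "fmt"

// shard is a hypothetical, minimal view of one shard-to-node assignment.
type shard struct {
	Index string
	Node  string
}

// nodesStillHoldingShards returns the leaving nodes that still hold at least
// one shard, in the order they appear in leaving.
func nodesStillHoldingShards(shards []shard, leaving []string) []string {
	occupied := make(map[string]bool, len(shards))
	for _, s := range shards {
		occupied[s.Node] = true
	}
	var pending []string
	for _, node := range leaving {
		if occupied[node] {
			pending = append(pending, node)
		}
	}
	return pending
}

func main() {
	shards := []shard{{"logs", "es-0"}, {"logs", "es-2"}}
	pending := nodesStillHoldingShards(shards, []string{"es-1", "es-2"})
	if len(pending) > 0 {
		// Migration is not finished: try again in a later reconciliation.
		fmt.Println("requeue, still migrating:", pending)
		return
	}
	fmt.Println("safe to reduce replicas")
}
```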

For rolling upgrades, the first step is to compute the old and new resources and remove the old ones. Once that is done, shard allocation is re-enabled through the ES client so that the shards in the cluster can recover.

Watch

As mentioned above, the ElasticSearch Operator has a built-in Observer module that implements Watch for ES cluster state by polling.

The ObserverManager manages multiple Observers. Each ES cluster has a single Observer instance that polls the cluster state at regular intervals. If the state changes, the Observer triggers its registered listeners.

Only one listener is implemented, healthChangeListener. It is very simple: when it detects that the cluster health has changed, it sends an event to the chan.

// healthChangeListener returns an OnObservation listener that feeds a generic
// event when a cluster's observed health has changed.
func healthChangeListener(reconciliation chan event.GenericEvent) OnObservation {
	return func(cluster types.NamespacedName, previous State, new State) {
		// no-op if health hasn't change
		if !hasHealthChanged(previous, new) {
			return
		}

		// trigger a reconciliation event for that cluster
		evt := event.GenericEvent{
			Meta: &metav1.ObjectMeta{
				Namespace: cluster.Namespace,
				Name:      cluster.Name,
			},
		}
		reconciliation <- evt
	}
}
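
The poll, cache, and notify pattern described in this section can be sketched with the standard library alone. Everything here is an assumption made for illustration: the observer type, its poll field (a stand-in for a call to the cluster health API), and the string events are not the Operator’s types.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

type health string

// listener is called with the previous and current health after each observation.
type listener func(previous, current health)

type observer struct {
	poll      func() health // hypothetical stand-in for querying ES cluster health
	last      health        // cached result of the most recent observation
	listeners []listener
}

// observe performs one poll, caches the result, and notifies every listener.
func (o *observer) observe() {
	current := o.poll()
	previous := o.last
	o.last = current
	for _, l := range o.listeners {
		l(previous, current)
	}
}

// run calls observe at a fixed interval until the context is cancelled.
func (o *observer) run(ctx context.Context, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			o.observe()
		}
	}
}

// healthChanged returns a listener that sends on events only when health differs.
func healthChanged(events chan<- string) listener {
	return func(previous, current health) {
		if previous == current {
			return
		}
		events <- fmt.Sprintf("%q -> %q", previous, current)
	}
}

func main() {
	states := []health{"green", "green", "yellow"}
	i := 0
	events := make(chan string, len(states))
	o := &observer{
		poll:      func() health { h := states[i]; i++; return h },
		listeners: []listener{healthChanged(events)},
	}
	// Drive the observer by hand instead of calling run, so the demo is deterministic.
	for range states {
		o.observe()
	}
	close(events)
	for e := range events {
		fmt.Println("reconcile triggered by health change:", e)
	}
}
```

Filtering in the listener keeps the reconcile queue quiet: polling happens on every tick, but only an actual health transition produces an event.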

The chan is wired into the Watch capability provided by controller-runtime, so posting an event triggers the Operator’s Reconcile process. In this way a change in business state is detected and the CR is handed back to the Operator for correction.

// Controller implements a Kubernetes API.  A Controller manages a work queue fed reconcile.Requests
// from source.Sources.  Work is performed through the reconcile.Reconciler for each enqueued item.
// Work typically is reads and writes Kubernetes objects to make the system state match the state specified
// in the object Spec.
type Controller interface {
	// Reconciler is called to reconcile an object by Namespace/Name
	reconcile.Reconciler

	// Watch takes events provided by a Source and uses the EventHandler to
	// enqueue reconcile.Requests in response to the events.
	//
	// Watch may be provided one or more Predicates to filter events before
	// they are given to the EventHandler.  Events will be passed to the
	// EventHandler if all provided Predicates evaluate to true.
	Watch(src source.Source, eventhandler handler.EventHandler, predicates ...predicate.Predicate) error

	// Start starts the controller.  Start blocks until stop is closed or a
	// controller has an error starting.
	Start(stop <-chan struct{}) error
}

Operator’s License Management

ElasticSearch is commercially licensed software, and the license management in the Operator gave me a new understanding of license management for applications on K8s.

At the end of last year, I was involved in developing a K8s-based system and was unsure how to manage licenses for a “cloud operating system” like K8s; the ES Operator gave me a concrete solution.

First, the structure of the license. The Operator defines two kinds of licenses. One is the license provided to the ES cluster; this model is what eventually gets applied to the cluster.

// License models the Elasticsearch license applied to a cluster. Signature will be empty on reads. IssueDate,  ExpiryTime and Status can be empty on writes.
type License struct {
	Status             string     `json:"status,omitempty"`
	UID                string     `json:"uid"`
	Type               string     `json:"type"`
	IssueDate          *time.Time `json:"issue_date,omitempty"`
	IssueDateInMillis  int64      `json:"issue_date_in_millis"`
	ExpiryDate         *time.Time `json:"expiry_date,omitempty"`
	ExpiryDateInMillis int64      `json:"expiry_date_in_millis"`
	MaxNodes           int        `json:"max_nodes,omitempty"`
	MaxResourceUnits   int        `json:"max_resource_units,omitempty"`
	IssuedTo           string     `json:"issued_to"`
	Issuer             string     `json:"issuer"`
	StartDateInMillis  int64      `json:"start_date_in_millis"`
	Signature          string     `json:"signature,omitempty"`
}

The other is the License structure managed by the Operator itself, which performs verification and its own logic based on these models.

type ElasticsearchLicense struct {
	License client.License `json:"license"`
}

type EnterpriseLicense struct {
	License LicenseSpec `json:"license"`
}

type LicenseSpec struct {
	Status             string                 `json:"status,omitempty"`
	UID                string                 `json:"uid"`
	Type               OperatorLicenseType    `json:"type"`
	IssueDate          *time.Time             `json:"issue_date,omitempty"`
	IssueDateInMillis  int64                  `json:"issue_date_in_millis"`
	ExpiryDate         *time.Time             `json:"expiry_date,omitempty"`
	ExpiryDateInMillis int64                  `json:"expiry_date_in_millis"`
	MaxInstances       int                    `json:"max_instances,omitempty"`
	MaxResourceUnits   int                    `json:"max_resource_units,omitempty"`
	IssuedTo           string                 `json:"issued_to"`
	Issuer             string                 `json:"issuer"`
	StartDateInMillis  int64                  `json:"start_date_in_millis"`
	Signature          string                 `json:"signature,omitempty"`
	ClusterLicenses    []ElasticsearchLicense `json:"cluster_licenses"`
	Version            int                    // not marshalled but part of the signature
}

License validation and use

License validation in the Operator is simple but adequate (probably sufficient for legal purposes), and is carried out jointly by the License Controller and the ElasticSearch Controller.

The License Controller watches the ElasticSearch CR. After receiving a new event, it looks for a Secret containing a License in the same Namespace as the Operator and selects a usable License based on the expiration time, ES version, and other information.

Next, using the public key injected at compile time, it verifies the License’s signature. If verification passes, it creates a dedicated Secret containing the License (named after the cluster plus a fixed suffix) for the ElasticSearch CR.

The ElasticSearch Controller is the main controller managing the ElasticSearch lifecycle. After receiving events for the CR, it determines whether the ES cluster is ready (that is, whether HTTP requests can be made through the Service). If it is, it looks up the Secret containing the License by naming convention and, if the Secret exists, updates the License through the HTTP client.

Summary

Because ElasticSearch is a stateful application, the ElasticSearch Operator does more than manage K8s resources: it also uses the ES Client to carry out lifecycle management, babysitting the cluster along the way. This is a clever design, but it relies heavily on the ES Cluster’s own self-management capabilities (e.g., reallocation of data shards, self-discovery, etc.).

If the stateful application being managed does not have such mature self-management capabilities, each correction will take multiple requeued reconciliations to complete, which inevitably lengthens recovery time. For stateful applications, the longer the recovery time (downtime), the more damage is done. Perhaps a better direction is to separate instance management (Pod management) from business management (application configuration, data recovery, and so on).
