Loki, a recent open source project from the Grafana Labs team, is a horizontally scalable, highly available, multi-tenant log aggregation system. It is designed to be very cost effective and easy to operate because it does not index the content of the logs, but instead attaches a set of labels to each log stream. The project was inspired by Prometheus; the official description is: "Like Prometheus, but for logs."

Overview

Unlike other logging systems, Loki indexes only your log metadata as labels (just like Prometheus labels) and does not full-text index the raw log data. The log data itself is compressed and stored as chunks (blocks) in an object store (such as S3 or GCS) or even on a local filesystem. A small index and highly compressed chunks greatly simplify operations and reduce the cost of running Loki.

A more detailed version of this overview can be found in the Loki Architecture documentation.

Multi-tenant

Loki supports a multi-tenant model in which data is completely separated between tenants. Multi-tenancy is implemented with a tenant ID (an alphanumeric string). When multi-tenancy mode is disabled, a "fake" tenant ID is used internally for all requests.
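
To illustrate, a push for a specific tenant can be made by setting the X-Scope-OrgID header on the HTTP push API. The tenant name, address, and labels below are only examples, assuming a local Loki running with auth_enabled: true:

$ curl -H "Content-Type: application/json" -H "X-Scope-OrgID: tenant1" \
    -XPOST http://localhost:3100/loki/api/v1/push \
    --data-raw '{"streams": [{"stream": {"job": "test"}, "values": [["1588888888000000000", "hello loki"]]}]}'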

Operation Mode

Loki can be run at small scale locally or scaled out horizontally. It ships with a single-process mode that runs all of the required microservices in one process. Single-process mode is ideal for trying out Loki or for small deployments. For horizontal scaling, Loki's microservices can be split into separate processes, allowing each to scale independently.

Components

Distributor

The distributor service is responsible for handling the logs written by clients; essentially it is the first stop on the write path for log data. Once the distributor receives log data, it splits the data into batches and sends them to multiple ingesters in parallel.

The distributor communicates with the ingesters over gRPC. Distributors are stateless, so we can scale them up and down as needed.

Hashing

The distributor uses consistent hashing together with a configurable replication factor to determine which ingester instances should receive the log data.

The hash is generated from the log stream's labels and the tenant ID.

A hash ring stored in Consul is used to implement the consistent hashing; each ingester registers its own set of tokens into the ring. The distributor then finds the token that most closely matches the hash of the log stream and sends the data to the holder of that token.
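
As a rough sketch of the idea (illustrative Go, not Loki's actual implementation; the type and helper names are made up), the distributor hashes the tenant ID plus the stream labels and picks the first token on the ring that is greater than or equal to that hash:

package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

type ringEntry struct {
	token    uint32
	ingester string
}

// hashStream derives a hash from the tenant ID and the stream's labels.
func hashStream(tenantID, labels string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(tenantID))
	h.Write([]byte(labels))
	return h.Sum32()
}

// lookup finds the first ring token >= the stream hash (wrapping around)
// and returns the ingester that registered that token.
func lookup(ring []ringEntry, streamHash uint32) string {
	i := sort.Search(len(ring), func(i int) bool { return ring[i].token >= streamHash })
	if i == len(ring) {
		i = 0 // wrap around the ring
	}
	return ring[i].ingester
}

func main() {
	ring := []ringEntry{
		{token: 100, ingester: "ingester-1"},
		{token: 2000, ingester: "ingester-2"},
		{token: 4000000000, ingester: "ingester-3"},
	}
	sort.Slice(ring, func(i, j int) bool { return ring[i].token < ring[j].token })

	h := hashStream("fake", `{job="syslog"}`)
	fmt.Printf("stream hash %d -> %s\n", h, lookup(ring, h))
}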

Consistency

Since all distributors share the same hash ring, a write request can be sent to any distributor.

To ensure consistent query results, Loki uses Dynamo-style quorum consistency on reads and writes. This means the distributor waits for a positive response from at least half plus one of the ingesters it sent samples to before responding to the client.
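
To make the arithmetic concrete: with a replication factor of 3, a write needs acknowledgements from 2 ingesters before it is considered successful. A trivial sketch (illustrative, not Loki's code):

package main

import "fmt"

// quorum returns the number of acknowledgements required for a
// Dynamo-style quorum: floor(replicationFactor/2) + 1.
func quorum(replicationFactor int) int {
	return replicationFactor/2 + 1
}

func main() {
	fmt.Println(quorum(3)) // 2 acks needed when the replication factor is 3
}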

Ingester

The ingester service is responsible for writing log data to the long-term storage backends (DynamoDB, S3, Cassandra, etc.).

The ingester verifies that received log lines arrive in increasing timestamp order (i.e., each log line has a later timestamp than the one before it). When the ingester receives a log line that does not follow the expected order, the line is rejected and an error is returned to the user. For more details, see the Timestamp ordering section.

Logs for each unique set of labels are built up into chunks in memory and then flushed to the backend storage.

If an ingester process crashes or exits abruptly, any data that has not yet been flushed to storage is lost. Loki is typically configured to keep multiple replicas (usually 3) of each log to mitigate this risk.
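
The replication factor is part of the ingester ring configuration. A minimal sketch of the relevant section of the Loki configuration file, assuming Consul is used as the ring's key-value store, might look like this:

ingester:
  lifecycler:
    ring:
      kvstore:
        store: consul
        consul:
          host: localhost:8500
      replication_factor: 3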

Timestamp ordering

In general, every log line pushed to Loki must have a newer timestamp than the previously received line for the same stream. There may, however, be cases where multiple log lines share the same nanosecond timestamp, which is handled as follows (a small sketch of this logic appears after the list).

  • If the incoming line exactly matches the previously received line (both the timestamp and the log text match), the incoming line is treated as an exact duplicate and ignored.
  • If the incoming line has the same timestamp as the previous line but different log text, the line is accepted. This means it is possible to have two different log lines with the same timestamp.
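
The acceptance logic sketched below (illustrative Go, not Loki's actual code) captures these rules for a single stream:

package main

import (
	"errors"
	"fmt"
	"time"
)

type entry struct {
	ts   time.Time
	line string
}

type stream struct {
	last entry
}

var errOutOfOrder = errors.New("entry out of order")

// push applies the ordering rules: older timestamps are rejected, exact
// duplicates (same timestamp and text) are ignored, and new text at the
// same timestamp is accepted.
func (s *stream) push(e entry) error {
	switch {
	case e.ts.Before(s.last.ts):
		return errOutOfOrder
	case e.ts.Equal(s.last.ts) && e.line == s.last.line:
		return nil // exact duplicate, ignore
	default:
		s.last = e
		return nil
	}
}

func main() {
	s := &stream{}
	now := time.Now()
	fmt.Println(s.push(entry{now, "a"}))                     // accepted
	fmt.Println(s.push(entry{now, "a"}))                     // duplicate, ignored
	fmt.Println(s.push(entry{now, "b"}))                     // same timestamp, new text: accepted
	fmt.Println(s.push(entry{now.Add(-time.Second), "old"})) // rejected as out of order
}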

Handoff

By default, when an ingester shuts down and tries to leave the hash ring, it waits to see whether a new ingester is trying to join before flushing, and attempts to initiate a handoff. The handoff transfers all tokens and in-memory chunks owned by the leaving ingester to the new ingester.

This is done to avoid flushing all chunks at shutdown, which would be a slow and time-consuming process.

File system support

The ingester supports writing to the filesystem via BoltDB, but this only works in single-process mode, because the querier needs access to the same backend store and BoltDB only allows one process to hold a lock on the DB at a time.
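
For reference, the single-process local configuration that ships with Loki (loki-local-config.yaml) uses roughly the following storage section, with BoltDB for the index and the local filesystem for chunks:

storage_config:
  boltdb:
    directory: /tmp/loki/index
  filesystem:
    directory: /tmp/loki/chunks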

Querier

The querier service is responsible for handling LogQL queries and evaluating the log data stored in long-term storage.

It first queries all ingesters for in-memory data before falling back to the backend store to load the data.

Query frontend

This service is an optional component that sits in front of a set of queriers. It schedules requests fairly among them, parallelizes them as much as possible, and caches results.

Chunk store

The chunk store is Loki’s long-term data store, designed to support interactive queries and sustained writes without background maintenance tasks. It consists of the following components.

  • An index for the chunks, which can be backed by DynamoDB, Bigtable, or Cassandra.
  • A key-value (KV) store for the chunk data itself, which can be DynamoDB, Bigtable, Cassandra, or an object store such as S3.

Unlike the other core components of Loki, the chunk store is not a separate service, job, or process, but a library embedded in the ingesters and queriers that need to access Loki data.

The chunk store relies on a unified “NoSQL” store interface (DynamoDB, Bigtable, and Cassandra) that can be used to back the chunk store index. The interface assumes that the index is a collection of entries keyed as follows.

  • Hash key - required for all reads and writes.
  • Range key - required for writes and optional for reads; it can be queried by prefix or range.

The supported databases implement this interface slightly differently.

  • DynamoDB natively supports both range and hash keys, so index entries are modeled directly as DynamoDB rows, with the hash key as the partition key and the range key as the sort key.
  • For Bigtable and Cassandra, index entries are modeled as individual column values: the hash key becomes the row key and the range key becomes the column key.
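
To make the hash key/range key model concrete, here is an illustrative (not Loki's actual) Go interface that such an index client might satisfy:

package index

import "context"

// Entry is one index row: the hash key is required for every read and
// write, the range key is required for writes and optional for reads.
type Entry struct {
	HashKey  string
	RangeKey []byte
	Value    []byte
}

// Client abstracts the "NoSQL" stores that can back the chunk store index.
type Client interface {
	// Write stores a batch of index entries.
	Write(ctx context.Context, entries []Entry) error
	// QueryByHash returns all entries for a hash key.
	QueryByHash(ctx context.Context, hashKey string) ([]Entry, error)
	// QueryByRangePrefix returns the entries for a hash key whose range
	// key starts with the given prefix.
	QueryByRangePrefix(ctx context.Context, hashKey string, prefix []byte) ([]Entry, error)
}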

A set of schemas is used to map the matchers and label sets used on reads and writes into the appropriate index operations. New schemas are added as Loki evolves, mainly to better balance writes and improve query performance.

Comparison with other logging systems

The EFK stack (Elasticsearch, Fluentd, Kibana) is used to collect, visualize, and query logs from various sources.

Data in Elasticsearch is stored on disk as unstructured JSON objects. Both the keys of each object and the contents of each key are indexed. Data can then be queried using a JSON-based query language (the Query DSL) or through the Lucene query syntax.

In contrast, Loki in single-binary mode can store data on local disk, while in horizontally scalable mode data is stored in a cloud storage system such as S3, GCS, or Cassandra. Logs are stored as plain text and tagged with a set of label names and values, and only the labels are indexed. This tradeoff makes Loki cheaper to operate than a full index. Logs in Loki are queried using LogQL. Because of this design tradeoff, LogQL queries that filter on content (i.e., text within the log lines) must load all chunks within the search window that match the labels defined in the query.
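
For example, a content-filter query like the following (the label name is just an example) has to open every chunk in the queried time range for streams matching {app="nginx"} and scan the lines for the string "error":

{app="nginx"} |= "error"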

Fluentd is typically used to collect logs and forward them to Elasticsearch. Fluentd is known as a data collector: it can gather logs from many sources, process them, and then forward them to one or more targets.

Promtail, by contrast, is tailored for Loki. Its primary mode of operation is to discover log files stored on disk and forward them to Loki along with a set of labels. Promtail can do service discovery for Kubernetes Pods running on the same node, act as a Docker log driver, read logs from a specified folder, and continuously tail the systemd journal.

Loki represents logs with a set of labels, similar to how Prometheus represents metrics. When deployed alongside Prometheus, the logs from Promtail typically carry the same labels as your application metrics because the same service discovery mechanism is used. Having the same labels on logs and metrics lets users switch seamlessly between the two, which helps with root cause analysis.

Kibana is used to visualize and search Elasticsearch data and is very powerful for analyzing that data. Kibana provides many visualization tools for data analysis, such as maps, machine learning for anomaly detection, and relationship graphs. Alerts can also be configured to notify users when something unexpected happens.

In contrast, Grafana is specifically tailored to time series data from data sources such as Prometheus and Loki. Dashboards can be set up to visualize metrics (with logging support coming soon), and data can also be queried ad hoc using the Explore view. Like Kibana, Grafana supports alerting based on your metrics.

Installation

The official recommendation is to use Tanka for installation. Tanka is a re-implementation of Ksonnet used for production deployments inside Grafana, but it is not yet widely used and few people are familiar with it, so we won’t cover that method here. The following three methods are covered instead.

Installing Loki with Helm

Prerequisites

First make sure you have a Kubernetes cluster deployed and the Helm client installed and configured, then add Loki’s chart repository:

$ helm repo add loki https://grafana.github.io/loki/charts

The chart repository can be updated using the following command.

$ helm repo update

Deploy Loki

Deploy with default configuration

$ helm upgrade --install loki loki/loki-stack

Specify namespace

$ helm upgrade --install loki --namespace=loki loki/loki

Specify configuration

$ helm upgrade --install loki loki/loki --set "key1=val1,key2=val2,..."

Deploy Loki tool stack (Loki, Promtail, Grafana, Prometheus)

$ helm upgrade --install loki loki/loki-stack --set grafana.enabled=true,prometheus.enabled=true,prometheus.alertmanager.persistentVolume.enabled=false,prometheus.server.persistentVolume.enabled=false

Deploy Loki tool stack (Loki, Fluent Bit, Grafana, Prometheus)

$ helm upgrade --install loki loki/loki-stack \
    --set fluent-bit.enabled=true,promtail.enabled=false,grafana.enabled=true,prometheus.enabled=true,prometheus.alertmanager.persistentVolume.enabled=false,prometheus.server.persistentVolume.enabled=false

Deploying Grafana

To install Grafana to a Kubernetes cluster using Helm, you can use the command shown below.

$ helm install stable/grafana -n loki-grafana

To obtain the Grafana administrator password, you can use the command shown below.

$ kubectl get secret --namespace <YOUR-NAMESPACE> loki-grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo

To access the Grafana UI page, you can use the following command.

$ kubectl port-forward --namespace <YOUR-NAMESPACE> service/loki-grafana 3000:80

Then open http://localhost:3000 in your browser and log in with admin and the password output above. Then follow the prompts to add the Loki data source with the Loki address http://loki:3100.

Accessing Loki using HTTPS Ingress

If Loki and Promtail are deployed on different clusters, you can put an Ingress object in front of Loki. It can then be accessed over HTTPS by adding a certificate, and for security you can enable Basic Auth authentication on the Ingress.

In Promtail, set the following values to use HTTPS and Basic Auth authentication for communication.

loki:
  serviceScheme: https
  user: user
  password: pass

An example of Ingress’ Helm template is shown below.

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: {{ .Values.ingress.class }}
    ingress.kubernetes.io/auth-type: "basic"
    ingress.kubernetes.io/auth-secret: {{ .Values.ingress.basic.secret }}
  name: loki
spec:
  rules:
  - host: {{ .Values.ingress.host }}
    http:
      paths:
      - backend:
          serviceName: loki
          servicePort: 3100
  tls:
  - secretName: {{ .Values.ingress.cert }}
    hosts:
    - {{ .Values.ingress.host }}
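
The Basic Auth secret referenced by .Values.ingress.basic.secret can be created from an htpasswd file. Assuming an NGINX-style ingress controller and a secret named loki-basic-auth (both assumptions), something like the following would work:

$ htpasswd -c auth promtail
$ kubectl create secret generic loki-basic-auth --from-file=auth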

Installing Loki with Docker

We can use Docker or Docker Compose to install Loki for evaluation, testing, or development, but for production environments we recommend the Tanka or Helm methods.

Prerequisites

  • Docker
  • Docker Compose (optional, installation is only required if you use the Docker Compose method)

Installing with Docker


Linux

Run the commands below. They download the loki-config.yaml and promtail-config.yaml configuration files into the current directory, and the Docker containers then use these configuration files to run Loki and Promtail.

$ wget https://raw.githubusercontent.com/grafana/loki/v1.5.0/cmd/loki/loki-local-config.yaml -O loki-config.yaml
$ docker run -v $(pwd):/mnt/config -p 3100:3100 grafana/loki:1.5.0 -config.file=/mnt/config/loki-config.yaml
$ wget https://raw.githubusercontent.com/grafana/loki/v1.5.0/cmd/promtail/promtail-docker-config.yaml -O promtail-config.yaml
$ docker run -v $(pwd):/mnt/config -v /var/log:/var/log grafana/promtail:1.5.0 -config.file=/mnt/config/promtail-config.yaml
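
To verify that the Loki container is up, you can hit its readiness and metrics endpoints (assuming the default port mapping above):

$ curl http://localhost:3100/ready
$ curl -s http://localhost:3100/metrics | head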

Installing with Docker Compose

$ wget https://raw.githubusercontent.com/grafana/loki/v1.5.0/production/docker-compose.yaml -O docker-compose.yaml
$ docker-compose -f docker-compose.yaml up

Local installation of Loki

Binary files

Each release includes binaries for Loki, which can be found on GitHub’s Release page.

openSUSE Linux Installer

The community provides Loki packages for openSUSE Linux.

Manual build

Prerequisite

  • Go version 1.13+
  • Make
  • Docker (for updating protobuf files and yacc files)

Build

Clone the Loki code into the $GOPATH/src/github.com/grafana/loki path.

$ git clone https://github.com/grafana/loki $GOPATH/src/github.com/grafana/loki

Then switch to the code directory and execute the make loki command.

$ cd $GOPATH/src/github.com/grafana/loki
$ make loki

# The final binary will be generated at ./cmd/loki/loki.
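
You can then start the freshly built binary with the local example configuration that ships in the same repository:

$ ./cmd/loki/loki -config.file=./cmd/loki/loki-local-config.yaml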

Getting Started with Loki

Loki configuration in Grafana

Grafana has built-in support for Loki in versions 6.0 and above. It is recommended that you use 6.3 or later to have access to the new LogQL features.

  • Log in to your Grafana instance. If this is the first time you are running Grafana, the username and password default to admin.
  • In Grafana, go to Configuration > Data Sources using the icon on the left sidebar.
  • Click the + Add data source button.
  • Select Loki from the list.
  • The Http URL field is the address of your Loki server. For example, it might be http://localhost:3100 when running locally or when running Docker with port mapping. When running with docker-compose or Kubernetes, the address is likely to be http://loki:3100.
  • To view the logs, click Explore in the sidebar, select the Loki data source in the top-left drop-down menu, and then use the Logs tab to filter the log streams.

Querying Loki with LogCLI

If you prefer a command line interface, LogCLI allows users to use LogQL queries against the Loki server.

Installation

Binary (recommended)

The release package downloaded from the Release page contains the logcli binaries.

Source code installation

You can also compile the source directly with Go, using the go get command shown below; the logcli binary will appear in the $GOPATH/bin directory.

$ go get github.com/grafana/loki/cmd/logcli

Usage examples

Assuming you are currently using Grafana Cloud, you need to set the following environment variables.

$ export LOKI_ADDR=https://logs-us-west1.grafana.net
$ export LOKI_USERNAME=<username>
$ export LOKI_PASSWORD=<password>

If you are running Loki locally, you can point LogCLI directly at the local instance without a username and password:

$ export LOKI_ADDR=http://localhost:3100

Note: If you put a proxy server in front of Loki and configure authentication, you still need to set the corresponding LOKI_USERNAME and LOKI_PASSWORD environment variables.

Once configured, you can use some of the logcli commands as shown below.

$ logcli labels job
https://logs-dev-ops-tools1.grafana.net/api/prom/label/job/values
cortex-ops/consul
cortex-ops/cortex-gw
...

$ logcli query '{job="cortex-ops/consul"}'
https://logs-dev-ops-tools1.grafana.net/api/prom/query?query=%7Bjob%3D%22cortex-ops%2Fconsul%22%7D&limit=30&start=1529928228&end=1529931828&direction=backward&regexp=
Common labels: {job="cortex-ops/consul", namespace="cortex-ops"}
2018-06-25T12:52:09Z {instance="consul-8576459955-pl75w"} 2018/06/25 12:52:09 [INFO] raft: Snapshot to 475409 complete
2018-06-25T12:52:09Z {instance="consul-8576459955-pl75w"} 2018/06/25 12:52:09 [INFO] raft: Compacting logs from 456973 to 465169
...

$ logcli series -q --match='{namespace="loki",container_name="loki"}'
{app="loki", container_name="loki", controller_revision_hash="loki-57c9df47f4", filename="/var/log/pods/loki_loki-0_8ed03ded-bacb-4b13-a6fe-53a445a15887/loki/0.log", instance="loki-0", job="loki/loki", name="loki", namespace="loki", release="loki", statefulset_kubernetes_io_pod_name="loki-0", stream="stderr"}

Batch Search

Starting with Loki 1.6.0, logcli sends log queries to Loki in batches.

If you set the query’s --limit parameter (default 30) to a larger number, say 10000, logcli automatically sends the request to Loki in batches, with a default batch size of 1000.

Loki has a server-side limit on the maximum number of rows returned per query (default 5000). Batching allows you to issue requests larger than the server-side limit, as long as the --batch size is smaller than the server limit.
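
For example, the following query asks for 10000 entries but fetches them in batches of 2000, which stays under the default server-side limit of 5000 (the label selector is just an example):

$ logcli query '{job="syslog"}' --limit=10000 --batch=2000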

Note that query metadata is printed on stderr for each batch; this can be suppressed with the --quiet parameter.

Configuration values are applied in order of increasing precedence: environment variables first, then command-line flags.

Command Details

Detailed information on the use of the logcli command line tool is shown below.

$ logcli help
usage: logcli [<flags>] <command> [<args> ...]

A command-line for loki.

Flags:
      --help             Show context-sensitive help (also try --help-long and --help-man).
      --version          Show application version.
  -q, --quiet            Suppress query metadata.
      --stats            Show query statistics.
  -o, --output=default   Specify output mode [default, raw, jsonl]. raw suppresses log labels and timestamp.
  -z, --timezone=Local   Specify the timezone to use when formatting output timestamps [Local, UTC].
      --cpuprofile=""    Specify the location for writing a CPU profile.
      --memprofile=""    Specify the location for writing a memory profile.
      --addr="http://localhost:3100"
                         Server address. Can also be set using LOKI_ADDR env var.
      --username=""      Username for HTTP basic auth. Can also be set using LOKI_USERNAME env var.
      --password=""      Password for HTTP basic auth. Can also be set using LOKI_PASSWORD env var.
      --ca-cert=""       Path to the server Certificate Authority. Can also be set using LOKI_CA_CERT_PATH env var.
      --tls-skip-verify  Server certificate TLS skip verify.
      --cert=""          Path to the client certificate. Can also be set using LOKI_CLIENT_CERT_PATH env var.
      --key=""           Path to the client certificate key. Can also be set using LOKI_CLIENT_KEY_PATH env var.
      --org-id=""        adds X-Scope-OrgID to API requests for representing tenant ID. Useful for requesting tenant data when
                         bypassing an auth gateway.

Commands:
  help [<command>...]
    Show help.

  query [<flags>] <query>
    Run a LogQL query.

    The "query" command is useful for querying for logs. Logs can be returned in a few output modes:

      raw: log line
      default: log timestamp + log labels + log line
      jsonl: JSON response from Loki API of log line

    The output of the log can be specified with the "-o" flag, for example, "-o raw" for the raw output format.

    The "query" command will output extra information about the query and its results, such as the API URL, set of common labels,
    and set of excluded labels. This extra information can be suppressed with the --quiet flag.

    While "query" does support metrics queries, its output contains multiple data points between the start and end query time.
    This output is used to build graphs, like what is seen in the Grafana Explore graph view. If you are querying metrics and just
    want the most recent data point (like what is seen in the Grafana Explore table view), then you should use the "instant-query"
    command instead.

  instant-query [<flags>] <query>
    Run an instant LogQL query.

    The "instant-query" command is useful for evaluating a metric query for a single point in time. This is equivalent to the
    Grafana Explore table view; if you want a metrics query that is used to build a Grafana graph, you should use the "query"
    command instead.

    This command does not produce useful output when querying for log lines; you should always use the "query" command when you
    are running log queries.

    For more information about log queries and metric queries, refer to the LogQL documentation:

    https://grafana.com/docs/loki/latest/logql/

  labels [<flags>] [<label>]
    Find values for a given label.

  series [<flags>] <matcher>
    Run series query.

$ logcli help query
usage: logcli query [<flags>] <query>

Run a LogQL query.

The "query" command is useful for querying for logs. Logs can be returned in a few output modes:

  raw: log line
  default: log timestamp + log labels + log line
  jsonl: JSON response from Loki API of log line

The output of the log can be specified with the "-o" flag, for example, "-o raw" for the raw output format.

The "query" command will output extra information about the query and its results, such as the API URL, set of common labels, and
set of excluded labels. This extra information can be suppressed with the --quiet flag.

While "query" does support metrics queries, its output contains multiple data points between the start and end query time. This
output is used to build graphs, like what is seen in the Grafana Explore graph view. If you are querying metrics and just want the
most recent data point (like what is seen in the Grafana Explore table view), then you should use the "instant-query" command
instead.

Flags:
      --help               Show context-sensitive help (also try --help-long and --help-man).
      --version            Show application version.
  -q, --quiet              Suppress query metadata.
      --stats              Show query statistics.
  -o, --output=default     Specify output mode [default, raw, jsonl]. raw suppresses log labels and timestamp.
  -z, --timezone=Local     Specify the timezone to use when formatting output timestamps [Local, UTC].
      --cpuprofile=""      Specify the location for writing a CPU profile.
      --memprofile=""      Specify the location for writing a memory profile.
      --addr="http://localhost:3100"
                           Server address. Can also be set using LOKI_ADDR env var.
      --username=""        Username for HTTP basic auth. Can also be set using LOKI_USERNAME env var.
      --password=""        Password for HTTP basic auth. Can also be set using LOKI_PASSWORD env var.
      --ca-cert=""         Path to the server Certificate Authority. Can also be set using LOKI_CA_CERT_PATH env var.
      --tls-skip-verify    Server certificate TLS skip verify.
      --cert=""            Path to the client certificate. Can also be set using LOKI_CLIENT_CERT_PATH env var.
      --key=""             Path to the client certificate key. Can also be set using LOKI_CLIENT_KEY_PATH env var.
      --org-id=""          adds X-Scope-OrgID to API requests for representing tenant ID. Useful for requesting tenant data when
                           bypassing an auth gateway.
      --limit=30           Limit on number of entries to print.
      --since=1h           Lookback window.
      --from=FROM          Start looking for logs at this absolute time (inclusive).
      --to=TO              Stop looking for logs at this absolute time (exclusive).
      --step=STEP          Query resolution step width, for metric queries. Evaluate the query at the specified step over the time
                           range.
      --interval=INTERVAL  Query interval, for log queries. Return entries at the specified interval, ignoring those between.
                           **This parameter is experimental, please see Issue 1779**.
      --batch=1000         Query batch size to use until 'limit' is reached.
      --forward            Scan forwards through logs.
      --no-labels          Do not print any labels.
      --exclude-label=EXCLUDE-LABEL ...
                           Exclude labels given the provided key during output.
      --include-label=INCLUDE-LABEL ...
                           Include labels given the provided key during output.
      --labels-length=0    Set a fixed padding to labels.
      --store-config=""    Execute the current query using a configured storage from a given Loki configuration file.
  -t, --tail               Tail the logs.
      --delay-for=0        Delay in tailing by number of seconds to accumulate logs for re-ordering.
      --colored-output     Show ouput with colored labels.

Args:
  <query>  eg '{foo="bar",baz=~".*blip"} |~ ".*error.*"'

$ logcli help labels
usage: logcli labels [<flags>] [<label>]

Find values for a given label.

Flags:
      --help             Show context-sensitive help (also try --help-long and --help-man).
      --version          Show application version.
  -q, --quiet            Suppress query metadata.
      --stats            Show query statistics.
  -o, --output=default   Specify output mode [default, raw, jsonl]. raw suppresses log labels and timestamp.
  -z, --timezone=Local   Specify the timezone to use when formatting output timestamps [Local, UTC].
      --cpuprofile=""    Specify the location for writing a CPU profile.
      --memprofile=""    Specify the location for writing a memory profile.
      --addr="http://localhost:3100"
                         Server address. Can also be set using LOKI_ADDR env var.
      --username=""      Username for HTTP basic auth. Can also be set using LOKI_USERNAME env var.
      --password=""      Password for HTTP basic auth. Can also be set using LOKI_PASSWORD env var.
      --ca-cert=""       Path to the server Certificate Authority. Can also be set using LOKI_CA_CERT_PATH env var.
      --tls-skip-verify  Server certificate TLS skip verify.
      --cert=""          Path to the client certificate. Can also be set using LOKI_CLIENT_CERT_PATH env var.
      --key=""           Path to the client certificate key. Can also be set using LOKI_CLIENT_KEY_PATH env var.
      --org-id=""        adds X-Scope-OrgID to API requests for representing tenant ID. Useful for requesting tenant data when
                         bypassing an auth gateway.
      --since=1h         Lookback window.
      --from=FROM        Start looking for labels at this absolute time (inclusive).
      --to=TO            Stop looking for labels at this absolute time (exclusive).

Args:
  [<label>]  The name of the label.

$ logcli help series
usage: logcli series --match=MATCH [<flags>]

Run series query.

Flags:
      --help             Show context-sensitive help (also try --help-long and --help-man).
      --version          Show application version.
  -q, --quiet            Suppress query metadata.
      --stats            Show query statistics.
  -o, --output=default   Specify output mode [default, raw, jsonl]. raw suppresses log labels and timestamp.
  -z, --timezone=Local   Specify the timezone to use when formatting output timestamps [Local, UTC].
      --cpuprofile=""    Specify the location for writing a CPU profile.
      --memprofile=""    Specify the location for writing a memory profile.
      --addr="http://localhost:3100"
                         Server address. Can also be set using LOKI_ADDR env var.
      --username=""      Username for HTTP basic auth. Can also be set using LOKI_USERNAME env var.
      --password=""      Password for HTTP basic auth. Can also be set using LOKI_PASSWORD env var.
      --ca-cert=""       Path to the server Certificate Authority. Can also be set using LOKI_CA_CERT_PATH env var.
      --tls-skip-verify  Server certificate TLS skip verify.
      --cert=""          Path to the client certificate. Can also be set using LOKI_CLIENT_CERT_PATH env var.
      --key=""           Path to the client certificate key. Can also be set using LOKI_CLIENT_KEY_PATH env var.
      --org-id=""        adds X-Scope-OrgID to API requests for representing tenant ID. Useful for requesting tenant data when
                         bypassing an auth gateway.
      --since=1h         Lookback window.
      --from=FROM        Start looking for logs at this absolute time (inclusive).
      --to=TO            Stop looking for logs at this absolute time (exclusive).
      --match=MATCH ...  eg '{foo="bar",baz=~".*blip"}'

Label

Labels are key-value pairs that can define anything; we like to call them metadata describing a log stream. If you are familiar with Prometheus, you will already know labels. They are also defined in Loki’s scrape configuration and serve the same purpose as in Prometheus, which makes it very easy to correlate application metrics with log data.

Labels in Loki perform a very important task: they define the stream. More precisely, the combination of every label key and value defines a stream. If just one label value changes, a new stream is created.

If you are familiar with Prometheus, the term used there is series, and Prometheus has an additional dimension: the metric name. Loki simplifies this: there are no metric names, only labels, so it was decided to use the term streams instead of series.

Label example

The following example will illustrate the basic use and concepts of Label tags in Loki.

First, take a look at the following example.

scrape_configs:
 - job_name: system
   pipeline_stages:
   static_configs:
   - targets:
      - localhost
     labels:
      job: syslog
      __path__: /var/log/syslog

This configuration will tail the log file and attach a job=syslog label to it, which we can query like this:

{job="syslog"}

This creates one stream in Loki. Now let’s add a few more jobs to the configuration.

scrape_configs:
 - job_name: system
   pipeline_stages:
   static_configs:
   - targets:
      - localhost
     labels:
      job: syslog
      __path__: /var/log/syslog
 - job_name: system
   pipeline_stages:
   static_configs:
   - targets:
      - localhost
     labels:
      job: apache
      __path__: /var/log/apache.log

Now there are two streams in Loki, and we can query them in several ways.

{job="apache"} <- 显示 job 标签为 apache 的日志
{job="syslog"} <- 显示 job 标签为 syslog 的日志
{job=~"apache|syslog"} <- 显示 job 标签为 apache 或者 syslog 的日志

The last example uses a regex label matcher to get logs whose job label value is apache or syslog. Next, let’s see how to use additional labels.

scrape_configs:
 - job_name: system
   pipeline_stages:
   static_configs:
   - targets:
      - localhost
     labels:
      job: syslog
      env: dev
      __path__: /var/log/syslog
 - job_name: system
   pipeline_stages:
   static_configs:
   - targets:
      - localhost
     labels:
      job: apache
      env: dev
      __path__: /var/log/apache.log

To get the logs of both jobs, we can use the following instead of the regex approach.

{env="dev"} <- 将返回所有带有 env=dev 标签的日志

By using a single label it is possible to query many log streams, and by combining several different labels you can create very flexible log queries.

Labels are the index to Loki’s log data. They are used to find the compressed log content, which is stored separately as chunks. Each unique combination of labels and values defines a stream, and the logs of a stream are batched, compressed, and stored as chunks.

Cardinality

The previous examples use statically defined labels with a single value; however, labels can also be defined dynamically. For example, suppose we have log data like the following.

11.11.11.11 - frank [25/Jan/2000:14:00:01 -0500] "GET /1986.js HTTP/1.1" 200 932 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 GTB6"

We can use the following to parse this log data.

- job_name: system
  pipeline_stages:
    - regex:
        expression: "^(?P<ip>\\S+) (?P<identd>\\S+) (?P<user>\\S+) \\[(?P<timestamp>[\\w:/]+\\s[+\\-]\\d{4})\\] \"(?P<action>\\S+)\\s?(?P<path>\\S+)?\\s?(?P<protocol>\\S+)?\" (?P<status_code>\\d{3}|-) (?P<size>\\d+|-)\\s?\"?(?P<referer>[^\"]*)\"?\\s?\"?(?P<useragent>[^\"]*)?\"?$"
    - labels:
        action:
        status_code:
  static_configs:
  - targets:
     - localhost
    labels:
     job: apache
     env: dev
     __path__: /var/log/apache.log

This regex matches each component of the log line and extracts the value of each component into a capture group. Inside the pipeline code, this data is placed into a temporary data structure that allows it to be used for other processing while the log line is being processed (at which point the temporary data is discarded).

From this regex, we then use two of these capture groups to dynamically set two tags based on the content of the log line itself.

action (e.g. action="GET", action="POST")
status_code (e.g. status_code="200", status_code="400")

Suppose we have the following lines of log data:

11.11.11.11 - frank [25/Jan/2000:14:00:01 -0500] "GET /1986.js HTTP/1.1" 200 932 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 GTB6"
11.11.11.12 - frank [25/Jan/2000:14:00:02 -0500] "POST /1986.js HTTP/1.1" 200 932 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 GTB6"
11.11.11.13 - frank [25/Jan/2000:14:00:03 -0500] "GET /1986.js HTTP/1.1" 400 932 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 GTB6"
11.11.11.14 - frank [25/Jan/2000:14:00:04 -0500] "POST /1986.js HTTP/1.1" 400 932 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 GTB6"

Then, after Loki collects these logs, they are created as streams as shown below.

{job="apache",env="dev",action="GET",status_code="200"} 11.11.11.11 - frank [25/Jan/2000:14:00:01 -0500] "GET /1986.js HTTP/1.1" 200 932 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 GTB6"
{job="apache",env="dev",action="POST",status_code="200"} 11.11.11.12 - frank [25/Jan/2000:14:00:02 -0500] "POST /1986.js HTTP/1.1" 200 932 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 GTB6"
{job="apache",env="dev",action="GET",status_code="400"} 11.11.11.13 - frank [25/Jan/2000:14:00:03 -0500] "GET /1986.js HTTP/1.1" 400 932 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 GTB6"
{job="apache",env="dev",action="POST",status_code="400"} 11.11.11.14 - frank [25/Jan/2000:14:00:04 -0500] "POST /1986.js HTTP/1.1" 400 932 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.1.7) Gecko/20091221 Firefox/3.5.7 GTB6"

These 4 log lines become 4 separate streams and start populating 4 separate chunks. Any additional log lines matching these label/value combinations are added to the existing streams. If another unique label combination comes in (e.g. status_code="500"), yet another new stream is created.

For example, if we set a label for the client IP, not only does each request from a user become a unique stream, but each request with a different action or status_code from the same user gets its own stream as well.

If there are 4 common actions (GET, PUT, POST, DELETE) and 4 common status codes (there could well be more than 4!), that is 16 streams and 16 separate chunks. Now multiply that by every user: if we use a label for IP, you will quickly have thousands or tens of thousands of streams.

Cardinality this high is enough to bring Loki down.

When we talk about cardinality, we mean the combination of labels and values and the number of streams they create. High cardinality means using labels with a large range of possible values, such as IP, or combining many other labels, even ones with a small, bounded set of values, such as status_code and action.

High cardinality causes Loki to build a huge index (💰💰💰💰) and flush thousands of tiny chunks to object storage (slow). Loki currently performs very poorly in this configuration, and it is very uneconomical to run and use.

Loki Performance Optimization

Now that we know it is bad to use many labels, or labels with many values, how should logs be queried? If none of the data is indexed, won’t queries be really slow?

People coming to Loki from other, index-heavy solutions often feel that they need to define a lot of labels in order to query their logs efficiently; after all, many other logging systems index everything, and that is the usual way of thinking.

When using Loki, you may need to unlearn that habit and look at how the problem can be solved with parallelization. Loki’s superpower is breaking queries into small pieces and scheduling them in parallel, so you can query huge amounts of log data in a short time.

Large indexes are complex and expensive. Typically, a full-text index of your log data is the same size as or larger than the log data itself. To query your logs, this index needs to be loaded, probably into memory for performance, which is very difficult to scale; as you collect more logs, the index grows very large.

With Loki, the index is typically an order of magnitude smaller than the volume of logs collected. So if you do a good job of keeping the number of streams to a minimum, the index grows very slowly compared to the logs collected.

Loki effectively keeps your static costs as low as possible (index size, memory requirements, and static log storage) and lets query performance be controlled by horizontal scaling at runtime.

To see how this works, let’s go back to the example above of querying access log data for a specific IP address. Instead of using a label to store the IP, we use a filter expression to query for it.

{job="apache"} |= "11.11.11.11"

Behind the scenes, Loki breaks that query into smaller shards, opens each chunk for the streams whose labels match, and starts looking for that IP address.

The size of these shards and the amount of parallelization are configurable and based on the resources you provide. If you want, you can configure the shard interval to 5m, deploy 20 queriers, and process gigabytes of logs in seconds. Or you can go even bigger and configure 200 queriers to process terabytes of logs!

This tradeoff, a smaller index with parallel brute-force queries versus a larger and faster full-text index, is what makes Loki so cost effective relative to other systems. The cost and complexity of operating a large index is high and usually fixed: you pay for it 24 hours a day whether you are querying it or not.

The advantage of this design is that you can decide how much query power you want and change it on demand. Query performance becomes a function of how much you want to spend. Meanwhile the data is heavily compressed and stored in low-cost object stores like S3 and GCS, which minimizes fixed operational costs while still providing incredibly fast query capabilities.