Multi-line logs (e.g. exception stack traces) provide a lot of very valuable information for debugging application problems. In today’s distributed microservices world it is common to collect logs centrally with stacks such as ELK or EFK, but without proper configuration these schemes do not treat a multi-line log as a whole: each line is treated as a separate log entry, which is very hard to work with.

In this article, we will describe the strategies of some common log collection tools for handling multi-line logs.

## JSON

The easiest way to ensure that multiple lines of logging are processed as a single event is to write the log in JSON format. Take the following typical Java application log as an example.

```
# javaApp.log
2019-08-14 14:51:22,299 ERROR [http-nio-8080-exec-8] classOne: Index out of range
java.lang.StringIndexOutOfBoundsException: String index out of range: 18
	at java.lang.String.charAt(String.java:658)
	at com.example.app.loggingApp.classOne.getResult(classOne.java:15)
	at com.example.app.loggingApp.AppController.tester(AppController.java:27)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:190)
	at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:138)
[...]
```

If the above log is collected as-is, every line will be recognized as a separate log entry. It is much easier to record the log in JSON format and then ingest the JSON data, for example by using Log4J2 to write it in the following format.

```json
{"@timestamp":"2019-08-14T18:46:04.449+00:00","@version":"1","message":"Index out of range","logger_name":"com.example.app.loggingApp.classOne","thread_name":"http-nio-5000-exec-6","level":"ERROR","level_value":40000,"stack_trace":"java.lang.StringIndexOutOfBoundsException: String index out of range: 18\n\tat java.lang.String.charAt(String.java:658)\n\tat com.example.app.loggingApp.classOne.getResult(classOne.java:15)\n\tat com.example.app.loggingApp.AppController.tester(AppController.java:27)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:498)\n\tat org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:190)\n\tat org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:138)\n\tat [...]"}
```

The entire log message is now contained in a single JSON object, including the complete exception stack. Most tools support parsing JSON log data directly, so this is the simplest and, from an operations point of view, the most worry-free approach. Unfortunately, many developers are reluctant to write their logs in JSON format.

## Logstash

For those using Logstash, supporting multi-line logs is not difficult: Logstash can parse them with the multiline codec, which is configured in the input section of the pipeline. For example, the following configuration tells Logstash to match ISO8601-format timestamps in the log file and, whenever it finds one, to fold every subsequent line that does not begin with such a timestamp into the previous log entry.

```
input {
  file {
    path => "/var/app/current/logs/javaApp.log"
    mode => "tail"
    codec => multiline {
      pattern => "^%{TIMESTAMP_ISO8601} "
      negate => true
      what => "previous"
    }
  }
}
```
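The collapsing behaviour of `negate => true` / `what => "previous"` can be sketched in a few lines of Python, assuming a simplified timestamp pattern: any line that does not start with a new timestamp is folded into the event before it.

```python
import re

# Simplified stand-in for %{TIMESTAMP_ISO8601}
# (assumption: events begin with "YYYY-MM-DD HH:MM:SS")
TS = re.compile(r"^\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}")

def collapse(lines):
    """Mimic the multiline codec with negate => true, what => "previous":
    lines NOT beginning with a timestamp are appended to the prior event."""
    events = []
    for line in lines:
        if TS.match(line) or not events:
            events.append(line)
        else:
            events[-1] += "\n" + line
    return events
```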

## Fluentd

Similar to Logstash, Fluentd allows us to use a plugin to handle multi-line logs, and we can configure the plugin to accept one or more regular expressions. Take the following Python multi-line log as an example.

```
2019-08-01 18:58:05,898 ERROR:Exception on main handler
Traceback (most recent call last):
  File "python-logger.py", line 9, in make_log
    return word[13]
IndexError: string index out of range
```

Without the multiline parser, Fluentd treats each line as a complete log entry. We can add a multiline parsing rule to the <source> block; it must include a format_firstline parameter that specifies what a new log entry starts with, and in addition you can use regular-expression groups to capture attributes from the log, as in the configuration below.

```
<source>
  @type tail
  path /path/to/pythonApp.log
  tag sample.tag
  <parse>
    @type multiline
    format_firstline /\d{4}-\d{1,2}-\d{1,2}/
    format1 /(?<timestamp>[^ ]* [^ ]*) (?<level>[^\s]+:)(?<message>[\s\S]*)/
  </parse>
</source>
```

In the parse section we specify the multiline parser with @type multiline, then use format_firstline to define where a new multi-line log begins (here just a simple regular expression matching the date), and finally specify matching patterns for the remaining parts and assign names to them; here we split the log into timestamp, level, and message fields.
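To sanity-check the format1 pattern, the same regular expression can be exercised in Python, which spells named groups `(?P<name>...)` rather than `(?<name>...)`:

```python
import re

# format1 from the Fluentd config, translated to Python's named-group syntax
LINE_RE = re.compile(
    r"(?P<timestamp>[^ ]* [^ ]*) (?P<level>[^\s]+:)(?P<message>[\s\S]*)"
)

# [\s\S]* also crosses newlines, so the traceback lands in "message"
record = LINE_RE.match(
    "2019-08-01 18:58:05,898 ERROR:Exception on main handler\n"
    "Traceback (most recent call last):"
)
```

Note that the level group keeps the trailing colon, which is why the parsed output shows "ERROR:".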

With the parsing rules above in place, Fluentd now treats each traceback as a single log entry.

```json
{
  "timestamp": "2019-08-01 19:22:14,196",
  "level": "ERROR:",
  "message": "Exception on main handler\nTraceback (most recent call last):\n File \"python-logger.py\", line 9, in make_log\n return word[13]\nIndexError: string index out of range"
}
```

The log has been structured as JSON, and the group names we captured have become the JSON keys.

The official Fluentd documentation also contains several worked examples.

### Rails Logs

For example, the entered Rails logs look like this.

```
Started GET "/users/123/" for 127.0.0.1 at 2013-06-14 12:00:11 +0900
Processing by UsersController#show as HTML
  Parameters: {"user_id"=>"123"}
  Rendered users/show.html.erb within layouts/application (0.3ms)
Completed 200 OK in 4ms (Views: 3.2ms | ActiveRecord: 0.0ms)
```

We can use the following parsing configuration for multi-line matching.

```
<parse>
  @type multiline
  format_firstline /^Started/
  format1 /Started (?<method>[^ ]+) "(?<path>[^"]+)" for (?<host>[^ ]+) at (?<time>[^ ]+ [^ ]+ [^ ]+)\n/
  format2 /Processing by (?<controller>[^#]+)#(?<controller_method>[^ ]+) as (?<format>[^ ]+?)\n/
  format3 /(  Parameters: (?<parameters>[^ ]+)\n)?/
  format4 /  Rendered (?<template>[^ ]+) within (?<layout>.+) \([\d\.]+ms\)\n/
  format5 /Completed (?<code>[^ ]+) [^ ]+ in (?<runtime>[\d\.]+)ms \(Views: (?<view_runtime>[\d\.]+)ms \| ActiveRecord: (?<ar_runtime>[\d\.]+)ms\)/
</parse>
```

The logs obtained after parsing are shown below.

```json
{
  "method"           : "GET",
  "path"             : "/users/123/",
  "host"             : "127.0.0.1",
  "controller"       : "UsersController",
  "controller_method": "show",
  "format"           : "HTML",
  "parameters"       : "{ \"user_id\":\"123\"}",
  ...
}
```
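As a quick check, the pattern for the Started line can be tried out in Python (named groups translated to `(?P<name>...)`; the time group is simplified to `.+` here):

```python
import re

# First-line pattern from the Rails example, Python named-group syntax
STARTED_RE = re.compile(
    r'Started (?P<method>[^ ]+) "(?P<path>[^"]+)" for (?P<host>[^ ]+) at (?P<time>.+)'
)

m = STARTED_RE.match(
    'Started GET "/users/123/" for 127.0.0.1 at 2013-06-14 12:00:11 +0900'
)
```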

### Java Stack Logging

For example, the log we are now parsing is shown below.

```
2013-3-03 14:27:33 [main] INFO  Main - Start
2013-3-03 14:27:33 [main] ERROR Main - Exception
javax.management.RuntimeErrorException: null
    at Main.main(Main.java:16) ~[bin/:na]
2013-3-03 14:27:33 [main] INFO  Main - End
```

Then we can use the following parsing rule for multi-line matching.

```
<parse>
  @type multiline
  format_firstline /\d{4}-\d{1,2}-\d{1,2}/
  format1 /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}) \[(?<thread>.*)\] (?<level>[^\s]+)(?<message>.*)/
</parse>
```

The parsed logs look like this.

```json
{
  "thread" : "main",
  "level"  : "INFO",
  "message": " Main - Start"
}
{
  "thread" : "main",
  "level"  : "ERROR",
  "message": " Main - Exception\njavax.management.RuntimeErrorException: null\n    at Main.main(Main.java:16) ~[bin/:na]"
}
{
  "thread" : "main",
  "level"  : "INFO",
  "message": " Main - End"
}
```

In addition to format_firstline, which specifies the start line of a multi-line log, the configurations above use format1 through formatN (where N ranges from 1 to 20), a list of Regexp patterns for the successive parts of a multi-line log. For readability, you can split a long Regexp pattern into multiple formatN parameters; Fluentd concatenates these patterns to construct the full regular expression for the multi-line log.
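A rough Python sketch of this concatenation idea (not Fluentd's actual implementation): two formatN-style patterns are joined into one regular expression that spans consecutive lines. Field names follow the Rails example and use Python's `(?P<name>...)` group syntax.

```python
import re

# Two formatN-style fragments, each matching one line of the event
format1 = r'Started (?P<method>[^ ]+) "(?P<path>[^"]+)" for .*\n'
format2 = r'Processing by (?P<controller>[^#]+)#(?P<controller_method>[^ ]+) as (?P<format>[^ ]+)'

# Concatenating the fragments yields one pattern for the whole multi-line log
combined = re.compile(format1 + format2)
```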

## Fluent Bit

Fluent Bit’s tail input plugin also provides configuration options for handling multi-line logs, for example, let’s now work with the previous Python multi-line logs.

```
2019-08-01 18:58:05,898 ERROR:Exception on main handler
Traceback (most recent call last):
  File "python-logger.py", line 9, in make_log
    return word[13]
IndexError: string index out of range
```

If we don’t use a multiline parser, Fluent Bit will likewise treat each line as a separate log entry. We can structure the multi-line log using Fluent Bit’s built-in regex parser plugin, configured as follows.

```
[PARSER]
    Name    log_date
    Format  regex
    Regex   /\d{4}-\d{1,2}-\d{1,2}/

[PARSER]
    Name    log_attributes
    Format  regex
    Regex   /(?<timestamp>[^ ]* [^ ]*) (?<level>[^\s]+:)(?<message>[\s\S]*)/

[INPUT]
    Name              tail
    tag               sample.tag
    path              /path/to/pythonApp.log
    Multiline         On
    Parser_Firstline  log_date
    Parser_1          log_attributes
```

Similar to Fluentd, the Parser_Firstline parameter specifies the name of the parser that matches the beginning of a multi-line log, and additional parsers can be included to further structure the logs. Here we configure Parser_Firstline to match log lines starting with the date, then use the Parser_1 parameter to specify the matching pattern for the rest of the log message, assigning the timestamp, level, and message keys.

After the conversion, our logs end up in the format shown below.

```json
{
  "timestamp": "2019-08-01 19:22:14,196",
  "level": "ERROR:",
  "message": "Exception on main handler\nTraceback (most recent call last):\n File \"python-logger.py\", line 9, in make_log\n return word[13]\nIndexError: string index out of range"
}
```