Log collection with multi-line logs is always a headache. Developers are often unwilling to output logs as JSON, so the logs have to be re-structured at collection time.

Since log collectors are implemented in different ways and to different standards, how multi-line logs are handled varies from collector to collector. For example, if we use Fluentd as our log collector, we can use its multiline parser to handle multi-line logs.

The multiline parser uses the formatN and format_firstline parameters to parse the logs. format_firstline is used to detect the starting line of the multiline log. formatN, where N is in the range [1..20], is a list of Regexp formats for multiline logs.
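The mechanism behind format_firstline can be sketched in a few lines of Python (a hypothetical illustration of the idea, not Fluentd's actual implementation): each incoming line is tested against the first-line pattern; a match starts a new event, and non-matching lines are appended to the current one.

```python
import re

# Hypothetical sketch of first-line detection; not Fluentd's real code.
# Same pattern we will later use as format_firstline.
FIRST_LINE = re.compile(r"\d{4}-\d{1,2}-\d{1,2}")

def group_events(lines):
    """A line matching FIRST_LINE starts a new event;
    every other line is appended to the current event."""
    events = []
    for line in lines:
        if FIRST_LINE.match(line) or not events:
            events.append(line)
        else:
            events[-1] += "\n" + line
    return events

# Abridged versions of the test log lines below.
lines = [
    "2022-06-20 17:28:27.871 DEBUG 6 ... LoadPlan(entity=...)",
    " - Returns",
    " - QuerySpaces",
    "2022-06-20 19:32:47.062 DEBUG 7 ... Closing connections",
]
for event in group_events(lines):
    print(repr(event))  # two events: the continuation lines fold into the first
```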

## Test data

For example, we now have multiple lines of log data as shown below.

```
2022-06-20 19:32:07.264 DEBUG 7 [TID:bb0e9b6d1d704755a93ea1529265bb99.68.16557246000000125] --- [ scheduling-4] o.s.d.redis.core.RedisConnectionUtils : Closing Redis Connection.
2022-06-20 19:32:07.264 DEBUG 7 [TID:bb0e9b6d1d704755a93ea1529265bb99.68.16557246000000125] --- [ scheduling-4] io.lettuce.core.RedisChannelHandler : dispatching command AsyncCommand [type=DEL, output=IntegerOutput [output=null, error='null'], commandType=io.lettuce.core.protocol.Command]
2022-06-20 17:28:27.871 DEBUG 6 [TID:N/A] --- [ main] o.h.l.p.build.spi.LoadPlanTreePrinter : LoadPlan(entity=com.xxxx.entity.ScheduledLeadsInvalid)
 - Returns
 - EntityReturnImpl(entity=com.xxxx.entity.ScheduledLeadsInvalid, querySpaceUid=, path=com.xxxx.entity.ScheduledLeadsInvalid)
 - QuerySpaces
 - EntityQuerySpaceImpl(uid=, entity=com.xxxx.entity.ScheduledLeadsInvalid)
 - SQL table alias mapping - scheduledl0_
 - alias suffix - 0_
 - suffixed key columns - {id1_51_0_}
2022-06-20 19:32:47.062 DEBUG 7 [TID:N/A] --- [nection-cleaner] h.i.c.PoolingHttpClientConnectionManager : Closing connections idle longer than 60000 MILLISECONDS
```

First create a fluentd directory, with an etc directory for the Fluentd configuration files and a logs directory for the logs, and save the above test logs as logs/test.log.

```shell
$ mkdir fluentd
$ cd fluentd
# Create the etc directory for the fluentd config files and the logs directory for the logs.
$ mkdir -p etc logs
```

## General parsing

Then create a Fluentd configuration file etc/fluentd_basic.conf for parsing the logs, with the following contents.

```
<source>
  @type tail
  path /fluentd/logs/*.log
  pos_file /fluentd/logs/test.log.pos
  tag test.logs
  read_from_head true
  <parse>
    @type regexp
    expression /^(?<timestamp>[^ ]* [^ ]*) (?<level>[^\s]+) (?<pid>[^s+]+) \[TID:(?<tid>[,a-z0-9A-Z./]+)\] --- \[(?<thread>.*)\] (?<message>[\s\S]*)/
  </parse>
</source>

<match **>
  @type stdout
</match>
```

Then we use the Docker image to start Fluentd and parse our logs.

```shell
$ docker run --rm -v $(pwd)/etc:/fluentd/etc -v $(pwd)/logs:/fluentd/logs fluent/fluentd:v1.14-1 -c /fluentd/etc/fluentd_basic.conf -v
fluentd -c /fluentd/etc/fluentd_basic.conf -v
2022-06-20 12:31:17 +0000 [info]: fluent/log.rb:330:info: parsing config file is succeeded path="/fluentd/etc/fluentd_basic.conf"
2022-06-20 12:31:17 +0000 [info]: fluent/log.rb:330:info: gem 'fluentd' version '1.14.3'
2022-06-20 12:31:17 +0000 [warn]: fluent/log.rb:351:warn: define <match fluent.*> to capture fluentd logs in top level is deprecated. Use <label @FLUENT_LOG> instead
...
```

From the parsing results above, we can see that some lines do not match the regular expression while others parse normally; for example, the following is the result of parsing the first log line.

```json
{"timestamp":"2022-06-20 19:32:07.264","level":"DEBUG","pid":"7","tid":"bb0e9b6d1d704755a93ea1529265bb99.68.16557246000000125","thread":" scheduling-4","message":"o.s.d.redis.core.RedisConnectionUtils : Closing Redis Connection."}
```

What fails to match are the multi-line logs: Fluentd treats each physical line as a separate log entry, which is obviously not what we expect.
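A quick check in Python shows why: the same regular expression used in the Fluentd config (transcribed here into Python's `(?P<name>...)` named-group syntax) matches a complete first line but fails on a continuation line, so per-line parsing drops the continuations.

```python
import re

# The expression from the Fluentd config, transcribed to Python.
pattern = re.compile(
    r"^(?P<timestamp>[^ ]* [^ ]*) (?P<level>[^\s]+) (?P<pid>[^s+]+) "
    r"\[TID:(?P<tid>[,a-z0-9A-Z./]+)\] --- \[(?P<thread>.*)\] (?P<message>[\s\S]*)"
)

first = ("2022-06-20 19:32:07.264 DEBUG 7 "
         "[TID:bb0e9b6d1d704755a93ea1529265bb99.68.16557246000000125] "
         "--- [ scheduling-4] o.s.d.redis.core.RedisConnectionUtils : Closing Redis Connection.")
continuation = " - Returns"

m = pattern.match(first)
print(m.group("level"))             # DEBUG
print(pattern.match(continuation))  # None -- continuation lines don't parse
```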

## Multi-line parser

What we want is to treat a multi-line log as a single entry, so here we need the multiline parser. Create a new configuration file etc/fluentd_multline.conf for multi-line log processing, with the following content.

```
<source>
  @type tail
  path /fluentd/logs/*.log
  pos_file /fluentd/logs/test.log.pos
  tag test.logs
  read_from_head true
  <parse>
    @type multiline
    format_firstline /\d{4}-\d{1,2}-\d{1,2}/
    format1 /^(?<timestamp>[^ ]* [^ ]*) (?<level>[^\s]+) (?<pid>[^s+]+) \[TID:(?<tid>[,a-z0-9A-Z./]+)\] --- \[(?<thread>.*)\] (?<message>[\s\S]*)/
  </parse>
</source>

<match **>
  @type stdout
</match>
```

Here format_firstline /\d{4}-\d{1,2}-\d{1,2}/ matches the beginning of each log entry (a line starting with a date), and format1 parses the first line. If you have more lines with known structure to match, you can continue configuring matching rules for the second line with format2, and so on. Restart Fluentd with the configuration above.
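As an aside before looking at the output: if the continuation lines themselves had a fixed shape, format2 onwards could parse them into their own fields. A hypothetical sketch of such a parse section (our test logs don't need it, since the message group in format1 already swallows the continuation lines):

```
<parse>
  @type multiline
  format_firstline /\d{4}-\d{1,2}-\d{1,2}/
  # format1 parses the first physical line of each event
  format1 /^(?<timestamp>[^ ]* [^ ]*) (?<level>[^\s]+) (?<message>.*)\n/
  # format2 parses the second physical line (here: an exception class and text)
  format2 /^(?<exception>[^:]+): (?<detail>.*)/
</parse>
```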

```shell
$ docker run --rm -v $(pwd)/etc:/fluentd/etc -v $(pwd)/logs:/fluentd/logs fluent/fluentd:v1.14-1 -c /fluentd/etc/fluentd_multline.conf -v
fluentd -c /fluentd/etc/fluentd_multline.conf -v
2022-06-20 12:41:58 +0000 [info]: fluent/log.rb:330:info: parsing config file is succeeded path="/fluentd/etc/fluentd_multline.conf"
2022-06-20 12:41:58 +0000 [info]: fluent/log.rb:330:info: gem 'fluentd' version '1.14.3'
2022-06-20 12:41:58 +0000 [warn]: fluent/log.rb:351:warn: define <match fluent.*> to capture fluentd logs in top level is deprecated. Use <label @FLUENT_LOG> instead
...
```

You can see that the logs now come out correctly: the previous multi-line log has been parsed into a single entry, as we expected.

```json
{"timestamp":"2022-06-20 17:28:27.871","level":"DEBUG","pid":"6","tid":"N/A","thread":" main","message":"o.h.l.p.build.spi.LoadPlanTreePrinter : LoadPlan(entity=com.xxxx.entity.ScheduledLeadsInvalid)\n - Returns\n - EntityReturnImpl(entity=com.xxxx.entity.ScheduledLeadsInvalid, querySpaceUid=, path=com.xxxx.entity.ScheduledLeadsInvalid)\n - QuerySpaces\n - EntityQuerySpaceImpl(uid=, entity=com.xxxx.entity.ScheduledLeadsInvalid)\n - SQL table alias mapping - scheduledl0_\n - alias suffix - 0_\n - suffixed key columns - {id1_51_0_}"}
```
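A quick way to convince yourself that the \n escapes in the message field are real embedded newlines is to load the record as JSON (the record below is abridged to keep the example short):

```python
import json

# Abridged version of the record printed by fluentd above.
record = ('{"timestamp":"2022-06-20 17:28:27.871","level":"DEBUG","pid":"6",'
          '"tid":"N/A","thread":" main",'
          '"message":"LoadPlan(entity=...)\\n - Returns\\n - QuerySpaces"}')

event = json.loads(record)
# The message field holds the whole multi-line body as one string.
print(event["message"].count("\n"))      # 2 embedded newlines
print(event["message"].splitlines()[1])  # " - Returns"
```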

Of course, the whole process is not complicated; the only troublesome part is having to write regular expressions to match the logs, which is probably what most people struggle with.