Data Collection

Life of a Fluentd Event

This article gives a global overview of how events are processed by Fluentd, using examples. It covers the complete cycle, including Setup, Inputs, Filters, Matches and Labels.

Basic Setup

As described in the articles above, the Setup in the configuration files is the fundamental piece that connects everything together: it defines which Inputs or listeners Fluentd will have and sets up common matching rules to route the Event data to a specific Output.

We will use the in_http and out_stdout plugins as examples to describe the events cycle. The following is a basic definition in the configuration file that specifies an HTTP input; in short, we will be listening for HTTP requests:

<source>
  type http
  port 8888
  bind 0.0.0.0
</source>

This definition specifies that an HTTP server will listen on TCP port 8888. Now let's define a Matching rule and an output that simply prints the data from each incoming request to the standard output:

<match test.cycle>
  type stdout
</match>

The Match sets a rule: each incoming Event that arrives with a Tag equal to test.cycle will match and be handled by the Output plugin type called stdout. With in_http, the Tag is taken from the URL path of the request. At this point we have an Input type, a Match and an Output. Let's test the setup using Curl:

$ curl -i -X POST -d 'json={"action":"login","user":2}' http://localhost:8888/test.cycle
HTTP/1.1 200 OK
Content-type: text/plain
Connection: Keep-Alive
Content-length: 0

On the Fluentd server side the output should look like this:

$ bin/fluentd -c in_http.conf
2015-01-19 12:37:41 -0600 [info]: reading config file path="in_http.conf"
2015-01-19 12:37:41 -0600 [info]: starting fluentd-0.12.3
2015-01-19 12:37:41 -0600 [info]: using configuration file: <ROOT>
  <source>
    type http
    bind 0.0.0.0
    port 8888
  </source>
  <match test.cycle>
    type stdout
  </match>
</ROOT>
2015-01-19 12:37:41 -0600 [info]: adding match pattern="test.cycle" type="stdout"
2015-01-19 12:37:41 -0600 [info]: adding source type="http"
2015-01-19 12:39:57 -0600 test.cycle: {"action":"login","user":2}
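
The Match above compares the Tag against one exact value. As a minimal sketch of a broader rule, assuming you wanted every Tag with a single part after test to reach the same output, a wildcard pattern could be used instead. This variant is only an illustration and is not used in the rest of this article:

<match test.*>
  type stdout
</match>

In a match pattern, * matches exactly one tag part, so test.* matches test.cycle but not test.cycle.extra; ** matches zero or more tag parts.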

Processing Events

When a Setup is defined, the Router Engine already contains several rules to apply to different input data. Internally, an Event will pass through a chain of procedures that may alter its cycle.

NOTE: starting from Fluentd v0.12 two new concepts were introduced to improve the routing behavior: Filters and Labels.

Now we will expand our previous basic example and add more steps to our Setup to demonstrate how the Events cycle can be altered. We will do this through the new Filters implementation.

Filters

A Filter aims to behave like a rule to pass or reject an event. The following configuration adds a Filter definition:

<source>
  type http
  port 8888
  bind 0.0.0.0
</source>

<filter test.cycle>
  type grep
  exclude1 action logout
</filter>

<match test.cycle>
  type stdout
</match>

As you can see, the new Filter definition is a mandatory step before control passes to the Match section. The Filter accepts or rejects the Event based on its type and the rule defined. In our example we want to discard any user logout action and keep only the logins. To accomplish this, the Filter uses grep with a rule that excludes any message whose action key contains the string logout.
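
The same intent could probably be expressed the other way around. As a sketch only, and not the configuration used in the test that follows, the grep Filter's regexp1 parameter keeps the Events whose action key matches a pattern instead of excluding the unwanted ones:

<filter test.cycle>
  type grep
  regexp1 action login
</filter>

The pattern is a regular expression, so login would also match a value such as relogin; anchoring it as ^login$ would restrict the rule to the exact string.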

From a Terminal, run the following two Curl commands; note that each one contains a different action value:

$ curl -i -X POST -d 'json={"action":"login","user":2}' http://localhost:8888/test.cycle
HTTP/1.1 200 OK
Content-type: text/plain
Connection: Keep-Alive
Content-length: 0

$ curl -i -X POST -d 'json={"action":"logout","user":2}' http://localhost:8888/test.cycle
HTTP/1.1 200 OK
Content-type: text/plain
Connection: Keep-Alive
Content-length: 0

Now, looking at the Fluentd service output, we can see that only the Event with action equal to login matched. The logout Event was discarded:

$ bin/fluentd -c in_http.conf
2015-01-19 12:37:41 -0600 [info]: reading config file path="in_http.conf"
2015-01-19 12:37:41 -0600 [info]: starting fluentd-0.12.4
2015-01-19 12:37:41 -0600 [info]: using configuration file: <ROOT>
<source>
  type http
  bind 0.0.0.0
  port 8888
</source>
<filter test.cycle>
  type grep
  exclude1 action logout
</filter>
<match test.cycle>
  type stdout
</match>
</ROOT>
2015-01-19 12:37:41 -0600 [info]: adding filter pattern="test.cycle" type="grep"
2015-01-19 12:37:41 -0600 [info]: adding match pattern="test.cycle" type="stdout"
2015-01-19 12:37:41 -0600 [info]: adding source type="http"
2015-01-27 01:27:11 -0600 test.cycle: {"action":"login","user":2}

As you can see, Events follow a step-by-step cycle in which they are processed in order from top to bottom. The new engine in Fluentd allows you to integrate as many Filters as necessary. Since the configuration file may then grow and become complex for readers, a new feature called Labels has been added to address this problem.
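
To illustrate how several Filters chain from top to bottom, here is a minimal sketch that assumes the record_transformer filter plugin bundled with Fluentd v0.12. Events that survive the grep Filter reach the second Filter, which adds a field to the record before the Match; the hostname field name is only an illustrative choice:

<filter test.cycle>
  type grep
  exclude1 action logout
</filter>

<filter test.cycle>
  type record_transformer
  <record>
    hostname "#{Socket.gethostname}"
  </record>
</filter>

<match test.cycle>
  type stdout
</match>

With this setup a login Event would be printed with the extra hostname field, while a logout Event would never reach the second Filter.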

Labels

This new implementation, called Labels, aims to reduce configuration file complexity and allows you to define new Routing sections that do not follow the top-to-bottom order; instead, they act like linked references. Taking the previous example, we will modify the setup as follows:

<source>
  type http
  bind 0.0.0.0
  port 8888
  @label @STAGING
</source>

<filter test.cycle>
  type grep
  exclude1 action login
</filter>

<label @STAGING>
  <filter test.cycle>
    type grep
    exclude1 action logout
  </filter>

  <match test.cycle>
    type stdout
  </match>
</label>

The new configuration contains a @label key on the source, indicating that any further steps take place in the STAGING Label section. The expectation is that every Event reported by the Source will continue processing in STAGING, and will therefore skip the old filter definition.
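
In practice, with the configuration above, a login Event sent to the HTTP source should be printed by stdout and a logout Event should be dropped by the Filter inside STAGING; the top-level Filter that excludes login is never consulted. As a further sketch, a second source could be routed to its own Label in the same file. The forward source and the PRODUCTION label name below are assumptions made for illustration and are not part of the original example:

<source>
  type forward
  port 24224
  @label @PRODUCTION
</source>

<label @PRODUCTION>
  <match **>
    type stdout
  </match>
</label>

Each Label keeps its own Filters and Matches, so the rules of one source do not have to be interleaved with the rules of another.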

Conclusion

Once Events are reported to the Fluentd engine by the Source, they can be processed step by step or inside a referenced Label, and any Event may be filtered out at any point. The new Routing engine behavior aims to provide more flexibility and to make processing easier before Events reach the Output plugin.

Learn More