The out_webhdfs TimeSliced Output plugin writes records into HDFS (Hadoop Distributed File System). By default, it creates files on an hourly basis. This means that when you first import records using the plugin, no file is created immediately. The file will be created when the time_slice_format condition has been met. To change the output frequency, please modify the time_slice_format value.
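For example, here is a minimal sketch of a minute-level configuration (the host, port, and path values are placeholders; the strftime placeholders in path are matched to the finer slice):

<match access.**>
  type webhdfs
  host namenode.your.cluster.local
  port 50070
  path /path/on/hdfs/access.log.%Y%m%d_%H%M.${hostname}.log
  time_slice_format %Y%m%d%H%M
</match>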
NOTE: This document doesn't describe all parameters. For the full list of features, check the Further Reading section.
out_webhdfs is included in td-agent by default (v1.1.10 or later). Fluentd gem users will have to install the fluent-plugin-webhdfs gem using the following command.
$ fluent-gem install fluent-plugin-webhdfs
Append operations are not enabled by default on CDH. Please put these configurations into your hdfs-site.xml file and restart the whole cluster.
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
<property>
  <name>dfs.support.append</name>
  <value>true</value>
</property>
<property>
  <name>dfs.support.broken.append</name>
  <value>true</value>
</property>
<match access.**>
  type webhdfs
  host namenode.your.cluster.local
  port 50070
  path /path/on/hdfs/access.log.%Y%m%d_%H.${hostname}.log
  flush_interval 10s
</match>
Please see the Fluentd + HDFS: Instant Big Data Collection article for real-world use cases.
NOTE: Please see the Config File article for the basic structure and syntax of the configuration file.
type: The value must be webhdfs.
host: The namenode hostname.
port: The namenode port number.
path: The path on HDFS. Please include ${hostname} in your path to avoid writing into the same HDFS file from multiple Fluentd instances. Such a conflict could result in data loss.
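For instance, with the path above, two Fluentd instances running on hypothetical hosts web01 and web02 would write to separate hourly files such as:

/path/on/hdfs/access.log.20140101_00.web01.log
/path/on/hdfs/access.log.20140101_00.web02.log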
This plugin also supports the standard time-sliced buffer parameters (buffer_type, buffer_path, time_slice_format, time_slice_wait, and so on) and the common log_level parameter.
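As a hedged sketch of how those shared parameters can be combined with the settings above (buffer_path points to a hypothetical local directory used for file buffering):

<match access.**>
  type webhdfs
  host namenode.your.cluster.local
  port 50070
  path /path/on/hdfs/access.log.%Y%m%d_%H.${hostname}.log
  buffer_type file
  buffer_path /var/log/fluent/buffer/webhdfs
  flush_interval 10s
  log_level info
</match>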