MRS: Delivering Log Files to HDFS in Real Time with Flume

Reader submission 859 2022-05-30


Keywords: MRS, Flume, HDFS, Kerberos, log files

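Abstract: This article describes how, in an MRS cluster environment, to use the Flume client to collect logs from a log host and save them to a specified HDFS directory.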

Preparation

1.       Create a cluster (see https://support.huaweicloud.com/usermanual-mrs/mrs_01_0027.html). Choose an MRS 2.1.0 hybrid cluster, enable Kerberos authentication, and include at least the Hadoop and Flume components.

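2.       Create an ECS log host outside the cluster, following the "Prerequisites" section of https://support.huaweicloud.com/usermanual-mrs/mrs_01_0091.html. Note that this host must use the same VPC and security group as the cluster.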

3.       To collect logs with Flume, the Flume client must be installed on the log host; see https://support.huaweicloud.com/usermanual-mrs/mrs_01_0392.html

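3.1       The linked document does not include the steps for downloading the Flume client; refer to the screenshot in the original post. The installation command used here is: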

sh /opt/MRS_Flume_ClientConfig/Flume/install.sh -d /opt/FlumeClient -f 172.16.0.135  -e 172.16.0.100

Development procedure (see https://support.huaweicloud.com/usermanual-mrs/mrs_01_0397.html)

2.       On a master node in the cluster, send the hdfs-site.xml and core-site.xml files from the $HADOOP_HOME/etc/hadoop directory to the Flume client configuration directory "/opt/FlumeClient/fusioninsight-flume-1.6.0/conf" on the log host.

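3.       In MRS Manager, confirm the IP address of the node that hosts the Flume role, log in to that node, and send the jaas.conf file from the "/opt/Bigdata/MRS_x.x.x/1_x_Flume/etc/" directory to the Flume client configuration directory "/opt/FlumeClient/fusioninsight-flume-1.6.0/conf" on the log host.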

4.       Log in to the log host, go to the "/opt/FlumeClient/fusioninsight-flume-1.6.0/conf" directory, and complete the configuration.

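4.1    Edit jaas.conf (the original post showed the result in a screenshot): principal is the name of the user created in MRS Manager, and keyTab is the path to the keytab authentication file.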
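As a stand-in for the screenshot, here is a minimal sketch of what a keytab-based JAAS entry typically looks like. The section name Client, the helper name print_jaas_entry, and the option values are assumptions for illustration; keep whatever section names already exist in the jaas.conf copied from the cluster, and substitute your own principal and keytab path.

```shell
#!/bin/sh
# Print a keytab-based JAAS entry for review (sketch; nothing is written to disk).
# Assumptions: the section name "Client" and the helper name are illustrative;
# reuse the section names found in the jaas.conf copied from the cluster.
print_jaas_entry() {
    principal=$1   # user created in MRS Manager, e.g. flumeuser
    keytab=$2      # absolute path to that user's keytab file
    cat <<EOF
Client {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="$keytab"
  principal="$principal"
  useTicketCache=false
  storeKey=true;
};
EOF
}

print_jaas_entry flumeuser /opt/FlumeClient/fusioninsight-flume-1.6.0/conf/user.keytab
```

The output can be compared with the existing file by eye before any change is made by hand.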

4.2    Make sure the program can access the configuration files by changing the permissions of the keytab file, for example "chmod 777 user.keytab".

4.3    Edit flume-env.sh and, after "-XX:+UseCMSCompactAtFullCollection", add the following: " -Djava.security.krb5.conf=/opt/FlumeClient/fusioninsight-flume-1.6.0/conf/krb5.conf -Djava.security.auth.login.config=/opt/FlumeClient/fusioninsight-flume-1.6.0/conf/jaas.conf -Dzookeeper.request.timeout=120000"
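At this point several files are expected to sit in the client's conf directory. The following sketch lists any that are missing; the helper name missing_conf_files is hypothetical, and the file list is an assumption drawn from the steps so far (krb5.conf is included because the JVM option in step 4.3 points to it, and user.keytab is the example name from step 4.2).

```shell
#!/bin/sh
# List the files that the steps so far expect in the Flume client conf
# directory but that are missing. Sketch only; the helper name is hypothetical.
# krb5.conf is checked because the JVM option in step 4.3 references it.
missing_conf_files() {
    conf_dir=$1
    for f in core-site.xml hdfs-site.xml jaas.conf krb5.conf user.keytab; do
        [ -f "$conf_dir/$f" ] || echo "$f"
    done
}

# Usage on the log host (prints nothing when every file is in place):
# missing_conf_files /opt/FlumeClient/fusioninsight-flume-1.6.0/conf
```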

4.4    Overwrite the properties.properties file with the following content:

client.sources = r1

client.sinks = k1

client.channels = c1

client.sources.r1.type = spooldir

client.sources.r1.spoolDir = /var/log/test

client.sources.r1.trackerDir = /opt/FlumeClient/fusioninsight-flume-1.6.0/conf/trackerDir

client.sources.r1.ignorePattern = ^$

client.sources.r1.fileSuffix = .COMPLETED

client.sources.r1.maxBlobLength = 16384

client.sources.r1.batchSize = 51200

client.sources.r1.inputCharset = UTF-8

client.sources.r1.deserializer = LINE

client.sources.r1.selector.type = replicating

client.sources.r1.fileHeaderKey = file

client.sources.r1.fileHeader = false

client.sources.r1.basenameHeader = true

client.sources.r1.basenameHeaderKey = basename

client.sources.r1.deletePolicy = never

client.sources.r1.channels = c1

client.channels.c1.type = file

client.channels.c1.checkpointDir = /opt/FlumeClient/fusioninsight-flume-1.6.0/conf/checkpointDir/

client.channels.c1.dataDirs = /opt/FlumeClient/fusioninsight-flume-1.6.0/conf/dataDirs/

client.channels.c1.maxFileSize = 2146435071

client.channels.c1.minimumRequiredSpace = 524288000

client.channels.c1.capacity = 1000000

client.channels.c1.transactionCapacity = 10000

client.channels.c1.channelfullcount = 10

client.sinks.k1.type = hdfs

client.sinks.k1.channel = c1

client.sinks.k1.hdfs.path = /flume/file/%y-%m-%d/%H%M/

client.sinks.k1.hdfs.filePrefix = event

client.sinks.k1.hdfs.rollSize = 102400000

client.sinks.k1.hdfs.fileType = DataStream

client.sinks.k1.hdfs.rollInterval = 600


client.sinks.k1.hdfs.rollCount = 0

client.sinks.k1.hdfs.inUseSuffix = .tmp

client.sinks.k1.hdfs.fileSuffix = .log

client.sinks.k1.hdfs.idleTimeout = 0

client.sinks.k1.hdfs.useLocalTimeStamp = true

client.sinks.k1.hdfs.round = true

client.sinks.k1.hdfs.roundValue = 10

client.sinks.k1.hdfs.roundUnit = minute


client.sinks.k1.hdfs.kerberosPrincipal = flumeuser

client.sinks.k1.hdfs.kerberosKeytab = /opt/FlumeClient/fusioninsight-flume-1.6.0/conf/user.keytab
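To make the effect of hdfs.path together with round, roundValue, and roundUnit more concrete, the sketch below computes the directory an event would be written to for a given local time. The helper name flume_bucket_path and its argument layout are hypothetical; it only mirrors the round-down-to-10-minutes behavior configured above and does not talk to Flume or HDFS.

```shell
#!/bin/sh
# Compute the HDFS directory an event would land in under the sink settings above
# (path /flume/file/%y-%m-%d/%H%M/, round = true, roundValue = 10, roundUnit = minute).
# Sketch only: the helper name and argument layout are hypothetical.
flume_bucket_path() {
    day=$1           # local date as yy-mm-dd, e.g. 22-05-30
    hour=$2          # two-digit hour, e.g. 10
    minute=${3#0}    # drop one leading zero so 08 and 09 are not parsed as octal
    minute=${minute:-0}
    rounded=$(( minute / 10 * 10 ))   # round down to a multiple of 10 minutes
    printf '/flume/file/%s/%s%02d/\n' "$day" "$hour" "$rounded"
}

flume_bucket_path 22-05-30 10 25   # prints /flume/file/22-05-30/1020/
```

Because useLocalTimeStamp is true, the time comes from the Flume client host rather than from an event header. Within each directory, a file is rolled after 600 seconds (rollInterval) or about 102400000 bytes (rollSize), whichever comes first; rollCount = 0 turns off rolling by event count.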

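4.5    In this directory, create the folders that the original screenshot marked with red boxes; judging from the configuration above, these are presumably trackerDir, checkpointDir, and dataDirs.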
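A minimal sketch of this step follows. It assumes the folders in question are the three directories referenced by trackerDir, checkpointDir, and dataDirs in properties.properties; the helper name make_flume_dirs is hypothetical.

```shell
#!/bin/sh
# Create the working directories named in properties.properties (sketch).
# Assumption: the folders highlighted in the original screenshot are the three
# directories referenced by trackerDir, checkpointDir and dataDirs.
make_flume_dirs() {
    conf_dir=$1
    mkdir -p "$conf_dir/trackerDir" "$conf_dir/checkpointDir" "$conf_dir/dataDirs"
}

# Usage on the log host:
# make_flume_dirs /opt/FlumeClient/fusioninsight-flume-1.6.0/conf
```

The spooling directory /var/log/test named in spoolDir also has to exist before the client starts.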

5.       On the log host, run the following commands in the Flume client:

cd /opt/FlumeClient/fusioninsight-flume-1.6.0/bin

./flume-manage.sh restart

Test results

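1.       Flume monitors the "/var/log/test" directory on the log host outside the cluster. To test, write text into this directory and then check whether it is synchronized in real time to the /flume/file directory in the cluster's HDFS.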
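One way to run this test is sketched below. The helper name drop_test_log and the file naming scheme are hypothetical. The file is written outside the spooling directory and then moved in, since the spooldir source expects a file to be complete and unchanged once it appears there.

```shell
#!/bin/sh
# Place a small test file into the spooling directory (sketch; helper name is hypothetical).
# The file is written elsewhere first and then moved in, because the spooldir
# source expects files to be complete and unchanged once they appear.
drop_test_log() {
    spool_dir=$1
    message=$2
    tmp=$(mktemp) || return 1
    printf '%s\n' "$message" > "$tmp"
    chmod 644 "$tmp"   # let the Flume client process read the file
    mv "$tmp" "$spool_dir/test-$(date +%Y%m%d%H%M%S)-$$.log"
}

# Usage on the log host:
# drop_test_log /var/log/test "hello from flume"
```

Afterwards, on a node with the HDFS client and a valid Kerberos ticket, a command such as hdfs dfs -ls -R /flume/file should list the new file. With the settings above it keeps the .tmp suffix until it is rolled, and the processed source file in /var/log/test is renamed with the .COMPLETED suffix.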

