Kafka Pitfalls

1 [2020-08-10 18:27:46,648] WARN [Consumer clientId=consumer-1, groupId=console-consumer-18513] Synchronous auto-commit of offsets {image-topic-0=OffsetAndMetadata{offset=0, metadata=''}} failed: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing the session timeout or by reducing the maximum size of batches returned in poll() with max.poll.records. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)

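The interval between two consecutive poll() calls exceeded the configured limit (max.poll.interval.ms), so the consumer was considered failed, the group rebalanced, and the offset commit was rejected. The fix is to raise the poll interval limit (max.poll.interval.ms) or to reduce the number of records returned by a single poll (max.poll.records).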
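To make the trade-off concrete, here is a minimal sketch that estimates how many records one poll() can safely return for a given per-record processing time. The helper names, the 50% headroom, and the 2-second processing time are assumptions for illustration; only the two property names and their defaults (300000 ms and 500 records in the Java client) come from Kafka itself.

```python
import math

# Java consumer defaults: 5 minutes between polls, 500 records per poll.
DEFAULT_MAX_POLL_INTERVAL_MS = 300_000
DEFAULT_MAX_POLL_RECORDS = 500


def safe_max_poll_records(per_record_ms,
                          max_poll_interval_ms=DEFAULT_MAX_POLL_INTERVAL_MS,
                          headroom=0.5):
    """Largest batch whose estimated processing time stays within
    headroom * max_poll_interval_ms (hypothetical helper, not a Kafka API)."""
    if per_record_ms <= 0:
        raise ValueError("per_record_ms must be positive")
    budget_ms = max_poll_interval_ms * headroom
    # Always allow at least one record, even if it alone exceeds the budget.
    return max(1, math.floor(budget_ms / per_record_ms))


def render_consumer_properties(max_poll_records, max_poll_interval_ms):
    """Render the two settings as lines of a consumer .properties file."""
    return (
        f"max.poll.interval.ms={max_poll_interval_ms}\n"
        f"max.poll.records={max_poll_records}\n"
    )


if __name__ == "__main__":
    # Assumed workload: each message takes about 2 seconds to process.
    records = safe_max_poll_records(per_record_ms=2000)
    print(render_consumer_properties(records, DEFAULT_MAX_POLL_INTERVAL_MS))
```

Under the assumed 2 seconds per record, the default batch of 500 records would need roughly 1000 seconds, well past the 300-second limit, which matches the warning above. The rendered lines can be saved to a file and passed to the console consumer with --consumer.config.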

Cause: the path separator on Linux is /, whereas Windows uses \.
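The separator difference can be checked with the standard pathlib module, as in the short sketch below. The directory names are made up for illustration and are not taken from any real Kafka installation.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Hypothetical directories, used only to show the separators.
linux_dir = PurePosixPath("/opt/kafka") / "logs"
windows_dir = PureWindowsPath("C:/kafka") / "logs"

linux_text = str(linux_dir)               # joined with forward slashes
windows_text = str(windows_dir)           # joined with backslashes
windows_as_posix = windows_dir.as_posix() # same path written with forward slashes

print(linux_text)
print(windows_text)
print(windows_as_posix)
```

Building paths through pathlib rather than concatenating strings avoids hard-coding either separator.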

2 Differences between Flume and Kafka

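(1) Both can serve as log systems. Kafka is a distributed message middleware with built-in storage: producers push data to it and consumers pull data from it. Flume consists of three parts: agent (collects data), collector (does simple processing and writes the data out), and storage (where the data lands). Each part can be customized.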

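(2) Kafka has built-in storage with replication and lets data be consumed repeatedly, which makes it a good fit for buffering logs. Flume supports many customizable data sources, which reduces development effort and makes it a good fit for data collection. Flume feeding Kafka (Flume+Kafka) is used for real-time stream processing; Kafka feeding Flume (Kafka+Flume) takes advantage of Flume's ability to write to HDFS.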

For example, Kafka can act as a data buffer, temporarily holding data that has not been processed yet.
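The buffering and repeated-consumption behaviour can be pictured with a toy model: an append-only list addressed by offset, where reading never deletes anything. This is only an in-memory illustration; the class ToyLog and its methods are hypothetical and are not part of any Kafka client.

```python
class ToyLog:
    """In-memory stand-in for one partition: an append-only list addressed by offset."""

    def __init__(self):
        self._records = []

    def append(self, value):
        """Store a record and return the offset it was written at."""
        self._records.append(value)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records starting at offset; reading never deletes."""
        return self._records[offset:offset + max_records]


log = ToyLog()
for line in ["a", "b", "c"]:
    log.append(line)            # the producer side keeps writing

first_pass = log.read(0)        # a consumer reads from the beginning
replay = log.read(0)            # reading the same offset again returns the same data
tail = log.read(2)              # a slower consumer can resume from its own offset
```

Because the records stay in the log after being read, a slow or restarted consumer can pick up from whatever offset it last reached, which is what makes the log usable as a buffer.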

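Online applications usually write data to files or sockets. Sending that data to Kafka means modifying the corresponding interfaces, whereas Flume can adapt through a custom Agent.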

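A Comparison of Flume and Kafka

Flume+Kafka: Two Swords Combined for Log Collection on a Big Data Platform

Flume Concepts and Principles, and Its Advantages Compared with Kafka

What are the differences and connections between the log collection systems Flume and Kafka, when should each be used, and when can they be combined?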

References

Consumer Configurations