Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

in_tail: Repeated Logs Despite pos_file Updating in Fluentd(LTS) on Linux 2023 #4761

Open
tadkarshirish opened this issue Jan 7, 2025 · 5 comments
Labels
waiting-for-user Similar to "moreinfo", but especially need feedback from user

Comments

@tadkarshirish
Copy link

tadkarshirish commented Jan 7, 2025

Describe the bug

We are experiencing an issue where fluentd repeatedly processes the same logs despite the pos_file being updated. The problem persists even after verifying the file permissions and Fluenetd's configurations. Logs are repeated in fluentd output and debug logs indicate re-reading of the same log lines.

To Reproduce

  1. Deploy the Fluenetd using fluent-package-5.0.5-1.amzn2023.x86_64.rpm RPM package in Linux 2023 system.
    Download Link : https://td-agent-package-browser.herokuapp.com/lts/5/amazon/2023/x86_64
  2. Configure the fluentd to tail logs from sample.log file.
  3. Start Fluentd
  4. Observe the logs are processed repeatedly despite the pos_file being updated.

Expected behavior

Fluent should process each log line once and continue from the last position after a restart or rotation.

Your Environment

- Fluentd version: '1.16.6'
- Installation method : RPM package
- Package version:  fluent-package-5.0.5-1.amzn2023.x86_64.rpm 

- Operating system:
  NAME="Amazon Linux"
  VERSION="2023"
  ID="amzn"
  ID_LIKE="fedora"
  VERSION_ID="2023"
  PRETTY_NAME="Amazon Linux 2023.5.20240730"

Your Configuration

<source>
  @type tail
  path /home/ec2-user/log1.txt
  pos_file /home/ec2-user/fluent-package-v5/conf/config.d/[email protected]
  tag demosvc
  format none
</source>

<match demosvc>
  @type stdout
</match>

Your Error Log

2025-01-07 18:48:09 +0530 [info]: fluent/log.rb:362:info: Worker 0 finished with status 0
2025-01-07 18:48:18 +0530 [info]: fluent/log.rb:362:info: init supervisor logger path="../log/td-agent.log" rotate_age="weekly" rotate_size=nil
2025-01-07 18:48:18 +0530 [info]: fluent/log.rb:362:info: parsing config file is succeeded path="../conf/td-agent.conf"
2025-01-07 18:48:18 +0530 [info]: fluent/log.rb:362:info: gem 'fluentd' version '1.16.6'
2025-01-07 18:48:18 +0530 [info]: fluent/log.rb:362:info: gem 'fluent-plugin-calyptia-monitoring' version '0.1.3'
2025-01-07 18:48:18 +0530 [info]: fluent/log.rb:362:info: gem 'fluent-plugin-elasticsearch' version '5.4.0'
2025-01-07 18:48:18 +0530 [info]: fluent/log.rb:362:info: gem 'fluent-plugin-flowcounter-simple' version '0.1.0'
2025-01-07 18:48:18 +0530 [info]: fluent/log.rb:362:info: gem 'fluent-plugin-kafka' version '0.19.2'
2025-01-07 18:48:18 +0530 [info]: fluent/log.rb:362:info: gem 'fluent-plugin-metrics-cmetrics' version '0.1.2'
2025-01-07 18:48:18 +0530 [info]: fluent/log.rb:362:info: gem 'fluent-plugin-opensearch' version '1.1.4'
2025-01-07 18:48:18 +0530 [info]: fluent/log.rb:362:info: gem 'fluent-plugin-prometheus' version '2.1.0'
2025-01-07 18:48:18 +0530 [info]: fluent/log.rb:362:info: gem 'fluent-plugin-prometheus_pushgateway' version '0.1.1'
2025-01-07 18:48:18 +0530 [info]: fluent/log.rb:362:info: gem 'fluent-plugin-record-modifier' version '2.1.1'
2025-01-07 18:48:18 +0530 [info]: fluent/log.rb:362:info: gem 'fluent-plugin-rewrite-tag-filter' version '2.4.0'
2025-01-07 18:48:18 +0530 [info]: fluent/log.rb:362:info: gem 'fluent-plugin-s3' version '1.7.2'
2025-01-07 18:48:18 +0530 [info]: fluent/log.rb:362:info: gem 'fluent-plugin-sd-dns' version '0.1.0'
2025-01-07 18:48:18 +0530 [info]: fluent/log.rb:362:info: gem 'fluent-plugin-systemd' version '1.1.0'
2025-01-07 18:48:18 +0530 [info]: fluent/log.rb:362:info: gem 'fluent-plugin-td' version '1.2.0'
2025-01-07 18:48:18 +0530 [info]: fluent/log.rb:362:info: gem 'fluent-plugin-utmpx' version '0.5.0'
2025-01-07 18:48:18 +0530 [info]: fluent/log.rb:362:info: gem 'fluent-plugin-webhdfs' version '1.5.0'
2025-01-07 18:48:18 +0530 [trace]: fluent/log.rb:319:trace: registered output plugin 'stdout'
2025-01-07 18:48:18 +0530 [trace]: fluent/log.rb:319:trace: registered metrics plugin 'local'
2025-01-07 18:48:18 +0530 [trace]: fluent/log.rb:319:trace: registered buffer plugin 'memory'
2025-01-07 18:48:18 +0530 [trace]: fluent/log.rb:319:trace: registered formatter plugin 'stdout'
2025-01-07 18:48:18 +0530 [trace]: fluent/log.rb:319:trace: registered formatter plugin 'json'
2025-01-07 18:48:18 +0530 [trace]: fluent/log.rb:319:trace: registered parser plugin 'regexp'
2025-01-07 18:48:18 +0530 [trace]: fluent/log.rb:319:trace: registered parser plugin 'multiline'
2025-01-07 18:48:18 +0530 [trace]: fluent/log.rb:319:trace: registered input plugin 'tail'
2025-01-07 18:48:18 +0530 [trace]: fluent/log.rb:319:trace: registered parser plugin 'none'
2025-01-07 18:48:18 +0530 [debug]: fluent/log.rb:341:debug: No fluent logger for internal event
2025-01-07 18:48:18 +0530 [info]: fluent/log.rb:362:info: using configuration file: <ROOT>
  <source>
    @type tail
    path "/home/ec2-user/log1.txt"
    pos_file "/home/ec2-user/fluent-package-v5/conf/config.d/[email protected]"
    tag "demosvc"
    format none
    <parse>
      @type none
      unmatched_lines
    </parse>
  </source>
  <match demosvc>
    @type stdout
  </match>
</ROOT>
2025-01-07 18:48:18 +0530 [info]: fluent/log.rb:362:info: starting fluentd-1.16.6 pid=18446 ruby="3.2.6"
2025-01-07 18:48:18 +0530 [info]: fluent/log.rb:362:info: spawn command to main:  cmdline=["/opt/fluent/bin/ruby", "-Eascii-8bit:ascii-8bit", "/usr/sbin/fluentd", "-vv", "-d", "../bin/fluentd.pid", "-c", "../conf/td-agent.conf", "-p", "../plugin", "--log", "../log/td-agent.log", "--log-rotate-age", "weekly", "--under-supervisor"]
2025-01-07 18:48:19 +0530 [info]: #0 fluent/log.rb:362:info: init worker0 logger path="../log/td-agent.log" rotate_age="weekly" rotate_size=nil
2025-01-07 18:48:19 +0530 [info]: fluent/log.rb:362:info: adding match pattern="demosvc" type="stdout"
2025-01-07 18:48:19 +0530 [trace]: #0 fluent/log.rb:319:trace: registered output plugin 'stdout'
2025-01-07 18:48:19 +0530 [trace]: #0 fluent/log.rb:319:trace: registered metrics plugin 'local'
2025-01-07 18:48:19 +0530 [trace]: #0 fluent/log.rb:319:trace: registered buffer plugin 'memory'
2025-01-07 18:48:19 +0530 [trace]: #0 fluent/log.rb:319:trace: registered formatter plugin 'stdout'
2025-01-07 18:48:19 +0530 [trace]: #0 fluent/log.rb:319:trace: registered formatter plugin 'json'
2025-01-07 18:48:19 +0530 [info]: fluent/log.rb:362:info: adding source type="tail"
2025-01-07 18:48:19 +0530 [trace]: #0 fluent/log.rb:319:trace: registered parser plugin 'regexp'
2025-01-07 18:48:19 +0530 [trace]: #0 fluent/log.rb:319:trace: registered parser plugin 'multiline'
2025-01-07 18:48:19 +0530 [trace]: #0 fluent/log.rb:319:trace: registered input plugin 'tail'
2025-01-07 18:48:19 +0530 [trace]: #0 fluent/log.rb:319:trace: registered parser plugin 'none'
2025-01-07 18:48:19 +0530 [debug]: #0 fluent/log.rb:341:debug: No fluent logger for internal event
2025-01-07 18:48:19 +0530 [info]: #0 fluent/log.rb:362:info: starting fluentd worker pid=18456 ppid=18453 worker=0
2025-01-07 18:48:19 +0530 [debug]: #0 fluent/log.rb:341:debug: Compacted entries: ["/home/ec2-user/log1.txt"]
2025-01-07 18:48:19 +0530 [debug]: #0 fluent/log.rb:341:debug: Remove missing entries. existing_targets=["/home/ec2-user/log1.txt"] entries_after_removing=["/home/ec2-user/log1.txt"]
2025-01-07 18:48:19 +0530 [debug]: #0 fluent/log.rb:341:debug: tailing paths: target = /home/ec2-user/log1.txt | existing =
2025-01-07 18:48:19 +0530 [info]: #0 fluent/log.rb:362:info: following tail of /home/ec2-user/log1.txt
2025-01-07 18:48:19 +0530 [info]: #0 fluent/log.rb:362:info: fluentd worker is now running worker=0
2025-01-07 18:48:37 +0530 [info]: #0 fluent/log.rb:362:info: detected rotation of /home/ec2-user/log1.txt
2025-01-07 18:48:37 +0530 [info]: #0 fluent/log.rb:362:info: following tail of /home/ec2-user/log1.txt
2025-01-07 18:48:37.070691841 +0530 demosvc: {"message":"a"}
2025-01-07 18:48:37.070700551 +0530 demosvc: {"message":"b"}
2025-01-07 18:48:37.070703321 +0530 demosvc: {"message":"c"}
2025-01-07 18:48:37.070705621 +0530 demosvc: {"message":"d"}
2025-01-07 18:49:19 +0530 [debug]: #0 fluent/log.rb:341:debug: tailing paths: target = /home/ec2-user/log1.txt | existing = /home/ec2-user/log1.txt
2025-01-07 18:50:19 +0530 [debug]: #0 fluent/log.rb:341:debug: tailing paths: target = /home/ec2-user/log1.txt | existing = /home/ec2-user/log1.txt
2025-01-07 18:51:19 +0530 [debug]: #0 fluent/log.rb:341:debug: tailing paths: target = /home/ec2-user/log1.txt | existing = /home/ec2-user/log1.txt
2025-01-07 18:52:19 +0530 [debug]: #0 fluent/log.rb:341:debug: tailing paths: target = /home/ec2-user/log1.txt | existing = /home/ec2-user/log1.txt
2025-01-07 18:53:19 +0530 [debug]: #0 fluent/log.rb:341:debug: tailing paths: target = /home/ec2-user/log1.txt | existing = /home/ec2-user/log1.txt
2025-01-07 18:54:19 +0530 [debug]: #0 fluent/log.rb:341:debug: tailing paths: target = /home/ec2-user/log1.txt | existing = /home/ec2-user/log1.txt
2025-01-07 18:55:19 +0530 [debug]: #0 fluent/log.rb:341:debug: tailing paths: target = /home/ec2-user/log1.txt | existing = /home/ec2-user/log1.txt
2025-01-07 18:56:19 +0530 [debug]: #0 fluent/log.rb:341:debug: tailing paths: target = /home/ec2-user/log1.txt | existing = /home/ec2-user/log1.txt
2025-01-07 18:56:51 +0530 [info]: #0 fluent/log.rb:362:info: detected rotation of /home/ec2-user/log1.txt
2025-01-07 18:56:51 +0530 [info]: #0 fluent/log.rb:362:info: following tail of /home/ec2-user/log1.txt
2025-01-07 18:56:51.534240333 +0530 demosvc: {"message":"a"}
2025-01-07 18:56:51.534247023 +0530 demosvc: {"message":"b"}
2025-01-07 18:56:51.534249323 +0530 demosvc: {"message":"c"}
2025-01-07 18:56:51.534251443 +0530 demosvc: {"message":"d"}
2025-01-07 18:56:51.534253793 +0530 demosvc: {"message":"e"}
2025-01-07 18:56:51.534255783 +0530 demosvc: {"message":"f"}
2025-01-07 18:56:51.534257823 +0530 demosvc: {"message":"g"}
2025-01-07 18:56:51.534259953 +0530 demosvc: {"message":"h"}
2025-01-07 18:57:12 +0530 [info]: #0 fluent/log.rb:362:info: detected rotation of /home/ec2-user/log1.txt
2025-01-07 18:57:12 +0530 [info]: #0 fluent/log.rb:362:info: following tail of /home/ec2-user/log1.txt
2025-01-07 18:57:12.514073094 +0530 demosvc: {"message":"a"}
2025-01-07 18:57:12.514079934 +0530 demosvc: {"message":"b"}
2025-01-07 18:57:12.514082074 +0530 demosvc: {"message":"c"}
2025-01-07 18:57:12.514084194 +0530 demosvc: {"message":"d"}
2025-01-07 18:57:12.514087054 +0530 demosvc: {"message":"e"}
2025-01-07 18:57:12.514089185 +0530 demosvc: {"message":"f"}
2025-01-07 18:57:12.514091355 +0530 demosvc: {"message":"g"}
2025-01-07 18:57:12.514093345 +0530 demosvc: {"message":"h"}
2025-01-07 18:57:12.514095305 +0530 demosvc: {"message":"i"}
2025-01-07 18:57:12.514097665 +0530 demosvc: {"message":"j"}
2025-01-07 18:57:12.514099715 +0530 demosvc: {"message":"k"}
2025-01-07 18:57:12.514101725 +0530 demosvc: {"message":"l"}
2025-01-07 18:57:12.514103705 +0530 demosvc: {"message":"m"}
2025-01-07 18:57:12.514105755 +0530 demosvc: {"message":"n"}
2025-01-07 18:57:12.514107895 +0530 demosvc: {"message":"p"}
2025-01-07 18:57:19 +0530 [debug]: #0 fluent/log.rb:341:debug: tailing paths: target = /home/ec2-user/log1.txt | existing = /home/ec2-user/log1.txt

Additional context

in logfile we have first added , a,b,c,d then e,f,g,h then i,j,k,l,m,n,p but every time its giving whole log file output. even pos_file is updating the first Unique number updates every time after log change the second unique number remains same whole time.

/home/ec2-user/log1.txt 000000000000001e 000000000025f4f5

@daipom
Copy link
Contributor

daipom commented Jan 15, 2025

@tadkarshirish
in_tail detects the file rotation when the file is updated.

2025-01-07 18:57:12 +0530 [info]: #0 fluent/log.rb:362:info: detected rotation of /home/ec2-user/log1.txt

So, in_tail rereads all the logs.

Please check how the file is updated.
For example, the file might be updated as follows.

  1. Make the file empty.
  2. Rewrite the whole content of the file.

@daipom daipom added waiting-for-user Similar to "moreinfo", but especially need feedback from user and removed waiting-for-triage labels Jan 15, 2025
@daipom daipom changed the title Repeated Logs Despite pos_file Updating in Fluentd(LTS) on Linux 2023 in_tail: Repeated Logs Despite pos_file Updating in Fluentd(LTS) on Linux 2023 Jan 15, 2025
@tadkarshirish
Copy link
Author

tadkarshirish commented Jan 15, 2025

@daipom

We are testing with Sample log file which we are updating with following steps.

1. Create a Sample Log File
Run the following commands to create and populate a log file:
vi log1.txt

Add sample log entries manually (e.g., a, b, c, d) and save the file using :wq.

2. Create Fluentd Configuration
Create a configuration file named fluentd.conf:

vi fluentd.conf

Add the following sample configuration, ensuring correct paths for the log and pos file:

<source>
  @type tail
  path /home/ec2-user/log1.txt
  pos_file /home/ec2-user/fluent-package-v5/conf/config.d/[email protected]
  tag demosvc
  format none
</source>

<match demosvc>
  @type stdout
</match>

Save and exit using :x

3. Start Fluentd
Run Fluentd with the specified configuration:

fluentd -c /path/to/fluentd.conf

4. Edit the Sample Log File

vi log1.txt

Append new log entries (e.g., e, f)

Save and exit using the secure save command: :x

5. Check Fluentd Output

Monitor the Fluentd logs or check the stdout output to verify that log entries have been processed.

6. Repeat Log Editing and Save

Reopen log1.txt using vi, make edits, and save using :x

@daipom
Copy link
Contributor

daipom commented Jan 15, 2025

@tadkarshirish
Did you confirm log duplication with those steps?
I could not confirm.
Please check what's different.

$ vi fluentd.conf
$ cat fluentd.conf
<source>
  @type tail
  path /tmp/2/log1.txt
  pos_file /tmp/2/pos/pos
  tag demosvc
  format none
</source>

<match demosvc>
  @type stdout
</match>
$ vi log1.txt
$ cat log1.txt
a
b
c
d
(Launch Fluentd with another shell here)
$ vi log1.txt
$ cat log1.txt
a
b
c
d
e
f

Fluentd

$ bundle exec fluentd -c /tmp/2/fluentd.conf
2025-01-15 15:20:15 +0900 [info]: init supervisor logger path=nil rotate_age=nil rotate_size=nil
2025-01-15 15:20:15 +0900 [info]: parsing config file is succeeded path="/tmp/2/fluentd.conf"
2025-01-15 15:20:15 +0900 [info]: gem 'fluentd' version '1.18.0'
2025-01-15 15:20:15 +0900 [info]: using configuration file: <ROOT>
  <source>
    @type tail
    path "/tmp/2/log1.txt"
    pos_file "/tmp/2/pos/pos"
    tag "demosvc"
    format none
    <parse>
      @type none
      unmatched_lines 
    </parse>
  </source>
  <match demosvc>
    @type stdout
  </match>
</ROOT>
2025-01-15 15:20:15 +0900 [info]: starting fluentd-1.18.0 pid=732107 ruby="3.2.2"
2025-01-15 15:20:15 +0900 [info]: spawn command to main:  cmdline=["/home/daipom/.rbenv/versions/3.2.2/bin/ruby", "-r/home/daipom/.rbenv/versions/3.2.2/lib/ruby/site_ruby/3.2.0/bundler/setup", "-Eascii-8bit:ascii-8bit", "/home/daipom/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/bin/fluentd", "-c", "/tmp/2/fluentd.conf", "--under-supervisor"]
2025-01-15 15:20:15 +0900 [info]: #0 init worker0 logger path=nil rotate_age=nil rotate_size=nil
2025-01-15 15:20:15 +0900 [info]: adding match pattern="demosvc" type="stdout"
2025-01-15 15:20:15 +0900 [info]: adding source type="tail"
2025-01-15 15:20:15 +0900 [info]: #0 starting fluentd worker pid=732127 ppid=732107 worker=0
2025-01-15 15:20:15 +0900 [info]: #0 following tail of /tmp/2/log1.txt
2025-01-15 15:20:15 +0900 [info]: #0 fluentd worker is now running worker=0
2025-01-15 15:20:31.528486080 +0900 demosvc: {"message":"e"}
2025-01-15 15:20:31.528499539 +0900 demosvc: {"message":"f"}

@tadkarshirish
Copy link
Author

tadkarshirish commented Jan 15, 2025

@daipom

as per your steps its working fine and there is no log duplication for first logfile addition. , but after editing log file few times it is showing duplication issue.

2025-01-15 13:57:01 +0530 [debug]: #0 fluent/log.rb:341:debug: tailing paths: target = /home/ec2-user/log1.txt | existing =
2025-01-15 13:57:01 +0530 [info]: #0 fluent/log.rb:362:info: following tail of /home/ec2-user/log1.txt
2025-01-15 13:57:01 +0530 [info]: #0 fluent/log.rb:362:info: fluentd worker is now running worker=0
2025-01-15 13:57:34.499766260 +0530 demosvc: {"message":"e"}
2025-01-15 13:57:34.499768560 +0530 demosvc: {"message":"f"}
2025-01-15 13:58:01 +0530 [debug]: #0 fluent/log.rb:341:debug: tailing paths: target = /home/ec2-user/log1.txt | existing = /home/ec2-user/log1.txt
2025-01-15 13:58:17.415013155 +0530 demosvc: {"message":"i"}
2025-01-15 13:58:17.415019765 +0530 demosvc: {"message":"g"}
2025-01-15 13:58:17.415021785 +0530 demosvc: {"message":"k"}
2025-01-15 13:58:58.095310381 +0530 demosvc: {"message":"l"}
2025-01-15 13:58:58.095318521 +0530 demosvc: {"message":"m"}
2025-01-15 13:58:58.095320491 +0530 demosvc: {"message":"n"}
2025-01-15 13:58:58.095322311 +0530 demosvc: {"message":"o"}
2025-01-15 13:59:01 +0530 [debug]: #0 fluent/log.rb:341:debug: tailing paths: target = /home/ec2-user/log1.txt | existing = /home/ec2-user/log1.txt
2025-01-15 13:59:58 +0530 [info]: #0 fluent/log.rb:362:info: detected rotation of /home/ec2-user/log1.txt
2025-01-15 13:59:58 +0530 [info]: #0 fluent/log.rb:362:info: following tail of /home/ec2-user/log1.txt
2025-01-15 13:59:58.963044398 +0530 demosvc: {"message":"a"}
2025-01-15 13:59:58.963052678 +0530 demosvc: {"message":"b"}
2025-01-15 13:59:58.963055168 +0530 demosvc: {"message":"c"}
2025-01-15 13:59:58.963057528 +0530 demosvc: {"message":"d"}
2025-01-15 13:59:58.963060398 +0530 demosvc: {"message":"e"}
2025-01-15 13:59:58.963062798 +0530 demosvc: {"message":"f"}
2025-01-15 13:59:58.963064858 +0530 demosvc: {"message":"i"}
2025-01-15 13:59:58.963066738 +0530 demosvc: {"message":"g"}
2025-01-15 13:59:58.963069398 +0530 demosvc: {"message":"k"}
2025-01-15 13:59:58.963072598 +0530 demosvc: {"message":"l"}
2025-01-15 13:59:58.963074558 +0530 demosvc: {"message":"m"}
2025-01-15 13:59:58.963076468 +0530 demosvc: {"message":"n"}
2025-01-15 13:59:58.963078419 +0530 demosvc: {"message":"o"}
2025-01-15 13:59:58.963080319 +0530 demosvc: {"message":"x"}
2025-01-15 13:59:58.963082229 +0530 demosvc: {"message":"y"}
2025-01-15 13:59:58.963084179 +0530 demosvc: {"message":"z"}

@daipom
Copy link
Contributor

daipom commented Jan 17, 2025

@tadkarshirish
I have tried to add logs many times, but I can't reproduce it.
Is there any way to reproduce this issue surely, e.g., with a shellscript?

From the information we have at this time, it is highly likely that the decrease of file size was recognized as the occurrence of rotation.

For example, I can reproduce this by rewriting an existing line to reduce the file size.

Could you please check if the file size is decreasing when this issue occurs?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
waiting-for-user Similar to "moreinfo", but especially need feedback from user
Projects
None yet
Development

No branches or pull requests

2 participants