-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add neighbor state #3191
base: master
Are you sure you want to change the base?
Add neighbor state #3191
Conversation
Signed-off-by: Emin Umut Gercek <[email protected]>
74f1c6f
to
4a7715d
Compare
Signed-off-by: Emin Umut Gercek <[email protected]>
4a7715d
to
2246604
Compare
…-arp-states Signed-off-by: Emin Umut Gercek <[email protected]>
de8539a
to
d679000
Compare
Also fixed the merge conflict. |
collector/arp_linux.go
Outdated
|
||
for _, n := range neighbors { | ||
// Skip entries which have state NUD_NOARP to conform to output of /proc/net/arp. | ||
if n.State&unix.NUD_NOARP == 0 { | ||
entries[n.Interface.Name]++ | ||
if n.State&unix.NUD_NOARP != unix.NUD_NOARP { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I can say?
if n.State&unix.NUD_NOARP != unix.NUD_NOARP { | |
if n.State != unix.NUD_NOARP { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, n.State
is a bit mask. The bitwise AND is necessary to isolate the NUD_NOARP
flag that we wish to test for. Other bits may be set in n.State
, so you cannot just do a simple comparison as you suggest.
cf. man 7 rtnetlink
:
ndm_state is a bit mask of the following states:
NUD_INCOMPLETE a currently resolving cache entry
NUD_REACHABLE a confirmed working cache entry
NUD_STALE an expired cache entry
NUD_DELAY an entry waiting for a timer
NUD_PROBE a cache entry that is currently reprobed
NUD_FAILED an invalid cache entry
NUD_NOARP a device with no destination cache
NUD_PERMANENT a static entry
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for explanation, makes sense. Then current implementation looks ok. I changed n.State&unix.NUD_NOARP == 0
to have early return, can revert that if you wish.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed the bug after i changed here, btw.
LGTM in general but I think this needs some tests |
Also see lint error:
|
Signed-off-by: Emin Umut Gercek <[email protected]>
Signed-off-by: Emin Umut Gercek <[email protected]>
Hi there was a bug on On the testing side, I'm not sure how to test RTNL via fixtures. Maybe we can test To test it, I wrote a reproducible but hacky Bash script. I don't recommend adding it as a test in the repository, but it demonstrates that the functionality implemented correctly. I would appreciate guidance on how to add proper tests for the RTNL part. (You can run the script on your machine. It only needs docker, and Script#!/bin/bash
set -eu
export DOCKER_CLI_HINTS=false
docker rm -f arp-test nginx1 &> /dev/null || true
tempFile=$(mktemp)
cat <<EOT > $tempFile
FROM ubuntu
COPY node_exporter /opt/node_exporter
RUN apt update 1>/dev/null && apt install -y curl iproute2
USER nobody
ENTRYPOINT ["/opt/node_exporter"]
EOT
GOOS=linux GOARCH=amd64 go build .
docker build -t arp-test -f $tempFile . &>/dev/null
docker run --name arp-test \
-p 9100:9100 \
--rm -d \
arp-test &>/dev/null
docker exec arp-test ip neigh
docker run -d --name nginx1 nginx &>/dev/null
nginxIp=$(docker inspect \
-f '{{range.NetworkSettings.Networks}}{{.IPAddress}}{{end}}' nginx1)
cur_state() {
echo "====="
echo "From node_exporter:"
curl -s localhost:9100/metrics | grep -E ^node_arp
echo "From ip neigh:"
docker exec arp-test ip neigh
echo "====="
}
echo "Initial state"
cur_state
echo "After curl nginx1 (NUD_DELAY or NUD_REACHABLE)"
docker exec arp-test curl -s -o /dev/null $nginxIp
cur_state
sleep 2
echo "Wait arp cache (NUD_REACHABLE)"
cur_state
echo "After curl unknown ip (NUD_INCOMPLETE or NUD_FAILED)"
docker exec arp-test curl -m 1 -s -o /dev/null 172.17.0.10 || true
cur_state
echo "Sleep 30 second for net.ipv4.neigh.default.base_reachable_time_ms (NUD_FAILED)"
sleep 30
cur_state Output
|
Signed-off-by: Emin Umut Gercek <[email protected]>
collector/arp_linux.go
Outdated
} | ||
|
||
// Map of interface name to ARP neighbor count. | ||
entries := make(map[string]uint32) | ||
// Map of map[InterfaceName]map[StateName]int | ||
states := make(map[string]map[string]uint32) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that if I were to implement this, I would keep the neighbor states as their native uint16
until ready to emit the metrics. This would be more memory efficient and also result in fewer map lookups. In other words, make this a map[string]map[uint16]uint32
and only resolve the states to string labels when emitting the metrics to the channel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very good idea, we won't store all those strings now, also fewer lookups as you said. Thanks for your careful review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can, resolve the conversation if the new commit is ok.
Signed-off-by: Emin Umut Gercek <[email protected]>
I have a few concerns remaining about this PR. Firstly, this has the potential to inflate cardinality quite considerably on hosts with a lot of interfaces. The simple ARP entry count is easy to predict, as it won't be more than one metric per interface. However, adding enum states where there could potentially be three or four different non-zero state counts per interface might be a concern for some users. I know for a fact that some users run node_exporter on hosts with hundreds of interfaces. Perhaps this new feature should be gated by a command line flag. Secondly, as with any metric involving enum states, if only non-zero enum counters are exposed, this will lead to stale metrics when a state changes. For example, if Finally, as I already mentioned, the neighbor state is a bit mask. This PR in its current state assumes that the neighbor state will only ever be a single NUD state, and that a simple map lookup for that value is acceptable. Are you sure that there will never be entries that are a combination of state flags? If this occurs, the |
Implements part of #535
Some things we can talk:
state="stale"
like labels and just give the state value to Gauge (IMO current impl is simpler) although I saw below in CONTRIBUTING.md I didn't see many magic numbers in the metrics and labels thus added human readable names for them.Update
function to make it more readable (again IMO it's simple enough to do it in one function)This pr adds below metrics
Old comments
I changed the design in second commit due to not exploding cardinality but keep the comments if maintainers wants that design.
Iff you want to have metrics like below, can continue to read:
Some things we can talk:
neighborState
struct and just use string slice, but this is easier to follow IMO1024*count(net_int)
as max. But in some "fine tuning"s I saw people generally increase this although it's limited by the size of you subnet, one can have many machines in subnet. Maybe I should make a counter for each state?@SuperQ it's ready for your review, when you have time 🙏🏻