[KEP-4817]: DRA: Resource Claim Status with possible standardized network interface data #4861

LionelJouin · 2024-09-19T19:05:30Z

One-line PR description: Add driver-owned fields in ResourceClaim.Status with possible standardized network interface data

Issue link: DRA: Resource Claim Status with possible standardized network interface data #4817

Other comments:
I would like to get early feedback for this KEP document.
Please take a look at the drawbacks as this is uncertain to me. It is probably related to the discussion to include the device state as part of the Pod.Status.

I followed option 5 here: https://docs.google.com/document/d/1smjEm7WxFm8btQwywdr_F5BZkiifatNqDvrQUaesu1M/edit?usp=sharing and this PoC here: https://gist.github.com/LionelJouin/5cfc11eecf73663b5657ed3be1eb6c00

Signed-off-by: Lionel Jouin <[email protected]>

k8s-ci-robot · 2024-09-19T19:05:40Z

Hi @LionelJouin. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

johnbelamaric · 2024-09-19T19:29:38Z

/assign @johnbelamaric @aojea

kannon92 · 2024-09-19T19:40:42Z

/ok-to-test

kannon92 · 2024-09-19T19:41:39Z

/retitle [KEP-4817]: DRA: Resource Claim Status with possible standardized network interface data

We have another KEP that is about device health status so renaming this to be a bit more clear what the KEP is about.

johnbelamaric

Ok, left a few comments, this is a good start.

Can you fix the MD formatting so we don't have really long lines? In vi, it's :gwap and it will wrap your paragraph nicely...

Also feel free to include any text from https://docs.google.com/document/d/1smjEm7WxFm8btQwywdr_F5BZkiifatNqDvrQUaesu1M/edit?tab=t.0#heading=h.wui6s3gos7b2 that you find useful.

keps/sig-node/4817-resource-claim-device-status/README.md

johnbelamaric · 2024-09-20T23:01:29Z

keps/sig-node/4817-resource-claim-device-status/README.md

+
+The API changes define a new `DeviceStatuses` field in the existing `ResourceClaimStatus` struct. `DeviceStatuses` is a slice of a new struct `AllocatedDeviceStatus` which holds device specific information. 
+
+A device, identified by `<driver name>/<pool name>/<device name>` can be represented only once in the `DeviceStatuses` slice and will also mention which request caused the device to be allocated. The state and characteristics of the device are reported in the `Conditions`, representing the operational state of the device and in the `DeviceInfo`, an arbitrary data slice representing device specific characteristics. Additionally, for networking devices, a field `NetworkDeviceInfo` can be used to report the IPs, the MAC address and the interface name.


This is one approach, where there is DeviceInfo and NetworkDeviceInfo. I am not sure that will pass API review muster, but maybe. An alternative is what is described in Option 5. Instead of 3 fields, there would be two fields: Conditions and DeviceInfo. The DeviceInfo field is a one-of API type. So, something like:

type AllocatedDeviceStatus struct { ... DeviceInfo *DeviceInfo } type DeviceInfo struct { // +optional // +oneOf=DeviceInfoType Network *DeviceInfoNetwork // +optional // +oneOf=DeviceInfoType General *DeviceInfoGeneral } type DeviceInfoGeneral struct { Opaque RawExtension } type DeviceInfoNetwork struct { InterfaceName string IPAddresses []IPAddress MACAddress string Opaque RawExtension }

@thockin WDYT? Every one-of would have the Opaque RawExtension plus one-of specific fields. Should that therefore just bubble up one level, so that we have a RawExtension, and then an optional device-type based one-of? Or, what about compound devices that may be several types. Should we just make the Opaque and Network fields separate, optional fields that may or may not be populated for any given device? I am not sure the one-of construction actually helps. (In other words, go with what is already here in the KEP).

I see, I interpreted it like any driver can use the opaque field (DeviceInfo) + their own type. So the oneOf would be between specific type (NetworkDeviceInfo, GPUDeviceInfo...).

Yeah, but the way you did it is less prescriptive - it allows a device to serve multiple functions. So I rather like that.

To be a little more verbose about my last comment:

we don't know what devices will come out in the future. So, whose to say they won't stick an optical connection on a GPU...or maybe they have already? So, maybe the GPU and NIC will be a single device, not two separate devices

As VM architects, there may be specific devices that "go together as one". The GPU/NIC combos is an example. Right now, we are using MatchAttributes with a pcieRoot attribute to align the internal topology. That's a lot of work for the user. Instead, I am looking at building a "compound device driver" on top of the partitionable device model. This would allow a single "device" owned by that driver to be allocated, but in reality that driver would proxy calls for that device to multiple other drivers. So, the VM architect could "pair" specific devices, such that you cannot get one without the other. Those devices would have separate functions, represented by the separate underlying drivers. I would want to be able to stick information about each underlying device in the right "device info field" for that function.

Thinking about it, this is still limited of course, to one device per function. Another option would be to have the one-of, but allow multiple entries per driver/pool/device tuple.

johnbelamaric · 2024-09-20T23:02:23Z

keps/sig-node/4817-resource-claim-device-status/README.md

+	//
+	// +optional
+	// +listType=atomic
+	DeviceInfo []runtime.RawExtension `json:"deviceInfo,omitempty" protobuf:"bytes,6,rep,name=deviceInfo"`


see comment above regarding this versus a one-of

johnbelamaric · 2024-09-20T23:07:18Z

Also, you're going to need a section on the authorization for the driver to write to the status, as is actively under discussion in Slack.

Signed-off-by: Lionel Jouin <[email protected]>

keps/sig-node/4817-resource-claim-device-status/README.md

pohly · 2024-09-23T08:36:16Z

keps/sig-node/4817-resource-claim-device-status/README.md

+	// per request.
+	//
+	// +required
+	Request string `json:"request" protobuf:"bytes,1,rep,name=request"`


Is this needed? More generally, can we drop the "Allocated" part from the API?

It is not useful to post status of arbitrary, unrelated devices, but it can happen at least temporarily, for example directly after a claim got deallocated.

Speaking of which, who removes such stale entries?

A claim can go from "new" to "allocated" back to "deallocated" and then to "allocated" again.

Right, I removed it, I guess this is repetitive as this information can be retrieved from .Status.Allocation.Devices.Results.

In the Notes/Constraints/Caveats section, I stated The accuracy of the information depends on the implementation of the DRA Drivers, the reported status of the device may not always reflect the real time changes of the state of the device. so the DRA-Driver should update these information based on when it configures/check/unconfigured devices.

Probably if the ResourceClaim gets deallocated then the DeviceStatuses can be cleared as it might be deleted or reallocated in which case the driver will have to reconfigure the devices in the pods/containers?

I suppose that could be added. In that case, calling this AllocatedDeviceStatus would make sense because it is meant and documented as "only for allocated devices".

pohly · 2024-09-23T08:37:22Z

keps/sig-node/4817-resource-claim-device-status/README.md

+	// +listType=atomic
+	Conditions []metav1.Condition `json:"conditions" protobuf:"bytes,5,rep,name=conditions"`
+
+	// DeviceInfo contains Arbitrary driver-specific data.


Suggested change

// DeviceInfo contains Arbitrary driver-specific data.

// DeviceInfo contains arbitrary driver-specific data.

How can a consumer reliably determine how to parse this content? The existing driver field above might not be enough: for example, two different drivers "foo.example.com" and "bar.example.com" might both provide data according to some common specification.

A consumer shouldn't have to keep a database that lists whether a driver uses such a common specification.

How can a consumer reliably determine how to parse this content? The existing driver field above might not be enough: for example, two different drivers "foo.example.com" and "bar.example.com" might both provide data according to some common specification.

A consumer shouldn't have to keep a database that lists whether a driver uses such a common specification.

One option is to require a GVK in here. But another is just to say - you shouldn't be consuming opaque data. If there is a common spec, it should be a built in type.

I think that was the idea yes as it would be quite difficult to cover all use cases with the specs we would provide in this KEP. Letting the driver reporting arbitrary status could still be useful I guess (for troubleshooting for example)?

yes. but no one except the driver should rely on that data

MikeZappa87 · 2024-09-23T15:21:52Z

keps/sig-node/4817-resource-claim-device-status/README.md

+	// This can include both IPv4 and IPv6 addresses.
+	//
+	// +optional
+	IPs []string `json:"ips,omitempty" protobuf:"bytes,2,rep,name=ips"`


Does it make sense to have a type here instead of string? Does mask, family add value?

type IP struct { address mask family }

I replaced this field with a NetworkAddresses one:

type NetworkDeviceInfo struct { ... // NetworkAddresses lists the network addresses assigned to the device's network interface. // This can include both IPv4 and IPv6 addresses. // // +optional NetworkAddresses []NetworkAddress `json:"networkAddresses,omitempty" protobuf:"bytes,2,rep,name=networkAddresses"` ... } // NetworkAddress provides a network address related details such as IP and Mask. type NetworkAddress struct { // CIDR contains the network address in CIDR notation, which includes // both the address and the associated subnet mask. // e.g.: "192.0.2.0/24" for IPv4 and "2001:db8::/64" for IPv6. // // +required CIDR string `json:"cidr,omitempty" protobuf:"bytes,1,rep,name=cidr"` }

For reference here is how the NetworkAttachmentDefinition Status looks like and here is for the CNI Result.

we have validation on types, I think we don't need to force the types and it removes complexity if we just use an string and we validate is a valid CIDR , WDYT @thockin

I switched it back to a single field

type NetworkDeviceData struct { ... // Addresses lists the network addresses assigned to the device's network interface. // This can include both IPv4 and IPv6 addresses. // The addresses are in the CIDR notation, which includes both the address and the // associated subnet mask. // e.g.: "192.0.2.0/24" for IPv4 and "2001:db8::/64" for IPv6. // // +optional // +listType=atomic Addresses []string `json:"addresses,omitempty" protobuf:"bytes,2,opt,name=addresses"` ... }

* Typos * Establish a standard instead of laying the foundation for creating a standard * KEP Name * DeviceStatuses keys * AllocatedDeviceStatus request field removed * IPs to a new struct Signed-off-by: Lionel Jouin <[email protected]>

keps/sig-node/4817-resource-claim-device-status/README.md

dougbtv · 2024-10-08T14:20:45Z

keps/sig-node/4817-resource-claim-device-status/README.md

+    // NetworkData contains network-related information specific to the device.
+    //
+    // +optional
+    NetworkData NetworkDeviceData `json:"networkData,omitempty" protobuf:"bytes,6,opt,name=networkData"`


Question about NetworkData vs. Data. So let's consider a situation where I've got users using NetworkAttachmentDefinition style specifications, and so they've got fields like:

Name string `json:"name"` Interface string `json:"interface,omitempty"` IPs []string `json:"ips,omitempty"` Mac string `json:"mac,omitempty"` Mtu int `json:"mtu,omitempty"` Default bool `json:"default,omitempty"` DNS DNS `json:"dns,omitempty"` DeviceInfo *DeviceInfo `json:"device-info,omitempty"` Gateway []string `json:"gateway,omitempty"`

The so, I have this mapping...

Interface --> InterfaceName IPs --> Addresses Mac --> HWAddress

Which is 3/x, so the other values would be remapped to... The data field?

I'm mostly looking for clarity, and trying to figure out standardization (where it makes sense), and also -- extensibility. That is, that with the */network-status: annotation with net-attach-def, we've definitely had to slowly expand that struct over time. So I wonder if we should focus on building some standards for the Data field if these other fields would go there? Or, could we have something as part of the NetworkDeviceData that is somehow extensible, and maybe think about forming a standard for it via Multinetworking WG?

If you want to store the rest, yes, the Data field. The main idea is that the Data field is 100% driver specific and so there are no guarantees (by the Kubernetes project) with respect to this data structure or API evolution.

The question I have is what data needs to be consumed by other ecosystem projects? That's what should be in a K8s built-in API.

Or, could we have something as part of the NetworkDeviceData that is somehow extensible, and maybe think about forming a standard for it via Multinetworking WG?

Maybe NetworkDeviceData should be a K8s one-of?

Or, to be really flexible, instead of two fields, we have one field Data []DeviceData, sort of like we did originally but now an array instead of just a single value:

type AllocatedDeviceStatus struct { ... Data []DeviceData } type DeviceData struct { // exactly one must be populated // +optional Opaque *runtime.RawExtension // +optional NetworkData *NetworkDeviceData // +optional SomeFutureThing *MultinetThing }

The sort of hard part is now we can have an aribtrary number of each type, but maybe that's OK.

@dougbtv one question, what of those things are configuration option and what of those things need to return a value from the driver?

It look most of those things are configuration options, that already are reported in status, the thing we try to standardize are the things that are reported by the driver that are not set in configuration.

I guess the question is, what of those values are set by the user and what of those values are reported by the driver only (they are not configured) , like addresses and interface name.

The earlier text implies that Data may include multiple representations of the same data in different formats:

`Data` being an arbitrary data field allows the DRA Driver to store device specific data in different formats. For example, a Network Device being configured by via a CNI plugin could get its `Data` field filled with the CNI result for troubleshooting purpose and with a Network result in a modern standard format (closer to Pod.Status.PodIPs for example) used by 3rd party controllers.

Or maybe that's an artifact from an earlier version of the text and the second part is supposed to be the NetworkData field?

If there is any intention that Data could have multiple representations of the same data, then it seems like it ought to be explicitly an array or map. Otherwise how are you supposed to be able to reliably extract the representation you wanted?

So I think I like John's version above.

yes, this is definitely not for config - config goes in the DeviceClaim.Config list, which is a bunch of RawExtensions (or in the DeviceClass.Config).

I see, I misinterpreted it sorry @dougbtv

@johnbelamaric is the list not going to be confusing to identify which information goes where?

@johnbelamaric is the list not going to be confusing to identify which information goes where?

Yes, each list entry needs an identifier if we do this. Maybe make it a list-map?

obj.status.devices[i].driver.{opaque,network,whatever}

driver already exists in device[i] as the name of the driver.

obj.status.devices[i].status.{opaque,network,whatever}

Meh. Not better than data to me.

With data as a slice as I am suggesting:

obj.status.devices[i].data[j].{opaque,network,whatever}

obj.status.devices[i].data[j].{opaque,network,whatever}

So a claim has multiple devices, each device has multiple datas (datae?), each of which has exactly one field set, right?

yes, that's right. fun stuff

data is a list-map with some sort of driver-defined identifier for the key

keps/sig-node/4817-resource-claim-device-status/README.md

danwinship · 2024-10-08T15:30:19Z

keps/sig-node/4817-resource-claim-device-status/README.md

+    // NetworkData contains network-related information specific to the device.
+    //
+    // +optional
+    NetworkData NetworkDeviceData `json:"networkData,omitempty" protobuf:"bytes,6,opt,name=networkData"`


The earlier text implies that Data may include multiple representations of the same data in different formats:

`Data` being an arbitrary data field allows the DRA Driver to store device specific data in different formats. For example, a Network Device being configured by via a CNI plugin could get its `Data` field filled with the CNI result for troubleshooting purpose and with a Network result in a modern standard format (closer to Pod.Status.PodIPs for example) used by 3rd party controllers.

Or maybe that's an artifact from an earlier version of the text and the second part is supposed to be the NetworkData field?

If there is any intention that Data could have multiple representations of the same data, then it seems like it ought to be explicitly an array or map. Otherwise how are you supposed to be able to reliably extract the representation you wanted?

So I think I like John's version above.

danwinship · 2024-10-08T15:33:51Z

keps/sig-node/4817-resource-claim-device-status/README.md

+
+// NetworkDeviceData provides network-related details for the allocated device.
+// This information may be filled by drivers or other components to configure
+// or identify the device within a network context.


Should probably include the interface index? Name, ips, and hwaddr are all mutable,.

(In which case maybe you don't need to include the interface name?)

If you have enough privilege to rename an interface, none of the rest really matters, right? That doesn't stop us from reporting podIPs

I wasn't worrying just about privilege/attacks. Maybe there are use cases where the interface eventually gets renamed for some reason?

It just seems like, if it's possible to refer to the device via an immutable identifier rather than a mutable one, then we should?

Agree with that principle

The reference of the interface in the kubernetes cluster is: Driver + "/" + Pool + "/" + Device
The index is the reference of the interface in a Linux operating system, Devices are supposed to abstract any network interface, most commonly the linux ones, but not only those.
The drivers owns the interface, if it is renamed, then I expect the driver to update the status, but I don't expect the consumer of the api to use the index of Linux interface, so interface name sounds much better to me

although, now that I think, index make sense to ident, we can have both fields optional, index and name?

danwinship · 2024-10-08T15:35:09Z

keps/sig-node/4817-resource-claim-device-status/README.md

+    // network interface.
+    //
+    // +optional
+    InterfaceName string `json:"interfaceName,omitempty" protobuf:"bytes,1,opt,name=interfaceName"`


why is this optional?

I think there cases that you want to report the addresses but not the interface name, I can think a few cases with network attached devices

DPDK / userspace stuff might not have an interface representation

danwinship · 2024-10-08T15:39:46Z

keps/sig-node/4817-resource-claim-device-status/README.md

+    // Addresses lists the network addresses assigned to the device's network interface.
+    // This can include both IPv4 and IPv6 addresses.
+    // The addresses are in the CIDR notation, which includes both the address and the
+    // associated subnet mask.
+    // e.g.: "192.0.2.0/24" for IPv4 and "2001:db8::/64" for IPv6.


So first, those examples are wrong; those are descriptions of subnets, not descriptions of IPs on those subnets. (You need, eg, "192.0.2.5/24").

But also, (1) this is contrary to everything else in kubernetes, where we always use plain IP addresses when referring to IP addresses, and only use CIDR notation for describing subnets and masks; (2) the CIDR notation for interface addresses is only really useful for IPv4 anyway, and doesn't really make sense for IPv6; (3) it is not clear what the user stories would be that would motivate including the link-local-subnet-mask but not any other routing information.

So I think these should be plain IPs.

Also we say "IPs" almost everywhere, and where we use "Addresses" it's because it could include non-IP values. Which is this?

"192.0.2.0" and "2001:db8::" are valid IP addresses. Just try:

vm-003 ~ # ip -6 addr add 2001:db8::/64 dev eth0 vm-003 ~ # ip -6 addr show dev eth0 2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000 inet6 2001:db8::/64 scope global valid_lft forever preferred_lft forever inet6 fd00::c0a8:3/120 scope global valid_lft forever preferred_lft forever inet6 fe80::200:ff:fe01:3/64 scope link proto kernel_ll valid_lft forever preferred_lft forever

It is a common misunderstanding that host 0 (zero) is reserved. It isn't. But that misunderstanding may be buit-in in older systems/equipment.

But because of that misunderstanding 192.0.2.5/24 may be clearer

CIDR notation does make sense for IPv6. /64 is not required, except for some generated addresses such as the link-local address and SLAAC. IMO it make perfect sense to have one /64 range for an entire K8s cluster, and divide it into POD, ClusterIP, and external addresses with CIDR

as you pasted inhttps://github.com//pull/4861#discussion_r1793266465

the output of ip -a contains ips and masks, and this mimics that behavior

3: br-6a4e01b79274: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default link/ether 02:42:ac:5e:f6:25 brd ff:ff:ff:ff:ff:ff inet 192.168.8.1/24 brd 192.168.8.255 scope global br-6a4e01b79274 valid_lft forever preferred_lft forever inet6 fc00:f853:ccd:e793::1/64 scope global

so I do think it makes sense to have the cidr

"192.0.2.0" and "2001:db8::" are valid IP addresses.

Yes, I know, but I'm fairly certain that's not what they actually meant, and if it was they should change the example to be less confusing anyway.

CIDR notation does make sense for IPv6. /64 is not required

CIDR notation makes sense for describing IPv6 subnets. It does not make sense for describing IPv6 addresses because knowing the size of the subnet an address was allocated out of doesn't help you do anything in IPv6.

If an interface has the IPv4 address "192.0.2.5/24", that tells you that 192.0.2.6 is link-local and 192.0.3.5 is not. (This is not always true, but the entire point of the CIDR notation is to convey exactly that information to you.) But if it has the IPv6 address "2001:db8::1/64", that does not imply either that "2001:db8::2" is link-local or that "2001:db9::1" is not, because IPv6 subnetting is more complicated than that. There's nothing useful you can do with just the prefix length; it's only meaningful if you also have the rest of the information from the router advertisement.

the output of ip -a contains ips and masks, and this mimics that behavior

The output of ip -a contains a LOT of information. Why do we want the mask, but not the rest of it? What is the use case?

at minimum you need an IP if the Device is a pod like only interface, ala PodIP, but if is a kind of L2 connection to a external network, you may want to have the mask , at least to know how many more interfaces you can add on that net ... there are cases you ask for an interface in one network, but you don't get to decide how large is that network, so knowing the mask help you to prevent a claim being stuck

Who is "you" here? The DRA driver? The end user? What is the exact thing that someone can do when the prefix length is exposed here (without needing any other information that is also not exposed here) that they could not do with the prefix length not exposed.

Knowing the subnet size does not tell you how many IPs are available on the subnet so I don't see how it helps with that.

The end user can get this information and use it for example to get some hints on who (pod) can talk to whom (pod/external device). The CIDR could also tell you if what you got is what you requested (I requested an IP in the 192.0.2.0/24 subnet, is my address 192.0.2.x/24, or is there something wrong somewhere and my address is 192.0.2.5/8)

This NetworkDeviceData struct is likely to evolve in the future when people will start to solve more complex use-cases (as the NetworkStatus evolved in the network-attachment-definition API (Multus)). So I would prefer to have the addresses in the CIDR format now than just IP addresses and potentially a new field in the future for the CIDR.

danwinship · 2024-10-08T15:42:08Z

keps/sig-node/4817-resource-claim-device-status/README.md

+    // HWAddress represents the hardware address (e.g. MAC Address) of the device's network interface.
+    //
+    // +optional
+    HWAddress string `json:"hwAddress,omitempty" protobuf:"bytes,3,opt,name=hwAddress"`


So the use case for including IP addresses in the result is that people may want to connect to the pod via this interface. That's something someone might want to do even if they aren't directly poking around with the allocated device.

What is the use case for including HW addresses? What sort of user would have a reason to need to know the hwaddr, but would not be doing something where they had to look directly at the allocated device in question (which would allow them to find the hwaddr themselves)?

I am not directly involved any more, but I think non-standard (even seemingly weird) use-cases that don't use ip-addresses should be supported. An example is non-IP traffic, another is to allow "something" to send ip-packets to the interface even when it has no ip-address, or has an address shared by many.

The entity that consumes this status will probably not be an application that wants to connect to the pod, but something very specialized (router, load-balancer, ...?), and we can't predict what they will be.

Right, but will those use cases be able to work with just the hwaddr, or are they going to potentially need some other information as well?

Basically, on one end of the API space we have "indicate just the name/index of the device that was created", and on the other end we have "provide a full CNI ADD response". And my question is, why did we end up exactly here? Are we sure we need all of the fields we have, and we don't need any other fields?

danwinship · 2024-10-08T15:47:13Z

keps/sig-node/4817-resource-claim-device-status/README.md

+
+## Proposal
+
+### API - ResourceClaim.Status


The difference between "Proposal" and "Design Details" has always been confusing to me, but in this case it's especially confusing that the details of the proposal appear before the user stories, so I think this should mostly move to "Design Details".

Agree this has never been clear - it's long been on my list of things to fix "one day" in the KEP template. I would not block this KEP just for that.

keps/sig-node/4817-resource-claim-device-status/README.md

danwinship · 2024-10-08T15:48:47Z

keps/sig-node/4817-resource-claim-device-status/README.md

+traffic to pods over networks attached via the network devices, the IP
+addresses of the network device(s) must then be reflected in the
+`ResourceClaim.Status` allowing the network service controller(s) to access
+them.


So this motivates having the IPs be available, but not the hardware addrs (or anything else). Is that correct?

danwinship · 2024-10-08T15:59:02Z

keps/sig-node/4817-resource-claim-device-status/README.md

+In the `ResourceClaim.Status.Devices`, instead of having opaque field (`Data`) and
+specific type fields, an object reference could be used for each device. The custom 
+object would be created and maintained by the driver to report the status of the 
+devices.


I like this. It's very Gateway-API-ish (eg, ParametersReference)

This would be my last choice, as if you want to build a controller (e.g. EndpointSlice controller) that gets the IPs for some pods, then you will need to find those pods, then get the ResouceClaims for those pods (current scenario with what is being proposed), then get the status object for those ResourceClaims.

The best would be the New Pod.Status Field alternative, but from what I understood in the Multi-Network KEP and some other discussion (e.g. here), this is not possible.

@thockin CRXes!

thockin

I don't hate this. Some questions, but mostly minor. I do worry about the VAP being too rigid for real life.

thockin · 2024-10-08T22:19:20Z

keps/sig-node/4817-resource-claim-device-status/README.md

+
+## Proposal
+
+### API - ResourceClaim.Status


Agree this has never been clear - it's long been on my list of things to fix "one day" in the KEP template. I would not block this KEP just for that.

keps/sig-node/4817-resource-claim-device-status/README.md

thockin · 2024-10-08T22:48:59Z

keps/sig-node/4817-resource-claim-device-status/README.md

+    // +required
+    Device string `json:"device" protobuf:"bytes,3,rep,name=device"`
+
+    // Conditions contains the latest observation of the device's state.


Is this for hardware errors, too?

Yes, other than Ready, it's all up to the driver what lands in here. This will be the only place to put that information in the API server (we will not have it in pod status, for example).

Do we have / need guidance on what should go into Conditions and what should go into Data ?

well, conditions are just boolean. standard guidance https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#typical-status-properties for conditions would apply.

The guidance on what should go in Conditions and Data are in the "Proposal" section:

The state and characteristics of the device are reported in the `Conditions`, representing the operational state of the device and in the `Data`, an arbitrary data field representing device specific characteristics. .... `Data` being an arbitrary data field allows the DRA Driver to store device specific data in different formats. For example, a Network Device being configured by via a CNI plugin could get its `Data` field filled with the CNI result for troubleshooting purpose and with a Network result in a modern standard format (closer to Pod.Status.PodIPs for example) used by 3rd party controllers.

or did you mean more detailed ones?

Conditions are not an awesome API for most things, but they are the only way to express things without knowing the "extension" schema.

I suspect we will have to feel this distinction out as we go, and crystallize it more later.

thockin · 2024-10-08T23:23:28Z

keps/sig-node/4817-resource-claim-device-status/README.md

+    // Addresses lists the network addresses assigned to the device's network interface.
+    // This can include both IPv4 and IPv6 addresses.
+    // The addresses are in the CIDR notation, which includes both the address and the
+    // associated subnet mask.
+    // e.g.: "192.0.2.0/24" for IPv4 and "2001:db8::/64" for IPv6.


Also we say "IPs" almost everywhere, and where we use "Addresses" it's because it could include non-IP values. Which is this?

keps/sig-node/4817-resource-claim-device-status/README.md

thockin · 2024-10-08T23:36:19Z

keps/sig-node/4817-resource-claim-device-status/README.md

+### Write Permission
+
+To prevent unauthorized or accidental modifications by entities that do not 
+have access to a particular resource, a `ValidatingAdmissionPolicy` will be 


I do not love this trend - does it need a VAP per driver?

thockin · 2024-10-08T23:46:11Z

keps/sig-node/4817-resource-claim-device-status/README.md

+`ResourceClaim.Status.Devices` is running on the same node.
+
+Here is a `ResourceClaim` allocated on a node. This would only work for now if 
+exactly one node is set:


I'm not sure I follow - this seems very rigid? Any driver that doesn't follow this exact standard can't be secured? Or do we end up embedding thdriver name into the VAP, and have one VAP per driver?

thockin · 2024-10-08T23:50:04Z

keps/sig-node/4817-resource-claim-device-status/README.md

+        object.status.devices != oldObject.status.devices
+  validations:
+  - expression: >- # User node must be the same node as the one where the ResourceClaim is allocated.
+      variables.userNodeName != variables.objectNodeName


Aren't VAP's asserting the affirmative? I.e. this should be == ? If not, I am confused about my own VAPs.

I will re-try this VAP with my PoC,validations should work fine, but I remember having issues with the matchConditions

johnbelamaric · 2024-10-09T00:13:09Z

I don't hate this.

High praise from @thockin! 😆

aojea · 2024-10-09T14:52:30Z

keps/sig-node/4817-resource-claim-device-status/README.md

+- Authorization implemented to allow only the user on the same node as the 
+  allocated `ResourceClaim` to write the status of the devices.
+
+#### Beta


Talked with @danwinship offline, @LionelJouin @dougbtv what do you think if we add to beta criteria having a multus DRA network driver ?

I'd definitely be interested in it! And there's a sort of basis that exists pre-1.31 in: k8snetworkplumbingwg/multus-cni#1078

But yes, it'd need extensions to this functionality as well as a Multus DRA driver itself, as the challenge with that implementation is that it uses an example reference driver, and directly being a driver would be a good way to see how well the spec fits with an implementation.

I added 1 real-world usage as beta requirement.

keps/sig-node/4817-resource-claim-device-status/README.md

johnbelamaric · 2024-10-09T16:47:34Z

One small nit for PRR, not needed to approve. Once there is SIG approval, I can give the PRR approval.

thockin · 2024-10-09T17:10:18Z

@danwinship @aojea I am OK with this KEP - what are you looking for to LGTM vs. things that can be discussed after?

keps/sig-node/4817-resource-claim-device-status/README.md

mrunalp · 2024-10-10T03:39:03Z

/approve
from SIG Node

johnbelamaric

I am happy with PRR and from the DRA side (modulo the exact API we can work out later). I will give the PRR approval once this has SIG node and SIG net approval.

johnbelamaric · 2024-10-10T03:41:05Z

I am happy with PRR and from the DRA side (modulo the exact API we can work out later). I will give the PRR approval once this has SIG node and SIG net approval.

Race condition with @mrunalp. So now @thockin has already approved but left an open question for @danwinship and @aojea so I will approve but leave the LGTM to them.

/approve

k8s-ci-robot · 2024-10-10T03:41:17Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: johnbelamaric, LionelJouin, mrunalp

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~keps/prod-readiness/OWNERS~~ [johnbelamaric]
~~keps/sig-node/OWNERS~~ [johnbelamaric,mrunalp]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

danwinship · 2024-10-10T16:29:51Z

/lgtm
to move forward with

thockin · 2024-10-10T16:33:14Z

The comment thread about API shape (#4861 (review)) has become "outdated". :(

shaneutt · 2024-10-10T16:42:47Z

We discussed this in the SIG Network community today and there were no major concerns or objections stated (though there was a soft suggestion that we might want to explore more options on how we handle the status if we have time).

/unhold

aojea · 2024-10-10T16:56:14Z

lgtm

KEP-4817 - Resource Claim Device Status

02d0399

Signed-off-by: Lionel Jouin <[email protected]>

k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Sep 19, 2024

k8s-ci-robot requested review from dchen1107 and mrunalp September 19, 2024 19:05

k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/node Categorizes an issue or PR as relevant to SIG Node. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 19, 2024

k8s-ci-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Sep 19, 2024

k8s-ci-robot assigned aojea and johnbelamaric Sep 19, 2024

k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels Sep 19, 2024

k8s-ci-robot changed the title ~~KEP-4817 - Resource Claim Device Status~~ [KEP-4817]: DRA: Resource Claim Status with possible standardized network interface data Sep 19, 2024

johnbelamaric reviewed Sep 20, 2024

View reviewed changes

Format (line-wrap)

56fc9ee

Signed-off-by: Lionel Jouin <[email protected]>

pohly reviewed Sep 23, 2024

View reviewed changes

MikeZappa87 reviewed Sep 23, 2024

View reviewed changes

Fixes based on review

f3ae242

* Typos * Establish a standard instead of laying the foundation for creating a standard * KEP Name * DeviceStatuses keys * AllocatedDeviceStatus request field removed * IPs to a new struct Signed-off-by: Lionel Jouin <[email protected]>

LionelJouin force-pushed the kep-4817 branch from 8bd0e0e to f3ae242 Compare September 23, 2024 15:52

aojea reviewed Sep 24, 2024

View reviewed changes

keps/sig-node/4817-resource-claim-device-status/README.md Show resolved Hide resolved

aojea reviewed Sep 24, 2024

View reviewed changes

keps/sig-node/4817-resource-claim-device-status/README.md Outdated Show resolved Hide resolved

aojea reviewed Sep 24, 2024

View reviewed changes

keps/sig-node/4817-resource-claim-device-status/README.md Outdated Show resolved Hide resolved

aojea reviewed Sep 24, 2024

View reviewed changes

keps/sig-node/4817-resource-claim-device-status/README.md Outdated Show resolved Hide resolved

aojea reviewed Sep 24, 2024

View reviewed changes

keps/sig-node/4817-resource-claim-device-status/README.md Outdated Show resolved Hide resolved

aojea reviewed Sep 24, 2024

View reviewed changes

keps/sig-node/4817-resource-claim-device-status/README.md Outdated Show resolved Hide resolved

dougbtv reviewed Oct 8, 2024

View reviewed changes

danwinship reviewed Oct 8, 2024

View reviewed changes

thockin reviewed Oct 8, 2024

View reviewed changes

k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 9, 2024

aojea reviewed Oct 9, 2024

View reviewed changes

johnbelamaric reviewed Oct 9, 2024

View reviewed changes

keps/sig-node/4817-resource-claim-device-status/README.md Outdated Show resolved Hide resolved

LionelJouin force-pushed the kep-4817 branch from 5fc30a0 to aca3a6c Compare October 9, 2024 17:51

johnbelamaric reviewed Oct 9, 2024

View reviewed changes

keps/sig-node/4817-resource-claim-device-status/README.md Outdated Show resolved Hide resolved

LionelJouin force-pushed the kep-4817 branch from aca3a6c to 942a143 Compare October 9, 2024 21:39

johnbelamaric reviewed Oct 10, 2024

View reviewed changes

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 10, 2024

Fixes based on review

7cb9482

LionelJouin force-pushed the kep-4817 branch from 942a143 to 7cb9482 Compare October 10, 2024 08:34

k8s-ci-robot assigned danwinship Oct 10, 2024

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 10, 2024

thockin removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 10, 2024

k8s-ci-robot merged commit 499e983 into kubernetes:master Oct 10, 2024
4 checks passed

k8s-ci-robot added this to the v1.32 milestone Oct 10, 2024

This was referenced Oct 21, 2024

DRA: Implementation of ResourceClaim.Status.Devices (KEP-4817) kubernetes/kubernetes#128240

Merged

REQUEST: New membership for LionelJouin kubernetes/org#5229

Closed


		The API changes define a new `DeviceStatuses` field in the existing `ResourceClaimStatus` struct. `DeviceStatuses` is a slice of a new struct `AllocatedDeviceStatus` which holds device specific information.

		A device, identified by `<driver name>/<pool name>/<device name>` can be represented only once in the `DeviceStatuses` slice and will also mention which request caused the device to be allocated. The state and characteristics of the device are reported in the `Conditions`, representing the operational state of the device and in the `DeviceInfo`, an arbitrary data slice representing device specific characteristics. Additionally, for networking devices, a field `NetworkDeviceInfo` can be used to report the IPs, the MAC address and the interface name.

	// DeviceInfo contains Arbitrary driver-specific data.
	// DeviceInfo contains arbitrary driver-specific data.

[KEP-4817]: DRA: Resource Claim Status with possible standardized network interface data #4861

[KEP-4817]: DRA: Resource Claim Status with possible standardized network interface data #4861

Conversation

LionelJouin commented Sep 19, 2024 • edited Loading

k8s-ci-robot commented Sep 19, 2024

johnbelamaric commented Sep 19, 2024

kannon92 commented Sep 19, 2024

kannon92 commented Sep 19, 2024

johnbelamaric left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

johnbelamaric commented Sep 20, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aojea Oct 8, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

uablrek Oct 9, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

aojea Oct 9, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

danwinship Oct 9, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

thockin left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

LionelJouin commented Sep 19, 2024 •

edited

Loading

aojea Oct 8, 2024 •

edited

Loading

uablrek Oct 9, 2024 •

edited

Loading

aojea Oct 9, 2024 •

edited

Loading

danwinship Oct 9, 2024 •

edited

Loading