how to use yolov11s-seg supervision onnx runtime? #1789

pranta-barua007 · 2025-02-15T14:26:55Z

pranta-barua007
Feb 15, 2025

dear @onuralpszr i saw similar case on #1626 and tried some customization with my own usecase for segmentation but doesn't seem to properly working

here is how I am exporting my model with ultralytics

ft_loaded_best_model.export(
format="onnx",
nms=True,
data="/content/disease__instance_segmented/data.yaml",
) # creates 'best.onnx'

which outputs in console

Ultralytics 8.3.75 🚀 Python-3.11.11 torch-2.5.1+cu124 CPU (Intel Xeon 2.00GHz)
YOLO11s-seg summary (fused): 265 layers, 10,068,364 parameters, 0 gradients, 35.3 GFLOPs

PyTorch: starting from '/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.pt' with input shape (1, 3, 640, 640) BCHW and output shape(s) ((1, 300, 38), (1, 32, 160, 160)) (19.6 MB)

ONNX: starting export with onnx 1.17.0 opset 19...
ONNX: slimming with onnxslim 0.1.48...
ONNX: export success ✅ 4.2s, saved as '/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx' (38.7 MB)

Export complete (5.5s)
Results saved to /content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights
Predict:         yolo predict task=segment model=/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx imgsz=640  
Validate:        yolo val task=segment model=/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx imgsz=640 data=/content/dental_disease__instance_segmented-7/data.yaml  
Visualize:       https://netron.app/
/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx

I have 4 classes in my model
as I applied nms my output0 is already transposed I think
where first 4 indices are bbox. 5 is prob, 6 is class id 7 and rest 32 are mask and the 300 is for the model will detect up to 300 results, educate if my interpretation is wrong ?

here is my implementation

def xywh2xyxy(x):
    y = np.copy(x)
    y[..., 0] = x[..., 0] - x[..., 2] / 2
    y[..., 1] = x[..., 1] - x[..., 3] / 2
    y[..., 2] = x[..., 0] + x[..., 2] / 2
    y[..., 3] = x[..., 1] + x[..., 3] / 2
    return y


class YOLOv11:
    def __init__(self, path, conf_thres=0.7, iou_thres=0.5):
        self.conf_threshold = conf_thres
        self.iou_threshold = iou_thres
        # Initialize the ONNX model
        self.initialize_model(path)

    def __call__(self, image):
        return self.detect_objects(image)

    def initialize_model(self, path):
        self.session = onnxruntime.InferenceSession(
            path, providers=onnxruntime.get_available_providers()
        )
        self.get_input_details()
        self.get_output_details()

    def detect_objects(self, image):
        input_tensor = self.prepare_input(image)
        outputs = self.inference(input_tensor)
        self.boxes, self.scores, self.class_ids, self.masks = self.process_output(outputs)
        return self.boxes, self.scores, self.class_ids, self.masks

    def prepare_input(self, image):
        self.img_height, self.img_width = image.shape[:2]
        input_img = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        input_img = cv2.resize(input_img, (self.input_width, self.input_height))
        input_img = input_img / 255.0
        input_img = input_img.transpose(2, 0, 1)
        input_tensor = input_img[np.newaxis, :, :, :].astype(np.float32)
        return input_tensor

    def inference(self, input_tensor):
        outputs = self.session.run(self.output_names, {self.input_names[0]: input_tensor})
        return outputs

    def process_output(self, outputs):
        """
        Process model outputs:
          - outputs[0]: shape (1, 300, 38)
            * 0-3: bounding box (xywh)
            * 4: confidence score
            * 5: class id
            * 6-37: segmentation coefficients (32 values)
          - outputs[1]: shape (1, 38, 160, 160) mask prototypes
        """
        # Remove batch dimension from detections
        predictions = np.squeeze(outputs[0])  # shape (300, 38)
        mask_protos = outputs[1]               # shape (1, 38, 160, 160)

        # Filter predictions using the confidence score (index 4)
        conf_scores = predictions[:, 4]
        valid = conf_scores > self.conf_threshold
        predictions = predictions[valid]
        scores = conf_scores[valid]

        if len(scores) == 0:
            return [], [], [], []

        # Extract bounding boxes (indices 0-3)
        boxes = self.extract_boxes(predictions)

        # Extract class ids (index 5) and cast them to int
        class_ids = predictions[:, 5].astype(np.int32)

        # Extract segmentation masks using segmentation coefficients (indices 6-37)
        masks = self.extract_masks(predictions, mask_protos)

        return boxes, scores, class_ids, masks

    def extract_boxes(self, predictions):
        boxes = predictions[:, :4]  # xywh format
        boxes = self.rescale_boxes(boxes)
        boxes = xywh2xyxy(boxes)
        return boxes

    def rescale_boxes(self, boxes):
        # Scale boxes from network input dimensions to original image dimensions
        input_shape = np.array([self.input_width, self.input_height, self.input_width, self.input_height])
        boxes = np.divide(boxes, input_shape, dtype=np.float32)
        boxes *= np.array([self.img_width, self.img_height, self.img_width, self.img_height])
        return boxes

    def extract_masks(self, predictions, mask_protos):
        """
        Compute segmentation masks:
          - predictions: (num_detections, 38) with segmentation coefficients at indices 6-37
          - mask_protos: (1, 38, 160, 160); we use the first 32 channels to match coefficients.
        """
        # Get segmentation coefficients from predictions (32 coefficients)
        seg_coeffs = predictions[:, 6:38]  # shape: (num_detections, 32)

        # Use the first 32 channels from mask prototypes
        mask_protos = mask_protos[0, :32, :, :]  # shape: (32, 160, 160)

        # Compute per-detection masks as a weighted sum over mask prototypes
        masks = np.einsum('nc,chw->nhw', seg_coeffs, mask_protos)

        # Apply sigmoid to get values between 0 and 1
        masks = 1 / (1 + np.exp(-masks))

        # Threshold masks to produce binary masks
        masks = masks > 0.5

        # Resize each mask to the original image dimensions
        final_masks = []
        for mask in masks:
            mask_uint8 = (mask.astype(np.uint8)) * 255
            mask_resized = cv2.resize(mask_uint8, (self.img_width, self.img_height), interpolation=cv2.INTER_NEAREST)
            final_masks.append(mask_resized)
        final_masks = np.array(final_masks)

        return final_masks

    def get_input_details(self):
        model_inputs = self.session.get_inputs()
        self.input_names = [inp.name for inp in model_inputs]
        self.input_shape = model_inputs[0].shape
        self.input_height = self.input_shape[2]
        self.input_width = self.input_shape[3]

    def get_output_details(self):
        model_outputs = self.session.get_outputs()
        self.output_names = [out.name for out in model_outputs]

pranta-barua007 · 2025-02-15T14:33:46Z

pranta-barua007
Feb 15, 2025
Author

here is how i am making predictions

Load the model and create InferenceSession

best_weights_path = f"{saved_model_results_path}/train/weights/best.onnx"

detector = YOLOv11(best_weights_path, conf_thres=0.2, iou_thres=0.3)

img = cv2.imread("/content/download (1).jpeg")

Detect Objects (now returns bounding boxes, scores, class_ids, and segmentation masks)

boxes, scores, class_ids, masks = detector(img)

boxes

array([[     274.24,      185.68,      958.67,       689.4],
       [     244.84,      252.61,      830.42,      883.34]], dtype=float32)

scores

array([     0.8895,     0.86876], dtype=float32)

class_Ids

array([2, 2], dtype=int32)

masks

array([[[255, 255, 255, ..., 255, 255, 255],
        [255, 255, 255, ..., 255, 255, 255],
        [255, 255, 255, ..., 255, 255, 255],
        ...,
        [255, 255, 255, ..., 255, 255, 255],
        [255, 255, 255, ..., 255, 255, 255],
        [255, 255, 255, ..., 255, 255, 255]],

       [[255, 255, 255, ..., 255, 255, 255],
        [255, 255, 255, ..., 255, 255, 255],
        [255, 255, 255, ..., 255, 255, 255],
        ...,
        [255, 255, 255, ..., 255, 255, 255],
        [255, 255, 255, ..., 255, 255, 255],
        [255, 255, 255, ..., 255, 255, 255]]], dtype=uint8)

0 replies

pranta-barua007 · 2025-02-15T14:37:53Z

pranta-barua007
Feb 15, 2025
Author

i have checked predicted the class id and prob it is right

things that are working fine
✅ scores
✅class_Ids
not working
❌boxes
❌masks

here is how it should be, below is direct prediction results with ultralytics

0 replies

onuralpszr · 2025-02-15T14:40:10Z

onuralpszr
Feb 15, 2025
Collaborator

@pranta-barua007 hello, let me check mask case quickly in my collab

0 replies

onuralpszr · 2025-02-15T14:41:18Z

onuralpszr
Feb 15, 2025
Collaborator

@pranta-barua007 also If you don't mind can you share your export parameters and model with me as well.

0 replies

pranta-barua007 · 2025-02-15T14:42:19Z

pranta-barua007
Feb 15, 2025
Author

@onuralpszr thanks for the quick reply ! should I share it here?

0 replies

onuralpszr · 2025-02-15T14:43:22Z

onuralpszr
Feb 15, 2025
Collaborator

@onuralpszr thanks for the quick reply ! should I share it here?

If it is a problem you can share to my e-mail "[email protected]" via google drive

0 replies

pranta-barua007 · 2025-02-15T14:45:55Z

pranta-barua007
Feb 15, 2025
Author

@onuralpszr please do check i have shared

0 replies

onuralpszr · 2025-02-15T14:49:32Z

onuralpszr
Feb 15, 2025
Collaborator

@onuralpszr please do check i have shared

Export parameters also please ?

0 replies

pranta-barua007 · 2025-02-15T14:51:46Z

pranta-barua007
Feb 15, 2025
Author

@onuralpszr can you please explain

here is how i am exporting

ft_loaded_best_model.export(
format="onnx",
nms=True,
data="/content/disease__instance_segmented/data.yaml",
) # creates 'best.onnx'

I got params from here https://docs.ultralytics.com/modes/export/#arguments

0 replies

onuralpszr · 2025-02-15T14:54:23Z

onuralpszr
Feb 15, 2025
Collaborator

@onuralpszr please do check i have shared

@onuralpszr can you please explain

here is how i am exporting

ft_loaded_best_model.export(
format="onnx",
nms=True,
data="/content/disease__instance_segmented/data.yaml",
) # creates 'best.onnx'
I got params from here https://docs.ultralytics.com/modes/export/#arguments

I got what I needed all good :)

0 replies

onuralpszr · 2025-02-15T16:24:59Z

onuralpszr
Feb 15, 2025
Collaborator

@pranta-barua007 can you also upload original picture you used

0 replies

onuralpszr · 2025-02-15T16:25:18Z

onuralpszr
Feb 15, 2025
Collaborator

@pranta-barua007 also one "train" data picture would be also great for testing purpose as well

0 replies

pranta-barua007 · 2025-02-15T16:26:31Z

pranta-barua007
Feb 15, 2025
Author

ok @onuralpszr uploading on the shared folder

0 replies

onuralpszr · 2025-02-15T16:26:56Z

onuralpszr
Feb 15, 2025
Collaborator

Great

0 replies

pranta-barua007 · 2025-02-15T16:30:29Z

pranta-barua007
Feb 15, 2025
Author

@onuralpszr shared some information currently train data is not available can you please check if the shared resources help ?

0 replies

pranta-barua007 · 2025-02-16T06:52:02Z

pranta-barua007
Feb 16, 2025
Author

@onuralpszr

I have tried exporting model in 3 different ways and inspected difference in outputshape (my default is OPTION - 3)

without DYNAMIC and NMS -- OPTION 1

from ultralytics import YOLO

# Load a model
best_weights_path = f"{saved_model_results_path}/train/weights/without_nms_and_dynamic/best.pt"
ft_loaded_best_model = YOLO(best_weights_path)

ft_loaded_best_model.export(
    format="onnx",
    data="/content/disease_instance_segmented/data.yaml"
)

Ultralytics 8.3.75 🚀 Python-3.11.11 torch-2.5.1+cu124 CPU (Intel Xeon 2.20GHz)
YOLO11s-seg summary (fused): 265 layers, 10,068,364 parameters, 0 gradients, 35.3 GFLOPs

PyTorch: starting from '/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/without_nms_and_dynamic/best.pt' with 

** input shape (1, 3, 640, 640) BCHW **
** and output shape(s) ((1, 40, 8400), (1, 32, 160, 160)) (19.6 MB) **

ONNX: starting export with onnx 1.17.0 opset 19...
ONNX: slimming with onnxslim 0.1.48...
ONNX: export success ✅ 2.6s, saved as '/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/without_nms_and_dynamic/best.onnx' (38.7 MB)

Export complete (4.1s)
Results saved to /content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/without_nms_and_dynamic
Predict:         yolo predict task=segment model=/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/without_nms_and_dynamic/best.onnx imgsz=640  
Validate:        yolo val task=segment model=/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/without_nms_and_dynamic/best.onnx imgsz=640 data=/content/dental_disease__instance_segmented-7/data.yaml  
Visualize:       https://netron.app/
/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/without_nms_and_dynamic/best.onnx

with DYNAMIC and NMS -- OPTION 2

from ultralytics import YOLO

# Load a model
best_weights_path = f"{saved_model_results_path}/train/weights/with_dynamic_and_nms/best.pt"
ft_loaded_best_model = YOLO(best_weights_path)

ft_loaded_best_model.export(
    format="onnx",
    nms=True,
    dynamic=True,
    data="/content/disease_instance_segmented/data.yaml"
)

Ultralytics 8.3.75 🚀 Python-3.11.11 torch-2.5.1+cu124 CPU (Intel Xeon 2.20GHz)
YOLO11s-seg summary (fused): 265 layers, 10,068,364 parameters, 0 gradients, 35.3 GFLOPs

PyTorch: starting from '/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/with_dynamic_and_nms/best.pt' with 

** input shape (1, 3, 640, 640) BCHW **
** and output shape(s) ((1, 300, 38), (1, 32, 160, 160)) (19.6 MB) **

ONNX: starting export with onnx 1.17.0 opset 19...
ONNX: slimming with onnxslim 0.1.48...
ONNX: export success ✅ 35.8s, saved as '/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/with_dynamic_and_nms/best.onnx' (38.6 MB)

Export complete (42.2s)
Results saved to /content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/with_dynamic_and_nms
Predict:         yolo predict task=segment model=/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/with_dynamic_and_nms/best.onnx imgsz=640  
Validate:        yolo val task=segment model=/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/with_dynamic_and_nms/best.onnx imgsz=640 data=/content/dental_disease__instance_segmented-7/data.yaml  
Visualize:       https://netron.app/
/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/with_dynamic_and_nms/best.onnx

without DYNAMIC -- OPTION 3 (DEFAULT)

from ultralytics import YOLO

# Load a model
best_weights_path = f"{saved_model_results_path}/train/weights/best.pt"
ft_loaded_best_model = YOLO(best_weights_path)

ft_loaded_best_model.export(
    format="onnx",
    nms=True,
    data="/content/disease_instance_segmented/data.yaml"
)

Ultralytics 8.3.75 🚀 Python-3.11.11 torch-2.5.1+cu124 CPU (Intel Xeon 2.20GHz)
YOLO11s-seg summary (fused): 265 layers, 10,068,364 parameters, 0 gradients, 35.3 GFLOPs

PyTorch: starting from '/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.pt' with 

** input shape (1, 3, 640, 640) BCHW ** 
** and output shape(s) ((1, 300, 38), (1, 32, 160, 160)) (19.6 MB) **

ONNX: starting export with onnx 1.17.0 opset 19...
ONNX: slimming with onnxslim 0.1.48...
ONNX: export success ✅ 4.4s, saved as '/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx' (38.7 MB)

Export complete (6.1s)
Results saved to /content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights
Predict:         yolo predict task=segment model=/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx imgsz=640  
Validate:        yolo val task=segment model=/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx imgsz=640 data=/content/dental_disease__instance_segmented-7/data.yaml  
Visualize:       https://netron.app/
/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx

0 replies

pranta-barua007 · 2025-02-17T17:14:55Z

pranta-barua007
Feb 17, 2025
Author

@onuralpszr ⚠️UPDATE now bounding box also working ✅ just need help for the mask❌ 🙂‍↕️

class YOLOv11:
    def __init__(self, path, conf_thres=0.7, iou_thres=0.5):
        self.conf_threshold = conf_thres
        self.iou_threshold = iou_thres
        self.initialize_model(path)

    def __call__(self, image):
        return self.detect_objects(image)

    def initialize_model(self, path):
        self.session = onnxruntime.InferenceSession(
            path, providers=onnxruntime.get_available_providers()
        )
        self.get_input_details()
        self.get_output_details()

    def detect_objects(self, image):
        # Save original image dimensions
        self.img_height, self.img_width = image.shape[:2]
        # Prepare input (resize, normalize, etc.)
        input_tensor = self.prepare_input(image)
        # Run inference
        outputs = self.inference(input_tensor)
        # Process outputs into boxes, scores, class IDs, and masks
        boxes, scores, class_ids, masks = self.process_output(outputs)
        return boxes, scores, class_ids, masks

    def prepare_input(self, image):
        # Convert BGR to RGB and resize to model input size (e.g. 640x640)
        input_img = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        input_img = cv2.resize(input_img, (self.input_width, self.input_height))
        # Normalize to [0, 1]
        input_img = input_img / 255.0
        # Change data layout from HWC to CHW
        input_img = input_img.transpose(2, 0, 1)
        # Add batch dimension: [1, C, H, W]
        input_tensor = input_img[np.newaxis, :, :, :].astype(np.float32)
        return input_tensor

    def inference(self, input_tensor):
        outputs = self.session.run(self.output_names, {self.input_names[0]: input_tensor})
        return outputs

    def process_output(self, outputs):
        """
        Model outputs:
          - outputs[0]: shape (1, 300, 38)
              indices 0-3: bounding box (assumed [x1, y1, x2, y2] in 640×640 space)
              index 4: confidence score
              index 5: class id
              indices 6-37: segmentation coefficients (32 values)
          - outputs[1]: shape (1, 38, 160, 160) -> mask prototypes
        """
        # Remove batch dimension from detections (results in (300, 38))
        predictions = np.squeeze(outputs[0], axis=0)
        mask_protos = outputs[1]  # shape: (1, 38, 160, 160)

        # Filter detections based on confidence (index 4)
        conf_scores = predictions[:, 4]
        valid = conf_scores > self.conf_threshold
        predictions = predictions[valid]
        scores = conf_scores[valid]
        if len(scores) == 0:
            return [], [], [], []

        # Extract bounding boxes (assumed already in [x1, y1, x2, y2] format)
        boxes = self.extract_boxes(predictions)

        # Extract class ids (index 5)
        class_ids = predictions[:, 5].astype(np.int32)

        # Extract segmentation masks using coefficients (indices 6-37)
        masks = self.extract_masks(predictions, mask_protos)

        return boxes, scores, class_ids, masks

    def extract_boxes(self, predictions):
        # Get the first 4 values; these are assumed to be [x1, y1, x2, y2] in 640×640 space.
        boxes = predictions[:, :4]
        # If the original image size differs from the model input size,
        # rescale boxes from (self.input_width, self.input_height) to (self.img_width, self.img_height)
        if (self.img_width != self.input_width) or (self.img_height != self.input_height):
            boxes = self.rescale_boxes_corner_format(boxes)
        return boxes

    def rescale_boxes_corner_format(self, boxes):
        # Calculate scaling factors from model input size to original image size
        scale_x = float(self.img_width) / self.input_width
        scale_y = float(self.img_height) / self.input_height
        boxes[:, [0, 2]] *= scale_x  # x1, x2
        boxes[:, [1, 3]] *= scale_y  # y1, y2
        return boxes

    def extract_masks(self, predictions, mask_protos):
        """
        Compute segmentation masks:
          - For each detection, use the 32 segmentation coefficients (indices 6-37)
            to compute a weighted sum over the first 32 channels of the mask prototypes.
          - The mask prototypes have shape (1, 38, 160, 160), so we select the first 32 channels.
        """
        # Extract segmentation coefficients (shape: (num_detections, 32))
        seg_coeffs = predictions[:, 6:38]
        # Use the first 32 channels from mask prototypes; remove batch dimension → (32, 160, 160)
        mask_protos = mask_protos[0, :32, :, :]
        # Compute masks as a weighted sum of mask prototypes for each detection
        masks = np.einsum('nc,chw->nhw', seg_coeffs, mask_protos)
        # Apply sigmoid to obtain probabilities between 0 and 1
        masks = 1.0 / (1.0 + np.exp(-masks))
        # Binarize masks with a threshold of 0.5
        masks = masks > 0.5

        # Resize each mask from 160x160 (mask prototype resolution) to the original image dimensions
        final_masks = []
        for mask in masks:
            mask_uint8 = mask.astype(np.uint8) * 255
            mask_resized = cv2.resize(mask_uint8,
                                      (self.img_width, self.img_height),
                                      interpolation=cv2.INTER_NEAREST)
            final_masks.append(mask_resized)
        final_masks = np.array(final_masks)
        return final_masks

    def get_input_details(self):
        model_inputs = self.session.get_inputs()
        self.input_names = [inp.name for inp in model_inputs]
        self.input_shape = model_inputs[0].shape  # typically [1, 3, 640, 640]
        self.input_height = self.input_shape[2]
        self.input_width = self.input_shape[3]

    def get_output_details(self):
        model_outputs = self.session.get_outputs()
        self.output_names = [out.name for out in model_outputs]

output

# Load the model and create InferenceSession
best_weights_path = f"{saved_model_results_path}/train/weights/best.onnx"

detector = YOLOv11(best_weights_path, conf_thres=0.4, iou_thres=0.4)

img = cv2.imread("/content/download (1).jpeg")
# Detect Objects (now returns bounding boxes, scores, class_ids, and segmentation masks)
boxes, scores, class_ids, masks = detector(img)

# 1) Print bounding box coordinates to confirm they're in-image
print("Bounding boxes:\n", detections.xyxy)

# 2) If you have masks, print their shape and check if non-empty
if detections.mask is not None and len(detections.mask) > 0:
    print("Mask shape:", detections.mask.shape)
    print("Mask unique values:", np.unique(detections.mask[0]))  # e.g., [0, 255]

# 3) Create annotators with explicit colors/thickness
mask_annotator = sv.MaskAnnotator(
    # By default, it uses random colors. You can force a single color if desired:
    color=sv.Color.GREEN
)
box_annotator = sv.BoxAnnotator(
    thickness=2               # thicker line
)
label_annotator = sv.LabelAnnotator(
    text_scale=0.7,
    text_thickness=2
)

# 4) Draw them in the recommended order: masks → boxes → labels
annotated_image = img.copy()
annotated_image = mask_annotator.annotate(scene=annotated_image, detections=detections)
annotated_image = box_annotator.annotate(scene=annotated_image, detections=detections)
annotated_image = label_annotator.annotate(scene=annotated_image, detections=detections)

# 5) Display the result
sv.plot_image(annotated_image)
cv2.imwrite("debug_output.jpg", annotated_image)

output

Bounding boxes:
 [[     619.21      436.77      686.61      506.29]
 [     536.22      568.54      582.89      634.71]]
Mask shape: (2, 900, 1200)
Mask unique values: [  0 255]
True

0 replies

onuralpszr · 2025-02-17T17:18:18Z

onuralpszr
Feb 17, 2025
Collaborator

@onuralpszr ⚠️UPDATE now bounding box also working ✅ just need help for the mask❌ 🙂‍↕️

class YOLOv11:
def init(self, path, conf_thres=0.7, iou_thres=0.5):
self.conf_threshold = conf_thres
self.iou_threshold = iou_thres
self.initialize_model(path)

def __call__(self, image):
    return self.detect_objects(image)

def initialize_model(self, path):
    self.session = onnxruntime.InferenceSession(
        path, providers=onnxruntime.get_available_providers()
    )
    self.get_input_details()
    self.get_output_details()

def detect_objects(self, image):
    # Save original image dimensions
    self.img_height, self.img_width = image.shape[:2]
    # Prepare input (resize, normalize, etc.)
    input_tensor = self.prepare_input(image)
    # Run inference
    outputs = self.inference(input_tensor)
    # Process outputs into boxes, scores, class IDs, and masks
    boxes, scores, class_ids, masks = self.process_output(outputs)
    return boxes, scores, class_ids, masks

def prepare_input(self, image):
    # Convert BGR to RGB and resize to model input size (e.g. 640x640)
    input_img = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    input_img = cv2.resize(input_img, (self.input_width, self.input_height))
    # Normalize to [0, 1]
    input_img = input_img / 255.0
    # Change data layout from HWC to CHW
    input_img = input_img.transpose(2, 0, 1)
    # Add batch dimension: [1, C, H, W]
    input_tensor = input_img[np.newaxis, :, :, :].astype(np.float32)
    return input_tensor

def inference(self, input_tensor):
    outputs = self.session.run(self.output_names, {self.input_names[0]: input_tensor})
    return outputs

def process_output(self, outputs):
    """
    Model outputs:
      - outputs[0]: shape (1, 300, 38)
          indices 0-3: bounding box (assumed [x1, y1, x2, y2] in 640×640 space)
          index 4: confidence score
          index 5: class id
          indices 6-37: segmentation coefficients (32 values)
      - outputs[1]: shape (1, 38, 160, 160) -> mask prototypes
    """
    # Remove batch dimension from detections (results in (300, 38))
    predictions = np.squeeze(outputs[0], axis=0)
    mask_protos = outputs[1]  # shape: (1, 38, 160, 160)

    # Filter detections based on confidence (index 4)
    conf_scores = predictions[:, 4]
    valid = conf_scores > self.conf_threshold
    predictions = predictions[valid]
    scores = conf_scores[valid]
    if len(scores) == 0:
        return [], [], [], []

    # Extract bounding boxes (assumed already in [x1, y1, x2, y2] format)
    boxes = self.extract_boxes(predictions)

    # Extract class ids (index 5)
    class_ids = predictions[:, 5].astype(np.int32)

    # Extract segmentation masks using coefficients (indices 6-37)
    masks = self.extract_masks(predictions, mask_protos)

    return boxes, scores, class_ids, masks

def extract_boxes(self, predictions):
    # Get the first 4 values; these are assumed to be [x1, y1, x2, y2] in 640×640 space.
    boxes = predictions[:, :4]
    # If the original image size differs from the model input size,
    # rescale boxes from (self.input_width, self.input_height) to (self.img_width, self.img_height)
    if (self.img_width != self.input_width) or (self.img_height != self.input_height):
        boxes = self.rescale_boxes_corner_format(boxes)
    return boxes

def rescale_boxes_corner_format(self, boxes):
    # Calculate scaling factors from model input size to original image size
    scale_x = float(self.img_width) / self.input_width
    scale_y = float(self.img_height) / self.input_height
    boxes[:, [0, 2]] *= scale_x  # x1, x2
    boxes[:, [1, 3]] *= scale_y  # y1, y2
    return boxes

def extract_masks(self, predictions, mask_protos):
    """
    Compute segmentation masks:
      - For each detection, use the 32 segmentation coefficients (indices 6-37)
        to compute a weighted sum over the first 32 channels of the mask prototypes.
      - The mask prototypes have shape (1, 38, 160, 160), so we select the first 32 channels.
    """
    # Extract segmentation coefficients (shape: (num_detections, 32))
    seg_coeffs = predictions[:, 6:38]
    # Use the first 32 channels from mask prototypes; remove batch dimension → (32, 160, 160)
    mask_protos = mask_protos[0, :32, :, :]
    # Compute masks as a weighted sum of mask prototypes for each detection
    masks = np.einsum('nc,chw->nhw', seg_coeffs, mask_protos)
    # Apply sigmoid to obtain probabilities between 0 and 1
    masks = 1.0 / (1.0 + np.exp(-masks))
    # Binarize masks with a threshold of 0.5
    masks = masks > 0.5

    # Resize each mask from 160x160 (mask prototype resolution) to the original image dimensions
    final_masks = []
    for mask in masks:
        mask_uint8 = mask.astype(np.uint8) * 255
        mask_resized = cv2.resize(mask_uint8,
                                  (self.img_width, self.img_height),
                                  interpolation=cv2.INTER_NEAREST)
        final_masks.append(mask_resized)
    final_masks = np.array(final_masks)
    return final_masks

def get_input_details(self):
    model_inputs = self.session.get_inputs()
    self.input_names = [inp.name for inp in model_inputs]
    self.input_shape = model_inputs[0].shape  # typically [1, 3, 640, 640]
    self.input_height = self.input_shape[2]
    self.input_width = self.input_shape[3]

def get_output_details(self):
    model_outputs = self.session.get_outputs()
    self.output_names = [out.name for out in model_outputs]

output

Load the model and create InferenceSession

best_weights_path = f"{saved_model_results_path}/train/weights/best.onnx"

detector = YOLOv11(best_weights_path, conf_thres=0.4, iou_thres=0.4)

img = cv2.imread("/content/download (1).jpeg")

Detect Objects (now returns bounding boxes, scores, class_ids, and segmentation masks)

boxes, scores, class_ids, masks = detector(img)

1) Print bounding box coordinates to confirm they're in-image

print("Bounding boxes:\n", detections.xyxy)

2) If you have masks, print their shape and check if non-empty

if detections.mask is not None and len(detections.mask) > 0:
print("Mask shape:", detections.mask.shape)
print("Mask unique values:", np.unique(detections.mask[0])) # e.g., [0, 255]

3) Create annotators with explicit colors/thickness

mask_annotator = sv.MaskAnnotator(
# By default, it uses random colors. You can force a single color if desired:
color=sv.Color.GREEN
)
box_annotator = sv.BoxAnnotator(
thickness=2 # thicker line
)
label_annotator = sv.LabelAnnotator(
text_scale=0.7,
text_thickness=2
)

4) Draw them in the recommended order: masks → boxes → labels

annotated_image = img.copy()
annotated_image = mask_annotator.annotate(scene=annotated_image, detections=detections)
annotated_image = box_annotator.annotate(scene=annotated_image, detections=detections)
annotated_image = label_annotator.annotate(scene=annotated_image, detections=detections)

5) Display the result

sv.plot_image(annotated_image)
cv2.imwrite("debug_output.jpg", annotated_image)
output

Bounding boxes:
[[ 619.21 436.77 686.61 506.29]
[ 536.22 568.54 582.89 634.71]]
Mask shape: (2, 900, 1200)
Mask unique values: [ 0 255]
True

I also made my update on colab to also make mask work as well. Can you check same collab please

0 replies

pranta-barua007 · 2025-02-17T17:19:44Z

pranta-barua007
Feb 17, 2025
Author

@onuralpszr colab on #1626 ?

0 replies

onuralpszr · 2025-02-17T17:20:44Z

onuralpszr
Feb 17, 2025
Collaborator

@onuralpszr colab on #1626 ?

yes

0 replies

pranta-barua007 · 2025-02-17T17:27:47Z

pranta-barua007
Feb 17, 2025
Author

@onuralpszr getting error on this line

boolean_mask = masks.astype(bool)

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
[<ipython-input-92-d88d72348bb6>](https://localhost:8080/#) in <cell line: 0>()
----> 1 boolean_mask = masks.astype(bool)

AttributeError: 'list' object has no attribute 'astype'

0 replies

onuralpszr · 2025-02-17T17:31:22Z

onuralpszr
Feb 17, 2025
Collaborator

@onuralpszr getting error on this line

boolean_mask = masks.astype(bool)

AttributeError Traceback (most recent call last)
in <cell line: 0>()
----> 1 boolean_mask = masks.astype(bool)

AttributeError: 'list' object has no attribute 'astype'

Convert masks to np.array instead of list but I pressume you got empty mask list ?

0 replies

onuralpszr · 2025-02-17T17:32:20Z

onuralpszr
Feb 17, 2025
Collaborator

I am converting this to discussion

0 replies

pranta-barua007 · 2025-02-17T17:34:06Z

pranta-barua007
Feb 17, 2025
Author

@onuralpszr i am getting everythink empty now box, mask, class_ids, scores [] !!

0 replies

pranta-barua007 · 2025-02-17T17:42:29Z

pranta-barua007
Feb 17, 2025
Author

@onuralpszr

just to be clear i am exporting my model like this

from ultralytics import YOLO

# Load a model
best_weights_path = f"{saved_model_results_path}/train/weights/best.pt"
ft_loaded_best_model = YOLO(best_weights_path)

ft_loaded_best_model.export(
    format="onnx",
    nms=True,
    data="/content/dental_disease__instance_segmented-9/data.yaml"
)

OUTPUT

Ultralytics 8.3.75 🚀 Python-3.11.11 torch-2.5.1+cu124 CPU (Intel Xeon 2.20GHz)
YOLO11s-seg summary (fused): 265 layers, 10,068,364 parameters, 0 gradients, 35.3 GFLOPs

PyTorch: starting from '/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.pt' with 

⚠️⚠️⚠️
input shape (1, 3, 640, 640) BCHW and 
output shape(s) ((1, 300, 38), (1, 32, 160, 160)) (19.6 MB)
⚠️⚠️⚠️

ONNX: starting export with onnx 1.17.0 opset 19...
ONNX: slimming with onnxslim 0.1.48...
ONNX: export success ✅ 5.8s, saved as '/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx' (38.7 MB)

Export complete (7.5s)
Results saved to /content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights
Predict:         yolo predict task=segment model=/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx imgsz=640  
Validate:        yolo val task=segment model=/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx imgsz=640 data=/content/dental_disease__instance_segmented-9/data.yaml  
Visualize:       https://netron.app/
/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx

highlighted the I/O shapes above with ⚠️

0 replies

pranta-barua007 · 2025-02-17T17:55:17Z

pranta-barua007
Feb 17, 2025
Author

@onuralpszr UPDTAE working ✅ IF exporting ONNX without nms

ft_loaded_best_model.export(
    format="onnx",
    # nms=True,
    data="/content/dental_disease__instance_segmented-9/data.yaml"
)

Ultralytics 8.3.75 🚀 Python-3.11.11 torch-2.5.1+cu124 CPU (Intel Xeon 2.20GHz)
YOLO11s-seg summary (fused): 265 layers, 10,068,364 parameters, 0 gradients, 35.3 GFLOPs

PyTorch: starting from '/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.pt' with 

⚠️⚠️⚠️
input shape (1, 3, 640, 640) BCHW and 
output shape(s) ((1, 40, 8400), (1, 32, 160, 160)) (19.6 MB)
⚠️⚠️⚠️

ONNX: starting export with onnx 1.17.0 opset 19...
ONNX: slimming with onnxslim 0.1.48...
ONNX: export success ✅ 3.1s, saved as '/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx' (38.7 MB)

Export complete (5.1s)
Results saved to /content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights
Predict:         yolo predict task=segment model=/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx imgsz=640  
Validate:        yolo val task=segment model=/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx imgsz=640 data=/content/dental_disease__instance_segmented-9/data.yaml  
Visualize:       https://netron.app/
/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx

1 reply

onuralpszr Feb 17, 2025
Collaborator

@onuralpszr UPDTAE working ✅ IF exporting ONNX without nms

ft_loaded_best_model.export(
    format="onnx",
    # nms=True,
    data="/content/dental_disease__instance_segmented-9/data.yaml"
)

Ultralytics 8.3.75 🚀 Python-3.11.11 torch-2.5.1+cu124 CPU (Intel Xeon 2.20GHz)
YOLO11s-seg summary (fused): 265 layers, 10,068,364 parameters, 0 gradients, 35.3 GFLOPs

PyTorch: starting from '/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.pt' with 

⚠️⚠️⚠️
input shape (1, 3, 640, 640) BCHW and 
output shape(s) ((1, 40, 8400), (1, 32, 160, 160)) (19.6 MB)
⚠️⚠️⚠️

ONNX: starting export with onnx 1.17.0 opset 19...
ONNX: slimming with onnxslim 0.1.48...
ONNX: export success ✅ 3.1s, saved as '/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx' (38.7 MB)

Export complete (5.1s)
Results saved to /content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights
Predict:         yolo predict task=segment model=/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx imgsz=640  
Validate:        yolo val task=segment model=/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx imgsz=640 data=/content/dental_disease__instance_segmented-9/data.yaml  
Visualize:       https://netron.app/
/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx

Ah okay awesome.

pranta-barua007 · 2025-02-17T18:01:32Z

pranta-barua007
Feb 17, 2025
Author

with nms=True IO shape is

input shape (1, 3, 640, 640) BCHW and
output shape(s) ((1, 300, 38), (1, 32, 160, 160)) (19.6 MB)

without nms IO shape is

input shape (1, 3, 640, 640) BCHW and
output shape(s) ((1, 40, 8400), (1, 32, 160, 160)) (19.6 MB)

does applying nms already apply the NMS and Transpose the output so we don't need to do it in code ?

2 replies

onuralpszr Feb 17, 2025
Collaborator

@pranta-barua007 you can try it :) But technically looks "yes" . But I was checking how Ultralytics doing in there to understand what I was missing, so I can give you better answer but at least you find a way to workaround with it.

pranta-barua007 Feb 18, 2025
Author

@pranta-barua007 you can try it :) But technically looks "yes" . But I was checking how Ultralytics doing in there to understand what I was missing, so I can give you better answer but at least you find a way to workaround with it.

@onuralpszr again thanks a LOT!!! 🙂‍↕️ I needed the terminology of how the calculation works of segmentation. i am trying to do it with NMS, if I am able I will update here

pranta-barua007 · 2025-02-18T15:18:08Z

pranta-barua007
Feb 18, 2025
Author

@onuralpszr i have figured how to do it with ✅ NMS enabled exporting ONNX format

from ultralytics import YOLO

# Load a model
best_weights_path = f"{saved_model_results_path}/train/weights/best.pt"
ft_loaded_best_model = YOLO(best_weights_path)

ft_loaded_best_model.export(
    format="onnx",
    nms=True,
    data="/content/dental_disease__instance_segmented-9/data.yaml"
)

Ultralytics 8.3.76 🚀 Python-3.11.11 torch-2.5.1+cu124 CPU (Intel Xeon 2.20GHz)
YOLO11s-seg summary (fused): 113 layers, 10,068,364 parameters, 0 gradients, 35.3 GFLOPs

PyTorch: starting from '/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.pt' with 

input shape (1, 3, 640, 640) BCHW and 
output shape(s) ((1, 300, 38), (1, 32, 160, 160)) (19.6 MB)

ONNX: starting export with onnx 1.17.0 opset 19...
ONNX: slimming with onnxslim 0.1.48...
ONNX: export success ✅ 9.0s, saved as '/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx' (38.7 MB)

Export complete (11.8s)
Results saved to /content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights
Predict:         yolo predict task=segment model=/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx imgsz=640  
Validate:        yolo val task=segment model=/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx imgsz=640 data=/content/dental_disease__instance_segmented-9/data.yaml  
Visualize:       https://netron.app/
/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx

Yolo11s-seg with nms exported ONNX

import cv2
import numpy as np
import onnxruntime
import math
import time
import supervision as sv

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

class YOLOv11nms:
    def __init__(self, path, conf_thres=0.4, num_masks=32):
        """
        Args:
            path (str): Path to the exported ONNX model.
            conf_thres (float): Confidence threshold for filtering detections.
            num_masks (int): Number of mask coefficients (should match export, e.g., 32).
        """
        self.conf_threshold = conf_thres
        self.num_masks = num_masks
        self.initialize_model(path)
    
    def initialize_model(self, path):
        # Create ONNX Runtime session with GPU (if available) or CPU.
        self.session = onnxruntime.InferenceSession(
            path, providers=['CUDAExecutionProvider', 'CPUExecutionProvider']
        )
        self.get_input_details()
        self.get_output_details()
    
    def get_input_details(self):
        model_inputs = self.session.get_inputs()
        self.input_names = [inp.name for inp in model_inputs]
        self.input_shape = model_inputs[0].shape  # Expected shape: (1, 3, 640, 640)
        self.input_height = self.input_shape[2]
        self.input_width = self.input_shape[3]
    
    def get_output_details(self):
        model_outputs = self.session.get_outputs()
        self.output_names = [out.name for out in model_outputs]
    
    def prepare_input(self, image):
        # Record the original image dimensions.
        self.img_height, self.img_width = image.shape[:2]
        # Convert BGR (OpenCV format) to RGB.
        img = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
        # Resize to the model’s input size (e.g., 640x640).
        img = cv2.resize(img, (self.input_width, self.input_height))
        # Normalize pixel values to [0, 1].
        img = img.astype(np.float32) / 255.0
        # Convert from HWC to CHW format.
        img = img.transpose(2, 0, 1)
        # Add batch dimension: shape becomes (1, 3, 640, 640).
        input_tensor = np.expand_dims(img, axis=0)
        return input_tensor
    
    def inference(self, input_tensor):
        outputs = self.session.run(self.output_names, {self.input_names[0]: input_tensor})
        return outputs
    
    def segment_objects(self, image):
        """
        Processes an image and returns:
          - boxes: Bounding boxes (rescaled to original image coordinates).
          - scores: Confidence scores.
          - class_ids: Detected class indices.
          - masks: Binary segmentation masks (aligned with the original image).
        """
        # Preprocess the image.
        input_tensor = self.prepare_input(image)
        outputs = self.inference(input_tensor)
        
        # Process detection output.
        # Detection output shape is (1, 300, 38) (post-NMS & transposed).
        detections = np.squeeze(outputs[0], axis=0)  # Now shape: (300, 38)
        
        # Filter out detections below the confidence threshold.
        valid_mask = detections[:, 4] > self.conf_threshold
        detections = detections[valid_mask]
        
        if detections.shape[0] == 0:
            return np.array([]), np.array([]), np.array([]), np.array([])
        
        # Extract detection results.
        # boxes_model: boxes in model input coordinates (e.g., in a 640x640 space)
        boxes_model = detections[:, :4]  # Format: (x1, y1, x2, y2)
        scores = detections[:, 4]
        class_ids = detections[:, 5].astype(np.int64)
        mask_coeffs = detections[:, 6:]  # 32 mask coefficients
        
        # Rescale boxes for final drawing on the original image.
        boxes_draw = self.rescale_boxes(
            boxes_model, 
            (self.input_height, self.input_width), 
            (self.img_height, self.img_width)
        )
        
        # Process the mask output using the boxes in model coordinates.
        masks = self.process_mask_output(mask_coeffs, outputs[1], boxes_model)
        
        return boxes_draw, scores, class_ids, masks
    
    def process_mask_output(self, mask_coeffs, mask_feature_map, boxes_model):
        """
        Generates segmentation masks for each detection.
        
        Args:
            mask_coeffs (np.ndarray): (N, 32) mask coefficients for N detections.
            mask_feature_map (np.ndarray): Output mask feature map with shape (1, 32, 160, 160).
            boxes_model (np.ndarray): Bounding boxes in model input coordinates.
        
        Returns:
            mask_maps (np.ndarray): Binary masks for each detection, with shape 
                                    (N, original_img_height, original_img_width).
        """
        # Squeeze the mask feature map: (1, 32, 160, 160) -> (32, 160, 160)
        mask_feature_map = np.squeeze(mask_feature_map, axis=0)
        # Reshape to (32, 25600) where 25600 = 160 x 160.
        mask_feature_map_reshaped = mask_feature_map.reshape(self.num_masks, -1)
        # Combine mask coefficients with the mask feature map.
        # Resulting shape: (N, 25600) → then reshape to (N, 160, 160)
        masks = sigmoid(np.dot(mask_coeffs, mask_feature_map_reshaped))
        masks = masks.reshape(-1, mask_feature_map.shape[1], mask_feature_map.shape[2])
        
        # Get mask feature map dimensions.
        mask_h, mask_w = mask_feature_map.shape[1], mask_feature_map.shape[2]
        # Rescale boxes from model coordinates (e.g., 640x640) to mask feature map coordinates (e.g., 160x160).
        scale_boxes = self.rescale_boxes(
            boxes_model, 
            (self.input_height, self.input_width), 
            (mask_h, mask_w)
        )
        # Also, compute boxes in original image coordinates for placing the mask.
        boxes_draw = self.rescale_boxes(
            boxes_model, 
            (self.input_height, self.input_width), 
            (self.img_height, self.img_width)
        )
        
        # Create an empty array for final masks with the same size as the original image.
        mask_maps = np.zeros((boxes_model.shape[0], self.img_height, self.img_width), dtype=np.uint8)
        
        # Determine blur size based on the ratio between the original image and the mask feature map.
        blur_size = (
            max(1, int(self.img_width / mask_w)),
            max(1, int(self.img_height / mask_h))
        )
        
        for i in range(boxes_model.shape[0]):
            # Get the detection box in mask feature map coordinates.
            sx1, sy1, sx2, sy2 = scale_boxes[i]
            sx1, sy1, sx2, sy2 = int(np.floor(sx1)), int(np.floor(sy1)), int(np.ceil(sx2)), int(np.ceil(sy2))
            
            # Get the corresponding box in the original image.
            ox1, oy1, ox2, oy2 = boxes_draw[i]
            ox1, oy1, ox2, oy2 = int(np.floor(ox1)), int(np.floor(oy1)), int(np.ceil(ox2)), int(np.ceil(oy2))
            
            # Crop the predicted mask region from the raw mask.
            cropped_mask = masks[i][sy1:sy2, sx1:sx2]
            if cropped_mask.size == 0 or (ox2 - ox1) <= 0 or (oy2 - oy1) <= 0:
                continue
            # Resize the cropped mask to the size of the detection box in the original image.
            resized_mask = cv2.resize(cropped_mask, (ox2 - ox1, oy2 - oy1), interpolation=cv2.INTER_CUBIC)
            # Apply a slight blur to smooth the mask edges.
            resized_mask = cv2.blur(resized_mask, blur_size)
            # Threshold the mask to obtain a binary mask.
            bin_mask = (resized_mask > 0.5).astype(np.uint8)
            # Place the binary mask into the correct location on the full mask.
            mask_maps[i, oy1:oy2, ox1:ox2] = bin_mask
        
        return mask_maps
    
    @staticmethod
    def rescale_boxes(boxes, input_shape, target_shape):
        """
        Rescales boxes from one coordinate space to another.
        
        Args:
            boxes (np.ndarray): Array of boxes (N, 4) with format [x1, y1, x2, y2].
            input_shape (tuple): (height, width) of the current coordinate space.
            target_shape (tuple): (height, width) of the target coordinate space.
        
        Returns:
            np.ndarray: Scaled boxes of shape (N, 4).
        """
        in_h, in_w = input_shape
        tgt_h, tgt_w = target_shape
        scale = np.array([tgt_w / in_w, tgt_h / in_h, tgt_w / in_w, tgt_h / in_h])
        return boxes * scale
    
    def __call__(self, image):
        # This allows you to call the instance directly, e.g.:
        # boxes, scores, class_ids, masks = detector(image)
        return self.segment_objects(image)

Usage

# Load the model and create InferenceSession
best_weights_path = f"{saved_model_results_path}/train/weights/best.onnx"

detector = YOLOv11nms(best_weights_path, conf_thres=0.4)

img = cv2.imread("/content/download (1).jpeg")
# Detect Objects (now returns bounding boxes, scores, class_ids, and segmentation masks)
boxes, scores, class_ids, masks = detector(img)

boolean_mask = masks.astype(bool)

box_annotator = sv.BoxAnnotator()
label_annotator = sv.LabelAnnotator()
mask_annotator = sv.MaskAnnotator()
detections = sv.Detections(xyxy=boxes, confidence=scores, class_id=class_ids,mask=boolean_mask)
detections = detections.with_nms(threshold=0.5)

annotate = box_annotator.annotate(scene=img.copy(), detections=detections)
annotate = label_annotator.annotate(scene=annotate, detections=detections)
annotate = mask_annotator.annotate(scene=annotate, detections=detections)

sv.plot_image(annotate)

OUTPUT

1 reply

pranta-barua007 Feb 18, 2025
Author

Key Points of This Implementation

Input Preparation:
- The image is resized to ((640, 640)) (the model’s input size), normalized, and formatted as a (1 \times 3 \times 640 \times 640) tensor.
Inference:
- The model is executed via ONNX Runtime, returning two outputs:
  - Detection Output: ((1, 300, 38)) — already post-NMS and transposed.
  - Mask Feature Map: ((1, 32, 160, 160))
Detection Parsing:
- The detection output is squeezed to ((300, 38)) and filtered by a confidence threshold.
- The first 4 values give the bounding box, index 4 is the confidence score, index 5 is the class ID, and indices 6–37 are the mask coefficients.
Mask Processing:
- The 32 mask coefficients are combined with the mask feature map (after reshaping) using a dot product and a sigmoid activation to produce a raw mask.
- The predicted mask is then cropped, resized, blurred, and thresholded to yield a binary mask for each detection.
Box Rescaling:
- Boxes are rescaled from the input image dimensions to the mask feature map dimensions (and then used to position the final masks onto the original image).

@onuralpszr if you have any improvement recommendation I would love to see it 🙂‍↕️
and thanks again for you quick support

pranta-barua007 · 2025-02-18T15:39:54Z

pranta-barua007
Feb 18, 2025
Author

@onuralpszr i actually doing this for using in the web specifically with JS and later React Natvie , is https://www.npmjs.com/package/supervision this lib will release anywhere soon ?

1 reply

onuralpszr Feb 18, 2025
Collaborator

@onuralpszr i actually doing this for using in the web specifically with JS and later React Natvie , is https://www.npmjs.com/package/supervision this lib will release anywhere soon ?

Not soon AFAIK :) your best bet is backend + fronted or checking some other libraries

pranta-barua007 · 2025-02-20T18:03:02Z

pranta-barua007
Feb 20, 2025
Author

@onuralpszr i am making the app on nextjs directly using JS/typescript. almost there i need some help fixing the masks, some masks are correct some arent. it would be very kind if I get some assist.
repo -> https://github.com/pranta-barua007/yolo11s-seg-web-onnx

1 reply

pranta-barua007 Feb 22, 2025
Author

@onuralpszr fixed it ✅ all working

how to use yolov11s-seg supervision onnx runtime? #1789

Uh oh!

Uh oh!

pranta-barua007 Feb 15, 2025

Replies: 30 comments · 6 replies

Uh oh!

pranta-barua007 Feb 15, 2025 Author

Load the model and create InferenceSession

Detect Objects (now returns bounding boxes, scores, class_ids, and segmentation masks)

Uh oh!

pranta-barua007 Feb 15, 2025 Author

Uh oh!

onuralpszr Feb 15, 2025 Collaborator

Uh oh!

onuralpszr Feb 15, 2025 Collaborator

Uh oh!

pranta-barua007 Feb 15, 2025 Author

Uh oh!

onuralpszr Feb 15, 2025 Collaborator

Uh oh!

pranta-barua007 Feb 15, 2025 Author

Uh oh!

onuralpszr Feb 15, 2025 Collaborator

Uh oh!

Uh oh!

pranta-barua007 Feb 15, 2025 Author

Uh oh!

onuralpszr Feb 15, 2025 Collaborator

Uh oh!

onuralpszr Feb 15, 2025 Collaborator

Uh oh!

onuralpszr Feb 15, 2025 Collaborator

Uh oh!

pranta-barua007 Feb 15, 2025 Author

Uh oh!

onuralpszr Feb 15, 2025 Collaborator

Uh oh!

pranta-barua007 Feb 15, 2025 Author

Uh oh!

Uh oh!

pranta-barua007 Feb 16, 2025 Author

without DYNAMIC and NMS -- OPTION 1

with DYNAMIC and NMS -- OPTION 2

without DYNAMIC -- OPTION 3 (DEFAULT)

Uh oh!

pranta-barua007 Feb 17, 2025 Author

Uh oh!

onuralpszr Feb 17, 2025 Collaborator

Load the model and create InferenceSession

Detect Objects (now returns bounding boxes, scores, class_ids, and segmentation masks)

1) Print bounding box coordinates to confirm they're in-image

2) If you have masks, print their shape and check if non-empty

3) Create annotators with explicit colors/thickness

4) Draw them in the recommended order: masks → boxes → labels

5) Display the result

Uh oh!

pranta-barua007 Feb 17, 2025 Author

Uh oh!

onuralpszr Feb 17, 2025 Collaborator

Uh oh!

pranta-barua007 Feb 17, 2025 Author

Uh oh!

onuralpszr Feb 17, 2025 Collaborator

boolean_mask = masks.astype(bool)

Uh oh!

onuralpszr Feb 17, 2025 Collaborator

Uh oh!

Uh oh!

pranta-barua007 Feb 17, 2025 Author

Uh oh!

pranta-barua007 Feb 17, 2025 Author

Uh oh!

Uh oh!

pranta-barua007 Feb 17, 2025 Author

Uh oh!

pranta-barua007
Feb 15, 2025

Replies: 30 comments 6 replies

pranta-barua007
Feb 15, 2025
Author

pranta-barua007
Feb 15, 2025
Author

onuralpszr
Feb 15, 2025
Collaborator

onuralpszr
Feb 15, 2025
Collaborator

pranta-barua007
Feb 15, 2025
Author

onuralpszr
Feb 15, 2025
Collaborator

pranta-barua007
Feb 15, 2025
Author

onuralpszr
Feb 15, 2025
Collaborator

pranta-barua007
Feb 15, 2025
Author

onuralpszr
Feb 15, 2025
Collaborator

onuralpszr
Feb 15, 2025
Collaborator

onuralpszr
Feb 15, 2025
Collaborator

pranta-barua007
Feb 15, 2025
Author

onuralpszr
Feb 15, 2025
Collaborator

pranta-barua007
Feb 15, 2025
Author

pranta-barua007
Feb 16, 2025
Author

pranta-barua007
Feb 17, 2025
Author

onuralpszr
Feb 17, 2025
Collaborator

pranta-barua007
Feb 17, 2025
Author

onuralpszr
Feb 17, 2025
Collaborator

pranta-barua007
Feb 17, 2025
Author

onuralpszr
Feb 17, 2025
Collaborator

onuralpszr
Feb 17, 2025
Collaborator

pranta-barua007
Feb 17, 2025
Author

pranta-barua007
Feb 17, 2025
Author

pranta-barua007
Feb 17, 2025
Author