how to use yolov11s-seg supervision onnx runtime? #1789
Replies: 30 comments 6 replies
-
here is how I am making predictions:

```python
# Load the model and create the InferenceSession
best_weights_path = f"{saved_model_results_path}/train/weights/best.onnx"
detector = YOLOv11(best_weights_path, conf_thres=0.2, iou_thres=0.3)

img = cv2.imread("/content/download (1).jpeg")

# Detect objects (now returns bounding boxes, scores, class_ids, and segmentation masks)
boxes, scores, class_ids, masks = detector(img)
```

`boxes`

```
array([[ 274.24, 185.68, 958.67, 689.4],
       [ 244.84, 252.61, 830.42, 883.34]], dtype=float32)
```

`scores`

```
array([ 0.8895, 0.86876], dtype=float32)
```

`class_ids`

```
array([2, 2], dtype=int32)
```

`masks`

```
array([[[255, 255, 255, ..., 255, 255, 255],
        [255, 255, 255, ..., 255, 255, 255],
        [255, 255, 255, ..., 255, 255, 255],
        ...,
        [255, 255, 255, ..., 255, 255, 255],
        [255, 255, 255, ..., 255, 255, 255],
        [255, 255, 255, ..., 255, 255, 255]],

       [[255, 255, 255, ..., 255, 255, 255],
        [255, 255, 255, ..., 255, 255, 255],
        [255, 255, 255, ..., 255, 255, 255],
        ...,
        [255, 255, 255, ..., 255, 255, 255],
        [255, 255, 255, ..., 255, 255, 255],
        [255, 255, 255, ..., 255, 255, 255]]], dtype=uint8)
```
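(A quick sanity check on those masks, a minimal sketch reusing the `masks` array returned above, shows each mask is positive over essentially the whole frame rather than over individual objects:)

```python
import numpy as np

# fraction of positive pixels per mask; values near 1.0 mean the mask covers the whole image
coverage = (np.asarray(masks) > 0).mean(axis=(1, 2))
print("positive-pixel fraction per mask:", coverage)
```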
-
i have checked the predicted class ids and probabilities; those are correct, so that part is working fine. Here is how it should look; below are the direct prediction results with Ultralytics:
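(For reference, a minimal sketch of how I get the direct Ultralytics predictions I compare against, assuming the same fine-tuned `best.pt` used throughout this thread:)

```python
from ultralytics import YOLO

model = YOLO(f"{saved_model_results_path}/train/weights/best.pt")
results = model("/content/download (1).jpeg")

# results[0].boxes holds xyxy/conf/cls, results[0].masks holds the instance masks
results[0].show()
```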
-
@pranta-barua007 hello, let me quickly check the mask case in my Colab.
-
@pranta-barua007 also, if you don't mind, could you share your export parameters and model with me as well?
-
@onuralpszr thanks for the quick reply! Should I share it here?
-
If it is a problem, you can share it to my e-mail "[email protected]" via Google Drive.
-
@onuralpszr please do check, I have shared it.
-
The export parameters too, please?
-
@onuralpszr can you please explain? Here is how I am exporting:

```python
ft_loaded_best_model.export(
    format="onnx",
    nms=True,
    data="/content/disease__instance_segmented/data.yaml",
)  # creates 'best.onnx'
```

I got the params from here: https://docs.ultralytics.com/modes/export/#arguments
-
I got what I needed, all good :)
-
@pranta-barua007 can you also upload the original picture you used?
-
@pranta-barua007 one "train" data picture would also be great, for testing purposes.
-
ok @onuralpszr, uploading to the shared folder
-
Great
-
@onuralpszr I shared some information; the train data is not currently available. Can you please check whether the shared resources help?
-
I have tried exporting the model in 3 different ways and inspected the difference in output shape (my default is OPTION 3).

**without DYNAMIC and NMS -- OPTION 1**

```python
from ultralytics import YOLO
# Load a model
best_weights_path = f"{saved_model_results_path}/train/weights/without_nms_and_dynamic/best.pt"
ft_loaded_best_model = YOLO(best_weights_path)
ft_loaded_best_model.export(
format="onnx",
data="/content/disease_instance_segmented/data.yaml"
)
```

```
Ultralytics 8.3.75 🚀 Python-3.11.11 torch-2.5.1+cu124 CPU (Intel Xeon 2.20GHz)
YOLO11s-seg summary (fused): 265 layers, 10,068,364 parameters, 0 gradients, 35.3 GFLOPs
PyTorch: starting from '/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/without_nms_and_dynamic/best.pt' with
** input shape (1, 3, 640, 640) BCHW **
** and output shape(s) ((1, 40, 8400), (1, 32, 160, 160)) (19.6 MB) **
ONNX: starting export with onnx 1.17.0 opset 19...
ONNX: slimming with onnxslim 0.1.48...
ONNX: export success ✅ 2.6s, saved as '/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/without_nms_and_dynamic/best.onnx' (38.7 MB)
Export complete (4.1s)
Results saved to /content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/without_nms_and_dynamic
Predict: yolo predict task=segment model=/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/without_nms_and_dynamic/best.onnx imgsz=640
Validate: yolo val task=segment model=/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/without_nms_and_dynamic/best.onnx imgsz=640 data=/content/dental_disease__instance_segmented-7/data.yaml
Visualize: https://netron.app/
/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/without_nms_and_dynamic/best.onnx
```

**with DYNAMIC and NMS -- OPTION 2**

```python
from ultralytics import YOLO
# Load a model
best_weights_path = f"{saved_model_results_path}/train/weights/with_dynamic_and_nms/best.pt"
ft_loaded_best_model = YOLO(best_weights_path)
ft_loaded_best_model.export(
format="onnx",
nms=True,
dynamic=True,
data="/content/disease_instance_segmented/data.yaml"
)
```

```
Ultralytics 8.3.75 🚀 Python-3.11.11 torch-2.5.1+cu124 CPU (Intel Xeon 2.20GHz)
YOLO11s-seg summary (fused): 265 layers, 10,068,364 parameters, 0 gradients, 35.3 GFLOPs
PyTorch: starting from '/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/with_dynamic_and_nms/best.pt' with
** input shape (1, 3, 640, 640) BCHW **
** and output shape(s) ((1, 300, 38), (1, 32, 160, 160)) (19.6 MB) **
ONNX: starting export with onnx 1.17.0 opset 19...
ONNX: slimming with onnxslim 0.1.48...
ONNX: export success ✅ 35.8s, saved as '/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/with_dynamic_and_nms/best.onnx' (38.6 MB)
Export complete (42.2s)
Results saved to /content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/with_dynamic_and_nms
Predict: yolo predict task=segment model=/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/with_dynamic_and_nms/best.onnx imgsz=640
Validate: yolo val task=segment model=/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/with_dynamic_and_nms/best.onnx imgsz=640 data=/content/dental_disease__instance_segmented-7/data.yaml
Visualize: https://netron.app/
/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/with_dynamic_and_nms/best.onnx
```

**without DYNAMIC -- OPTION 3 (DEFAULT)**

```python
from ultralytics import YOLO
# Load a model
best_weights_path = f"{saved_model_results_path}/train/weights/best.pt"
ft_loaded_best_model = YOLO(best_weights_path)
ft_loaded_best_model.export(
format="onnx",
nms=True,
data="/content/disease_instance_segmented/data.yaml"
)
```

```
Ultralytics 8.3.75 🚀 Python-3.11.11 torch-2.5.1+cu124 CPU (Intel Xeon 2.20GHz)
YOLO11s-seg summary (fused): 265 layers, 10,068,364 parameters, 0 gradients, 35.3 GFLOPs
PyTorch: starting from '/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.pt' with
** input shape (1, 3, 640, 640) BCHW **
** and output shape(s) ((1, 300, 38), (1, 32, 160, 160)) (19.6 MB) **
ONNX: starting export with onnx 1.17.0 opset 19...
ONNX: slimming with onnxslim 0.1.48...
ONNX: export success ✅ 4.4s, saved as '/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx' (38.7 MB)
Export complete (6.1s)
Results saved to /content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights
Predict: yolo predict task=segment model=/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx imgsz=640
Validate: yolo val task=segment model=/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx imgsz=640 data=/content/dental_disease__instance_segmented-7/data.yaml
Visualize: https://netron.app/
/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx
```
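(For anyone comparing these exports, a quick way to double-check which input/output layout a given ONNX file actually has, a minimal sketch using onnxruntime with the model path as a placeholder:)

```python
import onnxruntime as ort

# placeholder path: point this at whichever exported model you want to inspect
session = ort.InferenceSession("best.onnx", providers=["CPUExecutionProvider"])

for inp in session.get_inputs():
    print("input: ", inp.name, inp.shape)   # e.g. [1, 3, 640, 640]
for out in session.get_outputs():
    print("output:", out.name, out.shape)   # e.g. [1, 300, 38] and [1, 32, 160, 160] with nms=True
```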
-
@onuralpszr

```python
import cv2
import numpy as np
import onnxruntime
import supervision as sv


class YOLOv11:
def __init__(self, path, conf_thres=0.7, iou_thres=0.5):
self.conf_threshold = conf_thres
self.iou_threshold = iou_thres
self.initialize_model(path)
def __call__(self, image):
return self.detect_objects(image)
def initialize_model(self, path):
self.session = onnxruntime.InferenceSession(
path, providers=onnxruntime.get_available_providers()
)
self.get_input_details()
self.get_output_details()
def detect_objects(self, image):
# Save original image dimensions
self.img_height, self.img_width = image.shape[:2]
# Prepare input (resize, normalize, etc.)
input_tensor = self.prepare_input(image)
# Run inference
outputs = self.inference(input_tensor)
# Process outputs into boxes, scores, class IDs, and masks
boxes, scores, class_ids, masks = self.process_output(outputs)
return boxes, scores, class_ids, masks
def prepare_input(self, image):
# Convert BGR to RGB and resize to model input size (e.g. 640x640)
input_img = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
input_img = cv2.resize(input_img, (self.input_width, self.input_height))
# Normalize to [0, 1]
input_img = input_img / 255.0
# Change data layout from HWC to CHW
input_img = input_img.transpose(2, 0, 1)
# Add batch dimension: [1, C, H, W]
input_tensor = input_img[np.newaxis, :, :, :].astype(np.float32)
return input_tensor
def inference(self, input_tensor):
outputs = self.session.run(self.output_names, {self.input_names[0]: input_tensor})
return outputs
def process_output(self, outputs):
"""
Model outputs:
- outputs[0]: shape (1, 300, 38)
indices 0-3: bounding box (assumed [x1, y1, x2, y2] in 640×640 space)
index 4: confidence score
index 5: class id
indices 6-37: segmentation coefficients (32 values)
        - outputs[1]: shape (1, 32, 160, 160) -> mask prototypes
"""
# Remove batch dimension from detections (results in (300, 38))
predictions = np.squeeze(outputs[0], axis=0)
        mask_protos = outputs[1]  # shape: (1, 32, 160, 160)
# Filter detections based on confidence (index 4)
conf_scores = predictions[:, 4]
valid = conf_scores > self.conf_threshold
predictions = predictions[valid]
scores = conf_scores[valid]
if len(scores) == 0:
return [], [], [], []
# Extract bounding boxes (assumed already in [x1, y1, x2, y2] format)
boxes = self.extract_boxes(predictions)
# Extract class ids (index 5)
class_ids = predictions[:, 5].astype(np.int32)
# Extract segmentation masks using coefficients (indices 6-37)
masks = self.extract_masks(predictions, mask_protos)
return boxes, scores, class_ids, masks
def extract_boxes(self, predictions):
# Get the first 4 values; these are assumed to be [x1, y1, x2, y2] in 640×640 space.
boxes = predictions[:, :4]
# If the original image size differs from the model input size,
# rescale boxes from (self.input_width, self.input_height) to (self.img_width, self.img_height)
if (self.img_width != self.input_width) or (self.img_height != self.input_height):
boxes = self.rescale_boxes_corner_format(boxes)
return boxes
def rescale_boxes_corner_format(self, boxes):
# Calculate scaling factors from model input size to original image size
scale_x = float(self.img_width) / self.input_width
scale_y = float(self.img_height) / self.input_height
boxes[:, [0, 2]] *= scale_x # x1, x2
boxes[:, [1, 3]] *= scale_y # y1, y2
return boxes
def extract_masks(self, predictions, mask_protos):
"""
Compute segmentation masks:
- For each detection, use the 32 segmentation coefficients (indices 6-37)
to compute a weighted sum over the first 32 channels of the mask prototypes.
        - The mask prototypes have shape (1, 32, 160, 160); all 32 channels are used.
"""
# Extract segmentation coefficients (shape: (num_detections, 32))
seg_coeffs = predictions[:, 6:38]
# Use the first 32 channels from mask prototypes; remove batch dimension → (32, 160, 160)
mask_protos = mask_protos[0, :32, :, :]
# Compute masks as a weighted sum of mask prototypes for each detection
masks = np.einsum('nc,chw->nhw', seg_coeffs, mask_protos)
# Apply sigmoid to obtain probabilities between 0 and 1
masks = 1.0 / (1.0 + np.exp(-masks))
# Binarize masks with a threshold of 0.5
masks = masks > 0.5
# Resize each mask from 160x160 (mask prototype resolution) to the original image dimensions
final_masks = []
for mask in masks:
mask_uint8 = mask.astype(np.uint8) * 255
mask_resized = cv2.resize(mask_uint8,
(self.img_width, self.img_height),
interpolation=cv2.INTER_NEAREST)
final_masks.append(mask_resized)
final_masks = np.array(final_masks)
return final_masks
def get_input_details(self):
model_inputs = self.session.get_inputs()
self.input_names = [inp.name for inp in model_inputs]
self.input_shape = model_inputs[0].shape # typically [1, 3, 640, 640]
self.input_height = self.input_shape[2]
self.input_width = self.input_shape[3]
def get_output_details(self):
model_outputs = self.session.get_outputs()
        self.output_names = [out.name for out in model_outputs]
```

output

```python
# Load the model and create InferenceSession
best_weights_path = f"{saved_model_results_path}/train/weights/best.onnx"
detector = YOLOv11(best_weights_path, conf_thres=0.4, iou_thres=0.4)
img = cv2.imread("/content/download (1).jpeg")
# Detect Objects (now returns bounding boxes, scores, class_ids, and segmentation masks)
boxes, scores, class_ids, masks = detector(img)
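# NOTE (assumption): the prints below use a `detections` object that is not constructed in the
# snippet as posted; presumably it was built from the raw outputs roughly like this:
detections = sv.Detections(
    xyxy=np.asarray(boxes),
    confidence=np.asarray(scores),
    class_id=np.asarray(class_ids),
    mask=np.asarray(masks).astype(bool),
)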
# 1) Print bounding box coordinates to confirm they're in-image
print("Bounding boxes:\n", detections.xyxy)
# 2) If you have masks, print their shape and check if non-empty
if detections.mask is not None and len(detections.mask) > 0:
print("Mask shape:", detections.mask.shape)
print("Mask unique values:", np.unique(detections.mask[0])) # e.g., [0, 255]
# 3) Create annotators with explicit colors/thickness
mask_annotator = sv.MaskAnnotator(
# By default, it uses random colors. You can force a single color if desired:
color=sv.Color.GREEN
)
box_annotator = sv.BoxAnnotator(
thickness=2 # thicker line
)
label_annotator = sv.LabelAnnotator(
text_scale=0.7,
text_thickness=2
)
# 4) Draw them in the recommended order: masks → boxes → labels
annotated_image = img.copy()
annotated_image = mask_annotator.annotate(scene=annotated_image, detections=detections)
annotated_image = box_annotator.annotate(scene=annotated_image, detections=detections)
annotated_image = label_annotator.annotate(scene=annotated_image, detections=detections)
# 5) Display the result
sv.plot_image(annotated_image)
cv2.imwrite("debug_output.jpg", annotated_image) output Bounding boxes:
[[ 619.21 436.77 686.61 506.29]
[ 536.22 568.54 582.89 634.71]]
Mask shape: (2, 900, 1200)
Mask unique values: [ 0 255]
True
```
-
I also made my update on the Colab to make the mask work as well. Can you check the same Colab, please?
-
@onuralpszr the Colab on #1626?
-
yes
-
@onuralpszr getting an error on this line: `boolean_mask = masks.astype(bool)`

```
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
[<ipython-input-92-d88d72348bb6>](https://localhost:8080/#) in <cell line: 0>()
----> 1 boolean_mask = masks.astype(bool)

AttributeError: 'list' object has no attribute 'astype'
```
-
Convert the masks to an np.array instead of a list, but I presume you got an empty mask list?
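(Something along these lines should handle both the list and the empty case, a sketch not tested against your exact outputs:)

```python
import numpy as np

masks = np.asarray(masks)          # works whether `masks` is a list or an ndarray
if masks.size == 0:
    print("no masks detected")     # nothing to annotate
else:
    boolean_mask = masks.astype(bool)
```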
-
I am converting this to a discussion.
-
@onuralpszr I am getting everything empty now: box, mask, class_ids, scores are all [] !!
-
Just to be clear, I am exporting my model like this:

```python
from ultralytics import YOLO
# Load a model
best_weights_path = f"{saved_model_results_path}/train/weights/best.pt"
ft_loaded_best_model = YOLO(best_weights_path)
ft_loaded_best_model.export(
format="onnx",
nms=True,
data="/content/dental_disease__instance_segmented-9/data.yaml"
)
```

OUTPUT

```
Ultralytics 8.3.75 🚀 Python-3.11.11 torch-2.5.1+cu124 CPU (Intel Xeon 2.20GHz)
YOLO11s-seg summary (fused): 265 layers, 10,068,364 parameters, 0 gradients, 35.3 GFLOPs
PyTorch: starting from '/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.pt' with
⚠️⚠️⚠️
input shape (1, 3, 640, 640) BCHW and
output shape(s) ((1, 300, 38), (1, 32, 160, 160)) (19.6 MB)
⚠️⚠️⚠️
ONNX: starting export with onnx 1.17.0 opset 19...
ONNX: slimming with onnxslim 0.1.48...
ONNX: export success ✅ 5.8s, saved as '/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx' (38.7 MB)
Export complete (7.5s)
Results saved to /content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights
Predict: yolo predict task=segment model=/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx imgsz=640
Validate: yolo val task=segment model=/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx imgsz=640 data=/content/dental_disease__instance_segmented-9/data.yaml
Visualize: https://netron.app/
/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx
```

I highlighted the I/O shapes above with ⚠️⚠️⚠️.
-
@onuralpszr UPDATE: working ✅ if exporting ONNX without `nms=True`:

```
Ultralytics 8.3.75 🚀 Python-3.11.11 torch-2.5.1+cu124 CPU (Intel Xeon 2.20GHz)
YOLO11s-seg summary (fused): 265 layers, 10,068,364 parameters, 0 gradients, 35.3 GFLOPs
PyTorch: starting from '/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.pt' with
⚠️⚠️⚠️
input shape (1, 3, 640, 640) BCHW and
output shape(s) ((1, 40, 8400), (1, 32, 160, 160)) (19.6 MB)
⚠️⚠️⚠️
ONNX: starting export with onnx 1.17.0 opset 19...
ONNX: slimming with onnxslim 0.1.48...
ONNX: export success ✅ 3.1s, saved as '/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx' (38.7 MB)
Export complete (5.1s)
Results saved to /content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights
Predict: yolo predict task=segment model=/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx imgsz=640
Validate: yolo val task=segment model=/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx imgsz=640 data=/content/dental_disease__instance_segmented-9/data.yaml
Visualize: https://netron.app/
/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx
```
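(For this non-NMS export, a minimal sketch of decoding the raw (1, 40, 8400) output, assuming the standard layout of 4 box values (cx, cy, w, h), one score per class (4 classes in this model), and 32 mask coefficients per anchor; NMS still has to be applied afterwards:)

```python
import numpy as np

def decode_raw_seg_output(pred, num_classes=4, conf_threshold=0.4):
    """Decode a raw (1, 4 + num_classes + 32, 8400) YOLO11-seg ONNX output (no NMS applied)."""
    preds = np.squeeze(pred, axis=0).T                # -> (8400, 4 + num_classes + 32)
    boxes_cxcywh = preds[:, :4]                       # box centres and sizes in 640x640 space
    class_scores = preds[:, 4:4 + num_classes]
    mask_coeffs = preds[:, 4 + num_classes:]          # 32 prototype coefficients per anchor

    scores = class_scores.max(axis=1)
    class_ids = class_scores.argmax(axis=1)
    keep = scores > conf_threshold
    return boxes_cxcywh[keep], scores[keep], class_ids[keep], mask_coeffs[keep]
```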
-
with: input shape (1, 3, 640, 640) BCHW and …
without: input shape (1, 3, 640, 640) BCHW and …
does applying
-
@onuralpszr I have figured out how to do it ✅ with NMS enabled when exporting to ONNX format.

```python
from ultralytics import YOLO
# Load a model
best_weights_path = f"{saved_model_results_path}/train/weights/best.pt"
ft_loaded_best_model = YOLO(best_weights_path)
ft_loaded_best_model.export(
format="onnx",
nms=True,
data="/content/dental_disease__instance_segmented-9/data.yaml"
)
```

```
Ultralytics 8.3.76 🚀 Python-3.11.11 torch-2.5.1+cu124 CPU (Intel Xeon 2.20GHz)
YOLO11s-seg summary (fused): 113 layers, 10,068,364 parameters, 0 gradients, 35.3 GFLOPs
PyTorch: starting from '/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.pt' with
input shape (1, 3, 640, 640) BCHW and
output shape(s) ((1, 300, 38), (1, 32, 160, 160)) (19.6 MB)
ONNX: starting export with onnx 1.17.0 opset 19...
ONNX: slimming with onnxslim 0.1.48...
ONNX: export success ✅ 9.0s, saved as '/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx' (38.7 MB)
Export complete (11.8s)
Results saved to /content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights
Predict: yolo predict task=segment model=/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx imgsz=640
Validate: yolo val task=segment model=/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx imgsz=640 data=/content/dental_disease__instance_segmented-9/data.yaml
Visualize: https://netron.app/
/content/drive/MyDrive/ML/DENTAL_THESIS/fine_tuned/segment/train/weights/best.onnx
```

**YOLO11s-seg with nms exported ONNX**

```python
import cv2
import numpy as np
import onnxruntime
import math
import time
import supervision as sv
def sigmoid(x):
return 1 / (1 + np.exp(-x))
class YOLOv11nms:
def __init__(self, path, conf_thres=0.4, num_masks=32):
"""
Args:
path (str): Path to the exported ONNX model.
conf_thres (float): Confidence threshold for filtering detections.
num_masks (int): Number of mask coefficients (should match export, e.g., 32).
"""
self.conf_threshold = conf_thres
self.num_masks = num_masks
self.initialize_model(path)
def initialize_model(self, path):
# Create ONNX Runtime session with GPU (if available) or CPU.
self.session = onnxruntime.InferenceSession(
path, providers=['CUDAExecutionProvider', 'CPUExecutionProvider']
)
self.get_input_details()
self.get_output_details()
def get_input_details(self):
model_inputs = self.session.get_inputs()
self.input_names = [inp.name for inp in model_inputs]
self.input_shape = model_inputs[0].shape # Expected shape: (1, 3, 640, 640)
self.input_height = self.input_shape[2]
self.input_width = self.input_shape[3]
def get_output_details(self):
model_outputs = self.session.get_outputs()
self.output_names = [out.name for out in model_outputs]
def prepare_input(self, image):
# Record the original image dimensions.
self.img_height, self.img_width = image.shape[:2]
# Convert BGR (OpenCV format) to RGB.
img = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
# Resize to the model’s input size (e.g., 640x640).
img = cv2.resize(img, (self.input_width, self.input_height))
# Normalize pixel values to [0, 1].
img = img.astype(np.float32) / 255.0
# Convert from HWC to CHW format.
img = img.transpose(2, 0, 1)
# Add batch dimension: shape becomes (1, 3, 640, 640).
input_tensor = np.expand_dims(img, axis=0)
return input_tensor
def inference(self, input_tensor):
outputs = self.session.run(self.output_names, {self.input_names[0]: input_tensor})
return outputs
def segment_objects(self, image):
"""
Processes an image and returns:
- boxes: Bounding boxes (rescaled to original image coordinates).
- scores: Confidence scores.
- class_ids: Detected class indices.
- masks: Binary segmentation masks (aligned with the original image).
"""
# Preprocess the image.
input_tensor = self.prepare_input(image)
outputs = self.inference(input_tensor)
# Process detection output.
# Detection output shape is (1, 300, 38) (post-NMS & transposed).
detections = np.squeeze(outputs[0], axis=0) # Now shape: (300, 38)
# Filter out detections below the confidence threshold.
valid_mask = detections[:, 4] > self.conf_threshold
detections = detections[valid_mask]
if detections.shape[0] == 0:
return np.array([]), np.array([]), np.array([]), np.array([])
# Extract detection results.
# boxes_model: boxes in model input coordinates (e.g., in a 640x640 space)
boxes_model = detections[:, :4] # Format: (x1, y1, x2, y2)
scores = detections[:, 4]
class_ids = detections[:, 5].astype(np.int64)
mask_coeffs = detections[:, 6:] # 32 mask coefficients
# Rescale boxes for final drawing on the original image.
boxes_draw = self.rescale_boxes(
boxes_model,
(self.input_height, self.input_width),
(self.img_height, self.img_width)
)
# Process the mask output using the boxes in model coordinates.
masks = self.process_mask_output(mask_coeffs, outputs[1], boxes_model)
return boxes_draw, scores, class_ids, masks
def process_mask_output(self, mask_coeffs, mask_feature_map, boxes_model):
"""
Generates segmentation masks for each detection.
Args:
mask_coeffs (np.ndarray): (N, 32) mask coefficients for N detections.
mask_feature_map (np.ndarray): Output mask feature map with shape (1, 32, 160, 160).
boxes_model (np.ndarray): Bounding boxes in model input coordinates.
Returns:
mask_maps (np.ndarray): Binary masks for each detection, with shape
(N, original_img_height, original_img_width).
"""
# Squeeze the mask feature map: (1, 32, 160, 160) -> (32, 160, 160)
mask_feature_map = np.squeeze(mask_feature_map, axis=0)
# Reshape to (32, 25600) where 25600 = 160 x 160.
mask_feature_map_reshaped = mask_feature_map.reshape(self.num_masks, -1)
# Combine mask coefficients with the mask feature map.
# Resulting shape: (N, 25600) → then reshape to (N, 160, 160)
masks = sigmoid(np.dot(mask_coeffs, mask_feature_map_reshaped))
masks = masks.reshape(-1, mask_feature_map.shape[1], mask_feature_map.shape[2])
# Get mask feature map dimensions.
mask_h, mask_w = mask_feature_map.shape[1], mask_feature_map.shape[2]
# Rescale boxes from model coordinates (e.g., 640x640) to mask feature map coordinates (e.g., 160x160).
scale_boxes = self.rescale_boxes(
boxes_model,
(self.input_height, self.input_width),
(mask_h, mask_w)
)
# Also, compute boxes in original image coordinates for placing the mask.
boxes_draw = self.rescale_boxes(
boxes_model,
(self.input_height, self.input_width),
(self.img_height, self.img_width)
)
# Create an empty array for final masks with the same size as the original image.
mask_maps = np.zeros((boxes_model.shape[0], self.img_height, self.img_width), dtype=np.uint8)
# Determine blur size based on the ratio between the original image and the mask feature map.
blur_size = (
max(1, int(self.img_width / mask_w)),
max(1, int(self.img_height / mask_h))
)
for i in range(boxes_model.shape[0]):
# Get the detection box in mask feature map coordinates.
sx1, sy1, sx2, sy2 = scale_boxes[i]
sx1, sy1, sx2, sy2 = int(np.floor(sx1)), int(np.floor(sy1)), int(np.ceil(sx2)), int(np.ceil(sy2))
# Get the corresponding box in the original image.
ox1, oy1, ox2, oy2 = boxes_draw[i]
ox1, oy1, ox2, oy2 = int(np.floor(ox1)), int(np.floor(oy1)), int(np.ceil(ox2)), int(np.ceil(oy2))
# Crop the predicted mask region from the raw mask.
cropped_mask = masks[i][sy1:sy2, sx1:sx2]
if cropped_mask.size == 0 or (ox2 - ox1) <= 0 or (oy2 - oy1) <= 0:
continue
# Resize the cropped mask to the size of the detection box in the original image.
resized_mask = cv2.resize(cropped_mask, (ox2 - ox1, oy2 - oy1), interpolation=cv2.INTER_CUBIC)
# Apply a slight blur to smooth the mask edges.
resized_mask = cv2.blur(resized_mask, blur_size)
# Threshold the mask to obtain a binary mask.
bin_mask = (resized_mask > 0.5).astype(np.uint8)
# Place the binary mask into the correct location on the full mask.
mask_maps[i, oy1:oy2, ox1:ox2] = bin_mask
return mask_maps
@staticmethod
def rescale_boxes(boxes, input_shape, target_shape):
"""
Rescales boxes from one coordinate space to another.
Args:
boxes (np.ndarray): Array of boxes (N, 4) with format [x1, y1, x2, y2].
input_shape (tuple): (height, width) of the current coordinate space.
target_shape (tuple): (height, width) of the target coordinate space.
Returns:
np.ndarray: Scaled boxes of shape (N, 4).
"""
in_h, in_w = input_shape
tgt_h, tgt_w = target_shape
scale = np.array([tgt_w / in_w, tgt_h / in_h, tgt_w / in_w, tgt_h / in_h])
return boxes * scale
def __call__(self, image):
# This allows you to call the instance directly, e.g.:
# boxes, scores, class_ids, masks = detector(image)
        return self.segment_objects(image)
```

**Usage**

```python
# Load the model and create InferenceSession
best_weights_path = f"{saved_model_results_path}/train/weights/best.onnx"
detector = YOLOv11nms(best_weights_path, conf_thres=0.4)
img = cv2.imread("/content/download (1).jpeg")
# Detect Objects (now returns bounding boxes, scores, class_ids, and segmentation masks)
boxes, scores, class_ids, masks = detector(img)
boolean_mask = masks.astype(bool)
box_annotator = sv.BoxAnnotator()
label_annotator = sv.LabelAnnotator()
mask_annotator = sv.MaskAnnotator()
detections = sv.Detections(xyxy=boxes, confidence=scores, class_id=class_ids,mask=boolean_mask)
detections = detections.with_nms(threshold=0.5)
annotate = box_annotator.annotate(scene=img.copy(), detections=detections)
annotate = label_annotator.annotate(scene=annotate, detections=detections)
annotate = mask_annotator.annotate(scene=annotate, detections=detections)
sv.plot_image(annotate)
```

OUTPUT
-
@onuralpszr I am actually doing this to use it on the web, specifically with JS and later React Native. Will this lib, https://www.npmjs.com/package/supervision, be released any time soon?
-
@onuralpszr I am making the app with Next.js, directly using JS/TypeScript. Almost there; I need some help fixing the masks, as some masks are correct and some aren't. It would be very kind if I could get some assistance.
-
dear @onuralpszr, I saw a similar case in #1626 and tried some customization for my own segmentation use case, but it doesn't seem to be working properly.
here is how I am exporting my model with ultralytics
which outputs this in the console
I have 4 classes in my model
as I applied nms, my output0 is already transposed, I think,
where the first 4 indices are the bbox, 5 is the prob, 6 is the class id, and 7 onward (the remaining 32) are the mask coefficients; the 300 means the model will detect up to 300 results. Please educate me if my interpretation is wrong.
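(i.e., per detection row I am assuming something roughly like this, a sketch with 0-based indexing:)

```python
import numpy as np

def split_detection_row(det: np.ndarray):
    """Split one row of the post-NMS (300, 38) output, assuming the layout described above."""
    x1, y1, x2, y2 = det[:4]   # box corners in 640x640 input space
    conf = det[4]              # confidence score
    cls_id = int(det[5])       # class id
    mask_coeffs = det[6:]      # 32 mask prototype coefficients
    return (x1, y1, x2, y2), conf, cls_id, mask_coeffs
```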
here is my implementation