# YOLO-Patch-Based-Inference

This Python library simplifies SAHI-like inference for instance segmentation tasks, enabling the detection of small objects in images. It caters to both object detection and instance segmentation, supporting a wide range of Ultralytics models.

The library also provides sleek customization of the visualization of inference results for all models, both in the standard approach (a direct network run) and the unique patch-based variant.

**Model Support**: The library supports multiple Ultralytics deep learning models, such as YOLOv8, YOLOv8-seg, YOLOv9, YOLOv9-seg, FastSAM, and RTDETR. Users can select from pre-trained options or use custom-trained models to best meet their task requirements.

#### Explanation of possible input arguments:

**MakeCropsDetectThem**\
Class implementing cropping and passing crops through a neural network for detection/segmentation:

| **Argument**        | **Type**          | **Default**  | **Description**                                                                                    |
|---------------------|-------------------|--------------|----------------------------------------------------------------------------------------------------|
| image               | np.ndarray        |              | Input image in BGR format.                                                                         |
| model_path          | str               | "yolov8m.pt" | Path to the YOLO model.                                                                            |
| model               | ultralytics model | None         | Pre-initialized model object. If provided, it is used directly instead of loading from model_path. |
| imgsz               | int               | 640          | Input image size for YOLO inference.                                                               |
| conf                | float             | 0.5          | Confidence threshold for YOLO detections.                                                          |
| iou                 | float             | 0.7          | IoU threshold for YOLO non-maximum suppression within a single crop.                               |
| classes_list        | List[int] or None | None         | List of classes to filter detections. If None, all classes are considered.                         |
| segment             | bool              | False        | Whether to perform segmentation (YOLOv8-seg).                                                      |
| shape_x             | int               | 700          | Crop size along the x-axis.                                                                        |
| shape_y             | int               | 600          | Crop size along the y-axis.                                                                        |
| overlap_x           | float             | 25           | Percentage of overlap along the x-axis.                                                            |
| overlap_y           | float             | 25           | Percentage of overlap along the y-axis.                                                            |
| show_crops          | bool              | False        | Whether to visualize the cropping.                                                                 |
| resize_initial_size | bool              | False        | Whether to resize the results to the original input image size (a slow operation).                 |
| memory_optimize     | bool              | True         | Memory optimization option for segmentation (less accurate results when enabled).                  |

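To build intuition for the crop-geometry parameters above, here is a minimal sketch of how `shape_x`/`shape_y` and the `overlap_x`/`overlap_y` percentages translate into overlapping crop windows. `crop_windows` is a hypothetical helper for illustration, not the library's internal implementation:

```python
# Illustrative sketch (NOT the library's internal code): derive crop windows
# from crop shape and overlap percentages.

def crop_windows(img_w, img_h, shape_x=700, shape_y=600, overlap_x=25, overlap_y=25):
    """Return (x_min, y_min, x_max, y_max) windows covering the image."""
    # A 25% overlap means each step advances by 75% of the crop size.
    step_x = max(1, int(shape_x * (1 - overlap_x / 100)))
    step_y = max(1, int(shape_y * (1 - overlap_y / 100)))
    windows = []
    for y in range(0, img_h, step_y):
        for x in range(0, img_w, step_x):
            # Clip each window to the image borders.
            windows.append((x, y, min(x + shape_x, img_w), min(y + shape_y, img_h)))
            if x + shape_x >= img_w:
                break
        if y + shape_y >= img_h:
            break
    return windows

# A 1600x900 image with default settings yields a 3x2 grid of crops:
print(crop_windows(1600, 900))
```

The overlap matters because an object lying on a crop boundary would otherwise be cut in half in every crop; with overlapping windows it appears whole in at least one of them.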
**CombineDetections**\
Class that combines masks/boxes from multiple crops and applies NMS (Non-Maximum Suppression):

| **Argument**       | **Type**            | **Default** | **Description**                                                                                             |
|--------------------|---------------------|-------------|-------------------------------------------------------------------------------------------------------------|
| element_crops      | MakeCropsDetectThem |             | Object containing crop information.                                                                         |
| nms_threshold      | float               | 0.3         | IoU/IoS threshold for non-maximum suppression.                                                              |
| match_metric       | str                 | 'IOS'       | Matching metric, either 'IOU' or 'IOS'.                                                                     |
| intelligent_sorter | bool                | True        | Enable sorting by area and rounded confidence. If False, sorting is done by confidence only (standard NMS). |

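The two matching metrics can be sketched in a few lines. `box_match` is a hypothetical helper, and treating 'IOS' as intersection over the smaller box's area is an assumption based on the common SAHI-style convention, not a quote of this library's code:

```python
# Sketch of the IoU/IoS matching metrics (assumption: IOS = intersection
# over the SMALLER box's area; IOU = intersection over the union).

def box_match(a, b, metric="IOS"):
    """a, b: (x_min, y_min, x_max, y_max). Returns overlap ratio in [0, 1]."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))  # intersection width
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))  # intersection height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    if metric == "IOU":
        denom = area_a + area_b - inter   # union area
    else:  # "IOS"
        denom = min(area_a, area_b)       # smaller box's area
    return inter / denom if denom else 0.0

# Two half-overlapping boxes, as produced by neighbouring crops:
print(box_match((0, 0, 100, 100), (50, 0, 150, 100), "IOU"))  # 1/3
print(box_match((0, 0, 100, 100), (50, 0, 150, 100), "IOS"))  # 0.5
```

IoS is a natural choice for patch-based inference: a small box fully contained in a larger one scores 1.0 under IoS even when its IoU is low, so duplicate fragments of one object coming from different crops are suppressed more reliably.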
---
### 2. Custom inference visualization:
Visualizes the results of patch-based object detection or segmentation on an image.\
Possible arguments of the ```visualize_results``` function:

| Argument               | Type          | Default                  | Description                                                                              |
|------------------------|---------------|--------------------------|------------------------------------------------------------------------------------------|
| img                    | numpy.ndarray |                          | The input image in BGR format.                                                           |
| boxes                  | list          |                          | A list of bounding boxes in the format [x_min, y_min, x_max, y_max].                     |
| classes_ids            | list          |                          | A list of class IDs for each detection.                                                  |
| confidences            | list          | []                       | A list of confidence scores corresponding to each bounding box.                          |
| classes_names          | list          | []                       | A list of class names corresponding to the class IDs.                                    |
| polygons               | list          | []                       | A list of NumPy arrays of polygon coordinates that represent segmentation masks.         |
| masks                  | list          | []                       | A list of binary segmentation masks.                                                     |
| segment                | bool          | False                    | Whether to visualize instance segmentation.                                              |
| show_boxes             | bool          | True                     | Whether to show bounding boxes.                                                          |
| show_class             | bool          | True                     | Whether to show class labels.                                                            |
| fill_mask              | bool          | False                    | Whether to fill the segmented regions with color.                                        |
| alpha                  | float         | 0.3                      | The transparency of filled masks.                                                        |
| color_class_background | tuple         | (0, 0, 255)              | The background BGR color for class labels.                                               |
| color_class_text       | tuple         | (255, 255, 255)          | The text color for class labels.                                                         |
| thickness              | int           | 4                        | The thickness of bounding boxes and text.                                                |
| font                   | cv2 font      | cv2.FONT_HERSHEY_SIMPLEX | The font type for class labels.                                                          |
| font_scale             | float         | 1.5                      | The scale factor for font size.                                                          |
| delta_colors           | int           | 0                        | The random seed offset for color variation.                                              |
| dpi                    | int           | 150                      | Final visualization size (the plot is bigger when dpi is higher).                        |
| random_object_colors   | bool          | False                    | If True, colors for each object are selected randomly.                                   |
| show_confidences       | bool          | False                    | If True and show_class=True, confidences are shown near the class labels.                |
| axis_off               | bool          | True                     | If True, the axis is turned off in the final visualization.                              |
| show_classes_list      | list          | []                       | If empty, visualize all classes; otherwise, visualize only the classes in the list.      |
| return_image_array     | bool          | False                    | If True, the function returns the image (BGR np.array) instead of displaying it.         |
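As a rough illustration of what the `fill_mask`/`alpha` pair means, here is a minimal NumPy sketch of alpha-blending a solid color into masked pixels. `blend_mask` is a hypothetical helper for illustration only, not the library's drawing code:

```python
import numpy as np

# Illustration of fill_mask/alpha semantics (NOT library code): blend a
# solid BGR color into the image wherever a binary mask is set.

def blend_mask(img, mask, color=(0, 0, 255), alpha=0.3):
    """img: HxWx3 BGR uint8, mask: HxW bool. Returns a blended copy."""
    out = img.astype(np.float32).copy()
    # out = (1 - alpha) * image + alpha * color, only inside the mask.
    out[mask] = (1 - alpha) * out[mask] + alpha * np.array(color, np.float32)
    return out.astype(np.uint8)

img = np.full((4, 4, 3), 100, np.uint8)   # uniform gray image
mask = np.zeros((4, 4), bool)
mask[1:3, 1:3] = True                      # 2x2 masked region in the center
blended = blend_mask(img, mask)
print(blended[1, 1], blended[0, 0])        # masked vs. untouched pixel
```

With `alpha=0.3` the masked pixels keep 70% of the original intensity, which is why filled masks stay translucent rather than hiding the object underneath.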


Example of usage: