Commit 8989846: "final docs with examples" (1 parent: 3b61cf8)
1 file changed: README.md (59 additions, 53 deletions)
# YOLO-Patch-Based-Inference

This Python library simplifies SAHI-like patch-based inference, enabling the detection of small objects in images. It caters to both object detection and instance segmentation tasks, supporting a wide range of Ultralytics models.

The library also provides sleek customization of the visualization of inference results for all models, both in the standard approach (a direct network run) and the unique patch-based variant.

**Model Support**: The library supports multiple Ultralytics deep learning models, such as YOLOv8, YOLOv8-seg, YOLOv9, YOLOv9-seg, FastSAM, and RTDETR. Users can select from pre-trained options or utilize custom-trained models to best meet their task requirements.

#### Explanation of possible input arguments:
**MakeCropsDetectThem**
Class implementing cropping and passing crops through a neural network for detection/segmentation:

| **Argument**        | **Type**          | **Default**  | **Description**                                                                                          |
|---------------------|-------------------|--------------|----------------------------------------------------------------------------------------------------------|
| image               | np.ndarray        |              | Input image in BGR format.                                                                               |
| model_path          | str               | "yolov8m.pt" | Path to the YOLO model.                                                                                  |
| model               | ultralytics model | None         | Pre-initialized model object. If provided, it is used directly instead of being loaded from model_path.  |
| imgsz               | int               | 640          | Input image size for YOLO inference.                                                                     |
| conf                | float             | 0.5          | Confidence threshold for YOLO detections.                                                                |
| iou                 | float             | 0.7          | IoU threshold for non-maximum suppression within a single crop (YOLOv8).                                 |
| classes_list        | List[int] or None | None         | List of classes to filter detections. If None, all classes are considered.                               |
| segment             | bool              | False        | Whether to perform segmentation (YOLOv8-seg).                                                            |
| shape_x             | int               | 700          | Size of the crop along the x-axis.                                                                       |
| shape_y             | int               | 600          | Size of the crop along the y-axis.                                                                       |
| overlap_x           | float             | 25           | Percentage of overlap along the x-axis.                                                                  |
| overlap_y           | float             | 25           | Percentage of overlap along the y-axis.                                                                  |
| show_crops          | bool              | False        | Whether to visualize the cropping.                                                                       |
| resize_initial_size | bool              | False        | Whether to resize the results back to the original image size (slow operation).                          |
| memory_optimize     | bool              | True         | Memory optimization option for segmentation (less accurate results when enabled).                        |
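To make the crop parameters concrete, here is an illustrative sketch (not the library's actual code) of how a crop grid could be derived from shape_x/shape_y and overlap_x/overlap_y, assuming the overlap is given as a percentage of the crop size:

```python
def crop_grid(img_w, img_h, shape_x=700, shape_y=600, overlap_x=25, overlap_y=25):
    """Return (x_min, y_min, x_max, y_max) crop windows covering the image.

    Illustrative sketch: each window advances by the crop size minus the
    requested overlap percentage of it; windows at the edges are clipped.
    """
    step_x = max(1, int(shape_x * (1 - overlap_x / 100)))
    step_y = max(1, int(shape_y * (1 - overlap_y / 100)))
    boxes, y = [], 0
    while True:
        x = 0
        while True:
            boxes.append((x, y, min(x + shape_x, img_w), min(y + shape_y, img_h)))
            if x + shape_x >= img_w:
                break
            x += step_x
        if y + shape_y >= img_h:
            break
        y += step_y
    return boxes
```

With the default values above, a 1400x1200 image is covered by a 3x3 grid of overlapping crops.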

**CombineDetections**
Class implementing the combination of masks/boxes from multiple crops, followed by NMS (Non-Maximum Suppression):

| **Argument**       | **Type**            | **Default** | **Description**                                                                                              |
|--------------------|---------------------|-------------|--------------------------------------------------------------------------------------------------------------|
| element_crops      | MakeCropsDetectThem |             | Object containing crop information.                                                                          |
| nms_threshold      | float               | 0.3         | IoU/IoS threshold for non-maximum suppression.                                                               |
| match_metric       | str                 | 'IOS'       | Matching metric, either 'IOU' or 'IOS'.                                                                      |
| intelligent_sorter | bool                | True        | Enable sorting by area and rounded confidence. If False, sorting is done by confidence only (standard NMS).  |
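The difference between the 'IOU' and 'IOS' match metrics shows up clearly in a minimal greedy NMS sketch (illustrative only, not the library's implementation): IoS divides the intersection by the area of the smaller box, so a small detection nested inside a larger one scores 1.0 and is suppressed, while its IoU with the larger box can stay low.

```python
def box_match(a, b, metric="IOS"):
    """Overlap between two boxes given as (x_min, y_min, x_max, y_max)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    if metric == "IOU":
        denom = area_a + area_b - inter   # intersection over union
    else:
        denom = min(area_a, area_b)       # "IOS": intersection over smaller area
    return inter / denom if denom else 0.0

def greedy_nms(boxes, confidences, nms_threshold=0.3, match_metric="IOS"):
    """Keep the highest-confidence box, drop boxes overlapping it too much."""
    order = sorted(range(len(boxes)), key=lambda i: confidences[i], reverse=True)
    keep = []
    for i in order:
        if all(box_match(boxes[i], boxes[j], match_metric) <= nms_threshold for j in keep):
            keep.append(i)
    return keep
```

For example, a 20x20 box fully inside a 100x100 box has IoS = 1.0 (suppressed at any reasonable threshold) but IoU = 0.04 (kept at the default threshold of 0.3).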

---
### 2. Custom inference visualization:
Visualizes results of patch-based object detection or segmentation on an image.\
Possible arguments of the ```visualize_results``` function:

| **Argument**           | **Type**      | **Default**              | **Description**                                                                  |
|------------------------|---------------|--------------------------|----------------------------------------------------------------------------------|
| img                    | numpy.ndarray |                          | The input image in BGR format.                                                   |
| boxes                  | list          |                          | A list of bounding boxes in the format [x_min, y_min, x_max, y_max].             |
| classes_ids            | list          |                          | A list of class IDs for each detection.                                          |
| confidences            | list          | []                       | A list of confidence scores corresponding to each bounding box.                  |
| classes_names          | list          | []                       | A list of class names corresponding to the class IDs.                            |
| polygons               | list          | []                       | A list of NumPy arrays of polygon coordinates representing segmentation masks.   |
| masks                  | list          | []                       | A list of binary segmentation masks.                                             |
| segment                | bool          | False                    | Whether to perform instance segmentation visualization.                          |
| show_boxes             | bool          | True                     | Whether to show bounding boxes.                                                  |
| show_class             | bool          | True                     | Whether to show class labels.                                                    |
| fill_mask              | bool          | False                    | Whether to fill the segmented regions with color.                                |
| alpha                  | float         | 0.3                      | The transparency of filled masks.                                                |
| color_class_background | tuple         | (0, 0, 255)              | The background BGR color for class labels.                                       |
| color_class_text       | tuple         | (255, 255, 255)          | The text color for class labels.                                                 |
| thickness              | int           | 4                        | The thickness of bounding boxes and text.                                        |
| font                   | cv2 font      | cv2.FONT_HERSHEY_SIMPLEX | The font type for class labels.                                                  |
| font_scale             | float         | 1.5                      | The scale factor for the font size.                                              |
| delta_colors           | int           | 0                        | The random seed offset for color variation.                                      |
| dpi                    | int           | 150                      | Final visualization size (the plot is bigger when dpi is higher).                |
| random_object_colors   | bool          | False                    | If True, colors for each object are selected randomly.                           |
| show_confidences       | bool          | False                    | If True and show_class=True, confidences are shown next to the class labels.     |
| axis_off               | bool          | True                     | If True, the axis is turned off in the final visualization.                      |
| show_classes_list      | list          | []                       | If empty, visualize all classes; otherwise, visualize only the classes listed.   |
| return_image_array     | bool          | False                    | If True, the function returns the image (a BGR np.array) instead of displaying it. |
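For intuition, the fill_mask/alpha pair corresponds to standard alpha blending inside the mask region: blended = (1 - alpha) * pixel + alpha * class_color. A minimal NumPy sketch of this idea (illustrative, not the library's code):

```python
import numpy as np

def fill_mask_region(img, mask, color=(0, 0, 255), alpha=0.3):
    """Blend a BGR `color` into `img` (uint8) wherever `mask` is nonzero."""
    out = img.astype(np.float32)
    m = mask.astype(bool)
    # Alpha blend only the masked pixels; the rest of the image is untouched.
    out[m] = (1 - alpha) * out[m] + alpha * np.asarray(color, dtype=np.float32)
    return out.astype(np.uint8)
```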
Usage example:
