Added requirements and tracking script for SAM-2 #1358
Open
mwuel wants to merge 2 commits into
Revert the logic that keeps the same size for circles. It should be possible to change the circle size during tracking (e.g. when the camera moves closer to the object). Also consider a new tracking behavior: users draw a point, circle, box or polygon, but the result of the tracking is always a polygon.
Tracking is implemented with the SAM-2 model by Facebook, using a custom frame-by-frame video predictor (https://github.com/Gy920/segment-anything-2-real-time). The original predictor (https://github.com/facebookresearch/sam2) was also considered but not used, because it requires loading the whole video, or at least chunks of it, into memory, which makes it less suitable for larger video files.
The script works by extracting a bounding box from the mask predicted by SAM-2. Tracking via a bounding box is therefore straightforward to implement in the future. For point and circle tracking, this also means that the output is only derived from the mask and might show unwanted behavior. Point tracking in particular may look odd, as the output point is just the average over the whole mask. The point might end up outside of the original object if the shape is not simple enough (e.g. the wide wingspan of a bird might enlarge the bounding box, shifting the center point outside of the object).
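To illustrate the point-outside-the-object effect, here is a minimal sketch, assuming the predicted mask is available as a 2D boolean NumPy array. The helper name `mask_to_box_and_point` is hypothetical and not taken from the script in this PR.

```python
import numpy as np


def mask_to_box_and_point(mask):
    """Return ((x_min, y_min, x_max, y_max), (cx, cy)) for a binary mask.

    Returns None if the mask contains no foreground pixels.
    """
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    box = (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))
    # The point is the average over all mask pixels
    point = (float(xs.mean()), float(ys.mean()))
    return box, point


# U-shaped object: two vertical bars joined by a bottom bar
mask = np.zeros((5, 5), dtype=bool)
mask[:, 0] = True
mask[:, 4] = True
mask[4, :] = True

box, (cx, cy) = mask_to_box_and_point(mask)
# Check whether the averaged point lies on the object itself
inside = bool(mask[int(round(cy)), int(round(cx))])
```

For this U-shaped mask the averaged point falls into the empty gap between the two bars, so `inside` is `False` even though the bounding box covers the object correctly.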
The bounding box always keeps the dimensions of the initial input, so it can be easily adjusted. Using polygons as input for tracking has not been tried but might give even more accurate results. Switching between the old and the new script only requires a change in the videos.php config file.
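Keeping the initial dimensions could look roughly like the following sketch. It assumes boxes are `(x_min, y_min, x_max, y_max)` tuples; the function name `recenter_box` is hypothetical and not part of the script.

```python
def recenter_box(initial_box, tracked_box):
    """Keep the size of initial_box but move it to the center of tracked_box.

    Both boxes are (x_min, y_min, x_max, y_max) tuples.
    """
    width = initial_box[2] - initial_box[0]
    height = initial_box[3] - initial_box[1]
    # Center of the box extracted from the predicted mask
    cx = (tracked_box[0] + tracked_box[2]) / 2
    cy = (tracked_box[1] + tracked_box[3]) / 2
    return (cx - width / 2, cy - height / 2, cx + width / 2, cy + height / 2)


# A 10x20 input box follows an object whose mask box is now 40x10
moved = recenter_box((0, 0, 10, 20), (100, 100, 140, 110))
```

The same idea would apply to circles with a fixed radius. Allowing the size to follow the mask instead would mean returning `tracked_box` directly.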
Currently, the script tracks only every second frame. Tracking every frame significantly reduces computational performance, and skipping more than one frame gave worse tracking results because of missing context. The number of frames to skip can be easily adjusted in the script.
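The frame skipping can be sketched as follows. This is an illustration only: `track` and its `step` parameter are hypothetical names, and `predict` stands in for whatever call runs the SAM-2 predictor on a single frame.

```python
def track(frames, predict, step=2):
    """Run predict on every step-th frame and return {frame_index: result}.

    frames can be any iterable, so frames may be read one at a time
    instead of loading the whole video into memory.
    """
    results = {}
    for index, frame in enumerate(frames):
        if index % step != 0:
            continue  # skipped frame, no prediction
        results[index] = predict(frame)
    return results


# Stand-in frames and predictor to show which frames are processed
results = track(["a", "b", "c", "d", "e"], str.upper, step=2)
```

With `step=2` only frames 0, 2 and 4 are passed to the predictor; `step=1` would track every frame.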