Add action mode (audio-transient detection), in-frame verification, and time-warp helper#2
Open
Erik5000 wants to merge 1 commit into
Open
Conversation
Three additions, each motivated by a real session where the original dialogue-only flow left gaps: - Action mode (Step 1B + scripts/detect_transients.py): for footage driven by repeated impacts (chops, hits, throws, drumbeats) where Whisper isn't useful. Detects strikes via spectral flux on a tunable freq band — defaults work for axe-on-wood; flags cover heavy thuds and high cracks. - In-frame verification (Step 1.5): sample one frame at each candidate before rendering, drop the ones where the subject is off-frame or a thumb's on the lens. Applies to both modes. Cheap insurance against the user's first feedback being "you cut on hits where I'm not even visible." - Time-warp helper (Step 4c): setpts/atempo pattern for tightening scenic/pan/reveal shots without dropping arc beats — useful when the user says "cut this in half" and trimming would lose the buildup. Pitfalls section updated with band-tuning guidance and the atempo chaining gotcha (single filter accepts only 0.5-2.0). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Three additions, each motivated by a real session where the existing dialogue-only flow had gaps. The original skill is great for podcasts/interviews; these changes extend it without changing the dialogue path.
1. Action mode —
Step 1B+scripts/detect_transients.pyFor footage driven by repeated impacts (woodchopping, sports, drumming, percussion, throws/hits) where Whisper isn't useful. The new script detects strikes via spectral flux on a tunable freq band:
--band 1000:6000 --min-flux 30 --min-gap 0.6) work out-of-the-box for axe-on-wood200:2000— kick drum, body shots) or high cracks (4000:10000— snare, whistles, clinks)In testing on ~30 minutes of woodchopping footage, the detector found 170 in-frame strikes across 5 phone clips with one CLI invocation per clip.
2. In-frame verification —
Step 1.5Applies to both modes. Audio detection finds the sound of an event; the camera might not have captured it. The new step samples one frame at each candidate via
ffmpeg -ss T -frames:v 1and the agent reads each verify image before rendering. Drops:This caught a clip where the chopper was bent low over the log on every chop in that clip — would otherwise have produced an awkward "we cut on a sound I can't see" feeling in the final edit.
3. Time-warp helper —
Step 4cFor when the user wants a slow scenic/pan/reveal shot tightened ("cut this in half") without losing the arc. Trimming forces dropping part of the arc; speed-ramping with
setpts/atempopreserves all beats at higher tempo:Includes rules of thumb (1.5x for brisk human movement, 2x for punchier reveals, chained
atempo=2.0,atempo=2.0for 4x since a singleatempofilter only accepts 0.5–2.0).Other changes
descriptionextended to include action-mode triggers ("action montage," "cut on each chop/hit/punch")## Modessection at the top routes the agent to dialogue / action / hybrid before Step 1Step 5(subtitles) now starts with "Skip in action mode"Pitfallsupdated with band-tuning guidance and theatempochaining gotchaTest plan
detect_transients.py --helpworks; runs cleanly on 16kHz mono WAV