Skip to content

Add action mode (audio-transient detection), in-frame verification, and time-warp helper#2

Open
Erik5000 wants to merge 1 commit into
louisedesadeleer:mainfrom
Erik5000:add-action-mode-and-time-warp
Open

Add action mode (audio-transient detection), in-frame verification, and time-warp helper#2
Erik5000 wants to merge 1 commit into
louisedesadeleer:mainfrom
Erik5000:add-action-mode-and-time-warp

Conversation

@Erik5000

@Erik5000 Erik5000 commented May 7, 2026

Copy link
Copy Markdown

Summary

Three additions, each motivated by a real session where the existing dialogue-only flow had gaps. The original skill is great for podcasts/interviews; these changes extend it without changing the dialogue path.

1. Action mode — Step 1B + scripts/detect_transients.py

For footage driven by repeated impacts (woodchopping, sports, drumming, percussion, throws/hits) where Whisper isn't useful. The new script detects strikes via spectral flux on a tunable freq band:

  • defaults (--band 1000:6000 --min-flux 30 --min-gap 0.6) work out-of-the-box for axe-on-wood
  • band can be retuned for heavy thuds (200:2000 — kick drum, body shots) or high cracks (4000:10000 — snare, whistles, clinks)
  • ambient/wind sits at flux 5–15; real impacts at 50–300, so the threshold is robust

In testing on ~30 minutes of woodchopping footage, the detector found 170 in-frame strikes across 5 phone clips with one CLI invocation per clip.

2. In-frame verification — Step 1.5

Applies to both modes. Audio detection finds the sound of an event; the camera might not have captured it. The new step samples one frame at each candidate via ffmpeg -ss T -frames:v 1 and the agent reads each verify image before rendering. Drops:

  • candidates where subject is bent/off-frame/behind obstacle
  • thumb-on-lens shots (yes, this happens — only catchable visually)
  • false-positive sounds (door slam, dropped tool)

This caught a clip where the chopper was bent low over the log on every chop in that clip — would otherwise have produced an awkward "we cut on a sound I can't see" feeling in the final edit.

3. Time-warp helper — Step 4c

For when the user wants a slow scenic/pan/reveal shot tightened ("cut this in half") without losing the arc. Trimming forces dropping part of the arc; speed-ramping with setpts/atempo preserves all beats at higher tempo:

ffmpeg -ss \"\$START\" -i \"\$CLIP\" -t \"\$ORIG_DUR\" \\
  -vf \"setpts=PTS/2,scale=1080:1920:flags=lanczos,fps=30\" \\
  -af \"atempo=2.0\" \\
  ...

Includes rules of thumb (1.5x for brisk human movement, 2x for punchier reveals, chained atempo=2.0,atempo=2.0 for 4x since a single atempo filter only accepts 0.5–2.0).

Other changes

  • Frontmatter description extended to include action-mode triggers ("action montage," "cut on each chop/hit/punch")
  • New ## Modes section at the top routes the agent to dialogue / action / hybrid before Step 1
  • Step 5 (subtitles) now starts with "Skip in action mode"
  • Pitfalls updated with band-tuning guidance and the atempo chaining gotcha

Test plan

  • detect_transients.py --help works; runs cleanly on 16kHz mono WAV
  • Detector smoke-tested on real footage (5 phone clips, woodchopping); top-N flag returns the same hits a human would pick
  • Time-warp ffmpeg pattern verified on a 12s pan → 6s output with synced audio
  • Reviewer to confirm the new sections fit the existing prose style

Three additions, each motivated by a real session where the original
dialogue-only flow left gaps:

- Action mode (Step 1B + scripts/detect_transients.py): for footage
  driven by repeated impacts (chops, hits, throws, drumbeats) where
  Whisper isn't useful. Detects strikes via spectral flux on a tunable
  freq band — defaults work for axe-on-wood; flags cover heavy thuds
  and high cracks.

- In-frame verification (Step 1.5): sample one frame at each candidate
  before rendering, drop the ones where the subject is off-frame or a
  thumb's on the lens. Applies to both modes. Cheap insurance against
  the user's first feedback being "you cut on hits where I'm not even
  visible."

- Time-warp helper (Step 4c): setpts/atempo pattern for tightening
  scenic/pan/reveal shots without dropping arc beats — useful when the
  user says "cut this in half" and trimming would lose the buildup.

Pitfalls section updated with band-tuning guidance and the atempo
chaining gotcha (single filter accepts only 0.5-2.0).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant