feat(server): track video metadata #28023

Open
mertalev wants to merge 16 commits into main from feat/server-video-metadata

@mertalev
Member

Description

This PR adds a variety of audio/video metadata to the DB. The data is shaped like the existing MediaRepository.probe call, meaning that the existing code for thumbnail generation and video transcoding is compatible with the DB output. These are the first consumers of this data, allowing probing to be done only once in metadata extraction.

The broader goal is to enable accurate and efficient HLS playlist generation, and to minimize live transcoding latency by doing as much ahead of time as possible. It tracks keyframe and colorimetric data that will be needed in future PRs. It stores a bit more than the bare minimum in the interest of having the data when we need it (for example, the DoVi metadata will be useful for advertising the original as DoVi rather than generic HEVC).

The initial idea was to add this metadata to the asset_exif table, but the number of columns needed grew and the typing was bad since this table is shared with images as well. I split it into three tables to keep things tidy with stricter typing:

  • asset_audio
  • asset_video
  • asset_keyframe

This also enables a future one-to-many relationship here if we want to track multiple audio or video streams.

The PR is functional, but needs more testing and ideally a video test asset or two for e2e.

Collaborator

@meesfrensel meesfrensel left a comment

Nice! A few thoughts that are mostly just that. I'm sure a more thorough code review will follow, so I only added a few comments there.

  • At what point does it make sense to separate the media service & repository into separate image and AV ones? Long files aren't the end of the world but they're kind of a mix anyway.
  • Should the AV metadata tables point to asset_file? It does imply that the original file should also have an entry there. I'm thinking of a few purposes:
    • Keep pre-transcoded video (like we have now) which could be easily remuxed/streamed instead of transcoding an 8K video in real-time
    • Use these metadata tables for the videos of live photos
    • This metadata really is specific to the 'file', not to the 'asset', as exif information like location, camera info, etc. is, so conceptually, pointing to asset_file feels better to me.
  • I'm not sure what the whole postDiscard stuff means. If that's my stupidity, feel free to ignore, but it might be nice to be a bit more verbose there or add some short comments for future readers.
  • Do all the codec/color transfer/etc enums have a purpose at this point? Obviously some are used, but most aren't afaict. Might make sense to only add them when you need them.

Also, I like your implementation of parsing the packet data. I was messing around with pts_time which I had to re-round/align to the time base, super clunky. This is much better.
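
For readers following the pts_time remark, here is a minimal sketch of why integer pts values combined with the stream's time_base tend to be easier to work with than pre-divided pts_time strings. The helper names (parseTimeBase, ptsToSeconds) and the TimeBase type are hypothetical and not taken from this PR; only the pts and time_base field names mirror FFprobe's output.

```typescript
// Hypothetical helpers for illustration; not the PR's implementation.
interface TimeBase {
  num: number;
  den: number;
}

// Parse a time base such as "1/90000" into its integer parts.
function parseTimeBase(value: string): TimeBase {
  const parts = value.split('/');
  const num = Number(parts[0]);
  const den = Number(parts[1]);
  if (!Number.isInteger(num) || !Number.isInteger(den) || den === 0) {
    throw new Error(`Invalid time base: ${value}`);
  }
  return { num, den };
}

// Convert an integer PTS to seconds only at the point of use,
// so the stored value stays an exact integer.
function ptsToSeconds(pts: number, timeBase: TimeBase): number {
  return (pts * timeBase.num) / timeBase.den;
}

const timeBase = parseTimeBase('1/90000');
const seconds = ptsToSeconds(180000, timeBase);
```

Keeping pts as an integer means no rounding or re-alignment step is needed before comparing or storing timestamps; the division happens once, when a value in seconds is actually required.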

Comment thread server/src/repositories/media.repository.ts Outdated
Comment thread server/src/repositories/media.repository.ts
Comment thread server/src/services/metadata.service.ts Outdated
Comment thread server/src/repositories/media.repository.ts
@mertalev mertalev force-pushed the feat/server-video-metadata branch from f2443bd to 215528b Compare April 22, 2026 20:09
@mertalev
Member Author

mertalev commented Apr 22, 2026

> At what point does it make sense to separate the media service & repository into separate image and AV ones? Long files aren't the end of the world but they're kind of a mix anyway.

It's an interesting question. It would probably make the job pipeline a bit more complicated since assets would need different services based on their asset type. It could also make drift more likely, e.g. the behavior for edited assets or how the DB gets updated could diverge more easily between images and videos.

> Should the AV metadata tables point to asset_file?

I think offline transcodes are a bit up in the air in general right now. One idea I was floating was to segment pre-transcoded videos and store them in a (soon to come) segment table. Remuxing on the fly could be more flexible, though, and would benefit from tracking the transcode's metadata separately from the original's. It's simple to change in a migration down the line if desired, so I don't think it should block the PR either way.

> I'm not sure what the whole postDiscard stuff means. If that's my stupidity, feel free to ignore, but it might be nice to be a bit more verbose there or add some short comments for future readers.

Not at all! It's something I learned just recently: some packets are marked for discard, so they don't show up in the video but still end up influencing the timings for the packets that do get outputted. One video in particular had a bunch of D packets with negative PTS.
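
To make the discard behavior concrete, here is a minimal sketch of one plausible way to account for it: drop packets flagged for discard, then express keyframe timestamps relative to the first packet that is actually output. This is an assumption about the kind of adjustment involved, not the PR's implementation. The Packet shape and the function name are hypothetical; the pts field and the K (keyframe) and D (discard) flag characters mirror FFprobe's per-packet output.

```typescript
// Hypothetical shape; `pts` and `flags` mirror FFprobe's per-packet fields.
interface Packet {
  pts: number; // integer, in time base units
  flags: string; // e.g. "K_" (keyframe), "_D" (discard), "__" (neither)
}

// Keyframe timestamps relative to the earliest packet that is not discarded.
function keyframePtsAfterDiscard(packets: Packet[]): number[] {
  const kept = packets.filter((p) => !p.flags.includes('D'));
  if (kept.length === 0) {
    return [];
  }
  const start = kept.reduce((min, p) => Math.min(min, p.pts), Infinity);
  return kept.filter((p) => p.flags.includes('K')).map((p) => p.pts - start);
}

// Two leading discard packets with negative PTS, as in the video described above.
const packets: Packet[] = [
  { pts: -2048, flags: '_D' },
  { pts: -1024, flags: '_D' },
  { pts: 0, flags: 'K_' },
  { pts: 1024, flags: '__' },
  { pts: 2048, flags: 'K_' },
];
const keyframes = keyframePtsAfterDiscard(packets);
```

If the discarded packets were included when picking the start, every keyframe offset in this example would shift by 2048 time base units, which is the kind of timing skew the comment describes.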

> Do all the codec/color transfer/etc enums have a purpose at this point? Obviously some are used, but most aren't afaict. Might make sense to only add them when you need them.

The color transfer and pixel format are definitely needed. The primaries and matrix aren't immediately needed, but they're very useful info that cost nothing to store. I can imagine them being useful for HDR transcoding, tone-mapping and thumbnail generation down the line.

- private async getExifTags(asset: { originalPath: string; files: AssetFile[]; type: AssetType }): Promise<ImmichTags> {
+ private async getExifTags(asset: { originalPath: string; files: AssetFile[]; type: AssetType }) {
    const { sidecarFile } = getAssetFiles(asset.files);
    const shouldProbe = asset.type === AssetType.Video || asset.originalPath.toLowerCase().endsWith('.gif');
Collaborator

|| mimeTypes.isPossiblyAnimatedImage(asset.originalPath) like below on L612?

Suggested change
- const shouldProbe = asset.type === AssetType.Video || asset.originalPath.toLowerCase().endsWith('.gif');
+ const shouldProbe = asset.type === AssetType.Video || mimeTypes.isPossiblyAnimatedImage(asset.originalPath);

Member Author

The behavior on main is that it uses the video thumbnail generation path for videos and .gif, so those need to be probed. I'm not sure if FFprobe would handle all the formats in possiblyAnimatedImageExtensions. I'm actually not sure if even .gif needs FFprobe or FFmpeg since sharp should presumably be able to handle it.
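
As a rough illustration of how the two checks discussed in this thread can diverge, here is a stand-alone sketch. The extension set is a placeholder: the real contents of possiblyAnimatedImageExtensions are not shown in this thread, and the helper names below are hypothetical rather than the project's mimeTypes API.

```typescript
// Placeholder list for illustration only; not the project's actual list.
const animatedExtensions = new Set(['.gif', '.webp', '.png']);

// Lowercased extension including the dot, or '' when there is none.
function extensionOf(path: string): string {
  const dot = path.lastIndexOf('.');
  return dot === -1 ? '' : path.slice(dot).toLowerCase();
}

// Mirrors the check in the PR: among images, only .gif is probed.
const probesGifOnly = (path: string): boolean => path.toLowerCase().endsWith('.gif');

// Mirrors the suggested change: any extension in the (placeholder) set is probed.
const probesAnimated = (path: string): boolean => animatedExtensions.has(extensionOf(path));

const samples = ['upload/a.GIF', 'upload/b.webp', 'upload/c.jpg'];
const comparison = samples.map((path) => ({
  path,
  gifOnly: probesGifOnly(path),
  animated: probesAnimated(path),
}));
```

Under these placeholder extensions, the broader check would send additional image formats through FFprobe, which is the compatibility concern raised in the reply above.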

@meesfrensel
Collaborator

> It would probably make the job pipeline a bit more complicated since assets would need different services based on their asset type. It could also make drift more likely, e.g. the behavior for edited assets or how the DB gets updated could diverge more easily between images and videos.

Understandable, and that's your (the team's) call. For me, files 1k lines long that touch many different aspects are already past the limit of what fits in my brain simultaneously :) That's the sole reason I brought it up.

> One idea I was floating was to make pre-transcoded videos segmented to store them in a (soon to come) segment table. Remuxing on the fly could be more flexible though and would benefit from tracking the transcode's metadata aside from the original's. It's simple to change it in a migration down the line if desired, so I don't think it should block the PR either way though.

That was my idea too, to serve the pre-transcoded videos through the same HLS mechanism and selected by default. When the user or HLS client selects a different rendition, only then start realtime transcoding. That keeps the best of both worlds.

@mertalev mertalev requested a review from bo0tzz as a code owner April 23, 2026 23:19
@mertalev mertalev force-pushed the feat/server-video-metadata branch from 87e0e9c to 23e7c94 Compare April 29, 2026 17:58