Conversation
Force-pushed from cfc8ee3 to a060147.
meesfrensel left a comment:
Nice! A few thoughts that are mostly just that. I'm sure a more thorough code review will follow, so I only added a few comments there.
- At what point does it make sense to separate the media service & repository into separate image and AV ones? Long files aren't the end of the world, but they're kind of a mix anyway.
- Should the AV metadata tables point to `asset_file`? It does imply that the original file should also have an entry there. I'm thinking of a few purposes:
  - Keep pre-transcoded video (like we have now), which could easily be remuxed/streamed instead of transcoding an 8K video in real time.
  - Use these metadata tables for the videos of live photos.
  - This metadata really is specific to the 'file', not to the 'asset' the way exif information (location, camera info, etc.) is, so conceptually, pointing to `asset_file` feels better to me.
- I'm not sure what the whole `postDiscard` stuff means. If that's my stupidity, feel free to ignore, but it might be nice to be a bit more verbose there or add some short comments for future readers.
- Do all the codec/color transfer/etc. enums have a purpose at this point? Obviously some are used, but most aren't afaict. Might make sense to only add them when you need them.
Also, I like your implementation of parsing the packet data. I was messing around with pts_time, which I had to re-round/align to the time base; super clunky. This is much better.
Force-pushed from f2443bd to 215528b.
It's an interesting question. It would probably make the job pipeline a bit more complicated since assets would need different services based on their asset type. It could also make drift more likely, e.g. the behavior for edited assets or how the DB gets updated could diverge more easily between images and videos.
I think offline transcodes are a bit up in the air in general right now. One idea I was floating was to make pre-transcoded videos segmented to store them in a (soon to come) segment table. Remuxing on the fly could be more flexible though and would benefit from tracking the transcode's metadata aside from the original's. It's simple to change it in a migration down the line if desired, so I don't think it should block the PR either way though.
Not at all! It's something I learned just recently: some packets are marked for discard, so they don't show up in the video but still end up influencing the timings for the packets that do get outputted. One video in particular had a bunch of D packets with negative PTS.
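For future readers, the handling looks roughly like this (a simplified sketch, not the PR's actual code; `probeKeyframes` and the exact ffprobe invocation are illustrative):

```typescript
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);

// Sketch only: collect keyframe timestamps, skipping discard-flagged packets.
async function probeKeyframes(path: string): Promise<number[]> {
  const { stdout } = await execFileAsync('ffprobe', [
    '-v', 'error',
    '-select_streams', 'v:0',
    '-show_entries', 'packet=pts,flags:stream=time_base',
    '-of', 'json',
    path,
  ]);
  const { packets, streams } = JSON.parse(stdout);
  // Integer pts scaled by the stream's time_base is exact; pts_time is a
  // pre-formatted float that would need re-rounding/alignment.
  const [num, den] = streams[0].time_base.split('/').map(Number);
  return packets
    // 'D'-flagged packets are dropped by the decoder but still shift the
    // timings of the packets that do get output, so exclude them here.
    .filter((p: { pts?: number; flags: string }) => p.pts !== undefined && !p.flags.includes('D'))
    .filter((p: { flags: string }) => p.flags.includes('K'))
    .map((p: { pts: number }) => (p.pts * num) / den);
}
```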
The color transfer and pixel format are definitely needed. The primaries and matrix aren't immediately needed, but they're very useful info that costs nothing to store. I can imagine them being useful for HDR transcoding, tone-mapping and thumbnail generation down the line.
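To illustrate the kind of data involved (a trimmed-down sketch; the actual enums in the PR cover ffprobe's full vocabulary and may be named differently):

```typescript
// Values mirror FFmpeg's identifiers so ffprobe output maps onto them directly.
export enum ColorTransfer {
  Bt709 = 'bt709',             // typical SDR
  Smpte2084 = 'smpte2084',     // PQ (HDR10)
  AribStdB67 = 'arib-std-b67', // HLG
}

export enum ColorPrimaries {
  Bt709 = 'bt709',
  Bt2020 = 'bt2020',
  Smpte432 = 'smpte432',       // Display P3
}

// With the transfer stored, deciding whether tone-mapping is needed becomes a lookup:
export const needsToneMapping = (t: ColorTransfer): boolean =>
  t === ColorTransfer.Smpte2084 || t === ColorTransfer.AribStdB67;
```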
```diff
-  private async getExifTags(asset: { originalPath: string; files: AssetFile[]; type: AssetType }): Promise<ImmichTags> {
+  private async getExifTags(asset: { originalPath: string; files: AssetFile[]; type: AssetType }) {
     const { sidecarFile } = getAssetFiles(asset.files);
     const shouldProbe = asset.type === AssetType.Video || asset.originalPath.toLowerCase().endsWith('.gif');
```
`|| mimeTypes.isPossiblyAnimatedImage(asset.originalPath)` like below on L612?
Suggested change:

```diff
-    const shouldProbe = asset.type === AssetType.Video || asset.originalPath.toLowerCase().endsWith('.gif');
+    const shouldProbe = asset.type === AssetType.Video || mimeTypes.isPossiblyAnimatedImage(asset.originalPath);
```
The behavior on main is that it uses the video thumbnail generation path for videos and `.gif`, so those need to be probed. I'm not sure if FFprobe would handle all the formats in `possiblyAnimatedImageExtensions`. I'm actually not sure if even `.gif` needs FFprobe or FFmpeg, since sharp should presumably be able to handle it.
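If sharp does suffice, the static-thumbnail path for `.gif` could be as simple as this (untested sketch; the function name, paths, and sizes are made up):

```typescript
import sharp from 'sharp';

// sharp decodes a GIF via libvips and uses only the first frame by default,
// which is all a static thumbnail needs; no FFprobe/FFmpeg involved.
export async function gifThumbnail(input: string, output: string): Promise<void> {
  await sharp(input).resize({ width: 250, withoutEnlargement: true }).webp().toFile(output);
}
```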
Understandable, and that's your (the team's) call. For me, files 1k lines long that touch many different aspects are already past the limit of what fits in my brain simultaneously :) That's the sole reason I brought it up.
That was my idea too: serve the pre-transcoded videos through the same HLS mechanism, selected by default. When the user or HLS client selects a different rendition, only then start realtime transcoding. That keeps the best of both worlds.
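Concretely, the master playlist could look something like this (hand-written illustration; bandwidths, codec strings, and paths are invented, and most HLS clients start with the first variant listed):

```typescript
// Hand-written illustration; attribute values and URLs are made up.
const masterPlaylist = [
  '#EXTM3U',
  // The pre-transcoded rendition is listed first, so clients pick it by default.
  '#EXT-X-STREAM-INF:BANDWIDTH=4000000,RESOLUTION=1920x1080,CODECS="avc1.640028,mp4a.40.2"',
  'pretranscoded/index.m3u8',
  // Requesting any other rendition is what kicks off realtime transcoding.
  '#EXT-X-STREAM-INF:BANDWIDTH=1200000,RESOLUTION=1280x720,CODECS="avc1.64001f,mp4a.40.2"',
  'realtime/720p/index.m3u8',
].join('\n');
```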
Force-pushed from 87e0e9c to 23e7c94.
Description
This PR adds a variety of audio/video metadata to the DB. The data is shaped like the output of the existing `MediaRepository.probe` call, meaning that the existing code for thumbnail generation and video transcoding is compatible with the DB output. These are the first consumers of this data, allowing probing to be done only once, in metadata extraction.

The broader goal is to enable accurate and efficient HLS playlist generation, and to minimize live transcoding latency by doing as much ahead of time as possible. The PR tracks keyframe and colorimetric data that will be needed in future PRs. It stores a bit more than the bare minimum in the interest of having the data when we need it (for example, the DoVi metadata will be useful for advertising the original as DoVi rather than generic HEVC).
The initial idea was to add this metadata to the `asset_exif` table, but the number of columns needed grew, and the typing was bad since this table is shared with images as well. I split it into three tables to keep things tidy with stricter typing:

- `asset_audio`
- `asset_video`
- `asset_keyframe`

This also enables a future one-to-many relationship here if we want to track multiple audio or video streams. A rough sketch of the row shapes is below.
The PR is functional, but needs more testing and ideally a video test asset or two for e2e.