Conversation
Force-pushed from cfc8ee3 to a060147.
meesfrensel left a comment:
Nice! A few thoughts that are mostly just that. I'm sure a more thorough code review will follow, so I only added a few comments there.
- At what point does it make sense to separate the media service & repository into separate image and AV ones? Long files aren't the end of the world, but they're kind of a mix anyway.
- Should the AV metadata tables point to `asset_file`? It does imply that the original file should also have an entry there. I'm thinking of a few purposes:
  - Keep pre-transcoded video (like we have now), which could easily be remuxed/streamed instead of transcoding an 8K video in real time.
  - Use these metadata tables for the videos of live photos.
  - This metadata really is specific to the 'file', not to the 'asset' the way exif information (location, camera info, etc.) is, so conceptually, pointing to `asset_file` feels better to me.
- I'm not sure what the whole `postDiscard` stuff means. If that's my stupidity, feel free to ignore, but it might be nice to be a bit more verbose there or add some short comments for future readers.
- Do all the codec/color transfer/etc. enums have a purpose at this point? Obviously some are used, but most aren't afaict. Might make sense to only add them when you need them.
Also, I like your implementation of parsing the packet data. I was messing around with pts_time, which I had to re-round/align to the time base; super clunky. This is much better.
Force-pushed from f2443bd to 215528b.
It's an interesting question. It would probably make the job pipeline a bit more complicated since assets would need different services based on their asset type. It could also make drift more likely, e.g. the behavior for edited assets or how the DB gets updated could diverge more easily between images and videos.
I think offline transcodes are a bit up in the air in general right now. One idea I was floating was to make pre-transcoded videos segmented to store them in a (soon to come) segment table. Remuxing on the fly could be more flexible though and would benefit from tracking the transcode's metadata aside from the original's. It's simple to change it in a migration down the line if desired, so I don't think it should block the PR either way though.
Not at all! It's something I learned just recently: some packets are marked for discard, so they don't show up in the video but still end up influencing the timings for the packets that do get outputted. One video in particular had a bunch of D packets with negative PTS.
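For future readers, the handling looks roughly like this (a simplified sketch, not the PR's actual code; `probeKeyframes` and the exact ffprobe invocation are illustrative):

```typescript
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';

const execFileAsync = promisify(execFile);

// Sketch only: collect keyframe timestamps, skipping discard-flagged packets.
async function probeKeyframes(path: string): Promise<number[]> {
  const { stdout } = await execFileAsync('ffprobe', [
    '-v', 'error',
    '-select_streams', 'v:0',
    '-show_entries', 'packet=pts,flags:stream=time_base',
    '-of', 'json',
    path,
  ]);
  const { packets, streams } = JSON.parse(stdout);
  // Integer pts scaled by the stream's time_base is exact; pts_time is a
  // pre-formatted float that would need re-rounding/alignment.
  const [num, den] = streams[0].time_base.split('/').map(Number);
  return packets
    // 'D'-flagged packets are dropped by the decoder but still shift the
    // timings of the packets that do get output, so exclude them here.
    .filter((p: { pts?: number; flags: string }) => p.pts !== undefined && !p.flags.includes('D'))
    .filter((p: { flags: string }) => p.flags.includes('K'))
    .map((p: { pts: number }) => (p.pts * num) / den);
}
```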
The color transfer and pixel format are definitely needed. The primaries and matrix aren't immediately needed, but they're very useful info that costs nothing to store. I can imagine them being useful for HDR transcoding, tone-mapping and thumbnail generation down the line.
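To illustrate the kind of data involved (a trimmed-down sketch; the actual enums in the PR cover ffprobe's full vocabulary and may be named differently):

```typescript
// Values mirror FFmpeg's identifiers so ffprobe output maps onto them directly.
export enum ColorTransfer {
  Bt709 = 'bt709',             // typical SDR
  Smpte2084 = 'smpte2084',     // PQ (HDR10)
  AribStdB67 = 'arib-std-b67', // HLG
}

export enum ColorPrimaries {
  Bt709 = 'bt709',
  Bt2020 = 'bt2020',
  Smpte432 = 'smpte432',       // Display P3
}

// With the transfer stored, deciding whether tone-mapping is needed becomes a lookup:
export const needsToneMapping = (t: ColorTransfer): boolean =>
  t === ColorTransfer.Smpte2084 || t === ColorTransfer.AribStdB67;
```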
```diff
-  private async getExifTags(asset: { originalPath: string; files: AssetFile[]; type: AssetType }): Promise<ImmichTags> {
+  private async getExifTags(asset: { originalPath: string; files: AssetFile[]; type: AssetType }) {
     const { sidecarFile } = getAssetFiles(asset.files);
     const shouldProbe = asset.type === AssetType.Video || asset.originalPath.toLowerCase().endsWith('.gif');
```
`|| mimeTypes.isPossiblyAnimatedImage(asset.originalPath)` like below on L612?
Suggested change:

```diff
-    const shouldProbe = asset.type === AssetType.Video || asset.originalPath.toLowerCase().endsWith('.gif');
+    const shouldProbe = asset.type === AssetType.Video || mimeTypes.isPossiblyAnimatedImage(asset.originalPath);
```
The behavior on main is that it uses the video thumbnail generation path for videos and `.gif`, so those need to be probed. I'm not sure if FFprobe would handle all the formats in `possiblyAnimatedImageExtensions`. I'm actually not sure if even `.gif` needs FFprobe or FFmpeg, since sharp should presumably be able to handle it.
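If sharp does suffice, the static-thumbnail path for `.gif` could be as simple as this (untested sketch; the function name, paths, and sizes are made up):

```typescript
import sharp from 'sharp';

// sharp decodes a GIF via libvips and uses only the first frame by default,
// which is all a static thumbnail needs; no FFprobe/FFmpeg involved.
export async function gifThumbnail(input: string, output: string): Promise<void> {
  await sharp(input).resize({ width: 250, withoutEnlargement: true }).webp().toFile(output);
}
```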
Understandable, and that's your (the team's) call. For me, files 1k lines long that touch many different aspects are already past the limit of what fits in my brain simultaneously :) That's the sole reason I brought it up.
That was my idea too: serve the pre-transcoded videos through the same HLS mechanism, selected by default. When the user or HLS client selects a different rendition, only then start realtime transcoding. That keeps the best of both worlds.
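Concretely, the master playlist could look something like this (hand-written illustration; bandwidths, codec strings, and paths are invented, and most HLS clients start with the first variant listed):

```typescript
// Hand-written illustration; attribute values and URLs are made up.
const masterPlaylist = [
  '#EXTM3U',
  // The pre-transcoded rendition is listed first, so clients pick it by default.
  '#EXT-X-STREAM-INF:BANDWIDTH=4000000,RESOLUTION=1920x1080,CODECS="avc1.640028,mp4a.40.2"',
  'pretranscoded/index.m3u8',
  // Requesting any other rendition is what kicks off realtime transcoding.
  '#EXT-X-STREAM-INF:BANDWIDTH=1200000,RESOLUTION=1280x720,CODECS="avc1.64001f,mp4a.40.2"',
  'realtime/720p/index.m3u8',
].join('\n');
```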
Force-pushed from 87e0e9c to 23e7c94.
Description
This PR adds a variety of audio/video metadata to the DB. The data is shaped like the output of the existing `MediaRepository.probe` call, meaning that the existing code for thumbnail generation and video transcoding is compatible with the DB output. These are the first consumers of this data, allowing probing to be done only once, in metadata extraction.

The broader goal is to enable accurate and efficient HLS playlist generation, and to minimize live transcoding latency by doing as much ahead of time as possible. The PR tracks keyframe and colorimetric data that will be needed in future PRs. It stores a bit more than the bare minimum in the interest of having the data when we need it (for example, the DoVi metadata will be useful for advertising the original as DoVi rather than generic HEVC).
The initial idea was to add this metadata to the `asset_exif` table, but the number of columns needed grew, and the typing was bad since this table is shared with images as well. I split it into three tables to keep things tidy with stricter typing:

- `asset_audio`
- `asset_video`
- `asset_keyframe`

This also enables a future one-to-many relationship here if we want to track multiple audio or video streams. A rough sketch of the row shapes is below.
The PR is functional, but needs more testing and ideally a video test asset or two for e2e.