The problem to solve is that photos with complex decompression (e.g. jxl) can be painfully slow to thumbnail, and when they are moved they lose their thumbnails, because the thumbnail catalogue appears to be keyed on the full image path and there appears to be no setting to alter that key (someone correct me if I'm wrong).
What would work, however, is keying the thumbnails not on the file path but on other attributes, specifically I think "driveid-modifiedts-size", "modifiedts-size" or "driveid-uniquefilenumber", alongside the existing "fullpath" method, with the user choosing one in settings. These would make thumbnails, once generated, immune to moves, and using a drive ID would handle drive letter changes. In this regard I would myself choose "driveid-modifiedts-size" if drive-fileid were unavailable.
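A rough sketch of how such keys might be built, in Python purely for illustration. It assumes the values reported by `os.stat` are good enough stand-ins: `st_dev` for the drive ID and `st_ino` for the unique file number. The function name `thumbnail_key` and the `SCHEMES` tuple are hypothetical, and nothing here reflects how the program actually stores its catalogue.

```python
import os
import tempfile

# Hypothetical scheme names, taken from the proposal above.
SCHEMES = ("driveid-modifiedts-size", "modifiedts-size",
           "driveid-uniquefilenumber", "fullpath")

def thumbnail_key(path, scheme="driveid-modifiedts-size"):
    """Build a catalogue key from file metadata only; file contents are never read."""
    st = os.stat(path)
    drive_id = st.st_dev      # assumption: device/volume identifier as reported by the OS
    mtime = st.st_mtime_ns    # modified timestamp in nanoseconds
    if scheme == "driveid-modifiedts-size":
        return f"{drive_id}-{mtime}-{st.st_size}"
    if scheme == "modifiedts-size":
        return f"{mtime}-{st.st_size}"
    if scheme == "driveid-uniquefilenumber":
        return f"{drive_id}-{st.st_ino}"   # assumption: inode / file index is stable
    if scheme == "fullpath":
        return os.path.abspath(path)
    raise ValueError(f"unknown scheme: {scheme}")

def keys_before_and_after_move():
    """Create a file, move it within the same volume, and return both sets of keys."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "photo.jxl")
        with open(src, "wb") as f:
            f.write(b"placeholder bytes, not a real image")
        before = {s: thumbnail_key(src, s) for s in SCHEMES}
        os.mkdir(os.path.join(tmp, "moved"))
        dst = os.path.join(tmp, "moved", "photo.jxl")
        os.rename(src, dst)   # a move within one volume keeps mtime, size and file number
        after = {s: thumbnail_key(dst, s) for s in SCHEMES}
    return before, after
```

With a move inside one volume, only the "fullpath" key should change; the three metadata keys stay the same, which is the whole point of the request.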
To go with this, there should be some option to clean up lost entries. This is where including the drive ID is a virtue: a scan of the whole drive(s) could record which keys still exist, and at the end any keys for the scanned drive(s) that were not encountered could be removed or marked as missing. Viable thumbnails would be left alone, so no useful thumbnail is removed and the space from obsolete keys is reclaimed.
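The clean-up pass described above might look something like this. It is only a sketch under stated assumptions: the catalogue is modelled as a plain dict keyed by a `(drive_id, modified_ns, size)` tuple, the names `metadata_key` and `prune_catalogue` are made up for the example, and each entry in `roots` is assumed to be the top of a whole drive (scanning only a subfolder would wrongly prune entries for files elsewhere on that drive).

```python
import os

def metadata_key(path):
    """Key a file by (drive id, modified time in ns, size)."""
    st = os.stat(path)
    return (st.st_dev, st.st_mtime_ns, st.st_size)

def prune_catalogue(catalogue, roots):
    """Drop entries for the scanned drives whose files were not found.

    catalogue: dict mapping metadata keys to thumbnail data (hypothetical layout).
    roots: directories to scan, each assumed to be the root of a whole drive.
    Returns the list of keys that were removed.
    """
    seen = set()
    scanned_drives = set()
    for root in roots:
        scanned_drives.add(os.stat(root).st_dev)
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                try:
                    seen.add(metadata_key(os.path.join(dirpath, name)))
                except OSError:
                    continue   # file vanished or is unreadable; skip it
    # Only entries on a scanned drive can be judged stale; other drives are left alone.
    stale = [k for k in catalogue if k[0] in scanned_drives and k not in seen]
    for key in stale:
        del catalogue[key]
    return stale
```

Entries belonging to drives that were not scanned (for example an unplugged external disk) are never touched, so their thumbnails survive until that drive is scanned in turn.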
For the "modifiedts-size" options it would obviously be up to the user to avoid settings under which a save leaves both the timestamp and the size unchanged (such as changing one colour to another in a PNG with no modified-date change on save). In that case the program would ideally need to "poke" the timestamp by one tick on the filesystem concerned, giving a modified timestamp that differs for keying purposes but is unchanged from a human perspective. Normally, though, people should have it set to update the modified stamp with every save.
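A "poke" of this kind can be sketched with `os.utime`. The helper name `poke_mtime` and the default tick of 100 ns are assumptions for the example: 100 ns matches NTFS timestamp granularity, but other filesystems differ (FAT stores modified times in 2-second steps), so a real implementation would need to pick the tick per filesystem.

```python
import os

def poke_mtime(path, tick_ns=100):
    """Advance a file's modified timestamp by tick_ns nanoseconds.

    The access time is written back unchanged. Returns the modified time
    (in ns) that the filesystem reports afterwards, which may have been
    rounded if tick_ns is finer than the filesystem's resolution.
    """
    st = os.stat(path)
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns + tick_ns))
    return os.stat(path).st_mtime_ns
```

If the returned value equals the old timestamp, the tick was too small for that filesystem and a larger one would be needed.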
This would have a truly huge benefit for anyone who moves files around a lot such as myself, as at the moment jxl etc are unusable due to the impermanence of thumbnails.
David
Thumbnail Catalogue Key Option
-
meteorquake
- Posts: 94
- Joined: Wed Sep 13, 2023 9:37 am
Last edited by meteorquake on Thu Nov 13, 2025 2:38 pm, edited 1 time in total.
Re: Thumbnail Catalogue Key Option
This may help. A quick AI check says:
How common is this practice?
Google Photos, digiKam, Shotwell, Lightroom, Plex, Nextcloud, and CDNs all use a content hash as the cache key in the DB, not just folder/file IDs.
Also a test case (on an i7, 32 GB):
- image: 4000×4000, 32-bit (~50 MB uncompressed)
- DB: 1M records, SQLite lookup by
-- hash (indexed): 0.05-1 ms
-- folder_id+image_id (indexed): 0.02-0.5 ms
To speed this up further, hash only part of the file (e.g. the first 256 KB).
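As an illustration of the idea, here is a minimal sketch of a thumbnail cache keyed by a hash of the first 256 KB of each file, stored in SQLite. Everything named here (`partial_hash`, `open_cache`, `put_thumb`, `get_thumb`, the `thumbs` table) is hypothetical. Mixing the file size into the hash is an extra assumption of this sketch, added so that two files sharing the same first 256 KB are less likely to collide, and BLAKE2 is simply one of the algorithms available in `hashlib`.

```python
import hashlib
import os
import sqlite3

PARTIAL_BYTES = 256 * 1024   # hash only the first 256 KB of each file

def partial_hash(path, limit=PARTIAL_BYTES):
    """Hash the file size plus the first `limit` raw bytes (no image decoding)."""
    h = hashlib.blake2b(digest_size=16)
    h.update(str(os.path.getsize(path)).encode())   # assumption: size mixed in
    with open(path, "rb") as f:
        h.update(f.read(limit))
    return h.hexdigest()

def open_cache(db_path=":memory:"):
    """Open (or create) the cache; the PRIMARY KEY gives an indexed hash lookup."""
    con = sqlite3.connect(db_path)
    con.execute(
        "CREATE TABLE IF NOT EXISTS thumbs (hash TEXT PRIMARY KEY, thumb BLOB NOT NULL)"
    )
    return con

def put_thumb(con, path, thumb_bytes):
    """Store a thumbnail under the file's partial content hash."""
    con.execute(
        "INSERT OR REPLACE INTO thumbs (hash, thumb) VALUES (?, ?)",
        (partial_hash(path), thumb_bytes),
    )
    con.commit()

def get_thumb(con, path):
    """Return the cached thumbnail bytes for this file, or None if absent."""
    row = con.execute(
        "SELECT thumb FROM thumbs WHERE hash = ?", (partial_hash(path),)
    ).fetchone()
    return row[0] if row else None
```

Because the key depends only on content, a renamed or moved file still finds its thumbnail, at the cost of reading up to 256 KB of the file on every lookup.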
Last edited by user0 on Thu Nov 13, 2025 3:55 pm, edited 3 times in total.
meteorquake
Re: Thumbnail Catalogue Key Option
I thought of a content hash too, but its cost is not zero. I'm not sure what computer your figures came from, but something like MD5 on a laptop adds considerable time overhead (it certainly does on this one) and CPU overhead every time you view photos, particularly many photos, with no real gain. The chances of my taking two camera photos with the same timestamp and size are effectively zero, because it takes a second to take even one photo, whereas drive-modifiedts-size or drive-fileid is basically instant and resource-free.
So whilst a content hash could be an option (presumably it could hash just the image data), some instant metadata identification that is unique in practice is what I would choose. If I ever encountered two images with the same metadata I would poke one of their times, but I don't think that would ever be needed.
meteorquake
Re: Thumbnail Catalogue Key Option
As a quick reality check, I ran MD5Checker (https://md5checker.github.io/download.html) on 310 photos, starting the clock once it had accepted the file list I dragged in. It took about 60 seconds, or roughly 0.2 seconds per file, which is much slower than XnView would take to show 310 thumbnails.
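For anyone wanting to repeat this kind of comparison without a GUI tool, a small timing harness along these lines would do. It is a sketch only: `md5_key`, `stat_key` and `time_per_file` are names invented for the example, and the results will depend heavily on the disk, the CPU and whether the OS has already cached the files.

```python
import hashlib
import os
import time

def md5_key(path, chunk_size=1024 * 1024):
    """Full-file MD5 of the raw bytes, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def stat_key(path):
    """Metadata-only key: drive id, modified time in ns, and size."""
    st = os.stat(path)
    return f"{st.st_dev}-{st.st_mtime_ns}-{st.st_size}"

def time_per_file(key_func, paths):
    """Average seconds per file spent computing key_func over paths."""
    start = time.perf_counter()
    for p in paths:
        key_func(p)
    return (time.perf_counter() - start) / len(paths)

# Usage (photo_paths is whatever list of files you want to measure):
#   print("md5 :", time_per_file(md5_key, photo_paths))
#   print("stat:", time_per_file(stat_key, photo_paths))
```

Running the MD5 pass first on files that have not been opened recently gives the fairest picture, since a second pass will largely be served from the OS file cache.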
meteorquake
Re: Thumbnail Catalogue Key Option
One extra thought on this: if it is not suitable on a laptop (speed and battery), then viewing thumbnails over a slow network with MD5 as the thumb key is obviously going to be worse. JPEGs will work fast enough with their meta-thumbs, as will having a centralised thumb DB, or the ability to bulk-import thumbs built locally at the remote location and then keep tracking them across moves using UID-like meta information. But if you have to create the thumbs yourself, you won't want to MD5 every file over the network just to identify its thumbnail, nor to decode every file every time you view it.
Re: Thumbnail Catalogue Key Option
It's much, much faster to hash a file's raw bytes than to parse the image content.
It's also possible to hash only part of the file, which should be enough for uniqueness purposes.
And this simple hashing should not be confused with perceptual hashing, which is
used for visual similarity search (pHash, dHash, aHash, wavelet hash) and is roughly 10-20 times slower.
Try this Python script. By the way, MD5 is not the fastest algorithm.
Code:
import hashlib
import time

file_path = r"D:\dev\py\hash\img.jpg"

# perf_counter is a monotonic, high-resolution clock suited to short timings
start = time.perf_counter()
md5_hash = hashlib.md5()
with open(file_path, "rb") as f:
    # Read the raw file bytes in 4 KiB chunks; no image decoding is involved
    for chunk in iter(lambda: f.read(4096), b""):
        md5_hash.update(chunk)
elapsed_ms = (time.perf_counter() - start) * 1000

print(md5_hash.hexdigest())
print("{:.3f} ms".format(elapsed_ms))
meteorquake
Re: Thumbnail Catalogue Key Option
They're certainly fine as options. By hashing image data I meant hashing the raw, undecoded image data, so that if just the metadata were changed it would still be regarded as the same image (though perhaps one would want to throw the orientation flag into the mix).
But neither is suitable for slow networks or for laptops; even on laptops that can provide the necessary speed, it would be an unnecessary battery drain.
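To make the "raw undecoded image data" idea concrete, here is a sketch for PNG only, chosen because its chunk layout is simple (8-byte signature, then chunks of length, type, data and CRC). It hashes just the chunks that carry pixel data and skips text and other metadata chunks. The names `image_data_hash`, `make_chunk` and `IMAGE_CHUNKS`, the choice of which chunk types count as image data, and the optional `orientation` argument (which the caller would have to supply from wherever the flag is stored) are all assumptions of this example. Note that it still has to read the whole file, so the cost objection above still applies.

```python
import hashlib
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Assumption: these chunk types are treated as "image data"; everything else is ignored.
IMAGE_CHUNKS = {b"IHDR", b"PLTE", b"tRNS", b"IDAT"}

def make_chunk(ctype, data):
    """Serialise one PNG chunk: length, type, data, CRC-32 over type+data."""
    crc = zlib.crc32(ctype + data)
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", crc)

def image_data_hash(png_bytes, orientation=None):
    """Hash only the image-bearing chunks of a PNG, leaving metadata chunks out.

    The compressed bytes are hashed as stored; nothing is decoded and CRCs
    are not checked. If given, `orientation` (an int) is mixed into the hash.
    """
    if not png_bytes.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    h = hashlib.blake2b(digest_size=16)
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(png_bytes):
        length, ctype = struct.unpack(">I4s", png_bytes[pos:pos + 8])
        if ctype in IMAGE_CHUNKS:
            h.update(ctype)
            h.update(png_bytes[pos + 8:pos + 8 + length])
        pos += 12 + length   # 4 length + 4 type + data + 4 CRC
    if orientation is not None:
        h.update(b"orientation:%d" % orientation)
    return h.hexdigest()
```

Two PNG files that differ only in a text chunk would then share a key, while a change to the pixel data or to the supplied orientation value would produce a different one.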