Database size

Bugs and Suggestions in XnView Classic which have been resolved

MaierMan
Posts: 78
Joined: Wed Aug 04, 2004 8:32 pm

Database size

Post by MaierMan »

Glad to see you're using sqlite3, as I value this software pretty highly.

But the database gets pretty huge.
I got a database of about 50MB for just 2k files.
AVG(LENGTH(pv)) is 18831 (bytes) in that database.
18kb+ per thumbnail (even if there is more meta-data hidden in pv) seems unreasonably high...
Indexing my 250k pictures would then give me 4GB+ of blob data :p
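
For reference, the projection above is just the measured average multiplied by the collection size. A quick sketch of that arithmetic (the variable names are only illustrative, the two figures are the ones quoted above):

```python
# Rough projection of blob storage, using the figures quoted above.
avg_blob_bytes = 18831      # AVG(LENGTH(pv)) measured on the 2k-file database
picture_count = 250_000     # size of the full collection

total_bytes = avg_blob_bytes * picture_count
total_gib = total_bytes / 2**30   # binary gigabytes

print(f"{total_bytes} bytes is about {total_gib:.2f} GiB")
```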

A test with one thumb (db blob against the same thumb re-saved in Photoshop) gave:
db blob size of pv: ~ 18kb
db blob gzip'ed (per 7z/max): ~ 13kb
bmp32: ~ 15kb
bmp32 + gzip: ~ 10kb
png24 + alpha: ~ 7kb
png8: 3kb
jpeg80: ~ 3kb
jpeg30: ~ 1kb

PS: it's easy to implement field-based encoding/compression by using sqlite3_create_function()...
Simply register the encoder/decoder when you init your |sqlite3*| and then modify the queries to use those ;)
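
A minimal sketch of that idea, written in Python for brevity because its sqlite3 module exposes sqlite3_create_function() as Connection.create_function(). The function names pv_compress/pv_decompress, the table layout and the sample data are assumptions for illustration only, not XnView's actual schema:

```python
import sqlite3
import zlib


def pv_compress(blob):
    """SQL function: zlib-compress a blob (NULL stays NULL)."""
    return None if blob is None else zlib.compress(blob, 6)


def pv_decompress(blob):
    """SQL function: inverse of pv_compress."""
    return None if blob is None else zlib.decompress(blob)


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    # Register the encoder/decoder right after opening the connection.
    # The SQL names are hypothetical; pick whatever suits the application.
    conn.create_function("pv_compress", 1, pv_compress)
    conn.create_function("pv_decompress", 1, pv_decompress)
    # Hypothetical table: one row per cached thumbnail.
    conn.execute("CREATE TABLE IF NOT EXISTS thumbs (name TEXT PRIMARY KEY, pv BLOB)")
    return conn


conn = open_db()
raw = b"\x00\x10\x20\x30" * 4096   # stand-in for an uncompressed thumbnail

# The queries are modified to go through the registered functions.
conn.execute("INSERT INTO thumbs (name, pv) VALUES (?, pv_compress(?))", ("a.jpg", raw))
(stored_len,) = conn.execute(
    "SELECT LENGTH(pv) FROM thumbs WHERE name = ?", ("a.jpg",)
).fetchone()
(restored,) = conn.execute(
    "SELECT pv_decompress(pv) FROM thumbs WHERE name = ?", ("a.jpg",)
).fetchone()

print(len(raw), stored_len, restored == raw)
```

The application code keeps reading and writing plain blobs; only the SQL text changes. The same pattern should carry over to the C API, with zlib called from the registered callback.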
xnview
Author of XnView
Posts: 36805
Joined: Mon Oct 13, 2003 7:31 am
Location: France

Re: Database size

Post by xnview »

But which compression do you use for the cache db? None, ZIP or JPEG?
Pierre.
MaierMan
Posts: 78
Joined: Wed Aug 04, 2004 8:32 pm

Re: Database size

Post by MaierMan »

xnview wrote:But which compression do you use for the cache db? None, ZIP or JPEG?
If that blob just contains the thumbnail (as an image) I'd make JPEGs (or PNGs) out of it, as this is probably the best image compression.
If it is mixed data I would at least (g)zip it.
Or even better store the thumbnail part in jpeg and the other stuff from the mix in another field using gzip (or no) compression.
xnview
Author of XnView
Posts: 36805
Joined: Mon Oct 13, 2003 7:31 am
Location: France

Re: Database size

Post by xnview »

MaierMan wrote:
xnview wrote:But which compression do you use for the cache db? None, ZIP or JPEG?
If that blob just contains the thumbnail (as an image) I'd make JPEGs (or PNGs) out of it, as this is probably the best image compression.
If it is mixed data I would at least (g)zip it.
Or even better store the thumbnail part in jpeg and the other stuff from the mix in another field using gzip (or no) compression.
No, you don't understand me :-) Which compression method do you use in option/Cache?
Pierre.
MaierMan
Posts: 78
Joined: Wed Aug 04, 2004 8:32 pm

Re: Database size

Post by MaierMan »

xnview wrote:No, you don't understand me :-) Which compression method do you use in option/Cache?
Default one I assume...
Letmesee...
hmm... Seems it was ZIP.

Some new tests on |AVG(LENGTH(pv))| using a fileset of 1117 files/690MB and leaving everything else (thumbnail dimensions) at default:
None: 13995.95
ZIP: 12889.05 (should be more like 7-10k)
High JPEG: 7318.35 (should be more like 2-4k)
Lossy JPEG: 7027.64 (should be more like 1-2k)


Some pv fields seem to be huge, up to 64kb in my tests.
Could it be that you're storing meta-data (EXIF, IPTC and such things) within the thumbs?
I uploaded some of those dump files to http://celebnamer.celebworld.ws/stuff/xnview/thumbdump/
The first 7 of them (all 40+kb) have EXIF, IPTC and XMP meta-data (Origin files).
The rest (all below 18kb) have no meta-data.


Another observation:
Dumping the pv data into files, 1117 in my case, and applying gzip shrinks the size a lot.
I therefore assume that there is a lot of additional compression possible. (and gzip --fast is actually fast ;))

-- High JPEG
raw (from db): 11416kb
find dump/ | xargs gzip -c9 > dump.bin: 5492kb
find dump/ | xargs gzip -c1 > dump.bin: 5617kb

-- ZIP
raw (from db): 19930kb
find dump/ | xargs gzip -c9 > dump.bin: 13672kb
find dump/ | xargs gzip -c1 > dump.bin: 13830kb
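
The same comparison can be scripted without the shell. A rough sketch using Python's zlib at levels 1 and 9; the blobs here are made-up, repetitive, text-like stand-ins (loosely imitating XMP-style metadata), not the real pv dumps, so the absolute numbers will differ from the ones above:

```python
import zlib


def compressed_sizes(blobs, levels=(1, 9)):
    """Return {level: total compressed size in bytes} over all blobs."""
    return {
        level: sum(len(zlib.compress(blob, level)) for blob in blobs)
        for level in levels
    }


# Stand-in data (an assumption): replace with the dumped pv fields,
# e.g. the bytes read from every file under dump/.
blobs = [(b"<rdf:Description key='%d'/>" % i) * 200 for i in range(50)]

raw_total = sum(len(blob) for blob in blobs)
sizes = compressed_sizes(blobs)

print("raw:", raw_total)
for level, size in sizes.items():
    print(f"level {level}: {size}")
```

If the fast level already lands close to the best level, as in the gzip -c1 versus -c9 figures above, the cheap setting is probably good enough for a cache.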
xnview
Author of XnView
Posts: 36805
Joined: Mon Oct 13, 2003 7:31 am
Location: France

Re: Database size

Post by xnview »

Yes, I store all metadata. But with the old cache system, is the db almost the same size?
Perhaps you can send me 1 of your first items?
Currently I compress the metadata with zlib too...
Pierre.
MaierMan
Posts: 78
Joined: Wed Aug 04, 2004 8:32 pm

Re: Database size

Post by MaierMan »

xnview wrote: Yes, I store all metadata. But with the old cache system, is the db almost the same size?
Old cache gives about 70-80% the size for those 1117 files. All modes.
But that doesn't really matter. New cache system gives the opportunity to improve it. ;)
xnview wrote:Perhaps you can send me 1 of your first items?
Send you what exactly?
Some pv field dumps are available from http://celebnamer.celebworld.ws/stuff/xnview/thumbdump/
I added some using "Low JPEG"... that folder also contains a "find | xargs cat" assembled dump.bin and a "find | xargs gzip -c1" assembled dump.bin.gz.
As you can see from comparing the two, there is still a lot of compression possible, even with gzip in "--fast" (or "wb1") mode.
xnview wrote:Currently I compress the metadata with zlib too...
Is it really necessary to store it as well?
Wouldn't a bitfield indicating "5 = HAS_EXIF | HAS_XMP" be enough?
Metadata adds up to 40-50kb for a good tagged file (raw).

The compression seems to be "faulty" as I can easily compress these even more (see remarks above).

Looking at those thumbs that belong to images without meta data the thumb size seems reasonable, at least in "Low JPEG" mode.
This is something I didn't really realize till now, as most of my files contain the full range of meta-data (COM, IPTC, EXIF, XMP).
So my current conclusion is: meta-data is not compressed well enough, or it shouldn't be fully stored at all.
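
To illustrate the bitfield idea from above ("5 = HAS_EXIF | HAS_XMP"), here is a small sketch. The concrete bit values are an assumption chosen so that EXIF plus XMP comes out as 5; HAS_IPTC and HAS_COM are hypothetical extra flags:

```python
from enum import IntFlag


class MetaFlags(IntFlag):
    """Which kinds of meta-data a file carries (bit values are assumed)."""
    HAS_EXIF = 1
    HAS_IPTC = 2   # hypothetical
    HAS_XMP = 4
    HAS_COM = 8    # hypothetical


# A file with EXIF and XMP, stored as a single small integer column.
flags = MetaFlags.HAS_EXIF | MetaFlags.HAS_XMP
stored = int(flags)

# Reading it back from the cache and testing individual bits.
restored = MetaFlags(stored)
has_exif = MetaFlags.HAS_EXIF in restored
has_iptc = MetaFlags.HAS_IPTC in restored

print(stored, has_exif, has_iptc)
```

This only covers the "is there meta-data or not" pictograms; it cannot replace the stored values if the labels need to show the actual EXIF/IPTC contents.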
xnview
Author of XnView
Posts: 36805
Joined: Mon Oct 13, 2003 7:31 am
Location: France

Re: Database size

Post by xnview »

MaierMan wrote:
xnview wrote:Perhaps you can send me 1 of your first items?
Send you what exactly?
The picture file
MaierMan wrote:
xnview wrote:Currently I compress the metadata with zlib too...
Is it really necessary to store it as well?
Wouldn't a bitfield indicating "5 = HAS_EXIF | HAS_XMP" be enough?
Metadata adds up to 40-50kb for a good tagged file (raw).
Yes, to be able to show IPTC/EXIF in labels (thumbnails view)
Pierre.
MaierMan
Posts: 78
Joined: Wed Aug 04, 2004 8:32 pm

Re: Database size

Post by MaierMan »

xnview wrote:
MaierMan wrote:
xnview wrote:Perhaps you can send me 1 of your first items?
Send you what exactly?
The picture file
Uploaded the files corresponding to those thumbs to:
http://celebnamer.celebworld.ws/stuff/x ... mp/Origin/
xnview wrote:
MaierMan wrote:
xnview wrote:Currently I compress the metadata with zlib too...
Is it really necessary to store it as well?
Wouldn't a bitfield indicating "5 = HAS_EXIF | HAS_XMP" be enough?
Metadata adds up to 40-50kb for a good tagged file (raw).
Yes, to be able to show IPTC/EXIF in labels (thumbnails view)
If it's just to display those pictograms/labels then it would be sufficient to store a bit indicating "EXIF-there/not-there" and nothing more. ;)
Those other tools (Preview, Properties) seem to load the files again anyway. At least that's what FileMon tells me...
xnview
Author of XnView
Posts: 36805
Joined: Mon Oct 13, 2003 7:31 am
Location: France

Re: Database size

Post by xnview »

MaierMan wrote:Yes, to be able to show IPTC/EXIF in labels (thumbnails view)
If it's just to display those pictograms/labels then it would be sufficient to store a bit indicating "EXIF-there/not-there" and nothing more. ;)
Those other tools (Preview, Properties) seem to load the files again anyway. At least that's what FileMon tells me...
No, for thumbnail labels I use the cache; I don't load the file.
Pierre.
xnview
Author of XnView
Posts: 36805
Joined: Mon Oct 13, 2003 7:31 am
Location: France

Re: Database size

Post by xnview »

MaierMan wrote:
xnview wrote:
MaierMan wrote: Send you what exactly?
The picture file
Uploaded the files corresponding to those thumbs to:
http://celebnamer.celebworld.ws/stuff/x ... mp/Origin/
Ok, I have removed all unneeded data, and now the db is 50% smaller.
Could you send me an email? I would like to send you an alpha version, and perhaps ask you some questions about sqlite :-P
Pierre.