Page 1 of 1

Important question about Upgrading to the Next Release

Posted: Mon Jul 01, 2013 8:17 am
by m.Th.
Hi all,

It is known that Pierre is preparing a new release for XnViewMP - see http://newsgroup.xnview.com/viewtopic.php?f=60&t=28256 - and this new release (as Pierre says in his own message) is geared (among other things, I presume) to database changes / optimizations.

For our discussion we distinguish two kinds of data in the database:
  • Thumbnails - which are generated, and it can be generated again with ease. Big, difficult to manipulate, to move around. Also, if a thumbnail is deleted it is rebuilt when the user goes in the corresponding folder.
  • Catalog - file info, EXIF, IPTC, ratings, colors, categories etc. Human made. Small, high value data - easy to manipulate and optimize.
Question:

Do you agree to delete the thumbnails from the database in order to make the upgrade process 50 or even 100 times more quicker and simpler?

(Personally I don't see why not, there are only advantages, but I wanted to ask you also).

Feel free to comment.

Re: Important question about Upgrading to the Next Release

Posted: Mon Jul 01, 2013 9:16 am
by oops66
Hello,

...Why not having 2 data bases ... one for (and from) the Human made. Small, high value data and easy to manipulate and optimize (the catalog.xndb)... and the other one for the tumbnails (thumbnails.xndb) automatically made by xnViewMP to give the opportunity to centralize and optimize (on ssd for example) the display.

... Personally, I like this kind of structure and organization(your) (with a "root folders list" and relative path ):
http://newsgroup.xnview.com/viewtopic.p ... 93#p112193
Image

Re: Important question about Upgrading to the Next Release

Posted: Mon Jul 01, 2013 10:40 am
by m.Th.
Yes, sure. :D This is what we have in mind.

The problem is that now:
  1. "Thumbs.DB" is inside of Xnview.db
  2. 'Thumbs.DB' (ok, the 'Datas' table) doesn't have an optimal structure because the Thumbnail is tied with other useful data. (Thumbnail size etc.)
We can refactor this, but it will take a looooOOOOoooOOOoooOOOooOOOOOng time for you to wait till the program will move and split around the thumbnails from...
  1. ...the old database
  2. ...the fields with which it is together
I think that point 1 above is obvious, let me explain point 2:

Technical info begin:

Unfortunately, SQLite stores the BLOBs in-line - IOW, together with the record. (Allmost all the other DB engines store it outside of the record, in special pages).

As we saw here http://newsgroup.xnview.com/viewtopic.php?f=60&t=28232 it is important to minimize the IO operations and the access to disk and this is done through Pages and Cache Size (also, be sure to vote here http://newsgroup.xnview.com/viewtopic.php?f=60&t=28263)

Let's say that we have a page size of 8192 bytes (8k).

And now, let's do some math:

If we have Datas like this:

Code: Select all

CREATE TABLE Datas(ImageID INTEGER PRIMARY KEY, Created INTEGER, Size INTEGER, Data BLOB);
...then our thumbnail stored in the Data field will eat all the page (in fact it will eat two pages because usually thumbs are bigger than 8k). Hence with one IO operation will fetch just one record (!). So, any query on this table (except fetching the thumbs, of course) trying to compute the sum of sizes or check the Created field or for any other futher purpose will be very slow because it will trash the disk badly only and only to discard 99% (yes, 99%! ) of the data.

If we refactor like this...

Code: Select all

CREATE TABLE Datas(ImageID INTEGER PRIMARY KEY, Data BLOB); -- leave just the thumb

Code: Select all

CREATE TABLE DatasExtra(ImageID INTEGER PRIMARY KEY, Created INTEGER, Size INTEGER); -- and any future field


...we'll have for the second table in the most common case (it can vary, of course): 4 bytes (PK) + 4 bytes ('Created' field) + 4 bytes ('Size') = 12 bytes / record. (!!)

Hence in one page (8192 bytes) we'll have approx. 680 records!! in the new model... ...HUGE difference from one record / two pages (!) in the old model

So, any query not fetching the thumbs will be hundredths times faster than now! (especially if we know to exploit this - JOINs, GROUP BY... etc.)

-----------
Also, because our blessed SQLite doesn't have direct access to fields, only sequential, with the old design if we want to add any field in the future, we must add it after the BLOB, hence we'll read (scan) that big chunk of data even if we need it or not. More slowness... yech! :(

Technical info end.


SO, we need (IMHO) to refactor the old 'Datas' table. BUT in order to refactor, my question from OP arises: Do you want to keep the generated thumbs? If yes, then the upgrade process will be very slow (because we'll need to process (move, split) many, many records - explanation above). If you do not want to keep the thumbs (I don't want since there are NO disatvantages) the the upgrade process will be simpler and quicker because the program will empty first the Datas table (the Thumbs database will be empty) and hence no record to move around.

This was my question: Do you want to keep the generated thumbs?

Re: Important question about Upgrading to the Next Release

Posted: Mon Jul 01, 2013 11:11 am
by oops66
m.Th. wrote: ... if we want to add any field in the future, we must add it after the BLOB, hence we'll read (scan) that big chunk of data even if we need it or not. More slowness... yech! :(
... Yes, it's also a good reason to rethink the DB ;-)

Code: Select all

This was my question: Do you want to keep the generated thumbs?
...Yes, but in this case we can have 3 bases:
1-) the Human made DB (label, ratting, keywords, etc...). Small, high value data and easy to manipulate and optimize (the catalog.xndb)
2-) info_files.xndb ... & automatically made by xnViewMP for other useful data. (Thumbnail size etc.)
3-) The thumbnails DB (thumbnails.xndb), only with thumbs data blob & automatically made by xnViewMP to give the opportunity to centralize and optimize (on ssd for example) the display.

No ?