Database performance

Ideas for improvements and requests for new features in XnView MP

Moderators: helmut, XnTriq, xnview

User avatar
m.Th.
XnThusiast
Posts: 1676
Joined: Wed Aug 16, 2006 6:31 am
Contact:

Database Performance

Post by m.Th. »

xnview wrote:
m.Th. wrote: When I open the program, it opens quite fast (the thumbnails were generated in the previous session), but for the first time when I want to change the tab of the Tree Pane from the 'Folders' to 'Categories' it freezes for a very long time
Do you have a lot of categories? Images in categories? Tagged files?
Nope. Just the defaults +/- 20 categories automatically imported from .XMP sidecar files (we have other DAMs on our LAN with which we do our cataloging). I didn't use the engine because I cannot - as I stated in other places, in order to really use an in-database organization, you must provide a best-in-class Compare Engine in order to assign in one pass ratings (mainly), colors and keywords (aka. categories). This is a dogma, an industry standard (for ex. see Lightroom, IDImager, ACDSee, FastStone Viewer, AfterShot Pro, Zoner Photo Studio, Photo Supreme - yes, I have worked with all these). IMHO, this should be the next step after optimizing the db and polishing the on-disk file management (organization).

But we digress. Let's go back...
xnview wrote:
m.Th. wrote: Well, you have here SOME problems. In fact, at least in my use case of freezing (see my previous posts), the full table scan is (IMHO) triggered by 'Tag' field because you show in the Categories Tree the number of tagged photos (see Special Items\Tagged Files). That number is obtained by having a "Select count(*) from Images where Tag=1" or similar (for ex. "Select sum(tag) from Images"), isn't it? Also, in order to display the files you need to have a query (most probably a join with 'Datas') in which you have somewhere 'Where Tag=1', isn't it? :D

Well, the above SQL queries (an aggregate on a non-indexed field, a search condition on a non-indexed field) triggers a full table scan. Hence the slowdown on big databases.
Ok, i'm not expert in DB :)

So you think that i must add

Code: Select all

CREATE UNIQUE INDEX Tags_idx ON Images(ImageID, Tag);
, right?
Well, nope. :D

...and really doesn't matter that you aren't an expert in DB. :) We still love you and we're here to help. :D

IMHO, you must do the following:
(after each command follows a short explanation)
(more explanations about why is this upon request)
(the SQL commands are written from top of my head. Perhaps the syntax isn't entirely correct)

Code: Select all

CREATE INDEX Tags_idx ON Images(Tag);
The index should be only on the Tag field and isn't unique. The engine knows how to combine data streams. Generally try to avoid compound indexes, unless you really know what you want to do (they are slower in non-specific cases)

Code: Select all

Create Index Rating_idx on Images(Rating);
Essential in day-by-day work (ok, essential IF we'll improve that Tools > Compare (Shift+C) engine) because rating means 'the very best', 'best', 'mediocre' ...etc. chosen from a series of photos - this is the quality filter which is usually applied after technical (file locations, format, size, ISO etc.) and human-meaning (categories, colors etc.) type of filters.

Code: Select all

Create Index Color_idx on Images(Color);
Very necessary also. Usually the colors mean statuses (as you know already: 'Important', 'Personal', 'To Do'...) and the user expects filtering on these to be blazingly fast. Btw, Zoner Photo Studio has 9 (nine) of them and I don't see why we cannot. Just sayin'

You have also two other columns in the same table - Size and ModifiedDate which, depending on their use, need also indexes. An interesting thing to note is the possibility to use Size field as a strong limiter to find duplicates. But about this, in the future.

As an aside, in TagsTree table you have 2(two) identical indexes - the one is the autoindex created automatically by issuing

Code: Select all

UNIQUE (ImageID, TagID)
and the other you created by hand:

Code: Select all

CREATE INDEX TagsTree_idx ON TagsTree(ImageID,TagID); 
Drop the second one. It slows down the engine without bringing any advantage.

Also, in Tags table you need for sure an index on 'ID' field - something like this:

Code: Select all

Create Index Tags_ID_idx on Tags(ID);
and an index on ParentID field:

Code: Select all

Create Index Tags_ParentID_idx on Tags(ParentID);
The compound index automatically created by the

Code: Select all

UNIQUE (Label, ParentID)
from table creation isn't optimal (ParentID is the 2nd field in the index).

To all of you guys, feel free to comment till I'll come back with more analysis.

Also, some questions for Pierre: (perhaps I could help further)

1. 'Datas' table has the thumbnails in it, right?
1a. Then what's the purpose of 'Created' and 'Size' fields?

2. 'Extra' table - for what is it?

3. 'ImagesInfo' table - I strongly suspect that we can optimize this. Where do you use it? Also, at least, for me the 'Format' column is quite odd. You simply put the extension there?

Btw, in Images table the Meta field has the EXIF info, isn't it?
m. Th.

- Dark Themed XnViewMP 1.7.1 64bit on Win11 x64 -
User avatar
xnview
Author of XnView
Posts: 46238
Joined: Mon Oct 13, 2003 7:31 am
Location: France
Contact:

Re: Database Performance

Post by xnview »

m.Th. wrote: Well, nope. :D

...and really doesn't matter that you aren't an expert in DB. :) We still love you and we're here to help. :D

IMHO, you must do the following:
(after each command follows a short explanation)
(more explanations about why is this upon request)
(the SQL commands are written from top of my head. Perhaps the syntax isn't entirely correct)
Ok, i'll add it, i'll send a link to test if speed is better
1. 'Datas' table has the thumbnails in it, right?
yes
1a. Then what's the purpose of 'Created' and 'Size' fields?
Created date, and size of the size, to compare at loading if file has changed
2. 'Extra' table - for what is it?
For a future use, metadata for example
3. 'ImagesInfo' table - I strongly suspect that we can optimize this. Where do you use it? Also, at least, for me the 'Format' column is quite odd. You simply put the extension there?
Used for search query
Btw, in Images table the Meta field has the EXIF info, isn't it?
yes, IPTC+EXIF
Pierre.
User avatar
m.Th.
XnThusiast
Posts: 1676
Joined: Wed Aug 16, 2006 6:31 am
Contact:

Re: Database Performance

Post by m.Th. »

xnview wrote:
m.Th. wrote: ...
1. 'Datas' table has the thumbnails in it, right?
yes
1a. Then what's the purpose of 'Created' and 'Size' fields?
Created date, and size of the size, to compare at loading if file has changed
I think that you don't need the 'Created' field in the 'Datas' table. Why do you need the 'created' time for a thumbnail? You have already in Images table the Modified field for the file which IMHO is enough to detect the file change.

---------------------

Also, your database is slow because at least of two more factors:

1. Synchronous mode is Full. This is slow. You can do 'pragma synchronous=1' (normal) in order to obtain substantial performance gains with very little risk. Just checked - it works faster.
2. Your page cache is too low (2000 - the default). IOW you allocate only 2MB to SQLite. Boy, that's too low nowadays. Ok, perhaps you can put a setting in DB Cache manager but for the time being perhaps is better to have a 'pragma cache_size=100000; (100 MB)

...or perhaps you do these already?
...also, do you use the BEGIN / END TRANSACTION when you generate the thumbnails?
m. Th.

- Dark Themed XnViewMP 1.7.1 64bit on Win11 x64 -
User avatar
xnview
Author of XnView
Posts: 46238
Joined: Mon Oct 13, 2003 7:31 am
Location: France
Contact:

Re: Database Performance

Post by xnview »

m.Th. wrote: 1. Synchronous mode is Full. This is slow. You can do 'pragma synchronous=1' (normal) in order to obtain substantial performance gains with very little risk. Just checked - it works faster.
What is the risk?
2. Your page cache is too low (2000 - the default). IOW you allocate only 2MB to SQLite. Boy, that's too low nowadays. Ok, perhaps you can put a setting in DB Cache manager but for the time being perhaps is better to have a 'pragma cache_size=100000; (100 MB)
how i use this pragma? 100Mb is not high?
...also, do you use the BEGIN / END TRANSACTION when you generate the thumbnails?
yes
Pierre.
User avatar
oops66
XnThusiast
Posts: 2005
Joined: Tue Jul 17, 2007 1:17 am
Location: France

Re: Database Performance

Post by oops66 »

FI: about the PRAGMA cache_size:

Code: Select all

2.1)
...
Since this setting is dynamic, it can be raised to a large value to optimize a specific set of queries, and then dropped back to a lower value afterwards.

Code: Select all

A.2.3. Btree cache size

The btree cache size set by PRAGMA cache_size has a large affect on speed. You should generally set it to the value you intend to use in your program.

If you want to test the speed of memory-intensive functions, you can set the cache size to a large value to minimize the effect of paging on the timings. After setting the cache size, force the entire table being used to load into memory with a SELECT statement.
A.2.4 Settings stored in the database

Some default PRAGMA settings are stored in the database itself. These need to be controlled for reproducible tests. They can be overridden at runtime with the corresponding non-default PRAGMA. The PRAGMAs are default_cache_size and default_synchronous. 
http://web.utk.edu/~jplyon/sqlite/SQLit ... n_FAQ.html
http://componavt.livejournal.com/3393.html
XnViewMP Linux X64 - Debian - X64
User avatar
JohnFredC
XnThusiast
Posts: 2010
Joined: Wed Mar 17, 2004 8:33 pm
Location: Sarasota Florida

Re: Database performance

Post by JohnFredC »

Good stuff here. Thanks for these posts!
John
User avatar
m.Th.
XnThusiast
Posts: 1676
Joined: Wed Aug 16, 2006 6:31 am
Contact:

Re: Database Performance

Post by m.Th. »

xnview wrote:
m.Th. wrote: 1. Synchronous mode is Full. This is slow. You can do 'pragma synchronous=1' (normal) in order to obtain substantial performance gains with very little risk. Just checked - it works faster.
What is the risk?
http://www.sqlite.org/pragma.html#pragma_synchronous
xnview wrote:
m.Th. wrote: 2. Your page cache is too low (2000 - the default). IOW you allocate only 2MB to SQLite. Boy, that's too low nowadays. Ok, perhaps you can put a setting in DB Cache manager but for the time being perhaps is better to have a 'pragma cache_size=100000; (100 MB)
how i use this pragma? 100Mb is not high?
high?... High???... HIGH????... :shock:

Or perhaps is better 200Mb?? Also, perhaps is much better to express them in kbytes: pragma cache_size=-204800; (200 MB) because most probably we will allow changing of the DB page size on Win - the DB's page size of 1024kb is inadequate on Windows (NTFS has 4096 as default page size) and leads to slowdowns. But on this, latter.

How to use? Execute it at the beginning of the program like any other SQLite instruction. Like any other SELECT or whatever. Perhaps you're confused because in C/C++ pragma is a slightly different being.

xnview wrote:
m.Th. wrote: ...also, do you use the BEGIN / END TRANSACTION when you generate the thumbnails?
yes
Let's say that you have a folder with 5000 images in it. How often do you commit (End Transaction) when you build the thumbnails?
m. Th.

- Dark Themed XnViewMP 1.7.1 64bit on Win11 x64 -
User avatar
xnview
Author of XnView
Posts: 46238
Joined: Mon Oct 13, 2003 7:31 am
Location: France
Contact:

Re: Database Performance

Post by xnview »

No risk if 2 threads work on same DB?

I've found THAT too
m.Th. wrote: Let's say that you have a folder with 5000 images in it. How often do you commit (End Transaction) when you build the thumbnails?
only 1 BEGIN ... END transaction
Pierre.
User avatar
m.Th.
XnThusiast
Posts: 1676
Joined: Wed Aug 16, 2006 6:31 am
Contact:

Re: Database Performance

Post by m.Th. »

xnview wrote:
No risk if 2 threads work on same DB?
Nope.

Ok, there are risks but not because of SQLite's engine (assuming that you use the default compilation settings) but because of your code (you know, race conditions, deadlock between threads - the classical problems of multithreaded programming). Of course, special care must be accorded to some special situations like reading with a thread a table which is written with another, multiple threads which write simultaneously etc.

See http://stackoverflow.com/questions/1680 ... pplication

I've found THAT too
m.Th. wrote: Let's say that you have a folder with 5000 images in it. How often do you commit (End Transaction) when you build the thumbnails?
only 1 BEGIN ... END transaction
Hmmm... mhmmmm... mmmm....

If the items are many (let's say more than 5000-7000 - perhaps some tests might be usefull) then I'd commit at each 5000 (or 7000? ...10000??). This is not only for performance reasons (a big transaction eats memory and it can lead to swapping) but also because of frustration danger which is much more important sometimes.

What means frustration danger?

It is the amount of frustration (pain) which the user feels if something goes wrong.

Example: If I have a folder with 500 photos in it the thumb generation keeps - let's say - 10 seconds. Ok, if the PC crashes, the DB will be ok but I'll lose the last transaction ie. 10 seconds. But if I say 'Show Files (recursive)' to trigger the generation of thumbnails for a root folder with a sum of 20000 of photos which will keep 2-3 hours and I'll lose that work because something goes wrong and the transaction isn't commited, then I'll become frustrated.
m. Th.

- Dark Themed XnViewMP 1.7.1 64bit on Win11 x64 -
User avatar
xnview
Author of XnView
Posts: 46238
Joined: Mon Oct 13, 2003 7:31 am
Location: France
Contact:

Re: Database Performance

Post by xnview »

m.Th. wrote: Ok, there are risks but not because of SQLite's engine (assuming that you use the default compilation settings) but because of your code (you know, race conditions, deadlock between threads - the classical problems of multithreaded programming). Of course, special care must be accorded to some special situations like reading with a thread a table which is written with another, multiple threads which write simultaneously etc.

See http://stackoverflow.com/questions/1680 ... pplication
I've seen that
You must call open on your sqlite file to create a connection on each thread, don't share connections between threads.
Currently i share my db connection, i can have a thread that build thumbnails and one (or more) that read...
If the items are many (let's say more than 5000-7000 - perhaps some tests might be usefull) then I'd commit at each 5000 (or 7000? ...10000??). This is not only for performance reasons (a big transaction eats memory and it can lead to swapping) but also because of frustration danger which is much more important sometimes.

What means frustration danger?

It is the amount of frustration (pain) which the user feels if something goes wrong.

Example: If I have a folder with 500 photos in it the thumb generation keeps - let's say - 10 seconds. Ok, if the PC crashes, the DB will be ok but I'll lose the last transaction ie. 10 seconds. But if I say 'Show Files (recursive)' to trigger the generation of thumbnails for a root folder with a sum of 20000 of photos which will keep 2-3 hours and I'll lose that work because something goes wrong and the transaction isn't commited, then I'll become frustrated.
yes right, perhaps 1000 is good to commit...
Pierre.
User avatar
xnview
Author of XnView
Posts: 46238
Joined: Mon Oct 13, 2003 7:31 am
Location: France
Contact:

Re: Database Performance

Post by xnview »

I've made some tests

19000 pictures (PNG+JPEG)

Thumbnails creation without DB => 14min

Thumbnails creation with DB => 15min
Thumbnails loading => 9sec

Modifying cache size or synchronous has no effect here
Pierre.
User avatar
m.Th.
XnThusiast
Posts: 1676
Joined: Wed Aug 16, 2006 6:31 am
Contact:

Re: Database Performance

Post by m.Th. »

xnview wrote:
m.Th. wrote:
I've seen that
You must call open on your sqlite file to create a connection on each thread, don't share connections between threads.
Currently i share my db connection, i can have a thread that build thumbnails and one (or more) that read...
The above statement is valid IF the SQLite is NOT compiled with the default settings (which basically serializes the db access). So, from SQLite's POV, you are ok to have the architecture above.
yes right, perhaps 1000 is good to commit...
...and even more. Do a very unscientific benchmark and see how many thumbs do you generate in, let's say 20-25 secs. Commit there.
Last edited by m.Th. on Mon Jun 24, 2013 9:18 am, edited 1 time in total.
m. Th.

- Dark Themed XnViewMP 1.7.1 64bit on Win11 x64 -
User avatar
m.Th.
XnThusiast
Posts: 1676
Joined: Wed Aug 16, 2006 6:31 am
Contact:

Re: Database Performance

Post by m.Th. »

xnview wrote:I've made some tests

19000 pictures (PNG+JPEG)

Thumbnails creation without DB => 14min

Thumbnails creation with DB => 15min
Thumbnails loading => 9sec

Modifying cache size or synchronous has no effect here
Sure. Cache size has no effect on thumbnail creation. It has on searching (READING data from DB). Especially on repetable searches (either by the program either by the user through the search form).

About Sync: No such a big surprise (unfortunately?). It seems that the SQLite's IO engine keeps up with your thumbnail generation. IOW, the IO engine succeds to do its job (save the data, update the indexes etc.) till you generate the next thumbnail (I suspect that here the bottleneck is mainly the image's read time from disk) and prepare the next query for it.
m. Th.

- Dark Themed XnViewMP 1.7.1 64bit on Win11 x64 -
User avatar
xnview
Author of XnView
Posts: 46238
Joined: Mon Oct 13, 2003 7:31 am
Location: France
Contact:

Re: Database Performance

Post by xnview »

m.Th. wrote: The above statement is valid IF the SQLite is NOT compiled with the default settings (which basically serializes the db access). So, from SQLite's POV, you are ok to have the architecture above.
I use THREADSAFE=1 to build sqlite
Pierre.
User avatar
m.Th.
XnThusiast
Posts: 1676
Joined: Wed Aug 16, 2006 6:31 am
Contact:

Re: Database Performance

Post by m.Th. »

xnview wrote:
m.Th. wrote: The above statement is valid IF the SQLite is NOT compiled with the default settings (which basically serializes the db access). So, from SQLite's POV, you are ok to have the architecture above.
I use THREADSAFE=1 to build sqlite
Yes, this is the default setting. It serializes the queries.
m. Th.

- Dark Themed XnViewMP 1.7.1 64bit on Win11 x64 -
Post Reply