0.61: Database optimization: Unclear and too complicated

Ideas for improvements and requests for new features in XnView MP

helmut
Posts: 8705
Joined: Sun Oct 12, 2003 6:47 pm
Location: Frankfurt, Germany

Post by helmut

When translating to German, I look up the appropriate screen (Tools > Settings | Database) to find out the context of the labels. When translating the strings for database optimization, I found that things are a bit unclear:
What does Clean files and Clean thumbnails do? What's the difference between the "Delete all" button and the "Clean thumbnails" checkbox? An "Optimize..." button which opens a dialog with an "Optimize" checkbox?

That many options needed?
My impression is that there are too many options. Do users really need that many options? Why?

Button "Maintenance..."
From my perspective, database optimization is not optimization but maintenance. And the dialog's title is already 'Database maintenance'. So the button should be labeled "Maintenance...".

Clearer Labels
The labels should be clearer and consistent. Partly it says "Do this", "Do that". And some labels say "Check this", "Check that". This is not consistent. And what happens after checking?

== Currently:
1. Optimize (long process)
2. Remove empty directories
3. Clean thumbnails
4. Clean files
5. Check for orphaned directories
6. Check for orphaned files

== Suggestion:
1. Optimize database internals (may take long)
3. Clean thumbnails ???
4. Clean files ???
2. + 5. Remove empty or orphaned directories
6. Remove orphaned files

It would be good if someone could explain what each checkbox does.
m.Th.
XnThusiast
Posts: 1662
Joined: Wed Aug 16, 2006 6:31 am

Re: 0.61: Database optimization: Unclear and too complicated

Post by m.Th.

Let me try... IIRC :)

First of all, we must define the notions of cataloged vs. scanned files (sorry, but I don't have better words right now):

A Scanned file is a file which is added to (entered into) the database by the (automatic) subsystem which builds the thumbnails and records all the basic (meta)data about that file: name, directory, EXIF, machine-generated IPTC, etc.

A Cataloged file is a Scanned file to which a user has added information (information = data with human meaning): color, rating, keywords/categories, tags/albums, user comments (and other human-generated IPTC), etc.

So, Cataloged = Scanned + User-Entered Information
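To make the distinction concrete, here is a minimal Python sketch. The class and its field names are hypothetical illustrations, not XnView MP's real database schema: scanner-filled fields on one side, user-entered fields on the other, and a file counts as cataloged as soon as any user field is set.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FileRecord:
    # Filled in automatically by the scanner ("Scanned" data); hypothetical fields
    path: str
    exif: dict = field(default_factory=dict)
    thumbnail: Optional[bytes] = None
    # Entered by a human ("User-Entered Information"); hypothetical fields
    rating: int = 0
    color_label: int = 0
    keywords: List[str] = field(default_factory=list)
    comment: str = ""

    def is_cataloged(self) -> bool:
        """Cataloged = Scanned + at least one piece of user-entered information."""
        return bool(self.rating or self.color_label or self.keywords or self.comment)
```

So a record with only EXIF and a thumbnail is merely scanned, while setting a rating or a single keyword makes it cataloged.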

Keep this in mind when we discuss the options below. :)
helmut wrote: When translating to German, I look up the appropriate screen (Tools > Settings | Database) to find out the context of the labels. When translating the strings for database optimization, I found that things are a bit unclear:
What does Clean files and Clean thumbnails do?
Clean files:

Removes the (meta)data AND the thumbnails of the scanned files, but NOT of the cataloged ones. IOW, all the non-cataloged files disappear from the DB. Quite fast, and very useful for removing the 'fat'/cruft from the DB. Very efficient.
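A minimal sketch of what "Clean files" amounts to, assuming an SQLite database with a hypothetical `files` table (thumbnail stored in the same row) and a hypothetical `file_keywords` table; the real XnView MP schema is not described in this thread. Only rows carrying no user-entered information are deleted, so cataloged files survive.

```python
import sqlite3

# Hypothetical schema: the user columns (and keywords) are what make a file "cataloged"
SCHEMA = """
CREATE TABLE files (
    id INTEGER PRIMARY KEY,
    path TEXT NOT NULL,
    thumb BLOB,                         -- thumbnail built by the scanner
    rating INTEGER NOT NULL DEFAULT 0,
    color_label INTEGER NOT NULL DEFAULT 0,
    comment TEXT NOT NULL DEFAULT ''
);
CREATE TABLE file_keywords (file_id INTEGER NOT NULL, keyword TEXT NOT NULL);
"""


def clean_files(conn: sqlite3.Connection) -> int:
    """Delete every file row (metadata + thumbnail) that has no user-entered data."""
    cur = conn.execute("""
        DELETE FROM files
        WHERE rating = 0 AND color_label = 0 AND comment = ''
          AND NOT EXISTS (SELECT 1 FROM file_keywords k WHERE k.file_id = files.id)
    """)
    conn.commit()
    return cur.rowcount  # number of non-cataloged files removed
```

Because the condition is evaluated inside the database in one statement, no file on disk is touched and no per-file round trip is needed, which fits the "quite fast" remark above.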

Clean thumbnails:

Removes the thumbnails but keeps the (meta)data, for the scanned files AND for the cataloged ones. By their very nature, the thumbs are by far what drags performance down (big DB size, etc.). Removing the thumbs of the cataloged files is no problem (if one knows what one is doing) because they are regenerated on demand.
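Under the same kind of hypothetical schema, "Clean thumbnails" boils down to blanking the thumbnail column, and "regenerated on demand" means a missing thumbnail is rebuilt the next time it is requested. This is a sketch under those assumptions; `make_thumb` is a hypothetical stand-in for whatever thumbnail generator the program actually uses.

```python
import sqlite3
from typing import Callable

# Hypothetical minimal schema: metadata and thumbnail in the same row
SCHEMA = "CREATE TABLE files (id INTEGER PRIMARY KEY, path TEXT NOT NULL, thumb BLOB, rating INTEGER NOT NULL DEFAULT 0)"


def clean_thumbnails(conn: sqlite3.Connection) -> int:
    """Drop all stored thumbnails; paths and (meta)data stay untouched."""
    cur = conn.execute("UPDATE files SET thumb = NULL WHERE thumb IS NOT NULL")
    conn.commit()
    return cur.rowcount


def get_thumbnail(conn: sqlite3.Connection, file_id: int,
                  make_thumb: Callable[[str], bytes]) -> bytes:
    """Return the stored thumbnail, regenerating and caching it if it was cleaned."""
    path, thumb = conn.execute(
        "SELECT path, thumb FROM files WHERE id = ?", (file_id,)).fetchone()
    if thumb is None:
        thumb = make_thumb(path)  # hypothetical generator: reads the image, returns bytes
        conn.execute("UPDATE files SET thumb = ? WHERE id = ?", (thumb, file_id))
        conn.commit()
    return thumb
```

The cost of cleaning is therefore paid later, one thumbnail at a time, only for the files that are actually browsed again.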

What's the difference between the "Delete all" button and the "Clean thumbnails" checkbox?

"Delete all" deletes ALL the entries (thumbs + (meta)data for scanned + cataloged). Use at your own risk!


An "Optimize..." button which opens a dialog with an "Optimize" checkbox?
Droste effect GUI. Cooool: :) ...scroll to see and click on your button of choice:
[Image: Droste.jpg]
...but I agree that perhaps we must change something...


That many options needed?
My impression is that there are too many options. Do users really need that many options? Why?
YES, they are needed. In every DAM I've seen (and I've seen a few), one of the critical issues was/is keeping the DB in shape in the long run. However, I agree that we must better explain what each option means. The very first GUI improvement, IMHO, would be to change the labels to more descriptive texts and to add hints further explaining what each checkbox does.


Button "Maintenance..."
From my perspective, database optimization is not optimization but maintenance. And the dialog's title is already 'Database maintenance'. So the button should be labeled "Maintenance...".
Good idea. However, people are more inclined to click 'Optimize...' than 'Maintenance...'. I would like to keep 'Optimize...' because one will click 'Maintenance...' only when things start to get really bad. Another idea would be to run some maintenance tasks (pardon, optimizations :) ) in the background and/or at the end of a session (like in Lr / ACDSee) or at the beginning (like in ASP), either automatically or after a dialog asking the user.

Clearer Labels
The labels should be clearer and consistent. Partly it says "Do this", "Do that". And some labels say "Check this", "Check that". This is not consistent. And what happens after checking?
+1


== Currently:
1. Optimize (long process)
In the long run, the DB file becomes fragmented, so the time needed to retrieve a given piece of information increases (sometimes a lot), for two reasons: 1) disk fragmentation: the disk head has to jump from sector to sector to find the requested data; and 2) pointer/page fragmentation: the program has to jump from page to page, following a (very) deep and frequently accessed chain of pointers, to reach the desired record or index bucket.

That's why DB engines have a feature (usually called VACUUM) which freezes the current DB, copies its entire content into a new file (which is, of course, clean and compact), deletes the old DB, renames the new DB to the old name and reconnects to it. Because the whole database has to be copied, the process is very slow for a large file.
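The effect can be observed with Python's built-in `sqlite3` module, assuming an SQLite-backed catalog (the table below is a made-up stand-in). In SQLite, VACUUM copies the database into a temporary database and then writes it back over the original file; with the default settings, deleted data only leaves free pages on the freelist until then. The sketch fills a table, deletes most rows, and compares `PRAGMA page_count` and `PRAGMA freelist_count` before and after VACUUM.

```python
import os
import sqlite3
import tempfile


def pragma(conn: sqlite3.Connection, name: str) -> int:
    """Read a single-integer PRAGMA such as page_count or freelist_count."""
    return conn.execute(f"PRAGMA {name}").fetchone()[0]


def vacuum_demo():
    """Return (page_count, freelist_count) before and after VACUUM."""
    with tempfile.TemporaryDirectory() as tmp:
        # isolation_level=None: autocommit, so VACUUM never runs inside an open transaction
        conn = sqlite3.connect(os.path.join(tmp, "demo.db"), isolation_level=None)
        conn.execute("CREATE TABLE thumbs (id INTEGER PRIMARY KEY, data BLOB)")
        conn.execute("BEGIN")
        conn.executemany("INSERT INTO thumbs (data) VALUES (?)",
                         [(b"x" * 1000,) for _ in range(1000)])
        conn.execute("COMMIT")
        # Deleting rows frees pages inside the file, but the file itself does not shrink
        conn.execute("DELETE FROM thumbs WHERE id <= 800")
        before = (pragma(conn, "page_count"), pragma(conn, "freelist_count"))
        conn.execute("VACUUM")  # rebuild the whole database into compact, contiguous pages
        after = (pragma(conn, "page_count"), pragma(conn, "freelist_count"))
        conn.close()
    return before, after
```

After VACUUM the freelist is empty and the page count drops; the price is that every remaining page is rewritten, which is why the option is labeled as a long process.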

2. Remove empty directories
There is a good chance that the 'Folders' table keeps records for directories with 0 (zero) scanned files (files deleted, moved, etc.; there are many reasons). It just happens. This option removes these entries, which are just noise. The process is very fast and efficient. Over time, these directories accumulate (especially in a 'black-white' program like XnViewMP, which automatically enters every folder it visits into the DB) and should be removed. The problem is smaller for 'white-only' programs (programs which require an 'Import to Catalog' step before you can work with the files, e.g. Lr).
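Removing empty directories can be a single anti-join inside the database, which is why it is fast. This sketch assumes a hypothetical `folders`/`files` schema with an index on `files.folder_id`; only database rows are deleted, nothing on disk. Note that under this literal definition a folder containing only subfolders (and no files of its own) also counts as empty; how XnView MP handles that case is not stated in this thread.

```python
import sqlite3

# Hypothetical schema; the index keeps the NOT EXISTS lookup cheap
SCHEMA = """
CREATE TABLE folders (id INTEGER PRIMARY KEY, path TEXT NOT NULL UNIQUE);
CREATE TABLE files (id INTEGER PRIMARY KEY, folder_id INTEGER NOT NULL, name TEXT NOT NULL);
CREATE INDEX files_by_folder ON files(folder_id);
"""


def remove_empty_directories(conn: sqlite3.Connection) -> int:
    """Delete folder rows that have zero file rows; returns how many were removed."""
    cur = conn.execute("""
        DELETE FROM folders
        WHERE NOT EXISTS (SELECT 1 FROM files WHERE files.folder_id = folders.id)
    """)
    conn.commit()
    return cur.rowcount
```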


3. Clean thumbnails
Explained above.

4. Clean files
See above.

5. Check for orphaned directories
It removes from the database the directories which the OS can no longer find. There is a very big discussion to be had about what happens with offline media (removable media, servers which can go down, network permissions, etc.), with already cataloged files, with relocation, etc., but for the time being just take it as is. This is very, very fast and very efficient at keeping DB performance up... provided one is careful not to wipe cataloged data from places one didn't intend to.
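A sketch of the orphaned-directory check: ask the OS whether each stored folder path still exists. `find_orphaned_directories` is a hypothetical helper, and it only returns candidates for removal from the DB. The offline-media problem mentioned above is handled only crudely: paths whose drive root is unreachable are skipped, which may help for an unplugged drive letter or an unreachable share on Windows, but does nothing for mount points on Linux/macOS, where the root "/" always exists.

```python
import os
from pathlib import PurePath
from typing import Iterable, List


def find_orphaned_directories(folder_paths: Iterable[str]) -> List[str]:
    """Return stored folder paths the OS can no longer find (candidates for DB removal)."""
    orphans = []
    for p in folder_paths:
        root = PurePath(p).anchor  # e.g. "D:\\" on Windows, "/" on POSIX
        # Crude offline-media guard: if the drive root itself is unreachable,
        # skip the path instead of flagging its folders as deleted.
        if root and not os.path.exists(root):
            continue
        # os.path.isdir() also returns False when access is denied, so a
        # permissions problem looks exactly like a deleted folder.
        if not os.path.isdir(p):
            orphans.append(p)
    return orphans
```

Only one filesystem check per folder is needed, which is why this option is so fast.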

6. Check for orphaned files
The same principle as 'Check for orphaned directories', but the process is much slower (each file has to be checked individually) and less efficient.
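The per-file check explains the slowness: it needs one filesystem call per file instead of one per folder. This hypothetical helper at least avoids checking the files of a folder that is already gone, by testing each folder only once and caching the result.

```python
import os
from typing import Dict, Iterable, List


def find_orphaned_files(file_paths: Iterable[str]) -> List[str]:
    """Return stored file paths that no longer exist on disk."""
    orphans = []
    folder_exists: Dict[str, bool] = {}  # folder path -> exists?, checked once per folder
    for p in file_paths:
        folder = os.path.dirname(p)
        if folder not in folder_exists:
            folder_exists[folder] = os.path.isdir(folder)
        # A missing folder means all its files are orphans: no per-file call needed.
        # Otherwise it costs one filesystem call per file, which is what makes this slow.
        if not folder_exists[folder] or not os.path.isfile(p):
            orphans.append(p)
    return orphans
```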


== Suggestion:
1. Optimize database internals (may take long)
Foreword:

For all the items below, I reiterate that IMHO we should add some hints (this works very well in ASP and IDImager) or some other form of supplemental explanation. For example, a text box at the bottom of the form showing the corresponding explanation when one hovers the mouse over an option (like in Photoshop) would also be a great idea (if Qt allows hooking the OnMouseEnter and OnMouseLeave events).

Also, another very important thing: IMHO it is necessary to add a label at the very top of the form (yes, the form should be bigger) saying: "All operations are performed on the database only. No operation will be applied to the actual files and directories."

Returning to your question, my proposal for this option is almost the same:

1. Optimize database internals (may take a long time)

3. Clean thumbnails ???
3. Remove unused thumbnails

4. Clean files ???
4. Remove the files without catalog info

- or - (shorter)

4. Remove the unused files

2. + 5. Remove empty or orphaned directories
Nope. Leave them separate. They are two very different things.

2. Remove empty directories
5. Remove orphaned directories

6. Remove orphaned files
6. Remove orphaned files

It would be good if someone could explain what each checkbox does.
...well, I tried to be good. :)

I don't know if I succeeded.
m. Th.

helmut
Posts: 8705
Joined: Sun Oct 12, 2003 6:47 pm
Location: Frankfurt, Germany

Re: 0.61: Database optimization: Unclear and too complicated

Post by helmut

Wow, that's an explanation! Thanks, m.Th. :-) I have to digest and reread this and will reply soon.

PS: Very cool, this Droste recursive effect. ;-)