Very important: Metadata V2 for 0.70 (EXIF, XMP, IPTC etc.)

Ideas for improvements and requests for new features in XnView MP

Moderators: XnTriq, xnview

User avatar
oops66
XnThusiast
Posts: 1999
Joined: Tue Jul 17, 2007 1:17 am
Location: France

Re: Very important: Metadata V2 for 0.70 (EXIF, XMP, IPTC et

Post by oops66 » Fri Jul 26, 2013 2:28 pm

m.Th. wrote:Much updated version including the mapping for the XnView's fields, Metadata2 table and other changes at
https://docs.google.com/spreadsheet/pub ... utput=html
Enjoy & ask!
... Good summary, thank you ... it's OK for me.
XnViewMP 0.82 Linux X64 - Ubuntu 16.04 LTS - X64

BabaG
Posts: 150
Joined: Sat Mar 03, 2007 6:41 am

Re: Very important: Metadata V2 for 0.70 (EXIF, XMP, IPTC et

Post by BabaG » Sat Jul 27, 2013 12:24 am

m.Th. wrote:- metadata1 for most used and most often - high-performance table - mainly the EXIF info
- metadata2 which will act like a L2 cache, mainly for big texts (mainly IPTC info) which aren't so often entered and sought
i think i'd only ever want to search iptc, never exif.

BabaG

Gérard 91
Posts: 88
Joined: Wed Sep 23, 2009 8:33 pm

Re: Very important: Metadata V2 for 0.70 (EXIF, XMP, IPTC et

Post by Gérard 91 » Sat Oct 19, 2013 12:17 pm

This post is interesting (but a bit too technical for boetian like me!).

I don't understand why exif are so important for request. Of course I'm interested by information about picture (exif = how the picture was shoot), but I had never searched pictures upon these criteria.
I don't say that nobody needs to find picture through, but an amateur photographer, like me, needs mainly to find pictures matching a (or several) subject . These are in IPTC/XMP ( rating, keywords, and some content of caption ... mainly what picture represents).
Reliable IPTC/XMP is the first need for me, and quick find the second one.

User avatar
m.Th.
XnThusiast
Posts: 1578
Joined: Wed Aug 16, 2006 6:31 am
Contact:

Re: Very important: Metadata V2 for 0.70 (EXIF, XMP, IPTC et

Post by m.Th. » Wed Oct 23, 2013 8:22 am

I don't understand why exif are so important for request. Of course I'm interested by information about picture (exif = how the picture was shoot), but I had never searched pictures upon these criteria.
Of course, the 1st should be the content management.

However, this is very tedious to enter and the difference in importance between some fields in IPTC is very very big (for ex. between Keywords and everything else).

Also, this info is NOT searched in IPTC/XMP because it would be very slow. It is searched in dedicated tables inside of DB. Hence we have 3 (three) layers:

- the actual one: Keywords/Categories (etc.), Rating / Color
- EXIF (small, dense fields and a professional photographer uses them often - I'll explain bellow)
- IPTC (these fields are rarely searched and very big - also rarely filled by the common man)

Several reasons from top of my head:

1. Choosing the Gear & Settings

A pro, in order to take the best shots possible, needs to plan carefully his assignment. He is forced to have an interchangeable lens camera - usually a dSLR (no flame wars please - thank you) with several lens with him. Usually he will have shots in advance from the place where the even will take place, either from a previous assignment, either because he went on purpose there.

Then he will do filters with EXIF data in order to see how the photos look at ISO 3200, 6400, 12800 etc. - how the people (and other moving subjects) behave at medium-slow shutter speeds, what Depth of Field needs in a certain circumstances etc. Also, he can do his tests with a simple zoom at different focal lengths in order to see what low-light primes will choose. For this, he will do the appropriate filters in the program to see where the focal length of 35 mm fits, where the 50 mm it is ok aso.

The same stands for lens quality. We have a stellar lens but it is too wide. Or is a prime. In the concrete conditions can we use a zoom? Yes / no because the quality difference is NOT / is noticeable. For all these we need EXIF filters.

2. Choosing Photos

Many times the portfolio from which we must choose the photos are from different cameras / lenses / light conditions. Even if we do our keyword filtering the subset which remain is too broad (eg. Landscapes \ Tower Eiffel) and then we're forced to use EXIF filters - we will choose the better (newer) camera, the lower ISO (perhaps), the smaller aperture (in order to have DoF) etc.

Also - mainly in the last years - the Author (photographer) is embedded in the EXIF (pro and semi-pro cameras support this thing) and this can be a strong info about the mood / quality / kind of the photos.

3. EXIF comes 'for free'

EXIF is generated info by the camera. Automagically.
IPTC needs to be entered by hand. And this is a very time-consuming and tedious job. Yes, I know a little - I did it for enough time. That's why IPTC info (except RCK - Rating/Color/Keywords) is rarely filled. So why put in the first place a search engine for some fields which are usually empty?

Hunting by peeking at the photos is not enough. This is one of the main reasons for which in almost all the programs (Lightroom, ASP, ACDSee etc.) these filters occupy an important place in GUI whereas the IPTC info (except, of course, Rating/Color/Keywords) is somewhere in back.
m. Th.

The Ascetic Experience - The best photos and texts from Holy Mountain (Athos)

- Dark Themed XnViewMP 0.90 64bit & XnView 2.00 x64 on Win7 x64 -

CameronD
Posts: 303
Joined: Wed Aug 01, 2007 1:28 pm
Location: Australia

Re: Very important: Metadata V2 for 0.70 (EXIF, XMP, IPTC et

Post by CameronD » Thu Oct 24, 2013 8:44 am

Gérard 91 wrote:.... an amateur photographer, like me, needs mainly to find pictures matching a (or several) subject . These are in IPTC/XMP ( rating, keywords, and some content of caption ... mainly what picture represents)...
As another amateur photographer, I echo Gérard's comments.

IPTC:Caption
In my case, I tend to use xnview for preparing metadata (due to its excellent handling of multiple files and not destroying Makernotes), but often view using Picasa. Picasa's current strategy, where the search term can be located anywhere in keyword, IPTC caption, filename or folder name fits well with my existing images and storage hierarchy. It is obviously not very flexible, but its simplicity makes it very convenient.

The critical point is that I make a lot of use of the caption field. I chose the IPTC caption field some years ago, after a few painful starts discovering captioning methods that often one program supported but no others did. In particular, the exif comment or description field(s) seemed to be generally ignored, with Windows XP going so far as to invent their own fields. The IPTC caption is supported both by Win-7, Picasa and irfanview, as well as more expensive software.

Of course, what I don't understand is whether the presence of IPTC:Caption in m2 means that it already is intended to make it easy to search for. I was just worried by m.Th's comments, which I interpreted as saying IPTC coverage would be limited to RCK.

In terms of display rather than search, I have been using Windows-7 file explorer (and select tags and title so that I can have keywords and caption (resp.) displayed in "details" layout. But now I see they are available in MP display, so that is excellent.

Most other IPTC fields are quite arcane and of little use to me and I presume the vast majority of other users. However, part of XnViewMP's design philosophy must hinge on whether its priority is for professional photographers or not. If the examples for prioritising Exif data for searching are reliant on the needs of pros, then surely they are the only users who would also need most of the IPTC fields. I do not know XnView's audience but I would guess the fraction of pros would be very small. Not only that, I expect they would already be using rather more capable commercial software for managing their vast collection.

Dates
I notice some of the fields are listed as DATE - should I read that as sql's DATETIME? I would not like to think of the time of day being dropped. I often have images from multiple cameras in the one folder for one day, and sorting them by time taken can be very useful.

IPTC:keywords
I see you put them in m1, but only with some cryptic comment about "From TagsTree table". Does this mean that a new entry in this table will be created whenever a new keyword is found when scanning an old image file? Does this still work when the user has unselected the option "Import ... IPTC keyword to DB Categories"

duplicate/conflicting fields
I see you list "EXIF:Image Description" as a duplicate of IPTC:Caption. Fine, but what if they are different?
If somebody uses win-7 to edit the image title, then Win 7 will duplicate the text here plus in 3 more places: EXIF:XPTitle, XMP:description[1] and XMP:title[1]. If I then use XnView (classic or MP) to edit the IPTC caption it does not touch the EXIF fields. Both versions update the XMP:description as expected, but the classic version (2.05) also deletes the XMP:title field, while MP (0.61) leaves it as win-7 wrote it.
This is just one example, which in this case could be "fixed" by changing XnView to write different fields if it detects them, but I have come across plenty of examples in the past where duplicated fields have gotten out of sync. So if certain fields are going to be treated as equivalent then there will need to be some mechanism of resolving conflicts when they are detected.

User avatar
m.Th.
XnThusiast
Posts: 1578
Joined: Wed Aug 16, 2006 6:31 am
Contact:

Re: Very important: Metadata V2 for 0.70 (EXIF, XMP, IPTC et

Post by m.Th. » Fri Oct 25, 2013 8:29 am

Of course, what I don't understand is whether the presence of IPTC:Caption in m2 means that it already is intended to make it easy to search for. I was just worried by m.Th's comments, which I interpreted as saying IPTC coverage would be limited to RCK.
For our (sub)discussion here, the m1 and m2 tables gives you the advantage to speed-up the things when you search on IPTC:Caption.

Why?

Because only in corner cases one will submit queries which will force the program to search in BOTH m1 and m2 (their scope is very different) and hence we will have an important speedup. If we would have all the data in one big table, then the table would be very wide (many big fields ==> few records / page ==> many I/O requests ==> slowness) and the biggest fields (IPTC ones) would be mostly empty which leads to speed reduction because, in fact, to find the desired info the program must skip the null fields. These null values in IPTC fields happens because nowadays any photo has EXIF but seldom we see IPTC info on them like you said:
Most other IPTC fields are quite arcane and of little use to me and I presume the vast majority of other users.


Yes, we must cover (almost) all the IPTC namespace not only RCK. :)
I do not know XnView's audience but I would guess the fraction of pros would be very small. Not only that, I expect they would already be using rather more capable commercial software for managing their vast collection.
Yes, at this moment this program is (mostly) Lightroom. But it has (IMHO) "unfixable" architecture 'drawbacks' and design limitations (most of them related to Catalog handling). And this is by design - not because they are stupid. That's why many search for another alternative. And this alternative can be XnView MP.
I notice some of the fields are listed as DATE - should I read that as sql's DATETIME?
Good point. The fields which are already in the DB (in fact only one field: Images.ModifiedDate) are DATE only. However I think that at least DateTaken should have the full DateTime info. I updated the schema accordingly. Also, I'm thinking about the similar field (for us) DateDigitized to leave as DATE or - better - to break it in 3 fields: Year, Month, Day in order to have a very fast (and nice) filtering engine based on calendar. What do you think?
IPTC:keywords
I see you put them in m1, but only with some cryptic comment about "From TagsTree table". Does this mean that a new entry in this table will be created whenever a new keyword is found when scanning an old image file? Does this still work when the user has unselected the option "Import ... IPTC keyword to DB Categories"
No, the're not in the m1 (or in m2). If you look closer, you'll see that the JoinedTbl column is empty and, also, in the 2nd sheet where I present the tables separately (called 'Unified Metadata Space') the keywords are absent. They will remain, like they are now, in the TagsTree table - a separate table with a classic layout for these tree-like structures.
...So if certain fields are going to be treated as equivalent then there will need to be some mechanism of resolving conflicts when they are detected.
+1

Usually this is done by using a list like this:

Field 1A, Field 1B, Field 1C...
Field 2A, Field 2B, Field 2C...
Field 3A, Field 3B,...
etc.

...in which the first non-null/non-empty field wins.

IOW, If the field is found in the list / matrix then if the said field has a in front of it another field with a non-null value, the new field will replace the said field.

Example: if both Field 1C and Field 1A have a non-null value then the Field 1A will replace the value of field 1C.
Also, one can implement more advanced techniques by asking the user etc.
m. Th.

The Ascetic Experience - The best photos and texts from Holy Mountain (Athos)

- Dark Themed XnViewMP 0.90 64bit & XnView 2.00 x64 on Win7 x64 -

CameronD
Posts: 303
Joined: Wed Aug 01, 2007 1:28 pm
Location: Australia

Re: Very important: Metadata V2 for 0.70 (EXIF, XMP, IPTC et

Post by CameronD » Tue Oct 29, 2013 2:40 pm

Thanks for taking to time to explain...
m.Th. wrote:...Because only in corner cases one will submit queries which will force the program to search in BOTH m1 and m2 (their scope is very different) and hence we will have an important speedup.
Yes, I agree that my searches would fit that pattern, not needing m1 and m2 at the same time.
Yes, we must cover (almost) all the IPTC namespace not only RCK. :)

I am still confused by this reference to RCK as IPTC namespace - I can find no entry for Rating or colour in IPTC. It seems to me they are XnView special values that exist only in the database. They are certainly not written to the IPTC or XMP blocks in the image file itself.
Also, I'm thinking about the similar field (for us) DateDigitized to leave as DATE or - better - to break it in 3 fields: Year, Month, Day in order to have a very fast (and nice) filtering engine based on calendar.[/b] What do you think?
For my use, DateDigitized would be different from DateTaken only when I have scanned some old films/prints. In that case I would think the time of day is generally unimportant, but still present in the image file itself should it be needed. However, if your intention was to use it as a sort of proxy for DateTaken then it might given very misleading results on the few occasions when the dates do differ. Of course, with my old photos, I am lucky if I know the month they were taken, let alone anything more precise, but others might have kept much better records. I suspect a calendar-based search is something I would not use very much.
IPTC:keywords
No, the're not in the m1 (or in m2). If you look closer, you'll see that the JoinedTbl column is empty and, also, in the 2nd sheet where I present the tables separately (called 'Unified Metadata Space') the keywords are absent. They will remain, like they are now, in the TagsTree table - a separate table with a classic layout for these tree-like structures.
Yes, sorry, I was being sloppy. I vaguely understood what was happening.

I am still unclear about the program's behaviour when it comes across a new keyword. For example, many of my old images processed with XnView classic have keywords assigned from a simple list. Does it depend on whether the user has selected the option "Import ... IPTC keyword to DB Categories"
...Conflicting duplicate fields

...in which the first non-null/non-empty field wins.
...
Example: if both Field 1C and Field 1A have a non-null value then the Field 1A will replace the value of field 1C..
This is almost certainly not what the user would want, and probably would not even expect (unless, like me, they have been tripped up by this before).

The issue of conflicting "duplicate" fields also ties in with the new keywords. I have noticed that if I choose keywords including the category tree, then the IPTC keywords get out of sync with the XMP equivalent fields, at least what windows7 file explorer, Picasa and XnView classic regard as equivalent fields. Is this something to do with how Lightroom behaves?

User avatar
m.Th.
XnThusiast
Posts: 1578
Joined: Wed Aug 16, 2006 6:31 am
Contact:

Re: Very important: Metadata V2 for 0.70 (EXIF, XMP, IPTC et

Post by m.Th. » Wed Oct 30, 2013 9:21 am

I am still confused by this reference to RCK as IPTC namespace - I can find no entry for Rating or colour in IPTC.
Don't think like this.

Think that there are some tables in XnView's DB which will export their data according to the standards in some files. RCK is already there. Exporting to XMP/IPTC of these things is already there (I must confess that I didn't check the correctness of the export).
For my use, DateDigitized would be different from DateTaken only when I have scanned some old films/prints. In that case I would think the time of day is generally unimportant, but still present in the image file itself should it be needed. However, if your intention was to use it as a sort of proxy for DateTaken then it might given very misleading results on the few occasions when the dates do differ. Of course, with my old photos, I am lucky if I know the month they were taken, let alone anything more precise, but others might have kept much better records.


Yes, sure - no problem. I know this. We also do/did such things (scanning etc.) - sometimes a lot.
No, DateDigitized is NOT a proxy for DateTaken. There are two different things. However...

Because, as you figured out, 1.) most of the time the two values are the same 2.) when they are different, at least for me, the time part of the DateDigitized isn't so important, hence I'm thinking that dropping the time part of this and breaking the info in three fields will give us the instant (real-time) metadata search based on this value. All the (SOHO DAM) programs do it - and many times is very useful. Here is a screenshot from ASP (I choose it because is the most easiest to read and illustrate the concept):
ASP-MetadataBrowser.JPG
ASP-MetadataBrowser.JPG (28.75 KiB) Viewed 2135 times
I suspect a calendar-based search is something I would not use very much.
Depends on what you're doing. A lot.
Usually people (including you and your friends, brothers & clients :) ) remember the date (month) when a certain event took place. And because date/time cataloging comes 'for free' - it is often used as a tool. Of course, in a perfect world where everything is indexed with the right keywords there will be (almost) no need for this but we are far far far away from a perfect world...
This is almost certainly not what the user would want, and probably would not even expect (unless, like me, they have been tripped up by this before).
Sorry, I described how the program would behave internally. Of course, many users doesn't know how the SQL COALESCE behaves. But if you think that every field, even there are same (duplicate) fields in XMP/EXIF/IPTC namespace should have their placeholder - that's (more or less) fine with me. IMHO this is a rather secondary issue, even if I would try to avoid duplication due of performance issues.
[...] Is this something to do with how Lightroom behaves?
Good or bad, Lr is standard. And there is a reason for this. We must take it in account. Also, as I said, we must avoid the explosion of useless metadata fields because of (mainly) historic reasons (standard convolutions etc.)
m. Th.

The Ascetic Experience - The best photos and texts from Holy Mountain (Athos)

- Dark Themed XnViewMP 0.90 64bit & XnView 2.00 x64 on Win7 x64 -

CameronD
Posts: 303
Joined: Wed Aug 01, 2007 1:28 pm
Location: Australia

Re: Very important: Metadata V2 for 0.70 (EXIF, XMP, IPTC et

Post by CameronD » Wed Oct 30, 2013 3:23 pm

m.Th. wrote:
I am still confused by this reference to RCK as IPTC namespace - I can find no entry for Rating or colour in IPTC.
Think that there are some tables in XnView's DB which will export their data according to the standards in some files. RCK is already there. Exporting to XMP/IPTC of these things is already there (I must confess that I didn't check the correctness of the export).
That might be it - the export seems OK (once I enable the options to do so). However either the reimport is not so good, or else the database is messed up.
I can see the xmp group within the XMP tab shows Rating and Label. When XnViewMP v0.61 first sees the file with these set (file was copied to a new folder outside XnView) then it displays both. However as soon as I leave the folder and return, they are no longer shown and no longer set in the "Properties->File"
I am glad to see the ratings value is supported in the same way by Windows 7 explorer.

For me this is a new install - I have not run MP before. I do have the options to import/export between DB categories and iptc keywords turned off at the moment, because I was investigating the new keyword behaviour. So it is either a direct bug, or an unexpected result of something else I am doing.

DateDigitized vs DateTaken
Usually people (including you and your friends, brothers & clients :) ) remember the date (month) when a certain event took place. And because date/time cataloging comes 'for free' - it is often used as a tool..
I agree dates would be frequently used by most people to locate images - I was initially describing my personal usage pattern where I have a fairly deep storage hierarchy and could find most "events" by following that. Even so, I can still think of other times I might need it. However, in all cases I can think of I would be searching based on DateTaken, not DateDigitized. That was what I meant by using DateDigitized as a proxy for DateTaken. Occasionally I might even want to search based on time of day - e.g. night time with flash off to locate star traces.

Perhaps you could ignore DateDigitized and just store DateTaken as separate year, month, day and time-of-day fields. The components would be easy enough to recombine for sorting purposes.

The only significant reason I can think of for storing DateDigitized, would be to locate images where the dates taken and digitized were not equal. And then I would want to be comparing dates only without time of day.
But if you think that every field, even there are same (duplicate) fields in XMP/EXIF/IPTC namespace should have their placeholder - that's (more or less) fine with me. IMHO this is a rather secondary issue, even if I would try to avoid duplication due of performance issues.
I think you have misunderstood me. I agree that the similar fields should not be duplicated in your database. I am suggesting that conflicting fields is almost certainly a sign of a problem and it would be important to indicate the conflict to the user and have them make the decision. Either discard A, discard B, or concatenate - then offer an edit window.

This leads to the next potential problem - the caption (for example) is saved in the DB and the user later edits the caption using some other software. Ideally XnView should rescan any file where the modify date is newer, but I cannot see if the file modify time is saved anywhere to allow this check.

User avatar
m.Th.
XnThusiast
Posts: 1578
Joined: Wed Aug 16, 2006 6:31 am
Contact:

Re: Very important: Metadata V2 for 0.70 (EXIF, XMP, IPTC et

Post by m.Th. » Thu Oct 31, 2013 9:48 am

That might be it - the export seems OK (once I enable the options to do so). However either the reimport is not so good, or else the database is messed up.
I can see the xmp group within the XMP tab shows Rating and Label. When XnViewMP v0.61 first sees the file with these set (file was copied to a new folder outside XnView) then it displays both. However as soon as I leave the folder and return, they are no longer shown and no longer set in the "Properties->File"
I am glad to see the ratings value is supported in the same way by Windows 7 explorer.
Start a new thread and put a clear use case for this. IOW Steps to Reproduce: 1. Go to... 2. Do... 3. ... etc. Write at the end the "Expected: (what do you think that it should happen)" and "Actual: (what's really happening)" sections.
I think you have misunderstood me. I agree that the similar fields should not be duplicated in your database. I am suggesting that conflicting fields is almost certainly a sign of a problem and it would be important to indicate the conflict to the user and have them make the decision. Either discard A, discard B, or concatenate - then offer an edit window.
Aha! An "edit" window. Fair with me. Sometimes is better solution and perhaps easier to implement. In fact, IIUC, you mean a 'Conflict Resolver' window and let the user chose the correct value from 2..n sources (IOW from a list).
This leads to the next potential problem - the caption (for example) is saved in the DB and the user later edits the caption using some other software. Ideally XnView should rescan any file where the modify date is newer, but I cannot see if the file modify time is saved anywhere to allow this check.
Yep, the programs detects these changes based on file modify time. And to be clear, XMP/sidecar's file modify time (usually this is to be found in the DB as date of the last import). Lr et al. shows a small icon in one of the corners when detects an outside change in metadata. Some other programs don't do it, but in any case, all programs have an option in the 'Right-click menu' to Import/export/Synchronize folder from/to XMP...'
m. Th.

The Ascetic Experience - The best photos and texts from Holy Mountain (Athos)

- Dark Themed XnViewMP 0.90 64bit & XnView 2.00 x64 on Win7 x64 -

CameronD
Posts: 303
Joined: Wed Aug 01, 2007 1:28 pm
Location: Australia

Re: Very important: Metadata V2 for 0.70 (EXIF, XMP, IPTC et

Post by CameronD » Thu Oct 31, 2013 11:23 am

m.Th. wrote:Start a new thread and put a clear use case for this
OK - will first do a new default install on another PC, in case I have messed up settings.
Aha! An "edit" window. Fair with me. Sometimes is better solution and perhaps easier to implement. In fact, IIUC, you mean a 'Conflict Resolver' window and let the user chose the correct value from 2..n sources (IOW from a list).
Actually I meant both - initially a conflict resolver window, but I was then thinking of some edit window that would allow merging values. On second thoughts, perhaps that situation would be so unusual that the normal editing processes would be sufficient.

CameronD
Posts: 303
Joined: Wed Aug 01, 2007 1:28 pm
Location: Australia

Re: Very important: Metadata V2 for 0.70 (EXIF, XMP, IPTC et

Post by CameronD » Sun Nov 17, 2013 1:56 am

These comments are based on my never having used SQlite, so take them on that understanding.
m.Th. wrote:Also, I'm thinking about the similar field (for us) DateDigitized to leave as DATE or - better - to break it in 3 fields: Year, Month, Day in order to have a very fast (and nice) filtering engine based on calendar. What do you think?
Revisiting the dates issue again, I noticed recently that SQlite provides very minimal support for date and time. Certainly there are no useful functions to be put into efficient SQL queries, although I notice programmers can add their own, although their efficiency will come down to the storage format.

At the moment ModifiedDate seems to be stored as a string. SQlite says it offers string, real number or integer, I get the impression it is up to the user code to handle the processing of these. Certainly the inbuilt sql functions are all based on strftime and require iso-encoded strings.

Possible date/time storage:
  1. integer timestamp - no good, as it does not represent years before 1970
  2. string: iso or otherwise does not really matter
  3. double precision real number Julian Date.
  4. set of integers: year, month, day, time-of-day in seconds
  5. different integer set options like: year, day-of-year
Here are some possible searches that people might want to do:
  • an exact date - trivial to do with any option
  • a given month and year, again trivial in each method
  • an anniversary date, but not sure of year: easy with set of integers (using day-of-year has problems with leap years); easy with string comparisons, and only slightly trickier if you want to build a query like "it was between 1996 and 2002"; ugly with Julian date;
  • arbitrary range of dates: trivial with Julian; trivial with string because you can do a string compare or convert to integer and compare; getting a bit messy to code with sets of integers once you start allowing for searches spanning multiple months and year(s) but still simple enough to convert back to a single number to compare (with no speed advantage).
  • time of day (range): easy(ish) with string; easy with integer time of day; simple with Julian date, as it is from the fractional part.
  • The worst common case I imagine will be: a range of years, and a range of a few days either side of a certain date (somebody's birthday): that is fairly easily handled by sets of integers; hopeless with Julian dates; and easy enough with strings by converting substrings to ints.
I did a few simple tests against the file ModifiedDate in my DB, to which I have registered nearly 20000 images in 700 folders. Date range searches on images inner join folders take around 0.2 seconds. Likewise a match on ModifiedDate LIKE '____1117%' takes a similar time (this matches today's date, any year).
For the last example in my list, indexed integers gave speedups between 2 and 4, depending on the number of rows matched, and probably also on caching.

Summary:
String storage offers most flexible search for all the possibilities I can think of, but no major advantage.
Sets of integers is faster, but not as much as I expected. I have no idea how the speed comparison will scale.

User avatar
m.Th.
XnThusiast
Posts: 1578
Joined: Wed Aug 16, 2006 6:31 am
Contact:

Re: Very important: Metadata V2 for 0.70 (EXIF, XMP, IPTC et

Post by m.Th. » Thu Nov 21, 2013 9:42 am

The key in search speed is to determine the engine to use indexes for filtering data and avoid a natural (full) scan over the searched table.

This is the part you didn't stress enough in your (otherwise good) analysis. 2 to 4 times better for 20000 images is quite a lot. There are catalogs with 500,000 - 1,000,000 photos and even more. Also do not forget that joins can be more complicated according with the criteria to be sought (besides time & date, perhaps the user will search for something else, isn't it?).

That's why almost all the programs for which I saw the DB structure have the structure which I proposed: DateTaken as a DateTime field (IOW an Integer depicting the number of seconds from 1970 for SQLite - hopefully nobody shot a digital photo before 1970) and the other field DateDigitized (or in some cases the same DateTaken) being broken in three: Year, Month, Day.

In this way, any concrete query (thanks a lot for the list of possible queries, btw) will use indexes, because according with the (period of) time sought, the GUI will handle this accordingly. IOW, the GUI will have a simple time filter (see for example Lightroom) which will work on DateDigitized handling queries like eg. "give me all photos which are digitized on 25/12" - Christmas. The index usage on this query can be achieved only if we have the three YMD fields.


CameronD wrote:These comments are based on my never having used SQlite, so take them on that understanding.
m.Th. wrote:Also, I'm thinking about the similar field (for us) DateDigitized to leave as DATE or - better - to break it in 3 fields: Year, Month, Day in order to have a very fast (and nice) filtering engine based on calendar. What do you think?
Revisiting the dates issue again, I noticed recently that SQlite provides very minimal support for date and time. Certainly there are no useful functions to be put into efficient SQL queries, although I notice programmers can add their own, although their efficiency will come down to the storage format.

At the moment ModifiedDate seems to be stored as a string. SQlite says it offers string, real number or integer, I get the impression it is up to the user code to handle the processing of these. Certainly the inbuilt sql functions are all based on strftime and require iso-encoded strings.

Possible date/time storage:
  1. integer timestamp - no good, as it does not represent years before 1970
  2. string: iso or otherwise does not really matter
  3. double precision real number Julian Date.
  4. set of integers: year, month, day, time-of-day in seconds
  5. different integer set options like: year, day-of-year
Here are some possible searches that people might want to do:
  • an exact date - trivial to do with any option
  • a given month and year, again trivial in each method
  • an anniversary date, but not sure of year: easy with set of integers (using day-of-year has problems with leap years); easy with string comparisons, and only slightly trickier if you want to build a query like "it was between 1996 and 2002"; ugly with Julian date;
  • arbitrary range of dates: trivial with Julian; trivial with string because you can do a string compare or convert to integer and compare; getting a bit messy to code with sets of integers once you start allowing for searches spanning multiple months and year(s) but still simple enough to convert back to a single number to compare (with no speed advantage).
  • time of day (range): easy(ish) with string; easy with integer time of day; simple with Julian date, as it is from the fractional part.
  • The worst common case I imagine will be: a range of years, and a range of a few days either side of a certain date (somebody's birthday): that is fairly easily handled by sets of integers; hopeless with Julian dates; and easy enough with strings by converting substrings to ints.
I did a few simple tests against the file ModifiedDate in my DB, to which I have registered nearly 20000 images in 700 folders. Date range searches on images inner join folders take around 0.2 seconds. Likewise a match on ModifiedDate LIKE '____1117%' takes a similar time (this matches today's date, any year).
For the last example in my list, indexed integers gave speedups between 2 and 4, depending on the number of rows matched, and probably also on caching.

Summary:
String storage offers most flexible search for all the possibilities I can think of, but no major advantage.
Sets of integers is faster, but not as much as I expected. I have no idea how the speed comparison will scale.
m. Th.

The Ascetic Experience - The best photos and texts from Holy Mountain (Athos)

- Dark Themed XnViewMP 0.90 64bit & XnView 2.00 x64 on Win7 x64 -

CameronD
Posts: 303
Joined: Wed Aug 01, 2007 1:28 pm
Location: Australia

Re: Very important: Metadata V2 for 0.70 (EXIF, XMP, IPTC et

Post by CameronD » Thu Nov 21, 2013 3:08 pm

m.Th. wrote:..2 to 4 times better for 20000 images is quite a lot. There are catalogs with 500,000 - 1,000,000 photos and even more. Also do not forget that joins can be more complicated according with the criteria to be sought (besides time & date, perhaps the user will search for something else, isn't it?).
yes, I just ran out of images and time clicking on lots of folders to build up the table in the current db. I am sure the search time will be at least linear with number of photos so by the time there are a million images then the search time will be quite noticeable and a factor of 2 or 3 will be useful.
I was mainly surprised at how small the difference actually was.
That's why almost all the programs for which I saw the DB structure have the structure which I proposed: DateTaken as a DateTime field (IOW an Integer depicting the number of seconds from 1970 for SQLite - hopefully nobody shot a digital photo before 1970) and the other field DateDigitized (or in some cases the same DateTaken) being broken in three: Year, Month, Day.
I return to an earlier comment - if DateTaken and DateDigitized are the same, then one is redundant.
On the other hand, if the originals are film, then DateDigitized is not very interesting at all.
I for one will want almost all searches done on DateTaken, not DateDigitized.

Just because my filmed images were taken before 1970 does not mean I will never want to search them by date taken (at least for those with accurate dates). There are some old photos (marriages and births for example) for which we do have accurate dates.

So I would turn your selection around - I think you should save DateDigitized as a single int timestamp (or a string), but should split DateTaken into its component Y,M,D,seconds parts.

danjcla
Posts: 49
Joined: Fri Feb 13, 2015 12:08 am

Re: Very important: Metadata V2 for 0.70 (EXIF, XMP, IPTC et

Post by danjcla » Wed Nov 04, 2015 6:59 pm

m.Th. wrote:The much bigger problem is to delay a base field which will be needed for calculations, because now it is easy to do the conversion from Meta BLOB (assuming that it has all the data) or even to delete the all uncataloged files from Images table (the 'Clean files' option from 'Optimize' button) and refill the metadata tables from the disk.
Probably not understanding the context here, but do you mean it is easy to get data out of the "Images" table "Meta" BLOB column? If so, how?

I'm guessing that is where the "Rating" and "Colour Label" pseudo-category data lives.

Which I need to transfer my video cataloging to another system, as Mac OS X upgrade + XnViewMP upgrade = XnViewMP pretty useless for video now. Specifically the player no longer works, can deal with that, but also thumbnailing seems much worse, and I can't get FFmpeg support to work. And even if I can get that working and stick with XnViewMP, I was thinking it'd still be nifty to sync those things with Finder tags.

If anyone happens to have recommendations for Mac video cataloging software I'd love to hear it. So far best possibility I've found is "Cinematica", but not sure if I can import categories to it yet (it uses SQLite for its cache, but the demo doesn't allow saving, so not sure if it also uses it for its main DB).

I'm also going to look into using Skreenics for thumbsheet generation and then writing a simple XnViewMP AddOn that would open a video in an external player on double-click (or whatever) of a thumbsheet. Maybe I could even have post-processing of the Skreenics thumbsheets by some command-line tool to have a good single thumbnail for XnViewMP thumbnail view - I think some image formats allow this.

Post Reply