Cleaning up file recognition
-
- Posts: 652
- Joined: Tue Nov 23, 2004 10:17 pm
- Location: Poland
Right now the whole 'filetype recognition' thing is definitely too much hassle. The fact that a lot of people have trouble with it also shows that there is a problem to solve.
So here is my solution.
A bit of theory
File type (like image, audio, movie etc.) tells XnView, e.g., whether a file should be displayed in the Browser's file list and how it should be opened (e.g. by XnView or by the associated application).
The only need for advanced file type recognition comes from missing or wrong extensions (if the extension is OK, simply reading the file name tells its type). You must admit that this is quite an uncommon problem now, especially in the Windows world. Still, there can be situations where someone mistakenly renames a 10,000-item multimedia collection to the .aaa or .jpg extension; such a collection should still be browsable in some way.
Implementation
EDIT: The implementation may seem complicated, but the idea is really simple. The options proposed below could fit in a group box named 'Bad extensions handling'. They simply force recognition by header for the file list, preview and view. Even when header recognition is forced, the file list, preview and view use up-to-date cache information, because the file header is always read anyway when an entry is put in the cache (for thumbnail creation and details extraction). The defaults are OFF for the 'Scan headers' options and ON for 'Identify file' (btw, could there be any performance issues here?).
Scanning in this implementation means 'scanning the headers of all files'. I am aware that this also delays displaying the file list, because a file can only be displayed in the file list once its file type has been determined (except in All mode). If the file information is already in the cache (i.e. filename and modification date match the cache), that information is used. The cache can be refreshed with the usual Ctrl+R.
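A minimal sketch of the cache rule described above, in Python. It assumes a cache keyed on filename plus modification date and a caller-supplied header scanner; `identify_with_cache`, `refresh` and the `scan_header` callback are hypothetical names used for illustration, not XnView functions.

```python
import os
from typing import Callable, Dict, Tuple

# Hypothetical cache layout: (filename, modification time) -> file type.
CacheKey = Tuple[str, float]


def identify_with_cache(
    path: str,
    cache: Dict[CacheKey, str],
    scan_header: Callable[[str], str],
) -> str:
    """Return the file type, reading the header only on a cache miss."""
    key = (os.path.basename(path), os.path.getmtime(path))
    if key not in cache:
        # Name or modification date differs from the cache: scan the header.
        cache[key] = scan_header(path)
    return cache[key]


def refresh(cache: Dict[CacheKey, str]) -> None:
    """Rough stand-in for Ctrl+R: drop cached types so files get rescanned."""
    cache.clear()
```

With this shape, a change that keeps both the filename and the modification date is invisible to the lookup, which is why a manual refresh is still needed.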
Options for Browser.
Scan headers for Thumbnails and Details modes: Never, Always, Only on fixed disks, Not on network disks, Not on Floppy/CD/DVD
Scan headers for Icons and List modes: options as above
Two distinct options are needed because Icons and List modes are supposed to be speedy, while Thumbnails and Details modes are supposed to be detailed. Remember that XnView decides what to display in the file list based on file type, so scanning headers could significantly slow down display in the 'speedy' modes.
Options for Preview and View.
Identify file on display: Always, Never, On preview, On Open.
If the file type is not recognized ('Other') and the Scan headers... option for the current mode is OFF, XnView uses header scanning to determine the file type when displaying the preview or viewing the file.
This option could be split into two 'Identify file on display' options placed separately in the Preview and View settings, with values 'Yes' and 'No'.
The current 'Recognize only by extension' and 'Scan file headers for folders' options are removed.
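The proposed options can be modelled roughly as follows. This is only a sketch under assumptions: the enum and function names are invented for illustration, drive types are reduced to three kinds, and the rule in `identify_on_display` is one reading of the text above (the header is read on display only for files still typed 'Other' whose headers were not scanned in the Browser).

```python
from enum import Enum


class ScanPolicy(Enum):
    NEVER = "Never"
    ALWAYS = "Always"
    FIXED_ONLY = "Only on fixed disks"
    NOT_NETWORK = "Not on network disks"
    NOT_REMOVABLE = "Not on Floppy/CD/DVD"


class Drive(Enum):
    FIXED = "fixed"
    NETWORK = "network"
    REMOVABLE = "removable"  # floppy, CD, DVD


class Identify(Enum):
    ALWAYS = "Always"
    NEVER = "Never"
    ON_PREVIEW = "On preview"
    ON_OPEN = "On Open"


def scan_in_browser(policy: ScanPolicy, drive: Drive) -> bool:
    """Decide whether the Browser scans headers for the current view mode."""
    if policy is ScanPolicy.NEVER:
        return False
    if policy is ScanPolicy.ALWAYS:
        return True
    if policy is ScanPolicy.FIXED_ONLY:
        return drive is Drive.FIXED
    if policy is ScanPolicy.NOT_NETWORK:
        return drive is not Drive.NETWORK
    return drive is not Drive.REMOVABLE  # ScanPolicy.NOT_REMOVABLE


def identify_on_display(setting: Identify, action: str,
                        known_type: str, already_scanned: bool) -> bool:
    """action is 'preview' or 'open'; True means read the header now."""
    if known_type != "Other" or already_scanned:
        return False
    if setting is Identify.ALWAYS:
        return True
    if setting is Identify.ON_PREVIEW:
        return action == "preview"
    if setting is Identify.ON_OPEN:
        return action == "open"
    return False  # Identify.NEVER
```

The proposed defaults would correspond to `ScanPolicy.NEVER` for both Browser options and `Identify.ALWAYS` for display.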
Potential problems
The following situations may require special handling:
- files whose format has no header and that lack a proper extension: the file is always handled as 'Other'; in general a proper extension is required.
- extension collisions (the same extension is used by several file types) AND header scanning is OFF: the file is displayed in every file list view that matches any of the file types indicated by the extension (e.g. if extension .abc is used for both image and audio files, the file is displayed in both 'image only' and 'audio only' modes). For the custom view ('Items displayed'), the usual 'most restrictive' rule is used. If at the same time Identify file on display is OFF, the file is handled as 'Other' on display (questionable; maybe force identification on preview/open for such files?).
- extension unknown or not present: if headers are not scanned, these are simply 'Other' files. If at the same time Identify file on display is OFF, the file is handled as 'Other' on display.
- mixed extension (the extension belongs to a different file type than the actual one) AND header scanning is OFF: in the file list the file is displayed according to its extension. If at the same time Identify file on display is OFF, the Open action for the file type indicated by the extension is used; if XnView opens such a file, it should report "Bad file type".
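The extension-only cases in the list above can be sketched like this. The extension table is made up for illustration (".abc" stands for an extension shared by two file types), and the function names are hypothetical.

```python
import os
from typing import Dict, Set

# Hypothetical extension table; a real one would be much larger.
EXTENSION_TYPES: Dict[str, Set[str]] = {
    ".jpg": {"image"},
    ".wav": {"audio"},
    ".abc": {"image", "audio"},  # collision: one extension, two file types
}


def candidate_types(filename: str) -> Set[str]:
    """File types suggested by the extension alone (no header scanning)."""
    ext = os.path.splitext(filename)[1].lower()
    # Unknown or missing extension: the file is simply 'Other'.
    return EXTENSION_TYPES.get(ext, {"Other"})


def shown_in_filter(filename: str, filter_type: str) -> bool:
    """A colliding extension makes the file appear under every matching filter."""
    return filter_type in candidate_types(filename)
```

A mixed extension (a JPEG named `photo.wav`) is treated here as audio, which is exactly the inaccuracy that header scanning is meant to remove.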
Other issues
- Displaying View when no Browser is present: only Identify file on display applies.
- Handling size limits set in 'Items displayed': size limits do not affect the Open action.
- If the view mode is changed from a non-scanning mode to a scanning mode, the file list is rescanned with the current settings. If the change goes the other way, identification information is retained until the directory changes.
- If the cache contains information on file types, it is used regardless of settings. Ctrl+R is required to refresh the cache after undetectable changes (i.e. changes that keep the filename and do not change the modification date).
- The current possibility of opening HexaView for Other files opened in XnView also fits the whole concept nicely.
X.
Last edited by Xyzzy on Fri Jan 20, 2006 8:59 am, edited 1 time in total.
-
- XnThusiast
- Posts: 1423
- Joined: Thu Dec 23, 2004 7:17 pm
- Location: Paris, France
Xyzzy: I have to admit that I had a hard time following you...
...but after thinking more and more about it, I came up with another solution on the same subject.
The idea is to make several passes to check files:
1. Use only extensions that match chosen filetypes -> instant display
2. Scan Headers of matching extensions (file if no header exists?) -> update/remove in background
3. Scan Headers of non-matching extensions (file if no header?) -> update/add in background
You would get instant display (1). Files confirmed would get a darker color while scanning (2) to show progress (wrong files would be removed). Good files with wrong extension would then be added to the list (3). Of course the Cache would be used for fast confirmation (ie: matching filename +modified date +filesize). For Popup/Preview/View/Open, the file would immediately be scanned for confirmation, if it hasn't been scanned yet.
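The three passes could be sketched as a stream of list updates. This is an illustration under assumptions, not a description of XnView internals: `files` maps each filename to the type guessed from its extension, `scan_header` is a hypothetical callback returning the type found in the header, and the event names are invented.

```python
from typing import Callable, Dict, Iterator, Set, Tuple


def three_pass_listing(
    files: Dict[str, str],
    wanted: Set[str],
    scan_header: Callable[[str], str],
) -> Iterator[Tuple[str, str]]:
    """Yield (event, filename) pairs in the order the list would change."""
    # Pass 1: extension only, so the list appears instantly.
    shown = [name for name, ext_type in files.items() if ext_type in wanted]
    for name in shown:
        yield ("show", name)
    # Pass 2: verify the headers of the files already shown.
    for name in shown:
        if scan_header(name) in wanted:
            yield ("confirm", name)
        else:
            yield ("remove", name)
    # Pass 3: scan the remaining files and add the ones that match after all.
    for name, ext_type in files.items():
        if ext_type not in wanted and scan_header(name) in wanted:
            yield ("add", name)
```

Each file's header is read once in total; the passes only change the order in which results reach the list. A real implementation would run passes 2 and 3 in the background and consult the cache first.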
I may be wrong somewhere... but I don't see any potential problem with this system and I think it provides all advantages (speed, verification) with little - if any - drawbacks. Moreover, no option at all would be needed.
Your opinion?
Olivier
-
- Posts: 652
- Joined: Tue Nov 23, 2004 10:17 pm
- Location: Poland
One thing comes to mind- do we really REALLY need any header scanning?
If one messes up tons of files, it is much better to scan them for header text and rename to proper extensions with some filemanager.
And XnView should always use header scanning on preview/display for non-recognized files.
If one gets some unrecognized files, he should turn on Others display, look up files and rename them anyway.
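As an illustration of the "scan headers and rename" approach mentioned above, here is a small Python sketch. It knows only four well-known signatures (JPEG, PNG, GIF, RIFF/WAVE); the function names are hypothetical, and it only suggests a new name without renaming anything.

```python
import os
from typing import Optional


def sniff_extension(header: bytes) -> Optional[str]:
    """Guess an extension from the first 12 bytes of a file."""
    if header.startswith(b"\xff\xd8\xff"):
        return ".jpg"
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return ".png"
    if header[:6] in (b"GIF87a", b"GIF89a"):
        return ".gif"
    if header[:4] == b"RIFF" and header[8:12] == b"WAVE":
        return ".wav"
    return None  # unknown or headerless format


def suggest_rename(path: str) -> Optional[str]:
    """Return a corrected path, or None if the name looks fine or is unknown."""
    with open(path, "rb") as f:
        ext = sniff_extension(f.read(12))
    if ext is None:
        return None
    root, current = os.path.splitext(path)
    # Simplification: aliases such as .jpeg are not treated as equal to .jpg.
    return None if current.lower() == ext else root + ext
```

A batch tool would loop over a directory, call `suggest_rename` on each file and let the user confirm the changes.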
Multipass scanning that cannot be turned off is unacceptable on network drives. Also, files appearing in or disappearing from the file list out of nowhere, because header scanning determined that they should or should not be displayed, would be very confusing imo.
X.
-
- Author of XnView
- Posts: 45963
- Joined: Mon Oct 13, 2003 7:31 am
- Location: France
-
- Posts: 652
- Joined: Tue Nov 23, 2004 10:17 pm
- Location: Poland
The PROBLEM is that the current options somehow work.
But no one seems to be able to say what they affect and how, and there is no documentation for them.
I believe my proposal defines clear and understandable rules for file list, preview and view display. IMO the current handling contains serious inconsistencies, and it should go "back to the drawing board".
Example: Recognize only by extension OFF, Scan headers ALWAYS, JPEG file renamed to WAV, Open audio in associated editor, Cache cleared and disabled:
- It is not displayed with Image filter.
It should be recognized by header scanning. If you relied on the extension here, that's a bug; these options are supposed to handle exactly such cases, so the correct file type should be determined.
- Open action uses action for audio files.
The file should be recognized as an image.
And so on, and so on, and so on.
X.
-
- XnThusiast
- Posts: 1423
- Joined: Thu Dec 23, 2004 7:17 pm
- Location: Paris, France
What about having just:
Code:
Extensive file checking [ ]
...which would add my suggested 2 passes when set ON (with a progress bar)?
(default=OFF would mean using file extension only)
You would get the standard & instant behaviour.
And if you really want to verify which files can be seen, you would check that option to get the extensive search (without having to wait for the whole scan => instant response on extension, update would be as fast as scanning all headers).
Olivier
-
- Posts: 652
- Joined: Tue Nov 23, 2004 10:17 pm
- Location: Poland
First, I think that the header scanning function should scan ONLY headers and IGNORE extensions. That's what it is for: identifying misnamed files. How can we use extensions when they are supposedly bad? *) If you rely on extensions here, as in step 1 of your scanning, you miss the point. Also, I do not see any point in dividing the scan into two passes depending on extensions. We presume that extensions are bad, so why use any extension info?
Using a single option is simply like merging my 'Scan headers for <mode>' options together. What about my reasoning for splitting them? Is it bad?
You do not talk about preview/View display for still unidentified files (e.g. when header scanning is not used). It is an important usability feature: it enables browsing files with bad extensions without the need for header scanning (e.g. because of a large number of big files on a network drive).
BTW, thorough header scanning is not supposed to be speedy, it is supposed to be accurate; again, that is what it's for.
*) Using cache for already identified files and filetype-with-no-header rules apply.
X.
-
- XnThusiast
- Posts: 1423
- Joined: Thu Dec 23, 2004 7:17 pm
- Location: Paris, France
Xyzzy wrote: If you rely on extensions here, like in 1. step of your scanning, you miss the point. Also, I do not see any point in dividing scan into two passes depending on extensions. We presume that extensions are bad, so why use any extension info?
The point is to implement a single method that is well designed enough to handle all situations in the best way possible.
As you said, file extension alone should be very accurate. So why wait for scanning all headers when 95% of the job is immediately available with file extension?
About considering extensions for scanning headers: First, I think it's important to remove problematic files as soon as possible... including the ones that have the right extension but for a different filetype. Second, I believe removal of files is more bothersome than addition of files, hence the need to handle them first. But this is just a small opinion...
Xyzzy wrote: Using single option is simply like merging my 'Scan headers for <mode>' options together. What about my reasoning for splitting them? Is it bad? BTW, thorough header scanning is not supposed to be speedy, it is supposed to be accurate- again, it is what it's for.
Let's imagine we access a slow network drive (scan/extensive set ON):
Code:
                                                     Xyzzy   Olivier_G
0. enters remote drive                                 X         X
1. (fast) can see first files based on extension                 X
2. wrong files are removed                                       X
3. complete verification (final display)               X         X
That is true also for accessing large number of files on USB drives, for very large directories on local drives or for CD/DVD/Floppies (I believe Windows caches the name/extension -> 95% is done before even spinning disc).
For usual use on a local drive, Xyzzy gets the final display after a very small delay, whereas Olivier_G gets a quick 'flickering' for bad files (which may be bothersome... but is also an indication that some files are problematic and may require further action).
So after explaining that, I don't see the need to implement a different setting for Thumbnails/Details vs List/Icons, be it on fixed disks, network, CD, etc...
Xyzzy wrote: You do not talk about preview/View display for still unidentified files
...because I already agreed on that:
Olivier_G wrote: For Popup/Preview/View/Open, the file would immediately be scanned for confirmation, if it hasn't been scanned yet.
Olivier
-
- Posts: 652
- Joined: Tue Nov 23, 2004 10:17 pm
- Location: Poland
Olivier_G wrote: The point is to implement a single method that is well designed enough to handle all situations in the best way possible.
I think you are wrong here. The point is to provide a method of handling files with wrong extensions while maintaining maximum speed in everyday work. Header scanning is not an everyday tool. It is there to handle special situations, so it stays inactive in a normal environment.
You cannot expect one solution to be the best for all situations, because file layouts differ too much. The best you can achieve with a one-size-fits-all solution is something in the middle that satisfies very few users; the rest will complain about either speed or compatibility. (Or flickering on display.) And the worst thing is that it cannot be changed, neither by those who want speed nor by those who want compatibility.
My 'header scanning options' are more like troubleshooting options, not something you use every day. As I wrote, they can be put into a 'Bad extensions handling' group, which a priori means turning them on for special situations and off in everyday use.
As for your file list preview: I compare this to the current 'Delay high quality display'. Nobody likes it, and it is there only because HQ images cannot be displayed faster; it is not a real solution.
Also, I see that you do not handle 'mixed extensions' (e.g. jpg renamed to wav) if you want to rely on extensions in the first step. Header scanning is supposed to be accurate, and by using any extension information you defeat the purpose of this option.
In my design, cache information for already identified files is used, and that speeds up recognition (you can always use Ctrl+R if the cache is unreliable). You could say that the task of 'header scanning' is to put the appropriate info into the cache for files with wrong extensions.
Also, in header scan mode, the file display does not need to wait until the whole directory has been checked; it can start right after a file type is determined, so the first items are displayed as they are identified.
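A minimal sketch of this linear approach, assuming a cache that maps filenames to already identified types and a hypothetical `scan_header` callback (neither is an actual XnView interface):

```python
from typing import Callable, Dict, Iterable, Iterator, Set


def linear_listing(
    names: Iterable[str],
    wanted: Set[str],
    scan_header: Callable[[str], str],
    cache: Dict[str, str],
) -> Iterator[str]:
    """Yield each matching file as soon as its type is known, in directory order."""
    for name in names:
        file_type = cache.get(name)
        if file_type is None:
            # Not identified yet: read the header once and remember the result.
            file_type = scan_header(name)
            cache[name] = file_type
        if file_type in wanted:
            yield name  # displayed immediately, never removed later
```

Because it is a generator, the caller can draw each item as it arrives; nothing already drawn is taken back, at the cost of waiting for each header before the corresponding item appears.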
As an extension to my method, I could propose something along the lines of "use header scanning for unknown extensions", but as it helps only in cases of unknown extensions, I don't find it really useful, and it would confuse the user even further.
EDIT: There is a saying here- "If something is designed for everything, it is useful for nothing".
X.
-
- XnThusiast
- Posts: 1423
- Joined: Thu Dec 23, 2004 7:17 pm
- Location: Paris, France
First: specific points, to clarify things a bit...
Xyzzy wrote: And the worst thing is that it cannot be changed, neither by those who want speed nor by those who want compatibility.
It is proposed as an option, as I considered your comment (=> extension only OR scan all headers).
Xyzzy wrote: As for your file list preview - I compare this to current 'Delay high quality display'
I would not compare this update of problematic files only (ie: you would get no change at all in a normal situation) to a complete change of display. If the user doesn't want the slightest chance of an update, he can simply turn the option 'Extensive file checking' off.
By the way: your own suggestion to display items as they are identified while scanning implies even more updates.
Xyzzy wrote: I see that you do not handle 'mixed extensions' (jpg renamed to wav) - if you want to rely on extensions in first step. Header scanning is supposed to be accurate, and by using any extension information you deny purpose of this option.
Huh??? If my option is ON, a JPEG file renamed to .wav will be scanned and shown correctly in step 3, as explained. It is accurate...
=> I really wonder whether my suggestion has been correctly understood (in particular: 1/2/3 are not options... they are the 3 steps when option 'Extensive file checking' is ON).

More general:
Xyzzy wrote: I think you are wrong here. The point is to provide a method of handling files with wrong extensions, while maintaining maximum speed in everyday work. Header scanning is not every day tool. It is to handle special situations -> so inactive in normal environment. You cannot expect one solution to be the best for all situations, because situations with file layouts differ too much. The best you can come up with proposing one-size-fits-all solution is something in the middle, that satisfies very little users- the rest will complain about either speed or compatibility. There is a saying here- "If something is designed for everything, it is useful for nothing".
My suggestion is to keep speed and provide accuracy at the same time, at the expense of small updates when waiting for accuracy. I believe the drawbacks are so low that it should be the default behaviour, to get speed AND accuracy.
About sayings: they are useful rhetorical tools... but I would never let them limit my own mind. You know where I stand about them, now...

Olivier
-
- Posts: 652
- Joined: Tue Nov 23, 2004 10:17 pm
- Location: Poland
Xyzzy wrote: And the worst thing is that it cannot be changed, neither by those who want speed nor by those who want compatibility.
Olivier_G wrote: It is proposed as an option, as I considered your comment (=> extension only OR scan all headers).
OK, that's better. In your post dated Wed Jan 18, 2006 9:32 it looked to me like the only behaviour.
Olivier_G wrote: I would not compare this update of problematic files only (ie: you would get no change at all in a normal situation) to a complete change of display. If user doesn't want the slightest chance of update, he can simply turn the option 'Extensive file checking' off. By the way: your own suggestion to display items as they are identified while scanning imply even more updates.
But the user wants header scanning AND no strange updates. Items appearing linearly (one after another) is quite different from displaying a window full of items and then adding and deleting some of them. You may call displaying every item an update if you want, but then there will always be as many updates of the file list as there are items. BTW, if in a normal situation there is no change at all, as you write, why do you want to make your option, which requires additional operations, the default?
Olivier_G wrote: Huh??? If my option is ON, a JPEG file renamed to .wav will be scanned and showed correctly in step 3, as explained. It is accurate...
Oh yes, you are right, it will pop out of nowhere after the header has been scanned. That can be called confusing behaviour. The application first decides to hide it, then to show it...
Olivier_G wrote: My suggestion is to keep speed and provide accuracy at the same time, at the expense of small updates when waiting for accuracy. I believe the drawbacks are so low that it should be the default behaviour, to get speed AND accuracy.
This (keeping speed and accuracy) is not the problem. The problem is handling incorrect extensions.
Anyway, I think that a solution such as yours is not needed. There is no need for a header scanning solution that is ON all the time. Files very rarely get incorrect extensions. For these cases we need something simple yet effective.
Why work on some elaborate option if it is supposed to be used rarely, as a troubleshooting tool? Why settle for less speed (by making it the default option) when users generally do not need this added accuracy but do require speed? BTW, in this beta cycle there was AT LEAST one report of file list flickering, which came just from hiding items that were displayed before being identified. Why turn on a "compatibility" option by default? It would be like permanently setting "Run in Windows 98 compatibility mode" on XP: more potential problems than gains.
Why not give the user what he wants when he wants it, all the speed or all the precision, rather than something in between?
Finally, what usage scenario would benefit from such multipass scanning?
I personally wouldn't use the option because:
- long file list scanning (reading directory 3 times)
- operation is in fact completed multiple times, and there can be 2 updates to already presented filelist- confusing for me- I do not know when the file list is in final version.
- headers are scanned always, even if it is not needed
- already cached info is not used
- it is complicated; harder to understand- harder to use. Better use something simpler that gives easy predictable results.
X.
-
- XnThusiast
- Posts: 1423
- Joined: Thu Dec 23, 2004 7:17 pm
- Location: Paris, France
Xyzzy wrote:
- Why settle for less speed?
- Long file list scanning (reading directory 3 times)
- Why not get what user want when he wants, all the speed or all the precision, but something in between?
But where is my system slower than 'use extension only'??? (my "reading 3 times" is as intensive as a single scan, it just presents things quicker...)
Why do you say that it is less accurate than 'scan all headers'? It is not.
Xyzzy wrote:
- I do not know when the file list is in final version.
- headers are scanned always, even if it is not needed
- already cached info is not used
-> progress bar
-> Huh??? headers are used to check files. I don't get it...
-> Of course Cache is used, as I said.
So basically, we don't agree on ONE thing with headers scans:
- You think that 'progressive but slow' display is better
- I think that 'fast but jumping' display is better
I tried to be as objective and cynical for both...

It would be interesting to get others' comments on this.
This being said... I feel satisfied with your suggestion. I just think that there are more advantages in my own suggestion. If more people oppose that removing/adding behaviour and favor no instant display, I won't support it any longer.
Olivier
-
- Posts: 652
- Joined: Tue Nov 23, 2004 10:17 pm
- Location: Poland
OK, so, without nitpicking, there are two approaches to header scanning (I omit things that are the same):
mine- linear reading and display of files (simpler to implement, faster FINAL display- accurate, somewhat more natural)
yours- 3 pass reading and display of files (more complicated and error prone, faster PREVIEW display- may be inaccurate, more fancy- reading in background)
Still, I strongly oppose making any header scanning the default option, because the need for speed is orders of magnitude greater than the extension problem.
BTW, reading in 3 passes is not the same as a single read; it is slower, even if each pass scans different files.
X.