Image Comparison Program

Posted: Thu May 04, 2017 10:45 am
by bdragon

I ran the following programs on a collection of about 42 thousand pictures:

1) has multithreading
2) best interface, though unfortunately it does not allow grouping of visually similar pictures the way Visipics does. It has groups, mind you, but they are foggy and undefined: each one is represented only by a number, and there is no option to go through a group of similar pictures together like in Visipics.
3) allows overwriting a bad picture with a good one.

1) it is the only one that finds most of the real duplicates... but prepare a mask, because you will have to dive under all the false positives. At least you can compare two pictures directly, so it is much better than doing it by hand, and yes, it is still quicker too ("only" 4732 pairs of duplicates at 30%), but it still feels like a chore.

1) sloooooow (despite multithreading): to find duplicate pictures properly you need to pull out all the stops, and it shows that it was not designed for that. It really struggles.
2) swarms you with false positives, which are difficult to ignore as a group because you have to check one pair at a time.

Again, in order to check for real duplicates (which are usually partially cropped or have been altered in gamma and levels of white/black) you have to pull out all the stops of this program and it was not meant to be used that way. It has the option, but it is evident that the user was not meant to make use of it.

- Visipics
1) has multithreading
2) has many false positives, but keeps all similar pictures visually grouped together, so it is easy to check them quickly, delete the ones you want gone, and then flag the rest of the group to be ignored.
3) moderately quick

1) clunky interface
2) ignoring groups does not hide them from the list, so you have to constantly check old ignored groups for new entries, and it is thus easy to miss something
3) does not allow overwriting a bad picture with a good one; the only options are delete or move without changing the name
4) does not save on close, so if you forget to save what you ignored, prepare to re-run it, because everything you flagged to ignore will be back.

- Awesome Duplicate Photo Finder
1) quick
2) does not swarm you with false positives.

1) no multithreading
2) it is quick and does not swarm you with false positives because it just picks the single most similar picture to one you already have and shows it, then proceeds to ignore everything else similar to those two pictures. Period. If you have multiple similar pictures, you will not know unless you start deleting the most similar picture (which you might not want to do) and re-run the comparison.
3) does not allow overwriting a bad picture with a good one; the only options are delete or move without changing the name
4) has basically no options to speak of for tweaking.

These were the good ones. Let's go quickly through the other ones I found:

- Similar Images Search. Constantly crashes at 32767 files compared.

- ZiiN Image Deduplicator. Compares only one directory at a time. You can tell it to check subdirectories, but it only goes down one single level, so yeah, basically useless unless you are really disorganized and keep all your pictures bunched up together.

- Picture Relate. Basically has me comparing a single picture against every other one in the set I gave it. It just orders all of the pictures by similarity to the one I click. Not much faster than doing it by hand, since you still have to check every picture against every other one, the only help being that similar ones will end up near each other.

- Anti-twin was a mess. It ran for several hours, and when I ended it forcibly it was not even at 5%, with several thousand duplicates found. So... yeah.... no!

EDIT: I was asked to provide a "better reason why" for my Anti-Twin dismissal.

Here it is:
- It does the work, just extremely slowly and probably also poorly. The program takes a picture and compares it to every single other one in the provided path, which in principle is the same thing the other programs do. But since it shows you the progress, you can watch it take a picture and go through all the other pictures in turn, and you even have time to read the name of each picture... With a 42,000-picture database, at 10% similarity, after an 8-hour overnight run it had checked little more than 2000 pictures and found 17000 duplicates. Thus I ended it "prematurely".
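As a rough back-of-the-envelope check of those numbers, here is a minimal sketch that extrapolates the full run time. It assumes the program keeps the same per-picture pace for the whole run, which was not verified (the run was stopped early, and "little more than 2000" is rounded down to 2000). The function name is made up for illustration.

```python
def estimate_total_hours(total_pictures, checked_pictures, hours_elapsed):
    """Linear extrapolation: assumes every picture takes the same time to check."""
    pictures_per_hour = checked_pictures / hours_elapsed
    return total_pictures / pictures_per_hour


# Figures from the post: 42,000 pictures, about 2000 checked after 8 hours.
hours = estimate_total_hours(42_000, 2_000, 8)
print(hours)       # 168.0 hours
print(hours / 24)  # 7.0 days
```

So under that assumption a complete pass would have taken about a week of continuous running.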

Here are the problems with how the recognition works, based on the description on the site. These are my own guesses, because I did not let the program finish:
- Anti-Twin does not check for patterns in the pictures to deem them similar. As stated on the site, it just goes pixel by pixel (hence the slowness), checks the value of each pixel and sees how many of them are dissimilar, based on the percentage provided. All other image comparison programs make a reduced-size grayscale version of the image, check for patterns in it and then check which other pictures have similar patterns. A pixel-by-pixel comparison works only if the images have not been edited, like a series of photos taken very close to each other. This process also means that whether you give it a 5% similarity or a 30% similarity, it will check all the images at the same speed.
- The site does state that it works out differences in size, but that is it. This is sub-optimal. Similar pictures should also include ones that are the same but with different ratios, different gamma levels or different levels of black and white. Telling me "these pixels are different" will work only if both images are unedited and neither of them is cropped.
- Aside from the above... all greyscale pictures will need a different percentage from colored ones if you only check for similarities in pixels. In fact, at just 15% similarity all greyscale images are identical to all other greyscale images, except maybe for straight line art. That's how pixels work.
- Finally, and I hope I am extremely mistaken, because if this is true it is a big deal breaker: there is no preview. The site tells you that the program gives you a list and then "you should open files before deleting them". Uh? Did anybody use this program with success? Can anybody confirm that there is no preview? Because all the above are already big put-downs, but this one point
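
To illustrate the difference between the two approaches described in the list above, here is a minimal sketch. It is not the actual algorithm of Anti-Twin or of any program in this thread: the "pattern" side uses a simple average hash as a stand-in, and all function names are made up for illustration. Images are plain lists of rows of grayscale values (0-255), at least 8x8 pixels, so no imaging library is needed.

```python
def shrink(pixels, size=8):
    """Downscale a grayscale image to size x size by averaging blocks."""
    h, w = len(pixels), len(pixels[0])
    small = []
    for by in range(size):
        y0, y1 = by * h // size, (by + 1) * h // size
        row = []
        for bx in range(size):
            x0, x1 = bx * w // size, (bx + 1) * w // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def average_hash(pixels, size=8):
    """One bit per cell of the shrunken image: is it brighter than the mean?"""
    flat = [v for row in shrink(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    return [v > mean for v in flat]


def hamming(a, b):
    """Number of positions where two hashes disagree."""
    return sum(x != y for x, y in zip(a, b))


def pixel_mismatch(a, b, tolerance=0):
    """Fraction of pixels that differ by more than tolerance (same-size images)."""
    total = len(a) * len(a[0])
    differing = sum(
        abs(pa - pb) > tolerance
        for row_a, row_b in zip(a, b)
        for pa, pb in zip(row_a, row_b)
    )
    return differing / total


# Synthetic 16x16 diagonal gradient and a uniformly brightened copy of it.
original = [[(x + y) * 8 for x in range(16)] for y in range(16)]
brighter = [[v + 15 for v in row] for row in original]

print(pixel_mismatch(original, brighter, tolerance=10))         # 1.0
print(hamming(average_hash(original), average_hash(brighter)))  # 0
```

The pixel-by-pixel check reports every pixel as different after a mild brightness change, while the hash of the reduced image does not change at all, because each cell is only compared against the image's own mean.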

The rest I found were just exact-match CRC finders.
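
For comparison, this is roughly what an exact-match CRC finder boils down to. It is a minimal sketch using only the Python standard library, with made-up function names, and it is not taken from any of the programs mentioned. Files are grouped by size and CRC-32, so anything that was resized, cropped or re-saved will not match.

```python
import zlib
from collections import defaultdict
from pathlib import Path


def crc32_of_file(path, chunk_size=65536):
    """CRC-32 of a file's bytes, read in chunks to keep memory use low."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc


def exact_duplicates(root):
    """Return groups of files under root that share both size and CRC-32."""
    groups = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            key = (path.stat().st_size, crc32_of_file(path))
            groups[key].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

Calling `exact_duplicates("some/folder")` (the path is a placeholder) returns a list of groups of paths. CRC-32 collisions between different files are possible, so a careful tool would confirm a match byte by byte before deleting anything.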

Posted: Thu May 04, 2017 7:30 pm
by XnTriq

Posted: Thu May 04, 2017 9:00 pm
by XnTriq

Posted: Fri May 05, 2017 7:00 am
by bdragon
The only thing left to check is that plugin for XnView I saw in one of the links provided.

Will check later. Work awaits me.

Posted: Sat May 06, 2017 6:29 am
by bdragon
Editing of the original message is in progress: I am fixing the formatting now and will provide a more detailed reason why I did not like Anti-Twin.