Thursday, December 17, 2009

Deleting thumbnails for nonexistent photos

Freedesktop has had a spec for some years on how applications should manage image thumbnails (use the Next link there). The spec is now followed by the majority of Gnome and KDE applications, including F-Spot, which is one of the very few applications that use large 256x256 thumbnails under ~/.thumbnails/large.

The spec says to store thumbnails in PNG format, naming each file after the MD5 sum of the original file's URL, e.g. 81347ce6c37f75513c5e517e5b1895b8.png.

The problem with the spec is that if you delete or move image files, their thumbnails stay behind and take up space (for my 20000+ photos I have 1.4 GB of large thumbnails).

Fortunately, you can clean them up from time to time with simple command-line tricks, because the original URLs are stored inside the thumbnail files as Thumb::URI attributes. I don't recommend erasing all of your thumbnails, because regenerating them takes time.

To create a list of thumbnail/original-URL pairs, run the following in a terminal inside either the .thumbnails/large or the .thumbnails/normal directory (identify is part of ImageMagick; this will take some time):

# Let the shell expand *.png itself; `ls *.png` can fail with "Argument list too long"
for i in *.png; do
  identify -verbose "$i" | \
    grep -F Thumb::URI | sed "s@.*Thumb::URI:@$i@" >> uris.txt
done
This will get you a uris.txt file, where each line looks like the following:
f78c63184b17981fddce24741c7ebd06.png file:///home/user/Photos/2009/IMG_5887.CR2
Note that the thumbnail filenames (first token) can also be generated from the URLs (second token) by taking their MD5 hash:
echo -n file:///home/user/Photos/2009/IMG_5887.CR2 | md5sum
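
If you need this more than once, the one-liner above can be wrapped in a small helper. This is only a sketch: the function name thumb_name is made up for illustration, and it assumes the URL is passed exactly as the application stored it (percent-encoded, including the file:// prefix).

```shell
#!/bin/sh
# thumb_name URI: print the thumbnail filename for URI (hypothetical helper).
thumb_name() {
    # printf '%s' writes the URI without a trailing newline;
    # a newline would change the MD5 hash.
    printf '%s' "$1" | md5sum | cut -d' ' -f1 | sed 's/$/.png/'
}
```

Calling thumb_name with the URL from a line of uris.txt should print the filename from the first column of that line.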
After you have your uris.txt file, it can be easily processed with any familiar command-line tools, like grep, sed, awk, etc.

For example, to delete all thumbnails whose URL contains 'Africa', use the following (cut keeps only the thumbnail filename, so rm is never handed the URLs):
for i in $(grep -F Africa uris.txt | cut -d' ' -f1); do rm -f -- "$i"; done
So, as you can see, it is pretty simple to free a few hundred megabytes (depending on the number of thumbnails you are deleting).
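
The same list can be used to remove only the thumbnails whose originals are gone. The following is a rough sketch rather than a polished tool: the function name prune_missing is invented here, it needs bash (for the percent-decoding trick), it only looks at file:// URLs of the file:///path form, and it assumes it is run from the directory that holds both the thumbnails and uris.txt.

```shell
#!/usr/bin/env bash
# prune_missing LIST: for each "thumbnail URI" line in LIST, delete the
# thumbnail if the original local file no longer exists (hypothetical helper).
prune_missing() {
    local thumb uri path
    while read -r thumb uri; do
        # Only local files can be checked; skip any other URI scheme
        case "$uri" in
            file://*) ;;
            *) continue ;;
        esac
        # Strip the scheme, then decode percent-escapes (%20 -> space, etc.)
        path="${uri#file://}"
        path="$(printf '%b' "${path//%/\\x}")"
        if [ ! -e "$path" ]; then
            rm -f -- "$thumb"
        fi
    done < "$1"
}
```

Run it as prune_missing uris.txt. Replacing the rm line with echo "$thumb" first is a cheap way to preview what would be deleted.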
With this kind of trick you can even rename the thumbnails of moved files if you use md5sum to generate the new filenames from the URLs, as shown above. This will save you regeneration time.
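
A minimal sketch of such a rename, assuming a whole directory was moved so that only a URL prefix changed. The function name rethumb and its arguments are made up for illustration; it needs bash and is meant to be run from the thumbnail directory.

```shell
#!/usr/bin/env bash
# rethumb LIST OLD_PREFIX NEW_PREFIX (hypothetical helper): for every
# "thumbnail URI" line in LIST whose URI starts with OLD_PREFIX, rename the
# thumbnail to the MD5 of the URI with OLD_PREFIX replaced by NEW_PREFIX.
rethumb() {
    local list=$1 old=$2 new=$3 thumb uri new_uri new_thumb
    while read -r thumb uri; do
        # Skip URIs that do not start with the old prefix
        case "$uri" in
            "$old"*) ;;
            *) continue ;;
        esac
        new_uri="$new${uri#"$old"}"
        new_thumb="$(printf '%s' "$new_uri" | md5sum | cut -d' ' -f1).png"
        # -n: never overwrite a thumbnail that already exists under the new name
        mv -n -- "$thumb" "$new_thumb"
    done < "$list"
}
```

For example: rethumb uris.txt file:///home/user/Photos/2009 file:///home/user/Archive/2009 (the Archive path is just an example). Keep in mind that this only renames the files; the Thumb::URI attribute stored inside each PNG still holds the old URL, so an application that checks it may regenerate the thumbnail anyway.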